Guidance is hard.
Seriously, you’d think it would be easy. You’d think you say things like, don’t shrink your database, most tables should have a clustered index, never go against a Sicilian when death is on the line, don’t mix sharks and tornados, and that would be it. You’d be done. But it’s not that easy. Even worse, it’s SHOCKINGLY easy to get stuff wrong.
An example.
I was looking at information over at Microsoft Developers Network (MSDN) in the SQL Server Books Online. I was reading through information about wait types when I found this little beauty:
Occurs when trying to synchronize the query processor exchange iterator.
Know what that is? Yeah, OK, a few of you who memorize wait stats do (shut up Tom), but most of us won’t recognize the CXPACKET wait. Maybe if Bob Ward explained it:
Used to synchronize threads involved in a parallel query. This wait type only means a parallel query is executing.
Oh, well, heck, that makes sense. But that’s not what I’m going off about here. No. What I’m freaked about is the next sentence over at Microsoft:
You may consider lowering the degree of parallelism if contention on this wait type becomes a problem.
Go run this query:
SELECT * FROM sys.dm_os_wait_stats;
(or you can use sys.dm_db_wait_stats on a WASD database)
There’s a pretty good chance, on most servers that have multiple CPUs, that you’re going to see CXPACKET within the top 5 or maybe top 10 waits. Now, using Microsoft’s guidance, you should immediately go and reduce the parallelism on your server, right?
NO! No, no, no, no, no, no. NO!
There are a ton of things you should do first. For example, as we see from Bob Ward’s explanation, this just indicates that you have parallel queries. It doesn’t mean that you have a problem with parallel queries. Parallel queries are good. Some people (Adam, looking at you here) go out of their way to try to get parallel queries (where appropriate). So no, you shouldn’t be setting MAXDOP = 1 the first time you see CXPACKET waits.
Instead, check to see if you have other indications of excessive CPU usage. Do you have queues on the CPU? Do you have queries that are using excessive amounts of CPU?
Let’s say you do have high CPU. Time for MAXDOP = 1 right? No! Stop! Let’s try this first. Let’s go see what the Cost Threshold for Parallelism is set to on your server. The default is 5. That’s an INSANELY low number. It’s using the cost estimates of the query to arrive at which plans might benefit from parallelism. So, if we think we’re suffering from excessive parallelism, instead of turning it off, let’s first try raising the threshold a bit, say to 35. Run with that for a week on the server and see how things stand now. We can also try, oh, I don’t know, tuning the query, making sure we have good indexes, making sure we have good statistics, in short, a whole bunch of stuff before we resort to MAXDOP = 1.
Guidance is hard. But you really can’t afford too be this far off the mark. Go and read through the guidance offered by Bob Ward. It’s not one sentence. It can’t be. Offering guidance requires quite a lot of information because there is seldom a one sentence answer that adequately covers guiding someone through a problem, or even identifying if you have a problem. We, the crazy people who try to tell others what to do, need to be careful when providing glib and simple answers to complex problems.
Oh, and avoid land wars in Asia too.
UPDATE: While this was good information, things sometimes do move quickly on the internet. Microsoft has updated their entry on CXPACKET waits. Read all about it here.
I suddenly feel like the Man in Black fighting Fezzik:
I just want you to feel you’re doing well.
I have a server where decreasing the MAXDOP value increases the performance of the app it supports (vendor supplied) but hinders in-house written apps and queries. Very annoying.
@James – Over months of trial and error I think I’ve hit the sweet spot on our primary LOB server where the Cost Threshold for Parallelism is 50 and the MAXDOP (for an 8 vCPU server) is 4. This means that up to three vCPU cores are available for parallel query processing, plus the organizer/coordinator thread.
This way, even if a query does use parallelism, it can’t tie up all 8 vCPU cores.
Exactly the kind of careful adjustment that has to be done. And, the kind of thing that can’t be summed up in a single line.
I’ve tried different things but I have yet to find that sweet spot. I will keep at it of course.
Pretty sure that’s the first time someone has told me to shut up before I could even open my mouth…OK, it isn’t, but I’m going to believe that it is.
Oh I know it’s not the first time you’ve been told that.
However, I just didn’t want you chiming in with your near infinite knowledge and spoiling it for everyone else.
[…] recently posted some comments about some guidance offered by Microsoft when talking about the CXPACKET wait type. Nothing I said was wrong, no. But, evidently there are a few whip smart and agile people who work […]
Really in poor taste to gratuitously slam people from any region. I know you thought you’re making a cute joke, but I think there are a lot of other cultural groups you’d never dream of using in that same sentence.
I assume you’re referring to the Sicilian and Asia references. Those are both quotes from a movie, The Princess Bride. It’s one of the most innocuous and friendly films ever made. No ill intent was meant in the movie and no ill intent was used in quoting it. In the film, one person is offering another advice (bad advice it turns out). Same thing here. There was no slam here, at all.
Love The referrences to the Princess Bride.
[…] Fritchey (blog | @gfritchey) wrote an article titled “Be Cautious Offering Guidance”. The point of the article was not to call out the fact that Microsoft offered limited information […]
[…] you need to be careful about the advice on the internet. It can be like taking candy from strangers. There still quite a bit of misleading advice that can cause more harm than good. An easy example […]
[…] you need to be careful about the advice on the internet. It can be like taking candy from strangers. There still quite a bit of misleading advice that can cause more harm than good. An easy example […]
Heh… Your article really strikes a note for me but not the way that most people would think.
I amazed at how much time and money some people spend on searching for “guidance” for esoteric hardware settings and then implementing that “guidance” to get nearly trivial percentages of improvement. Sure, there are exceptions to the rule but even a 200% improvement performance pales in comparison to fixing the real problem, which can easily result in 4 digit percentages of improvement (try THAT with a settngs solution!).
That brings us back to the problem of offering “guidance”. The one piece of “guidance” that I’ll never prefix with the phrase of “It Depends” is “Crap Code WILL cripple your server and there is little to do with hardware or settings that will fix it. Learn to identify it, fix it, and avoid it in the future”.
Of course, that will never happen within the walls of many companies because they’re more concerned with schedule than product. So my “guidance” to them (especially their managers) is “If you want it real bad, that’s the way you’ll get it”. 😉
Shifting gears, thank you for the time you spend on this blog. I never fail to learn something new here.
We recently had to ask Microsoft for help in fixing a performance issue with our data warehouse. The OLAP app that was feeding off of the db was performing poorly. The Microsoft engineer suggested that we reduce the maxdop from 4 to 1! His justification? It helped him fix similar performance issues for past clients. Outright setting maxdop to 1 for oltp-type systems is bad enough a strategy already (a mistake I am guilty of); for olap- type workloads it is an even more obviously bad advice. I know one should not accept candy from strangers, but what if one’s parent or grandparent (Microsoft) offers the candy?! 🙂
Right there with you Jeff. The vast majority of all performance problems go right back to the code and the data structures. There’s no getting around that.
Yeah, hard to argue with the Microsoft consultants, even when they’re wrong. I’ve faced the same issue in the past. But, you have to stick to your guns.
BWAAA-HAAAA!!!! I’d rather stick my gun to the “gurus” that made such a suggestion! Man! what were they thinking??? 😉
Thinking?
You keep using that word. I do not think it means what you think it means.