Columnstore indexes are fascinating and really cool. Unfortunately, they’re adding an interesting new wrinkle to an old problem.
What’s the Cost Threshold for Parallelism set to on your server? If you just said “The whatsis of whositz?” then the value is 5. The cost threshold is the point at which the estimated cost of an execution plan goes from definitely serial to possibly parallel. This default was set for SQL Server 2000 and hasn’t been changed since. I’ve long argued, loudly, that it’s too low. I’ve suggested changing it to a much higher value. My advice has gone from 35 to 50 and several places in between. You could just look at the median or the mode of the estimated plan costs on your system and use the higher of those values as a starting point. Getting a cost higher than 5 in a query is insanely simple, which means that queries that are absolutely not going to benefit from parallelism go parallel. This causes unnecessary CPU load on your system and slows down those queries. So, listen to my advice and run out there and change this value immediately.
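If you want to see where your server stands before changing anything, a minimal sketch along these lines reads the current setting and then changes it. The value 50 is only the example figure from the paragraph above, not a recommendation for your workload, so try it on a test server first.

```sql
-- Check the current setting; value_in_use is what the server is running with.
SELECT name, value, value_in_use
FROM sys.configurations
WHERE name = N'cost threshold for parallelism';

-- The setting is an advanced option, so expose those first.
EXEC sys.sp_configure N'show advanced options', 1;
RECONFIGURE;

-- Example only: 50 is one of the starting points mentioned above.
EXEC sys.sp_configure N'cost threshold for parallelism', 50;
RECONFIGURE;
```

The change takes effect without a restart, which also makes it easy to put the old value back if other queries suffer.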
Or…
Columnstore indexes have two methods of processing. The first is the row mode. That’s actually the same method used by all your standard indexes and tables. You can write a query that does aggregate calculations against a table that has a Columnstore index. You’ll then see a performance benefit. However, check the properties on the Columnstore operator in the execution plan. Does it say “Actual/Estimated Execution Mode = Row?” Then you’re getting row level processing. That can be fine and you can see a performance improvement using row mode. There is another level of processing though.
Batch mode processing is where the engine takes roughly 1,000 rows at a time and processes the aggregations simultaneously across multiple threads. Wait, multiple threads? Yes. You have to have parallelism in order to get batch mode. Joe Sack talks about it here. You can read more about it in the SQL Server documentation. Batch mode processing is faster (depending, etc.). Especially if you’re dealing with larger data sets (more than tens of thousands of rows), you’re going to want to see Batch as the Execution Mode in the Columnstore operators.
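One way to check which mode you’re getting is to capture the actual plan for an aggregate query, roughly as sketched here. The table dbo.SalesHistory, its columns, and the index name are hypothetical stand-ins for your own objects.

```sql
-- Hypothetical table, columns, and index name; substitute your own.
-- On SQL Server 2012 and 2014 a nonclustered columnstore index makes
-- the table read-only, so try this on a test copy.
CREATE NONCLUSTERED COLUMNSTORE INDEX ncci_SalesHistory
ON dbo.SalesHistory (ProductID, Quantity);
GO

-- Return the actual execution plan as XML alongside the results.
SET STATISTICS XML ON;
GO

SELECT sh.ProductID,
       SUM(sh.Quantity) AS TotalQuantity
FROM dbo.SalesHistory AS sh
GROUP BY sh.ProductID;
GO

SET STATISTICS XML OFF;
GO
```

Open the returned plan, select the Columnstore Index Scan operator, and look at the Actual Execution Mode property. Row means row mode; Batch means you got batch mode.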
There is an issue here though. That’s right, we just bumped our Cost Threshold for Parallelism up to 5000 (if 50 is good, 500 is better, and we’ll add a zero just to be sure). OK, 50. How high does the cost have to be in order to see benefits from parallelism and batch mode processing? Ah, well, there’s the rub. When I was testing this, I had about 100,000 rows in a table. Nothing large. Execution times went from 100ms to 50ms. Yep, getting parallel execution, batch mode processing, of the nonclustered columnstore index cut my execution time in half.
Next question should be, oh, well, what was the estimated cost of that query? 30? 20? 15?
Try 3.7.
Yeah, I got a 50% improvement in performance by changing the Cost Threshold from 50 to 3.
In other tests I had a plan with an estimated cost of .77. Execution time went from 75ms to 57ms and reads went from 115 to 62 because of batch mode processing.
Now, I am not advocating dropping everyone’s Cost Threshold to 3 (or less). What I am absolutely saying is that you have a new wrinkle that you must add to your calculations. Are you using Columnstore indexes anywhere? Then you need to examine those execution plans. What are their costs? If you adjust your Cost Threshold below them, do you see a benefit? If so, how low did you just go? Now, is that negatively impacting your other queries? You could leave your Cost Threshold as is and use the QUERYTRACEON query hint with trace flag 8649 (documented by the Sensei Paul White) on your queries against Columnstore indexes, but that comes with a list of caveats as long as your arm. The other alternative is to, again, leave the Cost Threshold alone and use Adam Machanic’s method of forcing parallelism just for the specified queries. These options can be unattractive, but you have to deal with getting parallelism for Columnstore indexes somehow.
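For reference, the trace flag approach looks roughly like this. The table and columns are hypothetical, and the flag itself is undocumented and unsupported by Microsoft, so treat this as a sketch for testing rather than something to ship without thought.

```sql
-- Hypothetical table and columns; substitute your own.
SELECT sh.ProductID,
       SUM(sh.Quantity) AS TotalQuantity
FROM dbo.SalesHistory AS sh
GROUP BY sh.ProductID
-- Undocumented trace flag: asks the optimizer for a parallel plan
-- even when the estimated cost is below the Cost Threshold.
OPTION (QUERYTRACEON 8649);
```

One of the caveats: QUERYTRACEON normally requires sysadmin permissions unless the hint is applied through a plan guide, and a query that cannot be parallelized at all will still run serially.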
Welcome to the quandary. Tuning in and around parallelism just got harder because of Columnstore Indexes.
Thanks for the thoughtful post. The “Batch execution for single-threaded queries” feature coming in SQL Server 2016 (it’s in the CTP, at least, so hopefully it remains in the final product!) will definitely be a welcome addition that is relevant to this topic as well: https://msdn.microsoft.com/en-us/library/dn934994(v=sql.130).aspx
Ah, that’s interesting. I didn’t know that. I haven’t started digging into 2016 yet. That would put the argument back to the beginning, at least in 2016.
[…] Parallelism and Columnstore Indexes – Grant Fritchey (Blog|Twitter) […]