Reading execution plans in SQL Server is just hard. There’s a lot to learn and understand. I previously outlined the basics I use to get started when I’m looking at an execution plan for the first time. However, those pointers alone are not enough. I want to explain a little further why and how those basic steps get you started reading execution plans. To begin with, instead of talking about the first operator, which I’ve detailed before, we’ll talk about the highest cost operators.
Highest Cost Operator
Every execution plan within SQL Server includes what the optimizer has determined to be the estimated cost of each operation. All these estimated operator costs are tallied up, and that makes up the estimated cost of the whole execution plan.
You get the same data on the execution plan whether you’re capturing what is called an Estimated Plan, or you have an execution plan with runtime metrics, sometimes called an Actual Plan. The estimated values for the execution plan, or the plan plus runtime metrics, remain the same. The only exception that would cause these numbers to change is a recompile event. Otherwise, they’ll always be the same.
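If you want to capture these plans in T-SQL rather than through the SSMS toolbar buttons, here’s a minimal sketch. The query shown is just a stand-in (it assumes the AdventureWorks sample database); any query will do:

```sql
-- Estimated plan: the query is compiled but NOT executed.
SET SHOWPLAN_XML ON;
GO
SELECT BusinessEntityID FROM Person.Person WHERE LastName = N'Smith';
GO
SET SHOWPLAN_XML OFF;
GO

-- Plan plus runtime metrics ("actual plan"): the query IS executed.
SET STATISTICS XML ON;
GO
SELECT BusinessEntityID FROM Person.Person WHERE LastName = N'Smith';
GO
SET STATISTICS XML OFF;
GO
```

Either way, the plan XML you get back carries the same estimated cost numbers; the second form simply adds the runtime metrics on top of them.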
Because the optimizer makes its choices based on these estimated costs, even though they are not measurements of performance, these are the numbers that begin your journey of reading an execution plan, even on a plan with runtime metrics. So, one of the first things I will always look for in an execution plan are the highest cost operators.
Let’s look at an operator to see exactly which number I mean. Start with, here’s the full execution plan:
Let’s focus on one operator, in fact, the most costly operator in the plan, the Key Lookup. Here I’ve highlighted the estimated cost of this operator:
The highlighted cost shows that this one operator is estimated to cost 60% of the overall cost of the query. Let’s take a look at the same operator in an execution plan plus runtime metrics:
Even though we’ve added runtime metrics including the time for this operator to execute and the actual number of rows versus the estimated number of rows, you can still see that same estimated cost highlighted for this operator.
To spot the highest cost operator using only SSMS, you’ll have to scroll around to find the operator with the highest cost. Third-party tools frequently offer up the highest cost operator in a list. You could also query the XML of the plan to find the highest cost operator by NodeID and then search for that NodeID in SSMS.
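As a sketch of that XML approach, something like the following lists operators from cached plans ranked by estimated cost. This assumes the plan is still in the plan cache, and note that EstimatedTotalSubtreeCost on a RelOp element includes the cost of its children, so treat the ranking as a rough guide rather than a per-operator measurement:

```sql
-- Rank operators (RelOp elements) in cached plans by estimated subtree cost.
-- NodeId identifies the operator so you can find it again in SSMS.
WITH XMLNAMESPACES (DEFAULT 'http://schemas.microsoft.com/sqlserver/2004/07/showplan')
SELECT TOP (10)
    qp.query_plan,
    op.value('@NodeId', 'int')                      AS NodeId,
    op.value('@PhysicalOp', 'varchar(128)')         AS PhysicalOp,
    op.value('@EstimatedTotalSubtreeCost', 'float') AS EstSubtreeCost
FROM sys.dm_exec_cached_plans AS cp
CROSS APPLY sys.dm_exec_query_plan(cp.plan_handle) AS qp
CROSS APPLY qp.query_plan.nodes('//RelOp') AS r(op)
ORDER BY op.value('@EstimatedTotalSubtreeCost', 'float') DESC;
```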
OK. I’ve Identified the Highest Cost Operator, Now What?
This is the tricky part. You’ve found the highest cost operator; in the example plan, the Key Lookup operator. Now you need to understand what that operator is doing, and whether what it’s doing actually ought to cost as much as it does.
Sometimes this is simple. You have a SELECT * query with no WHERE clause. The highest cost operator in the plan is the Scan operation that retrieves all the data. Makes perfect sense.
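For example, in a hypothetical query like this against the AdventureWorks Person.Address table, you’d expect a Clustered Index Scan to carry essentially the entire estimated cost of the plan:

```sql
-- With no WHERE clause and every column requested, the optimizer has no
-- choice but to read all the data, so the scan operator is, quite
-- reasonably, the highest cost operator in the plan.
SELECT *
FROM Person.Address;
```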
However, what about our example, the Key Lookup operator?
First, do you know what a Key Lookup operator does? In a nutshell, a non-clustered index was used to bring back a number of rows. However, that non-clustered index wasn’t a covering index, meaning it didn’t have all the columns needed by the query, either in its key or in its INCLUDE columns. So, we have to go to where the data is stored, the clustered index, and look up the necessary columns.
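For illustration, one common way to eliminate a Key Lookup is to make the non-clustered index covering by adding the missing column as an INCLUDE column. This is just a sketch against the AdventureWorks Person.Address table, not necessarily the exact index involved in the example plan:

```sql
-- Rebuild the non-clustered index with City as an INCLUDE column.
-- The index key stays narrow, but the query no longer has to visit
-- the clustered index for City, which removes the Key Lookup.
CREATE NONCLUSTERED INDEX IX_Address_StateProvinceID
ON Person.Address (StateProvinceID)
INCLUDE (City)
WITH (DROP_EXISTING = ON);
```

Whether that’s actually worth doing depends on the rest of the plan, which is exactly the question we’re working through here.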
OK. So, now we know what it does. Next we have to determine, within the context of the plan, whether it is really the biggest pain point. How do we make that determination?
As I said, this is the tricky part. We have to start looking at the details we have for this operator. We know what it’s doing and we can determine specifically why by looking at the properties and the “Output List” to see the details:
We’re retrieving the column ‘City’ from the clustered index. So, yes, the lookup is necessary. Now, does it really cost as much as the estimate says? Well, we need to look at the details in the properties again. Where possible, we want to get execution plans plus runtime metrics. Why? Because they add the actual values alongside the estimates. In fact, we can see it in the operator above, where it shows that we’re only returning 6% of the estimated rows: 16 of an estimated 265. This is a clue about this operator. Let’s look at the details in the properties to see if we can find more:
I’ve highlighted three pieces of data: the actual number of rows, the actual number of executions, and the estimated number of executions. You can see that the estimated number of executions was 265.054, while the actual was 16. This alone suggests that the estimated cost for this operator might not be accurate, at least for this execution of the query.
We need to set the context within which this operator is running. That means understanding its relationship with other operators in the plan.
Operator Context
To fully understand how a given operator is behaving, you have to look at it in relation to the other operators in the plan. In our example plan, the operator that immediately preceded this one, the one above and to the left of it, is a Nested Loops Join operator. A Nested Loops Join operator works on the basis of walking through a set of values given it by the preceding operator, in this case, another Nested Loops Join, and then looking for matching values in the operations below, in this case, our highest cost operator, the Key Lookup.
Again, we have to look at the properties to get the details. Our goal is simple: understand why our Key Lookup operation thought it was returning 265 rows but only returned 16. You can see that the Nested Loops Join thought it was getting 265.054 rows. Since a Nested Loops join isn’t the source of the data, we have to move further into the plan.
The next operator is that other Nested Loops operator. It’s joining data between two tables, using an Index Seek on each: the StateProvince table and the Address table:
Note that Index Seek on the Address table. That’s the source of our issue. The optimizer thought it was getting 265 rows; in actuality it got 16. Hence the disparity in all the other estimates.
Fine, But Where Is The Problem?
Again, our highest cost operator is a signpost pointing at the problem, but it’s not always going to be simple. Instead, we have to look at the rest of the plan to understand whether attempting to change or eliminate the Key Lookup is likely to gain us the performance enhancements we’d like to get out of this query.
Looking at the rest of the plan, we spot some stuff that might be more problematic:
We’re scanning 19,614 rows in the BusinessEntityAddress table and 19,820 rows in the Customer table. Also note that, after the joins, the query returns only 3 rows. However, we’re paying the cost of scanning roughly 20,000 rows, twice, plus the cost of building hash tables to support the Hash Match operators. So, while eliminating the most costly operator, the Key Lookup, might get us a small performance enhancement, those scans, combined with the Hash Match joins needed to deal with the large data sets, are the real pain points here.
So, while the highest cost operator is a driving factor for reading an execution plan, you must set that cost within the context of the entire plan in order to better understand where the pain point actually is in the query.
Conclusion
Anyone who says reading execution plans is easy is probably trying to sell you something. In fact, reading execution plans is hard. However, take my initial list as a starting point. Then go to the trouble of understanding the operator you’ve identified as the highest cost operator. Finally, understand that operator’s context within the plan. With this, you’ll begin the process of understanding execution plans.