It’s been emphasized over and over that the costs of operations within an execution plan, and the estimated costs of the plan themselves are, in fact, estimates. But it goes further than that. The estimated values are based on statistics, or the lack thereof. Statistics themselves are also estimates. This means that the costs you’re seeing are extrapolations based on extrapolations. So, you should just ignore those values and move on, right? Wrong.
In order to understand how the optimizer is choosing to put together an execution plan for your query so that you can use that understanding to then make intelligent choices as to modifying the query or the structure of your database, you must use the values you have at hand. However, you must also understand where and how those values were derived in order to make a determination on how much faith you can put into them (because you are simply placing faith in those numbers). Above all else, this means you must understand your data as presented by the statistics and what these statistics represent in terms of defining your data.
For a more detailed discussion of what exactly statistics represent, see my article in Simple-Talk.
Next, you need to understand what the operator itself is doing. In the majority of cases, this means just reading the description. They’re usually pretty clear. But sometimes, it might not be that clear. Further, while the descriptions of an operator may be clear, you need to further understand why or what it is doing, not simply what it is. This means further drill down to reading through the properties of the operator (NOTE: not the tool tip) in an attempt to understand what’s going on. Boogle or Ging will be a friend here. You can search up descriptions of what operators are to assist with your understanding. You need to know when something doesn’t even have statistics, such as a table variable, and therefore the optimizer assumes it only has 1 row, because that’s going to radically affect the cost estimates displayed for you, even in an actual execution plan. That’s right, the actual plan costs are still just estimates.
I wish I could tell you to rely on these numbers, but you can’t. I wish I could tell you to ignore these numbers, but you can’t. These are the only numbers you get that show you what’s happening internally within the query within the optimizer, so you must use them. Just use them with the full knowledge that they are calculations based on other calculations based on extrapolations. In short, a bit of a guess.
Want to talk query tuning, execution plans, optimizer and statistics some more? Come see me at the all day pre-conference seminar in Dallas before the 2013 SQL Saturday there in Dallas this November.
One concern I have with not giving enough consideration to query plan cost is a potential side effect to a parallelism management strategy that is gaining traction: increasing the cost of parallelism. If something like ascending key problem and rows estimate of 1 result in a horribly lowball cost estimate of 50 on a system with COP increased to 55, poor performance of the selected operators could be massively compounded by the query executing in serial instead of parallel.
True. It’s a perpetual balancing act to try to work with the constructs at the edges.