I was presenting on execution plans when another question came up that I didn’t know the answer to immediately. Yes, I know you’ve seen that phrase before on this blog. I love presenting because you get exactly the kinds of questions that make you think and make you learn. I’m presenting, in part, to learn, just as much as I am to teach. It was the same with kenpo. The more I taught, the better I learned the art. Wait, this isn’t supposed to be a blog post about learning. This one is about statistics.
The question was, does the execution plan have the statistics that were used by the optimizer to decide on the execution plan. And no, what was meant, was not does it show the estimated rows, which come from the statistics, but specifically does it show that it used a set of statistics named X? The answer to that is yes & no. Or, to be more DBAish about it, it depends.
Let’s take a really simple query run against a freshly installed copy of AdventureWorks2008R2:
SELECT p.BusinessEntityID, p.FirstName FROM Person.Person AS p WHERE p.FirstName LIKE 'Toni%';
This query generates this execution plan:
Yes, a very sophisticated and hard to understand execution plan. Now, here’s the deal, there were two sets of statistics used to make this plan, but only one of them can be seen in the plan. See any statistics there? Sure you do. IX_Person_LastName_FirstName_MiddleName. That is one of the two sets of statistics that were used to make this execution plan. Where’s the second set? Not in the execution plan.
[sourcecode language=â€sqlâ€]sp_helpstats N’Person.Person’, ‘ALL’;
The results are here:
There is the second set of statistics used for this query, right at the top. Because I was searching the FirstName column, the optimizer found that it did not have the statistics it needed, so they were created, on the fly, and then, were not a part of the execution plan. Further, I probably looked at other statistics such as the PK_Person_BusinessEntityID because that’s the clustered index for the table. It could have scanned that to get the list of values just as easily as the other index. But, that other index is probably smaller, which means fewer pages scanned.
So, back to the question, can you see the statistics used by the optimizer inside the execution plan? Some of them, yes, but not all of them.
Please, if I’m presenting, ask questions. I’ll know a few of the answers, right off the top of my head. Others will make me go and learn so that I can answer the next person who asks the same question. I don’t mind losing at a game of Stump the Chump, so let’s play.
It’s irritating, because they must be stored somewhere in the internal representation of the plan so that the pre-execution validation checks know what stats need to be checked for updates.
Yeah, no doubt they’re in there somewhere, at least I assume so. Maybe not. I mean if the FirstName stats get updated, it doesn’t cause a recompile… I think. Probably should test that.
Cool! I learned something new. I didn’t know about sp_helpstats. I looked it up in BOL, and it says, “This feature will be removed in the next version of Microsoft SQL Server…To obtain information about statistics, query the sys.stats and sys.stats_columns catalog views.” Do you know how you would find the info using those?
Ya, an optimizer decision tree log to show which stats were used and decision results would be awesome. I smell a trace flag! I’d love to see a history of ‘found CI PK_My_Index, decided not to use it because of Reason Code x: Covering NCI idx_FirstName found.” “Analyzed NCI idx_FirstName, need FirstName, all columns found.” “Used full Scan due to use of LIKE.”. Or whatever. I know the steps and decisions are far more exhaustive for what the optimizer decides and why, but it would be cool to peer inside.
Yeah, I was just being lazy. Here:
SELECT *
FROM sys.stats AS s
WHERE s.object_id = object_id(‘Person.Person’)
Here’s one alternative to sp_helpstats
SELECT OBJECT_NAME(object_id), name, auto_created, user_created, no_recompute, has_filter, filter_definition, LEFT(list,LEN(list)-1)
FROM sys.stats AS s
CROSS APPLY (
SELECT
name + ‘, ‘ AS [text()]
FROM
sys.stats_columns AS sc INNER JOIN sys.columns AS c ON sc.object_id = c.object_id AND sc.column_id = c.column_id
WHERE
s.object_id = sc.object_id AND s.stats_id = sc.stats_id
ORDER BY
stats_column_id
FOR XML PATH(”)
) columnlist (list)
@Jes I knocked this up quickly, think it looks ok:
SELECT s.name AS statistics_name,
STUFF
(
(
SELECT ‘, ‘ + c.name AS [text()]
FROM sys.stats_columns sc
JOIN sys.columns c ON sc.object_id = c.object_id
AND sc.column_id = c.column_id
WHERE sc.object_id = s.object_id
AND sc.stats_id = s.stats_id
ORDER BY sc.stats_column_id
FOR XML PATH (”)
),
1,
2,
”
) AS statistics_keys
FROM sys.stats s
WHERE s.object_id = OBJECT_ID(‘Person.Person’)
ORDER BY s.name
The trace flags you are after are 9204 (statistics loaded) and/or 9292 (statistics header loaded). You also need 3604 to redirect the output to the Messages window.
Text output only appears when a compilation or recompilation occurs – so clear the plan from cache first, dump the plan cache completely, or use OPTION (RECOMPILE) – though that may change the plan of course.
You can also use another undocumented option to enable the trace flags just for the statement, e.g. for stats headers:
SELECT
p.BusinessEntityID,
p.FirstName
FROM Person.Person AS p
WHERE
p.FirstName LIKE ‘Toni%’
OPTION
(
QUERYTRACEON 3604,
QUERYTRACEON 9292,
RECOMPILE
);
Cheers,
Paul
OK. So I was being VERY lazy. Nice posts.
Maybe you could: http://sqlblog.com/blogs/paul_white/archive/2011/09/21/how-to-find-the-statistics-used-to-compile-an-execution-plan.aspx
Only good when you’re running the query, not examining an exec plan from cache.
yep. What Gail said. Remember the question, can I see the statistics used to create the execution plan in the plan? The answer is still, it depends.
[…] get exactly the kinds of questions that make you think and make you learn. I’m presenting, in… Read more… Categories: Database Performance Performance Tuning […]
Thanks Paul!
I figured that there must be some undocumented trace flags for this since the SQL Engine/optimizer QA department needs something to ensure the engine behaves properly and decides accurately with each release/Service Pack. You certainly can’t leave that to the promises of the developers, can you? P’scha. I know they are not using trial and error.
Outstanding find here. Nice work and thanks for reporting. I look forward to playing with this in my lab.
Hi Grant, I’ve just blogged about that… there is a way… take a look here…
http://blogfabiano.com/2012/07/03/statistics-used-in-a-cached-query-plan/
Regards
Fabiano Amorim
[…] really possible to see the statistics used in the generation of a query plan. If you read the comments here, I was corrected of that notion. However, I’ve never been a fan of using undocumented trace […]
[…] doable to see the statistics used in the generation of a question program. If you go through the reviews here, I was corrected of that idea. Even so, I have under no circumstances been a enthusiast of making […]