In my previous post I showed how you could determine if a query had missing index information, but it was difficult to get meaningful data out because I didn’t know XQuery well enough. I spent some more time refining the query, and here are the results. This new query shows some basic performance information gathered from the aggregate data in sys.dm_exec_query_stats. It combines this with the full data pulled from the MissingIndexes element in the XML of the execution plan. I’ve got it listing all the recommended columns and groupings, which means the performance data is repeated in order to allow for the full listing of groups and columns. It seems to work well.
A couple of interesting points. My purpose is to provide a short-cut method of identifying possible indexing needs for systems that run lots of dynamic or ad hoc queries. In my own environment I’m mainly talking about either Microsoft CRM or nHibernate, but the rules would apply to almost any other client-side query engine. I wanted to list the database name as part of the query. A dbid is included with sys.dm_exec_query_plan but, unfortunately, when a query is prepared or ad hoc, this value is null. Luckily enough, the data is available in the execution plan itself. That’s going to be something to watch for in the future when using these execution plans.
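As a quick sketch of what I mean (nothing fancy, just the two DMVs side by side):

```sql
-- dbid from sys.dm_exec_query_plan comes back NULL for ad hoc and
-- prepared plans, even though the plan XML itself still carries the
-- database name in its Database attributes.
SELECT s.plan_handle,
       p.dbid,        -- NULL for ad hoc / prepared statements
       p.query_plan   -- the XML still knows which database it belongs to
FROM sys.dm_exec_query_stats AS s
CROSS APPLY sys.dm_exec_query_plan(s.plan_handle) AS p;
```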
Here’s the query:
WITH XMLNAMESPACES ('http://schemas.microsoft.com/sqlserver/2004/07/showplan' AS sp)
SELECT p.query_plan.value(N'(sp:ShowPlanXML/sp:BatchSequence/sp:Batch/sp:Statements/sp:StmtSimple/sp:QueryPlan/sp:MissingIndexes/sp:MissingIndexGroup/sp:MissingIndex/@Database)[1]', 'NVARCHAR(256)') AS DatabaseName
,s.sql_handle
,s.total_elapsed_time
,s.last_execution_time
,s.execution_count
,s.total_logical_writes
,s.total_logical_reads
,s.min_elapsed_time
,s.max_elapsed_time
,p.query_plan
,p.query_plan.value(N'(sp:ShowPlanXML/sp:BatchSequence/sp:Batch/sp:Statements/sp:StmtSimple/sp:QueryPlan/sp:MissingIndexes/sp:MissingIndexGroup/sp:MissingIndex/@Table)[1]', 'NVARCHAR(256)') AS TableName
,p.query_plan.value(N'(/sp:ShowPlanXML/sp:BatchSequence/sp:Batch/sp:Statements/sp:StmtSimple/sp:QueryPlan/sp:MissingIndexes/sp:MissingIndexGroup/sp:MissingIndex/@Schema)[1]', 'NVARCHAR(256)') AS SchemaName
,p.query_plan.value(N'(/sp:ShowPlanXML/sp:BatchSequence/sp:Batch/sp:Statements/sp:StmtSimple/sp:QueryPlan/sp:MissingIndexes/sp:MissingIndexGroup/@Impact)[1]', 'DECIMAL(6,4)') AS ProjectedImpact
,ColumnGroup.value('./@Usage', 'NVARCHAR(256)') AS ColumnGroupUsage
,ColumnGroupColumn.value('./@Name', 'NVARCHAR(256)') AS ColumnName
FROM sys.dm_exec_query_stats s
CROSS APPLY sys.dm_exec_query_plan(s.plan_handle) AS p
CROSS APPLY p.query_plan.nodes('/sp:ShowPlanXML/sp:BatchSequence/sp:Batch/sp:Statements/sp:StmtSimple/sp:QueryPlan/sp:MissingIndexes/sp:MissingIndexGroup/sp:MissingIndex/sp:ColumnGroup') AS t1 (ColumnGroup)
CROSS APPLY t1.ColumnGroup.nodes('./sp:Column') AS t2 (ColumnGroupColumn)
WHERE p.query_plan.exist(N'/sp:ShowPlanXML/sp:BatchSequence/sp:Batch/sp:Statements/sp:StmtSimple/sp:QueryPlan//sp:MissingIndexes') = 1
ORDER BY s.total_elapsed_time DESC
Excellent query; I can’t stress how important this query is to have in one’s toolbox. After correcting the curly quotes to straight single quotes and adding a TOP 20 to the SELECT clause, I immediately got valuable results.
And what do you know, the biggest offender on one of our production databases is an external party that has one database on our server for an application that serves one of our own customers.
Many thanks for this wonderful query…..I rate it 10/10 🙂
Wow! Thank you. I’m glad it was useful. I know we’re starting to put it to work within our team. I had finished it without the database name in the query, but when one of my co-workers started testing it, we found one database that had so many badly performing queries with missing index reports (we knew it had problems, but…) that I had to get the database name in for filtering purposes.
I’m sure there are better ways to perform the XQuery parts of the script because I’m barely getting started with my XQuery syntax.
I am new to XQuery; I expected XPath when working with XML (both of which I know quite well). It is something else I will have to sink my teeth into, as I can see it becoming more important in the future.
A thing people can find intimidating is the output in SQL Server 2005. The query plan XML can be clicked on, and then it just opens a raw XML document in a separate tab.
But in SQL Server 2008 it will take you directly to a graphical query plan, and you can right-click in the header to get the original query text (complete with formatting) and a create script for the suggested index.
Which is all very neat!
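For reference, the create script it produces looks roughly like the sketch below; the table and column names here are made up for illustration, and SSMS fills in the real ones from the plan’s MissingIndex element:

```sql
-- Illustrative only: equality/inequality columns go in the key,
-- INCLUDE columns go in the INCLUDE list.
CREATE NONCLUSTERED INDEX [<Name of Missing Index, sysname,>]
ON [dbo].[MyTable] ([EqualityColumn], [InequalityColumn])
INCLUDE ([IncludedColumn]);
```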
We have now scheduled a cleanup day for next week to fix badly running queries on our servers and educate people on what to look for. Personally, I am trying to make the output easier to read by gluing the suggested index fields onto one line, so that TOP 20 means 20 queries instead of 20 rows in total.
Once I get mine perfect (it can take a while), I will post it here!
Again, many thanks!
Excellent. I look forward to it. After you get it working really well, go on over to SQL Server Central and post it in their script library too. A lot more people will have access to it and will see it there. But still post it here too, or at least post a pointer. I’d be very interested.
Any chance of mailing me this? I have cut and pasted it into Management Studio and got rid of the funky quotation marks, but I am still getting errors. I don’t know what I am missing.
Andrew
It looks like there may be an issue using 2008 Management Studio. Anyone else getting errors? Anyone know the fix?
Andrew
Sorry, my mistake. Bonehead here was connected to a 2000 instance while testing and doing some other work. Thanks for the code.
Andrew
Sorry it took so long to get back. Glad you worked it out. Sorry about the funky quotes. I’m still wrestling with getting code to publish appropriately to the blog.
ScaryDba,
I have been using this code and have been able to tame some pretty bad performance from some dynamic SQL. What I would like to do is parse out the p.query_plan field and get the missing index detail; that way I could run this in a job, save the missing index details over time, and do some queries to figure out which would be of most benefit. I am trying to figure this out, but as always, if the wheel has already been invented I would rather not reinvent it. :-)
Regards
Andrew
Hey Andrew,
Yeah, that’s the direction we’re headed in. No wheels on the bus yet though. We’re just thinking about it at the moment… how often to query it, how long to keep it, how to aggregate it… I’m leaning pretty heavily towards adding in the query hash as a mechanism for aggregation & identification. More experiments over the next couple of weeks. I’ll post again when I get something. However, if you beat me to it, please share.
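As a rough sketch of the query hash direction (query_hash is available in sys.dm_exec_query_stats from SQL Server 2008 on; this is just the idea, not a finished solution):

```sql
-- Aggregate the ad hoc variants of a query by their hash, so the
-- accumulated stats can be identified and summed over time.
SELECT s.query_hash,
       SUM(s.execution_count)     AS executions,
       SUM(s.total_elapsed_time)  AS total_elapsed_time,
       MAX(s.last_execution_time) AS last_execution_time
FROM sys.dm_exec_query_stats AS s
GROUP BY s.query_hash
ORDER BY total_elapsed_time DESC;
```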
After replacing the quotes, I’m still unable to get this to work. My assumption is that it’s because I’m on a SQL Server 2005 database that is running in 2000 compatibility mode. Is my assumption correct? (The error says “the value function requires 2 argument(s)”.)
I am sorry about the quotes. I can’t install a code formatting widget unless I go on the pay site.
I’m not sure what would happen if you were running 2000 compatibility mode, but the query is working on a 2005 server, I’ve used it a bunch.
Greg:
On items 1 and 11-15, make certain that you caught the funky quotes around the comma separating the elements of the value function. I missed one set and got the same error. Try replacing them even if they LOOK right.
I receive an error when I run this against one of my servers.
XML datatype instance has too many levels of nested nodes. Maximum allowed depth is 128 levels.
Any ideas ?
I’m guessing it’s the formatting. You have to replace the double quotes. I’m very sorry about that.
Unless you’ve got missing index queries that are suggesting more than 128 columns… which is scary, but interesting. Can you see the missing index information in the XML just raw, without my query running?
This is what I ended up doing to get the index definition without opening up Management Studio. I collect the details into a table on the DB being examined for missing indexes, run it every minute, and then run the script below to loop through the table and build up the index columns. It’s not very pretty, but it appears to work; I then do a bunch of string manipulation to get the final definition.
declare @MissingIndexDefinition NVARCHAR(2000)
IF OBJECT_ID (N'tempdb.dbo.MissingIndexDetailsProcess', N'U') IS NOT NULL
DROP TABLE tempdb.dbo.MissingIndexDetailsProcess
Create Table tempdb.dbo.MissingIndexDetailsProcess (
[ID] [int] NOT NULL ,
sql_handle_id varbinary(max),
execution_count int,
Avg_CPU_Time int,
Query_Text nvarchar(max),
query_plan xml NOT NULL,
TableName nvarchar(250),
ColumnGroupUsage nvarchar(250),
ColumnName nvarchar(250),
ProjectedImpact float,
[Processed] [bit] NOT NULL CONSTRAINT [MissingIndexDetailsProcess1] DEFAULT (0),
InsertDate datetime,
Index_Definition nvarchar(max))
IF OBJECT_ID (N'tempdb.dbo.MissingIndexDetails', N'U') IS NOT NULL
DROP TABLE tempdb.dbo.MissingIndexDetails
Create Table tempdb.dbo.MissingIndexDetails (
[ID] [int] IDENTITY (1, 1) NOT NULL ,
sql_handle_id varbinary(max),
execution_count int,
Avg_CPU_Time int,
Query_Text nvarchar(max),
query_plan xml NOT NULL,
TableName nvarchar(250),
ColumnGroupUsage nvarchar(250),
ColumnName nvarchar(250),
ProjectedImpact float,
[Processed] [bit] NOT NULL CONSTRAINT [tempdb.dbo.MissingIndexDetails] DEFAULT (0),
InsertDate datetime,
Index_Definition nvarchar(max))
insert into tempdb.dbo.MissingIndexDetails
(sql_handle_id, execution_count, Avg_CPU_Time, Query_Text, query_plan,
TableName, ColumnGroupUsage, ColumnName, ProjectedImpact, InsertDate)
select sql_handle, execution_count, Avg_CPU_Time, Query_Text,
query_plan, TableName, ColumnGroupUsage, ColumnName, ProjectedImpact, getdate() from DBA_MissingIdexDetails
update tempdb.dbo.MissingIndexDetails set Index_Definition = ''
while (select Count(*)
from tempdb.dbo.MissingIndexDetails where Processed = 0
) > 0
begin
insert into tempdb.dbo.MissingIndexDetailsProcess
select * from tempdb.dbo.MissingIndexDetails
where sql_handle_id in
(select sql_handle_id from tempdb.dbo.MissingIndexDetails
where id in
(select min(id) from
(select min(id) as id, sql_handle_id from tempdb.dbo.MissingIndexDetails
where Processed = 0
group by sql_handle_id) T1))
while (select Count(*)
from tempdb.dbo.MissingIndexDetailsProcess where Processed = 0
) > 0
begin
select @MissingIndexDefinition =
(select case when ColumnGroupUsage = 'EQUALITY' then ' #'+ColumnName
when ColumnGroupUsage = 'INEQUALITY' then ' #'+ColumnName
when ColumnGroupUsage = 'INCLUDE' then ' @'+ColumnName
end
from tempdb.dbo.MissingIndexDetailsProcess
where id in
(select top 1 id from tempdb.dbo.MissingIndexDetailsProcess where Processed = 0
order by id))
update tempdb.dbo.MissingIndexDetailsProcess
set Index_Definition = Index_Definition+@MissingIndexDefinition
where Processed = 0
update tempdb.dbo.MissingIndexDetailsProcess set Processed = 1
where id in (select top 1 id
from tempdb.dbo.MissingIndexDetailsProcess where Processed = 0
order by id)
end
set @MissingIndexDefinition = ''
update tempdb.dbo.MissingIndexDetails set Processed = 1
where sql_handle_id in
(select sql_handle_id from tempdb.dbo.MissingIndexDetails
where id in
(select min(id) from
(select min(id) as id, sql_handle_id from tempdb.dbo.MissingIndexDetails
where Processed = 0
group by sql_handle_id) T1))
end
truncate table tempdb.dbo.MissingIndexDetails
set identity_insert tempdb.dbo.MissingIndexDetails on
insert into tempdb.dbo.MissingIndexDetails
(ID, sql_handle_id, execution_count, Avg_CPU_Time, Query_Text, query_plan, TableName, ColumnGroupUsage,
ColumnName, ProjectedImpact, Processed, InsertDate, Index_Definition)
select * from tempdb.dbo.MissingIndexDetailsProcess
set identity_insert tempdb.dbo.MissingIndexDetails off
These are the functions I ended up with:
Function 1: Optimized for quick but potentially inaccurate database filtering, which is otherwise very slow when applied to a large production environment.
create function dbo.fn_MissingIndexes( @database sysname = null ) returns table
as
return
(
-- Missing index detection
--
with
xmlnamespaces( 'http://schemas.microsoft.com/sqlserver/2004/07/showplan' as sp )
select
p.query_plan
, ig.missingIndexGroup.value( '@Impact', 'decimal(6,4)' ) as [impact]
, i.missingIndex.value ( '@Database', 'nvarchar(256)' ) as [database]
, i.missingIndex.value ( '@Schema', 'nvarchar(256)' ) as [schema]
, i.missingIndex.value ( '@Table', 'nvarchar(256)' ) as [table]
, cast( s.total_elapsed_time as money ) / 1000000 as [total_elapsed_time]
, cast( s.min_elapsed_time as money ) / 1000000 as [min_elapsed_time]
, cast( s.max_elapsed_time as money ) / 1000000 as [max_elapsed_time]
, s.execution_count
, s.last_execution_time
, s.total_logical_writes
, s.total_logical_reads
, stuff
(
(
select
cast( ', ' as varchar(max) )
+ case when cg.indexColumnGroup.value ( '@Usage', 'nvarchar(256)' ) = 'INCLUDE' then 'include: ' else '' end
+ c.indexColumn.value( '@Name', 'nvarchar(256)' )
from
i.missingIndex.nodes( 'sp:ColumnGroup' )
as cg( indexColumnGroup )
cross apply cg.indexColumnGroup.nodes( 'sp:Column' )
as c( indexColumn )
for
xml path( '' )
)
, 1
, 2
, '' ) as 'index_columns'
, st.StmtSimple.value ( '@StatementText', 'nvarchar(max)' ) as [StmtSimple]
, s.sql_handle
from
sys.dm_exec_query_stats as s
cross apply sys.dm_exec_sql_text( s.sql_handle ) as q
cross apply sys.dm_exec_query_plan( s.plan_handle ) as p
cross apply p.query_plan.nodes( '/sp:ShowPlanXML/sp:BatchSequence/sp:Batch/sp:Statements/sp:StmtSimple' )
as st( StmtSimple )
cross apply st.StmtSimple.nodes( 'sp:QueryPlan/sp:MissingIndexes/sp:MissingIndexGroup' )
as ig( missingIndexGroup )
cross apply ig.missingIndexGroup.nodes( 'sp:MissingIndex' )
as i( missingIndex )
where
q.text not like '%"http://schemas.microsoft.com/sqlserver/2004/07/showplan"%' and
-- quick database detection (this is the context of execution, not necessarily the database of the object that needs an index)
--
(
@database is null or
exists( select 1 from sys.dm_exec_plan_attributes( s.plan_handle ) as a where a.attribute = 'dbid' and a.value = DB_ID( @database ) )
)
)
;
go
select top 5 * from master.dbo.fn_MissingIndexes( default )
order by
-- last_execution_time desc
-- execution_count desc, total_elapsed_time desc
total_elapsed_time desc
-- total_logical_writes desc
-- total_logical_reads desc
-- min_elapsed_time desc
option( recompile )
;
select top 5 * from master.dbo.fn_MissingIndexes( 'mydatabase' )
order by
-- last_execution_time desc
-- execution_count desc, total_elapsed_time desc
total_elapsed_time desc
-- total_logical_writes desc
-- total_logical_reads desc
-- min_elapsed_time desc
option( recompile )
;
Function 2: Slow but accurate database filtering.
create function dbo.fn_MissingIndexesExact( @database sysname = null ) returns table
as
return
(
-- Missing index detection
--
with
xmlnamespaces( 'http://schemas.microsoft.com/sqlserver/2004/07/showplan' as sp )
select
p.query_plan
, ig.missingIndexGroup.value( '@Impact', 'decimal(6,4)' ) as [impact]
, i.missingIndex.value ( '@Database', 'nvarchar(256)' ) as [database]
, i.missingIndex.value ( '@Schema', 'nvarchar(256)' ) as [schema]
, i.missingIndex.value ( '@Table', 'nvarchar(256)' ) as [table]
, cast( s.total_elapsed_time as money ) / 1000000 as [total_elapsed_time]
, cast( s.min_elapsed_time as money ) / 1000000 as [min_elapsed_time]
, cast( s.max_elapsed_time as money ) / 1000000 as [max_elapsed_time]
, s.execution_count
, s.last_execution_time
, s.total_logical_writes
, s.total_logical_reads
, stuff
(
(
select
cast( ', ' as varchar(max) )
+ case when cg.indexColumnGroup.value ( '@Usage', 'nvarchar(256)' ) = 'INCLUDE' then 'include: ' else '' end
+ c.indexColumn.value( '@Name', 'nvarchar(256)' )
from
i.missingIndex.nodes( 'sp:ColumnGroup' )
as cg( indexColumnGroup )
cross apply cg.indexColumnGroup.nodes( 'sp:Column' )
as c( indexColumn )
for
xml path( '' )
)
, 1
, 2
, '' ) as 'index_columns'
, st.StmtSimple.value ( '@StatementText', 'nvarchar(max)' ) as [StmtSimple]
, s.sql_handle
from
sys.dm_exec_query_stats as s
cross apply sys.dm_exec_sql_text( s.sql_handle ) as q
cross apply sys.dm_exec_query_plan( s.plan_handle ) as p
cross apply p.query_plan.nodes( '/sp:ShowPlanXML/sp:BatchSequence/sp:Batch/sp:Statements/sp:StmtSimple' )
as st( StmtSimple )
cross apply st.StmtSimple.nodes( 'sp:QueryPlan/sp:MissingIndexes/sp:MissingIndexGroup' )
as ig( missingIndexGroup )
cross apply ig.missingIndexGroup.nodes( 'sp:MissingIndex' )
as i( missingIndex )
where
q.text not like '%"http://schemas.microsoft.com/sqlserver/2004/07/showplan"%' and
-- Affected database based on the actual object in need of an index
--
i.missingIndex.value ( '@Database', 'nvarchar(256)' ) = '[' + @database + ']'
)
;
go
select top 5 * from master.dbo.fn_MissingIndexesExact( default )
order by
-- last_execution_time desc
-- execution_count desc, total_elapsed_time desc
total_elapsed_time desc
-- total_logical_writes desc
-- total_logical_reads desc
-- min_elapsed_time desc
;
select top 5 * from master.dbo.fn_MissingIndexesExact( 'MyDatabase' )
order by
-- last_execution_time desc
-- execution_count desc, total_elapsed_time desc
total_elapsed_time desc
-- total_logical_writes desc
-- total_logical_reads desc
-- min_elapsed_time desc
;
You can see from the way I invoke the functions in my examples that I created them in my master database.
Another thing you can do to speed the searching up is to run, just once:
DBCC FREEPROCCACHE;
This clears your complete query plan cache and all the attached accumulated statistics. Then let your system work as normal for an hour to a day to accumulate new statistics, and then run the queries to see what could be improved.
When you get results, do not blindly add the new indexes the plans suggest. Study the situation carefully, as too many indexes have downsides too. Use the suggestions as input to replan the indexes for an entire table as a whole, so that they serve all the queries running against it.
Interesting stuff. I’ll need to play with it a bit. Thanks for sharing.
One point: I would not recommend running DBCC FREEPROCCACHE on a production system that’s in use, especially a heavily used system. Getting all the procs back into cache can be an expensive operation for some systems.
I agree with your recommendation of not using it on a 24/7, heavily loaded system. It would get hit with a brick if you cleared the cache completely in such a situation. The servers I work with are more tied to office hours; they become quiet before midnight and stay that way till 6am (bar the backup period). These servers have time to get up to speed as more users arrive and start processes performing DB access. It is not like they go from nothing to peak load :).
Grant,
Which is the more meaningful sort, s.total_elapsed_time DESC or ProjectedImpact DESC?
They seem to not correlate.
Thanks,
Doug
Hey Doug,
No, I suspect they wouldn’t correlate. The total_elapsed_time is going to be the actual run time for the queries, and the ProjectedImpact is going to be the optimizer’s best guess at how the query will be affected by including the index it thinks it could use. I’d be surprised if there were a correlation between those two values, because the estimated values from the optimizer are just not to be entirely trusted. That doesn’t make ProjectedImpact a useless measure; it can be quite useful. You’d just need to be sure not to rely on it completely as TRUTH.
I used total_elapsed_time, but average time, number of executions, or any of the other measures would be equally valid.
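For example, a variation on the same stats source sorted by average elapsed time per execution (just a sketch of swapping the sort measure):

```sql
-- Same DMV, but ranked by average cost per execution rather than total.
SELECT TOP 20 s.sql_handle,
       s.execution_count,
       s.total_elapsed_time / s.execution_count AS avg_elapsed_time
FROM sys.dm_exec_query_stats AS s
ORDER BY avg_elapsed_time DESC;
```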