The other day a developer showed up at my desk. They were getting time-outs in production on a query that didn’t normally give them trouble. I ran the query with the parameters they provided. It ran for over 30 seconds, past the application-side timeout, before it returned its data. So I ran it again with an execution plan. The plan had a bunch of index scans with loop joins across thousands of rows, and it even created a table spool with 700 million rows along the way. Clearly not good.
Next I looked at the query itself. It wasn’t too bad, as these things go. It was probably returning too many columns, and apparently the business wanted a pivot on the data, since the query was using an aggregate to pivot some of the columns. The only thing that initially jumped out at me was the use of MAX(1) instead of TOP 1 for some of the versioned queries (look for an article in the next, and last, SQL Standard magazine on the topic of MAX vs. TOP).
I let the developer know that it was going to take a while to clean everything up, and I got to work. A couple of the other DBAs saw what I was looking at and came over to offer suggestions (that sounds worse than it is; these guys are smart and helpful). There was a lot of discussion about the table spool. I said that changing the MAX queries to use TOP would fix it. There was a bit of argument, but the proof is always in the pudding, so I made the change (in a non-production environment, of course) and, sure enough, the table spool went away. We still had a bunch of index scans, and the loop joins looked funny since 15,000 rows were being filtered against 15,000 rows. One of the DBAs with me, Scott, suggested that the statistics were out of date. So I updated the stats and reran the query. It didn’t make a bit of difference. Another DBA, Det, suggested updating the statistics with a FULL SCAN. Sure enough, not only did the index scans go away, but the loop joins were replaced with HASH joins that worked much better with the data we were returning. A rough sketch of the MAX-to-TOP rewrite follows.
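As an illustration of the kind of rewrite involved, here’s a minimal sketch against a made-up versioned table (the table and column names are hypothetical, not the actual production query). The MAX-based pattern forces an aggregate across all the versions of a row, while the TOP 1 pattern with an ORDER BY lets the optimizer satisfy the subquery with a single seek.

-- Hypothetical versioned table, purely for illustration.
-- Original pattern: aggregate to find the latest version.
SELECT d.DocumentId,
       (SELECT MAX(v.VersionNumber)
        FROM dbo.DocumentVersion AS v
        WHERE v.DocumentId = d.DocumentId) AS LatestVersion
FROM dbo.Document AS d;

-- Rewritten pattern: TOP (1) with an ORDER BY, which can use an index seek
-- on (DocumentId, VersionNumber) instead of aggregating the whole set.
SELECT d.DocumentId,
       (SELECT TOP (1) v.VersionNumber
        FROM dbo.DocumentVersion AS v
        WHERE v.DocumentId = d.DocumentId
        ORDER BY v.VersionNumber DESC) AS LatestVersion
FROM dbo.Document AS d;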
A little more investigation found that 5,000 rows had been added to the table for the organization we were working with, but somehow the random sampling of rows used by the normal statistics update wasn’t reflecting this change accurately. It needed the FULL SCAN. So, while scheduling the full scan is easy enough, we wanted to know how to tell whether it was really needed. Determining the age of the statistics is easy:
SELECT ind.name
    ,STATS_DATE(ind.id, ind.indid)
FROM sysindexes AS ind
WHERE OBJECT_NAME(ind.id) = 'tablename'
But figuring out whether the FULL SCAN is needed is still a bit elusive. Det’s suggestion was to query the statistics, run a FULL SCAN update, and then requery the statistics, comparing the two. While that might show that the FULL SCAN you just did was needed, I’m still trying to find a more efficient mechanism.
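For what it’s worth, one way to do that before-and-after comparison is with DBCC SHOW_STATISTICS, which reports both the row count the statistic thinks the table has and the number of rows that were sampled when it was last built. This is just a sketch against placeholder table and statistic names, and it’s still the manual process Det described, not the more efficient test I’m after.

-- 'dbo.tablename' and 'IX_tablename' are placeholders for the real table and statistic.
-- Capture the statistics header before the update.
DBCC SHOW_STATISTICS ('dbo.tablename', 'IX_tablename') WITH STAT_HEADER;

UPDATE STATISTICS dbo.tablename IX_tablename WITH FULLSCAN;

-- Capture it again afterwards and compare.
DBCC SHOW_STATISTICS ('dbo.tablename', 'IX_tablename') WITH STAT_HEADER;
-- A large gap between Rows and Rows Sampled in the first header, or a Rows
-- value well off the table's actual count, suggests the sampled statistics
-- weren't keeping up with the data.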
Hi,
Would you possibly tell me what exactly you mean by Full Scan here? What is it useful for and how did you do that?
My initial guess is that you executed a simple query to return all rows and columns of the table to perform a full scan.
Thank you
FULL SCAN in this instance is not referring to a table scan or an index scan in the execution plan. Instead, I’m talking about updating the statistics using the FULLSCAN option. Normally, when you run sp_updatestats or any other statistics update command, you get a random sampling of the data defined by the statistic, usually an index. By setting the FULLSCAN option, you get a complete scan of all the data defined by the statistic, which can give you more accurate numbers. You just don’t want to run that all the time, because it’s very CPU and memory intensive. Most statistics get maintained just fine by the automatic processes, and most of the rest get maintained just fine by the sampling in sp_updatestats. Some need the FULL SCAN.
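To make that concrete, here is roughly what the command looks like; the table and statistic names below are placeholders, not the ones from the production system.

-- Rebuild every statistic on the table from a full read of the data.
UPDATE STATISTICS dbo.tablename WITH FULLSCAN;

-- Or target a single statistic (typically the one behind a specific index).
UPDATE STATISTICS dbo.tablename IX_tablename WITH FULLSCAN;

-- By contrast, sp_updatestats and the automatic updates rely on sampling,
-- which is much cheaper but can miss skew introduced by recent inserts.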