Clustered Indexes Have Statistics Too

SQL Server, T-SQL
It may seem obvious, but I've heard more than one person suggest to me that statistics on a clustered index just don't matter. That if the clustered index can satisfy a given query, it's going to get selected. That just didn't make any sense to me, but I haven't seen anyone set up a test that shows how it might work one way or the other. Here you go. First, I'm going to create a table and load it up with data. I'm intentionally using strings because I don't want to confuse the ease of management of integers within indexes. I also went for one column that would have a very attractive set of statistics and one that would have a very ugly set. Also, because we're only dealing with…

Interviewing a DBA

PASS, SQL Server, T-SQL
I'm not a fan of trivia style interview questions. Yes, I ask a few because you have to in order to immediately eliminate the completely unqualified applicants. Even those types of questions, in my opinion, need to be focused on concepts and not syntax. The reason we have the Books Online with SQL Server is because you shouldn't have to memorize every possible command along with all their parameters. Want to know how to write a MERGE query? Look it up. What does a MERGE query do? That you ought to know. I think concepts are important. Questions about the recovery models within SQL Server aren't trivia about the system, they're trying to get to your understanding of how point in time recovery works. I don't really like posting interview…

SQL Server vs. Oracle

PASS, Redgate Software, SQL Server, T-SQL
Just so we're clear, I use SQL Server. I like SQL Server. But, this doesn't mean I have anything against Oracle. It's fine. It's good. But, I know very little about it. However, throughout my career I've found myself needing to understand it better. Either because I'm trying to train Oracle people to better use SQL Server and I need to be able to speak a little of their language to facilitate translation. Or, because I'm defending SQL Server on some technical point that the Oracle people don't completely understand. Or, because I've said something stupid about Oracle in my ignorance. Now, you know how busy you are, and I know how busy I am, so I doubt either of us has the time we really need to learn Oracle…

24 Hours of PASS, Fall 2012

PASS, SQL Server, T-SQL, Tools
It's time to get your learn on again. The schedule for the Fall 24 Hours of PASS is up and ready for registration. This is the Summit preview session, so many (most, all) of the speakers are showing off some of what you can learn at their sessions at the PASS Summit 2012 itself. It looks like a pretty exciting bunch of topics given by some of the best professionals in the industry. I'll be presenting Three Ways to Identify Slow Running Queries on September 20th, 1400 GMT. This is just a subset of the information that I'll be presenting during my all-day pre-conference seminar, Query Performance Tuning: Start to Finish. In the full seminar I talk about how to measure the performance of your systems, identify which queries are…

Querying Information from the Plan Cache, Simplified

SQL Server, T-SQL
One of the great things about the Dynamic Management Objects (DMOs) that expose the information in the plan cache is that, by their very nature, they can be queried. The plans exposed are in XML format, so you can run XQuery against them to pull out interesting information. For example, what if you wanted to see all the plans in cache that had a Timeout as the reason for early termination from the optimizer? It’d be a great way to see which of your plans were less than reliable. You could do so like this: WITH XMLNAMESPACES(DEFAULT N'http://schemas.microsoft.com/sqlserver/2004/07/showplan'), QueryPlans AS ( SELECT RelOp.pln.value(N'@StatementOptmEarlyAbortReason', N'varchar(50)') AS TerminationReason, RelOp.pln.value(N'@StatementOptmLevel', N'varchar(50)') AS OptimizationLevel, --dest.text, SUBSTRING(dest.text, (deqs.statement_start_offset / 2) + 1, (deqs.statement_end_offset - deqs.statement_start_offset) / 2 + 1) AS StatementText, deqp.query_plan, deqp.dbid, deqs.execution_count, deqs.total_elapsed_time, deqs.total_logical_reads, deqs.total_logical_writes FROM…

Problems with my new book

Professional Development, SQL Server
First off, I apologize. As if writing a book wasn't hard enough, now we get new problems because of on-demand printing. Here's the story. Book. Nine months of writing. Excellent technical editing. Great copy editing. Book complete about six weeks ago. Yay! Now things get fun... Here's how it works. Everyone these days uses digital copies of the book and prints on demand. So Apress is printing some copies of the book, but not all. They send a file out to places like Amazon. Amazon uses that file to print some copies of the book, as needed, on-demand. Everyone is, in theory, printing from the same digital file, creating exactly the same book. Or are they? What happens if, oh, let's just say that a file was corrupted somehow prior to…

Guest Blog

SQL Server
I was given the opportunity to put together a guest blog post for the MVP blog. I did a little something on determining whether or not you have high memory use through the use of a DMO. Check it out.

Never, Ever Use Clustered Indexes

SQL Server
This whole concept of the clustered index as a foundational structure within SQL Server is just plain nuts. Sure, I get the concept that if a table has a clustered index, then that index actually becomes the table. When you create a clustered index on a table, the data is now stored at the leaf level of the Balanced Tree (b-tree) page distribution for that index, and I understand that retrieving the data using a seek on that index is going to be extremely fast because no additional reads are necessary, unlike what would happen with a non-clustered index on a heap table. Yes, I get that if I store my data in a heap, the only way to access the data is through the Index Allocation Map (IAM) pages that…

Which SELECT * Is Better?

SQL Server, T-SQL
The short answer is, of course, none of them, but testing is the only way to be sure. I was asked, what happens when you run ‘SELECT *’ against a clustered index, a non-clustered index, and a columnstore index. The answer is somewhat dependent on whether or not you have a WHERE clause and whether or not the indexes are selective (well, the clustered & non-clustered indexes, columnstore is a little different). Let’s start with the simplest: SELECT * FROM Production.ProductListPriceHistory AS plph; This query results in a clustered index scan and 5 logical reads. To do the same thing with a non-clustered index… well, we’ll have to cheat and it’ll look silly, but let’s be fair. Here’s my new index: CREATE NONCLUSTERED INDEX TestIndex ON Production.ProductListPriceHistory (ProductID, StartDate, EndDate, ListPrice, ModifiedDate); When I…

Execution Plan for a User Defined Function

SQL Server, T-SQL
When you execute a multi-statement user-defined function you may see an execution plan that looks something like this: It appears as if the cost of the UDF is free. This is especially true if you use the UDF in a query with other objects, such as joining it to actual tables. Since the optimizer always assumes a multi-statement UDF has a single row for statistics estimates, it frequently displays a low cost. But you know that there’s more going on there, right? It’s a multi-statement UDF because it’s doing a lot of work, but that is not reflected in the execution plan... or is it? What if we went after the cache? Let’s run this little query: SELECT deqp.query_plan, dest.text, SUBSTRING(dest.text, (deqs.statement_start_offset / 2) + 1, (deqs.statement_end_offset - deqs.statement_start_offset) /…