There’s power in naming things. Supposedly some types of magic are even based on knowing the correct names for things. The name for the T-SQL clause Common Table Expression (CTE) is actually pretty accurate. It’s an expression that looks like a table and can be used in common across the entire query (at least I think that’s what the common part refers to). But note, I didn’t say it was a table. It’s not. It’s an expression. If you look at the T-SQL definition at the link, it refers to a “temporary” result set. Now, to a lot of people, that means table. But it isn’t. Let’s look at this in more detail.
Here’s a query that defines a simple CTE and then uses it to query the date in the next T-SQL statement:
WITH x AS (SELECT soh.OrderDate, soh.SalesOrderID, sod.LineTotal, sod.ProductID FROM Sales.SalesOrderHeader AS soh JOIN Sales.SalesOrderDetail AS sod ON sod.SalesOrderID = soh.SalesOrderID WHERE soh.SalesOrderID BETWEEN 43683 AND 43883 ) SELECT x.OrderDate, x.LineTotal, p.Name FROM x JOIN Production.Product AS p ON p.ProductID = x.ProductID;
If we run this, we get the following execution plan:
Let’s not worry about the plan for the moment. Instead, I want to look at a couple of more queries:
--Derived Table SELECT x.OrderDate, x.LineTotal, p.Name FROM (SELECT soh.OrderDate, soh.SalesOrderID, sod.LineTotal, sod.ProductID FROM Sales.SalesOrderHeader AS soh JOIN Sales.SalesOrderDetail AS sod ON sod.SalesOrderID = soh.SalesOrderID WHERE soh.SalesOrderID BETWEEN 43683 AND 43883 ) AS x JOIN Production.Product AS p ON p.ProductID = x.ProductID; --Just a JOIN SELECT soh.OrderDate, sod.LineTotal, p.Name FROM Sales.SalesOrderHeader AS soh JOIN Sales.SalesOrderDetail AS sod ON sod.SalesOrderID = soh.SalesOrderID JOIN Production.Product AS p ON p.ProductID = sod.ProductID WHERE soh.SalesOrderID BETWEEN 43683 AND 43883;
Three, nominally, different queries. Yet, if you run these queries, all of them return exactly the same data and all of them have exactly the same execution plan. That’s right, the CTE didn’t change the way that SQL Server derived these queries in any way. Nor would it. That’s because, the CTE is absolutely not a table. It’s an expression.
That name appears to hold magic. We see the word table and we think “Tables, I know all about them” and we proceed to start treating our CTEs as if they were tables, but they’re not. And Microsoft’s definition is completely accurate, if maybe just a touch misleading. These are not tables we’re dealing with when we work with Common Table Expressions.
Let’s look at the difference:
--Table Valued Parameter DECLARE @TVP AS TABLE (OrderDate DATETIME, SalesOrderID INT, LineTotal NUMERIC(38,6), ProductID INT, INDEX tt_ci CLUSTERED (SalesOrderID), INDEX tt_i NONCLUSTERED(ProductID)); INSERT @TVP (OrderDate, SalesOrderID, LineTotal, ProductID ) SELECT soh.OrderDate, soh.SalesOrderID, sod.LineTotal, sod.ProductID FROM Sales.SalesOrderHeader AS soh JOIN Sales.SalesOrderDetail AS sod ON sod.SalesOrderID = soh.SalesOrderID WHERE soh.SalesOrderID BETWEEN 43683 AND 43883; SELECT x.OrderDate, x.LineTotal, p.Name FROM @TVP AS x JOIN Production.Product AS p ON p.ProductID = x.ProductID; --Temporary Table CREATE TABLE #TT (OrderDate DATETIME, SalesOrderID INT, LineTotal NUMERIC(38,6), ProductID INT, INDEX tt_ci CLUSTERED (SalesOrderID), INDEX tt_i NONCLUSTERED (ProductID)); INSERT #TT (OrderDate, SalesOrderID, LineTotal, ProductID ) SELECT soh.OrderDate, soh.SalesOrderID, sod.LineTotal, sod.ProductID FROM Sales.SalesOrderHeader AS soh JOIN Sales.SalesOrderDetail AS sod ON sod.SalesOrderID = soh.SalesOrderID WHERE soh.SalesOrderID BETWEEN 43683 AND 43883; SELECT x.OrderDate, x.LineTotal, p.Name FROM #TT AS x JOIN Production.Product AS p ON p.ProductID = x.ProductID;
These two queries are using table objects, a table valued parameter and a temporary table. The execution plans are decidedly different from the one above and each other:
Interesting enough it chose to do a Merge instead of the Nested Loops and ordered the table scan to accomplish it. Yet, that query runs faster and has less reads than all the others… As long as you don’t count the time to load the temporary table. Then it just doesn’t work as well as the others, at all. Speaking of execution times, the first three queries all had exactly 27 reads and, within some variations, ran in about 30ms on my system. Anyway, query tuning is not the point of the discussion.
You can see the differences in the execution plans that you get between an expression, a CTE, and actual temporary storage, either table variables or temporary tables. Yet, you can also see that the CTE, an expression, just an expression, was resolved in the exact same way as the derived table and the simple JOIN. In short, a common table expression is not a table. Absolutely not. It’s an expression.
On a side-note, if you’re thinking of attending SQL Rally Nordic 2015 in Copenhagen and you want to spend some time talking about query tuning, I’m doing a full day pre-conference seminar. Click here now to register.
You should note that if you join a CTE to itself, it will execute the expression again. I learned that recently and… regretted some past choices.
The execution plan for this will show two scans.
WITH c1
AS ( SELECT [ProductID] ,
[Name] ,
[ProductNumber]
FROM [Production].[Product]
)
SELECT ca.ProductID ,
ca.Name ,
ca.ProductNumber
FROM c1 ca
JOIN c1 cb ON cb.ProductID = ca.ProductID
Oh yeah, absolutely. I wasn’t getting into what happens with recursion and multiple references. But, it’s still not a table. It’s just two queries instead of just one. Fine if that’s what you need. Horrible if you thought you were joining two tables together.
Exactly, and I think that’s a good demonstration of why they’re not tables. You don’t have to populate a temp table twice to join to it, nor can you say that three times fast with any measure of comfort.
I’ve always liked to think of CTE’s as an in-line view. I haven’t compared the execution plans though.
A view is just a query so…
[…] Common Table Expressions Are Not Tables – Grant Fritchey (Blog|Twitter) […]
It is worth a mention that in other SQL products, a CTE might be materialized if the optimizer sees that it is used more than once or would have some other advantage.
Thanks Joe. I don’t think I’ve seen that one occur yet, but I’ve read about it.
> A view is just a query so…
Yeah, but the metadata of the view is explicitly stored in the database (sorry sorry, I know being pedantic is frowned upon 😉
What I like (or find curious) is that you can “update” a CTE. Something like:
WITH CTE_BaseTable AS
(SELECT ProductName, Color FROM dbo.Products)
UPDATE CTE_BaseTable
SET Color = ‘Red’
WHERE ProductName = ‘Gate’;
The timeliness of this post on sqlservercentral.com today is amazing. Yesterday I was thinking that there should be an enhancement to ANSI SQL language that allows the SQL developer to specify when to materialize/cache a CTE. It would look something like:
WITH cte_name (col_list) AS CACHED (SELECT col_list FROM table_name)
etc.
This would basically tell the plan generator, “do this SELECT first, put it in a temp table, use the result set in the rest of the query, and then destroy the temp table.” No muss, no fuss. Only caveat is that I am giving up the ability to declare a PK or create an index, but maybe that is another enhancement.
@Curt
Surely you just mean temporary tables but with MORE typing and no possibility of using the same cached data in more than one query?
The only benefit I can see is that it would be all wrapped up in a single query.
SELECT col_list INTO #cte_name FROM table_name
(Add index, PK etc…)
etc…
etc…
etc…
DROP TABLE #cte_name
@Sam
If you use a CTE then by definition you don’t want re-use. Part of the appeal of a CTE is limited scope and lack of persistence (e.g. no DROP TABLE to worry about). When it is more appropriate to use a temp table then use a temp table.
@Curt… you don’t need to use DROP TABLE on temp tables in a stored procedure or even in a session. The temp tables automatically drop when the session ends or the “lesser scope” ends. The meta data is automatically preserved a bit in case the stored procedure is soon used again to improve performance, where a CTE will not, even if you trick it into “pre-materializing” with a “blocking operator”, which is another performance trick that can be used in some cases.