Most people are well aware that having a foreign key constraint in place on your tables adds overhead to inserts and deletes. And many people believe that there is no benefit to foreign keys beyond referential integrity (which the application can handle perfectly well, right?). But is that an accurate statement? Here’s the basis for our investigation, a query in AdventureWorks2008R2:
SELECT p.LastName + ', ' + p.FirstName AS 'PersonName'
FROM Person.Address AS a
JOIN Person.BusinessEntityAddress AS bea
    ON a.AddressID = bea.AddressID
JOIN Person.BusinessEntity AS be
    ON bea.BusinessEntityID = be.BusinessEntityID
JOIN Person.Person AS p
    ON be.BusinessEntityID = p.BusinessEntityID;
This query results in the following execution plan:
I know that is an ugly query and an ugly query plan, but bear with me for a moment. Do you notice anything about the query and the plan at this point? Count the tables and the access operators. That’s four tables and two access operators. Despite the fact that I listed four tables in the query, the optimizer was smart enough to figure out that it only needed to pull data from two of them and completely ignored the others, because the key values in the Person.Person table and the Person.BusinessEntityAddress table were the same. It didn’t even bother with the Address table, since nothing from that table appears in the WHERE or SELECT, and it skipped the BusinessEntity table because the foreign key relationships ensure that the data in the other tables can be trusted.
Don’t believe me? For those trying this at home, run this script in AdventureWorks2008R2:
SELECT * INTO dbo.MyAddress FROM Person.Address;
SELECT * INTO dbo.MyBusinessEntityAddress FROM Person.BusinessEntityAddress;
SELECT * INTO dbo.MyBusinessEntity FROM Person.BusinessEntity;
SELECT * INTO dbo.MyPerson FROM Person.Person;
Then modify the original query so that it looks like this:
SELECT p.LastName + ', ' + p.FirstName AS 'PersonName'
FROM dbo.MyAddress AS a
JOIN dbo.MyBusinessEntityAddress AS bea
    ON a.AddressID = bea.AddressID
JOIN dbo.MyBusinessEntity AS be
    ON bea.BusinessEntityID = be.BusinessEntityID
JOIN dbo.MyPerson AS p
    ON be.BusinessEntityID = p.BusinessEntityID;
This query will produce the following execution plan:
That is the exact same query running against exact copies of the original tables, yet now, instead of two access operators and one join operation, we’re looking at four operators and three join operations. That, my friends, is the optimizer taking advantage of the foreign keys: they establish trust in the data, which enables the optimizer to eliminate unnecessary tables from the plan. Not only do the execution plans differ, but the second query was consistently about 1/6 slower, as it did all kinds of reads against the two tables that were eliminated in the original query plan.
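If you’d like to verify the table elimination yourself rather than eyeball the graphical plans, one option is to capture the estimated plan as XML and see which tables actually appear as data-access operators. This is just a sketch of the approach, using nothing beyond standard T-SQL:

```sql
-- Capture the estimated plan as XML instead of executing the query.
SET SHOWPLAN_XML ON;
GO
SELECT p.LastName + ', ' + p.FirstName AS 'PersonName'
FROM Person.Address AS a
JOIN Person.BusinessEntityAddress AS bea
    ON a.AddressID = bea.AddressID
JOIN Person.BusinessEntity AS be
    ON bea.BusinessEntityID = be.BusinessEntityID
JOIN Person.Person AS p
    ON be.BusinessEntityID = p.BusinessEntityID;
GO
SET SHOWPLAN_XML OFF;
GO
```

In the resulting plan XML, only Person.Person and Person.BusinessEntityAddress should show up in data-access operators; Address and BusinessEntity never appear, because the trusted foreign keys let the optimizer eliminate them.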
I know some of you are saying, “Well, we can just put indexes on the tables to fix that problem, we still don’t need constraints.” OK. Let’s check it out. Here’s a script to put indexes in place, which will surely fix the heinous execution plan above:
ALTER TABLE dbo.MyAddress
ADD CONSTRAINT PK_MyAddress_AddressID
    PRIMARY KEY CLUSTERED (AddressID ASC);
GO

CREATE NONCLUSTERED INDEX IX_MyBusinessEntityAddress_AddressID
ON dbo.MyBusinessEntityAddress (AddressID ASC);
GO

ALTER TABLE dbo.MyBusinessEntityAddress
ADD CONSTRAINT PK_MyBusinessEntityAddress_BusinessEntityID_AddressID_AddressTypeID
    PRIMARY KEY CLUSTERED (BusinessEntityID ASC, AddressID ASC, AddressTypeID ASC);
GO

ALTER TABLE dbo.MyBusinessEntity
ADD CONSTRAINT PK_MyBusinessEntity_BusinessEntityID
    PRIMARY KEY CLUSTERED (BusinessEntityID ASC);
GO

USE [AdventureWorks2008R2]
GO

ALTER TABLE dbo.MyPerson
ADD CONSTRAINT PK_Person_BusinessEntityID
    PRIMARY KEY CLUSTERED (BusinessEntityID ASC);
After building out the indexes, we get this execution plan:
Oops. Except for the fact that we have Clustered Index Scans instead of Table Scans for most operators, the only real change is the addition of the Index Scan against the new index on the MyBusinessEntityAddress table. In short, no improvement at all.
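For completeness, the piece that actually restores the original behavior on the copies is the constraints, not the indexes. Here’s a sketch of adding trusted foreign keys to the copied tables; the constraint names are my own invention, and the primary keys from the script above must exist first:

```sql
-- Add trusted (WITH CHECK) foreign keys to the copies.
-- Constraint names are made up for this demo.
ALTER TABLE dbo.MyBusinessEntityAddress WITH CHECK
ADD CONSTRAINT FK_MyBEA_MyAddress
    FOREIGN KEY (AddressID) REFERENCES dbo.MyAddress (AddressID);

ALTER TABLE dbo.MyBusinessEntityAddress WITH CHECK
ADD CONSTRAINT FK_MyBEA_MyBusinessEntity
    FOREIGN KEY (BusinessEntityID) REFERENCES dbo.MyBusinessEntity (BusinessEntityID);

ALTER TABLE dbo.MyPerson WITH CHECK
ADD CONSTRAINT FK_MyPerson_MyBusinessEntity
    FOREIGN KEY (BusinessEntityID) REFERENCES dbo.MyBusinessEntity (BusinessEntityID);
```

Rerun the query against the copies after this and the plan should collapse back down to the two-table shape, since the constraints are created as trusted.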
OK, you might be thinking to yourself, what if this were a proper query with a WHERE clause? Let’s modify the queries:
SELECT p.LastName + ', ' + p.FirstName AS 'PersonName'
FROM Person.Address AS a
JOIN Person.BusinessEntityAddress AS bea
    ON a.AddressID = bea.AddressID
JOIN Person.BusinessEntity AS be
    ON bea.BusinessEntityID = be.BusinessEntityID
JOIN Person.Person AS p
    ON be.BusinessEntityID = p.BusinessEntityID
WHERE p.LastName LIKE 'Ran%';

SELECT p.LastName + ', ' + p.FirstName AS 'PersonName'
FROM dbo.MyAddress AS a
JOIN dbo.MyBusinessEntityAddress AS bea
    ON a.AddressID = bea.AddressID
JOIN dbo.MyBusinessEntity AS be
    ON bea.BusinessEntityID = be.BusinessEntityID
JOIN dbo.MyPerson AS p
    ON be.BusinessEntityID = p.BusinessEntityID
WHERE p.LastName LIKE 'Ran%';
The first query resulted in this execution plan:
Now clearly, more tuning is probably possible here, since we’re still looking at an Index Scan, but the point is not whether or not the query is tuned. The point is that the optimizer can eliminate tables because of the trust created by the foreign key constraints. And what did the other query produce?
Okay… The less said the better. Once again, tuning opportunities do exist, but we’re still accessing tables where it is absolutely not necessary, as demonstrated by the original query and its execution plan. The data returned has been the same, each and every time.
Back to the question: do foreign key constraints help performance? Let me ask you one in return. Have you ever looked to see which happens more in your OLTP system, reads or writes? If you haven’t, take a look. I’ll bet the answer surprises you. My answer to the question: yes, it can. Not that it will, but that it can.
Nice!!
Awesome post! I wonder how the OLAP folks will take it — though that resistance is because of ETL.
Thanks Grant, the more articles like this the better, maybe the truth will finally sink in! One former employer had a policy of using foreign keys on development but then remove them on production “because they just slow the system down and the application guarantees integrity.” Needless to say, read activity hugely dominated write activity, plus a good chunk of my DBA time was spent cleaning up bad records inserted by the application.
Excellent post.
Nice post, Grant! Thanks for that. 🙂
Thanks everyone. Struck a chord with this one evidently. But, it’s true, even very high write OLTP systems are usually more read than write, so stuff like this can, and will, matter.
Excellent post Grant. When I grow up, I want to write blogs posts like this.
Brian, don’t wait until you grow up. If I had to wait that long I still wouldn’t be blogging (since I still haven’t quite grown up). And thanks.
Thanks Pat.
Great post Grant. Really helpful!
Excellent post! thanks for the clear demonstration of what could be a confusing topic.
Pure SQL #awesomesauce Grant – Thank you for clarifying this in plain english!
Thanks Wendy, Meredith.
Great post Grant. First off let me say I completely agree with you about having FKs. That being said, it made me think a little bit about a system that does have really high volumes of writes. What are your thoughts about creating a plan guide with FKs in place, removing them, and using the plan guide to get the performance gain? I know it could be a maintenance nightmare. I better go shower after that last thought…
Hey Dave,
That’s actually not a bad idea. I’m not sure that it would work, and you’re still possibly going to run into integrity issues, but that could work… although… if I recall correctly, the plan guide can’t force the optimizer to do things it can’t do, like scan a table that’s not part of the query, that sort of thing… without the FK… I suppose I have another blog post now. Thanks.
Once again you’ve proven that taking general guidelines into account pays off.
The engine really takes these things into account and uses it to its – and finally your – advantage.
“Tell your system what you know. e.g. PK, AK, FK”
Thanks.
Thanks Johan. Glad you liked it.
[…] Do Foreign Key Constraints Help Performance? – Hmmm sounds like a trick question but is it? SQL Server MVP Grant Fritchey discusses. […]
Great Post!
Maybe the obvious is too obvious to get noticed, but people need to keep in mind that constraints declared using “with nocheck” will only serve a documentation purpose. SQL Server knows that “with nocheck” constraints are not to be trusted, so it will not be able to optimize as shown in this excellent article.
It takes time, but eventually we will get there 😉
Great article!
As ALZDBA states, you need to make sure your foreign keys are ‘trusted’ – check the is_not_trusted column in sys.foreign_keys – if this is set, the execution plan will take the same route as the non-FK query.
To set them simply ALTER TABLE … WITH CHECK CHECK CONSTRAINT …
One final note: trusted is not the same as enabled – the FK can be enabled, but still untrusted – all that means is that SQL Server hasn’t verified it.
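Putting those points together, here’s a quick sketch for spotting untrusted foreign keys and re-validating them; the table and constraint names in the second statement are placeholders:

```sql
-- List foreign keys the optimizer will not use for simplification.
SELECT name, is_disabled, is_not_trusted
FROM sys.foreign_keys
WHERE is_not_trusted = 1;

-- Re-validate a constraint so it becomes trusted again.
-- (Table and constraint names below are placeholders.)
ALTER TABLE dbo.SomeTable
WITH CHECK CHECK CONSTRAINT FK_SomeTable_SomeParent;
```

The first CHECK (in WITH CHECK) tells SQL Server to validate the existing rows; the second (CHECK CONSTRAINT) enables the constraint. Both are needed to flip is_not_trusted back to 0.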
Another great post on the trustworthy nature of FKs by the Datachix at http://datachix.com/2010/09/09/can-you-trust-your-foreign-key-or-feature-821-i-didnt-already-know-about-sql-server/
This is something really simple that can make a big difference, great explanation as well.
[…] I ran across an old post from Grant Fritchey that shows Foreign Keys do more than that. They can actually help performance because the SQL Server database engine knows that there is data in the related tables that matches […]
Grant,
Its great post.
Grant, you always “GRANT” (SQL GRANT) something to folks.
That’s why your parents named you Grant.
Thanks! That’s a good one.
Hi,
Understood well about using foreign key , thanks a lot for the article..
And to be realistic, when in the history of relational databases has an application ever succeeded in enforcing data integrity better than, or even as well as, the database could do it? I’ve used hundreds (maybe even thousands, when you consider data imports where I see the data but not the original database) of different databases, and I have never yet seen one that had data integrity if the FK relationship was not in place in the database. It is too easy to get around the application. And putting checks only in the application doesn’t account for other data-affecting activities that will not always go through the data layer, even if you think they will.
Grant, curious to know: if you add the predicate “bea.AddressID <= 1000” to your queries, why does the optimizer no longer eliminate the Person.Address table from the execution plan? I would’ve thought that the trust created by the foreign key would still hold in this case, but it seems not.
Thanks for this article Grant. But I have one question: you said that after building indexes on the other set of tables (dbo.MyAddress, dbo.MyBusinessEntityAddress, dbo.MyBusinessEntity and dbo.MyPerson), you still got an execution plan containing tables that weren’t really required. But I observed that when you create unique clustered indexes or primary keys, even excluding the foreign keys, you still get the perfect execution plan with only the minimal (desired) set of tables. Could you please recheck?
Simply awesome, good explanation! Never feed wrong inputs to the optimizer, or your query will hang!
[…] within the parent table. This allows for better optimization of joins, even elimination of joins (see an example of this here). But, that’s not the same as what the optimizer does with indexes. Again, this is where the […]
Thanks for your post,
I have a doubt in your post,
1.) Why should you join all four tables when you are selecting columns from the Person table alone?
SELECT p.LastName + ', ' + p.FirstName AS 'PersonName'
FROM dbo.MyAddress AS a
JOIN dbo.MyBusinessEntityAddress AS bea
    ON a.AddressID = bea.AddressID
JOIN dbo.MyBusinessEntity AS be
    ON bea.BusinessEntityID = be.BusinessEntityID
JOIN dbo.MyPerson AS p
    ON be.BusinessEntityID = p.BusinessEntityID
WHERE p.LastName LIKE 'Ran%'
2.) In the last execution plan it skipped only the Address table. Can you please explain why it was skipped?
Please explain.
Thanks
V.Jeyaraman
1) It’s pretty common to see tables included in queries even when they’re not needed. Imagine, for example, that this query was within a view. As long as the foreign keys are enforced using the WITH CHECK option, the optimizer can do something about them. But as soon as it’s not, as demonstrated, there’s nothing the optimizer can do except deal with the tables presented, whether they are needed for a particular query or not.
2) It’s part of the simplification step, it removed tables that it determined were not needed.
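To make the view example from point 1 concrete, here’s a sketch (the view name is hypothetical) of the situation where that simplification step pays off: the view joins all four tables, but a query that only wants the name columns lets the optimizer strip the rest out, provided the foreign keys are trusted.

```sql
-- A hypothetical view joining all four tables.
CREATE VIEW dbo.vPersonAddress
AS
SELECT p.LastName,
       p.FirstName,
       a.City,
       a.PostalCode
FROM Person.Address AS a
JOIN Person.BusinessEntityAddress AS bea
    ON a.AddressID = bea.AddressID
JOIN Person.BusinessEntity AS be
    ON bea.BusinessEntityID = be.BusinessEntityID
JOIN Person.Person AS p
    ON be.BusinessEntityID = p.BusinessEntityID;
GO
-- Only the name columns are requested, so with trusted foreign keys
-- the optimizer can eliminate Address and BusinessEntity from the plan.
SELECT LastName, FirstName
FROM dbo.vPersonAddress;
```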
Thanks for your reply..
You have written “WITH CHECK option”. What does the WITH CHECK option mean? Please explain.
Thanks
V.Jeyaraman
Here, check out the ALTER TABLE command http://technet.microsoft.com/en-us/library/ms190273(v=sql.105).aspx. It shows WITH CHECK and WITH NOCHECK and explains them.
Thank you, Mr. Grant Fritchey,
[…] are in place and enforced the optimizer can take advantage of them to get you better performance. Here’s a blog post with a simple example that shows it in […]
[…] posted previously about how a foreign key constraint can change how a SELECT query behaves. Logically that just makes sense. But other types of constraints don’t affect execution plans […]
–“I wonder how the OLAP folks will take it?”
I’m an OLAP person and all this is great by me. I do not understand the absence of FKs beyond the initial staging tables.
“the optimizer was smart enough to figure out that it only needed to pull data from two of the tables and completely ignored the others because, the key values in the Person.Person table and the Person.BusinessEntityAddress table were the same. ”
So basically you wrote a bad query joining tables you didn’t need to join and the optimizer was smart enough to fix it for you.
In the case where FK’s are removed and integrity enforced via the APP the SQL developer can make the same assumption the optimizer made and simply join the two tables that are required. Resulting in a query plan that is exactly the same not to mention much easier to read.
But to answer your question. Can FK’s help performance? Sure, if you write bad queries.
[…] created this example several years ago that illustrates how foreign key constraints can help performance. It’s a […]
For those focused on the simplification aspect of this, I wrote another blog post that shows additional performance benefits for trusted foreign keys: https://www.scarydba.com/2015/09/09/yes-foreign-keys-help-performance/
[…] RI, it actually feeds this knowledge into the database optimizer. Meaning, certain queries can be more efficient when RI is in place. So, while you will take a performance degradation on “writes”, […]
Note that this only applies to a specific use case: where your query includes joins whose conditions are precisely equivalent to what the FK already guarantees, and thus don’t impact the final result. The FK constraints allow the optimizer to drop the joins against those unneeded tables. But if the tables *did* influence the result, the FK wouldn’t help you.
[…] https://www.scarydba.com/2010/11/22/do-foreign-key-constraints-help-performance/ […]