I schedule many of my blog posts at least 2-3 weeks out, so I have time to adjust them, change, them, insert new things into the schedule, what have you. So, as you read this, I’ll have written it at least two weeks ago. As I write this, I’m recovering from a very mild, but irritating, cold. I needed to write a blog post, but my stuffed head wasn’t coming up with ideas. So, I half-jokingly asked Twitter for any execution plan questions so I could write something. First response, also a joke, was the question at the title of this post:
What is the preferred operator when joining tables: Hash Match, Nested Loops or Merge?
While my immediate response to this question is, yes. Meaning, they’re all preferred, situationally. I decided to expand on that a bit.
What is the Preferred Join Operator
I worked with a consultant once, incredibly smart and capable person from whom I learned a ton, that tuned a query by using the hint, FAST 1. It just happened that this particular query ran faster when it used Loops joins, but the optimizer kept picking Hash joins. The FAST 1 hint worked. The query was faster. The consultant saved the day.
The lesson learned by the development team he was supporting was that Nested Loops joins were always faster and that FAST 1 ensured that you got all Nested Loops joins. Therefore, adding FAST 1 to every single query in the application was the key to performance.
Yeah, it took a long time to get that fixed.
The problem is, the answer to the question, which is the preferred join operator can only be: it depends. It depends on the amount of data being returned. It depends on the existence of indexes to support the join operation. It depends on the T-SQL. In fact, it depends on a lot of things. The one absolute statement I can make is that you can not look at an execution plan, see any of the join operators, and based simply on the existence of that operator declare that there is a problem and you need to implement FAST 1 (or whatever your preferred query hint that fixes everything is).
Let’s be clear. There is no preferred join operator.
For small data sets (and that word “small” is hard to define), Nested Loops works really well. However, if you have ordered data, Merge is probably the most efficient join operation, regardless of data size (but note the caveat at the front, ordered data). Hash joins cover a multitude of sins and give us efficient join mechanisms when dealing with larger data sets and/or poor or missing indexes. Further, there are caveats and exceptions to each of the general descriptions I’ve just given.
Oh, and, adaptive joins now play a factor, so read up on them.
Fine, but Which Operator
Everyone wants a really simple check list here. There is not one. It’s extremely situational. So, the best thing I can tell you is, follow the general, and vague, guidelines above. After that, it’s about drilling down into the query, the data structure, the statistics, and the data to understand if a given join operator is right in the exact situation you’re dealing with. The information you’ll need to do that is within the properties of the execution plan. There is no preferred join operator.
For more on execution plans and a whole bunch of other stuff, please join one of my upcoming, all-day seminars on the tools of query tuning:
For SQLSaturday Indianapolis (and my first time speaking in Indiana) on August 10, 2018. Please go here to sign up.
I’ll be at SQLSaturday Sioux Falls (and my first time speaking in South Dakota) on August 18, 2018. Please sign up here.
For SQLSaturday Oslo on August 31, 2018. Click here right now to register.
I’ll be at my local event too, SQLSaturday Boston, on September 21st. You can go here to register.
[…] Grant Fritchey dodges the important questions: […]