I was feeling quite confident about my new-found abilities with spatial indexes, so I did a presentation for my team to share what I had learned. I had also been sharing with one co-worker as I developed my knowledge of spatial indexes. While I was preparing my presentation, he was preparing his. I had focused on finding a set of data based on its proximity to a test location, and then showing how the spatial index made retrieving that set of data faster. He took a different approach: given a list of different test locations, see which of our internal locations meet the proximity test. At the same time, he tried three different spatial indexes: one with high granularity, one with medium, and a final one with low granularity.
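To give a sense of what he was comparing, here's a sketch of three such indexes. The table and column names are hypothetical stand-ins, not our actual schema:

-- Three spatial indexes on the same geography column, varying only grid density.
CREATE SPATIAL INDEX ix_Locations_GeoPoint_High
ON dbo.Locations (GeoPoint)
USING GEOGRAPHY_GRID
WITH (GRIDS = (LEVEL_1 = HIGH, LEVEL_2 = HIGH, LEVEL_3 = HIGH, LEVEL_4 = HIGH),
      CELLS_PER_OBJECT = 16);

CREATE SPATIAL INDEX ix_Locations_GeoPoint_Medium
ON dbo.Locations (GeoPoint)
USING GEOGRAPHY_GRID
WITH (GRIDS = (LEVEL_1 = MEDIUM, LEVEL_2 = MEDIUM, LEVEL_3 = MEDIUM, LEVEL_4 = MEDIUM),
      CELLS_PER_OBJECT = 16);

CREATE SPATIAL INDEX ix_Locations_GeoPoint_Low
ON dbo.Locations (GeoPoint)
USING GEOGRAPHY_GRID
WITH (GRIDS = (LEVEL_1 = LOW, LEVEL_2 = LOW, LEVEL_3 = LOW, LEVEL_4 = LOW),
      CELLS_PER_OBJECT = 16);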
The day came for presentations. I showed how the spatial index improved performance and how you could read the output of sp_help_spatial_geography_index to see how efficient the index was for the sample data. It all went without a hitch. Then my co-worker showed his tests, in which the low density index outperformed the high and medium density indexes, despite having a primary filter at about 5% efficiency. I was floored. I couldn't explain it, and neither could my co-worker. As far as he was concerned, this whole spatial index thing was for the birds.
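For anyone who hasn't used it, the procedure call looks roughly like this. The table, index, and sample point here are invented for illustration:

-- Evaluate how well an index serves a sample query point.
DECLARE @g geography = geography::Point(47.6, -122.3, 4326);

EXEC sp_help_spatial_geography_index
    @tabname = 'dbo.Locations',
    @indexname = 'ix_Locations_GeoPoint_High',
    @verboseoutput = 1,
    @query_sample = @g;
-- Look for Primary_Filter_Efficiency in the output.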
I went away and started testing. Finally, after feeling like I had a handle on things, I asked my co-worker for his scripts. He had a cursor (which he acknowledged was a lousy way to write the query) that walked through the test locations, one at a time, and ran my query against our data. Sure enough, when I carefully built out the indexes and ran the test, the low density indexes worked better than the high density indexes. I set up a trace event and captured all the statement completion events so I could compare the high and low density runs. That's when I spotted the issue.
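His harness looked roughly like this; I'm reconstructing it here, so the object names and distance threshold are placeholders:

DECLARE @TestPoint geography;

DECLARE test_cur CURSOR FAST_FORWARD FOR
    SELECT GeoPoint FROM dbo.TestLocations;

OPEN test_cur;
FETCH NEXT FROM test_cur INTO @TestPoint;

WHILE @@FETCH_STATUS = 0
BEGIN
    -- My proximity query, run once per test location.
    SELECT l.LocationId
    FROM dbo.Locations AS l
    WHERE l.GeoPoint.STDistance(@TestPoint) <= 5000; -- meters

    FETCH NEXT FROM test_cur INTO @TestPoint;
END;

CLOSE test_cur;
DEALLOCATE test_cur;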
The majority of the test locations didn't find any matches in our location data. When the query ran against the low density index, it would usually take a few hundred reads and around 20ms to return no data. The same query, returning no data against the high density index, took about 5,000 reads and about 34ms. With each of these cumulative queries running faster against the low density index, it really appeared to be better. But when you got a query where data was returned, the low density index had more reads and a much longer duration: about 500ms, compared to a consistent 34ms for the high density index.
There was my explanation. The cursor masked the fact that, over and over again, the query was running but no data was being returned. I'm in the process now of rewriting the query to use the same set of test locations, but JOIN them against our location data to see which of the indexes is faster.
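The rewrite will look something along these lines, though I haven't settled on the final form (names and the distance threshold are the same placeholders as above):

-- One pass over all test locations instead of one query per cursor row.
SELECT t.TestLocationId, l.LocationId
FROM dbo.TestLocations AS t
JOIN dbo.Locations AS l
    ON l.GeoPoint.STDistance(t.GeoPoint) <= 5000;
-- Whether the optimizer picks the spatial index here depends on the plan,
-- so this is what I'll be testing against each of the three indexes.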
Actually, it makes sense, from my limited knowledge of spatial indexes.
A match on a spatial index can return false positives, but never false negatives. So if a seek on the low density spatial index returns no matches, that’s it, done and done. Since the low density index doesn’t require many reads, it’ll be quick.
If it had returned any rows, then each row it returned would have been checked against the actual function to see whether the match was false or real. That requires reading the data rows.
The higher the density of the index, the more accurate it is about matches and non-matches, and the fewer rows have to be checked to see whether they're false positives. The tradeoff is that the index is larger and more expensive to read.
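If you want to see the two stages separately, I believe the Filter() method applies just the index's primary filter, while STIntersects() adds the exact secondary check. Something like this (names invented for illustration):

DECLARE @Area geography = geography::Point(47.6, -122.3, 4326).STBuffer(5000);

-- Primary filter only: fast, index-driven, may include false positives.
SELECT l.LocationId
FROM dbo.Locations AS l
WHERE l.GeoPoint.Filter(@Area) = 1;

-- Primary filter plus the exact secondary check against the real shapes.
SELECT l.LocationId
FROM dbo.Locations AS l
WHERE l.GeoPoint.STIntersects(@Area) = 1;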
At least I think that's how it works…
P.S. If you have the PASS DVDs from last year, there was a session just on spatial indexes. It was very good. Can't offhand recall the speaker; think he was from MS.
Your description of how it works certainly is in line with the data I’m seeing from this experiment.
Thanks for the hint on the PASS DVD. I do have them (benefit of being a chapter president). I’ll look through them today.