What’s the story with AWS RDS and SQL Server deadlocks? I’m approaching AWS RDS like I was taking on a new role at a new organization. Do we have backups in place? Yes, great. Can I test them? Yes. Do they meet our RTO & RPO? Yes. Moving on. What have we got for monitoring? AWS RDS has a good percentage of the fundamentals. Now, it’s laid out a little oddly. You have the stuff going through CloudWatch which is largely OS oriented. Then you have enhanced monitoring, which you have to turn on, which covers eight key metrics for SQL Server. Finally, you can enable Performance Insights which gives you metrics on query behaviors (and yeah, any or all of these may be the subject of upcoming blog posts).
With a bare bones metric collection there (to be fair, there are a bunch of additional metrics I’d like to see, but the core is in place), what about errors? Can see them? Let’s generate a deadlock and see what happens.
Nothing.
Let the investigation begin.
Create an Alarm
My first idea was simple. Can we just create an alarm to get alerts on errors or deadlocks in particular. You can spot this from the “Log & events” tab here:
This gets us to a basic window:
The core behaviors are simple. Send notifications yes or no. I’ve already clicked no on this one just to keep things a little more clear & not get into some messy stuff. Next up, how are we going to measure the metric?
I’m going to pick SampleCount, but you can see that we could go with other approaches. I think SampleCount will work because I just need to know that I have any Deadlocks at all. Next, let’s look through the Metric list:
Well that was easy. There it is right there. Deadlocks. I’m going to select Deadlocks and then modify the rest of the Alert as follows:
I’ll click on Create and we’re good to go. Right? Go off to my test code, generate a deadlock, and get nothing. Lots and lots of nothing. I’ve generated deadlocks several different ways (well, it’s all one way really, but multiple different queries). I’ve modified the alert multiple different ways and it just won’t fire when I get a deadlock. Further, when I go to look at the Metrics page, deadlocks aren’t listed, so I’m unsure where this is coming from.
Now what?
Old School
Well, maybe deadlock errors aren’t firing because I have to configure them somehow. It’s entirely possible that I need to go to trace flags. OK. Fine. 1222 it is.
Ah, but you can’t simply set trace flags in AWS. No, you either need to get the command line involved, or, you have to create a custom Parameters setting to adjust the behavior of your instance (or modify the default Paramters). Here’s my custom one:
For details on this, read this SQLTips blog post.
Now, while this did generate the deadlock graph into my error logs, as defined, it didn’t cause the alert to fire.
New School, AKA Better School
Currently, Extended Events are not supported within AWS RDS SQL Server. I suspect that will change sooner rather than later. However, if you look for it, the system_health session is there and is running. Guess what the system_health session has inside? Deadlock graphs. You just need a query to pull them out and I happen to have the exact query needed:
WITH fxd
AS (SELECT CAST(fx.event_data AS XML) AS Event_Data
FROM sys.fn_xe_file_target_read_file(N'system_health*.xel', NULL, NULL, NULL) AS fx )
SELECT dl.deadlockgraph
FROM
(
SELECT dl.query('.') AS deadlockgraph
FROM fxd
CROSS APPLY event_data.nodes('(/event/data/value/deadlock)') AS d(dl)
) AS dl;
And it works. There’s a full set of deadlock information.
While this is working, the alert still is not.
Conclusion
I’m missing something. There’s a small piece of data that I don’t have. That’s preventing the alert from firing appropriately. Or, maybe that’s a holdover from one of the other RDS systems, MySQL or PostgreSQL. It’s not clear to me at this point.
What is clear to me is that while AWS RDS has a very rich set of tools, there is a learning curve associated with them.
I’ve been struggling with this also. It brought a smile to my face this morning when I read the title of the email regarding this post thinking you had resolved the challenge. Oh well not yet. Thank you for bringing attention & visibility to the new types of challenges faced with when starting to use RDS SQL Server.
I’m going to keep working on it. Fingers crossed, one or two people notice this blog post and tell us all what I got wrong.
Any progress? 🙂 Also looking for that!