Please, let me reiterate: The only valid test of a backup is a restore.
THE
ONLY
VALID
TEST
OF
A
BACKUP
IS
A
RESTORE.
I’m happy that you have backups. That’s great. Can you restore them?
Let’s tell a story.
I Play With Radios
“WE KNOW! SHUT UP ABOUT IT! Damn radio people. Worse than crossfitters.”
Cool, cool! I won’t get into the radio stuff… too much. Just a quick setup. Please bear with me.
There’s an organization in the US, the American Radio Relay League (ARRL). Founded back in the dawn of time, it’s more or less the first radio club. Now, the ARRL does a lot of stuff for radio users in the US, and really everywhere, but there’s controversy (when isn’t there?). One thing they did was build a tool called Logbook of The World (LoTW). This is an online database (and now you see where I’m going and why I needed you to bear with me, hang on just another couple of seconds) for storing contacts between radio operators. It was one of, if not the, first online databases for doing this, so it’s kind of universally used (kind of). However, LoTW was designed and built at the dawn of the internet era, and lordy does it show. There, all radio talk done. On with the story.
The Story
In May, the ARRL was hacked. You can read the official series of press releases about the incident here.
TLDR: Hacked, didn’t report it for two weeks, took six weeks to get LoTW back online, and to this day some services are still offline.
You can get even more detail of the attack from ARRL here, including how they paid the ransom to get their systems back online. Additional coverage can be read here.
“Dude, where’s the database in this story?”
Yeah, yeah. Sheesh.
Why did it take six weeks to get LoTW back online? Heck, why are there still services offline?
A person heavily involved with the ARRL, board member Mickey Backer, N4MB, gave a presentation that’s available online (the less said about Zoom & presenting, the better). Yeah, the story starts with the cyber attack, but, hoo boy, it goes downhill from there.
They either didn’t have offsite backups, or the Threat Actors (TA) got to ’em. Ouch. But that’s not the story I want to share. They didn’t adequately protect their backups and didn’t have offline copies of any of them. They were gone. Now, there were other means of recovering the data…
Oh, this makes me cry: the OS was CentOS and the database was SAP MaxDB, both out of support for over five years. But that’s OK; they were running other systems on Windows 98 and FoxPro. Yes, you read that correctly. In fact, they don’t know all the operating systems and databases that they have under management, despite everything being in one room. So yeah, expensive consultants were able to rebuild databases & stuff from scratch. You won’t be shocked to find that recovery wasn’t much of a focus.
Seems like between the loss of the backups and the hack itself, the IT director was let go. New IT director. New “backup appliance”.
I can hear him now. “I can’t fix what happened in the past. No one can. But this new fully functional Death Star, uh, I mean, backup appliance, that cost less than a Death Star, but more than an aircraft carrier, will protect us from ever having to go through this again. We are protected and safe! You can thank me now.”
Mickey Backer and I must be separated at birth because he asked the only question that matters: “Cool, have you tested a restore?”
Since you know this is one of my rants, you already know the answer to that question.
“No.”
SWEET JUMPING FREYA ON A SKILLET!
How many times in how many situations? Restores are all that matters. If you can’t restore, your backups are meaningless. If you don’t know how to restore, your backups are useless. If you haven’t tested and practiced restoring databases, you’re doing it wrong. Period. Full stop. I have zero, and I mean, zero sympathy for this. It’s 20<insert curse word>24! We know this. Or we should. There are no more excuses. For crying out loud.
OK. It’s OK. I see you crying. I’m not mad, I’m just…
No. In this case. I’m just mad. I am mad.
I know IT is hard. I know budgets get cut. I know many, many friends who aren’t really working as IT professionals, but instead as firefighters (and not naked ones; old joke, if you get it, you get it), running from fire to fire. Some are striving to fix things. Others have just accepted that they’re going to be putting out fires forever. Be with the first group, not the second.
We can get better. It’s just not easy.
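If you want a concrete starting point, here’s a minimal sketch of what testing a backup with a restore can look like in SQL Server. The database name, file paths, and logical file names below are made up; pull the real logical names with RESTORE FILELISTONLY, and ideally run this against a separate test instance rather than production.

-- VERIFYONLY only proves the backup file is readable. It is NOT the test.
RESTORE VERIFYONLY
FROM DISK = N'D:\Backups\MyDatabase.bak';

-- The actual test: restore the backup under a throwaway name,
-- then run integrity checks against the restored copy.
RESTORE DATABASE MyDatabase_RestoreTest
FROM DISK = N'D:\Backups\MyDatabase.bak'
WITH MOVE N'MyDatabase' TO N'D:\RestoreTest\MyDatabase_RestoreTest.mdf',
     MOVE N'MyDatabase_log' TO N'D:\RestoreTest\MyDatabase_RestoreTest.ldf',
     STATS = 10;

DBCC CHECKDB (N'MyDatabase_RestoreTest') WITH NO_INFOMSGS;

If your application can run cleanly against that restored copy too, even better.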
Oh, and please, go and test your backups… with a restore. You got this.
Thank you, Grant, for a tale of the very real importance of DB backups and testing those backups. Despite the goofy graphics, it’s often not emphasized enough.
Richard (KO4IVE)
Apologies for the goofiness. When I feel like I’m only yelling at people, I do try to inject a bit of humor.
🙂
My other job is as a pilot, where we try VERY hard to learn from other people’s mistakes, especially when they have been thoughtful enough to kill themselves making them. So yes, I have listened and followed your advice – multiple backups, including to a drive that lives turned off. I log onto the server, check it over manually, and when all is verified okay, I manually turn on that drive, copy over the newest backups, unmount it, and turn it off again. And I regularly do a full restore and run the primary application against the restored DB. Never had it fail in the almost 20 years I’ve been playing with SQL Server, and I sleep a lot better KNOWING that my backups are (1) beyond the reach of any possible malware and (2) fully functional.
Awesome!