I’ve been reading the General Data Protection Regulation (GDPR) and discussing the ramifications of the start of enforcement with lots of people. The implications of it all are fascinating. The really serious issues remain primarily a business problem, with business-defined solutions. However, there are technology issues that we need to think about. For example, the way you collect performance metrics is going to be affected by the GDPR.
Private Data and Monitoring Queries
First and foremost, let me say something I’ve said before. The vast majority of the focus around GDPR has to come from your business. Second, the bulk of your work and focus must be on ensuring core functionality in support of the GDPR. Third, the primary attack vectors and leaks under the GDPR are not going to be around something like monitoring. However, this is yet another place where you could be collecting private information and that, in theory (because the lawyers have yet to speak), you may need to deal with when addressing the requirements of the GDPR and related regulations.
When you capture query metrics through trace events or extended events, using either rpc_completed or sql_batch_completed, you don’t just get the query. You also get any parameter values associated with that query. Article 17 of the GDPR is extremely clear:
The data subject shall have the right to obtain from the controller the erasure of personal data concerning him or her without undue delay and the controller shall have the obligation to erase personal data without undue delay…
While there is a list of exceptions in Article 17, none of them applies just because the data isn’t in the database or is stored in some separate information store, such as your monitoring of queries. Instead, the GDPR pretty much says that any place the data resides (SharePoint, Excel, etc.) must be documented as part of your processing and is subject to control through the Regulation.
So, you will have to ensure that you also deal with your query metrics when it comes to the GDPR. Granted, this data vector is probably not the first, second, or third highest priority on your list. However, it needs to make your list. Why? If for no other reason than this: if you’re like me, you don’t treat your query monitoring the same way as you do your production systems and the backups of those systems. Yet any data passed as a parameter value on its way to storage in your database can be captured in your query monitoring. So you probably have this stuff stored in locations that are not as secure as others. You probably move this data to secondary, non-production servers for processing and testing. You’re potentially creating a data location that could lead to a breach.
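To make the exposure concrete, here is a minimal sketch in Python of what a captured statement can look like and one way you might mask the literal values before moving the data to a less secure location. The sample statement, the procedure and parameter names, and the redact_statement function are all made up for illustration. The regular expressions only cover simple string and numeric literals, so treat this as a starting point under those assumptions, not a complete scrubbing solution.

```python
import re

# Hypothetical helper, not part of any SQL Server tooling.
# Matches T-SQL string literals ('...' or N'...'), treating '' as an escaped quote.
_STRING_LITERAL = re.compile(r"N?'(?:[^']|'')*'")
# Matches standalone integer or decimal literals, skipping digits that are
# part of an identifier or parameter name such as @p1 or dbo.T1.
_NUMBER_LITERAL = re.compile(r"(?<![\w@.])\d+(?:\.\d+)?(?![\w.])")


def redact_statement(statement: str) -> str:
    """Replace literal values in captured statement text with placeholders."""
    redacted = _STRING_LITERAL.sub("'?'", statement)
    redacted = _NUMBER_LITERAL.sub("?", redacted)
    return redacted


if __name__ == "__main__":
    # Made-up example of statement text as it might appear in an
    # rpc_completed event, including a personal email address.
    captured = "exec dbo.GetCustomer @Email = N'jane@example.com', @CustomerId = 42"
    print(redact_statement(captured))
```

The query shape survives, so you can still group and tune by statement, but the parameter values that might identify a person are gone from the copy.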
Conclusion
It bears repeating: there is absolutely no reason to panic when it comes to working towards GDPR compliance. Further, thinking about tertiary processes and places where data resides, such as query monitoring, shouldn’t be your primary concern. However, once you have the major concerns in hand, you should also address these secondary processing locations.
The single best thing you can do to prepare for the GDPR is to read the regulation itself to understand what is required. Secondarily, I suggest reading through the ICO documentation from the UK. There’s a lot of good material there to reinforce and clarify what’s coming from an arm of government that is likely to be enforcing the application of these regulations.