Honestly, sincerely, no kidding, I love Distributed Replay. Yes, I get it. Proof positive I'm an idiot. As if we needed proof. To be a little fair to me, I love what Distributed Replay could have been, with a little more love. However, the fact is, it's on the deprecation list for SQL Server 2022. Which means whatever minimal amount of love Microsoft was giving it, if any, is gone, forever. Unlike the Little Engine That Could, it turns out that Distributed Replay was the Little Engine That Almost Could, But Didn't. Really Didn't. Let's discuss it a bit.

Distributed Replay

The concept is wonderful. Capture a bunch of queries from your production system. Replay them on a non-production system for testing. Add in the idea of being able to chain together…
In the last few Fundamentals posts you were introduced to a couple of ways to limit and control the data stored in the tables in your database. A primary key won't allow a duplicate value. A foreign key won't allow a value to be added that doesn't already exist in the parent table, and it will prevent the referenced data in the parent table from being deleted. These are types of constraints on data in your database. There are a bunch of other ways to constrain the data in an effort to ensure that the data stored is exactly what the business needs. The next few Fundamentals posts will cover several methods of limiting data.

Unique Constraints to Stop Duplicates

When the concept of the primary key was introduced earlier in the series, two different types…
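To make the idea concrete before we go on, here's a minimal sketch of a unique constraint in action. The dbo.Customers table and its EmailAddress column are my own inventions, purely for illustration:

-- hypothetical table, just for illustration
CREATE TABLE dbo.Customers
(
    CustomerID int IDENTITY(1, 1) PRIMARY KEY,
    EmailAddress nvarchar(256) NOT NULL
);

-- the unique constraint stops duplicate email addresses
ALTER TABLE dbo.Customers
ADD CONSTRAINT UQ_Customers_EmailAddress UNIQUE (EmailAddress);

INSERT INTO dbo.Customers (EmailAddress) VALUES (N'grant@example.com');
-- this second insert fails with error 2627, a unique constraint violation
INSERT INTO dbo.Customers (EmailAddress) VALUES (N'grant@example.com');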
Everyone knows that you only get a single clustered index, right? Wouldn't it be great though if you could have two clustered indexes? Well, you can. Sort of. Let's talk about it.

Two Clustered Indexes

First I'm going to create a table:

DROP TABLE IF EXISTS dbo.od;
GO
SELECT pod.PurchaseOrderID,
       pod.PurchaseOrderDetailID,
       pod.DueDate,
       pod.OrderQty,
       pod.ProductID,
       pod.UnitPrice,
       pod.LineTotal,
       pod.ReceivedQty,
       pod.RejectedQty,
       pod.StockedQty,
       pod.ModifiedDate
INTO dbo.od
FROM Purchasing.PurchaseOrderDetail AS pod;

With that in place, let's start with a clustered index:

CREATE CLUSTERED INDEX TestCIndex ON od (ProductID);

And, a query to test with:

SELECT od.PurchaseOrderID,
       od.PurchaseOrderDetailID,
       od.DueDate,
       od.OrderQty,
       od.ProductID,
       od.UnitPrice,
       od.LineTotal,
       od.ReceivedQty,
       od.RejectedQty,
       od.StockedQty,
       od.ModifiedDate
FROM dbo.od
WHERE od.ProductID BETWEEN 500 AND 510
ORDER BY od.ProductID;

This results in the following execution plan:

OK. Well done, Grant. That's how a clustered index works. The…
Quite a few years ago, I wrote a post about SELECT * and performance. That post had a bit of a click-bait title (freely admitted). I wrote the post because there was a really bad checklist of performance tips making the rounds (pretty sure it's still making the rounds). The checklist recommended a whole bunch of silly stuff. One silly recommendation was to substitute the full column list (let me emphasize that: name each and every column) for SELECT * because "it was faster". My post, linked above, showed that this claim was nonsense. Let's be clear, I'm not a fan of SELECT *. Yes, it has some legitimate functionality. However, by and large, using SELECT * causes performance problems.

SELECT * Hurts

The most fundamental place…
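One place the pain shows up is index usage. Here's a minimal sketch against AdventureWorks (the index name is my own invention): a query that names only the columns it needs can be satisfied entirely by a covering nonclustered index, while SELECT * drags every column along and forces key lookups or a scan instead:

-- hypothetical covering index for the query below
CREATE NONCLUSTERED INDEX ix_SalesOrderHeader_OrderDate
ON Sales.SalesOrderHeader (OrderDate)
INCLUDE (CustomerID);

-- covered: the nonclustered index alone satisfies this query
SELECT soh.OrderDate,
       soh.CustomerID
FROM Sales.SalesOrderHeader AS soh
WHERE soh.OrderDate = '20140501';

-- not covered: every column has to come from the clustered index,
-- so expect key lookups or a clustered index scan
SELECT *
FROM Sales.SalesOrderHeader AS soh
WHERE soh.OrderDate = '20140501';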
Reading execution plans in SQL Server is just hard. There's a lot to learn and understand. I previously outlined the basics I use to get started when I'm looking at an execution plan for the first time. However, just those pointers are not enough. I want to explain a little further why and how those basic steps are how you get started reading execution plans. To begin with, instead of talking about the first operator, which I've detailed before, we'll talk about the highest cost operators.

Highest Cost Operator

Every execution plan within SQL Server includes what the optimizer has determined to be the estimated cost of each operation. All these estimated operator costs are tallied up, and that makes up the estimated cost of the whole execution plan. You…
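If you want to see those numbers without running the query, one option is to request the estimated plan as XML: each operator in the showplan carries an EstimatedTotalSubtreeCost value, and the value on the root operator is the estimated cost of the whole plan. A minimal sketch, reusing the dbo.od table from earlier as a stand-in:

-- return the estimated plan as XML instead of executing the query
SET SHOWPLAN_XML ON;
GO

SELECT od.ProductID,
       od.OrderQty
FROM dbo.od
WHERE od.ProductID BETWEEN 500 AND 510;
GO

SET SHOWPLAN_XML OFF;
GO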
Wouldn't it be great to be able to put together queries and waits at the same time? You all capture query metrics using some method. For waits, most of us query sys.dm_os_wait_stats or sys.dm_db_wait_stats. Combining the two is hard. You could query the wait stats and store the results in a table variable. Run the query in question. Then query the wait stats again into a different table variable. Join the two table variables together to find the differences. Ta-da, you have query waits. Well. Probably. If you're the only one running queries on the system. And assuming no system waits or other noise from background activity pollute the numbers. Or, we could put Extended Events to work.

Queries and Waits

Just like Profiler/Trace, you can capture stored procedures, batches, and individual statements…
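Here's a minimal sketch of that table-variable approach, just to make the mechanics clear. The query under test is a placeholder; substitute whatever you're measuring:

DECLARE @WaitsBefore TABLE
(
    wait_type nvarchar(60),
    waiting_tasks_count bigint,
    wait_time_ms bigint
);
DECLARE @WaitsAfter TABLE
(
    wait_type nvarchar(60),
    waiting_tasks_count bigint,
    wait_time_ms bigint
);

-- snapshot the waits before
INSERT INTO @WaitsBefore
SELECT dows.wait_type,
       dows.waiting_tasks_count,
       dows.wait_time_ms
FROM sys.dm_os_wait_stats AS dows;

-- run the query under test here (placeholder)
SELECT COUNT(*)
FROM dbo.od;

-- snapshot the waits after
INSERT INTO @WaitsAfter
SELECT dows.wait_type,
       dows.waiting_tasks_count,
       dows.wait_time_ms
FROM sys.dm_os_wait_stats AS dows;

-- the difference is, roughly, the waits incurred while the query ran
SELECT wa.wait_type,
       wa.wait_time_ms - wb.wait_time_ms AS wait_time_ms_delta,
       wa.waiting_tasks_count - wb.waiting_tasks_count AS task_count_delta
FROM @WaitsAfter AS wa
JOIN @WaitsBefore AS wb
    ON wa.wait_type = wb.wait_type
WHERE wa.wait_time_ms > wb.wait_time_ms
ORDER BY wait_time_ms_delta DESC;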
Of all the things that Extended Events does, I've found the ability to quickly and easily gather a little bit of data and then use the grouping in the Live Data window to aggregate it to be one of the greatest. Sure, if we're talking about using Extended Events on a busy production server, this method probably isn't going to work well. There, you are going to be better off querying the XML (I know, I know, but I have ways to help there too). But in development, when doing testing and query tuning, the Live Data window is a gift of the gods on par with fire or beer (it's not as good as whiskey).

Live Data Grouping

Let's imagine a scenario like this. You're working on some query…
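For a session worth watching this way, here's a minimal sketch; the session name and database filter are my own placeholders:

-- a small session to watch through the Live Data window
CREATE EVENT SESSION QueryTuning
ON SERVER
ADD EVENT sqlserver.sql_batch_completed
    (WHERE (sqlserver.database_name = N'AdventureWorks'));
GO

ALTER EVENT SESSION QueryTuning ON SERVER STATE = START;
GO

In SSMS, right-click the session and choose Watch Live Data, then use the Grouping button to aggregate on whatever column you're interested in.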
In April, I said I was going to start learning Jupyter Notebooks. It's November. Let's get going with your first Jupyter Notebook. A quick aside before we start. I think one of the huge strengths that is going to come out of these things is the runbook. You can share a notebook with someone, they can run the queries in it against their own systems, and return the notebook, results included, to you. That's going to be extremely useful as a troubleshooting tool, but it has all sorts of other uses as well. I strongly suggest you start learning these things, as I am.

Azure Data Studio

There are a number of ways to create and consume Jupyter Notebooks, but I want to focus on the functionality around data…
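To make the runbook idea concrete: a cell in a SQL-kernel notebook is just a T-SQL batch. Something like this diagnostic query, which I'm picking purely as an illustration, could live in one cell, with the results saved into the notebook when it comes back to you:

-- what's running right now on the target server?
SELECT der.session_id,
       der.status,
       der.command,
       der.wait_type,
       der.total_elapsed_time
FROM sys.dm_exec_requests AS der
WHERE der.session_id > 50; -- rough filter for user sessions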
In order to take advantage of R and Python (and Java in SQL Server 2019) directly from your SQL Server scripts, you'll be using the function sp_execute_external_script. When you see this code in use for the first time, it's going to remind you of sp_executesql. The very first thing I thought about was, "Oh no. Another SQL injection vector." I have a little good news and a little bad news.

It's Not SQL

The first and most important thing to understand is, we're not talking about SQL. Let's start by looking at some code. This is straight from the examples in the Microsoft documentation linked above:

DROP PROC IF EXISTS generate_iris_model;
GO
CREATE PROC generate_iris_model
AS
BEGIN
    EXEC sp_execute_external_script
        @language = N'R',
        @script = N'
    library(e1071);
    irismodel <- naiveBayes(iris_data[, 1:4], iris_data[, 5]);…
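One thing worth seeing beyond the docs example: values go into the external script through declared parameters, the same pattern as sp_executesql, rather than through string concatenation, which matters to the injection question. A minimal sketch, with the table name and threshold being my own placeholders:

-- pass a value into the R script as a declared parameter
EXEC sp_execute_external_script
    @language = N'R',
    @script = N'OutputDataSet <- InputDataSet[InputDataSet$Value > threshold, ];',
    @input_data_1 = N'SELECT Value FROM dbo.SomeTable;',
    @params = N'@threshold int',
    @threshold = 10
WITH RESULT SETS ((Value int));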
In my last post I showed how you can create a volume with your container. I then showed a few things you can do with a container using a volume. I want to explore volumes just a little bit more.

Locate Your Volume

To have a little more fun with volumes, first, let's share a drive. You do this in the Settings in Docker Desktop (assuming that's what you're using). While this should just work, it didn't for me until I restarted Docker. So you may need to do that. Go to the drive and create a directory. I'm putting one in at C:\Docker\SQL. Once I've done that, let's create a new container:

docker run `
    --name SQL19 `
    -p 1433:1433 `
    -e "ACCEPT_EULA=Y" `
    -e 'SA_PASSWORD=$cthulhu1988' `
    -v C:\Docker\SQL:/sql `
…