About five years ago, I was checking an app before it went to production. I hadn’t seen the app before then and a junior dba had worked with the developers designing and building the app. It didn’t use a single stored procedure or view. Instead, it was built entirely of multi-statement UDF’s. These UDF’s called other UDF’s which joined to UDF’s… It was actually a very beautiful design in terms of using the functions more or less like objects within the database. Amazing. It also would not, and could not, perform enough to function, let alone scale. It was a horror because they thought they were done and ready to go to production, but no one had ever tested more than a couple of rows of data in any of the tables. Of course, a couple of rows of data worked just fine. It was when we put in 10, 1000, a few million, that the thing came to a total and complete halt. We spent weeks arguing about the stupid thing. The developers instisted that since it was “possible” to do what they did, that, in fact, it was OK to do what they did.
Anyway, with the help of a Microsoft consultant, we finally cleaned up the app and got it on it’s feet. Ever since then, I’ve preached the dangers of the multi-statement table valued function. The thing to remember is, there are no statistics generated for these things. That means the optimizer thinks they return a single row of data. When they do only return a few rows, everything is fine. When they return even as little as a hundred rows, like the example I’m posting below, they stink.
Anyway, I boiled up this silly example because some developer accused me and several other DBA’s of spreading Fear, Undertainty, and Doubt because we suggested that the multi-statement UDF is something to avoid if possible. Actually, he pretty much all but stated that we didn’t know what we were talking about. I was peeved. Hence this example. Feel free to check it out. Oh, and if you check the execution plans, note that the multi-statement UDF is marked as the least costly even though it actually performs twice as slow as the others. One more example of execution plans being wrong.
Here are the time results from one run of the view & UDF’s:
(99 row(s) affected)
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 1 ms.(99 row(s) affected)
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 2 ms.(99 row(s) affected)
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 3 ms.
And the code to test for yourself:
CREATE TABLE dbo.Parent
(ParentId int identity(1,1)
,ParentDate datetime)CREATE TABLE dbo.Child
(ChildId int identity(1,1)
,ParentId int
,ChildDate datetime)DECLARE @i int
DECLARE @j int
SET @i = 1
SET @j = 1
WHILE @i < 100
BEGIN
INSERT INTO dbo.Parent
(ParentDate)
SELECT GETDATE()
WHILE @j < 100
BEGIN
INSERT INTO dbo.Child
(ParentId
,ChildDate)
SELECT @i
,GETDATE()
SET @j = @j + 1
END
SET @i = @i + 1
ENDCREATE VIEW dbo.vJoin
AS
SELECT p.ParentId
,p.ParentDate
,c.ChildId
,C.ChildDate
FROM dbo.Parent p
JOIN dbo.Child c
ON p.ParentId = c.ParentIdCREATE FUNCTION dbo.SingleUDF ()
RETURNS TABLE
AS
RETURN
(
SELECT p.ParentId
,p.ParentDate
,c.ChildId
,C.ChildDate
FROM dbo.Parent p
JOIN dbo.Child c
ON p.ParentId = c.ParentId
)CREATE Function dbo.MultiUDF ()
RETURNS @Multi TABLE
(ParentId int
,ParentDate datetime
,ChildId int
,ChildDate datetime)
AS
BEGIN
INSERT INTO @Multi
(ParentId
,ParentDate
,ChildId
,ChildDate)
SELECT p.ParentId
,p.ParentDate
,c.ChildId
,C.ChildDate
FROM dbo.Parent p
JOIN dbo.Child c
ON p.ParentId = c.ParentId
RETURN
ENDset statistics time on
select * from vJoin
select * from SingleUDF()
select * from MultiUDF()
set statistics time offUPDATE -- Edited the formatting.
Thanks for the link.
Did you notice that the person in question disappeared from the ‘discussion’ as soon as some tests got posted?
Nothing like lobbing a few rocks and running away to show both bravery and intelligence…
Actually, I’m in the multi-statement-table-function crowd, but with a caveat: I always specify a primary key on the table to be returned.
With the primary key in place, I find these functions to be significantly faster than views or table-valued functions, especially as the data set grows larger.
Am I missing something… all have the same reading ?
Does this nullify the statement above ?
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 0 ms.
SQL Server parse and compile time:
CPU time = 0 ms, elapsed time = 0 ms.
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 0 ms.
(99 row(s) affected)
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 0 ms.
(99 row(s) affected)
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 0 ms.
(99 row(s) affected)
Okay now this was with a 1098 … udf win so far …
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 3 ms.
(1098 row(s) affected)
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 2 ms.
(1098 row(s) affected)
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 2 ms.
(1098 row(s) affected)
Okay so my cpu bit the dust bit stil it would seem that udf is faster ? Or am I missing something ? Maybe with computer having more and more ram udf might actualy be better ?
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 0 ms.
SQL Server parse and compile time:
CPU time = 0 ms, elapsed time = 0 ms.
SQL Server Execution Times:
CPU time = 109 ms, elapsed time = 882 ms.
(101097 row(s) affected)
SQL Server Execution Times:
CPU time = 125 ms, elapsed time = 844 ms.
(101097 row(s) affected)
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 589 ms.
(101097 row(s) affected)
Hi there,
Great article! I know you wrote this a while ago, but there was a little bug in the code. You forgot to reset @j in the outer loop which made the query insert only the first 100 records in the table. I fixed that and created a table with about a million records in it. Please find the results below:
View
—-
CPU time = 1482 ms, elapsed time = 10633 ms
Table ‘Child’. Scan count 1, logical reads 3281
sub-tree cost: 11.24 (19%)
inline UDF
———-
CPU time = 1201 ms, elapsed time = 11551 ms.
Table ‘Child’. Scan count 1, logical reads 3281
sub-tree cost: 11.24 (19%)
Multi-valued UDF
—————–
CPU time = 6256 ms, elapsed time = 27981 ms.
Table ‘#1273C1CD’. Scan count 1, logical reads 4118, physical reads 63, read-ahead reads 3745
sub-tree cost: 36.76 (62%)
Formatting has gone wonky on this one, not sure why.
Anyway, thanks for the catch.
Another interesting observation. If I Display Estimated Execution Plan on the three select statements, I get the vJoin as 26%, SingleUDF as 26% and MultiUDF as two queries of 0% and 47% respectively. The last query plan shows all the query parts. If I use Include Actual Plan, I get three query plans of 50%, 50% and 0%.
Perhaps you’d re-do this with more real-world configurations…
Maybe, as most of us tend to code, you’d use local variable IDs for Parent and Child and re-code the UDFs to be parameterized, then run the full-set, parent-set and parent-child-explicit test cases with far more data than 100 rows…
For example, minus the extra data and trying to remove bias by running the selects in different sequences to try to minimize the impact of the cache/fetch (likely could be vastly improved upon):
BEGIN TRAN
SET NOCOUNT on
IF EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[MultiUDF]’) AND type in (N’FN’, N’IF’, N’TF’, N’FS’, N’FT’))
DROP FUNCTION [dbo].[MultiUDF]
GO
CREATE TABLE dbo.Parent
(ParentId int identity(1,1)
,ParentDate datetime)
CREATE TABLE dbo.Child
(ChildId int identity(1,1)
,ParentId int
,ChildDate datetime)
DECLARE @i int
DECLARE @j int
SET @i = 1
SET @j = 1
WHILE @i < 100
BEGIN
INSERT INTO dbo.Parent
(ParentDate)
SELECT GETDATE()
WHILE @j < 100
BEGIN
INSERT INTO dbo.Child
(ParentId
,ChildDate)
SELECT @i
,GETDATE()
SET @j = @j + 1
END
SET @i = @i + 1
END
go
CREATE VIEW dbo.vJoin
AS
SELECT p.ParentId
,p.ParentDate
,c.ChildId
,C.ChildDate
FROM dbo.Parent p
JOIN dbo.Child c
ON p.ParentId = c.ParentId
go
CREATE FUNCTION dbo.SingleUDF (@ParentID int, @ChildID int)
RETURNS TABLE
AS
RETURN
(
SELECT p.ParentId
,p.ParentDate
,c.ChildId
,C.ChildDate
FROM dbo.Parent p
JOIN dbo.Child c
ON p.ParentId = c.ParentId
WHERE p.ParentID = ISNULL(@ParentID,p.ParentID)
AND c.ChildID = ISNULL(@ChildID,c.ChildID)
)
go
CREATE Function dbo.MultiUDF (@ParentID int, @ChildID int)
RETURNS @Multi TABLE
(ParentId int
,ParentDate datetime
,ChildId int
,ChildDate datetime)
AS
BEGIN
INSERT INTO @Multi
(ParentId
,ParentDate
,ChildId
,ChildDate)
SELECT p.ParentId
,p.ParentDate
,c.ChildId
,C.ChildDate
FROM dbo.Parent p
JOIN dbo.Child c
ON p.ParentId = c.ParentId
WHERE p.ParentID = ISNULL(@ParentID,p.ParentID)
AND c.ChildID = ISNULL(@ChildID,c.ChildID)
RETURN
END
go
DECLARE @ParentID int
DECLARE @ChildID int
set statistics time on
select * from vJoin WHERE ParentID = ISNULL(@ParentID,ParentID) AND ChildID = ISNULL(@ChildID,ChildID)
set statistics time off
SET @ParentID = 1
set statistics time on
select * from vJoin WHERE ParentID = ISNULL(@ParentID,ParentID) AND ChildID = ISNULL(@ChildID,ChildID)
set statistics time off
SET @ChildID = 50
set statistics time on
select * from vJoin WHERE ParentID = ISNULL(@ParentID,ParentID) AND ChildID = ISNULL(@ChildID,ChildID)
set statistics time off
RAISERROR(‘Done VIEW P and C’,0,1) WITH NOWAIT
SET @ParentID = NULL
SET @ChildID = NULL
set statistics time on
select * from SingleUDF(@ParentID,@ChildID)
set statistics time off
SET @ParentID = 1
set statistics time on
select * from SingleUDF(@ParentID,@ChildID)
set statistics time off
SET @ChildID = 50
set statistics time on
select * from SingleUDF(@ParentID,@ChildID)
set statistics time off
RAISERROR(‘Done SingleUDF P and C’,0,1) WITH NOWAIT
SET @ParentID = NULL
SET @ChildID = NULL
set statistics time on
select * from MultiUDF(@ParentID,@ChildID)
set statistics time off
SET @ParentID = 1
set statistics time on
select * from MultiUDF(@ParentID,@ChildID)
set statistics time off
SET @ChildID = 50
set statistics time on
select * from MultiUDF(@ParentID,@ChildID)
set statistics time off
RAISERROR(‘Done MultiUDF P and C’,0,1) WITH NOWAIT
ROLLBACK
RAISERROR(‘************************************************************************’,0,1) WITH NOWAIT
BEGIN TRAN
SET NOCOUNT on
IF EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[MultiUDF]’) AND type in (N’FN’, N’IF’, N’TF’, N’FS’, N’FT’))
DROP FUNCTION [dbo].[MultiUDF]
GO
CREATE TABLE dbo.Parent
(ParentId int identity(1,1)
,ParentDate datetime)
CREATE TABLE dbo.Child
(ChildId int identity(1,1)
,ParentId int
,ChildDate datetime)
DECLARE @i int
DECLARE @j int
SET @i = 1
SET @j = 1
WHILE @i < 100
BEGIN
INSERT INTO dbo.Parent
(ParentDate)
SELECT GETDATE()
WHILE @j < 100
BEGIN
INSERT INTO dbo.Child
(ParentId
,ChildDate)
SELECT @i
,GETDATE()
SET @j = @j + 1
END
SET @i = @i + 1
END
go
CREATE VIEW dbo.vJoin
AS
SELECT p.ParentId
,p.ParentDate
,c.ChildId
,C.ChildDate
FROM dbo.Parent p
JOIN dbo.Child c
ON p.ParentId = c.ParentId
go
CREATE FUNCTION dbo.SingleUDF (@ParentID int, @ChildID int)
RETURNS TABLE
AS
RETURN
(
SELECT p.ParentId
,p.ParentDate
,c.ChildId
,C.ChildDate
FROM dbo.Parent p
JOIN dbo.Child c
ON p.ParentId = c.ParentId
WHERE p.ParentID = ISNULL(@ParentID,p.ParentID)
AND c.ChildID = ISNULL(@ChildID,c.ChildID)
)
go
CREATE Function dbo.MultiUDF (@ParentID int, @ChildID int)
RETURNS @Multi TABLE
(ParentId int
,ParentDate datetime
,ChildId int
,ChildDate datetime)
AS
BEGIN
INSERT INTO @Multi
(ParentId
,ParentDate
,ChildId
,ChildDate)
SELECT p.ParentId
,p.ParentDate
,c.ChildId
,C.ChildDate
FROM dbo.Parent p
JOIN dbo.Child c
ON p.ParentId = c.ParentId
WHERE p.ParentID = ISNULL(@ParentID,p.ParentID)
AND c.ChildID = ISNULL(@ChildID,c.ChildID)
RETURN
END
go
DECLARE @ParentID int
DECLARE @ChildID int
set statistics time on
select * from SingleUDF(@ParentID,@ChildID)
set statistics time off
SET @ParentID = 1
set statistics time on
select * from SingleUDF(@ParentID,@ChildID)
set statistics time off
SET @ChildID = 50
set statistics time on
select * from SingleUDF(@ParentID,@ChildID)
set statistics time off
RAISERROR(‘Done SingleUDF P and C’,0,1) WITH NOWAIT
SET @ParentID = NULL
SET @ChildID = NULL
set statistics time on
select * from vJoin WHERE ParentID = ISNULL(@ParentID,ParentID) AND ChildID = ISNULL(@ChildID,ChildID)
set statistics time off
SET @ParentID = 1
set statistics time on
select * from vJoin WHERE ParentID = ISNULL(@ParentID,ParentID) AND ChildID = ISNULL(@ChildID,ChildID)
set statistics time off
SET @ChildID = 50
set statistics time on
select * from vJoin WHERE ParentID = ISNULL(@ParentID,ParentID) AND ChildID = ISNULL(@ChildID,ChildID)
set statistics time off
RAISERROR(‘Done VIEW P and C’,0,1) WITH NOWAIT
SET @ParentID = NULL
SET @ChildID = NULL
set statistics time on
select * from MultiUDF(@ParentID,@ChildID)
set statistics time off
SET @ParentID = 1
set statistics time on
select * from MultiUDF(@ParentID,@ChildID)
set statistics time off
SET @ChildID = 50
set statistics time on
select * from MultiUDF(@ParentID,@ChildID)
set statistics time off
RAISERROR(‘Done MultiUDF P and C’,0,1) WITH NOWAIT
ROLLBACK
The results on the 100 rows for the SingleUDF are nits different from the qualified View SELECTs.
Thanks for raising to points you eloquently narrate – it’s made me re-evaluate whether some of the code I have is optimal. My concern with views is the degree to which data are assembled prior to a WHERE condition being applied, hence the "more data than 100 rows" request.
Cheers!
IE9 makes posts near unreadable, even in compatibility mode.
Which browser should I really use, please?
I’m not sure what happened to the formatting. It’s been fine until I saw these new comments come in. I’m working on it.
It looks like it’s just the comments that are messing up. It’s messed up in IE and Chrome.
In response to your tests, mine were only to make a point. If I set up real world tests with 10,000+ rows, so we can see how data & structures & statistics really work with multi-statement UDFs, and then nest and JOIN the UDFs as they were originally, these points become much more clear. But the thing is, there are demonstrable (albeit extremely minor) differences at extremely low numbers of rows.
This is best one article so far I have read online. I would like to appreciate you for making it very simple and easy. I have found another nice post related to this post over the internet which also explained very well. For more details please check out this link…
http://mindstick.com/Articles/8a08ba84-57f6-4f8f-a18c-e4c5c96a549b/?Table-Valued%20Functions%20in%20SQL%20Server
Thanks
[…] recommendation to use Table Variables? This is another of those cases where, like scalar UDFs and table-valued multi-statement UDFs, developers can sometimes get into trouble with a relatively benign-looking feature, due to way […]
I got the same result as you did the first time I created and ran these queries.
Then I put the queries into another query window and ran them again with stats on and execution plan on
Stats:
set statistics time on
select * from vJoin –TOOK:CPU time = 0 ms, elapsed time = 52 ms.
select * from SingleUDF() –TOOK:CPU time = 0 ms, elapsed time = 36 ms.
select * from MultiUDF() –TOOK:CPU time = 0 ms, elapsed time = 52 ms.
set statistics time off
Exec Plan:
select * from vJoin –TOOK:47%
select * from SingleUDF() –TOOK:47%
select * from MultiUDF() –TOOK:6%
I then took SAinCA advise and added a primary key to the MultiUDF function, like
ALTER Function dbo.MultiUDF ()
RETURNS @Multi TABLE
(ParentId int
,ParentDate datetime
,ChildId int
,ChildDate datetime,
PRIMARY KEY ( ParentId,ChildId ))
–first 7 lines shown for brevity
Then I ran the queries again, TWICE
the first was akin the the first with no PK
On the second:
Stats:
set statistics time on
select * from vJoin –TOOK:CPU time = 0 ms, elapsed time = 81 ms.
select * from SingleUDF() –TOOK:CPU time = 0 ms, elapsed time = 46 ms.
select * from MultiUDF() –TOOK:CPU time = 0 ms, elapsed time = 79 ms.
set statistics time off
Exec Plan:
select * from vJoin –TOOK:47%
select * from SingleUDF() –TOOK:47%
select * from MultiUDF() –TOOK:6%
Then I repeated the above with a a where clause on each of the “select” command
No PK on MultiUDF (second query run):
set statistics time on
select * from vJoin where ParentId = 1 and childID between 25 and 55 –TOOK:CPU time = 0 ms, elapsed time = 32 ms. execPlan = 40%
select * from SingleUDF() where ParentId = 1 and childID between 45 and 55 –TOOK:CPU time = 0 ms, elapsed time = 46 ms. execPlan = 40%
select * from MultiUDF() where ParentId = 1 and childID between 45 and 55 –TOOK:CPU time = 0 ms, elapsed time = 1 ms. execPlan = 19%
set statistics time off
With PK on MultiUDF (second query run):
set statistics time on
select * from vJoin where ParentId = 1 and childID between 45 and 55 –TOOK:CPU time = 0 ms, elapsed time = 29 ms. execPlan = 40%
select * from SingleUDF() where ParentId = 1 and childID between 45 and 55 –TOOK:CPU time = 0 ms, elapsed time = 47 ms. execPlan = 40%
select * from MultiUDF() where ParentId = 1 and childID between 45 and 55 –TOOK:CPU time = 0 ms, elapsed time = 1 ms. execPlan = 19%
set statistics time off
So PK or no PK doesn’t matter, but that may depend on the where clause
But MultiUDF does work much faster than the other 2 when the where clause is a range
Then I thought what about passing paramters, so I created:
Create Function dbo.MultiUDFWithParams (@ParentId_1 int, @ParentId_2 int, @ParentDate_1 datetime,@ParentDate_2 datetime,@ChildId_1 int,@ChildId_2 int, @ChildDate_1 datetime ,@ChildDate_2 datetime)
RETURNS @Multi TABLE
(ParentId int
,ParentDate datetime
,ChildId int
,ChildDate datetime)
AS
BEGIN
INSERT INTO @Multi
(ParentId
,ParentDate
,ChildId
,ChildDate)
SELECT p.ParentId
,p.ParentDate
,c.ChildId
,C.ChildDate
FROM dbo.Parent p
JOIN dbo.Child c
ON p.ParentId = c.ParentId
where
p.ParentId between isnull(@ParentId_1,p.ParentId) and isnull(@ParentId_2,p.ParentId)
AND
p.ParentDate between isnull(@ParentDate_2,p.ParentDate) and isnull(@ParentDate_2,p.ParentDate)
AND
c.ChildId between isnull(@ChildId_1,c.ChildId) and isnull(@ChildId_2,c.ChildId)
AND
C.ChildDate between isnull(@ChildDate_1,C.ChildDate) and isnull(@ChildDate_2,C.ChildDate)
RETURN
END
Then ran the queries again:
set statistics time on
select * from vJoin where ParentId = 1 and childID between 45 and 55 –TOOK:CPU time = 0 ms, elapsed time = 27 ms. execPlan = 40%
select * from SingleUDF() where ParentId = 1 and childID between 45 and 55 –TOOK:CPU time = 0 ms, elapsed time = 49 ms. execPlan = 40%
select * from MultiUDFWithParams(1, NULL, NULL,NULL, 45,55,NULL, NULL) –TOOK:CPU time = 0 ms, elapsed time = 1 ms. execPlan = 19%
set statistics time off
So no real difference.
There might be a difference if more data is added
So I add 100K of rows
non params:
set statistics time on
select * from vJoin where ParentId between 45 and 55 and childID between 45 and 55–TOOK:CPU time = 0 ms, elapsed time = 109 ms. execPlan = 50%
select * from SingleUDF() where ParentId between 45 and 55 and childID between 45 and 55–TOOK:CPU time = 0 ms, elapsed time = 83 ms. execPlan = 50%
select * from MultiUDF() where ParentId between 45 and 55 and childID between 45 and 55–TOOK:CPU time = 0 ms, elapsed time = 256 ms. execPlan = 0%
set statistics time off
Maybe there is something in that “incorrect execution plan ” comment from Grant
then the param UDF
set statistics time on
select * from vJoin where ParentId between 45 and 55 and childID between 45 and 55 –TOOK:CPU time = 0 ms, elapsed time = 89 ms. execPlan = 40%
select * from SingleUDF() where ParentId between 45 and 55 and childID between 45 and 55 –TOOK:CPU time = 0 ms, elapsed time = 87 ms. execPlan = 40%
select * from MultiUDFWithParams(45, 55, NULL,NULL, 45,55,NULL, NULL) –TOOK:CPU time = 0 ms, elapsed time = 49 ms. execPlan = 2%
set statistics time off
param query is much better when there is lots of data
SingleUDF is most consistent overall
The original post was an extremely simplistic example, that, by and large has still held up to what I said originally. I can build a more substantive test at some point when I’ve got some time. I’ve seen consistently poor performance from multi-statement UDFs, as have the majority of query tuning consultants I know.
[…] you use a multi-statement UDF, then your inner select is executed exactly once for each outer row. The multi-statement UDF is […]