The company where I work has been using virtualization in development, QA, testing, etc., for many years now. We have not even attempted virtualization in production. Things change. But, before we did it, I thought I’d bounce the question out to the people who know, What are some things to watch for in using virtual servers in a production environment.
Introduction
Flat out, I asked people over Twitter. The overall message was extremely positive. No one suggested it was a bad idea. A few thought that some high volume servers are better served by physical boxes. Volume in this case could be read as high amounts of reads or high amount of writes, both were suggested as reasons to avoid virtualization. The cause for this is clear, from each individual; the number one problem was around I/O.
I put out a general call for information and received a pretty good set of responses. I’m not expert enough in all this technology to put together a single summary, so I’ll more or less present the information as it was presented to me. When all the information was bounding past, a few people asked me to aggregate it, so here it is.
What to watch for
According to Allen Kinsel (blog|twitter), a DBA for a multi-line insurance company who manages Oracle, Sybase and SQL Server and is very experienced with virtualization in support of all his products, I/O can be more than double from a virtual machine. With this in mind, Mr. Kinsel and Brent Ozar (blog|twitter), the second non-Microsoft employee to be certified as a SQL Server Certified Master (MCM) in North America (I think that’s the claim), both recommend avoiding the use of raw LUN’s unless you experience severe I/O binding issues. Brent and Allen both recommend you get limited VSphere access in order to closely monitor I/O through the VMWare interface because the normal I/O measurements would not be accurate (Allen suggests buying the VM admin some beer, probably excellent advice). Brent goes on to outline a few other specific points. Set the minimum memory reservation on the VM to avoid the balloon driver telling SQL it’s out of memory. Brent suggests that you use as few virtual CPUs as possible. If the physical box averages 25% CPU load across four physical cores, then only add 2 virtual cores. Brent also suggests using IOMeter or SQLIO to test the configuration. If you’re maxing at 100-125mb/sec then you’re looking at an I/O bottleneck and will need to make adjustments to the system. Brent says that with an fiber channel SAN such as we use that we should be fine with I/O as long as we’re not using physical boxes in active/active multipathing and that we have more than 2 HBAs per physical box.
Denny Cherry (blog|twitter), a SQL Server MVP and consultant, just sent me a list of things to keep an eye on but all largely related to I/O
- Disk I/O
- CPU Queuing on the host
- Memory over commit
- Disk I/O over commit
- CPU over commit
Documentation
Several people, rather than send me stories, or in addition to the stories, simply pointed out documentation of which you should take advantage. First, and most important of these from the stand point of SQL Server and virtualization is that the Professionall Association of SQL Server users (PASS) has a special interest group dedicated to virtualization: http://virtualization.sqlpass.org
Disk Performance Information:
http://stef-bauer.com/2010/05/22/richmond-codecamp-disk-io-presentation/
San Multipathing (introductory): http://www.brentozar.com/archive/2009/05/san-multipathing-part-1-waht-are-paths/
SAN Performance Tuning with SQLIO
http://sqlserverpedia.com/wiki/san_performance_tuning_with_sqlio
Virtualization and SAN Basics
http://www.pragmaticworks.com/resources/webinars/April2010webinar.aspx
 Virtualization Information:
How Microsoft IT Reduces Operating Expenses using Virtualization http://blogs.msdn.com/b/nickmac/archive/2010/05/28/how-microsoft-it-reduces-operating-expenses-using-virtualization/
Virtualization Myths
SQL Server Performance on Vmware vSphere: http://blogs.vmware.com/performance/2009/06/sql-server-performance-on-vspehere.html
 Summary
That’s it. I won’t pretend that I understood everything. I’m passing it along in a semi-structured format as a public service. I also received some information from the MVP back channels, but I can’t publish that without permission. Suffice to say, while extremely useful, none of it changed the overall picture as represented by the information listed above.
[…] This post was mentioned on Twitter by Grant Fritchey, Ronald Dameron. Ronald Dameron said: RT @GFritchey: Virtualizing Servers in Production: The company where I work has been using virtualization in development, QA, tes… http://bit.ly/9w0WXw […]
fyi – your link to Jonathan Kehayias’ (look at that, spelled his name right on the first try) article should end in .aspx, not /
Thanks. Updated it.
Thanks for this post. Slowly collating information and documentation for SQL w/ Virtualization, please find a few more useful urls (hopefully havent repeated any). Credit goes to the two people that provided them in Linked in (Menno Bos and Terrell Jones), so thanks to you all (and of course the authors).
http://www.vmware.com/files/pdf/sql_server_virt_bp.pdf
http://www.microsoft.com/industry/professionalservices/casestudylibrary.mspx?casestudyid=4000006429
http://www.vmware.com/files/pdf/perf_vsphere_sql_scalability.pdf
http://jasonmassie.com/archive/2009/07/new-sql-server-on-vsphere-whitepaper/
http://jasonmassie.com/archive/2009/07/private-clouds-and-the-rdbms/
http://www.citrix.com/%2Fsite%2Fresources%2Fdynamic%2Fsalesdocs%2FSQL_Server2008_XS5_WP.pdf
I ran into interesting IO performance degradation issues when using the standard LSI driver in Windows for doing (busy) SAN based IO. With the right (wrong?) IO mix, it’s possible to get read perf degrade from 3000-4000 reads/sec with 10ms latency, to around 100-200 with 30-100s of ms latency.
After weeks of testing I was even able to reproduce the problem at will.
Only a guest OS reboot helps. Or using the PVSCSI driver with ESX 4. The difference between the standard LSI SCSI and the PVSCSI is the SCSI stack used in Windows: SCSIPort vs the newer StorPort stack.
Just so you know..
Thanks. Great information. Thanks for sharing.