Normally I’m not a “TGIF” kind of guy. I enjoy work, and I love digging into the databases to make things better. This week we rolled out some new technology to virtualize some of our systems. This was the second round of these upgrades; the first went off without a hitch.
This week, after the new hardware was installed and configured and the systems were virtualized, we noticed a severe drop in performance. It was one of those situations where every theory had an outlier that kept it from being the right one! Two of the three virtualized servers were bad, and one was fine. Two of the three systems we left alone were fine, and one was not. The SQL Server instance we hadn’t touched was now crawling, yet it showed almost no waits and almost no I/O.
I LOVE the vast amount of information available to us in times of crisis. Even though everything was going crazy, my boss wanted me to figure out why SQL Server was not running well. We have a few batch processes that copy data from one database to another, and they were backed up. We couldn’t understand why they were slow, given that both databases were on the same instance and neither had been changed.
The first thing I did was check Activity Monitor. The overview showed roughly:
Avg % Processor Time (< 15%)
Waiting Tasks (2-5)
Database I/O (0-2 MB/sec)
Batch Requests/sec (< 2000)
Everything looked fine. On to waits.
I can’t thank Brent Ozar enough for being the go-to guy in all situations, and in this case I relied on http://www.brentozar.com/sql/wait-stats/ as my guide. Reading through his list of resources and running some of the queries it links to helped me get to the heart of what the server was doing.
I also used Paul Randal’s “Wait statistics, or please tell me where it hurts,” which, combined with Brent’s guides and scripts, allowed me to tell my boss that I was 90% sure NOTHING was wrong with SQL Server. My theory was that the batches weren’t getting TO SQL Server.
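Wait-stats analysis of this kind is built on sys.dm_os_wait_stats, whose counters are cumulative since the instance last started, so a common approach is to take two snapshots a few minutes apart and look only at the difference. Here is a minimal Python sketch of that diff step. It assumes each snapshot has already been read into a dict mapping wait_type to wait_time_ms. The function name wait_deltas and the short BENIGN_WAITS set are illustrative only and are not taken from either author’s scripts, whose exclusion lists are much longer.

```python
from typing import Dict, List, Tuple

# Illustrative subset of idle/benign wait types that wait-stats queries
# typically exclude. Real exclusion lists are far longer (assumption: this
# short set is only for demonstration).
BENIGN_WAITS = {
    "SLEEP_TASK",
    "LAZYWRITER_SLEEP",
    "SQLTRACE_BUFFER_FLUSH",
    "BROKER_TASK_STOP",
    "WAITFOR",
    "REQUEST_FOR_DEADLOCK_SEARCH",
}


def wait_deltas(
    before: Dict[str, int], after: Dict[str, int]
) -> List[Tuple[str, int, float]]:
    """Hypothetical helper: diff two wait-stats snapshots.

    `before` and `after` map wait_type -> wait_time_ms, as read from
    sys.dm_os_wait_stats at two points in time. Returns
    (wait_type, delta_ms, percent_of_total) rows, largest delta first,
    ties broken alphabetically by wait type.
    """
    deltas: Dict[str, int] = {}
    for wait_type, after_ms in after.items():
        if wait_type in BENIGN_WAITS:
            continue  # skip waits that are normal background noise
        delta = after_ms - before.get(wait_type, 0)
        if delta > 0:  # only waits that accumulated during the interval
            deltas[wait_type] = delta

    total = sum(deltas.values())
    # If nothing accumulated, deltas is empty and no division happens.
    rows = [(wt, d, 100.0 * d / total) for wt, d in deltas.items()]
    rows.sort(key=lambda r: (-r[1], r[0]))
    return rows
```

An instance in the state described above, crawling but with almost nothing accumulating between snapshots, would come back from a diff like this with an empty or near-empty list, which is one way to back up the “nothing is wrong inside SQL Server” conclusion.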
The network guys couldn’t find anything wrong with the network.
In the end we reverted the changes, restoring the virtualized servers to their physical machines, and everything returned to normal. Obviously we want to go forward with the virtualization, but first we’ll have to figure out what went wrong.
So all in all, it was a long, semi-stressful day. Even though I was 90% confident I was right, I was terrified of the 10% chance that I had missed something. So here it is Friday, a weekend at the camp with beer and hopefully some golf is looming, and I can’t wait to get to it!