Cleveland DBA

SQL Server administration in Cleveland, OH

The waiting is the hardest part

A developer came to me last week with a problem.  A relatively simple query within his application that normally takes under a second to run, is intermittently taking upwards of 30 seconds.  This, in turn, is causing his application to time out.  He said he could increase the timeout on his application to work around the problem, but wanted to know if I could help fix it, first.

As I said, the query itself is a very simple one, joining 3 tables, the largest of which only contains about a half a million records.  And the query itself only returns about a thousand rows.  So I started going through the usual suspects:

The Code:  The first thing I did was check out the query itself to ensure the developer wasn’t doing anything silly like applying functions to columns in the where clause, etc.  I also saw that he was using literal values, thereby ruling out parameter sniffing.

Blocking:  Given the intermittent nature of the problem, blocking initially looked like a viable cause.  However, the developer is using nolock on all tables in the query.  To make doubly sure, I used the blocked process threshold in conjunction with a server-side trace to track any blocking on the server longer than 10 seconds.  There was no blocking at all occurring in the related database. 

Statistics/Query plan:  Statistics are updated nightly on this database.  During the day, the data is static, so stats shouldn’t be the problem.  Another server-side trace confirmed that the query was using the exact same query plan for good runs as it did for bad runs. 

Waits:  That brings me to waits.  I decided to use sp_WhoIsActive to try to trap the problem when it happens and see what the process is waiting for.  The problem was actually catching it in the act.  Up to this point, by the time we were aware of the problem, it was gone.  To catch it, I’d need to have the procedure already running in the background, recording data. 

My first step was using the following script to create a table to hold the output.

DECLARE @wia_table VARCHAR(4000) ;
DECLARE @schema_out VARCHAR(4000) ;

SET @wia_table = 'DBA_Temp.dbo.WhoIsActive_' + CONVERT(VARCHAR, GETDATE(), 112) ;

EXEC sp_WhoIsActive
   @get_transaction_info = 1,
   @get_plans = 1,
   @get_additional_info = 1,
   @get_task_info  = 2,
   @RETURN_SCHEMA = 1,
   @SCHEMA = @schema_out OUTPUT ;

SET @schema_out = REPLACE(@schema_out, '<table_name>', @wia_table) ;

EXEC(@schema_out) ;

The next part was a script to record the sp_WhoIsActive output for the application login every 10 seconds.  Given that the bad executions run over 30 seconds, I figured that every 10 seconds would be sufficient to catch the problem and minimize overhead.  The script ends itself at 5pm.

DECLARE
    @wia_table VARCHAR(4000) ,
    @stoptime DATETIME

SET @wia_table = 'DBA_temp.dbo.WhoIsActive_' + CONVERT(VARCHAR, GETDATE(), 112)
SET @stoptime = cast(CONVERT(VARCHAR, GETDATE(), 112) + ' 17:00:00'  as datetime )

WHILE GETDATE() < @stoptime
    BEGIN
      
       EXEC sp_WhoIsActive
           @get_transaction_info = 1,
           @get_plans = 1,
           @get_additional_info = 1,
           @get_task_info  = 2,
           @filter = 'my_user', @filter_type = 'login',        
           @destination_table = @wia_table;

       WAITFOR DELAY '00:00:10'

    END ;
GO
 

And finally, I created a job to run both of these scripts, scheduling it to run daily at 7am.  Between this job, the server-side trace, and the blocking trace, I think I have all my bases covered.  The only thing left now is to wait… (Get it?  Wait??)

February 15, 2011 Posted by | Troubleshooting | , | 2 Comments

The obligatory first post

This past weekend I attended SQL Saturday here in sunny Cleveland.  And while there were probably more fun ways to spend a Saturday, I have to say I’m very glad I went.  There were some great sessions by Steve Jones, Thomas LaRock, Grant Fritchey, and Jeremiah Peschka and I learned a ton of information.

However, I would say that 2 things stand out as the most important lessons I came away with.

1) sp_whoisactive.  I don’t know where I’ve been.  Under a rock, apparently.  This stored procedure by Adam Mechanic is AWESOME.  Period.  I downloaded it this morning and within seconds was able to provide my users with the exact SQL from a couple of reports that were killing our system.  Out of the box, this procedure lists details about all active sessions, including the time since the last batch, the spid, full SQL text, login, CPU utilization, tempdb usage, disk and usage, and a lot more.  I haven’t even explored all the options yet.  This is my new sp_who, my new activity monitor, my new [insert name of pretty third party tool here].  If you haven’t checked it out yet, do it now!

2) This one is a bit embarrassing.  I haven’t updated my resume since… ok I’m not going to say exactly when.  Suffice it to say it’s been years.  Years.  There, I said it.  My head hangs in shame.  Why so long?  I don’t really have a good explanation.  I’ve been at the same employer for over 10 years.  I’m not looking to leave at this point.  I realize that just because I have no intention of leaving, that doesn’t mean my employer might not have other ideas.  And the longer I leave it, the more difficult the task becomes.  Key points get forgotten.  So the second important (and probably the *most* important) thing I took away from SQL Saturday is that I have to make more of an effort in building and maintaining a “brand” for myself, which includes keeping my resume up to date.
Which leads me to this blog.  This is more for me than anyone else.  A place to document things I know, things I learn, questions I face, and maybe even help someone else.  And this is my first post, done.  Glad we got that out of the way.

February 8, 2011 Posted by | Professional Development | , , | Leave a comment