Cleveland DBA

SQL Server administration in Cleveland, OH

Stuff I learned … yesterday – statistics

Yesterday was a pretty full day between training and the ONSSUG June meeting immediately afterwards, so I didn’t get a chance to blog.  Anyway, I did learn something yesterday, but it didn’t come from the class I’m taking.  Erin Stellato (blog | twitter) gave a great presentation at ONSSUG about statistics.  I really wish she’d had more time because it was simply crammed with good information. 

Anyway, two key things I learned:

  1. Multicolumn statistics – Just what they sound like, statistics over multiple columns.  These are created by default when you create a composite index, but you can also create them yourself without an index using the CREATE STATISTICS command.  Helpful when you have queries that filter on multiple columns where no index exists.
  2. Filtered statistics – You can also put a filter on your statistics.  This is useful when you have a very large table but normally only query on a small subset of rows.  If these rows are highly selective, creating filtered statistics will help the optimizer choose the best path.
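Both kinds are created with the CREATE STATISTICS command. A quick sketch of each (the statistics names, table, and filter predicate here are just for illustration):

```sql
-- Multicolumn statistics over two filter columns, no index required
CREATE STATISTICS st_City_PostalCode
ON Person.Address (City, PostalCode);

-- Filtered statistics, built only over the subset of rows you usually query
CREATE STATISTICS st_City_Recent
ON Person.Address (City)
WHERE ModifiedDate >= '20110101';
```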

For more information on these 2 features or on statistics in general, check out Books Online.


June 8, 2011 Posted by | Features, SQL Tuning | , | Leave a comment

How is SQL Server using all that memory, anyway?

I want to stick with the subject of memory because I think it’s probably the most misunderstood (and sometimes downright mysterious) component of SQL Server. I know I, for one, could benefit from a better understanding of its inner workings. So today I’d like to share that little bit of code I mentioned last week, which I find very handy for getting an overall picture of how memory is being used by a SQL instance. Personally I run it as a complete script, but I’ll break it down here and provide a little description of each section. I should say, too, that I didn’t write any of this from scratch. For most of it I started with code found here, here, and here, and modified it to suit.

So, without further ado, let’s get started. The first section provides a big-picture look at how SQL is currently allocating memory between the database page cache, procedure cache, and miscellaneous memory usage. It also provides the total memory usage, and I added in the maximum available workspace memory.


declare @plan_cache_size float, @obj_data_size float , @avail_workspace_size float

-- plan cache size

select @plan_cache_size = (cntr_value*8)/1024.0 FROM sys.dm_os_performance_counters
	WHERE object_name like '%Plan Cache%' and counter_name = 'Cache Pages' and instance_name = '_Total'


-- database page cache size

select @obj_data_size = (cntr_value*8)/1024.0  FROM sys.dm_os_performance_counters
	WHERE object_name like '%Buffer Manager%' and counter_name like 'Database pages%'

-- Maximum workspace available for sorts, hashes, etc

select @avail_workspace_size = (cntr_value)/1024.0 FROM sys.dm_os_performance_counters
	WHERE object_name like '%Memory Manager%' and counter_name = 'Maximum Workspace Memory (KB)'

select   @obj_data_size [Database_Page_Cache_MB]
 , @plan_cache_size [Procedure_Cache_MB]
 , [Misc_Memory_Usage_MB]
 , [Misc_Memory_Usage_MB] + @obj_data_size + @plan_cache_size [Total_Memory_Usage_MB]
 , @avail_workspace_size [Maximum_Workspace_MB]
from (
 select sum(cntr_value)/1024.0 [Misc_Memory_Usage_MB]
 from sys.dm_os_performance_counters
 where object_name like '%memory%'
 and (
   counter_name like '%Connection Memory (KB)%'
   or counter_name like '%Granted Workspace Memory (KB)%'
   or counter_name like '%Lock Memory (KB)%'
   or counter_name like '%Optimizer Memory (KB)%'
   or counter_name like '%SQL Cache Memory (KB)%'
 )
) x

The output:

The next section uses sys.dm_os_memory_cache_counters to examine how the procedure cache is being broken down.

-- How the Procedure cache is being used

SELECT
	LEFT([name], 20) as [name],
	LEFT([type], 20) as [type],
	([single_pages_kb] + [multi_pages_kb])/1024 AS cache_mb
FROM sys.dm_os_memory_cache_counters
order by single_pages_kb + multi_pages_kb DESC

The output:

The third part of the script goes deeper into the procedure cache and displays the top 25 cached plans by size. Because it displays some application code, I’m not going to include the results here.

-- Top cached plans by size

select top 25
  cacheobjtype
, ObjType
, (pagesUsed * 8192)/1024.0/1024.0 [space_used_MB]
, db_name(dbid) [database_name]
, object_name(objid, dbid) [object_name]
, [sql]
from master..syscacheobjects (nolock)
order by pagesUsed desc

And, finally, the last section shows us how much of the buffer pool is being utilized by each database, including the dirty and clean page counts, sorted by total memory.

-- Buffer Pool Memory Per DB

SELECT
(CASE WHEN ([database_id] = 32767) THEN 'Resource Database' ELSE DB_NAME (database_id) END) AS 'Database Name',
SUM(CASE WHEN ([is_modified] = 1) THEN 1 ELSE 0 END) AS DirtyPageCount,
SUM(CASE WHEN ([is_modified] = 1) THEN 0 ELSE 1 END) AS CleanPageCount,
count(*) AS TotalPageCount,
cast(count(*) * 8192.0 / (1024.0 * 1024.0) as decimal(8,2)) as BufferPoolMB
FROM sys.dm_os_buffer_descriptors
GROUP BY database_id
ORDER BY TotalPageCount desc

And the output:

And there you have it. I hope you find this useful!

May 2, 2011 Posted by | General, SQL Tuning, Troubleshooting | , , , | 1 Comment

Available Workspace Memory

My research into the Resource Governor question and its answer brought up another question for me:

SELECT * FROM sys.dm_exec_query_resource_semaphores

Where does that total_memory_kb number for the default pool actually come from? After all, I have a max server memory of 26000MB for the instance, so why 18175MB?  The BOL description for this field wasn’t very helpful, and googling for total_memory_kb didn’t yield much.  Most of what I found was unrelated to this particular DMV.

So I started looking at sys.dm_os_performance_counters, and there I finally found something that looked promising.

 select cntr_value FROM sys.dm_os_performance_counters 
WHERE object_name like '%Memory Manager%' and counter_name = 'Maximum Workspace Memory (KB)' 

If you add up the 2 total_memory_kb values for the default pool, you get the maximum workspace memory. Woot!
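If you’d rather not add them up by hand, here’s a sketch that does the math in one query (on my instance pool_id = 2 is the default pool; adjust if you’ve defined Resource Governor pools):

```sql
-- Sum the two semaphores' total_memory_kb for the default pool
SELECT SUM(total_memory_kb) / 1024.0 AS [Default_Pool_Workspace_MB]
FROM sys.dm_exec_query_resource_semaphores
WHERE pool_id = 2;
```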

That still didn’t tell me how that particular value was derived, but it gave me something better to search on. And that search turned up this article, which says (emphasis mine):

The SQL Server: Memory Manager object includes the following workspace-related PerfMon counters: Memory Grants Pending, which tells you the number of waiting queries; Memory Grants Outstanding, which counts the number of queries that have grants and are currently executing; Granted Workspace Memory (KB), which measures the total amount of grants SQL Server has made to running queries; and Maximum Workspace Memory (KB), which measures the maximum amount of memory SQL Server is currently willing to devote to workspace-memory grants. In general, SQL Server is able to devote between 25 and 75 percent of the buffer pool’s virtual memory to workspace memory. The value varies based on the demand for memory for other uses (such as database page cache and plan cache). If demand is low, SQL Server devotes a larger fraction of memory to workspace memory needs; if demand is high, SQL Server devotes a smaller fraction to workspace memory.
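If you’d rather pull those four counters from the DMV than from PerfMon, something like this should do it:

```sql
SELECT counter_name, cntr_value
FROM sys.dm_os_performance_counters
WHERE object_name LIKE '%Memory Manager%'
  AND counter_name IN ('Memory Grants Pending', 'Memory Grants Outstanding',
    'Granted Workspace Memory (KB)', 'Maximum Workspace Memory (KB)');
```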

So there you have it. I feel better, don’t you feel better?

That entire article is worth reading for a better understanding of SQL Server memory. In fact, it helped me put together a bit of code to show me how memory is being used by SQL Server. More on that in a future post (I’ve got to give you a reason to keep coming back, right?).

April 28, 2011 Posted by | General, SQL Tuning, Troubleshooting | , | Leave a comment

Composite indexes – Does column order matter?

When it comes to creating composite indexes, a common bit of advice we’re given is to put the most selective column first.  And I’ve always just taken this at face value; I’d never actually tested the theory myself.  But recently I got to thinking: does this rule really hold?

So let’s take a look.  I started by creating a test table in AdventureWorks using existing data.  My goal was to create a table with one high-cardinality column and at least one very low-cardinality column.

USE AdventureWorks;

CREATE TABLE myBigTable (
	ID int IDENTITY(1,1),
	surname nvarchar(50),
	location nchar(3)
);

INSERT INTO myBigTable (surname, location)
SELECT LastName, StateProvinceCode FROM Person.Contact CROSS JOIN Person.StateProvince;

SELECT    COUNT(distinct ID) AS DistinctIDs,
 COUNT(DISTINCT surname) AS DistinctNames,
 COUNT(DISTINCT location) AS DistinctLocs
FROM myBigTable;

Perfect.  ID is unique and location is very low cardinality.  I’ll use those 2 columns in my test indexes.

CREATE INDEX i_highcard ON myBigTable (ID, location);  -- high cardinality first
CREATE INDEX i_lowcard ON myBigTable (location, ID);    -- low cardinality first

Before we go any further, let’s take a look at the physical size of the indexes, just to see if there’s any difference there.

select name, index_id from sys.indexes where object_id = OBJECT_ID('myBigTable')
SELECT * FROM sys.dm_db_index_physical_stats (DB_ID(), OBJECT_ID('myBigTable'), NULL , NULL, 'LIMITED');

Judging by the fragment_count and page_count values, I’d say there’s no difference in size.  Now, on to query performance.  First we’ll query the table for a particular ID value.  I’ll use the INDEX hint to force the optimizer to use each index.

SELECT * FROM myBigTable WITH (INDEX = i_highcard) WHERE ID = 77321
SELECT * FROM myBigTable WITH (INDEX = i_lowcard) WHERE ID = 77321

Check out those execution plans.  When we query based on the first column in the index, the optimizer is able to do an index seek.  But when we query based on the second column in the index, as in the second query, the optimizer doesn’t know where to enter the index, so it’s forced to do an index scan.  (In fact, the optimizer even suggests we create an index on ID.)  We can also see the impact of these different execution plans in the IO statistics.

From the first query:
Table ‘myBigTable’. Scan count 1, logical reads 4, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

And from the second query:

Table ‘myBigTable’. Scan count 5, logical reads 9563, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table ‘Worktable’. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

That’s a lot more IO the database has to do.  And we all know, when it comes to IO, less is more.

Now, those results were pretty predictable.  But what about when both indexed columns are in the where clause?  Based on what I’ve been told, the index with the high-cardinality column first will be more efficient.  But I want to see that for myself.

SELECT * FROM myBigTable WITH (INDEX = i_highcard) WHERE ID = 77321 AND LOCATION = '04'
SELECT * FROM myBigTable WITH (INDEX = i_lowcard) WHERE ID = 77321 AND LOCATION = '04'

First, the IO stats:

Table ‘myBigTable’. Scan count 1, logical reads 4, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table ‘myBigTable’. Scan count 1, logical reads 4, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Exactly the same.  What about the execution plans?

Also the same.  Even down to the cost.

That surprises me.  I really thought the i_highcard index would be more efficient.  So, I’m back to my original question: when it comes to composite indexes, does column order matter?  And I guess the answer is: it depends.  SQL Server only maintains a histogram on the first column of an index, so having a highly selective leading column can be more efficient if you’re filtering on that column alone in your WHERE clauses.  But if you’re using both columns as filter criteria, it doesn’t seem to matter.  This is another reason why it’s extremely important to know your application and how it accesses the data.
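If you want to verify that for yourself, DBCC SHOW_STATISTICS displays the statistics for each index; note that the histogram only covers the leading column, while the density vector covers the column combinations:

```sql
-- Histogram is built on ID for i_highcard, and on location for i_lowcard
DBCC SHOW_STATISTICS ('myBigTable', 'i_highcard');
DBCC SHOW_STATISTICS ('myBigTable', 'i_lowcard');
```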

April 7, 2011 Posted by | General, SQL Tuning | , | 2 Comments

Plan guides and parameterization

We’ve all had this experience. A third party application is running a piece of sql that simply isn’t choosing the best query plan, and if only it would use a hash join instead of a nested loop. Or maybe it’s blocking other processes and a nolock hint would help. The problem is, you can’t change the code.

Or can you?

SQL Server plan guides effectively allow you to optimize performance of queries you can’t modify directly by attaching hints to them. Let’s say you have an application running the following query:
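For reference, here’s the statement in question (it’s the same query you’ll see passed to sp_get_query_template in Step 1 below):

```sql
SELECT E.EmployeeID, C.LastName, C.FirstName, D.GroupName, E.Title, P.PayFrequency, P.Rate
FROM HumanResources.Employee E
    inner join Person.Contact C on E.ContactID = C.ContactID
    inner join HumanResources.EmployeePayHistory P on E.EmployeeID = P.EmployeeID
            AND P.RateChangeDate =
                (SELECT MAX(RateChangeDate) FROM HumanResources.EmployeePayHistory P2
                    WHERE P2.EmployeeID = P.EmployeeID)
    inner join HumanResources.EmployeeDepartmentHistory DH on E.EmployeeID = DH.EmployeeID
            AND DH.EndDate IS NULL
    inner join HumanResources.Department D on D.DepartmentID = DH.DepartmentID
WHERE
    D.GroupName = 'Manufacturing' AND
    E.CurrentFlag = 1
```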


And after some testing, you decide it would run much better if it used MERGE JOINs.  So you add a hint.


That’s all well and good, but you can’t go into the application to add that hint.  And there’s the added complication that the sql uses literal values instead of variables, so the optimizer will see every execution of this query for ‘Manufacturing’ as completely different from an execution for ‘Quality Assurance’.  For one query plan to work for any value, you’ll have to parameterize the query.  Now, you could force parameterization at the database level, but that’s a mighty big hammer for this little nail.  So, instead we’ll create a plan guide using 3 steps.

  • Step 1: We’ll use sp_get_query_template to generate a parameterized version of this query
  • Step 2: We’ll force parameterization of any query that matches this form.  
  • Step 3: We can create the plan guide. 

Sound complicated?  It’s not.  Look:

/*    Step 1: Create a parameterized version of the query. */

DECLARE @stmt nvarchar(max)
DECLARE @params nvarchar(max)
EXEC sp_get_query_template
N'SELECT E.EmployeeID, C.LastName, C.FirstName, D.GroupName, E.Title, P.PayFrequency, P.Rate
FROM HumanResources.Employee E 
    inner join Person.Contact C on E.ContactID = C.ContactID
    inner join HumanResources.EmployeePayHistory P on E.EmployeeID = P.EmployeeID
            AND P.RateChangeDate = 
                (SELECT MAX(RateChangeDate) FROM HumanResources.EmployeePayHistory P2
                    WHERE P2.EmployeeID = P.EmployeeID)
    inner join HumanResources.EmployeeDepartmentHistory DH on E.EmployeeID = DH.EmployeeID
            AND DH.EndDate IS NULL
    inner join HumanResources.Department D on D.DepartmentID = DH.DepartmentID
WHERE
    D.GroupName = ''Manufacturing''    AND
    E.CurrentFlag = 1',
@stmt OUTPUT,            --try selecting these 2 variables to see how they get parameterized
@params OUTPUT

/*    Step 2: Force parameterization of any query that matches this form.  So, any time this 
    sql statement is executed, regardless of the literal values used, it will be parameterized.  
    Therefore all iterations of the query will look the same to the optimizer. This is critical to
    using plan guides.  */
EXEC sp_create_plan_guide N'MyTemplateGuide', 
	@stmt, 
	N'TEMPLATE', 
	NULL, 
	@params, 
	N'OPTION (PARAMETERIZATION FORCED)'

/*    Step 3: Now that we've parameterized the query, we can apply a plan guide to it.  In this
    instance, we're giving it a hint to use a MERGE JOIN */

EXEC sp_create_plan_guide N'MyPlanGuide', 
	@stmt, 
	N'SQL', 
	NULL, 
	@params, 
	N'OPTION (MERGE JOIN)'


Pretty straightforward, right?  And please note that if your query is already using parameters, you don’t need to do steps 1 and 2.  Now, let’s try that original sql again.  Notice the original code is now executing as though it has the MERGE JOIN hint:



You can see what plan guides are in your current database by querying sys.plan_guides.
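A minimal query against that catalog view:

```sql
SELECT name, scope_type_desc, is_disabled, query_text, hints
FROM sys.plan_guides;
```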



And you can disable/enable a particular plan guide using sp_control_plan_guide.
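For example, to disable and later re-enable the guide we just created:

```sql
EXEC sp_control_plan_guide N'DISABLE', N'MyPlanGuide';
EXEC sp_control_plan_guide N'ENABLE',  N'MyPlanGuide';
```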


And that’s it.  You’re a hero.

February 9, 2011 Posted by | SQL Tuning, T-SQL | , , | Leave a comment