How SQL Server statistics are generated and updated
One of the easiest ways to increase query performance on your database is making sure your statistics are up-to-date. Statistics are – if you enabled the options in your database properties – automatically created and updated by the SQL Server Engine. This does not however mean that the statistics are correct or up-to-date! In this article I will describe how statistics are being generated, updated, and how you can check if they are up-to-date on your server.
How statistics are generated and updated
Statistics are used by the query engine of SQL Server to generate an execution plan. Depending on the statistics SQL Server can decide to use different types of operators to retrieve the data you requested. As you can imagine, having out-of-date or incorrect statistics can lead to a bad execution plan which slows down query execution!
Statistics are normally generated automatically by SQL Server if you have the Auto Create Statistics option set to True in your database properties or you can add statistics yourself. So when does SQL Server create the statistics? To find out I created a new – very basic – test table in the AdventureWorks database:
USE [AdventureWorks2012] GO CREATE TABLE Stat_Test ( c1 int, c2 int, ) GO
If we open up this table in SQL Server Management Studio and look at the statistics folder we will see that SQL Server didn’t create any statistics:
A way to get SQL Server to generate statistics is by querying a column that doesn’t have any statistics generated yet. I wrote the query below to select a value from the c2 column (after inserting 10 rows):
INSERT INTO Stat_Test (c1, c2) VALUES (1,1),(2,2),(3,3),(4,4),(5,5),(6,6),(7,7),(8,8),(9,9),(10,10) GO -- Select a record SELECT c2 FROM Stat_Test WHERE c2 = 5 GO
If we refresh our table in SSMS we will see that SQL Server added new statistics:
The WA actually stands for Washington, the Sys for System so we know this statistic was generated by SQL Server.
Our current statistics date can be found using the DBCC SHOW_STATISTICS function:
In this case SQL Server automatically generated our statistics on May 16 2014 at 1:30PM when we first ran our query against the table. As you can see the number of rows we have in our table is the same as the number in the statistics.
Now that we know when statistics are automatically being generated by SQL Server, when does SQL Server update them? This Microsoft article has the answer:
- The table size has gone from 0 to >0 rows (test 1).
- The number of rows in the table when the statistics were gathered was 500 or less, and the colmodctr of the leading column of the statistics object has changed by more than 500 since then (test 2).
- The table had more than 500 rows when the statistics were gathered, and the colmodctr of the leading column of the statistics object has changed by more than 500 + 20% of the number of rows in the table when the statistics were gathered (test 3).
So for automatically generated statistics on our c2 column this would mean they would only get updated after more then 500 changed since our table is smaller then 500 rows (only 10).
Let’s add 100 more rows to our table:
INSERT INTO Stat_Test (c1, c2) SELECT MAX(c1)+1, MAX(c2)+1 FROM Stat_Test GO 100
If we take a look at our statistics now using DBCC SHOW_STATISTICS:
We can see our statistics aren’t automatically updated by SQL Server, which means the execution engine will receive wrong information about the number of rows in our table. This is still working as intended since we didn’t do 500 changes do our table, we just added 100 more rows.
If we look at the Execution plan for the query below we can see the old statistics are still being used:
SELECT c2 FROM Stat_Test WHERE c2 BETWEEN 50 AND 90 GO
The Estimated Number of Rows is 1 even though the actual number of rows is 41.
Let’s update the statistics manually by running UPDATE STATISTICS and check if anything has changed:
UPDATE STATISTICS Stat_Test _WA_Sys_00000002_2882FE7D GO
As you can see the statistics are updated.
If we run the query above again and look at the execution plan we hope to see a difference:
SQL Server now estimated 40 rows which is a lot closer to the actual number of rows then our previous plan!
Now you can probably work out what kind of impact outdated statistics can have on your queries. In this case we used a very simple query against a very small table, but imagine having complex queries again tables with millions of rows!
Another problem is when SQL Server automatically updates statistics. Imagine have a table with 1.000.000 rows, in this case there would need to be 500 + 20% (200.500) row changes in this table before statistics would get updated! This can, and probably will, slow down query performance against this table eventually until statistics are updated again.
It will probably be a good idea to perform statistics maintenance on a regular interval on tables that are very large and thus need a large amount of changes before they are updated. You can do this using the SQL command above or by running sp_updatestats which updated the statistics for every user table in your database.