Examining PAGEIOLATCH waits
In a previous article we looked at CXPACKET waits, a wait type that is can be on the top of your server’s list if you enabled parallelism using the MDOP setting.
Another wait type that you probably will be seeing in your top 10 of waits (unless you have blazing fast storage) is PAGEIOLATCH_XX where the XX is the type of latch used.
In this article we will take a look at the various PAGEIOLATCH types and what they can mean for your system. Before we can start diving directly into the waits we will take a look at what latches actually are.
Microsoft describes latches as “very lightweight, short-term synchronization objects protecting actions that need not be locked for the life of a transaction”. What is very important to remember about latches is that they are not locks! Even though they share some of the same types as locks (like SH or EX) they do not play a role in isolating you transaction.
There are a lot of different type of latches SQL Server uses, you can look them up by querying the sys.dm_os_wait_stats DMV for all Buffer related latches and sys.dm_os_latch_stats for all non-buffer related latches:
SELECT * FROM sys.dm_os_wait_stats WHERE wait_type LIKE '%LATCH%' SELECT * FROM sys.dm_os_latch_stats
Since we are focusing on PAGEIOLATCH waits in this article I’ll explain what role they play on SQL Server.
A PAGEIOLATCH wait occurs when a user requests a page that is not in the buffer cache. SQL Server uses a PAGIOLATCH_EX on a free space in the buffer cache and transfers the requested page from disk into memory. Next to the PAGEIOLATCH_EX a PAGEIOLATCH_SH is placed on the place in the buffer cache so the user can read the page from the buffer after it has been transferred there and the PAGEIOLATCH_EX is released. The PAGEIOLATCH_SH is removed after the user is finished reading the page from the buffer cache.
Next to PAGIOLATCH_SH en PAGIOLATCH_EX you can see the following types of PAGEIOLATCH:
A task is waiting to destroy a page from disk.
A task is waiting to update a page from disk.
A task is waiting to receive another latch.
Internal undocumented latch.
High PAGEIOLATCH waits
As I said in the introduction it is pretty common to see high PAGEIOLATCH_XX waits when you are looking at your wait statistics. As you read in the part above PAGEIOLATCH_XX waits occur when SQL Server has to move a page from disk to the buffer cache. So your main troubleshooting focus when looking at PAGEIOLATCH waits will be on the IO subsystem. Keep in mind though that the problem doesn’t necessarily mean your IO subsystem is performing badly! The following reasons can show high PAGEIOLATCH waits even though your IO subsystem is working just fine:
- You restarted the SQL Server service, this means the buffer cache is empty and has to be filled again. All those pages are being read from disk and will result in high PAGEIOLATCH waits.
- Memory pressure, when you don’t have enough free memory available for the buffer cache pages will be swapped out quickly and replaced.
- Bad indexing or outdated statistics, both of these can result in more PAGEIOLATCH because, for instance, more table scans are occurring and thus more pages are being read from disk to the buffer cache.
As always make sure you have a baseline for your SQL Server system when it is functioning normally. If you see high PAGEIOLATCH waits compare it to that baseline and check if it really needs your attention. Just querying sys.dm_os_wait_stats and running to your storage admin isn’t usually a good idea!
The following Perfmon counter can help you check where the PAGEIOLATCH waits are coming from:
- Avg. Disk sec/read
This will show you the read latency for getting data from the IO subsystem, values higher then 10ms for a longer period of time are an indication that you might be running into storage issues.
- Avg. Disk sec/write
This will show you the write latency for writing data to the IO subsystem, values higher then 10ms for a longer period of time are an indication that you might be running into storage issues.
- Page Life Expectancy
Compare it to your ‘normal’ baseline. If the values start to drop drastically you might be running into memory related problems. Please ignore the 300 second rule Microsoft recommend. We have servers these days with 10-20 times more memory as when they first said PLE should always be above 300 second.
As I cannot stress this enough, always compare these counters with a baseline value of when your system is functioning normally!
Just as a last tip before you complete this article, go and download the “SQL Server 2005 waits and queues” whitepaper! Even though it is written for SQL Server 2005 much of it still applies today. It will give you a much deeper understanding of waits and latches!
Need more info on how wait statistics are collected, I wrote about that one as well.