Home, Cost-Based Optimizer, Benchmarks, Server Systems, Systems Architecture, Processors, Storage,
  Storage Overview, System View of Storage, SQL Server View of Storage, File Layout,
  PCI-E, SAS, FC, HDD, SSD, RAID Controllers, Direct-Attach,
  SAN, Dell MD3200, CLARiiON AX4, CX4, V-Max, HP P2000, EVA, P9000/VSP, Hitachi AMS

IO Queue Depth Control

I have complained about lack of queue depth control in SQL Server IO before and I am complaining again. I fully understand the desire to restrict the number of knobs, which used incorrectly can cause more harm than good. The main point about IO queue depth is that there is not a single right answer for all circumstances but rather depends on both the hardware environment and the purpose.

One example is a transaction processing server that is also used for reports, or perhaps queries that behave as a report. The transaction may involve a series of queries each for a single or few rows. These would be issued synchronously at queue depth 1.

A report-type query such that the estimated involves 25 or more rows would issue IO asynchronously at high queue depth. In principle, this would increase the storage subsystem throughput by allowing the hard disk to employ elevator seeking. But it also increases latency.

The report query completes faster despite the higher latency because it issues multiple IO at once. The transaction query issues 1 (sequence of) IO, and waits for it to complete before issues the next.

So why complain now? I am testing a storage system for a data warehouse environment. I noticed that for a table scan, SQL Server was not issuing IO at sufficient high queue depth when the data files are distributed over 16 or more volumes (RAID groups). SQL Server issues 512K IO at high queue depth for 8 or fewer volumes. The onset of high queue depth IO seems to occur when degree of parallelism (DOP) is equal or higher than the number of files.

The throughput on each volume can be up to 600MB/s, the volume being comprised of six 10K HDDs in one case and six 7.2K HDDs in another. With 8 volumes spread over 4 RAID controllers, it is possible to for a table scan to generate IO at approximately 4GB/s.

Each RAID controller, HP P812 (this is an older model) in this case, can support 2GB/s IO. This is rather low considering the PCI-E slot is gen 2 and x8 wide, supporting 3.2GB/s. This might be because the memory on the RAID controller is 64-bits (8-byte) wide and operates in 800MHz for a nominal bandwidth of 6.4GB/s. That's a lot right? Well not if a disk read is written to memory first, then read back to be sent to the server (where the memory write-read sequence happens again).

SDRAM, including DDR derivatives, write at half the read rate. So the effective write-read throughput is one-third of the memory bandwidth. LSI controllers have similar (or less) bandwidth to external memory, but can drive higher bandwidth. There may be some other effect, perhaps the controller's internal memory?

Anyways, with 4 controllers each capable of 2GB/s, the whole system should support 8GB/s? Why 4GB/s? It turns out there is a setting in Resource Governor named REQUEST_MAX_MEMORY_GRANT_PERCENT which can "improve I/O throughput" at high degree parallelism. (oh yeah, the applicable DMV is dm_resource_governor_workload_groups) Well simply issuing IO at sufficient queue depth so that all volumes are working will do this. There is no reason to be cryptic.

In summary. there is no a priori single correct setting on queue depth and IO size for any or all circumstances and certainly not for different hardware configurations. To favor any specific purpose, SQL Server must know how many hard disk drives (or SSDs) that each volume is comprised of, and whether these are dedicated or part of a shared pool. The operating system cannot see beyond what the RAID controller presents, so we must provide this information manually.

Next, SQL Server must know what the mission is. Is the mission to keep latency low for transactions? Or is the mission to flood the queues for maximum throughput in batch processing? Or something in between? It is also disappointing that SQL Server has only a single table scan IO size, such that any RAID volume not at a specific number of disks x RAID stripe size is sub-optimal.

Below is the disk IO generated by the TPC-H LINEITEM table scan after DBCC DROPCLEANBUFFERS, SQL Server already has memory allocated from the OS (see previous post on Lock Pages in Memory)