In a later blog, I want to talk about parallel execution plans. Before jumping into that topic, it is helpful to first discuss some points on interpreting execution plans. So here goes.
The purpose of the cost based optimizer is to determine the best (or a very good) table access strategy in terms of which index to use or not, what join types and what join order. It is not necessary to estimate the true cost, particularly the scalar operations that do not affect the table access strategy. The optimizer estimates row counts involved from data distribution statistics when parameter values are known and uses rules when no estimate is available.
On executing a query with the Show Execution Plan option, the actual row count is shown along with the estimate. The costs are shown without the estimate qualifier as if it were based on the actual row count, but are actually still based on the estimated row counts.
The cost detail for each component operation in an execution plan (displayed when the mouse cursor is held over that component) consists of IO and CPU contributions. The full cost of a specific operation also factor in the number of executes. The number of executes applies to bookmark lookup and loop join inner source accesses. In SQL Server 2000, this is one of the lines in the cost detail. It is not displayed in SQL Server 2005, but can be obtained as the number of rows from the outer source going into the operation. (My mistake. It is Query Analyzer that shows the estimated number of executions whether connected to SQL Server 2000 or 2005, and SQL Server Management Studio that does not.)
The full cost of an operation should be the number executes multiplied by the CPU cost, plus a more involved formula for the number of IO required. The formula for IO represents the probability that an IO will already be in memory after a number of pages have already been accessed. For large tables, it also models the chances that a previously accessed page may have already been evicted when it is needed again. The sub-tree cost represents the cost of the current operation plus all operations that feed into the current operation.
The unit of measure for execution plan costs is buried in the documentation under the topic of the query governor cost limit: “Query cost refers to the estimated elapsed time, in seconds, required to complete a query on a specific hardware configuration.” Since the basic architecture of this cost model originates in SQL Server version 7, with significant changes in version 2005, the reference system might be a Pentium Pro. It might also be implied that the execution plan IO cost refers only to time spent waiting for IO, and not CPU time consumed. In actuality, the IO request to disk by itself has non-negligible CPU cost.
The IO cost model needs to reflect disk performance, not CPU performance. Small block random IO has a cost of 0.003125 which works out to 320 IOPS. Sequential IO has an incremental cost of 0.00074074 per 8KB page for 1350 pages per second, or 10.5MB/sec. It is uncertain what disk configuration this is supposed to reflect. The random IOP could be 4 x 7200RPM disks of 1995 vintage, which had sequential transfer rate of around 5MB/sec. A recent generation 15K drive can do 200-300 random IOPS depending mostly on queue depth and actual average seek time as the data is accessed, and 95MB/sec sequential (125MB/sec for the very latest models). A SAN has very different sequential performance characteristics, sometimes 10-30MB/sec per disk.
For non-parallel queries, the individual cost values for random and sequential IO used by the internal optimizer is not hugely important except for the ratio. Because the IO component is much larger than the CPU component, the index and bookmark lookup versus table scan transition point depends predominantly on the ratio of the random to sequential IO cost. It does not really make a difference if the IO cost is 320 IOPS random and 10.5MB/sec sequential or 640 random and 21 sequential, if the ratio is the same. In essence, this model says 320 bookmark lookups (adjusted) has equal IO cost to 1350 pages (10.5MB) in a table scan.
For a perfectly laid out table and file system, the latest 15K drive can scan over 13,500 pages for every 200-300 bookmark lookups or loop join inner source seeks. A direct attach disk array with 15 drives on one PCI-E SAS controller can support table scans at nearly 800MB/sec (100,000 pages/sec.) compared with 3000-4000 random IOPS. A SAN with 30 disks on one 4Gbit/s FC port can support 330MB/sec sequential or 6000-8000 IOPS random, which is close to the SQL Server internal cost model.
So far, it has been assumed that leaf level pages mostly reside on disk and is not cached. If most data is in memory as is the case for many memory heavy transactional systems today, there is zero or very little true IO cost. The actual CPU cost of one bookmark lookup or loop join with index seek on the inner source is approximately equal to the cost of one page in a table scan operation excluding row lock costs (SQL Server 2000 took a row for every row in a small table scan, with serious performance penalties relative to page, table or nolock).
The main point here is that the actual storage system performance can have a very different characteristic than the optimizer model. The direct attach storage configured for high bandwidth can support a table scan at ten times the sequential to random IO ratio assumed by the optimizer. When all data is in memory, the true cross over point for switching from index seek plus bookmark lookup (or loop join) to a tables is four times higher the other way than assumed by the optimizer.
Conceivably, the optimizer could check to see how much of a table is in memory before making a decision, but if this check occurs to frequently, it could cause excessive recompiles, which could be worse. Still, it would be helpful to have a system setting for the random to sequential ratio and a query hint for using an in-memory assumption (1 lookup equals 1 page scan). There is a trace flag T2301 described as: “Trace flag to enable more accurate query run-time behavior modeling in the SQL Server query optimizer”.
SQL Server 2008 also has the new query hint: FORCESEEK: “The FORCESEEK table hint forces the query optimizer to use only an index seek operation as the access path to the data in the table or view referenced in the query”. When a child should probably be given a croquet mallet, Microsoft gives a sledgehammer. Still, I do very much like hints that let optimizer chose the join order. I have used the Loop Join hint to get the execution plan to use an index, but this requires that I explicitly write the query for the best join order. It is time consuming to explicitly write a query that joins two tables and then feed into another join as an inner source, but what the heck, I get paid by the hour. I do not like to use index hints because I might want the change the index name later.
A side issue is that the relatively high IO costs gives a distorted impression of execution plan cost structure to any one not completely familiar with the true cost structure. The true CPU cost of a query is more substantial.
Below is the cost detail displayed by Query Analyzer connected to SQL Server 2005 for a Loop Join inner source index seek. Notice the Estimated number executes line below Estimated CPU cost.
The same cost detail from SQL Server Management Studio. Estimated number of executes is not displayed.
The number of executes is simply the number of rows from the loop join out source.