
Why Logical IO is a poor performance metric

Many years ago, I attended a conference where a very highly respected Oracle performance expert advocated the use of the logical IO count as a performance metric. The argument was that a logical IO is a fairly substantial operation relative to others. I suppose the fact that detailed logical IO counts were already being collected supported this. So why would logical IO be advocated over CPU? On the Windows side, many people have probably noticed that Profiler frequently shows 0 CPU for low-cost queries.

(The resolution of Profiler in SQL Server 2000 was some approximate integer multiple of 10 ms for the single-processor kernel and 15.625 ms for the multi-processor kernel. It does turn out that averaging many calls yields a reasonably accurate CPU measure, possibly down to the 1 ms range.)

I am not sure what the situation is for Oracle, which runs on many platforms: UNIX, Windows, and other operating systems. It cannot be easy to collect performance counters whose meaning is uniform across all the different platforms.

In any case, the argument was convincing enough that I thought it warranted investigation on the SQL Server side. By comparing logical IO and average CPU (measured in full-saturation load tests where SQL Server is driven to near 100% CPU), it quickly became apparent that logical IO and true cost had no meaningful relation.

Consider a table whose indexes have a depth of 4, so that there is a root level, 2 intermediate levels, and the leaf level. An index seek for a single index key (that is, no key lookup) generates 4 logical IO (LIO). Now consider if a key lookup is required. If the table has a clustered index, also of depth 4, each key lookup generates 4 LIO. If the table were a heap, each key lookup generates 1 LIO. In actuality, the key lookup to a heap is about 20-30% less expensive than a key lookup to a clustered index, nowhere close to the 4:1 LIO ratio.

Now consider a loop join between two tables. The index seek on the outer source generates 4-5 LIO: 4 for the index depth, and possibly one or more IO for additional rows that do not fit in the same leaf-level page. Each index seek on the inner source generates 4 LIO as the index is traversed. For a loop join involving 100 rows in a one-to-one join (no key lookups), there are 404 or so LIO. Now consider a hash or merge join for the same number of rows (with an explicit SARG on each source): there are 4-5 LIO against each table, for a total of 8-10 LIO. In actuality, the cost of the 3 join types at this row count is not far different (the hash join has a higher startup cost than a loop join, but a lower cost per incremental row), yet the LIO ratio is 404:10! A quick way to observe this yourself is sketched below.
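A minimal sketch of the experiment, assuming two hypothetical tables Outer100 (100 rows) and Inner1M (1 million rows), each indexed on ID; the table and column names are illustrative only:

SET STATISTICS IO ON;
SET STATISTICS TIME ON;

-- Loop join: a range seek on the outer source plus ~4 LIO per inner-side seek,
-- so roughly 100 x 4 = 400+ logical reads reported against Inner1M
SELECT o.ID, i.Col1
FROM dbo.Outer100 AS o
INNER LOOP JOIN dbo.Inner1M AS i ON i.ID = o.ID
WHERE o.ID BETWEEN 1 AND 100;

-- Hash join over the same rows, with a SARG on each source:
-- one range seek per table, a handful of logical reads in total
SELECT o.ID, i.Col1
FROM dbo.Outer100 AS o
INNER HASH JOIN dbo.Inner1M AS i ON i.ID = o.ID
WHERE o.ID BETWEEN 1 AND 100
  AND i.ID BETWEEN 1 AND 100;

The messages output from SET STATISTICS IO shows the per-table logical reads, while SET STATISTICS TIME shows that the CPU times are much closer than the LIO counts would suggest.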

The SQL Server Cost Based Optimizer itself clearly does not use LIO for its cost estimation. Just compare the estimated cost of an index seek against tables and indexes with different index depths: the estimated cost is the same, which is approximately true of the actual cost. After assembling a very deep collection of SQL Server performance measurements, I went back to the Oracle expert to ask about the LIO argument. He said that in the intervening time, a huge amount of evidence on the Oracle side had also indicated that LIO is a poor metric, and it had been abandoned.

The SQL Server 2005 DMV dm_exec_query_stats keeps CPU statistics at millisecond resolution, and at microsecond resolution in SQL Server 2008. I strongly suggest people abandon LIO as a performance metric. Use CPU (total_worker_time) for non-parallel execution plans. For parallel plans, the DMV reports 1000, so I suppose one must revert to Profiler traces for parallel execution plans. (If anyone knows how to get parallel-plan CPU from a DMV, please advise.)
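As a sketch of what I mean, aggregating total_worker_time from the plan cache is straightforward; the statement-text extraction is the usual offset arithmetic, and total_worker_time is reported in microseconds (accurate to milliseconds on SQL Server 2005, per the above):

SELECT TOP (20)
    qs.total_worker_time,
    qs.execution_count,
    qs.total_worker_time / qs.execution_count AS avg_worker_time,
    SUBSTRING(st.text, (qs.statement_start_offset / 2) + 1,
        ((CASE qs.statement_end_offset
              WHEN -1 THEN DATALENGTH(st.text)
              ELSE qs.statement_end_offset
          END - qs.statement_start_offset) / 2) + 1) AS statement_text
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
ORDER BY qs.total_worker_time DESC;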

Note also the other post, which shows that physical IO has a completely different cost structure than logical IO.

What you are saying is that you use LIO to assess the guilty party, as if it were some resource that may not be related to CPU. What I said is that LIO is not a meaningful measure of any guilt: not PIO, not memory, not CPU, and those are most of the resources that have meaning.

Consider 2 queries: one does 10,000 key lookups to a clustered index, the other does 10,000 to a heap. The first generates 40K LIO, the second 10K. Both have the same likelihood of generating PIO, both generate the same memory usage, and the second actually has 20-30% lower CPU, yet LIO counting assigns 4X the guilt to the first!
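A minimal sketch of that comparison, assuming two hypothetical copies of the same data: ClusT with a clustered index of depth 4 and HeapT as a heap, each with a nonclustered index on Col1 (all names illustrative); the index hint forces the lookups:

SET STATISTICS IO ON;
SET STATISTICS TIME ON;

-- 10,000 key lookups into the clustered index: ~4 LIO each, ~40K logical reads
SELECT Col2
FROM dbo.ClusT WITH (INDEX (IX_ClusT_Col1))
WHERE Col1 BETWEEN 1 AND 10000;

-- 10,000 RID lookups into the heap: 1 LIO each, ~10K logical reads,
-- yet CPU is only about 20-30% lower, not 4X lower
SELECT Col2
FROM dbo.HeapT WITH (INDEX (IX_HeapT_Col1))
WHERE Col1 BETWEEN 1 AND 10000;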

If you are after PIO, then go after PIO, not LIO. Even if you are after memory: after fixing the big CPU consumers, it is very likely you will have fixed the memory consumers as well.

The main resource to watch is CPU: if a query finishes quickly with minimal CPU, that is being a good citizen, and LIO has no bearing on it. (Duration is a good counter to watch after CPU has been brought down.) There are other examples of good citizenship, but again, LIO count has no bearing.

That's not always the case: it depends on the IO type (sequential or random), for example. A classical case of the opposite is scan versus seek + lookup. The optimizer usually prefers a scan over seek + lookup when more than 1-1.5% of the table should be returned. In such a case the scan consumes more IO, but the overall duration is lower.

Sometimes people just want to be difficult. The situation is: there is a reliable metric (CPU) and an unreliable metric (LIO) available. Why do people gravitate to the unreliable metric? Perhaps it works in most cases, but if you tune to CPU, it works in all cases! So why work with the metric that is not reliable?

If a query averages 10,000 LIO, nobody has any idea what that means, because we do not know the source of the LIO. If CPU is 100ms, I can guarantee you that the upper bound on your system is 10 calls/sec per core, and if you have 8 cores, the upper bound is 80/sec (more likely 20-30% lower than 80). If you tune a query and reduce it by 10 LIO, you don't know how much of a gain you made. If you tune it from 100ms down to 10ms, you made a 10X gain. And if you are doing tuning, it probably means you are changing the execution plan; when you change the execution plan, LIO from one plan has no relation to LIO in a different plan!

If you want another example: suppose a query involves a loop join for 10,000 rows (outer source) and the index depth on the inner source table is 4. This means the query generates about 40K LIO. Suppose the inner source table is 30K pages in size (240MB) and you force a hash join; then the LIO will be around 30K. Did you improve it? By LIO, yes; by CPU, no! The loop join for 10K rows probably costs around 50 CPU-ms. The hash join with the scan (mostly from the scan) will cost more.
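A hedged sketch of this experiment, reusing the hypothetical Inner1M table from above (assume it occupies roughly 30K pages) plus a hypothetical Outer10K table of 10,000 rows:

-- Loop join: ~10,000 x 4 = ~40K logical reads, but modest CPU
SELECT o.ID, i.Col1
FROM dbo.Outer10K AS o
INNER LOOP JOIN dbo.Inner1M AS i ON i.ID = o.ID;

-- Forced hash join: the inner side becomes a ~30K-page scan,
-- so LIO drops to ~30K while CPU and duration go up
SELECT o.ID, i.Col1
FROM dbo.Outer10K AS o
INNER HASH JOIN dbo.Inner1M AS i ON i.ID = o.ID;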

What an ideal world! If you can go to sys.dm_exec_query_stats for LIO, why is it so hard to aggregate total_worker_time? So why the stubborn, persistent clinging to the less reliable indicator?

I will talk about hash vs loop join in a later post. Per my earlier post, the 1-1.5% idea is another peeve. The SQL optimizer does not use an x% rule, so we should not tell people that it does. The simplified version of the rule the optimizer does use is: each incremental sequential IO page costs 0.00074074 sec, or 1350 pages/sec, which works out to 10.5MB/sec. Each "random" IO in a key lookup costs 0.003125 sec, or 320 IOPS, with a method for estimating how many IOs are required per row. For a large number of rows, that estimate is usually 90+% (close to one IO per row), so essentially the index seek + key lookup versus scan decision is made on the 1350 : 320 ratio (4.2:1), which is the ratio of scan pages per second to key lookups per second.
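The arithmetic behind those constants, as a quick check (8KB pages assumed):

-- sequential: 1 / 0.00074074 sec/page ~ 1350 pages/sec; 1350 x 8KB ~ 10.5 MB/sec
-- random:     1 / 0.003125 sec/IO ~ 320 IOPS
-- ratio:      0.003125 / 0.00074074 ~ 4.2 scan pages per key lookup
SELECT 1.0 / 0.00074074               AS seq_pages_per_sec,    -- ~1350
       (1.0 / 0.00074074) * 8 / 1024  AS seq_MB_per_sec,       -- ~10.5
       1.0 / 0.003125                 AS random_IOPS,          -- 320
       0.003125 / 0.00074074          AS scan_to_lookup_ratio; -- ~4.2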

How can anyone say logical IO must be any part of the performance equation if no one, NO ONE, can say what it means? Consider the pre-euro days: you are traveling throughout Europe, and every time you buy something, you increment a single counter by the amount of local currency you spent. 5 pounds in England, 9 francs in France, 8 marks in Germany, 1000 lire in Italy. At the end of your trip, what does your counter tell you? How meaningful is it? Unless you know the actual execution plan and the rows that made up your LIO, you don't know what it means. And if you did have the components, you wouldn't need LIO. If LIO were so meaningful, why doesn't the SQL optimizer use it? It doesn't, because it has no real consistent quantitative value.

On index seek + key lookup versus table scan: for "better performance" you need to specify whether you mean CPU or duration; in any case LIO doesn't tell you anything. If you are talking about when the execution plan switches from IS+KL to scan, it occurs around the 1350:320 ratio (4.2 to 1, or 10.5MB/sec to 320 IOPS) used by the optimizer. If your table data size is 8GB, i.e., 1M leaf-level pages, then at around 250K or so rows the execution plan changes from IS+KL to scan. If your clustered index depth just happens to be 4, then the switch just happens to occur around the same LIO count (to order of magnitude), but if the depth were 3 or 5, you would be way off. See my other posts on the formulas used by the SQL optimizer.

If you are talking about CPU, with all data in memory, then 1 row in a key lookup might cost about the same as 1 page of a table scan, depending on the rows per page, which is drastically different from what the optimizer models. If you are going to disk and looking at duration, then it all depends on what your actual random IOPS to sequential transfer rate is. For a single 15K SAS drive, assuming tightly packed data, you might get 400-800 IOPS (note this is not a true random IOPS figure) in the key lookup (with scatter-gather activated), or you could get 80-120MB/sec in the table scan, depending on exactly how your data is laid out and how the SQL is written. Again, this is far different from the optimizer model, in the opposite direction from the in-memory case.
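The crossover arithmetic for that 8GB example, using the optimizer's cost constants from above:

-- 8GB at 8KB/page is ~1,000,000 leaf pages
-- scan cost:     1,000,000 pages x 0.00074074 sec ~ 741 sec
-- lookup cost:   N rows x 0.003125 sec
-- crossover at:  N ~ 741 / 0.003125 ~ 237,000 rows, i.e. "around 250K"
SELECT 1000000 * 0.00074074            AS scan_cost_sec,  -- ~741
       1000000 * 0.00074074 / 0.003125 AS crossover_rows; -- ~237,000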

Yes, I know many people have written about LIO as if it were a key system resource, but no one could actually say what that resource is. The truth is that LIO is used because it is easy to collect, which does not mean it is useful. Like people who cling to guns and religion, the scientific DBA needs to let go of LIO; the database bogeyman is not lurking in the closet at night.

Still, the point is that LIO is not a consistent measure of CPU, nor is there any internal limited resource governed by LIO. Now, I have made my complaints about the SQL optimizer cost formulas. (MS has made the point that they cannot fix past errors because so many people now depend on plans generated from the existing silly model.) The SQL Server Cost Based Optimizer since version 7.0 is based on duration, not CPU, and not LIO. The duration is based on (P)IO time and CPU time. The CBO is not trying to predict CPU usage, but it does use a model of CPU usage. Yes, the LIO count can be predicted much more accurately than true CPU, but clearly MS (and probably Oracle as well) felt that LIO was worthless as an optimization strategy, or they would have used it.

OK, so long as you understand that the primary objective is to reduce query CPU. (Occasionally, network transmission or disk IO might be more important, but let us set that aside for the moment.) First, determine whether the query time is in the compile/recompile or in the actual execution. Then assess whether the CPU is in the table/row access or in the logic. For example, SELECT SUM(col1), SUM(col2), SUM(col3) FROM X is more expensive than just SELECT SUM(col1) FROM X (assuming this difference does not change the plan, indexes, etc.). It is not just the SELECT logic, but any join or WHERE clause logic. The optimizer does not consider the difference if it does not affect the plan, but it can be substantial. Next, consider the number of rows and pages accessed and whether this is the most efficient possible. If your query is WHERE Col1 = A AND Col3 = B and your index leads with Col1, Col2, Col3, the plan may show an index seek, but it is not the most effective seek (see the sketch below). So keeping the rows involved close to the minimum helps. Next, are the rows involved colocated, or spread out? There are "experts" out there who talk about leading an index with the most selective column, which is not entirely true. In a join, it is more important to keep the rows that are accessed together stored together. Even better would be if they were in the correct sorted order.
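A minimal sketch of that index point, with hypothetical table, column, and index names; with the first index, the seek predicate can only use the leading Col1, and Col3 = 'B' is applied as a residual filter across the entire Col1 range:

-- index leads (Col1, Col2, Col3): the seek ranges over Col1 = 'A' only,
-- then filters Col3 = 'B' row by row (a residual predicate)
CREATE INDEX IX_X_c1c2c3 ON dbo.X (Col1, Col2, Col3);

SELECT Col1, Col2, Col3
FROM dbo.X
WHERE Col1 = 'A' AND Col3 = 'B';

-- an index leading (Col1, Col3) lets the seek cover both predicates,
-- touching only the qualifying rows
CREATE INDEX IX_X_c1c3 ON dbo.X (Col1, Col3);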