
Query Optimizer Gone Wild - Full-Text Search Query Plans

Overall, SQL Server has become a very capable and mature product, with a very powerful engine and sophisticated query optimizer. Still, every now and then, a certain query structure throws the optimizer for a loop, resulting in an execution plan that will take forever. The key to identifying this type of problem begins with the execution plan. First, the plan cost does not tell the whole story. It is necessary to know which execution plan operations run well on modern server systems and which do not. Solving the problem can then be a simple matter of rewriting the SQL to get a different execution plan, one built from good execution components.

Of course, when working with third-party applications that do not use stored procedures, it is necessary to convince the ISV, often by first talking to someone who does not write code, not to mention someone without any understanding of the SQL Server query optimizer.

Anyway, the topic here is Full-Text Search, in particular CONTAINS and CONTAINSTABLE. CONTAINS is "a predicate used in a WHERE clause" per the Microsoft documentation, while CONTAINSTABLE acts as a table.

Consider the two queries below: the first is an example of CONTAINS and the second an example of CONTAINSTABLE.

[image: the two queries, CONTAINS and CONTAINSTABLE]
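The original query images are not reproduced here. As a rough sketch, assuming a Document table with a full-text indexed DocContent column (illustrative names, not the actual schema), the pair could look something like this:

-- CONTAINS used as a predicate in the WHERE clause
SELECT COUNT(*)
FROM dbo.Document
WHERE CONTAINS(DocContent, 'PowerPoint');

-- CONTAINSTABLE used as a row source that can be counted directly
SELECT COUNT(*)
FROM CONTAINSTABLE(dbo.Document, DocContent, 'PowerPoint') ct;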

We might intuitively think that there should be no difference between the two, which is why in SQL Server, we should never even bother with intuition and instead always, repeat always, focus on the execution plan.

[images: execution plans for the CONTAINS and CONTAINSTABLE queries]

Both queries perform a Full-Text search, but the CONTAINS version must also scan an index on the source table to get the count. The CONTAINSTABLE function, on the other hand, being a row source, can be counted directly. In this example, the Document table is on the order of 60GB excluding LOB structures stored out of the table, the index in question is 150MB, and there are 16M rows in the table. Both queries run in just over 2 sec elapsed. The first consumes 6 CPU-sec running in 2.2 sec, while the second consumes 2.6 CPU-sec in 2.6 sec, as it is not a parallel plan. OK, so the first query runs slightly faster with parallel execution on the Stream Aggregate, while the second is single-threaded. But the Full-Text function itself is not multi-threaded, and probably accounts for the bulk of the 2.2 sec of the first query. So how much benefit does the parallel CONTAINS plan really provide?
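For reference, CPU-sec and elapsed times of this sort can be captured with SET STATISTICS TIME; a minimal sketch, using the same illustrative names as above:

-- Report CPU time and elapsed time for each statement in this session
SET STATISTICS TIME ON;

SELECT COUNT(*)
FROM dbo.Document
WHERE CONTAINS(DocContent, 'PowerPoint');

SET STATISTICS TIME OFF;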

Before jumping to the title topic, Query Optimizer Gone Wild, let's look at another query, shown below.

[image: the administrative query]
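Again, the image is not reproduced here; a hypothetical sketch of an administrative query with a search argument on two non-indexed columns (DocType and DocTitle are made-up column names):

-- Neither column is indexed, so the plan is a full table (clustered index) scan
SELECT COUNT(*)
FROM dbo.Document
WHERE DocType = 'ppt'
   OR DocTitle LIKE '%.ppt%';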

Below is the query plan. Note that neither column in the search argument is indexed, because this is an administrative query that the executive director runs once every month as a Key Performance Indicator, which is also probably related to why I am not an executive. So the execution plan is a table scan.

[image: table scan execution plan]

The IO portion of the full table (Clustered Index) scan has a cost of 5677 (1,350 pages, or 10.5MB, has an IO cost of 1 in a scan operation, so 5677 corresponds to roughly 60GB). For this particular example, the Fulltext Match Table Valued Function is assessed a plan cost of 1.6. When combined with the other components, Stream Aggregate and Filter, the total plan cost of this Full-Text search is 4.14.

On this particular system, a Xeon E7-48xx, with max degree of parallelism set to 8, the table scan query consumes 25 CPU-sec running in 3.8 sec when data is in memory. At MAXDOP 20, the query consumes 37 CPU-sec running in 2.1 sec. This is why I emphasized earlier that plan cost is not hugely relevant.

(In case you were curious, the 60GB, 16M row table scan consumes 23 CPU-sec at DOP 1, 24.5 CPU-sec and 12.3 sec elapsed at DOP 2, the same 24.5 CPU-sec and 6.6 sec elapsed at DOP 4, i.e., excellent scaling to DOP 8, and good continued scaling to DOP 20. This is an amazing 2.6GB per sec per core, and 700,000 rows per sec per core. Of course, this is a wide table with 175 columns averaging 3750 bytes per row.)
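The per-DOP numbers above can be collected by varying the MAXDOP query hint; a sketch, not the exact test script used here:

-- Run the table scan query at a specific degree of parallelism
SELECT COUNT(*)
FROM dbo.Document
WHERE DocType = 'ppt'
   OR DocTitle LIKE '%.ppt%'
OPTION (MAXDOP 4);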

The Wild Plan

The actual query we are interested in is not one of those discussed above. Due to the nature of the application, a PowerPoint document can be indicated by a matching expression in any of three columns, one of which is part of a Full-Text catalog, as expressed by the query below.

[image: the combined query]
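A hypothetical sketch of the combined query, again with made-up column names; the full-text predicate is simply OR'ed with the two plain column predicates:

-- PowerPoint documents indicated by any of three columns,
-- one of which is covered by the full-text catalog
SELECT COUNT(*)
FROM dbo.Document
WHERE DocType = 'ppt'
   OR DocTitle LIKE '%.ppt%'
   OR CONTAINS(DocContent, 'PowerPoint');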

(It actually turns out that this query is not completely correct from the technical perspective, but it is correct by executive direction; also part of the reason why I will never be an executive.)

Given that this is a relatively simple SQL expression, and that the two elements of this query are known to run quickly, we might intuitively expect this composite query to also run quickly. But as I said earlier, do not even bother with intuition, and always, always focus on the execution plan, shown below.

[image: the wild execution plan]

Can you say: "We are sooo screwed!"

What is wrong with this plan? Let us compare this plan with the table scan plan from above. Both plans have approximately equal cost, as the 60GB table scan dominates. The extra Table Valued function contributes very little, as shown below.

[image: plan cost details]

The problem with this execution plan is that there is a fat arrow (indicating very many rows, 16M in fact) coming from the outer source (top) with the Full-Text search in the inner source (bottom). For each row from the outer source, the inner source is evaluated.

This is why I said not to pay much attention to the plan cost, or to which components have a high cost relative to the others.

Instead, it is important to focus on the SQL operations and the number of rows and pages involved, along with our knowledge of how each operation behaves in non-parallel and parallel execution plans and the difference between data in memory versus on hard drive, and now also SSD, storage.

This execution plan will attempt to perform 16M Full-Text searches. We have already established that this particular Full-Text search takes about 2 sec, so the full query might take 32M sec. There are 86,400 seconds per day, so we should expect this query to complete in about 370 days, assuming there is not a need to reboot the OS after a critical security patch. And oh by the way, we need to run this query next month too, every month as a matter of fact.

Note, in the first Query Optimizer Gone Wild, the topic was a loop join with a table scan on the inner source. So this is another loop join example.

The Tamed Plan

Now that we have identified the problem, and we know exactly what to look for in the execution plan, it is time to solve it. Because we have been working with SQL Server and other DBMS engines built around a cost-based optimizer for many years, we know exactly what to do. The solution is to rewrite the SQL to get a good execution plan in which the two base operations, for which we know the run time is reasonable, are each executed only once.

The query below meets this objective.

[image: the rewritten query]
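One possible shape of such a rewrite (a sketch with the same made-up names, with DocumentID assumed to be the full-text key column; not necessarily the exact query in the image): evaluate CONTAINSTABLE once as a row source, join it to the table on the full-text key, and express the full-text condition in the OR as a NULL test on the join:

-- CONTAINSTABLE is evaluated once and joined to the single table scan,
-- instead of one full-text probe per outer row
SELECT COUNT(*)
FROM dbo.Document d
LEFT OUTER JOIN CONTAINSTABLE(dbo.Document, DocContent, 'PowerPoint') ct
    ON ct.[KEY] = d.DocumentID
WHERE d.DocType = 'ppt'
   OR d.DocTitle LIKE '%.ppt%'
   OR ct.[KEY] IS NOT NULL;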

The execution plan is below.

[image: the tamed execution plan]

This query consumes 37 CPU-sec and 6.9 sec elapsed. Given that the two component elements of this query combined for 27 CPU-sec and 6.4 sec elapsed, the hash join and the two parallelism repartition streams components increased the true cost by 10 CPU-sec, but a minuscule 0.5 sec of elapsed time.

I suppose that I should file this on Connect, but Microsoft locked my account out, and does not want to send me the unlock code. So I am posting this here.