Home, Optimizer, Benchmarks, Server Systems, System Architecture, Processors, Storage,

Relativity Part 1, Part 2, Part 3, Part 4, Part 5, Addendum

3) AND OR Combinations - Hash Join Hint Fails

Relativity generates SQL with a combination EXISTS, NOT EXISTS, AND and OR conditions. Presumably this is the more natural translation of user specification to SQL. This is perfectly valid SQL. However, when there is a combination of AND and OR conditions, SQL Server cannot deduce whether an equality join condition exists. In SQL Server documentation, it is stated that the Hash Join requires at least 1 equality clause in the join predicate. And so a hash join hint would fail in a statement with AND and OR combinations.

Now, a row estimate error mixed with the TOP clause creates a serious problem that cannot be fixed with the Hash Join hint. An example is shown below.

SELECT TOP 1000 [Document].[ArtifactID]
FROM [Document] (NOLOCK)
WHERE [Document].[AccessControlListID_D] IN (1,1000064)
AND ( [Document].[RI_1003671] IN (
  SELECT [Document].[RI_1003671] FROM [Document] (NOLOCK)
  WHERE (( [Document].[ArtifactID] IN (
   SELECT [Document].[ArtifactID] FROM [Document] (NOLOCK)
   WHERE [Document].[AccessControlListID_D] IN (1,1000064)
   AND ([Document].[RI_1003671] IN (
    SELECT [Document].[RI_1003671] FROM [Document] (NOLOCK)
    WHERE (EXISTS(
     SELECT [ArtifactID] FROM ArtifactAncestry (NOLOCK)
     WHERE [ArtifactAncestry].[ArtifactID] = [Document].[ArtifactID]
     AND AncestorArtifactID IN (5605547) ))
   )) )
  AND (
   NOT EXISTS (
    SELECT CodeArtifactID FROM CodeArtifact (NOLOCK)
    WHERE AssociatedArtifactID = [Document].[ArtifactID]
    AND CodeArtifact.CodeTypeID = 1)
   OR NOT EXISTS (
    SELECT CodeArtifactID FROM CodeArtifact (NOLOCK)
    WHERE AssociatedArtifactID = [Document].[ArtifactID]
    AND CodeArtifact.CodeTypeID = 2)
  ) ))
))
ORDER BY [Document].[ArtifactID]

If we apply the hash join hint to the above query, the following error message appears.

Msg 8622, Level 16, State 1, Line 1
Query processor could not produce a query plan because of the hints defined in this query. Resubmit the query without specifying any hints and without using SET FORCEPLAN.

When this occurs, Relativity re-issues the query without the hash join hint. To observe this event, in Profiler trap the Error columns for values not like '0%'. I suppose there is an extended events version of this.

The form of the query is Set A AND (NOT Set B OR NOT Set C). It appears that whenever there is a combination of AND and OR, the Hash Join hint fails. It does not matter if it is AND or AND NOT. The Hash Join hints works for any number of AND conditions only and also with any number of OR conditions only.

And Or Estimated Plans

Below are the estimated execution plans for the above query in the COUNT, no TOP, and TOP forms. As before, all are at DOP 1.

Estimated Plan - And Or, Count
AndOr_Est_Cnt

Estimated Plan - And Or, no Top
AndOr_Est_NoTop

Again, the COUNT and no Top query are the same up to the last join with subtree cost about 2878, differing only in the final processing. Both of these plans would normally be a parallel execution plan.

As before, the Top plan is different because the estimate rows that meet the search conditions is high, so the strategy is to get the first 1000 by working down the ordered list.

Estimated Plan - And Or, Top
AndOr_Est_Top

The estimate was that only 1009 rows from the upper right table scan operation would be required before the Top 1000 rows are found, with plan cost approximately 20. Instead, the sub-structure below the table scan operator is executed 21M times.

Even though the plan cost is over the cost threshold for parallelism, a parallel plan will not be generated probably because there is little reduction from the loop joins to overcome the cost of the parallelism operators.

And Or Actual Plans

Below are the actual plans in this example for Count, no Top and Top.

Actual Plan - And Or, Count
AndOr_Act_Cnt

Actual Plan - And Or, no Top
AndOr_Act_NoTop

In the Count and no Top query plans, note that there are few rows past the last join as indicated by the thin arrow.

Actual Plan - And Or, Top
AndOr_Act_Top2

Note the fat arrow on the table scan at the top. All 21M rows are evaluated instead of the estimated 1009 rows.

The CPU and elapsed time at DOP 1 for Count and Top is approximately 88-98 sec. The CPU and elapsed time for the Top query is 2947 sec. A parallel plan for the Top query does occur because the optimizer believes this is a cheap plan. Even though the plan cost is over 5, the cost of the parallelism operations outweighs the CPU reduction in the common operations, according to the SQL Server cost model.

So now we have an example of the Top clause problem not involving data type mismatch and that cannot be resolved with the Hash Join hint. As with the other examples, the problem can be resolved by simply removing the Top clause when the Count query indicates few rows (relative to the number of rows in Documents). My preference is that this be a user configurable set point and I would want to set it well over 1000, perhaps 10000 or as a percentage of rows in Documents.

Alternative SQL

Before leaving this topic, it can be pointed out that queries can be expressed in different styles. Relativity just happens to use AND, OR, EXISTS and NOT EXISTS combinations which probably is a more direct translation from the user interface.

All queries can also be written entirely with a combination of joins and unions. New operators Except and Intercept were introduced with SQL Server version 2005, but I never got around to exploring this.

The interior sub-query of the above example can be written with Inner Join replacing AND Exists, and Left Join replacing NOT Exists as below.


  SELECT [Document].[RI_1003671]
  FROM [Document] (NOLOCK)
  INNER JOIN (
   SELECT [Document].[ArtifactID] FROM [Document] (NOLOCK)
   WHERE [Document].[AccessControlListID_D] IN (1,1000064)
   AND ([Document].[RI_1003671] IN (
    SELECT [Document].[RI_1003671] FROM [Document] (NOLOCK)
    WHERE (EXISTS(
     SELECT [ArtifactID] FROM ArtifactAncestry (NOLOCK)
     WHERE [ArtifactAncestry].[ArtifactID] = [Document].[ArtifactID]
     AND AncestorArtifactID IN(5605547) ))
   ))
  ) x ON x.[ArtifactID] = [Document].[ArtifactID]
  LEFT JOIN (
   SELECT AssociatedArtifactID, CodeArtifactID FROM CodeArtifact (NOLOCK)
   WHERE CodeArtifact.CodeTypeID = 1
  ) y ON y.AssociatedArtifactID = [Document].[ArtifactID]
  LEFT JOIN (
   SELECT AssociatedArtifactID, CodeArtifactID FROM CodeArtifact (NOLOCK)
   WHERE CodeArtifact.CodeTypeID = 2
  ) z ON z.AssociatedArtifactID = [Document].[ArtifactID]
  WHERE y.CodeArtifactID IS NULL OR z.CodeArtifactID IS NULL

Note that the AND (NOT EXISTS yy OR NOT EXISTS zz) is replaced by LEFT JOINs for each expression and the WHERE clause handles the OR part. There is a difference between IN and JOIN in that the IN may remove duplicates. But Relativity always has the outer IN structure to remove duplicates, so we do not need to be concerned on the sub-expressions.

Below is the estimated plan for the alternate SQL in the no Top form. The relative cost of 27% is compared to 73% for the original no Top form.

Estimated Plan - Alternate SQL
AndOr_Est_NoTop_Alt

The estimated cost prior to the sort is 1176 versus 2878 in the original SQL. The estimated rows in the alternate SQL is 3.9M versus 12.2M in the original, neither of which are particularly accurate. But we were not expecting an accurate estimate for a query of this nature.

Below is the actual plan.

Actual Plan - Alternate SQL
AndOr_Act_NoTop_Alt

CPU and elapsed time for the alternate SQL is 65 sec versus 90 sec for the original.

It should always be possible to convert a SQL expression using AND, OR, EXISTS and NOT EXISTS to an expression with Join and Union. The SQL Server query optimizer has no trouble generating good execution plans for expressions with any number of AND Exists/Not Exists, and for expressions with any number of OR Exists/Not Exists. It is the AND-OR combinations that prevent SQL Server from generating a plan with Hash and Merge joins, which says it could not find the equivalent Join/Union form or an equality join condition.

The results are correct for both forms, and we should not draw too many conclusions from the Join version having 30% lower actual cost. Technically Hash and Merge joins are more efficient for handling a large number of rows. But when all data is in-memory, this is only be a factor of 2. The hash joins are more likely to have better scaling at higher degrees of parallelism.

AND OR Combinations Summary

For both the current Relativity style SQL with AND/OR EXISTS/NOT EXISTS, and the alternate SQL style with Join and Union, the better solution is to not apply the TOP clause when the Count query has already reported few rows. In fact, it is better to exlcude the Top clause unless the count is actually very high. Queries of this nature are almost certain to have significant row estimate errors. There is no possible benefit when the actual row count is low. There is an almost certain severe negative impact when the estimate is high and the actual is low.

The Hash Join hint results in an error message for the way Relativity generates SQL with AND/OR combinations. An alternate SQL style emphasizing join and union may be better and does allow the Hash Join hint (which is not necessary when the Top clause is not used).

 

Relativity Part 1, Part 2, Part 3, Part 4, Part 5, Addendum