
Relativity Part 1, Part 2, Part 3, Part 4, Part 5, Addendum

Relativity Legal Document Discovery on SQL Server 2013-04

This document focuses on the more difficult aspects of Relativity pertaining to complex searches or large cases.
Many of the broadly applicable good practices are covered elsewhere.
Most of the material is based on version 7.3.

Update 2013-08 In version 8.0, it appears that the data type mismatch between ASCII and Unicode has been resolved.
The TOP 1000 issue remains.

Update 2013-09 The SQL Server Customer Advisory Team (CAT) article When To Break Down Complex Queries (reposted from SQLCAT.com, originally published 2011-Oct) notes that "there are certain query patterns that can cause problems for the query optimizer."

Relativity from kCura is a legal document discovery application that runs on SQL Server. It uses both SQL Server Full-Text Search and interfaces to dtSearch. Full-text search results may be saved and re-used in further searches, so there can be plain SQL search queries as well. There are a number of significant and potentially serious challenges in supporting the Relativity database on SQL Server (for large or complex cases), which are covered here.

The SQL for Relativity is generated on the application server and sent to SQL Server as a language event (TSQL SQL:Batch in Profiler Trace). This constrains our options in addressing performance problems. After having examined the SQL traffic from Relativity versions 7.1-7.3 for an extended period, there do not appear to be any insurmountable problems. However, several issues can only be addressed with code changes on the Relativity application server.

Some of the fixes proposed here are simple to implement. Others can be complicated or even very complicated. For some issues, perhaps a different strategy is better. All of the material here was sent to the vendor more than a year ago. The vendor may have come to different conclusions. Since then, features have been added that may be oriented at the symptoms of the problem rather than the source.

One example is ensuring that a long running query is cancelled when the browser session times out. Otherwise the query continues to run until it completes or the (separate) query timeout is reached. Even if the query does complete, there is no user interface to display the result. The user may have already opened a new session and re-issued the original query, most probably timing out again (and again). The problems discussed here are presented with root cause analysis and fixes at the source.

Without code-side fixes, the best that can be done is to understand why certain queries are painfully slow, or may never complete (before the next monthly patch requiring a reboot). It may not be a pleasant answer, but the pattern of searches that do not complete can be understood. Then at least we know when to give up. This is better than trying again and again when the only possible outcome is a query timeout.

On the hardware side, there are interesting characteristics with varchar(max) columns stored in the LOB pages. SQL Server file IO to LOB pages does not follow the well-known rules for in-row pages, so we can neither rely on a Fast Track Data Warehouse Reference Architecture for high-bandwidth scans, nor on asynchronous IO at high queue depth for higher IOPS per disk. The nature of LOB pages also has implications in the database physical architecture.

Hardware

An individual Relativity case database is not expected to be large by today's standards. Since document discovery is not a permanent activity in most organizations, it is a candidate for outsourcing to a hosting provider with Relativity-specific skills. For both hosting providers and large Relativity cases, my standard system and storage configuration is shown below.

Most people have no problem with powerful CPU and massive memory. The main problem is understanding IO performance. Too many people only consider storage capacity. Performance is achieved by distributing IO over multiple HBA/RAID controllers and with very many HDD or SSD devices evenly distributed over the IO channels. Storage vendors like to emphasize auto-tiering and flash-cache which add cost, and may actually impede database performance. The SQL Server database Filegroup and File structure must be matched to the actual storage configuration.

The figure above shows a clustered system with Dell PowerVault MD3220 storage. Normally I do not like to favor one vendor over another. An alternative configuration from HP would be the P2000 G3. But the P2000 G3 only supports 1.6GB/s throughput while the Dell MD3220 supports 2.8GB/s. I am not sure why HP settled at such a low IO bandwidth. It is possible their efforts were directed to high-end storage systems, perhaps deliberately keeping the capability of the entry level system low. Or their efforts are fragmented from having too many storage solutions.

In other writings, I have said that SSD pricing is very favorable, but only for open market products. As of early April 2013, Dell only offers SLC SSDs with 6Gbps SAS interface in 200 and 400GB capacities at $1291 and $2444 respectively (Value line). There is also a mainstream line 6Gbps SAS SLC SSDs 200/400GB at $2,235 and $4,191, that may be a previous generation product. In the 3Gbps SAS generation, Dell had eMLC SSDs, but not in the current generation? See Dell Enterprise Hard Drive and Solid-State Drive Specifications (2013-01?).

HP has both SLC SSDs at 200 and 400GB and endurance MLC SSDs in 200, 400 and 800GB capacities. The list price for the MLC SSDs is approximately $10K per TB, probably discounting to $5K per TB? and $18K per TB for SLC. Depending on the actual pricing, a HDD/SSD mix may be a good option. When high endurance MLC SSD pricing from a server system vendor reaches $3K per TB, I would probably recommend SSD for the entire database, retaining HDD for local backups and other purposes.

There should be two or preferably four RAID groups/volumes for data on each MD3220, so the IO traffic can be distributed over the two paths. See Storage Performance 2013 for more details.

SAN Option

Some organizations love SAN storage systems so much that everything must be on a SAN. If so, then at least the configuration should be correct. There should be 8 FC links at 8Gbit per sec. Each 8Gbps FC port can support a net bandwidth of 700MByte/s.

This would provide about the same bandwidth as 2 SAS RAID controllers in PCI-E gen 2 x8 slots. A single PCI-E gen 3 x8 RAID controller can now deliver 6GB/s, although this has not yet found its way into the main server system vendors' product lines with supporting infrastructure.

It would be better to have additional bandwidth, but a very large number of FC ports requires more effort to manage, and it can be difficult to verify that the actual active path for each volume is correctly distributed. Just one incorrect path assignment on any of the many volumes can reduce IO bandwidth by a factor of two.

The number of volumes for data should be an integer multiple of the number of FC ports. Consider 8 or 16 volumes for data. It is absolutely essential in a SAN to verify that the active paths for the data volumes are uniformly distributed over the FC paths. If even one volume inadvertently ends up on the alternate path, that path could then carry 50% or double the IO traffic. Since a table scan must draw from each file (ie, volume) uniformly, then system level IO bandwidth could be reduced by half.

I know I just said that twice, but it is important. This occurred on a customer's Fast Track Data Warehouse implementation. Everything was done right except this. I asked for the SAN vendor's support engineer to reassign paths with even distribution. He looked and said all volumes have a path, so everything is fine. Do not buy from that vendor!

I do not recommend iSCSI. When you have problems, complain to someone else. If the problem must be resolved, call me when you are ready to throw the iSCSI system away.

There are two tables in the Relativity database that can become large, AuditRecord and Document. The activity to AuditRecord is mostly inserts. It is important that there are no table locks on AuditRecord of extended duration, as this will block audited user activity. The Document table could become large due to the varchar(max) fields. The general expectation is that the number of rows should not become exceptionally high, so nonclustered indexes on Document should be small. A good strategy is to put frequently needed narrow columns in the include list of nonclustered indexes.

Most of the query activity in Relativity will go either to the Full-Text catalog, the nonclustered indexes on Document, or other narrow tables that are essentially nonclustered indexes. As such, there should not be intense IO activity, with a few exceptions. One exception is the IO to the LOB pages of Document.

LOB IO

At the beginning, it was mentioned that LOB pages have different IO characteristics. Even if the execution plan shows a table scan, access to the LOB pages could have the characteristics of random IO. If the Document table is not created with TEXTIMAGE_ON to another FG, then the lob pages are interspersed with normal pages. (A later clustered index rebuild does not move LOB pages. Building a partitioned clustered index does move LOB pages. But this would also entail dropping and rebuilding the Full-Text indexes.) So we cannot rely on the strategies from the Microsoft Fast Track Data Warehouse reference architecture for high sequential IO bandwidth.

Furthermore, if a query accesses columns stored in LOB pages, SQL Server does not use asynchronous IO? (This might be because we cannot have a nonclustered index include the pointer to the lob page instead of the full lob field?) So we cannot rely on high-queue depth IO characteristics where it is possible to drive 300+ IOPS per disk. (SAN systems seem to defeat this capability on its own).

The IO for LOB pages that I observed was synchronous on a per thread basis. In a query plan at DOP 8, each thread issues synchronous IO to LOB pages. On a HDD storage system with 15K disks and data distributed over the entire disk, IO latency should be 5ms at low queue depth, yielding 200 IOPS per thread (or DOP). It does not matter how many hard disks there are in the RAID groups supporting data.

For this reason, when on HDD, we might like to keep the Document table LOB pages contained to a narrow range of the disks to minimize access latency. Another strategy is to use higher DOP for these queries, as the performance is 200 IOPS per worker thread in a parallel execution plan. This is assuming that the entire disk array can support the total IOPS of all threads.
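Since throughput scales with the number of worker threads, the higher-DOP strategy can be sketched as below. This is illustrative only: the column name ExtractedText and the predicate are assumptions, not actual Relativity SQL.

```sql
-- Each thread sustains roughly 200 IOPS of synchronous LOB reads at 5ms latency,
-- so DOP 16 could reach ~3200 IOPS, assuming the disk array can supply it.
SELECT ArtifactID
FROM dbo.Document
WHERE ExtractedText LIKE '%privilege%'   -- forces reads of the LOB pages
OPTION (MAXDOP 16);
```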

One other option is to place the Document LOB pages on SSD, leveraging the very low latency IO. There are otherwise few reasons to need exceptionally high IOPS capability in Relativity. The normal activity is Full-Text searches, which does not have this issue once the full text catalogs are built.

Model Database, File Groups and File Layout

Below is a simplified filegroup and file layout representation. Only the volumes supporting data and tempdb are shown. The additional volumes for OS and log files are not shown. If there are 4 controllers and 4 data volumes on each controller, then there would be a total of 16 volumes dedicated to data and tempdb.


For databases that support exceptionally high transaction volumes, the log disks should be on a separate controller. Relativity does not, so the log disks can be attached to any one of the controllers for data. For each database and each filegroup within a database, there should be 1 data file on each of the volumes dedicated to data. The same applies to tempdb. Relativity has its own "model" database, so this structure can be applied to the Relativity model database.
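As an illustrative sketch of the one-file-per-volume layout (database name, drive letters, file names, and sizes are all hypothetical):

```sql
-- Hypothetical: 4 data volumes mounted as D:, E:, F:, G:.
-- One file per volume for the default filegroup; repeat the pattern
-- (with TO FILEGROUP) for each additional filegroup, and for tempdb.
ALTER DATABASE RelativityCase ADD FILE
  (NAME = Data1, FILENAME = 'D:\Data\RelativityCase_1.ndf', SIZE = 50GB, FILEGROWTH = 10GB),
  (NAME = Data2, FILENAME = 'E:\Data\RelativityCase_2.ndf', SIZE = 50GB, FILEGROWTH = 10GB),
  (NAME = Data3, FILENAME = 'F:\Data\RelativityCase_3.ndf', SIZE = 50GB, FILEGROWTH = 10GB),
  (NAME = Data4, FILENAME = 'G:\Data\RelativityCase_4.ndf', SIZE = 50GB, FILEGROWTH = 10GB);
```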

Filegroups

The Relativity case database consists of:
  1) ordinary tables,
  2) tables with varchar(max) columns that may exceed the 8K in-row limit
  3) full text indexes

Because of the characteristics of LOB pages, these should be kept in their own filegroup, perhaps just for the Document table. Along with a good size estimate of the LOB FG before data population, this would help keep the LOB pages to a very narrow range of the disk drives. The expectation is that this would reduce average disk access time for queries that access the varchar(max) fields of the Document table.
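A minimal sketch of directing the Document LOB pages to a dedicated filegroup at creation time. The filegroup name LobFG and the column list are assumptions for illustration, not the actual Relativity schema.

```sql
-- TEXTIMAGE_ON must be specified when the table is created;
-- a later clustered index rebuild does not move existing LOB pages.
CREATE TABLE dbo.Document (
    ArtifactID    int NOT NULL PRIMARY KEY,
    ExtractedText varchar(max) NULL   -- stored in LobFG once it exceeds in-row limits
) ON [PRIMARY]
TEXTIMAGE_ON LobFG;
```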

Relativity recommends that the full text indexes be kept in a separate filegroup (default?). This would leave just the ordinary in-row and row overflow pages in the primary FG.

AuditRecord is another table that warrants special consideration. The normal usage of this table is just inserts. Any user action that requires auditing will get one (or more) rows in AuditRecord. Some people are concerned with table fragmentation. There are some popular index maintenance scripts that reorganize indexes at one fragmentation level and rebuild at a higher level. The rebuild of a very large AuditRecord table (example 200M rows, 1TB) could take over 1 hour, during which the table is locked.

User activity that is just a read with auditing will be blocked because the audit requires an insert in the AuditRecord. Two possible options are
  1) place AuditRecord in its own filegroup, or
  2) partition the AuditRecord table

If AuditRecord is in its own filegroup, then it should not become fragmented, except due to file growth. In the partitioned table strategy, by policy the indexes on the active partition are never rebuilt. Each of the older partitions is rebuilt only once, soon after the active partition moves to the next range.
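The partitioned-table strategy might be sketched as follows. The partition column, boundary values, filegroup, and index name are illustrative assumptions, not the actual Relativity schema.

```sql
-- Range-partition AuditRecord so only closed (older) partitions are ever rebuilt.
CREATE PARTITION FUNCTION pfAudit (bigint)
AS RANGE RIGHT FOR VALUES (50000000, 100000000, 150000000);

CREATE PARTITION SCHEME psAudit
AS PARTITION pfAudit ALL TO (AuditFG);

-- By policy, never rebuild the active partition; rebuild each older
-- partition once, after the active range has moved on:
ALTER INDEX PK_AuditRecord ON dbo.AuditRecord
REBUILD PARTITION = 2;
```

Rebuilding a single closed partition locks only that partition's rows, so the inserts to the active partition (and hence audited user activity) are not blocked.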

In all, there should probably be 3 filegroups for the Relativity case database:
  1) primary
  2) one for the Document table TEXTIMAGE
  3) one for Full Text Indexes

Partitioning the Document table to move the LOB fields is not viable because there are Full-Text indexes, which would have to be dropped and recreated(?) requiring Relativity to be offline?

For simplicity, this structure can be created on the Relativity model database. Be sure to set file growth and pre-size the files as appropriate.

Relativity SQL

The SQL generated by Relativity encounters a number of sometimes very severe performance issues, some of which are rather interesting. The nature of Relativity search queries is a sequence of three steps:
  1) a SELECT COUNT(*) to get a count,
  2) a SELECT TOP 1000 ArtifactID for the ArtifactID rowset,
  3) and finally a SELECT query for the required columns based on the list of IDs from the previous queries.
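The three-step pattern might look like the following. The table, columns, and search criteria are illustrative assumptions, not the exact SQL that Relativity generates.

```sql
-- Step 1: row count for the result set header
SELECT COUNT(*) FROM dbo.Document WHERE Custodian = 'Smith';

-- Step 2: the first page of ArtifactIDs
SELECT TOP 1000 ArtifactID FROM dbo.Document
WHERE Custodian = 'Smith' ORDER BY ArtifactID;

-- Step 3: the display columns for the IDs returned by step 2
SELECT ArtifactID, DocTitle, DocDate
FROM dbo.Document
WHERE ArtifactID IN (1001, 1002, 1003 /* ... from step 2 */);
```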

The query logic is determined by user criteria, and SQL is generated on the application server. It might then seem that there is no purpose to using stored procedures, but we will revisit this. As the SQL is generated on the application server, there are fewer tuning options.

There are several types of problems that will be discussed here:
  1) Data Type Mismatch (fixed in version 8.0)
  2) TOP Clause with Row Estimate Error (2 examples)
  3) AND OR Combinations (Hash Join hint fails)
  4) Repeating Sub-expressions
  5) Full-Text Search
 

It can be pointed out that the issues above are not unique to Relativity; they apply to SQL Server in general. The impact caused by them can be severe for both intermediate and large databases.

1) Data Type Mismatch - ASCII and Unicode (fixed in version 8.0)

Data type mismatch is a well-known issue that has been documented elsewhere in sufficient detail. Only a very brief discussion will be given here to provide scope in the first example. There are two problems that can occur. One is an index might not be used and the second is row count estimate error.

A data type mismatch can occur in the data type between two columns in a join condition or it can be in the search argument (SARG) specification between a column and a parameter (or explicit) value. When a data type mismatch occurs, SQL Server implicitly converts the more restrictive element to the other type. For example, a smallint would be converted to an int, or a varchar would be converted to nvarchar.

On an implicit conversion, SQL Server must strictly follow the rules pertaining to the conversion. The most problematic implicit conversion is from ASCII (char and varchar) to Unicode (nchar and nvarchar). If the example were: Column (type: varchar) = @Param (type: nvarchar), then SQL Server cannot use an index seek on the column, but rather must employ a scan.

This specific implicit conversion also causes row count estimation inaccuracies. So the problem caused by implicit ASCII to Unicode conversion is not only the direct operation, but could also result in problems further on in the execution plan due to propagation of the row count estimate errors accompanying the conversion.

Consider the simple example below. The Batch table Name column is type varchar. When the search parameter value is specified as nvarchar, the row count estimate is 1290, while a parameter type of varchar results in an estimate of 2 rows.
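A minimal sketch of the mismatch (the literal value is illustrative):

```sql
-- Name is varchar. The N'...' (nvarchar) literal forces an implicit
-- conversion on the column side, degrading the row count estimate
-- and preventing an index seek on Name.
SELECT * FROM dbo.Batch WHERE Name = N'Batch01';  -- mismatched type
SELECT * FROM dbo.Batch WHERE Name = 'Batch01';   -- matched type
```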

Note: SQL Server 2012 shows the yellow triangle with an exclamation mark when there is an implicit conversion. Earlier versions did not have the visual cue, but the information is in the operator detail.
