The previous chapters examined the true query costs for some basic SQL operations. On reducing the measured costs to formulas, a simple model can be used for calculating the performance of a database application based on characteristic queries, and cardinality of the data. From the material covered in the previous chapters, the main components that describe query cost are categorized into the following areas:
1) Procedure (base) overhead cost
2) Startup costs
3) Single row query costs
4) Multi-row and Table Scan costs
Procedure overhead cost includes all costs incurred by the server for receiving and handling a stored procedure but excludes costs due to actual queries involving tables. Presumably there are similar costs for receiving an SQL statement which requires parsing, but this has not been investigated. The procedure overhead cost is also referred to as the base overhead cost.
Startup costs are associated with table operations. The startup cost is defined here as a cost associated with a type of query that occurs once per procedure regardless of how many times queries of that type appears in a procedure.
Single row query costs include the cost of the query, excluding the base overhead cost and any startup costs. Single row query costs are reasonably easy to understand from the material in the previous chapters and are not discussed here.
Multi-row operations have a fixed cost and a cost per additional row. Table Scan costs have a fixed cost and a cost per additional page when not in row lock mode. When a query returns multiple rows, the fixed cost refers to the cost for producing the first row. When a table scan involves multiple pages, the fixed cost refers to the cost for scanning the first page. The remaining cost is simply the cost per row times the number of rows. The full cost for one or more single row queries in a stored procedure is expressed as:
Base overhead cost + Startup cost + Query cost
The cost for multi-row queries has only been examined as aggregates and is expressed below. Aggregation avoids network saturation, but also excludes the cost of sending rows over the network.
Base overhead cost + Fixed cost + per row costs
The fixed cost as reported here may incorporate a startup cost component, as not all operations were tested to resolve the startup cost.
The information presented in the Query Costs chapters show that the procedure overhead cost is on the order of 5-7 times higher than a single row Index Seek operation. This is determined by issuing a procedure without any table operations. Not all of the overhead cost is directly attributed to the network. Even if the client application resides on the same server as the SQL Server database, there is still substantial overhead cost. Some of this cost is associated with interrupting the SQL Server process. Even when an execution plan exists, there is some cost to finding the plan. Stored procedures and other information are stored in system tables. So there are costs for looking up system information that do not show up in the plan costs as exposed by Display Estimated Execution Plan or other functions.
The overhead costs for issuing a stored procedure (by RPC) for tested platforms are listed in Table 6.1. The costs are higher for queries issued with sp_executesql. The maximum theoretical procedures per second rates are computed based on all available CPU-cycles being used for handling the procedure call. Naturally, this leaves no CPU-cycles for actual database queries. This also assumes that there is sufficient network bandwidth to handle the traffic. The test setup used here did not have the capability to fully saturate the database with blank stored procedures. The highest measured procedure rate was about 60% of the theoretical maximum.
|Stored procedure calls per sec versus query cost||2xPIII 733/256K||4xPIII Xeon 700/2M||2xXeon 2.0/512K|
|Base overhead cost (CPU-cycles)||140,000||140,000||270,000|
|Available CPU-cycles per second||1,466M||2,800M||4,000M|
| Maximum theoretical Proc./sec |
(zero query cost)
|Proc./sec for 140/270K query cost||5,236||10,000||7,407|
Table 6.1 Rows per second capability based on 100% of CPU-cycles.
It does not matter if it is actually possible to achieve the theoretical maximum. The theoretical maximum serves as a guide for what query rate to plan on being well below. When the average query cost is the same the procedure overhead (140K for PIII/ 270K for Xeon), the maximum procedure throughput is one-half of the theoretical maximum. In this case, 50% of CPU-cycles are used for overhead and 50% for the actual queries. If it is anticipated that a database application will handle a heavy volume of low-cost queries from application server, then the Virtual Interface Architecture (VIA) might be a good solution. VIA supposedly has lower overhead cost for procedure calls between servers. When the typical query is complex, that is, several times more expensive than the base overhead, then procedure overhead cost is not an issue.
The startup cost is determined by comparing the cost of executing a given statement once in a procedure to the cost of executing the statement more than once per procedure. Table 6.2 lists the startup costs for some basic operations.
|Startup cost||2xPIII 733/256K||4xPIII Xeon 700/2M||2xXeon 2.0/512K|
|SELECT – single row Index Seek||20-30,000||Very low||30-40,000|
Table 6.2 Startup costs for basic operations.
It could be that the startup cost is related to the type of locking required for a procedure. Hence the SELECT query has lower startup cost than INSERT, UPDATE and DELETE queries. Queries that employ explicit transactions or SERIALIZABLE READ and REPEATABLE READ might also have higher startup costs. This is only a possibility as the nature of the startup cost has not been fully investigated.
The startup cost for simple single row operations is presented because the cost is significant relative to the single row query cost itself. The startup cost for more complex queries has not been investigated since the query cost is believed to be much higher than the startup cost.
Multi-row and Table Scan operations have a fixed cost and a cost per additional row or page. The fixed cost and cost per additional row are determined by examining a multi-row query across a number of rows. The same applies for Table Scans across a range of pages. The fixed cost may contain a startup cost, as the startup cost for multi-row operations was generally not investigated for more complex operations. Technically, the fixed cost should subtract the cost for one row or page, but this is usually small and is not considered.
Table 6.3 lists the fixed cost for some multi-row and multi-page operations.
|Fixed cost for multi-row/page operations||2xPIII 733/256K||4xPIII Xeon 700/2M||2xXeon 2.0/512K|
|Multi-row Index Seek||90,000||40,000||140,000|
|Bookmark Lookup (Sequential)||110,000||24,000||150,000|
|Bookmark Lookup (Distributed)||110,000||24,000||180,000|
Table 6.3 Fixed cost for multi-row some operations.
It might be logical to conclude that the fixed cost for a multi-row Index Seek operation should be the startup cost plus the single row Index Seek cost. The multi-row Bookmark Lookup fixed cost should be the startup cost plus the single row Bookmark Lookup cost. The loop join fixed cost should be the startup cost plus twice the single-row Index Seek cost. In fact the fixed cost is somewhat higher. It could be that the aggregate function adds some costs of its own.
Table 6.4 is a listing of the CPU-cycles cost per row for certain operations. Table 6.5 lists the Table Scan cost per page with page lock and table lock. A Table Scan with no lock has essentially the same cost as with table lock.
|CPU-cycles cost per row by Operation||2xPIII 733/256K||4xPIII Xeon 700/2M||2xXeon 2.0/512K|
|Multi-row Index Seek||1,300||1,300||1,800|
|Bookmark Lookup on Clust. Index||17,500||15,500||21,000|
|Bookmark Lookup on Heap||11,500||11,500||15,000|
|Merge Join, Page Lock||3,000||3,000||4,000|
Table 6.4 General cost per row summary.
|CPU-cycles cost per page||2xPIII 733/256K||4xPIII Xeon 700/2M||2xXeon 2.0/512K|
|Table Scan – Page Lock||26,000||26,000||26-27,000|
|Table Scan – Table Lock||25,000||25,000|
Table 6.5 Table Scan cost per page (99 rows per page) with page lock.
The row and page count costs set the upper limit on system performance in terms of the number of row and page operations per second that can be performed. The row count for critical operations needs to be considered carefully. An application cannot blindly allow uncontrolled growth in the row count of key queries. The type of row operations needs to be considered. Ideally, high-row count Bookmark Lookup operations should be avoided. There can be only one clustered index for each table, which may not encompass all high row-count queries. A covered index can also eliminate Bookmark Lookups. However, covered indexes may not be effective if large columns are involved. Indexed view may be a viable alternative if read queries are more frequent the write queries. If there is not a preferred column for a clustered index, then consideration should be given to leaving the table as a heap, where the Bookmark Lookup costs per row are significantly lower. One other point of note is that the SQL Server query optimizer will switch a Bookmark Lookup plan to a Table Scan at a much lower row count than the true cross-over point where a Table Scan is more effective than Bookmark Lookups.
When multi-row joins are involved, then Merge join is most effective if a one-to-many join can be guaranteed with a primary key, and if an approximately equal number of rows from both tables are involved and if clustered indexes can isolate the required rows from both sources. High row-count loop joins may be a reason for concern.
Table Scans in execution plans should be carefully watched. Table Scans should be allowed only if there is a legitimate reason. Even then, every effort should be made to determine whether row-locking is required, and if not, have row locking explicitly disabled.
Table 6.6 shows the peak rows per second capability for various operations excluding other costs such as the base overhead and startup or fixed costs. Table 6.7 shows the peak pages per second capability for the listed operations on each of three platforms.
|Peak rows per second for various Operations||2x PIII 733/256K||4x PIII Xeon 700/2M||2x Xeon 2.0/512K|
|Multi-row Index Seek||1,127,692||2,153,846||2,222,222|
|Bookmark Lookup on Clust. Index||83,771||180,645||190,476|
|Bookmark Lookup on Heap||127,478||243,478||266,667|
|Merge Join, Page Lock||488,667||933,333||1,000,000|
Table 6.6 Rows per second capability based on 100% of CPU-cycles.
|Pages per second||2xPIII 733/256K||4xPIII Xeon 700/2M||2xXeon 2.0/512K|
|Table Scan – Page Lock||53,565||102,308||146,154|
Table 6.7 Pages per second capability based on 100% of CPU-cycles.
The simple model of excluding one time cost per procedure or query allows quick computations of what is possible and what is not. For large queries where the base and fixed costs are small relative to the cost per row contribution, the peak row and page throughput is a good estimate. For small queries, the base overhead procedure handling cost and any startup or fixed costs needs to be taken into account. For example, a database application that is targeting a performance of 100 complex queries per second (plus some simple queries) on a dual processor Xeon 2.0GHz/512K server, the average complex query needs to be within 22,000 row for a multi-row index seek, 1,900 rows for Bookmark Lookups on a Clustered Index or 2,600 on a Heap organized table, 1,900 rows for Loop joins, 4,000 for merge joins or 10,000 for merge joins with page lock on both table and 1,400 pages for Table Scan. The fixed cost for a hash join is rather high, so these simple calculations should be used with care.
Figure 6.1 shows the queries per second capability for a 4xPentium III 700/2M systems calculated from on the base cost, fixed costs and costs per row or page for the various operations based on 100% CPU utilization. The horizontal scale is rows for all operations except the table scan, which is in pages. The values for Figure 6.1 are listed in Table 6.8.
Figure 6.1Calculated queries per second for 4xPIII Xeon-700/2M.
|Rows/Pages||Index Seek||BL on Cl||BL on Heap||Loop Join||Hash Join||Merge Join||Merge PL||Table Scan|
Table 6.8 Calculated queries per second for 4xPIII Xeon-700/2M.
The measured performance did not achieve the calculated performance at the lower rows per query, but that may have been due to saturation on the load generator system. The maximum query rate achieved by the load generator system in these tests is approximately 9,000 per sec. The measured performances at the higher row count per query are reasonably in line with the calculated values.
Figure 6.2 shows the calculated queries per second capability for the 2xXeon 2.0GHz/512K system. The values are listed in Table 6.9.
Figure 6.2 Calculated queries per second for 2xXeon-2.0GHz/512K.
|Rows/Pages||Index Seek||BL on Cl||BL on Heap||Loop Join||Hash Join||Merge Join||Merge PL||Table Scan|
Table 6.9 Calculated queries per second for 2xXeon-2.0GHz/512K.
The same observation for the previous systems applies. The maximum query rate generated by the load generator system in these tests is approximately 8,000 per sec.
A database application might be characterized by a set of queries consisting of a mix of operations weighted according to usage. The assumption made here is that the average query cost can be approximated by taking a weighted average of the number of procedure calls and the component operations in an additive manner. This may not be exactly true, but initial assessments indicate that a simple additive approach is workable in the absence of excessive contention.
Figure 6.3 shows the query cost budget, excluding the base overhead, for a given query rate.
Figure 6.3 Calculated query budget versus query rate.
For example, if 5,000 queries sec are required, the budget is 420K CPU-cycles on the 4xPentium III Xeon-700MHz or 530K on the 2xXeon 2.0GHz in addition to the 140K base overhead on the first system and 270K on the second. Note that the cost of a particular operation differs from one platform to the other. The 2x2.0GHz Xeon system with 4 billion CPU-cycles per second, versus the 2.8 billion for the 4x700MHz Pentium III system, has the higher query budget in except at high query rates. At high query rates, however, the higher base overhead cost on the Xeon system reduces the available budget.
A simple method for calculating database performance based on the results observed from the measured query costs is discussed. It is probable there are other issues that significantly influence performance calculations. These issues will be covered as time permits. Calculations for write operations will be discussed at a later time.