SQL Server Benchmarks is now split into subsections:
 TPC-C,  TPC-E,  TPC-H (Updated 2011-10),  SPEC CPU 2006 Integer,  and older material on TPC-H (to be replaced)
Additional related topics:
  SQL Server 2008 R2 Data Warehouse Performance Evaluation (2010-07)
  Solid State Drive versus Memory, TPC-H Nehalem (Updated 2009-11)
  Benchmark Omissions (2010-04)
  SQL Server Benchmark Results (November 2009)
  TPC-H Studies (informal studies, does not meet TPC requirements)

Benchmark Summary 2012-11-26

Fujitsu published on Nov 5, 2012 a TPC-E result of 2,651.27 tpsE for a 4-way Xeon E5-4650 (Sandy Bridge-EP) 2.7GHz 8-core system with 512GB memory and 90 x 200GB SAS SSD storage. The software stack is Windows Server 2008 R2 EE SP1 and SQL Server 2012 EE.

This is less than the IBM result of 2,862.61 on Jun 28, 2011 for 4 x E7-4870 (Westmere-EX) 2.4GHz 10-core with 1024GB memory, also on 90 x 200GB SAS SSD. The software stack is Windows Server 2008 R2 EE and SQL Server 2008 R2 EE.

The other comparable is an HP result of 2,454.51 on Jun 20, 2011 for a 4-way E7-4870 (Westmere-EX) 2.4GHz 10-core with 1024GB memory on 1100 x 15K HDD over 11 RAID controllers. The software stack is Windows Server 2008 R2 EE and SQL Server 2008 R2 EE SP1.

Processor (process, cores)                  TPC    2-way                 4-way               8-way           16-way
Sandy Bridge 32nm, 8C                       TPC-C  -                     -                   -               -
                                            TPC-E  1,871.81 (E5 8c)      2,651.27            -               -
                                            TPC-H  -                     -                   -               -
Interlagos 32nm, 16C                        TPC-C  1,207,982             n/a                 n/a             n/a
                                            TPC-E  1,232.84              n/a                 n/a             n/a
Westmere 32nm, Xeon 5600 6C / E7-x870 10C   TPC-C  1,024,380             3,014,684 (DB2)     -               future
                                                   803,068
                                            TPC-E  1,284.14 (X5690 6c)   2,862.61 (SSD)      4,593           future
                                                   1,560.70 (E7 10c)     2,454.51 (HDD)
                                            TPC-H  73,974.6@100G         -                   219,888@1TB     future
                                                                                             173,961@1TB
Magny-Cours 45nm, 12C                       TPC-C  705,652               1,263,599           n/a             n/a
                                            TPC-E  887.4                 1,464               n/a             n/a
                                            TPC-H  71,438.3@100G         107,561@300G        n/a             n/a
Nehalem 45nm, Xeon 5500 4C / 7500 8C        TPC-C  661,475               2,308,099 (DB2)     -               -
                                                                         1,807,347 (SQL)
                                            TPC-E  850.0                 2,022.64            3,800.00        -
                                            TPC-H  51,086@100G           121,346@300G        162,601@3TB     -
                                                                         101,719.3@1TB

Benchmark Summary 2012-10-02

A TPC-E result for the 2-way Xeon E5 (Sandy Bridge-EP) came in June 2012. The result is for the E5-2690, 8-core 2.9GHz. Notice the performance improvement over both the 2 x 10-core 2.4GHz E7 (Westmere-EX) and the 2 x 6-core 3.46GHz Xeon 5600, reflecting both improved aggregate core-GHz and improved core compute per cycle (IPC).

Benchmark Summary 2011-11-xx

AMD Interlagos 16-core results are out for the 2-way HP DL385, for both TPC-C and TPC-E.

Benchmark Summary 2011-10-01

Below is a summary of the best available TPC benchmark results for Intel Xeon (Westmere 32nm and Nehalem 45nm) and AMD Opteron (Magny-Cours) server systems. Most results are SQL Server with a few Oracle or Sybase results. Ingres Vectorwise and Exa Solutions results are not in this list.

There has been relatively light activity in TPC benchmarks recently, with the exception of the raft of Dell TPC-H results with Exa Solutions. It could be that systems today are so powerful that few people feel the need for benchmarks. IBM published an 8-way Xeon E7 (Westmere-EX) TPC-E result of 4593 in August, slightly higher than the Fujitsu result of 4555, published in May 2011. Both systems have 2TB memory. IBM prices 16GB DIMMs at $899 each, which is $115K for 2TB or $57.5K per TB. The Fujitsu system has 384 SSDs of the 60GB SLC variety at $1014 each, and IBM employed 143 SSDs of the 200GB eMLC variety at $1800 each. Except for unusually write-intensive situations, eMLC or even regular MLC is probably good enough for most environments.
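The memory and SSD figures above are easy to check with a few lines of Python. This is only a sketch of the arithmetic using the prices and counts quoted in the paragraph; the variable and function names are my own, and the per-GB figures are derived, not taken from the TPC reports.

```python
# Figures quoted in the text: IBM 16GB DIMMs at $899, 2TB of memory,
# Fujitsu 384 x 60GB SLC SSDs at $1014, IBM 143 x 200GB eMLC SSDs at $1800.
DIMM_GB = 16
DIMM_PRICE = 899
MEMORY_GB = 2 * 1024

dimm_count = MEMORY_GB // DIMM_GB          # DIMMs needed for 2TB
memory_cost = dimm_count * DIMM_PRICE      # total memory cost in USD


def ssd_totals(count, size_gb, unit_price):
    """Return (capacity in GB, total cost in USD, cost per GB)."""
    capacity = count * size_gb
    cost = count * unit_price
    return capacity, cost, cost / capacity


fujitsu_ssd = ssd_totals(384, 60, 1014)
ibm_ssd = ssd_totals(143, 200, 1800)

print(f"memory: {dimm_count} DIMMs, ${memory_cost:,}")
for name, (cap, cost, per_gb) in (("Fujitsu SLC", fujitsu_ssd),
                                  ("IBM eMLC", ibm_ssd)):
    print(f"{name}: {cap:,} GB, ${cost:,}, ${per_gb:.2f}/GB")
```

The 128 DIMMs come to $115,072, matching the $115K above. The eMLC configuration ends up with more raw capacity for less money, which is the point of the comparison.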

HP published a TPC-H result at 1TB of about 219,888 QphH for their 8-way ProLiant DL980 G7 with the Xeon E7-4870. Also of interest are the TPC-H 1TB reports published for the 16-way SPARC M8000 (June 2011) with SPARC64 VII+ processors and the 4-way SPARC T4-4 (Sep 2011). IBM published an 8-way 10-core TPC-E result with a slightly higher score than the previous Fujitsu result.

Benchmark Summary 2011-07-31

IBM published a TPC-E for a 2-way Xeon E7-2870, 10-core 2.40GHz at 1,560.7 tpsE, about 20% higher than the 2-way Xeon 5690 6-core 3.46GHz result below. The 2x10-core E7-2870 system was configured with 512GB memory versus 192GB for the 2x6-core Xeon 5690. Both systems employed SSD storage.

In June, IBM published a 4-way Xeon E7-4870 TPC-E with SSD storage, compared with the previous 4x10-core result on HDD storage.

2011 May Intel Xeon 5600 refresh and Westmere-EX (E7) processors

Also, HP published a TPC-E result with the Violin Memory System V3205 Flash Array storage system.

2011 Mar 4 IBM 4-way 32-core Xeon 7560 TPC-H SF1000

2010 Oct 5 Oracle/Sun SPARC 32-way 128-core TPC-H SF3000

Oracle published a TPC-H SF3000 result. There are now reasonably recent TPC-H SF3000 reports for Xeon 7400, 7500, IBM POWER6 and SPARC systems, encompassing SQL Server, Sybase and Oracle. It's too bad there is still no TPC-H result for IBM POWER7, as the expectation is that this processor is comparable to the Intel Nehalem-EX architecture at the core level. The difference is that the design point of the Xeon 7500 is 4-way and 8-way systems, while for POWER7 it is 32-way systems.

2010 Sep 28 Fujitsu RX 900 8-way Xeon 7560 TPC-E 1.85X scaling over 4-way

Fujitsu just published an astounding TPC-E benchmark result of 3,800 tpsE for their 8-way Xeon 7560 system, the Primergy RX900 S1. Fujitsu had previously published a TPC-E result of 2,046.96 for their 4-way Xeon 7560 system, the Primergy RX600 S5. The new result is 1.856X the 4-way figure, an 85.6% gain from 4-socket to 8-socket.

Microsoft Windows Server 2008 R2 introduced core OS improvements that not only increased the number of logical processors supported from 64 to ???, but also removed many locks, including the dispatch scheduler lock. This improved high-end scaling (64 to 128 cores?) from 1.5X to 1.7X, based on tests with the HP Superdome and Itanium processors. At the time of this announcement, the Xeon 7500 processors were not yet available.

When the Xeon 7500 did become available in early 2010, the first TPC-E benchmarks were 2,022.64 and 3,141.76 tpsE for the 4-way and 8-way Xeon 7560 systems respectively. The scaling from 4S to 8S was 1.55X, well below the expectation of 1.7X set by Microsoft's 2008 R2 announcement. This was understandable, as the 8-way result was probably rushed to align with the product launch. Perfect benchmark results are ready on their own schedule, which is not always in time for marketing blitzes. (Of course, considering that the marketing budget may be paying for the benchmarks, it would be advisable to try really, really hard to have a good result for product launch.)
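For reference, the two 4S to 8S scaling figures can be reproduced directly from the published tpsE numbers quoted above. The sketch below is just that arithmetic; the function name and the "efficiency" figure (scaling divided by the 2X ideal for doubling the sockets) are my own illustration.

```python
def scaling(four_way_tpse, eight_way_tpse):
    """Throughput ratio going from 4 sockets to 8 sockets."""
    return eight_way_tpse / four_way_tpse


# Launch results (early 2010) and the later Fujitsu pair, from the text.
launch = scaling(2022.64, 3141.76)      # about 1.55
fujitsu = scaling(2046.96, 3800.00)     # about 1.86

for label, ratio in (("launch", launch), ("Fujitsu", fujitsu)):
    # Doubling the sockets would ideally double throughput (2.0X).
    print(f"{label}: {ratio:.3f}X, {ratio / 2:.1%} of ideal")
```

Note that the two ratios use different 4-way baselines (2,022.64 versus Fujitsu's own 2,046.96), so they are not strictly the same comparison.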

There are two apparent differences between the new Fujitsu and original NEC 8-way Xeon 7560 TPC-E reports. One is that the Fujitsu system uses SSD while the NEC system used HDD storage. The SSD configuration yields much better average response times, mostly in the Trade Lookup and Trade Update transactions, with reductions from 50/56ms to 13/14ms respectively. In the 4-way Xeon 7560 TPC-E reports, the use of SSD over HDD yields a 1% improvement. The other difference is that the Fujitsu system distributes network traffic over 6 GbE ports compared with 2 for the NEC system. There are 24 or so(?) RPC calls per TPC-E transaction, so the extra network ports might provide another minor improvement.

Nothing apparent can explain the 4S to 8S scaling improvement from 1.55X to 1.85X. This is certainly not impossible, as IBM figured out how to do this and better with their POWER4 line some years ago. At the time, I thought this was mostly the massive inter-processor bandwidth of the POWER4. Now it is more clear that the OS and database engine along with processor and system architecture together all contribute to exceptional, nearly perfect scaling.

My thinking is that someone at Microsoft has been watching the performance traces and finally figured out the most critical points of contention. So I believe this is a new build of Windows and SQL Server, but build numbers do not seem to be obvious in the TPC reports, even though full disclosure is required. It is never some magic registry entry like Turbo Mode: ON.

2010 Sep 19 4-way Xeon 7560 TPC-C and TPC-H@300, HP ProLiant DL580 G7

A TPC-H result was finally published for the 4-way Xeon 7500 @300GB on 14 Sep. A TPC-C result was also published for the 4-way 7500 on 27 Aug. There will probably not be a TPC-C for the 8-way DL980, as there may be a limitation for SQL Server in the ability to write to a single log file. HP seems to be the only vendor active in TPC-H. This could be because other companies have cut staff. Benchmarking is a specialized skill. It usually takes a dedicated person for each benchmark and environment. It is not the benchmark result that is important. It is the investigation into the root cause of bottlenecks to improve performance in the next iteration that is important. So this means only HP will be making contributions in DW.

Benchmark Summary 2010-09-xx

Below is the original summary of the best available TPC benchmark results, with history going back to Conroe/Dunnington and Barcelona.

Processor (process, cores)               TPC    2-way           4-way          8-way          16-way
Core2 65nm, Xeon 5300 QC / 7300 QC       TPC-C  251,300         407,079        841,809        -
                                         TPC-E  5160 only       479.51         804.0          1,250.0
                                         TPC-H  17,686@100      34,990@100     46,034@300     -
Barcelona 65nm, QC                       TPC-C  -               471,883        -              -
                                         TPC-E  -               -              -              -
                                         TPC-H  -               -              52,860@300     -
Core2 45nm, Xeon 5400 QC / 7400 6C       TPC-C  275,149         634,825        Linux DB2      -
                                         TPC-E  317.45          729.65         1,165.56       2,012.8 (R2)
                                         TPC-H  -               -              -              102,778@3T
Shanghai 45nm, QC                        TPC-C  -               579,814        -              -
                                         TPC-E  -               635.4          -              -
                                         TPC-H  -               -              57,685@300G    -
Istanbul 45nm, 6C                        TPC-C  -               -              -              -
                                         TPC-E  -               -              -              -
                                         TPC-H  -               -              91,558@300G*   -
Nehalem 45nm, Xeon 5500 QC / 7500 8C     TPC-C  661,475         1,807,347      -              -
                                         TPC-E  850.0           2,022.64       3,141.76       -
                                         TPC-H  51,086@100G     121,346@300G   162,601@3TB    -
Westmere 32nm, Xeon 5600 6C / 7600 10C   TPC-C  803,068         future         future         future
                                         TPC-E  1,110           future         future         future
                                         TPC-H  73,974.6@100G   future         future         future
Magny-Cours 45nm, 12C                    TPC-C  705,652         1,193,472      n/a            n/a
                                         TPC-E  887.4           1,464          n/a            n/a
                                         TPC-H  71,438.3@100G   107,561@300G   n/a            n/a
* and 81,514QphH@1000GB
Xeon W5580 3.2GHz, instead of X5570 2.93GHz (Linux/Oracle too)
the Hot Chips 22 conference lists a paper: Westmere-EX: A 20 Thread Server CPU.
IBM Power 780 with 2 quad-core POWER7 4.14GHz TPC-C: 1,200,011 (4 logical proc/core)
HP Integrity Superdome 16 quad-core Itanium 9350 TPC-H 140,181.1 QphH@1000GB

2010 July 16 Xeon 5600 and Opteron 6100 Benchmark Results

TPC-H results were published for the ProLiant DL380 G7 and DL385 G7 (both @100GB). Notice that at 2-way, the Xeon 5680 is higher than the Opteron 6176 by 14% on TPC-C and 25% on TPC-E, but only by 3.5% on TPC-H. This is expected because Hyper-Threading provides a large boost on high call-volume workloads (low average cost per call), and a smaller boost on large (DW) queries.
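The percentages can be checked against the 2-way Westmere and Magny-Cours entries in the 2010-09 summary table above. A minimal sketch, assuming those table values are the results being compared (the dictionary layout and names are mine):

```python
# 2-way results from the summary table: (Xeon 5680, Opteron 6176).
results = {
    "TPC-C": (803_068, 705_652),     # tpmC
    "TPC-E": (1_110, 887.4),         # tpsE
    "TPC-H": (73_974.6, 71_438.3),   # QphH@100GB
}

# Percentage by which the Xeon result exceeds the Opteron result.
advantage = {
    name: (xeon / opteron - 1) * 100
    for name, (xeon, opteron) in results.items()
}

for name, pct in advantage.items():
    print(f"{name}: Xeon 5680 ahead by {pct:.1f}%")
```

This gives roughly 14%, 25% and 3.5% for TPC-C, TPC-E and TPC-H respectively.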

Unisys published a TPC-E for a 4-way Xeon 7560, system ES7000 Model 7600R G2, which is an OEM of the IBM x3850 X5.

2010 June 21

HP ProLiant servers, and other results

HP has just announced the ProLiant DL580 G7 and DL980 G7 servers based on the Xeon 7500 series processors, and the DL585 G7 4-way server with the 12-core AMD Opteron 6100 series (Magny-Cours).
Apparently the reason for the delay is that the 8-way DL980 G7 employs custom silicon node controllers (XNC), and possibly also so that HP could make a splash by announcing all three systems at their big annual conference, the HP Technology Forum. The DL580 and 585 G7 are available now(?), and the 980 G7 should be available later in Q3.

TPC-H results for the DL585 G7 (300GB) and DL980 G7 (3TB). TPC-E results for the DL580 G7 and DL585 G7. TPC-E results for the DL380 G7 and DL385 G7 were previously published in May 2010.

While the Intel Xeon 7500 processor allows a glue-less 8-way system, HP felt that the design could be improved with node controllers. The node controllers reduce snoop traffic for a majority of memory accesses, and can achieve a 30% reduction in memory latency in some circumstances. It should be considered that HP needed to build a custom silicon crossbar (and node controllers) for their Superdome 2 system and the Itanium 9300 series processors, which use the same QuickPath Interconnect (QPI) as the Nehalem processors. There are differences in the way the Itanium and Xeon processors use QPI. There are also differences between the node controllers for the Itanium and Xeon systems. (The Itanium node controller implements directory-based cache coherency and the Xeon node controller is a snoop filter.)

HP might have built a glueless 8-way Xeon 7500 system if they had not already invested the effort to build the XNC for their Itanium systems. This also means that HP should have the components to build a 16-way Xeon 7500 system, meaning that if there were market demand, such a system could be brought to market. Intel did say that there were 16-way Xeon 7500 system designs, but none have surfaced yet.

Dell has also released a 2-way TPC-E result for the Xeon 5600, and Fujitsu released a 4-way TPC-E result for the Xeon 7500.

System Pricing

Below is the bare system pricing based on HP's published TPC reports in 2010. Base system pricing only includes the system chassis and processors. There might be differences in memory pricing because of differences in memory type.

           Intel             AMD
4-way
System     DL580 G7          DL585 G7
Processor  4 x Xeon 7560     4 x Opteron 6176
Price      $24,597           $11,235

2-way
System     DL380 G7          DL385 G7
Processor  2 x Xeon 5680     2 x Opteron 6176
Price      $6,189            $5,109

I do not consider the TPC overall pricing and price-performance metrics to be particularly helpful in comparative assessment. The software licensing may be much higher than the server costs when per-processor licensing applies, but is inconsequential for CAL mode in DW environments. Also, storage can be a very large part of the overall system cost. Some vendors can use SSD or direct-attach for favorable pricing, while other vendors employ more expensive SAN storage. In any case, the customer always picks their own storage, so the storage pricing in the TPC reports is of no consequence to the actual customer. That being said, there is no meaningful difference in the TPC-E Price/Performance between the AMD and Intel based platforms at either 2-way or 4-way.

The AMD objective for the near-term (until Bulldozer?) is probably not competing with Intel on a socket basis but overall system price. The price difference between the ML/DL 380 and 385 is nominal at best, and so I would definitely prefer the 380 with the Xeon 5680, because any time I have a big single threaded operation, I want the fastest core, not aggregate core-GHz.

At the 4-way level, between the DL580 and DL585, there is a better story. If we add 256GB memory at $16K (the HP report says $32K for the DL585 G7, but I think that is too high), then we are looking at $40.5K versus $27K, which is 1.5 to 1. This roughly corresponds to the difference in performance. So it all comes down to the cost of the other components: storage and software licensing.
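To put the 1.5 to 1 figure next to the performance numbers, here is a rough sketch using the base prices from the pricing table, the $16K memory estimate above, and the 4-way Nehalem and Magny-Cours results from the 2010-09 summary table. Applying the same $16K memory figure to both systems is my assumption, as are the names used.

```python
MEMORY_COST = 16_000   # assumed 256GB memory price, applied to both systems

dl580_price = 24_597 + MEMORY_COST   # 4 x Xeon 7560
dl585_price = 11_235 + MEMORY_COST   # 4 x Opteron 6176
price_ratio = dl580_price / dl585_price

# 4-way results from the summary table: (Xeon 7560, Opteron 6176).
results = {
    "TPC-C": (1_807_347, 1_193_472),
    "TPC-E": (2_022.64, 1_464),
    "TPC-H@300G": (121_346, 107_561),
}
perf_ratio = {name: intel / amd for name, (intel, amd) in results.items()}

print(f"price: {price_ratio:.2f} to 1")
for name, ratio in perf_ratio.items():
    print(f"{name}: {ratio:.2f} to 1")
```

The price ratio comes out at 1.49. The performance ratios range from about 1.13 on TPC-H to about 1.51 on TPC-C, so "roughly corresponds" holds best for the OLTP benchmarks.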

I am inclined to recommend the 2-way Xeon 5600 series for medium tasks, the 4-way Xeon 7500 series for big tasks, and the 8-way Xeon 7500 ($79K base) for really big tasks. Yes, the 4-way 12-core Opteron does fit in the gap between 2 x 5600 and 4 x 7500. But I am not sure this is a viable gap, as I would recommend the 2 x 5600 when the choice is between a middle and big system, in part because processors today are so powerful and because the Xeon 5600 is the fastest for single-threaded tasks.

ps - High-Availability requirements might steer towards the Xeon 7500 because of the MCA features etc.

2010 April xx

Intel Xeon 5600 (Westmere-EP) and 7500 (Nehalem-EX) Performance

Intel launched the Xeon 5600 series (Westmere-EP, 32nm) six-core processors on 16 March 2010 without any TPC benchmark results. In the performance world, no results almost always means bad or not good results. Yet there is every reason to believe that the Xeon 5600 series with six cores (X models only) will perform exactly as expected for a 50% increase in the number of cores at the same frequency (as the 5500) with no system level bottlenecks. The expectation is that a six-core Xeon 5600 should provide a 30%+ improvement over the comparable quad-core Xeon 5500 in throughput oriented tests, particularly OLTP type workloads. Single stream parallel execution plans will probably show less gain, as scaling via parallelism is not a simple matter.

Then two weeks later on 30 March 2010, Intel launched the Xeon 7500 series 8-core processors for 4-way+ systems (and the Xeon 6500 for high-end 2-way systems) with TPC-E results on 4-way and 8-way systems but no TPC-H results. The TPC-E results were exactly what Intel said it was going to be last September at IDF, 2.5X over the previous generation Xeon 7400 series and 2.5X over the contemporary 2-way Xeon 5500 series.

My guess is that Intel wanted it to be clear that the 4-way Xeon 7500 achieved the stated performance objective of 2.5X over the 2-way Xeon 5500, just in case some slide-decks did not mention which 2-way system the 2.5X claim referred to. Of course, the Intel statement of 2.5X for Xeon 7500 was most probably based on performance measurements already run on prototype systems. It was probably also felt that the Xeon 5600 series is such a natural choice to supersede the 5500 series that TPC benchmarks were not essential, as there were sufficient other benchmarks to support the claims.

In brief, the Intel Core 2 architecture processors were avoiding comparisons against AMD Opteron in TPC-H, except for the 16-way Unisys system, for which there is no comparable Opteron system.

Opteron, on the other hand, avoided comparison with the Core2 architecture in 2-way systems and in TPC-C/E OLTP benchmarks across the board. In the 2-way systems, the old Intel FSB technology was still adequate, and the powerful Core2 architecture core was enough to beat a 2-way Opteron. There were respectable 4-way TPC-C and TPC-E results for Shanghai. When AMD announced the HT-Assist feature in Istanbul, one might have thought AMD was finally going to be able to compete in 4-way OLTP. But there have been zero benchmarks published to date.

When the 2-way Intel Xeon 5500 processor, based on the Nehalem architecture, came out in early 2009, outstanding results were published for both the OLTP oriented TPC-E and the DW/DSS oriented TPC-H. In February 2010, a TPC-C was published as well, even though Microsoft had previously said all new OLTP benchmarks were going to be TPC-E. This result was with SQL Server 2005 instead of 2008.

There was every expectation with the Xeon 7500 Nehalem-EX, that there would be both OLTP and DW/DSS benchmark results, as Xeon 7500 should produce world-class (and world-record) results in both. It is possible that performance problems were encountered in trying to achieve good scaling over 32-cores and 64-threads in a 4-way Xeon 7500 system. If this is identified as something that can be fixed in the Windows operating system or SQL Server engine, then a change request would be made. I seriously doubt that another processor stepping would be done for this, as Xeon 7500 is already D-step at release.

TPC-H Scaling

It is also quite possible Intel will have to face the fact that 2.5X the 2-way Xeon 5500 TPC-H SF100 result of 51,000 QphH is not going to be achieved no matter how good Xeon 7500 is at DW. This is because the TPC-H score is a geometric mean over the 22 queries. There are several small queries in TPC-H, two of which already run in under 1 second on the 2-way 8-core Xeon 5570 for SF100, and several that run near or under 2 seconds. There is limited opportunity to continue to improve the performance of small queries with an increasing degree of parallelism, as the overhead to set up each thread becomes larger compared to the actual work done by each thread, especially if one also has to give up frequency, dropping from 2.93 to 2.26GHz. It would be helpful to know what the actual frequency is during a performance run with the turbo-boost feature.
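The effect of the geometric mean is easy to illustrate. The query times below are made up for the sketch (they are not from any TPC-H report): 5 small queries that gain nothing from more cores, and 17 large queries that each get the full 2.5X. The simplification to a plain geometric mean of 22 query times is also mine.

```python
import math


def geometric_mean(values):
    """Geometric mean computed via logs to avoid overflow on long lists."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Hypothetical run times in seconds, for illustration only.
small = [0.8, 0.9, 1.5, 1.8, 2.0]   # already fast, assumed not to improve
large = [20.0] * 17                 # assumed to speed up by the full 2.5X

baseline = small + large
faster = small + [t / 2.5 for t in large]

# Ratio of geometric means is the overall score improvement.
speedup = geometric_mean(baseline) / geometric_mean(faster)
print(f"overall geometric-mean speedup: {speedup:.2f}X")
```

Under these assumptions the overall gain is about 2.03X even though every large query improved by 2.5X, because each of the 22 queries carries equal weight regardless of how long it runs.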

It is possible that some marketing putz does not understand this and denied permission to publish perfectly good Xeon 7500 TPC-H results because they did not meet the 2.5X goal. (Along with making a negative ranking and review entry for the person responsible for TPC-H benchmarking due to failing to achieve the 2.5X goal. But let's not grind axes here. Besides, who said life was fair? It takes exceptional talent to accomplish the impossible. A clever person anticipates impossible problems, and transfers to another group to avoid a sticky wicket.)

Achieving 2.5X in the big queries is a more meaningful goal. Achieving 50% better than the 8-way Opteron 6-core TPC-H SF300 or SF1TB would also be a worthwhile accomplishment, if Xeon 7500 were up to the task.

TPC-E Scaling

Finally, a quick comment on Xeon 7500 scaling from 4-way (32 cores, 64 threads) to 8-way (64 cores, 128 threads). In the past, achieving 1.5X scaling with this number of cores would have been a triumph. Given the announcement Microsoft made on Windows Server 2008 R2, on removing the thread scheduler lock and other impediments to high-end scaling, we were expecting 1.7X scaling. It could be that scaling beyond 64 threads is tricky, because of the limit of 64 logical processors per processor group. Hopefully the 4-way to 8-way to 16-way scaling will improve over time as problems are solved one at a time, while the task master whips his/her draft horses (again, I digress).

Benchmark Omissions

Earlier, I had commented about benchmark omissions from the quad-core generation on. Below is a summary of processors and systems for which TPC results are published. (The Intel Xeon 7500 Processor Product Brief shows 3.03X relative to 7400 for OLTP Brokerage Database, which is TPC-E, but 2022 over 729 is 2.77X.)

Unknown date?

Nehalem-EX

From the information available, Nehalem-EX will become the Xeon 7500 series, superseding the Xeon 7400 series. Previously, Intel did not target 7000 series processors at 2-way systems. Now that AMD has the six-core Istanbul Opteron processor in 2-way systems, it is expected that system vendors will have 2-way systems based on Nehalem-EX. This might become a Xeon 6500 series. The Xeon 5500 series processors have 3 memory channels and 2 QPI links in a 1366-pin package. Nehalem-EX has 4 memory channels and 4 QPI links, so Xeon 5500 and 6500 would not be interchangeable.

older material

Benchmarks

As processor performance cannot be characterized by a single metric, it is necessary to look at several benchmarks. The most useful for single core performance is SPEC CPU 2006 integer base. SPEC CPU integer results are based on 12 individual applications. Another important aspect is that the very latest compilers are used for SPEC CPU, tuned to generate the best results. The compilers used for common applications like SQL Server do not incorporate the very latest enhancements. The more recent versions of SPEC CPU are now actually multi-threaded (?).

An alternative for single core performance metrics is to simply run certain test queries in SQL Server with all data in memory. Example tests are: 1) rows per second for a non-clustered index seek with bookmark lookup, or loop join, 2) table scan pages per sec, and 3) network round trips per second.

At the system level, the most widely recognized benchmarks are TPC-C, E, and H. TPC-C is a transaction processing benchmark that has a long history. Results are available from 1995 to present. TPC-C has the most extensive set of results, allowing a reasonable ability to compare platforms across multiple generations. TPC-E is a relatively new transaction processing benchmark that is supposed to replace TPC-C. Results are available from 2007 to present. Only SQL Server results have been published. TPC-H is a data warehouse/DSS benchmark with published results from 2004 to present. While TPC-H performance characteristics are important, unfortunately there are not anywhere near enough published results for a comprehensive comparison of system and processor architectures. Intel and the top system vendors most probably have a complete set of TPC-H performance results, electing only to publish when it suits their purpose.

While TPC-C is described as a transaction processing benchmark, it has no relation to any actual transaction processing application. The more important aspect of the TPC-C benchmark is that it is moderately network round-trip intensive and very disk IO intensive. Consider the recent TPC-C for the 4-way Xeon 7460 (24 cores) of 634,825 transactions per minute (tpm-C). This is 10,580 transactions per sec. On average, each TPC-C transaction requires 2.25 calls from the application server to the database server. So this system generates 23,806 network round-trips per second, or 991 per core per sec. This in turn implies that the average cost per call is just over 1 CPU-milli-sec.

The other important metric to draw is the number of disk drives used for the data files. TPC-C has both performance and price-performance metrics. The system should be optimized for full utilization. Then a reasonable assumption is that the disk load is approximately 175-200 IOPS per disk (queue depth 1 operation). Certain TPC-C publications may target best price-performance, in which case the disks will be driven to a much higher queue depth, and higher IOPS per disk. The result cited above was configured with 1000 disk drives, meaning the disk load is very likely to be in the 175-200K IOPS range.

The reason network round-trips are so important is that scaling SQL Server performance has a very different set of issues for driving high network round-trip volume than for executing SQL statements. The TPC-C workload is below but near the boundary where the network call volume becomes more important than SQL execution. SAP has characteristics that are almost purely a matter of driving network round-trips. The other aspect of TPC-C is that disk IO is far higher than in most any actual transaction processing application, and might contribute as much as 20-30% of the overall load.

Some other aspects of TPC-C are as follows. TPC-C benefits from Hyper-Threading, a feature of Pentium 4, Nehalem and Itanium 2 processors (starting from the dual core 9100 series). In the Pentium 4 architecture, HT (not AMD Hyper-Transport) improves network call volume throughput, but had no consistent effect in SQL operations. HT may cause erratic SQL performance.

A large processor cache improves the fixed startup cost of SQL operations, but does not change the incremental cost per row. Large cache benefits TPC-C and TPC-E, but does not benefit TPC-H.

The TPC-C only uses 5 stored procedures, meaning there is no cost expended for SQL compiles, which can be a sizeable portion of a less than perfectly designed transaction database.

TPC-E results are reported in transactions per second (tpsE). There are 10 transaction types that make up the main part of TPC-E. Each type may be made up of more than one frame (or stored procedure). The transaction that is scored is the Trade-Result, which makes up 10% of the transactions. There are about 25 frames total in the 10 transaction types. On average, there are 22.3 stored procedure calls for each scored tpsE. Consider the 4-way X7460 result of 721.4 tpsE. This implies that there are approximately 16,087 RPC calls per second, or 670 per second for each of the 24 cores. So TPC-E performance characteristics are very similar to those of TPC-C.
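The call-volume arithmetic for both benchmarks can be put side by side. This sketch uses only the figures quoted in this section (634,825 tpm-C at 2.25 calls per transaction, 721.4 tpsE at 22.3 calls per scored transaction, 24 cores); the function name is mine, and the CPU cost per call assumes the cores are fully utilized and that all CPU time is attributed to calls.

```python
def call_volume(tx_per_sec, calls_per_tx, cores):
    """Return (calls/sec, calls/sec per core, CPU-ms per call)."""
    calls_per_sec = tx_per_sec * calls_per_tx
    per_core = calls_per_sec / cores
    # Assumes each core is 100% busy, so 1000 CPU-ms are spent per second.
    cpu_ms_per_call = 1000.0 / per_core
    return calls_per_sec, per_core, cpu_ms_per_call


# 4-way Xeon X7460, 24 cores in both cases.
tpcc = call_volume(634_825 / 60, 2.25, 24)   # tpm-C converted to per second
tpce = call_volume(721.4, 22.3, 24)

for name, (total, per_core, ms) in (("TPC-C", tpcc), ("TPC-E", tpce)):
    print(f"{name}: {total:,.0f} calls/s, {per_core:,.0f} per core, "
          f"{ms:.2f} CPU-ms per call")
```

TPC-C works out to just over 1 CPU-ms per call and TPC-E to about 1.5 CPU-ms per call, which is the sense in which the two workloads are similar.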

The main stages of the competition between Intel Xeon and AMD Opteron are: 130nm single core, 90nm single core, 90nm dual-core, Intel 65nm dual core versus AMD 90nm dual core, Intel 65nm quad-core versus Opteron 90nm dual core and Intel 45nm six core versus Opteron 45nm quad-core.

At the time of the Opteron 4-way system introduction in 2003, it was competitive with Xeon in TPC-C. If we account for Xeon benefiting from Hyper-Threading in TPC-C by about 7%, when most environments have HT disabled for stability purposes, the Opteron had a moderate advantage. As Opteron frequencies started to climb up, AMD gained a more substantial advantage. The 90nm 3GHz Xeon MP with 8M L3 was not particularly impressive as it suffered from a poorly designed bus to the on-die L3 cache. It was found that the desktop Pentium 4 core at 3.66GHz with just 1M L2 cache in a Xeon package performed much better than the custom design server core with the large L3 cache.

The dual-core Opteron appeared in 2005 initially at 2.2GHz, against which the dual core Xeon 7041 at 3.0GHz was competitive. Over time, Opteron dual core processor frequency increased incrementally, but Intel Xeon did not. The dual core Opteron result for 2.8GHz was actually a new improved design with DDR2 memory at 667MHz, compared with DDR at up to 333MHz before. The disproportionate performance increase from 214K to 263K (23%) for a 7.7% frequency increase might suggest that there were additional improvements beyond the increased memory bandwidth.

The Intel dual core Xeon 7140 appeared in October 2006 featuring 2 Pentium 4 architecture cores plus a large 16M L3 cache on one die at 3.4GHz with an excellent TPC-C result of 318K versus 263K for the contemporary dual core Opteron at 2.8GHz. As mentioned before, TPC-C benefits from large cache and Hyper-Threading. So while the Xeon TPC-C was better, the Opteron did very well on many ad-hoc tests. The dual-core Opteron 8220 scored better at TPC-H than the Xeon 7140.

The Intel Core 2 architecture appeared in mid-2006 for two-way systems (Xeon 5100 series) and below. The 65nm Core 2 consists of two cores with a shared 4M L2 cache. This was soon followed by the quad core Xeon 5300 series, consisting of two dual core dies in a single package. The two-way Xeon 5355 was almost as powerful at the system level as 4-way dual core systems. The Core 2 was significantly faster at the single core level than Opteron or Pentium 4 architecture processors.

The Core 2 architecture for 4-way systems, the Xeon 7300 series (quad-core), appeared in late 2007. Only with this did Intel finally regain a clear advantage over Opteron in 4-way servers in both high call volume (TPC-C and TPC-E) and large query (TPC-H) applications. AMD did not have a quad-core until 2008.

At the single core level, the Intel Core 2 architecture is considered to be more powerful than a contemporary generation AMD Opteron. The SPEC CPU 2006 integer base result for the Xeon X5470 at 3.33GHz is 26.3, compared to 16.9 for the Opteron 8384 released in late 2008 at 2.5GHz. The six-core Xeon 7460, whose top frequency was limited to 2.66GHz on account of the power consumption of the two extra cores and L3 cache, came in at 21.7. A recently announced Opteron 8393 at 3.1GHz is reported at 19.7, which is not far off the Xeon X7460.

AMD was late with quad-core, codename Barcelona, on the 65nm process, with a 2.3GHz TPC-C result published in March 2008 of 402K, and a 2.5GHz result in July 2008 of 471K. For a less than 9% delta in frequency, there was a 17% increase, which is one indication of the bugs reported in early versions of Barcelona. The 65nm Opteron 8360 quad core did score higher than the 65nm Xeon 7350 at 4-way. At 45nm, the Intel Core 2 architecture (X7460) scored higher than the Opteron (8384) on the strength of both the 16M L3 cache and the 2 extra cores per socket, even at a lower frequency than the 65nm part (2.66GHz versus 2.93GHz).
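The frequency versus throughput comparison for the two Barcelona results is a two-line calculation. A small sketch using the rounded figures quoted above (402K and 471K, 2.3GHz and 2.5GHz); the helper name is mine.

```python
def pct_gain(old, new):
    """Percentage increase going from old to new."""
    return (new / old - 1) * 100


freq_gain = pct_gain(2.3, 2.5)            # clock frequency, GHz
tpmc_gain = pct_gain(402_000, 471_000)    # TPC-C throughput, rounded tpmC

print(f"frequency: +{freq_gain:.1f}%, throughput: +{tpmc_gain:.1f}%")
# Throughput normally scales less than linearly with frequency, so a
# ratio near 2 suggests something other than clock speed changed.
print(f"throughput gain per unit frequency gain: {tpmc_gain / freq_gain:.2f}")
```

The throughput gain is about twice the frequency gain, which is what points to fixes in the later stepping.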

TPC-H is a data warehouse benchmark that stresses the ability to run very large queries. Unfortunately there is not a sufficiently large set of comparable TPC-H results. The following is known. TPC-H does not benefit from large processor cache at all.

While the Core 2 architecture has significantly better performance than Opteron at the single core level, the expectation is that Opteron scales better to a large number of sockets. This means the performance with 2, 4, 8 or more sockets relative to the performance at a single socket.