See also SQL Server vs Sybase
The TPC-H section is in the process of being updated, consolidating older material with the most recent assessments.
HP published a TPC-H 1TB result of 219,887.p QphH for their 8-way ProLiant DL980 G7 with the Xeon E7-4870 (Westmere-EX), 26% higher in the overall composite score than the IBM x3850 X5 with the Xeon E7-8870, which is essentially the same processor(?). The HP scores 16% higher in power and 37.7% higher in throughput. Both throughput tests were with 7 streams. The HP system had Hyper-Threading enabled (80 physical cores, 160 logical) while the IBM system did not. Both systems had 2TB memory, more than sufficient to hold the entire database, data and indexes, in memory. The IBM system had 7 PCI-E SSDs; the HP system had 416 HDDs over 26 D2700 disk enclosures, 10 LSI SAS RAID controllers, 3 P411 and 1 dual-port 8Gbps FC controller.
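The three percentages above are consistent with each other once the composite is read as the geometric mean of the power and throughput metrics, which is how TPC-H defines QphH. The following is a minimal sketch of that check; the function name is illustrative only.

```python
import math

def composite_gain(power_gain, throughput_gain):
    """Relative gain in composite QphH from relative gains in power and throughput.

    QphH = sqrt(Power * Throughput), so the ratio of two composites is the
    square root of the product of the two component ratios.
    """
    return math.sqrt((1 + power_gain) * (1 + throughput_gain)) - 1

# HP DL980 G7 versus IBM x3850 X5: +16% power, +37.7% throughput
gain = composite_gain(0.16, 0.377)
print(f"composite gain: {gain:.1%}")
```

A large throughput advantage is therefore diluted in the composite: the 37.7% throughput gain combined with the 16% power gain yields a composite gain of roughly 26%.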
Also of interest are TPC-H 1TB reports published for the 16-way SPARC M8000 (June 2011) with SPARC64 VII+ processors and the 4-way SPARC T4-4 (Sep 2011). The table below shows configuration information for recent TPC-H 1000GB results.
| TPC-H 1000GB | IBM x3850 X5 | HP ProLiant DL980 G7 | IBM Power 780 | SPARC M8000 | SPARC T4-4 |
|---|---|---|---|---|---|
| DBMS | SQL 2K8R2 EE | SQL 2K8R2 EE | Sybase IQ ASE 15.2 | Oracle 11g R2 | Oracle 11g R2 |
| Processors | 8 Xeon E7 | 8 Xeon E7 | 8 POWER7 | 16 SPARC64 VII+ | 4 SPARC T4 |
| Cores Threads | 80-80 | 80-160 | 32-128 | 64-128 | 32-256 |
| Memory | 2048GB | 2048GB | 512GB | 512GB | 512GB |
| IO Controllers | 7 | 13 | 12 | 4 Arrays | 4 Arrays |
| HDD/SSD | 7 SSD | 416 HDD | 52 SSD | 4x80 SSD | 4x80 SSD |
The figure below shows TPC-H 1000GB power, throughput and QphH composite scores for the 4 x Xeon 7560 (32 cores, 64 threads), the two 8 x Xeon E7 (80 cores, 80 and 160 threads) systems, the 8 x POWER7 (32 cores, 128 threads), the 16 x SPARC64 VII+ (64 cores, 128 threads) and the 4 x SPARC T4 (32 cores, 256 threads).
TPC-H SF 1000 Results
The HP 8-way Xeon and both Oracle/Sun systems, one with 16 sockets and the newest with 4 SPARC T4 processors, are comparable, within 10%.
An important point is that both the Oracle/Sun and the IBM Power systems are configured with 512GB memory, versus 2TB for the 8-way Xeon E7 systems, which is enough to keep all data and indexes in memory. There is still disk IO for the initial data load and tempdb intermediate results. This is a good indication that Oracle and Sybase have been reasonably optimized on IO, in particular on when to use an index and when not to. I had previously raised the issue that the SQL Server query optimizer should consider the different characteristics of in-memory, DW optimized HDD storage (100MB/s per disk sequential) and SSD.
Sun clearly made tremendous improvements from the SPARC64 VII+ to the T4, with the new 4-way system essentially matching the previous 16-way. Of course, Sun had been lagging at the individual processor socket level until now. Another interesting aspect is that Sun decided on 8 threads per core. The expectation is that server applications have a great deal of pointer chasing code, that is: fetch memory, which determines the next address to fetch, with inherently poor locality.
A modern microprocessor with a core frequency of 3GHz has a 0.33 nanosecond clock cycle. Local node memory access time might be 50ns, or 150 CPU-clocks. Remote node memory access time might range from 100ns for a neighboring node to over 250ns for multi-hop nodes after cache-coherency is taken into account. So depending on how many instructions are required for each non-cached memory access, we can expect each thread or logical core to have many dead cycles, possibly enough to justify 8 threads per core. What is surprising is that Oracle published a TPC-H benchmark with their new T4-4 and not a TPC-C/E, which is more likely to emphasize the pointer chasing code than DW.
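The dead-cycle argument can be put into numbers with a rough sketch. The latencies and the 3GHz clock are the figures quoted above; the 25 cycles of useful work between misses is a hypothetical value chosen only for illustration, and the model ignores caches, prefetching and out-of-order execution.

```python
def stall_cycles(freq_ghz, latency_ns):
    """Clock cycles that pass during one memory access (GHz * ns = cycles)."""
    return freq_ghz * latency_ns

def threads_to_cover(freq_ghz, latency_ns, work_cycles_per_miss):
    """Threads per core needed so that some thread always has work to do.

    Simple model: each thread alternates `work_cycles_per_miss` cycles of
    compute with one full memory stall, so one work slice must be available
    for every (work + stall) cycles.
    """
    stall = stall_cycles(freq_ghz, latency_ns)
    return 1 + stall / work_cycles_per_miss

WORK_CYCLES = 25  # hypothetical: cycles of useful work per non-cached access

for label, latency_ns in [("local", 50), ("neighbor", 100), ("multi-hop", 250)]:
    cycles = stall_cycles(3.0, latency_ns)
    threads = threads_to_cover(3.0, latency_ns, WORK_CYCLES)
    print(f"{label:9s} {latency_ns:3d}ns = {cycles:5.0f} cycles, "
          f"about {threads:.0f} threads to hide the stall")
```

Under these assumptions even local memory access leaves room for several threads per core, which is the sense in which 8 threads per core could be justified for pointer chasing code.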
Below are the 22 individual query times for the above systems in the power test (1 stream).
TPC-H SF 1000 Queries 1-22
Below are the 22 individual query power times for just the two 8 Xeon E7 systems. Overall, the HP system (with HT enabled) has a 16% higher TPC-H power score, but the IBM system without HT is faster or comparable in 9 of the 22 queries. Not considering the difference in system architecture, the net difference might be attributed to HT?
TPC-H SF 1000 IBM and HP 8-way Xeon E7
Below are the 22 individual query power times for the HP 8 Xeon E7 and Oracle SPARC T4-4 systems.
TPC-H SF 1000 8-way HP Xeon E7 and 4-way SPARC T4
Recent TPC-H publications include an 8-way Xeon E7-8870 (Westmere-EX) at 1TB by IBM on 5 Apr, along with the IBM 4-way Xeon X7560 (Nehalem-EX) also at SF 1TB on 3 Mar. Dell published several TPC-H results, all with EXAsolution and their PowerEdge R710, with 2 to 32 nodes, at SF 100GB, 300GB, 1TB, 3TB and 10TB.
TPC-H SF 1000 Results for Opteron, Itanium, Xeon and POWER7
New Oracle/Sun 3TB result on M9000 with 64 SPARC64 VII+, 256 cores total.
TPC-H SF 3000 Results for Xeon, POWER6 and SPARC systems.
| System | Processor | Total Cores | Mem | Disks | TPC-H Power | TPC-H Throughput | Composite QphH | Streams |
|---|---|---|---|---|---|---|---|---|
| DL980 G7 | Xeon 7560 | 64 | 512 | 500 | 185,297.7 | 142,685.6 | 162,601.7 | 8 |
| Power 595 | 32 POWER6 | 64 | 512 | 288 | 142,790.7 | 171,607.4 | 156,537.3 | 9 |
| M9000 | 32 SPARC | 128 | 512 | 256 | 182,350.7 | 216,967.7 | 198,907.5 | 64 |
| M9000 | 64 SPARC | 256 | 1024 | 512 | 316,835.8 | 471,428.6 | 386,478.3 | 128 |
The individual query run times are on the SF3000 page.
IBM just published a TPC-H SF 1000 result for their x3850 X5, a 4-way Xeon 7560 system featuring a special MAX5 memory expansion board to support 1.5TB memory. In Dec 2010, IBM also published a TPC-H SF 1000 result for their Power 780 system, 8-way, quad-core (4 logical processors per physical core). In Feb 2011, Ingres published a TPC-H SF 100 result on a 2-way Xeon 5680 for their VectorWise column-store engine (plus enhancements for memory architecture, SIMD and compression).
The table below shows TPC-H SF 1000 results for the 8-way 6-core Opteron 8439 on SQL Server and Sybase, the 16-way quad-core Itanium 9350 on Oracle, the 4-way Xeon 7560 on SQL Server and the 8-way POWER7 on Sybase. On TPC-H Power (single stream), the 4-way Xeon on SQL Server is competitive relative to the 16-way Itanium and 8-way POWER7 systems. In other words, the 8-way Xeon should be comparable to the 8-way POWER7. If there is a weak point in SQL Server, it is in the throughput test with multiple concurrent query streams. This aspect is probably something that could be corrected. Unfortunately, it is probably not a priority for the SQL Server team at this time.
TPC-H SF 1000 Results for HP DL785 and Integrity Superdome servers
| System | Processor | Total Cores | Memory | SQL | Power | Throughput | Composite QphH | Streams |
|---|---|---|---|---|---|---|---|---|
| HP DL785 G6 | Opt 8439 | 48 | 512 | 2008 rtm | 95,789.1 | 69,367.6 | 81,367.6 | 7 |
| HP DL785 G6 | Opt 8439 | 48 | 384 | Sybase 15.1 | 108,436.8 | 96,652.7 | 102,375.3 | 7 |
| HP Superdome2 | It 9350 | 64 | 512 | O11g R2 | 139,181.0 | 141,188.1 | 140,181.1 | 64 |
| IBM x3850 X5 | Xeon 7560 | 32 | 1536 | 2008 R2 | 127,676.1 | 81,039.6 | 101,719.3 | 7 |
| IBM Power 780 | POWER 7 | 32 | 512 | Sybase 15.2 | 170,206.1 | 159,463.1 | 164,747.2 | 9 |
| IBM x3850 X5 | Xeon E7-8870 | 80 | 2048 | 2008 R2 | 200,899.9 | 150,635.8 | 173,961.8 | 7 |
The individual query run times are on the SF1000 page.
Below is the comparison of SQL Server and VectorWise. See Columnar Databases for discussion.
TPC-H SF 100 Results for HP DL380G7, SQL Server and Ingres VectorWise
It appears that only HP is actively publishing TPC-H benchmark results for SQL Server. There is some activity by IBM on other RDBMS. Fortunately, HP has published results for a broad range of systems, both over time and up and down the product lineup. There are results for 2-way Xeon 5500 & 5600, 2 and 4-way Opteron 6176, 4-way and 8-way Xeon 7500 systems, and a good historical record for past Opteron processors.
I am inclined to think other companies have cut back on benchmark efforts and personnel. Generating a respectable result requires special skills that can only be acquired with a full-time dedicated person for each benchmark, more if there are many platforms.
The graph and table below show recent SF 100 results, all on 2-way systems. The performance gain from the Xeon 5355 to the 5570 is in part from the improved processor performance and in part from having nearly the entire database in memory. There is a Dell 2-way Xeon 5570 result with 48GB memory at 28,773 QphH. HP published 4 sets of results for the 2-way Xeon 5570 at SF100: two with Fusion-IO storage, one with HDD and one with SATA SSD storage. Even though system memory was (almost?) sufficient for the entire database, there is still tempdb activity. Hence there is a potential advantage to SSD storage, even though SSD technology was not fully mature at this point in time.
TPC-H SF 100 Results for various HP ProLiant servers
Overall, both the Xeon 5680 and Opteron 6176 are very close and show strong gain over the Xeon 5570. In the individual TPC-H queries, there are performance variations. This could be from the increased memory to 192GB, which may nearly eliminate tempdb I/O, or from improvements in the Windows Server 2008R2 and SQL Server 2008 R2 software stack, probably both.
See SF100 for additional details.
The chart below shows SF300 results for 8-way quad-core & six-core Opteron, 4-way 12-core Opteron and 4-way Xeon 7500 systems.
TPC-H SF 300 Results for various HP ProLiant servers
The table below is for the above systems, and includes a 4-way dual-core Opteron system omitted from the chart above.
TPC-H SF 300 Results for various HP ProLiant servers
The progression in performance over time is expected. The advance from quad-core Shanghai to six-core Istanbul is not just the additional cores, but also HT-Assist, which improves realizable memory bandwidth(?). Somewhat surprising is the strong gain from the 8-way six-core Istanbul 2.8GHz to the 4-way 12-core Magny-Cours 2.3GHz. It is unclear how much gain can be attributed to the processor core, the system architecture topology, or the R2 software stack.
See SF300 for additional details.
At the larger database sizes, there are not as many SQL Server TPC-H results, so it is necessary to include other databases, specifically Oracle and Sybase. In the TPC-C and TPC-E transactional benchmarks, the cost model generally does not impact the execution plan, which is always an index seek and key lookup. Either there is no significant difference, or just a difference in join order with no real difference.
In TPC-H, the queries involve a very large number of rows, and may involve search arguments on different tables. The cost model can lead to very different execution plans, with radically different cost structures. Even if the overall performance is comparable, there can be wide variations in the individual query performance.
It is important not to draw unsupported conclusions on the difference between database engines. With that caution, it can be very interesting to look at the execution plans for queries with radically different performance between database engines. The more recent SQL Server TPC-H results include the showplan in text format. It is unfortunate that the XML actual plans are not provided. I am not sure if the recent Sybase and Oracle results include execution plans.
I also notice that Sybase has a ton of tuning parameters; perhaps I will look into this sometime.
At SF 1000, the only recent SQL Server result is for the 8-way Opteron 8439 6-core. There is a 2007 result for a 32-way dual-core Itanium on SQL Server 2005. This may be too old for a useful comparison (and is not shown in the chart below). There is an SF1000 result for Sybase on the same 8-way Opteron 8439 system. A cross database comparison is interesting when there are different execution plans with wildly different elapsed times.
The Oracle 11g results for a 32-way Superdome with Itanium 9140 dual-core and a 16-way Superdome2 with Itanium 9350 quad-core are of some interest. Itanium is still supported for the current version of Windows and SQL Server, but is not endorsed because of the very strong performance of the Xeon and Opteron processors relative to the Itanium processor. The final SF1000 result of interest is a 64-node Oracle RAC with 2-way quad-core Xeon 5450 3.0GHz processors per node.
TPC-H SF 1000 Results for HP DL785 and Integrity Superdome servers
| System | Processor | Total Cores | Memory | SQL | Power | Throughput | Composite QphH | Streams |
|---|---|---|---|---|---|---|---|---|
| DL785 G6 | Opt 8439 | 48 | 512 | 2008 rtm | 95,789.1 | 69,367.6 | 81,367.6 | 7 |
| DL785 G6 | Opt 8439 | 48 | 384 | Sybase | 108,436.8 | 96,652.7 | 102,375.3 | 7 |
| Superdome | It 9050 | 64 | 256 | 2005 sp2 | 90,909.1 | 53,898.5 | 69,999.0 | 7 |
| Superdome | It 9140 | 64 | 384 | O11g | 111,577.0 | 128,259.1 | 123,323.1 | 64 |
| Superdome2 | It 9350 | 64 | 512 | O11g R2 | 139,181.0 | 141,188.1 | 140,181.1 | 64 |
| BL460c | Xeon 5450 | 512 | 2080 | O11g R2 | 782,608.7 | 1,740,122 | 1,166,976.6 | 498 |
Additional details are below.
| System | DL785 G6 | DL785 G6 | Superdome | Superdome 2 | BL460c |
|---|---|---|---|---|---|
| Database | SQL Server | Sybase 15.1 | Oracle 11g | Oracle 11g R2 | Oracle 11g R2 RAC |
| Processor | Opteron 8439 | Opteron 8439 | Itanium 9140 | Itanium 9350 | Xeon 5450 |
| Sockets-Cores | 8 x 6 = 48 | 8 x 6 = 48 | 32 x 2 = 64 | 16 x 4 = 64 | 64x2x4 |
| Hyper-Threading | no | no | disabled | disabled | no |
| Frequency | 2.8GHz | 2.8GHz | 1.6GHz | 1.73GHz | 3.0GHz |
| Memory | 512GB | 384GB | 384GB | 512GB | 1x64+63x32 |
| Storage Controllers | 6 P800 | 8 x 8Gbps dual-port FC | 64 4Gbps dual-port FC | 48 8Gbps dual-port FC | InfiniBand |
| Storage Ext | 12 MSA70 | 4 MSA2324fc | 32 EVA4400 | 24 MSA2324 | Exadata |
| Data disks | 240 HDD | 96 HDD | 768 HDD | 576 HDD |   |
| Controller-Disks | 3x50, 25, 30, 35 | 1 per 24 | 1 per 24 | 1 per 24 |   |
| LUNs-disks | 48x5? | ? | 1 per 12 | 3 per 6 |   |
| OS | 2008 R2 EE | RHEL 5.3 | HP-UX 11 | HP-UX 11 | Linux |
| Database | 2008 EE | Sybase 15.1 | Oracle 11g | Oracle 11g R2 |   |
Two aspects of the Sybase result are of interest. One is that Sybase achieved better overall performance in both Power and Throughput than SQL Server on the same 8-way six-core Opteron system, with significantly fewer disk drives: 96 versus 240 for SQL Server. (The difference in memory is not expected to have a meaningful impact in TPC-H, as disk IO for tables is unavoidable.) The second point of interest is that the Sybase throughput is very close to the power metric (11% lower), while for SQL Server throughput is 27% lower.
On the first point, Sybase being able to produce a very respectable TPC-H result with far fewer disk drives, I have long complained that the SQL Server query optimizer employs a long-antiquated fixed cost model for random and sequential IO, with 1350 pages of sequential IO having equal cost to 320 random IO (both correspond to IO cost 1). Now it so happens this model is reasonable in certain SAN configurations. However, in a storage system properly designed for DW, the sequential to random disk IO (time) cost equivalence point could be in the 36:1 region. This means the SQL Server query optimizer will stay with a key lookup or loop join long past the point when it would have been better to switch to a table scan. It is my suspicion that this deficiency contributes to SQL Server needing far more HDDs to produce a good TPC-H result. Of course, before launching a campaign to push for a change to the IO cost model, note that on SSD and with data in memory, the optimal sequential to random ratio is in the other direction.
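To make the two ratios concrete, here is a rough sketch. The 1350 and 320 figures are the optimizer constants cited above, and 100MB/s per disk is the DW sequential rate mentioned earlier. The 350 random IOPS per disk is purely an assumption, picked to show how a figure in the 36:1 region can arise, and the break-even calculation ignores CPU cost and treats every key lookup as one random IO.

```python
PAGE_KB = 8  # SQL Server page size in KB

def seq_to_random_ratio(seq_mb_per_s, random_iops, page_kb=PAGE_KB):
    """Sequential pages per second divided by random IOs per second."""
    seq_pages_per_s = seq_mb_per_s * 1024 / page_kb
    return seq_pages_per_s / random_iops

def breakeven_lookups(table_pages, ratio):
    """Number of random lookups whose IO time equals one full table scan."""
    return table_pages / ratio

optimizer_ratio = 1350 / 320                 # fixed cost model, about 4.2:1
dw_ratio = seq_to_random_ratio(100, 350)     # 350 IOPS is an assumed value

table_pages = 1_000_000                      # hypothetical table size
print(f"optimizer model: {optimizer_ratio:.1f}:1, scan wins beyond "
      f"{breakeven_lookups(table_pages, optimizer_ratio):,.0f} lookups")
print(f"DW storage:      {dw_ratio:.1f}:1, scan wins beyond "
      f"{breakeven_lookups(table_pages, dw_ratio):,.0f} lookups")
```

Under these assumptions the optimizer would keep choosing lookups for several times as many rows as the hardware break-even point suggests, which is the behavior described above.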
The second point concerns throughput degradation relative to the single-stream power result. The general theory is that for single-stream, increasing parallelism up to one thread per core improves performance at the expense of efficiency. For example, assuming nearly ideal scaling, if a query takes 256 seconds elapsed time to run at DOP 1, consuming 256 CPU-seconds, then at DOP 2 it might take somewhat more than 128 seconds elapsed time, while consuming somewhat more than 256 CPU-seconds. The CPU-seconds consumed are expected to rise with increasing DOP.
When there are multiple active streams, the expectation is that the SQL Server engine should know not to start a new query at maximum parallelism (with one thread per processor core). If a long running query starts with threads on all processors, and then one or more other connections become active, perhaps it is not practical (or possible) to reduce the number of threads on a running query. In this case, the proper strategy for the query running on all processors might be to bring 2 or more threads onto each processor to better share resources? So when multiple streams are active, each stream should be running at a lower degree of parallelism, which, while taking longer elapsed time to run, should be more CPU efficient, and hence throughput should be higher than power.
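The reasoning can be illustrated with a toy model. The 256-second query is the example from the text; the 5% CPU overhead per doubling of DOP, the 64 cores and the 8 streams are hypothetical values chosen for illustration, not measurements of any database engine.

```python
import math

def cpu_seconds(serial_seconds, dop, overhead=0.05):
    """Total CPU-seconds at a given DOP.

    Hypothetical model: CPU cost grows by `overhead` for each doubling of DOP.
    """
    return serial_seconds * (1 + overhead) ** math.log2(dop)

def elapsed_seconds(serial_seconds, dop, overhead=0.05):
    """Elapsed time when the CPU work is spread evenly over `dop` threads."""
    return cpu_seconds(serial_seconds, dop, overhead) / dop

CORES, STREAMS, SERIAL = 64, 8, 256.0

# Strategy A: run the 8 queries one after another, each at DOP 64.
one_at_a_time = STREAMS * elapsed_seconds(SERIAL, CORES)

# Strategy B: run the 8 queries concurrently, each at DOP 64 / 8 = 8.
concurrent = elapsed_seconds(SERIAL, CORES // STREAMS)

print(f"DOP 2 elapsed:       {elapsed_seconds(SERIAL, 2):.1f} s")
print(f"8 queries, serially: {one_at_a_time:.1f} s")
print(f"8 queries, together: {concurrent:.1f} s")
```

In this model the concurrent schedule finishes the same work sooner because each query runs at a lower, more CPU-efficient DOP. That is the sense in which throughput would be expected to exceed power if the engine scaled back parallelism under load.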
The Superdome systems with Oracle were configured with a rather large number of disk drives, 768 in the 32 dual-core 9140 and 576 in the 16 quad-core 9350. It is unclear as to the reason why. It might be because HP wanted to showcase their EVA4400 and MSA2300 SAN storage systems. SAN systems do not provide good sequential bandwidth except possibly with 2-disk RAID 1 groups or 3-disk RAID 5 groups. RAID groups with more disks might provide 27MB/sec per disk. It might be because Oracle Automatic Storage Management prefers this type of configuration. It is also unclear why HP elected to configure so many HBAs and FC ports. The data consumption rate for the big Line Item scan queries seems to be no more than 5-8GB/sec. So why configure 48 dual-port HBAs (2 HBAs per MSA with 24 disks) at 8Gbps, which could conceivably support 70GB/s? A dozen dual-port 8Gbps HBAs (1 HBA per 2 MSAs, 48 disks) should have been sufficient.
Both Superdome/Oracle results do show higher Throughput than Power.
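The HBA arithmetic can be sketched as follows. The figure of 0.8GB/s per 8Gbps FC port is a nominal value assumed here; realizable bandwidth per port is typically lower, which is roughly in line with the 70GB/s estimate above.

```python
def hba_bandwidth_gb_per_s(hbas, ports_per_hba=2, gb_per_s_per_port=0.8):
    """Aggregate nominal bandwidth of a set of FC HBAs.

    gb_per_s_per_port=0.8 is an assumed nominal rate for an 8Gbps FC port.
    """
    return hbas * ports_per_hba * gb_per_s_per_port

PEAK_SCAN_GB_PER_S = 8  # upper end of the 5-8GB/sec observed scan rate

as_configured = hba_bandwidth_gb_per_s(48)  # 48 dual-port HBAs
a_dozen = hba_bandwidth_gb_per_s(12)        # 12 dual-port HBAs

print(f"48 HBAs: {as_configured:.1f} GB/s nominal")
print(f"12 HBAs: {a_dozen:.1f} GB/s nominal")
print(f"headroom with 12 HBAs: {a_dozen / PEAK_SCAN_GB_PER_S:.1f}x the peak scan rate")
```

Even a quarter of the configured HBAs would leave more than twice the nominal bandwidth needed for the largest scans under this assumption.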
See SF1000 for additional details.
The chart below shows TPC-H SF3000 results for the 16-way Xeon 7460 (96 cores total), the 8-way Xeon 7560 (64 cores total) and the 32-way IBM POWER6 (also 64 cores). The IBM POWER6 is a 2007(?) processor even though the TPC-H report is from November 2009. The POWER6 has recently been succeeded by the POWER7, for which no TPC-H results have been published yet. The indications are that the new POWER7 is significantly improved at the core performance level.
Oracle just published a SF3000 result with a 32-way quad-core SPARC 2.88GHz system. I realize this Oracle TPC-H result employed very many streams, 64 in this case. Compare this with 8 streams for the HP DL980G7 results for SQL Server. I will look into this in more detail later.

TPC-H SF 3000 Results for Xeon 7460 & 7560 and IBM POWER6 (Sybase)
TPC-H SF 3000 Results for Xeon and IBM POWER6
| System | Processor | Total Cores | Mem | Disks | TPC-H Power | TPC-H Throughput | Composite QphH | Streams |
|---|---|---|---|---|---|---|---|---|
| Unisys | Xeon 7460 | 96 | 1024 | 900 | 120,254.8 | 87,841.4 | 102,778.2 | 8 |
| DL980 G7 | Xeon 7560 | 64 | 512 | 500 | 185,297.7 | 142,685.6 | 162,601.7 | 8 |
| Power 595 | 32 POWER6 | 64 | 512 | 288 | 142,790.7 | 171,607.4 | 156,537.3 | 9 |
| M9000 | 32 SPARC | 128 | 512 | 256 | 182,350.7 | 216,967.7 | 198,907.5 | 64 |
Additional details are below:
| System | ES7000 | DL980G7 | Power 595 |
|---|---|---|---|
| Processor | Xeon 7460 | Xeon 7560 | POWER6 |
| Sockets-Cores | 16 x 6 = 96 | 8 x 8 = 64 | 32 x 2 = 64 |
| Hyper-Threading | no | yes | 4/core? |
| Frequency | 2.66GHz | 2.26GHz | 5.0GHz |
| Memory | 1024GB | 512GB | 512GB |
| Controllers | 24 FC | 10+3 SAS | 24 x 4G DP FC 12 DS4800 |
| Enclosures | 32+28 | 20+6 D2700 | 48 EXP810 |
| Data disks | 900 HDD | 500 HDD | 288 HDD |
| Controller-Disks | 8x60+14x30 | 10x50+3x48 | 1 per 24 |
| LUN-disks | 15-1 | 1? | 2? |
| OS | 2008 R2 DC | 2008 R2 EE | AIX 6.1 |
| Database | 2008 R2 DC | 2008 R2 EE | Sybase 15.1 |
As with the SF1000 Sybase result on 8-way Opteron, the POWER6 with Sybase was also configured with relatively few disks. The TPC-H throughput is significantly higher (20%) than the power. Compare this to the Xeon 7560 in which throughput is 23% lower than power.
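The throughput-versus-power percentages quoted for the SQL Server and Sybase results can be reproduced directly from the tables above. This is a small sketch using only the published power and throughput figures; the dictionary layout and function name are illustrative.

```python
# (power, throughput) pairs taken from the SF1000 and SF3000 tables above
results = {
    "DL785 G6, SQL Server (SF1000)": (95789.1, 69367.6),
    "DL785 G6, Sybase (SF1000)": (108436.8, 96652.7),
    "DL980 G7, SQL Server (SF3000)": (185297.7, 142685.6),
    "Power 595, Sybase (SF3000)": (142790.7, 171607.4),
}

def throughput_vs_power(power, throughput):
    """Throughput relative to power; negative means throughput is lower."""
    return throughput / power - 1

for system, (power, throughput) in results.items():
    delta = throughput_vs_power(power, throughput)
    print(f"{system:32s} {delta:+.1%}")
```

Both SQL Server results lose more than 20% going from single-stream power to multi-stream throughput, while the Sybase results lose about 11% in one case and gain about 20% in the other.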
See SF3000 for additional details.
There is a TPC-H SF 10000 result with SQL Server on a Unisys ES7000 with 16 Xeon 7460 processors, using 64 of the 96 cores, as this was pre-2008R2. There is also a 32-way dual-core Itanium result with SQL Server.