Details: SF100, SF300, SF1000, SF3000 See also SQL Server vs Sybase

The TPC-H section is in the process of being updated, consolidating older material with the most recent assessments.

TPC-H 2011 Sep

HP published a TPC-H 1TB result of 219,887.p QphH for their 8-way ProLiant DL980 G7 with the Xeon E7-4870 (Westmere-EX), 26% higher in the overall composite score than the IBM x3850 X5 with the Xeon E7-8870, which is essentially the same processor(?). The HP system scores 16% higher in power and 37.7% higher in throughput. Both throughput tests used 7 streams. The HP system had Hyper-Threading enabled (80 physical cores, 160 logical) while the IBM system did not. Both systems had 2TB of memory, more than sufficient to hold the entire database, data and indexes, in memory. The IBM system had 7 PCI-E SSDs; the HP system had 416 HDDs over 26 D2700 disk enclosures, 10 LSI SAS RAID controllers, 3 P411 controllers and 1 dual-port 8Gbps FC controller.

Also of interest are TPC-H 1TB reports published for the 16-way SPARC M8000 (June 2011) with SPARC64 VII+ processors and the 4-way SPARC T4-4 (Sep 2011). The table below shows configuration information for recent TPC-H 1000GB results.

| TPC-H 1000GB | IBM x3850 X5 | HP ProLiant DL980 G7 | IBM Power 780 | SPARC M8000 | SPARC T4-4 |
|---|---|---|---|---|---|
| DBMS | SQL 2K8R2 EE | SQL 2K8R2 EE | Sybase IQ ASE 15.2 | Oracle 11g R2 | Oracle 11g R2 |
| Processors | 8 Xeon E7 | 8 Xeon E7 | 8 POWER7 | 16 SPARC64 VII+ | 4 SPARC T4 |
| Cores-Threads | 80-80 | 80-160 | 32-128 | 64-128 | 32-256 |
| Memory | 2048GB | 2048GB | 512GB | 512GB | 512GB |
| IO Controllers | 7 | 13 | 12 | 4 Arrays | 4 Arrays |
| HDD/SSD | 7 SSD | 416 HDD | 52 SSD | 4x80 SSD | 4x80 SSD |

The figure below shows TPC-H 1000GB power, throughput and QphH composite scores for the 4 x Xeon 7560 (32 cores, 64 threads), the two 8 x Xeon E7 (80 cores, 80 and 160 threads) systems, the 8 x POWER7 (32 cores, 128 threads), the 16 x SPARC64 VII+ (64 cores, 128 threads) and the 4 x SPARC T4 (32 cores, 256 threads).

Figure: TPC-H SF 1000 Results

The HP 8-way Xeon and both Oracle/Sun systems, one with 16 sockets and the newest with 4 SPARC T4 processors, are comparable, within 10%.

An important point is that both the Oracle/Sun and IBM Power systems are configured with 512GB memory, versus 2TB for the 8-way Xeon E7 systems, which is enough to keep all data and indexes in memory. There is still disk IO for the initial data load and tempdb intermediate results. This is a good indication that Oracle and Sybase have been reasonably optimized on IO, in particular on when to use an index and when not to. I had previously raised the issue that the SQL Server query optimizer should consider the different characteristics of in-memory, DW-optimized HDD storage (100MB/s per disk sequential) and SSD.

Sun clearly made tremendous improvements from the SPARC64 VII+ to the T4, with the new 4-way system essentially matching the previous 16-way. Of course, Sun had been lagging at the individual processor socket level until now. Another interesting aspect is that Sun decided on 8 threads per core. The expectation is that server applications have a great deal of pointer-chasing code, that is: fetch memory, which determines the next address to fetch, with inherently poor locality.

A modern microprocessor with a core frequency of 3GHz has a 0.33 nanosecond clock cycle. Local node memory access time might be 50ns, or 150 CPU clocks. Remote node memory access time might range from 100ns for a neighboring node to over 250ns for multi-hop nodes after cache coherency is taken into account. So depending on how many instructions are required for each non-cached memory access, we can expect each thread or logical core to have many dead cycles, possibly enough to justify 8 threads per core. What is surprising is that Oracle published a TPC-H benchmark with their new T4-4 and not a TPC-C/E, which is more likely to emphasize pointer-chasing code than DW.
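The arithmetic above can be sketched in a few lines of Python. The latencies and the 3GHz clock come from the text; the `threads_to_hide` model and the figure of 50 useful cycles between cache misses are illustrative assumptions, not measurements.

```python
# Convert memory latency to CPU clock cycles, using the figures quoted above.
CLOCK_GHZ = 3.0  # core frequency from the text (cycles per nanosecond)


def latency_cycles(latency_ns: float, clock_ghz: float = CLOCK_GHZ) -> float:
    """Clock cycles that elapse during one memory access."""
    return latency_ns * clock_ghz


def threads_to_hide(latency_ns: float, work_cycles: float,
                    clock_ghz: float = CLOCK_GHZ) -> float:
    """Threads per core needed so that some thread always has work.

    Assumed toy model: each thread alternates `work_cycles` of useful work
    with one full-latency stall, and the stalls of different threads
    overlap perfectly.
    """
    return 1.0 + latency_cycles(latency_ns, clock_ghz) / work_cycles


if __name__ == "__main__":
    WORK_CYCLES = 50  # hypothetical useful cycles between cache misses
    for label, ns in [("local node", 50),
                      ("neighboring node", 100),
                      ("multi-hop node", 250)]:
        print(f"{label:17s} {ns:4d} ns = {latency_cycles(ns):4.0f} cycles, "
              f"threads to hide stall: {threads_to_hide(ns, WORK_CYCLES):.0f}")
```

Under these assumptions a local miss alone would call for about 4 threads per core, and remote misses for more, which is at least consistent with a design choice of 8 threads per core.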

Below are the 22 individual query times for the above systems in the power test (1 stream).

Figure: TPC-H SF 1000 Queries 1-22

Below are the 22 individual query power times for just the two 8 Xeon E7 systems. Overall, the HP system (with HT enabled) has a 16% higher TPC-H power score, but the IBM system without HT is faster or comparable in 9 of the 22 queries. Setting aside the difference in system architecture, the net difference might be attributed to HT?

Figure: TPC-H SF 1000 IBM and HP 8-way Xeon E7

Below are the 22 individual query power times for the HP 8 Xeon E7 and Oracle SPARC T4-4 systems.

Figure: TPC-H SF 1000 8-way HP Xeon E7 and 4-way SPARC T4

TPC-H 2011 Apr

Recent TPC-H publications include an 8-way Xeon E7-8870 (Westmere-EX) at 1TB by IBM on 5 Apr, along with the IBM 4-way Xeon X7560 (Nehalem-EX) also at SF 1TB on 3 Mar. Dell published several TPC-H results, all with EXAsolution and their PowerEdge R710, with 2 to 32 nodes, at SF 100GB, 300GB, 1TB, 3TB and 10TB.

Figure: TPC-H SF 1000 Results for Opteron, Itanium, Xeon and POWER7

 

TPC-H 2011 Mar 23

New Oracle/Sun 3TB result on M9000 with 64 SPARC64 VII+, 256 cores total.

Figure: TPC-H SF 3000 Results for Xeon, POWER6 and SPARC systems.

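| System | Processor | Total Cores | Mem (GB) | Disks | TPC-H Power | TPC-H Throughput | Composite QphH | Streams |
|---|---|---|---|---|---|---|---|---|
| DL980 G7 | Xeon 7560 | 64 | 512 | 500 | 185,297.7 | 142,685.6 | 162,601.7 | 8 |
| Power 595 | 32 POWER6 | 64 | 512 | 288 | 142,790.7 | 171,607.4 | 156,537.3 | 9 |
| M9000 | 32 SPARC | 128 | 512 | 256 | 182,350.7 | 216,967.7 | 198,907.5 | 64 |
| M9000 | 64 SPARC | 256 | 1024 | 512 | 316,835.8 | 471,428.6 | 386,478.3 | 128 |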
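The composite QphH figure is the geometric mean of the Power and Throughput metrics. As a quick sanity check, the short Python sketch below recomputes the composite for the four SF3000 results above; the function name `composite_qphh` is just an illustrative label.

```python
import math


def composite_qphh(power: float, throughput: float) -> float:
    """TPC-H composite metric: geometric mean of Power and Throughput."""
    return math.sqrt(power * throughput)


# (system, power, throughput, reported composite) from the SF3000 table above
SF3000_RESULTS = [
    ("DL980 G7 (Xeon 7560)", 185297.7, 142685.6, 162601.7),
    ("Power 595 (32 POWER6)", 142790.7, 171607.4, 156537.3),
    ("M9000 (32 SPARC)", 182350.7, 216967.7, 198907.5),
    ("M9000 (64 SPARC)", 316835.8, 471428.6, 386478.3),
]

if __name__ == "__main__":
    for name, power, throughput, reported in SF3000_RESULTS:
        computed = composite_qphh(power, throughput)
        print(f"{name:22s} computed {computed:11.1f}  reported {reported:11.1f}")
```

Because the composite is a geometric mean, a system with a weak throughput score is penalized even when its single-stream power score is strong.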

The individual query run times are on the SF3000 page.

 

TPC-H 2011 Mar 4

IBM just published a TPC-H SF 1000 result for their x3850 X5, a 4-way Xeon 7560 system featuring a special MAX5 memory expansion board to support 1.5TB memory. In Dec 2010, IBM also published a TPC-H SF 1000 result for their Power 780 system, an 8-way, quad-core system (4 logical processors per physical core). In Feb 2011, Ingres published a TPC-H SF 100 result on a 2-way Xeon 5680 for their VectorWise column-store engine (plus enhancements for memory architecture, SIMD and compression).

The figure and table below show TPC-H SF 1000 results for the 8-way 6-core Opteron 8439 on SQL Server and Sybase, the 16-way quad-core Itanium 9350 on Oracle, the 4-way Xeon 7560 on SQL Server and the 8-way POWER7 on Sybase. On TPC-H Power (single stream), the 4-way Xeon on SQL Server is competitive relative to the 16-way Itanium and 8-way POWER7 systems. In other words, an 8-way Xeon should be comparable to the 8-way POWER7. If there is a weak point in SQL Server, it is in the throughput test with multiple concurrent query streams. This aspect is probably something that could be corrected. Unfortunately, it is probably not a priority for the SQL Server team at this time.

Figure: TPC-H SF 1000 Results for HP DL785 and Integrity Superdome servers

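| System | Processor | Total Cores | Memory (GB) | SQL | Power | Throughput | Composite QphH | Streams |
|---|---|---|---|---|---|---|---|---|
| HP DL785 G6 | Opt 8439 | 48 | 512 | 2008 rtm | 95,789.1 | 69,367.6 | 81,367.6 | 7 |
| HP DL785 G6 | Opt 8439 | 48 | 384 | Sybase 15.1 | 108,436.8 | 96,652.7 | 102,375.3 | 7 |
| HP Superdome2 | It 9350 | 64 | 512 | O11g R2 | 139,181.0 | 141,188.1 | 140,181.1 | 64 |
| IBM x3850 X5 | Xeon 7560 | 32 | 1536 | 2008 R2 | 127,676.1 | 81,039.6 | 101,719.3 | 7 |
| IBM Power 780 | POWER 7 | 32 | 512 | Sybase 15.2 | 170,206.1 | 159,463.1 | 164,747.2 | 9 |
| IBM x3850 X5 | Xeon E7-8870 | 80 | 2048 | 2008 R2 | 200,899.9 | 150,635.8 | 173,961.8 | 7 |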

The individual query run times are on the SF1000 page.

SQL Server versus VectorWise

Below is the comparison of SQL Server and VectorWise. See Columnar Databases for discussion.

Figure: TPC-H SF 100 Results for HP DL380G7, SQL Server and Ingres VectorWise

TPC-H 2010 Sept 22

Nehalem-EX, Westmere and Magny-Cours

It appears that only HP is actively publishing TPC-H benchmark results for SQL Server. There is some activity by IBM on other RDBMS. Fortunately, HP has published results for a broad range of systems, both over time and up and down the product lineup. There are results for 2-way Xeon 5500 & 5600, 2 and 4-way Opteron 6176, 4-way and 8-way Xeon 7500 systems, and a good historical record for past Opteron processors.

I am inclined to think other companies have cut back on benchmark efforts and personnel. Generating a respectable result requires special skills that can only be acquired with a full-time dedicated person for each benchmark, more if there are many platforms.

SF 100 Results - 2-way Systems

The graph and table below show recent SF 100 results, all on 2-way systems. The performance gain from the Xeon 5355 to the 5570 is in part from improved processor performance and in part from having nearly the entire database in memory. There is a Dell 2-way Xeon 5570 result with 48GB memory at 28,773 QphH. HP published 4 sets of results for the 2-way Xeon 5570 at SF100: two with Fusion-IO storage, one with HDD and one with SATA SSD storage. Even though system memory was (almost?) sufficient for the entire database, there is still tempdb activity. Hence there is a potential advantage to SSD storage, even though SSD technology was not fully mature at this point in time.

Figure: TPC-H SF 100 Results for various HP ProLiant servers

Figure: TPC-H SF 100 Results for various HP ProLiant servers

Overall, the Xeon 5680 and Opteron 6176 are very close, and both show a strong gain over the Xeon 5570. In the individual TPC-H queries, there are performance variations. This could be from the increase in memory to 192GB, which may nearly eliminate tempdb I/O, or from improvements in the Windows Server 2008 R2 and SQL Server 2008 R2 software stack, probably both.

See SF100 for additional details.

SF 300 Results

The chart below shows SF300 results for 8-way quad-core & six-core Opteron, 4-way 12-core Opteron and 4-way Xeon 7500 systems.

Figure: TPC-H SF 300 Results for various HP ProLiant servers

The table below is for the above systems, and includes a 4-way dual-core Opteron system omitted from the chart above.

Figure: TPC-H SF 300 Results for various HP ProLiant servers

The progression in performance over time is expected. The advance from quad-core Shanghai to six-core Istanbul is not just from the additional cores, but also from HT-Assist, which improves realizable memory bandwidth(?). Somewhat surprising is the strong gain from the 8-way six-core Istanbul 2.8GHz to the 4-way 12-core Magny-Cours 2.3GHz. It is unclear how much gain can be attributed to the processor core, the system architecture topology, or the R2 software stack.

See SF300 for additional details.

Comparing Results from Different Database Engines

At the larger database sizes, there are not as many SQL Server TPC-H results, so it is necessary to include other databases, specifically Oracle and Sybase. In the TPC-C and TPC-E transactional benchmarks, the cost model generally does not impact the execution plan, which is always an index seek and key lookup. Either there is no significant difference, or just a difference in join order with no real difference.

In TPC-H, the queries involve a very large number of rows, and may involve search arguments on different tables. The cost model can lead to very different execution plans, with radically different cost structures. Even if the overall performance is comparable, there can be wide variations in individual query performance.

It is important not to draw unsupported conclusions on the difference between database engines. With that caution, it can be very interesting to look at the execution plans for queries with radically different performance between database engines. The more recent SQL Server TPC-H results include the showplan in text format. It is unfortunate that the XML actual plans are not provided. I am not sure if the recent Sybase and Oracle results include execution plans.

I also notice that Sybase has a ton of tuning parameters. Perhaps I will look into this sometime.

SF 1000 Results

At SF 1000, the only recent SQL Server result is for the 8-way Opteron 8439 6-core. There is a 2007 result for a 32-way dual-core Itanium on SQL Server 2005. This may be too old for a useful comparison (and is not shown in the chart below). There is an SF1000 result for Sybase on the same 8-way Opteron 8439 system. A cross-database comparison is interesting when there are different execution plans with wildly different elapsed times.

The Oracle 11g results for a 32-way Superdome with the dual-core Itanium 9140 and a 16-way Superdome2 with the quad-core Itanium 9350 are of some interest. Itanium is still supported for the current version of Windows and SQL Server, but is not endorsed because of the very strong performance of the Xeon and Opteron processors relative to the Itanium processor. The final SF1000 result of interest is a 64-node Oracle RAC, each node a 2-way quad-core Xeon 5450 3.0GHz.

Figure: TPC-H SF 1000 Results for HP DL785 and Integrity Superdome servers

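| System | Processor | Total Cores | Memory (GB) | SQL | Power | Throughput | Composite QphH | Streams |
|---|---|---|---|---|---|---|---|---|
| DL785 G6 | Opt 8439 | 48 | 512 | 2008 rtm | 95,789.1 | 69,367.6 | 81,367.6 | 7 |
| DL785 G6 | Opt 8439 | 48 | 384 | Sybase | 108,436.8 | 96,652.7 | 102,375.3 | 7 |
| Superdome | It 9050 | 64 | 256 | 2005 sp2 | 90,909.1 | 53,898.5 | 69,999.0 | 7 |
| Superdome | It 9140 | 64 | 384 | O11g | 111,577.0 | 128,259.1 | 123,323.1 | 64 |
| Superdome2 | It 9350 | 64 | 512 | O11g R2 | 139,181.0 | 141,188.1 | 140,181.1 | 64 |
| BL460c | Xeon 5450 | 512 | 2080 | O11g R2 | 782,608.7 | 1,740,122 | 1,166,976.6 | 498 |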

Additional details are below.

| System | DL785 G6 | DL785 G6 | Superdome | Superdome 2 | BL460c |
|---|---|---|---|---|---|
| Database | SQL Server | Sybase 15.1 | Oracle 11g | Oracle 11g R2 | Oracle 11g R2 RAC |
| Processor | Opteron 8439 | Opteron 8439 | Itanium 9140 | Itanium 9350 | Xeon 5450 |
| Sockets-Cores | 8 x 6 = 48 | 8 x 6 = 48 | 32 x 2 = 64 | 16 x 4 = 64 | 64x2x4 |
| Hyper-Threading | no | no | disabled | disabled | no |
| Frequency | 2.8GHz | 2.8GHz | 1.6GHz | 1.73GHz | 3.0GHz |
| Memory | 512GB | 384GB | 384GB | 512GB | 1x64+63x32 |
| Storage Controllers | 6 P800 | 8 x 8Gbps dual-port FC | 64 4Gbps dual-port FC | 48 8Gbps dual-port FC | Infini-Band |
| Storage Ext | 12 MSA70 | 4 MSA2324fc | 32 EVA4400 | 24 MSA2324 | Exadata |
| Data disks | 240 HDD | 96 HDD | 768 HDD | 576 HDD | |
| Controller-Disks | 3x50, 25, 30, 35 | 1 per 24 | 1 per 24 | 1 per 24 | |
| LUNs-disks | 48x5? | ? | 1 per 12 | 3 per 6 | |
| OS | 2008 R2 EE | RHEL 5.3 | HP-UX 11 | HP-UX 11 | Linux |
| Database | 2008 EE | Sybase 15.1 | Oracle 11g | Oracle 11g R2 | |

Two aspects of the Sybase result are of interest. One is that Sybase achieved better overall performance in both Power and Throughput than SQL Server on the same 8-way six-core Opteron system, with significantly fewer disk drives: 96 versus 240 for SQL Server. (The difference in memory is not expected to have a meaningful impact in TPC-H, as disk IO for tables is unavoidable.) The second point of interest is that the Sybase throughput is very close to the power metric (11% lower), while for SQL Server throughput is 27% lower.

On the first point, Sybase being able to produce a very respectable TPC-H result with far fewer disk drives, I have long complained that the SQL Server query optimizer employs a long-antiquated fixed cost model for random and sequential IO, with 1350 pages of sequential IO having equal cost to 320 random IOs (both correspond to IO cost 1). Now it so happens that this model is reasonable in certain SAN configurations. However, in a storage system properly designed for DW, the sequential to random disk IO (time) cost equivalence point could be in the 36:1 region. This means the SQL Server query optimizer will stay with a key lookup or loop join long past the point when it would have been better to switch to a table scan. It is my suspicion that this deficiency contributes to SQL Server needing far more HDDs to produce a good TPC-H result. Of course, before launching a campaign to push for a change to the IO cost model, note that on SSD and with data in memory, the optimal sequential to random ratio moves in the other direction.
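To make the two ratios concrete, here is a rough Python sketch. The 1350 and 320 figures are the optimizer constants quoted above, 8KB is the SQL Server page size, and 100MB/s is the per-disk sequential rate mentioned earlier for DW-optimized storage. The 350 random IOPS per disk is an assumed value chosen for illustration, not a figure from any of the benchmark reports.

```python
PAGE_KB = 8  # SQL Server page size in KB


def optimizer_ratio(seq_pages: float = 1350, random_ios: float = 320) -> float:
    """Sequential pages treated as equal in cost to one random IO
    under the fixed cost model described above."""
    return seq_pages / random_ios


def hardware_ratio(seq_mb_per_s: float, random_iops: float,
                   page_kb: int = PAGE_KB) -> float:
    """Sequential pages a disk can deliver in the time of one random IO."""
    seq_pages_per_s = seq_mb_per_s * 1024 / page_kb
    return seq_pages_per_s / random_iops


if __name__ == "__main__":
    assumed_iops = 350  # hypothetical random IOPS for one HDD
    print(f"optimizer model: {optimizer_ratio():.1f} : 1")
    print(f"DW-tuned HDD:    {hardware_ratio(100, assumed_iops):.1f} : 1")
```

With these inputs the cost model assumes roughly 4 sequential pages per random IO, while the hardware delivers on the order of 36, which is the gap the paragraph above describes. A lower assumed IOPS figure widens the gap further.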

The second point concerns throughput degradation relative to the single-stream power result. The general theory is that for a single stream, increasing parallelism up to one thread per core improves performance at the expense of efficiency. For example, assuming nearly ideal scaling, suppose a query takes 256 seconds elapsed time to run at DOP 1, consuming 256 CPU-seconds. Then at DOP 2, it might take somewhat more than 128 seconds elapsed time to run, while consuming somewhat more than 256 CPU-seconds. The CPU-seconds consumed is expected to rise with increasing DOP.
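The trade-off can be illustrated with a toy model in Python. The 256-second query is the example from the text; the assumption that total CPU grows by 5% for each doubling of DOP is purely hypothetical and is only there to make the shape of the curve visible.

```python
import math


def cpu_seconds(base_cpu: float, dop: int, overhead: float = 0.05) -> float:
    """Total CPU-seconds at a given DOP.

    Assumed model: CPU cost grows by `overhead` (a made-up 5%) for every
    doubling of the degree of parallelism.
    """
    return base_cpu * (1.0 + overhead * math.log2(dop))


def elapsed_seconds(base_cpu: float, dop: int, overhead: float = 0.05) -> float:
    """Elapsed time if the CPU work is spread evenly over `dop` threads."""
    return cpu_seconds(base_cpu, dop, overhead) / dop


if __name__ == "__main__":
    for dop in (1, 2, 4, 8, 16, 32):
        print(f"DOP {dop:2d}: elapsed {elapsed_seconds(256, dop):6.1f} s, "
              f"CPU {cpu_seconds(256, dop):6.1f} s")
```

Under this model, elapsed time keeps falling as DOP rises while total CPU consumption climbs, so several concurrent queries each run at a modest DOP consume less total CPU than the same queries each run at maximum DOP.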

When there are multiple active streams, the expectation is that the SQL Server engine should know not to start a new query at maximum parallelism (with one thread per processor core). If a long-running query starts with threads on all processors, and then one or more other connections become active, perhaps it is not practical (or possible) to reduce the number of threads on a running query. In this case, the proper strategy for the query running on all processors might be to bring 2 or more threads onto each processor to better share resources? So when multiple streams are active, each stream should be running at a lower degree of parallelism, which, while taking longer elapsed time to run, should be more CPU efficient, and hence throughput should be higher than power.

The Superdome systems with Oracle were configured with a rather large number of disk drives, 768 in the 32-way dual-core 9140 and 576 in the 16-way quad-core 9350. It is unclear why. It might be because HP wanted to showcase their EVA4400 and MSA2300 SAN storage systems. SAN systems do not provide good sequential bandwidth except possibly with 2-disk RAID 1 groups or 3-disk RAID 5 groups. RAID groups with more disks might provide 27MB/sec per disk. It might be because Oracle Automatic Storage Management prefers this type of configuration. It is also unclear why HP elected to configure so many HBAs and FC ports. The big Line Item scan queries seem to consume data at no more than 5-8GB/sec. So why configure 48 dual-port HBAs (2 HBAs per MSA with 24 disks) at 8Gbps, which could conceivably support 70GB/s? A dozen dual-port 8Gbps HBAs (1 HBA per 2 MSAs, 48 disks) should have been sufficient.
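The HBA arithmetic can be checked with a short Python sketch. The HBA counts and the 5-8GB/s consumption rate come from the text; the figure of 0.8GB/s of usable bandwidth per 8Gbps FC port is an assumed round number for illustration.

```python
PORT_GB_PER_S = 0.8  # assumed usable bandwidth of one 8Gbps FC port


def hba_bandwidth_gb_s(hbas: int, ports_per_hba: int = 2,
                       port_gb_per_s: float = PORT_GB_PER_S) -> float:
    """Aggregate bandwidth of a set of multi-port FC HBAs, in GB/s."""
    return hbas * ports_per_hba * port_gb_per_s


if __name__ == "__main__":
    observed_peak = 8.0  # upper end of the 5-8GB/s scan rate cited above
    for label, count in [("as configured", 48), ("a dozen HBAs", 12)]:
        total = hba_bandwidth_gb_s(count)
        print(f"{label:14s} {count:2d} HBAs: {total:5.1f} GB/s "
              f"({total / observed_peak:.1f}x the observed peak)")
```

Even a dozen dual-port HBAs leaves more than 2x headroom over the highest scan rate observed, under this per-port assumption.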

Both Superdome/Oracle results do show higher Throughput than Power.

See SF1000 for additional details.

SF 3000 Results

The chart below shows TPC-H SF3000 results for the 16-way Xeon 7460 (96 cores total), the 8-way Xeon 7560 (64 cores total) and the 32-way IBM POWER6 (also 64 cores). The IBM POWER6 is a 2007(?) processor even though the TPC-H report is from November 2009. The POWER6 has recently been succeeded by the POWER7, for which no TPC-H results have been published yet. The indications are that the new POWER7 is significantly improved at the core performance level.

Oracle just published an SF3000 result with a 32-way quad-core SPARC 2.88GHz system. I notice that this Oracle TPC-H result employed very many streams, 64 in this case. Compare this with 8 streams for the HP DL980G7 result for SQL Server. I will look into this in more detail later.

Figure: TPC-H SF 3000 Results for Xeon 7460 & 7560 and IBM POWER6 (Sybase)

Figure: TPC-H SF 3000 Results for Xeon and IBM POWER6

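| System | Processor | Total Cores | Mem (GB) | Disks | TPC-H Power | TPC-H Throughput | Composite QphH | Streams |
|---|---|---|---|---|---|---|---|---|
| Unisys | Xeon 7460 | 96 | 1024 | 900 | 120,254.8 | 87,841.4 | 102,778.2 | 8 |
| DL980 G7 | Xeon 7560 | 64 | 512 | 500 | 185,297.7 | 142,685.6 | 162,601.7 | 8 |
| Power 595 | 32 POWER6 | 64 | 512 | 288 | 142,790.7 | 171,607.4 | 156,537.3 | 9 |
| M9000 | 32 SPARC | 128 | 512 | 256 | 182,350.7 | 216,967.7 | 198,907.5 | 64 |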

Additional details are below:

| System | ES7000 | DL980G7 | Power 595 |
|---|---|---|---|
| Processor | Xeon 7460 | Xeon 7560 | Power6 |
| Sockets-Cores | 16 x 6 = 96 | 8 x 8 = 64 | 32 x 2 = 64 |
| Hyper-Threading | no | yes | 4/core? |
| Frequency | 2.66GHz | 2.26GHz | 5.0GHz |
| Memory | 1024GB | 512GB | 512GB |
| Controllers | 24 FC | 10+3 SAS | 24 x 4G DP FC, 12 DS4800 |
| Enclosures | 32+28 | 20+6 D2700 | 48 EXP810 |
| Data disks | 900 HDD | 500 HDD | 288 HDD |
| Controller-Disks | 8x60+14x30 | 10x50+3x48 | 1 per 24 |
| LUN-disks | 15-1 | 1? | 2? |
| OS | 2008 R2 DC | 2008 R2 EE | AIX 6.1 |
| Database | 2008 R2 DC | 2008 R2 EE | Sybase 15.1 |

As with the SF1000 Sybase result on 8-way Opteron, the POWER6 with Sybase was also configured with relatively few disks. The TPC-H throughput is significantly higher (20%) than the power. Compare this to the Xeon 7560 in which throughput is 23% lower than power.
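The throughput-versus-power comparisons quoted in this section can be reproduced from the published metrics with a few lines of Python. The numbers are taken from the SF1000 and SF3000 tables; the helper name `throughput_vs_power` is just an illustrative label.

```python
def throughput_vs_power(power: float, throughput: float) -> float:
    """Throughput relative to Power, as a signed percentage."""
    return (throughput / power - 1.0) * 100.0


# (power, throughput) pairs from the SF1000 and SF3000 tables
RESULTS = {
    "DL785 G6, SQL Server (SF1000)": (95789.1, 69367.6),
    "DL785 G6, Sybase (SF1000)": (108436.8, 96652.7),
    "DL980 G7, SQL Server (SF3000)": (185297.7, 142685.6),
    "Power 595, Sybase (SF3000)": (142790.7, 171607.4),
}

if __name__ == "__main__":
    for name, (power, throughput) in RESULTS.items():
        print(f"{name:32s} {throughput_vs_power(power, throughput):+6.1f}%")
```

Both SQL Server results show throughput well below power, while both Sybase results show throughput close to or above power.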

See SF3000 for additional details.

SF 10000 Results

There is a TPC-H SF 10000 result with SQL Server on a Unisys ES7000 with 16 Xeon 7460 processors, using 64 of the 96 cores, as this was pre-2008 R2. There is also a 32-way dual-core Itanium result with SQL Server.