
TPC-E Summary

The TPC transaction processing benchmarks (TPC-C and TPC-E) used to be a more prominent part of database discussions, but this has diminished for several reasons. First, Intel has clearly won the processor competition, leaving only niche segments to POWER7 and SPARC T3. Second, Microsoft moved to the newer TPC-E benchmark while Oracle stayed with the older TPC-C, which is more amenable to scale-out clustered configurations (RAC) while TPC-E is not. Third, in recent years hardware has become so powerful that system performance capability is no longer a serious concern for most situations.

Still, there are useful observations that can be drawn from the limited number of results being reported. One example is the progression over time for the 4-socket and 8-socket Nehalem-EX and Westmere-EX systems. At the 8-way level, from the first Nehalem-EX result to the most recent Westmere-EX result, performance increased about 74%, from 3,141 to 5,457 tpsE. The contribution from raw compute capability is 32% (8 to 10 cores per socket, and 2.27GHz to 2.4GHz). Some of the gain can be attributed to memory (from 1TB to 4TB), some to the switch from HDD to SSD reducing the number of transactions in flight, and some to NUMA improvements between the SQL Server 2008 R2 and 2012 engines. But there is still a significant portion that can be attributed to just learning how to configure a system with so very many processor cores to support high transaction volume.
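As a rough check on the 32% figure, the sketch below treats raw compute as cores per socket times clock frequency and compares that with the overall tps-E ratio, using the first Nehalem-EX and the latest Westmere-EX 8-way results from the table further down. Splitting the gain multiplicatively into a compute part and a residual is an assumption made here for illustration; the TPC reports do not break the gain down this way.

```python
# First Nehalem-EX (2010-03) vs. latest Westmere-EX (2013-03) 8-way results.
old = {"cores": 8, "ghz": 2.27, "tps_e": 3141.76}
new = {"cores": 10, "ghz": 2.40, "tps_e": 5457.20}

# Raw compute approximated as cores per socket x frequency (socket count is unchanged).
compute_ratio = (new["cores"] / old["cores"]) * (new["ghz"] / old["ghz"])
total_ratio = new["tps_e"] / old["tps_e"]

# Assumption: gains combine multiplicatively, so the residual lumps together
# memory, SSD storage, SQL Server engine changes and configuration experience.
residual_ratio = total_ratio / compute_ratio

print(f"raw compute gain: {100 * (compute_ratio - 1):.1f}%")
print(f"overall gain:     {100 * (total_ratio - 1):.1f}%")
print(f"residual gain:    {100 * (residual_ratio - 1):.1f}%")
```

Under that assumption the residual comes out at roughly the same size as the raw compute gain, which is the "significant portion" referred to above.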

Also interesting are the three TPC-E results on 2-socket E5-2690 systems with SQL Server 2012. All use SSD storage and have very similar performance. IBM posted the first result of 1,863 tpsE on 6 Mar 2012. Fujitsu followed with 1,872 on 5 Jul 2012. HP posted last with 1,882 on 20 Nov 2012 for their DL380p Gen8. IBM and Fujitsu configured 512GB memory, with 16 x 32GB DIMMs. The interesting aspect is that HP employed 256GB with 16 x 16GB DIMMs. Today, the 16GB registered ECC DIMM is $210 from Crucial, and the 32GB DIMM is $1,349. I am inclined to speculate that the performance difference between 256GB and 512GB might be a few percentage points, all things being equal and on a powerful storage system.
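To put the DIMM prices in perspective, here is a small sketch of the memory cost for the two configurations. It uses only the per-DIMM prices and DIMM counts quoted above; the function name `memory_cost` is just an illustrative helper.

```python
# Prices quoted in the text (USD per registered ECC DIMM), keyed by DIMM size in GB.
DIMM_PRICE_USD = {16: 210, 32: 1349}

def memory_cost(dimm_count, dimm_gb):
    """Return (total capacity in GB, total price in USD) for identical DIMMs."""
    return dimm_count * dimm_gb, dimm_count * DIMM_PRICE_USD[dimm_gb]

hp_gb, hp_cost = memory_cost(16, 16)     # HP: 16 x 16GB
ibm_gb, ibm_cost = memory_cost(16, 32)   # IBM and Fujitsu: 16 x 32GB

print(f"{hp_gb}GB costs ${hp_cost:,}")
print(f"{ibm_gb}GB costs ${ibm_cost:,}")
print(f"extra cost for the larger configuration: ${ibm_cost - hp_cost:,}")
```

At these prices, doubling the memory costs more than six times as much, while the three published scores differ by about 1%.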

Update 2013-03

IBM just posted a new 8-way Xeon E7-8870 (Westmere-EX) result. The new result does use a whopping 4TB memory versus a mere 2TB in the earlier Westmere-EX results. The first 8-way E7 did not show good scaling, so presumably there was some problem that took time to resolve. Below is the history of 8-way results for Nehalem and Westmere EX.

Processors | Cores | Freq. | Memory | SQL | Vendor | TPC-E | Date
8 x E7-8870 | 10 | 2.4GHz | 4096GB | 2012 | IBM | 5,457.20 | 2013-03-08
8 x E7-8870 | 10 | 2.4GHz | 2048GB | 2012 | NEC | 4,614.22 | 2012-03-27
8 x E7-8870 | 10 | 2.4GHz | 2048GB | 2K8R2 | IBM | 4,593.17 | 2011-08-26
8 x E7-8870 | 10 | 2.4GHz | 2048GB | 2K8R2 | Fujitsu | 4,555.54 | 2011-06-11
8 x E7-8870 | 10 | 2.4GHz | 2048GB | 2K8R2 | NEC | 4,200.61 | 2011-04-28
8 x X7560 | 8 | 2.27GHz | 1024GB | 2K8R2 | Fujitsu | 3,800.00 | 2010-09-24
8 x X7560 | 8 | 2.27GHz | 1024GB | 2K8R2 | NEC | 3,141.76 | 2010-03-10
8 x X7460 | 6 | 2.67GHz | 256GB | 2K8 | Unisys | 1,165.56 | 2009-04-13

The IBM 2013-03 result is on Windows Server 2012; the earlier results are on Windows Server 2008 R2 SP1.

Compare this with 4-socket systems on Nehalem-EX, Westmere-EX and Sandy Bridge EP.

Processors | Cores | Freq. | Memory | SQL | Vendor | TPC-E | Date
4 x E5-4650 | 8 | 2.7GHz | 512GB | 2012 | Fujitsu | 2,651.27 | 2012-11-05
4 x E7-4870 | 10 | 2.4GHz | 2048GB | 2012 | IBM | 3,218.46 | 2012-11-28
4 x E7-4870 | 10 | 2.4GHz | 1024GB | 2K8R2 | IBM | 2,862.61 | 2011-06-27
4 x E7-4870 | 10 | 2.4GHz | 1024GB | 2K8R2 | HP | 2,454.51 | 2011-04-05
4 x X7560 | 8 | 2.27GHz | 1024GB | 2K8R2 | IBM | 2,022.64 | 2010-03-30
4 x X7460 | 6 | 2.67GHz | 128GB | 2K8 | IBM | 729.65 | 2008-09-15

Below are representative 2-socket system results

Processors | Cores | Freq. | Memory | SQL | Vendor | TPC-E | Date
2 x E5-2690 | 8 | 2.9GHz | 256GB | 2012 | HP | 1,881.76 | 2012-11-21
2 x E7-2870 | 10 | 2.4GHz | 512GB | 2K8R2 | IBM | 1,560.70 | 2011-05-27
2 x X5690 | 6 | 3.46GHz | 192GB | 2K8R2 | HP | 1,284.14 | 2011-05-04
2 x X5680 | 6 | 3.33GHz | 96GB | 2K8R2 | HP | 1,110.10 | 2010-05-11
2 x X5570 | 4 | 2.93GHz | 96GB | 2K8 | IBM | 817.15 | 2009-07-14
2 x X5460 | 4 | 3.16GHz | 64GB | 2K8 | Fujitsu | 317.45 | 2008-05-30

Notice that in the progression from 2x10 Westmere-EX on SQL 2K8R2 to 2x8 Sandy Bridge EP on SQL 2012, performance increases even though memory decreases from 512GB to 256GB. From 4x10 Westmere-EX to 4x8 Sandy Bridge EP, performance decreases relative to both of the more recent Westmere-EX results.

The charts below show the progression of performance over time for some of the results above.

[Chart: tpcE]

For the 2-socket systems, West-1 is from the first set of TPC-E results reported for Westmere X5680 with HDD storage and West-2 is the later X5690 report with SSD storage. Both are 6-core Westmere-EP processors. The West-3 is the E7-2870 10-core (Westmere-EX) on SSD storage.

For the 4-socket systems, West-1 is on HDD storage, and West-2 on SSD, both 2K8R2 and 1TB memory. The West-3 is on Win/SQL 2012, 2TB memory and SSD storage.

[Chart: tpcE]

For the 8-socket systems, Neh-1 is on HDD storage and Neh-2 is the later result on SSD storage. West-1 is the initial 2011-04 NEC report, West-2 is the 2011-06 Fujitsu report and West-3 is the IBM 2013-03 report with SQL 2012.

8-way details

Below are more details for the 8-ways

System | IBM | NEC | IBM
Processor | E7-8870 | E7-8870 | E7-8870
Sockets-Cores | 8 x 10 | 8 x 10 | 8 x 10
Hyper-Threading | yes | yes | yes
Frequency (GHz) | 2.40 | 2.40 | 2.40
Memory (GB) | 2048 | 2048 | 4096
IO Controllers | 11+1 SAS | 11+1 SAS | 11+1 SAS
Data Disks | 143 SSD | 396 SSD | 220 SSD
OS | 2008 R2 | 2008 R2 | 2012
SQL Server | 2008 R2 | 2012 | 2012
tps-E | 4,593.17 | 4,614.22 | 5,457.29

Below are average response times for the 8-ways

System | IBM | NEC | IBM | weight
Processor | E7-8870 | E7-8870 | E7-8870
Storage | SSD | SSD | SSD
Broker-Volume | 0.03 | 0.05 | 0.02 | 4.9%
Customer-Position | 0.02 | 0.04 | 0.01 | 13.0%
Market-Feed | 0.02 | 0.03 | 0.01 | 1.0%
Market-Watch | 0.03 | 0.03 | 0.01 | 18.0%
Security-Detail | 0.01 | 0.02 | 0.00 | 14.0%
Trade-Lookup | 0.11 | 0.10 | 0.07 | 8.0%
Trade-Order | 0.05 | 0.08 | 0.04 | 10.1%
Trade-Result | 0.06 | 0.10 | 0.02 | 10.0%
Trade-Status | 0.01 | 0.02 | 0.01 | 19.0%
Trade-Update | 0.13 | 0.12 | 0.08 | 2.0%
Data-Maintenance | 0.03 | 0.02 | 0.01 | n/a
Weighted Avg Resp. | 0.0354 | 0.0484 | 0.0193
Txn in flight | 1,611 | 2,202 | 878
Bus recov time | 02:02:23 | 01:08:53 | 00:34:56

The tps-E score is the rate of Trade-Orders, which make up 10.1% of call volume. Hence the number of transactions (of all types) in flight at any given point in time is (tps-E) x (1 / 0.101) x (weighted average response time in seconds).

Below are max response times for the three recent 8-way systems.

System | IBM | NEC | IBM
Processor | E7-8870 | E7-8870 | E7-8870
Storage | SSD | SSD | SSD
Broker-Volume | 0.86 | 2.79 | 0.22
Customer-Position | 4.71 | 5.52 | 0.72
Market-Feed | 4.68 | 5.63 | 1.11
Market-Watch | 0.62 | 5.46 | 0.36
Security-Detail | 5.21 | 4.52 | 5.21
Trade-Lookup | 2.49 | 5.58 | 0.53
Trade-Order | 4.92 | 5.84 | 1.11
Trade-Result | 4.77 | 5.76 | 1.53
Trade-Status | 1.46 | 4.51 | 0.64
Trade-Update | 4.94 | 5.78 | 0.83
Data-Maintenance | 0.03 | 0.17 | 0.07

Between the 2011-08 and 2013-03 IBM reports, there was a good reduction in average response times, but a dramatic reduction in maximum response times. Only Security-Detail did not experience a reduction in max response time.

Update 2012

Intel Xeon E5 (Sandy Bridge-EP) and SQL Server 2012 Benchmarks

Intel officially announced the Xeon E5-2600 series processors based on the Sandy Bridge-EP core, with up to 8 cores and 20MB LLC per socket. Only one TPC benchmark accompanied the product launch, summarized below.

Processors | Cores per socket | Frequency | Memory | SQL | Vendor | TPC-E
2 x Xeon E5-2690 | 8 | 2.9GHz | 512GB (16x32GB) | 2012 | IBM | 1,863.23
2 x Xeon E7-2870 | 10 | 2.4GHz | 512GB (32x16GB) | 2008R2 | IBM | 1,560.70
2 x Xeon X5690 | 6 | 3.46GHz | 192GB (12x16GB) | 2008R2 | HP | 1,284.14

The Xeon E5 supersedes 2-socket systems based on both the Xeon 5600 (Westmere-EP) and Xeon E7 (Westmere-EX). It is evident that Sandy Bridge improves performance over Westmere on a per-socket, per-core and per-GHz basis.

Architecture | Total Cores | Frequency | Core-GHz | TPC-E | tps-E per core-GHz
Sandy Bridge-EP | 2 x 8 = 16 | 2.9GHz | 46.4 | 1,863.23 | 40.16
Westmere-EX | 2 x 10 = 20 | 2.4GHz | 48.0 | 1,560.70 | 32.51
Westmere-EP | 2 x 6 = 12 | 3.46GHz | 41.52 | 1,284.14 | 30.93
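The per core-GHz column can be reproduced with a few lines. This is a minimal sketch using the socket counts, core counts, frequencies and scores listed above; the helper name `tps_per_core_ghz` is only for illustration.

```python
# (architecture, sockets, cores per socket, GHz, tps-E) from the 2-socket results.
results = [
    ("Sandy Bridge-EP", 2, 8, 2.9, 1863.23),
    ("Westmere-EX", 2, 10, 2.4, 1560.70),
    ("Westmere-EP", 2, 6, 3.46, 1284.14),
]

def tps_per_core_ghz(sockets, cores, ghz, tps_e):
    """Normalize the tps-E score by total cores times clock frequency."""
    core_ghz = sockets * cores * ghz
    return tps_e / core_ghz

for name, sockets, cores, ghz, tps_e in results:
    print(f"{name}: {tps_per_core_ghz(sockets, cores, ghz, tps_e):.2f} tps-E per core-GHz")
```

Core-GHz is a crude normalization: it ignores cache size, memory channels and Hyper-Threading, so the ratio is best read as a rough indicator of per-core efficiency.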

A later version of the Xeon E5 will support 4-socket systems. There is no explanation as to whether glue-less 8-socket systems will be supported in the future. It was previously discussed that there would be an EN variant of Sandy Bridge with 3 memory channels and fewer PCI-E lanes.

2-way Intel Xeon 5600 and AMD Opteron Magny-Cours Systems

The initial TPC-E results for 2-way Xeon 5680 systems with HDD storage were around 1,100 tps-E. Later Fujitsu achieved 1,246 with SSD storage. The 2011 HP result with the Xeon 5690 is of note in the use of the Violin SSD storage system.

System | Dell T710 | HP DL380 G7 | HP DL385 G7 | Fujitsu RX300 | HP DL380 G7
Processor | Xeon 5680 | Xeon 5680 | Opt 6176 | Xeon 5680 | Xeon 5690
Sockets-Cores | 2 x 6 = 12 | 2 x 6 = 12 | 2 x 12 = 24 | 2 x 6 = 12 | 2 x 6 = 12
Hyper-Threading | yes | yes | no | yes | yes
Frequency | 3.33GHz | 3.33GHz | 2.3GHz | 3.33GHz | 3.46GHz
Memory | 144GB | 96GB | 128GB | 96GB | 192GB
IO | 6 SAS | 6 SAS | 5 SAS | 5 SAS | ?
Enclosures | 24 MD1220 | 24 MSA70 | 24 D2700 | 5? | 2?
Data Disks | 576 HDD | 528 HDD | 528 HDD | 120 SSD | 2 Violin
OS | 2008 R2 EE | 2008 R2 EE | 2008 R2 EE | 2008 R2 EE | 2008 R2 EE
Database | 2008 R2 EE | 2008 R2 EE | 2008 R2 EE | 2008 R2 EE | 2008 R2 EE
tps-E | 1,074.14 | 1,110.11 | 887.38 | 1,246.13 | 1,284.14

Most if not all TPC-E reports employ RAID-5 for data, and RAID-10 for logs. The storage enclosures and numbers of disks in the table above are presumably for data only.

For the Fujitsu system, the 5 disk enclosures (24 x 2.5in SAS/SATA) are $2,611 each, or $13,056 for quantity 5. The 64GB SATA SSD is $1,131 each, or $135,762 for quantity 120. The total storage cost is $151,300 including rack, but not including HDDs in the server itself.

For the HP DL380 G7 with Xeon 5690, the Violin storage system is $100,500 each, or $201,000 for quantity 2. By contrast, the cost structure for the 528 disks in the HP DL380 G7 HDD result is $3,199 per enclosure at quantity 24 for $76,776, and $349 per 15K 72GB HDD for $184,272, for a total of about $261,000, excluding spares.

Below are the average response times for the 10 TPC-E transactions, along with the weight of the transaction and the number of frames in each transaction.

System | Dell T710 | HP DL380 G7 | HP DL385 G7 | Fujitsu RX300 | HP DL380 G7 | weight | frames
Processor | Xeon 5680 | Xeon 5680 | Opt 6176 | Xeon 5680 | Xeon 5690
Storage | HDD | HDD | HDD | SSD | Violin
Broker-Volume | 0.04 | 0.06 | 0.05 | 0.04 | 0.02 | 4.9% | 1
Customer-Position | 0.04 | 0.05 | 0.04 | 0.05 | 0.02 | 13% | 2
Market-Feed | 0.04 | 0.07 | 0.04 | 0.04 | 0.02 | 1% | 1
Market-Watch | 0.02 | 0.05 | 0.03 | 0.03 | 0.02 | 18% | 1
Security-Detail | 0.02 | 0.02 | 0.02 | 0.02 | 0.01 | 14% | 1
Trade-Lookup | 0.62 | 0.81 | 0.65 | 0.15 | 0.09 | 8% | 4
Trade-Order | 0.09 | 0.12 | 0.09 | 0.11 | 0.05 | 10.1% | 4
Trade-Result | 0.10 | 0.12 | 0.10 | 0.13 | 0.06 | 10% | 6
Trade-Status | 0.03 | 0.05 | 0.03 | 0.03 | 0.01 | 19% | 1
Trade-Update | 0.69 | 0.90 | 0.72 | 0.17 | 0.11 | 2% | 3
Data-Maintenance | 0.09 | 0.19 | 0.06 | 0.13 | 0.02
Weighted Avg Response | 0.1022 | 0.1384 | 0.1074 | 0.0623 | 0.0311
Average tx in flight | 1087 | 1521 | 943.6 | 768.7 | 395.4

The number of transactions in flight is calculated as (tps-E) x (1/0.101) x (weighted average response time). The second term is the inverse of the fraction of transactions that are Trade-Orders. For each Trade-Order, there are just under 10 transactions in total, some of which have multiple frames, even though only the Trade-Order is scored. The total transaction volume per second times the average response time (in seconds) is the average number of active transactions in flight. In theory, having a lower number of transactions in flight at any given point in time should reduce contention, with moderately improved performance.
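The calculation is an application of Little's law (items in the system = arrival rate x time in the system). Below is a minimal sketch using the tps-E scores and weighted average response times from the 2-way table; the 10.1% Trade-Order fraction is the figure used in this article, and the function and dictionary names are illustrative only.

```python
TRADE_ORDER_FRACTION = 0.101  # share of all transactions that are scored, per the article

def transactions_in_flight(tps_e, weighted_avg_response_s):
    """Average number of active transactions: total rate x average response time."""
    total_tx_per_sec = tps_e / TRADE_ORDER_FRACTION
    return total_tx_per_sec * weighted_avg_response_s

# (tps-E, weighted average response time in seconds) from the 2-way table.
systems = {
    "Dell T710, Xeon 5680, HDD": (1074.14, 0.1022),
    "HP DL380 G7, Xeon 5680, HDD": (1110.11, 0.1384),
    "HP DL385 G7, Opt 6176, HDD": (887.38, 0.1074),
    "Fujitsu RX300, Xeon 5680, SSD": (1246.13, 0.0623),
    "HP DL380 G7, Xeon 5690, Violin": (1284.14, 0.0311),
}

for name, (tps_e, resp) in systems.items():
    print(f"{name}: {transactions_in_flight(tps_e, resp):.1f} in flight")
```

The Violin-based system delivers the highest score with well under half the in-flight transactions of any HDD system, since its response times are so much shorter.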

Maximum Response Times

The table below is the maximum response time for each transaction. It is expected that as the system is pushed towards 100% utilization, the maximum response time should increase sharply.

System | Dell T710 | HP DL380 G7 | HP DL385 G7 | Fujitsu RX300 | HP DL380 G7
Processor | Xeon 5680 | Xeon 5680 | Opt 6176 | Xeon 5680 | Xeon 5690
Storage | HDD | HDD | HDD | SSD | Violin
Broker-Volume | 1.74 | 4.83 | 1.87 | 1.22 | 0.08
Customer-Position | 2.86 | 10.89 | 2.68 | 2.35 | 0.99
Market-Feed | 3.90 | 17.25 | 2.67 | 9.78 | 1.03
Market-Watch | 2.60 | 6.02 | 1.88 | 1.66 | 0.78
Security-Detail | 2.75 | 5.57 | 1.81 | 1.28 | 0.58
Trade-Lookup | 3.88 | 16.32 | 3.77 | 1.48 | 0.31
Trade-Order | 2.83 | 11.39 | 1.98 | 2.89 | 1.36
Trade-Result | 4.06 | 18.78 | 4.35 | 2.29 | 1.51
Trade-Status | 2.91 | 11.88 | 2.06 | 1.42 | 1.41
Trade-Update | 3.79 | 5.62 | 3.72 | 1.29 | 1.11
Data-Maintenance | 0.52 | 1.04 | 0.31 | 2.26 | 0.07

4-way Xeon 7560, E7 & Magny-Cours Systems

Below are TPC-E results for 4-way systems: four on the Xeon 7560, plus one Opteron 6176 and one Xeon E7-4870 result. The IBM and HP Xeon 7560 systems are very similar, with the same memory and type of storage, and the results are within about 1% of each other. The Dell result is 3.5% lower than the HP result and 4.5% lower than the IBM, which can probably be attributed to the difference in memory, 512GB versus 1TB.

System | IBM x3950 X5 | Dell R910 | HP DL580 G7 | Fujitsu RX600 S5 | HP DL585 G7 | HP DL580 G7
Processor | Xeon 7560 | Xeon 7560 | Xeon 7560 | Xeon 7560 | Opt 6176 | Xeon E7-4870
Sockets-Cores | 4 x 8 = 32 | 4 x 8 = 32 | 4 x 8 = 32 | 4 x 8 = 32 | 4 x 12 = 48 | 4 x 10 = 40
Hyper-Threading | yes | yes | yes | yes | no | yes
Frequency | 2.26GHz | 2.26GHz | 2.26GHz | 2.26GHz | 2.3GHz | 2.40GHz
Memory | 1024GB | 512GB | 1024GB | 512GB | 256GB | 1024GB
IO | 6x2 SAS | 6+3 SAS | 10 SAS | 8 SAS | 7 SAS | 11 SAS
Storage | 1008 HDD | 576+480 HDD | 750+240 HDD | 192 SSD | 700 HDD | 950+150 HDD
OS | 2008 R2 EE | 2008 R2 EE | 2008 R2 EE | 2008 R2 EE | 2008 R2 EE | 2008 R2 EE SP1
Database | 2008 R2 EE | 2008 R2 EE | 2008 R2 EE | 2008 R2 EE | 2008 R2 EE | 2008 R2 EE
tps-E | 2,022.64 | 1,933.96 | 2,001.12 | 2,046.96 | 1,400.14 | 2,454.51

IBM: 6 ServeRAID M5025 controllers, 84x12 disks, 42x24, 6x7x24=1008, 7 sets of 24 disks in RAID-10.
Dell: 6 PERC H800, 2 LSI MegaRAID SAS 9280-8e, 24 MD1220, 20 MD1120.
HP: 10 Smart Array P411, 40 StorageWorks D2700.

The 64 x 8GB DIMMs in the Dell system cost $32,000, while the 64 x 16GB DIMMs in the HP system cost $92,736. Some care should be taken in assessing whether the performance gain justifies the higher cost of 16GB DIMMs.

The Fujitsu system has the same 512GB memory as in the Dell, with SSD replacing HDD storage, for a 5.8% performance gain. The performance advantage over the IBM and HP with 1TB memory, but HDD storage, is 1-2%.
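The relative differences between the four Xeon 7560 results can be checked with a short sketch. It uses only the tps-E scores from the table above; `gain_pct` is a hypothetical helper that expresses one score as a percentage gain over another, so the values depend on which system is taken as the baseline.

```python
# tps-E scores of the four 4-way Xeon 7560 systems.
tps_e = {
    "IBM x3950 X5 (1TB, HDD)": 2022.64,
    "Dell R910 (512GB, HDD)": 1933.96,
    "HP DL580 G7 (1TB, HDD)": 2001.12,
    "Fujitsu RX600 S5 (512GB, SSD)": 2046.96,
}

def gain_pct(score, baseline):
    """Percentage by which score exceeds baseline."""
    return 100 * (score / baseline - 1)

fujitsu = tps_e["Fujitsu RX600 S5 (512GB, SSD)"]
for name, score in tps_e.items():
    if score != fujitsu:
        print(f"Fujitsu vs {name}: {gain_pct(fujitsu, score):+.1f}%")

dell = tps_e["Dell R910 (512GB, HDD)"]
print(f"HP vs Dell:  {gain_pct(tps_e['HP DL580 G7 (1TB, HDD)'], dell):+.1f}%")
print(f"IBM vs Dell: {gain_pct(tps_e['IBM x3950 X5 (1TB, HDD)'], dell):+.1f}%")
```

Measured this way, doubling memory and switching to SSD each appear to be worth a gain in the low single digits on this class of system.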

Below are the average response times for the 10 TPC-E transactions, along with the weight of the transaction and the number of frames in each transaction.

System | IBM x3950 | Dell R910 | HP DL580 G7 | Fujitsu RX600 S5 | HP DL585 G7 | HP DL580 G7 | weight | frames
Processor | Xeon 7560 | Xeon 7560 | Xeon 7560 | Xeon 7560 | Opt 6176 | Xeon E7
Storage | HDD | HDD | HDD | SSD | HDD | HDD
Broker-Volume | 0.05 | 0.06 | 0.06 | 0.07 | 0.06 | 0.04 | 4.9% | 1
Customer-Position | 0.03 | 0.04 | 0.04 | 0.05 | 0.03 | 0.03 | 13% | 2
Market-Feed | 0.03 | 0.04 | 0.03 | 0.03 | 0.04 | 0.03 | 1% | 1
Market-Watch | 0.03 | 0.04 | 0.04 | 0.05 | 0.03 | 0.03 | 18% | 1
Security-Detail | 0.02 | 0.02 | 0.02 | 0.03 | 0.02 | 0.01 | 14% | 1
Trade-Lookup | 0.40 | 0.52 | 0.41 | 0.12 | 0.57 | 0.40 | 8% | 4
Trade-Order | 0.08 | 0.09 | 0.08 | 0.11 | 0.08 | 0.07 | 10.1% | 4
Trade-Result | 0.09 | 0.11 | 0.10 | 0.14 | 0.09 | 0.07 | 10% | 6
Trade-Status | 0.02 | 0.03 | 0.02 | 0.03 | 0.02 | 0.02 | 19% | 1
Trade-Update | 0.45 | 0.58 | 0.46 | 0.14 | 0.63 | 0.45 | 2% | 3
Data-Maintenance | 0.07 | 0.09 | 0.07 | 0.07 | 0.09 | 0.07
Weighted Avg Response | 0.077 | 0.098 | 0.082 | 0.067 | 0.0945 | 0.0718
Average tx in flight | 1536 | 1867 | 1631 | 1351 | 1310 | 1745

Maximum Response Times

System | IBM x3950 | Dell R910 | HP DL580 G7 | Fujitsu RX600 S5 | HP DL585 G7 | HP DL580 G7
Processor | Xeon 7560 | Xeon 7560 | Xeon 7560 | Xeon 7560 | Opt 6176 | Xeon E7
Storage | HDD | HDD | HDD | SSD | HDD | HDD
Broker-Volume | 1.72 | 4.36 | 2.72 | 3.11 | 6.96 | 0.27
Customer-Position | 2.19 | 3.62 | 4.56 | 3.40 | 8.07 | 20.33
Market-Feed | 17.06 | 44.18 | 6.94 | 17.78 | 26.18 | 20.72
Market-Watch | 2.46 | 3.93 | 4.56 | 3.25 | 6.95 | 1.08
Security-Detail | 2.16 | 2.87 | 2.64 | 2.75 | 7.04 | 0.91
Trade-Lookup | 3.34 | 3.07 | 5.08 | 18.53 | 7.71 | 1.85
Trade-Order | 2.77 | 10.02 | 19.56 | 3.63 | 10.27 | 20.60
Trade-Result | 14.72 | 6.40 | 20.41 | 3.37 | 10.00 | 5.54
Trade-Status | 2.45 | 2.16 | 4.52 | 3.22 | 6.92 | 1.44
Trade-Update | 10.32 | 12.41 | 5.03 | 3.53 | 7.57 | 1.87
Data-Maintenance | 0.32 | 0.58 | 0.54 | 2.49 | 0.56 | 0.87

The Fujitsu system with SSD storage has significantly lower average response time on Trade-Lookup and Trade-Update compared with HDD, but curiously slightly higher response time on Trade-Order and Trade-Result.

The Dell system has the longest average response time (weighted over all calls), as expected: it has less memory than the IBM and HP systems, but lacks the SSD storage of the Fujitsu system.

8-way Xeon 7560 (Nehalem-EX) & E7 (Westmere-EX) Systems

Below are TPC-E results for 8-way Xeon systems.

Vendor | NEC | Fujitsu | NEC | Fujitsu | IBM
System | A1080a | RX900 S1 | A1080a | RX900 S2 | x3850 X5
Processor | Xeon 7560 | Xeon 7560 | Xeon E7-8870 | Xeon E7-8870 | Xeon E7-8870
Sockets-Cores | 8 x 8 = 64 | 8 x 8 = 64 | 8 x 10 = 80 | 8 x 10 = 80 | 8 x 10 = 80
Hyper-Threading | yes | yes | yes | yes | yes
Frequency | 2.26GHz | 2.26GHz | 2.40GHz | 2.40GHz | 2.40GHz
Memory | 1024GB | 1024GB | 2048GB | 2048GB | 2048GB
IO | 7 FC | 14 SAS | 11 SAS | 16 SAS | 11 SAS
Storage | 1872 HDD | 336 SSD | 399 SSD | 384 SSD | 143 SSD
OS | 2008 R2 DC | 2008 R2 DC | 2008 R2 EE | 2008 R2 DC | 2008 R2 EE
Database | 2008 R2 DC | 2008 R2 DC | 2008 R2 EE | 2008 R2 DC | 2008 R2 EE
tps-E | 3,141.76 | 3,800.00 | 4,200.61 | 4,555.54 | 4,593.17

Transaction Response Times, Average and Maximum

System | NEC A1080a | Fujitsu RX900 S1 | NEC A1080a | Fujitsu RX900 S2 | IBM x3850 X5 | weight | frames
Processor | X7560 | X7560 | E7-8870 | E7-8870 | E7-8870
Storage | HDD | SSD | SSD | SSD | SSD
Broker-Volume | 0.05 | 0.06 | 0.04 | 0.03 | 0.03 | 4.9% | 1
Customer-Position | 0.02 | 0.05 | 0.03 | 0.02 | 0.02 | 13% | 2
Market-Feed | 0.03 | 0.03 | 0.02 | 0.01 | 0.02 | 1% | 1
Market-Watch | 0.03 | 0.05 | 0.04 | 0.02 | 0.03 | 18% | 1
Security-Detail | 0.01 | 0.02 | 0.01 | 0.01 | 0.01 | 14% | 1
Trade-Lookup | 0.50 | 0.13 | 0.11 | 0.10 | 0.11 | 8% | 4
Trade-Order | 0.07 | 0.10 | 0.07 | 0.04 | 0.05 | 10.1% | 4
Trade-Result | 0.07 | 0.13 | 0.07 | 0.03 | 0.06 | 10% | 6
Trade-Status | 0.02 | 0.03 | 0.02 | 0.01 | 0.01 | 19% | 1
Trade-Update | 0.56 | 0.14 | 0.12 | 0.11 | 0.13 | 2% | 3
Data-Maintenance | 0.11 | 0.07 | 0.02 | 0.03 | 0.03
tps-E | 3,141.76 | 3,800.00 | 4,200.61 | 4,555.54 | 4,593.17
Weighted Avg Response | 0.0812 | 0.0635 | 0.0437 | 0.0283 | x
Average tx in flight | 2526.5 | 2389.1 | 1817.5 | 1276.5 | x

Transaction Response Times, Maximum

System | NEC A1080a | Fujitsu RX900 S1 | NEC A1080a | Fujitsu RX900 S2 | IBM x3850 X5
Processor | X7560 | X7560 | E7-8870 | E7-8870 | E7-8870
Storage | HDD | SSD | SSD | SSD | SSD
Broker-Volume | 2.88 | 6.72 | 2.47 | 0.65 | 0.86
Customer-Position | 43.55 | 3.49 | 2.66 | 1.82 | 4.71
Market-Feed | 48.81 | 3.48 | 7.69 | 3.78 | 4.68
Market-Watch | 2.77 | 2.83 | 2.67 | 1.88 | 0.62
Security-Detail | 2.89 | 3.79 | 2.63 | 1.82 | 5.21
Trade-Lookup | 49.09 | 3.30 | 14.42 | 2.41 | 2.49
Trade-Order | 45.96 | 3.74 | 2.70 | 1.89 | 4.92
Trade-Result | 68.73 | 7.10 | 2.82 | 1.51 | 4.77
Trade-Status | 60.23 | 6.51 | 2.63 | 1.82 | 1.46
Trade-Update | 3.46 | 3.79 | 14.42 | 2.14 | 4.94
Data-Maintenance | | | | |

I talked about these two on SQL Blog. I am also thinking that the max response times are very important, and may be a better indication of SSD, or rather, a proper storage solution.

TPC-E details

The tps-E score is the number of Trade-Order transactions completed per second. The weighted average response time is the sum of the average response time of the individual transactions weighted by the percentage weight. Note that the transaction volume is approximately 10 times the tps-E score (1 over 10.1%). However, the weighted average number of frames per transaction is 2.213. So the volume of stored procedure (RPC) calls (as reported by the performance counter SQL Statistics -> Batch Requests/sec) is approximately 22 times the tps-E score.
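The figures in this paragraph can be reproduced from the transaction weights and frame counts listed in the response-time tables. The sketch below is illustrative: the names `MIX` and `weighted_avg_response` are made up here, the 10.1% Trade-Order weight is the one used throughout this article, and the Dell T710 averages are copied from the 2-way table as a sample input.

```python
# Transaction mix from the response-time tables: name -> (weight, frames).
MIX = {
    "Broker-Volume": (0.049, 1),
    "Customer-Position": (0.13, 2),
    "Market-Feed": (0.01, 1),
    "Market-Watch": (0.18, 1),
    "Security-Detail": (0.14, 1),
    "Trade-Lookup": (0.08, 4),
    "Trade-Order": (0.101, 4),
    "Trade-Result": (0.10, 6),
    "Trade-Status": (0.19, 1),
    "Trade-Update": (0.02, 3),
}

def weighted_avg_response(avg_seconds):
    """Weight each transaction's average response time by its share of the mix."""
    return sum(MIX[name][0] * seconds for name, seconds in avg_seconds.items())

avg_frames = sum(weight * frames for weight, frames in MIX.values())  # about 2.213
tx_per_tps_e = 1 / MIX["Trade-Order"][0]                             # about 9.9
rpc_per_tps_e = tx_per_tps_e * avg_frames                            # about 21.9

# Average response times (seconds) for the Dell T710 result in the 2-way table.
dell_t710 = {
    "Broker-Volume": 0.04, "Customer-Position": 0.04, "Market-Feed": 0.04,
    "Market-Watch": 0.02, "Security-Detail": 0.02, "Trade-Lookup": 0.62,
    "Trade-Order": 0.09, "Trade-Result": 0.10, "Trade-Status": 0.03,
    "Trade-Update": 0.69,
}

print(f"frames per transaction: {avg_frames:.3f}")
print(f"RPC calls per tps-E:    {rpc_per_tps_e:.1f}")
print(f"Dell T710 weighted avg: {weighted_avg_response(dell_t710):.4f} s")
```

Data-Maintenance is left out because the tables assign it no weight in the mix.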

A frame is implemented as its own stored procedure. In a transaction, each frame is called separately in sequence (?).

TPC-E transaction frames

TPC-E

Date | System | Cores | Freq. | Cache | Memory | Process | tps-E
02/2008 | 4 x Xeon X7350 | Quad | 2.93G | 2x4M | 128G | 65nm | 479.51*
07/2008 | 4 x Opteron 8360 | Quad | 2.50G | 2M | - | 65nm | no result
09/2008 | 4 x Xeon X7460 | Six | 2.66G | 16M | 128G | 45nm | 729.65*
02/2009 | 4 x Opteron 8384 | Quad | 2.70G | 6M | 64G | 45nm | 635.43
07/2009 | 2 x Xeon X5570 | Quad | 2.93G | 8M | 96G | 45nm | 817.15
n/a | 4 x Opteron 8439 | Six | 2.80G | 6M | - | 45nm | no result

* IBM x3850M2 on proprietary chipset

A note of caution in comparing the Xeon 5500 series with Xeon 7300/7400 and Opteron processors: the higher-end Xeon 5500 processors have the Hyper-Threading feature, which probably makes a substantial improvement to TPC-C and TPC-E results. The HT feature does not benefit all SQL operations evenly. Still, the 2-way Xeon 5570 is very impressive compared to all the most recent 4-way systems, even the 24-core Xeon 7460.