Additional details are below. The two IBM results employ SSD storage. The older results are on HDD storage. In addition, the IBM x3850 X5 (with Xeon 7560) system is configured with 1.5TB memory. The total size of the TPC-H SF 1000 database, all tables and indexes, should be 1.4TB. A storage system (7 SSDs) capable of very high IO rates is still required to handle the intense tempdb activity.

| System | DL785 G6 | DL785 G6 | Superdome 2 | x3850 X5 | Power 780 | x3850 X5 |
|---|---|---|---|---|---|---|
| Database | SQL Server | Sybase 15.1 | Oracle 11g R2 | SQL Server | Sybase 15.2 | SQL Server |
| Processor | Opteron 8439 | Opteron 8439 | Itanium 9350 | Xeon 7560 | POWER7 | Xeon E7-8870 |
| Sockets-Cores | 8 x 6 = 48 | 8 x 6 = 48 | 16 x 4 = 64 | 4 x 8 = 32 | 8 x 4 = 32 | 8 x 10 = 80 |
| Hyper-Threading | no | no | disabled | 2 per core | 4 per core | disabled |
| Storage Controllers | 6 P800 | 8 x 8Gbps dual-port FC | 48 8Gbps dual-port FC | | | |
| Storage Ext | 12 MSA70 | 4 MSA2324fc | 24 MSA2324 | | 4 EXP 12 | |
| Data disks | 240 HDD | 96 HDD | 576 HDD | 7 SSD | 52 SAS SSD | 7 SSD |
| Controller-Disks | 3x50, 25, 30, 35 | 1 per 24 | 1 per 24 | | | |
| LUNs-disks | 48x5? | ? | 3 per 6 | | | |
| OS | 2008 R2 EE | RHEL 5.3 | HP-UX 11 | 2008 R2 EE | RHEL 6 | |
| Database version | 2008 EE | Sybase 15.1 | Oracle 11g R2 | 2008 R2 | Sybase 15.2 | 2008 R2 |

Below are the individual query run times for the three more recent reports.

TPC-H SF 1000 individual query execution times

Note the wide variation between different systems and database engines. This could reflect differences in any of:

 1) processor and system architecture,
 2) memory versus disk, and HDD versus SSD,
 3) execution plans,
 4) the relative efficiency of component operations (scan, index seek, hash, sort, etc.),

and probably other factors as well. It would be interesting to compare the execution plans between the different database engines, and even to force SQL Server to use a plan as close as possible to the plans employed by the other database engines.

The main point of interest is not moderate differences in the overall (geometric mean) performance, but rather the very large differences in certain queries. The long run time for Q18 should probably be investigated.

Another view: the 4-way Xeon 7560 at SF 1TB with 1.5TB memory and SSD versus the 8-way Xeon at SF 3TB with 0.5TB memory and HDD. The number of processors doubles, but the database is 3 times larger. On this alone, we might expect query times about 50% longer on the 8-way system. But there are also significant differences in memory-to-data ratios and storage performance characteristics.
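The 50% figure follows from simple proportionality. Below is a minimal sketch, assuming run time grows linearly with data size and shrinks linearly with processor count (an idealization that ignores memory and storage effects). The function name is illustrative only.

```python
def expected_runtime_ratio(data_scale: float, processor_scale: float) -> float:
    """Naive expected run-time ratio between two systems.

    Assumes run time is proportional to data volume and inversely
    proportional to processor count. Real TPC-H queries deviate from this.
    """
    return data_scale / processor_scale


# SF 3TB versus SF 1TB (3x the data) on twice as many processors.
ratio = expected_runtime_ratio(data_scale=3.0, processor_scale=2.0)
print(f"expected run time on the larger configuration: {ratio:.2f}x")
```

A ratio of 1.5 means queries would be expected to take about 50% longer, before accounting for the memory-to-data ratio or HDD versus SSD.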

TPC-H individual query execution times for 4-way 1TB and 8-way 3TB

On the 8-way system at SF 3TB, Q18 actually runs faster than on the 4-way system at SF 1TB. But the other larger queries, Q1, 9, and 21, show the expected pattern.


TPC-H 1000GB: 8-way 6-core Opteron DL785 G6 vs 16-way quad-core Itanium

Below are the TPC-H 1000GB results on the 8-way ProLiant DL785 G6 with the Opteron 8439 processor for both SQL Server and Sybase, and for Oracle 11g (original and R2) on the 32-way Superdome dual-core Itanium 9140 and the 16-way Superdome 2 with quad-core Itanium 9350.

When the operating system and database engine are both completely different, caution is warranted in comparing the results. There are two questions involved. One is the performance of each database engine on a specific execution plan, with allowance for minor differences. The other is whether there are significant differences in the execution plans, and how the cost-based optimizer in each engine weighted the alternative plans.

SQL Server versus Sybase

The Sybase system has less memory, at 384GB versus 512GB for the SQL Server system. This is not expected to have much impact, as other TPC-H reports indicate that unless all data can reside in memory, TPC-H query performance is not significantly affected by memory capacity. The other difference is that the Sybase system has fewer disk drives, 96 versus 240 for the SQL Server system. The storage systems also differ: the SQL Server system employs RAID controllers with direct-attach storage, while the Sybase system has 8 dual-port FC HBAs with MSA2324fc SAN-based storage.

TPC-H Power query run times, Sybase relative to SQL Server, both on DL785 48-core

The SQL Server system has a rather peculiar disk arrangement. Three of the P800 RAID controllers each connect to 50 disks. There are 25, 30 and 35 disks on each of the other 3 controllers. It is presumed that there are 48 5-disk RAID 5 LUNs. It might be that the 3 P800 controllers with 50 disks reside in the 3 x16 PCI-E slots, and the other 3 P800 controllers reside in the x8 PCI-E slots. Given that the P800 has a x8 PCI-E port, the exact placement should not matter so long as each P800 resides in a x8 or x16 slot(?). The P800 controller should be able to sustain 1.6GB/sec, but assuming balanced IO per disk, only the 3 P800 controllers with 50 disks in this system will drive 1.6GB/s. Each set of 5 disks will drive 160MB/sec (because 50 disks on 1 P800 deliver 1.6GB/sec), so the other 3 controllers will be driving 0.8, 0.96 and 1.1GB/s, for a possible combined bandwidth of roughly 7.7GB/s.
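The controller arithmetic above can be reproduced with a short calculation. This is only a sketch under the text's own assumptions (1.6GB/sec sustained per P800, balanced IO per disk, 5-disk LUNs); the variable names are illustrative.

```python
# Figures taken from the text; balanced IO per disk is the text's assumption.
CONTROLLER_LIMIT_MB = 1600                       # sustained MB/s per P800
DISKS_PER_CONTROLLER = [50, 50, 50, 25, 30, 35]  # six P800 controllers
DISKS_PER_LUN = 5                                # presumed 5-disk RAID 5 LUNs

# Per-disk rate implied by 50 disks saturating one controller.
per_disk_mb = CONTROLLER_LIMIT_MB / max(DISKS_PER_CONTROLLER)
per_lun_mb = per_disk_mb * DISKS_PER_LUN
lun_count = sum(DISKS_PER_CONTROLLER) // DISKS_PER_LUN

# Each controller drives its disks at the per-disk rate, capped at its limit.
per_controller_mb = [
    min(CONTROLLER_LIMIT_MB, disks * per_disk_mb)
    for disks in DISKS_PER_CONTROLLER
]
total_gb = sum(per_controller_mb) / 1000

print(f"LUNs: {lun_count}, per LUN: {per_lun_mb:.0f} MB/s")
for disks, mb in zip(DISKS_PER_CONTROLLER, per_controller_mb):
    print(f"{disks:2d} disks -> {mb / 1000:.2f} GB/s")
print(f"combined: {total_gb:.2f} GB/s")
```

Under these assumptions the three smaller controllers contribute 0.80, 0.96 and 1.12 GB/s, and the combined figure comes to about 7.68 GB/s, in line with the estimate in the text.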

The Sybase system has 8 dual-port 8Gbps FC HBAs, for 16 x 8Gbps FC ports. Each 8Gbps port should be able to sustain 600MB/sec+, or 1.2GB/sec per HBA, for an upper bound of 9.6GB/sec over 8 HBAs. However, this will not be possible in this system because there are only 6 PCI-E slots that are x8 or wider. The 9.6GB/sec figure would also imply 100MB/sec per disk and 2.4GB/sec per MSA2324fc. The Microsoft Fast Track Data Warehouse Reference Guide has found that 100MB/sec per disk is possible with 2-disk RAID 1 LUNs. An HP report suggests that an earlier model of the MSA with 4Gbps FC connections has a sustained bandwidth of 1.3GB/s. If the bandwidth is limited at the PCI-E x4 slots, then each HBA will be able to sustain 800MB/sec, or 6.4GB/sec total.
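The Fibre Channel numbers can be checked the same way. The sketch below simply restates the text's figures (600MB/sec per port, 96 disks, 4 MSA2324fc enclosures, 800MB/sec per HBA in the x4-limited case); none of the values are measurements, and the names are illustrative.

```python
# Inputs quoted in the text and the configuration table.
HBAS = 8
PORTS_PER_HBA = 2
PORT_MB = 600            # "600MB/sec+" per 8Gbps FC port
DISKS = 96
ENCLOSURES = 4           # MSA2324fc units
X4_LIMITED_HBA_MB = 800  # per-HBA rate if limited by a PCI-E x4 slot

per_hba_mb = PORTS_PER_HBA * PORT_MB          # MB/s per dual-port HBA
upper_bound_mb = HBAS * per_hba_mb            # MB/s over all HBAs
per_disk_mb = upper_bound_mb / DISKS          # implied MB/s per disk
per_enclosure_mb = upper_bound_mb / ENCLOSURES
x4_limited_total_mb = HBAS * X4_LIMITED_HBA_MB

print(f"per HBA:        {per_hba_mb / 1000:.1f} GB/s")
print(f"upper bound:    {upper_bound_mb / 1000:.1f} GB/s")
print(f"per disk:       {per_disk_mb:.0f} MB/s")
print(f"per MSA2324fc:  {per_enclosure_mb / 1000:.1f} GB/s")
print(f"x4-limited:     {x4_limited_total_mb / 1000:.1f} GB/s")
```

Note that the 6.4GB/sec case applies the x4 limit to all 8 HBAs, as the text does; a system with a mix of x8 and x4 slots would land somewhere between the two totals.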

The Sybase file layout shows 16 LUNs per filegroup. The strategy is that there is 1 file for each FC port, as the IO to a specific LUN must travel through 1 path at any given time(?). There may be fail-over paths. This also implies 8 disks per LUN, meaning that high sequential throughput can be achieved in LUNs with more than the 2 disks employed in the MS FTDW reference.

In any case, both storage systems should have sequential IO bandwidth (close enough) to support the consumption by the database engines. The significant difference is that the SQL Server system, with 240 disks, will be able to support roughly 2.5X the random IO of the 96-disk Sybase system.

The chart below shows the TPC-H 1000GB Power query run times for the Sybase system relative to the SQL Server system. There is wide variation in the individual queries, ranging from SQL Server much faster on some to Sybase much faster on others. The geometric mean shows a 13% advantage for Sybase. The queries that Sybase runs much faster are 6, 11, 12, 17, 18, 19, and 21.
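To illustrate how a geometric mean summarizes per-query ratios that vary widely, here is a minimal sketch. The ratios in `relative_times` are hypothetical placeholders, not values from the published reports, and the helper function is written only for this example.

```python
import math


def geometric_mean(ratios):
    """Geometric mean of positive ratios, computed in log space."""
    if not ratios:
        raise ValueError("need at least one ratio")
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))


# HYPOTHETICAL run-time ratios (Sybase time / SQL Server time) per query.
# Values below 1.0 mean Sybase was faster on that query.
relative_times = {"Q1": 1.40, "Q6": 0.35, "Q9": 1.10, "Q18": 0.50, "Q21": 0.60}

overall = geometric_mean(list(relative_times.values()))
print(f"geometric mean of ratios: {overall:.2f}")
```

Because the geometric mean works on ratios, a query that is 2X slower and another that is 2X faster cancel out exactly, which is why a modest overall difference can hide very large per-query differences.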

TPC-H Power query run times, Sybase relative to SQL Server, both on DL785 48-core

Superdome2 versus Superdome

The HP Superdome2 16-way quad-core 1.73GHz result represents an improvement over a 2009 result for the Superdome 32-way dual-core 1.6GHz. Both ran HP-UX 11i v3, the earlier result on Oracle 11g and the newer on Oracle 11g R2. We can probably attribute 5% of the performance gain to the 8% increase in processor frequency. The rest of the performance gain should probably be attributed to the improved system architecture of the Superdome2 and sx3000 chipset (along with the Itanium 9350 integrated memory controller) over the original Superdome and sx2000 chipset. The gain is not large, as the Superdome already had a good system architecture.

As the expectation is that doubling the number of processors should lead to an approximately 1.6X performance gain, we can see that the six-core Opteron 8439 is in the same neighborhood as the quad-core Itanium 9350. The individual Opteron processor is probably a little better than the Itanium at the socket level in the TPC-H Power test, but the Itanium has the advantage in throughput-oriented usage.

The chart below shows the TPC-H power query run times for the 16-way Itanium relative to the 8-way Opteron.

TPC-H Power query run times, 16-way quad-core Itanium relative to 8-way 6-core Opteron

TPC-H Power query run times, Itanium 16-way quad-core relative to 32-way dual-core

As expected, there is wide variation in the individual queries. There are differences in almost every important area: the processor and system architecture, the operating system and the database engine. It is not just the difference in the database engine, but also the execution plans.

Below is an Oracle RAC result with 64 nodes, 128 Xeon 5450 processors and 512 cores total, versus the 16-way Itanium 9350 quad-core.

TPC-H Power Oracle RAC 512-core Xeon 5450 versus 64-core Itanium 9350

Overall, the 512-core Oracle RAC is 5.6 times higher on TPC-H Power than the 64-core Itanium. At the core level, the Core2 architecture is a more powerful processor than the Itanium, so the aggregate compute capability should be more than 8 times greater in the RAC system. Some TPC-H queries naturally scale well in a distributed system, and others do not. The results range from 1.6X to 13.8X scaling, in line with expectations on the low and high end.
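One way to read these numbers is as scaling relative to the raw core count. The sketch below uses only the figures quoted above and treats cores as equal, which understates the RAC system's compute advantage since the text notes the Core2 core is the stronger one. The variable names are illustrative.

```python
# Core counts from the text: 128 quad-core Xeon 5450 vs 16 quad-core Itanium 9350.
rac_cores = 128 * 4
itanium_cores = 16 * 4
core_ratio = rac_cores / itanium_cores

# Overall TPC-H Power ratio and the per-query range quoted in the text.
overall_gain = 5.6
query_gain_low, query_gain_high = 1.6, 13.8

# Gain divided by core ratio: 1.0 would be linear scaling by core count.
overall_efficiency = overall_gain / core_ratio
low_efficiency = query_gain_low / core_ratio
high_efficiency = query_gain_high / core_ratio

print(f"core ratio: {core_ratio:.0f}x")
print(f"overall gain per unit of core ratio: {overall_efficiency:.2f}")
print(f"per-query range: {low_efficiency:.2f} to {high_efficiency:.2f}")
```

Values above 1.0 at the high end are plausible here because the comparison spans different processor architectures and database configurations, not just core counts.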