
TPC-H SF3000

TPC-H 2011 Mar 23

New Oracle/Sun 3TB result on M9000 with 64 SPARC64 VII+, 256 cores total.

TPC-H SF 3000 Results for Xeon, POWER6 and SPARC systems.

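| System | Processor | Total Cores | Mem (GB) | Disks | TPC-H Power | TPC-H Throughput | Composite QphH | Streams |
|---|---|---|---|---|---|---|---|---|
| DL980 G7 | Xeon 7560 | 64 | 512 | 500 | 185,297.7 | 142,685.6 | 162,601.7 | 8 |
| Power 595 | 32 POWER6 | 64 | 512 | 288 | 142,790.7 | 171,607.4 | 156,537.3 | 9 |
| M9000 | 32 SPARC | 128 | 512 | 256 | 182,350.7 | 216,967.7 | 198,907.5 | 64 |
| M9000 | 64 SPARC | 256 | 1024 | 512 | 316,835.8 | 471,428.6 | 386,478.3 | 128 |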

TPC-H SF 3000 individual query execution times

8-way Xeon 7560 vs 16-way Xeon 7460 at 3TB

Below are the TPC-H 3000GB results for the 8-way ProLiant DL980 G7 with the Xeon 7560 processor and the 16-way ES7000 with the Xeon 7460. The 32-way dual-core IBM 5GHz Power6 result is also shown.


TPC-H SF 3000 Results for Xeon 7460 & 7560 and IBM POWER6 (Sybase)

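| System | Processor | Total Cores | Mem (GB) | Disks | TPC-H Power | TPC-H Throughput | Composite QphH | Streams |
|---|---|---|---|---|---|---|---|---|
| Unisys | Xeon 7460 | 96 | 1024 | 900 | 120,254.8 | 87,841.4 | 102,778.2 | 8 |
| DL980 G7 | Xeon 7560 | 64 | 512 | 500 | 185,297.7 | 142,685.6 | 162,601.7 | 8 |
| Power 595 | 32 POWER6 | 64 | 512 | 288 | 142,790.7 | 171,607.4 | 156,537.3 | 9 |
| M9000 | 32 SPARC | 128 | 512 | 256 | 182,350.7 | 216,967.7 | 198,907.5 | 64 |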

The Unisys 16-way Xeon 7460 system had 8 FC HBAs with 60 disks each, 12 HBAs with 30 disks each, and 4 HBAs with one FC port to 15 data/temp disks (one of which also had 15 disks for log on the second port). The total is 900 disks for data and tempdb files. The 8 HBAs connected to 60 disks each operated at 8Gbps to a switch, which connected to the disk enclosures at 4Gbps. The remaining HBA ...

The HP DL980 G7 system had 660 disks, of which 500 disks connected via 10 controllers held the data and tempdb files, 144 disks on 3 controllers were strictly for backup and flat files (i.e., they have no impact on performance), and 16 disks on one controller were for logs, plus the OS boot disks.
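The disk counts quoted above can be checked with a few lines of arithmetic. This is only a sketch of the bookkeeping; the variable names are my own, and the groupings are taken directly from the description in the text rather than from the full disclosure reports.

```python
# Unisys ES7000: (number of HBAs, data/temp disks per HBA), as described in the text.
unisys_data_groups = [(8, 60), (12, 30), (4, 15)]
unisys_data_disks = sum(hbas * disks for hbas, disks in unisys_data_groups)

# HP DL980 G7: disks by role (OS boot disks are not part of the 660 figure).
hp_groups = {"data_tempdb": 500, "backup_flat_file": 144, "log": 16}
hp_total_disks = sum(hp_groups.values())

print(unisys_data_disks)  # 900 data/tempdb disks
print(hp_total_disks)     # 660 disks in total
```

Both totals agree with the figures in the text: 900 data and tempdb disks for the Unisys system, and 660 disks for the HP system, of which only 500 affect query performance.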

The Unisys system may have been over-configured in disks and memory. Many of the TPC-H queries involve large table (or range) scans. If the entire database cannot be brought into memory, then there may not be much difference in the disk IO generated with either 512GB or 1TB of memory. More importantly, the Windows operating system and SQL Server versions match, so there is high confidence we are seeing mostly the difference between the two processor (and system) architectures.

The IBM system may appear to be under-configured in terms of the number of disk drives. But it does seem that Sybase is better than SQL Server at switching from pseudo-random to sequential scan IO operations, and can work fine with fewer disks.

While the Xeon 7400 series processor core was top of the line in its time, even the 4-way Xeon 7400 system had limited memory bandwidth (and channels). Scaling beyond 4-way was not a simple matter. Of course, the Xeon 7400 systems were still competitive with systems based on processors with better scalability, but weaker single core performance.

Based on the 16-way Xeon 7460 result, and assuming that doubling the number of processors increases performance by 1.6X, the expectation is that an 8-way Xeon 7460 would be in the range of 75,000 on TPC-H Power. In turn, there is sufficient reason to estimate that the Xeon 7560 is about 2.5X more powerful than the Xeon 7460 for data warehouse usage. This is less than the 2.77X observed in OLTP, which is in line with expectations because OLTP derives substantial benefits from Hyper-Threading (30%?) and data warehousing derives only a modest benefit from HT (10%?).
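The estimate above can be reproduced as a rough sketch. It assumes the 75,000 figure refers to the TPC-H Power metric (the arithmetic only works out for Power, not for the composite), and it takes the 1.6X scaling factor per doubling of processors as given; the variable names are illustrative.

```python
power_16way_7460 = 120_254.8  # TPC-H Power, Unisys 16-way Xeon 7460 (from the table)
power_8way_7560 = 185_297.7   # TPC-H Power, HP DL980 G7 8-way Xeon 7560 (from the table)
scaling_per_doubling = 1.6    # assumed gain from doubling the processor count

# Work backwards from the 16-way result to a hypothetical 8-way Xeon 7460.
est_power_8way_7460 = power_16way_7460 / scaling_per_doubling

# Compare the real 8-way 7560 against that hypothetical 8-way 7460.
ratio_7560_vs_7460 = power_8way_7560 / est_power_8way_7460

print(f"Estimated 8-way Xeon 7460 Power: {est_power_8way_7460:,.0f}")
print(f"Xeon 7560 vs Xeon 7460 at 8-way: {ratio_7560_vs_7460:.2f}X")
```

The estimate lands a little above 75,000 and the ratio a little under 2.5X, so the rounded figures in the text are consistent with the published results under this scaling assumption.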

The chart below shows the TPC-H power query run times for the 8-way Xeon 7560 relative to the 16-way Xeon 7460. In the single stream test, the geometric mean is 50% higher, so the average query run time for the 8-way 7560 should be two-thirds that of the 16-way 7460.
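As a quick check on the "50% higher, two-thirds the run time" statement, the sketch below treats TPC-H Power as inversely proportional to the geometric mean of the run times, so that the ratio of Power scores approximates the inverse ratio of typical query times. Note that the official Power metric also includes the two refresh functions, so this is an approximation.

```python
power_8way_7560 = 185_297.7   # TPC-H Power, 8-way Xeon 7560
power_16way_7460 = 120_254.8  # TPC-H Power, 16-way Xeon 7460

# Ratio of Power scores (higher is better).
speedup = power_8way_7560 / power_16way_7460

# Power is inversely related to the geometric mean run time,
# so the relative run time is the reciprocal of the speedup.
relative_run_time = 1.0 / speedup

print(f"Power ratio: {speedup:.2f}")
print(f"Relative geometric-mean run time: {relative_run_time:.2f}")
```

The Power ratio comes out slightly above 1.5 and the relative run time slightly below two-thirds, which matches the rounded figures used in the text.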

TPC-H Power query run times, 8-way Xeon 7560 relative to 16-way 7460

Below is the speedup from the 16-way Xeon 7460 to the 8-way Xeon 7560, overall 50% higher. This is the inverse of the chart above.

TPC-H Power query run times, 16-way 7460 relative to 8-way Xeon 7560

Query 22 is 5.6X faster.

As with the earlier comparison, there is also wide variation in the individual queries. Many queries are 40% faster, two are about the same, two are actually slower, and one is more than 5X faster.

8-way 8-core Xeon 7560 versus 32-way dual-core IBM POWER6

The chart below shows the TPC-H power query run times for the 32-way IBM p595 with 64 cores relative to the 8-way Xeon 7560, also with 64 cores.

TPC-H Power query run times, 64-core POWER6 relative to 64-core Xeon

The 64-core Xeon 7560 has 30% better TPC-H Power than the 64-core POWER6. The POWER6 in turn has 20% better TPC-H Throughput than the Xeon. Again, there is also wide variation in the individual queries. In queries 18 and 19, where Sybase is faster, the SQL Server execution plan shows key lookups at SF100. It would be helpful if HP could provide execution plans at SF3000. We should not draw too many conclusions when comparing completely different system architectures and completely different database engines. But I think this is a good hint for Microsoft to re-evaluate the execution plan cost formulas.

from sybase section

SQL Server 2008 R2 and Sybase IQ ASE 15.1

When comparing very different database engines, it is desirable to compare them on the same hardware platform, as done above. A comparison involving reasonably comparable systems might also be acceptable. The TPC-H 3000GB report for the 8-way ProLiant DL980 G7 with 8-core Xeon 7560 running SQL Server 2008 R2 can be compared with the 32-way dual-core IBM POWER6 running Sybase 15.1 with some caution to not draw unsupportable conclusions.

Obviously the Intel Xeon 7560 and IBM POWER6 are completely different processor architectures and support completely different system architectures. At the processor core level, the individual Intel Xeon 7560 core has better performance than the IBM POWER6, even though the 7560 runs at 2.26GHz and the POWER6 in the p595 runs at 5.0GHz. This is evident in the SPEC CPU 2006 Integer (base) benchmark (see below). Still, this is a valid comparison at the 64-core system level because the IBM POWER6 was designed with massive (CPU to memory and node to node) IO bandwidth, both necessary for scaling up.

The table shows the TPC-H 3000GB results for the 8-way 8-core Xeon 7560 system running SQL Server 2008 R2 and the 32-way dual-core POWER6 system running Sybase 15.1.

The 64-core Xeon is 30% higher on the TPC-H Power metric while the 64-core POWER6 is 20% higher on the TPC-H Throughput metric. The overall composite queries per hour (normalized per GB) is close enough that these two systems could be considered comparable for data warehouse type query performance.
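The percentages above, and how they roll up into the composite, can be sketched as follows. The TPC-H composite (QphH) is defined as the geometric mean of the Power and Throughput metrics; the dictionary layout and the helper name `composite_qphh` are my own, and the figures are the published values from the results table.

```python
import math

# Published TPC-H SF3000 metrics from the results table.
xeon = {"power": 185_297.7, "throughput": 142_685.6, "composite": 162_601.7}
power6 = {"power": 142_790.7, "throughput": 171_607.4, "composite": 156_537.3}

def composite_qphh(power: float, throughput: float) -> float:
    """TPC-H composite: geometric mean of the Power and Throughput metrics."""
    return math.sqrt(power * throughput)

xeon_power_advantage = xeon["power"] / power6["power"] - 1.0
power6_throughput_advantage = power6["throughput"] / xeon["throughput"] - 1.0
composite_gap = xeon["composite"] / power6["composite"] - 1.0

print(f"Xeon Power advantage:        {xeon_power_advantage:.1%}")
print(f"POWER6 Throughput advantage: {power6_throughput_advantage:.1%}")
print(f"Xeon composite advantage:    {composite_gap:.1%}")
print(f"Recomputed Xeon composite:   {composite_qphh(xeon['power'], xeon['throughput']):,.1f}")
```

Because the composite is a geometric mean, a roughly 30% lead on one metric and a roughly 20% deficit on the other largely cancel, leaving the two systems within a few percent of each other.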

The system configuration details are below:

The HP TPC-H report lists the ProLiant DL980 G7 as being configured with 660 disk drives. Only 500 disks are for data (and tempdb) files. There are 144 disks used only for backups and flat files (as required?) and 16 disks are for the log file. The data disks are connected to the 10 LSI SAS 9200 controllers, with 50 disks per controller and 25 disks per D2700 enclosure. The 144 backup disks are connected to the 3 P411 controllers, with 48 disks per P411 and 24 disks per D2700 enclosure. Another 16 disks are connected by 8 Gbps FC to an MSA2324fc SAN for logs.

The LSI SAS 9200 is actually a simple SAS controller, not a RAID controller. So this system was configured with 500 data files for each file group, one file per disk. My understanding was that this system was used for SSD testing, which did not function correctly with RAID controllers. Otherwise, would a system with this many disk drives not normally have used RAID controllers?

HP did demonstrate this system at HP Technology Forum, showing a SQL Server table scan driving 26GB/sec sustained. This works out to 2.6GB/s per controller, in line with the LSI specification of 2.8GB/s and the x8 PCI-E gen 2 slot limit.
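The per-controller figure follows directly from the numbers above. The sketch below also computes the nominal x8 PCI-E gen 2 bandwidth (5 GT/s per lane with 8b/10b encoding), which is a theoretical ceiling before protocol overhead; the realizable slot limit is lower, and the variable names are illustrative only.

```python
total_scan_gb_s = 26.0  # sustained table scan rate demonstrated by HP
controllers = 10        # LSI SAS 9200 controllers holding the data disks
lsi_spec_gb_s = 2.8     # LSI specification quoted in the text

per_controller_gb_s = total_scan_gb_s / controllers
fraction_of_lsi_spec = per_controller_gb_s / lsi_spec_gb_s

# Nominal PCI-E gen 2 x8 bandwidth per direction, before protocol overhead.
lanes = 8
gigatransfers_per_lane = 5.0  # GT/s per lane for PCI-E gen 2
encoding_efficiency = 8 / 10  # 8b/10b line encoding
pcie2_x8_nominal_gb_s = lanes * gigatransfers_per_lane * encoding_efficiency / 8  # bits to bytes

print(f"Per controller: {per_controller_gb_s:.1f} GB/s")
print(f"Fraction of LSI spec: {fraction_of_lsi_spec:.0%}")
print(f"Nominal x8 PCI-E gen 2: {pcie2_x8_nominal_gb_s:.1f} GB/s")
```

At 2.6GB/s each, the controllers are running at a little over 90% of the quoted LSI figure, which suggests the scan rate was limited by the controllers and slots rather than by the disks.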

The IBM system may appear to be under-configured in terms of the number of disk drives. But it does seem that other database engines are better than SQL Server at switching from pseudo-random to sequential scan IO operations, and can work fine with fewer disks.

The chart below shows the TPC-H power query run times for the 32-way IBM p595 with 64 cores relative to the 8-way Xeon 7560, also with 64 cores.

The 64 core Xeon 7560 has 30% better TPC-H Power than the 64 core POWER6. The POWER6 in turn has 20% better TPC-H Throughput than the Xeon. Again, there is also wide variation in the individual queries. All seven queries in this example where SQL Server is slower than Sybase are also much slower in the DL 785 example previously discussed.

TPC-H Power query run times
