Intel launched the Xeon 5600 series (Westmere-EP, 32nm) six-core processors on 16 March 2010 without any TPC benchmark results. In the performance world, no results almost always means bad, or at least not good, results. Yet there is every reason to believe that the six-core Xeon 5600 series (X models only) will perform exactly as expected for a 50% increase in the number of cores at the same frequency as the 5500, with no system-level bottlenecks. The expectation is that a six-core Xeon 5600 should provide a 30%+ improvement over the comparable quad-core Xeon 5500 in throughput-oriented tests, particularly OLTP-type workloads. Single-stream parallel execution plans will probably show less gain, as scaling via parallelism is not a simple matter.
Then two weeks later, on 30 March 2010, Intel launched the Xeon 7500 series 8-core processors for 4-way+ systems (and the Xeon 6500 for high-end 2-way systems) with TPC-E results on 4-way and 8-way systems, but no TPC-H results. The TPC-E results were exactly what Intel had said they would be last September at IDF: 2.5X over the previous-generation Xeon 7400 series and 2.5X over the contemporary 2-way Xeon 5500 series.
My guess is that Intel wanted to make it clear that the 4-way Xeon 7500 achieved the stated performance objective of 2.5X over the 2-way Xeon 5500, just in case some slide decks did not mention which 2-way system the 2.5X claim referred to. Of course, the Intel statement of 2.5X for Xeon 7500 was most probably based on performance measurements already run on prototype systems. It was probably also felt that the Xeon 5600 series is such a natural choice to supersede the 5500 series that TPC benchmarks were not essential, as there were sufficient other benchmarks to support the claims.
Earlier, I had commented about benchmark omissions from the quad-core generation on.
Below is a summary of processors and systems for which TPC results are published.
(The Intel Xeon 7500 Processor Product Brief shows 3.03X relative to the 7400 for the OLTP Brokerage Database workload, which is TPC-E, but 2,022 over 729 is 2.77X. See Performance Benchmarks for updates.)
| Processor (architecture, process) | TPC | 2-way | 4-way | 8-way | 16-way |
|---|---|---|---|---|---|
| Core2 65nm: Xeon 5300 QC, 7300 QC | TPC-C | 251,300 | 407,079 | 841,809 | - |
| | TPC-E | 5160 only | 479.51 | 804.0 | 1,250.0 |
| | TPC-H | 17,686@100GB | 34,990@100GB | 46,034@300GB | - |
| Barcelona 65nm QC | TPC-C | - | 471,883 | - | - |
| | TPC-E | - | - | - | - |
| | TPC-H | - | - | 52,860@300GB | - |
| Core2 45nm: Xeon 5400 QC, 7400 6C | TPC-C | 275,149 | 634,825 | Linux DB2 | - |
| | TPC-E | 317.45 | 729.65 | 1,165.56 | 2,012.8 (R2) |
| | TPC-H | - | - | - | 102,778@3TB |
| Shanghai 45nm QC | TPC-C | - | 579,814 | - | - |
| | TPC-E | - | 635.4 | - | - |
| | TPC-H | - | - | 57,685@300GB | - |
| Istanbul 45nm 6C | TPC-C | - | - | - | - |
| | TPC-E | - | - | - | - |
| | TPC-H | - | - | 91,558@300GB* | - |
| Nehalem 45nm: Xeon 5500 QC, 7500 8C | TPC-C | 661,475† | - | - | - |
| | TPC-E | 850.0 | 2,022.64 | 3,141.76 | - |
| | TPC-H | 51,086@100GB | - | 162,601@3TB | - |
| Westmere 32nm: Xeon 5600 6C, 7600 ?C | TPC-C | 803,068 | future | future | future |
| | TPC-E | 1,110 | future | future | future |
| | TPC-H | - | future | future | future |
| Magny-Cours 45nm 12C | TPC-C | 705,652 | 1,193,472 | n/a | n/a |
| | TPC-E | 887.4 | 1,464 | n/a | n/a |
| | TPC-H | - | 107,561@300GB | n/a | n/a |
TPC-C in tpmC, TPC-E in tpsE, TPC-H in QphH@database size.
* an SF 1TB result was also reported
† Xeon W5580 3.2GHz, versus X5570 2.93GHz
n/a: Magny-Cours will not support systems with more than 4 sockets
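The generation-over-generation ratios quoted in the text can be recomputed from the TPC-E column of the table. The Python sketch below does only that division; the dictionary labels are informal descriptions of table rows chosen for illustration, not official system names.

```python
# TPC-E throughput (tpsE) copied from the summary table above.
TPC_E = {
    "Xeon 7400 4-way": 729.65,
    "Xeon 5500 2-way": 850.0,
    "Xeon 7500 4-way": 2022.64,
}


def ratio(new: str, old: str) -> float:
    """How many times higher the `new` TPC-E result is than the `old` one."""
    return TPC_E[new] / TPC_E[old]


if __name__ == "__main__":
    for old in ("Xeon 7400 4-way", "Xeon 5500 2-way"):
        print(f"Xeon 7500 4-way vs {old}: {ratio('Xeon 7500 4-way', old):.2f}X")
```

Run as-is, it reports about 2.77X against the 4-way Xeon 7400, and about 2.38X against the 2-way Xeon 5500 TPC-E result listed in the table.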
In brief, Intel Core 2 architecture processors avoided comparisons against AMD Opteron in TPC-H, except for the 16-way Unisys system, for which there is no comparable Opteron system.
Opteron, on the other hand, avoided comparison with the Core 2 architecture in 2-way systems, and in TPC-C/E OLTP benchmarks across the board. In 2-way systems, Intel's older front-side bus (FSB) technology was still adequate, and the powerful Core 2 architecture core was enough to beat a 2-way Opteron. There were respectable 4-way TPC-C and TPC-E results for Shanghai. When AMD announced the HT Assist feature in Istanbul, one might have thought AMD was finally going to be able to compete in 4-way OLTP. But no such benchmarks have been published as of this writing.
When the 2-way Intel Xeon 5500 processor, based on the Nehalem architecture, came out in early 2009, outstanding results were published for both the OLTP-oriented TPC-E and the DW/DSS-oriented TPC-H. In February 2010, a TPC-C result was published as well, even though Microsoft had previously said all new OLTP benchmarks were going to be TPC-E. This result was with SQL Server 2005 instead of 2008.
There was every expectation with the Xeon 7500 (Nehalem-EX) that there would be both OLTP and DW/DSS benchmark results, as Xeon 7500 should produce world-class (and world-record) results in both. It is possible that performance problems were encountered in trying to achieve good scaling across 32 cores and 64 threads in a 4-way Xeon 7500 system. If this is identified as something that can be fixed in the Windows operating system or SQL Server engine, then a change request would be made. I seriously doubt that another processor stepping would be done for this, as Xeon 7500 is already at D-step at release.
It is also quite possible Intel will have to face the fact that 2.5X the 2-way Xeon 5500 TPC-H SF100 result of 51,000 QphH is not going to be achieved no matter how good Xeon 7500 is at DW. This is because the TPC-H Power metric, a key component of the composite QphH score, is based on the geometric mean of the 22 query times (and the two refresh functions), so every query carries equal weight regardless of how long it runs. There are several small queries in TPC-H, two of which already run in under 1 second on the 2-way 8-core Xeon 5570 at SF100, and several that run in about 2 seconds or less. There is limited opportunity to keep improving the performance of small queries with an increasing degree of parallelism, as the overhead to set up each thread becomes large compared to the actual work done by each thread, especially if one also has to give up frequency, dropping from 2.93 to 2.26GHz. It would be helpful to know what the actual frequency is during a performance run with the Turbo Boost feature.
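To see why a few short queries cap the overall gain, here is a simplified model in Python. It assumes, hypothetically, that 20 of the 22 queries speed up by the full 2.5X while 2 very short queries gain only 1.2X; these speedup figures are illustrative, not measurements. The real Power metric also includes the two refresh functions in its geometric mean, which this sketch ignores.

```python
import math


def geometric_mean(values):
    """Geometric mean, computed through logarithms to avoid huge products."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


def overall_speedup(per_query_speedups):
    """Overall speedup when query times are combined by geometric mean.

    Power is inversely proportional to the geometric mean of the query times,
    so the overall speedup equals the geometric mean of the per-query speedups.
    """
    return geometric_mean(per_query_speedups)


# Hypothetical scenario: 20 large queries scale fully, 2 tiny queries barely move.
speedups = [2.5] * 20 + [1.2] * 2

if __name__ == "__main__":
    print(f"overall speedup: {overall_speedup(speedups):.2f}X")
```

Under these assumptions the overall gain is about 2.34X, short of 2.5X even though 20 of the 22 queries hit the target.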
It is possible that some marketing putz did not understand this and denied permission to publish perfectly good Xeon 7500 TPC-H results because they did not meet the 2.5X goal. (Along with making a negative ranking and review entry for the person responsible for TPC-H benchmarking for failing to achieve the 2.5X goal. But let's not grind axes here. Besides, who said life was fair? It takes exceptional talent to accomplish the impossible. A clever person anticipates impossible problems, and transfers to another group to avoid a sticky wicket.)
Achieving 2.5X in the big queries is a more meaningful goal. Achieving 50% better than the 8-way Opteron 6-core TPC-H SF300 or SF1TB would also be a worthwhile accomplishment, if Xeon 7500 were up to the task.
Finally, a quick comment on Xeon 7500 scaling from 4-way (32 cores, 64 threads) to 8-way (64 cores, 128 threads). In the past, achieving 1.5X scaling with this number of cores would have been a triumph. Given Microsoft's announcement that Windows Server 2008 R2 removes the global dispatcher lock in the thread scheduler, along with other impediments to high-end scaling, we were expecting 1.7X scaling. It could be that scaling beyond 64 threads is tricky because of the limit of 64 logical processors per processor group. Hopefully the 4-way to 8-way to 16-way scaling will improve over time as problems are solved one at a time, while the task master whips his/her draft horses (again, I digress).
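The 4-way to 8-way scaling can be read directly from the Xeon 7500 TPC-E results in the summary table (2,022.64 and 3,141.76 tpsE). The small Python sketch below computes the scaling ratio and the fraction of ideal (2X) scaling; it is plain arithmetic on the published numbers, nothing more.

```python
TPC_E_4WAY = 2022.64  # Xeon 7500, 4 sockets (32 cores / 64 threads)
TPC_E_8WAY = 3141.76  # Xeon 7500, 8 sockets (64 cores / 128 threads)


def scaling(small: float, large: float) -> float:
    """Throughput ratio of the larger system to the smaller one."""
    return large / small


def efficiency(small: float, large: float, size_ratio: float = 2.0) -> float:
    """Fraction of ideal linear scaling achieved when the system grows by size_ratio."""
    return scaling(small, large) / size_ratio


if __name__ == "__main__":
    s = scaling(TPC_E_4WAY, TPC_E_8WAY)
    e = efficiency(TPC_E_4WAY, TPC_E_8WAY)
    print(f"4-way to 8-way: {s:.2f}X ({e:.0%} of linear)")
```

This gives roughly 1.55X (about 78% of linear), between the 1.5X that once would have been a triumph and the hoped-for 1.7X.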
Let's take a look at the Xeon 5600, 7500 and 6500 SKUs. The low-voltage, low-power SKUs are omitted. These are fine products for high-density environments, web servers and utility databases. Line-of-business and DW databases should be on the X models.
Xeon 5600 SKUs
| Model | Cores | Threads | GHz | Turbo GHz | L3 | QPI GT/s | Memory MHz | Power W | Price* |
|---|---|---|---|---|---|---|---|---|---|
| X5680 | 6 | 12 | 3.33 | 3.6 | 12M | 6.4 | 1333 | 130 | $1,663 |
| X5670 | 6 | 12 | 2.93 | 3.33 | 12M | 6.4 | 1333 | 95 | $1,440 |
| X5660 | 6 | 12 | 2.80 | 3.2 | 12M | 6.4 | 1333 | 95 | $1,219 |
| X5650 | 6 | 12 | 2.66 | 3.06 | 12M | 6.4 | 1333 | 95 |  $996 |
| E5640 | 4 | 8 | 2.66 | 2.93 | 12M | 5.86 | 1066 | 80 |  $774 |
| E5630 | 4 | 8 | 2.53 | 2.8 | 12M | 5.86 | 1066 | 80 |  $551 |
| E5620 | 4 | 8 | 2.40 | 2.66 | 12M | 5.86 | 1066 | 80 |  $387 |
| X5677 | 4 | 8 | 3.46 | 3.73 | 12M | 6.4 | 1333 | 130 | $1,693 |
| X5667 | 4 | 8 | 3.06 | 3.46 | 12M | 6.4 | 1333 | 95 | $1,440 |
* Intel 1k pricing
Xeon 7500 SKUs
| Model | Cores | Threads | GHz | Turbo GHz | L3 | QPI GT/s | Memory MHz | Power W | Price* |
|---|---|---|---|---|---|---|---|---|---|
| X7560 | 8 | 16 | 2.26 | 2.66 | 24M | 6.4 | 1066? | 130 | $3,692 |
| X7550 | 8 | 16 | 2.00 | 2.4 | 18M | 6.4 | ? | 130 | $2,729 |
| E7540 | 6 | 12 | 2.00 | 2.26 | 18M | 6.4 | ? | 105 | $1,980 |
| E7530 | 6 | 12 | 1.86 | 2.13 | 18M | 5.86 | ? | 105 | $1,391 |
| E7520 | 4 | 8 | 1.86 | 1.86 | 18M | 4.8 | ? | 95 |  $856 |
| X7542 | 6 | 6 | 2.66 | 2.8 | 18M | 5.86 | ? | 130 | $1,980 |
Xeon 6500 SKUs
| Model | Cores | Threads | GHz | Turbo GHz | L3 | QPI GT/s | Memory MHz | Power W | Price* |
|---|---|---|---|---|---|---|---|---|---|
| X6550 | 8 | 16 | 2.00 | 2.4 | 18M | 6.4 | ? | 130 | $2,461 |
| E6540 | 6 | 12 | 2.00 | 2.26 | 18M | 6.4 | ? | 105 | $1,712 |
| E6510 | 4 | 8 | 1.73 | 1.73 | 12M | 4.8 | ? | 105 |  $744 |
Before commenting, recall the main differences between the Xeon 5600 and Xeon 7500/6500 series. The Xeon 5600 series (32nm process) has 2 QPI links and 3 memory channels. The Xeon 7500 series (45nm process) has 4 QPI links, 4 memory channels, a larger cache per core (for the 24M version, 3M vs 2M), plus extensive reliability features. The 2 QPI links on the 5600 series allow a 2-way (socket) system. The 4 QPI links on the 7500 series allow glueless 4-way and 8-way systems. My understanding is that the 6500 series is the 7500 with only 2 QPI links enabled, for 2-way systems with 16 cores and 8 memory channels total, at lower frequency than the 5600 (12 cores and 6 memory channels total), plus the 7500 RAS features.
Now let's look at system pricing for the 2-way Dell PowerEdge T710 (Xeon 5600), R810 (either 7500 or 6500) and the 4-way R910 (7500). All systems are configured with redundant power supplies (4 power supplies in the 4-way), 2x73GB 15K 2.5in drives and 6Gb/s SAS.
Dell PowerEdge T710 Systems with 2 Xeon 5600 processors
| System | Processor | GHz | Cores | L3 | QPI GT/s | Mem. MHz | Memory config | Price |
|---|---|---|---|---|---|---|---|---|
| T710 | X5680 | 3.33 | 6 | 12M | 6.4 | 1333 | 72GB 18x4G | $9,974 |
| T710 | X5660 | 2.80 | 6 | 12M | 6.4 | 1333 | 72GB 18x4G | $8,634 |
| T710 | X5650 | 2.66 | 6 | 12M | 6.4 | 1333 | 72GB 18x4G | $8,154 |
| T710 | E5640 | 2.66 | 4 | 12M | 5.86 | 1066 | 72GB 18x4G | $7,474 |
| T710 | E5630 | 2.53 | 4 | 12M | 5.86 | 1066 | 72GB 18x4G | $6,934 |
For some reason, Dell does not offer the T710 with the second from top X5670 2.93GHz.
Dell PowerEdge R810 Systems with 2 Xeon 7500 or 6500 processors
| System | Processor | GHz | Cores | L3 | QPI GT/s | Mem. MHz | Memory config | Price |
|---|---|---|---|---|---|---|---|---|
| R810 | X7560 | 2.26 | 8 | 24M | 6.4 | 1066 | 64GB 16x4G | $17,866 |
| R810 | X7542 | 2.66 | 6 | 18M | 5.86 | ? | 64GB 16x4G | $13,366 |
| R810 | X6550 | 2.00 | 8 | 18M | 6.4 | 1066 | 64GB 16x4G | $13,066 |
| R810 | E7540 | 2.00 | 6 | 18M | 6.4 | 1066 | 64GB 16x4G | $12,166 |
| R810 | E6540 | 2.00 | 6 | 18M | 6.4 | 1066 | 64GB 16x4G | $11,496 |
Dell PowerEdge R910 Systems with 2 of 4 sockets populated (Xeon 7500), 64GB
| System | Processor | GHz | Cores | L3 | QPI GT/s | Mem. MHz | Memory config | Price |
|---|---|---|---|---|---|---|---|---|
| R910 | X7560 | 2.26 | 8 | 24M | 6.4 | 1066 | 64GB 16x4G | $19,246 |
| R910 | X7550 | 2.00 | 8 | 18M | 6.4 | 1066 | 64GB 16x4G | $16,446 |
| R910 | E7540 | 2.00 | 6 | 18M | 6.4 | 1066 | 64GB 16x4G | $13,546 |
| R910 | E7530 | 1.86 | 6 | 18M | 5.86 | 980 | 64GB 16x4G | $12,446 |
Dell PowerEdge R910 Systems with 4 Xeon 7500 processors, 128GB
| System | Processor | GHz | Cores | L3 | QPI GT/s | Mem. MHz | Memory config | Price |
|---|---|---|---|---|---|---|---|---|
| R910 | X7560 | 2.26 | 8 | 24M | 6.4 | 1066 | 128GB 32x4G | $34,040 |
| R910 | X7550 | 2.00 | 8 | 18M | 6.4 | 1066 | 128GB 32x4G | $28,440 |
| R910 | E7540 | 2.00 | 6 | 18M | 6.4 | 1066 | 128GB 32x4G | $22,640 |
| R910 | E7530 | 1.86 | 6 | 18M | 5.86 | 980 | 128GB 32x4G | $20,440 |
Previously, I had argued that processors and systems today are so powerful that the standard practice of buying 4-way systems for critical database servers by default should be changed to 2-way. By default, I mean in lieu of a proper system sizing analysis.
It may seem strange that I suggest not doing a proper sizing analysis (one of my services as a consultant). But in the sizing analyses I have seen done by other people, the quality of the work was poor and the effort cost more than a pair of 4-way systems.
What this means is that the practical solution used to be to buy a 4-way system and try it out. If it is not sufficient, then hire someone (there are many people who can do this) to make it work on a 4-way. If that does not work, consider pruning features until it does work.
So why not just move up to an 8-way or larger system? Because 8-way and larger systems are mostly NUMA systems. Technically, all Opteron systems 2-way and up are NUMA. But by NUMA, I really mean systems where there is a large discrepancy between local and remote node memory access. Very few people can actually do performance analysis on a NUMA system (as opposed to those who claim to be able to). Search on "SQL NUMA" to see who has published meaningful material on this matter.
Anyway, the default choice today should be a 2-way system. However, since this is a critical system, perhaps there are features from the high end that we want. I believe this is the rationale for the Xeon 6500 from Intel, and the PowerEdge R810 from Dell.
In looking over the T710, R810 and R910, I am inclined to say the effort was not entirely successful, as is often the case with first iterations. The effort definitely has merit, and is the proper direction for the future, but it needs further refinement. Of course, the true measure is whether people actually buy the R810 in volume, not one person's opinion.
The R810 with either the X7560 or X6550 just gives up too much frequency for the extra 2 cores per socket and the fourth memory channel. Some environments might want the Xeon 7500/6500 RAS features regardless. And there is only a $1,400 price difference between the R810 and the R910 with 2 sockets populated.
The $1,400 is very little to pay for having two extra sockets available, even though most people never populate sockets after the system purchase. It would be nice if one could buy the R910 with all 4 sockets populated, but not have to pay the per-socket software licensing until the extra processors are turned on, as in the RISC world.
True, the R810 is a 2U form factor compared with 4U for the R910, allowing much higher density. But the assumption here is a critical database server, for which an extra 2U is not a show stopper. (There are people who get hung up on the latest industry jargon and fads, and forget that job one is making sure the business is running.)
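The frequency and price trade-off can be put in rough numbers using the Dell prices listed above. The Python sketch below uses aggregate core-GHz (sockets x cores x base GHz) as a deliberately crude proxy for compute capacity; it ignores Hyper-Threading, turbo, cache size, memory channels and the 72GB vs 64GB memory difference, so treat it as a back-of-the-envelope comparison only. The `Config` class is a hypothetical helper for illustration.

```python
from dataclasses import dataclass


@dataclass
class Config:
    """Hypothetical helper describing one Dell configuration from the tables."""
    name: str
    sockets: int
    cores_per_socket: int
    ghz: float
    price: int  # Dell configured price in USD

    @property
    def core_ghz(self) -> float:
        """Crude capacity proxy: total cores times base frequency."""
        return self.sockets * self.cores_per_socket * self.ghz

    @property
    def dollars_per_core_ghz(self) -> float:
        return self.price / self.core_ghz


CONFIGS = [
    Config("T710 2 x X5680", 2, 6, 3.33, 9_974),
    Config("R810 2 x X7560", 2, 8, 2.26, 17_866),
    Config("R810 2 x X6550", 2, 8, 2.00, 13_066),
    Config("R910 2 x X7560", 2, 8, 2.26, 19_246),  # 64GB, 2 of 4 sockets populated
]

if __name__ == "__main__":
    for c in CONFIGS:
        print(f"{c.name}: {c.core_ghz:.2f} core-GHz, "
              f"${c.dollars_per_core_ghz:,.0f} per core-GHz")
    r810, r910 = CONFIGS[1], CONFIGS[3]
    print(f"R910 premium over R810 (2 x X7560): ${r910.price - r810.price:,}")
```

By this crude measure the 2-way Xeon 5600 system costs roughly half as much per core-GHz as the X7560-based configurations, and the R910 premium over the R810 with the same two X7560 processors is $1,380.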
AMD Opteron 6176 (Magny-Cours) 2-way 12-core results have just been published, with the HP ProLiant DL385G7. I will add more detail later. The 2-way TPC-E result is 887.38 and the TPC-C result is 705,652. Interestingly, the TPC-C results for both the HP ProLiant DL370G6 with the Xeon W5580 and the DL385G7 with Opteron are on SQL Server 2005. Perhaps the Microsoft mandate to use TPC-E applies to SQL Server 2008, hence TPC-C on 2005 was allowed? Also of interest is that the Opteron 6176 TPC-C result uses 125 SSDs instead of hard disks (versus 1,300 HDDs in the Xeon W5580 result).
Before comparing the Opteron 12-core with the Xeon 5500, let us first compare against the previous-generation Xeon 5400 quad-core. The 2-way 12-core Opteron 6176 achieved OLTP results higher than the Xeon 5460 by 2.5X on TPC-C and 2.8X on TPC-E. These are very good results for a 3X increase in the number of cores. Compared against the quad-core Xeon 5500 series, however, the 12-core Opteron is just marginally higher. I am inclined to think much of this is due to the Hyper-Threading capability of the Xeon 5500 series. HT was much maligned in the NetBurst architecture generation. Some people today still blindly regurgitate the advice to disable HT, not realizing that this advice applied to the old NetBurst processors and not to the new Nehalem architecture processors. At some point AMD may have to admit that implementing HT will be a necessity.
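The ratios in the previous paragraph, plus a per-core view that bears on the Hyper-Threading argument, can be recomputed from the summary table. In the Python sketch below, the "Xeon 5500" row mixes a TPC-C result on the W5580 (per the table footnote) with the TPC-E result listed in the same table cell; the labels are informal.

```python
from __future__ import annotations

OLTP_2WAY = {
    # label: (total cores, TPC-C tpmC, TPC-E tpsE), from the summary table
    "Xeon 5460 (2 x 4)": (8, 275_149, 317.45),
    "Xeon 5500 (2 x 4)": (8, 661_475, 850.0),  # TPC-C measured on W5580 3.2GHz
    "Opteron 6176 (2 x 12)": (24, 705_652, 887.38),
}


def ratios(new: str, old: str) -> tuple[float, float]:
    """(TPC-C ratio, TPC-E ratio) of the `new` system over the `old` one."""
    _, c_new, e_new = OLTP_2WAY[new]
    _, c_old, e_old = OLTP_2WAY[old]
    return c_new / c_old, e_new / e_old


def tpse_per_core(label: str) -> float:
    """TPC-E throughput divided by physical core count."""
    cores, _, tpse = OLTP_2WAY[label]
    return tpse / cores


if __name__ == "__main__":
    for old in ("Xeon 5460 (2 x 4)", "Xeon 5500 (2 x 4)"):
        c, e = ratios("Opteron 6176 (2 x 12)", old)
        print(f"Opteron 6176 vs {old}: TPC-C {c:.2f}X, TPC-E {e:.2f}X")
    for label in OLTP_2WAY:
        print(f"{label}: {tpse_per_core(label):.1f} tpsE per core")
```

The output shows about 2.56X (TPC-C) and 2.80X (TPC-E) over the Xeon 5460, but only about 7% and 4% over the Xeon 5500; per physical core, the Xeon 5500 delivers nearly three times the TPC-E throughput of the Opteron 6176.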
The price for the DL385G7 with 2 x 6176 processors, from the published TPC report, is $1,511 for the system chassis, $1,799 for each processor, $990 for each 8GB kit, and perhaps another $1K for a configuration comparable to those above. This is very reasonable, except for the memory, which seems high. Each 8GB kit should be around $500.
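A rough total can be assembled from these line items. In the Python sketch below, the choice of eight 8GB kits (64GB, to roughly match the Dell configurations listed earlier) is an assumption, as is treating the extra $1K as a flat allowance.

```python
CHASSIS = 1_511  # DL385 G7 system chassis, USD
CPU = 1_799      # per Opteron 6176, USD
MISC = 1_000     # rough flat allowance for a comparable configuration (estimate)


def estimate(kits: int, price_per_kit: int, sockets: int = 2) -> int:
    """Estimated system price in USD with `kits` 8GB memory kits."""
    return CHASSIS + sockets * CPU + kits * price_per_kit + MISC


if __name__ == "__main__":
    # Assumption: 8 x 8GB = 64GB.
    print("at $990 per kit:", estimate(8, 990))
    print("at $500 per kit:", estimate(8, 500))
```

With 64GB this comes to about $14,000 at the listed memory price, or about $10,100 at $500 per kit, which is close to the top of the T710 price range shown above.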
Magny-Cours consists of two six-core Istanbul-class die, each with 6 x 512KB L2 cache and 6MB L3. The Istanbul die size is 346mm², versus 540mm² for Nehalem-EX with 8 cores and 24MB L3. The images below were adjusted to match the die sizes closely, but there is no assurance that the aspect ratios are correct.
For some reason I thought Nehalem-EX was 540mm², when in fact the Intel website says it is 684mm². The figure below shows the corrected scaling.
HP has just announced the ProLiant DL580 G7 and DL980 G7 servers based on the Xeon 7500 series processors, and the DL585 G7 4-way server with the 12-core AMD Opteron 6100 series (Magny-Cours). Apparently the reason for the delay is that the 8-way DL980 G7 employs custom silicon node controllers (XNC), and possibly so that HP could make a splash by announcing all three systems at its big annual conference, HP Technology Forum. The DL580 and DL585 G7 are available now(?), and the DL980 G7 should be available later in Q3.
While the Intel Xeon 7500 processor allows a glueless 8-way system, HP felt that the design could be improved with node controllers. The node controllers reduce snoop traffic for a majority of memory accesses, and can achieve a 30% reduction in memory latency in some circumstances. It should be considered that HP needed to build custom silicon crossbars (and node controllers) for its Superdome 2 system and the Itanium 9300 series processors, which use the same QuickPath Interconnect (QPI) as the Nehalem processors. There are differences in the way the Itanium and Xeon processors use QPI, and also differences between the node controllers for the Itanium and Xeon systems. (The Itanium node controller implements directory-based cache coherency, while the Xeon node controller implements a snoop filter.)
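Why a snoop filter cuts snoop traffic can be illustrated with a toy model. The Python sketch below is a generic illustration, not HP's XNC design or the QPI protocol: it counts snoop messages for a stream of read misses under broadcast snooping (every miss snoops all other sockets) versus a filter that remembers which sockets may hold each line. Writes, invalidations, ownership and evictions are all ignored, and the access pattern in `example_accesses` is hypothetical.

```python
from collections import defaultdict


def example_accesses(sockets: int = 8, private_lines: int = 100):
    """Hypothetical miss stream: each socket reads its own private lines,
    then every socket reads one shared line."""
    accesses = [(s, ("private", s, i)) for s in range(sockets) for i in range(private_lines)]
    accesses += [(s, "shared") for s in range(sockets)]
    return accesses


def broadcast_snoops(accesses, sockets: int) -> int:
    """Without a filter, every miss snoops all other sockets."""
    return len(accesses) * (sockets - 1)


def filtered_snoops(accesses) -> int:
    """With a snoop filter, a miss snoops only sockets recorded as possibly holding the line.

    Toy model: reads only and no evictions, so the filter never forgets a holder.
    """
    holders = defaultdict(set)  # cache line -> sockets that may hold a copy
    snoops = 0
    for socket, line in accesses:
        snoops += len(holders[line] - {socket})
        holders[line].add(socket)
    return snoops


if __name__ == "__main__":
    acc = example_accesses()
    print("broadcast:", broadcast_snoops(acc, 8))
    print("filtered: ", filtered_snoops(acc))
```

For this mostly-private access pattern on 8 sockets, broadcast snooping generates 5,656 snoops while the filter generates 28, which is the intuition behind "reducing snoop traffic for a majority of memory accesses".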
HP might have built a glueless 8-way Xeon 7500 system if it had not already invested the effort to build the XNC for its Itanium systems. This also means that HP should have the components to build a 16-way Xeon 7500 system, so if there were market demand, such a system could be brought to market. Intel did say that there were 16-way Xeon 7500 system designs, but none have surfaced yet.
Dell has also released a 2-way TPC-E result for the Xeon 5600, and Fujitsu released a 4-way TPC-E result for the Xeon 7500.
A comparison of the TPC-H 300GB results for the 8-way ProLiant DL785 G6 and the 4-way DL585 G7 is interesting, with the 4-way DL585G7 having 18% better performance on the Power metric.
| System | TPC-H Power | TPC-H Throughput | TPC-H Composite QphH |
|---|---|---|---|
| DL785G6 | 109,067.1 | 76,860.0 | 91,558.2 |
| DL585G7 | 129,198.3 | 89,547.7 | 107,561.2 |
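The TPC-H composite metric QphH is defined as the geometric mean of the Power and Throughput metrics, so the composite column can be checked against the other two. The Python sketch below does that check and computes the DL585G7 to DL785G6 ratio for each metric.

```python
import math

# TPC-H SF300 results from the table above: (Power, Throughput, published QphH)
SF300 = {
    "DL785G6": (109_067.1, 76_860.0, 91_558.2),
    "DL585G7": (129_198.3, 89_547.7, 107_561.2),
}


def qphh(power: float, throughput: float) -> float:
    """TPC-H composite: geometric mean of the Power and Throughput metrics."""
    return math.sqrt(power * throughput)


if __name__ == "__main__":
    for name, (power, tput, published) in SF300.items():
        print(f"{name}: computed {qphh(power, tput):,.1f}, published {published:,.1f}")
    old, new = SF300["DL785G6"], SF300["DL585G7"]
    for label, i in (("Power", 0), ("Throughput", 1), ("QphH", 2)):
        print(f"{label}: DL585G7 / DL785G6 = {new[i] / old[i]:.3f}")
```

The computed composites match the published values to the rounding shown, and the DL585G7 advantage works out to about 18.5% (Power), 16.5% (Throughput) and 17.5% (QphH).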
The significant differences between the two systems are summarized below. Both systems have the same total number of cores: the 8-way with 6-core processors and the 4-way with 12-core processors. The DL785G6 cores run at 2.8GHz versus 2.3GHz for the DL585G7, about a 20% difference. The DL585G7 has twice the memory, 512GB versus 256GB. For TPC-H at SF300 with SQL Server 2008 page compression, 256GB is not quite sufficient to hold all of the tables and indexes. With 512GB, there is more than sufficient memory for data, indexes and probably most hash join intermediate results (so tempdb activity is minimal).
| System | DL785G6 | DL585G7 |
|---|---|---|
| Processor | Opteron 8439 | Opteron 6176 |
| Sockets-Cores | 8 x 6 = 48 | 4 x 12 = 48 |
| Frequency | 2.8GHz | 2.3GHz |
| Memory | 256GB | 512GB |
| Storage | 194 HDD | 4 SSD |
| Windows Server | 2008 EE SP1 | 2008 R2 EE |
| SQL Server | 2008 EE SP1 | 2008 R2 EE |
That the DL585G7 employs SSD storage is not expected to impact performance; it was probably used for lower cost. The 194 15K HDDs and 12 storage enclosures in the DL785 cost $110K, while the four 320GB Fusion-io drives in the DL585 cost $55K. If the DL585 had 256GB or less memory, then SSD storage would have moderately better performance than HDD storage. Another significant difference is the set of improvements in Windows Server 2008 R2, several of which have a major impact on scaling to a high number of processor cores.
The chart below shows the TPC-H power query run times for the DL585G7 relative to the DL785G6.
TPC-H Power query run times, DL585G7 relative to DL785G6
Overall, the DL585G7 with 4 Opteron 6176 processors is about 18% higher on the Power metric than the DL785G6 with 8 Opteron 8439 processors. For the individual queries, several are moderately faster, 3 are much faster, 5 are about the same, and 3 are actually significantly slower. The DL785 has faster processors, which should make all queries run faster. It is difficult to account for differences in the system architecture, as there may be differences in how the individual dies are connected. The greater memory on the DL585 is expected to make certain queries run faster. The scaling improvements in R2 (OS and SQL Server) might contribute significant gains in some queries, but may also have negative effects in others.
It would be very helpful to have access to the actual execution plans, along with execution statistics, to determine whether the differences can be attributed to plan differences or to differences in disk IO.
Below are the TPC-H 3000GB results for the 8-way ProLiant DL980 G7 with the Xeon 7560 processor and the 16-way ES7000 with the Xeon 7460. The 32-way dual-core IBM 5GHz Power6 result is also shown.
| System | TPC-H Power | TPC-H Throughput | TPC-H Composite QphH |
|---|---|---|---|
| 16 x Xeon 7460 | 120,254.8 | 87,841.4 | 102,778.2 |
| 8 x Xeon 7560 | 185,297.7 | 142,685.6 | 162,601.7 |
| 32 x Power6 | 142,790.7 | 171,607.4 | 156,537.3 |
Additional details are below:
| System | ES7000 | DL980G7 | Power 595 |
|---|---|---|---|
| Processor | Xeon 7460 | Xeon 7560 | Power6 |
| Sockets-Cores | 16 x 6 = 96 | 8 x 8 = 64 | 32 x 2 = 64 |
| Hyper-Threading | no | yes | SMT, 2/core |
| Frequency | 2.66GHz | 2.26GHz | 5.0GHz |
| Memory | 1024GB | 512GB | 512GB |
| Storage | 914 HDD | 660 HDD | 288 HDD |
| OS | 2008 R2 DC | 2008 R2 EE | AIX 6.1 |
| Database | 2008 R2 DC | 2008 R2 EE | Sybase 15.1 |
The Unisys system may have been over-configured in disks and memory. Many of the TPC-H queries involve large table (or range) scans. If the entire database cannot be brought into memory, then there may not be much difference in the disk IO generated with either 512GB or 1TB of memory. More importantly, the Windows operating system and SQL Server versions match, so there is high confidence we are seeing mostly the difference between the two processor (and system) architectures.
The IBM system may appear to be under-configured in terms of the number of disk drives. But it does seem that other database engines are better at switching from pseudo-random to sequential scan operations, and can work fine with fewer disks.
While the Xeon 7400 series processor core was top of the line in its time, even the 4-way Xeon 7400 system had limited memory bandwidth (and channels). Scaling beyond 4-way was not a simple matter. Of course, the Xeon 7400 systems were still competitive with systems based on processors with better scalability, but weaker single core performance.
Based on the 16-way Xeon 7460 result, the expectation is that an 8-way Xeon 7460 would be in the range of 64,000 QphH (102,778 / 1.6), i.e., doubling the number of processors should increase performance by about 1.6X. In turn, there is sufficient reason to estimate that the Xeon 7560 is about 2.5X more powerful than the Xeon 7460 for data warehouse usage. This is less than the 2.77X observed in OLTP, which is in line with expectations because OLTP derives substantial benefit from Hyper-Threading (30%?) while data warehousing derives only a modest benefit from HT (10%?).
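The estimate above can be worked through explicitly. The Python sketch below applies the text's rule of thumb (doubling sockets yields about 1.6X) to estimate an 8-way Xeon 7460 result from the published 16-way result, then compares it with the 8-way Xeon 7560. The 8-way Xeon 7460 figure is an estimate, not a published result.

```python
QPHH_16WAY_X7460 = 102_778.2  # 16-way Xeon 7460, TPC-H SF3000
QPHH_8WAY_X7560 = 162_601.7   # 8-way Xeon 7560, TPC-H SF3000
DOUBLING_GAIN = 1.6           # rule of thumb from the text: 2x sockets -> ~1.6x performance


def halve_system(qphh: float, gain: float = DOUBLING_GAIN) -> float:
    """Estimate the result of a system with half as many sockets."""
    return qphh / gain


if __name__ == "__main__":
    est = halve_system(QPHH_16WAY_X7460)
    print(f"estimated 8-way Xeon 7460: {est:,.0f} QphH")
    print(f"Xeon 7560 vs 7460, 8-way to 8-way: {QPHH_8WAY_X7560 / est:.2f}X")
```

With the 1.6X rule of thumb, the 8-way Xeon 7460 estimate comes out near 64,000 QphH and the Xeon 7560 advantage near 2.5X.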
The chart below shows the TPC-H power query run times for the 8-way Xeon 7560 relative to the 16-way Xeon 7460.
TPC-H Power query run times, 8-way Xeon 7560 relative to 16-way 7460
As with the earlier comparison, there is also wide variation in the individual queries. Many queries are 40% faster, two are about the same, two are actually slower, and one is more than 5X faster.
Below are the TPC-H 1000GB results for the 8-way ProLiant DL785 G6 with the Opteron 8439 processor and the 16-way Integrity Superdome 2 with the Itanium 9350.
| System | TPC-H Power | TPC-H Throughput | TPC-H Composite QphH |
|---|---|---|---|
| 8 x Opteron 8439 | 95,789.1 | 69,367.6 | 81,514.8 |
| 16 x Itanium 9350 | 139,181.0 | 141,188.3 | 140,181.1 |
Additional details are below:
| System | DL785 G6 | Superdome 2 |
|---|---|---|
| Processor | Opteron 8439 | Itanium 9350 |
| Sockets-Cores | 8 x 6 = 48 | 16 x 4 = 64 |
| Hyper-Threading | no | yes |
| Frequency | 2.8GHz | 1.73GHz |
| Memory | 512GB | 512GB |
| Storage | 240 HDD | 576 HDD |
| OS | 2008 R2 EE | HP-UX |
| Database | 2008 EE | Oracle 11g R2 |
The operating systems and database engines are completely different, so caution is warranted in comparing the results. Also very important is that the execution plans could be very different for certain queries.
As the expectation is that doubling the number of processors should lead to approximately a 1.6X performance gain, we can see that the six-core Opteron 8439 is in the same neighborhood as the quad-core Itanium 9350. The individual Opteron processor is probably a little better than the Itanium at the socket level in the TPC-H Power test, but the Itanium has the advantage in throughput-oriented usage.
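The socket-level comparison can be made concrete with the same 1.6X rule of thumb. The Python sketch below scales the 16-way Itanium results down to an estimated 8-way equivalent and divides by the 8-way Opteron results; the scaled-down Itanium numbers are estimates, not published results.

```python
DOUBLING_GAIN = 1.6  # rule of thumb from the text: 2x sockets -> ~1.6x performance

# TPC-H SF1000 results: metric -> (8-way Opteron 8439, 16-way Itanium 9350)
SF1000 = {
    "Power": (95_789.1, 139_181.0),
    "Throughput": (69_367.6, 141_188.3),
    "QphH": (81_514.8, 140_181.1),
}


def itanium_vs_opteron_8way(metric: str) -> float:
    """Itanium result scaled down to an 8-way estimate, divided by the Opteron 8-way result."""
    opteron, itanium_16way = SF1000[metric]
    return (itanium_16way / DOUBLING_GAIN) / opteron


if __name__ == "__main__":
    for metric in SF1000:
        print(f"{metric}: Itanium (8-way est.) / Opteron = {itanium_vs_opteron_8way(metric):.2f}")
```

Under that rule of thumb, an 8-way Itanium would come in about 9% below the Opteron on Power, about 27% above on Throughput, and about 7% above on the composite.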
The chart below shows the TPC-H power query run times for the 16-way Itanium relative to the 8-way Opteron.
TPC-H Power query run times, 16-way quad-core Itanium relative to 8-way 6-core Opteron
As expected, there is wide variation in the individual queries. There are differences in almost every important area: the processor and system architecture, the operating system and the database engine. It is not just the difference in the database engine, but also in the execution plans.