
Benchmark Omissions for the Six-Core Intel Xeon and AMD Opteron Processors

To date, no 4-way or 8-way TPC-H data warehouse benchmark result has been published for the six-core Xeon X7460, and no TPC-C or TPC-E OLTP benchmark result has been published for the six-core Opteron. Usually, the absence of published results means the results are not competitive in one manner or another.

TPC-C, TPC-E and TPC-H results were published for the previous generation quad-core Intel Xeon X7350. TPC-C and TPC-E results were published for the follow-on six-core Xeon X7460, but there are no 4-way or 8-way TPC-H results. Unisys did publish a 10TB TPC-H result for their 16-way ES7000 with the Xeon X7460, but there is no simple way to compare this with 4-way or 8-way results at the 100GB or 300GB scale factors.

There are very impressive 4-way quad-core Opteron 8384 2.7GHz TPC-C and TPC-E results of 579,814 tpm-C and 635.43 tpsE respectively, but not for the six-core Opteron. For TPC-H, there is a series of 8-way results at scale factor 300GB for the quad-core and six-core Opteron processors, though curiously no 4-way results for the Opteron after dual-core 8220.

One suspected reason for the lack of 4-way or 8-way TPC-H results on the Intel Xeon X7460 is that it cannot achieve meaningful performance gains over the quad-core Xeon X7350. The large 16M L3 cache on the X7460 helps on high call-volume (>10,000 RPC/sec) benchmarks like TPC-C and TPC-E, but not in the high row-count TPC-H queries with parallel execution plans, where the lower 2.66GHz frequency is also a liability.

In the dual-core era, 4-way Intel Xeon (with the Pentium 4 based NetBurst core) and AMD Opteron systems were very close on TPC-H. It might be that the 4-way quad-core Opteron processors were not competitive with the Xeon 7300 series (Core2 architecture cores), so no TPC-H results were published. In TPC-C and TPC-E, the quad-core Opteron was very competitive with, and even significantly better than, the Xeon X7350, which lacks a large shared cache. The Opteron architecture does not need as large a cache as the Xeon, but does benefit from the larger 6M L3 cache in Shanghai compared with the 2M L3 cache in Barcelona.

TPC-H results were published for quad-core and six-core Opteron in the HP ProLiant DL785 8-way systems at scale factor 300GB. Previously, IBM had published an 8-way SF 300 TPC-H result for the Xeon X7350. The first 8-way Opteron quad-core had a better result, so it is possible that the Xeon bus architecture could not scale to 8-way for DW type workloads.

The 8-way six-core Opteron 8439 2.8GHz shows a very significant TPC-H performance gain over the quad-core Opteron 8384 2.7GHz (91,558.2 QphH@300GB versus 57,684.7). This is more than would be suggested by the 50% increase in the number of cores plus the nominal frequency increase, all the more so because scaling is normally less than linear. Presumably, this should be attributed to micro-architecture improvements between Shanghai and Istanbul.
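As a rough check on that claim, the arithmetic can be sketched as follows. The QphH figures, core counts and clock frequencies are the ones quoted above; the idea of an "ideal" gain as the product of the core-count ratio and the frequency ratio is a simplifying assumption made here for illustration, not a TPC definition.

```python
# TPC-H SF 300GB results quoted above for the 8-way HP ProLiant DL785.
QPHH_QUAD_CORE = 57_684.7   # quad-core Opteron, 2.7GHz
QPHH_SIX_CORE = 91_558.2    # six-core Opteron, 2.8GHz


def ideal_gain(cores_old, cores_new, ghz_old, ghz_new):
    """Assumed upper bound: perfectly linear scaling with cores and clock."""
    return (cores_new / cores_old) * (ghz_new / ghz_old)


observed = QPHH_SIX_CORE / QPHH_QUAD_CORE
ideal = ideal_gain(4, 6, 2.7, 2.8)

print(f"observed gain: {observed:.3f}x")
print(f"ideal linear gain: {ideal:.3f}x")
print(f"observed exceeds ideal: {observed > ideal}")
```

The observed ratio comes out a little above the ideal linear ratio, which is what makes the result stand out: something other than cores and clock must account for the difference.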

The most significant improvement cited by AMD is HT-assist, which is essentially a snoop filter for maintaining cache coherency in the HyperTransport architecture. Ever since the introduction of the Opteron with its integrated memory controller and HT, AMD has crowed about how memory and inter-processor bandwidth scale with the number of processors, unlike the Intel architecture, where memory and inter-processor bandwidth were bottlenecked by the front-side bus.

Well, AMD neglected to mention that their scalable bandwidth was also offset by increased inter-processor communication to maintain cache coherency (see the article by De Gelas on AnandTech). So now that AMD has the snoop filter capability and a very good TPC-H result for the Opteron with HT-assist, why are there no published TPC-C or TPC-E OLTP benchmark results?

Note that Intel had difficulty with the snoop filter in the 5000P/X chipset. The snoop filter improved some benchmarks and caused degradation in others. So it would be no surprise if it takes one or two generations to work out the issues. The expectation is that AMD will need to work out these issues if Magny-Cours is expected to compete with Nehalem-EX systems.

For Intel, the lack of competitive Xeon 7400 series DW benchmark results will be a moot point once the next-generation Nehalem-EX systems become available.

Anyway, these are my suspicions. System vendors are welcome to refute any of my opinions by publishing results. I was surprised by the 2-way Xeon 5500 Nehalem TPC-H results.

Intel Xeon 5600 and 7500

Intel launched the Xeon 5600 series (Westmere-EP, 32nm) six-core processors on 16 March 2010 without any TPC benchmark results. In the performance world, no results almost always means bad or not-good results. Yet there is every reason to believe that the Xeon 5600 series with six cores (X models only) will perform exactly as expected for a 50% increase in the number of cores at the same frequency as the 5500, with no system-level bottlenecks. The expectation is that a six-core Xeon 5600 should provide a 30%+ improvement over the comparable quad-core Xeon 5500 in throughput-oriented tests, particularly OLTP type workloads. Single-stream parallel execution plans will probably show less gain, as scaling via parallelism is not a simple matter.

Then two weeks later, on 30 March 2010, Intel launched the Xeon 7500 series 8-core processors for 4-way+ systems (and the Xeon 6500 for high-end 2-way systems) with TPC-E results on 4-way and 8-way systems but no TPC-H results. The TPC-E results were what Intel said they were going to be last September at IDF: 2.5X over the previous generation Xeon 7400 series and 2.5X over the contemporary 2-way Xeon 5500 series.

My guess is that Intel wanted it to be clear that the 4-way Xeon 7500 achieved the stated performance objective of 2.5X over the 2-way Xeon 5500, just in case some slide decks did not mention which 2-way system the 2.5X claim referred to. Of course, the Intel statement of 2.5X for Xeon 7500 was most probably based on performance measurements already run on prototype systems. It was probably also felt that the Xeon 5600 series is such a natural choice to supersede the 5500 series that TPC benchmarks were not essential, as there were sufficient other benchmarks to support the claims.

Earlier, I had commented about benchmark omissions from the quad-core generation on. Below is a summary of processors and systems for which TPC results are published. In brief, the Intel Core 2 architecture processors were avoiding comparisons against AMD Opteron in TPC-H, except for the 16-way Unisys system, for which there is no comparable Opteron system.

Opteron, on the other hand, avoided comparison with the Core2 architecture in 2-way systems and in TPC-C/E OLTP benchmarks across the board. In the 2-way systems, the old Intel FSB technology was still adequate, and the powerful Core2 architecture core was enough to beat a 2-way Opteron. There were respectable 4-way TPC-C and TPC-E results for Shanghai. When AMD announced the HT-Assist feature in Istanbul, one might have thought AMD was finally going to be able to compete in 4-way OLTP. But no such benchmarks have been published to date.

| Processor architecture, process | TPC | 2-way | 4-way | 8-way | 16-way |
|---|---|---|---|---|---|
| Core2 65nm (Xeon 5300 QC, 7300 QC) | TPC-C | 251,300 | 407,079 | 841,809 | - |
| | TPC-E | 5160 only | 479.51 | 804.0 | 1,250.0 |
| | TPC-H | 17,686@100 | 34,990@100 | 46,034@300 | - |
| Barcelona 65nm (QC) | TPC-C | - | 471,883 | - | - |
| | TPC-E | - | - | - | - |
| | TPC-H | - | - | 52,860@300 | - |
| Core2 45nm (Xeon 5400 QC, 7400 SC) | TPC-C | 275,149 | 634,825 | Linux DB2 | - |
| | TPC-E | 317.45 | 729.65 | 1,165.56 | 2,012.8 (R2) |
| | TPC-H | - | - | - | 102,778@3T |
| Shanghai 45nm (QC) | TPC-C | - | 579,814 | - | - |
| | TPC-E | - | 635.4 | - | - |
| | TPC-H | - | - | 57,685@300 | - |
| Istanbul 45nm (SC) | TPC-C | - | - | - | - |
| | TPC-E | - | - | - | - |
| | TPC-H | - | - | 91,558@300* | - |
| Nehalem 45nm (Xeon 5500 QC, 7500 8C) | TPC-C | 661,475† | - | - | - |
| | TPC-E | 850.0 | 2,022.64 | 3,141.76 | - |
| | TPC-H | 51,086@100 | - | - | - |
| Westmere 32nm (Xeon 5600 SC, 7600 ?C) | TPC-C | - | future | future | future |
| | TPC-E | - | future | future | future |
| | TPC-H | - | future | future | future |
| Magny-Cours 45nm (12C) | TPC-C | future | future | future | future |
| | TPC-E | future | future | future | future |
| | TPC-H | future | future | future | future |

* An SF 1TB result was also reported
† Xeon W5580 3.2GHz, versus X5570 2.93GHz
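
For readers who want to check the 2.5X TPC-E claim against the figures in this summary, a minimal sketch follows. The tpsE values are copied from the summary above; the dictionary layout and the `ratio` helper are illustrative choices made here, not anything from the TPC reports.

```python
# TPC-E results (tpsE) taken from the summary of published results above.
TPSE = {
    ("Xeon 5500", "2-way"): 850.0,
    ("Xeon 7400", "4-way"): 729.65,
    ("Xeon 7400", "8-way"): 1165.56,
    ("Xeon 7500", "4-way"): 2022.64,
    ("Xeon 7500", "8-way"): 3141.76,
}


def ratio(new, old):
    """Throughput ratio of one published result over another."""
    return TPSE[new] / TPSE[old]


vs_prev_gen_4way = ratio(("Xeon 7500", "4-way"), ("Xeon 7400", "4-way"))
vs_prev_gen_8way = ratio(("Xeon 7500", "8-way"), ("Xeon 7400", "8-way"))
vs_2way_5500 = ratio(("Xeon 7500", "4-way"), ("Xeon 5500", "2-way"))

print(f"4-way 7500 vs 4-way 7400: {vs_prev_gen_4way:.2f}x")
print(f"8-way 7500 vs 8-way 7400: {vs_prev_gen_8way:.2f}x")
print(f"4-way 7500 vs 2-way 5500: {vs_2way_5500:.2f}x")
```

The ratios land on either side of 2.5: somewhat higher against the Xeon 7400 and somewhat lower against the 2-way Xeon 5500. Keep in mind that the published results come from different vendors and configurations, so these are approximate comparisons only.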

When the 2-way Intel Xeon 5500 processor, based on the Nehalem architecture, came out in early 2009, outstanding results were published for both the OLTP-oriented TPC-E and the DW/DSS-oriented TPC-H. In February 2010, a TPC-C result was published as well, even though Microsoft had previously said all new OLTP benchmarks were going to be TPC-E.

There was every expectation that the Xeon 7500 Nehalem-EX would have both OLTP and DW/DSS benchmark results, as the Xeon 7500 should produce world-class (and world-record) results in both. It is possible that performance problems were encountered in trying to achieve good scaling over 32 cores and 64 threads in a 4-way Xeon 7500 system. If this is identified as something that can be fixed in the Windows operating system or the SQL Server engine, then a change request would be made. I seriously doubt that another processor stepping would be done for this, as the Xeon 7500 is already D-step at release.

It is also quite possible Intel will have to face the fact that 2.5X the 2-way Xeon 5500 TPC-H SF100 result of 51,000 QphH is not going to be achieved no matter how good the Xeon 7500 is at DW. This is because TPC-H uses a geometric mean of the 22 queries, and the small queries in TPC-H are unlikely to get any faster with more cores, especially at lower frequency. Achieving 2.5X in the big queries is a more meaningful goal.
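
The effect of the geometric mean can be illustrated with a small sketch. All query times below are hypothetical, chosen only to show the shape of the problem; the split into 11 small and 11 big queries is likewise an assumption. The sketch also simplifies the actual TPC-H metric, which includes refresh functions and a throughput component.

```python
import math


def geometric_mean(values):
    """Geometric mean computed via logs to avoid overflow on long lists."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Hypothetical elapsed times in seconds for 22 queries on the baseline system:
# 11 small queries at 1s each and 11 big queries at 20s each.
baseline = [1.0] * 11 + [20.0] * 11

# Assumption: more cores make the big queries 2.5x faster,
# while the small queries do not speed up at all.
more_cores = [1.0] * 11 + [20.0 / 2.5] * 11

geo_speedup = geometric_mean(baseline) / geometric_mean(more_cores)
total_time_speedup = sum(baseline) / sum(more_cores)

print(f"speedup by geometric mean: {geo_speedup:.2f}x")
print(f"speedup by total elapsed time: {total_time_speedup:.2f}x")
```

Under these assumptions the geometric mean improves by only the square root of 2.5, while total elapsed time improves by much more, because the geometric mean gives a 1-second query the same weight as a 20-second query.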

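| Model | Cores | Threads | GHz | L3 | QPI GT/s | Memory | Price* |
|---|---|---|---|---|---|---|---|
| X5680 | 6 | 12 | 3.33 | 12M | 6.4 | 1333 | $1,663 |
| X5670 | 6 | 12 | 2.93 | 12M | 6.4 | 1333 | $1,440 |
| X5660 | 6 | 12 | 2.80 | 12M | 6.4 | 1333 | $1,219 |
| X5650 | 6 | 12 | 2.66 | 12M | 6.4 | 1333 | $996 |
| E5640 | 4 | 8 | 3.33 | 12M | 5.86 | 1066 | $774 |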

* Intel 1k pricing