NEC Express5800/A1080a

NEC announced their 8-way server with Intel's Xeon 7500 launch on 30 March 2010, along with a published TPC-E performance benchmark result. The listed availability date was 24 June 2010, so the 5800/A1080a should now be available, although I have not yet ordered a system for my own personal use.

Producing a respectable benchmark result for a high-end system on a new system architecture is a major effort, because a great deal of back-and-forth across the entire software stack is necessary to resolve issues in previously unexplored territory.

Having the benchmark result ready to publish on Intel's processor launch date is important because it leverages the extensive publicity of a major Intel product launch. This is especially so considering that the NEC Express was the only 8-way TPC benchmark result at the time (the other TPC result was an IBM 4-way; there were also Fujitsu 8-way SAP SD 2-Tier results).

In the past, I have frequently cited system examples with Dell or HP servers, as these are the systems with which I have frequent direct experience. IBM is also a major vendor in the US server market, but as an independent consultant, I only rarely encounter IBM servers. Not having extensive experience with IBM servers, I do not cite them as examples unless there are unique features. (IBM does publish Redbooks with a great deal of technical detail, which can be very helpful.)

When Intel announced the Xeon 7500, along with the IBM 4-way and NEC 8-way TPC-E results, I did not discuss the architecture of either system, but did comment on the performance relative to the previous generation Xeon 7400 architecture and the lower end concurrent generation Xeon 5500 architecture (see Intel Xeon 5600 and 7500). I also commented that I thought Intel deliberately held up publication of TPC results for the Xeon 5600 on 16 March in order to not detract from the then upcoming Xeon 7500 announcement.

Last week, I was at HP Technology Forum, where HP announced the new ProLiant G7 series along with TPC-C, E and H results, including an 8-way TPC-H for the Xeon 7500 (new HP ProLiant servers). I also mentioned that there were new Dell and Fujitsu TPC-E results, but did not discuss them because there were already published results for 2-way Xeon 5600 and 4-way Xeon 7500.

Apparently NEC America Product Marketing saw that blog post and was miffed that I did not mention their 8-way TPC-E report of 30 March (NEC Express 5800/A1080a TPC-E report).

Since I have frequently discussed high-end NUMA systems (Big-Iron Revival and NUMA Systems), I suggested doing a couple of papers: one on scaling SQL Server on big-iron for transaction processing, the other on storage performance for data warehousing. They have promised to provide access to their systems, so hopefully I can get these out in the near future.

NEC Express5800/A1080a TPC-E Benchmark Report

In the meantime, I will discuss some details of the NEC 8-way Xeon 7500 TPC-E report. The table below compares the NEC 8-way Xeon 7500 TPC-E configuration with the 16-way Xeon 7400 and 4-way Xeon 7500 systems.

| System | Unisys ES7000 | NEC 5800/A1080 | IBM x3850 X5 |
|---|---|---|---|
| Processor | Xeon 7460 | Xeon 7560 | Xeon 7560 |
| Sockets-Cores | 16 x 6 = 96 | 8 x 8 = 64 | 4 x 8 = 32 |
| Storage IO | 15x2 FC | 7x2 FC | 6x2 SAS |
| Storage | 870+12 HDD | 1872+20 HDD | 1008 HDD |
| Windows Server | 2008 R2 DC | 2008 R2 DC | 2008 R2 EE |
| SQL Server | 2008 R2 DC | 2008 R2 DC | 2008 R2 EE |

The 8-way Xeon 7560 is 56% better on TPC-E than the 16-way Xeon 7460 (2.7X better than an 8-way 7460, not listed) and 55% better than a 4-way 7560. The Xeon 7500 performance relative to the previous generation Xeon 7400 is in line with information disclosed at Intel Developer Forum 2009. The scaling is below the expectation set by Microsoft for Windows Server 2008 R2. A recent Fujitsu report achieved a slightly better 4-way Xeon 7560 result with 512GB memory and SSD storage.

In previous generations, 1.5X scaling for each doubling of processor cores at this level was the best that could be achieved. Last year, Microsoft disclosed that a number of improvements in Windows Server 2008 R2, particularly the elimination of critical locks, allow for substantially improved high-end scaling. Scaling of 1.7X was reported, presumably from measurements on an Itanium system at 64 and 128 cores (128 and 256 logical processors). Still, much work is required to actually achieve the maximum possible scaling, especially on a brand new system architecture. So the expectation is that the 8-way Xeon 7500 will gradually see slightly better scaling relative to the 4-way.
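To put the 1.5X and 1.7X figures side by side with the 55% gain of the 8-way over the 4-way, here is a minimal sketch. It assumes the gain per doubling of cores stays constant across doublings, which is a simplification and not something the reports claim. The function names are hypothetical, chosen only for this illustration.

```python
import math


def doublings(base_cores: int, target_cores: int) -> float:
    """Number of core-count doublings between two configurations."""
    return math.log2(target_cores / base_cores)


def projected_speedup(base_cores: int, target_cores: int,
                      gain_per_doubling: float) -> float:
    """Speedup if every doubling of cores yields the same gain (assumed model)."""
    return gain_per_doubling ** doublings(base_cores, target_cores)


def implied_gain_per_doubling(base_cores: int, target_cores: int,
                              observed_speedup: float) -> float:
    """Invert the model: per-doubling gain implied by an observed speedup."""
    return observed_speedup ** (1.0 / doublings(base_cores, target_cores))


if __name__ == "__main__":
    # 32 cores is the 4-way Xeon 7560, 64 cores the 8-way.
    for gain in (1.5, 1.7):
        for target in (64, 128, 256):
            s = projected_speedup(32, target, gain)
            print(f"{gain}X per doubling, 32 -> {target} cores: {s:.2f}X")
    # The 8-way is 55% better than the 4-way, i.e. a 1.55X speedup.
    print(f"implied gain: {implied_gain_per_doubling(32, 64, 1.55):.2f}X")
```

Under this model the 8-way result sits just above the older 1.5X norm and below the 1.7X figure, which is consistent with the expectation that there is still room for tuning.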

Below is a diagram of the NEC Express5800/A1080a configuration used in their 8-way TPC-E report.

NEC Express Overview

I am not familiar with the NEC D3-10 storage system. It appears to consist of an external controller in an enclosure for 12 3.5in disks, and can be daisy-chained to additional 12-disk enclosures. The controller front-end interface is 4Gbps FC (perhaps an 8Gbps FC front-end will also be available). The back-end is SAS. I think this is the correct choice for storage systems. There really is no value in incurring the expense of running FC to the disks.

Of the 14 FC ports on the 7 dual-port FC adapters in the A1080a, 13 connect to storage for the data files. The last FC port connects to storage for the log file.

| Storage Architecture | NEC |
|---|---|
| HBA | 7 dual-port 4Gbps FC |
| FC ports (data) | 13 |
| RAID controllers per port | 6 (78 total) |
| Addn'l disk enclosures | 78 total |
| Disks per RAID controller | 24 (1872 total) |
| Disks per FC port | 6x24 = 144 |
| LUNs per RAID controller | 2 (12 disks per LUN) |
| LUNs | 2x78 = 156 |
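The totals in the storage table follow from four per-unit counts, and a short sketch can check them against each other. The class and field names below are hypothetical. The one-additional-enclosure-per-controller figure is my inference from the 78 enclosure total and the 12-disk controller enclosure, not something stated directly in the report.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataStorageLayout:
    """Per-unit counts for the data-file storage (log storage excluded)."""
    data_fc_ports: int = 13
    controllers_per_port: int = 6
    disks_per_enclosure: int = 12
    # Assumption: one controller enclosure plus one additional enclosure each.
    enclosures_per_controller: int = 2
    luns_per_controller: int = 2

    @property
    def controllers(self) -> int:
        return self.data_fc_ports * self.controllers_per_port

    @property
    def additional_enclosures(self) -> int:
        return self.controllers * (self.enclosures_per_controller - 1)

    @property
    def disks_per_controller(self) -> int:
        return self.disks_per_enclosure * self.enclosures_per_controller

    @property
    def disks(self) -> int:
        return self.controllers * self.disks_per_controller

    @property
    def disks_per_port(self) -> int:
        return self.controllers_per_port * self.disks_per_controller

    @property
    def luns(self) -> int:
        return self.controllers * self.luns_per_controller

    @property
    def disks_per_lun(self) -> int:
        return self.disks_per_controller // self.luns_per_controller


if __name__ == "__main__":
    layout = DataStorageLayout()
    print("RAID controllers:", layout.controllers)
    print("additional enclosures:", layout.additional_enclosures)
    print("data disks:", layout.disks)
    print("disks per FC port:", layout.disks_per_port)
    print("LUNs:", layout.luns, "with", layout.disks_per_lun, "disks each")
```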

Assuming that this is a cost-optimized design, we can estimate that the IO load on the data disks is close to 175 IOPS per 15K disk. With 144 disks per FC port, the load is then in the range of 25,000 IOPS per FC port and 50,000 IOPS per dual-port HBA. For the 8KB SQL Server page size, this is about 200MB/sec per 4Gbps FC port and 400MB/sec per HBA, so each port is well under the 330MB/sec (practical) capability of 4Gbps FC.
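The arithmetic behind these estimates is simple enough to sketch. The 175 IOPS per disk is the estimate above, not a measured value, and the sketch treats 1 MB as 1000 KB, which is an assumption of convenience. The constant and function names are hypothetical.

```python
DISKS_PER_FC_PORT = 144       # 6 RAID controllers x 24 disks
IOPS_PER_DISK = 175           # estimated load per 15K disk (assumption)
PORTS_PER_HBA = 2             # dual-port FC adapters
PAGE_KB = 8                   # SQL Server page size
FC_4G_PRACTICAL_MB_S = 330    # practical per-port figure cited in the text


def port_iops(disks: int = DISKS_PER_FC_PORT,
              iops_per_disk: int = IOPS_PER_DISK) -> int:
    """Aggregate IOPS arriving at one FC port."""
    return disks * iops_per_disk


def bandwidth_mb_s(iops: int, io_size_kb: int = PAGE_KB) -> float:
    """Bandwidth in MB/sec for a given IOPS at a fixed IO size (1 MB = 1000 KB)."""
    return iops * io_size_kb / 1000


if __name__ == "__main__":
    per_port = port_iops()
    per_hba = per_port * PORTS_PER_HBA
    port_bw = bandwidth_mb_s(per_port)
    print(f"IOPS per FC port: {per_port:,}")
    print(f"IOPS per HBA:     {per_hba:,}")
    print(f"MB/sec per port:  {port_bw:.1f}")
    print(f"MB/sec per HBA:   {bandwidth_mb_s(per_hba):.1f}")
    print(f"port utilization: {port_bw / FC_4G_PRACTICAL_MB_S:.0%}")
```

Changing `IOPS_PER_DISK` shows how sensitive the per-port bandwidth is to the per-disk estimate, which is the least certain input here.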

Another point concerns the number of data files. The total number of disks was determined by the IO load, which for this benchmark result required 1800+ disks. The disks were distributed across multiple controllers so as to not create too much IO load on any given HBA or RAID controller (in terms of both IOPS and MB/s). The number of LUNs created from the 1872 disks was really determined by what is a practical number of disks per LUN, and 12 disks per LUN is practical.

Many people like to cite, without any supporting data, a rule for the ratio of data files to processor cores. One should really ask if all of this came about because long, long ago there was a benchmark configuration with 4 cores and 4 data files. The reason for the number of files had to do with the storage system configuration. In this world, there are far too many people with limited intelligence who must have rules to live by, without any interest in the reason behind the rule. It is as if blind adherence to the rule without any understanding will absolve them of the consequences of their actions. I do not believe there was ever any substance to the cores-file ratio rule. It was all some numbnut who misinterpreted two uncorrelated facts in a reference system and inferred that this must be a rule. Other people of comparable intelligence then perpetuated this new rule.

The rear view shows the 7 dual-port FC adapters.

5800/A1080a rear view

I did not find an architecture diagram of the NEC Express5800/A1080a. My understanding is that it is the standard glueless arrangement shown below, possibly with three IOH, not four.

5800/A1080a rear view

There were a number of tuning techniques mentioned in the NEC 5800/A1080a TPC-E report, very similar to the tuning methods employed in other TPC-C and TPC-E reports. Some are absolutely critical, some are for achieving the last 1-2%, and others are merely convenient for the choice of storage. All of this should be discussed in a detailed report.

Link for NEC America 5800/A1080a

Intel slidedeck for Xeon 7500 processor launch.