In many previous articles, I have discussed processor and system architecture, server systems, and performance benchmarks, often mixing more than one topic. Going forward, I will consolidate each topic into separate collections.
Performance Benchmarks
Server Systems
Systems Architecture
Processors (Architecture)
Joe Chang jchang6@yahoo.com
Before jumping into system configuration, it is useful first to understand system architecture, including how the current system architecture came to be. A discussion of system architecture inevitably involves a comparison of the Intel Xeon versus Opteron performance characteristics, so this is briefly covered. It is important to distinguish performance characteristics at the processor and system levels where possible.
This article has been split into three additional sections.
Systems Architecture - Parent
Historical
AMD Opteron
Intel QPI
Dell PowerEdge
HP ProLiant and IBM x Series
NEC Express5800/A1080a (2010-06)
One cannot discuss comparative system architectures without bringing up performance. To discuss the relative merits of each system architecture, it is important to distinguish, as much as practical, between performance at the processor core level and performance at the complete system level. Performance at the individual core level is still relevant because some important functions are not yet multi-threaded. The two traits of interest at the system level are scalability and overall performance. Scalability is characterized by performance or throughput versus the number of processors (cores and sockets). The important metrics are single-core performance and overall system performance. A system architecture with excellent scalability wins points for system architecture. However, if the underlying processor performance is weak, it may not be competitive at the system level against another system with lesser scalability that starts with a more powerful processor core or socket.
Next, performance cannot be characterized with a single metric, especially considering that each of the processor architectures has completely different characteristics. On any given manufacturing process, the transistors are (or were, down to 90nm) more or less comparable between Intel, AMD, IBM and other foundries. An important point is that the frequency of one processor architecture has no relation to the frequency of a different processor architecture. (Processor frequency is like engine RPM. The important metric is power. For an engine, power is RPM x torque. For a processor, performance is frequency x IPC, or instructions per cycle.)
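The trade-off between scalability and per-core performance can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the simple linear scaling model, and every number below are hypothetical and are not measurements of any real system.

```python
# Hypothetical model: each core beyond the first contributes a fixed
# fraction ("efficiency") of one core's throughput. Real systems rarely
# scale this uniformly; all numbers below are invented for illustration.

def scaling_factor(cores, efficiency):
    """Throughput of the full system relative to a single core."""
    return 1 + (cores - 1) * efficiency


def system_throughput(per_core, cores, efficiency):
    """Estimated system throughput in arbitrary units."""
    return per_core * scaling_factor(cores, efficiency)


# System A: weaker core, excellent scalability (hypothetical values).
a = system_throughput(per_core=1.0, cores=16, efficiency=0.9)
# System B: stronger core, lesser scalability (hypothetical values).
b = system_throughput(per_core=1.5, cores=16, efficiency=0.7)

print(f"A: {scaling_factor(16, 0.9):.1f}x scaling, throughput {a:.2f}")
print(f"B: {scaling_factor(16, 0.7):.1f}x scaling, throughput {b:.2f}")
```

With these made-up inputs, system A scales better (roughly 14.5x versus 11.5x over a single core), yet system B still delivers more total throughput because it starts from a stronger core.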
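The frequency x IPC relation can be written out as a small Python sketch. The two processors and their IPC values are hypothetical, chosen only to show that the higher-clocked part is not necessarily the faster one.

```python
def performance(frequency_ghz, ipc):
    """Billions of instructions per second = frequency (GHz) x IPC."""
    return frequency_ghz * ipc


# Hypothetical processors; the IPC values are invented for illustration.
high_clock = performance(frequency_ghz=3.0, ipc=1.0)
high_ipc = performance(frequency_ghz=2.0, ipc=2.0)

print("high clock:", high_clock)
print("high IPC:  ", high_ipc)
```

In this example the 2.0GHz processor outperforms the 3.0GHz processor, which is why frequency alone says little when comparing different architectures.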
See the benchmark link for more information: Performance Benchmarks
A 4-way system like the Dell R900 has several processor options. Currently the Dell R900 options range from the quad-core Xeon E7310 (1.6GHz, 2x2M L2) at $11,423, through the quad-core Xeon X7350 (2.93GHz, 2x4M L2) at $17,523, to the six-core X7460 (2.67GHz, 16M L3) at $20,423. All comparisons are done with 128GB (32x4GB) memory. The Dell R905 is $9,469 with the 2.4GHz Opteron 8378 and $13,119 with the 2.7GHz Opteron 8384. It does seem that the cost of the processor options, in the bare system with processor and memory sockets populated, is reasonably proportional to processor frequency (or expected performance). However, considering the complete cost of the system with storage and software licensing, especially if per-processor licensing applies (and not even including operating costs), the price difference between the low-end and high-end processor options is small enough that it is not worth considering.
As discussed earlier, the system architecture has multiple memory channels. Depending on the system, uneven memory configurations may be supported, but best performance is achieved with identical memory modules populated in banks of 4 or 8, matched to the number of memory channels. In past years, it might have been worthwhile to discuss how to analyze memory capacity requirements.
Today, a 4GB ECC DIMM costs around $115, and an 8GB ECC DIMM costs around $700 (Crucial, 5/20/2009). The typical 4-way system, including the Dell R900, has 32 DIMM sockets. It really is not worth the effort to determine if a system needs less than 128GB. By default, the system should just be purchased upfront with 32x4GB DIMMs. If there is sound technical analysis to justify 256GB memory, then this would add about $19K to the system cost.
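The arithmetic behind the $19K figure can be checked with a short Python sketch. The DIMM prices and socket count are the ones quoted above; the function and variable names are introduced here for illustration only.

```python
DIMM_SOCKETS = 32   # typical 4-way system, per the text
PRICE_4GB = 115     # USD per 4GB ECC DIMM (Crucial, 5/20/2009)
PRICE_8GB = 700     # USD per 8GB ECC DIMM (Crucial, 5/20/2009)


def memory_config(sockets, dimm_gb, dimm_price):
    """Return (total capacity in GB, total memory cost in USD)."""
    return sockets * dimm_gb, sockets * dimm_price


cap_128, cost_128 = memory_config(DIMM_SOCKETS, 4, PRICE_4GB)
cap_256, cost_256 = memory_config(DIMM_SOCKETS, 8, PRICE_8GB)

print(f"{cap_128}GB: ${cost_128:,} (${PRICE_4GB / 4:.2f}/GB)")
print(f"{cap_256}GB: ${cost_256:,} (${PRICE_8GB / 8:.2f}/GB)")
print(f"Extra cost for {cap_256}GB: ${cost_256 - cost_128:,}")
```

Filling all 32 sockets with 4GB DIMMs costs $3,680, while 8GB DIMMs cost $22,400, a difference of $18,720. The 8GB module is also about three times more expensive per GB ($87.50 versus $28.75).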
The long term overall memory price trend has been about a 50% reduction every 3 years, with the largest capacity module having a disproportionately higher price. By 2010-11, the 8GB ECC DIMM should drop to below $300, and a new 16GB ECC DIMM will have the disproportionately higher price of $1000-2000.
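The long-term trend can be expressed as a simple halving formula. The sketch below assumes a smooth exponential decline between the 3-year halving points, which is an assumption made here rather than something stated above, and the function name is hypothetical. It applies only to the overall trend; the largest module carries a premium and does not follow it.

```python
def projected_price(price_today, years_ahead, halving_years=3.0):
    """Project a price forward, assuming it halves every `halving_years`.

    The smooth exponential between halving points is an assumption.
    """
    return price_today * 0.5 ** (years_ahead / halving_years)


# 4GB ECC DIMM at the $115 price quoted for May 2009.
for years in (0, 3, 6):
    print(f"+{years} years: ${projected_price(115, years):.2f}")
```

Under this assumption, the $115 4GB DIMM would cost about $57.50 after three years and about $28.75 after six.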
An interesting point to consider is this. The HP ProLiant DL785G5 is an 8-way system with 64 DIMM sockets. The DL785G5 with 8 x 3.1GHz Quad-Core Opteron 8393 processors and 256GB memory (64x4GB) is priced at $62,628. By comparison, the ProLiant DL585G5 with 4 of the same 3.1GHz QC Opteron processors costs $23,001 with 128GB memory (32x4GB) (HP website, 5/20/2009). Unfortunately, the HP website does not list the 8GB DIMM module option for the DL585G5. The Dell PowerEdge R905 with 4 x 2.7GHz processors is priced at $13,119 with 128GB (32x4GB) and $31,388 with 256GB (presumably 32x8GB). The implication is that one should consider a larger system with more DIMM sockets as an alternative to populating the original system with the largest capacity memory module if there is a large disproportion in price per GB. In particular, the Opteron system architecture ties the number of memory channels and DIMM sockets to the number of processors. So it is not possible to configure the ProLiant DL785G5 with 4 processors and 64 DIMM sockets. There is a hard tie of 8 DIMM sockets per processor socket.
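The hard tie between processor sockets and DIMM sockets can be sketched in Python as follows. The constant of 8 DIMM sockets per processor comes from the text; the function names are hypothetical, and the sketch covers only the Opteron systems discussed here.

```python
DIMMS_PER_PROCESSOR = 8  # hard tie on the Opteron systems discussed above


def dimm_sockets(processors):
    """Usable DIMM sockets for a given number of populated processors."""
    return processors * DIMMS_PER_PROCESSOR


def capacity_gb(processors, dimm_gb):
    """Maximum memory in GB with every usable socket filled."""
    return dimm_sockets(processors) * dimm_gb


# Two routes to 256GB: more sockets with 4GB DIMMs, or fewer with 8GB DIMMs.
print("8-way, 4GB DIMMs:", dimm_sockets(8), "sockets,", capacity_gb(8, 4), "GB")
print("4-way, 8GB DIMMs:", dimm_sockets(4), "sockets,", capacity_gb(4, 8), "GB")
```

Both routes reach 256GB, but the 8-way route does so with the cheaper 4GB module. A 4-processor configuration exposes only 32 sockets, so it cannot reach 64 DIMM sockets regardless of the chassis.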