
System Architecture and Configuration

System Architecture has been split into multiple sections:
  Historical, AMD Opteron, Intel QPI, Dell PowerEdge, HP ProLiant, IBM x Series,
  and NUMA (in progress).
Additional related topics:
  High Call Volume SQL on NUMA, Systems Architecture 2009, NEC Express5800/A1080a (2010-06),
  Big Iron Revival (2009-05), Big Iron Revival II (2009-09), Big Iron Revival III (2010-09).

Update 2011-10-05

I know this section needs to be updated. In the meantime, see The Register article on Sandy Bridge servers and the Real World Technologies article Sandy Bridge for Servers.

Update 2010-11-03

The Inquirer has an update on Sandy Bridge. The socket options are LGA1155 for desktop and entry server, with 16 or 20 PCI-E lanes. This line will have 4 cores and 8M last level cache (LLC), frequency up to 3.4GHz with GPU and 3.5GHz with GPU disabled.

The Xeon 5700(?) line will be LGA 1356, 6 and 8 cores, up to 20MB LLC, 3 memory channels, 24 PCI-E lanes, and a single QPI v2 at 8GT/s.

There will be an 8-core model in LGA2011, 150W TDP, 40 PCI-E lanes, 4 memory channels, and dual QPI.

Other sources: AnandTech, Real World Technologies

Before jumping into system configuration, it is useful first to understand system architecture, including how the current system architecture came to be. A discussion of system architecture inevitably involves a comparison of the Intel Xeon versus Opteron performance characteristics, so this is briefly covered. It is important to distinguish performance characteristics at the processor and system levels where possible.

Performance History and Characteristics

One cannot discuss comparative system architectures without bringing up performance. To discuss the relative merits of each system architecture, it is important to distinguish, as much as practical, between performance at the processor core level and at the complete system level. Performance at the individual core level is still relevant because some important functions are not yet multi-threaded. The two traits of interest at the system level are scalability and overall performance. Scalability is characterized by performance or throughput versus the number of processors (cores and sockets). The important metrics are single-core performance and overall system performance. A system architecture with excellent scalability wins points for system architecture. But if the underlying processor performance is weak, it may not be competitive at the system level against another system that scales less well but starts with a more powerful processor core or socket.

Next, performance cannot be characterized with a single metric, especially considering that each of the processor architectures has completely different characteristics. On any given manufacturing process, the transistors are (or were down to 90nm) more or less comparable between Intel, AMD, IBM or other foundries. An important point is that the frequency on one processor architecture has no relation to frequency on a different processor architecture. (Processor frequency is like engine RPM. The important metric is power. For an engine, power is RPM x torque. For a processor, performance is frequency x IPC, or instructions per cycle.)
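The engine analogy can be made concrete with a small sketch. The IPC figures and the names Core and relative_performance below are hypothetical, chosen only to illustrate that a lower-clocked core can outperform a higher-clocked one. They are not measurements of any real processor.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Core:
    name: str
    frequency_ghz: float  # clock rate, analogous to engine RPM
    ipc: float            # instructions per cycle, analogous to torque (hypothetical values)


def relative_performance(core: Core) -> float:
    """Billions of instructions per second: frequency x IPC."""
    return core.frequency_ghz * core.ipc


# Hypothetical cores, not real products.
high_clock = Core("high-clock", frequency_ghz=3.6, ipc=1.0)
high_ipc = Core("high-IPC", frequency_ghz=2.4, ipc=2.0)

for core in (high_clock, high_ipc):
    print(f"{core.name}: {relative_performance(core):.1f} G instructions/s")
```

In this made-up case the 2.4GHz core comes out ahead of the 3.6GHz core, which is why frequency alone says little when comparing different architectures.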

See the benchmark link for more information: Performance Benchmarks

Processor

A 4-way system like the Dell R900 has several processor options. Currently the Dell R900 processor options range from the Quad-Core Xeon E7310 1.6GHz, 2x2M L2 at $11,423, to the QC Xeon X7350 2.93GHz, 2x4M L2 at $17,523, and the six-core X7460 2.67GHz, 16M L3 at $20,423. All comparisons are done with 128GB (32x4GB) memory. The Dell R905 is $9,469 with the 2.4GHz Opteron 8378 and $13,119 with the 2.7GHz Opteron 8384. It does seem that the cost of the processor options, in the bare system with processor and memory sockets populated, is reasonably proportional to processor frequency (or expected performance). However, considering the complete cost of the system with storage and software licensing, especially if per-processor licensing applies (and not even including operating costs), the cost difference between the low and high end is not worth considering.

Memory

As discussed earlier, the system architecture has multiple memory channels. Depending on the system, uneven memory configurations can be supported, but best performance is achieved with identical memory modules populated in banks of 4 or 8, matched with the number of memory channels. In past years, it might have been worthwhile to discuss how to analyze memory capacity requirements.

Today, a 4GB ECC DIMM costs around $115, and an 8GB ECC DIMM costs around $700 (Crucial, 5/20/2009). The typical 4-way system, including the Dell R900, has 32 DIMM sockets. It really is not worth the effort to determine if a system needs less than 128GB. By default, the system should just be purchased upfront with 32x4GB DIMMs. If there is sound technical analysis to justify 256GB memory, then this would add about $19K to the system cost.
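The $19K figure follows from the DIMM prices quoted above. The sketch below redoes that arithmetic. The function name memory_config is only illustrative, and the prices are the retail DIMM prices from the text, not vendor-configured system prices.

```python
DIMM_SOCKETS = 32    # typical 4-way system such as the Dell R900
PRICE_4GB_USD = 115  # 4GB ECC DIMM (Crucial, 5/20/2009)
PRICE_8GB_USD = 700  # 8GB ECC DIMM (Crucial, 5/20/2009)


def memory_config(dimm_gb: int, dimm_price_usd: int, sockets: int = DIMM_SOCKETS):
    """Return (total capacity in GB, total DIMM cost in USD) with every socket populated."""
    return dimm_gb * sockets, dimm_price_usd * sockets


cap_128, cost_128 = memory_config(4, PRICE_4GB_USD)
cap_256, cost_256 = memory_config(8, PRICE_8GB_USD)

print(f"{cap_128}GB: ${cost_128:,}")
print(f"{cap_256}GB: ${cost_256:,}")
print(f"Added cost for 256GB: ${cost_256 - cost_128:,}")
```

At these prices, 32x4GB comes to $3,680 and 32x8GB to $22,400, a difference of $18,720.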

The long term overall memory price trend has been about a 50% reduction every 3 years, with the largest capacity module having a disproportionately higher price. By 2010-11, the 8GB ECC DIMM should drop to below $300, and a new 16GB ECC DIMM will have the disproportionately higher price of $1000-2000.
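As a rough illustration of the long term trend, the sketch below models a steady 50% reduction every 3 years as a simple exponential decay. The function name projected_price and the smooth-decay model are assumptions made for this example. Real prices move in steps and, as noted, the largest module does not follow the overall trend.

```python
def projected_price(price_now_usd: float, years_ahead: float,
                    halving_years: float = 3.0) -> float:
    """Project a price assuming a steady 50% drop every `halving_years` years."""
    return price_now_usd * 0.5 ** (years_ahead / halving_years)


# Starting point: the $115 4GB ECC DIMM (Crucial, 5/20/2009).
for years in (1, 2, 3):
    print(f"4GB ECC DIMM after {years} year(s): ${projected_price(115, years):.2f}")
```

The same trend applied to the $700 8GB DIMM would not reach $300 within two years. The steeper drop expected above comes from the premium on the largest module shrinking once a larger module takes its place.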

An interesting point to consider is this. The HP ProLiant DL785G5 is an 8-way system with 64 DIMM sockets. The DL785G5 with 8 x 3.1GHz Quad-Core Opteron 8393 processors and 256GB memory (64x4GB) is priced at $62,628. By comparison, the ProLiant DL585G5 with 4 of the same 3.1GHz QC Opteron processors costs $23,001 with 128GB memory (32x4GB) (HP website, 5/20/2009). Unfortunately, the HP website does not list the 8GB DIMM option for the DL585G5. The Dell PowerEdge R905 with 4 x 2.7GHz is priced at $13,119 with 128GB (32x4GB) and $31,388 with 256GB (presumably 32x8GB). The implication is that, if there is a large disproportion in price per GB, one should consider a larger system with more DIMM sockets as an alternative to populating the original system with the largest capacity memory module. In particular, the Opteron system architecture ties the number of memory channels and DIMM sockets to the number of processors. So it is not possible to configure the ProLiant DL785G5 with 4 processors and 64 DIMM sockets. There is a hard tie of 8 DIMM sockets per processor socket.
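To make the price-per-GB disproportion concrete, the sketch below compares two ways of reaching 256GB on Opteron systems, using the 8 DIMM sockets per processor rule and the Crucial DIMM prices quoted earlier. It counts DIMM cost only. The function name memory_option is illustrative, and the comparison ignores the higher price of the 8-way chassis and the extra processors.

```python
DIMM_SOCKETS_PER_PROCESSOR = 8     # hard tie in the Opteron system architecture
DIMM_PRICE_USD = {4: 115, 8: 700}  # module size in GB -> price (Crucial, 5/20/2009)


def memory_option(processors: int, dimm_gb: int):
    """Return (capacity in GB, DIMM cost in USD, cost per GB) with every socket populated."""
    sockets = processors * DIMM_SOCKETS_PER_PROCESSOR
    capacity_gb = sockets * dimm_gb
    cost_usd = sockets * DIMM_PRICE_USD[dimm_gb]
    return capacity_gb, cost_usd, cost_usd / capacity_gb


four_way = memory_option(processors=4, dimm_gb=8)   # 32 sockets x 8GB
eight_way = memory_option(processors=8, dimm_gb=4)  # 64 sockets x 4GB

for label, (gb, usd, per_gb) in (("4-way, 8GB DIMMs", four_way),
                                 ("8-way, 4GB DIMMs", eight_way)):
    print(f"{label}: {gb}GB for ${usd:,} (${per_gb:.2f}/GB)")
```

Both routes give 256GB, but the DIMMs cost $22,400 at $87.50 per GB in the 4-way system against $7,360 at $28.75 per GB in the 8-way system. Whether the larger system wins overall depends on the full system prices and on any per-processor licensing.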

IBM Power

IBM Power system architecture is outside the scope here, but the IBM POWER Systems Overview by Blaise Barney, Lawrence Livermore National Laboratory, is a good source.