
Additional related material:
  Memory Latency, NUMA and HT (2016-Dec),  The Case for Single Socket (2016-04)

 Server Hardware 2014 Q2,  2012 Q4,  2011 Q3,  2010 Q3,  2009 Q3,

 NEC Express5800/A1080a (2010-06),  Server Sizing (Interim) (2010-08),
 Big Iron Revival III (2010-09),  Big Iron Revival II (2009-09),  Big Iron Revival (2009-05),
 Intel Xeon 5600 and 7500 series (2010-04, this material has been updated in the new links above)

Server Sizing 2016 (2016-Dec)

It has been a long time since I have put out a proper server sizing guide. The last one was in 2011. I started one for 2012 and another for 2014 but never completed either.

The 2016 server sizing should be taken in the context of my Memory Latency, NUMA and HT article. All multi-socket systems today are NUMA, which means there is a major difference in performance depending on whether memory is in the local or remote node. Almost no one takes this into account in their database and application architecture, yet it is a central part of the TPC-C and TPC-E benchmark configurations. For this reason, I am now advocating strong consideration for single socket systems first. Only consider multi-socket when more than 22 cores are needed.

Recommended System 2016

Options and considerations are given further down on this page. But for now, I will jump straight into an example with pricing.

Medium System

Component    Item                            Price   Comment
Processor    Xeon E5-2630 10-core 2.2 GHz    $667    feel free to adjust the specific SKU
Motherboard  Supermicro X10SRL-F             $272
Chassis      Supermicro CSE-732D4F-903B      $270
Heat Sink    Supermicro SNK-P0050AP4         $41
Memory       Crucial 64GB (4×16GB) kit       $548    adjust as necessary
PCI-E SSD    Intel 750 or P3600 400GB        $350    adjust as necessary

There were a few miscues on my part in setting up the above system. I ordered the processor, motherboard and chassis from Newegg, and the memory from Crucial. After the components arrived, I found that a heatsink is not included with the Xeon E5. I had built a few Xeon E3 systems the previous year, and a heatsink came with the Intel E3 processor. I found an Intel branded heatsink for the Xeon E5 and ordered it with expedited shipping. It came, and I then realized it had mounting screws for an Intel motherboard. A desperate email was sent to Supermicro support. But then I saw the P0048AP4 heatsink and ordered it. The next day, Supermicro tech support replied recommending the P0050AP4, but also said that the 48 would work, just not as quietly.

The Supermicro X10SRL-F motherboard has 4 PCI-E gen 3 x8 and 2 gen 3 x4 slots, which is good for 4 PCI-E NVMe SSDs and a 40/100GbE NIC. My one complaint against Supermicro is their embedded video, good for only 1280x1024 resolution. I don't need high performance graphics, but I would like high resolution for system management.

I have used Crucial memory for a long time and continue to do so. Crucial has 5 different options for the 4×16GB kit, but only one, a PC4-2133 option, is currently in stock. There are 4 PC4-2400 listings, none of which are in stock (coming soon?). Two are dual-ranked, another is single-ranked. One option is VLP single rank, and is much more expensive. What the heck is VLP? OK, very low-profile, so I don't need this.

The situation is similar for the Crucial 4×32GB kit, at $1044. Crucial does have a 64GB DIMM, but it is not rated for the X10SRL-F. Price on the 64GB seems to be $800-1000 or more each, so more than 2X the cost per GB of the 32GB and 16GB DIMMs.

NVMe SSDs are the new thing. But I think I would prefer to boot off a SATA SSD as these are cheap, and it's just for booting anyway. I still like the Intel SSD 750 or the equivalent P3500. These are older first generation NVMe SSDs. Newer models from other vendors will have better specs, like 3500MB/s read versus 2200MB/s. But the older devices are built with more (lower capacity) NAND chips, which in theory should better support small block IOPS and operation at high queue depth?

So we are looking at about $2000 for the system, CPU, memory and boot. I have 4 Intel SSD 750's for another $1500 or so, plus the Intel 40Gbps Ethernet and a 2TB hard disk for archival. At $3500 with storage but not 40GbE, this is far more than an entry system, one that can do 8GB/s IO bandwidth.
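The arithmetic behind these figures is easy to check. Below is a minimal sketch in Python that totals the Medium System list prices and estimates aggregate read bandwidth for four drives. The 2200MB/s per-drive figure is the read spec mentioned above, and linear scaling across drives is an assumption (an upper bound), not a measurement.

```python
# Component prices (USD) copied from the Medium System list above.
medium_system = {
    "Xeon E5-2630 10-core": 667,
    "Supermicro X10SRL-F": 272,
    "Supermicro CSE-732D4F-903B": 270,
    "Supermicro SNK-P0050AP4": 41,
    "Crucial 4x16GB kit": 548,
    "Intel 750 400GB": 350,
}


def total_cost(parts):
    """Sum the listed prices of all components."""
    return sum(parts.values())


def aggregate_read_mb_s(drive_count, per_drive_mb_s):
    """Upper bound on read bandwidth, assuming linear scaling across drives."""
    return drive_count * per_drive_mb_s


base = total_cost(medium_system)
bandwidth = aggregate_read_mb_s(4, 2200)

print(f"base system with one SSD: ${base}")
print(f"4 drives at 2200 MB/s each: {bandwidth} MB/s")
```

The total lands a little above the "about $2000" figure because the list includes one 400GB PCI-E SSD, and four drives at the older 2200MB/s spec are consistent with the 8GB/s claim.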

Now if only I could find a good mechanical keyboard, but without the fancy lights that gamers like.

A Dell T430 with 1 E5-2630 (2-socket system), 4×16GB and 1 TB HDD is $2493. The T630 in similar configuration is $2735, so it might be a better option, unless it's your job to lift the box into the rack at a server farm with hundreds of systems. I think it is a poor choice on Dell's part to have the T30, T130 and T330 all being Xeon E3 v5 or lesser, especially considering that the E5 v4 is such good value for the extra capabilities.

Heavy System

Component    Item                            Price     Comment
Processor    Xeon E5-2699 22-core 2.2 GHz    $4115     feel free to adjust the specific SKU
Motherboard  Supermicro X10SRL-F             $272
Chassis      Supermicro CSE-732D4F-903B      $270
Heat Sink    Supermicro SNK-P0050AP4         $41
Memory       Crucial 8×32GB                  $2100     adjust as necessary
PCI-E SSD 1  Intel P3600 1.2/1.6/2.0TB       <$1/GB?
PCI-E SSD 2  Intel P3608 1.6/3.2/4TB         >$2/GB?
PCI-E SSD 3  Micron 9100 1.6/2.4/3.2TB       ?

In the heavy system, we are looking at $7000 for the base system, 22-core processor, 256GB memory. The price of enterprise SSD storage is difficult to decipher as a search shows a wide spread. It does seem that the Intel P3600 series is about $0.85 per GB. The more interesting P3608 mostly shows to be >$2 per GB. This does not make sense. It might be a product for large customers, who want a substantial discount, so Intel priced it high to regular customers?

In a database entirely on flash, we do not need the super high write endurance of the Intel P3700 at 17 drive writes per day (dwpd) for 5 years, as this type of specification is probably for SSD caching applications. The P3600 endurance of 3 dwpd (5yr) is more than good enough. The P3500 endurance of 0.3 dwpd is probably good enough. The P3520 seems to be its replacement, and is priced around $0.50 per GB?
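To put the dwpd ratings in perspective, they can be converted to total writes over the rating period: capacity × dwpd × 365 × years. The sketch below does this in Python. The 1.6TB capacity is just an illustrative choice, not a claim that every model ships in that size.

```python
def lifetime_writes_tb(capacity_tb, dwpd, years=5):
    """Total TB written if the drive is filled `dwpd` times per day for `years`."""
    return capacity_tb * dwpd * 365 * years


# Endurance ratings (drive writes per day over 5 years) quoted above.
ratings = {"P3700": 17, "P3600": 3, "P3500": 0.3}

capacity_tb = 1.6  # illustrative capacity, an assumption for this example

for model, dwpd in ratings.items():
    total = lifetime_writes_tb(capacity_tb, dwpd)
    print(f"{model}: {dwpd} dwpd on {capacity_tb}TB is about {total:,.0f} TB written in 5 years")
```

Even the 0.3 dwpd rating works out to several hundred TB of writes on a drive of this size, which is the basis for thinking it is probably good enough for a database that lives entirely on flash.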

Another option to consider is the Micron 9100.

1 x 20-core versus 2 x 10-core
Component    E5-2698 20-core   2×E5-2630 10-core   Comment
Motherboard  $272              $500 est.


System Options 2016

Below are the Intel Xeon processor family and series choices. In the Xeon E5 2600 series, we can opt for 1 or 2 sockets. Presumably the only reason we would consider the E5 4600 is for a 4-socket system. In the E7 series, we could opt for 2, 4 or 8 sockets. There is probably not much benefit for 2-socket, and some liabilities. For the time being, 8-socket is beyond the scope here.

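Architecture  Process  Family      Series  Max cores  DIMMs/socket  PCI-E lanes  Max sockets
Sky Lake      14nm     Xeon E3 v5  -       4          4             16           1
Broadwell     14nm     Xeon E5 v4  2600    22         12            40           2
Broadwell     14nm     Xeon E5 v4  4600    22         12            40           4
Broadwell     14nm     Xeon E7 v4  4800    16         24            40           4
Broadwell     14nm     Xeon E7 v4  8800    24         24            40           8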

I am excluding Xeon D, which is really for specialty requirements. There is one E5 1600 v4 SKU of interest. The others look to be workstation oriented. In previous generations, there were E5 series with fewer PCI-E lanes.

Xeon E3 v5 and E5 v4 2600-series

Below are the Xeon E3 v5 and Xeon E5 v4 1600 and 2600 series to be compared. Given that transaction processing favors cores and logical processors, I am only including SKUs with Hyper-Threading and generally the less expensive (lower frequency) SKU at each core count.

Family      Model  Cores  GHz  Price   Price/core
Xeon E3 v5  1230   4      3.4  $250    $62.50
Xeon E3 v5  1275   4      3.6  $339    $84.75
Xeon E5 v4  1620   4      3.5  $294    $73.50
Xeon E5 v4  2620   8      2.1  $417    $52.13
Xeon E5 v4  2630   10     2.2  $667    $66.70
Xeon E5 v4  2650   12     2.2  $1166   $97.17
Xeon E5 v4  2660   14     2.0  $1445   $103.21
Xeon E5 v4  2683   16     2.1  $1846   $115.38
Xeon E5 v4  2695   18     2.1  $2424   $134.67
Xeon E5 v4  2698   20     2.2  $3226   $161.30
Xeon E5 v4  2699   22     2.2  $4115   $187.05

I have always included Xeon E3 for consideration because the fact is that a quad-core plus Hyper-Threading is very powerful, even if it only has 32GB memory. One reservation is that there are no Xeon E3 motherboards with 4 PCI-E x4 slots. There are some with 3 x8 slots with the use of a PCI-E switch. This is really for workstations and adds cost to the motherboard without real value for servers.

Now that I look at the Xeon E5 1620 v4 quad-core 3.5GHz and the E5 2620 v4 8-core 2.1GHz, I would recommend passing on E3 and going straight to the E5 single-socket. Supermicro has a great E5 v4 UP motherboard, the X10SRL-F, with 6 PCI-E gen 3 slots at about $300.

I have split the Xeon E5 1600 and 2600 SKUs into three groups based on price per core. The first group are really low cost per core. The second is the middle group. And the third are the premium models. However, the delineation is not clear-cut. The E5-2695 18-core could be either the high-end of the second or the low-end of the third group.

There is no die size given for the Broadwell 4-core model, but the 2-core with GT2 graphics is 82 mm², 6.08 × 13.49 mm. The 2-core with Iris graphics is listed at 133 mm². From the original raw image aspect ratio, I calculate the dimensions to be 6.80 × 19.55 mm. Notice the blank space above the cores of the 2c Iris die.

[Figures: Broadwell 4c die, Broadwell 10c die]

Above is the 4-core Broadwell. By matching up the cores, I am guessing a die size of 172 mm² and dimensions of 12.41 × 13.86 mm. The 10-core is 246.2 mm², 15.20 × 16.20 mm.
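As a sanity check on the die figures above, area is just width × height. The short Python sketch below recomputes each area from the quoted dimensions. The dictionary labels are my own shorthand, and the 4-core entry is the estimate from matching up cores, not a published number.

```python
# (width mm, height mm) as quoted in the text above.
dies = {
    "2-core GT2": (6.08, 13.49),
    "2-core Iris": (6.80, 19.55),
    "4-core (estimated)": (12.41, 13.86),
    "10-core LCC": (15.20, 16.20),
}


def die_area_mm2(width_mm, height_mm):
    """Rectangular die area in square millimetres."""
    return width_mm * height_mm


for name, (width, height) in dies.items():
    print(f"{name:20s} {width:5.2f} x {height:5.2f} mm = {die_area_mm2(width, height):6.1f} mm2")
```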

The low price group are astonishing bargains, more so considering that the Broadwell LCC die is 246 mm². The 10-core LCC die is larger than the desktop 4-core with the big graphics engine. But the Core i7-6950X 10-core 3.0GHz, 140W does carry a price of $1723, so that is probably where Intel wants to make its money on the Broadwell-E LCC die. The Xeon E5-2640 v4 10-core 2.4GHz is $939. Manufacturing yield to frequency probably only plays a minor role, as most die are probably capable of high frequency. There might be some variation in power efficiency, so the high core count parts are cherry-picked from the most efficient die.

What is interesting is that the 16-core 2683 v4 is roughly on par in price per core with the 14-core 2660 v4. This is notable because the 16-core is on the HCC die, which is much larger than the MCC die used for the 14-core.

On the one hand, we could say that the E5-2600 series 20 and 22 core models are priced with the big boys. But compared with the 4600-series, the 2600s are half-priced.

E5 2 × 10-core versus 20-core

The E5-2630 10-core and E5-2698 20-core are both 2.2GHz. This makes an interesting comparison possible between 2 sockets × 10-cores versus 1 socket 20-cores at matching frequency.

A UP Xeon E5 v3/4 motherboard is about $300 versus $500 for a DP motherboard, but this difference is largely inconsequential.

The 10-core is $667, quantity two is $1,334 versus $3,226 for the 20-core.

Memory also comes into play. Suppose the motherboard has 8 DIMMs per socket. The price of 4×16GB is about $550, versus 4×32GB at $1050. Then there is very little meaningful difference in 256GB total memory between 8×32GB on the UP and 16×16GB on the DP, both costing about $2,100.

The big difference comes into play at 64GB DIMMs, currently about $1,000 each. This is double the price per GB compared to either the 16GB or 32GB DIMMs. If we need 512GB in the UP we would need 8×64GB at $8,000 versus 16×32GB in the DP motherboard at about $4,200.

From this point of view, it would seem that DP is the better option at 20-cores total. However, if we were to factor in the effect of Memory Latency and NUMA, then perhaps we would be willing to pay the price of 64GB DIMMs should it be needed in a UP motherboard.
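The comparison above can be tallied in a few lines. This is a rough sketch in Python using the prices quoted in this section. Per-DIMM prices are derived from the 4-DIMM kit prices, so totals may differ slightly from the rounded figures in the text, and chassis, heat sink and storage are left out because they are roughly common to both configurations.

```python
# Approximate price per DIMM in USD, keyed by DIMM size in GB.
# 16GB and 32GB are derived from the 4-DIMM kit prices quoted above.
DIMM_PRICE = {16: 550 / 4, 32: 1050 / 4, 64: 1000}


def system_cost(cpu_price, sockets, board_price, dimm_gb, dimm_count):
    """Processors + motherboard + memory; other components omitted."""
    return cpu_price * sockets + board_price + DIMM_PRICE[dimm_gb] * dimm_count


# 256GB: 8 x 32GB on the UP board versus 16 x 16GB on the DP board.
up_256 = system_cost(3226, 1, 300, 32, 8)
dp_256 = system_cost(667, 2, 500, 16, 16)

# 512GB: 8 x 64GB on the UP board versus 16 x 32GB on the DP board.
up_512 = system_cost(3226, 1, 300, 64, 8)
dp_512 = system_cost(667, 2, 500, 32, 16)

print(f"256GB  UP 1x20-core: ${up_256:,.0f}   DP 2x10-core: ${dp_256:,.0f}")
print(f"512GB  UP 1x20-core: ${up_512:,.0f}   DP 2x10-core: ${dp_512:,.0f}")
```

On price alone the DP configuration comes out ahead at both memory sizes, and by a wide margin at 512GB. The code does not attempt to price in the remote-node memory latency penalty, which is the reason to consider paying more for the UP system.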

A good motherboard choice for UP is the Supermicro X10SRL-F, with 8 DIMMs, 4 PCI-E ×8 and 2 ×4 in gen 3, plus one gen 2 slot. For extreme IO, the Supermicro X10DRX has 10 gen 3 ×8 slots.


Xeon E5 v4 4600 series and E7 v4

Below are Xeon E5 4600 series and the E7 models.

Family      Model  Cores  GHz  Price   Price/core
Xeon E5 v4  4610   10     1.8  $1219   $121.90
Xeon E5 v4  4640   12     2.1  $2837   $236.42
Xeon E5 v4  4650   14     2.2  $3838   $274.14
Xeon E5 v4  4660   16     2.2  $4727   $295.44
Xeon E5 v4  4667   18     2.2  $5729   $318.28
Xeon E5 v4  4669   22     2.2  $7007   $318.50
Xeon E7 v4  4809   8      2.1  $1223   $152.88
Xeon E7 v4  4820   10     2.0  $1502   $150.20
Xeon E7 v4  4830   14     2.0  $2170   $155.00
Xeon E7 v4  4850   16     2.1  $3003   $187.69
Xeon E7 v4  8860   18     2.2  $4061   $225.61
Xeon E7 v4  8870   20     2.1  $4672   $233.60
Xeon E7 v4  8880   22     2.2  $5895   $267.95
Xeon E7 v4  8887   24     2.2  $7174   $298.92

I suppose it is even more curious that the E5 4600 series is more expensive than the E7 at equivalent core count. I just realized the E7 has 32 PCI-E lanes versus 40 for the E5. What gives? Did Intel need to borrow wires for the extra QPI link? I am thinking that the implementations of QPI and PCI-E are different, but Intel could just choose not to wire 8 of the PCI-E lanes, using the signals for QPI instead. I believe QPI uses 4 signals for each bit, and QPI is 20-wide?