Original Server Sizing

See also: Server System 2010 Q3

Many discussions on server sizing try to approach this by starting with estimates of the expected number of users, and possibly the transactions volume per user. This is the correct technical approach. While I have extensively written on quantitative performance analysis for SQL Server, it is possible to arrive at a sufficiently correct answer for the server system very quickly without detailed analysis.

The reason is that there are very few choices of consequence. The corollary is: an essay is not required for a multiple choice question when 95% of the appropriate answer is B or C.

The choice comes down to the type of processor:

  1. Intel Xeon 5600 or Xeon 7500 series
  2. AMD Opteron 12-core

Intel Dunnington

Yet another update, with the publication of TPC results for the Intel X7460 six core (Dunnington)

X7460 2.67GHz 3x3M L2, 16M L3

Dell and HP list availability as Sep 15, 2008; IBM lists availability on the x3950M2 as Dec 10.

Dell R900 with 4 x X7460, 2.67GHz, 6 core, 16M L3, $17,195

HP DL580G5 with 4 x X7460, 2.67GHz 6 core, 16M L3 $19,151

I think the IBM x3950M2 with 4 x X7460 is $41K (understanding this system can be expanded to 16 sockets, and hence has higher cost structure)

Tukwila Quad Core Itanium due in Q1 2009?

I am looking over the Intel IDF slides on Tukwila. It is a quad-core Itanium, with probably just minor improvements in the core (HT is specifically mentioned) but with an integrated memory controller and the QuickPath Interconnect (QPI) replacing the FSB. Tukwila will be 65nm when the first 45nm procs are 1 year old, meaning it is really 2 years late. (What I mean by this is that a 65nm quad-core Itanium could have been built in late 2006/early 2007 if the prep work had started early with clear objectives, and its performance would have been earth shattering relative to x86/64.) Frequency improvements are mentioned over the 90nm dual-core, which is running at about the same frequency as the 130nm single-core Madison. Still, Tukwila has a large cache (6M L3 per core), massive bandwidth via QPI for good scaling characteristics (4 full-width + 2 half-width links, allowing glueless 8-way), 4 DDR memory channels per socket, and HT, which is good for high call volume database apps. Intel mentioned about 2X performance, so they are probably targeting 740K tpm-C.

This means we will have the choice of 1) the six-core Dunnington, with the most powerful CPU core on earth (prior to Nehalem) on a weak chipset with only 4 memory channels supporting 4 sockets and 24 cores, 2) the new quad-core Itanium with outstanding scaling characteristics plus HT for added throughput, but a weak core, 3) AMD Barcelona, also with very good scaling (8-way glueless), no HT, and a slightly better than weak core, 4) Nehalem, with what will be the most powerful core, the new QPI for good scaling, 3 memory channels per socket, HT, but for the first year, only 2 sockets. Decisions, decisions.

Nehalem/Beckton

Due out in Q4 2008, the initial Nehalem core will support 2-way, 3 memory channels, 2 QPI. About a year later, Beckton, the MP server (4 sockets and up) version, comes out with 4 memory channels per socket and 4 QPI.

Performance

TPC-C (Windows Server 2003, SQL Server 2005SP2)

4 x Intel X7460 six core 2.67GHz, 634,825 tpm-C

4 x AMD 8360 quad-core 2.5GHz 471,883 tpm-C

4 x Intel X7350 quad-core 2.93GHz 407,079 tpm-C

TPC-E (W2K8, S2K8, Dell PowerEdge R900 results)

4 x Intel X7460 six core 2.67GHz, 671.35 tps-E

4 x Intel X7350 quad core 2.93GHz, 451.29 tps-E

4 x AMD 8360 quad core ??

TPC-H (W2K3, S2K5, SF 100)

4 x Intel X7350 quad core 2.93GHz, 46,034QphH

4 x Intel X7460 six core ??

4 x AMD 8360 quad core ??

TPC-H 300GB

8 x Intel X7350 QC 46,034 QphH (IBM x3950M2, W2K3, S2K5 SP2)

8 x AMD 8360 QC 52,860 QphH (HP DL785, W2K8, S2K8)

On TPC-C, the X7460 six core generated a 34% edge over the quad core AMD and a 56% advantage over the quad core X7350. Even with the large cache, this is higher than expected. At the time, I suspected HP did not pursue optimization with the 407K result.

On TPC-E, the six core showed a 49% edge over the older quad core.
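The percentages quoted above follow directly from the published results listed under Performance. A minimal sketch of the arithmetic (the dictionary and function names are mine, chosen for illustration):

```python
# Published results quoted in the Performance lists (tpm-C and tps-E).
tpc_c = {"X7460": 634_825, "Opteron 8360": 471_883, "X7350": 407_079}
tpc_e = {"X7460": 671.35, "X7350": 451.29}


def advantage_pct(new: float, base: float) -> float:
    """Percentage by which `new` exceeds `base`."""
    return (new / base - 1.0) * 100.0


c_vs_amd = advantage_pct(tpc_c["X7460"], tpc_c["Opteron 8360"])
c_vs_x7350 = advantage_pct(tpc_c["X7460"], tpc_c["X7350"])
e_vs_x7350 = advantage_pct(tpc_e["X7460"], tpc_e["X7350"])

print(f"TPC-C, X7460 over Opteron 8360: {c_vs_amd:.1f}%")
print(f"TPC-C, X7460 over X7350:        {c_vs_x7350:.1f}%")
print(f"TPC-E, X7460 over X7350:        {e_vs_x7350:.1f}%")
```

The three values come out at roughly 34.5%, 56% and 49%, matching the figures in the text.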

This could indicate the 7300 chipset with 4 memory channels cannot properly scale the 4 QC 2x4M L2 processors, but can scale the new six core 16M L3 procs.

What's missing are comparable TPC-H numbers, especially at 100GB. The big cache on X7460 helps high call volume apps like TPC-C and E, but not TPC-H. Can the 7300 chipset drive the extra 8 cores (in the X7460 over 16 core in the X7350) in DW queries?

There is an 8xOpteron QC and 8xX7350 at 300GB, but the Opteron is on S2K8 while the X7350 is on S2K5, which has different characteristics.

The X7460 (Dunnington) is a clear winner at 4-way for the high call volume apps. There are not sufficient results in DW to make a call. AMD does have a small opening in the fairly low-priced 8-way (compared with hard-NUMA systems).

As much as I would like to buy one of these for my own use in researching SQL Server performance characteristics, I am holding my 2008 budget for a Nehalem system as soon as it comes out, and an SSD array. As soon as I can confirm an SSD can do 10K IOPS on random 8K reads (I see the IDF announcements that the new Intel SSD due early 2009 will do 30K IOPS at 4K), I will get a dozen to see what is involved in reaching 100K IOPS from SQL queries. A few years ago, a quick test on a TMS SSD SAN showed 45K, limited by the SQL Server side CPU. On Nehalem, the big question is whether the Hyper-Threading issues of NetBurst have been fixed.
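The reasoning behind "a dozen" SSDs can be sketched as a back-of-the-envelope calculation. This assumes IOPS add up linearly across devices, which ignores controller and CPU limits (the very limits the test is meant to expose); the constant names are illustrative.

```python
import math

TARGET_IOPS = 100_000        # goal stated in the text
PER_SSD_IOPS = 10_000        # per-device figure still to be confirmed
READ_SIZE_BYTES = 8 * 1024   # 8K random reads (one SQL Server page)

# Minimum device count if IOPS scaled perfectly (an optimistic assumption).
ssds_needed = math.ceil(TARGET_IOPS / PER_SSD_IOPS)

# Aggregate capability of a dozen devices under the same assumption.
dozen_iops = 12 * PER_SSD_IOPS

# Data rate implied by the target, in decimal MB/sec.
bandwidth_mb_s = TARGET_IOPS * READ_SIZE_BYTES / 1_000_000

print(ssds_needed, dozen_iops, round(bandwidth_mb_s, 1))
```

Ten devices would nominally be enough, so a dozen leaves some headroom; 100K IOPS at 8K is also only about 820MB/sec, so the challenge is the per-IO cost rather than raw bandwidth.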

______________________________________________________________________

This is an update to the original post on Server Sizing for SQL Server to reflect the new quad-core Opteron systems. The recommended server systems, as of Q3 2008, for line-of-business database applications are:

2-way Intel Xeon: HP ProLiant ML370G5 and Dell PowerEdge 2900 III

4-way Intel: Dell PowerEdge R900 and HP ProLiant DL580G5

2-way AMD Opteron: Dell PowerEdge R805

4-way AMD Opteron: Dell PowerEdge R905 and HP ProLiant DL585G5

8-way AMD Opteron: HP ProLiant DL785G5

Processors

For 2-socket Xeon, the top processors include the X5460 3.16GHz, and the E5440 2.83GHz, both 2x6M cache.

For 4-socket Xeon, the X7350 2.93GHz 2x4M

For 4 & 8 socket Opteron, the top processor is the 8360SE at 2.5GHz.

Processor Notes

I really do not want to get heavily into Xeon versus Opteron. It is too emotional a subject for many people and too infested with FUD driven by marketing people. This frequently involves valid technical points taken out of context. What it comes down to is the Core 2 architecture has by far the highest SPEC CPU integer (not rate) scores, and will generate the best results in certain categories of performance tests. This is most evident in single large query tests.

At the 4 & 8-socket level, AMD Opteron has the better memory architecture, with 2 DDR2 memory channels per socket, 8 total in a 4-socket system and 16 memory channels in an 8-socket system, compared with 4 in the 4-socket Xeon with the 7300 chipset. This may yield an advantage in full saturation tests, which are more difficult to run. So at the 4-socket level, the difference is that Xeon has more compute power in the processor cores, while Opteron can turn memory faster. What is better: a 400 HP engine with a transmission at 70% efficiency, or a 310 HP engine with a transmission at 90% efficiency?
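Working the analogy through shows why it is posed as a question. A small sketch, using only the numbers in the paragraph above (function and variable names are illustrative):

```python
def delivered_hp(engine_hp: float, transmission_efficiency: float) -> float:
    """Power that actually reaches the wheels."""
    return engine_hp * transmission_efficiency


xeon_like = delivered_hp(400, 0.70)     # strong engine, lossy transmission
opteron_like = delivered_hp(310, 0.90)  # weaker engine, efficient transmission

# Opteron memory channels: 2 DDR2 channels per socket.
opteron_channels = {sockets: 2 * sockets for sockets in (4, 8)}
XEON_7300_CHANNELS = 4  # fixed by the chipset, regardless of socket count

print(round(xeon_like), round(opteron_like), opteron_channels)
```

The two hypothetical cars deliver about 280 HP and 279 HP respectively, which is the point: the stronger core and the better memory system roughly cancel, and the workload decides which one wins.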

At the 8-socket level, Opteron is the best choice for most situations. The 8-socket Opteron (Barcelona) system has what is considered to be a soft NUMA architecture, meaning that the memory latency difference between local and remote nodes is low or inconsequential (i.e. do not set the NUMA flag). The IBM and Unisys big iron systems are considered hard NUMA, meaning that the memory latency difference between local and remote nodes is high. Hard NUMA systems can scale, but would most likely require specialized performance analysis skills which are not easily found.

Additional comments

I rate the HP ProLiant ML370G5 over the PowerEdge 2900 III on technical grounds: more memory sockets, and more PCI-E sockets. On the same grounds, I rate the Dell PowerEdge R805 over the ProLiant DL385G5 because the R805 has 16 DIMM sockets over 8 for the DL385G5.

At the 4-socket level, for Intel platforms, the Dell and HP systems are sufficiently comparable.

Note that Dual-Core Opteron processors are an option in the 2 and 4-socket systems, but not the 8 socket DL785. The original and dual core Opteron processors have up to 3 full (16-bit) HT links, of which 2 connect to other processors, and 1 connects to an IO hub. In a 4-socket system, the processors are at the corners of a square, with each processor connected to processors on the two adjacent corners. Hence there is a far processor two hops away.

The Barcelona quad-core has up to 4 full width HT links, each of which can be split as two half-width (8-bit) HT links. In a 4-socket system, each processor can connect directly to all of the other three sockets with a full HT link, leaving one for IO. In an 8-socket system, each processor can connect directly to all seven other sockets with one half-wide HT link, leaving one half-wide link for IO. The HP 4-socket DL585G5 and 8-socket DL785 only support quad-core Opteron, not dual core, which may indicate the use of three full HT links to processors. The Dell R905 supports both dual and quad-core Opteron, which may indicate an older 2-hop to the far processor.
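The hop counts described above can be checked by modelling each topology as a graph. This is a sketch of the layouts as described in the text, not of any specific vendor's board wiring, which (as noted) is not confirmed; all names are illustrative.

```python
from collections import deque


def max_hops(links: dict) -> int:
    """Largest shortest-path hop count between any two sockets (BFS)."""
    worst = 0
    for start in links:
        dist = {start: 0}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for peer in links[node]:
                if peer not in dist:
                    dist[peer] = dist[node] + 1
                    queue.append(peer)
        worst = max(worst, max(dist.values()))
    return worst


def full_mesh(sockets: int) -> dict:
    """Every socket linked directly to every other socket."""
    return {i: {j for j in range(sockets) if j != i} for i in range(sockets)}


# Original/dual-core Opteron, 4 sockets: a square, 2 processor links each.
square = {0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {2, 0}}

# Barcelona link budget in half-width units: 4 full links = 8 half-width.
HALF_LINKS = 4 * 2
four_socket_used = 3 * 2 + 1 * 2   # 3 full links to peers + 1 full link for IO
eight_socket_used = 7 + 1          # 7 half links to peers + 1 half link for IO

print(max_hops(square), max_hops(full_mesh(4)), max_hops(full_mesh(8)))
```

The square layout has a worst case of 2 hops, while both Barcelona layouts are 1 hop everywhere, and both consume exactly the 8 half-width links available.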

Finally, until a quad-core on a current generation manufacturing process is available, Itanium has the very high memory capacity (>256-512GB) and IO bandwidth (>10GB/sec) niche. It could be pointed out that whatever the criticism of Itanium has been since its launch, at the time it was conceived in the 1990s it was a foregone conclusion that RISC would overwhelm x86, which would not be able to benefit from advanced design concepts. Intel was not content to be a Johnny-come-lately to the RISC party and, with HP, came up with a better idea than RISC. And yet, what processor today has the best SPEC CPU int?

IBM Xeon Systems

I have said before that I do not have recent experience with IBM systems. Just from looking at the IBM redbook on the x3850 M2, it looks very impressive. For this and the x3950 M2, IBM does their own chipset, which supports a NUMA architecture to 4 nodes of 4 sockets. I do not know if 8 nodes are still supported. What I like about the x3850 M2 memory controller is the 8 DDR2 memory channels. I really think the Intel 7300 with 4 memory channels is too weak to support 4 quad cores, and now 4 six core procs. Intel was always afraid of the high-end chipset, obsessively looking at the entry point price, which drags down the high-end configuration. The IBM x3850 M2 did post a TPC-E of 479.51 tps-E over Dell's 451.29. The IBM system has 128GB memory compared with 64GB for Dell, so it is not clear if the 8 memory channels contributed.

Is anyone seriously looking at the HP ProLiant DL785G5?

Or the IBM x3950M2 8-way Quad-Core Xeon?

Consider the following specs:

8-way Quad-Core Opteron (32 cores total)

Max Memory: 512GB (64 DIMM sockets)

11 PCI-E slots: 3x16, 3x8, and 5x4 or option for 7 PCI-E and 2HTx

Compare this with the HP ProLiant DL585G5:

4-way Quad-Core Opteron (16 cores total)

Max Memory: 256GB (32 DIMM sockets)

7 PCI-E slots: 3x8, and 4x4

Aside from 8 Quad-Core processor sockets, the significant differences are 64 DIMM sockets, doubling the maximum memory of the 4-way, and 11 PCI-E slots (depending on the actual architecture, this could be 92 PCI-E lanes!)
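The 92-lane figure is simply the sum over the slot list given above. A quick sketch (the dictionaries map slot width to slot count and are named for illustration):

```python
# Slot width (lanes) -> number of slots, from the specs quoted above.
dl785_slots = {16: 3, 8: 3, 4: 5}
dl585_slots = {8: 3, 4: 4}


def slot_count(slots: dict) -> int:
    return sum(slots.values())


def lane_count(slots: dict) -> int:
    return sum(width * count for width, count in slots.items())


print(slot_count(dl785_slots), lane_count(dl785_slots))  # 11 slots, 92 lanes
print(slot_count(dl585_slots), lane_count(dl585_slots))  # 7 slots, 40 lanes
```

Whether all 92 lanes are independently wired to the processors' HT links is a separate question, hence the "depending on the actual architecture" caveat.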

Some pricing from the HP web site below.

ProLiant DL585G5

w/4x2.2GHz $10,389, 4x2.3GHz $12,189, 4x2.5GHz $16,189

64GB 32x2GB +$4,495, 128GB 32x4GB +$13,455, 256GB 32x8GB +$55,863

ProLiant DL785G5

w/4x2.2GHz $16,973, 8x2.3GHz $27,291, 8x2.5GHz $46,891

128GB 64x2GB +$7,612, 256GB 64x4GB +$26,492, 512GB 64x8GB +$75,316

8-way systems were somewhat popular in the Pentium III Xeon/ProFusion days, but Intel did not follow it with a respectable (read on when you stop laughing) 8-way chipset for the NetBurst based Xeon processors. HP did their own chipset for the DL740(?) which they considered moderately successful (in that it was profitable but did not warrant continuation for the next generation dual core processors).

The ProLiant DL785 posts a respectable TPC-H benchmark result of 52,860 QphH@300GB (SQL Server 2008), compared with 46,034 for the 8-way IBM x3950 with 8 Xeon 7350 (SQL Server 2005). The ProLiant DL585G5 also posted a top 4-way SQL Server TPC-C result of 471,883 tpm-C on the AMD Opteron 8360 at 2.5GHz. I was really expecting that the AMD quad-core needed to be at 2.8GHz to reach this performance level.

Now getting an actual database (SQL Server or any other) to scale to 16 or 32 cores is not a simple matter. I do suggest conducting a proper quantitative scaling analysis, i.e., measuring maximum throughput with 1, 2, 4, 8, 16, and 32 cores.
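One way to summarize such a scaling test is to convert the measured maximum throughput at each core count into speedup and efficiency relative to the smallest configuration. The sketch below shows the calculation only; the throughput values in the usage example are made-up placeholders, not measurements from any system.

```python
def scaling_report(throughput_by_cores: dict) -> list:
    """Return (cores, speedup, efficiency) rows relative to the smallest core count.

    speedup    = throughput / throughput at the baseline core count
    efficiency = speedup / (cores / baseline cores); 1.0 means linear scaling
    """
    base_cores = min(throughput_by_cores)
    base = throughput_by_cores[base_cores]
    rows = []
    for cores in sorted(throughput_by_cores):
        speedup = throughput_by_cores[cores] / base
        ideal = cores / base_cores
        rows.append((cores, speedup, speedup / ideal))
    return rows


# HYPOTHETICAL placeholder numbers, for illustration only.
example = {1: 100.0, 2: 190.0, 4: 350.0, 8: 600.0, 16: 900.0, 32: 1100.0}

for cores, speedup, efficiency in scaling_report(example):
    print(f"{cores:>2} cores: speedup {speedup:5.2f}, efficiency {efficiency:4.0%}")
```

The efficiency column is the number to watch: the core count at which it falls off sharply is where the extra sockets stop paying for themselves for that particular workload.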

Another reason for going with the 8-way is the extra DIMM sockets. The 4-way QC 2.3GHz DL585 with 256GB memory (32x8GB DIMMs) is $68K, versus $54K for the 8-way DL785 with 64x4GB DIMMs (it is necessary to populate all 8 processor sockets to get 64 DIMM sockets, but one could restrict SQL Server to 4 sockets if per processor licensing is involved, see Andy's comment below). Even at 128GB, it is $26K for the 4-way DL585 with 32x4GB DIMMs versus $35K for the 8-way with 64x2GB. The extra 16 cores for a $9K delta is highly attractive in CAL situations.
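The rounded prices in this comparison come from adding the memory option to the 2.3GHz base configuration in the HP price list quoted earlier. A short sketch of that sum (constant and function names are illustrative):

```python
# List prices quoted above, 2.3GHz configurations.
DL585_4X_2_3GHZ = 12_189
DL785_8X_2_3GHZ = 27_291

# Memory option prices, keyed by total GB.
DL585_MEMORY = {128: 13_455, 256: 55_863}  # 32x4GB, 32x8GB
DL785_MEMORY = {128: 7_612, 256: 26_492}   # 64x2GB, 64x4GB


def configured_price(base: int, memory_options: dict, gb: int) -> int:
    """Base system price plus the chosen memory option."""
    return base + memory_options[gb]


for gb in (128, 256):
    four_way = configured_price(DL585_4X_2_3GHZ, DL585_MEMORY, gb)
    eight_way = configured_price(DL785_8X_2_3GHZ, DL785_MEMORY, gb)
    print(f"{gb}GB: DL585 ${four_way:,}  DL785 ${eight_way:,}  "
          f"delta ${eight_way - four_way:,}")
```

At 128GB the 8-way costs about $9K more; at 256GB it is about $14K cheaper, because the 4-way needs the much more expensive 8GB DIMMs to reach that capacity.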

One would also think that the 11 PCI-E sockets could support phenomenally high sequential disk transfer rates: 800MB/sec per first generation PCI-E SAS RAID controller, 1,100MB/sec+ for second generation (in x8 slot). But there is no published data on this for the AMD systems with PCI-E. The HP Itanium systems can do over 15GB/sec in SQL Server table scans.
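A rough upper bound on that sequential rate can be sketched from the per-controller figures. This assumes one SAS RAID controller in every slot, enough disks behind each to saturate it, and no bottleneck elsewhere in the system, none of which has been demonstrated; the slot mix is the DL785 list given earlier.

```python
GEN1_MB_S = 800     # first generation PCI-E SAS RAID controller
GEN2_MB_S = 1_100   # second generation, needs an x8 (or wider) slot

X8_OR_WIDER_SLOTS = 3 + 3   # 3 x16 + 3 x8
X4_SLOTS = 5
TOTAL_SLOTS = X8_OR_WIDER_SLOTS + X4_SLOTS

# Case 1: first generation controllers in all 11 slots.
all_gen1_gb_s = TOTAL_SLOTS * GEN1_MB_S / 1000

# Case 2: second generation in the x8/x16 slots, first generation in the x4 slots.
mixed_gb_s = (X8_OR_WIDER_SLOTS * GEN2_MB_S + X4_SLOTS * GEN1_MB_S) / 1000

print(all_gen1_gb_s, mixed_gb_s)
```

That gives nominal ceilings of roughly 8.8GB/sec and 10.6GB/sec, which is in the same territory as, but still below, the 15GB/sec observed on the HP Itanium systems.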

Sizing for SQL Server 2000 and 2005

Server sizing strategies using requirement driven approaches tend to go into inane details involving substantial effort with no clear conclusions at the end. By considering only the systems most suitable for line-of-business database servers, there are just a few meaningful choices, making for a much simpler decision process. The key performance criteria for database servers are processor, memory and IO capability. The relevant processor lines are the Intel Xeon, AMD Opteron, and the Intel Itanium 2. For the current (and past) Intel processor lines, memory and IO capability are determined by the chipset. There are only a few chipset choices for any given processor arrangement (1, 2 and 4-way, etc). The systems that make the most sense for database servers employ the chipset with the best memory and IO, and implement the full memory and IO capability. For the AMD Opteron line, memory and IO are integrated into the processor, so this capability is largely determined by the number of processors. Hence there are very few system choices that actually warrant consideration. Recommended system examples for Dell and HP are given. There are comparable IBM systems, but they are not cited here.

Processors

Nearly all server systems today use Intel Xeon or AMD Opteron processors. In the 2003-2004 timeframe, the performance competition between (single core) NetBurst architecture Xeon and Opteron processors was very close. With the introduction of dual-core processors in 2005, AMD pulled solidly ahead, in part because the NetBurst processors had to be throttled below electrical limits to keep within a 130W thermal envelope. In June 2006, the Intel Core 2 architecture established a clear performance lead, but was available only for 1-way and 2-way systems. In November 2006, the quad core Xeon 5300 series for 2-way systems closed much of the performance gap relative to 4-way dual core systems. At the end of 2007, the meaningful server system choices for most environments are a 2-way based on the quad core Xeon 5400 series processor or a 4-way system based on the quad core Xeon 7300 series. Sometime in 2008, Barcelona quad core systems may be a viable choice if AMD can get frequency into the 3GHz range to be competitive with 3GHz Core 2 architecture processors.

Itanium has not kept pace with performance progress in the X64 side. It is seriously disappointing that Itanium 2 processors in 2007 are still on the 90nm process, when Xeon processors are beginning to transition to 45nm. The 90nm dual core Itanium cannot compete against 45nm quad core Xeon on performance per socket. However, chipsets with extraordinary capability, including the HP sx2000, have enabled Itanium systems to carve out a niche in the high-end where either very large memory capacity or exceptionally high IO bandwidth are required.

2-way Systems

The recommended 2-way systems as of December 2007 are the Dell PowerEdge 2900 III and the HP ProLiant ML370G5. The preferred processors are the quad core Intel Xeon X5460 3.16GHz with 2x6M L2 cache or the E5440 2.83GHz. The Xeon 5400 series are manufactured on the new 45nm process, with some minor architectural enhancements, slightly higher frequency and larger cache over the 65nm Xeon 5300 series with 2x4M cache. Note these two specific systems are named, and not the comparable high density 2U systems, because of greater memory capacity and IO capability in terms of the number of available PCI-E x4/x8 slots. The large number of internal disk bays also makes these systems suitable for entry non-clustered environments.

Technically, the ProLiant ML370G5 is the best system because it implements the full capability of the Intel 5000P chipset, with 64GB maximum memory over 16 DIMM sockets and 6 x4 PCI-E slots, compared with 48GB max memory and 1x8 + 3x4 for the PowerEdge 2900 (both have 1 x4 slot occupied by an included SAS/RAID controller). None of the current generation storage IO adapters (SAS RAID controllers or FC HBAs) can use more bandwidth than is provided by a x4 PCI-E port, so there is no value in having 1x8 PCI-E over 2x4 slots in current generation database servers. The PowerEdge 2900 is more suitable for situations that do not require the maximum memory or IO configuration, especially considering Dell list prices are very attractive.

4-way Systems

The recommended 4-way systems as of December 2007 are the Dell PowerEdge R900 and the HP ProLiant DL580G5. The preferred processor is the quad-core Xeon 7350 2.93GHz with 2x4M L2 cache. There is no point dropping down to the lower priced E7330 2.4GHz, as the complete solution price differential does not justify a roughly 20% lower frequency. Both the Dell and HP 4-way quad core systems are built around the Intel 7300 chipset. The 7300 supports 32 DIMMs for a maximum memory capacity of 128GB with 4GB DIMMs and 256GB with 8GB DIMMs. The IO configuration is 4 x8 and 3 x4 PCI-E slots. The 7300 chipset actually has 7 x4 PCI-E ports and an ESI port (equivalent to one x4 PCI-E). PCI-E expander chips are used to create PCI-E slots with shared bandwidth. This is not an issue, except that load should be properly distributed across the dedicated PCI-E ports.
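The memory capacity and the shared-bandwidth remark both reduce to simple arithmetic. The sketch below uses only the figures in the paragraph above; treating the ESI port as unavailable for slots is my assumption, and the actual slot-to-port wiring varies by vendor.

```python
DIMM_SOCKETS = 32

# Maximum memory by DIMM size (GB).
max_memory_gb = {dimm_gb: DIMM_SOCKETS * dimm_gb for dimm_gb in (4, 8)}

# Lanes exposed as slots versus lanes the chipset natively provides.
slot_lanes = 4 * 8 + 3 * 4   # 4 x8 slots + 3 x4 slots
chipset_lanes = 7 * 4        # 7 x4 ports (ESI port not counted, by assumption)

oversubscription = slot_lanes / chipset_lanes

print(max_memory_gb, slot_lanes, chipset_lanes, round(oversubscription, 2))
```

With 44 slot lanes fed by 28 chipset lanes, the slots are oversubscribed by about 1.6 to 1, which is why it matters which controllers share an expander.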

Itanium Systems

Itanium systems to consider for the high-end include the HP Integrity 8-way rx7640 and 16-way rx8640, with maximum memory of 256GB and 512GB respectively. Both systems use the sx2000 chipset and have extraordinary memory and IO bandwidth, far beyond what can be achieved in Xeon or Opteron systems. A table scan was observed at 15GB/sec. There is some question as to whether the 4-way rx6600 with the zx2 chipset should be considered over the 4-way Xeon or Opteron systems. The Itanium has the best 4-way dual core TPC-C result for SQL Server, probably due to the very large cache (2x12MB) and hyper-threading, but cannot match the 4-way quad core Xeon. One reason for considering Itanium is to realize the benefits of full 64-bit operating system and application when SQL Server version must be 2000 and not 2005 or later.

Summary

Microsoft did (still does?) recommend 4-way (and even 8-way systems during the Pentium III/ProFusion era) as the standard for SQL Server. From what has been observed in recent years, the default recommendation with no effort to analyze requirements should now be the 2-way quad core system. If there are indications the workload is too heavy, tuning alone may be sufficient to resolve the issue. Even if it is necessary to replace the 2-way with a 4-way quad core or larger system, the original 2-way did not cost very much and can be used for other purposes.

Only in special situations with known exceptionally heavy load and big budgets is it actually beneficial to analyze performance requirements to determine the correct solution. The reason for this is that a proper sizing analysis will cost much more than a 2-way or even 4-way system. It makes more sense just to buy a 2-way quad core system that can handle most tasks. Another reason for defaulting to the 2-way system is that Intel introduces new technology in 2-way systems with about a one year lead over 4-way systems.

For people considering the 4-way quad core system (16 cores in all), SQL Server 2005 or later and the 64-bit version at that is strongly suggested. SQL Server version 2000 is too old and has too many issues in working with 16 schedulers, especially parallel execution plans for data warehouse environments.

Additional Notes

AMD Opteron 8200 and Intel Xeon 7100 series

Even though the dual core Opteron has been outclassed by quad-core Xeons at the socket level, for most of 2007 the 4-way Opteron system still had leading edge performance, specifically in the TPC-H benchmark. The 4-way Opteron system had excellent memory performance with a total of 8 DDR2 memory channels when all four processor sockets are populated, compared with 4 for the Intel systems. Both the Dell and HP 4-way Opteron systems featured 3x8 and 4x4 PCI-E ports, providing the best nominal IO bandwidth.

The 4-way Intel Xeon 7140 system based on the old NetBurst processor architecture did achieve the best 8-core X64 TPC-C score of 318K versus 263K for a 4-way dual core Opteron 8220 and 251K for a 2-way quad core Xeon X5365. This can probably be attributed to the very large 16M L3 on-die cache compared with 1M per core for the Opteron and 4M per 2 cores on the X5365 and the NetBurst Hyper-Threading feature which helps high call volume applications (like TPC-C) but not in other areas. So the 4-way dual core systems do have a modest edge over the 2-way quad core. The 2-way quad core system is recommended on cost grounds for most situations.

AMD Barcelona

As noted earlier, the original Opteron single core and both Opteron dual core processors were highly competitive, if not industry leading, in their day. Barcelona was supposed to keep AMD competitive by integrating four cores on a single die in combination with the transition from the 90nm to the 65nm manufacturing process. Barcelona appears to have moderate improvements over the previous generation Opteron in core architecture. The expectation is that single core performance between the 90nm Opteron and 65nm Barcelona should be comparable at the same frequency. Since Opteron reached 2.6-2.8GHz relatively quickly on 90nm, the expectation is that Barcelona on 65nm should reach 3.5GHz at a comparable stage. There are currently very few published Barcelona benchmark results (of course, Intel has not been profuse with 5400 or 7300 benchmark results either). The indications are that Barcelona will need to be around 3GHz to be competitive with the Xeon 5400 and 7300 series at around 3GHz. AMD is trying to achieve 2.5GHz in early 2008, with higher frequencies later in 2008. The window of opportunity for 65nm Barcelona to be competitive with the current Xeon 5400 and 7300 processors would probably close in late 2008 when the next generation Intel 45nm processors are released.

Intel Seaburg Chipset (5400)

The Intel 5400 chipset arrived about the same time as the Xeon 5400 series processors. The 5400 chipset supports both 65nm and 45nm Xeon 5100-5400 processors. Improvements in the 5400 over the 5000P/X chipset include an increase in the number of PCI-E lanes from 24 to 36 for 9 x4 PCI-E ports (in addition to the ESI port). Maximum memory capacity is increased to 128GB over 16 DIMM sockets, meaning 8GB DIMMs are required for max memory capacity. The maximum practical memory remains 64GB until 8GB DIMM prices become reasonable, probably in the 2009 time frame. The PCI-E lanes configured as x16 ports support the new PCI-E generation 2 specification with 5Gbps signaling, compared with 2.5Gbps in gen 1. For now, this feature is only for workstation graphics and specialty HPC adapters. Some time in the future we may see x4 PCI-E gen 2 slots at 5Gbps.

Both the PowerEdge 2900 and ProLiant ML370G5 retain the older 5000P or X chipset. Neither vendor felt it was necessary to transition the workhorse server platform to the new 5400 chipset, even though the increased number of PCI-E ports would be a highly welcome improvement.

Core 2 Processors

As mentioned earlier, the first Core 2 processors were manufactured on a 65nm process and featured 4M L2 cache shared between two cores. The second generation Core 2 processors are 45nm with 6M L2 cache. The 45nm processors are only available in the Xeon 5400 and 5200 series. The Xeon 7300 series for 4-way systems will use the 65nm processors until late 2008. Right now, this is not much of an issue as the top Xeon 5400 series frequency (with 1333MHz FSB) is 3.16GHz compared with 2.93GHz for the top Xeon 7300 series. It is possible that the top 45nm 5400 frequency could be pushed higher, while the top 65nm 7300 will most likely stay put. The 50% larger cache on the 45nm version would be helpful in high call volume applications on 4-way systems, but this difference is not sufficient to be an absolute requirement. In late 2008, Intel will launch a 45nm processor based on the Core 2 architecture, codename Dunnington, for 4-way systems. Details released show a single die six core processor, whereas the current Intel quad core processors consist of 2 dual core dies in a single package, or processor socket. In Dunnington, each pair of cores will have 3M L2 cache, and the entire die will have a large 12-16M L3 cache shared between all cores. The expectation is that this strategy should benefit throughput oriented applications. The late 2008 time frame should also see the next generation Intel processor architecture, code name Nehalem. It is unclear how Nehalem systems available in late 2008 will compare to a 4-way Dunnington server with a total of 24 cores.

SAS RAID Controllers

Many of the first generation SAS RAID controllers are based on the Intel 80333 IO processor. There is now a next generation 8134x family of IO processors with improved processing and memory bandwidth. None of the 80333 generation SAS RAID controllers could drive more bandwidth than is available on a x4 PCI-E port. The major system vendors seem to be slow in adopting the new generation 8134x controllers. It is unclear whether the new generation IO controllers can drive more bandwidth than is provided by a single x4 PCI-E port.

Memory

For SQL Server 2000, there are many considerations, like Enterprise Edition versus Standard Edition, 3GB and PAE. For SQL Server 2005, especially full 64-bit, my recommendation is to fill the DIMM sockets with 2GB DIMMs. When 4GB DIMMs come down in price to around parity in cost per GB with 2GB DIMMs, the recommendation changes to 4GB. Some systems accommodate additional memory cards. Be sure to order the system with all memory cards installed. Trying to get it afterwards can be a real pain, especially if the cost is about $100.

IO Bandwidth

I will discuss storage performance in more detail later. In brief, the game is brute force performance. This is achieved with many physical hard drives distributed over several IO controllers. Consider 4-8 PCI-E SAS controllers or FC HBAs as the system allows. Start with 1 rack of (10-25) external disks per controller, adding a second rack per controller only when all PCI-E slots have been filled with disk controllers.
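To get a feel for the scale this implies, the guideline can be turned into a disk count range. A small sketch under the stated rule of thumb (one rack of 10-25 disks per controller to start); the function name and defaults are illustrative.

```python
def disk_count_range(controllers: int,
                     racks_per_controller: int = 1,
                     disks_per_rack: tuple = (10, 25)) -> tuple:
    """Return (min, max) physical disks for a given controller layout."""
    low, high = disks_per_rack
    racks = controllers * racks_per_controller
    return racks * low, racks * high


print(disk_count_range(4))                          # 4 controllers, 1 rack each
print(disk_count_range(8))                          # 8 controllers, 1 rack each
print(disk_count_range(8, racks_per_controller=2))  # second rack added last
```

Even the smallest layout here is 40 disks, which is the sense in which the game is brute force.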

Dell PERC6 RAID Controller Performance

Last month, Scott pointed out the really bad performance characteristics of the Dell PERC6 in RAID0 sequential write, particularly compared with RAID-5. Granted, this is not necessarily a red flag because few people use RAID 0 in production. Still, if one can't write code or test correctly, one should not be in the hardware/firmware business. Dell recently released firmware 6.1.1, the previous was 6.0.3. The driver was updated from 2.20 to 2.23.

My results for 2 x 4 disk RAID0 arrays on PERC6, writing to another PERC6 with same disks.

Large block Read test: 690-713 MB/sec

SQL Server backup test

Old 6.0.3 firmware: 81 MB/sec Write-Back, 129 MB/sec Write-Thru

New 6.1.1 firmware: 354 MB/sec Write-Back, 631 MB/sec Write-Thru

The old driver with new firmware has the same results as the new driver with new firmware, so this was really a firmware issue. The old PERC5 did not have RAID 0 write performance as bad as the PERC6 with old firmware. Random write IO testing shows mixed results between write-thru and write-back, some favoring WT, some WB, but not by a large amount.
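The size of the firmware fix is easier to see as ratios. The sketch below uses only the backup test figures and the large block read range reported above; the variable names are illustrative.

```python
# SQL Server backup test results from above, in MB/sec.
backup_mb_s = {
    "6.0.3": {"write_back": 81, "write_thru": 129},
    "6.1.1": {"write_back": 354, "write_thru": 631},
}
LARGE_BLOCK_READ_MB_S = (690, 713)  # measured read range on the source arrays

# Improvement factor from old to new firmware, per cache mode.
speedup = {
    mode: backup_mb_s["6.1.1"][mode] / backup_mb_s["6.0.3"][mode]
    for mode in ("write_back", "write_thru")
}

# How close the best backup rate gets to the read ceiling of the source.
best_write = backup_mb_s["6.1.1"]["write_thru"]
fraction_of_read = tuple(best_write / r for r in LARGE_BLOCK_READ_MB_S)

print({mode: round(x, 1) for mode, x in speedup.items()})
print([round(f, 2) for f in fraction_of_read])
```

The new firmware is roughly 4.4x faster in write-back and 4.9x faster in write-thru, and the write-thru backup now runs at close to 90% of the measured read rate of the source arrays.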

A curious note. The Dell TPC-E report for the R900 with 4 x Xeon X7460 used the LSI MegaRAID SAS 8888ELP controllers, not the PERC6E (there was a PERC6i for internal drives). Both PERC6 and LSI 8888 use LSI components (and common drivers?)