
Storage Configuration Part II

Fibre Channel (FC)

It is spelled Fibre Channel to emphasize that the medium is not necessarily fiber, or perhaps because someone thought fibre looked more sophisticated. For a long time FC signaling stayed put at 4Gbit/sec, which I consider to be a serious mistake. The mistake might also have been in staying with a single lane, unlike SAS, which adopted 4 lanes as the standard connection.

Anyway, a dual-port 4Gb/s FC HBA is a good match for a PCI-E x4 slot. To make best use of system IO bandwidth, an x8 slot should be populated with a quad-port 4Gb/s FC HBA. Some Intel 4-way Xeon systems with the 7300 MCH have one or two PCI-E bridge expanders that allow two x8 slots to share the upstream bandwidth of one x8 port. In this case, it is recommended that one slot be populated with a storage controller and the other with a network controller, as simultaneous heavy traffic is predominantly in opposite directions.

             PCI-E Gen 1               PCI-E Gen 2
             x4            x8          x4            x8
4Gb/s FC     dual-port     quad-port   quad-port     ?
8Gb/s FC     single-port   dual-port   dual-port     quad-port?

A dual-port 8Gb/s FC HBA should be placed in an x8 PCI-E gen 1 slot or an x4 PCI-E gen 2 slot. I am not aware of any quad-port 8Gb/s FC HBAs for the gen 2 x8 slot, much less an 8-port HBA for the gen 2 x16 slot.
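
As a rough sanity check on the slot matching above, the sketch below compares nominal slot bandwidth against aggregate HBA port bandwidth. The per-lane and per-port figures are nominal assumptions (roughly 250MB/s per gen 1 lane, 500MB/s per gen 2 lane, and about 400MB/s usable per 4Gb/s FC port after encoding), not measured values.

# Back-of-envelope check: does the HBA's aggregate port bandwidth
# roughly fill the PCI-E slot it is placed in? Nominal figures only.
PCIE_LANE_MBPS = {1: 250, 2: 500}      # usable MB/s per lane, gen 1 and gen 2
FC_PORT_MBPS   = {4: 400, 8: 800}      # usable MB/s per FC port (4Gb/s, 8Gb/s)

def slot_fill(gen, lanes, fc_gbps, ports):
    slot = PCIE_LANE_MBPS[gen] * lanes
    hba  = FC_PORT_MBPS[fc_gbps] * ports
    return slot, hba

for gen, lanes, fc, ports in [(1, 4, 4, 2), (1, 8, 4, 4), (1, 8, 8, 2), (2, 4, 8, 2)]:
    slot, hba = slot_fill(gen, lanes, fc, ports)
    print(f"gen{gen} x{lanes}: slot {slot} MB/s vs {ports}-port {fc}Gb/s FC {hba} MB/s")
# gen1 x4: 1000 MB/s slot vs 800 MB/s dual-port 4Gb/s FC -> good match
# gen1 x8: 2000 MB/s slot vs 1600 MB/s quad-port 4Gb/s or dual-port 8Gb/s FC
# gen2 x4: 2000 MB/s slot vs 1600 MB/s dual-port 8Gb/s FC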

Fibre Channel HBAs

There are currently two main vendors for FC controllers and HBAs, Emulex and QLogic. Keep in mind that the SAN controller itself is just a computer system and also has HBAs, for both front-end and back-end as applicable, from these same FC controller vendors. It might be a good idea to match the HBA, firmware and driver on both the host and SAN, but this is not a hard requirement.

On the Emulex HBAs, driver settings that used to be in the registry are now set from the HBAnyware utility. Of particular note are the pairs Queue Depth and Queue Target, and CoalesceMsCnt and CoalesceRspCnt. Various Microsoft documents say, without qualification, that increasing Queue Depth from the default of 32 to the maximum value of 254 can improve performance.

In life, there are always qualifications. At one time, this queue depth setting applied to the entire HBA. Now on Emulex the default scope is per LUN, with the option of per target. The general concept is that in a SAN storage system with hundreds of disk drives, limiting the queue depth generated by any one server helps prevent overloading the SAN. But a line-of-business SQL Server database, by definition, runs the business, and that makes it the most important host, so increasing the allowed queue depth helps.

Notice that I said a storage system with hundreds of disks. What if the storage system only has 30 disks? Does increasing queue depth on the HBA help? Now that Emulex defaults to per LUN, what if each LUN only comprises 15 disks? The Microsoft Fast Track Data Warehouse papers recommend LUNs comprised of 2 disks. What should the per LUN queue depth be?

My thinking is that it should be anywhere from 2 to 32 per physical disk in the critical LUN. The disk itself supports command queuing of up to 64 tasks (128 on current Seagate enterprise drives?). Piling on the queue increases throughput at the expense of latency. In theory, restricting the queue depth to a low value might prevent one source from overloading the LUN. An attempt to test this theory showed no difference across a certain range of queue depth settings.
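
A minimal sketch of this rule of thumb, assuming the per-LUN scope described above. The 2-32 per-disk range and the 254 cap are my working numbers, not vendor guidance.

# Rough per-LUN queue depth range: 2 to 32 outstanding IOs per physical
# disk in the LUN, capped at the HBA maximum of 254 per LUN.
HBA_MAX_QUEUE_DEPTH = 254

def lun_queue_depth_range(disks_per_lun, per_disk_low=2, per_disk_high=32):
    low  = min(disks_per_lun * per_disk_low,  HBA_MAX_QUEUE_DEPTH)
    high = min(disks_per_lun * per_disk_high, HBA_MAX_QUEUE_DEPTH)
    return low, high

# A 2-disk Fast Track style LUN vs a 15-disk LUN
print(lun_queue_depth_range(2))    # (4, 64)
print(lun_queue_depth_range(15))   # (30, 254) - capped by the HBA maximum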

Note: Queue depth has meaning at multiple locations: at the operating system, on the HBA, on the SAN storage controller, possibly both front and back-end HBAs, and on the disk drive itself.

SAN Storage Systems

As Linchi Shea pointed out, SAN stands for Storage Area Network. A storage system that connects to a SAN is a SAN-based storage system, but it is common to refer to the SAN-based storage system itself as the SAN.

Many documents state that the bandwidth achievable on 2Gb/s FC is in the range of 160-170MB/sec, and 320-360MB/sec for 4Gb/s FC. Nominally, 4G bits translates to 500M bytes decimal. Let's assume a protocol overhead of 20%, leaving 400M. Then translate this to MB binary, where 1MB = 1,048,576 bytes; 400MB decimal is roughly 380MB binary. So there is still a gap between observed and nominal bandwidth. Back in the 2Gb/s FC days, I investigated this matter and found that it was possible to achieve 190MB/sec from host to SAN cache, but only 165MB/sec from host to storage controller, then over the back-end FC loop to disk and back. The disks in the back-end are in a loop, with 15-120 disks in one loop path. It is possible that the number of disks in a loop influences the maximum achievable bandwidth.
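
Writing the arithmetic out (the 20% protocol overhead is the assumption being tested here):

# Nominal FC bandwidth vs the commonly quoted achievable numbers.
# Assumes ~20% protocol overhead; MB here is binary (1 MB = 1,048,576 bytes).
def fc_nominal_mb_binary(gbits_per_sec, overhead=0.20):
    raw_bytes = gbits_per_sec * 1e9 / 8          # decimal bytes/sec on the wire
    usable    = raw_bytes * (1 - overhead)       # after assumed protocol overhead
    return usable / 2**20                        # convert to binary MB/sec

print(round(fc_nominal_mb_binary(2)))   # ~191 MB/s nominal vs 160-170 observed
print(round(fc_nominal_mb_binary(4)))   # ~381 MB/s nominal vs 320-360 observed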

In the 4Gb/s FC generation, EMC introduced the UltraPoint DAE with a star-point topology to the disks within an enclosure. This might be what allows EMC to achieve 360MB/s per 4Gb/s FC port.

Most SAN storage systems today are 4Gb/s on the back-end. The front-end might be able to support 8Gb/s FC. SAN vendors are usually not quick about moving to the latest technology. On the front-end, it only involves the HBA. The back-end is more complicated, also involving the disk drives and the enclosures, which might have custom FC components. Personally, I think storage vendors should just ditch FC on the back-end for mid-range systems and go to SAS like the Hitachi AMS. Otherwise customers should ditch the mid-range and go with multiple entry-level systems.

The SAN configuration employs four dual-port HBAs and four fibre channel loop pairs on the back-end. Each FC loop pair consists of just that: two FC loops, each loop connected to a different storage/service processor (SP), depending on SAN-vendor-specific terminology.

EMC CLARiiON CX4

Some details of the EMC CLARiiON CX4 line are shown below. Each Clariion system is comprised of two Service Processors (SP). The SP is simply an Intel Core 2 architecture server system.

                               CX4-120              CX4-240              CX4-480              CX4-960
Processors per SP              1 dual-core 1.2GHz   1 dual-core 1.6GHz   1 dual-core 2.2GHz   2 quad-core 2.33GHz
Memory per SP                  3GB                  4GB                  8GB                  16GB
Max Cache                      600MB                1.264GB              4.5GB                10.764GB
Front-end FC ports (Base/Max)  4/8                  4/12                 8/16                 8/24
Back-end FC ports (Base/Max)   2/2                  4/4                  8/8                  8/16

The Clariion CX4 line came out in 2008. I do have some criticism of the choice of processors for each model. First, the Intel processor price list does not even show a 1.2GHz model in the Xeon 5100 or 3000 series, which means EMC asked Intel for a specially crippled version of the Core 2 processor. The Intel Xeon processors start at 1.6GHz for a dual-core with a price of $167. The quad-core X3220 2.4GHz has a price of only $198, so why in the world does EMC use a 1.2GHz dual-core at the low end? Sure, basic storage server operations do not require a huge amount of compute cycles, but all the fancy features (that really should not be used on a critical SQL Server system) the SAN vendors advocate do use CPU cycles. So when those features are used, performance tanks on the crippled CPU in the expensive SAN storage system.

Now what we really want at the mid-range 480 level is to have both processor sockets populated, as this lets the system use the full memory bandwidth of the Intel 5000 (or 5400) chipset, with 4 FB-DIMM memory channels. Yes, the 960 does have two quad-core processors, but I am inclined to think that the 960 (SP pair) with up to 16 back-end FC ports might be over-reaching the capability of the Intel 5000P chipset. If the CX4-960 in fact uses the 5400 chipset, then this might be a good configuration, but I have seen no documentation that the 960 can drive 5.6GB/sec. The quad-core E5405 2.00GHz processor is a mere $209 each, and the E5410 2.33GHz used in the high-end 960 model is $256 each. In late 2008, the dual-core E5205 1.86GHz was the same price as the quad-core E5405 2.0GHz. A Dell PowerEdge 2900 with 2 E5405 quad-core processors and 16GB was $2,300.

That is less than the cost of each of the quad-port FC adapters, of which there are two in each SP of the 480. Consider also the cost of the 480 and 960 base systems, and that the 16GB of memory in each 960 SP has a cost of around $800. Why not just fill the 16 DIMM sockets allowed by the 5000P chipset with 4GB DIMMs, at about $3,200 for 64GB per SP, unless it is because a large cache on a storage controller is really not that useful?
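
The memory cost arithmetic spelled out, using the rough 2008-era pricing assumed above (about $50 per GB, derived from the $800 for 16GB figure):

# Rough cost of filling all 16 DIMM sockets of a 5000P chipset SP with
# 4GB DIMMs, at ~$50/GB (assumed from the ~$800 per 16GB figure above).
price_per_gb = 800 / 16                      # ~$50/GB
dimm_sockets, dimm_size_gb = 16, 4
full_fill_gb   = dimm_sockets * dimm_size_gb   # 64GB per SP
full_fill_cost = full_fill_gb * price_per_gb   # ~$3,200 per SP
print(full_fill_gb, round(full_fill_cost))     # 64 3200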

My final complaint about the EMC Clariion line is the use of a slice of the first 5 disk drives for the internal operating system (which is Windows XP or some version of Windows). This results in those 5 disks having slightly less performance than the other disks, which can completely undermine the load balancing strategy. Given the price that EMC charges per disk, the storage system OS really should be moved to dedicated internal disks. If it seems that I am being highly critical of the EMC Clariion line, let me say now that the other mid-range SAN storage systems use even more pathetic processors. So the Clariion CX4 is probably the best of the mid-range systems.

HP StorageWorks 2000 Modular Storage Array

First, the model name and numbering system for the HP entry storage line is utterly incomprehensible. Perhaps the product manager was on powerful medications at the time, or there were 2 PMs who did not talk to each other. The official name seems to be StorageWorks 2000 Modular Storage Array, but the common name seems to be MSA2000 G2 (for the second generation). This name might just apply to the parent chassis family, comprised of the 2012 12-bay enclosure for 3.5in (LFF) drives and the 2024 24-bay enclosure for 2.5in (SFF) drives. The controller itself appears to be the MSA2300, with a suffix for the front-end interface. There are two models of interest for database systems: the 4Gb/s fibre channel fc model and the 3Gb/s SAS sa model. Do not even think of putting a critical database server on iSCSI. The choice is between fc and sa for the front-end interface. The configured unit might be the 2312 or 2324.

Apparently there is also the StorageWorks P2000 G3 MSA. This appears to consolidate the G2 fc and i (iSCSI) models, with 8Gb/s FC. Above this, HP has the P4000 series. I am not sure how this relates to the EVA 4400 series.

The back-end interface is SAS, which allows both SAS and SATA drives. The back-end can also connect to additional 12-bay LFF enclosures (MSA2000) or 25-bay SFF enclosures (MSA70). There is the option of a single controller or dual controllers. The storage expansion enclosures can have single or dual IO interfaces. My opinion is that SAS is the right choice for the back-end interface. FC incurs a large cost premium and has no real advantage over SAS. A single 4Gb/s FC port has one-third the bandwidth of a 3Gb/s x4 SAS port, and the ratio is the same for 8Gb/s FC to 6Gb/s x4 SAS.
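
The one-third ratio written out, using the same nominal per-port approximations as earlier (roughly 100MB/s of usable bandwidth per Gb/s of signaling):

# Nominal usable bandwidth: a single FC port vs an x4 SAS port of the
# corresponding generation (approximate, ignoring protocol differences).
def fc_mbps(gbps):     return gbps * 100       # ~400 MB/s for 4Gb/s FC, ~800 for 8Gb/s
def sas_x4_mbps(gbps): return gbps * 100 * 4   # ~1200 MB/s for 3Gb/s x4, ~2400 for 6Gb/s x4

print(fc_mbps(4) / sas_x4_mbps(3))   # 0.333... -> one third
print(fc_mbps(8) / sas_x4_mbps(6))   # 0.333... -> same ratio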

There are 2 FC ports per controller on the fc model, and four SAS ports on the sa model. There is a single (3Gb/s x4?) SAS port on the back-end. HP initially put out a performance report showing reasonable numbers for the MSA2000 G2 with 96 15K drives on the fc model, 22,800 random read IOPS and 1,200MB/sec sequential in RAID 10, but ridiculously low numbers of 10,800 IOPS and 700MB/s for the sa model. Either this was a benchmarking mistake, which seems unlikely given HP's history in this area, or there were bugs in the sa software stack. This was later corrected to 21,800 IOPS and 1,000MB/s. This configuration is essentially the maximum for the MSA2000 with 2.5in drives. The random reads work out to just over 225 IOPS per disk, but the sequential is only 12.5MB/sec per disk. I am presuming that 1GB/sec sequential could have been reached with about 40 disks. The Microsoft Fast Track Data Warehouse Reference Architecture 2.0 document seems to indicate that 100MB/sec per disk is possible for 2-disk RAID-1 groups.
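
The per-disk arithmetic behind those statements, assuming the full 96-drive configuration:

# Per-disk throughput implied by the aggregate MSA2000 G2 numbers above.
disks = 96
print(round(21800 / disks))   # ~227 random read IOPS per 15K disk (corrected sa number)
print(1200 / disks)           # 12.5 MB/s sequential per disk (fc) - far below what a
                              # 15K drive can stream, so the controller is the limit
print(1000 / 40)              # ~25 MB/s per disk would suffice to reach 1GB/s with ~40 disks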

See the HP MSA2000 Technical Cook Book for additional details. If the URL is not correct, search for either the title or document number 4AA2-5505ENW.

Hitachi AMS2500

I do not know much about the Hitachi AMS line, and have never worked on one (vendors should be alert to subtle, or not so subtle, hints). I point out this SAN storage system because Hitachi did submit an SPC benchmark report for it, with a price of about $1,500 per 15K disk, amortizing the controller and supporting components. Most SAN storage systems work out to $2,500 to $3,500 per 15K 73 or 146GB disk, and up to $6K per 450 or 600GB disk, which seems to be what SAN vendors like to push, with horrible performance consequences. The Hitachi AMS has FC on the front-end and SAS on the back-end. The HP MSA2000 and EMC Clariion AX also have SAS back-ends, but both are entry storage systems with limited back-end ports. The Hitachi AMS is a mid-range system comparable in bandwidth capability to the CX4 line. I reiterate that FC on the back-end is a major waste of money for less performance.

Enterprise Storage Systems

The big-iron storage systems are really beyond the scope of this document, but a couple of comments are worth noting. EMC's top of the line used to be the DMX-4, which was a cross-bar architecture connecting front-end, memory, and back-end. Last year (2009), the new V-Max line replaced the DMX-4. The V-Max architecture is comprised of up to 8 engines. Each engine is a pair of directors, and each director is a 2-way quad-core Intel Xeon 5400 system with up to 64GB memory (compared with 16GB for the CX4-960).

[V-Max diagram]

Each director also has 8 back-end 4Gb/s FC ports (comprised of quad-port HBAs?) and various options for the front-end, including 8 4Gb/s FC ports. In the full configuration of 128 4Gb/s FC ports on the front and back ends, the expectation is that this system could deliver 40GB/s if there are no bottlenecks in the system architecture. Of course, there is no documentation on the actual sequential capability of the V-Max system. EMC has not submitted SPC benchmark results for any of their product line.
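
Where the 40GB/s expectation comes from, roughly, assuming the ~320MB/s per 4Gb/s FC port figure from the achievable-bandwidth discussion earlier:

# Rough aggregate bandwidth expectation for a full V-Max configuration.
engines, directors_per_engine, fc_ports_per_director = 8, 2, 8
usable_mbps_per_4g_fc_port = 320      # assumed, low end of observed 4Gb/s FC

ports = engines * directors_per_engine * fc_ports_per_director   # 128 ports
print(ports, ports * usable_mbps_per_4g_fc_port / 1000)          # 128 ~41 GB/s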

EMC V-Max documentation does not say what the Virtual Matrix interface is, but I presume it is InfiniBand, as I do not think 4 or even 8Gb/s FC would be a good choice.

[V-Max diagram]

The main point here is that even EMC has decided it is a waste of time and money to build a custom architecture in silicon, and is just using the best of Intel Xeon (or AMD Opteron) architecture components. It should be possible to build even more powerful storage systems around the Intel Nehalem architecture infrastructure. Unfortunately, storage systems evolve slowly, usually lagging 1-2 generations behind server systems.

Disk Enclosures

The next step in the chain of devices from the system IO bus to the disk drive is the disk enclosure (EMC uses the term DAE, which will also be used here even for non-EMC enclosures). Some years ago, a 3U enclosure for 15 3.5in disk drives was more or less the only standard configuration.

HP may have been the first major vendor to switch to a 2U 12-disk enclosure for 3.5in drives.

[12-bay LFF enclosure]

The standard configuration for 2.5in drives seems to be a 2U enclosure for 24 disks (below)

[24-bay SFF enclosure]

or 25 drives.

[25-bay SFF enclosure]

And the legacy 15 LFF (3.5in) disk enclosure.

[15-bay LFF enclosure]

My view is that the 12-disk enclosure is better matched to a single IO channel than the 15-disk enclosure. The 24 disk SFF enclosure should be split into two separate channels.
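
A rough sketch of why, assuming ~100MB/s sequential per 15K drive (the Fast Track figure cited earlier), a 3Gb/s x4 SAS channel of roughly 1.2GB/s, and the ~360MB/s best case observed for a 4Gb/s FC loop:

# How many drives saturate one back-end channel at full sequential rate?
MBPS_PER_DRIVE  = 100     # assumed sequential rate per 15K drive
SAS_3G_X4_MBPS  = 1200    # nominal 3Gb/s x4 SAS channel
FC_4G_LOOP_MBPS = 360     # observed best case for a 4Gb/s FC loop

for name, channel in [("3Gb/s x4 SAS", SAS_3G_X4_MBPS), ("4Gb/s FC loop", FC_4G_LOOP_MBPS)]:
    print(name, channel // MBPS_PER_DRIVE, "drives saturate the channel")
# 3Gb/s x4 SAS: 12 drives -> matches a 12-bay enclosure; a 24-bay SFF
# enclosure should be split across two channels.
# 4Gb/s FC loop: 3-4 drives already saturate the loop for sequential IO.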

I am not aware of any SAN vendors offering high-density enclosures for 2.5in drives, except for HP in the StorageWorks 2000 MSA line. This may indicate a serious lack of appreciation (or even understanding) of the importance of performance over capacity.
