PCI-E, SAS, FC,
RAID Controllers, Direct-Attach,
SAN, Dell MD3200, CLARiiON AX4, CX4, VNX, V-Max, HP P2000, EVA, P9000/VSP, Hitachi AMS
SSD products: SATA SSDs, PCI-E SSDs, Fusion-io, other SSDs
Reference: Fibre Channel Industry Association (FCIA)
I am not a fan of Fibre Channel over Ethernet (FCOE), or even FC for that matter. FC uses single links, so even at 8Gbit/s, with net bandwidth in the 700-800MB/s range, it is too weak for modern systems. SAS, on the other hand, naturally uses multiple links, with x4 being common; even at 6Gbit/s per lane, the total realizable bandwidth of 2.2GB/s is fine for recent (2009-2012) generation systems.
Back to FCOE. Hardware vendors advocate FCOE perhaps because there is some commonality in generating 10GbE and 8Gbps FC signals. Then there could be some simplifications of the server room environment with common network switches.
This might be a worthwhile long-term objective, but I would rather see FC as a storage protocol dropped altogether in favor of InfiniBand. At the present time, I see no justification for promoting FCOE in production environments. My understanding is that 10GbE switches are more expensive than 8Gbps FC on a per-port basis. So promoting FCOE will further increase pressure to accept fewer ports for critical database servers.
Furthermore, my understanding was that FCOE and 10GbE did not scale with the number of ports while 8Gbps and previous versions of FC scaled with ports just fine. Apparently Intel was fully aware of this, and has been working to resolve this matter. (Search Intel Data Direct IO or DDIO.) Below is an Intel chart on 10GbE scaling.
For some reason, it appears scaling from 2 to 4 10GbE ports is poor, while there is continued scaling to 6 and 8 ports.
It is spelled Fibre Channel to emphasize that the medium is not necessarily optical fiber. Or it might be that someone thought fibre looked more sophisticated. Fibre Channel started out at 1Gb/s, which after encoding overhead provided about 100MB/s bandwidth, roughly comparable to its contemporary SCSI Ultra2 at 80MB/s on a 16-bit bus. FC moved to 2Gbps in 2001, soon after Ultra-160 (1999), then to 4Gbps in 2005 and to 8Gbps in 2008. These dates are for first availability. The complete storage system typically takes another 2 years to complete the move on the HBAs, switches, hard disks, and disk enclosures.
The 7-year interval for a 4X increase in bandwidth from 2Gbps to 8Gbps was probably a mistake that left FC vulnerable to other protocols. The more serious mistake might have been staying with a single lane, unlike SAS, which employed 4 lanes as the standard connection. Signaling at 4Gbps in 2005 was very advanced for the time, but the bandwidth was not adequate.
A dual-port 4Gb/s FC HBA is a good match for a PCI-E gen1 x4 slot (2 x 4Gb to 4 x 2.5Gb). To make best use of system IO bandwidth, the gen1 x8 slot should be populated with a quad-port 4Gb/s FC HBA. Some Intel 4-way Xeon systems with the 7300MCH have one or two PCI-E bridge expanders that allow two x8 slots to share the upstream bandwidth of one x8 port. In this case, it is recommended that one slot be populated with a storage controller and the other with a network controller, as simultaneous heavy traffic is predominately in opposite directions.
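The slot-matching logic above can be checked with simple arithmetic. This sketch assumes PCI-E gen1 lanes at 2.5Gb/s and gen2 at 5.0Gb/s, both 8b/10b encoded (250MB/s and 500MB/s usable per lane), and a net 400MB/s per 4Gb/s FC port.

```python
# Matching FC HBA aggregate bandwidth to PCI-E slot bandwidth.
PCIE_MB_PER_LANE = {1: 250, 2: 500}   # PCI-E generation -> MB/s per lane
FC_NET_MB = {4: 400, 8: 800}          # FC speed (Gb/s) -> net MB/s per port

def slot_bw(gen, lanes):
    """Usable slot bandwidth in MB/s."""
    return PCIE_MB_PER_LANE[gen] * lanes

def hba_bw(fc_gbps, ports):
    """Aggregate HBA bandwidth in MB/s."""
    return FC_NET_MB[fc_gbps] * ports

# dual-port 4Gb FC in a gen1 x4 slot: 800 MB/s HBA vs 1000 MB/s slot
print(hba_bw(4, 2), slot_bw(1, 4))
# quad-port 4Gb FC in a gen1 x8 slot: 1600 MB/s HBA vs 2000 MB/s slot
print(hba_bw(4, 4), slot_bw(1, 8))
```

In both cases the slot has a modest amount of headroom over the HBA, which is why a dual-port card in an x8 slot wastes half the slot's bandwidth.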
| FC HBA          | PCI-E Gen 1 | PCI-E Gen 2 |
| dual-port 4Gb/s | x4          | -           |
| quad-port 4Gb/s | x8          | -           |
| dual-port 8Gb/s | x8          | x4          |
A dual-port 8Gb/s FC HBA should be placed in a x8 PCI-E gen1 slot or a x4 PCI-E gen2 slot. I am not aware of any quad-port 8Gb/s FC HBAs for gen2 x8 slots, much less an 8-port for the gen2 x16 slot. Update (2010 Oct): both QLogic and Emulex have recently released (or announced) quad-port 8Gb FC HBAs.
Many documents state that the net bandwidth achievable on 2Gb/s FC is in the range of 160-170MB/s, and 320-360MB/s for 4Gb/s FC. 4Gbit/s translates to 500MB/s decimal. After the 8b/10b encoding overhead, the remaining bandwidth is 400MB/s. On top of this there is the FC protocol overhead, so the net bandwidth is less.
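The chain from nominal to net bandwidth can be worked step by step. The ~10% figure used here for FC framing/protocol overhead is an assumption for illustration, chosen to land near the 360MB/s upper end of the observed range.

```python
# Step-by-step: why "4Gb/s FC" yields well under 500MB/s in practice.
line_rate_gbps = 4.0
raw_mb = line_rate_gbps * 1000 / 8     # 500 MB/s, decimal bytes
after_8b10b = raw_mb * 8 / 10          # 400 MB/s of usable payload bits
after_protocol = after_8b10b * 0.90    # ~360 MB/s, assumed 10% framing cost

print(raw_mb, after_8b10b, round(after_protocol))  # 500.0 400.0 360
```

The same chain at 2Gb/s gives 250 → 200 → ~180MB/s nominal, which is consistent with the 160-170MB/s commonly reported.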
Note that bandwidths are always quoted in decimal, 1M = 1,000,000, while storage capacity may be given in binary by the operating system (disk vendor standard practice is to use decimal capacity). In binary, 1KB = 1024 bytes and 1MB = 1024KB = 1,048,576 bytes. Disk performance counters are in decimal.
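This decimal/binary mismatch is the classic reason a disk appears "smaller" in the OS than on the box. A quick check:

```python
# Decimal (vendor/bandwidth) vs binary (OS) unit arithmetic.
ONE_MB_DECIMAL = 1000 * 1000     # 1,000,000 bytes
ONE_MB_BINARY  = 1024 * 1024     # 1,048,576 bytes

# A "1TB" (decimal) disk as reported by an OS that counts in binary:
tb_decimal = 10**12
print(round(tb_decimal / 2**30, 1))  # ~931.3 "GB" shown by the OS
```

So a 7% gap at the TB scale is purely a units artifact, not missing capacity.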
On the gap between observed and nominal bandwidth: back in the 2Gb/s FC days, I investigated this matter. It was possible to achieve 190MB/s from host to SAN cache, but only 165MB/s from host to storage controller, then over the back-end FC loop to disk and back. The disks in the back-end are in a loop, with 15-120 disks in one loop path. It is possible that the number of disks in a loop influences the maximum achievable bandwidth.
In the 4Gb/s FC generation, EMC introduced the UltraPoint DAE with a star/point-to-point topology to the disks within an enclosure. This might be what allows EMC to achieve 360MB/s per 4Gb/s FC port.
There are currently two main vendors for FC controllers and HBAs: Emulex and QLogic. Keep in mind that the SAN controller itself is just a computer system and also has HBAs, for both front-end and back-end as applicable, from these same FC controller vendors. It might be a good idea to match the HBA, firmware, and driver on both the host and SAN, but this is not a hard requirement.
On the Emulex HBAs, driver settings that used to be in the registry are now set from the HBAnyware utility. Of particular note are the pairs Queue Depth and Queue Target, and CoalesceMsCnt and CoalesceRspCnt. Various Microsoft documents say, without qualification, that increasing Queue Depth from the default of 32 to the maximum value of 254 can improve performance. The QLogic management tool is SANsurfer (search: SANsurfer FC HBA Manager Users Guide).
In life, there are always qualifications. At one time, this queue depth setting applied to the entire HBA. Now on Emulex, the default scope is per LUN, with per target as the option. The general concept is that in a SAN storage system with hundreds of disk drives, limiting the queue depth generated by any one server helps prevent overloading the SAN. Well, a line-of-business SQL Server database means it runs the business, and its host is the most important one. So increasing the allowed queue depth helps.
Notice that I said a storage system with hundreds of disks. What if the storage system only has 30 disks? Does increasing queue depth on the HBA help? Now that Emulex defaults to per LUN, what if each LUN only comprises 15 disks? The Microsoft Fast Track Data Warehouse papers recommend LUNs comprised of 2 disks. What should the per LUN queue depth be?
My thinking is it should be anywhere from 2 to 32 per physical disk in the critical LUN. The disk itself has command queuing for up to 64 tasks (128 on current Seagate enterprise drives?). Piling on the queue increases throughput at the expense of latency. In theory, restricting the queue depth to a low value might prevent one source from overloading the LUN. An attempt to test this theory showed no difference in queue depth setting over a certain range.
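The rule of thumb above can be turned into a small sizing sketch. The function name and the per-disk default of 8 are illustrative assumptions, not vendor parameters; the 254 cap is the Emulex HBA maximum mentioned earlier.

```python
# Hypothetical per-LUN queue depth sizing: 2-32 outstanding I/Os per
# physical disk in the LUN, capped at the HBA maximum (254 on Emulex).

HBA_MAX_QUEUE_DEPTH = 254

def lun_queue_depth(disks_in_lun, per_disk=8):
    """Suggested per-LUN queue depth, capped at the HBA maximum."""
    return min(disks_in_lun * per_disk, HBA_MAX_QUEUE_DEPTH)

print(lun_queue_depth(2))    # 2-disk Fast Track style LUN -> 16
print(lun_queue_depth(15))   # 15-disk LUN -> 120
print(lun_queue_depth(60))   # large LUN hits the HBA cap -> 254
```

The point of the cap is the latency trade-off noted above: beyond a certain depth, additional outstanding I/Os stop improving throughput and only queue up latency.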
Note: Queue depth has meaning at multiple locations: at the operating system, on the HBA, on the SAN storage controller, possibly both front and back-end HBAs, and on the disk drive itself.
QLogic may use the term Execution Throttle for queue depth. To verify, search for that term.
The QLogic SANsurfer document provides some diagrams to demonstrate HBA configuration. Below is a single path to each target/LUN. Note that QLogic and Emulex use the term Target differently.
Below, there are alternative paths for Targets 0, 1, and 2.
Below is a cluster, each host having a single path to each LUN.
Below, the connection between the HBA and storage go through the SAN fabric.