
I would like to get more detailed information on high-end SANs, but deep material is hard to find. Here is a blog on VMAX versus VSP

SAN Storage Systems

As Linchi Shea pointed out, SAN stands for Storage Area Network. A storage system that connects to a SAN is a SAN-based storage system, but it is common to refer to the SAN-based storage system itself as "the SAN."

A SAN-based storage system is really just one or more computers that manage storage. The SAN computer (or controller, or service processor) has front-end ports for host connectivity and back-end ports for storage connectivity. Fibre Channel was the popular choice for both front- and back-end connectivity in the late 1990s and early 2000s, as there were many issues with SCSI in complex systems.


Most storage systems have multiple controllers for full redundancy in both components and paths. High-end SAN systems have always supported other front-end options, especially for mainframe connectivity. Recently, iSCSI connectivity has also become a popular option. Several storage systems are iSCSI-only on the front-end, although this is not recommended for IO-intensive databases. SAS has become popular on the back-end, first in the low-end and now even in the high-end, as FC has no performance advantage there while adding a great deal of expense.

In observing the storage system market, I believed there were three classifications: entry, mid-range, and high-end enterprise SAN systems. Both entry and mid-range systems typically employ dual controllers for failover. The entry system typically has limited expansion capability (in terms of the number of IO ports, front and back), often one Fibre Channel loop pair on the back-end, but perhaps multiple ports on the front-end to eliminate the need for a switch in small environments. The mid-range system typically has multiple ports on both the front- and back-end, with some expansion capability. High-end systems usually comprise more than two controllers, and resemble a NUMA system with a crossbar, or multiple interconnected systems.


Recently, I came across EMC documentation stating that departmental systems are usually active-passive and enterprise systems are active-active. In an active-passive system, regardless of the number of controllers, each LUN should have more than one path, but only one is active. LUNs may have different preferred active controllers, but any single LUN will only be accessed through one controller at a time. An enterprise SAN is usually active-active, in that each LUN can be accessed through any controller. The definitions may change over time as fits the technology components. Last year, EMC abandoned the crossbar architecture of the DMX line for an interconnected system of 2-way quad-core Xeon 5400 processors. The interconnect technology was not described, but it could be InfiniBand.
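The active-passive versus active-active distinction can be sketched in a few lines of code. This is a hypothetical model for illustration only, not any vendor's API; the controller names and the round-robin policy are my assumptions.

```python
# Hypothetical sketch of LUN path selection (no vendor API implied).

class Lun:
    def __init__(self, lun_id, controllers, preferred=None):
        self.lun_id = lun_id
        self.controllers = controllers   # all controllers with a path to the LUN
        self.preferred = preferred       # active-passive: the one owning controller

def pick_path(lun, mode, failed=()):
    """Return the controller that should service an IO for this LUN."""
    alive = [c for c in lun.controllers if c not in failed]
    if not alive:
        raise IOError("no surviving path to LUN %d" % lun.lun_id)
    if mode == "active-passive":
        # All IO goes through the preferred owner; fail over only if it is down.
        return lun.preferred if lun.preferred in alive else alive[0]
    # active-active: any controller can service the IO (round-robin here).
    return alive[lun.lun_id % len(alive)]

lun = Lun(7, ["SP-A", "SP-B"], preferred="SP-A")
print(pick_path(lun, "active-passive"))                   # SP-A (the owner)
print(pick_path(lun, "active-passive", failed=["SP-A"]))  # SP-B (failover)
print(pick_path(lun, "active-active"))                    # any controller is valid
```

The point of the sketch: in active-passive mode the second controller is pure standby capacity for that LUN, while in active-active mode both controllers contribute bandwidth to every LUN.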

Older Observations

8Gb/s FC is now appearing in more new SAN models. Most SAN storage systems today are 4Gb/s on the back-end; the front-end might support 8Gb/s FC. SAN vendors are usually not quick to move to the latest technology. On the front-end, a change only involves the HBA. The back-end is more complicated, also involving the disk drives and the enclosures, which might have custom FC components. Personally, I think storage vendors should just ditch FC on the back-end of mid-range systems and go to SAS, like the Hitachi AMS. Otherwise, customers should ditch the mid-range and go with multiple entry-level systems.

(This probably belongs somewhere else.) The SAN configuration employs four dual-port HBAs and four Fibre Channel loop pairs on the back-end. Each FC loop pair consists of just that: two FC loops, each loop connected to a different storage/service processor (SP), per vendor-specific terminology.

SAN Cache

For some inexplicable reason, people seem to want to believe that a large cache on the SAN can compensate for a low disk count. A separate disk cache for a database engine is fundamentally a silly idea: the database engine is itself a disk cache, specially optimized for databases.

Suppose the database has an 8GB data buffer. What do you really expect to find in a 4GB SAN cache that is not already in the database engine's internal cache? If the SAN cache is larger than the database cache (i.e., system memory), then one should probably find another person to do system configuration. Accessing cache in the database engine is much more efficient than going out to the storage controller cache.
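The argument can be put in back-of-envelope numbers. The latencies and hit ratios below are illustrative assumptions, not measurements from any particular system; the point is that reads which miss the database buffer are by construction the cold pages, so a SAN cache that sees the same access stream rarely holds them either.

```python
# Illustrative arithmetic only; figures are assumptions, not measurements.

t_buffer_us  = 1      # page already in the database buffer cache (~1 us)
t_san_hit_us = 200    # round trip to the SAN, hit in controller cache
t_disk_us    = 5000   # round trip to the SAN plus a physical disk read (~5 ms)

def avg_read_us(db_hit, san_hit_on_miss):
    """Average read latency in microseconds for the given hit ratios."""
    miss = 1.0 - db_hit
    return (db_hit * t_buffer_us
            + miss * (san_hit_on_miss * t_san_hit_us
                      + (1.0 - san_hit_on_miss) * t_disk_us))

# Pages that miss the DB buffer are cold, so the SAN-cache hit ratio on
# those misses is low; even a generous ratio changes little.
print(avg_read_us(db_hit=0.99, san_hit_on_miss=0.05))  # 48.59 us
print(avg_read_us(db_hit=0.99, san_hit_on_miss=0.50))  # 26.99 us
```

Even raising the assumed SAN-cache hit ratio tenfold only roughly halves the average, because the disk-bound misses still dominate; growing the database buffer attacks the dominant term directly.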

All this having been said, a large write cache on the storage controller is beneficial in handling checkpoints and other write bursts. RAID controllers should probably be offered with 2-4GB of cache, not the 256-512MB typically offered.

One person said their performance improved when increasing the SAN cache from 80GB to 120GB (the database server had 48GB of memory), and that this was an indication that increasing the SAN cache was not silly. The reason performance improved was most probably that the SAN disk subsystem had really bad performance, and increasing the SAN cache got the entire database in-memory (in the SAN cache).

First, increasing memory on the SQL Server would have eliminated disk reads from the SAN entirely (all writes must still be written eventually). Memory for a standard server may have cost $100/GB or so back in 2008; memory on a high-end SAN system probably cost more like $2-4K per GB.
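Using the figures cited above (circa-2008 ballpark prices, not quotes), the cost gap for that same 40GB increment works out as follows:

```python
# Rough cost comparison using the per-GB figures cited in the text.
server_mem_per_gb = 100    # $/GB, standard server memory (assumed)
san_cache_per_gb  = 3000   # $/GB, midpoint of the $2-4K/GB range cited

extra_gb = 40              # the 80GB -> 120GB SAN cache increase described
print(extra_gb * san_cache_per_gb)   # 120000 -- as SAN cache
print(extra_gb * server_mem_per_gb)  # 4000   -- as server memory
```

That is roughly a 30x price difference for memory that, as argued above, is also in the less effective location.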

Second, reducing disk IO with the SQL Server buffer cache is far more beneficial than with the SAN cache. Serving a read from the SAN cache means evicting a page from the SQL Server buffer cache, issuing a disk IO from the Windows OS to the SAN, getting a cache hit there (no actual hard disk IO), sending the page back to Windows, and finally loading it into the SQL Server buffer cache; that whole chain is far more expensive than simply reading the page from the buffer cache in the first place.
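The chain of steps above can be tallied up. The per-step costs below are illustrative assumptions (no measurements from any particular stack), but they convey the orders of magnitude involved:

```python
# Sketch of the cost chain for a read that misses the SQL Server buffer
# but hits the SAN controller cache. Step costs are assumed, in microseconds.
san_cache_hit_path_us = {
    "evict a page from the SQL Server buffer": 5,
    "build and issue the IO through the Windows storage stack": 20,
    "HBA and fabric transit to the controller": 50,
    "controller cache lookup and transfer back": 100,
    "complete the IO and load the page into the buffer": 25,
}
buffer_cache_hit_us = 1.0  # ~1 us: just read the page already in memory

total = sum(san_cache_hit_path_us.values())
print(total)                        # 200 us for the full round trip
print(total / buffer_cache_hit_us)  # 200x the cost of a buffer-cache hit
```

Under these assumptions, a "fast" SAN-cache hit is still a couple of orders of magnitude slower than not leaving the database engine at all, which is the core of the argument against trading server memory for SAN cache.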