
Other Solid-State Storage Products Today

STEC

STEC makes the SSD for the EMC DMX line, possibly other models, and for several other storage vendors as well.

Zeus IOPS

The Zeus IOPS has FC 4Gb, SATA 3Gb, and SAS 3Gb interface options in 3.5in and 2.5in form factors, based on 2008 documentation. The STEC website lists the Zeus IOPS with FC 4Gb (3.5in form factor only) and SAS 3Gb and 6Gb (2.5in and 3.5in form factors) options. Capacity is up to 800GB for 3.5in and 400GB for 2.5in.

The STEC 2008 whitepaper and 2008 brochure list different specifications for the Zeus IOPS. The STEC website as of October 2010 presumably lists the specifications for the most recent Zeus IOPS.

 | Sequential Read | Sequential Write | Random Read IOPS | Random Write IOPS
Whitepaper (2008) | 250MB/s | 200MB/s | 52K | 17K
Brochure (2008) | 220MB/s | 115MB/s | 45K | 16K
New? (2010) | 350MB/s | 300MB/s | 80K | 40K

It is presumed that there have been multiple generations of the Zeus IOPS. Perhaps STEC will eventually distinguish each generation and model more clearly.

Zeus RAM

This appears to be a RAM-like device with a dual-port SAS interface and up to 8GB capacity. Average latency is 23μs.

Toshiba

Toshiba has an SSD line for which the sequential read is 250MB/s and the write is 180MB/s. The interface is SATA 3Gb/s. NAND and NOR flash were invented by Dr. Fujio Masuoka around 1980, while he was working at Toshiba.

Oracle (Sun)

Oracle-Sun has a comprehensive line of storage products, even accounting for the Exadata Storage System. The main Oracle Flash storage products comprise the F5100 Flash Array, the F20 PCIe Card, an SSD with SATA interface, and Flash Modules (these go in the F5100?). If someone could look into the Oracle Sun Flash Resource Kit, I would appreciate it.

Oracle F5100 Flash Array

The Oracle Sun F5100 Flash Array is a very progressive storage system. The interface has 16 x4 SAS channels, presumably that many for multi-host connectivity. The storage unit board is shown below. The modules to the left are Energy Storage Modules (ESM), i.e., supercapacitors.

[Image: F5100 storage unit board]

A Flash Module is shown below.

[Image: F5100 Flash Module]

Model | NAND | Capacity | Max Read | Max Write | Random Read IOPS | Random Write IOPS | Price
F5100 | SLC | 480GB | 3.2GB/s | 2.4GB/s | 397K | 304K | $46K
F5100 | SLC | 960GB | 6.4GB/s | 4.8GB/s | 795K | 610K | $87K
F5100 | SLC | 1920GB | 12.8GB/s | 9.7GB/s | 1.6M | 1.2M | $160K

Latency is listed as 378μs read and 245μs write. These are good numbers considering that this is a multi-host external storage system. The capacity is net, after 25% is reserved for wear leveling (over-provisioning). For anyone (excluding young people) who has ever lifted a hard disk storage unit: the weight of the F5100 is 35lb or 16kg.

Oracle F20 PCIe Card and SSD

The PCI-E card Flash unit has 96GB capacity, 100K IOPS, and costs $4,695. The SSD is 32GB, SLC, 2.5in, 7mm, rated for 250/170MB/s sequential read/write and 35K/3.3K IOPS, with a SATA interface. (I think this is the Intel X25-E?) Price is around $1,299? Sun also has Flash Modules in a SO-DIMM form factor with a SATA interface, 24GB capacity, and 64MB DRAM. I think these go in the F5100. Do Sun systems have SATA SO-DIMM connectors?

LSI

LSI has a new PCI-E SSD product, the LSI SSS6200, with SLC NAND. I am not sure if the product is actually in production, or just sampling.


The diagram below is more to the point. The SSS6200 uses the LSI SAS2008 SAS controller, with up to 6 SATA SSD modules.

[Diagram: SSS6200 with the LSI SAS2008 SAS controller and SATA SSD modules]

The specifications are very impressive, as it is fully current with PCI-E gen 2. Latency is 50μs. The reliability specifications are a 2M-hour MTBF and a BER of 1 in 10^17.

Model | NAND | Capacity | Max Read (64K) | Max Write (64K) | Random 4K Read IOPS | Random 4K Write IOPS | Price
SSS6200 | SLC | 100GB | ? | ? | 150K | 190K | ?
SSS6200 | SLC | 200GB | ? | ? | 150K | 190K | ?
SSS6200 | SLC | 300GB | 1.4GB/s | 1.2GB/s | 240K | 200K | ?

Other Solid-State Storage

Texas Memory Systems (RamSan)

Texas Memory Systems (RamSan) has been in the solid-state storage business for a long time. Their products include SAN systems with various solid-state options. The RamSan-440 is a SAN system with 512GB of DRAM as storage, 4.5GB/s, 600K IOPS at 15μs latency, and an FC interface (4Gbps was listed; I expect 8Gb is now available?). The RamSan-630 has 10TB of SLC Flash, 10GB/s, 1M IOPS, 250μs read latency, 80μs write latency, and FC or InfiniBand interfaces. There are also PCI-E SLC NAND products.

Violin-Memory

Violin Memory makes SAN-type solutions. Their website describes a technique for RAID on flash (vRAID), as disk-style RAID is not well suited to flash. Spike-free latency is also discussed.


DDR DRAM Storage

DRAM-based SSD vendors include DDR Drive and HyperOS Systems (HyperDrive).

The advantage of DRAM is that lead-off latency is far lower than with NAND (26-50μs at the chip level for NAND). The disadvantages are that DRAM is volatile and more expensive per GB than NAND, so some means is necessary to ensure data protection during a power loss (a supercapacitor can provide sufficient power to sweep data to NAND).

The DDRdrive is a consumer device with a PCI-E x1 interface. It can deliver 300K/200K+ read/write IOPS at 512B, reflecting the much faster access of DRAM over NAND, and it also uses NAND flash for power-loss protection. IOPS at 4K is 50K/30K+ and bandwidth is 215/155MB/s, reflecting the limit of PCI-E x1.
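As a rough sanity check, and assuming the card uses a PCI-E gen1 x1 link (my assumption, not stated by the vendor), the quoted 4K IOPS and bandwidth figures line up with the lane ceiling:

```python
# Back-of-the-envelope check of the DDRdrive figures against the PCI-E x1 ceiling.
# Assumes a PCI-E gen1 x1 link: 2.5 Gbit/s raw, 8b/10b encoding, so 250 MB/s
# per direction before packet/protocol overhead.
lane_raw_gbps = 2.5
encoding_efficiency = 8 / 10                        # 8b/10b
lane_MBps = lane_raw_gbps * 1000 * encoding_efficiency / 8
print(f"PCI-E gen1 x1 ceiling: {lane_MBps:.0f} MB/s per direction")

# Bandwidth implied by the quoted 4K random IOPS
read_iops_4k, write_iops_4k = 50_000, 30_000
print(f"Read : 50K  x 4KB = {read_iops_4k * 4096 / 1e6:.0f} MB/s")   # ~205 MB/s
print(f"Write: 30K+ x 4KB = {write_iops_4k * 4096 / 1e6:.0f} MB/s")  # ~123 MB/s
```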

More Vendors

I will add details as time permits, and depending on the level of interest in the SQL Server world.

Smart Modular Technologies, Bit Micro.

There is also a vendor that manufactures PCI-E expansion units, which might be of interest.

Complete Rethinking of Memory and Storage

I have also thought that it is well past time to completely rethink the role of DRAM in computer system architecture. For years now, the vast majority of DRAM has been used as data cache, if not by the application then by the operating system. The concept of a page file on HDD is totally obsolete. Is this not the signal that a smaller "true" main memory should be moved closer to the processor for lower latency? A completely separate DRAM system should then be designed to function as data cache, perhaps with the page file moved there. At the time, I was thinking SRAM for true main memory, but the Intel paper describing a DRAM chip mounted directly on the processor die with through connections would achieve the goal of much reduced latency along with massive bandwidth.

(Apparently there is a startup company RethinkDB that just raised $1.2M to do this on MySQL, "the database is the log". I hope our favorite database team is also rethinking the database?)

IBM Storage Class Memory

The FAST '10 tutorial by Dr. Richard Freitas and Lawrence Chiu of IBM, Solid-State Storage: Technology, Design and Applications, introduces(?) the term Storage Class Memory (SCM).

Fusion-IO NAND Flash as a High Density Server Memory

The FMS 2010 presentation "NAND Flash as a High Density Server Memory" by David Flynn, CTO (now CEO) of Fusion-IO, proposes NAND flash in the form of ioMemory modules as high-density server memory. The DRAM main memory holds the operating system, application, and ioMemory metadata, while the NAND flash ioMemory holds the data previously buffered in DRAM. My thinking is that for this to work for databases, there should be some logic capability in the flash memory.

References

Below are links to websites that cover non-volatile memory and solid-state storage.
As always, Wikipedia is a good starting point: non-volatile memory, Flash Memory, Solid-state drive.
The Open NAND Flash Interface ONFI.
The Flash Memory Summit conference (FMS 2010 conference proceedings requires an email).
Micron, Intel IDF, and Microsoft WHDC, the parent site of WinHEC.
WinHEC 2008: Design Tradeoffs for Solid-State Disk Performance (ENT-T539).
Micron WinHEC 2007 NAND Flash Memory Direction Presentation.
USENIX HotStorage '10 and FAST '10.
The 26th IEEE Symposium on Massive Storage Systems and Technologies (MSST 2010).
Standard SSD performance reporting: currently there is no common standard, so be alert for questionable measurement and reporting. SNIA Technical Activities will try to establish a standard Solid State Storage Performance Test Specification.
See Tom's Hardware SSD 102: The Ins And Outs Of Solid State Storage.
Marc Bevand's blog Zorinaq seems to have useful information on SSD.

Updates

Below is the Toshiba blade SSD for the MacBook Air. I like this form factor. The standard SSD storage enclosure could be a 1U rack holding about 80 of these devices. This should probably be supported by 16-24 SAS 6Gbps lanes, for 8-12GB/s of bandwidth. Each device can support 200MB/s, so the 80 slots are really for capacity expansion.

[Image: Toshiba blade SSD]
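A quick sketch of the bandwidth arithmetic, assuming roughly 500MB/s of usable bandwidth per 6Gbps SAS lane after encoding and protocol overhead (my assumption):

```python
# Rough bandwidth arithmetic for the hypothetical 1U blade-SSD enclosure above.
# Assumes ~500 MB/s usable per 6Gbps SAS lane (after 8b/10b encoding and
# protocol overhead).
usable_per_lane_MBps = 500
for lanes in (16, 24):
    print(f"{lanes} SAS 6Gbps lanes ~ {lanes * usable_per_lane_MBps / 1000:.0f} GB/s uplink")

# If all 80 slots ran flat out at 200 MB/s each, they would exceed the uplink,
# so most of the slots are effectively for capacity rather than bandwidth.
print(f"80 devices x 200 MB/s = {80 * 200 / 1000:.0f} GB/s")
```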

This post is still rough; I will polish it over time, with more frequent updates at

SSD in Database Servers

As SSD became more mature and widely available in storage systems, people began to gush about how SSD was going to solve database IO performance problems. In today's server systems, 512GB of memory at $400 per 8GB DIMM costs $25,600, and 1TB at $1,100 per 16GB DIMM costs $70K. A properly designed transactional database should not generate more disk IO than can be managed with a properly configured storage system, hard disk or SSD. A transactional database that also supports reporting could generate intensive IO that would benefit from SSD. A data warehouse/DSS system could generate heavy disk IO, but much of it is sequential, which can be handled by a properly configured hard disk storage system.

This is not to say that SSDs have no place, but rather that it is helpful to have an alternative to massive HDD arrays. Not every database has been properly designed, from architecture to SQL to indexing. Not every storage system has been properly configured. In these cases, SSD is a brute-force solution to compensate for a serious lack of competence in other areas.

SSD versus HDD

Random IO is the natural fit for SSD. Let us suppose the amortized cost of a 146GB 15K disk is $500 in direct-attach and $2K in a SAN, and that a similar-capacity SSD is $3,000. The table below shows cost per GB and cost per IOP for HDD and SSD, using fictitious but reasonable numbers.

 | HDD Direct Attach | HDD SAN | SSD SLC | SSD MLC
Capacity | 146GB | 146GB | 150GB? | 150GB?
Amortized Unit Cost | $500 | $2,000 | $3,000 | $450
IOPS (8KB) | 200 | 200 | 20,000 | 20,000?
$/GB | $3.4 | $13.7 | $20 | $3
$/IOP | $2.50 | $10 | $0.15 | $0.02
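For anyone who wants to plug in their own prices, here is a minimal sketch of the arithmetic behind the $/GB and $/IOP rows; the figures are the fictitious ones from the table above.

```python
# Reproduce the $/GB and $/IOP rows from the fictitious numbers in the table above.
options = {
    "HDD direct attach": {"cost": 500,  "GB": 146, "iops": 200},
    "HDD SAN":           {"cost": 2000, "GB": 146, "iops": 200},
    "SSD SLC":           {"cost": 3000, "GB": 150, "iops": 20000},
    "SSD MLC":           {"cost": 450,  "GB": 150, "iops": 20000},
}
for name, o in options.items():
    dollars_per_GB = o["cost"] / o["GB"]
    dollars_per_iop = o["cost"] / o["iops"]
    print(f"{name:18s}  ${dollars_per_GB:5.1f}/GB   ${dollars_per_iop:5.2f}/IOP")
```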

A database storage system should not be entirely SSD. Rather, a mix of 15K HDDs and SSDs should be employed. The key is to put the data subject to high random IO into its own filegroup on the SSD.

Some people have suggested the log and tempdb as good candidates for SSD. For a single active database, a storage system that works correctly with the hard drive's sequential characteristics is fine for the log; it is only the situation of multiple high-activity databases that is a concern. Ideally, the storage controller cache interprets the pattern of activity from multiple log files on one LUN, so that a single RAID group is sufficient. If not, then this is a good fit for SSD.

I am not certain that SSDs are necessary for tempdb. For data warehouse queries, tempdb on a large HDD array seems to work fine. In the TPC-H benchmark results, the queries that showed a strong advantage for SSD involve random IO, not heavy tempdb activity.

Two criteria can sometimes terminate extended discussion. If the database happens to fit in memory, then there will not be heavy disk activity, except possibly for the log and tempdb. The log activity can be handled by disk drives. In this case, it may not be worth the effort to set up a 48-disk HDD array for tempdb, so a single SSD for tempdb is a good choice.

The second criterion is for slightly larger databases that exceed system memory, perhaps in the 200-400GB range, but are sufficiently small to fit on one or two SSDs. Again, it may not be worth the effort to set up the HDD array, making the SSD a good choice.

Solid State Storage in the future without RAID?

A point I stress to people is not to blindly carry a great idea from the past into the future without understanding why. (Of course, one should also not blindly discard knowledge, most especially the underlying reason behind the knowledge.)

So why do we have RAID? In the early days, disk drives were notoriously prone to failure. Does anyone remember what the original platter size was? I thought it was in the 12-18in range. MTBF may have been in the 1,000-hour range? Even today, at 1M-hour MTBF, for a 1,000-disk array the expectation is 8.8 disk failures per year (the average hours per year is 8,765.76, based on 365.24 days). Some reports show much higher failure rates, perhaps 30 per year per 1,000 disks. Of course this includes all components in the storage system, not just the bare drive.
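A minimal sketch of the arithmetic behind the 8.8 figure, assuming a constant (exponential) failure rate so that expected failures scale linearly with disk-hours:

```python
# Arithmetic behind the 8.8 failures/year figure, assuming a constant failure
# rate (exponential model): expected failures = disk-hours / MTBF.
hours_per_year = 365.24 * 24      # 8,765.76 hours
n_disks = 1000
mtbf_hours = 1_000_000

expected_failures = n_disks * hours_per_year / mtbf_hours
print(f"{expected_failures:.1f} expected disk failures per year")   # ~8.8
```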

Sure, SSDs will also have a non-infinite MTBF. But the HDD is fundamentally a single device: if the motor or certain components in the read/write mechanism fail, the entire disk is inaccessible. An SSD is not inherently a single device. The figure below shows a functional diagram of an Intel SSD. There is a controller, and there are non-volatile memory chips (NAND flash).

[Figure: Intel SSD functional diagram]

In system memory, the ECC algorithm is designed to correct single-bit errors and detect double-bit errors within an 8-byte channel using 72 bits (64 data bits plus 8 check bits). When four channels are uniformly populated, it can also detect and correct an entire x4 or x8 DRAM device failure and detect double x4 chip failures. I suppose SSDs might already have some chip-failure tolerance (but I have not found documentation that actually states this detail).

There should be no fundamental reason an SSD cannot have redundant controllers as well. With proper design, the SSD may no longer be subject to single-component failure. With proper design, an SSD storage system could conceivably copy data off a partially failed individual SSD to a standby SSD, or even to a disk drive.

I stress that this may not be what we have today, but rather what I think the future should be.