
SSD Form Factors

The initial SSD products were designed either to the HDD interfaces and form factors or to plug into a PCI-E slot (along with some complete SSD storage systems). This strategy leveraged existing system and storage infrastructure.

As SSDs established market volume, designs started to explore new possibilities with SSD instead of HDD storage. Mobile systems are a big driving force for new SSD form factors. Below are some existing and new SSD form factors compared to the 2.5in HDD.

Notebooks, for that matter, were the primary reason for 2.5in hard disks. It was only later that enterprise 10K and 15K 2.5in HDDs were introduced for server environments that desired density. The difference, aside from disk RPM and the SATA versus SAS interface, was that mobile quickly drove thickness down to 7mm, while servers stayed with the original 15mm form factor. With SSD, there is no reason to retain the 15mm dimension, but I do favor mixed SSD and HDD storage, to take advantage of the bidirectional signaling in SAS/SATA and to reduce wear on low-cost MLC SSD.

There are some groups advocating PCI-E over SATA, but to transition the huge installed base of SATA infrastructure, a transition plan is proposed under Express Bay and the SFF-8639 connector.

In the transition phase, systems and components will use the SFF-8639 connector. The SSD can implement either a SATA or PCI-E interface, and the connector will route the SSD signal to the correct interface on the system side.

Another aspect on the personal computing side is the need to introduce boot support for devices on PCI-E, similar to what is available via AHCI for SATA devices. Intel is rolling this into Non-Volatile Memory Express (NVME), encompassing the NVM Host Controller Interface (NVMHCI).

Another aspect of NVME, in addition to providing a common boot driver for PCI-E SSDs, is a driver architecture suitable for scaling to massive IOPS and high-end NUMA systems.

EMC VNX

Below is a SAN concept on NVME.

High Endurance MLC

There are multiple independent efforts to improve MLC NAND write endurance. The Intel program is High Endurance Technology (HET). Another is called MLC-EE. One product pairs a small SLC area with a large MLC area under the name eMLC, but eMLC is also used for the completely different entity, embedded MLC.

Below is an example of write endurance for standard MLC, MLC-EE and SLC.

Below shows the raw bit error rate (RBER) of MLC and HET. The source is probably Intel, given the term HET.

IBM endurance graph for SLC, C-MLC and E-MLC.

Micron SLC and common MLC endurance and ECC.

Source: Ed Grochowski?

100GB SSD endurance for MLC, eMLC and secret sauce MLC

MLC and SLC write performance

NAND eMLC

FTL

Garbage Collection

Fusion-io TPAK FPGA

Fusion-io TPAK

Fusion-io TPAK FPGA

DataDirect Networks, DDN.com?

This looks like 2 storage processors, each a 2-socket Intel Xeon E5 8-core system, with InfiniBand on the front-end and SAS on the back-end. Is the cache link QPI?

HL NAND, HyperLink NAND?

Storage Performance 2012: 5GB/s and 10GB/s

The storage performance objectives are 5GB/s for 2-socket Intel Xeon E5-2690 systems and 10GB/s for 4-socket Xeon E5-4650 systems. This is not a rigid requirement, but rather a target both in terms of what SQL Server can consume and what can be achieved with good hardware selection, configuration, and database organization at a price in balance with the overall project. Notice that I do not restrict this criterion to data warehouse systems only. It is very under-appreciated that powerful IO bandwidth is also beneficial to transaction processing systems.

Microsoft put out the first Fast Track Data Warehouse Reference Architecture document perhaps in 2008. The problem was that most storage systems in use were utterly incapable of supporting the Data Warehouse environment. On top of the deficient IO bandwidth of most SAN storage systems, the configuration practices advocated by SAN vendors further compounded the problem. This is probably one reason why Oracle offered the Oracle Database Machine, a complete server, storage and software package, all pre-configured to deliver IO bandwidth for DW. Microsoft presumably did not want to sell complete packages for SQL Server (prior to PDW), and so responded with detailed guidance in the FTDW Reference Architecture. The point of the FTDW Reference Architecture was to help people understand the proper performance criteria for a data warehouse system and how to achieve them with SAN storage.

In 2008, the latest processors were the Intel Xeon 5400 series, based on the Core 2 architecture. The FTDW RA cited an IO target of 200MB/s per core. This was based on an intermediate-complexity table scan aggregation query. A simple table scan can consume data at a higher rate. A query that aggregates many columns and/or requires hash match aggregation would consume data at a lower rate. Large joins generally consume data at a lower rate.

IO Performance Targets

Even though Microsoft updated the FTDW RA several times over the years, including refinements of the architecture, the basic numbers were never revised. To account for this, my estimates for the data consumption rate in a simple table scan aggregation query, for processor architectures from Core 2 onward, are as follows.

2-socket systems:
Intro  Processor     Architecture  Freq     Cores/proc  Total cores  MB/s per core  Total bandwidth
2007   Xeon 5460     Core2/Penryn  3.13GHz  4           8            200MB/s        1,600MB/s
2009   Xeon 5570     Nehalem       2.93GHz  4           8            275MB/s        2,200MB/s
2010   Xeon 5690     Westmere      3.33GHz  6           12           275MB/s        3,300MB/s
2012   Xeon E5-2690  Sandy Bridge  2.90GHz  8           16           300MB/s        4,800MB/s

4-socket systems:
Intro  Processor     Architecture  Freq     Cores/proc  Total cores  MB/s per core  Total bandwidth
2008   Xeon 7460     Core2/Penryn  2.66GHz  6           24           200MB/s        4,800MB/s
2010   Xeon 7560     Nehalem       2.26GHz  8           32           250MB/s        8,000MB/s
2011   Xeon E7-x870  Westmere      2.40GHz  10          40           250MB/s        10,000MB/s
2012   Xeon E5-4650  Sandy Bridge  2.70GHz  8           32           300MB/s        9,600MB/s
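
The total bandwidth column is just total cores multiplied by the per-core consumption rate. A minimal sanity check of that arithmetic, sketched in Python with a few configurations from the tables above (illustrative only):

    # Sanity check of the table figures: total bandwidth = sockets x cores x MB/s per core.
    configs = [
        # (system,                 sockets, cores per socket, MB/s per core)
        ("Xeon 5460    2-socket",  2, 4,  200),
        ("Xeon E5-2690 2-socket",  2, 8,  300),
        ("Xeon E7-x870 4-socket",  4, 10, 250),
        ("Xeon E5-4650 4-socket",  4, 8,  300),
    ]
    for name, sockets, cores, per_core in configs:
        total = sockets * cores * per_core
        print(f"{name}: {sockets * cores} cores x {per_core} MB/s = {total:,} MB/s")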

Storage Components HDD

The performance objectives above might seem impractically high, but in fact can be achieved at a cost structure in balance with SQL Server Enterprise Edition licensing. The FTDW RA cites an objective of 100MB/s per disk. A storage element based on an entry SAN with 20-22 disks would have 4 RAID groups of 4 disks in RAID 10, with the remaining disks for logs, hot-spares and other uses. This unit could achieve 1.6GB/s on the data volumes at 100MB/s per disk. The front-end connection for the HP P2000G# is 4 FC ports at 4Gbps, which could probably support 1.4-1.5GB/s. So 3 units would be in the right range for a 2-socket system and 6 units for a 4-socket system. The cost structure of an entry SAN can be as low as $1,000 per disk, so each element should be about $20,000.
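
As a minimal sketch of that sizing arithmetic in Python, assuming the disk-limited 1.6GB/s per unit and roughly $1,000 per disk for a 20-disk element (the front end caps actual throughput slightly lower):

    # Entry SAN element: 16 data disks (4 RAID-10 groups of 4) at 100 MB/s each
    # is ~1.6 GB/s per unit; the 4 x 4Gbps FC front end caps it a bit lower.
    DATA_DISKS = 16
    MB_PER_DISK = 100
    unit_mb = DATA_DISKS * MB_PER_DISK        # 1,600 MB/s per unit

    for system, target_mb in (("2-socket", 5000), ("4-socket", 10000)):
        units = round(target_mb / unit_mb)    # ballpark unit count
        print(f"{system}: ~{units} units, ~${units * 20000:,} at ~$1,000/disk (20-disk element)")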

Personally I have always preferred direct-attach storage for DW, with the argument that clustering is not required. It is easy to achieve high IO bandwidth with direct-attach storage, while in a SAN environment it is absolutely essential to follow the complicated FTDW RA configuration. This is because the connection ports are SAS x4, that is, 4 SAS lanes in parallel. Each 6Gbps SAS x4 port has a nominal bandwidth of 3GB/s. After 8B/10B encoding and protocol overhead, the net bandwidth is 2.2GB/s. A SAS RAID controller typically has 2 x4 SAS ports. The first generation 6Gbps SAS RAID controllers had a limit of 2.8GB/s. On the PCI-E front-end, the net bandwidth is 3.2GB/s, so more recent or future controllers might be able to reach that. Based on 2.5GB/s per RAID controller, the 2-socket bandwidth objective can be reached with 2 RAID controllers, each with 24 disks for data, and additional disks for logs and spares.
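
The SAS bandwidth math can be sketched as follows in Python; the protocol-overhead factor is my assumption, picked only to land near the 2.2GB/s net figure cited above:

    import math

    # 6 Gbps SAS x4 port: nominal vs. net bandwidth, then controllers needed.
    LANES, GBPS_PER_LANE = 4, 6.0
    nominal_gb_s = LANES * GBPS_PER_LANE / 8      # 3.0 GB/s nominal per x4 port
    net_gb_s = nominal_gb_s * 0.8 * 0.92          # 8B/10B (80%) plus assumed protocol overhead
    CONTROLLER_GB_S = 2.5                         # practical per-controller figure used above

    print(f"SAS x4: {nominal_gb_s:.1f} GB/s nominal, ~{net_gb_s:.1f} GB/s net")
    for target in (5, 10):
        print(f"{target} GB/s target: {math.ceil(target / CONTROLLER_GB_S)} RAID controllers")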

Another preference of mine is to target 30MB/sec per disk instead of the 100MB/sec in the FTDW RA. The reason is that 100MB/sec per disk can only be achieved with a great deal of effort to ensure essentially sequential file allocation for the large tables. This will often entail a strategy of rebuilding the clustered index, moving the data between 2 filegroups. The 30MB/s per disk bandwidth can be realized with 256K IO at 120 IOPS, i.e., the broken sequential layout that is common after normal incremental database population. My strategy is to fill the PCI-E x8 slots (after populating 1 dual-port 10GbE adapter and several PCI-E SSDs) with RAID controllers. Start by attaching one 24-disk enclosure to each RAID controller. Add a second and possibly a third 24-disk enclosure to each RAID controller as necessary to reach the IO bandwidth objective.
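
A short sketch of the 30MB/s-per-disk arithmetic, and the disk and enclosure counts it implies for the 5GB/s target (Python, assumptions in the comments):

    import math

    # 256KB IO at ~120 IOPS per disk is ~30 MB/s; count the disks and 24-disk
    # enclosures needed for a 5 GB/s target at that relaxed per-disk rate.
    IO_KB, IOPS = 256, 120
    mb_per_disk = IO_KB * IOPS / 1024             # ~30 MB/s per disk
    TARGET_MB, DISKS_PER_ENCLOSURE = 5000, 24

    disks = math.ceil(TARGET_MB / mb_per_disk)
    enclosures = math.ceil(disks / DISKS_PER_ENCLOSURE)
    print(f"{mb_per_disk:.0f} MB/s per disk -> {disks} disks, {enclosures} x 24-disk enclosures")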

Based on a direct-attach cost structure of $500-600 per disk ($300 for a 146GB 15K or 300GB 10K HDD), the 5GB/s objective would cost about $30K with the aggressive 100MB/s strategy or $80K with the more relaxed 30MB/s target.
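
The cost comparison works out roughly as below (Python, using per-disk costs from the range cited above):

    import math

    # 5 GB/s target: aggressive 100 MB/s per disk vs. relaxed 30 MB/s per disk,
    # at the $500-600 per-disk direct-attach cost range cited above.
    TARGET_MB = 5000
    for mb_per_disk, cost_per_disk in ((100, 600), (30, 500)):
        disks = math.ceil(TARGET_MB / mb_per_disk)
        print(f"{mb_per_disk} MB/s per disk: {disks} disks, ~${disks * cost_per_disk // 1000}K")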

Storage Components SSD

Since the original FTDW RA, SSDs have evolved into highly capable storage devices at the right price structure.

Intel SSD 910 at $2K per 400GB, 1GB/s.

Fusion-io: $11K per TB.
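
For comparison, a rough SSD-based sizing against the 5GB/s target using the Intel SSD 910 price point listed above (a sketch only; the ~1GB/s per card figure is from the note above):

    import math

    # 5 GB/s from Intel SSD 910 class PCI-E cards: ~1 GB/s and ~$2K per 400GB card.
    TARGET_GB_S, GB_S_PER_CARD = 5, 1.0
    COST_PER_CARD, GB_PER_CARD = 2000, 400

    cards = math.ceil(TARGET_GB_S / GB_S_PER_CARD)
    print(f"{cards} cards: {cards * GB_PER_CARD} GB capacity, ~${cards * COST_PER_CARD:,}")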

Misc

Because there is no single rigid IO bandwidth requirement that applies to all Data Warehouse environments, storage performance should be treated as an objective, factoring in the IO generated by various SQL operations and the IO bandwidth that can be achieved with modern hardware, in balance with the overall project cost structure.

Xeon X7460 Q3 2008. Xeon X7560 Q1 2010. Xeon E7-x870 Q2 2011. Xeon E5-4650 Q2 2012 at 2.70GHz.