QDPMA.com - SQL Server Consultant - Joe Chang
About Joe, Contact: jchang6 at yahoo.com or jchang61 at outlook.com
SRAM as Main Memory
Proponents of SRAM as main memory almost never get past the huge price disparity between DRAM, heavily optimized for cost, and SRAM, used as on-die cache optimized for performance in L1 and L2, for density in L3 and sometimes for power. It may also be difficult to imagine database transaction processing as a candidate for SRAM main memory given the huge memory configurations commonly deployed. But consider the following.
1. A system could have both SRAM and DRAM as main memory,
with the SRAM memory being far larger than the on-die cache of modern processors,
yet smaller than DRAM. Thinks several to ten plus GBs versus several tens of MBs.
2. The large majority of memory in a database system is used to reduce IO, for which DRAM performance parameters and cost are eminently suitable. A minority of memory implemented with SRAM could potentially achieve a substantial reduction in average memory latency.
3. It so happens that database transaction performance is largely governed by round-trip memory latency, given the nature of B-tree index navigation and the disparity between the CPU clock-cycle and DRAM latency.
4. Attempting to scale transaction processing performance on a multi-socket system has moderate to limited effectiveness as few (zero?) real world databases have been architected to the characteristics of a system with non-uniform memory access (NUMA) and can even be counter-productive.
Scaling from 1 to 2 sockets might be 1.6X or less, and even lower from 2S to 4S, with the cumulative 1S to 4S gain of around 2X.
5. Increasing the degree of Hyper-Threading (HT), generic term Simultaneous Multi-Threading (SMT), has the potential for outstanding throughput performance at very high concurrency so long as there is almost zero contention between threads (guess how likely this is), but does nothing for single thread performance.
6. There is an established and prevalent pattern of willingness to throw hardware at the transaction processing system in the hope that this will solve problems of poor architecture and coding. This includes the employment of multiple $5-10K processors. Yet doing so often aggravates existing and creates new problems from NUMA and high concurrency operation.
7. A single-socket system has better single-thread performance than a multi-socket system as all memory accesses are to the local node resulting in lower average latency. Even though a single-socket system is now capable of handling most workloads, it might be perceived as insufficient for the usual bigger and more expensive must be better preconceived notions.
However, the single-socket system with SRAM plus DRAM memory further improves single thread performance and could match the throughput of the conventional DRAM two-socket and perhaps even that of a 4-socket system. The key is to make memory latency as low as possible, even if extraordinary measures are necessary.
The single socket system would not have the negative aspects of NUMA. It would need fewer cores and hence operate at lower concurrency level, avoiding contention issues. All of this makes the single-socket SRAM system a better technical solution in having fewer problematic aspects.
8. Consider that the flagship Intel Xeon SP 8180 Platinum 28-core processor carries a price of $10,000. The SRAM budget would be $10K at a 50% gain, and $30K at 100% gain for equality in throughput. The better single-thread performance and fewer negative aspects of the single socket SRAM memory system also has value, which could be substantial.
All of the above creates substantial space for SRAM to have high value impact as main memory. An SRAM cost structure 100 times higher than DRAM would not be prohibitive. And knowing database world, the real question will be: how much can I get, and I want more that.
I made this for a talk a few years ago. I liked is so much that it is now in all my slides.
Of course, that does not mean it is complete, so please send feedback via Joe on SQLblogs
In previous articles, I have discussed processor and system architecture, server systems, and performance benchmarks, often mixing more than one topic. Going forward, I will consolidate each topic into separate collections:
Query Optimizer and Performance formerly "The SQL Server Cost-Based Optimizer"
Scaling Project Overview 2016-12
Server Systems 2017-11
Rethink Server Sizing 2017-12
Memory Latency and NUMA 2016-12
System Architecture 2017-01
Amdahl Revisited Update 2015-05
Processor Architectures Updated 2016-12 new die images scaled
SQL Server scaling and other characteristics are investigated using the TPC-H data generator and test queries. The test conditions are not identical with publication requirements.
Parallel Execution in SQL Server 2016 (2017-01).
Hyper-Threading performance on TPC-H queries (2013-03).
ExecStats is my own SQL Server performance analysis and data collection tool. Please try it out and send feedback. SQL Exec Stats has been combined with SQL System as of 2012 Q4,
I am abandoning the specific build links with a generic that should point to the latest. The general link should normally be for the latest build, but it will be the last .NET 3.5 for the time being.
Exec Stats (latest, .NET 4.00, 2017-10-24)
Exec Stats 2017-09-11 change perf ctr data, change in sql text, may have some inconsistencies.
Exec Stats 2017-08-31 change main info order
Exec Stats 2017-07-02 Azure fixes, sys.databases database_id values are not consistent between master and user database, must use value in the user database.Exec Stats 2017-05-12 Collecting % Processor Performance and Processor Frequency, PF shared with the % Processor, and PP shared with IOPS. Left click on respective graph until back color changes.
Exec Stats 2017-05-08 preliminary support for Query Store
Now using Visual Studio 2017. For unknown reason, VS2017 does not like config files generated by previous versions. So copy your setting from previous versions to the current config file.
Notes on olders versions:
For the last several months, the principle mode of dm_exec_query_stats grouping was a hybrid of query hash and batch via sql_handle. The issue was that the same SQL/query hash would be assigned to the first batch encountered. The new version reintroduces batch mode, which is also a hybrid, first using the sql_handle + offset. However, any query hash for the the end is -1 is grouped by query hash. This should work for most cases, but please advise if not.
Exec Stats 2016-10-08 latest Gembox 3.9, VS15 preview 5.
Exec Stats 2016-03-03 now using dm_os_performance_counters instead of Windows API calls for SQL Server performance counters. Windows Server 2012 did not return values, and I could not find any discussion of this matter.
Exec Stats 2015-12-16 SQL Server 2008R2 RTM fix, exception handling
Exec Stats 2015-11-04 New batch mode, query hash option
2015-03-11 partition key not lead annotated
Documentation to follow. Feedback is desired.
My plan is to use JSON but I have not done so at this time.
SQLExecStats documentation, in the process of being updated,
ok, the documentation is seriously out of date.
Every time I update the documentation, I make further changes to ExecStats,
obsoleting the documentation changes just made - oh well.
Sometime I will try allow reading execution plans from SQL Trace files as well. I am contemplating implementing a very basic graphical plan. I would rather not do this as SQL Sentry has a well reviewed Plan Explorer, but it would be useful.
Climate, not my field, but I had to look into it.
SQL Performance Tools (Obsolete)
Older versions have been removed. Send me an email if older versions are desired for some reason.
Trace Analysis Build 2015-10-06
As you can see, it has been many years since I worked on SQL Clone. SQL Server 2014 SP2 now has this functionality, see
This is an interim update for scripting Roles. A more thorough update is due, when I get a chance.
Below are some helpful scripts I have built up over the years.
The first is an extension of sp_spaceused to provide table level summary.
spaceused with table level details (2013-09)
The next is spaceused at the index level
Index Summary with key and include columns, size, and dmv index usage stats.
An alternate to the index level space usage query is a system stored procedure, extending sp_helpindex.
sp_helpindex3 is my version of extended index information. It would be desired if the table and index sp_spaceused extensions could be made into system views, but apparently SQL Server does not allow modifications to system resource database anymore?
Trace is my standard Profiler script. Be careful in collecting execution plan information, as this could have high overhead.
Events and Presentations
Joe presented at SQL Saturday Oslo, 30 Aug 2014
Modern Performance 2014 Oslo
NOVA, 19 May 2014
Modern Performance 2014
SQL Saturday Atlanta, 3 May 2014
SQL Saturday Budapest, 1 Mar 2014
I was at SQL Saturday Boston on 29 Mar, but not presenting.
2013 Oct 6 at Silicon Valley Code Camp
Modern Performance 2013
(updated 2013-10-06, original 10-03)
Note: Modern Performance-SVCC, the SQL Performance 2013-Riyadh and the 2012 Automating Performance-Copenhagen slidedecks are all on the same topic.
2013 May at SQL Saturday Riyadh, Saudi Arabia
SQL Performance 2013 (Special Topics)
2012 Oct 2 & 3 at SQL Rally Nordic, Copenhagen
Presented at SQL PASS Deutschland
I presented at Munich on 10 Jan 2013. Will put up slides soon?
SQL Server Query Optimizer Cost Formulas 2010-Dec?
2011 SQL Saturday Trinidad Presentations
I presented a session at SQLBits on 1 Oct 2010 in York, England. (if this where New York was named after?) Anyways, I tried to squeeze 4-5 hours of material into 1 hour. So I will try to write up everything and collect it here.
1) NUMA System Architecture
2) Storage Performance for Data Warehouse
3) Parallel Execution Plans Part I, Part II, Part III
4) TPC-H Benchmarks, SF100, SF300, SF1000, SF3000
5) TPC-H Studies at SF10
Slidedecks: SQLBits Presentations
SQL Server 2008 Presentations
SQL Server 2005 Presentations
SQL Server 2000 Presentations from past conferences
Efficiently Clone Databases by Kevin Kline
Optimizing Power for SQL Server ... by Jimmy May
Logical I/Os – Not a useful performance metric? by Joe Sack
An Introduction to Cost Estimation Benjamin Nevarez
I posted this HealthCare.Gov Fiasco back in 2013.
In case anyone is interested, the old frameset page is here wframeset.html. I never figured out how to code it.