Joe Chang, SQL Server Consultant

jchang6@yahoo.com

About Joe

SQL Server Tools Developer

Free tools include: ExecStats for cross referencing execution plans to index usage, SQL Trace for parsing Profiler Traces, SQL Clone distribution statistics, and SQLSystem - system performance monitoring.

Reversed engineered the SQL Server Query Optimizer

also known as the Cost Based Optimizer (CBO). The query optimizer formulas (along with data distribution statistics) allows understanding exactly why the optimizer chooses a particular execution plan, and why an index is used or not used.

Recalibrated the true SQL Server Execution Plan Cost Structure

for several processor and system architectures. What does a specific operation in the SQL Server execution plan really cost per row and per page? To what degree does the plan cost represent true cost? The plan cost model is really meant to determine the best table access strategy. The cost of logic does not factor into plan cost if it does not affect table access, but can significantly increase the cost of a query.

  1. How will you know if something seems right or is seriously wrong?
  2. Can the SQL Server query optimizer pick a poor plan?
  3. Why?
  4. Is it safe to override the default plan?
  5. Are you afraid to do so?

A number of anomalies and errors identified in SQL Server 7 and 2000 were sent back to MS for correction in 2005 (involving key operations like small table scan cost, Bookmark Lookups, Nested Loops Join). Even the SQL Server 2005 and 2008 query optimizers make certain assumptions that are frequently not valid.

Pioneered Data Distribution Statistics Cloning (SQL CLone)

Working with a copy of the production database was not feasible because sheer size (true in the past, today with 500GB 2.5in hard drives, plus 17in notebooks with two internal drives, ...) not to mention data security issues. Simply having the DDL is not sufficient for query tuning, because the SQL Server query optimizer is a cost based optimzer, meaning it uses data distribution statistics to estimate row count.

By creating an empty database using the DDL to create identicial objects, but with no data, then cloning the data distribution statistics, it becomes possible to generate the same execution plans as on the production server (in most cases). This required direct manipulation of the system tables in SQL Server 2000. For SQL Server 2005 and later, Microsoft provided an API to clone data distribution statistics (because this technique was deemed important).

 

Why go to all the effort for the above?

Most people approach performance tuning with a laundry list of techniques i.e. recipes, and try each one until something good happens. With deep experience and skill, they will gradually develop a feel for what might work, trying the most likely first.

This approach to performance tuning can be called art. It can be called religion if you believe on faith in the absense of supporting evidence, and sometimes with strong evidence to the contrary.

I know this is what we do for a living, but SQL Server is not a religion.

Database Performance Science

Science is about knowing for certain why. It is about having a model that can explain observed behavior and even better: make accurate predictions before observations are made. A scientific approach to performance tuning reduces the need to make guesses, avoiding dead-end wild goose chases and possibly avoidable consultant billable hours. In government world, there are no points for coming under the bid amount, but there is value in delivering on schedule and this requires predicitive capability. Many places implement policy based on FUD. What is the value of knowledge over FUD?

When your production server is having serious problems, a Wall Street analyst ask the CEO at a public breifing in front of other analysts:

  1. why do your field offices say your computer system goes down during end-of-month payment processing?

you will not have a good day (actual event). If your 2-year development project is within weeks of cancellation for failure to meet performance requirements (also actual),
do you want art or science?

You want your problem fixed fast. Would you like to tell your boss:

  1. the problem went away but we do not know why, or if it will reoccur,
  2. or we know why it happened and how to avoid it in the future.

Can this be done with art?

Scaling and Parallel Execution Plan cost structure for SMP and NUMA systems

People have upgraded to Big-Iron systems hoping it will solve their performance problems. There are reasonable performance benchmarks showing NUMA systems can scale SQL Server performance.

What is not widely known is that certain operations do not scale on SMP or NUMA, and that certain parallel execution plans have severe negative scaling on NUMA systems. Many of these events can be indentified and avoided, if one knew precisely what look for and what to do.

  1. Would you like to avoid severe performance problems after upgrading from a standard 4-way SMP system to a $500,000 NUMA system?
  2. Who approved the purchase?
  3. Does he or she know about these problems?

Fortunately, if your CIO or CEO approved the purchase of really expensive hardware, they will most probably also authorize funds for a really expensive consultant (Not to be construed as a solicitation for services) to handle the problems on the really expensive hardware. Do not bother with an inexpensive consultant. After all, just how valuable can his advice be? Who would take it seriously?

  1. A small problem is your problem.
  2. A big problem is their problem too.

I think this is a matter of who get fired or punished for the mistake. For a small mistake, the boss can fire a low level person to show his/her boss that he/she is on top of the situation. When it is a really big mistake, firing a low or middle person is not sufficient. It is the big bosses whose jobs may be on the line as well.

In fact, this situation gives you enormous leverage in handling the situation, if you have the courage to exert your leverage. It is relatively simple for a middle-level IT professional to get another job. The process for a CIO to find another job is much more complex. Note: proposing to bring in an expensive consultant demonstrates your seriousness in resolving the problem. (again, not to be construed as a solicitation for services)

  1. The bosses will be happy,
  2. you will be happy,
  3. the consultant is happy,
  4. everybody is happy.
  5. Don't rock the happy boat.

This is also called successfully deploying a multi-million project involving tens of direct reports and possibly hundreds of underlyings. Do this make you look like CIO material? It looks really impressive on your resume (CV for non-Americans).

Consider the alternative. You are really smart. You come up with a really smart idea on how to deploy a difficult project with few people on relatively inexpensive hardware. You saved your company millions of dollars. Who knows you saved you company millions of dollars? Who knows how smart you are? So what looks better on your resume?

Storage and IO Performance

Your company just deployed a really expensive SAN based storage system. It seems that many applications are running slow, including the main line of business database. Nobody wants to believe it, but ultimately everything points to the storage system.

How can this be? The storage system was really expensive!
So why is that the expensive storage system is so often slow?

First, a SAN based storage system is really just two (or more) computers in front of storage elements that can be connect to a (sometimes) dedicated storage network. If you have ever heard a SAN vendor sales pitch, you would come away with the impression that the SAN has almost magical capabilities. Recall that the storage system has computers. In case this is surprise to anyone in our business, it is possible to write software that runs on computer systems. Software usually does something useful.

Now computer systems, including the computers in a SAN are not particularly expensive. Neither are the disk drives and disk enclosures. And yet a direct-attach storage system typically has a cost of about $500 per 15K disk, amortizing the cost of the disk enclosures and RAID controllers. A SAN based storage system might have an amortized cost of $2,000-6,000 per disk.

Lets start by asking why the storage system is expensive. The SAN vendor want to sell really expensive storage systems, otherwise, there would not be jobs for sales and marketing people. So it would be a good to provide value-add to the inexpensive storage components.

The usual argument is that the SAN storage system is shared, so multiple hosts (servers) can share resources from a common pool. In-turn, the utilization per disk is higher and fewer disks are required. Some how, the saving money argument overlooked the detail that instead of buying 200 disks at $500, we are now buying 50 disks at $4000 each.

Assuming the money part is not something that deters your organization, the problem is that all the main elements of the SAN argument is really bad for database disk performance. Storage performance is about distributing IO over very many disks and IO channels, which means the individual disks should be relatively inexpensive.

Having very many disks means the disk utilization can be kept low, which in turns means we benefit from the short-stroke effect, boosting random IO performancce, which is absolutely critical to transaction processing performance.

The main line-of-business database is almost always a critical system. If it is down, revenue cannot be booked. It is usually the case that a business opportunity missed is an opportunity that is lost. It cannot be recovered when the system is back up. So the LOB storage system should not be shared with the company email system. We not want a mid-day Exchange backup/restore (because someone forwarded a virus) to disrupt transaction processing.

The problems is almost always that the SAN vendor will configure the storage system to the principles of their value-add arguments, which is more or less guarantes bad database performance. And this is why you need an independent expert to validate your storage system performance characteristics.

What should a storage system be able to do?

If perfectly configured, it will deliver:

  1. random read IO at 5-6ms latency (queue depth 1)
  2. high-queue throughput of 400-500 IOPS per disk at 20ms (short-stroked)
  3. transaction log writes below 1ms as in 0.3ms or less
  4. sequential IO at 125MB/sec disk, and 1.1GB/sec on a x4 3Gb/s SAS channel

Expose SQL Server Disk IO Patterns

SQL Server Disk IO patterns and settings that determine serialized or asynchronous IO. The SQL Server engine may generate serialized singleton IO patterns or batch. This means a query requiring relatively few disk IOs may actually take longer than the query that generates more disk IO. When does this happen? The most severe problem occurs with updates to a large number of rows and an index exists.