jchang6 @ yahoo.com
South Deerfield, MA
US Cell: (407) 791-1126
20+ years database architecture, design and performance.
Reverse engineered the SQL Server Query Optimizer cost model and formulas.
Quantified the true cost of SQL Server execution plan operations by processor and system architecture.
Exposed unusual behavior in SQL Server scaling at high parallelism and NUMA issues.
Originator of SQL Clone, a shell database having objects and data distribution statistics, but no data,
now a standard feature in SQL Server as of version 2014 SP2
Decoded SQL Server statistics binary structure for write-your-own statistics.
Publisher SQL ExecStats for cross referencing execution plans to index usage.
Security Clearance: 2009
Database Architect and Performance Consultant
Support a planned 4X increase in volume from a corporate merger. Initially brought in as infrastructure (server and storage) expert. Rewrote critical SQL code sequences for correctness, efficiency, repeatability. Eliminated unnecessary recompiles. Identified and resolved blocking to achieve scaling at high concurrency. Restructured primary/clustered keys. Complete overhaul of nonclustered indexes with a drastic reduction. Worked around tables with computed columns defined by scalar functions at first. Then worked out plan to remove compute columns, touching several hundred procedures.
Database re-architecture to support 10X volume increase. On investigation, determined that objectives could be achieved by tuning key SQL code. Identified the true critical root source bottleneck being in the latency for remote server connection initialization, proving objectives could be achieved without re-architecture.
The client had built a CRM app with a 4GL product achieving OK performance, but a subsequent patch resulted in daily system seize-ups. Vendor engineers came on site but could not find anything wrong. I was recommended by an Oracle expert. A CPU filtered profiler trace showed some SQL called with API Server cursor. A short-term unfiltered trace was then run, showing that cursors were opened but never closed. A nuance was that the client app (with permanently open connections) generating API Server cursor calls did not have symptoms, while another application using connection pooling did. The victim of unclosed cursors was the sp_reset_connection call. Vendor was not able to provide a quick fix and probably did not understand why this was a problem either. Moving from 32-bit to 64-bit would alleviate the problem. Just resetting the client app to force close connections and cursors was a workaround.
Client had encountered serious performance problems with kCura Relativity in supporting a very large legal document discovery case with a large number of active users. A new high bandwidth storage system was specified and procured. Developed a plan for migrating the 2TB database off an iSCSI storage system on 1GbE within a limited maintenance windows. SQL Server did not produce efficient parallel execution plans for the style of SQL that Relativity generated. Demonstrated that an alternative style did allow SQL Server to generate parallel execution plans 100-1000 times faster. The kCura SQL expert instead directed developers to tackle the problem symptoms which did not help.
Corrected serious flaws in the original server and storage configurations. Identified memory round-trip latency, not bandwidth as the key determinant. Rewrote critical SQL sequences to meet project requirements to support processing 50M DNA specimen in 8 hours. Notable code was the double pivot with false aggregation. Eliminated abusively frequent user defined function calls. Recognized that even though the contracted performance requirements could be met, the 10-hour turn-around time for QA cycles posed serious problems. Either the project schedule was not going to be met or the code would have an excessive number of bugs, and probably both. A moderate cost hardware configuration allowed QA cycles to be turned inside of 2 hours.
Typical government IT contracting, spending money generating useless paperwork instead of directly tackling the important applications.
Mandate was to substantially improve performance as quickly as possible in the new US Army Funds Control management system. Instructions were to do so without changing the business logic. The project was facing cancellation in as little as six weeks if corrective action was not achieved. Performance gains far above requirements were achieved, with the necessary gains in two weeks by rewriting key business logic SQL. In the critical period of the first two weeks, coding rework was done on the production system.
Worked with developers in identify Windows API and C code strategies for backup and restore performance on high-bandwidth IO sub-systems and scaling on (HP Integrity) NUMA system architecture. Wrote papers on system and storage configurations for high bandwidth IO. Advised on tuning strategies in buffer size and count, balancing performance against available 32-bit VAS, as opposed to work by others on peak performance, allocating excessively large and overly numerous buffers when contiguous address space was not likely to be available in a production system.
Existing system on 3 Sun E10000 Starfire server (each with 60 processors) and Informix. Each server could support 300 searches per second. The goal was to determine if 2-socket Opteron systems could support 150 searches per second. A team of Cendant internal people and Oracle engineers ported code from Informix to Oracle. MCS was told that this was to be an unbiased apple-to-apples comparison between Oracle and SQL Server. I was to port the code to SQL Server accordingly. In the actual course of the project, the Oracle team kept talking about how they would achieve an advantage over SQL Server using native 64-bit code able to use 8GB of memory. SQL Server 2000 Enterprise Edition was precluded, and hence limited to 2GB in standard edition. That being the case, I felt no obligation to abide by the original instructions, and rewrote key code structures, achieving 3000 tps, versus 50 for Oracle on their port. It took the Oracle team 3 weeks to figure out my re-coding logic before declaring both implementations to be equal, and the Cendant team selected Oracle because that’s what they wanted in the first place.
A local consulting firm built a new line-of-business, specifying deployment on a Unisys E7000 with 16 processors and a storage system with 12 HDDs. In the critical first day of month payment processing, the system would seize up for reasons not understood. The was quickly determined to be caused by a sub-procedure that was called once at night to reconcile an entire day or month of transactions, and then called in volume during the day to process mostly single transactions. The night time call activated a temp table statistics re-compute, producing a parallel execution plan, scanning moderately large tables. The day time calls retained this plan, resulting in the system being unresponsive until another re-compute occurred. Immediate action was to put in a force recompile each morning after the night run. There continued to be problems with the payment procedure but having different symptoms. The execution plan for the procedure looked to be efficient. A cursor was used to process line items even though most day time transactions had just 1 item. It was shown that SQL Server 2000 could run this procedure efficiently with up to 4 concurrently running threads, but not more on a NUMA system. This problem would not have occurred on a 4-way SMP system. The cursor was removed. Also, SQL Server 2005 fixed the NUMA cursor issue, as cursors were a part of the TPC-E benchmark.
Unisys had sold 12 ProFusion 8-way Xeon servers to a large foreign bank, bringing in S1/Edify for software and implementation. One pair of systems was for the database and the others for application servers. There were some index deficiencies in the database, easily fixed. The app servers were expected to support 1000 users, in part to justify the platform cost. Each user required 2 threads. Problems were traced to the thread stack size (1M), and a multi-threaded heap allocation method (MpHeap on NT 4.0) with poor choice of caching, and another critical segment that made avoidable large memory allocations. Fixes allowed the 8-way application servers to meet objectives even though more 2-way servers would have been a better solution.
Others: National Guard G1 Lifecycle, Kaiser Permanente
Senior Software Engineer
Microprocessor Products Group, ISV Performance Lab
- Proved that scaling Exchange 5.5 on the 8-way ProFusion was possible, after everyone else gave up hope. Compaq request assistance from the Intel benchmark team for the first time in a long while.
Performance Microprocessor Division, Strategic Marketing
- Coordinate efforts of architects, design team managers and platform architects in formalizing product strategy.
- Made the successful argument for one more year of IA-32 in 4-way servers when the official POR had just decided to terminate IA-32 processor after Tanner/Cascade (Pentium III Xeon on 180nm), rejecting the ESG argument for 2 years overlap with Itanium in 4-way. Xeon was subsequently continued indefinitely.
- Contributed the idea of a bus expander for PCI at 66MHz to get around the single slot limit.
- Initiated the effort to relabel the upcoming Pentium III 180nm product (Coppermine) for 133.3MHz bus at multiplier 5 to 667MHz before any official document was distributed to OEMs showing the frequency rounded to the integer down from the numeric value.
Product Manager, Server Systems
Unrestricted Line Officer, Nuclear Propulsion, USS Arkansas CGN-41
Microsoft Certified System Engineer, 1995
Novell Certified NetWare Engineer, 1994
Munich, Germany, 10 Jan 2013
Execution Plan Cost Formulas
Stockholm, Sweden, 24 Jan 2011
Malmo, Sweden, 25 Jan 2011
Copenhagen, Denmark, 26 Jan 2011
Munich, Germany, 16 Dec 2010
SQL Performance for Developers
Melbourne, Australia, 9 Aug, 2010
Curacao, 22 Apr 2010
SQL Server Cost Based Optimizer
Queries barely over the Cost Threshold for Parallelism
In-Memory Database Technology
Big Iron Systems and NUMA System Architecture
Intel Server Strategy Shift with Sandy Bridge
Storage Performance 2013
Storage Performance for Data Warehouses
IO Queue Depth Strategy