
Processor Architectures and History

Intel and AMD Historical, Pentium 4, AMD Opteron, Dual Core, Pentium M to Core 2, Nehalem, Sandy Bridge to Haswell, Hyper-Threading, SIMD Extensions

Update 3  2017 Jan

Silicon-28 or 28Si

Perhaps a bit more than ten years ago, there was a company (Isonics?) trying to produce isotopically pure 28Si for use in silicon wafers. Natural silicon is 92.2% Si-28, 4.7% Si-29, and 3.1% Si-30. Isotopically pure silicon is supposed to have 3× better thermal conductivity.

Heat Sink

About 10 years ago, an idea was floated related to heat transfer. All metals exposed to oxygen form an oxide layer on the surface, and that oxide layer happens to contribute a large part of the thermal resistance. Processors are made of silicon, which is a semiconductor (metalloid). Intel places a (metal) heat spreader on top of the silicon, and a (metal) heatsink then sits on top of the spreader, with thermal grease to help ensure physical contact between the spreader and the heatsink. At each level, the silicon chip, the spreader (both sides), and the heatsink, there is an oxide layer that contributes thermal resistance.

The idea was to assemble the components in a hard vacuum, with no oxygen or anything else present. A laser or some other means is used to remove the oxide layer, exposing the bare metal. When two exposed metal surfaces are placed in contact, they bond as though they were one piece of metal. It is not the metal bond itself that we are interested in, but the low thermal resistance.

Atom

In the discussion on Knights Landing, I suggested that the Atom core might not be bad for transaction processing. The cheapest Xeon Phi is the 7210 at $2,438, about $4,700 in a complete system. What is the difference between the Atom C2750 and C2758? Both are 8-core Silvermont parts, no HT, and use ECC SODIMMs.

Atom has changed since its original incarnation, which forwent out-of-order execution for simplicity and power efficiency. Silvermont added OOO; I am not sure about Goldmont. Is Atom to become a slimmed-down Core, 3-wide superscalar and manufactured on the SoC version of the process?

Update 2  2016 Dec

Intel Microarchitecture 2020?

There is an article on WCCFtech claiming that a new Intel processor architecture to succeed the Lake processors (Sky, Cannon, Ice, and Tiger) will be "faster and leaner" and, more interestingly, might not be entirely compatible with older software. The original source is bitsandchips.it. I suppose it is curious that the Lake processors form a double tick-tock, or now process-architecture-optimization (PAO), skipping Kaby and Cannon. Both the Bridge processors (Sandy and Ivy) and the Well processors (Has and Broad) had only one tick-tock pair each.

Naturally, I cannot resist commenting on this. About time!

For perspective, in the really old days, processor architecture and instruction set architecture (ISA) were much the same thing: the processor implemented the instruction set, so that was the architecture. I am excluding the virtual-architecture concept, in which a lower-cost version would not implement the complete instruction set in hardware.

The Intel Pentium Pro was a significant step away from this, with micro-architecture and instruction set architecture now largely separate topics. Pentium Pro had its own internal instructions, called micro-operations (µops). The processor dynamically decodes X86 instructions into the "native" micro-operations. This was one of the main concepts that allowed Intel to borrow many of the important technologies from RISC.
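
As a rough illustration (Python; the instruction format and µop names here are made up, not Intel's actual internal encoding), a CISC instruction with a memory operand decomposes into simple RISC-like load/execute/store steps:

# Toy X86-to-micro-op decoder. The encoding is hypothetical; only the
# decompose-into-simple-steps idea reflects real hardware.
def decode(op, dst, src):
    if dst.startswith("["):
        # Memory destination: read-modify-write becomes three uops.
        return [("load", "tmp0", dst), (op, "tmp0", src), ("store", dst, "tmp0")]
    # Register destination: a single uop suffices.
    return [(op, dst, src)]

print(decode("add", "[mem]", "eax"))  # load + add + store
print(decode("add", "ebx", "eax"))    # one uop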

The Pentium 4 processor, codename Willamette, had a trace cache: a cache for decoded instructions. This did not carry over into the Core 2 architecture that followed Pentium 4.

My recollection is that Pentium Pro had 36 physical registers, of which only 8 are visible to the X86 ISA. The processor renames the ISA registers as necessary to support out-of-order execution. Pentium 4 increased this to 128 physical registers.
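
A minimal sketch of the renaming mechanism, with hypothetical names (the general idea, not the P6 implementation):

# Register renaming: the 8 architectural X86 registers are mapped onto a
# larger physical pool, so back-to-back writes to the same architectural
# register do not serialize against each other.
from itertools import count

phys_alloc = count()   # hardware draws from a finite free list (36 or 128)
rename_table = {}      # architectural register -> current physical register

def rename_write(arch_reg):
    rename_table[arch_reg] = f"p{next(phys_alloc)}"
    return rename_table[arch_reg]

def rename_read(arch_reg):
    return rename_table[arch_reg]

rename_write("eax")         # first write of eax -> p0
print(rename_write("eax"))  # second write gets p1, independent of p0
print(rename_read("eax"))   # readers see the latest mapping, p1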

Also see MIT 6.838 and NJIT rlopes

The Nehalem micro-architecture diagrams do not mention a µop cache (the acronym is DSB, Decoded Stream Buffer), but Sandy Bridge and subsequent processors do. This is curious, because both Willamette and Nehalem are Oregon designs, while Core 2 and Sandy Bridge are Haifa designs.

The other stream that comes into this topic is the Intel Itanium adventure. The original plan for Itanium was to have a hardware (silicon) X86 unit. Naturally, this would not be comparable to the then-contemporary X86 processors, which for Merced would have been the Pentium III, codename Coppermine, at 900MHz. By implication, X86 execution would probably be comparable to something several years old, a 266MHz Pentium II with luck, and Itanium was not lucky.

By the time of Itanium 2, the sophistication of software CPU emulation was sufficiently advanced that the hardware X86 unit was discarded. In its place was the IA-32 Execution Layer; see the IEEE Micro paper on this topic. My recollection is that the Execution Layer's emulation was not great, but not bad either.

The two relevant technologies are: one, the processor having native µops instead of the visible X86 instructions; and two, the Execution Layer for non-native code. Given this, why is the compiler still generating X86 binaries (OK, Intel wants to call these IA-32 and Intel 64 instructions)?

Why not make the native processor µops visible to the compiler? When the processor detects a binary with native micro-instructions, it can bypass the decoder. Also, make the full set of physical registers visible to the compiler. If Hyper-Threading is enabled, then the compiler should know to use only the correct fraction of the registers.

Have one or two generations of overlap for Microsoft and the Linux players to build a native micro-op operating system. Then ditch the hardware decoders for X86. Any old code would then run on the Execution Layer, which may not be 100% compatible, but we need a clean break from the old baggage or it will sink us.

Off topic, but who thinks legacy baggage is sinking the Windows operating system?

Addendum

Of course, I still think that one major issue is that Intel is stretching their main-line processor core over too broad a spectrum. The Core is used in both high-performance and high-efficiency roles. For high performance, it is capable of well over 4GHz, probably limited more by power than by transistor switching speed. For power efficiency, the core is throttled to 2 or even 1 GHz.

If Intel wants to do this in a mobile processor, it is probably not that big a deal. However, in the big server chips, with 24 cores in Xeon v4 and possibly 32 cores in the next generation (v5), it becomes a significant matter.

The theory is that if a given core is designed to operate at a certain level, then doubling the logic should achieve a 40% increase in performance. So if Intel is deliberately de-rating the core in the Xeon HCC die, then they could build a different core targeted specifically at one half the original performance with perhaps one quarter the complexity.

So it should be possible to have 100 cores, each with half the performance of the 4GHz-capable Broadwell core, i.e., equivalent to Broadwell at 2GHz. If this supposed core were very power efficient, then perhaps the thermal envelope of 100 mini-cores could even be supported. The arithmetic is sketched below.
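
A back-of-envelope version of this, under the usual performance-scales-as-the-square-root-of-complexity assumption (Pollack's rule), taking the 24-core Xeon v4 as the area budget and ignoring the uncore:

# Doubling the logic gives sqrt(2) = 1.41x performance, the 40% figure.
# Inverting: half the performance needs (1/2)^2 = 1/4 the complexity.
mini_perf = 0.5                 # relative to the 4GHz-capable big core
mini_area = mini_perf ** 2      # 0.25 of the big core's area
budget = 24 * 1.0               # silicon budget of 24 big cores (Xeon v4)
mini_cores = budget / mini_area
print(mini_cores)               # 96 mini-cores, roughly the 100 above
print(mini_cores * mini_perf)   # 48 big-core-equivalents of throughput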

Of course, not every application is suitable for wide parallelism. I would like to see Intel do a processor with mixed cores. Perhaps 2 or 4 high performance cores and 80 or so mini-cores?

Cornell ECE 4750 Computer Architecture  

Update 1  2016 Dec

Starting from Nehalem, I will try to provide to-scale Intel CPU die images. Note that in most cases, I am not scaling the raw image, but doing so in HTML. Also, I am assuming that the images I found online have the correct aspect ratio, which is not guaranteed. I will try to use a scale of 10 pixels to 1 mm. In some cases, the actual die dimensions are listed online; in others, I work backward from the published die area and the image aspect ratio. If anyone notices errors and has the correct values, please advise.
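
Below is a sketch (Python) of that backward calculation: assume the image aspect ratio is correct, solve the rectangle for width and height from the published area, then scale at 10 pixels per mm. The Knights Landing numbers further down serve as the example.

import math

PX_PER_MM = 10  # the scale used for these images

def die_dimensions(area_mm2, aspect_w_over_h):
    # For a rectangle: area = w * h and w / h = aspect, so h = sqrt(area / aspect).
    h = math.sqrt(area_mm2 / aspect_w_over_h)
    return aspect_w_over_h * h, h

# Knights Landing: published area 683 mm^2, aspect ratio measured off the photo.
w, h = die_dimensions(683, 32.2 / 21.2)
print(f"{w:.1f} x {h:.1f} mm -> {round(w*PX_PER_MM)} x {round(h*PX_PER_MM)} px")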

Nehalem 45nm

[die images: Nehalem and Nehalem-EX]
Nehalem 263 mm², 19.45 × 13.52 mm; Nehalem-EX 684 mm², 31.41 × 21.78 mm.

Many years ago, the maximum die size was 440 mm², something to do with the reticle limit of the optics used. Then it became possible to do 680 mm². This was about the time full-frame camera sensors became much less outrageously expensive; full frame is 36 × 24 mm.

Westmere 32nm

A comparison of the quad-core 45nm Nehalem with the 32nm dual-core Westmere, 81 mm², 8.52 × 9.50 mm, and the six-core Westmere-EP, 248 mm², 21.55 × 11.51 mm.

[die images: Nehalem, Westmere 2-core, and Westmere-EP 6-core]

Westmere and Sandy Bridge 32nm

[die images: 6-core Westmere and 4-core Sandy Bridge]
Westmere 6-core 248 mm² and Sandy Bridge 4-core 216 mm², 21.25 × 10.16 mm (could be 20.87 × 10.35?), both 32nm.

Below are Westmere-EX and Sandy Bridge EP/EN.

[die images: Westmere-EX and Sandy Bridge EP/EN]
Westmere-EX 10-core 513 mm², 25.85 × 19.84 mm, and Sandy Bridge EP & EN 8-core 435 mm², 22.19 × 19.61 mm.

Ivy Bridge 22nm

[die images: Ivy Bridge 10-core and 15-core]
The 10-core 341 mm², 16.43 × 20.76 mm, and 15-core 541 mm², 25.38 × 21.32 mm. There is also a 6-core 256.5 mm²; no image found.

[die image: Ivy Bridge 4-core]
I believe the above is Ivy Bridge 4-core HM-4, 133 mm², 7.656 × 17.349 mm. The Ivy Bridge HE-4 is 160 mm², 8.141 × 19.361 mm (not shown).

[die image: Haswell 4-core]
Above is Haswell 4-core with GT2 graphics, 177 mm², 8.09 × 21.89 mm.

[die images: Broadwell 2-core GT2 and 2-core Iris]
Above left is Broadwell 2-core with GT2 graphics, 82 mm², 6.08 × 13.49 mm. On the right is the 2-core with Iris graphics, listed at 133 mm². From the original raw image aspect ratio, I calculate the dimensions to be 6.80 × 19.55 mm. I do not know why the standard-graphics die is 6.08 mm and the Iris-graphics die is 6.80 mm. Never mind; notice the blank space above the cores of the 2-core Iris die.

[die image: Broadwell 4-core]
Above is the 4-core Broadwell. I cannot find the die size or dimensions; in trying to match up the cores, I am guessing a die size of 172 mm² and dimensions of 12.41 × 13.86 mm.

Haswell 22nm

[die images: Haswell 8-core and 18-core]
The 8-core 354 mm², 17.62 × 20.09 mm, and 18-core 661 mm², 31.81 × 20.78 mm. There is also a 12-core 492 mm²; no image found.

Broadwell 14nm

[die image: Broadwell 10-core]
The 10-core 246.2 mm², 15.20 × 16.20 mm. The 15-core 306.2 mm², 18.90 × 16.20 mm, and 24-core 456.1 mm², 25.20 × 18.10 mm; no images found.

Sky Lake 14nm

[die image: Sky Lake 4-core]
Sky Lake 4-core 122.4 mm², 13.36 × 9.16 mm.

Knights Landing 14nm, Xeon Phi 200

[die image: Knights Landing]
The above is based on a die size of 683 mm²; if the image aspect ratio is correct, the dimensions are 32.2 × 21.2 mm.

All Desktops

[gallery: the desktop dies above repeated at a common scale: Nehalem, Westmere 2-core and EP 6-core, Sandy Bridge 4-core, Ivy Bridge 4-core, Haswell 4-core, Broadwell 2- and 4-core, and Sky Lake 4-core]

All Server with ppt diagrams

[gallery: the server dies above, Nehalem-EX through Broadwell-EX, each die photo paired with a PowerPoint-style layout diagram]

Wikichips seems to have a decent collection of details on Broadwell EP/EX and other Intel processors from Sandy Bridge forward.

Server stick diagrams

[gallery: simplified stick-diagram renderings of the server dies, Nehalem-EX through Broadwell-EX]

Update 2014 Feb

To better understand server systems, it is necessary to understand the microprocessors on which server systems are based. In turn, it is helpful to understand the recent history of microprocessors. The above links trace Intel processors from Pentium Pro to Haswell (1995-2014). The Intel Pentium Pro was a landmark that brought Intel into prominence in server systems. It was the first X86 (or IA-32) processor that shook the presumption that RISC was superior and would eventually take over the market.

The RISC argument had become so broadly accepted that Intel even planned on replacing their X86/IA-32 line. Rather than fielding yet another RISC processor, Intel felt compelled to come up with an even better architecture, which became EPIC, the foundation of the Itanium processors. There is a valid technical foundation to the argument that RISC concepts were becoming outdated: RISC was conceived in anticipation of transistor budgets in the tens of thousands, while in the EPIC/Itanium time frame, the transistor budget was in the tens of millions. Broad consensus does not seem to be a reliable predictor of the future.

New material will be placed below until it can be incorporated into the appropriate sections.

Ivy Bridge EP came in 2013 Q3, and it was disclosed that there would be 3 distinct dies. The diagram below is from AnandTech; see the references below.

[diagram: the three Ivy Bridge EP dies, from AnandTech]

See AnandTech: "Intel's Xeon E5-2600 V2: 12-core Ivy Bridge EP for Servers" by Johan De Gelas, September 17, 2013, and "Intel Readying 15-core Xeon E7 v2"; SemiAccurate: "A technical look at Intel's new Ivy Bridge-EX"; and Tom's Hardware: "Intel Xeon E5-2600 v2: More Cores, Cache, And Better Efficiency".

When separate 10- and 12-core dies for Ivy Bridge EP were announced, it seemed a rather unusual choice. Later, when the 15-core EX model was brought out, it became clear that the 12-core E5 v2 actually shares the 15-core die with the E7 v2. Below is my rendering of the 3 Ivy Bridge EP dies that will be used in system architecture diagrams.

[diagram: my rendering of the three Ivy Bridge EP dies]

Presumably the 6-core die is to better support the lower price points.