
Processor Architectures and History

Intel processor die images: desktop, Xeon (big die) images, Intel vs. AMD Opteron, Xeon w/stick diagrams,

Itanium, AMD Notes: Zen 2/3

Intel and AMD Historical, Pentium 4, AMD Opteron, Dual Core, Pentium M to Core 2, Nehalem, Sandy Bridge to Haswell, Hyper-Threading, SIMD Extensions

Other incomplete items
Performance Overview (2017 Jul), Microarchitecture 2020 (2016 Dec), Update Jan 2017, Update Feb 2014

Turbo-Boost Alternative, 2018 Oct

Preliminary  

Many years ago, perhaps in the mid-1990s, the maximum die size was 440 mm2, something to do with the reticle limit or the optics used. Then it became possible to make 680mm2 die.

For comparison, we can look at the progression in camera sensor size over time. The Canon 1D had an APS-H sensor at 28.7 × 19.1mm = 548mm2 in late 2001.

The 1Ds followed in late 2002 with a full-frame sensor at 36 × 24mm = 864mm2. While expensive, it was not prohibitively so. Note that APS-C at 23.6 × 15.7 mm = 370mm2 is relatively inexpensive.

The number of whole die per 300mm wafer is 151, so the raw die cost might be less than $100?
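A rough way to check a whole-die count like this is to lay a rectangular grid of die over the wafer circle and count the rectangles that fit entirely inside it. The sketch below assumes a 300mm wafer with no edge exclusion, no scribe lines and no notch, and tries a few grid offsets. `whole_die_per_wafer` and `dpw_approx` are hypothetical helpers; the second is the common textbook approximation πr²/A − πd/√(2A), included only for comparison.

```python
import math


def whole_die_per_wafer(die_w, die_h, wafer_d=300.0, edge_excl=0.0, steps=10):
    """Count die (mm) that lie entirely inside the usable wafer circle.

    Sketch only: ignores scribe lines, the notch and reticle stepping. The grid
    origin is shifted in `steps` increments along each axis; the best count wins.
    """
    r = wafer_d / 2.0 - edge_excl
    nx = int(2 * r / die_w) + 2  # columns needed to span the circle
    ny = int(2 * r / die_h) + 2  # rows needed to span the circle
    best = 0
    for i in range(steps):
        for j in range(steps):
            ox = -r + die_w * i / steps  # grid offset within one die pitch
            oy = -r + die_h * j / steps
            count = 0
            for a in range(-1, nx):
                x0, x1 = ox + a * die_w, ox + (a + 1) * die_w
                for b in range(-1, ny):
                    y0, y1 = oy + b * die_h, oy + (b + 1) * die_h
                    # The circle is convex, so a die is inside iff all 4 corners are.
                    if all(x * x + y * y <= r * r for x in (x0, x1) for y in (y0, y1)):
                        count += 1
            best = max(best, count)
    return best


def dpw_approx(die_w, die_h, wafer_d=300.0):
    """Textbook gross-die estimate: wafer area / die area minus an edge-loss term."""
    area = die_w * die_h
    return math.pi * (wafer_d / 2.0) ** 2 / area - math.pi * wafer_d / math.sqrt(2.0 * area)


if __name__ == "__main__":
    for name, w, h in [("APS-C", 23.6, 15.7), ("Full frame", 36.0, 24.0)]:
        print(f"{name}: {whole_die_per_wafer(w, h)} whole die, approx. {dpw_approx(w, h):.0f}")
```

The grid count is a lower bound on what a real stepper layout achieves only if the same assumptions hold; edge exclusion (a few mm) is the parameter most likely to pull the count down.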

2022

On the page L2 and L3 size, I am trying to estimate the L2 and L3 cache area for some recent processors. I believe Intel uses the high-performance SRAM cell for L2. Which SRAM cell does Intel use for L3? Also, the same question for the Alder Lake E-cores?

AnandTech article "Intel 4 Process Node In Detail: 2x Density Scaling, 20% Improved Performance" by Ryan Smith, June 13, 2022.

2022 VLSI Symposium Intel 4 Technology papers.

Recent Processors   2022 Dec

Images below are scaled 1mm = 10 pixels.

Comet Lake 206.1mm2, 22.4 × 9.2mm,   Rocket Lake 281mm2, 24 × 11.7mm
CometLake   RocketLake

Alder Lake 215.25mm2, 20.5 × 10.5mm,   Raptor Lake 257mm2, 23.8 × 10.8mm?
AlderLake   RaptorLake
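Since each entry lists both an area and width × height, a quick consistency check catches transcription slips. The sketch below uses hypothetical helper names and an arbitrary 1% tolerance; the Ivy Bridge HE-4 entry is copied from the Ivy Bridge section of this page to show what a flagged entry looks like.

```python
# (name, listed area mm2, width mm, height mm), copied from this page.
DIES = [
    ("Comet Lake", 206.1, 22.4, 9.2),
    ("Rocket Lake", 281.0, 24.0, 11.7),
    ("Alder Lake", 215.25, 20.5, 10.5),
    ("Raptor Lake", 257.0, 23.8, 10.8),
    ("Ivy Bridge HE-4", 160.0, 8.141, 19.361),
]


def rel_error(area, w, h):
    """Relative difference between width x height and the listed area."""
    return (w * h - area) / area


def flag_mismatches(dies, tol=0.01):
    """Return (name, relative error) for entries whose dimensions disagree with the area."""
    out = []
    for name, area, w, h in dies:
        err = rel_error(area, w, h)
        if abs(err) > tol:
            out.append((name, err))
    return out


if __name__ == "__main__":
    for name, err in flag_mismatches(DIES):
        print(f"{name}: width x height differs from listed area by {err:+.1%}")
```

With these entries only the HE-4 line is flagged: 8.141 × 19.361mm works out to about 157.6mm2, against the listed 160mm2.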

Raptor Lake  2022 Dec

Images below are scaled 1mm = 30 pixels.

Raptor Lake 8P, 16E
raptor lake Fritzchens
Sourced through Wikipedia from Fritzchens Fritz - Raptor Lake.

Below is the annotated Raptor Lake die from Wikichip.

RaptorLake

Alder Lake  2022 Oct

The actual desktop Alder Lake die seems to be
8P + 8E and 32 EU GPU, 10.5 x 20.5mm, 215.25 mm2
6P + 0E and 32 EU GPU, 10.5 x 15.5mm, 162.75 mm2

The 8P+8E die image is below.

AlderLake

It is apparent that the 4E slice is larger than one P slice. Intel did not set an arbitrary size ratio between the P and E cores; the sizes are determined by architectural decisions. This means we cannot freely imagine whatever P+E combinations we would like on the assumption that one P slice and one 4E slice are exactly the same size.

It has been reported that Raptor Lake will be 8P+16E. While this is a good incremental progression as software is adapted to the high-low mix, what I would like to see is 8P + 32E or even 64E.

The mobile and ultra mobile versions are
6P + 8E and 96 EU GPU, 10.62 x 20.45mm, 217.18 mm2
2P + 8E and 96 EU GPU, ? x ?mm, ? mm2
The mobile version gives up 2 P-cores for a GPU with 3X the EUs (an additional 64 EUs). This does not mean 64EUs

Update  2021 Sep

There are no die pictures of Intel Alder Lake yet, though the following diagrams were in the Intel Architecture Day 2021 slides, representing the Ultra Mobile, Mobile and Desktop models, with 2, 6 and 8 P-cores respectively.

AlderLake AlderLake AlderLake

Each of the three die has 8 E-cores, which come in groups of 4; each group of 4 has approximately the die area of one P-core. The UM and M die have graphics with 96 EUs, while the desktop die has 32 EUs.

The theory behind the P and E-cores stems from the transistor density/budget half of Moore's Law. Each new process targets a 0.707X linear shrink (one over the square root of two). Moving a CPU design from the previous process to the next (current) one reduces the die size by a factor of two (excluding off-die IO elements). The next processor architecture targets 2X the transistor budget/complexity for the same die size as the previous architecture on the previous process. The goal of the 2X transistor budget in a single new core is a 1.41X increase in performance (the square root of 2).
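Written out, with s the linear shrink, A the die area of a fixed design, N the transistor budget at constant die area and P single-core performance. The square-root relation between transistor budget and performance is the empirical rule of thumb usually called Pollack's rule; it is an assumption about typical designs, not a law.

```latex
\begin{aligned}
s &= \frac{1}{\sqrt{2}} \approx 0.707 && \text{linear shrink per process node} \\
\frac{A_{\text{next}}}{A_{\text{prev}}} &= s^{2} = \frac{1}{2} && \text{same design moved to the next node} \\
N_{\text{next}} &= \frac{N_{\text{prev}}}{s^{2}} = 2\,N_{\text{prev}} && \text{transistor budget at the same die area} \\
P \propto \sqrt{N} \;&\Rightarrow\; \frac{P_{\text{next}}}{P_{\text{prev}}} = \sqrt{2} \approx 1.41 && \text{single-core performance}
\end{aligned}
```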

One alternative to doubling the transistor budget/complexity of a single core is to have two cores of the same transistor complexity. There is no improvement in performance at the core level (some gain is possible from refinements over time), but a 2X increase in throughput could be possible for highly (and effectively) threaded workloads.

In the early period of Moore's Law, everyone pursued the path of increasing the performance of the core, as this would benefit most applications, and it was thought there were few situations in which single-thread performance could be sacrificed. Only in the early 2000s, when it was realized that continuing to achieve significant gains in general-purpose performance at the single-core level was no longer practical, did the course change to increasing the number of cores.

Note: graphics is one segment in which ever-increasing massive parallelism for throughput gains is favored over single-thread performance.

All this said, what is the right choice now, considering that 8 or more performance cores are practical for desktop (power-connected) systems? What is the solution for mobile systems, which prefer power-efficient cores but still want performance cores when connected to power, or in situations where one is willing to trade battery life for performance?

With the rule of 2X transistor budget for a 1.41X gain, a 4X transistor budget (die area) corresponds to 2X performance at the single-core level. However, 4 small cores in the same area would have 2X the throughput of one large core. That the smaller cores are more power efficient is a separate argument, but it is said to be true for the Intel E and P cores.

Obviously, in order to fully benefit from many cores, the workload must be effectively multi-threaded. If a workload is effectively multi-threaded, then many small cores would achieve more throughput than fewer big cores on a fixed die area basis. A workload needs single-thread performance when it does not multi-thread effectively, and hence would use only a single core. Presumably, there would not be many such tasks running simultaneously, depending on the specific situation. In any case, we can expect it to take several generations to fully work out the best way to implement an asymmetric core strategy.
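One way to put numbers on the fixed-area argument is the multicore model from Hill and Marty's "Amdahl's Law in the Multicore Era" (2008), swapped in here as an assumption, not anything Intel has published about its P/E cores. The die budget is n small-core areas; a core built from r of them is assumed to perform as √r (the square-root rule), and f is the fraction of the work that threads perfectly. The budget n = 16 and big-core size r = 4 in the usage lines are illustrative, not Alder Lake's actual ratios.

```python
import math


def perf(r):
    """Single-thread performance of a core using r small-core areas (sqrt rule, assumed)."""
    return math.sqrt(r)


def speedup_symmetric(f, n, r):
    """n area units split into n/r identical cores of size r (Amdahl's law)."""
    serial = (1.0 - f) / perf(r)          # serial part runs on one core
    parallel = f * r / (perf(r) * n)      # parallel part spread over n/r cores
    return 1.0 / (serial + parallel)


def speedup_asymmetric(f, n, r):
    """One big core of size r plus (n - r) small cores of size 1."""
    serial = (1.0 - f) / perf(r)          # serial part runs on the big core
    parallel = f / (perf(r) + n - r)      # parallel part uses every core
    return 1.0 / (serial + parallel)


if __name__ == "__main__":
    n = 16
    print(" f     all small  all big(r=4)  1 big + 12 small")
    for f in (0.0, 0.5, 0.9, 0.99, 1.0):
        print(f"{f:4.2f}  {speedup_symmetric(f, n, 1):9.2f}  "
              f"{speedup_symmetric(f, n, 4):12.2f}  {speedup_asymmetric(f, n, 4):16.2f}")
```

At f = 0.9, for instance, this gives 6.4 for sixteen small cores, about 6.15 for four big cores, and 8.75 for one big core plus twelve small ones: under these assumptions the mixed design wins unless the workload is almost entirely serial or almost perfectly parallel.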

Also see Asymmetric Cores (2018-02).

 

Sapphire Rapids
AlderLake

Update  2016 Dec

Starting from Nehalem, I will try to provide scaled Intel CPU die images. Note that in most cases, I am not scaling the raw image, but doing so in HTML. Also, I am assuming that the images I found online have the correct aspect ratio, which is not guaranteed. I will try to use the scale of 10 pixels to 1 mm. In some cases, the actual die dimensions are listed online. If anyone notices errors and has the correct ratio, please advise.
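Scaling in HTML amounts to setting the img width and height attributes from the die dimensions. A minimal sketch with a hypothetical helper name and made-up file names; 10 px/mm is the scale used for most of this page, and the Raptor Lake close-up uses 30 px/mm.

```python
from html import escape


def scaled_img_tag(src, width_mm, height_mm, px_per_mm=10, alt=""):
    """Build an <img> tag whose displayed size puts the die at a common mm-to-pixel scale.

    Setting both attributes forces this size; if the raw photo's aspect ratio
    differs from the die's, the displayed image is distorted.
    """
    w = round(width_mm * px_per_mm)
    h = round(height_mm * px_per_mm)
    return f'<img src="{escape(src)}" width="{w}" height="{h}" alt="{escape(alt)}">'


if __name__ == "__main__":
    # Hypothetical file names; dimensions are from this page.
    print(scaled_img_tag("knights_landing.jpg", 32.2, 21.2, alt="Knights Landing"))
    print(scaled_img_tag("raptor_lake.jpg", 23.8, 10.8, px_per_mm=30, alt="Raptor Lake"))
```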

Nehalem 45nm

Nehalem Nehalem-EX

Nehalem 263 mm2, 19.45 × 13.52 mm, Nehalem-EX 684mm2, 31.41 × 21.78 mm

Nehalem is the codename for the processor architecture. The quad-core die with QPI interfaces had the codename Bloomfield. A quad-core part with PCI-E had the codename Lynnfield. The 8-core big die part could be called either Nehalem-EX or Beckton?

Nehalem   Lynnfield
Bloomfield (QPI), and Lynnfield (PCI-E)

Lynnfield, above on the right, is 296 mm2, 22.01 × 13.45 mm

Wikichip has more details along with large die images of Nehalem and, in general, very good coverage of the more recent Intel processors.
Wikipedia may have more information on Nehalem and the older Intel processors.

In the bit-tech article "Intel Core i7 - Nehalem Architecture Dive - Architecture enhancements", Nehalem is cited as having 20-24 pipeline stages compared to 14 in Core 2.

Nehalem L1 latency is 4 cycles. The L2 cache is 256K, with 10-cycle access. The Nehalem-EP 8M L3 shared by 4 cores is cited as 35ns. By comparison, the 45nm Penryn L1 is 3 cycles, and the 6M L2 shared by 2 cores is 15ns. The 65nm Conroe 4M shared L2 is 14ns.

Westmere 32nm

The 32nm dual-core Westmere 81 mm2, 8.52 × 9.50mm, and the six-core Westmere EP 248 mm2, 21.55 × 11.51mm.

Westmere   Westmere   Westmere
Westmere 2-core, and Westmere EP 6-core

Wikichip Westmere.

Westmere and Sandy Bridge 32nm

Westmere 6-core and Sandy Bridge 4-core 216mm2, 21.25 × 10.16mm (could be 20.87 × 10.35?), both 32nm.

Westmere   Sandy Bridge

Above is Sandy Bridge 4c with GT2 graphics, 12 EUs. There are also 2-core versions with GT2 and GT1, 12 and 6 EUs respectively.

Wikichip Sandy Bridge. The Sandy Bridge 4-core uses a ring to connect the 4 cores, the system agent and graphics. Previously, a ring was used on the Nehalem and Westmere EX processors, but not the smaller EP? Sandy Bridge L3 is 26-31 cycles, perhaps 10-11ns?
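Converting between cycles and nanoseconds needs only the clock the cycles are counted in. The sketch below assumes the quoted latencies are core-clock cycles and uses core clocks of 2.5 to 3.5 GHz purely for illustration; the clock of the part actually measured is not stated here.

```python
def cycles_to_ns(cycles, clock_ghz):
    """Latency in ns for a cycle count at the given clock (1 GHz = 1 cycle per ns)."""
    return cycles / clock_ghz


def ns_to_cycles(ns, clock_ghz):
    """Cycle count corresponding to a latency in ns at the given clock."""
    return ns * clock_ghz


if __name__ == "__main__":
    # Sandy Bridge L3 range quoted above, at a few assumed clocks.
    for ghz in (2.5, 3.0, 3.5):
        lo, hi = cycles_to_ns(26, ghz), cycles_to_ns(31, ghz)
        print(f"{ghz} GHz: L3 26-31 cycles = {lo:.1f}-{hi:.1f} ns")
    # The Nehalem-EP L3 figure quoted as 35 ns, expressed in cycles.
    for ghz in (2.5, 3.0):
        print(f"{ghz} GHz: 35 ns = {ns_to_cycles(35, ghz):.0f} cycles")
```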

The Pentium 4 processors had a trace cache. Nehalem had a µop loop buffer. Sandy Bridge has a µop cache, 1536 µops, 8-way (32 sets of 6?)
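As I recall from Intel's optimization reference manual, the Sandy Bridge decoded µop cache is 32 sets × 8 ways × 6 µops per way, indexed by aligned 32-byte code windows, with at most 3 ways per window; treat those parameters as assumptions to verify. The sketch only shows how the 1536 figure and the set mapping follow from that geometry.

```python
SETS = 32                # assumed
WAYS = 8                 # per set
UOPS_PER_WAY = 6         # assumed
WINDOW_BYTES = 32        # aligned code window that maps to one set (assumed)
MAX_WAYS_PER_WINDOW = 3  # assumed


def capacity_uops():
    """Total µops the cache can hold: sets x ways x µops per way."""
    return SETS * WAYS * UOPS_PER_WAY


def set_index(code_addr):
    """Set used by the 32-byte window containing code_addr."""
    return (code_addr // WINDOW_BYTES) % SETS


def max_uops_per_window():
    """Upper bound on cached µops for one 32-byte window."""
    return MAX_WAYS_PER_WINDOW * UOPS_PER_WAY


if __name__ == "__main__":
    print(capacity_uops(), max_uops_per_window())
    # Windows 1 KiB apart (32 sets x 32 bytes) land in the same set.
    print(set_index(0x1000), set_index(0x1000 + 32 * 32))
```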

Below are Westmere-EX and Sandy Bridge EP/EN.

Westmere-EX   Sandy Bridge
Westmere-EX, 10 cores, 513mm2, 25.85 × 19.84 mm,
and Sandy Bridge EP & EN, 8 cores, 435mm2, 22.19 × 19.61 mm.

Ivy Bridge 22nm

Wikichip Ivy Bridge.

Ivy Bridge-6c   Ivy Bridge-10c   Ivy Bridge-15c

The 10-core is 341mm2, 16.43 × 20.76 or 17.03 × 20.03mm, and the 15-core is 541mm2, 25.38 × 21.32 mm.
There is also a 6-core, 256.5mm2, 17.06 × 15.04mm, but no image was found.

Ivy Bridge 4c  

I believe the above left is Ivy Bridge 4-core HM-4, 133mm2, 7.656 × 17.349mm. The Ivy Bridge HE-4 is 160mm2, 8.141 × 19.361mm (not shown).

Wikichip indicates the above is actually the 160mm2 part with 2x8 EU? If so, then the scaled image is:

Ivy Bridge 4c  

Haswell 22nm

Wikichip Haswell.

Haswell 4c   Haswell 4c  

Above left is Haswell 4-core GT2 graphics (2×10 EU), 177mm2, 8.09 × 21.89mm. Above right is 2-core GT3 (2×16 EU?) 181mm2, 8.39 × 21.58mm.

Haswell 8c   Haswell 18c

The 8-core is 354mm2, 17.62 × 20.09 mm, with cores arrayed 4 vertical, 2 across,
and the 18-core is 661mm2, 31.81 × 20.78 mm,
with cores arrayed 4 vertical in the first 3 columns and 6 in the last column.
There is also a 12-core, 492mm2, but no image was found.

Broadwell 14nm

Wikichip Broadwell.

Broadwell 4c   Broadwell 4c

Above left is Broadwell 2-core GT2 graphics, 82mm2, 6.08 × 13.49mm. On the right is the 2-core with Iris graphics, listed at 133mm2.
From the original raw image aspect ratio, I calculate the dimensions to be 6.80 × 19.55 mm. I do not know why the standard graphics die is 6.08 mm wide and the Iris graphics die is 6.80 mm wide. Never mind; notice the blank space above the cores of the 2c Iris die.
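Deriving dimensions from a published area and the raw image's aspect ratio is a two-line calculation: with aspect ratio a = width/height, width = √(A·a) and height = √(A/a). A minimal sketch with a hypothetical helper name, assuming the photo is undistorted and cropped tightly to the die edge; the 680 × 1955 pixel size in the usage line is made up to match the ratio implied above.

```python
import math


def dims_from_area(area_mm2, img_w_px, img_h_px):
    """Die (width, height) in mm from its area and the raw image size in pixels.

    Assumes the image is undistorted and cropped exactly to the die edges.
    """
    aspect = img_w_px / img_h_px          # width / height
    width = math.sqrt(area_mm2 * aspect)
    height = math.sqrt(area_mm2 / aspect)
    return width, height


if __name__ == "__main__":
    w, h = dims_from_area(133.0, 680, 1955)   # hypothetical pixel size
    print(f"{w:.2f} x {h:.2f} mm")
```

The same calculation applies to the Knights Landing estimate from its 683mm2 die size.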

Broadwell 4c

Above is the 4-core Broadwell. I cannot find the die size or dimensions. Matching up the cores, I am guessing a die size of 172 mm2 and dimensions of 12.41 × 13.86mm.

Broadwell 10c  

The 10-core 246.2mm2, 15.20 × 16.20 mm, cores are arrayed 5 vertical, 2 across.
The 15-core 306.2mm2, 18.90 × 16.20 mm, 5 vertical, 3 across.
and 24-core 456.1mm2, 25.20 × 18.10 mm, 6 vertical, 4 across.
Note the apparent blank space to the right of the second column of cores.
No image was found for either the 15- or 24-core die.

Sky Lake 14nm

Wikichip Skylake.

Sky Lake  

Sky Lake 4-core 122.4mm2, 13.36 × 9.16 mm.

2017 Jul
I believe this is Skylake-SP, from Mark Bohr's presentation at Intel's Technology and Manufacturing Day, 3/28/2017.

Sky Lake  

Cores are arrayed 5 vertical and 6 across. The grid spots in the left- and right-most columns, second row, are occupied by the memory controllers instead of cores, with the buffering circuits in the narrow vertical strips at the edges.

The other two die options are 3 vertical x 4 across, net 10 cores, and 5 vertical x 4 across, net 18 cores.
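The core counts follow directly from the grid: rows × columns minus the two tiles taken by the memory controllers. A trivial sketch with hypothetical names; the XCC/HCC/LCC labels are the names commonly used for the three Skylake-SP die, not something from the slide.

```python
MEMORY_CONTROLLER_TILES = 2  # left- and right-most columns, second row


def net_cores(rows, cols, reserved=MEMORY_CONTROLLER_TILES):
    """Core tiles left on a rows x cols mesh after reserving non-core tiles."""
    return rows * cols - reserved


SKYLAKE_SP = {"XCC": (5, 6), "HCC": (5, 4), "LCC": (3, 4)}

if __name__ == "__main__":
    for name, (rows, cols) in SKYLAKE_SP.items():
        print(f"{name}: {rows} x {cols} grid -> {net_cores(rows, cols)} cores")
```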

The slide below might be from 3dcenter.org, but I don't read German. I will chase it down if I can.

Epyc  

See ISSCC 2018: Intel’s Skylake-SP Mesh and Floorplan for more on Skylake SP mesh and floorplan.

Wikichip Kaby Lake, Coffee Lake.

 

Knights Landing 14nm, Xeon Phi 200

KnightsLanding  

The above is based on a die size of 683mm2; if the image aspect ratio is correct, the dimensions are 32.2 × 21.2 mm.

 

 

 

 

All Desktops

Nehalem 263 mm2, 19.45 × 13.52 mm,
with 32nm dual-core Westmere 81 mm2, 8.52 × 9.50mm,
and the six-core Westmere EP 248 mm2, 21.55 × 11.51mm

Nehalem   Westmere   Westmere  

Westmere   Sandy Bridge
Westmere 6-core 248mm2 and Sandy Bridge 4-core 216mm2, 21.25 × 10.16mm (could be 20.87 × 10.35?), both 32nm

Ivy Bridge 4c   Haswell 4c   Haswell 4c

I believe the above left is Ivy Bridge 4-core HM-4, 133mm2, 7.656 × 17.349mm. The Ivy Bridge HE-4 is 160mm2, 8.141 × 19.361mm (not shown).

In the middle is Haswell 4-core GT2 graphics, 177mm2, 8.09 × 21.89mm.

On the right is Broadwell 2-core GT2 graphics, 82mm2, 6.08 × 13.49mm
