
Intel Microarchitecture Diagrams


Sandy Bridge


Nehalem


Core


NetBurst (Pentium 4, AKA Willamette)


Pentium M


Atom

Notes from Intel 64 and IA-32 Architectures Optimization Reference Manual

Decoded ICache, Branch Prediction, Macro-fusion

Pipeline:
In-order issue front-end - fetch and decode into micro-ops.
Out-of-order superscalar execution engine
In-order retirement
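
The three pipeline stages above can be illustrated with a toy model. This is only a sketch under simplifying assumptions that are not from the manual: one micro-op issues per cycle, micro-ops are independent, each has a fixed latency, and retirement width is unlimited. The function name `simulate` and the example latencies are hypothetical.

```python
def simulate(latencies):
    """Toy model of in-order issue, out-of-order completion, in-order retirement.

    latencies[i] is the (assumed) execution latency of micro-op i.
    Returns (completion_order, retire_cycles).
    """
    # In-order issue: micro-op i issues in cycle i (one per cycle, an assumption).
    done_cycle = [i + lat for i, lat in enumerate(latencies)]

    # Out-of-order execution: micro-ops finish whenever their latency elapses,
    # so a short op can complete before an older, longer one.
    completion_order = sorted(range(len(latencies)),
                              key=lambda i: (done_cycle[i], i))

    # In-order retirement: a micro-op retires only once it and every
    # older micro-op have completed.
    retire_cycles = []
    oldest_pending = 0
    for cycle in done_cycle:
        oldest_pending = max(oldest_pending, cycle)
        retire_cycles.append(oldest_pending)
    return completion_order, retire_cycles


completion_order, retire_cycles = simulate([5, 1, 1, 3])
print(completion_order)  # micro-ops 1 and 2 finish before micro-op 0
print(retire_cycles)     # but nothing retires before micro-op 0 is done
```

The younger micro-ops complete early, yet their results only become architecturally visible once the older long-latency micro-op has retired.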

2.1.1 Sandy Bridge

1. Branch Prediction Unit
a. Decoded ICache
b. Instruction Cache
c. L2 cache, LLC and memory
2. Micro-ops sent to Rename/retirement block
scheduler
execute - 3 stacks
branch misprediction
3. Memory operations - managed and reordered
4. Exceptions -

2.1.2 Front End

Instruction Cache - 32K
Legacy Decode pipeline - delivers micro-ops to the micro-op queue and the Decoded ICache
Decoded ICache
MSROM
Branch Prediction Unit
Micro-op queue

2.1.2.1 Legacy Decode Pipeline

32KB ICache, 8-way; ITLB: 128 entries for 4K pages, 8 entries for large pages
Instruction PreDecode
Instruction Decode
MicroFusion
Macro-Fusion
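
The ICache and ITLB figures above imply a set count and an address reach, worked out in the short sketch below. Two inputs are assumptions not stated in these notes: a 64-byte cache line and a 2 MB large-page size. All variable names are illustrative.

```python
ICACHE_BYTES = 32 * 1024   # 32KB ICache (from the notes)
ICACHE_WAYS = 8            # 8-way (from the notes)
LINE_BYTES = 64            # ASSUMPTION: line size is not given in the notes

# sets = capacity / (ways * line size)
icache_sets = ICACHE_BYTES // (ICACHE_WAYS * LINE_BYTES)

ITLB_4K_ENTRIES = 128      # from the notes
ITLB_LARGE_ENTRIES = 8     # from the notes
LARGE_PAGE_MIB = 2         # ASSUMPTION: large page taken as 2 MB

# Reach = number of entries * page size
itlb_4k_reach_kib = ITLB_4K_ENTRIES * 4
itlb_large_reach_mib = ITLB_LARGE_ENTRIES * LARGE_PAGE_MIB

print(icache_sets, itlb_4k_reach_kib, itlb_large_reach_mib)
```

Under these assumptions the 4K-page ITLB entries cover 512 KB of code, which is why large pages matter for big instruction footprints.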

2.1.2.2 Decoded ICache
32 sets, 1 set = 8 ways, 1 way = up to 6 micro-ops, 1536 micro-ops total
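
The total follows directly from the geometry; a quick check of the arithmetic (variable names are illustrative only):

```python
SETS = 32
WAYS_PER_SET = 8
MAX_UOPS_PER_WAY = 6

# Upper bound on micro-ops held by the Decoded ICache
decoded_icache_uops = SETS * WAYS_PER_SET * MAX_UOPS_PER_WAY
print(decoded_icache_uops)  # 1536
```

This is a maximum, since a way holds up to 6 micro-ops and may hold fewer.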

2.1.2.3 Branch Prediction

2.1.2.4 Micro-op Queue, Loop Stream Detector (LSD)

2.1.3 Out-of-Order Engine

2 blocks
Rename/retirement block
Scheduler
3 components

2.1.3.1 Renamer

2.1.3.2 Scheduler

Retirement

2.1.4 Execution Core

6 ports
Execution stacks
General Purpose integer
SIMD integer and floating point
X87

         P0    P1    P2    P3    P4    P5
Int      ALU   ALU   LS    LS    St    ALU
SSE I    x     x     -     -     St    ALU
SSE F
X87
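
As a rough illustration, the integer row of the port table can be held in a small lookup structure and queried by operation type. The labels are copied as written in the notes; the dictionary layout and the helper `ports_for` are hypothetical, and the unfilled SSE F and X87 rows are left out.

```python
# Integer row of the port table, transcribed from the notes above.
INT_PORTS = {
    "P0": "ALU",
    "P1": "ALU",
    "P2": "LS",
    "P3": "LS",
    "P4": "St",
    "P5": "ALU",
}


def ports_for(kind, table=INT_PORTS):
    """Return the ports whose entry matches the given label, in port order."""
    return [port for port, label in sorted(table.items()) if label == kind]


print(ports_for("ALU"))  # ['P0', 'P1', 'P5']
print(ports_for("LS"))   # ['P2', 'P3']
```

Three ports carry an integer ALU entry, so up to three such operations can be dispatched in the same cycle according to this table.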

2.1.5 Cache Hierarchy

                            Latency (cycles)             BW per core per cycle (bytes)
L1D                         4                            2x16
L2U                         12                           1x32
LLC                         26-31                        1x32
L2 and L1D in other cores   43 clean hit, 60 dirty hit
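
To put the cycle counts in wall-clock terms, the sketch below converts the table to nanoseconds and peak GB/s. The 3.0 GHz core clock is a hypothetical value chosen only for the example, and the table's units are taken to be cycles and bytes; the names `CACHE`, `latency_ns`, and `peak_gb_per_s` are illustrative.

```python
CLOCK_GHZ = 3.0  # ASSUMPTION: hypothetical core clock, not from the notes

# (min, max) latency in cycles and bytes per core per cycle, from the table.
CACHE = {
    "L1D": {"latency_cycles": (4, 4), "bytes_per_cycle": 2 * 16},
    "L2": {"latency_cycles": (12, 12), "bytes_per_cycle": 1 * 32},
    "LLC": {"latency_cycles": (26, 31), "bytes_per_cycle": 1 * 32},
}


def latency_ns(level):
    """Latency range in nanoseconds: cycles / GHz."""
    lo, hi = CACHE[level]["latency_cycles"]
    return lo / CLOCK_GHZ, hi / CLOCK_GHZ


def peak_gb_per_s(level):
    """Peak per-core bandwidth in GB/s: bytes per cycle * GHz."""
    return CACHE[level]["bytes_per_cycle"] * CLOCK_GHZ


for level in CACHE:
    print(level, latency_ns(level), peak_gb_per_s(level))
```

At the assumed clock, every level sustains the same 32 bytes per cycle of peak bandwidth per core, so the levels differ mainly in latency.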