
Asymmetric Processor Cores (2018-02)

Most Intel processors use one of two core designs: the main-line core and the Atom core. Desktop processors use the newer Kaby Lake or Coffee Lake micro-architecture cores, while the Xeon SP line uses the previous-generation Skylake micro-architecture; all of these are part of the main core line. The Atom core is used in Xeon Phi, Atom, and some Pentium and Celeron processors. In simple terms, the main-line is a big core and the Atom is a small core.

The obvious question is which is better: very many small cores or not quite as many large cores? The answer, as always, is: it depends. Workloads that are highly amenable to multi-threading do better on many smaller cores, because small cores deliver more aggregate throughput per unit of die area. Workloads that do not scale well with multi-threading may do better on big cores. It would seem that all we have to do is match the workload to the processor with the desired core type. For example, database transaction processing should scale well with cores and threads. And if so, then it is a match to the processor with very many small cores.
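The trade-off can be sketched with Amdahl's law. The numbers below are illustrative assumptions, not measurements: a big core is taken as 2x the single-thread speed of a small core, and the die fits either 28 big cores or 72 small cores, matching the Skylake and KNL core counts discussed later.

```python
# Rough Amdahl's-law sketch of the big-vs-small core trade-off.
# Assumed numbers: big core = 2x small-core single-thread speed,
# 28 big cores vs 72 small cores per die.

def speedup(parallel_fraction, n_cores, core_speed):
    """Throughput relative to one small core, per Amdahl's law."""
    serial = 1.0 - parallel_fraction
    return core_speed / (serial + parallel_fraction / n_cores)

for p in (0.50, 0.95, 0.999):
    big = speedup(p, 28, 2.0)    # fewer, faster cores
    small = speedup(p, 72, 1.0)  # many slower cores
    print(f"parallel={p}  28 big: {big:5.1f}x  72 small: {small:5.1f}x")
```

Under these assumptions the big cores win until the parallel fraction is very high; only near-perfectly scalable work favors the many-small-core die.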

Except that life and databases are not simple. The main workload might be transaction processing, but there are also many other important activities that occur on the system. Some do not scale well on many cores and heavily favor the big core. Is the answer to this conundrum a mix of small and big cores?

A processor could have an asymmetric arrangement of small and large cores. On a large die, there could be both very many small cores and a small number of big cores in one processor. Many more small cores than big cores can be placed on a single die. A workload that does not scale well to high thread counts still has access to the large cores, and because such a workload does not scale, there is no point in having many large cores.

Skylake Xeon SP and Knights Landing Xeon Phi

Below left is the Intel Skylake-based Xeon SP XCC die and below right is the Knights Landing-based Xeon Phi x200. Both die are just under 700mm2.

Xeon SP (Skylake), 694mm2: 5 rows, 6 columns, 28 cores max.
Xeon Phi x200 (Knights Landing), 683mm2: 7 rows, 6 columns, 2 cores/tile.

Below is a layout representation of the Skylake XCC die: 5 rows by 6 columns, minus 2 spots for the memory controllers and excluding the top row for PCI-E and UPI elements, for a maximum of 28 big cores.


Below is the layout of Knights Landing (KNL), organized as 7 rows by 6 columns of tiles, excluding the top and bottom rows for the MCDRAM controllers.


Two tile spots are used by the DDR memory controllers and two by PCI-E and DMI, leaving 38 tiles with 2 Atom cores each. Two tiles are spares used to improve manufacturing yield, so the Xeon Phi x200 series has a maximum of 72 cores.
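The core counts above follow directly from the grid dimensions; a quick arithmetic check:

```python
# Sanity-check the stated core counts from the grid dimensions.

# Skylake XCC: 5 rows x 6 columns of core spots, minus 2 spots
# taken by the memory controllers.
skylake_cores = 5 * 6 - 2
print(skylake_cores)  # 28 big cores

# Knights Landing: 7 rows x 6 columns of tiles, minus 2 tile spots
# for the DDR memory controllers and 2 for PCI-E and DMI.
knl_tiles = 7 * 6 - 2 - 2    # 38 tiles on the die
knl_active = knl_tiles - 2   # 2 tiles are spares for yield
knl_cores = knl_active * 2   # 2 Atom cores per tile
print(knl_tiles, knl_cores)  # 38 tiles, 72 cores max
```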

Below is a functional representation of the KNL tile. The sub-units may not be to scale. There are 2 cores, two double-VPU units, 1MB L2 cache shared by the two cores, and the agent which handles connections to other tiles and functional units.


As far as I am aware, the SQL Server core engine code does not use the vector (SSE/AVX) instructions. It would be a shame to waste so much silicon real estate on unused units. Either new instructions should be added that are useful in b-tree navigation or page-row-column offset calculation, or the VPU should be removed from the mixed-core product. I gave a brief discussion of this in Rethink System Architecture.

Below is a simpler representation of the big core Xeon SP and little core Xeon Phi processors.

Xeon SP (Skylake), 694mm2; Xeon Phi x200 (Knights Landing), 683mm2.

Based on an eyeball estimate of the die images, a mixed-core processor could have four large cores and 30 dual-small-core tiles in a slightly smaller die of about 670mm2, or 4 large cores and 36 tiles in a larger 750mm2 die. A representation is shown below.
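The eyeball estimate can be put into a crude per-grid-spot model. The assumptions are loose: each die is treated as a uniform grid, so one spot carries its share of the uncore (caches, mesh, IO) as overhead, and the IO/memory spots in the mixed die are guessed at four tile-sized spots.

```python
# Back-of-envelope area model for the mixed-core options. These are
# crude assumptions from the die images, not measured figures.

big_spot = 694 / 30    # Skylake XCC: 694mm2 over a 5x6 grid
small_spot = 683 / 42  # KNL: 683mm2 over a 7x6 grid

def mix_area(big_cores, tiles, io_spots=4):
    """Estimated die area: big-core spots + tile spots + IO/memory spots.
    io_spots=4 is an assumption for the DDR and PCI-E tile spots."""
    return big_cores * big_spot + (tiles + io_spots) * small_spot

print(round(mix_area(4, 30)))  # Mix1: 4 big cores + 30 tiles
print(round(mix_area(4, 36)))  # Mix2: 4 big cores + 36 tiles
```

The model lands in the same range as the eyeball figures of roughly 670mm2 and 750mm2, which is as much agreement as this kind of estimate can claim.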

Mix1, ~670mm2: 4 big cores, 30 tiles.
Mix2, 750mm2: 4 big cores, 36 tiles.

The smaller option might be better because both the Skylake and KNL processors are already thermally limited. Other mix options are possible. The assumption here is that the big cores are for workloads that do not scale well at high thread counts, hence only a limited number of large cores are needed. It is further assumed that some mechanism will be implemented to determine which type of core handles each particular function or workload.
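That routing mechanism is left open here; one minimal sketch is to tag work as scalable or serial-sensitive and dispatch it to the matching pool of cores. Everything below is hypothetical: thread pools stand in for the big-core and small-core groups, and the scaling tag would come from hints or heuristics in a real scheduler.

```python
# Hypothetical sketch of routing work to core types in a mixed-core
# processor. Thread pools stand in for the two core groups.

from concurrent.futures import ThreadPoolExecutor

big_pool = ThreadPoolExecutor(max_workers=4)     # 4 big cores
small_pool = ThreadPoolExecutor(max_workers=60)  # 30 tiles x 2 cores

def submit(task, scales_well):
    """Route a task to the pool matching its scaling behavior."""
    pool = small_pool if scales_well else big_pool
    return pool.submit(task)

# Transaction processing scales, so it goes to the small cores;
# a serial task such as the log writer goes to a big core.
f1 = submit(lambda: "txn on small core", scales_well=True)
f2 = submit(lambda: "log writer on big core", scales_well=False)
print(f1.result(), "|", f2.result())
big_pool.shutdown()
small_pool.shutdown()
```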

Having very many small cores also opens options in how cores are allocated. Polling might be a better mechanism than interrupts for handling extreme IO. One or more small cores could be dedicated (and customized?) to kernel tasks such as polling.
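The polling idea can be illustrated with a toy stand-in: one dedicated thread spins on a work queue rather than blocking for an interrupt-style notification. This is only a sketch; a real implementation would pin the poller to a specific core (e.g. with sched_setaffinity) and poll a device queue, not a Python deque.

```python
# Toy sketch of a dedicated polling core: a thread that busy-polls
# a work queue instead of blocking, trading CPU time for latency.

import threading
from collections import deque

work = deque()            # incoming "IO" items
done = []                 # processed results
stop = threading.Event()

def polling_core():
    # Busy-poll: check for work continuously, never block.
    while not stop.is_set():
        try:
            done.append(work.popleft() * 2)  # "process" the item
        except IndexError:
            pass  # queue empty: keep spinning

t = threading.Thread(target=polling_core)
t.start()
for i in range(5):
    work.append(i)
while len(done) < 5:  # wait for the poller to drain the queue
    pass
stop.set()
t.join()
print(sorted(done))
```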


Reference: Matt Gillespie, "Preparing for the Second Stage of Multi-Core Hardware: Asymmetric (Heterogeneous) Cores", 2008.