The Pentium 4 architecture (Xeon MP is the 4-way server variant) was meant to run at very high clock rates. The high clock rate is achieved with a very deep pipeline; that is, each operation is divided into many very small steps. There are 20 stages in the first generation Pentium 4 (Willamette and Northwood) and ~31 in the second generation (Prescott and Cedar Mill), compared to 10-12 on the Pentium Pro to Pentium III line, 12 stages for integer and 17 for floating point on the Opteron, and 14 for the Core 2. The Itanium 2 has 8 pipeline stages, but can execute 6 instructions simultaneously. To realize the full benefit of the very deep Pentium 4 pipeline, the code sequence must be very predictable. Otherwise the pipeline gets flushed too often, and the Pentium 4 is no better than other processors running at a lower clock rate. The Pentium 4 tends to have uneven performance characteristics relative to other architectures across a range of applications.
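The trade-off between clock rate and pipeline depth can be sketched with a toy model. The stage counts (20 and 12) and the clock rates (3.0GHz and 2.0GHz) come from this article; the branch fraction, the misprediction rates, the base CPI of 1.0, and the rule that one misprediction costs a full pipeline refill are illustrative assumptions, not measured values for either processor.

```python
def effective_rate(clock_ghz, pipeline_depth, mispredict_rate,
                   branch_fraction=0.20, base_cpi=1.0):
    """Instructions per nanosecond under a toy flush-penalty model."""
    # Assumption: one mispredicted branch costs about one full pipeline refill.
    flush_cycles = pipeline_depth
    cpi = base_cpi + branch_fraction * mispredict_rate * flush_cycles
    return clock_ghz / cpi


def deep_pipeline_advantage(mispredict_rate):
    """Ratio of a 20-stage 3.0GHz core to a 12-stage 2.0GHz core."""
    deep = effective_rate(3.0, 20, mispredict_rate)
    shallow = effective_rate(2.0, 12, mispredict_rate)
    return deep / shallow


# Hypothetical misprediction rates, from perfectly predictable to poor.
for rate in (0.0, 0.02, 0.05, 0.10):
    advantage = deep_pipeline_advantage(rate)
    print(f"mispredict rate {rate:.2f}: deep pipeline advantage {advantage:.2f}x")
```

In this sketch the deep pipeline keeps its full clock-rate advantage only when branches are perfectly predicted, and the advantage shrinks as the misprediction rate rises. A real core also loses cycles to cache misses and other stalls, which this model ignores.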
The main innovative features of the Opteron processor architecture are: HyperTransport replacing the FSB, the integrated memory controller, the instruction set extensions for 64-bit, the expansion of the integer and floating point registers from 8 to 16 each (Intel architects swore that this could not be done; it was said that Sun Microsystems provided assistance on the instruction set architecture), and a fully pipelined floating point unit. Like the Pentium Pro, the Opteron core implemented out-of-order, 3-wide superscalar execution. Beyond that, the Opteron has several other micro-architectural innovations to be competitive in the Pentium 4 time frame.
The first generation Pentium 4 was reasonably competitive with the Opteron. With both on the 130nm process, the Pentium 4 at 3.0GHz was competitive with the Opteron at 2.0GHz. Depending on the nature of the application, the Pentium 4 had an advantage on some and the Opteron on others. The Intel strategy for the second generation Pentium 4 architecture (codename Prescott, 90nm) was to achieve even higher clock rates, hence the deeper pipeline of 31 stages. Unfortunately for Intel, the leakage current of off-state transistors in the 90nm process was much higher than anticipated, so the Prescott core could not clock anywhere near as high as it was designed to within the 130W envelope that capped the amount of heat that could be removed from a medium sized die (112 square mm). At 90nm, the Pentium 4 reached 3.8GHz (3.66GHz in MP server variants, 3.33GHz with 8M L3 cache) compared with 2.8-3.0GHz for the Opteron. There are indications that the second generation Pentium 4 architecture was targeted to reach 5-6GHz at 90nm and 9GHz on the 65nm process.
A brief note on comparing Intel and AMD processor frequencies: Intel tends to introduce a processor line with a range of frequencies, and then sometimes provides one or two higher frequencies later, but the main focus is to launch the next generation. AMD may increment frequencies throughout the product life.
Around this time, it was clear that single core performance could not be pushed at the traditional 40% per year pace of Moore’s Law. There was plenty of transistor budget, as each manufacturing process step doubles the number of transistors at a given die size. For server applications, the best strategy for performance in terms of throughput was multiple cores, either on one die or on multiple die in one package (socket). Perhaps the one and only benefit (at this point in time) of a processor using the shared bus protocol is that it is relatively simple to put two single core die into one package, i.e., a processor socket. AMD had to do substantial design work to build a single die dual-core processor.
The downside for Intel on the dual-core Pentium 4 architecture was that frequency had to be scaled back significantly to fit within the maximum supportable thermal envelope, 165W for a dual-core 3GHz 90nm Xeon 7000 series. By contrast, the 90nm single core Opteron model 856 (DDR) ran at 3GHz in a 93W thermal envelope. The dual-core model 890 (DDR) at 2.8GHz fit in a 95W envelope, and the dual-core model 8220SE (DDR2) at 2.8GHz had a 120W envelope. So AMD was able to accommodate the dual-core Opteron on 90nm without giving up much of the design frequency. Intel had to give up substantial frequency on the Pentium 4 architecture to stay within the desired thermal envelope.
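The frequency and thermal figures quoted above can be put side by side with a little arithmetic. The sketch below uses only numbers from this article (the 3.66GHz single core Xeon MP figure is the 90nm MP variant mentioned earlier). The function names are made up for illustration, and a thermal envelope is a design limit rather than measured power, so the watts per core-GHz figure is only a rough indicator.

```python
def frequency_retained(single_core_ghz, dual_core_ghz):
    """Fraction of the single core clock rate kept by the dual-core part."""
    return dual_core_ghz / single_core_ghz


def watts_per_core_ghz(envelope_watts, cores, ghz):
    """Thermal envelope divided by aggregate core-GHz (rough indicator only)."""
    return envelope_watts / (cores * ghz)


# (label, single core GHz, dual-core GHz, dual-core thermal envelope in W)
parts = [
    ("Pentium 4 (Xeon MP to Xeon 7000 series)", 3.66, 3.0, 165),
    ("Opteron (model 856 to model 890)", 3.0, 2.8, 95),
]

for label, single_ghz, dual_ghz, watts in parts:
    kept = frequency_retained(single_ghz, dual_ghz)
    per_ghz = watts_per_core_ghz(watts, 2, dual_ghz)
    print(f"{label}: keeps {kept:.0%} of frequency, {per_ghz:.1f} W per core-GHz")
```

On these figures the Opteron keeps roughly 93% of its single core frequency when going dual-core, against roughly 82% for the Pentium 4, and does so in a much smaller envelope per core-GHz.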
At 90nm, Intel was limited to 3.0GHz for the dual-core. AMD started at 2.2GHz in 2005, and slowly incremented frequency to 2.8GHz by 2006.
Intel had a single die, dual core Pentium 4 with 16M shared L3 cache on the 65nm process in late 2006. This was able to clock at 3.4-3.5GHz. Because of the large cache, and hyper-threading, it was able to produce a very impressive TPC-C result, but the dual-core Opteron at 2.8GHz was able to achieve better performance in other aspects.

Xeon 7000 series, two Pentium 4 90nm, 135mm2, 2M L2, 169M transistors per die (2005)
Dual Core Opteron, 90nm, 199mm2, 2x1M L2, 233M transistors (2005/6)
Pentium 4, 65nm, 81mm2, 2M L2, 188M transistors per die (2006)
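As a rough check on the claim that each process step about doubles the transistor count at a given die size, the die figures listed above can be converted to transistor density. This is a back-of-the-envelope sketch: the numbers are taken from the captions, the dictionary keys are just labels, and the ideal scaling factor assumes area shrinks with the square of the feature size.

```python
def density(transistors_millions, area_mm2):
    """Transistor density in millions of transistors per square mm."""
    return transistors_millions / area_mm2


# label: (process in nm, die area in mm2, transistors in millions)
dies = {
    "Pentium 4 90nm (Xeon 7000 die)": (90, 135, 169),
    "Dual Core Opteron 90nm": (90, 199, 233),
    "Pentium 4 65nm": (65, 81, 188),
}

for label, (process_nm, area, transistors) in dies.items():
    print(f"{label}: {density(transistors, area):.2f}M transistors per mm2")

# Both Pentium 4 die carry 2M L2, so they are a reasonable like-for-like pair.
observed_gain = density(188, 81) / density(169, 135)
ideal_gain = (90 / 65) ** 2  # assumption: area scales with feature size squared
print(f"90nm to 65nm density gain: {observed_gain:.2f}x (ideal {ideal_gain:.2f}x)")
```

The observed gain for the Pentium 4 die comes out a little under 2x, close to what ideal area scaling from 90nm to 65nm would predict.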

Xeon 7100, 65nm, 435mm2, 2x1M L2, 16M L3, 1.328B transistors (2006)
Afterwards, these became the 7000 series. The first, numbered 70xx, were the standard 90nm NetBurst with two single core die in a single package, each die with 2M L2 cache, the same die used in desktop and 2-way systems. In 2006, this was followed by the Xeon 7100 series, which had 2 NetBurst cores, each with 1M dedicated L2, and 16M shared L3, all on a single die. The very large L3 cache significantly improves high call-volume, transaction-type applications. In 2007, the 7300 series comprised two dual-core 65nm Core 2 die in a single package, with 2 x 4M L2 cache. In 2008, the 7400 series moved to the 45nm process, with six cores arranged as 3 dual-core pairs, each pair sharing 3M L2, and 16M shared L3 cache, all on a single die. As with the 7100 series, the very large cache significantly improved high call-volume applications.
The Core 2 on 65nm has two cores with 4M shared L2 cache, and runs at up to 3.0GHz.

Xeon 7300, 65nm, 143mm2, dual core per die, 4M L2, 291M transistors (2007)
The Core 2 architecture (codename: Conroe) was designed for power efficiency from the beginning. Detailed discussions of the Core 2 architecture can be found elsewhere.
As mentioned earlier, the Core 2 architecture at 3GHz was far more powerful than the Pentium 4 architecture at 3.4GHz or the Opteron at 2.8GHz at the processor core level. This is evidenced by the SPEC CPU Integer base scores of 21 for the Core 2, 12 for the Pentium 4, and 13 for the Opteron. Single query tests in SQL also demonstrate the outstanding single core performance of the Core 2 architecture. Shortly after the original dual-core Core 2 launch, a quad core product with two die in one package was released.
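Dividing the SPEC scores above by clock rate gives a rough per-GHz comparison of the three cores. The sketch below uses only the scores and frequencies quoted in this article; the per-GHz ratio is a simplification, since SPEC results also depend on cache size, memory system, and compiler.

```python
# label: (SPEC CPU integer base score, clock rate in GHz)
cores = {
    "Core 2": (21, 3.0),
    "Pentium 4": (12, 3.4),
    "Opteron": (13, 2.8),
}


def score_per_ghz(score, ghz):
    """SPEC integer base score normalized by clock rate."""
    return score / ghz


per_ghz = {name: score_per_ghz(score, ghz) for name, (score, ghz) in cores.items()}

for name, value in per_ghz.items():
    print(f"{name}: {value:.2f} SPEC int base per GHz")

# Core 2 relative to the other two cores at equal clock rate.
print(f"Core 2 vs Pentium 4: {per_ghz['Core 2'] / per_ghz['Pentium 4']:.2f}x")
print(f"Core 2 vs Opteron: {per_ghz['Core 2'] / per_ghz['Opteron']:.2f}x")
```

On these figures the Core 2 does roughly twice the work per clock of the Pentium 4 and about one and a half times that of the Opteron.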
A Core 2 architecture processor for 4-way systems was not made available until late 2007. This was the quad core, two die 65nm Xeon 7350, at up to 2.93GHz. In late 2008, a custom 45nm, single die, six core Xeon 7460 with 16M shared L3 was launched.
Xeon 7400, 45nm, 503mm2, 3 dual core pairs, 3 x 3M L2, 16M L3, 1.9B transistors (2008)

Quad Core Barcelona, 65nm, 285mm2, 4x512K L2, 2M L3, 463M transistors (2007)
Quad Core Shanghai, 45nm, 258mm2, 6M L3, 758M transistors (2008)
Istanbul, 45nm, 346mm2, 6M L3, 904M transistors (2009)

Conroe, 65nm (2006); Penryn, 45nm, 6M L2, 107mm2, 420M transistors (2007)
Nehalem, 45nm, 256K L2 per core, 8M L3, 263mm2, 731M transistors (2008)

| Year | 2006 | 2007 | 2008 | 2007/8 | 2008/9 |
|---|---|---|---|---|---|
| Architecture | Core 2 | Core 2 | Nehalem | Opteron | Opteron |
| Process | 65nm | 45nm | 45nm | 65nm | 45nm |
| Top Frequency (GHz) | 2.93 | 2.66 | 2.93 | 2.5 | 2.9? |
| 4+ Sockets | 7350 | 7460 | 5570 | 8360 | 8370 |
| Cores | Quad | Six | 4 | 4 | 4 |
| L2 Cache | 2x4M | 3x3M | 8x256K | 4x256K | 4x256K |
| L3 Cache | - | 1x16M | 1x16M | 1x2M | 1x6M |
| Top Frequency (GHz) | 2.93 | 2.66 | - | 2.5 | 2.7/3.1 |
| SPEC CPU 2006 Int. base | 21.0 | 21.7 | 31.5 | 14.4 | 16.9/19.7 |
| 2-way tpm-C | 631,766 | | | | |
| 2-way tps-E | 766.47 | | | | |
| 2-way tpc-H | 17,686 | 51,422 | | | |
| 4-way tpm-C | 407,079 | 634,825 | 471,883 | 579,814 | |
| 4-way tps-E | 451.29 | 671.35 | 635.43 | | |
| 4-way tpc-H | 34,990 | | | | |