Analysis: Could AMD be catching up to Intel in the CPU stakes?
AMD announced its Phenom II X6 line this week, and one wonders how the company's latest x86 desktop processors will compete against the current performance leader, the Intel Core i7 980X, as well as Intel's more reasonably priced mainstream LGA1156 socket parts, including the Core i7 880 and 875K.
You already know the specs of AMD's latest desktop chip. It is still fabbed on the 45nm process, and its six cores share 6MB of on-die L3 cache. Each core's own 512KB L2 cache still uses the 'exclusive' approach, in which data held in L2 is not copied into L3.
That approach has its pros and cons. The plus point is increased total cache capacity: the six L2 caches (3MB in all) and the shared L3 can together hold 9MB of data. The minus point is that, whenever one CPU core needs data that is not in its own L2, it might have to search all the other L2 caches plus the L3 cache, and then go to main memory if the data is not found. Of course, improved search algorithms can cut the penalty quite a bit, but not totally.
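The 9MB figure is simple addition, sketched below in Python. The function name is made up for illustration, and the sizes are the ones quoted in this article.

```python
KB = 1024
MB = 1024 * KB

def exclusive_capacity(cores, l2_per_core, l3_shared):
    """With exclusive caches, L2 data is not duplicated in L3,
    so the capacities of all L2 caches and the L3 simply add up."""
    return cores * l2_per_core + l3_shared

# Phenom II X6 figures from the article: six cores, 512KB L2 each, 6MB shared L3.
phenom_total = exclusive_capacity(cores=6, l2_per_core=512 * KB, l3_shared=6 * MB)
print(phenom_total / MB)  # 9.0
```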
In the case of Intel chips with their "inclusive" cache approach, the contents of each core's smaller 256KB L2 cache are also copied to the shared L3 cache. For the 3.33GHz Core i7 980X, that duplication is not a problem at all since it has a huge 12MB L3 cache. The benefit here is that, whatever data a core needs beyond its own local cache, just one search in the shared L3 cache is required before going to main memory if needed.
Which approach performs better will depend on the application, but for the most part Intel's combination of a larger total cache and fewer search hops comes out ahead once latency comes into play.
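The two trade-offs described above can be put side by side in a toy Python model. This is a deliberately simplified sketch, not a description of how either chip actually resolves a miss: it only counts the places that might hold the data after a miss in a core's own L2, and it ignores parallel probing and any search optimisations. The function names are hypothetical.

```python
MB = 1024 * 1024

def worst_case_lookups(cores, inclusive):
    """Places to check after missing in the core's own L2, before main memory."""
    if inclusive:
        return 1                # the shared L3 holds a copy of every L2 line
    return (cores - 1) + 1      # every other core's L2, plus the shared L3

def inclusive_capacity(l3_shared):
    """With inclusive caches, L2 contents are duplicated in L3,
    so the amount of distinct data on the die is bounded by the L3 size."""
    return l3_shared

amd_lookups = worst_case_lookups(cores=6, inclusive=False)
intel_lookups = worst_case_lookups(cores=6, inclusive=True)
intel_capacity = inclusive_capacity(12 * MB)
amd_capacity = 9 * MB           # exclusive total quoted in the article

print(amd_lookups, intel_lookups)               # 6 1
print(amd_capacity / MB, intel_capacity / MB)   # 9.0 12.0
```

Under these assumptions the six-core Intel part has both more distinct cache capacity and fewer places to look, which is the point the comparison makes.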
Now, the Phenom II X6 die seemingly has more in common with AMD's latest dual-die "Magny Cours" 12-core and single-die "Lisbon" six-core Opteron server chips than with the earlier "Istanbul" six-core DDR2-only Opteron server chips. The Phenom II X6 is expected to have the same core throughput enhancements as Magny Cours and Lisbon, as well as, of course, native dual-channel DDR3-1600 memory support and a HyperTransport 3.1 link to the chipset at a full 6.4 gigatransfers/sec, matching Intel's Quick Path Interconnect (QPI) link.
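For a rough sense of what those transfer rates mean in bytes, the Python sketch below converts them to theoretical peak bandwidth. The widths are assumptions that the article does not state: 64 bits (8 bytes) per DDR3 channel, and a 16-bit (2-byte) link in each direction for HyperTransport. Sustained real-world figures will be lower.

```python
def peak_gb_per_s(transfers_per_s, bytes_per_transfer, channels=1):
    """Theoretical peak bandwidth in GB/s (1 GB = 1e9 bytes)."""
    return transfers_per_s * bytes_per_transfer * channels / 1e9

# Dual-channel DDR3-1600: 1600 megatransfers/sec, assumed 8 bytes per channel.
ddr3_peak = peak_gb_per_s(1600e6, bytes_per_transfer=8, channels=2)

# 6.4 gigatransfers/sec link, assumed 2 bytes wide, per direction.
link_peak = peak_gb_per_s(6.4e9, bytes_per_transfer=2)

print(ddr3_peak)  # 25.6
print(link_peak)  # 12.8
```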
At launch, the fastest Phenom II X6 part is the 1090 model at 3.2GHz, even though a lot of preliminary benchmarks were done on the 3GHz 1075 and 2.8GHz 1060 parts. Ignoring the "Turbo" modes on both sides, as well as Intel's multithreading, this puts AMD within 5 per cent of the clock speed of the fastest Intel part. However, Intel's parts do have a noticeable clock-for-clock performance advantage over AMD's previous Phenom chips, and Intel's six-core Westmere chips managed to widen that gap further.
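The clock speed gap can be checked with a couple of lines of Python, using the base clocks quoted in this article (3.2GHz for the Phenom II X6 1090 and 3.33GHz for the Core i7 980X). The helper name is illustrative only, and the figure says nothing about clock-for-clock performance.

```python
def clock_gap_percent(slower_ghz, faster_ghz):
    """Shortfall of the slower clock, as a percentage of the faster one."""
    return (faster_ghz - slower_ghz) / faster_ghz * 100

gap = clock_gap_percent(slower_ghz=3.2, faster_ghz=3.33)
print(round(gap, 1))  # 3.9
```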
theinquirer.net (c) 2010 Incisive Media
Issue: 133 | February, 2012