Friday February 10, 2012 4:29 PM AEST

Inside AMD's latest CPU instruction sets

By The Inquirer
10:03 May 12, 2009

Is this the first sign of what AMD's Fusion platform will look like?

A few days ago, AMD announced that it would support Intel's AVX instruction set rather than continuing on with SSE5. This stops any fragmentation and lets the best implementation win.

SSE5 is a superset of AVX. AMD put the bits not covered in another ISA called XOP, so with one exception, SSE5 became AVX + XOP. When Sandy Bridge and Bulldozer come out, we will get a chance to see how each one is done, and what the various strengths and weaknesses are.

The one difference seems pretty silly. Intel updated the AVX spec in January and changed an instruction called Fused Multiply Add (FMA) from a four-operand instruction to a three-operand one; let's call them FMA4 and FMA3 respectively.

FMA is an operation that multiplies two numbers and then adds the product to a third. It looks like (A * B) + C. The difference between the -3 and -4 versions is where the result ends up. If you think of A, B and C as registers, FMA3 is (A * B) + C = C while FMA4 is (A * B) + C = D. AMD's version puts the result in a fourth register, Intel's overwrites register C.
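To make the destination difference concrete, here is a minimal sketch in C that models the two forms on a toy register file. The `Regs` struct and the `fma3`/`fma4` function names are made up for illustration, and the plain `a * b + c` expression stands in for the real thing: an actual fused multiply-add rounds only once, which this model does not try to reproduce.

```c
/* Toy register file with four named "registers" (hypothetical, for illustration). */
typedef struct {
    double a, b, c, d;
} Regs;

/* FMA3 as described in the article: (A * B) + C = C.
   The addend register C is overwritten with the result. */
void fma3(Regs *r) {
    r->c = r->a * r->b + r->c;
}

/* FMA4 as described in the article: (A * B) + C = D.
   The result lands in a fourth register and C keeps its old value. */
void fma4(Regs *r) {
    r->d = r->a * r->b + r->c;
}
```

With A = 2, B = 3 and C = 4, `fma4` leaves C at 4 and writes 10 to D, while `fma3` replaces C with 10.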

If you ask AMD, it will say that FMA4 saves you a copy after the operation. Intel will likely say that you don't need to move the result. One thing is clear though: if you need to do a lot of operations like (A * B) + constant, where the constant has to survive for the next operation, the AMD method will save you a bunch of cycles. That said, there are technical tradeoffs to both methods.
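The (A * B) + constant case can be sketched as a loop. This is an illustrative model only: the function names are hypothetical, the "copies" counter simply counts how often the constant has to be duplicated before a destructive FMA3-style operation overwrites it, and whether such a copy costs real cycles depends on the compiler and the hardware.

```c
#include <stddef.h>

/* FMA3-style: the operation overwrites its addend, so the constant k
   must be copied into a scratch value before every multiply-add.
   Returns the number of such copies. */
size_t scale_add_fma3(const double *a, const double *b, double k,
                      double *out, size_t n) {
    size_t copies = 0;
    for (size_t i = 0; i < n; i++) {
        double acc = k;            /* copy of the constant that will be clobbered */
        copies++;
        acc = a[i] * b[i] + acc;   /* result overwrites the addend */
        out[i] = acc;
    }
    return copies;
}

/* FMA4-style: the result goes to a separate destination, so k is only
   ever read and no per-iteration copy is needed. */
size_t scale_add_fma4(const double *a, const double *b, double k,
                      double *out, size_t n) {
    for (size_t i = 0; i < n; i++) {
        out[i] = a[i] * b[i] + k;  /* k stays intact for the next iteration */
    }
    return 0;
}
```

Both functions produce the same output array; the only difference in this model is the one extra copy per element in the FMA3-style loop.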

This one op defines the differences between Intel's AVX and AMD's AVX, and it is something that any compiler can easily work around. What will be interesting is seeing if Intel's professed compiler detente will take this into account. Time will tell.

The rest of SSE5 ended up in a set of opcodes called XOP. You can read about them in an AMD blog here. A few interesting ones to check out: Integer Multiply/Accumulate (IMA), Byte Permute (BP), Bit-wise Conditional Move (BCM), and Half-Precision Convert (HPC).

IMA is interesting because it allows you to do a traditionally FP calculation on integers packed into 128-bit vectors. The next two, BP and BCM, are somewhat similar. BP takes bytes from two 16-byte (128-bit) vectors and copies them to a destination, using a third vector to select which byte goes where. It can also twiddle the bytes as it copies them. BCM is similar, but it uses bits, not bytes, and obviously you can't twiddle a single bit much.
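As a rough illustration of what BP and BCM do, here is a scalar C model of both. It is a simplified sketch, not the real instruction definitions: the function names are invented, the selector is assumed to use its low five bits as an index into the two sources laid end to end, and the byte "twiddle" options are left out entirely.

```c
#include <stdint.h>

#define VEC_BYTES 16  /* one 128-bit vector */

/* Byte permute (simplified): for each destination byte, the low 5 bits of
   the selector byte pick one of the 32 bytes in src1 followed by src2.
   The source ordering here is an assumption made for this sketch. */
void byte_permute(uint8_t *dst, const uint8_t *src1, const uint8_t *src2,
                  const uint8_t *sel) {
    for (int i = 0; i < VEC_BYTES; i++) {
        unsigned idx = sel[i] & 0x1Fu;
        dst[i] = (idx < VEC_BYTES) ? src1[idx] : src2[idx - VEC_BYTES];
    }
}

/* Bit-wise conditional move: each result bit comes from a where the
   corresponding mask bit is 1, and from b where it is 0. */
void bitwise_cmov(uint8_t *dst, const uint8_t *a, const uint8_t *b,
                  const uint8_t *mask) {
    for (int i = 0; i < VEC_BYTES; i++) {
        dst[i] = (uint8_t)((a[i] & mask[i]) | (b[i] & (uint8_t)~mask[i]));
    }
}
```

The two routines share the same shape, which is the similarity the paragraph above points at: a third operand steers the copy, per byte in one case and per bit in the other.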

The last one, HPC, has its own extension called CVT16, and that carries a CPUID flag as well. The short story is that it will convert between half and full precision on loads and stores with control over rounding and denorms.

This may seem like a yawner, but stop and think about this: why would one single instruction need its own ISA name and CPUID flag? Well, the instruction is very useful in graphics and setting up pixels. Top it off with denorms being a part of DX11, and it doesn't take a genius to see the beginnings of the Fusion ISA.
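For readers wondering what a half-to-full precision conversion involves, here is a minimal software sketch in C of the widening direction, including denorms. It assumes the standard 16-bit half layout (1 sign bit, 5 exponent bits, 10 mantissa bits) and a 32-bit IEEE float; the function name is made up, and it says nothing about how CVT16 itself is encoded or implemented. Rounding control only matters in the narrowing direction, which is not modeled here.

```c
#include <stdint.h>
#include <string.h>

/* Widen a 16-bit half-precision value to a 32-bit float (software sketch). */
float half_to_float(uint16_t h) {
    uint32_t sign = (uint32_t)(h >> 15) << 31;
    uint32_t exp  = (h >> 10) & 0x1Fu;
    uint32_t mant = h & 0x3FFu;
    uint32_t bits;

    if (exp == 0) {
        /* Zero or denorm: value is mant * 2^-24, exactly representable as a float. */
        float v = (float)mant * (1.0f / 16777216.0f);
        memcpy(&bits, &v, sizeof bits);
        bits |= sign;
    } else if (exp == 31) {
        /* Infinity or NaN: all-ones exponent, mantissa carried across. */
        bits = sign | 0x7F800000u | (mant << 13);
    } else {
        /* Normal number: rebias the exponent (127 - 15 = 112), widen the mantissa. */
        bits = sign | ((exp + 112u) << 23) | (mant << 13);
    }

    float out;
    memcpy(&out, &bits, sizeof out);
    return out;
}
```

The denorm branch is the part that takes extra work in software, which gives some feel for why hardware support for it is attractive in graphics code.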

All in all, AMD did the right thing here in preventing ISA fragmentation, and Intel deserves a mild slap on the wrist for changing the spec so late in the game. That said, AMD seems to have the better ISA on paper, but paper is not CPU performance.

The real test will be in the products that use them. Did one side implement it as a two-pass 128-bit operation and the other as a one-pass 256-bit operation? Is one a vastly better implementation? Does FMA4 do a lot better performance-wise than FMA3? These are open questions, and we are unlikely to know the answers for sure until early 2011.

 

theinquirer.net (c) 2010 Incisive Media

 
2 Comments
ozacube
May 12, 2009 11:04 AM
I can't believe it, an Inquirer article that was actually informative, helpful and not full of biased drivel!

What's the world coming to?!
PAPA600
May 12, 2009 12:20 PM
And what did any of that actually mean? I understood only about half of it... what's a BCM or an FMA4???