Friday February 10, 2012 8:23 PM AEST

CPU and GPU now, the convergence goes on

By The Inquirer
10:02 Nov 2, 2009 | 6 Comments
Tags: CPU | GPU | CPGPU | graphics | processor | news

GPU advances
On the other hand, GPU priorities are a little different. Multiplying thread counts and processing units mattered more here than the power of each individual processing unit, as the typical graphics pipeline is far more predictable and more parallel than most tasks run on general-purpose CPUs. So, whether it is an AMD/ATI HD5870 with 1,600 simple shaders running in parallel, or an Nvidia GT300 with 512 more complex and more CPU-like shader cores, a GPU looks very different from the 4 to 8 cores on a CPU die.

Then, despite an average clock speed that is four times slower for the core, or three times slower for the shaders, than that of standard CPUs, the vast parallelism of GPUs allows far higher theoretical computational power. As for the double-precision floating-point throughput we discussed before, let's look at what AMD/ATI and Nvidia might have in a few months, in the same timeframe as Intel's Gulftown and AMD's Magny Cours.

On the ATI side, a speed update for the HD5870, probably something called HD58X0, should arrive with the refinement and stepping updates of the R800 family dies. Running at a default 950MHz GPU clock with proportionally faster shaders, the new device should reach 3 TFLOPS in single-precision floating-point and, more importantly, 600 GFLOPS in double-precision floating-point, both IEEE compliant. In fact, some of the overclockable HD5870 entries, like those from Asus, already provide such speeds.
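The arithmetic behind those figures can be sketched roughly as follows. This is only an illustration: the 1,600 shader count and the 950MHz clock come from the text above, while the assumptions that each shader retires one multiply-add (two floating-point operations) per clock and that double precision runs at one fifth of the single-precision rate are inferred from the quoted 3 TFLOPS and 600 GFLOPS numbers, not stated by any vendor here. The function name `peak_flops` is hypothetical.

```python
def peak_flops(units, flops_per_unit_per_clock, clock_hz):
    """Theoretical peak: every unit busy on every clock, which real code rarely achieves."""
    return units * flops_per_unit_per_clock * clock_hz

SHADERS = 1600            # from the article
CLOCK_HZ = 950_000_000    # 950MHz, from the article
FLOPS_PER_SHADER = 2      # assumption: one multiply-add per shader per clock
DP_FRACTION = 1 / 5       # assumption: inferred from the 600 GFLOPS vs 3 TFLOPS ratio

sp = peak_flops(SHADERS, FLOPS_PER_SHADER, CLOCK_HZ)
dp = sp * DP_FRACTION

print(f"single GPU: {sp / 1e12:.2f} TFLOPS SP, {dp / 1e9:.0f} GFLOPS DP")
print(f"dual GPU:   {2 * sp / 1e12:.2f} TFLOPS SP, {2 * dp / 1e12:.2f} TFLOPS DP")
```

These are ceilings, not benchmarks. They say nothing about how much of that throughput a given code will actually sustain.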

So, if your code can run efficiently with AMD Stream libraries and the like, a hypothetical dual-GPU HD58X0 card will likely give you 1.2 TFLOPS of double-precision floating-point power for precision runs, and 6 TFLOPS of single-precision floating-point for parametrisation and estimation runs. Just make sure there is enough memory on the card to hold the data sets of multiple threads without going over the PCIe bus to main memory, as, given the limited GPU caching, that slow link can cut performance by as much as an order of magnitude. Therefore, 2GB of GDDR5 memory per GPU is strongly recommended for GPU computation.
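To see why the PCIe link hurts so much, here is a minimal roofline-style sketch: sustained throughput is capped either by the chip's peak rate or by how fast data can be fed to it. All the numbers are illustrative assumptions rather than measurements: roughly 150 GB/s for on-card GDDR5, roughly 8 GB/s for a PCIe 2.0 x16 link in one direction, and a hypothetical kernel doing 10 floating-point operations per byte fetched. The name `attainable_gflops` is made up for this example.

```python
def attainable_gflops(peak_gflops, bandwidth_gb_s, flops_per_byte):
    """Roofline estimate: the lower of the compute ceiling and the memory ceiling."""
    return min(peak_gflops, bandwidth_gb_s * flops_per_byte)

PEAK_DP_GFLOPS = 600    # per-GPU double-precision figure quoted above
LOCAL_GB_S = 150        # assumption: approximate on-card GDDR5 bandwidth
PCIE_GB_S = 8           # assumption: approximate PCIe 2.0 x16, one direction
FLOPS_PER_BYTE = 10     # assumption: hypothetical kernel's arithmetic intensity

local = attainable_gflops(PEAK_DP_GFLOPS, LOCAL_GB_S, FLOPS_PER_BYTE)
over_pcie = attainable_gflops(PEAK_DP_GFLOPS, PCIE_GB_S, FLOPS_PER_BYTE)

print(f"data in GPU memory: {local} GFLOPS")
print(f"data over PCIe:     {over_pcie} GFLOPS")
print(f"slowdown:           {local / over_pcie:.1f}x")
```

With these assumed figures the slowdown lands close to an order of magnitude. A kernel that reuses its data less would suffer more, and one that reuses it more would suffer less.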

By early next year, we all hope that Nvidia's GT300 will already be launched and shipping, because if it isn't, that will be big trouble for the green graphics gang. Let's assume it is. With 512 shader processors that can do either 512 single-precision or 256 double-precision fused multiply-adds per clock, a shader clock of, say, 1.8GHz would give you 1.8 TFLOPS in single-precision mode or 900 GFLOPS in double-precision mode. Not bad at all.
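A quick check of that estimate, as a sketch: the fused multiply-add rates and the 1.8GHz clock are the article's own speculative figures, and counting each fused multiply-add as two floating-point operations is the usual convention, assumed here. The helper name `fma_peak_flops` is hypothetical.

```python
FLOPS_PER_FMA = 2  # assumption: one multiply plus one add counted per fused multiply-add

def fma_peak_flops(fma_per_clock, clock_hz):
    """Theoretical peak when every clock issues the full complement of FMAs."""
    return fma_per_clock * FLOPS_PER_FMA * clock_hz

SHADER_CLOCK_HZ = 1_800_000_000  # the article's "say, 1.8GHz"

sp = fma_peak_flops(512, SHADER_CLOCK_HZ)  # 512 single-precision FMAs per clock
dp = fma_peak_flops(256, SHADER_CLOCK_HZ)  # 256 double-precision FMAs per clock

print(f"single precision: {sp / 1e12:.2f} TFLOPS")
print(f"double precision: {dp / 1e9:.0f} GFLOPS")
```

The results come out slightly above the rounded 1.8 TFLOPS and 900 GFLOPS quoted in the text.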

But what's far more interesting is that the GT300 promises to enable a far greater range of codes to make use of all that power. With an overall architecture far closer to a CPU this time, many normal C, C++ and Fortran codes should be able to run on it out of local GPU memory. With up to 6GB of onboard memory in the first iteration, and 8GB in the subsequent one, the latter with a 512-bit memory bus, the GT300 should be quite a bundle.

What the GT300 lacks to be a true CPU and run all the usual stuff, including booting an OS, is a full-fledged memory management unit (MMU), for virtual to physical memory translation, and a front-end general-purpose CPU instruction set. That's why I have said many times that Nvidia should have had a real CPU, like, say, the Alpha, which would both provide ultrahigh performance better than the X86 to fill that niche, and offer the built-in capability to run X86 code very fast via a real-time translator like the famed FX!32, without having to pay for an X86 license.

Don't forget that the last planned Alpha incarnation, the EV9 21564, was supposed to have a kilobyte-wide (yes, 8,192 bits) vector unit able to put out over 100 GFLOPS in double-precision floating-point, some 9 years ago. Imagine what it would be able to achieve today.

The Tegra and other ARM-based stuff is simply too weak to be a front end for a gigantic TFLOPS-class GPU. For a proper "fusion" at the system level, you need very fast and wide main system memory, a multi-channel multi-gigabyte setup at least, to feed it from the CPU side, and very fast multiple HyperTransport or QuickPath or Alpha EV links to connect multiple GPUs with the main CPUs for efficient coherent shared memory access between GPU and CPU memory banks. In the absence of a general purpose CPU that's able to do this, Nvidia might have to negotiate a QPI license with Intel to directly link its GPUs to the Westmere and future CPUs, in order to enable more of the coprocessor model here. But wait, wouldn't the long delayed Larrabee be gunning for the same role?

I'll have more on this, and the 'ideal' CPU-GPU system configuration, in Part 2.

6 Comments
pkroeze
Nov 2, 2009 10:20 AM
I think it will be a long time before we get an integrated CPU/GPU, because that would mean less money for all companies involved. Take AMD/ATI: if they do not merge these technologies they sell 2 cores for every PC, but once they merge CPU and GPU they only sell 1. On the other side, Nvidia and Intel probably won't try it either, first of all because of legal bindings between the 2, but also because if Intel tried to make a CPU/GPU they would have a hard time selling them until these technologies match separate CPUs and GPUs.
Jeruselem
Nov 2, 2009 10:42 AM
I'm not a fan of absolute single point of failure given how hot CPUs and GPUs get.
tunksy
Nov 2, 2009 10:46 AM
Thanks for a very interesting read.
thesorehead
Nov 2, 2009 1:01 PM
As Jeruselem said: top-of-the-line GPU and CPU parts are going to need to be separate for some time simply for the purposes of heat dissipation.

However, GPUs integrated into the MOBO have been fine for "home/office" use where all you need is the Windows desktop and a bit of Java/Flash/whatever shininess. I can see a scenario where Intel/AMD compete in this budget-conscious arena with a fully-integrated system.
CK
Nov 2, 2009 11:13 PM
How about a motherboard with 2 separate sockets on it? One for the CPU and the other for the GPU. Both spaced far enough apart to get aftermarket heat sinks on them for cooling/overclocking, and both able to access system memory when they need it (hopefully DDR5 or something for the GPU's sake). Didn't P55 just do away with an extra chip on the motherboard? Just coincidence maybe???
omega
Nov 3, 2009 1:23 PM
CK, then you'd have things like an AMD/Intel, AMD/Nvidia (would they even...), AMD/Intel(?), Intel/ATI, Intel/Nvidia, Intel/Intel(?) motherboard options, as I don't see ATI/Nvidia and Intel all sharing the same socket type for their GPUs.

It would make buying a mobo a lot harder, as you don't know which company (blue, green or red) will have the best GPU in 2 years' time when you upgrade.