
NVIDIA Tesla is reborn as ATI Streams Fire

By Nebojsa Novakovic
10:42 Jun 30, 2008

The basics of the terafight for teraflops.

The new GPUs from both NVIDIA and ATI claim teraflops-class performance. For the generic $US500+ GTX280, though, you'll need to clock the GPU at 650MHz or above, with a correspondingly increased shader clock of around 1.5GHz, to get that performance.

As for the HD4850, it is advertised as the first graphics card with 1 Tflops of single precision theoretical peak performance for less than $US200 - its faster cousin, the HD4870, is rated at 1.2 Tflops for less than $US300.
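For a rough check on those headline numbers, here is a minimal sketch of the usual peak-rate arithmetic: shader units times flops per unit per clock times shader clock. The unit counts, flops-per-clock figures and the Radeon clocks below are assumptions taken from the vendors' public spec sheets, not from this article, and the function name is made up for illustration.

```python
# Theoretical single precision peak = shader units x flops per unit per clock x shader clock.
# ASSUMPTION: unit counts, flops-per-clock and the Radeon clocks come from public
# spec sheets, not from the article; only the 1.5GHz GTX280 shader clock is quoted in it.

def peak_gflops(shader_units: int, flops_per_clock: int, shader_clock_ghz: float) -> float:
    """Return the theoretical single precision peak in Gflops."""
    return shader_units * flops_per_clock * shader_clock_ghz

cards = {
    # name: (shader units, flops per unit per clock, shader clock in GHz)
    "GTX280 at 1.5GHz shader clock": (240, 3, 1.5),   # MAD + MUL counted as 3 flops
    "HD4850 at 625MHz": (800, 2, 0.625),              # MAD counted as 2 flops
    "HD4870 at 750MHz": (800, 2, 0.75),
}

for name, spec in cards.items():
    print(f"{name}: {peak_gflops(*spec):.0f} Gflops")
```

These are paper figures only: they assume every shader unit issues its maximum number of operations on every clock, which real code rarely manages.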

The "professional high performance computing" versions, the Tesla 10-series for NVIDIA and Firestream for ATI, will differ by having larger on-card memory to fit large HPC datasets, possibly slightly higher FP performance, and - of course - no graphics outputs. And a higher price, natch.

Today, let's look at how the two compare on raw hardware capabilities and scaling. We'll be covering more on the increased application usage of GPGPUs despite their limitations, the programming pros and cons, as well as the competition, in subsequent stories. Yes, ultimately, it may be more important who will run the new Photoshop filters or GPGPU antivirus first.

So, the landmark 1Tflop milestone has been passed: but how much of it do you really get at the end? A big chunk is determined, of course, by optimised libraries and the programming environment - what about the hardware limitations?

First, GPUs don't have nearly as much cache or other memory on-chip as typical CPUs. The sustainable speed depends more on the memory bandwidth and the memory controller efficiency for long bursts, where the initial latency can be hidden - clever pre-fetching techniques can help, too.

Let's even say that either of the GPUs mentioned here could sustain 100GB/s read or write speed to its local memory when using those long burst transfers. To sustain, say, half a teraflop while writing just one 32-bit single precision FP result per flop out to memory - not even counting the reads for the operands - would take two terabytes/s of sustained memory bandwidth (0.5 Tflops x 4 bytes) if the data lives in local memory. Doable? Maybe, with those proposed Rambus next-gen memories, but not with the current stuff. Working on loops within the GPU chips' internal memory would alleviate this.
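The arithmetic in that paragraph can be sketched in a few lines. The function and constant names are invented for illustration, and the model is deliberately the article's worst case: every flop writes one 4-byte result to local memory, with operand reads ignored.

```python
# Back-of-envelope bandwidth check, using the figures quoted in the text.
# ASSUMPTION: every flop writes one 32-bit result to local memory and operand
# reads are ignored, as in the article's worst-case scenario.

BYTES_PER_SP_FLOAT = 4  # one 32-bit single precision value

def required_bandwidth_gb_s(gflops: float, bytes_per_flop: int = BYTES_PER_SP_FLOAT) -> float:
    """Memory bandwidth (GB/s) needed to sustain the given Gflops rate."""
    return gflops * bytes_per_flop

def bandwidth_bound_gflops(bandwidth_gb_s: float, bytes_per_flop: int = BYTES_PER_SP_FLOAT) -> float:
    """Gflops rate that a given memory bandwidth (GB/s) can feed."""
    return bandwidth_gb_s / bytes_per_flop

needed = required_bandwidth_gb_s(500)   # half a teraflop
ceiling = bandwidth_bound_gflops(100)   # the 100GB/s figure from the text

print(f"Sustaining 500 Gflops needs {needed / 1000:.1f} TB/s")
print(f"100 GB/s feeds at most {ceiling:.0f} Gflops in this scenario")
```

Turned around, the same sum says that 100GB/s feeds only about 25 Gflops if every result has to go out to memory, which is why keeping loops inside on-chip memory matters so much.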

Secondly, there's double precision FP: while you can use single precision in some tasks, or parametrisation for the bigger jobs, IEEE standard DP FP is still the mainstay of most scientific and engineering codes.

 
theinquirer.net (c) 2010 Incisive Media

 