
Nvidia is losing on the high performance front

By The Inquirer
10:27 May 31, 2010

Analysis: Forget Fermi, bring on the Atom.

Nvidia might have an unlikely competitor thanks to the mess it made with Fermi.

Fermi chips on Nvidia's high performance computing (HPC) Tesla boards have disappointed, not just in terms of performance but more importantly with the firm's continued inability to rein in power consumption. The ill-advised design choices it made have, as The INQUIRER reported first, left some of its former loyal supporters such as Silicon Graphics International (SGI) looking elsewhere for viable alternatives.

For HPC vendors, the Fermi problem isn't just its lower than expected performance but the reality of data centres, where Nvidia's Fermi-based Tesla cards limit the amount of hardware a customer can put in a rack before running out of cooling capacity. The Fermi architecture was designed to extend Nvidia's rapid expansion in the HPC market, so the news that the Green Goblin had to not only cut back on the number of 'Cuda' cores but increase the thermal design power (TDP) as well was a double whammy.

As Nvidia got too clever for its own good, it forgot what had made its GPGPUs so popular. SGI's senior director of server product marketing Bill Mannel told The INQUIRER that even though SGI had invested heavily in Field Programmable Gate Array (FPGA) technology through its RASC programme, when it surveyed the alternatives, including the Cell architecture, Nvidia's Cuda represented the best fit for the firm at the time. In some ways SGI's decision was the correct one, as Mannel says that many of the other options have since "fallen away". Now there's a chance that Nvidia will join that list.

To understand why power draw and cooling are so important in a graphics chip that's destined for HPC environments, one must look at the mindset behind running equipment that is designed for remote administration. Those who have spent any time working in a data centre will attest to the fact that they are hostile places to work.

Servers and cooling equipment create an oppressive cacophony, and when that is combined with the low ambient temperature, which Fermi does so well to raise, even the simplest of tasks can become tedious. Sending an engineer can be especially costly given the growth in modularised data centres, which allow equipment to be dumped in the middle of nowhere to take advantage of lower running costs.

For this reason, vendors such as SGI design servers so that the majority of maintenance can be done remotely, which leaves hardware failure as the primary reason to ever send engineers down to the data centre. However, that's expensive. Therefore when Mannel admits reliability is affected by "hot cards" such as Fermi, it's hardly surprising that SGI is thinking of jumping ship.

Thanks to its acquisition of Cray back in 1996, SGI has access to several exotic HPC cooling approaches. The same Cray engineers who designed some of the supercomputing icons from the 70s and 80s, according to Mannel, are still on the payroll. Coming from a company that was known for over-engineering its products, those engineers, whom Mannel calls "a conservative bunch" when it comes to pushing the thermal design envelope, are wary of Fermi, and rightly so. The engineers lighten up when it comes to 'cloud computing' clusters, presumably because a limited number of failures can be masked through the abstraction of quantity.

The aura surrounding Fermi cards is enough to instil fear into systems vendors. SGI not only has to put "an additional amount of work" into testing systems, but even those which do not ship with a Fermi board have to be designed "with the knowledge that a [Fermi] Tesla board may be added". Given that Mannel has already indicated that "hot cards" can lead to a "worse failure profile", how long can Nvidia expect vendors to go the extra mile to design, implement and service boards that give them so much trouble at every stage of their lifecycle?

 

theinquirer.net (c) 2010 Incisive Media

 
5 Comments
SceptreCore
May 31, 2010 11:42 AM
Now it all makes sense. This is why AMD is looking so interesting. Quite soon they will be able to offer their Llano chip just for this type of work, giving the benefit of CPU and GPU to allow customers like SGI to have linear CPU crunching, and GPU parallelism with the need for only 1 piece of hardware.

Very clever.
Tythais
May 31, 2010 9:49 PM
Wow, what an incredibly biased article... Yeah, the multi-Atom servers are cool, but the Fermi-based Tesla workstations are incredible for parallel processing, and in areas like physics won't be going away any time soon.
omega
Jun 2, 2010 4:45 PM
Tythais - you sound surprised that The Inquirer is biased against nVidia.
This is common practice for them. Anytime they have an article about GPUs you can bet they will have a pot shot at nVidia for something.
sheok
Jun 9, 2010 1:56 PM
the article wasn't directed at workstations but server farms. one card blowing heat and sucking down power is fine in a single machine, but in a rackmount it creates havoc and liability.
elitePraetorian
Jun 30, 2010 4:25 AM
No one's denying this fact. I'm just glad that people keep dissing Nvidia and chucking oil on the fire of the card wars, keeps them competitive :p