Greener and meaner; in no way leaner.
Six months is a long time in the computing world, and it can be the difference between a clear winner and the ultimate loser. Winning in this game means the strongest sales over twelve months or more and superiority of brand, and it pays to arrive on time. NVIDIA faces tough competition from ATI, which has led the market since September 2009, and NVIDIA is betting the farm on the GTX480 to pull it through this cycle of graphics technology. Will it be enough to slow the runaway freight train that is ATI?
Firming up Fermi
NVIDIA pulled off a flawless coup in November 2006 with its 8800GTX graphics card, built around the G80 core and manufactured on a relatively huge 90nm process. It boasted a comparatively small complement of 128 stream processors, and was revolutionary for entirely rethinking how workloads were handled: it removed the need for specialised hardware units, as each processor could change its duties on the fly to meet demand.
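To illustrate why a unified design matters, here is a minimal Python sketch comparing a fixed split of vertex and pixel units with a single shared pool. The unit counts (32 in total, split 8/24 in the fixed design) and the workloads are hypothetical figures chosen for illustration, not G80's real configuration, and each job is assumed to take one cycle on one unit.

```python
import math

# Hypothetical budget for illustration only: 32 shader units in total.
TOTAL_UNITS = 32
# In the fixed design, 8 units can only shade vertices and 24 only pixels.
FIXED_VERTEX_UNITS = 8
FIXED_PIXEL_UNITS = TOTAL_UNITS - FIXED_VERTEX_UNITS


def fixed_pipeline_cycles(vertex_jobs: int, pixel_jobs: int) -> int:
    """Dedicated units only run their own job type, so the busier side sets the pace."""
    vertex_cycles = math.ceil(vertex_jobs / FIXED_VERTEX_UNITS)
    pixel_cycles = math.ceil(pixel_jobs / FIXED_PIXEL_UNITS)
    return max(vertex_cycles, pixel_cycles)


def unified_cycles(vertex_jobs: int, pixel_jobs: int) -> int:
    """Any unit can take any job, so the whole pool shares the total load."""
    return math.ceil((vertex_jobs + pixel_jobs) / TOTAL_UNITS)


if __name__ == "__main__":
    # Each job is assumed to occupy one unit for exactly one cycle.
    for label, vertices, pixels in [("pixel-heavy", 80, 720), ("geometry-heavy", 480, 320)]:
        fixed = fixed_pipeline_cycles(vertices, pixels)
        unified = unified_cycles(vertices, pixels)
        print(f"{label}: fixed split {fixed} cycles, unified pool {unified} cycles")
```

With the pool shared, a frame bottlenecked on one kind of work no longer leaves the other units idle. In this model the unified design is never slower than the fixed split, and only ties when the workload happens to match the split exactly.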
The company built upon the success of the G80 to bring about the GT200, a core that almost doubled the processor count to 240 and markedly shrank the manufacturing process to 55nm. The GT200 saw use in the GTX285 and the dual-chip GTX295, and helped by that 35nm shrink, the transistor count more than doubled from 681 million to a whopping 1.4 billion. Progress is rarely static, however, and NVIDIA has taken the new GF100 to another level entirely, with a transistor count of three billion.
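The generational jump is easier to see as ratios. This short Python sketch uses only the figures quoted above; the "ideal density gain" line assumes transistor density scales with the square of the linear process shrink, a textbook simplification that real chips rarely achieve.

```python
# Figures as quoted in the article for G80 (8800GTX) and GT200 (GTX285/GTX295).
G80 = {"cores": 128, "transistors": 681e6, "process_nm": 90}
GT200 = {"cores": 240, "transistors": 1.4e9, "process_nm": 55}


def growth(old: dict, new: dict, key: str) -> float:
    """Ratio of one specification between two chip generations."""
    return new[key] / old[key]


def ideal_density_gain(old_nm: float, new_nm: float) -> float:
    """Idealised assumption: density scales with the square of the linear shrink."""
    return (old_nm / new_nm) ** 2


if __name__ == "__main__":
    print(f"stream processors: x{growth(G80, GT200, 'cores'):.3f}")
    print(f"transistors:       x{growth(G80, GT200, 'transistors'):.2f}")
    print(f"process shrink:    {G80['process_nm'] - GT200['process_nm']}nm")
    gain = ideal_density_gain(G80["process_nm"], GT200["process_nm"])
    print(f"ideal density gain: x{gain:.2f}")
```

The core count grows by a factor of 1.875 ("almost doubled") and the transistor count by just over 2, both inside the idealised density gain of roughly 2.7 that a 90nm-to-55nm shrink would allow.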
The raw power of the GF100 is undeniable: double the processing cores, full support for GDDR5 memory and a completely restructured core. However, the transistor count hints that there is more going on behind the scenes. Simply doubling the GT200's 1.4 billion transistors would suggest a figure of around 2.8 billion, yet the GF100 carries roughly three billion. At the same time, manufacturing difficulties appear to have hurt yields of GF100 chips with every processor working, and given the gigantic 529mm² die, it isn't surprising to see the number of active processors reduced. The GF100 is built on a 40nm process, so its transistors' features measure only tens of nanometres across (for comparison, a single turn of the DNA double helix is about 3.4nm long), and that precision is fantastically difficult to guarantee when it must be repeated three billion times. To find out where the extra transistors came from, we're going to delve deeper, into the very heart of the core itself.
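Both halves of that puzzle can be put into numbers. The Python sketch below works out the transistor surplus from the figures above, then uses the classic Poisson yield model (yield = exp(-D × A), the fraction of dies with zero defects) to show why a big die suffers. The defect density is a made-up value for illustration, not the foundry's actual figure, and the half-size die is hypothetical.

```python
import math

GT200_TRANSISTORS = 1.4e9
GF100_TRANSISTORS = 3.0e9
GF100_DIE_MM2 = 529.0

# Hypothetical defect density, chosen for illustration only (defects per cm^2).
DEFECTS_PER_CM2 = 0.4


def poisson_yield(die_area_mm2: float, defects_per_cm2: float) -> float:
    """Classic Poisson yield model: fraction of dies with zero defects."""
    area_cm2 = die_area_mm2 / 100.0  # 100 mm^2 per cm^2
    return math.exp(-defects_per_cm2 * area_cm2)


if __name__ == "__main__":
    surplus = GF100_TRANSISTORS - 2 * GT200_TRANSISTORS
    print(f"transistors beyond a straight doubling: {surplus / 1e6:.0f} million")
    # Compare a hypothetical half-size die with the full 529mm^2 GF100 die.
    for area in (GF100_DIE_MM2 / 2, GF100_DIE_MM2):
        print(f"{area:.1f}mm^2 die: {poisson_yield(area, DEFECTS_PER_CM2):.1%} defect-free")
```

Whatever defect density is plugged in, the larger die always comes out worse in this model, which is why being able to switch off a faulty block and still sell the chip matters so much.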
Processing power
The GF100 core takes a similar approach to graphical computing as the preceding architectures, but it does so in a far more structured way. The stream processors are now called CUDA Cores, though their purpose has not changed, and 480 of them are in use: sadly not the originally promised total of 512. They are organised into explicit groupings, the highest level of which is the Graphics Processing Cluster, or GPC.
There are four GPCs in the GF100 core, and each is an entirely self-contained ecosystem of processing components that can function independently, which should make carving the core up to create lower-end models much easier. Inside each GPC reside four large blocks called Streaming Multiprocessors, or SMs.
Each SM contains 32 CUDA Cores, the processors that do the actual calculation work, which gives a full GF100 512 Cores across its sixteen SMs. On the GTX480, however, one SM is deactivated, and the fifteen functional SMs add up to a total of 480 CUDA Cores.
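The cluster hierarchy maps neatly onto nested data structures. The following Python sketch models GPCs, SMs and CUDA Cores using the counts given above; which SM is switched off, and the two-GPC "derivative", are arbitrary hypothetical choices made to show how disabling or removing whole blocks changes the totals.

```python
from dataclasses import dataclass, field

CORES_PER_SM = 32   # per the article
SMS_PER_GPC = 4     # per the article
GPCS_PER_CHIP = 4   # per the article


@dataclass
class SM:
    enabled: bool = True

    @property
    def cuda_cores(self) -> int:
        # A disabled SM contributes no working cores.
        return CORES_PER_SM if self.enabled else 0


@dataclass
class GPC:
    sms: list[SM] = field(default_factory=lambda: [SM() for _ in range(SMS_PER_GPC)])

    @property
    def cuda_cores(self) -> int:
        return sum(sm.cuda_cores for sm in self.sms)


@dataclass
class Chip:
    gpcs: list[GPC]

    @property
    def active_sms(self) -> int:
        return sum(sm.enabled for gpc in self.gpcs for sm in gpc.sms)

    @property
    def cuda_cores(self) -> int:
        return sum(gpc.cuda_cores for gpc in self.gpcs)


def full_gf100() -> Chip:
    """Every GPC and SM enabled."""
    return Chip(gpcs=[GPC() for _ in range(GPCS_PER_CHIP)])


if __name__ == "__main__":
    chip = full_gf100()
    print(chip.active_sms, chip.cuda_cores)  # 16 512
    # Which SM is fused off is an arbitrary choice in this sketch.
    chip.gpcs[0].sms[0].enabled = False
    print(chip.active_sms, chip.cuda_cores)  # 15 480
    # Hypothetical lower-end part built from two whole GPCs.
    smaller = Chip(gpcs=full_gf100().gpcs[:2])
    print(smaller.active_sms, smaller.cuda_cores)  # 8 256
```

Because each GPC is self-contained, a cut-down model in this sketch is just a shorter list of GPCs, with no changes needed inside the blocks that remain.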
The SM breaks down further still. Each CUDA Core has its own floating-point and integer units, and alongside them sits the Operand Collector, a unit that mimics a multi-ported memory interface without the need for additional bandwidth, essentially marshalling the data each Core works on. The final important unit within an SM is the Polymorph Engine, a dedicated geometry pipeline that provides hardware acceleration for tessellation.
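The idea behind an operand collector can be shown with a generic banked register file. In this Python sketch, which is an assumption-laden model rather than NVIDIA's documented design, registers are interleaved across four single-ported banks, each bank serves one read per cycle, and the collector queues requests so that reads to different banks proceed in parallel. The bank count and interleaving scheme are hypothetical.

```python
from collections import deque

NUM_BANKS = 4  # hypothetical; not GF100's documented register file layout


def bank_of(register: int) -> int:
    """Assumed interleaving: consecutive registers live in consecutive banks."""
    return register % NUM_BANKS


def collect_operands(instructions: list[tuple[int, ...]]) -> list[int]:
    """Return the cycle on which each instruction has all its source operands.

    Each bank is single-ported and serves one queued read per cycle, so reads
    that hit different banks complete together while reads that collide in one
    bank are serialised. Instructions with no sources are ready at cycle 0.
    """
    queues = [deque() for _ in range(NUM_BANKS)]
    remaining = [len(sources) for sources in instructions]
    ready_cycle = [0] * len(instructions)

    # Queue every operand read at the bank that holds its register.
    for instr_id, sources in enumerate(instructions):
        for reg in sources:
            queues[bank_of(reg)].append(instr_id)

    cycle = 0
    while any(queues):
        cycle += 1
        for queue in queues:
            if queue:
                instr_id = queue.popleft()  # this bank serves one read this cycle
                remaining[instr_id] -= 1
                if remaining[instr_id] == 0:
                    ready_cycle[instr_id] = cycle
    return ready_cycle
```

Operands spread across different banks arrive together in a single cycle, while operands that collide in one bank are serialised, so `collect_operands([(0, 1, 2), (4, 8, 12)])` reports the first instruction ready after one cycle and the second only after four.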
Issue: 137 | June, 2012