Graphics processing unit

A graphics processing unit or GPU (also occasionally called visual processing unit or VPU) is a dedicated graphics rendering device for a personal computer, workstation, or game console. Modern GPUs are very efficient at manipulating and displaying computer graphics, and their highly parallel structure makes them more effective than general-purpose CPUs for a range of complex algorithms. A GPU can sit on top of a video card, or it can be integrated directly into the motherboard. More than 90% of new desktop and notebook computers have integrated GPUs, which are usually far less powerful than their add-in counterparts.

Graphics accelerators

 * A GPU (Graphics Processing Unit) is a type of CPU attached to a Graphics card dedicated to calculating floating point operations and the like.
 * A graphics accelerator incorporates custom microchips which contain special mathematical operations commonly used in graphics rendering. The efficiency of the microchips therefore determines the effectiveness of the graphics accelerator. They are mainly used for playing 3D games or high-end 3D rendering.
 * A GPU implements a number of graphics primitive operations in a way that makes running them much faster than drawing directly to the screen with the host CPU. The most common operations for early 2D computer graphics include the BitBLT operation (combines several bitmap patterns using a RasterOp), usually in special hardware called a "blitter", and operations for drawing rectangles, triangles, circles, and arcs. Modern GPUs also have support for 3D computer graphics, and typically include digital video-related functions.

Early 1980s
Modern GPUs are descended from the monolithic graphic chips of the early 1980s and 1990s. These chips had limited BitBLT support in the form of sprites (if they had BitBLT support at all), and usually had no shape-drawing support. Some GPUs could run several operations in a display list, and could use DMA to reduce the load on the host processor; an early example was the ANTIC co-processor used in the Atari 800 and Atari 5200. In the late 1980s and early 1990s, high-speed, general-purpose microprocessors became popular for implementing high-end GPUs. Several high-end graphics boards for PCs and computer workstations used TI's TMS340 series (a 32-bit CPU optimized for graphics applications, with a frame buffer controller on-chip) to implement fast drawing functions; these were especially popular for CAD applications. Also, many laser printers from Apple shipped with a PostScript raster image processor (a special case of a GPU) running on a Motorola 68000-series CPU, or a faster RISC CPU like the AMD 29000 or Intel i960. A few very specialised applications used digital signal processors for 3D support, such as Atari Games' Hard Drivin' and Race Drivin' games.

As chip process technology improved, it eventually became possible to move drawing and BitBLT functions onto the same board (and, eventually, into the same chip) as a regular frame buffer controller such as VGA. These cut-down "2D accelerators" were not as flexible as microprocessor-based GPUs, but were much easier to make and sell.

1980s
The Commodore Amiga was the first mass-market computer to include a blitter in its video hardware, and IBM's 8514 graphics system was one of the first PC video cards to implement 2D primitives in hardware.

The Amiga was unique, for the time, in that it featured what would now be recognized as a full graphics accelerator, offloading practically all video generation functions to hardware, including line drawing, area fill, block image transfer, and a graphics coprocessor with its own (though primitive) instruction set. Prior (and quite some time after on most systems) a general purpose CPU had to handle every aspect of drawing the display.

1990s
By the early 1990s, the rise of Microsoft Windows sparked a surge of interest in high-speed, high-resolution 2D bitmapped graphics (which had previously been the domain of Unix workstations and the Apple Macintosh). For the PC market, the dominance of Windows meant PC graphics vendors could now focus development effort on a single programming interface, Graphics Device Interface (GDI).

In 1991, S3 Graphics introduced the first single-chip 2D accelerator, the S3 86C911 (which its designers named after the Porsche 911 as an indication of the speed increase it promised). The 86C911 spawned a host of imitators: by 1995, all major PC graphics chip makers had added 2D acceleration support to their chips. By this time, fixed-function Windows accelerators had surpassed expensive general-purpose graphics coprocessors in Windows performance, and these coprocessors faded away from the PC market.

Throughout the 1990s, 2D GUI acceleration continued to evolve. As manufacturing capabilities improved, so did the level of integration of graphics chips. Additional application programming interfaces (APIs) arrived for a variety of tasks, such as Microsoft's WinG graphics library for Windows 3.x, and their later DirectDraw interface for hardware acceleration of 2D games within Windows 95 and later.

In the early and mid-1990s, CPU-assisted real-time 3D graphics were becoming increasingly common in computer and console games, which lead to an increasing public demand for hardware-accelerated 3D graphics. Early examples of mass-marketed 3D graphics hardware can be found in fifth generation video game consoles such as PlayStation and Nintendo 64. In the PC world, notable failed first-tries for low-cost 3D graphics chips were the S3 ViRGE, ATI Rage, and Matrox Mystique. These chips were essentially previous-generation 2D accelerators with 3D features bolted on. Many were even pin-compatible with the earlier-generation chips for ease of implementation and minimal cost. Initially, performance 3D graphics were possible only with separate add-on boards dedicated to accelerating 3D functions (and lacking 2D GUI acceleration entirely) such as the 3dfx Voodoo. However, as manufacturing technology again progressed, video, 2D GUI acceleration, and 3D functionality were all integrated into one chip. Rendition's Verite chipsets were the first to do this well enough to be worthy of note.

OpenGL appeared in the early 90s as a professional graphics API, but became a dominant force on the PC, and a driving force for hardware development. Software implementations of OpenGL were common during this time although the influence of OpenGL eventually lead to widespread hardware support. Over time a parity emerged between features offered in hardware in those offered in OpenGL. DirectX became popular among Windows game developers during the late 90s. Unlike OpenGL, Microsoft insisted on a providing strict one-to-one support of hardware. The approach made DirectX less popular as a stand alone graphics API initially since many GPUs provided their own specific features, which existing OpenGL applications were already able to benefit from, leaving DirectX often one generation behind. (See: Comparison of OpenGL and Direct3D).

Over time Microsoft began to work closer with hardware developers, and started to target the releases of DirectX with those of the supporting graphics hardware. Direct3D 5.0 was the first version of the burgeoning API to gain wide-spread adoption in the gaming market, and it competed directly with many more hardware specific, often proprietary graphics libraries, while OpenGL maintained a strong following. Direct3D 7.0 introduced support for hardware-accelerated transform and lighting (T&L). 3D accelerators moved beyond being just simple rasterizers to add another significant hardware stage to the 3D rendering pipeline. The NVIDIA GeForce 256 (also known as NV10) was the first card on the market with this capability. Hardware transform and lighting, both already existing features of OpenGL, came to hardware in the 90s and set the precedent for later pixel shader and vertex shader units which were far more flexible and programmable.

2000 to present
With the advent of the OpenGL API and similar functionality in DirectX, GPUs added programmable shading to their capabilities. Each pixel could now be processed by a short program that could include additional image textures as inputs, and each geometric vertex could likewise be processed by a short program before it was projected onto the screen. NVIDIA was first to produce a chip capable of programmable shading, the GeForce 3 (core named NV20). By October 2002, with the introduction of the ATI Radeon 9700 (also known as R300), the world's first Direct3D 9.0 accelerator, pixel and vertex shaders could implement looping and lengthy floating point math, and in general were quickly becoming as flexible as CPUs, and orders of magnitude faster for image-array operations. Pixel shading is often used for things like bump mapping, which adds texture, to either make an object look shiny, dull, rough, or even round or extruded.

As the processing power of GPUs have increased, so has their demand for electrical power. High performance GPUs often consume more energy than current CPUs. See also performance per watt and quiet PC.

Today, parallel GPUs have begun making computational inroads against the CPU, and a subfield of research, dubbed GPGPU for General Purpose Computing on GPU, has found its way into fields as diverse as oil exploration, scientific image processing, and even stock options pricing determination. There is increased pressure on GPU manufacturers from "GPGPU users" to improve hardware design, usually focusing on adding more flexibility to the programming model.

GPU companies
There have been many companies producing GPUs over the years, under numerous brand names. The current dominators of the market are AMD (manufacturers of the ATI Radeon and ATI FireGL graphics chip line) and NVIDIA (manufacturers of the NVIDIA Geforce and NVIDIA Quadro graphics chip line.) Intel is steadily moving into the graphics market as they evolve their integrated graphics products to better compete with the discrete GPUs offered by ATI and NVIDIA.

Computational functions
Modern GPUs use most of their transistors to do calculations related to 3D computer graphics. They were initially used to accelerate the memory-intensive work of texture mapping and rendering polygons, later adding units to accelerate geometric calculations such as rotating and translating vertices into different coordinate systems. Recent developments in GPUs include support for programmable shaders which can manipulate vertices and textures with many of the same operations supported by CPUs, oversampling and interpolation techniques to reduce aliasing, and very high-precision color spaces. Because most of these computations involve matrix and vector operations, engineers and scientists have increasingly studied the use of GPUs for non-graphical calculations.

In addition to the 3D hardware, today's GPUs include basic 2D acceleration and frame buffer capabilities (usually with a VGA compatibility mode). In addition, most GPUs made since 1995 support the YUV color space and hardware overlays (important for digital video playback), and many GPUs made since 2000 support MPEG primitives such as motion compensation and iDCT. Recent graphics cards even decode high-definition video on the card, taking some load off the central processing unit.

Dedicated graphics cards
The most powerful class of GPUs typically interface with the motherboard by means of an expansion slot such as PCI Express (PCIe) or Accelerated Graphics Port (AGP) and can usually be replaced or upgraded with relative ease, assuming the motherboard is capable of supporting the upgrade. A few graphics cards still use Peripheral Component Interconnect (PCI) slots, but their bandwidth is so limited that they are generally used only when a PCIe or AGP slot is unavailable.

A dedicated GPU is not necessarily removable, nor does it necessarily interface with the motherboard in a standard fashion. The term "dedicated" refers to the fact that dedicated graphics cards have RAM that is dedicated to the card's use, not to the fact that most dedicated GPUs are removable. Dedicated GPUs for portable computers are most commonly interfaced through a non-standard and often proprietary slot due to size and weight constraints. Such ports may still be considered PCIe or AGP in terms of their logical host interface, even if they are not physically interchangeable with their counterparts.

Multiple cards can draw together a single image, so that the number of pixels can be doubled and antialiasing can be set to higher quality. If the screen is parted into a left and right, each card can cache the textures and geometry from their side (See Scalable Link Interface (SLI) and ATI CrossFire).

Integrated graphics solutions


Integrated graphics solutions, or shared graphics solutions are graphics processors that utilize a portion of a computer's system RAM rather than dedicated graphics memory. Such solutions are cheaper to implement than dedicated graphics solutions, but are less capable. Historically, integrated solutions were often considered unfit to play 3D games or run graphically intensive programs such as Adobe Flash. (Examples of such IGPs would be offerings from SiS and VIA circa 2004.) However, today's integrated solutions such as the Intel's GMA X3000 (Intel G965), AMD's Radeon X1250 (AMD 690G) and NVIDIA's GeForce 7050 PV (NVIDIA nForce 630a) are more than capable of handling 2D graphics from Adobe Flash or low stress 3D graphics. Of course the aforementioned GPUs still struggle with high-end video games. Modern desktop motherboards often include an integrated graphics solution and have expansion slots available to add a dedicated graphics card later.

As a GPU is extremely memory intensive, an integrated solution finds itself competing for the already slow system RAM with the CPU as it has no dedicated video memory. System RAM may be 2 GB/s to 12.8 GB/s, yet dedicated GPUs enjoy between 10 GB/s to over 100 GB/s of bandwidth depending on the model.

Older integrated graphics chipsets lacked hardware transform and lighting, but newer ones include it.

Hybrid solutions
This newer class of GPUs competes with integrated graphics in the low-end PC and notebook markets. The most common implementations of this are ATI's HyperMemory and NVIDIA's TurboCache. Hybrid graphics cards are somewhat more expensive than integrated graphics, but much less expensive than dedicated graphics cards. These also share memory with the system memory, but have a smaller amount of memory on-board than discrete graphics cards do to make up for the high latency of the system RAM. Technologies within PCI Express can make this possible. While these solutions are sometimes advertised as having as much as 768MB of RAM, this refers to how much can be shared with the system memory.

Stream processing
Another new concept application for GPUs is that of stream processing. This concept uses massively parallel floating-point, yet dedicated computational power of a modern graphics accelerator's shader pipeline. In certain applications massive vector operations, this can yield several orders of magnitude higher performance than a conventional CPU. The two largest discrete GPU designers, ATI and NVIDIA, are firmly pursuing this new market.

Recently NVidia began releasing cards supporting an API extension to the C programming language called CUDA ("Compute Unified Device Architecture"), which allows specified functions from a normal C program to run on the GPU's stream processors. This makes C programs capable of taking advantage of a GPU's ability to operate on large matrices in parallel, while still making use of the CPU where appropriate. CUDA is also the first API to allow CPU-based applications to access directly the resources of a GPU for more general purpose computing without the limitations of using a graphics API.

General Purpose GPUs (GPGPU)
A new concept is to use a modified form of a stream processor to allow a general purpose graphics processing unit. This concept turns the massive floating-point computational power of a modern graphics accelerator's shader pipeline into general-purpose computing power, as opposed to being hard wired solely to do graphical operations. In certain applications requiring massive vector operations, this can yield several orders of magnitude higher performance than a conventional CPU. The two largest discrete GPU designers, ATI and NVIDIA, are beginning to pursue this new market with an array of applications. ATI has teamed with Stanford University to create a GPU-based client for its Folding@Home distributed computing project (for protein folding calculations) that in certain circumstances yields results forty times faster than the conventional CPUs traditionally used in such applications.