What is a GPU?
The GPU, or Graphics Processing Unit, was originally conceived as a secondary device onto which the CPU could offload the increasingly complex task of rendering (representing 3D meshes as one or more 2D images). Because of this, GPUs had to be designed to perform many calculations very quickly in order to keep up with the stream of images needing to be displayed. This was accomplished by making the GPU highly parallel, allowing multiple parts of an image to be rendered at once and thus greatly reducing the time it takes to render an image compared to a CPU.
To get an idea of the vast difference in computing that GPUs brought, check out this awesome visual demonstration by Mythbusters duo Adam Savage and Jamie Hyneman.
Early on, GPUs were highly specialized for rendering, making it difficult if not impossible to harness their capabilities for other compute tasks. Eventually, however, GPU architectures were modified and GPU programming languages were developed that opened the hardware up to more general workloads, allowing GPUs to take on complex, time-consuming tasks such as machine learning and physics simulations, among many others.
Before getting to how GPUs work, it helps to know how a CPU works for comparison, because both use the same kinds of internal components and largely follow the same processing pattern. At a high level, a typical CPU consists of three main components: the control unit, a microprocessor within the CPU that directs the other components based on the instructions it receives; the Arithmetic Logic Unit (ALU), which carries out the CPU's arithmetic and logical operations; and cache memory, which stores data directly on the chip for high-speed access after it has first been loaded from the slower, but much larger, Random Access Memory (RAM), also called main memory.
The typical CPU processing cycle begins when the CPU receives an instruction. The instruction is decoded by the control unit, which then retrieves the necessary data, first looking for it in the cache and loading it from RAM only if it is not there. The control unit then sends the data to the ALU and signals it to perform the necessary operations, and finally routes the result back to the cache and/or to a destination outside the CPU. This cycle, repeated billions of times per second on modern hardware, is what makes up modern-day computing.
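To make the cycle concrete, here is a minimal sketch in C++ (the array and its contents are made up for illustration) whose comments map each step of a simple summation onto the fetch/execute/store stages above:

```cpp
#include <cstdio>

int main() {
    // Data starts in main memory (RAM); the first access pulls it into cache.
    float data[4] = {1.0f, 2.0f, 3.0f, 4.0f};
    float sum = 0.0f;  // the running result lives in a register

    for (int i = 0; i < 4; ++i) {
        // Fetch: the control unit retrieves data[i] (from cache after the first touch).
        // Execute: the ALU performs the addition.
        // Store: the result is written back to sum.
        sum += data[i];
    }

    printf("sum = %f\n", sum);  // the final result is routed out of the CPU
    return 0;
}
```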
GPUs work somewhat similarly to CPUs, just on a much larger scale. Rather than one ALU, GPUs have dozens of arithmetic and logic execution cores that are functionally similar to the ALU, organized into groups so that they are easier to manage. Each group of cores, together with its own control logic, forms a streaming multiprocessor (SM); that control logic is the GPU's equivalent of the CPU control unit, and each SM also has a set of register banks which hold the data its cores operate on and output. In addition, the GPU has a larger but slower global memory bank that is accessible by all of the SMs and is used in much the same way that the CPU uses main memory. Global memory also enables communication between different SMs that may be collaborating on the same task.
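In CUDA terms, that large bank is exposed as global memory, and host code stages data through it before any SM touches it. A minimal host-side sketch of that staging (buffer names are illustrative):

```cpp
#include <cuda_runtime.h>
#include <cstdio>

int main() {
    const int n = 1024;
    float host[n];
    for (int i = 0; i < n; ++i) host[i] = 1.0f;

    // Allocate a buffer in global (device) memory, the region every SM can reach.
    float* device = nullptr;
    cudaMalloc(&device, n * sizeof(float));

    // Stage the input: copy from the CPU's main memory into GPU global memory.
    cudaMemcpy(device, host, n * sizeof(float), cudaMemcpyHostToDevice);

    // ... any kernel launched here could read and write `device` from any SM ...

    // Copy the data back and release the buffer.
    cudaMemcpy(host, device, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(device);

    printf("first element after round trip: %f\n", host[0]);
    return 0;
}
```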
When an instruction arrives on the GPU, it is stored in a shared instruction buffer that is periodically checked by all of the SMs. The SM that picks up the instruction then schedules a number of sub-groups of threads, called warps, to complete the task, with each thread running on one of the SM's execution cores. Warps typically consist of 32 threads (though in our downsized example they contain only 4) and are the smallest execution unit within the GPU. This means that any instruction received on the GPU will occupy at least a single warp, even if it requires only one of the warp's many threads in order to complete.
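That rounding rule is easy to see with a little arithmetic: n threads always occupy ceil(n / 32) whole warps, so even a single thread consumes a full warp. A small sketch (the function name is made up):

```cpp
#include <cstdio>

// Warps are allocated whole: ceil(threads / warp_size) warps are scheduled.
int warps_needed(int threads, int warp_size = 32) {
    return (threads + warp_size - 1) / warp_size;
}

int main() {
    printf("%2d thread(s) -> %d warp(s)\n",  1, warps_needed(1));   // 1 warp
    printf("%2d thread(s) -> %d warp(s)\n", 32, warps_needed(32));  // 1 warp
    printf("%2d thread(s) -> %d warp(s)\n", 33, warps_needed(33));  // 2 warps
    return 0;
}
```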
After determining the correct number of warps to run, the SM then coordinates the simultaneous execution of all assigned warps. During execution, each thread fetches its own unique input data from the register bank, operates on it in its execution core, and in some cases stores the result in another register, as the sketch below illustrates.
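As a small preview of the programming model covered in the next article, here is a hedged end-to-end sketch of that pattern: a CUDA vector-addition kernel in which each thread computes a unique index, fetches its own elements, and stores its own result. Kernel and variable names are illustrative, not taken from this article:

```cpp
#include <cuda_runtime.h>
#include <cstdio>

// Each thread handles exactly one element: it computes a unique global index,
// loads its own inputs from global memory into registers, adds them on an
// execution core, and writes the result back out.
__global__ void vector_add(const float* a, const float* b, float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {  // guard: the last warp may have threads with nothing to do
        out[i] = a[i] + b[i];
    }
}

int main() {
    const int n = 1 << 10;
    size_t bytes = n * sizeof(float);

    float *ha = new float[n], *hb = new float[n], *hout = new float[n];
    for (int i = 0; i < n; ++i) { ha[i] = float(i); hb[i] = 2.0f * i; }

    float *da, *db, *dout;
    cudaMalloc(&da, bytes); cudaMalloc(&db, bytes); cudaMalloc(&dout, bytes);
    cudaMemcpy(da, ha, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(db, hb, bytes, cudaMemcpyHostToDevice);

    // 256 threads per block = 8 warps of 32; blocks are spread across the SMs.
    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    vector_add<<<blocks, threads>>>(da, db, dout, n);

    cudaMemcpy(hout, dout, bytes, cudaMemcpyDeviceToHost);
    printf("out[10] = %f (expected 30.0)\n", hout[10]);

    cudaFree(da); cudaFree(db); cudaFree(dout);
    delete[] ha; delete[] hb; delete[] hout;
    return 0;
}
```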
In the next article, we'll look at what GPU execution looks like on the software side by digging into the GPU programming model and some basic CUDA syntax.