Imagination Technologies PowerVR Performance Counters
- Native SDK - to get access to the performance counters.
- PVRTune Counter List and Description (2018), [backup].
- PVRTune and PVRScope Documentation
- 2D - 2D Core (TLA). The purpose of the 2D core is to perform efficient blitting operations. For example, OS composition may utilize the 2D core so that the rest of the GPU pipeline can be dedicated to application rendering. Used for: buffer copy, image copy, image blit.
- 3D - TBDR (tile-based deferred rendering) pass?
- FBA - Frame Buffer Accumulate unit.
- HSR - Hidden Surface Removal.
- ISP - Image Synthesis Processor. The ISP fetches the primitive data and performs Hidden Surface Removal (HSR), along with depth and stencil tests. It only fetches screen-space position data for the geometry within the tile. (Rasterizer?)
- IMR - Immediate Mode Renderer.
- MCU - Multi-level Memory Cache Unit.
- PB - Parameter Buffer. The tile lists and the transformed vertex data are both stored in this intermediate store.
- RTU - Ray Tracing Unit. Accelerates ray-triangle and ray-box intersection tests.
- Renderer task - ?
- SPM - Smart Parameter Mode. If the GPU overflows the parameter buffer during vertex processing, it enters SPM and attempts to grow the parameter buffer.
- SHF - Scene Hierarchy Fetch?
- SHG - Scene Hierarchy Generator. Takes the output of the SHF and generates a scene hierarchy acceleration structure for the provided components, which the RTU can later use.
- TDM - Texture Data Master? Time-division multiplexing (high-speed TDM I/O), i.e. the memory bus? T* Display Manager? (Active during texture blits.)
- TA - Tile Accelerator. Determines which tiles contain each transformed primitive.
- TLA - ?
- TSP - Texture and Shading Processor. Applies colouring operations, such as fragment shaders, to the visible pixels.
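
The TA, PB, ISP and TSP roles in the glossary fit together as a tile-based deferred pipeline. The toy renderer below is a minimal Python sketch of that idea, not PowerVR's actual algorithm. The 4x4-pixel tile size, one constant depth per triangle, conservative bounding-box binning, opaque-only geometry and all function names are hypothetical simplifications. Binning fills per-tile primitive lists (the role of the PB), per-tile visibility keeps the nearest primitive for each pixel (the ISP's HSR), and only the surviving pixels would then be shaded (the TSP).

```python
import math
from collections import defaultdict

TILE = 4  # hypothetical tile size in pixels; real PowerVR tile sizes differ


def bin_primitives(triangles, width, height):
    """TA-like step: build a primitive list per tile (the PB's tile lists).

    Conservative bounding-box binning: a triangle may be listed in a tile
    it does not actually touch. Fully off-screen triangles get no tiles.
    """
    tile_lists = defaultdict(list)
    for prim_id, (verts, _depth) in enumerate(triangles):
        xs = [x for x, _ in verts]
        ys = [y for _, y in verts]
        # Clamp the pixel bounding box to the screen.
        px0 = max(0, math.floor(min(xs)))
        px1 = min(width - 1, math.floor(max(xs)))
        py0 = max(0, math.floor(min(ys)))
        py1 = min(height - 1, math.floor(max(ys)))
        if px0 > px1 or py0 > py1:
            continue  # off-screen culling: the bounding box misses the screen
        for ty in range(py0 // TILE, py1 // TILE + 1):
            for tx in range(px0 // TILE, px1 // TILE + 1):
                tile_lists[(tx, ty)].append(prim_id)
    return tile_lists


def covers(verts, px, py):
    """Edge-function test at the pixel centre (accepts either winding)."""
    (ax, ay), (bx, by), (cx, cy) = verts
    x, y = px + 0.5, py + 0.5
    e0 = (bx - ax) * (y - ay) - (by - ay) * (x - ax)
    e1 = (cx - bx) * (y - by) - (cy - by) * (x - bx)
    e2 = (ax - cx) * (y - cy) - (ay - cy) * (x - cx)
    return (e0 >= 0 and e1 >= 0 and e2 >= 0) or (e0 <= 0 and e1 <= 0 and e2 <= 0)


def render_tile(tile, prim_ids, triangles, width, height):
    """ISP-like HSR for one tile: keep the nearest primitive per pixel."""
    tx, ty = tile
    depth, owner = {}, {}
    for prim_id in prim_ids:
        verts, z = triangles[prim_id]
        for py in range(ty * TILE, min((ty + 1) * TILE, height)):
            for px in range(tx * TILE, min((tx + 1) * TILE, width)):
                if covers(verts, px, py) and z < depth.get((px, py), math.inf):
                    depth[(px, py)] = z
                    owner[(px, py)] = prim_id
    return owner


def render(triangles, width, height):
    """Bin, resolve visibility per tile, then shade each visible pixel once."""
    tile_lists = bin_primitives(triangles, width, height)
    visible = {}
    for tile, prim_ids in tile_lists.items():
        visible.update(render_tile(tile, prim_ids, triangles, width, height))
    # TSP-like step (deferred): one shading call per entry in `visible`.
    return tile_lists, visible


triangles = [
    (((0, 0), (16, 0), (0, 16)), 0.9),    # far triangle covering the whole 8x8 screen
    (((0, 0), (3.5, 0), (0, 3.5)), 0.1),  # small near triangle inside tile (0, 0)
]
tile_lists, visible = render(triangles, 8, 8)
print(len(visible))  # 64: each pixel is shaded once after HSR
```

Naively shading every covered fragment in submission order would cost 70 shader invocations here (64 for the far triangle plus 6 for the near one); resolving visibility per tile first brings that down to one invocation per pixel.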
The Native SDK has a function, PVRScopeReadTimingData(), which returns time intervals. Intervals of different types can overlap, because they are executed in different hardware queues (vertex, fragment, compute). Vulkan timestamps are not supported on PowerVR GPUs; on other devices, writing a timestamp implicitly adds a barrier and prevents the GPU from overlapping execution.
To measure multiple passes, you can either sum all intervals or take the minimum start and maximum end across all passes, depending on what information you need.
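
The two aggregation choices above answer different questions when intervals overlap. The following is a minimal Python sketch, assuming each interval is a hypothetical (start, end) pair in one consistent time unit; it does not reproduce the actual PVRScope data structure. The third function, which merges overlaps to get the time during which at least one queue was busy, is an additional option beyond the two mentioned above.

```python
def summed_time(intervals):
    """Sum of interval lengths; time where queues overlap is counted twice."""
    return sum(end - start for start, end in intervals)


def span_time(intervals):
    """Earliest start to latest end across all passes (wall-clock span)."""
    return max(end for _, end in intervals) - min(start for start, _ in intervals)


def busy_time(intervals):
    """Time during which at least one queue was busy (overlaps merged once)."""
    total = 0
    cur_start = cur_end = None
    for start, end in sorted(intervals):
        if cur_end is not None and start <= cur_end:
            cur_end = max(cur_end, end)  # overlaps the current run: extend it
            continue
        if cur_end is not None:
            total += cur_end - cur_start  # close the previous run
        cur_start, cur_end = start, end
    if cur_end is not None:
        total += cur_end - cur_start
    return total


# Hypothetical timings in milliseconds: (start, end) per hardware queue.
timings = [
    (0, 4),    # vertex work
    (3, 10),   # fragment work, overlapping the vertex work
    (12, 14),  # compute work
]
print(summed_time(timings), span_time(timings), busy_time(timings))  # 13 14 12
```

The summed time says how much total queue work was done, the span says how long the passes took end to end, and the merged busy time sits between the two by counting overlapping work only once.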
name | units | desc |
---|---|---|
Frame time | seconds | Average time it has taken the GPU to process a frame over the selected period. [ref] |
Frames per second (FPS) | 1/seconds | [ref] |
Geometry active | % | Tiler active? |
Geometry time per frame | seconds | Tiler time per frame? |
Geometry time | seconds | |
GPU clock speed | MHz | On many modern devices, the GPU clock speed changes dynamically depending on the workload of the GPU and the thermal limits of the chip. [ref] |
GPU memory interface load | % | Total utilisation of the GPU memory bus, for both read and write memory operations over the GPU memory interface within the current period. [ref] |
GPU memory read bytes per second | bytes/second | Rate at which the GPU reads data from the system memory bus. [ref] |
GPU memory total bytes per second | bytes/second | Rate at which the GPU reads or writes data over the system memory bus. [ref] |
GPU memory write bytes per second | bytes/second | Rate at which the GPU writes data to the system memory bus. [ref] |
Renderer active | % | Percentage of time that Renderer tasks were active. Renderer time refers to any time that is spent processing pixels and shading them. This includes the ISP (Image Synthesis Processor), Texturing and Shader Processor units. [ref] |
Renderer time per frame | seconds | Time spent processing Renderer tasks (in seconds) during the specified period. [ref] |
Renderer time | seconds | |
SPM active | % | Percentage of Renderer tasks that are due to SPM. [ref] |
TDM active | % | |
TDM time per frame | seconds | |
TDM time | seconds | |
Tiler | - | |
Triangle ratio | % | Ratio of triangles output from the Tiler over triangles input to the Tiler. [ref] |
Triangles input per frame | | Total number of triangles submitted to the Tiler per frame. [ref] |
Triangles input per second | 1/seconds | Total number of triangles submitted to the Tiler per second. [ref] |
Triangles output per frame | | Total triangles written to the Parameter Buffer per frame after back-face, off-screen and sub-pixel culling. [ref] |
Triangles output per second | 1/seconds | Total triangles written to the Parameter Buffer per-second after back-face, off-screen and sub-pixel culling. [ref] |
Vertices per triangle | | Average number of vertices per triangle. This is calculated as the number of input vertices processed divided by the number of input triangles processed. This counter gives an indication of how efficiently transformed vertices are shared between triangles. [ref] |
Renderer | - | |
HSR efficiency | % | Effectiveness of the Hidden Surface Removal (HSR) engine, rejecting obscured pixels before they get processed. [ref] |
ISP pixel load | % | Percentage of the time that the Image Synthesis Processor (ISP) pixel-processing is busy. [ref] |
ISP tiles in flight | % | |
Shader | - | |
Compute kernels per frame | | Number of compute invocations per frame. [az] |
Compute kernels per second | 1/seconds | Number of compute invocations per second. [az] |
Cycles per compute kernel | cycles | Average number of cycles that the Shader Processor has spent processing compute kernels (compute shader invocations). [az] |
Cycles per pixel | cycles | Average number of cycles that the Shader Processor has spent processing fragments. [ref] |
Cycles per vertex | cycles | Average number of clock cycles that the Shader Engine has spent processing vertices. [ref] |
Pipelines starved | % | Tiler, Rasterizer, ... have no work? |
Primary ALU Pipeline starved | % | ALU has no work because of memory access? |
Processing load: compute | % | Average compute workload of the Shader Processor. A high value indicates that a large percentage of the Shader’s workload has been spent executing compute kernels. [ref] |
Processing load: pixel | % | Average pixel workload of the Shader Processor. A high value indicates that a large percentage of the Shader’s workload has been spent shading fragments. [ref] |
Processing load: vertex | % | Average vertex workload of the Shader Processor. A high value indicates that a large percentage of the Shader’s workload has been spent shading vertices. [ref] |
Register overload: pixel, Register overload: vertex | % | This counter indicates when the hardware is under register pressure. Register pressure means the hardware cannot queue as many tasks, because their register requirements are too high. This reduces latency and bandwidth tolerance, because fewer tasks are available to switch to in order to hide these stalls. [ref] |
Shaded pixels per frame | | Total number of pixels that the Shader unit has processed per frame. This includes the number of pixels visible and blended. [ref] |
Shaded pixels per second | 1/seconds | Total number of pixels that the Shader unit has processed per second. This includes the number of pixels visible and blended. [ref] |
Shaded vertices per frame | | Total number of vertices that the Shader unit has processed per frame. [ref] |
Shaded vertices per second | 1/seconds | Total number of vertices that the Shader unit has processed per second. [ref] |
Shader processing load | % | Average workload of the Shader Processor, i.e. when it is processing vertices, pixels or compute. [ref] |
Texturing | - | |
Texture fetches per pixel | | |
Texture filter cycles per fetch | cycles | |
Texture filter input load | % | |
Texture filter load | % | |
Texture read cycles per fetch | cycles | |
Texture read stall | % | |
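
Several counters in the table are related by simple arithmetic: FPS is the reciprocal of frame time, "per frame" values are "per second" values divided by FPS, and the triangle ratio and vertices per triangle are plain quotients. The following is a minimal Python sketch of those relations. The function names and all input numbers are invented for illustration, and the memory-load function assumes a hypothetical peak bandwidth figure; PVRTune's exact formulas and sampling windows may differ.

```python
def fps(frame_time_s):
    """Frames per second from the average frame time in seconds."""
    return 1.0 / frame_time_s


def per_frame(per_second, frames_per_second):
    """Convert a '... per second' counter to '... per frame'."""
    return per_second / frames_per_second


def triangle_ratio_percent(triangles_out, triangles_in):
    """Triangles written to the Parameter Buffer vs triangles submitted, in %."""
    return 100.0 * triangles_out / triangles_in


def vertices_per_triangle(input_vertices, input_triangles):
    """Average input vertices per input triangle; lower suggests more reuse."""
    return input_vertices / input_triangles


def memory_load_percent(read_bps, write_bps, peak_bps):
    """Total bus traffic (read + write) as a % of an ASSUMED peak bandwidth."""
    return 100.0 * (read_bps + write_bps) / peak_bps


# Invented example values, not measurements.
frames = fps(0.03125)                                # 32.0 frames per second
print(per_frame(3_200_000, frames))                  # 100000.0 triangles input per frame
print(triangle_ratio_percent(1_600_000, 3_200_000))  # 50.0 (% of triangles survive culling)
print(vertices_per_triangle(150_000, 100_000))       # 1.5
print(memory_load_percent(2e9, 1e9, 12e9))           # 25.0 (%)
```

A triangle ratio well below 100% means much of the submitted geometry is being culled (back-facing, off-screen or sub-pixel), which is work the application might avoid submitting.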
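
The "Register overload" description links register requirements to fewer queued tasks and weaker latency hiding. The following is a minimal Python sketch of that reasoning under a deliberately simplified model. The register budget, per-task register counts, task cap and stall lengths are all hypothetical numbers, not PowerVR hardware figures. It assumes each task alternates a fixed number of ALU cycles with a fixed memory stall, and the scheduler switches to any ready task.

```python
def tasks_in_flight(register_budget, registers_per_task, max_tasks):
    """Tasks that fit at once, limited by registers and a hardware cap.

    All arguments are hypothetical; real register file sizes and
    scheduling limits are not modelled here.
    """
    return min(max_tasks, register_budget // registers_per_task)


def alu_utilisation(tasks, alu_cycles, stall_cycles):
    """Simplified latency-hiding model: each task runs alu_cycles, then waits
    stall_cycles on memory. The ALU is fully busy once enough tasks exist
    to cover the stall; otherwise it idles for part of each period."""
    return min(1.0, tasks * alu_cycles / (alu_cycles + stall_cycles))


# Hypothetical numbers: a 1024-register budget and at most 16 tasks.
light = tasks_in_flight(1024, 32, 16)    # 16 tasks: low register use
heavy = tasks_in_flight(1024, 256, 16)   # 4 tasks: high register use
print(alu_utilisation(light, 20, 140))   # 1.0: stalls fully hidden
print(alu_utilisation(heavy, 20, 140))   # 0.5: ALU idle half the time
```

In this model, 8 tasks are enough to cover a 140-cycle stall with 20 cycles of ALU work each, so a shader whose register use limits occupancy below that point starts exposing memory latency.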