
Imagination Technologies PowerVR Performance Counters

References

Notes

2D - 2D Core (TLA). The purpose of the 2D core is to perform efficient blitting operations. For example, OS composition may utilize the 2D core so that the rest of the GPU pipeline can be dedicated to application rendering. Used for: buffer copy, image copy, image blit.
3D - TBDR pass ?
FBA - Frame Buffer Accumulate unit.
HSR - Hidden Surface Removal.
ISP - Image Synthesis Processor. ISP fetches the primitive data and performs Hidden Surface Removal (HSR), along with depth and stencil tests. The ISP only fetches screen-space position data for the geometry within the tile. (Rasterizer?)
IMR - Immediate Mode Renderer.
MCU - Multi-level Memory Cache Unit.
PB - Parameter Buffer. The tile list and the transformed vertex data are both stored in this intermediate store.
RTU - Ray Tracing Unit. Accelerates ray-triangle and ray-box intersection tests.
Renderer task - ?
SPM - Smart Parameter Mode. If the GPU overflows the parameter buffer during vertex processing, it enters SPM and attempts to grow the parameter buffer.
SHF - Scene Hierarchy Fetch?
SHG - Scene Hierarchy Generator. Takes the output of the SHF and is responsible for generating a scene hierarchy acceleration structure for the provided components which can later be used by the RTU.
TDM - Meaning unclear. Candidates: Texture Data Master? Time-division multiplexing (high-speed TDM I/O, memory bus)? T* Display Manager? It is active during texture blits.
TA - Tile Accelerator, determines which tiles contain each transformed primitive.
TLA - ?
TSP - Texture and Shading Processor. Applies colouring operations, like fragment shaders, to the visible pixels.

Timings

The NativeSDK provides the function PVRScopeReadTimingData(), which returns time intervals. Intervals of different types can overlap because they execute in different hardware queues (vertex, fragment, compute). Vulkan timestamps are not supported on PowerVR GPUs; on other devices, writing a timestamp implicitly adds a barrier and prevents the GPU from overlapping execution. To measure multiple passes, either sum all intervals or compute the min/max time across all passes, depending on what information you need.
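The two ways of combining intervals give different answers when work overlaps across queues, which a small sketch can make concrete. It assumes each interval has already been reduced to a hypothetical `(start, end)` pair in seconds; this is not the actual layout returned by PVRScopeReadTimingData(), and "min/max" is interpreted here as earliest start to latest end.

```python
from typing import Iterable, Tuple

# Hypothetical representation: (start, end) in seconds.
# The real PVRScope timing structures are not modelled here.
Interval = Tuple[float, float]


def total_busy_time(intervals: Iterable[Interval]) -> float:
    """Sum of all interval durations; overlapping work is counted twice."""
    return sum(end - start for start, end in intervals)


def wall_clock_span(intervals: Iterable[Interval]) -> float:
    """Time from the earliest start to the latest end across all intervals."""
    items = list(intervals)
    if not items:
        return 0.0
    earliest = min(start for start, _ in items)
    latest = max(end for _, end in items)
    return latest - earliest


# Example: a vertex pass and a fragment pass overlap, a compute pass runs later.
passes = [(0.0, 4.0), (2.0, 5.0), (6.0, 7.0)]
busy = total_busy_time(passes)
span = wall_clock_span(passes)
```

The sum approximates how much work each queue did, while the span approximates how long the passes took end to end, including any idle gaps between them.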

BXM-8-256

| name | units | desc |
|---|---|---|
| Frame time | seconds | Average time the GPU has taken to process a frame over the selected period. [ref] |
| Frames per second (FPS) | 1/seconds | [ref] |
| Geometry active | % | Tiler active? |
| Geometry time per frame | seconds | Tiles time per frame? |
| Geometry time | seconds | |
| GPU clock speed | MHz | On many modern devices, the GPU clock speed changes dynamically depending on the workload of the GPU and the thermal limits of the chip. [ref] |
| GPU memory interface load | % | Total utilisation of the GPU memory bus, for both read and write memory operations over the GPU memory interface within the current period. [ref] |
| GPU memory read bytes per second | bytes/second | Rate at which the GPU reads data over the system memory bus, in bytes/sec. [ref] |
| GPU memory total bytes per second | bytes/second | Rate at which the GPU reads or writes data over the system memory bus, in bytes/sec. [ref] |
| GPU memory write bytes per second | bytes/second | Rate at which the GPU writes data over the system memory bus, in bytes/sec. [ref] |
| Renderer active | % | Percentage of time that Renderer tasks were active. Renderer time refers to any time that is spent processing pixels and shading them. This includes the ISP (Image Synthesis Processor), Texturing and Shader Processor units. [ref] |
| Renderer time per frame | seconds | Time spent processing Renderer tasks (in seconds) during the specified period. [ref] |
| Renderer time | seconds | |
| SPM active | % | Percentage of Renderer tasks that are due to SPM. [ref] |
| TDM active | seconds | |
| TDM time per frame | seconds | |
| TDM time | seconds | |
| Tiler | - | |
| Triangle ratio | % | Ratio of triangles output from the Tiler over triangles input to the Tiler. [ref] |
| Triangles input per frame | | Total number of triangles submitted to the Tiler per frame. [ref] |
| Triangles input per second | 1/seconds | Total number of triangles submitted to the Tiler per second. [ref] |
| Triangles output per frame | | Total triangles written to the Parameter Buffer per frame after back-face, off-screen and sub-pixel culling. [ref] |
| Triangles output per second | 1/seconds | Total triangles written to the Parameter Buffer per second after back-face, off-screen and sub-pixel culling. [ref] |
| Vertices per triangle | | Average number of vertices per triangle. This is calculated as the number of input vertices processed divided by the number of input triangles processed. This counter gives an indication of how efficiently transformed vertices are shared between triangles. [ref] |
| Renderer | - | |
| HSR efficiency | % | Effectiveness of the Hidden Surface Removal (HSR) engine at rejecting obscured pixels before they get processed. [ref] |
| ISP pixel load | % | Percentage of the time that the Image Synthesis Processor (ISP) pixel processing is busy. [ref] |
| ISP tiles in flight | % | |
| Shader | - | |
| Compute kernels per frame | | Number of compute invocations per frame. [az] |
| Compute kernels per second | 1/seconds | Number of compute invocations per second. [az] |
| Cycles per compute kernel | Hz? | Average number of cycles that the Shader Processor has spent processing compute kernels (compute shader invocations). [az] |
| Cycles per pixel | Hz? | Average number of cycles that the Shader Processor has spent processing fragments. [ref] |
| Cycles per vertex | Hz? | Average number of clock cycles that the Shader Engine has spent processing vertices. [ref] |
| Pipelines starved | % | Tiles, Rasterizer, ... have no work? |
| Primary ALU Pipeline starved | % | ALU has no work because of memory access? |
| Processing load: compute | % | Average compute workload of the Shader Processor. A high value indicates that a large percentage of the Shader’s workload has been spent executing compute kernels. [ref] |
| Processing load: pixel | % | Average pixel workload of the Shader Processor. A high value indicates that a large percentage of the Shader’s workload has been spent shading fragments. [ref] |
| Processing load: vertex | % | Average vertex workload of the Shader Processor. A high value indicates that a large percentage of the Shader’s workload has been spent shading vertices. [ref] |
| Register overload: pixel<br>Register overload: vertex | % | Indicates when the hardware is under register pressure. Register pressure means that fewer tasks can be queued to the hardware because their register requirements are too high. This reduces latency and bandwidth tolerance, because fewer tasks are available to switch to in order to hide these stalls. [ref] |
| Shaded pixels per frame | | Total number of pixels that the Shader unit has processed per frame. This includes the number of pixels visible and blended. [ref] |
| Shaded pixels per second | 1/seconds | Total number of pixels that the Shader unit has processed per second. This includes the number of pixels visible and blended. [ref] |
| Shaded vertices per frame | | Total number of vertices that the Shader unit has processed per frame. [ref] |
| Shaded vertices per second | 1/seconds | Total number of vertices that the Shader unit has processed per second. [ref] |
| Shader processing load | % | Average workload of the Shader Processor, i.e. when it is processing vertices, pixels or compute. [ref] |
| Texturing | - | |
| Texture fetches per pixel | | |
| Texture filter cycles per fetch | Hz? | |
| Texture filter input load | % | |
| Texture filter load | % | |
| Texture read cycles per fetch | Hz? | |
| Texture read stall | % | |
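
Two of the Tiler counters above are defined as simple ratios, so they can be recomputed from the raw triangle and vertex counts as a sanity check. The sketch below follows only the descriptions in the table; the function names and the zero-input handling are assumptions made for illustration and are not part of PVRScope.

```python
def triangle_ratio(triangles_input: float, triangles_output: float) -> float:
    """Triangles output from the Tiler over triangles input, as a percentage."""
    if triangles_input == 0:
        return 0.0  # assumption: report 0 when nothing was submitted
    return 100.0 * triangles_output / triangles_input


def vertices_per_triangle(vertices_input: float, triangles_input: float) -> float:
    """Input vertices processed divided by input triangles processed."""
    if triangles_input == 0:
        return 0.0  # assumption: report 0 when nothing was submitted
    return vertices_input / triangles_input


# Example with made-up per-frame counts.
ratio = triangle_ratio(triangles_input=1000, triangles_output=250)
sharing = vertices_per_triangle(vertices_input=1500, triangles_input=1000)
```

A vertices-per-triangle value of 3.0 means no transformed vertex is shared between triangles; lower values indicate better vertex reuse.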