5th generation of Mali GPU architecture.

Content:

  • Gen1
  • Gen2
  • All gens

Gen1

Examples

  • Mali-G620
  • Mali-G720
  • Immortalis-G720

References

1.1. Arm Mali-G620 Performance Counters Reference Guide
1.2. Arm Immortalis-G720 and Arm Mali-G720 Performance Counters Reference Guide, [backup]
1.3. Vulkan features for Mali-G720

Features

  • Deferred Vertex Shading - optimization for small triangles.
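The notes under "All gens" say the tiler decides which triangles to defer, and that larger tiles let more triangles be deferred because each one spans fewer tiles. Arm does not publish the actual rule, so the following is only a minimal sketch under an assumed heuristic. It defers a triangle when its screen-space bounding box touches at most a hypothetical `MAX_DEFERRED_TILES` tiles, since every extra tile means one more vertex-shader re-run in the fragment phase. The names `tiles_spanned`, `should_defer` and the threshold are illustrative, not Arm terminology.

```python
import math

# Hypothetical threshold: defer a triangle only if it touches at most this
# many tiles. The real tiler heuristic is not documented; this is an assumption.
MAX_DEFERRED_TILES = 1


def tiles_spanned(xs, ys, tile_w, tile_h):
    """Count tiles overlapped by the triangle's axis-aligned bounding box.

    xs, ys: the three vertex coordinates in pixels (non-negative screen space).
    Bounding-box binning is conservative: it may count tiles that the
    triangle itself does not actually cover.
    """
    x0 = math.floor(min(xs) / tile_w)
    x1 = math.floor(max(xs) / tile_w)
    y0 = math.floor(min(ys) / tile_h)
    y1 = math.floor(max(ys) / tile_h)
    return (x1 - x0 + 1) * (y1 - y0 + 1)


def should_defer(xs, ys, tile_w, tile_h):
    """Defer vertex shading only when re-running it per tile stays cheap."""
    return tiles_spanned(xs, ys, tile_w, tile_h) <= MAX_DEFERRED_TILES


# A small triangle straddling x = 32: two 32x32 tiles, but one 64x64 tile.
small = ([20.0, 40.0, 30.0], [10.0, 10.0, 20.0])
print(tiles_spanned(*small, 32, 32))  # 2
print(tiles_spanned(*small, 64, 64))  # 1
print(should_defer(*small, 32, 32))   # False
print(should_defer(*small, 64, 64))   # True
```

The same triangle flips from "shade upfront" to "defer" purely because the tile grows, which is the effect the notes attribute to larger tiles.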

Notes

  • G720 core config [2]:

    • 4 ALU
    • 512 fp16/cy (128 per ALU)
    • 256 fp32/cy (64 per ALU)
    • 4 frag/cy
    • 4 pix/cy
    • 8 tex/cy
  • Immortalis-G720 MC12 VK_ARM_shader_core_builtins, VK_ARM_shader_core_properties [1.3]:

    • shaderCoreCount: 12
    • shaderWarpsPerCore: 64 -- maximum number of simultaneously executing warps on a shader core.
    • fmaRate: 128 -- maximum number of single-precision fused multiply-add operations per clock per shader core.
    • pixelRate: 4 -- maximum number of pixels output per clock per shader core.
    • texelRate: 8 -- maximum number of texels per clock per shader core.
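The per-core Vulkan properties above can be multiplied by `shaderCoreCount` to get whole-GPU theoretical peaks. This sketch assumes that the datasheet's 256 fp32/cy counts each FMA as two floating-point operations, which is what makes it agree with `fmaRate` = 128. It also uses a hypothetical 1.0 GHz clock, because the notes give no frequency. `ShaderCoreRates` and `peak_rates` are illustrative names, and the results are peaks, not measured throughput.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ShaderCoreRates:
    """Per-core peak rates, as reported by VK_ARM_shader_core_properties."""
    fma_rate: int    # fmaRate: FP32 fused multiply-adds per clock per core
    pixel_rate: int  # pixelRate: pixels output per clock per core
    texel_rate: int  # texelRate: texels per clock per core


def peak_rates(core_count, rates, clock_hz):
    """Whole-GPU theoretical peaks per second.

    Assumes every core runs at the same clock and that one FMA counts as
    two FLOPs (one multiply plus one add).
    """
    return {
        "fp32_flops": core_count * rates.fma_rate * 2 * clock_hz,
        "pixels": core_count * rates.pixel_rate * clock_hz,
        "texels": core_count * rates.texel_rate * clock_hz,
    }


# Immortalis-G720 MC12 values from the notes above.
g720_mc12 = ShaderCoreRates(fma_rate=128, pixel_rate=4, texel_rate=8)

# Hypothetical 1.0 GHz clock: the notes do not state a frequency.
peaks = peak_rates(core_count=12, rates=g720_mc12, clock_hz=1_000_000_000)
print(peaks["fp32_flops"] / 1e12)  # 3.072 (TFLOPS)
print(peaks["pixels"] / 1e9)       # 48.0 (Gpixel/s)
print(peaks["texels"] / 1e9)       # 96.0 (Gtexel/s)
```

Scaling to a real device only requires replacing `clock_hz` with its actual GPU frequency.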

Gen2

Examples

  • Mali-G625
  • Mali-G725
  • Immortalis-G925

References

2.1. Arm Immortalis-G925 and Arm Mali-G725 Performance Counters Reference Guide, [backup]
2.2. Arm Mali-G625 Performance Counters Reference Guide
2.3. Vulkan features for Mali-G925-Immortalis MC12
2.4. Hidden Surface Removal in Immortalis-G925: The Fragment Prepass, [webarchive]

Features

  • Fragment Pre-pass - hardware depth pre-pass.
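As a software analogy only (the hardware implementation is not public), the Fragment Pre-pass described in the "All gens" notes can be sketched as two passes over one tile's fragments. The first pass keeps the nearest depth per pixel, and the second loops back and shades only the fragment that won each pixel. The sketch assumes opaque geometry, a less-than depth test, and a hypothetical `shade` callback. It ignores blending, `discard`, and shader depth writes.

```python
from typing import Callable, Dict, List, Set, Tuple

# Hypothetical fragment layout: (pixel x, pixel y, depth, primitive id).
Fragment = Tuple[int, int, float, int]


def fragment_prepass(fragments: List[Fragment],
                     shade: Callable[[Fragment], None]) -> int:
    """Two-pass hidden surface removal over one tile.

    Returns how many fragments were shaded. Assumes opaque geometry and a
    LESS depth test (smaller depth is nearer).
    """
    # Pass 1 (depth only): record the nearest depth seen at each pixel.
    nearest: Dict[Tuple[int, int], float] = {}
    for x, y, depth, _ in fragments:
        if depth < nearest.get((x, y), float("inf")):
            nearest[(x, y)] = depth

    # Pass 2: loop back and shade only the fragment that won each pixel.
    # On equal depths the first fragment in submission order wins, as it
    # would under a LESS test.
    shaded_pixels: Set[Tuple[int, int]] = set()
    for frag in fragments:
        x, y, depth, _ = frag
        if depth == nearest[(x, y)] and (x, y) not in shaded_pixels:
            shade(frag)
            shaded_pixels.add((x, y))
    return len(shaded_pixels)


# Two triangles drawn back to front; both cover pixel (0, 0).
frags = [(0, 0, 0.9, 1), (1, 0, 0.9, 1), (0, 0, 0.2, 2)]
visible: List[Fragment] = []
count = fragment_prepass(frags, visible.append)
print(count)    # 2 (a single depth-tested pass would shade all 3)
print(visible)  # [(1, 0, 0.9, 1), (0, 0, 0.2, 2)]
```

The saving comes from back-to-front submission: a single depth-tested pass shades the far fragment at (0, 0) before the near one arrives, while the pre-pass never shades it.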

Notes

  • G725 core config [2]:

    • 4 ALU
    • 512 fp16/cy (128 per ALU)
    • 256 fp32/cy (64 per ALU)
    • 4 frag/cy
    • 4 pix/cy
    • 8 tex/cy
  • Immortalis-G925 MC12 VK_ARM_shader_core_builtins, VK_ARM_shader_core_properties [2.3]:

    • shaderCoreCount: 12
    • shaderWarpsPerCore: 64 -- maximum number of simultaneously executing warps on a shader core.
    • fmaRate: 128 -- maximum number of single-precision fused multiply-add operations per clock per shader core.
    • pixelRate: 4 -- maximum number of pixels output per clock per shader core.
    • texelRate: 8 -- maximum number of texels per clock per shader core.

All gens

References

  1. Deferred Vertex Shading: slide 1, slide 2
  2. Arm GPU Datasheet, [backup]

Notes

  • Deferred Vertex Shading.

    • Brings vertex and fragment shading together to keep intermediate data local. [1]
    • The tiler chooses which triangles to defer and which to shade upfront, to prevent excessive re-shading. [1]
    • Larger tiles mean each triangle spans fewer tiles, so there is less re-shading and more triangles can be deferred. [1]
    • During the tiling phase, Arm GPUs do not write out position data for small triangles. [2.4]
    • During the fragment phase, Arm GPUs will execute a full vertex shader for small triangles. [2.4]
  • Added a 2x MSAA module: previously, when a developer requested 2x MSAA, the GPU automatically used 4x MSAA instead.

  • Fragment tasks operate on 64x64-pixel regions.

  • Fragment Pre-pass.

    • A Hidden Surface Removal (HSR) technique: a first pass over the fragments determines which of them will be visible in the final image, then the GPU loops back and shades only the visible ones. [2.4]
  • Tile size:

    • 64x64 if <= 128 bits per pixel (bpp)
    • 64x32 if > 128 and <= 256 bpp
    • 32x32 if > 256 bpp
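The tile-size rule above can be written as a small lookup. This sketch assumes `bpp` means the total color storage per pixel summed over all render targets. The notes do not say how MSAA samples are counted, so the caller must decide that. `tile_size` and `tiles_for_framebuffer` are illustrative names. For the two bounded ranges, tile area times the upper threshold is identical: 64 × 64 × 128 = 64 × 32 × 256 = 524,288 bits (64 KiB).

```python
from typing import Tuple


def ceil_div(a: int, b: int) -> int:
    """Integer division rounding up (for non-negative a and positive b)."""
    return -(-a // b)


def tile_size(bpp: int) -> Tuple[int, int]:
    """Return (width, height) in pixels for a given number of bits per pixel."""
    if bpp <= 0:
        raise ValueError("bpp must be positive")
    if bpp <= 128:
        return (64, 64)
    if bpp <= 256:
        return (64, 32)
    return (32, 32)


def tiles_for_framebuffer(width: int, height: int, bpp: int) -> int:
    """Number of tiles needed to cover a framebuffer, counting partial tiles."""
    tw, th = tile_size(bpp)
    return ceil_div(width, tw) * ceil_div(height, th)


# Hypothetical render-target setups (bits per pixel summed over targets).
print(tile_size(32))    # one RGBA8 target:      (64, 64)
print(tile_size(256))   # four RGBA16F targets:  (64, 32)
print(tile_size(320))   # five RGBA16F targets:  (32, 32)
print(tiles_for_framebuffer(1920, 1080, 32))  # 30 * 17 = 510
```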