- A14
- M1
- Discover Metal enhancements for A14 Bionic
- Mesa driver details
- Dissecting the Apple M1 GPU: 1, 2, 3
- M1 Benchmarks
- M1 reverse engineering
- iGPU Cache Setups Compared, Including M1
- Reverse engineering the Apple G13 GPU architecture
- FP16:
  - On M1, fp32 has the same rate as fp16. [4]
  - There is a penalty of exactly one cycle for switching between FP32 and FP16 operations. ref
  - On A14 and earlier chips, the FP32 ALU rate is half the FP16 rate. On M1, the FP32 ALU rate increased relative to FP16.
- Ray tracing (software).
- Local memory (L1): [6]
  - size: 32 KB
  - latency: 43 ns
  - bandwidth: 671 GB/s
- L2 cache: [6]
  - size: 1 MB
  - latency: 76.3 ns
  - bandwidth: 384 GB/s
- System level cache (L3): [6]
  - size: 8 MB
  - latency: 266 ns
  - bandwidth: 134 GB/s
- RAM: [6]
  - latency: 311 ns
  - bandwidth: 50.4 GB/s
- CPU to GPU bandwidth: 17 GB/s [6]
- GPU to CPU bandwidth: 17.5 GB/s [6]
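To get a feel for how the latency and bandwidth figures above interact, here is a minimal sketch that estimates the time to read a block of data from each level. It assumes a deliberately simple model (one latency hit plus bytes divided by bandwidth), which is not something the cited measurements claim. The names `LEVELS` and `transfer_time_ns` are hypothetical and exist only for this illustration.

```python
# Latency (ns) and bandwidth (GB/s) copied from the measurements listed above [6].
LEVELS = {
    "L1": {"latency_ns": 43.0, "bandwidth_gb_s": 671.0},
    "L2": {"latency_ns": 76.3, "bandwidth_gb_s": 384.0},
    "SLC": {"latency_ns": 266.0, "bandwidth_gb_s": 134.0},
    "RAM": {"latency_ns": 311.0, "bandwidth_gb_s": 50.4},
}


def transfer_time_ns(level: str, num_bytes: int) -> float:
    """Estimate read time in ns under an ASSUMED latency + size/bandwidth model."""
    entry = LEVELS[level]
    # 1 GB/s is 1 byte per nanosecond, so bytes / (GB/s) is already in ns.
    return entry["latency_ns"] + num_bytes / entry["bandwidth_gb_s"]


if __name__ == "__main__":
    block = 16 * 1024  # 16 KiB, small enough to fit in the 32 KB local memory
    for name in LEVELS:
        print(f"{name}: {transfer_time_ns(name, block):.1f} ns for {block} bytes")
```

Real access patterns overlap many requests in flight, so this model overstates the cost of small transfers. It is only meant to show the relative gap between levels.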
- M1 8-core GPU:
  - max clock: 1278 MHz
  - 16 EU per core
  - 8 fp32/fp16 ALU per EU
  - 2 SFU per EU [4]
  - 1024 total ALUs (8 cores * 16 EU * 8 ALU), each issuing one FMA per cycle
  - theoretical fp32/fp16 performance: 1278 MHz * 1024 ALUs = 1308*10^9^ FMA/s = 2.6 TFLOPS (counting each FMA as 2 FLOPs)
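The peak-throughput arithmetic above can be reproduced with a short script. This is a sketch using only the figures listed in this section. The constant and function names are made up for illustration, and the factor of 2 FLOPs per FMA is the usual counting convention rather than something stated in the sources.

```python
# Figures taken from the M1 8-core GPU list above.
CLOCK_HZ = 1278e6      # max clock: 1278 MHz
GPU_CORES = 8
EUS_PER_CORE = 16
ALUS_PER_EU = 8
FLOPS_PER_FMA = 2      # assumption: one FMA counts as a multiply plus an add


def total_alus() -> int:
    """Number of fp32/fp16 ALUs across the whole GPU."""
    return GPU_CORES * EUS_PER_CORE * ALUS_PER_EU


def peak_fma_per_second() -> float:
    """Peak FMA rate, assuming every ALU retires one FMA per cycle."""
    return CLOCK_HZ * total_alus()


def peak_tflops() -> float:
    """Theoretical peak in TFLOPS."""
    return peak_fma_per_second() * FLOPS_PER_FMA / 1e12


if __name__ == "__main__":
    print(f"ALUs: {total_alus()}")
    print(f"Peak: {peak_fma_per_second() / 1e9:.1f} GFMA/s")
    print(f"Peak: {peak_tflops():.2f} TFLOPS")
```

Changing `GPU_CORES` to 7 gives the corresponding estimate for the 7-core M1 variant, assuming the same clock.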