- faster `enqueueReadBuffer()` on modern CPUs with 64-Byte-aligned `host_buffer`
- updated OpenCL headers
- better OpenCL device specs detection using vendor ID and Nvidia compute capability
- better VRAM capacity reporting correction for Intel dGPUs
- fixed wrong device name reporting for AMD GPUs (unlike every sane GPU vendor, they don't report the device name as `CL_DEVICE_NAME` but need the `CL_DEVICE_BOARD_NAME_AMD` extension instead)
- fixed TFlops estimate for Intel Battlemage GPUs
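
For the first item above, here is a minimal sketch of how a host buffer can be placed on a 64-byte boundary (the cache-line size of most current x86 CPUs) using C++17 aligned `operator new`. The names `allocate_aligned`, `free_aligned` and `is_aligned` are hypothetical and chosen for illustration only; the allocation code actually used in this release may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>

// Hypothetical helpers for illustration, not the release's actual allocation code.
constexpr std::size_t host_alignment = 64; // bytes; assumed cache-line size

// Allocate `count` elements of T on a 64-byte boundary (C++17 aligned operator new).
template<typename T> T* allocate_aligned(const std::size_t count) {
	assert(count > 0);
	void* raw = ::operator new(count * sizeof(T), std::align_val_t(host_alignment));
	return static_cast<T*>(raw); // intended for trivially constructible types such as float
}

// Memory from aligned operator new must be released with the matching aligned delete.
template<typename T> void free_aligned(T* pointer) {
	::operator delete(static_cast<void*>(pointer), std::align_val_t(host_alignment));
}

// Check whether an address is a multiple of `alignment` bytes.
inline bool is_aligned(const void* pointer, const std::size_t alignment) {
	return reinterpret_cast<std::uintptr_t>(pointer) % alignment == 0;
}
```

A pointer obtained this way can be passed as the host pointer when reading a device buffer back, and must be released with `free_aligned()` rather than `delete[]`.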
|----------------.------------------------------------------------------------|
| Device ID | 1 |
-| Device Name | gfx90a:sramecc+:xnack- |
+| Device Name | AMD Instinct MI210 |
| Device Vendor | Advanced Micro Devices, Inc. |
| Device Driver | 3625.0 (HSA1.1,LC) |
| OpenCL Version | OpenCL C 2.0 |
| Compute Units | 104 at 1700 MHz (6656 cores, 22.630 TFLOPs/s) |
| Memory, Cache | 65520 MB, 16 KB global / 64 KB local |
| Buffer Limits | 65520 MB global, 67092480 KB constant |
|----------------'------------------------------------------------------------|
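
The numbers in the `Compute Units` row are consistent with a simple estimate: 6656 cores / 104 compute units = 64 cores per compute unit, and 6656 cores × 1700 MHz × 2 operations per cycle ≈ 22.630 TFLOPs/s. The sketch below reproduces this arithmetic. The function names are hypothetical, and both the 64 cores per compute unit and the factor 2 (one fused multiply-add counted as two floating-point operations) are inferred from the table values rather than taken from the detection code, which may use different factors for other vendors and architectures.

```cpp
#include <cmath>
#include <cstdint>

// Hypothetical helpers for illustration; the per-device factors are assumptions.

// Total core count from compute units and cores per compute unit.
inline std::uint32_t total_cores(const std::uint32_t compute_units, const std::uint32_t cores_per_cu) {
	return compute_units * cores_per_cu;
}

// FP32 estimate in TFLOPs/s: cores * clock * operations per core per cycle.
inline double estimate_tflops(const std::uint32_t cores, const std::uint32_t clock_mhz, const double ops_per_cycle = 2.0) {
	return (double)cores * (double)clock_mhz * ops_per_cycle * 1.0E-6; // MHz to Hz is 1E6, FLOPs to TFLOPs is 1E-12
}
```

With the values from the table, `total_cores(104u, 64u)` gives 6656 and `estimate_tflops(6656u, 1700u)` gives about 22.630.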