Releases: JuliaGPU/AMDGPU.jl
v0.4.15
AMDGPU v0.4.15
Merged pull requests:
v0.4.14
AMDGPU v0.4.14
Closed issues:
- Switching to device ≠ 1 hangs on multi-GPU node (#425)
- @ROCDynamicLocalArray: add support for dynamic eltype and expressions for dims (#428)
Merged pull requests:
- Fix host synchronization (#417) (@pxl-th)
- Add device selection in current task by ID (#420) (@luraess)
- Declare compatibility with LLVM_jll 15 (#426) (@giordano)
- Remove buggy uses of default_device (#427) (@jpsamaroo)
- at-ROC*LocalArray: Escape arguments (#430) (@jpsamaroo)
v0.4.13
AMDGPU v0.4.13
Merged pull requests:
v0.4.12
v0.4.11
v0.4.10
AMDGPU v0.4.10
Merged pull requests:
v0.4.9
AMDGPU v0.4.9
Closed issues:
- State of queues and streams (#337)
- rocBLAS: Remove old hand-wrapped code (#384)
- HSA memory fault upon switching from default device on multi-GPU node (#385)
- Test fail locally with AssertionError: AMDGPU.Runtime.LOGGING_STATIC_ENABLED (#399)
Merged pull requests:
- Switch to task-focused synchronization model (#374) (@jpsamaroo)
- Use broadcast instead of copies to initialize mapreduce buffers. (#390) (@maleadt)
- tests: Skip logging tests if disabled (#391) (@jpsamaroo)
- Add blas wrappers for triangular matrix mul / div (#392) (@pxl-th)
- Simplify signal pooling (#393) (@pxl-th)
- Adapt to GPUCompiler 0.18 (#394) (@pxl-th)
- Reduce memory usage (#395) (@pxl-th)
- Add support for KernelAbstraction 0.9 (#398) (@vchuravy)
- Update to GPUCompiler 0.19 & LLVM 5 (#407) (@pxl-th)
- Fix compiler timespan logging (#408) (@pxl-th)
- rocBLAS: define highlevel dot, gemm, axpy functions for FP16 (#409) (@pxl-th)
- Add KernelAbstractions.jl unsafe_free! (#410) (@pxl-th)
v0.4.8
AMDGPU v0.4.8
Merged pull requests:
- ROCSignal: Pool signals in ctor (#369) (@jpsamaroo)
- Reduce allocations (#376) (@pxl-th)
- Report and exit on memory fault (#379) (@jpsamaroo)
- versioninfo: Indicate if using JLLs or System (#381) (@jpsamaroo)
- ROCSignal: Disable IPC by default (#383) (@jpsamaroo)
v0.4.7
v0.4.6
AMDGPU v0.4.6
Closed issues:
- Implement occupancy API (#271)
- getinfo should determine the Ref output container automatically (#273)
Merged pull requests:
- Add timespan logging via TimespanLogging.jl (#263) (@jpsamaroo)
- Add occupancy API and groupsize tuning (#326) (@jpsamaroo)
- Reduce signal wait allocations (#361) (@jpsamaroo)
- Add more intrinsics, enable always_inline (#362) (@jpsamaroo)
- Simplify math intrinsics (#363) (@pxl-th)
- Implement unified getinfo interface (#364) (@jpsamaroo)
- Assorted fixes (#365) (@jpsamaroo)
- Add memory allocation limiters (#366) (@jpsamaroo)
- Specify return types for getinfo calls (#368) (@pxl-th)