v0.4.9
AMDGPU v0.4.9
Closed issues:
- State of queues and streams (#337)
- rocBLAS: Remove old hand-wrapped code (#384)
- HSA memory fault upon switching from default device on multi-GPU node (#385)
- Test fail locally with AssertionError: AMDGPU.Runtime.LOGGING_STATIC_ENABLED (#399)
Merged pull requests:
- Switch to task-focused synchronization model (#374) (@jpsamaroo)
- Use broadcast instead of copies to initialize mapreduce buffers. (#390) (@maleadt)
- tests: Skip logging tests if disabled (#391) (@jpsamaroo)
- Add blas wrappers for triangular matrix mul / div (#392) (@pxl-th)
- Simplify signal pooling (#393) (@pxl-th)
- Adapt to GPUCompiler 0.18 (#394) (@pxl-th)
- Reduce memory usage (#395) (@pxl-th)
- Add support for KernelAbstractions 0.9 (#398) (@vchuravy)
- Update to GPUCompiler 0.19 & LLVM.jl 5 (#407) (@pxl-th)
- Fix compiler timespan logging (#408) (@pxl-th)
- rocBLAS: define high-level dot, gemm, axpy functions for FP16 (#409) (@pxl-th)
- Add KernelAbstractions.jl unsafe_free! (#410) (@pxl-th)