Releases: JuliaGPU/AMDGPU.jl
v0.8.4
v0.8.3
AMDGPU v0.8.3
Merged pull requests:
- [rocSPARSE] Update sv! and sm! (#567) (@amontoison)
- Use correct warpId in device-side RNG (#568) (@pxl-th)
- Initial ROCm 6 enablement (#572) (@pxl-th)
- Update rocSPARSE to ROCm 6 (#573) (@pxl-th)
- Use the stage preprocess in rocsparse_spmv (#574) (@amontoison)
- Add a generator for ROCsolver (#575) (@amontoison)
- Implement device side rng in RDNA3 plus fix it on julia master (#576) (@gbaraldi)
- Fix repr test (#578) (@pxl-th)
v0.8.2
AMDGPU v0.8.2
Merged pull requests:
- [rocSPARSE] Add a structure MatInfo for IC(0) and ILU(0) preconditioners (#558) (@amontoison)
- Define comparison method for HIPContext (#561) (@pxl-th)
- Improve type inference (#562) (@pxl-th)
- Refactor alloc/retry (#563) (@pxl-th)
- Fix functional (#565) (@pxl-th)
- Use regular malloc/free (#566) (@pxl-th)
Closed issues:
- has_rocm_gpu() fails (#564)
v0.8.1
AMDGPU v0.8.1
Merged pull requests:
- Implement device-side RNG (#380) (@utkarsh530)
- Fix path detection in ubuntu like systems (#545) (@gbaraldi)
- Simplify ROCm discovery (#548) (@pxl-th)
- [rocSPARSE] Add new constructors (#550) (@amontoison)
- Check context is valid before freeing streams, arrays. (#552) (@pxl-th)
- [rocSPARSE] Update helpers.jl (#554) (@amontoison)
- Use Atomix.jl for atomics (#555) (@pxl-th)
- Reset exception holder immediately after exception (#556) (@pxl-th)
- Fix exception reporting (#557) (@pxl-th)
- Cleanup (#559) (@pxl-th)
Closed issues:
- Implement sparse BLAS routines (#15)
- Implement iterative solvers (#13)
- Create a Docker image for AMDGPU.jl (#33)
- Implement batched off-thread HSA signal waiting (#128)
- HSA_STATUS_ERROR_INVALID_CODE_OBJECT on gfx803 (#192)
- hsa_executable_freeze can hang during high GPU load (#208)
- Implement copy!() (#218)
- ROCM/Hip not downloading (?) when ]added (#230)
- mapreducedim! is not implemented for AnyROCArray Types (#234)
- Test of AMDGPU fails on 5900HX - hipErrorNoBinaryForGpu (#244)
- Don't disable ROCm external library type definitions when non-functional (#350)
- AMDGPU.jl doesn't seem to work with 7900 series GPUs (#371)
- Support for rand from Julia Base on device code (#378)
- Detect hardware queue limit and use to limit queue pool size (#403)
- AMDGPU on windows (#465)
- Rely on Atomix.jl for atomics (#547)
v0.8.0
AMDGPU v0.8.0
This release brings initial support for Windows (see requirements).
Removed "mixed-mode"; everything is now done automatically under the hood.
Merged pull requests:
- ROCm discovery for Windows (#542) (@pxl-th)
- Fix kernel compilation on Windows (#543) (@pxl-th)
- [Windows] Fix D2H memcopy & don't test unsupported functionality (#544) (@pxl-th)
v0.7.4
AMDGPU v0.7.4
Merged pull requests:
- Update preconditioners.jl (#533) (@amontoison)
- [rocSPARSE] Interface the generic routines (#535) (@amontoison)
- Defer freeing hostcall buffers & add 1.10 CI (#538) (@pxl-th)
- Have separate free! method for hostcalls (#539) (@pxl-th)
- Switch to artifact device libraries if ROCm 5.5+ is detected (#540) (@pxl-th)
- Fix artifact discovery in global project (#541) (@pxl-th)
v0.7.3
v0.7.2
v0.7.1
v0.7.0
AMDGPU v0.7.0
Merged pull requests:
- Enable 5.4 JLLs on LLVM <16 (#503) (@jpsamaroo)
- Use refs instead of pointers to get a slightly friendlier abi (#504) (@gbaraldi)
- Bump actions/checkout from 3 to 4 (#506) (@dependabot[bot])
- Add ROCm mixed mode (#508) (@pxl-th)
- Do runtime ROCm discovery (#509) (@pxl-th)
- Switch tests to ReTestItems.jl (#511) (@pxl-th)
- Use non-blocking synchronization by default (#512) (@pxl-th)
- Bump GPUCompiler to 0.25 (#513) (@pxl-th)
- Add a method for getrf! (#514) (@amontoison)
- Use branches instead of 'ifelse' (#519) (@pxl-th)
- Interface getrf_batched and getri_batched (#520) (@amontoison)
- Bring back CI (#523) (@pxl-th)
- Add workgroup synchronization primitives (#524) (@pxl-th)
- Use HIP for retrieving GCN arch (#525) (@pxl-th)
- Mention Julia 1.10+ requirement for Navi 3 (#526) (@pxl-th)
Closed issues:
- Runtime Locking (#64)
- 2x slower AMDGPU.jl kernel compared to HIP (#331)
- sincos() x3.5 slower than separate sin()/cos() calls (#341)
- HSA memory fault using AMDGPU.rand() on device ≠ 1 (#386)
- WARNING: could not import AMDGPU.device_libs_path into Compiler (#434)
- sincos intrinsic is broken with GPUCompiler 0.24 (#502)
- Navi 3 causes malloc(): unsorted double linked list corrupted (#518)