Documentation for rocSPARSE is available at https://rocm.docs.amd.com/projects/rocSPARSE/en/latest/.
- Added `rocsparse_[s|d|c|z]csritilu0_compute_ex` routines for iterative ILU
- Added `rocsparse_[s|d|c|z]csritsv_solve_ex` routines for iterative triangular solve
- Added BSR format to SpMM generic routine `rocsparse_spmm`
- Added `GPU_TARGETS` to replace the now deprecated `AMDGPU_TARGETS` in cmake files
- Added `azurelinux` OS name for correcting gfortran dependency
- Added test filters `smoke`, `regression`, and `extended` for emulation tests.
- By default, build rocsparse shared library using the `--offload-compress` compiler option, which compresses the fat binary. This significantly reduces the shared library binary size.
- Fixed an issue in the routine `rocsparse_spgemm` when using `rocsparse_spgemm_stage_symbolic` and `rocsparse_spgemm_stage_numeric`, where the routine would crash when `alpha` and `beta` were passed as host pointers and where `beta != 0`.
- Improved the adaptive CSR sparse matrix-vector multiplication algorithm when the sparse matrix has many empty rows at the beginning or at the end of the matrix. This improves the routines `rocsparse_spmv` and `rocsparse_spmv_ex` when the adaptive algorithm `rocsparse_spmv_alg_csr_adaptive` is used.
- Improved stream CSR sparse matrix-vector multiplication algorithm when the sparse matrix size (number of rows) decreases. This improves the routines `rocsparse_spmv` and `rocsparse_spmv_ex` when the stream algorithm `rocsparse_spmv_alg_csr_stream` is used.
- Compared to `rocsparse_[s|d|c|z]csritilu0_compute`, the routines `rocsparse_[s|d|c|z]csritilu0_compute_ex` introduce a number of free iterations. A free iteration is an iteration that does not compute the evaluation of the stopping criteria, if enabled. This allows the user to tune the algorithm for performance improvements.
- Compared to `rocsparse_[s|d|c|z]csritsv_solve`, the routines `rocsparse_[s|d|c|z]csritsv_solve_ex` introduce a number of free iterations. A free iteration is an iteration that does not compute the evaluation of the stopping criteria. This allows the user to tune the algorithm for performance improvements.
- Improved user documentation
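
The idea of free iterations can be illustrated with a small, self-contained sketch. It is not the rocSPARSE implementation: it assumes a plain Jacobi-style sweep on a dense lower-triangular matrix, and the names `jacobi_lower_solve` and `free_iter` are hypothetical. What it shows is only the trade-off described above: the residual is evaluated once per group of `free_iter + 1` sweeps instead of after every sweep.

```python
def jacobi_lower_solve(L, b, tol=1e-12, max_iter=100, free_iter=0):
    """Iteratively solve L x = b for a dense lower-triangular L.

    Illustrative only (hypothetical helper, not a rocSPARSE routine).
    Returns (x, sweeps, checks), where `checks` counts how many times
    the stopping criterion was evaluated.
    """
    n = len(b)
    x = [0.0] * n
    sweeps = 0
    checks = 0

    def sweep(v):
        # One Jacobi sweep: x_i <- (b_i - sum_{j<i} L_ij * v_j) / L_ii
        return [(b[i] - sum(L[i][j] * v[j] for j in range(i))) / L[i][i]
                for i in range(n)]

    def residual(v):
        # Max-norm of b - L v
        return max(abs(b[i] - sum(L[i][j] * v[j] for j in range(i + 1)))
                   for i in range(n))

    while sweeps < max_iter:
        x = sweep(x)
        sweeps += 1
        # Free iterations: skip the stopping-criterion evaluation.
        if sweeps % (free_iter + 1) != 0:
            continue
        checks += 1
        if residual(x) <= tol:
            break
    return x, sweeps, checks


L = [[2, 0, 0, 0],
     [1, 2, 0, 0],
     [1, 1, 2, 0],
     [1, 1, 1, 2]]
b = [2, 3, 4, 5]

x_checked, sweeps_checked, checks_checked = jacobi_lower_solve(L, b)
x_free, sweeps_free, checks_free = jacobi_lower_solve(L, b, free_iter=3)
```

In this example both calls reach the same solution after the same number of sweeps, but the second evaluates the residual less often. Choosing too many free iterations can also overshoot the point of convergence, which is why the count is a tuning parameter.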
- Deprecated `rocsparse_[s|d|c|z]csritilu0_compute` routines. Users should use the newly added `rocsparse_[s|d|c|z]csritilu0_compute_ex` routines going forward.
- Deprecated `rocsparse_[s|d|c|z]csritsv_solve` routines. Users should use the newly added `rocsparse_[s|d|c|z]csritsv_solve_ex` routines going forward.
- Deprecated use of `AMDGPU_TARGETS` in cmake files. Users should use `GPU_TARGETS` going forward.
- Add `rocsparse_create_extract_descr`, `rocsparse_destroy_extract_descr`, `rocsparse_extract_buffer_size`, `rocsparse_extract_nnz`, and `rocsparse_extract` APIs to allow extraction of the upper or lower part of sparse CSR or CSC matrices.
- Support for the gfx1151, gfx1200, and gfx1201 architectures.
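
As a rough illustration of what extracting the lower or upper part of a CSR matrix means, here is a host-side sketch in plain Python. It does not call the rocSPARSE extract APIs; the helper name `extract_triangle_csr` is hypothetical, indices are zero-based, and keeping the diagonal in the result is an assumption made for this example.

```python
def extract_triangle_csr(row_ptr, col_ind, val, lower=True):
    """Return the CSR arrays of the lower (j <= i) or upper (j >= i) part.

    Hypothetical helper for illustration; the diagonal is kept (assumption).
    """
    out_ptr, out_col, out_val = [0], [], []
    for i in range(len(row_ptr) - 1):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            j = col_ind[k]
            keep = (j <= i) if lower else (j >= i)
            if keep:
                out_col.append(j)
                out_val.append(val[k])
        # Row i of the result ends at the current number of kept entries.
        out_ptr.append(len(out_col))
    return out_ptr, out_col, out_val


# 3x3 example matrix:
# [1 2 0]
# [3 4 5]
# [0 6 7]
row_ptr = [0, 2, 5, 7]
col_ind = [0, 1, 0, 1, 2, 1, 2]
val = [1, 2, 3, 4, 5, 6, 7]

lower_part = extract_triangle_csr(row_ptr, col_ind, val, lower=True)
upper_part = extract_triangle_csr(row_ptr, col_ind, val, lower=False)
```

The number of kept entries is only known after the filter has run, which is one plausible reason for the API to expose a separate nonzero-count step before the extraction itself.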
- Change the default compiler from hipcc to amdclang in install script and cmake files.
- Change address sanitizer build targets so that only gfx908:xnack+, gfx90a:xnack+, gfx940:xnack+, gfx941:xnack+, and gfx942:xnack+ are built when `BUILD_ADDRESS_SANITIZER=ON` is configured.
- Improved user documentation
- Fixed the `csrmm` merge path algorithm so that diagonal is clamped to the correct range.
- Fixed a race condition in `bsrgemm` that could on rare occasions cause incorrect results.
- Fixed an issue in `hyb2csr` where the CSR row pointer array was not being properly filled when `n=0`, `coo_nnz=0`, or `ell_nnz=0`.
- Fixed scaling in `rocsparse_Xhybmv` when only performing `y=beta*y`, for example, where `alpha==0` in `y=alpha*Ax+beta*y`.
- Fixed `rocsparse_Xgemmi` failures when the y grid dimension is too large. This occurred when n >= 65536.
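
The `y=beta*y` special case mentioned above can be sketched as follows. This is a conceptual example only: it uses CSR storage rather than HYB for brevity, the function name `csr_axpby` is hypothetical, and it says nothing about how the rocSPARSE kernels are organized. It shows why the `alpha==0` path is usually handled separately: the matrix-vector product is skipped entirely and only the scaling is applied.

```python
def csr_axpby(alpha, row_ptr, col_ind, val, x, beta, y):
    """Compute alpha * A * x + beta * y for a CSR matrix A.

    Hypothetical helper for illustration, returning a new list.
    """
    if alpha == 0:
        # Scaling-only path: A and x are never read.
        return [beta * yi for yi in y]
    out = []
    for i in range(len(row_ptr) - 1):
        acc = sum(val[k] * x[col_ind[k]]
                  for k in range(row_ptr[i], row_ptr[i + 1]))
        out.append(alpha * acc + beta * y[i])
    return out


# 2x2 example matrix [[1, 2], [0, 3]]
row_ptr = [0, 2, 3]
col_ind = [0, 1, 1]
val = [1.0, 2.0, 3.0]

scaled_only = csr_axpby(0.0, row_ptr, col_ind, val,
                        [float("inf"), 1.0], 2.0, [1.0, 2.0])
full = csr_axpby(1.0, row_ptr, col_ind, val, [1.0, 1.0], 2.0, [1.0, 2.0])
```

With the early return, a non-finite entry in `x` cannot contaminate the result when `alpha` is zero, whereas evaluating the full formula would multiply zero by infinity.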
- New Merge-Path algorithm to SpMM, supporting CSR format
- SpSM now supports row order
- rocsparseio I/O functionality has been added to the library
- `rocsparse_set_identity_permutation` has been added
- Adjusted rocSPARSE dependencies to related HIP packages
- Binary size has been reduced
- A namespace has been wrapped around internal rocSPARSE functions and kernels
- `rocsparse_csr_set_pointers`, `rocsparse_csc_set_pointers`, and `rocsparse_bsr_set_pointers` now allow the column indices and values arrays to be nullptr if `nnz` is 0
- gfx803 target has been removed from address sanitizer builds
- Improved user manual
- Improved contribution guidelines
- SpMV adaptive and LRB algorithms have been further optimized on CSR format
- Improved performance of SpMV adaptive with symmetrically stored matrices on CSR format
- Compilation errors with `BUILD_ROCSPARSE_ILP64=ON` have been resolved
- New LRB algorithm to SpMV, supporting CSR format
- rocBLAS is now an optional dependency for SDDMM algorithms
- Additional verbose output for `csrgemm` and `bsrgemm`
- CMake support for documentation
- Triangular solve with multiple rhs (SpSM, csrsm, ...) now calls SpSV, csrsv, etcetera when nrhs equals 1
- Improved user manual section Installation and Building for Linux and Windows
- `rocsparse_inverse_permutation`
- Mixed-precisions for SpVV
- Uniform int8 precision for gather and scatter
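
For readers unfamiliar with the permutation helpers named in these notes, the following sketch shows what an identity permutation and an inverse permutation are. It is plain Python with zero-based indices and hypothetical function names; it does not reproduce the signatures or index-base handling of the rocSPARSE routines.

```python
def identity_permutation(n):
    """Return the permutation that maps every index to itself."""
    return list(range(n))


def inverse_permutation(p):
    """Return q such that q[p[i]] == i for every i (zero-based)."""
    q = [0] * len(p)
    for i, target in enumerate(p):
        q[target] = i
    return q


p = [2, 0, 3, 1]
q = inverse_permutation(p)
round_trip = [p[q[i]] for i in range(len(p))]
```

Composing a permutation with its inverse gives back the identity, which is a convenient check when a permutation produced by a sort has to be undone later.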
- Added new `rocsparse_spmv` routine
- Added new `rocsparse_xbsrmv` routines
- When using host pointer mode, you must now call `hipStreamSynchronize` following `doti`, `dotci`, `spvv`, and `csr2ell`
- `doti` routine
- Improved spin-looping algorithms
- Improved documentation
- Improved verbose output during argument checking on API function calls
- `rocsparse_spmv_ex`
- `rocsparse_xbsrmv_ex`
- Auto stages from `spmv`, `spmm`, `spgemm`, `spsv`, `spsm`, and `spitsv`
- Formerly deprecated `rocsparse_spmv` routines
- Formerly deprecated `rocsparse_xbsrmv` routines
- Formerly deprecated `rocsparse_spmm_ex` routine
- Bug in `rocsparse-bench` where the SpMV algorithm was not taken into account in CSR format
- BSR and GEBSR routines (`bsrmv`, `bsrsv`, `bsrmm`, `bsrgeam`, `gebsrmv`, `gebsrmm`) didn't always show `block_dim==0` as an invalid size
- Passing `nnz = 0` to `doti` or `dotci` wasn't always returning a dot product of 0
- `gpsv` minimum size is now `m >= 3`
- More mixed-precisions for SpMV, (`matrix: float`, `vectors: double`, `calculation: double`) and (`matrix: rocsparse_float_complex`, `vectors: rocsparse_double_complex`, `calculation: rocsparse_double_complex`)
- Support for gfx940, gfx941, and gfx942
- Bug in `csrsm` and `bsrsm`
- In `csritilu0`, the algorithm `rocsparse_itilu0_alg_sync_split_fusion` has some accuracy issues when XNACK is enabled (you can use `rocsparse_itilu0_alg_sync_split` as an alternative)
- Memory leak in `csritsv`
- Bug in `csrsm` and `bsrsm`
- `bsrgemm` and `spgemm` for BSR format
- `bsrgeam`
- Build support for Navi32
- Experimental hipGraph support for some rocSPARSE routines
- `csritsv`, `spitsv` csr iterative triangular solve
- Mixed-precisions for SpMV
- Batched SpMM for transpose A in COO format with atomic algorithm
- `csr2bsr`
- `csr2csr_compress`
- `csr2coo`
- `gebsr2csr`
- `csr2gebsr`
- Documentation
- Bug in COO SpMV grid size
- Bug in SpMM grid size when using very large matrices
- In `csritilu0`, the algorithm `rocsparse_itilu0_alg_sync_split_fusion` has some accuracy issues when XNACK is enabled (you can use `rocsparse_itilu0_alg_sync_split` as an alternative)
- `rocsparse_spmv_ex` routine
- `rocsparse_bsrmv_ex_analysis` and `rocsparse_bsrmv_ex` routines
- `csritilu0` routine
- Build support for Navi31 and Navi 33
- Segmented algorithm for COO SpMV by performing analysis
- Improved performance when generating random matrices
- `bsr2csr` routine
- Integer overflow bugs
- Bug in `ellmv`
- Transpose A for SpMM COO format
- Matrix checker routines for verifying matrix data
- Atomic algorithm for COO SpMV
- `bsrpad` routine
- Bug in `csrilu0` that could cause a deadlock
- Bug where asynchronous `memcpy` would use wrong stream
- Potential size overflows
- Batched SpMM for CSR, CSC, and COO formats
- Packages for test and benchmark executables on all supported operating systems using CPack
- Clients file importers and exporters
- Clients code size reduction
- Clients error handling
- Clients benchmarking for performance tracking
- Test adjustments due to round-off errors
- Fixing API call compatibility with rocPRIM
- `gtsv_interleaved_batch`
- `gpsv_interleaved_batch`
- `SpGEMM_reuse`
- Allow copying of mat info struct
- Optimization for SDDMM
- Allow unsorted matrices in `csrgemm` multipass algorithm
- `csrmv`, `coomv`, `ellmv`, and `hybmv` for (conjugate) transposed matrices
- `csrmv` for symmetric matrices
- Packages for test and benchmark executables on all supported operating systems using CPack
- `spmm_ex` has been deprecated and will be removed in the next major release
- Optimization for `gtsv`
- Triangular solve for multiple right-hand sides using BSR format
- SpMV for BSRX format
- SpMM in CSR format enhanced to work with transposed A
- Matrix coloring for CSR matrices
- Added batched tridiagonal solve (`gtsv_strided_batch`)
- SpMM for BLOCKED ELL format
- Generic routines for SpSV and SpSM
- Beta support for Windows 10
- Additional atomic-based algorithms for SpMM in COO format
- Extended version of SpMM
- Additional algorithm for SpMM in CSR format
- Added (conjugate) transpose support for CsrMV and SpMV (CSR) routines
- Packaging has been split into a runtime package (`rocsparse`) and a development package (`rocsparse-devel`): The development package depends on the runtime package. When installing the runtime package, the package manager will suggest the installation of the development package to aid users transitioning from the previous version's combined package. This suggestion by package manager is for all supported operating systems (except CentOS 7) to aid in the transition. The `suggestion` feature in the runtime package is introduced as a deprecated feature and will be removed in a future ROCm release.
- Bug with `gemvi` on Navi21
- Bug with adaptive CsrMV
- Optimization for pivot-based `gtsv`
- (batched) Tridiagonal solver with and without pivoting
- Dense matrix sparse vector multiplication (gemvi)
- Support for gfx90a
- Sampled dense-dense matrix multiplication (SDDMM)
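
As background on the SDDMM operation listed above, the sketch below computes `alpha * (A * B)` only at the positions where a sparse matrix C already has entries, and adds `beta` times the existing values. It is a plain-Python reference with a hypothetical function name and CSR storage chosen for the example; it is not the rocSPARSE interface.

```python
def sddmm_csr(alpha, A, B, beta, row_ptr, col_ind, val):
    """Sampled dense-dense matrix multiplication on a CSR sparsity pattern.

    A is m x k and B is k x n (lists of rows). Returns the new values of C,
    in the same order as `val`. Hypothetical helper for illustration.
    """
    k = len(B)
    out = []
    for i in range(len(row_ptr) - 1):
        for p in range(row_ptr[i], row_ptr[i + 1]):
            j = col_ind[p]
            # Dot product of row i of A with column j of B,
            # computed only for stored positions of C.
            dot = sum(A[i][t] * B[t][j] for t in range(k))
            out.append(alpha * dot + beta * val[p])
    return out


A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
# C stores only the entries (0, 1) and (1, 0), both equal to 1.
row_ptr = [0, 1, 2]
col_ind = [1, 0]
val = [1, 1]

new_val = sddmm_csr(1, A, B, 2, row_ptr, col_ind, val)
```

Because the dense product is evaluated only at the stored positions, the cost scales with the number of nonzeros of C rather than with the full `m * n` output.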
- client matrix download mechanism
- removed boost dependency in clients
- SpMM (CSR, COO)
- Code coverage analysis
- Install script
- Level 2/3 unit tests
- `rocsparse-bench` no longer depends on boost
- `gebsrmm`
- `gebsrmv`
- `gebsrsv`
- `coo2dense` and `dense2coo`
- Generic APIs, including `axpby`, `gather`, `scatter`, `rot`, `spvv`, `spmv`, `spgemm`, `sparsetodense`, `densetosparse`
- Support for mixed indexing types in matrix formats
- Changelog
- `csr2gebsr`
- `gebsr2gebsc`
- `gebsr2gebsr`
- Treating filename as regular expression for YAML-based testing generation
- Documentation for `gebsr2csr`
- `bsric0`
- gfx1030 has been adjusted to the latest compiler
- Replace old XNACK 'off' compiler flag with new version
- Updated Debian package name
- `prune_csr2csr`, `prune_dense2csr_percentage` and `prune_csr2csr_percentage` added
- `bsrilu0` added
- `csrilu0_numeric_boost` functionality added
- `bsric0`
- No changes for this ROCm release
- Fortran bindings
- CentOS 6 support
- `bsrmv`
- Default compiler switched to HIP-Clang
- `csr2dense`, `csc2dense`, `csr2csr_compress`, `nnz_compress`, `bsr2csr`, `csr2bsr`, `bsrmv`, and `csrgeam`
- Triangular solve for BSR format (`bsrsv`)
- Options for static build
- Examples
- `dense2csr` and `dense2csc`
- Installation process