Commit
* started creating skeleton code for bsrmm

* rebase bsrmm to squash commits

clang formatting
Allow library dependencies to be installed from CI (#49)

csrgeam (#46)

* csrgeam API added
* csrgeam tests and benchmark added
* flops, bandwidth and host implementation for csrgeam
* csrgeam unit tests
* removed webbase_1M test
* csrgeam (functional) added
* added tests for invalid sizes
* typos and year
* clang-format
* csrgeam performance scripts

bump version

Replace host code in bsr2csr (#48)

* removed host bsr2csr and csr2bsr code and replaced it with device calls
* clang formatting

Co-authored-by: jsandham <[email protected]>

bump version

added some examples (#50)

* added sparse level 1 examples
* added examples for sparse level 2 and 3
* clang-format
* added sparse extra examples
* bump version

hipclang related fixes (#51)

* hipclang related fixes
* bump version

sanity check for matrix download (#52)
added fallback for unit test matrix downloads (#53)

examples fix (#54)

* header fix for examples
* bump version

got bsrmm working for block dim less than 8
clang formatting
fixing bugs and getting benchmark to work
optimizing and working on kernels for block dimension greater than 8
kernels and code for block dimension greater than 8 and B matrix transposed
expanded loop unrolling up to block dimension 16
clang formatting
Remove gpg check for CI package CentOS install (#57)

updated internal function names (#61)

* renamed internal csrtr to trm
* clang-format

added missing header (#62)
fixes to documentation
remove compile time evaluation of direction to help reduce the number of kernels
clang formatting
small performance improvements to transpose kernel
clang formatting
increase transpose performance
clang formatting

re-ordering row pointer and column arrays for csr2csr_compress (#59)

* re-ordering row pointer and column arrays for csr2csr_compress
* fixing broken tests
* fixing incorrect order in log_trace
* moving deletion of temporary array to ensure it is always called

Co-authored-by: jsandham <[email protected]>

bump version
Single thread compile in install script (#63)

pyyaml package name fix for centos8 (#60)

* pyyaml package name fix for centos8
* this should also account for rhel8
* bump version

Update README.md

pivot test fix (#65)

* adding device sync in spin loop tests to not overwrite pivots before checking them
* bump version

Removing rock-dkms (#66)
Revert "Single thread compile in install script (#63)" (#69)

Fortran interface (#55)

* fortran interface draft with examples added
* example fix to properly work with return values
* force cmake to add .f90 module to package
* added some more missing level1, level3 and conversion routines
* added few more missing functions to wrapper
* csric0 and csrilu0 fortran examples
* csrgemm_buffer_size binding name fixed
* fortran example fix, stop allows only constant expressions
* fix for string passing
* added enums to fortran; example for aux functions; fixes to pointer arguments
* more examples
* updated fortran example output of csrilu0 and csric0
* updated install.sh script and dockerfiles to install gfortran dependencies
* fix for device pointer mode
* few changes to make it consistent with hipfort
* bump version

ddoti fortran fix (#71)
bsrmv smem sync? (#70)
bump version
mtx pattern fix (#73)
Added centos 8 dependency fixes (#74)
bump version

bsrsv (#72)

* general working version of bsrsv for lower and upper non-transposed matrices
* fixing bsr_to_bsc order
* added functionality for transposed matrix
* enabling complex numbers
* optimized bsrsv for BSR dimensions from 2x2 to 32x32
* gfx908
* fortran functions and example
* disabling some unit diagonal tests with nos1 and nos2
* bump version

fortran module fixes (#75)

centos 6 (#76)

* centos6 support
* bump version

adding fortran example code
fixing fortran compile error
adding bsrmm to fortran_module.f90
fixing fortran example array order
fix fortran compile error
adding cpp example code for bsrmm
clang formatting
working on optimizing kernels
optimizing bsrmm
reverting back to original kernels
optimizing bsrmm
making test2 kernel active for block dim 8
optimizing bsrmm
significant performance improvement for block dimensions 5 to 32
further performance improvements to transpose and non-transpose case
reduce compile times and replaced general kernel
optimizing for n <= 16

Correction to the cmake RUNPATH parameter (#79)

Co-authored-by: Pruthvi Madugundu <[email protected]>

bump version

cmake update (#80)

* cmake update
* disabling OpenMP until this is fixed within hipclang

Csr2bsr optimization (#78)

* optimized csr2bsr_nnz
* rebase csr2bsr_optimization branch to squash commits

Working on optimizing csr2bsr device code
changed blocksize to 16 as this runs twice as fast
clang formatting
removing comments
performance optimizations
clang formatting
improve performance
clang formatting
csr2bsr optimization

Co-authored-by: jsandham <[email protected]>

* reducing number of tests
* removing bank conflicts
* removing duplicate code from rocsparse-functions header
* fixing line in rocsparse-functions header changed by bad merge
* fix formatting from merge
* fix formatting errors from merge

Co-authored-by: jsandham <[email protected]>