Develop stream 2024-07-01 #512

Merged
merged 69 commits into from
Aug 8, 2024
7ec5fda
Removed accidentally included #include "hip/amd_detail/host_defines.h"
mfep May 21, 2024
32922c8
chore(gitignore): ignore python venvs
Apr 5, 2024
1f6c4d1
docs(api reference): rocm-docs-core headers and stylesheets in doxyfile
Apr 5, 2024
1e22c52
Merge branch 'fix_doxygen_style' into 'develop_stream'
Apr 8, 2024
789ce4f
improve accuracy of poisson histogram test
nolmoonen Apr 12, 2024
e2c2834
fix format and copyright dates
parbenc Apr 15, 2024
006f964
Merge branch '240-poisson-distribution-test-failure' into 'develop_st…
nolmoonen Apr 16, 2024
5b659f9
feat(test): Added CMake option RUN_EXTRA_TESTS
mfep Apr 22, 2024
ca16c1c
Removed deprecated internal headers, src/rng/distribution/distributio…
NB4444 Apr 24, 2024
c28da46
Using .lint:clang-format
mfep Apr 19, 2024
17aaed3
feat(test): Added large size tests for host generators
mfep Apr 26, 2024
a09dc7e
Merge branch '341-remove-deprecated-internal-headers' into 'develop_s…
NB4444 Apr 25, 2024
99bc62e
fix(generator): Fixed the usage of min in host generators
mfep Apr 22, 2024
7ffc14e
docs(dyn_ordering): Use GPU_TARGETS instead of AMDGPU_TARGETS
May 3, 2024
98cfcae
Merge branch '346-host-large-sizes' into 'develop_stream'
mfep Apr 29, 2024
86e62b5
Use alias method in rocrand_discrete for MTGP32, LFSR113 and ThreeFry
ex-rzr May 10, 2024
31d2082
Merge branch 'docs-gpu-targets' into 'develop_stream'
May 9, 2024
4ca9aac
Merge branch '285-discrete_cdf-is-incorrectly-used-by-some-prngs' int…
ex-rzr May 13, 2024
2eb5e07
refactor mt19937 to support host version as well
parbenc Feb 16, 2024
8bb422f
update test
parbenc Feb 19, 2024
b7698f8
move jump_ahead_thread_count back to template param
parbenc Feb 19, 2024
10b79e8
implement memcpy in host and device systems
parbenc Feb 22, 2024
898dbc6
fix compilation issues and segfault
parbenc Mar 1, 2024
3f8d454
Removed 'apt-get install flang'
mfep May 16, 2024
3129825
fix jump_ahead on host
parbenc Mar 8, 2024
a28f772
create host implementation of some functions
parbenc Mar 21, 2024
3a0bca6
fix remaining inconsistencies in host mt19937 generator
parbenc Mar 28, 2024
7201795
fix format end compile errors
parbenc Apr 4, 2024
f6c4ba9
fix missing gen_next_n calls
parbenc Apr 8, 2024
d922943
fix format issues and missing __host__s
parbenc Apr 11, 2024
961123f
fix messed up host/device allocations
parbenc Apr 15, 2024
f718fe9
fix merge conflicts
parbenc Apr 18, 2024
1188c63
fix format issues and compile error
parbenc Apr 19, 2024
1abe891
fix format issues
parbenc Apr 25, 2024
2f771bb
fix format issue
parbenc Apr 29, 2024
bb0fe50
disable most mt19937 host tests for normal run (enabled for slow test…
parbenc May 3, 2024
5b786cb
fix review comments
parbenc May 9, 2024
fe1650a
remove synchronization from host_system::launch function
parbenc May 16, 2024
e2518b9
Implement asynchronous initialization of poisson distribution
mfep May 22, 2024
ca10206
use ROCRAND_HIP_FATAL_ASSERT for hipDeviceSynchronize call
mfep May 16, 2024
472fd0f
generate_poisson test with many lambdas
mfep May 22, 2024
a62e2c5
Merge branch '309_mt19937_host' into 'develop_stream'
parbenc May 17, 2024
c27eece
Testing poisson with hipGraphs
mfep May 22, 2024
0130141
Test [blocking] host_generator with non-blocking stream
mfep May 22, 2024
8c83427
Fixing poisson distribution selection in benchmark_tuning
mfep May 22, 2024
ecbd2a7
Updated changelog
mfep May 23, 2024
e2c573f
fix(docs): Added links to unaccessible doc pages
mfep Jun 6, 2024
2e00eb1
Merge branch 'poisson-graph-2' into 'develop_stream'
mfep May 31, 2024
511291a
fix(docs): Removed duplicated CUDA Compatibility section from Program…
mfep Jun 6, 2024
4b802a7
Added hipGraphs doc and sample
mfep Jun 6, 2024
a4388af
Fix performance regression of Poisson distribution
ex-rzr Jun 17, 2024
436d26a
Merge branch '360-doc-hip-graph' into 'develop_stream'
ex-rzr Jun 18, 2024
df7893a
clang-format: Break after attributes
ex-rzr Jun 20, 2024
d9a1b03
Merge branch '366-fix-performance-regression-of-poisson' into 'develo…
ex-rzr Jun 19, 2024
c3b4cf9
Add missing __forceinline__ to improve performance on ROCm 6.2
ex-rzr Jun 21, 2024
07c873c
Remove meaningless code in xorwow introduced during rebase/merge
ex-rzr Jun 24, 2024
1f33881
Merge branch '363-improve-performance-on-rocm-6-2' into 'develop_stream'
ex-rzr Jun 26, 2024
67f4fdb
style: update formatting
Naraenda Jul 1, 2024
846b188
ci(.gitlab-ci.yml): replace 'ROCM_PATH' variable with 'env:HIP_PATH' …
Naraenda Jul 2, 2024
578cfa7
ci(.gitlab-ci.yml): do not force download deps on windows
Naraenda Jul 2, 2024
73dea65
ci(.gitlab-ci.yml): pass amdclang filepath properly to windows packag…
Naraenda Jul 2, 2024
abfb6b4
Merge branch 'ci-remove-rocm-path-variable' into 'develop_stream'
Naraenda Jul 3, 2024
e163301
Remove unused FindTestU01.cmake
ex-rzr Jul 12, 2024
f209572
Added checks for nullptr data with tests
NB4444 Jul 12, 2024
7f6766c
Merge branch 'remove-unused-findtestu01-cmake' into 'develop_stream'
ex-rzr Jul 16, 2024
d6a97de
Merge branch '369-add-check-for-nullptr-data-when-calling-host-genera…
NB4444 Jul 22, 2024
49add41
Fix bit rotation for threefry2x64 and threefry4x64
ex-rzr Jul 22, 2024
d4f972c
Merge branch '371-fix-threefry2x64-and-threefry4x64' into 'develop_st…
ex-rzr Jul 23, 2024
ec409a3
chore: bump version
Naraenda Aug 8, 2024
16 changes: 14 additions & 2 deletions .clang-format
@@ -45,7 +45,6 @@ AllowShortLoopsOnASingleLine: false
AlwaysBreakAfterReturnType: None
AlwaysBreakBeforeMultilineStrings: false
AlwaysBreakTemplateDeclarations: Yes
AttributeMacros: ['QUALIFIERS', 'FQUALIFIERS']
BinPackArguments: false
BinPackParameters: false
BitFieldColonSpacing: Both
@@ -88,7 +87,7 @@ EmptyLineAfterAccessModifier: Never
EmptyLineBeforeAccessModifier: Always
ExperimentalAutoDetectBinPacking: false
FixNamespaceComments: true
ForEachMacros: [ foreach, Q_FOREACH, BOOST_FOREACH ]
ForEachMacros: []
IfMacros: []
IncludeBlocks: Preserve
IndentAccessModifiers: false
@@ -139,4 +138,17 @@ SpacesInConditionalStatement: false
SpacesInContainerLiterals: true
SpacesInParentheses: false
SpacesInSquareBrackets: false

AttributeMacros: ['__host__', '__device__', '__global__', '__forceinline__', '__shared__', '__launch_bounds__']
# Trick clang into thinking that our C-style attributes are C++-style attributes
Macros:
- __host__=[[host]]
- __device__=[[device]]
- __global__=[[global]]
- __forceinline__=[[forceinline]]
- __shared__=[[shared]]
- __launch_bounds__(x)=[[launch_bounds(x)]]
- __attribute__(x)=[[attribute(x)]]
BreakAfterAttributes: Always

---
1 change: 1 addition & 0 deletions .gitignore
@@ -88,3 +88,4 @@ CMakeLists.txt.user

# Python
__pycache__
.venv
31 changes: 6 additions & 25 deletions .gitlab-ci.yml
@@ -22,7 +22,6 @@

variables:
GIT_SUBMODULE_STRATEGY: normal # Fetch submodules before job start (non-recursively)
ROCM_PATH: /opt/rocm

include:
- project: amd/ci-templates
@@ -31,6 +30,7 @@ include:
- /defaults.yaml
- /deps-cmake.yaml
- /deps-docs.yaml
- /deps-format.yaml
- /deps-rocm.yaml
- /deps-nvcc.yaml
- /deps-windows.yaml
@@ -46,20 +46,7 @@ stages:

clang-format:
extends:
- .deps:rocm
stage: lint
needs: []
tags:
- build
variables:
CLANG_FORMAT: "/opt/rocm/llvm/bin/clang-format"
GIT_CLANG_FORMAT: "/opt/rocm/llvm/bin/git-clang-format"
rules:
- if: '$CI_PIPELINE_SOURCE == "merge_request_event"'
script:
- cd $CI_PROJECT_DIR
- git config --global --add safe.directory $CI_PROJECT_DIR
- scripts/code-format/check-format.sh $CI_MERGE_REQUEST_DIFF_BASE_SHA --binary "$CLANG_FORMAT"
- .lint:clang-format

copyright-date:
extends:
@@ -144,12 +131,10 @@ copyright-date:
- .rules:build
needs: []
script:
- $SUDO_CMD apt-get install -y flang
- cmake
-S $CI_PROJECT_DIR
-B $CI_PROJECT_DIR/build
-G Ninja
-D CMAKE_FORTRAN_COMPILER=/usr/bin/flang
-D CMAKE_CXX_COMPILER=${COMPILER}
-D BUILD_TEST=ON
-D BUILD_BENCHMARK=ON
@@ -173,12 +158,10 @@ copyright-date:
- .rules:build
needs: []
script:
- $SUDO_CMD apt-get install -y flang
- cmake
-S $CI_PROJECT_DIR
-B $CI_PROJECT_DIR/build
-G Ninja
-D CMAKE_FORTRAN_COMPILER=/usr/bin/flang
-D CMAKE_CXX_COMPILER=${COMPILER}
-D CMAKE_CUDA_HOST_COMPILER=${COMPILER}
-D BUILD_TEST=ON
@@ -353,7 +336,6 @@ benchmark:benchmark-tuning-generate-results:
- .rules:test
stage: test
script:
- $SUDO_CMD apt-get install -y flang
- cd $CI_PROJECT_DIR/build
# Parallel execution (with other AMDGPU processes) can oversubscribe the SDMA queue.
# This causes the hipMemcpy to fail, which is not reported as an error by HIP.
@@ -598,10 +580,9 @@ test:nvcc-parity:
-D BUILD_SHARED_LIBS="$BUILD_SHARED_LIBS"
-D BUILD_TEST=ON
-D CMAKE_BUILD_TYPE=Release
-D CMAKE_CXX_COMPILER:FILEPATH="$ROCM_PATH/bin/clang++.exe"
-D CMAKE_CXX_COMPILER:FILEPATH="${env:HIP_PATH}/bin/clang++.exe"
-D CMAKE_INSTALL_PREFIX:PATH="$CI_PROJECT_DIR/build/install"
-D CMAKE_PREFIX_PATH:PATH="$ROCM_PATH/lib/cmake"
-D DEPENDENCIES_FORCE_DOWNLOAD=ON
-D CMAKE_PREFIX_PATH:PATH="${env:HIP_PATH}/lib/cmake"
-D DISABLE_WERROR=OFF *>&1
# Building
- cmake --build "$CI_PROJECT_DIR/build" *>&1
@@ -629,8 +610,8 @@ test:windows:
-B "$CI_PROJECT_DIR/build_install_test"
-G Ninja
-D CMAKE_BUILD_TYPE=Release
-D CMAKE_CXX_COMPILER="$ROCM_PATH/bin/clang++.exe"
-D CMAKE_PREFIX_PATH="$ROCM_PATH/lib/cmake;$CI_PROJECT_DIR/build/install" *>&1
-D CMAKE_CXX_COMPILER:FILEPATH="${env:HIP_PATH}/bin/clang++.exe"
-D CMAKE_PREFIX_PATH:FILEPATH="${env:HIP_PATH}/lib/cmake;$CI_PROJECT_DIR/build/install" *>&1
# Build package test
- cmake --build "$CI_PROJECT_DIR/build_install_test"
# Copy rocRAND.dll to the package test build directory
17 changes: 15 additions & 2 deletions CHANGELOG.md
@@ -3,11 +3,24 @@
Documentation for rocRAND is available at
[https://rocm.docs.amd.com/projects/rocRAND/en/latest/](https://rocm.docs.amd.com/projects/rocRAND/en/latest/)

## (Unreleased) rocRAND-3.2.0 for ROCm 6.3.0

### Additions

* Added host generator for MT19937
* Support for `rocrand_generate_poisson` in hipGraphs

### Changes

* `rocrand_discrete` for MTGP32, LFSR113 and ThreeFry generators now uses the alias method, which is faster than binary search in CDF.
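
For readers unfamiliar with the alias method mentioned in the entry above, here is a minimal host-side sketch of the general technique (Vose's construction). It is not the rocRAND implementation; `alias_table`, `build_alias_table` and `sample_alias` are hypothetical names chosen for illustration, and the sketch assumes a non-empty weight list with a positive sum.

```cpp
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical illustration types; not part of the rocRAND API.
struct alias_table
{
    std::vector<double>      probability; // chance of keeping bucket i
    std::vector<std::size_t> alias;       // outcome returned otherwise
};

// Builds the table in O(n). Assumes weights is non-empty with a positive sum.
inline alias_table build_alias_table(const std::vector<double>& weights)
{
    const std::size_t n     = weights.size();
    double            total = 0.0;
    for(double w : weights)
        total += w;

    alias_table              table{std::vector<double>(n, 1.0), std::vector<std::size_t>(n)};
    std::vector<double>      scaled(n);
    std::vector<std::size_t> small, large;
    for(std::size_t i = 0; i < n; ++i)
    {
        table.alias[i] = i;
        scaled[i]      = weights[i] * n / total; // average bucket mass is 1
        (scaled[i] < 1.0 ? small : large).push_back(i);
    }

    // Pair each under-full bucket with an over-full one.
    while(!small.empty() && !large.empty())
    {
        const std::size_t s = small.back();
        small.pop_back();
        const std::size_t l = large.back();
        large.pop_back();

        table.probability[s] = scaled[s];
        table.alias[s]       = l;
        scaled[l]            = (scaled[l] + scaled[s]) - 1.0;
        (scaled[l] < 1.0 ? small : large).push_back(l);
    }
    // Buckets left over keep probability 1.0 and alias themselves.
    return table;
}

// Draws one outcome in O(1): pick a bucket, then keep it or take its alias.
template<class Engine>
std::size_t sample_alias(const alias_table& table, Engine& engine)
{
    std::uniform_int_distribution<std::size_t> pick(0, table.probability.size() - 1);
    std::uniform_real_distribution<double>     coin(0.0, 1.0);
    const std::size_t                          bucket = pick(engine);
    return coin(engine) < table.probability[bucket] ? bucket : table.alias[bucket];
}
```

Each draw costs one table lookup and one comparison regardless of the number of outcomes, whereas a binary search in the CDF costs a logarithmic number of comparisons, which is the general reason the alias method tends to be faster.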

## (Unreleased) rocRAND-3.1.1 for ROCm 6.2.0

## Fixes
* Fixed " unknown extension ?>" issue in scripts/config-tuning/select_best_config.py
when using python version thats older than 3.11

* Fixed " unknown extension ?>" issue in scripts/config-tuning/select_best_config.py
when using python version thats older than 3.11
* Fixed low random sequence quality of `ROCRAND_RNG_PSEUDO_THREEFRY2_64_20` and `ROCRAND_RNG_PSEUDO_THREEFRY4_64_20`.
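
The Threefry fix above concerns bit rotation in the 64-bit variants. The diff for that change is not shown here, so the following is only a generic sketch of a correct 64-bit rotate-left and of the add/rotate/xor mixing step that Threefry rounds are built from. `rotl64` and `threefry_mix` are hypothetical names, and the rotation amount is a caller-supplied parameter rather than one of the real Threefry constants.

```cpp
#include <cstdint>

// Rotate a 64-bit word left by `bits` positions.
// Masking keeps both shift counts in [0, 63], so bits == 0 is well defined.
constexpr std::uint64_t rotl64(std::uint64_t x, unsigned int bits)
{
    return (x << (bits & 63u)) | (x >> ((64u - bits) & 63u));
}

// One add/rotate/xor mixing step on a pair of 64-bit words.
// `rotation` is an illustrative parameter, not an actual Threefry constant.
constexpr void threefry_mix(std::uint64_t& x0, std::uint64_t& x1, unsigned int rotation)
{
    x0 += x1;
    x1 = rotl64(x1, rotation);
    x1 ^= x0;
}
```

A rotation that uses the wrong word width silently discards bits instead of wrapping them around, which is one way sequence quality can degrade in a 64-bit generator.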

## (Unreleased) rocRAND-3.1.0 for ROCm 6.2.0

1 change: 1 addition & 0 deletions CMakeLists.txt
@@ -33,6 +33,7 @@ cmake_dependent_option(BUILD_BENCHMARK_TUNING
option(BUILD_ADDRESS_SANITIZER "Build with address sanitizer enabled" OFF)
option(CODE_COVERAGE "Build with code coverage flags (clang only)" OFF)
option(DEPENDENCIES_FORCE_DOWNLOAD "Don't search the system for dependencies, always download them" OFF)
cmake_dependent_option(RUN_SLOW_TESTS "Run extra tests with CTest. These cover niche functionality and take long time" OFF "BUILD_TEST" OFF)

# Install prefix
set(CMAKE_INSTALL_PREFIX "/opt/rocm" CACHE PATH "Install path prefix, prepended onto install directories")
4 changes: 1 addition & 3 deletions README.md
@@ -235,6 +235,4 @@ Bugs and feature requests can be reported through the
Contributions of any kind are most welcome! You can find more information at
[CONTRIBUTING](./CONTRIBUTING.md).

Licensing information is located at [LICENSE](./LICENSE.txt). Note that [statistical tests](./test/crush) link
to the TestU01 library distributed under GNU General Public License (GPL) version 3. Therefore, the GPL
version 3 license applies to that part of the project.
Licensing information is located at [LICENSE](./LICENSE.txt).
8 changes: 2 additions & 6 deletions benchmark/benchmark_rocrand_host_api.cpp
@@ -179,6 +179,8 @@ int main(int argc, char* argv[])
std::vector<rng_type_t> benchmarked_engine_types{ROCRAND_RNG_PSEUDO_LFSR113,
ROCRAND_RNG_PSEUDO_MRG31K3P,
ROCRAND_RNG_PSEUDO_MRG32K3A,
ROCRAND_RNG_PSEUDO_MTGP32,
ROCRAND_RNG_PSEUDO_MT19937,
ROCRAND_RNG_PSEUDO_PHILOX4_32_10,
ROCRAND_RNG_PSEUDO_THREEFRY2_32_20,
ROCRAND_RNG_PSEUDO_THREEFRY2_64_20,
@@ -190,12 +192,6 @@ int main(int argc, char* argv[])
ROCRAND_RNG_QUASI_SOBOL64,
ROCRAND_RNG_QUASI_SCRAMBLED_SOBOL64};

if(!benchmark_host)
{
benchmarked_engine_types.push_back(ROCRAND_RNG_PSEUDO_MTGP32);
benchmarked_engine_types.push_back(ROCRAND_RNG_PSEUDO_MT19937);
}

const std::map<rocrand_ordering, std::string> ordering_name_map{
{ROCRAND_ORDERING_PSEUDO_DEFAULT, "default"},
{ ROCRAND_ORDERING_PSEUDO_LEGACY, "legacy"},
9 changes: 5 additions & 4 deletions benchmark/tuning/benchmark_tuning.hpp
@@ -88,7 +88,10 @@ void run_benchmark(benchmark::State& state, const benchmark_config& config)
generator.set_stream(stream);

const auto generate_func = [&]
{ return generator.generate(data, size, default_distribution<Distribution>{}(config)); };
{
default_distribution<Distribution> default_distribution_provider;
return generator.generate(data, size, default_distribution_provider(config));
};

// Warm-up
ROCRAND_CHECK(generate_func());
@@ -147,9 +150,7 @@ class generator_benchmark_factory
if constexpr(std::is_same_v<T, unsigned int>)
{
// The poisson distribution is only supported for unsigned int.
using poisson_distribution_t = rocrand_impl::host::poisson_distribution<
rocrand_impl::host::DISCRETE_METHOD_ALIAS>;
add_benchmarks_impl<T, poisson_distribution_t>();
add_benchmarks_impl<T, select_poisson_distribution_t<GeneratorTemplate>>();
}
}
else if constexpr(std::is_floating_point_v<T> || std::is_same_v<T, half>)
4 changes: 3 additions & 1 deletion benchmark/tuning/benchmarked_generators.hpp
@@ -107,7 +107,9 @@ using mtgp32_generator_template
ConfigProvider>;

template<class ConfigProvider>
using mt19937_generator_template = rocrand_impl::host::mt19937_generator_template<ConfigProvider>;
using mt19937_generator_template
= rocrand_impl::host::mt19937_generator_template<rocrand_impl::system::device_system,
ConfigProvider>;

template<class ConfigProvider>
using philox4x32_10_generator_template
51 changes: 49 additions & 2 deletions benchmark/tuning/distribution_traits.hpp
@@ -146,6 +146,15 @@ struct distribution_name<
}
};

template<>
struct distribution_name<rocrand_impl::host::mrg_poisson_distribution>
{
std::string operator()()
{
return "poisson_unsigned_int";
}
};

template<class Distribution>
struct default_distribution
{
@@ -191,11 +200,49 @@ struct default_distribution<
{
auto operator()(const benchmark_config& config)
{
return rocrand_impl::host::poisson_distribution<rocrand_impl::host::DISCRETE_METHOD_ALIAS>(
config.lambda);
return std::get<
rocrand_impl::host::poisson_distribution<rocrand_impl::host::DISCRETE_METHOD_ALIAS>>(
m_poisson_manager.get_distribution(config.lambda));
}

private:
rocrand_impl::host::poisson_distribution_manager<rocrand_impl::host::DISCRETE_METHOD_ALIAS>
m_poisson_manager;
};

template<>
struct default_distribution<rocrand_impl::host::mrg_poisson_distribution>
{
auto operator()(const benchmark_config& config)
{
auto poisson_distribution = std::get<
rocrand_impl::host::poisson_distribution<rocrand_impl::host::DISCRETE_METHOD_ALIAS>>(
m_poisson_manager.get_distribution(config.lambda));
return rocrand_impl::host::mrg_poisson_distribution(poisson_distribution);
}

private:
rocrand_impl::host::poisson_distribution_manager<rocrand_impl::host::DISCRETE_METHOD_ALIAS>
m_poisson_manager;
};

template<template<class> class GeneratorTemplate>
struct select_poisson_distribution
{
using dummy_generator_t = GeneratorTemplate<rocrand_impl::host::static_config_provider<0, 0>>;
static constexpr inline rocrand_rng_type rng_type = dummy_generator_t::type();
static constexpr inline bool is_mrg
= rng_type == ROCRAND_RNG_PSEUDO_MRG31K3P || rng_type == ROCRAND_RNG_PSEUDO_MRG32K3A;

using type = std::conditional_t<
is_mrg,
rocrand_impl::host::mrg_poisson_distribution,
rocrand_impl::host::poisson_distribution<rocrand_impl::host::DISCRETE_METHOD_ALIAS>>;
};

template<template<class> class GeneratorTemplate>
using select_poisson_distribution_t = typename select_poisson_distribution<GeneratorTemplate>::type;

} // namespace benchmark_tuning

#endif // ROCRAND_BENCHMARK_TUNING_DISTRIBUTION_TRAITS_HPP_
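
The `select_poisson_distribution` trait in this file picks a distribution type at compile time from the generator's RNG type. The same pattern can be sketched in isolation as follows. All names here (`rng_kind`, `generic_poisson`, `mrg_poisson`, `select_poisson`) are simplified stand-ins invented for illustration and do not exist in rocRAND; the sketch assumes C++17.

```cpp
#include <type_traits>

// Stand-in for rocrand_rng_type; hypothetical and reduced to three values.
enum class rng_kind
{
    mrg31k3p,
    mrg32k3a,
    philox
};

// Stand-ins for the two Poisson distribution types.
struct generic_poisson
{};
struct mrg_poisson
{};

// Selects the MRG-specific distribution for MRG generators and the
// generic one for everything else, entirely at compile time.
template<rng_kind Kind>
struct select_poisson
{
    static constexpr bool is_mrg = Kind == rng_kind::mrg31k3p || Kind == rng_kind::mrg32k3a;

    using type = std::conditional_t<is_mrg, mrg_poisson, generic_poisson>;
};

template<rng_kind Kind>
using select_poisson_t = typename select_poisson<Kind>::type;

static_assert(std::is_same_v<select_poisson_t<rng_kind::mrg32k3a>, mrg_poisson>);
static_assert(std::is_same_v<select_poisson_t<rng_kind::philox>, generic_poisson>);
```

Because the choice is a type alias, callers such as the benchmark factory can name the result directly as a template argument without any runtime branching.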
52 changes: 0 additions & 52 deletions cmake/Modules/FindTestU01.cmake

This file was deleted.

@@ -1,3 +1,9 @@
.. meta::
:description: rocRAND documentation and API reference library
:keywords: rocRAND, ROCm, API, documentation, cuRAND

.. _data-type-support:

Data type support
******************************************

4 changes: 2 additions & 2 deletions docs/conceptual/dynamic_ordering_configuration.rst
@@ -22,14 +22,14 @@ Building the tuning benchmarks

The principle of the tuning is very simple: the random number generation kernel is run for a list of kernel block size / kernel grid size combinations, and the fastest combination is selected as the dynamic ordering configuration for the particular device. rocRAND provides an executable target that runs the benchmarks with all these combinations: `benchmark_rocrand_tuning`. This target is disabled by default, and can be enabled and built by the following snippet.

Use the `AMDGPU_TARGET` variable to specify the comma-separated list of GPU architectures to build the benchmarks for. To acquire the architecture of the GPU(s) installed, run `rocminfo`, and look for `gfx` in the "ISA Info" section. ::
Use the `GPU_TARGETS` variable to specify the comma-separated list of GPU architectures to build the benchmarks for. To acquire the architecture of the GPU(s) installed, run `rocminfo`, and look for `gfx` in the "ISA Info" section. ::

$ cd rocRAND
$ cmake -S . -B ./build
-D BUILD_BENCHMARK=ON
-D BUILD_BENCHMARK_TUNING=ON
-D CMAKE_CXX_COMPILER=/opt/rocm/bin/amdclang++
-D AMDGPU_TARGETS=gfx908
-D GPU_TARGETS=gfx908
$ cmake --build build --target benchmark_rocrand_tuning

Additionally, the following CMake cache variables control the generation of the benchmarked matrix: