diff --git a/BUILD.md b/BUILD.md
index c4d8b1b356..c94bb24204 100644
--- a/BUILD.md
+++ b/BUILD.md
@@ -5,9 +5,12 @@
- [Build Dependencies](#required_depenencies)
- [Header-only C++](#install_header_only_cpp)
- [C++ Shared Libraries](#shared_cpp_libs)
+ - [Improving Rebuild Times](#ccache)
- [Googletests](#gtests)
+ - [Googlebench](#gbench)
- [C++ Using Cmake](#cpp_using_cmake)
- [Python](#python)
+ - [Documentation](#docs)
- [Using RAFT in downstream projects](#use_raft)
- [Cmake Header-only Integration](#cxx_integration)
- [Using Shared Libraries in Cmake](#use_shared_libs)
@@ -27,15 +30,14 @@ In addition to the libraries included with cudatoolkit 11.0+, there are some oth
#### Required
- [RMM](https://github.com/rapidsai/rmm) corresponding to RAFT version.
-
+
#### Optional
-- [mdspan](https://github.com/rapidsai/mdspan) - On by default but can be disabled.
- [Thrust](https://github.com/NVIDIA/thrust) v1.15 / [CUB](https://github.com/NVIDIA/cub) - On by default but can be disabled.
- [cuCollections](https://github.com/NVIDIA/cuCollections) - Used in `raft::sparse::distance` API.
- [Libcu++](https://github.com/NVIDIA/libcudacxx) v1.7.0
- [FAISS](https://github.com/facebookresearch/faiss) v1.7.0 - Used in `raft::spatial::knn` API and needed to build tests.
-- [NCCL](https://github.com/NVIDIA/nccl) - Used in `raft::comms` API and needed to build `Pyraft`
-- [UCX](https://github.com/openucx/ucx) - Used in `raft::comms` API and needed to build `Pyraft`
+- [NCCL](https://github.com/NVIDIA/nccl) - Used in `raft::comms` API and needed to build `raft-dask`
+- [UCX](https://github.com/openucx/ucx) - Used in `raft::comms` API and needed to build `raft-dask`
- [Googletest](https://github.com/google/googletest) - Needed to build tests
- [Googlebench](https://github.com/google/benchmark) - Needed to build benchmarks
- [Doxygen](https://github.com/doxygen/doxygen) - Needed to build docs
@@ -53,11 +55,6 @@ The following example will download the needed dependencies and install the RAFT
./build.sh libraft --install
```
-The `--minimal-deps` flag can be used to install the headers with minimal dependencies:
-```bash
-./build.sh libraft --install --minimal-deps
-```
-
### C++ Shared Libraries (optional)
For larger projects which make heavy use of the pairwise distances or nearest neighbors APIs, shared libraries can be built to speed up compile times. These shared libraries can also significantly improve re-compile times both while developing RAFT and developing against the APIs. Build all of the available shared libraries by passing `--compile-libs` flag to `build.sh`:
@@ -72,6 +69,14 @@ Individual shared libraries have their own flags and multiple can be used (thoug
Add the `--install` flag to the above example to also install the shared libraries into `$INSTALL_PREFIX/lib`.
+### `ccache` and `sccache`
+
+`ccache` and `sccache` can cache parts of the build to speed up frequent rebuilds, such as when working on a new feature. You can use either tool with `build.sh`:
+
+```bash
+./build.sh libraft --cache-tool=ccache
+```
+
### Tests
Compile the tests using the `tests` target in `build.sh`.
@@ -86,23 +91,30 @@ Test compile times can be improved significantly by using the optional shared li
./build.sh libraft tests --compile-libs
```
-To run C++ tests:
+The tests are broken apart by algorithm category, so you will find several binaries in `cpp/build/` named `*_TEST`.
+
+For example, to run the distance tests:
+```bash
+./cpp/build/DISTANCE_TEST
+```
+
+It can take some time to compile all of the tests. You can build individual tests by providing a semicolon-separated list to the `--limit-tests` option in `build.sh`:
```bash
-./cpp/build/test_raft
+./build.sh libraft tests --limit-tests="SPATIAL_TEST;DISTANCE_TEST;MATRIX_TEST"
```
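Note that bash treats an unquoted `;` as a command separator, so it is safest to quote the semicolon-separated list (e.g. `--limit-tests="SPATIAL_TEST;DISTANCE_TEST"`). A quick illustration of the quoting behavior in plain bash (this is not `build.sh` itself, just the shell mechanics):

```bash
# Unquoted, the shell would split the command at each ';'. Quoting keeps the
# list as a single argument, which the receiving script can split itself:
LIMIT_TESTS="SPATIAL_TEST;DISTANCE_TEST;MATRIX_TEST"
IFS=';' read -ra TARGETS <<< "$LIMIT_TESTS"
echo "${#TARGETS[@]}"  # prints 3
```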
-### Benchmarks
+### Benchmarks
-Compile the benchmarks using the `bench` target in `build.sh`:
+The benchmarks are broken apart by algorithm category, so you will find several binaries in `cpp/build/` named `*_BENCH`.
```bash
./build.sh libraft bench
```
-To run the benchmarks:
+It can take some time to compile all of the benchmarks. You can build individual benchmarks by providing a semicolon-separated list to the `--limit-bench` option in `build.sh`:
```bash
-./cpp/build/bench_raft
+./build.sh libraft bench --limit-bench="SPATIAL_BENCH;DISTANCE_BENCH;LINALG_BENCH"
```
### C++ Using Cmake
@@ -128,10 +140,7 @@ RAFT's cmake has the following configurable flags available:
| RAFT_COMPILE_DIST_LIBRARY | ON, OFF | OFF | Compiles the `libraft-distance` shared library |
| RAFT_ENABLE_NN_DEPENDENCIES | ON, OFF | OFF | Searches for dependencies of nearest neighbors API, such as FAISS, and compiles them if not found. Needed for `raft::spatial::knn` |
| RAFT_ENABLE_thrust_DEPENDENCY | ON, OFF | ON | Enables the Thrust dependency. This can be disabled when using many simple utilities or to override with a different Thrust version. |
-| RAFT_ENABLE_mdspan_DEPENDENCY | ON, OFF | ON | Enables the std::mdspan dependency. This can be disabled when using many simple utilities. |
-| RAFT_ENABLE_nccl_DEPENDENCY | ON, OFF | OFF | Enables NCCL dependency used by `raft::comms` and needed to build `pyraft` |
-| RAFT_ENABLE_ucx_DEPENDENCY | ON, OFF | OFF | Enables UCX dependency used by `raft::comms` and needed to build `pyraft` |
-| RAFT_USE_FAISS_STATIC | ON, OFF | OFF | Statically link FAISS into `libraft-nn` |
+| RAFT_USE_FAISS_STATIC | ON, OFF | OFF | Statically link FAISS into `libraft-nn` |
| RAFT_STATIC_LINK_LIBRARIES | ON, OFF | ON | Build static link libraries instead of shared libraries |
| DETECT_CONDA_ENV | ON, OFF | ON | Enable detection of conda environment for dependencies |
| NVTX | ON, OFF | OFF | Enable NVTX Markers |
@@ -143,22 +152,26 @@ Currently, shared libraries are provided for the `libraft-nn` and `libraft-dista
### Python
-Conda environment scripts are provided for installing the necessary dependencies for building and using the Python APIs. It is preferred to use `mamba`, as it provides significant speedup over `conda`. The following example will install create and install dependencies for a CUDA 11.5 conda environment:
+Conda environment scripts are provided for installing the necessary dependencies for building and using the Python APIs. It is preferred to use `mamba`, as it provides significant speedup over `conda`. In addition, you will need to install `nvcc` manually, as it is not installed as part of the conda environment. The following example will create a CUDA 11.5 conda environment and install its dependencies:
```bash
mamba env create --name raft_env_name -f conda/environments/raft_dev_cuda11.5.yml
mamba activate raft_env_name
```
-The Python APIs can be built using the `build.sh` script:
+The Python APIs can be built and installed using the `build.sh` script:
```bash
-./build.sh pyraft pylibraft
+# to build pylibraft
+./build.sh libraft pylibraft --install --compile-libs
+# to build raft-dask
+./build.sh libraft raft-dask --install --compile-libs
```
`setup.py` can also be used to build the Python APIs manually:
-```bash
-cd python/raft
+
+```bash
+cd python/raft-dask
python setup.py build_ext --inplace
python setup.py install
@@ -169,16 +182,28 @@ python setup.py install
To run the Python tests:
```bash
-cd python/raft
-py.test -s -v raft
+cd python/raft-dask
+py.test -s -v
+
+cd python/pylibraft
+py.test -s -v
+```
+
+### Documentation
+
+The documentation requires that the C++ headers and Python packages have been built and installed.
+
+The following will build the docs along with the C++ and Python packages:
-cd python pylibraft
-py.test -s -v pylibraft
```
+./build.sh libraft pylibraft raft-dask docs --compile-libs --install
+```
+
+
## Using RAFT in downstream projects
-There are two different strategies for including RAFT in downstream projects, depending on whether or not the required dependencies are already installed and available on the `lib` and `include` paths.
+There are two different strategies for including RAFT in downstream projects, depending on whether or not the required dependencies are already installed and available on the `lib` and `include` paths.
### C++ header-only integration using cmake
@@ -187,7 +212,7 @@ When the needed [build dependencies](#required_depenencies) are already satisfie
set(RAFT_GIT_DIR ${CMAKE_CURRENT_BINARY_DIR}/raft CACHE STRING "Path to RAFT repo")
ExternalProject_Add(raft
GIT_REPOSITORY git@github.com:rapidsai/raft.git
- GIT_TAG branch-22.04
+ GIT_TAG branch-22.10
PREFIX ${RAFT_GIT_DIR}
CONFIGURE_COMMAND ""
BUILD_COMMAND ""
@@ -205,13 +230,13 @@ The pre-compiled libraries contain template specializations for commonly used ty
The following example tells the compiler to ignore the pre-compiled templates for the `libraft-distance` API so any symbols already compiled into pre-compiled shared library will be used instead:
```c++
-#include
-#include
+#include
+#include
```
### Building RAFT C++ from source in cmake
-RAFT uses the [RAPIDS-CMake](https://github.com/rapidsai/rapids-cmake) library so it can be more easily included into downstream projects. RAPIDS cmake provides a convenience layer around the [CMake Package Manager (CPM)](https://github.com/cpm-cmake/CPM.cmake).
+RAFT uses the [RAPIDS-CMake](https://github.com/rapidsai/rapids-cmake) library so it can be more easily included into downstream projects. RAPIDS cmake provides a convenience layer around the [CMake Package Manager (CPM)](https://github.com/cpm-cmake/CPM.cmake).
The following example is similar to invoking `find_package(raft)` but uses `rapids_cpm_find`, which provides a richer and more flexible configuration landscape by using CPM to fetch any dependencies not already available to the build. The `raft::raft` link target will be made available and it's recommended that it be used as a `PRIVATE` link dependency in downstream projects. The `COMPILE_LIBRARIES` option enables building the shared libraries.
@@ -219,15 +244,15 @@ The following `cmake` snippet enables a flexible configuration of RAFT:
```cmake
-set(RAFT_VERSION "22.04")
+set(RAFT_VERSION "22.10")
set(RAFT_FORK "rapidsai")
set(RAFT_PINNED_TAG "branch-${RAFT_VERSION}")
function(find_and_configure_raft)
set(oneValueArgs VERSION FORK PINNED_TAG USE_FAISS_STATIC
COMPILE_LIBRARIES ENABLE_NN_DEPENDENCIES CLONE_ON_PIN
- USE_NN_LIBRARY USE_DISTANCE_LIBRARY
- ENABLE_thrust_DEPENDENCY ENABLE_mdspan_DEPENDENCY)
+ USE_NN_LIBRARY USE_DISTANCE_LIBRARY
+ ENABLE_thrust_DEPENDENCY)
cmake_parse_arguments(PKG "${options}" "${oneValueArgs}"
"${multiValueArgs}" ${ARGN} )
@@ -272,7 +297,6 @@ function(find_and_configure_raft)
"RAFT_USE_FAISS_STATIC ${PKG_USE_FAISS_STATIC}"
"RAFT_COMPILE_LIBRARIES ${PKG_COMPILE_LIBRARIES}"
"RAFT_ENABLE_thrust_DEPENDENCY ${PKG_ENABLE_thrust_DEPENDENCY}"
- "RAFT_ENABLE_mdspan_DEPENDENCY ${PKG_ENABLE_mdspan_DEPENDENCY}"
)
endfunction()
@@ -295,7 +319,6 @@ find_and_configure_raft(VERSION ${RAFT_VERSION}.00
ENABLE_NN_DEPENDENCIES NO # This builds FAISS if not installed
USE_FAISS_STATIC NO
ENABLE_thrust_DEPENDENCY YES
- ENABLE_mdspan_DEPENDENCY YES
)
```
diff --git a/CHANGELOG.md b/CHANGELOG.md
index b341367022..ac1e8581df 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -1,3 +1,79 @@
+# raft 22.10.00 (12 Oct 2022)
+
+## 🚨 Breaking Changes
+
+- Separating mdspan/mdarray infra into host_* and device_* variants ([#810](https://github.com/rapidsai/raft/pull/810)) [@cjnolet](https://github.com/cjnolet)
+- Remove type punning from TxN_t ([#781](https://github.com/rapidsai/raft/pull/781)) [@wphicks](https://github.com/wphicks)
+- ivf_flat::index: hide implementation details ([#747](https://github.com/rapidsai/raft/pull/747)) [@achirkin](https://github.com/achirkin)
+
+## 🐛 Bug Fixes
+
+- ivf-pq integration: hotfixes ([#891](https://github.com/rapidsai/raft/pull/891)) [@achirkin](https://github.com/achirkin)
+- Removing cub symbol from libraft-distance instantiation. ([#887](https://github.com/rapidsai/raft/pull/887)) [@cjnolet](https://github.com/cjnolet)
+- ivf-pq post integration hotfixes ([#878](https://github.com/rapidsai/raft/pull/878)) [@achirkin](https://github.com/achirkin)
+- Fixing a few compile errors in new APIs ([#874](https://github.com/rapidsai/raft/pull/874)) [@cjnolet](https://github.com/cjnolet)
+- Include knn.cuh in knn.cu benchmark source for finding brute_force_knn ([#855](https://github.com/rapidsai/raft/pull/855)) [@teju85](https://github.com/teju85)
+- Do not use strcpy to copy 2 char ([#848](https://github.com/rapidsai/raft/pull/848)) [@mhoemmen](https://github.com/mhoemmen)
+- rng_state not including necessary cstdint ([#839](https://github.com/rapidsai/raft/pull/839)) [@MatthiasKohl](https://github.com/MatthiasKohl)
+- Fix integer overflow in ANN kmeans ([#835](https://github.com/rapidsai/raft/pull/835)) [@Nyrio](https://github.com/Nyrio)
+- Add alignment to the TxN_t vectorized type ([#792](https://github.com/rapidsai/raft/pull/792)) [@achirkin](https://github.com/achirkin)
+- Fix adj_to_csr_kernel ([#785](https://github.com/rapidsai/raft/pull/785)) [@ahendriksen](https://github.com/ahendriksen)
+- Use rapids-cmake 22.10 best practice for RAPIDS.cmake location ([#784](https://github.com/rapidsai/raft/pull/784)) [@robertmaynard](https://github.com/robertmaynard)
+- Remove type punning from TxN_t ([#781](https://github.com/rapidsai/raft/pull/781)) [@wphicks](https://github.com/wphicks)
+- Various fixes for build.sh ([#771](https://github.com/rapidsai/raft/pull/771)) [@vyasr](https://github.com/vyasr)
+
+## 📖 Documentation
+
+- Fix target names in build.sh help text ([#879](https://github.com/rapidsai/raft/pull/879)) [@Nyrio](https://github.com/Nyrio)
+- Document that minimum required CMake version is now 3.23.1 ([#841](https://github.com/rapidsai/raft/pull/841)) [@robertmaynard](https://github.com/robertmaynard)
+
+## 🚀 New Features
+
+- mdspanify raft::random functions uniformInt, normalTable, fill, bernoulli, and scaled_bernoulli ([#897](https://github.com/rapidsai/raft/pull/897)) [@mhoemmen](https://github.com/mhoemmen)
+- mdspan-ify several raft::random rng functions ([#857](https://github.com/rapidsai/raft/pull/857)) [@mhoemmen](https://github.com/mhoemmen)
+- Develop new mdspan-ified multi_variable_gaussian interface ([#845](https://github.com/rapidsai/raft/pull/845)) [@mhoemmen](https://github.com/mhoemmen)
+- Mdspanify permute ([#834](https://github.com/rapidsai/raft/pull/834)) [@mhoemmen](https://github.com/mhoemmen)
+- mdspan-ify rmat_rectangular_gen ([#833](https://github.com/rapidsai/raft/pull/833)) [@mhoemmen](https://github.com/mhoemmen)
+- mdspanify sampleWithoutReplacement ([#830](https://github.com/rapidsai/raft/pull/830)) [@mhoemmen](https://github.com/mhoemmen)
+- mdspan-ify make_regression ([#811](https://github.com/rapidsai/raft/pull/811)) [@mhoemmen](https://github.com/mhoemmen)
+- Updating `raft::linalg` APIs to use `mdspan` ([#809](https://github.com/rapidsai/raft/pull/809)) [@divyegala](https://github.com/divyegala)
+- Integrate KNN implementation: ivf-pq ([#789](https://github.com/rapidsai/raft/pull/789)) [@achirkin](https://github.com/achirkin)
+
+## 🛠️ Improvements
+
+- Some fixes for build.sh ([#901](https://github.com/rapidsai/raft/pull/901)) [@cjnolet](https://github.com/cjnolet)
+- Revert recent fused l2 nn instantiations ([#899](https://github.com/rapidsai/raft/pull/899)) [@cjnolet](https://github.com/cjnolet)
+- Update Python build instructions ([#898](https://github.com/rapidsai/raft/pull/898)) [@betatim](https://github.com/betatim)
+- Adding ninja and cxx compilers to conda dev dependencies ([#893](https://github.com/rapidsai/raft/pull/893)) [@cjnolet](https://github.com/cjnolet)
+- Output non-normalized distances in IVF-PQ and brute-force KNN ([#892](https://github.com/rapidsai/raft/pull/892)) [@Nyrio](https://github.com/Nyrio)
+- Readme updates for 22.10 ([#884](https://github.com/rapidsai/raft/pull/884)) [@cjnolet](https://github.com/cjnolet)
+- Breaking apart benchmarks into individual binaries ([#883](https://github.com/rapidsai/raft/pull/883)) [@cjnolet](https://github.com/cjnolet)
+- Pin `dask` and `distributed` for release ([#858](https://github.com/rapidsai/raft/pull/858)) [@galipremsagar](https://github.com/galipremsagar)
+- Mdspanifying (currently tested) `raft::matrix` ([#846](https://github.com/rapidsai/raft/pull/846)) [@cjnolet](https://github.com/cjnolet)
+- Separating _RAFT_HOST and _RAFT_DEVICE macros ([#836](https://github.com/rapidsai/raft/pull/836)) [@cjnolet](https://github.com/cjnolet)
+- Updating cpu job in hopes it speeds up python cpu builds ([#828](https://github.com/rapidsai/raft/pull/828)) [@cjnolet](https://github.com/cjnolet)
+- Mdspan-ifying `raft::spatial` ([#827](https://github.com/rapidsai/raft/pull/827)) [@cjnolet](https://github.com/cjnolet)
+- Fixing __init__.py for handle and stream ([#826](https://github.com/rapidsai/raft/pull/826)) [@cjnolet](https://github.com/cjnolet)
+- Moving a few more things around ([#822](https://github.com/rapidsai/raft/pull/822)) [@cjnolet](https://github.com/cjnolet)
+- Use fusedL2NN in ANN kmeans ([#821](https://github.com/rapidsai/raft/pull/821)) [@Nyrio](https://github.com/Nyrio)
+- Separating test executables ([#820](https://github.com/rapidsai/raft/pull/820)) [@cjnolet](https://github.com/cjnolet)
+- Separating mdspan/mdarray infra into host_* and device_* variants ([#810](https://github.com/rapidsai/raft/pull/810)) [@cjnolet](https://github.com/cjnolet)
+- Fix malloc/delete mismatch ([#808](https://github.com/rapidsai/raft/pull/808)) [@mhoemmen](https://github.com/mhoemmen)
+- Renaming `pyraft` -> `raft-dask` ([#801](https://github.com/rapidsai/raft/pull/801)) [@cjnolet](https://github.com/cjnolet)
+- Branch 22.10 merge 22.08 ([#800](https://github.com/rapidsai/raft/pull/800)) [@cjnolet](https://github.com/cjnolet)
+- Statically link all CUDA toolkit libraries ([#797](https://github.com/rapidsai/raft/pull/797)) [@trxcllnt](https://github.com/trxcllnt)
+- Minor follow-up fixes for ivf-flat ([#796](https://github.com/rapidsai/raft/pull/796)) [@achirkin](https://github.com/achirkin)
+- KMeans benchmarks (cuML + ANN implementations) and fix for IndexT=int64_t ([#795](https://github.com/rapidsai/raft/pull/795)) [@Nyrio](https://github.com/Nyrio)
+- Optimize fusedL2NN when data is skinny ([#794](https://github.com/rapidsai/raft/pull/794)) [@ahendriksen](https://github.com/ahendriksen)
+- Complete the deprecation of duplicated hpp headers ([#793](https://github.com/rapidsai/raft/pull/793)) [@ahendriksen](https://github.com/ahendriksen)
+- Prepare parts of the balanced kmeans for ivf-pq ([#788](https://github.com/rapidsai/raft/pull/788)) [@achirkin](https://github.com/achirkin)
+- Unpin `dask` and `distributed` for development ([#783](https://github.com/rapidsai/raft/pull/783)) [@galipremsagar](https://github.com/galipremsagar)
+- Exposing python wrapper for the RMAT generator logic ([#778](https://github.com/rapidsai/raft/pull/778)) [@teju85](https://github.com/teju85)
+- Device, Host, Managed Accessor Types for `mdspan` ([#776](https://github.com/rapidsai/raft/pull/776)) [@divyegala](https://github.com/divyegala)
+- Fix Forward-Merger Conflicts ([#768](https://github.com/rapidsai/raft/pull/768)) [@ajschmidt8](https://github.com/ajschmidt8)
+- Fea 2208 kmeans use specializations ([#760](https://github.com/rapidsai/raft/pull/760)) [@cjnolet](https://github.com/cjnolet)
+- ivf_flat::index: hide implementation details ([#747](https://github.com/rapidsai/raft/pull/747)) [@achirkin](https://github.com/achirkin)
+
# raft 22.08.00 (17 Aug 2022)
## 🚨 Breaking Changes
diff --git a/DEVELOPER_GUIDE.md b/DEVELOPER_GUIDE.md
index 5c1e122525..e1dd682fd9 100644
--- a/DEVELOPER_GUIDE.md
+++ b/DEVELOPER_GUIDE.md
@@ -4,7 +4,7 @@
Developing features and fixing bugs for the RAFT library itself is straightforward and only requires building and installing the relevant RAFT artifacts.
-The process for working on a CUDA/C++ feature which spans RAFT and one or more consumers can vary slightly depending on whether the consuming project relies on a source build (as outlined in the [BUILD](BUILD.md#install_header_only_cpp) docs). In such a case, the option `CPM_raft_SOURCE=/path/to/raft/source` can be passed to the cmake of the consuming project in order to build the local RAFT from source. The PR with relevant changes to the consuming project can also pin the RAFT version temporarily by explicitly changing the `FORK` and `PINNED_TAG` arguments to the RAFT branch containing their changes when invoking `find_and_configure_raft`. The pin should be reverted after the changed is merged to the RAFT project and before it is merged to the dependent project(s) downstream.
+The process for working on a CUDA/C++ feature which might span RAFT and one or more consuming libraries can vary slightly depending on whether the consuming project relies on a source build (as outlined in the [BUILD](BUILD.md#install_header_only_cpp) docs). In such a case, the option `CPM_raft_SOURCE=/path/to/raft/source` can be passed to the cmake of the consuming project in order to build the local RAFT from source. The PR with relevant changes to the consuming project can also pin the RAFT version temporarily by explicitly changing the `FORK` and `PINNED_TAG` arguments to the RAFT branch containing their changes when invoking `find_and_configure_raft`. The pin should be reverted after the change is merged to the RAFT project and before it is merged to the dependent project(s) downstream.
If building a feature which spans projects and not using the source build in cmake, the RAFT changes (both C++ and Python) will need to be installed into the environment of the consuming project before they can be used. The ideal integration of RAFT into consuming projects will enable both the source build in the consuming project only for this case but also rely on a more stable packaging (such as conda packaging) otherwise.
@@ -14,6 +14,16 @@ Since RAFT is a core library with multiple consumers, it's important that the pu
The public APIs should be lightweight wrappers around calls to private APIs inside the `detail` namespace.
+## Common Design Considerations
+
+1. Use the `hpp` extension for files which can be compiled with `gcc` against the CUDA runtime. Use the `cuh` extension for files which require `nvcc` to compile. `hpp` can also be used for functions marked `__host__ __device__` only if proper checks are in place to remove the `__device__` designation when not compiling with `nvcc`.
+
+2. When additional classes, structs, or general POCO types are needed for representing data in the public API, place them in a new file called `_types.hpp`. This tells users they are safe to expose these types on their own public APIs without bringing in device code. At a minimum, the definitions for these types should not require `nvcc`. In general, these classes should only store very simple state and should not perform their own computations. Instead, new functions should be exposed on the public API which accept these objects, reading or updating their state as necessary.
+
+3. Public APIs should be well documented and easy to use, and it is highly preferred that they include usage examples.
+
+4. Before creating a new primitive, check to see if one exists already. If one exists but the API isn't flexible enough to include your use-case, consider first refactoring the existing primitive. If that is not possible without an extreme number of changes, consider how the public API could be made more flexible. If the new primitive is sufficiently different from all existing primitives, consider whether an existing public API could invoke it as an option or argument; otherwise, add a header for the new public API function to the appropriate subdirectory and namespace.
+
## Testing
It's important for RAFT to maintain a high test coverage in order to minimize the potential for downstream projects to encounter unexpected build or runtime behavior as a result of changes. A well-defined public API can help maintain compile-time stability but means more focus should be placed on testing the functional requirements and verifying execution on the various edge cases within RAFT itself. Ideally, bug fixes and new features should be able to be made to RAFT independently of the consuming projects.
diff --git a/README.md b/README.md
index 2159f128bf..2c0231f37e 100755
--- a/README.md
+++ b/README.md
@@ -24,7 +24,7 @@ RAFT provides a header-only C++ library and pre-compiled shared libraries that c
RAFT also provides 2 Python libraries:
- `pylibraft` - low-level Python wrappers around RAFT algorithms and primitives.
-- `pyraft` - reusable infrastructure for building analytics, including tools for building both single-GPU and multi-node multi-GPU algorithms.
+- `raft-dask` - reusable infrastructure for building analytics, including tools for building both single-GPU and multi-node multi-GPU algorithms.
## Getting started
@@ -39,7 +39,7 @@ The APIs in RAFT currently accept raw pointers to device memory and we are in th
The `mdarray` forms a convenience layer over RMM and can be constructed in RAFT using a number of different helper functions:
```c++
-#include
+#include
int n_rows = 10;
int n_cols = 10;
@@ -56,8 +56,8 @@ Most of the primitives in RAFT accept a `raft::handle_t` object for the manageme
The example below demonstrates creating a RAFT handle and using it with `device_matrix` and `device_vector` to allocate memory, generating random clusters, and computing
pairwise Euclidean distances:
```c++
-#include
-#include
+#include
+#include
#include
#include
@@ -108,13 +108,15 @@ The easiest way to install RAFT is through conda and several packages are provid
- `libraft-nn` (optional) contains shared libraries for the nearest neighbors primitives.
- `libraft-distance` (optional) contains shared libraries for distance primitives.
- `pylibraft` (optional) Python wrappers around RAFT algorithms and primitives
-- `pyraft` (optional) contains reusable Python infrastructure and tools to accelerate Python algorithm development.
+- `raft-dask` (optional) enables deployment of multi-node multi-GPU algorithms that use RAFT `raft::comms` in Dask clusters.
-Use the following command to install RAFT with conda (replace `rapidsai` with `rapidsai-nightly` to install more up-to-date but less stable nightly packages). `mamba` is preferred over the `conda` command.
+Use the following command to install all of the RAFT packages with conda (replace `rapidsai` with `rapidsai-nightly` to install more up-to-date but less stable nightly packages). `mamba` is preferred over the `conda` command.
```bash
-mamba install -c rapidsai libraft-headers libraft-nn libraft-distance pyraft pylibraft
+mamba install -c rapidsai -c conda-forge -c nvidia raft-dask pylibraft
```
+You can also install the `libraft-*` conda packages individually using the `mamba` command above.
+
After installing RAFT, `find_package(raft COMPONENTS nn distance)` can be used in your CUDA/C++ cmake build to compile and/or link against needed dependencies in your raft target. `COMPONENTS` are optional and will depend on the packages installed.
### CPM
@@ -181,7 +183,7 @@ mamba env create --name raft_dev_env -f conda/environments/raft_dev_cuda11.5.yml
mamba activate raft_dev_env
```
```
-./build.sh pyraft pylibraft libraft tests bench --compile-libs
+./build.sh raft-dask pylibraft libraft tests bench --compile-libs
```
The [build](BUILD.md) instructions contain more details on building RAFT from source and including it in downstream projects. You can also find a more comprehensive version of the above CPM code snippet the [Building RAFT C++ from source](BUILD.md#build_cxx_source) section of the build instructions.
@@ -193,11 +195,18 @@ The folder structure mirrors other RAPIDS repos, with the following folders:
- `ci`: Scripts for running CI in PRs
- `conda`: Conda recipes and development conda environments
- `cpp`: Source code for C++ libraries.
- - `docs`: Doxygen configuration
- - `include`: The C++ API is fully-contained here
- - `src`: Compiled template specializations for the shared libraries
+ - `bench`: Benchmarks source code
+ - `cmake`: Cmake modules and templates
+ - `doxygen`: Doxygen configuration
+ - `include`: The C++ API headers are fully-contained here
+ - `scripts`: Helpful scripts for development
+ - `src`: Compiled APIs and template specializations for the shared libraries
+ - `test`: Googletests source code
- `docs`: Source code and scripts for building library documentation (doxygen + pydocs)
- `python`: Source code for Python libraries.
+ - `pylibraft`: Python build and source code for pylibraft library
+ - `raft-dask`: Python build and source code for raft-dask library
+- `thirdparty`: Third-party licenses
## Contributing
diff --git a/build.sh b/build.sh
index 8b00fa69dd..a31d97c22c 100755
--- a/build.sh
+++ b/build.sh
@@ -18,14 +18,14 @@ ARGS=$*
# script, and that this script resides in the repo dir!
REPODIR=$(cd $(dirname $0); pwd)
-VALIDARGS="clean libraft pyraft pylibraft docs tests bench clean -v -g --install --compile-libs --compile-nn --compile-dist --allgpuarch --no-nvtx --show_depr_warn -h --buildfaiss --minimal-deps"
-HELP="$0 [ ...] [ ...] [--cmake-args=\"\"] [--cache-tool=]
+VALIDARGS="clean libraft pylibraft raft-dask docs tests bench clean -v -g --install --compile-libs --compile-nn --compile-dist --allgpuarch --no-nvtx --show_depr_warn -h --buildfaiss --minimal-deps"
+HELP="$0 [ ...] [ ...] [--cmake-args=\"\"] [--cache-tool=] [--limit-tests=] [--limit-bench=]
where is:
clean - remove all existing build artifacts and configuration (start over)
libraft - build the raft C++ code only. Also builds the C-wrapper library
around the C++ code.
- pyraft - build the pyraft Python package
pylibraft - build the pylibraft Python package
+ raft-dask - build the raft-dask Python package. This also requires pylibraft.
docs - build the documentation
tests - build the tests
bench - build the benchmarks
@@ -35,9 +35,13 @@ HELP="$0 [ ...] [ ...] [--cmake-args=\"\"] [--cache-tool=\"] [--cache-tool==0.29,<0.30
-- cmake>=3.20.1,!=3.23.0
+- cmake>=3.23.1
- scikit-build>=0.13.1
-- rapids-build-env=22.08.*
-- rapids-notebook-env=22.08.*
-- rapids-doc-env=22.08.*
-- rmm=22.08.*
-- dask-cuda=22.08.*
+- rapids-build-env=22.10.*
+- rapids-notebook-env=22.10.*
+- rapids-doc-env=22.10.*
+- rmm=22.10.*
+- dask-cuda=22.10.*
- ucx>=1.13.0
-- ucx-py=0.27.*
+- ucx-py=0.28.*
- ucx-proc=*=gpu
- doxygen>=1.8.20
- libfaiss>=1.7.0
@@ -27,8 +30,8 @@ dependencies:
- pip:
- sphinx_markdown_tables
- breathe
- - git+https://github.com/dask/dask.git@2022.7.1
- - git+https://github.com/dask/distributed.git@2022.7.1
+ - git+https://github.com/dask/dask.git@2022.9.2
+ - git+https://github.com/dask/distributed.git@2022.9.2
# rapids-build-env, notebook-env and doc-env are defined in
# https://docs.rapids.ai/maintainers/depmgmt/
diff --git a/conda/environments/raft_dev_cuda11.2.yml b/conda/environments/raft_dev_cuda11.2.yml
index 5991e3e370..d8cb5759c1 100644
--- a/conda/environments/raft_dev_cuda11.2.yml
+++ b/conda/environments/raft_dev_cuda11.2.yml
@@ -5,19 +5,22 @@ channels:
- rapidsai-nightly
- conda-forge
dependencies:
+- c-compiler
+- cxx-compiler
- cudatoolkit=11.2
+- ninja
- clang=11.1.0
- clang-tools=11.1.0
- cython>=0.29,<0.30
-- cmake>=3.20.1,!=3.23.0
+- cmake>=3.23.1
- scikit-build>=0.13.1
-- rapids-build-env=22.08.*
-- rapids-notebook-env=22.08.*
-- rapids-doc-env=22.08.*
-- rmm=22.08.*
-- dask-cuda=22.08.*
+- rapids-build-env=22.10.*
+- rapids-notebook-env=22.10.*
+- rapids-doc-env=22.10.*
+- rmm=22.10.*
+- dask-cuda=22.10.*
- ucx>=1.13.0
-- ucx-py=0.27.*
+- ucx-py=0.28.*
- ucx-proc=*=gpu
- doxygen>=1.8.20
- libfaiss>=1.7.0
@@ -27,8 +30,8 @@ dependencies:
- pip:
- sphinx_markdown_tables
- breathe
- - git+https://github.com/dask/dask.git@2022.7.1
- - git+https://github.com/dask/distributed.git@2022.7.1
+ - git+https://github.com/dask/dask.git@2022.9.2
+ - git+https://github.com/dask/distributed.git@2022.9.2
# rapids-build-env, notebook-env and doc-env are defined in
# https://docs.rapids.ai/maintainers/depmgmt/
diff --git a/conda/environments/raft_dev_cuda11.4.yml b/conda/environments/raft_dev_cuda11.4.yml
index 21e3e1ff33..74ee0366ca 100644
--- a/conda/environments/raft_dev_cuda11.4.yml
+++ b/conda/environments/raft_dev_cuda11.4.yml
@@ -5,19 +5,22 @@ channels:
- rapidsai-nightly
- conda-forge
dependencies:
+- c-compiler
+- cxx-compiler
- cudatoolkit=11.4
+- ninja
- clang=11.1.0
- clang-tools=11.1.0
- cython>=0.29,<0.30
-- cmake>=3.20.1,!=3.23.0
+- cmake>=3.23.1
- scikit-build>=0.13.1
-- rapids-build-env=22.08.*
-- rapids-notebook-env=22.08.*
-- rapids-doc-env=22.08.*
-- rmm=22.08.*
-- dask-cuda=22.08.*
+- rapids-build-env=22.10.*
+- rapids-notebook-env=22.10.*
+- rapids-doc-env=22.10.*
+- rmm=22.10.*
+- dask-cuda=22.10.*
- ucx>=1.13.0
-- ucx-py=0.27.*
+- ucx-py=0.28.*
- ucx-proc=*=gpu
- doxygen>=1.8.20
- libfaiss>=1.7.0
@@ -27,8 +30,8 @@ dependencies:
- pip:
- sphinx_markdown_tables
- breathe
- - git+https://github.com/dask/dask.git@2022.7.1
- - git+https://github.com/dask/distributed.git@2022.7.1
+ - git+https://github.com/dask/dask.git@2022.9.2
+ - git+https://github.com/dask/distributed.git@2022.9.2
# rapids-build-env, notebook-env and doc-env are defined in
# https://docs.rapids.ai/maintainers/depmgmt/
diff --git a/conda/environments/raft_dev_cuda11.5.yml b/conda/environments/raft_dev_cuda11.5.yml
index 49725eb39f..fca6684bc8 100644
--- a/conda/environments/raft_dev_cuda11.5.yml
+++ b/conda/environments/raft_dev_cuda11.5.yml
@@ -5,20 +5,23 @@ channels:
- rapidsai-nightly
- conda-forge
dependencies:
+- c-compiler
+- cxx-compiler
- cudatoolkit=11.5
- cuda-python >=11.5,<11.7.1
+- ninja
- clang=11.1.0
- clang-tools=11.1.0
- cython>=0.29,<0.30
-- cmake>=3.20.1,!=3.23.0
+- cmake>=3.23.1
- scikit-build>=0.13.1
-- rapids-build-env=22.08.*
-- rapids-notebook-env=22.08.*
-- rapids-doc-env=22.08.*
-- rmm=22.08.*
-- dask-cuda=22.08.*
+- rapids-build-env=22.10.*
+- rapids-notebook-env=22.10.*
+- rapids-doc-env=22.10.*
+- rmm=22.10.*
+- dask-cuda=22.10.*
- ucx>=1.13.0
-- ucx-py=0.27.*
+- ucx-py=0.28.*
- ucx-proc=*=gpu
- doxygen>=1.8.20
- libfaiss>=1.7.0
@@ -28,8 +31,8 @@ dependencies:
- pip:
- sphinx_markdown_tables
- breathe
- - git+https://github.com/dask/dask.git@2022.7.1
- - git+https://github.com/dask/distributed.git@2022.7.1
+ - git+https://github.com/dask/dask.git@2022.9.2
+ - git+https://github.com/dask/distributed.git@2022.9.2
# rapids-build-env, notebook-env and doc-env are defined in
# https://docs.rapids.ai/maintainers/depmgmt/
diff --git a/conda/recipes/libraft/conda_build_config.yaml b/conda/recipes/libraft/conda_build_config.yaml
index bed95d14b3..c4d0c2a087 100644
--- a/conda/recipes/libraft/conda_build_config.yaml
+++ b/conda/recipes/libraft/conda_build_config.yaml
@@ -11,7 +11,7 @@ sysroot_version:
- "2.17"
cmake_version:
- - ">=3.20.1,!=3.23.0"
+ - ">=3.23.1"
nccl_version:
- ">=2.9.9"
diff --git a/conda/recipes/pylibraft/conda_build_config.yaml b/conda/recipes/pylibraft/conda_build_config.yaml
index 5c2fa69f8e..725c38cb6a 100644
--- a/conda/recipes/pylibraft/conda_build_config.yaml
+++ b/conda/recipes/pylibraft/conda_build_config.yaml
@@ -11,4 +11,4 @@ sysroot_version:
- "2.17"
cmake_version:
- - ">=3.20.1,!=3.23.0"
+ - ">=3.23.1"
diff --git a/conda/recipes/pyraft/build.sh b/conda/recipes/raft-dask/build.sh
similarity index 81%
rename from conda/recipes/pyraft/build.sh
rename to conda/recipes/raft-dask/build.sh
index 1462f365ff..963433dd8d 100644
--- a/conda/recipes/pyraft/build.sh
+++ b/conda/recipes/raft-dask/build.sh
@@ -3,4 +3,4 @@
# Copyright (c) 2022, NVIDIA CORPORATION.
# This assumes the script is executed from the root of the repo directory
-./build.sh pyraft --install --no-nvtx
+./build.sh raft-dask --install --no-nvtx
diff --git a/conda/recipes/pyraft/conda_build_config.yaml b/conda/recipes/raft-dask/conda_build_config.yaml
similarity index 86%
rename from conda/recipes/pyraft/conda_build_config.yaml
rename to conda/recipes/raft-dask/conda_build_config.yaml
index 1ff86d58da..a6ca533504 100644
--- a/conda/recipes/pyraft/conda_build_config.yaml
+++ b/conda/recipes/raft-dask/conda_build_config.yaml
@@ -14,4 +14,4 @@ ucx_version:
- "1.13.0"
cmake_version:
- - ">=3.20.1,!=3.23.0"
+ - ">=3.23.1"
diff --git a/conda/recipes/pyraft/meta.yaml b/conda/recipes/raft-dask/meta.yaml
similarity index 90%
rename from conda/recipes/pyraft/meta.yaml
rename to conda/recipes/raft-dask/meta.yaml
index 7a2e8d6c49..4e10294db7 100644
--- a/conda/recipes/pyraft/meta.yaml
+++ b/conda/recipes/raft-dask/meta.yaml
@@ -10,7 +10,7 @@
{% set ucx_py_version=environ.get('UCX_PY_VERSION') %}
package:
- name: pyraft
+ name: raft-dask
version: {{ version }}
source:
@@ -35,7 +35,7 @@ requirements:
- cython>=0.29,<0.30
- scikit-build>=0.13.1
- rmm {{ minor_version }}
- - libraft-headers {{ version }}
+ - pylibraft {{ version }}
- cudatoolkit {{ cuda_version }}.*
- cuda-python >=11.5,<11.7.1
- nccl>=2.9.9
@@ -45,14 +45,14 @@ requirements:
run:
- python x.x
- dask-cuda {{ minor_version }}
- - libraft-headers {{ version }}
+ - pylibraft {{ version }}
- nccl>=2.9.9
- rmm {{ minor_version }}
- ucx >={{ ucx_version }}
- ucx-py {{ ucx_py_version }}
- ucx-proc=*=gpu
- - dask==2022.7.1
- - distributed==2022.7.1
+ - dask==2022.9.2
+ - distributed==2022.9.2
- cuda-python >=11.5,<11.7.1
- joblib >=0.11
- {{ pin_compatible('cudatoolkit', max_pin='x', min_pin='x') }}
@@ -61,10 +61,10 @@ tests: # [linux64]
requirements: # [linux64]
- cudatoolkit {{ cuda_version }}.* # [linux64]
imports: # [linux64]
- - raft # [linux64]
+ - raft_dask # [linux64]
about:
home: http://rapids.ai/
license: Apache-2.0
# license_file: LICENSE
- summary: pyraft library
+ summary: raft-dask library
diff --git a/cpp/CMakeLists.txt b/cpp/CMakeLists.txt
index 2c424e9431..ce6eb00bc1 100644
--- a/cpp/CMakeLists.txt
+++ b/cpp/CMakeLists.txt
@@ -13,10 +13,10 @@
# See the License for the specific language governing permissions and
# limitations under the License.
#=============================================================================
-set(RAPIDS_VERSION "22.06")
-set(RAFT_VERSION "${RAPIDS_VERSION}.00")
+set(RAPIDS_VERSION "22.10")
+set(RAFT_VERSION "22.10.00")
-cmake_minimum_required(VERSION 3.20.1 FATAL_ERROR)
+cmake_minimum_required(VERSION 3.23.1 FATAL_ERROR)
include(../fetch_rapids.cmake)
include(rapids-cmake)
include(rapids-cpm)
@@ -26,7 +26,7 @@ include(rapids-find)
rapids_cuda_init_architectures(RAFT)
-project(RAFT VERSION 22.08.00 LANGUAGES CXX CUDA)
+project(RAFT VERSION ${RAFT_VERSION} LANGUAGES CXX CUDA)
# Needed because GoogleBenchmark changes the state of FindThreads.cmake, causing subsequent runs to
# have different values for the `Threads::Threads` target. Setting this flag ensures
@@ -53,7 +53,7 @@ option(BUILD_TESTS "Build raft unit-tests" ON)
option(BUILD_BENCH "Build raft C++ benchmark tests" OFF)
option(CUDA_ENABLE_KERNELINFO "Enable kernel resource usage info" OFF)
option(CUDA_ENABLE_LINEINFO "Enable the -lineinfo option for nvcc (useful for cuda-memcheck / profiler)" OFF)
-option(CUDA_STATIC_RUNTIME "Statically link the CUDA runtime" OFF)
+option(CUDA_STATIC_RUNTIME "Statically link the CUDA toolkit runtime and libraries" OFF)
option(DETECT_CONDA_ENV "Enable detection of conda environment for dependencies" ON)
option(DISABLE_DEPRECATION_WARNINGS "Disable deprecaction warnings " ON)
option(DISABLE_OPENMP "Disable OpenMP" OFF)
@@ -85,7 +85,7 @@ message(VERBOSE "RAFT: Disable OpenMP: ${DISABLE_OPENMP}")
message(VERBOSE "RAFT: Enable kernel resource usage info: ${CUDA_ENABLE_KERNELINFO}")
message(VERBOSE "RAFT: Enable lineinfo in nvcc: ${CUDA_ENABLE_LINEINFO}")
message(VERBOSE "RAFT: Enable nvtx markers: ${RAFT_NVTX}")
-message(VERBOSE "RAFT: Statically link the CUDA runtime: ${CUDA_STATIC_RUNTIME}")
+message(VERBOSE "RAFT: Statically link the CUDA toolkit runtime and libraries: ${CUDA_STATIC_RUNTIME}")
# Set RMM logging level
set(RMM_LOGGING_LEVEL "INFO" CACHE STRING "Choose the logging level.")
@@ -106,6 +106,21 @@ endif()
##############################################################################
# - compiler options ---------------------------------------------------------
+set(_ctk_static_suffix "")
+if(CUDA_STATIC_RUNTIME)
+ # If we're statically linking CTK cuBLAS,
+ # we also want to statically link BLAS
+ set(BLA_STATIC ON)
+ set(_ctk_static_suffix "_static")
+ # Control legacy FindCUDA.cmake behavior too
+ # Remove this after we push it into rapids-cmake:
+ # https://github.com/rapidsai/rapids-cmake/pull/259
+ set(CUDA_USE_STATIC_CUDA_RUNTIME ON)
+endif()
+
+# CUDA runtime
+rapids_cuda_init_runtime(USE_STATIC ${CUDA_STATIC_RUNTIME})
+
if (NOT DISABLE_OPENMP)
find_package(OpenMP)
if(OPENMP_FOUND)
@@ -168,12 +183,11 @@ target_include_directories(raft INTERFACE
# Only CUDA libs and rmm should
# be used in global target.
target_link_libraries(raft INTERFACE
- CUDA::cublas
- CUDA::curand
- CUDA::cusolver
- CUDA::cudart
- CUDA::cusparse
rmm::rmm
+ CUDA::cublas${_ctk_static_suffix}
+ CUDA::curand${_ctk_static_suffix}
+ CUDA::cusolver${_ctk_static_suffix}
+ CUDA::cusparse${_ctk_static_suffix}
$<$<BOOL:${RAFT_ENABLE_thrust_DEPENDENCY}>:raft::Thrust>
)
@@ -214,6 +228,11 @@ endif()
##############################################################################
# - raft_distance ------------------------------------------------------------
+# TODO:
+# Currently, this package also contains the 'random' namespace (for rmat logic)
+# We couldn't get this to work properly due to strange CI failures as noticed
+# in the PR#778. In the long term, we should rename this package to `raft_compiled`
+# in order to have a single pre-compiled raft package for those who need it.
add_library(raft_distance INTERFACE)
if(TARGET raft_distance AND (NOT TARGET raft::distance))
@@ -255,6 +274,17 @@ if(RAFT_COMPILE_DIST_LIBRARY)
src/distance/specializations/detail/lp_unexpanded_double_double_double_int.cu
src/distance/specializations/detail/lp_unexpanded_float_float_float_uint32.cu
src/distance/specializations/detail/lp_unexpanded_float_float_float_int.cu
+ src/distance/specializations/detail/russel_rao_double_double_double_int.cu
+ src/distance/specializations/detail/russel_rao_float_float_float_uint32.cu
+ src/distance/specializations/detail/russel_rao_float_float_float_int.cu
+# src/distance/specializations/fused_l2_nn_double_int.cu
+# src/distance/specializations/fused_l2_nn_double_int64.cu
+# src/distance/specializations/fused_l2_nn_float_int.cu
+# src/distance/specializations/fused_l2_nn_float_int64.cu
+ src/random/specializations/rmat_rectangular_generator_int_double.cu
+ src/random/specializations/rmat_rectangular_generator_int64_double.cu
+ src/random/specializations/rmat_rectangular_generator_int_float.cu
+ src/random/specializations/rmat_rectangular_generator_int64_float.cu
)
set_target_properties(
raft_distance_lib
@@ -310,6 +340,21 @@ if(RAFT_COMPILE_NN_LIBRARY)
src/nn/specializations/detail/ball_cover_lowdim_pass_two_2d.cu
src/nn/specializations/detail/ball_cover_lowdim_pass_one_3d.cu
src/nn/specializations/detail/ball_cover_lowdim_pass_two_3d.cu
+ src/nn/specializations/detail/ivfpq_compute_similarity_float_fast.cu
+ src/nn/specializations/detail/ivfpq_compute_similarity_float_no_basediff.cu
+ src/nn/specializations/detail/ivfpq_compute_similarity_float_no_smem_lut.cu
+ src/nn/specializations/detail/ivfpq_compute_similarity_fp8s_fast.cu
+ src/nn/specializations/detail/ivfpq_compute_similarity_fp8s_no_basediff.cu
+ src/nn/specializations/detail/ivfpq_compute_similarity_fp8s_no_smem_lut.cu
+ src/nn/specializations/detail/ivfpq_compute_similarity_fp8u_fast.cu
+ src/nn/specializations/detail/ivfpq_compute_similarity_fp8u_no_basediff.cu
+ src/nn/specializations/detail/ivfpq_compute_similarity_fp8u_no_smem_lut.cu
+ src/nn/specializations/detail/ivfpq_compute_similarity_half_fast.cu
+ src/nn/specializations/detail/ivfpq_compute_similarity_half_no_basediff.cu
+ src/nn/specializations/detail/ivfpq_compute_similarity_half_no_smem_lut.cu
+ src/nn/specializations/detail/ivfpq_search_float_int64_t.cu
+ src/nn/specializations/detail/ivfpq_search_float_uint32_t.cu
+ src/nn/specializations/detail/ivfpq_search_float_uint64_t.cu
src/nn/specializations/fused_l2_knn_long_float_true.cu
src/nn/specializations/fused_l2_knn_long_float_false.cu
src/nn/specializations/fused_l2_knn_int_float_true.cu
diff --git a/cpp/bench/CMakeLists.txt b/cpp/bench/CMakeLists.txt
index 6b2d463d0e..51170e4265 100644
--- a/cpp/bench/CMakeLists.txt
+++ b/cpp/bench/CMakeLists.txt
@@ -14,63 +14,122 @@
# limitations under the License.
#=============================================================================
-set(RAFT_CPP_BENCH_TARGET "bench_raft")
-
-# (please keep the filenames in alphabetical order)
-add_executable(${RAFT_CPP_BENCH_TARGET}
- bench/distance/distance_cosine.cu
- bench/distance/distance_exp_l2.cu
- bench/distance/distance_l1.cu
- bench/distance/distance_unexp_l2.cu
- bench/linalg/add.cu
- bench/linalg/map_then_reduce.cu
- bench/linalg/matrix_vector_op.cu
- bench/linalg/reduce.cu
- bench/random/make_blobs.cu
- bench/random/permute.cu
- bench/random/rng.cu
- bench/sparse/convert_csr.cu
- bench/spatial/fused_l2_nn.cu
- bench/spatial/knn.cu
- bench/spatial/selection.cu
- bench/main.cpp
-)
-
-set_target_properties(${RAFT_CPP_BENCH_TARGET}
- PROPERTIES BUILD_RPATH "\$ORIGIN"
- # set target compile options
- CXX_STANDARD 17
- CXX_STANDARD_REQUIRED ON
- CUDA_STANDARD 17
- CUDA_STANDARD_REQUIRED ON
- POSITION_INDEPENDENT_CODE ON
- INTERFACE_POSITION_INDEPENDENT_CODE ON
- INSTALL_RPATH "\$ORIGIN/../../../lib"
-)
-
-target_compile_options(${RAFT_CPP_BENCH_TARGET}
- PRIVATE "$<$<COMPILE_LANGUAGE:CXX>:${RAFT_CXX_FLAGS}>"
- "$<$<COMPILE_LANGUAGE:CUDA>:${RAFT_CUDA_FLAGS}>"
-)
-
-target_include_directories(${RAFT_CPP_BENCH_TARGET}
- PUBLIC "$<BUILD_INTERFACE:${RAFT_SOURCE_DIR}/bench>"
-)
-
-target_link_libraries(${RAFT_CPP_BENCH_TARGET}
- PRIVATE
- raft::raft
- raft::distance
- raft::nn
- faiss::faiss
- benchmark::benchmark
- $<TARGET_NAME_IF_EXISTS:OpenMP::OpenMP_CXX>
- $<TARGET_NAME_IF_EXISTS:conda_env>
-)
+###################################################################################################
+# - compiler function -----------------------------------------------------------------------------
+
+function(ConfigureBench)
+
+set(options OPTIONAL DIST NN)
+set(oneValueArgs NAME )
+set(multiValueArgs PATH TARGETS CONFIGURATIONS)
+
+cmake_parse_arguments(ConfigureBench "${options}" "${oneValueArgs}"
+ "${multiValueArgs}" ${ARGN} )
+
+set(BENCH_NAME ${ConfigureBench_NAME})
+
+add_executable(${BENCH_NAME} ${ConfigureBench_PATH})
+
+message("BENCH PATH: ${ConfigureBench_PATH}")
+
+target_link_libraries(${BENCH_NAME}
+ PRIVATE
+ raft::raft
+ $<$<BOOL:${ConfigureBench_DIST}>:raft::distance>
+ $<$<BOOL:${ConfigureBench_NN}>:raft::nn>
+ benchmark::benchmark
+ Threads::Threads
+ $<TARGET_NAME_IF_EXISTS:OpenMP::OpenMP_CXX>
+ $<TARGET_NAME_IF_EXISTS:conda_env>
+ )
+
+set_target_properties(${BENCH_NAME}
+ PROPERTIES
+ # set target compile options
+ INSTALL_RPATH "\$ORIGIN/../../../lib"
+ CXX_STANDARD 17
+ CXX_STANDARD_REQUIRED ON
+ CUDA_STANDARD 17
+ CUDA_STANDARD_REQUIRED ON
+ POSITION_INDEPENDENT_CODE ON
+ INTERFACE_POSITION_INDEPENDENT_CODE ON
+ )
+
+target_compile_options(${BENCH_NAME}
+ PRIVATE "$<$<COMPILE_LANGUAGE:CXX>:${RAFT_CXX_FLAGS}>"
+ "$<$<COMPILE_LANGUAGE:CUDA>:${RAFT_CUDA_FLAGS}>"
+ )
+
+target_include_directories(${BENCH_NAME}
+ PUBLIC "$<BUILD_INTERFACE:${RAFT_SOURCE_DIR}/bench>"
+ )
install(
- TARGETS ${RAFT_CPP_BENCH_TARGET}
- COMPONENT testing
- DESTINATION bin/libraft/gbench
- EXCLUDE_FROM_ALL
-)
+ TARGETS ${BENCH_NAME}
+ COMPONENT testing
+ DESTINATION bin/gbench/libraft
+ EXCLUDE_FROM_ALL)
+
+endfunction()
+
+if(BUILD_BENCH)
+ ConfigureBench(NAME CLUSTER_BENCH
+ PATH
+ bench/cluster/kmeans_balanced.cu
+ bench/cluster/kmeans.cu
+ bench/main.cpp
+ OPTIONAL DIST NN
+ )
+
+ ConfigureBench(NAME DISTANCE_BENCH
+ PATH
+ bench/distance/distance_cosine.cu
+ bench/distance/distance_exp_l2.cu
+ bench/distance/distance_l1.cu
+ bench/distance/distance_unexp_l2.cu
+ bench/main.cpp
+ OPTIONAL DIST
+ )
+
+ ConfigureBench(NAME LINALG_BENCH
+ PATH
+ bench/linalg/add.cu
+ bench/linalg/map_then_reduce.cu
+ bench/linalg/matrix_vector_op.cu
+ bench/linalg/reduce.cu
+ bench/main.cpp
+ )
+
+ ConfigureBench(NAME RANDOM_BENCH
+ PATH
+ bench/random/make_blobs.cu
+ bench/random/permute.cu
+ bench/random/rng.cu
+ bench/main.cpp
+ )
+
+ ConfigureBench(NAME SPARSE_BENCH
+ PATH
+ bench/sparse/convert_csr.cu
+ bench/main.cpp
+ )
+
+ ConfigureBench(NAME SPATIAL_BENCH
+ PATH
+ bench/spatial/fused_l2_nn.cu
+ bench/spatial/knn/brute_force_float_int64_t.cu
+ bench/spatial/knn/brute_force_float_uint32_t.cu
+ bench/spatial/knn/ivf_flat_float_int64_t.cu
+ bench/spatial/knn/ivf_flat_float_uint32_t.cu
+ bench/spatial/knn/ivf_flat_int8_t_int64_t.cu
+ bench/spatial/knn/ivf_flat_uint8_t_uint32_t.cu
+ bench/spatial/knn/ivf_pq_float_int64_t.cu
+ bench/spatial/knn/ivf_pq_float_uint32_t.cu
+ bench/spatial/knn/ivf_pq_int8_t_int64_t.cu
+ bench/spatial/knn/ivf_pq_uint8_t_uint32_t.cu
+ bench/spatial/selection.cu
+ bench/main.cpp
+ OPTIONAL DIST NN
+ )
+endif()
+
diff --git a/cpp/bench/cluster/kmeans.cu b/cpp/bench/cluster/kmeans.cu
new file mode 100644
index 0000000000..bf4cc2f686
--- /dev/null
+++ b/cpp/bench/cluster/kmeans.cu
@@ -0,0 +1,115 @@
+/*
+ * Copyright (c) 2022, NVIDIA CORPORATION.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+#include
+#include
+#include
+
+#if defined RAFT_DISTANCE_COMPILED && defined RAFT_NN_COMPILED
+#include
+#endif
+
+namespace raft::bench::cluster {
+
+struct KMeansBenchParams {
+ DatasetParams data;
+ BlobsParams blobs;
+ raft::cluster::KMeansParams kmeans;
+};
+
+template <typename T, typename IndexT>
+struct KMeans : public BlobsFixture<T, IndexT> {
+ KMeans(const KMeansBenchParams& p) : BlobsFixture<T, IndexT>(p.data, p.blobs), params(p) {}
+
+ void run_benchmark(::benchmark::State& state) override
+ {
+ raft::device_matrix_view<const T, IndexT> X_view = this->X.view();
+ std::optional<raft::device_vector_view<const T, IndexT>> opt_weights_view = std::nullopt;
+ std::optional<raft::device_matrix_view<T, IndexT>> centroids_view =
+ std::make_optional<raft::device_matrix_view<T, IndexT>>(centroids.view());
+ raft::device_vector_view<IndexT, IndexT> labels_view = labels.view();
+ raft::host_scalar_view<T> inertia_view = raft::make_host_scalar_view<T>(&inertia);
+ raft::host_scalar_view<IndexT> n_iter_view = raft::make_host_scalar_view<IndexT>(&n_iter);
+
+ this->loop_on_state(state, [&]() {
+ raft::cluster::kmeans_fit_predict(this->handle,
+ params.kmeans,
+ X_view,
+ opt_weights_view,
+ centroids_view,
+ labels_view,
+ inertia_view,
+ n_iter_view);
+ });
+ }
+
+ void allocate_temp_buffers(const ::benchmark::State& state) override
+ {
+ centroids =
+ raft::make_device_matrix<T, IndexT>(this->handle, params.kmeans.n_clusters, params.data.cols);
+ labels = raft::make_device_vector<IndexT, IndexT>(this->handle, params.data.rows);
+ }
+
+ private:
+ KMeansBenchParams params;
+ raft::device_matrix<T, IndexT> centroids;
+ raft::device_vector<IndexT, IndexT> labels;
+ T inertia;
+ IndexT n_iter;
+}; // struct KMeans
+
+std::vector<KMeansBenchParams> getKMeansInputs()
+{
+ std::vector<KMeansBenchParams> out;
+ KMeansBenchParams p;
+ p.data.row_major = true;
+ p.blobs.cluster_std = 1.0;
+ p.blobs.shuffle = false;
+ p.blobs.center_box_min = -10.0;
+ p.blobs.center_box_max = 10.0;
+ p.blobs.seed = 12345ULL;
+ p.kmeans.init = raft::cluster::KMeansParams::KMeansPlusPlus;
+ p.kmeans.max_iter = 300;
+ p.kmeans.tol = 1e-4;
+ p.kmeans.verbosity = RAFT_LEVEL_INFO;
+ p.kmeans.metric = raft::distance::DistanceType::L2Expanded;
+ p.kmeans.inertia_check = true;
+ std::vector<std::tuple<int, int, int>> row_cols_k = {
+ {1000000, 20, 1000},
+ {3000000, 50, 20},
+ {10000000, 50, 5},
+ };
+ for (auto& rck : row_cols_k) {
+ p.data.rows = std::get<0>(rck);
+ p.data.cols = std::get<1>(rck);
+ p.blobs.n_clusters = std::get<2>(rck);
+ p.kmeans.n_clusters = std::get<2>(rck);
+ for (auto bs_shift : std::vector<int>({16, 18})) {
+ p.kmeans.batch_samples = 1 << bs_shift;
+ out.push_back(p);
+ }
+ }
+ return out;
+}
+
+// note(lsugy): commenting out int64_t because the templates are not compiled in the distance
+// library, resulting in long compilation times.
+RAFT_BENCH_REGISTER((KMeans<float, int>), "", getKMeansInputs());
+RAFT_BENCH_REGISTER((KMeans<double, int>), "", getKMeansInputs());
+// RAFT_BENCH_REGISTER((KMeans<float, int64_t>), "", getKMeansInputs());
+// RAFT_BENCH_REGISTER((KMeans<double, int64_t>), "", getKMeansInputs());
+
+} // namespace raft::bench::cluster
diff --git a/cpp/bench/cluster/kmeans_balanced.cu b/cpp/bench/cluster/kmeans_balanced.cu
new file mode 100644
index 0000000000..210b40ced8
--- /dev/null
+++ b/cpp/bench/cluster/kmeans_balanced.cu
@@ -0,0 +1,110 @@
+/*
+ * Copyright (c) 2022, NVIDIA CORPORATION.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+#include
+#include
+#include
+
+#if defined RAFT_DISTANCE_COMPILED && defined RAFT_NN_COMPILED
+#include
+#endif
+
+namespace raft::bench::cluster {
+
+struct KMeansBalancedBenchParams {
+ DatasetParams data;
+ uint32_t max_iter;
+ uint32_t n_lists;
+ raft::distance::DistanceType metric;
+};
+
+template <typename T, typename IndexT = int64_t>
+struct KMeansBalanced : public fixture {
+ KMeansBalanced(const KMeansBalancedBenchParams& p) : params(p) {}
+
+ void run_benchmark(::benchmark::State& state) override
+ {
+ this->loop_on_state(state, [this]() {
+ raft::spatial::knn::detail::kmeans::build_hierarchical(this->handle,
+ this->params.max_iter,
+ (uint32_t)this->params.data.cols,
+ this->X.data_handle(),
+ this->params.data.rows,
+ this->centroids.data_handle(),
+ this->params.n_lists,
+ this->params.metric,
+ this->handle.get_stream());
+ });
+ }
+
+ void allocate_data(const ::benchmark::State& state) override
+ {
+ X = raft::make_device_matrix<T, IndexT>(handle, params.data.rows, params.data.cols);
+
+ raft::random::RngState rng{1234};
+ constexpr T kRangeMax = std::is_integral_v<T> ? std::numeric_limits<T>::max() : T(1);
+ constexpr T kRangeMin = std::is_integral_v<T> ? std::numeric_limits<T>::min() : T(-1);
+ if constexpr (std::is_integral_v<T>) {
+ raft::random::uniformInt(
+ rng, X.data_handle(), params.data.rows * params.data.cols, kRangeMin, kRangeMax, stream);
+ } else {
+ raft::random::uniform(
+ rng, X.data_handle(), params.data.rows * params.data.cols, kRangeMin, kRangeMax, stream);
+ }
+ handle.sync_stream(stream);
+ }
+
+ void allocate_temp_buffers(const ::benchmark::State& state) override
+ {
+ centroids =
+ raft::make_device_matrix<float, IndexT>(this->handle, params.n_lists, params.data.cols);
+ }
+
+ private:
+ KMeansBalancedBenchParams params;
+ raft::device_matrix<T, IndexT> X;
+ raft::device_matrix<float, IndexT> centroids;
+}; // struct KMeansBalanced
+
+std::vector<KMeansBalancedBenchParams> getKMeansBalancedInputs()
+{
+ std::vector<KMeansBalancedBenchParams> out;
+ KMeansBalancedBenchParams p;
+ p.data.row_major = true;
+ p.max_iter = 20;
+ p.metric = raft::distance::DistanceType::L2Expanded;
+ std::vector<std::pair<int, int>> row_cols = {
+ {100000, 128}, {1000000, 128}, {10000000, 128},
+ // The following dataset sizes are too large for most GPUs.
+ // {100000000, 128},
+ };
+ for (auto& rc : row_cols) {
+ p.data.rows = rc.first;
+ p.data.cols = rc.second;
+ for (auto n_lists : std::vector<int>({1000, 10000, 100000})) {
+ p.n_lists = n_lists;
+ out.push_back(p);
+ }
+ }
+ return out;
+}
+
+// Note: the datasets sizes are too large for 32-bit index types.
+RAFT_BENCH_REGISTER((KMeansBalanced<float>), "", getKMeansBalancedInputs());
+RAFT_BENCH_REGISTER((KMeansBalanced<int8_t>), "", getKMeansBalancedInputs());
+RAFT_BENCH_REGISTER((KMeansBalanced<uint8_t>), "", getKMeansBalancedInputs());
+
+} // namespace raft::bench::cluster
diff --git a/cpp/bench/common/benchmark.hpp b/cpp/bench/common/benchmark.hpp
index fb878a0c8d..adfe5218e2 100644
--- a/cpp/bench/common/benchmark.hpp
+++ b/cpp/bench/common/benchmark.hpp
@@ -18,9 +18,12 @@
#include
+#include
+#include
#include
#include
#include
+#include
#include
@@ -121,6 +124,10 @@ class fixture {
// every benchmark should be overriding this
virtual void run_benchmark(::benchmark::State& state) = 0;
virtual void generate_metrics(::benchmark::State& state) {}
+ virtual void allocate_data(const ::benchmark::State& state) {}
+ virtual void deallocate_data(const ::benchmark::State& state) {}
+ virtual void allocate_temp_buffers(const ::benchmark::State& state) {}
+ virtual void deallocate_temp_buffers(const ::benchmark::State& state) {}
protected:
/** The helper that writes zeroes to some buffer in GPU memory to flush the L2 cache. */
@@ -144,6 +151,58 @@ class fixture {
}
};
+/** Indicates the dataset size. */
+struct DatasetParams {
+ size_t rows;
+ size_t cols;
+ bool row_major;
+};
+
+/** Holds params needed to generate blobs dataset */
+struct BlobsParams {
+ int n_clusters;
+ double cluster_std;
+ bool shuffle;
+ double center_box_min, center_box_max;
+ uint64_t seed;
+};
+
+/** Fixture for cluster benchmarks using make_blobs */
+template <typename T, typename IndexT>
+class BlobsFixture : public fixture {
+ public:
+ BlobsFixture(const DatasetParams dp, const BlobsParams bp) : data_params(dp), blobs_params(bp) {}
+
+ virtual void run_benchmark(::benchmark::State& state) = 0;
+
+ void allocate_data(const ::benchmark::State& state) override
+ {
+ auto labels_ref = raft::make_device_vector<IndexT, IndexT>(this->handle, data_params.rows);
+ X = raft::make_device_matrix<T, IndexT>(this->handle, data_params.rows, data_params.cols);
+
+ raft::random::make_blobs(X.data_handle(),
+ labels_ref.data_handle(),
+ (IndexT)data_params.rows,
+ (IndexT)data_params.cols,
+ (IndexT)blobs_params.n_clusters,
+ stream,
+ data_params.row_major,
+ nullptr,
+ nullptr,
+ (T)blobs_params.cluster_std,
+ blobs_params.shuffle,
+ (T)blobs_params.center_box_min,
+ (T)blobs_params.center_box_max,
+ blobs_params.seed);
+ this->handle.sync_stream(stream);
+ }
+
+ protected:
+ DatasetParams data_params;
+ BlobsParams blobs_params;
+ raft::device_matrix<T, IndexT> X;
+};
+
namespace internal {
template <typename Class, typename... Params>
@@ -162,8 +221,17 @@ class Fixture : public ::benchmark::Fixture {
{
fixture_ =
std::apply([](const Params&... ps) { return std::make_unique<Class>(ps...); }, params_);
+ fixture_->allocate_data(state);
+ fixture_->allocate_temp_buffers(state);
+ }
+
+ void TearDown(const State& state) override
+ {
+ fixture_->deallocate_temp_buffers(state);
+ fixture_->deallocate_data(state);
+ fixture_.reset();
}
- void TearDown(const State& state) override { fixture_.reset(); }
+
void SetUp(State& st) override { SetUp(const_cast<const State&>(st)); }
void TearDown(State& st) override { TearDown(const_cast<const State&>(st)); }
@@ -248,6 +316,10 @@ struct registrar {
}; // namespace internal
+#define RAFT_BENCH_REGISTER_INTERNAL(TestClass, ...) \
+ static raft::bench::internal::registrar<TestClass> BENCHMARK_PRIVATE_NAME(registrar)( \
+ RAFT_STRINGIFY(TestClass), __VA_ARGS__)
+
/**
* This is the entry point macro for all benchmarks. This needs to be called
* for the set of benchmarks to be registered so that the main harness inside
@@ -262,8 +334,7 @@ struct registrar {
* empty string
* @param params... zero or more lists of params upon which to benchmark.
*/
-#define RAFT_BENCH_REGISTER(TestClass, ...) \
- static raft::bench::internal::registrar<TestClass> BENCHMARK_PRIVATE_NAME(registrar)( \
- #TestClass, __VA_ARGS__)
+#define RAFT_BENCH_REGISTER(TestClass, ...) \
+ RAFT_BENCH_REGISTER_INTERNAL(RAFT_DEPAREN(TestClass), __VA_ARGS__)
} // namespace raft::bench
diff --git a/cpp/bench/distance/distance_common.cuh b/cpp/bench/distance/distance_common.cuh
index dae2550326..4f1a8ccab1 100644
--- a/cpp/bench/distance/distance_common.cuh
+++ b/cpp/bench/distance/distance_common.cuh
@@ -16,9 +16,9 @@
#include
#include
-#include
+#include
#if defined RAFT_DISTANCE_COMPILED
-#include
+#include
#endif
#include
diff --git a/cpp/bench/linalg/add.cu b/cpp/bench/linalg/add.cu
index 7c651b61ed..7d00b8cbae 100644
--- a/cpp/bench/linalg/add.cu
+++ b/cpp/bench/linalg/add.cu
@@ -15,7 +15,7 @@
*/
#include
-#include
+#include
#include
namespace raft::bench::linalg {
diff --git a/cpp/bench/linalg/map_then_reduce.cu b/cpp/bench/linalg/map_then_reduce.cu
index 7eeb4a79b6..33a3e66264 100644
--- a/cpp/bench/linalg/map_then_reduce.cu
+++ b/cpp/bench/linalg/map_then_reduce.cu
@@ -15,7 +15,7 @@
*/
#include
-#include
+#include
#include
namespace raft::bench::linalg {
diff --git a/cpp/bench/linalg/matrix_vector_op.cu b/cpp/bench/linalg/matrix_vector_op.cu
index d3a53ea345..aa8f2667ed 100644
--- a/cpp/bench/linalg/matrix_vector_op.cu
+++ b/cpp/bench/linalg/matrix_vector_op.cu
@@ -15,7 +15,7 @@
*/
#include
-#include
+#include
#include
namespace raft::bench::linalg {
diff --git a/cpp/bench/linalg/reduce.cu b/cpp/bench/linalg/reduce.cu
index 018086a689..015e0b8abe 100644
--- a/cpp/bench/linalg/reduce.cu
+++ b/cpp/bench/linalg/reduce.cu
@@ -15,7 +15,7 @@
*/
#include
-#include
+#include
#include
diff --git a/cpp/bench/random/make_blobs.cu b/cpp/bench/random/make_blobs.cu
index c449223040..fdd4ef61d2 100644
--- a/cpp/bench/random/make_blobs.cu
+++ b/cpp/bench/random/make_blobs.cu
@@ -15,7 +15,7 @@
*/
#include
-#include
+#include
#include
#include
diff --git a/cpp/bench/random/permute.cu b/cpp/bench/random/permute.cu
index a72eca3f87..5364bb44e3 100644
--- a/cpp/bench/random/permute.cu
+++ b/cpp/bench/random/permute.cu
@@ -16,7 +16,7 @@
#include
#include
-#include
+#include
#include
#include
diff --git a/cpp/bench/sparse/convert_csr.cu b/cpp/bench/sparse/convert_csr.cu
index 0e701518ab..830fab13cc 100644
--- a/cpp/bench/sparse/convert_csr.cu
+++ b/cpp/bench/sparse/convert_csr.cu
@@ -14,8 +14,6 @@
* limitations under the License.
*/
-#include
-#include
#include
#include
diff --git a/cpp/bench/spatial/fused_l2_nn.cu b/cpp/bench/spatial/fused_l2_nn.cu
index dc3b507fbf..aa36483145 100644
--- a/cpp/bench/spatial/fused_l2_nn.cu
+++ b/cpp/bench/spatial/fused_l2_nn.cu
@@ -17,14 +17,17 @@
#include
#include
#include
-#include
+#include
#include
-#include
+#include
#include
-#if defined RAFT_NN_COMPILED
-#include
-#endif
+// TODO: Once fusedL2NN is specialized in the raft_distance shared library, add
+// back
+//
+// #if defined RAFT_NN_COMPILED
+// #include
+// #endif
namespace raft::bench::spatial {
@@ -73,6 +76,30 @@ struct fused_l2_nn : public fixture {
false,
stream);
});
+
+ // Num distance calculations
+ int64_t num_dist_calcs = (int64_t)params.n * (int64_t)params.m;
+
+ int64_t num_flops = 3 * num_dist_calcs * params.k;
+
+ int64_t read_elts = (int64_t)params.n * params.k + (int64_t)params.m * params.k;
+ int64_t write_elts = (int64_t)params.n;
+
+ state.counters["D/s"] = benchmark::Counter(num_dist_calcs,
+ benchmark::Counter::kIsIterationInvariantRate,
+ benchmark::Counter::OneK::kIs1000);
+
+ state.counters["FLOP/s"] = benchmark::Counter(
+ num_flops, benchmark::Counter::kIsIterationInvariantRate, benchmark::Counter::OneK::kIs1000);
+
+ state.counters["BW Wr"] = benchmark::Counter(write_elts * sizeof(cub::KeyValuePair),
+ benchmark::Counter::kIsIterationInvariantRate,
+ benchmark::Counter::OneK::kIs1000);
+ state.counters["BW Rd"] = benchmark::Counter(read_elts * sizeof(float),
+ benchmark::Counter::kIsIterationInvariantRate,
+ benchmark::Counter::OneK::kIs1000);
+
+ state.counters["K"] = benchmark::Counter(params.k);
}
private:
@@ -88,9 +115,9 @@ const std::vector fused_l2_nn_input_vecs = {
{32, 16384, 16384}, {64, 16384, 16384}, {128, 16384, 16384}, {256, 16384, 16384},
{512, 16384, 16384}, {1024, 16384, 16384}, {16384, 32, 16384}, {16384, 64, 16384},
{16384, 128, 16384}, {16384, 256, 16384}, {16384, 512, 16384}, {16384, 1024, 16384},
+ {16384, 16384, 2}, {16384, 16384, 4}, {16384, 16384, 8}, {16384, 16384, 16},
{16384, 16384, 32}, {16384, 16384, 64}, {16384, 16384, 128}, {16384, 16384, 256},
{16384, 16384, 512}, {16384, 16384, 1024}, {16384, 16384, 16384},
-
};
RAFT_BENCH_REGISTER(fused_l2_nn, "", fused_l2_nn_input_vecs);
diff --git a/cpp/bench/spatial/knn.cu b/cpp/bench/spatial/knn.cuh
similarity index 85%
rename from cpp/bench/spatial/knn.cu
rename to cpp/bench/spatial/knn.cuh
index 64a1217d7f..bb01320cdf 100644
--- a/cpp/bench/spatial/knn.cu
+++ b/cpp/bench/spatial/knn.cuh
@@ -14,13 +14,25 @@
* limitations under the License.
*/
+#pragma once
+
#include
#include
#include
+#include
+#include
+
+#if defined RAFT_DISTANCE_COMPILED
+#include
+#endif
+
#if defined RAFT_NN_COMPILED
#include
+#if defined RAFT_DISTANCE_COMPILED
+#include
+#endif
#endif
#include
@@ -44,16 +56,16 @@ struct params {
size_t k;
};
-auto operator<<(std::ostream& os, const params& p) -> std::ostream&
+inline auto operator<<(std::ostream& os, const params& p) -> std::ostream&
{
os << p.n_samples << "#" << p.n_dims << "#" << p.n_queries << "#" << p.k;
return os;
}
-enum class TransferStrategy { NO_COPY, COPY_PLAIN, COPY_PINNED, MAP_PINNED, MANAGED };
-enum class Scope { BUILD, SEARCH, BUILD_SEARCH };
+enum class TransferStrategy { NO_COPY, COPY_PLAIN, COPY_PINNED, MAP_PINNED, MANAGED }; // NOLINT
+enum class Scope { BUILD, SEARCH, BUILD_SEARCH }; // NOLINT
-auto operator<<(std::ostream& os, const TransferStrategy& ts) -> std::ostream&
+inline auto operator<<(std::ostream& os, const TransferStrategy& ts) -> std::ostream&
{
switch (ts) {
case TransferStrategy::NO_COPY: os << "NO_COPY"; break;
@@ -66,7 +78,7 @@ auto operator<<(std::ostream& os, const TransferStrategy& ts) -> std::ostream&
return os;
}
-auto operator<<(std::ostream& os, const Scope& s) -> std::ostream&
+inline auto operator<<(std::ostream& os, const Scope& s) -> std::ostream&
{
switch (s) {
case Scope::BUILD: os << "BUILD"; break;
@@ -155,6 +167,34 @@ struct ivf_flat_knn {
}
};
+template <typename ValT, typename IdxT>
+struct ivf_pq_knn {
+ using dist_t = float;
+
+ std::optional<const raft::spatial::knn::ivf_pq::index<IdxT>> index;
+ raft::spatial::knn::ivf_pq::index_params index_params;
+ raft::spatial::knn::ivf_pq::search_params search_params;
+ params ps;
+
+ ivf_pq_knn(const raft::handle_t& handle, const params& ps, const ValT* data) : ps(ps)
+ {
+ index_params.n_lists = 4096;
+ index_params.metric = raft::distance::DistanceType::L2Expanded;
+ index.emplace(raft::spatial::knn::ivf_pq::build(
+ handle, index_params, data, IdxT(ps.n_samples), uint32_t(ps.n_dims)));
+ }
+
+ void search(const raft::handle_t& handle,
+ const ValT* search_items,
+ dist_t* out_dists,
+ IdxT* out_idxs)
+ {
+ search_params.n_probes = 20;
+ raft::spatial::knn::ivf_pq::search(
+ handle, search_params, *index, search_items, ps.n_queries, ps.k, out_idxs, out_dists);
+ }
+};
+
template <typename ValT, typename IdxT>
struct brute_force_knn {
using dist_t = ValT;
@@ -216,7 +256,7 @@ struct knn : public fixture {
}
template <typename T>
- void gen_data(raft::random::RngState& state,
+ void gen_data(raft::random::RngState& state, // NOLINT
rmm::device_uvector<T>& vec,
size_t n,
rmm::cuda_stream_view stream)
@@ -337,15 +377,15 @@ struct knn : public fixture {
rmm::device_uvector out_idxs_;
};
-const std::vector<params> kInputs{
+inline const std::vector<params> kInputs{
{2000000, 128, 1000, 32}, {10000000, 128, 1000, 32}, {10000, 8192, 1000, 32}};
-const std::vector<TransferStrategy> kAllStrategies{
+inline const std::vector<TransferStrategy> kAllStrategies{
TransferStrategy::NO_COPY, TransferStrategy::MAP_PINNED, TransferStrategy::MANAGED};
-const std::vector<TransferStrategy> kNoCopyOnly{TransferStrategy::NO_COPY};
+inline const std::vector<TransferStrategy> kNoCopyOnly{TransferStrategy::NO_COPY};
-const std::vector<Scope> kScopeFull{Scope::BUILD_SEARCH};
-const std::vector<Scope> kAllScopes{Scope::BUILD_SEARCH, Scope::SEARCH, Scope::BUILD};
+inline const std::vector<Scope> kScopeFull{Scope::BUILD_SEARCH};
+inline const std::vector<Scope> kAllScopes{Scope::BUILD_SEARCH, Scope::SEARCH, Scope::BUILD};
#define KNN_REGISTER(ValT, IdxT, ImplT, inputs, strats, scope) \
namespace BENCHMARK_PRIVATE_NAME(knn) \
@@ -354,14 +394,4 @@ const std::vector kAllScopes{Scope::BUILD_SEARCH, Scope::SEARCH, Scope::B
RAFT_BENCH_REGISTER(KNN, #ValT "/" #IdxT "/" #ImplT, inputs, strats, scope); \
}
-KNN_REGISTER(float, int64_t, brute_force_knn, kInputs, kAllStrategies, kScopeFull);
-KNN_REGISTER(float, int64_t, ivf_flat_knn, kInputs, kNoCopyOnly, kAllScopes);
-KNN_REGISTER(int8_t, int64_t, ivf_flat_knn, kInputs, kNoCopyOnly, kAllScopes);
-KNN_REGISTER(uint8_t, int64_t, ivf_flat_knn, kInputs, kNoCopyOnly, kAllScopes);
-
-KNN_REGISTER(float, uint32_t, brute_force_knn, kInputs, kNoCopyOnly, kScopeFull);
-KNN_REGISTER(float, uint32_t, ivf_flat_knn, kInputs, kNoCopyOnly, kAllScopes);
-KNN_REGISTER(int8_t, uint32_t, ivf_flat_knn, kInputs, kNoCopyOnly, kAllScopes);
-KNN_REGISTER(uint8_t, uint32_t, ivf_flat_knn, kInputs, kNoCopyOnly, kAllScopes);
-
} // namespace raft::bench::spatial
diff --git a/cpp/bench/spatial/knn/brute_force_float_int64_t.cu b/cpp/bench/spatial/knn/brute_force_float_int64_t.cu
new file mode 100644
index 0000000000..d981104e20
--- /dev/null
+++ b/cpp/bench/spatial/knn/brute_force_float_int64_t.cu
@@ -0,0 +1,23 @@
+/*
+ * Copyright (c) 2022, NVIDIA CORPORATION.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+#include "../knn.cuh"
+
+namespace raft::bench::spatial {
+
+KNN_REGISTER(float, int64_t, brute_force_knn, kInputs, kAllStrategies, kScopeFull);
+
+} // namespace raft::bench::spatial
diff --git a/cpp/bench/spatial/knn/brute_force_float_uint32_t.cu b/cpp/bench/spatial/knn/brute_force_float_uint32_t.cu
new file mode 100644
index 0000000000..60f7edae96
--- /dev/null
+++ b/cpp/bench/spatial/knn/brute_force_float_uint32_t.cu
@@ -0,0 +1,23 @@
+/*
+ * Copyright (c) 2022, NVIDIA CORPORATION.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+#include "../knn.cuh"
+
+namespace raft::bench::spatial {
+
+KNN_REGISTER(float, uint32_t, brute_force_knn, kInputs, kAllStrategies, kScopeFull);
+
+} // namespace raft::bench::spatial
diff --git a/cpp/bench/spatial/knn/ivf_flat_float_int64_t.cu b/cpp/bench/spatial/knn/ivf_flat_float_int64_t.cu
new file mode 100644
index 0000000000..594d4d16d2
--- /dev/null
+++ b/cpp/bench/spatial/knn/ivf_flat_float_int64_t.cu
@@ -0,0 +1,23 @@
+/*
+ * Copyright (c) 2022, NVIDIA CORPORATION.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+#include "../knn.cuh"
+
+namespace raft::bench::spatial {
+
+KNN_REGISTER(float, int64_t, ivf_flat_knn, kInputs, kNoCopyOnly, kAllScopes);
+
+} // namespace raft::bench::spatial
diff --git a/cpp/bench/spatial/knn/ivf_flat_float_uint32_t.cu b/cpp/bench/spatial/knn/ivf_flat_float_uint32_t.cu
new file mode 100644
index 0000000000..595ad2b922
--- /dev/null
+++ b/cpp/bench/spatial/knn/ivf_flat_float_uint32_t.cu
@@ -0,0 +1,23 @@
+/*
+ * Copyright (c) 2022, NVIDIA CORPORATION.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+#include "../knn.cuh"
+
+namespace raft::bench::spatial {
+
+KNN_REGISTER(float, uint32_t, ivf_flat_knn, kInputs, kNoCopyOnly, kAllScopes);
+
+} // namespace raft::bench::spatial
diff --git a/cpp/bench/spatial/knn/ivf_flat_int8_t_int64_t.cu b/cpp/bench/spatial/knn/ivf_flat_int8_t_int64_t.cu
new file mode 100644
index 0000000000..bd268f036c
--- /dev/null
+++ b/cpp/bench/spatial/knn/ivf_flat_int8_t_int64_t.cu
@@ -0,0 +1,23 @@
+/*
+ * Copyright (c) 2022, NVIDIA CORPORATION.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+#include "../knn.cuh"
+
+namespace raft::bench::spatial {
+
+KNN_REGISTER(int8_t, int64_t, ivf_flat_knn, kInputs, kNoCopyOnly, kAllScopes);
+
+} // namespace raft::bench::spatial
diff --git a/cpp/bench/spatial/knn/ivf_flat_uint8_t_uint32_t.cu b/cpp/bench/spatial/knn/ivf_flat_uint8_t_uint32_t.cu
new file mode 100644
index 0000000000..9d8b982c3e
--- /dev/null
+++ b/cpp/bench/spatial/knn/ivf_flat_uint8_t_uint32_t.cu
@@ -0,0 +1,23 @@
+/*
+ * Copyright (c) 2022, NVIDIA CORPORATION.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+#include "../knn.cuh"
+
+namespace raft::bench::spatial {
+
+KNN_REGISTER(uint8_t, uint32_t, ivf_flat_knn, kInputs, kNoCopyOnly, kAllScopes);
+
+} // namespace raft::bench::spatial
diff --git a/cpp/bench/spatial/knn/ivf_pq_float_int64_t.cu b/cpp/bench/spatial/knn/ivf_pq_float_int64_t.cu
new file mode 100644
index 0000000000..18d8cd8ad6
--- /dev/null
+++ b/cpp/bench/spatial/knn/ivf_pq_float_int64_t.cu
@@ -0,0 +1,23 @@
+/*
+ * Copyright (c) 2022, NVIDIA CORPORATION.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+#include "../knn.cuh"
+
+namespace raft::bench::spatial {
+
+KNN_REGISTER(float, int64_t, ivf_pq_knn, kInputs, kNoCopyOnly, kAllScopes);
+
+} // namespace raft::bench::spatial
diff --git a/cpp/bench/spatial/knn/ivf_pq_float_uint32_t.cu b/cpp/bench/spatial/knn/ivf_pq_float_uint32_t.cu
new file mode 100644
index 0000000000..81621674bf
--- /dev/null
+++ b/cpp/bench/spatial/knn/ivf_pq_float_uint32_t.cu
@@ -0,0 +1,23 @@
+/*
+ * Copyright (c) 2022, NVIDIA CORPORATION.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+#include "../knn.cuh"
+
+namespace raft::bench::spatial {
+
+KNN_REGISTER(float, uint32_t, ivf_pq_knn, kInputs, kNoCopyOnly, kAllScopes);
+
+} // namespace raft::bench::spatial
diff --git a/cpp/bench/spatial/knn/ivf_pq_int8_t_int64_t.cu b/cpp/bench/spatial/knn/ivf_pq_int8_t_int64_t.cu
new file mode 100644
index 0000000000..cc28eee67c
--- /dev/null
+++ b/cpp/bench/spatial/knn/ivf_pq_int8_t_int64_t.cu
@@ -0,0 +1,23 @@
+/*
+ * Copyright (c) 2022, NVIDIA CORPORATION.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+#include "../knn.cuh"
+
+namespace raft::bench::spatial {
+
+KNN_REGISTER(int8_t, int64_t, ivf_pq_knn, kInputs, kNoCopyOnly, kAllScopes);
+
+} // namespace raft::bench::spatial
diff --git a/cpp/bench/spatial/knn/ivf_pq_uint8_t_uint32_t.cu b/cpp/bench/spatial/knn/ivf_pq_uint8_t_uint32_t.cu
new file mode 100644
index 0000000000..b4759cbac1
--- /dev/null
+++ b/cpp/bench/spatial/knn/ivf_pq_uint8_t_uint32_t.cu
@@ -0,0 +1,23 @@
+/*
+ * Copyright (c) 2022, NVIDIA CORPORATION.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+#include "../knn.cuh"
+
+namespace raft::bench::spatial {
+
+KNN_REGISTER(uint8_t, uint32_t, ivf_pq_knn, kInputs, kNoCopyOnly, kAllScopes);
+
+} // namespace raft::bench::spatial
diff --git a/cpp/bench/spatial/selection.cu b/cpp/bench/spatial/selection.cu
index c3a2bc6d3d..1f116c199f 100644
--- a/cpp/bench/spatial/selection.cu
+++ b/cpp/bench/spatial/selection.cu
@@ -18,7 +18,7 @@
#include
#if defined RAFT_NN_COMPILED
-#include
+#include
#endif
#include
diff --git a/cpp/cmake/thirdparty/get_faiss.cmake b/cpp/cmake/thirdparty/get_faiss.cmake
index f61ba7014c..e6f06a00a5 100644
--- a/cpp/cmake/thirdparty/get_faiss.cmake
+++ b/cpp/cmake/thirdparty/get_faiss.cmake
@@ -15,7 +15,7 @@
#=============================================================================
function(find_and_configure_faiss)
- set(oneValueArgs VERSION PINNED_TAG BUILD_STATIC_LIBS EXCLUDE_FROM_ALL)
+ set(oneValueArgs VERSION REPOSITORY PINNED_TAG BUILD_STATIC_LIBS EXCLUDE_FROM_ALL)
cmake_parse_arguments(PKG "${options}" "${oneValueArgs}"
"${multiValueArgs}" ${ARGN} )
@@ -25,15 +25,16 @@ function(find_and_configure_faiss)
LIBRARY_NAMES faiss
)
- set(BUILD_SHARED_LIBS OFF)
- if (NOT PKG_BUILD_STATIC_LIBS)
- set(BUILD_SHARED_LIBS ON)
+ set(BUILD_SHARED_LIBS ON)
+ if (PKG_BUILD_STATIC_LIBS)
+ set(BUILD_SHARED_LIBS OFF)
+ set(CPM_DOWNLOAD_faiss ON)
endif()
rapids_cpm_find(faiss ${PKG_VERSION}
GLOBAL_TARGETS faiss::faiss
CPM_ARGS
- GIT_REPOSITORY https://github.com/facebookresearch/faiss.git
+ GIT_REPOSITORY ${PKG_REPOSITORY}
GIT_TAG ${PKG_PINNED_TAG}
EXCLUDE_FROM_ALL ${PKG_EXCLUDE_FROM_ALL}
OPTIONS
@@ -42,6 +43,7 @@ function(find_and_configure_faiss)
"FAISS_ENABLE_GPU ON"
"BUILD_TESTING OFF"
"CMAKE_MESSAGE_LOG_LEVEL VERBOSE"
+ "FAISS_USE_CUDA_TOOLKIT_STATIC ${CUDA_STATIC_RUNTIME}"
)
if(TARGET faiss AND NOT TARGET faiss::faiss)
@@ -66,7 +68,22 @@ function(find_and_configure_faiss)
rapids_export_find_package_root(BUILD faiss [=[${CMAKE_CURRENT_LIST_DIR}]=] raft-nn-lib-exports)
endfunction()
+if(NOT RAFT_FAISS_GIT_TAG)
+ # TODO: Remove this once faiss supports FAISS_USE_CUDA_TOOLKIT_STATIC
+ # (https://github.com/facebookresearch/faiss/pull/2446)
+ set(RAFT_FAISS_GIT_TAG fea/statically-link-ctk-v1.7.0)
+ # set(RAFT_FAISS_GIT_TAG bde7c0027191f29c9dadafe4f6e68ca0ee31fb30)
+endif()
+
+if(NOT RAFT_FAISS_GIT_REPOSITORY)
+ # TODO: Remove this once faiss supports FAISS_USE_CUDA_TOOLKIT_STATIC
+ # (https://github.com/facebookresearch/faiss/pull/2446)
+ set(RAFT_FAISS_GIT_REPOSITORY https://github.com/trxcllnt/faiss.git)
+ # set(RAFT_FAISS_GIT_REPOSITORY https://github.com/facebookresearch/faiss.git)
+endif()
+
find_and_configure_faiss(VERSION 1.7.0
- PINNED_TAG bde7c0027191f29c9dadafe4f6e68ca0ee31fb30
+ REPOSITORY ${RAFT_FAISS_GIT_REPOSITORY}
+ PINNED_TAG ${RAFT_FAISS_GIT_TAG}
BUILD_STATIC_LIBS ${RAFT_USE_FAISS_STATIC}
EXCLUDE_FROM_ALL ${RAFT_EXCLUDE_FAISS_FROM_ALL})
diff --git a/cpp/doxygen/Doxyfile.in b/cpp/doxygen/Doxyfile.in
index 6f29e79146..549862600a 100644
--- a/cpp/doxygen/Doxyfile.in
+++ b/cpp/doxygen/Doxyfile.in
@@ -880,7 +880,27 @@ RECURSIVE = YES
# run.
EXCLUDE = @CMAKE_CURRENT_SOURCE_DIR@/include/raft/sparse/linalg/symmetrize.hpp \
- \
+ @CMAKE_CURRENT_SOURCE_DIR@/include/raft/cache \
+ @CMAKE_CURRENT_SOURCE_DIR@/include/raft/common \
+ @CMAKE_CURRENT_SOURCE_DIR@/include/raft/lap \
+ @CMAKE_CURRENT_SOURCE_DIR@/include/raft/sparse/selection \
+ @CMAKE_CURRENT_SOURCE_DIR@/include/raft/sparse/csr.hpp \
+ @CMAKE_CURRENT_SOURCE_DIR@/include/raft/linalg/lanczos.cuh \
+ @CMAKE_CURRENT_SOURCE_DIR@/include/raft/linalg/lanczos.hpp \
+ @CMAKE_CURRENT_SOURCE_DIR@/include/raft/cuda_utils.cuh \
+ @CMAKE_CURRENT_SOURCE_DIR@/include/raft/cudart_utils.h \
+ @CMAKE_CURRENT_SOURCE_DIR@/include/raft/util/device_atomics.cuh \
+ @CMAKE_CURRENT_SOURCE_DIR@/include/raft/device_utils.cuh \
+ @CMAKE_CURRENT_SOURCE_DIR@/include/raft/error.hpp \
+ @CMAKE_CURRENT_SOURCE_DIR@/include/raft/handle.hpp \
+ @CMAKE_CURRENT_SOURCE_DIR@/include/raft/integer_utils.h \
+ @CMAKE_CURRENT_SOURCE_DIR@/include/raft/interruptible.hpp \
+ @CMAKE_CURRENT_SOURCE_DIR@/include/raft/mdarray.hpp \
+ @CMAKE_CURRENT_SOURCE_DIR@/include/raft/pow2_utils.cuh \
+ @CMAKE_CURRENT_SOURCE_DIR@/include/raft/span.hpp \
+ @CMAKE_CURRENT_SOURCE_DIR@/include/raft/vectorized.cuh \
+ @CMAKE_CURRENT_SOURCE_DIR@/include/raft/raft.hpp \
+ @CMAKE_CURRENT_SOURCE_DIR@/include/raft/core/cudart_utils.hpp
# The EXCLUDE_SYMLINKS tag can be used to select whether or not files or
# directories that are symbolic links (a Unix file system feature) are excluded
diff --git a/cpp/include/raft.hpp b/cpp/include/raft.hpp
index b1b8255b7e..6a4f323c58 100644
--- a/cpp/include/raft.hpp
+++ b/cpp/include/raft.hpp
@@ -15,11 +15,12 @@
*/
/**
- * This file is deprecated and will be removed in release 22.06.
+ * This file is deprecated and will be removed in a future release.
*/
-#include "raft/handle.hpp"
-#include "raft/mdarray.hpp"
-#include "raft/span.hpp"
+#include "raft/core/device_mdarray.hpp"
+#include "raft/core/device_mdspan.hpp"
+#include "raft/core/device_span.hpp"
+#include "raft/core/handle.hpp"
#include
diff --git a/cpp/include/raft/cache/cache_util.cuh b/cpp/include/raft/cache/cache_util.cuh
index 3e2222eff1..60da09ca7c 100644
--- a/cpp/include/raft/cache/cache_util.cuh
+++ b/cpp/include/raft/cache/cache_util.cuh
@@ -1,5 +1,5 @@
/*
- * Copyright (c) 2019-2022, NVIDIA CORPORATION.
+ * Copyright (c) 2020-2022, NVIDIA CORPORATION.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
@@ -13,356 +13,19 @@
* See the License for the specific language governing permissions and
* limitations under the License.
*/
-
-#pragma once
-
-#include
-#include
-
-namespace raft {
-namespace cache {
-
-/**
- * @brief Collect vectors of data from the cache into a contiguous memory buffer.
- *
- * We assume contiguous memory layout for the output buffer, i.e. we get
- * column vectors into a column major out buffer, or row vectors into a row
- * major output buffer.
- *
- * On exit, the output array is filled the following way:
- * out[i + n_vec*k] = cache[i + n_vec * cache_idx[k]]), where i=0..n_vec-1, and
- * k = 0..n-1 where cache_idx[k] >= 0
- *
- * We ignore vectors where cache_idx[k] < 0.
- *
- * @param [in] cache stores the cached data, size [n_vec x n_cached_vectors]
- * @param [in] n_vec number of elements in a cached vector
- * @param [in] cache_idx cache indices, size [n]
- * @param [in] n the number of elements that need to be collected
- * @param [out] out vectors collected from the cache, size [n_vec * n]
- */
-template
-__global__ void get_vecs(
- const math_t* cache, int_t n_vec, const idx_t* cache_idx, int_t n, math_t* out)
-{
- int tid = threadIdx.x + blockIdx.x * blockDim.x;
- int row = tid % n_vec; // row idx
- if (tid < n_vec * n) {
- size_t out_col = tid / n_vec; // col idx
- size_t cache_col = cache_idx[out_col];
- if (cache_idx[out_col] >= 0) {
- if (row + out_col * n_vec < (size_t)n_vec * n) { out[tid] = cache[row + cache_col * n_vec]; }
- }
- }
-}
-
-/**
- * @brief Store vectors of data into the cache.
- *
- * Elements within a vector should be contiguous in memory (i.e. column vectors
- * for column major data storage, or row vectors of row major data).
- *
- * If tile_idx==nullptr then the operation is the opposite of get_vecs,
- * i.e. we store
- * cache[i + cache_idx[k]*n_vec] = tile[i + k*n_vec], for i=0..n_vec-1, k=0..n-1
- *
- * If tile_idx != nullptr, then we permute the vectors from tile according
- * to tile_idx. This allows to store vectors from a buffer where the individual
- * vectors are not stored contiguously (but the elements of each vector shall
- * be contiguous):
- * cache[i + cache_idx[k]*n_vec] = tile[i + tile_idx[k]*n_vec],
- * for i=0..n_vec-1, k=0..n-1
- *
- * @param [in] tile stores the data to be cashed cached, size [n_vec x n_tile]
- * @param [in] n_tile number of vectors in the input tile
- * @param [in] n_vec number of elements in a cached vector
- * @param [in] tile_idx indices of vectors that we want to store
- * @param [in] n number of vectos that we want to store (n <= n_tile)
- * @param [in] cache_idx cache indices, size [n], negative values are ignored
- * @param [inout] cache updated cache
- * @param [in] n_cache_vecs
- */
-template
-__global__ void store_vecs(const math_t* tile,
- int n_tile,
- int n_vec,
- const int* tile_idx,
- int n,
- const int* cache_idx,
- math_t* cache,
- int n_cache_vecs)
-{
- int tid = threadIdx.x + blockIdx.x * blockDim.x;
- int row = tid % n_vec; // row idx
- if (tid < n_vec * n) {
- int tile_col = tid / n_vec; // col idx
- int data_col = tile_idx ? tile_idx[tile_col] : tile_col;
- int cache_col = cache_idx[tile_col];
-
- // We ignore negative values. The rest of the checks should be fulfilled
- // if the cache is used properly
- if (cache_col >= 0 && cache_col < n_cache_vecs && data_col < n_tile) {
- cache[row + (size_t)cache_col * n_vec] = tile[row + (size_t)data_col * n_vec];
- }
- }
-}
-
-/**
- * @brief Map a key to a cache set.
- *
- * @param key key to be hashed
- * @param n_cache_sets number of cache sets
- * @return index of the cache set [0..n_cache_set)
- */
-int DI hash(int key, int n_cache_sets) { return key % n_cache_sets; }
-
-/**
- * @brief Binary search to find the first element in the array which is greater
- * equal than a given value.
- * @param [in] array sorted array of n numbers
- * @param [in] n length of the array
- * @param [in] val the value to search for
- * @return the index of the first element in the array for which
- * array[idx] >= value. If there is no such value, then return n.
- */
-int DI arg_first_ge(const int* array, int n, int val)
-{
- int start = 0;
- int end = n - 1;
- if (array[0] == val) return 0;
- if (array[end] < val) return n;
- while (start + 1 < end) {
- int q = (start + end + 1) / 2;
- // invariants:
- // start < end
- // start < q <=end
- // array[start] < val && array[end] <=val
- // at every iteration d = end-start is decreasing
- // when d==0, then array[end] will be the first element >= val.
- if (array[q] >= val) {
- end = q;
- } else {
- start = q;
- }
- }
- return end;
-}
-/**
- * @brief Find the k-th occurrence of value in a sorted array.
- *
- * Assume that array is [0, 1, 1, 1, 2, 2, 4, 4, 4, 4, 6, 7]
- * then find_nth_occurrence(cset, 12, 4, 2) == 7, because cset_array[7] stores
- * the second element with value = 4.
- * If there are less than k values in the array, then return -1
- *
- * @param [in] array sorted array of numbers, size [n]
- * @param [in] n number of elements in the array
- * @param [in] val the value we are searching for
- * @param [in] k
- * @return the idx of the k-th occurance of val in array, or -1 if
- * the value is not found.
- */
-int DI find_nth_occurrence(const int* array, int n, int val, int k)
-{
- int q = arg_first_ge(array, n, val);
- if (q + k < n && array[q + k] == val) {
- q += k;
- } else {
- q = -1;
- }
- return q;
-}
-
/**
- * @brief Rank the entries in a cache set according to the time stamp, return
- * the indices that would sort the time stamp in ascending order.
- *
- * Assume we have a single cache set with time stamps as:
- * key (threadIdx.x): 0 1 2 3
- * val (time stamp): 8 6 7 5
- *
- * The corresponding sorted key-value pairs:
- * key: 3 1 2 0
- * val: 5 6 7 8
- * rank: 0th 1st 2nd 3rd
- *
- * On return, the rank is assigned for each thread:
- * threadIdx.x: 0 1 2 3
- * rank: 3 1 2 0
- *
- * For multiple cache sets, launch one block per cache set.
- *
- * @tparam nthreads number of threads per block (nthreads <= associativity)
- * @tparam associativity number of items in a cache set
- *
- * @param [in] cache_time time stamp of caching the data,
- size [associativity * n_cache_sets]
- * @param [in] n_cache_sets number of cache sets
- * @param [out] rank within the cache set size [nthreads * items_per_thread]
- * Each block should give a different pointer for rank.
+ * This file is deprecated and will be removed in release 22.06.
+ * Please use the cuh version instead.
*/
-template
-DI void rank_set_entries(const int* cache_time, int n_cache_sets, int* rank)
-{
- const int items_per_thread = raft::ceildiv(associativity, nthreads);
- typedef cub::BlockRadixSort BlockRadixSort;
- __shared__ typename BlockRadixSort::TempStorage temp_storage;
-
- int key[items_per_thread];
- int val[items_per_thread];
-
- int block_offset = blockIdx.x * associativity;
-
- for (int j = 0; j < items_per_thread; j++) {
- int k = threadIdx.x + j * nthreads;
- int t = (k < associativity) ? cache_time[block_offset + k] : 32768;
- key[j] = t;
- val[j] = k;
- }
-
- BlockRadixSort(temp_storage).Sort(key, val);
-
- for (int j = 0; j < items_per_thread; j++) {
- if (val[j] < associativity) { rank[val[j]] = threadIdx.x * items_per_thread + j; }
- }
- __syncthreads();
-}
/**
- * @brief Assign cache location to a set of keys using LRU replacement policy.
- *
- * The keys and the corresponding cache_set arrays shall be sorted according
- * to cache_set in ascending order. One block should be launched for every cache
- * set.
- *
- * Each cache set is sorted according to time_stamp, and values from keys
- * are filled in starting at the oldest time stamp. Entries that were accessed
- * at the current time are not reassigned.
- *
- * @tparam nthreads number of threads per block
- * @tparam associativity number of keys in a cache set
- *
- * @param [in] keys that we want to cache size [n]
- * @param [in] n number of keys
- * @param [in] cache_set assigned to keys, size [n]
- * @param [inout] cached_keys keys of already cached vectors,
- * size [n_cache_sets*associativity], on exit it will be updated with the
- * cached elements from keys.
- * @param [in] n_cache_sets number of cache sets
- * @param [inout] cache_time will be updated to "time" for those elements that
- * could be assigned to a cache location, size [n_cache_sets*associativity]
- * @param [in] time time stamp
- * @param [out] cache_idx the cache idx assigned to the input, or -1 if it could
- * not be cached, size [n]
+ * DISCLAIMER: this file is deprecated: use lap.cuh instead
*/
-template
-__global__ void assign_cache_idx(const int* keys,
- int n,
- const int* cache_set,
- int* cached_keys,
- int n_cache_sets,
- int* cache_time,
- int time,
- int* cache_idx)
-{
- int block_offset = blockIdx.x * associativity;
-
- const int items_per_thread = raft::ceildiv(associativity, nthreads);
-
- // the size of rank limits how large associativity can be used in practice
- __shared__ int rank[items_per_thread * nthreads];
- rank_set_entries(cache_time, n_cache_sets, rank);
-
- // Each thread will fill items_per_thread items in the cache.
- // It uses a place, only if it was not updated at the current time step
- // (cache_time != time).
- // We rank the places according to the time stamp, least recently used
- // elements come to the front.
- // We fill the least recently used elements with the working set.
- // there might be elements which cannot be assigned to cache loc.
- // these elements are assigned -1.
- for (int j = 0; j < items_per_thread; j++) {
- int i = threadIdx.x + j * nthreads;
- int t_idx = block_offset + i;
- bool mask = (i < associativity);
- // whether this slot is available for writing
- mask = mask && (cache_time[t_idx] != time);
+#pragma once
- // rank[i] tells which element to store by this thread
- // we look up where is the corresponding key stored in the input array
- if (mask) {
- int k = find_nth_occurrence(cache_set, n, blockIdx.x, rank[i]);
- if (k > -1) {
- int key_val = keys[k];
- cached_keys[t_idx] = key_val;
- cache_idx[k] = t_idx;
- cache_time[t_idx] = time;
- }
- }
- }
-}
+#pragma message(__FILE__ \
+ " is deprecated and will be removed in a future release." \
+ " Please use the raft/util version instead.")
-/* Unnamed namespace is used to avoid multiple definition error for the
- following non-template function */
-namespace {
-/**
- * @brief Get the cache indices for keys stored in the cache.
- *
- * For every key, we look up the corresponding cache position.
- * If keys[k] is stored in the cache, then is_cached[k] is set to true, and
- * cache_idx[k] stores the corresponding cache idx.
- *
- * If keys[k] is not stored in the cache, then we assign a cache set to it.
- * This cache set is stored in cache_idx[k], and is_cached[k] is set to false.
- * In this case AssignCacheIdx should be called, to get an assigned position
- * within the cache set.
- *
- * Cache_time is assigned to the time input argument for all elements in idx.
- *
- * @param [in] keys array of keys that we want to look up in the cache, size [n]
- * @param [in] n number of keys to look up
- * @param [inout] cached_keys keys stored in the cache, size [n_cache_sets * associativity]
- * @param [in] n_cache_sets number of cache sets
- * @param [in] associativity number of keys in cache set
- * @param [inout] cache_time time stamp when the indices were cached, size [n_cache_sets *
- * associativity]
- * @param [out] cache_idx cache indices of the working set elements, size [n]
- * @param [out] is_cached whether the element is cached, size [n]
- * @param [in] time iteration counter (used for time stamping)
- */
-__global__ void get_cache_idx(int* keys,
- int n,
- int* cached_keys,
- int n_cache_sets,
- int associativity,
- int* cache_time,
- int* cache_idx,
- bool* is_cached,
- int time)
-{
- int tid = threadIdx.x + blockIdx.x * blockDim.x;
- if (tid < n) {
- int widx = keys[tid];
- int sidx = hash(widx, n_cache_sets);
- int cidx = sidx * associativity;
- int i = 0;
- bool found = false;
- // search for empty spot and the least recently used spot
- while (i < associativity && !found) {
- found = (cache_time[cidx + i] > 0 && cached_keys[cidx + i] == widx);
- i++;
- }
- is_cached[tid] = found;
- if (found) {
- cidx = cidx + i - 1;
- cache_time[cidx] = time; // update time stamp
- cache_idx[tid] = cidx; // exact cache idx
- } else {
- cache_idx[tid] = sidx; // assign cache set
- }
- }
-}
-}; // end unnamed namespace
-}; // namespace cache
-}; // namespace raft
+#include
diff --git a/cpp/include/raft/sparse/hierarchy/detail/agglomerative.cuh b/cpp/include/raft/cluster/detail/agglomerative.cuh
similarity index 97%
rename from cpp/include/raft/sparse/hierarchy/detail/agglomerative.cuh
rename to cpp/include/raft/cluster/detail/agglomerative.cuh
index c8a1eb8304..618f852bba 100644
--- a/cpp/include/raft/sparse/hierarchy/detail/agglomerative.cuh
+++ b/cpp/include/raft/cluster/detail/agglomerative.cuh
@@ -16,9 +16,9 @@
#pragma once
-#include
-#include
-#include
+#include
+#include
+#include
#include
@@ -35,11 +35,7 @@
#include
-namespace raft {
-
-namespace hierarchy {
-namespace detail {
-
+namespace raft::cluster::detail {
template
class UnionFind {
public:
@@ -329,6 +325,4 @@ void extract_flattened_clusters(const raft::handle_t& handle,
}
}
-}; // namespace detail
-}; // namespace hierarchy
-}; // namespace raft
+}; // namespace raft::cluster::detail
diff --git a/cpp/include/raft/sparse/hierarchy/detail/connectivities.cuh b/cpp/include/raft/cluster/detail/connectivities.cuh
similarity index 86%
rename from cpp/include/raft/sparse/hierarchy/detail/connectivities.cuh
rename to cpp/include/raft/cluster/detail/connectivities.cuh
index f56366f21f..da8adf783d 100644
--- a/cpp/include/raft/sparse/hierarchy/detail/connectivities.cuh
+++ b/cpp/include/raft/cluster/detail/connectivities.cuh
@@ -16,18 +16,18 @@
#pragma once
-#include
-#include
-#include
+#include
+#include
+#include
#include
#include
-#include
+#include
+#include
#include
#include
-#include
-#include
+#include
#include
#include
@@ -35,11 +35,9 @@
#include
-namespace raft {
-namespace hierarchy {
-namespace detail {
+namespace raft::cluster::detail {
-template
+template
struct distance_graph_impl {
void run(const raft::handle_t& handle,
const value_t* X,
@@ -58,7 +56,7 @@ struct distance_graph_impl {
* @tparam value_t
*/
template
-struct distance_graph_impl {
+struct distance_graph_impl {
void run(const raft::handle_t& handle,
const value_t* X,
size_t m,
@@ -75,7 +73,7 @@ struct distance_graph_impl
   knn_graph_coo(stream);
- raft::sparse::selection::knn_graph(handle, X, m, n, metric, knn_graph_coo, c);
+ raft::sparse::spatial::knn_graph(handle, X, m, n, metric, knn_graph_coo, c);
indices.resize(knn_graph_coo.nnz, stream);
data.resize(knn_graph_coo.nnz, stream);
@@ -121,7 +119,7 @@ struct distance_graph_impl
+template
void get_distance_graph(const raft::handle_t& handle,
const value_t* X,
size_t m,
@@ -140,6 +138,4 @@ void get_distance_graph(const raft::handle_t& handle,
dist_graph.run(handle, X, m, n, metric, indptr, indices, data, c);
}
-}; // namespace detail
-}; // namespace hierarchy
-}; // namespace raft
+}; // namespace raft::cluster::detail
diff --git a/cpp/include/raft/cluster/detail/kmeans.cuh b/cpp/include/raft/cluster/detail/kmeans.cuh
index 303de77078..26005f58a0 100644
--- a/cpp/include/raft/cluster/detail/kmeans.cuh
+++ b/cpp/include/raft/cluster/detail/kmeans.cuh
@@ -27,19 +27,21 @@
#include
#include
-#include
+#include
#include
+#include
#include
+#include
#include
#include
-#include
-#include
+#include
#include
#include