Extend smart build to CUDA-11, CUDA-12, and ROCm (#669)
- The RedisAIBuilder class was completely overhauled to allow users to
  express a wider range of support for hardware/software stacks. This
  will be extended to support ROCm, CUDA-11, and CUDA-12.
- Versions for each of these packages are no longer specified in an
  internal class. Instead, a default set of JSON files specifies the
  sources and versions. Users can supply their own custom specifications
  at smart build time.
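
The idea of moving versions and sources out of an internal class and into JSON files can be sketched as below. This is only an illustration: the JSON layout, the `backends` key, the `BackendSpec` class, the `load_specs` helper, and the `example.com` URLs are all hypothetical, since the real schema is not shown on this page. Only the Torch 2.1.0 and TensorFlow 2.15.0 versions come from the commit itself.

```python
import json
from dataclasses import dataclass
from typing import List

# Hypothetical spec layout; the actual SmartSim JSON schema may differ.
EXAMPLE_SPEC = """
{
  "backends": [
    {"name": "torch", "version": "2.1.0",
     "source": "https://example.com/torch-2.1.0.zip"},
    {"name": "tensorflow", "version": "2.15.0",
     "source": "https://example.com/tensorflow-2.15.0.tgz"}
  ]
}
"""

REQUIRED_KEYS = {"name", "version", "source"}


@dataclass(frozen=True)
class BackendSpec:
    """One ML backend: what to fetch and which version it is."""
    name: str
    version: str
    source: str


def load_specs(text: str) -> List[BackendSpec]:
    """Parse a JSON document into backend specs, rejecting incomplete entries."""
    raw = json.loads(text)
    specs = []
    for entry in raw["backends"]:
        missing = REQUIRED_KEYS - set(entry)
        if missing:
            raise ValueError(f"Backend entry missing keys: {sorted(missing)}")
        specs.append(BackendSpec(entry["name"], entry["version"], entry["source"]))
    return specs


if __name__ == "__main__":
    for spec in load_specs(EXAMPLE_SPEC):
        print(f"{spec.name} {spec.version} <- {spec.source}")
```

Keeping the data in a file rather than in code means a user could swap in a different version or download location without editing the builder, which appears to be the motivation stated in the commit message.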

---------

[ committed by @ashao ]
[ reviewed by @MattToast @juliaputko ]

Co-authored-by: Matt Drozt <[email protected]>
Co-authored-by: Julia Putko <[email protected]>
3 people authored Sep 19, 2024
1 parent 72be515 commit 5fb8eb4
Showing 51 changed files with 2,534 additions and 1,970 deletions.
16 changes: 4 additions & 12 deletions .github/workflows/run_tests.yml
@@ -49,7 +49,7 @@ env:

jobs:
run_tests:
name: Run tests ${{ matrix.subset }} with ${{ matrix.os }}, Python ${{ matrix.py_v}}, RedisAI ${{ matrix.rai }}
name: Run tests ${{ matrix.subset }} with ${{ matrix.os }}, Python ${{ matrix.py_v}}
runs-on: ${{ matrix.os }}
strategy:
fail-fast: false
@@ -63,9 +63,6 @@ jobs:
- os: macos-14
py_v: "3.9"

env:
SMARTSIM_REDISAI: ${{ matrix.rai }}

steps:
- uses: actions/checkout@v4
- uses: actions/setup-python@v5
@@ -109,15 +106,10 @@ jobs:
- name: Install SmartSim (with ML backends)
run: |
python -m pip install git+https://github.com/CrayLabs/SmartRedis.git@develop#egg=smartredis
python -m pip install .[dev,mypy,ml]
- name: Install ML Runtimes with Smart (with pt, tf, and onnx support)
if: contains( matrix.os, 'ubuntu' ) || contains( matrix.os, 'macos-12')
run: smart build --device cpu --onnx -v
python -m pip install .[dev,mypy]
- name: Install ML Runtimes with Smart (no ONNX,TF on Apple Silicon)
if: contains( matrix.os, 'macos-14' )
run: smart build --device cpu --no_tf -v
- name: Install ML Runtimes
run: smart build --device cpu -v

- name: Run mypy
run: |
1 change: 1 addition & 0 deletions .gitignore
@@ -12,6 +12,7 @@ tests/test_output
# Dependencies
smartsim/_core/.third-party
smartsim/_core/.dragon
smartsim/_core/build

# Docs
_build
4 changes: 2 additions & 2 deletions README.md
@@ -643,11 +643,11 @@ from C, C++, Fortran and Python with the SmartRedis Clients:
<tr>
<td rowspan="3">1.2.7</td>
<td>PyTorch</td>
<td>2.0.1</td>
<td>2.1.0</td>
</tr>
<tr>
<td>TensorFlow\Keras</td>
<td>2.13.1</td>
<td>2.15.0</td>
</tr>
<tr>
<td>ONNX</td>
33 changes: 33 additions & 0 deletions doc/changelog.md
@@ -9,6 +9,39 @@ Jump to:

## SmartSim

### Cuda 12 and ROCm support branch

To be merged into `develop` at some future point in time

Description

- Refactor to the RedisAI build to allow more flexibility in versions
and sources of ML backends
- Add Dockerfiles with GPU support
- Fine-grained build support for GPUs
- Update Torch to 2.1.0, TensorFlow to 2.15.0
- Better error messages in build process

Detailed Notes

- The RedisAIBuilder class was completely overhauled to allow users to
express a wider range of support for hardware/software stacks. This
will be extended to support ROCm, CUDA-11, and CUDA-12.
- Versions for each of these packages are no longer specified in an
internal class. Instead a default set of JSON files specifies the
sources and versions. Users can specify their own custom specifications
at smart build time
- Two new Dockerfiles are now provided (one each for 11.8 and 12.1) that
can be used to build a container to run the tutorials. No HPC support
should be expected at this time
- SmartSim can now be built using Cuda version 11.8 or Cuda 12.1 by specifying
`smart build --device=cuda118` or `smart build --device=cuda121`. The
original `smart build --device=gpu` will default to using Cuda 11.8.
- As a result of the previous change, SmartSim now requires C++17 and a
minimum Cuda version of 11.8 in order to build Torch 2.1.0.
- Error messages were not being interpolated correctly. This has been
addressed to provide more context when exposing error messages to users.
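
The device options described in these notes can be summarized with a small sketch. It is illustrative only and is not SmartSim's implementation: `DEVICE_TO_CUDA`, `resolve_cuda_version`, and `build_command` are hypothetical names. The device strings, the CUDA versions, and the `-v` flag are the ones that appear in this commit. No ROCm device string is given here, so none is included.

```python
from typing import List, Optional

# Illustrative mapping only; not taken from SmartSim's source.
# Device strings and versions are those listed in the changelog notes.
DEVICE_TO_CUDA = {
    "cpu": None,        # CPU builds need no CUDA toolkit
    "gpu": "11.8",      # plain "gpu" defaults to CUDA 11.8
    "cuda118": "11.8",
    "cuda121": "12.1",
}


def resolve_cuda_version(device: str) -> Optional[str]:
    """Return the CUDA version implied by a --device value, or None for CPU."""
    try:
        return DEVICE_TO_CUDA[device.lower()]
    except KeyError:
        raise ValueError(f"Unsupported device: {device!r}") from None


def build_command(device: str, verbose: bool = False) -> List[str]:
    """Assemble a `smart build` invocation as an argument list."""
    resolve_cuda_version(device)  # validate the device before building the command
    cmd = ["smart", "build", f"--device={device}"]
    if verbose:
        cmd.append("-v")
    return cmd


if __name__ == "__main__":
    print(" ".join(build_command("cuda121", verbose=True)))
```

Returning an argument list rather than a single string makes the result safe to pass to `subprocess.run` without shell quoting concerns.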

### Development branch

To be released at some future point in time