v0.8.0 #723

Merged
merged 20 commits into from
Sep 26, 2024
20 commits
c61e636
Merge master into develop (#586)
al-rigazzi May 15, 2024
fbcc46a
Update tutorials and tutorial containers (#589)
al-rigazzi May 20, 2024
54755ad
Fix build error caused by use of deprecated pkg_resources (#598)
ankona May 23, 2024
7d995bb
Building SmartSim without ML backends (#601)
m-kurz May 24, 2024
34987e7
Fix util-tests outputs appearing in root directory (#614)
ankona Jun 13, 2024
0956399
Implement support for SGE (#610)
ashao Jun 17, 2024
8423eb4
Restrict to numpy 1.x (#623)
ashao Jun 25, 2024
96b37c2
Remove broken redis documentation links (#627)
ankona Jul 2, 2024
c0584cc
Add ability to specify hardware policies on dragon run requests (#638)
ankona Jul 17, 2024
723544e
More easily discoverable dependencies (#635)
ashao Jul 18, 2024
d7d979e
Fix-hostname (#642)
al-rigazzi Jul 18, 2024
6f6722c
Mitigate dragon/numpy, mypy/typing_extension dependency issues (#653)
ankona Jul 31, 2024
fde9f2e
Remove builder from setup.py (#654)
ashao Aug 6, 2024
6abbd77
Update codecov to v4.5.0 (#657)
mellis13 Aug 7, 2024
c2ab99b
Pin watchdog version to prevent mypy errors (#690)
ashao Sep 2, 2024
72be515
Add Type Checking to Params on Model (#676)
juliaputko Sep 5, 2024
5fb8eb4
Extend smart build to CUDA-11, CUDA-12, and ROCm (#669)
ashao Sep 19, 2024
2f68c08
Refine install documentation for Perlmutter and Frontier (#717)
ashao Sep 23, 2024
7c28d5b
Change 'conda activate' to 'source activate' for Frontier (#719)
ashao Sep 25, 2024
e8eaa2b
Bump version number to 0.8.0 (#718)
MattToast Sep 25, 2024
6 changes: 3 additions & 3 deletions .github/workflows/release.yml
@@ -84,7 +84,7 @@ jobs:
CIBW_ENVIRONMENT_MACOS: PATH="$(brew --prefix)/opt/make/libexec/gnubin:$PATH"
MACOSX_DEPLOYMENT_TARGET: "10.09"

- uses: actions/upload-artifact@v2
- uses: actions/upload-artifact@v3
with:
path: ./wheelhouse/*.whl

@@ -105,7 +105,7 @@ jobs:
python -m pip install "cmake>=3.13"
python setup.py sdist

- uses: actions/upload-artifact@v2
- uses: actions/upload-artifact@v3
with:
path: dist/*.tar.gz

@@ -114,7 +114,7 @@ jobs:
needs: [build_wheels, build_sdist]
runs-on: ubuntu-latest
steps:
- uses: actions/download-artifact@v2
- uses: actions/download-artifact@v3
with:
name: artifact
path: dist
20 changes: 6 additions & 14 deletions .github/workflows/run_tests.yml
@@ -45,10 +45,11 @@ env:
HOMEBREW_NO_GITHUB_API: "ON"
HOMEBREW_NO_INSTALL_CLEANUP: "ON"
DEBIAN_FRONTEND: "noninteractive" # Disable interactive apt install sessions
GIT_CLONE_PROTECTION_ACTIVE: false

jobs:
run_tests:
name: Run tests ${{ matrix.subset }} with ${{ matrix.os }}, Python ${{ matrix.py_v}}, RedisAI ${{ matrix.rai }}
name: Run tests ${{ matrix.subset }} with ${{ matrix.os }}, Python ${{ matrix.py_v}}
runs-on: ${{ matrix.os }}
strategy:
fail-fast: false
@@ -62,9 +63,6 @@ jobs:
- os: macos-14
py_v: "3.9"

env:
SMARTSIM_REDISAI: ${{ matrix.rai }}

steps:
- uses: actions/checkout@v4
- uses: actions/setup-python@v5
@@ -108,19 +106,13 @@ jobs:
- name: Install SmartSim (with ML backends)
run: |
python -m pip install git+https://github.com/CrayLabs/SmartRedis.git@develop#egg=smartredis
python -m pip install .[dev,ml]

- name: Install ML Runtimes with Smart (with pt, tf, and onnx support)
if: contains( matrix.os, 'ubuntu' ) || contains( matrix.os, 'macos-12')
run: smart build --device cpu --onnx -v
python -m pip install .[dev,mypy]

- name: Install ML Runtimes with Smart (no ONNX,TF on Apple Silicon)
if: contains( matrix.os, 'macos-14' )
run: smart build --device cpu --no_tf -v
- name: Install ML Runtimes
run: smart build --device cpu -v

- name: Run mypy
run: |
python -m pip install .[mypy]
make check-mypy

- name: Run Pylint
@@ -164,7 +156,7 @@ jobs:
retention-days: 5

- name: Upload Pytest coverage to Codecov
uses: codecov/codecov-action@v3.1.4
uses: codecov/codecov-action@v4.5.0
with:
fail_ci_if_error: false
files: ./coverage.xml
1 change: 1 addition & 0 deletions .gitignore
@@ -12,6 +12,7 @@ tests/test_output
# Dependencies
smartsim/_core/.third-party
smartsim/_core/.dragon
smartsim/_core/build

# Docs
_build
6 changes: 1 addition & 5 deletions .readthedocs.yaml
@@ -23,7 +23,7 @@ build:
- git clone --depth 1 https://github.com/CrayLabs/SmartRedis.git smartredis
- git clone --depth 1 https://github.com/CrayLabs/SmartDashboard.git smartdashboard
post_create_environment:
- python -m pip install .[dev]
- python -m pip install .[dev,docs]
- cd smartredis; python -m pip install .
- cd smartredis/doc; doxygen Doxyfile_c; doxygen Doxyfile_cpp; doxygen Doxyfile_fortran
- ln -s smartredis/examples ./examples
@@ -37,7 +37,3 @@ build:
sphinx:
configuration: doc/conf.py
fail_on_warning: true

python:
install:
- requirements: doc/requirements-doc.txt
4 changes: 2 additions & 2 deletions .wci.yml
@@ -22,8 +22,8 @@
language: Python

release:
version: 0.7.0
date: 2024-05-14
version: 0.8.0
date: 2024-09-25

documentation:
general: https://www.craylabs.org/docs/overview.html
4 changes: 2 additions & 2 deletions Makefile
@@ -150,11 +150,11 @@ tutorials-dev:
@docker compose build tutorials-dev
@docker run -p 8888:8888 smartsim-tutorials:dev-latest

# help: tutorials-prod - Build and start a docker container to run the tutorials (v0.7.0)
# help: tutorials-prod - Build and start a docker container to run the tutorials (v0.8.0)
.PHONY: tutorials-prod
tutorials-prod:
@docker compose build tutorials-prod
@docker run -p 8888:8888 smartsim-tutorials:v0.7.0
@docker run -p 8888:8888 smartsim-tutorials:v0.8.0


# help:
4 changes: 2 additions & 2 deletions README.md
@@ -643,11 +643,11 @@ from C, C++, Fortran and Python with the SmartRedis Clients:
<tr>
<td rowspan="3">1.2.7</td>
<td>PyTorch</td>
<td>2.0.1</td>
<td>2.1.0</td>
</tr>
<tr>
<td>TensorFlow\Keras</td>
<td>2.13.1</td>
<td>2.15.0</td>
</tr>
<tr>
<td>ONNX</td>
2 changes: 1 addition & 1 deletion conftest.py
@@ -120,7 +120,7 @@ def print_test_configuration() -> None:

def pytest_configure() -> None:
pytest.test_launcher = test_launcher
pytest.wlm_options = ["slurm", "pbs", "lsf", "pals", "dragon"]
pytest.wlm_options = ["slurm", "pbs", "lsf", "pals", "dragon", "sge"]
account = get_account()
pytest.test_account = account
pytest.test_device = test_device
4 changes: 3 additions & 1 deletion doc/_static/version_names.json
@@ -1,7 +1,8 @@
{
"version_names":[
"develop (unstable)",
"0.7.0 (stable)",
"0.8.0 (stable)",
"0.7.0",
"0.6.2",
"0.6.1",
"0.6.0",
@@ -15,6 +16,7 @@
"version_urls": [
"https://www.craylabs.org/develop/overview.html",
"https://www.craylabs.org/docs/overview.html",
"https://www.craylabs.org/docs/versions/0.7.0/overview.html",
"https://www.craylabs.org/docs/versions/0.6.2/overview.html",
"https://www.craylabs.org/docs/versions/0.6.1/overview.html",
"https://www.craylabs.org/docs/versions/0.6.0/overview.html",
109 changes: 109 additions & 0 deletions doc/changelog.md
@@ -9,12 +9,119 @@ Jump to:

## SmartSim

### 0.8.0

Released on 25 September, 2024

Description

- Refine Frontier documentation for proper use of miniforge3
- Refactor the RedisAI build to allow more flexibility in versions
and sources of ML backends
- Add Dockerfiles with GPU support
- Fine-grained build support for GPUs
- Update Torch to 2.1.0, TensorFlow to 2.15.0
- Better error messages in build process
- Allow specifying Model and Ensemble parameters with
number-like types (e.g. numpy types)
- Pin watchdog to 4.x
- Update codecov to 4.5.0
- Remove build of Redis from setup.py
- Mitigate dependency installation issues
- Fix internal host name representation for Dragon backend
- Make dependencies more discoverable in setup.py
- Add hardware pinning capability when using dragon
- Pin NumPy version to 1.x
- New launcher support for SGE (and similar derivatives)
- Fix test outputs being created in incorrect directory
- Improve support for building SmartSim without ML backends
- Update packaging dependency
- Remove broken oss.redis.com URI blocking documentation generation

Detailed Notes

- On Frontier, the recommended way to activate conda environments is
to use ``source activate``. This also means that ``conda init``
is not needed. The instructions for Frontier have been updated to
reflect this.
([SmartSim-PR719](https://github.com/CrayLabs/SmartSim/pull/719))
- The RedisAIBuilder class was completely overhauled to allow users to
express a wider range of support for hardware/software stacks. This
will be extended to support ROCm, CUDA-11, and CUDA-12.
([SmartSim-PR669](https://github.com/CrayLabs/SmartSim/pull/669))
- Versions for each of these packages are no longer specified in an
internal class. Instead a default set of JSON files specifies the
sources and versions. Users can specify their own custom specifications
at smart build time
([SmartSim-PR669](https://github.com/CrayLabs/SmartSim/pull/669))
- Two new Dockerfiles are now provided (one each for 11.8 and 12.1) that
can be used to build a container to run the tutorials. No HPC support
should be expected at this time
([SmartSim-PR669](https://github.com/CrayLabs/SmartSim/pull/669))
- As a result of the previous change, SmartSim now requires C++17 and a
minimum CUDA version of 11.8 in order to build Torch 2.1.0.
([SmartSim-PR669](https://github.com/CrayLabs/SmartSim/pull/669))
- Error messages were not being interpolated correctly. This has been
addressed to provide more context when exposing error messages to users.
([SmartSim-PR669](https://github.com/CrayLabs/SmartSim/pull/669))
- The serializer would fail if a parameter for a Model or Ensemble
was specified as a NumPy type. The constructors for these
classes now validate that each input is number-like and convert
it to a string
([SmartSim-PR676](https://github.com/CrayLabs/SmartSim/pull/676))
- Pin watchdog to 4.x because v5 introduces new types and requires
updates to the type-checking
([SmartSim-PR690](https://github.com/CrayLabs/SmartSim/pull/690))
- Update codecov to 4.5.0 to mitigate GitHub action failure
([SmartSim-PR657](https://github.com/CrayLabs/SmartSim/pull/657))
- The builder module was included in setup.py to allow us to ship the
main Redis binaries (not RedisAI) with installs from PyPI. It has been
removed to make setup.py easier to maintain and to leave room for future
complexity. The Redis binaries will thus be built
by users during the `smart build` step
- Installing mypy or dragon in separate build actions caused
some dependencies (typing_extensions, numpy) to be upgraded, which
led to runtime failures. The build actions now install all optional
dependencies together so that pip considers them during resolution.
Additionally, the numpy version was capped on dragon installations.
([SmartSim-PR653](https://github.com/CrayLabs/SmartSim/pull/653))
- setup.py used to define dependencies in a way that was not amenable
to code scanning tools. Direct dependencies now appear directly
in the setup call and the definition of the SmartRedis version
has been removed
([SmartSim-PR635](https://github.com/CrayLabs/SmartSim/pull/635))
- The separate definition of dependencies for the docs in
requirements-doc.txt is now defined as an extra.
([SmartSim-PR635](https://github.com/CrayLabs/SmartSim/pull/635))
- The new major version release of NumPy is incompatible with modules
compiled against NumPy 1.x. For both SmartSim and SmartRedis we
request a 1.x version of NumPy. This is needed in SmartSim because
some of the downstream dependencies request NumPy
([SmartSim-PR623](https://github.com/CrayLabs/SmartSim/pull/623))
- SGE is now a supported launcher for SmartSim. Users can now define
BatchSettings which will be monitored by the TaskManager. Additionally,
if the MPI implementation was built with SGE support, Orchestrators can
use `mpirun` without needing to specify the hosts
([SmartSim-PR610](https://github.com/CrayLabs/SmartSim/pull/610))
- Ensure outputs from tests are written to temporary `tests/test_output` directory
- Fix an error that would prevent ``smart build`` from moving a successfully
compiled RedisAI shared object to the install location expected by SmartSim
if no ML backend installations were found. Previously, this would effectively
require users to build and install an ML backend to use the SmartSim
orchestrator even if it was not necessary for their workflow. Users can
install SmartSim without ML backends by running
``smart build --no_tf --no_pt`` and the RedisAI shared object will now be
placed in the expected location.
([SmartSim-PR601](https://github.com/CrayLabs/SmartSim/pull/601))
- Fix packaging failures due to deprecated `pkg_resources`. ([SmartSim-PR598](https://github.com/CrayLabs/SmartSim/pull/598))
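
The Model and Ensemble parameter change noted above (SmartSim-PR676) can be illustrated with a small standalone sketch. This is not SmartSim's implementation: `normalize_params` and `_to_str` are hypothetical names, and the check relies only on the standard-library `numbers` abstract base classes. NumPy scalar types are understood to register with those classes, so the same check should accept them, but that is an assumption this sketch does not exercise.

```python
import numbers
from fractions import Fraction


def _to_str(name, value):
    """Convert one parameter value to a string, rejecting non-numeric types."""
    if isinstance(value, str):
        return value
    # numbers.Number covers int, float, complex, Fraction and Decimal.
    if isinstance(value, numbers.Number):
        return str(value)
    raise TypeError(
        f"Parameter {name!r} must be a string or number-like, "
        f"got {type(value).__name__}"
    )


def normalize_params(params):
    """Return a new dict whose values are strings (or lists of strings).

    Hypothetical helper for illustration only; not part of SmartSim.
    """
    normalized = {}
    for name, value in params.items():
        if isinstance(value, (list, tuple)):
            normalized[name] = [_to_str(name, item) for item in value]
        else:
            normalized[name] = _to_str(name, value)
    return normalized


if __name__ == "__main__":
    print(normalize_params({"steps": 10, "dt": 0.5, "ratio": Fraction(1, 3)}))
```

Validating at construction time means a bad parameter fails early with a clear message instead of surfacing later during serialization.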

### 0.7.0

Released on 14 May, 2024

Description

- Update tutorials and tutorial containers
- Improve Dragon server shutdown
- Add dragon runtime installer
- Add launcher based on Dragon
@@ -64,6 +171,8 @@ Description

Detailed Notes

- The tutorials are up to date with the SmartSim and SmartRedis APIs. Additionally,
the tutorial containers' Dockerfiles are updated. ([SmartSim-PR589](https://github.com/CrayLabs/SmartSim/pull/589))
- The Dragon server will now terminate any process which is still running
when a request of an immediate shutdown is sent. ([SmartSim-PR582](https://github.com/CrayLabs/SmartSim/pull/582))
- Add `--dragon` option to `smart build`. Install appropriate Dragon
2 changes: 1 addition & 1 deletion doc/conf.py
@@ -29,7 +29,7 @@
import smartsim
version = smartsim.__version__
except ImportError:
version = "0.7.0"
version = "0.8.0"

# The full version, including alpha/beta/rc tags
release = version
28 changes: 28 additions & 0 deletions doc/dragon.rst
@@ -65,6 +65,34 @@ In the next sections, we detail how Dragon is integrated into SmartSim.

For more information on HPC launchers, visit the :ref:`Run Settings<run_settings_hpc_ex>` page.

Hardware Pinning
================

Dragon also lets users specify hardware constraints, namely CPU and GPU
affinity, through the ``DragonRunSettings`` object. The following
example demonstrates how to specify CPU and GPU affinities simultaneously. Note
that affinities are passed as a list of device indices.

.. code-block:: python

# Because "dragon" was specified as the launcher during Experiment initialization,
# create_run_settings will return a DragonRunSettings object
rs = exp.create_run_settings(exe="mpi_app",
exe_args=["--option", "value"],
env_vars={"MYVAR": "VALUE"})

# Request the first 8 CPUs for this job (indices 0 through 7)
rs.set_cpu_affinity(list(range(8)))

# Request the first two GPUs on the node for this job
rs.set_gpu_affinity([0, 1])

.. note::

SmartSim launches jobs in the order they are received on the first available
host in a round-robin pattern. To ensure a process is launched on a node with
specific features, configure a hostname constraint.
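
Because affinities are plain lists of zero-based device indices, it can help to check them before passing them to the run settings. The sketch below is standalone and does not call SmartSim: `check_affinity` is a hypothetical helper, and the node layout (16 CPU cores, 4 GPUs) is an assumption chosen only for illustration.

```python
def check_affinity(indices, device_count):
    """Validate a list of device indices against the number of devices.

    Hypothetical helper for illustration only; it is not part of SmartSim.
    """
    if len(set(indices)) != len(indices):
        raise ValueError("affinity list contains duplicate indices")
    for index in indices:
        if not 0 <= index < device_count:
            raise ValueError(f"index {index} is outside 0..{device_count - 1}")
    return list(indices)


# Assumed node layout for this sketch: 16 CPU cores and 4 GPUs.
# The first 8 CPUs are indices 0 through 7, i.e. range(8).
cpu_affinity = check_affinity(list(range(8)), device_count=16)
gpu_affinity = check_affinity([0, 1], device_count=4)

if __name__ == "__main__":
    print(len(cpu_affinity), cpu_affinity)
    print(gpu_affinity)
```

Note that `range(n)` stops before `n`, so requesting the first eight devices uses `range(8)`; `range(9)` would select nine.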

=================
The Dragon Server
=================