Release 0.6.1 #490

Merged 27 commits on Feb 15, 2024
142ddc1
Merge pull request #444 from CrayLabs/master
al-rigazzi Dec 18, 2023
8fd7160
Fix index when installing torch through smart build (#449)
ashao Jan 8, 2024
4f3a9a1
Add concurrency group to test workflow (#439)
al-rigazzi Jan 9, 2024
9e550a8
Override sphinx-tabs background color (#453)
mellis13 Jan 10, 2024
c38f73f
Upgrade Machine Learning Dependencies (#451)
MattToast Jan 17, 2024
f683521
Enrich logging through context vars (#452)
ankona Jan 19, 2024
e107932
Remove Cobalt support (#448)
al-rigazzi Jan 19, 2024
cab2ef8
Update actions (#446)
al-rigazzi Jan 19, 2024
35973b5
Quality of life `smart validate` Improvements (#458)
MattToast Jan 20, 2024
92a3c99
Python 3.11 Support (#461)
MattToast Jan 22, 2024
e4d1646
Relax typing extensions required version (#459)
MattToast Jan 22, 2024
50aa382
Add isort/black check to github actions (#464)
amandarichardsonn Jan 25, 2024
092163b
Fixed Typehint for RunSettings.colocated_db_settings (#462)
amandarichardsonn Jan 29, 2024
7803f4d
Fix test_logs to prevent generation of dir (#467)
al-rigazzi Jan 30, 2024
b160c05
Expose Typehints (#468)
MattToast Jan 31, 2024
948d97c
Validate Slurm Timing format (#471)
amandarichardsonn Jan 31, 2024
106d70f
Add eval() to remove Torch warnings during testing (#472)
mellis13 Feb 2, 2024
3a4e828
Add support for Mac OSX on Apple Silicon (#465)
ashao Feb 2, 2024
b84b49f
Manifest: has DB objects refactor (#476)
MattToast Feb 6, 2024
18fe50a
Patch Changes for Release Prep (#477)
MattToast Feb 7, 2024
8408368
Use developer log level, protect logger defaults in test (#473)
al-rigazzi Feb 8, 2024
96d6ef0
Update license to include 2024 (#485)
AlyssaCote Feb 12, 2024
bbce88a
Duplicate DBModel/Script prevention (#475)
amandarichardsonn Feb 12, 2024
9015bd8
Update instructions for Apple Silicon (#479)
ashao Feb 13, 2024
42cdf84
Fix line-ending related errors on MacOS (ARM64) (#482)
ashao Feb 14, 2024
784fd4e
Update to changelog for release (#487)
amandarichardsonn Feb 15, 2024
a931387
Updating SmartSim version for release (#486)
amandarichardsonn Feb 15, 2024
6 changes: 3 additions & 3 deletions .github/workflows/build_docs.yml
@@ -1,7 +1,7 @@
#
# BSD 2-Clause License
#
# Copyright (c) 2021-2023, Hewlett Packard Enterprise
# Copyright (c) 2021-2024, Hewlett Packard Enterprise
# All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
@@ -39,11 +39,11 @@ jobs:
runs-on: ubuntu-latest

steps:
- uses: actions/checkout@v2
- uses: actions/checkout@v4
with:
fetch-depth: 0 # otherwise, there would be errors pushing refs to the destination repository.

- uses: actions/checkout@v2
- uses: actions/checkout@v4
with:
ref: doc
path: doc-branch
10 changes: 5 additions & 5 deletions .github/workflows/release.yml
@@ -1,7 +1,7 @@
#
# BSD 2-Clause License
#
# Copyright (c) 2021-2023, Hewlett Packard Enterprise
# Copyright (c) 2021-2024, Hewlett Packard Enterprise
# All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
@@ -56,8 +56,8 @@ jobs:
os: [ubuntu-20.04, macos-12]

steps:
- uses: actions/checkout@v2
- uses: actions/setup-python@v2
- uses: actions/checkout@v4
- uses: actions/setup-python@v5

- name: Install cibuildwheel
run: python -m pip install cibuildwheel>=2.12.3
@@ -93,9 +93,9 @@ jobs:
name: Build source distribution
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v2
- uses: actions/checkout@v4

- uses: actions/setup-python@v2
- uses: actions/setup-python@v5
name: Install Python
with:
python-version: '3.8'
31 changes: 18 additions & 13 deletions .github/workflows/run_tests.yml
@@ -1,7 +1,7 @@
#
# BSD 2-Clause License
#
# Copyright (c) 2021-2023, Hewlett Packard Enterprise
# Copyright (c) 2021-2024, Hewlett Packard Enterprise
# All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
@@ -34,6 +34,10 @@ on:
branches:
- develop

concurrency:
group: ${{ github.workflow }}-${{ github.ref }}
cancel-in-progress: true

env:
HOMEBREW_NO_ANALYTICS: "ON" # Make Homebrew installation a little quicker
HOMEBREW_NO_AUTO_UPDATE: "ON"
@@ -53,15 +57,14 @@ jobs:
os: [macos-12, ubuntu-20.04] # Operating systems
compiler: [8] # GNU compiler version
rai: [1.2.7] # Redis AI versions
py_v: [3.8, 3.9, '3.10'] # Python versions

py_v: ['3.8', '3.9', '3.10', '3.11'] # Python versions

env:
SMARTSIM_REDISAI: ${{ matrix.rai }}

steps:
- uses: actions/checkout@v2
- uses: actions/setup-python@v2
- uses: actions/checkout@v4
- uses: actions/setup-python@v5
with:
python-version: ${{ matrix.py_v }}

@@ -101,19 +104,12 @@ jobs:
# on developments of the client are brought in.
- name: Install SmartSim (with ML backends)
run: |

python -m pip install git+https://github.com/CrayLabs/SmartRedis.git@develop#egg=smartredis
python -m pip install .[dev,ml]


- name: Install ML Runtimes with Smart (with pt, tf, and onnx support)
if: (matrix.py_v != '3.10')
run: smart build --device cpu --onnx -v

- name: Install ML Runtimes with Smart (with pt and tf support)
if: (matrix.py_v == '3.10')
run: smart build --device cpu -v

- name: Run mypy
run: |
python -m pip install .[mypy]
@@ -122,6 +118,15 @@ jobs:
- name: Run Pylint
run: make check-lint

# Run isort/black style check
- name: Run isort
run: isort --check-only --profile black ./smartsim ./tests

# Run isort/black style check
- name: Run black
run: |
black --exclude smartsim/version.py --check ./smartsim ./tests

# Run pytest (backends subdirectory)
- name: Run Pytest
if: (matrix.subset == 'backends')
@@ -151,7 +156,7 @@ jobs:
retention-days: 5

- name: Upload Pytest coverage to Codecov
uses: codecov/codecov-action@v2
uses: codecov/codecov-action@v3.1.4
with:
fail_ci_if_error: false
files: ./coverage.xml
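
The `py_v` matrix hunk above quotes every Python version instead of only `'3.10'`. The PR does not state why; one plausible reason (an assumption) is that YAML reads an unquoted `3.10` as the float `3.1`, so quoting keeps every entry a string. A minimal Python sketch of that round trip, where `as_unquoted_yaml_scalar` is a hypothetical helper that only mimics the float conversion and does not call a YAML parser:

```python
from typing import List


def as_unquoted_yaml_scalar(text: str) -> str:
    """Mimic an unquoted numeric scalar: parse as float, then print it back."""
    return str(float(text))


# The version list from the updated matrix, as plain strings.
versions: List[str] = ["3.8", "3.9", "3.10", "3.11"]

# What each entry would become if it were left unquoted.
round_tripped = [as_unquoted_yaml_scalar(v) for v in versions]

for original, parsed in zip(versions, round_tripped):
    marker = "" if original == parsed else "  <-- changed"
    print(f"{original!r} -> {parsed!r}{marker}")
```

Only `'3.10'` changes (to `'3.1'`), which is why quoting matters most for that entry; quoting the rest keeps the list uniform.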
2 changes: 1 addition & 1 deletion .pylintrc
@@ -1,6 +1,6 @@
# BSD 2-Clause License
#
# Copyright (c) 2021-2023 Hewlett Packard Enterprise
# Copyright (c) 2021-2024 Hewlett Packard Enterprise
# All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
7 changes: 3 additions & 4 deletions .wci.yml
@@ -10,7 +10,7 @@
Machine Learning (ML) libraries, like PyTorch and TensorFlow,
in combination with High Performance Computing (HPC) simulations and applications.
SmartSim launches ML infrastructure on HPC systems alongside user workloads
and supports most HPC workload managers (e.g. Slurm, PBSPro, LSF, Cobalt).
and supports most HPC workload managers (e.g. Slurm, PBSPro, LSF).
SmartSim also provides a set of client libraries in Python, C++, C, and Fortran.
These client libraries allow users to send and receive data between user
applications and the machine learning infrastructure. Moreover, the
@@ -22,8 +22,8 @@
language: Python

release:
version: 0.6.0
date: 2023-12-18
version: 0.6.1
date: 2024-02-15

documentation:
general: https://www.craylabs.org/docs/overview.html
@@ -41,7 +41,6 @@
- Slurm
- PBSPro
- LSF
- Cobalt
- Linux/MacOS
transfer_protocols:
- TCP/IP
2 changes: 1 addition & 1 deletion LICENSE.md
@@ -1,6 +1,6 @@
BSD 2-Clause License

Copyright (c) 2021-2023, Hewlett Packard Enterprise
Copyright (c) 2021-2024, Hewlett Packard Enterprise
All rights reserved.

Redistribution and use in source and binary forms, with or without
10 changes: 7 additions & 3 deletions Makefile
@@ -1,6 +1,6 @@
# BSD 2-Clause License
#
# Copyright (c) 2021-2023, Hewlett Packard Enterprise
# Copyright (c) 2021-2024, Hewlett Packard Enterprise
# All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
@@ -66,6 +66,10 @@ clobber: clean
# help:
# help: Style
# help: -------
# help: check-all - Performs all the style-related checks
.PHONY: check-all
check-all: check-style check-format check-sort-imports check-lint check-mypy
$(info All style checks PASSED)

# help: style - Sort imports and format with black
.PHONY: style
@@ -146,11 +150,11 @@ tutorials-dev:
@docker compose build tutorials-dev
@docker run -p 8888:8888 smartsim-tutorials:dev-latest

# help: tutorials-prod - Build and start a docker container to run the tutorials (v0.6.0)
# help: tutorials-prod - Build and start a docker container to run the tutorials (v0.6.1)
.PHONY: tutorials-prod
tutorials-prod:
@docker compose build tutorials-prod
@docker run -p 8888:8888 smartsim-tutorials:v0.6.0
@docker run -p 8888:8888 smartsim-tutorials:v0.6.1


# help:
21 changes: 10 additions & 11 deletions README.md
@@ -100,8 +100,8 @@ before using it on your system. Each tutorial is a Jupyter notebook that can be
which will run a jupyter lab with the tutorials, SmartSim, and SmartRedis installed.

```bash
docker pull ghcr.io/craylabs/smartsim-tutorials:v0.4.1
docker run -p 8888:8888 ghcr.io/craylabs/smartsim-tutorials:v0.4.1
docker pull ghcr.io/craylabs/smartsim-tutorials:latest
docker run -p 8888:8888 ghcr.io/craylabs/smartsim-tutorials:latest
# click on link to open jupyter lab
```

@@ -179,7 +179,6 @@ launch capabilities for all applications.
- Slurm
- LSF
- PBSPro
- Cobalt
- Local (for laptops/single node, no batch)


@@ -198,7 +197,7 @@ qsub -l select=3:ncpus=20 -l walltime=00:10:00 -l place=scatter -I -q <queue>
bsub -Is -W 00:10 -nnodes 3 -P <project> $SHELL
```

This same script will run on a SLURM, PBS, LSF, or Cobalt system as the ``launcher``
This same script will run on a SLURM, PBS, or LSF system as the ``launcher``
is set to `auto` in the [Experiment](https://www.craylabs.org/docs/api/smartsim_api.html#experiment)
initialization. The run command like ``mpirun``,
``aprun`` or ``srun`` will be automatically detected from what is available on the
@@ -277,8 +276,8 @@ print(exp.get_status(ensemble))
python hello_ensemble.py
```

Similar to the interactive example, this same script will run on a SLURM, PBS, LSF,
or Cobalt system as the ``launcher`` is set to `auto` in the
Similar to the interactive example, this same script will run on a SLURM, PBS,
or LSF system as the ``launcher`` is set to `auto` in the
[Experiment](https://www.craylabs.org/docs/api/smartsim_api.html#experiment)
initialization. Local launching does not support batch workloads.

@@ -452,8 +451,8 @@ Each tutorial is a Jupyter notebook that can be run through the
which will run a jupyter lab with the tutorials, SmartSim, and SmartRedis installed.

```bash
docker pull ghcr.io/craylabs/smartsim-tutorials:v1
docker run -p 8888:8888 ghcr.io/craylabs/smartsim-tutorials:v0.4.1
docker pull ghcr.io/craylabs/smartsim-tutorials:latest
docker run -p 8888:8888 ghcr.io/craylabs/smartsim-tutorials:latest
```
Each of the following examples can be found in the
[SmartSim documentation](https://www.craylabs.org/docs/tutorials/getting_started/getting_started.html).
@@ -640,15 +639,15 @@ from C, C++, Fortran and Python with the SmartRedis Clients:
<tr>
<td rowspan="3">1.2.7</td>
<td>PyTorch</td>
<td>1.11.x</td>
<td>2.0.1</td>
</tr>
<tr>
<td>TensorFlow\Keras</td>
<td>2.8.x</td>
<td>2.13.1</td>
</tr>
<tr>
<td>ONNX</td>
<td>1.11.x</td>
<td>1.14.1</td>
</tr>
</tbody>
</table>
44 changes: 9 additions & 35 deletions conftest.py
@@ -1,6 +1,6 @@
# BSD 2-Clause License
#
# Copyright (c) 2021-2023, Hewlett Packard Enterprise
# Copyright (c) 2021-2024, Hewlett Packard Enterprise
# All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
@@ -101,7 +101,7 @@ def print_test_configuration() -> None:

def pytest_configure() -> None:
pytest.test_launcher = test_launcher
pytest.wlm_options = ["slurm", "pbs", "cobalt", "lsf", "pals"]
pytest.wlm_options = ["slurm", "pbs", "lsf", "pals"]
account = get_account()
pytest.test_account = account
pytest.test_device = test_device
@@ -153,12 +153,7 @@ def kill_all_test_spawned_processes() -> None:
def get_hostlist() -> t.Optional[t.List[str]]:
global test_hostlist
if not test_hostlist:
if "COBALT_NODEFILE" in os.environ:
try:
return _parse_hostlist_file(os.environ["COBALT_NODEFILE"])
except FileNotFoundError:
return None
elif "PBS_NODEFILE" in os.environ and test_launcher == "pals":
if "PBS_NODEFILE" in os.environ and test_launcher == "pals":
# with PALS, we need a hostfile even if `aprun` is available
try:
return _parse_hostlist_file(os.environ["PBS_NODEFILE"])
@@ -269,27 +264,14 @@ def get_base_run_settings(
run_args = {"--np": ntasks, "--hostfile": host_file}
run_args.update(kwargs)
return RunSettings(exe, args, run_command="mpiexec", run_args=run_args)
if test_launcher == "cobalt":
if shutil.which("aprun"):
run_command = "aprun"
run_args = {"--pes": ntasks}
else:
run_command = "mpirun"
host_file = os.environ["COBALT_NODEFILE"]
run_args = {"-n": ntasks, "--hostfile": host_file}
run_args.update(kwargs)
settings = RunSettings(
exe, args, run_command=run_command, run_args=run_args
)
return settings
if test_launcher == "lsf":
run_args = {"--np": ntasks, "--nrs": nodes}
run_args.update(kwargs)
settings = RunSettings(exe, args, run_command="jsrun", run_args=run_args)
return settings
if test_launcher != "local":
raise SSConfigError(
"Base run settings are available for Slurm, PBS, Cobalt, "
"Base run settings are available for Slurm, PBS, "
f"and LSF, but launcher was {test_launcher}"
)
# TODO allow user to pick aprun vs MPIrun
@@ -320,18 +302,6 @@ def get_run_settings(
run_args = {"np": ntasks, "hostfile": host_file}
run_args.update(kwargs)
return PalsMpiexecSettings(exe, args, run_args=run_args)
# TODO allow user to pick aprun vs MPIrun
if test_launcher == "cobalt":
if shutil.which("aprun"):
run_args = {"pes": ntasks}
run_args.update(kwargs)
return AprunSettings(exe, args, run_args=run_args)

host_file = os.environ["COBALT_NODEFILE"]
run_args = {"n": ntasks, "hostfile": host_file}
run_args.update(kwargs)
return MpirunSettings(exe, args, run_args=run_args)

if test_launcher == "lsf":
run_args = {
"nrs": nodes,
@@ -344,7 +314,7 @@ def get_run_settings(

@staticmethod
def get_orchestrator(nodes: int = 1, batch: bool = False) -> Orchestrator:
if test_launcher in ["pbs", "cobalt"]:
if test_launcher == "pbs":
if not shutil.which("aprun"):
hostlist = get_hostlist()
else:
@@ -698,3 +668,7 @@ def setup_test_colo(
assert colo_model.colocated
# Check to make sure that limit_db_cpus made it into the colo settings
return colo_model

@pytest.fixture
def config() -> smartsim._core.config.Config:
return CONFIG
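
With the Cobalt branch removed, the `get_hostlist` hunk in `conftest.py` only consults `PBS_NODEFILE` (when the launcher is `pals`) and returns `None` if that file is missing. The body of `_parse_hostlist_file` is not part of this diff, so the following is only a sketch of what such a helper could look like. `parse_hostlist_file` and `hostlist_from_env` are hypothetical names, the order-preserving de-duplication is an assumption, and the fall-through to `None` simplifies the branches that the diff elides.

```python
import os
import tempfile
import typing as t


def parse_hostlist_file(path: str) -> t.List[str]:
    """Return the unique, non-empty hostnames in a nodefile, in file order."""
    hosts: t.List[str] = []
    with open(path, "r", encoding="utf-8") as nodefile:
        for line in nodefile:
            host = line.strip()
            if host and host not in hosts:
                hosts.append(host)
    return hosts


def hostlist_from_env(
    launcher: str, env: t.Mapping[str, str]
) -> t.Optional[t.List[str]]:
    """Sketch of the remaining branch: PBS_NODEFILE is read only for pals."""
    if "PBS_NODEFILE" in env and launcher == "pals":
        try:
            return parse_hostlist_file(env["PBS_NODEFILE"])
        except FileNotFoundError:
            return None
    # Simplification: the real function has further branches not shown here.
    return None


# Demonstration with a temporary nodefile (one line per rank, hosts repeated).
with tempfile.NamedTemporaryFile("w", suffix=".nodes", delete=False) as tmp:
    tmp.write("node01\nnode01\nnode02\n")
    nodefile_path = tmp.name

demo_hosts = hostlist_from_env("pals", {"PBS_NODEFILE": nodefile_path})
missing_hosts = hostlist_from_env(
    "pals", {"PBS_NODEFILE": nodefile_path + ".missing"}
)
os.remove(nodefile_path)
print(demo_hosts)
```

Passing the environment as a mapping, instead of reading `os.environ` directly, keeps the sketch easy to exercise in a test without mutating global state.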
7 changes: 7 additions & 0 deletions doc/_static/custom_tab_style.css
@@ -0,0 +1,7 @@
.sphinx-tabs-panel {
background-color: inherit;
}

.sphinx-tabs-tab[aria-selected="true"] {
background-color: inherit;
}