update smartsim API with core changes #721

Closed
wants to merge 27 commits into from

Conversation

juliaputko
Contributor

No description provided.

ashao and others added 7 commits September 2, 2024 11:47
The release of watchdog v5 introduced new types which caused further
errors with mypy. To mitigate these errors for now, we pin the watchdog
version to 4.x and will resolve these errors in the future.

[ committed by @ashao ]
[ reviewed by @al-rigazzi ]
Allow specifying Model and Ensemble parameters
with number-like types. The constructors for
parameters on Model and Ensemble now validate
that each input is number-like and convert it to
a string.
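
A minimal sketch of the kind of validation and conversion described here, not the actual SmartSim implementation. The helper names `to_param_string` and `normalize_params` are hypothetical, and accepting plain strings alongside number-like values is an assumption.

```python
from numbers import Number


def to_param_string(value):
    """Return `value` as a string if it is a string or number-like.

    Hypothetical helper: the real constructors may apply different rules.
    """
    if isinstance(value, str):
        return value
    if isinstance(value, Number):
        return str(value)
    raise TypeError(
        f"parameter value must be a string or number-like, got {type(value).__name__}"
    )


def normalize_params(params):
    """Convert every value in a parameter mapping to a string."""
    return {name: to_param_string(value) for name, value in params.items()}


params = normalize_params({"steps": 10, "dt": 0.5, "label": "baseline"})
```

With this sketch, integers and floats pass validation and are stored as strings, while a value such as a list raises `TypeError`.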

[ committed by @juliaputko ]
[ reviewed by @ashao ]
- The RedisAIBuilder class was completely overhauled so that users can
  target a wider range of hardware/software stacks. The builder will be
  extended to support ROCm, CUDA-11, and CUDA-12.
- Versions for each of these packages are no longer specified in an
  internal class. Instead, a default set of JSON files specifies the
  sources and versions. Users can supply their own custom specifications
  at `smart build` time.
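
The idea of default JSON specifications with user overrides can be sketched as follows. This is an illustration only: the schema (`device`, `packages`, `version`, `source`), the package name, the URLs, and the `load_spec` helper are all hypothetical and do not reflect the actual files or the RedisAIBuilder API.

```python
import json

# Hypothetical default specification; the real JSON files may look different.
DEFAULT_SPEC_JSON = """
{
  "device": "cpu",
  "packages": {
    "example-backend": {
      "version": "1.0.0",
      "source": "https://example.com/example-backend-1.0.0.tgz"
    }
  }
}
"""


def load_spec(default_json, custom_json=None):
    """Parse the default spec and overlay a user-supplied spec, if any."""
    spec = json.loads(default_json)
    if custom_json is not None:
        custom = json.loads(custom_json)
        # Package entries from the custom spec replace defaults of the same name.
        packages = {**spec.get("packages", {}), **custom.get("packages", {})}
        spec = {**spec, **custom, "packages": packages}
    return spec


custom_json = json.dumps(
    {
        "device": "cuda-12",
        "packages": {
            "example-backend": {
                "version": "2.0.0",
                "source": "https://example.com/example-backend-2.0.0.tgz",
            }
        },
    }
)
spec = load_spec(DEFAULT_SPEC_JSON, custom_json)
```

Keeping versions and sources in data files rather than in a class means a new stack can be described without changing code.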

---------

[ committed by @ashao ]
[ reviewed by @MattToast @juliaputko ]

Co-authored-by: Matt Drozt <[email protected]>
Co-authored-by: Julia Putko <[email protected]>
After discussing with admins at OLCF, miniforge is the preferred
solution for creating virtual environments on Frontier. The instructions
for installing SmartSim have been updated accordingly. Additionally,
Perlmutter did not have a step for compiling the SmartRedis libraries.
This has been rectified to bring the two systems to parity.

[ committed by @ashao ]
[ reviewed by @MattToast @AlyssaCote ]
On Frontier, the recommended way to activate conda environments is
to use ``source activate``. This also means that ``conda init``
is not needed. The instructions for Frontier have been updated to 
reflect this.

[ committed by @ashao ]
[ reviewed by @MattToast ]
Bump the version number for the release, plus last-minute actions and docs fixes

[ committed by @MattToast ]
[ reviewed by @ashao ]

codecov bot commented Sep 25, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 47.32%. Comparing base (cce16e6) to head (8db1ff1).
Report is 19 commits behind head on smartsim-refactor.


@@                  Coverage Diff                  @@
##           smartsim-refactor     #721      +/-   ##
=====================================================
+ Coverage              40.45%   47.32%   +6.87%     
=====================================================
  Files                    110      109       -1     
  Lines                   7326     6573     -753     
=====================================================
+ Hits                    2964     3111     +147     
+ Misses                  4362     3462     -900     
| Files with missing lines | Coverage Δ |
|---|---|
| smartsim/builders/ensemble.py | 93.04% <ø> (ø) |

... and 15 files with indirect coverage changes

ashao and others added 20 commits September 26, 2024 15:46
Based on feedback from OLCF, users may need to set the MIOPEN cache prior
to running `smart validate`. The installation instructions for Frontier
have been updated accordingly.

[ committed by @ashao ]
[ reviewed by @MattToast ]
Removes the use of CI Build Wheel now that SmartSim is a pure Python
package.

[ committed by @MattToast ]
[ reviewed by @ashao ]
Merge develop to master for release

[ committed by @MattToast ]
[ reviewed by @al-rigazzi ]
This PR brings develop up to date with master for release.
Scylla is in a preliminary state and so needs some specific instructions
to help install SmartSim with CUDA support. The directions included
here are preliminary and will be updated as needed.

[ committed by @ashao ]
[ reviewed by @MattToast @amandarichardsonn ]
In libtensorflow, the `input` argument to `TF_SessionRun` seems to be
mistyped as `TF_Output` instead of `TF_Input`. These two types differ
only in name. GCC-14 catches this and throws an error, even though
earlier versions allow it. To solve this problem, patches are applied
to the TensorFlow backend in RedisAI. Future versions of TensorFlow may
fix this problem, but for now this seems to be the best workaround.

[ committed by @ashao ]
[ reviewed by @MattToast ]
Create a v1.0 branch to combine ongoing efforts in `mli-feature` and
`smartsim-refactor` feature branches

---------

Co-authored-by: Alyssa Cote <[email protected]>
Co-authored-by: Al Rigazzi <[email protected]>
Combine the `core-refactor` feature branch with `mli-feature` in `v1.0`
branch

---------

Co-authored-by: Amanda Richardson <[email protected]>
Co-authored-by: Amanda Richardson <[email protected]>
Co-authored-by: Matt Drozt <[email protected]>
Co-authored-by: Julia Putko <[email protected]>
Co-authored-by: amandarichardsonn <[email protected]>
Co-authored-by: Alyssa Cote <[email protected]>
Co-authored-by: Al Rigazzi <[email protected]>
Co-authored-by: Julia Putko <[email protected]>
Co-authored-by: Matt Drozt <[email protected]>
Logs are improved during dragon install when there is a platform and asset type mismatch.

[ committed by @AlyssaCote ]
[ reviewed by @ankona ]
juliaputko closed this Oct 31, 2024
5 participants