
Update install notes #749

Open
wants to merge 6 commits into develop from transfer_redis_artifacts
Conversation

@ashao (Member) commented Oct 16, 2024

Some of the instructions for installing SmartSim were stale. Additionally, a user had asked how SmartSim could be installed on a machine that is airgapped from the internet. The install notes have been updated and slightly reorganized to aid discovery.


codecov bot commented Oct 17, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 81.86%. Comparing base (d7d979e) to head (4952a13).
Report is 16 commits behind head on develop.

Additional details and impacted files


@@             Coverage Diff             @@
##           develop     #749      +/-   ##
===========================================
- Coverage    83.91%   81.86%   -2.05%     
===========================================
  Files           83       84       +1     
  Lines         6284     7075     +791     
===========================================
+ Hits          5273     5792     +519     
- Misses        1011     1283     +272     

see 48 files with indirect coverage changes

need to retrieve all of the build dependencies themselves. Some machines
have specific environment variables and/or configuration settings that need
to be set for optimal performance. The below machines have vetted
instructions, please feel free to contribute instructions for your own
Contributor:

I would make the last sentence two sentences. I would also link to how to contribute.

@ashao force-pushed the transfer_redis_artifacts branch from d6bc0a4 to 1a66edb on October 31, 2024 16:20
@ashao force-pushed the transfer_redis_artifacts branch from 1a66edb to 4952a13 on October 31, 2024 16:22
@amandarichardsonn (Contributor) left a comment:

By the bolts of Frankenstein’s lab, these installation notes are a masterpiece! This will undoubtedly electrify the installation process for everyone. Well done!

this seems to be hardcoded to `gcc` and `g++` in the Redis build so ensure that
`which gcc g++` do not point to Apple Clang.
We suggest using GCC to build Redis, RedisAI, and the ML backends. For specific
version requirements see the :ref:`Requirements <requirements>` section.
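Since the Redis build is hardcoded to `gcc`/`g++`, a quick sanity check on macOS (where `gcc` is usually an alias for Apple Clang) can save a failed build. This is a sketch, not part of the documented instructions:

```shell
# Sketch: warn if `gcc`/`g++` actually resolve to Apple Clang.
# Prints one status line per compiler checked.
check_compiler() {
  if ! command -v "$1" >/dev/null 2>&1; then
    echo "$1: not found"
  elif "$1" --version 2>/dev/null | grep -qi clang; then
    echo "$1: Apple Clang detected; point CC/CXX at a GNU toolchain instead"
  else
    echo "$1: GNU toolchain"
  fi
}
check_compiler gcc
check_compiler g++
```

If Clang is detected, installing GCC via Homebrew and exporting `CC`/`CXX` accordingly is one common workaround.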
Contributor:

it does not look like there is a Requirements section, and it instead takes you to the ML Library Support/Linux section -> should this link instead point to the ML Library Support section?

for these GPUs often depends on the version of the CUDA or ROCm stack that is availble on your
machine. In _most_ cases, the versions backwards compatible. If you encounter problems, please
contact us and we can build the backend libraries for your desired version of CUDA and ROCm.
SmartSim supports using Nvidia and AMD GPUs when using RedisAI for GPU
Contributor:

SmartSim supports using Nvidia and AMD GPUs when using RedisAI for GPU inference - what are your thoughts on rephrasing it to SmartSim enables the use of Nvidia and AMD GPUs for GPU inference with RedisAI?

contact us and we can build the backend libraries for your desired version of CUDA and ROCm.
SmartSim supports using Nvidia and AMD GPUs when using RedisAI for GPU
inference. GPU support often depends on the version of the CUDA or ROCm stack
that is available on your machine. In _most_ cases, the versions of the ML
Contributor:

It looks like most is not rendering correctly in readthedocs and is showing up exactly as is - instead try *most*

inference. GPU support often depends on the version of the CUDA or ROCm stack
that is available on your machine. In _most_ cases, the versions of the ML
frameworks are backwards compatible. If you encounter problems, please contact
us at (smartsim at hpe dot com) and we can build the backend libraries for your
Contributor:

For pointing them to how to contact us, what are your thoughts on pointing them to this area of the docs?

https://www.craylabs.org/docs/contributing.html#how-to-connect

so they have multiple options?

Also! What are your thoughts on changing (smartsim at hpe dot com) to `[email protected] <mailto:[email protected]>`_ instead?

@@ -64,7 +65,7 @@ Linux

Contributor:

I notice in the Linux tabs there are additional requirements for CUDA 11 and CUDA 12 but not for ROCm or CPU - just raising this in case!

Contributor:

oh also why does ROCm 6 have N/A for two columns?

Contributor:

CPU seems a little out of place, what are your thoughts on splitting this table into two with

  1. GPU Configurations
  2. CPU Configurations

wget https://developer.download.nvidia.com/compute/cuda/12.5.0/local_installers/cuda_12.5.0_555.42.02_linux.run
sh ./cuda_12.5.0_555.42.02_linux.run --toolkit --silent --toolkitpath=$CUDA_TOOLKIT_INSTALL_PATH
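The quoted commands assume `CUDA_TOOLKIT_INSTALL_PATH` is already set; a minimal sketch, with an illustrative path that is not prescribed by the docs:

```shell
# Illustrative only: choose a writable prefix for the CUDA toolkit install.
export CUDA_TOOLKIT_INSTALL_PATH="$HOME/opt/cuda-12.5"
mkdir -p "$CUDA_TOOLKIT_INSTALL_PATH"
```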

**Step 3:** Download cuDNN:
Contributor:

add a blank line after line 46 so the sentence on line 47 starts on a new line

mkdir -p $CUDNN_INSTALL_PATH
tar -xf cudnn-linux-x86_64-8.9.7.29_cuda12-archive.tar -C $CUDNN_INSTALL_PATH --strip-components 1
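Similarly, the `tar` step assumes `CUDNN_INSTALL_PATH` exists; a sketch with an assumed path, plus the loader-path export the extracted libraries typically need:

```shell
# Illustrative only: prefix for the extracted cuDNN archive.
export CUDNN_INSTALL_PATH="$HOME/opt/cudnn-8.9.7-cuda12"
mkdir -p "$CUDNN_INSTALL_PATH"
# Make the extracted libraries visible to the dynamic loader.
export LD_LIBRARY_PATH="$CUDNN_INSTALL_PATH/lib:${LD_LIBRARY_PATH:-}"
```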

Option 1: Environment Variables
Contributor:

What are your thoughts on wrapping Option 1 and Option 2 into their own section? With an overview on both, I think it might better help the user understand what they are choosing between?



The easiest way to accomplish this assumes that you have the following
- A source machine connected to the internet with SmartSim built (referred to as Machine A).
Contributor:

The formatting of this shows up oddly in readthedocs -> add a blank line after line 16 to resolve the issue


The easiest way to accomplish this assumes that you have the following
- A source machine connected to the internet with SmartSim built (referred to as Machine A).
- A target machine not connected to the Internet
Contributor:

Do you need to note this as Machine B, or is that too much?
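The Machine A to Machine B transfer being discussed could be sketched roughly as follows; the archive name and paths are assumptions, not the documented procedure:

```shell
# On Machine A (internet-connected): bundle the already-built SmartSim
# environment for transfer. SMARTSIM_BUILD is a placeholder for wherever
# the build artifacts actually live.
SMARTSIM_BUILD="${SMARTSIM_BUILD:-$HOME/smartsim-build}"
mkdir -p "$SMARTSIM_BUILD"
tar -czf smartsim-artifacts.tar.gz -C "$SMARTSIM_BUILD" .
# Move smartsim-artifacts.tar.gz to Machine B by an approved medium
# (internal network copy, portable drive, etc.), then on Machine B:
#   mkdir -p "$SMARTSIM_BUILD" && tar -xzf smartsim-artifacts.tar.gz -C "$SMARTSIM_BUILD"
```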

Custom ML backends
------------------

The ML backends (Torch, ONNX Runtime, and Tensorflow) and their associated
Contributor:

supported based on the intended device (CPU, ROCm, CUDA-11, or CUDA-12) - I think the only device in the list is CPU right? Maybe say device or platform

3 participants