Users are constantly extending the existing public Anaconda modules on Polaris, Sophia, etc., by installing new Python packages and conda libraries, or older or newer versions of existing packages. When they run into issues, the [email protected] tickets often require many rounds of back-and-forth to gather the information needed to resolve the problem.
They could be reporting a legitimate issue with the installed PyTorch, DeepSpeed, etc. package. Or it could be a basic user error in how they launch their jobs (e.g., running on a UAN). Or they may not be using the Python interpreter and/or installed package they think they are using (e.g., a user site-packages install under `~/.local/`, or a cloned conda env that reinstalled PyTorch from a generic pip/conda-forge wheel).
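To illustrate the kind of details that would catch the last two cases, here is a minimal sketch of a script a user could run in the same environment as the failing job and paste into a ticket. It is hypothetical, not an existing ALCF tool: the names `environment_report` and `PACKAGES` are made up for illustration, and the package list (`torch`, `deepspeed`) is an assumption to adjust per ticket. It reports which interpreter and prefix are active, whether user site-packages (e.g. under `~/.local/`) are enabled, and where each package is actually imported from.

```python
"""Hypothetical environment report for a support ticket (not an official ALCF tool)."""
import importlib.metadata
import importlib.util
import json
import os
import site
import sys

# Packages to report on; this list is an assumption, adjust to what the ticket is about.
# Note: this assumes the import name matches the distribution name (true for torch, deepspeed).
PACKAGES = ["torch", "deepspeed"]


def environment_report(packages=PACKAGES):
    """Collect interpreter, environment, and package-location details as a dict."""
    user_site = site.getusersitepackages()
    report = {
        "python_executable": sys.executable,
        "python_version": sys.version.split()[0],
        "sys_prefix": sys.prefix,
        "conda_prefix": os.environ.get("CONDA_PREFIX"),
        "pythonpath": os.environ.get("PYTHONPATH"),
        "pythonnousersite": os.environ.get("PYTHONNOUSERSITE"),
        # True if the user site-packages directory is enabled for this interpreter;
        # False or None if it was disabled (e.g. via -s or PYTHONNOUSERSITE).
        "user_site_enabled": site.ENABLE_USER_SITE,
        "user_site_dir": user_site,
        "packages": {},
    }
    for name in packages:
        spec = importlib.util.find_spec(name)
        if spec is None:
            # Not importable from this interpreter at all.
            report["packages"][name] = None
            continue
        try:
            version = importlib.metadata.version(name)
        except importlib.metadata.PackageNotFoundError:
            # Importable, but no installed distribution metadata under this name.
            version = "unknown"
        origin = spec.origin
        report["packages"][name] = {
            "version": version,
            "origin": origin,
            # Flags the "installed into ~/.local/ by accident" case.
            "from_user_site": origin is not None and origin.startswith(user_site),
        }
    return report


if __name__ == "__main__":
    print(json.dumps(environment_report(), indent=2))
```

The output is plain JSON so it can be pasted directly into a ticket; a `from_user_site: true` entry, or an `origin` outside the expected module prefix, is the kind of mismatch described above.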
It might not be a bad idea to create a new page in the user docs giving reporting requirements and tips for opening tickets about Python/conda/ML frameworks. It could be a top-level page like the recently created https://docs.alcf.anl.gov/issues/ ("Questions/Issues on ALCF Docs"), but titled "Questions/Issues on ALCF Installed Software".
Alternatively, it could be a section in https://docs.alcf.anl.gov/polaris/data-science-workflows/python/ and the corresponding pages for the other machines.
Include all the following details:
..