Venv creation and uv support #245

Open
wants to merge 61 commits into
base: main
jarlsondre
Collaborator

@jarlsondre jarlsondre commented Nov 13, 2024

Summary

This PR changes how we create venvs, reducing our reliance on large scripts such as generic_torch.sh. It also lets us use uv, which is considerably faster than plain pip. uv remains optional: everything still works with pip alone.

I wrote a quick tutorial on the uv workflow; see the uv-tutorial.md file added in this PR.

Motivation

Relying on the generic_torch.sh script has several disadvantages. First, we cannot declare all of our dependencies in pyproject.toml, so a plain pip install itwinai is not possible at the moment. Second, the script is messy: the same conditionals, such as "if cuda" (in shell syntax), are repeated throughout, which makes the script hard to build upon. Third, because we run a series of separate pip install x statements, the dependency resolver never sees the full dependency graph at once. As a result, many packages are installed only to be uninstalled by the next pip install statement, which makes the script take much longer than necessary.
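As a rough illustration of the third point, here is a minimal Python sketch that merges several requirement groups into one installer invocation, so the resolver sees every requirement in a single pass. The helper name single_install_command and the package groups are hypothetical placeholders, not code from this PR.

```python
import sys


def single_install_command(requirement_groups: dict[str, list[str]]) -> list[str]:
    """Build one `pip install` command covering every requirement group.

    Hypothetical helper for illustration only. Duplicates are dropped while
    keeping first-seen order, so the resolver gets each requirement once.
    """
    merged = list(
        dict.fromkeys(
            req for group in requirement_groups.values() for req in group
        )
    )
    # `python -m pip install a b c` resolves a, b and c together,
    # instead of resolving each one against whatever is already installed.
    return [sys.executable, "-m", "pip", "install", *merged]


if __name__ == "__main__":
    # Placeholder groups; the real dependency sets live in the project files.
    groups = {
        "base": ["torch", "numpy"],
        "distributed": ["numpy", "deepspeed", "horovod"],
    }
    print(" ".join(single_install_command(groups)))
```

The command is only built here, not executed; passing the returned list to subprocess.run would perform the install.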

Noteworthy

Because part of this PR is about transitioning to uv, I have renamed many of the venvs we use to just .venv, which seems to work better with uv in some cases. We could also keep our old venvs and symlink them to .venv, so that we don't have to come up with new names if we ever change systems again. This should streamline our naming convention.
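The symlink idea could look roughly like the sketch below. It assumes the old venv sits directly inside the project directory; the function name link_venv and the example directory name are hypothetical and not part of this PR.

```python
from pathlib import Path


def link_venv(project_dir: Path, old_name: str, link_name: str = ".venv") -> Path:
    """Make `project_dir/link_name` a symlink to `project_dir/old_name`.

    Hypothetical helper for illustration only.
    """
    target = project_dir / old_name
    if not target.is_dir():
        raise FileNotFoundError(f"No venv directory at {target}")

    link = project_dir / link_name
    if link.is_symlink():
        # Replace a stale link so the function can be re-run safely.
        link.unlink()
    elif link.exists():
        # A real file or directory is in the way; refuse to clobber it.
        raise FileExistsError(f"{link} exists and is not a symlink")

    # A relative target keeps the link valid if the project directory moves.
    link.symlink_to(old_name, target_is_directory=True)
    return link


if __name__ == "__main__":
    # "old-venv" is a placeholder for whatever the existing venv is called.
    print(link_venv(Path.cwd(), "old-venv"))
```

Refusing to overwrite a real .venv directory is a deliberate choice here, so that an environment created directly by uv or pip is never silently replaced.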

Related issue: #244

@jarlsondre jarlsondre added the enhancement New feature or request label Nov 13, 2024
@jarlsondre jarlsondre self-assigned this Nov 13, 2024
@jarlsondre jarlsondre marked this pull request as draft November 13, 2024 13:08
jarlsondre and others added 12 commits November 13, 2024 15:41
* Refactor Dockerfiles

* Refactor container gen script

* ADD jlab dockerfile

* First working version of jlab container

* ADD CMCC requirements

* update dockerfiles

* ADD nvconda and refactor

* Update containers

* ADD containers

* ADD simple plus dockerfile

* Update NV deps

* Update CUDA

* Add comment

* Cleanup

* Cleanup

* UPDATE README

* Refactor

* Fix linter

* Refactor dockerfiles and improve tests

* Refactor

* Refactor

* Fix

* Add first tests for HPC

* First broken tests for HPC

* Update tests and strategy

* UPDATE tests

* Update horovod tests

* Update tests and jlab deps

* Add MLFLow tracking URI

* ADD distributed trainer tests

* mpirun container deepspeed

* Fix distributed strategy tests on multi-node

* ADD srun launcher

* Refactor jobscript

* Cleanup

* isort tests

* Refactor scripts

* Minor fixes

* Add logging to file for all workers

* Add jupyter base files

* Add jupyter base files

* spelling

* Update provenance deps

* Update DS version

* Update prov docs

* Cleanup

* add nvidia dep

* Remove incomplete work

* update pyproject

* ADD hadolint config file

* FIX flag

* Fix linters

* Refactor

* Update prov4ml

* Update pytest CI

* Minor fix

* Incorporate feedback

* Update Dockerfiles

* Incorporate feedback

* Update comments

* Refactor tests
@jarlsondre jarlsondre marked this pull request as ready for review November 19, 2024 16:28
@matbun
Collaborator

matbun commented Nov 25, 2024

If this PR is merged after #249, we need to update the pyproject.toml to use the new-main version of the provenance logger. Other files may need to be updated accordingly.

3 participants