interTwin-eu · jarlsondre · Dec 2, 2024 · Nov 11, 2024
diff --git a/Makefile b/Makefile
@@ -11,7 +11,7 @@ torch-env-cpu: env-files/torch/generic_torch.sh
 	env ENV_NAME=.venv-pytorch \
 		NO_CUDA=1 \
 		bash -c 'bash env-files/torch/generic_torch.sh'
-	.venv-pytorch/bin/horovodrun --check-build 
+	# .venv-pytorch/bin/horovodrun --check-build 
 
 # Install TensorFlow env (GPU support)
 tensorflow-env: env-files/tensorflow/generic_tf.sh
@@ -44,7 +44,10 @@ tf-env-vega: env-files/tensorflow/createEnvVegaTF.sh env-files/tensorflow/generi
 
 
 test:
-	PYTORCH_ENABLE_MPS_FALLBACK=1 .venv-pytorch/bin/pytest -v tests/ -m "not slurm"
+	.venv/bin/pytest -v tests/
+
+test-local:
+	PYTORCH_ENABLE_MPS_FALLBACK=1 .venv/bin/pytest -v tests/ -m "not hpc"
 
 test-jsc: tests/run_on_jsc.sh
 	bash tests/run_on_jsc.sh
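The Makefile hunk above disables the `horovodrun --check-build` step by commenting it out. As a hedged alternative (a sketch, not part of this PR), the check could instead be skipped only when the binary is missing. The helper name `check_horovod_build` and its default path argument are assumptions made for illustration.

```bash
#!/bin/bash
# Hypothetical helper (not part of the PR): run the Horovod build check only
# when the horovodrun executable exists at the given path.
check_horovod_build() {
  local horovodrun_bin="${1:-.venv-pytorch/bin/horovodrun}"
  if [ -x "$horovodrun_bin" ]; then
    "$horovodrun_bin" --check-build
  else
    echo "horovodrun not found at $horovodrun_bin, skipping build check"
  fi
}
```

Called without arguments, the function looks for the binary in `.venv-pytorch/bin/`, which is the path used by the `torch-env-cpu` target.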

diff --git a/README.md b/README.md
@@ -128,10 +128,84 @@ git clone [--recurse-submodules] git@github.com:interTwin-eu/itwinai.git
 
 ### Install itwinai environment
 
-You can create the
-Python virtual environments using our predefined Makefile targets.
+In this project, we are using `uv` as a project-wide package manager. Therefore, if
+you are a developer, you should see the [uv tutorial](/docs/uv-tutorial.md) after reading
+the following `pip` tutorial.
 
-#### PyTorch (+ Lightning) virtual environment
+#### Installation using pip
+
+##### Creating a venv
+
+You can install the `itwinai` environment for development using `pip`. First, create
+a Python venv if you don't already have one. Make sure Python is available (on HPC
+you have to load it with `module load Python`), then create a venv with the following
+command:
+
+```bash
+python -m venv <name-of-venv>
+```
+
+For example, to create a venv in the directory `.venv` (useful if you also use
+`uv`), run:
+
+```bash
+python -m venv .venv
+```
+
+After this you can activate your venv using the following command:
+
+```bash
+source .venv/bin/activate
+```
+
+Now anything you install with `pip` goes into your venv, and any `python` command
+you run uses the interpreter from your venv.
+
+##### Installation of packages
+
+We provide some _extras_ that you can enable depending on your platform and what you
+need:
+
+- `macos`, `amd` or `nvidia`, depending on your platform. This changes which version
+  of `prov4ML` is installed.
+- `dev` for development purposes. Includes libraries for testing, TensorBoard, etc.
+- `torch` for installation with PyTorch, and `tf` (used in the commands below) for TensorFlow.
+
+If you want to install PyTorch with CUDA support, you also have to pass an
+`--extra-index-url` pointing to the wheel index for the CUDA version that you want.
+Since you are developing the library, you also want the editable flag, `-e`, so that
+you don't have to reinstall everything every time you make a change. On HPC, you will
+usually want to add the `--no-cache-dir` flag to avoid filling up your `~/.cache`
+directory, as you can otherwise reach your disk quota very easily. A complete command
+for installing as a developer on HPC with CUDA thus becomes:
+
+```bash
+pip install -e ".[torch,dev,nvidia,tf]" \
+    --no-cache-dir \
+    --extra-index-url https://download.pytorch.org/whl/cu121
+```
+
+To install locally on macOS (i.e. without CUDA) with PyTorch, you would run the
+following instead:
+
+```bash
+pip install -e ".[torch,dev,macos,tf]"
+```
+
+<!-- You can create the Python virtual environments using our predefined Makefile targets. -->
+
+#### Horovod and DeepSpeed
+
+The above does not install `Horovod` and `DeepSpeed`, however, as they require a
+specialized [script](env-files/torch/install-horovod-deepspeed-cuda.sh). If you do not
+require CUDA, then you can install them using `pip` as follows:
+
+```bash
+pip install --no-cache-dir --no-build-isolation git+https://github.com/horovod/horovod.git
+pip install --no-cache-dir --no-build-isolation deepspeed
+```
+
+#### PyTorch (+ Lightning) virtual environment with Makefile targets
 
 Makefile targets for environment installation:
 

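The `pip` instructions added to the README above combine a platform extra with the `torch` and `dev` extras. A minimal sketch of that selection follows. It assumes a hypothetical helper `itwinai_extras` that takes the output of `uname -s` as its argument and treats every non-macOS system as an NVIDIA machine; an `amd` system would have to be specified by hand.

```bash
#!/bin/bash
# Hypothetical helper: build the extras string for `pip install -e ".[...]"`.
# $1 is the kernel name as printed by `uname -s` (e.g. "Darwin" or "Linux").
itwinai_extras() {
  local kernel="$1"
  local platform_extra
  case "$kernel" in
    Darwin) platform_extra="macos" ;;
    *)      platform_extra="nvidia" ;;  # assumption: non-macOS means NVIDIA GPUs
  esac
  echo "torch,dev,${platform_extra}"
}

# Print (rather than run) the resulting developer install command.
echo "pip install -e \".[$(itwinai_extras "$(uname -s)")]\""
```

On a CUDA system you would still append `--no-cache-dir` and the `--extra-index-url` for your CUDA version, as described in the README text.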
diff --git a/docs/uv-tutorial.md b/docs/uv-tutorial.md
@@ -0,0 +1,98 @@
+# Tutorial for using the uv package manager
+
+[uv](https://docs.astral.sh/uv/) is a Python package manager meant to act as a drop-in
+replacement for `pip` (and many more tools). In this project, we use it to manage our
+packages, similar to how `poetry` works. This is done using a lockfile called
+`uv.lock`.
+
+## uv as a drop-in replacement for pip
+
+`uv` is a lot faster than `pip`, so we recommend installing packages from `PyPI`
+with `uv pip install <package>` instead of `pip install <package>`. You don't need to
+change anything in your project to use this feature, as it works as a drop-in
+replacement for `pip`.
+
+## uv as a project-wide package manager
+
+If you wish to use the `uv sync` and/or `uv lock` commands, which is how you use `uv`
+to manage all your project packages, then note that by default these commands operate
+on the venv in the directory called `.venv` in the project directory. This can be
+inconvenient, especially with an existing venv, so we recommend using a
+[symlink](https://en.wikipedia.org/wiki/Symbolic_link). If you need to have multiple
+venvs that you want to switch between, you can update the symlink to whichever of them
+you want to use at the moment. In SLURM scripts, you can hardcode the venv path instead.
+
+### Symlinking .venv
+
+To create a symlink named `.venv` that points to your venv, you can use the
+following command:
+
+```bash
+ln -s <path/to/your_venv> <path/to/.venv>
+```
+
+As an example, if you are in the `itwinai/` folder and your venv is called
+`envAI_juwels`, then the following will create the desired symlink:
+
+```bash
+ln -s envAI_juwels .venv
+```
+
+### Installing from uv.lock
+
+> [!Warning]
+> If `uv` creates your venv for you, the venv will not contain `pip`. However, you need
+> to have `pip` installed to be able to run the installation scripts for `Horovod` and
+> `DeepSpeed`, so we have included `pip` in the dependencies in `pyproject.toml`.
+
+To install from the `uv.lock` file into the `.venv` venv, you can do the following:
+
+```bash
+uv sync
+```
+
+If the `uv.lock` file has optional dependencies (e.g. `macos` or `torch`), then these
+can be added with the `--extra` flag as follows:
+
+```bash
+uv sync --extra torch --extra macos
+```
+
+These will usually correspond to the optional dependencies in the `pyproject.toml`. In
+particular, if you are a developer you would use one of the following two commands. If
+you are on HPC with CUDA, you would use:
+
+```bash
+uv sync --no-cache --extra dev --extra nvidia --extra torch --extra tf 
+```
+
+If you are developing on your local computer with macOS, then you would use:
+
+```bash
+uv sync --extra torch --extra tf --extra dev --extra macos
+```
+
+### Updating the uv.lock file
+
+To update the project's `uv.lock` file with the dependencies of the project, you can
+use the command:
+
+```bash
+uv lock
+```
+
+This will create a `uv.lock` file if it doesn't already exist, using the dependencies
+from the `pyproject.toml`.
+
+## Adding new packages to the project
+
+To add a new package to the project (i.e. to the `pyproject.toml` file) with `uv`, you
+can use the following command:
+
+```bash
+uv add <package>
+```
+
+> [!Warning]
+> This will also install the package into your `.venv` venv, so make sure `.venv` is
+> symlinked to the venv you want to use if you haven't done so already.
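The tutorial above suggests repointing the `.venv` symlink to switch between several venvs. That step can be sketched as follows, assuming a hypothetical helper named `use_venv` that is run from the project directory. It refuses to overwrite a real (non-symlink) `.venv` directory.

```bash
#!/bin/bash
# Hypothetical helper: point the `.venv` symlink in the current directory at
# the given venv directory, replacing any previous `.venv` symlink.
use_venv() {
  local target="$1"
  if [ ! -d "$target" ]; then
    echo "no such venv directory: $target" >&2
    return 1
  fi
  if [ -e .venv ] && [ ! -L .venv ]; then
    echo ".venv exists and is not a symlink; refusing to replace it" >&2
    return 1
  fi
  # -s: symbolic link, -f: replace an existing link, -n: do not follow the old link
  ln -sfn "$target" .venv
}
```

For example, `use_venv envAI_juwels` followed by `uv sync` would sync into `envAI_juwels` through the symlink.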
diff --git a/env-files/tensorflow/generic_tf.sh b/env-files/tensorflow/generic_tf.sh
@@ -1,97 +1,18 @@
 #!/bin/bash
 
-# ENV VARIABLES:
-#   - ENV_NAME: set custom name for virtual env. Default: ".venv-tf"
-#   - NO_CUDA: if set, install without cuda support
-
-# Detect custom env name from env
 if [ -z "$ENV_NAME" ]; then
   ENV_NAME=".venv-tf"
 fi
 
-if [ -z "$NO_CUDA" ]; then
-  echo "Installing itwinai and its dependencies in '$ENV_NAME' virtual env (CUDA enabled)"
-else
-  echo "Installing itwinai and its dependencies in '$ENV_NAME' virtual env (CUDA disabled)"
-fi
-
-# get python version
-pver="$(python --version 2>&1 | awk '{print $2}' | cut -f1-2 -d.)"
-
-# use pyenv if exist
-if [ -d "$HOME/.pyenv" ];then
-  export PYENV_ROOT="$HOME/.pyenv"
-  export PATH="$PYENV_ROOT/bin:$PATH"
-fi
+work_dir=$PWD
 
-# set dir
-cDir=$PWD
-
-# create environment
-if [ -d "${cDir}/$ENV_NAME" ];then
+# Create the python venv if it doesn't already exist
+if [ -d "${work_dir}/$ENV_NAME" ];then
   echo "env $ENV_NAME already exists"
-
-  source $ENV_NAME/bin/activate
 else
   python3 -m venv $ENV_NAME
-
-  # activate env
-  source $ENV_NAME/bin/activate
-
-  echo "$ENV_NAME environment is created in ${cDir}"
-fi
-
-pip3 install --no-cache-dir  --upgrade pip
-
-# get wheel -- setuptools extension
-pip3 install --no-cache-dir wheel
-
-# install TF 
-if [ -f "${cDir}/$ENV_NAME/bin/tensorboard" ]; then
-  echo 'TF already installed'
-  echo
-else
-  if [ -z "$NO_CUDA" ]; then
-    pip3 install tensorflow[and-cuda]==2.16.* --no-cache-dir
-  else
-    # CPU only installation
-    pip3 install tensorflow==2.16.* --no-cache-dir
-  fi
-fi
-
-# CURRENTLY, horovod is not used with TF. Skipped.
-# # install horovod
-# if [ -f "${cDir}/$ENV_NAME/bin/horovodrun" ]; then
-#   echo 'Horovod already installed'
-#   echo
-# else
-#   if [ -z "$NO_CUDA" ]; then
-#     export HOROVOD_GPU=CUDA
-#     export HOROVOD_GPU_OPERATIONS=NCCL
-#     export HOROVOD_WITH_TENSORFLOW=1
-#     # export TMPDIR=${cDir}
-#   else
-#     # CPU only installation
-#     export HOROVOD_WITH_TENSORFLOW=1
-#     # export TMPDIR=${cDir}
-#   fi
-
-#   pip3 install --no-cache-dir horovod[tensorflow,keras] # --ignore-installed
-# fi
-
-# WHEN USING TF >= 2.16:
-# install legacy version of keras (2.16)
-# Since TF 2.16, keras updated to 3.3,
-# which leads to an error when more than 1 node is used
-# https://keras.io/getting_started/
-pip3 install --no-cache-dir  tf_keras==2.16.*
-
-# Install Pov4ML
-if [[ "$OSTYPE" =~ ^darwin ]] ; then
-  pip install "prov4ml[apple,nvidia]@git+https://github.com/matbun/ProvML@new-main" || exit 1
-else
-  pip install "prov4ml[nvidia]@git+https://github.com/matbun/ProvML@new-main" || exit 1
+  echo "$ENV_NAME environment is created in ${work_dir}"
 fi
 
-# Install itwinai: MUST be last line of the script for the user installation script to work!
-pip3 install --no-cache-dir  -e .[dev]
+source "$ENV_NAME/bin/activate"
+pip install --no-cache-dir -e ".[dev,nvidia,tf]"
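The simplified script above defaults `ENV_NAME` to `.venv-tf` and only creates the venv when its directory is missing. A dry-run sketch of that decision is shown below; the helper names `resolve_env_name` and `venv_exists` are hypothetical, and the sketch only prints what it would do instead of calling `python3 -m venv`.

```bash
#!/bin/bash
# Hypothetical helper mirroring the default in generic_tf.sh: print the venv
# name from ENV_NAME, falling back to ".venv-tf" when it is unset or empty.
resolve_env_name() {
  echo "${ENV_NAME:-.venv-tf}"
}

# Hypothetical helper: succeed if the named venv directory exists under $PWD.
venv_exists() {
  [ -d "${PWD}/$1" ]
}

env_name="$(resolve_env_name)"
if venv_exists "$env_name"; then
  echo "env $env_name already exists"
else
  echo "env $env_name would be created with: python3 -m venv \"$env_name\""
fi
```

The `${ENV_NAME:-.venv-tf}` expansion behaves like the `if [ -z "$ENV_NAME" ]` check in the script: both fall back to the default when the variable is unset or empty.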