itwinai is a Python toolkit designed to help scientists and researchers streamline AI and machine learning
workflows, specifically for digital twin applications. It provides easy-to-use tools for distributed training,
hyper-parameter optimization on HPC systems, and integrated ML logging, reducing engineering overhead and accelerating
research. Developed primarily by CERN, itwinai supports modular and reusable ML workflows and can be extended
through third-party plugins.
See the latest version of our docs here.
If you are a developer, please refer to the developers installation guide.
Requirements:
- Linux or macOS environment. Windows has not been tested.
Depending on your environment, there are different ways to select a specific Python version.
If you are working on a laptop or on a simple on-prem setup, you could consider using pyenv. See the installation instructions. If you are using pyenv, make sure to read this.
In HPC systems it is more popular to load dependencies using Environment Modules or Lmod. If you don't know what modules to load, contact the system administrator to learn how to select the proper modules.
Commands to execute every time before installing or activating the Python virtual environment for PyTorch:
- Juelich Supercomputer (JSC):
  ml --force purge
  ml Stages/2024 GCC OpenMPI CUDA/12 cuDNN MPI-settings/CUDA
  ml Python CMake HDF5 PnetCDF libaio mpi4py
- Vega supercomputer:
  ml --force purge
  ml Python/3.11.5-GCCcore-13.2.0 CMake/3.24.3-GCCcore-11.3.0 mpi4py OpenMPI CUDA/12.3
  ml GCCcore/11.3.0 NCCL cuDNN/8.9.7.29-CUDA-12.3.0 UCX-CUDA/1.15.0-GCCcore-13.2.0-CUDA-12.3.0
Commands to execute every time before installing or activating the Python virtual environment for TensorFlow:
- Juelich Supercomputer (JSC):
  ml --force purge
  ml Stages/2024 GCC/12.3.0 OpenMPI CUDA/12 MPI-settings/CUDA
  ml Python/3.11 HDF5 PnetCDF libaio mpi4py CMake cuDNN/8.9.5.29-CUDA-12
- Vega supercomputer:
  ml --force purge
  ml Python/3.11.5-GCCcore-13.2.0 CMake/3.24.3-GCCcore-11.3.0 mpi4py OpenMPI CUDA/12.3
  ml GCCcore/11.3.0 NCCL cuDNN/8.9.7.29-CUDA-12.3.0 UCX-CUDA/1.15.0-GCCcore-13.2.0-CUDA-12.3.0
Install itwinai and its dependencies using the following command, and follow the instructions:
# First, load the required environment modules, if on an HPC
# Second, create a python virtual environment and activate it
$ python -m venv ENV_NAME
$ source ENV_NAME/bin/activate
# Install itwinai inside the environment
(ENV_NAME) $ export ML_FRAMEWORK="pytorch" # or "tensorflow"
(ENV_NAME) $ curl -fsSL https://github.com/interTwin-eu/itwinai/raw/main/env-files/itwinai-installer.sh | bash
The ML_FRAMEWORK environment variable controls whether you are installing itwinai for PyTorch or TensorFlow.
Warning
itwinai depends on Horovod, which requires CMake>=3.13 and other packages.
Make sure to have them installed in your environment before proceeding.
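A quick pre-flight check can catch an outdated CMake before the installer runs. The sketch below is illustrative only: `parse_cmake_version` and `meets_minimum` are hypothetical helper names, not part of itwinai, and the threshold is passed in as a parameter rather than taken from Horovod itself. It works on the text printed by `cmake --version`.

```python
import re


def parse_cmake_version(version_output: str) -> tuple:
    """Extract the numeric version from the text printed by `cmake --version`."""
    match = re.search(r"cmake version (\d+(?:\.\d+)*)", version_output)
    if match is None:
        raise ValueError("could not find a CMake version in the given text")
    return tuple(int(part) for part in match.group(1).split("."))


def meets_minimum(version_output: str, minimum: tuple) -> bool:
    """Return True if the reported CMake version is at least `minimum`."""
    return parse_cmake_version(version_output) >= minimum


# Sample text in the shape that `cmake --version` prints.
sample = "cmake version 3.24.3\n\nCMake suite maintained and supported by Kitware."

# Illustrative threshold; use the minimum that your Horovod build requires.
print(meets_minimum(sample, minimum=(3, 13)))
```

To check the live system, the output of `subprocess.run(["cmake", "--version"], capture_output=True, text=True).stdout` can be passed in place of `sample`.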
If you are contributing to this repository, please continue below for more advanced instructions.
Warning
Branch protection rules are applied to all branches whose names
match this pattern: [dm][ea][vi]*. When creating new branches,
please avoid names that match it; otherwise, branch
protection rules will block direct pushes to that branch.
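To check a candidate branch name before pushing, the expression can be tested locally. This is a minimal sketch that assumes the expression is evaluated as a shell-style glob (GitHub documents fnmatch syntax for branch protection patterns). Python's `fnmatch` is close to that behaviour but not identical, for example in how `*` treats `/`, so treat the result as a hint. `is_protected` is a hypothetical helper name.

```python
from fnmatch import fnmatchcase

# Pattern quoted in the warning above.
PROTECTED_PATTERN = "[dm][ea][vi]*"


def is_protected(branch_name: str, pattern: str = PROTECTED_PATTERN) -> bool:
    """Return True if the branch name matches the protected glob pattern."""
    return fnmatchcase(branch_name, pattern)


for name in ["main", "dev", "development", "feature-x", "fix-docs"]:
    status = "protected" if is_protected(name) else "free to use"
    print(f"{name}: {status}")
```

Under this interpretation, names such as `main`, `dev`, and anything starting with `dev` or `mai` are protected, while names like `feature-x` are not.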
git clone [--recurse-submodules] git@github.com:interTwin-eu/itwinai.git
You can create the Python virtual environments using our predefined Makefile targets.
Makefile targets for PyTorch environment installation:
- Juelich Supercomputer (JSC):
torch-gpu-jsc
- Vega supercomputer:
torch-env-vega
- In any other case, when CUDA is available:
torch-env
- In any other case, when CUDA is NOT available (CPU-only installation):
torch-env-cpu
For instance, on a laptop with a CUDA-compatible GPU you can use:
make torch-env
When not on an HPC system, you can activate the python environment directly with:
source .venv-pytorch/bin/activate
Otherwise, if you are on an HPC system, please refer to this section explaining how to load the required environment modules before the python environment.
To build a Docker image for the PyTorch version (adapt TAG as needed):
# Local
docker buildx build -t itwinai:TAG -f env-files/torch/Dockerfile .
# Ghcr.io
docker buildx build -t ghcr.io/intertwin-eu/itwinai:TAG -f env-files/torch/Dockerfile .
docker push ghcr.io/intertwin-eu/itwinai:TAG
Makefile targets for TensorFlow environment installation:
- Juelich Supercomputer (JSC):
tf-gpu-jsc
- Vega supercomputer:
tf-env-vega
- In any other case, when CUDA is available:
tensorflow-env
- In any other case, when CUDA is NOT available (CPU-only installation):
tensorflow-env-cpu
For instance, on a laptop with a CUDA-compatible GPU you can use:
make tensorflow-env
When not on an HPC system, you can activate the python environment directly with:
source .venv-tf/bin/activate
Otherwise, if you are on an HPC system, please refer to this section explaining how to load the required environment modules before the python environment.
To build a Docker image for the TensorFlow version (adapt TAG as needed):
# Local
docker buildx build -t itwinai:TAG -f env-files/tensorflow/Dockerfile .
# Ghcr.io
docker buildx build -t ghcr.io/intertwin-eu/itwinai:TAG -f env-files/tensorflow/Dockerfile .
docker push ghcr.io/intertwin-eu/itwinai:TAG
HPC systems usually organize their software in modules, which users must load every time they open a new shell, before activating a Python virtual environment.
Below are some examples of how to load the correct environment modules on the HPC systems we are currently working with.
Commands to be executed before activating the Python virtual environment for PyTorch:
- Juelich Supercomputer (JSC):
  ml --force purge
  ml Stages/2024 GCC OpenMPI CUDA/12 cuDNN MPI-settings/CUDA
  ml Python CMake HDF5 PnetCDF libaio mpi4py
- Vega supercomputer:
  ml --force purge
  ml Python/3.11.5-GCCcore-13.2.0 CMake/3.24.3-GCCcore-11.3.0 mpi4py OpenMPI CUDA/12.3
  ml GCCcore/11.3.0 NCCL cuDNN/8.9.7.29-CUDA-12.3.0 UCX-CUDA/1.15.0-GCCcore-13.2.0-CUDA-12.3.0
- When not on an HPC: do nothing.
For instance, on JSC you can activate the PyTorch virtual environment in this way:
# Load environment modules
ml --force purge
ml Stages/2024 GCC OpenMPI CUDA/12 cuDNN MPI-settings/CUDA
ml Python CMake HDF5 PnetCDF libaio mpi4py
# Activate virtual env
source envAI_hdfml/bin/activate
Commands to be executed before activating the Python virtual environment for TensorFlow:
- Juelich Supercomputer (JSC):
  ml --force purge
  ml Stages/2024 GCC/12.3.0 OpenMPI CUDA/12 MPI-settings/CUDA
  ml Python/3.11 HDF5 PnetCDF libaio mpi4py CMake cuDNN/8.9.5.29-CUDA-12
- Vega supercomputer:
  ml --force purge
  ml Python/3.11.5-GCCcore-13.2.0 CMake/3.24.3-GCCcore-11.3.0 mpi4py OpenMPI CUDA/12.3
  ml GCCcore/11.3.0 NCCL cuDNN/8.9.7.29-CUDA-12.3.0 UCX-CUDA/1.15.0-GCCcore-13.2.0-CUDA-12.3.0
- When not on an HPC: do nothing.
For instance, on JSC you can activate the TensorFlow virtual environment in this way:
# Load environment modules
ml --force purge
ml Stages/2024 GCC/12.3.0 OpenMPI CUDA/12 MPI-settings/CUDA
ml Python/3.11 HDF5 PnetCDF libaio mpi4py CMake cuDNN/8.9.5.29-CUDA-12
# Activate virtual env
source envAItf_hdfml/bin/activate
Do this only if you are a developer wanting to test your code with pytest.
First, you need to create virtual environments both for torch and tensorflow, following the instructions above, depending on the system that you are using (e.g., JSC).
To select the torch and tf environments in which the tests will be
executed, you can set the following environment variables.
If these variables are not set, the test suite will assume that the
PyTorch environment is under .venv-pytorch and the TensorFlow environment
is under .venv-tf.
export TORCH_ENV="my_torch_env"
export TF_ENV="my_tf_env"
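The fallback behaviour described above can be sketched in a few lines. This is an illustration of the lookup logic only, not the test suite's actual code: `resolve_test_envs` is a hypothetical helper, while the variable names and default paths are the ones stated above.

```python
import os
from pathlib import Path

# Defaults stated in the text above.
DEFAULT_TORCH_ENV = ".venv-pytorch"
DEFAULT_TF_ENV = ".venv-tf"


def resolve_test_envs(environ=None) -> dict:
    """Return the virtual environment paths, falling back to the defaults."""
    env = os.environ if environ is None else environ
    return {
        "torch": Path(env.get("TORCH_ENV", DEFAULT_TORCH_ENV)),
        "tf": Path(env.get("TF_ENV", DEFAULT_TF_ENV)),
    }


# With only TORCH_ENV set, the TensorFlow path falls back to its default.
print(resolve_test_envs({"TORCH_ENV": "my_torch_env"}))
```

Passing a plain dictionary instead of reading `os.environ` keeps the sketch easy to try without touching the shell.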
Functional tests (marked with pytest.mark.functional) will be executed under
/tmp/pytest to guarantee isolation among tests.
To run functional tests use:
pytest -v tests/ -m "functional"
Note
Depending on the system that you are using, we implemented a tailored Makefile target to run the test suite on it. Read these instructions until the end!
We provide some Makefile targets to run the whole test suite including unit, integration, and functional tests. Choose the right target depending on the system that you are using:
Makefile targets:
- Juelich Supercomputer (JSC):
test-jsc
- In any other case:
test
For instance, to run the test suite on your laptop, use:
make test
This section is intended for the developers of itwinai and outlines the practices used to manage container images through GitHub Container Registry (GHCR).
Our container images follow the convention:
ghcr.io/intertwin-eu/IMAGE_NAME:TAG
For example, in ghcr.io/intertwin-eu/itwinai:0.2.2-torch2.6-jammy:
- IMAGE_NAME is itwinai
- TAG is 0.2.2-torch2.6-jammy
The TAG follows the convention:
[jlab-]X.Y.Z-(torch|tf)x.y-distro
Where:
- X.Y.Z is the itwinai version
- (torch|tf) is an exclusive OR between "torch" and "tf". You can pick one or the other, but not both.
- x.y is the version of the ML framework (e.g., PyTorch or TensorFlow)
- distro is the OS distro in the container (e.g., Ubuntu Jammy)
- jlab- is prepended to the tag of images including JupyterLab
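The tag convention maps naturally onto a regular expression, which can help validate tags before pushing an image. The sketch below is an assumption-laden illustration, not project tooling: `parse_tag` and `ImageTag` are hypothetical names, version components are assumed to be purely numeric, and the distro is assumed to contain only letters, digits, dots, and underscores.

```python
import re
from typing import NamedTuple, Optional

# [jlab-]X.Y.Z-(torch|tf)x.y-distro
TAG_PATTERN = re.compile(
    r"^(?P<jlab>jlab-)?"
    r"(?P<itwinai_version>\d+\.\d+\.\d+)-"
    r"(?P<framework>torch|tf)(?P<framework_version>\d+\.\d+)-"
    r"(?P<distro>[A-Za-z0-9_.]+)$"  # assumed character set for the distro
)


class ImageTag(NamedTuple):
    jupyterlab: bool
    itwinai_version: str
    framework: str
    framework_version: str
    distro: str


def parse_tag(tag: str) -> Optional[ImageTag]:
    """Split a tag into its components, or return None if it does not conform."""
    match = TAG_PATTERN.match(tag)
    if match is None:
        return None
    return ImageTag(
        jupyterlab=match.group("jlab") is not None,
        itwinai_version=match.group("itwinai_version"),
        framework=match.group("framework"),
        framework_version=match.group("framework_version"),
        distro=match.group("distro"),
    )


print(parse_tag("0.2.2-torch2.6-jammy"))
print(parse_tag("0.2.2-torch2.6"))  # no distro, so this does not conform
```

Returning None for non-conforming tags, rather than raising, makes the helper convenient for filtering a list of candidate tags.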
We use different image names to group similar images under the same namespace:
- itwinai: Production images. These should be well-maintained and orderly.
- itwinai-dev: Development images. Tags can vary, and may include random hashes.
- itwinai-cvmfs: Images that need to be made available through CVMFS via Unpacker.
Warning
It is very important to keep the number of tags for itwinai-cvmfs as low
as possible. Tags should only be created under this namespace when strictly
necessary. Otherwise, this could cause issues for the Unpacker.
Our Docker manifests support labels to record provenance information, which can later be
accessed with docker inspect IMAGE_NAME:TAG.
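To read the labels programmatically, the JSON printed by `docker inspect` can be parsed directly. The sketch below assumes the usual output shape for an image (a JSON array whose first object carries the labels under `Config.Labels`). `extract_labels` is a hypothetical helper, and the label keys in the sample are made up for illustration; they are not the keys used by the itwinai images.

```python
import json


def extract_labels(inspect_output: str) -> dict:
    """Return the labels of the first object in `docker inspect` JSON output."""
    objects = json.loads(inspect_output)
    if not objects:
        return {}
    config = objects[0].get("Config") or {}
    # Labels may be null when the image defines none.
    return config.get("Labels") or {}


# Hand-written sample with hypothetical label keys, for illustration only.
SAMPLE_OUTPUT = """
[
  {
    "Id": "sha256:0000",
    "Config": {
      "Labels": {
        "example.commit-hash": "abc1234",
        "example.itwinai-version": "0.2.2"
      }
    }
  }
]
"""

for key, value in extract_labels(SAMPLE_OUTPUT).items():
    print(f"{key} = {value}")
```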
A full example of a build command that records this information:
export BASE_IMG_NAME="what goes after the last FROM"
export IMAGE_FULL_NAME="IMAGE_NAME:TAG"
docker build \
    -t "$IMAGE_FULL_NAME" \
    -f path/to/Dockerfile \
    --build-arg COMMIT_HASH="$(git rev-parse --verify HEAD)" \
    --build-arg BASE_IMG_NAME="$BASE_IMG_NAME" \
    --build-arg BASE_IMG_DIGEST="$(docker pull "$BASE_IMG_NAME" > /dev/null 2>&1 && docker inspect "$BASE_IMG_NAME" --format='{{index .RepoDigests 0}}')" \
    --build-arg ITWINAI_VERSION="$(grep -Po '(?<=^version = ")[^"]*' pyproject.toml)" \
    --build-arg CREATION_DATE="$(date +"%Y-%m-%dT%H:%M:%S%:z")" \
    --build-arg IMAGE_FULL_NAME="$IMAGE_FULL_NAME" \
    .
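Two of the build arguments above are computed with shell one-liners that can be hard to read. The sketch below shows what they compute, as a rough Python equivalent: `read_version` performs the same extraction as the grep expression (using a capture group instead of a lookbehind), and `format_creation_date` produces the same timestamp layout as the date format string. Both function names and the sample pyproject text are hypothetical.

```python
import re
from datetime import datetime, timedelta, timezone
from typing import Optional


def read_version(pyproject_text: str) -> Optional[str]:
    """Return the text between the quotes of a line starting with `version = "`."""
    match = re.search(r'^version = "([^"]*)', pyproject_text, flags=re.MULTILINE)
    return match.group(1) if match else None


def format_creation_date(moment: datetime) -> str:
    """Format an aware datetime as YYYY-MM-DDTHH:MM:SS+HH:MM."""
    return moment.isoformat(timespec="seconds")


# Illustrative file content, not the real pyproject.toml.
sample_pyproject = '[project]\nname = "itwinai"\nversion = "0.2.2"\n'
print(read_version(sample_pyproject))

# Fixed example moment in a UTC+1 timezone.
moment = datetime(2025, 1, 31, 14, 5, 9, tzinfo=timezone(timedelta(hours=1)))
print(format_creation_date(moment))
```

For the current local time, `datetime.now().astimezone()` yields an aware datetime that can be passed to `format_creation_date`.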