This codebase is tested with Python 3.8.13 and PyTorch 1.13.0. Follow the steps below to create an environment and install the dependencies, using either the Docker image or a conda environment.
# Run base image
docker pull nvcr.io/nvidia/pytorch:22.07-py3
docker run -it --gpus all --ipc=host --rm --name=tc_clip nvcr.io/nvidia/pytorch:22.07-py3
# Setup environment
pip install -r requirements.txt
# Save docker image, ...
# Create a conda environment
conda create -y --name tc_clip python=3.8
# Activate the environment
conda activate tc_clip
# Install PyTorch
conda install pytorch==1.13.0 torchvision==0.14.0 -c pytorch -c nvidia
# Install requirements
pip install -r requirements.txt
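After installation, a quick check can confirm that the environment matches the tested versions. The following is a minimal sketch and not part of the repository: the helper name `matches_tested` is hypothetical, and it only compares the leading version components (Python 3.8, PyTorch 1.13).

```python
import sys

# Versions this codebase was tested with (taken from the statement above).
TESTED_PYTHON = (3, 8)
TESTED_TORCH = "1.13"


def matches_tested(installed: str, tested: str) -> bool:
    """Return True if `installed` starts with the same components as `tested`.

    Build suffixes such as "+cu117" are ignored, so "1.13.0+cu117" matches "1.13".
    """
    base = installed.split("+", 1)[0]
    tested_parts = tested.split(".")
    return base.split(".")[: len(tested_parts)] == tested_parts


if __name__ == "__main__":
    py_ok = sys.version_info[:2] == TESTED_PYTHON
    print("Python:", sys.version.split()[0], "(ok)" if py_ok else "(untested version)")
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed in this environment.")
    else:
        torch_ok = matches_tested(torch.__version__, TESTED_TORCH)
        print("PyTorch:", torch.__version__, "(ok)" if torch_ok else "(untested version)")
        print("CUDA available:", torch.cuda.is_available())
```

Run it inside the activated environment (or the Docker container). An "untested version" message does not necessarily mean the code will fail, only that the setup differs from the one stated above.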
Note: Make sure the system CUDA version matches the CUDA version of your PyTorch build; otherwise Apex may not install properly.
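One way to check the match described in the note is to compare the toolkit release reported by `nvcc --version` with `torch.version.cuda`. This is a hedged sketch: it assumes `nvcc` is on `PATH` and that its output contains a `release X.Y` string, and the function names are hypothetical.

```python
import re
import shutil
import subprocess
from typing import Optional


def parse_nvcc_release(nvcc_output: str) -> Optional[str]:
    """Extract the 'X.Y' toolkit release from `nvcc --version` output, if present."""
    match = re.search(r"release (\d+\.\d+)", nvcc_output)
    return match.group(1) if match else None


def system_cuda_version() -> Optional[str]:
    """Return the system CUDA toolkit release, or None if nvcc is unavailable."""
    if shutil.which("nvcc") is None:
        return None
    result = subprocess.run(["nvcc", "--version"], capture_output=True, text=True)
    return parse_nvcc_release(result.stdout)


if __name__ == "__main__":
    try:
        import torch
        torch_cuda = torch.version.cuda  # None for CPU-only builds
    except ImportError:
        torch_cuda = None
    system_cuda = system_cuda_version()
    print("System CUDA (nvcc):", system_cuda)
    print("PyTorch CUDA:", torch_cuda)
    if system_cuda is None or torch_cuda is None:
        print("Could not determine both versions.")
    elif system_cuda == torch_cuda:
        print("Versions match.")
    else:
        print("Version mismatch: the Apex CUDA extensions may fail to build.")
```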
- Clone the Apex library:
git clone https://github.com/NVIDIA/apex
cd apex
- Replace the cached_cast function in apex/amp/utils.py with the modified version in scripts/apex_custom_cached_cast.py. This is to enable multiple forward passes during training. See PR #1282 for details.
- Finally, install Apex:
pip install --upgrade pip
pip install -v --disable-pip-version-check --no-cache-dir --no-build-isolation --config-settings "--build-option=--cpp_ext" --config-settings "--build-option=--cuda_ext" ./
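The function replacement in the second step above is a manual edit, done before running the install command. If you prefer to script it, the sketch below shows one possible approach using Python's `ast` module. It is an illustration only, not a tool shipped with this repository: the helper names are hypothetical, and it assumes that the patch file defines a top-level function named `cached_cast`.

```python
import ast
import sys
from pathlib import Path


def function_span(source: str, name: str):
    """Return the [start, end) line indices of a top-level function in `source`."""
    for node in ast.parse(source).body:
        if isinstance(node, ast.FunctionDef) and node.name == name:
            # Include decorators, which sit above the `def` line.
            first = min([node.lineno] + [d.lineno for d in node.decorator_list])
            return first - 1, node.end_lineno  # end_lineno requires Python 3.8+
    raise ValueError(f"top-level function {name!r} not found")


def replace_function(target_src: str, patch_src: str, name: str) -> str:
    """Return `target_src` with function `name` swapped for its definition in `patch_src`."""
    t_start, t_end = function_span(target_src, name)
    p_start, p_end = function_span(patch_src, name)
    target_lines = target_src.splitlines(keepends=True)
    replacement = patch_src.splitlines(keepends=True)[p_start:p_end]
    # Make sure the inserted function ends with a newline before the rest of the file.
    if replacement and not replacement[-1].endswith("\n"):
        replacement[-1] += "\n"
    return "".join(target_lines[:t_start] + replacement + target_lines[t_end:])


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage (paths are supplied by the caller):
    #   python this_script.py <path to apex/amp/utils.py> <path to apex_custom_cached_cast.py>
    target_path, patch_path = Path(sys.argv[1]), Path(sys.argv[2])
    original = target_path.read_text()
    Path(str(target_path) + ".bak").write_text(original)  # keep a backup copy
    target_path.write_text(replace_function(original, patch_path.read_text(), "cached_cast"))
```

Replacing by parsed line span rather than by text search leaves the rest of the file untouched, and the `.bak` copy makes the change easy to revert.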