Works on Python 3.9 with system CUDA version 12.3. Tested on RTX 4060 (8 GB VRAM) and 16 GB RAM, so those are the minimum system reqs but may work on lower.
- Run setup.sh to create folders and fetch external libraries.
- If I made a mistake in setup sh, just run the commands manually.
- Install required packages with
pip install -r requirements.txt
- Download DeCap weights from DeCap_CoCo.zip.
- Unzip and place inside custom_pipeline/pretrained weights/ (see gen_caption.py lines 32 and 58 for reference)
Just run main.py.
The existing pipeline can run inference in 2-3 minutes. The custom pipeline may take up to 30 minutes for inference.