Welcome to the official repository of CT-CLIP, a pioneering work in 3D medical imaging with a particular focus on chest CT volumes. CT-CLIP provides an open-source codebase and pre-trained models, all freely accessible to researchers. CT-CLIP is also used to develop CT-CHAT, a cutting-edge visual-language chat model designed specifically for 3D chest CT volumes. You can access the training dataset (CT-RATE), which pairs chest CT volumes with radiology text reports, via the HuggingFace repository.
Before you start, you must install the necessary dependencies. To do so, execute the following commands:
# Navigate to the 'transformer_maskgit' directory and install the required packages
cd transformer_maskgit
pip install -e .
# Return to the root directory
cd ..
# Navigate to the 'CT_CLIP' directory and install its required packages
cd CT_CLIP
pip install -e .
# Return to the root directory
cd ..
After following these steps, your environment should be properly set up with all required packages.
Due to its considerable size, the CT-CLIP model requires an A100 GPU with 80 GB of VRAM for efficient training at a batch size of 8. Inference can be run on smaller GPUs. The patch sizes of the image encoder can be increased so the model fits onto smaller GPUs, although this degrades performance on smaller pathologies. The batch size can also be lowered, but this is not recommended for CLIP training: with fewer samples per batch, the contrastive objective sees fewer negative pairs and learns weaker representations.
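As a rough illustration of this trade-off, the sketch below shows how the token count of a ViT-style 3D patch embedding (and hence the attention memory) scales with patch size. The volume shape and patch sizes are assumptions chosen for the example, not values prescribed by the training scripts.

```python
# Illustrative only: how patch size affects the number of tokens a
# ViT-style 3D encoder produces for one CT volume. The 480x480x240
# volume shape and the patch sizes below are assumptions for this
# sketch, not settings taken from the CT-CLIP training scripts.

def num_tokens(volume_shape, spatial_patch, temporal_patch):
    h, w, d = volume_shape
    return (h // spatial_patch) * (w // spatial_patch) * (d // temporal_patch)

volume = (480, 480, 240)
print(num_tokens(volume, 20, 10))  # 13824 tokens
print(num_tokens(volume, 30, 15))  # 4096 tokens: far less memory, but coarser
                                   # patches can miss small pathologies
```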
For details on training the zero-shot and fine-tuned CT-CLIP models, please navigate to the scripts folder.
For details on training the text classifier, please navigate to the text_classifier folder.
For details on the inference and evaluation of the zero-shot and fine-tuned CT-CLIP models, please navigate to the scripts folder.
For details on the inference of the text classifier, please navigate to the text_classifier folder.
Inference with CT-CLIP (zero-shot) and CT-CLIP (VocabFine) takes approximately 1.5 seconds to assess 18 pathologies from a single CT volume, while inference with CT-CLIP (ClassFine) takes just 0.5 seconds for the same task.
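For readers unfamiliar with how a CLIP-style model scores pathologies at inference time, the sketch below shows one common zero-shot scheme: comparing the CT volume embedding against a "present" and an "absent" text prompt for each pathology. The embeddings are random stand-ins and the prompt wording is an assumption; it may differ from the exact prompts used in the paper.

```python
import torch
import torch.nn.functional as F

# Conceptual sketch of zero-shot pathology detection with a CLIP-style
# model. The embeddings below are random placeholders; in practice they
# come from the image and text encoders.
pathologies = ["Atelectasis", "Cardiomegaly", "Consolidation"]  # 18 in the full label set

image_emb = F.normalize(torch.randn(512), dim=-1)           # one CT volume
for name in pathologies:
    # Assumed prompt pair; the paper's exact wording may differ.
    present_emb = F.normalize(torch.randn(512), dim=-1)     # e.g. f"{name} is present."
    absent_emb = F.normalize(torch.randn(512), dim=-1)      # e.g. f"{name} is not present."
    logits = torch.stack([image_emb @ present_emb, image_emb @ absent_emb])
    prob_present = torch.softmax(logits, dim=0)[0].item()
    print(f"{name}: {prob_present:.3f}")
```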
For your convenience, we provide access to pretrained models directly. These models have been trained on our paired radiological report and chest CT volume dataset, as elaborated in the paper.
You can download the models from the following links:
- CT-CLIP: Download Here
- CT-CLIP (VocabFine): Download Here
- CT-CLIP (ClassFine): Download Here
- Text Classifier Model: Download Here
By leveraging these pretrained models, you can easily reproduce our results or further extend our work.
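If you want to sanity-check a downloaded checkpoint before wiring it into the training or inference scripts, a minimal PyTorch inspection such as the one below can help. The filename is a placeholder for whichever model you downloaded, and how the weights were saved (raw state dict vs. a wrapped dictionary) is an assumption to verify against the scripts.

```python
import torch

# Load a downloaded checkpoint on CPU and peek at its parameter names.
# "CT_CLIP_zeroshot.pt" is a placeholder filename, not the exact name of
# the released file.
state = torch.load("CT_CLIP_zeroshot.pt", map_location="cpu")
state_dict = state.get("model", state) if isinstance(state, dict) else state
print(list(state_dict.keys())[:10])  # first few parameter names

# From here, build CT-CLIP exactly as in the training scripts and call
# model.load_state_dict(state_dict) to restore the weights.
```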
A major challenge in computational research in 3D medical imaging is the lack of comprehensive datasets. Addressing this issue, we present CT-RATE, the first 3D medical imaging dataset that pairs images with textual reports. CT-RATE consists of 25,692 non-contrast chest CT volumes, expanded to 50,188 through various reconstructions, from 21,304 unique patients, along with corresponding radiology text reports, multi-abnormality labels, and metadata. We divided the cohort into two groups: 20,000 patients were allocated to the training set and 1,304 to the validation set.

Our folders are structured as split_patientID_scanID_reconstructionID. For instance, "valid_53_a_1" indicates that this is a CT volume from the validation set, scan "a" from patient 53, and reconstruction 1 of scan "a". This naming convention applies to all files.
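If it helps when iterating over the dataset, here is a small, illustrative helper for splitting these names into their components; the function name is hypothetical and not part of the released code.

```python
# Illustrative helper for parsing CT-RATE file names of the form
# split_patientID_scanID_reconstructionID, e.g. "valid_53_a_1".
# `parse_ct_rate_name` is a hypothetical name, not part of the codebase.
def parse_ct_rate_name(name: str) -> dict:
    split, patient_id, scan_id, recon_id = name.split("_")
    return {
        "split": split,                      # "train" or "valid"
        "patient_id": int(patient_id),       # e.g. 53
        "scan_id": scan_id,                  # e.g. "a"
        "reconstruction_id": int(recon_id),  # e.g. 1
    }

print(parse_ct_rate_name("valid_53_a_1"))
# {'split': 'valid', 'patient_id': 53, 'scan_id': 'a', 'reconstruction_id': 1}
```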
You can download the dataset used in this work via the Hugging Face repository.
Data used to finetune and validate the text classifier model can be accessed here.
If you use CT-RATE or CT-CLIP, we would appreciate a citation of our paper.
We are committed to fostering innovation and collaboration in the research community. To this end, all elements of CT-CLIP are released under a Creative Commons Attribution-NonCommercial-ShareAlike (CC BY-NC-SA) license. This licensing framework ensures that our contributions can be freely used for non-commercial research purposes, while also encouraging contributions and modifications, provided that the original work is properly cited and any derivative works are shared under the same terms.