jlianglab/Foundation_X

Developing deep-learning models for medical imaging requires large, annotated datasets, but the heterogeneity of annotations across tasks presents significant challenges. Foundation X is an end-to-end framework designed to train a multi-task foundation model by leveraging diverse expert-level annotations from multiple public datasets. It introduces a Cyclic & Lock-Release pretraining strategy alongside a student-teacher learning paradigm to enhance knowledge retention while mitigating overfitting. Trained on 11 chest X-ray datasets, Foundation X seamlessly integrates classification, localization, and segmentation tasks. Experimental results demonstrate its ability to maximize annotation utility, improve cross-dataset and cross-task learning, and achieve superior performance in disease classification, localization, and segmentation.
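
For intuition, below is a schematic sketch of what a cyclic, lock-release training loop with a student-teacher update might look like. This is an illustration only, not our implementation: the head and loss APIs (all_heads, compute_loss) and the ema_update helper are hypothetical.

import copy
import torch

def ema_update(teacher, student, momentum=0.999):
    # Hypothetical helper: exponential-moving-average update of teacher weights
    with torch.no_grad():
        for t, s in zip(teacher.parameters(), student.parameters()):
            t.mul_(momentum).add_(s, alpha=1.0 - momentum)

def set_requires_grad(module, flag):
    for p in module.parameters():
        p.requires_grad = flag

def pretrain(student, loaders, optimizer, num_cycles):
    # `loaders` maps each dataset name to (dataloader, task heads it supervises)
    teacher = copy.deepcopy(student)  # teacher starts as a frozen copy of the student
    set_requires_grad(teacher, False)
    for cycle in range(num_cycles):
        for name, (loader, heads) in loaders.items():
            # "Lock": freeze the task heads not supervised by this dataset
            for head in student.all_heads:
                set_requires_grad(head, head in heads)
            for batch in loader:
                loss = student.compute_loss(batch)  # hypothetical multi-task loss API
                optimizer.zero_grad()
                loss.backward()
                optimizer.step()
                ema_update(teacher, student)  # teacher trails the student via EMA
            # "Release": unfreeze all heads before cycling to the next dataset
            for head in student.all_heads:
                set_requires_grad(head, True)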

Publication

Foundation X: Integrating Classification, Localization, and Segmentation through Lock-Release Pretraining Strategy for Chest X-ray Analysis
Nahid Ul Islam¹, DongAo Ma¹, Jiaxuan Pang¹, Shivasakthi Senthil Velan¹, Michael B. Gotway², and Jianming Liang¹
¹Arizona State University, ²Mayo Clinic
Winter Conference on Applications of Computer Vision (WACV-2025)
Paper | Supp | Poster | Code | Presentation Slides | Presentation

Datasets (Full View)

  1. CheXpert
  2. NIH ChestX-ray14
  3. VinDr-CXR
  4. NIH Shenzhen CXR
  5. MIMIC-II
  6. TBX11k
  7. NODE21
  8. CANDID-PTX
  9. RSNA Pneumonia
  10. ChestX-Det
  11. SIIM-ACR
  12. CheXmask VinDr-CXR
  13. VinDr-RibCXR
  14. NIH Montgomery
  15. JSRT


We pretrain our Foundation X model using 11 publicly available chest X-ray datasets (the first 11 in the list above). Although not every dataset contains all three annotation types (classification, localization, and segmentation), we leverage all available annotations to maximize the model's learning potential. Among these 11 datasets, all include classification ground truths, six provide bounding-box annotations for disease localization, and three offer disease segmentation masks. Furthermore, we use the organ localization and segmentation datasets VinDr-CXR, VinDr-RibCXR, NIH Montgomery, and JSRT for target-task fine-tuning; the organ segmentation masks for VinDr-CXR were sourced from the CheXmask database. We also fine-tune VinDr-CXR with its local labels for the disease localization task.

Data Splits and Bounding Box Annotations

  • Data splits and generated COCO-format localization bounding box annotation files can be downloaded through this Google Form.
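
After downloading, the annotation files can be sanity-checked with pycocotools. A minimal sketch, assuming a standard COCO-format JSON (the file name below is hypothetical; substitute the actual file from the download):

from pycocotools.coco import COCO

# Hypothetical path; substitute the annotation file obtained via the Google Form
coco = COCO("annotations/train.json")

# List the disease categories and count images and bounding-box annotations
cats = coco.loadCats(coco.getCatIds())
print("categories:", [c["name"] for c in cats])
print("images:", len(coco.getImgIds()), "| boxes:", len(coco.getAnnIds()))

# COCO boxes are stored in [x, y, width, height] pixel format
img_id = coco.getImgIds()[0]
for ann in coco.loadAnns(coco.getAnnIds(imgIds=img_id)):
    print(ann["category_id"], ann["bbox"])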

Pre-trained models

  • You can download the pretrained models through this Google Form.

Loading pre-trained Foundation X checkpoints and extracting features

We provide a utility script load_weights.py to initialize a Swin-B backbone from our pretrained Foundation X checkpoints. The script loads only the encoder weights from the checkpoint and supports an optional projection layer.

Example: Load model and extract features

import torch

from load_weights import build_model

# Path to the pretrained Foundation X checkpoint
pretrained_weights = "path/to/weights/ckpt.pth"

# Initialize the model (encoder weights only, with an optional projection head)
foundationx_model = build_model(
    pretrained_weights,
    num_classes=0,
    projector_features=256,  # Optional: dimensionality of the projection layer
    use_mlp=True,
)

foundationx_model.eval()  # Set the model to evaluation mode

# Extract features from an input batch of shape [B, 3, 224, 224]
input_tensor = torch.randn(4, 3, 224, 224)  # Example batch; replace with real images
with torch.no_grad():
    features = foundationx_model.forward_features(input_tensor)
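
With projector_features=256 and use_mlp=True, the extracted features would be expected to have shape [B, 256]; the exact output shape depends on how load_weights.py defines the projection head, so verify with features.shape.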

Setting up the MultiScaleDeformableAttention package

  • Please follow the steps described in the DINO GitHub Repo to install the MultiScaleDeformableAttention package.
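
Once the operator is compiled, a quick import check confirms the extension is available. A minimal sketch, assuming the extension name used by DINO's build (MultiScaleDeformableAttention) and a CUDA-capable environment:

# Sanity check: the compiled CUDA extension should import without errors
import torch
import MultiScaleDeformableAttention  # built from DINO's ops directory

print("CUDA available:", torch.cuda.is_available())
print("MultiScaleDeformableAttention imported successfully")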

Pretraining Instructions


Major results from our work

1. Foundation X maximizes performance improvements during pretraining by utilizing all available annotations for classification, localization, and segmentation.

2. Foundation X enhances performance when jointly trained for organ localization and segmentation and excels during finetuning.

3. Foundation X excels in few-shot learning and maintains strong performance as the number of training samples varies.

4. Foundation X maximizes performance with cross-dataset and cross-task learning.

5. Foundation X full finetuning outperforms head-only finetuning and baseline models.


Citation

If you use this code or our pre-trained models in your research, please cite our paper:

@InProceedings{Islam_2025_WACV,
    author    = {Islam, Nahid Ul and Ma, DongAo and Pang, Jiaxuan and Velan, Shivasakthi Senthil and Gotway, Michael and Liang, Jianming},
    title     = {Foundation X: Integrating Classification Localization and Segmentation through Lock-Release Pretraining Strategy for Chest X-ray Analysis},
    booktitle = {Proceedings of the Winter Conference on Applications of Computer Vision (WACV)},
    month     = {February},
    year      = {2025},
    pages     = {3647-3656}
}

Acknowledgments

This research was partially supported by ASU and Mayo Clinic through a Seed Grant and an Innovation Grant, as well as by the NIH under Award Number R01HL128785. The authors are solely responsible for the content, which does not necessarily reflect the official views of the NIH. This work also utilized GPUs provided by ASU Research Computing (SOL), Bridges-2 at the Pittsburgh Supercomputing Center (allocated under BCS190015), and Anvil at Purdue University (allocated under MED220025). These resources are supported by the Advanced Cyberinfrastructure Coordination Ecosystem: Services & Support (ACCESS) program, funded by the National Science Foundation under grants #2138259, #2138286, #2138307, #2137603, and #2138296. We also extend our gratitude to Anirudh Kaniyar Narayana Iyengar for his contributions to collecting localization data, preparing bounding boxes in COCO format, and developing some of the data loaders. Finally, the content of this paper is covered by patents pending.

Contact

For any questions, feel free to reach out:
Email: nuislam (at) asu.edu

License

Released under the ASU GitHub Project License
