jlianglab/Foundation_X

Developing deep-learning models for medical imaging requires large, annotated datasets, but the heterogeneity of annotations across tasks presents significant challenges. Foundation X is an end-to-end framework designed to train a multi-task foundation model by leveraging diverse expert-level annotations from multiple public datasets. It introduces a Cyclic & Lock-Release pretraining strategy alongside a student-teacher learning paradigm to enhance knowledge retention while mitigating overfitting. Trained on 11 chest X-ray datasets, Foundation X seamlessly integrates classification, localization, and segmentation tasks. Experimental results demonstrate its ability to maximize annotation utility, improve cross-dataset and cross-task learning, and achieve superior performance in disease classification, localization, and segmentation.
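
For intuition, below is a schematic sketch of what a cyclic, lock-release training loop with a student-teacher update might look like. This is an illustration only, not our implementation: the head and loss APIs (all_heads, compute_loss) and the ema_update helper are hypothetical.

import copy
import torch

def ema_update(teacher, student, momentum=0.999):
    # Hypothetical helper: exponential-moving-average update of teacher weights
    with torch.no_grad():
        for t, s in zip(teacher.parameters(), student.parameters()):
            t.mul_(momentum).add_(s, alpha=1.0 - momentum)

def set_requires_grad(module, flag):
    for p in module.parameters():
        p.requires_grad = flag

def pretrain(student, loaders, optimizer, num_cycles):
    # `loaders` maps each dataset name to (dataloader, task heads it supervises)
    teacher = copy.deepcopy(student)  # teacher starts as a frozen copy of the student
    set_requires_grad(teacher, False)
    for cycle in range(num_cycles):
        for name, (loader, heads) in loaders.items():
            # "Lock": freeze the task heads not supervised by this dataset
            for head in student.all_heads:
                set_requires_grad(head, head in heads)
            for batch in loader:
                loss = student.compute_loss(batch)  # hypothetical multi-task loss API
                optimizer.zero_grad()
                loss.backward()
                optimizer.step()
                ema_update(teacher, student)  # teacher trails the student via EMA
            # "Release": unfreeze all heads before cycling to the next dataset
            for head in student.all_heads:
                set_requires_grad(head, True)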

Publication

Foundation X: Integrating Classification, Localization, and Segmentation through Lock-Release Pretraining Strategy for Chest X-ray Analysis
Nahid Ul Islam¹, DongAo Ma¹, Jiaxuan Pang¹, Shivasakthi Senthil Velan¹, Michael B. Gotway², and Jianming Liang¹
¹Arizona State University, ²Mayo Clinic
Winter Conference on Applications of Computer Vision (WACV-2025)
Paper | Supp | Poster | Code | Presentation Slides | Presentation

Datasets (Full View)

  1. CheXpert
  2. NIH ChestX-ray14
  3. VinDr-CXR
  4. NIH Shenzhen CXR
  5. MIMIC-II
  6. TBX11k
  7. NODE21
  8. CANDID-PTX
  9. RSNA Pneumonia
  10. ChestX-Det
  11. SIIM-ACR
  12. CheXmask VinDr-CXR
  13. VinDr-RibCXR
  14. NIH Montgomery
  15. JSRT


We pretrain our Foundation X model using 11 publicly available chest X-ray datasets (the first 11 in the list above). Although not every dataset contains all three annotation types (classification, localization, and segmentation), we leverage all available annotations to maximize the model's learning potential. Among these 11 datasets, all include classification ground truths, six provide bounding-box annotations for disease localization, and three offer disease segmentation masks. Furthermore, we use the organ localization and segmentation datasets VinDr-CXR, VinDr-RibCXR, NIH Montgomery, and JSRT for target-task fine-tuning; the organ segmentation masks for VinDr-CXR were sourced from the CheXmask database. We also fine-tune VinDr-CXR with its local labels for the disease localization task.

Data Splits and Bounding Box Annotations

  • Data splits and generated COCO-format localization bounding box annotation files can be downloaded through this Google Form.
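
After downloading, the annotation files can be sanity-checked with pycocotools. A minimal sketch, assuming a standard COCO-format JSON (the file name below is hypothetical; substitute the actual file from the download):

from pycocotools.coco import COCO

# Hypothetical path; substitute the annotation file obtained via the Google Form
coco = COCO("annotations/train.json")

# List the disease categories and count images and bounding-box annotations
cats = coco.loadCats(coco.getCatIds())
print("categories:", [c["name"] for c in cats])
print("images:", len(coco.getImgIds()), "| boxes:", len(coco.getAnnIds()))

# COCO boxes are stored in [x, y, width, height] pixel format
img_id = coco.getImgIds()[0]
for ann in coco.loadAnns(coco.getAnnIds(imgIds=img_id)):
    print(ann["category_id"], ann["bbox"])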

Pre-trained models

  • You can download the pretrained models through this Google Form.

Loading pre-trained Foundation X checkpoints and extracting features

We provide a utility script load_weights.py to initialize a Swin-B backbone from our pretrained Foundation X checkpoints. The script loads only the encoder weights from the checkpoint and supports an optional projection layer.

Example: Load model and extract features

import torch

from load_weights import build_model

# Path to the pretrained Foundation X checkpoint
pretrained_weights = "path/to/weights/ckpt.pth"

# Initialize the model (encoder weights only, with an optional projection head)
foundationx_model = build_model(
    pretrained_weights,
    num_classes=0,
    projector_features=256,  # Optional: dimensionality of the projection layer
    use_mlp=True,
)

foundationx_model.eval()  # Set the model to evaluation mode

# Extract features from an input batch of shape [B, 3, 224, 224]
input_tensor = torch.randn(4, 3, 224, 224)  # Example batch; replace with real images
with torch.no_grad():
    features = foundationx_model.forward_features(input_tensor)
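
With projector_features=256 and use_mlp=True, the extracted features would be expected to have shape [B, 256]; the exact output shape depends on how load_weights.py defines the projection head, so verify with features.shape.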

Setting up the MultiScaleDeformableAttention package

  • Please follow the steps described in the DINO GitHub Repo to install the MultiScaleDeformableAttention package.
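
Once the operator is compiled, a quick import check confirms the extension is available. A minimal sketch, assuming the extension name used by DINO's build (MultiScaleDeformableAttention) and a CUDA-capable environment:

# Sanity check: the compiled CUDA extension should import without errors
import torch
import MultiScaleDeformableAttention  # built from DINO's ops directory

print("CUDA available:", torch.cuda.is_available())
print("MultiScaleDeformableAttention imported successfully")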

Pretraining Instructions


Major results from our work

1. Foundation X maximizes performance improvements during pretraining by utilizing all available annotations for classification, localization, and segmentation.

2. Foundation X enhances performance when jointly trained for organ localization and segmentation and excels during finetuning.

3. Foundation X excels in few-shot learning and maintains strong performance as the number of training samples varies.

4. Foundation X maximizes performance with cross-dataset and cross-task learning.

5. Foundation X full finetuning outperforms head-only finetuning and baseline models.


Citation

If you use this code or our pre-trained models in your research, please cite our paper:

@InProceedings{Islam_2025_WACV,
    author    = {Islam, Nahid Ul and Ma, DongAo and Pang, Jiaxuan and Velan, Shivasakthi Senthil and Gotway, Michael and Liang, Jianming},
    title     = {Foundation X: Integrating Classification Localization and Segmentation through Lock-Release Pretraining Strategy for Chest X-ray Analysis},
    booktitle = {Proceedings of the Winter Conference on Applications of Computer Vision (WACV)},
    month     = {February},
    year      = {2025},
    pages     = {3647-3656}
}

Acknowledgments

This research was partially supported by ASU and Mayo Clinic through a Seed Grant and an Innovation Grant, as well as by the NIH under Award Number R01HL128785. The authors are solely responsible for the content, which does not necessarily reflect the official views of the NIH. This work also utilized GPUs provided by ASU Research Computing (SOL), Bridges-2 at the Pittsburgh Supercomputing Center (allocated under BCS190015), and Anvil at Purdue University (allocated under MED220025). These resources are supported by the Advanced Cyberinfrastructure Coordination Ecosystem: Services & Support (ACCESS) program, funded by the National Science Foundation under grants #2138259, #2138286, #2138307, #2137603, and #2138296. We also extend our gratitude to Anirudh Kaniyar Narayana Iyengar for his contributions to collecting localization data, preparing bounding boxes in COCO format, and developing some of the data loaders. Finally, the content of this paper is covered by patents pending.

Contact

For any questions, feel free to reach out:
Email: nuislam (at) asu.edu

License

Released under the ASU GitHub Project License
