iMD4GC: Incomplete Multimodal Data Integration to Advance Precise Treatment Response Prediction and Survival Analysis for Gastric Cancer
Paper status: under review.
Fengtao Zhou,
Yingxue Xu,
Yanfen Cui,
Shenyan Zhang,
Yun Zhu,
Weiyang He,
Jiguang Wang,
Xin Wang,
Ronald Chan,
Louis Ho Shing Lau,
Chu Han,
Dafu Zhang,
Zhenhui Li*,
Hao Chen*
Key Ideas & Main Findings
- Background: Gastric cancer (GC) is a prevalent malignancy worldwide, ranking as the fifth most common cancer, with over 1 million new cases and 700 thousand deaths in 2020. Locally advanced gastric cancer (LAGC) accounts for approximately two-thirds of GC diagnoses, and neoadjuvant chemotherapy (NACT) has become the standard treatment for LAGC. However, the effectiveness of NACT varies significantly among patients, and a considerable subset is resistant to treatment. Ineffective NACT not only causes adverse effects but also misses the optimal therapeutic window, lowering survival rates. It is therefore crucial to use clinical data to precisely predict treatment response and survival prognosis for GC patients.
- Multimodal Learning for Gastric Cancer Analysis: Existing methods that rely on unimodal data fall short of capturing the multifaceted nature of GC, whereas multimodal data offer a more holistic and comprehensive basis for prediction. However, existing multimodal learning methods assume that all modalities are available for every patient, which does not reflect clinical practice. In reality, each patient often has only some of the modalities, and the resulting information loss degrades predictive accuracy.
- Incomplete Multimodal Data Integration Framework for Gastric Cancer: In this study, we propose an incomplete multimodal data integration framework for GC (iMD4GC) that addresses the challenges posed by incomplete multimodal data and enables precise response prediction and survival analysis. Specifically, iMD4GC incorporates a unimodal attention layer for each modality to capture intra-modal information. Cross-modal interaction layers then explore potential inter-modal interactions and capture complementary information across modalities, allowing the available modalities to compensate for missing ones. To better handle severely incomplete multimodal data, iMD4GC employs a "more-to-fewer" knowledge distillation that transfers knowledge learned from more modalities to fewer ones.
- Datasets: To evaluate iMD4GC, we collected three multimodal datasets for GC research: GastricRes (698 cases) for response prediction, GastricSur (801 cases) for survival analysis, and TCGA-STAD (400 cases) for survival analysis. These datasets are significantly larger than those used in previous studies.
- Experimental Results: iMD4GC achieved an AUC of 80.2% on GastricRes and C-indices of 71.4% on GastricSur and 66.1% on TCGA-STAD, significantly outperforming the compared methods. Moreover, iMD4GC is inherently interpretable, enabling transparent analysis of its decision-making process and providing valuable insights to clinicians. Furthermore, the flexible scalability of iMD4GC is of great significance for clinical practice, facilitating precision oncology through artificial intelligence and multimodal data integration.
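The cross-modal interaction step described above can be illustrated with a minimal sketch of masked attention across modalities. It assumes each modality has already been summarized into a single embedding token by its unimodal attention layer, and that a missing modality is excluded by masking it out as an attention key. The names `cross_modal_attention` and `available` are hypothetical, and this single-head scaled dot-product attention is not the repository's actual layer.

```python
import numpy as np


def cross_modal_attention(tokens, available, w_q, w_k, w_v):
    """Single-head scaled dot-product attention across modality tokens.

    tokens:    (M, d) array, one embedding per modality (placeholder rows for missing ones)
    available: (M,) boolean array, True where the modality was observed
    w_q, w_k, w_v: (d, d) projection matrices
    Returns an (M, d) array in which every row attends only to observed modalities.
    """
    q, k, v = tokens @ w_q, tokens @ w_k, tokens @ w_v
    scores = q @ k.T / np.sqrt(tokens.shape[1])             # (M, M) attention logits
    scores = np.where(available[None, :], scores, -np.inf)  # hide missing modalities as keys
    scores -= scores.max(axis=1, keepdims=True)             # numerical stability
    weights = np.exp(scores)                                # exp(-inf) == 0 for masked keys
    weights /= weights.sum(axis=1, keepdims=True)           # softmax over observed keys only
    return weights @ v


rng = np.random.default_rng(0)
d = 8
w_q, w_k, w_v = (rng.normal(size=(d, d)) for _ in range(3))
tokens = rng.normal(size=(3, d))            # hypothetical clinical, WSI, CT tokens
available = np.array([True, False, True])   # WSI missing for this patient

out_a = cross_modal_attention(tokens, available, w_q, w_k, w_v)
tokens_b = tokens.copy()
tokens_b[1] = 1e3                           # overwrite the missing modality's placeholder
out_b = cross_modal_attention(tokens_b, available, w_q, w_k, w_v)
```

Because the missing WSI token is never used as a key, the outputs for the observed clinical and CT tokens are identical in `out_a` and `out_b`, whatever placeholder is stored for the missing modality.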
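The "more-to-fewer" knowledge distillation mentioned in the framework description can be sketched with the standard temperature-scaled distillation loss. This is an assumption for illustration: the exact objective iMD4GC uses is not given here. In this sketch, the teacher logits come from a patient's prediction with more modalities and the student logits from the same patient with fewer modalities. The function names and the example logits are hypothetical.

```python
import numpy as np


def softmax(logits, temperature=1.0):
    """Row-wise softmax of temperature-scaled logits."""
    z = np.asarray(logits, dtype=float) / temperature
    z -= z.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) on temperature-softened distributions, scaled by T^2.

    teacher_logits: predictions made with MORE modalities (treated as fixed targets)
    student_logits: predictions for the same patients made with FEWER modalities
    """
    p_t = softmax(teacher_logits, temperature)
    p_s = softmax(student_logits, temperature)
    kl = np.sum(p_t * (np.log(p_t) - np.log(p_s)), axis=-1)  # per-patient KL divergence
    return float(np.mean(kl) * temperature ** 2)


# Hypothetical two-class response logits for two patients.
teacher = np.array([[2.0, -1.0], [0.5, 1.5]])  # e.g. clinical + WSI + CT
student = np.array([[1.0, 0.0], [0.2, 0.3]])   # e.g. clinical only
print(distillation_loss(teacher, student))
```

In a training loop, this term would typically be added to the task loss (for example, cross-entropy for response prediction), with gradients flowing only into the student branch. How iMD4GC pairs modality subsets and weights the term is not described in this README.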
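The C-index reported for GastricSur and TCGA-STAD measures how well predicted risks rank patients by survival time when some follow-up times are censored. Below is a minimal pure-Python sketch of Harrell's C-index. Skipping pairs with tied observed times is a simplification assumed here, and `harrell_c_index` is a hypothetical helper, not the evaluation code used in the paper.

```python
from itertools import combinations


def harrell_c_index(times, events, risk_scores):
    """Harrell's concordance index for right-censored survival data.

    times:       observed time (event or censoring) per patient
    events:      1 if the event (e.g. death) was observed, 0 if censored
    risk_scores: higher score = predicted higher risk / shorter survival
    A pair is comparable when the patient with the shorter time had an event.
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue  # simplification: pairs with tied times are skipped
        first, second = (i, j) if times[i] < times[j] else (j, i)
        if not events[first]:
            continue  # shorter time was censored, so the true ordering is unknown
        comparable += 1
        if risk_scores[first] > risk_scores[second]:
            concordant += 1.0
        elif risk_scores[first] == risk_scores[second]:
            concordant += 0.5  # tied predictions count as half-concordant
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


print(harrell_c_index([2, 4, 6, 8], [1, 1, 0, 1], [0.9, 0.7, 0.2, 0.1]))  # 1.0
```

A C-index of 0.5 corresponds to random ranking and 1.0 to perfect ranking. The pinned scikit-survival package provides `sksurv.metrics.concordance_index_censored`, which also handles tied times; whether this repository relies on it is an assumption.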
Please follow this GitHub repository for updates.
- Source code for reproducing experimental results on GastricRes dataset.
- Source code for reproducing experimental results on GastricSur dataset.
- Source code for reproducing experimental results on TCGA-STAD dataset.
The main dependencies are:
- torch 1.12.0+cu116
- scikit-survival 0.19.0
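A quick way to confirm that an environment matches these pins is to compare installed distribution versions against them. The sketch below uses only the standard library's `importlib.metadata`. The `check_versions` helper and the `REQUIRED` mapping are hypothetical and not part of the repository. The `+cu116` suffix denotes PyTorch's CUDA 11.6 build, which is served from PyTorch's own wheel index (for example, `pip install torch==1.12.0+cu116 --extra-index-url https://download.pytorch.org/whl/cu116`) rather than from PyPI.

```python
from importlib.metadata import PackageNotFoundError, version

# Pins taken from the dependency list above; the mapping itself is illustrative.
REQUIRED = {"torch": "1.12.0+cu116", "scikit-survival": "0.19.0"}


def check_versions(required):
    """Return {package: problem} for packages that are missing or differ from the pin."""
    problems = {}
    for package, pinned in required.items():
        try:
            installed = version(package)  # distribution version, e.g. "1.12.0+cu116"
        except PackageNotFoundError:
            problems[package] = "not installed"
            continue
        if installed != pinned:
            problems[package] = f"found {installed}, expected {pinned}"
    return problems


if __name__ == "__main__":
    for package, problem in check_versions(REQUIRED).items():
        print(f"{package}: {problem}")
```

An empty result means both pinned packages are installed at exactly the listed versions.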
To evaluate iMD4GC, we collected three multimodal datasets for GC study: GastricRes (698 cases) for response prediction, GastricSur (801 cases) for survival analysis, and TCGA-STAD (400 cases) for survival analysis.
- GastricRes: This dataset was collected from four prominent hospitals in China. It encompasses comprehensive information from 698 patients who were diagnosed with gastric cancer and underwent neoadjuvant chemotherapy (NACT). The dataset consists of three modalities: clinical records, whole slide images (WSI), and computed tomography (CT) images.
- GastricSur: This dataset was collected from two prominent hospitals in China. It comprises comprehensive data from a cohort of 801 patients who were diagnosed with gastric cancer and subsequently underwent surgical resection. Like GastricRes, it includes three modalities: clinical records, whole slide images (WSI), and computed tomography (CT) scans. During the follow-up period, 286 patients died.
- TCGA-STAD: This dataset was obtained from The Cancer Genome Atlas (TCGA) database and contains data from 400 patients diagnosed with gastric cancer. Unlike the GastricRes and GastricSur datasets, its modalities are clinical records, WSI, and RNA-seq. The clinical records were downloaded from LinkedOmics, the WSIs from the GDC Data Portal, and the RNA-seq data from cBioPortal. All patients in this dataset have clinical records, while 363 patients have WSIs and 374 patients have RNA-seq data.
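Because each patient has only a subset of the modalities, an incomplete-multimodal pipeline needs to track which modalities each patient has, for example to build attention masks or to pair "more" and "fewer" modality views. The sketch below groups patients by availability pattern. It also shows that the TCGA-STAD counts alone only bound the number of complete cases: with 37 patients missing WSIs and 26 missing RNA-seq, at least 400 - 37 - 26 = 337 patients have all three modalities, and the exact number is not stated here. The patient IDs and helper names are hypothetical.

```python
from collections import Counter

MODALITIES = ("clinical", "wsi", "rnaseq")


def availability_patterns(patients):
    """Count patients per combination of available modalities.

    patients: mapping patient_id -> set of modality names that patient has
    """
    return Counter(
        tuple(m for m in MODALITIES if m in available)
        for available in patients.values()
    )


def min_complete_cases(n_total, n_per_modality):
    """Lower bound on patients with every modality (union bound on missing ones)."""
    n_missing_upper = sum(n_total - n for n in n_per_modality)
    return max(0, n_total - n_missing_upper)


# TCGA-STAD counts from the text: 400 clinical, 363 WSI, 374 RNA-seq.
print(min_complete_cases(400, [400, 363, 374]))  # 337

# Hypothetical toy cohort.
toy = {
    "p1": {"clinical", "wsi", "rnaseq"},
    "p2": {"clinical", "rnaseq"},
    "p3": {"clinical", "wsi", "rnaseq"},
}
print(availability_patterns(toy))
```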
The first two datasets are not publicly released due to privacy restrictions, but they are available from the corresponding author upon reasonable request. The TCGA-STAD dataset is publicly available from the GDC Data Portal. The formatted TCGA-STAD data, which can be used to reproduce the results in this study, are available on OneDrive.
This work was supported by the National Natural Science Foundation of China (No. 62202403, 82001986, and 82360345), the Hong Kong Innovation and Technology Fund (No. PRP/034/22FX), the Shenzhen Science and Technology Innovation Committee Funding (Project No. SGDX20210823103201011), and the Research Grants Council of the Hong Kong Special Administrative Region, China (Project No. R6003-22 and C4024-22GF).