This notebook is derived from the CXR Foundation demo notebook. It uses the Change Healthcare Stratus Imaging PACS to create annotated datasets in DICOM, and the Google Cloud Medical Imaging Suite (MIS) to process the labeled datasets and uptrain a foundation model. The purpose of this demo is to show that implementing this pattern and combining these capabilities on Google Cloud can significantly accelerate the medical imaging ML development process.
MIS provides integration patterns between clinical imaging systems and Google Cloud services that can be used to accelerate de-identification, cohorting, labeling, transformation, and training for medical imaging ML. These services include the Cloud Healthcare API, BigQuery, and Vertex AI.
Commercial PACS are already found in most enterprise imaging environments because they are used for the majority of diagnostic radiology interpretation. Using their existing labeling capabilities through DICOM objects (KOS, GSPS, SR, SEG) as input to model training should accelerate the labeling process. These systems already hold a large amount of annotated data; the challenge is using it in a deterministic way. Additionally, following this process makes the output of the inference stage interoperable with what the PACS expects as input.
Using foundation models accelerates both the development and the runtime of the data curation and training pipeline. A model built on a foundation model needs less training data than a traditionally trained model, which means less data has to be labeled.
In this demo we will build a medical imaging model that predicts Pneumothorax from x-ray images. To do this, we will label 200 x-ray studies for Pneumothorax in the Change Healthcare Stratus Imaging PACS by creating a DICOM Key Object Selection (KOS) object for each image that is positive. Studies that are negative for Pneumothorax will not contain a KOS.
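Because a positive image is marked only by the presence of a KOS that references it, turning the PACS annotations into training labels can be a deterministic set lookup. The sketch below assumes the DICOM headers have already been read into a hypothetical `Instance` record (in a real KOS, the referenced instance UIDs come from its Current Requested Procedure Evidence Sequence), and that every non-KOS object in the cohort is an image to be labeled; other annotation objects such as GSPS or SR would need to be filtered out first.

```python
from dataclasses import dataclass

# Standard SOP Class UID for Key Object Selection Document Storage.
KOS_SOP_CLASS_UID = "1.2.840.10008.5.1.4.1.1.88.59"


@dataclass(frozen=True)
class Instance:
    """Hypothetical minimal view of a DICOM instance header."""
    sop_instance_uid: str
    sop_class_uid: str
    # SOP Instance UIDs this object references (only populated for KOS).
    referenced_uids: tuple = ()


def derive_labels(instances):
    """Map each image SOP Instance UID to 1 if any KOS references it, else 0.

    Assumption: every non-KOS instance is an image to be labeled.
    """
    positive_uids = set()
    images = []
    for inst in instances:
        if inst.sop_class_uid == KOS_SOP_CLASS_UID:
            positive_uids.update(inst.referenced_uids)
        else:
            images.append(inst)
    return {
        img.sop_instance_uid: int(img.sop_instance_uid in positive_uids)
        for img in images
    }


# Usage with made-up UIDs; DX is Digital X-Ray Image Storage - For Presentation.
DX = "1.2.840.10008.5.1.4.1.1.1.1"
cohort = [
    Instance("img-1", DX),
    Instance("kos-1", KOS_SOP_CLASS_UID, ("img-1",)),
    Instance("img-2", DX),  # no KOS references it, so it is negative
]
print(derive_labels(cohort))  # {'img-1': 1, 'img-2': 0}
```

Keying the lookup on SOP Instance UIDs, rather than on file names or study order, is what keeps the transform deterministic: the same PACS contents always yield the same labels.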
In a real-world scenario, diagnostic report text will also be available, and Google Cloud NLP or LLM APIs can mine it for findings that point to further relevant studies.
To label images, you must have some images in your commercial system. We'll use images from the NIH Chest X-ray dataset, because it already contains x-rays labeled with Pneumothorax and other conditions. In this demo we will follow the existing labels and recreate them in the PACS using the process above. This demonstrates that existing DICOM data and processes within existing PACS can serve as readily available labeled data for ML training.
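To recreate the NIH labels in the PACS, you first need to know which images are positive for Pneumothorax. The sketch below assumes the dataset's metadata CSV has an `Image Index` column and a `Finding Labels` column in which multiple findings are joined with `|` (as in the NIH release's `Data_Entry_2017.csv`; verify against your copy). The function name and sample rows are illustrative only.

```python
import csv
import io


def split_by_finding(csv_text, finding="Pneumothorax"):
    """Split image file names into (positives, negatives) for one finding.

    Assumes columns 'Image Index' and 'Finding Labels', where multiple
    findings are joined with '|' (e.g. 'Effusion|Pneumothorax').
    """
    positives, negatives = [], []
    for row in csv.DictReader(io.StringIO(csv_text)):
        findings = row["Finding Labels"].split("|")
        if finding in findings:
            positives.append(row["Image Index"])
        else:
            negatives.append(row["Image Index"])
    return positives, negatives


# Hypothetical sample rows; in practice pass the contents of the metadata CSV.
sample = (
    "Image Index,Finding Labels\n"
    "00000001_000.png,Cardiomegaly\n"
    "00000002_000.png,Effusion|Pneumothorax\n"
    "00000003_000.png,No Finding\n"
)
pos, neg = split_by_finding(sample)
print(pos)  # ['00000002_000.png']
print(neg)  # ['00000001_000.png', '00000003_000.png']
```

Splitting on `|` and testing membership avoids false matches from substring checks on the raw label string.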
The data used in this demo is de-identified. You may use data with PHI if that is an approved option in your organization. You can also use the Cloud Healthcare API de-identification capability if you need to de-identify your DICOM data.
NOTE: If you don't have access to a Change Healthcare Stratus Imaging PACS system to do the labeling, then you can do one of the following:
- Request access to a test instance of the Change Healthcare Stratus Imaging PACS.
- Leverage the staged sample data in `./data/staged/inputs/*.dcm` and skip to the Prepare DICOM instances section.
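If you use the staged sample data, a quick sanity check before continuing is to confirm that each staged file is a DICOM Part 10 file, which starts with a 128-byte preamble followed by the magic bytes `DICM`. This is a minimal standard-library sketch; the function names are hypothetical and the default directory is the staged path above.

```python
from pathlib import Path


def is_dicom_part10(path):
    """Return True if the file has a 128-byte preamble followed by b'DICM'."""
    with open(path, "rb") as f:
        header = f.read(132)
    return len(header) == 132 and header[128:] == b"DICM"


def list_staged_dicom(root="./data/staged/inputs"):
    """Return sorted .dcm paths under root that pass the Part 10 check.

    A missing directory yields an empty list instead of raising.
    """
    root = Path(root)
    if not root.is_dir():
        return []
    return [p for p in sorted(root.glob("*.dcm")) if is_dicom_part10(p)]


staged = list_staged_dicom()
print(f"{len(staged)} valid DICOM files staged")
```

This checks only the file framing, not the content; a full parse with a DICOM library is still needed to read the headers.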
At the end of the notebook we will have a model that can detect Pneumothorax in x-ray images. Please note that this is only an example and, in its current state, it is not intended for diagnostic use.
Now it's time to deploy and run our notebook!
- We hope this content has shown that a combination of PACS, MIS, and a foundation model can accelerate multiple steps in the medical imaging AI development pipeline.
- Annotated data from a commercial PACS can be used as labeled data in an ML training pipeline, as long as the data is fully accessible and can be transformed into labels deterministically, preferably through standard DICOM objects.