diff --git a/.gitignore b/.gitignore
index 5d6e73e1..866e942b 100644
--- a/.gitignore
+++ b/.gitignore
@@ -4,6 +4,7 @@ examples/autotest.py
 temp_outputs
 .ruff_cache
 *built_with*
+nnunet_out*
 
 #vscode files
 /.idea
diff --git a/docs/AutoPipeline.md b/docs/AutoPipeline.md
index 835ee021..c4f3e365 100644
--- a/docs/AutoPipeline.md
+++ b/docs/AutoPipeline.md
@@ -174,14 +174,18 @@ The contours can be selected by creating a YAML file to define a regular express
     ```sh
     OUTPUT_DIRECTORY
     ├── nnUNet_preprocessed
-    ├── nnUNet_raw_data_base
-    │   └── nnUNet_raw_data
-    │       └── Task500_HNSCC
-    │           ├── imagesTr
-    │           ├── imagesTs
-    │           ├── labelsTr
-    │           └── labelsTs
-    └── nnUNet_trained_models
+    ├── nnUNet_raw
+    │   └── Dataset001_HNSCC
+    │       ├── dataset.csv
+    │       ├── dataset.json
+    │       ├── imagesTr
+    │       ├── imagesTs
+    │       ├── labelsTr
+    │       ├── labelsTs
+    │       ├── markdown_images
+    │       ├── nnunet_preprocess_and_train.sh
+    │       └── report.md
+    └── nnUNet_results
     ```
 
 2. **Training Size**
@@ -231,7 +235,7 @@ The contours can be selected by creating a YAML file to define a regular express
 
     ```sh
     OUTPUT_DIRECTORY
-    ├── 0_subject1_0000.nii.gz
+    ├── {DATASET}_{SUBJECT_NUM}_{MODALITY}.nii.gz
     └── ...
     ```
 
@@ -246,8 +250,15 @@ The contours can be selected by creating a YAML file to define a regular express
     A dataset json file may look like this:
     ```json
     {
-        "modality":{
-            "0": "CT"
-        }
+        "channel_names": {
+            "0": "CT"
+        },
+        "labels": {
+            "background": 0,
+            "GTV": 1
+        },
+        "numTraining": 5,
+        "file_ending": ".nii.gz",
+        "licence": "hands off!"
     }
     ```
diff --git a/docs/nnUNet.md b/docs/nnUNet.md
index 26e01e40..1bb03f1c 100644
--- a/docs/nnUNet.md
+++ b/docs/nnUNet.md
@@ -19,122 +19,7 @@ autopipeline\
 
 Modalities can also be set to `--modalities MR,RTSTRUCT`
 
 AutoPipeline offers many more options and features for you to customize your outputs: <
->.
-## nnUNet Preprocess and Train
-
-### One-Step Preprocess and Train
-
-Med-ImageTools generates a file in your output folder called `nnunet_preprocess_and_train.sh` that combines all the commands needed for preprocessing and training your nnUNet model. Run that shell script to get a fully trained nnUNet model.
-
-Alternatively, you can go through each step individually as follows below:
-
-### nnUNet Preprocessing
-
-Follow the instructions for setting up your paths for nnUNet:
-
-Med-ImageTools generates the dataset.json that nnUNet requires in the output directory that you specify.
-
-The generated output directory structure will look something like:
-
-```sh
-OUTPUT_DIRECTORY
-├── nnUNet_preprocessed
-├── nnUNet_raw_data_base
-│   └── nnUNet_raw_data
-│       └── Task500_HNSCC
-│           ├── nnunet_preprocess_and_train.sh
-│           └── ...
-└── nnUNet_trained_models
-```
-
-nnUNet requires that environment variables be set before any commands are executed. To temporarily set them, run the following:
-
-```sh
-export nnUNet_raw_data_base="/OUTPUT_DIRECTORY/nnUNet_raw_data_base"
-export nnUNet_preprocessed="/OUTPUT_DIRECTORY/nnUNet_preprocessed"
-export RESULTS_FOLDER="/OUTPUT_DIRECTORY/nnUNet_trained_models"
-```
-
-To permanently set these environment variables, make sure that in your `~/.bashrc` file, these environment variables are set for nnUNet. The `nnUNet_preprocessed` and `nnUNet_trained_models` folders are generated as empty folders for you by Med-ImageTools. `nnUNet_raw_data_base` is populated with the required raw data files. Add this to the file:
-
-```sh
-export nnUNet_raw_data_base="/OUTPUT_DIRECTORY/nnUNet_raw_data_base"
-export nnUNet_preprocessed="/OUTPUT_DIRECTORY/nnUNet_preprocessed"
-export RESULTS_FOLDER="/OUTPUT_DIRECTORY/nnUNet_trained_models"
-```
-
-Then, execute the command:
-
-```sh
-source ~/.bashrc
-```
-
-Too allow nnUNet to preprocess your data for training, run the following command. Set XXX to the ID that you want to preprocess. This is your task ID. For example, for Task500_HNSCC, the task ID is 500. Task IDs must be between 500 and 999, so Med-ImageTools can run 500 instances with the `--nnunet` flag in a single output folder.
-
-```sh
-nnUNet_plan_and_preprocess -t XXX --verify_dataset_integrity
-```
-
-### nnUNet Training
-
-Once nnUNet has finished preprocessing, you may begin training your nnUNet model. To train your model, run the following command. Learn more about nnUNet's options here:
-
-```sh
-nnUNet_train CONFIGURATION TRAINER_CLASS_NAME TASK_NAME_OR_ID FOLD
-```
-
-## nnUNet Inference
-
-For inference data, nnUNet requires data to be in a different output format. To run AutoPipeline for nnUNet inference, run the following command:
-
-```sh
-autopipeline\
-  [INPUT_DIRECTORY] \
-  [OUTPUT_DIRECTORY] \
-  --modalities CT \
-  --nnunet_inference \
-  --dataset_json_path [DATASET_JSON_PATH]
-```
-To execute this command AutoPipeline needs a json file with the image modality definitions.
-
-Modalities can also be set to `--modalities MR`.
-
-The directory structue will look like:
-
-```sh
-OUTPUT_DIRECTORY
-├── 0_subject1_0000.nii.gz
-└── ...
-```
-
-To run inference, run the command:
-
-```sh
-nnUNet_predict -i INPUT_FOLDER -o OUTPUT_FOLDER -t TASK_NAME_OR_ID -m CONFIGURATION
-```
-
-In this case, the `INPUT_FOLDER` of nnUNet is the `OUTPUT_DIRECTORY` of Med-ImageTools.# Preparing Data for nnUNet
-
-nnUNet repo can be found at:
-
-## Processing DICOM Data with Med-ImageTools
-
-Ensure that you have followed the steps in before proceeding.
-
-To convert your data from DICOM to NIfTI for training an nnUNet auto-segmentation model, run the following command:
-
-```sh
-autopipeline\
-  [INPUT_DIRECTORY] \
-  [OUTPUT_DIRECTORY] \
-  --modalities CT,RTSTRUCT \
-  --nnunet
-```
-
-Modalities can also be set to `--modalities MR,RTSTRUCT`
-
-AutoPipeline offers many more options and features for you to customize your outputs: .
 
 ## nnUNet Preprocess and Train
@@ -155,28 +40,28 @@ The generated output directory structure will look something like:
 ```sh
 OUTPUT_DIRECTORY
 ├── nnUNet_preprocessed
-├── nnUNet_raw_data_base
-│   └── nnUNet_raw_data
-│       └── Task500_HNSCC
-│           ├── nnunet_preprocess_and_train.sh
-│           └── ...
-└── nnUNet_trained_models
+├── nnUNet_raw
+│   └── Dataset001_HNSCC
+│       ├── nnunet_preprocess_and_train.sh
+│       └── ...
+└── nnUNet_results
+
 ```
 
 nnUNet requires that environment variables be set before any commands are executed. To temporarily set them, run the following:
 
 ```sh
-export nnUNet_raw_data_base="/OUTPUT_DIRECTORY/nnUNet_raw_data_base"
+export nnUNet_raw="/OUTPUT_DIRECTORY/nnUNet_raw"
 export nnUNet_preprocessed="/OUTPUT_DIRECTORY/nnUNet_preprocessed"
-export RESULTS_FOLDER="/OUTPUT_DIRECTORY/nnUNet_trained_models"
+export nnUNet_results="/OUTPUT_DIRECTORY/nnUNet_results"
 ```
 
-To permanently set these environment variables, make sure that in your `~/.bashrc` file, these environment variables are set for nnUNet. The `nnUNet_preprocessed` and `nnUNet_trained_models` folders are generated as empty folders for you by Med-ImageTools. `nnUNet_raw_data_base` is populated with the required raw data files. Add this to the file:
+To permanently set these environment variables, make sure that in your `~/.bashrc` file, these environment variables are set for nnUNet. The `nnUNet_preprocessed` and `nnUNet_results` folders are generated as empty folders for you by Med-ImageTools. `nnUNet_raw` is populated with the required raw data files. Add this to the file:
 
 ```sh
-export nnUNet_raw_data_base="/OUTPUT_DIRECTORY/nnUNet_raw_data_base"
+export nnUNet_raw="/OUTPUT_DIRECTORY/nnUNet_raw"
 export nnUNet_preprocessed="/OUTPUT_DIRECTORY/nnUNet_preprocessed"
-export RESULTS_FOLDER="/OUTPUT_DIRECTORY/nnUNet_trained_models"
+export nnUNet_results="/OUTPUT_DIRECTORY/nnUNet_results"
 ```
 
 Then, execute the command:
@@ -185,18 +70,18 @@ Then, execute the command:
 source ~/.bashrc
 ```
 
-Too allow nnUNet to preprocess your data for training, run the following command. Set XXX to the ID that you want to preprocess. This is your task ID. For example, for Task500_HNSCC, the task ID is 500. Task IDs must be between 500 and 999, so Med-ImageTools can run 500 instances with the `--nnunet` flag in a single output folder.
+To allow nnUNet to preprocess your data for training, run the following command. Set X to the ID that you want to preprocess. This is your dataset ID. For example, for Dataset001_HNSCC, the dataset ID is 1. Dataset IDs must be between 1 and 999, so Med-ImageTools can run 999 instances with the `--nnunet` flag in a single output folder.
 
 ```sh
-nnUNet_plan_and_preprocess -t XXX --verify_dataset_integrity
+nnUNetv2_plan_and_preprocess -d X --verify_dataset_integrity -c 3d_fullres
 ```
 
 ### nnUNet Training
 
-Once nnUNet has finished preprocessing, you may begin training your nnUNet model. To train your model, run the following command. Learn more about nnUNet's options here:
+Once nnUNet has finished preprocessing, you may begin training your nnUNet model. To train your model, run the following command. Learn more about nnUNet's options here:
 
 ```sh
-nnUNet_train CONFIGURATION TRAINER_CLASS_NAME TASK_NAME_OR_ID FOLD
+nnUNetv2_train DATASET_NAME_OR_ID UNET_CONFIGURATION FOLD
 ```
 
 ## nnUNet Inference
@@ -218,15 +103,15 @@ Modalities can also be set to `--modalities MR`.
 The directory structue will look like:
 
 ```sh
-OUTPUT_DIRECTORY
-├── 0_subject1_0000.nii.gz
-└── ...
+ OUTPUT_DIRECTORY
+ ├── {DATASET}_{SUBJECT_NUM}_{MODALITY}.nii.gz
+ └── ...
 ```
 
 To run inference, run the command:
 
 ```sh
-nnUNet_predict -i INPUT_FOLDER -o OUTPUT_FOLDER -t TASK_NAME_OR_ID -m CONFIGURATION
+nnUNetv2_predict -i INPUT_FOLDER -o OUTPUT_FOLDER -d DATASET_NAME_OR_ID -c CONFIGURATION
 ```
 
-In this case, the `INPUT_FOLDER` of nnUNet is the `OUTPUT_DIRECTORY` of Med-ImageTools.
+In this case, the `INPUT_FOLDER` of nnUNet is the `OUTPUT_DIRECTORY` of Med-ImageTools.
\ No newline at end of file
diff --git a/src/imgtools/autopipeline.py b/src/imgtools/autopipeline.py
index 5d9759fe..eba7d8b2 100644
--- a/src/imgtools/autopipeline.py
+++ b/src/imgtools/autopipeline.py
@@ -13,7 +13,7 @@
 from imgtools.ops import StructureSetToSegmentation, ImageAutoInput, ImageAutoOutput, Resample
 from imgtools.pipeline import Pipeline
 
-from imgtools.utils.nnunet import generate_dataset_json, markdown_report_images
+from imgtools.utils.nnunet import generate_dataset_json, create_train_script, markdown_report_images
 from imgtools.utils.args import parser
 from imgtools.logging import logger
 
@@ -148,7 +148,6 @@ def __init__(self,
         if not nnunet and continue_processing and not os.path.exists(pathlib.Path(output_directory, ".temp").as_posix()):
             raise FileNotFoundError(f"Cannot continue processing. .temp directory does not exist in {output_directory}. Run without --continue_processing to start from scratch.")
 
-        study_name = os.path.split(self.input_directory)[1]
         if nnunet_inference:
             roi_yaml_path = ""
             custom_train_test_split = False
@@ -156,34 +155,49 @@ def __init__(self,
            if modalities != "CT" and modalities != "MR":
                 raise ValueError("nnUNet inference can only be run on image files. Please set modalities to 'CT' or 'MR'")
         if nnunet:
-            self.base_output_directory = self.output_directory
-            if not os.path.exists(pathlib.Path(self.output_directory, "nnUNet_preprocessed").as_posix()):
-                os.makedirs(pathlib.Path(self.output_directory, "nnUNet_preprocessed").as_posix())
-            if not os.path.exists(pathlib.Path(self.output_directory, "nnUNet_trained_models").as_posix()):
-                os.makedirs(pathlib.Path(self.output_directory, "nnUNet_trained_models").as_posix())
-            self.output_directory = pathlib.Path(self.output_directory, "nnUNet_raw_data_base",
-                                                 "nnUNet_raw_data").as_posix()
-            if not os.path.exists(self.output_directory):
-                os.makedirs(self.output_directory)
+
+            pathlib.Path(self.output_directory, "nnUNet_results").mkdir(parents=True, exist_ok=True)
+            pathlib.Path(self.output_directory, "nnUNet_preprocessed").mkdir(parents=True, exist_ok=True)
+            raw_path = pathlib.Path(self.output_directory, "nnUNet_raw")
+            raw_path.mkdir(parents=True, exist_ok=True)
+            self.output_directory = raw_path.as_posix()
+
             all_nnunet_folders = glob.glob(pathlib.Path(self.output_directory, "*", " ").as_posix())
-            numbers = [int(os.path.split(os.path.split(folder)[0])[1][4:7]) for folder in all_nnunet_folders if os.path.split(os.path.split(folder)[0])[1].startswith("Task")]
-            if (len(numbers) == 0 and continue_processing) or not continue_processing or not os.path.exists(pathlib.Path(self.output_directory, f"Task{max(numbers)}_{study_name}", ".temp").as_posix()):
-                available_numbers = list(range(500, 1000))
-                for folder in all_nnunet_folders:
-                    folder_name = os.path.split(os.path.split(folder)[0])[1]
-                    if folder_name.startswith("Task") and folder_name[4:7].isnumeric() and int(folder_name[4:7]) in available_numbers:
-                        available_numbers.remove(int(folder_name[4:7]))
-                if len(available_numbers) == 0:
-                    raise Error("There are not enough task ID's for the nnUNet output. Please make sure that there is at least one task ID available between 500 and 999, inclusive")
-                task_folder_name = f"Task{available_numbers[0]}_{study_name}"
-                self.output_directory = pathlib.Path(self.output_directory, task_folder_name).as_posix()
-                self.task_id = available_numbers[0]
+
+            # Extract used dataset IDs from folder names that match the "Dataset###_" format
+            used_ids = {
+                int(pathlib.Path(folder).parent.parent.name[7:10])
+                for folder in all_nnunet_folders
+                if pathlib.Path(folder).parent.parent.name.startswith("Dataset")
+            }
+
+            study_name = pathlib.Path(self.input_directory).name
+            new_dataset_required = (
+                not used_ids  # No existing datasets
+                or not continue_processing  # Processing shouldn't continue with existing datasets
+                or not pathlib.Path(self.output_directory, f"Dataset{max(used_ids):03}_{study_name}", ".temp").exists()  # Temp folder missing
+            )
+
+            if new_dataset_required:
+                all_ids = set(range(1, 1000))
+                available_ids = sorted(all_ids - used_ids)
+                if not available_ids:
+                    raise ValueError(
+                        "There are not enough dataset IDs for the nnUNet output. "
+                        "Please ensure at least one dataset ID is available between 001 and 999, inclusive."
+                    )
+                dataset_id = available_ids[0]  # Assign the first available dataset ID
             else:
-                self.task_id = max(numbers)
-                task_folder_name = f"Task{self.task_id}_{study_name}"
-                self.output_directory = pathlib.Path(self.output_directory, task_folder_name).as_posix()
-            if not os.path.exists(pathlib.Path(self.output_directory, ".temp").as_posix()):
-                os.makedirs(pathlib.Path(self.output_directory, ".temp").as_posix())
+                dataset_id = max(used_ids)  # Reuse the highest existing dataset ID
+
+            self.dataset_id = dataset_id
+
+            # Create the dataset folder name and update the output directory path
+            dataset_folder_name = f"Dataset{self.dataset_id:03}_{study_name}"
+            self.output_directory = pathlib.Path(self.output_directory, dataset_folder_name).as_posix()
+
+            temp_folder_path = pathlib.Path(self.output_directory, ".temp")
+            temp_folder_path.mkdir(parents=True, exist_ok=True)
 
         if not dry_run:
             # Make a directory
@@ -304,7 +318,7 @@ def __init__(self,
                 raise FileNotFoundError(f"No file named {dataset_json_path} found. Image modality definitions are required for nnUNet inference")
             else:
                 with open(dataset_json_path, "r") as f:
-                    self.nnunet_info["modalities"] = {v: k.zfill(4) for k, v in json.load(f)["modality"].items()}
+                    self.nnunet_info["modalities"] = {v: k.zfill(4) for k, v in json.load(f)["channel_names"].items()}
 
         # Input operations
         self.input = ImageAutoInput(input_directory, modalities, n_jobs, visualize, update)
@@ -361,7 +375,6 @@ def process_one_subject(self, subject_id):
         if os.path.exists(pathlib.Path(self.output_directory,".temp",f'temp_{subject_id}.pkl').as_posix()):
             print(f"{subject_id} already processed")
             return
-
         print("Processing:", subject_id)
 
         read_results = self.input(subject_id)
@@ -413,6 +426,11 @@ def process_one_subject(self, subject_id):
             if hasattr(read_results[i], "metadata") and read_results[i].metadata is not None:
                 metadata.update(read_results[i].metadata)
 
+            if self.is_nnunet or self.is_nnunet_inference:
+                nnunet_subject_name = f"{pathlib.Path(self.input_directory).name}_{subject_id.split('_')[0]:>03}"
+
+                subject_name = "_".join(subject_id.split("_")[1::])  # Extracts {SUBJECT_NAME}
+
             # modality is MR and the user has selected to have nnunet output
             if self.is_nnunet:
                 if modality == "MR":  # MR images can have various modalities like FLAIR, T1, etc.
@@ -429,15 +447,15 @@ def process_one_subject(self, subject_id):
                     self.total_modality_counter[modality] = 1
                 else:
                     self.total_modality_counter[modality] += 1
-                if "_".join(subject_id.split("_")[1::]) in self.train:
-                    self.output(subject_id, image, output_stream, nnunet_info=self.nnunet_info)
+                if subject_name in self.train:
+                    self.output(nnunet_subject_name, image, output_stream, nnunet_info=self.nnunet_info)
                 else:
-                    self.output(subject_id, image, output_stream, nnunet_info=self.nnunet_info, train_or_test="Ts")
+                    self.output(nnunet_subject_name, image, output_stream, nnunet_info=self.nnunet_info, train_or_test="Ts")
             elif self.is_nnunet_inference:
                 self.nnunet_info["current_modality"] = modality if modality == "CT" else metadata["AcquisitionContrast"]
                 if self.nnunet_info["current_modality"] not in self.nnunet_info["modalities"].keys():
                     raise ValueError(f"The modality {self.nnunet_info['current_modality']} is not in the list of modalities that are present in dataset.json.")
-                self.output(subject_id, image, output_stream, nnunet_info=self.nnunet_info)
+                self.output(nnunet_subject_name, image, output_stream, nnunet_info=self.nnunet_info)
             else:
                 self.output(subject_id, image, output_stream)
 
@@ -518,7 +536,7 @@ def process_one_subject(self, subject_id):
                 all_files = glob.glob(pathlib.Path(image_train_path, "*.nii.gz").as_posix())
                 # print(all_files)
                 for file in all_files:
-                    if subject_id in os.path.split(file)[1]:
+                    if nnunet_subject_name in os.path.split(file)[1]:
                         os.remove(file)
                 warnings.warn(f"Patient {subject_id} is missing a complete image-label pair")
                 self.patients_with_missing_labels.add("".join(subject_id.split("_")[1:]))
@@ -539,10 +557,10 @@ def process_one_subject(self, subject_id):
                         sparse_mask = np.transpose(mask.generate_sparse_mask().mask_array)
                         sparse_mask = sitk.GetImageFromArray(sparse_mask)  # convert the nparray to sitk image
                         sparse_mask.CopyInformation(image)
-                        if "_".join(subject_id.split("_")[1::]) in self.train:
-                            self.output(subject_id, sparse_mask, output_stream, nnunet_info=self.nnunet_info, label_or_image="labels")  # rtstruct is label for nnunet
+                        if subject_name in self.train:
+                            self.output(nnunet_subject_name, sparse_mask, output_stream, nnunet_info=self.nnunet_info, label_or_image="labels")  # rtstruct is label for nnunet
                         else:
-                            self.output(subject_id, sparse_mask, output_stream, nnunet_info=self.nnunet_info, label_or_image="labels", train_or_test="Ts")
+                            self.output(nnunet_subject_name, sparse_mask, output_stream, nnunet_info=self.nnunet_info, label_or_image="labels", train_or_test="Ts")
                     else:  # if there is only one ROI, sitk.GetArrayFromImage() will return a 3d array instead of a 4d array with one slice
                         if len(mask_arr.shape) == 3:
@@ -578,7 +596,7 @@ def process_one_subject(self, subject_id):
         metadata["Modalities"] = str(list(subject_modalities))
         metadata["numRTSTRUCTs"] = num_rtstructs
         if self.is_nnunet:
-            metadata["Train or Test"] = "train" if "_".join(subject_id.split("_")[1::]) in self.train else "test"
+            metadata["Train or Test"] = "train" if subject_name in self.train else "test"
         with open(pathlib.Path(self.output_directory,".temp",f'{subject_id}.pkl').as_posix(),'wb') as f:  # the continue flag depends on this being the last line in this method
             pickle.dump(metadata,f)
         return
@@ -608,33 +626,36 @@ def save_data(self):
 
         shutil.rmtree(pathlib.Path(self.output_directory, ".temp").as_posix())
-
-        # Save dataset json
-        if self.is_nnunet:  # dataset.json for nnunet and .sh file to run to process it
-            imagests_path = pathlib.Path(self.output_directory, "imagesTs").as_posix()
-            images_test_location = imagests_path if os.path.exists(imagests_path) else None
-            generate_dataset_json(pathlib.Path(self.output_directory, "dataset.json").as_posix(),
-                                  pathlib.Path(self.output_directory, "imagesTr").as_posix(),
-                                  images_test_location,
-                                  tuple(self.nnunet_info["modalities"].keys()),
-                                  {v: k for k, v in self.existing_roi_indices.items()},
-                                  os.path.split(self.input_directory)[1])
-            _, child = os.path.split(self.output_directory)
-            shell_path = pathlib.Path(self.output_directory, child.split("_")[1]+".sh").as_posix()
-            if os.path.exists(shell_path):
-                os.remove(shell_path)
-            with open(shell_path, "w", newline="\n") as f:
-                output = "#!/bin/bash\n"
-                output += "set -e"
-                output += f'export nnUNet_raw_data_base="{self.base_output_directory}/nnUNet_raw_data_base"\n'
-                output += f'export nnUNet_preprocessed="{self.base_output_directory}/nnUNet_preprocessed"\n'
-                output += f'export RESULTS_FOLDER="{self.base_output_directory}/nnUNet_trained_models"\n\n'
-                output += f'nnUNet_plan_and_preprocess -t {self.task_id} --verify_dataset_integrity\n\n'
-                output += 'for (( i=0; i<5; i++ ))\n'
-                output += 'do\n'
-                output += f'    nnUNet_train 3d_fullres nnUNetTrainerV2 {os.path.split(self.output_directory)[1]} $i --npz\n'
-                output += 'done'
-                f.write(output)
-            markdown_report_images(self.output_directory, self.total_modality_counter)  # images saved to the output directory
+
+        if self.is_nnunet:
+            train_dir = ((pathlib.Path(self.output_directory)) / 'imagesTr')
+            num_training_cases = sum(  # This can be different from len(self.train) if regex of ROI not matched
+                1 for file in train_dir.iterdir()
+                if file.suffixes == ['.nii', '.gz']
+            )
+            test_dir = ((pathlib.Path(self.output_directory)) / 'imagesTs')
+            num_test_cases = sum(  # This can be different from len(self.test) if regex of ROI not matched
+                1 for file in test_dir.iterdir()
+                if file.suffixes == ['.nii', '.gz']
+            ) if test_dir.exists() else 0  # no testing data
+
+            channel_names_mapping = {  # Earlier generated as {"CT": "0000"}, now needed as {"0": "CT"}
+                self.nnunet_info["modalities"][k].lstrip('0') or '0': k
+                for k in self.nnunet_info["modalities"].keys()
+            }
+            generate_dataset_json(
+                output_folder=pathlib.Path(self.output_directory),
+                channel_names=channel_names_mapping,
+                labels=self.existing_roi_indices,
+                file_ending='.nii.gz',
+                num_training_cases=num_training_cases
+            )
+            create_train_script(self.output_directory, self.dataset_id)
+            markdown_report_images(
+                self.output_directory,
+                self.total_modality_counter,
+                num_training_cases,
+                num_test_cases
+            )
 
         # Save summary info (factor into different file)
         markdown_path = pathlib.Path(self.output_directory, "report.md").as_posix()
diff --git a/src/imgtools/utils/nnunet.py b/src/imgtools/utils/nnunet.py
index 584a7534..389082be 100644
--- a/src/imgtools/utils/nnunet.py
+++ b/src/imgtools/utils/nnunet.py
@@ -1,103 +1,169 @@
-from typing import Tuple, List
-import os
-import pathlib
-import glob
-import json
-import numpy as np
+from typing import Tuple, Dict
+import pathlib, json
 import matplotlib.pyplot as plt
 
+def markdown_report_images(
+        output_folder: str | pathlib.Path,
+        modality_count: Dict[str, int],
+        train_total: int,
+        test_total: int) -> None:
+    output_folder = pathlib.Path(output_folder)
+    images_folder = output_folder / "markdown_images"
 
-def markdown_report_images(output_folder, modality_count):
+    images_folder.mkdir(parents=True, exist_ok=True)
+
+    # Bar plot for modality counts
     modalities = list(modality_count.keys())
     modality_totals = list(modality_count.values())
-
-    if not os.path.exists(pathlib.Path(output_folder, "markdown_images").as_posix()):
-        os.makedirs(pathlib.Path(output_folder, "markdown_images").as_posix())
-    plt.figure(1)
+
+    plt.figure()
     plt.bar(modalities, modality_totals)
-    plt.savefig(pathlib.Path(output_folder, "markdown_images", "nnunet_modality_count.png").as_posix())
-
-    plt.figure(2)
-    train_total = len(glob.glob(pathlib.Path(output_folder, "labelsTr", "*.nii.gz").as_posix()))
-    test_total = len(glob.glob(pathlib.Path(output_folder, "labelsTs", "*.nii.gz").as_posix()))
-    plt.pie([train_total, test_total], labels=[f"Train - {train_total}", f"Test - {test_total}"])
-    plt.savefig(pathlib.Path(output_folder, "markdown_images", "nnunet_train_test_pie.png").as_posix())
-
-# this code is taken from:
-# Division of Medical Image Computing, German Cancer Research Center (DKFZ)
-# in the nnUNet and batchgenerator repositories
-
-
-def save_json(obj, file: str, indent: int = 4, sort_keys: bool = True) -> None:
+    plt.title("Modality Counts")
+    plt.xlabel("Modalities")
+    plt.ylabel("Counts")
+    plt.savefig(images_folder / "nnunet_modality_count.png")
+    plt.close()
+
+    # Pie chart for train/test distribution
+    plt.figure()
+    plt.pie(
+        [train_total, test_total],
+        labels=[f"Train - {train_total}", f"Test - {test_total}"],
+        autopct='%1.1f%%',
+    )
+    plt.title("Train/Test Distribution")
+    plt.savefig(images_folder / "nnunet_train_test_pie.png")
+    plt.close()
+
+
+def save_json(
+        obj: dict,
+        file: str | pathlib.Path,
+        indent: int = 4,
+        sort_keys: bool = True) -> None:
     with open(file, 'w') as f:
         json.dump(obj, f, sort_keys=sort_keys, indent=indent)
 
+def create_train_script(
+        output_directory: str | pathlib.Path,
+        dataset_id: int):
+    """
+    Creates a bash script (`nnunet_preprocess_and_train.sh`) for running nnUNet training, with paths for raw data,
+    preprocessed data, and trained models. The script ensures environment variables are set and
+    executes the necessary training commands.
 
-def get_identifiers_from_splitted_files(folder: str):
-    uniques = np.unique([i[:-12] for i in subfiles(folder, suffix='.nii.gz', join=False)])
-    return uniques
-
-
-def subfiles(folder: str, join: bool = True, prefix: str = None, suffix: str = None, sort: bool = True) -> List[str]:
-    if join:
-        path_fn = os.path.join
-    else:
-        def path_fn(x, y): return y
-
-    res = [path_fn(folder, i) for i in os.listdir(folder) if os.path.isfile(os.path.join(folder, i))
-           and (prefix is None or i.startswith(prefix))
-           and (suffix is None or i.endswith(suffix))]
-    if sort:
-        res.sort()
-    return res
-
-
-def generate_dataset_json(output_file: str, imagesTr_dir: str, imagesTs_dir: str, modalities: Tuple,
-                          labels: dict, dataset_name: str, sort_keys=True, license: str = "hands off!", dataset_description: str = "",
-                          dataset_reference="", dataset_release='0.0'):
+    Parameters:
+    - output_directory (str): The directory where the output and subdirectories are located.
+    - dataset_id (int): The ID of the dataset to be processed.
     """
-    :param output_file: This needs to be the full path to the dataset.json you intend to write, so
-    output_file='DATASET_PATH/dataset.json' where the folder DATASET_PATH points to is the one with the
-    imagesTr and labelsTr subfolders
-    :param imagesTr_dir: path to the imagesTr folder of that dataset
-    :param imagesTs_dir: path to the imagesTs folder of that dataset. Can be None
-    :param modalities: tuple of strings with modality names. must be in the same order as the images (first entry
-    corresponds to _0000.nii.gz, etc). Example: ('T1', 'T2', 'FLAIR').
-    :param labels: dict with int->str (key->value) mapping the label IDs to label names. Note that 0 is always
-    supposed to be background! Example: {0: 'background', 1: 'edema', 2: 'enhancing tumor'}
-    :param dataset_name: The name of the dataset. Can be anything you want
-    :param sort_keys: In order to sort or not, the keys in dataset.json
-    :param license:
-    :param dataset_description:
-    :param dataset_reference: website of the dataset, if available
-    :param dataset_release:
-    :return:
+    # Define paths using pathlib
+    output_directory = pathlib.Path(output_directory)
+    shell_path = output_directory / 'nnunet_preprocess_and_train.sh'
+    base_dir = output_directory.parent.parent
+
+    if shell_path.exists():
+        shell_path.unlink()
+
+    # Define the environment variables and the script commands
+    script_content = f"""#!/bin/bash
+set -e
+
+export nnUNet_raw="{base_dir}/nnUNet_raw"
+export nnUNet_preprocessed="{base_dir}/nnUNet_preprocessed"
+export nnUNet_results="{base_dir}/nnUNet_results"
+
+nnUNetv2_plan_and_preprocess -d {dataset_id} --verify_dataset_integrity -c 3d_fullres
+
+for (( i=0; i<5; i++ ))
+do
+    nnUNetv2_train {dataset_id} 3d_fullres $i
+done
+"""
+
+    # Write the script content to the file
+    with shell_path.open("w", newline="\n") as f:
+        f.write(script_content)
+
+# Code taken from: https://github.com/MIC-DKFZ/nnUNet/blob/master/nnunetv2/dataset_conversion/generate_dataset_json.py
+
+def generate_dataset_json(output_folder: pathlib.Path | str,
+                          channel_names: Dict[str, str],
+                          labels: Dict[str, int],
+                          num_training_cases: int,
+                          file_ending: str,
+                          regions_class_order: Tuple[int, ...] = None,
+                          dataset_name: str = None,
+                          reference: str = None,
+                          release: str = None,
+                          usage_license: str = 'hands off!',
+                          description: str = None,
+                          overwrite_image_reader_writer: str = None,
+                          **kwargs):
     """
-    train_identifiers = get_identifiers_from_splitted_files(imagesTr_dir)
-
-    if imagesTs_dir is not None:
-        test_identifiers = get_identifiers_from_splitted_files(imagesTs_dir)
-    else:
-        test_identifiers = []
-
-    json_dict = {}
-    json_dict['name'] = dataset_name
-    json_dict['description'] = dataset_description
-    json_dict['tensorImageSize'] = "4D"
-    json_dict['reference'] = dataset_reference
-    json_dict['licence'] = license
-    json_dict['release'] = dataset_release
-    json_dict['modality'] = {str(i): modalities[i] for i in range(len(modalities))}
-    json_dict['labels'] = {str(i): labels[i] for i in labels.keys()}
-
-    json_dict['numTraining'] = len(train_identifiers)
-    json_dict['numTest'] = len(test_identifiers)
-    json_dict['training'] = [
-        {'image': "./imagesTr/%s.nii.gz" % i, "label": "./labelsTr/%s.nii.gz" % i} for i
-        in
-        train_identifiers]
-    json_dict['test'] = ["./imagesTs/%s.nii.gz" % i for i in test_identifiers]
-
-    if not output_file.endswith("dataset.json"):
-        print("WARNING: output file name is not dataset.json! This may be intentional or not. You decide. "
-              "Proceeding anyways...")
-    save_json(json_dict, os.path.join(output_file), sort_keys=sort_keys)
+    Generates a dataset.json file in the output folder
+
+    channel_names:
+        Channel names must map the index to the name of the channel, example:
+        {
+            0: 'T1',
+            1: 'CT'
+        }
+        Note that the channel names may influence the normalization scheme!! Learn more in the documentation.
+
+    labels:
+        This will tell nnU-Net what labels to expect. Important: This will also determine whether you use region-based training or not.
+        Example regular labels:
+        {
+            'background': 0,
+            'left atrium': 1,
+            'some other label': 2
+        }
+        Example region-based training:
+        {
+            'background': 0,
+            'whole tumor': (1, 2, 3),
+            'tumor core': (2, 3),
+            'enhancing tumor': 3
+        }
+
+        Remember that nnU-Net expects consecutive values for labels! nnU-Net also expects 0 to be background!
+
+    num_training_cases: is used to double check all cases are there!
+
+    file_ending: needed for finding the files correctly. IMPORTANT! File endings must match between images and
+    segmentations!
+
+    dataset_name, reference, release, license, description: self-explanatory and not used by nnU-Net. Just for
+    completeness and as a reminder that these would be great!
+
+    overwrite_image_reader_writer: If you need a special IO class for your dataset you can derive it from
+    BaseReaderWriter, place it into nnunet.imageio and reference it here by name
+
+    kwargs: whatever you put here will be placed in the dataset.json as well
+
+    """
+
+    has_regions: bool = any([isinstance(i, (tuple, list)) and len(i) > 1 for i in labels.values()])
+    if has_regions:
+        assert regions_class_order is not None, "You have defined regions but regions_class_order is not set. " \
+                                                "You need that."
+
+    # Construct the dataset JSON structure
+    dataset_json = {
+        "channel_names": channel_names,
+        "labels": labels,
+        "numTraining": num_training_cases,
+        "file_ending": file_ending,
+        "name": dataset_name,
+        "reference": reference,
+        "release": release,
+        "licence": usage_license,
+        "description": description,
+        "overwrite_image_reader_writer": overwrite_image_reader_writer,
+        "regions_class_order": regions_class_order,
+    }
+
+    dataset_json = {k: v for k, v in dataset_json.items() if v is not None}
+
+    dataset_json.update(kwargs)
+
+    save_json(dataset_json, pathlib.Path(output_folder) / 'dataset.json', sort_keys=False)
\ No newline at end of file
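For reference, a minimal usage sketch of the two helpers added in src/imgtools/utils/nnunet.py; the output path and label names below are hypothetical placeholders that mirror the dataset.json example in docs/AutoPipeline.md.

```python
# Hedged sketch: write dataset.json and the preprocess/train script for a hypothetical
# Dataset001_HNSCC folder produced by AutoPipeline with --nnunet.
from imgtools.utils.nnunet import create_train_script, generate_dataset_json

# Hypothetical nnU-Net v2 raw dataset folder: OUTPUT_DIRECTORY/nnUNet_raw/Dataset001_HNSCC
dataset_dir = "OUTPUT_DIRECTORY/nnUNet_raw/Dataset001_HNSCC"

generate_dataset_json(
    output_folder=dataset_dir,
    channel_names={"0": "CT"},            # channel index -> name (nnU-Net v2 format)
    labels={"background": 0, "GTV": 1},   # label name -> consecutive integer, 0 = background
    num_training_cases=5,
    file_ending=".nii.gz",
)

# Writes nnunet_preprocess_and_train.sh next to dataset.json; the script exports
# nnUNet_raw, nnUNet_preprocessed and nnUNet_results, then runs
# nnUNetv2_plan_and_preprocess and nnUNetv2_train for dataset ID 1.
create_train_script(dataset_dir, dataset_id=1)
```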