feat: update to nnunetV2 #145
base: main
Conversation
…ADCURE-0005_000_0000.nii.gz)
Walkthrough

The pull request introduces significant modifications to the nnUNet documentation and codebase. Key changes include a restructuring of the …

Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant User
    participant AutoPipeline
    participant nnunet
    User->>AutoPipeline: Start processing
    AutoPipeline->>AutoPipeline: Create output directories
    AutoPipeline->>AutoPipeline: Determine dataset ID
    AutoPipeline->>nnunet: Generate dataset JSON
    nnunet-->>AutoPipeline: Return dataset JSON
    AutoPipeline->>nnunet: Create training script
    nnunet-->>AutoPipeline: Return training script
    AutoPipeline-->>User: Processing complete
```
Possibly related PRs
Actionable comments posted: 4
🧹 Outside diff range and nitpick comments (6)
docs/nnUNet.md (1)
Line range hint 1-117: Consider enhancing the documentation further.

Given that one of the PR objectives is to "Update Documentation", consider the following improvements:
- Add information about testing on large subsets of data (mentioned in PR objectives)
- Include performance considerations or best practices
- Add troubleshooting section for common issues
- Consider adding version compatibility information
src/imgtools/utils/nnunet.py (3)
133-134: Remove unnecessary `f` prefix from non-formatted strings.

The `f` prefix is unnecessary since the strings do not contain any placeholders. Removing it will prevent confusion. Apply this diff to correct the strings:

```diff
-assert regions_class_order is not None, f"You have defined regions but regions_class_order is not set. " \
-                                        f"You need that."
+assert regions_class_order is not None, "You have defined regions but regions_class_order is not set. " \
+                                        "You need that."
```

🧰 Tools
🪛 Ruff (0.8.0)
133-133: f-string without any placeholders. Remove extraneous `f` prefix (F541)
134-134: f-string without any placeholders. Remove extraneous `f` prefix (F541)
143-143: Use descriptive variable names and simplify dictionary iteration.

The variable name `l` is ambiguous; renaming it to something like `label_key` improves readability. Also, you can iterate directly over the dictionary without calling `.keys()`. Apply this diff to enhance clarity:

```diff
-for l in labels.keys():
+for label_key in labels:
```

🧰 Tools
🪛 Ruff (0.8.0)
143-143: Ambiguous variable name: `l` (E741)
143-143: Use `key in dict` instead of `key in dict.keys()`; remove `.keys()` (SIM118)
153-153: Maintain a professional tone in comments.

The comment "Live with it." can be perceived as unprofessional. Consider rephrasing it to maintain a respectful tone:

```diff
-# channel_names now. Live with it.
+# Renamed to 'channel_names' for clarity.
```

src/imgtools/autopipeline.py (2)
Line range hint 1-1: Remove the unnecessary import of `Error` from `aifc`.

The import statement `from aifc import Error` is unused, and `aifc.Error` is inappropriate for raising exceptions in this context. Remove it to clean up unused imports:

```diff
-from aifc import Error
```
633-633: Simplify dictionary iteration by omitting `.keys()`.

When iterating over a dictionary's keys, you can iterate directly over the dictionary without calling `.keys()`. This makes the code more concise and idiomatic:

```diff
 channel_names_mapping = {
     self.nnunet_info["modalities"][k].lstrip('0') or '0': k
-    for k in self.nnunet_info["modalities"].keys()
+    for k in self.nnunet_info["modalities"]
 }
```

🧰 Tools
🪛 Ruff (0.8.0)
633-633: Use `key in dict` instead of `key in dict.keys()`; remove `.keys()` (SIM118)
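For illustration, the comprehension discussed above inverts a modalities map into nnUNet channel names. A minimal, self-contained sketch (the `modalities` values here are hypothetical examples, not taken from the PR):

```python
# Hypothetical modalities map: modality name -> zero-padded channel index,
# mirroring the shape of self.nnunet_info["modalities"].
modalities = {"CT": "0000", "MR": "0001"}

channel_names_mapping = {
    modalities[k].lstrip("0") or "0": k
    for k in modalities  # iterate the dict directly; no .keys() needed (SIM118)
}

# "0000".lstrip("0") yields "", which is falsy, so `or "0"` restores "0".
print(channel_names_mapping)  # → {'0': 'CT', '1': 'MR'}
```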
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (3)
- docs/nnUNet.md (1 hunks)
- src/imgtools/autopipeline.py (7 hunks)
- src/imgtools/utils/nnunet.py (1 hunks)
🧰 Additional context used
🪛 Ruff (0.8.0)
src/imgtools/autopipeline.py
633-633: Use `key in dict` instead of `key in dict.keys()`; remove `.keys()` (SIM118)

src/imgtools/utils/nnunet.py
2-2: `os` imported but unused. Remove unused import (F401)
2-2: `glob` imported but unused. Remove unused import (F401)
133-133: f-string without any placeholders. Remove extraneous `f` prefix (F541)
134-134: f-string without any placeholders. Remove extraneous `f` prefix (F541)
143-143: Ambiguous variable name: `l` (E741)
143-143: Use `key in dict` instead of `key in dict.keys()`; remove `.keys()` (SIM118)
🔇 Additional comments (1)
docs/nnUNet.md (1)
117-117: LGTM! Helpful clarification added.
The added clarification about the relationship between Med-ImageTools' output directory and nnUNet's input folder is valuable for users.
Codecov Report

Attention: Patch coverage is …

Additional details and impacted files:

```
@@            Coverage Diff             @@
##             main     #145      +/-   ##
==========================================
+ Coverage   47.19%   47.48%   +0.29%
==========================================
  Files          41       41
  Lines        3348     3319      -29
==========================================
- Hits         1580     1576       -4
+ Misses       1768     1743      -25
```

☔ View full report in Codecov by Sentry.
also update PR title pls <3
Actionable comments posted: 2
🧹 Outside diff range and nitpick comments (5)
docs/nnUNet.md (5)
43-48: Consider enhancing directory structure documentation.

The directory structure is accurate for nnUNetV2, but it would be helpful to add brief descriptions of each directory's purpose:
- `nnUNet_raw`: Contains the raw dataset files
- `nnUNet_preprocessed`: Stores preprocessed data
- `nnUNet_results`: Stores training results and model weights
73-76: Consider documenting configuration options.

While the preprocessing command is correct, it would be helpful to explain:
- Available configuration options beyond `3d_fullres`
- When to use different configurations
- Expected preprocessing time
107-107: Document the inference file naming convention.

The example shows `subject1_000_0000.nii.gz` but doesn't explain:
- The meaning of the numeric suffixes
- Expected file naming patterns
- How to map these files back to original data
114-114: Enhance inference command documentation.

While the command syntax is correct, consider documenting:
- Available CONFIGURATION options
- Expected inference time
- Memory requirements
- GPU/CPU requirements
117-117: Consider adding more context about folder relationships.

While the input/output relationship is explained, it would be helpful to:
- Provide an example folder structure
- Explain any required file organization
- Document any naming conventions
Actionable comments posted: 4
🧹 Outside diff range and nitpick comments (8)
docs/nnUNet.md (3)
73-76: Consider adding more details about preprocessing options.

The preprocessing command is correctly documented, but it would be helpful to explain:
- What the `-c 3d_fullres` parameter means
- Other available configuration options
- Expected preprocessing duration
81-84: Add parameter documentation for the training command.

While the command syntax is correct, please add explanations for:
- Available `UNET_CONFIGURATION` options
- Valid `FOLD` values
- Expected training duration
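As context for these suggestions, nnUNetV2 training is run once per cross-validation fold. A hedged sketch of the fold loop (the dataset ID is a placeholder, and the commands are only echoed here rather than invoked, so the sketch runs without nnUNet installed):

```shell
#!/usr/bin/env bash
# Placeholder dataset ID; real runs would use the ID assigned at conversion.
DATASET_ID=5
for FOLD in 0 1 2 3 4; do
    # Echo instead of invoking nnUNetv2_train so the sketch runs anywhere.
    echo "nnUNetv2_train ${DATASET_ID} 3d_fullres ${FOLD}"
done
```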
114-117: Enhance inference command documentation.

Please add explanations for:
- Available values for the `-c CONFIGURATION` parameter
- Expected inference time
- Memory requirements
src/imgtools/utils/nnunet.py (4)
5-9: Add return type annotation and error handling.

The function signature could be improved with a proper return type annotation and validation of input parameters:

```diff
 def markdown_report_images(
     output_folder: str | pathlib.Path,
     modality_count: Dict[str, int],
     train_total: int,
-    test_total: int) -> None:
+    test_total: int) -> pathlib.Path:
+    if not modality_count:
+        raise ValueError("modality_count cannot be empty")
+    if train_total < 0 or test_total < 0:
+        raise ValueError("train_total and test_total must be non-negative")
```
18-24: Enhance plot styling and readability.

The bar plot could benefit from improved styling for better visualization:

```diff
-    plt.figure()
-    plt.bar(modalities, modality_totals)
-    plt.title("Modality Counts")
-    plt.xlabel("Modalities")
-    plt.ylabel("Counts")
+    plt.figure(figsize=(10, 6))
+    bars = plt.bar(modalities, modality_totals)
+    plt.title("Modality Distribution", pad=20)
+    plt.xlabel("Modalities", labelpad=10)
+    plt.ylabel("Count", labelpad=10)
+    # Add value labels on top of each bar
+    for bar in bars:
+        height = bar.get_height()
+        plt.text(bar.get_x() + bar.get_width()/2., height,
+                 f'{int(height)}', ha='center', va='bottom')
```
76-80: Make the number of training folds configurable.

The number of training iterations (5) should be a parameter rather than hard-coded:

```diff
+def create_train_script(
+    output_directory: str | pathlib.Path,
+    dataset_id: int,
+    num_folds: int = 5) -> pathlib.Path:
 ...
-for (( i=0; i<5; i++ ))
+for (( i=0; i<${num_folds}; i++ ))
```
97-97: Fix inconsistent parameter naming.

The parameter is named `usage_license` but used as `licence` in the dictionary. Maintain consistent naming:

```diff
-    usage_license: str = 'hands off!',
+    license: str = 'hands off!',
 ...
-    "licence": usage_license,
+    "license": license,
```

Also applies to: 166-166
src/imgtools/autopipeline.py (1)
168-173: Optimize the set comprehension for dataset ID extraction.

The current implementation can be simplified by computing the folder name once with an assignment expression instead of constructing the `pathlib.Path` twice. Apply this diff:

```diff
-used_ids = {
-    int(pathlib.Path(folder).parent.parent.name[7:10])
-    for folder in all_nnunet_folders
-    if pathlib.Path(folder).parent.parent.name.startswith("Dataset")
-}
+used_ids = {
+    int(parent_name[7:10])
+    for folder in all_nnunet_folders
+    if (parent_name := pathlib.Path(folder).parent.parent.name).startswith("Dataset")
+}
```
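To make the intent concrete, here is a runnable sketch of the surrounding logic: collect the used 3-digit IDs from `Dataset###_Name` folder names and pick the lowest free one. The layout is simplified (one path level instead of `parent.parent`), and `next_dataset_id` is a hypothetical helper name, not a function from the PR:

```python
import pathlib

def next_dataset_id(folders: list[str]) -> int:
    """Return the lowest unused 3-digit nnUNet dataset ID."""
    used_ids = {
        int(name[7:10])  # "DatasetXYZ_..." -> int("XYZ")
        for folder in folders
        if (name := pathlib.Path(folder).name).startswith("Dataset")
    }
    # Dataset IDs are 3-digit numbers in this codebase, hence 1..999.
    return min(set(range(1, 1000)) - used_ids)

print(next_dataset_id(["Dataset001_A", "Dataset002_B", "notes.txt"]))  # → 3
```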
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (3)
- docs/nnUNet.md (3 hunks)
- src/imgtools/autopipeline.py (8 hunks)
- src/imgtools/utils/nnunet.py (1 hunks)
🧰 Additional context used
📓 Learnings (1)
src/imgtools/autopipeline.py (1)
Learnt from: JoshuaSiraj
PR: bhklab/med-imagetools#145
File: src/imgtools/autopipeline.py:168-173
Timestamp: 2024-11-28T21:22:43.073Z
Learning: Dataset IDs are always 3-digit numbers in this codebase.
🪛 Ruff (0.8.0)
src/imgtools/autopipeline.py
632-632: Use `key in dict` instead of `key in dict.keys()`; remove `.keys()` (SIM118)
🔇 Additional comments (6)
docs/nnUNet.md (2)
43-48: LGTM: Directory structure is well documented.
The directory structure accurately reflects the new organization of nnUNet files and folders.
62-64: Fix environment variable export syntax.

There's a syntax error in the environment variable export statements: the double equals (`==`) for `nnUNet_results` is incorrect.
src/imgtools/utils/nnunet.py (1)
1-2: LGTM! Clean imports.
The imports have been properly cleaned up as suggested in the past review comments.
src/imgtools/autopipeline.py (3)
158-164: LGTM: Directory creation is robust and safe.

The code safely creates nnUNet directories using `mkdir` with appropriate flags to handle parent directories and prevent race conditions.
629-647: LGTM: Comprehensive nnUNet setup with proper documentation.
The code effectively:
- Maps modalities to channel names
- Generates dataset.json with proper configuration
- Creates training scripts
- Generates markdown reports with visualizations
🧰 Tools
🪛 Ruff (0.8.0)
632-632: Use `key in dict` instead of `key in dict.keys()`; remove `.keys()` (SIM118)
429-433: Add validation for subject_id format to prevent IndexError.

The code assumes `subject_id` contains at least two parts separated by an underscore; an invalid format would raise an IndexError. Apply this diff to add validation:

```diff
 if self.is_nnunet or self.is_nnunet_inference:
+    parts = subject_id.split('_')
+    if len(parts) < 2:
+        raise ValueError(f"Invalid subject_id format: {subject_id}. Expected format: NUMBER_NAME")
-    subject_id = f"{subject_id.split('_')[1]}_{subject_id.split('_')[0]:03}"
+    subject_id = f"{parts[1]}_{parts[0]:03}"
```

Likely invalid or redundant comment.
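A standalone sketch of the suggested validation. Note one adjustment: the `:03` zero-padding format spec rejects plain strings, so the numeric part is cast with `int()` here; `normalize_subject_id` is a hypothetical helper name for illustration:

```python
def normalize_subject_id(subject_id: str) -> str:
    """Reorder 'NUMBER_NAME' into 'NAME_XXX' with a zero-padded number."""
    parts = subject_id.split("_")
    if len(parts) < 2:
        raise ValueError(
            f"Invalid subject_id format: {subject_id}. Expected format: NUMBER_NAME"
        )
    # int() is required: the '03' format spec is numeric-only and
    # raises ValueError when applied to a str.
    return f"{parts[1]}_{int(parts[0]):03}"

print(normalize_subject_id("5_ADCURE"))  # → ADCURE_005
```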
Actionable comments posted: 2
🧹 Outside diff range and nitpick comments (1)
src/imgtools/utils/nnunet.py (1)
46-84: Extract configuration and improve script generation.

The function uses hardcoded folder names and lacks configuration flexibility. Consider extracting the configuration to make the function more flexible:

```diff
+from dataclasses import dataclass
+
+@dataclass
+class NNUNetConfig:
+    raw_dir: str = "nnUNet_raw"
+    preprocessed_dir: str = "nnUNet_preprocessed"
+    results_dir: str = "nnUNet_results"
+    num_folds: int = 5
+
 def create_train_script(
     output_directory: str | pathlib.Path,
-    dataset_id: int):
+    dataset_id: int,
+    config: NNUNetConfig = NNUNetConfig()) -> pathlib.Path:
     """
     Creates a bash script (`nnunet_preprocess_and_train.sh`) for running nnUNet training,
     with paths for raw data, preprocessed data, and trained models. The script ensures
     environment variables are set and executes the necessary training commands.

     Parameters:
     - output_directory (str): The directory where the output and subdirectories are located.
     - dataset_id (int): The ID of the dataset to be processed.
+    - config (NNUNetConfig): Configuration for nnUNet directories and training parameters.
+
+    Returns:
+    - pathlib.Path: Path to the generated script.
     """
     if dataset_id < 0:
         raise ValueError("dataset_id must be non-negative")

     # Define paths using pathlib
     output_directory = pathlib.Path(output_directory)
     shell_path = output_directory / 'nnunet_preprocess_and_train.sh'
     base_dir = output_directory.parent.parent

     if shell_path.exists():
         shell_path.unlink()

     # Define the environment variables and the script commands
     script_content = f"""#!/bin/bash
 set -e

-export nnUNet_raw="{base_dir}/nnUNet_raw"
-export nnUNet_preprocessed="{base_dir}/nnUNet_preprocessed"
-export nnUNet_results="{base_dir}/nnUNet_results"
+export nnUNet_raw="{base_dir}/{config.raw_dir}"
+export nnUNet_preprocessed="{base_dir}/{config.preprocessed_dir}"
+export nnUNet_results="{base_dir}/{config.results_dir}"

 nnUNetv2_plan_and_preprocess -d {dataset_id} --verify_dataset_integrity -c 3d_fullres

-for (( i=0; i<5; i++ ))
+for (( i=0; i<{config.num_folds}; i++ ))
 do
     nnUNetv2_train {dataset_id} 3d_fullres $i
 done
 """

     # Write the script content to the file
     with shell_path.open("w", newline="\n") as f:
         f.write(script_content)
+
+    shell_path.chmod(0o755)  # Make the script executable
+    return shell_path
```
… whe --ignore_missing_regex
Actionable comments posted: 0
🧹 Outside diff range and nitpick comments (2)
src/imgtools/autopipeline.py (2)
168-192: Consider using constants for the dataset ID range.

The code correctly handles dataset IDs, but could be improved by defining constants for better maintainability:

```diff
+MIN_DATASET_ID = 1
+MAX_DATASET_ID = 999
+DATASET_ID_FORMAT = "{:03}"

-all_ids = set(range(1, 1000))
+all_ids = set(range(MIN_DATASET_ID, MAX_DATASET_ID + 1))

-dataset_folder_name = f"Dataset{self.dataset_id:03}_{study_name}"
+dataset_folder_name = f"Dataset{DATASET_ID_FORMAT.format(self.dataset_id)}_{study_name}"
```
Line range hint 429-458: Simplify dictionary key lookups.

The modality handling logic is correct, but the dictionary key lookups can be simplified:

```diff
-if metadata["AcquisitionContrast"] not in self.total_modality_counter.keys():
+if metadata["AcquisitionContrast"] not in self.total_modality_counter:

-if modality not in self.total_modality_counter.keys():
+if modality not in self.total_modality_counter:

-if self.nnunet_info["current_modality"] not in self.nnunet_info["modalities"].keys():
+if self.nnunet_info["current_modality"] not in self.nnunet_info["modalities"]:
```

🧰 Tools
🪛 Ruff (0.8.0)
456-456: Use `key not in dict` instead of `key not in dict.keys()`; remove `.keys()` (SIM118)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (1)
- src/imgtools/autopipeline.py (9 hunks)
🧰 Additional context used
📓 Learnings (1)
src/imgtools/autopipeline.py (1)
Learnt from: JoshuaSiraj
PR: bhklab/med-imagetools#145
File: src/imgtools/autopipeline.py:168-173
Timestamp: 2024-11-28T21:22:43.073Z
Learning: Dataset IDs are always 3-digit numbers in this codebase.
🪛 Ruff (0.8.0)
src/imgtools/autopipeline.py
456-456: Use `key not in dict` instead of `key not in dict.keys()`; remove `.keys()` (SIM118)
643-643: Use `key in dict` instead of `key in dict.keys()`; remove `.keys()` (SIM118)
🔇 Additional comments (4)
src/imgtools/autopipeline.py (4)
158-164: LGTM: Clean directory setup using pathlib.
The directory creation is implemented correctly using pathlib with proper parent directory handling and race condition prevention.
193-200: LGTM: Proper dataset folder creation.
The dataset folder creation is implemented correctly using pathlib with proper path handling.
450-453: LGTM: Proper train/test split handling.
The code correctly handles the routing of outputs to appropriate train/test directories with consistent subject naming.
Also applies to: 560-563
629-658: LGTM: Comprehensive data saving and reporting.
The implementation correctly handles:
- Training/test case counting
- Channel name mapping for dataset JSON
- Training script generation
- Markdown report generation
🧰 Tools
🪛 Ruff (0.8.0)
643-643: Use `key in dict` instead of `key in dict.keys()`; remove `.keys()` (SIM118)
@JoshuaSiraj no glaring issues in the latest commit. I've also resolved the useless CodeRabbit comments, but there's 3 I've left unresolved which I think are worth addressing
there's also now 27 instances of
Thanks!
Actionable comments posted: 0
🧹 Outside diff range and nitpick comments (4)
docs/nnUNet.md (2)
76-76: Add parameter documentation for the preprocessing command.

While the command is correct, consider adding explanations for the parameters:
- `-d X`: Dataset ID number
- `-c 3d_fullres`: Configuration type
- `--verify_dataset_integrity`: Validation step
114-114: Add parameter documentation for the inference command.

Consider adding explanations for the parameters:
- `-i INPUT_FOLDER`: Input directory containing images
- `-o OUTPUT_FOLDER`: Output directory for predictions
- `-d DATASET_NAME_OR_ID`: Dataset identifier
- `-c CONFIGURATION`: Model configuration (e.g., 3d_fullres)

docs/AutoPipeline.md (2)
238-238: Add an example of the output filename format.

Consider adding an example to clarify the format, e.g., `HNSCC_001_CT.nii.gz`.
253-262: Consider documenting additional dataset.json fields.

While the current fields are correct, consider mentioning these optional fields:
- `name`: Dataset name
- `description`: Dataset description
- `reference`: Citation or reference
- `release`: Release version
- `tensorImageSize`: Image dimensionality ("2D" or "3D")
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (2)
- docs/AutoPipeline.md (3 hunks)
- docs/nnUNet.md (3 hunks)
🔇 Additional comments (4)
docs/nnUNet.md (3)
43-48: LGTM! Directory structure is correct for nnUNetV2.
The updated directory structure accurately reflects the nnUNetV2 requirements.
54-56: LGTM! Environment variables are correctly set.
The environment variables are properly defined with single equals signs.
62-64: Fix environment variable export syntax.

There's a syntax error in the environment variable export statements: the double equals (`==`) for `nnUNet_results` is incorrect. Apply this fix:

```diff
-export nnUNet_results=="/OUTPUT_DIRECTORY/nnUNet_results"
+export nnUNet_results="/OUTPUT_DIRECTORY/nnUNet_results"
```
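The effect of the typo is easy to demonstrate: with `==`, the extra `=` becomes part of the variable's value, so any tool reading `nnUNet_results` receives a bogus path. A small runnable check:

```shell
#!/usr/bin/env bash
# With '==', the second '=' is stored as part of the value:
export nnUNet_results=="/OUTPUT_DIRECTORY/nnUNet_results"
echo "$nnUNet_results"   # prints =/OUTPUT_DIRECTORY/nnUNet_results

# Correct single '=' assignment:
export nnUNet_results="/OUTPUT_DIRECTORY/nnUNet_results"
echo "$nnUNet_results"   # prints /OUTPUT_DIRECTORY/nnUNet_results
```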
docs/AutoPipeline.md (1)
177-188: LGTM! Directory structure is comprehensive and accurate.
The structure correctly shows all required files and directories for nnUNetV2.
Actionable comments posted: 0
🧹 Nitpick comments (1)
src/imgtools/utils/nnunet.py (1)
5-35: Consider clearing plot state between figures.

While the function correctly closes figures, it's good practice to clear the plot state between figures to ensure complete isolation:

```diff
     plt.figure()
     plt.bar(modalities, modality_totals)
     plt.title("Modality Counts")
     plt.xlabel("Modalities")
     plt.ylabel("Counts")
     plt.savefig(images_folder / "nnunet_modality_count.png")
     plt.close()
+    plt.clf()  # Clear the current figure

     # Pie chart for train/test distribution
     plt.figure()
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- .gitignore (1 hunks)
- src/imgtools/utils/nnunet.py (1 hunks)

✅ Files skipped from review due to trivial changes (1)
- .gitignore
🧰 Additional context used
📓 Learnings (1)
src/imgtools/utils/nnunet.py (1)
Learnt from: jjjermiah
PR: bhklab/med-imagetools#145
File: src/imgtools/utils/nnunet.py:88-169
Timestamp: 2024-11-29T21:18:00.351Z
Learning: For the `generate_dataset_json` function in `src/imgtools/utils/nnunet.py`, adding validation for required parameters is not necessary.
🔇 Additional comments (4)
src/imgtools/utils/nnunet.py (4)
1-3: LGTM! Clean and minimal imports.
All imports are necessary and well-organized.
38-44: LGTM! Clean and focused implementation.
The function is well-typed and handles its single responsibility effectively.
88-169: LGTM! Well-documented and properly attributed implementation.
The function is well-documented, properly typed, and efficiently handles the dataset configuration generation. The attribution to the original nnUNet repository is appropriately included.
46-84: Verify shell script compatibility across systems.

The script uses a `/bin/bash` shebang, which might not be available at the same location on all systems.
Summary by CodeRabbit

Documentation
- `dataset.json` structure.

New Features

Improvements