feat: update to nnunetV2 #145
base: main
Conversation
…ADCURE-0005_000_0000.nii.gz)
Walkthrough

The pull request introduces significant modifications to the nnUNet documentation and codebase. Key changes include a restructuring of the …

Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant User
    participant AutoPipeline
    participant nnunet
    User->>AutoPipeline: Start processing
    AutoPipeline->>AutoPipeline: Create output directories
    AutoPipeline->>AutoPipeline: Determine dataset ID
    AutoPipeline->>nnunet: Generate dataset JSON
    nnunet-->>AutoPipeline: Return dataset JSON
    AutoPipeline->>nnunet: Create training script
    nnunet-->>AutoPipeline: Return training script
    AutoPipeline-->>User: Processing complete
```
Possibly related PRs
Actionable comments posted: 4
🧹 Outside diff range and nitpick comments (6)
docs/nnUNet.md (1)
Line range hint 1-117: Consider enhancing the documentation further.

Given that one of the PR objectives is to "Update Documentation", consider the following improvements:
- Add information about testing on large subsets of data (mentioned in PR objectives)
- Include performance considerations or best practices
- Add troubleshooting section for common issues
- Consider adding version compatibility information
src/imgtools/utils/nnunet.py (3)
133-134: Remove unnecessary `f` prefix from non-formatted strings.

The `f` prefix is unnecessary since the strings do not contain any placeholders. Removing it will prevent confusion. Apply this diff to correct the strings:

```diff
-assert regions_class_order is not None, f"You have defined regions but regions_class_order is not set. " \
-                                        f"You need that."
+assert regions_class_order is not None, "You have defined regions but regions_class_order is not set. " \
+                                        "You need that."
```

🧰 Tools
🪛 Ruff (0.8.0)
133-133: f-string without any placeholders. Remove extraneous `f` prefix (F541)
134-134: f-string without any placeholders. Remove extraneous `f` prefix (F541)
143-143: Use descriptive variable names and simplify dictionary iteration.

The variable name `l` is ambiguous; renaming it to something like `label_key` improves readability. Also, you can iterate directly over the dictionary without calling `.keys()`. Apply this diff to enhance clarity:

```diff
-for l in labels.keys():
+for label_key in labels:
```

🧰 Tools
🪛 Ruff (0.8.0)
143-143: Ambiguous variable name: `l` (E741)
143-143: Use `key in dict` instead of `key in dict.keys()`; remove `.keys()` (SIM118)
153-153: Maintain a professional tone in comments.

The comment "Live with it." can be perceived as unprofessional. Consider rephrasing it to maintain a respectful tone:

```diff
-# channel_names now. Live with it.
+# Renamed to 'channel_names' for clarity.
```

src/imgtools/autopipeline.py (2)
Line range hint 1-1: Remove the unnecessary import of `Error` from `aifc`.

The import statement `from aifc import Error` is unused, and `aifc.Error` is inappropriate for raising exceptions in this context. Remove it to clean up unused imports:

```diff
-from aifc import Error
```
633-633: Simplify dictionary iteration by omitting `.keys()`.

When iterating over a dictionary's keys, you can iterate directly over the dictionary without calling `.keys()`. This makes the code more concise and idiomatic:

```diff
 channel_names_mapping = {
     self.nnunet_info["modalities"][k].lstrip('0') or '0': k
-    for k in self.nnunet_info["modalities"].keys()
+    for k in self.nnunet_info["modalities"]
 }
```

🧰 Tools
🪛 Ruff (0.8.0)
633-633: Use `key in dict` instead of `key in dict.keys()`; remove `.keys()` (SIM118)
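For illustration, the comprehension discussed above inverts a modalities map into nnUNet channel names. A minimal, self-contained sketch (the `modalities` values here are hypothetical examples, not taken from the PR):

```python
# Hypothetical modalities map: modality name -> zero-padded channel index,
# mirroring the shape of self.nnunet_info["modalities"].
modalities = {"CT": "0000", "MR": "0001"}

channel_names_mapping = {
    modalities[k].lstrip("0") or "0": k
    for k in modalities  # iterate the dict directly; no .keys() needed (SIM118)
}

# "0000".lstrip("0") yields "", which is falsy, so `or "0"` restores "0".
print(channel_names_mapping)  # → {'0': 'CT', '1': 'MR'}
```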
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (3)
- docs/nnUNet.md (1 hunks)
- src/imgtools/autopipeline.py (7 hunks)
- src/imgtools/utils/nnunet.py (1 hunks)
🧰 Additional context used
🪛 Ruff (0.8.0)
src/imgtools/autopipeline.py
633-633: Use `key in dict` instead of `key in dict.keys()`; remove `.keys()` (SIM118)

src/imgtools/utils/nnunet.py
2-2: `os` imported but unused. Remove unused import (F401)
2-2: `glob` imported but unused. Remove unused import (F401)
133-133: f-string without any placeholders. Remove extraneous `f` prefix (F541)
134-134: f-string without any placeholders. Remove extraneous `f` prefix (F541)
143-143: Ambiguous variable name: `l` (E741)
143-143: Use `key in dict` instead of `key in dict.keys()`; remove `.keys()` (SIM118)
🔇 Additional comments (1)
docs/nnUNet.md (1)
117-117: LGTM! Helpful clarification added.
The added clarification about the relationship between Med-ImageTools' output directory and nnUNet's input folder is valuable for users.
Codecov Report

Attention: Patch coverage is …

Additional details and impacted files:

```
@@            Coverage Diff             @@
##             main     #145      +/-   ##
==========================================
+ Coverage   47.19%   47.48%   +0.29%
==========================================
  Files          41       41
  Lines        3348     3319      -29
==========================================
- Hits         1580     1576       -4
+ Misses       1768     1743      -25
```

☔ View full report in Codecov by Sentry.
also update PR title pls <3
Actionable comments posted: 2
🧹 Outside diff range and nitpick comments (5)
docs/nnUNet.md (5)
43-48: Consider enhancing directory structure documentation.

The directory structure is accurate for nnUNetV2, but it would be helpful to add brief descriptions of each directory's purpose:
- `nnUNet_raw`: Contains the raw dataset files
- `nnUNet_preprocessed`: Stores preprocessed data
- `nnUNet_results`: Stores training results and model weights
73-76: Consider documenting configuration options.

While the preprocessing command is correct, it would be helpful to explain:
- Available configuration options beyond `3d_fullres`
- When to use different configurations
- Expected preprocessing time
107-107: Document the inference file naming convention.

The example shows `subject1_000_0000.nii.gz` but doesn't explain:
- The meaning of the numeric suffixes
- Expected file naming patterns
- How to map these files back to original data
114-114: Enhance inference command documentation.

While the command syntax is correct, consider documenting:
- Available CONFIGURATION options
- Expected inference time
- Memory requirements
- GPU/CPU requirements
117-117: Consider adding more context about folder relationships.

While the input/output relationship is explained, it would be helpful to:
- Provide an example folder structure
- Explain any required file organization
- Document any naming conventions
Actionable comments posted: 4
🧹 Outside diff range and nitpick comments (8)
docs/nnUNet.md (3)
73-76: Consider adding more details about preprocessing options.

The preprocessing command is correctly documented, but it would be helpful to explain:
- What the `-c 3d_fullres` parameter means
- Other available configuration options
- Expected preprocessing duration
81-84: Add parameter documentation for the training command.

While the command syntax is correct, please add explanations for:
- Available `UNET_CONFIGURATION` options
- Valid `FOLD` values
- Expected training duration
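As context for these suggestions, nnUNetV2 training is run once per cross-validation fold. A hedged sketch of the fold loop (the dataset ID is a placeholder, and the commands are only echoed here rather than invoked, so the sketch runs without nnUNet installed):

```shell
#!/usr/bin/env bash
# Placeholder dataset ID; real runs would use the ID assigned at conversion.
DATASET_ID=5
for FOLD in 0 1 2 3 4; do
    # Echo instead of invoking nnUNetv2_train so the sketch runs anywhere.
    echo "nnUNetv2_train ${DATASET_ID} 3d_fullres ${FOLD}"
done
```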
114-117: Enhance inference command documentation.

Please add explanations for:
- Available values for the `-c CONFIGURATION` parameter
- Expected inference time
- Memory requirements
src/imgtools/utils/nnunet.py (4)
5-9: Add return type annotation and error handling.

The function signature could be improved with a proper return type annotation and validation of input parameters:

```diff
 def markdown_report_images(
     output_folder: str | pathlib.Path,
     modality_count: Dict[str, int],
     train_total: int,
-    test_total: int) -> None:
+    test_total: int) -> pathlib.Path:
+    if not modality_count:
+        raise ValueError("modality_count cannot be empty")
+    if train_total < 0 or test_total < 0:
+        raise ValueError("train_total and test_total must be non-negative")
```
18-24: Enhance plot styling and readability.

The bar plot could benefit from improved styling for better visualization:

```diff
-    plt.figure()
-    plt.bar(modalities, modality_totals)
-    plt.title("Modality Counts")
-    plt.xlabel("Modalities")
-    plt.ylabel("Counts")
+    plt.figure(figsize=(10, 6))
+    bars = plt.bar(modalities, modality_totals)
+    plt.title("Modality Distribution", pad=20)
+    plt.xlabel("Modalities", labelpad=10)
+    plt.ylabel("Count", labelpad=10)
+    # Add value labels on top of each bar
+    for bar in bars:
+        height = bar.get_height()
+        plt.text(bar.get_x() + bar.get_width()/2., height,
+                 f'{int(height)}', ha='center', va='bottom')
```
76-80: Make the number of training folds configurable.

The number of training iterations (5) should be a parameter rather than hard-coded:

```diff
+def create_train_script(
+    output_directory: str | pathlib.Path,
+    dataset_id: int,
+    num_folds: int = 5) -> pathlib.Path:
 ...
-for (( i=0; i<5; i++ ))
+for (( i=0; i<${num_folds}; i++ ))
```
97-97: Fix inconsistent parameter naming.

The parameter is named `usage_license` but used as `licence` in the dictionary. Maintain consistent naming:

```diff
-    usage_license: str = 'hands off!',
+    license: str = 'hands off!',
 ...
-    "licence": usage_license,
+    "license": license,
```

Also applies to: 166-166
src/imgtools/autopipeline.py (1)
168-173: Optimize the set comprehension for dataset ID extraction.

The current implementation can be simplified by computing the folder name once with an assignment expression instead of constructing the `pathlib.Path` twice. Apply this diff:

```diff
-used_ids = {
-    int(pathlib.Path(folder).parent.parent.name[7:10])
-    for folder in all_nnunet_folders
-    if pathlib.Path(folder).parent.parent.name.startswith("Dataset")
-}
+used_ids = {
+    int(parent_name[7:10])
+    for folder in all_nnunet_folders
+    if (parent_name := pathlib.Path(folder).parent.parent.name).startswith("Dataset")
+}
```
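To make the intent concrete, here is a runnable sketch of the surrounding logic: collect the used 3-digit IDs from `Dataset###_Name` folder names and pick the lowest free one. The layout is simplified (one path level instead of `parent.parent`), and `next_dataset_id` is a hypothetical helper name, not a function from the PR:

```python
import pathlib

def next_dataset_id(folders: list[str]) -> int:
    """Return the lowest unused 3-digit nnUNet dataset ID."""
    used_ids = {
        int(name[7:10])  # "DatasetXYZ_..." -> int("XYZ")
        for folder in folders
        if (name := pathlib.Path(folder).name).startswith("Dataset")
    }
    # Dataset IDs are 3-digit numbers in this codebase, hence 1..999.
    return min(set(range(1, 1000)) - used_ids)

print(next_dataset_id(["Dataset001_A", "Dataset002_B", "notes.txt"]))  # → 3
```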
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (3)
- docs/nnUNet.md (3 hunks)
- src/imgtools/autopipeline.py (8 hunks)
- src/imgtools/utils/nnunet.py (1 hunks)
🧰 Additional context used
📓 Learnings (1)
src/imgtools/autopipeline.py (1)
Learnt from: JoshuaSiraj
PR: bhklab/med-imagetools#145
File: src/imgtools/autopipeline.py:168-173
Timestamp: 2024-11-28T21:22:43.073Z
Learning: Dataset IDs are always 3-digit numbers in this codebase.
🪛 Ruff (0.8.0)
src/imgtools/autopipeline.py
632-632: Use `key in dict` instead of `key in dict.keys()`; remove `.keys()` (SIM118)
🔇 Additional comments (6)
docs/nnUNet.md (2)
43-48: LGTM: Directory structure is well documented.
The directory structure accurately reflects the new organization of nnUNet files and folders.
62-64: Fix environment variable export syntax.

There's a syntax error in the environment variable export statements: the double equals (`==`) for `nnUNet_results` is incorrect.
src/imgtools/utils/nnunet.py (1)
1-2: LGTM! Clean imports.
The imports have been properly cleaned up as suggested in the past review comments.
src/imgtools/autopipeline.py (3)
158-164: LGTM: Directory creation is robust and safe.

The code safely creates nnUNet directories using `mkdir` with appropriate flags to handle parent directories and prevent race conditions.
629-647: LGTM: Comprehensive nnUNet setup with proper documentation.
The code effectively:
- Maps modalities to channel names
- Generates dataset.json with proper configuration
- Creates training scripts
- Generates markdown reports with visualizations
🧰 Tools
🪛 Ruff (0.8.0)
632-632: Use `key in dict` instead of `key in dict.keys()`; remove `.keys()` (SIM118)
429-433: Add validation for subject_id format to prevent IndexError.

The code assumes `subject_id` contains at least two parts separated by an underscore; an invalid format would raise an IndexError. Apply this diff to add validation:

```diff
 if self.is_nnunet or self.is_nnunet_inference:
+    parts = subject_id.split('_')
+    if len(parts) < 2:
+        raise ValueError(f"Invalid subject_id format: {subject_id}. Expected format: NUMBER_NAME")
-    subject_id = f"{subject_id.split('_')[1]}_{subject_id.split('_')[0]:03}"
+    subject_id = f"{parts[1]}_{parts[0]:03}"
```

Likely invalid or redundant comment.
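A standalone sketch of the suggested validation. Note one adjustment: the `:03` zero-padding format spec rejects plain strings, so the numeric part is cast with `int()` here; `normalize_subject_id` is a hypothetical helper name for illustration:

```python
def normalize_subject_id(subject_id: str) -> str:
    """Reorder 'NUMBER_NAME' into 'NAME_XXX' with a zero-padded number."""
    parts = subject_id.split("_")
    if len(parts) < 2:
        raise ValueError(
            f"Invalid subject_id format: {subject_id}. Expected format: NUMBER_NAME"
        )
    # int() is required: the '03' format spec is numeric-only and
    # raises ValueError when applied to a str.
    return f"{parts[1]}_{int(parts[0]):03}"

print(normalize_subject_id("5_ADCURE"))  # → ADCURE_005
```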
Actionable comments posted: 2
🧹 Outside diff range and nitpick comments (1)
src/imgtools/utils/nnunet.py (1)
46-84: Extract configuration and improve script generation.

The function uses hardcoded folder names and lacks configuration flexibility. Consider extracting the configuration to make the function more flexible:

```diff
+from dataclasses import dataclass
+
+@dataclass
+class NNUNetConfig:
+    raw_dir: str = "nnUNet_raw"
+    preprocessed_dir: str = "nnUNet_preprocessed"
+    results_dir: str = "nnUNet_results"
+    num_folds: int = 5
+
 def create_train_script(
     output_directory: str | pathlib.Path,
-    dataset_id: int):
+    dataset_id: int,
+    config: NNUNetConfig = NNUNetConfig()) -> pathlib.Path:
     """
     Creates a bash script (`nnunet_preprocess_and_train.sh`) for running nnUNet training,
     with paths for raw data, preprocessed data, and trained models. The script ensures
     environment variables are set and executes the necessary training commands.

     Parameters:
     - output_directory (str): The directory where the output and subdirectories are located.
     - dataset_id (int): The ID of the dataset to be processed.
+    - config (NNUNetConfig): Configuration for nnUNet directories and training parameters.
+
+    Returns:
+    - pathlib.Path: Path to the generated script.
     """
     if dataset_id < 0:
         raise ValueError("dataset_id must be non-negative")

     # Define paths using pathlib
     output_directory = pathlib.Path(output_directory)
     shell_path = output_directory / 'nnunet_preprocess_and_train.sh'
     base_dir = output_directory.parent.parent

     if shell_path.exists():
         shell_path.unlink()

     # Define the environment variables and the script commands
     script_content = f"""#!/bin/bash
 set -e

-export nnUNet_raw="{base_dir}/nnUNet_raw"
-export nnUNet_preprocessed="{base_dir}/nnUNet_preprocessed"
-export nnUNet_results="{base_dir}/nnUNet_results"
+export nnUNet_raw="{base_dir}/{config.raw_dir}"
+export nnUNet_preprocessed="{base_dir}/{config.preprocessed_dir}"
+export nnUNet_results="{base_dir}/{config.results_dir}"

 nnUNetv2_plan_and_preprocess -d {dataset_id} --verify_dataset_integrity -c 3d_fullres

-for (( i=0; i<5; i++ ))
+for (( i=0; i<{config.num_folds}; i++ ))
 do
     nnUNetv2_train {dataset_id} 3d_fullres $i
 done
 """

     # Write the script content to the file
     with shell_path.open("w", newline="\n") as f:
         f.write(script_content)
+
+    shell_path.chmod(0o755)  # Make the script executable
+    return shell_path
```
… whe --ignore_missing_regex
Actionable comments posted: 0
🧹 Outside diff range and nitpick comments (2)
src/imgtools/autopipeline.py (2)
168-192: Consider using constants for the dataset ID range.

The code correctly handles dataset IDs, but could be improved by defining constants for better maintainability:

```diff
+MIN_DATASET_ID = 1
+MAX_DATASET_ID = 999
+DATASET_ID_FORMAT = "{:03}"

-all_ids = set(range(1, 1000))
+all_ids = set(range(MIN_DATASET_ID, MAX_DATASET_ID + 1))

-dataset_folder_name = f"Dataset{self.dataset_id:03}_{study_name}"
+dataset_folder_name = f"Dataset{DATASET_ID_FORMAT.format(self.dataset_id)}_{study_name}"
```
Line range hint 429-458: Simplify dictionary key lookups.

The modality handling logic is correct, but the dictionary key lookups can be simplified:

```diff
-if metadata["AcquisitionContrast"] not in self.total_modality_counter.keys():
+if metadata["AcquisitionContrast"] not in self.total_modality_counter:

-if modality not in self.total_modality_counter.keys():
+if modality not in self.total_modality_counter:

-if self.nnunet_info["current_modality"] not in self.nnunet_info["modalities"].keys():
+if self.nnunet_info["current_modality"] not in self.nnunet_info["modalities"]:
```

🧰 Tools
🪛 Ruff (0.8.0)
456-456: Use `key not in dict` instead of `key not in dict.keys()`; remove `.keys()` (SIM118)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (1)
- src/imgtools/autopipeline.py (9 hunks)
🧰 Additional context used
📓 Learnings (1)
src/imgtools/autopipeline.py (1)
Learnt from: JoshuaSiraj
PR: bhklab/med-imagetools#145
File: src/imgtools/autopipeline.py:168-173
Timestamp: 2024-11-28T21:22:43.073Z
Learning: Dataset IDs are always 3-digit numbers in this codebase.
🪛 Ruff (0.8.0)
src/imgtools/autopipeline.py
456-456: Use `key not in dict` instead of `key not in dict.keys()`; remove `.keys()` (SIM118)
643-643: Use `key in dict` instead of `key in dict.keys()`; remove `.keys()` (SIM118)
🔇 Additional comments (4)
src/imgtools/autopipeline.py (4)
158-164: LGTM: Clean directory setup using pathlib.
The directory creation is implemented correctly using pathlib with proper parent directory handling and race condition prevention.
193-200: LGTM: Proper dataset folder creation.
The dataset folder creation is implemented correctly using pathlib with proper path handling.
450-453: LGTM: Proper train/test split handling.
The code correctly handles the routing of outputs to appropriate train/test directories with consistent subject naming.
Also applies to: 560-563
629-658: LGTM: Comprehensive data saving and reporting.
The implementation correctly handles:
- Training/test case counting
- Channel name mapping for dataset JSON
- Training script generation
- Markdown report generation
🧰 Tools
🪛 Ruff (0.8.0)
643-643: Use `key in dict` instead of `key in dict.keys()`; remove `.keys()` (SIM118)
@JoshuaSiraj no glaring issues in the latest commit. I've also resolved the useless CodeRabbit comments, but there's 3 I've left unresolved which I think are worth addressing
there's also now 27 instances of
Thanks!
Actionable comments posted: 0
🧹 Outside diff range and nitpick comments (4)
docs/nnUNet.md (2)
76-76: Add parameter documentation for the preprocessing command.

While the command is correct, consider adding explanations for the parameters:
- `-d X`: Dataset ID number
- `-c 3d_fullres`: Configuration type
- `--verify_dataset_integrity`: Validation step
114-114: Add parameter documentation for the inference command.

Consider adding explanations for the parameters:
- `-i INPUT_FOLDER`: Input directory containing images
- `-o OUTPUT_FOLDER`: Output directory for predictions
- `-d DATASET_NAME_OR_ID`: Dataset identifier
- `-c CONFIGURATION`: Model configuration (e.g., 3d_fullres)

docs/AutoPipeline.md (2)
238-238: Add an example of the output filename format.

Consider adding an example to clarify the format, e.g., `HNSCC_001_CT.nii.gz`.
253-262: Consider documenting additional dataset.json fields.

While the current fields are correct, consider mentioning these optional fields:
- `name`: Dataset name
- `description`: Dataset description
- `reference`: Citation or reference
- `release`: Release version
- `tensorImageSize`: Image dimensionality ("2D" or "3D")
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (2)
- docs/AutoPipeline.md (3 hunks)
- docs/nnUNet.md (3 hunks)
🔇 Additional comments (4)
docs/nnUNet.md (3)
43-48: LGTM! Directory structure is correct for nnUNetV2.
The updated directory structure accurately reflects the nnUNetV2 requirements.
54-56: LGTM! Environment variables are correctly set.
The environment variables are properly defined with single equals signs.
62-64: Fix environment variable export syntax.

There's a syntax error in the environment variable export statements: the double equals (`==`) for `nnUNet_results` is incorrect. Apply this fix:

```diff
-export nnUNet_results=="/OUTPUT_DIRECTORY/nnUNet_results"
+export nnUNet_results="/OUTPUT_DIRECTORY/nnUNet_results"
```
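The effect of the typo is easy to demonstrate: with `==`, the extra `=` becomes part of the variable's value, so any tool reading `nnUNet_results` receives a bogus path. A small runnable check:

```shell
#!/usr/bin/env bash
# With '==', the second '=' is stored as part of the value:
export nnUNet_results=="/OUTPUT_DIRECTORY/nnUNet_results"
echo "$nnUNet_results"   # prints =/OUTPUT_DIRECTORY/nnUNet_results

# Correct single '=' assignment:
export nnUNet_results="/OUTPUT_DIRECTORY/nnUNet_results"
echo "$nnUNet_results"   # prints /OUTPUT_DIRECTORY/nnUNet_results
```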
docs/AutoPipeline.md (1)
177-188: LGTM! Directory structure is comprehensive and accurate.
The structure correctly shows all required files and directories for nnUNetV2.
Actionable comments posted: 0
🧹 Nitpick comments (1)
src/imgtools/utils/nnunet.py (1)
5-35: Consider clearing plot state between figures.

While the function correctly closes figures, it's good practice to clear the plot state between figures to ensure complete isolation:

```diff
     plt.figure()
     plt.bar(modalities, modality_totals)
     plt.title("Modality Counts")
     plt.xlabel("Modalities")
     plt.ylabel("Counts")
     plt.savefig(images_folder / "nnunet_modality_count.png")
     plt.close()
+    plt.clf()  # Clear the current figure

     # Pie chart for train/test distribution
     plt.figure()
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- .gitignore (1 hunks)
- src/imgtools/utils/nnunet.py (1 hunks)

✅ Files skipped from review due to trivial changes (1)
- .gitignore
🧰 Additional context used
📓 Learnings (1)
src/imgtools/utils/nnunet.py (1)
Learnt from: jjjermiah
PR: bhklab/med-imagetools#145
File: src/imgtools/utils/nnunet.py:88-169
Timestamp: 2024-11-29T21:18:00.351Z
Learning: For the `generate_dataset_json` function in `src/imgtools/utils/nnunet.py`, adding validation for required parameters is not necessary.
🔇 Additional comments (4)
src/imgtools/utils/nnunet.py (4)
1-3: LGTM! Clean and minimal imports.
All imports are necessary and well-organized.
38-44: LGTM! Clean and focused implementation.
The function is well-typed and handles its single responsibility effectively.
88-169: LGTM! Well-documented and properly attributed implementation.
The function is well-documented, properly typed, and efficiently handles the dataset configuration generation. The attribution to the original nnUNet repository is appropriately included.
46-84: Verify shell script compatibility across systems.

The script uses a `/bin/bash` shebang, which might not be available at the same location on all systems.
Summary by CodeRabbit

Documentation
- `dataset.json` structure.

New Features

Improvements