feat: update to nnunetV2 #145

Open · wants to merge 14 commits into base: main
Conversation

@JoshuaSiraj (Collaborator) commented Nov 28, 2024

  • Update Documentation
  • Test on large subsets of data
  • Try cleaning code more?

Summary by CodeRabbit

  • Documentation

    • Streamlined documentation for nnUNet by updating command structures and renaming key variables for clarity.
    • Enhanced instructions for data preparation, training, and inference processes, including new flags for processing options and an updated dataset.json structure.
  • New Features

    • Introduced a new function for generating training scripts for nnUNet.
    • Enhanced reporting capabilities with improved markdown report generation for images.
  • Improvements

    • Simplified dataset ID assignment and subject ID handling in the processing pipeline.
    • Updated metadata handling for better consistency, particularly for MR images.
    • Improved clarity and efficiency in nnUNet processing logic, including updated output directory structure.
    • Added additional flags for contour selection behavior in AutoPipeline.

coderabbitai bot (Contributor) commented Nov 28, 2024

Walkthrough

The pull request introduces significant modifications to the nnUNet documentation and codebase. Key changes include a restructuring of the nnUNet.md documentation, focusing on data preparation and inference commands. Enhancements to the AutoPipeline class streamline dataset processing and training script generation. The nnunet.py module sees updates to reporting functions and the addition of a training script generator, improving functionality and clarity in handling dataset JSON generation and image reporting.

Changes

| File Path | Change Summary |
| --- | --- |
| docs/nnUNet.md | Restructured documentation; renamed variables and updated command signatures for preprocessing, training, and inference. |
| src/imgtools/autopipeline.py | Added import for create_train_script; simplified directory creation; refined dataset ID logic; modified process_one_subject for consistent subject ID handling; enhanced metadata saving. |
| src/imgtools/utils/nnunet.py | Updated markdown_report_images for directory creation and simplified plotting logic; added create_train_script function; modified generate_dataset_json with new parameters and output structure. |
| docs/AutoPipeline.md | Updated usage instructions and output structure; clarified flags for nnUNet training and inference; revised dataset.json structure with new fields. |
| .gitignore | Added pattern to ignore files starting with nnunet_out; reorganized existing entries for clarity. |

Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant User
    participant AutoPipeline
    participant nnunet

    User->>AutoPipeline: Start processing
    AutoPipeline->>AutoPipeline: Create output directories
    AutoPipeline->>AutoPipeline: Determine dataset ID
    AutoPipeline->>nnunet: Generate dataset JSON
    nnunet-->>AutoPipeline: Return dataset JSON
    AutoPipeline->>nnunet: Create training script
    nnunet-->>AutoPipeline: Return training script
    AutoPipeline-->>User: Processing complete
```

🐰 "In the code where changes abound,
Documentation's clarity is found.
With scripts to train and data to share,
nnUNet's journey is light as air!
From DICOM to NIfTI, we hop along,
With every update, we grow strong!" 🐇✨


@JoshuaSiraj JoshuaSiraj changed the title Joshua siraj/update to nn unetv2 Joshua siraj/update to nnunetv2 Nov 28, 2024
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 4

🧹 Outside diff range and nitpick comments (6)
docs/nnUNet.md (1)

Line range hint 1-117: Consider enhancing the documentation further.

Given that one of the PR objectives is to "Update Documentation", consider the following improvements:

  1. Add information about testing on large subsets of data (mentioned in PR objectives)
  2. Include performance considerations or best practices
  3. Add troubleshooting section for common issues
  4. Consider adding version compatibility information
src/imgtools/utils/nnunet.py (3)

133-134: Remove unnecessary f prefix from non-formatted strings.

The f prefix is unnecessary since the strings do not contain any placeholders. Removing it will prevent confusion.

Apply this diff to correct the strings:

-    assert regions_class_order is not None, f"You have defined regions but regions_class_order is not set. " \
-                                            f"You need that."
+    assert regions_class_order is not None, "You have defined regions but regions_class_order is not set. " \
+                                            "You need that."
🧰 Tools
🪛 Ruff (0.8.0)

133-133: f-string without any placeholders

Remove extraneous f prefix

(F541)


134-134: f-string without any placeholders

Remove extraneous f prefix

(F541)


143-143: Use descriptive variable names and simplify dictionary iteration.

The variable name l is ambiguous. Renaming it to something like label_key improves readability. Also, you can iterate directly over the dictionary without calling .keys().

Apply this diff to enhance clarity:

-for l in labels.keys():
+for label_key in labels:
🧰 Tools
🪛 Ruff (0.8.0)

143-143: Ambiguous variable name: l

(E741)


143-143: Use key in dict instead of key in dict.keys()

Remove .keys()

(SIM118)


153-153: Maintain professional tone in comments.

The comment "Live with it." can be perceived as unprofessional. Consider rephrasing it to maintain a respectful tone.

Apply this diff to update the comment:

-    # channel_names now. Live with it.
+    # Renamed to 'channel_names' for clarity.
src/imgtools/autopipeline.py (2)

Line range hint 1-1: Remove unnecessary import of Error from aifc

The import from aifc import Error is either unused or inappropriate for raising exceptions here: aifc.Error is the audio module's exception class, not a general-purpose error. Consider removing it to clean up unused imports.

Apply this diff to remove the unused import:

- from aifc import Error

633-633: Simplify dictionary iteration by omitting .keys()

When iterating over a dictionary's keys, you can iterate directly over the dictionary without calling .keys(). This makes the code more concise and idiomatic.

Apply this diff to simplify the loop:

      channel_names_mapping = {
          self.nnunet_info["modalities"][k].lstrip('0') or '0': k  
-         for k in self.nnunet_info["modalities"].keys()
+         for k in self.nnunet_info["modalities"]
      }
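
For context, a minimal sketch of what this comprehension produces, assuming modalities maps modality names to zero-padded channel strings (the input shape here is illustrative, not taken from the diff):

```python
modalities = {"CT": "0000", "MR": "0001"}  # assumed shape of nnunet_info["modalities"]
channel_names_mapping = {
    modalities[k].lstrip('0') or '0': k
    for k in modalities
}
# -> {'0': 'CT', '1': 'MR'}; the `or '0'` guard keeps an all-zero
# channel string from collapsing into an empty key
```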
🧰 Tools
🪛 Ruff (0.8.0)

633-633: Use key in dict instead of key in dict.keys()

Remove .keys()

(SIM118)

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 22d8fb7 and e9cf057.

📒 Files selected for processing (3)
  • docs/nnUNet.md (1 hunks)
  • src/imgtools/autopipeline.py (7 hunks)
  • src/imgtools/utils/nnunet.py (1 hunks)
🧰 Additional context used
🪛 Ruff (0.8.0)
src/imgtools/autopipeline.py

633-633: Use key in dict instead of key in dict.keys()

Remove .keys()

(SIM118)

src/imgtools/utils/nnunet.py

2-2: os imported but unused

Remove unused import

(F401)


2-2: glob imported but unused

Remove unused import

(F401)


133-133: f-string without any placeholders

Remove extraneous f prefix

(F541)


134-134: f-string without any placeholders

Remove extraneous f prefix

(F541)


143-143: Ambiguous variable name: l

(E741)


143-143: Use key in dict instead of key in dict.keys()

Remove .keys()

(SIM118)

🔇 Additional comments (1)
docs/nnUNet.md (1)

117-117: LGTM! Helpful clarification added.

The added clarification about the relationship between Med-ImageTools' output directory and nnUNet's input folder is valuable for users.

codecov bot commented Nov 28, 2024

Codecov Report

Attention: Patch coverage is 12.82051% with 68 lines in your changes missing coverage. Please review.

Project coverage is 47.48%. Comparing base (66f8119) to head (d3e1a11).
Report is 6 commits behind head on main.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| src/imgtools/autopipeline.py | 9.30% | 39 Missing ⚠️ |
| src/imgtools/utils/nnunet.py | 17.14% | 29 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #145      +/-   ##
==========================================
+ Coverage   47.19%   47.48%   +0.29%     
==========================================
  Files          41       41              
  Lines        3348     3319      -29     
==========================================
- Hits         1580     1576       -4     
+ Misses       1768     1743      -25     


@jjjermiah (Contributor):
also update PR title pls <3

@JoshuaSiraj JoshuaSiraj changed the title Joshua siraj/update to nnunetv2 feat: update to nnunetV2 Nov 28, 2024
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

🧹 Outside diff range and nitpick comments (5)
docs/nnUNet.md (5)

43-48: Consider enhancing directory structure documentation.

The directory structure is accurate for nnUNetV2, but it would be helpful to add brief descriptions of each directory's purpose:

  • nnUNet_raw: Contains the raw dataset files
  • nnUNet_preprocessed: Stores preprocessed data
  • nnUNet_results: Stores training results and model weights
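
If it helps readers, a minimal Python sketch of creating these directories and exporting the environment variables that the nnUNetv2 CLIs read (the base path is a placeholder):

```python
import os
import pathlib

base = pathlib.Path("/OUTPUT_DIRECTORY")  # placeholder output directory
for var in ("nnUNet_raw", "nnUNet_preprocessed", "nnUNet_results"):
    path = base / var
    path.mkdir(parents=True, exist_ok=True)  # create the folder if missing
    os.environ[var] = str(path)  # nnUNetv2 commands look up these variables
```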

73-76: Consider documenting configuration options.

While the preprocessing command is correct, it would be helpful to explain:

  • Available configuration options beyond 3d_fullres
  • When to use different configurations
  • Expected preprocessing time

107-107: Document inference file naming convention.

The example shows subject1_000_0000.nii.gz but doesn't explain:

  • The meaning of the numeric suffixes
  • Expected file naming patterns
  • How to map these files back to original data

114-114: Enhance inference command documentation.

While the command syntax is correct, consider documenting:

  • Available CONFIGURATION options
  • Expected inference time
  • Memory requirements
  • GPU/CPU requirements

117-117: Consider adding more context about folder relationships.

While the input/output relationship is explained, it would be helpful to:

  • Provide an example folder structure
  • Explain any required file organization
  • Document any naming conventions
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between e9cf057 and dfa1ce1.

📒 Files selected for processing (1)
  • docs/nnUNet.md (3 hunks)

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 4

🧹 Outside diff range and nitpick comments (8)
docs/nnUNet.md (3)

73-76: Consider adding more details about preprocessing options.

The preprocessing command is correctly documented, but it would be helpful to explain:

  • What the -c 3d_fullres parameter means
  • Other available configuration options
  • Expected preprocessing duration

81-84: Add parameter documentation for the training command.

While the command syntax is correct, please add explanations for:

  • Available UNET_CONFIGURATION options
  • Valid FOLD values
  • Expected training duration

114-117: Enhance inference command documentation.

Please add explanations for:

  • Available values for the -c CONFIGURATION parameter
  • Expected inference time
  • Memory requirements
src/imgtools/utils/nnunet.py (4)

5-9: Add return type annotation and error handling.

The function signature could be improved with proper return type annotation and validation of input parameters.

 def markdown_report_images(
     output_folder: str | pathlib.Path, 
     modality_count: Dict[str, int], 
     train_total: int, 
-    test_total: int) -> None:
+    test_total: int) -> pathlib.Path:
+    if not modality_count:
+        raise ValueError("modality_count cannot be empty")
+    if train_total < 0 or test_total < 0:
+        raise ValueError("train_total and test_total must be non-negative")

18-24: Enhance plot styling and readability.

The bar plot could benefit from improved styling for better visualization.

-    plt.figure()  
-    plt.bar(modalities, modality_totals)
-    plt.title("Modality Counts")
-    plt.xlabel("Modalities")
-    plt.ylabel("Counts")
+    plt.figure(figsize=(10, 6))
+    bars = plt.bar(modalities, modality_totals)
+    plt.title("Modality Distribution", pad=20)
+    plt.xlabel("Modalities", labelpad=10)
+    plt.ylabel("Count", labelpad=10)
+    # Add value labels on top of each bar
+    for bar in bars:
+        height = bar.get_height()
+        plt.text(bar.get_x() + bar.get_width()/2., height,
+                f'{int(height)}', ha='center', va='bottom')

76-80: Make training iterations configurable.

The number of training iterations (5) should be a parameter rather than hard-coded.

+def create_train_script(
+        output_directory: str | pathlib.Path,
+        dataset_id: int,
+        num_folds: int = 5) -> pathlib.Path:
...
-for (( i=0; i<5; i++ ))
+for (( i=0; i<${num_folds}; i++ ))

97-97: Fix inconsistent parameter naming.

The parameter is named usage_license but used as licence in the dictionary. Maintain consistent naming.

-                          usage_license: str = 'hands off!',
+                          license: str = 'hands off!',
...
-        "licence": usage_license,  
+        "license": license,  

Also applies to: 166-166

src/imgtools/autopipeline.py (1)

168-173: Optimize the dictionary comprehension for dataset ID extraction.

The current implementation can be simplified by removing the redundant .keys() call.

Apply this diff to optimize the code:

-            used_ids = {
-                int(pathlib.Path(folder).parent.parent.name[7:10]) 
-                for folder in all_nnunet_folders
-                if pathlib.Path(folder).parent.parent.name.startswith("Dataset")
-            }
+            used_ids = {
+                int(parent_name[7:10])
+                for folder in all_nnunet_folders
+                if (parent_name := pathlib.Path(folder).parent.parent.name).startswith("Dataset")
+            }
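
Note that the := walrus operator used in this suggestion requires Python 3.8 or newer.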
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between dfa1ce1 and 3ed91bc.

📒 Files selected for processing (3)
  • docs/nnUNet.md (3 hunks)
  • src/imgtools/autopipeline.py (8 hunks)
  • src/imgtools/utils/nnunet.py (1 hunks)
🧰 Additional context used
📓 Learnings (1)
src/imgtools/autopipeline.py (1)
Learnt from: JoshuaSiraj
PR: bhklab/med-imagetools#145
File: src/imgtools/autopipeline.py:168-173
Timestamp: 2024-11-28T21:22:43.073Z
Learning: Dataset IDs are always 3-digit numbers in this codebase.
🪛 Ruff (0.8.0)
src/imgtools/autopipeline.py

632-632: Use key in dict instead of key in dict.keys()

Remove .keys()

(SIM118)

🔇 Additional comments (6)
docs/nnUNet.md (2)

43-48: LGTM: Directory structure is well documented.

The directory structure accurately reflects the new organization of nnUNet files and folders.


62-64: Fix environment variable export syntax.

There's a syntax error in the environment variable export statements. The double equals (==) for nnUNet_results is incorrect.

src/imgtools/utils/nnunet.py (1)

1-2: LGTM! Clean imports.

The imports have been properly cleaned up as suggested in the past review comments.

src/imgtools/autopipeline.py (3)

158-164: LGTM: Directory creation is robust and safe.

The code safely creates nnUNet directories using mkdir with appropriate flags to handle parent directories and prevent race conditions.


629-647: LGTM: Comprehensive nnUNet setup with proper documentation.

The code effectively:

  1. Maps modalities to channel names
  2. Generates dataset.json with proper configuration
  3. Creates training scripts
  4. Generates markdown reports with visualizations
🧰 Tools
🪛 Ruff (0.8.0)

632-632: Use key in dict instead of key in dict.keys()

Remove .keys()

(SIM118)


429-433: ⚠️ Potential issue

Add validation for subject_id format to prevent IndexError.

The code assumes subject_id contains at least two parts separated by underscore. This could raise an IndexError if the format is invalid.

Apply this diff to add validation:

     if self.is_nnunet or self.is_nnunet_inference:
+        parts = subject_id.split('_')
+        if len(parts) < 2:
+            raise ValueError(f"Invalid subject_id format: {subject_id}. Expected format: NUMBER_NAME")
-        subject_id = f"{subject_id.split('_')[1]}_{subject_id.split('_')[0]:03}"
+        subject_id = f"{parts[1]}_{int(parts[0]):03}"

Likely invalid or redundant comment.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

🧹 Outside diff range and nitpick comments (1)
src/imgtools/utils/nnunet.py (1)

46-84: Extract configuration and improve script generation

The function uses hardcoded folder names and lacks configuration flexibility.

Consider extracting the configuration to make the function more flexible:

+from dataclasses import dataclass
+
+@dataclass
+class NNUNetConfig:
+    raw_dir: str = "nnUNet_raw"
+    preprocessed_dir: str = "nnUNet_preprocessed"
+    results_dir: str = "nnUNet_results"
+    num_folds: int = 5
+
 def create_train_script(
         output_directory: str | pathlib.Path,
-        dataset_id: int):
+        dataset_id: int,
+        config: NNUNetConfig = NNUNetConfig()) -> pathlib.Path:
     """
     Creates a bash script (`nnunet_preprocess_and_train.sh`) for running nnUNet training, with paths for raw data,
     preprocessed data, and trained models. The script ensures environment variables are set and 
     executes the necessary training commands.

     Parameters:
     - output_directory (str): The directory where the output and subdirectories are located.
     - dataset_id (int): The ID of the dataset to be processed.
+    - config (NNUNetConfig): Configuration for nnUNet directories and training parameters.
+
+    Returns:
+    - pathlib.Path: Path to the generated script.
     """
     if dataset_id < 0:
         raise ValueError("dataset_id must be non-negative")

     # Define paths using pathlib
     output_directory = pathlib.Path(output_directory)
     shell_path = output_directory / 'nnunet_preprocess_and_train.sh'
     base_dir = output_directory.parent.parent

     if shell_path.exists():
         shell_path.unlink()

     # Define the environment variables and the script commands
     script_content = f"""#!/bin/bash
set -e

-export nnUNet_raw="{base_dir}/nnUNet_raw"
-export nnUNet_preprocessed="{base_dir}/nnUNet_preprocessed"
-export nnUNet_results="{base_dir}/nnUNet_results"
+export nnUNet_raw="{base_dir}/{config.raw_dir}"
+export nnUNet_preprocessed="{base_dir}/{config.preprocessed_dir}"
+export nnUNet_results="{base_dir}/{config.results_dir}"

nnUNetv2_plan_and_preprocess -d {dataset_id} --verify_dataset_integrity -c 3d_fullres

-for (( i=0; i<5; i++ ))
+for (( i=0; i<{config.num_folds}; i++ ))
do
    nnUNetv2_train {dataset_id} 3d_fullres $i
done
"""

     # Write the script content to the file
     with shell_path.open("w", newline="\n") as f:
         f.write(script_content)
+    shell_path.chmod(0o755)  # Make the script executable
+    return shell_path
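
If the refactor above were adopted, usage might look like this (NNUNetConfig, the config parameter, and the returned path are all part of this suggestion, not the current med-imagetools API; the dataset path is illustrative):

```python
script = create_train_script(
    "nnunet_out/nnUNet_raw/Dataset001_RADCURE",  # hypothetical output directory
    dataset_id=1,
    config=NNUNetConfig(num_folds=3),  # fewer folds for a quick smoke test
)
print(script)  # .../Dataset001_RADCURE/nnunet_preprocess_and_train.sh
```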
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 3ed91bc and 1a0552d.

📒 Files selected for processing (1)
  • src/imgtools/utils/nnunet.py (1 hunks)

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (2)
src/imgtools/autopipeline.py (2)

168-192: Consider using constants for dataset ID range

The code correctly handles dataset IDs, but could be improved by defining constants for better maintainability.

Consider this refactor:

+MIN_DATASET_ID = 1
+MAX_DATASET_ID = 999
+DATASET_ID_FORMAT = "{:03}"

-all_ids = set(range(1, 1000))
+all_ids = set(range(MIN_DATASET_ID, MAX_DATASET_ID + 1))

-dataset_folder_name = f"Dataset{self.dataset_id:03}_{study_name}"
+dataset_folder_name = f"Dataset{DATASET_ID_FORMAT.format(self.dataset_id)}_{study_name}"

Line range hint 429-458: Simplify dictionary key lookups

The modality handling logic is correct, but the dictionary key lookups can be simplified.

Apply this diff to simplify the lookups:

-if metadata["AcquisitionContrast"] not in self.total_modality_counter.keys():
+if metadata["AcquisitionContrast"] not in self.total_modality_counter:

-if modality not in self.total_modality_counter.keys():
+if modality not in self.total_modality_counter:

-if self.nnunet_info["current_modality"] not in self.nnunet_info["modalities"].keys():
+if self.nnunet_info["current_modality"] not in self.nnunet_info["modalities"]:
🧰 Tools
🪛 Ruff (0.8.0)

456-456: Use key not in dict instead of key not in dict.keys()

Remove .keys()

(SIM118)

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 1a0552d and 615a232.

📒 Files selected for processing (1)
  • src/imgtools/autopipeline.py (9 hunks)
🧰 Additional context used
📓 Learnings (1)
src/imgtools/autopipeline.py (1)
Learnt from: JoshuaSiraj
PR: bhklab/med-imagetools#145
File: src/imgtools/autopipeline.py:168-173
Timestamp: 2024-11-28T21:22:43.073Z
Learning: Dataset IDs are always 3-digit numbers in this codebase.
🪛 Ruff (0.8.0)
src/imgtools/autopipeline.py

456-456: Use key not in dict instead of key not in dict.keys()

Remove .keys()

(SIM118)


643-643: Use key in dict instead of key in dict.keys()

Remove .keys()

(SIM118)

🔇 Additional comments (4)
src/imgtools/autopipeline.py (4)

158-164: LGTM: Clean directory setup using pathlib

The directory creation is implemented correctly using pathlib with proper parent directory handling and race condition prevention.


193-200: LGTM: Proper dataset folder creation

The dataset folder creation is implemented correctly using pathlib with proper path handling.


450-453: LGTM: Proper train/test split handling

The code correctly handles the routing of outputs to appropriate train/test directories with consistent subject naming.

Also applies to: 560-563


629-658: LGTM: Comprehensive data saving and reporting

The implementation correctly handles:

  • Training/test case counting
  • Channel name mapping for dataset JSON
  • Training script generation
  • Markdown report generation
🧰 Tools
🪛 Ruff (0.8.0)

643-643: Use key in dict instead of key in dict.keys()

Remove .keys()

(SIM118)

@jjjermiah (Contributor):

@JoshuaSiraj no glaring issues in the latest commit, i've also resolved the useless code rabbit comments but there's 3 I've left unresolved which I think are worth addressing

@jjjermiah (Contributor):

there's also now 27 instances of pathlib.Path(self.output_directory) in the code, which is impressive.
if i were to refactor anything, i would start by making the code readable first, and then address the logic later

@JoshuaSiraj (Collaborator, Author):

> @JoshuaSiraj no glaring issues in the latest commit, i've also resolved the useless code rabbit comments but there's 3 I've left unresolved which I think are worth addressing

Thanks!

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (4)
docs/nnUNet.md (2)

76-76: Add parameter documentation for preprocessing command.

While the command is correct, consider adding explanations for the parameters:

  • -d X: Dataset ID number
  • -c 3d_fullres: Configuration type
  • --verify_dataset_integrity: Validation step

114-114: Add parameter documentation for inference command.

Consider adding explanations for the parameters:

  • -i INPUT_FOLDER: Input directory containing images
  • -o OUTPUT_FOLDER: Output directory for predictions
  • -d DATASET_NAME_OR_ID: Dataset identifier
  • -c CONFIGURATION: Model configuration (e.g., 3d_fullres)
docs/AutoPipeline.md (2)

238-238: Add an example of the output filename format.

Consider adding an example to clarify the format:
e.g., HNSCC_001_CT.nii.gz


253-262: Consider documenting additional dataset.json fields.

While the current fields are correct, consider mentioning these optional fields:

  • name: Dataset name
  • description: Dataset description
  • reference: Citation or reference
  • release: Release version
  • tensorImageSize: Image dimensionality ("2D" or "3D")
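
As a hedged illustration (field values are placeholders, not taken from this PR), a nnUNetV2 dataset.json combining the required fields with the optional ones listed above could be assembled like this:

```python
import json

dataset_json = {
    # required by nnUNetV2
    "channel_names": {"0": "CT"},
    "labels": {"background": 0, "GTV": 1},
    "numTraining": 5,
    "file_ending": ".nii.gz",
    # optional fields mentioned above
    "name": "RADCURE",
    "description": "Head and neck CT scans with GTV segmentations",
}
print(json.dumps(dataset_json, indent=2))
```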
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 615a232 and d3e1a11.

📒 Files selected for processing (2)
  • docs/AutoPipeline.md (3 hunks)
  • docs/nnUNet.md (3 hunks)
🔇 Additional comments (4)
docs/nnUNet.md (3)

43-48: LGTM! Directory structure is correct for nnUNetV2.

The updated directory structure accurately reflects the nnUNetV2 requirements.


54-56: LGTM! Environment variables are correctly set.

The environment variables are properly defined with single equals signs.


62-64: ⚠️ Potential issue

Fix environment variable export syntax.

There's a syntax error in the environment variable export statements. The double equals (==) for nnUNet_results is incorrect.

Apply this fix:

-export nnUNet_results=="/OUTPUT_DIRECTORY/nnUNet_results"
+export nnUNet_results="/OUTPUT_DIRECTORY/nnUNet_results"
docs/AutoPipeline.md (1)

177-188: LGTM! Directory structure is comprehensive and accurate.

The structure correctly shows all required files and directories for nnUNetV2.

@JoshuaSiraj JoshuaSiraj requested a review from skim2257 December 2, 2024 16:43
@JoshuaSiraj (Collaborator, Author):

Old:

├── nnUNet_preprocessed
├── nnUNet_raw_data_base
│   └── nnUNet_raw_data
│       └── Task500_data
│           ├── dataset.csv
│           ├── dataset.json
│           ├── data.sh
│           ├── imagesTr
│           │   └── 0_RADCURE-0005_0000.nii.gz
│           ├── labelsTr
│           │   └── 0_RADCURE-0005.nii.gz
│           ├── markdown_images
│           │   ├── nnunet_modality_count.png
│           │   └── nnunet_train_test_pie.png
│           └── report.md
└── nnUNet_trained_models

New:

├── nnUNet_preprocessed
├── nnUNet_raw
│   └── Dataset001_RADCURE
│       ├── dataset.csv
│       ├── dataset.json
│       ├── imagesTr
│       │   ├── RADCURE_001_0000.nii.gz
│       │   ├── RADCURE_002_0000.nii.gz
│       │   ├── RADCURE_003_0000.nii.gz
│       │   ├── RADCURE_004_0000.nii.gz
│       │   ├── RADCURE_005_0000.nii.gz
│       ├── imagesTs
│       │   ├── RADCURE_000_0000.nii.gz
│       │   ├── RADCURE_006_0000.nii.gz
│       ├── labelsTr
│       │   ├── RADCURE_001.nii.gz
│       │   ├── RADCURE_002.nii.gz
│       │   ├── RADCURE_003.nii.gz
│       │   ├── RADCURE_004.nii.gz
│       │   ├── RADCURE_005.nii.gz
│       ├── labelsTs
│       │   ├── RADCURE_000.nii.gz
│       │   ├── RADCURE_006.nii.gz
│       ├── markdown_images
│       │   ├── nnunet_modality_count.png
│       │   └── nnunet_train_test_pie.png
│       ├── nnunet_preprocess_and_train.sh
│       └── report.md
└── nnUNet_results
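
A small sketch of the nnUNetV2 naming convention visible in this tree (images carry a four-digit channel suffix, labels do not; the helper functions below are illustrative, not code from this PR):

```python
def image_name(study: str, case: int, channel: int = 0) -> str:
    # image_name("RADCURE", 1) -> 'RADCURE_001_0000.nii.gz'
    return f"{study}_{case:03}_{channel:04}.nii.gz"

def label_name(study: str, case: int) -> str:
    # label_name("RADCURE", 1) -> 'RADCURE_001.nii.gz'
    return f"{study}_{case:03}.nii.gz"
```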

@skim2257


@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (1)
src/imgtools/utils/nnunet.py (1)

5-35: Consider clearing plot state between figures.

While the function correctly closes figures, it's good practice to clear the plot state between figures to ensure complete isolation.

     plt.figure()  
     plt.bar(modalities, modality_totals)
     plt.title("Modality Counts")
     plt.xlabel("Modalities")
     plt.ylabel("Counts")
     plt.savefig(images_folder / "nnunet_modality_count.png")
     plt.close()  
+    plt.clf()  # Clear the current figure

     # Pie chart for train/test distribution
     plt.figure()
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d3e1a11 and caf5da5.

📒 Files selected for processing (2)
  • .gitignore (1 hunks)
  • src/imgtools/utils/nnunet.py (1 hunks)
✅ Files skipped from review due to trivial changes (1)
  • .gitignore
🧰 Additional context used
📓 Learnings (1)
src/imgtools/utils/nnunet.py (1)
Learnt from: jjjermiah
PR: bhklab/med-imagetools#145
File: src/imgtools/utils/nnunet.py:88-169
Timestamp: 2024-11-29T21:18:00.351Z
Learning: For the `generate_dataset_json` function in `src/imgtools/utils/nnunet.py`, adding validation for required parameters is not necessary.
🔇 Additional comments (4)
src/imgtools/utils/nnunet.py (4)

1-3: LGTM! Clean and minimal imports.

All imports are necessary and well-organized.


38-44: LGTM! Clean and focused implementation.

The function is well-typed and handles its single responsibility effectively.


88-169: LGTM! Well-documented and properly attributed implementation.

The function is well-documented, properly typed, and efficiently handles the dataset configuration generation. The attribution to the original nnUNet repository is appropriately included.


46-84: Verify shell script compatibility across systems.

The script uses a /bin/bash shebang, and bash might not be available at that location on all systems.
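
For what it's worth, one common mitigation is an env-based shebang; a sketch under that assumption (an alternative, not what the PR currently writes):

```python
import pathlib

def write_portable_script(path: pathlib.Path, commands: str) -> None:
    # /usr/bin/env resolves bash via PATH instead of assuming /bin/bash
    path.write_text(f"#!/usr/bin/env bash\nset -e\n{commands}")
    path.chmod(0o755)  # mark the script executable
```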
