
8525 improve documentation on the datalist format #8539


Draft: wants to merge 14 commits into dev

DanielNobbe commented Aug 14, 2025

Fixes #8525.

Description

I found the documentation of the Medical Segmentation Decathlon datalist format ("decathlon datalist" for short) lacking, even though parts of the framework depend on it, in particular the Auto3DSeg AutoRunner.
I've added a comprehensive description of the format to the docstring of monai.data.decathlon_datalist.load_decathlon_datalist, plus some short notes elsewhere.

There's a corresponding MR for the tutorials here.

Please let me know if anything is incorrect: the codebase is quite big, and I haven't been working with it for very long.

Types of changes

  • Non-breaking change (fix or new feature that would not break existing functionality).
  • Breaking change (fix or new feature that would cause existing functionality to change).
  • New tests added to cover the changes.
  • Integration tests passed locally by running ./runtests.sh -f -u --net --coverage.
  • Quick tests passed locally by running ./runtests.sh --quick --unittests --disttests.
  • In-line docstrings updated.
  • Documentation updated, tested make html command in the docs/ folder.


coderabbitai bot commented Aug 14, 2025

Important

Review skipped

Draft detected.


Walkthrough

Docstrings were expanded to document the Decathlon datalist format and required inputs. In monai/apps/auto3dseg/auto_runner.py, AutoRunner gained a Notes section specifying required configuration keys (modality, datalist, dataroot) and a reference to the datalist file format. In monai/data/decathlon_datalist.py, load_decathlon_datalist and load_decathlon_properties docstrings were rewritten to explicitly describe the Decathlon JSON structure (metadata, train/test lists, optional fold), updated return examples (label paths), and formatting tweaks. No code behavior or API signatures changed.

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~3 minutes

Assessment against linked issues

Objectives from the linked issue:
  • Document the Decathlon datalist format, including metadata and train/test lists, under load_decathlon_datalist (#8525)
  • Clarify usage by referencing/aligning with the Auto3DSeg AutoRunner requirements (config keys) (#8525)

DanielNobbe force-pushed the 8525-improve-documentation-on-the-datalist-format branch from 48ddcbe to ed384d7 on August 14, 2025 14:38
I, Daniël Nobbe <[email protected]>, hereby add my Signed-off-by to this commit: 300d737
I, Daniël Nobbe <[email protected]>, hereby add my Signed-off-by to this commit: f0dde7a
I, Daniël Nobbe <[email protected]>, hereby add my Signed-off-by to this commit: 2648b84
I, Daniël Nobbe <[email protected]>, hereby add my Signed-off-by to this commit: 86c9085
I, Daniël Nobbe <[email protected]>, hereby add my Signed-off-by to this commit: 48afc88
I, Daniël Nobbe <[email protected]>, hereby add my Signed-off-by to this commit: 761306a
I, Daniël Nobbe <[email protected]>, hereby add my Signed-off-by to this commit: ed384d7

Signed-off-by: Daniël Nobbe <[email protected]>
I, Daniël Nobbe <[email protected]>, hereby add my Signed-off-by to this commit: b46ed41

Signed-off-by: Daniël Nobbe <[email protected]>

coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (4)
monai/apps/auto3dseg/auto_runner.py (1)

197-204: Avoid duplicate “Notes”; rename and add Sphinx cross-ref to the datalist docs

There’s already a “Notes” section above. Prefer a single notes section or rename this block. Also use a Sphinx cross-reference for the datalist function.

Apply this diff:

-    Notes:
-        The input config requires at least the following keys:
+    Required input configuration:
+        The input config requires at least the following keys:
         - ``modality``: the modality of the data, e.g. "ct", "mri", etc.
         - ``datalist``: the path to the datalist file in JSON format.
         - ``dataroot``: the root directory of the data files.
 
-        For the datalist file format, see the description under monai.data.decathlon_datalist.load_decathlon_datalist.
+        See also: :py:func:`monai.data.decathlon_datalist.load_decathlon_datalist` for the datalist file format.
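To make the three required keys concrete, here is a minimal sketch of an input config written as a Python dict. The `missing_required_keys` helper is hypothetical (not part of MONAI) and only reports which keys are absent; the paths are placeholders. Our understanding is that a dict like this, or a path to an equivalent YAML/JSON file, is what gets passed as `AutoRunner(input=...)`, but check the AutoRunner docstring for the authoritative details.

```python
from __future__ import annotations

# Hypothetical helper for illustration only; AutoRunner performs its own validation.
REQUIRED_KEYS = ("modality", "datalist", "dataroot")


def missing_required_keys(config: dict) -> list[str]:
    """Return the required input-config keys that are absent from ``config``."""
    return [key for key in REQUIRED_KEYS if key not in config]


# Placeholder values: adjust the modality and paths to your own data.
input_config = {
    "modality": "ct",
    "datalist": "./datalist.json",  # Decathlon-style datalist file (JSON)
    "dataroot": "./data",  # root directory of the data files
}

print(missing_required_keys(input_config))  # prints []
print(missing_required_keys({"modality": "mri"}))  # prints ['datalist', 'dataroot']
```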
monai/data/decathlon_datalist.py (3)

95-128: Clarify structure: allow list-valued images/labels and mention optional validation

Great expansion. Two minor clarity gaps: (1) “image” and “label” can be strings or lists of strings (multi-modal/multi-channel inputs). (2) MONAI datalists may also include an optional “validation” list.

Apply this diff:

-    JSON file should follow the format of the Medical Segmentation Decathlon
+    JSON file should follow the format of the Medical Segmentation Decathlon
     datalist.json files, see http://medicaldecathlon.com.
     The files are structured as follows:
 
     .. code-block:: python
 
         {
             "metadata_key_0": "metadata_value_0",
             "metadata_key_1": "metadata_value_1",
             ...,
             "training": [
-                {"image": "path/to/image_1.nii.gz", "label": "path/to/label_1.nii.gz"},
-                {"image": "path/to/image_2.nii.gz", "label": "path/to/label_2.nii.gz"},
+                # image/label can be a string or a list of strings
+                {"image": "path/to/image_1.nii.gz", "label": "path/to/label_1.nii.gz"},
+                {"image": ["path/to/imgA_2.nii.gz", "path/to/imgB_2.nii.gz"], "label": "path/to/label_2.nii.gz"},
                 ...
             ],
+            "validation": [
+                {"image": "path/to/image_val_1.nii.gz", "label": "path/to/label_val_1.nii.gz"},
+                ...
+            ],
             "test": [
                 "path/to/image_3.nii.gz",
                 "path/to/image_4.nii.gz",
                 ...
             ]
         }
 
@@
-    The ``training`` key contains a list of dictionaries, each of which has at least
-    the ``image`` and ``label`` keys, the latter of which is a path (for segmentation data).
-    Each item can also include a ``fold`` key for cross-validation purposes.
-    The "test" key contains a list of image paths, without labels.
+    The ``training`` key contains a list of dictionaries, each of which has at least
+    the ``image`` and ``label`` keys; both can be a string or a list of strings (paths).
+    Each item can also include a ``fold`` key for cross-validation purposes.
+    The ``validation`` key is optional and, if present, follows the same structure as ``training``.
+    The ``test`` key contains a list of image paths, without labels.
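The structure above can be exercised end to end with only the standard library. The sketch below writes an illustrative datalist (arbitrary metadata keys, a list-valued `image`, per-item `fold`, an optional `validation` list, and a `test` list of bare paths) to a temporary JSON file and reads it back. The `split_by_fold` helper is hypothetical, and holding out one fold as validation data is an assumed convention for using the `fold` key, not something the docstring prescribes.

```python
from __future__ import annotations

import json
import os
import tempfile

# Illustrative datalist following the structure described above; all paths are placeholders.
datalist = {
    "name": "ExampleTask",  # arbitrary metadata keys
    "description": "toy example",
    "training": [
        {"image": "imagesTr/case_1.nii.gz", "label": "labelsTr/case_1.nii.gz", "fold": 0},
        # multi-channel input: "image" given as a list of paths
        {
            "image": ["imagesTr/case_2_a.nii.gz", "imagesTr/case_2_b.nii.gz"],
            "label": "labelsTr/case_2.nii.gz",
            "fold": 1,
        },
    ],
    "validation": [  # optional section, same item structure as "training"
        {"image": "imagesTr/case_3.nii.gz", "label": "labelsTr/case_3.nii.gz"},
    ],
    "test": ["imagesTs/case_4.nii.gz"],  # plain image paths, no labels
}


def split_by_fold(items: list[dict], val_fold: int) -> tuple[list[dict], list[dict]]:
    """Hypothetical helper: hold out the items whose ``fold`` equals ``val_fold``."""
    train = [item for item in items if item.get("fold") != val_fold]
    val = [item for item in items if item.get("fold") == val_fold]
    return train, val


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "datalist.json")
    with open(path, "w") as f:
        json.dump(datalist, f, indent=2)
    with open(path) as f:
        loaded = json.load(f)

train_items, val_items = split_by_fold(loaded["training"], val_fold=1)
print(len(train_items), len(val_items))  # prints 1 1
```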

142-147: Fix example typo and reflect list-valued paths

The first example label path is missing a closing quote; also consider showing list-valued “image” to match the note above.

Apply this diff:

-    .. code-block:: python
+    .. code-block:: python
 
         [
-            {'image': '/workspace/data/chest_19.nii.gz',  'label': '/workspace/labels/chest_19.nii.gz},
-            {'image': '/workspace/data/chest_31.nii.gz',  'label': '/workspace/labels/chest_31.nii.gz'},
+            {"image": "/workspace/data/chest_19.nii.gz",  "label": "/workspace/labels/chest_19.nii.gz"},
+            {"image": ["/workspace/data/chest_31_a.nii.gz", "/workspace/data/chest_31_b.nii.gz"],
+             "label": "/workspace/labels/chest_31.nii.gz"},
         ]
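The absolute paths in the return example come from joining each relative path in the file with a base directory. Below is a simplified, stdlib-only approximation of that step, written for illustration; `resolve_datalist` and `_join` are hypothetical names, not MONAI functions. The details it mimics (normalising joined paths, wrapping bare strings in the `test` list as `{"image": path}`) reflect our reading of the MONAI source rather than anything stated in this PR, so verify against `monai/data/decathlon_datalist.py` before relying on them.

```python
from __future__ import annotations

import os


def _join(base_dir: str, value):
    """Join a path, or each path in a list, onto base_dir; other values pass through."""
    if isinstance(value, str):
        return os.path.normpath(os.path.join(base_dir, value))
    if isinstance(value, list):
        return [_join(base_dir, v) for v in value]
    return value


def resolve_datalist(json_data: dict, base_dir: str, key: str = "training") -> list[dict]:
    """Hypothetical approximation of loading one datalist section with absolute paths."""
    items = json_data[key]
    # Assumed behaviour: a test list of bare strings is wrapped as {"image": path}.
    items = [item if isinstance(item, dict) else {"image": item} for item in items]
    # Assumes segmentation data, so "label" is treated as a path; other keys are left untouched.
    return [
        {k: _join(base_dir, v) if k in ("image", "label") else v for k, v in item.items()}
        for item in items
    ]


example = {
    "training": [
        {"image": "data/chest_19.nii.gz", "label": "labels/chest_19.nii.gz", "fold": 0},
        {"image": ["data/chest_31_a.nii.gz", "data/chest_31_b.nii.gz"], "label": "labels/chest_31.nii.gz"},
    ],
    "test": ["data/chest_40.nii.gz"],
}

print(resolve_datalist(example, "/workspace")[0])
# on POSIX: {'image': '/workspace/data/chest_19.nii.gz', 'label': '/workspace/labels/chest_19.nii.gz', 'fold': 0}
```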

169-171: Use Sphinx cross-ref for the related function

Turn the backticked name into a working Sphinx link.

Apply this diff:

-    """Extract the properties with the specified keys from the Decathlon JSON file.
-    See under `load_decathlon_datalist` for the expected keys in the Decathlon challenge.
+    """Extract the properties with the specified keys from the Decathlon JSON file.
+    See also :py:func:`monai.data.decathlon_datalist.load_decathlon_datalist` for the expected keys.
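Because `load_decathlon_properties` reads the same file, the metadata keys it extracts are simply top-level entries of the JSON object. A stdlib sketch of that idea follows, using a hypothetical `extract_properties` helper; accepting a single key as a string and raising `KeyError` for a missing key are our assumptions about sensible behaviour, not a description of MONAI's implementation.

```python
import json


def extract_properties(json_data: dict, property_keys) -> dict:
    """Hypothetical sketch: pick the requested top-level (metadata) keys from a datalist."""
    if isinstance(property_keys, str):
        property_keys = [property_keys]
    properties = {}
    for key in property_keys:
        if key not in json_data:
            raise KeyError(f"key '{key}' is not in the datalist properties.")
        properties[key] = json_data[key]
    return properties


datalist = json.loads('{"name": "Spleen", "tensorImageSize": "3D", "training": [], "test": []}')
print(extract_properties(datalist, ["name", "tensorImageSize"]))
# prints {'name': 'Spleen', 'tensorImageSize': '3D'}
```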
📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Knowledge Base: Disabled due to Reviews > Disable Knowledge Base setting

📥 Commits

Reviewing files that changed from the base of the PR, between commits cafc1fe and 716f4d8.

📒 Files selected for processing (2)
  • monai/apps/auto3dseg/auto_runner.py (1 hunks)
  • monai/data/decathlon_datalist.py (3 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

⚙️ CodeRabbit Configuration File

Review the Python code for quality and correctness. Ensure variable names adhere to PEP8 style guides and are sensible and informative with regard to their function, though permitting simple names for loop and comprehension variables. Ensure routine names are meaningful with regard to their function and use verbs, adjectives, and nouns in a semantically appropriate way. Docstrings should be present for all definitions and should describe each variable, return value, and raised exception in the appropriate section of the Google-style docstring. Examine code for logical errors or inconsistencies, and suggest what may be changed to address them. Suggest any enhancements that improve efficiency, maintainability, comprehensibility, and correctness. Ensure new or modified definitions will be covered by existing or new unit tests.

Files:

  • monai/apps/auto3dseg/auto_runner.py
  • monai/data/decathlon_datalist.py

@DanielNobbe DanielNobbe marked this pull request as draft August 15, 2025 14:59
Labels: None yet
Projects: None yet
Development: successfully merging this pull request may close the issue "Improve documentation on the datalist format".
1 participant