8525 improve documentation on the datalist format #8539
base: dev
Walkthrough
Docstrings were expanded to document the Decathlon datalist format and required inputs. In monai/apps/auto3dseg/auto_runner.py, AutoRunner gained a Notes section specifying required configuration keys (modality, datalist, dataroot) and a reference to the datalist file format. In monai/data/decathlon_datalist.py, the load_decathlon_datalist and load_decathlon_properties docstrings were rewritten to explicitly describe the Decathlon JSON structure (metadata, train/test lists, optional fold), with updated return examples (label paths) and formatting tweaks. No code behavior or API signatures changed.
Estimated code review effort: 🎯 1 (Trivial) | ⏱️ ~3 minutes
DCO remediation: I, Daniël Nobbe <[email protected]>, hereby add my Signed-off-by to these commits: 300d737, f0dde7a, 2648b84, 86c9085, 48afc88, 761306a, ed384d7, b46ed41.
Signed-off-by: Daniël Nobbe <[email protected]>
Actionable comments posted: 0
🧹 Nitpick comments (4)
monai/apps/auto3dseg/auto_runner.py (1)
197-204: Avoid duplicate “Notes”; rename and add Sphinx cross-ref to the datalist docs
There’s already a “Notes” section above. Prefer a single notes section or rename this block. Also use a Sphinx cross-reference for the datalist function.
Apply this diff:

    -    Notes:
    -        The input config requires at least the following keys:
    +    Required input configuration:
    +        The input config requires at least the following keys:
             - ``modality``: the modality of the data, e.g. "ct", "mri", etc.
             - ``datalist``: the path to the datalist file in JSON format.
             - ``dataroot``: the root directory of the data files.
    -        For the datalist file format, see the description under monai.data.decathlon_datalist.load_decathlon_datalist.
    +        See also: :py:func:`monai.data.decathlon_datalist.load_decathlon_datalist` for the datalist file format.

monai/data/decathlon_datalist.py (3)
95-128: Clarify structure: allow list-valued images/labels and mention optional validation
Great expansion. Two minor clarity gaps: (1) “image” and “label” can be strings or lists of strings (multi-modal/multi-channel inputs). (2) MONAI datalists may also include an optional “validation” list.
Apply this diff:

    -    JSON file should follow the format of the Medical Segmentation Decathlon
    +    JSON file should follow the format of the Medical Segmentation Decathlon
         datalist.json files, see http://medicaldecathlon.com.
         The files are structured as follows:

         .. code-block:: python

             {
                 "metadata_key_0": "metadata_value_0",
                 "metadata_key_1": "metadata_value_1",
                 ...,
                 "training": [
    -                {"image": "path/to/image_1.nii.gz", "label": "path/to/label_1.nii.gz"},
    -                {"image": "path/to/image_2.nii.gz", "label": "path/to/label_2.nii.gz"},
    +                # image/label can be a string or a list of strings
    +                {"image": "path/to/image_1.nii.gz", "label": "path/to/label_1.nii.gz"},
    +                {"image": ["path/to/imgA_2.nii.gz", "path/to/imgB_2.nii.gz"], "label": "path/to/label_2.nii.gz"},
                     ...
                 ],
    +            "validation": [
    +                {"image": "path/to/image_val_1.nii.gz", "label": "path/to/label_val_1.nii.gz"},
    +                ...
    +            ],
                 "test": [
                     "path/to/image_3.nii.gz",
                     "path/to/image_4.nii.gz",
                     ...
                 ]
             }
    @@
    -    The ``training`` key contains a list of dictionaries, each of which has at least
    -    the ``image`` and ``label`` keys, the latter of which is a path (for segmentation data).
    -    Each item can also include a ``fold`` key for cross-validation purposes.
    -    The "test" key contains a list of image paths, without labels.
    +    The ``training`` key contains a list of dictionaries, each of which has at least
    +    the ``image`` and ``label`` keys; both can be a string or a list of strings (paths).
    +    Each item can also include a ``fold`` key for cross-validation purposes.
    +    The ``validation`` key is optional and, if present, follows the same structure as ``training``.
    +    The ``test`` key contains a list of image paths, without labels.
142-147: Fix example typo and reflect list-valued paths
The first example label path is missing a closing quote; also consider showing list-valued “image” to match the note above.
Apply this diff:

    -        .. code-block:: python
    +        .. code-block:: python

                 [
    -                {'image': '/workspace/data/chest_19.nii.gz', 'label': '/workspace/labels/chest_19.nii.gz},
    -                {'image': '/workspace/data/chest_31.nii.gz', 'label': '/workspace/labels/chest_31.nii.gz'},
    +                {"image": "/workspace/data/chest_19.nii.gz", "label": "/workspace/labels/chest_19.nii.gz"},
    +                {"image": ["/workspace/data/chest_31_a.nii.gz", "/workspace/data/chest_31_b.nii.gz"],
    +                 "label": "/workspace/labels/chest_31.nii.gz"},
                 ]
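To show how a return value like the one in this example could arise, here is a minimal sketch that joins relative datalist paths with a base directory. It is not MONAI's implementation: `resolve_paths` is a hypothetical name, the `/workspace` base directory is taken from the example paths, and POSIX-style paths are assumed so the output does not depend on the host platform.

```python
from pathlib import PurePosixPath


def resolve_paths(items, base_dir):
    """Join relative image/label paths with base_dir (illustrative, not MONAI's code)."""
    base = PurePosixPath(base_dir)

    def join(value):
        # "image"/"label" may be one path or a list of paths.
        if isinstance(value, list):
            return [str(base / v) for v in value]
        return str(base / value)

    resolved = []
    for item in items:
        out = dict(item)  # copy, so extra keys such as "fold" are preserved
        for key in ("image", "label"):
            if key in out:
                out[key] = join(out[key])
        resolved.append(out)
    return resolved


training = [
    {"image": "data/chest_19.nii.gz", "label": "labels/chest_19.nii.gz"},
    {"image": ["data/chest_31_a.nii.gz", "data/chest_31_b.nii.gz"], "label": "labels/chest_31.nii.gz"},
]
for entry in resolve_paths(training, "/workspace"):
    print(entry)
```

The list-valued case returns a list of joined paths, which matches the shape suggested in the diff above.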
169-171: Use Sphinx cross-ref for the related function
Turn the backticked name into a working Sphinx link.
Apply this diff:

    -    """Extract the properties with the specified keys from the Decathlon JSON file.
    -    See under `load_decathlon_datalist` for the expected keys in the Decathlon challenge.
    +    """Extract the properties with the specified keys from the Decathlon JSON file.
    +    See also :py:func:`monai.data.decathlon_datalist.load_decathlon_datalist` for the expected keys.
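For readers unfamiliar with what "extract the properties with the specified keys" amounts to, the following is a rough standard-library sketch of the idea. `read_properties` is a hypothetical stand-in rather than the MONAI function, the metadata keys `name` and `modality` are example values, and raising `KeyError` for a missing key is an assumption made for this sketch.

```python
import json
import tempfile
from pathlib import Path


def read_properties(json_path, property_keys):
    """Pick top-level metadata entries from a Decathlon-style JSON file (illustrative only)."""
    if isinstance(property_keys, str):
        property_keys = [property_keys]
    with open(json_path) as f:
        data = json.load(f)
    missing = [key for key in property_keys if key not in data]
    if missing:
        # Assumption for this sketch: fail loudly rather than return partial results.
        raise KeyError(f"keys not found in {json_path}: {missing}")
    return {key: data[key] for key in property_keys}


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "datalist.json"
    path.write_text(json.dumps({
        "name": "example-dataset",   # example metadata
        "modality": {"0": "CT"},     # example metadata
        "training": [],
        "test": [],
    }))
    print(read_properties(path, ["name", "modality"]))
```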
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Knowledge Base: Disabled due to Reviews > Disable Knowledge Base setting
📒 Files selected for processing (2)
monai/apps/auto3dseg/auto_runner.py (1 hunks)
monai/data/decathlon_datalist.py (3 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py
⚙️ CodeRabbit Configuration File
Review the Python code for quality and correctness. Ensure variable names adhere to PEP8 style guides, are sensible and informative in regards to their function, though permitting simple names for loop and comprehension variables. Ensure routine names are meaningful in regards to their function and use verbs, adjectives, and nouns in a semantically appropriate way. Docstrings should be present for all definition which describe each variable, return value, and raised exception in the appropriate section of the Google-style of docstrings. Examine code for logical error or inconsistencies, and suggest what may be changed to addressed these. Suggest any enhancements for code improving efficiency, maintainability, comprehensibility, and correctness. Ensure new or modified definitions will be covered by existing or new unit tests.
Files:
monai/apps/auto3dseg/auto_runner.py
monai/data/decathlon_datalist.py
Fixes #8525.
Description
I found the description of the Medical Segmentation Decathlon datalist format (short: decathlon datalist) lacking, although some parts of the framework depend on it, specifically the Auto3DSeg AutoRunner.
I've added a comprehensive description of the format under monai.data.decathlon_datalist.load_decathlon_datalist, and some small notes elsewhere. There's a corresponding MR for the tutorials here.
Please let me know if anything is incorrect; the codebase is quite big and I haven't been working with it for very long.
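To make the AutoRunner dependency concrete: the review notes on this page list `modality`, `datalist`, and `dataroot` as the keys the input config needs at minimum. A small sketch of such a config and a pre-flight check follows. `missing_config_keys` is a hypothetical helper, and the file paths are placeholders, not values from this PR.

```python
REQUIRED_KEYS = ("modality", "datalist", "dataroot")


def missing_config_keys(config):
    """Return the required keys absent from an input config dict (hypothetical helper)."""
    return [key for key in REQUIRED_KEYS if key not in config]


input_config = {
    "modality": "ct",               # e.g. "ct" or "mri"
    "datalist": "./datalist.json",  # placeholder path to the datalist JSON file
    "dataroot": "./data",           # placeholder root directory of the data files
}

print(missing_config_keys(input_config))         # prints []
print(missing_config_keys({"modality": "mri"}))  # prints ['datalist', 'dataroot']

# Assumption: a dict like this would then be handed to AutoRunner through its
# input argument; check the AutoRunner docstring before relying on that.
```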
Types of changes
./runtests.sh -f -u --net --coverage
./runtests.sh --quick --unittests --disttests
make html command in the docs/ folder.