Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Dev #100

Merged
merged 3 commits into from
Nov 10, 2024
Merged

Dev #100

merged 3 commits into from
Nov 10, 2024

Conversation

Oufattole
Copy link
Collaborator

@Oufattole Oufattole commented Nov 6, 2024

Resolves issue #98 and cleans up the tutorial

Summary by CodeRabbit

  • New Features

    • Enhanced README documentation for the MIMIC-IV tutorial, improving clarity and organization.
    • Added new sections for task criteria files and mortality prediction tasks, detailing configurations and significance.
    • Introduced YAML configuration files for predicting in-hospital and ICU mortality, as well as post-hospital discharge mortality.
  • Bug Fixes

    • Improved error handling in the tabularization process to prevent accidental overwrites of existing files.
  • Documentation

    • Updated various README files to provide clearer instructions and information regarding new features and configurations.

…ta to false -- in that setting it should just skip over existing files
Tutorial update, added a full MIMICIV tutorial walking through meds_transform extraction and aces task extraction too.
Copy link
Contributor

coderabbitai bot commented Nov 6, 2024

Walkthrough

The pull request implements several updates to the MIMIC-IV tutorial documentation, enhancing clarity and organization. Key changes include the addition of a new introductory section, restructured data extraction instructions, and updates to environment variable commands. New YAML configuration files for mortality and readmission tasks are introduced, along with detailed descriptions and predicates for each task. The README files across various directories have been updated to provide comprehensive overviews of the tasks and their significance, ensuring users have clear guidance on utilizing the MIMIC-IV dataset effectively.

Changes

File Path Change Summary
MIMICIV_TUTORIAL/README.MD Added introductory note, restructured extraction section, updated environment variable commands.
MIMICIV_TUTORIAL/tasks/README.md Introduced "Task Criteria Files" section explaining configuration files for MEDS-TAB tasks.
MIMICIV_TUTORIAL/tasks/los/in_hospital/first_48h.yaml New YAML configuration for predicting in-hospital LOS exceeding three days.
MIMICIV_TUTORIAL/tasks/los/in_icu/first_48h.yaml New YAML configuration for predicting ICU LOS exceeding three days.
MIMICIV_TUTORIAL/tasks/mortality/README.md Overview of mortality prediction tasks and their significance added.
MIMICIV_TUTORIAL/tasks/mortality/in_hospital/README.md New section on "In-hospital Mortality Prediction" added.
MIMICIV_TUTORIAL/tasks/mortality/in_hospital/first_24h.yaml New YAML configuration for predicting in-hospital mortality within the first 24 hours.
MIMICIV_TUTORIAL/tasks/mortality/in_hospital/first_48h.yaml New YAML configuration for predicting in-hospital mortality within the first 48 hours.
MIMICIV_TUTORIAL/tasks/mortality/in_icu/README.md New section on "In-ICU Mortality Prediction" added.
MIMICIV_TUTORIAL/tasks/mortality/in_icu/first_24h.yaml New YAML configuration for predicting in-ICU mortality within the first 24 hours.
MIMICIV_TUTORIAL/tasks/mortality/post_hospital_discharge/1y.yaml New YAML for predicting mortality within one year post-discharge.
MIMICIV_TUTORIAL/tasks/mortality/post_hospital_discharge/30d.yaml New YAML for predicting mortality within 30 days post-discharge.
MIMICIV_TUTORIAL/tasks/mortality/post_hospital_discharge/README.md Added section on "Post-hospitalization Mortality Prediction."
MIMICIV_TUTORIAL/tasks/readmission/30d.yaml New YAML configuration for managing hospital admission and discharge events within 30 days.
MIMICIV_TUTORIAL/tasks/readmission/README.md New section on readmission prediction tasks added.
src/MEDS_tabular_automl/scripts/tabularize_static.py Enhanced error handling for file writing operations in tabularization script.

Possibly related PRs

Suggested reviewers

  • mmcdermott
  • teyaberg

Poem

🐰 In the garden of knowledge, we hop and play,
With MIMIC-IV's wisdom lighting the way.
New paths to explore, with data so bright,
Each YAML a treasure, each README a light.
Let's gather our findings, share them with cheer,
For learning together brings us all near! 🌼


Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link

codecov bot commented Nov 6, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 81.96%. Comparing base (02c44af) to head (1443717).
Report is 5 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #100      +/-   ##
==========================================
+ Coverage   81.91%   81.96%   +0.05%     
==========================================
  Files          20       20              
  Lines        1255     1253       -2     
==========================================
- Hits         1028     1027       -1     
+ Misses        227      226       -1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Outside diff range and nitpick comments (20)
MIMICIV_TUTORIAL/tasks/mortality/post_hospital_discharge/30d.yaml (1)

10-18: Document the input window configuration.

The input window configuration could benefit from additional documentation:

  1. Consider adding a comment explaining what NULL start means
  2. The condition (None, 0) for death events should be documented for clarity

Suggested documentation:

  input:
    start: NULL  # No specific start time constraint
    end: trigger
    start_inclusive: True
    end_inclusive: True
    index_timestamp: end
    has:
-      death: (None, 0) # They must be alive at the time of hospital discharge
+      death: (None, 0) # Format: (min_count, max_count). Here: patient must have 0 death events
MIMICIV_TUTORIAL/tasks/mortality/post_hospital_discharge/1y.yaml (2)

1-6: Consider adding documentation and tightening the regex pattern.

The predicates could benefit from:

  1. Documentation comments explaining the expected format and meaning of the codes
  2. A more specific regex pattern for hospital_discharge to prevent matching unintended codes
  3. Version or format specification for the MEDS_DEATH code
 predicates:
+  # Matches hospital discharge events with format HOSPITAL_DISCHARGE//<details>
   hospital_discharge:
-    code: { regex: "HOSPITAL_DISCHARGE//.*" }
+    code: { regex: "^HOSPITAL_DISCHARGE//[A-Za-z0-9_-]+$" }

+  # Represents death events in MEDS format
   death:
     code: MEDS_DEATH
+    version: "1.0"  # Add version if applicable

19-24: Consider enhancing target window documentation.

The 365-day mortality prediction window is a common metric in medical research. However, consider:

  1. Adding a comment explaining why 1-year mortality was chosen
  2. Documenting how this aligns with standard medical research practices
   target:
     start: trigger
     end: start + 365d
     start_inclusive: False
     end_inclusive: True
     label: death
+    # One-year mortality is a standard metric in medical research,
+    # providing insights into long-term patient outcomes post-discharge
MIMICIV_TUTORIAL/tasks/readmission/30d.yaml (2)

20-25: LGTM! Consider documenting the same-day readmission exclusion.

The target window configuration correctly implements a 30-day readmission window. The exclusive start (start_inclusive: False) intentionally excludes same-day readmissions, which should be documented for clarity.

Add a comment explaining the rationale:

  target:
    start: trigger
    end: start + 30d
    start_inclusive: False  # Exclude same-day readmissions
    end_inclusive: True
    label: hospital_admission

26-32: LGTM! Consider documenting the censoring protection purpose.

The censoring protection window correctly ensures data completeness by requiring at least one event after the target window. This prevents right-censoring issues that could bias the prediction task.

Add documentation explaining the purpose:

  censoring_protection_window:
+   # This window ensures the patient's data extends beyond the target window,
+   # preventing right-censoring issues where the outcome is unknown
    start: target.end
    end: null
    start_inclusive: False
    end_inclusive: True
    has:
      _ANY_EVENT: (1, None)
MIMICIV_TUTORIAL/tasks/los/in_icu/first_48h.yaml (3)

1-5: Improve description formatting for consistency.

The description is informative but has minor formatting inconsistencies.

Apply these improvements:

 description: >-
   This file specifies the base configuration for the prediction of a icu los being greater than 3days,
   leveraging only the first 48 hours of data after admission, with a 24 hour gap between the input window
   and the target window. Patients who die or are discharged in the gap window are excluded. Note that this
-  task is in-**ICU** los, not in-**HOSPITAL** los which is a different task.
+  task is in-ICU LOS, not in-HOSPITAL LOS which is a different task.

7-15: Consider strengthening predicate definitions.

While the predicates are functionally correct, there are opportunities for improvement:

  1. The ICU regex patterns could be more specific to prevent false matches
  2. The MEDS_DEATH code constant should be documented for maintainability

Consider these improvements:

 predicates:
   icu_admission:
-    code: { regex: "ICU_ADMISSION//.*" }
+    code: { regex: "^ICU_ADMISSION//[^/]+$" }
   icu_discharge:
-    code: { regex: "ICU_DISCHARGE//.*" }
+    code: { regex: "^ICU_DISCHARGE//[^/]+$" }
   death:
     code: MEDS_DEATH
+    # MEDS_DEATH: Standard code indicating patient mortality event

34-39: Verify target window labeling.

The target window configuration uses discharge_or_death as the label, which aligns with the task description. However, consider documenting the expected label values for clarity.

Add a comment explaining the label values:

   target:
     start: trigger
     end: start + 3d
     start_inclusive: False
     end_inclusive: True
     label: discharge_or_death
+    # Label will be True if patient is discharged or dies within the target window
MIMICIV_TUTORIAL/tasks/los/in_hospital/first_48h.yaml (1)

1-5: Enhance description clarity and formatting.

While the description is informative, consider these improvements:

  1. Use proper sentence case and punctuation
  2. Add explicit mention of the 3-day threshold in the task name
  3. Clarify the exclusion criteria timing
 description: >-
-  This file specifies the base configuration for the prediction of a hospital los being greater than 3days,
-  leveraging only the first 48 hours of data after admission, with a 24 hour gap between the input window
-  and the target window. Patients who die or are discharged in the gap window are excluded. Note that this
-  task is in-**hospital** los, not in-**ICU** los which is a different task.
+  This file specifies the base configuration for predicting hospital length of stay (LOS) > 3 days,
+  leveraging only the first 48 hours of data after admission. There is a 24-hour gap between the input
+  window and the target window, during which patients who die or are discharged are excluded from the
+  analysis. Note: This task predicts in-HOSPITAL LOS, not in-ICU LOS (which is a separate task).
MIMICIV_TUTORIAL/tasks/mortality/in_hospital/first_24h.yaml (1)

1-8: Consider revising the markdown-style formatting in YAML description.

The description uses markdown-style bold formatting (e.g., **hospital**, **ICU**, **30-day**) which might not render as expected in YAML. Consider using plain text or appropriate YAML formatting conventions.

-  Note that this task is in-**hospital** mortality, not in-**ICU** mortality or **30-day** mortality
+  Note that this task is in-hospital mortality, not in-ICU mortality or 30-day mortality
MIMICIV_TUTORIAL/tasks/mortality/in_hospital/first_48h.yaml (1)

1-9: Consider enhancing the description with additional context.

While the description is comprehensive, it would be helpful to explain:

  • The rationale behind choosing a 24-hour gap window
  • The clinical implications of excluding patients who die or are discharged during the gap window
MIMICIV_TUTORIAL/tasks/mortality/in_icu/first_24h.yaml (5)

3-4: Complete the metadata fields.

The version and author fields are currently incomplete. Please provide the appropriate values.


11-21: Consider restructuring the description for better readability.

While the description is comprehensive, it could be more structured. Consider using bullet points for key aspects:

 description: >-
-  This file specifies the base configuration for the prediction of in ICU mortality, leveraging only the first
-  24 hours of data after ICU admission, with a 24 hour gap between the input window and the target window.
-  Patients who die or are discharged from the ICU in the gap window are excluded. Patients who die in the same
-  hospital stay but in a subsequent ICU stay are **not** positive labels for this task.
-  Note that this task is in-**ICU** mortality, not in-**hospital** mortality or **30-day** mortality, which
-  are different tasks.
-  Note that this task is predicting the outcome for a patient's entire ICU admission, not just the first 24
-  hours or the subsequent 24 hours after the trigger event. Imminent mortality (on various time-scales) are
-  different tasks.
+  Task Configuration:
+    - Predicts in-ICU mortality using first 24 hours of data after ICU admission
+    - Includes 24-hour gap between input window and target window
+    - Excludes patients who die/discharge during gap window
+
+  Important Distinctions:
+    - Focuses on in-ICU mortality (not in-hospital or 30-day mortality)
+    - Predicts outcome for entire ICU admission
+    - Does not include deaths in subsequent ICU stays during same hospital stay
+    - Not designed for imminent mortality prediction

46-52: Clarify the input window configuration.

The input window's start being null might be unclear. Consider adding a comment explaining that this allows for data collection from the beginning of the available records up to 24 hours after the trigger.


52-60: Consider adding documentation for gap window constraints.

The gap window configuration includes constraints for icu_admission and discharge_or_death with (None, 0). Consider adding a comment explaining that this ensures no additional ICU admissions or discharge/death events occur during the gap period.


60-65: Document the arrow operator usage.

The target window's end condition uses the -> operator. Consider adding a comment explaining that this indicates the window extends until the first occurrence of a discharge_or_death event.

MIMICIV_TUTORIAL/README.MD (4)

7-8: Consider making the directory requirement more prominent.

The note about being in the root directory is crucial for successful execution. Consider making it more prominent by:

  • Moving it to the very top
  • Using bold text or a warning block
-For this tutorial make sure you are cd'd into the root directory of the repository.
+> ⚠️ **IMPORTANT**: Before starting, ensure you are in the root directory of the repository, not in the MIMICIV_TUTORIAL directory.

59-64: Pin dependency versions for reproducibility.

Consider pinning the version of hydra-joblib-launcher to ensure consistent behavior.

 conda create -n aces python=3.12
 conda activate aces
 pip install es-aces==0.5.0
-pip install hydra-joblib-launcher
+pip install hydra-joblib-launcher==1.2.0  # Replace with appropriate version

89-92: Add validation for tabularization parameters.

Consider validating the parameters before running the tabularization script.

+# Validate parameters
+if [ -z "${N_PARALLEL_WORKERS}" ] || [ "${N_PARALLEL_WORKERS}" -lt 1 ]; then
+    echo "Error: Invalid number of workers"
+    exit 1
+fi
+
+if [ -z "${TASKS_STR}" ]; then
+    echo "Error: No tasks specified"
+    exit 1
+fi
+
 export RESHARD_DIR=${ROOT_DIR}/reshareded_meds/ # set to directory to output reshareded meds data
+mkdir -p "${RESHARD_DIR}"
+
 TASKS_STR=$(echo ${TASKS[@]} | tr ' ' ',')
 bash MIMICIV_TUTORIAL/tabularize_meds.sh "${MIMICIV_MEDS_DIR}" "$RESHARD_DIR" $OUTPUT_TABULARIZATION_DIR \
     "${TASKS_STR}" $TASKS_DIR $OUTPUT_MODEL_DIR $N_PARALLEL_WORKERS \

37-37: Remove trailing colons from headers.

For consistency with markdown style guidelines, remove the trailing colons from headers.

-### (Option 1) Download pre-extracted labels from gcp:
+### (Option 1) Download pre-extracted labels from gcp

-### (Option 2) Use ACES to extract labels using a task config definition:
+### (Option 2) Use ACES to extract labels using a task config definition

Also applies to: 55-55

🧰 Tools
🪛 Markdownlint

37-37: Punctuation: ':'
Trailing punctuation in heading

(MD026, no-trailing-punctuation)

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 739f338 and 1443717.

📒 Files selected for processing (16)
  • MIMICIV_TUTORIAL/README.MD (3 hunks)
  • MIMICIV_TUTORIAL/tasks/README.md (1 hunks)
  • MIMICIV_TUTORIAL/tasks/los/in_hospital/first_48h.yaml (1 hunks)
  • MIMICIV_TUTORIAL/tasks/los/in_icu/first_48h.yaml (1 hunks)
  • MIMICIV_TUTORIAL/tasks/mortality/README.md (1 hunks)
  • MIMICIV_TUTORIAL/tasks/mortality/in_hospital/README.md (1 hunks)
  • MIMICIV_TUTORIAL/tasks/mortality/in_hospital/first_24h.yaml (1 hunks)
  • MIMICIV_TUTORIAL/tasks/mortality/in_hospital/first_48h.yaml (1 hunks)
  • MIMICIV_TUTORIAL/tasks/mortality/in_icu/README.md (1 hunks)
  • MIMICIV_TUTORIAL/tasks/mortality/in_icu/first_24h.yaml (1 hunks)
  • MIMICIV_TUTORIAL/tasks/mortality/post_hospital_discharge/1y.yaml (1 hunks)
  • MIMICIV_TUTORIAL/tasks/mortality/post_hospital_discharge/30d.yaml (1 hunks)
  • MIMICIV_TUTORIAL/tasks/mortality/post_hospital_discharge/README.md (1 hunks)
  • MIMICIV_TUTORIAL/tasks/readmission/30d.yaml (1 hunks)
  • MIMICIV_TUTORIAL/tasks/readmission/README.md (1 hunks)
  • src/MEDS_tabular_automl/scripts/tabularize_static.py (0 hunks)
💤 Files with no reviewable changes (1)
  • src/MEDS_tabular_automl/scripts/tabularize_static.py
✅ Files skipped from review due to trivial changes (2)
  • MIMICIV_TUTORIAL/tasks/README.md
  • MIMICIV_TUTORIAL/tasks/mortality/in_icu/README.md
🧰 Additional context used
🪛 Markdownlint
MIMICIV_TUTORIAL/README.MD

37-37: Punctuation: ':'
Trailing punctuation in heading

(MD026, no-trailing-punctuation)


55-55: Punctuation: ':'
Trailing punctuation in heading

(MD026, no-trailing-punctuation)

MIMICIV_TUTORIAL/tasks/mortality/README.md

21-21: null
Bare URL used

(MD034, no-bare-urls)

🪛 LanguageTool
MIMICIV_TUTORIAL/tasks/mortality/README.md

[style] ~3-~3: The phrase “a variety of” may be wordy. To make your writing clearer, consider replacing it.
Context: ...for predicting mortality in patients in a variety of clinical contexts. Mortality is a commo...

(A_VARIETY_OF)


[uncategorized] ~14-~14: Possible missing preposition found.
Context: ...diction tasks. We break down mortality a variety of categories, all of which are...

(AI_HYDRA_LEO_MISSING_TO)

MIMICIV_TUTORIAL/tasks/mortality/in_hospital/README.md

[uncategorized] ~6-~6: Possible missing comma found.
Context: ...ed in either an inpatient or outpatient setting as well as the differing likely clinica...

(AI_HYDRA_LEO_MISSING_COMMA)

MIMICIV_TUTORIAL/tasks/mortality/post_hospital_discharge/README.md

[uncategorized] ~6-~6: Possible missing comma found.
Context: ... general state of health as of discharge but also in concert with readmission predic...

(AI_HYDRA_LEO_MISSING_COMMA)

MIMICIV_TUTORIAL/tasks/readmission/README.md

[uncategorized] ~4-~4: Possible missing comma found.
Context: ... actionable task in the U.S. healthcare system in particular due to the financial pena...

(AI_HYDRA_LEO_MISSING_COMMA)

🔇 Additional comments (26)
MIMICIV_TUTORIAL/tasks/mortality/post_hospital_discharge/30d.yaml (3)

8-8: LGTM! Trigger is appropriately set.

The trigger correctly references the hospital_discharge predicate, which is essential for tracking post-discharge events.


1-7: Verify the predicate codes against MIMIC-IV documentation.

The predicates look well-structured, but please ensure:

  1. The regex pattern "HOSPITAL_DISCHARGE//.*" matches all valid hospital discharge codes in MIMIC-IV
  2. "MEDS_DEATH" is the correct code for mortality events
#!/bin/bash
# Search for documentation or code that defines these event codes
rg -l "HOSPITAL_DISCHARGE|MEDS_DEATH" --type md

19-24: Verify the target window calculation.

The target window configuration looks correct, but please ensure:

  1. The expression start + 30d is properly interpreted by your system
  2. The exclusion of the start time (start_inclusive: False) is intentional to avoid counting deaths that occur exactly at discharge
✅ Verification successful

The previous search pattern was too strict. Let's try with a more flexible pattern to find similar configurations and also look for related mortality/discharge patterns in yaml files to verify the consistency of the target window calculation.


Target window configuration is correctly implemented and consistent with other mortality predictions

The target window configuration is verified to be correct based on the following findings:

  • The expression start + 30d follows the same pattern used in other mortality prediction configurations (e.g., start + 365d, start + 3d)
  • The start_inclusive: False setting is consistently used across similar configurations to avoid counting events at the exact start time
  • The configuration properly uses death as the label and correctly sets up the window relative to the hospital discharge trigger
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for similar time window configurations to verify consistency
rg -A 5 "start \+ \d+d" --type yaml

Length of output: 49


Script:

#!/bin/bash
# Search for similar time window configurations with a more flexible pattern
rg -A 5 "target:.*start.*\+.*d" --type yaml

# Also search for any yaml files with similar mortality/discharge patterns
fd -e yaml -x grep -A 5 -i "mortality\|discharge\|death"

Length of output: 6217

MIMICIV_TUTORIAL/tasks/mortality/post_hospital_discharge/1y.yaml (2)

8-8: LGTM!

The trigger is correctly defined and references the hospital_discharge predicate.


11-18: Verify the input window configuration.

The input window starting from NULL (beginning of data) might include excessive historical data. Consider:

  1. Setting a reasonable time limit for historical data (e.g., last 5 years before discharge)
  2. Adding documentation about the rationale for including all historical data
✅ Verification successful

Input window configuration is consistent with other mortality prediction tasks

Based on the search results, using NULL (beginning of data) as the input window start is a consistent pattern across all mortality prediction tasks in the codebase, including:

  • In-hospital mortality (24h, 48h)
  • In-ICU mortality (24h)
  • Post-discharge mortality (30d, 1y)
  • Length of stay predictions
  • Readmission predictions

This appears to be an intentional design choice to include all available historical data for these predictive tasks, which is reasonable given:

  1. The need for comprehensive patient history in mortality predictions
  2. Consistent implementation across all similar tasks
  3. The presence of proper filtering conditions (e.g., death: (None, 0)) to ensure data validity
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Check if other mortality prediction tasks in the codebase use similar window configurations
rg -A 10 "input:\s*$" $(fd -e yaml -e yml)

Length of output: 9227

MIMICIV_TUTORIAL/tasks/readmission/30d.yaml (3)

8-14: LGTM! Input window configuration is well-structured.

The window configuration correctly captures all historical events up to the discharge event, with appropriate inclusivity settings and index timestamp placement.


15-20: LGTM! Prior hospitalization window is correctly configured.

The window effectively captures the duration of the current hospital stay, which is essential for readmission prediction.


1-6: Consider using more specific regex patterns for event matching.

The current patterns (".*") will match any string after the event type. This might inadvertently match unintended events. Consider defining more specific patterns based on the actual event codes in your MIMIC-IV dataset.

MIMICIV_TUTORIAL/tasks/los/in_icu/first_48h.yaml (3)

17-17: LGTM!

The trigger is correctly defined using the icu_admission predicate.


26-34: Verify gap window constraints.

The gap window constraints are critical for task validity. The configuration correctly excludes admissions and deaths/discharges during the gap period, which is essential for preventing data leakage.

#!/bin/bash
# Verify similar gap window configurations in other LOS prediction tasks
fd -e yaml -e yml -g '*los*' . | xargs rg -A 10 'gap:.*\n.*start:.*\n.*end:.*\n.*has:'

19-26: Verify input window configuration.

The input window starting at NULL needs verification:

  1. Does this effectively capture all relevant data before the trigger?
  2. Could this lead to inconsistent data collection across different cases?
✅ Verification successful

Let me analyze the window configuration patterns in the codebase to verify if NULL start time is a valid/common pattern and understand its implications.


Input window configuration is consistent with codebase patterns

The NULL start time in the input window is a standard pattern used across multiple task definitions in the codebase, including mortality, readmission, and length of stay predictions. This configuration effectively captures all relevant data up to the trigger point or trigger + timeframe, as evidenced by similar configurations in:

  • Mortality prediction tasks (in-hospital, post-discharge)
  • Readmission prediction tasks
  • Length of stay prediction tasks

The configuration is valid and follows the established pattern in the codebase for collecting historical data before the trigger point.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for similar window configurations in other task files
fd -e yaml -e yml . | xargs rg -A 3 'input:.*\n.*start:\s*NULL' 

Length of output: 280


Script:

#!/bin/bash
# Search for similar window configurations in other task files, handling multiline properly
fd -e yaml -e yml . | xargs rg -U 'input:\s*\n\s*start:\s*NULL' -A 3

# Also search for other window configurations to understand the pattern
fd -e yaml -e yml . | xargs rg -U 'windows:.*\n.*input:.*\n.*start:' -A 5

Length of output: 7583

MIMICIV_TUTORIAL/tasks/los/in_hospital/first_48h.yaml (5)

17-17: LGTM!

The trigger is correctly set to hospital_admission, which aligns with the task's objective.


19-26: Verify input window configuration.

The input window's start time is set to NULL. Please confirm:

  1. Is this intentional?
  2. Does this mean "from the beginning of available data"?
  3. Should there be a minimum bound to ensure data quality?

Consider adding a minimum bound to ensure consistent data availability across patients.


31-33: LGTM: Gap window constraints are well-defined.

The gap window constraints effectively ensure:

  1. No new hospital admissions during the gap
  2. No discharges or deaths during the gap

This is crucial for maintaining the integrity of the prediction task.


34-39: Verify target window configuration.

The target window configuration looks correct but please verify:

  1. Is 3 days an appropriate threshold for this specific hospital's LOS patterns?
  2. Should the end be exclusive instead of inclusive for consistency?

Consider making the window bounds consistent (either all inclusive or all exclusive) across the configuration to prevent off-by-one errors.


7-15: Verify predicate definitions and add documentation.

Please verify:

  1. Are the regex patterns HOSPITAL_ADMISSION//.* and HOSPITAL_DISCHARGE//.* specific enough? Consider adding examples in comments.
  2. What is the exact format of MEDS_DEATH code? Consider documenting this.
✅ Verification successful

Let me search for any documentation or definition of these codes to provide a complete verification.


Predicate patterns are consistently used across tasks

The regex patterns and codes are used consistently across multiple task definitions in the codebase:

  • HOSPITAL_ADMISSION//.* and HOSPITAL_DISCHARGE//.* patterns are used in readmission, mortality, and length of stay (LOS) tasks
  • MEDS_DEATH is consistently used as a non-regex code for death events
  • The logical combination in discharge_or_death follows the same pattern in all tasks

The patterns appear to be intentionally generic to capture all hospital admission and discharge events, while MEDS_DEATH is a specific code. No documentation suggests these patterns need to be more specific.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for usage of these codes in the codebase to verify patterns
echo "Searching for hospital admission/discharge patterns..."
rg -A 2 "HOSPITAL_(ADMISSION|DISCHARGE)//"

echo "Searching for MEDS_DEATH usage..."
rg -A 2 "MEDS_DEATH"

Length of output: 4630


Script:

#!/bin/bash
# Search for documentation or definition of these codes
echo "Searching for documentation of hospital admission/discharge codes..."
rg -i "hospital.*(admission|discharge).*code" -A 3 -B 3

echo "Searching for documentation of MEDS_DEATH..."
rg -i "meds_death" -A 3 -B 3 "*.md" "*.txt" "*.rst"

# Look for any config or definition files
echo "Searching for code definitions..."
fd -e json -e yaml -e yml | xargs rg "(HOSPITAL_ADMISSION|HOSPITAL_DISCHARGE|MEDS_DEATH)"

Length of output: 6856

MIMICIV_TUTORIAL/tasks/mortality/in_hospital/first_24h.yaml (4)

20-20: LGTM! Trigger is appropriately defined.

The trigger event hospital_admission aligns well with the task's purpose of predicting mortality from admission time.


22-38: Verify gap window constraints are sufficient.

The gap window constraints appropriately exclude patients who have admission, discharge, or death events during the gap period. However, consider if additional validation is needed:

  1. Should we also validate continuous hospital stay during the input window?
  2. Are there other events that might invalidate the prediction window?

10-18: Verify predicate definitions and patterns.

The predicates look well-structured, but please verify:

  1. The regex patterns for hospital events (HOSPITAL_ADMISSION//.* and HOSPITAL_DISCHARGE//.*) match your expected event codes
  2. The MEDS_DEATH code is correctly defined in your system
✅ Verification successful

Let me check the actual usage of these codes to ensure they are consistently defined.


Predicate definitions are consistent across the codebase

The verification shows that:

  1. The MEDS_DEATH code is consistently used across all mortality and length-of-stay tasks
  2. The hospital event patterns (HOSPITAL_ADMISSION//.* and HOSPITAL_DISCHARGE//.*) are consistently defined across all relevant tasks, with one minor variation in in_icu/first_24h.yaml that uses ^ anchor which is functionally equivalent
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify the usage of MEDS_DEATH code and hospital event patterns

# Search for MEDS_DEATH definition
echo "Searching for MEDS_DEATH definition:"
rg -l "MEDS_DEATH"

# Search for hospital admission/discharge patterns
echo -e "\nSearching for hospital event patterns:"
rg -l "HOSPITAL_(ADMISSION|DISCHARGE)//"

Length of output: 1050


Script:

#!/bin/bash
# Check the context of MEDS_DEATH and hospital patterns in other files

echo "Checking MEDS_DEATH usage context:"
rg "MEDS_DEATH" -A 2 -B 2

echo -e "\nChecking hospital event patterns usage:"
rg "HOSPITAL_(ADMISSION|DISCHARGE)//" -A 2 -B 2

Length of output: 6727


39-43: Review target window end condition syntax.

The target window end condition uses an arrow operator (->) which might need verification:

    end: start -> discharge_or_death

Please confirm this is the correct syntax for your configuration parser. Some systems might expect a different operator or format.

✅ Verification successful

Based on the examination of the files, I can now generate the final response:

The arrow operator (->) syntax in the window configuration is correct and verified.

The syntax is consistently used across multiple task configuration files in the MIMIC-IV tutorial, specifically in:

  • MIMICIV_TUTORIAL/tasks/mortality/in_hospital/first_24h.yaml
  • MIMICIV_TUTORIAL/tasks/mortality/in_hospital/first_48h.yaml
  • MIMICIV_TUTORIAL/tasks/mortality/in_icu/first_24h.yaml

All these files use the same window end condition syntax:

end: start -> discharge_or_death

This is a standardized pattern used to define the target window end condition in the ACES task configuration system, where the arrow operator indicates a temporal relationship between the start time and a specific event (in this case, discharge_or_death).

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Search for similar window configurations

# Look for other YAML configurations using the arrow operator
echo "Searching for similar window end conditions:"
rg -l "end:\s+\w+\s*->" --type yaml

Length of output: 311


Script:

#!/bin/bash
# Let's examine the content of these files to verify consistent usage
echo "Content of first_48h.yaml:"
cat MIMICIV_TUTORIAL/tasks/mortality/in_hospital/first_48h.yaml

echo -e "\nContent of first_24h.yaml:"
cat MIMICIV_TUTORIAL/tasks/mortality/in_hospital/first_24h.yaml

echo -e "\nContent of icu/first_24h.yaml:"
cat MIMICIV_TUTORIAL/tasks/mortality/in_icu/first_24h.yaml

# Also search for any documentation or schema files that might validate this syntax
echo -e "\nSearching for documentation or schema files:"
fd -e md -e yaml -e json schema config docs

Length of output: 5362


Script:

#!/bin/bash
# Let's search for any README files or documentation in the root or MIMICIV_TUTORIAL directory
echo "Searching for documentation in root and MIMICIV_TUTORIAL:"
find . -maxdepth 2 -type f -name "README*" -o -name "*.md"

# Let's also look for any example or template files that might validate this syntax
echo -e "\nSearching for example or template files:"
fd -e yaml example template

Length of output: 586


Script:

#!/bin/bash
# Let's check the documentation files for any mention of the window configuration syntax
echo "Checking documentation files for window configuration details:"
echo -e "\nChecking docs/prediction.md:"
cat docs/prediction.md

echo -e "\nChecking docs/usage_guide.md:"
cat docs/usage_guide.md

echo -e "\nChecking MIMICIV_TUTORIAL/README.MD:"
cat MIMICIV_TUTORIAL/README.MD

Length of output: 42084

MIMICIV_TUTORIAL/tasks/mortality/in_hospital/first_48h.yaml (3)

20-20: LGTM!

The trigger configuration is appropriate for initiating the mortality prediction window at hospital admission.


10-19: Verify predicate code patterns and consider adding validation.

Several potential improvements for the predicates section:

  1. The regex patterns HOSPITAL_ADMISSION//.* and HOSPITAL_DISCHARGE//.* match any string after the double slash. Consider being more specific if there's an expected format.
  2. The MEDS_DEATH code uses a different format than the others. Consider using consistent patterns across all predicates.
  3. Consider adding code format validation to ensure data consistency.

Let's verify the actual code formats in use:


22-43: Review and clarify window configurations.

Several points need attention:

  1. The input window's start: NULL needs clarification. What data is included before the trigger event?
  2. The gap window's constraints might miss edge cases. Consider:
    • What happens if a patient has multiple admission events?
    • Should we validate the chronological order of events?
  3. The target window's end condition (start -> discharge_or_death) should be validated to ensure it always resolves to a valid timestamp.

Let's verify the window configuration patterns in other tasks:

✅ Verification successful

Let me gather more information about the window configurations, particularly focusing on the input window patterns and gap window validations.

This will help us understand:

  1. The complete context of mortality task configurations
  2. How input windows are configured across different tasks
  3. The validation patterns used in gap windows

Window configurations are consistent with task requirements

After analyzing the configurations across mortality tasks, I can address the initial concerns:

  1. input.start: NULL is a consistent pattern across all mortality tasks, intentionally allowing all data before the trigger event to be included. This is valid as we want to consider the patient's complete history.

  2. The gap window constraints are properly configured:

    • The (None, 0) pattern is consistently used across tasks to exclude patients with events during the gap period
    • Multiple admission events are correctly handled by excluding any admission during the gap
    • The chronological order is enforced by the window boundaries (input.end -> gap -> target)
  3. The discharge_or_death end condition is well-defined in predicates:

    discharge_or_death:
      expr: or(hospital_discharge, death)

    This ensures a valid timestamp will always be available for window termination.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for similar window configurations in other task definitions
echo "Searching for input window patterns:"
ast-grep --pattern 'input:
  $$$
  start: NULL
  $$$'

echo "Searching for gap window patterns:"
ast-grep --pattern 'gap:
  $$$
  has:
    $$$'

echo "Searching for target window patterns with discharge_or_death:"
rg -A 5 "discharge_or_death" --type yaml

Length of output: 4938


Script:

#!/bin/bash
# Search for all window configurations in mortality tasks
echo "Searching for all window configurations in mortality tasks:"
fd "first.*yaml" MIMICIV_TUTORIAL/tasks/mortality/ --exec cat {}

echo -e "\nSearching for input window configurations across all tasks:"
fd "first.*yaml" MIMICIV_TUTORIAL/tasks/ --exec rg -A 5 "input:" {}

echo -e "\nSearching for gap window has conditions:"
fd "first.*yaml" MIMICIV_TUTORIAL/tasks/ --exec rg -A 5 "has:" {}

Length of output: 6602

MIMICIV_TUTORIAL/tasks/mortality/in_icu/first_24h.yaml (2)

22-42: LGTM! Well-structured predicates.

The predicates are well-defined with consistent regex patterns and logical combinations.


43-43: LGTM! Appropriate trigger event.

The trigger event is correctly set to ICU admission, which aligns with the task's objective.

MIMICIV_TUTORIAL/README.MD (1)

11-12: LGTM!

Clear presentation of available options helps users make an informed choice.

Comment on lines +18 to +20
export ROOT_DIR=??? # set to the directory in which you want to store all data
export MIMICIV_MEDS_DIR=${ROOT_DIR}/meds/ # set to the directory in which you want to store the raw MIMIC-IV data
export OUTPUT_TABULARIZATION_DIR=${ROOT_DIR}/meds_tab/ # set to the output directory for the tabularized data
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

⚠️ Potential issue

Add guidance for ROOT_DIR and improve error handling.

The ROOT_DIR variable is undefined (shown as ???). Additionally, the script could benefit from basic error handling.

-export ROOT_DIR=??? # set to the directory in which you want to store all data
+# Set the root directory for all data (e.g., /path/to/your/data)
+export ROOT_DIR="${HOME}/meds_data"
+
+# Create necessary directories
+mkdir -p "${ROOT_DIR}"
+mkdir -p "${MIMICIV_MEDS_DIR}"
+mkdir -p "${OUTPUT_TABULARIZATION_DIR}"
+
+# Validate directories exist
+if [[ ! -d "${ROOT_DIR}" || ! -d "${MIMICIV_MEDS_DIR}" || ! -d "${OUTPUT_TABULARIZATION_DIR}" ]]; then
+    echo "Error: Failed to create required directories"
+    exit 1
+fi
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
export ROOT_DIR=??? # set to the directory in which you want to store all data
export MIMICIV_MEDS_DIR=${ROOT_DIR}/meds/ # set to the directory in which you want to store the raw MIMIC-IV data
export OUTPUT_TABULARIZATION_DIR=${ROOT_DIR}/meds_tab/ # set to the output directory for the tabularized data
# Set the root directory for all data (e.g., /path/to/your/data)
export ROOT_DIR="${HOME}/meds_data"
# Create necessary directories
mkdir -p "${ROOT_DIR}"
mkdir -p "${MIMICIV_MEDS_DIR}"
mkdir -p "${OUTPUT_TABULARIZATION_DIR}"
# Validate directories exist
if [[ ! -d "${ROOT_DIR}" || ! -d "${MIMICIV_MEDS_DIR}" || ! -d "${OUTPUT_TABULARIZATION_DIR}" ]]; then
echo "Error: Failed to create required directories"
exit 1
fi
export MIMICIV_MEDS_DIR=${ROOT_DIR}/meds/ # set to the directory in which you want to store the raw MIMIC-IV data
export OUTPUT_TABULARIZATION_DIR=${ROOT_DIR}/meds_tab/ # set to the output directory for the tabularized data

Comment on lines +77 to +82
for TASK_NAME in "${TASKS[@]}"; do
SINGLE_TASK_DIR="${MIMICIV_MEDS_DIR}/tasks/${TASK_NAME}"
mkdir -p $SINGLE_TASK_DIR # create a directory for the task
cp MIMICIV_TUTORIAL/tasks/${TASK_NAME}.yaml "${SINGLE_TASK_DIR}.yaml"
aces-cli --multirun hydra/launcher=joblib data=sharded data.standard=meds data.root="$MIMICIV_MEDS_DIR/data" "data.shard=$(expand_shards $MIMICIV_MEDS_DIR/data)" cohort_dir="$TASKS_DIR" cohort_name="$TASK_NAME"
done
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Define expand_shards function and add error handling.

The script uses expand_shards function which isn't defined. Also, the task extraction could benefit from error handling.

+# Helper function to expand shards
+expand_shards() {
+    local dir="$1"
+    ls "${dir}"/shard-*.parquet | sed 's/.*shard-\([0-9]*\).parquet/\1/' | tr '\n' ','
+}
+
 for TASK_NAME in "${TASKS[@]}"; do
     SINGLE_TASK_DIR="${MIMICIV_MEDS_DIR}/tasks/${TASK_NAME}"
     mkdir -p $SINGLE_TASK_DIR # create a directory for the task
+    
+    if [ ! -f "MIMICIV_TUTORIAL/tasks/${TASK_NAME}.yaml" ]; then
+        echo "Error: Task config ${TASK_NAME}.yaml not found"
+        continue
+    }
+    
     cp MIMICIV_TUTORIAL/tasks/${TASK_NAME}.yaml "${SINGLE_TASK_DIR}.yaml"
     aces-cli --multirun hydra/launcher=joblib data=sharded data.standard=meds data.root="$MIMICIV_MEDS_DIR/data" "data.shard=$(expand_shards $MIMICIV_MEDS_DIR/data)" cohort_dir="$TASKS_DIR" cohort_name="$TASK_NAME"
 done
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
for TASK_NAME in "${TASKS[@]}"; do
SINGLE_TASK_DIR="${MIMICIV_MEDS_DIR}/tasks/${TASK_NAME}"
mkdir -p $SINGLE_TASK_DIR # create a directory for the task
cp MIMICIV_TUTORIAL/tasks/${TASK_NAME}.yaml "${SINGLE_TASK_DIR}.yaml"
aces-cli --multirun hydra/launcher=joblib data=sharded data.standard=meds data.root="$MIMICIV_MEDS_DIR/data" "data.shard=$(expand_shards $MIMICIV_MEDS_DIR/data)" cohort_dir="$TASKS_DIR" cohort_name="$TASK_NAME"
done
# Helper function to expand shards
expand_shards() {
local dir="$1"
ls "${dir}"/shard-*.parquet | sed 's/.*shard-\([0-9]*\).parquet/\1/' | tr '\n' ','
}
for TASK_NAME in "${TASKS[@]}"; do
SINGLE_TASK_DIR="${MIMICIV_MEDS_DIR}/tasks/${TASK_NAME}"
mkdir -p $SINGLE_TASK_DIR # create a directory for the task
if [ ! -f "MIMICIV_TUTORIAL/tasks/${TASK_NAME}.yaml" ]; then
echo "Error: Task config ${TASK_NAME}.yaml not found"
continue
}
cp MIMICIV_TUTORIAL/tasks/${TASK_NAME}.yaml "${SINGLE_TASK_DIR}.yaml"
aces-cli --multirun hydra/launcher=joblib data=sharded data.standard=meds data.root="$MIMICIV_MEDS_DIR/data" "data.shard=$(expand_shards $MIMICIV_MEDS_DIR/data)" cohort_dir="$TASKS_DIR" cohort_name="$TASK_NAME"
done

@Oufattole Oufattole merged commit 97d5f5a into main Nov 10, 2024
9 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant