Merge branch 'main' into clean
mmcdermott committed Jun 12, 2024
2 parents d7127e8 + 1009f70 commit 109d153
Showing 25 changed files with 401 additions and 22 deletions.
20 changes: 17 additions & 3 deletions README.md
@@ -3,14 +3,16 @@
This repository provides utilities and scripts to run limited automatic tabular ML pipelines for generic MEDS
datasets.


# Installation

**Pip Install**

```bash
pip install meds-tab
```

**Local Install**

```
# clone the git repo
pip install .
@@ -39,7 +41,8 @@ For an end to end example over MIMIC-IV, see the [companion repository](https://
### Core CLI Scripts Overview

1. **`meds-tab-describe`**: This command processes MEDS data shards to compute the frequencies of different code-types

- time-series codes (codes with timestamps)
- time-series numerical values (codes with timestamps and numerical values)
- static codes (codes without timestamps)
- static numerical codes (codes without timestamps but with numerical values).
@@ -54,6 +57,7 @@ For an end to end example over MIMIC-IV, see the [companion repository](https://

6. **`meds-tab-xgboost-sweep`**: Conducts an Optuna hyperparameter sweep to optimize over `window_sizes`, `aggregations`, and `min_code_inclusion_frequency`, aiming to enhance model performance and adaptability.
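
As a rough illustration of the four code types that `meds-tab-describe` counts, the sketch below classifies events by whether they carry a timestamp and a numerical value, then tallies frequencies. It is not the MEDS-Tab implementation: the event dictionaries, the field names `timestamp` and `numerical_value`, and the helpers `classify` and `code_frequencies` are all assumptions made for the example.

```python
from collections import Counter


def classify(event):
    """Return the code-type bucket for one event (hypothetical schema)."""
    has_time = event.get("timestamp") is not None
    has_value = event.get("numerical_value") is not None
    if has_time:
        return "time-series numerical value" if has_value else "time-series code"
    return "static numerical code" if has_value else "static code"


def code_frequencies(events):
    """Count how often each (code, code type) pair occurs."""
    counts = Counter()
    for event in events:
        counts[(event["code"], classify(event))] += 1
    return counts


# Toy events; the codes and values are made up for illustration.
events = [
    {"code": "HR", "timestamp": "2024-01-01T00:00", "numerical_value": 72.0},
    {"code": "HR", "timestamp": "2024-01-01T01:00", "numerical_value": 80.0},
    {"code": "ADMISSION", "timestamp": "2024-01-01T00:00", "numerical_value": None},
    {"code": "SEX//F", "timestamp": None, "numerical_value": None},
    {"code": "HEIGHT", "timestamp": None, "numerical_value": 170.0},
]
freqs = code_frequencies(events)
```

Frequencies of this kind are what a threshold such as `min_code_inclusion_frequency` would plausibly be compared against when deciding which codes become features.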

# How does MEDS-Tab Work?

#### What do you mean "tabular pipelines"? Isn't _all_ structured EHR data already tabular?

@@ -66,4 +70,14 @@ satisfy the (1) "single row single instance", (2) "consistent set of columns", a
Thus, in this pipeline, when we say we will produce a "tabular" view of MEDS data, we mean a dataset that can
realize these constraints, which will explicitly involve summarizing the patient data over various historical
or future windows in time to produce a single row per patient with a consistent, logical set of columns
(though there may still be missingness).
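
To make the "tabular" constraints concrete, here is a minimal sketch that summarizes timestamped events over one historical window into a single row per patient with a fixed set of columns. It is only an illustration under stated assumptions: the tuple layout `(patient_id, code, time, value)`, the function `tabularize`, and the `count`/`sum` aggregations are hypothetical and are not taken from the MEDS-Tab code.

```python
from datetime import datetime, timedelta


def tabularize(events, index_time, window, codes):
    """Build one row per patient over [index_time - window, index_time].

    Every row has the same columns; codes that were never observed in the
    window stay as None, which is the missingness mentioned above.
    """
    start = index_time - window
    columns = [f"{code}/{agg}" for code in codes for agg in ("count", "sum")]
    rows = {}
    for patient_id, code, time, value in events:
        row = rows.setdefault(patient_id, dict.fromkeys(columns))
        if code not in codes or not (start <= time <= index_time):
            continue  # outside the feature set or outside the window
        row[f"{code}/count"] = (row[f"{code}/count"] or 0) + 1
        if value is not None:
            row[f"{code}/sum"] = (row[f"{code}/sum"] or 0.0) + value
    return rows


# Toy data; patient IDs, codes, and values are made up.
t0 = datetime(2024, 1, 10)
events = [
    (1, "HR", datetime(2024, 1, 9), 70.0),
    (1, "HR", datetime(2024, 1, 10), 90.0),
    (1, "HR", datetime(2023, 12, 1), 60.0),  # older than the 7-day window
    (2, "LAB", datetime(2024, 1, 8), 1.5),
]
table = tabularize(events, t0, timedelta(days=7), ["HR", "LAB"])
```

Both patients end up with the same four columns even though each has only one of the two codes, so the result satisfies "single row single instance" and "consistent set of columns" while leaving unobserved features empty.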

## Implementation Improvements

# Computational Performance vs. Existing Pipelines

# XGBoost Performance

## XGBoost Model Performance on MIMIC-IV

## XGBoost Model Performance on Philips eICU
7 changes: 7 additions & 0 deletions docs/source/computational-performance.rst
@@ -0,0 +1,7 @@
Computational Performance vs. Existing Pipelines
================================================

.. include:: ../../README.md
   :parser: markdown
   :start-after: Computational Performance vs. Existing Pipelines
   :end-before: XGBoost Model Performance on MIMIC-IV
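
The `:start-after:` and `:end-before:` options select a slice of the included file by plain text match. The sketch below mimics that behavior as the Docutils documentation describes it (content after the first occurrence of the start text, up to the first following occurrence of the end text). The helper `include_slice` and the toy README string are assumptions for illustration; this is not the Docutils code.

```python
def include_slice(text, start_after=None, end_before=None):
    """Return the part of `text` between two marker strings."""
    if start_after is not None:
        index = text.find(start_after)
        if index < 0:
            raise ValueError(f"start-after text not found: {start_after!r}")
        text = text[index + len(start_after):]
    if end_before is not None:
        index = text.find(end_before)
        if index < 0:
            raise ValueError(f"end-before text not found: {end_before!r}")
        text = text[:index]
    return text


# Toy stand-in for the README; only the headings come from the diff above.
readme = (
    "# Computational Performance vs. Existing Pipelines\n\n"
    "Fast.\n\n"
    "# XGBoost Performance\n\n"
    "## XGBoost Model Performance on MIMIC-IV\n\n"
    "Good.\n"
)
section = include_slice(
    readme,
    start_after="Computational Performance vs. Existing Pipelines",
    end_before="XGBoost Model Performance on MIMIC-IV",
)
```

Because the markers are matched as text rather than as headings, the slice in this toy example also carries the intermediate `# XGBoost Performance` heading and the `## ` prefix of the end heading, which is worth checking in the rendered page.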
13 changes: 7 additions & 6 deletions docs/source/conf.py
@@ -9,16 +9,17 @@
# -- Project information -----------------------------------------------------
# https://www.sphinx-doc.org/en/master/usage/configuration.html#project-information

-project = "MEDS-TAB"
+project = "MEDS-Tab"
copyright = "2024, Matthew McDermott, Nassim Oufattole, Teya Bergamaschi"
author = "Matthew McDermott, Nassim Oufattole, Teya Bergamaschi"
-release = "0.1.0"
-version = "0.1.0"
+release = "0.0.1"
+version = "0.0.1"

# -- General configuration ---------------------------------------------------
# https://www.sphinx-doc.org/en/master/usage/configuration.html#general-configuration

sys.path.insert(0, os.path.abspath("../.."))

extensions = [
    "sphinx.ext.duration",
    "sphinx.ext.doctest",
@@ -60,8 +61,8 @@
html_static_path = ["_static"]


-html_title = f"MEDS-TAB v{version} Documentation"
-html_short_title = "MEDS-TAB Documentation"
+html_title = f"MEDS-Tab v{version} Documentation"
+html_short_title = "MEDS-Tab Documentation"

# html_logo = "query-512.png"
# html_favicon = "query-16.ico"
@@ -70,7 +71,7 @@

html_theme_options = {
    "dark_mode_code_blocks": False,
-    # "nav_title": "MEDS-TAB",
+    # "nav_title": "MEDS-Tab",
    # "palette": {"primary": "green", "accent": "green"},
    # "repo_url": "https://github.com/mmcdermott/MEDS_Tabular_AutoML",
    # "repo_name": "MEDS_Tabular_AutoML",
30 changes: 30 additions & 0 deletions docs/source/generated/src.MEDS_tabular_automl.configs.rst
@@ -0,0 +1,30 @@
src.MEDS\_tabular\_automl.configs
=================================

.. automodule:: src.MEDS_tabular_automl.configs



















.. rubric:: Modules

.. autosummary::
   :toctree:
   :recursive:

   src.MEDS_tabular_automl.configs.tabularization
@@ -0,0 +1,4 @@
src.MEDS\_tabular\_automl.configs.tabularization
================================================

.. automodule:: src.MEDS_tabular_automl.configs.tabularization
23 changes: 23 additions & 0 deletions docs/source/generated/src.MEDS_tabular_automl.describe_codes.rst
@@ -0,0 +1,23 @@
src.MEDS\_tabular\_automl.describe\_codes
=========================================

.. automodule:: src.MEDS_tabular_automl.describe_codes







   .. rubric:: Functions

   .. autosummary::

      clear_code_aggregation_suffix
      compute_feature_frequencies
      convert_to_df
      convert_to_freq_dict
      filter_parquet
      filter_to_codes
      get_feature_columns
      get_feature_freqs
18 changes: 18 additions & 0 deletions docs/source/generated/src.MEDS_tabular_automl.file_name.rst
@@ -0,0 +1,18 @@
src.MEDS\_tabular\_automl.file\_name
====================================

.. automodule:: src.MEDS_tabular_automl.file_name







   .. rubric:: Functions

   .. autosummary::

      get_model_files
      get_task_specific_path
      list_subdir_files
@@ -0,0 +1,19 @@
src.MEDS\_tabular\_automl.generate\_static\_features
====================================================

.. automodule:: src.MEDS_tabular_automl.generate_static_features







   .. rubric:: Functions

   .. autosummary::

      convert_to_matrix
      get_flat_static_rep
      get_sparse_static_rep
      summarize_static_measurements
@@ -0,0 +1,20 @@
src.MEDS\_tabular\_automl.generate\_summarized\_reps
====================================================

.. automodule:: src.MEDS_tabular_automl.generate_summarized_reps







   .. rubric:: Functions

   .. autosummary::

      aggregate_matrix
      compute_agg
      generate_summary
      get_rolling_window_indicies
      sparse_aggregate
@@ -0,0 +1,20 @@
src.MEDS\_tabular\_automl.generate\_ts\_features
================================================

.. automodule:: src.MEDS_tabular_automl.generate_ts_features







   .. rubric:: Functions

   .. autosummary::

      feature_name_to_code
      get_flat_ts_rep
      get_long_code_df
      get_long_value_df
      summarize_dynamic_measurements
18 changes: 18 additions & 0 deletions docs/source/generated/src.MEDS_tabular_automl.mapper.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,18 @@
src.MEDS\_tabular\_automl.mapper
================================

.. automodule:: src.MEDS_tabular_automl.mapper







   .. rubric:: Functions

   .. autosummary::

      get_earliest_lock
      register_lock
      wrap
34 changes: 34 additions & 0 deletions docs/source/generated/src.MEDS_tabular_automl.rst
@@ -2,3 +2,37 @@ src.MEDS\_tabular\_automl
=========================

.. automodule:: src.MEDS_tabular_automl



















.. rubric:: Modules

.. autosummary::
   :toctree:
   :recursive:

   src.MEDS_tabular_automl.configs
   src.MEDS_tabular_automl.describe_codes
   src.MEDS_tabular_automl.file_name
   src.MEDS_tabular_automl.generate_static_features
   src.MEDS_tabular_automl.generate_summarized_reps
   src.MEDS_tabular_automl.generate_ts_features
   src.MEDS_tabular_automl.mapper
   src.MEDS_tabular_automl.scripts
   src.MEDS_tabular_automl.utils
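
The module list above is what a recursive walk over the package would produce. As a hedged illustration of that kind of discovery (not Sphinx's implementation), the sketch below uses the standard library's `pkgutil.walk_packages` to list every submodule of a package. The helper `list_submodules` is hypothetical, and the standard-library package `email` stands in for `src.MEDS_tabular_automl` so that the example runs anywhere.

```python
import importlib
import pkgutil


def list_submodules(package_name):
    """Return the dotted names of all modules found under a package."""
    package = importlib.import_module(package_name)
    names = []
    # walk_packages descends into subpackages, importing each one to read its path.
    for info in pkgutil.walk_packages(package.__path__, prefix=package.__name__ + "."):
        names.append(info.name)
    return sorted(names)


# `email` is a stand-in package chosen only because it ships with Python.
modules = list_submodules("email")
```

Nested entries such as `email.mime.text` appear alongside top-level ones such as `email.message`, which mirrors how the `scripts` subpackage gets its own page with a further module list.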
@@ -0,0 +1,17 @@
src.MEDS\_tabular\_automl.scripts.cache\_task
=============================================

.. automodule:: src.MEDS_tabular_automl.scripts.cache_task







   .. rubric:: Functions

   .. autosummary::

      generate_row_cached_matrix
      main
@@ -0,0 +1,16 @@
src.MEDS\_tabular\_automl.scripts.describe\_codes
=================================================

.. automodule:: src.MEDS_tabular_automl.scripts.describe_codes







   .. rubric:: Functions

   .. autosummary::

      main
@@ -0,0 +1,27 @@
src.MEDS\_tabular\_automl.scripts.launch\_xgboost
=================================================

.. automodule:: src.MEDS_tabular_automl.scripts.launch_xgboost







   .. rubric:: Functions

   .. autosummary::

      main





   .. rubric:: Classes

   .. autosummary::

      Iterator
      XGBoostModel
34 changes: 34 additions & 0 deletions docs/source/generated/src.MEDS_tabular_automl.scripts.rst
@@ -0,0 +1,34 @@
src.MEDS\_tabular\_automl.scripts
=================================

.. automodule:: src.MEDS_tabular_automl.scripts



















.. rubric:: Modules

.. autosummary::
   :toctree:
   :recursive:

   src.MEDS_tabular_automl.scripts.cache_task
   src.MEDS_tabular_automl.scripts.describe_codes
   src.MEDS_tabular_automl.scripts.launch_xgboost
   src.MEDS_tabular_automl.scripts.tabularize_static
   src.MEDS_tabular_automl.scripts.tabularize_time_series