Commit
switched from bash to console code blocks, as mdformat errored on these in the GitHub Actions tests
Oufattole committed Oct 24, 2024
1 parent 29c073f commit 02c44af
Showing 3 changed files with 13 additions and 13 deletions.
docs/README.MD: 6 changes (3 additions, 3 deletions)
@@ -6,7 +6,7 @@ To run tests, use the following command:

Run all the fast tests (the ones that don't use the GPU) with:

-```bash
+```console
pytest -k "not slow"
```

@@ -24,12 +24,12 @@ This section explains how to edit documentation files in the `docs` directory.

First, install the docs dependencies:

-```bash
+```console
pip install -e .[docs]
```

Run

-```bash
+```console
mkdocs serve
```
docs/prediction.md: 4 changes (2 additions, 2 deletions)
@@ -12,7 +12,7 @@ We optimize predictive accuracy and model performance by using varied window siz

A single XGBoost run was completed to profile time and memory usage. This was done for each `$TASK` using the following command:
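The time-and-memory profiling mentioned above can be sketched with a small harness. This is an illustrative stand-in, not the project's actual measurement code; note that `ru_maxrss` is reported in KiB on Linux but bytes on macOS.

```python
# Hypothetical profiling harness (not part of meds-tab): wraps a command
# and reports wall time plus the peak RSS of its child processes (Unix only).
import resource
import subprocess
import sys
import time

def profile(cmd):
    """Run cmd, returning (elapsed_seconds, peak_child_rss)."""
    start = time.perf_counter()
    subprocess.run(cmd, check=True)
    elapsed = time.perf_counter() - start
    peak = resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss
    return elapsed, peak

# Trivial child process for illustration; a real run would pass the
# meds-tab-model command line instead.
elapsed, peak = profile([sys.executable, "-c", "pass"])
```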

-```bash
+```console
meds-tab-model \
model_launcher=xgboost \
"input_dir=${MEDS_RESHARD_DIR}/data" "output_dir=$OUTPUT_TABULARIZATION_DIR" \
@@ -80,7 +80,7 @@ To better understand the runtimes, we also report the task specific cohort size.
The XGBoost sweep was run using the following command for each `$TASK`:

-```bash
+```console
meds-tab-model \
--multirun \
model_launcher=xgboost \
docs/usage_guide.md: 16 changes (8 additions, 8 deletions)
@@ -6,7 +6,7 @@ We provide a set of core CLI scripts to facilitate the tabularization and modeli

This optional command reshards the data. A core challenge in tabularization is high memory usage and slow compute time. We split the data into small shards to reduce memory usage, since each shard can be tabularized independently, and to reduce CPU time, since the shards can be processed in parallel across workers.
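This is not the actual `MEDS_transform-reshard_to_split` logic, but a minimal pure-Python sketch of why resharding enables independent processing: if all rows for a subject land in one shard, a worker can tabularize its shard without seeing any other shard's data.

```python
# Illustrative only: group rows by subject so each shard is self-contained
# and can be processed by a separate worker.
from collections import defaultdict

def shard_rows(rows, n_shards):
    """Assign every row for a given subject_id to the same shard."""
    shards = defaultdict(list)
    for row in rows:
        shards[hash(row["subject_id"]) % n_shards].append(row)
    return dict(shards)

rows = [
    {"subject_id": 1, "code": "HR"},
    {"subject_id": 2, "code": "BP"},
    {"subject_id": 1, "code": "BP"},
]
shards = shard_rows(rows, n_shards=2)
# All rows for subject 1 are in one shard, so workers never need to
# coordinate across shards.
```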

-```bash
+```console
MEDS_transform-reshard_to_split \
--multirun \
worker="range(0,6)" \
@@ -94,7 +94,7 @@ This command processes MEDS data shards to compute the frequencies of different

This script also caches feature names and frequencies in a `code_metadata.parquet` file within `OUTPUT_DIR`, which is specified as a hydra-style command-line argument.

-```bash
+```console
meds-tab-describe \
"input_dir=${MEDS_RESHARD_DIR}/data" "output_dir=$OUTPUT_DIR"
```
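The frequency computation this stage performs can be sketched in pure Python. This is an illustration only: the real tool operates on Parquet shards, and the event dicts here are simplified stand-ins.

```python
# Hypothetical sketch of per-shard code-frequency counting, whose totals
# meds-tab-describe caches in code_metadata.parquet.
from collections import Counter

def describe_shards(shards):
    """Sum per-shard code counts into dataset-wide frequencies."""
    total = Counter()
    for shard in shards:
        total.update(event["code"] for event in shard)
    return total

shards = [
    [{"code": "HR"}, {"code": "HR"}, {"code": "BP"}],
    [{"code": "HR"}],
]
freqs = describe_shards(shards)
# freqs["HR"] == 3, freqs["BP"] == 1
```

Per-shard counters sum cleanly, which is what lets each shard be described independently before the results are merged.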
@@ -159,7 +159,7 @@ OUTPUT_DIR/

Filters and processes the dataset based on the count of codes, generating a tabular vector for each patient at each timestamp in the shards. Each row corresponds to a unique `subject_id` and `timestamp` combination. As a result, rows are duplicated across multiple timestamps for the same patient.
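The row duplication described above can be sketched as follows; the field names are assumptions for illustration, not the tool's actual schema.

```python
# Illustrative sketch: static features are repeated for every
# (subject_id, timestamp) row, which is why rows duplicate across
# timestamps for the same patient.
def expand_static(static_feats, timestamps):
    return [
        {"subject_id": sid, "timestamp": ts, **static_feats[sid]}
        for sid, ts_list in timestamps.items()
        for ts in ts_list
    ]

rows = expand_static(
    static_feats={1: {"sex": "F"}},
    timestamps={1: [10, 20]},
)
# Two rows for subject 1, one per timestamp, each carrying sex="F".
```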

-```bash
+```console
meds-tab-tabularize-static \
"input_dir=${MEDS_RESHARD_DIR}/data" \
"output_dir=$OUTPUT_DIR" \
@@ -255,7 +255,7 @@ OUTPUT_DIR/

This stage handles the computationally intensive task of converting temporal medical data into feature vectors. The process employs several key optimizations: sparse matrix operations utilizing scipy.sparse for memory-efficient storage, data sharding that enables parallel processing, and efficient aggregation using Polars for fast rolling window computations.
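The rolling-window idea at the heart of this stage can be sketched in pure Python. The actual implementation uses Polars rolling aggregations over scipy.sparse storage; this only shows the core computation for a count aggregation over sorted timestamps.

```python
# Illustrative sketch: count events falling in the half-open window
# (query_time - window, query_time] for one code's sorted timestamps.
import bisect

def rolling_count(event_times, query_time, window):
    """Count events in (query_time - window, query_time]."""
    lo = bisect.bisect_right(event_times, query_time - window)
    hi = bisect.bisect_right(event_times, query_time)
    return hi - lo

times = [1, 3, 4, 10]                     # sorted timestamps for one code
print(rolling_count(times, 5, window=3))  # events at 3 and 4 -> 2
```

Because the timestamps are sorted, each query is two binary searches, and most (code, window) cells are zero, which is what makes sparse storage pay off.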

-```bash
+```console
meds-tab-tabularize-time-series \
--multirun \
worker="range(0,$N_PARALLEL_WORKERS)" \
@@ -343,7 +343,7 @@ OUTPUT_DIR/tabularize/

Aligns task-specific labels with the nearest prior event in the tabularized data. It requires a labeled dataset directory with three columns (`subject_id`, `timestamp`, `label`) structured similarly to the `input_dir`.
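The alignment step is essentially an as-of join; a minimal pure-Python sketch of matching each label to the nearest prior (or equal) event timestamp, assuming sorted event times:

```python
# Illustrative sketch of nearest-prior-event alignment (not the tool's
# actual implementation, which operates on the cached shards).
import bisect

def align_label(event_times, label_time):
    """Return index of the last event at or before label_time, or None."""
    i = bisect.bisect_right(event_times, label_time) - 1
    return i if i >= 0 else None

events = [2, 5, 9]
print(align_label(events, 7))  # -> 1 (the event at t=5)
```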

-```bash
+```console
meds-tab-cache-task \
--multirun \
hydra/launcher=joblib \
@@ -447,7 +447,7 @@ Trains a tabular model using user-specified parameters. The system incorporates

### Single Model Training

-```bash
+```console
meds-tab-model \
model_launcher=xgboost \
"input_dir=${MEDS_RESHARD_DIR}/data" \
@@ -461,7 +461,7 @@ meds-tab-model \

### Hyperparameter Optimization

-```bash
+```console
meds-tab-model \
--multirun \
model_launcher=xgboost \
@@ -564,7 +564,7 @@ OUTPUT_MODEL_DIR/
??? example "Experimental Feature"
We also support an autogluon based hyperparameter and model search:

-```bash
+```console
meds-tab-autogluon model_launcher=autogluon \
"input_dir=${MEDS_RESHARD_DIR}/data" \
"output_dir=$OUTPUT_DIR" \