Overview
Enable users to load and analyze sparse representations of patient data across different temporal aggregation windows in a notebook environment. This will facilitate statistical analysis of tabularized features.
Requirements
Feature Discovery Function
```python
from pathlib import Path


def get_available_windows_and_aggs(tabularized_data_dir: Path) -> tuple[list[str], list[str]]:
    """Scan the filesystem to discover available window sizes and aggregation types.

    Args:
        tabularized_data_dir (Path): Directory containing tabularized data files.

    Returns:
        tuple[list[str], list[str]]: Available window sizes and aggregation types.
    """
    ...
```
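A minimal sketch of the discovery step follows. It assumes a hypothetical on-disk layout in which each tabularized block is saved as `<window>/<agg>.npz` at any depth under `tabularized_data_dir`, so the parent directory name is taken as the window and the file stem as the aggregation. If the real layout differs, only the two lines that extract those names need to change.

```python
from pathlib import Path


def get_available_windows_and_aggs(tabularized_data_dir: Path) -> tuple[list[str], list[str]]:
    """Discover window sizes and aggregation types from file paths.

    ASSUMPTION (hypothetical layout): every tabularized file is stored as
    <...>/<window>/<agg>.npz somewhere under tabularized_data_dir.
    """
    tabularized_data_dir = Path(tabularized_data_dir)
    if not tabularized_data_dir.is_dir():
        raise FileNotFoundError(f"Not a directory: {tabularized_data_dir}")

    windows: set[str] = set()
    aggs: set[str] = set()
    for fp in tabularized_data_dir.rglob("*.npz"):
        windows.add(fp.parent.name)  # directory holding the file = window size
        aggs.add(fp.stem)            # file name without ".npz" = aggregation type
    return sorted(windows), sorted(aggs)
```

Note that `sorted` orders window names lexicographically, so `"24h"` sorts before `"4h"`; parse them into durations first if chronological order matters in the notebook.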
Data Loading Function
```python
from pathlib import Path
from typing import Union

import pandas as pd
import scipy as sp
import scipy.sparse  # makes sp.sparse available on older SciPy versions


def load_data(
    tabularized_data_dir: Path,
    windows: list[str],
    aggs: list[str],
    metadata_fp: Path,
) -> Union[pd.DataFrame, tuple[sp.sparse.csr_matrix, list[str], pd.DataFrame]]:
    """Load a sparse patient representation for the specified windows and aggregations.

    Args:
        tabularized_data_dir (Path): Directory containing tabularized data files.
        windows (list[str]): List of window sizes to include.
        aggs (list[str]): List of aggregation types to include.
        metadata_fp (Path): Path to the patient metadata file.

    Returns:
        Either:
        - pd.DataFrame: Sparse DataFrame with features as columns and
          (patient_id, time) as the index.
        - tuple[sp.sparse.csr_matrix, list[str], pd.DataFrame]:
            - Sparse matrix containing the feature values.
            - List of feature names (code + window + agg).
            - DataFrame with patient_id and time information.
    """
    ...
```
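The sketch below illustrates the tuple (Option B) return path. Everything about storage here is an assumption, since the issue does not specify it: each block is taken to be written with `scipy.sparse.save_npz` to `<tabularized_data_dir>/<window>/<agg>.npz`; every block has one column per entry of a list of codes, passed as an extra, hypothetical `codes` argument because the issue does not say where code names live; and `metadata_fp` is assumed to be a CSV with `patient_id` and `time` columns in the same row order as the matrices. The function is named `load_data_sketch` to keep it distinct from the specified signature.

```python
from pathlib import Path

import pandas as pd
import scipy.sparse


def load_data_sketch(
    tabularized_data_dir: Path,
    windows: list[str],
    aggs: list[str],
    metadata_fp: Path,
    codes: list[str],  # HYPOTHETICAL extra argument: column order of every block
) -> tuple[scipy.sparse.csr_matrix, list[str], pd.DataFrame]:
    """Return (matrix, feature_names, metadata) for the requested windows/aggs.

    ASSUMPTIONS: blocks live at <dir>/<window>/<agg>.npz (saved with
    scipy.sparse.save_npz), each block has len(codes) columns, and metadata_fp
    is a CSV with patient_id/time columns aligned with the matrix rows.
    """
    if not windows or not aggs:
        raise ValueError("At least one window and one aggregation are required")

    metadata = pd.read_csv(metadata_fp, parse_dates=["time"])
    blocks, names = [], []
    for window in windows:
        for agg in aggs:
            block = scipy.sparse.load_npz(Path(tabularized_data_dir) / window / f"{agg}.npz")
            # Rows must line up with the metadata and columns with the codes.
            if block.shape != (len(metadata), len(codes)):
                raise ValueError(f"{window}/{agg}: unexpected shape {block.shape}")
            blocks.append(block)
            names.extend(f"{code}_{window}_{agg}" for code in codes)

    # Concatenate blocks side by side without ever densifying them.
    matrix = scipy.sparse.hstack(blocks, format="csr")
    return matrix, names, metadata[["patient_id", "time"]]
```

Stacking with `format="csr"` keeps the result sparse end to end, and the per-block shape check catches files whose rows or columns do not line up with the metadata and codes before they are silently concatenated.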
Data Structure
- Rows: one per (patient_id, time) pair, aligned with the patient and time information
- Columns: feature names in the format `{code}_{window}_{agg}`
- Values: sparse representation of the feature values
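Because feature names follow `{code}_{window}_{agg}`, downstream analysis can recover the parts of a name and select columns by window or aggregation. The helpers below (`parse_feature_name` and `columns_for` are hypothetical names) split from the right so that underscores inside a code survive. This only works if window and aggregation names themselves contain no underscores, an assumption the naming scheme would need to guarantee.

```python
from typing import Optional


def parse_feature_name(name: str) -> tuple[str, str, str]:
    """Split '{code}_{window}_{agg}' into (code, window, agg).

    Splits from the right, so underscores inside the code are kept.
    ASSUMPTION: window and agg contain no underscores.
    """
    parts = name.rsplit("_", 2)
    if len(parts) != 3:
        raise ValueError(f"Not a {{code}}_{{window}}_{{agg}} name: {name!r}")
    code, window, agg = parts
    return code, window, agg


def columns_for(
    names: list[str], window: Optional[str] = None, agg: Optional[str] = None
) -> list[int]:
    """Return positions of the columns matching the given window and/or aggregation."""
    selected = []
    for i, name in enumerate(names):
        _, w, a = parse_feature_name(name)
        if (window is None or w == window) and (agg is None or a == agg):
            selected.append(i)
    return selected
```

The returned positions can index a CSR matrix directly, for example `X[:, columns_for(names, window="24h")]`, which keeps the selection sparse.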
Output Options
- Option A: Sparse pandas DataFrame
  - Index: MultiIndex of (patient_id, time)
  - Columns: feature names
- Option B: Tuple of three elements
  - Sparse matrix (CSR format) containing the data
  - List of column names (features)
  - DataFrame with patient_id and time information
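Option A can be derived from Option B. In this sketch (the helper name `to_sparse_dataframe` is hypothetical), the `(patient_id, time)` MultiIndex is built from the metadata and the CSR matrix is wrapped with pandas' `DataFrame.sparse.from_spmatrix`, which creates sparse-dtype columns without densifying the data.

```python
import pandas as pd
import scipy.sparse


def to_sparse_dataframe(
    matrix: scipy.sparse.spmatrix, feature_names: list[str], metadata: pd.DataFrame
) -> pd.DataFrame:
    """Convert the Option B tuple into the Option A sparse DataFrame."""
    if matrix.shape != (len(metadata), len(feature_names)):
        raise ValueError(
            f"Shape {matrix.shape} does not match "
            f"{len(metadata)} metadata rows x {len(feature_names)} feature names"
        )
    # Rows are identified by (patient_id, time), in metadata order.
    index = pd.MultiIndex.from_frame(metadata[["patient_id", "time"]])
    return pd.DataFrame.sparse.from_spmatrix(matrix, index=index, columns=feature_names)
```

`df.sparse.to_coo()` converts back to a SciPy matrix. Many column-wise computations can be slower on sparse pandas columns than on the SciPy matrix itself, which is one reason to keep Option B available.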
Implementation Notes
- Use an efficient sparse matrix format to handle large, sparse feature sets.
- Implement robust error handling for missing or corrupt files.
- Consider adding validation for window sizes and aggregation types.
- Include progress indicators (e.g., `tqdm.auto.tqdm`) for long-running operations, since these functions will be used in a notebook setting.
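The validation, error-handling and progress notes could take roughly the following shape. This sketch assumes a hypothetical `<window>/<agg>.npz` file layout; `validate_selection` and `load_blocks` are illustrative names, and the `tqdm` import falls back to a no-op wrapper when `tqdm` is not installed.

```python
from pathlib import Path

import scipy.sparse

try:
    from tqdm.auto import tqdm  # notebook-friendly progress bar
except ImportError:  # no-op fallback so the code still runs without tqdm
    def tqdm(iterable, **kwargs):
        return iterable


def validate_selection(requested: list[str], available: list[str], kind: str) -> None:
    """Raise a ValueError listing any requested values that were not discovered."""
    missing = sorted(set(requested) - set(available))
    if missing:
        raise ValueError(f"Unknown {kind}: {missing}; available: {sorted(available)}")


def load_blocks(tabularized_data_dir: Path, windows: list[str], aggs: list[str]) -> dict:
    """Load <dir>/<window>/<agg>.npz files (ASSUMED layout) with clear errors."""
    blocks = {}
    pairs = [(w, a) for w in windows for a in aggs]
    for window, agg in tqdm(pairs, desc="Loading features"):
        fp = Path(tabularized_data_dir) / window / f"{agg}.npz"
        if not fp.exists():
            raise FileNotFoundError(f"Missing tabularized file: {fp}")
        try:
            blocks[(window, agg)] = scipy.sparse.load_npz(fp)
        except Exception as err:  # corrupt or non-.npz content
            raise RuntimeError(f"Could not read {fp}: {err}") from err
    return blocks
```

Validating the requested windows and aggregations against the discovered ones before loading means a typo fails immediately with the list of valid options, rather than partway through a long loading loop.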
Examples
```python
# Get available options
windows, aggs = get_available_windows_and_aggs(data_dir)
print(f"Available windows: {windows}")
print(f"Available aggregations: {aggs}")

# Load data (DataFrame option)
df = load_data(
    data_dir,
    windows=["1h", "4h", "24h"],
    aggs=["mean", "max", "count"],
    metadata_fp=metada
)
```
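For the statistical analysis the issue has in mind, simple per-feature statistics can be computed directly on the Option B matrix without densifying it. The sketch below (`feature_summary` is a hypothetical helper) reports, for each feature, the number of stored entries, the fraction of rows that have one, and the mean over those stored entries. Whether absent entries should count as zeros or as missing depends on the feature; this sketch treats them as missing when computing the mean.

```python
import numpy as np
import pandas as pd
import scipy.sparse


def feature_summary(matrix: scipy.sparse.spmatrix, feature_names: list[str]) -> pd.DataFrame:
    """Per-feature counts, density and mean of stored values, computed sparsely."""
    csc = scipy.sparse.csc_matrix(matrix)  # column-oriented for per-feature stats
    n_stored = np.diff(csc.indptr)          # stored entries per column
    sums = np.asarray(csc.sum(axis=0)).ravel()
    with np.errstate(invalid="ignore", divide="ignore"):
        mean_stored = sums / n_stored       # NaN where a column has no stored values
    return pd.DataFrame(
        {
            "n_stored": n_stored,
            "density": n_stored / csc.shape[0],
            "mean_stored": mean_stored,
        },
        index=feature_names,
    )
```

Converting to CSC once makes every column's entries contiguous, so the counts come straight from `indptr` instead of a scan over all rows.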