-
Notifications
You must be signed in to change notification settings - Fork 4
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Added docstrings for scripts/generate_permutations.py #23
Added docstrings for scripts/generate_permutations.py #23
Conversation
Important Review skippedAuto reviews are disabled on base/target branches other than the default branch. Please check the settings in the CodeRabbit UI or the You can disable this status message by setting the WalkthroughThe updates across various files in the Changes
Sequence Diagram(s)sequenceDiagram
participant User
participant Script as describe_codes.py
participant Function as compute_feature_frequencies
participant DataFrame as DataFrame
User->>Script: Execute write_fn
Script->>Function: Call compute_feature_frequencies
Function->>DataFrame: Process DataFrame
DataFrame-->>Function: Return frequencies
Function-->>Script: Return frequency DataFrame
Script-->>User: Output results
Poem
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (invoked as PR comments)
Additionally, you can add CodeRabbit Configration File (
|
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## improve_docstrings_and_tests #23 +/- ##
=============================================================
Coverage 86.85% 86.85%
=============================================================
Files 13 13
Lines 898 898
=============================================================
Hits 780 780
Misses 118 118 ☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
Outside diff range and nitpick comments (2)
tests/test_integration.py (1)
Line range hint
5-33
: Ensure module-level imports are at the top of the file to comply with PEP 8 standards.- import json - import subprocess - import tempfile - from io import StringIO - from pathlib import Path - import polars as pl - from hydra import compose, initialize - from test_tabularize import ( - CODE_COLS, - EXPECTED_STATIC_FILES, - MEDS_OUTPUTS, - SPLITS_JSON, - STATIC_FIRST_COLS, - STATIC_PRESENT_COLS, - SUMMARIZE_EXPECTED_FILES, - VALUE_COLS, - ) - from MEDS_tabular_automl.describe_codes import get_feature_columns - from MEDS_tabular_automl.file_name import list_subdir_files - from MEDS_tabular_automl.utils import ( - VALUE_AGGREGATIONS, - get_events_df, - get_feature_names, - get_shard_prefix, - get_unique_time_events_df, - load_matrix, - ) + # All imports moved to the top of the filetests/test_tabularize.py (1)
Line range hint
5-30
: The imports are not at the top of the file, which is a common Python convention to improve readability and prevent potential issues with circular dependencies.+ import rootutils + import json + import subprocess + import tempfile + from io import StringIO + from pathlib import Path + import polars as pl + from hydra import compose, initialize + from MEDS_tabular_automl.describe_codes import get_feature_columns + from MEDS_tabular_automl.file_name import list_subdir_files + from MEDS_tabular_automl.scripts import ( + cache_task, + describe_codes, + launch_xgboost, + tabularize_static, + tabularize_time_series, + ) + from MEDS_tabular_automl.utils import ( + VALUE_AGGREGATIONS, + get_events_df, + get_feature_names, + get_shard_prefix, + get_unique_time_events_df, + load_matrix, + )
Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Files selected for processing (7)
- src/MEDS_tabular_automl/describe_codes.py (9 hunks)
- src/MEDS_tabular_automl/generate_static_features.py (6 hunks)
- src/MEDS_tabular_automl/generate_summarized_reps.py (6 hunks)
- src/MEDS_tabular_automl/scripts/describe_codes.py (2 hunks)
- src/MEDS_tabular_automl/scripts/generate_permutations.py (2 hunks)
- tests/test_integration.py (1 hunks)
- tests/test_tabularize.py (1 hunks)
Files skipped from review due to trivial changes (1)
- src/MEDS_tabular_automl/scripts/generate_permutations.py
Additional context used
Ruff
src/MEDS_tabular_automl/generate_summarized_reps.py
6-6: Module level import not at top of file (E402)
7-7: Module level import not at top of file (E402)
9-9: Module level import not at top of file (E402)
10-10: Module level import not at top of file (E402)
11-16: Module level import not at top of file (E402)
tests/test_integration.py
5-5: Module level import not at top of file (E402)
6-6: Module level import not at top of file (E402)
7-7: Module level import not at top of file (E402)
8-8: Module level import not at top of file (E402)
9-9: Module level import not at top of file (E402)
11-11: Module level import not at top of file (E402)
12-12: Module level import not at top of file (E402)
13-22: Module level import not at top of file (E402)
24-24: Module level import not at top of file (E402)
25-25: Module level import not at top of file (E402)
26-33: Module level import not at top of file (E402)
tests/test_tabularize.py
5-5: Module level import not at top of file (E402)
6-6: Module level import not at top of file (E402)
7-7: Module level import not at top of file (E402)
8-8: Module level import not at top of file (E402)
9-9: Module level import not at top of file (E402)
11-11: Module level import not at top of file (E402)
12-12: Module level import not at top of file (E402)
14-14: Module level import not at top of file (E402)
15-15: Module level import not at top of file (E402)
16-22: Module level import not at top of file (E402)
23-30: Module level import not at top of file (E402)
Additional comments not posted (13)
src/MEDS_tabular_automl/scripts/describe_codes.py (1)
76-76
: Ensure correct function usage after refactoring.src/MEDS_tabular_automl/generate_static_features.py (3)
28-38
: The functionconvert_to_matrix
is well-documented and the type annotations are clear.
170-184
: The functionget_flat_static_rep
is well-documented and the type annotations are clear.
54-66
: The functionget_sparse_static_rep
effectively merges static and time-series data. Ensure the assertions for data integrity are robust.Verification successful
The function
get_sparse_static_rep
includes assertions to ensure data integrity:
assert static_df.select(pl.col("patient_id")).collect().to_series().is_sorted()
assert matrix.shape[1] == len(...)
These assertions help verify that the
patient_id
column instatic_df
is sorted and that the shape of the matrix matches the expected length.
Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Description: Verify the data integrity checks in `get_sparse_static_rep`. # Test: Search for usage of `assert` in the function. Expect: Robust checks. rg --type python $'assert' 'src/MEDS_tabular_automl/generate_static_features.py'Length of output: 108
Script:
#!/bin/bash # Description: Verify the data integrity checks in `get_sparse_static_rep`. # Test: Search for usage of `assert` in the function. Expect: Robust checks. rg 'assert' 'src/MEDS_tabular_automl/generate_static_features.py'Length of output: 195
src/MEDS_tabular_automl/generate_summarized_reps.py (5)
168-191
: The functiongenerate_summary
is well-documented. Ensure the aggregation types are correctly validated.Verification successful
The code does include checks for unsupported aggregation types, raising a
ValueError
when an invalid aggregation method is encountered.
ValueError: If the aggregation method is not implemented.
raise ValueError(f"Aggregation method '{agg}' not implemented.")
ValueError: If the aggregation type is not supported.
raise ValueError(
These lines confirm that the aggregation types are correctly validated within the
generate_summary
function.Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Description: Verify aggregation type validation in `generate_summary`. # Test: Search for aggregation type checks. Expect: Correct validation. rg --type python $'ValueError' 'src/MEDS_tabular_automl/generate_summarized_reps.py'Length of output: 112
Script:
#!/bin/bash # Description: Verify aggregation type validation in `generate_summary`. # Test: Search for aggregation type checks. Expect: Correct validation. rg 'ValueError' 'src/MEDS_tabular_automl/generate_summarized_reps.py'Length of output: 294
70-87
: The functionaggregate_matrix
is well-documented. Ensure the type compatibility for aggregated matrices.Verification successful
The function
aggregate_matrix
includes a type compatibility check by raising aTypeError
if the matrix type is invalid. This ensures that the function handles type compatibility as expected.
src/MEDS_tabular_automl/generate_summarized_reps.py
:raise TypeError(f"Invalid matrix type {type(agg_matrix)}")
Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Description: Verify type compatibility for aggregated matrices in `aggregate_matrix`. # Test: Search for type checks in the function. Expect: Proper type handling. rg --type python $'TypeError' 'src/MEDS_tabular_automl/generate_summarized_reps.py'Length of output: 111
Script:
#!/bin/bash # Description: Verify type compatibility for aggregated matrices in `aggregate_matrix`. # Test: Search for type checks in the function. Expect: Proper type handling. rg 'TypeError' --glob 'src/MEDS_tabular_automl/generate_summarized_reps.py'Length of output: 346
126-149
: The functioncompute_agg
is well-documented and handles aggregation using rolling window indices. Ensure the DataFrame operations are optimized.Verification successful
The
compute_agg
function does not appear to use DataFrame operations directly. Instead, it operates on sparse matrices and numpy arrays, which are generally efficient for the types of operations being performed (e.g.,sum
,min
,max
,power
,getnnz
, andconcatenate
). Therefore, the concern about optimizing DataFrame operations does not apply here.
- The function uses efficient operations on sparse matrices and numpy arrays.
- No DataFrame operations like
groupby
,apply
,merge
,join
, orconcat
were found within the function.Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Description: Verify DataFrame operations are optimized in `compute_agg`. # Test: Search for DataFrame operations. Expect: Optimized operations. rg --type python $'group_by' 'src/MEDS_tabular_automl/generate_summarized_reps.py'Length of output: 110
Script:
#!/bin/bash # Description: Verify DataFrame operations are optimized in `compute_agg`. # Test: Search for DataFrame operations. Expect: Optimized operations. rg 'groupby|apply|merge|join|concat' src/MEDS_tabular_automl/generate_summarized_reps.pyLength of output: 715
47-56
: The functionget_rolling_window_indicies
effectively computes rolling window indices. Ensure the handling of the "full" window size is appropriate.Verification successful
The handling of the "full" window size in the
get_rolling_window_indicies
function appears to be appropriate. Whenwindow_size
is "full", it sets a large time delta of 150 years, which should cover the entire dataset.
if window_size == "full":
timedelta = pd.Timedelta(150 * 52, unit="W") # just use 150 years as time delta
else:
timedelta = pd.Timedelta(window_size)
This logic ensures that the "full" window size is handled correctly by setting an extensive time delta.
Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Description: Verify the handling of the "full" window size in `get_rolling_window_indicies`. # Test: Search for usage of "full" in the function. Expect: Appropriate handling. rg --type python $'"full"' 'src/MEDS_tabular_automl/generate_summarized_reps.py'Length of output: 110
Script:
#!/bin/bash # Description: Verify the handling of the "full" window size in `get_rolling_window_indicies`. # Test: Search for usage of "full" within the function `get_rolling_window_indicies`. Expect: Appropriate handling. rg '"full"' --glob 'src/MEDS_tabular_automl/generate_summarized_reps.py'Length of output: 234
Script:
#!/bin/bash # Description: Extract and review the relevant lines of code around the occurrences of "full" in `generate_summarized_reps.py`. # Extract lines around the first occurrence of "full" rg -A 5 -B 5 '"full"' src/MEDS_tabular_automl/generate_summarized_reps.pyLength of output: 789
19-31
: The functionsparse_aggregate
is well-documented and handles various aggregation methods. Ensure all intended aggregation methods are implemented.src/MEDS_tabular_automl/describe_codes.py (4)
4-4
: The import ofOmegaConf
has been updated correctly to reflect the changes in the dependency.
38-66
: The functionconvert_to_freq_dict
is well-documented and handles edge cases by raising aValueError
if the expected columns are not present. This is a good practice for robust error handling.
208-208
: Registeringfilter_to_codes
as a new resolver inOmegaConf
is a significant change. Ensure that this new resolver is used appropriately throughout the project.Verification successful
The new resolver
filter_to_codes
is used appropriately in the following locations:
src/MEDS_tabular_automl/describe_codes.py
src/MEDS_tabular_automl/scripts/tabularize_static.py
src/MEDS_tabular_automl/configs/tabularization/default.yaml
This confirms that the resolver is integrated and utilized correctly within the project.
Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Description: Verify that the new resolver `filter_to_codes` is used appropriately. # Test: Search for resolver usage. Expect: Appropriate usage in configuration files or other Python scripts. rg --type python $'filter_to_codes'Length of output: 65
Script:
#!/bin/bash # Description: Verify that the new resolver `filter_to_codes` is used appropriately. # Test: Search for resolver usage. Expect: Appropriate usage in configuration files or other Python scripts. rg 'filter_to_codes'Length of output: 775
69-102
: The functioncompute_feature_frequencies
is well-documented and includes a comprehensive example. Ensure that the new functionality aligns with the rest of the module and that all dependent functions are updated accordingly.Verification successful
The occurrences of
compute_feature_frequencies
in the codebase are as follows:
src/MEDS_tabular_automl/describe_codes.py
: The function is defined and used within an example in the docstring.src/MEDS_tabular_automl/scripts/describe_codes.py
: The function is imported and used within therwlock_wrap
function call.docs/source/generated/src.MEDS_tabular_automl.describe_codes.rst
: The function is listed in the documentation.Based on these findings, the function
compute_feature_frequencies
appears to be correctly implemented, imported, and documented. There are no indications of outdated or incorrect usage.
Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Description: Verify that all dependent functions are updated to work with the new `compute_feature_frequencies` functionality. # Test: Search for function usage. Expect: Only occurrences with updated functionality. rg --type python $'compute_feature_frequencies'Length of output: 77
Script:
#!/bin/bash # Description: Verify that all dependent functions are updated to work with the new `compute_feature_frequencies` functionality. # Test: Search for function usage. Expect: Only occurrences with updated functionality. rg 'compute_feature_frequencies'Length of output: 538
Script:
#!/bin/bash # Description: Extract relevant lines for detailed inspection to verify the correct usage and updates of `compute_feature_frequencies`. # Extract lines from `src/MEDS_tabular_automl/describe_codes.py` echo "Occurrences in src/MEDS_tabular_automl/describe_codes.py:" rg --context 5 'compute_feature_frequencies' src/MEDS_tabular_automl/describe_codes.py # Extract lines from `src/MEDS_tabular_automl/scripts/describe_codes.py` echo "Occurrences in src/MEDS_tabular_automl/scripts/describe_codes.py:" rg --context 5 'compute_feature_frequencies' src/MEDS_tabular_automl/scripts/describe_codes.py # Extract lines from `docs/source/generated/src.MEDS_tabular_automl.describe_codes.rst` echo "Occurrences in docs/source/generated/src.MEDS_tabular_automl.describe_codes.rst:" rg --context 5 'compute_feature_frequencies' docs/source/generated/src.MEDS_tabular_automl.describe_codes.rstLength of output: 2582
""" | ||
permutations = [] | ||
for i in range(1, len(list_of_options) + 1): | ||
permutations.extend(list(combinations(list_of_options, r=i))) | ||
format_print(sorted(permutations)) | ||
|
||
|
||
def main(): | ||
"""Generate all possible permutations of a list of options.""" | ||
def main() -> None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See prior comment re -> None
Summary by CodeRabbit
New Features
Refactor
Tests
Documentation