-
Notifications
You must be signed in to change notification settings - Fork 16
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Some small changes to update things for more recent versions of packages. #115
Conversation
…removed references to np.NaN as that is removed in np 2.0
WalkthroughThe updates across several files aim to replace usage of Changes
Sequence Diagram(s)Sequence diagrams are not applicable for these changes as the updates mainly involve data representation and handling refinements, without significant alterations to control flow or feature interactions. Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (invoked as PR comments)
Additionally, you can add CodeRabbit Configration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
Outside diff range and nitpick comments (4)
EventStream/data/pytorch_dataset.py (1)
Line range hint
209-209
: Potential override of loop variablen_events
The static analysis has flagged a potential issue where the loop control variable
n_events
may override the iterable it iterates over. This could lead to unexpected behavior or bugs.- for i, (subj, n_events) in enumerate(zip(subject_ids, n_events)): + for i, (subj, num_events) in enumerate(zip(subject_ids, n_events)):EventStream/data/dataset_polars.py (3)
Line range hint
203-203
: Refactor type checking to useisinstance()
.The static analysis tool suggests using
isinstance()
for type checking instead of direct type comparison. This is a more Pythonic way of checking types and can handle cases where a variable might be an instance of a subclass.- if type(qq) is Path: + if isinstance(qq, Path):
Line range hint
1648-1666
: Ensure loop variables are correctly bound in lambda functions.Several warnings have been generated about lambda functions not binding loop variables correctly within list comprehensions. This can lead to unexpected behavior where the lambda captures the same (last) value for each iteration instead of the intended loop-specific value.
- .name.map(lambda c: f"dynamic/{m}/{c.replace(key_prefix, '')}/count"), + .name.map(lambda c, m=m, key_prefix=key_prefix: f"dynamic/{m}/{c.replace(key_prefix, '')}/count"),
Line range hint
2013-2013
: Optimize dictionary key existence check.The static analysis tool suggests using a more efficient method for checking key existence in a dictionary. Instead of using
key in dict.keys()
, which unnecessarily creates a list of keys, usekey in dict
.- if feature not in cfg.vocabulary.idxmap.keys(): + if feature not in cfg.vocabulary.idxmap:
Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Files ignored due to path filters (1)
pyproject.toml
is excluded by!**/*.toml
Files selected for processing (4)
- EventStream/data/README.md (2 hunks)
- EventStream/data/dataset_polars.py (6 hunks)
- EventStream/data/pytorch_dataset.py (2 hunks)
- tests/data/test_pytorch_dataset.py (2 hunks)
Additional context used
Ruff
EventStream/data/pytorch_dataset.py
209-209: Loop control variable
n_events
overrides iterable it iterates (B020)EventStream/data/dataset_polars.py
203-203: Do not compare types, use
isinstance()
(E721)
1648-1648: Function definition does not bind loop variable
m
(B023)
1648-1648: Function definition does not bind loop variable
key_prefix
(B023)
1652-1652: Function definition does not bind loop variable
m
(B023)
1652-1652: Function definition does not bind loop variable
val_prefix
(B023)
1654-1654: Function definition does not bind loop variable
m
(B023)
1654-1654: Function definition does not bind loop variable
val_prefix
(B023)
1657-1657: Function definition does not bind loop variable
m
(B023)
1657-1657: Function definition does not bind loop variable
val_prefix
(B023)
1658-1658: Function definition does not bind loop variable
m
(B023)
1658-1658: Function definition does not bind loop variable
val_prefix
(B023)
1659-1659: Function definition does not bind loop variable
m
(B023)
1659-1659: Function definition does not bind loop variable
val_prefix
(B023)
1666-1666: Function definition does not bind loop variable
m
(B023)
2013-2013: Use
key in dict
instead ofkey in dict.keys()
(SIM118)Remove
.keys()
LanguageTool
EventStream/data/README.md
[style] ~7-~7: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym. (ENGLISH_WORD_REPEAT_BEGINNING_RULE)
Context: ...ep-learning applications and models. 3. For general modeling and data analysis. Th...
[style] ~55-~55: Although a hyphen is possible, it is not necessary in a compound modifier in which the first word is an adverb that ends in ‘ly’. (HYPHENATED_LY_ADVERB_ADJECTIVE)
Context: ...ta are presented to models in a sparse, fully-temporally realized format which should be more ef...
[style] ~58-~58: ‘with respect to’ might be wordy. Consider a shorter alternative. (EN_WORDINESS_PREMIUM_WITH_RESPECT_TO)
Context: ...yle of data embedding overall (at least with respect to the per-timestamp data embedding practi...
[uncategorized] ~63-~63: Loose punctuation mark. (UNLIKELY_OPENING_PUNCTUATION)
Context: ...nly the following data: 1.subject_id
, which is the identifier for the subject...
[uncategorized] ~64-~64: Loose punctuation mark. (UNLIKELY_OPENING_PUNCTUATION)
Context: ...ntifier for the subject. 2.start_time
, which is the start timestamp for the su...
[uncategorized] ~64-~64: It seems likely that a singular genitive (’s) apostrophe is missing. (AI_HYDRA_LEO_APOSTROPHE_S_XS)
Context: ..., which is the start timestamp for the subjects overall record. 3.
static_indices`, wh...
[uncategorized] ~65-~65: Loose punctuation mark. (UNLIKELY_OPENING_PUNCTUATION)
Context: ...ects overall record. 3.static_indices
, which contains a list of integral indic...
[uncategorized] ~68-~68: Loose punctuation mark. (UNLIKELY_OPENING_PUNCTUATION)
Context: ...sistent. 4.static_measurement_indices
, which contains a list of the integral i...
[uncategorized] ~71-~71: Loose punctuation mark. (UNLIKELY_OPENING_PUNCTUATION)
Context: ... length withstatic_indices
. 5.time
, which contains a list of the time in mi...
[uncategorized] ~72-~72: Loose punctuation mark. (UNLIKELY_OPENING_PUNCTUATION)
Context: ... event in question. 6.dynamic_indices
, which contains a list of lists, where t...
[uncategorized] ~75-~75: Loose punctuation mark. (UNLIKELY_OPENING_PUNCTUATION)
Context: ... event. 7.dynamic_measurement_indices
, which is of the same (ragged) shape as ...
[uncategorized] ~77-~77: Loose punctuation mark. (UNLIKELY_OPENING_PUNCTUATION)
Context: ...ndynamic_indices
. 8.dynamic_values
, which is of the same (ragged) shape as ...
[uncategorized] ~93-~93: Loose punctuation mark. (UNLIKELY_OPENING_PUNCTUATION)
Context: ...g to how they vary in time: 1.STATIC
: in which case they are unchanging and c...
[uncategorized] ~94-~94: Loose punctuation mark. (UNLIKELY_OPENING_PUNCTUATION)
Context: ... subject. 2.FUNCTIONAL_TIME_DEPENDENT
: in which case they can be specified in ...
[uncategorized] ~96-~96: Loose punctuation mark. (UNLIKELY_OPENING_PUNCTUATION)
Context: .../or a continuous timestamp. 3.DYNAMIC
: in which case they are time-varying, bu...
[grammar] ~98-~98: The adjective “many-to-one” is spelled with hyphens. (ONE_TO_MANY_HYPHEN)
Context: ...asurements are linked to events in a many to one fashion and are identified via a separa...
[style] ~107-~107: You have already used this phrasing in nearby sentences. Consider replacing it to add variety to your writing. (REP_NEED_TO_VB)
Context: ...or deep-learning applications (e.g., we need to use embedding matrices for categoric...
[grammar] ~109-~109: It looks like the word order is not correct. (WORD_ORDER_QUESTIONS)
Context: ...us values for numerical values). 3. How one would go about generating such data within a ...
[uncategorized] ~115-~115: Loose punctuation mark. (UNLIKELY_OPENING_PUNCTUATION)
Context: ...ently: 1.SINGLE_LABEL_CLASSIFICATION
: In this case, the measure is assumed to...
[uncategorized] ~117-~117: Loose punctuation mark. (UNLIKELY_OPENING_PUNCTUATION)
Context: ...maximum. 2.MULTI_LABEL_CLASSIFICATION
: In this case, the measure is assumed to...
[uncategorized] ~120-~120: Loose punctuation mark. (UNLIKELY_OPENING_PUNCTUATION)
Context: ...asurements. 3.MULTIVARIATE_REGRESSION
: In this case, the measure is assumed to...
[grammar] ~122-~122: This phrase is duplicated. You should probably use “the values” only once. (PHRASE_REPETITION)
Context: ... multivariate regression problem and the values the values observed. This keys for this measure ar...
[uncategorized] ~125-~125: Loose punctuation mark. (UNLIKELY_OPENING_PUNCTUATION)
Context: ...r that event. 4.UNIVARIATE_REGRESSION
: In this case, the measure is assumed to...
[uncategorized] ~137-~137: Loose punctuation mark. (UNLIKELY_OPENING_PUNCTUATION)
Context: ...rocessing of those values: 1.INTEGER
: the observed values will be converted t...
[uncategorized] ~138-~138: Loose punctuation mark. (UNLIKELY_OPENING_PUNCTUATION)
Context: ...ill be converted to integers. 2.FLOAT
: the observed values will remain as floa...
[uncategorized] ~138-~138: If this is a compound adjective that modifies the following noun, use a hyphen. (EN_COMPOUND_ADJECTIVE_INTERNAL)
Context: ...AT: the observed values will remain as floating point values. 3.
CATEGORICAL_INTEGER`: the o...
[uncategorized] ~139-~139: Loose punctuation mark. (UNLIKELY_OPENING_PUNCTUATION)
Context: ...g point values. 3.CATEGORICAL_INTEGER
: the observed values will be converted t...
[style] ~140-~140: ‘on the basis of’ might be wordy. Consider a shorter alternative. (EN_WORDINESS_PREMIUM_ON_THE_BASIS_OF)
Context: ... normalized into a set of categories on the basis of the values observed. 4. `CATEGORICAL_FL...
[uncategorized] ~141-~141: Loose punctuation mark. (UNLIKELY_OPENING_PUNCTUATION)
Context: ... values observed. 4.CATEGORICAL_FLOAT
: the observed values will be normalized ...
[style] ~141-~141: ‘on the basis of’ might be wordy. Consider a shorter alternative. (EN_WORDINESS_PREMIUM_ON_THE_BASIS_OF)
Context: ... be normalized into a set of categories on the basis of the values observed. #### Configura...
[uncategorized] ~147-~147: Do not mix variants of the same word (‘modelling’ and ‘modeling’) within a single text. (EN_EXACT_COHERENCY_RULE)
Context: ....Dataset` object should pre-process for modelling use, but it also is filled during pre-p...
[uncategorized] ~152-~152: Loose punctuation mark. (UNLIKELY_OPENING_PUNCTUATION)
Context: ...ds include: 1.MeasurementConfig.name
: contains the name of the measurement, a...
[style] ~153-~153: ‘on the basis of’ might be wordy. Consider a shorter alternative. (EN_WORDINESS_PREMIUM_ON_THE_BASIS_OF)
Context: ...manually in practice; it will be filled on the basis of how the measurement config is stored...
[grammar] ~164-~164: The expression “vice versa” is spelled without hyphens. (VICE_VERSA)
Context: ...able to a plain-old-data dictionary and vice-versa) that is used to compute the column val...
[style] ~183-~183: The words ‘observed’ and ‘observation’ are quite similar. Consider replacing ‘observed’ with a different word. (VERB_NOUN_SENT_LEVEL_REP)
Context: ...res how many times that measurement was observed with a non-null value (or a non-null...
[style] ~185-~185: The words ‘observed’ and ‘observation’ are quite similar. Consider replacing ‘observed’ with a different word. (VERB_NOUN_SENT_LEVEL_REP)
Context: ...) per possible instance where it was observed at all (e.g., all possible subjects wit...
[style] ~201-~201: The phrase “A variety of” may be wordy. To make your writing clearer, consider replacing it. (A_VARIETY_OF)
Context: ...qual to the keys in this dictionary. 2. A variety of preprocessing simple parameters, which ...
[style] ~227-~227: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym. (ENGLISH_WORD_REPEAT_BEGINNING_RULE)
Context: ... passed as a parameter to the function. For saving (loading) the dataset will write...
[uncategorized] ~230-~230: Use a comma before ‘and’ if it connects two independent clauses (unless they are closely connected and short). (COMMA_COMPOUND_SENTENCE)
Context: ... we recommend using theparquet
format and it is the only format supported out of ...
[uncategorized] ~268-~268: Possible missing comma found. (AI_HYDRA_LEO_MISSING_COMMA)
Context: ...d normalizer model is a standard scaler model which centers data to have zero mean an...
[style] ~280-~280: ‘on the basis of’ might be wordy. Consider a shorter alternative. (EN_WORDINESS_PREMIUM_ON_THE_BASIS_OF)
Context: ...events in the data, they are determined on the basis of a priori, known functions whose sole in...
[uncategorized] ~301-~301: Do not mix variants of the same word (‘modelling’ and ‘modeling’) within a single text. (EN_EXACT_COHERENCY_RULE)
Context: ...uring dataset pre-processing for use in modelling. After transformation, column types wil...
[grammar] ~320-~320: Two determiners in a row. Choose either “a” or “the”. (DT_DT)
Context: ..._df.index` and indicates to which event a the row's metadata element corresponds. Man...
[uncategorized] ~331-~331: Do not mix variants of the same word (‘modelling’ and ‘modeling’) within a single text. (EN_EXACT_COHERENCY_RULE)
Context: ...uring dataset pre-processing for use in modelling. ###EventStream.data.PytorchDataset
...
[uncategorized] ~349-~349: Do not mix variants of the same word (‘modelling’ and ‘modeling’) within a single text. (EN_EXACT_COHERENCY_RULE)
Context: ...s suitable for embedding and downstream modelling over randomly chosen sub-sequences with...
[uncategorized] ~350-~350: Do not mix variants of the same word (‘modelling’ and ‘modeling’) within a single text. (EN_EXACT_COHERENCY_RULE)
Context: ... mode is useful for generative sequence modelling, but less so for supervised learning, i...
[uncategorized] ~354-~354: The preposition ‘for’ seems more likely in this position. (AI_HYDRA_LEO_REPLACE_AT_FOR)
Context: ...ers can also specify atask_df
object at construction of this dataset. This data...
[style] ~370-~370: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym. (ENGLISH_WORD_REPEAT_BEGINNING_RULE)
Context: ... the sequence length for patienti
. - LetK
denote the number of static data el...
[style] ~371-~371: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym. (ENGLISH_WORD_REPEAT_BEGINNING_RULE)
Context: ...ta elements observed for patienti
. - LetM[j]
denote the number of per-event d...
[uncategorized] ~400-~400: When ‘event-specific’ is used as a modifier, it is usually spelled with a hyphen. (SPECIFIC_HYPHEN)
Context: ...equence. -'dynamic_*'
corresponds to event specific metadata elements describing each seque...
[style] ~401-~401: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym. (ENGLISH_WORD_REPEAT_BEGINNING_RULE)
Context: ...ng each sequence event. -'*_indices'
corresponds to the categorical index of the data el...
[style] ~409-~409: You’ve already used the word ‘also’ once in your sentence, so using it again may be redundant. (REDUNDANT_FILLER)
Context: ...ls were also specified, then there will also be an entry in the output dictionary per t...
[misspelling] ~415-~415: Use “an” instead of ‘a’ if the following word starts with a vowel sound, e.g. ‘an article’, ‘an hour’. (EN_A_VS_AN)
Context: ...essed like a dictionary, but it is also a object stored intypes.py
of class `E...
[style] ~423-~423: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym. (ENGLISH_WORD_REPEAT_BEGINNING_RULE)
Context: ...er-batch maximum sequence length. -M
is the per-batch maximum number of data el...
[style] ~424-~424: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym. (ENGLISH_WORD_REPEAT_BEGINNING_RULE)
Context: ...umber of data elements per event. -K
is the per-batch maximum number of static ...
[style] ~448-~448: You’ve already used the word ‘also’ once in your sentence, so using it again may be redundant. (REDUNDANT_FILLER)
Context: ...ls were also specified, then there will also be a dictionary at keystream_labels
wit...
[style] ~462-~462: ‘prior to’ might be wordy. Consider a shorter alternative. (EN_WORDINESS_PREMIUM_PRIOR_TO)
Context: ...ng matrices of differing dimensions, prior to being combined via projection layers an...
Markdownlint
EventStream/data/README.md
373-373: null (MD040, fenced-code-language)
Fenced code blocks should have a language specified
426-426: null (MD040, fenced-code-language)
Fenced code blocks should have a language specified
Additional comments not posted (7)
tests/data/test_pytorch_dataset.py (2)
182-182
: Updated representation of NaN values in test dataThe change from
np.NaN
tofloat('nan')
indynamic_values
is consistent with the updates across the project to standardize NaN representations. This will ensure that tests remain aligned with the implementation.
244-244
: Updated representation of NaN values in test dataThe change from
np.NaN
tofloat('nan')
indynamic_values
is consistent with the updates across the project to standardize NaN representations. This will ensure that tests remain aligned with the implementation.EventStream/data/pytorch_dataset.py (1)
57-57
: Updated handling of null values and sorting into_int_index
The new implementation of
to_int_index
correctly handles null values by explicitly returningNone
for them and uses a more efficient sorting mechanism. This change should improve the robustness and performance of index generation.EventStream/data/README.md (1)
79-80
: Updated Representation of Missing or Outlier ValuesThe documentation now consistently uses
float('nan')
instead ofnp.NaN
to represent missing or filtered outlier values. This change aligns with the updates across the project's codebase to standardize NaN representations.EventStream/data/dataset_polars.py (3)
434-455
: Updated handling of bounds to usefloat('nan')
.The function
drop_or_censor
has been updated to usefloat('nan')
instead ofnp.NaN
for handling boundary conditions. This change is consistent with the PR's objective to update the codebase for more recent versions of packages. The documentation and parameters have been updated accordingly to reflect this change. Ensure that all references to this function across the project are compatible with these changes.
1181-1181
: Updated handling of numerical values to usefloat('nan')
.The function
_transform_numerical_measurement
now usesfloat('nan')
in place ofnp.NaN
when handling numerical measurements that do not meet certain criteria. This update aligns with the overall changes in the project to standardize the representation of NaN values.
1243-1243
: Updated handling of categorical values to usefloat('nan')
.The function
_transform_categorical_measurement
has been updated to usefloat('nan')
for handling categorical measurements that are not found in the vocabulary. This change ensures consistency with the new standard for NaN values across the project.
No description provided.