You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
@kwinkunks Thanks a lot for creating a nice package.
Here are some more detailed comments that I think could be helpful.
wasserstein could return a pandas DataFrame with appropriate index and column names.
It will be helpful to see an example where the data is not identically distributed by construction, e.g. after applying different SrandardScalers?
The sentence "This shows us that the distributions of the PE log in well indices 6 and 7 are somewhat different and may be anomalous." should be revised to match with the text/str indices as shown in the figure above.
is_correlated how does this work? Which correlation is used? How does it convert the correlation value to a binary outcome? How are the chunks correlated?
The sentence "That is, shuffling the data removes the correlation, but does not mean the records are independent." is unclear and confusing. How is a user supposed to make sense of the output of this function?
feature_importances it is unclear how the order of the output is related to the input. Also do the lower or higher values mean higher importance?
The API documentation says "In each case, the n normalized importances with the most variance are averaged.". This is unclear and will be helpful for a user to get more precise information on what was done like how were the scores normalized? If something like Z-scoring was done then would that be appropriate as it could change the importance scores to negative.
Also this combining of importance scores is non-standard, at least I have not seen it commonly used. So it will be helpful to have references for this approach.
More genereally, the documentation should provide more details on what the functions are exactly doing.
Part of #97. The goal is to make the function more explainable.
Moved the aggregation function to utils.py where it can be
inspected more easily. By default it's a simple sum, but
for the feature importance measure it normalizes before
summing over features, then normalizes the result so the
importances sum to 1. Borda counts are an option too but
I think scores are best for importance, to preserve the
magnitudes of the coefficients.
Also switched to LinearRegression instead of Lasso, which
should really be fitted for alpha, see #98.
@kwinkunks Thanks a lot for creating a nice package.
Here are some more detailed comments that I think could be helpful.
wasserstein
could return a pandas DataFrame with appropriate index and column names.It will be helpful to see an example where the data is not identically distributed by construction, e.g. after applying different SrandardScalers?
The sentence "This shows us that the distributions of the PE log in well indices 6 and 7 are somewhat different and may be anomalous." should be revised to match with the text/str indices as shown in the figure above.
is_correlated
how does this work? Which correlation is used? How does it convert the correlation value to a binary outcome? How are the chunks correlated?The sentence "That is, shuffling the data removes the correlation, but does not mean the records are independent." is unclear and confusing. How is a user supposed to make sense of the output of this function?
feature_importances
it is unclear how the order of the output is related to the input. Also do the lower or higher values mean higher importance?The API documentation says "In each case, the n normalized importances with the most variance are averaged.". This is unclear and will be helpful for a user to get more precise information on what was done like how were the scores normalized? If something like Z-scoring was done then would that be appropriate as it could change the importance scores to negative.
Also this combining of importance scores is non-standard, at least I have not seen it commonly used. So it will be helpful to have references for this approach.
More genereally, the documentation should provide more details on what the functions are exactly doing.
ping openjournals/joss-reviews#6065
The text was updated successfully, but these errors were encountered: