Rethink the measure of generalization error #16

Open
psychelzh opened this issue Sep 25, 2024 · 1 comment
Labels: help wanted ❤️ we'd love your help!

psychelzh (Owner) commented on Sep 25, 2024

Currently, the measure of generalization error used in summary() is the correlation between the pooled predictions and the real values. But sklearn warns against doing so:

Note on inappropriate usage of cross_val_predict

The result of cross_val_predict may be different from those obtained using cross_val_score as the elements are grouped in different ways. The function cross_val_score takes an average over cross-validation folds, whereas cross_val_predict simply returns the labels (or probabilities) from several distinct models undistinguished. Thus, cross_val_predict is not an appropriate measure of generalization error.

A sounder method is to calculate the generalization error separately for each fold and then average the fold-wise values. Pearson correlations, however, might need special treatment, since they should not be averaged directly; see the sketch below.
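As a rough illustration (not this package's actual implementation), here is a minimal Python sketch of the fold-wise approach using sklearn. The data and model are hypothetical stand-ins, and the Fisher z-transform is one common way to average Pearson correlations, offered here as an assumption rather than something the original paper prescribes:

```python
import numpy as np
from scipy.stats import pearsonr
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold

# Hypothetical stand-ins for whatever data and model summary() evaluates.
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)
model = LinearRegression()

# Score each fold separately instead of pooling predictions across folds,
# which is what the sklearn note recommends against.
fold_rs = []
for train_idx, test_idx in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
    model.fit(X[train_idx], y[train_idx])
    pred = model.predict(X[test_idx])
    fold_rs.append(pearsonr(y[test_idx], pred)[0])

# Pearson r values are not simply additive, so averaging them directly is
# biased; a common remedy is averaging in Fisher z-space and back-transforming.
r_avg = np.tanh(np.mean(np.arctanh(fold_rs)))
print(f"per-fold r: {np.round(fold_rs, 3)}; Fisher-z averaged r: {r_avg:.3f}")
```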

psychelzh added the help wanted ❤️ we'd love your help! label on Sep 25, 2024
psychelzh (Owner, Author) commented on Sep 25, 2024

Of course, the current method just follows the original paper, so we will leave this issue open for now; it might not be appropriate to implement another measure yet.
