Logloss #40

Open · wants to merge 4 commits into base: master
Conversation

@mllg commented Sep 19, 2019

LogLoss is not defined for p=0 and p=1. Other toolkits clip to [0+eps, 1-eps] to overcome this: https://scikit-learn.org/stable/modules/generated/sklearn.metrics.log_loss.html
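A minimal sketch of the problem in plain R (computing log loss directly from its formula, not via `Metrics::logLoss`): a prediction of exactly 0 for a positive observation makes the mean log loss infinite, and clipping the predictions away from the boundary restores a finite value.

```r
# Log loss from its definition; assumes actual is 0/1 and predicted is in [0, 1].
ll <- function(actual, predicted) {
  -mean(actual * log(predicted) + (1 - actual) * log(1 - predicted))
}

actual    <- c(1, 0)
predicted <- c(0, 0.9)   # p = 0 for a positive observation

ll(actual, predicted)    # Inf -- log(0) is -Inf

eps <- 1e-15             # the value scikit-learn used
clipped <- pmax(eps, pmin(1 - eps, predicted))
ll(actual, clipped)      # finite, approx 18.42
```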

@mfrasco (Owner) commented Sep 20, 2019

Thank you for the contribution to the package. I'm torn about what to do here. My primary concern is backwards compatibility. What if some end user has a line of code that is like

if (!is.finite(logLoss(actual, predicted))) {
    eps <- 1e-12  # the value proposed in this PR
    predicted <- pmax(eps, pmin(1 - eps, predicted))
}

What do you think about this concern? Other contributors have proposed backwards incompatible changes, and I need to think through the implications.

On the other hand, I question whether the Metrics package should be clipping the user's predictions for them. Perhaps the user should do that themselves? Also, why did you choose 1e-12 rather than 1e-15 as scikit-learn uses or something like .Machine$double.xmin?

@mllg (Author) commented Sep 20, 2019

> What do you think about this concern? Other contributors have proposed backwards incompatible changes, and I need to think through the implications.

Well, that's a difficult question. One could introduce an additional argument (clip = FALSE) to stick to the old behavior. The same could be done for the undefined values of precision/recall in #36 (missing.val = NA vs. missing.val = 0). Note that you might end up with a package with really inconvenient defaults ...

> On the other hand, I question whether the Metrics package should be clipping the user's predictions for them. Perhaps the user should do that themselves? Also, why did you choose 1e-12 rather than 1e-15 as scikit-learn uses or something like .Machine$double.xmin?

It was late; 1e-15 should also work. AFAIK the more generic approach would be to use sqrt(.Machine$double.eps) (cf. ?all.equal).
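One way the opt-in argument discussed above could look, sketched here with a hypothetical function name (`logLoss2`) so as not to suggest this is the actual code in the PR:

```r
# Hypothetical opt-in clipping that preserves the current default behaviour:
# clip = FALSE reproduces today's results (possibly Inf), while clip = TRUE
# bounds the predictions away from 0 and 1 before taking logs.
logLoss2 <- function(actual, predicted, clip = FALSE,
                     eps = sqrt(.Machine$double.eps)) {
  if (clip) {
    predicted <- pmax(eps, pmin(1 - eps, predicted))
  }
  -mean(actual * log(predicted) + (1 - actual) * log(1 - predicted))
}

logLoss2(c(1, 0), c(0, 0.5))               # Inf, as today
logLoss2(c(1, 0), c(0, 0.5), clip = TRUE)  # finite
```

Defaulting `eps` to sqrt(.Machine$double.eps) follows the convention used by ?all.equal; any fixed constant such as 1e-12 or 1e-15 would slot into the same signature.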
