[ENH] proba regression: reduction to multiclass classification #378
Comments
Yes, I think this would be a good thing to implement once #335 is complete and merged.
I will be making the PR for this today, but I had a few doubts that need clarification.
But that's available only once you've fitted it, which is later than construction. How would that work, logically?
No, you pass the entire classifier instance. As I'm saying above, parameters are |
Oh, I thought I had to take the input as strings, like I did in the case of statsmodels. If I take the input as an sklearn classifier instance, then that's not an issue at all.
Since we are constructing the Histogram distribution only when we call |
Yes, inputs being strings is "bad design" if a viable alternative is the composition/strategy pattern. With strings, you always have to add the encoding manually, whereas with composition you can pass any component that is API compliant.
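To illustrate the point about strings versus composition, here is a minimal sketch. The helper names `make_from_string` and `make_from_instance` and the lookup table are hypothetical and only serve the comparison; `clone` is the standard sklearn utility.

```python
from sklearn.base import clone
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

# String-based design (hypothetical): every supported classifier needs a
# manual entry in this table, and hyperparameters need extra plumbing.
_CLF_BY_NAME = {"logistic": LogisticRegression, "tree": DecisionTreeClassifier}


def make_from_string(name):
    return _CLF_BY_NAME[name]()


# Composition (hypothetical helper): any object that follows the sklearn
# classifier API can be passed in, already configured by the user.
def make_from_instance(clf):
    return clone(clf)  # unfitted copy with the same hyperparameters
```

With composition, a user-configured instance such as `LogisticRegression(C=0.5)` passes through unchanged, and no table has to be maintained.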
I think you still need the exact bins because you need to pass them to |
From the discussion today, a short design for a reducer to multiclass classification mentioned in #7.

Parameters are:

- `clf`: an `sklearn` classifier, capable of multiclass classification
- `bins`: default = 10. Possible values are int, or an ordered list of float.

The algorithm does as follows:

- if `bins` is an int, replaces this arg internally by that many bins, at the `bins + 1` equally spaced quantiles of the empirical training distribution.
- in `fit`, fits `clf` to this binned training data
- in `predict_proba`, uses `clf.predict_proba` to obtain class probabilities, and uses these together with the bins from `bins` to obtain a `Histogram` distribution

One could also think about another algorithm where the bins are cumulative, i.e., class i means being contained in the interval from the lowest point up to the i-th bin. This is also valid, but one needs to be careful that the resulting cdf is monotonic. Could be a choice of strategy.
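The non-cumulative design above can be sketched roughly as follows. This is an illustration under assumptions, not the eventual implementation: the class name `BinnedClassifierRegressor` and the method `predict_bin_mass` are hypothetical, and instead of constructing a `Histogram` distribution (whose constructor is not assumed here) the sketch returns the bin edges and per-bin probability masses that such a distribution would need.

```python
import numpy as np
from sklearn.base import clone
from sklearn.linear_model import LogisticRegression


class BinnedClassifierRegressor:
    """Hypothetical sketch: proba regression via multiclass classification."""

    def __init__(self, clf, bins=10):
        self.clf = clf    # sklearn classifier instance, passed by composition
        self.bins = bins  # int, or ordered list of float bin edges

    def fit(self, X, y):
        y = np.asarray(y, dtype=float)
        if isinstance(self.bins, (int, np.integer)):
            # bins + 1 equally spaced quantiles of the training targets
            edges = np.quantile(y, np.linspace(0.0, 1.0, self.bins + 1))
            edges = np.unique(edges)  # drop duplicate edges caused by ties
        else:
            edges = np.asarray(self.bins, dtype=float)
        self.edges_ = edges
        # Bin index per target. Only interior edges are used, so labels run
        # from 0 to n_bins - 1 and out-of-range values fall in the end bins.
        labels = np.digitize(y, edges[1:-1])
        self.clf_ = clone(self.clf).fit(X, labels)
        return self

    def predict_bin_mass(self, X):
        n_bins = len(self.edges_) - 1
        proba = self.clf_.predict_proba(X)
        # Columns of proba follow clf_.classes_; bins never seen in
        # training keep zero mass.
        mass = np.zeros((proba.shape[0], n_bins))
        mass[:, self.clf_.classes_] = proba
        return self.edges_, mass


rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X[:, 0] + 0.1 * rng.normal(size=200)

reg = BinnedClassifierRegressor(LogisticRegression(max_iter=1000), bins=5)
edges, mass = reg.fit(X, y).predict_bin_mass(X[:4])
```

Each row of `mass` sums to one, so `edges` together with a row of `mass` describes a piecewise-constant density, which is the information a histogram distribution would be built from.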
FYI @ShreeshaM07, @SaiRevanth25.