Improve infrastructure for experimental dispatching of non-existing methods in cuML #6148
base: branch-24.12
@@ -792,3 +792,25 @@ class UniversalBase(Base):
        """

        return False

    def _check_cpu_model(self):
Can we delete this method from ProxyEstimator now that it is implemented here?
I think the only bigger-picture question for this PR is: should we just implement the missing methods instead of forwarding them? It would probably benefit all cuML users and be more straightforward (measured in less dunder usage :D), but it is a lot more work. We can decide either way, but I wanted to bring it up so that we briefly talk about it and then decide.
LGTM, I think the dispatching mechanism should work well :). However, what worries me a bit is that there is no guarantee that all Scikit-Learn functions will work correctly just by transferring attributes. There seem to be a lot of edge cases, and I believe this would require more rigorous testing. Ideally, if we want to guarantee functionality, we would have to whitelist the functions that work correctly for each estimator.
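To make the whitelisting idea concrete, here is a minimal sketch of a per-estimator allowlist guarding the forwarding step. Everything in it is hypothetical: `_dispatchable_methods`, `CPUModelStandIn` and `WhitelistedEstimator` are illustrative names, not part of cuML's API, and the stand-in class only mimics a CPU estimator.

```python
class CPUModelStandIn:
    """Stand-in for a CPU estimator such as a scikit-learn model (hypothetical)."""

    def score(self, X):
        return sum(X) / len(X)

    def decision_path(self, X):
        return [x > 0 for x in X]


class WhitelistedEstimator:
    # Hypothetical allowlist: only methods verified to work after
    # attribute transfer are forwarded to the CPU model.
    _dispatchable_methods = frozenset({"score"})

    def __init__(self):
        self._cpu_model = CPUModelStandIn()

    def __getattr__(self, name):
        # __getattr__ only runs when normal lookup fails, i.e. the
        # method is not implemented on this class itself.
        if name.startswith("_"):
            # Avoid recursion and never forward private attributes.
            raise AttributeError(name)
        if name in type(self)._dispatchable_methods:
            return getattr(self._cpu_model, name)
        raise AttributeError(
            f"{type(self).__name__} does not implement {name!r}, and it is "
            "not on the list of methods that can be dispatched to the CPU model."
        )
```

With this guard, `WhitelistedEstimator().score([1, 2, 3])` is forwarded, while `decision_path` raises an `AttributeError` even though the CPU stand-in has it. The trade-off is that each estimator's list has to be maintained and backed by tests.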
        and creates one if necessary.
        """
        if not hasattr(self, "_cpu_model"):
            self.import_cpu_model()
I think the call to the import_cpu_model function is not necessary here, as we already checked for the presence of the _cpu_model_class attribute earlier.
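One way to read this suggestion is to separate resolving the CPU class (done once) from creating the CPU instance (done lazily). The sketch below assumes that split; only the names `_check_cpu_model`, `import_cpu_model`, `_cpu_model` and `_cpu_model_class` come from the diff, while `LazyCPUModelMixin`, `StandInCPUModel`, `ToyEstimator` and all method bodies are hypothetical and not cuML's implementation.

```python
class StandInCPUModel:
    """Stand-in for the CPU estimator class (hypothetical)."""

    def __init__(self, **params):
        self.params = params


class LazyCPUModelMixin:
    def import_cpu_model(self):
        # Hypothetical: a real estimator would import the CPU library
        # here and resolve the class it dispatches to.
        self._cpu_model_class = StandInCPUModel

    def _check_cpu_model(self):
        """Check whether a CPU model exists and create one if necessary."""
        if not hasattr(self, "_cpu_model_class"):
            # Resolve the class only once.
            self.import_cpu_model()
        if not hasattr(self, "_cpu_model"):
            # Only instantiate; the class lookup above is skipped once
            # _cpu_model_class is already set.
            self._cpu_model = self._cpu_model_class(**self.get_params())
        return self._cpu_model


class ToyEstimator(LazyCPUModelMixin):
    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def get_params(self):
        return {"alpha": self.alpha}
```

Repeated calls to `ToyEstimator(alpha=0.5)._check_cpu_model()` on the same instance return the same CPU object, built with the estimator's current hyperparameters.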
Maybe something to add is a test that iterates over all cuML estimators and their scikit-learn equivalents and checks that all attributes exist. We could extend it to instantiate each estimator, fit it, and then repeat the check.
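The class-level part of such a test could look roughly like the sketch below. It compares public methods only and uses hypothetical stand-in classes (`CuLinearRegression`, `SkLinearRegression`, `ESTIMATOR_PAIRS`); a real test would build the pairs from the actual cuML estimators and their CPU equivalents.

```python
def public_methods(cls):
    """Names of public callables defined on a class or its bases."""
    return {
        name
        for name in dir(cls)
        if not name.startswith("_") and callable(getattr(cls, name))
    }


def missing_methods(gpu_cls, cpu_cls):
    """Public methods present on the CPU class but absent on the GPU class."""
    return sorted(public_methods(cpu_cls) - public_methods(gpu_cls))


# Hypothetical stand-ins; a real test would pair each cuML estimator
# with its scikit-learn (or umap/hdbscan) equivalent.
class SkLinearRegression:
    def fit(self, X, y): ...
    def predict(self, X): ...
    def score(self, X, y): ...


class CuLinearRegression:
    def fit(self, X, y): ...
    def predict(self, X): ...


ESTIMATOR_PAIRS = [(CuLinearRegression, SkLinearRegression)]


def check_all_pairs(pairs):
    # Collect every gap instead of stopping at the first one.
    report = {}
    for gpu_cls, cpu_cls in pairs:
        gaps = missing_methods(gpu_cls, cpu_cls)
        if gaps:
            report[gpu_cls.__name__] = gaps
    return report
```

Here `check_all_pairs(ESTIMATOR_PAIRS)` reports that `score` is missing on the GPU stand-in. The fitted-attribute variant mentioned above would need to instantiate and fit both estimators first, since attributes such as `coef_` only exist on fitted instances.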
For the "iterate over all estimators" part: I wrote something to do that for #6107 and want to reuse it for #4753 (iterate over all estimators, then filter those that accept …
This PR adds methods to UniversalBase so that cuML estimators that inherit from it can raise more informative errors and experimentally dispatch to other libraries (sklearn, umap, hdbscan, ...) for methods that haven't been implemented in cuML itself.
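A minimal sketch of how "better errors plus experimental dispatch" can be built on Python's `__getattr__` hook, assuming a hypothetical opt-in flag (`experimental_dispatch`) and a stand-in CPU class (`CPUEstimatorStandIn`). Fitted attributes (public names ending in `_`) are copied onto the CPU model before forwarding, in the spirit of the attribute transfer discussed in the review. None of these names are cuML's actual API, and the PR's real mechanism may differ.

```python
class CPUEstimatorStandIn:
    """Stand-in for a CPU estimator from another library (hypothetical)."""

    def __init__(self):
        self.coef_ = None

    def transform(self, X):
        return [2 * x for x in X]


class DispatchingBase:
    # Hypothetical opt-in switch for the experimental forwarding.
    experimental_dispatch = False

    def __init__(self):
        self.coef_ = [0.5]  # pretend this estimator is already fitted

    def _cpu_model_with_state(self):
        # Copy fitted attributes (public, trailing underscore) onto a
        # fresh CPU model so the forwarded method sees the same state.
        cpu = CPUEstimatorStandIn()
        for name, value in vars(self).items():
            if name.endswith("_") and not name.startswith("_"):
                setattr(cpu, name, value)
        return cpu

    def __getattr__(self, name):
        # Only reached when normal attribute lookup fails.
        if name.startswith("_"):
            raise AttributeError(name)
        if type(self).experimental_dispatch and hasattr(CPUEstimatorStandIn, name):
            return getattr(self._cpu_model_with_state(), name)
        raise AttributeError(
            f"{type(self).__name__!r} has no attribute {name!r}; it is not "
            "implemented in this library. Enabling experimental dispatch may "
            "forward it to the CPU equivalent."
        )
```

With the flag off, calling `DispatchingBase().transform([1, 2])` raises an `AttributeError` that explains why the method is missing; with `DispatchingBase.experimental_dispatch = True`, the same call is forwarded and returns `[2, 4]`. Keeping the flag off by default matches the "experimental" framing: users only get forwarding behaviour when they ask for it.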