[FIX] Implement predict() in simple tree and simple RF models #6258
f48cb7c
to
fdfba08
Compare
Codecov Report
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #6258      +/-   ##
==========================================
- Coverage   86.70%   86.70%   -0.01%
==========================================
  Files         316      316
  Lines       68120    68129       +9
==========================================
+ Hits        59064    59068       +4
- Misses       9056     9061       +5
@VesnaT wondered why this was the only problematic model. Well, because the new deprecation does not break anything, we see that it was the only model primarily working through
fdfba08
to
c5a8a1f
Compare
I was confused about how our models work and thus I wanted to deprecate the default implementation of Then I found where Notice that the tests do not fail even with the deprecation in place. This is because no tests in Orange are calling
c5a8a1f
to
75ff06e
Compare
The big issue here, which remains, is how Orange's Tables behave: if arrays were copied in init, only internal arrays would be locked.
I think this is the important observation. If creating tables with existing arrays is a problem, I am sure it happens in many other places, not just predict from this PR.
I like the proposal of copying the arrays in init, maybe by introducing a new copy parameter that is True by default but can be explicitly changed so it is still possible to reuse the same arrays (but only if we can think of cases where this would actually be needed/useful).
In case we change Table's init, we would not need to copy X before using it to construct a Table in predict, which I mentioned in the comments in this PR.
What do you think @markotoplak, @janezd?
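The copy proposal above could look roughly like the following sketch. It does not use Orange's Table: `ArrayHolder` and its `copy` parameter are hypothetical names, and locking is modelled with NumPy's writeable flag, which is an assumption about the mechanism.

```python
import numpy as np


class ArrayHolder:
    """Hypothetical stand-in for a table that locks the array it stores."""

    def __init__(self, X, copy=True):
        # With copy=True (the proposed default) the holder owns a private
        # array, so locking it cannot affect the caller's array.
        self.X = np.array(X, copy=True) if copy else np.asarray(X)
        self.X.setflags(write=False)  # lock only what the holder stores


X = np.zeros((3, 2))

safe = ArrayHolder(X)                # caller's X is copied and stays writable
X[0, 0] = 1.0                        # still allowed

shared = ArrayHolder(X, copy=False)  # explicit opt-in to reuse the same array
# X is now read-only, because the holder locked the very same object.
```

With `copy=False` the caller's array is locked as a side effect, which is the behaviour the comments above describe as the source of the problem.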
pt = tree.predict(X)
p += pt
I don't see a reason to introduce pt - you could still += in one line
Yes I could. But I kept it for symmetry with the other class (for classification).
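For context, the two forms discussed here are equivalent. A minimal sketch, assuming a regression forest that averages per-tree predictions (the averaging, `StubTree`, and `forest_predict` are illustrative assumptions, not the code from this PR):

```python
import numpy as np


class StubTree:
    """Hypothetical tree that predicts a constant value for every row."""

    def __init__(self, value):
        self.value = value

    def predict(self, X):
        return np.full(len(X), self.value, dtype=float)


def forest_predict(trees, X):
    # Accumulate per-tree predictions directly on arrays;
    # no intermediate table is built.
    p = np.zeros(len(X))
    for tree in trees:
        pt = tree.predict(X)  # named temporary, as in the diff
        p += pt               # equivalent to: p += tree.predict(X)
    return p / len(trees)


X = np.zeros((4, 3))
print(forest_predict([StubTree(1.0), StubTree(3.0)], X))
```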
Orange/base.py
Outdated
warnings.warn("Automatic fallback to predict_storage will be removed in 3.36. "
              "All models need to implement predict.",
              OrangeDeprecationWarning)
What do you expect people to do, when they see this warning?
If the model did not implement a predict before, they could just copy the 4 lines below into their predict which of course would solve nothing?
If I understand the original problem correctly, the Table should be created with a copy of X. If that is the fix, I think it is better to put it here one time and avoid people maybe implementing predict incorrectly in their own forced implementations.
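A rough sketch of the "fix it here one time" idea: the default `predict` wraps a copy of X before delegating to `predict_storage`. `FakeTable`, `BaseModel`, and `MeanModel` are hypothetical stand-ins, not the classes in Orange/base.py, and locking is again modelled with NumPy's writeable flag as an assumption.

```python
import numpy as np


class FakeTable:
    """Hypothetical stand-in for a table that locks the array it wraps."""

    def __init__(self, X):
        self.X = X
        self.X.setflags(write=False)


class BaseModel:
    def predict(self, X):
        # Fallback: wrap a *copy* of X so the caller's array is never locked.
        return self.predict_storage(FakeTable(X.copy()))

    def predict_storage(self, data):
        raise NotImplementedError


class MeanModel(BaseModel):
    # A model that only implements predict_storage and relies on the fallback.
    def predict_storage(self, data):
        return data.X.mean(axis=1)


X = np.ones((2, 3))
result = MeanModel().predict(X)
X[0, 0] = 5.0  # still writable, because only the copy was locked
```

The trade-off is one extra copy of X per call in exchange for models not having to reimplement the fallback themselves.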
I currently just do not see any good reasons that anyone would directly implement predict_storage. Most likely they should implement predict and I'd guess that in most cases they would not need to construct a table.
The only exception I can imagine is that they want to call sublearner's __call__.
Waiting for counterexamples.
Main counterexamples are when working with non-numpy/Table data or some additional information not found in X/Y/metas.
- SqlTable used fit/predict storage because you can offload processing to the SQL server (if you use X, it downloads a sample). Although we are not actively working on that, I like that the architecture supports the option to.
- I see that Survival analysis uses fit_storage, because it needs more than X&Y, but not for predicting, where X is enough? @JakaKokosar
- You are developing Dask tables, maybe we will want to adjust and optimize some learner in the future that will use some DaskTable methods/caching? (but more likely in this case we could just have an if in standard predict to check the type of X and use dask functions directly).
- We have Corpus and Timeseries classes that extend Table and maybe some models will want to use them and the numpy arrays will not be enough
Bottom line, if you are questioning why "anyone would directly implement predict_storage", then the discussion should be if we want to remove that method completely - and as long as everything in Orange is based on Tables and we support its extensions, I would lean towards leaving this support and not rely exclusively on numpy arrays. At least until we switch to pandas, then we should re-evaluate :)
Thanks for a list where this could go wrong. I agree. Let's not deprecate.
@lanzagar, the goal of the deprecation was to make it clearer to the programmer what is being called where and make it clear that the function to implement is certainly If we decide we really need a consistent handling of the case when If this deprecation is too controversial I can remove it from this PR.
This avoids creating intermediate tables.
75ff06e
to
28c90fb
Compare
I removed the deprecation commit. Let's handle that separately.
Issue
This avoids creating intermediate tables in predict(), which then create problems onwards because, currently, creating a table with an array locks the array given. The array remains locked even when the intermediate table is discarded.
This PR is a fix for some crashes in biolab/orange3-explain#54.
The big issue here, which remains, is how Orange's Tables behave: if arrays were copied in init, only internal arrays would be locked. Anyway, I believe the code is now a bit cleaner.