-
-
Notifications
You must be signed in to change notification settings - Fork 1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Remove Variable.make #3925
Remove Variable.make #3925
Conversation
b84acf7
to
dd50aba
Compare
This passes
I think we should deprecate |
f023d17
to
5283234
Compare
@lanzagar, can you take a look? There are still things to smooth out (like removing all calls of Variable.make, which are currently forwarded to normal constructors), but in principle this seems to work. Comments from others very welcome, too, of course. |
I can look at it on Thursday/Friday. As for |
The problem is that two variables are now equal if they are "compatible". As a consequence, |
I like this approach that now all discrete variable with the same name (without compute_value) are equal and then DomainConversion creates a mapping as a compute_value-type function. I have no counterexamples yet: I do not see a reason why it couldn't work, but I am not sure it will either. :) |
Some bugs will probably need to be ironed out, but I think this is good conceptually.
|
5283234
to
5e94af0
Compare
Codecov Report
@@ Coverage Diff @@
## master #3925 +/- ##
=========================================
Coverage ? 84.66%
=========================================
Files ? 371
Lines ? 64892
Branches ? 0
=========================================
Hits ? 54940
Misses ? 9952
Partials ? 0 |
Codecov Report
@@ Coverage Diff @@
## master #3925 +/- ##
=========================================
- Coverage 85.42% 85.4% -0.02%
=========================================
Files 385 385
Lines 68873 68871 -2
=========================================
- Hits 58832 58821 -11
- Misses 10041 10050 +9 |
@lanzagar: the problem was caused by inconsistency in More good news: the schema I used (logistic regression trained on two-class Iris from Select Rows is tested on three-class directly from File) does not work on master branch. Here it now works, after the last fixes. |
f2f9288
to
7c45427
Compare
@lanzagar, @markotoplak, now would be a good time to check and merge -- if OK. This will surely introduce some bugs, we must use it for some time before releasing it. "47 changed files" looks scary, but most are just fixes in tests -- removal of clear_all_caches and giving names to test variables with empty names. |
I tried this with tests from Spectroscopy. Your changes do not break anything significant. Great work! What worries me that now tests of spectroscopy took 302 sec. On master they finish in 61 seconds. These tests are pretty heavy on data processing and classification where the test data has to be transformed. I'll try to narrow this down a bit. |
@@ -254,6 +258,13 @@ def empty(self): | |||
"""True if the domain has no variables of any kind""" | |||
return not self.variables and not self.metas | |||
|
|||
def _get_equivalent(self, var): | |||
if isinstance(var, Variable): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried this:
import time
import Orange
data = Orange.data.Table([list(range(5000))])
t = time.time()
pdata = Orange.preprocess.SklImpute()(data)
print("first transform", time.time()-t, "s")
t = time.time()
data.transform(pdata.domain)
print("Table.tranform", time.time()-t, "s")
On master I get
first transform 0.6564505100250244 s
Table.tranform 0.5023694038391113 s
but on this branch
first transform 11.536584615707397 s
Table.tranform 0.5088396072387695 s
My profiler led me here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Resolved in the "Removal of Variable.make: _get_equivalent in O(1)" commit.
I pushed a commit that makes _get_equivalent faster. Feel free to make it prettier. Performance issues now seem to be resolved. |
Orange/data/variable.py
Outdated
return hasattr(other, "master") and self.master is other.master | ||
return type(self) is type(other) \ | ||
and self.name == other.name \ | ||
and self._compute_value is other._compute_value |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we get self._compute_value == other._compute_value
so that compute_values could define their own __eq__
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see why not. I changed it.
Thanks for fixing |
I very much approve this PR. Thanks a lot for this, Janez. |
8d53e66
to
ecbcd3f
Compare
@janezd, I agree that removing backmapping does not belong to this PR. Forward mapping is necessary for models to work, but back mapping is not. We only need it when we compute accuracy - it therefore seems to belong there. But yes, a potential compatibility breaking change and perhaps best handled separately. |
`ReplaceUnknownsModel` did not work on input dataset with `class_vars` since biolabgh-3925 due to error raised by backmapping.
Implements #3705. Probably also #3521, #3949, #3049.
Includes