-
-
Notifications
You must be signed in to change notification settings - Fork 1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[FIX] Edit Domain (and perhaps other widgets) could cause missing data later in the workflow #4922
Conversation
02f4c70
to
1b33b03
Compare
1b33b03
to
6c1c324
Compare
Codecov Report
@@ Coverage Diff @@
## master #4922 +/- ##
==========================================
+ Coverage 84.12% 84.14% +0.02%
==========================================
Files 283 278 -5
Lines 57819 57045 -774
==========================================
- Hits 48640 48001 -639
+ Misses 9179 9044 -135 |
Orange/preprocess/transformation.py
Outdated
|
||
return super().__eq__(other) \ | ||
and len(self.lookup_table) == len(other.lookup_table) \ | ||
and all(nan_equal(x, y) or np.isnan(x) and np.isnan(y) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nan_equal
already handles nan
s so the second part after or
is probably a leftover from before nan_equal
was introduced?
@@ -156,3 +160,17 @@ def transform(self, column): | |||
column[mask] = 0 | |||
values = self.lookup_table[column] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Noticed this in passing - the docs say self.lookup_table
can also be a list or tuple, but this looks like it will not work for native python types (indexing with a np.ndarray)
Orange/preprocess/transformation.py
Outdated
and len(self.lookup_table) == len(other.lookup_table) \ | ||
and all(nan_equal(x, y) or np.isnan(x) and np.isnan(y) | ||
for x, y in zip(self.lookup_table, other.lookup_table)) \ | ||
and nan_equal(self.unknown, other.unknown) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was surprised that we do not have a helper function somewhere in utils already for checking if contents of two iterables are the same that handles nan
s. Looks like something we would need in several places (maybe we just solve it locally every time :)).
Anyway, how about:
a = np.array(self.lookup_table)
b = np.array(other.lookup_table)
((a == b) | (np.isnan(a) & np.isnan(b))).all()
If the observation holds that the docs are incorrect and lookup_table
already has to be a numpy array, then only the final line is needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I changed the docstring: lookup_table must obviously be an array. As for checking, I went for np.allclose(..., ..., equal_nan=True)
.
def __eq__(self, other): | ||
return type(other) is type(self) and self.variable == other.variable |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about the default __eq__
for Transformation
along these lines:
return type(other) is type(self) and vars(self) == vars(other)
Seems that that is usually what we want and it would make a lot of trivial overloads of eq unnecessary?
__hash__
is a bit more tricky since class attributes can often be unhashable objects. We could leave it as it is or try hashing a sorted tuple of vars(self)
. The latter should work for simple cases like Indicator
and at least fails more noticably if it should have been overloaded in a subclass but was not!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm hesitant here. It looks like a good idea, but making the parent class to smart might shoot a derived class in the foot. Somebody could add an attribute without realizing it's used in comparisons. Maybe it's better to be explicit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You convinced me at first, but then again, you can make an equivalent claim for the other side:
Somebody could add an attribute without realizing it's NOT used in comparisons.
Maybe I preferred comparing everything (vs the minimum) because it behaves closer to how it was before - if some class doesn't overload eq it might erroneously return False (like up to now), but a True should be correct.
In the end it does not really matter, if something is not working like it should it's a bug to be fixed in either case.
I don't mind merging it with either default.
e58036f
to
ba3e1c4
Compare
Issue
Fixes #4895.
Equality of variables also checks equality of their
compute_value
. This can only work ifTransformation
and derived classes have__eq__
and__hash__
.To fix #4895, it would suffice to change
oweditdomain.LookupMappingTransform
. The PR does it for other classes too, to prevent similar future mishaps.Description of changes
Define
__eq__
and__hash__
where needed.Any add-ons that don't have will now misbehave in a different way -- before this,
__eq__
always returnedFalse
, even when transforms were the same. Now it will sometimes returnTrue
even though they may be different. This will happen if the class defines additional fields that are not checked by the base class.The base class can neither check children's extra attributes, neither can it raise an exception (e.g. in a meta class), because many derived classes (even some that define extra attributes) do not need
__eq__
.Includes