[FIX] Edit Domain (and perhaps other widgets) could cause missing data later in the workflow #4922

janezd · 2020-07-24T14:38:40Z

Issue

Equality of variables also checks equality of their compute_value. This can only work if Transformation and derived classes have __eq__ and __hash__.
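
To illustrate why this matters, here is a minimal sketch using hypothetical stand-in classes (PlainTransform and ComparableTransform are illustrative names, not Orange's real classes). Without __eq__, Python falls back to identity comparison, so two transforms built from the same arguments compare unequal. Defining __eq__ together with a consistent __hash__ fixes that.

```python
class PlainTransform:
    """Hypothetical stand-in without __eq__: compared by identity."""

    def __init__(self, variable):
        self.variable = variable


class ComparableTransform:
    """Hypothetical stand-in compared by type and source variable."""

    def __init__(self, variable):
        self.variable = variable

    def __eq__(self, other):
        return type(other) is type(self) and self.variable == other.variable

    def __hash__(self):
        # Must agree with __eq__: equal objects need equal hashes.
        return hash((type(self), self.variable))


# Two separately constructed but identical transforms.
plain_equal = PlainTransform("age") == PlainTransform("age")
comparable_equal = ComparableTransform("age") == ComparableTransform("age")
same_hash = hash(ComparableTransform("age")) == hash(ComparableTransform("age"))
```

Note that in Python 3 a class that defines __eq__ without __hash__ becomes unhashable, which is why the two are defined together.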

To fix #4895, it would suffice to change oweditdomain.LookupMappingTransform. The PR does it for other classes too, to prevent similar future mishaps.

Description of changes

Define __eq__ and __hash__ where needed.

Any add-ons that don't define them will now misbehave in a different way. Before this, __eq__ always returned False, even when the transforms were the same. Now it will sometimes return True even though the transforms differ. This happens if the class defines additional fields that are not checked by the base class.

The base class can neither check children's extra attributes nor raise an exception (e.g. in a metaclass), because many derived classes (even some that define extra attributes) do not need to define __eq__.
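
The false positive described above can be sketched as follows. All class names here (BaseTransform, ScaleTransform, FixedScaleTransform) are hypothetical and only stand in for an add-on class that inherits the base comparison; this is an illustration, not code from the PR.

```python
class BaseTransform:
    """Hypothetical base class: compares type and variable only."""

    def __init__(self, variable):
        self.variable = variable

    def __eq__(self, other):
        return type(other) is type(self) and self.variable == other.variable

    def __hash__(self):
        return hash((type(self), self.variable))


class ScaleTransform(BaseTransform):
    """Hypothetical add-on class: adds a field but inherits __eq__."""

    def __init__(self, variable, factor):
        super().__init__(variable)
        self.factor = factor


class FixedScaleTransform(ScaleTransform):
    """Same class with the extra field included in the comparison."""

    def __eq__(self, other):
        return super().__eq__(other) and self.factor == other.factor

    def __hash__(self):
        return hash((super().__hash__(), self.factor))


# Inherited __eq__ ignores `factor`, so different transforms look equal.
inherited_result = ScaleTransform("x", 2) == ScaleTransform("x", 10)
# The explicit override tells them apart.
fixed_result = FixedScaleTransform("x", 2) == FixedScaleTransform("x", 10)
```

Here inherited_result is True even though the two transforms scale by different factors, while fixed_result is False.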

Includes

Code changes
Tests

codecov · 2020-07-24T19:58:03Z

Codecov Report

Merging #4922 into master will increase coverage by 0.02%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master    #4922      +/-   ##
==========================================
+ Coverage   84.12%   84.14%   +0.02%     
==========================================
  Files         283      278       -5     
  Lines       57819    57045     -774     
==========================================
- Hits        48640    48001     -639     
+ Misses       9179     9044     -135

lanzagar · 2020-07-31T09:05:25Z

Orange/preprocess/transformation.py

+
+        return super().__eq__(other) \
+               and len(self.lookup_table) == len(other.lookup_table) \
+               and all(nan_equal(x, y) or np.isnan(x) and np.isnan(y)


nan_equal already handles NaNs, so the second part (after the "or") is probably a leftover from before nan_equal was introduced?

lanzagar · 2020-07-31T09:28:46Z

Orange/preprocess/transformation.py

@@ -156,3 +160,17 @@ def transform(self, column):
        column[mask] = 0
        values = self.lookup_table[column]


Noticed this in passing: the docs say self.lookup_table can also be a list or tuple, but this looks like it will not work for native Python types (indexing with an np.ndarray).
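
A small standalone check of this observation, with made-up values and variable names (it does not call Orange's Lookup class): indexing a NumPy array with an integer array performs a vectorized lookup, while indexing a Python list with the same array raises a TypeError.

```python
import numpy as np

column = np.array([0, 2, 1])            # integer codes to look up
table_as_list = [10.0, 20.0, 30.0]      # native Python list
table_as_array = np.array(table_as_list)

# An ndarray indexed by an integer ndarray returns one value per code.
values = table_as_array[column]

# A list accepts only integers or slices as indices.
try:
    table_as_list[column]
    list_lookup_failed = False
except TypeError:
    list_lookup_failed = True
```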

lanzagar · 2020-07-31T09:50:52Z

Orange/preprocess/transformation.py

+               and len(self.lookup_table) == len(other.lookup_table) \
+               and all(nan_equal(x, y) or np.isnan(x) and np.isnan(y)
+                       for x, y in zip(self.lookup_table, other.lookup_table)) \
+               and nan_equal(self.unknown, other.unknown)


I was surprised that we do not have a helper function somewhere in utils already for checking if contents of two iterables are the same that handles nans. Looks like something we would need in several places (maybe we just solve it locally every time :)).

Anyway, how about:

    a = np.array(self.lookup_table)
    b = np.array(other.lookup_table)
    ((a == b) | (np.isnan(a) & np.isnan(b))).all()

If the observation holds that the docs are incorrect and lookup_table already has to be a numpy array, then only the final line is needed.

I changed the docstring: lookup_table must obviously be an array. As for checking, I went for np.allclose(..., ..., equal_nan=True).
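
For comparison, here is a rough sketch of the options mentioned in this thread, run on made-up arrays. It only shows how the NumPy calls behave and is not the code that was merged.

```python
import numpy as np

a = np.array([1.0, np.nan, 3.0])
b = np.array([1.0, np.nan, 3.0])

# Plain elementwise equality: nan is never equal to nan.
plain = bool((a == b).all())

# The explicit mask suggested in the review.
masked = bool(((a == b) | (np.isnan(a) & np.isnan(b))).all())

# np.allclose with equal_nan=True; it compares within a tolerance.
close = bool(np.allclose(a, b, equal_nan=True))

# allclose broadcasts, so arrays of different lengths can still match.
broadcast = bool(np.allclose(np.array([1.0]), np.array([1.0, 1.0])))
```

Two behaviours are worth keeping in mind: np.allclose is a tolerance-based comparison rather than exact equality, and because it broadcasts its arguments, a separate length (or shape) check is probably still worthwhile.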

lanzagar · 2020-07-31T10:02:44Z

Orange/preprocess/transformation.py

    def __eq__(self, other):
        return type(other) is type(self) and self.variable == other.variable


How about the default __eq__ for Transformation along these lines:

return type(other) is type(self) and vars(self) == vars(other)

That seems to be what we usually want, and it would make a lot of trivial overloads of __eq__ unnecessary?

__hash__ is a bit trickier, since a class's attributes can often be unhashable objects. We could leave it as it is or try hashing a sorted tuple of vars(self). The latter should work for simple cases like Indicator, and it at least fails more noticeably if it should have been overloaded in a subclass but was not!
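
A minimal sketch of the generic defaults proposed here, assuming all attributes are simple values. The classes SketchTransformation, SketchIndicator and SketchLookup are hypothetical stand-ins, not Orange's actual classes.

```python
class SketchTransformation:
    """Hypothetical base class using the generic defaults."""

    def __init__(self, variable):
        self.variable = variable

    def __eq__(self, other):
        return type(other) is type(self) and vars(self) == vars(other)

    def __hash__(self):
        # Sort by attribute name so assignment order does not matter.
        return hash((type(self), tuple(sorted(vars(self).items()))))


class SketchIndicator(SketchTransformation):
    def __init__(self, variable, value):
        super().__init__(variable)
        self.value = value


class SketchLookup(SketchTransformation):
    def __init__(self, variable, lookup_table):
        super().__init__(variable)
        self.lookup_table = lookup_table  # a list here, so unhashable


indicators_equal = SketchIndicator("x", 1) == SketchIndicator("x", 1)
indicators_differ = SketchIndicator("x", 1) != SketchIndicator("x", 2)
hashes_match = hash(SketchIndicator("x", 1)) == hash(SketchIndicator("x", 1))

# With an unhashable attribute the default hash fails loudly.
try:
    hash(SketchLookup("x", [0.0, 1.0]))
    hash_failed = False
except TypeError:
    hash_failed = True
```

The simple case works without any overloads, and the unhashable case raises a TypeError instead of silently misbehaving. A class holding multi-element NumPy arrays would likely still need its own __eq__, since comparing such arrays inside a dict comparison raises rather than returning a result.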

I'm hesitant here. It looks like a good idea, but making the parent class too smart might shoot a derived class in the foot. Somebody could add an attribute without realizing it's used in comparisons. Maybe it's better to be explicit.

You convinced me at first, but then again, you can make an equivalent claim for the other side:
Somebody could add an attribute without realizing it's NOT used in comparisons.

Maybe I preferred comparing everything (vs. the minimum) because it behaves closer to how it was before: if some class doesn't overload __eq__, it might erroneously return False (as it has up to now), but a True should be correct.

In the end it does not really matter: if something is not working as it should, it's a bug to be fixed in either case.
I don't mind merging it with either default.

janezd force-pushed the transformations-eq branch 2 times, most recently from 02f4c70 to 1b33b03 Compare July 24, 2020 15:11

Transformation (and derived classes): Add __eq__ and __hash__ operators

6c1c324

janezd force-pushed the transformations-eq branch from 1b33b03 to 6c1c324 Compare July 24, 2020 19:43

lanzagar reviewed Jul 31, 2020

Lookup: Assume that lookup_table is an array

ba3e1c4

janezd force-pushed the transformations-eq branch from e58036f to ba3e1c4 Compare July 31, 2020 15:14

lanzagar merged commit f899e95 into biolab:master Aug 7, 2020

janezd mentioned this pull request May 20, 2022

Deepcopied variables are not equal to their originals #5983

Closed

2 tasks
