Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FIX] Edit Domain (and perhaps other widgets) could cause missing data later in the workflow #4922

Merged
merged 2 commits into from
Aug 7, 2020

Conversation

janezd
Copy link
Contributor

@janezd janezd commented Jul 24, 2020

Issue

Fixes #4895.

Equality of variables also checks equality of their compute_value. This can only work if Transformation and derived classes have __eq__ and __hash__.

To fix #4895, it would suffice to change oweditdomain.LookupMappingTransform. The PR does it for other classes too, to prevent similar future mishaps.

Description of changes

Define __eq__ and __hash__ where needed.

Any add-ons that don't have will now misbehave in a different way -- before this, __eq__ always returned False, even when transforms were the same. Now it will sometimes return True even though they may be different. This will happen if the class defines additional fields that are not checked by the base class.

The base class can neither check children's extra attributes, neither can it raise an exception (e.g. in a meta class), because many derived classes (even some that define extra attributes) do not need __eq__.

Includes
  • Code changes
  • Tests

@janezd janezd force-pushed the transformations-eq branch 2 times, most recently from 02f4c70 to 1b33b03 Compare July 24, 2020 15:11
@codecov
Copy link

codecov bot commented Jul 24, 2020

Codecov Report

Merging #4922 into master will increase coverage by 0.02%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master    #4922      +/-   ##
==========================================
+ Coverage   84.12%   84.14%   +0.02%     
==========================================
  Files         283      278       -5     
  Lines       57819    57045     -774     
==========================================
- Hits        48640    48001     -639     
+ Misses       9179     9044     -135     


return super().__eq__(other) \
and len(self.lookup_table) == len(other.lookup_table) \
and all(nan_equal(x, y) or np.isnan(x) and np.isnan(y)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nan_equal already handles nans so the second part after or is probably a leftover from before nan_equal was introduced?

@@ -156,3 +160,17 @@ def transform(self, column):
column[mask] = 0
values = self.lookup_table[column]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Noticed this in passing - the docs say self.lookup_table can also be a list or tuple, but this looks like it will not work for native python types (indexing with a np.ndarray)

Comment on lines 169 to 172
and len(self.lookup_table) == len(other.lookup_table) \
and all(nan_equal(x, y) or np.isnan(x) and np.isnan(y)
for x, y in zip(self.lookup_table, other.lookup_table)) \
and nan_equal(self.unknown, other.unknown)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was surprised that we do not have a helper function somewhere in utils already for checking if contents of two iterables are the same that handles nans. Looks like something we would need in several places (maybe we just solve it locally every time :)).

Anyway, how about:

a = np.array(self.lookup_table)
b = np.array(other.lookup_table)
((a == b) | (np.isnan(a) & np.isnan(b))).all()

If the observation holds that the docs are incorrect and lookup_table already has to be a numpy array, then only the final line is needed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed the docstring: lookup_table must obviously be an array. As for checking, I went for np.allclose(..., ..., equal_nan=True).

Comment on lines 51 to 52
def __eq__(self, other):
return type(other) is type(self) and self.variable == other.variable
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about the default __eq__ for Transformation along these lines:

return type(other) is type(self) and vars(self) == vars(other)

Seems that that is usually what we want and it would make a lot of trivial overloads of eq unnecessary?

__hash__ is a bit more tricky since class attributes can often be unhashable objects. We could leave it as it is or try hashing a sorted tuple of vars(self). The latter should work for simple cases like Indicator and at least fails more noticably if it should have been overloaded in a subclass but was not!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm hesitant here. It looks like a good idea, but making the parent class to smart might shoot a derived class in the foot. Somebody could add an attribute without realizing it's used in comparisons. Maybe it's better to be explicit.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You convinced me at first, but then again, you can make an equivalent claim for the other side:
Somebody could add an attribute without realizing it's NOT used in comparisons.

Maybe I preferred comparing everything (vs the minimum) because it behaves closer to how it was before - if some class doesn't overload eq it might erroneously return False (like up to now), but a True should be correct.

In the end it does not really matter, if something is not working like it should it's a bug to be fixed in either case.
I don't mind merging it with either default.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Color node is destructive
2 participants