-
-
Notifications
You must be signed in to change notification settings - Fork 1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[FIX] Select Rows and Table: Filtering string values #2176
Conversation
Codecov Report
@@ Coverage Diff @@
## master #2176 +/- ##
=========================================
Coverage ? 72.46%
=========================================
Files ? 319
Lines ? 55142
Branches ? 0
=========================================
Hits ? 39957
Misses ? 15185
Partials ? 0 |
Orange/data/table.py
Outdated
@@ -1202,6 +1205,7 @@ def _filter_values_indicators(self, filter): | |||
sel = np.vectorize(f)(col) | |||
elif isinstance(f, (data_filter.FilterContinuous, | |||
data_filter.FilterString)): | |||
col = get_str_column(col, isinstance(f, data_filter.FilterString)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why calling a (trivial) function instead of doing it right here? It further obscures the logic of this code, which is already too contrived.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Radon and cyclomatic complexity.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We measure cyclomatic complexity to make the code more, not less readable. :)
Find another way to decrease the complexity. Cheating with dictionaries typically makes the code slower or more complex, but in this case dictionaries are natural and fast. You can prepare a dictionary (as a class attribute, not a local variable)
operators = {
f.Equal: operator.eq,
f.NotEqual: operator.neq,
f.Less: operator.le,
...
}
Then you can replace most if-s with a single col = ops[f.oper](col, fmin)
.
Perhaps this does not work for some reason. In this case, find another solution to simplify the function. Extracting individual ifs to local functions is not a proper solution to high cyclomatic complexity (except when it makes the code actually better).
You can also extract the code the is within the for loop into a separate function. This would make sense since this is the code that applies the filter to a selection. If this is still not enough, make a separate function for each filter type (elifs in this block of code), and extract the function that prepares the initial sel
column.
Orange/data/table.py
Outdated
sel += col | ||
else: | ||
raise TypeError("Invalid filter") | ||
sel = self._filter_by_condition(f, conjunction, sel, data_filter) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is better than before, but not by much. If there's a simple and obvious a way to make _filter_by_condition
more legible, do so. (I confess: that's my old code.) If there's not, forget it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have no idea what steps do I need to take to reproduce the problem, nor what is the actual change in this PR.
Please update the description with steps to reproduce, add a tests that performs those steps (if possible) and split the PR into two commits, one with the fix and another with refactoring (which was probably done to satisfy radon).
Orange/data/table.py
Outdated
@@ -1133,6 +1133,99 @@ def _filter_same_value(self, column, value, negate=False): | |||
sel = np.logical_not(sel) | |||
return self.from_table_rows(self, sel) | |||
|
|||
def _filter_by_condition(self, f, conjunction, sel, data_filter): # pragma: no cover |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please stop abusing pragma: no cover.
This method should be tested, along with every other line of code in the table.py module. If codecov complains that the code you have moved around (but not changed) is not covered, make sure that the refactoring is done in a separate commit (so that the change can be inspected separately) and state that in the comment. Or write some tests. Basically any other way to approach it is better than marking actual code with "no cover".
@astaric: Added a description how to reproduce the problem. |
Split the method into two, _filter_values_to_indicator handles multiple conditions, conjunctions/disjunctions and negations, while _filter_to_indicator handles application of a single ValueFilter.
Extract handling of FilterDiscrete, FilterContinuous and FitlerString to separate functions, each handling its own specifics.
Thank you. I have refactored the test a bit and split the _filter_values_indicator into multiple smaller functions (undoing the 383bab2 refactoring commit in the process). The old _filter_values_indicators tried to handle both conjunctions/disjunctions and specific filters at the same time. I have split that code into two functions. Furthermore, evaluation of the most complicated filters (Discrete, Continuous, String) has been extracted to separate methods. |
@janezd can you take a look at the new refactoring? I am almost sad that this is going to be thrown away once we move to pandas :) |
FWIW I totally don't mind if you do the throwing. 😄 I believe a rebase should mitigate the failing test. |
A restart worked as well :) |
Issue
Table throws error when trying to compare string values. A string value written in Select Rows widgets and a numerical value which is set as string.
https://sentry.io/biolab/orange3/issues/244859655/
Description of changes
Steps to reproduce the behavior
Put File and Select Rows widgets and connect them. Then open iris. Set one of four features to string and meta. Then choose that feature in Select Rows and choose filter "is before" or some other one.
Includes