[FIX] Select Rows and Table: Filtering string values #2176

jerneju · 2017-03-31T11:20:49Z

Issue

Table throws error when trying to compare string values. A string value written in Select Rows widgets and a numerical value which is set as string.
https://sentry.io/biolab/orange3/issues/244859655/

Description of changes

Steps to reproduce the behavior

Put File and Select Rows widgets and connect them. Then open iris. Set one of four features to string and meta. Then choose that feature in Select Rows and choose filter "is before" or some other one.

Includes

Code changes
Tests
Documentation

jerneju · 2017-03-31T11:25:16Z

https://sentry.io/biolab/orange3/issues/244859655/

codecov-io · 2017-03-31T11:37:34Z

Codecov Report

❗ No coverage uploaded for pull request base (master@8e2f09b). Click here to learn what that means.
The diff coverage is 97.5%.

@@            Coverage Diff            @@
##             master    #2176   +/-   ##
=========================================
  Coverage          ?   72.46%           
=========================================
  Files             ?      319           
  Lines             ?    55142           
  Branches          ?        0           
=========================================
  Hits              ?    39957           
  Misses            ?    15185           
  Partials          ?        0

janezd · 2017-03-31T21:50:22Z

Orange/data/table.py

@@ -1202,6 +1205,7 @@ def _filter_values_indicators(self, filter):
                sel = np.vectorize(f)(col)
            elif isinstance(f, (data_filter.FilterContinuous,
                                data_filter.FilterString)):
+                col = get_str_column(col, isinstance(f, data_filter.FilterString))


Why calling a (trivial) function instead of doing it right here? It further obscures the logic of this code, which is already too contrived.

Radon and cyclomatic complexity.

We measure cyclomatic complexity to make the code more, not less readable. :)

Find another way to decrease the complexity. Cheating with dictionaries typically makes the code slower or more complex, but in this case dictionaries are natural and fast. You can prepare a dictionary (as a class attribute, not a local variable)

operators = { f.Equal: operator.eq, f.NotEqual: operator.neq, f.Less: operator.le, ... }

Then you can replace most if-s with a single col = ops[f.oper](col, fmin).

Perhaps this does not work for some reason. In this case, find another solution to simplify the function. Extracting individual ifs to local functions is not a proper solution to high cyclomatic complexity (except when it makes the code actually better).

You can also extract the code the is within the for loop into a separate function. This would make sense since this is the code that applies the filter to a selection. If this is still not enough, make a separate function for each filter type (elifs in this block of code), and extract the function that prepares the initial sel column.

janezd · 2017-04-13T18:16:10Z

Orange/data/table.py

-                    sel += col
-            else:
-                raise TypeError("Invalid filter")
+            sel = self._filter_by_condition(f, conjunction, sel, data_filter)


This is better than before, but not by much. If there's a simple and obvious a way to make _filter_by_condition more legible, do so. (I confess: that's my old code.) If there's not, forget it.

astaric

I have no idea what steps do I need to take to reproduce the problem, nor what is the actual change in this PR.

Please update the description with steps to reproduce, add a tests that performs those steps (if possible) and split the PR into two commits, one with the fix and another with refactoring (which was probably done to satisfy radon).

astaric · 2017-04-24T20:10:50Z

Orange/data/table.py

@@ -1133,6 +1133,99 @@ def _filter_same_value(self, column, value, negate=False):
            sel = np.logical_not(sel)
        return self.from_table_rows(self, sel)

+    def _filter_by_condition(self, f, conjunction, sel, data_filter):  # pragma: no cover


Please stop abusing pragma: no cover.

This method should be tested, along with every other line of code in the table.py module. If codecov complains that the code you have moved around (but not changed) is not covered, make sure that the refactoring is done in a separate commit (so that the change can be inspected separately) and state that in the comment. Or write some tests. Basically any other way to approach it is better than marking actual code with "no cover".

jerneju · 2017-04-26T08:54:23Z

https://sentry.io/biolab/orange3/issues/260544377/

jerneju · 2017-04-26T08:58:37Z

@astaric: Added a description how to reproduce the problem.

https://sentry.io/biolab/orange3/issues/244859655/

Split the method into two, _filter_values_to_indicator handles multiple conditions, conjunctions/disjunctions and negations, while _filter_to_indicator handles application of a single ValueFilter.

Extract handling of FilterDiscrete, FilterContinuous and FitlerString to separate functions, each handling its own specifics.

astaric · 2017-04-26T17:00:45Z

Thank you.

I have refactored the test a bit and split the _filter_values_indicator into multiple smaller functions (undoing the 383bab2 refactoring commit in the process).

The old _filter_values_indicators tried to handle both conjunctions/disjunctions and specific filters at the same time. I have split that code into two functions. Furthermore, evaluation of the most complicated filters (Discrete, Continuous, String) has been extracted to separate methods.

astaric · 2017-04-26T17:02:03Z

@janezd can you take a look at the new refactoring? I am almost sad that this is going to be thrown away once we move to pandas :)

kernc · 2017-04-26T18:35:10Z

I am almost sad that this is going to be thrown away once we move to pandas

FWIW I totally don't mind if you do the throwing. 😄

I believe a rebase should mitigate the failing test.

astaric · 2017-04-27T05:43:01Z

A restart worked as well :)

janezd reviewed Mar 31, 2017

View reviewed changes

astaric added this to the 3.4.2 milestone Apr 7, 2017

astaric self-assigned this Apr 7, 2017

janezd reviewed Apr 13, 2017

View reviewed changes

astaric modified the milestones: 3.4.2, future Apr 19, 2017

astaric requested changes Apr 24, 2017

View reviewed changes

jerneju and others added 3 commits April 26, 2017 14:04

[FIX] Table: Filtering string values

030f536

https://sentry.io/biolab/orange3/issues/244859655/

Refactor test

5ef044e

Table: refactor _filter_values_indicators

885676b

Split the method into two, _filter_values_to_indicator handles multiple conditions, conjunctions/disjunctions and negations, while _filter_to_indicator handles application of a single ValueFilter.

astaric approved these changes Apr 26, 2017

View reviewed changes

Table: refactor _values_filter_to_indicator

d47acc8

Extract handling of FilterDiscrete, FilterContinuous and FitlerString to separate functions, each handling its own specifics.

astaric added 2 commits April 28, 2017 09:38

Fix upload of coverage report

200ee9e

table: Add test for FilterStringList

f379c0b

astaric merged commit b35b050 into biolab:master Apr 28, 2017

jerneju deleted the string-filter-table branch May 3, 2017 07:14

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[FIX] Select Rows and Table: Filtering string values #2176

[FIX] Select Rows and Table: Filtering string values #2176

jerneju commented Mar 31, 2017 •

edited

Loading

jerneju commented Mar 31, 2017

codecov-io commented Mar 31, 2017 •

edited

Loading

janezd Mar 31, 2017

jerneju Apr 3, 2017

janezd Apr 3, 2017

janezd Apr 13, 2017

astaric left a comment

astaric Apr 24, 2017

jerneju commented Apr 26, 2017

jerneju commented Apr 26, 2017

astaric commented Apr 26, 2017

astaric commented Apr 26, 2017

kernc commented Apr 26, 2017 •

edited

Loading

astaric commented Apr 27, 2017

[FIX] Select Rows and Table: Filtering string values #2176

[FIX] Select Rows and Table: Filtering string values #2176

Conversation

jerneju commented Mar 31, 2017 • edited Loading

Issue

Description of changes

Steps to reproduce the behavior

Includes

jerneju commented Mar 31, 2017

codecov-io commented Mar 31, 2017 • edited Loading

Codecov Report

janezd Mar 31, 2017

Choose a reason for hiding this comment

jerneju Apr 3, 2017

Choose a reason for hiding this comment

janezd Apr 3, 2017

Choose a reason for hiding this comment

janezd Apr 13, 2017

Choose a reason for hiding this comment

astaric left a comment

Choose a reason for hiding this comment

astaric Apr 24, 2017

Choose a reason for hiding this comment

jerneju commented Apr 26, 2017

jerneju commented Apr 26, 2017

astaric commented Apr 26, 2017

astaric commented Apr 26, 2017

kernc commented Apr 26, 2017 • edited Loading

astaric commented Apr 27, 2017

jerneju commented Mar 31, 2017 •

edited

Loading

codecov-io commented Mar 31, 2017 •

edited

Loading

kernc commented Apr 26, 2017 •

edited

Loading