[ENH] Scatter Plot Graph and Scaling can handle metas #2699

jerneju · 2017-10-20T14:37:50Z

Issue

Scatter Plot moves primitive meas to attributes.

Description of changes

Includes

Code changes
Tests
Documentation

janezd

This is so much simpler. All widgets should use models for combos/lists of variables.

janezd · 2017-10-20T15:45:39Z

Orange/widgets/utils/scaling.py

-                no_jit[index] /= 2 * len(attr.values)
+                c *= 2
+                c += 1
+                c /= 2 * len(attr.values)


Isn't (2x+1)/(2a) same as (x + 0.5)/a? Aren't the above three lines the same as

c += 0.5 c /= len(attr.values)

After rewriting it this way, I also understand what the code actually does. :) Moves the data into the middle of intervals and normalizes it to [0, 1].

I know this was here before, but only God knows why it was written like this.

janezd · 2017-10-20T15:46:07Z

Orange/widgets/utils/scaling.py


    def _compute_jittered_data(self):
        data = self.data
        self.jittered_data = self.scaled_data.copy()
        random = np.random.RandomState(seed=self.jitter_seed)
-        for index, col in enumerate(self.jittered_data):
+        domain = self.domain
+        for attr in domain.attributes + domain.class_vars + domain.metas:


You can use chain(domain, domain.metas) in such situations.

janezd · 2017-10-20T15:48:08Z

Orange/widgets/utils/scaling.py

-        self.data = data
+        domain = data.domain
+        self.primitives = data.domain.attributes + data.domain.class_vars + \
+                          tuple([v for v in data.domain.metas if v.is_primitive()])


You could use chain(data.domain, (v for v in data.domain.metas if v.is_primitive())).

If you keep it as it is: don't use tuple([...]): just use generators, tuple(...), without constructing a list first.

janezd · 2017-10-20T15:48:31Z

Orange/widgets/utils/scaling.py

+                          tuple([v for v in data.domain.metas if v.is_primitive()])
+        new_domain = Domain(attributes=domain.attributes,
+                            class_vars=domain.class_vars,
+                            metas=tuple([v for v in domain.metas if v.is_primitive()]))


Same - no need to construct a list first.

janezd · 2017-10-20T15:49:52Z

Orange/widgets/utils/scaling.py

+        col = self.jittered_data.get_column_view(attr)[0]
+        col = 1 - col
+        col = self.scaled_data.get_column_view(attr)[0]
+        col = 1 - col


These four lines have no effect, do they?

Did you mean something like this?

col *= -1 col += 1

Or (shorter, but maybe slower) col[:] = 1 - col.

Yes, the latter one can consume even 4 times more time.

janezd · 2017-10-20T15:54:35Z

Orange/widgets/unsupervised/owmds.py

-
-        attributes = self.data.domain.attributes + (self.variable_x, self.variable_y) + \
-                     primitive_metas
+        attributes = data.domain.attributes + (self.variable_x, self.variable_y)


@BlazZupan, do we want the columns that contain x and y coordinates from the MDS to appear as ordinary attributes or as metas?

They used to be in metas, but now that visualization widgets can show metas as well, I would tend to store x and y as metas to not pollute that feature set.

Actually this line here is not related to that (it's for constructing data for plotting).
Output data has new coordinates in metas as we want, see commit function below (L671)

janezd · 2017-10-20T15:55:52Z

Orange/widgets/utils/scaling.py

-            attr = self.domain[index]
+            getCached(self.data, DomainBasicStats, (self.data, True))
+        domain = self.domain
+        for attr in domain.attributes + domain.class_vars + domain.metas:


Maybe chain(domain, domain.metas)?

To be sincere, I would prefer domain.all_variables or something similar. Like we have domain.variables = domain.attributes + domain.class_vars we could have something that includes all three: attributes, class_vars and metas.

I hate this, too.

When one gets older, (s)he start adding historic notes to everything. So let me say that in the beginning, like 20 years ago, Orange used to have attributes and classes, and both together were called variables. Then I introduced meta attributes, and they didn't mean the same thing as today: meta attributes were (essentially) sparse, while attributes and classes were dense ("essentially", because there was no numpy, these were C++ structures). Since all code - before the introduction of metas - expected that variables were attributes + classes, I would break everything if I added metas there. It also made more sense than today since variables (=attributes + classes) were the dense part of the data.

The idea that any part can be dense or sparse, and that metas are not so different from attributes and so forth evolved gradually even in Orange 3. Hence we have not initially decided that variables will be attributes + class + meta. To make things worse, we (most probably: I) extended this bad design to iteration over Domain. @kernc (I think) correctly pointed out that since var in domain returns True also when var is in domain.metas, so for var in domain should iterate over attributes, classes and also metas. So in all the places where you could use chain(domain, domain.metas) in your code, just domain should suffice instead. Alas, fixing the iteration over Domain would break so much code that we gave up. This horrible design decision stays forever, I fear.

I don't like all_variables. variables should already be all, so all_variables sounds almost like really_all_variables. Plus, the next thing you'll want will be primitive_variables. And maybe also continuous_variables. I wouldn't like to multiply the number of domains properties.

For the sake of brainstorming - and if this is indeed necessary - what about having a method Domain.all(condition). If condition is a type derived from Variable, all returns all variables of that type (and derived types). Hence , Domain.all(DiscreteVariable) returns all discrete variables. We then add PrimitiveVariable as a new common base class for DiscreteVariable and ContinuousVariable. With that, Domain.all(PrimitiveVariable) returns all primitive variables. Finally, the default value for condition is Variable, so Domain.all() returns attributes + class_vars + metas.

Method all could also accept other types of conditions if we can think of any.

Just for brainstorming.

Great idea, I implemented it.

janezd · 2017-10-20T16:02:13Z

Orange/widgets/utils/scaling.py

        """
        Get array of 0 and 1 of len = len(self.data). If there is a missing
        value at any attribute in indices return 0 for that instance.
        """
        if self.valid_data_array is None or len(self.valid_data_array) == 0:
            return np.array([], np.bool)
+        domain = self.domain
+        indices = []
+        for index, attr in enumerate(domain.attributes + domain.class_vars + domain.metas):


enumerate(chain(domain, domain.metas))?

kernc · 2017-10-23T11:38:44Z

Orange/data/domain.py

@@ -211,6 +211,9 @@ def variables(self):
    def metas(self):
        return self._metas

+    def all(self, condition=Variable):
+        return [v for v in chain(self._variables, self._metas) if isinstance(v, condition)]


.all() is not good. The idiom means something completely different in Python/NumPy context. .filter() would be better, .where() would be best.

This change should require re-touching everywhere the former ([]+[] or chain) idiom was used.

I don't like where. :)

all reads well: I want all discrete variables, hence domain.all(DiscreteVariable). But I agree the name is wrong wrt its usual meaning in Python.

domain.where(DiscreteVariable) doesn't read so well and it also means two different things in numpy, one of which is similar to this Domain.where (but with different arguments) and one which is completely different.

domain.filter(DiscreteVariable) is sort of fine, but it gives an impression that it will return a new instance of Domain with just discrete variables.

What about domain.collect(DiscreteVariable)? Or domain.pick(DiscreteVariable) -- though I prefer collect?

codecov-io · 2017-10-23T11:47:57Z

Codecov Report

Merging #2699 into master will decrease coverage by 0.01%.
The diff coverage is 88.39%.

@@            Coverage Diff             @@
##           master    #2699      +/-   ##
==========================================
- Coverage   76.06%   76.05%   -0.02%     
==========================================
  Files         338      338              
  Lines       59675    59670       -5     
==========================================
- Hits        45391    45381      -10     
- Misses      14284    14289       +5

astaric · 2017-10-23T17:49:28Z

Orange/data/domain.py

@@ -211,6 +211,9 @@ def variables(self):
    def metas(self):
        return self._metas

+    def where(self, condition=Variable):


Could you do this in a separate PR?
While I agree that this change is neatly hidden here, I would prefer to have modifications to Variables / Domain in a separate, clearly marked PR (where we can discuss different alternatives).

astaric · 2017-10-23T17:50:29Z

Orange/widgets/utils/scaling.py

-            attr = self.domain[index]
+            getCached(self.data, DomainBasicStats, (self.data, True))
+        domain = self.domain
+        for attr in domain.where():


This looks weird.

Continued in #2706 .

jerneju · 2017-11-03T12:27:47Z

https://sentry.io/biolab/orange3/issues/307091588/

lanzagar · 2017-11-06T13:07:23Z

Orange/widgets/visualize/owscatterplot.py

+        cols = []
+        for var in attrs:
+            cols.append(graph.jittered_data.get_column_view(var)[0][valid])
+        X = np.vstack(cols).T


X = np.column_stack(cols)

lanzagar · 2017-11-07T10:27:00Z

Orange/widgets/utils/scaling.py

+            all_data = (data.X, Y, data.metas)
+        else:
+            all_data = (data.X, Y)
+        all_data = np.hstack(all_data).T


This can have dtype==object when stacking with metas which will cause np.isfinite to fail 2 lines below

Thanks. Fixed.

Fixed here, but I get similar errors elsewhere (see sentry report).
Try fixing all metas dtype problems (converting all primitive metas to float might be enough, maybe something else is also needed). Then please test thoroughly! Use at least the workflow from the sentry report.

I found another error caused by Corpus (issue: biolab/orange3-text#324 )

jerneju · 2017-11-07T10:52:06Z

https://sentry.io/biolab/orange3/issues/400459027/

[ENH] Scatter Plot Graph and Scaling can handle metas

janezd reviewed Oct 20, 2017

View reviewed changes

kernc previously requested changes Oct 23, 2017

View reviewed changes

astaric reviewed Oct 23, 2017

View reviewed changes

jerneju mentioned this pull request Oct 24, 2017

[RFC] Domain: return all variables #2706

Closed

3 tasks

jerneju changed the title ~~[WIP][ENH] Scatter Plot Graph and Scaling can handle metas~~ [ENH] Scatter Plot Graph and Scaling can handle metas Oct 30, 2017

jerneju added this to the 3.8 milestone Nov 3, 2017

lanzagar reviewed Nov 6, 2017

View reviewed changes

lanzagar requested changes Nov 7, 2017

View reviewed changes

jerneju added 5 commits November 10, 2017 08:45

Scatter Plot Graph and Scaling: handle metas

9b262f2

Scatter Plot: no more metas handling

ca4db00

Scatter Plot: fix VizRank

7738e5a

MDS: no more metas handling

81247dd

Scatter Plot: add metas test

0fdccda

lanzagar approved these changes Nov 10, 2017

View reviewed changes

lanzagar merged commit 45d8402 into biolab:master Nov 10, 2017

jerneju deleted the spg-scaling-metas branch November 10, 2017 10:22

markotoplak modified the milestones: 3.8, 3.7.1 Nov 10, 2017

jerneju mentioned this pull request Nov 14, 2017

[FIX] Scatter Plot: subset data #2773

Merged

3 tasks

nikicc pushed a commit to nikicc/orange3 that referenced this pull request Nov 17, 2017

Merge pull request biolab#2699 from jerneju/spg-scaling-metas

b7545b3

[ENH] Scatter Plot Graph and Scaling can handle metas

jerneju mentioned this pull request Jan 18, 2018

[FIX] t-SNE: remove metas handling biolab/orange3-single-cell#48

Merged

3 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[ENH] Scatter Plot Graph and Scaling can handle metas #2699

[ENH] Scatter Plot Graph and Scaling can handle metas #2699

jerneju commented Oct 20, 2017 •

edited

Loading

janezd left a comment

janezd Oct 20, 2017

janezd Oct 20, 2017

janezd Oct 20, 2017

janezd Oct 20, 2017

janezd Oct 20, 2017

jerneju Oct 23, 2017

janezd Oct 20, 2017

lanzagar Nov 7, 2017

janezd Oct 20, 2017

jerneju Oct 20, 2017 •

edited

Loading

janezd Oct 20, 2017

janezd Oct 20, 2017

jerneju Oct 23, 2017

janezd Oct 20, 2017

kernc Oct 23, 2017 •

edited

Loading

jerneju Oct 23, 2017

janezd Oct 23, 2017

codecov-io commented Oct 23, 2017 •

edited

Loading

astaric Oct 23, 2017

astaric Oct 23, 2017

janezd Oct 23, 2017

jerneju commented Nov 3, 2017

lanzagar Nov 6, 2017

lanzagar Nov 7, 2017

jerneju Nov 7, 2017

lanzagar Nov 7, 2017

jerneju Nov 8, 2017 •

edited

Loading

jerneju commented Nov 7, 2017

[ENH] Scatter Plot Graph and Scaling can handle metas #2699

[ENH] Scatter Plot Graph and Scaling can handle metas #2699

Conversation

jerneju commented Oct 20, 2017 • edited Loading

Issue

Description of changes

Includes

janezd left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jerneju Oct 20, 2017 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

kernc Oct 23, 2017 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

codecov-io commented Oct 23, 2017 • edited Loading

Codecov Report

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jerneju commented Nov 3, 2017

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jerneju Nov 8, 2017 • edited Loading

Choose a reason for hiding this comment

jerneju commented Nov 7, 2017

jerneju commented Oct 20, 2017 •

edited

Loading

jerneju Oct 20, 2017 •

edited

Loading

kernc Oct 23, 2017 •

edited

Loading

codecov-io commented Oct 23, 2017 •

edited

Loading

jerneju Nov 8, 2017 •

edited

Loading