Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ENH] Add an option to Concatenate to merge columns with different formulae #4831

Merged
merged 3 commits into from
Jun 5, 2020

Conversation

janezd
Copy link
Contributor

@janezd janezd commented May 28, 2020

Issue

Ref gh-4791.

Description of changes

compute_value is removed only when two variables with the same name are merged, that is, if a variable of the same type appears in the same part (attributes, classes, metas) with different compute_value. (The widget does not merge variables that appear in different parts.)

The option applies only in absence of primary table, so the related checkbox is in a box that is enabled only when there is no primary table.

Includes
  • Code changes
  • Tests
  • Documentation

@janezd janezd force-pushed the concatenate-ignore-compute-value branch from 5498f8f to a86c9b9 Compare May 28, 2020 19:50
@codecov
Copy link

codecov bot commented May 28, 2020

Codecov Report

Merging #4831 into master will decrease coverage by 0.01%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master    #4831      +/-   ##
==========================================
- Coverage   84.12%   84.11%   -0.02%     
==========================================
  Files         282      277       -5     
  Lines       57327    56523     -804     
==========================================
- Hits        48229    47545     -684     
+ Misses       9098     8978     -120     

@janezd janezd force-pushed the concatenate-ignore-compute-value branch from 4c4c79a to 6583592 Compare May 30, 2020 19:00
@ajdapretnar
Copy link
Contributor

This PR works as intended for my use case.

However, it crashes on some Text data.
To reproduce: Corpus - Bag of Words - Concatenate. Add the second Corpus without BoW.

I didn't spend much time exploring this, but it seems like Concatenate treats corpora as Tables instead of Corpus. When creating a new instance, it should be the same as type as input. If types don't match, we already have an error (L285).

Traceback (most recent call last):
  File "/Users/ajda/orange/orange-canvas-core/orangecanvas/scheme/signalmanager.py", line 936, in __process_next
    if self.__process_next_helper(use_max_active=True):
  File "/Users/ajda/orange/orange-canvas-core/orangecanvas/scheme/signalmanager.py", line 974, in __process_next_helper
    self.process_node(selected_node)
  File "/Users/ajda/orange/orange-canvas-core/orangecanvas/scheme/signalmanager.py", line 605, in process_node
    self.send_to_node(node, signals_in)
  File "/Users/ajda/orange/orange-widget-base/orangewidget/workflow/widgetsscheme.py", line 792, in send_to_node
    self.process_signals_for_widget(node, widget, signals)
  File "/Users/ajda/orange/orange-widget-base/orangewidget/workflow/widgetsscheme.py", line 833, in process_signals_for_widget
    widget.handleNewSignals()
  File "/Users/ajda/orange/orange3/Orange/widgets/data/owconcatenate.py", line 198, in handleNewSignals
    self.unconditional_apply()
  File "/Users/ajda/orange/orange3/Orange/widgets/data/owconcatenate.py", line 240, in apply
    tables = [table.transform(domain) for table in tables]
  File "/Users/ajda/orange/orange3/Orange/widgets/data/owconcatenate.py", line 240, in <listcomp>
    tables = [table.transform(domain) for table in tables]
  File "/Users/ajda/orange/orange3/Orange/data/table.py", line 520, in transform
    return type(self).from_table(domain, self)
  File "/Users/ajda/orange/orange3/Orange/data/table.py", line 461, in from_table
    self.X = get_columns(row_indices, conversion.attributes, n_rows,
  File "/Users/ajda/orange/orange3/Orange/data/table.py", line 384, in get_columns
    col.compute_shared(source), \
  File "/Users/ajda/orange/orange3-text/orangecontrib/text/vectorization/base.py", line 58, in __call__
    corpus = self.vectorizer.transform(corpus,
  File "/Users/ajda/orange/orange3-text/orangecontrib/text/sentiment/__init__.py", line 37, in transform
    corpus = WordPunctTokenizer()(corpus)
  File "/Users/ajda/orange/orange3-text/orangecontrib/text/preprocess/tokenize.py", line 24, in __call__
    return self._store_tokens_from_documents(corpus, callback)
  File "/Users/ajda/orange/orange3-text/orangecontrib/text/preprocess/preprocess.py", line 76, in _store_tokens_from_documents
    tokens, n = [], len(corpus.pp_documents)
AttributeError: 'Table' object has no attribute 'pp_documents'

@janezd
Copy link
Contributor Author

janezd commented Jun 1, 2020

Thanks, @ajdapretnar. I know next to nothing (or rather nothing) about Corpus. I assume the problem is https://github.com/biolab/orange3/pull/4831/files#diff-ddccc373d231bf134feb661e12b5afeaR276. Could you please explore how to fix it? Could you, for a start, try replacing Orange.data.Table.from_numpy with type(table).from_numpy?

@janezd janezd force-pushed the concatenate-ignore-compute-value branch from 6583592 to fab7132 Compare June 1, 2020 15:06
@janezd
Copy link
Contributor Author

janezd commented Jun 1, 2020

As discussed on Slack, I now replaced Orange.data.Table with type(table).

@markotoplak
Copy link
Member

Works well, except that now widget does something else by default. Two possible solutions:

  • Set ignore_compute_value to False by default.
  • Set ignore_compute_value to default True and migrate old settings, so that ignore_compute_value is set to False.

@janezd, which one do you prefer?

@janezd janezd force-pushed the concatenate-ignore-compute-value branch from fab7132 to d3df2b0 Compare June 5, 2020 13:08
@janezd
Copy link
Contributor Author

janezd commented Jun 5, 2020

I changed the default to False. Not just for compatibility, this is probably a better default anyway.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants