[FIX] Continuize: Disable normalizing sparse data #4379

VesnaT · 2020-01-30T08:20:11Z

Issue

Fixes #4378

Description of changes

Disable the "Normalize by span" and "Normalize by standard deviation" radio buttons for sparse datasets.

Includes

Code changes
Tests
Documentation

codecov · 2020-01-30T08:31:11Z

Codecov Report

Merging #4379 into master will increase coverage by <.01%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master    #4379      +/-   ##
==========================================
+ Coverage   87.13%   87.14%   +<.01%     
==========================================
  Files         399      399              
  Lines       72901    72936      +35     
==========================================
+ Hits        63521    63557      +36     
+ Misses       9380     9379       -1

markotoplak · 2020-01-30T08:57:26Z

What kind of normalization can we still have with sparse data then? We could still do normalization that does some division or multiplication, we just have to avoid shifts (plus, minus).
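To illustrate the point about shifts, here is a minimal sketch in plain Python. It does not use Orange's code; the dict-based column and the helper names `scale` and `shift` are assumptions made only for this example. Multiplying leaves implicit zeros untouched, while adding an offset turns every implicit zero into a stored value.

```python
# Hypothetical sparse column: {row_index: value}; rows not listed are implicit zeros.

def scale(col, factor):
    """Multiply stored values; implicit zeros stay zero, so no new entries appear."""
    return {i: v * factor for i, v in col.items()}


def shift(col, n_rows, offset):
    """Add an offset to every row; implicit zeros become `offset` and must be stored."""
    return {i: col.get(i, 0.0) + offset for i in range(n_rows)}


n_rows = 1000
col = {3: 2.0, 250: 4.0, 999: 8.0}

scaled = scale(col, 0.5)
shifted = shift(col, n_rows, -1.0)

# Number of stored entries before and after each operation.
print(len(col), len(scaled), len(shifted))
```

Scaling keeps the three stored entries, whereas the shift has to store a value for each of the 1000 rows, which is why centering defeats the sparse representation.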

Another thing: I never associated sparse data and discrete values, but yes, why not... Where do we get discrete sparse data in Orange? Is it directly read from a file or generated by some text-mining widget?

ajdapretnar · 2020-01-30T09:02:06Z

🤔 I think that if you have some discrete variables in the corpus before bag of words, they would inevitably get transformed into sparse alongside words.

VesnaT · 2020-01-30T09:03:02Z

The example is described in the issue. And yes, it does not make sense, but it still should not crash.

markotoplak · 2020-01-30T09:08:16Z

Yes, but using appropriate operations for sparse data would be better than disabling options. Normalization by span is something that could still be done for sparse data, it should just not be centered.
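A rough sketch of what uncentered normalization by span could look like, in plain Python. This is not what Continuize or Preprocess implement; the dict-based column and the name `span_scale` are hypothetical. The column is only divided by its span (max minus min, counting implicit zeros), so the zeros stay zeros and the result has a range of width 1 without being moved to start at 0.

```python
# Hypothetical sparse column: {row_index: value}; rows not listed are implicit zeros.

def span_scale(col, n_rows):
    """Divide stored values by the column span without subtracting the minimum."""
    values = list(col.values())
    if len(col) < n_rows:
        values.append(0.0)  # account for the implicit zeros
    if not values:
        return {}
    span = max(values) - min(values)
    if span == 0:
        return dict(col)  # constant column: nothing to scale
    return {i: v / span for i, v in col.items()}


col = {0: -2.0, 4: 6.0}
out = span_scale(col, n_rows=10)
print(out)  # {0: -0.25, 4: 0.75}
```

The number of stored entries is unchanged; only their magnitudes are rescaled.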

VesnaT · 2020-01-30T09:19:00Z

Some types of normalization can be done using a Preprocess widget. Should these be added to the Continuize widget as well?

Continuize: Disable normalizing sparse data

f0523b5

janezd added the "needs discussion" label (Core developers need to discuss the issue) Feb 13, 2020

janezd assigned markotoplak Feb 14, 2020

markotoplak merged commit 0a65399 into biolab:master Feb 14, 2020

markotoplak mentioned this pull request Feb 14, 2020

Unify normalization in Preprocess and Continuize widgets #4418

Closed
