[FIX] Continuize: Disable normalizing sparse data #4379

Merged
merged 1 commit into from
Feb 14, 2020

VesnaT
Contributor

@VesnaT VesnaT commented Jan 30, 2020

Issue

Fixes #4378

Description of changes

Disable Normalize by span and Normalize by standard deviation radio buttons for sparse datasets.
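The rule described above can be sketched independently of the GUI. This is a minimal illustration, not the code merged in this PR: the function name `enabled_options` and the "Leave as is" label are hypothetical, and only the two normalization labels come from the description.

```python
from scipy import sparse

# The two normalization labels come from the PR description;
# "Leave as is" is a hypothetical placeholder for the non-normalizing option.
OPTIONS = ["Leave as is", "Normalize by span", "Normalize by standard deviation"]
DISABLED_FOR_SPARSE = {"Normalize by span", "Normalize by standard deviation"}


def enabled_options(X):
    """Map each option label to True if it should stay enabled for input X."""
    is_sparse = sparse.issparse(X)
    return {name: not (is_sparse and name in DISABLED_FOR_SPARSE)
            for name in OPTIONS}
```

In a widget, the resulting flags would typically drive each radio button's enabled state whenever new data arrives.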

Includes
  • Code changes
  • Tests
  • Documentation

@codecov

codecov bot commented Jan 30, 2020

Codecov Report

Merging #4379 into master will increase coverage by <.01%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master    #4379      +/-   ##
==========================================
+ Coverage   87.13%   87.14%   +<.01%     
==========================================
  Files         399      399              
  Lines       72901    72936      +35     
==========================================
+ Hits        63521    63557      +36     
+ Misses       9380     9379       -1

@markotoplak
Member

What kind of normalization can we still have with sparse data then? We could still do normalization that does some division or multiplication, we just have to avoid shifts (plus, minus).
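To illustrate the point about avoiding shifts: dividing by a per-column constant keeps implicit zeros at zero, whereas subtracting a mean would turn them into stored values. The sketch below uses SciPy directly and is not Orange's implementation; `scale_columns_by_std` is a hypothetical helper name.

```python
import numpy as np
from scipy import sparse


def scale_columns_by_std(X):
    """Divide each column by its standard deviation, without centering."""
    X = sparse.csc_matrix(X, dtype=float)
    # Column means of x and x^2, with implicit zeros counted as zeros.
    mean = np.asarray(X.mean(axis=0)).ravel()
    mean_sq = np.asarray(X.multiply(X).mean(axis=0)).ravel()
    # Population standard deviation: sqrt(E[x^2] - E[x]^2).
    std = np.sqrt(np.maximum(mean_sq - mean ** 2, 0.0))
    std[std == 0] = 1.0  # leave constant columns unchanged
    # Right-multiplying by a diagonal matrix scales columns and
    # touches only the entries that are already stored.
    return (X @ sparse.diags(1.0 / std)).tocsc()
```

The number of stored entries is unchanged by this scaling, while each non-constant column ends up with unit standard deviation but a nonzero mean.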

Another thing: I never associated sparse data and discrete values, but yes, why not... Where do we get discrete sparse data in Orange? Is it directly read from a file or generated by some text-mining widget?

@ajdapretnar
Contributor

🤔 I think that if you have some discrete variables in the corpus before bag of words, they would inevitably get transformed into sparse alongside words.

@VesnaT
Contributor Author

VesnaT commented Jan 30, 2020

The example is described in the issue and yes, it does not make sense but it still should not crash.

@markotoplak
Member

Yes, but using appropriate operations for sparse data would be better than disabling options. Normalization by span could still be done for sparse data; it just should not be centered.
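A rough sketch of what uncentered normalization by span might look like on a SciPy sparse matrix follows. It is an assumption about one possible approach, not what Orange or this PR does, and `scale_columns_by_span` is a hypothetical name.

```python
import numpy as np
from scipy import sparse


def scale_columns_by_span(X):
    """Divide each column by (max - min) without subtracting the minimum."""
    X = sparse.csc_matrix(X, dtype=float)
    # Sparse max/min take implicit zeros into account.
    col_max = X.max(axis=0).toarray().ravel()
    col_min = X.min(axis=0).toarray().ravel()
    span = col_max - col_min
    span[span == 0] = 1.0  # leave constant columns unchanged
    # Column-wise scaling only; no shift, so zeros stay implicit.
    return (X @ sparse.diags(1.0 / span)).tocsc()
```

After this scaling every non-constant column has a span of 1. Columns whose minimum is zero land in [0, 1]; columns with negative values keep their offset (for example, [-2, 0, 2] becomes [-0.5, 0, 0.5]), which is the price of not centering.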

@VesnaT
Contributor Author

VesnaT commented Jan 30, 2020

Some types of normalization can be done using a Preprocess widget. Should these be added to the Continuize widget as well?

@janezd janezd added the needs discussion Core developers need to discuss the issue label Feb 13, 2020
@markotoplak markotoplak merged commit 0a65399 into biolab:master Feb 14, 2020
Labels
needs discussion Core developers need to discuss the issue
Development

Successfully merging this pull request may close these issues.

Continuize: Normalizing sparse data fails
4 participants