Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FIX] File: Provide percent missing values in Info box #3305

Merged
merged 6 commits into from
Nov 29, 2018
Merged

[FIX] File: Provide percent missing values in Info box #3305

merged 6 commits into from
Nov 29, 2018

Conversation

MaxGreil
Copy link
Contributor

Issue

Fixes issue #3157.

Description of changes

Info Box in widget File now displays the same information as the info box in widget Data Table

Includes
  • Code changes
  • Tests
  • Documentation

@MaxGreil MaxGreil changed the title File: Provide percent missing values in Info box [FIX] File: Provide percent missing values in Info box Oct 24, 2018
@MaxGreil
Copy link
Contributor Author

@markotoplak can you please have a look at this?

miss = "%.1f" % table.get_nan_percentile_class()
text += " ({}% missing values)".format(miss)
else:
text += " (no missing values)"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These five lines repeat twice. Could you construct this part of the string beforeif (but only if data has class of any kind)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand what you have in mind here. Which if do you mean and how should I then construct the string before it?

format(len(domain.class_var.values))
if table.has_missing_class():
miss = "%.1f" % table.get_nan_percentile_class()
text += " ({}% missing values)".format(miss)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not something like

text += " ({:.1f}% missing values)".format(table.get_nan_percentile_class())

Or

miss = table.get_nan_percentile_class()
text += " ({:.1f}% missing values)".format(miss)

, so that we don't format the string in two lines in two different ways.

miss = "%.1f" % table.get_nan_percentile_class()
text += " ({}% missing values)".format(miss)
else:
text += " (no missing values)"
elif table.domain.class_vars:
text += "<br/>Multi-target; {} target variables.".format(
len(table.domain.class_vars))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you extract the above five lines, you could also report the percentage of unknowns for this case (there's no reason why this case should be different).

return np.isnan(self.X).sum() / self.X.size * 100

def get_nan_percentile_class(self):
return np.isnan(self._Y).sum() / self._Y.size * 100
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be better if these two functions reported relative frequencies, not percentages. That is, I would remove * 100. Percentages are more "human-friendly", while these two functions look general and could also be used in other, non-GUI-related code.

@codecov
Copy link

codecov bot commented Nov 26, 2018

Codecov Report

Merging #3305 into master will decrease coverage by 0.21%.
The diff coverage is 73.68%.

@@            Coverage Diff             @@
##           master    #3305      +/-   ##
==========================================
- Coverage   82.57%   82.35%   -0.22%     
==========================================
  Files         348      360      +12     
  Lines       61395    64197    +2802     
==========================================
+ Hits        50697    52871    +2174     
- Misses      10698    11326     +628

def has_missing_class(self):
"""Return `True` if there are any missing class values."""
return bn.anynan(self._Y)

def get_nan_frequency_attribute(self):
return np.isnan(self.X).sum() / self.X.size
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I just noticed one more: could you return 0 if self.X.size is 0, to avoid division by zero? Same in the next function?

@janezd
Copy link
Contributor

janezd commented Nov 29, 2018

Thank you both. I also added some tests.

@janezd janezd merged commit e3a68bc into biolab:master Nov 29, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants