Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FIX] Distributions: Fix selection output #6578

Merged
merged 3 commits into from
Sep 20, 2023

Conversation

janezd
Copy link
Contributor

@janezd janezd commented Sep 17, 2023

Issue

Fixes #6576.

Description of changes

Instead of storing indices of columns, self.selection now stores selected bar values.

Includes
  • Code changes
  • Tests

@janezd janezd force-pushed the distributions-selection-order branch from a537ec7 to 1856db3 Compare September 17, 2023 17:14
@markotoplak
Copy link
Member

The same setting is now used to store something else. Thus we should be careful. The safest option is to use some other name than selection. That would prevent strange crashes when loading new workflows on old versions (or vice-versa if we do not write migration, which should at minimum discard the selection).

If old behavior was invalid, migration does not make much sense, or does it? It was fine when not sorted, true? Then migration could be useful if that was the majority use case.

@markotoplak markotoplak changed the title Distributions: Fix selection output [FIX] Distributions: Fix selection output Sep 17, 2023
@janezd
Copy link
Contributor Author

janezd commented Sep 17, 2023

A sensible migration would be to keep the selection if there was no ordering, because in this case it worked, and to clear it otherwise. Unfortunately, the selection is a context setting and ordering is not. Clearing a context dependent setting depending on a non-context would be hacky.

I don't think we'll get any crashes because of not migrate anything, because self.order is just a permutation -- no out-of-range indexes.

There's another problem. The current code is already broken and this PR doesn't change it. Draw a distribution for Iris and select the last class. Now put Iris through Select Rows and remove one of the classes (check purging checkboxes, too). Put this, binary-class data into Distributions. show_selection will crash when trying to mark the third, non-existing column.

To fix this, the context would have to depend on the values of the plotted variable, but this would be too strict with regard to other settings. Therefore self.selection (or some renamed setting) should store names of selected value. If some of selected values are missing in the new data, they wouldn't be selected, that's it.

@markotoplak
Copy link
Member

A sensible migration would be to keep the selection if there was no ordering, because in this case it worked, and to clear it otherwise. Unfortunately, the selection is a context setting and ordering is not. Clearing a context dependent setting depending on a non-context would be hacky.

Actually, with this dependency it could work quite cleanly. I think migrate_settings gets all the settings, including a list of contexts. Here we could then clean all of them contexts if ordering is specified.

I don't think we'll get any crashes because of not migrate anything, because self.order is just a permutation -- no out-of-range indexes.

That is a relief, thanks.

@markotoplak
Copy link
Member

markotoplak commented Sep 18, 2023

To fix this, the context would have to depend on the values of the plotted variable, but this would be too strict with regard to other settings. Therefore self.selection (or some renamed setting) should store names of selected value. If some of selected values are missing in the new data, they wouldn't be selected, that's it.

I agree. Storing value indices in settings is probably always wrong, except if contexts ensure the same content/ordering (which is most often impractical, as you pointed out).

@janezd janezd force-pushed the distributions-selection-order branch from 1856db3 to 9254145 Compare September 18, 2023 09:44
@janezd janezd force-pushed the distributions-selection-order branch from 9254145 to 6ce7f67 Compare September 18, 2023 09:53
@janezd
Copy link
Contributor Author

janezd commented Sep 18, 2023

@markotoplak, I've fixed it differently now. It now also fixes the crash when a selected value no longer appears in a new input data.

For the latter fix, self.selection now stores values, not indices. Migration from indices to values requires a (potentially ordered) list of values, which is computed while plotting (I do not want to refactor this out; it's integrated into plotting in four functions that handle different cases; see, e.g. https://github.com/biolab/orange3/pull/6578/files#diff-6e553113df1df05af9603e23aaaf9410dc817c405f26ea4f37a59f76fa76fdaeR689). Therefore, migration is done on the fly, as part of a clean-up that is required anyway (https://github.com/biolab/orange3/pull/6578/files#diff-6e553113df1df05af9603e23aaaf9410dc817c405f26ea4f37a59f76fa76fdaeR1135).

@janezd janezd force-pushed the distributions-selection-order branch from b80d9c5 to 1674965 Compare September 19, 2023 16:41
@janezd janezd force-pushed the distributions-selection-order branch from 1674965 to c2ab198 Compare September 19, 2023 16:54
@janezd janezd force-pushed the distributions-selection-order branch from c2ab198 to 52a12f7 Compare September 19, 2023 17:14
@markotoplak
Copy link
Member

I tried torturing this but my biggest complaint would be choice of variable names in add_range, which are about indices, but now actually include values. Because that is so very local let's not worry about it. Merging.

@markotoplak markotoplak merged commit 041c0ce into biolab:master Sep 20, 2023
16 of 18 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Distributions outputs wrong data
2 participants