Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FIX] Outlier detection: keep instance ids, make thread safe #5427

Merged
merged 1 commit into from
May 12, 2021

Conversation

markotoplak
Copy link
Member

Issue
  1. Outlier detection did not keep instance ids, so subsets did work with the Data output of OWOutliers.
  2. Also, it was not thread-safe: multiple calls to _OutlierModel.__call__ could result in undefined behavior, because some caching was done at the object level.
Description of changes

To fix (1), I just used domain transformations instead of copying. To fix (2), I replaced object-level caching with SharedComputeValue.

Includes
  • Code changes
  • Tests
  • Documentation

@markotoplak markotoplak force-pushed the fix-outlier-instance-id branch 2 times, most recently from 6c50278 to 462dbe1 Compare May 6, 2021 18:44
@codecov
Copy link

codecov bot commented May 6, 2021

Codecov Report

Merging #5427 (587bf9d) into master (d4e259f) will decrease coverage by 0.21%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master    #5427      +/-   ##
==========================================
- Coverage   86.37%   86.15%   -0.22%     
==========================================
  Files         303      303              
  Lines       62157    61606     -551     
==========================================
- Hits        53685    53077     -608     
- Misses       8472     8529      +57     

Copy link
Contributor

@VesnaT VesnaT left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A minor comment. Other than that, I'm happy to merge.

def __call__(self, data):
return self.model.data_to_model_domain(data)


class _OutlierModel(SklModel):
def __init__(self, skl_model):
super().__init__(skl_model)
self._cached_data = None
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This variable is no longer needed.

Outlier detection did not keep instance ids, so subsets did not work.
Also, it was not thread safe: multiple calls to _OutlierModel.__call__
could result in undefined behaviour, because some caching was done at
the object level.
@markotoplak markotoplak force-pushed the fix-outlier-instance-id branch from 462dbe1 to 587bf9d Compare May 11, 2021 20:37
@markotoplak
Copy link
Member Author

@VesnaT, thanks!

@markotoplak markotoplak merged commit 3773093 into biolab:master May 12, 2021
@markotoplak markotoplak deleted the fix-outlier-instance-id branch November 25, 2021 15:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants