Allow BatchBALD to not consider some samples #62

Merged
Leengit merged 4 commits into DigitalSlideArchive:main from batchbald_excluded_samples
Aug 26, 2024

Leengit
Collaborator

@Leengit Leengit commented Aug 22, 2024

For efficiency, BatchBALD does not compute a BatchBALD score for every sample. Instead, it finds and scores only the best samples. The remaining samples are given larger scores (indicating higher certainty) by a fallback computation, which is currently a possibly shifted confidence score.

However, sometimes we don't want BatchBALD to spend any of its "best-sample" slots on certain samples (e.g., samples that the user has already labeled). This is now supported via a new ComputeCertainty.set_batchbald_excluded_samples method. The specified samples will never be selected as best samples and will instead always receive a possibly shifted confidence score.
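The behavior described above could be sketched roughly as follows. This is a simplified, standalone illustration and not the code in this PR. The function name `score_with_exclusions`, its parameters, the lowest-score-first selection that stands in for the real BatchBALD selection, and the particular shifting rule are all assumptions made for the example. The signature of `ComputeCertainty.set_batchbald_excluded_samples` is not shown in this conversation, so it is not used here.

```python
from typing import List, Sequence, Set, Tuple


def score_with_exclusions(
    batch_scores: Sequence[float],
    confidence: Sequence[float],
    batch_size: int,
    excluded: Set[int],
) -> Tuple[List[int], List[float]]:
    """Hypothetical helper: pick "best" samples while skipping excluded ones.

    Lower score means less certain, i.e. more worth labeling.
    Returns (selected indices, one certainty score per sample).
    """
    n = len(batch_scores)

    # Excluded samples never compete for a "best-sample" slot.
    candidates = [i for i in range(n) if i not in excluded]

    # Stand-in for the real BatchBALD selection (an assumption):
    # simply take the lowest-scoring candidates.
    selected = sorted(candidates, key=lambda i: batch_scores[i])[:batch_size]
    chosen = set(selected)

    # Everything not selected, including the excluded samples, uses the fallback.
    rest = [i for i in range(n) if i not in chosen]

    # Assumed shifting rule: raise the fallback confidence scores just enough
    # that none of them falls below the highest selected score.
    shift = 0.0
    if selected and rest:
        top_selected = max(batch_scores[i] for i in selected)
        low_fallback = min(confidence[i] for i in rest)
        shift = max(0.0, top_selected - low_fallback)

    scores = [
        batch_scores[i] if i in chosen else confidence[i] + shift
        for i in range(n)
    ]
    return selected, scores


# Sample 0 is already labeled, so it is excluded from selection.
selected, scores = score_with_exclusions(
    batch_scores=[0.1, 0.2, 0.3, 0.4],
    confidence=[0.9, 0.8, 0.7, 0.6],
    batch_size=2,
    excluded={0},
)
print(selected, scores)
```

In this sketch, sample 0 would otherwise have been the top pick, but because it is excluded the two slots go to samples 1 and 2, and sample 0 receives a fallback confidence score like any other unselected sample.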

This is a needed step in addressing DigitalSlideArchive/superpixel-classification#18.

This is useful for active learning in that samples that are already
labeled should not be selected for (re-)labeling.  Note that the
implementation computes confidence scores for these excluded samples,
just as it does for all other samples that are not selected.
@Leengit Leengit added the enhancement New feature or request label Aug 22, 2024
@Leengit Leengit self-assigned this Aug 22, 2024
@Leengit Leengit changed the title from "Allo BatchBALD to not consider some samples" to "Allow BatchBALD to not consider some samples" Aug 22, 2024
@Leengit Leengit force-pushed the batchbald_excluded_samples branch from 66fedba to 9c6e9d6 Compare August 22, 2024 20:40
@Leengit Leengit requested a review from manthey August 23, 2024 15:58
@Leengit
Collaborator Author

Leengit commented Aug 23, 2024

For source-code review, it may be best to focus on the first commit, which shows the code changes in their original location. The second commit moves a large block of code into a subroutine because the original routine was getting too long; as a result, git diff shows many changes that are merely a move. Also, please take a look at the third and fourth commits, which add some testing and type hinting.

@manthey

manthey commented Aug 26, 2024

Thanks for keeping the commits separate to make it easier to review.

@Leengit Leengit merged commit 6abd047 into DigitalSlideArchive:main Aug 26, 2024
6 checks passed
@Leengit Leengit deleted the batchbald_excluded_samples branch August 26, 2024 17:37
Labels
enhancement New feature or request
2 participants