
Add context-based post processing for linear features #342

Merged: 9 commits merged on Feb 5, 2024


@rwood-97 (Collaborator) commented Jan 18, 2024

Summary

As per #339, this PR implements a post-processing script so that users can filter out false positives.
This works for linear features, or for any feature where you expect positive patches to be clustered, so that isolated patches are likely to be false positives.
It also adds a save_predictions() method to the classifier so that predictions and confidence scores are saved in the format expected for post-processing.

Fixes #218
Addresses #339
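Post-processing reads the saved predictions back with pd.read_csv(..., index_col=0), as the usage example later in this thread does, so the file needs an index column plus per-patch predictions and confidence scores. A minimal pandas round trip, assuming hypothetical column names (pred, predicted_label, conf) and patch IDs as the index; the columns actually written by save_predictions() may differ:

```python
import io

import pandas as pd

# Hypothetical predictions table: one row per patch, indexed by patch ID.
# Column and index names are illustrative assumptions, not MapReader's schema.
preds = pd.DataFrame(
    {
        "pred": [1, 0, 1],
        "predicted_label": ["railspace", "no", "railspace"],
        "conf": [0.95, 0.88, 0.61],
    },
    index=pd.Index(["patch-0-0.png", "patch-0-1.png", "patch-1-0.png"], name="image_id"),
)

# Write to CSV (an in-memory buffer here; a file path works the same way).
buffer = io.StringIO()
preds.to_csv(buffer)

# Read it back, using the first column as the index.
buffer.seek(0)
loaded = pd.read_csv(buffer, index_col=0)
print(loaded)
```

Writing the index to the CSV and restoring it with index_col=0 keeps each prediction attached to its patch ID.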

Checklist before assigning a reviewer (update as needed)

  • Allow users to choose the confidence threshold below which labels are changed
  • Check edge cases: overlapping patches and non-square patches
  • Self-review code
  • Ensure submission passes current tests
  • Add tests
  • Update relevant docs
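For the overlapping and non-square edge cases in the checklist, one option is to find neighbours by comparing pixel bounding boxes instead of grid positions. A minimal sketch under that assumption; Box, touches and neighbours are hypothetical helpers, not MapReader code:

```python
from typing import NamedTuple


class Box(NamedTuple):
    """Hypothetical pixel bounds of a patch."""
    min_x: float
    min_y: float
    max_x: float
    max_y: float


def touches(a: Box, b: Box) -> bool:
    """True if boxes a and b overlap or share an edge or a corner."""
    return (
        a.min_x <= b.max_x
        and b.min_x <= a.max_x
        and a.min_y <= b.max_y
        and b.min_y <= a.max_y
    )


def neighbours(patches: dict[str, Box], patch_id: str) -> list[str]:
    """Sorted IDs of all other patches whose bounds touch or overlap patch_id's."""
    target = patches[patch_id]
    return sorted(
        other_id
        for other_id, box in patches.items()
        if other_id != patch_id and touches(target, box)
    )


# Non-square 100 x 50 pixel patches on a 3 x 3 grid (col -> x, row -> y).
patches = {
    f"patch_{row}_{col}": Box(col * 100, row * 50, (col + 1) * 100, (row + 1) * 50)
    for row in range(3)
    for col in range(3)
}
print(len(neighbours(patches, "patch_1_1")))  # 8
print(neighbours(patches, "patch_0_0"))  # ['patch_0_1', 'patch_1_0', 'patch_1_1']
```

Using <= counts patches that share only a corner, which reproduces the 8-neighbourhood on a regular grid, and overlapping patches are picked up automatically.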

Reviewer checklist

Please add anything you want reviewers to specifically focus/comment on.

  • Everything looks ok?

@codecov-commenter commented Jan 18, 2024

Codecov Report

Attention: 11 lines in your changes are missing coverage. Please review.

Comparison is base (b52086e) 59.64% compared to head (f668a73) 60.49%.
Report is 2 commits behind head on main.

❗ Current head f668a73 differs from pull request most recent head 777c857. Consider uploading reports for the commit 777c857 to get more accurate results.

File                                 Patch %   Lines
mapreader/classify/classifier.py     23.07%    10 missing ⚠️
mapreader/process/post_process.py    98.50%    1 missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #342      +/-   ##
==========================================
+ Coverage   59.64%   60.49%   +0.85%     
==========================================
  Files          35       37       +2     
  Lines        6165     6334     +169     
==========================================
+ Hits         3677     3832     +155     
- Misses       2488     2502      +14     
Flag        Coverage Δ
unittests   60.49% <93.60%> (+0.85%) ⬆️

Flags with carried forward coverage won't be shown.
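The coverage figures above are internally consistent: hits plus misses equal the tracked lines, and coverage is hits divided by lines. A quick check in Python; truncating (rather than rounding) to two decimal places is an assumption inferred from the reported values, not documented Codecov behaviour:

```python
import math


def coverage_pct(hits: int, lines: int) -> float:
    """Return hits/lines as a percentage, truncated to two decimal places."""
    return math.floor(hits / lines * 10_000) / 100


base = {"lines": 6165, "hits": 3677, "misses": 2488}
head = {"lines": 6334, "hits": 3832, "misses": 2502}

for name, cov in (("base", base), ("head", head)):
    # Every tracked line is either hit or missed.
    assert cov["hits"] + cov["misses"] == cov["lines"]
    print(name, coverage_pct(cov["hits"], cov["lines"]))
# base 59.64
# head 60.49
```

Rounding 3832 / 6334 (about 60.499%) would give 60.50%, so the reported 60.49% suggests truncation.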


@rwood-97 (Collaborator, Author) commented:
To try it, first run inference on some patches, then save the outputs by calling my_classifier.save_predictions(set_name="infer") for your chosen dataset.

Then:

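```python
import pandas as pd

from mapreader.process.post_process import PatchDataFrame

# Load the predictions saved by my_classifier.save_predictions(set_name="infer")
df = pd.read_csv("./predictions_patch_df.csv", index_col=0)

# Map numeric class indices to label names
labels_map = {
    0: "no",
    1: "railspace",
    2: "building",
    3: "railspace+building"
}

patches = PatchDataFrame(df, labels_map=labels_map)
# Select railspace / railspace+building patches and get their context
patches.get_context(labels=["railspace", "railspace+building"])
# Relabel patches with no surrounding railspace and confidence below 0.8
patches.update_preds(remap={"railspace": "no", "railspace+building": "building"}, conf=0.8)
```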

This selects all railspace and railspace+building patches, gets their context (the surrounding patches), then updates the predictions of patches that have no surrounding railspace and a confidence score below 0.8.

You can also set remap to map to entirely new labels (e.g. {"railspace": "check me", "railspace+building": "check me"}).
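The relabelling rule described above (only patches with no surrounding context and a confidence below the threshold are changed) can be sketched in plain pandas. This is an illustration, not MapReader's implementation: the pred_label, conf and has_context columns and the apply_remap helper are hypothetical, and has_context is assumed to have been computed already.

```python
import pandas as pd


def apply_remap(df: pd.DataFrame, remap: dict[str, str], conf: float) -> pd.DataFrame:
    """Hypothetical helper: relabel isolated, low-confidence patches."""
    out = df.copy()
    # Change only patches whose label is in remap, that have no context,
    # and whose confidence is below the threshold.
    mask = (
        out["pred_label"].isin(list(remap))
        & ~out["has_context"]
        & (out["conf"] < conf)
    )
    out.loc[mask, "pred_label"] = out.loc[mask, "pred_label"].map(remap)
    return out


df = pd.DataFrame(
    {
        "pred_label": ["railspace", "railspace", "railspace+building", "no"],
        "conf": [0.65, 0.95, 0.55, 0.40],
        "has_context": [False, False, True, False],
    }
)

# Remap to existing labels...
print(apply_remap(df, {"railspace": "no", "railspace+building": "building"}, conf=0.8))
# ...or to a new label that flags patches for manual checking.
print(apply_remap(df, {"railspace": "check me", "railspace+building": "check me"}, conf=0.8))
```

Only the first row changes: the second is above the confidence threshold, the third has context, and "no" is not in remap.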

rwood-97 linked an issue on Jan 25, 2024 that may be closed by this pull request
@rwood-97 (Collaborator, Author) commented:

Stats on post-processing are here: https://github.com/Living-with-machines/railspace/issues/14

TBC
MapReader's post-processing sub-package currently contains one method for post-processing your model's predictions. It is based on the idea that features such as railways, roads and coastlines are continuous, so patches with these labels should be found near other patches with the same labels.

For example, if a patch is predicted to be railspace but is surrounded by patches predicted to be non-railspace, the railspace prediction is likely to be a false positive.
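The idea can be illustrated on a toy grid of predictions. A minimal sketch, assuming patches sit on a regular grid addressed by (row, column) and that a patch has context if any of its (up to) 8 surrounding patches carries one of the chosen labels; find_isolated is a hypothetical helper, not MapReader code:

```python
def find_isolated(grid: list[list[str]], labels: set[str]) -> list[tuple[int, int]]:
    """Return (row, col) of patches labelled with one of `labels` whose
    surrounding patches carry none of `labels` (likely false positives)."""
    n_rows, n_cols = len(grid), len(grid[0])
    isolated = []
    for r in range(n_rows):
        for c in range(n_cols):
            if grid[r][c] not in labels:
                continue
            # Check the up-to-8 neighbours, staying inside the grid.
            has_context = any(
                grid[rr][cc] in labels
                for rr in range(max(r - 1, 0), min(r + 2, n_rows))
                for cc in range(max(c - 1, 0), min(c + 2, n_cols))
                if (rr, cc) != (r, c)
            )
            if not has_context:
                isolated.append((r, c))
    return isolated


grid = [
    ["no", "railspace", "no", "no"],
    ["no", "railspace", "no", "no"],
    ["no", "no", "no", "railspace"],
]
print(find_isolated(grid, {"railspace"}))  # [(2, 3)]
```

The two stacked railspace patches support each other, while the lone patch in the bottom-right corner is flagged.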
Review comment (Collaborator):

I guess you could be even more explicit and say: "The current method checks whether any of the 8 surrounding patches have the same label as a given patch (e.g. railspace), and if not, assumes this to be a false positive".

Perhaps you could also mention: "Future releases may add functionality to create custom filter rules for your use case."

@edwardchalstrey1 (Collaborator) left a comment

One comment, but otherwise LGTM

rwood-97 merged commit 033917f into main on Feb 5, 2024
3 checks passed
rwood-97 deleted the 339-postproc branch on February 5, 2024 at 13:32
Labels: None yet
Projects: None yet
Development

Successfully merging this pull request may close these issues:
  • Post processing of predicted labels using patch context
  • Easier saving of predictions as csv files
3 participants