Do the red arrows on some images create a danger for data leakage? #26

Open
bganglia opened this issue Mar 18, 2020 · 3 comments
Labels
question Further information is requested

Comments

@bganglia
Collaborator

bganglia commented Mar 18, 2020

It just occurred to me that arrows only occur on images with a positive diagnosis, so this could cause data leakage.

That might not be as much of a problem if you are using these images for differential diagnosis and already know the patient has something, but it could be an issue if this dataset is combined with healthy images to decide whether the patient is healthy or sick.
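One way to check whether the markings line up with the label is to tabulate them against each other. The following is only a minimal sketch: the `has_marker` and `positive` fields and the toy records are hypothetical and are not taken from this dataset's metadata.

```python
from collections import Counter

def marker_label_table(rows):
    """Count (has_marker, is_positive) pairs across a list of records."""
    counts = Counter()
    for row in rows:
        counts[(bool(row["has_marker"]), bool(row["positive"]))] += 1
    return counts

def positive_rate(counts, has_marker):
    """Fraction of positive cases among images with or without a marker."""
    pos = counts[(has_marker, True)]
    neg = counts[(has_marker, False)]
    total = pos + neg
    return pos / total if total else float("nan")

# Hypothetical toy records: every image with an arrow is positive.
rows = [
    {"has_marker": True, "positive": True},
    {"has_marker": True, "positive": True},
    {"has_marker": False, "positive": True},
    {"has_marker": False, "positive": False},
    {"has_marker": False, "positive": False},
]

counts = marker_label_table(rows)
rate_with = positive_rate(counts, True)
rate_without = positive_rate(counts, False)
print(f"positive rate with marker: {rate_with:.2f}, without: {rate_without:.2f}")
```

A large gap between the two rates would suggest that a classifier could use the marker as a shortcut for the label.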

@bganglia bganglia changed the title Do we have a way of dealing with red markings? Do the red arrows on some images create a danger for data leakage? Mar 18, 2020
@bganglia
Collaborator Author

bganglia commented Mar 18, 2020

If images from patients who might be healthy are being compared to these images, the small figure labels (e.g. "A", "B") could also lead to data leakage.

@ieee8023
Owner

True. This is a challenge to overcome. However, models trained on a lot of data don't suffer from this issue. Look at this example, processed using a model trained on the 100k NIH examples:
[Image: Screen Shot 2020-03-17 at 9 12 34 PM]

The gradient of the prediction with respect to the input shows that the model is not using the arrows to make its prediction. So it is possible that the features from those pretrained models (in the torchxrayvision library) can easily ignore the arrows and focus on the right features.
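The gradient check described here can be turned into a number: the share of total gradient magnitude that lands on the annotated pixels. The sketch below is only an illustration under stated assumptions. It uses a toy logistic model and finite differences in place of a real network and autograd, and the image, weights, and `arrow_mask` are all hypothetical.

```python
import math

def predict(image, weights, bias=0.0):
    """Toy stand-in for a classifier: logistic regression over pixels."""
    z = bias + sum(
        w * p
        for w_row, p_row in zip(weights, image)
        for w, p in zip(w_row, p_row)
    )
    return 1.0 / (1.0 + math.exp(-z))

def input_gradient(f, image, eps=1e-5):
    """Central finite-difference estimate of d f(image) / d pixel."""
    grad = [[0.0] * len(row) for row in image]
    for i, row in enumerate(image):
        for j, value in enumerate(row):
            hi = [r[:] for r in image]
            lo = [r[:] for r in image]
            hi[i][j] = value + eps
            lo[i][j] = value - eps
            grad[i][j] = (f(hi) - f(lo)) / (2 * eps)
    return grad

def saliency_fraction(grad, mask):
    """Share of total |gradient| that falls on masked (annotated) pixels."""
    total = sum(abs(g) for row in grad for g in row)
    inside = sum(
        abs(g)
        for g_row, m_row in zip(grad, mask)
        for g, m in zip(g_row, m_row)
        if m
    )
    return inside / total if total else 0.0

# 3x3 toy image; the top-left pixel plays the role of an arrow.
image = [[1.0, 0.2, 0.1],
         [0.3, 0.8, 0.4],
         [0.2, 0.5, 0.1]]
arrow_mask = [[1, 0, 0],
              [0, 0, 0],
              [0, 0, 0]]
# Hypothetical weights: this toy model gives the arrow pixel zero weight.
weights = [[0.0, 0.1, 0.1],
           [0.2, 1.5, 0.3],
           [0.1, 0.4, 0.1]]

grad = input_gradient(lambda img: predict(img, weights), image)
fraction = saliency_fraction(grad, arrow_mask)
print(f"share of gradient magnitude on the arrow pixel: {fraction:.3f}")
```

With a real model the gradient would come from the framework's automatic differentiation, and the mask would have to be drawn or detected per image. A fraction near zero is consistent with the model ignoring the arrows, though it does not prove it.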

@ieee8023 ieee8023 added the question Further information is requested label Mar 21, 2020
@bfreskura

Is it possible to mark such images in the metadata.csv?
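As a rough illustration of what such a flag could look like, the sketch below appends a Y/N column to CSV text. The `filename` and `finding` columns, the `has_annotation` column name, and the sample rows are assumptions made for this example, not the actual layout of metadata.csv.

```python
import csv
import io

def add_annotation_flag(csv_text, annotated_files, column="has_annotation"):
    """Return CSV text with an extra Y/N column marking annotated images."""
    reader = csv.DictReader(io.StringIO(csv_text))
    fieldnames = list(reader.fieldnames) + [column]
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        # Flag the row when its file is in the hand-curated set.
        row[column] = "Y" if row["filename"] in annotated_files else "N"
        writer.writerow(row)
    return out.getvalue()

# Hypothetical sample rows and a hand-curated set of annotated files.
sample = "filename,finding\nimg1.jpg,positive\nimg2.jpg,positive\n"
flagged = add_annotation_flag(sample, {"img1.jpg"})
print(flagged)
```

Downstream code could then filter on the new column, or stratify an evaluation by it, without anyone having to inspect the images again.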


3 participants