Do the red arrows on some images create a danger for data leakage? #26

bganglia · 2020-03-18T00:16:08Z

It just occurred to me that arrows only occur on images with a positive diagnosis, so this could cause data leakage.

That might not be as much of a problem if you are using these images for differential diagnosis and already know the patient has something, but it could be an issue if this dataset is being combined with healthy images to decide whether the patient is healthy or sick.

bganglia · 2020-03-18T00:38:03Z

If images from patients who might be healthy are being compared to these images, the small figure labels (e.g. "A", "B") could also lead to data leakage.
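One rough way to gauge this risk is to check how well the presence of a marking alone predicts the label once the datasets are combined. The sketch below is only an illustration: the toy records, the `has_marking` flag, and both helper functions are hypothetical and not part of this repository.

```python
from collections import Counter

def marker_only_accuracy(records):
    """Accuracy of the best rule that predicts the label from marking presence alone."""
    # Count labels separately for images with and without markings.
    counts = {True: Counter(), False: Counter()}
    for has_marking, label in records:
        counts[has_marking][label] += 1
    # Within each group, the best possible guess is that group's majority label.
    correct = sum(max(c.values()) for c in counts.values() if c)
    return correct / len(records)

def majority_baseline(records):
    """Accuracy of always predicting the most common label, ignoring markings."""
    labels = Counter(label for _, label in records)
    return max(labels.values()) / len(records)

# Hypothetical toy data: (has_marking, label) pairs.
records = (
    [(True, "sick")] * 30
    + [(False, "sick")] * 20
    + [(False, "healthy")] * 50
)

print(marker_only_accuracy(records), majority_baseline(records))
```

If the marker-only rule clearly beats the majority baseline, a model could pick up the same shortcut, so the markings are worth handling before training.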

ieee8023 · 2020-03-18T01:18:43Z

True. This is a challenge to overcome. However, models trained with a lot of data don't suffer from this issue. Look at this example processed using a model trained on the 100k NIH examples:

The gradient of the prediction with respect to the input shows that the model is not using the arrows to make its prediction. So it is possible that the features from those pretrained models (in the torchxrayvision library) can easily ignore the arrows and focus on the right features.
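To make the gradient check concrete, here is a minimal sketch of measuring how much of the input gradient falls on an annotated region. It uses a toy logistic model and finite differences, not torchxrayvision or its pretrained weights; `predict`, `input_gradient`, `saliency_fraction`, the 3x3 image, and the arrow mask are all made up for illustration. With a real network, the gradient would normally come from the framework's automatic differentiation instead.

```python
import math

def predict(image, weights, bias=0.0):
    """Toy stand-in for a classifier: logistic regression over the pixels."""
    z = bias + sum(
        w * p
        for row_w, row_p in zip(weights, image)
        for w, p in zip(row_w, row_p)
    )
    return 1.0 / (1.0 + math.exp(-z))

def input_gradient(image, weights, eps=1e-4):
    """Central finite-difference estimate of d(prediction)/d(pixel)."""
    grad = []
    for i in range(len(image)):
        grad_row = []
        for j in range(len(image[i])):
            plus = [row[:] for row in image]
            minus = [row[:] for row in image]
            plus[i][j] += eps
            minus[i][j] -= eps
            grad_row.append(
                (predict(plus, weights) - predict(minus, weights)) / (2 * eps)
            )
        grad.append(grad_row)
    return grad

def saliency_fraction(grad, mask):
    """Share of the total absolute gradient that lies inside the masked region."""
    total = sum(abs(g) for row in grad for g in row)
    inside = sum(
        abs(g)
        for row_g, row_m in zip(grad, mask)
        for g, m in zip(row_g, row_m)
        if m
    )
    return inside / total if total else 0.0

# Hypothetical example: the "arrow" sits on the top-left pixel,
# and this toy model happens to put zero weight there.
image = [[1.0, 0.2, 0.1], [0.3, 0.5, 0.4], [0.2, 0.6, 0.3]]
weights = [[0.0, 0.1, 0.2], [0.1, 0.8, 0.3], [0.2, 0.4, 0.1]]
arrow_mask = [[1, 0, 0], [0, 0, 0], [0, 0, 0]]

grad = input_gradient(image, weights)
fraction = saliency_fraction(grad, arrow_mask)
print(fraction)
```

A fraction near zero on the arrow pixels is consistent with the model ignoring them for that image; it does not prove the same holds across the whole dataset.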

bfreskura · 2020-04-09T09:28:59Z

Is it possible to mark such images in the metadata.csv?
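As a rough sketch of what such a flag could look like: radiographs are grayscale, so pixels whose R, G and B values differ strongly hint at an overlaid colored annotation. Everything below is an assumption for illustration, including the `has_markings` column name, the thresholds, the helper functions, and the sample rows; the pixel lists stand in for data that an image library would load in practice. Note that this heuristic would not catch the small grayscale figure labels ("A", "B").

```python
import csv
import io

def has_colored_pixels(pixels, tolerance=40, min_count=5):
    """Heuristic: flag an image if enough pixels are far from gray (R == G == B)."""
    colored = sum(
        1 for r, g, b in pixels if max(r, g, b) - min(r, g, b) > tolerance
    )
    return colored >= min_count

def add_marking_column(csv_text, pixels_by_file, column="has_markings"):
    """Return the CSV text with an extra Y/N column; the column name is hypothetical."""
    reader = csv.DictReader(io.StringIO(csv_text))
    fieldnames = list(reader.fieldnames) + [column]
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        flagged = has_colored_pixels(pixels_by_file[row["filename"]])
        row[column] = "Y" if flagged else "N"
        writer.writerow(row)
    return out.getvalue()

# Hypothetical inputs: two rows of metadata and flattened RGB pixels per file.
csv_text = "filename,finding\na.jpg,COVID-19\nb.jpg,No Finding\n"
pixels_by_file = {
    "a.jpg": [(120, 120, 120)] * 95 + [(230, 20, 20)] * 5,  # gray with a few red pixels
    "b.jpg": [(90, 90, 90)] * 100,                           # purely gray
}

updated = add_marking_column(csv_text, pixels_by_file)
print(updated)
```

With a column like this, downstream users could exclude or separately evaluate the flagged images when mixing this dataset with healthy cases.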

bganglia changed the title ~~Do we have a way of dealing with red markings?~~ Do the red arrows on some images create a danger for data leakage? Mar 18, 2020

ieee8023 added the question Further information is requested label Mar 21, 2020
