It just occurred to me that the arrows appear only on images with a positive diagnosis, so they could cause data leakage.
That might not be as much of a problem if you are using these images for differential diagnosis and already know the patient has something, but it could be an issue if this dataset is combined with healthy images to decide whether a patient is healthy or sick.
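One way to audit the dataset for this kind of leakage is to flag images that contain saturated-red pixels, since a genuine grayscale x-ray should have none. This is a minimal sketch using a hypothetical color-threshold heuristic (the threshold values and pixel count are illustrative, not tuned on this dataset):

```python
import numpy as np

def has_red_marking(img, red_thresh=150, other_thresh=100, min_pixels=50):
    """Flag RGB images containing saturated-red pixels, a rough proxy
    for burned-in arrows or annotations. Thresholds are illustrative."""
    r, g, b = img[..., 0], img[..., 1], img[..., 2]
    red_mask = (r > red_thresh) & (g < other_thresh) & (b < other_thresh)
    return int(red_mask.sum()) >= min_pixels

# grayscale x-ray replicated into 3 channels: contains no red pixels
gray = np.full((64, 64, 3), 120, dtype=np.uint8)

# the same image with a synthetic red "arrow" patch drawn on it
marked = gray.copy()
marked[10:20, 10:20] = [255, 0, 0]
```

Running such a check over the whole dataset would at least tell you how many positive images carry the marking before you mix them with clean healthy images.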
bganglia changed the title from "Do we have a way of dealing with red markings?" to "Do the red arrows on some images create a danger for data leakage?" on Mar 18, 2020
If images from patients who might be healthy are being compared to these images, the small figure labels (e.g. "A", "B") could also lead to data leakage.
True, this is a challenge to overcome. However, models trained on large amounts of data don't seem to suffer from this issue. Look at this example processed using a model trained on the 100k NIH examples:
The gradient of the prediction with respect to the input shows the model is not using the arrows to make its prediction. So it is possible that the features from those pretrained models (in the torchxrayvision library) can easily ignore the arrows and focus on the right features.
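The gradient check above can be reproduced as a saliency map: backpropagate the prediction to the input and inspect where the gradient magnitude is large. A minimal sketch with a toy PyTorch classifier standing in for a torchxrayvision model (the network and input here are placeholders, not the actual pretrained model):

```python
import torch
import torch.nn as nn

# toy stand-in for a pretrained chest x-ray classifier
model = nn.Sequential(nn.Flatten(), nn.Linear(64 * 64, 1))
model.eval()

# fake 64x64 single-channel x-ray; track gradients w.r.t. the pixels
x = torch.rand(1, 1, 64, 64, requires_grad=True)

pred = model(x).sum()
pred.backward()  # gradient of the prediction with respect to the input

# high values mark pixels the model actually relies on; if the arrow
# region is near zero here, the model is ignoring the marking
saliency = x.grad.abs().squeeze()
```

If the saliency is flat over the arrow region but concentrated on lung fields, that supports the claim that the pretrained features ignore the annotations.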