Not understanding why DICOM redaction does not detect Patient Name on example data #1309
Comments
Apologies for the delay. We will look into this soon and report back. |
@parataiito a hotfix was created and a new version released. Could you please check again? Apologies for the late resolution on this! |
Closing for now, please re-open if needed. |
Thanks for the (very) quick reply! |
Works like a charm on all the demo files! So that's perfect! I also tested it on random data I generated, and I was wondering if you understand why it does not work specifically on this one: sample_data.zip Is it because the data I burnt into the pixel array does not match any value in the DICOM tags? |
The DICOM redactor either takes values from the tags or uses text-based approaches to identify entities such as names. In this case, the default spaCy model used by Presidio is not able to detect "ez OY" as a name, but a different model can. I would suggest experimenting with changing Presidio's configuration. For example:
import pydicom
from presidio_analyzer import AnalyzerEngine, RecognizerResult
from presidio_analyzer.nlp_engine import TransformersNlpEngine
from presidio_image_redactor import ImageAnalyzerEngine, DicomImagePiiVerifyEngine, DicomImageRedactorEngine
model_config = [
{
"lang_code": "en",
"model_name": {
"spacy": "en_core_web_sm",
"transformers": "StanfordAIMI/stanford-deidentifier-base",
},
}
]
nlp_engine = TransformersNlpEngine(models=model_config)
text_analyzer_engine = AnalyzerEngine(nlp_engine=nlp_engine)
# Pass the transformers-backed text analyzer created above
image_analyzer_engine = ImageAnalyzerEngine(text_analyzer_engine)
dicom_engine = DicomImagePiiVerifyEngine(image_analyzer_engine=image_analyzer_engine)
instance = pydicom.dcmread(file_of_interest)
verify_image, ocr_results, analyzer_results = dicom_engine.verify_dicom_instance(instance, padding_width=25, show_text_annotation=True)
Running this version with the spaCy model does not identify the bounding box with a name as PII, whereas this transformers model ( |
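To illustrate the tag-based path mentioned above, here is a minimal sketch of why burnt-in text that does not appear in any DICOM tag is missed by metadata matching and has to be caught by the NLP model instead. This is not Presidio's implementation; the helper names (`build_phi_terms`, `match_ocr_words`) and all sample values are hypothetical, and exact case-insensitive matching is an assumption made for brevity.

```python
# Hypothetical illustration only: not Presidio's actual matching logic.

def build_phi_terms(tag_values):
    """Split DICOM-style tag values (e.g. 'DOE^JOHN') into lowercase terms."""
    terms = set()
    for value in tag_values:
        # DICOM person names separate components with '^'; also split on whitespace.
        for part in str(value).replace("^", " ").split():
            terms.add(part.lower())
    return terms


def match_ocr_words(ocr_words, phi_terms):
    """Return the OCR words that exactly match a known metadata term."""
    return [word for word in ocr_words if word.lower() in phi_terms]


# Made-up values: two tag values and five words read from the pixel data.
tag_values = ["DOE^JOHN", "12345"]
ocr_words = ["John", "Doe", "ez", "OY", "12345"]

phi_terms = build_phi_terms(tag_values)
matched = match_ocr_words(ocr_words, phi_terms)
unmatched = [word for word in ocr_words if word not in matched]

print("Matched against tags:", matched)
print("Left for the NLP model:", unmatched)
```

Words such as "ez" and "OY" never appear in the tags, so in this sketch only a text-based recognizer could flag them.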
Hi @omri374.
It redacts all the information except the header. |
It could be an OCR issue, where the OCR just can't detect the bounding box. Have you looked into the bounding boxes returned by the OCR? |
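One way to check this is to list the word boxes the OCR returned and see whether any of them overlaps the header area. The sketch below assumes the OCR output is a dict of parallel lists with the keys `text`, `left`, `top`, `width`, `height`, and `conf`, as produced by pytesseract's `image_to_data` with dict output. The helper names, the sample values, and the header region are hypothetical.

```python
# Hypothetical helpers for inspecting OCR output; sample data is made up.

def ocr_boxes(ocr_result, min_conf=0.0):
    """Convert a dict of parallel lists into a list of per-word boxes."""
    boxes = []
    for i, text in enumerate(ocr_result["text"]):
        if not text.strip():
            continue  # skip empty entries (layout rows without a word)
        if float(ocr_result["conf"][i]) < min_conf:
            continue  # skip low-confidence detections
        boxes.append({
            "text": text,
            "left": ocr_result["left"][i],
            "top": ocr_result["top"][i],
            "width": ocr_result["width"][i],
            "height": ocr_result["height"][i],
        })
    return boxes


def boxes_in_region(boxes, region):
    """Return the boxes that overlap region = (left, top, width, height)."""
    r_left, r_top, r_width, r_height = region
    hits = []
    for box in boxes:
        overlaps_x = box["left"] < r_left + r_width and r_left < box["left"] + box["width"]
        overlaps_y = box["top"] < r_top + r_height and r_top < box["top"] + box["height"]
        if overlaps_x and overlaps_y:
            hits.append(box)
    return hits


# Made-up OCR output: two words near the bottom of the image, nothing at the top.
sample_ocr = {
    "text": ["", "DEPTH", "12cm"],
    "left": [0, 40, 120],
    "top": [0, 400, 400],
    "width": [0, 60, 50],
    "height": [0, 20, 20],
    "conf": ["-1", "91", "88"],
}

boxes = ocr_boxes(sample_ocr)
header_region = (0, 0, 512, 50)  # assumed header area: top 50 pixel rows
header_hits = boxes_in_region(boxes, header_region)

print("All boxes:", [box["text"] for box in boxes])
print("Boxes in header:", [box["text"] for box in header_hits])
```

An empty list for the header region would suggest that the OCR step never saw the header text, so no recognizer downstream could redact it.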
Thank you for the answer @omri374. This is the output of the simple program. I've followed the following documentation. The header doesn't seem to be covered by any of the detected bounding boxes. The image is a DICOM ultrasound image. Even if I save it as a normal image and then use Presidio, the issue persists. |
Hi @jhssilva, it might be because the contrast between the text and the background is relatively low. In this case, you might want to consider preprocessing the image before feeding it to the redactor. Ideas for such preprocessing functions can be found here: presidio-image-redactor/presidio_image_redactor/image_processing_engine.py |
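As an illustration of this kind of preprocessing, here is a minimal pure-Python sketch of mean adaptive thresholding, which compares each pixel with the mean of its neighbourhood so that faint text stands out from a similar background. It is not the code from `image_processing_engine.py`; the function name, parameters, and tiny sample image are assumptions for illustration. On real images, a library routine such as OpenCV's `cv2.adaptiveThreshold` would be the practical choice.

```python
# Hypothetical sketch of mean adaptive thresholding on a small grayscale image.

def adaptive_threshold(image, block_size=3, c=2):
    """Binarize a grayscale image (list of rows of 0-255 ints) using local means.

    A pixel becomes 255 if it is brighter than (local mean - c), otherwise 0.
    """
    height = len(image)
    width = len(image[0])
    half = block_size // 2
    output = []
    for y in range(height):
        row = []
        for x in range(width):
            # Collect the neighbourhood, clipped at the image borders.
            neighbours = [
                image[ny][nx]
                for ny in range(max(0, y - half), min(height, y + half + 1))
                for nx in range(max(0, x - half), min(width, x + half + 1))
            ]
            local_mean = sum(neighbours) / len(neighbours)
            row.append(255 if image[y][x] > local_mean - c else 0)
        output.append(row)
    return output


# Made-up low-contrast image: background 100 with a faint vertical stroke of 90.
low_contrast = [[100, 100, 90, 100, 100] for _ in range(5)]
binary = adaptive_threshold(low_contrast)

for row in binary:
    print(row)
```

In this sample the stroke differs from the background by only 10 grey levels, yet after thresholding it is pure black on white. For bright text on a dark background, as is common in ultrasound overlays, the comparison would need to be inverted.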
Hey @ayabel. Thank you for your input and guidance. I've tested with the adaptiveThreshold as suggested. That being said, I've decided to take a different approach.
Note: In this example I didn't redact the bottom part of the image. Suggestion: It would be nice to have an example for such cases in the documentation, either using the adaptive threshold or using the approach that I've suggested for specific cases. |
Hello !
First, thanks for this tool, it looks very promising, so congrats on the idea!
I have a question though.
I followed the walkthrough from here:
I used the "0_ORIGINAL.dcm" file from the test files.
Here is my code to show it seems identical to the tutorial:
However, my output is this:
I don't understand why the Patient Name is not redacted like it is in your example:
For additional info, I am using Python 3.11.2 (but I tried with 3.9 too).
PS: I did not put it in bug since I am not 100% sure it is. It's probably on my side but I have no idea where it comes from...
Thanks in advance :)