-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: support DocumentReference URL attachments #172
Conversation
@@ -5,17 +5,18 @@ | |||
import hashlib | |||
import logging | |||
import os | |||
from typing import List | |||
from typing import List, Optional |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The changes in this file are the actual feature change -- all other changes are mostly just refactoring to bring FhirClient
out from a bulk export implementation detail up to a core piece of the etl.py
machinery.
Previously, we only supported DocumentReferences with inlined notes. Now, we will properly download URL attachments. Also: - Expands the mimetypes we look for from just text/plain to text/plain, text/*, and application/xml in that order. - Adds --fhir-url to specify the FHIR server when you are using an externally downloaded folder - Renames and moves the BackendServiceServer in the ndjson loader to FhirClient in toplevel code. - Moves some credential argument handling out of the Ndjson loader into etl.py code. - Eased credential requirement checking, so that you don't even need credentials, as long as the server doesn't complain (e.g. we can even run against Cerner's public sandbox that doesn't need auth) - Made tasks async. - Bumps FhirClient's timeout from 5 seconds to 5 minutes, for safety Note: - This implementation is a little naive. It just downloads each URL as it sees them, with no caching. If we grow another NLP task, we'll to be more clever. And even without that, we could maybe be smarter about looking for a cached NLP result first.
summary = task.run() | ||
summary = await task.run() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that this is still not running tasks in parallel, but just making the task runners able to run async code themselves. (Parallel tasks is a whole other discussion with its own difficulties.)
@mock.patch("cumulus.fhir_client.uuid.uuid4", new=lambda: "1234") | ||
class TestBulkServer(unittest.IsolatedAsyncioTestCase): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sort of outside the scope of this PR, but this test file is getting long enough that it is at least worth considering if some of the classes should be broken out into their own files.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fair. Next time I'm in there, I can probably break it up.
Previously, we only supported DocumentReferences with inlined notes. Now, we will properly download URL attachments.
Also:
Note:
Description
Checklist
docs/
) needs to be updated