feat: support DocumentReference URL attachments #172

mikix · 2023-02-13T21:13:24Z

Previously, we only supported DocumentReferences with inlined notes. Now, we will properly download URL attachments.

Also:

Expands the mimetypes we look for from just text/plain to text/plain, text/*, and application/xml, in that order.
Adds --fhir-url to specify the FHIR server when you are using an externally downloaded folder.
Renames and moves the BackendServiceServer in the ndjson loader to FhirClient in toplevel code.
Moves some credential argument handling out of the ndjson loader into etl.py code.
Eases credential requirement checking, so that you don't need credentials at all as long as the server doesn't complain (e.g. we can run against Cerner's public sandbox, which doesn't need auth).
Makes tasks async.
Bumps FhirClient's timeout from 5 seconds to 5 minutes, for safety.
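
To illustrate the mimetype preference order from the list above, here is a minimal sketch. It is not the code from this PR: `pick_attachment` and `_rank` are hypothetical names, and the only assumption about the data is the standard FHIR layout where a DocumentReference carries `content[].attachment.contentType`.

```python
from typing import Optional


def _rank(content_type: str) -> Optional[int]:
    """Return a preference rank (lower is better), or None if unsupported."""
    # Drop parameters such as "; charset=utf-8" before comparing.
    mimetype = content_type.split(";", 1)[0].strip().lower()
    if mimetype == "text/plain":
        return 0
    if mimetype.startswith("text/"):
        return 1
    if mimetype == "application/xml":
        return 2
    return None


def pick_attachment(docref: dict) -> Optional[dict]:
    """Pick the most preferred attachment of a DocumentReference (hypothetical helper)."""
    best = None
    best_rank = None
    for content in docref.get("content", []):
        attachment = content.get("attachment", {})
        rank = _rank(attachment.get("contentType", ""))
        if rank is not None and (best_rank is None or rank < best_rank):
            best, best_rank = attachment, rank
    return best
```

With this sketch, a DocumentReference that offers both application/xml and text/html attachments would yield the text/html one, and one that offers only application/pdf would yield nothing.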

Note:

This implementation is a little naive. It just downloads each URL as it sees it, with no caching. If we grow another NLP task, we'll need to be more clever. And even without that, we could maybe be smarter about looking for a cached NLP result first.
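
As a rough idea of what "more clever" could look like, here is a sketch of an in-memory cache keyed by URL. This is not part of the PR: `CachedDownloader` and the `fetch` callable are hypothetical, and a real version would probably want a size limit or an on-disk store.

```python
import asyncio
from typing import Awaitable, Callable, Dict


class CachedDownloader:
    """Wraps an async fetch function and remembers results per URL (hypothetical)."""

    def __init__(self, fetch: Callable[[str], Awaitable[str]]):
        self._fetch = fetch  # e.g. a coroutine function that downloads a note
        self._cache: Dict[str, str] = {}

    async def get(self, url: str) -> str:
        # Only hit the server the first time a URL is requested.
        if url not in self._cache:
            self._cache[url] = await self._fetch(url)
        return self._cache[url]
```

A second NLP task asking for the same attachment URL would then reuse the first task's download instead of requesting it again.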

Description

Checklist

Consider if documentation (like in docs/) needs to be updated
Consider if tests should be added

docs/howtos/epic-tips.md

mikix · 2023-02-13T21:22:05Z

cumulus/ctakes.py

@@ -5,17 +5,18 @@
 import hashlib
 import logging
 import os
-from typing import List
+from typing import List, Optional


The changes in this file are the actual feature change -- all other changes are mostly just refactoring to bring FhirClient out from a bulk export implementation detail up to a core piece of the etl.py machinery.

cumulus/fhir_client.py

mikix · 2023-02-14T15:49:47Z

cumulus/etl.py

-        summary = task.run()
+        summary = await task.run()


Note that this still does not run tasks in parallel; it just lets the task runners run async code themselves. (Running tasks in parallel is a whole other discussion with its own difficulties.)
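
To make the distinction concrete, here is a small self-contained sketch of awaiting tasks one after another. `FakeTask` and `run_tasks` are hypothetical stand-ins, not the real etl.py code; the point is only that each `await task.run()` finishes before the next task starts.

```python
import asyncio
from typing import List


class FakeTask:
    """Hypothetical stand-in for an ETL task with an async run() method."""

    def __init__(self, name: str, log: List[str]):
        self.name = name
        self.log = log

    async def run(self) -> str:
        self.log.append(f"{self.name} start")
        await asyncio.sleep(0)  # yield to the event loop, like real network I/O would
        self.log.append(f"{self.name} end")
        return self.name


async def run_tasks(tasks: List[FakeTask]) -> List[str]:
    summaries = []
    for task in tasks:
        # Sequential: the loop does not move on until this task is done.
        summaries.append(await task.run())
    return summaries
```

Running two such tasks logs the first task's start and end before the second task's start, even though both are coroutines.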

dogversioning · 2023-02-14T21:24:02Z

tests/test_bulk_export.py

+@mock.patch("cumulus.fhir_client.uuid.uuid4", new=lambda: "1234")
 class TestBulkServer(unittest.IsolatedAsyncioTestCase):


Sort of outside the scope of this PR, but this test file is getting long enough that it is at least worth considering whether some of the classes should be broken out into their own files.

Fair. Next time I'm in there, I can probably break it up.

mikix force-pushed the mikix/doc-url branch from af51091 to a19f73c Compare February 13, 2023 21:28

mikix force-pushed the mikix/doc-url branch from a19f73c to 166e581 Compare February 14, 2023 13:58

mikix mentioned this pull request Feb 14, 2023

Support DocumentReferences with a url field rather than a local note #171

Closed

dogversioning approved these changes Feb 14, 2023

mikix merged commit 97d9bd2 into main Feb 14, 2023

mikix deleted the mikix/doc-url branch February 14, 2023 21:38
