Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: support DocumentReference URL attachments #172

Merged
merged 1 commit into from
Feb 14, 2023
Merged

Conversation

mikix
Copy link
Contributor

@mikix mikix commented Feb 13, 2023

Previously, we only supported DocumentReferences with inlined notes. Now, we will properly download URL attachments.

Also:

  • Expands the mimetypes we look for from just text/plain to text/plain, text/*, and application/xml in that order.
  • Adds --fhir-url to specify the FHIR server when you are using an externally downloaded folder
  • Renames and moves the BackendServiceServer in the ndjson loader to FhirClient in toplevel code.
  • Moves some credential argument handling out of the Ndjson loader into etl.py code.
  • Eased credential requirement checking, so that you don't even need credentials, as long as the server doesn't complain (e.g. we can even run against Cerner's public sandbox that doesn't need auth)
  • Made tasks async.
  • Bumps FhirClient's timeout from 5 seconds to 5 minutes, for safety

Note:

  • This implementation is a little naive. It just downloads each URL as it sees them, with no caching. If we grow another NLP task, we'll to be more clever. And even without that, we could maybe be smarter about looking for a cached NLP result first.

Description

Checklist

  • Consider if documentation (like in docs/) needs to be updated
  • Consider if tests should be added

@@ -5,17 +5,18 @@
import hashlib
import logging
import os
from typing import List
from typing import List, Optional
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The changes in this file are the actual feature change -- all other changes are mostly just refactoring to bring FhirClient out from a bulk export implementation detail up to a core piece of the etl.py machinery.

Previously, we only supported DocumentReferences with inlined notes.
Now, we will properly download URL attachments.

Also:
- Expands the mimetypes we look for from just text/plain to
  text/plain, text/*, and application/xml in that order.
- Adds --fhir-url to specify the FHIR server when you are using an
  externally downloaded folder
- Renames and moves the BackendServiceServer in the ndjson loader
  to FhirClient in toplevel code.
- Moves some credential argument handling out of the Ndjson loader
  into etl.py code.
- Eased credential requirement checking, so that you don't even need
  credentials, as long as the server doesn't complain (e.g. we can
  even run against Cerner's public sandbox that doesn't need auth)
- Made tasks async.
- Bumps FhirClient's timeout from 5 seconds to 5 minutes, for safety

Note:
- This implementation is a little naive. It just downloads each URL
  as it sees them, with no caching. If we grow another NLP task,
  we'll to be more clever. And even without that, we could maybe be
  smarter about looking for a cached NLP result first.
Comment on lines -61 to +56
summary = task.run()
summary = await task.run()
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that this is still not running tasks in parallel, but just making the task runners able to run async code themselves. (Parallel tasks is a whole other discussion with its own difficulties.)

cumulus/ctakes.py Show resolved Hide resolved
cumulus/fhir_client.py Show resolved Hide resolved
docs/howtos/epic-tips.md Show resolved Hide resolved
docs/howtos/epic-tips.md Show resolved Hide resolved
Comment on lines +233 to 234
@mock.patch("cumulus.fhir_client.uuid.uuid4", new=lambda: "1234")
class TestBulkServer(unittest.IsolatedAsyncioTestCase):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sort of outside the scope of this PR, but this test file is getting long enough that it is at least worth considering if some of the classes should be broken out into their own files.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair. Next time I'm in there, I can probably break it up.

@mikix mikix merged commit 97d9bd2 into main Feb 14, 2023
@mikix mikix deleted the mikix/doc-url branch February 14, 2023 21:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants