Skip to content

Commit

Permalink
UMLS table creation
Browse files Browse the repository at this point in the history
  • Loading branch information
dogversioning committed May 8, 2024
1 parent 68c1899 commit 1bafaed
Show file tree
Hide file tree
Showing 15 changed files with 605 additions and 1 deletion.
40 changes: 40 additions & 0 deletions .github/workflows/ci.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,40 @@
name: CI
on: [push]
jobs:

lint:
runs-on: ubuntu-22.04
steps:
- uses: actions/checkout@v4
- name: Set up Python
uses: actions/setup-python@v5
with:
python-version: '3.10'
- name: Install linters
run: |
python -m pip install --upgrade pip
pip install ruff==0.2.1
- name: Run ruff
if: success() || failure() # still run black if above checks fails
run: |
ruff check
ruff format --check
unittest:
name: unit tests
runs-on: ubuntu-22.04
steps:
- uses: actions/checkout@v4

- name: Set up Python
uses: actions/setup-python@v5
with:
python-version: '3.10'

- name: Install dependencies
run: |
python -m pip install --upgrade pip
pip install pytest
- name: Test with pytest
run: |
python -m pytest
25 changes: 25 additions & 0 deletions .github/workflows/pypi.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,25 @@
name: PyPI

on:
release:
types: [created]

jobs:
publish:
runs-on: ubuntu-22.04
steps:
- uses: actions/checkout@v3

- name: Install dependencies
run: |
python -m pip install --upgrade pip
pip install build
- name: Build
run: python -m build

- name: Publish
uses: pypa/gh-action-pypi-publish@release/v1
with:
password: ${{ secrets.PYPI_API_TOKEN }}
print_hash: true
137 changes: 137 additions & 0 deletions .gitignore
Original file line number Diff line number Diff line change
@@ -0,0 +1,137 @@
downloads/
generated_parquet/

# project specific
downloads/
generated_parquet/
output.sql

# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class

# C extensions
*.so

# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
pip-wheel-metadata/
share/python-wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST

# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec

# Installer logs
pip-log.txt
pip-delete-this-directory.txt

# Unit test / coverage reports
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
*.py,cover
.hypothesis/
.pytest_cache/

# Translations
*.mo
*.pot

# Django stuff:
*.log
local_settings.py
db.sqlite3
db.sqlite3-journal

# Flask stuff:
instance/
.webassets-cache

# Scrapy stuff:
.scrapy

# Sphinx documentation
docs/_build/

# PyBuilder
target/

# Jupyter Notebook
.ipynb_checkpoints

# IPython
profile_default/
ipython_config.py

# pyenv
.python-version

# pipenv
# According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
# However, in case of collaboration, if having platform-specific dependencies or dependencies
# having no cross-platform support, pipenv may install dependencies that don't work, or not
# install all needed dependencies.
#Pipfile.lock

# PEP 582; used by e.g. github.com/David-OConnor/pyflow
__pypackages__/

# Celery stuff
celerybeat-schedule
celerybeat.pid

# SageMath parsed files
*.sage.py

# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/

# Spyder project settings
.spyderproject
.spyproject

# Rope project settings
.ropeproject

# mkdocs documentation
/site

# mypy
.mypy_cache/
.dmypy.json
dmypy.json

# Pyre type checker
.pyre/
17 changes: 17 additions & 0 deletions .pre-commit-config.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,17 @@
default_install_hook_types: [pre-commit, pre-push]
repos:
- repo: https://github.com/astral-sh/ruff-pre-commit
rev: v0.2.1
hooks:
- name: Ruff formatting
id: ruff-format
- name: Ruff linting
id: ruff
stages: [pre-push]

- repo: https://github.com/sqlfluff/sqlfluff
rev: 2.3.4
hooks:
- id: sqlfluff-lint
types: []
types_or: [sql,jinja]
39 changes: 38 additions & 1 deletion README.MD
Original file line number Diff line number Diff line change
@@ -1 +1,38 @@
# Cumulus Library UMLS
# Cumulus Library UMLS

An installation of the Unified Medical Language System® Metathesaurus®. Part of the [SMART on FHIR Cumulus Project](https://smarthealthit.org/cumulus-a-universal-sidecar-for-a-smart-learning-healthcare-system/)

For more information, [browse the documentation](https://docs.smarthealthit.org/cumulus/library).
## Usage

In order to use the Metathesaurus, you'll need to get an API key for access from the National Library of Medicine, which you can sign up for [here](https://uts.nlm.nih.gov/uts/signup-login).

You can then install this module by running `pip install cumulus-library-umls`.

This will add a `umls` target to `cumulus-library`. You'll need to pass your
API key via the `--umls-key` CLI flag, or set the `UMLS_API_KEY` environment variable
to the key you received from NIH.

This ends up being a fairly intensive operation - we download a large file,
extract it, create parquet files from Athena, and then upload it. It usually
takes a half hour to run. We try to preserve some of those artifacts along
the way to make rebuilds faster. If you need to force recreation from scratch, the
`--replace-existing` CLI flag will handle this.

## Licensing details

The `cumulus-library-umls` study is provided as a convenience to install the
UMLS Metathesaurus, but is not shipped with the Metathesaurus dataset. It will
require an API key to download the data from NIH directly.

As a reminder, the
[License Agreement for Use of the UMLS® Metathesaurus®](https://uts.nlm.nih.gov/uts/assets/LicenseAgreement.pdf)
provides several restrictions on this usage of this data (including distributing
the dataset). When you sign up for a UMLS key, you are assuming responsibility
for complying with these terms, or an alternate licensing agreement with the
owner of the Metathesaus data if you are provided with one.


## Citations

Bodenreider O. The Unified Medical Language System (UMLS): integrating biomedical terminology. Nucleic Acids Res. 2004 Jan 1;32(Database issue):D267-70. doi: 10.1093/nar/gkh061. PubMed PMID: 14681409; PubMed Central PMCID: PMC308795.
Empty file.
6 changes: 6 additions & 0 deletions cumulus_library_umls/umls/manifest.toml
Original file line number Diff line number Diff line change
@@ -0,0 +1,6 @@
study_prefix = "umls"

[table_builder_config]
file_names = [
"umls_builder.py"
]
Loading

0 comments on commit 1bafaed

Please sign in to comment.