A project designed to extract structured data from clinical text-based notes

alexanderflint/structured-data-from-notes

structured-data-from-notes

A project designed to extract structured data from clinical text-based notes

This repository contains SQL and Python code written to capture and parse structured data from clinical text-based notes from an electronic medical record (EMR).

The SQL code identifies cases by searching specific note types for a text string embedded in the note template. The initial versions are designed to query specific fields in Clarity (Epic), but the general method can be adapted for use with other relational databases.
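The case-finding step can be illustrated with a minimal sketch. It uses Python's built-in `sqlite3` module and a toy in-memory table, not Clarity; the table name `notes`, its columns, the note types, and the marker string are all hypothetical stand-ins chosen for illustration.

```python
import sqlite3

# Hypothetical marker string that the note template embeds in each note.
TEMPLATE_MARKER = "varHairColor;"


def find_template_notes(conn, marker, note_types):
    """Return (note_id, note_text) rows of the given types that contain marker."""
    placeholders = ", ".join("?" for _ in note_types)
    query = (
        "SELECT note_id, note_text FROM notes "
        f"WHERE note_type IN ({placeholders}) "
        "AND instr(note_text, ?) > 0 "  # plain substring test, no LIKE wildcards
        "ORDER BY note_id"
    )
    return conn.execute(query, [*note_types, marker]).fetchall()


# Toy table standing in for the EMR note store (assumed schema, not Clarity's).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE notes (note_id INTEGER PRIMARY KEY, note_type TEXT, note_text TEXT)"
)
conn.executemany(
    "INSERT INTO notes VALUES (?, ?, ?)",
    [
        (1, "Progress Note", "Hair color: 3 - Brown varHairColor;"),
        (2, "Progress Note", "Free-text note written without the template."),
        (3, "Telephone", "Hair color: 2 - Blond varHairColor;"),
    ],
)

matches = find_template_notes(conn, TEMPLATE_MARKER, ["Progress Note"])
```

Only note 1 is returned here: note 2 lacks the marker and note 3 has the wrong note type. Against a real EMR database the table, column, and note-type names would need to be replaced with the ones that database actually uses.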

The Python code takes the results of the SQL query and extracts structured data from each clinical note by identifying a simple repeating motif that a purpose-built note template generates within the note. The motif has the same structure for each field.

For example:

Hair color: 3 - Brown varHairColor;

As the enumerations generated by the custom note template are designed to have an integer key and a non-integer text value, these can be parsed individually or together as a key-value pair.
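A minimal sketch of parsing the motif into key-value pairs follows. The regular expression is an assumption inferred from the single example above (label, colon, integer key, hyphen, text value, variable name, semicolon); the names `MOTIF` and `parse_fields` and the second field in the sample note are hypothetical, and the repository's own parser may work differently.

```python
import re

# Assumed motif shape: "<label>: <integer> - <text> <variable name>;"
MOTIF = re.compile(
    r"(?P<label>[^:;\n]+):\s*"   # field label, up to the colon
    r"(?P<key>\d+)\s*-\s*"       # integer key, then the hyphen separator
    r"(?P<value>[^;\n]*?)\s+"    # text value (may contain spaces)
    r"(?P<var>\w+)\s*;"          # variable name, then the closing semicolon
)


def parse_fields(note_text):
    """Map each variable name to its (integer key, text value) pair."""
    fields = {}
    for match in MOTIF.finditer(note_text):
        key = int(match.group("key"))
        value = match.group("value").strip()
        fields[match.group("var")] = (key, value)
    return fields


# The second field is invented for illustration only.
note = "Hair color: 3 - Brown varHairColor; Eye color: 1 - Blue varEyeColor;"
parsed = parse_fields(note)
keys_only = {var: key for var, (key, _value) in parsed.items()}
```

Keeping the pair together in `parsed` and deriving `keys_only` from it means the same pass over the note can serve either output style.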

If the parser is set up to extract only the integer key from each enumeration string (as is the case with the initial versions of the Python code included here), the resulting output would look like this:

| subject | hairColor |
| --- | --- |
| 1 | 3 |
| 2 | 2 |
| 3 | 1 |
| 4 | 3 |

...
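One way to assemble a table of this shape is sketched below. It makes several assumptions: the motif pattern is inferred from the README's single example, the column name is derived by dropping a leading `var` from the variable name (a guess based on `varHairColor` becoming `hairColor`), and the sample notes, including the text values for keys 1 and 2, are invented so that the integer keys match the table above.

```python
import re

# Assumed motif: ": <integer> - <text> <variable name>;"
KEY_MOTIF = re.compile(r":\s*(\d+)\s*-[^;\n]*?\s(\w+)\s*;")


def column_name(var):
    """Guess the output column: 'varHairColor' -> 'hairColor' (assumed rule)."""
    name = var[3:] if var.startswith("var") else var
    return name[:1].lower() + name[1:]


def integer_keys(note_text):
    """Extract only the integer key for each variable found in the note."""
    return {column_name(var): int(key) for key, var in KEY_MOTIF.findall(note_text)}


def to_rows(notes):
    """Build one row per subject from a mapping of subject id to note text."""
    rows = []
    for subject in sorted(notes):
        row = {"subject": subject}
        row.update(integer_keys(notes[subject]))
        rows.append(row)
    return rows


# Invented sample notes; only "3 - Brown" appears in the README itself.
notes = {
    1: "Hair color: 3 - Brown varHairColor;",
    2: "Hair color: 2 - Blond varHairColor;",
    3: "Hair color: 1 - Black varHairColor;",
    4: "Hair color: 3 - Brown varHairColor;",
}
rows = to_rows(notes)
```

Each entry in `rows` is a plain dictionary, so the result can be handed to `csv.DictWriter` or a data-frame library without further reshaping.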
