Skip to content

lijianw2/Enhancing_Biomedical_Lay_Summarisation_with_External_Knowledge_Graphs

 
 

Repository files navigation

Enhancing_Biomedical_Lay_Summarisation_with_External_Knowledge_Graphs

This repository will contain the code and data for the paper "Enhancing Biomedical Lay Summarisation with External Knowledge Graphs", accepted in EMNLP 2023.

Running models with eLife data

Downloading trained models

Our trained models can be downloaded automatically by running the get_models.sh script:

bash get_models.sh

Downloading eLife graph data

Similarly, the eLife graph data can be downloaded by running the get_elife_graph_data.sh script:

bash get_elife_graph_data.sh

Running generation with trained models

To generate summaries for the eLife data using our trained models, run the generate.py script with the path to the model you wish to use:

python generate.py {model_path}

Running training with eLife data

To train a model on the eLife data, run the train.py script with the path a config file (see train_configs.json for an example):

python train.py {config_path}

Running models with new data

In order to run any of the models on new data, new graph data files (in the same format as our eLife graph data) will need to be created.

Our graph data is stored in .pkl files (one for each data split), each of which is created from .jsonl file containing a list of dictionaries (one for each article) in the following format:

{
  "id": str,                                         # unique identifier
  "nodes": [node_id],                                # list of graph nodes, represented by their string ids
  "edges": [[src_node_id, rel_id, tgt_node_id]],     # list of relation tuples, represented by their node/relation string ids 
  "nfeatures": [node_embedding],                     # list of initial node features, n-dimentional arrays
}

As covered in the paper, we use data from UMLS to construct our graphs, which can only be accessed by requesting a lisence. Specifically, use the Metamap tool to map text to UMLS concepts, we use the UMLS Metathesaurus to retrieve semantic types of concepts, and, finally, we use the UMLS API to retrieve the definitions of the identified UMLS concepts and semantic types.

Although we are unable to publish the raw UMLS data files we use to construct our graphs (and the concept and semantic type definitions we use to construct our features) due to UMLS licensing restrictions, we will update this repository with the code we use to construct our graphs and features in the near future.

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • Python 97.4%
  • Shell 2.6%