Entity Extraction in the metabolites dataset

Tool for Named Entity Recognition (NER) + Named Entity Linking (NEL).

Access the annotations for the metabolites dataset here.

Setup

Docker

docker build . -t ner_nel

Run a container over the built image:

docker run -v $(pwd):/exposome_NER_NEL/ --name exposome_NER_NEL --gpus all -it ner_nel bash

The part '$(pwd)' indicates that the code available to the container will be placed in the current directory. Replace this if running the code from other directory.

The arg '"device=7"' indicates that GPU 7 will be used. Change to use other GPUS, for example, '"device=7,8"' to use GPUs 7 and 8 (don't forget to schedule on the Google calendar the use of gpus :) )

The arg --user $(id -u):$(id -g) indicates that current user (you) will have permission to access the files produced during the container run.

To disable annoyng messages printed to the terminal run the following commands

export TF_CPP_MIN_LOG_LEVEL=3
export AUTOGRAPH_VERBOSITY=0

Go to the root directory:

cd exposome_NER_NEL/

Set current directory as the root:

export PYTHONPATH="${PYTHONPATH}:."

Install the abbreviation detector:

./prepare_env.sh

Download data resources to apply the tool and the metabolites dataset

Run:

./get_data.sh

Annotation

Assuming that the directory 'data/metabolites_dataset' exists, to annotate the entities present in the a given set of the dataset run:

python use.py -set <set_name>/

<set_name> can have the following values:

‘exposome’, corresponding to the initial sample with 29 full-text articles from the Exposome
‘inflammation’ with 115 full-text articles associated with the HMDB metabolites related to inflammation
'cancer' with 1092 full-text articles, retrieved with the pmids of the file 'pmid-Microbiota-Cancers_set'
'large' with 3423 full-text articles, retrieved with the pmids of the file 'pmid-Microbiolo-largeset-10k_papers'

The annotations files will be outputted to the 'data/metabolites_dataset/<set_name>/ann/' in the BRAT format.

Name		Name	Last commit message	Last commit date
Latest commit History 6 Commits
src		src
Dockerfile		Dockerfile
get_data.sh		get_data.sh
prepare_env.sh		prepare_env.sh
readme.md		readme.md
requirements.txt		requirements.txt
use.py		use.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Entity Extraction in the metabolites dataset

Setup

Docker

Download data resources to apply the tool and the metabolites dataset

Annotation

About

Releases

Packages

Languages

lasigeBioTM/exposome_NER_NEL

Folders and files

Latest commit

History

Repository files navigation

Entity Extraction in the metabolites dataset

Setup

Docker

Download data resources to apply the tool and the metabolites dataset

Annotation

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages