PGxCorpus

PGxCorpus is a manually annotated corpus designed for the extraction of pharmacogenomic relations from text. It comprises 945 manually annotated sentences drawn from 911 distinct PubMed abstracts. Annotation was carried out by 11 annotators, including 5 senior annotators. Each sentence was annotated independently by 2 annotators in a first phase, and then by a third, senior annotator in a second phase.

PGxCorpus is distributed in the file PGxCorpus.tar, in the Brat file format.
It can be browsed on our Brat server at https://pgxcorpus.loria.fr/.
It is also available on FigShare.
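
For readers unfamiliar with the Brat standoff format, the following is a minimal Python sketch of how an `.ann` file could be read once the archive is extracted. It handles only text-bound annotations (`T` lines, including discontinuous spans separated by `;`) and binary relations (`R` lines); other line types are skipped. The function name `parse_brat_ann` and the entity and relation labels in the sample (`Gene`, `Drug`, `influences`) are illustrative assumptions, not the actual PGxCorpus annotation scheme, which is described in the annotation guidelines.

```python
from typing import Dict, List, Tuple


def parse_brat_ann(text: str) -> Tuple[Dict[str, dict], List[dict]]:
    """Parse Brat standoff annotation text into entities and relations.

    Hypothetical helper for illustration; only T and R lines are handled.
    """
    entities: Dict[str, dict] = {}
    relations: List[dict] = []
    for line in text.splitlines():
        if not line.strip():
            continue
        fields = line.split("\t")
        ann_id = fields[0]
        if ann_id.startswith("T"):
            # T1<TAB>Type start end[;start end ...]<TAB>covered text
            ann_type, span_str = fields[1].split(" ", 1)
            spans = [
                tuple(int(x) for x in fragment.split())
                for fragment in span_str.split(";")
            ]
            entities[ann_id] = {
                "type": ann_type,
                "spans": spans,
                "text": fields[2] if len(fields) > 2 else "",
            }
        elif ann_id.startswith("R"):
            # R1<TAB>Type Arg1:T1 Arg2:T2
            parts = fields[1].split()
            args = dict(p.split(":", 1) for p in parts[1:])
            relations.append(
                {
                    "id": ann_id,
                    "type": parts[0],
                    "arg1": args["Arg1"],
                    "arg2": args["Arg2"],
                }
            )
    return entities, relations


# Made-up sample for the sentence
# "CYP2D6 genotype may influence tamoxifen response."
# The labels below are assumptions, not taken from PGxCorpus.
SAMPLE = (
    "T1\tGene 0 6\tCYP2D6\n"
    "T2\tDrug 30 39\ttamoxifen\n"
    "R1\tinfluences Arg1:T1 Arg2:T2\n"
)

entities, relations = parse_brat_ann(SAMPLE)
for rel in relations:
    print(rel["type"], entities[rel["arg1"]]["text"], entities[rel["arg2"]]["text"])
```

Spans are kept as a list of `(start, end)` pairs so that continuous and discontinuous entities can be handled uniformly.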

Annotation guidelines

The annotation guidelines were provided to the annotators to reduce heterogeneity in the annotation task.

annotation_guidelines.pdf: Annotation guidelines

Source code of a baseline experiment

The source code of the baseline experiment reported in [1] is available in ./baseline_experiment/

In preparation.

License

PGxCorpus is released under the Creative Commons Attribution-NonCommercial 4.0 (CC BY-NC 4.0) license.

Acknowledgments

PGxCorpus is supported by the PractiKPharma project (http://practikpharma.loria.fr/), funded by the French National Research Agency (ANR) under grant ANR-15-CE23-0028.
