Package to annotate the binding type of bioactivity measurements based on a keyword search of

  • abstracts from PubMed, PubChem assay descriptions, CrossRef or Google Patents
  • assay descriptions from ChEMBL

The annotation is currently supported for two types of targets: GPCRs and kinases.

Getting started

Install

pip install git+https://github.com/sohviluukkonen/BindingType.git@main

Usage

The package has both an API and a CLI which can process either

  • Papyrus datasets
  • lists of document and/or assay IDs

Papyrus data

For a Papyrus dataframe, the annotation adds a new BindingType column to the dataframe. It can be run from the command line with

bindtype_papyrus -i <dataset.csv/.tsv> -tt <GPCR/Kinase>

or with the API with

from bindtype.papyrus import add_binding_type_to_papyrus
df = add_binding_type_to_papyrus(df, target_type="GPCR")  # or target_type="Kinase"

There is also an option to annotate all 'unknown' compounds based on their Tanimoto similarity to the annotated compounds: use the -sim, --similarity flag in the CLI or similarity=True in the API.
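To make the similarity option more concrete, here is a minimal sketch of Tanimoto similarity and a nearest-neighbour label transfer. It is not the package's implementation: the fingerprint representation (sets of on-bit indices), the `nearest_label` helper, the 0.7 cutoff and the example labels are all assumptions chosen for illustration.

```python
from __future__ import annotations


def tanimoto(a: set[int], b: set[int]) -> float:
    """Tanimoto similarity between two fingerprints given as sets of on-bit indices."""
    if not a and not b:
        return 0.0  # convention chosen here for two empty fingerprints
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def nearest_label(
    query: set[int],
    annotated: list[tuple[set[int], str]],
    threshold: float = 0.7,  # hypothetical cutoff, not taken from the package
) -> str:
    """Return the label of the most similar annotated fingerprint.

    Falls back to 'unknown' when no annotated fingerprint reaches the threshold.
    """
    best_label, best_sim = "unknown", 0.0
    for fingerprint, label in annotated:
        sim = tanimoto(query, fingerprint)
        if sim > best_sim:
            best_label, best_sim = label, sim
    return best_label if best_sim >= threshold else "unknown"


if __name__ == "__main__":
    # Toy fingerprints and labels, made up for this example.
    annotated = [({1, 2, 3, 4, 5}, "allosteric"), ({9, 10}, "orthosteric")]
    print(tanimoto({1, 2, 3}, {2, 3, 4}))
    print(nearest_label({1, 2, 3, 4}, annotated))
```

In practice the fingerprints would come from a cheminformatics toolkit rather than hand-written sets; the sketch only shows the arithmetic behind the similarity score.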

General usage

In the more general case, the annotation creates dictionaries of annotations from lists of document IDs and/or assay IDs. This can be done either from the command line with

bindtype -did <document_id_file_path> -aid <assay_id_file_path> -tt <GPCR/Kinase>

or with the API with

# for the GPCRs
from bindtype import ClassA_GPCR_HierachicalBindingTypeAnnotation
parser = ClassA_GPCR_HierachicalBindingTypeAnnotation()

# for the kinases
from bindtype import Kinase_AllostericAnnotation
parser = Kinase_AllostericAnnotation()

# Only abstracts
dct_doc_annotations = parser(document_ids=list_of_document_ids)

# Only assay descriptions
dct_assay_annotations = parser(assay_ids=list_of_assay_ids)

# Both
dct_doc_annotations, dct_assay_annotations = parser(document_ids=list_of_document_ids, assay_ids=list_of_assay_ids)

As the scripts were developed with data from Papyrus, document and assay IDs should be in the format used in the all_doc_ids and AID columns: PMID:<pubmed_id>, PubChemAID:<pubchem_assay_id>, DOI:<doi>, PATENT:<patent_id> and <chembl_assay_id>.
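The ID formats above can be checked before passing lists to the parser. The following is a small sketch, not part of the package: `split_id` and `group_ids` are hypothetical helpers, the "ChEMBL" source label for unprefixed IDs is an assumption, and the example IDs are made up.

```python
from __future__ import annotations

# Prefixes listed in the format description above.
KNOWN_PREFIXES = ("PMID", "PubChemAID", "DOI", "PATENT")


def split_id(identifier: str) -> tuple[str, str]:
    """Split an ID into (source, value).

    IDs without a known prefix are treated as ChEMBL assay IDs (assumption).
    """
    identifier = identifier.strip()
    # Split on the first colon only, so any later colons stay in the value.
    prefix, sep, value = identifier.partition(":")
    if sep and prefix in KNOWN_PREFIXES:
        return prefix, value
    return "ChEMBL", identifier


def group_ids(identifiers: list[str]) -> dict[str, list[str]]:
    """Group raw IDs by source, keeping the original order within each group."""
    grouped: dict[str, list[str]] = {}
    for identifier in identifiers:
        source, value = split_id(identifier)
        grouped.setdefault(source, []).append(value)
    return grouped


if __name__ == "__main__":
    # Made-up IDs, for illustration only.
    ids = ["PMID:12345678", "PubChemAID:1234", "DOI:10.1000/xyz123", "CHEMBL1234"]
    for source, values in group_ids(ids).items():
        print(source, values)
```

A check like this makes it easy to spot IDs that lack a prefix by mistake, since they end up in the ChEMBL group.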