Pablo Moreno edited this page Feb 5, 2014 · 44 revisions

BiNChE is a tool for ontology-based chemical enrichment analysis. Based on the ChEBI chemical ontology, BiNChE enables researchers to identify overrepresented, i.e. enriched, ontological terms in their data. The tool is accessible through the ChEBI website. In addition, a standalone Java library is provided here.

Following in the footsteps of enrichment tools for the Gene Ontology, BiNChE uses organized chemical knowledge to identify chemical classes, roles, or both that are enriched in small-molecule omics data. As in genomics, chemical enrichment analysis provides higher-level information and associations, e.g. to biological roles. Enrichment analysis is an essential tool for exploring small-molecule data.

Entry page: http://www.ebi.ac.uk/chebi/tools/binche/

Web Interface

Input

  • Plain: The plain or unweighted analysis requires a list of ChEBI identifiers and relies on a binomial test to determine whether the provided list is enriched in certain ChEBI categories.

  • Weighted: The weighted analysis needs a list of ChEBI identifiers plus weights (decimal numbers), given as two tab-delimited columns. Examples of weights are intensity values from measurements or scores from putative molecule identification lists. This type of enrichment uses an implementation of the SaddleSum algorithm to calculate the significance of an enrichment.

  • Fragment: This is a particular case of a weighted analysis, where only a subset of the ontology is used and certain pruners are applied. As such, the input is the same as that described for the weighted analysis.
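The weighted and fragment inputs share a simple two-column, tab-delimited format. The hypothetical `WeightedInputParser.parse` below is a minimal sketch of reading that format in Java, the language of the BiNChE library. It maps each identifier to its weight and is not part of BiNChE's API. Identifiers are kept verbatim, with no case normalisation. If an identifier is repeated, its last weight wins.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class WeightedInputParser {

    /**
     * Parses tab-delimited "identifier<TAB>weight" lines into an ordered map.
     * Blank lines are skipped; malformed lines raise IllegalArgumentException
     * (NumberFormatException, its subclass, is thrown for non-numeric weights).
     */
    static Map<String, Double> parse(String text) {
        Map<String, Double> weights = new LinkedHashMap<>();
        String[] lines = text.split("\\R"); // any line break: \n, \r\n, ...
        for (int n = 0; n < lines.length; n++) {
            String line = lines[n].trim();
            if (line.isEmpty()) continue;
            String[] cols = line.split("\t");
            if (cols.length != 2) {
                throw new IllegalArgumentException(
                        "line " + (n + 1) + ": expected 2 tab-separated columns");
            }
            weights.put(cols[0].trim(), Double.parseDouble(cols[1].trim()));
        }
        return weights;
    }

    public static void main(String[] args) {
        String input = "CHEBI:16708\t1\nCHEBI:24549\t0.5\nCHEBI:52966\t0.1\n";
        System.out.println(parse(input)); // {CHEBI:16708=1.0, CHEBI:24549=0.5, CHEBI:52966=0.1}
    }
}
```

A LinkedHashMap keeps the input order, which makes it easy to compare the parsed list against the original file.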

Type of Analysis

  • Plain: Plain analysis runs a binomial test to check whether the ontological terms associated with the input deviate significantly from the background population.

  • Weighted: Weighted analysis runs a SaddleSum implementation that "approximates the distribution of sum of weights asymptotically by saddlepoint method" (see the manual). The weights indicate the importance of each input identifier.

  • Fragment: Fragment analysis is a weighted analysis limited to the chemical classes of the ChEBI ontology (roles are not used). It applies its own pruning strategy to the resulting graph to highlight molecular entities that are enriched. "Fragments" should be understood as molecular fragments or functional groups, and such data typically come from fragmentation mass spectrometry experiments. In contrast to the weighted analysis option, terminal molecular leaves and root vertices are not removed.
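The binomial test behind the plain analysis can be sketched as an upper-tail probability. The parametrisation is an assumption for illustration and may not match BiNChE's exact one:

- n is the number of input identifiers.
- k is the number of those identifiers annotated to a given term.
- p is that term's share of the background population.

A small P(X >= k) indicates enrichment. The class and method names are hypothetical.

```java
public class BinomialEnrichment {

    /**
     * Upper-tail binomial probability P(X >= k) for X ~ Binomial(n, p).
     * Assumed (hypothetical) parametrisation: n = input identifiers,
     * k = those annotated to the term, p = the term's share of the background.
     */
    static double upperTail(int n, int k, double p) {
        if (k < 0 || k > n) throw new IllegalArgumentException("need 0 <= k <= n");
        if (!(p > 0.0 && p < 1.0)) throw new IllegalArgumentException("need 0 < p < 1");
        // Work in log space to avoid underflow: log P(X = 0) = n * log(1 - p), and
        // P(X = i + 1) / P(X = i) = (n - i) / (i + 1) * p / (1 - p).
        double logOdds = Math.log(p) - Math.log1p(-p);
        double logPmf = n * Math.log1p(-p);
        double tail = 0.0;
        for (int i = 0; i < n; i++) {
            if (i >= k) tail += Math.exp(logPmf);
            logPmf += Math.log(n - i) - Math.log(i + 1) + logOdds;
        }
        tail += Math.exp(logPmf); // the i = n term, always included since k <= n
        return Math.min(1.0, tail);
    }

    public static void main(String[] args) {
        // Hypothetical numbers: 4 of 10 input compounds fall under a term
        // that covers 5% of the background ontology.
        System.out.println(upperTail(10, 4, 0.05));
    }
}
```

Computing the probability mass function in log space keeps the sketch stable for larger inputs, where (1 - p)^n on its own could underflow.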

In every case, the significance of the results is corrected for multiple hypothesis testing using the Benjamini-Hochberg false discovery rate (FDR). In all types of analysis, the enrichment is calculated using the entire selected ontology as the background population.
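The Benjamini-Hochberg adjustment is short enough to sketch in full. The hypothetical `adjust` method below works in two steps:

1. It multiplies the p-value of rank i (out of m, in ascending order) by m / i.
2. It takes a running minimum from the largest p-value down, so adjusted values never decrease with rank and never exceed 1.

This is the textbook procedure, not code taken from BiNChE.

```java
import java.util.Arrays;
import java.util.Comparator;

public class BenjaminiHochberg {

    /** Benjamini-Hochberg adjusted p-values, returned in the input order. */
    static double[] adjust(double[] pValues) {
        int m = pValues.length;
        Integer[] order = new Integer[m];
        for (int i = 0; i < m; i++) order[i] = i;
        // Indices sorted by ascending raw p-value.
        Arrays.sort(order, Comparator.comparingDouble(i -> pValues[i]));
        double[] adjusted = new double[m];
        double runningMin = 1.0; // adjusted values are capped at 1
        // Walk from the largest p-value down so adjusted values stay monotone.
        for (int rank = m; rank >= 1; rank--) {
            int idx = order[rank - 1];
            runningMin = Math.min(runningMin, pValues[idx] * m / rank);
            adjusted[idx] = runningMin;
        }
        return adjusted;
    }

    public static void main(String[] args) {
        double[] q = adjust(new double[] {0.01, 0.04, 0.03, 0.005});
        System.out.println(Arrays.toString(q));
    }
}
```

For the inputs {0.01, 0.04, 0.03, 0.005}, the adjusted values should be close to [0.02, 0.04, 0.04, 0.02], up to floating-point rounding.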

Target of Enrichment

The ChEBI chemical ontology includes three branches: roles, structure classifications, and subatomic particles. BiNChE only makes use of the role and structure classifications. Depending on the scientific question, these branches can be used separately or in combination for an enrichment analysis.

  • ChEBI structure classification: The structure classification describes a molecular entity based on its composition and/or the connectivity between its constituent atoms.

  • ChEBI role classification: The role classification describes the role of a molecular entity within a biological context and/or its intended use by humans.

  • ChEBI structure and role classification: The structure and role classification is the union of both classifications. Note that the structure classification is significantly larger than the role classification.
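Choosing a target of enrichment amounts to choosing which part of the ontology forms the background population. A minimal sketch, assuming the ontology is available as a parent-to-children map, collects every term reachable from the selected root or roots. Two assumptions are labelled here:

- The root identifiers are our assumption about ChEBI's top-level terms: CHEBI:24431 ("chemical entity") for structures and CHEBI:50906 ("role").
- The small ontology in the example is hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Collection;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class BranchBackground {

    // Assumed top-level ChEBI terms; verify against the ontology release in use.
    static final String STRUCTURE_ROOT = "CHEBI:24431"; // chemical entity
    static final String ROLE_ROOT = "CHEBI:50906";      // role

    /** Returns every term reachable from the given roots via child links, roots included. */
    static Set<String> background(Map<String, List<String>> children, Collection<String> roots) {
        Set<String> seen = new LinkedHashSet<>();
        Deque<String> stack = new ArrayDeque<>(roots);
        while (!stack.isEmpty()) {
            String term = stack.pop();
            if (seen.add(term)) { // visit each term once, even if it has several parents
                for (String child : children.getOrDefault(term, List.of())) {
                    stack.push(child);
                }
            }
        }
        return seen;
    }

    /** Hypothetical toy ontology: S* terms sit under the structure root, R* under the role root. */
    static Map<String, List<String>> toyOntology() {
        return Map.of(
                STRUCTURE_ROOT, List.of("S1", "S2"),
                "S1", List.of("S3"),
                "S2", List.of("S3"), // S3 has two parents, as is common in a DAG
                ROLE_ROOT, List.of("R1"));
    }

    public static void main(String[] args) {
        Map<String, List<String>> ontology = toyOntology();
        System.out.println(background(ontology, List.of(ROLE_ROOT)).size());                 // 2
        System.out.println(background(ontology, List.of(STRUCTURE_ROOT)).size());            // 4
        System.out.println(background(ontology, List.of(STRUCTURE_ROOT, ROLE_ROOT)).size()); // 6
    }
}
```

Selecting both roots yields the union of the two branches. In the toy ontology, term S3 is counted once even though it has two parents.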

Graph Pruning Strategies

The ChEBI ontology forms a directed acyclic graph. The challenge in visualising enrichment results lies in the complexity and detail of the ontology graph. An informative graph should, first and foremost, show the enriched ontological terms. To add information to this bare list of enriched terms, it is essential to show the relative position or connectivity of those terms to each other. To avoid unnecessary clutter, pruning strategies are applied to the graph layout to remove irrelevant terms. Only terms that are not enriched are subject to pruning.

  • Zero Degree Vertex Pruner: Removes vertices that have a total degree of zero.

  • Root Children Pruner: Removes a set number of levels of children vertices below the root vertices of the structure and role ontologies (three or four levels in the strategies described below). The removed vertices refer to less meaningful terms, such as "molecular entity", "chemical substance", or "application", which skew the overall graph layout.

  • Molecule Leaves Pruner: Removes leaves (terminal vertices) that represent discrete molecules and not a class or role.

  • High P-Value Branch Pruner: Removes branches from the graph components that contain only vertices with a p-value greater than 0.05.

  • Linear Branch Collapser Pruner: Collapses linear branches within the graph, hiding connecting vertices that are not involved in branching, i.e. vertices with an in-degree and an out-degree of one.
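Two of these pruners are simple enough to sketch on a minimal adjacency-map graph. The code below is a hypothetical illustration, not BiNChE's implementation:

- `pruneZeroDegree` mirrors the Zero Degree Vertex Pruner.
- `collapseLinear` mirrors the Linear Branch Collapser Pruner.

Both skip enriched vertices, because only terms that are not enriched are subject to pruning. Edges point from a child term to its parent term.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class PrunerSketch {

    /** Minimal directed graph; edges point from a child term to its parent term. */
    static final class Graph {
        final Map<String, Set<String>> parents = new HashMap<>();  // out-edges
        final Map<String, Set<String>> children = new HashMap<>(); // in-edges

        void addVertex(String v) {
            parents.computeIfAbsent(v, k -> new HashSet<>());
            children.computeIfAbsent(v, k -> new HashSet<>());
        }

        void addEdge(String child, String parent) {
            addVertex(child);
            addVertex(parent);
            parents.get(child).add(parent);
            children.get(parent).add(child);
        }

        void removeVertex(String v) {
            for (String p : parents.remove(v)) children.get(p).remove(v);
            for (String c : children.remove(v)) parents.get(c).remove(v);
        }
    }

    /** Zero Degree Vertex Pruner: drops non-enriched vertices that have no edges. */
    static void pruneZeroDegree(Graph g, Set<String> enriched) {
        List<String> isolated = new ArrayList<>();
        for (String v : g.parents.keySet()) {
            if (!enriched.contains(v) && g.parents.get(v).isEmpty() && g.children.get(v).isEmpty()) {
                isolated.add(v);
            }
        }
        isolated.forEach(g::removeVertex);
    }

    /** Linear Branch Collapser: bypasses non-enriched vertices with in- and out-degree one. */
    static void collapseLinear(Graph g, Set<String> enriched) {
        for (String v : new ArrayList<>(g.parents.keySet())) {
            if (enriched.contains(v)) continue;
            if (g.children.get(v).size() == 1 && g.parents.get(v).size() == 1) {
                String child = g.children.get(v).iterator().next();
                String parent = g.parents.get(v).iterator().next();
                g.removeVertex(v);
                g.addEdge(child, parent); // keep the path, hide the intermediate term
            }
        }
    }

    /** Toy chain A -> B -> C -> D (child -> parent) plus an isolated vertex E. */
    static Graph demo() {
        Graph g = new Graph();
        g.addEdge("A", "B");
        g.addEdge("B", "C");
        g.addEdge("C", "D");
        g.addVertex("E");
        Set<String> enriched = Set.of("A", "D");
        pruneZeroDegree(g, enriched);
        collapseLinear(g, enriched);
        return g;
    }

    public static void main(String[] args) {
        System.out.println(demo().parents); // A now points directly to D
    }
}
```

In the toy example, the enriched terms A and D end up directly connected, and the isolated, non-enriched vertex E is removed.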

To be used, pruners need to be combined into pruning strategies. Pruning strategies implement the PruningStrategy interface (https://github.com/pcm32/BiNChE/blob/develop/src/main/java/net/sourceforge/metware/binche/graph/PrunningStrategy.java). Because each application of a pruner changes the graph, applying pruners repeatedly can remove further elements. Pruning strategies apply pruners in three phases: initial, loop, and final, executed in that order. Pruners need to be assigned to each phase, and a pruner can be assigned to more than one phase. The initial and final phases apply their pruners only once, while the loop phase applies its set of pruners repeatedly until the graph converges. The currently implemented strategies are:

  • Empty Pruning Strategy: No pruning applied.

  • Fragment Enrichment Pruning Strategy: Applies the High P-Value Branch Pruner (with a cut-off at 0.05) and the Linear Branch Collapser Pruner, both in the initial and loop phases.

  • Plain Enrichment Pruning Strategy: In the initial phase, this strategy applies the High P-Value Branch Pruner (0.05), the Linear Branch Collapser Pruner, and the Root Children Pruner (3 levels, without repetition). During the loop phase, it applies the Molecule Leaves Pruner, the High P-Value Branch Pruner (0.05), the Linear Branch Collapser Pruner, and the Zero Degree Vertex Pruner. No pruners are applied in the final phase.

  • Weighted Enrichment Pruning Strategy: In the initial phase, this strategy applies the Molecule Leaves Pruner, the Root Children Pruner (4 levels, no repetition), and the High P-Value Branch Pruner (0.05). No pruners are applied in the loop phase. In the final phase, the Linear Branch Collapser and Zero Degree Vertex Pruners are applied.
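The three-phase scheme can be sketched generically. `ThreePhaseStrategy` is hypothetical and is not the PruningStrategy interface linked above. It works as follows:

- The initial and final pruners run once.
- The loop pruners repeat until a pass leaves the graph size unchanged.

Using size as the convergence test assumes that pruners only ever remove elements. To keep the control flow easy to follow, the toy "graph" is just a sorted set of integers.

```java
import java.util.List;
import java.util.TreeSet;
import java.util.function.ToIntFunction;

public class PruningStrategySketch {

    /** Hypothetical pruner interface: mutates the graph in place. */
    interface Pruner<G> {
        void prune(G graph);
    }

    /** Applies pruners in three phases: initial (once), loop (until convergence), final (once). */
    static final class ThreePhaseStrategy<G> {
        private final List<Pruner<G>> initial;
        private final List<Pruner<G>> loop;
        private final List<Pruner<G>> fin;
        // Convergence test: assumes pruners only remove elements, so an
        // unchanged size means the last loop pass changed nothing.
        private final ToIntFunction<G> size;

        ThreePhaseStrategy(List<Pruner<G>> initial, List<Pruner<G>> loop,
                           List<Pruner<G>> fin, ToIntFunction<G> size) {
            this.initial = initial;
            this.loop = loop;
            this.fin = fin;
            this.size = size;
        }

        /** Prunes the graph in place and returns the number of loop passes. */
        int apply(G graph) {
            for (Pruner<G> p : initial) p.prune(graph);
            int passes = 0;
            if (!loop.isEmpty()) {
                int before;
                do {
                    before = size.applyAsInt(graph);
                    for (Pruner<G> p : loop) p.prune(graph);
                    passes++;
                } while (size.applyAsInt(graph) != before);
            }
            for (Pruner<G> p : fin) p.prune(graph);
            return passes;
        }
    }

    /** Toy run: loop pruner drops the largest id while it exceeds 2; final pruner drops the smallest. */
    static int demo(TreeSet<Integer> graph) {
        Pruner<TreeSet<Integer>> dropLargeMax = g -> {
            if (!g.isEmpty() && g.last() > 2) g.pollLast();
        };
        Pruner<TreeSet<Integer>> dropMin = g -> {
            if (!g.isEmpty()) g.pollFirst();
        };
        ThreePhaseStrategy<TreeSet<Integer>> strategy = new ThreePhaseStrategy<>(
                List.of(), List.of(dropLargeMax), List.of(dropMin), TreeSet::size);
        return strategy.apply(graph);
    }

    public static void main(String[] args) {
        TreeSet<Integer> graph = new TreeSet<>(List.of(1, 2, 3, 4, 5));
        int passes = demo(graph);
        System.out.println(passes + " loop passes, remaining " + graph); // 4 loop passes, remaining [2]
    }
}
```

An empty loop list, as in the Weighted Enrichment Pruning Strategy, simply yields zero loop passes. In the toy run, the first three loop passes remove 5, 4 and 3, and the fourth pass changes nothing. The final pruner then removes 1, leaving [2].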

Use Cases

In general, any list of small molecules, whether produced by a computational pipeline, an experimental technique, or any other method, is suitable for analysis with BiNChE. Examples include:

  • small molecules that are relevant within a set of biological assays;

  • metabolites that are consumed or produced by a set of enzymes of interest;

  • metabolites that are known to be part of the metabolism of one organism but are absent in other organisms of interest;

  • small molecules that were defined as relevant in a metabolomics study.

Weighted

Weighted analysis provides a bird's-eye view of a list of compounds that have associated weights, e.g. from network analysis or metabolomics. The example below comes from an effort to build tissue-specific metabolic pathways. Here, weighted enrichment analysis highlights the presence of "very long-chain fatty acyl-CoA" (CHEBI:61910) in the target tissue (brain). Subsequent reasoning about the presence of acyl-CoAs in that tissue helps to validate and refine the methods used.

CHEBI:16708	1
CHEBI:17689	1
CHEBI:24549	0.5
CHEBI:29101	1
CHEBI:50622	1
CHEBI:52966	0.1
CHEBI:63540	1
CHEBI:63543	0.5
CHEBI:63546	1
CHEBI:63548	1
CHEBI:65136	1
CHEBI:72714	0.5
CHEBI:72715	1
CHEBI:73061	1
CHEBI:73072	1
CHEBI:73074	1
CHEBI:75108	0.5
CHEBI:76450	1
ChEBI:11851	1
ChEBI:12962	0.1
ChEBI:13115	1
ChEBI:13705	1

Plain

Plain analysis can be used to analyse metabolite identification lists from MetFrag. Running MetFrag with default settings on a fragmentation spectrum yields a list of 15 putative identifications. After identifier conversion (e.g. using the ChEBI plug-in in KNIME), these identifiers can be used as input for BiNChE. Among other results, the analysis shows enrichment in the terms chromanone and chromanes. This suggests that the spectrum represents a bicyclic compound belonging to the class of chromanes.

CHEBI:15413
CHEBI:76215
CHEBI:50202
CHEBI:15649
CHEBI:52047
CHEBI:16035
CHEBI:27725
CHEBI:3237
CHEBI:18131
CHEBI:27587

Implementation and Core Library (API)

http://cytoscapeweb.cytoscape.org/
https://github.com/pcm32/BiNChE

Usage
