cancervariants/gene-normalization

Gene Normalizer

Overview

The Gene Normalizer provides tools for resolving ambiguous human gene references to consistently structured, normalized terms. For gene concepts extracted from NCBI Gene, Ensembl, and HGNC, it designates a CURIE and provides additional metadata such as current and previously used symbols, aliases, database cross-references and associations, and coordinates.


Live service

Documentation · Installation · Usage · API reference


Install

The Gene Normalizer is available on PyPI:

python3 -m pip install gene-normalizer

See the installation instructions in the documentation for a description of installation options and data setup requirements.

Examples

Use the live service to programmatically normalize gene terms, as in the following truncated example:

$ curl 'https://normalize.cancervariants.org/gene/normalize?q=BRAF' | python -m json.tool
{
    "query": "BRAF",
    "match_type": 100,
    "gene": {
        "conceptType": "Gene",
        "id": "normalize.gene.hgnc:1097",
        "primaryCode": "hgnc:1097",
        "label": "BRAF",
        "extensions": [
            {
                "name": "aliases",
                "value": [
                    "BRAF1",
                    "B-RAF1",
                    "NS7",
                    "RAFB1",
                    "B-raf",
                    "BRAF-1"
                ]
            }
        ]
    }
    # ...
}
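
The same request can be made from Python using only the standard library. The following is a minimal sketch, not part of the Gene Normalizer package: the helper names `build_url`, `fetch_normalized`, and `get_extension` are hypothetical, and the response layout (`gene` → `extensions` → `name`/`value`) is assumed from the truncated example above.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint taken from the curl example above.
BASE_URL = "https://normalize.cancervariants.org/gene/normalize"


def build_url(term: str) -> str:
    """Return the normalize endpoint URL with the query term URL-encoded."""
    return f"{BASE_URL}?{urlencode({'q': term})}"


def fetch_normalized(term: str, timeout: float = 10.0) -> dict:
    """Query the live service and return the decoded JSON response."""
    with urlopen(build_url(term), timeout=timeout) as response:
        return json.load(response)


def get_extension(response: dict, name: str, default=None):
    """Return the value of the named entry in gene.extensions.

    Assumes the response shape shown in the truncated example; returns
    `default` when the gene or the named extension is absent.
    """
    gene = response.get("gene") or {}
    for ext in gene.get("extensions", []):
        if ext.get("name") == name:
            return ext.get("value")
    return default
```

With network access, `get_extension(fetch_normalized("BRAF"), "aliases")` would be expected to return the alias list from the response, assuming the service still returns the structure shown above.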

Or use the Python API for fast access:

>>> from gene.database import create_db
>>> from gene.query import QueryHandler
>>> q = QueryHandler(create_db())
>>> result = q.normalize("KRAS")
>>> result.gene.primaryCode
'hgnc:6407'

See the usage and normalization entries in the documentation for more.

Feedback and contributing

We welcome bug reports, feature requests, and code contributions from users and interested collaborators. The documentation contains guidance for submitting feedback and contributing new code.