The Gene Normalizer provides tools for resolving ambiguous human gene references to consistently structured, normalized terms. For gene concepts extracted from NCBI Gene, Ensembl, and HGNC, it designates a CURIE and provides additional metadata such as current and previously used symbols, aliases, database cross-references and associations, and coordinates.
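A CURIE (compact URI) such as `hgnc:1097` is a namespace prefix and a local identifier joined by a colon. The sketch below splits one into its two parts; `split_curie` is a hypothetical helper written for illustration, not part of the Gene Normalizer API, and it assumes the prefix is everything before the first colon.

```python
from __future__ import annotations


def split_curie(curie: str) -> tuple[str, str]:
    """Split a CURIE such as 'hgnc:1097' into (prefix, local_id).

    Hypothetical helper: the prefix is taken to be everything before the
    first colon; anything after it is kept intact as the local identifier.
    """
    prefix, sep, local_id = curie.partition(":")
    if not sep or not prefix or not local_id:
        raise ValueError(f"not a CURIE: {curie!r}")
    return prefix, local_id


print(split_curie("hgnc:1097"))  # ('hgnc', '1097')
```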
The Gene Normalizer is available on PyPI:
python3 -m pip install gene-normalizer
See the installation instructions in the documentation for a description of installation options and data setup requirements.
Use the live service to programmatically normalize gene terms, as in the following truncated example:
$ curl 'https://normalize.cancervariants.org/gene/normalize?q=BRAF' | python -m json.tool
{
    "query": "BRAF",
    "match_type": 100,
    "gene": {
        "conceptType": "Gene",
        "id": "normalize.gene.hgnc:1097",
        "primaryCode": "hgnc:1097",
        "label": "BRAF",
        "extensions": [
            {
                "name": "aliases",
                "value": [
                    "BRAF1",
                    "B-RAF1",
                    "NS7",
                    "RAFB1",
                    "B-raf",
                    "BRAF-1"
                ]
            }
        ]
    },
    # ...
}
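The same request can be made from Python using only the standard library. This is a minimal sketch that assumes the response has the shape shown in the truncated example (a top-level `gene` object with `primaryCode`, `label`, and an `aliases` entry in `extensions`); the helper names `build_url`, `fetch`, and `summarize` are hypothetical, and only `fetch` actually contacts the live service.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint taken from the curl example; the query term goes in the "q" parameter.
BASE_URL = "https://normalize.cancervariants.org/gene/normalize"


def build_url(query: str) -> str:
    """Return the normalize URL for a gene term, with the term URL-encoded."""
    return f"{BASE_URL}?{urlencode({'q': query})}"


def fetch(query: str, timeout: float = 10.0) -> dict:
    """Call the live service and decode its JSON body (requires network access)."""
    with urlopen(build_url(query), timeout=timeout) as resp:
        return json.load(resp)


def summarize(response: dict) -> dict:
    """Pull the primary code, label, and aliases out of a normalize response.

    Assumes the response shape from the example above; missing fields come
    back as None (or an empty alias list) instead of raising KeyError.
    """
    gene = response.get("gene") or {}
    aliases = []
    for ext in gene.get("extensions", []):
        if ext.get("name") == "aliases":
            aliases = list(ext.get("value") or [])
    return {
        "query": response.get("query"),
        "match_type": response.get("match_type"),
        "primary_code": gene.get("primaryCode"),
        "label": gene.get("label"),
        "aliases": aliases,
    }
```

With network access, `summarize(fetch("BRAF"))` would return a flat dictionary built from the fields shown above. Keeping `summarize` separate from `fetch` lets the parsing logic be checked against a saved response without calling the service.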
Or use the Python API for fast access:
>>> from gene.database import create_db
>>> from gene.query import QueryHandler
>>> q = QueryHandler(create_db())
>>> result = q.normalize("KRAS")
>>> result.gene.primaryCode
'hgnc:6407'
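To normalize several terms at once, the call above can be wrapped in a loop. This is a sketch, not part of the library: `primary_codes` is a hypothetical helper that accepts any `normalize` callable (for example `q.normalize` from the session above), and it assumes that an unmatched query returns a result whose `gene` attribute is `None`.

```python
from __future__ import annotations

from typing import Any, Callable, Iterable, Optional


def primary_codes(
    normalize: Callable[[str], Any], terms: Iterable[str]
) -> dict[str, Optional[str]]:
    """Map each term to its normalized primary code, or None if nothing matched.

    Hypothetical helper: `normalize` is expected to behave like
    QueryHandler.normalize, returning an object with a `gene` attribute.
    """
    codes: dict[str, Optional[str]] = {}
    for term in terms:
        result = normalize(term)
        # Assumption: no match leaves result.gene as None.
        gene = getattr(result, "gene", None)
        codes[term] = None if gene is None else gene.primaryCode
    return codes
```

With a configured database, `primary_codes(q.normalize, ["KRAS", "BRAF"])` would collect one code per term. Passing the method in as a parameter keeps the helper testable with a stand-in function when no database is available.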
See the usage and normalization entries in the documentation for more.
We welcome bug reports, feature requests, and code contributions from users and interested collaborators. The documentation contains guidance for submitting feedback and contributing new code.