Variant Normalization
A genomic variant is represented by a locus (chromosome + position), a reference sequence and a list of alternates.
It is common, because of the VCF specification, for the reference and alternate fields to contain extra bases that are not needed to represent the variant. It is completely valid to specify a variant as chr1:100:AC:AT, which is exactly the same variant as chr1:101:C:T.
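A minimal sketch of why the two representations are equivalent: applying each one to the same stretch of reference sequence yields the same altered sequence. The helper `apply_variant` and the flanking bases in `context` are hypothetical, chosen only for illustration; the text fixes only the bases at positions 100 (A) and 101 (C).

```python
def apply_variant(context, context_start, pos, ref, alt):
    """Return `context` with `ref` replaced by `alt` at 1-based position `pos`.

    `context` is a stretch of reference sequence whose first base sits at
    position `context_start`.
    """
    offset = pos - context_start
    if context[offset:offset + len(ref)] != ref:
        raise ValueError("reference allele does not match the context sequence")
    return context[:offset] + alt + context[offset + len(ref):]


# Hypothetical reference bases for chr1:98-103 (assumption, not real data):
# 98=G, 99=G, 100=A, 101=C, 102=T, 103=T
context = "GGACTT"

padded = apply_variant(context, 98, 100, "AC", "AT")   # chr1:100:AC:AT
minimal = apply_variant(context, 98, 101, "C", "T")    # chr1:101:C:T

print(padded, minimal, padded == minimal)  # GGATTT GGATTT True
```

Comparing the raw (position, reference, alternate) fields would report these as different variants, even though they change the sequence in the same way.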
The number of possible ways to represent the same genomic variant is potentially infinite, so the representation must be normalized in order to determine whether two representations describe the same variant or different ones.
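Variant normalization performs several steps on each variant to produce a fully normalized representation. In each example below, the first line is the input and the second line is the normalized result.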
Simple trimming
chr1 . 100 CTC CCC
chr1 . 101 T C
Deletions
chr1 . 100 AT A
chr1 . 101 T -
Insertions
chr1 . 100 A AT
chr1 . 101 - T
Ambiguous trimming
chr1 . 100 AAA A
chr1 . 100 AA -
Complex trimming
chr1 . 100 ATC ACCC
chr1 . 101 T CC
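The trimming shown in the examples above can be sketched as follows. This is an illustrative Python sketch, not the OpenCGA implementation: the function name `normalize` is hypothetical, it handles a single alternate at a time, and it does not consult the reference genome. It assumes that shared trailing bases are trimmed before shared leading bases, which is what keeps the ambiguous case at the leftmost position, and that an empty allele is written as `-`.

```python
def normalize(chrom, pos, ref, alt):
    """Trim bases shared by `ref` and `alt`; return (chrom, pos, ref, alt)."""
    # Trim shared trailing bases first. The position does not change,
    # so ambiguous cases such as AAA -> A stay at the leftmost position.
    while ref and alt and ref[-1] == alt[-1]:
        ref, alt = ref[:-1], alt[:-1]

    # Then trim shared leading bases, moving the position one base
    # to the right for every base removed.
    while ref and alt and ref[0] == alt[0]:
        ref, alt = ref[1:], alt[1:]
        pos += 1

    # An allele that has been trimmed to nothing is written as "-".
    return chrom, pos, ref or "-", alt or "-"


# Inputs taken from the examples above.
examples = [
    ("chr1", 100, "CTC", "CCC"),   # simple trimming
    ("chr1", 100, "AT", "A"),      # deletion
    ("chr1", 100, "A", "AT"),      # insertion
    ("chr1", 100, "AAA", "A"),     # ambiguous trimming
    ("chr1", 100, "ATC", "ACCC"),  # complex trimming
]

for variant in examples:
    print(variant, "->", normalize(*variant))
```

Because every equivalent representation collapses to the same tuple, the normalized result can be used directly as a key for comparing or de-duplicating variants.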
OpenCGA is an open source project and it is freely available.