OpenCGA Storage Hadoop

imedina edited this page May 16, 2016 · 1 revision

Overview

The main aim is to store and index gVCF (genomic Variant Call Format) files as well as VCF files. To provide this functionality, the information is stored in two different tables, each suited to a different kind of query.

  1. Storage: This table should hold a full representation of the input data (gVCF or VCF format), so that exporting the data reproduces the full content of the original file. The focus is a compact and comprehensive representation of the data per region.

  2. Indexing: This table stores and annotates the genomic differences observed in any individual in a study.
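
The split between the two tables can be pictured with a small in-memory sketch. This is only an illustration under stated assumptions, not the actual OpenCGA or HBase schema: the class name `TwoTableStore`, the `CHUNK_SIZE` value, the key layouts, and the one-sample-per-file simplification are all hypothetical. Only data lines (no VCF header) are handled.

```python
from collections import defaultdict

CHUNK_SIZE = 1000  # hypothetical region size in base pairs, chosen for the example


class TwoTableStore:
    """Toy model of the two tables; the real schema is not described here."""

    def __init__(self):
        # Storage table: (chrom, chunk_start) -> list of (file_id, line_no, raw_line).
        # Keeps every record, including gVCF reference blocks, grouped per region.
        self.storage = defaultdict(list)
        # Index table: (chrom, pos, ref, alt) -> set of sample names carrying the variant.
        self.index = defaultdict(set)

    def load(self, file_id, sample, lines):
        """Load tab-separated VCF/gVCF data lines (assumed: one sample per file)."""
        for line_no, line in enumerate(lines):
            chrom, pos, _id, ref, alt = line.split("\t")[:5]
            pos = int(pos)
            chunk_start = (pos // CHUNK_SIZE) * CHUNK_SIZE
            self.storage[(chrom, chunk_start)].append((file_id, line_no, line))
            # Only observed differences go into the index; "." and the gVCF
            # placeholder allele "<NON_REF>" are skipped.
            for allele in alt.split(","):
                if allele not in (".", "<NON_REF>"):
                    self.index[(chrom, pos, ref, allele)].add(sample)

    def export(self, file_id):
        """Rebuild the data lines of one file, in their original order."""
        rows = [
            (line_no, line)
            for records in self.storage.values()
            for (fid, line_no, line) in records
            if fid == file_id
        ]
        return [line for _, line in sorted(rows)]
```

In this sketch a region query reads a single storage key, while a question such as "which samples carry this variant?" is answered from the index without touching the raw records.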
