From 3a9dcf85499d4012650c17e344178f205c86cbab Mon Sep 17 00:00:00 2001
From: Miguel Carmona
Date: Fri, 1 Feb 2019 11:54:58 +0000
Subject: [PATCH] top loci table parquet meta info

---
 README.md | 22 ++++++++++++++++++++++
 1 file changed, 22 insertions(+)

diff --git a/README.md b/README.md
index a2d705c..e6b89bb 100644
--- a/README.md
+++ b/README.md
@@ -45,6 +45,28 @@ gsutil -m rsync -rn gs://genetics-portal-staging/v2d/180904 gs://genetics-portal
 
 List of loci associated with disease. Currently this data comes from two sources: (i) [GWAS Catalog](https://www.ebi.ac.uk/gwas/docs/file-downloads), (ii) [Neale *et al.* UK Biobank summary statistics (version 1)](http://www.nealelab.is/uk-biobank).
 
+##### Parquet meta info
+
+```
+file schema: schema
+--------------------------------------------------------------------------------
+study_id: OPTIONAL BINARY L:STRING R:0 D:1
+chrom: OPTIONAL BINARY L:STRING R:0 D:1
+pos: OPTIONAL INT64 R:0 D:1
+ref: OPTIONAL BINARY L:STRING R:0 D:1
+alt: OPTIONAL BINARY L:STRING R:0 D:1
+rsid: OPTIONAL BINARY L:STRING R:0 D:1
+direction: OPTIONAL BINARY L:STRING R:0 D:1
+beta: OPTIONAL DOUBLE R:0 D:1
+beta_ci_lower: OPTIONAL DOUBLE R:0 D:1
+beta_ci_upper: OPTIONAL DOUBLE R:0 D:1
+odds_ratio: OPTIONAL DOUBLE R:0 D:1
+oddsr_ci_lower: OPTIONAL DOUBLE R:0 D:1
+oddsr_ci_upper: OPTIONAL DOUBLE R:0 D:1
+pval_mantissa: OPTIONAL DOUBLE R:0 D:1
+pval_exponent: OPTIONAL INT64 R:0 D:1
+```
+
 ##### Top loci columns
 - `study_id`: unique identifier for study
 - `variant_id_b37`: chrom_pos_ref_alt (build 37) identifier for variant. RSID to variant ID mapping is non-unique, therefore multiple IDs may exist separated by ';'
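
As a rough illustration of the schema added by this patch, the sketch below mirrors the 15 columns as a plain Python record. Every column is `OPTIONAL` in the Parquet metadata, so each field may be `None`. The class name `TopLociRow` and the `pval()` helper are hypothetical and not part of the repository. The rule `pval = pval_mantissa * 10 ** pval_exponent` is an assumption inferred from the column names, not something the patch states.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TopLociRow:
    """Hypothetical in-memory mirror of one row of the top loci table."""

    # BINARY L:STRING columns map to str, INT64 to int, DOUBLE to float.
    study_id: Optional[str] = None
    chrom: Optional[str] = None
    pos: Optional[int] = None
    ref: Optional[str] = None
    alt: Optional[str] = None
    rsid: Optional[str] = None
    direction: Optional[str] = None
    beta: Optional[float] = None
    beta_ci_lower: Optional[float] = None
    beta_ci_upper: Optional[float] = None
    odds_ratio: Optional[float] = None
    oddsr_ci_lower: Optional[float] = None
    oddsr_ci_upper: Optional[float] = None
    pval_mantissa: Optional[float] = None
    pval_exponent: Optional[int] = None

    def pval(self) -> Optional[float]:
        """Combine mantissa and exponent into one float (assumed convention)."""
        if self.pval_mantissa is None or self.pval_exponent is None:
            return None
        # Exponents below roughly -323 underflow to 0.0 in a 64-bit float.
        return self.pval_mantissa * 10.0 ** self.pval_exponent


# Illustrative values only; they are not taken from the real table.
example = TopLociRow(study_id="STUDY_1", chrom="1", pos=12345,
                     ref="A", alt="G", pval_mantissa=2.5, pval_exponent=-8)
example_pval = example.pval()
```

Storing the mantissa and exponent in separate columns plausibly lets the table hold p-values smaller than a `DOUBLE` can represent, which is why the helper returns `0.0` rather than failing for very negative exponents.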