From 9ae323e20c84fac0c40be40b173b704015f9e806 Mon Sep 17 00:00:00 2001
From: Miguel Carmona
Date: Fri, 1 Feb 2019 11:26:35 +0000
Subject: [PATCH 1/3] update study table parquet meta description

---
 README.md | 27 +++++++++++++++++++++++++++
 1 file changed, 27 insertions(+)

diff --git a/README.md b/README.md
index 8b4d732..a2d705c 100644
--- a/README.md
+++ b/README.md
@@ -109,6 +109,33 @@ List of loci associated with disease. Currently this data comes from two sources
 
 Information about each study found in the top loci table.
 
+##### Parquet meta info
+```
+file schema: schema
+--------------------------------------------------------------------------------
+study_id: OPTIONAL BINARY L:STRING R:0 D:1
+pmid: OPTIONAL BINARY L:STRING R:0 D:1
+pub_date: OPTIONAL BINARY L:STRING R:0 D:1
+pub_journal: OPTIONAL BINARY L:STRING R:0 D:1
+pub_title: OPTIONAL BINARY L:STRING R:0 D:1
+pub_author: OPTIONAL BINARY L:STRING R:0 D:1
+trait_reported: OPTIONAL BINARY L:STRING R:0 D:1
+trait_efos: OPTIONAL F:1
+.list: REPEATED F:1
+..item: OPTIONAL BINARY L:STRING R:1 D:3
+ancestry_initial: OPTIONAL F:1
+.list: REPEATED F:1
+..item: OPTIONAL BINARY L:STRING R:1 D:3
+ancestry_replication: OPTIONAL F:1
+.list: REPEATED F:1
+..item: OPTIONAL BINARY L:STRING R:1 D:3
+n_initial: OPTIONAL INT64 R:0 D:1
+n_replication: OPTIONAL INT64 R:0 D:1
+n_cases: OPTIONAL INT64 R:0 D:1
+trait_category: OPTIONAL BINARY L:STRING R:0 D:1
+num_assoc_loci: OPTIONAL INT64 R:0 D:1
+```
+
 ##### Study table columns
 - `study_id`: unique identifier for study
 - `pmid`: pubmed ID (GWAS Catalog studies only)

From 3a9dcf85499d4012650c17e344178f205c86cbab Mon Sep 17 00:00:00 2001
From: Miguel Carmona
Date: Fri, 1 Feb 2019 11:54:58 +0000
Subject: [PATCH 2/3] top loci table parquet meta info

---
 README.md | 22 ++++++++++++++++++++++
 1 file changed, 22 insertions(+)

diff --git a/README.md b/README.md
index a2d705c..e6b89bb 100644
--- a/README.md
+++ b/README.md
@@ -45,6 +45,28 @@ gsutil -m rsync -rn gs://genetics-portal-staging/v2d/180904 gs://genetics-portal
 
 List of loci associated with disease. Currently this data comes from two sources: (i) [GWAS Catalog](https://www.ebi.ac.uk/gwas/docs/file-downloads), (ii) [Neale *et al.* UK Biobank summary statistics (version 1)](http://www.nealelab.is/uk-biobank).
 
+##### Parquet meta info
+
+```
+file schema: schema
+--------------------------------------------------------------------------------
+study_id: OPTIONAL BINARY L:STRING R:0 D:1
+chrom: OPTIONAL BINARY L:STRING R:0 D:1
+pos: OPTIONAL INT64 R:0 D:1
+ref: OPTIONAL BINARY L:STRING R:0 D:1
+alt: OPTIONAL BINARY L:STRING R:0 D:1
+rsid: OPTIONAL BINARY L:STRING R:0 D:1
+direction: OPTIONAL BINARY L:STRING R:0 D:1
+beta: OPTIONAL DOUBLE R:0 D:1
+beta_ci_lower: OPTIONAL DOUBLE R:0 D:1
+beta_ci_upper: OPTIONAL DOUBLE R:0 D:1
+odds_ratio: OPTIONAL DOUBLE R:0 D:1
+oddsr_ci_lower: OPTIONAL DOUBLE R:0 D:1
+oddsr_ci_upper: OPTIONAL DOUBLE R:0 D:1
+pval_mantissa: OPTIONAL DOUBLE R:0 D:1
+pval_exponent: OPTIONAL INT64 R:0 D:1
+```
+
 ##### Top loci columns
 - `study_id`: unique identifier for study
 - `variant_id_b37`: chrom_pos_ref_alt (build 37) identifier for variant. RSID to variant ID mapping is non-unique, therefore multiple IDs may exist separated by ';'

From 9fd3b56fb5f33bcf4551a99484d9aa22488262fa Mon Sep 17 00:00:00 2001
From: Miguel Carmona
Date: Fri, 1 Feb 2019 12:58:51 +0000
Subject: [PATCH 3/3] include ld and finemapping parquet meta info

---
 README.md | 40 ++++++++++++++++++++++++++++++++++++++++
 1 file changed, 40 insertions(+)

diff --git a/README.md b/README.md
index e6b89bb..ae5bc4f 100644
--- a/README.md
+++ b/README.md
@@ -206,6 +206,24 @@ Todo:
 
 Credible set analysis results used to link index variants to tag variants. Full finemapping methods can be seen here: https://github.com/opentargets/finemapping
 
+##### Parquet meta info
+
+```
+file schema: schema
+--------------------------------------------------------------------------------
+study_id: OPTIONAL BINARY L:STRING R:0 D:1
+lead_chrom: OPTIONAL BINARY L:STRING R:0 D:1
+lead_pos: OPTIONAL INT64 R:0 D:1
+lead_ref: OPTIONAL BINARY L:STRING R:0 D:1
+lead_alt: OPTIONAL BINARY L:STRING R:0 D:1
+tag_chrom: OPTIONAL BINARY L:STRING R:0 D:1
+tag_pos: OPTIONAL INT64 R:0 D:1
+tag_ref: OPTIONAL BINARY L:STRING R:0 D:1
+tag_alt: OPTIONAL BINARY L:STRING R:0 D:1
+log10_ABF: OPTIONAL DOUBLE R:0 D:1
+posterior_prob: OPTIONAL DOUBLE R:0 D:1
+```
+
 ##### Finemapping table columns
 - `study_id`: unique identifier for study
 - `index_variantid_b37`: unique variant identifier for index variant, chrom_pos_ref_alt (build 37)
@@ -227,6 +245,28 @@ Steps
 
 Table of LD values linking index varaints to tag variants.
 
+##### Parquet meta info
+
+```
+file schema: schema
+--------------------------------------------------------------------------------
+study_id: OPTIONAL BINARY L:STRING R:0 D:1
+lead_chrom: OPTIONAL BINARY L:STRING R:0 D:1
+lead_pos: OPTIONAL INT64 R:0 D:1
+lead_ref: OPTIONAL BINARY L:STRING R:0 D:1
+lead_alt: OPTIONAL BINARY L:STRING R:0 D:1
+tag_chrom: OPTIONAL BINARY L:STRING R:0 D:1
+tag_pos: OPTIONAL INT64 R:0 D:1
+tag_ref: OPTIONAL BINARY L:STRING R:0 D:1
+tag_alt: OPTIONAL BINARY L:STRING R:0 D:1
+overall_r2: OPTIONAL DOUBLE R:0 D:1
+AFR_1000G_prop: OPTIONAL DOUBLE R:0 D:1
+AMR_1000G_prop: OPTIONAL DOUBLE R:0 D:1
+EAS_1000G_prop: OPTIONAL DOUBLE R:0 D:1
+EUR_1000G_prop: OPTIONAL DOUBLE R:0 D:1
+SAS_1000G_prop: OPTIONAL DOUBLE R:0 D:1
+```
+
 ##### LD table columns
 - `study_id`: unique identifier for study
 - `index_variantid_b37`: unique variant identifier for index variant, chrom_pos_ref_alt (build 37)