Docs for v0.2.3 #39

Merged 4 commits on Nov 7, 2024
1 change: 0 additions & 1 deletion docs/make.jl
@@ -14,7 +14,6 @@ makedocs(
"Introduction" => "man/intro.md",
"Documentation" => "man/documentation.md",
"Downloads" => "man/download.md",
"Acceptable Z-scores file" => "man/zfile.md",
"Tutorial" => "man/examples.md",
"Customizing LD files" => "man/solveblocks.md",
# "Video tutorials" => "man/video.md",
53 changes: 51 additions & 2 deletions docs/src/man/documentation.ipynb
@@ -28,7 +28,7 @@
"\n",
"| Option name | Argument | Description |\n",
"| :--- | :----: | :--- |\n",
"| `--zfile` | String | Input file containing Z-scores as well as CHR/POS/REF/ALT. See [Acceptable Z-score files](https://biona001.github.io/GhostKnockoffGWAS/dev/man/zfile) for detailed requirement on this file. |\n",
"| `--zfile` | String | Input file containing Z-scores as well as CHR/POS/REF/ALT. See [Acceptable Z-score files](https://biona001.github.io/GhostKnockoffGWAS/dev/man/documentation/#Acceptable-Z-scores-file-format) for detailed requirement on this file. |\n",
"| `--LD-files` | String | Input directory to the pre-processed LD files. Most users downloads this from the [Downloads Page](https://biona001.github.io/GhostKnockoffGWAS/dev/man/download) |\n",
"| `--N` | Int | Sample size for target (original) study |\n",
"| `--genome-build` | Int | The human genome build used for SNP positions in `zfile` (this value must be 19 or 38) |\n",
@@ -66,6 +66,55 @@
"\n",
"For a more detailed explanation on these 2 files, see [Tutorial](https://biona001.github.io/GhostKnockoffGWAS/dev/man/examples/#Step-4:-Interpreting-the-result). "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Acceptable Z-scores file format\n",
"\n",
"The Z score file should satisfy the following requirements:\n",
"1. It is a comma- or tab-separated text file (.gz compressed is acceptable)\n",
"2. The first row should be a header line, and every row after the first will be treated as a different SNP. \n",
"3. By default `GhostKnockoffGWAS` will search for column names `CHR`, `POS`, `REF`, `ALT`, and `Z`. Alternatively, you can specify which column should be used for each of these fields by providing the corresponding optional inputs, e.g. `--CHR 6` tells `GhostKnockoffGWAS` to use column 6 as `CHR`. The `ALT` allele will be treated as the effect allele and `REF` be treated as non-effect allele. The POS (position) field of each variant must be from HG19 or HG38, which must be specified by the `--genome-build` argument. \n",
"\n",
"Here is a minimal example with 10 Z scores\n",
"\n",
"```\n",
"CHR\tPOS\tREF\tALT\tZ\n",
"17\t150509\tT\tTA\t1.08773561923134\n",
"17\t151035\tT\tC\t0.703898767202681\n",
"17\t151041\tG\tA\tNaN\n",
"17\t151872\tT\tC\t-0.299877259561085\n",
"17\t152087\tC\tT\t-0.371627135786605\n",
"17\t152104\tG\tA\t-0.28387322965385\n",
"17\t152248\tG\tA\t0.901618600934489\n",
"17\t152427\tG\tA\t1.10987516000804\n",
"17\t152771\tA\tG\t0.708492545266136\n",
"```\n",
"\n",
"A toy example is [example_zfile.txt](https://github.com/biona001/GhostKnockoffGWAS/blob/main/data/example_zfile.txt) (17MB).\n",
"\n",
"!!! tip\n",
"\n",
" Missing Z scores can be specified as `NaN` or as an empty cell. If you do not want a SNP to be considered in the analysis, you can change the its Z-score to NaN. CHR/POS/REF/ALT fields cannot have missing values."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Requirements on the input Z-scores\n",
"\n",
"In our papers, Z-scores are defined by $z = \\frac{1}{\\sqrt{N}}X^ty$ where $X$ is the $N \\times P$ standardized genotype matrix with $N$ samples and $P$ SNPs, $y$ is the normalized $n \\times 1$ phenotype vector, and these Z-scores have $N(0, 1)$ distribution under the null. \n",
"\n",
"In practice, [this paper](https://arxiv.org/abs/2310.04030) shows that other association test statistics that are $N(0, 1)$ under the null also result in FDR control. This includes commonly used tests in genetic association studies such as:\n",
"+ generalized linear mixed effect model to account for sample relatedness\n",
"+ saddle point approximation for extreme case-control imbalance\n",
"+ meta-analysis that aggregates multiple studies.\n",
"\n",
"If you have p-values, effect sizes, odds ratios,...etc, converting them into Z score might be possible, for example by following the *Notes on computing Z-scores* of [this blog post](https://huwenboshi.github.io/data%20management/2017/11/23/tips-for-formatting-gwas-summary-stats.html). "
]
}
],
"metadata": {
@@ -82,5 +131,5 @@
}
},
"nbformat": 4,
"nbformat_minor": 2
"nbformat_minor": 4
}
42 changes: 40 additions & 2 deletions docs/src/man/documentation.md
@@ -1,4 +1,3 @@

# Command-line documentation and usage of GhostKnockoffGWAS

## Usage
@@ -13,7 +12,7 @@ GhostKnockoffGWAS --zfile example_zfile.txt --LD-files EUR --N 506200 --genome-b

| Option name | Argument | Description |
| :--- | :----: | :--- |
| `--zfile` | String | Input file containing Z-scores as well as CHR/POS/REF/ALT. See [Acceptable Z-score files](https://biona001.github.io/GhostKnockoffGWAS/dev/man/zfile) for detailed requirement on this file. |
| `--zfile` | String | Input file containing Z-scores as well as CHR/POS/REF/ALT. See [Acceptable Z-score files](https://biona001.github.io/GhostKnockoffGWAS/dev/man/documentation/#Acceptable-Z-scores-file-format) for the detailed requirements on this file. |
| `--LD-files` | String | Directory containing the pre-processed LD files. Most users download these from the [Downloads Page](https://biona001.github.io/GhostKnockoffGWAS/dev/man/download). |
| `--N` | Int | Sample size for target (original) study |
| `--genome-build` | Int | The human genome build used for SNP positions in `zfile` (this value must be 19 or 38) |
@@ -41,3 +40,42 @@ GhostKnockoffGWAS --zfile example_zfile.txt --LD-files EUR --N 506200 --genome-b
3. (optional) Manhattan plots, which can be generated by following [step 5 of detailed example](https://biona001.github.io/GhostKnockoffGWAS/dev/man/examples/#Step-5:-Generating-Manhattan-plots).

For a more detailed explanation on these 2 files, see [Tutorial](https://biona001.github.io/GhostKnockoffGWAS/dev/man/examples/#Step-4:-Interpreting-the-result).

## Acceptable Z-scores file format

The Z score file should satisfy the following requirements:
1. It is a comma- or tab-separated text file (.gz compressed is acceptable)
2. The first row should be a header line, and every row after the first will be treated as a different SNP.
3. By default, `GhostKnockoffGWAS` searches for the column names `CHR`, `POS`, `REF`, `ALT`, and `Z`. Alternatively, you can specify which column to use for each of these fields by providing the corresponding optional inputs, e.g. `--CHR 6` tells `GhostKnockoffGWAS` to use column 6 as `CHR`. The `ALT` allele is treated as the effect allele and `REF` as the non-effect allele. The `POS` (position) field of each variant must be from HG19 or HG38, and the build must be specified with the `--genome-build` argument.

Here is a minimal example with 9 SNPs, one of which has a missing Z-score:

```
CHR POS REF ALT Z
17 150509 T TA 1.08773561923134
17 151035 T C 0.703898767202681
17 151041 G A NaN
17 151872 T C -0.299877259561085
17 152087 C T -0.371627135786605
17 152104 G A -0.28387322965385
17 152248 G A 0.901618600934489
17 152427 G A 1.10987516000804
17 152771 A G 0.708492545266136
```

A toy example is [example_zfile.txt](https://github.com/biona001/GhostKnockoffGWAS/blob/main/data/example_zfile.txt) (17MB).

!!! tip

    Missing Z-scores can be specified as `NaN` or as an empty cell. If you do not want a SNP to be considered in the analysis, you can change its Z-score to `NaN`. The CHR/POS/REF/ALT fields cannot have missing values.
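
The format requirements above can be checked before running the software. Below is a minimal sketch in Python; it is not part of `GhostKnockoffGWAS`, and the function names `open_text` and `check_zfile` are hypothetical. It assumes the default column names are used and that the delimiter can be inferred from the header line.

```python
import csv
import gzip
import io
import math

REQUIRED = ("CHR", "POS", "REF", "ALT", "Z")  # default column names


def open_text(path):
    """Open a plain-text or .gz-compressed file in text mode."""
    if str(path).endswith(".gz"):
        return gzip.open(path, "rt", newline="")
    return open(path, "r", newline="")


def check_zfile(handle):
    """Return (number of SNP rows, number of missing Z-scores).

    Raises ValueError if a required column is absent, a row has the wrong
    number of fields, CHR/POS/REF/ALT is empty, or Z is not numeric.
    """
    header_line = handle.readline()
    # Assumption: a tab anywhere in the header means tab-separated.
    delimiter = "\t" if "\t" in header_line else ","
    header = next(csv.reader([header_line], delimiter=delimiter))
    absent = [name for name in REQUIRED if name not in header]
    if absent:
        raise ValueError(f"missing columns: {absent}")
    col = {name: header.index(name) for name in REQUIRED}

    n_rows = n_missing = 0
    for lineno, row in enumerate(csv.reader(handle, delimiter=delimiter), start=2):
        if not row:  # skip blank lines
            continue
        if len(row) != len(header):
            raise ValueError(f"line {lineno}: expected {len(header)} fields")
        for name in ("CHR", "POS", "REF", "ALT"):
            if not row[col[name]].strip():
                raise ValueError(f"line {lineno}: empty {name}")
        z = row[col["Z"]].strip()
        # float() raises ValueError for non-numeric text such as "NA".
        if z == "" or math.isnan(float(z)):
            n_missing += 1
        n_rows += 1
    return n_rows, n_missing


sample = (
    "CHR\tPOS\tREF\tALT\tZ\n"
    "17\t150509\tT\tTA\t1.08773561923134\n"
    "17\t151041\tG\tA\tNaN\n"
)
print(check_zfile(io.StringIO(sample)))  # prints (2, 1)
```

For a file on disk, something like `with open_text("example_zfile.txt") as f: check_zfile(f)` should work for both plain and `.gz` inputs. If your file uses non-default column names, the `REQUIRED` tuple would need to be adapted.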

## Requirements on the input Z-scores

In our papers, Z-scores are defined by $z = \frac{1}{\sqrt{N}}X^ty$ where $X$ is the $N \times P$ standardized genotype matrix with $N$ samples and $P$ SNPs, $y$ is the normalized $N \times 1$ phenotype vector, and these Z-scores follow a $N(0, 1)$ distribution under the null.
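
As a worked illustration of this definition, the sketch below computes the Z-score of a single SNP from raw genotypes and phenotypes in plain Python. The function names `standardize` and `marginal_z` are hypothetical, and the choice of the population (divide-by-$N$) variance when standardizing is an assumption made here, not something the text above specifies.

```python
import math


def standardize(values):
    """Center to mean 0 and scale to unit variance (assumed: divide by N)."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    if sd == 0.0:
        raise ValueError("cannot standardize a constant vector")
    return [(v - mean) / sd for v in values]


def marginal_z(genotypes, phenotype):
    """Compute z = x^T y / sqrt(N) for one SNP column x and phenotype y."""
    if len(genotypes) != len(phenotype):
        raise ValueError("genotypes and phenotype must have equal length")
    x = standardize(genotypes)
    y = standardize(phenotype)
    return sum(a * b for a, b in zip(x, y)) / math.sqrt(len(y))


genotypes = [0, 1, 2, 1, 0, 2]                 # ALT allele counts per sample
phenotype = [0.1, 0.4, 1.3, 0.2, -0.5, 0.9]    # made-up trait values
print(marginal_z(genotypes, phenotype))
```

Under this standardization the result equals $\sqrt{N}$ times the Pearson correlation between the SNP and the phenotype, so a SNP perfectly correlated with the phenotype gives $z = \sqrt{N}$.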

In practice, [this paper](https://arxiv.org/abs/2310.04030) shows that other association test statistics that are $N(0, 1)$ under the null also result in FDR control. This includes commonly used tests in genetic association studies such as:
+ generalized linear mixed effect model to account for sample relatedness
+ saddle point approximation for extreme case-control imbalance
+ meta-analysis that aggregates multiple studies.

If you have p-values, effect sizes, odds ratios, etc., converting them into Z-scores may be possible, for example by following the *Notes on computing Z-scores* section of [this blog post](https://huwenboshi.github.io/data%20management/2017/11/23/tips-for-formatting-gwas-summary-stats.html).
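
A few standard conversions can be sketched as follows. This is a hedged illustration using only the Python standard library, not code from `GhostKnockoffGWAS`; the function names are hypothetical. It assumes two-sided p-values from a test that is approximately normal under the null, and that the effect direction is reported with respect to the `ALT` (effect) allele.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, mean 0 and sd 1


def z_from_beta_se(beta, se):
    """Z-score from an effect size and its standard error."""
    if se <= 0:
        raise ValueError("standard error must be positive")
    return beta / se


def z_from_odds_ratio(odds_ratio, se_log_or):
    """Z-score from an odds ratio and the standard error of log(OR)."""
    return z_from_beta_se(math.log(odds_ratio), se_log_or)


def z_from_pvalue(p, effect_sign):
    """Z-score from a two-sided p-value and the sign of the ALT-allele effect."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p-value must be in (0, 1]")
    # -inv_cdf(p / 2) is the upper-tail quantile; it avoids computing 1 - p / 2.
    magnitude = -_STD_NORMAL.inv_cdf(p / 2.0)
    return math.copysign(magnitude, effect_sign)


print(z_from_beta_se(0.2, 0.1))
print(z_from_pvalue(0.05, -1.0))
```

If the summary statistics report effects for the `REF` allele instead, the sign of the resulting Z-score would need to be flipped before writing it to the Z-score file.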
20 changes: 11 additions & 9 deletions docs/src/man/download.ipynb
@@ -10,9 +10,9 @@
"\n",
"## Software\n",
"\n",
"| Operating System | v0.2.2 (June 28th, 2024) |\n",
"| Operating System | v0.2.3 (Nov 7th, 2024) |\n",
"| :--- | :----: |\n",
"| Linux 64-bit | [Download](https://github.com/biona001/GhostKnockoffGWAS/releases/tag/v0.2.2) |\n",
"| Linux 64-bit | [Download](https://github.com/biona001/GhostKnockoffGWAS/releases/tag/v0.2.3) |\n",
"\n",
"After unzipping, the executable will be located inside `bin/GhostKnockoffGWAS`. We recommend adding the folder containing the `GhostKnockoffGWAS` executable to `PATH` for easier access."
]
@@ -23,12 +23,14 @@
"source": [
"## Pre-processed LD files\n",
"\n",
"| Population | Link | Number of SNPs | Description |\n",
"| :--- | :----: | :---: | :---: |\n",
"| EUR (Europeans) | [download](https://zenodo.org/records/10433663) (7.5GB) |650826 | See **Note 1** |\n",
"| ASN (East Asians) | TBD | |\n",
"| AFR (Africans) | TBD | |\n",
"| AMR (Admixed Americans) | TBD | | |\n",
"We welcome `solveblock` users to upload its output to the cloud and share the download link with us. After checking for its quality, we will include it in the following table.\n",
"\n",
"| Population | Link | Number of SNPs | Description | Citation |\n",
"| :--- | :----: | :---: | :---: | :---: |\n",
"| EUR (Europeans) | [download](https://zenodo.org/records/10433663) (7.5GB) |650826 | See **Note 1** | [paper](https://www.biorxiv.org/content/10.1101/2024.02.28.582621v2) |\n",
"| ASN (East Asians) | TBD | | |\n",
"| AFR (Africans) | TBD | | |\n",
"| AMR (Admixed Americans) | TBD | | | |\n",
"\n",
"+ **Note 1**: This file contain pre-processed LD files generated from the typed SNPs of the EUR cohort from the Pan-UKB panel. The quasi-independent regions were obtained by directly adapting [the output of ldetect](https://bitbucket.org/nygcresearch/ldetect-data/src/master/EUR/)"
]
@@ -48,5 +50,5 @@
}
},
"nbformat": 4,
"nbformat_minor": 2
"nbformat_minor": 4
}
19 changes: 10 additions & 9 deletions docs/src/man/download.md
@@ -1,23 +1,24 @@

# Downloads page

Here is the main downloads page. New software and pre-processed knockoff data will be released here.

## Software

| Operating System | v0.2.2 (June 28th, 2024) |
| Operating System | v0.2.3 (Nov 7th, 2024) |
| :--- | :----: |
| Linux 64-bit | [Download](https://github.com/biona001/GhostKnockoffGWAS/releases/tag/v0.2.2) |
| Linux 64-bit | [Download](https://github.com/biona001/GhostKnockoffGWAS/releases/tag/v0.2.3) |

After unzipping, the executable will be located inside `bin/GhostKnockoffGWAS`. We recommend adding the folder containing the `GhostKnockoffGWAS` executable to `PATH` for easier access.

## Pre-processed LD files

| Population | Link | Number of SNPs | Description |
| :--- | :----: | :---: | :---: |
| EUR (Europeans) | [download](https://zenodo.org/records/10433663) (7.5GB) |650826 | See **Note 1** |
| ASN (East Asians) | TBD | |
| AFR (Africans) | TBD | |
| AMR (Admixed Americans) | TBD | | |
We welcome `solveblock` users to upload their output to the cloud and share the download link with us. After checking its quality, we will include it in the following table.

| Population | Link | Number of SNPs | Description | Citation |
| :--- | :----: | :---: | :---: | :---: |
| EUR (Europeans) | [download](https://zenodo.org/records/10433663) (7.5GB) | 650826 | See **Note 1** | [paper](https://www.biorxiv.org/content/10.1101/2024.02.28.582621v2) |
| ASN (East Asians) | TBD | | | |
| AFR (Africans) | TBD | | | |
| AMR (Admixed Americans) | TBD | | | |

+ **Note 1**: This file contains pre-processed LD files generated from the typed SNPs of the EUR cohort from the Pan-UKB panel. The quasi-independent regions were obtained by directly adapting [the output of ldetect](https://bitbucket.org/nygcresearch/ldetect-data/src/master/EUR/).
8 changes: 4 additions & 4 deletions docs/src/man/examples.ipynb
@@ -14,7 +14,7 @@
"\n",
"1. Step 1: Download pre-processed LD files and binary executable and extract their content\n",
"\n",
" wget https://github.com/biona001/GhostKnockoffGWAS/releases/download/v0.2.2/app_linux_x86.tar.gz\n",
" wget https://github.com/biona001/GhostKnockoffGWAS/releases/download/v0.2.3/app_linux_x86.tar.gz\n",
" wget https://zenodo.org/records/10433663/files/EUR.zip\n",
" tar -xvzf app_linux_x86.tar.gz\n",
" unzip EUR.zip # decompresses to ~8.7GB\n",
@@ -36,7 +36,7 @@
"\n",
"Proceed to the [Downloads page](https://biona001.github.io/GhostKnockoffGWAS/dev/man/download) and download (1) the software as well as (2) a pre-processed knockoff dataset suitable for your analysis, e.g.\n",
"```shell\n",
"wget https://github.com/biona001/GhostKnockoffGWAS/releases/download/v0.2.2/app_linux_x86.tar.gz\n",
"wget https://github.com/biona001/GhostKnockoffGWAS/releases/download/v0.2.3/app_linux_x86.tar.gz\n",
"wget https://zenodo.org/records/10433663/files/EUR.zip\n",
"```\n",
"Next, unzip the files in linux command line via:\n",
@@ -57,7 +57,7 @@
"source": [
"## Step 2: Prepare a valid Z score file\n",
"\n",
"One needs a [valid Z score file](https://biona001.github.io/GhostKnockoffGWAS/dev/man/zfile) as input. \n",
"One needs a [valid Z score file](https://biona001.github.io/GhostKnockoffGWAS/dev/man/documentation/#Acceptable-Z-scores-file-format) as input. \n",
"\n",
"If you would like to follow along with this tutorial, feel free to download this test data [example_zfile.txt](https://github.com/biona001/GhostKnockoffGWAS/blob/main/data/example_zfile.txt) (17MB). The first few rows is\n",
"```\n",
@@ -260,5 +260,5 @@
}
},
"nbformat": 4,
"nbformat_minor": 2
"nbformat_minor": 4
}
7 changes: 3 additions & 4 deletions docs/src/man/examples.md
@@ -1,4 +1,3 @@

# Detailed Example

This page collects examples of running the ghost knockoff pipeline. We will cover topics such as installation, examining input data, running the software, and interpreting the output.
@@ -9,7 +8,7 @@ Here is a short summary of this tutorial:

1. Step 1: Download pre-processed LD files and binary executable and extract their content

wget https://github.com/biona001/GhostKnockoffGWAS/releases/download/v0.2.2/app_linux_x86.tar.gz
wget https://github.com/biona001/GhostKnockoffGWAS/releases/download/v0.2.3/app_linux_x86.tar.gz
wget https://zenodo.org/records/10433663/files/EUR.zip
tar -xvzf app_linux_x86.tar.gz
unzip EUR.zip # decompresses to ~8.7GB
@@ -26,7 +25,7 @@ Here is a short summary of this tutorial:

Proceed to the [Downloads page](https://biona001.github.io/GhostKnockoffGWAS/dev/man/download) and download (1) the software as well as (2) a pre-processed knockoff dataset suitable for your analysis, e.g.
```shell
wget https://github.com/biona001/GhostKnockoffGWAS/releases/download/v0.2.2/app_linux_x86.tar.gz
wget https://github.com/biona001/GhostKnockoffGWAS/releases/download/v0.2.3/app_linux_x86.tar.gz
wget https://zenodo.org/records/10433663/files/EUR.zip
```
Next, unzip the files on the Linux command line via:
@@ -42,7 +41,7 @@ This should create 2 folders `app_linux_x86/` and `EUR/` in the current director

## Step 2: Prepare a valid Z score file

One needs a [valid Z score file](https://biona001.github.io/GhostKnockoffGWAS/dev/man/zfile) as input.
One needs a [valid Z score file](https://biona001.github.io/GhostKnockoffGWAS/dev/man/documentation/#Acceptable-Z-scores-file-format) as input.

If you would like to follow along with this tutorial, feel free to download this test data [example_zfile.txt](https://github.com/biona001/GhostKnockoffGWAS/blob/main/data/example_zfile.txt) (17MB). The first few rows are
```
6 changes: 3 additions & 3 deletions docs/src/man/intro.ipynb
@@ -40,13 +40,13 @@
"\n",
"1. Go to [Download Page](https://biona001.github.io/GhostKnockoffGWAS/dev/man/download) and download (1) the software and (2) the pre-processed LD files. For example,\n",
"\n",
" wget https://github.com/biona001/GhostKnockoffGWAS/releases/download/v0.2.2/app_linux_x86.tar.gz\n",
" wget https://github.com/biona001/GhostKnockoffGWAS/releases/download/v0.2.3/app_linux_x86.tar.gz\n",
" wget https://zenodo.org/records/10433663/files/EUR.zip\n",
"2. Unzip them both:\n",
"\n",
" tar -xvzf app_linux_x86.tar.gz\n",
" unzip EUR.zip # decompresses to ~8.7GB\n",
"3. Prepare your input Z score file into accepted format, see [Acceptable Z-scores](https://biona001.github.io/GhostKnockoffGWAS/dev/man/zfile). A toy example can be downloaded by:\n",
"3. Prepare your input Z score file into accepted format, see [Acceptable Z-scores](https://biona001.github.io/GhostKnockoffGWAS/dev/man/documentation/#Acceptable-Z-scores-file-format). A toy example can be downloaded by:\n",
"\n",
" wget https://github.com/biona001/GhostKnockoffGWAS/raw/main/data/example_zfile.txt\n",
"4. Run the executable\n",
@@ -91,5 +91,5 @@
}
},
"nbformat": 4,
"nbformat_minor": 2
"nbformat_minor": 4
}
5 changes: 2 additions & 3 deletions docs/src/man/intro.md
@@ -1,4 +1,3 @@

# Introduction

This package conducts knockoff-based inference to perform genome-wide conditional independence tests based on GWAS summary statistics. The methodology is described in the following papers
@@ -25,13 +24,13 @@ Most users are expected to follow this workflow. Detailed explanations for each

1. Go to [Download Page](https://biona001.github.io/GhostKnockoffGWAS/dev/man/download) and download (1) the software and (2) the pre-processed LD files. For example,

wget https://github.com/biona001/GhostKnockoffGWAS/releases/download/v0.2.2/app_linux_x86.tar.gz
wget https://github.com/biona001/GhostKnockoffGWAS/releases/download/v0.2.3/app_linux_x86.tar.gz
wget https://zenodo.org/records/10433663/files/EUR.zip
2. Unzip them both:

tar -xvzf app_linux_x86.tar.gz
unzip EUR.zip # decompresses to ~8.7GB
3. Prepare your input Z score file into accepted format, see [Acceptable Z-scores](https://biona001.github.io/GhostKnockoffGWAS/dev/man/zfile). A toy example can be downloaded by:
3. Prepare your input Z-score file in the accepted format; see [Acceptable Z-scores](https://biona001.github.io/GhostKnockoffGWAS/dev/man/documentation/#Acceptable-Z-scores-file-format). A toy example can be downloaded by:

wget https://github.com/biona001/GhostKnockoffGWAS/raw/main/data/example_zfile.txt
4. Run the executable