Gene expression simulator from genotype/haplotype data

GESim

GESim (V0.1) is a tool to simulate gene expression from genotype/haplotype data:

  1. Simulations under a null model with no variants having an effect.
  2. Simulations based on one SNP as the causal genetic architecture.
  3. Simulations based on an interaction of a SNP pair as the causal genetic architecture.
  4. Simulations based on an additive impact of a SNP pair as the causal genetic architecture.
  5. Simulations based on a haplotype stretch within a block as a causal genetic architecture.
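
The architectures above differ mainly in how genotypes are collapsed into a single causal predictor. The following Python sketch is an illustration only and is not GESim's R code. It assumes dosage coding (0, 1 or 2 copies of the alternate allele), a sum for the additive pair effect, a product for the interaction, and Gaussian noise scaled so that the predictor explains a fraction h2 of the expression variance. All function names are hypothetical.

```python
import random
import statistics


def encode_additive(snp_a, snp_b):
    """Assumed additive pair encoding: sum of the two dosages per sample."""
    return [a + b for a, b in zip(snp_a, snp_b)]


def encode_interaction(snp_a, snp_b):
    """Assumed interaction encoding: product of the two dosages per sample."""
    return [a * b for a, b in zip(snp_a, snp_b)]


def simulate_expression(predictor, h2, rng):
    """Return predictor + Gaussian noise so the predictor explains ~h2 of the variance.

    With h2 == 0 (or a constant predictor) the result is pure noise,
    which corresponds to the null model.
    """
    if not 0 <= h2 <= 1:
        raise ValueError("h2 must lie between 0 and 1")
    var_g = statistics.pvariance(predictor)
    if h2 == 0 or var_g == 0:
        return [rng.gauss(0.0, 1.0) for _ in predictor]
    # var_g / (var_g + var_e) = h2  =>  var_e = var_g * (1 - h2) / h2
    noise_sd = (var_g * (1 - h2) / h2) ** 0.5
    return [g + rng.gauss(0.0, noise_sd) for g in predictor]


if __name__ == "__main__":
    rng = random.Random(42)
    snp_a = [rng.choice([0, 1, 2]) for _ in range(1000)]
    snp_b = [rng.choice([0, 1, 2]) for _ in range(1000)]
    expression = simulate_expression(encode_interaction(snp_a, snp_b), 0.05, rng)
    print(len(expression))
```

A single-SNP simulation would pass the dosage vector directly as the predictor; a haplotype-based one would pass a per-sample haplotype encoding instead.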

This version (V0.1) has only been lightly tested; further testing is planned.

Assumptions

  1. The VCF file should contain only biallelic variants (two alleles per site).
  2. The VCF file should not contain missing genotypes. Any missing genotype is replaced by the homozygous reference genotype.
  3. For pair-based simulations, a warning is issued if the squared Pearson correlation coefficient exceeds 0.8 between the two SNPs of a pair, or between either SNP of the pair and the encoded combined impact (additive/interaction).
  4. If haplotype-based simulations are required, the VCF file should be phased (| separator between alleles).
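
The assumptions above can be sketched in Python as follows. This is a hedged illustration, not GESim's implementation: the function names are hypothetical, a genotype with any missing allele is treated as homozygous reference, and an undefined correlation (a constant vector) is reported as 0.

```python
def parse_gt(gt_field):
    """Parse a VCF sample field such as '0|1' or '1/1:35' into (allele1, allele2, phased)."""
    gt = gt_field.split(":")[0]          # GT is the first colon-separated subfield
    phased = "|" in gt
    alleles = gt.replace("|", "/").split("/")
    if len(alleles) != 2:
        raise ValueError(f"expected a diploid genotype, got {gt!r}")
    if "." in alleles:
        return 0, 0, phased              # missing -> homozygous reference (assumption 2)
    if any(a not in ("0", "1") for a in alleles):
        raise ValueError(f"non-biallelic genotype: {gt!r}")
    return int(alleles[0]), int(alleles[1]), phased


def r_squared(x, y):
    """Squared Pearson correlation; returns 0.0 when either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy * sxy / (sxx * syy)


def pair_warnings(snp_a, snp_b, combined, threshold=0.8):
    """List the comparisons whose r^2 exceeds the threshold (assumption 3)."""
    checks = {
        "snp_a vs snp_b": (snp_a, snp_b),
        "snp_a vs combined": (snp_a, combined),
        "snp_b vs combined": (snp_b, combined),
    }
    return [name for name, (u, v) in checks.items() if r_squared(u, v) > threshold]
```

For example, `pair_warnings(a, b, [x + y for x, y in zip(a, b)])` would flag a pair whose two SNPs are in near-perfect linkage disequilibrium.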

Prerequisites

  1. R version 3.4.4 (2018-03-15) or later.
  2. The optparse R package.

Usage

Default configurations

Rscript GESim.R -i example/variants.vcf -s example/snps.txt --pair_a example/pairs.txt --pair_i example/pairs.txt --hap example/haps.txt --random 0 --h2 0.05 -o example/out/out

Help

Please see the sections below for more details and examples. Parameters and options can be accessed using the help command.

Rscript GESim.R --help

Parameters

Mandatory parameters

  1. -i or --vcf: Haplotype/Genotype file path (.vcf). If simulations based on haplotypes are required, the alleles must be phased and '|' separated.
  2. -o or --out: Output path, including the prefix used for the names of the output files.

Optional parameters

  1. -s or --snp: SNP file path. It contains one column with the SNP index (the order of the SNP) in the VCF file. See the sample files for the format.
  2. --pair_a: SNP pairs file path to be used for simulations based on the additive impact of a SNP pair. It contains two columns (tab-separated) containing the SNP index (the order of the SNP) in the VCF file. See the sample files for the format.
  3. --pair_i: SNP pairs file path to be used for SNP interaction-based simulations. It contains two columns (tab-separated) containing the SNP index (the order of the SNP) in the VCF file. See the sample files for the format.
  4. --hap: Haplotype file path to be used for haplotype-based simulations. It contains three columns (tab-separated): the index of the SNP at which the haplotype begins, the index of the SNP at which it ends, and the haplotype stretch used for encoding. For example, to simulate gene expression for the haplotype 01101 within the block spanning the fifth to the ninth SNP in the VCF file, the corresponding line is 5\t9\t01101, where \t denotes a tab. See the sample files for the format.
  5. --h2: Heritability value between 0 and 1. It refers to the proportion of the expression variation caused by the genetic architecture. Default is 0.05.
  6. --random: Number of simulations with no causal genetic architecture (null model). Default is 0, meaning no simulations of this type are run.
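
As an illustration of the --hap file format described above, the following Python sketch parses one line and counts how many of a sample's two phased haplotypes carry the target stretch. It is a sketch under stated assumptions rather than GESim's own logic: the function names are hypothetical, SNP indices are taken as 1-based and inclusive (consistent with the 5\t9\t01101 example), and encoding a sample by its copy count (0, 1 or 2) is an assumption.

```python
def parse_hap_line(line):
    """Split a tab-separated --hap line into (start, end, stretch)."""
    start_text, end_text, stretch = line.rstrip("\n").split("\t")
    start, end = int(start_text), int(end_text)
    if end - start + 1 != len(stretch):
        raise ValueError("stretch length does not match the block size")
    return start, end, stretch


def count_haplotype_copies(hap1, hap2, start, end, stretch):
    """Count how many of the two haplotypes match the stretch in SNPs start..end.

    hap1 and hap2 are strings of '0'/'1' alleles, one character per SNP,
    in VCF order; start and end are 1-based and inclusive.
    """
    return sum(hap[start - 1:end] == stretch for hap in (hap1, hap2))


if __name__ == "__main__":
    start, end, stretch = parse_hap_line("5\t9\t01101")
    # Only the first haplotype carries 01101 at SNPs 5 to 9.
    print(count_haplotype_copies("0000011010", "1111111111", start, end, stretch))
```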

Citation

Al Bkhetan, Ziad, et al. "eQTLHap: a tool for comprehensive eQTL analysis considering haplotypic and genotypic effects." Briefings in Bioinformatics (2021).

Licence

Copyright 2021 Ziad Al Bkhetan

Licensed under the GNU GENERAL PUBLIC LICENSE (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

https://github.com/ziadbkh/GESim/blob/main/LICENSE

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

Contact information

For any help or inquiries, please contact: [email protected]
