Haplotype-based computational genetic mapping (HBCGM) for inbred population

zqfang/haplomap

Latest commit: 3dffb9e by zqfang, Aug 28, 2024
Haplomap

Haplotype-based computational genetic mapping (a.k.a. HBCGM)

Haplomap is the successor to HBCGM, whose development stopped in 2010. Haplomap has been adopted as a replacement for the original HBCGM.

Citation:

Zhuoqing Fang, Gary Peltz, An Automated Multi-Modal Graph-Based Pipeline for Mouse Genetic Discovery, Bioinformatics, 2022, btac356, https://doi.org/10.1093/bioinformatics/btac356

See what's new in the CHANGELOG.

HBCGM

Dependencies

Works on both Linux and macOS.

Haplomap:

  • CMake
  • GCC >= 4.8
  • clang >= 11.0.3 (only tested with 11.x version)
  • C++11
  • GSL

For variant calling, you need:

  • GATK 4.x
  • SAMtools
  • BCFtools
  • BEDtools
  • BWA

For running the pipeline, you need:

  • Snakemake
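
Before building or running anything, it can help to see which of the tools listed above are actually on your PATH. The snippet below is a minimal sketch, not part of haplomap: `missing_tools` is a hypothetical helper name, and the executable names in the example call (`gsl-config` for GSL, `gatk`, and so on) are the usual command names for these packages and are assumed to match your installation.

```shell
# Hypothetical helper (not shipped with haplomap):
# print every name from the argument list that is not an executable on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Example call; the command names are assumptions about a typical install:
#   missing_tools cmake gsl-config gatk samtools bcftools bedtools bwa snakemake
```

An empty result means every listed tool was found. The compiler is left out of the example call because either GCC or clang is sufficient.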

Installation

conda install -c bioconda haplomap

Install from source

  1. Install GSL first, e.g.

Ubuntu

sudo apt-get install libgsl-dev

macOS

brew install gsl

or compile GSL from source (make sure that the GSL include and lib paths are exported)

./configure --prefix=${HOME}/program/gsl
make && make install
# you may need to add this line to your .bashrc 
export LD_LIBRARY_PATH="${HOME}/program/gsl/lib:$LD_LIBRARY_PATH"
  2. Build and install to path
cd ${haplomap_repo}
mkdir build && cd build
cmake -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=/path/to/directory/bin ..
make

Usage

Run haplomap standalone

See more detail in haplomap subfolder: Run haplomap standalone

Use snakemake workflow to run Mouse Phenome Database (MPD) datasets

0. Variant calling

See variant calling using GATK, BCFtools, svtools.

e.g.

# modify the file path in haplomap and run with 12 cores
snakemake -s workflows/bcftools.call.smk  --configfile config.yaml \
          -k -p -j 12   

The Mouse Phenome Database (MPD) has more than 10,000 datasets. Configure the files below to run them.

1. Prepare an MPD measnum ID file with one ID per row, suffixed with "-m" (male) or "-f" (female):

26720-m
26720-f
9940-f
...
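
A malformed ID file is easy to catch before starting a long run. The following is a minimal sketch that assumes, based only on the examples above, that every valid line is a numeric measnum followed by "-m" or "-f". `check_trait_ids` is a hypothetical helper, not a haplomap command.

```shell
# Hypothetical helper: report lines that do not look like "<digits>-m" or "<digits>-f".
# Prints "LINE_NUMBER:CONTENT" for each offending line.
# Returns 0 if all lines are valid, 1 if any are invalid, 2 if the file is unreadable.
check_trait_ids() {
    if [ ! -r "$1" ]; then
        echo "cannot read: $1" >&2
        return 2
    fi
    # grep -v selects non-matching lines; it succeeds only if it found at least one.
    if grep -nvE '^[0-9]+-[mf]$' "$1"; then
        return 1
    fi
    return 0
}
```

Blank lines and lines with trailing whitespace are reported as invalid under this assumption, which is usually what you want for a file that is read one ID per line.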

2. Edit the file paths in config.yaml in the workflows folder:

Only edit the HBCGM section.

HBCGM:
    # working directory
    WORKSPACE: "/data/bases/fangzq/MPD/results_drug_diet"
    # path to haplomap
    BIN: "/home/fangzq/github/HBCGM/build/bin"
    
    # MPD id file, one id per line 
    TRAIT_IDS: "/data/bases/fangzq/MPD/drug-diet.ids.txt"
    # set to true to use individual animal data. Default: use strain means.
    USE_RAWDATA: false 
    # strains metadata: map strain abbrev to full name, jax ids, etc. 
    # see docs folder to view examples
    STRAIN_ANNO: "/data/bases/shared/haplomap/PELTZ_20210609/strains.metadata.csv"
    
    # filtered VCF files after variant calling step 
    VCF_DIR: "/data/bases/shared/haplomap/PELTZ_20210609/VCFs"
    # Ensembl-vep output after variant calling step
    VEP_DIR: "/data/bases/shared/haplomap/PELTZ_20210609/VEP"

    ## Optional files
    # genetic relation file from PLINK output
    GENETIC_REL: "/data/bases/shared/haplomap/PELTZ_20210609/mouse54_grm.rel"
    # gene expression file 
    GENE_EXPRS: "/data/bases/shared/haplomap/PELTZ_20210609/mus.compact.exprs.txt"
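
Since most entries in this section are absolute paths, a quick existence check can save a failed run. The sketch below is not a YAML parser: it only recognizes lines of the form `KEY: "/absolute/path"`, as in the example above, and both function names are hypothetical. It also treats every path the same way, so a WORKSPACE directory that has not been created yet will be reported as missing.

```shell
# Hypothetical helper: print "KEY PATH" for every line that looks like
#     KEY: "/absolute/path"
# Comment lines and non-path values (e.g. USE_RAWDATA: false) are skipped.
list_config_paths() {
    sed -n 's|^[[:space:]]*\([A-Z_][A-Z_]*\):[[:space:]]*"\(/[^"]*\)".*$|\1 \2|p' "$1"
}

# Hypothetical helper: report every configured path that does not exist.
report_missing_paths() {
    list_config_paths "$1" | while read -r key path; do
        [ -e "$path" ] || echo "$key: $path not found"
    done
}
```

For example, `report_missing_paths workflows/config.yaml` prints one line per missing file or directory and nothing if all of them exist.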

3. Run the haplomap pipeline

3.1 Create the conda environment

conda env create -n hbcgm -f environment.yaml

3.2 Run on a local computing node

conda activate hbcgm
# modify the file path in haplomap and run with 24 cores
snakemake -s workflows/haplomap.smk \
          --configfile workflows/config.yaml \
          -k -p -j 24
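
With thousands of MPD datasets, it may be worth previewing the job list before committing cores to it. Snakemake's `-n` (`--dry-run`) flag does this. The wrapper below is only a sketch: `run_haplomap` is a hypothetical name, and the workflow and config paths are the ones used in the command above.

```shell
# Hypothetical wrapper around the snakemake command shown above.
# Usage: run_haplomap CORES [dry]
run_haplomap() {
    cores="$1"
    mode="${2:-}"
    # Rebuild the argument list; this only affects the function's own parameters.
    set -- -s workflows/haplomap.smk --configfile workflows/config.yaml -k -p -j "$cores"
    if [ "$mode" = "dry" ]; then
        set -- "$@" -n   # -n / --dry-run: list the jobs without executing them
    fi
    snakemake "$@"
}
```

Calling `run_haplomap 24 dry` previews the jobs, and `run_haplomap 24` then runs them with 24 cores.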

3.3 Run on an HPC cluster, e.g. Stanford Sherlock (Slurm)

  1. Edit slurm.submit.sh and change the file path to HBCGM/workflows.
  2. Edit workflows/slurm_config.yaml and specify the resources you need.
  3. Submit:
sbatch slurm.submit.sh

Output

For an explanation of the output, see: Run haplomap standalone

Contact

Email:

Copyright and License Information

Copyright (C) 2019-2022 Stanford University, Zhuoqing Fang and Gary Peltz.

Authors: Zhuoqing Fang and Gary Peltz.

The original HBCGM (the maximal haplotype construction method) was developed by Dr. David Dill and Dr. Gary Peltz at Stanford.

HBCGM/Haplomap is patented to Dr. Gary Peltz.

This program is released under a license that restricts commercial use. Please see the attached LICENSE file for details.
