
Haplomap

Haplotype-based computational genetic mapping (a.k.a. HBCGM)

Haplomap is the successor to HBCGM, whose development stopped in 2010. Haplomap has been adopted as a replacement for the original HBCGM.

Citation:

Zhuoqing Fang, Gary Peltz, An Automated Multi-Modal Graph-Based Pipeline for Mouse Genetic Discovery, Bioinformatics, 2022, btac356, https://doi.org/10.1093/bioinformatics/btac356

See what's new in the CHANGELOG.

Dependency

Works on both Linux and macOS.

Haplomap:

CMake
GCC >= 4.8
clang >= 11.0.3 (only tested with 11.x version)
C++11
GSL

For Variant Calling, you need:

GATK 4.x
SAMtools
BCFtools
BEDtools
BWA

Running pipeline

Snakemake
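Before running the variant calling step or the pipeline, it can help to confirm that the external tools listed above are on your `PATH`. The following is a minimal sketch, not part of Haplomap: the helper `missing_tools` is hypothetical, and the executable names (`gatk`, `samtools`, `bcftools`, `bedtools`, `bwa`, `snakemake`) are assumed from each tool's usual command name.

```python
import shutil
from typing import Callable, Iterable, List, Optional

# Executable names are assumptions based on each tool's usual command name.
VARIANT_CALLING_TOOLS = ["gatk", "samtools", "bcftools", "bedtools", "bwa"]
PIPELINE_TOOLS = ["snakemake"]


def missing_tools(
    tools: Iterable[str],
    which: Callable[[str], Optional[str]] = shutil.which,
) -> List[str]:
    """Return the tools that cannot be found on PATH, in input order."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools(VARIANT_CALLING_TOOLS + PIPELINE_TOOLS)
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All listed tools were found on PATH.")
```

This only checks that an executable exists; it does not verify versions such as GATK 4.x.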

Installation

conda install -c bioconda haplomap

Install from source

Install GSL first, e.g.

Ubuntu

sudo apt-get install libgsl-dev

MacOS

brew install gsl

or compile GSL from source (make sure that the GSL include and lib paths are exported)

./configure --prefix=${HOME}/program/gsl
make && make install
# you may need to add this line to your .bashrc 
export LD_LIBRARY_PATH="${HOME}/program/gsl/lib:$LD_LIBRARY_PATH"

Build and install to path

cd ${haplomap_repo}
mkdir build && cd build
cmake -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=/path/to/directory/bin ..
make

Usage

Run haplomap standalone

See more details in the haplomap subfolder: Run haplomap standalone

Use `snakemake` workflow to run Mouse Phenome Database (MPD) datasets

0. Variant calling

See variant calling using GATK, BCFtools, svtools.

e.g.

# modify the file path in haplomap and run with 12 cores
snakemake -s workflows/bcftools.call.smk  --configfile config.yaml \
          -k -p -j 12

The Mouse Phenome Database has more than 10K datasets. Configure the files below to run them.

1. Prepare the MPD `measnum` id file. One id per row, suffixed with "-m" or "-f" (m: male, f: female)

26720-m
26720-f
9940-f
...
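A malformed id file is easier to catch before starting a long run. The sketch below parses ids of the form shown above; the function name `parse_trait_ids` and the exact pattern (digits, a hyphen, then `m` or `f`) are assumptions inferred from the examples, not something Haplomap itself provides.

```python
import re
from typing import Iterable, List, Tuple

# Pattern inferred from the examples above: digits, a hyphen, then "m" or "f".
_ID_PATTERN = re.compile(r"(\d+)-([mf])")


def parse_trait_ids(lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Parse 'measnum-sex' ids, skipping blank lines; raise on malformed ids."""
    parsed = []
    for lineno, raw in enumerate(lines, start=1):
        text = raw.strip()
        if not text:
            continue  # tolerate empty rows
        match = _ID_PATTERN.fullmatch(text)
        if match is None:
            raise ValueError(f"line {lineno}: malformed id {text!r}")
        parsed.append((int(match.group(1)), match.group(2)))
    return parsed


if __name__ == "__main__":
    print(parse_trait_ids(["26720-m", "26720-f", "9940-f"]))
    # prints [(26720, 'm'), (26720, 'f'), (9940, 'f')]
```

To check a real file, pass an open file handle, e.g. `parse_trait_ids(open("ids.txt"))`, where `ids.txt` stands in for your own id file.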

2. Edit the file paths in `config.yaml` in the `workflows` folder:

Only edit the HBCGM section.

HBCGM:
    # working directory
    WORKSPACE: "/data/bases/fangzq/MPD/results_drug_diet"
    # path to haplomap
    BIN: "/home/fangzq/github/HBCGM/build/bin"
    
    # MPD id file, one id per line 
    TRAIT_IDS: "/data/bases/fangzq/MPD/drug-diet.ids.txt"
    # if true, use individual animal data. Default: use strain means.
    USE_RAWDATA: false 
    # strains metadata: map strain abbrev to full name, jax ids, etc. 
    # see docs folder to view examples
    STRAIN_ANNO: "/data/bases/shared/haplomap/PELTZ_20210609/strains.metadata.csv"
    
    # filtered VCF files after variant calling step 
    VCF_DIR: "/data/bases/shared/haplomap/PELTZ_20210609/VCFs"
    # Ensembl-vep output after variant calling step
    VEP_DIR: "/data/bases/shared/haplomap/PELTZ_20210609/VEP"

    ## Optional files
    # genetic relation file from PLink output
    GENETIC_REL: "/data/bases/shared/haplomap/PELTZ_20210609/mouse54_grm.rel"
    # gene expression file 
    GENE_EXPRS: "/data/bases/shared/haplomap/PELTZ_20210609/mus.compact.exprs.txt"
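Once the HBCGM section is edited, a quick structural check can catch a missing key before Snakemake starts. This is a sketch under stated assumptions: `check_hbcgm_section` is a hypothetical helper, and treating every key outside the "Optional files" group (other than `USE_RAWDATA`, which has a documented default) as required is an inference from the example above, not a documented rule. The function takes an already loaded dictionary, for instance one obtained from a YAML parser.

```python
from typing import Any, Dict, List

# Keys taken from the example HBCGM section above.
# Assumption: keys not listed under "Optional files" are required.
REQUIRED_KEYS = ["WORKSPACE", "BIN", "TRAIT_IDS", "STRAIN_ANNO", "VCF_DIR", "VEP_DIR"]
OPTIONAL_KEYS = ["USE_RAWDATA", "GENETIC_REL", "GENE_EXPRS"]


def check_hbcgm_section(config: Dict[str, Any]) -> List[str]:
    """Return human-readable problems in the HBCGM section (empty if none)."""
    section = config.get("HBCGM")
    if not isinstance(section, dict):
        return ["missing HBCGM section"]
    problems = [f"missing key: {key}" for key in REQUIRED_KEYS if key not in section]
    # USE_RAWDATA is a true/false switch in the example config.
    if "USE_RAWDATA" in section and not isinstance(section["USE_RAWDATA"], bool):
        problems.append("USE_RAWDATA should be true or false")
    return problems


if __name__ == "__main__":
    example = {"HBCGM": {"WORKSPACE": "/path/to/results", "USE_RAWDATA": False}}
    for problem in check_hbcgm_section(example):
        print(problem)
```

The check is deliberately shallow: it does not confirm that the paths exist or that the files have the expected format.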

3. Run the haplomap pipeline

3.1 Create the conda environment

conda env create -n hbcgm -f environment.yaml

3.2 Run on a local computing node

source activate hbcgm
# modify the file path in haplomap and run with 24 cores
snakemake -s workflows/haplomap.smk \
          --configfile workflows/config.yaml \
          -k -p -j 24
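If you launch runs from a script rather than typing the command, building the argument list in code avoids shell line-continuation mistakes. The sketch below assembles the same command with the flags used in this README (`-s`, `--configfile`, `-k`, `-p`, `-j`); `build_snakemake_cmd` is a hypothetical helper and only constructs the command, it does not run it.

```python
import shlex
from typing import List


def build_snakemake_cmd(snakefile: str, configfile: str, cores: int) -> List[str]:
    """Assemble the snakemake invocation used in this README as an argv list."""
    if cores < 1:
        raise ValueError("cores must be a positive integer")
    return [
        "snakemake",
        "-s", snakefile,
        "--configfile", configfile,
        "-k", "-p",
        "-j", str(cores),
    ]


if __name__ == "__main__":
    cmd = build_snakemake_cmd("workflows/haplomap.smk", "workflows/config.yaml", 24)
    # Show the command as a single shell-safe line.
    print(shlex.join(cmd))
    # To launch it, pass the list to subprocess.run(cmd, check=True).
```

Passing a list to `subprocess.run` sidesteps shell quoting entirely, which matters when workspace paths contain spaces.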

3.3 Run on the HPC, e.g. Stanford Sherlock

For example, with Slurm on Sherlock:

1. Edit slurm.submit.sh and change the file path to HBCGM/workflows.
2. Edit workflows/slurm_config.yaml and specify the resources you need.
3. Submit the job:

sbatch slurm.submit.sh

Output

For an explanation of the output, see: Run haplomap standalone

Contact

Email:

Zhuoqing Fang: [email protected]
Gary Peltz: [email protected]

Copyright and License Information

Authors: Zhuoqing Fang and Gary Peltz.

The original HBCGM (the maximal haplotype construction method) was developed by Dr. David Dill and Dr. Gary Peltz at Stanford.

HBCGM/Haplomap is patented by Dr. Gary Peltz.

This program is distributed under a license that restricts commercial use. Please see the attached LICENSE file for details.
