sof202/ChromBinarize

ChromHMM Binarization Tools

This is a collection of scripts that convert BED files from ONT, oxBS and WGBS datasets into a format compatible with ChromHMM.

ChromHMM's built-in binarization works well for simple cases, but struggles with datasets that are not traditionally peak called. In addition, 'better' peak-calling algorithms (like MACS) exist for ChIP-Seq and ATAC-Seq datasets. This repository therefore provides a separate suite of scripts that binarize these datasets into a format recognised by ChromHMM.

Note

Throughout this README (and the wider repository), the word 'methylation' means any type of DNA methylation. Where more precise language is required, you will instead see '5mC' or '5hmC' (etc.). If the wording feels ambiguous at any point where it shouldn't be, please raise an issue.

Setup

To run these scripts, first fill out the config file (a template is provided in ./config-setup.txt). Keeping the config file near your data is recommended but not required; it can live anywhere you wish.

Next run the setup script with:

./setup

The setup script prompts for user input when removing SLURM directives and when setting up conda environments. This is deliberate, as you may want to check what conda is going to install first. The script can also take quite some time to finish because of the R dependency tree (~49 packages).

You will see the following message on success:

[1] "success"

If you do not see this success message, please open an issue.

Usage

After completing setup, run the scripts sequentially using the SLURM workload manager:

sbatch path/to/script path/to/config/file
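One possible way to queue several scripts in order without waiting at the terminal is to chain them with SLURM job dependencies. The sketch below is not part of this repository: the function name submit_chain and the SBATCH override variable are hypothetical, and it assumes each script can safely be queued ahead of time with an afterok dependency on the previous job. It relies only on the standard sbatch options --parsable and --dependency.

```shell
# Submit each script in order; every job after the first waits for the
# previous job to finish successfully (afterok) before it starts.
# SBATCH is a hypothetical override for dry runs; it defaults to sbatch.
submit_chain() {
  config=$1
  shift
  prev=""
  for script in "$@"; do
    if [ -z "$prev" ]; then
      prev=$("${SBATCH:-sbatch}" --parsable "$script" "$config") || return 1
    else
      prev=$("${SBATCH:-sbatch}" --parsable --dependency="afterok:$prev" \
        "$script" "$config") || return 1
    fi
    # --parsable may print "jobid;cluster"; keep only the job ID
    prev=${prev%%;*}
    echo "Submitted $script as job $prev"
  done
}

# Example call (the script paths are placeholders, not real file names):
# submit_chain path/to/config/file path/to/first_script path/to/second_script
```

If a job in the chain fails, SLURM will not start the jobs that depend on it, so check the queue (for example with squeue) before resubmitting.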

Note

If you want to get a quick summary of what a script does, run the script without any positional parameters (you can just run it like a normal bash script in this case, sbatch is not required).

Software Requirements

This pipeline requires a Unix-like OS with the following software installed. The versions listed are those used during testing; slightly older minor versions are likely to work as well.

  • bash (>=4.2.46(2))
  • SLURM Workload Manager (>=20.02.3)
  • Conda
    • Any installation will do; this has worked with Miniconda 4.5.2 (from 2020)
    • Make sure conda can be found on your PATH (check with which conda)
  • GNU awk (>=4.0.2)
  • GNU gzip (>=1.5)
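Before running the setup script, it can help to confirm that the tools listed above are available on your PATH. The following is a minimal sketch rather than something shipped with the repository: the helper name missing_tools is hypothetical, sbatch is used as a stand-in for a working SLURM installation, and the check only looks for the presence of each command, not its version.

```shell
# Print the name of every given tool that cannot be found on PATH.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Tool list mirrors the requirements above (sbatch stands in for SLURM).
missing=$(missing_tools bash sbatch conda awk gzip)
if [ -n "$missing" ]; then
  printf 'Missing from PATH:\n%s\n' "$missing"
else
  echo "All required tools were found on PATH"
fi
```

Versions can then be checked by hand with each tool's own version flag, for example bash --version or conda --version.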

The remaining software and R packages are installed for you by the setup script.

Further documentation

Please consult the wiki for further documentation on specific scripts.