Comparison of methylation calling between ONT long-read R.9 and R.10 data

NIH-CARD/CARDlongread_meth_R.9vs10

CARDlongread_meth_R.9vs10

Comparison of methylation calling between ONT long-read R.9 and R.10 data

modkit_methcompare.py

This is the Python script used to make the split violin plots comparing modkit methylation frequencies over specified intervals for R9, R10, and bisulfite sequencing modkit files. These graphs were featured in xxx paper.

This script can be used to compare interval-specific methylation frequencies from three different modkit files, with one of the files used to define the bins.

Input data

The script requires three different bed or bed-like files with columns for genomic position (chromosome and start/end position), probability of the target base being modified, and coverage level of the base called.

The violin plot in the paper was made from bedMethyl files generated by modkit, a package for analysing ONT modified bases.
More info about the modkit package and bedMethyl output file can be found at https://github.com/nanoporetech/modkit.
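As a rough illustration of the input format, the sketch below reads bedMethyl-style rows and keeps only sites inside a coverage window. It assumes the column layout described in the modkit documentation (column 10 = valid coverage, column 11 = percent modified) and tab-separated output as produced with `--only-tabs`. The helper name `read_bedmethyl`, the inclusive coverage bounds, and the sample rows are assumptions made for illustration and are not taken from modkit_methcompare.py.

```python
import csv
import io

# Assumed 0-based column positions in a modkit bedMethyl file;
# check these against the modkit documentation for your version.
CHROM, START = 0, 1
VALID_COV, PCT_MOD = 9, 10


def read_bedmethyl(handle, cov_min=20, cov_max=200):
    """Return {(chrom, start): percent_modified} for sites within the coverage window."""
    sites = {}
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        coverage = int(row[VALID_COV])
        # Inclusive bounds are an assumption; the real script may differ.
        if cov_min <= coverage <= cov_max:
            sites[(row[CHROM], int(row[START]))] = float(row[PCT_MOD])
    return sites


# Made-up example rows (18 tab-separated fields each), not real data.
example = "\n".join([
    "chr1\t100\t101\tm\t25\t.\t100\t101\t255,0,0\t25\t80.00\t20\t5\t0\t0\t0\t0\t0",
    "chr1\t200\t201\tm\t5\t.\t200\t201\t255,0,0\t5\t40.00\t2\t3\t0\t0\t0\t0\t0",
])
sites = read_bedmethyl(io.StringIO(example))
print(sites)
```

Keying sites by chromosome and start position makes it straightforward to look up the same CpG across the R9, R10, and bisulfite files later on.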

The command used to generate the modkit files used in the paper is shown below:

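#!/bin/bash

# Positional arguments: sample name, reference FASTA, BAM file, output path
SAMPLE_NAME=$1
REF=$2
BAM_FILE=$3
OUT_PATH=$4  # include a trailing slash; the output file name is appended directly

# Load modkit through the environment module system
ml modkit

# CpG pileup against the reference, ignoring 5hmC (h) calls and combining both strands
modkit pileup --cpg --ref "${REF}" --only-tabs --threads 24 --ignore h --combine-strands "${BAM_FILE}" "${OUT_PATH}${SAMPLE_NAME}.hg38.modkit.comb.bed"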

Parameters

--sample_name : sample name (string value)

--r9_modkit : path to R9 bedfile

--r10_modkit : path to R10 bedfile

--bis_modkit : path to bisulfite bedfile

--cov_min : minimum coverage threshold, default = 20 (int value)

--cov_max : maximum coverage threshold, default = 200 (int value)

--interval : number of evenly spaced intervals for binning data, default = 10 (ex. 0, 10, 20, 30, ... 100) (int value)

--custom_interval : a list of custom unevenly spaced interval values for binning data (ex. [0, 5, 10, 50, 90, 95, 100])

--binning : dataset to bin the graph by, either 'r9', 'r10', or 'Bisulfite' (string value)

--bw : factor from 0.0 to 1.0 that scales the violin plot bandwidth for more or less smoothing, default = 0.1 (float value)

--scale : method used to normalize each density when setting the violin width: 'width' = all violins have the same width (default), 'area' = all violins have the same area, 'count' = violin widths are proportional to the number of observations (string value)

--out_dir : output directory path
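To make the two binning options concrete, here is a minimal sketch of how bin edges could be built from either `--interval` or `--custom_interval` and how a methylation percentage could be assigned to a bin. The function names `make_edges` and `bin_label`, the left-closed bin convention, and the handling of a value of exactly 100 are all assumptions for illustration; the actual script may bin differently.

```python
import bisect


def make_edges(interval=10, custom_interval=None):
    """Build bin edges over 0-100 percent methylation."""
    if custom_interval:
        return sorted(custom_interval)
    # Treat `interval` as the number of evenly spaced bins (an assumption).
    step = 100 / interval
    return [round(i * step, 6) for i in range(interval + 1)]


def bin_label(value, edges):
    """Return a label such as '10-20' for the bin containing value."""
    # bisect_right gives left-closed bins; the top edge is folded into the last bin.
    idx = min(bisect.bisect_right(edges, value) - 1, len(edges) - 2)
    idx = max(idx, 0)
    return f"{edges[idx]:g}-{edges[idx + 1]:g}"


even_edges = make_edges(interval=10)
custom_edges = make_edges(custom_interval=[0, 5, 10, 50, 90, 95, 100])
print(bin_label(12.5, even_edges))
print(bin_label(100.0, even_edges))
print(bin_label(7.0, custom_edges))
```

Custom edges are useful when most CpG sites sit near 0% or 100% methylation, since narrower bins at the extremes keep those sites from being lumped together.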

Sample run command

python modkit_methcompare.py \
--cov_min 20 \
--cov_max 200 \
--r9_modkit /path/to/r9_modkit.bed \
--r10_modkit /path/to/r10_modkit.bed \
--bis_modkit /path/to/bis_modkit.bed \
--interval 10 \
--custom_interval [0, 5, 10, 20, 30, 40, 50, 60, 70, 80, 90, 95, 100] \
--sample_name HG002 \
--binning Bisulfite \
--out_dir /path/to/out_dir

Graphs

The modkit_methcompare.py script generates three separate graphs that can then be overlaid to form the final figure.

The first graph is a split violin plot of the R9 and R10 methylation proportions binned by bisulfite intervals.

[Figure: HG002_bis_VP]

The second graph is a line plot connecting the median of each interval for each sample.
This can be vectorized to overlay the split violin plot.

[Figure: HG002_Bisulfitebins_lines]
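The per-interval medians behind such a line plot can be sketched as follows. This is only an illustration under stated assumptions: the per-site dictionaries are made-up values, the helpers `bin_start` and `medians_by_bin` are hypothetical names, bins are evenly spaced and 10 percentage points wide, and only sites present in both the binning dataset and the sample are used. The script itself may compute the medians differently.

```python
from collections import defaultdict
from statistics import median

# Hypothetical per-site values (percent methylated), keyed by (chrom, start).
bisulfite = {("chr1", 100): 4.0, ("chr1", 200): 8.0, ("chr1", 300): 55.0}
r9 = {("chr1", 100): 6.0, ("chr1", 200): 10.0, ("chr1", 300): 60.0}
r10 = {("chr1", 100): 5.0, ("chr1", 200): 7.0, ("chr1", 300): 58.0}


def bin_start(value, width=10):
    """Lower edge of the evenly spaced bin containing value (100 joins the top bin)."""
    return min(int(value // width) * width, 100 - width)


def medians_by_bin(reference, sample):
    """Median of sample values grouped by the reference dataset's bin."""
    grouped = defaultdict(list)
    for site, ref_value in reference.items():
        if site in sample:  # only sites present in both datasets
            grouped[bin_start(ref_value)].append(sample[site])
    return {b: median(vals) for b, vals in sorted(grouped.items())}


r9_line = medians_by_bin(bisulfite, r9)
r10_line = medians_by_bin(bisulfite, r10)
print(r9_line)
print(r10_line)
```

Each returned dictionary maps a bin's lower edge to the median methylation of that sample within the bin, which gives one point per interval for the connecting line.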

The last plot is a panel showing the distribution of CpG site methylation frequencies for each sample.
This can be rotated 90° and added to the right side of the split violin graph.

[Figure: HG002_bis_VP_cov]

Below is the final figure assembly:

[Figure: HG002_bis_final_VP]
