This repository contains R functions and scripts we use to analyze the *.mcalls output files from the BWASP workflow.
Please find detailed installation instructions and options in the INSTALL document.
Claire Morandin and Volker P. Brendel (2021) Tools and applications for integrative analysis of DNA methylation in social insects. Molecular Ecology Resources, 00, 1-19. https://doi.org/10.1111/1755-0998.13566.
Original preprint: available at bioRxiv.
Please direct all comments and suggestions to Volker Brendel at Indiana University.
Required input to the BWASPR workflow consists of the *.mcalls files, which contain tab-delimited data in the following named columns:
SeqID.Pos SequenceID Position Strand Coverage Prcnt_Meth Prcnt_Unmeth
Two more files are also required. One specifies the data labels and *.mcalls file locations, and the other specifies certain parameters. Let's look at the example files in inst/extdata:
AmHE.dat
================================================================================
# Samples from Herb et al. (2012) Nature Neuroscience:
#
Am HE forager 0 CpGhsm ../inst/extdata/Amel-forager.CpGhsm.mcalls
Am HE forager 0 CpGscd ../inst/extdata/Amel-forager.CpGscd.mcalls
Am HE nurse 0 CpGhsm ../inst/extdata/Amel-nurse.CpGhsm.mcalls
Am HE nurse 0 CpGscd ../inst/extdata/Amel-nurse.CpGscd.mcalls
AmHE.par
================================================================================
SPECIESNAME Apis mellifera
ASSEMBLYVERSION Amel_4.5
GENOMESIZE 250270657
TOTALNBRPMSITES 20307353
SPECIESGFF3DIR ../inst/extdata/AmGFF3DIR
GENELISTGFF3 Amel.gene.gff3
EXONLISTGFF3 Amel.exon.gff3
PCGEXNLISTGFF3 Amel.pcg-exon.gff3
PROMOTRLISTGFF3 Amel.promoter.gff3
CDSLISTGFF3 Amel.pcg-CDS.gff3
UTRFLAGSET 1
5UTRLISTGFF3 Amel.pcg-5pUTR.gff3
3UTRLISTGFF3 Amel.pcg-3pUTR.gff3
The first file has columns for species (here Am); study (here HE); sample (here forager and nurse); replicate number (here 0, indicating single samples or, as in this study, aggregates over replicates); the type of *.mcalls file (here CpGhsm or CpGscd); and the location of that *.mcalls file. Note that the file locations in this example are relative paths, which assume that you run the example discussed in the demo directory.
The second file specifies the species name, genome assembly version, genome size (in base pairs), total number of potential methylation sites (CpGs), the directory holding the GFF3 annotation files, and the names of the GFF3 files that annotate various genomic features. Setting UTRFLAGSET to 1 enables use of the UTR annotation files given by 5UTRLISTGFF3 and 3UTRLISTGFF3.
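The following sketch makes the layout of the two input files concrete. It is a minimal Python reader and is not part of BWASPR, which reads these files in R. It assumes three things: fields are separated by whitespace, lines starting with `#` are comments, and each *.par line consists of a key followed by a value that may contain spaces (as in `SPECIESNAME Apis mellifera`). The `SampleEntry` class, its field names, and the `label()` format are hypothetical.

```python
"""Hypothetical readers for BWASPR-style *.dat and *.par files (illustration only)."""
from typing import Dict, List, NamedTuple


class SampleEntry(NamedTuple):
    species: str    # e.g. Am
    study: str      # e.g. HE
    sample: str     # e.g. forager
    replicate: int  # 0 = single sample or aggregate over replicates
    sitetype: str   # e.g. CpGhsm or CpGscd
    path: str       # location of the *.mcalls file

    def label(self) -> str:
        # Hypothetical species_study_sample_replicate label.
        return f"{self.species}_{self.study}_{self.sample}_{self.replicate}"


def _content_lines(text: str) -> List[str]:
    """Return stripped, non-empty lines that are not '#' comments."""
    return [ln.strip() for ln in text.splitlines()
            if ln.strip() and not ln.strip().startswith("#")]


def read_dat(text: str) -> List[SampleEntry]:
    """Parse six whitespace-separated columns per sample line."""
    entries = []
    for line in _content_lines(text):
        species, study, sample, rep, sitetype, path = line.split()
        entries.append(SampleEntry(species, study, sample, int(rep), sitetype, path))
    return entries


def read_par(text: str) -> Dict[str, str]:
    """Parse 'KEY value...' lines; the value keeps any internal spaces."""
    params: Dict[str, str] = {}
    for line in _content_lines(text):
        parts = line.split(maxsplit=1)
        params[parts[0]] = parts[1] if len(parts) > 1 else ""
    return params


if __name__ == "__main__":
    dat = read_dat("Am HE forager 0 CpGhsm ../inst/extdata/Amel-forager.CpGhsm.mcalls\n")
    print(dat[0].label(), dat[0].sitetype, dat[0].path)
    print(read_par("SPECIESNAME Apis mellifera\n")["SPECIESNAME"])
```

Splitting each *.par line only once keeps multi-word values such as the species name intact. Numeric values such as GENOMESIZE are left as strings for the caller to convert.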
A typical BWASPR workflow reads the specified *.mcalls files and generates various output tables and plots, labeled with combinations of the species, study, sample, and replicate labels. The demo/Rscript.BWASPR file provides a template workflow. Customization is done at the top of that file, mostly by including a configuration file such as demo/sample.conf. The following table summarizes the successive workflow steps. You may want to open demo/Rscript.BWASPR and demo/sample.conf in separate windows for reference while reading the table. Details on running the workflow with the demo data are given in demo/README.
RUNflag | input | (select) parameters | function | theme | output files |
---|---|---|---|---|---|
RUNcms | studymk | covlist, locount, hicount | cmStats() | sample coverage and methylation statistics | cms-*.txt cms-*.pdf |
RUNpwc | studymk, studymc | - | cmpSites() | pairwise sample comparisons | pwc-*.vs.*.txt |
RUNcrl | studymk | destrand | cmpSamples() | correlations between aggregate samples | crl-*.txt crl-*.pdf |
RUNrepcms | replicate *.mcalls | repcovlist, replocount, rephicount | cmStats() | replicate coverage and methylation statistics | repcms-*.txt repcms-*.pdf |
RUNrepcrl | replicate *.mcalls | destrand | cmpSamples() | correlations between replicates | repcrl-*.txt repcrl-*.pdf |
RUNmmp | studymk | - | map_methylome() | methylation to annotation maps | mmp-*.txt |
RUNacs | studymk | destrand | annotate_methylome() | annotation of common sites | acs-*.txt |
RUNrnk | studymk | genome_ann$region | rank_rbm() | ranked genes and promoters | ranked-*.txt sites-in-*.txt rnk-sig-*.pdf sip-*.txt rnk-sip-*.txt rnk-sip-*.pdf |
RUNmrpr | studymk | ddset, nr2d, doplots | det_mrpr() | methylation-rich and -poor regions | dst-*.txt *ds-*.pdf mdr-*.tab mdr-*.bed mpr-*.txt mrr-*.txt rmp-*.txt gwr-*.txt |
RUNdmt | studymc | wsize, stepsize | det_dmt() | differentially methylated tiles and genes | dmt-*.txt dmg-*.txt |
RUNdmsg | sample *.mcalls | highcoverage, destrand | det_dmsg() | differentially methylated sites and genes | dms-*.txt dmg-*.txt |
RUNdmgdtls | studyhc | destrand | show_dmsg() | details for differentially methylated genes | dmg-*.vs.*_details.txt dmg-*.vs.*_heatmaps.pdf |
RUNogl | studyhc | - | explore_dmsg(), rank_dmg() | ranked lists of differentially methylated genes | ogl-*.txt rnk-dmg-*.vs.*.txt rnk-dmg-*.vs.*.pdf wrt-*.txt |
RUNsave | workflow output | - | save.image() | save image of workflow output | *.RData |
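The sketch below gives a rough sense of what a per-sample coverage and methylation summary over a *.mcalls file involves, which is the theme of the RUNcms and RUNrepcms steps. It reads records in the column order listed above and counts sites at or above a few coverage thresholds. It is not cmStats(). The `MCall` record, the `read_mcalls()` and `coverage_summary()` helpers, the optional header line starting with `SeqID.Pos`, integer Coverage values, and the choice of statistics are all assumptions. The thresholds only loosely mirror the covlist parameter.

```python
"""Hypothetical *.mcalls reader and coverage summary (not BWASPR's cmStats())."""
import csv
import io
from statistics import mean
from typing import Dict, Iterable, List, NamedTuple


class MCall(NamedTuple):
    seqid_pos: str      # SeqID.Pos
    seqid: str          # SequenceID
    position: int       # Position
    strand: str         # Strand
    coverage: int       # Coverage (assumed integer read depth)
    prcnt_meth: float   # Prcnt_Meth
    prcnt_unmeth: float # Prcnt_Unmeth


def read_mcalls(handle: Iterable[str]) -> List[MCall]:
    """Parse tab-delimited rows, skipping blank lines and a SeqID.Pos header."""
    records = []
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0] == "SeqID.Pos":
            continue
        records.append(MCall(row[0], row[1], int(row[2]), row[3],
                             int(row[4]), float(row[5]), float(row[6])))
    return records


def coverage_summary(records: List[MCall], thresholds: List[int]) -> Dict[str, float]:
    """Count sites with coverage >= each threshold and average Prcnt_Meth."""
    summary: Dict[str, float] = {"sites": len(records)}
    for t in thresholds:
        summary[f"cov>={t}"] = sum(1 for r in records if r.coverage >= t)
    summary["mean_prcnt_meth"] = mean(r.prcnt_meth for r in records) if records else 0.0
    return summary


if __name__ == "__main__":
    demo = ("chr1.100\tchr1\t100\t+\t10\t80.0\t20.0\n"
            "chr1.200\tchr1\t200\t-\t4\t20.0\t80.0\n")
    print(coverage_summary(read_mcalls(io.StringIO(demo)), [5, 10]))
```

To summarize a real file, open it with `open(path, newline="")` and pass the handle to `read_mcalls()`. For example, `path` could be one of the locations listed in AmHE.dat.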