Update to macs3 #379

Merged 7 commits on Sep 19, 2024

10 changes: 10 additions & 0 deletions CHANGELOG.md
@@ -9,6 +9,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

- [[#327](https://github.com/nf-core/atacseq/issues/327)] - Consistently support `.csi` indices as alternative to `.bai` to allow SAMTOOLS_INDEX to be used with the `-c` flag.
- [[#356](https://github.com/nf-core/atacseq/issues/356)] - Get rid of the `lib` folder and rearrange the pipeline accordingly.
- [[#379](https://github.com/nf-core/atacseq/pull/379)] - Use macs3 instead of macs2.
- Updated pipeline template to [nf-core/tools 2.14.1](https://github.com/nf-core/tools/releases/tag/2.14.1)
- [[#359](https://github.com/nf-core/atacseq/issues/359)] - Fix `--save_unaligned` description in schema.
- [[#344](https://github.com/nf-core/atacseq/issues/344)] - Fix memory issues when sorting merged replicates after `bedtools genomecov`.
@@ -25,6 +26,15 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
> **NB:** Parameter has been **added** if just the new parameter information is present.
> **NB:** Parameter has been **removed** if parameter information isn't present.

### Software dependencies

Note: since the pipeline now uses Nextflow DSL2, each process runs with its own [Biocontainer](https://biocontainers.pro/#/registry). As a result, the pipeline may occasionally use different versions of the same tool. The overall software dependency changes since the last release are listed below for reference.

| Dependency | Old version | New version |
| ---------- | ----------- | ----------- |
| `macs2` | 2.2.7.1 | |
| `macs3` | | 3.0.1 |

## [[2.1.2](https://github.com/nf-core/atacseq/releases/tag/2.1.2)] - 2022-08-07

### Enhancements & fixes
2 changes: 1 addition & 1 deletion CITATIONS.md
@@ -38,7 +38,7 @@

> Heinz S, Benner C, Spann N, Bertolino E, Lin YC, Laslo P, Cheng JX, Murre C, Singh H, Glass CK. Simple combinations of lineage-determining transcription factors prime cis-regulatory elements required for macrophage and B cell identities. Mol Cell. 2010 May 28;38(4):576-89. doi: 10.1016/j.molcel.2010.05.004. PubMed PMID: 20513432; PubMed Central PMCID: PMC2898526.

- [MACS2](https://www.ncbi.nlm.nih.gov/pubmed/18798982/)
- [MACS3](https://www.ncbi.nlm.nih.gov/pubmed/18798982/)

> Zhang Y, Liu T, Meyer CA, Eeckhoute J, Johnson DS, Bernstein BE, Nusbaum C, Myers RM, Brown M, Li W, Liu XS. Model-based analysis of ChIP-Seq (MACS). Genome Biol. 2008;9(9):R137. doi: 10.1186/gb-2008-9-9-r137. Epub 2008 Sep 17. PubMed PMID: 18798982; PubMed Central PMCID: PMC2592715.

4 changes: 2 additions & 2 deletions README.md
@@ -56,7 +56,7 @@ On release, automated continuous integration tests run the pipeline on a full-si
4. Create normalised bigWig files scaled to 1 million mapped reads ([`BEDTools`](https://github.com/arq5x/bedtools2/), [`bedGraphToBigWig`](http://hgdownload.soe.ucsc.edu/admin/exe/))
5. Generate gene-body meta-profile from bigWig files ([`deepTools`](https://deeptools.readthedocs.io/en/develop/content/tools/plotProfile.html))
6. Calculate genome-wide enrichment (optionally relative to control) ([`deepTools`](https://deeptools.readthedocs.io/en/develop/content/tools/plotFingerprint.html))
7. Call broad/narrow peaks ([`MACS2`](https://github.com/macs3-project/MACS))
7. Call broad/narrow peaks ([`MACS3`](https://github.com/macs3-project/MACS))
8. Annotate peaks relative to gene features ([`HOMER`](http://homer.ucsd.edu/homer/download.html))
9. Create consensus peakset across all samples and create tabular file to aid in the filtering of the data ([`BEDTools`](https://github.com/arq5x/bedtools2/))
10. Count reads in consensus peaks ([`featureCounts`](http://bioinf.wehi.edu.au/featureCounts/))
@@ -66,7 +66,7 @@ On release, automated continuous integration tests run the pipeline on a full-si
1. Re-mark duplicates ([`picard`](https://broadinstitute.github.io/picard/))
2. Remove duplicate reads ([`SAMtools`](https://sourceforge.net/projects/samtools/files/samtools/))
3. Create normalised bigWig files scaled to 1 million mapped reads ([`BEDTools`](https://github.com/arq5x/bedtools2/), [`bedGraphToBigWig`](http://hgdownload.soe.ucsc.edu/admin/exe/))
4. Call broad/narrow peaks ([`MACS2`](https://github.com/macs3-project/MACS))
4. Call broad/narrow peaks ([`MACS3`](https://github.com/macs3-project/MACS))
5. Annotate peaks relative to gene features ([`HOMER`](http://homer.ucsd.edu/homer/download.html))
6. Create consensus peakset across all samples and create tabular file to aid in the filtering of the data ([`BEDTools`](https://github.com/arq5x/bedtools2/))
7. Count reads in consensus peaks relative to merged library-level alignments ([`featureCounts`](http://bioinf.wehi.edu.au/featureCounts/))
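
The "scaled to 1 million mapped reads" normalisation mentioned in the steps above amounts to multiplying coverage by `1e6 / mapped_reads`. The following is a minimal sketch of that arithmetic, assuming the mapped-read count is already known; the function names `scale_factor` and `scale_bedgraph_line` are hypothetical and are not part of the pipeline.

```python
def scale_factor(mapped_reads: int, target: int = 1_000_000) -> float:
    """Return the multiplier that rescales coverage to `target` mapped reads."""
    if mapped_reads <= 0:
        raise ValueError("mapped_reads must be positive")
    return target / mapped_reads


def scale_bedgraph_line(line: str, factor: float) -> str:
    """Scale the coverage column (4th field) of one tab-separated bedGraph line."""
    chrom, start, end, depth = line.rstrip("\n").split("\t")
    return f"{chrom}\t{start}\t{end}\t{float(depth) * factor:.6g}"


# Hypothetical library with 25 million mapped reads.
factor = scale_factor(25_000_000)
print(factor)
print(scale_bedgraph_line("chr1\t100\t200\t50", factor))
```

In practice a factor like this would typically be handed to the coverage tool itself (for example via the `-scale` option of `bedtools genomecov`) rather than applied line by line.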
4 changes: 2 additions & 2 deletions assets/multiqc/merged_library_frip_score_header.txt
@@ -1,7 +1,7 @@
#id: 'mlib_frip_score'
#section_name: 'MERGED LIB: MACS2 peak FRiP score'
#section_name: 'MERGED LIB: MACS3 peak FRiP score'
#description: "is generated by calculating the fraction of all mapped reads that fall
# into the MACS2 called peak regions. A read must overlap a peak by at least 20% to be counted.
# into the MACS3 called peak regions. A read must overlap a peak by at least 20% to be counted.
# See <a href='https://www.encodeproject.org/data-standards/terms/' target='_blank'>FRiP score</a>."
#plot_type: 'bargraph'
#anchor: 'mlib_frip_score'
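
The FRiP description above (fraction of mapped reads falling into called peaks, with a 20% minimum overlap) can be illustrated with a small sketch. It assumes the 20% threshold is measured relative to the read length and uses a simple quadratic scan; the names `overlaps_enough` and `frip` are hypothetical, and the pipeline itself may compute this differently.

```python
from typing import List, Tuple

Interval = Tuple[str, int, int]  # (chrom, start, end), 0-based half-open


def overlaps_enough(read: Interval, peak: Interval, min_frac: float = 0.20) -> bool:
    """True if the overlap covers at least `min_frac` of the read's length."""
    if read[0] != peak[0]:
        return False
    overlap = min(read[2], peak[2]) - max(read[1], peak[1])
    read_len = read[2] - read[1]
    return read_len > 0 and overlap >= min_frac * read_len


def frip(reads: List[Interval], peaks: List[Interval], min_frac: float = 0.20) -> float:
    """Fraction of reads that overlap any peak by at least `min_frac`."""
    if not reads:
        return 0.0
    in_peaks = sum(
        any(overlaps_enough(r, p, min_frac) for p in peaks) for r in reads
    )
    return in_peaks / len(reads)


peaks = [("chr1", 100, 200)]
reads = [
    ("chr1", 90, 140),   # 40 of 50 bp inside the peak: counted
    ("chr1", 55, 105),   # 5 of 50 bp inside the peak: below 20%, not counted
    ("chr1", 300, 350),  # no overlap
    ("chr2", 100, 150),  # different chromosome
]
print(frip(reads, peaks))
```

For real data sets an interval-aware tool or a sorted sweep would be preferable to this all-pairs comparison.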
4 changes: 2 additions & 2 deletions assets/multiqc/merged_library_peak_count_header.txt
@@ -1,7 +1,7 @@
#id: 'mlib_peak_count'
#section_name: 'MERGED LIB: MACS2 peak count'
#section_name: 'MERGED LIB: MACS3 peak count'
#description: "is calculated from total number of peaks called by
# <a href='https://github.com/taoliu/MACS' target='_blank'>MACS2</a>"
# <a href='https://github.com/taoliu/MACS' target='_blank'>MACS3</a>"
#plot_type: 'bargraph'
#anchor: 'mlib_peak_count'
#pconfig:
4 changes: 2 additions & 2 deletions assets/multiqc/merged_replicate_frip_score_header.txt
@@ -1,7 +1,7 @@
#id: 'mrep_frip_score'
#section_name: 'MERGED REP: MACS2 peak FRiP score'
#section_name: 'MERGED REP: MACS3 peak FRiP score'
#description: "is generated by calculating the fraction of all mapped reads that fall
# into the MACS2 called peak regions. A read must overlap a peak by at least 20% to be counted.
# into the MACS3 called peak regions. A read must overlap a peak by at least 20% to be counted.
# See <a href='https://www.encodeproject.org/data-standards/terms/' target='_blank'>FRiP score</a>."
#plot_type: 'bargraph'
#anchor: 'mrep_frip_score'
4 changes: 2 additions & 2 deletions assets/multiqc/merged_replicate_peak_count_header.txt
@@ -1,7 +1,7 @@
#id: 'mrep_peak_count'
#section_name: 'MERGED REP: MACS2 Peak count'
#section_name: 'MERGED REP: MACS3 Peak count'
#description: "is calculated from total number of peaks called by
# <a href='https://github.com/taoliu/MACS' target='_blank'>MACS2</a>"
# <a href='https://github.com/taoliu/MACS' target='_blank'>MACS3</a>"
#plot_type: 'bargraph'
#anchor: 'mrep_peak_count'
#pconfig:
4 changes: 2 additions & 2 deletions assets/multiqc_config.yml
@@ -71,7 +71,7 @@ module_order:
      anchor: "mlib_featurecounts"
      info: "This section of the report shows featureCounts results for the number of reads assigned to merged library consensus peaks."
      path_filters:
        - "./macs2/merged_library/featurecounts/*.summary"
        - "./macs3/merged_library/featurecounts/*.summary"
  - samtools:
      name: "MERGED REP: SAMTools"
      info: "This section of the report shows SAMTools results after merging replicates and filtering."
@@ -88,7 +88,7 @@ module_order:
      anchor: "mrep_featurecounts"
      info: "This section of the report shows featureCounts results for the number of reads assigned to merged replicate consensus peaks."
      path_filters:
        - "./macs2/merged_replicate/featurecounts/*.summary"
        - "./macs3/merged_replicate/featurecounts/*.summary"

report_section_order:
  mlib_peak_count:
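
The `path_filters` change above matters because files left under the old `macs2/` directory no longer match the new pattern. The sketch below illustrates this, assuming shell-style glob matching comparable to Python's `fnmatch`; the file names and the `matches` helper are hypothetical.

```python
from fnmatch import fnmatchcase

# New filter taken from the updated multiqc_config.yml.
NEW_FILTER = "./macs3/merged_library/featurecounts/*.summary"


def matches(path: str, pattern: str = NEW_FILTER) -> bool:
    """Case-sensitive shell-style match of a file path against a filter."""
    return fnmatchcase(path, pattern)


# Hypothetical featureCounts summary files written before and after the rename.
old_path = "./macs2/merged_library/featurecounts/consensus_peaks.featureCounts.txt.summary"
new_path = "./macs3/merged_library/featurecounts/consensus_peaks.featureCounts.txt.summary"

print(matches(old_path), matches(new_path))
```

This is why the output directory names and the report filters have to be renamed together.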
10 changes: 5 additions & 5 deletions bin/macs2_merged_expand.py → bin/macs3_merged_expand.py
@@ -17,15 +17,15 @@
############################################

Description = "Add sample boolean files and aggregate columns from merged MACS narrow or broad peak file."
Epilog = """Example usage: python macs2_merged_expand.py <MERGED_INTERVAL_FILE> <SAMPLE_NAME_LIST> <OUTFILE> --is_narrow_peak --min_replicates 1"""
Epilog = """Example usage: python macs3_merged_expand.py <MERGED_INTERVAL_FILE> <SAMPLE_NAME_LIST> <OUTFILE> --is_narrow_peak --min_replicates 1"""

argParser = argparse.ArgumentParser(description=Description, epilog=Epilog)

## REQUIRED PARAMETERS
argParser.add_argument("MERGED_INTERVAL_FILE", help="Merged MACS2 interval file created using linux sort and mergeBed.")
argParser.add_argument("MERGED_INTERVAL_FILE", help="Merged MACS3 interval file created using linux sort and mergeBed.")
argParser.add_argument(
    "SAMPLE_NAME_LIST",
    help="Comma-separated list of sample names as named in individual MACS2 broadPeak/narrowPeak output file e.g. SAMPLE_R1 for SAMPLE_R1_peak_1.",
    help="Comma-separated list of sample names as named in individual MACS3 broadPeak/narrowPeak output file e.g. SAMPLE_R1 for SAMPLE_R1_peak_1.",
)
argParser.add_argument("OUTFILE", help="Full path to output file.")

@@ -76,7 +76,7 @@ def makedir(path):
## sort -k1,1 -k2,2n <MACS_NARROWPEAK_FILE_LIST> | mergeBed -c 2,3,4,5,6,7,8,9,10 -o collapse,collapse,collapse,collapse,collapse,collapse,collapse,collapse,collapse > merged_peaks.txt


def macs2_merged_expand(MergedIntervalTxtFile, SampleNameList, OutFile, isNarrow=False, minReplicates=1):
def macs3_merged_expand(MergedIntervalTxtFile, SampleNameList, OutFile, isNarrow=False, minReplicates=1):
    makedir(os.path.dirname(OutFile))

    combFreqDict = {}
@@ -208,7 +208,7 @@ def macs2_merged_expand(MergedIntervalTxtFile, SampleNameList, OutFile, isNarrow
############################################
############################################

macs2_merged_expand(
macs3_merged_expand(
    MergedIntervalTxtFile=args.MERGED_INTERVAL_FILE,
    SampleNameList=args.SAMPLE_NAME_LIST.split(","),
    OutFile=args.OUTFILE,
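
The script's description ("add sample boolean files ... from merged MACS narrow or broad peak file") can be illustrated with a simplified sketch of how a collapsed peak-name column from `mergeBed -o collapse` might be turned into per-sample flags. It assumes peak names follow the `<SAMPLE>_peak_<N>` pattern mentioned in the help text and counts contributing samples directly; `sample_flags` and `passes_min_replicates` are hypothetical helpers, not the script's actual functions.

```python
from typing import Dict, List


def sample_flags(collapsed_names: str, sample_names: List[str]) -> Dict[str, bool]:
    """Map each sample to True if any collapsed peak name belongs to it."""
    peak_names = collapsed_names.split(",")
    return {
        # Peak names are assumed to look like "<SAMPLE>_peak_<N>".
        sample: any(name.rsplit("_peak_", 1)[0] == sample for name in peak_names)
        for sample in sample_names
    }


def passes_min_replicates(flags: Dict[str, bool], min_replicates: int = 1) -> bool:
    """Keep the merged interval if enough samples contribute a peak."""
    return sum(flags.values()) >= min_replicates


samples = ["SAMPLE_R1", "SAMPLE_R2", "SAMPLE_R3"]
flags = sample_flags("SAMPLE_R1_peak_1,SAMPLE_R1_peak_2,SAMPLE_R3_peak_7", samples)
print(flags)
print(passes_min_replicates(flags, min_replicates=2))
```

The real script also aggregates the other collapsed columns; this sketch covers only the boolean expansion and a simplified reading of `--min_replicates`.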
2 changes: 1 addition & 1 deletion bin/plot_macs2_qc.r → bin/plot_macs3_qc.r
@@ -20,7 +20,7 @@ library(scales)
option_list <- list(make_option(c("-i", "--peak_files"), type="character", default=NULL, help="Comma-separated list of peak files.", metavar="path"),
                    make_option(c("-s", "--sample_ids"), type="character", default=NULL, help="Comma-separated list of sample ids associated with peak files. Must be unique and in same order as peaks files input.", metavar="string"),
                    make_option(c("-o", "--outdir"), type="character", default='./', help="Output directory", metavar="path"),
                    make_option(c("-p", "--outprefix"), type="character", default='macs2_peakqc', help="Output prefix", metavar="string"))
                    make_option(c("-p", "--outprefix"), type="character", default='macs3_peakqc', help="Output prefix", metavar="string"))

opt_parser <- OptionParser(option_list=option_list)
opt <- parse_args(opt_parser)