feat: update all! #26

Merged
merged 13 commits into from
Apr 26, 2024
6 changes: 3 additions & 3 deletions .github/workflows/cache.yml
@@ -10,15 +10,15 @@ jobs:
weekly-cache:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v3
- uses: actions/checkout@v4.1.4
- name: Set up cache
uses: actions/cache@v2
uses: actions/cache@v4.0.2
with:
path: .snakemake/conda
key: snakemake-conda
id: cache
- name: Download environments
uses: snakemake/snakemake-github-action@v1.24.0
uses: snakemake/snakemake-github-action@v1.25.1
with:
directory: .
snakefile: workflow/Snakefile
2 changes: 1 addition & 1 deletion .github/workflows/conventional-prs.yml
@@ -13,7 +13,7 @@ jobs:
title-format:
runs-on: ubuntu-latest
steps:
- uses: amannn/action-semantic-pull-request@v5.2.0
- uses: amannn/action-semantic-pull-request@v5.5.2
env:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
with:
17 changes: 8 additions & 9 deletions .github/workflows/main.yml
@@ -4,20 +4,20 @@ name: Tests

on: # yamllint disable-line rule:truthy
push:
branches: [main, devel]
branches: [master, devel]
pull_request:
branches: [main, devel]
branches: [master, devel]

jobs:

Pre-Commit:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v3
- uses: actions/checkout@v4.1.4
with:
lfs: 'true'
- name: Run pre-commit on all files
uses: pre-commit/action@v3.0.0
uses: pre-commit/action@v3.0.1
with:
extra_args: --all-files

@@ -26,18 +26,18 @@ jobs:
needs:
- Pre-Commit
steps:
- uses: actions/checkout@v3
- uses: actions/checkout@v4.1.4
with:
lfs: 'true'

- name: Cache
uses: actions/cache@v3
uses: actions/cache@v4.0.2
with:
path: .snakemake/conda
key: snakemake-conda

- name: Download environments
uses: snakemake/snakemake-github-action@v1.24.0
uses: snakemake/snakemake-github-action@v1.25.1
with:
directory: .
snakefile: workflow/Snakefile
@@ -48,7 +48,7 @@ jobs:
--cores 1

- name: Test workflow
uses: snakemake/snakemake-github-action@v1.24.0
uses: snakemake/snakemake-github-action@v1.25.1
with:
directory: .
snakefile: workflow/Snakefile
@@ -57,4 +57,3 @@ jobs:
--show-failed-logs
--cores 3
--conda-cleanup-pkgs cache
--all-temp
5 changes: 2 additions & 3 deletions .github/workflows/release-please.yml
@@ -2,8 +2,7 @@
---
on: # yamllint disable-line rule:truthy
push:
branches:
- main
branches: [main, devel, master]

name: release-please

Expand All @@ -12,7 +11,7 @@ jobs:
runs-on: ubuntu-latest
steps:

- uses: google-github-actions/release-please-action@v3
- uses: google-github-actions/release-please-action@v4.1.0
id: release
with:
release-type: go
4 changes: 1 addition & 3 deletions README.md
@@ -1,6 +1,6 @@
# smsk_popoolation: A Snakemake pipeline for population genomics

[![Build Status](https://travis-ci.org/jlanga/smsk_popoolation.svg?branch=master)](https://travis-ci.org/jlanga/smsk_popoolation)
![Build Status](https://github.com/jlanga/smsk_popoolation/actions/workflows/main.yml/badge.svg)
[![DOI](https://zenodo.org/badge/76841262.svg)](https://zenodo.org/badge/latestdoi/76841262)

## 1. Description
@@ -17,8 +17,6 @@ This is a repo that contains installers and snakemake scripts to execute the pip

## 2. First steps

Follow the contents of the `.travis.yml` file:

1. Install ([ana](https://www.continuum.io/downloads)|[mini](http://conda.pydata.org/miniconda.html))conda

2. Clone and install the software
4 changes: 2 additions & 2 deletions config/params.yml
@@ -19,7 +19,7 @@ popoolation:
min_covered_fraction: 0.5
analyses: [pi, theta, D]
window_step: # window: 5000, step: 1000 -> [1, 5000], [1001, 6000], [2001, 7000], ...
- [1, 1]
- [1K, 500]
- [5K, 1K]

popoolation2:
@@ -36,5 +36,5 @@ popoolation2:
min_coverage: 4
min_covered_fraction: 1.0
window_step: # window: 5000, step: 1000 -> [1, 5000], [1001, 6000], [2001, 7000], ...
- [1, 1]
- [1K, 500]
- [5K, 1K]
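
The `window_step` pairs above mix plain integers with `K`-suffixed values. The following is a minimal Python sketch of how such pairs could be read and turned into windows. It assumes that `K` means a factor of 1000 and that windows are 1-based and inclusive; `parse_size` and `sliding_windows` are hypothetical helpers for illustration, not functions from this pipeline or from PoPoolation.

```python
def parse_size(value):
    """Convert 500, "500", "1K" or "5k" into an integer number of bases.

    Assumption: a trailing K means a factor of 1000.
    """
    text = str(value).strip().upper()
    if text.endswith("K"):
        return int(text[:-1]) * 1000
    return int(text)


def sliding_windows(length, window, step):
    """Yield 1-based, inclusive (start, end) windows that fit within length."""
    start = 1
    while start + window - 1 <= length:
        yield (start, start + window - 1)
        start += step


# Mirrors the new config values; a YAML loader yields "1K" as a string
# and 500 as an integer, so both types are handled.
window_step = [["1K", 500], ["5K", "1K"]]
parsed = [(parse_size(window), parse_size(step)) for window, step in window_step]

# Windows of 5000 bases moved in steps of 1000 along a toy 8000-base sequence.
first_windows = list(sliding_windows(8000, 5000, 1000))
print(parsed)
print(first_windows)
```

With these assumptions, consecutive windows of size 5000 and step 1000 overlap by 4000 bases.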
7 changes: 5 additions & 2 deletions workflow/Snakefile
@@ -18,9 +18,12 @@ samples = pd.read_table("config/samples.tsv", comment="#", dtype="str")

# Other variables
POPULATION_LIBRARY = (
samples[["population", "library"]].drop_duplicates().values.tolist()
samples[["population", "library"]]
.sort_values(by=["population", "library"])
.drop_duplicates()
.values.tolist()
)
POPULATIONS = list(set(population for population, library in POPULATION_LIBRARY))
POPULATIONS = samples["population"].sort_values().drop_duplicates().values.tolist()

PAIRS = ["pe_pe", "pe_se"]
CHROMOSOMES = features["chromosomes"].split(" ")
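
The switch from `list(set(...))` to sorted values can be motivated with a small standard-library sketch. Iteration order of a set of strings may differ between Python runs because of hash randomization, so sorting gives a reproducible order. The sample rows below are made up for illustration, and the code uses plain Python rather than the pandas calls from the diff.

```python
# Hypothetical (population, library) rows, including one duplicate.
rows = [
    ("pop2", "lib1"),
    ("pop1", "lib2"),
    ("pop1", "lib1"),
    ("pop1", "lib1"),
]

# Sort first, then drop duplicates; dict.fromkeys preserves insertion order.
population_library = [list(pair) for pair in dict.fromkeys(sorted(rows))]

# Sorting the unique population names yields the same list on every run.
populations = sorted({population for population, _ in rows})

print(population_library)
print(populations)
```

A stable order is useful here because these lists feed into rule inputs, and a changing order could make otherwise identical runs look different.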
2 changes: 0 additions & 2 deletions workflow/rules/popoolation/__main__.smk
@@ -7,7 +7,5 @@ include: "hp.smk"

rule popoolation:
input:
rules.popoolation__mpileup.input,
rules.popoolation__variance_sliding.input,
rules.popoolation__plot.input,
rules.popoolation__hp.input,
2 changes: 1 addition & 1 deletion workflow/rules/popoolation/variance_sliding.smk
@@ -47,7 +47,7 @@ rule popoolation__variance_sliding__:
--snp-output {params.snps} \
2> {log} 1>&2

gzip --keep {params.snps} {params.vs} 2>> {log} 1>&2
gzip {params.snps} {params.vs} 2>> {log} 1>&2
"""


2 changes: 0 additions & 2 deletions workflow/rules/popoolation2/__main__.smk
@@ -7,6 +7,4 @@ include: "plot.smk"

rule popoolation2:
input:
rules.popoolation2__sync.input,
rules.popoolation2__fst_sliding.input,
rules.popoolation2__plot.input,
2 changes: 2 additions & 0 deletions workflow/rules/preprocess/__environment__.yml
@@ -8,3 +8,5 @@ dependencies:
- pigz =2.8
- python =3.12.3
- samtools =1.20
- r-tidyverse =2.0.0
- r-tidyquant =1.0.7
19 changes: 18 additions & 1 deletion workflow/rules/preprocess/coverage.smk
@@ -23,6 +23,23 @@ rule preprocess__coverage__compute_hist__:
"""


rule preprocess__coverage:
rule preprocess_coverage__plot__:
input:
[PRE_COV / f"{population}.hist" for population in POPULATIONS],
output:
pdf=PRE_COV / "coverage.pdf",
coverage=PRE_COV / "coverage.tsv",
log:
PRE_COV / "coverage.log",
conda:
"__environment__.yml"
shell:
"""
Rscript workflow/scripts/plot_coverage.R 2> {log} 1>&2
"""


rule preprocess__coverage:
input:
hist=PRE_COV / "coverage.pdf",
coverage=PRE_COV / "coverage.tsv",
18 changes: 11 additions & 7 deletions workflow/rules/preprocess/index.smk
@@ -3,20 +3,24 @@ rule preprocess__index:
input:
fa=REFERENCE / f"{REFERENCE_NAME}.fa.gz",
output:
mock=touch(PRE_INDEX / f"{REFERENCE_NAME}"),
buckets=[
PRE_INDEX / f"{reference_name}.{suffix}"
for reference_name in [REFERENCE_NAME]
for suffix in "0123 amb ann bwt.2bit.64 pac".split()
],
buckets=multiext(
str(PRE_INDEX / f"{REFERENCE_NAME}."),
"0123",
"amb",
"ann",
"bwt.2bit.64",
"pac",
),
log:
PRE_INDEX / f"{REFERENCE_NAME}.log",
conda:
"__environment__.yml"
params:
prefix=PRE_INDEX / f"{REFERENCE_NAME}",
shell:
"""
bwa-mem2 index \
-p {output.mock} \
-p {params.prefix} \
{input.fa} \
> {log} 2>&1
"""
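
The rule above declares the real index files as outputs and passes the bare prefix to `bwa-mem2 index -p` through `params`. The sketch below shows, in plain Python, which paths the prefix-plus-suffix expansion is expected to produce. `expand_extensions` is a hypothetical stand-in that only mimics the concatenation and is not Snakemake's `multiext`; the `PRE_INDEX` and `REFERENCE_NAME` values are assumed for illustration.

```python
from pathlib import PurePosixPath


def expand_extensions(prefix, *extensions):
    """Concatenate one prefix with each extension (illustrative helper)."""
    return [f"{prefix}{extension}" for extension in extensions]


# Assumed values; the real ones are defined elsewhere in the workflow.
PRE_INDEX = PurePosixPath("results/preprocess/index")
REFERENCE_NAME = "genome"

# The prefix ends with a dot, so the suffixes are listed without one.
buckets = expand_extensions(
    str(PRE_INDEX / f"{REFERENCE_NAME}."),
    "0123",
    "amb",
    "ann",
    "bwt.2bit.64",
    "pac",
)

for path in buckets:
    print(path)
```

Declaring these files directly, instead of a `touch`-ed mock file, lets downstream rules depend on the index files themselves.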
16 changes: 12 additions & 4 deletions workflow/rules/preprocess/map.smk
@@ -3,24 +3,32 @@ rule preprocess__map__bwamem2__:
input:
forward_=READS / "{population}.{library}_1.fq.gz",
reverse_=READS / "{population}.{library}_2.fq.gz",
index=PRE_INDEX / f"{REFERENCE_NAME}",
buckets=multiext(
str(PRE_INDEX / f"{REFERENCE_NAME}."),
"0123",
"amb",
"ann",
"bwt.2bit.64",
"pac",
),
reference=REFERENCE / f"{REFERENCE_NAME}.fa.gz",
output:
cram=PRE_MAP / "{population}.{library}.cram",
params:
rg_tag=compose_rg_tag,
threads: 24
log:
PRE_MAP / "{population}.{library}.bwa_mem.log",
conda:
"__environment__.yml"
params:
rg_tag=compose_rg_tag,
prefix=PRE_INDEX / f"{REFERENCE_NAME}",
shell:
"""
( bwa-mem2 mem \
-M \
-R '{params.rg_tag}' \
-t {threads} \
{input.index} \
{params.prefix} \
{input.forward_} \
{input.reverse_} \
| samtools sort \
47 changes: 47 additions & 0 deletions workflow/scripts/plot_coverage.R
@@ -0,0 +1,47 @@
#!/usr/bin/env Rscript
library(tidyverse)
library(tidyquant)


frequencies <-
list.files(
path = "results/preprocess/coverage",
pattern = "\\.hist$",
full.names = TRUE
) %>%
map(
function(x) {
read_tsv(
file = x,
col_names = c("population", "coverage", "frequency"),
col_types = "cii"
)
}
) %>%
bind_rows()


coverages <- frequencies %>%
filter(coverage > 0) %>%
ggplot(aes(x = coverage, y = frequency)) +
geom_ma(ma_fun = SMA, linetype = "solid", color = "black") +
scale_y_log10() +
facet_wrap(~population)

ggsave(
filename = "results/preprocess/coverage/coverage.pdf",
plot = coverages,
width = 297, height = 210, units = "mm"
)



frequencies %>%
filter(coverage > 10) %>%
group_by(population) %>%
filter(frequency == max(frequency)) %>%
mutate(
max_coverage = 1.5 * coverage,
min_coverage = 0.5 * coverage
) %>%
write_tsv("results/preprocess/coverage/coverage.tsv")
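
The last step of the script picks, per population, the most frequent coverage above 10 and derives bounds at 0.5x and 1.5x of that value. Below is a minimal Python sketch of the same calculation for a single population. The histogram values are made up, `coverage_thresholds` is a hypothetical helper, and unlike the R code it keeps only the first row when several coverages tie for the maximum frequency.

```python
def coverage_thresholds(histogram, min_coverage=10):
    """Return the modal coverage above min_coverage and bounds around it.

    histogram is an iterable of (coverage, frequency) pairs.
    Returns None when no coverage exceeds min_coverage.
    """
    candidates = [(cov, freq) for cov, freq in histogram if cov > min_coverage]
    if not candidates:
        return None
    # max() keeps the first pair on ties; the R filter would keep all of them.
    mode_coverage, _ = max(candidates, key=lambda pair: pair[1])
    return {
        "coverage": mode_coverage,
        "min_coverage": 0.5 * mode_coverage,
        "max_coverage": 1.5 * mode_coverage,
    }


# Made-up histogram: low coverages dominate, but they are filtered out first.
histogram = [(1, 900), (5, 400), (20, 150), (30, 300), (40, 120)]
thresholds = coverage_thresholds(histogram)
print(thresholds)
```

Filtering out coverages of 10 or less first keeps the peak of poorly covered positions from being chosen as the mode.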