Refactor codebase #41

Merged: 30 commits, Aug 7, 2024
Changes shown below are from 27 of the 30 commits.

Commits
- a6cfe91 chore: refactor rule all, move some files to more appropriate locations (rroutsong, Jun 27, 2024)
- e89e0bd chore: refactor script locations, workflow rules, remove uropa rules … (rroutsong, Jul 5, 2024)
- 55ca719 fix: correct diffbindedger output paths (rroutsong, Jul 8, 2024)
- 21c4aea chore: add back bin scripts, correct pathing for bin scripts, formatting (rroutsong, Jul 8, 2024)
- 07eb85f fix: remove old imports, duplicate parameters (rroutsong, Jul 10, 2024)
- 83b0e5c fix: correct typo type to _type (rroutsong, Jul 10, 2024)
- 543121b fix: correct pathing on diffbind outputs (rroutsong, Jul 10, 2024)
- e5027d8 fix: missing imports in peakcall rules (rroutsong, Jul 11, 2024)
- 0e7ba2e fix: indent (rroutsong, Jul 11, 2024)
- 8f6cedd fix: fix single end functionality in bwa rule (rroutsong, Jul 11, 2024)
- c4e32cd chore: spacing, refactor manorm rule (rroutsong, Jul 11, 2024)
- 65e3e3c fix: reference global tmpdir and paired_end flag (rroutsong, Jul 11, 2024)
- f946886 fix: fix rep switch (rroutsong, Jul 11, 2024)
- af5e791 chore: more spacing, fix reps flag reversal (rroutsong, Jul 11, 2024)
- 52cd415 chore: make bin files executable, fix execution issues from test data… (rroutsong, Jul 15, 2024)
- b54ac88 fix: working out bugs discovered on AV (rroutsong, Jul 22, 2024)
- 2f6aff8 fix: testing corrections (rroutsong, Jul 29, 2024)
- 710cae0 fix: comment out manorm rules for now (rroutsong, Jul 31, 2024)
- 53a7949 fix: turn off sicer involved inputs in cfchip pipeline (rroutsong, Jul 31, 2024)
- 1779a91 fix: realign uropa, promotertable2, diffbind outputs inputs (rroutsong, Jul 31, 2024)
- 06fcb3c fix: realign promotertable cfchip inputs (rroutsong, Jul 31, 2024)
- 5b9ff2c Delete workflow/chrom-seek.code-workspace (skchronicles, Aug 2, 2024)
- 991c620 fix: minor review fixes, reverting some dev settings (rroutsong, Aug 2, 2024)
- 62aa68d Merge branch 'refactor_codebase' of http://github.com/OpenOmics/chrom… (rroutsong, Aug 2, 2024)
- a453a6b fix: comment out debug-dag from dryrun snakemake execution (rroutsong, Aug 2, 2024)
- 7cae04e fix: reconfigure a few outputs based on test runs (rroutsong, Aug 6, 2024)
- fd490e9 fix: refactor diffbind prep script, add error case exception for umap… (rroutsong, Aug 6, 2024)
- 73076f8 Update prep_diffbind.py (tovahmarkowitz, Aug 7, 2024)
- 3aef261 fix: add blocking/control functions to grouping header (rroutsong, Aug 7, 2024)
- e405e1d Merge branch 'refactor_codebase' of http://github.com/OpenOmics/chrom… (rroutsong, Aug 7, 2024)
2 files renamed without changes.
16 changes: 13 additions & 3 deletions, workflow/scripts/DiffBind_v2_cfChIP_QC.Rmd → bin/DiffBind_v2_cfChIP_QC.Rmd (mode 100644 → 100755)

````diff
@@ -146,7 +146,10 @@ try(dba.plotPCA(DBdataCounts),silent=TRUE)
 ```{r TMM}
 vec <- c("seqnames", "start", "end", "width", "strand", samples$samples$SampleID)
 consensus2 <- dba.peakset(DBdataCounts, bRetrieve=TRUE) %>% ##extracts TMM-normalized counts
-  as.data.frame() %>% setNames(vec) %>% arrange(start, end) %>% mutate(Peaks = paste0("Peak",1:nrow(.))) %>%
+  as.data.frame() %>%
+  setNames(vec) %>%
+  arrange(start, end) %>%
+  mutate(Peaks = paste0("Peak",1:nrow(.))) %>%
   dplyr::select(1:4, Peaks, samples$samples$SampleID)
 
 outfile1 <- paste0(contrasts, "-", peakcaller, "_DiffBindQC_TMMcounts.csv")
@@ -164,12 +167,19 @@ counts_TMM_ALL <- counts_TMM_ALL %>% dplyr::select(5:ncol(.)) %>%
   t() %>% log10() %>% as.data.frame(.)
 ##UMAP coordinates
 set.seed(123)
+
 if (nrow(samples$samples) < 16) {
-  umap_coord <- umap(counts_TMM_ALL, n_neighbors= nrow(samples$samples)-1)
+  neighbors=nrow(samples$samples)-1
+  if (neighbors > 1) {
+    umap_coord <- umap(counts_TMM_ALL, n_neighbors=neighbors)
+  } else {
+    umap_coord <- umap(counts_TMM_ALL, n_neighbors=2)
+  }
 } else {
   umap_coord <- umap(counts_TMM_ALL)
 }
-umap_coord <-as.data.frame(umap_coord$layout) %>% setNames(c("UMAP1", "UMAP2"))
+umap_coord <- as.data.frame(umap_coord$layout) %>%
+  setNames(c("UMAP1", "UMAP2"))
 
 outfile <- paste0(contrasts, "-", peakcaller, "_DiffBindQC_UMAP.csv")
 write.csv(umap_coord, outfile, row.names = F)
````
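The new branch above clamps UMAP's `n_neighbors` for small cohorts: with fewer than 16 samples it uses `n - 1`, and never less than 2. A minimal Python sketch of the same clamping rule (the helper name `choose_n_neighbors` and the default of 15 neighbors are illustrative assumptions, not part of the PR):

```python
def choose_n_neighbors(n_samples, default=15):
    """Clamp UMAP's n_neighbors for small cohorts.

    Mirrors the guard added to DiffBind_v2_cfChIP_QC.Rmd: for fewer than
    16 samples use n_samples - 1, but never drop below 2; otherwise keep
    the (assumed) library default.
    """
    if n_samples < 16:
        return max(n_samples - 1, 2)
    return default

# For example, a 3-sample cohort gets n_neighbors = 2.
```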
5 files renamed without changes.
8 changes: 4 additions & 4 deletions, workflow/scripts/filterMetrics → bin/filterMetrics.py

The changed lines are identical in content, so these are whitespace-only fixes; indentation below is reconstructed:

```diff
@@ -60,9 +60,9 @@ def getmetadata(type):
     elif type == 'tnreads':
         metadata = 'NReads'
     elif type == 'mnreads':
-       metadata = 'NMappedReads'
+        metadata = 'NMappedReads'
     elif type == 'unreads':
-       metadata = 'NUniqMappedReads'
+        metadata = 'NUniqMappedReads'
     elif type == 'fragLen':
         metadata = 'FragmentLength'
     return metadata
@@ -88,11 +88,11 @@ def filteredData(sample, ftype):
             extenders = []
             for ppqt_value in linelist:
                 if int(ppqt_value) > 150:
-                   extenders.append(ppqt_value)
+                    extenders.append(ppqt_value)
             if len(extenders) > 0:
                 print("{}\t{}\t{}".format(sample, mtypes, extenders[0]))
             else:
-               print("{}\t{}\t{}".format(sample, mtypes, linelist[0]))
+                print("{}\t{}\t{}".format(sample, mtypes, linelist[0]))
         elif ftype == 'ppqt' or ftype == 'ngsqc' or ftype == 'nrf':
             mtypes = getmetadata(ftype)
             for i in range(len(linelist)):
```
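The `fragLen` branch in `filterMetrics.py` keeps the first ppqt fragment-length estimate above 150 bp and falls back to the first estimate when none qualifies. A standalone sketch of that selection rule (the helper name is illustrative):

```python
def pick_fragment_length(estimates, min_len=150):
    """Pick a fragment length from a ppqt estimate list.

    Prefer the first estimate strictly greater than min_len; otherwise
    fall back to the first estimate, matching the filterMetrics logic.
    """
    extenders = [e for e in estimates if int(e) > min_len]
    return extenders[0] if extenders else estimates[0]
```

Estimates may arrive as strings from the ppqt output, which is why the comparison coerces with `int()` while returning the original value unchanged.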
9 files renamed without changes.
8 changes: 4 additions & 4 deletions, workflow/scripts/prep_diffbind.py → bin/prep_diffbind.py (mode 100644 → 100755)

```diff
@@ -23,20 +23,20 @@
 blocks = config['project']['blocks']
 
 if None in list(blocks.values()):
-    samplesheet = [",".join(["SampleID","Condition", "Replicate", "bamReads",
+    samplesheet = [",".join(["SampleID", "Condition", "Replicate", "bamReads",
                              "ControlID", "bamControl", "Peaks", "PeakCaller"])]
 else:
-    samplesheet = [",".join(["SampleID","Condition","Treatment","Replicate", "bamReads",
+    samplesheet = [",".join(["SampleID", "Condition", "Treatment", "Replicate", "bamReads",
                              "ControlID", "bamControl", "Peaks", "PeakCaller"])]
 
 
 for condition in args.group1, args.group2:
     for chip in groupdata[condition]:
         replicate = str([ i + 1 for i in range(len(groupdata[condition])) if groupdata[condition][i]== chip ][0])
-        bamReads = args.workpath + "/" + args.bamdir + "/" + chip + ".Q5DD.bam"
+        bamReads = args.bamdir + "/" + chip + ".Q5DD.bam"
         controlID = chip2input[chip]
         if controlID != "":
-            bamControl = args.workpath + "/" + args.bamdir + "/" + controlID + ".Q5DD.bam"
+            bamControl = args.bamdir + "/" + controlID + ".Q5DD.bam"
         else:
             bamControl = ""
         peaks = args.workpath + "/" + args.peaktool + "/" + chip + "/" + chip + args.peakextension
```
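The `replicate` expression in `prep_diffbind.py` is just the 1-based position of a sample within its group, returned as a string for the CSV. Pulled out as a sketch (the helper name is illustrative):

```python
def replicate_number(group_members, sample):
    """Return the 1-based position of sample within its group, as a string.

    Equivalent to the inline list comprehension in prep_diffbind.py.
    """
    return str([i + 1 for i, s in enumerate(group_members) if s == sample][0])
```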
62 changes: 62 additions & 0 deletions, bin/prep_diffbindQC.py (new file)

All lines are additions; indentation is reconstructed from the flattened diff:

```python
#!/usr/bin/env python3

import json
import argparse
import csv
from os.path import join


def main(args):
    with open(join(args.workpath, "config.json"), "r") as read_file:
        config = json.load(read_file)

    chip2input = config['project']['peaks']['inputs']
    groupdata = config['project']['groups']

    tmpIDs = [x for xs in groupdata.values() for x in xs]
    Ncounts = [tmpIDs.count(tmp) for tmp in set(tmpIDs)]

    with open(args.csvfile, 'w') as csvfile:
        columns = ["SampleID", "Condition", "Replicate", "bamReads",
                   "ControlID", "bamControl", "Peaks", "PeakCaller"]
        writer = csv.DictWriter(csvfile, fieldnames=columns)
        writer.writeheader()

        count = 1
        for chip in chip2input.keys():
            if set(Ncounts) == {1}:  # if all samples only in one group
                for key in groupdata.keys():
                    if chip in groupdata[key]:
                        condition = key
                        replicate = str([i + 1 for i in range(len(groupdata[condition])) if groupdata[condition][i] == chip][0])
            else:
                condition = ""
                replicate = str(count)
                count = count + 1
            bamReads = args.bamdir + "/" + chip + ".Q5DD.bam"
            controlID = chip2input[chip]
            if controlID != "":
                bamControl = args.bamdir + "/" + controlID + ".Q5DD.bam"
            else:
                bamControl = ""
            peaks = args.workpath + "/" + args.peaktool + "/" + chip + "/" + chip + args.peakextension
            row_values = [chip, condition, replicate, bamReads, controlID, bamControl, peaks, args.peakcaller]
            writer.writerow(dict(zip(columns, row_values)))


if __name__ == "__main__":
    parser = argparse.ArgumentParser(description='Script to prepare the DiffBind input csv')
    parser.add_argument('--wp', dest='workpath', required=True,
                        help='Full path of the working directory')
    parser.add_argument('--pt', dest='peaktool', required=True,
                        help='Name of the peak calling tool, also the directory where the peak file will be located')
    parser.add_argument('--pe', dest='peakextension', required=True,
                        help='The file extension of the peakcall output')
    parser.add_argument('--pc', dest='peakcaller', required=True,
                        help='Value for the PeakCaller column of the DiffBind csv')
    parser.add_argument('--bd', dest='bamdir', required=True,
                        help='Name of the directory where the bam files are located')
    parser.add_argument('--csv', dest='csvfile', required=True,
                        help='Name of the output csv file')

    main(parser.parse_args())
```
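The `set(Ncounts) == {1}` check above asks whether every sample ID occurs in exactly one group; only in that case does the script assign a real Condition and group-based Replicate, otherwise it falls back to a running counter. The same test as a standalone sketch (the helper name is illustrative):

```python
def samples_in_single_group(groupdata):
    """True if every sample ID occurs in exactly one group.

    groupdata maps group name -> list of sample IDs, as in config.json.
    Flatten all IDs, then require each unique ID to appear exactly once.
    """
    ids = [x for xs in groupdata.values() for x in xs]
    return all(ids.count(x) == 1 for x in set(ids))
```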
File renamed without changes.
4 changes: 2 additions & 2 deletions, config/containers.json

```diff
@@ -1,7 +1,7 @@
 {
     "images": {
-        "cfchip": "docker://skchronicles/cfchip_toolkit_v0.5.0",
-        "python": "docker://asyakhleborodova/chrom_seek_python_v0.1.0",
+        "cfchip": "docker://skchronicles/cfchip_toolkit:v0.5.0",
+        "python": "docker://asyakhleborodova/chrom_seek_python:v0.1.0",
         "ppqt": "docker://asyakhleborodova/ppqt:v0.2.0"
     }
 }
```
8 changes: 6 additions & 2 deletions, src/run.py

```diff
@@ -19,7 +19,7 @@
 from . import version as __version__
 
 
-def init(repo_path, output_path, links=[], required=['workflow', 'resources', 'config']):
+def init(repo_path, output_path, links=[], required=['workflow', 'bin', 'resources', 'config']):
     """Initialize the output directory. If user provides a output
     directory path that already exists on the filesystem as a file
     (small chance of happening but possible), a OSError is raised. If the
@@ -207,6 +207,7 @@ def setup(sub_args, ifiles, repo_path, output_path):
     # Add other runtime info for debugging
     config['project']['version'] = __version__
     config['project']['workpath'] = os.path.abspath(sub_args.output)
+    config['project']['binpath'] = os.path.abspath(os.path.join(config['project']['workpath'], 'bin'))
     git_hash = git_commit_hash(repo_path)
     config['project']['git_commit_hash'] = git_hash # Add latest git commit hash
     config['project']['pipeline_path'] = repo_path # Add path to installation
@@ -221,7 +222,8 @@ def setup(sub_args, ifiles, repo_path, output_path):
         v = str(v)
         config['options'][opt] = v
 
-
+    # initiate a few workflow vars
+    config['options']['peak_type_base'] = ["protTSS"]
     return config
 
 
@@ -608,6 +610,8 @@ def dryrun(outdir, config='config.json', snakefile=os.path.join('workflow', 'Sna
     dryrun_output = subprocess.check_output([
         'snakemake', '-npr',
         '-s', str(snakefile),
+        '--verbose',
+        # '--debug-dag',
         '--use-singularity',
         '--rerun-incomplete',
         '--cores', str(256),
```
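The `setup()` change records an absolute `binpath` alongside `workpath` in the run config, so rules can reference the relocated `bin/` scripts. A sketch of that bookkeeping step (the helper name `record_binpath` is illustrative; the config keys mirror the diff):

```python
import os


def record_binpath(config, output_dir):
    """Store absolute workpath and its bin/ subdirectory in the run config.

    Mirrors the two config['project'] assignments added in src/run.py.
    """
    config.setdefault('project', {})
    workpath = os.path.abspath(output_dir)
    config['project']['workpath'] = workpath
    config['project']['binpath'] = os.path.abspath(os.path.join(workpath, 'bin'))
    return config
```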
1 change: 0 additions & 1 deletion, src/run.sh

```diff
@@ -209,7 +209,6 @@ function submit(){
     if [[ ${6#\'} != /lscratch* ]]; then
         CLUSTER_OPTS="sbatch --cpus-per-task {cluster.threads} -p {cluster.partition} -t {cluster.time} --mem {cluster.mem} --job-name={params.rname} -e $SLURM_DIR/slurm-%j_{params.rname}.out -o $SLURM_DIR/slurm-%j_{params.rname}.out {cluster.ntasks} {cluster.ntasks_per_core} {cluster.exclusive}"
     fi
-    # Create sbacth script to build index
     cat << EOF > kickoff.sh
 #!/usr/bin/env bash
 #SBATCH --cpus-per-task=16
```