Replies: 6 comments
-
Hi, thank you for your interest in Bambu. We do want to look into whether we can adapt Bambu for single-cell data in the future; however, there are no imminent plans to release this yet.
-
@andredsim : thanks for the reply. I am working with @atrull314 on the development of an nf-core pipeline, and indeed we have discussed adding bambu to it. That being said, I think the above plan will be our first pass - what you suggested has intrigued me, and I can see it being a feasible step (with what we are currently working on serving as a way to compare results at a high level, and also providing the annotations you mentioned). Thus, I will be happy to look into this when we get to it, and if we find anything useful I will be sure to share it here (or if someone else jumps ahead and tests the suggestion, we will be sure to add that here as well).
-
Thanks, yes, please share any updates! It looks like Bambu for transcript discovery might be a great addition to the pipeline. I also joined the Slack channel on nf-core.
-
Hi @lianov , I am part of the bambu team and am planning to expand bambu to support single-cell data as well. We are interested in the nf-core pipeline you are developing.
-
@lingminhao : That would be great, and yes, we have a Slack channel. Looking forward to our discussions.
-
Hello, I wanted to follow up on our discussion with an update related to the approach suggested by @andredsim. What we have found so far is that bambu fails on specific subsetted BAMs - we are unsure if there is a workaround for this at the moment, and we hope the example provided below (along with the data) can aid this discussion. If there is any misinterpretation of the suggested approach on our end, we are happy to change it.

**Background**

For context, the data we use here is derived from a public dataset with both GridION and PromethION runs (ERR9958133 for GridION and ERR9958135 for PromethION in the script below). In particular, the GridION dataset uses the higher-quality chemistry we are targeting for the pipeline (Q20), so this is our main dataset. However, we also make use of the PromethION dataset as a stress test for high depth (we do expect that samples processed with this pipeline will have high depth, given the single-cell/nuclei context).

**Overview of steps in test**

After initial processing (code and data are below; a rough sketch of the per-cell subsetting is included right after this list), we follow these steps:

1. Run bambu on the full GridION and PromethION BAMs with `discovery = TRUE` and `quant = FALSE` to obtain a common set of extended annotations.
2. Write the extended annotations to a GTF with `writeToGTF()`.
3. Re-run `prepareAnnotations()` on the extended GTF.
4. Quantify the per-cell-barcode subsetted BAMs against the extended annotations with `discovery = FALSE` and `quant = TRUE`.
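For completeness, here is a minimal sketch of how per-cell BAMs like the ones used in step 4 could be generated, assuming the corrected cell barcode is stored in the `CB` tag and the input BAM is coordinate-sorted and indexed (this is an illustrative Rsamtools version only; the pipeline's actual preprocessing may differ):

```r
# Illustrative only: subset one sample BAM into per-barcode BAMs with Rsamtools,
# assuming corrected cell barcodes are stored in the "CB" tag.
library(Rsamtools)

subset_bam_by_barcode <- function(barcode, bam, out_dir) {
  # keep only alignments whose CB tag equals the requested barcode
  param <- ScanBamParam(tagFilter = list(CB = barcode))
  out_bam <- file.path(out_dir,
                       paste0(sub("\\.bam$", "", basename(bam)), "_", barcode, ".bam"))
  filterBam(bam, destination = out_bam, param = param)
  out_bam
}

# e.g. for a hypothetical barcode whitelist:
# barcodes <- readLines("barcode_whitelist.txt")
# lapply(barcodes, subset_bam_by_barcode,
#        bam = "input_data/ERR9958133.corrected.dedup.bam",
#        out_dir = "input_data/ERR9958133_subset_bam")
```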
Step number 4 is where we find issues. Initially, it seems to work, until we hit an issue with one specific BAM file (BAM name: `ERR9958133_GACTTCCTCTGTCGCT.bam`; see the NOTE at the end of the `Try1` block in the script below).
While processing all subsetted BAMs in a single bambu run is the preferred approach, I have gone ahead and also tested running bambu per subsetted BAM to see if we could spot other issues (in the code below, this is under `Try2`). In that test, a different BAM (`ERR9958133_AAGCGTTGTTTGATCG.bam`) also causes a fatal error (see the NOTE at the end of the script).
This is where we are - it is unclear to us whether we should be filtering out specific BAMs that do not meet one or more of bambu's assumptions (it is also unclear what we might be missing, and we would need to know more to make this generic enough for an nf-core pipeline analyzing different data types/sources). At a high level, we did look at the size of the BAMs in case they were too small for bambu (note these tests have only been done with the GridION sample so far). While these two specific BAMs are smaller than others, we do see other BAMs of similar size where we did not encounter a fatal error - just a data point. A rough sketch of the kind of pre-filter/skip logic we have in mind is included after the script below.

**Data and code**

I am providing a public Globus endpoint which should have all the data needed to reproduce the issue. If you run into any problems accessing it, please let us know. Depending on the scope of this issue, we may move forward without including bambu in an initial release, BUT we would love to include it as soon as we can work out a solution for single-cell/nuclei data and proceed with further tests (whether before the first release or in future releases). Thus, we are happy to work with you all to push this discussion forward.

Globus endpoint: https://app.globus.org/file-manager?origin_id=d1a6e641-7072-4477-8aa7-40fa4f0a5622&origin_path=%2F
The script is also present in the Globus endpoint.

```r
# Brief tests linked to the discussion with the bambu authors (https://github.com/GoekeLab/bambu/discussions/342)
######################
### LOAD LIBRARIES ###
######################
library(bambu)
################
### BAM PATH ###
################
# setting 2 diff. variables for quick tests
#bam_files_small == GridION
#bam_files_large == PromethION
bam_files_small <- "./input_data/ERR9958133.corrected.dedup.bam"
bam_files_large <- "./input_data/ERR9958135.corrected.dedup.bam"
gtf <- "./input_data/gencode.v31.annotation.gtf"
genome_file <- "./input_data/GRCh38.primary_assembly.genome.fa"
##########################
### PREPARE ANNOTATION ###
##########################
bambuAnnotations <- prepareAnnotations(gtf)
########################################
### PERFORM DISCOVERY ON ALL SAMPLES ###
########################################
# Here, we set quant = FALSE while discovery = TRUE to save a common set of annotations
se_all_no_quant <- bambu(reads = c(bam_files_small, bam_files_large),
annotations = bambuAnnotations,
genome = genome_file,
lowMemory = FALSE,
discovery = TRUE,
quant = FALSE,
verbose = TRUE)
se_all_no_quant
# save extended annotation for use on per-sample runs:
writeToGTF(se_all_no_quant, file = "extended_annotation_all_samples.gtf")
###########################
### Re-PREPARE ANNOTATION #
###########################
# from the GTF generated above, re-prepare the annotation object
new_gtf <- "./extended_annotation_all_samples.gtf"
bambuAnnotations <- prepareAnnotations(new_gtf)
######################################
### QUANTIFY per cell data GridION ###
######################################
# testing for GridION
# quantify per cell barcode:
bam_files_ERR9958133 <- list.files(path = "input_data/ERR9958133_subset_bam",
pattern = "\\.bam$",
full.names = TRUE)
bam_files_ERR9958133
#### Try1: all bams (per-cell within a sample) in a single bambu run ####
# would be the cleaner path
dir.create("rcOutDir_ERR9958133", recursive = TRUE)
se_ERR9958133_per_cell <- bambu(reads = bam_files_ERR9958133,
annotations = bambuAnnotations,
genome = genome_file,
ncore = 1,
lowMemory = FALSE,
discovery = FALSE, # we use previously discovered annotations
quant = TRUE,
rcOutDir = "rcOutDir_ERR9958133",
verbose = TRUE)
se_ERR9958133_per_cell
#NOTE: bam_files_ERR9958133[499] causes issues ("input_data/ERR9958133_subset_bam/ERR9958133_GACTTCCTCTGTCGCT.bam")
#### Try2: processing cell barcodes separately... ####
dir.create("per_cell_outs_ERR9958133/", recursive = TRUE)
mapply(FUN = function(x) {
se_per_cell <- bambu(reads = bam_files_ERR9958133[x],
annotations = bambuAnnotations,
genome = genome_file,
ncore = 1,
lowMemory = FALSE,
discovery = FALSE, # we use previously discovered annotations
quant = TRUE,
verbose = TRUE)
# save standard bambu outputs with basename of bam as prefix
writeBambuOutput(se_per_cell,
path = "per_cell_outs_ERR9958133/",
prefix = paste0(sub("\\.bam$", "", basename(bam_files_ERR9958133[x])),"_"))
}, x=1:length(bam_files_ERR9958133))
#NOTE: bam_files_ERR9958133[39] causes issues ("input_data/ERR9958133_subset_bam/ERR9958133_AAGCGTTGTTTGATCG.bam")
```
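To make the discussion concrete, this is roughly the kind of workaround we could imagine on the pipeline side while the underlying issue is investigated: skipping very small per-cell BAMs and isolating per-barcode failures so one bad BAM does not abort the whole batch. This is only a sketch (untested against the data above), and the `min_reads` cutoff is arbitrary:

```r
# Sketch only (untested on the data above): skip very small per-cell BAMs and
# wrap each bambu call in tryCatch so a single failing barcode does not stop the run.
library(Rsamtools)
library(bambu)

min_reads <- 50  # arbitrary illustrative cutoff

run_one_cell <- function(bam, annotations, genome, out_path) {
  # count alignment records in the BAM and skip barcodes with too few reads
  n_records <- countBam(bam)$records
  if (n_records < min_reads) {
    message("Skipping ", basename(bam), " (only ", n_records, " records)")
    return(NA)
  }
  tryCatch({
    se <- bambu(reads = bam,
                annotations = annotations,
                genome = genome,
                ncore = 1,
                discovery = FALSE,
                quant = TRUE,
                verbose = FALSE)
    writeBambuOutput(se, path = out_path,
                     prefix = paste0(sub("\\.bam$", "", basename(bam)), "_"))
    TRUE
  }, error = function(e) {
    message("bambu failed on ", basename(bam), ": ", conditionMessage(e))
    FALSE
  })
}

# status <- vapply(bam_files_ERR9958133, run_one_cell, logical(1),
#                  annotations = bambuAnnotations,
#                  genome = genome_file,
#                  out_path = "per_cell_outs_ERR9958133/")
```

If filtering really is the right call, knowing which bambu assumptions to check for would help us pick a better criterion than raw read count.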
-
Hi,
I just had a question on whether there are plans for this tool to support single-cell data. I know that there is a way to produce the transcript-to-read mappings and build the count matrix ourselves, but because that takes the quantification piece out, I was just curious if there were any plans to integrate this within bambu so there would be an option to produce single-cell barcode matrices?
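For reference, this is roughly what I mean by building the matrix ourselves - a minimal sketch that assumes we already have per-read transcript assignments (e.g. exported after a bambu run) and per-read cell barcodes; the object and column names here are hypothetical:

```r
# Illustrative sketch: build a transcript-by-barcode count matrix from
# per-read transcript assignments and per-read cell barcodes.
# Column names (read_id, transcript_id, barcode) are hypothetical.
library(Matrix)

build_cell_matrix <- function(read_to_tx, read_to_bc) {
  # read_to_tx: data.frame with columns read_id, transcript_id
  # read_to_bc: data.frame with columns read_id, barcode
  m  <- merge(read_to_tx, read_to_bc, by = "read_id")
  tx <- factor(m$transcript_id)
  bc <- factor(m$barcode)
  # one count per assigned read; duplicate (transcript, barcode) pairs are summed
  sparseMatrix(i = as.integer(tx),
               j = as.integer(bc),
               x = 1,
               dims = c(nlevels(tx), nlevels(bc)),
               dimnames = list(levels(tx), levels(bc)))
}
```

Of course this skips bambu's quantification step, which is exactly why a built-in option would be nice.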
Thanks for your help!