Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

test_ctDNA not working with hg19 NCBI BAM file #11

Open
zahrahaider opened this issue Jan 8, 2024 · 0 comments
Open

test_ctDNA not working with hg19 NCBI BAM file #11

zahrahaider opened this issue Jan 8, 2024 · 0 comments

Comments

@zahrahaider
Copy link

Hi,

Thanks for a great tool! I am trying to use test_ctDNA on bam files from targeted panel sequencing data and with BAM files aligned to RefSeq: NCBI GRCH37 Reference Sequence. Even after changing the seqname style of the BSgenome.Hsapiens.UCSC.hg19 from UCSC to NCBI, I kept getting the error: "Error: Chromosomes in bam file don't match the specified reference".
I saw that the culprit chromosomes were these unlocalized sequences GL00* when I ran the get_bam_chr() function. These sequences failed to map in BSgenome when I switched from UCSC to NCBI.
Is there a way to skip the chromosomes in the BAM file that do not match the reference? Could that be added as a parameter to the test_ctDNA function?

Below is the code and the sessionInfo():

#To make BSgenome.Hsapiens.UCSC.hg19 package compatible, I first changed the seqname style as follows:
library(BSgenome.Hsapiens.UCSC.hg19)

#change ref genome seq style from Ucsc (chr) to NCBI to match the bam files 

hg19 <- BSgenome.Hsapiens.UCSC.hg19
seqlevelsStyle(hg19) <- "NCBI"

#test ctDNA

test_pos_mrd<-test_ctDNA(bam="panel_data/bam_files/P05.preop.subset.bam",mutations = mutations_df[mutations_df$patient_id=="P05",c("CHROM","POS","REF","ALT")],targets = target,reference = hg19,informative_reads_threshold = 100)

Error: Chromosomes in bam file don't match the specified reference

#check the chromosomes in BAM file
> get_bam_chr("panel_data/bam_files/P05.preop.subset.bam")
 [1] "1"          "2"          "3"          "4"          "5"         
 [6] "6"          "7"          "8"          "9"          "10"        
[11] "11"         "12"         "13"         "14"         "15"        
[16] "16"         "17"         "18"         "19"         "20"        
[21] "21"         "22"         "X"          "Y"          "MT"        
[26] "GL000207.1" "GL000226.1" "GL000229.1" "GL000231.1" "GL000210.1"
[31] "GL000239.1" "GL000235.1" "GL000201.1" "GL000247.1" "GL000245.1"
[36] "GL000197.1" "GL000203.1" "GL000246.1" "GL000249.1" "GL000196.1"
[41] "GL000248.1" "GL000244.1" "GL000238.1" "GL000202.1" "GL000234.1"
[46] "GL000232.1" "GL000206.1" "GL000240.1" "GL000236.1" "GL000241.1"
[51] "GL000243.1" "GL000242.1" "GL000230.1" "GL000237.1" "GL000233.1"
[56] "GL000204.1" "GL000198.1" "GL000208.1" "GL000191.1" "GL000227.1"
[61] "GL000228.1" "GL000214.1" "GL000221.1" "GL000209.1" "GL000218.1"
[66] "GL000220.1" "GL000213.1" "GL000211.1" "GL000199.1" "GL000217.1"
[71] "GL000216.1" "GL000215.1" "GL000205.1" "GL000219.1" "GL000224.1"
[76] "GL000223.1" "GL000195.1" "GL000212.1" "GL000222.1" "GL000200.1"
[81] "GL000193.1" "GL000194.1" "GL000225.1" "GL000192.1"

##session_info

R version 4.3.2 (2023-10-31)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: macOS Big Sur ... 10.16

Matrix products: default
BLAS/LAPACK: /Users/zahra/.conda/envs/R/lib/libopenblasp-r0.3.23.dylib;  LAPACK version 3.11.0

locale:
[1] C/UTF-8/C/C/C/C

time zone: Europe/Stockholm
tzcode source: system (macOS)

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods  
[8] base     

other attached packages:
 [1] data.table_1.14.10                GenomicAlignments_1.38.0         
 [3] SummarizedExperiment_1.32.0       Biobase_2.60.0                   
 [5] MatrixGenerics_1.14.0             matrixStats_1.2.0                
 [7] Rsamtools_2.16.0                  BSgenome.Hsapiens.UCSC.hg19_1.4.3
 [9] BSgenome_1.70.1                   rtracklayer_1.62.0               
[11] BiocIO_1.12.0                     Biostrings_2.68.1                
[13] XVector_0.40.0                    GenomicRanges_1.52.1             
[15] GenomeInfoDb_1.36.4               IRanges_2.34.1                   
[17] S4Vectors_0.38.2                  BiocGenerics_0.46.0              
[19] plyr_1.8.9                        ggplot2_3.4.4                    
[21] dplyr_1.1.4                       tidyr_1.3.0                      
[23] purrr_1.0.2                       ctDNAtools_0.4.1                 

loaded via a namespace (and not attached):
 [1] utf8_1.2.4              generics_0.1.3          SparseArray_1.2.2      
 [4] bitops_1.0-7            lattice_0.22-5          magrittr_2.0.3         
 [7] grid_4.3.2              Matrix_1.6-4            restfulr_0.0.15        
[10] fansi_1.0.6             scales_1.3.0            XML_3.99-0.16          
[13] codetools_0.2-19        abind_1.4-5             cli_3.6.2              
[16] rlang_1.1.2             crayon_1.5.2            munsell_0.5.0          
[19] DelayedArray_0.28.0     withr_2.5.2             yaml_2.3.8             
[22] S4Arrays_1.2.0          tools_4.3.2             parallel_4.3.2         
[25] BiocParallel_1.34.2     colorspace_2.1-0        GenomeInfoDbData_1.2.10
[28] assertthat_0.2.1        vctrs_0.6.5             R6_2.5.1               
[31] lifecycle_1.0.4         zlibbioc_1.46.0         pkgconfig_2.0.3        
[34] pillar_1.9.0            gtable_0.3.4            glue_1.6.2             
[37] Rcpp_1.0.11             tibble_3.2.1            tidyselect_1.2.0       
[40] rjson_0.2.21            compiler_4.3.2          RCurl_1.98-1.13        


Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant