Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

get_coverage() gives total read count higher than total mapped reads from recount3 #7

Open
hanalysis opened this issue Nov 28, 2024 · 0 comments

Comments

@hanalysis
Copy link

Hi all,

Thank you for such a useful tool. I've been trialling megadepth to quantify repeat element reads in studies included in recount3, starting with RP110623 (GSE100576). But, I found that using get_coverage() for this study I get a total read count of greater than the "total mapped reads" in the recount3 metadata file for each sample.

Here's the code I've been using to get to total read count for the first sample:

library(tidyverse)
library(megadepth)
library(recount3)


bigwig_url <- locate_url(
  "SRP110623", 
  "data_sources/sra",
  "bw", # bigwig file
  "human",
  "SRR5762372", 
  recount3_url = "https://recount-opendata.s3.amazonaws.com/recount3/release"
)

# Bed file is here: "https://github.com/hanalysis/Misc/blob/main/alu_example.bed"
                          
coverage <- megadepth::get_coverage(bigwig_file = bigwig_url, 
                                          annotation = "alu_example.bed",
                                          op = "sum")

sum(coverage$score)



I had expected that this would give me fewer reads than the total mapped reads in the metadata from recount3 (from "recount_qc.star.all_mapped_reads").

My understanding was that multimapped reads are included in the recount3 bigwig files, but that Megadepth only counts “primary mappings” - i,e. each read would only be counted once, as long as our intervals are not within such a small distance so that a read could overlap two intervals (we merged all intervals within 100nt of each other so this seems unlikely). Any insight that could help us to understand what is happening here would be very much appreciated.

Thanks in advance for any help,

Han

R Session Information

setting value
version R version 4.2.1 (2022-06-23 ucrt)
os Windows 10 x64 (build 22631)
system x86_64, mingw32
ui RStudio
language (EN)
collate English_United Kingdom.utf8
ctype English_United Kingdom.utf8
tz Europe/London
date 2024-11-27
rstudio 2024.09.1+394 Cranberry Hibiscus (desktop)
pandoc NA

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant