rpkm/tpm rather than median/mean reads #10

Open
mhassan opened this issue Oct 19, 2018 · 7 comments

Comments

@mhassan

mhassan commented Oct 19, 2018

I am plotting two samples using the same settings as in example_run.sh, except for the GTF file (which is mine). However, while the y-axis scale on the first panel automatically readjusts to the median read count, panel 2 is stuck at one. Is there a way to make sure the scales on both panels readjust automatically?
Also, is it possible to visualize RPKM/TPM rather than median/mean reads?

@dgarrimar
Collaborator

Hi @mhassan, regarding your first question, could you provide a sample plot to better diagnose what's going on? In principle, both scales should be adjusted as expected. Regarding your second question, it is currently not possible. But how would you do that? Keep in mind that we are showing the signal for a particular region, while RPKM/TPM would correspond to the whole gene. Therefore it is the signal track that is more informative, right? @abreschi @emi80

@mhassan
Author

mhassan commented Oct 30, 2018 via email

@dgarrimar dgarrimar changed the title from "axis scales" to "rpkm/tpm rather than median/mean reads" Jul 8, 2020
@dgarrimar
Collaborator

dgarrimar commented Jul 9, 2020

Do you think something like this (pseudo RPKM) would be useful?

where query length is the length of the aligned query sequence.

@emi80
Member

emi80 commented Jul 16, 2020

Hi @dgarrimar, @mhassan
I think this could be done. The only problem I see is that getting the total number of reads from the BAM would slow down the process quite a bit with big files, and with many files it might end up being very slow.

We might look into multiprocessing or other ways to improve this if we really need it.
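
A minimal sketch of one way the read total might be obtained without scanning the whole file, assuming each BAM is coordinate-sorted and indexed: `samtools idxstats` reports per-reference counts taken from the index, so summing its third column gives a total of mapped alignment records. The helper name `mapped_reads_from_idxstats` and the example numbers are hypothetical and not part of this project. The total counts alignment records, so it may differ from a count of unique reads.

```python
def mapped_reads_from_idxstats(idxstats_text):
    """Sum the mapped column of `samtools idxstats` output.

    Each line is tab-separated:
    reference name, reference length, mapped count, unmapped count.
    """
    total = 0
    for line in idxstats_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        fields = line.split("\t")
        total += int(fields[2])  # third column: mapped count
    return total


# Made-up idxstats output for illustration only.
example = (
    "chr1\t248956422\t1500\t3\n"
    "chr2\t242193529\t500\t0\n"
    "*\t0\t0\t42\n"
)
total_mapped = mapped_reads_from_idxstats(example)
```

In practice the text would come from running `samtools idxstats` on each input file once and caching the result, rather than recomputing it for every plot.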

@stale

stale bot commented Jan 21, 2021

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale Issues with no recent activity label Jan 21, 2021
@dgarrimar dgarrimar added help wanted and removed stale Issues with no recent activity labels Jan 22, 2021
@ckovalak

ckovalak commented Sep 9, 2021

Hi @emi80
Just wanted to bump this because I think having normalized read counts would be extremely helpful when comparing across datasets of varying depth. To keep it simple, maybe just an option to report RPM (reads per million) rather than a raw read count? Total library size could potentially be included as an additional column in input_bams.tsv, since this value would not change; that would save the considerable time spent calculating this number for each plot.
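
As a rough illustration of the proposal, here is a sketch assuming a hypothetical input_bams.tsv layout in which the sample id is the first column and the library size has been appended as the last column. The function names `read_library_sizes` and `to_rpm`, the file paths, and the numbers are made up for the example and do not reflect the tool's actual behavior.

```python
import csv
import io


def read_library_sizes(tsv_text):
    """Map sample id (first column) to library size (last column).

    The extra library-size column is an assumption for this sketch.
    """
    sizes = {}
    for row in csv.reader(io.StringIO(tsv_text), delimiter="\t"):
        if not row:
            continue  # skip empty lines
        sizes[row[0]] = int(row[-1])
    return sizes


def to_rpm(raw_count, library_size):
    """Scale a raw read count to reads per million (RPM)."""
    return raw_count * 1_000_000 / library_size


# Hypothetical file content: id, BAM path, library size.
tsv_text = (
    "s1\t/path/a.bam\t20000000\n"
    "s2\t/path/b.bam\t40000000\n"
)
sizes = read_library_sizes(tsv_text)
rpm_s1 = to_rpm(50, sizes["s1"])
rpm_s2 = to_rpm(50, sizes["s2"])
```

With these made-up numbers, the same raw count of 50 reads yields a smaller RPM in the deeper library, which is the effect that makes panels comparable across sequencing depths.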

@bdgsilva

Hi @emi80 @dgarrimar,
I also wanted to further support the implementation of normalized read densities, as it would be extremely useful when comparing samples with different sequencing depths. @ckovalak's suggestion is a good option, and I believe it is how MISO implemented this same feature.
