feat: Specify distributions that will be used for modelling transcription, fragmentation and sampling of reads. #29

Open
magmir71 opened this issue Oct 27, 2023 · 0 comments


1. It is currently unclear how exactly the transcription process is modeled. It appears to use a single parameter (the average expression level) with a Poisson distribution. However, at least two parameters are needed to specify something like a negative binomial (NB) distribution, and zero inflation requires yet another parameter. Most popular scRNA-seq expression quantification tools assume an NB or zero-inflated NB distribution.
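
To make point 1 concrete, here is a minimal sketch of zero-inflated NB sampling as a gamma-Poisson mixture, using only the Python standard library. The function names and the parameters `mean`, `dispersion` and `zero_prob` are hypothetical and are not taken from this project's code; they only illustrate the three parameters mentioned above.

```python
import math
import random


def sample_poisson(lam, rng):
    """Draw from Poisson(lam) with Knuth's multiplication method."""
    if lam <= 0:
        return 0
    if lam > 500:
        # exp(-lam) underflows for very large lam; a Poisson variable is the
        # sum of two independent Poisson variables with half the rate each.
        return sample_poisson(lam / 2, rng) + sample_poisson(lam / 2, rng)
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def sample_zinb(mean, dispersion, zero_prob, rng):
    """Draw one transcript count from a zero-inflated negative binomial.

    mean       -- expected count of the NB component (hypothetical parameter)
    dispersion -- NB size parameter r; variance = mean + mean**2 / r
    zero_prob  -- extra probability of a structural zero
    """
    if rng.random() < zero_prob:
        return 0
    # NB as a gamma-Poisson mixture: rate ~ Gamma(shape=r, scale=mean / r).
    rate = rng.gammavariate(dispersion, mean / dispersion)
    return sample_poisson(rate, rng)


rng = random.Random(0)
counts = [sample_zinb(mean=5.0, dispersion=2.0, zero_prob=0.1, rng=rng) for _ in range(10)]
print(counts)
```

Setting `zero_prob=0` gives a plain NB, and letting `dispersion` grow large approaches the current one-parameter Poisson behaviour.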

2. There is also a parameter "total number of reads", which is used in the last step of the pipeline. I assume it uses a multinomial distribution whose vector of probabilities corresponds to the simulated transcript counts from the transcript generation step.
However, other multivariate distributions could reflect actual data much better. For example, the Dirichlet-multinomial distribution is often used to model overdispersed multinomial data.

Moreover, "total number of reads" is quite misleading, because it seems that the number should represent the number of reads after deduplication. I think it would be much more informative and useful if one could specify the number of PCR cycles for the two amplification steps, before and after fragmentation.
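
The Dirichlet-multinomial alternative from point 2 could look roughly like the sketch below. It assumes read sampling takes a list of simulated transcript counts and a total read number; the function name and the `concentration` parameter are hypothetical and not part of the current pipeline.

```python
import random


def sample_reads_dirichlet_multinomial(transcript_counts, total_reads, concentration, rng):
    """Allocate total_reads across transcripts with Dirichlet-multinomial noise.

    concentration -- hypothetical overdispersion knob: large values approach a
                     plain multinomial, small values give more variable reads.
    """
    total = sum(transcript_counts)
    if total <= 0:
        raise ValueError("at least one transcript count must be positive")
    # Normalised independent Gamma(alpha_i, 1) draws follow a Dirichlet(alpha).
    # random.choices accepts unnormalised weights, so no explicit division.
    weights = [
        rng.gammavariate(concentration * c / total, 1.0) if c > 0 else 0.0
        for c in transcript_counts
    ]
    reads = [0] * len(transcript_counts)
    for i in rng.choices(range(len(weights)), weights=weights, k=total_reads):
        reads[i] += 1
    return reads


rng = random.Random(0)
print(sample_reads_dirichlet_multinomial([10, 0, 30, 60], 1000, concentration=50.0, rng=rng))
```

Repeating the call with the same counts shows read proportions that vary more between runs than a plain multinomial would allow, which is the overdispersion this point is about.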

3. It seems that the fragmentation step produces just one fragmented cDNA from each original full-length cDNA. However, in real 10x Genomics data, there are multiple fragmented cDNAs from the same transcript, because fragmentation is performed after the first PCR amplification.
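
A rough sketch of the behaviour described in point 3: amplify first, then fragment every copy independently, so that one transcript yields several distinct fragments. Everything here is an illustrative assumption rather than the project's implementation: the function names, the per-cycle `efficiency` parameter, the uniform fragment-length range, and the choice to anchor each fragment at the 3' end.

```python
import random


def amplify(n_molecules, cycles, efficiency, rng):
    """Return the molecule count after PCR; each cycle duplicates every
    molecule with probability `efficiency` (hypothetical parameter)."""
    n = n_molecules
    for _ in range(cycles):
        n += sum(1 for _ in range(n) if rng.random() < efficiency)
    return n


def fragment_copies(transcript_length, n_copies, min_len, max_len, rng):
    """Fragment each amplified copy independently.

    Assumption: every fragment keeps the 3' end and has a uniformly drawn
    length, so copies of the same transcript differ in their start position.
    Returns a list of (start, end) coordinates, end exclusive.
    """
    fragments = []
    for _ in range(n_copies):
        frag_len = min(rng.randint(min_len, max_len), transcript_length)
        fragments.append((transcript_length - frag_len, transcript_length))
    return fragments


rng = random.Random(0)
copies = amplify(1, cycles=4, efficiency=0.9, rng=rng)
fragments = fragment_copies(2000, copies, min_len=300, max_len=600, rng=rng)
print(copies, "copies,", len(set(fragments)), "distinct fragments")
```

Exposing `cycles` for the pre-fragmentation and post-fragmentation amplification separately would also address the point above about replacing "total number of reads" with PCR cycle counts.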

@magmir71 magmir71 changed the title Specify distributions that will be used for modelling transcription, fragmentation and sampling of reads. feat: Specify distributions that will be used for modelling transcription, fragmentation and sampling of reads. Oct 27, 2023