feat: Specify distributions that will be used for modelling transcription, fragmentation and sampling of reads. #29

magmir71 · 2023-10-27T13:42:21Z

1. It is currently unclear how exactly the transcription process is modeled. It appears to use a Poisson distribution with a single parameter (the average expression level). However, at least two parameters are needed to specify something like a negative binomial (NB) distribution, and zero inflation requires yet another parameter. Most popular scRNA-seq expression quantification tools assume an NB or zero-inflated NB distribution.

2. There is also a parameter "total number of reads", which is used in the last step of the pipeline. I assume it uses a multinomial distribution whose probability vector corresponds to the simulated transcript counts from the transcript generation step.
However, other multivariate distributions could reflect actual data much better. For example, the Dirichlet-multinomial distribution is often used to model overdispersed multinomial counts.

Moreover, "total number of reads" is quite misleading, because it seems that the number should represent the number of reads after deduplication. I think it would be much more informative and useful if one could specify the number of PCR cycles for the two amplification steps: before fragmentation and after fragmentation.

3. It seems that the fragmentation step produces just one fragmented cDNA from each original full-length cDNA. However, in real 10x Genomics data there are multiple fragmented cDNAs from the same transcript, because fragmentation is performed after the first PCR amplification.
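
To make the three points above more concrete, here is a rough NumPy sketch of what the suggested distributions could look like. It is not taken from the current pipeline: the function names (`sample_zinb`, `sample_dirichlet_multinomial`, `amplify_and_fragment`) and all parameters (`dispersion`, `zero_prob`, `concentration`, `efficiency`, `mean_fragment_length`) are hypothetical, and the geometric fragment-length model is only a placeholder assumption.

```python
import numpy as np


def sample_zinb(mean, dispersion, zero_prob, size, rng):
    """Zero-inflated negative binomial transcript counts (point 1).

    mean: NB mean; dispersion: NB size parameter r, so that
    variance = mean + mean**2 / r; zero_prob: extra probability of a zero.
    """
    p = dispersion / (dispersion + mean)  # NumPy's (n, p) parameterisation
    counts = rng.negative_binomial(dispersion, p, size=size)
    dropout = rng.random(size) < zero_prob  # structural zeros
    return np.where(dropout, 0, counts)


def sample_dirichlet_multinomial(n_reads, weights, concentration, rng):
    """Overdispersed alternative to a plain multinomial (point 2).

    weights: e.g. simulated transcript counts; a smaller concentration
    gives more overdispersion. Zero-weight entries never receive reads.
    """
    weights = np.asarray(weights, dtype=float)
    positive = weights > 0
    probs = np.zeros_like(weights)
    alpha = concentration * weights[positive] / weights[positive].sum()
    probs[positive] = rng.dirichlet(alpha)
    return rng.multinomial(n_reads, probs)


def amplify_and_fragment(transcript_length, pcr_cycles, efficiency,
                         mean_fragment_length, rng):
    """PCR before fragmentation, then one fragment per copy (point 3).

    Assumption: each copy duplicates with probability `efficiency` per
    cycle, and fragment lengths are geometric, capped at transcript length.
    """
    copies = 1
    for _ in range(pcr_cycles):
        copies += int(rng.binomial(copies, efficiency))
    lengths = rng.geometric(1.0 / mean_fragment_length, size=copies)
    return np.minimum(lengths, transcript_length)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    transcripts = sample_zinb(mean=5.0, dispersion=2.0, zero_prob=0.3,
                              size=10, rng=rng)
    if transcripts.sum() > 0:
        reads = sample_dirichlet_multinomial(1000, transcripts, 50.0, rng)
        print(transcripts, reads)
    print(amplify_and_fragment(1500, 4, 0.8, 300, rng))
```

With something like this, the user-facing parameters would be the dispersion and zero-inflation probability for transcription, a concentration for read sampling, and the PCR cycle counts, rather than a single mean and a total read number.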

magmir71 changed the title ~~Specify distributions that will be used for modelling transcription, fragmentation and sampling of reads.~~ feat: Specify distributions that will be used for modelling transcription, fragmentation and sampling of reads. Oct 27, 2023
