Normalization/Log2 transformation requirements #28

mallorymaynes · 2021-10-05T13:35:51Z

Hello and thanks for developing this model. I read in the supplemental materials that the G x S matrix for RNAseq data should be filtered for low counts, normalized, and also log2 transformed before running the model. It also gives RPKM and TPM as suggestions for the normalization; however, I would like to use upper-quantile normalized counts generated by RUVg so I can use my spike-ins easily. Will this be a problem? So far I have filtered low count genes, extracted the normalized counts from RUVg, log2 transformed them, and rounded them so they are integers. I want to be sure I am understanding correctly and that my normalization procedure checks out (and also that I'm not over-normalizing).

Thanks!

davidsebfischer · 2021-10-06T08:58:45Z

Hi @mallorymaynes, ImpulseDE2 uses a negative binomial noise model, which comes with assumptions about the data distribution and is built for count (i.e. non-normalised, non-logged, integer) data. This type of statistical modelling still works if your data transform does not violate the count data structure too much; log-transforming, for example, will most likely cause major issues.
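To make the point about log-transforming concrete, here is a minimal sketch in plain Python (not part of ImpulseDE2; the replicate values are made up for illustration). It shows how taking log2 and then rounding to integers can collapse clearly different raw counts onto the same value, which removes the replicate-to-replicate variance that a negative binomial model relies on.

```python
import math
import statistics

# Hypothetical raw counts for one gene across three replicates.
raw_replicates = [850, 1000, 1300]

# The procedure described in the question: log2 transform, then round to integers.
transformed = [round(math.log2(c)) for c in raw_replicates]

print(transformed)                              # all three replicates become 10
print(statistics.pvariance(raw_replicates))     # clearly non-zero on the count scale
print(statistics.pvariance(transformed))        # zero after log2 + rounding
```

The rounded log2 values are integers, but they are no longer counts: their mean-variance relationship has nothing to do with the one the noise model assumes.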

Assuming that your transforms don't change the statistics too much, it may work, but it would be better to use count data and to supply size factors to scale the model. Filtering genes does not affect the model fits of the other genes if you define size factors.
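As an illustration of what "size factors" means here, the following is a rough sketch of the widely used median-of-ratios estimator, written in plain Python. It is an assumption that this matches what you would want to supply; the function name `size_factors` and the toy matrix are hypothetical, and in practice you would compute the factors in R and check the ImpulseDE2 documentation for how external size factors are passed in.

```python
import math
from statistics import median

def size_factors(counts):
    """Median-of-ratios size factors.

    counts: list of genes, each a list of raw counts (one per sample).
    Returns one scaling factor per sample.
    """
    n_samples = len(counts[0])
    # Only genes observed in every sample have a defined log geometric mean.
    usable = [g for g in counts if all(c > 0 for c in g)]
    log_geo_means = [sum(math.log(c) for c in g) / n_samples for g in usable]
    factors = []
    for j in range(n_samples):
        # Log ratio of each gene's count in sample j to that gene's geometric mean.
        log_ratios = [math.log(g[j]) - m for g, m in zip(usable, log_geo_means)]
        factors.append(math.exp(median(log_ratios)))
    return factors

# Toy matrix: sample 2 was sequenced about twice as deeply as sample 1.
toy_counts = [[10, 20], [100, 200], [5, 10], [0, 3]]
factors = size_factors(toy_counts)
print(factors)
```

The raw counts stay untouched; the factors only tell the model how to scale its expected mean per sample, which is why the count structure is preserved.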

mallorymaynes · 2021-10-06T13:38:57Z

Thank you, this is very helpful. It sounds like I should instead use my raw counts and include the estimated factors of unwanted variation generated by RUVg - is that what you mean by supplying factors to scale the model?

mallorymaynes · 2021-10-08T18:19:54Z

Hi David, I am still a little confused about how to input my RUVseq factors of unwanted variation into ImpulseDE2. Specifically, the output for RUVseq (called "W_1") is used as a covariate in DESeq2 or edgeR models, such that the full model for a time course in DESeq2 would be "~ W_1 + time + treatment + treatment:time," and the reduced would be: "~ W_1 + treatment + time." Given this, how do I correctly integrate W_1 into ImpulseDE2? Would this be considered vecConfounders, size factors, or something I can integrate in the dfAnnotation? Thanks for your help, it is much appreciated!

davidsebfischer · 2021-10-14T09:09:48Z

This would be an element of vecConfounders, which essentially builds a model that works like the "+" nomenclature in DESeq!
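For readers unfamiliar with the "+" nomenclature, here is a small sketch of what an additive term means in a generic log-link count model of the DESeq2 kind. This is not ImpulseDE2's internal code, and the coefficients and covariate values are invented for illustration: each "+" term adds to the log2 mean, so it multiplies the expected count.

```python
def expected_count(size_factor, intercept, terms):
    """Expected count under a log2-link model.

    terms: list of (coefficient, covariate_value) pairs, one per "+" term.
    """
    log2_mean = intercept + sum(coef * value for coef, value in terms)
    return size_factor * 2.0 ** log2_mean

# Hypothetical gene: baseline log2 mean of 5, a time effect of +1.
without_w = expected_count(1.0, 5.0, [(1.0, 1.0)])

# Same gene with an extra W_1 term: coefficient 0.5, W_1 value 2.0.
with_w = expected_count(1.0, 5.0, [(1.0, 1.0), (0.5, 2.0)])

print(without_w, with_w)  # the W_1 term scales the expected count by 2 ** (0.5 * 2.0)
```

Whether a continuous factor such as W_1 can be used directly as a confounder, or needs to be recoded first, is something to confirm against the ImpulseDE2 documentation.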

mallorymaynes mentioned this issue Oct 19, 2021

Error fitting Impulse model #29

Open
