Normalization/Log2 transformation requirements #28

Open
mallorymaynes opened this issue Oct 5, 2021 · 4 comments

@mallorymaynes

Hello, and thanks for developing this model. I read in the supplemental materials that the G x S (genes x samples) matrix for RNA-seq data should be filtered for low counts, normalized, and log2 transformed before running the model. The supplement suggests RPKM and TPM for the normalization; however, I would like to use upper-quartile normalized counts generated by RUVg so I can easily use my spike-ins. Will this be a problem? So far I have filtered out low-count genes, extracted the normalized counts from RUVg, log2 transformed them, and rounded them to integers. I want to be sure I am understanding correctly, that my normalization procedure checks out, and that I'm not over-normalizing.

Thanks!

@davidsebfischer
Contributor

Hi @mallorymaynes, ImpulseDE2 uses a negative binomial noise model, which comes with assumptions about the data distribution and is built for count data (i.e. non-normalised, non-logged, integer data). This type of statistical modelling still works if your data transform does not violate the count data structure too much; log-transforming, for example, will most likely cause major issues.
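The negative binomial model ties the variance to the mean, Var = mu + alpha * mu^2 with alpha >= 0, so it never expects the variance to fall below the mean. The plain-Python sketch below uses made-up replicate counts for a single gene and applies the log2-and-round step described in the question; the rounded values end up with a sample variance far below their mean, which such a model cannot describe. It is an illustration under those assumptions (four replicates, so the sample variance is itself noisy), not how ImpulseDE2 estimates dispersion.

```python
import math
import statistics

def nb_variance(mu, alpha):
    """Negative binomial variance in the mean/dispersion form: mu + alpha * mu**2."""
    return mu + alpha * mu ** 2

def dispersion_check(values):
    """Return (mean, sample variance, smallest NB variance for that mean).

    With alpha = 0 the NB variance reduces to the Poisson variance (= mean),
    which is the lowest variance any NB model allows.
    """
    mu = statistics.fmean(values)
    var = statistics.variance(values)
    return mu, var, nb_variance(mu, 0.0)

raw = [80, 100, 130, 160]                         # made-up replicate counts for one gene
logged = [round(math.log2(c + 1)) for c in raw]   # log2 transform, then round to integers

for label, vals in (("raw", raw), ("log2+round", logged)):
    mu, var, floor = dispersion_check(vals)
    print(f"{label}: mean={mu:.2f} var={var:.2f} NB floor={floor:.2f} var>=floor: {var >= floor}")
```

The raw counts are overdispersed, as a negative binomial model expects, while the transformed values collapse to neighbouring small integers whose spread is much smaller than their mean.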

Assuming that your transforms don't change the statistics too much, it may work, but it would be better to use the raw count data and to supply size factors to scale the model. Filtering genes does not affect the model fits of the other genes if you define size factors.
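The suggestion to keep raw counts and supply size factors can be sketched in plain Python; ImpulseDE2 itself is an R package, so this only shows the arithmetic. The toy counts and the helper `upper_quartile_size_factors` are hypothetical, and its upper-quartile scheme is a simplified stand-in chosen because the question mentions upper-quartile normalization, not RUVSeq's exact code. The counts stay as integers and the factors scale the model's mean instead. Because the factors are computed once and then held fixed, each gene's estimate in this sketch depends only on its own row, so filtering other genes leaves it unchanged. In the R package, externally computed size factors are, as far as we know, passed through the `vecSizeFactorsExternal` argument of `runImpulseDE2`; check the package documentation before relying on that.

```python
import statistics

# Toy raw counts (genes x samples); the numbers are made up for illustration.
RAW = {
    "geneA": [80, 100, 130, 160],
    "geneB": [500, 620, 410, 700],
    "geneC": [3, 0, 1, 2],          # low-count gene
    "geneD": [1200, 900, 1500, 1100],
}

def upper_quartile_size_factors(mat):
    """Hypothetical helper: each sample's 75th percentile of raw counts,
    rescaled so that the factors have geometric mean 1."""
    n_samples = len(next(iter(mat.values())))
    uq = []
    for s in range(n_samples):
        column = [row[s] for row in mat.values()]
        uq.append(statistics.quantiles(column, n=4, method="inclusive")[2])
    gmean = statistics.geometric_mean(uq)
    return [q / gmean for q in uq]

def scaled_gene_mean(row, size_factors):
    """Per-gene mean on the normalized scale, mean over samples of y_gs / s_s.
    A count model would then use mu_gs = s_s * this value for sample s."""
    return statistics.fmean([y / s for y, s in zip(row, size_factors)])

# Size factors computed once on the raw counts and then held fixed.
sf = upper_quartile_size_factors(RAW)
filtered = {g: row for g, row in RAW.items() if sum(row) >= 20}

fit_before = scaled_gene_mean(RAW["geneA"], sf)
fit_after = scaled_gene_mean(filtered["geneA"], sf)   # same factors supplied
print(fit_before == fit_after)                        # True: filtering did not change geneA

# Recomputing the factors after filtering would shift every gene's estimate.
sf_refit = upper_quartile_size_factors(filtered)
print(scaled_gene_mean(filtered["geneA"], sf_refit), fit_before)
```

The second print shows why fixing the factors matters: if they were re-derived from the filtered matrix, the estimate for geneA would move even though its own counts did not change.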

@mallorymaynes
Author

Thank you, this is very helpful. It sounds like I should instead use my raw counts and include the estimated factors of unwanted variation generated by RUVg. Is that what you mean by supplying factors to scale the model?

@mallorymaynes
Author

Hi David, I am still a little confused about how to input my RUVSeq factors of unwanted variation into ImpulseDE2. Specifically, the RUVSeq output (called "W_1") is used as a covariate in DESeq2 or edgeR models, so the full model for a time course in DESeq2 would be "~ W_1 + time + treatment + treatment:time" and the reduced model would be "~ W_1 + treatment + time". Given this, how do I correctly integrate W_1 into ImpulseDE2? Would it be considered part of vecConfounders, a size factor, or something I can integrate into dfAnnotation? Thanks for your help, it is much appreciated!
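To make the two DESeq2 formulas concrete, here is a plain-Python sketch that builds treatment-coded design matrices for them, loosely following R's default dummy coding (the first level of each factor is the reference). The sample layout (three time points, two treatments) and the W_1 values are made up. W_1 enters both models as an ordinary numeric column, so it is adjusted for but not tested; a likelihood-ratio test between the two models compares them only on the interaction columns.

```python
def design_matrix(samples, include_interaction):
    """Build a design matrix for ~ W_1 + time + treatment (+ treatment:time).

    samples: list of dicts with keys 'W_1' (float), 'time' (str), 'treatment' (str).
    Returns (column names, list of rows).
    """
    times = sorted({s["time"] for s in samples})
    treatments = sorted({s["treatment"] for s in samples})
    t_levels, tr_levels = times[1:], treatments[1:]   # drop the reference levels

    columns = ["Intercept", "W_1"]
    columns += [f"time{t}" for t in t_levels]
    columns += [f"treatment{tr}" for tr in tr_levels]
    if include_interaction:
        columns += [f"treatment{tr}:time{t}" for tr in tr_levels for t in t_levels]

    rows = []
    for s in samples:
        row = [1.0, s["W_1"]]                                     # intercept + numeric covariate
        row += [1.0 if s["time"] == t else 0.0 for t in t_levels]
        row += [1.0 if s["treatment"] == tr else 0.0 for tr in tr_levels]
        if include_interaction:
            row += [1.0 if (s["treatment"] == tr and s["time"] == t) else 0.0
                    for tr in tr_levels for t in t_levels]
        rows.append(row)
    return columns, rows

# Hypothetical samples with made-up W_1 values.
samples = [
    {"time": "t0", "treatment": "ctrl", "W_1": 0.12},
    {"time": "t1", "treatment": "ctrl", "W_1": -0.05},
    {"time": "t2", "treatment": "ctrl", "W_1": 0.30},
    {"time": "t0", "treatment": "trt", "W_1": -0.20},
    {"time": "t1", "treatment": "trt", "W_1": 0.08},
    {"time": "t2", "treatment": "trt", "W_1": -0.25},
]
full_cols, full_X = design_matrix(samples, include_interaction=True)
reduced_cols, reduced_X = design_matrix(samples, include_interaction=False)
print(full_cols)
print([c for c in full_cols if c not in reduced_cols])   # columns the LRT tests
```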

@davidsebfischer
Contributor

This would be an element of vecConfounders, which essentially builds a model that works like the "+" notation in DESeq design formulas!
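In a log-link count model, a "+" term adds to log(mu), which means it multiplies the mean on the count scale. The sketch below illustrates that for one gene in one sample with an RUV-style covariate. The coefficient, the W_1 value, the size factor and the time-trend value are all hypothetical, and `nb_mean` is a generic log-link mean, not ImpulseDE2's internal code.

```python
import math

def nb_mean(size_factor, time_trend, confounder_effects):
    """Mean of a log-link count model for one gene in one sample.

    On the log scale the terms add up, like '+' in a design formula:
        log(mu) = log(size_factor) + log(time_trend) + sum(confounder_effects)
    so on the count scale each confounder term acts as a multiplicative factor.
    """
    log_mu = math.log(size_factor) + math.log(time_trend) + sum(confounder_effects)
    return math.exp(log_mu)

# Hypothetical numbers: the time trend predicts 200 reads at this time point,
# the sample has size factor 1.1, and W_1 = 0.3 with coefficient beta_w = 0.5.
beta_w, w_1 = 0.5, 0.3
without = nb_mean(1.1, 200.0, [])
with_w = nb_mean(1.1, 200.0, [beta_w * w_1])
print(round(without, 2), round(with_w, 2), round(with_w / without, 4))
```

The ratio of the two means equals exp(beta_w * W_1) regardless of the size factor or time trend, which is what "adjusting for" a covariate with "+" amounts to under a log link.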
