scran normalize #101

Open
jayypaul opened this issue Jun 14, 2022 · 1 comment

@jayypaul

Hello,

I have a heterogeneous dataset consisting of stroma and immune cells. For now, I'm interested in the stroma cells, and I was wondering whether running scran again after subsetting to the compartment of interest will give more accurate size factor estimates, since heterogeneous data can produce negative estimates in some cases (which I saw, but was able to address).

After subsetting, I have this many cells per sample:

[Image: number of cells per sample after subsetting]

I imagine these small numbers could be a problem as well. Before subsetting, I had this many cells per sample:
[Image: number of cells per sample before subsetting]

I've also read that a low number of cells per sample can be problematic for scran normalization, so I'd like the authors' opinion on the better route forward: normalize before subsetting and then subset, or re-run normalization after subsetting?

Thanks!

@LTLA
Collaborator

LTLA commented Jun 15, 2022

If you're analyzing each sample separately, then yes, the small number of cells in some of the samples will make normalization difficult. More specifically, it will introduce some instability into the estimates; the question is whether this instability is offset by the (assumed) improvement in accuracy once heterogeneity is out of the picture.

Having said that, if you've already subsetted it down to stroma cells and the subpopulations within the stroma subset are reasonably similar, you could just use library size normalization (e.g., scuttle::librarySizeFactors). The expectation would be that there aren't strong composition biases that would motivate the use of scran's pooling normalization in the first place.
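A minimal pure-Python sketch of what library size normalization computes, assuming a dense list-of-cells layout of counts (an illustrative stand-in, not a SingleCellExperiment). Each cell's size factor is its total count divided by the mean total count, so the factors average to 1, which is our understanding of librarySizeFactors' default centering. The function names are hypothetical, not scuttle's API.

```python
def library_size_factors(counts):
    """Size factor per cell = library size / mean library size.

    `counts` is a list of cells, each a list of per-gene counts
    (a hypothetical dense layout used only for illustration).
    """
    lib_sizes = [sum(cell) for cell in counts]
    mean_lib = sum(lib_sizes) / len(lib_sizes)
    # Centering on the mean library size makes the factors average to 1.
    return [size / mean_lib for size in lib_sizes]


def normalize(counts, size_factors):
    """Divide each cell's counts by that cell's size factor."""
    return [[c / sf for c in cell] for cell, sf in zip(counts, size_factors)]


counts = [
    [1, 1, 2],  # library size 4
    [3, 3, 6],  # library size 12: same profile, sequenced three times deeper
]
sf = library_size_factors(counts)
print(sf)                     # [0.5, 1.5]
print(normalize(counts, sf))  # [[2.0, 2.0, 4.0], [2.0, 2.0, 4.0]]
```

Because every gene in a cell is scaled by the same total, a handful of strongly upregulated genes inflates the library size and hence the factor; that is exactly the composition bias that only matters if the stroma subpopulations differ substantially.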

Alternatively, if you're analyzing all samples together and the batch effects are modest, you could run pooledSizeFactors on the set of all stroma cells. Any composition biases introduced by minor differential expression (DE) between batches would then be handled by the pooling normalization, while you would still have enough cells to get stable estimates.
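A heavily simplified, pure-Python sketch of the pooling (deconvolution) idea behind pooledSizeFactors, written only to show why it resists composition bias. It is not scran's implementation: scran uses a particular ring arrangement, much larger pools by default, optional clustering and sanity checks, all omitted here. Every name below, the pool sizes (2, 3) and the down-weighted single-cell equations are hypothetical choices for a toy example. The steps: normalize each cell by library size, compare each pool's summed profile to an average pseudo-cell by a median of ratios, treat that median as the sum of the pool members' unknown factors, solve the resulting linear system by least squares, and multiply back by library size.

```python
from statistics import median


def solve_least_squares(rows, rhs):
    """Solve min ||rows @ x - rhs|| via the normal equations and
    Gauss-Jordan elimination with partial pivoting (fine for toy sizes)."""
    n = len(rows[0])
    aug = []
    for i in range(n):
        ata_row = [sum(r[i] * r[j] for r in rows) for j in range(n)]
        atb = sum(r[i] * b for r, b in zip(rows, rhs))
        aug.append(ata_row + [atb])
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * c for a, c in zip(aug[r], aug[col])]
    return [aug[i][n] for i in range(n)]


def pooled_size_factors_sketch(counts, pool_sizes=(2, 3), cell_weight=1e-3):
    """Hypothetical, simplified deconvolution. `counts` is a list of cells,
    each a list of per-gene counts; pool sizes must not exceed the cell count."""
    n_cells, n_genes = len(counts), len(counts[0])
    lib = [sum(cell) for cell in counts]
    # Library-size-normalized expression profile of each cell.
    z = [[c / t for c in cell] for cell, t in zip(counts, lib)]
    # Average pseudo-cell used as the reference profile.
    ref = [sum(z[j][g] for j in range(n_cells)) / n_cells for g in range(n_genes)]
    usable = [g for g in range(n_genes) if ref[g] > 0]
    # Cells ordered by library size and treated as a ring (simplified).
    order = sorted(range(n_cells), key=lambda j: lib[j])

    def pool_ratio(members):
        # Median ratio of pooled profile to reference: robust to a few DE genes.
        pooled = [sum(z[j][g] for j in members) for g in usable]
        return median(p / ref[g] for p, g in zip(pooled, usable))

    rows, rhs = [], []
    for size in pool_sizes:
        for start in range(n_cells):
            members = [order[(start + k) % n_cells] for k in range(size)]
            row = [0.0] * n_cells
            for j in members:
                row[j] = 1.0  # this pool's ratio = sum of its members' unknowns
            rows.append(row)
            rhs.append(pool_ratio(members))
    # Down-weighted single-cell equations keep the system full rank.
    for j in range(n_cells):
        row = [0.0] * n_cells
        row[j] = cell_weight
        rows.append(row)
        rhs.append(cell_weight * pool_ratio([j]))
    x = solve_least_squares(rows, rhs)  # x[j] estimates theta_j / lib[j]
    theta = [xj * t for xj, t in zip(x, lib)]
    mean_theta = sum(theta) / n_cells
    return [th / mean_theta for th in theta]  # centered to mean 1


# Toy data: 4 "A" cells and 4 "B" cells with the same true scaling;
# B differs only by two strongly upregulated genes (a composition bias).
cells = [[10] * 10 for _ in range(4)] + [[10] * 8 + [110, 110] for _ in range(4)]
lib = [sum(c) for c in cells]
print([t / (sum(lib) / len(lib)) for t in lib])  # [0.5, 0.5, 0.5, 0.5, 1.5, 1.5, 1.5, 1.5]
print([round(s, 6) for s in pooled_size_factors_sketch(cells)])  # [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]
```

In the toy data, library size factors differ three-fold between the groups purely because of the two DE genes, while the pooled estimates are equal. Pooling also matters for sparse real data, where single-cell median ratios are often zero; summing over many cells is what gives stable estimates, which is why enough cells per run helps.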
