Question: Understanding discrepant results when using different ranking approaches and algorithmic parameters #128

atakanekiz opened this issue Feb 11, 2023 · 2 comments

atakanekiz commented Feb 11, 2023

Hello,

Thank you so very much for developing and continuing to support this fantastic package. This issue is not so much a problem with the package; rather, it is a question about understanding seemingly striking differences when analyzing scRNA-seq data with fgsea. Let me set the stage by summarizing the analyses I have done on the same dataset (three different ranking approaches and varying algorithm parameters).

1a - Using normalized data (SCT), I manually ranked genes with the signal-to-noise metric originally introduced for Broad GSEA: ranking <- (sample_mean - reference_mean) / (sample_sd + reference_sd). Here, sample and reference cells are defined by which sample they belong to in the integrated Seurat object (there are multiple patients, some of whom are responders (R) and some non-responders (NR)). This gives a genome-wide vector for ranking genes (a sketch of this and the DE-based rankings follows this list). I realize this ranking approach may have shortcomings, mostly due to the sparse and non-normal gene expression in scRNA-seq data; nevertheless, it has been robust for me across multiple datasets. For running fgsea, the nPermSimple = 1000 and scoreType = "std" (default) parameters are used.

1b - Ranking done the same as above. nPermSimple = 1000 and scoreType = "neg" parameters are used.

1c - Ranking done the same as above. nPermSimple = 1000 and scoreType = "pos" parameters are used.

1d - Ranking done the same as above. nPermSimple = 1000 and scoreType = "neg" parameters are used.

2 - As per #50, I performed a differential expression analysis between R and NR cells in Seurat using the default Wilcoxon test. I exported the results for every gene (not just the significant ones) and ranked them based on avg_log2FC values. nPermSimple = 1000 and scoreType = "std" (default) parameters are used.

3 - Similar to the second approach, DE analysis was performed in Seurat, and signed p-values were used for ranking, as per the discussions in #50: -sign(markers$avg_log2FC) * log10(markers$p_val_adj). nPermSimple = 1000 and scoreType = "std" (default) parameters are used.
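
A minimal sketch (not from the thread) of how the rankings above could be computed. All object and column names are assumptions: a Seurat object `seu` with SCT-normalized data and a metadata column `response` containing "R" / "NR".

```r
library(Seurat)

## Analysis 1 (a-d): signal-to-noise ranking on SCT-normalized data
expr  <- as.matrix(GetAssayData(seu, assay = "SCT", slot = "data"))
is_r  <- seu$response == "R"
is_nr <- seu$response == "NR"

snr <- (rowMeans(expr[, is_r, drop = FALSE]) - rowMeans(expr[, is_nr, drop = FALSE])) /
  (apply(expr[, is_r, drop = FALSE], 1, sd) + apply(expr[, is_nr, drop = FALSE], 1, sd))
ranks_snr <- sort(snr[is.finite(snr)], decreasing = TRUE)

## Analyses 2 and 3: Seurat DE-based rankings, keeping all genes
Idents(seu) <- "response"
markers <- FindMarkers(seu, ident.1 = "R", ident.2 = "NR",
                       logfc.threshold = 0, min.pct = 0)

# Analysis 2: rank by average log2 fold change
ranks_lfc <- sort(setNames(markers$avg_log2FC, rownames(markers)), decreasing = TRUE)

# Analysis 3: rank by signed p-value; a tiny offset guards against log10(0)
ranks_signed_p <- sort(setNames(
  -sign(markers$avg_log2FC) * log10(markers$p_val_adj + .Machine$double.xmin),
  rownames(markers)), decreasing = TRUE)
```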

I'm showing results for each of these analyses side-by-side below:

[screenshot: side-by-side fgsea enrichment plots for each of the analyses above]

Now, to the question(s)... I have read the related issues (#124, #83, #50, #55, #103) and I think my data suffers from imbalance, which explains the different p and NES values. It is still surprising to me that a gene set of about 250 genes is that susceptible to variation. Going back to the issues referenced above, it seems one can increase nPermSimple to obtain a small p-value rather than an NA value. The NA values make me think that the enrichment may not be that strong to start with, but with a higher nPermSimple it now becomes highly significant. Also, we wouldn't want to select a one-tailed p-value calculation because we have to approach the hypothesis in an unbiased manner (i.e., the enrichment may very well turn out the other way around than expected in a given experiment). Am I thinking about all of this wrong?
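
For reference, a hedged sketch of rerunning with a larger nPermSimple (assuming a hypothetical `pathways` list and the `ranks_snr` vector from the sketch above); increasing nPermSimple only refines the p-value estimate for very small p-values, it does not change the underlying enrichment score.

```r
library(fgsea)

# A larger nPermSimple lets fgseaMultilevel estimate very small p-values
# instead of returning NA for them.
res <- fgseaMultilevel(pathways = pathways, stats = ranks_snr,
                       nPermSimple = 100000, scoreType = "std")
```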

The second and more puzzling issue is that the analyses involving Seurat DE-ranked genes tell me different stories altogether. They do not look similar to the signal-to-noise-ranked results above and, strangely, they look quite different from each other too. Also, the green running-statistic curve is much smoother in these analyses compared to the jagged jumps and falls in the first panel. I can't quite make sense of these differences. Small variations may be expected, but this much of a swing leaves me wondering which approach is the right one. I would appreciate it if you could shed some light on this.

assaron commented Feb 13, 2023

> The NA values make me think that the enrichment may not be that strong to start with, but with a higher nPermSimple it now becomes highly significant.

No, NA values mean that we can't estimate them with any decent accuracy. The enrichment could be either significant or not significant.

> Also, we wouldn't want to select a one-tailed p-value calculation because we have to approach the hypothesis in an unbiased manner (i.e., the enrichment may very well turn out the other way around than expected in a given experiment).

I don't fully agree with you here. You need to account for what kind of data you have, apply appropriate tests depending on that, and interpret them accordingly. If your test statistic is highly skewed towards one side (e.g. positive), the interpretation of the gene set enrichments should definitely be different for up- and down-regulated gene sets. That said, you can still test for positive and negative enrichment separately by first running a "pos" test and then a "neg" test.
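
A minimal sketch of that two-sided workflow, again assuming the hypothetical `pathways` and `ranks_snr` objects from above:

```r
library(fgsea)

# Test positive and negative enrichment separately
res_pos <- fgsea(pathways = pathways, stats = ranks_snr, scoreType = "pos")
res_neg <- fgsea(pathways = pathways, stats = ranks_snr, scoreType = "neg")

# Two one-sided tests are run per gene set, so the p-values should be
# corrected accordingly, e.g. 2 * pmin(res_pos$pval, res_neg$pval)
```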

> The second and more puzzling issue is that the analyses involving Seurat DE-ranked genes tell me different stories altogether.

I don't think that the stories are different: different would be if one of them showed a highly significant enrichment and the other didn't.

The Seurat curves have an "N" shape because in the middle your genes have almost zero weights (the estimated logFCs are zeroes and, in particular, the log(padj) values are zeroes), while there are some significantly changing genes. The SNR statistic in your case is apparently more uniform.
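
One quick way to see this (a sketch, using the hypothetical `ranks_signed_p` vector from above) is to check how much of the DE-based ranking collapses to zero:

```r
# A large flat middle in the ranking produces the smooth "N"-shaped
# running-statistic curve.
mean(abs(ranks_signed_p) < 1e-6)   # fraction of genes with essentially zero weight
hist(ranks_signed_p, breaks = 100)
```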

Finally, as you have multiple patients, I would recommend creating pseudo-bulk data and doing the differential gene expression and pathway analysis there. It is a much more robust approach: https://www.nature.com/articles/s41467-021-25960-2
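
A minimal pseudo-bulk sketch (one possible implementation, not prescribed here), assuming hypothetical `patient` and `response` metadata columns, raw counts in the "RNA" assay, and DESeq2 for the patient-level test:

```r
library(Seurat)
library(DESeq2)

# Sum raw counts per patient to get a genes-by-patients pseudo-bulk matrix
counts <- as.matrix(GetAssayData(seu, assay = "RNA", slot = "counts"))
pb <- sapply(split(colnames(seu), seu$patient),
             function(cells) rowSums(counts[, cells, drop = FALSE]))

coldata <- data.frame(
  row.names = colnames(pb),
  response  = seu$response[match(colnames(pb), seu$patient)])

dds <- DESeq(DESeqDataSetFromMatrix(pb, coldata, design = ~ response))
res <- results(dds, contrast = c("response", "R", "NR"))

# The Wald statistic gives a genome-wide ranking usable with fgsea
ranks_pb <- sort(setNames(res$stat, rownames(res)), decreasing = TRUE)
```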

atakanekiz (Author) commented

I see, thanks. One follow-up question, if I may. I agree that analyses 2 and 3 are not complete opposites, but their p-values are quite different. Many studies using GSEA do not treat p < 0.05 as a strict significance threshold, and it is not uncommon to see enrichments reported at p = 0.1 or so. The logFC-ranked analysis (Analysis 2) has a p-value of 0.12, while the signed-p-value-ranked analysis (Analysis 3) has a p-value of 0.87. Is this difference meaningful, or should we disregard the enrichment here altogether, saying both are non-significant?
