Update description for z-score plots #157

Open
skanwal opened this issue Jul 22, 2024 · 2 comments

skanwal commented Jul 22, 2024

Expand the legend to clarify that the plots use median values, since box plots describe medians.
This is in contrast to the tables, which use the mean.

Mean will be different from median for genes that have low expression across the cohort.
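
To illustrate the point, here is a minimal sketch in Python (not RNAsum's R code; the expression values and the helper name `mean_median_gap` are made up for illustration). It shows how a gene that is at zero in most samples gets a mean that drifts away from its median, while a well-expressed gene with a roughly symmetric spread does not:

```python
from statistics import mean, median

# Hypothetical expression values for one low-expressed gene across a cohort:
# most samples are at zero, a few show some expression.
low_expr = [0, 0, 0, 0, 0, 0, 0, 1, 2, 12]

# Hypothetical values for a well-expressed gene with a roughly symmetric spread.
high_expr = [95, 98, 99, 100, 100, 101, 102, 105]

def mean_median_gap(values):
    """Return (mean, median, mean - median) for a list of numbers."""
    m, md = mean(values), median(values)
    return m, md, m - md

print(mean_median_gap(low_expr))   # gap is well above zero
print(mean_median_gap(high_expr))  # gap is zero
```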

@skanwal skanwal self-assigned this Jul 22, 2024

JMarzec commented Aug 6, 2024

I'm afraid it's more complex than that. Both the plots and the tables present median values of Z-scores calculated for individual groups/patient. The key functions to look at are:

  1. exprGroupsStats_geneWise.R ( https://github.com/umccr/RNAsum/blob/main/R/exprGroupsStats_geneWise.R ):
  • this function returns two objects: (1) group_stats.list, which includes stats for individual genes calculated FOR EACH group, and (2) gene_stats.list, which seems to include stats for individual genes but calculated ACROSS all samples
  2. exprTable.R ( https://github.com/umccr/RNAsum/blob/main/R/exprTable.R ):
  • it uses the group_stats.list object from the exprGroupsStats_geneWise.R() function
  3. cdfPlot.R ( https://github.com/umccr/RNAsum/blob/main/R/cdfPlot.R ):
  • it uses both the group_stats.list and gene_stats.list objects from the exprGroupsStats_geneWise.R() function

I feel that the plots need to present values in the context of the entire cohort, while the table provides stats (median values) for the group/patient.
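
The distinction between per-group and across-sample stats can be sketched roughly as follows. This is an illustration in Python, not a port of exprGroupsStats_geneWise.R: the function names `stats_per_group` and `stats_across_samples`, the data layout, and the Z-score values are all hypothetical.

```python
from statistics import median

# Hypothetical Z-scores for one gene, keyed by sample group.
z_scores = {
    "BRCA (TCGA)": [-1.2, -0.4, 0.0, 0.3, 1.1],
    "Patient": [1.8],
}

def stats_per_group(scores_by_group):
    """Median Z-score computed separately FOR EACH group."""
    return {group: median(vals) for group, vals in scores_by_group.items()}

def stats_across_samples(scores_by_group):
    """Median Z-score computed ACROSS all samples pooled together."""
    pooled = [v for vals in scores_by_group.values() for v in vals]
    return median(pooled)

per_group = stats_per_group(z_scores)            # one value per group
pooled_median = stats_across_samples(z_scores)   # one value for the whole set

print(per_group)
print(pooled_median)
```

The per-group result is the kind of summary a table row would carry, whereas the pooled value gives the cohort-wide context a plot could be drawn against.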

JMarzec commented Aug 6, 2024

Re the table legend, it could mention that the values refer to the MEDIAN Z-score (or percentile) in the reference cohort and patient, e.g. for a BRCA case in the Z-score tab it could look like this (changes/additions in italic font):

In the BRCA (TCGA), Patient and the Diff columns the RED colour range indicates relatively high expression (median Z-score) values and the BLUE colour range indicates relatively low expression (median Z-score) values in the individual sample group. The BLANK cells with missing values indicate genes with no/low expression. The Diff (Patient vs BRCA (TCGA)) column illustrates the difference between median Z-scores in patient sample and reference cancer cohort for each mutated gene...
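
As a rough sketch of what the Diff column described in that legend would hold, assuming it is the patient's median Z-score minus the cohort's median Z-score (the sign convention, the helper name `median_z_diff`, the gene names and the values are assumptions for illustration, written in Python rather than RNAsum's R):

```python
from statistics import median

def median_z_diff(patient_z, cohort_z):
    """Patient median Z-score minus cohort median Z-score.

    The direction of the subtraction is an assumption, read from the
    column title "Patient vs BRCA (TCGA)".
    """
    return median(patient_z) - median(cohort_z)

# Hypothetical per-gene Z-scores: gene -> (patient values, cohort values).
genes = {
    "GENE_A": ([2.0], [-0.5, 0.0, 0.5]),
    "GENE_B": ([-1.0], [0.0, 0.5, 1.0]),
}

diff = {gene: median_z_diff(p, c) for gene, (p, c) in genes.items()}
print(diff)
```

Under this convention a positive value would fall in the RED range (higher in the patient than in the cohort) and a negative value in the BLUE range.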
