
Provide driver genes #1

Open
bschilder opened this issue Sep 21, 2024 · 1 comment

@bschilder

bschilder commented Sep 21, 2024

Priority: high

@bschilder will provide lists of the genes driving each significant phenotype-cell type association (i.e. genes that appear in both the phenotype's gene list and the cell type's specificity list, restricted to the highest specificity quantiles at a consistent threshold).
It seems the lists Nathan provided previously were incorrect: they were simply the first N genes sorted alphabetically.
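The driver-gene definition above can be illustrated with a small base R sketch. This is not MSTExplorer's implementation: the function name `driver_genes_sketch`, the toy gene names, and the assumption that specificity quantiles are integers from 0 to 40 are all hypothetical.

```r
## Hypothetical sketch, NOT MSTExplorer's code: driver genes for one
## phenotype-cell type association are genes annotated to the phenotype
## that also fall within the cell type's top specificity quantiles.
driver_genes_sketch <- function(phenotype_genes,
                                celltype_quantiles,
                                keep_quantiles = 30:40) {
  # celltype_quantiles: named integer vector; names are gene symbols and
  # values are that gene's specificity quantile in this cell type
  # (assumed here to range from 0 to 40).
  top_genes <- names(celltype_quantiles)[celltype_quantiles %in% keep_quantiles]
  # Keep only genes present in both lists; sort for a stable order.
  sort(intersect(phenotype_genes, top_genes))
}

## Toy example with made-up genes and quantiles
pheno_genes  <- c("GENE1", "GENE2", "GENE3", "GENE4")
ct_quantiles <- c(GENE1 = 38L, GENE2 = 12L, GENE3 = 33L, GENE5 = 40L)
drivers_one  <- driver_genes_sketch(pheno_genes, ct_quantiles)
# drivers_one is c("GENE1", "GENE3"): GENE2 is below the threshold and
# GENE5 is not annotated to the phenotype.
```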

@bschilder

bschilder commented Sep 21, 2024

Here's how I gathered the driver genes. You can adjust which specificity quantiles to include (here set to include only genes in roughly the top quarter of specificity quantiles, i.e. quantiles 30-40).

The choice of quantile threshold is arbitrary, so let me know if you'd like me to adjust it.

I also added the continuous specificity score (ranging from 0 to 1) in case that's helpful.

## Keep only significant phenotype-cell type associations (q < 0.05)
results <- MSTExplorer::load_example_results()[q < 0.05]
## Add disease annotations for each phenotype
results <- HPOExplorer::add_disease(results, allow.cartesian = TRUE)
## Add specificity quantiles, keeping only genes in quantiles 30-40
drivers <- MSTExplorer:::add_driver_genes(results = results,
                                          keep_quantiles = seq(30, 40))
## Add continuous specificity as well
drivers <- MSTExplorer:::add_driver_genes(results = drivers,
                                          metric = "specificity")
## Keep the key columns, drop duplicate rows, and save
data.table::fwrite(
  unique(drivers[, list(ctd, CellType, hpo_id, gene_symbol,
                        specificity_quantile, specificity)]),
  "Downloads/drivers.csv.gz"
)
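Because the exported table keeps both `specificity_quantile` and `specificity`, a stricter threshold could likely be applied afterwards without rerunning the pipeline. A minimal base R sketch, using a made-up data frame with the same columns as the export; the cut-offs 35 and 0.5 are arbitrary examples, not recommended values.

```r
## Toy stand-in for the exported drivers table (all values are made up).
drivers_toy <- data.frame(
  ctd = "ctd_A",
  CellType = c("neuron", "neuron", "astrocyte"),
  hpo_id = c("HP:0000001", "HP:0000001", "HP:0000002"),
  gene_symbol = c("GENE1", "GENE2", "GENE3"),
  specificity_quantile = c(31L, 39L, 36L),
  specificity = c(0.42, 0.91, 0.31),
  stringsAsFactors = FALSE
)

## Stricter cut on the discrete quantiles (quantiles 35-40 only).
strict_q <- subset(drivers_toy, specificity_quantile >= 35)

## Alternatively, cut on the continuous 0-1 specificity score.
strict_s <- subset(drivers_toy, specificity >= 0.5)
```

The two cuts need not agree, since quantiles rank genes within a cell type while the continuous score is compared on a single fixed scale.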

Resulting table attached here:

drivers.csv.gz

Also, in case it's helpful, here are some metrics for assessing how many driver genes there are per phenotype-cell type association.

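## Distribution of driver-gene counts (n_driver_genes_hpo_id)
hist(drivers$n_driver_genes_hpo_id)
## Six most frequent gene symbols across all rows of drivers
tail(sort(table(drivers$gene_symbol)))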
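If a count of distinct driver genes per ctd / cell type / phenotype association is wanted (rather than relying on the precomputed `n_driver_genes_hpo_id` column), a base R sketch on a made-up table might look like this. Deduplicating first is an assumption-driven precaution: the disease join with `allow.cartesian = TRUE` may repeat rows.

```r
## Toy stand-in for the drivers table (made-up values), including a
## duplicated row such as a disease join might produce.
drivers_toy <- data.frame(
  ctd = "ctd_A",
  CellType = c("neuron", "neuron", "neuron", "astrocyte"),
  hpo_id = c("HP:0000001", "HP:0000001", "HP:0000001", "HP:0000002"),
  gene_symbol = c("GENE1", "GENE2", "GENE2", "GENE3"),
  stringsAsFactors = FALSE
)

## Drop duplicated rows, then count distinct driver genes per
## ctd / cell type / phenotype association.
assoc <- unique(drivers_toy[, c("ctd", "CellType", "hpo_id", "gene_symbol")])
n_per_assoc <- aggregate(gene_symbol ~ ctd + CellType + hpo_id,
                         data = assoc, FUN = length)
names(n_per_assoc)[names(n_per_assoc) == "gene_symbol"] <- "n_driver_genes"
```

`hist(n_per_assoc$n_driver_genes)` would then plot the per-association distribution.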

