Any way to speed up the calculation? #8

Open
Yun-Ching-Chen opened this issue Jan 13, 2023 · 2 comments

Comments

@Yun-Ching-Chen

Hi,

I've tried to run ENIGMA trace norm on ~500 TCGA samples using my own scRNA data (15 cell types with ~10000 genes) as the reference (as an aggregated 10000x15 matrix, not a Seurat object). It has been running for over 24 hrs. Is there any tip to speed up the calculation, or would it be possible to make a multi-core version?

Thanks,
YC

@WWXkenmo
Owner

WWXkenmo commented Jan 13, 2023

Thanks for using our tool!

ENIGMA trace norm is slow because it needs to perform an SVD for each CSE in every round of gradient calculation, so optimisation takes time when ENIGMA trace norm is applied to large datasets (>1000 samples or >5000 spots). Although the current version does take a long time on large datasets, it should not take this long (over 24 hrs) on ~500 samples. Here are my current thoughts on how to fix the issue:

  1. Please use verbose = TRUE to check whether the Kappa Score keeps decreasing. If it is increasing instead, the algorithm is not converging, and I suggest setting a smaller gradient step tao_k.
  2. If the Kappa Score is decreasing, I suggest setting a relatively larger gradient step tao_k, but keep in mind that too large a step size would prevent the algorithm from converging. Alternatively, you could set a larger max_ks (e.g. 2-5) to relax the stopping condition.
  3. Another suggestion is to avoid using too many reference cell types. Too many cell types make the CSE estimation worse, and the same holds for other CSE estimation or cell type deconvolution tools, because some of the cell types may have very similar gene expression patterns (high correlation). I suggest inspecting the dataset to make sure each cluster has a distinct gene expression profile, which can be checked by calculating the pairwise correlation among cell types. A smaller number of cell types would also help speed up the calculation.
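
For the correlation check in point 3, a minimal sketch in Python might look like the following (in R, calling `cor()` on the gene x cell type reference matrix should give the same pairwise table). The function names, the toy expression values, and the 0.95 cutoff are illustrative assumptions and are not part of ENIGMA; pick a cutoff that suits your data.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        # A constant profile has no defined correlation.
        return float("nan")
    return cov / math.sqrt(var_x * var_y)


def similar_cell_types(profiles, threshold=0.95):
    """Return (type_a, type_b, r) for pairs with correlation >= threshold.

    `profiles` maps a cell type name to its expression values over the
    same ordered list of genes (one column of the reference matrix).
    The default threshold is an arbitrary illustrative value.
    """
    flagged = []
    for a, b in combinations(sorted(profiles), 2):
        r = pearson(profiles[a], profiles[b])
        if r >= threshold:
            flagged.append((a, b, r))
    return flagged


# Toy reference: 3 hypothetical cell types over 5 genes (made-up values).
reference = {
    "T_cell": [10.0, 0.5, 3.0, 8.0, 0.1],
    "NK_cell": [9.5, 0.6, 3.2, 7.8, 0.2],  # deliberately close to T_cell
    "Fibroblast": [0.2, 12.0, 1.0, 0.3, 9.0],
}

for a, b, r in similar_cell_types(reference, threshold=0.95):
    print(f"{a} vs {b}: r = {r:.3f}")
```

Pairs flagged this way are candidates for merging into one reference cell type or dropping before running the deconvolution.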

The parallelized ENIGMA is under development and I will upload it soon. I hope the above information is helpful, and please let me know if you have any new questions.

Best,
Ken

@WWXkenmo
Owner

Hey

Have you resolved your problem? If you are still having trouble, you could share the data with me (leaving out any important information) and I could help you address it.

Best,
Ken
