Error running large no. of samples & spectra #2

Open
yangchoo opened this issue Jun 30, 2014 · 3 comments


@yangchoo

Hi,
first off, thanks for the awesome program!

Everything works fine for hundreds of samples, but I'm running into periodic determinant = 0 errors while running EMu with a large number of samples (~6000). [Err msg: In get_llhood for m=6***, det(Hf) = 0.... ]

The program works fine up to ~15 spectra; beyond that, these errors start occurring periodically, so I am unable to run EMu to completion for anything above 15 spectra.

Let me know if you need my .mutations or .opp. file.

Thanks!

@yangchoo changed the title from "gsl_linalg_LU_det = 0" to "Error running large samples" on Jul 1, 2014
@yangchoo changed the title from "Error running large samples" to "Error running large no. of samples" on Jul 1, 2014
@yangchoo changed the title from "Error running large no. of samples" to "Error running large no. of samples & spectra" on Jul 1, 2014
@andrej-fischer
Owner

Hello,

this error can appear if you have very few mutations in a specific sample. If you try more signatures than there are occupied channels in a sample, the log-likelihood contribution for that sample cannot be computed (lines 785-803 in MutSpec.cpp).

I will have a look for a workaround. But do you really want that many signatures?
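
To see in advance which samples could trigger this, one can count the occupied channels per sample and compare that with the number of signatures requested. The following is only a minimal sketch and not part of EMu: it assumes the mutation counts are already loaded as a samples-by-channels list of lists, and the function name `samples_with_too_few_channels` and the toy data are made up for illustration.

```python
def samples_with_too_few_channels(counts, n_signatures):
    """Return (sample index, occupied channels) for every sample that has
    fewer non-zero channels than the number of signatures requested.

    `counts` is assumed to be a samples-by-channels matrix of mutation
    counts; this layout is an assumption, not EMu's input parser.
    """
    flagged = []
    for m, row in enumerate(counts):
        occupied = sum(1 for c in row if c > 0)
        if occupied < n_signatures:
            flagged.append((m, occupied))
    return flagged


# Toy data with 4 channels (real data would have many more).
counts = [
    [3, 0, 1, 2],  # 3 occupied channels
    [0, 0, 5, 0],  # 1 occupied channel
    [1, 1, 1, 1],  # 4 occupied channels
]

print(samples_with_too_few_channels(counts, n_signatures=3))
```

Samples flagged this way are candidates for the det(Hf) = 0 error at that number of signatures; whether to drop or pool them is a separate decision.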


@yangchoo
Author

yangchoo commented Jul 3, 2014

Ah.. I see. Does that mean the contribution of a particular signature from a sample has to be non-zero?
I am trying to compare EMu vs. NMF on a large dataset. NMF has been shown by Alexandrov to resolve ~27 signatures from his dataset, and I am trying to see if EMu can detect similar signatures from a similarly large dataset.

@andrej-fischer
Owner

Hi yangchoo,

thanks again for your question. It turns out it is aimed at the very heart of the EMu model. The technical reason for the error is described above, but the underlying issue is the assumption that all the processes are, in principle, present in all the samples. The case where some processes are strictly absent, i.e. their activity is zero, is not handled well by the current implementation. That is mainly due to a saddle-point approximation that is used to calculate the log-likelihood but is not well defined for zero activity.
The immediate fix for this bug will take a bit of time and testing, but will certainly be worth it. In the meantime, one option is to separate samples by cancer type, which was also done by Alexandrov et al.
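
As a rough sketch of that interim option, samples could be split into one count matrix per cancer type before running EMu on each group separately. This is an illustration only: the function name `split_by_cancer_type` is hypothetical, and it assumes a samples-by-channels count matrix plus one cancer-type label per sample supplied by the user, neither of which comes from EMu itself.

```python
from collections import defaultdict


def split_by_cancer_type(counts, cancer_types):
    """Group rows of a samples-by-channels count matrix by cancer type.

    `cancer_types` is assumed to hold one label per sample, in the same
    order as the rows of `counts`.
    """
    if len(counts) != len(cancer_types):
        raise ValueError("need exactly one cancer-type label per sample")
    groups = defaultdict(list)
    for row, label in zip(counts, cancer_types):
        groups[label].append(row)
    return dict(groups)


# Toy example with made-up labels and 3 channels.
counts = [[3, 0, 1], [0, 2, 5], [1, 1, 0]]
labels = ["breast", "lung", "breast"]

for label, rows in split_by_cancer_type(counts, labels).items():
    print(label, len(rows), "samples")
```

Each group can then be written out and analysed on its own, so that processes absent from one cancer type are not forced onto its samples.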
