FAQs
How can I use hg38?
Specify the reference genome in make_matrix() by setting ref_genome_name to 'hg19' or 'hg38'.
The COSMIC signature catalog version
The MVA models are trained using COSMIC catalog v2, and that version should be used to get accurate predictions. However, apart from the MVA predictions, other functionalities of SigMA can be used by setting cosmic_version = 'v3' in run(). The MMRD detection functionalities were implemented with v3.
Can I train a new model?
Yes, see here.
What is the meaning of different signature columns?
For each signature there are different measures produced which are discussed here. Below you can find some more details.
Non-negative least squares (NNLS) calculations from the run() function output:
Sig3 is reported specifically in the exp_sig3 column. The rest of the signatures are found in the exps_all (or exps_all_msi) and sigs_all (or sigs_all_msi) columns. The columns ending with _msi differ in that mismatch repair deficiency specific signatures are also considered in the decomposition.
Signature_*_l_rat (e.g. Signature_3_l_rat) indicates: (probability decomposition with that signature)/(probability decomposition with that signature + probability decomposition without that signature)
A value of 0.5 indicates that the mutations can be decomposed equally well with the other signatures in the catalog. A value above 0.5 indicates that the decomposed spectrum explains the mutations better when that signature (e.g. Sig3) is used in the decomposition.
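Written as a formula, the ratio described above looks as follows. The symbols P_with and P_without are shorthand introduced here for the two probabilities named in the text, not notation taken from SigMA itself:

```latex
\mathrm{l\_rat}_k = \frac{P_{\text{with } k}}{P_{\text{with } k} + P_{\text{without } k}},
\qquad
P_{\text{with } k} = P_{\text{without } k} \;\Rightarrow\; \mathrm{l\_rat}_k = 0.5
```

The ratio is bounded between 0 and 1, and it moves toward 1 as the decomposition that includes signature k becomes more probable than the one that excludes it.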
Signature_3_c*_ml columns indicate the likelihood of Sig3+ clusters in WGS data. If you run in the lite mode you will get a Signature_3_ml column, which is the sum of all the Signature_3_c*_ml values. This indicates how likely the sample is to match a Sig3+ cluster in WGS. These values are independent of the NNLS calculation. Similar columns exist for other signatures (e.g. Signature_clock_c*_ml, Signature_4_c*_ml, etc.)
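If you ran without the lite mode and want the summed value, the per-cluster columns can be added up by hand. The sketch below uses base R on a toy data frame. The column values, the tumor column and the cluster names c1 and c2 are made up for illustration, and the exact cluster column names in your output may differ:

```r
# Toy data frame standing in for run() output; all values are made up.
df <- data.frame(
  tumor = c("s1", "s2"),
  Signature_3_c1_ml = c(0.10, 0.01),
  Signature_3_c2_ml = c(0.25, 0.02),
  Signature_4_c1_ml = c(0.05, 0.60)
)

# Select the per-cluster Sig3 likelihood columns by name pattern.
sig3_cols <- grep("^Signature_3_c.*_ml$", names(df), value = TRUE)

# Sum the cluster likelihoods for each sample.
df$Signature_3_ml_sum <- rowSums(df[, sig3_cols, drop = FALSE])
```

The result is stored under a new name (Signature_3_ml_sum) so that it does not overwrite a Signature_3_ml column that may already be present.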
Signature_3_mva is the multivariate classification score that uses all the scores above. The pass_mva and pass_mva_strict columns are obtained by applying cutoffs to the Signature_3_mva score.
How to interpret a given value of the measures? Which samples are Signature 3 positive?
- You can use the lite_format = T setting and look at the categ column in the output file.
- You can use the lite_format = F setting and use the pass_mva_strict or pass_mva columns for 10% and < 5% FPR settings. Note: if you have trained your own model the FPR values may be different.
- You can also determine the FPR for any other signature measure (e.g. likelihood columns ending with _ml) using the get_threshold() function and simulated data. For generating simulated data, see the example macro and wiki documentation. E.g. for Sig3 exposure calculated with NNLS:
thresh <- get_threshold(df, limits = c(0.05, 0.1), var = 'exp_sig3', signal = 'is_true', cut_var = 'fpr')
cutoffs <- thresh$cutoff # cutoff values for that variable, e.g. exp_sig3 > cutoffs[1] corresponds to 0.05 FPR
thresh$sen # sensitivity at corresponding values
thresh$fpr # false positive rate
thresh$fdr # false discovery rate
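To make the quantities above concrete, here is a minimal base R sketch of what sensitivity, FPR and FDR mean at a single cutoff. It is not how get_threshold() is implemented. The helper rates_at_cutoff() is hypothetical and the data are made up, with is_true marking the samples simulated with Sig3:

```r
# Toy simulated data; values are made up for illustration.
df <- data.frame(
  exp_sig3 = c(0, 2, 5, 8, 12, 20, 1, 3, 15, 30),
  is_true  = c(FALSE, FALSE, FALSE, FALSE, FALSE, TRUE, TRUE, TRUE, TRUE, TRUE)
)

# Hypothetical helper: rates obtained when samples above the cutoff are called positive.
rates_at_cutoff <- function(values, truth, cutoff) {
  called <- values > cutoff
  tp <- sum(called & truth)   # true positives
  fp <- sum(called & !truth)  # false positives
  c(
    sen = tp / sum(truth),    # sensitivity: true positives / all true samples
    fpr = fp / sum(!truth),   # false positive rate: false positives / all negative samples
    fdr = if (tp + fp > 0) fp / (tp + fp) else NA_real_  # false positives / all positive calls
  )
}

res <- rates_at_cutoff(df$exp_sig3, df$is_true, cutoff = 10)
```

Raising the cutoff lowers the FPR at the cost of sensitivity, which is why a cutoff is usually chosen for a target FPR on simulated data.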
Previous issues
Perhaps someone already asked the same question in one of the previous issues.