Doga C. Gulhan edited this page Nov 23, 2022 · 7 revisions

How can I use hg38?

Specify the reference genome in make_matrix() by setting ref_genome_name to 'hg19' or 'hg38'.


The COSMIC signature catalog version

The MVA models are trained using COSMIC catalog v2, and that version should be used to get accurate MVA predictions. However, apart from the MVA predictions, the other functionalities of SigMA can be used with v3 by setting cosmic_version = 'v3' in run(). The MMRD detection functionalities were implemented with v3.


Can I train a new model?

Yes, see here.


What is the meaning of different signature columns?

For each signature there are different measures produced which are discussed here. Below you can find some more details.

Non-negative least squares (NNLS) calculations in the run() function output: the Sig3 exposure is reported in the exp_sig3 column. The rest of the signatures are found in the exps_all (or exps_all_msi) and sigs_all (or sigs_all_msi) columns. The columns ending with _msi differ in that mismatch repair deficiency specific signatures are also considered in the decomposition.

Signature_*_l_rat (e.g. Signature_3_l_rat) is the ratio: (probability of the decomposition with that signature) / (probability of the decomposition with that signature + probability of the decomposition without that signature)

A value of 0.5 indicates that the mutations can be decomposed equally well without the signature, using only the other signatures in the catalog. A value above 0.5 indicates that the decomposed spectrum explains the mutations better when the signature (e.g. Sig3) is used in the decomposition.
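As a small worked example of the ratio described above, the sketch below computes it for two made-up pairs of probabilities. The helper name l_rat and the input values are hypothetical and only illustrate the arithmetic; they are not taken from SigMA.

```r
# Hypothetical helper mirroring the ratio described above.
# p_with:    probability of the decomposition that includes the signature
# p_without: probability of the decomposition that excludes it
l_rat <- function(p_with, p_without) {
  p_with / (p_with + p_without)
}

# Both decompositions explain the mutations equally well: ratio is 0.5
equal_fit <- l_rat(0.02, 0.02)

# Including the signature explains the mutations better: ratio is above 0.5
better_fit <- l_rat(0.06, 0.02)

print(equal_fit)
print(better_fit)
```

With these invented inputs the second ratio is about 0.75, i.e. the decomposition that includes the signature is favored.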

Signature_3_c*_ml columns indicate the likelihood of Sig3+ clusters in WGS data. If you run in the lite mode you will get a Signature_3_ml column, which is the sum of all the Signature_3_c*_ml values. This indicates how likely the sample is to match a Sig3+ cluster in WGS. These values are independent of the NNLS calculation. Similar columns exist for other signatures (e.g. Signature_clock_c*_ml, Signature_4_c*_ml, etc.).
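If you have the full-format output and want the summed value yourself, something like the following sketch could be used. The data frame here is a toy stand-in with invented values, and the exact cluster column names (c1, c2) are assumptions that only follow the Signature_3_c*_ml pattern described above.

```r
# Toy stand-in for full-format output; values are invented for illustration
# and the cluster names (c1, c2) are assumed, not read from a real run.
out <- data.frame(
  tumor = c("s1", "s2"),
  Signature_3_c1_ml = c(0.10, 0.01),
  Signature_3_c2_ml = c(0.55, 0.02),
  Signature_clock_c1_ml = c(0.35, 0.97)
)

# Select every per-cluster Sig3 likelihood column (Signature_3_c*_ml)
sig3_cols <- grep("^Signature_3_c.*_ml$", colnames(out), value = TRUE)

# Sum across the Sig3+ clusters for each sample
out$Signature_3_ml <- rowSums(out[, sig3_cols, drop = FALSE])

print(out[, c("tumor", "Signature_3_ml")])
```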

Signature_3_mva is the multivariate classification score that uses all the scores above. The pass_mva and pass_mva_strict columns are pass/fail flags obtained by applying thresholds to the Signature_3_mva score.


How to interpret a given value of the measures? Which samples are Signature 3 positive?

  • You can use the lite_format = T setting and look at categ column in the output file.
  • You can use the lite_format = F setting and use the pass_mva or pass_mva_strict columns, which correspond to the 10% and < 5% FPR settings, respectively. Note: if you have trained your own model the FPR values may be different.
  • You can also determine the FPR for any other signature measure (e.g. likelihood columns ending with _ml) using the get_threshold() function and simulated data. For generating simulated data, see the example macro and wiki documentation. E.g. for the Sig3 exposure calculated with NNLS:
thresh <- get_threshold(df, limits = c(0.05, 0.1), var = 'exp_sig3', signal = 'is_true', cut_var = 'fpr')
cutoffs <- thresh$cutoff # cutoff values for that variable, e.g. exp_sig3 > cutoffs[1] would correspond to 0.05 FPR
thresh$sen # sensitivity at corresponding values
thresh$fpr # false positive rate
thresh$fdr # false discovery rate
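To make the idea of an FPR-based cutoff concrete, here is a minimal base R sketch on simulated scores. It is not the get_threshold() implementation: the helper cutoff_at_fpr, the simulated data, and the column values are all hypothetical, and only the exp_sig3 and is_true column names are borrowed from the call above.

```r
# Hypothetical helper, not SigMA code: smallest cutoff on `score` such that
# the fraction of true negatives with score > cutoff is at most `target_fpr`.
cutoff_at_fpr <- function(score, is_true, target_fpr) {
  neg <- score[!is_true]
  candidates <- sort(unique(score))
  # FPR at each candidate cutoff; non-increasing as the cutoff grows
  fpr <- sapply(candidates, function(thr) mean(neg > thr))
  candidates[which(fpr <= target_fpr)[1]]
}

# Simulated scores: 200 negatives and 200 positives (invented distributions)
set.seed(1)
sim <- data.frame(
  exp_sig3 = c(rnorm(200, mean = 5, sd = 2), rnorm(200, mean = 12, sd = 3)),
  is_true = rep(c(FALSE, TRUE), each = 200)
)

cutoff <- cutoff_at_fpr(sim$exp_sig3, sim$is_true, target_fpr = 0.05)

# False positive rate and sensitivity when calling exp_sig3 > cutoff positive
fpr <- mean(sim$exp_sig3[!sim$is_true] > cutoff)
sen <- mean(sim$exp_sig3[sim$is_true] > cutoff)

print(c(cutoff = cutoff, fpr = fpr, sen = sen))
```

Samples with a score above the returned cutoff would be called positive at roughly the requested FPR, under the assumption that the simulated negatives resemble the real ones.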

Previous issues

Perhaps someone already asked the same question in one of the previous issues.
