Frequently Asked Questions

  • What file contains X data in a FACETS run?

    A FACETS run consists of a top-level directory that contains one fit directory for each FACETS fit of the sample. The main files and their contents are:

    • /facets_qc.txt - Contains a summary of data for the sample overall, and information about each existing fit for the sample. Includes information about review status, quality check flags, and run parameters for each fit.
    • /fit_dir/*.arm_level.txt - Arm level data for the fit. Gives an overall copy number state for each chromosome arm.
    • /fit_dir/*.gene_level.txt - Gene level information for the fit. Segment data for regions overlapping genes are listed, along with the copy number calls and SNP data summary for the segments.
    • /fit_dir/*.cncf.txt - Contains data for integer copy number calls (TCN/LCN) and clonal fraction data for all segments. Generally the file of interest when looking for copy number calls for a fit.
    • /fit_dir/*.seg - General information about each segment's position.
    • /fit_dir/*.png - Images generated by Facets-Suite that are used in Facets Preview.
    • /fit_dir/*.Rdata - The Rdata file stores specific information about the SNPs used to create the figures for this facets run. Can be inspected by loading into R.
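The layout above can be indexed with a few lines of Python. This is a minimal sketch, not part of Facets-Suite or Facets Preview: the function name `index_facets_run` and the `FIT_FILE_PATTERNS` mapping are hypothetical, and it assumes every subdirectory of the run directory is a fit directory.

```python
from pathlib import Path

# Hypothetical mapping: the glob patterns mirror the file list above.
FIT_FILE_PATTERNS = {
    "arm_level": "*.arm_level.txt",
    "gene_level": "*.gene_level.txt",
    "cncf": "*.cncf.txt",
    "seg": "*.seg",
    "png": "*.png",
    "rdata": "*.Rdata",
}


def index_facets_run(run_dir):
    """Map each fit directory name to the output files found inside it.

    Assumes every subdirectory of run_dir is a fit directory.
    """
    run_dir = Path(run_dir)
    index = {"facets_qc": run_dir / "facets_qc.txt", "fits": {}}
    for fit_dir in sorted(p for p in run_dir.iterdir() if p.is_dir()):
        index["fits"][fit_dir.name] = {
            kind: sorted(fit_dir.glob(pattern))
            for kind, pattern in FIT_FILE_PATTERNS.items()
        }
    return index
```

For copy number calls, the entries under the `"cncf"` key of a fit are usually the ones to open first.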
  • How is FACETS data generated?

    BAM files are first converted into SNP pileup files that can be used for FACETS processing. FACETS data is then generated by running this script from Facets-Suite with the SNP-pileup data as input. The script calls the FACETS algorithm and generates the output files and figures. Note that if you want to load samples in Facets Preview, the --legacy-output flag should be set to TRUE.

  • What QC metrics are used with FACETS data?

    FACETS has a built-in quality module that checks the results of each FACETS run across several metrics. The metrics used are:

    • dipLogR_flag - Indicates extreme dipLogR value.
    • n_alternative_dipLogR - Number of alternative dipLogR values.
    • n_dip_bal_segs - Number of balanced segments at dipLogR and the fraction of genome they represent.
    • n_dip_imbal_segs - Number of imbalanced segments at dipLogR and the fraction of genome they represent.
    • n_amp - Number of segments at total copy number >= 10.
    • n_homdels - Number of homozygously deleted segments (total copy number = 0).
    • n_homdels_clonal - Number of clonal homdel segments and the fraction of the genome they represent.
    • n_cn_states - Number of unique copy-number states (i.e. combinations of major and minor copy number).
    • n_segs - Number of segments.
    • n_cnlr_clusters - Number of copy-number log-ratio clusters.
    • n_lcn_na - Number of segments where no minor copy number was inferred (lcn is NA).
    • n_loh - Number of segments where the minor copy number is 0 and the fraction of the genome they represent.
    • n_snps - Number of SNPs used for segmentation.
    • n_het_snps - Number of heterozygous SNPs used for segmentation and their fraction of the total.
    • n_het_snps_hom_in_tumor_1pct - Number of heterozygous SNPs where the tumor allele frequency is < 0.01 or > 0.99, and their fraction of the total.
    • mean_cnlr_residual - Mean and standard deviation of SNPs' log-ratio from their segment's copy-number log-ratio.
    • n_segs_discordant_tcn - Number of segments where the naïve and EM algorithm estimates of the total copy number are discordant and the fraction of the genome they represent.
    • n_segs_discordant_lcn - Number of segments where the naïve and EM algorithm estimates of the minor copy number are discordant and the fraction of the genome they represent.
    • n_segs_discordant_both - Number of segments where the naïve and EM algorithm estimates of both copy numbers are discordant and the fraction of the genome they represent.
    • n_segs_icn_cnlor_discordant - Number of clonal segments where the log-ratio shows balance but the copy-number solution does not, and the reverse, and the fraction of the genome they represent.
    • dip_median_vaf - If MAF input: median tumor VAF of somatic mutations on clonal segments with total copy number 2 and allelic balance.
    • n_homdel_muts - If MAF input: number of somatic mutations in homozygously deleted segments.
    • median_vaf_homdel_muts - If MAF input: median tumor VAF of somatic mutations in homozygously deleted segments.

    These metrics are used to make determinations about the sample and make pass/fail calls for 11 FACETS QC flags.

    • Homozygous Deletions - This flag will pass if the percentage of n_homdels_clonal is < 2% and n_homdels < 5%.
    • Diploid Segments - This flag will pass if either the percentage of n_dip_bal_segs is at least 1% or n_dip_imbal_segs is >= 5%.
    • No Waterfall Pattern - This flag will pass if the SD from mean_cnlr_residual < 1 or the sample has at least 50% purity.
    • No Hyper Segmentation - This flag will fail when n_segs > 65 and an insufficient fraction of the sample is diploid.
    • Not High Ploidy - This flag will fail if any of the following are true: ploidy > 7, ploidy > 5 and purity < 10%, percent of sample that is balanced diploid < 5%.
    • Has Valid Purity - This will pass if purity is not NA and is not 0.3. (The 0.3 specific purity is a statistical artifact that indicates a failure).
    • em vs. cncf TCN/LCN Discordance - This will fail if n_segs_discordant_tcn/n_segs_discordant_lcn is > 50%.
    • DipLogR Not Too Low - Fails if the percent of sample below the dipLogR is < 1%, AND the percent of sample that is balanced and diploid is < 5%, AND the percent of sample that is imbalanced and diploid is < 50%.
    • ICN is Discordant with Allelic State - Fails if the percent of the sample that is balanced where TCN is an odd number is > 20% OR percent of sample that is imbalanced by ICN is balanced diploid is > 20%.
    • High Percent Subclonal - Fails if percent of sample that is subclonal is > 60% AND < 2% of the sample is balanced and diploid.
    • Contamination Check - Fails if percent of het snps that are homozygous in the tumor is > 5% and the tumor purity is < 80%.

These flags can be seen in the /facets_qc.txt file. The above metrics and QC flags can also be seen in Facets Preview using the QC tab.
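As an illustration of how the pass/fail rules combine metrics, several of the flags can be restated as small predicates. This is a sketch that encodes the thresholds from the list above, not the Facets-Suite implementation. The function and argument names are hypothetical, all percentages are assumed to be passed as fractions between 0 and 1, and the exact comparison used for the 0.3 purity artifact is an assumption.

```python
import math


def has_valid_purity(purity):
    """Pass if purity is not NA and is not the 0.3 artifact value."""
    if purity is None or math.isnan(purity):
        return False
    return not math.isclose(purity, 0.3)


def homozygous_deletions_pass(frac_homdels_clonal, frac_homdels):
    """Pass if clonal homdels cover < 2% and all homdels cover < 5%."""
    return frac_homdels_clonal < 0.02 and frac_homdels < 0.05


def not_high_ploidy_pass(ploidy, purity, frac_dip_bal):
    """Fail if any one of the three high-ploidy conditions is true."""
    fails = ploidy > 7 or (ploidy > 5 and purity < 0.10) or frac_dip_bal < 0.05
    return not fails


def high_percent_subclonal_pass(frac_subclonal, frac_dip_bal):
    """Fail only if > 60% is subclonal AND < 2% is balanced diploid."""
    return not (frac_subclonal > 0.60 and frac_dip_bal < 0.02)


def contamination_check_pass(frac_het_snps_hom_in_tumor, purity):
    """Fail only if > 5% of het SNPs are homozygous in tumor AND purity < 80%."""
    return not (frac_het_snps_hom_in_tumor > 0.05 and purity < 0.80)
```

Note that the flags mix "pass if" and "fail if" wording, so each predicate here is normalized to return True when the flag passes.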

  • What is the difference between hisens and purity runs?

    Facets-Suite produces two versions of each FACETS run, suffixed in the fit data by "_purity" or "_hisens". The primary difference between them is the cValue used for the run. cValue is the segmentation critical value that FACETS uses to decide where to place breaks between segments. A higher value smooths more and yields fewer segment breaks, whereas a lower value is more sensitive to small changes and yields more breaks. Both runs are generated because different amounts of smoothing may be desired, depending on the intended analysis or the underlying data of the sample. IMPACT repositories by default use values of 100 and 50 for purity and hisens, respectively. In Facets Preview, it is also possible to submit fits with higher or lower cValues to adjust smoothing as needed for a given sample.

  • How do I identify IMPACT FACETS data for samples with X properties?

    If you want to identify samples with specific properties (for example, samples of a specific cancer type or with whole genome duplication) and you do not have access to the IMPACT repository on Terra/Juno/Iris, contact Adam Price. If you do have access to the repository, the easiest way to select and work with samples is FacetsAPI, a toolkit for interacting with FACETS repositories that comes with extensive documentation and easy-to-implement examples.

  • What exactly are the CF and purity values generated by FACETS?

    The CF value is the cellular fraction of non-diploid cells, which are assumed to be cancer cells. Purity is reported as the largest CF across the sample, on the assumption that the largest CF within the sample represents a clonal CNA. For any segment where CF = purity, the CNA on the segment is considered clonal.
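The relationship between CF, purity, and clonality can be sketched as follows. This is an illustration of the rule stated above, not FACETS code: the function names are hypothetical, missing CF values are assumed to be `None` or NaN, and the floating-point tolerance is an assumption added for the equality test.

```python
import math


def _is_missing(cf):
    return cf is None or math.isnan(cf)


def purity_from_cf(segment_cfs):
    """Purity is the largest CF across all segments (None if no CF is available)."""
    valid = [cf for cf in segment_cfs if not _is_missing(cf)]
    return max(valid) if valid else None


def clonal_segments(segment_cfs, tol=1e-9):
    """Flag each segment as clonal when its CF equals the sample purity."""
    purity = purity_from_cf(segment_cfs)
    if purity is None:
        return [False] * len(segment_cfs)
    return [
        (not _is_missing(cf)) and math.isclose(cf, purity, abs_tol=tol)
        for cf in segment_cfs
    ]


cfs = [0.62, 0.62, 0.31, None]
print(purity_from_cf(cfs))   # 0.62
print(clonal_segments(cfs))  # [True, True, False, False]
```

In this example the third segment has a CF below the purity, so its CNA would be treated as subclonal.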

  • Can I do refits on my FACETS data?

    Facets Preview provides functionality for performing refits, but the results are stored in the active sample directory. If you want to do refits, please first make a copy of the sample in question and load it into Facets Preview from a local location, rather than performing refits on mounted resources such as the IMPACT Facets Repository or the TEMPO WES Repository.

  • How can I open FACETS data generated by the TEMPO pipeline?

    TEMPO generates FACETS data with naming conventions that differ slightly from those currently compatible with Facets Preview. To prepare your data for loading in Facets Preview, rename the sample_id.facets_qc.txt file in the top-level facets directory to simply facets_qc.txt. This will enable Facets Preview to read the data properly. This issue will be resolved in future versions of Facets Preview.
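The rename described above can be scripted. This is a minimal sketch with a hypothetical helper name, `prepare_tempo_qc_file`. It assumes the top-level facets directory contains exactly one file ending in `.facets_qc.txt`, and it leaves an existing `facets_qc.txt` untouched.

```python
from pathlib import Path


def prepare_tempo_qc_file(facets_dir):
    """Rename <sample_id>.facets_qc.txt to facets_qc.txt in facets_dir.

    Returns the path of facets_qc.txt. Does nothing if it already exists.
    """
    facets_dir = Path(facets_dir)
    target = facets_dir / "facets_qc.txt"
    if target.exists():
        return target

    # "*.facets_qc.txt" requires a prefix and a dot, so it never matches the target itself.
    candidates = sorted(facets_dir.glob("*.facets_qc.txt"))
    if len(candidates) != 1:
        raise FileNotFoundError(
            f"expected exactly one *.facets_qc.txt in {facets_dir}, "
            f"found {len(candidates)}"
        )
    candidates[0].rename(target)
    return target
```

Because the rename modifies the directory in place, it is safest to run it on a local copy of the sample rather than on a shared or mounted repository.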