error readSnpMatrix: Overlap of loci between the two Loess dataframes #25

Closed
iS4i4S opened this issue Aug 5, 2021 · 6 comments

iS4i4S commented Aug 5, 2021

Hello,

I generated the normal snp and then the loess.txt using 47 normal bams successfully using:

/usr/local/lib/R/site-library/facets2n/extcode/snp-pileup-wrapper.R \
  --output-prefix /media/user/Seagate_Exome67_133/LIGUE/Exomes_67_133/FACETS/Tonly/standard_normals_cv3heme \
  --vcf-file /media/user/Seagate_Exome67_133/LIGUE/Reference_files_WESpipeline/00-common_all.vcf.gz \
  --unmatched-normal-BAMS "/media/user/Seagate_Exome67_133/LIGUE/Exomes_67_133/Alignments/Final_bams/*N_cleaned_bqsr.bam"

Then, in R, I created the loess object and tried to load a snpMatrix (from a tumor/unmatched-normal sample):

facets2n::MakeLoessObject(
  pileup = PreProcSnpPileup(
    filename = "/media/user/Seagate_Exome67_133/LIGUE/Exomes_67_133/FACETS/Tonly/standard_normals_cv3heme.snp_pileup.gz",
    is.Reference = TRUE, gbuild = "hg38"),
  write.loess = TRUE,
  outfilepath = "/media/user/Seagate_Exome67_133/LIGUE/Exomes_67_133/FACETS/Tonly/standard_normals_cv3heme.loess.txt",
  is.Reference = TRUE, gbuild = "hg38")

readu <- facets2n::readSnpMatrix(
  filename = "CNSL047T_CNSL005N.snp_pileup.gz",
  MandUnormal = TRUE,
  ReferencePileupFile = "/media/user/Seagate_Exome67_133/LIGUE/Exomes_67_133/FACETS/Tonly/standard_normals_cv3heme.snp_pileup.gz",
  ReferenceLoessFile = "/media/user/Seagate_Exome67_133/LIGUE/Exomes_67_133/FACETS/Tonly/standard_normals_cv3heme.loess.txt",
  useMatchedX = FALSE, refX = TRUE, gbuild = "hg38")

I got the following error:
imputed patient sex from matched normal: Female
Best normal for autosomes: NA
Best normal for ChrX: NA

Error in `[.data.frame`(combined.pileup, , best_normX) :
  undefined columns selected
In addition: There were 50 or more warnings (use warnings() to see the first 50)

I tried lowering "MinOverlap" to 0.75, but the error persists with this and other samples.

Any suggestions or ideas on how to make it work?

Thanks in advance

Owner

rptashkin commented Aug 5, 2021

Hi @iS4i4S, are there female samples in your reference set of 47 normals? Were the reference normals and your sample CNSL047T sequenced with the same assay? If the answer to both questions is yes, perhaps you can share your reference loess file and your CNSL047T snp pileup counts file so I can debug further.

Author

iS4i4S commented Aug 5, 2021

Thanks for the quick reply.

  1. Yes, 60% of the reference normals are female.
  2. Yes, they come from the same cohort and were sequenced the same way.
    CNSL047T.snp_pileup.gz

The link for the reference file: standard_normals_cv3heme.snp_pileup.gz

The loess file is a little big, so I uploaded the pileup for the normals instead.

Thanks in advance

Owner

rptashkin commented Aug 6, 2021

Hi @iS4i4S ,

I was not able to reproduce your error. Here is what I ran.

I made the loess file from the reference pileup you provided:

MakeLoessObject(pileup = PreProcSnpPileup(filename = "standard_normals_cv3heme.snp_pileup.gz", is.Reference = TRUE), write.loess = TRUE, outfilepath = "standard_normals_cv3heme.loess.txt", is.Reference = TRUE)

readu <- readSnpMatrix(filename = "CNSL047T.snp_pileup.gz", MandUnormal = TRUE, ReferencePileupFile = "standard_normals_cv3heme.snp_pileup.gz", ReferenceLoessFile = "standard_normals_cv3heme.loess.txt", useMatchedX = FALSE, refX=TRUE)

imputed patient sex from matched normal: Female
Best normal for autosomes: File1DP
Best normal for ChrX: RefFile30DP

There were 50 or more warnings (use warnings() to see the first 50)

warnings()
Warning messages:
1: In FindBestNormalParameters(tumor.loess, tumor.pileup, ... :
Overlap of loci between the two Loess dataframes
is less than defined MinOverlap fraction of 0.9

xx <- preProcSample(readu$rcmat, unmatched = F, ndepth = 50,het.thresh = 0.25, ndepthmax = 5000, spanT = readu$spanT, spanA=readu$spanA, spanX = readu$spanX, MandUnormal = TRUE)

oo <- procSample(xx,min.nhet = 10, cval = 150)
dlr <- oo$dipLogR

oo <- procSample(xx,min.nhet = 10, cval = 50, dipLogR = dlr)
fit <- emcncf(oo, min.nhet = 10)

plotSample(x=oo,emfit=fit, plot.type = "both")

CNSL047T_default_params.pdf

A few points:

  • The log-odds ratio across the genome (the second plot) shows an unexpected result that I would flag as a QC issue: there is a high degree of allelic imbalance across the entire genome, resembling the pattern typically seen when the tumor and normal samples come from different individuals, or when one or both of the tumor/normal samples are contaminated with DNA from a different individual.

Are the Normal and Tumor BAMs that generated CNSL047T.snp_pileup.gz from the same individual? If so, can you verify that these samples are not contaminated?

  • From your pileup data, it appears that your sample coverage depth is ~50X. The default parameter values in preProcSample() are meant for targeted sequencing with average coverage of ~1000X and a high on-target rate; they can be changed to better fit the input data.
  • The warning "Overlap of loci between the two Loess dataframes is less than defined MinOverlap fraction of 0.9" implies that there is a significant difference in coverage distributions across the genome between the standard normals that were used and the Normal sample in CNSL047T.snp_pileup.gz.
  • Lastly, I noticed that the threshold for imputing sample sex is hardcoded for targeted sequencing data and may be unreliable for WES data. I would interpret chrX with caution or use refX=FALSE. I will fix this in the next release.
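
To make the MinOverlap warning more concrete, here is a minimal base-R sketch of what an overlap fraction between two sets of loci looks like. This is not the facets2n implementation: the helper `locus_overlap_fraction()`, the toy data frames, and the use of `Chromosome`/`Position` columns as locus keys are all assumptions made for illustration.

```r
# Hypothetical helper (not part of facets2n): fraction of loci in `query`
# that are also present in `reference`, keyed by chromosome and position.
locus_overlap_fraction <- function(query, reference) {
  query_keys <- paste(query$Chromosome, query$Position, sep = ":")
  reference_keys <- paste(reference$Chromosome, reference$Position, sep = ":")
  mean(query_keys %in% reference_keys)
}

# Toy data: four loci in the sample, two of which appear in the reference set.
sample_loci <- data.frame(
  Chromosome = c("1", "1", "2", "2"),
  Position = c(100, 200, 300, 400)
)
reference_loci <- data.frame(
  Chromosome = c("1", "1", "2"),
  Position = c(100, 200, 999)
)

overlap <- locus_overlap_fraction(sample_loci, reference_loci)
min_overlap <- 0.9  # same value as the MinOverlap quoted in the warning

if (overlap < min_overlap) {
  message(sprintf("Overlap %.2f is below MinOverlap %.2f", overlap, min_overlap))
}
```

Running a check like this on the loci of your own sample and reference files may help show how far the two coverage footprints diverge before you lower MinOverlap.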

Author

iS4i4S commented Aug 6, 2021

Thanks for the input.
Yes, indeed, the normal and tumor BAMs are not from the same individual: the samples I am trying to run are tumor-only (19 samples). I also have tumor-paired samples (95 samples) that I have already run through FACETS without problems. Neither sample is contaminated.

I tried reinstalling and reopening R in case there were environment inconsistencies, but it keeps throwing the error when reading the pileup.gz file.

Could you print your sessionInfo(), please?

Thanks

Owner

rptashkin commented Aug 6, 2021

Hi,
This tool, like FACETS, is designed for the analysis of matched tumor-normal samples to generate allele-specific copy number calls. With tumor-only data, you can set the 'unmatched' argument to TRUE in the calls to readSnpMatrix() and preProcSample(), but I would interpret those results with caution. The use of unmatched normal samples with FACETS2n is designed to improve log ratio plots and the accuracy of joint segmentation. Including session info below in case it helps:

R version 3.6.1 (2019-07-05)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: macOS Mojave 10.14.6

Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Users/ptashkir/anaconda2/envs/renv/lib/libopenblasp-r0.3.7.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] facets2n_0.3.0 DNAcopy_1.60.0 pctGCdata_0.3.0

loaded via a namespace (and not attached):
[1] compiler_3.6.1 tools_3.6.1

Author

iS4i4S commented Aug 6, 2021

Thanks a lot, will try it out.
