imputations_accuracy() lists all markers as not in common and drops all from analysis #9

pbpearman · 2018-12-20T18:58:41Z

Hi Thierry,
I have filtered a .vcf dataset so to create a version with low missingness of sites and samples. I then run the following lines to impute genotypes and output .rad and .tped files, with and without rf imputation:

test2 <- genomic_converter(data = "UMBELLA_Erumb1_samples_gt_90pct_snps_gt_40pct_covered.recode.vcf",output="plink",filename = "test_2",strata="strata_eu_40.tsv", imputation.method = "rf")

The run finishes with the following results section:

############################### RESULTS ###############################
Data format of input: vcf.file
Biallelic data
Number of common markers: 156
Number of chromosome/contig/scaffold: 113
Number of individuals 551
Number of populations 56

I then check the imputation accuracy, but all 156 markers are dropped, evidently for not being shared:

> imp_acc <- imputations_accuracy("test_2.rad","test_2_imputed.rad")

Data provided still contains missing genotypes,
accuracy will be mesured on common non-missing genotypes
Removing 156 markers not in common between datasets
$Information dropped from the analysis
$Information dropped from the analysis$markers.dropped
[1] "Erumb1_s01095803__85__621" "Erumb1_s00047668__13__4936"
[3] "Erumb1_s00648045__68__686" "Erumb1_s01308175__88__428"
[5] "Erumb1_s02281967__111__51" "Erumb1_s02289709__112__65" .....

Misclassification Error: by populations
A tibble: 0 x 2
... with 2 variables: POP_ID , ME

Misclassification Error: by individuals
A tibble: 0 x 2
... with 2 variables: INDIVIDUALS , ME

Misclassification Error: by markers
A tibble: 0 x 2
... with 2 variables: MARKERS , ME

Misclassification Error: overall
[1] NaN

I think this is a bug. Do you? If you want, I can provide you with the datasets. Just let me know.
Best,
Peter

The text was updated successfully, but these errors were encountered:

pbpearman · 2018-12-21T15:30:19Z

Hi again Thierry,
I ran the same procedure on the stickleback data from the tutorial. I imputed the missing genotypes again in genomic converter, obtaining the saved imputed and unimputed datasets. I then used them both with imputation_accuracy().

stickle <- genomic_converter("example_vcf2dadi_ferchaud_2015.vcf",
filename = "stickle1", 
strata="strata.stickleback.tsv", 
imputation.method = "rf")

I got:

#######################################################################
###################### grur: imputations_accuracy #########################
#######################################################################
Data provided still contains missing genotypes,
accuracy will be mesured on common non-missing genotypes
Removing 31790 markers not in common between datasets
Information dropped from the analysis
Information dropped from the analysis$markers.dropped
[1] "groupI__28362__23825290" "groupI__28366__24653314" "groupI__28366__24653352"
[4] "groupI__28366__24653353" "groupI__28382__25273727" "groupI__28450__28141141"
[7] "groupI__28450__28141145" "groupI__34525__6537935" "groupI__34525__6537939"
[10] "groupI__34525__6537953" "groupI__34525__6537980" "groupI__5674__12268369"
[13] "groupI__5674__12268370" "groupI__6034__17836877" "groupII__1632__17623945" ...

The original number of markers was 31802. I checked the values in the resulting object, 'stickle', and all classification errors for individuals, populations etc, were all listed as zero. I also tried using the imputed and unimputed datasets that are in the object produced by genomic_converter(). Again, about 99% of the markers were dropped. I examined the marker names of the two datasets. They are identical, as far as I can see.

Is this right? Why are almost all the markers showing as not in common among the two datasets? This is not the behavior I expected.
Best, Peter

thierrygosselin · 2019-03-15T17:06:09Z

Hi Peter, definitely a bug. This function was not tested with lots of datasets and it's in the treatment of markers names before and after imputations the problem.

thierrygosselin added the bug label Mar 15, 2019

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

imputations_accuracy() lists all markers as not in common and drops all from analysis #9

imputations_accuracy() lists all markers as not in common and drops all from analysis #9

pbpearman commented Dec 20, 2018 •

edited

Loading

pbpearman commented Dec 21, 2018

thierrygosselin commented Mar 15, 2019

imputations_accuracy() lists all markers as not in common and drops all from analysis #9

imputations_accuracy() lists all markers as not in common and drops all from analysis #9

Comments

pbpearman commented Dec 20, 2018 • edited Loading

pbpearman commented Dec 21, 2018

thierrygosselin commented Mar 15, 2019

pbpearman commented Dec 20, 2018 •

edited

Loading