accuracy.MC error #2

Open
rmpeery opened this issue Jan 18, 2018 · 12 comments

@rmpeery

rmpeery commented Jan 18, 2018

I have successfully analyzed a dataset with 248 individuals and ca. 9700 loci in assignPOP using the kfold method. However, the MC method is giving an error that I'm hoping you can help with. I used the code from the vignette as a starting point:
assign.MC(referenceAlleles, train.inds=c(10, 15, 19), train.loci=c(0.1, 0.25, 0.5, 1), loci.sample="fst",
iterations=30, multiprocess = TRUE, model="lda", dir="MC/")
but when I run accuracy.MC(dir="MC2/") I get the error:
Error in `[<-.data.frame`(`*tmp*`, i, , value = c(0.333333333333333, 0.235294117647059, :
replacement has 6 items, need 7.

Do you have any suggestions on how I can fix the input file so that accuracy.MC will run? I'm running R v. 3.4.2 through RStudio v. 1.1.383.

@alexkychen
Owner

Hi, are you trying to run accuracy.MC on the assign.MC results saved in the folder "MC"? If so, you should specify the same folder name when running accuracy.MC, i.e., accuracy.MC(dir="MC/"). Please let me know if the problem is something else. Thanks :)
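
A quick way to rule out a folder mismatch before calling accuracy.MC is to check that the directory actually contains assign.MC output. The sketch below uses only base R. has_mc_output is a hypothetical helper name, not an assignPOP function, and the Out_*.txt pattern is an assumption based on the "Out_xx_xx_xx.txt" result files mentioned later in this thread.

```r
# Hypothetical helper (not part of assignPOP): TRUE if `dir` holds at least
# one assign.MC result file. The "Out_*.txt" naming is an assumption.
has_mc_output <- function(dir) {
  files <- list.files(dir, pattern = "^Out_.*\\.txt$")
  length(files) > 0
}

# Keep the folder name in one variable and pass the same value to
# assign.MC(dir = ...) and accuracy.MC(dir = ...).
out_dir <- "MC/"
if (!has_mc_output(out_dir)) {
  message("No Out_*.txt files found in ", out_dir, "; check the folder name.")
}
```

Storing the folder name once and reusing it avoids writing results to "MC/" and then reading from "MC2/".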

@rmpeery
Author

rmpeery commented Jan 19, 2018

I do not think the problem is specification of the directory. Here is my exact code and the terminal output. I think that assign.MC is working because check.loci works, but the accuracy.MC command is not. The input file does not seem to be an issue because analysis through the kfold method works as expected.

assign.MC(referenceAlleles, train.inds=c(10, 15, 19), train.loci=c(0.1, 0.25, 0.5, 1),
loci.sample="fst", iterations=30, multiprocess = TRUE, model="lda", dir="MC2/")

3 cores/threads of CPU will be used for analysis...
Monte-Carlo cross-validation done!!
360 assignment tests completed!!

accuRes_MC <- accuracy.MC(dir="MC2/")
Error in `[<-.data.frame`(`*tmp*`, i, , value = c(0.333333333333333, 0.235294117647059,
: replacement has 6 items, need 7

check.loci(dir = "MC2/", top.loci = 100)
3 levels of training individuals are found.
Which levels would you like to check? (separate levels by a whitespace if multiple)
Options: 10, 15, 19, or all

enter here: all

Results were saved in a 'High_Fst_Locus_Freq.txt' file in the directory.

@alexkychen
Owner

Hi, it looks like the problem may be related to the "Out_xx_xx_xx.txt" files in the folder. Could you copy and paste the first few rows of data, including the column names, from any of those files? How many populations do you have? If you have 3 populations, you should see 6 columns in your data. Columns are separated by a space. Thanks.
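
The column check described above can be automated with base R. This is only a sketch: check_out_columns is a hypothetical helper, and it assumes each result file has a header row, space-separated columns, and three leading columns followed by one column per population (so 3 populations give 6 columns).

```r
# Hypothetical helper (not part of assignPOP): report the number of columns
# in each Out_*.txt file and whether it matches 3 + n_pops.
check_out_columns <- function(dir, n_pops) {
  files <- list.files(dir, pattern = "^Out_.*\\.txt$", full.names = TRUE)
  expected <- 3L + as.integer(n_pops)
  n_cols <- vapply(files, function(f) {
    # Reading a few rows is enough to count columns.
    ncol(read.table(f, header = TRUE, sep = " ", nrows = 5,
                    stringsAsFactors = FALSE))
  }, integer(1))
  data.frame(file = basename(files),
             n_cols = unname(n_cols),
             ok = unname(n_cols == expected),
             stringsAsFactors = FALSE)
}

# Example call for a run with 7 populations:
# check_out_columns("MC2/", n_pops = 7)
```

Any row with ok equal to FALSE points to a file whose layout differs from what is expected.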

@rmpeery
Author

rmpeery commented Jan 19, 2018

There are 7 reference/training populations.

Ind.ID origin.pop pred.pop pop.1 pop.2 pop.3 pop.4 pop.5 pop.6 pop.7
Ind1 pop.1 pop.5 4.84706781346196e-54 4.57393762693888e-07 7.29146191461918e-31 3.64814949912892e-33 0.999999542606237 5.22110759170651e-85 1.71017166046101e-29
Ind2 pop.1 pop.4 4.93567729299893e-13 5.65857919194618e-66 9.18114277199036e-43 0.999999999977031 2.24752661106583e-11 5.05244282329466e-109 3.70833483677742e-45
Ind3 pop.1 pop.5 6.88643447363136e-32 1.98753423758658e-07 3.55556571792537e-19 1.62028194813181e-30 0.999999801246576 8.77961552598731e-80 1.81120047917388e-21
Ind4 pop.1 pop.3 3.13597690009681e-31 2.98045586414029e-07 0.99999531868614 8.45345513191409e-42 4.3832682738894e-06 6.45006618152228e-39 3.83912716502444e-17
Ind5 pop.1 pop.2 3.04788250036642e-74 1 9.83211838077908e-54 3.11838856058822e-70 2.71412666369024e-19 2.21277959386241e-116 3.29727224458226e-25
Ind6 pop.1 pop.5 3.19409244680128e-06 1.49086225660614e-29 4.60375458140432e-31 6.5782822333848e-35 0.999996805907426 1.71311828788165e-40 1.27236017210606e-13
Ind7 pop.1 pop.3 3.04101345149839e-44 0.121223029730926 0.878768920984708 2.67448709614046e-59 1.19263049028514e-25 2.52802861980595e-80 8.04928436651628e-06

@alexkychen
Owner

Your populations, sample IDs, and column names seem to be correct. I've manipulated my own data to run accuracy.MC, but I still cannot reproduce the error message you are getting. Would you mind sending me your zipped MC2 folder? I can run it on my side and take a closer look at where the problem is. Please email me at [email protected]. Meanwhile, if you haven't done so already, could you download the example data (simGenepop.txt) and give it a quick run on your computer to see if it works? Thanks so much.

@allanbcostello

I have a similar error message:

> assign.MC(SDATA, dir="Result-folder/", train.inds=c(0.5,0.7,0.9),
+ train.loci=c(0.1,0.25,0.5,1), loci.sample="fst", iterations=30,
+ model="svm")
3 cores/threads of CPU will be used for analysis...
Monte-Carlo cross-validation done!!
360 assignment tests completed!!
> accuMC <- accuracy.MC( dir = "Result-folder/" )
Error in `[<-.data.frame`(`*tmp*`, i, , value = c(0, 0, 0, 0, 0, 0, 0, :
replacement has 16 items, need 20

@allanbcostello

One other thing... my genepop data is in 3-digit fragment size format as opposed to allele number format, if that is relevant. I ran the test data set you mention above and that works just fine.

@alexkychen
Owner

Hi, how many samples/individuals do you have in each of the populations?
If one of your populations has 4 or fewer individuals, this error can occur when using train.inds=0.9 in your assign.MC analysis. In that case no individuals from the small populations (it looks like 4 of your 20 populations) are assigned to the test sets, because 4 x 0.9 = 3.6 is rounded up to 4. To fix it, you could either use a fixed number of training individuals (e.g., train.inds = 3) in assign.MC or increase the sample size of the small populations. Some people duplicate individuals in small populations, but that could inflate the results for those populations.
If your issue is something else, please let me know and I will investigate further. Thanks!

PS. The number of digits in your Genepop file should have nothing to do with the function accuracy.MC, but thanks for providing that information. Just be sure to set "haploid=TRUE" in read.Genepop if your data is haploid.
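
The rounding effect described above can be illustrated with a few lines of base R. This is a sketch under the assumption that the number of training individuals is round(population size x proportion); the exact rule inside assign.MC is not shown in this thread, and split_counts is a hypothetical helper.

```r
# Assumption: training count = round(population size * proportion).
# Hypothetical helper, not part of assignPOP.
split_counts <- function(pop_size, prop) {
  n_train <- round(pop_size * prop)
  data.frame(pop_size = pop_size, prop = prop,
             n_train = n_train, n_test = pop_size - n_train)
}

# A population of 4 ends up with 4 training and 0 test individuals at 0.9,
# while a population of 10 still keeps 1 individual for testing.
split_counts(pop_size = c(4, 10), prop = 0.9)
```

With a fixed count such as train.inds = 3, a population of 4 would instead leave 1 individual for the test set.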

@allanbcostello

allanbcostello commented Mar 21, 2018 via email

@alexkychen
Owner

alexkychen commented Mar 26, 2018

@rmpeery @allanbcostello
I have modified the function accuracy.MC and updated the package in this GitHub repo. It should now be able to handle a population that has no samples in your test sets, so the error you had should disappear. You can update/re-install the package from GitHub for now, or just copy the function from here and run it locally on your machine. The official version 1.1.5 will be released to CRAN later on. Please let me know whether it works. Thanks!
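
The general idea of tolerating populations that are absent from a test set can be sketched in base R as follows. This is not the package's actual code. per_pop_accuracy is a hypothetical function, and it assumes the origin.pop and pred.pop values shown in the output file earlier in this thread. Fixing the full list of populations up front keeps the result the same length for every test, with NA where a population has no test individuals.

```r
# Hypothetical sketch (not the assignPOP implementation): per-population
# assignment accuracy that always returns one value per known population.
per_pop_accuracy <- function(origin, predicted, pops) {
  correct <- as.character(origin) == as.character(predicted)
  vapply(pops, function(p) {
    idx <- as.character(origin) == p
    # NA marks a population with no individuals in this test set.
    if (!any(idx)) NA_real_ else mean(correct[idx])
  }, numeric(1))
}

pops <- c("pop.1", "pop.2", "pop.3")
per_pop_accuracy(origin    = c("pop.1", "pop.1", "pop.2"),
                 predicted = c("pop.1", "pop.2", "pop.2"),
                 pops      = pops)
```

Because the result always has length(pops) entries, it can be written into a fixed-width row without a length mismatch.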

@TepoltC

TepoltC commented Mar 31, 2018

I got this same error yesterday, and in my case I think it was related to using a dash in one of my population names. (When I relabelled, it ran fine.) Thought I'd add this in case anyone else runs into the same issue.
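
One way to spot labels like this ahead of time is to compare each population name with its syntactically valid form from base R's make.names, which replaces characters such as dashes with dots. Whether this renaming is what triggers the error inside assignPOP is not confirmed in this thread, and flag_pop_names is a hypothetical helper.

```r
# Hypothetical helper (not part of assignPOP): flag population names that
# base R would alter when turning them into syntactic names.
flag_pop_names <- function(pop_names) {
  cleaned <- make.names(pop_names)
  data.frame(name = pop_names,
             suggested = cleaned,
             changed = cleaned != pop_names,
             stringsAsFactors = FALSE)
}

# Made-up example names; "north-east" is flagged because of the dash.
flag_pop_names(c("pop.1", "north-east", "popB"))
```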

Thanks for a really nice tool, and for providing such clear instructions!

@alexkychen
Owner

@TepoltC
Thank you for the heads-up. I will take a closer look when possible. Thanks also for your kind words!
