Error in type.convert.default(data[[i]], as.is = as.is[i], dec = dec, : long vectors not supported yet: ../../../include/Rinlinedfuns.h:537 #50

Open
jcaccavo opened this issue Jul 11, 2023 · 4 comments

@jcaccavo

Hi there,

I got the following error when trying to run the fit_LDdecay.R script:
Rscript --vanilla --slave /srv/public/users/jcaccavo/11_CCGA_full_seq/02_NovaSeq/02_WG/x_scripts/ngsLD/scripts/fit_LDdecay.R --n_ind=43 --plot_scale=4 --ld_files ld_files_noDS.list --out dmawsoni2021_GL1_doMaf3_doMajorMinor1_doGlf2_minMaf0.05_SNPpval1e6_minInd32_NR_depth_LD_decay1.pdf

Random seed: 41963
Warning message:
In read.table(opt$ld_files, header = opt$header, stringsAsFactors = FALSE) :
  incomplete final line found by readTableHeader on 'ld_files_noDS.list'
==> Fitting r2 LD decay assuming a one (rate of decay) parameter decay model
Error in type.convert.default(data[[i]], as.is = as.is[i], dec = dec,  :
  long vectors not supported yet: ../../../include/Rinlinedfuns.h:537
Calls: read.table -> type.convert -> type.convert.default
Execution halted

I'm using R/4.2.2.

I get this same error running the fit_LDdecay.R script on 2 other ngsLD outputs. Interestingly, for 1 ngsLD output, I do not get an error and am able to create the decay plot without any issues.

I wonder if it is a file size issue? The file sizes for the ngsLD outputs are as follows:

724G	dmawsoni2021_GL1_doMaf3_doMajorMinor1_doGlf2_minMaf0.05_SNPpval1e6_minInd11_NR_depth_DS10X_LD
374G	dmawsoni2021_GL1_doMaf3_doMajorMinor1_doGlf2_minMaf0.05_SNPpval1e6_minInd18_NR_depth_DS5X_LD
52G*	dmawsoni2021_GL1_doMaf3_doMajorMinor1_doGlf2_minMaf0.05_SNPpval1e6_minInd29_NR_depth_DS2X_LD
333G	dmawsoni2021_GL1_doMaf3_doMajorMinor1_doGlf2_minMaf0.05_SNPpval1e6_minInd32_NR_depth_LD

The file size with the asterisk (52G) is the only one that works.
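
The file-size suspicion can be sanity-checked with rough arithmetic. R reports "long vectors not supported yet" when a function without long-vector support is handed a vector longer than 2^31 - 1 elements, so one plausible reading of the error is that the failing files have more rows than that. The sketch below assumes about 84 bytes per row (counted from the first DS10X_LD row in the heads pasted in this post, newline included) and assumes the "G" sizes are GiB. Both are assumptions, not measurements from this thread.

```shell
#!/usr/bin/env bash
# ASSUMPTION: ~84 bytes per row, counted from one sample row; real rows vary a little.
bytes_per_line=84
# Largest length of an ordinary (non-long) R vector: 2^31 - 1
r_max_len=2147483647

# Estimate the row count of a file from its size in GiB (ASSUMPTION: "G" means GiB)
est_rows() {
    echo $(( $1 * 1024 * 1024 * 1024 / bytes_per_line ))
}

# The four file sizes listed above
for size_gib in 724 374 52 333; do
    rows=$(est_rows "$size_gib")
    if [ "$rows" -gt "$r_max_len" ]; then
        verdict="exceeds"
    else
        verdict="is below"
    fi
    echo "${size_gib}G: ~${rows} rows, ${verdict} 2^31-1"
done
```

Under these assumptions only the 52G file stays below the limit, which matches the one file that works. An exact count with wc -l on the real files would be needed to confirm this.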

Below are the heads of the 4 LD files (the 3 that don't work and the 1 that works). The 4 input .list files (plain text files, each containing the name of the input file for the script) can be downloaded from my Dropbox.

If you have any advice as to how I might be able to generate decay plots for these 3 ngsLD outputs, or if you require any further information, please let me know.

Thanks,
Jilda

# the DS10X_LD dataset does not work
==> dmawsoni2021_GL1_doMaf3_doMajorMinor1_doGlf2_minMaf0.05_SNPpval1e6_minInd11_NR_depth_DS10X_LD <==
HiC_scaffold_10:52562	HiC_scaffold_10:52709	147	0.994830	0.062234	1.000000	0.999999
HiC_scaffold_10:52422	HiC_scaffold_10:52429	7	0.999931	0.062214	1.000000	1.000000
HiC_scaffold_10:52430	HiC_scaffold_10:52562	132	0.998754	0.062226	1.000000	1.000000
HiC_scaffold_10:52429	HiC_scaffold_10:52430	1	0.999925	0.062214	1.000000	1.000000
HiC_scaffold_10:50950	HiC_scaffold_10:51186	236	0.987818	0.062257	1.000000	1.000000
HiC_scaffold_10:51493	HiC_scaffold_10:51494	1	0.999926	0.067308	0.999994	0.999986
HiC_scaffold_10:51186	HiC_scaffold_10:51289	103	0.015024	-0.004266	1.000000	0.004883
HiC_scaffold_10:52709	HiC_scaffold_10:53129	420	0.961784	0.062310	1.000000	0.999993
HiC_scaffold_10:53139	HiC_scaffold_10:53143	4	0.025715	-0.005257	1.000000	0.006112
HiC_scaffold_10:52202	HiC_scaffold_10:52352	150	0.998818	0.062218	1.000000	1.000000

# the DS5X_LD dataset does not work
==> dmawsoni2021_GL1_doMaf3_doMajorMinor1_doGlf2_minMaf0.05_SNPpval1e6_minInd18_NR_depth_DS5X_LD <==
HiC_scaffold_1093:2618831	HiC_scaffold_1093:2618842	11	0.139283	-0.064876	0.999946	0.137370
HiC_scaffold_10:52107	HiC_scaffold_10:52108	1	0.419782	0.058858	0.864473	0.566208
HiC_scaffold_10:52534	HiC_scaffold_10:52536	2	0.996941	0.070424	0.999987	0.999971
HiC_scaffold_10:52530	HiC_scaffold_10:52531	1	0.990613	0.070602	0.999993	0.999982
HiC_scaffold_10:52519	HiC_scaffold_10:52521	2	0.986029	0.089175	0.999999	0.999986
HiC_scaffold_10:52530	HiC_scaffold_10:52534	4	0.915120	0.070450	0.999997	0.999983
HiC_scaffold_10:52283	HiC_scaffold_10:52291	8	0.980522	0.071732	0.999997	0.999991
HiC_scaffold_10:54168	HiC_scaffold_10:54170	2	0.390106	0.071936	0.998830	0.996375
HiC_scaffold_10:52530	HiC_scaffold_10:52536	6	0.918061	0.070397	0.999998	0.999984
HiC_scaffold_10:52291	HiC_scaffold_10:52292	1	0.877743	0.074759	1.000000	0.999993

# the DS2X_LD dataset DOES work
==> dmawsoni2021_GL1_doMaf3_doMajorMinor1_doGlf2_minMaf0.05_SNPpval1e6_minInd29_NR_depth_DS2X_LD <==
HiC_scaffold_10:79542	HiC_scaffold_10:79543	1	0.793716	0.071503	1.000000	0.999345
HiC_scaffold_10:78110	HiC_scaffold_10:78111	1	0.996641	0.044456	0.999972	0.999935
HiC_scaffold_10:79542	HiC_scaffold_10:79544	2	0.999996	0.056655	0.999909	0.999817
HiC_scaffold_10:78112	HiC_scaffold_10:78113	1	0.924173	0.043201	0.999983	0.999952
HiC_scaffold_10:79542	HiC_scaffold_10:79545	3	0.936968	0.056903	0.999909	0.999794
HiC_scaffold_10:78113	HiC_scaffold_10:78119	6	0.877176	0.043487	0.999982	0.999943
HiC_scaffold_10:79542	HiC_scaffold_10:79548	6	0.988844	0.056420	0.999907	0.999814
HiC_scaffold_10:78108	HiC_scaffold_10:78109	1	0.913702	0.044911	0.999965	0.999909
HiC_scaffold_10:78111	HiC_scaffold_10:78112	1	0.999846	0.044405	0.999973	0.999945
HiC_scaffold_10:78102	HiC_scaffold_10:78108	6	0.877358	0.044839	0.999957	0.999900

# the LD dataset does not work
==> dmawsoni2021_GL1_doMaf3_doMajorMinor1_doGlf2_minMaf0.05_SNPpval1e6_minInd32_NR_depth_LD <==
HiC_scaffold_10:60766	HiC_scaffold_10:60767	1	0.995350	0.044978	0.999946	0.999871
HiC_scaffold_10:54182	HiC_scaffold_10:54183	1	0.975895	0.068707	0.999949	0.999897
HiC_scaffold_10:60759	HiC_scaffold_10:60765	6	0.994958	0.046003	0.999965	0.999926
HiC_scaffold_10:53894	HiC_scaffold_10:53896	2	0.608587	0.043776	0.774698	0.577412
HiC_scaffold_10:51123	HiC_scaffold_10:51124	1	0.999997	0.048738	0.999996	0.999992
HiC_scaffold_10:59207	HiC_scaffold_10:60759	1552	0.000472	0.001614	0.036713	0.000302
HiC_scaffold_10:53894	HiC_scaffold_10:54168	274	0.011254	-0.004288	0.999751	0.004911
HiC_scaffold_10:60759	HiC_scaffold_10:60766	7	0.938774	0.045220	0.999980	0.999940
HiC_scaffold_10:51626	HiC_scaffold_10:52036	410	0.085267	0.136073	0.673758	0.330037
HiC_scaffold_10:56660	HiC_scaffold_10:59207	2547	0.000216	-0.000725	0.025851	0.000027
@fgvieira
Owner

What are the contents of the file ld_files_noDS.list?
R gives a warning when reading it:

Random seed: 41963
Warning message:
In read.table(opt$ld_files, header = opt$header, stringsAsFactors = FALSE) :
  incomplete final line found by readTableHeader on 'ld_files_noDS.list'

@jcaccavo
Author

Thanks for your response!

The file command classifies ld_files_noDS.list as "ASCII text, with no line terminators", as it does all of my .list input files for the fit_LDdecay.R script. I have no problem plotting the LD decay with this R script for ld_files_2X.list, but the other 3 (ld_files_10X.list, ld_files_5X.list, ld_files_noDS.list) all produce the error shown above.
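
The readTableHeader warning is what R's read.table emits when the last line of a file has no terminating newline, which fits the "no line terminators" report. Since ld_files_2X.list has the same property and works, this is probably only a warning and not the cause of the error, but it is easy to silence. A minimal sketch, using a hypothetical demo.list in a temporary directory in place of the real files:

```shell
#!/usr/bin/env bash
# demo.list is a hypothetical stand-in for one of the .list files
tmpdir=$(mktemp -d)
list="$tmpdir/demo.list"
printf '%s' 'some_ngsLD_output_LD' > "$list"   # written without a final newline

# Command substitution strips trailing newlines, so $(tail -c 1 ...) is empty
# when the last byte is a newline and non-empty when the newline is missing.
if [ -n "$(tail -c 1 "$list")" ]; then
    echo >> "$list"   # append the missing newline
fi

# $(( ... )) normalizes any padding that wc may print
lines=$(( $(wc -l < "$list") ))
echo "lines: $lines"
```

After the fix, wc -l counts the single path as one complete line.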

Each .list file contains the name of the input file for the R script, and that input file is the output of ngsLD. The file named in ld_files_noDS.list (dmawsoni2021_GL1_doMaf3_doMajorMinor1_doGlf2_minMaf0.05_SNPpval1e6_minInd32_NR_depth_LD) was produced by running ngsLD with the following command:

/srv/public/users/jcaccavo/11_CCGA_full_seq/02_NovaSeq/02_WG/x_scripts/ngsLD/ngsLD --geno /srv/public/users/jcaccavo/11_CCGA_full_seq/02_NovaSeq/02_WG/15_angsd/dmawsoni2021_GL1_doMaf3_doMajorMinor1_doGlf2_minMaf0.05_SNPpval1e6_minInd32_NR_depth.beagle.gz --probs --n_ind 43 --n_sites 5044175 --n_threads 40 --max_kb_dist 100 --min_maf 0.05 --seed 1 --posH dmawsoni2021_GL1_doMaf3_doMajorMinor1_doGlf2_minMaf0.05_SNPpval1e6_minInd32_NR_depth_SNPs_pos.txt --out dmawsoni2021_GL1_doMaf3_doMajorMinor1_doGlf2_minMaf0.05_SNPpval1e6_minInd32_NR_depth_LD

Thank you again for your help, and please let me know if I addressed your question, or if there is more/different information that I can provide.

@fgvieira
Owner

Can you send me a small example file so I can try to reproduce the error?
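
One way to cut a small example from a very large LD file is to take its first rows with head. The sketch below runs on a synthetic stand-in file (demo_LD and its contents are made up for illustration, with the same seven tab-separated columns as the heads shown in this thread). If the error really depends on the number of rows, a small excerpt like this would likely not reproduce it.

```shell
#!/usr/bin/env bash
tmpdir=$(mktemp -d)
ld="$tmpdir/demo_LD"

# Build a synthetic 5000-row, tab-separated stand-in for an ngsLD output file
awk 'BEGIN {
    OFS = "\t"
    for (i = 1; i <= 5000; i++)
        print "scaf:" i, "scaf:" (i + 1), 1, "0.500000", "0.050000", "1.000000", "0.500000"
}' > "$ld"

# Keep only the first 1000 rows as the small example
head -n 1000 "$ld" > "$ld.head1000"

n_small=$(( $(wc -l < "$ld.head1000") ))
echo "example rows: $n_small"
```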

@jcaccavo
Author

Apologies for the delayed response.

Since I fear the issue may be related to the size of my input LD files, I wonder whether it would be best to work with the original files, if possible. For one of the 3 datasets that fail with the fit_LDdecay.R script, I've zipped the input LD file, which is still 84 GB, and you can download it from here.

Here is the list file I am using as input to the fit_LDdecay.R script, which simply gives the location of the input LD file mentioned above.

If it's not possible to download or work with these files, it would be great if you could suggest an alternative way forward.
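
One possible way forward, offered as an untested suggestion and not something proposed in this thread, is to fit the decay on a random subsample of rows so that the table R has to read stays well below 2^31 - 1 rows. The sketch below keeps each row with probability p using awk. The probability, the seed, and the synthetic demo_LD file are all illustrative assumptions, and whether a subsample is adequate for a given decay fit is a judgement call.

```shell
#!/usr/bin/env bash
tmpdir=$(mktemp -d)
ld="$tmpdir/demo_LD"

# Build a synthetic 10000-row, tab-separated stand-in for an ngsLD output file
awk 'BEGIN {
    OFS = "\t"
    for (i = 1; i <= 10000; i++)
        print "scaf:" i, "scaf:" (i + 1), 1, "0.500000", "0.050000", "1.000000", "0.500000"
}' > "$ld"

# Keep each row independently with probability p; a fixed seed makes the
# sample repeatable for a given awk implementation
awk -v p=0.1 -v seed=1 'BEGIN { srand(seed) } rand() < p' "$ld" > "$ld.sub"

total=$(( $(wc -l < "$ld") ))
kept=$(( $(wc -l < "$ld.sub") ))
echo "kept $kept of $total rows"
```

The subsampled file would then be the one named in the .list file passed to --ld_files.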

Thanks so much for your help!
