Core dumped #2
Hi,

thank you for this significantly faster pruning option. Unfortunately I have problems on ~~larger files (or too many lines?)~~ some but not all files, and they usually end up with a dumped core.

This is a sample command:

```
~/scripts/prune_graph/target/release/prune_graph --in snps.ld.chr1.trunc --weight-field column_7 --weight-filter "column_3 <= 50000 && column_7 >= 0.1" --out chr1.unlinked.pos
```

This is the error:

```
thread 'main' has overflowed its stack
fatal runtime error: stack overflow
```

I use ngsLD files as input, either unedited or with only the relevant columns:

```
chr1:29495 chr1:29651 156 0.093005 -0.020154 1.000000 0.027559
chr1:30862 chr1:32349 1487 0.065533 -0.012921 0.999999 0.016464
chr1:25380 chr1:27944 2564 0.633290 0.075310 0.999990 0.724847
chr1:32386 chr1:32677 291 0.996459 0.100407 1.000000 1.000000
chr1:30862 chr1:32386 1524 0.891505 0.092751 1.000000 0.914650
chr1:25380 chr1:29403 4023 0.123450 0.063818 0.999946 0.167704
```

The system has 1 TB+ RAM and 64 cores. Dependencies were installed with conda. I have no real knowledge of Rust, so there's not much I can add here or try to fix myself.

Best regards,
David

Comments
Do you have a small example file so I can replicate the error?
How can I send you a file?
Some services allow you to just create a link (e.g. https://www.transfernow.net/en).
Here is the link: […] It's an ngsLD output file, with only the first 3 columns and the 7th column. And these are only the first 150M lines.
Hi there, I'm experiencing the same issue. I also used the ngsLD output file as input to prune_graph, unedited. This is the command I used: […] This is the error I get: […]
A zipped version of the ngsLD output file that I am using as input can be downloaded from my dropbox. The file is 12.72 GB. The head of the file looks as follows: […]
I am running on an HPC, and I also installed dependencies using conda. Similarly to David, I am not familiar with Rust, and so it is not clear to me how to resolve this issue. Thanks in advance for your help.
I'm trying to look into this issue, but I have been a bit busy. It seems to be a memory issue when running prune_graph.
I totally understand - thanks for your efforts, and I'll stay tuned in case a solution does arise.
It seems it was because of petgraph/petgraph#399. |
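As a general note: since the error says the main thread's stack overflowed, a common stopgap on Linux (independent of the petgraph issue referenced above, and not the actual fix) is to raise the shell's stack size limit before re-running the same command from the original report:

```bash
# Generic workaround sketch, not the upstream fix: raise the OS stack limit,
# which governs the main thread's stack, then re-run the reported command.
ulimit -s unlimited
~/scripts/prune_graph/target/release/prune_graph \
  --in snps.ld.chr1.trunc \
  --weight-field column_7 \
  --weight-filter "column_3 <= 50000 && column_7 >= 0.1" \
  --out chr1.unlinked.pos
```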
Hi there, thanks so much for pursuing this issue, and apologies for my delayed feedback. I tried running prune_graph on the smallest of my 4 datasets (52 GB).
This worked, but it took a whopping 10 days to run on our server! Using the perl script (prune_graph.pl) to identify unlinked SNPs from the same LD file, it took <24 hours. It seems strange that prune_graph, which should be more efficient than the perl script, takes 10x as long. I think there may be an issue in either the code or the parameters I used, because after trimming the linked SNPs from my original beagle file using the unlinked SNP ID file generated from prune_graph, I get far fewer SNPs remaining in my dataset (5,090) than when I used the perl script (219,149). Using the perl script, I entered the following code and parameters: […]
I think an issue might be that with the perl script I can modify max_kb_dist, whereas with prune_graph I don't see a way to do that. In short: […]
So my questions are: […]
Thanks in advance for your feedback! Best,
No.
Yes, but it depends on your input file. If the file is the same as above, then the […]
This should only keep pairs of sites with a distance lower than the specified maximum. In the future, you can run […] EDIT: the outputs from the two methods will not be exactly the same.
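For illustration, a command along these lines should do it. The flags match the sample command from the original report, while the file names and thresholds (20 kb, r² ≥ 0.5) are placeholder values:

```bash
# Sketch with placeholder names/thresholds: keep only pairs of sites
# within 20 kb (column 3) and with r^2 >= 0.5 (column 7) before pruning.
prune_graph --in snps.ld \
  --weight-field column_7 \
  --weight-filter "column_3 <= 20000 && column_7 >= 0.5" \
  --out snps.unlinked.pos
```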
Thank you so much for this feedback! I indeed used your suggestions to run the following code, which took 15 hours to run: […]
After trimming my original beagle.gz file based on the linked SNPs identified in this […]
I saw in the edit to your response that the outputs will not be exactly the same from the two methods. It's true that the order of magnitude is the same (1 million/1 million), but I just wanted to be sure that this discrepancy of 589,222 SNPs is within the expected range, and not indicative of something wrong. Thanks so much for your help!
Glad to hear things are starting to make sense! 😄 Just for the sake of completeness: […]
The two outputs are not expected to be exactly the same due to precision issues, but the difference should not be that large. EDIT: have you also tried the python script? Do you get the same?
Thanks as always for your patience in working with me to get prune_graph to work with my data. I just re-ran prune_graph on the same dataset with the min_weight parameter modified from […]
The resultant number of unlinked SNPs is now 488,972, down from 808,371 with […] How would you recommend checking for common SNPs between two beagle.gz files? I did some searching about how to do this but didn't find any great answers. I tried zgrep -Fxf FILE1.beagle.gz FILE2.beagle.gz, but it was taking over an hour so I just killed the process. Then I tried zgrep -c FILE1.beagle.gz FILE2.beagle.gz, and I got 0, which doesn't make sense. Unfortunately I never managed to run the python script. There was a required module (graph-tool) that I tried installing with conda, but the environment could never be solved during installation, so I gave up and went back to trying to use prune_graph.
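For reference, graph-tool is distributed via the conda-forge channel; installing it into a fresh environment (a sketch, with a hypothetical environment name) often avoids unsolvable-environment errors:

```bash
# graph-tool lives on conda-forge; a clean environment avoids solver conflicts.
conda create -n graphtool -c conda-forge graph-tool
```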
I'd say that double the number is a bit too much... Just compare the output files from both methods (no need to compare the beagles): […]
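For example, something along these lines counts the IDs shared by the two outputs (a sketch: the file names are placeholders for the two one-ID-per-line SNP lists). The earlier zgrep attempts likely failed because -f treats the compressed beagle file as a list of literal patterns, and in the second command the first file name itself is used as the search pattern, hence the 0:

```bash
# Count SNP IDs present in both pruning outputs (placeholder file names).
comm -12 <(sort unlinked_perl.id.txt) <(sort unlinked_rust.id.txt) | wc -l
```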
Ah of course, that makes sense. Using […], 219,136 SNPs were identified as shared, which is nearly every single unlinked SNP identified with the perl script (219,149). It seems then that […] I'm attaching the two output files: dmawsoni2021_GL1_doMaf3_doMajorMinor1_doGlf2_minMaf0.05_SNPpval1e6_minInd29_NR_depth_DS2X_unlinked_20_0.5.id.txt generated with the […]
Did you do […]? The […] Do you think you could generate a smaller dataset reproducing this issue? Debugging a 12.7 GB file is kind of hard... 😄
I had used […] Remaining unlinked SNPs: […]
So indeed, the slight tweaks of the […] I ran a similar set of analyses on another (larger) dataset, and the results were as follows: Remaining unlinked SNPs: […]
Again with this dataset, the […] I am happy to move forward using […]
I would be happy to provide a smaller dataset to reproduce this issue, I'm just not sure how to go about doing this. I produced the LD input files for the […] Thanks so much for all of your help.
Thanks for looking so thoroughly into this. 😄 It would be normal for the numbers to differ a bit (due to precision issues), but the difference should not be that big. Can you send me a smaller file where I can reproduce the same pattern? A smaller dataset would be ideal but, if not possible, just get the first (e.g.) 10,000 lines from the beagle file and check if you see the same pattern.
Thanks for the advice for producing the smaller file. Of my 4 datasets, I took the smallest one and extracted the first 10,000 lines from the beagle file as follows: […]
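One way to do this (a sketch with placeholder file names; head -n 10001 keeps the beagle header line plus 10,000 data lines):

```bash
# Placeholder names: 1 header line + first 10000 data lines.
zcat all_sites.beagle.gz | head -n 10001 | gzip > mini.beagle.gz
```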
You can download that 'mini' beagle file from my Dropbox here (the file is 11 MB). Let me know if that works, or if there's anything else I can provide that would help you reproduce this pattern. Thanks!
Tried running your file as: […]
And got exactly the same result for all files: […]
Do you have a file where I can reproduce what you are seeing?