Specifying bloom filter size / OverflowError #10
Hi @eocampbe, this is likely an issue with the bloom filter size. I have just merged a pull request submitted by @rsharris that lets you specify the bloom filter size, which might be useful to you. To do so, please first perform a
@eocampbe IIRC, you'll want to specify a bloom filter size that is about the expected length of your genome minus repeats, i.e., the number of distinct kmers you expect in your input data. The only downside of setting it too high is that it will use more memory. I think the default value was about 3G, which relates to the human genome size (but doesn't adjust downward for repeat content), and the corresponding bloom filter data structure was something like 5G bytes.
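The sizing advice above follows from the standard Bloom filter formulas: for n distinct items and a target false-positive rate p, the optimal number of bits is m = -n ln(p) / (ln 2)^2, and the optimal number of hash functions is (m / n) ln 2. The sketch below is only an illustration, not discoverY's code. The function names are hypothetical, and the 0.1% false-positive rate is an assumption, since the thread does not say which rate discoverY uses. The set-based k-mer counter is only suitable for toy inputs; for real data, a dedicated k-mer counter such as DSK is the practical way to estimate the number of distinct kmers.

```python
import math


def distinct_kmers(seq: str, k: int) -> int:
    """Count distinct k-mers exactly using a set (toy-sized inputs only)."""
    seq = seq.upper()
    return len({seq[i:i + k] for i in range(len(seq) - k + 1)})


def bloom_bits(n_items: int, fp_rate: float) -> int:
    """Optimal bit count for a standard Bloom filter: m = -n ln(p) / (ln 2)^2."""
    return math.ceil(-n_items * math.log(fp_rate) / (math.log(2) ** 2))


def bloom_hashes(n_items: int, m_bits: int) -> int:
    """Optimal number of hash functions: k = (m / n) ln 2, at least 1."""
    return max(1, round(m_bits / n_items * math.log(2)))


if __name__ == "__main__":
    # Tiny example: ACGT, CGTA, GTAC, TACG are the only distinct 4-mers here.
    print(distinct_kmers("ACGTACGTAC", 4))

    # Assumed capacity of ~3 billion kmers at an assumed 0.1% false-positive rate.
    n = 3_000_000_000
    m = bloom_bits(n, 0.001)
    print(f"{m / 8 / 1e9:.2f} GB, {bloom_hashes(n, m)} hash functions")
```

Under these assumptions, 3 billion items come to roughly 5.4 GB and 10 hash functions, which is in the same ballpark as the ~5G bytes recalled above. Memory scales linearly with capacity, so lowering the capacity toward your genome's distinct-kmer count shrinks the filter proportionally.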
Hi again @md5sam and @rsharris, I am now getting another issue when I try to run discoverY.py. When I use the basic command with either a female bloom filter I created or the provided example data, like this:
I get the following error:
Any ideas as to what might be causing this?
I'm sorry, that was my mistake. I'll make a correction in my fork and open a pull request, though I'm not the owner of this repo. If you want to get up and running right away, the change is to add "bf_capacity = None" after line 43 in discoverY.py, so that it looks like this:
Be sure to use 8 spaces in front of "bf_capacity", not tab characters.
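Because Python 3 rejects inconsistent mixing of tabs and spaces in indentation, a quick check for stray tabs after hand-editing a script can save a round trip. This is a minimal sketch with a hypothetical helper name (it is not part of discoverY). It reports every line whose leading whitespace contains a tab. The sample lines in the demo are made up.

```python
def tab_indented_lines(lines):
    """Return (line_number, text) for each line whose indentation contains a tab.

    `lines` can be any iterable of strings, such as an open file object.
    """
    hits = []
    for lineno, line in enumerate(lines, start=1):
        stripped = line.lstrip(" \t")
        indent = line[: len(line) - len(stripped)]  # leading whitespace only
        if "\t" in indent:
            hits.append((lineno, line.rstrip("\n")))
    return hits


if __name__ == "__main__":
    # Made-up sample: line 2 uses 8 spaces (fine), line 3 uses a tab (flagged).
    sample = [
        "def main():\n",
        "        bf_capacity = None\n",
        "\tbf_capacity = None\n",
    ]
    for lineno, text in tab_indented_lines(sample):
        print(f"line {lineno}: tab in indentation: {text!r}")
```

To check the real script, pass an open file, e.g. `with open("discoverY.py") as fh: print(tab_indented_lines(fh))`. The standard library's `tabnanny` module (`python -m tabnanny discoverY.py`) performs a related check for ambiguous tab/space mixing.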
Great, thanks! I've added that line and it seems to be working now. |
Thanks @rsharris, I've now merged your PR.
Hi there,
I am relatively new to Python and am trying to run discoverY.py in female+male mode using male_contigs.fasta, kmers_from_male_reads, and a female reference assembly (female.fasta). I am running Python 3.7.4, and all the dependencies are installed properly. I created the kmers_from_male_reads file using DSK as described in the README, and the command I used to run discoverY.py is:
python discoverY.py --mode female+male --kmer_size 25
When I run this, I get this output:
I'm having trouble figuring out how to fix this. Is the line "Please set bloom filter size before running this program" the source of the error? I can't figure out how to specify the bloom filter size: there appears to be no option for it, and I can't find any documentation about it in the README. Or is this primarily a memory issue, as the OverflowError suggests? Any help would be much appreciated!