Multiprocessing with apply_async does not work #33
Hi @lmdu, any idea what is actually happening here with the multiprocessing? I am writing a paper for my tool, which uses pyfastx and depends on parallelization. Any fix or suggestion to get this working would be really great!
I am so sorry. Pyfastx does not support pickling, so you cannot pass a Fasta object as a parameter to a multiprocessing worker. Implementing this is very complicated; moreover, I have not yet found a way to share file handles between processes. I will add pickle support in pyfastx v0.9.0.
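The pickling limitation can be reproduced without pyfastx at all: any object that wraps an open file handle behaves the same way. A minimal sketch, using a plain file object as a stand-in for a Fasta instance:

```python
import pickle

# A plain file object stands in here for any object wrapping an open
# file handle -- this is essentially why a pyfastx.Fasta instance
# cannot be pickled and sent to a worker process as an argument.
handle = open(__file__)

try:
    pickle.dumps(handle)
    picklable = True
except TypeError:
    # pickle raises TypeError: cannot pickle '_io.TextIOWrapper' object
    picklable = False

handle.close()
print(picklable)
```

The same TypeError surfaces from `AsyncResult.get()` when such an object is passed as an `apply_async` argument, since the pool pickles every task argument before dispatching it.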
OK, thank you for the information. But will there be a memory overhead if I create a Fasta object in every child process? Say my FASTA/FASTQ index is 40 or 50 GiB and I use 64 cores: if each process creates its own Fasta object, there will be a memory overhead, right?
There should be no memory overhead; pyfastx does not load the entire index into memory.
OK, I will check this and see whether each child process loads anything into memory when it creates its own Fasta/Fastq object.
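One way to run that check with only the standard library (on Linux, where `ru_maxrss` is reported in KiB) is to have each worker report its own resident set size; a rough sketch, with the Fasta creation step left as a comment since it depends on the real data:

```python
import multiprocessing as mp
import resource  # Unix-only module


def worker_rss(_):
    # In the real check, each worker would create its own
    # pyfastx.Fasta object here first, then report how much resident
    # memory this process actually used (KiB on Linux).
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss


if __name__ == "__main__":
    with mp.Pool(4) as pool:
        rss_kib = pool.map(worker_rss, range(4))
    print(rss_kib)  # one RSS figure per worker process
```

Comparing these figures with and without the Fasta creation step would show whether each child actually pays a per-process memory cost.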
Hi @lmdu, I tried the apply_async technique from the pyfastx documentation, re-creating the Fasta/Fastq object inside each worker process. There is no memory overhead, but only one or two of the 64 initiated processes actually run; the rest go into a sleep state. This does not really work for multiprocessing. I look forward to pyfastx v0.9.0. Thank you for your support and quick responses!
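For reference, the "re-create the object inside the worker" pattern looks roughly like the sketch below. To keep it self-contained it parses a small in-memory FASTA string instead of calling pyfastx; the record names and the `parse_fasta` helper are illustrative, not pyfastx API. The point is only the structure: a `Pool` initializer builds a per-process index, and `apply_async` tasks look sequences up by name instead of receiving the unpicklable object as an argument.

```python
import multiprocessing as mp

# Tiny in-memory stand-in for a FASTA file; in real code each worker
# would open its own pyfastx.Fasta index instead (an assumption here).
FASTA_TEXT = ">chr1\nACGT\n>chr2\nACGTACGT\n"


def parse_fasta(text):
    """Parse FASTA text into a {name: sequence} dict (illustrative helper)."""
    records, name = {}, None
    for line in text.splitlines():
        if line.startswith(">"):
            name = line[1:]
            records[name] = ""
        elif name:
            records[name] += line
    return records


_index = None  # per-process "index", set once in each worker


def init_worker():
    # Runs once in every worker process: build the index there instead
    # of pickling it across the process boundary.
    global _index
    _index = parse_fasta(FASTA_TEXT)


def seq_length(name):
    # Workers receive only a picklable name, never the index object.
    return name, len(_index[name])


if __name__ == "__main__":
    with mp.Pool(2, initializer=init_worker) as pool:
        results = [pool.apply_async(seq_length, (n,)) for n in ("chr1", "chr2")]
        lengths = dict(r.get() for r in results)
    print(lengths)  # {'chr1': 4, 'chr2': 8}
```

With this structure, each worker pays the index-opening cost once in `init_worker` rather than once per task.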
Hi @lmdu ,
Sorry again; I found one more issue while trying to parallelize and share the index across multiple processes.
Here is the code:
Is it no longer possible to share the Fasta object, the index, or the identifier objects across multiple processes?
Also, if I make the Fasta object and the identifier object in my code (the above is a dummy sample) global variables, only one process/core runs at a time (out of 64 cores in the real code) while the rest sit in a sleep state. Do you know why this is the behaviour?
Any help here would be great as well.
Thank you!
P.S:
OS: CentOS 7
Python: 3.7.7
pyfastx: 0.8.3