Parallelizing read grouping post alignment #23
Comments
Hi, Ram, Maybe you could try PBLAT or Nucmer with AlignGraph? The former is a parallelized version of BLAT, and the latter is much faster. Best,
Thanks for your response, Bao. I tried that, but for some reason AlignGraph seems to fall back to blat instead. When I run top to check on the processes, I can see that pblat is invoked before aligning a contig to the reference genome, but it then quickly changes to blat. I think something's off here, as the contigs_genome..psl.tmp._ files are empty. I'm using the latest version of pblat from https://github.com/icebert/pblat. Considering that their source is from a year ago, I think I'm using the right version, but there is no error message on the terminal, so I have no way to tell what's happening. What do you suggest? I've checked, and pblat is in my path.

In my run, the initial blat and bowtie runs finished in about a day and a half. It's the read grouping after alignment that has been going on for about five days, and at this rate it will take three more days to finish. It would be great if I could get pblat to work, as that would let the initial stages finish in a couple of hours. Perhaps in a later version read grouping could be parallelized as well, with a thread count that a user could specify. Even if I could only use four threads, it would be roughly three times faster. Just a suggestion :) Cheers,
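Since there is no error message on the terminal, one way to narrow this down is to confirm which executables are actually resolved from the PATH and which intermediate files came out empty. The following is a minimal sketch, not part of AlignGraph: the helper names, the `tmp` directory, and the `*.psl*` glob are assumptions chosen for illustration and should be adjusted to the actual run directory.

```python
import shutil
from pathlib import Path


def tool_on_path(name):
    """Return the resolved path of an executable, or None if it is not found."""
    return shutil.which(name)


def empty_files(directory, pattern):
    """List files under `directory` matching `pattern` that have zero bytes."""
    return sorted(
        p for p in Path(directory).glob(pattern)
        if p.is_file() and p.stat().st_size == 0
    )


if __name__ == "__main__":
    for tool in ("pblat", "blat"):
        print(tool, "->", tool_on_path(tool))
    # "tmp" and "*.psl*" are assumed names, not taken from AlignGraph itself.
    for path in empty_files("tmp", "*.psl*"):
        print("empty:", path)
```

Running pblat by hand on a single contig and checking its exit status would then show whether it crashes outside AlignGraph as well.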
Hi, Ram, If PBLAT switches to BLAT automatically, it means PBLAT ran into a problem and could not proceed (e.g. a crash). I guess PBLAT crashed after processing the first contig. So maybe all we can do is wait for a more stable PBLAT. Best,
That's probably it. Hopefully, their new version will fix this issue. Thanks,
Hi,
I'm running AlignGraph for one of my projects and it has been running for quite some time now. Upon closer inspection, I realized that the time-consuming step is where AlignGraph groups reads that map to reference contigs into separate files (tmp/_reads_genome* files). This step takes roughly four minutes per contig in my case. I have approximately 3000 contigs, which means AlignGraph will be at this stage for at least 200 hours. So I have a suggestion: could this step be parallelized? If AlignGraph could handle multiple instances of this sorting independently, I could use more threads and get past this step faster. I have at least ten reference-based assemblies to make, and I would like this step not to be the rate-limiting one.
Thank you very much,
Ram
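
To make the estimate and the suggestion concrete, here is a rough sketch in Python. It is not AlignGraph code: `group_reads` is a hypothetical stand-in for the per-contig grouping step, and the estimate assumes contigs are independent and split evenly across workers, which is the ideal case. Threads in Python only help when the work is I/O-bound, so this illustrates the scheduling idea rather than a drop-in implementation.

```python
from concurrent.futures import ThreadPoolExecutor


def estimate_hours(n_contigs, minutes_per_contig, workers=1):
    """Ideal wall-clock hours if contigs are independent and split evenly."""
    batches = -(-n_contigs // workers)  # ceiling division
    return batches * minutes_per_contig / 60


def group_reads(contig_id, alignments):
    """Hypothetical stand-in for the per-contig grouping step.

    `alignments` is a list of (read_id, contig_id) pairs. The real step
    writes reads to tmp/_reads_genome* files instead of returning them.
    """
    return contig_id, [read for read, contig in alignments if contig == contig_id]


def group_all(contig_ids, alignments, workers=4):
    """Run the per-contig step for several contigs concurrently."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(lambda c: group_reads(c, alignments), contig_ids)
        return dict(results)


if __name__ == "__main__":
    print(estimate_hours(3000, 4))             # serial estimate from the post
    print(estimate_hours(3000, 4, workers=4))  # ideal case with four workers
```

With the numbers above, the serial estimate is 3000 × 4 / 60 = 200 hours, and four evenly loaded workers would bring that to 50 hours at best; real speedups are usually lower because of shared disk I/O.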