Parallelizing read grouping post alignment #23

Open
ramprasadn opened this issue Jun 21, 2016 · 4 comments

Comments

@ramprasadn

Hi,

I'm running AlignGraph for one of my projects and it has been running for quite some time now. On closer inspection, I realized that the time-consuming step is where AlignGraph groups the reads that map to reference contigs into separate files (tmp/_reads_genome* files). This step takes roughly four minutes per contig in my case. I have approximately 3000 contigs, which means AlignGraph will be at this stage for at least 200 hours. So I have a suggestion: could this step be parallelized? If AlignGraph could handle multiple instances of this sorting independently, I could use more threads and get past this step faster. I have at least ten reference-based assemblies to make, and I would like this step not to be the rate-limiting one.

Thank you very much,
Ram
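
To make the suggestion concrete, here is a minimal sketch of per-contig grouping spread over a pool of workers, together with the arithmetic behind the 200-hour estimate. It is not AlignGraph's implementation: the function names, the `(read_id, contig)` input format, and the `reads_<contig>.txt` output layout are all assumptions made for illustration.

```python
import os
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def estimated_hours(n_contigs, minutes_per_contig, workers=1):
    """Ideal wall-clock hours if contigs are spread evenly over workers."""
    return n_contigs * minutes_per_contig / (60 * workers)


def group_by_contig(alignments):
    """Single pass over (read_id, contig) pairs: contig -> list of read IDs."""
    groups = defaultdict(list)
    for read_id, contig in alignments:
        groups[contig].append(read_id)
    return dict(groups)


def write_group(task):
    """Worker: write one contig's reads to its own file (hypothetical layout)."""
    out_dir, contig, read_ids = task
    path = os.path.join(out_dir, f"reads_{contig}.txt")
    with open(path, "w") as fh:
        fh.write("\n".join(read_ids) + "\n")
    return contig, len(read_ids)


def group_reads_parallel(alignments, out_dir, workers=4):
    """Group reads once, then write each contig's file on a worker thread."""
    groups = group_by_contig(alignments)
    tasks = [(out_dir, contig, reads) for contig, reads in sorted(groups.items())]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Each contig is independent, so the tasks need no coordination.
        return dict(pool.map(write_group, tasks))
```

With the numbers above, `estimated_hours(3000, 4)` gives 200.0 and `estimated_hours(3000, 4, workers=4)` gives 50.0. That is the ideal case; real speedup would be lower once disk contention and scheduling overhead are counted. Threads suit the file-writing part; if the per-contig work is CPU-bound, a process pool would be the more likely fit.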

@baoe
Owner

baoe commented Jun 21, 2016

Hi, Ram,

Maybe you could try PBLAT or Nucmer for AlignGraph? The former is the parallelized version of BLAT and the latter is much faster.

Best,
Bao


@ramprasadn
Author

ramprasadn commented Jun 21, 2016

Thanks for your response, Bao.

I tried that, but for some reason AlignGraph seems to fall back to blat. When I run top to check on the processes, I can see that pblat is invoked before aligning a contig to the reference genome, but it then quickly switches to blat. I think something's off here, as the contigs_genome..psl.tmp._ files are empty. I'm using the latest version of pblat from https://github.com/icebert/pblat. Given that their source is from a year ago, I think I'm using the right version, but there is no error message on the terminal, so I have no way to tell what's happening. What do you suggest? I've checked that pblat is in my PATH.

In my run, the initial blat and bowtie runs finished in about a day and a half; it's the read grouping after alignment that has been going on for about five days, and at this rate it will take three more days to finish. It would be great if I could get pblat to work, as that would let the initial stages finish in a couple of hours. Perhaps in a later version read grouping could be parallelized as well, with the number of threads something a user could specify. Even if I could only use four threads, it would be roughly three times faster. Just a suggestion :)

Cheers,
Ram
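
One way to find out why an aligner dies without printing anything is to run the same command by hand and capture its exit status and stderr. The sketch below is a generic wrapper, not part of AlignGraph or pblat; the helper names are hypothetical, and the command passed in is whatever you would normally type.

```python
import subprocess


def run_and_report(cmd, timeout=None):
    """Run cmd (a list of arguments) and collect exit code, stdout and stderr."""
    proc = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
    return {
        "returncode": proc.returncode,
        "stdout": proc.stdout,
        "stderr": proc.stderr,
    }


def describe(result):
    """Turn an exit status into a short human-readable summary."""
    rc = result["returncode"]
    if rc == 0:
        return "exited normally"
    if rc < 0:
        # On POSIX, a negative return code means the process died from a signal.
        return f"killed by signal {-rc}"
    return f"failed with exit code {rc}"
```

For example, `describe(run_and_report(["pblat", "genome.fa", "contigs.fa", "out.psl"]))` would show whether pblat exits cleanly or is killed by a signal such as a segmentation fault. The file names here are placeholders, and the argument order should be checked against pblat's own usage message.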

@baoe
Owner

baoe commented Jun 22, 2016

Hi, Ram,

If PBLAT switches to BLAT automatically, it means PBLAT ran into a problem and could not proceed (e.g. a crash). My guess is that PBLAT crashed after processing the first contig. So maybe the best we can do is wait for a more stable PBLAT.

Best,
Bao


@ramprasadn
Author

That's probably it. Hopefully, their new version will fix this issue.

Thanks,
Ram
