BLAT/PBLAT issue "Maximum single piece size (5000) exceeded" #25

Open
Schum1 opened this issue Aug 5, 2016 · 6 comments

Comments

@Schum1

Schum1 commented Aug 5, 2016

Hello Bao,
I have assembled a de-novo genome and would like to align it to the reference genome of a close species using AlignGraph. So far so good. I run AlignGraph with the following command:

/home/bin/AlignGraph/AlignGraph/AlignGraph --read1 ../Start_fasta/Start_RawReads_FD.fasta --read2 ../Start_fasta/Start_RawReads_RD.fasta --contig ../../1_Short_Read_Assembly/MaSuRCA_1/CA/10-gapclose/genome.ctg.fasta --genome ../../../reference/assembly/ref_281_v5.0.softmasked_GCM.fa --distanceLow 100 --distanceHigh 1350 --extendedContig AlignGraph_1_extendedContigs.fa --remainingContig AlignGraph_1_remainingContigs.fa

This is a small summary of the input reads/genomes and their length distribution (AlignGraph_Issue.xlsx).

So far so good, until blat/pblat (I tested both) throws out the following error in the blat_doc.txt:

Maximum single piece size (5000) exceeded by query 1.1 of size (49814). Larger pieces will have to be split up until no larger than this limit when the -fastMap option is used.

I took the liberty of adding some lines to AlignGraph.cpp, so I know that this happens around line 3654 of AlignGraph.cpp, in

"void * task1(void * arg)"

when

"command = "/home/bin/icebert-pblat-ed0ac17/pblat tmp/_genome." + itoa(chromosomeID) + ".fa tmp/_contigs.fa -noHead tmp/_contigs_genome." + itoa(chromosomeID) + ".psl -fastMap -threads=8 > blat_doc.txt 2> blat_doc.txt";"

is called.

Now, I understand that BLAT/PBLAT is struggling to align the "de novo" contigs against the "reference" genome. Some "de novo" contigs are longer than 5000 bp, and with the -fastMap flag (which suppresses gaps) blat/pblat requires queries to be no longer than 5000 bp, which causes the error. Did I get it right?

Is splitting my own "de novo" contigs into acceptable sizes the only option, or does a workaround exist? I would like to retain the longer contigs, if possible. Otherwise I would just proceed and split every contig longer than 5000 bp into separate FASTA entries.

Best regards,
Ale R.
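
For reference, the fallback described above (splitting every contig longer than 5000 bp into separate FASTA entries) could look roughly like the sketch below. It is only an illustration, not part of AlignGraph: the function name `splitContig` and the `_part<N>` naming scheme are hypothetical, and reading and writing the FASTA file is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Split one contig into consecutive pieces of at most maxLen bases.
// Returns (name, sequence) pairs. Contigs that already fit are returned
// unchanged; longer ones get a hypothetical "_part<N>" suffix so that
// the piece names stay unique in the output FASTA.
std::vector<std::pair<std::string, std::string>>
splitContig(const std::string& name, const std::string& seq,
            std::size_t maxLen = 5000) {
    assert(maxLen > 0);
    std::vector<std::pair<std::string, std::string>> pieces;
    if (seq.size() <= maxLen) {
        pieces.emplace_back(name, seq);
        return pieces;
    }
    std::size_t part = 1;
    for (std::size_t start = 0; start < seq.size(); start += maxLen, ++part) {
        // substr clamps the length at the end of the sequence,
        // so the last piece may be shorter than maxLen.
        pieces.emplace_back(name + "_part" + std::to_string(part),
                            seq.substr(start, maxLen));
    }
    return pieces;
}
```

With the default limit, the 49814 bp query from the error message would be cut into ten pieces, the last one 4814 bp long. Note that this changes the contigs AlignGraph works with, which is exactly what the question hopes to avoid.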

@baoe
Owner

baoe commented Aug 6, 2016

Hi, Ale,

Thank you for your interest in AlignGraph! You can find an earlier version of BLAT that can process longer contigs at https://users.soe.ucsc.edu/~kent/src/. See FAQ4 for details.

Best,
Bao


@Schum1
Author

Schum1 commented Aug 9, 2016

Hi Bao,
Thank you very much for your quick response. Because I prefer to use multithreaded pblat, I used the following approach:

AlignGraph.cpp runs pblat/blat, which enforces the maximum query length (5000); pblat/blat in turn takes that value from genoFind.h, where the maximum query length is set. I changed the following line in genoFind.h:

/icebert-pblat-ed0ac17/inc/genoFind.h (LINE 380)
#define MAXSINGLEPIECESIZE 5000 /* maximum size of a single piece */

and changed it to:
#define MAXSINGLEPIECESIZE 1000000 /* maximum size of a single piece */ (just an arbitrary number)

I recompiled pblat and AlignGraph. It runs just fine :)

Best,
Ale
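
A possible variation on the edit above, as a sketch only: guarding the define with `#ifndef` would keep 5000 as the default while letting the limit be raised at compile time with a `-D` flag instead of editing the header again. Whether pblat's build passes extra compiler flags through is an assumption not confirmed in this thread, and `exceedsSinglePieceLimit` is a hypothetical helper that only mirrors the wording of the error message; it is not pblat's actual check.

```cpp
#include <cassert>

// Keep 5000 as the default, but allow a build-time override, e.g.
//   -DMAXSINGLEPIECESIZE=1000000
#ifndef MAXSINGLEPIECESIZE
#define MAXSINGLEPIECESIZE 5000 /* maximum size of a single piece */
#endif

// Hypothetical helper (not pblat code): true when a query is larger
// than the limit, matching "no larger than this limit" in the error text.
inline bool exceedsSinglePieceLimit(long querySize) {
    assert(querySize >= 0);
    return querySize > MAXSINGLEPIECESIZE;
}
```

With the default value, the 49814 bp query from the original report exceeds the limit, while a query of exactly 5000 bp does not.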

@baoe
Owner

baoe commented Aug 9, 2016

Thank you so much for this tip! It will be very helpful for other users!

Best,
Bao


@kzukowski

thx!!

@ferrolad

Remove "-fastMap" in pblat command.
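
As a rough sketch of what that change means for the command string quoted in the original report, the command could be assembled with the flag made optional. The function `buildPblatCommand` and its parameters are hypothetical and this is not the actual AlignGraph.cpp code; the file names and the `-noHead`, `-fastMap`, and `-threads` options are the ones shown in the thread.

```cpp
#include <cassert>
#include <string>

// Build the pblat command line for one chromosome.
// pblatPath, threads and useFastMap are illustrative parameters.
std::string buildPblatCommand(const std::string& pblatPath, int chromosomeID,
                              int threads, bool useFastMap) {
    assert(threads > 0);
    const std::string id = std::to_string(chromosomeID);
    std::string command = pblatPath + " tmp/_genome." + id + ".fa"
                        + " tmp/_contigs.fa -noHead"
                        + " tmp/_contigs_genome." + id + ".psl";
    if (useFastMap) {
        command += " -fastMap";  // triggers the 5000-base query limit
    }
    command += " -threads=" + std::to_string(threads);
    // The thread's command redirects stdout and stderr separately to the
    // same file; "2>&1" is used here so both share one file handle.
    command += " > blat_doc.txt 2>&1";
    return command;
}
```

Calling `buildPblatCommand("pblat", 1, 8, false)` gives the command without `-fastMap`. Since `-fastMap` is the mode that disallows gaps, dropping it will likely make the alignment step slower and may change which alignments are reported.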

@sqwwww

sqwwww commented May 11, 2024

> Remove "-fastMap" in pblat command.

thanks!
