BLAT/PBLAT issue "Maximum single piece size (5000) exceeded" #25
Comments
Hi, Ale, Thank you for your interest in AlignGraph! You can find an earlier version of BLAT that processes longer contigs at https://users.soe.ucsc.edu/~kent/src/. See FAQ4 for details. Best,
Hi Bao, Aligngraph.ccp calls pblat/blat, which in turn takes the max length for queries (5000) from genoFind.h. This is where the max length for queries is set. I changed the following line in genoFind.h: /icebert-pblat-ed0ac17/inc/genoFind.h (LINE 380) and changed it to: I recompiled pblat and AlignGraph. It runs just fine :) Best,
Thank you so much for this tip! It will be very helpful for other users! Best,
thx!!
Remove "-fastMap" from the pblat command.
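For illustration only, here is a minimal Python sketch of the argument list that the command string quoted in the original post would produce with "-fastMap" left out. The real change has to be made in the C++ string in AlignGraph's source, followed by a recompile. The function name pblat_command and its parameters are hypothetical; only the paths and flags come from the post. Without -fastMap, BLAT no longer enforces the 5000-base query limit, but it also stops restricting itself to fast, gap-free remapping, so run time and the resulting alignments may differ.

```python
from typing import List


def pblat_command(chromosome_id: int, pblat: str = "pblat",
                  threads: int = 8, fast_map: bool = False) -> List[str]:
    """Build the pblat argument list used per chromosome (hypothetical helper).

    Mirrors the command string quoted in the original post; -fastMap is
    only appended when fast_map is True. Output redirection is left out.
    """
    args = [
        pblat,
        f"tmp/_genome.{chromosome_id}.fa",          # database: one reference chromosome
        "tmp/_contigs.fa",                          # query: the de novo contigs
        "-noHead",
        f"tmp/_contigs_genome.{chromosome_id}.psl",  # PSL output file
    ]
    if fast_map:
        args.append("-fastMap")  # this flag is what triggers the 5000-base limit
    args.append(f"-threads={threads}")
    return args
```

Calling " ".join(pblat_command(1)) gives a command line that can be compared against the one in the post.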
thanks!
Hello Bao,
I have assembled a de-novo genome and would like to align it to the reference genome of a close species using AlignGraph. So far so good. I run AlignGraph with the following command:
/home/bin/AlignGraph/AlignGraph/AlignGraph --read1 ../Start_fasta/Start_RawReads_FD.fasta --read2 ../Start_fasta/Start_RawReads_RD.fasta --contig ../../1_Short_Read_Assembly/MaSuRCA_1/CA/10-gapclose/genome.ctg.fasta --genome ../../../reference/assembly/ref_281_v5.0.softmasked_GCM.fa --distanceLow 100 --distanceHigh 1350 --extendedContig AlignGraph_1_extendedContigs.fa --remainingContig AlignGraph_1_remainingContigs.fa
This is a small summary of the input reads/genomes and their length distribution (AlignGraph_Issue.xlsx).
So far so good, until blat/pblat (I tested both) throws the following error in blat_doc.txt:
Maximum single piece size (5000) exceeded by query 1.1 of size (49814). Larger pieces will have to be split up until no larger than this limit when the -fastMap option is used.
I took the liberty of adding some lines to AlignGraph.ccp, so I know that this happens around line 3654 (AlignGraph.ccp) in
void * task1(void * arg)
when
command = "/home/bin/icebert-pblat-ed0ac17/pblat tmp/_genome." + itoa(chromosomeID) + ".fa tmp/_contigs.fa -noHead tmp/_contigs_genome." + itoa(chromosomeID) + ".psl -fastMap -threads=8 > blat_doc.txt 2> blat_doc.txt";
is called.
Now, I understand that BLAT/PBLAT is struggling to align the "de novo" contigs against the "reference" genome. Because some "de novo" contigs are longer than 5000 bp, and blat/pblat requires queries to be no longer than 5000 bp when run with -fastMap (the flag that suppresses gaps), this causes the error. Did I get it right?
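To see how many contigs are affected before choosing a workaround, a short check along these lines may help. It is a minimal sketch that assumes a plain FASTA file with ">" header lines; the names read_fasta, oversized_contigs and FASTMAP_LIMIT are made up for this example, and the limit of 5000 is simply the value quoted in the error message.

```python
from typing import Iterable, Iterator, List, Tuple

FASTMAP_LIMIT = 5000  # value quoted in the BLAT error message


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def oversized_contigs(lines: Iterable[str],
                      limit: int = FASTMAP_LIMIT) -> List[Tuple[str, int]]:
    """Return (header, length) for every record longer than `limit`."""
    return [(h, len(s)) for h, s in read_fasta(lines) if len(s) > limit]
```

Passing an open file handle for the contig FASTA (for example genome.ctg.fasta) to oversized_contigs lists every contig that BLAT would reject when -fastMap is used.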
Is the only possibility to split my own "de-novo" contigs to acceptable sizes, or does a workaround exist? I would like to retain the longer contigs, if possible. Else I would just proceed and split every contig longer than 5000bp into separate fasta entries.
Best regards,
Ale R.
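The fallback mentioned in the post, splitting every contig longer than 5000 bp into separate FASTA entries, could be sketched as follows. This is only an illustration under stated assumptions: records arrive as (header, sequence) pairs from whatever FASTA parser is at hand, and the "_partN" suffix is a made-up naming scheme. Splitting gives up the contiguity of the long contigs, and how AlignGraph treats the pieces downstream is not covered in this thread.

```python
from typing import Iterable, Iterator, Tuple

Record = Tuple[str, str]  # (header, sequence)


def split_records(records: Iterable[Record], limit: int = 5000) -> Iterator[Record]:
    """Yield records unchanged if short enough, else as pieces of at most `limit` bases."""
    for header, seq in records:
        if len(seq) <= limit:
            yield header, seq
            continue
        starts = range(0, len(seq), limit)
        for part, start in enumerate(starts, 1):
            # "_partN" is a hypothetical suffix so the pieces stay traceable
            yield f"{header}_part{part}", seq[start:start + limit]


def to_fasta(records: Iterable[Record], width: int = 60) -> str:
    """Format records as FASTA text with sequence lines of `width` characters."""
    out = []
    for header, seq in records:
        out.append(f">{header}")
        out.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "".join(line + "\n" for line in out)
```

Writing to_fasta(split_records(records)) to a new file gives a contig set in which no entry exceeds the limit.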