
Fewer BUSCO genes after scaffolding. #5

Open
a-velt opened this issue Feb 13, 2018 · 3 comments

a-velt commented Feb 13, 2018

Hi,

I would like to give some feedback on scaffolding my assembly (Sanger technology) with PacBio reads (30x coverage) using pyScaf.

pyScaf is fast and the results looked promising at first: I went from 2,059 scaffolds to 1,344, which was encouraging. I then ran BUSCO on both assemblies and got the following results:

Before pyScaf, 95.6% of BUSCO genes were complete and 37 were missing. After pyScaf, 78.7% were complete and 284 were missing.

I ran pyScaf with these parameters:
pyScaf.py -f Scaffolds.fasta --identity 0.80 -o Scaffolds.pyScaf.fasta -t 10 --log pyScaf_run.log --longreads all_raw_reads.Pacbio.fasta

Should I change any of them? Do you have any advice?


hgdarras commented Jun 9, 2018

Hi,
This is probably the same problem as the one mentioned in issue #3:

Additionally, there might be some over-scaffolding: many contigs that appeared to have a large overlap were linked directly (without any check, such as whether the contigs actually overlapped).

In this example (.tsv output of a long-read scaffolding run), a 2.4 Mb scaffold and a 3.3 Mb scaffold are merged into a single 3.3 Mb scaffold, so 2.4 Mb of non-redundant sequence is lost in the process.

scaffold00018 3324699 2 scaffold31_size2472606 scaffold20_size3324684 1 0 -3065490 0
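
The loss in the row above can be checked with a few lines of Python. This is only a rough sketch, not part of pyScaf: it assumes that the second column is the length of the merged scaffold and that each input scaffold name carries its length in a `_size<N>` suffix, as the names in this row appear to. The helper name `lost_bases` is made up for this example.

```python
import re

# Assumption: input scaffold names end with "_size<length>", as in the row above.
SIZE_RE = re.compile(r"_size(\d+)$")


def lost_bases(row: str) -> int:
    """Return (sum of input scaffold sizes) - (merged scaffold length)."""
    fields = row.split()
    # Assumption: column 2 holds the length of the merged scaffold.
    merged_len = int(fields[1])
    # Collect the sizes encoded in the names of the input scaffolds.
    input_sizes = [
        int(m.group(1)) for f in fields[2:] if (m := SIZE_RE.search(f))
    ]
    return sum(input_sizes) - merged_len


row = ("scaffold00018 3324699 2 scaffold31_size2472606 "
       "scaffold20_size3324684 1 0 -3065490 0")
print(lost_bases(row))  # 2472591
```

A large positive value on a row like this one suggests that one input scaffold was absorbed into the other rather than placed next to it.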

a-velt commented Jun 11, 2018

Hi!

Yes, I found the problem! I used OPERA to scaffold my Sanger assembly with PacBio reads and saw that OPERA merged some contigs, which caused this problem with BUSCO. Since OPERA writes a file describing the scaffolding, I wrote a script to perform "manual" scaffolding without merging my contigs, and it works perfectly: the BUSCO results are very good after that. If anyone encounters this problem with OPERA, contact me and I will share my script.

Thank you,
Amandine
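
For readers wondering what "manual" scaffolding without merging could look like, here is a minimal sketch. It is not Amandine's script and does not read OPERA's output format. It assumes the scaffolding information has already been turned into an ordered list of (contig name, strand, estimated gap) tuples. The names `join_without_merging`, `layout` and `min_gap` are hypothetical. The idea is to keep every contig whole and put a run of N between neighbours, even when the estimated gap is negative (a predicted overlap).

```python
from typing import Dict, List, Tuple

_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Reverse-complement a DNA sequence (A/C/G/T/N only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def join_without_merging(
    contigs: Dict[str, str],
    layout: List[Tuple[str, str, int]],
    min_gap: int = 100,
) -> str:
    """Concatenate contigs in layout order, separated by N gaps.

    layout holds (contig_name, strand, gap_after) tuples (hypothetical format).
    Negative or small gaps are replaced by min_gap Ns, so no contig
    sequence is ever trimmed or merged into its neighbour.
    """
    parts = []
    for i, (name, strand, gap_after) in enumerate(layout):
        seq = contigs[name]
        parts.append(seq if strand == "+" else reverse_complement(seq))
        if i < len(layout) - 1:  # no gap after the last contig
            parts.append("N" * max(gap_after, min_gap))
    return "".join(parts)


# Toy data: the -3 "gap" is a predicted overlap, which is not merged.
contigs = {"c1": "ACGTAC", "c2": "GGATT"}
layout = [("c1", "+", -3), ("c2", "-", 0)]
scaffold = join_without_merging(contigs, layout, min_gap=4)
print(scaffold)  # ACGTACNNNNAATCC
```

Because nothing is merged, the scaffold always contains every input base, at the cost of possibly duplicating sequence where two contigs really do overlap.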

@liguangshuo

Hi Amandine @a-velt

I am facing the same problem now. Could you share your script with me?

Thanks in advance,
Guangshuo
