Test data - Drosophila melanogaster #95

Open
gbdias opened this issue Apr 12, 2024 · 2 comments


gbdias commented Apr 12, 2024

  • Drosophila melanogaster is an interesting subject for test data since
    • the genome is small (~180 Mb)
    • there are only 4 chromosomes (expect 6 chromosome arms after scaffolding, because the centromeres are hard to scaffold through even with Hi-C)
    • plenty of sequencing data is available
  • There is good PacBio HiFi data from the F1 of a cross between two highly inbred strains (A4 and ISO1).
  • This should be a good heterozygous genome that assembles very well with hifiasm. I am currently testing the assembly to make sure. Will post results to this thread.
  • Hi-C data is available for other strains; I still need to test whether it works well.
@gbdias gbdias self-assigned this Apr 12, 2024

gbdias commented Apr 12, 2024

HiFi data

  • https://ncbi.nlm.nih.gov/sra/SRX6957826[accn]
  • This data was generated by PacBio for demonstrating Sequel and HiFi.
  • It has excellent read length (size-selected to 24 kb).
  • The full dataset is 25.8 Gbp, which corresponds to ~143x theoretical coverage.
  • I was able to subsample 10% of the reads and still get a reasonable assembly in 13 minutes using hifiasm with 20 CPUs.
  • Phasing is probably not great at this coverage but the haplotypes are similar enough that the final primary assembly is still very contiguous.
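
For reference, the coverage figures above can be checked with a little arithmetic, and the 10% subsample can be sketched as a per-read random draw. This is only a minimal sketch: the genome size (~180 Mb) and dataset size (25.8 Gbp) come from this thread, while the function names, the seed, and the per-record sampling approach are assumptions for illustration and not necessarily how the subsample was actually produced.

```python
import random

GENOME_SIZE_BP = 180e6   # approximate D. melanogaster genome size (from this thread)
DATASET_BP = 25.8e9      # full HiFi dataset size (from this thread)


def theoretical_coverage(total_bases, genome_size):
    """Theoretical coverage = total sequenced bases / genome size."""
    return total_bases / genome_size


def subsample_fastq_records(lines, fraction, seed=0):
    """Keep each 4-line FASTQ record with probability `fraction`.

    Hypothetical helper: `lines` is a list of FASTQ lines already read
    into memory; a trailing incomplete record is ignored.
    """
    rng = random.Random(seed)
    kept = []
    for i in range(0, len(lines) - len(lines) % 4, 4):
        if rng.random() < fraction:
            kept.extend(lines[i:i + 4])
    return kept


full = theoretical_coverage(DATASET_BP, GENOME_SIZE_BP)
sub = full * 0.10  # expected coverage after keeping ~10% of reads
print(f"full: ~{full:.0f}x, 10% subsample: ~{sub:.1f}x")
```

Under these assumptions, a 10% subsample leaves roughly 14x coverage, which is consistent with the note above that phasing is probably weak at this depth.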

Figure (image: SRR10238607_0 1 paf): x-axis D. mel reference, y-axis hifiasm assembly.

@MartinPippel

Is the subsampled data set already publicly available somewhere?
