Releases: rhysnewell/Lorikeet

v0.8.2

06 Dec 05:05

What's Changed

Full Changelog: v0.8.1...v0.8.2

v0.8.1

10 Jul 05:17

What's Changed

Full Changelog: v0.8.0...v0.8.1

v0.8.0

03 May 02:59

What's Changed

Full Changelog: v0.7.3...v0.8.0

v0.7.3

04 Oct 03:05
fix: release workflow copying old minimap2 header binary

v0.7.2

05 Aug 05:31
fix: fst calculations are now ploidy agnostic

v0.7.2rc1

05 Aug 05:30
fix: new releases are tagged correctly

Development build: master

17 Oct 20:48
f02a0f8
Pre-release
pre-release_master

Update pre-release-lorikeet.yml

v0.6.0rc2

13 Oct 06:26
dcbaf84

Version 0.6.0 - release candidate 2

This release candidate reintroduces consensus genome calling and strain genome discovery.
It also updates the linkage algorithm from previous versions, which now uses a more sophisticated graph-based approach for linking clusters.

v0.6.0rc1

06 Oct 11:29
9c84a73

v0.6.0 Release Candidate 1

This release introduces the completely overhauled variant calling setup for Lorikeet. Lorikeet no longer relies on threshold-based variant calling and instead takes a more sophisticated approach that uses local re-assembly of active regions. This release includes a reimplementation of the GATK HaplotypeCaller algorithm in Rust, so hopefully it is faster. At the very least, it will be easier to pass multiple genomes and samples into the algorithm at once to generate called variants.

Currently, the strain resolving part of lorikeet is hidden and will be re-enabled ASAP.

The HaplotypeCaller algorithm breaks genomes up into potential active regions and then performs local re-assembly with the reads that mapped to those locations. The local assembly is then searched for potential haplotypes using a number of techniques, and candidate haplotypes are assigned likelihoods by using a pair HMM to re-assign reads to the haplotypes. Ultimately, the algorithm produces sets of high-confidence variants with depths across samples.
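To make the likelihood step more concrete, here is a minimal sketch of a pair HMM forward pass that scores one read against one candidate haplotype. It is a textbook simplification and not Lorikeet's or GATK's actual code: the function name `read_likelihood`, the flat per-base error rate, and the constant gap-open and gap-extension probabilities are all assumptions made for illustration (real implementations use per-base qualities and work in log space or with scaling).

```rust
/// Simplified pair-HMM forward pass returning P(read | haplotype).
/// Hypothetical helper: `base_err`, `gap_open` and `gap_ext` are flat
/// probabilities chosen for illustration, not values used by Lorikeet.
fn read_likelihood(read: &[u8], hap: &[u8], base_err: f64, gap_open: f64, gap_ext: f64) -> f64 {
    let (n, m) = (read.len(), hap.len());
    if n == 0 || m == 0 {
        return 0.0;
    }
    // Transition probabilities out of the match state and back into it.
    let t_match_match = 1.0 - 2.0 * gap_open;
    let t_gap_match = 1.0 - gap_ext;

    // One matrix per state: match, insertion (extra read base),
    // deletion (skipped haplotype base). Rows index the read, columns the haplotype.
    let mut mat = vec![vec![0.0f64; m + 1]; n + 1];
    let mut ins = vec![vec![0.0f64; m + 1]; n + 1];
    let mut del = vec![vec![0.0f64; m + 1]; n + 1];

    // The read may start anywhere along the haplotype with equal probability.
    for j in 0..=m {
        del[0][j] = 1.0 / m as f64;
    }

    for i in 1..=n {
        for j in 1..=m {
            // Emission: matching bases are likely, mismatches share the error mass.
            let emit = if read[i - 1] == hap[j - 1] {
                1.0 - base_err
            } else {
                base_err / 3.0
            };
            mat[i][j] = emit
                * (t_match_match * mat[i - 1][j - 1]
                    + t_gap_match * (ins[i - 1][j - 1] + del[i - 1][j - 1]));
            ins[i][j] = gap_open * mat[i - 1][j] + gap_ext * ins[i - 1][j];
            del[i][j] = gap_open * mat[i][j - 1] + gap_ext * del[i][j - 1];
        }
    }

    // The read may end anywhere along the haplotype: sum the final row.
    (1..=m).map(|j| mat[n][j] + ins[n][j]).sum()
}
```

Scoring the read `b"ACGT"` against `b"TTACGTTT"` with this function gives a higher likelihood than scoring it against `b"TTAGGTTT"`, which is the signal used to decide which haplotype a read supports.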

The HaplotypeCaller code was re-implemented in Rust to potentially speed up the variant calling process, make it easier to pass multiple genomes and samples into the algorithm, and hopefully allow some of the code base to be reused in future projects and in the strain resolving pipeline.

The code requires benchmarking, but early indications from tests and small datasets put Lorikeet's variant calling speed on par with the Java implementation. I believe the real speed-up will appear when multiple genomes are supplied to Lorikeet, as they will be run in parallel seamlessly.

Additionally, a number of code clean-ups should be implemented as soon as possible, primarily around the BirdToolRead, SequencesForKmers, and Kmers data structures. Currently, accessing the bytes within a read requires cloning the data, with no option to create a reference pointing to the data (without the added complexity of decoding every encoded base). This means SequencesForKmers and Kmers each hold a clone of the read bases, which is very costly. I believe that by adding a bases field to BirdToolRead that is updated whenever the underlying Read changes, we can turn those clones into references and wrangle the lifetimes to significantly speed up the graph building stage of the algorithm.
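The idea of caching decoded bases and handing out references can be sketched roughly as follows. The types `CachedRead` and `KmerView` are hypothetical stand-ins invented for this example, not the real BirdToolRead, SequencesForKmers, or Kmers definitions, and the sketch assumes the underlying read stores bases in the BAM 4-bit encoding (two bases per byte).

```rust
/// BAM 4-bit base codes, indexed by nibble value.
const CODES: &[u8; 16] = b"=ACMGRSVTWYHKDBN";

/// Decode `len` bases from a 4-bit packed buffer (high nibble first).
fn decode(encoded: &[u8], len: usize) -> Vec<u8> {
    (0..len)
        .map(|i| {
            let byte = encoded[i / 2];
            let code = if i % 2 == 0 { byte >> 4 } else { byte & 0x0f };
            CODES[code as usize]
        })
        .collect()
}

/// Hypothetical read wrapper that keeps a decoded copy of its bases.
struct CachedRead {
    encoded: Vec<u8>,
    bases: Vec<u8>, // decoded once, refreshed whenever `encoded` changes
}

impl CachedRead {
    fn new(encoded: Vec<u8>, len: usize) -> Self {
        let bases = decode(&encoded, len);
        Self { encoded, bases }
    }

    /// Replace the underlying record and refresh the cache in the same step,
    /// so the two can never drift apart.
    fn set_encoded(&mut self, encoded: Vec<u8>, len: usize) {
        self.bases = decode(&encoded, len);
        self.encoded = encoded;
    }

    /// Borrow the decoded bases without cloning.
    fn bases(&self) -> &[u8] {
        &self.bases
    }

    /// Borrow the raw packed bytes.
    fn encoded(&self) -> &[u8] {
        &self.encoded
    }
}

/// Hypothetical k-mer that borrows from the read instead of owning a copy.
struct KmerView<'a> {
    bases: &'a [u8],
}

/// Build every k-mer of a read as a borrowed window over the cached bases.
fn kmers<'a>(read: &'a CachedRead, k: usize) -> Vec<KmerView<'a>> {
    if k == 0 {
        return Vec::new(); // `windows(0)` would panic
    }
    read.bases()
        .windows(k)
        .map(|w| KmerView { bases: w })
        .collect()
}
```

The trade-off is that every `KmerView` carries the lifetime of the read it came from, so the read cannot be mutated while any k-mers built from it are still alive. That is the lifetime wrangling mentioned above.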

TODO:

  • Reimplement strain calling + abundance estimation
  • Reimplement consensus calling
  • Update README
  • Update Workflow image
  • Various code improvements

Revised genotyping

12 Nov 05:06

So, in keeping with tradition, this release brings a bunch of changes to Lorikeet that make it pretty distant from where it was a month ago. I know only a few people are trying to keep track of all the changes that keep being made here, and I'm sorry things are so stochastic. I think the words of my supervisor put it best when I told him about one of the changes I had made... "Ah, so freebayes is out this week, huh?"

Yeah, freebayes is out. Cancelled. For generating illegal instructions and segmentation faults on GPU nodes. I ain't fixing that, I'll just make my own variant caller.

Lorikeet's new best friends are UMAP and HDBSCAN. The curse of dimensionality hexed me pretty good during benchmarking, so UMAP is being used for dimensionality reduction. I chose it over PCA since it seems to discriminate groupings of variants way better. Also, since we now have to use a Python library for UMAP, might as well upgrade fuzzy DBSCAN to its better version: HDBSCAN.

Changes:

  • Freebayes. OUT.
  • Fuzzy DBSCAN. OUT.
  • UMAP. IN.
  • HDBSCAN. IN.
  • Evolve now reports per-sample dN/dS and coverage values for each ORF

Current workflow:

[Image: lorikeet_revised workflow diagram]