Releases: muellan/metacache

MetaCache 2.4.2

11 Mar 13:40

Improved sequence id extraction from filenames and sequence headers.

The default setting is now smarter: it first tries to find NCBI-style accession or accession.version identifiers, then GenBank identifiers, and finally falls back to the filename (without path and extension).

The new command line option -sequence-id-format <type> allows the user to select a preferred method for sequence id extraction.
Available values for <type> are:

  • smart: (default), works as described above
  • ncbi: only use NCBI-style accession or accession.version identifiers
  • genbank: only use genbank identifiers
  • filename: only use filename (without path and extension)
  • leadingword: only use first contiguous stretch of non-whitespace characters
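The fallback order described above can be sketched as follows. This is only an illustration in Python, not MetaCache's implementation: the regular expressions, the function names and the choice to search only the sequence header for accessions are all assumptions, and the accession pattern is deliberately loose.

```python
import os
import re

# Assumed patterns for illustration only; MetaCache's real matching rules may differ.
# Loose NCBI-style accession with optional ".version", e.g. NC_000913.3
ACCESSION_RE = re.compile(r"\b[A-Z]{1,4}_?[A-Z0-9]{5,10}(?:\.\d+)?\b")
# GenBank "gi" number, e.g. gi|556503834|
GI_RE = re.compile(r"\bgi\|(\d+)")


def extract_ncbi(header):
    """Return the first accession or accession.version found, else None."""
    match = ACCESSION_RE.search(header)
    return match.group(0) if match else None


def extract_genbank(header):
    """Return the first gi number found, else None."""
    match = GI_RE.search(header)
    return match.group(1) if match else None


def extract_filename(path):
    """Return the filename without directory and without its last extension."""
    return os.path.splitext(os.path.basename(path))[0]


def extract_leading_word(header):
    """Return the first run of non-whitespace characters after '>'."""
    words = header.lstrip(">").split()
    return words[0] if words else None


def extract_sequence_id(header, filename, fmt="smart"):
    """Hypothetical dispatcher mirroring the -sequence-id-format values."""
    if fmt == "ncbi":
        return extract_ncbi(header)
    if fmt == "genbank":
        return extract_genbank(header)
    if fmt == "filename":
        return extract_filename(filename)
    if fmt == "leadingword":
        return extract_leading_word(header)
    if fmt == "smart":
        # Try each method in turn and fall through to the filename.
        return (extract_ncbi(header)
                or extract_genbank(header)
                or extract_filename(filename))
    raise ValueError(f"unknown sequence id format: {fmt}")
```

With these assumed patterns, a header such as `>contig_17 unknown sample` contains neither an accession nor a gi number, so the smart mode would fall back to the filename.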

MetaCache 2.4.1

11 Mar 10:00

Fixed abundance table formatting

  • prevent scientific notation from being used for read counts
  • the row for unclassified reads was missing its taxon column; it is now shown with taxon "--"
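The scientific-notation pitfall is a general one with floating-point formatting. The Python analogue below is an illustration only and does not show MetaCache's C++ code; `format_count` is a hypothetical helper name.

```python
def format_count(count):
    """Format a read count in fixed notation with no fractional digits."""
    # General-purpose "%g" formatting switches to an exponent for large
    # values; "%.0f" always writes out every digit.
    return "%.0f" % count


general = "%g" % 15000000.0       # general format, uses an exponent here
fixed = format_count(15000000.0)  # fixed format, all digits written out
```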

MetaCache 2.4.0

10 Mar 15:43

Changed handling of non-unique sequence IDs during database build

If a reference sequence is inserted whose ID (e.g. NCBI accession) is already present in the database, the newer sequence is now inserted with a modified ID (an exclamation mark followed by a duplication counter is appended) and a warning is printed to stderr.
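A minimal sketch of this renaming scheme, written in Python for illustration. The exact suffix format, the counter's starting value and the warning text are assumptions; `unique_id` and `seen` are hypothetical names.

```python
import sys


def unique_id(seq_id, seen):
    """Return seq_id, or a modified ID if seq_id was already inserted.

    `seen` maps each original ID to the number of duplicates met so far.
    """
    if seq_id not in seen:
        seen[seq_id] = 0
        return seq_id
    seen[seq_id] += 1
    # Assumed format: original ID + "!" + duplication counter.
    new_id = f"{seq_id}!{seen[seq_id]}"
    print(f"warning: duplicate sequence ID '{seq_id}', inserted as '{new_id}'",
          file=sys.stderr)
    return new_id


seen = {}
ids = [unique_id(s, seen) for s in ["NC_000913.3", "NC_000913.3", "NC_000913.3"]]
# The first occurrence keeps its ID; later ones receive "!1", "!2", ...
```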

Added min/max length filter

A minimum and maximum length for reads can now be set with -min-readlen <#> and -max-readlen <#>. Reads with lengths outside of this range will not be processed, i.e., they are treated as if they were not present in the input file. The number of discarded reads and the number of processed reads are printed to stderr. The default behavior of processing all reads remains unchanged.
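The filter's behavior can be sketched as below. This is an illustration in Python, not MetaCache's code: `filter_reads` is a hypothetical name, the bounds are assumed to be inclusive, and the wording of the summary line is made up.

```python
import sys


def filter_reads(reads, min_len=0, max_len=None):
    """Keep reads whose length lies within [min_len, max_len].

    With the defaults (no lower or upper bound) every read is kept.
    """
    kept = []
    discarded = 0
    for read in reads:
        too_short = len(read) < min_len
        too_long = max_len is not None and len(read) > max_len
        if too_short or too_long:
            discarded += 1  # treated as if absent from the input
        else:
            kept.append(read)
    print(f"processed {len(kept)} reads, discarded {discarded}", file=sys.stderr)
    return kept, discarded


reads = ["ACGT", "ACGTACGTAC", "AC", "ACGTACGTACGTACGTACGT"]
kept, discarded = filter_reads(reads, min_len=4, max_len=10)
```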

Other changes

  • cleaned up some includes
  • updated dates
  • changed some aspects of default code formatting

MetaCache 2.3.2

29 Feb 12:27
  • improved parsing of assembly_summary files with inconsistent headers

MetaCache 2.3.1

09 Mar 12:07
  • fixed type mismatch bug that could prevent compilation with uint64_t for MC_TARGET_ID_TYPE / MC_WINDOW_ID_TYPE / DMC_KMER_TYPE
  • allow up to 10 alphanumeric characters in NCBI-style accession ids
  • GPU version: removed outdated CUDA 10.2 and CUB from documentation

MetaCache 2.3.0

03 Jan 14:13
  • Removed compaction step from GPU version and sped up GPU queries. This also removes the dependency on CUB.
  • Set CUDA arch=native per default to automatically detect GPU architecture.
  • Fixed make with multiple MACROS (#34).

MetaCache 2.2.3

08 Jul 17:00

Improved merge mode:

  • Added -out option
  • Recover from malformed input files (#33)
  • Show more output on verbose info level

MetaCache 2.2.2

08 Jul 16:58
  • Fixed kmers on GPU for k != 16 (default was working correctly)
  • Fixed shown query parameters when running abundance estimation

MetaCache 2.2.1

12 Jan 12:18
  • Fixed canonical kmer on GPU for k != 16 (default was working correctly)
  • Fixed merge mode

MetaCache 2.2.0

09 Dec 15:20
  • Fixed the NCBI genome download script (the ftp path can be empty for some genomes).
  • Changed the default data type for storing reference sequence ids from 16 to 32 bits in order to fit all complete bacterial, viral and archaeal genomes of the latest NCBI RefSeq releases.
  • Fixed the error message during the build process that reports when the number of sequences exceeds the supported number.
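The effect of widening the sequence id type from 16 to 32 bits can be seen from simple arithmetic. The sketch below assumes every bit pattern is usable as an id; whether MetaCache reserves any values is not stated here, so the real limits may be slightly lower.

```python
def max_distinct_ids(bits):
    """Number of distinct values an unsigned integer of this width can hold."""
    return 2 ** bits


ids_16 = max_distinct_ids(16)  # 65536
ids_32 = max_distinct_ids(32)  # 4294967296
```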