Releases: nextstrain/augur

24.2.1

14 Feb 00:36

These release notes are automatically extracted from the full changelog.

Bug Fixes

  • frequencies: Fixed a bug introduced in 24.2.0 that prevented --method diffusion from working alongside --tree. #1412 (@victorlin)

24.2.0

12 Feb 21:07

Features

  • filter: Added a new option --query-columns that allows specifying what columns are used in --query along with the expected data types. If unspecified, automatic detection of columns and types is attempted. #1294 (@victorlin)
  • augur.io.read_metadata: A new optional columns argument allows specifying a subset of columns to load. The default behavior still loads all columns, so this is not a breaking change. #1294 (@victorlin)
  • augur parse: A new optional --output-id-field argument allows the user to select any ID field for the produced FASTA file (e.g. 'accession' instead of 'name' or 'strain'). #1403 (@j23414)
    • When no --output-id-field is given and the data has both name and strain fields, name is still preferred over strain as the sequence ID field. A deprecation warning is now raised, because a future version will prefer strain over name to be consistent with the rest of Augur.
    • Added entry to DEPRECATED.md.
  • Compression should now be supported for all input and output files. Please open an issue if you find one that doesn't! #1381 (@victorlin)
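
The ID-field selection described for augur parse above can be sketched as follows. This is a minimal illustration, not Augur's implementation: the helper name pick_id_field is hypothetical, and raising an error when neither name nor strain is present is an assumption made for the sketch.

```python
import warnings


def pick_id_field(fields, output_id_field=None):
    """Return the field to use as the FASTA sequence ID (hypothetical helper)."""
    # An explicit choice always wins, provided the field exists.
    if output_id_field is not None:
        if output_id_field not in fields:
            raise ValueError(f"ID field {output_id_field!r} not found in {list(fields)}")
        return output_id_field

    # With both fields present, keep preferring 'name' but warn about the planned switch.
    if "name" in fields and "strain" in fields:
        warnings.warn(
            "Both 'name' and 'strain' are present; using 'name' for now. "
            "A future version will prefer 'strain'.",
            DeprecationWarning,
            stacklevel=2,
        )
        return "name"

    # Otherwise use whichever of the two is available.
    for candidate in ("name", "strain"):
        if candidate in fields:
            return candidate

    # Assumption for this sketch: fail loudly rather than guess.
    raise ValueError("No 'name' or 'strain' field found; pass an explicit ID field.")
```

Calling pick_id_field(["accession", "name", "strain"], "accession") selects the accession field without any warning, which mirrors the effect of passing --output-id-field.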

Bug Fixes

  • filter: In version 24.1.0, automatic conversion of boolean columns was accidentally removed. It has been restored with additional support for empty values evaluated as None. #1410 (@victorlin)
  • filter: The order of rows in --output-metadata and --output-strains now reflects the order in the original --metadata. #1294 (@victorlin)
  • filter, frequencies, refine: Performance improvements to reading the input metadata file. #1294 (@victorlin)
    • For filter, this comes with increased writing times for --output-metadata and --output-strains. However, net I/O time still decreased during testing of this change.
  • filter: Updated the help text of --include and --include-where to explicitly state that this can add strains that are missing an entry from --sequences. #1389 (@victorlin)
  • filter: Fixed the summary messages to properly reflect force-inclusion of strains that are missing an entry from --sequences. #1389 (@victorlin)
  • filter: Updated wording of summary messages. #1389 (@victorlin)
  • Enforce UTF-8 encoding when reading and writing files. Improve error messages when a non-UTF-8 file is used. #1381 (@victorlin)
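
The restored boolean conversion with empty values evaluated as None can be illustrated roughly as below. This is only a sketch of the idea: the function name to_optional_bool is hypothetical, and the accepted spellings (case-insensitive "true" and "false") are an assumption, not a statement of what filter accepts.

```python
def to_optional_bool(value):
    """Convert a metadata string to True, False, or None (hypothetical helper)."""
    text = value.strip().lower()
    if text == "":
        # Empty cells carry no information, so they map to None.
        return None
    if text == "true":
        return True
    if text == "false":
        return False
    # Assumption for this sketch: anything else is rejected rather than guessed.
    raise ValueError(f"Cannot interpret {value!r} as a boolean")


column = ["True", "", "False"]
converted = [to_optional_bool(cell) for cell in column]
```

Keeping None distinct from False matters for queries: a row with an empty value should not silently match a condition that tests for False.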

24.1.0

30 Jan 20:56

Features

  • augur.io.read_metadata: A new optional dtype argument allows custom data types for all columns. Automatic type inference still happens by default, so this is not a breaking change. #1252 (@victorlin)
  • augur.io.read_vcf has been removed. Its usage is replaced with TreeTime's function of the same name, which has improved validation of the VCF file. #1366 (@jameshadfield)

Bug Fixes

  • filter, frequencies, refine: Speed up reading of the metadata file. #1252 (@victorlin)
  • traits: Previously, columns with only numeric values were treated as numerical data. These are now treated as categorical data for discrete trait analysis. #1252 (@victorlin)
  • Support Biopython ≥1.82 by requiring bcbio-gff ≥0.7.1. #1400 (@victorlin)

24.0.0

22 Jan 23:25

Major Changes

  • ancestral, translate: For VCF inputs please ensure you are using TreeTime 0.11.2 or later. A large number of bugfixes and improvements have been added in both Augur and TreeTime. #1355 and TreeTime #263 (@jameshadfield)
  • ancestral, translate: GenBank files now require the (GFF mandatory) source feature to be present. #1351 (@jameshadfield)
  • ancestral, translate: For GFF files, we extract the genome/sequence coordinates by inspecting the sequence-region pragma, region type and/or source type. This information is now required. #1351 (@jameshadfield)
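
To show what extracting genome coordinates from a GFF file can look like, here is a minimal sketch that reads the ##sequence-region pragma or, failing that, a feature of type region or source. The function name genome_coordinates is hypothetical, and the order in which Augur consults these sources is not stated in these notes; this sketch simply takes the first one it meets in file order.

```python
def genome_coordinates(gff_lines):
    """Return (seqid, start, end) using 1-based inclusive GFF coordinates."""
    for raw_line in gff_lines:
        line = raw_line.rstrip("\n")
        if line.startswith("##sequence-region"):
            # Pragma layout in GFF3: ##sequence-region seqid start end
            parts = line.split()
            if len(parts) == 4:
                return parts[1], int(parts[2]), int(parts[3])
        elif line and not line.startswith("#"):
            # Feature lines are tab-separated; column 3 holds the feature type.
            columns = line.split("\t")
            if len(columns) >= 5 and columns[2] in ("region", "source"):
                return columns[0], int(columns[3]), int(columns[4])
    raise ValueError("No sequence-region pragma, region feature, or source feature found")


example = [
    "##gff-version 3",
    "##sequence-region NC_045512.2 1 29903",
    "NC_045512.2\tRefSeq\tgene\t266\t21555\t.\t+\t.\tID=gene-ORF1ab",
]
coordinates = genome_coordinates(example)
```

A file with neither the pragma nor a region/source feature raises an error here, which corresponds to the note that this information is now required.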

Features

  • ancestral, translate: Improvements to VCF inputs / outputs. #1355 and TreeTime #263 (@jameshadfield)
    • Output VCF will better match the input VCF, including CHROM name and ploidy encoding.
    • VCF inputs now require --vcf-reference-output
    • AA sequences are now exported for the tree root
    • VCF writing is now 3 orders of magnitude faster (dataset dependent)
  • ancestral, translate: A range of improvements to how we parse GFF and GenBank reference files. #1351 (@jameshadfield)
    • translate will now always export a 'nuc' annotation in the output JSON, allowing it to pass validation
    • Gene/CDS names of 'nuc' are now forbidden.
    • If a Gene/CDS in the GFF/GenBank file is unparsed we now print a warning.
  • ancestral: For VCF alignments, a VCF output file is now only created when requested via --output-vcf. #1344 (@jameshadfield)
  • ancestral: Improvements to command line arguments. #1344 (@jameshadfield)
    • Incompatible arguments are now checked, especially related to VCF vs FASTA inputs.
    • --vcf-reference and --root-sequence are now mutually exclusive.
  • translate: Tree nodes are checked against the node-data JSON input to ensure sequences are present. #1348 (@jameshadfield)
  • utils::load_features: This function may now raise AugurError. #1351 (@jameshadfield)
  • export v2: Automatically minify large outputs. Use --no-minify-json to disable this default behavior. #1352 (@victorlin)
  • Added a new file DEPRECATED.md to document timelines and progress of deprecated features in the Augur CLI and Python API. #1371 (@victorlin)
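
The mutual exclusivity of --vcf-reference and --root-sequence noted for ancestral above is the kind of check that Python's argparse supports directly. The following is a standalone sketch with a hypothetical parser (the prog name ancestral-sketch is made up); it is not Augur's actual argument definition.

```python
import argparse


def build_parser():
    """Build a toy parser with two mutually exclusive options."""
    parser = argparse.ArgumentParser(prog="ancestral-sketch")
    group = parser.add_mutually_exclusive_group()
    group.add_argument("--vcf-reference", help="reference FASTA for VCF input")
    group.add_argument("--root-sequence", help="reference sequence for FASTA input")
    return parser


# Supplying only one of the two options parses normally.
args = build_parser().parse_args(["--root-sequence", "reference.fasta"])
```

Supplying both options makes argparse print a usage error and exit with status 2, so the conflict is reported before any analysis starts.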

Bug Fixes

  • ancestral, translate: Various fixes to VCF inputs / outputs. #1355 and TreeTime #263 (@jameshadfield)
    • Fix incorrect (but passing) tests
    • Fix case-sensitive sequence comparisons between the root and reference sequences.
    • Fix a bug where ambiguous alleles are not inferred (see #1380 for full details).
    • Fix a bug where positions with no sequence information were assigned a base because the mask was not being computed (see #1382 for full details).
    • More than one ALT allele is now correctly parsed
    • Mutations followed by an insertion are now parsed
    • Unchanged ref genotypes are now encoded as '0' rather than '.'
    • ALT alleles "*" are now valid (introduced in VCF spec 4.2, but observed in VCF 4.1 files)
    • Positions with no variation are no longer exported
  • ancestral, translate: Fixes for JSON (non-VCF) inputs. #1355 (@jameshadfield)
    • The "reference" translations are now from the provided reference sequence, not from the root of the tree. #1355 (@jameshadfield)
    • Fix a bug where positions with no sequence information were assigned a base because the mask was not applied (see #1382 for full details)
  • ancestral, translate: Avoid incompatibilities with Biopython >=1.82. #1374, #1387 (@victorlin)
  • ancestral, translate: Address Biopython deprecation warnings. #1379 (@victorlin)
  • ancestral: Previously, the help text for --genes stated that it could accept a file, but this did not work. --genes now accepts a file as documented. #1353 (@victorlin)
  • translate: The 'source' ID for GFF files is now ignored as a potential gene feature (it is still used for overall nuc coords). #1348 (@jameshadfield)
  • translate: Improvements to command line arguments. #1348 (@jameshadfield)
    • --tree and --ancestral-sequences are now required arguments.
    • VCF-only arguments are now separated into their own group.
  • translate: Fixes a bug in the parsing behaviour of GFF files whereby the presence of the --genes command line argument would change how we read individual GFF lines. Issue #1349, PR #1351 (@jameshadfield)
  • If TreeTimeError is encountered Augur now exits with code 2 rather than 0. (This restores the original behaviour.) #1367 (@jameshadfield)
  • Deprecate read_strains from augur.utils and add it to the public API under augur.io. #1353 (@victorlin)
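
Several of the VCF fixes above concern genotype encoding: unchanged reference genotypes written as '0' rather than '.', and more than one ALT allele. In the VCF format a genotype is an index into the allele list, with 0 for REF and 1, 2, ... for the ALT alleles in order. A minimal sketch of that convention for a single haploid call follows; the function name encode_genotype is hypothetical and this is not the TreeTime or Augur writer.

```python
def encode_genotype(ref, alts, allele):
    """Return the VCF genotype string for one haploid call (hypothetical helper)."""
    if allele is None:
        # '.' is reserved for a missing call, not for "same as reference".
        return "."
    if allele == ref:
        return "0"
    # ALT alleles are numbered from 1 in the order they appear in the ALT column.
    return str(alts.index(allele) + 1)


ref, alts = "A", ["C", "T"]
calls = [encode_genotype(ref, alts, allele) for allele in ["A", "C", "T", None]]
```

Keeping '0' and '.' distinct lets downstream tools tell a confirmed reference base apart from a position with no sequence information.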

23.1.1

07 Nov 21:42

Bug Fixes

23.1.0

22 Sep 16:44

Features

  • Support treetime 0.11.* #1310 (@corneliusroemer)
  • export: Allow minimal export using only a (newick) tree in augur export v2. #1299 (@jameshadfield)
  • A number of schema updates and improvements #1299 (@jameshadfield)
    • We now require all nodes to have node_attrs on them with one of div or num_date present
    • Some never-used properties are removed from the schemas, including a pattern for defining nucleotide INDELs which was never used by augur or auspice.
    • Tip label defaults are now settable within the auspice-config JSON
    • Empty colorings definitions are allowed (the tree will be grey in Auspice)
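
The requirement that every node has node_attrs with one of div or num_date can be checked with a short tree walk. This sketch assumes the usual nested layout of a dataset tree, where each node is a dictionary with optional name, node_attrs, and children keys; the function name nodes_missing_div_or_date is hypothetical and the real validation is done against the JSON schema.

```python
def nodes_missing_div_or_date(root):
    """Return the names of nodes lacking both 'div' and 'num_date' in node_attrs."""
    missing = []
    stack = [root]
    while stack:
        node = stack.pop()
        attrs = node.get("node_attrs", {})
        if "div" not in attrs and "num_date" not in attrs:
            missing.append(node.get("name", "<unnamed>"))
        # Visit descendants; traversal order is not significant here.
        stack.extend(node.get("children", []))
    return missing


tree = {
    "name": "root",
    "node_attrs": {"div": 0},
    "children": [
        {"name": "A", "node_attrs": {"num_date": {"value": 2020.1}}},
        {"name": "B", "node_attrs": {}},
        {"name": "C"},
    ],
}
problems = sorted(nodes_missing_div_or_date(tree))
```

Here nodes B and C would be reported, since one has empty node_attrs and the other has none at all.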

Bug fixes

  • ancestral: Export amino acid sequences inferred for the root node of the tree in the node data JSON output for compatibility with augur translate output. #1317 (@huddlej)

23.0.0

05 Sep 19:17

Major Changes

Features

  • export v2: Allow the root-sequence data to be included (inlined) in the main dataset JSON file, avoiding the need for a sidecar _root-sequence.json file. #1295 (@jameshadfield)

22.4.0

29 Aug 21:01

Features

  • refine: Export covariance matrix and standard deviation for clock rate regression in the node data JSON output when these values are calculated by TreeTime. These new values appear in the clock data structure of the JSON output as cov and rate_std keys, respectively. #1284 (@huddlej)

Bug fixes

  • clades: Fix outputs for genes named NA (previously the value was replaced by nan). #1293 (@rneher)
  • distance: Improve documentation by describing how gaps get treated as indels and how users can ignore specific characters in distance calculations. #1285 (@huddlej)
  • Fix help output compatibility with non-Unicode streams. #1290 (@victorlin)

22.3.0

14 Aug 13:53

Features

  • ancestral: add functionality to reconstruct ancestral amino acid sequences and add inferred mutations to the node_data_json with output equivalent to augur translate. ancestral now takes an annotation (--annotation), a list of genes (--genes), and a file name pattern for amino acid alignments (--translations). Mutations for each of these genes will be inferred and added to each node in the output JSON as a list at ['aa_muts'][gene]. The annotations will be added to the annotation field in the output JSON. Inferred amino acid sequences can be saved with the new --output-translations argument. #1258 (@rneher, @huddlej)
  • ancestral: add the ability to report mutations relative to a sequence other than the inferred root of the tree. This sequence can be specified via --root-sequence. All differences between it and the inferred sequence of the root node will be added as mutations to the root node, for both nucleotides and amino acids. This was already possible for VCF input via --vcf-reference. #1258 (@rneher)
  • refine: add mid_point as rooting option to refine. #1257 (@rneher)
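
The --root-sequence behaviour described above amounts to comparing two aligned sequences position by position. A minimal sketch follows, assuming both sequences have the same length and writing each difference in the familiar form of original state, 1-based position, new state. The function name mutations_relative_to is hypothetical, and case differences or ambiguous characters are not handled.

```python
def mutations_relative_to(reference, root):
    """List differences between a reference and the inferred root, e.g. 'C2T'."""
    if len(reference) != len(root):
        raise ValueError("Sequences must be aligned to the same length")
    return [
        f"{ref_state}{position}{root_state}"
        for position, (ref_state, root_state) in enumerate(zip(reference, root), start=1)
        if ref_state != root_state
    ]


root_mutations = mutations_relative_to("ACGT", "ATGA")
```

The same comparison applies to amino acid sequences, one gene at a time.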

Bug fixes

  • filter: In version 22.2.0, --query would fail when the .str accessor was used on a column. This has been fixed. #1277 (@victorlin)

22.2.0

31 Jul 19:04

Features

  • Adds a new sub-command augur curate titlecase. The titlecase command is intended to apply titlecase to string fields in a metadata record (e.g. BRAINE-LE-COMTE, FRANCE -> Braine-le-Comte, France). Previously, this was available in the transform-string-fields script within the monkeypox repo. #1197 (@j23414 and @joverlee521)
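
The transformation in the example above can be approximated in a few lines. This is a rough sketch of the idea only, not the augur curate titlecase implementation: the function name and the articles parameter, with its default set of words kept lowercase, are assumptions made for illustration.

```python
import re


def titlecase(text, articles=frozenset({"le", "la", "de"})):
    """Capitalize each word, keeping listed articles lowercase unless they come first."""

    def convert(match):
        word = match.group(0).lower()
        if word in articles and match.start() > 0:
            return word
        return word.capitalize()

    # [^\W\d_]+ matches runs of letters, so hyphens and commas are left untouched.
    return re.sub(r"[^\W\d_]+", convert, text)


result = titlecase("BRAINE-LE-COMTE, FRANCE")
```

Because punctuation is never rewritten, hyphenated place names keep their separators while each component is cased independently.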

Bug fixes

  • export v2: Previously, when strain was not used as the metadata ID column, node attributes might have gone missing from the final Auspice JSON. This has been fixed. #1260, #1262 (@victorlin, @joverlee521)
  • export v1: Added a deprecation warning for this command. #1265 (@victorlin)
  • export v1: The recently introduced flag --metadata-id-columns did not work properly due to the same export v2 bug that was fixed in this release. Instead of fixing it in export v1, drop the broken feature since this command is no longer being maintained. #1265 (@victorlin)
  • filter: Expose internal Pandas errors from --query which may be useful to users. #1267 (@victorlin)
  • filter: Previously, --query would fail when numerical comparisons were used on columns with missing values. This has been fixed. #1269 (@victorlin)