Skip to content

Releases: nextstrain/augur

25.2.0

24 Jul 16:41
Compare
Choose a tag to compare

These release notes are automatically extracted from the full changelog.

Features

  • export v2: we now limit numerical precision on floats in the JSON. This should not change how a dataset is displayed / interpreted in Auspice but allows the gzipped & minimised JSON filesize to be reduced by around 30% (dataset-dependent). #1512 (@jameshadfield)
  • traits, export v2: augur traits now reports all confidence values above 0.1% rather than limiting them to the top 4 results. There is no change in the eventual Auspice dataset as augur export v2 will still only consider the top 4. #1512 (@jameshadfield)
  • curate: Excel (.xlsx and .xls) and OpenOffice (.ods) spreadsheet files are now also supported as metadata inputs (--metadata). The first sheet in the workbook is read as tabular data. #1550 (@tsibley)

Bug Fixes

  • titers sub: Fixes a bug where antigenic weights were assigned to branches for substitutions in the incorrect order of <derived allele><position><ancestral allele> instead of <ancestral allele><position><derived allele>. #1555 (@huddlej)

25.1.1

15 Jul 17:55
Compare
Choose a tag to compare

These release notes are automatically extracted from the full changelog.

Bug Fixes

  • curate parse-genbank-location: Fix a bug where a mix of empty and populated location-field values would result in inconsistent fields in the output NDJSON #1531(@genehack)

25.1.0

11 Jul 23:39
Compare
Choose a tag to compare

These release notes are automatically extracted from the full changelog.

Features

25.0.0

10 Jul 21:39
Compare
Choose a tag to compare

These release notes are automatically extracted from the full changelog.

Major changes

  • curate format-dates: Raises an error if provided date field does not exist in records. #1509 (@joverlee521)
  • All curate subcommands: Verifies all input records have the same fields and raises an error if a record does not have matching fields. #1518 (@joverlee521)

Features

  • Added a new sub-command augur curate apply-geolocation-rules to apply user curated geolocation rules to the geolocation fields in a metadata file. Previously, this was available as a script within the nextstrain/ingest repo. #1491 (@victorlin)
  • Added a default color for the "Asia" region that will be used in augur export is no custom colors are provided. #1490 (@joverlee521)
  • Added a new sub-command augur curate apply-record-annotations to apply user curated annotations to existing fields in a metadata file. Previously, this was available as a merge-user-metadata in the nextstrain/ingest repo. #1495 (@joverlee521)
  • Added a new sub-command augur curate abbreviate-authors to abbreviate lists of authors to " et al." Previously, this was avaliable as the transform-authors script within the nextstrain/ingest repo. [#1483][] (@genehack)
  • Added a new sub-command augur curate parse-genbank-location to parse the geo_loc_name field from GenBank reconds. Previously, this was available as the translate-genbank-location script within the nextstrain/ingest repo. [#1485][] (@genehack)
  • curate format-dates: Added defaults to --expected-date-formats so that ISO 8601 dates (%Y-%m-%d) and its various masked forms (e.g. %Y-XX-XX) are automatically parsed by the command. #1501 (@joverlee521)
  • Added a new sub-command augur curate transform-strain-name to filter strain names based on matching a regular expression. Previously, this was available as the transform-strain-names script within the nextstrain/ingest repo. #1514 (@genehack)
  • Added a new sub-command augur curate rename to rename field / column names. Previously, a similar version was available as the transform-field-names script within the nextstrain/ingest repo however the behaviour is slightly changed here. #1506 (@jameshadfield)

Bug Fixes

  • filter: Improve speed of checking duplicates in metadata, especially for large files. #1466 (@victorlin)
  • curate: Stop adding double quotes to the metadata TSV output when field values have internal quotes. #1493 (@joverlee521)
  • curate format-dates: Mask empty date values as XXXX-XX-XX to represent unknown dates. #1509 (@joverlee521)

24.4.0

15 May 23:20
Compare
Choose a tag to compare

These release notes are automatically extracted from the full changelog.

Features

  • All commands: Allow repeating an option that takes multiple values. Previously, if multiple option flags were specified (e.g. --exclude-where 'region=A' --exclude-where 'region=B'), only the last one was used. Now, all values are used. #1445 (@victorlin)
  • ancestral, translate: output node data files are now validated. The argument --validation-mode is added which controls this behaviour (default: error). This argument also controls validation of the input node-data file (ancestral only). #1440 (@jameshadfield)
  • export: Updated default latitudes and longitudes for geography traits. This only applies if you are not using --lat-longs to override the built in mappings. #1449 (@trvrb)

Bug Fixes

  • validation: we no longer exit with a non-zero exit code when the requested validation mode is "warn" #1440 (@jameshadfield)
  • validation: we no longer perform any validation when the requested validation mode is "skip" #1440 (@jameshadfield)
  • filter: Send all log messages to stderr. This allows output to be written to stdout (e.g. --output-strains /dev/stdout). #1459 (@victorlin)

24.3.0

18 Mar 17:20
Compare
Choose a tag to compare

These release notes are automatically extracted from the full changelog.

Features

  • filter: Added a new option --max-length to filter out sequences that are longer than a certain amount of base pairs. #1429 (@victorlin)
  • parse: Added support for environments that use pandas 2.x. #1436 (@emollier, @victorlin)

Bug Fixes

  • filter: Updated docs with an example of tiered subsampling. #1425 (@victorlin)
  • export: Fixes bug #1433 introduced in v23.1.0, that causes validation to fail when gene names start with nuc, e.g. nucleocapsid. #1434 (@corneliusroemer)
  • import: Fixes bug introduced in v24.2.0 that prevented import beast from running. #1439 (@tomkinsc)
  • translate, ancestral: Compound CDS are now exported as segmented CDS and are now viewable in Auspice. #1438 (@jameshadfield)

24.2.3

23 Feb 22:08
Compare
Choose a tag to compare

These release notes are automatically extracted from the full changelog.

Bug Fixes

  • filter: Updated the help and report text of --min-length to explicitly state that the minimum length filter only counts standard nucleotide characters A, C, G, or T (case-insensitive). This has been the behavior since version 3.0.3.dev1, but has never been explicitly documented. #1422 (@joverlee521)
  • frequencies: Fixed a bug introduced in 24.2.0 and 24.1.0 that prevented --regions from working when providing regions other than the default "global" region. #1424

24.2.2

16 Feb 22:58
Compare
Choose a tag to compare

These release notes are automatically extracted from the full changelog.

Bug Fixes

  • filter: In versions 24.2.0 and 24.2.1, --query stopped working in cases where internal optimizations added in version 24.2.0 failed to parse the columns from the query. It now falls back to non-optimized behavior that allows queries to work. #1418 (@victorlin)
  • filter: Handle backtick quoting in internal optimizations of --query. #1417 (@victorlin)

24.2.1

14 Feb 00:36
Compare
Choose a tag to compare

These release notes are automatically extracted from the full changelog.

Bug Fixes

  • frequencies: Fixed a bug introduced in 24.2.0 that prevented --method diffusion from working alongside --tree. #1412 (@victorlin)

24.2.0

12 Feb 21:07
Compare
Choose a tag to compare

These release notes are automatically extracted from the full changelog.

Features

  • filter: Added a new option --query-columns that allows specifying what columns are used in --query along with the expected data types. If unspecified, automatic detection of columns and types is attempted. #1294 (@victorlin)
  • augur.io.read_metadata: A new optional columns argument allows specifying a subset of columns to load. The default behavior still loads all columns, so this is not a breaking change. #1294 (@victorlin)
  • augur parse: A new optional --output-id-field argument allows the user to select any ID field for the produced FASTA file (e.g. 'accession' instead of 'name' or 'strain'). #1403 (@j23414)
    • When no --output-id-field is given and the data has both name and strain fields, continue to preferentially use name over strain as the sequence ID field; but, throw a deprecation warning that the order will be switched to prefer strain over name in the future to be consistent with the rest of Augur.
    • Added entry to DEPRECATED.md.
  • Compression should now be supported for all input and output files. Please open an issue if you find one that doesn't! #1381 (@victorlin)

Bug Fixes

  • filter: In version 24.1.0, automatic conversion of boolean columns was accidentally removed. It has been restored with additional support for empty values evaluated as None. #1410 (@victorlin)
  • filter: The order of rows in --output-metadata and --output-strains now reflects the order in the original --metadata. #1294 (@victorlin)
  • filter, frequencies, refine: Performance improvements to reading the input metadata file. #1294 (@victorlin)
    • For filter, this comes with increased writing times for --output-metadata and --output-strains. However, net I/O speed still decreased during testing of this change.
  • filter: Updated the help text of --include and --include-where to explicitly state that this can add strains that are missing an entry from --sequences. #1389 (@victorlin)
  • filter: Fixed the summary messages to properly reflect force-inclusion of strains that are missing an entry from --sequences. #1389 (@victorlin)
  • filter: Updated wording of summary messages. #1389 (@victorlin)
  • Enforce UTF-8 encoding when reading and writing files. Improve error messages when a non-UTF-8 file is used. #1381 (@victorlin)