Releases: nextstrain/augur
25.2.0
These release notes are automatically extracted from the full changelog.
Features
- export v2: we now limit numerical precision on floats in the JSON. This should not change how a dataset is displayed / interpreted in Auspice but allows the gzipped & minimised JSON filesize to be reduced by around 30% (dataset-dependent). #1512 (@jameshadfield)
- traits, export v2: `augur traits` now reports all confidence values above 0.1% rather than limiting them to the top 4 results. There is no change in the eventual Auspice dataset as `augur export v2` will still only consider the top 4. #1512 (@jameshadfield)
- curate: Excel (`.xlsx` and `.xls`) and OpenOffice (`.ods`) spreadsheet files are now also supported as metadata inputs (`--metadata`). The first sheet in the workbook is read as tabular data. #1550 (@tsibley)
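The effect of limiting float precision in the exported JSON can be sketched as follows. This is not Augur's implementation: the helper name `round_floats`, the choice of 6 significant digits, and the sample node are assumptions made purely for illustration.

```python
import json

def round_floats(value, sig_digits=6):
    """Recursively round every float in a JSON-like structure.

    `sig_digits` is an illustrative choice, not the precision Augur uses.
    """
    if isinstance(value, float):
        # The 'g' format keeps a fixed number of significant digits.
        return float(f"{value:.{sig_digits}g}")
    if isinstance(value, dict):
        return {key: round_floats(item, sig_digits) for key, item in value.items()}
    if isinstance(value, list):
        return [round_floats(item, sig_digits) for item in value]
    return value

# Hypothetical node data, loosely shaped like a tree node with numeric attributes.
node = {
    "name": "NODE_1",
    "div": 0.0123456789012345,
    "confidence": [0.333333333333, 0.666666666667],
}

full = json.dumps(node, separators=(",", ":"))
trimmed = json.dumps(round_floats(node), separators=(",", ":"))
print(len(full), len(trimmed))  # the trimmed string is shorter
```

The actual saving depends on how many floats a dataset contains, which is why the release note describes the reduction as dataset-dependent.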
Bug Fixes
25.1.1
25.1.0
Features
- Support xopen major version 2. Deprecate v1. Schedule for removal around November 2024. #1532 (@corneliusroemer)
- Support networkx major version 3. #1534 (@corneliusroemer)
25.0.0
Major changes
- curate format-dates: Raises an error if provided date field does not exist in records. #1509 (@joverlee521)
- All curate subcommands: Verifies all input records have the same fields and raises an error if a record does not have matching fields. #1518 (@joverlee521)
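A minimal sketch of the kind of check described above, assuming records are plain dictionaries. The function name `validate_records` and the error wording are hypothetical and do not reflect Augur's internals.

```python
def validate_records(records):
    """Yield records unchanged, raising an error if any record's fields
    differ from those of the first record.

    Hypothetical helper for illustration; Augur's own check may differ.
    """
    expected_fields = None
    for index, record in enumerate(records):
        fields = set(record)
        if expected_fields is None:
            # The first record defines the fields every later record must have.
            expected_fields = fields
        elif fields != expected_fields:
            raise ValueError(
                f"Record {index} has fields {sorted(fields)}, "
                f"expected {sorted(expected_fields)}."
            )
        yield record

consistent = [
    {"strain": "A/1", "date": "2020-01-01"},
    {"date": "2020-02-01", "strain": "A/2"},  # field order does not matter
]
print(len(list(validate_records(consistent))))
```

Because the check is a generator, records are validated as they stream through, so an inconsistent record raises only when it is reached.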
Features
- Added a new sub-command `augur curate apply-geolocation-rules` to apply user curated geolocation rules to the geolocation fields in a metadata file. Previously, this was available as a script within the nextstrain/ingest repo. #1491 (@victorlin)
- Added a default color for the "Asia" region that will be used in `augur export` if no custom colors are provided. #1490 (@joverlee521)
- Added a new sub-command `augur curate apply-record-annotations` to apply user curated annotations to existing fields in a metadata file. Previously, this was available as `merge-user-metadata` in the nextstrain/ingest repo. #1495 (@joverlee521)
- Added a new sub-command `augur curate abbreviate-authors` to abbreviate lists of authors to "<first author> et al." Previously, this was available as the `transform-authors` script within the nextstrain/ingest repo. #1483 (@genehack)
- Added a new sub-command `augur curate parse-genbank-location` to parse the `geo_loc_name` field from GenBank records. Previously, this was available as the `translate-genbank-location` script within the nextstrain/ingest repo. #1485 (@genehack)
- curate format-dates: Added defaults to `--expected-date-formats` so that ISO 8601 dates (`%Y-%m-%d`) and its various masked forms (e.g. `%Y-XX-XX`) are automatically parsed by the command. #1501 (@joverlee521)
- Added a new sub-command `augur curate transform-strain-name` to filter strain names based on matching a regular expression. Previously, this was available as the `transform-strain-names` script within the nextstrain/ingest repo. #1514 (@genehack)
- Added a new sub-command `augur curate rename` to rename field / column names. Previously, a similar version was available as the `transform-field-names` script within the nextstrain/ingest repo; however, the behaviour is slightly changed here. #1506 (@jameshadfield)
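How masked date formats such as `%Y-XX-XX` can be tried in order is sketched below using only the standard library. The function `format_date` and the list `DEFAULT_EXPECTED_FORMATS` are hypothetical names; the list is an assumption and is not copied from Augur's defaults. Masking empty values as `XXXX-XX-XX` follows the behaviour described for this release.

```python
from datetime import datetime

# Illustrative defaults: ISO 8601 plus two masked forms (an assumption).
DEFAULT_EXPECTED_FORMATS = ["%Y-%m-%d", "%Y-%m-XX", "%Y-XX-XX"]

def format_date(value, expected_formats=DEFAULT_EXPECTED_FORMATS):
    """Return `value` as an ISO 8601 date, masking unknown parts with 'XX'."""
    value = value.strip()
    if not value:
        # Empty values represent completely unknown dates.
        return "XXXX-XX-XX"
    for date_format in expected_formats:
        try:
            # Literal 'XX' in the format must match literal 'XX' in the value.
            parsed = datetime.strptime(value, date_format)
        except ValueError:
            continue
        # Only report the parts the matching format actually parsed.
        month = f"{parsed.month:02d}" if "%m" in date_format else "XX"
        day = f"{parsed.day:02d}" if "%d" in date_format else "XX"
        return f"{parsed.year:04d}-{month}-{day}"
    raise ValueError(f"Unable to parse {value!r} with formats {expected_formats}")

for raw in ["2020-03-05", "2020-03-XX", "2020-XX-XX", ""]:
    print(repr(raw), "->", format_date(raw))
```

Formats are tried in the order given, so more specific formats should come first.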
Bug Fixes
- filter: Improve speed of checking duplicates in metadata, especially for large files. #1466 (@victorlin)
- curate: Stop adding double quotes to the metadata TSV output when field values have internal quotes. #1493 (@joverlee521)
- curate format-dates: Mask empty date values as `XXXX-XX-XX` to represent unknown dates. #1509 (@joverlee521)
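The quoting issue in TSV output can be reproduced with Python's standard `csv` module. This is a sketch of the general technique, not a copy of Augur's fix; the helper `write_tsv` and the sample rows are hypothetical.

```python
import csv
import io

def write_tsv(rows, **writer_options):
    """Write rows as TSV into a string, forwarding extra csv.writer options."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n", **writer_options)
    writer.writerows(rows)
    return buffer.getvalue()

rows = [["strain", "note"], ["A/1", 'sampled at "site 3"']]

# Default (QUOTE_MINIMAL): a field containing a quote is wrapped and its quotes doubled.
default_output = write_tsv(rows)

# Disabling quoting entirely writes the field value exactly as given.
unquoted_output = write_tsv(rows, quoting=csv.QUOTE_NONE, quotechar=None)

print(repr(default_output))
print(repr(unquoted_output))
```

One trade-off of this approach: with quoting disabled and no escape character set, the writer raises an error if a value contains a tab or newline, so such values need to be handled before writing.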
24.4.0
Features
- All commands: Allow repeating an option that takes multiple values. Previously, if multiple option flags were specified (e.g. `--exclude-where 'region=A' --exclude-where 'region=B'`), only the last one was used. Now, all values are used. #1445 (@victorlin)
- ancestral, translate: output node data files are now validated. The argument `--validation-mode` is added which controls this behaviour (default: error). This argument also controls validation of the input node-data file (ancestral only). #1440 (@jameshadfield)
- export: Updated default latitudes and longitudes for geography traits. This only applies if you are not using `--lat-longs` to override the built in mappings. #1449 (@trvrb)
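The difference between "last flag wins" and "all values are used" can be shown with the standard `argparse` module (Python 3.8+). This sketch uses the built-in `extend` action; Augur's own argument handling may be implemented differently, and `build_parser` is a hypothetical helper.

```python
import argparse

def build_parser(action):
    """Build a toy parser whose --exclude-where option uses the given action."""
    parser = argparse.ArgumentParser(prog="example")
    parser.add_argument("--exclude-where", nargs="+", action=action, default=[])
    return parser

argv = ["--exclude-where", "region=A", "--exclude-where", "region=B"]

# With the default "store" action, the second flag overwrites the first.
last_wins = build_parser("store").parse_args(argv).exclude_where

# With "extend", values from every occurrence of the flag are accumulated.
all_used = build_parser("extend").parse_args(argv).exclude_where

print(last_wins)
print(all_used)
```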
Bug Fixes
- validation: we no longer exit with a non-zero exit code when the requested validation mode is "warn" #1440 (@jameshadfield)
- validation: we no longer perform any validation when the requested validation mode is "skip" #1440 (@jameshadfield)
- filter: Send all log messages to `stderr`. This allows output to be written to `stdout` (e.g. `--output-strains /dev/stdout`). #1459 (@victorlin)
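The three validation modes named in this release (error, warn, skip) can be sketched as below. The class and function names (`ValidationMode`, `ValidationError`, `run_validation`, `require_nodes`) are hypothetical and only illustrate the described behaviour: "error" raises, "warn" reports on `stderr` and continues, and "skip" performs no validation at all.

```python
import sys
from enum import Enum

class ValidationMode(Enum):
    ERROR = "error"
    WARN = "warn"
    SKIP = "skip"

class ValidationError(Exception):
    """Raised by a validator when the data is invalid."""

def run_validation(data, validator, mode=ValidationMode.ERROR):
    """Apply `validator` to `data` according to `mode`.

    Returns True if validation was performed, False if it was skipped.
    """
    if mode is ValidationMode.SKIP:
        return False  # the validator is never called
    try:
        validator(data)
    except ValidationError as error:
        if mode is ValidationMode.ERROR:
            raise
        # WARN: report the problem on stderr and carry on normally.
        print(f"WARNING: {error}", file=sys.stderr)
    return True

def require_nodes(data):
    """Toy validator: node data must contain a 'nodes' key."""
    if "nodes" not in data:
        raise ValidationError("missing required key 'nodes'")

print(run_validation({}, require_nodes, ValidationMode.WARN))
print(run_validation({}, require_nodes, ValidationMode.SKIP))
```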
24.3.0
Features
- filter: Added a new option `--max-length` to filter out sequences that are longer than a certain number of base pairs. #1429 (@victorlin)
- parse: Added support for environments that use pandas 2.x. #1436 (@emollier, @victorlin)
Bug Fixes
- filter: Updated docs with an example of tiered subsampling. #1425 (@victorlin)
- export: Fixes bug #1433 introduced in v23.1.0, that causes validation to fail when gene names start with `nuc`, e.g. `nucleocapsid`. #1434 (@corneliusroemer)
- import: Fixes bug introduced in v24.2.0 that prevented `import beast` from running. #1439 (@tomkinsc)
- translate, ancestral: Compound CDS are now exported as segmented CDS and are now viewable in Auspice. #1438 (@jameshadfield)
24.2.3
Bug Fixes
- filter: Updated the help and report text of `--min-length` to explicitly state that the minimum length filter only counts standard nucleotide characters A, C, G, or T (case-insensitive). This has been the behavior since version 3.0.3.dev1, but has never been explicitly documented. #1422 (@joverlee521)
- frequencies: Fixed a bug introduced in 24.2.0 and 24.1.0 that prevented `--regions` from working when providing regions other than the default "global" region. #1424
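The `--min-length` counting rule described above (only A, C, G, or T, case-insensitive) can be sketched in a few lines. The function names are hypothetical and this is not Augur's code.

```python
def count_standard_nucleotides(sequence):
    """Count only A, C, G, and T characters, ignoring case."""
    return sum(1 for base in sequence.upper() if base in "ACGT")

def passes_min_length(sequence, min_length):
    """Return True if the sequence has at least `min_length` standard nucleotides."""
    return count_standard_nucleotides(sequence) >= min_length

# Ambiguous bases (N, R, Y) and gaps (-) do not contribute to the length.
print(count_standard_nucleotides("acgtNN--RY"))
print(passes_min_length("ACGTNNNN", 5))
```

Under this rule, a sequence of 8 characters with 4 ambiguous bases fails a minimum length of 5 even though its raw length is 8.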
24.2.2
Bug Fixes
- filter: In versions 24.2.0 and 24.2.1, `--query` stopped working in cases where internal optimizations added in version 24.2.0 failed to parse the columns from the query. It now falls back to non-optimized behavior that allows queries to work. #1418 (@victorlin)
- filter: Handle backtick quoting in internal optimizations of `--query`. #1417 (@victorlin)
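The fallback pattern described for `--query` (try to work out which columns a query needs, and load everything if that fails) can be sketched as follows. This is only an illustration of the pattern and is an assumption throughout: it parses the query with Python's `ast` module, which is not how Augur or pandas extract column names, and unlike the fix noted above it does not understand backtick quoting; it simply falls back. The function names are hypothetical.

```python
import ast

def columns_in_query(query):
    """Return the set of names referenced by `query`, or None if the
    query cannot be parsed by this simplified approach."""
    try:
        tree = ast.parse(query, mode="eval")
    except SyntaxError:
        return None
    return {node.id for node in ast.walk(tree) if isinstance(node, ast.Name)}

def columns_to_load(query, all_columns):
    """Choose which columns to read for evaluating `query`."""
    names = columns_in_query(query)
    if names is None:
        # Non-optimized fallback: load every column so the query still works.
        return list(all_columns)
    return [column for column in all_columns if column in names]

all_columns = ["strain", "country", "coverage", "collection date"]
print(columns_to_load("(country == 'USA') & (coverage > 0.9)", all_columns))
print(columns_to_load("`collection date` > '2020'", all_columns))
```

The point of the design is that the optimization is allowed to fail: a query the parser cannot analyse costs extra I/O but still runs.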
24.2.1
Bug Fixes
- frequencies: Fixed a bug introduced in 24.2.0 that prevented `--method diffusion` from working alongside `--tree`. #1412 (@victorlin)
24.2.0
Features
- filter: Added a new option `--query-columns` that allows specifying what columns are used in `--query` along with the expected data types. If unspecified, automatic detection of columns and types is attempted. #1294 (@victorlin)
- `augur.io.read_metadata`: A new optional `columns` argument allows specifying a subset of columns to load. The default behavior still loads all columns, so this is not a breaking change. #1294 (@victorlin)
- `augur parse`: A new optional `--output-id-field` argument allows the user to select any ID field for the produced FASTA file (e.g. 'accession' instead of 'name' or 'strain'). #1403 (@j23414)
  - When no `--output-id-field` is given and the data has both `name` and `strain` fields, continue to preferentially use `name` over `strain` as the sequence ID field; but, throw a deprecation warning that the order will be switched to prefer `strain` over `name` in the future to be consistent with the rest of Augur.
  - Added entry to DEPRECATED.md.
- Compression should now be supported for all input and output files. Please open an issue if you find one that doesn't! #1381 (@victorlin)
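The ID field selection described for `augur parse` can be sketched as below. The function `choose_id_field` is hypothetical, and raising an error when the requested field is missing is an assumption of this sketch rather than documented behaviour.

```python
import warnings

def choose_id_field(fields, output_id_field=None):
    """Pick the field used as the sequence ID in the output FASTA."""
    if output_id_field is not None:
        if output_id_field not in fields:
            # Assumption: a missing explicit field is treated as an error.
            raise ValueError(f"Field {output_id_field!r} not found in {fields}")
        return output_id_field
    if "name" in fields and "strain" in fields:
        # Current preference is 'name'; warn that this will change.
        warnings.warn(
            "Both 'name' and 'strain' are present; using 'name'. "
            "A future version will prefer 'strain'.",
            DeprecationWarning,
            stacklevel=2,
        )
        return "name"
    for candidate in ("name", "strain"):
        if candidate in fields:
            return candidate
    raise ValueError("No 'name' or 'strain' field found; specify an ID field.")

print(choose_id_field(["accession", "name", "strain"], "accession"))
print(choose_id_field(["strain", "date"]))
```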
Bug Fixes
- filter: In version 24.1.0, automatic conversion of boolean columns was accidentally removed. It has been restored with additional support for empty values evaluated as `None`. #1410 (@victorlin)
- filter: The order of rows in `--output-metadata` and `--output-strains` now reflects the order in the original `--metadata`. #1294 (@victorlin)
- filter, frequencies, refine: Performance improvements to reading the input metadata file. #1294 (@victorlin)
  - For filter, this comes with increased writing times for `--output-metadata` and `--output-strains`. However, net I/O time still decreased during testing of this change.
- filter: Updated the help text of `--include` and `--include-where` to explicitly state that this can add strains that are missing an entry from `--sequences`. #1389 (@victorlin)
- filter: Fixed the summary messages to properly reflect force-inclusion of strains that are missing an entry from `--sequences`. #1389 (@victorlin)
- filter: Updated wording of summary messages. #1389 (@victorlin)
- Enforce UTF-8 encoding when reading and writing files. Improve error messages when a non-UTF-8 file is used. #1381 (@victorlin)
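Enforcing UTF-8 with a clearer error message can be sketched with the standard library alone. The function `read_text_utf8` and the wording of its error are hypothetical; Augur's actual messages may differ.

```python
import os
import tempfile

def read_text_utf8(path):
    """Read a text file as UTF-8, raising a descriptive error otherwise."""
    try:
        with open(path, encoding="utf-8") as handle:
            return handle.read()
    except UnicodeDecodeError as error:
        raise ValueError(
            f"File {path!r} is not valid UTF-8. "
            "Re-save it with UTF-8 encoding and try again."
        ) from error

# Demonstration with a temporary file containing Latin-1 encoded bytes.
with tempfile.NamedTemporaryFile(delete=False, suffix=".tsv") as handle:
    handle.write("café\n".encode("latin-1"))
    latin1_path = handle.name

try:
    read_text_utf8(latin1_path)
except ValueError as error:
    print(error)
finally:
    os.remove(latin1_path)
```

Passing `encoding="utf-8"` explicitly matters because the default encoding of `open` otherwise depends on the platform and locale.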