
v1.0.0

@alimanfoo released this 09 Feb 00:01

Highlights

New features and API changes

  • Ag3: Added support for genome regions when accessing data
    (GH14). N.B., the contig parameter is no longer supported; instead,
    use the region parameter, which can be a contig ID (e.g., "3L"), a
    contig region (e.g., "3L:1000000-2000000"), a gene ID (e.g.,
    "AGAP004070"), or a list of any of the above. This affects methods
    including snp_sites(), site_filters(), snp_genotypes() and
    snp_dataset(). Contributed by Nace Kranjc.

  • Ag3: The parameters for specifying which species analysis version
    is used have changed
    (GH55). This
    affects species_calls(), sample_metadata(),
    snp_allele_frequencies() and gene_cnv_frequencies(). In most
    cases the default values for these parameters should be appropriate
    and so no changes to your code should be needed.

  • Ag3: The names of the columns in dataframes containing data
    related to species calling have changed to make it clearer which
    species calling method has been used. This affects dataframes
    returned by species_calls() and sample_metadata(). See GH93 for
    further details.

  • Ag3: The latest cohorts metadata are now automatically loaded and
    joined in with the sample metadata when calling sample_metadata().
    See GH94 for further details.

  • Ag3: SNP effects are now automatically included in the output
    dataframe from snp_allele_frequencies()
    (GH95).

  • Ag3: Added a new sample_query parameter to methods returning
    frequencies, allowing a subset of samples to be selected
    (GH96).

  • Ag3: Added a new method aa_allele_frequencies() to return a
    dataframe of amino acid substitution allele frequencies
    (GH101).

  • Ag3: Added a new method plot_frequencies_heatmap() for creating
    a heatmap plot of allele frequencies
    (GH102).

  • Ag3: The Google Cloud Storage URL ("gs://vo_agam_release") is now
    the default value when instantiating the Ag3 class
    (GH103), so you no longer need to provide it when accessing data
    from GCS. For example, you can simply do:

import malariagen_data

# No URL argument needed: defaults to "gs://vo_agam_release"
ag3 = malariagen_data.Ag3()

  • Ag3: The identifiers used for data releases have been changed to
    use "3.0" instead of "v3", "3.1" instead of "v3.1",
    etc. (GH104).

  • The Ag3 and Amin1 classes have a better repr
    (GH111).

  • Ag3: All dataframe columns containing allele frequency values are
    now prefixed with "frq_" to allow for easier selection of frequency
    columns
    (GH116).

  • Ag3: When computing frequencies, columns for cohorts below the
    minimum cohort size are now automatically dropped
    (GH118).

  • Amin1: Added support for the region parameter in place of contig
    (GH119).

  • Ag3: The snp_sites() method no longer returns a tuple of arrays
    when the field parameter is not provided; please provide an explicit
    field parameter, or use the snp_calls() method instead
    (recommended).
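
The forms accepted by the region parameter (GH14) can be illustrated with a small parser. This is a hypothetical sketch, not the malariagen_data implementation: the Region type, the parse_region() function and the CONTIGS set are assumptions made for illustration, and gene IDs are only recognised here, not resolved to genomic coordinates (that would require the genome annotation).

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Union


# Hypothetical representation of a parsed region (not a malariagen_data type).
@dataclass
class Region:
    contig: Optional[str] = None
    start: Optional[int] = None
    end: Optional[int] = None
    gene_id: Optional[str] = None  # set when the input is treated as a gene ID


# Assumed set of Ag3 contig IDs, for illustration only.
CONTIGS = {"2R", "2L", "3R", "3L", "X"}

# Matches contig regions such as "3L:1000000-2000000".
_REGION_RE = re.compile(r"^(?P<contig>[^:]+):(?P<start>\d+)-(?P<end>\d+)$")


def parse_region(region: Union[str, List[str]]) -> List[Region]:
    """Normalise a region argument into a flat list of Region objects."""
    if isinstance(region, (list, tuple)):
        # A list may mix any of the single-value forms below.
        return [r for item in region for r in parse_region(item)]
    if region in CONTIGS:
        # A whole contig, e.g. "3L".
        return [Region(contig=region)]
    m = _REGION_RE.match(region)
    if m:
        # A contig region with start and end coordinates.
        return [Region(contig=m["contig"], start=int(m["start"]), end=int(m["end"]))]
    # Anything else is treated as a gene ID, e.g. "AGAP004070".
    return [Region(gene_id=region)]


regions = parse_region(["3L", "3L:1000000-2000000", "AGAP004070"])
for r in regions:
    print(r)
```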

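Because all allele frequency columns now share the "frq_" prefix (GH116), they can be selected by pattern with pandas. The sketch below uses a made-up dataframe (the cohort names, values, cohort_sizes and min_cohort_size are all hypothetical) and also mimics the automatic dropping of undersized cohorts (GH118) to show the idea; it is not how malariagen_data implements that step.

```python
import pandas as pd

# Toy frequency table: columns and values are invented for illustration;
# only the "frq_" prefix convention comes from GH116.
df = pd.DataFrame(
    {
        "variant": ["variant_1", "variant_2"],
        "effect": ["missense", "synonymous"],
        "frq_cohort_a": [0.82, 0.10],
        "frq_cohort_b": [0.45, 0.00],
        "frq_cohort_c": [0.90, 0.25],
    }
).set_index("variant")

# Hypothetical sample counts per cohort and minimum cohort size.
cohort_sizes = {"cohort_a": 40, "cohort_b": 7, "cohort_c": 25}
min_cohort_size = 10

# Select only the allele frequency columns via their shared prefix.
frq = df.filter(regex=r"^frq_")

# Drop columns for cohorts below the minimum size (mimics GH118).
too_small = [f"frq_{name}" for name, n in cohort_sizes.items() if n < min_cohort_size]
frq = frq.drop(columns=too_small)

print(frq.columns.tolist())  # ['frq_cohort_a', 'frq_cohort_c']
```
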
Bug fixes, maintenance and documentation

  • Ag3: Move default values for analysis parameters to constants
    (GH70).

  • Ag3: Check for manifest.tsv when discovering a release
    (GH74).

  • Ag3: Decode sample IDs when building snp_calls() dataset
    (GH82).

  • Ag3: Fix snp_calls() not accepting multiple releases for the
    sample_set parameter
    (GH85).

  • Ag3: Fix the chunks parameter appearing to be ignored
    (GH86).

  • Support Python 3.9
    (GH91).

  • Ag3: Fix pandas performance warnings
    (GH108).

  • Ag3: Fix bug involving inconsistent array lengths before and after
    computation
    (GH114).

  • Ag3: Fix compatibility with zarr 2.11.0
    (GH129).

  • Optimisations to speed up the test suite
    (GH122).

Full Changelog: v0.15.0...v1.0.0