
Releases: malariagen/malariagen-data-python


01 Apr 16:47


  • Ag3: Limit docstring widths for better wrapping in colab help tabs
  • Ag3: Return a copy of cached DataFrames so any subsequent user
    modifications do not affect the cached data
  • Ag3: Improve zooming behaviour of bokeh genome plots
  • Ag3: Add sample identifiers to CNV HMM heatmap plots
  • Ag3: Exclude high coverage variance samples by default in CNV HMM
    heatmap plots
  • Ag3: Standardise default width of bokeh genome plots
  • Ag3: Consistently capitalise plot labels
  • Ag3: Tidy title for CNV HMM heatmap plots when using multiple
    sample sets
  • Ag3: Fix a bug in loading of gene CNV frequencies where
    intermediate species samples are missing
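
The copy-on-return caching behaviour noted above can be illustrated with a
minimal sketch. This is not the library's implementation: the TableCache
class and its get() method are hypothetical names, and a plain dict of
lists stands in for a DataFrame so the example needs only the standard
library.

```python
import copy


class TableCache:
    """Hypothetical cache that hands out copies of stored tables."""

    def __init__(self):
        self._cache = {}

    def get(self, key, loader):
        # Load and store the table on first access only.
        if key not in self._cache:
            self._cache[key] = loader()
        # Return a deep copy so callers cannot mutate the cached object.
        return copy.deepcopy(self._cache[key])


cache = TableCache()

# First access loads the data and returns a copy of it.
first = cache.get("metadata", lambda: {"sample_id": ["S1", "S2"]})
first["sample_id"].append("S3")  # the caller modifies its own copy

# Second access is served from the cache; the loader is not called again.
second = cache.get("metadata", lambda: {"sample_id": ["never used"]})
```

With a pandas DataFrame the same idea would typically use the DataFrame's
own copy() method in place of copy.deepcopy().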


Full Changelog: v3.0.0...v3.1.0


14 Mar 17:40


  • Added a new function Ag3.plot_cnv_hmm_coverage() which generates a
    bokeh plot showing normalised coverage and HMM copy number for an
    individual sample.

  • Added a new function Ag3.plot_cnv_hmm_heatmap() which generates a
    bokeh plot showing the HMM copy number for multiple samples as a

  • Added support for accessing genome regions to the CNV data access
    functions Ag3.cnv_hmm(), Ag3.gene_cnv(),
    Ag3.gene_cnv_frequencies() and Ag3.cnv_coverage_calls()
    (GH113). Please use the region parameter to specify a contig or
    contig region. The previous contig parameter is no longer
    supported.

  • Added support for a region parameter to the Ag3.geneset()

  • Added docstrings for Ag3.plot_genes() and Ag3.plot_transcript()

  • Set plot width and height automatically in
    Ag3.plot_frequencies_heatmap() based on the number of rows and


Full Changelog: v2.2.0...v3.0.0


09 Mar 22:18


  • Added a new function Ag3.plot_genes() which generates a bokeh plot
    of gene annotations

  • Added a new function Ag3.plot_transcript() which generates a bokeh
    plot of a gene model

  • Fixed a bug in the Ag3.gene_cnv_frequencies() function

  • CI improvements


Full Changelog: v2.1.0...v2.2.0


08 Mar 00:18


  • Ag3: Add support for giving a list of contigs to the contig
    parameter in gene_cnv() and gene_cnv_frequencies()

  • Ag3: Miscellaneous optimisations and documentation fixes


Full Changelog: v2.0.0...v2.1.0


03 Mar 13:37


New features and API changes

  • Ag3: New functions have been added for space-time analysis of SNP
    allele frequencies and gene CNV frequencies

    • The new function plot_frequencies_time_series() creates faceted time
      series plots of frequencies using plotly.

    • The new function plot_frequencies_interactive_map() creates an
      ipyleaflet map with coloured markers representing frequencies in
      different cohorts, with widgets to select the variant, taxon and
      time period of interest.

    • The new function plot_frequencies_map_markers() supports plotting
      frequency markers on an existing ipyleaflet map.

    • The new function snp_allele_frequencies_advanced() computes SNP
      allele frequencies in a transcript of interest and returns an
      xarray dataset which can be used as input to space and time
      plotting functions.

    • The new function aa_allele_frequencies_advanced() computes amino
      acid substitution frequencies in a transcript of interest and
      returns an xarray dataset which can be used as input to space and
      time plotting functions.

    • The new function gene_cnv_frequencies_advanced() computes gene
      CNV frequencies for a given contig and returns an xarray dataset
      which can be used as input to space and time plotting functions.

    • The function aa_allele_frequencies() has been modified
      to better handle the case where SNPs at different genome positions
      cause the same amino acid change.

  • Ag3: The function gene_cnv_frequencies() has been modified so
    that each row now represents a gene and variant (amplification or
    deletion), and columns are cohorts
    (GH139). Also, a new parameter drop_invariant has been added, which
    is True by default, meaning that only records with some evidence of
    copy number variation in the given cohorts are returned.

  • Ag3: Samples with high coverage variance are now removed by
    default when running gene_cnv_frequencies(), and this can be
    controlled via a new max_coverage_variance parameter (GH141). To
    support this, the sample_coverage_variance variable has been added
    to the output of the gene_cnv() function.

  • Ag3: All functions accepting a sample_sets parameter now check
    for the same sample set being selected more than once

  • Ag3: The functions which plot frequencies, including
    plot_frequencies_heatmap(), plot_frequencies_time_series(), and
    plot_frequencies_interactive_map(), have been modified to use
    consistent labels for variants

  • Ag3: The frequencies plotting functions now automatically set a
    title based on metadata from the input dataframe or dataset
    (GH146). The cohorts axis labels have also been moved to the bottom
    to make room for a title.

  • Ag3: All column names in sample metadata dataframes are now lower
    case, and columns starting "adm" have been renamed to start with
    "admin" (e.g., "adm1_ISO" has been renamed to "admin1_iso") to have
    consistent naming of columns and parameter values relating to
    administrative units

  • Ag3: Functions cnv_hmm(), cnv_coverage_calls() and
    cnv_discordant_read_calls() support multiple contigs for the
    contig parameter and automatically concatenate datasets
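
The sample metadata column renaming described above (lower-casing, and
"adm" becoming "admin") can be sketched as follows. This is only an
illustration of the naming rule as stated in these notes, not the library's
code; the function name normalise_column_name is hypothetical.

```python
import re


def normalise_column_name(name: str) -> str:
    """Apply the renaming rule described in the release notes (a sketch)."""
    # All column names become lower case.
    lowered = name.lower()
    # Columns starting with "adm" are renamed to start with "admin".
    # The negative lookahead leaves names that already start with
    # "admin" unchanged.
    return re.sub(r"^adm(?!in)", "admin", lowered)


old_columns = ["sample_id", "adm1_ISO", "adm2_name", "Country"]
new_columns = [normalise_column_name(c) for c in old_columns]
```

Code written against the old names (for example, a query referring to
"adm1_ISO") would need updating to use the new lower-case "admin" names.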

Bug fixes, maintenance and documentation

  • Ag3: Function docstrings have been improved to document return

  • Ag3: Improve repr methods


Full Changelog: v1.0.1...v2.0.0


09 Feb 00:40

What's Changed

  • Expose more imshow parameters in plot_frequencies_heatmap by @alimanfoo in #134

Full Changelog: v1.0.0...v1.0.1


09 Feb 00:01


New features and API changes

  • Ag3: Added support for genome regions when accessing data
    (GH14). N.B., the contig parameter is no longer supported; instead,
    use the region parameter, which can be a contig ID (e.g., "3L"), a
    contig region (e.g., "3L:1000000-2000000"), a gene ID (e.g.,
    "AGAP004070"), or a list of any of the above. This affects methods
    including snp_sites(), site_filters(), snp_genotypes() and
    snp_dataset(). Contributed by Nace

  • Ag3: The parameters for specifying which species analysis version
    is used have changed (GH55). This affects species_calls(),
    sample_metadata(), snp_allele_frequencies() and
    gene_cnv_frequencies(). In most cases the default values for these
    parameters should be appropriate and so no changes to your code
    should be needed.

  • Ag3: The names of the columns in dataframes containing data
    related to species calling have changed to make it clearer which
    species calling method has been used. This affects dataframes
    returned by species_calls() and sample_metadata(). See
    for further details.

  • Ag3: The latest cohorts metadata are now automatically loaded and
    joined in with the sample metadata when calling
    sample_metadata(). See
    for further details.

  • Ag3: SNP effects are now automatically included in the output
    dataframe from snp_allele_frequencies()

  • Ag3: Added a new sample_query parameter to methods returning
    frequencies to allow for making a sub-selection of samples

  • Ag3: Added a new method aa_allele_frequencies() to return a
    dataframe of amino acid substitution allele frequencies

  • Ag3: Added a new method plot_frequencies_heatmap() for creating
    a heatmap plot of allele frequencies

  • Ag3: The Google Cloud Storage URL ("gs://vo_agam_release") is now
    the default value when instantiating the Ag3 class (GH103). So now
    you don't need to provide it if you are accessing data from GCS.
    I.e., you can just do:

        import malariagen_data
        ag3 = malariagen_data.Ag3()

  • Ag3: The identifiers used for data releases have been changed to
    use "3.0" instead of "v3", "3.1" instead of "v3.1", etc. (GH104)

  • The Ag3 and Amin1 classes have a better repr

  • Ag3: All dataframe columns containing allele frequency values are
    now prefixed with "frq_" to allow for easier selection of frequency

  • Ag3: When computing frequencies, automatically drop columns for
    cohorts below the minimum cohort size

  • Amin1: Added support for region parameter instead of contig

  • Ag3: The snp_sites() method no longer returns a tuple of arrays
    if the field parameter is not provided; please provide an explicit
    field parameter or use the snp_calls() method instead
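
To make the region string formats listed above more concrete, here is a
minimal sketch of how a contig ID or a contig region such as
"3L:1000000-2000000" might be parsed. It is an assumption-laden
illustration, not the library's parser: the Region type and the
parse_region() and parse_regions() functions are hypothetical, and gene IDs
are not resolved because that would need a genome annotation lookup.

```python
import re
from typing import List, NamedTuple, Optional, Sequence, Union


class Region(NamedTuple):
    """Hypothetical container for a parsed region."""

    name: str              # contig ID, or any other bare identifier
    start: Optional[int]   # None when no coordinates were given
    end: Optional[int]


# Matches strings of the form "<name>:<start>-<end>".
_REGION_PATTERN = re.compile(r"^([^:]+):(\d+)-(\d+)$")


def parse_region(region: str) -> Region:
    match = _REGION_PATTERN.match(region)
    if match:
        name, start, end = match.groups()
        start, end = int(start), int(end)
        if start > end:
            raise ValueError(f"start is after end in region {region!r}")
        return Region(name, start, end)
    if ":" in region:
        raise ValueError(f"could not parse region {region!r}")
    # A bare identifier (e.g. "3L") is returned without coordinates.
    # Telling a contig ID apart from a gene ID is left to the caller.
    return Region(region, None, None)


def parse_regions(region: Union[str, Sequence[str]]) -> List[Region]:
    # Accept either a single string or a list of strings.
    if isinstance(region, str):
        return [parse_region(region)]
    return [parse_region(r) for r in region]


parsed = parse_regions(["3L", "3L:1000000-2000000"])
```

Returning a list in every case keeps downstream code uniform whether the
caller passed one region or several.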

Bug fixes, maintenance and documentation

  • Ag3: Move default values for analysis parameters to constants

  • Ag3: Check for manifest.tsv when discovering a release

  • Ag3: Decode sample IDs when building snp_calls() dataset

  • Ag3: Fix snp_calls() being unable to take multiple releases for
    the sample_set parameter

  • Ag3: Fix chunks parameter appearing to be ignored

  • Support Python 3.9

  • Ag3: Fix pandas performance warnings

  • Ag3: Fix bug involving inconsistent array lengths before and after

  • Ag3: Fix compatibility with zarr 2.11.0

  • Some optimisations to speed up the test suite a bit


Full Changelog: v0.15.0...v1.0.0


15 Nov 15:28

Updates the default cohorts_analysis parameter to the latest analysis (20211101).


Full Changelog: v0.14.1...v0.15.0


10 Nov 18:13

What's Changed

  • Fix bug in applying site_mask parameter in Amin1 by @alimanfoo in #77

Full Changelog: v0.14.0...v0.14.1


10 Nov 18:11


  • Adds the Amin1 class providing access to the Anopheles minimus
    Amin1 SNP data release.


Full Changelog: v0.12.1...v0.14.0