Releases: malariagen/malariagen-data-python
v3.1.0
Highlights
- Ag3: Limit docstring widths for better wrapping in Colab help tabs (GH186).
- Ag3: Return a copy of cached DataFrames so that any subsequent user modifications do not affect the cached data (GH184).
- Ag3: Improve zooming behaviour of bokeh genome plots (GH189).
- Ag3: Add sample identifiers to CNV HMM heatmap plots (GH191).
- Ag3: Exclude high coverage variance samples by default in CNV HMM heatmap plots (GH178).
- Ag3: Standardise default width of bokeh genome plots (GH174).
- Ag3: Consistently capitalise plot labels (GH176).
- Ag3: Tidy title for CNV HMM heatmap plots when using multiple sample sets (GH175).
- Ag3: Fix a bug in loading of gene CNV frequencies where intermediate species samples are missing (GH183).
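The cached DataFrame change (GH184) follows a common defensive-copy pattern. The sketch below is not the library's implementation: it uses a hypothetical MetadataCache class and plain lists of dicts in place of DataFrames, simply to show why returning a copy protects the cached data.

```python
import copy


class MetadataCache:
    """Hypothetical cache that hands out copies of its stored records."""

    def __init__(self, loader):
        self._loader = loader  # callable that builds the records for a key
        self._cache = {}

    def get(self, key):
        # Load once, then reuse the cached object on later calls.
        if key not in self._cache:
            self._cache[key] = self._loader(key)
        # Return a deep copy so callers cannot mutate the cached data.
        return copy.deepcopy(self._cache[key])


def load_records(key):
    # Stand-in for an expensive read; the sample ID here is made up.
    return [{"sample_id": "sample-0001", "sample_set": key}]


cache = MetadataCache(load_records)
first = cache.get("set-A")
first[0]["sample_id"] = "edited by user"  # modify the returned copy
second = cache.get("set-A")  # the cached record is unaffected
```

With pandas, the equivalent step would be to return df.copy() rather than the cached DataFrame itself.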
What's Changed
- CNV gene frequency bug fixes by @alimanfoo in #188
- Misc plotting improvements by @alimanfoo in #192
- Return copies of cached dataframes by @alimanfoo in #194
- Narrow docstrings by @alimanfoo in #195
- V3.1.0 prep by @alimanfoo in #199
Full Changelog: v3.0.0...v3.1.0
v3.0.0
Highlights
- Added a new function Ag3.plot_cnv_hmm_coverage() which generates a bokeh plot showing normalised coverage and HMM copy number for an individual sample.
- Added a new function Ag3.plot_cnv_hmm_heatmap() which generates a bokeh plot showing the HMM copy number for multiple samples as a heatmap.
- Added support for accessing genome regions to the CNV data access functions Ag3.cnv_hmm(), Ag3.gene_cnv(), Ag3.gene_cnv_frequencies() and Ag3.cnv_coverage_calls() (GH113). Please use the region parameter to specify a contig or contig region. The previous contig parameter is no longer supported.
- Added support for a region parameter to the Ag3.geneset() function.
- Added docstrings for Ag3.plot_genes() and Ag3.plot_transcript() (GH170).
- Set plot width and height automatically in Ag3.plot_frequencies_heatmap() based on the number of rows and columns.
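The region parameter takes a contig ID such as "3L" or a contig region such as "3L:1000000-2000000" (the formats given in the v1.0.0 notes). As a rough illustration of how such strings break down, here is a minimal sketch built around a hypothetical parse_region() helper. It is not the library's parser, and it does not resolve gene IDs.

```python
import re
from typing import NamedTuple, Optional


class Region(NamedTuple):
    contig: str
    start: Optional[int]
    end: Optional[int]


# Matches "contig" or "contig:start-end"; commas in coordinates are not handled.
_REGION_PATTERN = re.compile(r"^(?P<contig>[^:]+)(?::(?P<start>\d+)-(?P<end>\d+))?$")


def parse_region(region: str) -> Region:
    """Hypothetical helper: split a region string into contig, start, end."""
    match = _REGION_PATTERN.match(region)
    if match is None:
        raise ValueError(f"unrecognised region: {region!r}")
    start = match.group("start")
    end = match.group("end")
    if start is None:
        # No coordinates given, so the whole contig is requested.
        return Region(match.group("contig"), None, None)
    start_pos, end_pos = int(start), int(end)
    if start_pos > end_pos:
        raise ValueError(f"start is after end in region: {region!r}")
    return Region(match.group("contig"), start_pos, end_pos)
```

For example, parse_region("3L") yields a Region with no coordinates, while parse_region("3L:1000000-2000000") carries the integer start and end positions.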
What's Changed
- Ag3 CNV improvements by @alimanfoo in #171
Full Changelog: v2.2.0...v3.0.0
v2.2.0
Highlights
- Added a new function Ag3.plot_genes() which generates a bokeh plot of gene annotations (GH154).
- Added a new function Ag3.plot_transcript() which generates a bokeh plot of a gene model (GH155).
- Fixed a bug in the Ag3.gene_cnv_frequencies() function (GH166).
- CI improvements (GH150).
What's Changed
- Fix CNV frequencies bug by @alimanfoo in #167
- Add gene and transcript plotting functions by @alimanfoo in #168
- v2.2.0 release prep by @alimanfoo in #169
Full Changelog: v2.1.0...v2.2.0
v2.1.0
Highlights
- Ag3: Add support for giving a list of contigs to the contig parameter in gene_cnv() and gene_cnv_frequencies() (GH162).
- Ag3: Miscellaneous optimisations and documentation fixes (GH153, GH158, GH159, GH161).
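Accepting either a single contig or a list of contigs usually comes down to normalising the argument before looping over it. A minimal sketch of that idea, using a hypothetical normalise_contigs() helper that is not part of the library:

```python
from typing import List, Sequence, Union


def normalise_contigs(contig: Union[str, Sequence[str]]) -> List[str]:
    """Hypothetical helper: always return a list of contig IDs."""
    if isinstance(contig, str):
        # A single contig ID such as "3L" becomes a one-element list.
        return [contig]
    # Lists and tuples are copied into a new list, preserving order.
    return list(contig)
```

Checking for str first matters because a string is itself a sequence; without that check "3L" would be split into the characters "3" and "L".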
What's Changed
- Add support for multiple contigs to Ag3 gene_cnv... functions by @alimanfoo in #163
- Misc maintenance 2022-03-07 by @alimanfoo in #164
- v2.1.0 release prep by @alimanfoo in #165
Full Changelog: v2.0.0...v2.1.0
v2.0.0
Highlights
New features and API changes
- Ag3: New functions have been added for space-time analysis of SNP allele frequencies and gene CNV frequencies (GH143).
  - The new function plot_frequencies_time_series() creates faceted time series plots of frequencies using plotly.
  - The new function plot_frequencies_interactive_map() creates an ipyleaflet map with coloured markers representing frequencies in different cohorts, with widgets to select the variant, taxon and time period of interest.
  - The new function plot_frequencies_map_markers() supports plotting frequency markers on an existing ipyleaflet map.
  - The new function snp_allele_frequencies_advanced() computes SNP allele frequencies in a transcript of interest and returns an xarray dataset which can be used as input to space and time plotting functions.
  - The new function aa_allele_frequencies_advanced() computes amino acid substitution frequencies in a transcript of interest and returns an xarray dataset which can be used as input to space and time plotting functions.
  - The new function gene_cnv_frequencies_advanced() computes gene CNV frequencies for a given contig and returns an xarray dataset which can be used as input to space and time plotting functions.
  - The function aa_allele_frequencies() has been modified to better handle the case where SNPs at different genome positions cause the same amino acid change.
- Ag3: The function gene_cnv_frequencies() has been modified so that each row now represents a gene and variant (amplification or deletion), and columns are cohorts (GH139). Also, a new parameter drop_invariant has been added, which is True by default, meaning that only records with some evidence of copy number variation in the given cohorts are returned.
- Ag3: Samples with high coverage variance are now removed by default when running gene_cnv_frequencies(), and this can be controlled via a new max_coverage_variance parameter (GH141). To support this, the sample_coverage_variance variable has been added to the output of the gene_cnv() function (GH128).
- Ag3: All functions accepting a sample_sets parameter now check for the same sample set being selected more than once (GH144).
- Ag3: The functions which plot frequencies, including plot_frequencies_heatmap(), plot_frequencies_time_series(), and plot_frequencies_interactive_map(), have been modified to use consistent labels for variants (GH145).
- Ag3: The frequencies plotting functions now automatically set a title based on metadata from the input dataframe or dataset (GH146). The cohorts axis labels have also been moved to the bottom to make room for a title.
- Ag3: All column names in sample metadata dataframes are now lower case, and columns starting "adm" have been renamed to start with "admin" (e.g., "adm1_ISO" has been renamed to "admin1_iso") to have consistent naming of columns and parameter values relating to administrative units (GH142).
- Ag3: Functions cnv_hmm(), cnv_coverage_calls() and cnv_discordant_read_calls() support multiple contigs for the contig parameter and automatically concatenate datasets (GH90).
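Code written against the older sample metadata column names may need updating after the renaming in GH142. The sketch below applies the rule as described above (lower-case everything, expand a leading "adm" to "admin") using a hypothetical rename_metadata_column() helper. Apart from "adm1_ISO", the column names are made up for illustration.

```python
def rename_metadata_column(name: str) -> str:
    """Hypothetical helper mirroring the renaming rule described above."""
    lowered = name.lower()
    # Expand the "adm" prefix to "admin", leaving names already using "admin" alone.
    if lowered.startswith("adm") and not lowered.startswith("admin"):
        return "admin" + lowered[len("adm"):]
    return lowered


# Only "adm1_ISO" comes from the release notes; the other names are assumptions.
old_columns = ["sample_id", "adm1_ISO", "adm1_name", "adm2_name"]
new_columns = [rename_metadata_column(name) for name in old_columns]
```

A mapping built this way could be used to translate column references in existing analysis code to the new names.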
Bug fixes, maintenance and documentation
- Ag3: Function docstrings have been improved to document return values (GH84).
- Ag3: Improve repr methods (GH138).
Pull requests
- Add pf7 module by @kathryn1995 in #69
- Update version for pf7 pre-release by @kathryn1995 in #137
- Ag3 functions for space-time analysis of variant frequencies by @alimanfoo in #140
- Improve doc strings by @KellyLBennett in #135
- Fix plot_frequencies_heatmap return value by @alimanfoo in #147
- Support multiple contigs in CNV datasets by @KellyLBennett in #136
- Add sample coverage variance filtering to CNV frequencies methods by @alimanfoo in #148
- Check dup sample sets by @alimanfoo in #149
- Misc aesthetics by @alimanfoo in #151
- v2.0.0 release prep by @alimanfoo in #152
New Contributors
- @KellyLBennett made their first contribution in #135
Full Changelog: v1.0.1...v2.0.0
v1.0.1
What's Changed
- Expose more imshow parameters in plot_frequencies_heatmap by @alimanfoo in #134
Full Changelog: v1.0.0...v1.0.1
v1.0.0
Highlights
New features and API changes
- Ag3: Added support for genome regions when accessing data (GH14). N.B., the contig parameter is no longer supported; instead use the region parameter, which can be a contig ID (e.g., "3L"), a contig region (e.g., "3L:1000000-2000000"), a gene ID (e.g., "AGAP004070"), or a list of any of the above. This affects methods including snp_sites(), site_filters(), snp_genotypes() and snp_dataset(). Contributed by Nace Kranjc.
- Ag3: The parameters for specifying which species analysis version is used have changed (GH55). This affects species_calls(), sample_metadata(), snp_allele_frequencies() and gene_cnv_frequencies(). In most cases the default values for these parameters should be appropriate and so no changes to your code should be needed.
- Ag3: The names of the columns in dataframes containing data related to species calling have changed to make it clearer which species calling method has been used. This affects dataframes returned by species_calls() and sample_metadata(). See GH93 for further details.
- Ag3: The latest cohorts metadata are now automatically loaded and joined in with the sample metadata when calling sample_metadata(). See GH94 for further details.
- Ag3: SNP effects are now automatically included in the output dataframe from snp_allele_frequencies() (GH95).
- Ag3: Added a new sample_query parameter to methods returning frequencies to allow for making a sub-selection of samples (GH96).
- Ag3: Added a new method aa_allele_frequencies() to return a dataframe of amino acid substitution allele frequencies (GH101).
- Ag3: Added a new method plot_frequencies_heatmap() for creating a heatmap plot of allele frequencies (GH102).
- Ag3: The Google Cloud Storage URL ("gs://vo_agam_release") is now the default value when instantiating the Ag3 class (GH103). So now you don't need to provide it if you are accessing data from GCS. I.e., you can just do:

      import malariagen_data
      ag3 = malariagen_data.Ag3()

- Ag3: The identifiers used for data releases have been changed to use "3.0" instead of "v3", "3.1" instead of "v3.1", etc. (GH104).
- The Ag3 and Amin1 classes have a better repr (GH111).
- Ag3: All dataframe columns containing allele frequency values are now prefixed with "frq_" to allow for easier selection of frequency columns (GH116).
- Ag3: When computing frequencies, automatically drop columns for cohorts below the minimum cohort size (GH118).
- Amin1: Added support for region parameter instead of contig (GH119).
- Ag3: The snp_sites() method no longer returns a tuple of arrays if the field parameter is not provided. Please provide an explicit field parameter or use the snp_calls() method instead (recommended).
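Two of the changes above, the "frq_" column prefix (GH116) and the dropping of cohorts below the minimum cohort size (GH118), are easy to picture with plain Python. The following is a minimal sketch only: the helper names, column names and cohort sizes are all hypothetical, and the library's own logic may differ.

```python
def frequency_columns(columns):
    """Return the names of columns that hold allele frequency values."""
    return [name for name in columns if name.startswith("frq_")]


def cohorts_to_keep(cohort_sizes, min_cohort_size):
    """Keep only cohorts with at least min_cohort_size samples."""
    return [
        cohort
        for cohort, size in cohort_sizes.items()
        if size >= min_cohort_size
    ]


# Hypothetical column names and cohort sizes, for illustration only.
columns = ["contig", "position", "frq_cohort_a", "frq_cohort_b"]
cohort_sizes = {"cohort_a": 25, "cohort_b": 4, "cohort_c": 10}

selected = frequency_columns(columns)
kept = cohorts_to_keep(cohort_sizes, min_cohort_size=10)
```

The same list comprehension works on the columns of a pandas DataFrame, which makes it straightforward to pull out just the frequency columns for plotting or further filtering.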
Bug fixes, maintenance and documentation
- Ag3: Move default values for analysis parameters to constants (GH70).
- Ag3: Check for manifest.tsv when discovering a release (GH74).
- Ag3: Decode sample IDs when building snp_calls() dataset (GH82).
- Ag3: Fix snp_calls() cannot take multiple releases for sample_set parameter (GH85).
- Ag3: Fix chunks parameter appears to be ignored (GH86).
- Support Python 3.9 (GH91).
- Ag3: Fix pandas performance warnings (GH108).
- Ag3: Fix bug involving inconsistent array lengths before and after computation (GH114).
- Ag3: Fix compatibility with zarr 2.11.0 (GH129).
- Some optimisations to speed up the test suite a bit (GH122).
Pull requests
- Add variance vars to CNV HMM by @leehart in #83
- Support multiple releases via the sample_sets parameter by @alimanfoo in #88
- Upgrade CI by @alimanfoo in #92
- Improve species metadata by @alimanfoo in #97
- Check release manifest by @alimanfoo in #98
- Decode sample_id to str in Ag3.snp_calls() dataset by @alimanfoo in #99
- Add effects to allele frequencies by @alimanfoo in #100
- Support genomic regions by @nkran in #106
- snp_allele_frequencies() Pandas Warning - fragmented dataframe by @cclarkson in #110
- Add cohorts to sample_metadata() by @cclarkson in #109
- Misc minor improvements by @alimanfoo in #112
- Add "frq_" to columns in frequency generating methods by @cclarkson in #117
- Drop cohorts columns below min cohort size when computing frequencies by @cclarkson in #120
- Fix bug with optimised dask compress by @alimanfoo in #121
- ag3.aa_allele_frequencies() method by @cclarkson in #124
- Add sample_query parameter to allele frequencies by @alimanfoo in #126
- ag3.plot_frequencies_heatmap() by @cclarkson in #125
- Zarr 2.11.0 compatibility by @alimanfoo in #130
- v1.0.0 release notes by @alimanfoo in #131
- Refactor region support and add region support to Amin1 by @alimanfoo in #132
Full Changelog: v0.15.0...v1.0.0
v0.15.0
Updates default cohorts_analysis parameter to latest analysis (20211101).
What's Changed
- Update cohorts_analysis to latest version by @cclarkson in #79
- v0.15.0 release by @cclarkson in #81
Full Changelog: v0.14.1...v0.15.0
v0.14.1
What's Changed
- Fix bug in applying site_mask parameter in Amin1 by @alimanfoo in #77
Full Changelog: v0.14.0...v0.14.1
v0.14.0
Highlights
- Adds the Amin1 class providing access to the Anopheles minimus Amin1 SNP data release.
Pull requests
- Amin1 by @alimanfoo in #75
Full Changelog: v0.12.1...v0.14.0