v1.0.0
Highlights
New features and API changes
- Ag3: Added support for genome regions when accessing data (GH14). N.B., the `contig` parameter is no longer supported; instead, use the `region` parameter, which can be a contig ID (e.g., "3L"), a contig region (e.g., "3L:1000000-2000000"), a gene ID (e.g., "AGAP004070"), or a list of any of the above. This affects methods including `snp_sites()`, `site_filters()`, `snp_genotypes()` and `snp_dataset()`. Contributed by Nace Kranjc.
- Ag3: The parameters for specifying which species analysis version is used have changed (GH55). This affects `species_calls()`, `sample_metadata()`, `snp_allele_frequencies()` and `gene_cnv_frequencies()`. In most cases the default values for these parameters should be appropriate, so no changes to your code should be needed.
- Ag3: The names of the columns in dataframes containing species calling data have changed to make it clearer which species calling method has been used. This affects dataframes returned by `species_calls()` and `sample_metadata()`. See GH93 for further details.
- Ag3: The latest cohorts metadata are now automatically loaded and joined with the sample metadata when calling `sample_metadata()`. See GH94 for further details.
- Ag3: SNP effects are now automatically included in the output dataframe from `snp_allele_frequencies()` (GH95).
- Ag3: Added a new `sample_query` parameter to methods returning frequencies, allowing a subset of samples to be selected (GH96).
- Ag3: Added a new method `aa_allele_frequencies()` that returns a dataframe of amino acid substitution allele frequencies (GH101).
- Ag3: Added a new method `plot_frequencies_heatmap()` for creating a heatmap plot of allele frequencies (GH102).
- Ag3: The Google Cloud Storage URL ("gs://vo_agam_release") is now the default value when instantiating the `Ag3` class (GH103), so you no longer need to provide it if you are accessing data from GCS. I.e., you can just do:

  ```python
  import malariagen_data
  ag3 = malariagen_data.Ag3()
  ```
- Ag3: The identifiers used for data releases have changed to use "3.0" instead of "v3", "3.1" instead of "v3.1", etc. (GH104).
- The `Ag3` and `Amin1` classes have an improved repr (GH111).
- Ag3: All dataframe columns containing allele frequency values are now prefixed with "frq_" to make frequency columns easier to select (GH116).
- Ag3: When computing frequencies, columns for cohorts below the minimum cohort size are now automatically dropped (GH118).
- Amin1: Added support for the `region` parameter in place of `contig` (GH119).
- Ag3: The `snp_sites()` method no longer returns a tuple of arrays if the `field` parameter is not provided. Please provide an explicit `field` parameter or (recommended) use the `snp_calls()` method instead.
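The `region` parameter (GH14) accepts several string forms. As an illustration only, the sketch below classifies a region string into the forms listed above. The `parse_region` function and `Region` tuple are hypothetical helpers, not part of malariagen_data, and the pattern used to recognise gene IDs is an assumption based on the "AGAP004070" example.

```python
import re
from typing import List, NamedTuple, Optional, Union


class Region(NamedTuple):
    """Hypothetical parsed region; not a type provided by malariagen_data."""

    contig: Optional[str]
    start: Optional[int]
    end: Optional[int]
    gene_id: Optional[str]


# Assumption: gene IDs look like "AGAP" plus six digits, as in "AGAP004070".
GENE_ID_PATTERN = re.compile(r"^AGAP\d{6}$")
# A contig region such as "3L:1000000-2000000".
CONTIG_REGION_PATTERN = re.compile(r"^(\w+):(\d+)-(\d+)$")


def parse_region(region: Union[str, List[str]]) -> List[Region]:
    """Classify a region string, or a list of them, into contig, contig region or gene ID."""
    if isinstance(region, list):
        # A list may mix any of the accepted forms, so parse each element in turn.
        return [parsed for item in region for parsed in parse_region(item)]
    if GENE_ID_PATTERN.match(region):
        # Resolving a gene ID to coordinates would need a genome annotation (not shown).
        return [Region(None, None, None, region)]
    match = CONTIG_REGION_PATTERN.match(region)
    if match:
        contig, start, end = match.group(1), int(match.group(2)), int(match.group(3))
        if start > end:
            raise ValueError(f"start exceeds end in {region!r}")
        return [Region(contig, start, end, None)]
    # Anything else is treated as a whole contig ID, such as "3L".
    return [Region(region, None, None, None)]


print(parse_region(["3L", "3L:1000000-2000000", "AGAP004070"]))
```

Returning a list in every case keeps single regions and lists of regions on the same code path, mirroring the way the `region` parameter accepts either.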
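The release identifier change (GH104) follows a simple pattern: drop the "v" prefix and, where there is no minor version, append ".0". The hypothetical helper below (not part of malariagen_data) sketches that mapping, which may help when updating older scripts that hard-code release names. It assumes old identifiers always have the form "v<major>" or "v<major>.<minor>".

```python
import re

# Assumption: old-style identifiers are "v<major>" or "v<major>.<minor>".
OLD_RELEASE_PATTERN = re.compile(r"^v(\d+)(?:\.(\d+))?$")


def convert_release_id(old: str) -> str:
    """Map an old-style release identifier such as "v3.1" to the new style "3.1"."""
    match = OLD_RELEASE_PATTERN.match(old)
    if match is None:
        raise ValueError(f"unrecognised release identifier: {old!r}")
    major, minor = match.group(1), match.group(2)
    # "v3" has no minor version, so it becomes "3.0".
    return f"{major}.{minor if minor is not None else '0'}"


print(convert_release_id("v3"))    # 3.0
print(convert_release_id("v3.1"))  # 3.1
```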
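Because all frequency columns now share the "frq_" prefix (GH116), they can be picked out by name alone. With a pandas dataframe, `df.filter(regex="^frq_")` selects them; the standard-library sketch below shows the same prefix test on a list of column names. The column names are hypothetical and only stand in for the columns a real frequency dataframe would contain.

```python
from typing import List

# Hypothetical column names; real dataframes will contain other columns too.
columns = ["label", "effect", "frq_cohort_a", "frq_cohort_b"]


def frequency_columns(names: List[str], prefix: str = "frq_") -> List[str]:
    """Return only the column names that carry the frequency prefix."""
    return [name for name in names if name.startswith(prefix)]


print(frequency_columns(columns))  # ['frq_cohort_a', 'frq_cohort_b']
```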
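The automatic dropping of small cohorts (GH118) amounts to a threshold filter on cohort sizes. The sketch below applies that idea to a hypothetical mapping of cohort names to sample counts, assuming frequency columns are named "frq_<cohort>". The function names and the `min_cohort_size` argument here are illustrative; the library's actual implementation and default threshold may differ.

```python
from typing import Dict, List


def cohorts_to_keep(cohort_sizes: Dict[str, int], min_cohort_size: int) -> List[str]:
    """Return cohorts with at least min_cohort_size samples, preserving input order."""
    if min_cohort_size < 0:
        raise ValueError("min_cohort_size must be non-negative")
    return [name for name, size in cohort_sizes.items() if size >= min_cohort_size]


def drop_small_cohort_columns(
    columns: List[str],
    cohort_sizes: Dict[str, int],
    min_cohort_size: int,
    prefix: str = "frq_",
) -> List[str]:
    """Drop "frq_<cohort>" columns whose cohort falls below the minimum size."""
    keep = set(cohorts_to_keep(cohort_sizes, min_cohort_size))
    return [
        name
        for name in columns
        # Non-frequency columns are always kept.
        if not name.startswith(prefix) or name[len(prefix):] in keep
    ]


# Hypothetical sample counts per cohort.
sizes = {"cohort_a": 25, "cohort_b": 4, "cohort_c": 10}
cols = ["label", "frq_cohort_a", "frq_cohort_b", "frq_cohort_c"]
print(drop_small_cohort_columns(cols, sizes, min_cohort_size=10))
# ['label', 'frq_cohort_a', 'frq_cohort_c']
```

Dropping whole columns, rather than reporting frequencies computed from a handful of samples, avoids presenting estimates that would be too noisy to interpret.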
Bug fixes, maintenance and documentation
- Ag3: Move default values for analysis parameters to constants (GH70).
- Ag3: Check for manifest.tsv when discovering a release (GH74).
- Ag3: Decode sample IDs when building the `snp_calls()` dataset (GH82).
- Ag3: Fix `snp_calls()` not accepting multiple releases for the `sample_sets` parameter (GH85).
- Ag3: Fix `chunks` parameter appearing to be ignored (GH86).
- Support Python 3.9 (GH91).
- Ag3: Fix pandas performance warnings (GH108).
- Ag3: Fix bug involving inconsistent array lengths before and after computation (GH114).
- Ag3: Fix compatibility with zarr 2.11.0 (GH129).
- Some optimisations to speed up the test suite (GH122).
Pull requests
- Add variance vars to CNV HMM by @leehart in #83
- Support multiple releases via the sample_sets parameter by @alimanfoo in #88
- Upgrade CI by @alimanfoo in #92
- Improve species metadata by @alimanfoo in #97
- Check release manifest by @alimanfoo in #98
- Decode sample_id to str in Ag3.snp_calls() dataset by @alimanfoo in #99
- Add effects to allele frequencies by @alimanfoo in #100
- Support genomic regions by @nkran in #106
- snp_allele_frequencies() Pandas Warning - fragmented dataframe by @cclarkson in #110
- Add cohorts to sample_metadata() by @cclarkson in #109
- Misc minor improvements by @alimanfoo in #112
- Add "frq_" to columns in frequency generating methods by @cclarkson in #117
- Drop cohorts columns below min cohort size when computing frequencies by @cclarkson in #120
- Fix bug with optimised dask compress by @alimanfoo in #121
- ag3.aa_allele_frequencies() method by @cclarkson in #124
- Add sample_query parameter to allele frequencies by @alimanfoo in #126
- ag3.plot_frequencies_heatmap() by @cclarkson in #125
- Zarr 2.11.0 compatibility by @alimanfoo in #130
- v1.0.0 release notes by @alimanfoo in #131
- Refactor region support and add region support to Amin1 by @alimanfoo in #132
New Contributors
Full Changelog: v0.15.0...v1.0.0