diff --git a/latest/.doctrees/Af1.doctree b/latest/.doctrees/Af1.doctree index 182f1c1f..6b0f0dc3 100644 Binary files a/latest/.doctrees/Af1.doctree and b/latest/.doctrees/Af1.doctree differ diff --git a/latest/.doctrees/Ag3.doctree b/latest/.doctrees/Ag3.doctree index eeb80775..185afc96 100644 Binary files a/latest/.doctrees/Ag3.doctree and b/latest/.doctrees/Ag3.doctree differ diff --git a/latest/.doctrees/environment.pickle b/latest/.doctrees/environment.pickle index 4d10a08d..0c765f1b 100644 Binary files a/latest/.doctrees/environment.pickle and b/latest/.doctrees/environment.pickle differ diff --git a/latest/.doctrees/generated/malariagen_data.af1.Af1.haplotypes_frequencies.doctree b/latest/.doctrees/generated/malariagen_data.af1.Af1.haplotypes_frequencies.doctree new file mode 100644 index 00000000..186aa2de Binary files /dev/null and b/latest/.doctrees/generated/malariagen_data.af1.Af1.haplotypes_frequencies.doctree differ diff --git a/latest/.doctrees/generated/malariagen_data.af1.Af1.haplotypes_frequencies_advanced.doctree b/latest/.doctrees/generated/malariagen_data.af1.Af1.haplotypes_frequencies_advanced.doctree new file mode 100644 index 00000000..7b0f02cd Binary files /dev/null and b/latest/.doctrees/generated/malariagen_data.af1.Af1.haplotypes_frequencies_advanced.doctree differ diff --git a/latest/.doctrees/generated/malariagen_data.ag3.Ag3.haplotypes_frequencies.doctree b/latest/.doctrees/generated/malariagen_data.ag3.Ag3.haplotypes_frequencies.doctree new file mode 100644 index 00000000..4928cf61 Binary files /dev/null and b/latest/.doctrees/generated/malariagen_data.ag3.Ag3.haplotypes_frequencies.doctree differ diff --git a/latest/.doctrees/generated/malariagen_data.ag3.Ag3.haplotypes_frequencies_advanced.doctree b/latest/.doctrees/generated/malariagen_data.ag3.Ag3.haplotypes_frequencies_advanced.doctree new file mode 100644 index 00000000..8a377933 Binary files /dev/null and 
b/latest/.doctrees/generated/malariagen_data.ag3.Ag3.haplotypes_frequencies_advanced.doctree differ diff --git a/latest/Af1.html b/latest/Af1.html index 455114eb..7022d97d 100644 --- a/latest/Af1.html +++ b/latest/Af1.html @@ -572,6 +572,12 @@

SNP and CNV frequency analysis

gene_cnv_frequencies_advanced(region, ...[, ...])

Group samples by taxon, area (space) and period (time), then compute gene CNV counts and frequencies.

+

haplotypes_frequencies(region, cohorts[, ...])

+

Compute haplotype frequencies for a region.

+ +

haplotypes_frequencies_advanced(region, ...)

+

Group samples by taxon, area (space) and period (time), then compute haplotype frequencies.

+

plot_frequencies_heatmap(df[, index, ...])

Plot a heatmap from a pandas DataFrame of frequencies, e.g., output from snp_allele_frequencies() or gene_cnv_frequencies().

diff --git a/latest/Ag3.html b/latest/Ag3.html index 83201937..a8f36a2e 100644 --- a/latest/Ag3.html +++ b/latest/Ag3.html @@ -597,6 +597,12 @@

SNP and CNV frequency analysis

gene_cnv_frequencies_advanced(region, ...[, ...])

Group samples by taxon, area (space) and period (time), then compute gene CNV counts and frequencies.

+

haplotypes_frequencies(region, cohorts[, ...])

+

Compute haplotype frequencies for a region.

+ +

haplotypes_frequencies_advanced(region, ...)

+

Group samples by taxon, area (space) and period (time), then compute haplotype frequencies.

+

plot_frequencies_heatmap(df[, index, ...])

Plot a heatmap from a pandas DataFrame of frequencies, e.g., output from snp_allele_frequencies() or gene_cnv_frequencies().

diff --git a/latest/_sources/Af1.rst.txt b/latest/_sources/Af1.rst.txt index 5ef355a4..1e821c14 100644 --- a/latest/_sources/Af1.rst.txt +++ b/latest/_sources/Af1.rst.txt @@ -115,6 +115,8 @@ SNP and CNV frequency analysis aa_allele_frequencies_advanced gene_cnv_frequencies gene_cnv_frequencies_advanced + haplotypes_frequencies + haplotypes_frequencies_advanced plot_frequencies_heatmap plot_frequencies_time_series plot_frequencies_interactive_map diff --git a/latest/_sources/Ag3.rst.txt b/latest/_sources/Ag3.rst.txt index ce9876e1..59fa6909 100644 --- a/latest/_sources/Ag3.rst.txt +++ b/latest/_sources/Ag3.rst.txt @@ -125,6 +125,8 @@ SNP and CNV frequency analysis aa_allele_frequencies_advanced gene_cnv_frequencies gene_cnv_frequencies_advanced + haplotypes_frequencies + haplotypes_frequencies_advanced plot_frequencies_heatmap plot_frequencies_time_series plot_frequencies_interactive_map diff --git a/latest/_sources/generated/malariagen_data.af1.Af1.haplotypes_frequencies.rst.txt b/latest/_sources/generated/malariagen_data.af1.Af1.haplotypes_frequencies.rst.txt new file mode 100644 index 00000000..fb7880be --- /dev/null +++ b/latest/_sources/generated/malariagen_data.af1.Af1.haplotypes_frequencies.rst.txt @@ -0,0 +1,6 @@ +malariagen\_data.af1.Af1.haplotypes\_frequencies +================================================ + +.. currentmodule:: malariagen_data.af1 + +.. automethod:: Af1.haplotypes_frequencies \ No newline at end of file diff --git a/latest/_sources/generated/malariagen_data.af1.Af1.haplotypes_frequencies_advanced.rst.txt b/latest/_sources/generated/malariagen_data.af1.Af1.haplotypes_frequencies_advanced.rst.txt new file mode 100644 index 00000000..1017554f --- /dev/null +++ b/latest/_sources/generated/malariagen_data.af1.Af1.haplotypes_frequencies_advanced.rst.txt @@ -0,0 +1,6 @@ +malariagen\_data.af1.Af1.haplotypes\_frequencies\_advanced +========================================================== + +.. currentmodule:: malariagen_data.af1 + +.. 
automethod:: Af1.haplotypes_frequencies_advanced \ No newline at end of file diff --git a/latest/_sources/generated/malariagen_data.ag3.Ag3.haplotypes_frequencies.rst.txt b/latest/_sources/generated/malariagen_data.ag3.Ag3.haplotypes_frequencies.rst.txt new file mode 100644 index 00000000..1a60c893 --- /dev/null +++ b/latest/_sources/generated/malariagen_data.ag3.Ag3.haplotypes_frequencies.rst.txt @@ -0,0 +1,6 @@ +malariagen\_data.ag3.Ag3.haplotypes\_frequencies +================================================ + +.. currentmodule:: malariagen_data.ag3 + +.. automethod:: Ag3.haplotypes_frequencies \ No newline at end of file diff --git a/latest/_sources/generated/malariagen_data.ag3.Ag3.haplotypes_frequencies_advanced.rst.txt b/latest/_sources/generated/malariagen_data.ag3.Ag3.haplotypes_frequencies_advanced.rst.txt new file mode 100644 index 00000000..6b6dbf78 --- /dev/null +++ b/latest/_sources/generated/malariagen_data.ag3.Ag3.haplotypes_frequencies_advanced.rst.txt @@ -0,0 +1,6 @@ +malariagen\_data.ag3.Ag3.haplotypes\_frequencies\_advanced +========================================================== + +.. currentmodule:: malariagen_data.ag3 + +.. automethod:: Ag3.haplotypes_frequencies_advanced \ No newline at end of file diff --git a/latest/generated/malariagen_data.af1.Af1.haplotypes_frequencies.html b/latest/generated/malariagen_data.af1.Af1.haplotypes_frequencies.html new file mode 100644 index 00000000..30a5094f --- /dev/null +++ b/latest/generated/malariagen_data.af1.Af1.haplotypes_frequencies.html @@ -0,0 +1,527 @@ + + + + + + + + + + + malariagen_data.af1.Af1.haplotypes_frequencies — malariagen_data API documentation + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ + + + + + + + + +
+ +
+ + + + + +
+
+ + + + +
+ + + + + + + + + + + + + +
+ +
+ + +
+
+ +
+
+ +
+ +
+ + +
+ +
+ + +
+
+ + + + + +
+ +
+

malariagen_data.af1.Af1.haplotypes_frequencies#

+
+
+Af1.haplotypes_frequencies(region: str | Region | Mapping, cohorts: str | Mapping[str, str], sample_query: str | None = None, sample_query_options: dict | None = None, min_cohort_size: int = 10, sample_sets: Sequence[str] | str | None = None, chunks: int | str | Tuple[int | str, ...] | Callable[[Tuple[int, ...]], int | str | Tuple[int | str, ...]] = 'native', inline_array: bool = True) DataFrame#
+

Compute haplotype frequencies for a region.

+
+

Parameters#

+
+
regionstr or Region or Mapping

Region of the reference genome. Can be a contig name, region string +(formatted like “{contig}:{start}-{end}”), or identifier of a genome +feature such as a gene or transcript.

+
+
cohortsstr or Mapping[str, str]

Either a string giving the name of a predefined cohort set (e.g., +“admin1_month”) or a dict mapping custom cohort labels to sample +queries.

+
+
sample_querystr or None, optional

A pandas query string to be evaluated against the sample metadata, to +select samples to be included in the returned data.

+
+
sample_query_optionsdict or None, optional

A dictionary of arguments that will be passed through to pandas +query() or eval(), e.g. parser, engine, local_dict, global_dict, +resolvers.

+
+
min_cohort_sizeint, optional, default: 10

Minimum cohort size. Raise an error if the number of samples is less +than this value.

+
+
sample_setssequence of str or str or None, optional

List of sample sets and/or releases. Can also be a single sample set +or release.

+
+
chunksint or str or tuple of int or str or Callable[[typing.Tuple[int, …]], int or str or tuple of int or str], optional, default: ‘native’

Define how input data being read from zarr should be divided into +chunks for a dask computation. If ‘native’, use underlying zarr +chunks. If a string specifying a target memory size, e.g., ‘300 MiB’, +resize chunks in arrays with more than one dimension to match this +size. If ‘auto’, let dask decide chunk size. If ‘ndauto’, let dask +decide chunk size but only for arrays with more than one dimension. If +‘ndauto0’, as ‘ndauto’ but only vary the first chunk dimension. If +‘ndauto1’, as ‘ndauto’ but only vary the second chunk dimension. If +‘ndauto01’, as ‘ndauto’ but only vary the first and second chunk +dimensions. Also, can be a tuple of integers, or a callable which +accepts the native chunks as a single argument and returns a valid +dask chunks value.

+
+
inline_arraybool, optional, default: True

Passed through to dask from_array().

+
+
+
+
+

Returns#

+
+
DataFrame

A dataframe of haplotype frequencies, one row per haplotype.

+
+
+
+
+

Notes#

+

Cohorts with fewer samples than min_cohort_size will be excluded +from the output data frame.
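The behaviour described in the notes above (per-cohort frequencies, with small cohorts dropped) can be illustrated with a minimal pure-Python sketch. This is not the library's implementation. The function name `haplotype_frequencies`, the input layout (a dict mapping a cohort label to a list of haplotype tuples) and the choice to count cohort size in haplotypes rather than samples are all assumptions made for illustration.

```python
from collections import Counter


def haplotype_frequencies(haplotypes_by_cohort, min_cohort_size=10):
    """Return {haplotype: {cohort: frequency}} for sufficiently large cohorts.

    Hypothetical input layout: each cohort label maps to a list of
    haplotypes, and each haplotype is a tuple of allele calls.
    Assumption: cohort size is counted in haplotypes here, whereas the
    documentation above describes the threshold in terms of samples.
    """
    table = {}
    for cohort, haplotypes in haplotypes_by_cohort.items():
        # Drop cohorts below the size threshold instead of raising an error.
        if len(haplotypes) < min_cohort_size:
            continue
        counts = Counter(haplotypes)
        for haplotype, n in counts.items():
            # Frequency is the share of the cohort carrying this haplotype.
            table.setdefault(haplotype, {})[cohort] = n / len(haplotypes)
    return table


example = {
    "cohort_a": [(0, 1)] * 3 + [(1, 1)],
    "cohort_b": [(0, 1)],  # too small, excluded below
}
frequencies = haplotype_frequencies(example, min_cohort_size=2)
```

In the result, each key is one distinct haplotype, which mirrors the "one row per haplotype" shape of the returned DataFrame.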

+
+
+ +
+ + +
+ + + + + +
+ +
+
+
+ +
+ + + + +
+ + +
+
+ +
+ +
+
+
+ + + + + + + + \ No newline at end of file diff --git a/latest/generated/malariagen_data.af1.Af1.haplotypes_frequencies_advanced.html b/latest/generated/malariagen_data.af1.Af1.haplotypes_frequencies_advanced.html new file mode 100644 index 00000000..b9208150 --- /dev/null +++ b/latest/generated/malariagen_data.af1.Af1.haplotypes_frequencies_advanced.html @@ -0,0 +1,536 @@ + + + + + + + + + + + malariagen_data.af1.Af1.haplotypes_frequencies_advanced — malariagen_data API documentation + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ + + + + + + + + +
+ +
+ + + + + +
+
+ + + + +
+ + + + + + + + + + + + + +
+ +
+ + +
+
+ +
+
+ +
+ +
+ + +
+ +
+ + +
+
+ + + + + +
+ +
+

malariagen_data.af1.Af1.haplotypes_frequencies_advanced#

+
+
+Af1.haplotypes_frequencies_advanced(region: str | Region | Mapping, area_by: str, period_by: Literal['year', 'quarter', 'month'], sample_sets: Sequence[str] | str | None = None, sample_query: str | None = None, sample_query_options: dict | None = None, min_cohort_size: int = 10, ci_method: Literal['normal', 'agresti_coull', 'beta', 'wilson', 'binom_test'] | None = 'wilson', chunks: int | str | Tuple[int | str, ...] | Callable[[Tuple[int, ...]], int | str | Tuple[int | str, ...]] = 'native', inline_array: bool = True) Dataset#
+

Group samples by taxon, area (space) and period (time), then compute +haplotype frequencies.

+
+

Parameters#

+
+
regionstr or Region or Mapping

Region of the reference genome. Can be a contig name, region string +(formatted like “{contig}:{start}-{end}”), or identifier of a genome +feature such as a gene or transcript.

+
+
area_bystr

Column name in the sample metadata to use to group samples spatially. +E.g., use “admin1_iso” or “admin1_name” to group by level 1 +administrative divisions, or use “admin2_name” to group by level 2 +administrative divisions.

+
+
period_by{‘year’, ‘quarter’, ‘month’}

Length of time to group samples temporally.

+
+
sample_setssequence of str or str or None, optional

List of sample sets and/or releases. Can also be a single sample set +or release.

+
+
sample_querystr or None, optional

A pandas query string to be evaluated against the sample metadata, to +select samples to be included in the returned data.

+
+
sample_query_optionsdict or None, optional

A dictionary of arguments that will be passed through to pandas +query() or eval(), e.g. parser, engine, local_dict, global_dict, +resolvers.

+
+
min_cohort_sizeint, optional, default: 10

Minimum cohort size. Raise an error if the number of samples is less +than this value.

+
+
ci_method{‘normal’, ‘agresti_coull’, ‘beta’, ‘wilson’, ‘binom_test’} or None, optional, default: ‘wilson’

Method to use for computing confidence intervals, passed through to +statsmodels.stats.proportion.proportion_confint.

+
+
chunksint or str or tuple of int or str or Callable[[typing.Tuple[int, …]], int or str or tuple of int or str], optional, default: ‘native’

Define how input data being read from zarr should be divided into +chunks for a dask computation. If ‘native’, use underlying zarr +chunks. If a string specifying a target memory size, e.g., ‘300 MiB’, +resize chunks in arrays with more than one dimension to match this +size. If ‘auto’, let dask decide chunk size. If ‘ndauto’, let dask +decide chunk size but only for arrays with more than one dimension. If +‘ndauto0’, as ‘ndauto’ but only vary the first chunk dimension. If +‘ndauto1’, as ‘ndauto’ but only vary the second chunk dimension. If +‘ndauto01’, as ‘ndauto’ but only vary the first and second chunk +dimensions. Also, can be a tuple of integers, or a callable which +accepts the native chunks as a single argument and returns a valid +dask chunks value.

+
+
inline_arraybool, optional, default: True

Passed through to dask from_array().

+
+
+
+
+

Returns#

+
+
Dataset

The resulting dataset has dimensions “cohorts” and +“variants”. Variables prefixed with “cohort” are 1-dimensional +arrays with data about the cohorts, such as the area, period, taxon +and cohort size. Variables prefixed with “variant” are 1-dimensional +arrays with data about the variants, such as the contig, position, +reference and alternate alleles. Variables prefixed with “event” are +2-dimensional arrays with the allele counts and frequency +calculations.
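The default ci_method above is ‘wilson’. As a rough illustration of what a Wilson score interval computes for a single count and cohort size, here is a standard-library sketch. The function name `wilson_interval` and its signature are assumptions for this example; the documented method delegates to statsmodels rather than to code like this.

```python
from math import sqrt
from statistics import NormalDist


def wilson_interval(count, nobs, alpha=0.05):
    """Wilson score interval for a binomial proportion count / nobs.

    Hypothetical helper for illustration only.
    """
    if nobs <= 0:
        raise ValueError("nobs must be positive")
    # Two-sided critical value of the standard normal distribution.
    z = NormalDist().inv_cdf(1 - alpha / 2)
    p = count / nobs
    denom = 1 + z**2 / nobs
    centre = (p + z**2 / (2 * nobs)) / denom
    half_width = z * sqrt(p * (1 - p) / nobs + z**2 / (4 * nobs**2)) / denom
    return centre - half_width, centre + half_width


# A haplotype observed 5 times among 10 haplotypes in a cohort.
lower, upper = wilson_interval(5, 10)
```

Unlike the ‘normal’ approximation, the Wilson interval stays inside [0, 1] even when the observed frequency is 0 or 1, which matters for small cohorts.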

+
+
+
+
+ +
+ + +
+ + + + + +
+ +
+
+
+ +
+ + + + +
+ + +
+
+ +
+ +
+
+
+ + + + + + + + \ No newline at end of file diff --git a/latest/generated/malariagen_data.ag3.Ag3.haplotypes_frequencies.html b/latest/generated/malariagen_data.ag3.Ag3.haplotypes_frequencies.html new file mode 100644 index 00000000..136e6b9e --- /dev/null +++ b/latest/generated/malariagen_data.ag3.Ag3.haplotypes_frequencies.html @@ -0,0 +1,527 @@ + + + + + + + + + + + malariagen_data.ag3.Ag3.haplotypes_frequencies — malariagen_data API documentation + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ + + + + + + + + +
+ +
+ + + + + +
+
+ + + + +
+ + + + + + + + + + + + + +
+ +
+ + +
+
+ +
+
+ +
+ +
+ + +
+ +
+ + +
+
+ + + + + +
+ +
+

malariagen_data.ag3.Ag3.haplotypes_frequencies#

+
+
+Ag3.haplotypes_frequencies(region: str | Region | Mapping, cohorts: str | Mapping[str, str], sample_query: str | None = None, sample_query_options: dict | None = None, min_cohort_size: int = 10, sample_sets: Sequence[str] | str | None = None, chunks: int | str | Tuple[int | str, ...] | Callable[[Tuple[int, ...]], int | str | Tuple[int | str, ...]] = 'native', inline_array: bool = True) DataFrame#
+

Compute haplotype frequencies for a region.

+
+

Parameters#

+
+
regionstr or Region or Mapping

Region of the reference genome. Can be a contig name, region string +(formatted like “{contig}:{start}-{end}”), or identifier of a genome +feature such as a gene or transcript.

+
+
cohortsstr or Mapping[str, str]

Either a string giving the name of a predefined cohort set (e.g., +“admin1_month”) or a dict mapping custom cohort labels to sample +queries.

+
+
sample_querystr or None, optional

A pandas query string to be evaluated against the sample metadata, to +select samples to be included in the returned data.

+
+
sample_query_optionsdict or None, optional

A dictionary of arguments that will be passed through to pandas +query() or eval(), e.g. parser, engine, local_dict, global_dict, +resolvers.

+
+
min_cohort_sizeint, optional, default: 10

Minimum cohort size. Raise an error if the number of samples is less +than this value.

+
+
sample_setssequence of str or str or None, optional

List of sample sets and/or releases. Can also be a single sample set +or release.

+
+
chunksint or str or tuple of int or str or Callable[[typing.Tuple[int, …]], int or str or tuple of int or str], optional, default: ‘native’

Define how input data being read from zarr should be divided into +chunks for a dask computation. If ‘native’, use underlying zarr +chunks. If a string specifying a target memory size, e.g., ‘300 MiB’, +resize chunks in arrays with more than one dimension to match this +size. If ‘auto’, let dask decide chunk size. If ‘ndauto’, let dask +decide chunk size but only for arrays with more than one dimension. If +‘ndauto0’, as ‘ndauto’ but only vary the first chunk dimension. If +‘ndauto1’, as ‘ndauto’ but only vary the second chunk dimension. If +‘ndauto01’, as ‘ndauto’ but only vary the first and second chunk +dimensions. Also, can be a tuple of integers, or a callable which +accepts the native chunks as a single argument and returns a valid +dask chunks value.

+
+
inline_arraybool, optional, default: True

Passed through to dask from_array().

+
+
+
+
+

Returns#

+
+
DataFrame

A dataframe of haplotype frequencies, one row per haplotype.

+
+
+
+
+

Notes#

+

Cohorts with fewer samples than min_cohort_size will be excluded +from the output data frame.
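The chunks parameter above accepts a callable that receives the native chunks as a single argument and returns a valid dask chunks value. A minimal sketch of such a callable follows. The name `widen_first_dim` and the factor of 4 are arbitrary choices for illustration, and the example chunk shape is made up.

```python
def widen_first_dim(native_chunks, factor=4):
    """Return chunks that are `factor` times larger along the first dimension.

    `native_chunks` is a tuple of ints, one chunk length per dimension.
    The remaining dimensions keep their native chunk lengths.
    """
    return (native_chunks[0] * factor,) + tuple(native_chunks[1:])


# Made-up native chunk shape for a 3-dimensional array.
resized = widen_first_dim((1000, 50, 2))

# Hypothetical usage, assuming `ag3` is an already configured Ag3 instance:
# ag3.haplotypes_frequencies(region=..., cohorts=..., chunks=widen_first_dim)
```

Because `factor` has a default value, the function can still be called with the native chunks as its only argument, as the parameter description requires.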

+
+
+ +
+ + +
+ + + + + +
+ +
+
+
+ +
+ + + + +
+ + +
+
+ +
+ +
+
+
+ + + + + + + + \ No newline at end of file diff --git a/latest/generated/malariagen_data.ag3.Ag3.haplotypes_frequencies_advanced.html b/latest/generated/malariagen_data.ag3.Ag3.haplotypes_frequencies_advanced.html new file mode 100644 index 00000000..eee5da1d --- /dev/null +++ b/latest/generated/malariagen_data.ag3.Ag3.haplotypes_frequencies_advanced.html @@ -0,0 +1,536 @@ + + + + + + + + + + + malariagen_data.ag3.Ag3.haplotypes_frequencies_advanced — malariagen_data API documentation + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ + + + + + + + + +
+ +
+ + + + + +
+
+ + + + +
+ + + + + + + + + + + + + +
+ +
+ + +
+
+ +
+
+ +
+ +
+ + +
+ +
+ + +
+
+ + + + + +
+ +
+

malariagen_data.ag3.Ag3.haplotypes_frequencies_advanced#

+
+
+Ag3.haplotypes_frequencies_advanced(region: str | Region | Mapping, area_by: str, period_by: Literal['year', 'quarter', 'month'], sample_sets: Sequence[str] | str | None = None, sample_query: str | None = None, sample_query_options: dict | None = None, min_cohort_size: int = 10, ci_method: Literal['normal', 'agresti_coull', 'beta', 'wilson', 'binom_test'] | None = 'wilson', chunks: int | str | Tuple[int | str, ...] | Callable[[Tuple[int, ...]], int | str | Tuple[int | str, ...]] = 'native', inline_array: bool = True) Dataset#
+

Group samples by taxon, area (space) and period (time), then compute +haplotype frequencies.

+
+

Parameters#

+
+
regionstr or Region or Mapping

Region of the reference genome. Can be a contig name, region string +(formatted like “{contig}:{start}-{end}”), or identifier of a genome +feature such as a gene or transcript.

+
+
area_bystr

Column name in the sample metadata to use to group samples spatially. +E.g., use “admin1_iso” or “admin1_name” to group by level 1 +administrative divisions, or use “admin2_name” to group by level 2 +administrative divisions.

+
+
period_by{‘year’, ‘quarter’, ‘month’}

Length of time to group samples temporally.

+
+
sample_setssequence of str or str or None, optional

List of sample sets and/or releases. Can also be a single sample set +or release.

+
+
sample_querystr or None, optional

A pandas query string to be evaluated against the sample metadata, to +select samples to be included in the returned data.

+
+
sample_query_optionsdict or None, optional

A dictionary of arguments that will be passed through to pandas +query() or eval(), e.g. parser, engine, local_dict, global_dict, +resolvers.

+
+
min_cohort_sizeint, optional, default: 10

Minimum cohort size. Raise an error if the number of samples is less +than this value.

+
+
ci_method{‘normal’, ‘agresti_coull’, ‘beta’, ‘wilson’, ‘binom_test’} or None, optional, default: ‘wilson’

Method to use for computing confidence intervals, passed through to +statsmodels.stats.proportion.proportion_confint.

+
+
chunksint or str or tuple of int or str or Callable[[typing.Tuple[int, …]], int or str or tuple of int or str], optional, default: ‘native’

Define how input data being read from zarr should be divided into +chunks for a dask computation. If ‘native’, use underlying zarr +chunks. If a string specifying a target memory size, e.g., ‘300 MiB’, +resize chunks in arrays with more than one dimension to match this +size. If ‘auto’, let dask decide chunk size. If ‘ndauto’, let dask +decide chunk size but only for arrays with more than one dimension. If +‘ndauto0’, as ‘ndauto’ but only vary the first chunk dimension. If +‘ndauto1’, as ‘ndauto’ but only vary the second chunk dimension. If +‘ndauto01’, as ‘ndauto’ but only vary the first and second chunk +dimensions. Also, can be a tuple of integers, or a callable which +accepts the native chunks as a single argument and returns a valid +dask chunks value.

+
+
inline_arraybool, optional, default: True

Passed through to dask from_array().

+
+
+
+
+

Returns#

+
+
Dataset

The resulting dataset has dimensions “cohorts” and +“variants”. Variables prefixed with “cohort” are 1-dimensional +arrays with data about the cohorts, such as the area, period, taxon +and cohort size. Variables prefixed with “variant” are 1-dimensional +arrays with data about the variants, such as the contig, position, +reference and alternate alleles. Variables prefixed with “event” are +2-dimensional arrays with the allele counts and frequency +calculations.
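The grouping by taxon, area and period that defines the “cohorts” dimension can be pictured with a small standard-library sketch. The record fields (`sample_id`, `taxon`, `area`, `date`), the label formats and both function names are assumptions for illustration and may not match the labels the library produces.

```python
from datetime import date


def period_label(d, period_by):
    """Format a date as a year, quarter or month label (assumed formats)."""
    if period_by == "year":
        return f"{d.year}"
    if period_by == "quarter":
        return f"{d.year}-Q{(d.month - 1) // 3 + 1}"
    if period_by == "month":
        return f"{d.year}-{d.month:02d}"
    raise ValueError(f"unsupported period_by: {period_by!r}")


def group_samples(samples, period_by):
    """Map (taxon, area, period) to the list of sample ids in that cohort."""
    groups = {}
    for sample in samples:
        key = (
            sample["taxon"],
            sample["area"],
            period_label(sample["date"], period_by),
        )
        groups.setdefault(key, []).append(sample["sample_id"])
    return groups


# Made-up sample records; "area" stands in for the column chosen via area_by.
samples = [
    {"sample_id": "S1", "taxon": "taxon_x", "area": "area_1", "date": date(2019, 5, 3)},
    {"sample_id": "S2", "taxon": "taxon_x", "area": "area_1", "date": date(2019, 6, 20)},
    {"sample_id": "S3", "taxon": "taxon_x", "area": "area_1", "date": date(2019, 8, 1)},
]
cohorts = group_samples(samples, "quarter")
```

Each key of `cohorts` corresponds to one entry along the “cohorts” dimension; groups smaller than min_cohort_size would then be filtered before frequencies are computed.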

+
+
+
+
+ +
+ + +
+ + + + + +
+ +
+
+
+ +
+ + + + +
+ + +
+
+ +
+ +
+
+
+ + + + + + + + \ No newline at end of file diff --git a/latest/genindex.html b/latest/genindex.html index 5199deff..62cbfeb4 100644 --- a/latest/genindex.html +++ b/latest/genindex.html @@ -580,14 +580,14 @@

H

  • (malariagen_data.ag3.Ag3 method)
  • - - +