Biallelic pairwise distances and neighbour joining trees (#470)
* add biallelic diplotype pairwise distances

* compare scipy

* implement plot_njt; refactor plot_haplotype_clustering

* docs

* clear notebook

* fix pca plotting

* allow tqdm class to be injected

* expose tqdm_class

* fix plotting without color

* njt legends

* docs maintenance

* optimisation

* fix typing

* manage memory

* manage more memory

* more memory management
alimanfoo authored Dec 10, 2023
1 parent ab5c0a1 commit 282e3b6
Showing 27 changed files with 2,057 additions and 674 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/coverage.yml
@@ -16,7 +16,7 @@ jobs:
uses: actions/checkout@v3

- name: Install poetry
- run: pipx install poetry==1.4.2
+ run: pipx install poetry==1.7.1

- name: Setup python
uses: actions/setup-python@v4
7 changes: 2 additions & 5 deletions .github/workflows/latest_docs.yml
@@ -17,18 +17,15 @@ jobs:
with:
python-version: '3.11'

- - name: Install Poetry 📜
- run: pip install poetry
+ - name: Install poetry
+ run: pipx install poetry==1.7.1

- name: Eyeball native environment 👀
run: python --version ; pip --version ; poetry --version ; pwd ; ls -la

- name: Install package dependencies 📜
run: poetry install

- - name: Install docs dependencies 📜
- run: poetry run pip install -r docs/requirements.txt
-
- name: Build HTML 🏗️
run: poetry run sphinx-build -b html docs/source docs/build/html

2 changes: 1 addition & 1 deletion .github/workflows/notebooks.yml
@@ -16,7 +16,7 @@ jobs:
uses: actions/checkout@v3

- name: Install poetry
- run: pipx install poetry==1.4.2
+ run: pipx install poetry==1.7.1

- name: Setup python
uses: actions/setup-python@v4
2 changes: 1 addition & 1 deletion .github/workflows/release.yml
@@ -12,7 +12,7 @@ jobs:
uses: JRubics/[email protected]
with:
python_version: '3.11'
- poetry_version: '==1.4.2'
+ poetry_version: '==1.7.1'
allow_poetry_pre_release: 'yes'
ignore_dev_requirements: 'yes'
pypi_token: ${{ secrets.PYPI_TOKEN }}
7 changes: 2 additions & 5 deletions .github/workflows/tagged_docs.yml
@@ -29,18 +29,15 @@ jobs:
with:
python-version: '3.11'

- - name: Install Poetry 📜
- run: pip install poetry
+ - name: Install poetry
+ run: pipx install poetry==1.7.1

- name: Eyeball native environment 👀
run: python --version ; pip --version ; poetry --version ; pwd ; ls -la

- name: Install package dependencies 📜
run: poetry install

- - name: Install docs dependencies 📜
- run: poetry run pip install -r docs/requirements.txt
-
- name: Build HTML 🏗️
run: poetry run sphinx-build -b html docs/source docs/build/html

2 changes: 1 addition & 1 deletion .github/workflows/tests.yml
@@ -18,7 +18,7 @@ jobs:
uses: actions/checkout@v3

- name: Install poetry
- run: pipx install poetry==1.4.2
+ run: pipx install poetry==1.7.1

- name: Setup python
uses: actions/setup-python@v4
2 changes: 0 additions & 2 deletions docs/requirements.txt
@@ -1,2 +0,0 @@
- sphinx
- pydata-sphinx-theme
11 changes: 11 additions & 0 deletions docs/source/Af1.rst
@@ -51,6 +51,8 @@ SNP data access
plot_snps
site_annotations
is_accessible
+ biallelic_snp_calls
+ biallelic_diplotypes

Haplotype data access
---------------------
@@ -107,6 +109,14 @@ Principal components analysis (PCA)
plot_pca_coords
plot_pca_coords_3d

+ Genetic distance and neighbour-joining trees (NJT)
+ --------------------------------------------------
+ .. autosummary::
+ :toctree: generated/
+
+ plot_njt
+ biallelic_diplotype_pairwise_distances
+
Heterozygosity analysis
-----------------------
.. autosummary::
@@ -152,6 +162,7 @@ Haplotype clustering and network analysis

plot_haplotype_clustering
plot_haplotype_network
+ haplotype_pairwise_distances

Fst analysis
------------
11 changes: 11 additions & 0 deletions docs/source/Ag3.rst
@@ -51,6 +51,8 @@ SNP data access
plot_snps
site_annotations
is_accessible
+ biallelic_snp_calls
+ biallelic_diplotypes

Haplotype data access
---------------------
@@ -116,6 +118,14 @@ Principal components analysis (PCA)
plot_pca_coords
plot_pca_coords_3d

+ Genetic distance and neighbour-joining trees (NJT)
+ --------------------------------------------------
+ .. autosummary::
+ :toctree: generated/
+
+ plot_njt
+ biallelic_diplotype_pairwise_distances
+
Heterozygosity analysis
-----------------------
.. autosummary::
@@ -161,6 +171,7 @@ Haplotype clustering and network analysis

plot_haplotype_clustering
plot_haplotype_network
+ haplotype_pairwise_distances

Fst analysis
------------
2 changes: 2 additions & 0 deletions malariagen_data/af1.py
@@ -93,6 +93,7 @@ def __init__(
cohorts_analysis=None,
site_filters_analysis=None,
pre=False,
+ tqdm_class=None,
**storage_options, # used by fsspec via init_filesystem()
):
super().__init__(
@@ -120,6 +121,7 @@ def __init__(
gff_gene_type="protein_coding_gene",
gff_default_attributes=("ID", "Parent", "Note", "description"),
storage_options=storage_options, # used by fsspec via init_filesystem()
+ tqdm_class=tqdm_class,
)

@staticmethod
2 changes: 2 additions & 0 deletions malariagen_data/ag3.py
@@ -138,6 +138,7 @@ def __init__(
aim_analysis=None,
site_filters_analysis=None,
pre=False,
+ tqdm_class=None,
**storage_options, # used by fsspec via init_filesystem()
):
super().__init__(
@@ -172,6 +173,7 @@ def __init__(
gff_gene_type="gene",
gff_default_attributes=("ID", "Parent", "Name", "description"),
storage_options=storage_options, # used by fsspec via init_filesystem()
+ tqdm_class=tqdm_class,
)

# set up caches
12 changes: 9 additions & 3 deletions malariagen_data/anoph/base.py
@@ -20,7 +20,7 @@
import pandas as pd
import zarr
from numpydoc_decorator import doc
- from tqdm.auto import tqdm
+ from tqdm.auto import tqdm as tqdm_auto
from tqdm.dask import TqdmCallback
from yaspin import yaspin

@@ -52,6 +52,7 @@ def __init__(
check_location: bool = False,
storage_options: Optional[Mapping] = None,
results_cache: Optional[str] = None,
+ tqdm_class=None,
):
self._url = url
self._config_path = config_path
@@ -61,6 +62,9 @@ def __init__(
self._major_version_path = major_version_path
self._debug = debug
self._show_progress = show_progress
+ if tqdm_class is None:
+ tqdm_class = tqdm_auto
+ self._tqdm_class = tqdm_class

# Set up logging.
self._log = LoggingHelper(name=__name__, out=log, debug=debug)
@@ -106,15 +110,17 @@ def _progress(self, iterable, desc=None, leave=False, **kwargs): # pragma: no c
# Progress doesn't mix well with debug logging.
show_progress = self._show_progress and not self._debug
if show_progress:
- return tqdm(iterable, desc=desc, leave=leave, **kwargs)
+ return self._tqdm_class(iterable, desc=desc, leave=leave, **kwargs)
else:
return iterable

def _dask_progress(self, desc=None, leave=False, **kwargs): # pragma: no cover
# Progress doesn't mix well with debug logging.
show_progress = self._show_progress and not self._debug
if show_progress:
- return TqdmCallback(desc=desc, leave=leave, **kwargs)
+ return TqdmCallback(
+ desc=desc, leave=leave, tqdm_class=self._tqdm_class, **kwargs
+ )
else:
return nullcontext()
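
Note on the `tqdm_class` change in `base.py`: a minimal sketch of how an injected progress class might be used, assuming a stripped-down host class. `ProgressHost` and `RecordingTqdm` are hypothetical names for illustration only and are not part of `malariagen_data`. The fallback import is done lazily here so the sketch runs without tqdm installed when a class is injected; the commit itself imports `tqdm_auto` at module level.

```python
class ProgressHost:
    """Hypothetical stand-in for the base class changed in this commit."""

    def __init__(self, show_progress=True, debug=False, tqdm_class=None):
        self._show_progress = show_progress
        self._debug = debug
        if tqdm_class is None:
            # Assumption: lazy import so the sketch works without tqdm
            # whenever a class is injected.
            from tqdm.auto import tqdm as tqdm_auto

            tqdm_class = tqdm_auto
        self._tqdm_class = tqdm_class

    def _progress(self, iterable, desc=None, leave=False, **kwargs):
        # Progress doesn't mix well with debug logging.
        show_progress = self._show_progress and not self._debug
        if show_progress:
            return self._tqdm_class(iterable, desc=desc, leave=leave, **kwargs)
        return iterable


class RecordingTqdm:
    """Hypothetical duck-typed replacement that records descriptions."""

    calls = []

    def __init__(self, iterable=None, desc=None, leave=False, **kwargs):
        self._iterable = iterable
        RecordingTqdm.calls.append(desc)

    def __iter__(self):
        return iter(self._iterable)


host = ProgressHost(tqdm_class=RecordingTqdm)
items = list(host._progress(range(3), desc="Load SNP calls"))
```

A duck-typed class like this is only enough for the iterable path. `TqdmCallback` drives its `tqdm_class` through the fuller tqdm interface, so a real tqdm subclass is probably the safer thing to inject when dask progress is involved.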

12 changes: 12 additions & 0 deletions malariagen_data/anoph/diplotype_distance_params.py
@@ -0,0 +1,12 @@
+ from typing import Literal
+
+ from typing_extensions import Annotated, TypeAlias
+
+ metric: TypeAlias = Annotated[
+ Literal[
+ "cityblock",
+ "euclidean",
+ "sqeuclidean",
+ ],
+ "The metric to compute distance between genotypes in two samples.",
+ ]
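
Note on the three metrics: a rough pure-Python sketch of what each name means for a pair of biallelic diplotypes. It assumes each sample is encoded as a list of alternate-allele counts (0, 1 or 2) per site, which is an assumption about the encoding rather than something this diff shows. `diplotype_distance` and `pairwise_distances` are hypothetical helpers, not the package's implementation.

```python
import math
from itertools import combinations


def diplotype_distance(x, y, metric="cityblock"):
    """Distance between two samples given as per-site alt-allele counts."""
    if len(x) != len(y):
        raise ValueError("samples must cover the same sites")
    diffs = [a - b for a, b in zip(x, y)]
    if metric == "cityblock":
        # Sum of absolute per-site differences.
        return float(sum(abs(d) for d in diffs))
    squared = float(sum(d * d for d in diffs))
    if metric == "sqeuclidean":
        return squared
    if metric == "euclidean":
        return math.sqrt(squared)
    raise ValueError(f"unsupported metric: {metric!r}")


def pairwise_distances(samples, metric="cityblock"):
    """Condensed distances in the order (0,1), (0,2), ..., (n-2,n-1)."""
    return [
        diplotype_distance(samples[i], samples[j], metric)
        for i, j in combinations(range(len(samples)), 2)
    ]


samples = [
    [0, 1, 2, 0],
    [2, 1, 0, 0],
    [0, 1, 2, 1],
]
cityblock = pairwise_distances(samples, "cityblock")
sqeuclidean = pairwise_distances(samples, "sqeuclidean")
euclidean = pairwise_distances(samples, "euclidean")
```

The same three metric names are accepted by `scipy.spatial.distance.pdist`, which returns distances in the same condensed order, so it makes a convenient cross-check for a sketch like this.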
19 changes: 0 additions & 19 deletions malariagen_data/anoph/hapclust_params.py
@@ -13,22 +13,3 @@
]

linkage_method_default: linkage_method = "single"

- count_sort: TypeAlias = Annotated[
- bool,
- """
- For each node n, the order (visually, from left-to-right) n's two descendant
- links are plotted is determined by this parameter. If True, the child with
- the minimum number of original objects in its cluster is plotted first. Note
- distance_sort and count_sort cannot both be True.
- """,
- ]
-
- distance_sort: TypeAlias = Annotated[
- bool,
- """
- For each node n, the order (visually, from left-to-right) n's two descendant
- links are plotted is determined by this parameter. If True, The child with the
- minimum distance between its direct descendants is plotted first.
- """,
- ]
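
Note on `count_sort`: the wording of these parameter docs closely follows the options of the same name on `scipy.cluster.hierarchy.dendrogram`. The toy sketch below only illustrates the ordering rule described there (smaller cluster plotted first) on a nested-tuple tree. `count_leaves` and `leaf_order` are hypothetical helpers and do not reproduce scipy's implementation.

```python
def count_leaves(node):
    """Number of original objects under a node (a leaf counts as one)."""
    if isinstance(node, tuple):
        return sum(count_leaves(child) for child in node)
    return 1


def leaf_order(node, count_sort=False):
    """Left-to-right leaf order, optionally putting smaller clusters first."""
    if not isinstance(node, tuple):
        return [node]
    children = list(node)
    if count_sort:
        # Stable sort: ties keep their original left-to-right order.
        children.sort(key=count_leaves)
    order = []
    for child in children:
        order.extend(leaf_order(child, count_sort))
    return order


# Hypothetical tree: a cluster of three leaves joined with a singleton.
tree = (("a", ("b", "c")), "d")
unsorted_order = leaf_order(tree)
sorted_order = leaf_order(tree, count_sort=True)
```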
23 changes: 1 addition & 22 deletions malariagen_data/anoph/hapnet_params.py
@@ -1,6 +1,6 @@
"""Parameters for haplotype network functions."""

- from typing import List, Mapping
+ from typing import Mapping

from typing_extensions import Annotated, TypeAlias

@@ -11,27 +11,6 @@

max_dist_default: max_dist = 2

- color: TypeAlias = Annotated[
- str,
- """
- Identifies a column in the sample metadata which determines the colour
- of pie chart segments within nodes.
- """,
- ]
-
- color_discrete_sequence: TypeAlias = Annotated[
- List, "Provide a list of colours to use."
- ]
-
- color_discrete_map: TypeAlias = Annotated[
- Mapping, "Provide an explicit mapping from values to colours."
- ]
-
- category_order: TypeAlias = Annotated[
- List,
- "Control the order in which values appear in the legend.",
- ]
-
node_size_factor: TypeAlias = Annotated[
int,
"Control the sizing of nodes.",
23 changes: 21 additions & 2 deletions malariagen_data/anoph/plotly_params.py
@@ -4,7 +4,7 @@
# and so we set as Optional here, rather than having to repeat
# that for each function doc.

- from typing import List, Literal, Optional, Union
+ from typing import List, Literal, Mapping, Optional, Union

import plotly.graph_objects as go
from typing_extensions import Annotated, TypeAlias
@@ -42,6 +42,8 @@
""",
]

+ title_font_size = Annotated[int, "Font size for the plot title."]
+
text_auto: TypeAlias = Annotated[
Union[bool, str],
"""
@@ -51,6 +53,19 @@
""",
]

+ color_discrete_sequence: TypeAlias = Annotated[
+ Optional[List], "Provide a list of colours to use."
+ ]
+
+ color_discrete_map: TypeAlias = Annotated[
+ Optional[Mapping], "Provide an explicit mapping from values to colours."
+ ]
+
+ category_order: TypeAlias = Annotated[
+ Optional[List],
+ "Control the order in which values appear in the legend.",
+ ]
+
color_continuous_scale: TypeAlias = Annotated[
Optional[Union[str, List[str]]],
"""
@@ -97,10 +112,14 @@
]

marker_size: TypeAlias = Annotated[
- int,
+ Union[int, float],
"Marker size.",
]

+ line_width: TypeAlias = Annotated[Union[int, float], "Line width."]
+
+ line_color: TypeAlias = Annotated[str, "Line color"]
+
template: TypeAlias = Annotated[
Optional[
Literal[
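
Note on the `Annotated` parameter aliases in `plotly_params.py`: the sketch below shows one way the description attached to each alias can be read back from a function signature. It is an illustration of the general `typing.Annotated` mechanism and makes no claim about how `numpydoc_decorator` does it internally. The `plotly_params` class here is a hypothetical stand-in namespace for the module, `plot_example` and `describe_params` are hypothetical, and `Annotated` is imported from `typing` (Python 3.9+) instead of `typing_extensions` to keep the sketch stdlib-only.

```python
from typing import Annotated, List, Optional, Union


class plotly_params:
    """Hypothetical stand-in namespace for the plotly_params module."""

    marker_size = Annotated[Union[int, float], "Marker size."]
    category_order = Annotated[
        Optional[List],
        "Control the order in which values appear in the legend.",
    ]


def plot_example(
    marker_size: plotly_params.marker_size = 6,
    category_order: plotly_params.category_order = None,
):
    """Hypothetical plotting function that reuses the shared aliases."""
    return marker_size, category_order


def describe_params(func):
    """Map each annotated parameter name to its attached description."""
    descriptions = {}
    for name, hint in func.__annotations__.items():
        # Annotated[...] objects expose their extra arguments here.
        metadata = getattr(hint, "__metadata__", ())
        descriptions[name] = metadata[0] if metadata else ""
    return descriptions


docs = describe_params(plot_example)
```

Defining each description once on the alias means every function that takes the parameter can share the same wording, which appears to be why this commit moves the colour-related aliases out of `hapnet_params.py` into the shared module.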
