Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Colocalization module #17

Merged
merged 9 commits into from
Feb 6, 2024
Merged
Show file tree
Hide file tree
Changes from 3 commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
6 changes: 4 additions & 2 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -16,10 +16,12 @@ of many packages of the [scverse](https://doi.org/10.1038/s41587-023-01733-8) su
[squidpy](https://squidpy.readthedocs.io/en/stable/index.html) for spatial omics analysis.

Another supported format that is part of the [scverse](https://doi.org/10.1038/s41587-023-01733-8)
is [SpatialData](https://spatialdata.scverse.org/en/latest/) for storing, aligning, and processing spatial omics data.
is [SpatialData](https://spatialdata.scverse.org/en/latest/) for storing, aligning, and processing spatial omics data.
This enables users to easily align and integrate METASPACE datasets
to other spatial omics modalities.

Additionally, the commonly used colocalization analysis for spatial metabolomics can be performed with the package.

If you encounter any bugs or have suggestions for new features, please open an issue in the
[GitHub repository](https://github.com/metaspace2020/metaspace-converter).

Expand All @@ -28,6 +30,7 @@ If you encounter any bugs or have suggestions for new features, please open an i
Our package requires `python >= 3.9`.

You can install the package directly from [PyPI](https://pypi.org/project/metaspace-converter/):

```bash
pip install metaspace-converter
```
Expand Down Expand Up @@ -161,7 +164,6 @@ sdata.points["maldi_points"] = sdata.transform_element_to_coordinate_system(

Unless specified otherwise in file headers or LICENSE files present in subdirectories, all files are licensed under the [Apache 2.0 license](https://github.com/metaspace2020/metaspace/blob/master/LICENSE).


[badge-docs]: https://img.shields.io/github/actions/workflow/status/metaspace2020/metaspace-converter/docs.yml?label=documentation
[badge-pypi]: https://img.shields.io/pypi/v/metaspace-converter
[badge-tests]: https://img.shields.io/github/actions/workflow/status/metaspace2020/metaspace-converter/tests.yml?branch=master&label=tests
Expand Down
32 changes: 32 additions & 0 deletions docs/examples.rst
Original file line number Diff line number Diff line change
Expand Up @@ -114,3 +114,35 @@ Here using a reversed colormap which better represents intense values on bright

.. image:: ./_static/img/example_img_sd.png
:alt: Visualization with SpatialData


Colocalization Analysis
-----------------------

A popular type of analysis for imaging Mass Spectrometry data or spatial metabolomics
is colocalization analysis.
In the `ColocML`_ publication,
the authors evaluated data processing and colocalization metrics.
Their best method (median filtering, quantile thresholding, and cosine similarity),
can be executed with our package:

.. testcode::

from metaspace_converter import metaspace_to_anndata, colocalization

# Download data
adata = metaspace_to_anndata(dataset_id="2023-11-14_21h58m39s", fdr=0.1)

# Perform median filtering and quantile thresholding
# The processed data is saved as a layer `adata.layers["colocml_preprocessing"]`
# It has the same dimensions as `adata.X`
colocalization.colocML_preprocessing(adata, layer="colocml_preprocessing")

# Compute the pairwise colocalization metrix between all ion images
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

matrix (or metrics?)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

# As an input, the processed data from `adata.layers["colocml_preprocessing"]` is used
# The colocalization matrix is saved in `adata.uns["colocalization"]`
colocalization.colocalization(adata, layer="colocml_preprocessing")



.. _ColocML: https://doi.org/10.1093/bioinformatics/btaa085
8 changes: 7 additions & 1 deletion metaspace_converter/__init__.py
Original file line number Diff line number Diff line change
@@ -1,5 +1,11 @@
from metaspace_converter import colocalization
from metaspace_converter.anndata_to_array import anndata_to_image_array
from metaspace_converter.to_anndata import metaspace_to_anndata
from metaspace_converter.to_spatialdata import metaspace_to_spatialdata

__all__ = ["metaspace_to_anndata", "metaspace_to_spatialdata", "anndata_to_image_array"]
__all__ = [
"metaspace_to_anndata",
"metaspace_to_spatialdata",
"anndata_to_image_array",
"colocalization",
]
112 changes: 112 additions & 0 deletions metaspace_converter/colocalization.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,112 @@
from typing import Optional, Tuple

import numpy as np
from anndata import AnnData
from scipy import ndimage

from metaspace_converter.anndata_to_array import anndata_to_image_array
from metaspace_converter.constants import COLOCALIZATION


def colocML_preprocessing(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would rename to coloc_ml_preprocessing, closer to Python naming conventions.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

adata: AnnData,
layer: Optional[str] = "colocml_preprocessing",
median_filter_size: Tuple[int, int] = (3, 3),
quantile_threshold: float = 0.5,
):
"""
Preprocessing for colocalization analysis according to the colocML publication
(https://doi.org/10.1093/bioinformatics/btaa085).

In the publication the authors evaluated colocalization metrics and preprocessing approaches.
They found the best performance for
1) median filtering of ion images with a (3, 3) kernel size and
2) quantile thresholding ad 50%, meaning all pixels with intensities below the 50%
quantile set to 0.

This function performs the same preprocessing steps.
Recommended for call before running the ``colocalization`` function.

Args:
adata: An AnnData object.
layer: Key for the adata.layers dict in which the processed data will be saved.
Default value is ``colocml_preprocessing``.
If None, ``adata.X`` will be overwritten with the processed data.
median_filter_size: 2-dimensional filter size for the median filtering that
will be performed per ion image.
quantile_threshold: Float between 0 and 1. The function computes the quantile value per
ion image and all pixels below the quantile threshold will be set to 0.

Returns:
None. The processed data is saved in ``layer``. If layer is net to None, ``adata.X`` will
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

net set

be overwritten

Raises:
ValueError: If no annotations are available in ``adata.X``.

"""

# Extract image array from anndata:
imarray = anndata_to_image_array(adata=adata)

if len(imarray) == 0:
raise ValueError("AnnData contains no annotations. ColocML preprocessing cannot be called.")

# Median filtering
imarray = ndimage.median_filter(imarray, size=(1, median_filter_size[0], median_filter_size[1]))

# reshaping
imarray = imarray.reshape((imarray.shape[0], -1))

# Quantile thresholding
mask = imarray < np.percentile(imarray, q=quantile_threshold * 100, axis=1)[:, np.newaxis]
imarray[mask] = 0

# Inserting of processed data into adata object
if layer is None:
adata.X = imarray.transpose()
else:
adata.layers[layer] = imarray.transpose()


def colocalization(adata: AnnData, layer: Optional[str] = "colocml_preprocessing"):
"""
Colocalization of ion images using the cosine similarity metric.

In combination with the ``colocML_preprocessing`` function, this metric performed best in the
colocML publication (https://doi.org/10.1093/bioinformatics/btaa085).

It is recommended to call the the ``colocML_preprocessing`` function beforehand.

Args:
adata: An AnnData object.
layer: Key for ``adata.layer`` from which the ionimage_data for preprocessing taken.
If ``None``, ``adata.X`` is used. ``colocML_preprocessing`` will save the preprocessed
data per default in ``adata.layer['colocml_preprocessing']``.

Returns:
None. The processed data is saved in ``adata.uns['colocalization']``.
"""

# Select data
if layer == None or layer not in adata.layers.keys():
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the second condition is quite permissive. If an AnnData has both a default X and some layers, a user providing an unknown layer name would unexpectedly get the default layer, but since no error is raised, assume they got the requested layer.

When doing data analysis, is this level of permissiveness useful (because some datasets may have the layer, others not)?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree. Now raising an error if layer not available.

data = np.array(adata.X).transpose()
else:
data = np.array(adata.layers[layer]).transpose()

# Compute colocalization
coloc = _pairwise_cosine_similarity(data)

# Save colocalization
adata.uns[COLOCALIZATION] = coloc
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would also fit well as adata.varp[COLOCALIZATION] = coloc since it is pairwise on features.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True. Good idea. Will move it to anndata.varp



def _pairwise_cosine_similarity(data: np.ndarray) -> np.ndarray:

# Divide image vectors by euclidean norm
norm_data = data / np.linalg.norm(data, axis=1, keepdims=True)

# Compute pairwise cosine similarity
similarity_matrix = np.dot(norm_data, norm_data.T)

return similarity_matrix
2 changes: 2 additions & 0 deletions metaspace_converter/constants.py
Original file line number Diff line number Diff line change
Expand Up @@ -28,3 +28,5 @@ class COL:
XY: Final = (X, Y)
YX: Final = (Y, X)
YXC: Final = (Y, X, C)

COLOCALIZATION = "colocalization"
107 changes: 107 additions & 0 deletions metaspace_converter/tests/colocalization_test.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,107 @@
from typing import TYPE_CHECKING

import numpy as np
import pandas as pd
import pytest
from anndata import AnnData

if TYPE_CHECKING:
import _pytest.fixtures

from metaspace_converter import anndata_to_image_array
from metaspace_converter.colocalization import colocalization, colocML_preprocessing
from metaspace_converter.constants import COL, COLOCALIZATION, METASPACE_KEY, X, Y
from metaspace_converter.to_anndata import all_image_pixel_coordinates


@pytest.fixture
def adata_dummy(request: "_pytest.fixtures.SubRequest") -> AnnData:
num_ions = request.param.get("num_ions", 0)
height = request.param.get("height", 0)
width = request.param.get("width", 0)
shape = (num_ions, height, width)
# Create a stack of ion images where every pixel has a different value
ion_images_stack = np.arange(np.prod(shape)).reshape(shape)
coordinates = all_image_pixel_coordinates(shape[1:])
# Create an AnnData with pixel coordinates
obs = pd.DataFrame(
{COL.ion_image_pixel_y: coordinates[:, 0], COL.ion_image_pixel_x: coordinates[:, 1]}
)
adata = AnnData(
X=ion_images_stack.reshape((height * width, num_ions)),
obs=obs,
uns={METASPACE_KEY: {"image_size": {X: width, Y: height}}},
)
return adata


@pytest.mark.parametrize(
"adata_dummy",
[
dict(num_ions=5, height=10, width=10),
dict(num_ions=0, height=2, width=3),
dict(num_ions=4, height=4, width=4),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To me, the third and first case seem to cover the same classes of equivalent inputs, that means I would not expect the third one fails without the first one failing as well.

But the second one covers two different edge cases (no ions, rectangular image). I would separate them like this:

        dict(num_ions=5, height=10, width=10),
        dict(num_ions=0, height=10, width=10),
        dict(num_ions=5, height=3, width=4),

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea. Done.

],
indirect=["adata_dummy"],
)
def test_colocml_preprocessing(adata_dummy):

COLOCML_LAYER = "colocml_preprocessing"

adata = adata_dummy

if adata.X.shape[1] == 0:
with pytest.raises(ValueError) as e_info:
colocML_preprocessing(
adata, median_filter_size=(3, 3), quantile_threshold=0, layer=COLOCML_LAYER
)
else:

# Test median filtering
colocML_preprocessing(
adata, median_filter_size=(3, 3), quantile_threshold=0, layer=COLOCML_LAYER
)

actual = anndata_to_image_array(adata)

# median filter

expected_preprocessing = np.median(actual[0][:3, :3])
observed = adata.layers[COLOCML_LAYER][adata.uns[METASPACE_KEY]["image_size"][X] + 1, 0]
assert observed == expected_preprocessing

expected_preprocessing = np.median(actual[1][:3, :3])
observed = adata.layers[COLOCML_LAYER][adata.uns[METASPACE_KEY]["image_size"][X] + 1, 1]
assert observed == expected_preprocessing
Copy link
Collaborator

@aeisenbarth aeisenbarth Jan 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These assertions are not obvious to me as a reader. I would extract the inline expression to a variable width and explain the indexing. For example:

        # Check that preprocessing was done correctly.
        # Probe a single pixel of the resulting array and compute the expected median filter.
        x = 1
        y = 1
        width = adata.uns[METASPACE_KEY]["image_size"][X]

        expected_pixel_value = np.median(actual[0][y-1:y+2, x-1:x+2])
        actual_pixel_value = adata.layers[COLOCML_LAYER][y * width + x, 0]
        assert actual_pixel_value == expected_pixel_value

One can then even check all ions in one go with

        expected_pixel_features = np.median(actual[:, y - 1:y + 2, x - 1:x + 2], axis=[1, 2])
        actual_pixel_features = adata.layers[COLOCML_LAYER][y * width + x, :]
        np.testing.assert_allclose(actual_pixel_features, expected_pixel_features)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea, done.


# Quantile thresholding
adata.X[0].reshape(-1)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't modify the receiver. You can remove the line adata.X[0].reshape(-1).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True. Probably just an artifact from my internal testing. Removed it.


colocML_preprocessing(
adata, median_filter_size=(3, 3), quantile_threshold=0.5, layer=COLOCML_LAYER
)

assert all(np.sum(adata.layers[COLOCML_LAYER] == 0, axis=0) <= 0.5 * adata.X.shape[0])
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is often preferrable to have a single test case in a test function. For a second case, you could parametrize the function with @pytest.parametrize("quantile_threshold", [0, 0.5]) or, if the call or assertions are too different, write a separate test function.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would make the assertion a bit clearer by extracting the parameter to a variable and computing the assertion values in separate lines.

        quantile_threshold = 0.5
        colocML_preprocessing(
            adata, median_filter_size=(3, 3), quantile_threshold=quantile_threshold, layer=COLOCML_LAYER
        )
        quantile_value = quantile_threshold * adata.X.shape[0]
        actual_thresholded = np.sum(adata.layers[COLOCML_LAYER] == 0, axis=0)
        assert all(actual_thresholded <= quantile_value)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Added quantile_threshold to parameterization.


# Layer exists
assert COLOCML_LAYER in adata.layers.keys()

# Layer sizes match
assert adata.X.shape == adata.layers[COLOCML_LAYER].shape
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would move these above # Test median filtering because the other assertions would have failed anyways. That way, we first have the check that preprocessing was done, and then that it was done correctly.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.



def test_colocalization():

arr = np.array([[0.0, 1.0, 0.0], [0.0, 2.0, 0.0], [1.0, 0.0, 0.0]]).transpose()

expected = np.array([[1.0, 1.0, 0.0], [1.0, 1.0, 0.0], [0.0, 0.0, 1.0]])

adata = AnnData(X=arr)

colocalization(adata, layer=None)

assert COLOCALIZATION in adata.uns.keys()

coloc = adata.uns[COLOCALIZATION]

assert np.all(coloc == expected)
2 changes: 1 addition & 1 deletion pyproject.toml
Original file line number Diff line number Diff line change
Expand Up @@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"

[project]
name = "metaspace-converter"
version = "1.0.1"
version = "1.1.0"
authors = [
{name = "Tim Daniel Rose", email = "[email protected]"},
{name = "Andreas Eisenbarth", email = "[email protected]"},
Expand Down