-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Colocalization module #17
Changes from 3 commits
ac6bcc2
3968687
a12792f
1de5c51
7bc4dca
94a2203
7561423
178e56c
9d09565
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,5 +1,11 @@ | ||
from metaspace_converter import colocalization | ||
from metaspace_converter.anndata_to_array import anndata_to_image_array | ||
from metaspace_converter.to_anndata import metaspace_to_anndata | ||
from metaspace_converter.to_spatialdata import metaspace_to_spatialdata | ||
|
||
__all__ = ["metaspace_to_anndata", "metaspace_to_spatialdata", "anndata_to_image_array"] | ||
__all__ = [ | ||
"metaspace_to_anndata", | ||
"metaspace_to_spatialdata", | ||
"anndata_to_image_array", | ||
"colocalization", | ||
] |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,112 @@ | ||
from typing import Optional, Tuple | ||
|
||
import numpy as np | ||
from anndata import AnnData | ||
from scipy import ndimage | ||
|
||
from metaspace_converter.anndata_to_array import anndata_to_image_array | ||
from metaspace_converter.constants import COLOCALIZATION | ||
|
||
|
||
def colocML_preprocessing( | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I would rename to There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Done. |
||
adata: AnnData, | ||
layer: Optional[str] = "colocml_preprocessing", | ||
median_filter_size: Tuple[int, int] = (3, 3), | ||
quantile_threshold: float = 0.5, | ||
): | ||
""" | ||
Preprocessing for colocalization analysis according to the colocML publication | ||
(https://doi.org/10.1093/bioinformatics/btaa085). | ||
|
||
In the publication the authors evaluated colocalization metrics and preprocessing approaches. | ||
They found the best performance for | ||
1) median filtering of ion images with a (3, 3) kernel size and | ||
2) quantile thresholding ad 50%, meaning all pixels with intensities below the 50% | ||
quantile set to 0. | ||
|
||
This function performs the same preprocessing steps. | ||
Recommended for call before running the ``colocalization`` function. | ||
|
||
Args: | ||
adata: An AnnData object. | ||
layer: Key for the adata.layers dict in which the processed data will be saved. | ||
Default value is ``colocml_preprocessing``. | ||
If None, ``adata.X`` will be overwritten with the processed data. | ||
median_filter_size: 2-dimensional filter size for the median filtering that | ||
will be performed per ion image. | ||
quantile_threshold: Float between 0 and 1. The function computes the quantile value per | ||
ion image and all pixels below the quantile threshold will be set to 0. | ||
|
||
Returns: | ||
None. The processed data is saved in ``layer``. If layer is net to None, ``adata.X`` will | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
|
||
be overwritten | ||
|
||
Raises: | ||
ValueError: If no annotations are available in ``adata.X``. | ||
|
||
""" | ||
|
||
# Extract image array from anndata: | ||
imarray = anndata_to_image_array(adata=adata) | ||
|
||
if len(imarray) == 0: | ||
raise ValueError("AnnData contains no annotations. ColocML preprocessing cannot be called.") | ||
|
||
# Median filtering | ||
imarray = ndimage.median_filter(imarray, size=(1, median_filter_size[0], median_filter_size[1])) | ||
|
||
# reshaping | ||
imarray = imarray.reshape((imarray.shape[0], -1)) | ||
|
||
# Quantile thresholding | ||
mask = imarray < np.percentile(imarray, q=quantile_threshold * 100, axis=1)[:, np.newaxis] | ||
imarray[mask] = 0 | ||
|
||
# Inserting of processed data into adata object | ||
if layer is None: | ||
adata.X = imarray.transpose() | ||
else: | ||
adata.layers[layer] = imarray.transpose() | ||
|
||
|
||
def colocalization(adata: AnnData, layer: Optional[str] = "colocml_preprocessing"): | ||
""" | ||
Colocalization of ion images using the cosine similarity metric. | ||
|
||
In combination with the ``colocML_preprocessing`` function, this metric performed best in the | ||
colocML publication (https://doi.org/10.1093/bioinformatics/btaa085). | ||
|
||
It is recommended to call the the ``colocML_preprocessing`` function beforehand. | ||
|
||
Args: | ||
adata: An AnnData object. | ||
layer: Key for ``adata.layer`` from which the ionimage_data for preprocessing taken. | ||
If ``None``, ``adata.X`` is used. ``colocML_preprocessing`` will save the preprocessed | ||
data per default in ``adata.layer['colocml_preprocessing']``. | ||
|
||
Returns: | ||
None. The processed data is saved in ``adata.uns['colocalization']``. | ||
""" | ||
|
||
# Select data | ||
if layer == None or layer not in adata.layers.keys(): | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think the second condition is quite permissive. If an AnnData has both a default When doing data analysis, is this level of permissiveness useful (because some datasets may have the layer, others not)? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Agree. Now raising an error if layer not available. |
||
data = np.array(adata.X).transpose() | ||
else: | ||
data = np.array(adata.layers[layer]).transpose() | ||
|
||
# Compute colocalization | ||
coloc = _pairwise_cosine_similarity(data) | ||
|
||
# Save colocalization | ||
adata.uns[COLOCALIZATION] = coloc | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It would also fit well as There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. True. Good idea. Will move it to |
||
|
||
|
||
def _pairwise_cosine_similarity(data: np.ndarray) -> np.ndarray: | ||
|
||
# Divide image vectors by euclidean norm | ||
norm_data = data / np.linalg.norm(data, axis=1, keepdims=True) | ||
|
||
# Compute pairwise cosine similarity | ||
similarity_matrix = np.dot(norm_data, norm_data.T) | ||
|
||
return similarity_matrix |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -28,3 +28,5 @@ class COL: | |
XY: Final = (X, Y) | ||
YX: Final = (Y, X) | ||
YXC: Final = (Y, X, C) | ||
|
||
COLOCALIZATION = "colocalization" |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,107 @@ | ||
from typing import TYPE_CHECKING | ||
|
||
import numpy as np | ||
import pandas as pd | ||
import pytest | ||
from anndata import AnnData | ||
|
||
if TYPE_CHECKING: | ||
import _pytest.fixtures | ||
|
||
from metaspace_converter import anndata_to_image_array | ||
from metaspace_converter.colocalization import colocalization, colocML_preprocessing | ||
from metaspace_converter.constants import COL, COLOCALIZATION, METASPACE_KEY, X, Y | ||
from metaspace_converter.to_anndata import all_image_pixel_coordinates | ||
|
||
|
||
@pytest.fixture | ||
def adata_dummy(request: "_pytest.fixtures.SubRequest") -> AnnData: | ||
num_ions = request.param.get("num_ions", 0) | ||
height = request.param.get("height", 0) | ||
width = request.param.get("width", 0) | ||
shape = (num_ions, height, width) | ||
# Create a stack of ion images where every pixel has a different value | ||
ion_images_stack = np.arange(np.prod(shape)).reshape(shape) | ||
coordinates = all_image_pixel_coordinates(shape[1:]) | ||
# Create an AnnData with pixel coordinates | ||
obs = pd.DataFrame( | ||
{COL.ion_image_pixel_y: coordinates[:, 0], COL.ion_image_pixel_x: coordinates[:, 1]} | ||
) | ||
adata = AnnData( | ||
X=ion_images_stack.reshape((height * width, num_ions)), | ||
obs=obs, | ||
uns={METASPACE_KEY: {"image_size": {X: width, Y: height}}}, | ||
) | ||
return adata | ||
|
||
|
||
@pytest.mark.parametrize( | ||
"adata_dummy", | ||
[ | ||
dict(num_ions=5, height=10, width=10), | ||
dict(num_ions=0, height=2, width=3), | ||
dict(num_ions=4, height=4, width=4), | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. To me, the third and first case seem to cover the same classes of equivalent inputs, that means I would not expect the third one fails without the first one failing as well. But the second one covers two different edge cases (no ions, rectangular image). I would separate them like this: dict(num_ions=5, height=10, width=10),
dict(num_ions=0, height=10, width=10),
dict(num_ions=5, height=3, width=4), There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Good idea. Done. |
||
], | ||
indirect=["adata_dummy"], | ||
) | ||
def test_colocml_preprocessing(adata_dummy): | ||
|
||
COLOCML_LAYER = "colocml_preprocessing" | ||
|
||
adata = adata_dummy | ||
|
||
if adata.X.shape[1] == 0: | ||
with pytest.raises(ValueError) as e_info: | ||
colocML_preprocessing( | ||
adata, median_filter_size=(3, 3), quantile_threshold=0, layer=COLOCML_LAYER | ||
) | ||
else: | ||
|
||
# Test median filtering | ||
colocML_preprocessing( | ||
adata, median_filter_size=(3, 3), quantile_threshold=0, layer=COLOCML_LAYER | ||
) | ||
|
||
actual = anndata_to_image_array(adata) | ||
|
||
# median filter | ||
|
||
expected_preprocessing = np.median(actual[0][:3, :3]) | ||
observed = adata.layers[COLOCML_LAYER][adata.uns[METASPACE_KEY]["image_size"][X] + 1, 0] | ||
assert observed == expected_preprocessing | ||
|
||
expected_preprocessing = np.median(actual[1][:3, :3]) | ||
observed = adata.layers[COLOCML_LAYER][adata.uns[METASPACE_KEY]["image_size"][X] + 1, 1] | ||
assert observed == expected_preprocessing | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. These assertions are not obvious to me as a reader. I would extract the inline expression to a variable # Check that preprocessing was done correctly.
# Probe a single pixel of the resulting array and compute the expected median filter.
x = 1
y = 1
width = adata.uns[METASPACE_KEY]["image_size"][X]
expected_pixel_value = np.median(actual[0][y-1:y+2, x-1:x+2])
actual_pixel_value = adata.layers[COLOCML_LAYER][y * width + x, 0]
assert actual_pixel_value == expected_pixel_value One can then even check all ions in one go with expected_pixel_features = np.median(actual[:, y - 1:y + 2, x - 1:x + 2], axis=[1, 2])
actual_pixel_features = adata.layers[COLOCML_LAYER][y * width + x, :]
np.testing.assert_allclose(actual_pixel_features, expected_pixel_features) There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Good idea, done. |
||
|
||
# Quantile thresholding | ||
adata.X[0].reshape(-1) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This doesn't modify the receiver. You can remove the line There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. True. Probably just an artifact from my internal testing. Removed it. |
||
|
||
colocML_preprocessing( | ||
adata, median_filter_size=(3, 3), quantile_threshold=0.5, layer=COLOCML_LAYER | ||
) | ||
|
||
assert all(np.sum(adata.layers[COLOCML_LAYER] == 0, axis=0) <= 0.5 * adata.X.shape[0]) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It is often preferrable to have a single test case in a test function. For a second case, you could parametrize the function with There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I would make the assertion a bit clearer by extracting the parameter to a variable and computing the assertion values in separate lines. quantile_threshold = 0.5
colocML_preprocessing(
adata, median_filter_size=(3, 3), quantile_threshold=quantile_threshold, layer=COLOCML_LAYER
)
quantile_value = quantile_threshold * adata.X.shape[0]
actual_thresholded = np.sum(adata.layers[COLOCML_LAYER] == 0, axis=0)
assert all(actual_thresholded <= quantile_value) There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Done. Added quantile_threshold to parameterization. |
||
|
||
# Layer exists | ||
assert COLOCML_LAYER in adata.layers.keys() | ||
|
||
# Layer sizes match | ||
assert adata.X.shape == adata.layers[COLOCML_LAYER].shape | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I would move these above There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Done. |
||
|
||
|
||
def test_colocalization(): | ||
|
||
arr = np.array([[0.0, 1.0, 0.0], [0.0, 2.0, 0.0], [1.0, 0.0, 0.0]]).transpose() | ||
|
||
expected = np.array([[1.0, 1.0, 0.0], [1.0, 1.0, 0.0], [0.0, 0.0, 1.0]]) | ||
|
||
adata = AnnData(X=arr) | ||
|
||
colocalization(adata, layer=None) | ||
|
||
assert COLOCALIZATION in adata.uns.keys() | ||
|
||
coloc = adata.uns[COLOCALIZATION] | ||
|
||
assert np.all(coloc == expected) |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta" | |
|
||
[project] | ||
name = "metaspace-converter" | ||
version = "1.0.1" | ||
version = "1.1.0" | ||
authors = [ | ||
{name = "Tim Daniel Rose", email = "[email protected]"}, | ||
{name = "Andreas Eisenbarth", email = "[email protected]"}, | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
matrix (or metrics?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.