
Add support for numpy structured arrays #517

Open
1 task done
acgc99 opened this issue Mar 6, 2025 · 1 comment


acgc99 commented Mar 6, 2025

Things to check first

  • I have searched the existing issues and didn't find my feature already requested there

Feature description

This is a copy and paste of a question I recently asked on Stack Overflow, along with the solution I found with some help from AI.

I'm not sure whether this would also work with numpy record arrays.

Feature requested
I want to implement functions that act on specific numpy structured arrays, but typeguard does not seem to check the dtype annotation:

import numpy as np
from typeguard import typechecked

mytype2 = np.dtype([("type", int), ("pos", float, 2)])
mytype3 = np.dtype([("type", int), ("pos", float, 3)])

@typechecked
def process(data: mytype3) -> None:  # Pylance: "Variable not allowed in type expression"
    print(data)

data = np.array([(1, [2, 3])], dtype=mytype2)
process(data)

This runs without raising anything, although it should raise an error: data has dtype mytype2, not mytype3.
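A minimal NumPy-only sketch of what is going on, assuming typeguard falls back to a plain isinstance() check when none of its lookup functions matches an annotation: mytype3 is a dtype instance, not a class, so that fallback has nothing to check against, while comparing the dtypes directly does tell the two layouts apart.

```python
import numpy as np

mytype2 = np.dtype([("type", int), ("pos", float, 2)])
mytype3 = np.dtype([("type", int), ("pos", float, 3)])

data = np.array([(1, [2, 3])], dtype=mytype2)

# The annotation is an instance of np.dtype, not a class, so it cannot be
# used as the second argument of isinstance().
annotation_is_class = isinstance(mytype3, type)

# Structured dtypes compare field by field, including sub-array shapes,
# so a direct comparison does detect the mismatch.
dtype_matches = data.dtype == mytype3

print(annotation_is_class)  # False
print(dtype_matches)        # False
print(mytype2["pos"].shape, mytype3["pos"].shape)  # (2,) (3,)
```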

My solution

from typing import Any
import numpy as np
from typeguard import (
    TypeCheckError, TypeCheckerCallable, TypeCheckMemo,
    checker_lookup_functions, typechecked
)


def checker_dtype(
    value: Any, origin_type: np.dtype, args: tuple[Any, ...], memo: TypeCheckMemo
) -> None:
    # Check if the value is a NumPy array
    if not isinstance(value, np.ndarray):
        raise TypeCheckError("is not an instance of numpy.ndarray")
    # Check if the array's dtype matches the expected dtype
    if value.dtype != origin_type:
        raise TypeCheckError(f"expected dtype {origin_type}, but got {value.dtype}")

def lookup_dtype(
    origin_type: Any, args: tuple[Any, ...], extras: tuple[Any, ...]
) -> TypeCheckerCallable | None:
    # Return the checker if the annotation is a numpy dtype instance
    if isinstance(origin_type, np.dtype):
        return checker_dtype
    return None

# Register the custom checker lookup function
checker_lookup_functions.append(lookup_dtype)

# Example usage
MyDtype2 = np.dtype([("type", int), ("pos", float, 2)])
MyDtype3 = np.dtype([("type", int), ("pos", float, 3)])

@typechecked
def process(data: MyDtype3) -> None:  # Pylance still shows an error
    print(data)

data = np.array([(1, [2, 3])], dtype=MyDtype2)
try:
    process(data)  # Raises TypeCheckError due to dtype mismatch
except TypeCheckError as e:
    print(e)
data = np.array([(1, [2, 3, 4])], dtype=MyDtype3)
process(data)  # No error raised
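Note that value.dtype != origin_type is a strict comparison: field names and field types both have to match, not just the general layout. A small NumPy-only illustration; the renamed and single_precision dtypes are made up for the example.

```python
import numpy as np

MyDtype3 = np.dtype([("type", int), ("pos", float, 3)])

# Same layout rebuilt from scratch: compares equal.
same = np.dtype([("type", int), ("pos", float, 3)])

# Hypothetical variants that differ only in a field name or a field type.
renamed = np.dtype([("kind", int), ("pos", float, 3)])
single_precision = np.dtype([("type", int), ("pos", np.float32, 3)])

for name, candidate in [
    ("same", same),
    ("renamed", renamed),
    ("single_precision", single_precision),
]:
    print(name, candidate == MyDtype3)
# same True
# renamed False
# single_precision False
```

So the checker above would also reject an array whose fields are merely renamed or stored at a different precision.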

Use case

This is needed to use numpy structured arrays consistently with runtime type checking.


acgc99 commented Mar 8, 2025

I'm copying Holt's answer from Stack Overflow here:

You want to annotate your function as accepting an array, not a dtype instance; otherwise it does not really make sense. typeguard might work, but any static type checker will report an error because you are passing an array where a dtype is expected.
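The array-based annotation also gives a runtime checker something it can take apart. Assuming NumPy 1.22 or newer (where np.ndarray can be subscripted at runtime), np.ndarray[Shape, DType] is an ordinary generic alias, and typing.get_origin / typing.get_args recover the same pieces that the lookup function in the answer reads from origin_type and args:

```python
from typing import get_args, get_origin

import numpy as np

MyDtype3 = np.dtype([("type", int), ("pos", float, 3)])
MyArray3 = np.ndarray[tuple[int, int], MyDtype3]

# The subscripted class is a generic alias whose origin is np.ndarray ...
origin = get_origin(MyArray3)

# ... and whose arguments are the shape spec and the dtype, in that order.
shape_spec, dtype_arg = get_args(MyArray3)

print(origin is np.ndarray)           # True
print(shape_spec == tuple[int, int])  # True
print(dtype_arg == MyDtype3)          # True
print(get_args(shape_spec))           # (<class 'int'>, <class 'int'>)
```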

Here is one possible way to do what you want, by adding a lookup function (with a bonus check of the array's shape):

from types import GenericAlias
from typing import Any, cast

import numpy as np
import numpy.typing as npt
from typeguard import (
    TypeCheckerCallable,
    TypeCheckError,
    TypeCheckMemo,
    checker_lookup_functions,
    typechecked,
)


class NDArrayChecker:
    def __init__(self, target_shape: GenericAlias, dtype: npt.DTypeLike):
        self._dtype = np.dtype(dtype)
        self._shape = target_shape

        shape_args = self._shape.__args__

        if len(shape_args) == 2 and shape_args[1] == Ellipsis:
            shape_type = shape_args[0]
            assert isinstance(shape_type, type)

            def check_shape(shape: tuple[int, ...]):
                return all(isinstance(s, shape_type) for s in shape)
        else:
            assert all(isinstance(s, type) for s in shape_args)
            shape_type = cast(tuple[type, ...], shape_args)

            def check_shape(shape: tuple[int, ...]):
                return len(shape) == len(shape_type) and all(
                    isinstance(s, t) for s, t in zip(shape, shape_type)
                )

        self._check_shape = check_shape

    def __call__(
        self, value: Any, origin_type: Any, args: tuple[Any, ...], memo: TypeCheckMemo
    ) -> None:
        if not isinstance(value, np.ndarray):
            raise TypeCheckError("is not an instance of numpy.ndarray")

        # check dtype
        if value.dtype != self._dtype:
            raise TypeCheckError(
                f"has dtype {value.dtype} but {self._dtype} was expected"
            )

        # check shape
        if not self._check_shape(value.shape):
            raise TypeCheckError(
                f"has shape {value.shape} but {self._shape} was expected"
            )


def lookup_dtype(
    origin_type: Any, args: tuple[Any, ...], extras: tuple[Any, ...]
) -> TypeCheckerCallable | None:
    # np.ndarray is typed as np.ndarray[Shape, Dtype]; a bare np.ndarray
    # annotation has no args, so leave it to typeguard's default isinstance check
    if origin_type is np.ndarray and len(args) == 2:
        return NDArrayChecker(args[0], args[1])
    return None


checker_lookup_functions.append(lookup_dtype)

You can test it with the following:

MyDtype2 = np.dtype([("type", int), ("pos", float, 2)])
MyDtype3 = np.dtype([("type", int), ("pos", float, 3)])

MyArray2 = np.ndarray[tuple[int, ...], MyDtype2]
MyArray3 = np.ndarray[tuple[int, int], MyDtype3]


@typechecked
def process(data: MyArray3) -> None:
    print(data)


try:
    # not an np.ndarray
    process([1, 2, 3])
except TypeCheckError as e:
    print(e)

try:
    # wrong dtype
    process(np.array([(1, [2, 3])], dtype=MyDtype2))
except TypeCheckError as e:
    print(e)

try:
    # wrong shape
    process(np.array([(1, [2, 3, 4])], dtype=MyDtype3))
except TypeCheckError as e:
    print(e)

process(np.array([[(1, [2, 3, 4])]], dtype=MyDtype3))

I added a comment there: the message from the last exception, argument "data" (numpy.ndarray) has shape (1,) but tuple[int, int] was expected, might be confusing.
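One way to make that message less confusing might be to describe the expected shape in words instead of printing the tuple[...] alias. This is a sketch with two hypothetical helpers, describe_shape_spec and shape_error_message, which are not part of typeguard or of Holt's answer; the returned string could be passed to TypeCheckError in NDArrayChecker.__call__.

```python
from typing import Any, get_args


def describe_shape_spec(shape_spec: Any) -> str:
    """Hypothetical helper: render tuple[int, int] / tuple[int, ...] as prose."""
    shape_args = get_args(shape_spec)
    if len(shape_args) == 2 and shape_args[1] is Ellipsis:
        # tuple[int, ...]: any number of dimensions is allowed
        return "an array with any number of dimensions"
    # e.g. tuple[int, int]: exactly len(shape_args) dimensions
    return f"a {len(shape_args)}-D array"


def shape_error_message(actual_shape: tuple[int, ...], shape_spec: Any) -> str:
    """Hypothetical helper: build the text for a shape-mismatch TypeCheckError."""
    return (
        f"has shape {actual_shape} ({len(actual_shape)}-D) "
        f"but {describe_shape_spec(shape_spec)} was expected"
    )


print(shape_error_message((1,), tuple[int, int]))
# has shape (1,) (1-D) but a 2-D array was expected
```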
