
Add support for nan values in PandasCodec #1747

Open
Pappol opened this issue May 7, 2024 · 7 comments

@Pappol

Pappol commented May 7, 2024

PandasCodec does not handle null values well at all. The can_encode method is misleading, because it only checks whether the payload is a DataFrame, so it reports success even for data it cannot encode correctly.
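A stricter check could also look for missing values before claiming a DataFrame is encodable. This is a minimal sketch assuming only pandas; columns_with_nulls and can_encode_strict are hypothetical helpers for illustration, not part of MLServer's PandasCodec API.

```python
import pandas as pd


def columns_with_nulls(df: pd.DataFrame) -> list:
    """Return the names of columns that contain at least one missing value."""
    # DataFrame.isna() flags None, NaN and NaT alike; .any() reduces per column.
    mask = df.isna().any()
    return [col for col, has_null in mask.items() if has_null]


def can_encode_strict(payload) -> bool:
    """Hypothetical stricter check: a DataFrame with no missing values at all."""
    return isinstance(payload, pd.DataFrame) and not columns_with_nulls(payload)


df = pd.DataFrame({"foo": [None, 1.0], "bar": ["a", "b"]})
print(columns_with_nulls(df))   # columns that would need null handling
print(can_encode_strict(df))
```

Whether rejecting such frames or converting their nulls is the better behaviour for the codec is a design decision this sketch does not settle.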

@sp1thas

sp1thas commented Jun 13, 2024

Thanks for raising this @Pappol

JSON serialization is wrong as well:

import pandas as pd
from mlserver.codecs.pandas import PandasCodec

# A float column with one missing value (pandas stores this None as NaN)
df = pd.DataFrame({'foo': [None, 1.0]})
PandasCodec.encode_request(df).json()

Serialized request (note the bare NaN, which is not valid JSON):

{
    "parameters": {
        "content_type": "pd"
    },
    "inputs": [
        {
            "name": "foo",
            "shape": [
                2,
                1
            ],
            "datatype": "FP64",
            "data": [
                NaN,
                1.0
            ]
        }
    ]
}
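JSON (RFC 8259) has no literal for NaN. Assuming the request is serialized the way Python's standard json module does it, the default output contains the bare NaN token, and passing allow_nan=False turns the problem into an error instead. A minimal stdlib-only sketch:

```python
import json
import math

# One FP64 input mirroring the request above (values: NaN, 1.0).
payload = {"name": "foo", "datatype": "FP64", "shape": [2, 1], "data": [math.nan, 1.0]}

# Default behaviour: json.dumps writes the bare token NaN, which is not valid JSON.
default_text = json.dumps(payload)

# allow_nan=False makes the standard library refuse non-finite floats instead.
try:
    json.dumps(payload, allow_nan=False)
    strict_ok = True
except ValueError:
    strict_ok = False

print(default_text)
print("strict serialization succeeded:", strict_ok)
```

Python's own json.loads accepts NaN, so a Python client may not notice the problem; stricter parsers such as JavaScript's JSON.parse reject it.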

BTW, in case anyone else is facing the same issue, here is my quick-and-dirty workaround:

import pandas as pd
from mlserver.types import InferenceRequest

def replace_nan_with_none(inference_request: InferenceRequest) -> InferenceRequest:
    # _input.data.__root__ holds each input's flat list of values; swap any
    # NA value (per pd.isna) for None so it serializes as JSON null.
    for i, _input in enumerate(inference_request.inputs):
        for ii, v in enumerate(_input.data.__root__):
            if pd.isna(v):
                inference_request.inputs[i].data.__root__[ii] = None
    return inference_request
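The same idea can be applied to the plain dict form of the request (the shape shown in the serialized request above) without depending on MLServer's types. nan_to_none is a hypothetical helper; like the workaround, it only handles flat data lists.

```python
import json
import math
from typing import Any, Dict


def nan_to_none(request: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: replace float NaN entries in each input's flat
    "data" list with None, mutating and returning the request dict."""
    for inp in request.get("inputs", []):
        data = inp.get("data")
        if isinstance(data, list):
            # Only floats can be NaN; math.isnan would raise TypeError on None or str.
            inp["data"] = [
                None if isinstance(v, float) and math.isnan(v) else v for v in data
            ]
    return request


req = {
    "parameters": {"content_type": "pd"},
    "inputs": [
        {"name": "foo", "shape": [2, 1], "datatype": "FP64", "data": [math.nan, 1.0]}
    ],
}
out = nan_to_none(req)
# allow_nan=False now succeeds, and the missing value is written as null.
print(json.dumps(out, allow_nan=False))
```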

@Pappol
Author

Pappol commented Jun 14, 2024

The main issue is with date data types.
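One way to keep a date column with gaps JSON-safe, assuming the model side accepts dates as ISO 8601 strings, is to convert each value and map missing entries to None. dates_to_json_safe is a hypothetical helper, not an MLServer API, and it recognises only None and float NaN as missing.

```python
import datetime as dt
import json
import math
from typing import List, Optional, Union

DateLike = Union[dt.date, None, float]


def dates_to_json_safe(values: List[DateLike]) -> List[Optional[str]]:
    """Hypothetical helper: dates -> ISO 8601 strings, None/NaN gaps -> None."""
    out: List[Optional[str]] = []
    for v in values:
        if v is None or (isinstance(v, float) and math.isnan(v)):
            out.append(None)  # missing date, serialized as JSON null
        elif isinstance(v, dt.date):
            # datetime.datetime subclasses datetime.date, so both land here.
            out.append(v.isoformat())
        else:
            raise TypeError(f"unsupported value: {v!r}")
    return out


column = [dt.date(2024, 5, 7), None, math.nan, dt.datetime(2024, 6, 14, 12, 0)]
print(json.dumps(dates_to_json_safe(column)))
```

pandas represents missing datetimes as NaT, which this sketch does not special-case; such values would need to be mapped to None first (pd.isna recognises NaT).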

@ramonpzg
Contributor

Hi @sp1thas -- Thanks for bringing this up and for showing your workaround. I will assign this to myself and have a look at what exactly is causing this.

@Pappol -- Do you have a reproducible example of the behaviour you are experiencing?

@ramonpzg ramonpzg self-assigned this Jul 24, 2024
@sp1thas

sp1thas commented Aug 28, 2024

Hey @ramonpzg, I've also taken a look in the meantime and opened #1893. Could you review it? Looking forward to your input.

@sp1thas

sp1thas commented Sep 5, 2024

The serialization issue with np.nan has been addressed since 1.4.0 (see #1346).

@bwallima

The change only fixes the problem with NaN values. As soon as you have text data with a None value, it breaks: the problem occurs whenever non-numeric values are passed. For instance, I have a df that contains text values, some of which can be None. A fix would be to check for float values at line 109 of mlserver/codecs/numpy.py: if isinstance(val, float) and np.isnan(val)
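The reason the guard helps: np.isnan raises TypeError for None and for strings, so calling it unconditionally on mixed text data fails. This sketch shows that failure and a guarded check in the spirit of the proposed fix; is_missing, to_json_safe and unguarded_fails are hypothetical helpers, and the explicit None branch is an addition for illustration, not the code at line 109.

```python
from typing import Any, List

import numpy as np


def is_missing(val: Any) -> bool:
    """Hypothetical helper: True for None and for float NaN, False otherwise."""
    if val is None:
        return True
    # The isinstance guard keeps np.isnan away from strings and other objects.
    return isinstance(val, float) and bool(np.isnan(val))


def to_json_safe(values: List[Any]) -> List[Any]:
    """Replace missing values with None, leaving everything else untouched."""
    return [None if is_missing(v) else v for v in values]


def unguarded_fails(val: Any) -> bool:
    """Return True if a bare np.isnan call raises TypeError for this value."""
    try:
        np.isnan(val)
    except TypeError:
        return True
    return False


print(unguarded_fails(None), unguarded_fails("text"))
print(to_json_safe([np.nan, 1.0, None, "text"]))
```

np.float64 subclasses Python's float, so the isinstance(val, float) guard also catches NumPy float64 NaNs; np.float32 does not, so a float32 NaN would pass through unchanged unless the check is widened (for example to np.floating).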

@sakoush
Member

sakoush commented Oct 23, 2024

@bwallima many thanks for the comment; we welcome contributions to mlserver. Feel free to raise a PR with this change.
