PandasCodec.encode_request can not handle missing values #1804

Closed
sp1thas opened this issue Jun 13, 2024 · 2 comments

Comments

@sp1thas

sp1thas commented Jun 13, 2024

While trying to handle missing values, I noticed that the JSON serialization of the encoded request is incorrect:

```python
import pandas as pd
from mlserver.codecs.pandas import PandasCodec

# pandas stores the None as NaN in this float64 column
df = pd.DataFrame({'foo': [None, 1.0]})
print(PandasCodec.encode_request(df).json())
```

Serialized request (note the bare `NaN`, which is not a valid JSON token):

```json
{
    "parameters": {
        "content_type": "pd"
    },
    "inputs": [
        {
            "name": "foo",
            "shape": [
                2,
                1
            ],
            "datatype": "FP64",
            "data": [
                NaN,
                1.0
            ]
        }
    ]
}
```
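For context, Python's standard `json` module behaves the same way by default: it writes a float NaN as the bare token `NaN`, which RFC 8259 does not permit, and passing `allow_nan=False` turns that into an error instead. This is a minimal stdlib-only sketch; it does not call MLServer, so whether MLServer's serializer goes through `json.dumps` is an assumption not verified here.

```python
import json
import math

data = [math.nan, 1.0]

# Default behaviour: the non-standard token NaN is written out verbatim.
lenient = json.dumps(data)

# allow_nan=False enforces strict JSON and raises ValueError instead.
try:
    json.dumps(data, allow_nan=False)
    strict_ok = True
except ValueError:
    strict_ok = False

print(lenient)    # [NaN, 1.0]
print(strict_ok)  # False
```

Strict parsers (for example JavaScript's `JSON.parse`) reject `NaN`, which is why replacing it with `null` is the usual workaround.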

If anyone else is facing the same issue, here is my quick-and-dirty workaround:

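```python
import pandas as pd
from mlserver.types import InferenceRequest

def replace_nan_with_none(inference_request: InferenceRequest) -> InferenceRequest:
    # Replace NaN entries with None so they serialize as JSON null.
    # Only top-level elements of each input's data are checked, and
    # `__root__` assumes a pydantic v1-based MLServer version.
    for _input in inference_request.inputs:
        data = _input.data.__root__
        for i, v in enumerate(data):
            if pd.isna(v):
                data[i] = None  # mutates the request in place
    return inference_request
```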
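The workaround above only inspects the top-level elements of each input's data. A stdlib-only sketch of the same idea that also walks nested lists (as a multi-dimensional tensor might be laid out) could look like this; `nan_to_none` is a hypothetical helper, not part of MLServer, and it checks `isinstance(value, float)` before `math.isnan` so that strings and `None` pass through untouched.

```python
import json
import math
from typing import Any, List


def nan_to_none(data: List[Any]) -> List[Any]:
    """Return a copy of `data` with every float NaN replaced by None.

    Nested lists are handled recursively; strings, None, ints and other
    non-float values are passed through unchanged.
    """
    result: List[Any] = []
    for value in data:
        if isinstance(value, list):
            result.append(nan_to_none(value))
        elif isinstance(value, float) and math.isnan(value):
            result.append(None)
        else:
            result.append(value)
    return result


# Example: a 2x1 tensor laid out as nested rows.
rows = [[float("nan")], [1.0]]
print(json.dumps(nan_to_none(rows)))  # [[null], [1.0]]
```

Returning a copy rather than mutating in place is a design choice here; the workaround above mutates the request directly.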

If this is a simple fix that a newcomer like me could handle, I would be interested in working on it.

@sp1thas
Author

sp1thas commented Jun 13, 2024

Duplicate of #1747

sp1thas closed this as completed Jun 13, 2024
@bwallima

The change only fixes the problem with NaN values. As soon as you have text data containing None values, it breaks: the problem occurs whenever non-numeric values are passed, since `np.isnan` raises a `TypeError` for `None` or string input. For instance, I have a DataFrame that contains text values, some of which can be None. A fix would be to check for float values first in mlserver/codecs/numpy.py, line 109: `if isinstance(val, float) and np.isnan(val)`
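To illustrate the failure mode: `np.isnan` only supports numeric input, so calling it on `None` or a string raises `TypeError`, while the proposed `isinstance(val, float)` guard lets `and` short-circuit before `np.isnan` is reached. This is a standalone sketch; it does not reproduce the actual code at line 109 of `mlserver/codecs/numpy.py`, and `naive_is_nan` / `is_missing_float` are hypothetical names.

```python
from typing import Any, List

import numpy as np


def naive_is_nan(val: Any) -> bool:
    # Unguarded check: np.isnan only supports numeric input.
    return bool(np.isnan(val))


def is_missing_float(val: Any) -> bool:
    # Guarded check: only floats can be NaN, so test the type first and
    # let `and` short-circuit before np.isnan ever sees None or a string.
    return isinstance(val, float) and bool(np.isnan(val))


# A text-like column with missing entries.
column: List[Any] = ["hello", None, float("nan"), 2.5]

cleaned = [None if is_missing_float(v) else v for v in column]
print(cleaned)  # ['hello', None, None, 2.5]

try:
    [naive_is_nan(v) for v in column]
    naive_ok = True
except TypeError:
    naive_ok = False
print(naive_ok)  # False
```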
