Representation of categorical features #238

Open
eonu opened this issue Nov 3, 2023 · 0 comments

Question

According to #141, categorical features should not be one-hot encoded before being passed to synthcity plugins.

When I pass a pandas.DataFrame to .fit of the Fourier Flows plugin, I get an error saying that np.isnan cannot be applied to categorical data.

I am using the pandas.Categorical data type for category columns.

Should I provide integer codes for the categories instead? If so, how does synthcity differentiate these from a genuinely integer-valued column?
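
For reference, this is roughly what I mean by providing integer codes. It is a minimal sketch using only pandas; the column names and data are made up for illustration, and I don't know whether synthcity actually expects this representation. That is the open question.

```python
import pandas as pd

# Hypothetical toy data: one categorical column and one genuine integer column.
df = pd.DataFrame({
    "colour": pd.Categorical(["red", "green", "red", "blue"]),
    "count": [3, 1, 4, 1],
})

# Remember the category labels so the codes can be mapped back later.
cat_cols = df.select_dtypes(include="category").columns
categories = {col: df[col].cat.categories for col in cat_cols}

# Replace each categorical column with its integer codes.
# Note: missing values would be encoded as -1 by pandas.
encoded = df.copy()
for col in cat_cols:
    encoded[col] = df[col].cat.codes

# Map the codes back to the original labels.
decoded = encoded.copy()
for col in cat_cols:
    decoded[col] = pd.Categorical.from_codes(
        encoded[col].to_numpy(), categories=categories[col]
    )
```

After encoding, "colour" and "count" are both plain integer columns, so nothing in the DataFrame itself marks which one is categorical.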

Further Information

  File "/Users/eonu/dev/synthcity/src/synthcity/plugins/core/dataloader.py", line 1225, in pad_and_mask
    temporal_data, observation_times = TimeSeriesDataLoader.mask_temporal_data(
                                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "pydantic/decorator.py", line 40, in pydantic.decorator.validate_arguments.validate.wrapper_function
    from contextlib import _GeneratorContextManager
  File "pydantic/decorator.py", line 134, in pydantic.decorator.ValidatedFunction.call
    
  File "pydantic/decorator.py", line 206, in pydantic.decorator.ValidatedFunction.execute
    
  File "/Users/eonu/dev/synthcity/src/synthcity/plugins/core/dataloader.py", line 1132, in mask_temporal_data
    nan_cnt += np.asarray(np.isnan(item)).sum()
                          ^^^^^^^^^^^^^^
  File "/Users/eonu/env/synth/lib/python3.11/site-packages/pandas/core/generic.py", line 2016, in __array_ufunc__
    return arraylike.array_ufunc(self, ufunc, method, *inputs, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/eonu/env/synth/lib/python3.11/site-packages/pandas/core/arraylike.py", line 404, in array_ufunc
    result = mgr.apply(getattr(ufunc, method))
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/eonu/env/synth/lib/python3.11/site-packages/pandas/core/internals/managers.py", line 350, in apply
    applied = b.apply(f, **kwargs)
              ^^^^^^^^^^^^^^^^^^^^
  File "/Users/eonu/env/synth/lib/python3.11/site-packages/pandas/core/internals/blocks.py", line 329, in apply
    result = func(self.values, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/eonu/env/synth/lib/python3.11/site-packages/pandas/core/arrays/categorical.py", line 1374, in __array_ufunc__
    raise TypeError(
TypeError: Object with dtype category cannot perform the numpy op isnan
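
The same TypeError appears to be reproducible without synthcity. The snippet below is a minimal sketch with made-up data, using only numpy and pandas: np.isnan fails on a DataFrame that contains a category column, while pd.isna counts missing values for any dtype. I am not suggesting this is how mask_temporal_data should be changed, only showing where the failure comes from.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame: a float column and a categorical column,
# each with one missing value.
df = pd.DataFrame({
    "value": [0.1, np.nan, 0.3],
    "label": pd.Categorical(["a", None, "b"]),
})

# np.isnan is a NumPy ufunc, which pandas refuses to apply to category data.
try:
    np.isnan(df)
    isnan_error = None
except TypeError as exc:
    isnan_error = str(exc)

# pd.isna is dtype-agnostic and handles categorical columns as well.
nan_cnt = int(pd.isna(df).to_numpy().sum())

print(isnan_error)
print(nan_cnt)
```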

System Information

  • OS: macOS (M1)
  • OS Version: Ventura 13.2.1
  • Language Version: 3.11.3
  • Package Manager Version: pip 23.3.1