serialization-deserialization bug #143

@patrickleonardy

Bug Report

After serializing and de-serializing a PreProcessor fitted with only continuous variables (still to be checked: whether the same happens when categorical variables are present):

  1. the preprocessor object cannot be printed -> AttributeError
  2. when trying to transform data, the KBinsDiscretizer throws -> NotFittedError

Description

For the first point: the problem seems to be a mismatch between the attribute names and the parameter names in the function definition. self._get_param_names() returns "categorical_data_processor", but getattr() only knows "_categorical_data_processor".
Renaming the attributes resolves the problem. Is there no other way?

For the second point: there is a problem when creating the pipeline_dictionary. Some keywords appear to be empty even though they should have a value.
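
To narrow down which keywords are empty, a small helper along these lines might help. It is a sketch that assumes the serialized pipeline is a nested dict; `find_empty_entries` and the keys in `example` are hypothetical and not part of cobra.

```python
def find_empty_entries(obj, path=""):
    """Return dotted paths of dict entries that are None or an empty str/dict/list/tuple."""
    empty = []
    if isinstance(obj, dict):
        for key, value in obj.items():
            child = f"{path}.{key}" if path else str(key)
            is_empty_container = (
                isinstance(value, (str, dict, list, tuple)) and len(value) == 0
            )
            if value is None or is_empty_container:
                empty.append(child)
            else:
                # Recurse into nested dicts; non-dict values yield nothing.
                empty.extend(find_empty_entries(value, child))
    return empty


# Hypothetical structure, only to show the output format.
example = {
    "step_a": {"fitted_value": {}, "n": 3},
    "step_b": {"option": None},
}
print(find_empty_entries(example))
# -> ['step_a.fitted_value', 'step_b.option']
```

Calling the helper on the dict returned by `preprocessor.serialize_pipeline()` would list the entries worth inspecting, provided that dict has this nested shape.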

Steps to Reproduce

  1. Load a dataset:
    from sklearn.datasets import load_iris
    import pandas as pd
    X, y = load_iris(return_X_y=True, as_frame=True)
    df = pd.concat([X,y])
    df = df.rename({0:"target"}, axis=1)
  2. Create preprocessor and fit it
    from cobra.preprocessing import PreProcessor
    preprocessor = PreProcessor.from_params()
    continuous_vars = ['sepal length (cm)', 'sepal width (cm)', 'petal length (cm)', 'petal width (cm)']
    discrete_vars = []
    preprocessor.fit(df, continuous_vars=continuous_vars, discrete_vars=discrete_vars, target_column_name="target")
  3. Serialize the preprocessor
    pipeline_serialized = preprocessor.serialize_pipeline()
  4. De-serialize
    new_preprocessor = PreProcessor.from_pipeline(pipeline_serialized)
  5. See what happens when printing
    new_preprocessor
  6. See what happens when transforming
    new_preprocessor.transform(df, continuous_vars=continuous_vars, discrete_vars=[])

Actual Results

I got ...

[Screenshot: MicrosoftTeams-image]
[Screenshot: MicrosoftTeams-image (1)]

Labels

bug, good first issue
