Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Simplify BaseEncoder mapping #403

Open
julia-kraus opened this issue Apr 2, 2023 · 1 comment
Open

Simplify BaseEncoder mapping #403

julia-kraus opened this issue Apr 2, 2023 · 1 comment

Comments

@julia-kraus
Copy link

julia-kraus commented Apr 2, 2023

Current State
The class variable mapping of the BaseEncoder class is a dictionary with the following structure
a dictionary with the following item:

  • 'col' -> column name (str)
  • 'mapping' -> pd.Series containing the mapping from category to encoding

Suggestion
Avoid pandas data frames and just use dictionaries instead. These are faster, easier to read and easier to manipulate. Mappings could be chained more easily. Moreover, many of the encoders, they are converted to dict anyways.
Use a nested dictionary of the structure {colname: {'cat_a': mapping_val_a, 'cat_b': mapping_val_b, 'cat_c': mapping_val_c ,...)

Actually the whole encoder could just yield the mapping dictionary, because the column names can be retrieved by mapping.keys().

This would really be helpful for better development and readability, but would be work to update and test all subclasses. I think it is unnecessary that users can supply both dictionaries and pd.Series as mappings because they can run pd.Series.to_dict() themselves.

@julia-kraus julia-kraus changed the title Simplify BaseEncoder Output Simplify BaseEncoder mapping Apr 3, 2023
@PaulWestenthanner
Copy link
Collaborator

Hi Julia,
thanks for your suggestion. I think you're right and we should do it. This would be a major simplification but also a major change in the API. Maybe we can put it in the backlog for version 3.0 together with more simplifications and restructuring that we've been discussing lately

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants