Current State
The class variable mapping of the BaseEncoder class is a dictionary with the following items:
'col' -> column name (str)
'mapping' -> pd.Series containing the mapping from category to encoding
Suggestion
Avoid pandas data frames and just use dictionaries instead. These are faster, easier to read and easier to manipulate. Mappings could be chained more easily. Moreover, in many of the encoders they are converted to dict anyway.
Use a nested dictionary of the structure {colname: {'cat_a': mapping_val_a, 'cat_b': mapping_val_b, 'cat_c': mapping_val_c, ...}}
Actually the whole encoder could just yield the mapping dictionary, because the column names can be retrieved by mapping.keys().
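As a rough sketch of what the proposed structure could look like in practice (the column names, categories and values are made up for illustration, and apply_mapping is a hypothetical helper, not an existing function of the library):

```python
# Hypothetical nested mapping: {column name: {category: encoded value}}
mapping = {
    "color": {"red": 1, "green": 2, "blue": 3},
    "size": {"S": 1, "M": 2, "L": 3},
}

# Column names come straight from the keys of the outer dictionary.
columns = list(mapping.keys())


def apply_mapping(rows, mapping, default=-1):
    """Encode a list of row dicts.

    Unseen categories get `default`; this fallback is an assumption
    made for the sketch, not documented library behaviour.
    """
    return [
        {
            col: mapping[col].get(value, default) if col in mapping else value
            for col, value in row.items()
        }
        for row in rows
    ]


rows = [{"color": "red", "size": "M"}, {"color": "pink", "size": "L"}]
encoded = apply_mapping(rows, mapping)

# Chaining two plain-dict mappings is a one-line comprehension.
first = {"red": "warm", "blue": "cold"}
second = {"warm": 1, "cold": 0}
chained = {category: second[intermediate] for category, intermediate in first.items()}
```

With plain dictionaries, looking up, chaining and inspecting a mapping needs nothing beyond built-in operations.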
This would really help development and readability, but it would take some work to update and test all subclasses. I think it is unnecessary that users can supply both dictionaries and pd.Series as mappings because they can run pd.Series.to_dict() themselves.
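For illustration, a minimal sketch of that conversion, turning one entry in the current format (as described under Current State) into the proposed nested dictionary. The column name and values are invented for the example:

```python
import pandas as pd

# One entry in the current format: a column name plus a pd.Series
# that maps each category to its encoding (values are illustrative).
current = {
    "col": "color",
    "mapping": pd.Series({"red": 1, "green": 2, "blue": 3}),
}

# Proposed format: {column name: {category: encoded value}}
nested = {current["col"]: current["mapping"].to_dict()}
```

Here nested holds a single key, "color", whose value is an ordinary dictionary from category to encoding.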
julia-kraus changed the title from "Simplify BaseEncoder Output" to "Simplify BaseEncoder mapping" on Apr 3, 2023
Hi Julia,
thanks for your suggestion. I think you're right and we should do it. This would be a major simplification but also a major change in the API. Maybe we can put it in the backlog for version 3.0, together with the other simplifications and restructuring that we've been discussing lately.