-
Notifications
You must be signed in to change notification settings - Fork 109
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
FEAT Add make_series and make_dataframe #798
FEAT Add make_series and make_dataframe #798
Conversation
skrub/dataframe/_pandas.py
Outdated
X : Pandas dataframe | ||
Converted output. | ||
""" | ||
if not isinstance(X, dict) or not all( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what kind of mistake do you foresee to make this check necessary?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It may be slightly overkill, but I think about dictionaries of 2d arrays or dataframes.
For instance:
pd.DataFrame(
dict(
a=[[1, 2, 1]],
b=[1, 2, 3],
)
)
will raise ValueError: All arrays must be of the same length
, but this error is not informative enough IMO.
I suspect this kind of error might happen more than we'd expect. WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
but this is a private function, right? that will be used only by skrub code not directly by users. so if skrub passes it incorrect inputs there isn't too much the user can do about it except report the problem
btw WDYT about renaming dataframe
to _dataframe
? we want skrub estimators to support both pandas and polars, but I'm not sure we want to publicly expose the machinery that makes it possible
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok for the renaming, I'll remove the check
I adjusted the API to make the |
LGTM, I think once you remove the merge conflict we can merge it |
I think you also need to update |
and the docstring of |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks!
Addresses https://github.com/skrub-data/skrub/pull/784/files#r1358253388
This simple PR implements, for both Pandas and Polars:
When merged, this can be used directly in #784 for example.