-
Notifications
You must be signed in to change notification settings - Fork 13
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Convert automatically to arrow strings #86
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @phofl
arrow_options_meta = arrow_options.copy() | ||
if pyarrow_strings_enabled(): | ||
types_mapper = _get_types_mapper(arrow_options.get("types_mapper", {}.get)) | ||
if types_mapper is not None: | ||
arrow_options_meta["types_mapper"] = types_mapper |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have this twice, once here and once in bigquery_read
. Thoughts on keeping this here, passing arrow_options
(with the correct types_mapper
) through to dd.from_map
below? That way we could drop the convert_string=
parameter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought about this too.
I'd like to avoid serialising that stuff in the graph, just passing a flag seems a lot easier.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I see. Are the types_mapper
entries we're adding particularly large?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It adds a few callables, which doesn't seem like a good idea (didn't do any profiling)
dask_bigquery/tests/test_core.py
Outdated
@@ -387,6 +388,21 @@ def test_arrow_options(table): | |||
assert ddf.dtypes["name"] == pd.StringDtype(storage="pyarrow") | |||
|
|||
|
|||
@pytest.mark.parametrize("convert_string", [True, False]) | |||
def test_convert_string(table, convert_string): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall this looks nice. It'd be good to include an assert_eq
check too to make sure the values are as expected, not just the name
column dtype.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have this in a few other places already and I can't run tests locally, so don't want to mess around with things too much
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No problem. Just pushed a small commit to update this (and one other) test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thx
Co-authored-by: James Bourbeau <[email protected]>
closes #85