
Convert automatically to arrow strings #86

Merged: 7 commits from arrow_strings into main, Jul 25, 2024
Conversation

@phofl (Contributor) commented Jul 24, 2024:

closes #85

@jrbourbeau (Member) left a comment:

Thanks @phofl

dask_bigquery/core.py (outdated review thread, resolved)

Comment on lines +241 to +245:

    arrow_options_meta = arrow_options.copy()
    if pyarrow_strings_enabled():
        types_mapper = _get_types_mapper(arrow_options.get("types_mapper", {}.get))
        if types_mapper is not None:
            arrow_options_meta["types_mapper"] = types_mapper
@jrbourbeau (Member):
We have this twice, once here and once in bigquery_read. Thoughts on keeping this here, passing arrow_options (with the correct types_mapper) through to dd.from_map below? That way we could drop the convert_string= parameter.

@phofl (Contributor, author):

I thought about this too.

I'd like to avoid serialising that stuff in the graph; just passing a flag seems a lot easier.
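The trade-off being weighed: a boolean flag serializes to a few bytes with stdlib pickle, whereas callables are pickled by reference at best and often fail outright (lambdas, for instance), which is why shipping them through the task graph is unattractive. A small stdlib-only illustration:

```python
import pickle

# A boolean flag costs only a few bytes on the wire.
flag = pickle.dumps(True)

# A lambda cannot be pickled by the stdlib at all; dask works around this
# with cloudpickle, but embedded callables still bloat and complicate
# the serialized graph compared with a plain flag.
try:
    pickle.dumps(lambda t: None)
    lambda_pickled = True
except Exception:
    lambda_pickled = False

print(len(flag), lambda_pickled)
```

This is only meant to make the size/fragility argument concrete, not to describe dask's actual serialization path.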

@jrbourbeau (Member):
Ah, I see. Are the types_mapper entries we're adding particularly large?

@phofl (Contributor, author):
It adds a few callables to the graph, which doesn't seem like a good idea (I didn't do any profiling, though).

@@ -387,6 +388,21 @@ def test_arrow_options(table):
    assert ddf.dtypes["name"] == pd.StringDtype(storage="pyarrow")


@pytest.mark.parametrize("convert_string", [True, False])
def test_convert_string(table, convert_string):
@jrbourbeau (Member):

Overall this looks nice. It'd be good to include an assert_eq check too to make sure the values are as expected, not just the name column dtype.
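The point of such a check is that a dtype assertion alone would not catch mangled values. In plain pandas the analogous tool is `assert_frame_equal`, which compares values and dtypes together (dask's `assert_eq` plays the same role for Dask collections); a minimal sketch with made-up data:

```python
import pandas as pd
from pandas.testing import assert_frame_equal

result = pd.DataFrame({"name": pd.array(["a", "b"], dtype="string")})
expected = pd.DataFrame({"name": ["a", "b"]}).astype({"name": "string"})

# Passes only if both the values and the dtypes match, so a silent
# value-mangling or object -> string regression would be caught here.
assert_frame_equal(result, expected)
print("ok")
```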

@phofl (Contributor, author):

We have this in a few other places already, and I can't run the tests locally, so I don't want to mess around with things too much.

@jrbourbeau (Member):

No problem. Just pushed a small commit to update this (and one other) test.

@phofl (Contributor, author):

Thx

phofl and others added 2 commits on July 25, 2024 at 17:06.

@jrbourbeau merged commit 1215a51 into main on Jul 25, 2024, with 13 checks passed, and deleted the arrow_strings branch on July 25, 2024 at 17:56.
Development

Successfully merging this pull request may close these issues:
read_gbq casts string columns to objects (#85)

2 participants