Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

_finalize_array must call ak.enforce_type for concatenating categoricals #432

Open
douglasdavis opened this issue Dec 5, 2023 · 3 comments

Comments

@douglasdavis
Copy link
Collaborator

Test failure:

FAILED tests/test_str.py::test_to_categorical - NotImplementedError: merging categorical arrays is currently not implemented. Use `ak.enforce_type` to drop the categorical type and use general merging.

When .compute() is called on a categorical array collection there is a call to ak.concatenate which happens in dask_awkward.lib.core._finalize_array, we need to add a check so that we can call enforce_type when necessary

@douglasdavis
Copy link
Collaborator Author

douglasdavis commented Dec 7, 2023

Or perhaps we require a dak.enforce_type call at the end of a task graph that converges to some array of categoricals. cc @agoose77 for your take!

@agoose77
Copy link
Collaborator

agoose77 commented Dec 8, 2023

When merging categoricals, we want to preserve the uniqueness of the categorical content. If all contents are identical, and therefore only the indices vary, we could define a trivial merge rule. If this is not the case, then we'd need to re-categorise the result.

Right now, Awkward does neither; 2.5.0 just ensures that we properly complain to the user. Users can drop the categorical-ness at the point of computation to ensure that we drop the categorical-ness, and then re-categorize with ak.str.to_categorical after computation.

@jpivarski and I discussed adding a built-in operation to support categorical merging on strings, which is the most likely use-case, over in scikit-hep/awkward#2853. For now, the error is probably OK in my view, but I defer to other people's judgement!

@jpivarski
Copy link
Collaborator

It's a reminder to implement it when it's needed, which may be soon.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants