Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

wip: fix axis=0 concatenation #457

Open
wants to merge 1 commit into
base: main
Choose a base branch
from
Open

Conversation

agoose77
Copy link
Collaborator

@agoose77 agoose77 commented Jan 25, 2024

Fixes #456

It's clear from #456 that the complexity around axis=0 concatenation makes it hard to reason about!

The bug is caused by the loss of report objects associated with secondary arrays in the concatenate call; right now we take the first array (because all arrays are coerced to the same form) as the "meta".

To fix this, we need to build a "parent" meta such that parent.touch_data("key-1") invokes child1.touch_data("key-1-child"), child2.touch_data("key-2-child"), and so on.

I'll try and clean this up, probably with an Awkward fix, in the near future.

@codecov-commenter
Copy link

Codecov Report

Attention: 17 lines in your changes are missing coverage. Please review.

Comparison is base (d3b6208) 93.10% compared to head (c62796d) 92.72%.

Files Patch % Lines
src/dask_awkward/lib/operations.py 73.01% 17 Missing ⚠️

❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #457      +/-   ##
==========================================
- Coverage   93.10%   92.72%   -0.38%     
==========================================
  Files          23       23              
  Lines        3293     3355      +62     
==========================================
+ Hits         3066     3111      +45     
- Misses        227      244      +17     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@jpivarski
Copy link
Collaborator

Is this affected, or even made easier, by scikit-hep/awkward#3043 (and subsequent changes to dask-awkward to defensively copy reports instead of changing them in place throughout a DAG)?

Once the reports become effectively immutable from dask-awkward's point of view—that is, each DAG node is associated with an unchanging report—then we won't need to have a hierarchical structure, nesting reports, since we could make the output of dak.concatenate propagate all the reports from its inputs.

I came to this from #456, since @gordonwatts hit the same issue again and I was checking up on it.

@agoose77
Copy link
Collaborator Author

I think this will become slightly simpler, but much of the logic will remain. The feature required by result = ak.concatenate((part1, part2, ..., partN)) is that touching result.pt.data also touches part1.pt.data, part2.pt.data, ..., partN.pt.data. I think we'll need some kind of hierarchy. We could encode that as a hierarchical report or with a parentage attribute in the report.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ak.concatenate for axis=0 can fail for two files opened using uproot.dask
3 participants