Implement inner/outer/left/right joins #262

lgray · 2023-05-13T13:37:27Z

HEP analyses will need to join small sets of privately produced columns with a larger central dataset.
This is not yet possible in dask_awkward but is reasonably well trodden in dask-array and dask-dataframe.

The goal would be to take two datasets with a shared key definition (like run number, luminosity block, and event number) and merge them into a single unified dataset that a dask_awkward analysis can handle cleanly. That means accounting for missing keys, file skew, etc., i.e. the usual general distributed table join operations à la SQL.
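To make the requested semantics concrete, here is a minimal in-memory sketch using pandas rather than dask_awkward (which does not have this feature yet). The column names (`run`, `luminosityBlock`, `event`, `met`, `my_score`) and the values are made up for illustration.

```python
import pandas as pd

# Shared event key; these column names are illustrative assumptions.
KEYS = ["run", "luminosityBlock", "event"]

# Stand-in for the larger central dataset.
central = pd.DataFrame(
    {
        "run": [1, 1, 1, 2],
        "luminosityBlock": [10, 10, 11, 5],
        "event": [100, 101, 102, 200],
        "met": [35.2, 12.7, 48.1, 20.3],
    }
)

# Stand-in for the small private columns: different order,
# two central events missing, one event the central set does not have.
private = pd.DataFrame(
    {
        "run": [2, 1, 3],
        "luminosityBlock": [5, 10, 1],
        "event": [200, 100, 999],
        "my_score": [0.9, 0.1, 0.5],
    }
)

# Left join: keep every central event, fill missing private values with NaN.
left = central.merge(private, on=KEYS, how="left")

# Inner join: keep only events present in both datasets.
inner = central.merge(private, on=KEYS, how="inner")

# Outer join: keep every event from either dataset.
outer = central.merge(private, on=KEYS, how="outer")

print(left)
```

The left join is probably the common case for attaching private columns to central data, since it preserves the central event list and marks the gaps.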

Related to / likely depends on #250

lgray · 2023-05-13T13:37:45Z

@iasonkrom

martindurant · 2023-05-15T16:24:29Z

Joins on a single partition of a concrete in-memory array are trivial to implement.

> is reasonably well trodden in dask-array and dask-dataframe.

Only the latter? A dataframe join depends on a well-understood concept of an index, which doesn't really apply to our arrays. Is such a join always "row-wise", meaning by top-level array item rather than at some deeper nesting level in the schema?

lgray · 2023-05-15T19:46:12Z

It applies to records, at least in HEP.
Indeed, it's row-wise: we would specify some key (like run number, luminosity block, event number) and join two record arrays together on it.

I think if we could get the row-wise implementation first, the more arbitrary keys could come later; the row-wise case should be by far the most common and useful. The distributed join would be the most important thing to address, since the in-memory part is easy, as you say.
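For reference, one common way to do the distributed part is a hash-partitioned join: shuffle both inputs so that rows with equal keys land in the same bucket, then run the easy in-memory join bucket by bucket. The sketch below is plain Python over lists of record dicts, not dask_awkward code; the function names and the `lumi`, `met`, and `score` fields are hypothetical.

```python
from collections import defaultdict


def shuffle(rows, key_fields, n_buckets):
    """Route each row to a bucket chosen by hashing its key tuple."""
    buckets = defaultdict(list)
    for row in rows:
        key = tuple(row[f] for f in key_fields)
        buckets[hash(key) % n_buckets].append(row)
    return buckets


def join_bucket(left_rows, right_rows, key_fields):
    """In-memory inner join of two buckets that share the same hash."""
    lookup = defaultdict(list)
    for r in right_rows:
        lookup[tuple(r[f] for f in key_fields)].append(r)
    out = []
    for l in left_rows:
        for r in lookup.get(tuple(l[f] for f in key_fields), []):
            out.append({**l, **r})
    return out


def hash_join(left_parts, right_parts, key_fields, n_buckets=4):
    """Inner join of two partitioned datasets via a hash shuffle."""
    left_buckets = defaultdict(list)
    right_buckets = defaultdict(list)
    # Shuffle step: every input partition contributes to every bucket.
    for part in left_parts:
        for b, rows in shuffle(part, key_fields, n_buckets).items():
            left_buckets[b].extend(rows)
    for part in right_parts:
        for b, rows in shuffle(part, key_fields, n_buckets).items():
            right_buckets[b].extend(rows)
    # Join step: each bucket is independent and could run as its own task.
    joined = []
    for b in range(n_buckets):
        joined.extend(join_bucket(left_buckets[b], right_buckets[b], key_fields))
    return joined


KEYS = ("run", "lumi", "event")

# Matching events deliberately sit in different partitions on each side.
left_parts = [
    [
        {"run": 1, "lumi": 10, "event": 100, "met": 35.2},
        {"run": 1, "lumi": 10, "event": 101, "met": 12.7},
    ],
    [{"run": 2, "lumi": 5, "event": 200, "met": 20.3}],
]
right_parts = [
    [{"run": 2, "lumi": 5, "event": 200, "score": 0.9}],
    [
        {"run": 1, "lumi": 10, "event": 100, "score": 0.1},
        {"run": 3, "lumi": 1, "event": 999, "score": 0.5},
    ],
]

result = sorted(
    hash_join(left_parts, right_parts, KEYS),
    key=lambda r: (r["run"], r["lumi"], r["event"]),
)
print(result)
```

The shuffle is the expensive step, because every input partition may send rows to every bucket. When the private dataset is small, broadcasting it whole to each partition of the central dataset would avoid the shuffle entirely.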
