Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Column optimization breaks in case of computing function of records in addition to exploded records? #426

Closed
lgray opened this issue Dec 1, 2023 · 9 comments · Fixed by #430

Comments

@lgray
Copy link
Collaborator

lgray commented Dec 1, 2023

import dask
from dask_awkward.lib.testutils import assert_eq

from coffea.nanoevents import NanoEventsFactory

with dask.config.set({"awkward.optimization.enabled": True}):
    eagerevents = NanoEventsFactory.from_root(
        {"tests/samples/nano_dy.root": "Events"},
        delayed=False,
    ).events()

    daskevents = NanoEventsFactory.from_root(
        {"tests/samples/nano_dy.root": "Events"},
        delayed=True,
    ).events()

    mval_eager, (a_eager, b_eager) = eagerevents.Muon.metric_table(
        eagerevents.Jet, return_combinations=True
    )
    mval_dask, (a_dask, b_dask) = daskevents.Muon.metric_table(
        daskevents.Jet, return_combinations=True
    )

    mval_dask, (a_dask, b_dask) = dask.compute(mval_dask, (a_dask, b_dask))

    assert_eq(mval_eager, mval_dask)
    assert_eq(a_eager, a_dask)
    assert_eq(b_eager[field], b_dask[field])

results in:

/Users/lgray/coffea-dev/coffea/src/coffea/nanoevents/schemas/nanoaod.py:243: RuntimeWarning: Missing cross-reference index for FatJet_genJetAK8Idx => GenJetAK8
  warnings.warn(
/Users/lgray/coffea-dev/coffea/src/coffea/nanoevents/schemas/nanoaod.py:243: RuntimeWarning: Missing cross-reference index for FatJet_genJetAK8Idx => GenJetAK8
  warnings.warn(
Traceback (most recent call last):
  File "iason.py", line 26, in <module>
    assert_eq(a_eager, a_dask)
  File "/Users/lgray/coffea-dev/dask-awkward/src/dask_awkward/lib/testutils.py", line 37, in assert_eq
    assert_eq_arrays(
  File "/Users/lgray/coffea-dev/dask-awkward/src/dask_awkward/lib/testutils.py", line 67, in assert_eq_arrays
    b_tt = typetracer_array(b)
  File "/Users/lgray/coffea-dev/dask-awkward/src/dask_awkward/lib/core.py", line 2390, in typetracer_array
    a.layout.to_typetracer(forget_length=True),
  File "/Users/lgray/miniforge3/envs/coffea-dev/lib/python3.8/site-packages/awkward/contents/content.py", line 275, in to_typetracer
    return self._to_typetracer(forget_length)
  File "/Users/lgray/miniforge3/envs/coffea-dev/lib/python3.8/site-packages/awkward/contents/listoffsetarray.py", line 230, in _to_typetracer
    self._content._to_typetracer(forget_length),
  File "/Users/lgray/miniforge3/envs/coffea-dev/lib/python3.8/site-packages/awkward/contents/listoffsetarray.py", line 230, in _to_typetracer
    self._content._to_typetracer(forget_length),
  File "/Users/lgray/miniforge3/envs/coffea-dev/lib/python3.8/site-packages/awkward/contents/indexedarray.py", line 243, in _to_typetracer
    self._content._to_typetracer(forget_length),
  File "/Users/lgray/miniforge3/envs/coffea-dev/lib/python3.8/site-packages/awkward/contents/recordarray.py", line 370, in _to_typetracer
    contents = [x._to_typetracer(forget_length) for x in self._contents]
  File "/Users/lgray/miniforge3/envs/coffea-dev/lib/python3.8/site-packages/awkward/contents/recordarray.py", line 370, in <listcomp>
    contents = [x._to_typetracer(forget_length) for x in self._contents]
  File "/Users/lgray/miniforge3/envs/coffea-dev/lib/python3.8/site-packages/awkward/contents/numpyarray.py", line 225, in _to_typetracer
    data = self._raw(backend.nplike)
  File "/Users/lgray/miniforge3/envs/coffea-dev/lib/python3.8/site-packages/awkward/contents/numpyarray.py", line 199, in _raw
    return to_nplike(self.data, nplike, from_nplike=self._backend.nplike)
  File "/Users/lgray/miniforge3/envs/coffea-dev/lib/python3.8/site-packages/awkward/_nplikes/__init__.py", line 41, in to_nplike
    return nplike.asarray(array)
  File "/Users/lgray/miniforge3/envs/coffea-dev/lib/python3.8/site-packages/awkward/_nplikes/typetracer.py", line 658, in asarray
    assert not isinstance(obj, PlaceholderArray)
AssertionError

And we shouldn't be running into PlaceholderArrays after column optimization!
If you flip column optimization to False there are no more placeholder arrays, but an assertion is throw because of an error I need to track down in coffea (but there's no exception from within awkward array).

You'll need to use coffea@master to have this produce the appropriate error.

@douglasdavis
Copy link
Collaborator

I've been messing around with this issue again.

@agoose77 is there a quick way to determine where the PlaceholderArray's exist in a highlevel ak.Array object? I've been revisiting this issue and I'm quite confused as to what's going on.

The exact code block @lgray posted above does indeed raise the error, but if I change it up a bit and do something like this:

...

mval_dask, (a_dask, b_dask) = dask.compute(mval_dask, (a_dask, b_dask))
print(a_dask)  # <-- error
print(b_dask) # <-- error
print((a_dask, b_dask)) #<-- NO error!

Something strange going on with __str__/__repr__ (and related to PlaceholderArray becase printing causes slicing which PlaceholderArray doesn't support)

@lgray the compute appears to be running successfully, but are you running into a case where some fields you expect to exist in the resulting array are instead populated with a PlaceholderArray (i.e. over-optimizing)?

@agoose77
Copy link
Collaborator

@douglasdavis are we on the same page that the cause of this is the issue discussed in #430? I'm just away from my PC at the moment, but I'll touch base on this when I'm back.

@lgray
Copy link
Collaborator Author

lgray commented Jan 18, 2024

I'll give it a try!

@lgray
Copy link
Collaborator Author

lgray commented Jan 18, 2024

Yeah it's a specific array (usually the first one in the record?) that is a PlaceholderArray and it is only that field, the other are properly backed by data. The failure is triggered when I attempt to access the data of the bugged field.

@douglasdavis
Copy link
Collaborator

douglasdavis commented Jan 18, 2024

@agoose77 yes I think we're on the same page with #430! I was just taking a step back to remember what exactly was going on and that led me to observing that confusing repr/str stuff - just wanted to confirm with Lindsey what the more concrete issue was for him, which he just confirmed :)

And it also got me thinking about how it would be useful to walk down an array and easily find placeholders

@agoose77
Copy link
Collaborator

agoose77 commented Jan 18, 2024

@douglasdavis fab!

RE your actual question, no — not right now. You can use ak.to_buffers with a smarter key function if you just want to visually inspect things. If you wanted something more robust, e.g. a visual tree-like representation, then we could perhaps think about adding a repr to these arrays.

For debugging, you could just set PlaceholderArray.__repr__, if needs be!

@martindurant
Copy link
Collaborator

we could perhaps think about adding a repr to these arrays

Sounds like a useful thing for debugging. It isn't great if you can't repr an object.

@agoose77
Copy link
Collaborator

Sounds like a useful thing for debugging. It isn't great if you can't repr an object.

Yes, that's the argument in favour. Right now the error is somewhat useful, because normal pathways should never need to repr these arrays — this would be a touching operation.

Perhaps we want a context variable that can be set to opt-in.

@martindurant
Copy link
Collaborator

I think a dumb repr like "<placeholder array>" (perhaps with the typestr) would be fine and shouldn't need to access any "buffer"s which we know don't exist.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants