-
Notifications
You must be signed in to change notification settings - Fork 89
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
avro reader crash #3188
Comments
Although there's not much demand (that I know of) for the Avro reader, it's a canary in the coalmine for AwkwardForth and therefore Uproot. If the Avro is not handled correctly, it can fail with a Forth runtime error, but it should not be segfaulting under any circumstances, so this is serious. I'll try to reproduce it sometime when I have access to my main computer again and see if I can narrow in on the point of failure. |
In addition, from_avro_file fails if passed a file-like rather than a str filename. On this line, the attribute should be |
Has PR #3167 solved this issue? |
I didn't see that PR. I can test. |
Updating awkward to 55a9a43 (current main) still crashes. Is there a chance of any cached state from a previous run? |
I'm not sure. GitHub Actions recompiles awkward-cpp in each test, so we shouldn't cache from previous runs. (@agoose77, did the test ever have a cache like that?) Meanwhile, is this an awkward-cpp-only reproducer? python -c 'from awkward_cpp.lib._ext import ForthMachine32 ; vm = ForthMachine32("1 2 3 4 5"); vm.run(); print(vm.stack)' I'm thinking that we should bisect awkward-cpp itself, removing parts of it until there isn't a segfault and then see which part makes the difference. Awkward relies on many parts of awkward-cpp, but awkward-cpp itself is mostly a bag of functions and classes; it should be easy to remove large parts of it and still be able to rely on the above test. |
That line runs fine. The avro reader can read other files (I have a workign test in akimbo for avro), in case that wasn't clear - it only segfaults with the example at the top, a file which reads fine with fastavro. |
but I just did
(only happens if you make the machine twice) |
The branch This should be a sufficient test:
I think the GitHub Actions (which will never pass!) provide artifacts. This works on my M2 Mac (same for awkward-cpp without removing anything). I have Apple clang version 15.0.0 (clang-1500.3.9.4), so in principle, it shouldn't work. If I understand things, the above doesn't work for you: it segfaults on the second ForthMachine. |
I got segfault only if I made two machines without running the first one |
Okay, so the test that shows the segfault for you is python -c 'from awkward_cpp.lib._ext import ForthMachine32 ; vm = ForthMachine32("1 2 3 4 5"); vm = ForthMachine32("1 2 3 4 5"); vm.run(); print(vm.stack)' Interestingly, it still doesn't segfault for me. But anyway, we have a much smaller surface area to find the bug. @ianna, are you able to reproduce it? (@henryiii said that it's independent of Intel vs Apple Silicon.) |
No, that doesn't fail either; I got the crash only with |
I was curious about this, so I decided to look into it a bit. I'm not sure if this was already known, but I'll document what I found here. I was only able to reproduce the segfault with
to something like num_items = stack_pop();
if (num_items < 0) num_items = 0; makes it so that at least it crashes gracefully. As for why negative numbers end up being popped from the stack and interpreted as a length, I'm not sure. I'm not familiar with this code, so I couldn't figure it out. Also, I'm not sure if this is related to the segfault seen from the minimal version. If someone wants to give me a quick rundown of how the code works, I can look into this further. |
@ariostas, you found a language-level bug in AwkwardForth (as opposed to a bug in the Avro-reader, which wouldn't worry me as much). The reason a value popped from the stack is being interpreted as a number of items to read is because this is the implementation of vectorized read words like Most uses of
which shouldn't pose a problem: if the
and count-down
and
These However, the Avro reader couldn't be hitting this problem because Avro doesn't have any of these unusual (but fixed) length integers. (Avro has variable length integers, which are implemented with The one case that might cause a segfault is in the write-to-stack macros: awkward/awkward-cpp/src/libawkward/forth/ForthMachine.cpp Lines 3255 to 3257 in c3c4f6f
awkward/awkward-cpp/src/libawkward/forth/ForthMachine.cpp Lines 3273 to 3275 in c3c4f6f
awkward/awkward-cpp/src/libawkward/forth/ForthMachine.cpp Lines 3294 to 3296 in c3c4f6f
A negative value could break that, and it might happen in the Avro reader when a contiguous array of floating-point or boolean values are read. (The other cases are variable-length integers.) Your fix of ensuring that
(I'm only saying that because this is highly streamlined code, and doing the check later would slow down fewer cases. I doubt the jump-compare-zero and assignment is much of a speedbump, though.) So this is great! With this fix, I'd like to know if @martindurant still sees any segfault. In #3188 (comment), he found one with tab-complete, which can't be related to this, but I wonder if that's still reproducible. (It's such an odd case, since |
Thank you @jpivarski, that sounds great.
You also need the check inside this
As for speed, since it's a one-liner I would expect the compiler to figure out the best option, but maybe it can be faster to force it to be branchless with num_items *= (num_items >= 0) |
Do you want to write the PR or me? |
I can do it |
Version of Awkward Array
ak 2.6.5
ak-cpp 34
Description and code to reproduce
I tried to read the avro file at this location https://github.com/Teradata/kylo/blob/master/samples/sample-data/avro/userdata5.avro (downloaded to local) using
ak.from_avro_file
and gotsegmentation fault
. The file loads fine with fastavro.macOS 13.6.7, conda install of python 3.10
The text was updated successfully, but these errors were encountered: