fix: adjust checks in ForthMachine to prevent segfault when num_items is negative #3209
Maybe, to make it more complete, it should set
That was my first thought, since this is another thing that could go wrong. However, for some AwkwardForth type codes, it has effectively been taking the maximum of the stack-read variable and zero for a while now. Turning this into an error would make code development in AwkwardForth more developer-friendly, but most of the development that (I believe) will ever be done in this language has already been done. Its existence was motivated by accelerating Uproot TTree reading, and as we move to RNTuple, it becomes irrelevant. In principle, it's useful for other non-columnar data formats whose types are static throughout the dataset but not known before reading (i.e., the type is specified in a file-bound schema). Avro is one example of that; Protobuf and Thrift are others, and maybe also FlatBuffers, but there hasn't been high demand for getting these formats into Awkward Arrays. AwkwardForth is not the best choice for fully dynamic formats (JSON, BSON, etc.), for columnar formats (Parquet), or when a JIT compiler is available. It's fairly niche (great for the Uproot TTree use case), but that's done. So I'd be happy to turn the accidental interpretation of negative
awkward/docs/reference/awkwardforth.rst, lines 1183 to 1220, in af232f5
For tests (for this PR), how about the following? I ran these on the old/broken AwkwardForth on Linux.

Positive:

```python
>>> vm = awkward.forth.ForthMachine32("input source 5 source #q-> stack")
>>> vm.run({"source": np.array([1, 2, 3, 4, 5])})
>>> vm.stack
[1, 2, 3, 4, 5]
```

and

```python
>>> vm = awkward.forth.ForthMachine32("input source output sink float64 5 source #q-> sink")
>>> vm.run({"source": np.array([1, 2, 3, 4, 5])})
>>> vm.output("sink")
array([1., 2., 3., 4., 5.])
```

Negative:

```python
>>> vm = awkward.forth.ForthMachine32("input source -5 source #q-> stack")
>>> vm.run({"source": np.array([1, 2, 3, 4, 5])})
>>> vm.stack
[]
```

and

```python
>>> vm = awkward.forth.ForthMachine32("input source output sink float64 -5 source #q-> sink")
>>> vm.run({"source": np.array([1, 2, 3, 4, 5])})
>>> vm.output("sink")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: negative dimensions are not allowed
```

Maybe these would segfault on macOS, or maybe the segfault only happens deeper in a more complex example. Either way, with the fix currently implemented in this PR, the first would return the same thing and the second would also return an empty array.

Actually, maybe it would be better to test a negative

```python
>>> vm = awkward.forth.ForthMachine32("input source -5 source #q-> stack 5 source #q-> stack")
>>> vm.run({"source": np.array([1, 2, 3, 4, 5])})
>>> vm.stack
[1068248464, 3, -1170705504, 0, 49]
```

With this PR, the output should be

```python
>>> vm = awkward.forth.ForthMachine32("input source output sink float64 -5 source #q-> sink 5 source #q-> sink")
>>> vm.run({"source": np.array([1, 2, 3, 4, 5])})
>>> vm.output("sink")
array([], dtype=float64)
```

This should also be

This is clearly doing bad things on Linux, because a few steps after these, I got a core dump. In a new process, I tested the positive

```python
>>> vm = awkward.forth.ForthMachine32("input source 5 source #q-> stack -5 source #q-> stack")
>>> vm.run({"source": np.array([1, 2, 3, 4, 5])})
>>> vm.stack
[1, 2, 3, 4, 5]
```

and

```python
>>> vm = awkward.forth.ForthMachine32("input source output sink float64 5 source #q-> sink -5 source #q-> sink")
>>> vm.run({"source": np.array([1, 2, 3, 4, 5])})
>>> vm.output("sink")
array([], dtype=float64)
```
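To make the intended post-fix behavior concrete without depending on a particular awkward-cpp build, here is a small pure-Python model of the `N source #q-> target` pattern used in these tests. It assumes the rule described earlier in this thread: the count popped from the stack is clamped with `max(n, 0)`, so a negative count reads nothing and leaves the input position unchanged. `read_q` and `run_counts` are hypothetical helpers for illustration only, not part of the AwkwardForth API, and the model covers just this one instruction pattern.

```python
from typing import List, Sequence, Tuple


def read_q(source: Sequence[int], pos: int, count: int) -> Tuple[List[int], int]:
    """Model of `count source #q-> target`: read `count` items starting at `pos`.

    Assumption: a negative count is clamped to zero (a no-op), following the
    max(count, 0) rule discussed in this thread.
    """
    n = max(count, 0)  # clamp instead of using the raw signed value
    if pos + n > len(source):
        # Stand-in for AwkwardForth's read-beyond-end error.
        raise EOFError("read beyond end of input")
    return list(source[pos:pos + n]), pos + n


def run_counts(source: Sequence[int], counts: List[int]) -> List[int]:
    """Apply a series of `#q->` reads and collect everything into one output."""
    out: List[int] = []
    pos = 0
    for count in counts:
        items, pos = read_q(source, pos, count)
        out.extend(items)
    return out


data = [1, 2, 3, 4, 5]
print(run_counts(data, [5]))      # positive only:          [1, 2, 3, 4, 5]
print(run_counts(data, [-5]))     # negative only:          []
print(run_counts(data, [-5, 5]))  # negative, then positive: [1, 2, 3, 4, 5]
print(run_counts(data, [5, -5]))  # positive, then negative: [1, 2, 3, 4, 5]
```

In this model, both mixed sequences end with all five values, in contrast to the garbage stack values and the empty sink produced by the unpatched build above; the real `ForthMachine32` would still need to be checked against the same cases.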
Thanks so much, Jim. I added the rule to the documentation and the tests you suggested. I couldn't find if the
This all looks good and I'd say it's ready to merge!
I don't know which awkward-cpp tests you mean, but almost all of the awkward-cpp testing is done in Python through Awkward, anyway. (The major exception is the kernel-testing, but that's unrelated to AwkwardForth.) These tests are definitely sufficient.
Hmm, I'll have to check why some Windows tests are failing.
Welp, this is the second time this week that I've run into issues by not explicitly specifying
Looks great! Thanks!
@all-contributors please add @ariostas for code
I've put up a pull request to add @ariostas! 🎉
This PR addresses a bug that was found in #3188. For a particular file, the `num_items` variable ended up being negative, which caused a segfault when trying to write some data. I adjusted existing checks and added new ones to prevent the segfault and make it fail gracefully. Nevertheless, it does not address the issue with reading the file in #3188.

I checked with the Compiler Explorer and both of these options produce the same machine code, so I just used the more expressive one.
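The thread does not show the patched C++ code, so the following is only a hedged illustration of why an unchecked negative `num_items` is dangerous in a writer. In C and C++, converting a negative signed count to an unsigned size type (such as a 64-bit `size_t`) wraps it modulo 2^64 into a huge value. The sketch emulates that conversion in Python and then shows a guard that treats a non-positive count as "write nothing", which is an assumed behavior chosen to match the tests discussed in this thread. `as_size_t` and `OutputBuffer` are hypothetical names, not awkward-cpp's API.

```python
from typing import List

UINT64_MOD = 2 ** 64


def as_size_t(n: int) -> int:
    """Emulate C/C++ conversion of a signed count to an unsigned 64-bit size."""
    return n % UINT64_MOD


class OutputBuffer:
    """Hypothetical growable output buffer standing in for an AwkwardForth output."""

    def __init__(self) -> None:
        self.data: List[float] = []

    def write(self, values: List[float], num_items: int) -> None:
        # Guard: check the sign BEFORE any conversion to an unsigned size.
        # Assumed behavior: a non-positive count writes nothing.
        if num_items <= 0:
            return
        if num_items > len(values):
            raise ValueError("num_items exceeds the number of available values")
        self.data.extend(values[:num_items])


print(as_size_t(-5))  # 18446744073709551611: what -5 becomes as a 64-bit size_t

buf = OutputBuffer()
buf.write([1.0, 2.0, 3.0, 4.0, 5.0], 5)
buf.write([1.0, 2.0, 3.0, 4.0, 5.0], -5)  # ignored by the guard
print(buf.data)  # [1.0, 2.0, 3.0, 4.0, 5.0]
```

Checking the sign first keeps later comparisons meaningful: once the value has wrapped, unsigned arithmetic such as `pos + n` can itself overflow and slip past a bounds check.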