Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MAINT: improve C accum error handling #840

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

tylerjereddy
Copy link
Collaborator

  • returning -1 from a public API C function isn't sufficient to provide useful error information when working in Python/CFFI--it only tells you something went wrong if you check for a return code in the first place (which isn't normally done in Python anyway--normally you end execution at the error point and exit() with an appropriate error message)

  • when there are multiple conditions that can trigger the -1 return value, the situation is even worse, one literally has to printf sprinkle the source to figure out what went wrong where

  • as a compromise, I'll leave the -1 approach in since that is quite common in standard C, but I'm going to add in prints to stderr so that Python can then intercept the -1 and refer the user to stderr

  • also, darshan_accumulator_inject() assumed that the module_id was reasonable because it was set/checked in
    darshan_accumulator_create(), however my experience in ENH: Python derived/accum interface #839 was that the accumulator memory location can get freed, or not properly set at CFFI boundary after calling the creation function, so I think the assumption that a previous function was called and worked perfectly is too fragile--I'm adding error handling to prevent a hard segfault on nonsense values of that structure member as a result

* returning `-1` from a public API C function isn't sufficient
to provide useful error information when working in Python/CFFI--it
only tells you something went wrong if you check for a return
code in the first place (which isn't normally done in Python
anyway--normally you end execution at the error point and `exit()`
with an appropriate error message)

* when there are multiple conditions that can trigger the `-1`
return value, the situation is even worse, one literally has
to `printf` sprinkle the source to figure out what went wrong
where

* as a compromise, I'll leave the `-1` approach in since that
is quite common in standard `C`, but I'm going to add in
prints to `stderr` so that Python can then intercept the `-1`
and refer the user to `stderr`

* also, `darshan_accumulator_inject()` assumed that the `module_id`
was reasonable because it was set/checked in
`darshan_accumulator_create()`, however my experience in darshan-hpcgh-839
was that the accumulator memory location can get freed, or not
properly set at CFFI boundary after calling the creation function,
so I think the assumption that a previous function was called
and worked perfectly is too fragile--I'm adding error handling
to prevent a hard segfault on nonsense values of that structure
member as a result
@tylerjereddy
Copy link
Collaborator Author

Unique return codes for different situations would likely be even more useful--i.e., preventing a segfault due to corruption vs. "this module doesn't support accum interface" or whatever.

@carns
Copy link
Contributor

carns commented Nov 1, 2022

Thanks @tylerjereddy . I'm happy with the darshan-logutils-accumulator.c improvements (adding stderr error msgs is a particularly good idea), but the .py code isn't correct (see #839 for issues with the fn prototype in particular). I'm not sure how to proceed exactly, but I think any of the following would be fine, and I can help with the 3rd option if that's preferred.

  • make a new PR with just the .c changes
  • wait for resolution of the bindings/prototype issues in ENH: Python derived/accum interface #839 and then apply the same strategy here
  • add a munit C code unit test for the error handling (this can be tested in C)

@shanedsnyder
Copy link
Contributor

I agree the changes in the C library here are definitely helpful.

I think I'd vote for us to fix the issues in #839 first, and then rebase on that and adopt the same style code for the test in this PR just for consistency.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants