-
Notifications
You must be signed in to change notification settings - Fork 109
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add Go error tracking #1004
base: main
Are you sure you want to change the base?
Add Go error tracking #1004
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very cool! I think this will generally work! I left a few comments.
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #1004 +/- ##
===========================================
- Coverage 81.06% 65.69% -15.38%
===========================================
Files 145 145
Lines 14558 14652 +94
===========================================
- Hits 11802 9625 -2177
- Misses 2188 4481 +2293
+ Partials 568 546 -22
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
There's one concern I came with thinking more about the symbol table. We essentially need to keep the symbol table live for the duration of the executable being instrumented. We should figure out how much memory is this for something reasonably large, like the OTel Collector, Alloy or Prometheus. If we were to instrument 10 applications, if this symbol table memory footprint is very large, we'll easily balloon the overall Beyla memory usage. I'm not sure if there's a solution, but perhaps this can be optionally enabled for certain applications, if it's large memory footprint. |
04f16b5
to
2513121
Compare
dd270be
to
12a90af
Compare
12a90af
to
b4f9c50
Compare
It mostly LGTM, thanks for the great work! I have a proposal for some improvements with respect to memory (as previously stated by @grcevski ). They don't have to be done in this PR but I'd suggest to do them in another PR.
type symtabs struct {
Get(pid) *Symtab
Put(pid, execPath, *Symtab)
Delete(pid)
} Then, provide two implementations, one using a hash map (to be used when POST-EDIT: ah, I just saw that you already nil the symtabs if error reporting is disabled!
If I misunderstood the way the code works, please forget point |
var stacktrace strings.Builder | ||
if symTab != nil && trace.Error.UstackSz > 0 { | ||
for _, pc := range trace.Error.Ustack { | ||
f := symTab.PCToFunc(pc) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we need to check how this works with multiple processes launched from the same executable. I'm wondering if the symbol table will be meant for the address range of the initial executable which launched. What I mean by this is that I think you must know the executable base memory address to be able to do the PC to symbol translation, because technically the symbol table has only offsets from the executable start, but the PC that comes from eBPF is the absolute memory address of the instruction pointer.
Let's take the following example:
- Imagine we have a symbol table with one function
foo()
at offset0x100
. - We launch the executable once and its base code virtual address is
0x20000
. - We launch the executable one more time and this one starts at virtual address
0x30000
. - We get exception at 0x200105 and then at 0x300105. Technically that's exception at two different addresses, but it's the same exception in two different processes.
We'll only parse the symbol table once, since we detect that the executable is already instrumented.
I think this code translation must take into account the base memory address of the executable, I think the symbol table is not sufficient.
This PR adds the ability to automatically correct Go errors happening during an HTTP request.
The BPF probe only collects the last error, which is reported using
fmt.Errorf
, happened duringthe request.
The stack trace and error message are used to add an exception event to the Span of the request:
https://opentelemetry.io/docs/specs/semconv/exceptions/exceptions-spans/
The env var
BEYLA_TRACES_REPORT_EXCEPTION_EVENTS
allows to enable this feature, which isdisabled by default. In a follow up PR i will add docs for config option and details of the feature.
Special thanks to @tpaschalis @inkel @rlankfo which contributed to this project.
Fixes #874