engine: expose internal logging call counts as internal metrics #10326
@@ -33,7 +33,7 @@
#include <stdarg.h>
#include <ctype.h>

static flb_sds_t sds_alloc(size_t size)
This collides with the definition in cfl/cfl.h. This caused problems once I added the cmetrics import to flb_log.c.
The problem is not the name; the problem is that cfl_sds.h declares it, and it's clearly meant to be a private function.
It's OK for you to fix it this way here, but please also submit a PR to cfl removing the public declaration of that function and doing the same rename and re-scoping of it in cfl_sds.c.
It's important for us to be mindful to completely address issues when we find them, and not just do enough to get them out of the way, since that makes our own job harder in the future. (It's a request, not a complaint.)
        return NULL;
    }

    ret_ctx->u = flb_upstream_create(ret_ctx->config, "127.0.0.1", 2020, 0, NULL);
Is there any pattern or prior art for picking random free ports in tests?
There isn't, but we can talk about it. If you're interested in making that improvement, I'd appreciate it.
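One common way to avoid a hard-coded port such as 2020 in tests is to bind a socket to port 0 and let the kernel choose a free ephemeral port, then read it back with getsockname(). The sketch below assumes a POSIX system; `sketch_pick_free_port` is a hypothetical helper name and is not part of Fluent Bit or its test suite.

```c
#include <assert.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical test helper (not a Fluent Bit API): returns a TCP port
 * that was free on 127.0.0.1 at the time of the call, or -1 on error.
 */
int sketch_pick_free_port(void)
{
    int fd;
    int ret;
    int port;
    socklen_t len;
    struct sockaddr_in addr;

    fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd == -1) {
        return -1;
    }

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(0); /* port 0: the kernel picks an ephemeral port */

    ret = bind(fd, (struct sockaddr *) &addr, sizeof(addr));
    if (ret == -1) {
        close(fd);
        return -1;
    }

    len = sizeof(addr);
    ret = getsockname(fd, (struct sockaddr *) &addr, &len);
    if (ret == -1) {
        close(fd);
        return -1;
    }
    assert(len == sizeof(addr)); /* sanity check for an AF_INET socket */

    port = ntohs(addr.sin_port);
    close(fd);

    return port;
}
```

The port is released before the caller uses it, so another process could claim it in between. For test code that race is usually acceptable, but a more robust variant would keep the socket open and hand it to the server under test.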
Force-pushed from d83dbeb to 68ede1c.
Internal logger calls increment a new v2 metric exposed by the HTTP server Prometheus scrape endpoint. There is one time series per log message type. Signed-off-by: Alec Holmes <[email protected]>
Force-pushed from 68ede1c to e61c2c0.
@@ -164,6 +164,7 @@ struct flb_config {
    /* Logging */
    char *log_file;
    struct flb_log *log;
    struct flb_log_metrics *log_metrics_ctx; /* Global metrics for logging calls */
Isn't this metrics context supposed to be limited to the logger instance? If that's the case, then it should be declared inside flb_log instead of flb_config.
@@ -232,6 +242,8 @@ static inline int flb_log_suppress_check(int log_suppress_interval, const char *
int flb_log_worker_init(struct flb_worker *worker);
int flb_log_worker_destroy(struct flb_worker *worker);
int flb_errno_print(int errnum, const char *file, int line);
struct flb_log_metrics *flb_log_metrics_create();
These two shouldn't be public, as they should only be invoked by flb_log_create and flb_log_destroy.
@@ -391,6 +391,14 @@ struct flb_config *flb_config_init()
    flb_regex_init();
#endif

    /* Create internal logger metrics */
This should be moved to flb_log_create.
@@ -573,39 +597,32 @@ int flb_log_construct(struct log_message *msg, int *ret_len,
     int total;
     time_t now;
     const char *header_color = NULL;
-    const char *header_title = NULL;
+    const char *header_title = flb_log_message_type_str(type);
Please don't make function calls in the middle of the declarations. Initialize it to NULL and move the function call to line 606.
@@ -564,6 +566,28 @@ struct flb_log *flb_log_create(struct flb_config *config, int type,
    return log;
}

static inline char *flb_log_message_type_str(int type)
Please change the return type to const char * to match the current usage.
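The two requests above (a const char * return type, and no function calls among the declarations) could look roughly like the following. This is a minimal sketch under stated assumptions, not the PR's code: the `SKETCH_LOG_*` constants, the title strings, and both function names are hypothetical stand-ins for the real FLB_LOG_* values and flb_log_message_type_str.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical message types, assumed contiguous and starting at 1 */
enum {
    SKETCH_LOG_ERROR = 1,
    SKETCH_LOG_WARN,
    SKETCH_LOG_INFO,
    SKETCH_LOG_DEBUG,
    SKETCH_LOG_TRACE
};

/* Returns a pointer to a string literal, so the return type is const */
const char *sketch_log_type_str(int type)
{
    switch (type) {
    case SKETCH_LOG_ERROR:
        return "error";
    case SKETCH_LOG_WARN:
        return "warn";
    case SKETCH_LOG_INFO:
        return "info";
    case SKETCH_LOG_DEBUG:
        return "debug";
    case SKETCH_LOG_TRACE:
        return "trace";
    default:
        return "unknown";
    }
}

/* Writes "[title]" into buf; returns its length or -1 if it does not fit */
int sketch_format_header(char *buf, size_t size, int type)
{
    int len;
    const char *header_title = NULL; /* declarations only initialize to NULL */

    assert(buf != NULL);

    /* the function call happens after the declaration block */
    header_title = sketch_log_type_str(type);

    len = snprintf(buf, size, "[%s]", header_title);
    if (len < 0 || (size_t) len >= size) {
        return -1;
    }

    return len;
}
```

Returning const char * also lets the compiler reject accidental writes through the pointer, which would be undefined behavior for string literals.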
    return lm;

error:
    if (lm && lm->logs_total_counter) {
If this code were to stay, this would be a much cleaner structure, which I'd ask you to follow:

    if (lm != NULL) {
        if (lm->logs_total_counter != NULL) {
            cmt_counter_destroy(lm->logs_total_counter);
        }
        if (lm->cmt != NULL) {
            cmt_destroy(lm->cmt);
        }
        flb_free(lm);
    }
        break;
    }

    if (cmt_counter_set(lm->logs_total_counter, ts, 0, 1, (char *[]) {message_type_str}) == -1) {
Please don't put function calls inside conditions. It's sometimes OK to use simpler functions such as strlen there, but this is not such a case: declare a variable, store the result value there, and then compare it.
The compiler will optimize it anyway, and the result code will remain in a register in 99% of cases, especially with the modern fastcall-ish calling conventions (amd64 and arm).
     * Initialize counters for all log message types to 0.
     * This assumes types are contiguous starting at 1 (FLB_LOG_ERROR).
     */
    i = 1;
i is already initialized by the for statement; remove this.
This PR adds a new v2 runtime metric that exposes the number of logger calls by message type. A fluent-bit process consistently logging errors can be indicative of significant configuration or infrastructure problems. A common pattern for observing failures across many instances of software is to expose failures as metric counters that can then be observed and alerted on.
The implementation piggybacks on the src/flb_log.c logging library already extracting a worker context from the current thread.
Here is example output from curling a fluent-bit instance with the service http_server enabled:
Fluent Bit is licensed under Apache 2.0, by submitting this pull request I understand that this code will be released under the terms of that license.