ISSUE 9670: Adds AWS credentials refresh to out_prometheus_remote_write #9765

Open: wants to merge 1 commit into base: master

@Tradunsky Tradunsky commented Dec 25, 2024

Handles the HTTP 403 error returned when AWS credentials have expired by refreshing the credentials from ~/.aws/credentials.

Fixes: #9670
A similar implementation already exists for kinesis_streams:

aws_client->provider->provider_vtable->
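
To illustrate why a refresh call is needed at all, here is a minimal, self-contained sketch of a credentials provider that caches its credentials and exposes a refresh operation through a table of function pointers. Every type and function name in it (`struct provider`, `provider_create`, `profile_refresh`, and so on) is hypothetical and written for this example; it is not Fluent Bit's actual structs or API, and a plain string stands in for the contents of ~/.aws/credentials.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-ins for illustration, NOT Fluent Bit's real structs. */
struct creds {
    char token[64];
};

struct provider;

/* Table of operations that a provider implements. */
struct provider_vtable {
    const struct creds *(*get_credentials)(struct provider *p);
    int (*refresh)(struct provider *p);
};

struct provider {
    const struct provider_vtable *vtable;
    const char *source;   /* stands in for the contents of ~/.aws/credentials */
    struct creds cached;  /* in-memory copy used when signing requests */
};

/* Return the cached copy; the source is not consulted again. */
static const struct creds *profile_get(struct provider *p)
{
    return &p->cached;
}

/* Re-read the source and overwrite the cached copy. Returns 0 on success. */
static int profile_refresh(struct provider *p)
{
    assert(p != NULL && p->source != NULL);
    snprintf(p->cached.token, sizeof(p->cached.token), "%s", p->source);
    return 0;
}

static const struct provider_vtable profile_vtable = {
    profile_get,
    profile_refresh
};

/* Build a provider and load the initial credentials once. */
struct provider provider_create(const char *source)
{
    struct provider p;

    memset(&p, 0, sizeof(p));
    p.vtable = &profile_vtable;
    p.source = source;
    p.vtable->refresh(&p);
    return p;
}
```

In this sketch, changing `source` after `provider_create` has no effect on what `get_credentials` returns until `refresh` is called, which mirrors the stale in-memory credentials described in the steps to reproduce.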


Steps to reproduce:

  1. Create temporary credentials with the minimum duration of 900 seconds (any short duration works, so the issue reproduces quickly). Put the output into ~/.aws/credentials as the default profile.
aws sts assume-role --role-arn arn:aws:iam::<account_number>:role/prometheus_role --role-session-name tmp --duration-seconds 900
# or
aws sts get-session-token --duration-seconds 900 --serial-number arn:aws:iam::$(aws sts get-caller-identity --query Account --output text):mfa/role_name --token-code <MFA code>
  1. Start fluent-bit with the given configuration file:
[SERVICE]
    Flush        1
    Log_Level    DEBUG

[INPUT]
    Name                 node_exporter_metrics
    Tag                  metrics
    Scrape_interval      30

[OUTPUT]
    Name prometheus_remote_write
    Match metrics
    Host aps-workspaces.us-west-2.amazonaws.com
    Port 443
    Uri /workspaces/ws-<your workspaceid>/api/v1/remote_write
    AWS_Auth true
    AWS_region us-west-2
    Tls On
    Tls.verify On
    add_label  test test
./bin/fluent-bit -c fluent-bit.conf
  1. Wait until the credentials expire and the prometheus_remote_write output plugin starts failing with HTTP 403 (expired credentials), as shown in this example:
[2024/12/24 16:31:49] [error] [output:prometheus_remote_write:prometheus_remote_write.1] aps-workspaces.us-west-2.amazonaws.com:443, HTTP status=403
{"message":"The security token included in the request is expired"}
  1. Repeat step #1 to replace the credentials in ~/.aws/credentials with fresh ones (usually done by automation):
    Before the PR fix: Fluent Bit keeps failing with 403 because it is still using the old, expired credentials cached in memory.
[error] [output:prometheus_remote_write:prometheus_remote_write.1] aps-workspaces.us-west-2.amazonaws.com:443, HTTP status=403
{"message":"The security token included in the request is expired"}

After the PR fix: Fluent-bit picks up fresh credentials without downtime.

[2024/12/24 18:45:21] [ info] [output:prometheus_remote_write:prometheus_remote_write.0] auth error, refreshing creds
[2024/12/24 18:45:21] [debug] [aws_credentials] Refresh called on the env provider
[2024/12/24 18:45:21] [debug] [aws_credentials] Refresh called on the profile provider
[2024/12/24 18:45:21] [debug] [aws_credentials] Reading shared config file.
[2024/12/24 18:45:21] [debug] [aws_credentials] Reading shared credentials file.
[2024/12/24 18:45:21] [debug] [upstream] KA connection #89 to aps-workspaces.us-west-2.amazonaws.com:443 is now available
[2024/12/24 18:45:21] [debug] [output:prometheus_remote_write:prometheus_remote_write.0] http_post result FLB_RETRY
...

[2024/12/24 18:45:43] [debug] [output:prometheus_remote_write:prometheus_remote_write.0] signing request with AWS Sigv4
[2024/12/24 18:45:43] [debug] [output:prometheus_remote_write:prometheus_remote_write.0] aps-workspaces.us-west-2.amazonaws.com:443, HTTP status=200
[2024/12/24 18:45:43] [debug] [upstream] KA connection #88 to aps-workspaces.us-west-2.amazonaws.com:443 is now available
[2024/12/24 18:45:43] [debug] [output:prometheus_remote_write:prometheus_remote_write.0] http_post result FLB_OK
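
The sequence in the log above (403, credentials refresh, FLB_RETRY, then 200 and FLB_OK on the next attempt) can be sketched as a small decision function. This is only an illustration under stated assumptions: `handle_response`, `refresh_fn`, `count_refresh`, and the `RESULT_*` codes are hypothetical names made up for this example, and treating every other non-2xx status as an error is a simplification rather than the plugin's actual behavior.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical result codes, loosely named after the FLB_OK and
 * FLB_RETRY values that appear in the log lines. */
enum flush_result { RESULT_OK, RESULT_RETRY, RESULT_ERROR };

/* Hypothetical callback that reloads credentials; returns 0 on success. */
typedef int (*refresh_fn)(void *ctx);

/* Decide what to do with a chunk based on the HTTP status of the POST. */
enum flush_result handle_response(int http_status, refresh_fn refresh, void *ctx)
{
    if (http_status >= 200 && http_status < 300) {
        return RESULT_OK;
    }
    if (http_status == 403 && refresh != NULL) {
        /* Auth error: reload credentials so the retry is signed
         * with fresh ones instead of the expired cached copy. */
        if (refresh(ctx) == 0) {
            return RESULT_RETRY;
        }
    }
    /* Assumption for brevity: everything else is a plain error. */
    return RESULT_ERROR;
}

/* Example callback that only counts how often a refresh was requested. */
int count_refresh(void *ctx)
{
    int *calls = ctx;

    assert(calls != NULL);
    (*calls)++;
    return 0;
}
```

Returning a retry rather than an error after a successful refresh is what lets the same data be resent with the new credentials, which matches the "without downtime" behavior described for the fix.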

Testing
Before we can approve your change, please submit the following in a comment:

  • Example configuration file for the change
  • Debug log output from testing the change
  • Attached Valgrind output that shows no leaks or memory corruption was found

If this is a change to packaging of containers or native binaries then please confirm it works for all targets.

  • Run local packaging test showing all targets (including any new ones) build.
  • Set ok-package-test label to test for all targets (requires maintainer to do).

Documentation

  • Documentation required for this feature

Backporting

  • Backport to latest stable release.

Fluent Bit is licensed under Apache 2.0, by submitting this pull request I understand that this code will be released under the terms of that license.

Development

Successfully merging this pull request may close these issues.

AWS rolling credentials from file support for Prometheus