enhancement(aws provider): Add ability to disable request signing #20973

Open: wants to merge 3 commits into master

jszwedko (Member)

This can be useful when sending anonymous (unsigned) requests to AWS S3. It could also be useful in other situations, e.g. when sending to AWS-API-compatible endpoints that don't support request signing.

I would really have liked to introduce this as a new "strategy" for AWS authentication configuration, since it is mutually exclusive with the others. However, because AWS authentication configuration is currently implemented as an untagged enum, adding it to the Default variant of the config enum seemed like the best option.
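The following simplified sketch shows why the flag ended up on the catch-all variant. An untagged enum has no field that names the variant, so deserialization tries each variant in declaration order and keeps the first whose shape fits. The std-only code below imitates that resolution by hand. The `Auth` type, its variants, and the key names are hypothetical simplifications, not Vector's actual `AwsAuthentication`, which relies on serde rather than hand-written matching.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified model of an untagged auth config:
/// no key names the variant; it is inferred from which keys are present.
#[derive(Debug, PartialEq)]
pub enum Auth {
    AccessKey { access_key_id: String, secret_access_key: String },
    File { credentials_file: String },
    /// Catch-all variant; `sign` defaults to true when omitted.
    Default { sign: bool },
}

/// Imitates untagged resolution: try variants in declaration order and
/// return the first one whose required keys are all present.
pub fn resolve(table: &HashMap<&str, &str>) -> Result<Auth, String> {
    if let (Some(id), Some(secret)) = (table.get("access_key_id"), table.get("secret_access_key")) {
        return Ok(Auth::AccessKey {
            access_key_id: id.to_string(),
            secret_access_key: secret.to_string(),
        });
    }
    if let Some(path) = table.get("credentials_file") {
        return Ok(Auth::File { credentials_file: path.to_string() });
    }
    // Only the catch-all variant knows about `sign`.
    let sign = match table.get("sign").copied() {
        None | Some("true") => true,
        Some("false") => false,
        Some(other) => return Err(format!("invalid value for `sign`: {other}")),
    };
    Ok(Auth::Default { sign })
}
```

In this sketch, a config that sets `sign: false` alongside access keys still resolves to `AccessKey`, and the flag is silently dropped. This shows why a dedicated, mutually exclusive strategy would be clearer.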

If and when we refactor this to follow https://github.com/vectordotdev/vector/blob/master/docs/specs/configuration.md#polymorphism, we can move it.
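For contrast, such a refactor could use a tagged design. An explicit key selects the variant, so "no signing" becomes its own mutually exclusive variant, and conflicting keys can be rejected outright. The `strategy` key and its values (`access_key`, `default`, `none`) are invented for illustration and are not existing Vector options.

```rust
use std::collections::HashMap;

/// Hypothetical tagged alternative: `strategy` names the variant explicitly.
#[derive(Debug, PartialEq)]
pub enum AuthStrategy {
    AccessKey { access_key_id: String, secret_access_key: String },
    Default,
    /// Send requests unsigned.
    None,
}

pub fn parse_strategy(table: &HashMap<&str, &str>) -> Result<AuthStrategy, String> {
    let has_keys = table.contains_key("access_key_id") || table.contains_key("secret_access_key");
    match table.get("strategy").copied().unwrap_or("default") {
        "access_key" => {
            let id = table.get("access_key_id").ok_or("missing `access_key_id`")?;
            let secret = table.get("secret_access_key").ok_or("missing `secret_access_key`")?;
            Ok(AuthStrategy::AccessKey {
                access_key_id: id.to_string(),
                secret_access_key: secret.to_string(),
            })
        }
        // Mutual exclusivity is enforced explicitly rather than by variant order.
        "default" | "none" if has_keys => {
            Err("access keys require strategy \"access_key\"".to_string())
        }
        "default" => Ok(AuthStrategy::Default),
        "none" => Ok(AuthStrategy::None),
        other => Err(format!("unknown strategy `{other}`")),
    }
}
```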

Prompted by a user on Discord: https://discord.com/channels/742820443487993987/1267892632319692802/1267892632319692802

Signed-off-by: Jesse Szwedko <[email protected]>
jszwedko requested review from a team as code owners July 31, 2024 01:44
github-actions bot added the domain: sinks and domain: external docs labels Jul 31, 2024
datadog-vectordotdev bot commented Jul 31, 2024

Datadog Report

Branch report: jszwedko/add-aws-none-option
Commit report: 949d275
Test service: vector

✅ 0 Failed, 443 Passed, 0 Skipped, 4m 5.35s Total Time

github-actions bot added the domain: sources label Jul 31, 2024
aliciascott left a comment

good for docs

Comment on lines +705 to +710
match config.auth {
AwsAuthentication::Default { sign, .. } => {
assert!(!sign);
}
_ => panic!(),
}
Member

This could probably be more concise as an assert!(matches!(...)).

Member Author

Mmm, true, though I like it as-is since it matches the other tests here.
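For reference, the two assertion styles check the same thing. Here is a minimal sketch with a hypothetical stand-in enum (not Vector's `AwsAuthentication`):

```rust
/// Hypothetical stand-in for the config enum under test.
#[derive(Debug)]
pub enum Auth {
    Role { assume_role: String },
    Default { sign: bool },
}

/// Style used in the PR: destructure, then assert on the bound field.
pub fn check_with_match(auth: &Auth) {
    match auth {
        Auth::Default { sign, .. } => assert!(!*sign),
        _ => panic!("expected Auth::Default, got {auth:?}"),
    }
}

/// Reviewer's suggestion: a single assertion with the literal in the pattern.
pub fn check_with_matches(auth: &Auth) {
    assert!(matches!(auth, Auth::Default { sign: false, .. }));
}
```

One trade-off: a failing `assert!(matches!(...))` reports only the stringified condition, while the explicit `match` can include the unexpected variant in its panic message.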

Comment on lines +207 to +211
.set_http_client(Some(SharedHttpClient::new(connector)))
.set_sleep_impl(Some(SharedAsyncSleep::new(Arc::new(TokioSleep::new()))))
.set_identity_cache(Some(auth.credentials_cache().await?))
.set_region(Some(region.clone()))
.set_retry_config(Some(retry_config.clone()))
Member

Why did these all become Options? It's somewhat confusing, since the credentials provider doesn't appear to be one.

Member Author

Ah, good question. In hindsight I should have left a comment explaining this. I made this change because I wanted to use set_credentials_provider to pass in an Option (since the credentials provider can be None). That method takes &mut self and modifies the builder in place, rather than taking self and returning an updated builder. I updated all of the calls for consistency, but if preferred, I could leave the existing calls alone and use set_credentials_provider only for the one that needs an Option.

jszwedko (Member Author), Jul 31, 2024

The builder has both <option> and set_<option> methods for every configuration option. The set_ variants take an Option (and &mut self), whereas the plain variants take the value directly (and self).
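Here is a minimal sketch of the two method styles. It uses a hypothetical `ConfigBuilder` with `String` fields in place of the SDK's real types. The consuming method takes `self` and a bare value. The `set_` method takes `&mut self` and an `Option`, which is what lets a `None` credentials provider be passed when signing is disabled.

```rust
/// Hypothetical builder mirroring the `<option>` / `set_<option>` convention.
#[derive(Debug, Default)]
pub struct ConfigBuilder {
    region: Option<String>,
    credentials_provider: Option<String>,
}

impl ConfigBuilder {
    /// Consuming style: takes `self` and a bare value, returns the updated builder.
    pub fn region(mut self, region: impl Into<String>) -> Self {
        self.region = Some(region.into());
        self
    }

    /// In-place style: takes `&mut self` and an `Option`, so `None` leaves it unset.
    pub fn set_credentials_provider(&mut self, provider: Option<String>) -> &mut Self {
        self.credentials_provider = provider;
        self
    }
}

/// Mixing styles: chain the consuming call, then apply the Option-taking one.
pub fn builder_for(sign: bool) -> ConfigBuilder {
    let mut builder = ConfigBuilder::default().region("us-east-1");
    builder.set_credentials_provider(sign.then(|| "default-chain".to_string()));
    builder
}
```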

rams3sh commented Aug 3, 2024

Just adding some extra information here in case it helps anyone. For anonymous access to S3 (via the S3 sink), which is the use case this unsigned-request feature targets, AWS does not allow multipart upload (for reasons known only to AWS). S3 supports file uploads in chunks of 5 MB, and hence at most one 5 MB file can be uploaded in one go with an unsigned request. There is no official documentation on this. After my discussion on Discord (linked in the PR description), I was experimenting with batch uploads of files using the AWS CLI with anonymous access, rather than relying on Vector until this feature lands in the main branch, and I happened to stumble on this error.
The closest information I found is in the link below:

aws/aws-sdk-js#512 (comment)

rams3sh commented Aug 6, 2024

Hey @jszwedko

I've started testing this even before it's merged to the main branch 😝, so kindly excuse my impatience.

For some reason, the logs are not getting delivered to my S3 bucket with these changes.

I have detailed my experiment below. Let me know if it's a bug or if I am going wrong somewhere in my steps.

Prerequisites

The target bucket <MY_BUCKET> has the following bucket policy attached:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "SourceIPBasedLogUploadRestriction",
            "Effect": "Allow",
            "Principal": "*",
            "Action": [
                "s3:PutObject",
                "s3:ListBucket",
                "s3:GetObject",
                "s3:GetBucketLocation",
                "s3:ListBucketMultipartUploads",
                "s3:ListMultipartUploadParts",
                "s3:AbortMultipartUpload"
            ],
            "Resource": [
                "arn:aws:s3:::<MY_BUCKET>",
                "arn:aws:s3:::<MY_BUCKET>/<KEY_PREFIX>/*"
            ],
            "Condition": {
                "IpAddress": {
                    "aws:SourceIp": [
                        "<MY_RESTRICTED_IP>/32"
                    ]
                }
            }
        }
    ]
}

The instance running Vector has the IP <MY_RESTRICTED_IP>.
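In this setup, an `IpAddress` condition on `aws:SourceIp` with a `/32` block matches exactly one address. Anonymous requests are therefore allowed only when the source IP that S3 sees for the Vector host is exactly `<MY_RESTRICTED_IP>`. The std-only sketch below illustrates that CIDR check. It is not AWS's policy evaluator, and `in_cidr` is a hypothetical helper.

```rust
use std::net::Ipv4Addr;

/// Returns true if `ip` falls inside the IPv4 CIDR block `network/prefix_len`.
/// Illustrative only; AWS evaluates `aws:SourceIp` itself.
pub fn in_cidr(ip: Ipv4Addr, network: Ipv4Addr, prefix_len: u32) -> bool {
    assert!(prefix_len <= 32, "IPv4 prefix length must be at most 32");
    // A /0 mask is all zeros; shifting a u32 by 32 would overflow, so special-case it.
    let mask: u32 = if prefix_len == 0 { 0 } else { u32::MAX << (32 - prefix_len) };
    (u32::from(ip) & mask) == (u32::from(network) & mask)
}
```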

Steps to reproduce the issue

Step 1: Build from source

apt-get update && apt-get install git curl make libsasl2-dev protobuf-compiler -y && \
    git clone -b "jszwedko/add-aws-none-option" https://github.com/vectordotdev/vector.git && \
    cd vector && \
    make build

Step 2: Draft a Vector config with a dummy log generator and an anonymous-access S3 sink

data_dir: "/var/lib/vector"
api:
  enabled: false
sources:
  wp_logs:
    type: "demo_logs"
    format: "json"
transforms:
  wp_logs_transformer:
    type: remap
    inputs:
      - wp_logs
    source: |
       . = parse_json!(.message)
sinks:
  wp_logs_s3_sink:
    inputs:
      - "wp_logs_transformer"
    auth:
      sign: false  # Disable request signing
    type: "aws_s3"
    region: "us-east-1"
    bucket: "<MY_BUCKET>"
    key_prefix: "<KEY_PREFIX>/vector/wp/"
    compression: "gzip"
    buffer:
      type: disk # Store the buffer on disk
      max_size: 268435488 # Maximum size for the buffer is 256 MB
    batch:
      timeout_secs: 300 # Sync at least once every 5 mins
      max_bytes: 4900000 # Flush once the batch reaches 4.9 MB (anonymous requests can upload at most 5 MB in one go, hence this number)
    encoding:
      codec: json
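The two batch comments describe size-or-time flushing. A batch is sent when it would exceed `max_bytes`, or when `timeout_secs` has elapsed since its first event, whichever comes first. The hypothetical `Batcher` below models only that rule. It is not Vector's batching code, and it does not capture whether Vector measures `max_bytes` before or after encoding and compression.

```rust
use std::time::{Duration, Instant};

/// Hypothetical size-or-time batcher modeling `max_bytes` and `timeout_secs`.
pub struct Batcher {
    max_bytes: usize,
    timeout: Duration,
    started: Option<Instant>,
    pending: Vec<Vec<u8>>,
    pending_bytes: usize,
}

impl Batcher {
    pub fn new(max_bytes: usize, timeout: Duration) -> Self {
        Self { max_bytes, timeout, started: None, pending: Vec::new(), pending_bytes: 0 }
    }

    /// Adds an event; returns the previous batch if this event would push it past `max_bytes`.
    pub fn push(&mut self, event: Vec<u8>, now: Instant) -> Option<Vec<Vec<u8>>> {
        let flushed = if !self.pending.is_empty() && self.pending_bytes + event.len() > self.max_bytes {
            self.take()
        } else {
            None
        };
        // The timeout clock starts with the first event of each batch.
        self.started.get_or_insert(now);
        self.pending_bytes += event.len();
        self.pending.push(event);
        flushed
    }

    /// Returns the pending batch once its first event has waited at least `timeout`.
    pub fn poll_timeout(&mut self, now: Instant) -> Option<Vec<Vec<u8>>> {
        match self.started {
            Some(start) if now.duration_since(start) >= self.timeout => self.take(),
            _ => None,
        }
    }

    fn take(&mut self) -> Option<Vec<Vec<u8>>> {
        self.started = None;
        self.pending_bytes = 0;
        Some(std::mem::take(&mut self.pending))
    }
}
```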

Step 3: Enable trace logging for debugging

export VECTOR_LOG=trace
export RUST_LOG=trace

Step 4: Run Vector

vector -c <above_drafted_config>.yaml

When I run Vector, I get some unclear errors like the ones below:

  1. First, a topology healthcheck failure:
2024-08-06T10:36:46.347475Z ERROR vector::topology::builder: msg="Healthcheck failed." error=dispatch failure component_kind="sink" component_type="aws_s3" component_id=wp_logs_s3_sink
  2. Logs related to uploading objects to S3 (no clear failure, but retry-related messages are printed):
2024-08-06T10:20:06.370350Z DEBUG sink{component_kind="sink" component_id=wp_logs_s3_sink component_type=aws_s3}:request{request_id=1}:invoke{service=s3 operation=PutObject sdk_invocation_id=5648678}:try_op:try_attempt: aws_smithy_runtime_api::client::interceptors::context: entering 'transmit' phase
2024-08-06T10:20:06.370380Z TRACE sink{component_kind="sink" component_id=wp_logs_s3_sink component_type=aws_s3}:request{request_id=1}:invoke{service=s3 operation=PutObject sdk_invocation_id=5648678}:try_op:try_attempt: aws_smithy_runtime::client::orchestrator: transmitting request request=Request { body: SdkBody { inner: BoxBody, retryable: true }, uri: Uri { as_string: "https://s3.us-east-1.amazonaws.com/<MY_BUCKET>/<KEY_PREFIX>/vector/wp/1722939340-f7f62251-d64e-46cd-91f0-fe0ab501f2bc.log.gz?x-id=PutObject", parsed: H0(https://s3.us-east-1.amazonaws.com/<MY_BUCKET>/<KEY_PREFIX>/vector/wp/1722939340-f7f62251-d64e-46cd-91f0-fe0ab501f2bc.log.gz?x-id=PutObject) }, method: PUT, extensions: Extensions { extensions_02x: Extensions, extensions_1x: Extensions }, headers: Headers { headers: {"content-encoding": HeaderValue { _private: H0("gzip") }, "content-md5": HeaderValue { _private: H0("pzHGODDnKfkj+o3UdNivZg==") }, "content-type": HeaderValue { _private: H0("text/x-log") }, "x-amz-storage-class": HeaderValue { _private: H0("STANDARD") }, "content-length": HeaderValue { _private: H0("9698") }, "user-agent": HeaderValue { _private: H0("aws-sdk-rust/1.3.3 os/linux lang/rust/1.79.0") }, "x-amz-user-agent": HeaderValue { _private: H0("aws-sdk-rust/1.3.3 api/s3/1.4.0 os/linux lang/rust/1.79.0") }, "amz-sdk-request": HeaderValue { _private: H0("attempt=1; max=1") }, "amz-sdk-invocation-id": HeaderValue { _private: H0("52344858-f818-4b28-8f7f-08bc1ca4a2a4") }} } }
2024-08-06T10:20:06.370488Z DEBUG sink{component_kind="sink" component_id=wp_logs_s3_sink component_type=aws_s3}:request{request_id=1}:invoke{service=s3 operation=PutObject sdk_invocation_id=5648678}:try_op:try_attempt: aws_smithy_runtime::client::http::body::minimum_throughput: applying minimum upload throughput check future options=MinimumThroughputBodyOptions { minimum_throughput: Throughput { bytes_read: 1, per_time_elapsed: 1s }, grace_period: 5s, check_window: 1s }
2024-08-06T10:20:06.370541Z TRACE sink{component_kind="sink" component_id=wp_logs_s3_sink component_type=aws_s3}:request{request_id=1}:invoke{service=s3 operation=PutObject sdk_invocation_id=5648678}:try_op:try_attempt: hyper::client::pool: checkout waiting for idle connection: ("https", s3.us-east-1.amazonaws.com)
2024-08-06T10:20:06.370639Z TRACE sink{component_kind="sink" component_id=wp_logs_s3_sink component_type=aws_s3}:request{request_id=1}:invoke{service=s3 operation=PutObject sdk_invocation_id=5648678}:try_op:try_attempt: hyper::client::connect::http: Http::connect; scheme=Some("https"), host=Some("s3.us-east-1.amazonaws.com"), port=None
2024-08-06T10:20:06.370753Z DEBUG hyper::client::connect::dns: resolving host="s3.us-east-1.amazonaws.com"
2024-08-06T10:20:06.383200Z DEBUG sink{component_kind="sink" component_id=wp_logs_s3_sink component_type=aws_s3}:request{request_id=1}:invoke{service=s3 operation=PutObject sdk_invocation_id=5648678}:try_op:try_attempt: hyper::client::connect::http: connecting to 52.216.145.221:443
2024-08-06T10:20:06.472344Z TRACE sink{component_kind="sink" component_id=wp_logs_s3_sink component_type=aws_s3}:request{request_id=1}:invoke{service=s3 operation=PutObject sdk_invocation_id=5648678}:try_op:try_attempt: aws_smithy_runtime::client::http::body::minimum_throughput: not enough data to decide if minimum throughput has been violated
2024-08-06T10:20:06.573302Z TRACE sink{component_kind="sink" component_id=wp_logs_s3_sink component_type=aws_s3}:request{request_id=1}:invoke{service=s3 operation=PutObject sdk_invocation_id=5648678}:try_op:try_attempt: aws_smithy_runtime::client::http::body::minimum_throughput: not enough data to decide if minimum throughput has been violated
2024-08-06T10:20:06.592039Z DEBUG sink{component_kind="sink" component_id=wp_logs_s3_sink component_type=aws_s3}:request{request_id=1}:invoke{service=s3 operation=PutObject sdk_invocation_id=5648678}:try_op:try_attempt: hyper::client::connect::http: connected to 52.216.145.221:443
2024-08-06T10:20:06.674614Z TRACE sink{component_kind="sink" component_id=wp_logs_s3_sink component_type=aws_s3}:request{request_id=1}:invoke{service=s3 operation=PutObject sdk_invocation_id=5648678}:try_op:try_attempt: aws_smithy_runtime::client::http::body::minimum_throughput: not enough data to decide if minimum throughput has been violated
2024-08-06T10:20:06.776308Z TRACE sink{component_kind="sink" component_id=wp_logs_s3_sink component_type=aws_s3}:request{request_id=1}:invoke{service=s3 operation=PutObject sdk_invocation_id=5648678}:try_op:try_attempt: aws_smithy_runtime::client::http::body::minimum_throughput: not enough data to decide if minimum throughput has been violated
2024-08-06T10:20:06.877325Z TRACE sink{component_kind="sink" component_id=wp_logs_s3_sink component_type=aws_s3}:request{request_id=1}:invoke{service=s3 operation=PutObject sdk_invocation_id=5648678}:try_op:try_attempt: aws_smithy_runtime::client::http::body::minimum_throughput: not enough data to decide if minimum throughput has been violated
2024-08-06T10:20:06.978343Z TRACE sink{component_kind="sink" component_id=wp_logs_s3_sink component_type=aws_s3}:request{request_id=1}:invoke{service=s3 operation=PutObject sdk_invocation_id=5648678}:try_op:try_attempt: aws_smithy_runtime::client::http::body::minimum_throughput: not enough data to decide if minimum throughput has been violated

I then unset both RUST_LOG and VECTOR_LOG to filter out the trace noise, and got the WARN-level logs below:

2024-08-06T10:41:46.391810Z  WARN sink{component_kind="sink" component_id=wp_logs_s3_sink component_type=aws_s3}:request{request_id=1}: vector::sinks::util::retries: Retrying after error. error=dispatch failure internal_log_rate_limit=true
2024-08-06T10:41:47.791739Z  WARN sink{component_kind="sink" component_id=wp_logs_s3_sink component_type=aws_s3}:request{request_id=1}: vector::sinks::util::retries: Internal log [Retrying after error.] is being suppressed to avoid flooding.
2024-08-06T10:41:59.340587Z  WARN sink{component_kind="sink" component_id=wp_logs_s3_sink component_type=aws_s3}:request{request_id=1}: vector::sinks::util::retries: Internal log [Retrying after error.] has been suppressed 5 times.
2024-08-06T10:41:59.340644Z  WARN sink{component_kind="sink" component_id=wp_logs_s3_sink component_type=aws_s3}:request{request_id=1}: vector::sinks::util::retries: Retrying after error. error=dispatch failure internal_log_rate_limit=true
2024-08-06T10:42:02.638699Z  WARN sink{component_kind="sink" component_id=wp_logs_s3_sink component_type=aws_s3}:request{request_id=1}: vector::sinks::util::retries: Internal log [Retrying after error.] is being suppressed to avoid flooding.
2024-08-06T10:42:20.896324Z  WARN sink{component_kind="sink" component_id=wp_logs_s3_sink component_type=aws_s3}:request{request_id=1}: vector::sinks::util::retries: Internal log [Retrying after error.] has been suppressed 1 times.
2024-08-06T10:42:20.896378Z  WARN sink{component_kind="sink" component_id=wp_logs_s3_sink component_type=aws_s3}:request{request_id=1}: vector::sinks::util::retries: Retrying after error. error=dispatch failure internal_log_rate_limit=true
2024-08-06T10:42:32.296972Z  WARN sink{component_kind="sink" component_id=wp_logs_s3_sink component_type=aws_s3}:request{request_id=1}: vector::sinks::util::retries: Retrying after error. error=dispatch failure internal_log_rate_limit=true
2024-08-06T10:42:50.259249Z  WARN sink{component_kind="sink" component_id=wp_logs_s3_sink component_type=aws_s3}:request{request_id=1}: vector::sinks::util::retries: Retrying after error. error=dispatch failure internal_log_rate_limit=true
2024-08-06T10:42:56.355776Z  WARN sink{component_kind="sink" component_id=wp_logs_s3_sink component_type=aws_s3}:request{request_id=1}: vector::sinks::util::retries: Internal log [Retrying after error.] is being suppressed to avoid flooding.
2024-08-06T10:43:11.253607Z  WARN sink{component_kind="sink" component_id=wp_logs_s3_sink component_type=aws_s3}:request{request_id=1}: vector::sinks::util::retries: Internal log [Retrying after error.] has been suppressed 1 times.
2024-08-06T10:43:11.253661Z  WARN sink{component_kind="sink" component_id=wp_logs_s3_sink component_type=aws_s3}:request{request_id=1}: vector::sinks::util::retries: Retrying after error. error=dispatch failure internal_log_rate_limit=true

I also tested with the AWS CLI and a sample file, just to make sure my permissions were correct. The sample file uploaded successfully using the command below:

aws s3 cp <SAMPLE_FILE> s3://<MY_BUCKET>/<KEY_PREFIX>/ --no-sign-request

I thought I'd post these here now, so that if there is an issue, it can be fixed in the PR.

jszwedko (Member Author) commented Aug 9, 2024

@rams3sh Thanks for trying this out proactively! That's unfortunate to hear it doesn't seem to be working for you. I'll try to reproduce and see if I can figure it out this upcoming week.
