Implement S3 Support #456


Open · eberle1080 wants to merge 11 commits into main

Conversation

eberle1080 commented Aug 4, 2025

This PR implements S3 support. It stores messages as objects in S3 and allows certain attributes to be configured, such as:

  • Bucket name
  • Region
  • Object prefix + suffix
  • Some control over dynamic object naming (admittedly this could be more flexible)
  • Storage class
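
For illustration, a destination config exercising these attributes might look like the sketch below; the snake_case key names are assumptions inferred from the struct and metadata files shown later in this review, not the canonical schema.

// Hypothetical destination config; key names are assumed, not canonical.
var exampleConfig = map[string]string{
    "bucket":            "my-events-bucket",
    "region":            "us-east-1",
    "key_prefix":        "events/",
    "key_suffix":        ".json",
    "include_timestamp": "true", // embed a timestamp in the object key
    "include_event_id":  "true", // embed the event ID in the object key
    "storage_class":     "STANDARD_IA",
}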


vercel bot commented Aug 4, 2025

@eberle1080 is attempting to deploy a commit to the Hookdeck Team on Vercel.

A member of the Team first needs to authorize it.

eberle1080 marked this pull request as ready for review August 5, 2025 18:58
Copilot AI review requested due to automatic review settings August 5, 2025 18:58
Copilot AI (Contributor) left a comment

Pull Request Overview

This PR implements comprehensive AWS S3 support for Outpost, adding the ability to store events in Amazon S3 buckets as a destination type. The implementation includes validation, configurable object key generation, storage class support, and proper error handling.

Key changes include:

  • Complete S3 provider implementation with validation and publisher functionality
  • Comprehensive test coverage for validation and formatting
  • Integration with the destination registry and metadata system

Reviewed Changes

Copilot reviewed 17 out of 18 changed files in this pull request and generated 4 comments.

File summary:

  • internal/destregistry/providers/destawss3/destawss3.go: Core S3 provider implementation with validation, configuration resolution, and publisher
  • internal/destregistry/providers/destawss3/destawss3_validate_test.go: Comprehensive validation tests for S3 destination configuration
  • internal/destregistry/providers/destawss3/destawss3_format_test.go: Tests for S3 object formatting and key generation
  • internal/destregistry/providers/default.go: Registration of the S3 provider in the default provider registry
  • internal/destregistry/metadata/providers/aws_s3/metadata.json: Metadata definition for S3 destination configuration fields
  • internal/destregistry/metadata/providers/aws_s3/instructions.md: User documentation for configuring S3 destinations
  • go.mod: AWS SDK v2 dependencies and version updates
  • examples/docker-compose/compose-awss3.yml: Docker Compose example for S3 with LocalStack
  • docs/pages/overview.mdx: Documentation updates to include S3 in supported destinations
  • docs/apis/openapi.yaml: OpenAPI schema definitions for S3 destination configuration
  • cmd/destinations/awss3/main.go: Demo application for monitoring S3 bucket objects
  • build/test/compose.yml: Test environment updates to include the S3 service

Comment on lines +216 to +224
hasher := sha256.New()

if _, err := hasher.Write(payload); err != nil {
return "", err
}

checksum := hasher.Sum(nil)

return base64.StdEncoding.EncodeToString(checksum), nil
Copilot AI commented Aug 5, 2025

The checksum calculation in getChecksums uses sha256.New() and hasher.Write(), but the Format method already calls sha256.Sum256() directly. Consider using a consistent approach - either use the crypto/sha256 package consistently or extract the checksum logic to a helper function to avoid different implementations.

Suggested change
- hasher := sha256.New()
- if _, err := hasher.Write(payload); err != nil {
-     return "", err
- }
- checksum := hasher.Sum(nil)
- return base64.StdEncoding.EncodeToString(checksum), nil
+ checksum := sha256.Sum256(payload)
+ return base64.StdEncoding.EncodeToString(checksum[:]), nil


Comment on lines +52 to +53
hasher := sha256.Sum256(data)
expectedChecksum := base64.StdEncoding.EncodeToString(hasher[:])
Copilot AI commented Aug 5, 2025

The test uses sha256.Sum256() while the actual implementation in getChecksums() uses sha256.New() and hasher.Write(). This inconsistency could lead to test failures if the implementations produce different results. Use the same approach as the production code for consistency.

Suggested change
- hasher := sha256.Sum256(data)
- expectedChecksum := base64.StdEncoding.EncodeToString(hasher[:])
+ hasher := sha256.New()
+ hasher.Write(data)
+ expectedChecksum := base64.StdEncoding.EncodeToString(hasher.Sum(nil))


{
"key": "include_event_id",
"type": "checkbox",
"label": "Include Event ID in Object Key ",
Copilot AI commented Aug 5, 2025
There is a trailing space after 'Key' in the label text.

Suggested change
- "label": "Include Event ID in Object Key ",
+ "label": "Include Event ID in Object Key",


sensitive: true,
},
]
- type: "aws_s3"
Copilot AI commented Aug 5, 2025

The aws_s3 destination type appears to be duplicated in the OpenAPI example responses (lines 1747-1783 and 1784-1820). This duplication should be removed to avoid confusion and maintain clean API documentation.


alexluong (Collaborator) left a comment

Hi there, thanks a lot for taking the time to open the PR. Everything looks really good from what I can see. My two main notes are:

  1. the supported config, specifically around key configuration; maybe a template would work better, also for consistency with other destinations
  2. we should include the publisher test suite

Happy to assist if you run into any issues or have any further questions.

Comment on lines +27 to +30
KeyPrefix string
KeySuffix string
IncludeTimestamp bool
IncludeEventID bool
alexluong (Collaborator) commented

Should we consider using a template language for this?

We use JMESPath for the AWS Kinesis partition key. I believe JMESPath is decently adopted within the AWS ecosystem.
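
For reference, a minimal sketch of what JMESPath-based key derivation could look like in Go, using the github.com/jmespath/go-jmespath package; the expression and event shape here are illustrative, not Outpost's actual template config.

package main

import (
    "fmt"

    "github.com/jmespath/go-jmespath"
)

func main() {
    event := map[string]any{
        "id":    "evt_1",
        "topic": "user.created",
    }

    // join('/', ...) is a JMESPath builtin; this prints "user.created/evt_1".
    key, err := jmespath.Search("join('/', [topic, id])", event)
    if err != nil {
        panic(err)
    }
    fmt.Println(key)
}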

alexluong (Collaborator) commented

What do you think if we replace these config options with KeyTemplate?

In Kinesis, the input of the template is the full payload document. There's also an Outpost-specific configuration so the payload can be either just data or a map of both data and metadata.

We could do something similar here for consistency's sake.

alexluong (Collaborator) commented

Small note, since I'm thinking on this a bit further: there's a nuance to decide around the input of the template function here. I'll share the Kinesis destination approach first and discuss some pros and cons.

In Kinesis, the event payload is the input of the template function for its partition key. The reasoning is that the structure is something the end user already expects, so they can use it to customize the partition key accordingly. Furthermore, there's an Outpost-specific config (not a destination config) where the operator can decide whether to include metadata in the Kinesis event payload.

Back to S3, we can consider something similar, using event.data as the payload. However, it could miss critical metadata that the user may want in the template, such as event-id, timestamp, or topic.

What do you think is the right balance of information and customization while maintaining an API surface and DX that make sense?

alexluong (Collaborator) commented

For example, say this is the event:

{
    "id": "evt_1",
    "tenant_id": "ten_1",
    "topic": "user.created",
    "eligible_for_retry": false,
    "metadata": {
        "my_metadata": "metadata",
    },
    "data": {
        "my_data": "data"
    }
}

In the current S3 implementation, this would be the payload of the stored event:

{
  "my_data": "data"
}

I'm concerned this may not be sufficient for end users, as they may not be able to come up with a template that is meaningful for them.

We can consider something like this:

{
  "metadata": {
    "my_metadata": "metadata",
    // system metadata
    "event-id": "evt_1",
    "topic": "user.created",
    "timestamp": 123456
  },
  "data": {
    "my_data": "data"
  }
}

This would have enough information, but it's a bit unintuitive and unfamiliar to end users, as they may not understand what is available within metadata itself.

alexbouchardd (Contributor) commented Aug 15, 2025

What I'd explore is setting a default value for the field, which can also be displayed in the UI as a sensible default. That educates the user about the shape and properties of the field. What's the shape of the data actually saved to S3? I'd expect it to include the metadata. Ideally, the data shape used for the template should match the data persisted.

alexluong (Collaborator) commented

In the PR, the shape seems to be just event.data.

alexbouchardd (Contributor) commented

I'd argue it should include the metadata:

{
  "metadata": {
    "my_metadata": "metadata",
    "event-id": "evt_1",
    "topic": "user.created",
    "timestamp": 123456
  },
  "data": {
    "my_data": "data"
  }
}
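
To make the trade-off concrete, here is a hedged sketch of how such a wrapped document could be assembled and fed to a key template; the Event struct, the buildTemplateInput helper, and the JMESPath expression are all hypothetical, not Outpost's actual API.

package main

import (
    "fmt"
    "time"

    "github.com/jmespath/go-jmespath"
)

// Event is a simplified stand-in for Outpost's event model.
type Event struct {
    ID       string
    Topic    string
    Time     time.Time
    Metadata map[string]string
    Data     map[string]any
}

// buildTemplateInput (hypothetical) merges user metadata with system
// metadata and nests the event data, matching the shape discussed above.
func buildTemplateInput(e Event) map[string]any {
    metadata := map[string]any{}
    for k, v := range e.Metadata {
        metadata[k] = v
    }
    metadata["event-id"] = e.ID
    metadata["topic"] = e.Topic
    metadata["timestamp"] = e.Time.Unix()

    return map[string]any{
        "metadata": metadata,
        "data":     e.Data,
    }
}

func main() {
    evt := Event{
        ID:       "evt_1",
        Topic:    "user.created",
        Time:     time.Unix(123456, 0),
        Metadata: map[string]string{"my_metadata": "metadata"},
        Data:     map[string]any{"my_data": "data"},
    }

    // Quoted identifiers let a template reach keys with dashes, e.g. "event-id".
    key, err := jmespath.Search(`join('/', [metadata.topic, metadata."event-id"])`, buildTemplateInput(evt))
    if err != nil {
        panic(err)
    }
    fmt.Println(key) // user.created/evt_1
}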

Comment on lines +13 to +15
awssdk "github.com/aws/aws-sdk-go-v2/aws"
awsconfig "github.com/aws/aws-sdk-go-v2/config"
awscreds "github.com/aws/aws-sdk-go-v2/credentials"
alexluong (Collaborator) commented

Small nit, but is there a reason why we alias these import statements? Do let me know if it's a convention within the Go AWS SDK; I'm curious to learn more!

eberle1080 (Author) commented

Readability, mostly. Just slightly nicer to read. But no, it's not a convention.


_, err := parseStorageClass(sc)
if err != nil {
return nil, nil, fmt.Errorf("invalid storage class %q: %w", sc, err)
alexluong (Collaborator) commented

We should return destregistry.NewErrDestinationValidation if possible; see destwebhook for an example.
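
A hedged sketch of what that could look like in place of the fmt.Errorf above; the NewErrDestinationValidation signature and the ValidationErrorDetail field names are assumptions based on the pointer to destwebhook, so check that provider for the real usage.

// Assumed API shape; mirror destwebhook's actual validation errors.
if _, err := parseStorageClass(sc); err != nil {
    return nil, nil, destregistry.NewErrDestinationValidation([]destregistry.ValidationErrorDetail{
        {
            Field: "config.storage_class", // assumed field path
            Type:  "pattern",              // assumed error type
        },
    })
}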

region := destination.Config["region"]
return destregistry.DestinationTarget{
Target: fmt.Sprintf("%s in %s", bucket, region),
TargetURL: "",
alexluong (Collaborator) commented

Can we construct the AWS console URL of this S3 bucket?
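
The console link follows a stable pattern, so a small helper could fill TargetURL; a minimal sketch, assuming the standard S3 console URL form:

package main

import (
    "fmt"
    "net/url"
)

// s3ConsoleURL builds the AWS console link for a bucket's object listing.
func s3ConsoleURL(bucket, region string) string {
    return fmt.Sprintf(
        "https://s3.console.aws.amazon.com/s3/buckets/%s?region=%s&tab=objects",
        url.PathEscape(bucket), url.QueryEscape(region),
    )
}

func main() {
    // Prints https://s3.console.aws.amazon.com/s3/buckets/my-bucket?region=us-east-1&tab=objects
    fmt.Println(s3ConsoleURL("my-bucket", "us-east-1"))
}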

Bucket: awssdk.String(p.bucket),
Key: awssdk.String(key),
Body: bytes.NewReader(data),
Metadata: event.Metadata,
alexluong (Collaborator) commented

We should use the MakeMetadata function from the base publisher, which includes some Outpost-specific metadata.

@@ -0,0 +1,71 @@
package destawss3_test
alexluong (Collaborator) commented

Besides the format & validation tests, can we also include the publish test? Outpost provides a standardized publisher test suite that all destinations should pass. Happy to share more or elaborate on this if you have any questions.

Comment on lines +20 to +23
- AWS_S3_ENDPOINT=http://aws:4566
- AWS_S3_REGION=us-east-1
- AWS_S3_ACCESS_KEY_ID=test
- AWS_S3_SECRET_ACCESS_KEY=test
alexluong (Collaborator) commented

This is an interesting approach. So we're embedding the env variables so that the SDK will always connect to LocalStack instead of the real AWS environment?

I think it's a pretty neat approach. So far, when implementing AWS destinations, we've supported a hidden "endpoint" config value for this. I'm not sure whether we should support that here too, mainly for consistency's sake, but I do like this idea a lot. That said, I don't imagine this works well for local dev, only for the examples? Any tips from your side?
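
For comparison, a minimal sketch of the endpoint-override approach with aws-sdk-go-v2; the AWS_S3_ENDPOINT variable mirrors the compose file above, and BaseEndpoint plus UsePathStyle are the SDK's standard knobs for pointing the client at LocalStack.

package main

import (
    "context"
    "os"

    awssdk "github.com/aws/aws-sdk-go-v2/aws"
    awsconfig "github.com/aws/aws-sdk-go-v2/config"
    "github.com/aws/aws-sdk-go-v2/service/s3"
)

func newS3Client(ctx context.Context, region string) (*s3.Client, error) {
    cfg, err := awsconfig.LoadDefaultConfig(ctx, awsconfig.WithRegion(region))
    if err != nil {
        return nil, err
    }
    return s3.NewFromConfig(cfg, func(o *s3.Options) {
        // When AWS_S3_ENDPOINT is set (e.g. http://aws:4566), route all
        // requests there; path-style addressing avoids resolving
        // bucket-name subdomains against the fake endpoint.
        if endpoint := os.Getenv("AWS_S3_ENDPOINT"); endpoint != "" {
            o.BaseEndpoint = awssdk.String(endpoint)
            o.UsePathStyle = true
        }
    }), nil
}

func main() {
    if _, err := newS3Client(context.Background(), "us-east-1"); err != nil {
        panic(err)
    }
}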
