Add implementation for CloudFormation Custom Resource Emulator #1806

Merged
merged 7 commits into from
Nov 8, 2024

Conversation

flostadler
Contributor

@flostadler flostadler commented Nov 7, 2024

This PR adds support for CloudFormation Custom Resource to the aws-native provider. It implements an emulator that enables Pulumi programs to interact with Lambda-backed CloudFormation Custom Resources.

A CloudFormation custom resource is essentially an extension point to run arbitrary code as part of the CloudFormation lifecycle. It is similar in concept to the Pulumi Command provider; the difference is that CloudFormation custom resources are executed in the cloud, either through Lambda or SNS.

For the first implementation we decided to limit the scope to Lambda-backed custom resources, because the SNS variants are not widely used.

Custom Resource Protocol

The implementation follows the CloudFormation Custom Resource protocol. I derived the necessary parts by combining information from the docs, the CDK's Custom Resource framework, and trial and error.
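
For orientation, the documents exchanged under this protocol can be sketched as Go structs. The JSON field names follow the publicly documented CloudFormation custom resource request and response objects; the Go type names and the `parseResponse` helper are illustrative assumptions and are not taken from this PR's code.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Request mirrors the event sent to a custom resource handler.
// (Type name is hypothetical; field names follow the CloudFormation docs.)
type Request struct {
	RequestType           string         `json:"RequestType"` // "Create", "Update" or "Delete"
	ResponseURL           string         `json:"ResponseURL"` // presigned S3 URL for the handler's response
	StackID               string         `json:"StackId"`
	RequestID             string         `json:"RequestId"`
	ResourceType          string         `json:"ResourceType"`
	LogicalResourceID     string         `json:"LogicalResourceId"`
	PhysicalResourceID    string         `json:"PhysicalResourceId,omitempty"` // absent on Create
	ResourceProperties    map[string]any `json:"ResourceProperties,omitempty"`
	OldResourceProperties map[string]any `json:"OldResourceProperties,omitempty"` // Update only
}

// Response mirrors the document the handler uploads to ResponseURL.
type Response struct {
	Status             string         `json:"Status"` // "SUCCESS" or "FAILED"
	Reason             string         `json:"Reason,omitempty"`
	PhysicalResourceID string         `json:"PhysicalResourceId"`
	StackID            string         `json:"StackId"`
	RequestID          string         `json:"RequestId"`
	LogicalResourceID  string         `json:"LogicalResourceId"`
	NoEcho             bool           `json:"NoEcho,omitempty"`
	Data               map[string]any `json:"Data,omitempty"`
}

// parseResponse decodes a response document and turns a non-SUCCESS
// status into a Go error that carries the handler's reason.
func parseResponse(body []byte) (*Response, error) {
	var r Response
	if err := json.Unmarshal(body, &r); err != nil {
		return nil, fmt.Errorf("decoding response: %w", err)
	}
	if r.Status != "SUCCESS" {
		return nil, fmt.Errorf("custom resource failed (%s): %s", r.Status, r.Reason)
	}
	return &r, nil
}

func main() {
	body := []byte(`{"Status":"SUCCESS","PhysicalResourceId":"id-1","Data":{"k":"v"}}`)
	r, err := parseResponse(body)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(r.PhysicalResourceID, r.Data["k"])
}
```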

Notable aspects of that protocol are:

Custom Resource Lifecycle

sequenceDiagram
    participant A as aws-native
    participant S3 as S3 Bucket
    participant L as Lambda
    
    %% Create Flow
    Note over A,L: Create Operation
    A->>S3: Generate presigned URL
    A->>L: Invoke with CREATE event
    activate L
    loop Until response found or timeout
        A->>S3: Poll for response
        L-->>S3: Upload response
    end
    deactivate L
    A->>S3: Fetch response
    alt Success
        A->>A: Store PhysicalId & outputs
    else Failure
        A->>A: Return error
    end

    %% Update Flow
    Note over A,L: Update Operation
    A->>S3: Generate presigned URL
    A->>L: Invoke with UPDATE event
    activate L
    loop Until response found or timeout
        A->>S3: Poll for response
        L-->>S3: Upload response
    end
    deactivate L
    A->>S3: Fetch response
    alt Success
        A->>A: Check PhysicalId
        alt ID Changed
            A->>S3: Generate presigned URL for cleanup
            A->>L: Invoke with DELETE event for old resource
            activate L
            loop Until cleanup response found or timeout
                A->>S3: Poll for cleanup response
                L-->>S3: Upload cleanup response
            end
            deactivate L
            A->>S3: Fetch cleanup response
        end
    else Failure
        A->>A: Return error
    end

    %% Delete Flow
    Note over A,L: Delete Operation
    A->>S3: Generate presigned URL
    A->>L: Invoke with DELETE event
    activate L
    loop Until response found or timeout
        A->>S3: Poll for response
        L-->>S3: Upload response
    end
    deactivate L
    A->>S3: Fetch response
    alt Success
        A->>A: Return success
    else Failure
        A->>A: Return error
    end
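
The "poll until response found or timeout" loop that appears in all three flows can be sketched roughly as follows. This is a minimal sketch and not the code from this PR: `fetchFunc`, `waitForResponse` and `simulate` are hypothetical names, and the S3 read is replaced by a fake so the example only needs the standard library.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// fetchFunc is a hypothetical stand-in for "read the response object from S3".
// It reports found=false while the Lambda has not uploaded its response yet.
type fetchFunc func(ctx context.Context) (body []byte, found bool, err error)

// waitForResponse polls fetch until a response is found, fetch fails,
// or the timeout elapses. interval must be greater than zero.
func waitForResponse(ctx context.Context, fetch fetchFunc, interval, timeout time.Duration) ([]byte, error) {
	ctx, cancel := context.WithTimeout(ctx, timeout)
	defer cancel()

	ticker := time.NewTicker(interval)
	defer ticker.Stop()

	for {
		body, found, err := fetch(ctx)
		if err != nil {
			return nil, fmt.Errorf("polling for response: %w", err)
		}
		if found {
			return body, nil
		}
		select {
		case <-ctx.Done():
			return nil, fmt.Errorf("waiting for response: %w", ctx.Err())
		case <-ticker.C:
			// Next poll.
		}
	}
}

// simulate runs waitForResponse against a fake fetch whose response
// "appears" on the given poll, and returns how many polls were made.
func simulate(appearsOnPoll int, interval, timeout time.Duration) (int, error) {
	polls := 0
	fetch := func(context.Context) ([]byte, bool, error) {
		polls++
		return []byte(`{"Status":"SUCCESS"}`), polls >= appearsOnPoll, nil
	}
	_, err := waitForResponse(context.Background(), fetch, interval, timeout)
	return polls, err
}

func main() {
	polls, err := simulate(3, 10*time.Millisecond, time.Second)
	fmt.Println(polls, err)
}
```

Deriving the deadline from a context keeps the timeout and caller cancellation on the same code path, which is one way to address the timeout-management concern; the real implementation may do this differently.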

Reviewer Notes

Key areas to review:

  1. Error handling in the response collection mechanism
  2. Timeout management, especially for the Update lifecycle
  3. Documentation completeness and accuracy

Exposing this resource and schematizing it is handled in a separate PR, #1807.
Automatically cleaning up the response objects is not included in this PR, to keep its size manageable. That work is tracked in #1813.

Please pay special attention to:

  • S3 response collection mechanism security
  • State management during updates
  • Cleanup handling when physical resource IDs change
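
To make the last point concrete, here is a rough sketch of the cleanup rule from the Update flow: when the handler reports a different PhysicalResourceId, a Delete event is sent for the old one. `invokeFunc` and `update` are hypothetical names, and surfacing a cleanup failure as an error is an assumption of this sketch rather than a statement about how the PR behaves.

```go
package main

import "fmt"

// invokeFunc is a hypothetical stand-in for "invoke the Lambda and collect
// its response". It returns the PhysicalResourceId reported by the handler.
type invokeFunc func(requestType, physicalID string) (reportedID string, err error)

// update sends an Update event and, if the handler reports a different
// PhysicalResourceId, sends a Delete event for the old resource.
func update(invoke invokeFunc, oldID string) (string, error) {
	newID, err := invoke("Update", oldID)
	if err != nil {
		return "", fmt.Errorf("update failed: %w", err)
	}
	if newID != oldID {
		// The resource was replaced, so the old one needs to be cleaned up.
		// Assumption: a failed cleanup is reported to the caller.
		if _, err := invoke("Delete", oldID); err != nil {
			return newID, fmt.Errorf("cleanup of %q failed: %w", oldID, err)
		}
	}
	return newID, nil
}

func main() {
	// Fake handler: every Update "replaces" the resource and reports a new ID.
	invoke := func(requestType, physicalID string) (string, error) {
		fmt.Printf("%s event for %s\n", requestType, physicalID)
		if requestType == "Update" {
			return physicalID + "-v2", nil
		}
		return physicalID, nil
	}
	id, err := update(invoke, "res-1")
	fmt.Println(id, err)
}
```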

Testing

Related Issues

@flostadler
Contributor Author

flostadler commented Nov 7, 2024

Contributor

github-actions bot commented Nov 7, 2024

Does the PR have any schema changes?

Looking good! No breaking changes found.
No new resources/functions.

@flostadler flostadler changed the base branch from flostadler/cfn-custom-resource-aws-clients to flostadler/cfn-custom-resource-helpers November 7, 2024 17:32
@flostadler flostadler force-pushed the flostadler/cfn-custom-resource-impl branch from baf7ba4 to 1d4b118 Compare November 7, 2024 17:32

codecov bot commented Nov 7, 2024

Codecov Report

Attention: Patch coverage is 59.43152% with 157 lines in your changes missing coverage. Please review.

Project coverage is 48.13%. Comparing base (04539b4) to head (b184dc0).
Report is 1 commits behind head on master.

Files with missing lines Patch % Lines
provider/pkg/resources/cfn_custom_resource.go 59.43% 142 Missing and 15 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1806      +/-   ##
==========================================
+ Coverage   46.81%   48.13%   +1.32%     
==========================================
  Files          42       43       +1     
  Lines        6167     6554     +387     
==========================================
+ Hits         2887     3155     +268     
- Misses       3052     3156     +104     
- Partials      228      243      +15     


@flostadler flostadler self-assigned this Nov 7, 2024
@flostadler flostadler requested review from t0yv0, corymhall and a team November 7, 2024 18:06
@flostadler flostadler force-pushed the flostadler/cfn-custom-resource-impl branch from 59357d1 to 94cfa06 Compare November 8, 2024 10:19
@flostadler flostadler marked this pull request as ready for review November 8, 2024 11:54
@flostadler flostadler force-pushed the flostadler/cfn-custom-resource-helpers branch from 60b5047 to 4cd4ddb Compare November 8, 2024 12:01
@flostadler flostadler force-pushed the flostadler/cfn-custom-resource-impl branch from 94cfa06 to 19b72f2 Compare November 8, 2024 12:01
Base automatically changed from flostadler/cfn-custom-resource-helpers to master November 8, 2024 13:52
@flostadler flostadler force-pushed the flostadler/cfn-custom-resource-impl branch from 19b72f2 to bc9e116 Compare November 8, 2024 13:53
Contributor

@corymhall corymhall left a comment

Just a couple of minor comments, otherwise this is awesome! Great work!

// - Initiates cleanup of old resource
// - Sends DELETE event for old PhysicalResourceId
// 5. Returns updated properties and new PhysicalResourceId
func (c *cfnCustomResource) Update(ctx context.Context, urn urn.URN, id string, inputs, oldInputs, state resource.PropertyMap, timeout time.Duration) (resource.PropertyMap, error) {
Contributor

Is there a way for us to get the resource options here, specifically the retainOnDelete option? I know there were times that I would set the retainOnDelete to true for custom resources because they didn't handle delete well.

Contributor Author

Member

Only MLCs have access to retainOnDelete it seems:
https://github.com/pulumi/pulumi/blob/master/proto/pulumi/provider.proto#L473

For CustomResources that are traditional I don't think the engine ever cooperates with the provider on this. The engine handles it all engine-side.

},
"serviceToken": {
Description: "The service token to use for the Custom Resource. The service token is invoked when the resource is created, updated, or deleted.\n" +
"This can be a Lambda Function ARN with optional version or alias identifiers.\n\n" +
Member

This sentence here is a bit confusing. "This can be a Lambda Function ARN", perhaps give a quick example?

Also, should this be marked Secret/Sensitive in the schema?

Contributor Author

Lambda function ARNs (including alias or version) are a very common concept in AWS. I don't think we should go into too much detail about it here.

This is also not a secret, ARNs are just unique IDs

Member

I was confused that this is called "serviceToken". This made me think of access tokens or some such. If there's precedent of calling lambda ARNs "tokens" then TIL.

Contributor Author

I also don't like the name... It's what CloudFormation called this property. I think there's precedent for sticking to their naming patterns, given we're trying to emulate their behavior/API.
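
For readers unfamiliar with the format, a Lambda function ARN with an optional version or alias qualifier can be taken apart as sketched below. `lambdaARN` and `parseLambdaARN` are hypothetical names for illustration; the provider may not parse the service token this way at all.

```go
package main

import (
	"fmt"
	"strings"
)

// lambdaARN holds the parts of a Lambda function ARN (hypothetical type).
type lambdaARN struct {
	Region, Account, Function, Qualifier string
}

// parseLambdaARN splits an ARN of the form
// arn:<partition>:lambda:<region>:<account>:function:<name>[:<version-or-alias>].
func parseLambdaARN(arn string) (lambdaARN, error) {
	parts := strings.Split(arn, ":")
	if len(parts) < 7 || len(parts) > 8 ||
		parts[0] != "arn" || parts[2] != "lambda" || parts[5] != "function" {
		return lambdaARN{}, fmt.Errorf("not a Lambda function ARN: %q", arn)
	}
	out := lambdaARN{Region: parts[3], Account: parts[4], Function: parts[6]}
	if len(parts) == 8 {
		out.Qualifier = parts[7] // version number or alias name
	}
	return out, nil
}

func main() {
	for _, token := range []string{
		"arn:aws:lambda:us-west-2:123456789012:function:my-function",
		"arn:aws:lambda:us-west-2:123456789012:function:my-function:prod",
	} {
		parsed, err := parseLambdaARN(token)
		fmt.Printf("%+v %v\n", parsed, err)
	}
}
```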

// Read returns the current inputs and outputs of the custom resource because CFN custom resources do not store state.
// They are just a stateless wrapper around a Lambda function or SNS topic.
func (c *cfnCustomResource) Read(ctx context.Context, urn urn.URN, id string, oldInputs resource.PropertyMap, oldState resource.PropertyMap) (resource.PropertyMap, resource.PropertyMap, bool, error) {
if len(oldState) == 0 {
Member

Add a comment here that the len(oldState) check is trying to guess whether we are in a Read-for-import.
The developer docs are not quite there yet but are being improved (cc @lunaris).

https://pulumi-developer-docs.readthedocs.io/latest/developer-docs/providers/implementers-guide.html#read


type Clock interface {
Now() time.Time
Since(time.Time) time.Duration
Member

Nit: Go likes minimal interfaces so you only really need Now(), as Since can be computed from Now in a func.

Contributor Author

Yes, but that would clutter the implementation. The purpose of this interface is to make the code testable. I'd rather just pass through to the stdlib than add custom code

Contributor Author

Mhm, I guess this is a bit of a moot point from my side given that you can subtract times and get the duration that way

Member

Yeah it's really a small nit, feel free to disregard.
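
A sketch of the minimal-interface variant discussed in this thread: only `Now()` is part of the interface, and the elapsed time is computed by subtracting times. The `since` helper, `realClock` and `fakeClock` are illustrative names, not necessarily what the PR ended up with.

```go
package main

import (
	"fmt"
	"time"
)

// Clock is the minimal interface: only Now is required.
type Clock interface {
	Now() time.Time
}

// realClock passes straight through to the standard library.
type realClock struct{}

func (realClock) Now() time.Time { return time.Now() }

// Compile-time check that realClock satisfies Clock.
var _ Clock = realClock{}

// since derives the elapsed time from Now, so Since does not
// need to be part of the interface.
func since(c Clock, start time.Time) time.Duration {
	return c.Now().Sub(start)
}

// fakeClock is a test double whose time only moves when Advance is called.
type fakeClock struct{ now time.Time }

func (f *fakeClock) Now() time.Time          { return f.now }
func (f *fakeClock) Advance(d time.Duration) { f.now = f.now.Add(d) }

func main() {
	fc := &fakeClock{now: time.Date(2024, 11, 8, 0, 0, 0, 0, time.UTC)}
	start := fc.Now()
	fc.Advance(90 * time.Second)
	fmt.Println(since(fc, start)) // prints "1m30s"
}
```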

// It returns the inputs, validation failures, and an error if the inputs cannot be unmarshalled.
func (c *cfnCustomResource) Check(ctx context.Context, urn urn.URN, randomSeed []byte, inputs resource.PropertyMap, state resource.PropertyMap, defaultTags map[string]string) (resource.PropertyMap, []ValidationFailure, error) {
var typedInputs CfnCustomResourceInputs
_, err := resourcex.Unmarshal(&typedInputs, inputs, resourcex.UnmarshalOptions{})
Member

I do not fully trust this function, and a quick read of the docs does not resolve some of the concerns I have.

I think we might need to think about this a bit (probably out of scope for this PR, but before we can be fully confident).

What happens with unknowns, dependencies and secrets?

For secrets it's a design question: custom resources do not support them, right? So we need to support secrets through the black-box transformation, I think?

Contributor Author

We're already quite heavily using the resourcex.Decode and resourcex.Unmarshal methods in the provider. From what I can tell they work fine.

Additionally we're not using the unmarshalled typedInputs to construct the outputs, but rather use the inputs resource.PropertyMap to construct the outputs.
From that aspect, we're properly handling secrets.

Unknowns are a good point though. I'll try to double check whether this can receive unknowns if SupportsPreview: false.

// - On success: Returns PhysicalResourceId and properties
// - On failure: Returns error with reason
// - Handles `NoEcho` for sensitive data
func (c *cfnCustomResource) Create(ctx context.Context, urn urn.URN, inputs resource.PropertyMap, timeout time.Duration) (*string, resource.PropertyMap, error) {
Member

I was wondering if this is ever called in Preview, but looks like the provider is SupportsPreview: false so we're good. It never is! No unknowns then.


// Check validates the inputs of the resource and applies default values if necessary.
// It returns the inputs, validation failures, and an error if the inputs cannot be unmarshalled.
func (c *cfnCustomResource) Check(ctx context.Context, urn urn.URN, randomSeed []byte, inputs resource.PropertyMap, state resource.PropertyMap, defaultTags map[string]string) (resource.PropertyMap, []ValidationFailure, error) {
Member

I think this might still receive unknowns but I'm not 100% sure.

Contributor Author

I'll double check. It's not a big deal if we stick to using the PropertyMap for Check

},
},
{
name: "SecretResponse",
Member

Would it be possible to see here what is returned to the engine from Create in this test case?

Perhaps with hexops/autogold if it's too much trouble to write by hand.

Contributor Author

We're asserting this with the assertion in line 348. In case noEcho is set to true, we expect the data prop to be marked as a secret.

Member

Ah ok!
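
The behavior being asserted can be illustrated with a small standalone sketch: when the handler's response sets NoEcho, the data property is marked as secret. Everything here is a simplified stand-in. The `output` type, `buildOutputs`, and the `physicalResourceId` key are assumptions for illustration; the provider itself works with Pulumi property maps rather than this wrapper.

```go
package main

import "fmt"

// output is a hypothetical stand-in for a property value plus a secret marker.
type output struct {
	Value  any
	Secret bool
}

// buildOutputs assembles the outputs returned from Create and marks the
// handler's data as secret when the response had NoEcho set.
func buildOutputs(physicalID string, data map[string]any, noEcho bool) map[string]output {
	return map[string]output{
		"physicalResourceId": {Value: physicalID},
		"data":               {Value: data, Secret: noEcho},
	}
}

func main() {
	out := buildOutputs("id-1", map[string]any{"password": "hunter2"}, true)
	fmt.Println("data is secret:", out["data"].Secret)
	fmt.Println("physicalResourceId is secret:", out["physicalResourceId"].Secret)
}
```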


expectedError := fmt.Errorf("failed to invoke lambda")
stackID := "stack-id"
serviceToken := "arn:aws:lambda:us-west-2:123456789012:function:my-function"
Member

Here's the serviceToken example, I see. Indeed it looks like a lambda ARN. Interesting.

Member

@t0yv0 t0yv0 left a comment

🚢

@flostadler flostadler force-pushed the flostadler/cfn-custom-resource-impl branch from cdfdb9b to 914125c Compare November 8, 2024 16:49
@flostadler flostadler merged commit 14a21ae into master Nov 8, 2024
18 checks passed
@flostadler flostadler deleted the flostadler/cfn-custom-resource-impl branch November 8, 2024 18:06
3 participants