Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix an issue where provider config didn't handle complex types #172

Closed
wants to merge 3 commits into from

Conversation

blampe
Copy link
Contributor

@blampe blampe commented Dec 22, 2023

Some of our languages, specifically Node and Python, send JSON-encoded properties in CheckConfig and Configure calls. We don't currently handle this situation, so these clients aren't able to interact with our providers if they happen to have non-trivial configurations.

YAML doesn't share this behavior, which is why the existing gRPC tests didn't catch this problem. It's not clear to me why Node and Python do this, or if it's possible to change their behavior, but that might be a separate conversation.

This PR adds a node SDK for the gRPC tests along with failing test cases:

Diagnostics:
  pulumi:providers:config (provider):
    error: pulumi:providers:config resource 'provider': property b value {true} has a problem: Field 'b' on 'config.Config' must be a 'bool'; got 'string' instead
    error: pulumi:providers:config resource 'provider': property i value {42} has a problem: Field 'i' on 'config.Config' must be a 'int'; got 'string' instead
    error: pulumi:providers:config resource 'provider': property m value {{"fizz":"buzz"}} has a problem: Field 'm' on 'config.Config' must be a 'map[string]string'; got 'string' instead
    error: pulumi:providers:config resource 'provider': property a value {&{{["one","two"]}}} has a problem: Field 'a' on 'config.Config' must be a '[]string'; got 'string' instead
    error: pulumi:providers:config resource 'provider': property n value {&{{{"s":"foo","b":true,"i":42,"m":{"fizz":"buzz"},"a":["one","two"]}}}} has a problem: Field 'n' on 'config.Config' must be a 'config.ConfigNested'; got 'string' instead
    error: pulumi:providers:config resource 'provider': property an value {&{{[{"s":"bar","b":false,"i":7,"m":{"fizz":"boo"},"a":["three"]}]}}} has a problem: Field 'an' on 'config.Config' must be a '[]config.ConfigNested'; got 'string' instead
    error: pulumi:providers:config resource 'provider': property db value {true} has a problem: Field 'db' on 'config.Config' must be a 'bool'; got 'string' instead
    error: pulumi:providers:config resource 'provider': property di value {42} has a problem: Field 'di' on 'config.Config' must be a 'int'; got 'string' instead

To address the underlying problem, we vendor tfbridge's ConfigEncoding as an internal package with minor modifications to make it compatible with the go-provider. As far as I can tell only the unmarshal logic is needed, but I'd like @t0yv0 to weigh in as I'm sure there must be a reason for the custom marshaling...

Fixes #171

@t0yv0
Copy link
Member

t0yv0 commented Dec 26, 2023

@blampe out of curiosity and concern that we have some more latent bugs around this, can you shed some light on how you're arriving at the "YAML doesn't share this behavior" conclusion? I've tried the following program:

name: aws-yaml-configs
runtime: yaml
description: A minimal AWS Pulumi YAML program
outputs:
  # Export the name of the bucket
  bucketName: ${my-bucket.id}
resources:
  provider:
    type: pulumi:providers:aws
    properties:
      defaultTags:
        tags:
          foo: bar
    options:
      version: 6.17.0
  my-bucket:
    type: aws:s3:Bucket
    options:
      provider: ${provider}

According to PULUMI_DEBUG_GRPC logs, all of Configure, CheckConfig and DiffConfig methods are receiving JSON-in-proto encoded "defaultTags" value for this program, I'm listing CheckConfig below:

{
  "method": "/pulumirpc.ResourceProvider/CheckConfig",
  "request": {
    "urn": "urn:pulumi:dev::aws-yaml-configs::pulumi:providers:aws::provider",
    "olds": {
      "defaultTags": "{\"tags\":{\"foo\":\"bar\"}}",
      "skipCredentialsValidation": "false",
      "skipMetadataApiCheck": "true",
      "skipRegionValidation": "true",
      "version": "6.17.0"
    },
    "news": {
      "defaultTags": {
        "tags": {
          "foo": "bar"
        }
      },
      "version": "6.17.0"
    }
  },
  "response": {
    "inputs": {
      "defaultTags": "{\"tags\":{\"foo\":\"bar\"}}",
      "skipCredentialsValidation": "false",
      "skipMetadataApiCheck": "true",
      "skipRegionValidation": "true",
      "version": "6.17.0"
    }
  },
  "metadata": {
    "kind": "resource",
    "mode": "client",
    "name": "aws"
  }
}

So it appears to me that YAML is consistent with Node here (on 3.97.0). I've dug around and I don't think we have a request on file to change this behavior to pass the data using the more intuitive direct encoding, which certainly would be preferable, but costly due to bw-compat requirements. The closest I found is pulumi/pulumi#12773 I filed a while back, where this encoding puts functional limitations on how we can mark secrets.

@t0yv0
Copy link
Member

t0yv0 commented Dec 26, 2023

minor modifications to make it compatible with the go-provider

Mind calling these out for easier review? Thanks a ton!

@@ -0,0 +1 @@
{"name":"config","version":"0.1.0","config":{"variables":{"a":{"type":"array","items":{"type":"string"}},"an":{"type":"array","items":{"$ref":"#/types/config:index:ConfigNested"}},"b":{"type":"boolean","defaultInfo":{"environment":["BOOL"]}},"db":{"type":"boolean","default":true},"di":{"type":"integer","default":42},"ds":{"type":"string","default":"defString"},"i":{"type":"integer","defaultInfo":{"environment":["INT"]}},"m":{"type":"object","additionalProperties":{"type":"string"}},"n":{"$ref":"#/types/config:index:ConfigNested"},"s":{"type":"string","defaultInfo":{"environment":["STRING"]}}},"defaults":["s","b","i","m","a","n","an"]},"types":{"config:index:ConfigNested":{"properties":{"a":{"type":"array","items":{"type":"string"}},"b":{"type":"boolean"},"i":{"type":"integer"},"m":{"type":"object","additionalProperties":{"type":"string"}},"s":{"type":"string"}},"type":"object","required":["s","b","i","m","a"]}},"provider":{"properties":{"a":{"type":"array","items":{"type":"string"}},"an":{"type":"array","items":{"$ref":"#/types/config:index:ConfigNested"}},"b":{"type":"boolean","defaultInfo":{"environment":["BOOL"]}},"db":{"type":"boolean","default":true},"di":{"type":"integer","default":42},"ds":{"type":"string","default":"defString"},"i":{"type":"integer","defaultInfo":{"environment":["INT"]}},"m":{"type":"object","additionalProperties":{"type":"string"}},"n":{"$ref":"#/types/config:index:ConfigNested"},"s":{"type":"string","defaultInfo":{"environment":["STRING"]}}},"required":["s","b","i","m","a","n","an"],"inputProperties":{"a":{"type":"array","items":{"type":"string"}},"an":{"type":"array","items":{"$ref":"#/types/config:index:ConfigNested"}},"b":{"type":"boolean","defaultInfo":{"environment":["BOOL"]}},"db":{"type":"boolean","default":true},"di":{"type":"integer","default":42},"ds":{"type":"string","default":"defString"},"i":{"type":"integer","defaultInfo":{"environment":["INT"]}},"m":{"type":"object","additionalProperties":{"type":"string"}},"n":{"$ref":"#/types/config:index:ConfigNested"},"s":{"type":"string","defaultInfo":{"environment":["STRING"]}}},"requiredInputs":["s","b","i","m","a","n","an"]},"resources":{"config:index:Get":{"properties":{"config":{"type":"string"}},"required":["config"]}}}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://go-review.googlesource.com/c/go/+/23353/3/src/cmd/go/test.go

There're some special treatments given to "testdata" folder, and while not necessary here, perhaps can consider adopting to be conventional?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't testdata, it's just the provider's schema. We seem to have a convention of checking that in (which is good IMO) so I did that here, too.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah.. it's hard to read on one line, consider indenting or using YAML. Just a nit though of course.

pulumi login --cloud-url "file://$(CURDIR)/state"
pulumi stack select organization/test/test --create
PULUMI_DEBUG_GRPC=grpc.json pulumi up --yes
pulumi destroy --yes
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something special about indent here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nothing special, just copy/pasted from Ian's original script.

# yarn lockfile v1


"@grpc/[email protected]":
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it important to pin these deps? Curious.. Should we consider not checking in the lock file?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tests like this should include a lock file to keep things as reproducible as possible.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the stance but most of our tests don't at the moment. I think we're a bit on the fence on this.

@t0yv0
Copy link
Member

t0yv0 commented Dec 26, 2023

Looking quickly, these gRPC tests feel a little bit underpowered to pin the desired behavior, let me explain... What I'd like to know is that, as a user of pulumi-go-provider, I can pass complex nested props into the provider configuration. If gRPC methods were "pure" or encoded enough information in the response it would be all we need here, however the interesting bit here is the "side-effect", when I configure the provider, does this configuration take effect? I don't see asserts looking into that.. It's relatively easy to regress returning OK from configure while passing wrongly-parsed data to the provider.

What I'm looking for is something like this that customizes the test provider to assert that it behaves as you'd expect. Please feel free to prove me wrong here, perhaps there's a test that is equivalent in strength I'm just not understanding it.

@t0yv0
Copy link
Member

t0yv0 commented Dec 26, 2023

be a reason for the custom marshaling...

Absolutely. https://github.com/pulumi/pulumi-terraform-bridge/blob/master/pkg/tfbridge/provider.go#L372 is the context that uses custom marshaling. Part of the job of CheckConfig seems to be to not just check but automatically modify the provider configuration, in the bridged provider case we landed on the idea that it's CheckConfig responsibility to ensure that properties that are marked as secret in the schema are indeed secret in Pulumi (see MarkSchemaSecrets). This pass has difficulty with secrets, hence the custom code.

There's an interesting question on what happens if we used standard marshaling instead. At this point I'm not entirely sure this would work, but it likely could be made to work with some care. This would need some legwork to test out backwards compatibility for (in the bridge case especially). We'd need to make sure that ConfigEncoding can parse back the data produced by regular encoding as results of CheckConfig are passed to Configure:

CheckConfig(cfg)
   return stdenc.Marshal(modify(configEng.Unmarshal(cfg)))

Configure(cfg):
  configEnc.Unmarshal(cfg)

Or else we could switch Configure to use standard encoding to Unmarshal. It's an interesting possibility, if this works in a bw-compat manner it could lead to simplifications as well as more precise secret handling.

@t0yv0
Copy link
Member

t0yv0 commented Dec 26, 2023

This LGTM generally but I had some questions 🙏 Also consider not making the package internal, but put some disclaimer on its stability. If we could link to it from the bridge it could help sharing continued fixes on this code. I don't think there's anything particularly different between bridged and non-bridged providers around this, and if there is I'd love to figure out how to factor it out and make explicit.

@iwahbe
Copy link
Member

iwahbe commented Dec 28, 2023

Related: pulumi/pulumi-terraform-bridge#423

@blampe
Copy link
Contributor Author

blampe commented Jan 2, 2024

Thanks for the feedback @t0yv0!

Absolutely. https://github.com/pulumi/pulumi-terraform-bridge/blob/master/pkg/tfbridge/provider.go#L372 is the context that uses custom marshaling. Part of the job of CheckConfig seems to be to not just check but automatically modify the provider configuration, in the bridged provider case we landed on the idea that it's CheckConfig responsibility to ensure that properties that are marked as secret in the schema are indeed secret in Pulumi (see MarkSchemaSecrets). This pass has difficulty with secrets, hence the custom code.

This is helpful context. I was also curious why this was originally introduced, like if it was necessary for Terraform compatibility or something. AFAICT the behavior was originally introduced in pulumi/pulumi-terraform-bridge@b444691 due to a limitation with Pulumi's provider config at the time. Today we can support rich config, so we no longer need the workaorund. Given the choice, I'd much rather fix the root issue instead of carrying complexity forward. But that might not be feasible.

There's an interesting question on what happens if we used standard marshaling instead. At this point I'm not entirely sure this would work, but it likely could be made to work with some care. This would need some legwork to test out backwards compatibility for (in the bridge case especially).

Right. It seems like we have all the compatibility pieces we need to use standard marshaling throughout. (With some special handling for JSON-encoded config already persisted in state files -- or does that not matter, do we read the config directly from the program each time?)

We'd need to make sure that ConfigEncoding can parse back the data produced by regular encoding as results of CheckConfig are passed to Configure:

CheckConfig(cfg)
   return stdenc.Marshal(modify(configEng.Unmarshal(cfg)))

Configure(cfg):
  configEnc.Unmarshal(cfg)

Or else we could switch Configure to use standard encoding to Unmarshal. It's an interesting possibility, if this works in a bw-compat manner it could lead to simplifications as well as more precise secret handling.

Unless I'm mistaken YAML clients are already using standard marshaling, which would mean ConfigEncoding is already able to handle both encodings. For example this nested property doesn't get JSON encoded in the request, only the response:

{
  "method": "/pulumirpc.ResourceProvider/CheckConfig",
  "request": {
    "urn": "urn:pulumi:dev::provider-config::pulumi:providers:tpsdkv2::provider",
    "olds": {},
    "news": {
      "objectProp": {
        "strNestedProp": "jH=2vzcQnHy%pteU"
      },
      "strConfigProp": "jH=2vzcQnHy%pteU"
    }
  },
  "response": {
    "inputs": {
      "objectProp": {
        "4dabf18193072939515e22adb298388d": "1b47061264138c4ac30d75fd1eb44270",
        "value": "{\"strNestedProp\":\"jH=2vzcQnHy%pteU\"}"
      },
      "strConfigProp": "jH=2vzcQnHy%pteU"
    }
  },

Compare that with a Node client's request which does JSON encoding:

{
  "method": "/pulumirpc.ResourceProvider/CheckConfig",
  "request": {
    "urn": "urn:pulumi:test::test::pulumi:providers:config::provider",
    "olds": {},
    "news": {
      "a": "[\"one\",\"two\"]",
      ...
    }
  },

At a minimum, it seems like we should be able to update codegen to stop JSON-encoding these fields. That will help simplify things from the client direction, but I think the bridge will still need ConfigEncoding.UnmarshalProperties for backwards compatibility with state files.

I think the order of operations would look like:

  1. Update codegen to stop JSON-encoding config fields. This will make SDK behavior match YAML's behavior for requests.
  2. Update tfbridge to use plugin.MarshalProperties instead of configencoding in CheckConfig. I believe this is already compatible with configencoding, but we should confirm.
  3. (Breaking) Drop configencoding.UnmarshalProperties in favor of plugin.UnmarshalProperties in Configure and CheckConfig.

Steps 1 and 2 could happen at any time (and in any order?). Assuming we need back-compat with state files, step 3 would need to happen ~years later to give state files an opportunity to update.

Also consider not making the package internal, but put some disclaimer on its stability. If we could link to it from the bridge it could help sharing continued fixes on this code. I don't think there's anything particularly different between bridged and non-bridged providers around this, and if there is I'd love to figure out how to factor it out and make explicit.

I'd like to keep it internal until we're actually certain this is usable from the bridge. I'm skeptical that it's possible because the bridge's schema semantics don't exactly match infer's. For example the bridge always knows the field's primitive type, but this isn't the case with infer (at least not currently) -- we might only know that a field has a Ref to another type, but we don't know if the ref is primitive. IIRC this is partly why I had to omit marshaling, because we don't expose the same depth of type information.

Looking quickly, these gRPC tests feel a little bit underpowered to pin the desired behavior, let me explain... What I'd like to know is that, as a user of pulumi-go-provider, I can pass complex nested props into the provider configuration. If gRPC methods were "pure" or encoded enough information in the response it would be all we need here, however the interesting bit here is the "side-effect", when I configure the provider, does this configuration take effect? I don't see asserts looking into that.. It's relatively easy to regress returning OK from configure while passing wrongly-parsed data to the provider.

This is true. I'm more interested in the API contract right now since that's currently broken, but it shouldn't be too hard to assert the provider has an expected configuration after these operations are replayed. I'll look into that.

@blampe
Copy link
Contributor Author

blampe commented Jan 2, 2024

@blampe out of curiosity and concern that we have some more latent bugs around this, can you shed some light on how you're arriving at the "YAML doesn't share this behavior" conclusion? I've tried the following program:

@t0yv0 sorry I missed this one. The difference between YAML/Node is in the news request parameter:

❯ cat Pulumi.yaml
name: aws-config
runtime: yaml
description: A minimal AWS Pulumi YAML program
resources:
  provider:
    type: pulumi:providers:aws
    properties:
      defaultTags:
        tags:
          foo: bar
    options:
      version: 6.17.0
{
  "method": "/pulumirpc.ResourceProvider/CheckConfig",
  "request": {
    "urn": "urn:pulumi:dev::aws-config::pulumi:providers:aws::provider",
    "olds": {},
    "news": {
      "defaultTags": {
        "tags": {
          "foo": "bar"
        }
      },
      "version": "6.17.0"
    }
  },
  "response": {
    "inputs": {
      "defaultTags": "{\"tags\":{\"foo\":\"bar\"}}",
      "skipCredentialsValidation": "false",
      "skipMetadataApiCheck": "true",
      "skipRegionValidation": "true",
      "version": "6.17.0"
    }
  },
  "metadata": {
    "kind": "resource",
    "mode": "client",
    "name": "aws"
  }
}

Versus Node which JSON-encodes news.defaultTags:

 cat index.ts
import * as pulumi from "@pulumi/pulumi";
import * as aws from "@pulumi/aws";

const provider = new aws.Provider("provider", {defaultTags: {
    tags: {
        foo: "bar",
    },
}});
{
  "method": "/pulumirpc.ResourceProvider/CheckConfig",
  "request": {
    "urn": "urn:pulumi:dev::aws-config::pulumi:providers:aws::provider",
    "olds": {},
    "news": {
      "defaultTags": "{\"tags\":{\"foo\":\"bar\"}}",
      "skipCredentialsValidation": "false",
      "skipMetadataApiCheck": "true",
      "skipRegionValidation": "true",
      "version": "6.17.0"
    }
  },
  "response": {
    "inputs": {
      "defaultTags": "{\"tags\":{\"foo\":\"bar\"}}",
      "skipCredentialsValidation": "false",
      "skipMetadataApiCheck": "true",
      "skipRegionValidation": "true",
      "version": "6.17.0"
    }
  },
  "metadata": {
    "kind": "resource",
    "mode": "client",
    "name": "aws"
  }
}

@t0yv0
Copy link
Member

t0yv0 commented Jan 2, 2024

Oh wow! YAML is already sending unpacked news. That is very interesting, I missed that.

The rollout plan would make sense, just need to be careful.

I'll reread the bridge-side ConfigEncoding tomorrow to see if there's any corner cases where it'd parse data differently than vanilla decoder.

@iwahbe
Copy link
Member

iwahbe commented Jan 23, 2024

@t0yv0 @blampe Are we still trying to make progress here?

@t0yv0
Copy link
Member

t0yv0 commented Jan 23, 2024

We'd rather have pulumi/pulumi#15032 - waiting on platform there

@blampe
Copy link
Contributor Author

blampe commented Feb 29, 2024

Since pulumi/pulumi#15032 will likely take a while longer, I'll be vendoring this workaround as an internal package to unblock myself. (Note for future readers: this is bad and you should not do this!)

We definitely shouldn't upstream this, since we don't want this JSON protocol to proliferate any further. Closing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Automatically unmarshal complex configuration types
3 participants