
Seeking for alternate validation solution, without the need of extension options within descriptor #42

Closed
dravula1 opened this issue Oct 6, 2023 · 3 comments
dravula1 commented Oct 6, 2023

Description

The validator skips/ignores fields because it cannot find a descriptor with the FieldOptions extension(s). The descriptor was generated using the Confluent Java API, or with the protoc command and the --descriptor_set_out parameter.

With the current client architecture, .proto schema files are stored in the Confluent Schema Registry. When reading and parsing the .proto schemas from the schema registry, the following two options were tried, and neither generated descriptor serves the purpose.

Option 1: the service uses the Confluent Java library, which produces a descriptor with NO FieldOptions extension(s):

field {
  name: "uuid_field"
  number: 1
  type: TYPE_STRING
}

Option 2: the descriptor file was generated with the protoc command and the --descriptor_set_out=<file_name> option. With this option, custom FieldOptions are parsed as unknown fields, and the options appear under raw extension numbers instead of as FieldOptions extension(s).

Command:
protoc --retain_options --include_imports -I <base_dir_with_main_and_imports_proto_files> --descriptor_set_out=<descriptor_file_name> main.proto

The protoc command produces a descriptor with the following data:

field {
  name: "uuid_field"
  number: 1
  label: LABEL_OPTIONAL
  type: TYPE_STRING
  options {
    1159: {
      14: {
        22: 1
      }
    }
  }
  json_name: "uuidField"
}  
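
For reference, the raw numbers in that dump line up with the nesting in the expected output below: 1159 is the extension number of (buf.validate.field) on google.protobuf.FieldOptions, 14 is the string rules field inside the extension message, and 22 is the uuid rule. As a sketch of how those numbers reach the wire (the TagDemo class and tag helper are illustrative only, not part of any library), protobuf encodes each field as a tag computed from the field number and wire type:

```java
// Sketch: a protobuf wire-format tag is (field_number << 3) | wire_type.
// The numbers 1159/14/22 in the unknown-field dump above are the field
// numbers of (buf.validate.field), its string rules field, and the uuid
// rule respectively. Class and helper names here are illustrative only.
public class TagDemo {
    // Wire types: 0 = VARINT, 2 = LEN (length-delimited submessage).
    static int tag(int fieldNumber, int wireType) {
        return (fieldNumber << 3) | wireType;
    }

    public static void main(String[] args) {
        System.out.println(tag(1159, 2)); // (buf.validate.field) extension -> 9274
        System.out.println(tag(14, 2));   // string rules submessage        -> 114
        System.out.println(tag(22, 0));   // uuid rule (varint bool)        -> 176
    }
}
```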

Expected data in the descriptor

field {
  name: "uuid_field"
  number: 1
  label: LABEL_OPTIONAL
  type: TYPE_STRING
  options {
    [buf.validate.field] {
      string {
        uuid: true
      }
    }
  }
}  

Validator pseudocode

var schema = // pulls the .proto schema from the Confluent Schema Registry and parses it with the Confluent Java library (the resulting descriptor is missing options due to the limitation in the Confluent Java API)
var jsonString = "{\"uuid_field\":\"123\"}";
var data = // encodes the JSON fields (tag + value) to a ByteString in protobuf wire format
var dm = DynamicMessage.parseFrom(schema.getDescriptor(), data.toByteArray());
var isValidated = validator.validate(dm);

Test schema that was used:

    syntax = "proto3";
    package org.test.validate;

    import "buf/validate/validate.proto";

    message PocCommon {
      // uuid field - UUID
      string uuid_field = 1 [(buf.validate.field).string.uuid = true];
    }

Lastly, one option worked with generated classes/models, but it cannot be used with the current client architecture. So we are looking for any other alternative:

  • Using generated classes, the descriptor parses FieldOptions extension(s) as expected; so far the validator succeeds only with this approach.
    In our case, however, a generated model is NOT an option. Would there be any other solution someone can suggest instead of using a descriptor, with or without a generated model? Thank you
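
The difference between the two descriptor dumps above can be reproduced in isolation. Below is a minimal sketch, assuming protobuf-java and protovalidate-java's generated option classes (ValidateProto, FieldConstraints, StringRules — names from the 0.1.x generated code; later versions may differ) are on the classpath: the same serialized FieldOptions decode as unknown fields without an ExtensionRegistry, and as typed extensions with one:

```java
import com.google.protobuf.DescriptorProtos.FieldOptions;
import com.google.protobuf.ExtensionRegistry;
import build.buf.validate.FieldConstraints;
import build.buf.validate.StringRules;
import build.buf.validate.ValidateProto;

public class ExtensionDemo {
    // Serialize a FieldOptions carrying (buf.validate.field).string.uuid = true.
    static byte[] optionsBytes() {
        return FieldOptions.newBuilder()
            .setExtension(ValidateProto.field,
                FieldConstraints.newBuilder()
                    .setString(StringRules.newBuilder().setUuid(true))
                    .build())
            .build()
            .toByteArray();
    }

    // Reports whether the extension is visible after parsing with the given
    // registry; with an empty registry it survives only as unknown fields.
    static boolean extensionVisible(byte[] wire, ExtensionRegistry registry) throws Exception {
        return FieldOptions.parseFrom(wire, registry).hasExtension(ValidateProto.field);
    }

    public static void main(String[] args) throws Exception {
        byte[] wire = optionsBytes();

        // No registry: the option stays an unknown field, as in Option 2 above.
        System.out.println(extensionVisible(wire, ExtensionRegistry.getEmptyRegistry()));

        // buf.validate extensions registered: the option decodes as expected.
        ExtensionRegistry registry = ExtensionRegistry.newInstance();
        ValidateProto.registerAllExtensions(registry);
        System.out.println(extensionVisible(wire, registry));
    }
}
```

If this matches the behavior seen here, the same registry can be passed to FileDescriptorSet.parseFrom when loading the protoc --descriptor_set_out output, so the options decode as typed extensions.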
@dravula1 dravula1 changed the title Seeking for alternate validation options without the need of extension options within descriptor Seeking for alternate validation solution, without the need of extension options within descriptor Oct 6, 2023
@pkwarren pkwarren self-assigned this Oct 16, 2023

pkwarren commented Oct 16, 2023

Thank you for the detailed issue report.

Option 1 (no extensions/options)

I don't think there is anything to be done here if another tool removes extensions. Working without extensions/options would be a major architectural change for protovalidate, so if you wish to pursue this further, it is probably better tracked at https://github.com/bufbuild/protovalidate (since it isn't a problem specific to the Java implementation).

I do recall that at some point Confluent didn't support extensions but I believe newer versions do - see confluentinc/schema-registry#2357. Which version of the Confluent libraries are you on?

Also note, Buf has a Confluent Schema Registry integration available for enterprise customers which preserves all features of .proto files including extensions/options: https://buf.build/docs/data-pipeline/confluent/overview.

Option 2

I believe #48 should resolve the issues seen with protoc --retain_options --descriptor_set_out (when the extensions are present but unknown). This fix is in the newly released 0.1.6 version, which should be available on Maven Central shortly.
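
With the extensions resolvable, the end-to-end flow from the original report can be sketched without generated model classes. This is a sketch under stated assumptions, not a definitive implementation: it assumes protobuf-java plus protovalidate-java 0.1.6+, with Validator from build.buf.protovalidate and the generated ValidateProto/FieldConstraints/StringRules option classes (FieldConstraints is the 0.1.x name):

```java
import com.google.protobuf.DescriptorProtos.DescriptorProto;
import com.google.protobuf.DescriptorProtos.FieldDescriptorProto;
import com.google.protobuf.DescriptorProtos.FieldOptions;
import com.google.protobuf.DescriptorProtos.FileDescriptorProto;
import com.google.protobuf.Descriptors.Descriptor;
import com.google.protobuf.Descriptors.FileDescriptor;
import com.google.protobuf.DynamicMessage;
import build.buf.protovalidate.Validator;
import build.buf.validate.FieldConstraints;
import build.buf.validate.StringRules;
import build.buf.validate.ValidateProto;

public class DynamicValidateDemo {
    // Builds a descriptor equivalent to the PocCommon test schema, with the
    // (buf.validate.field).string.uuid = true option set as a typed extension.
    static Descriptor pocDescriptor() throws Exception {
        FileDescriptorProto fdp = FileDescriptorProto.newBuilder()
            .setName("poc.proto")
            .setSyntax("proto3")
            .setPackage("org.test.validate")
            .addDependency("buf/validate/validate.proto")
            .addMessageType(DescriptorProto.newBuilder()
                .setName("PocCommon")
                .addField(FieldDescriptorProto.newBuilder()
                    .setName("uuid_field")
                    .setNumber(1)
                    .setLabel(FieldDescriptorProto.Label.LABEL_OPTIONAL)
                    .setType(FieldDescriptorProto.Type.TYPE_STRING)
                    .setOptions(FieldOptions.newBuilder()
                        .setExtension(ValidateProto.field,
                            FieldConstraints.newBuilder()
                                .setString(StringRules.newBuilder().setUuid(true))
                                .build()))))
            .build();
        FileDescriptor fd = FileDescriptor.buildFrom(
            fdp, new FileDescriptor[] { ValidateProto.getDescriptor() });
        return fd.findMessageTypeByName("PocCommon");
    }

    // Validates a DynamicMessage carrying the given uuid_field value.
    static boolean isValid(String value) throws Exception {
        Descriptor type = pocDescriptor();
        DynamicMessage dm = DynamicMessage.newBuilder(type)
            .setField(type.findFieldByName("uuid_field"), value)
            .build();
        return new Validator().validate(dm).isSuccess();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(isValid("123")); // "123" is not a valid UUID
        System.out.println(isValid("b9c5ee03-6e8e-4f28-8cd1-6a6bcd5cbf02"));
    }
}
```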


venkatduddu commented Oct 18, 2023

Hi @pkwarren,
Regarding "Also note, Buf has a Confluent Schema Registry integration available for enterprise customers which preserves all features of .proto files including extensions/options: https://buf.build/docs/data-pipeline/confluent/overview":

We think the issue is with the Confluent schema-registry client code we use to retrieve the schema through the REST API, not with the schema registry itself.

Could you please explain more about how switching to the Buf-hosted schema registry would help in this case?


pkwarren commented Oct 19, 2023

We think the issue is with the Confluent schema-registry client code we use to retrieve the schema through the REST API, not with the schema registry itself.

Could you please explain more about how switching to the Buf-hosted schema registry would help in this case?

The main issue with losing extensions/options on proto messages is how protobuf schemas tend to make it into the CSR. With the default producer setting of auto-registering schemas, the client code converts generated protobuf code (via reflection) to a .proto schema (which can be lossy), which is then registered with the CSR, which in turn parses the schema into its own representation before persisting it (also potentially lossy). What is ultimately served by the REST API is the final form of the schema after it has passed through multiple systems.

For example, a producer using an older version of Confluent protobuf support will lose extensions/options when registering schemas; likewise, if an instance of the CSR is on an older version, it too could drop parts of the schema that aren't supported.

The BSR solution doesn't perform a potentially lossy conversion of a protobuf schema using reflection - it stores the .proto files directly in the BSR with no transformation (the files on disk are identical to the ones stored in the BSR with buf push). The CSR integration is built using mappings defined in the .proto files themselves that associate a protobuf message with a CSR subject, and the schemas served up via the REST API are always identical to the ones in the BSR. The CSR integration is also read-only, preventing an older producer or other system from overwriting the contents of a schema. This also ties in well with other BSR features like server wide breaking change detection and other governance flows that prevent backwards incompatible changes from being pushed and breaking Kafka consumers/producers.

All that being said, feel free to hop on our public Slack if you have further questions.

I'm going to go ahead and close this issue as I believe the primary reported issue regarding validation with DynamicMessage is now resolved in the latest version.
