Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avro Compact Format #1213

Merged
merged 5 commits into from
Jul 5, 2023
Merged

Avro Compact Format #1213

merged 5 commits into from
Jul 5, 2023

Conversation

alexec
Copy link
Contributor

@alexec alexec commented Jun 1, 2023

No description provided.

@alexec alexec force-pushed the avro-turbo branch 2 times, most recently from e6020d6 to c74889f Compare June 1, 2023 15:31
@alexec
Copy link
Contributor Author

alexec commented Jun 1, 2023

@duglin I've had a negative contributor experience with this PR:

  • Fighting the DCO check. There is no need to sign every commit, just one really should enough to meet this check. This is make-work.
  • Merge conflicts. I've been editing files that are also edited on master branch. This is also make-work.

This has resulted in a messy PR very unlike my usual work.

@duglin
Copy link
Collaborator

duglin commented Jun 1, 2023

@duglin I've had a negative contributor experience with this PR:
* Fighting the DCO check. There is no need to sign every commit, just one really should enough to meet this check. This is make-work.
* Merge conflicts. I've been editing files that are also edited on master branch. This is also make-work.
This has resulted in a messy PR very unlike my usual work.

Signing is a commit-level thing, so the DCO checker needs to check each commit. One option is to --amend your commit so you only have one. No one will complain (or should since I do it and no one has complained).

Not sure what to say about conflicts - that's just part of a collaborative tool like git and I don't see how someone can avoid it if there are busy files. One thing I found helpful is to rebase often. On the positive side, since we only (usually) merge PRs once a week, rebasing shouldn't be needed more than that.

@alexec alexec marked this pull request as ready for review June 6, 2023 14:54
@alexec
Copy link
Contributor Author

alexec commented Jun 6, 2023

I've traced the cause of the conflict. The instruction for fixing the DCO check are wrong. You cannot push using --force-with-lease once you git merge master, this corrupts the git history.

@alexec
Copy link
Contributor Author

alexec commented Jun 6, 2023

cloudevents/sdk-java#578

@alexec
Copy link
Contributor Author

alexec commented Jun 6, 2023

#1210

@alexec
Copy link
Contributor Author

alexec commented Jun 7, 2023

@duglin open questions:

Existing event formats loose type information (time is rounded to microseconds and offset lost, URIs are de. Both avro and json formats lose URI type information. Ee to limitations in the spec and limitation in JSON respectively.

It is not clear if the HTTP formats support this, but I'm not sure how they would. Say you head custom context attribute that was foo, how can you turn this into a HTTP header and back and preserve type?

Given how common the json format is, it is now de-facto URIs are not supported in custom context attributes.

Avro Turbo chooses performance over complexity. Supporting these types is <= O(9) cost (the number of CE SDKs is 9). Complex systems tend to perform poorly.

For time , the argument is not so strong. we could use an union, but it makes implementation costly, complex and buggy.

My Java implementation is 126 lines long.

  • json is 923 lines.
  • protobuf is 700 lines.
  • xml is 1017 lines long

It is a full order of magnitude smaller. This is achieved by keenly eschewing complexity.

Less code means less bugs. I don't want to make implementing it harder and more buggy than needed.


@pierDipi is v3x of the Java SDK backwards incompatible with v2.x? If that is the case, then I'll put this work on hold because we will not have resource to do a version upgrade anytime in the next 3-6 months.

@pierDipi
Copy link
Member

pierDipi commented Jun 8, 2023

@duglin open questions:

Existing event formats loose type information (time is rounded to microseconds and offset lost, URIs are de. Both avro and json formats lose URI type information. Ee to limitations in the spec and limitation in JSON respectively.

It is not clear if the HTTP formats support this, but I'm not sure how they would. Say you head custom context attribute that was foo, how can you turn this into a HTTP header and back and preserve type?

Given how common the json format is, it is now de-facto URIs are not supported in custom context attributes.

Avro Turbo chooses performance over complexity. Supporting these types is <= O(9) cost (the number of CE SDKs is 9). Complex systems tend to perform poorly.

For time , the argument is not so strong. we could use an union, but it makes implementation costly, complex and buggy.

My Java implementation is 126 lines long.

  • json is 923 lines.
  • protobuf is 700 lines.
  • xml is 1017 lines long

It is a full order of magnitude smaller. This is achieved by keenly eschewing complexity.

Less code means less bugs. I don't want to make implementing it harder and more buggy than needed.

@pierDipi is v3x of the Java SDK backwards incompatible with v2.x? If that is the case, then I'll put this work on hold because we will not have resource to do a version upgrade anytime in the next 3-6 months.

In v3, there will be a few breaking changes (current list of breaking changes), however, avro-turbo will be a new module, so we can backport it for a 2.x release

@duglin
Copy link
Collaborator

duglin commented Jun 9, 2023

Existing event formats loose type information (time is rounded to microseconds and offset lost, URIs are de. Both avro and json formats lose URI type information. Ee to limitations in the spec and limitation in JSON respectively.

I think I understand how avro loses precision on "time", but how does JSON lose URI type info? Do you mean because it's just a string and there's no way to know if hello is a URI or just a random string? But, also is there a question here?

It is not clear if the HTTP formats support this, but I'm not sure how they would. Say you head custom context attribute that was foo, how can you turn this into a HTTP header and back and preserve type?

Everything is converted to a string. You need to know what the extension's spec-defined type is to know how to deserialize it back to the right type.

Given how common the json format is, it is now de-facto URIs are not supported in custom context attributes.

I'm not sure what you mean by this one.

Avro Turbo chooses performance over complexity. Supporting these types is <= O(9) cost (the number of CE SDKs is 9). Complex systems tend to perform poorly.

For time , the argument is not so strong. we could use an union, but it makes implementation costly, complex and buggy.

My Java implementation is 126 lines long.

* `json` is 923 lines.
* `protobuf` is 700 lines.
* `xml` is 1017 lines long

It is a full order of magnitude smaller. This is achieved by keenly eschewing complexity.

Less code means less bugs. I don't want to make implementing it harder and more buggy than needed.

Is there a question here? It seems to me that if people choose to use Avro then they're accepting the limitations you've mentioned to gain performance.

@alexec
Copy link
Contributor Author

alexec commented Jun 10, 2023

Thank you. You answered my questions (even if not intentionally).

  • URIs custom context attributes not being back to URIs is acceptable.
  • Time being truncated to micros is acceptable.
  • The main breaking change in the Java SDK appears to be the upgrade to Kafka 3 libraries.

@duglin
Copy link
Collaborator

duglin commented Jun 14, 2023

@alexec is this one ready for final review?
Also, in the previous version you updated the SDK.md file - did you want to do that again for this one?

@alexec
Copy link
Contributor Author

alexec commented Jun 15, 2023

ready for review

@duglin
Copy link
Collaborator

duglin commented Jun 15, 2023

Discussed on the 6/15 call and the general direction looks good.
Just a few outstanding comments that need to be dealt with, and some folks wanted a little more time to review.
Hopefully we'll vote next week.

Copy link
Contributor

@jskeet jskeet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with the suggestion to rename this to "compact" - I'm happy to do that in an extra commit in this PR if you'd like, so that we're not adding work for you.

Other than that:

  • One wording nitpick which can be addressed later
  • One question about type differentiation for String/URI/URI-Ref
  • One suggested renaming

cloudevents/formats/avro-turbo-format.md Outdated Show resolved Hide resolved
cloudevents/formats/avro-turbo-format.md Outdated Show resolved Hide resolved
"default": null
},
{
"name": "attributes",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps this should be extensionattributes or extensions? (The other values are all attributes too, apart from data.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't feel strongly - @clemensv _

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed to “extensions”

cloudevents/formats/avro-turbo-format.md Outdated Show resolved Hide resolved
@alexec alexec changed the title Avro Turbo Avro Compact Format Jun 20, 2023
@alexec
Copy link
Contributor Author

alexec commented Jun 20, 2023

I've renamed this to "Avro Compact" because 3 of us preferred that.

@duglin
Copy link
Collaborator

duglin commented Jun 20, 2023

Also, in the previous version you updated the SDK.md file - did you want to do that again for this one?

@duglin
Copy link
Collaborator

duglin commented Jun 20, 2023

can you rebase? There's a conflict on the SDK.md

@alexec
Copy link
Contributor Author

alexec commented Jun 20, 2023

there is always a conflict with that file, I'll fix conflicts once approved

@duglin
Copy link
Collaborator

duglin commented Jun 20, 2023

I fixed the rebase for you so we can have the checker do it's job

Signed-off-by: Alex Collins <[email protected]>

ok

Signed-off-by: Alex Collins <[email protected]>

Update cloudevents-turbo.avsc

Signed-off-by: Alex Collins <[email protected]>

Update avro-format.md

Signed-off-by: Alex Collins <[email protected]>

turbo -> compact

Signed-off-by: Alex Collins <[email protected]>

turbo -> compact

Signed-off-by: Alex Collins <[email protected]>

turbo -> compact

Signed-off-by: Alex Collins <[email protected]>
@duglin
Copy link
Collaborator

duglin commented Jun 20, 2023

I also added a change to the primer since the Adobe docs moved. Might as well just reuse this PR. Now the only failing check is an ok one.

@duglin
Copy link
Collaborator

duglin commented Jun 22, 2023

@alexec sorry for the late notice, but I just realized that for consistency I think you should put the new spec into: cloudevents/working-drafts dir, like we did for XML.

Copy link
Contributor

@JemDay JemDay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor tweak ....

cloudevents/formats/cloudevents-compact.avsc Outdated Show resolved Hide resolved
Copy link
Contributor

@jskeet jskeet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ping @clemensv for the naming of "attributes" vs "extensions" for the map of extension attributes. I'd personally like to see that resolved before merging. One new thing I've noticed as well around nullability... but we're getting there :)

cloudevents/formats/cloudevents-compact.avsc Outdated Show resolved Hide resolved
Copy link
Contributor

@jskeet jskeet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for being patient with us :)

@duglin
Copy link
Collaborator

duglin commented Jun 27, 2023

@alexec can you please move the new spec into the cloudevents/working-drafts dir?

@duglin
Copy link
Collaborator

duglin commented Jun 29, 2023

@alexec couple of bad refs now:

cloudevents/SDK.md: 'https://github.com/cloudevents/spec/blob/main/cloudevents/formats/avro-compact-format.md' was not found
cloudevents/working-drafts/avro-compact-format.md: Translation file cloudevents/languages/he/working-drafts/avro-compact-format.md does not exist
cloudevents/working-drafts/avro-compact-format.md: Translation file cloudevents/languages/zh-CN/working-drafts/avro-compact-format.md does not exist

actually, I think the first one is supposed to fail but it needs to point to "working-drafts" instead of "formats". The other 2 probably need the files moved to the working-drafts dir

Signed-off-by: Alex Collins <[email protected]>
@duglin
Copy link
Collaborator

duglin commented Jul 5, 2023

@alexec thanks - merging as it was approved during last week's call.

@duglin duglin merged commit 777d0c0 into cloudevents:main Jul 5, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants