-
Notifications
You must be signed in to change notification settings - Fork 4.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
✨Use base64 to encode mssql binary field #33755
Conversation
The latest updates on your projects. Learn more about Vercel for Git ↗︎ 1 Ignored Deployment
|
Warning 🚨 Connector code freeze is in effect until 2024-01-02. This PR is changing connector code. Please contact the current OC engineers if you want to merge this change to master. |
Before Merging a Connector Pull RequestWow! What a great pull request you have here! 🎉 To merge this PR, ensure the following has been done/considered for each connector added or updated:
If the checklist is complete, but the CI check is failing,
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good.
But how is it helping with the mentioned issue of 6GB updates failing?
@@ -128,7 +129,7 @@ protected void putBinary(final ObjectNode node, | |||
final int index) | |||
throws SQLException { | |||
final byte[] bytes = resultSet.getBytes(index); | |||
final String value = new String(bytes, Charset.defaultCharset()); | |||
final String value = Base64.encodeBase64String(bytes); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this I think is in line with what we do in other java connectors 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can add a test also for binary type
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does that mean that a binary field in the source database is encoded as a AirbyteType String? Is that something that was discussed, or just how it was implemented for simplicity reasons?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the current behavior for binary column is undefined in our public doc (and see slack discussion with destination team here https://airbytehq-team.slack.com/archives/C03C4AVJWG4/p1700173565688209), it sounds like base64 is the direction we are heading to; so yes mostly for simplicity reason.
@rodireich Apologize! I linked the wrong ticket. This is the one. Updated #19337 |
I'll have to update version after code freeze to avoid any confliction! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Change looks good.
I just wanted to understand for myself if mssql is different from mysql and postgres where I think we're doing specific type conversions.
I fully support this change (this solves other issue that we had: #30856). But from my POV this is a breaking change? My view of the situation: {
"my-byte-array": "hello"
} In destination connector we were using Now json is {
"my-byte-array": "aGVsbG8="
} same destination connector would save array So necessary steps for upgrade are:
Which is non-trivial upgrade process typically expected for patch. Am I missing something? |
What
#19337
Since airbyte transport everything in Json, binary was converted into String using default char set - thus for many cases they are not converted into a human readable string and cause confusion. Currently we did not specify how airbyte is going to handle binary in our official doc; so I propose we encode that in base64.
Slack discussion: https://airbytehq-team.slack.com/archives/C03C4AVJWG4/p1700173565688209
Result:
before:
===
after:
How
Describe the solution
Recommended reading order
x.java
y.python
🚨 User Impact 🚨
Are there any breaking changes? What is the end result perceived by the user?
For connector PRs, use this section to explain which type of semantic versioning bump occurs as a result of the changes. Refer to our Semantic Versioning for Connectors guidelines for more information. Breaking changes to connectors must be documented by an Airbyte engineer (PR author, or reviewer for community PRs) by using the Breaking Change Release Playbook.
If there are breaking changes, please merge this PR with the 🚨🚨 emoji so changelog authors can further highlight this if needed.
Pre-merge Actions
Expand the relevant checklist and delete the others.
New Connector
Community member or Airbyter
./gradlew :airbyte-integrations:connectors:<name>:integrationTest
.0.0.1
Dockerfile
has version0.0.1
README.md
bootstrap.md
. See description and examplesdocs/integrations/<source or destination>/<name>.md
including changelog with an entry for the initial version. See changelog exampledocs/integrations/README.md
Airbyter
If this is a community PR, the Airbyte engineer reviewing this PR is responsible for the below items.
Updating a connector
Community member or Airbyter
Airbyter
If this is a community PR, the Airbyte engineer reviewing this PR is responsible for the below items.
Connector Generator
-scaffold
in their name) have been updated with the latest scaffold by running./gradlew :airbyte-integrations:connector-templates:generator:generateScaffolds
then checking in your changesUpdating the Python CDK
Airbyter
Before merging:
--use-local-cdk --name=source-<connector>
as optionsairbyte-ci connectors --use-local-cdk --name=source-<connector> test
After merging: