Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

🐛 📝 Source Snowflake: fix coercion-to-UTC for timestamp datasource #31631

Merged
merged 15 commits into from
Oct 24, 2023

Conversation

stephane-airbyte
Copy link
Contributor

@stephane-airbyte stephane-airbyte commented Oct 19, 2023

What

fixes #31568

When exporting a timestamp_tz column, we lose the timeoffset information, and get the timestamp at UTC (for example, 2023-08-01T00:00:00.000-07:00 will become 2023-08-01T00:07:00.000Z).
Per snowflake documentation, a timestamp_tz is really a UTC timestamp with a timezone associated to it, so when we read a timestamp_tz as a java timestamp, we get the UTC time.

This is not a very resilient change, but the existing code was even more brittle.
In addition, this change reduces the precision of snowflake's timestamp_tz to ms (TIME(3))), while they might be stored as TIME(9) (with ns precision)

How

snowflake doesn't support custom datatype mapping, so a timestamp_tz is always mapped to a java Timestamp (without timezone information).
The default behavior that forces a mapping to OffsetDateTime is not applicable here.
Instead, we have to get the column value as a String and parse it to get the OffsetDateTime, and then write that as a JSON object.
Unfortunately, even that behavior is brittle. There are a bunch of parameters that affect the seriazliation of timestamp columns :
JDBC_TREAT_TIMESTAMP_NTZ_AS_UTC (should be set to true)
JDBC_USE_SESSION_TIMEZONE (should be set to false)
TIMESTAMP_TYPE_MAPPING should be set to 'TIMESTAMP_TZ' for maximum accuracy
TIMESTAMP_LTZ_OUTPUT_FORMAT, TIMESTAMP_NTZ_OUTPUT_FORMAT, TIMESTAMP_OUTPUT_FORMAT, and TIMESTAMP_TZ_OUTPUT_FORMAT should all be set to something like 'YYYY-MM-DDTHH24:MI:SS.FF9TZH:TZM' to make our lives easier. (The current default value is 'YYYY-MM-DD HH24:MI:SS.FF3 TZHTZM', which forces me to do some custom parsing, and loses any precision past the ms
All those parameters can be set at the session level, which means we should be able to set them at connection creation, but I haven't found a way to create a connection-creation hook, so I'm solving the immediate issue

🚨 User Impact 🚨

not a breaking change. We increase data fidelity for timestamp_tz columns.

Airbyter

If this is a community PR, the Airbyte engineer reviewing this PR is responsible for the below items.

  • Create a non-forked branch based on this PR and test the below items on it
  • Build is successful
  • If new credentials are required for use in CI, add them to GSM. Instructions.
Updating a connector

Community member or Airbyter

  • Grant edit access to maintainers (instructions)
  • Unit & integration tests added

Airbyter

If this is a community PR, the Airbyte engineer reviewing this PR is responsible for the below items.

  • Create a non-forked branch based on this PR and test the below items on it
  • Build is successful
  • If new credentials are required for use in CI, add them to GSM. Instructions.

@vercel
Copy link

vercel bot commented Oct 19, 2023

The latest updates on your projects. Learn more about Vercel for Git ↗︎

1 Ignored Deployment
Name Status Preview Comments Updated (UTC)
airbyte-docs ⬜️ Ignored (Inspect) Visit Preview Oct 24, 2023 5:44pm

@CLAassistant
Copy link

CLAassistant commented Oct 19, 2023

CLA assistant check
All committers have signed the CLA.

@octavia-squidington-iii octavia-squidington-iii added the area/connectors Connector related issues label Oct 19, 2023
@github-actions
Copy link
Contributor

Before Merging a Connector Pull Request

Wow! What a great pull request you have here! 🎉

To merge this PR, ensure the following has been done/considered for each connector added or updated:

  • PR name follows PR naming conventions
  • Breaking changes are considered. If a Breaking Change is being introduced, ensure an Airbyte engineer has created a Breaking Change Plan.
  • Connector version has been incremented in the Dockerfile and metadata.yaml according to our Semantic Versioning for Connectors guidelines
  • You've updated the connector's metadata.yaml file any other relevant changes, including a breakingChanges entry for major version bumps. See metadata.yaml docs
  • Secrets in the connector's spec are annotated with airbyte_secret
  • All documentation files are up to date. (README.md, bootstrap.md, docs.md, etc...)
  • Changelog updated in docs/integrations/<source or destination>/<name>.md with an entry for the new version. See changelog example
  • Migration guide updated in docs/integrations/<source or destination>/<name>-migrations.md with an entry for the new version, if the version is a breaking change. See migration guide example
  • If set, you've ensured the icon is present in the platform-internal repo. (Docs)

If the checklist is complete, but the CI check is failing,

  1. Check for hidden checklists in your PR description

  2. Toggle the github label checklist-action-run on/off to re-run the checklist CI.

@airbyte-oss-build-runner
Copy link
Collaborator

source-snowflake test report (commit 517cd148c5) - ❌

⏲️ Total pipeline duration: 07mn35s

Step Result
Build connector tar
Java Connector Unit Tests
Build source-snowflake docker image for platform(s) linux/x86_64
Acceptance tests
Java Connector Integration Tests
Validate metadata for source-snowflake
Connector version semver check
Connector version increment check
QA checks

🔗 View the logs here

☁️ View runs for commit in Dagger Cloud

Please note that tests are only run on PR ready for review. Please set your PR to draft mode to not flood the CI engine and upstream service on following commits.
You can run the same pipeline locally on this branch with the airbyte-ci tool with the following command

airbyte-ci connectors --name=source-snowflake test

@@ -20,7 +20,7 @@ data:
name: Paypal Transaction
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This file should be reverted

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, that's a bug in our approve-and-merge pipeline. The automatic formatting should have kicked in but it didn't, and as a result, another random change updated that file: https://github.com/airbytehq/airbyte/pull/31610/files

@airbyte-oss-build-runner
Copy link
Collaborator

source-paypal-transaction test report (commit 64d65f66c0) - ✅

⏲️ Total pipeline duration: 01mn42s

Step Result
Build source-paypal-transaction docker image for platform(s) linux/x86_64
Acceptance tests
Code format checks
Validate metadata for source-paypal-transaction
Connector version semver check
QA checks

🔗 View the logs here

☁️ View runs for commit in Dagger Cloud

Please note that tests are only run on PR ready for review. Please set your PR to draft mode to not flood the CI engine and upstream service on following commits.
You can run the same pipeline locally on this branch with the airbyte-ci tool with the following command

airbyte-ci connectors --name=source-paypal-transaction test

@airbyte-oss-build-runner
Copy link
Collaborator

source-snowflake test report (commit 64d65f66c0) - ❌

⏲️ Total pipeline duration: 09mn36s

Step Result
Build connector tar
Java Connector Unit Tests
Build source-snowflake docker image for platform(s) linux/x86_64
Acceptance tests
Java Connector Integration Tests
Validate metadata for source-snowflake
Connector version semver check
Connector version increment check
QA checks

🔗 View the logs here

☁️ View runs for commit in Dagger Cloud

Please note that tests are only run on PR ready for review. Please set your PR to draft mode to not flood the CI engine and upstream service on following commits.
You can run the same pipeline locally on this branch with the airbyte-ci tool with the following command

airbyte-ci connectors --name=source-snowflake test

Copy link
Contributor

@akashkulk akashkulk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! Couple of high-level comments :

  1. We should bump the version of the snowflake connector in the metadata.yaml : (airbyte-integrations/connectors/source-snowflake/metadata.yaml). This deploys a new version of the source-snowflake connector in cloud.
  2. We need to add to the changelog in docs/integrations/sources/snowflake.md outlining the change.
  3. [Optional] : We should add a quick blurb about the weirdness in handling timestamp_tz in snowflake.

Copy link
Contributor

@akashkulk akashkulk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can look into defaulting some of these JDBC params while connecting to snowflake in the future while creating a connection, but for now I think this works

@airbyte-oss-build-runner
Copy link
Collaborator

source-snowflake test report (commit 1e7557b664) - ❌

⏲️ Total pipeline duration: 08mn36s

Step Result
Build connector tar
Java Connector Unit Tests
Build source-snowflake docker image for platform(s) linux/x86_64
Acceptance tests
Java Connector Integration Tests
Validate metadata for source-snowflake
Connector version semver check
Connector version increment check
QA checks

🔗 View the logs here

☁️ View runs for commit in Dagger Cloud

Please note that tests are only run on PR ready for review. Please set your PR to draft mode to not flood the CI engine and upstream service on following commits.
You can run the same pipeline locally on this branch with the airbyte-ci tool with the following command

airbyte-ci connectors --name=source-snowflake test

@stephane-airbyte
Copy link
Contributor Author

  1. We should bump the version of the snowflake connector in the metadata.yaml : (airbyte-integrations/connectors/source-snowflake/metadata.yaml). This deploys a new version of the source-snowflake connector in cloud.
    done
  2. We need to add to the changelog in docs/integrations/sources/snowflake.md outlining the change.
    done
  3. [Optional] : We should add a quick blurb about the weirdness in handling timestamp_tz in snowflake.
    I created an issue instead: https://github.com/airbytehq/airbyte/issues/31663

@airbyte-oss-build-runner
Copy link
Collaborator

source-snowflake test report (commit 176fd8f12e) - ❌

⏲️ Total pipeline duration: 08mn00s

Step Result
Build connector tar
Java Connector Unit Tests
Build source-snowflake docker image for platform(s) linux/x86_64
Acceptance tests
Java Connector Integration Tests
Validate metadata for source-snowflake
Connector version semver check
Connector version increment check
QA checks

🔗 View the logs here

☁️ View runs for commit in Dagger Cloud

Please note that tests are only run on PR ready for review. Please set your PR to draft mode to not flood the CI engine and upstream service on following commits.
You can run the same pipeline locally on this branch with the airbyte-ci tool with the following command

airbyte-ci connectors --name=source-snowflake test

@airbyte-oss-build-runner
Copy link
Collaborator

source-snowflake test report (commit 4405314d67) - ❌

⏲️ Total pipeline duration: 12mn06s

Step Result
Build connector tar
Java Connector Unit Tests
Build source-snowflake docker image for platform(s) linux/amd64
Acceptance tests
Java Connector Integration Tests
Validate metadata for source-snowflake
Connector version semver check
Connector version increment check
QA checks

🔗 View the logs here

☁️ View runs for commit in Dagger Cloud

Please note that tests are only run on PR ready for review. Please set your PR to draft mode to not flood the CI engine and upstream service on following commits.
You can run the same pipeline locally on this branch with the airbyte-ci tool with the following command

airbyte-ci connectors --name=source-snowflake test

@airbyte-oss-build-runner
Copy link
Collaborator

source-snowflake test report (commit 8edc45a650) - ❌

⏲️ Total pipeline duration: 06mn53s

Step Result
Build connector tar
Java Connector Unit Tests
Build source-snowflake docker image for platform(s) linux/amd64
Acceptance tests
Java Connector Integration Tests
Validate metadata for source-snowflake
Connector version semver check
Connector version increment check
QA checks

🔗 View the logs here

☁️ View runs for commit in Dagger Cloud

Please note that tests are only run on PR ready for review. Please set your PR to draft mode to not flood the CI engine and upstream service on following commits.
You can run the same pipeline locally on this branch with the airbyte-ci tool with the following command

airbyte-ci connectors --name=source-snowflake test

@airbyte-oss-build-runner
Copy link
Collaborator

source-snowflake test report (commit 4fb334ba33) - ❌

⏲️ Total pipeline duration: 09mn37s

Step Result
Build connector tar
Java Connector Unit Tests
Build source-snowflake docker image for platform(s) linux/amd64
Acceptance tests
Java Connector Integration Tests
Validate metadata for source-snowflake
Connector version semver check
Connector version increment check
QA checks

🔗 View the logs here

☁️ View runs for commit in Dagger Cloud

Please note that tests are only run on PR ready for review. Please set your PR to draft mode to not flood the CI engine and upstream service on following commits.
You can run the same pipeline locally on this branch with the airbyte-ci tool with the following command

airbyte-ci connectors --name=source-snowflake test

@airbyte-oss-build-runner
Copy link
Collaborator

source-snowflake test report (commit c7866dfb22) - ❌

⏲️ Total pipeline duration: 02mn09s

Step Result
Build connector tar
Java Connector Unit Tests
Build source-snowflake docker image for platform(s) linux/amd64
Acceptance tests
Java Connector Integration Tests
Validate metadata for source-snowflake
Connector version semver check
Connector version increment check
QA checks

🔗 View the logs here

☁️ View runs for commit in Dagger Cloud

Please note that tests are only run on PR ready for review. Please set your PR to draft mode to not flood the CI engine and upstream service on following commits.
You can run the same pipeline locally on this branch with the airbyte-ci tool with the following command

airbyte-ci connectors --name=source-snowflake test

@airbyte-oss-build-runner
Copy link
Collaborator

source-snowflake test report (commit 18f042546f) - ❌

⏲️ Total pipeline duration: 08mn54s

Step Result
Build connector tar
Java Connector Unit Tests
Build source-snowflake docker image for platform(s) linux/amd64
Acceptance tests
Java Connector Integration Tests
Validate metadata for source-snowflake
Connector version semver check
Connector version increment check
QA checks

🔗 View the logs here

☁️ View runs for commit in Dagger Cloud

Please note that tests are only run on PR ready for review. Please set your PR to draft mode to not flood the CI engine and upstream service on following commits.
You can run the same pipeline locally on this branch with the airbyte-ci tool with the following command

airbyte-ci connectors --name=source-snowflake test

@airbyte-oss-build-runner
Copy link
Collaborator

source-snowflake test report (commit 0812b6c725) - ❌

⏲️ Total pipeline duration: 06mn48s

Step Result
Build connector tar
Java Connector Unit Tests
Build source-snowflake docker image for platform(s) linux/amd64
Acceptance tests
Java Connector Integration Tests
Validate metadata for source-snowflake
Connector version semver check
Connector version increment check
QA checks

🔗 View the logs here

☁️ View runs for commit in Dagger Cloud

Please note that tests are only run on PR ready for review. Please set your PR to draft mode to not flood the CI engine and upstream service on following commits.
You can run the same pipeline locally on this branch with the airbyte-ci tool with the following command

airbyte-ci connectors --name=source-snowflake test

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this file is really annoying to diff, but the only difference is in columns 24 and 26, where we get 00:00:00-07:00 and 00:12:00+05:00 respectively instead of 00:07:00Z (Where Z means UTC)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

those comments are only here to facilitate debugging when a CAT fails. The sorting is to improve it further, so a simple diff can do

@airbyte-oss-build-runner
Copy link
Collaborator

source-snowflake test report (commit 127b14246b) - ❌

⏲️ Total pipeline duration: 10mn45s

Step Result
Build connector tar
Java Connector Unit Tests
Build source-snowflake docker image for platform(s) linux/amd64
Acceptance tests
Java Connector Integration Tests
Validate metadata for source-snowflake
Connector version semver check
Connector version increment check
QA checks

🔗 View the logs here

☁️ View runs for commit in Dagger Cloud

Please note that tests are only run on PR ready for review. Please set your PR to draft mode to not flood the CI engine and upstream service on following commits.
You can run the same pipeline locally on this branch with the airbyte-ci tool with the following command

airbyte-ci connectors --name=source-snowflake test

@stephane-airbyte stephane-airbyte changed the title fix coercion-to-UTC in snowflake timestamp_tz datasource 🐛 📝 Source Snowflake: fix coercion-to-UTC for timestamp datasource Oct 23, 2023
@airbyte-oss-build-runner
Copy link
Collaborator

source-snowflake test report (commit 01f4046839) - ❌

⏲️ Total pipeline duration: 07mn37s

Step Result
Build connector tar
Java Connector Unit Tests
Build source-snowflake docker image for platform(s) linux/amd64
Acceptance tests
Java Connector Integration Tests
Validate metadata for source-snowflake
Connector version semver check
Connector version increment check
QA checks

🔗 View the logs here

☁️ View runs for commit in Dagger Cloud

Please note that tests are only run on PR ready for review. Please set your PR to draft mode to not flood the CI engine and upstream service on following commits.
You can run the same pipeline locally on this branch with the airbyte-ci tool with the following command

airbyte-ci connectors --name=source-snowflake test

@airbyte-oss-build-runner
Copy link
Collaborator

source-snowflake test report (commit 5ff619f192) - ❌

⏲️ Total pipeline duration: 09mn54s

Step Result
Build connector tar
Java Connector Unit Tests
Build source-snowflake docker image for platform(s) linux/amd64
Acceptance tests
Java Connector Integration Tests
Validate metadata for source-snowflake
Connector version semver check
Connector version increment check
QA checks

🔗 View the logs here

☁️ View runs for commit in Dagger Cloud

Please note that tests are only run on PR ready for review. Please set your PR to draft mode to not flood the CI engine and upstream service on following commits.
You can run the same pipeline locally on this branch with the airbyte-ci tool with the following command

airbyte-ci connectors --name=source-snowflake test

@airbyte-oss-build-runner
Copy link
Collaborator

source-snowflake test report (commit 9033da929f) - ❌

⏲️ Total pipeline duration: 10mn57s

Step Result
Build connector tar
Java Connector Unit Tests
Build source-snowflake docker image for platform(s) linux/amd64
Acceptance tests
Java Connector Integration Tests
Validate metadata for source-snowflake
Connector version semver check
Connector version increment check
QA checks

🔗 View the logs here

☁️ View runs for commit in Dagger Cloud

Please note that tests are only run on PR ready for review. Please set your PR to draft mode to not flood the CI engine and upstream service on following commits.
You can run the same pipeline locally on this branch with the airbyte-ci tool with the following command

airbyte-ci connectors --name=source-snowflake test

@airbyte-oss-build-runner
Copy link
Collaborator

source-snowflake test report (commit c8bae50303) - ✅

⏲️ Total pipeline duration: 10mn49s

Step Result
Build connector tar
Java Connector Unit Tests
Build source-snowflake docker image for platform(s) linux/amd64
Acceptance tests
Java Connector Integration Tests
Validate metadata for source-snowflake
Connector version semver check
Connector version increment check
QA checks

🔗 View the logs here

☁️ View runs for commit in Dagger Cloud

Please note that tests are only run on PR ready for review. Please set your PR to draft mode to not flood the CI engine and upstream service on following commits.
You can run the same pipeline locally on this branch with the airbyte-ci tool with the following command

airbyte-ci connectors --name=source-snowflake test

@stephane-airbyte stephane-airbyte enabled auto-merge (squash) October 24, 2023 17:45
@airbyte-oss-build-runner
Copy link
Collaborator

source-snowflake test report (commit b39f48cf86) - ✅

⏲️ Total pipeline duration: 09mn09s

Step Result
Build connector tar
Java Connector Unit Tests
Build source-snowflake docker image for platform(s) linux/amd64
Acceptance tests
Java Connector Integration Tests
Validate metadata for source-snowflake
Connector version semver check
Connector version increment check
QA checks

🔗 View the logs here

☁️ View runs for commit in Dagger Cloud

Please note that tests are only run on PR ready for review. Please set your PR to draft mode to not flood the CI engine and upstream service on following commits.
You can run the same pipeline locally on this branch with the airbyte-ci tool with the following command

airbyte-ci connectors --name=source-snowflake test

@stephane-airbyte stephane-airbyte merged commit df9da28 into master Oct 24, 2023
13 of 15 checks passed
@stephane-airbyte stephane-airbyte deleted the sgeneix-31568 branch October 24, 2023 17:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

🐛 Source Snowflake: Timestamps coerced to UTC and lose the original timezone offsets
5 participants