Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

🐙 source-twilio: per partition states for nested streams #49097

Open
wants to merge 13 commits into
base: master
Choose a base branch
from

Conversation

mkrawc
Copy link
Contributor

@mkrawc mkrawc commented Dec 11, 2024

What

  • Manage per partition state for nested streams
  • Get Usage records per day so that usage can be analyzed and aggregated
  • Fix edge case that Usage endpoint may return 404 if requesting usage for start/end dates way before the account creation date

How

  • use StreamSlice to manage stream slices and partitions
  • calculate UsageRecords min datetime based on accounts date_created

Review guide

User Impact

  • After upgrade existing connectors will download all data since the configured start_date because state format changed

Can this PR be safely reverted and rolled back?

  • YES 💚
  • NO ❌

Copy link

vercel bot commented Dec 11, 2024

The latest updates on your projects. Learn more about Vercel for Git ↗︎

1 Skipped Deployment
Name Status Preview Comments Updated (UTC)
airbyte-docs ⬜️ Ignored (Inspect) Visit Preview Dec 12, 2024 1:12pm

Copy link

vercel bot commented Dec 11, 2024

@mkrawc is attempting to deploy a commit to the Airbyte Growth Team on Vercel.

A member of the Team first needs to authorize it.

@octavia-squidington-iii octavia-squidington-iii added the area/connectors Connector related issues label Dec 11, 2024
@mkrawc mkrawc temporarily deployed to community-ci-auto December 11, 2024 15:43 — with GitHub Actions Inactive
@mkrawc mkrawc changed the title Twilio stream partition states source-twilio: per partition states for nested streams Dec 11, 2024
@mkrawc mkrawc changed the title source-twilio: per partition states for nested streams 🐙 source-twilio: per partition states for nested streams Dec 11, 2024
@marcosmarxm
Copy link
Member

Hello @mkrawc is there an open issue reporting the problem?

@marcosmarxm
Copy link
Member

@btkcodedev do you mind taking a look into this contribution

@mkrawc
Copy link
Contributor Author

mkrawc commented Dec 16, 2024

Hello @mkrawc is there an open issue reporting the problem?

@marcosmarxm I don't think so. I was working with twilio connector and noticed those issues.

The main problem is that if the stream state is saved globally across all accounts the connector may miss data for subaccounts. So the scenario is:

  • connector lists accounts
  • connector downloads the first account data for requestes timeframe
  • connector sets the state to the latest downloaded date
  • connector crashes (or gets cancelled) before downloading data for all of the subaccounts
  • connector restarts with the stream state pointing to the latest date
  • connector doesn't read the data for remaining subaccounts

After the change the connector state is saved per individual subaccounts for after the failure the connector would pick up where it ended last time.

The other two changes are:

  • Get Usage records per day so that usage can be analyzed and aggregated - Before the change the connector would save just one usage record since the connector start date to the current date which makes the date useless for further aggregations. After the change the connector will save usage records for each day so that the data can be summed for a desired timeframe.
  • Fix edge case that Usage endpoint may return 404 if requesting usage for start/end dates way before the account creation date - It looks like Twilio API returns 404 errors when requesting usage records where both StartDate and EndDate are before when the account was created:
404 Client Error: Not Found for url: https://api.twilio.com/2010-04-01/Accounts/xxxxx/Usage/Records/Daily.json?PageSize=1000&StartDate=2022-12-01&EndDate=2023-12-01

Hope this helps. Let me know if you have any other questions.

@btkcodedev
Copy link
Collaborator

Looking forward to this

Copy link
Collaborator

@btkcodedev btkcodedev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Commenting on breaking change. If the spec change is important, we need a review from breaking-change-reviewers too

@@ -10,7 +10,7 @@
"account_sid": {
"title": "Account ID",
"description": "Twilio account SID",
"airbyte_secret": true,
"airbyte_secret": false,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If there is a spec change, it is a breaking change. We need to bump the version to 1.0.0 and add a twilio-migrations.md file.
But first, do we really need to change this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@btkcodedev I don't think this change is critical. Account ID doesn't sound like a secret variable and just thought it doesn't need to be a secret. I'll revert it to not cause a breaking change.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants