Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

source-pardot contribution from justbeez #48841

Conversation

justbeez
Copy link
Contributor

@justbeez justbeez commented Dec 7, 2024

What

This PR updates source Pardot (source-pardot).

The contributor provided the following description of the change:

  • Fix authentication
  • Migrate all existing streams to Pardot v5 API (except email_clicks which is only available in v4)
  • Re-implement incremental syncs for existing streams where possible
  • Add 17 new streams from the v5 API (folders, emails, engagement_studio_programs, folder_contents, forms, form_fields, form_handlers, form_handler_fields, landing_pages, layout_templates, lifecycle_stages, lifecycle_histories, list_emails, opportunities, tags, tracker_domains, visitor_page_views)
  • Add additional configuration options to better handle large accounts (e.g. adjustable split-up windows, page size)
  • Align to Pardot-recommended sort/filter/pagination conventions to avoid timeouts (based on Pardot support case #469072278)

Reviewer checklist

  • Resolve any merge conflicts and validate file diffs (make sure the PR only includes changes intended by the contributor)
  • After reviewing the changes, run the bump-version Airbyte-CI command locally to update the version of the connector according to the versioning guidelines. Add breakingChanges to metadata if necessary.
  • Ensure connector docs are up to date with any changes
  • Run /format-fix to resolve any formatting errors
  • Click into the CI workflows that wait for a maintainer to run them, which should trigger CI runs

Copy link

vercel bot commented Dec 7, 2024

The latest updates on your projects. Learn more about Vercel for Git ↗︎

1 Skipped Deployment
Name Status Preview Comments Updated (UTC)
airbyte-docs ⬜️ Ignored (Inspect) Visit Preview Dec 7, 2024 5:59pm

Copy link

vercel bot commented Dec 7, 2024

@justbeez is attempting to deploy a commit to the Airbyte Growth Team on Vercel.

A member of the Team first needs to authorize it.

@CLAassistant
Copy link

CLAassistant commented Dec 7, 2024

CLA assistant check
All committers have signed the CLA.

@justbeez
Copy link
Contributor Author

justbeez commented Dec 7, 2024

This MR is originally based on a Slack thread with @natikgadzhi available here for context:
https://airbytehq.slack.com/archives/C027KKE4BCZ/p1731076657058859

Pulling the following notes from that thread as they're relevant to some of the choices implemented here:

It's worth noting that I tried the current version of the connector with several accounts and it's completely broken (auth doesn't work right, plus what I mentioned with incremental being dropped). So while my version is a breaking change, it's a breaking change on an already broken connector (at least from what I've tried on OSS).

The reason I undertook a more significant rewrite is due to multiple of our large Pardot orgs experiencing failures of streams like list_membership because of the way the old connector was built against the v4 API (TLDR; the mixing of id and created_at filtering causes problems for them, since it can't be correctly optimized). They're not willing to fix that, and v4 workarounds like adding their output=bulk filter didn't work for all the nonprofit accounts we serve. So I moved everything over to v5 per their recommendation, and implemented better hybrid split up interval filtering and along with token-based cursor pagination.

There are a couple things you'll see that might be confusing at first:

  • Their v5 API requires explicit field lists for every request. I pulled all currently available fields from their docs to maximize use-case coverage (in the case of relations to other streams, I just pulled the appropriate *Id key field to avoid bloating payloads with redundant data)
  • Their token-based cursor pagination in v5 doesn't allow other parameters to be present when you're reading the token; so I'm using Jinja merges to drop them when looping the token (which means I can't use the normal functionality provided by some of the components and instead have to manually list the parameters and check if a token is present to conditionally include them)
  • The above doesn't seem to work when only the Pagination component is used (e.g. non-incremental streams); the next_page_token is just blank for no reason when I'm not also using the Incremental component. To work around this, I enabled a dummy Parameterized Requests config on these streams which makes next_page_token (I'll file a proper issue on this if there isn't one already, but for now I just put a VERY verbose note in the field name as to why its there). [example: prospect_accounts stream]

Happy to clarify and help out however needed to get this landed.

I also have code I generated for the other 6 v5 streams not included in this MR, but I don't have an account with permissions to where I can test them (so couldn't submit them through the UI). If y'all have a sandbox account, I'm happy to provide those as well to get you full coverage of their v5 API.

One more item I didn't think of at the time is that I expanded the schemas to include all possible return fields and their appropriate types, as a lot of options only show up in some accounts. I would not recommend overwriting the schemas as each individual account tends to only have a subset of the fields.

@justbeez
Copy link
Contributor Author

Superseded by #49424

(didn't realize I had a stale fork when contributing through Builder. oops.)

@justbeez justbeez closed this Dec 12, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants