WIP: asyncio-based CDK #34424

Closed
wants to merge 74 commits into from

clnoll
Contributor

clnoll commented Jan 22, 2024

Follow-up to the POC that shows how to integrate asyncio into the CDK and the Salesforce connector.

This demonstrates how we might split out an asyncio-based CDK module.

High-level overview:

In this implementation, we interact with the CDK through the existing entrypoint (although it's worth considering whether to create a new asyncio-based entrypoint).

To use the asyncio-based concurrency, connectors will inherit from AsyncAbstractSource and AsyncStream, which provide async versions of the abstract methods required for the Airbyte commands. For backwards compatibility, this code also includes a SourceDispatcher that can route requests to either synchronous or async sources.
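As a rough illustration of the dispatcher idea, the sketch below shows how one synchronous `read()` entry point could route to either kind of source. `SyncSource`, `AsyncSource`, `_next_or_done`, and this `SourceDispatcher` are hypothetical simplifications, not the PR's actual classes or signatures. For brevity, the sketch steps the async generator on the calling thread with a private event loop; the PR instead bridges sync and async code with a SourceReader that runs the async work on a separate thread.

```python
import asyncio
from abc import ABC, abstractmethod
from typing import Any, AsyncIterator, Iterator, Union

_DONE = object()  # sentinel marking exhaustion of an async generator


class SyncSource(ABC):
    """Hypothetical stand-in for the existing synchronous source base class."""

    @abstractmethod
    def read(self) -> Iterator[dict]: ...


class AsyncSource(ABC):
    """Hypothetical stand-in for an async source: read() is an async generator."""

    @abstractmethod
    def read(self) -> AsyncIterator[dict]: ...


async def _next_or_done(agen: AsyncIterator[Any]) -> Any:
    # Await one item, translating end-of-iteration into a sentinel value.
    try:
        return await agen.__anext__()
    except StopAsyncIteration:
        return _DONE


class SourceDispatcher:
    """Hypothetical dispatcher exposing one synchronous read() for either source type."""

    def __init__(self, source: Union[SyncSource, AsyncSource]) -> None:
        self.source = source

    def read(self) -> Iterator[dict]:
        if isinstance(self.source, AsyncSource):
            yield from self._drain_async(self.source.read())
        else:
            yield from self.source.read()

    @staticmethod
    def _drain_async(agen: AsyncIterator[dict]) -> Iterator[dict]:
        # Step the async generator one item at a time on a private event loop,
        # so records still reach the synchronous caller lazily.
        loop = asyncio.new_event_loop()
        try:
            while True:
                item = loop.run_until_complete(_next_or_done(agen))
                if item is _DONE:
                    break
                yield item
        finally:
            # Finalize any async generators (including agen) before closing the loop.
            loop.run_until_complete(loop.shutdown_asyncgens())
            loop.close()
```

Because the dispatcher returns a plain iterator in both cases, the entrypoint can stay synchronous. This particular approach only works when no event loop is already running on the calling thread.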

Apart from swapping the requests library for aiohttp, there is only one other major change needed to support the async functionality: the introduction of a SourceReader. It was required to bridge the entrypoint's synchronous code with the async code.

The source instantiates the SourceReader. On instantiation, the SourceReader creates a queue and starts a single thread that does the async work. Rather than being emitted on this second thread, AirbyteMessages are enqueued in the SourceReader's queue; the async source then uses the SourceReader as an iterator, reading items from the queue and emitting them.
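A minimal sketch of the SourceReader pattern described above, assuming a hypothetical constructor that takes a zero-argument factory returning an async iterator of messages (the PR's real signature may differ). A background thread runs the async producer under its own event loop and puts each item on a thread-safe `queue.Queue`; the consuming thread iterates over the reader, draining the queue until it sees an end-of-stream sentinel. Exceptions raised on the worker thread are forwarded through the queue and re-raised in the consumer.

```python
import asyncio
import queue
import threading
from dataclasses import dataclass
from typing import Any, AsyncIterator, Callable, Iterator

_END = object()  # sentinel: the producer has finished


@dataclass
class _Failure:
    """Wraps an exception raised on the worker thread."""
    exc: Exception


class SourceReader:
    """Hypothetical sketch: run an async producer on a background thread and
    expose its output to synchronous code as a plain iterator."""

    def __init__(self, produce: Callable[[], AsyncIterator[Any]]) -> None:
        self._queue: "queue.Queue[Any]" = queue.Queue()  # unbounded, thread-safe
        self._produce = produce
        # A single worker thread does all of the async work.
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def _run(self) -> None:
        try:
            # asyncio.run() gives the worker thread its own event loop.
            asyncio.run(self._pump())
        except Exception as exc:
            self._queue.put(_Failure(exc))
        finally:
            self._queue.put(_END)

    async def _pump(self) -> None:
        # Enqueue messages instead of emitting them from the worker thread.
        async for item in self._produce():
            self._queue.put(item)  # never blocks: the queue is unbounded

    def __iter__(self) -> Iterator[Any]:
        # Runs on the consuming thread: drain the queue until the sentinel.
        while True:
            item = self._queue.get()
            if item is _END:
                break
            if isinstance(item, _Failure):
                raise item.exc
            yield item
        self._thread.join()
```

The queue is unbounded here so the async side never blocks its event loop on `put()`; a production version would likely want backpressure, for example a bounded queue fed through `asyncio.to_thread`.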

[image]

This will replace the infrastructure of the thread-based implementation. I won't go into detail on that implementation here, but here's a diagram giving a simplified overview of it.
[image]

TODOs:

  • There's a lot of duplicate code between the normal and asyncio-based CDK that can be consolidated.
  • Consider propagating the asyncio code to the entrypoint so that we don't require the SourceReader.

clnoll requested a review from a team as a code owner January 22, 2024 19:15

vercel bot commented Jan 22, 2024

The latest updates on your projects.

1 Ignored Deployment
airbyte-docs: ⬜️ Ignored, updated Jan 22, 2024 7:15pm (UTC)

octavia-squidington-iii added the area/connectors (Connector related issues), CDK (Connector Development Kit), and connectors/source/salesforce labels Jan 22, 2024
clnoll changed the title from "Asyncio cdk" to "WIP: asyncio-based CDK" Jan 22, 2024
Contributor

@clnoll what's the status of this PR? Just wondering, since I created a filter for PRs that have the extensibility team as a required reviewer and noticed this one.

Contributor Author

clnoll commented Feb 24, 2024

@erohmensing this is not going anywhere any time soon. Closing it now.

clnoll closed this Feb 24, 2024
3 participants