Background
Allow resources and transformers to be async generators, so that DltResource accepts an async iterator as data.
Async iterators should be evaluated in PipeIterator like regular iterators, but via the futures pool.
Please note:
Async iterators are still iterated item by item, so a single async iterator does not make extraction run in parallel. However, several such iterators may run in parallel in the futures pool.
DltResource implements a regular iterator interface, and this ticket does not change that.
Implementation
One approach could be to wrap the async iterator in a coroutine that awaits the next element and then returns both that element and the async generator, which is immediately re-scheduled if it is not exhausted. Several code paths also currently prevent async generators from being used; see the tests below.
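The wrapping approach described above can be sketched with plain asyncio. This is only an illustration of the idea, not dlt's PipeIterator code: the names `next_item`, `drain`, `letters`, and the `_DONE` sentinel are hypothetical, and an asyncio task set stands in for the futures pool.

```python
import asyncio

_DONE = object()  # hypothetical sentinel marking an exhausted generator


async def next_item(agen):
    """Await one element; return (element, agen), or (_DONE, agen) when exhausted."""
    try:
        return await agen.__anext__(), agen
    except StopAsyncIteration:
        return _DONE, agen


async def letters(idx, delay=0.01):
    # Stand-in for an async generator resource.
    for letter in ["a", "b", "c"]:
        await asyncio.sleep(delay)
        yield {"async_gen": idx, "letter": letter}


async def drain(*agens):
    """Interleave several async generators, keeping one pending task per generator."""
    pending = {asyncio.create_task(next_item(g)) for g in agens}
    items = []
    while pending:
        done, pending = await asyncio.wait(
            pending, return_when=asyncio.FIRST_COMPLETED
        )
        for task in done:
            item, agen = task.result()
            if item is not _DONE:
                items.append(item)
                # Re-schedule the same generator for its next element.
                pending.add(asyncio.create_task(next_item(agen)))
    return items


def run_example():
    return asyncio.run(drain(letters(10), letters(11)))
```

Each generator is still consumed one item at a time, but because every generator has its own pending task, the sleeps of different generators overlap.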
We also need to make the resource name, which is used to look up resource state, more reliably available inside executed awaitables and callables. Most probably we will wrap all of them and set the resource name just before calling, already inside the pool. If the resource state is accessed as the first operation, everything should work correctly.
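One way to picture "set the resource name just before calling, already in the pool" is a wrapper around each submitted callable. The sketch below uses the standard library's contextvars and a thread pool purely as an assumption; it does not show how dlt actually tracks the current resource, and `current_resource`, `with_resource_name`, and `read_state_key` are hypothetical names.

```python
import contextvars
from concurrent.futures import ThreadPoolExecutor

# Hypothetical holder for the name of the resource being executed.
current_resource = contextvars.ContextVar("current_resource", default=None)


def with_resource_name(name, fn):
    """Wrap fn so the resource name is set inside the worker, right before the call."""
    def wrapper(*args, **kwargs):
        token = current_resource.set(name)
        try:
            return fn(*args, **kwargs)
        finally:
            current_resource.reset(token)
    return wrapper


def read_state_key():
    # Stand-in for code that reads resource state as its first operation.
    return f"state/{current_resource.get()}"


def run_in_pool():
    with ThreadPoolExecutor(max_workers=2) as pool:
        futures = [
            pool.submit(with_resource_name(name, read_state_key))
            for name in ("res_a", "res_b")
        ]
        return [f.result() for f in futures]
```

Because the name is set inside the worker rather than at submission time, each callable sees its own resource name regardless of which thread picks it up.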
Tests
The following cases should pass. You'll find more to test during the implementation.
import asyncio
import dlt

@dlt.resource
async def async_gen_resource(idx):
    # async generator resource: yields one item per second
    for l in ["a", "b", "c"]:
        await asyncio.sleep(1)
        yield {"async_gen": idx, "letter": l}

pipeline_1 = dlt.pipeline("pipeline_1", destination="duckdb", full_refresh=True)
pipeline_1.run(
    async_gen_resource(10)
)
# note: async_gen_table is not defined in this snippet
pipeline_1.run(
    async_gen_table(11)
)