2023-08-24 kernel meeting notes #27
zachschuermann
started this conversation in
General
summary
main java threads:
main rust threads:
attendees
@tdas @wjones127 @nicklan @ryan-johnson-databricks @schuermannator
notes stream
question from td for will: async
object storage layer doesn't even offer a sync API
nick: block_on is dangerous to use. calling it from the wrong thread can cause it to panic, and it blocks the current thread
if we want a truly small/clean/sync API, doing block_on isn't the right call. instead just disallow async
many rust projects just build an async API then throw a sync API on top (postgres etc. i think does this - zach)
the other way is very hard: making a sync API first and then adding async on top
we want to be able to issue reads and then wait for the first (or prefetching etc)
hypothesis - there might not be a shim, more like parallel implementations (pure logic shared sans IO)
--> is this @wjones127 PR? yes! to review
more code than ideal but a decent way to appease delta-rs and FFI
consuming the iterator that produces addfiles
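A rough sketch of the "pure logic shared sans IO" hypothesis above, not the code in the PR under review. The log-replay logic takes data that has already been read, and the sync and async front ends differ only in how they obtain it. Every name here (Action, parse_line, live_files, live_files_sync, live_files_async) and the "add:<path>" / "remove:<path>" line format are made up for illustration.

```rust
use std::future::Future;

/// Hypothetical, simplified log action; not the real kernel type.
#[derive(Debug, PartialEq)]
pub enum Action {
    Add(String),
    Remove(String),
}

/// Pure logic, no IO: parse one line of a made-up
/// "add:<path>" / "remove:<path>" format.
pub fn parse_line(line: &str) -> Option<Action> {
    let (kind, path) = line.split_once(':')?;
    match kind {
        "add" => Some(Action::Add(path.to_string())),
        "remove" => Some(Action::Remove(path.to_string())),
        _ => None,
    }
}

/// Pure logic, no IO: replay the actions in order and return the live files.
pub fn live_files(log: &str) -> Vec<String> {
    let mut files: Vec<String> = Vec::new();
    for action in log.lines().filter_map(parse_line) {
        match action {
            Action::Add(path) => files.push(path),
            Action::Remove(path) => files.retain(|f| *f != path),
        }
    }
    files
}

/// Sync front end: the caller supplies a blocking read.
pub fn live_files_sync(read: impl FnOnce() -> String) -> Vec<String> {
    live_files(&read())
}

/// Async front end: the caller supplies a future that yields the log text.
/// Only the way the bytes arrive differs; the replay logic is shared.
pub async fn live_files_async(read: impl Future<Output = String>) -> Vec<String> {
    live_files(&read.await)
}
```

The duplication is limited to the thin front ends, which matches the "more code than ideal" tradeoff: each IO style needs its own entry point, but the replay logic exists once.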
ryan: still - what do we gain from async? pragmatically.
in delta-kernel prototype from ryan: we had a thread-friendly version of the kernel where it tracks the queue of work items that 'need doing' (i'm going to need parquet file read, i'm going to need json parsed - no IO) helper thread picks from queue and does it. blocking from kernel perspective but not actually externally.
zach observation: can we encode the above in async rust? is it easier to do with queues? tradeoffs?
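For comparison with an async version, here is a minimal sketch of the queue idea using only std channels and one helper thread. It assumes a single helper and in-order results. WorkItem, WorkResult, do_work and run_with_helper are hypothetical names, and do_work is a stand-in that does no real IO or parsing.

```rust
use std::sync::mpsc;
use std::thread;

/// Hypothetical work items the kernel "needs doing"; illustrative only.
#[derive(Debug)]
pub enum WorkItem {
    ReadParquet(String),
    ParseJson(String),
}

/// Hypothetical results handed back to the kernel.
#[derive(Debug, PartialEq)]
pub enum WorkResult {
    ParquetRows(usize),
    JsonFields(usize),
}

/// Stand-in for engine work. A real helper would read files or parse JSON;
/// this just derives a number from the input so the flow can be observed.
fn do_work(item: WorkItem) -> WorkResult {
    match item {
        WorkItem::ReadParquet(path) => WorkResult::ParquetRows(path.len()),
        WorkItem::ParseJson(text) => WorkResult::JsonFields(text.matches(':').count()),
    }
}

/// The kernel side queues work and blocks on results, while a helper
/// thread drains the queue. Blocking from the kernel's point of view only.
pub fn run_with_helper(items: Vec<WorkItem>) -> Vec<WorkResult> {
    let (work_tx, work_rx) = mpsc::channel::<WorkItem>();
    let (done_tx, done_rx) = mpsc::channel::<WorkResult>();

    let helper = thread::spawn(move || {
        // Runs until the work queue is closed by the kernel side.
        for item in work_rx {
            if done_tx.send(do_work(item)).is_err() {
                break;
            }
        }
    });

    let expected = items.len();
    for item in items {
        work_tx.send(item).expect("helper thread is alive");
    }
    drop(work_tx); // close the queue so the helper can exit

    let results: Vec<WorkResult> = done_rx.iter().take(expected).collect();
    helper.join().expect("helper thread panicked");
    results
}
```

With one helper and FIFO channels the results come back in submission order; a pool of helpers would need to tag each result with an id.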
will: thought he could reduce the duplicate code more than he did.
would like to explore: async API in rust then FFI does sync wrapper around that (which requires an executor - no matter how simple)
same concern: do we want to depend on an executor (even lightweight one like block_on or smol)
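To make the executor concern concrete, below is a sketch of roughly the smallest executor a sync wrapper would need, written against std only. It parks the calling thread until the future is ready, so it blocks that thread, and it drives no IO reactor, so futures from crates that need their own runtime would presumably not make progress on it. latest_version_async, latest_version_sync and YieldOnce are hypothetical names used only for this example.

```rust
use std::future::Future;
use std::pin::{pin, Pin};
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};
use std::thread::{self, Thread};

/// Waker that unparks the thread blocked in `block_on`.
struct ThreadWaker(Thread);

impl Wake for ThreadWaker {
    fn wake(self: Arc<Self>) {
        self.0.unpark();
    }
}

/// Minimal executor: poll the future on the current thread and park
/// between polls. Spurious wakeups are fine because we simply poll again.
pub fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = pin!(fut);
    let waker = Waker::from(Arc::new(ThreadWaker(thread::current())));
    let mut cx = Context::from_waker(&waker);
    loop {
        match fut.as_mut().poll(&mut cx) {
            Poll::Ready(value) => return value,
            Poll::Pending => thread::park(),
        }
    }
}

/// Test future that returns Pending once, to exercise the park/unpark path.
struct YieldOnce(bool);

impl Future for YieldOnce {
    type Output = ();
    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()> {
        if self.0 {
            Poll::Ready(())
        } else {
            self.0 = true;
            cx.waker().wake_by_ref();
            Poll::Pending
        }
    }
}

/// Hypothetical async kernel API (stand-in for an async listing call).
pub async fn latest_version_async(versions: Vec<u64>) -> Option<u64> {
    YieldOnce(false).await; // pretend to wait on IO
    versions.into_iter().max()
}

/// Sync wrapper an FFI layer could expose on top of the async API.
pub fn latest_version_sync(versions: Vec<u64>) -> Option<u64> {
    block_on(latest_version_async(versions))
}
```

Even this small version has to be written or pulled in as a dependency, which is the tradeoff raised above.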
nick: thing it buys: we get to use async crates.
does it buy us that much if we need to build sync version and async version?
idea:
if you do async io, you can provide an engine in the tableclient?
will question: how to wrap in C++? do C++17 at least? build system? DuckDB uses CMake for extensions.
care about C++ std for headers, library deps, ABI
zach question: strawman: start with sync
summarize: driving questions
Streams in the interface