I started keeping my notes later in the call because I was unfortunately multitasking, so hopefully the others can add some details
Todo
TD is really interested in a prototype using an FFI bridge to do interesting things in a C++ PoC connector
@roeap and @rtyler feel like we're close to having a read prototype with delta-rs on kernel-rs
Raw Notes
Matthew shares a really cool spreadsheet showing how the possible connectors sit on top of them.
Lots of discussion around the Future and poll_next()
Deletion vectors
JSON checkpoint schema, what do we do?
pulled in serde_json
cannot provide the full schema for that table with the current files
right now we read the log and pick and choose what we want to read
With the _last_checkpoint file we get the checkpoint version and list the log from that version. That file also carries an optional schema for the checkpoint files; right now we're not using it and are just inferring the schema. Since Spark writes it, @roeap feels there must be some use for that data.
checkpoint schema is supposed to match with the stats parsed column.
Right now we'll just ignore that
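To make the "get the version and list from that version" step concrete, here is a minimal sketch using only the Rust standard library. It assumes the version has already been read out of _last_checkpoint and that the log directory listing is available as plain file names. The helpers `commit_version` and `commits_after_checkpoint` are hypothetical names for illustration, not part of delta-rs or kernel-rs.

```rust
/// Parse the version out of a Delta commit file name such as
/// `00000000000000000011.json`. Returns None for anything else
/// (checkpoint files, `_last_checkpoint`, other files).
fn commit_version(file_name: &str) -> Option<u64> {
    let stem = file_name.strip_suffix(".json")?;
    // Commit files are named with a 20-digit, zero-padded version.
    if stem.len() == 20 && stem.bytes().all(|b| b.is_ascii_digit()) {
        stem.parse().ok()
    } else {
        None
    }
}

/// Hypothetical helper: given the version recorded in `_last_checkpoint`,
/// return the commit versions that still have to be replayed on top of
/// the checkpoint, in ascending order.
fn commits_after_checkpoint(checkpoint_version: u64, listing: &[&str]) -> Vec<u64> {
    let mut versions: Vec<u64> = listing
        .iter()
        .filter_map(|name| commit_version(name))
        .filter(|v| *v > checkpoint_version)
        .collect();
    versions.sort_unstable();
    versions
}
```

The sketch deliberately ignores the optional checkpoint schema, matching the "just ignore that for now" decision above.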
We don't want to parse the JSON until after data skipping
Discussing serde_json::Value and its utility as an opaque type.
Trying to have JsonHandler match the behavior of the ParquetReader. Wanting to have the schema pre-suggested/expected.
Some data might be a little trickier, such as with timestamps, etc.
@rtyler mentioned his work on the ColumnarBatch types and that we're just going to have to implement a bunch of the types defined in the protocol in Rust.
@roeap has some concerns about type handling for partition values
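As a rough illustration of the partition value concern: the log stores partition values as strings, so a reader has to convert each one according to the column's declared type. The sketch below assumes a small, hypothetical pair of enums (`PartitionType`, `PartitionValue`) and a hypothetical `parse_partition_value` function. It covers only a few primitive types and leaves out the trickier ones mentioned above, such as timestamps.

```rust
/// Hypothetical subset of the primitive types a partition column can have.
#[derive(Debug, PartialEq)]
enum PartitionType {
    Integer,
    Long,
    Boolean,
    String,
}

/// Hypothetical typed representation of a parsed partition value.
#[derive(Debug, PartialEq)]
enum PartitionValue {
    Null,
    Integer(i32),
    Long(i64),
    Boolean(bool),
    String(String),
}

/// Convert the string form found in the log into a typed value.
/// `None` stands for a JSON null in the partition value map.
fn parse_partition_value(
    raw: Option<&str>,
    ty: &PartitionType,
) -> Result<PartitionValue, String> {
    let raw = match raw {
        None => return Ok(PartitionValue::Null),
        Some(s) => s,
    };
    match ty {
        PartitionType::Integer => raw
            .parse::<i32>()
            .map(PartitionValue::Integer)
            .map_err(|e| format!("invalid integer {raw:?}: {e}")),
        PartitionType::Long => raw
            .parse::<i64>()
            .map(PartitionValue::Long)
            .map_err(|e| format!("invalid long {raw:?}: {e}")),
        PartitionType::Boolean => raw
            .parse::<bool>()
            .map(PartitionValue::Boolean)
            .map_err(|e| format!("invalid boolean {raw:?}: {e}")),
        PartitionType::String => Ok(PartitionValue::String(raw.to_string())),
    }
}
```

Returning a `Result` rather than panicking keeps the decision about malformed partition values with the caller, which seems like the safer default while the type handling is still being worked out.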
Below are some of my rough notes from discussing things with @tdas @MrPowers @wjones127 @roeap @ryan-johnson-databricks and @vkorukanti