Skip to content

Commit

Permalink
feat: introduce CDC write-side support for the Update operations
Browse files Browse the repository at this point in the history
This change introduces a `CDCTracker` which helps collect changes during
merges and update. This is admittedly rather inefficient, but my hope is
that this provides a place to start iterating and improving upon the
writer code

There is still additional work which needs to be done to handle table
features properly for other code paths (see the middleware discussion we
have had in Slack) but this produces CDC files for Update operations

Fixes delta-io#604
Fixes delta-io#2095
  • Loading branch information
rtyler committed May 21, 2024
1 parent 319229d commit df43924
Show file tree
Hide file tree
Showing 9 changed files with 812 additions and 20 deletions.
3 changes: 2 additions & 1 deletion crates/core/Cargo.toml
Original file line number Diff line number Diff line change
Expand Up @@ -115,7 +115,8 @@ tokio = { version = "1", features = ["macros", "rt-multi-thread"] }
utime = "0.3"

[features]
default = []
cdf = []
default = ["cdf"]
datafusion = [
"dep:datafusion",
"datafusion-expr",
Expand Down
6 changes: 6 additions & 0 deletions crates/core/src/delta_datafusion/mod.rs
Original file line number Diff line number Diff line change
Expand Up @@ -509,6 +509,12 @@ impl<'a> DeltaScanBuilder<'a> {
self
}

/// Use the provided [SchemaRef] for the [DeltaScan]
pub fn with_schema(mut self, schema: SchemaRef) -> Self {
self.schema = Some(schema);
self
}

pub async fn build(self) -> DeltaResult<DeltaScan> {
let config = self.config;
let schema = match self.schema {
Expand Down
Loading

0 comments on commit df43924

Please sign in to comment.