Introduce TableChangesScan::execute and ScanFileReader #555
Closed
36 commits
9e5b04a initial log replay (OussamaSaoudi-db)
2ff7061 Add basic mock table for testing (OussamaSaoudi-db)
3820dd3 Finish testing framework for commit actions (OussamaSaoudi-db)
ff7e1a1 Fix deletion vectors (OussamaSaoudi-db)
674b9df Add protocol test (OussamaSaoudi-db)
2460f2b Make MockTable async (OussamaSaoudi-db)
3fe9347 add schema check (OussamaSaoudi-db)
98d64d3 Add config flag parsing (OussamaSaoudi-db)
6221dc5 Add configuration check (OussamaSaoudi-db)
2a33d77 Change log replay to work with table changes scan (OussamaSaoudi-db)
fe4c0e6 add timestamp tests (OussamaSaoudi-db)
c2b1c00 Use map_ok (OussamaSaoudi-db)
63435e9 Address some pr comments (OussamaSaoudi-db)
d8b1225 Integrate with table changes builder (OussamaSaoudi-db)
f072e6e Fix private visit_protocol, remove print (OussamaSaoudi-db)
ba76bb1 Change selection vector computation (OussamaSaoudi-db)
a08dc6e Add comments for log replay (OussamaSaoudi-db)
93e900a more documentation (OussamaSaoudi-db)
9702ef0 Add file-level doc (OussamaSaoudi-db)
196de69 Revert "Add file-level doc" (OussamaSaoudi-db)
d188791 Add file level doc (OussamaSaoudi-db)
33f4b7e Initial scan file (OussamaSaoudi-db)
968b543 Move docs to fields (OussamaSaoudi-db)
b183304 Move common utils to utils::test_utils (OussamaSaoudi-db)
27d7ed5 Refactor to prepare for scan_file, remove unused annotation (OussamaSaoudi-db)
8c85361 Merge branch 'log_replay_2' into cdf_scan_file (OussamaSaoudi-db)
e384be6 add scan file visitor test (OussamaSaoudi-db)
21b7e85 Fix clippy (OussamaSaoudi-db)
2c9f483 stub out execute (OussamaSaoudi-db)
8d3d843 Add scan_data_to_scan_file phase (OussamaSaoudi-db)
ad29685 Merge branch 'cdf_scan_file' into cdf_read_phase (OussamaSaoudi-db)
a1c78ce initial data read phase (OussamaSaoudi-db)
40dca95 initial data read phase (OussamaSaoudi-db)
7f1098b update to scan_file expression (OussamaSaoudi-db)
6e37d99 Merge branch 'cdf_scan_file' into cdf_read_phase (OussamaSaoudi-db)
13cb83c Working cdf dv resolution (OussamaSaoudi-db)
@@ -0,0 +1,19 @@
+[package]
+name = "change_data_feed"
+version = "0.1.0"
+edition = "2021"
+
+[dependencies]
+arrow-array = { workspace = true }
+arrow-schema = { workspace = true }
+clap = { version = "4.5", features = ["derive"] }
+delta_kernel = { path = "../../../kernel", features = [
+    "cloud",
+    "default-engine",
+    "developer-visibility",
+    "sync-engine"
+] }
+env_logger = "0.11.3"
+url = "2"
+itertools = "0.13"
+arrow = { workspace = true, features = ["prettyprint"] }
@@ -0,0 +1,52 @@
+use std::sync::Arc;
+
+use arrow::{compute::filter_record_batch, util::pretty::print_batches};
+use arrow_array::RecordBatch;
+use clap::Parser;
+use delta_kernel::{
+    engine::{arrow_data::ArrowEngineData, sync::SyncEngine},
+    DeltaResult, Table,
+};
+use itertools::Itertools;
+
+#[derive(Parser)]
+#[command(author, version, about, long_about = None)]
+#[command(propagate_version = true)]
+struct Cli {
+    /// Path to the table to inspect
+    path: String,
+
+    start_version: u64,
+
+    end_version: Option<u64>,
+}
+
+fn main() -> DeltaResult<()> {
+    let cli = Cli::parse();
+    let table = Table::try_from_uri(cli.path)?;
+    let engine = Arc::new(SyncEngine::new());
+    let table_changes = table.table_changes(engine.as_ref(), cli.start_version, cli.end_version)?;
+
+    let x = table_changes.into_scan_builder().build()?;
+    let batches: Vec<RecordBatch> = x
+        .execute(engine)?
+        .map(|scan_result| -> DeltaResult<_> {
+            let scan_result = scan_result?;
+            let mask = scan_result.full_mask();
+            let data = scan_result.raw_data?;
+            let record_batch: RecordBatch = data
+                .into_any()
+                .downcast::<ArrowEngineData>()
+                .map_err(|_| delta_kernel::Error::EngineDataType("ArrowEngineData".to_string()))?
+                .into();
+            if let Some(mask) = mask {
+                Ok(filter_record_batch(&record_batch, &mask.into())?)
+            } else {
+                Ok(record_batch)
+            }
+        })
+        .try_collect()?;
+    print_batches(&batches)?;
+
+    Ok(())
+}
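The `if let Some(mask)` branch in the example above applies a row selection mask to each batch before printing. As a rough, dependency-free sketch of that idea (not the kernel's or arrow's implementation), the hypothetical helper `apply_selection` below keeps only the rows whose mask entry is `true`. Two behaviours are assumptions made for illustration: `None` is taken to mean "no rows are masked out", and rows beyond the end of a short mask are treated as selected.

```rust
/// Hypothetical helper, for illustration only: filter `rows` by a boolean
/// selection mask, in the spirit of `filter_record_batch`.
///
/// Assumptions (not taken from the diff above):
/// - `None` means every row is selected.
/// - If the mask is shorter than `rows`, the uncovered rows are selected.
fn apply_selection<T: Clone>(rows: &[T], mask: Option<&[bool]>) -> Vec<T> {
    match mask {
        // No mask at all: keep every row unchanged.
        None => rows.to_vec(),
        Some(mask) => rows
            .iter()
            .enumerate()
            // Keep row `i` when mask[i] is true, or when the mask has no entry for it.
            .filter(|(i, _)| mask.get(*i).copied().unwrap_or(true))
            .map(|(_, row)| row.clone())
            .collect(),
    }
}
```

For instance, calling `apply_selection(&[1, 2, 3], Some(&[true, false, true][..]))` drops the middle row, while passing `None` returns the input as-is.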
@@ -1,10 +1,9 @@
 //! Code relating to parsing and using deletion vectors
 
-use std::io::{Cursor, Read};
-use std::sync::Arc;
-
 use bytes::Bytes;
 use roaring::RoaringTreemap;
+use std::io::{Cursor, Read};
Review comment: why do you need these?
+use std::sync::Arc;
 use url::Url;
 
 use delta_kernel_derive::Schema;
@@ -13,6 +12,7 @@ use crate::utils::require;
 use crate::{DeltaResult, Error, FileSystemClient};
 
 #[derive(Debug, Clone, PartialEq, Eq, Schema)]
+#[cfg_attr(test, derive(serde::Serialize), serde(rename_all = "camelCase"))]
 pub struct DeletionVectorDescriptor {
     /// A single character to indicate how to access the DV. Legal options are: ['u', 'i', 'p'].
     pub storage_type: String,
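The doc comment on `storage_type` lists three legal single-character values. A minimal sketch of validating that field might look like the following. The `StorageType` enum and `parse_storage_type` function are hypothetical names that do not appear in this diff, and the meanings attached to each character come from the Delta protocol's description of deletion vectors, not from the code shown here.

```rust
/// Hypothetical enum, for illustration only: the three storage types a
/// deletion vector descriptor may declare.
#[derive(Debug, PartialEq, Eq)]
enum StorageType {
    /// 'u': the DV lives in a file whose relative path is derived from a UUID.
    UuidRelativePath,
    /// 'i': the DV is stored inline in the descriptor itself.
    Inline,
    /// 'p': the DV lives in a file at an absolute path.
    AbsolutePath,
}

/// Hypothetical helper: map the single-character `storage_type` string to
/// a `StorageType`, rejecting anything outside ['u', 'i', 'p'].
fn parse_storage_type(storage_type: &str) -> Result<StorageType, String> {
    match storage_type {
        "u" => Ok(StorageType::UuidRelativePath),
        "i" => Ok(StorageType::Inline),
        "p" => Ok(StorageType::AbsolutePath),
        other => Err(format!("unknown deletion vector storage type: {other:?}")),
    }
}
```

Parsing the character once up front means later code can match on an enum instead of comparing strings, though whether that trade-off suits the kernel is a design question this sketch does not settle.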
nit: extra spaces