[write] add Transaction with commit info and commit implementation #370

Merged: 72 commits on Oct 25, 2024
Commits
f40221a
new Transaction API, write_json. empty commit for now
zachschuermann Oct 1, 2024
b16491c
commit info working
zachschuermann Oct 2, 2024
432a339
fix commit info
zachschuermann Oct 2, 2024
93fab4d
well that was a mess
zachschuermann Oct 3, 2024
64d7eaf
better
zachschuermann Oct 4, 2024
05e9488
cleanup
zachschuermann Oct 7, 2024
2282cd8
fmt
zachschuermann Oct 7, 2024
21928b8
test cleanup
zachschuermann Oct 7, 2024
532ea8c
appease clippy
zachschuermann Oct 8, 2024
215ed4e
fmt
zachschuermann Oct 8, 2024
78c8464
lil cleanup
zachschuermann Oct 8, 2024
0f1f955
add a test
zachschuermann Oct 8, 2024
8cc9cc9
better assert
zachschuermann Oct 8, 2024
114c16f
address feedback
zachschuermann Oct 8, 2024
b7c351f
address feedback, cleanup
zachschuermann Oct 10, 2024
9a9e9d3
fmt
zachschuermann Oct 10, 2024
6b0c2d4
Update kernel/src/engine/sync/json.rs
zachschuermann Oct 10, 2024
d1af098
more feedback
zachschuermann Oct 10, 2024
0ba047d
nits
zachschuermann Oct 10, 2024
667a8e2
add empty commit test
zachschuermann Oct 10, 2024
52bd5f2
add empty commit info tests, debugging expr
zachschuermann Oct 10, 2024
7696d7d
just make my test fail
zachschuermann Oct 10, 2024
fa6c81d
try to leverage ParsedLogPath?
zachschuermann Oct 11, 2024
a3abbfa
fmt
zachschuermann Oct 11, 2024
d7ea4c4
enforce single-row commit info
zachschuermann Oct 11, 2024
bac1d09
error FFI
zachschuermann Oct 11, 2024
bc541dd
better path api
zachschuermann Oct 11, 2024
fa1caf4
comment
zachschuermann Oct 11, 2024
9d875cd
clean
zachschuermann Oct 11, 2024
023b85a
fix all the schema mess
zachschuermann Oct 11, 2024
c1c6e2a
remove lifetime
zachschuermann Oct 11, 2024
da43cf2
fix executor
zachschuermann Oct 11, 2024
26b8dbd
docs and i forgot a test
zachschuermann Oct 11, 2024
858f3fb
add commit info schema test
zachschuermann Oct 13, 2024
1ef5ffc
add sync json writer, add FileAlreadyExists error
zachschuermann Oct 13, 2024
6ee69e7
fix rebase
zachschuermann Oct 14, 2024
0b2b1ed
remove old file
zachschuermann Oct 14, 2024
2258549
revert arrow_expression and default/mod.rs
zachschuermann Oct 14, 2024
f463e22
revert little spelling fix (in separate pr)
zachschuermann Oct 14, 2024
1149a17
clean up some crate:: with use
zachschuermann Oct 14, 2024
3877ccc
cleanup
zachschuermann Oct 14, 2024
0b5b301
Merge remote-tracking branch 'upstream/main' into transaction
zachschuermann Oct 17, 2024
3daed9b
it's getting close
zachschuermann Oct 18, 2024
327bbde
have i done it?
zachschuermann Oct 18, 2024
6d2b41a
wip
zachschuermann Oct 21, 2024
0abd291
remove my wrong null_literal for map lol rip
zachschuermann Oct 21, 2024
b793523
back to using empty struct for operationParameters
zachschuermann Oct 22, 2024
2f4e4d0
comment
zachschuermann Oct 22, 2024
68edef2
Merge remote-tracking branch 'upstream/main' into transaction
zachschuermann Oct 22, 2024
673af96
wip need to fix commit info operationParameters
zachschuermann Oct 22, 2024
559bbea
fix commit info
zachschuermann Oct 22, 2024
5afe8db
fix error ffi
zachschuermann Oct 22, 2024
7f87591
fmt
zachschuermann Oct 22, 2024
76cdfaa
remove my debugging
zachschuermann Oct 22, 2024
cc7598c
docs, cleanup, better tests
zachschuermann Oct 23, 2024
a1ba008
clippy
zachschuermann Oct 23, 2024
525b8ff
rename + docs
zachschuermann Oct 23, 2024
a86495a
make CommitInfo have correct schema and isolate the hack inside gener…
zachschuermann Oct 23, 2024
0a2ecfc
Merge remote-tracking branch 'upstream/main' into transaction
zachschuermann Oct 23, 2024
c22f625
fix tests to match on Backtraced { .. }
zachschuermann Oct 23, 2024
630c694
appease clippy
zachschuermann Oct 23, 2024
f5530f9
fmt
zachschuermann Oct 23, 2024
37db615
Merge remote-tracking branch 'upstream/main' into transaction
zachschuermann Oct 23, 2024
d7ad2e6
use column_* macros
zachschuermann Oct 23, 2024
2141ecf
Update kernel/src/engine/arrow_utils.rs
zachschuermann Oct 23, 2024
75c976c
rename
zachschuermann Oct 23, 2024
81866c9
Merge remote-tracking branch 'refs/remotes/origin/transaction' into t…
zachschuermann Oct 23, 2024
4908174
make generate_commit_info take & not Arc
zachschuermann Oct 24, 2024
20ffd33
fix unwrap
zachschuermann Oct 24, 2024
4aba873
address comments
zachschuermann Oct 24, 2024
b4feb4f
Merge remote-tracking branch 'upstream/main' into transaction
zachschuermann Oct 24, 2024
1fc535e
make it with_operation and with_commit_info
zachschuermann Oct 25, 2024
revert arrow_expression and default/mod.rs
zachschuermann committed Oct 14, 2024
commit 2258549b763b19b7c1f08695626a1ac9ba611c36
8 changes: 4 additions & 4 deletions kernel/src/engine/arrow_expression.rs
@@ -214,12 +214,12 @@ fn evaluate_expression(
     let output_fields: Vec<ArrowField> = output_cols
         .iter()
         .zip(schema.fields())
-        .map(|(output_col, input_field)| -> DeltaResult<_> {
-            ensure_data_types(input_field.data_type(), output_col.data_type())?;
+        .map(|(array, input_field)| -> DeltaResult<_> {
+            ensure_data_types(input_field.data_type(), array.data_type())?;
             Ok(ArrowField::new(
                 input_field.name(),
-                output_col.data_type().clone(),
-                output_col.is_nullable(),
+                array.data_type().clone(),
+                array.is_nullable(),
             ))
         })
         .try_collect()?;
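
The hunk above pairs each output column with its schema field, checks the data types, and stops at the first mismatch. Below is a minimal std-only sketch of that zip/map/collect-into-`Result` pattern. The `DataType`, `Field`, and `Column` types and both functions are hypothetical stand-ins written for illustration, not the kernel's or Arrow's actual types, and `collect()` into a `Result` is used here in place of `try_collect()`.

```rust
// Hypothetical stand-ins for the schema field and Arrow array types (assumptions).
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Int64,
    Utf8,
}

#[derive(Debug, Clone, PartialEq)]
struct Field {
    name: String,
    data_type: DataType,
    nullable: bool,
}

struct Column {
    data_type: DataType,
    nullable: bool,
}

// Hypothetical check: succeeds only when the two types are identical.
fn ensure_data_types(expected: &DataType, actual: &DataType) -> Result<(), String> {
    if expected == actual {
        Ok(())
    } else {
        Err(format!("type mismatch: expected {expected:?}, got {actual:?}"))
    }
}

// Builds one output field per (column, schema field) pair.
// Note that `zip` stops at the shorter of the two inputs.
fn output_fields(columns: &[Column], schema: &[Field]) -> Result<Vec<Field>, String> {
    columns
        .iter()
        .zip(schema.iter())
        .map(|(array, input_field)| -> Result<Field, String> {
            ensure_data_types(&input_field.data_type, &array.data_type)?;
            Ok(Field {
                name: input_field.name.clone(),
                data_type: array.data_type.clone(),
                nullable: array.nullable,
            })
        })
        // Collecting into a Result short-circuits on the first Err.
        .collect()
}
```

In this sketch the field name comes from the schema while the type and nullability come from the column, which mirrors the shape of the closure in the hunk.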
43 changes: 22 additions & 21 deletions kernel/src/engine/default/mod.rs
@@ -40,37 +40,38 @@ pub struct DefaultEngine<E: TaskExecutor> {
 impl<E: TaskExecutor> DefaultEngine<E> {
     /// Create a new [`DefaultEngine`] instance
     ///
-    /// # Parameters
+    /// The `path` parameter is used to determine the type of storage used.
     ///
-    /// - `table_root`: The URL of the table within storage.
-    /// - `options`: key/value pairs of options to pass to the object store.
-    /// - `task_executor`: Used to spawn async IO tasks. See [executor::TaskExecutor].
-    pub fn try_new<K, V>(
-        table_root: &Url,
-        options: impl IntoIterator<Item = (K, V)>,
-        task_executor: Arc<E>,
-    ) -> DeltaResult<Self>
+    /// The `task_executor` is used to spawn async IO tasks. See [executor::TaskExecutor].
+    pub fn try_new<I, K, V>(path: &Url, options: I, task_executor: Arc<E>) -> DeltaResult<Self>
     where
+        I: IntoIterator<Item = (K, V)>,
         K: AsRef<str>,
         V: Into<String>,
     {
-        // table root is the path of the table in the ObjectStore
-        let (store, table_root) = parse_url_opts(table_root, options)?;
-        Ok(Self::new(Arc::new(store), table_root, task_executor))
+        let (store, prefix) = parse_url_opts(path, options)?;
+        let store = Arc::new(store);
+        Ok(Self {
+            file_system: Arc::new(ObjectStoreFileSystemClient::new(
+                store.clone(),
+                prefix,
+                task_executor.clone(),
+            )),
+            json: Arc::new(DefaultJsonHandler::new(
+                store.clone(),
+                task_executor.clone(),
+            )),
+            parquet: Arc::new(DefaultParquetHandler::new(store.clone(), task_executor)),
+            store,
+            expression: Arc::new(ArrowExpressionHandler {}),
+        })
     }
 
-    /// Create a new [`DefaultEngine`] instance
-    ///
-    /// # Parameters
-    ///
-    /// - `store`: The object store to use.
-    /// - `table_root_path`: The root path of the table within storage.
-    /// - `task_executor`: Used to spawn async IO tasks. See [executor::TaskExecutor].
-    pub fn new(store: Arc<DynObjectStore>, table_root: Path, task_executor: Arc<E>) -> Self {
+    pub fn new(store: Arc<DynObjectStore>, prefix: Path, task_executor: Arc<E>) -> Self {
         Self {
             file_system: Arc::new(ObjectStoreFileSystemClient::new(
                 store.clone(),
-                table_root,
+                prefix,
                 task_executor.clone(),
             )),
             json: Arc::new(DefaultJsonHandler::new(
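
The two `try_new` signatures in this hunk spell the `options` argument differently: one uses `impl IntoIterator<Item = (K, V)>` in argument position, the other a named type parameter `I` with a `where` bound. A minimal sketch of the two spellings follows, assuming hypothetical helper functions (`collect_options_impl`, `collect_options_named`) and an illustrative `"region"` key. None of these names come from the kernel, and the `HashMap` return type is chosen only to make the result easy to inspect.

```rust
use std::collections::HashMap;

// Hypothetical helper: `options` written with `impl Trait` in argument position.
fn collect_options_impl<K, V>(
    options: impl IntoIterator<Item = (K, V)>,
) -> HashMap<String, String>
where
    K: AsRef<str>,
    V: Into<String>,
{
    options
        .into_iter()
        .map(|(k, v)| (k.as_ref().to_string(), v.into()))
        .collect()
}

// Hypothetical helper: the same bound expressed through a named type parameter `I`.
fn collect_options_named<I, K, V>(options: I) -> HashMap<String, String>
where
    I: IntoIterator<Item = (K, V)>,
    K: AsRef<str>,
    V: Into<String>,
{
    options
        .into_iter()
        .map(|(k, v)| (k.as_ref().to_string(), v.into()))
        .collect()
}
```

Both helpers accept the same kinds of call-site arguments, for example a `Vec` of `(&str, &str)` pairs or an array of `(String, String)` pairs. The named form additionally lets a caller write out the iterator type explicitly.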