Depend on wider arrow versions, add integration test (#366)
This widens our arrow dependency and adds a basic integration test.

Note that the patching workflow is not ideal, as it requires a
version specification that reduces to exactly one version. I don't think
there's a way around this, so in the future we may want to switch to a
feature-flag-based approach. That's a lot more work, though, so we'll do
this for now to unblock people using older arrow versions, and then
iterate if needed.

At this point, the test just makes sure we can compile (and run a
trivial program) with all the versions of arrow we say we support. It
creates an arrow schema and a kernel schema, and then compares them to
make sure the versions work together.

The testing script pulls all versions of arrow, checks whether each one is
in the range we support, then `sed`s each supported version into the
`integration-tests` `Cargo.toml` as both the required version of arrow and
the version in the `[patch]` section. It then compiles and runs the crate.
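
The range check described above can be sketched offline as follows. This is a minimal sketch, not the committed script: it reuses the `sort -V` comparison idea, but `in_supported_range` and `filter_versions` are hypothetical helper names, and a hard-coded version list stands in for the crates.io query.

```shell
#!/bin/bash
set -eu -o pipefail

# True when $1 <= $2 under version ordering (relies on `sort -V`).
is_version_le() {
    [ "$1" = "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n1)" ]
}

# True when $1 < $2.
is_version_lt() {
    [ "$1" != "$2" ] && is_version_le "$1" "$2"
}

MIN_ARROW_VER="52.0.0"
MAX_ARROW_VER="54.0.0"

# Hypothetical helper: true when MIN_ARROW_VER <= $1 < MAX_ARROW_VER.
in_supported_range() {
    ! is_version_lt "$1" "$MIN_ARROW_VER" && is_version_lt "$1" "$MAX_ARROW_VER"
}

# Hypothetical helper: print only the arguments that fall in the range.
filter_versions() {
    for v in "$@"; do
        if in_supported_range "$v"; then
            echo "$v"
        fi
    done
}

# Sample input standing in for the list fetched from crates.io.
filter_versions 51.0.0 52.0.0 52.2.0 53.1.0 54.0.0
```

With this sample list, only `52.0.0`, `52.2.0`, and `53.1.0` are printed: the lower bound is inclusive and the upper bound is exclusive, matching the `>=52, <54` requirement.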

Reviewers: I recommend you also look at the triggered workflow jobs to see
what the tests do.

---------

Co-authored-by: Nick Lanham <[email protected]>
nicklan and Nick Lanham authored Oct 21, 2024
1 parent 284db10 commit 2b1c46f
Showing 7 changed files with 160 additions and 12 deletions.
27 changes: 27 additions & 0 deletions .github/workflows/run_integration_test.yml
@@ -0,0 +1,27 @@
name: Run tests to ensure we can compile across arrow versions

on: [workflow_dispatch, push, pull_request]

jobs:
  arrow_integration_test:
    runs-on: ${{ matrix.os }}
    timeout-minutes: 20
    strategy:
      fail-fast: false
      matrix:
        os:
          - macOS-latest
          - ubuntu-latest
          - windows-latest
    steps:
      - uses: actions/checkout@v3
      - name: Install minimal stable rust
        uses: actions-rs/toolchain@v1
        with:
          profile: default
          toolchain: stable
          override: true
      - uses: Swatinem/rust-cache@v2
      - name: run integration tests
        shell: bash
        run: pushd integration-tests && ./test-all-arrow-versions.sh
1 change: 1 addition & 0 deletions .gitignore
@@ -11,6 +11,7 @@
.cargo/
target/
/Cargo.lock
integration-tests/Cargo.lock

# Project
acceptance/tests/dat/
22 changes: 11 additions & 11 deletions Cargo.toml
@@ -20,17 +20,17 @@ readme = "README.md"
version = "0.3.1"

[workspace.dependencies]
-arrow = { version = "53.0" }
-arrow-arith = { version = "53.0" }
-arrow-array = { version = "53.0" }
-arrow-buffer = { version = "53.0" }
-arrow-cast = { version = "53.0" }
-arrow-data = { version = "53.0" }
-arrow-ord = { version = "53.0" }
-arrow-json = { version = "53.0" }
-arrow-select = { version = "53.0" }
-arrow-schema = { version = "53.0" }
-parquet = { version = "53.0", features = ["object_store"] }
+arrow = { version = ">=52, <54" }
+arrow-arith = { version = ">=52, <54" }
+arrow-array = { version = ">=52, <54" }
+arrow-buffer = { version = ">=52, <54" }
+arrow-cast = { version = ">=52, <54" }
+arrow-data = { version = ">=52, <54" }
+arrow-ord = { version = ">=52, <54" }
+arrow-json = { version = ">=52, <54" }
+arrow-select = { version = ">=52, <54" }
+arrow-schema = { version = ">=52, <54" }
+parquet = { version = ">=52, <54", features = ["object_store"] }
object_store = "0.11.0"
hdfs-native-object-store = "0.12.0"
hdfs-native = "0.10.0"
38 changes: 37 additions & 1 deletion README.md
@@ -63,6 +63,42 @@ We intend to follow [Semantic Versioning](https://semver.org/). However, in the
are still unstable. We therefore may break APIs within minor releases (that is, `0.1` -> `0.2`), but
we will not break APIs in patch releases (`0.1.0` -> `0.1.1`).

## Arrow versioning
If you enable the `default-engine` or `sync-engine` features, you get an implementation of the
`Engine` trait that uses [Arrow] as its data format.

The [`arrow crate`](https://docs.rs/arrow/latest/arrow/) tends to release new major versions rather
quickly. So that engines which already integrate arrow can also integrate kernel without being
forced to track the specific version of arrow that kernel depends on, we take as broad a dependency
on arrow versions as we can.

This means you can force kernel to rely on the specific arrow version that your engine already uses,
as long as it falls in that range. You can see the range in the `Cargo.toml` in the same folder as
this `README.md`.

For example, although arrow 53.x has been released, you can force kernel to compile on 52.2.0 by
putting the following in your project's `Cargo.toml`:

```toml
[patch.crates-io]
arrow = "52.2"
arrow-arith = "52.2"
arrow-array = "52.2"
arrow-buffer = "52.2"
arrow-cast = "52.2"
arrow-data = "52.2"
arrow-ord = "52.2"
arrow-json = "52.2"
arrow-select = "52.2"
arrow-schema = "52.2"
parquet = "52.2"
```

Note that, unfortunately, patching in `cargo` requires that _exactly one_ version matches your
specification. If only arrow "52.2.0" has been released, the above will work, but if "52.2.1" is
released, the specification will break and you will need to provide a more restrictive
specification (for example, "=52.2.0").

## Documentation

- [API Docs](https://docs.rs/delta_kernel/latest/delta_kernel/)
@@ -140,4 +176,4 @@ Some design principles which should be considered:
[cargo-llvm-cov]: https://github.com/taiki-e/cargo-llvm-cov
[FFI]: ffi/
[Arrow]: https://arrow.apache.org/rust/arrow/index.html
-[Tokio]: https://tokio.rs/
+[Tokio]: https://tokio.rs/
23 changes: 23 additions & 0 deletions integration-tests/Cargo.toml
@@ -0,0 +1,23 @@
[package]
name = "integration-tests"
version = "0.1.0"
edition = "2021"

[workspace]

[dependencies]
arrow = "=52.1.0"
delta_kernel = { path = "../kernel", features = ["arrow-conversion"] }

[patch.'file:///../kernel']
arrow = "=52.1.0"
arrow-arith = "=52.1.0"
arrow-array = "=52.1.0"
arrow-buffer = "=52.1.0"
arrow-cast = "=52.1.0"
arrow-data = "=52.1.0"
arrow-ord = "=52.1.0"
arrow-json = "=52.1.0"
arrow-select = "=52.1.0"
arrow-schema = "=52.1.0"
parquet = "=52.1.0"
22 changes: 22 additions & 0 deletions integration-tests/src/main.rs
@@ -0,0 +1,22 @@
fn create_arrow_schema() -> arrow::datatypes::Schema {
    use arrow::datatypes::{DataType, Field, Schema};
    let field_a = Field::new("a", DataType::Int64, false);
    let field_b = Field::new("b", DataType::Boolean, false);
    Schema::new(vec![field_a, field_b])
}

fn create_kernel_schema() -> delta_kernel::schema::Schema {
    use delta_kernel::schema::{DataType, Schema, StructField};
    let field_a = StructField::new("a", DataType::LONG, false);
    let field_b = StructField::new("b", DataType::BOOLEAN, false);
    Schema::new(vec![field_a, field_b])
}

fn main() {
    let arrow_schema = create_arrow_schema();
    let kernel_schema = create_kernel_schema();
    // Convert the arrow schema into a kernel schema and check it matches the hand-built one.
    let converted: delta_kernel::schema::Schema =
        delta_kernel::schema::Schema::try_from(&arrow_schema).expect("couldn't convert");
    assert!(kernel_schema == converted);
    println!("Okay, made it");
}
39 changes: 39 additions & 0 deletions integration-tests/test-all-arrow-versions.sh
@@ -0,0 +1,39 @@
#!/bin/bash

set -eu -o pipefail

is_version_le() {
    [ "$1" = "$(echo -e "$1\n$2" | sort -V | head -n1)" ]
}

is_version_lt() {
    if [ "$1" = "$2" ]
    then
        return 1
    else
        is_version_le "$1" "$2"
    fi
}

test_arrow_version() {
    ARROW_VERSION="$1"
    echo "== Testing version $ARROW_VERSION =="
    sed -i'' -e "s/\(arrow[^\"]*=[^\"]*\).*/\1\"=$ARROW_VERSION\"/" Cargo.toml
    sed -i'' -e "s/\(parquet[^\"]*\).*/\1\"=$ARROW_VERSION\"/" Cargo.toml
    cargo clean
    rm -f Cargo.lock
    cargo update
    cat Cargo.toml
    cargo run
}

MIN_ARROW_VER="52.0.0"
MAX_ARROW_VER="54.0.0"

for ARROW_VERSION in $(curl -s https://crates.io/api/v1/crates/arrow | jq -r '.versions[].num' | tr -d '\r')
do
    if ! is_version_lt "$ARROW_VERSION" "$MIN_ARROW_VER" && is_version_lt "$ARROW_VERSION" "$MAX_ARROW_VER"
    then
        test_arrow_version "$ARROW_VERSION"
    fi
done
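
To see what the two `sed` substitutions in `test_arrow_version` do, the sketch below applies the same expressions to sample manifest lines on stdin instead of editing `Cargo.toml` in place. The `pin_arrow_version` wrapper and the three-line sample input are assumptions made for illustration; they are not part of the commit.

```shell
#!/bin/bash
set -eu -o pipefail

# Hypothetical wrapper: rewrite every arrow*/parquet requirement read from
# stdin to an exact pin on the version given as $1. The sed expressions are
# the ones used by the test script, minus the in-place `-i` flag.
pin_arrow_version() {
    local version="$1"
    sed -e "s/\(arrow[^\"]*=[^\"]*\).*/\1\"=$version\"/" \
        -e "s/\(parquet[^\"]*\).*/\1\"=$version\"/"
}

# Sample manifest lines standing in for integration-tests/Cargo.toml.
printf '%s\n' \
    'arrow = "=52.1.0"' \
    'arrow-schema = "=52.1.0"' \
    'parquet = "=52.1.0"' \
    | pin_arrow_version "53.0.0"
```

Each sample line comes out pinned to `"=53.0.0"`. A line such as the `delta_kernel` dependency, which mentions `arrow-conversion` only inside a quoted feature name, is left untouched because the first pattern needs an `=` before the next double quote.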
