adapters: Iceberg source connector.
Initial implementation of the Iceberg source connector. The connector is
built on the `iceberg` crate, which is still in its early days and has many
limitations and performance issues.

* It currently only supports primitive types (no structs, maps, lists)
* It only supports reading tables (hence no sink connector yet)
* It only supports snapshot reads, not table following, although I think
  the latter could be mostly implemented using available low-level APIs.
* I haven't figured out how to do efficient range queries for time
  series data: apache/iceberg-rust#811

The implementation has a very similar structure to the Delta Lake
connector and actually shares a bunch of code with it (I moved some of
this code to `adapterslib`, but I copied some other code, which I
thought might diverge in the future). Both connectors register the table
as a datafusion table provider and mostly work with it via the
datafusion API.

The main difference between Iceberg and Delta is that Iceberg cannot really
be used without a catalog, since the catalog is responsible for tracking the
location of the latest metadata file (the metadata file is the root object
required to do anything with an Iceberg table). We currently support
two of the most common catalog APIs: Glue (for Iceberg tables in AWS),
and REST, which seems to be increasingly popular in the Iceberg
community. We should be able to easily add SQL and Hive catalogs, which
are supported by the `iceberg` crate.
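
To make the catalog's role concrete, here is a minimal toy sketch of the
idea described above: a catalog maps a table identifier to the location of
its latest metadata file, and a reader has to ask the catalog for that
location first. `ToyCatalog`, its methods, and the example paths are
hypothetical and are not the API of the `iceberg` crate or of this
connector.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a catalog (Glue, REST, ...). It only tracks
/// where the latest metadata file of each table lives.
#[derive(Default)]
struct ToyCatalog {
    // table identifier -> location of the current metadata file
    latest_metadata: HashMap<String, String>,
}

impl ToyCatalog {
    /// Record `metadata_location` as the current metadata file for `table`,
    /// replacing any previous one (roughly what a table commit does).
    fn commit(&mut self, table: &str, metadata_location: &str) {
        self.latest_metadata
            .insert(table.to_string(), metadata_location.to_string());
    }

    /// Return the location of the root object a reader needs before it
    /// can do anything with the table, or `None` for an unknown table.
    fn latest_metadata_location(&self, table: &str) -> Option<&str> {
        self.latest_metadata.get(table).map(String::as_str)
    }
}
```

In this sketch, committing twice to the same table and then calling
`latest_metadata_location` returns only the second location, which is why a
reader that bypasses the catalog cannot know which metadata file is current.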

The connector should work with tables in S3, local FS, and GCS, but only the
first two have been tested. The `iceberg` crate currently doesn't
support Azure and other data stores, although it should be easy to add them if
necessary, since they are supported by the `opendal` crate, which
`iceberg` uses for FileIO.

Signed-off-by: Leonid Ryzhyk <[email protected]>
Leonid Ryzhyk committed Dec 18, 2024
1 parent ca409ca commit 1fe8ebe
Showing 31 changed files with 4,051 additions and 731 deletions.
8 changes: 5 additions & 3 deletions .github/workflows/ci.yml
@@ -92,11 +92,13 @@ jobs:
  s3_access_key: ${{ secrets.ci_s3_aws_access_key }}
  s3_secret: ${{ secrets.ci_s3_aws_secret }}

- # Ship secrets for the AWS CI account for the delta table output transport test to Earthly.
- - name: Delta output S3 secrets
+ # Ship secrets for the AWS CI account for Deltalake and Iceberg adapters tests to Earthly.
+ - name: Delta/Iceberg S3 secrets
  run: |
  echo DELTA_TABLE_TEST_AWS_ACCESS_KEY_ID="${delta_table_test_aws_access_key_id}" >> .arg && \
- echo DELTA_TABLE_TEST_AWS_SECRET_ACCESS_KEY="${delta_table_test_aws_secret_access_key}" >> .arg
+ echo DELTA_TABLE_TEST_AWS_SECRET_ACCESS_KEY="${delta_table_test_aws_secret_access_key}" >> .arg && \
+ echo ICEBERG_TEST_AWS_ACCESS_KEY_ID="${delta_table_test_aws_access_key_id}" >> .arg && \
+ echo ICEBERG_TEST_AWS_SECRET_ACCESS_KEY="${delta_table_test_aws_secret_access_key}" >> .arg
  env:
  delta_table_test_aws_access_key_id: ${{ secrets.delta_table_test_aws_access_key_id }}
  delta_table_test_aws_secret_access_key: ${{ secrets.delta_table_test_aws_secret_access_key }}