adapters: Iceberg source connector.

Initial implementation of the Iceberg source connector.

The connector is built on the `iceberg` crate, which is still in its early days and has many limitations and performance issues.

* It currently only supports primitive types (no structs, maps, lists).
* It only supports reading tables (hence no sink connector yet).
* It only supports snapshot reads, not table following, although I think the latter could be mostly implemented using available low-level APIs.
* I haven't figured out how to do efficient range queries for time series data: apache/iceberg-rust#811

The implementation has a very similar structure to the Delta Lake connector and actually shares a bunch of code with it (I moved some of this code to `adapterslib`, but I copied some other code, which I thought may diverge in the future). Both connectors register the table as a DataFusion table provider and mostly work with it via the DataFusion API.

The main difference between Iceberg and Delta is that Iceberg cannot really be used without a catalog, since the catalog is responsible for tracking the location of the latest metadata file (the metadata file is the root object required to do anything with an Iceberg table). We currently support two of the most common catalog APIs: Glue (for Iceberg tables in AWS), and REST, which seems to be increasingly popular in the Iceberg community. We should be able to easily add SQL and Hive catalogs, which are supported by the `iceberg` crate.

The connector should work with tables in S3, local FS, and GCS, but only the first two have been tested. The `iceberg` crate currently doesn't support Azure and other data stores, although it should be easy to add them if necessary, since they are supported by the `opendal` crate, which `iceberg` uses for FileIO.

Signed-off-by: Leonid Ryzhyk <[email protected]>
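To illustrate why a catalog is required, here is a minimal, standard-library-only sketch of the role the commit message describes: the catalog maps a table name to the location of its latest metadata file. `ToyCatalog`, its methods, the table name, and the S3 paths are all hypothetical and are not part of the `iceberg` crate or of this connector.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a catalog such as Glue or REST (not the
/// `iceberg` crate API): it tracks, per table, where the latest
/// metadata file lives.
struct ToyCatalog {
    latest_metadata: HashMap<String, String>,
}

impl ToyCatalog {
    fn new() -> Self {
        ToyCatalog {
            latest_metadata: HashMap::new(),
        }
    }

    /// Model a table commit: a new metadata file has been written and the
    /// catalog entry is repointed at it.
    fn commit(&mut self, table: &str, metadata_location: &str) {
        self.latest_metadata
            .insert(table.to_string(), metadata_location.to_string());
    }

    /// A reader asks the catalog for the current metadata file. Without
    /// this lookup it cannot tell which metadata file is the latest one.
    fn resolve(&self, table: &str) -> Option<&str> {
        self.latest_metadata.get(table).map(String::as_str)
    }
}

fn main() {
    let mut catalog = ToyCatalog::new();
    // Illustrative locations only.
    catalog.commit("db.events", "s3://bucket/events/metadata/v1.metadata.json");
    catalog.commit("db.events", "s3://bucket/events/metadata/v2.metadata.json");

    // The reader sees the most recently committed metadata file.
    assert_eq!(
        catalog.resolve("db.events"),
        Some("s3://bucket/events/metadata/v2.metadata.json")
    );
    // A table unknown to the catalog cannot be opened at all.
    assert_eq!(catalog.resolve("db.missing"), None);
}
```

A real catalog also has to make the repointing step atomic across concurrent writers, which this sketch does not attempt to model.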
feldera · Dec 18, 2024 · 1fe8ebe
1 parent ca409ca
commit 1fe8ebe

Showing 31 changed files with 4,051 additions and 731 deletions.
diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml
@@ -92,11 +92,13 @@ jobs:
           s3_access_key: ${{ secrets.ci_s3_aws_access_key }}
           s3_secret: ${{ secrets.ci_s3_aws_secret }}
 
-      # Ship secrets for the AWS CI account for the delta table output transport test to Earthly.
-      - name: Delta output S3 secrets
+      # Ship secrets for the AWS CI account for Deltalake and Iceberg adapters tests to Earthly.
+      - name: Delta/Iceberg S3 secrets
         run: |
           echo DELTA_TABLE_TEST_AWS_ACCESS_KEY_ID="${delta_table_test_aws_access_key_id}" >> .arg && \
-          echo DELTA_TABLE_TEST_AWS_SECRET_ACCESS_KEY="${delta_table_test_aws_secret_access_key}" >> .arg
+          echo DELTA_TABLE_TEST_AWS_SECRET_ACCESS_KEY="${delta_table_test_aws_secret_access_key}" >> .arg && \
+          echo ICEBERG_TEST_AWS_ACCESS_KEY_ID="${delta_table_test_aws_access_key_id}" >> .arg && \
+          echo ICEBERG_TEST_AWS_SECRET_ACCESS_KEY="${delta_table_test_aws_secret_access_key}" >> .arg
         env:
           delta_table_test_aws_access_key_id: ${{ secrets.delta_table_test_aws_access_key_id }}
           delta_table_test_aws_secret_access_key: ${{ secrets.delta_table_test_aws_secret_access_key }}