Initial implementation of the Iceberg source connector.

The connector is built on the `iceberg` crate, which is still in its early days and has many limitations and performance issues:

* It currently only supports primitive types (no structs, maps, lists).
* It only supports reading tables (hence no sink connector yet).
* It only supports snapshot reads, not table following, although I think the latter could be mostly implemented using available low-level APIs.
* I haven't figured out how to do efficient range queries for time-series data: apache/iceberg-rust#811

The implementation has a very similar structure to the Delta Lake connector and actually shares a bunch of code with it (I moved some of this code to `adapterslib`, but I copied some other code, which I thought may diverge in the future). Both connectors register the table as a datafusion table provider and mostly work with it via the datafusion API.

The main difference between Iceberg and Delta Lake is that Iceberg cannot really be used without a catalog, since the catalog is responsible for tracking the location of the latest metadata file (the metadata file is the root object required to do anything with an Iceberg table). We currently support two of the most common catalog APIs: Glue (for Iceberg tables in AWS) and REST, which seems to be increasingly popular in the Iceberg community. We should be able to easily add SQL and Hive catalogs, which are supported by the `iceberg` crate.

The connector should work with tables in S3, local FS, and GCS, but only the first two have been tested. The `iceberg` crate currently doesn't support Azure and other data stores, although it should be easy to add them if necessary, since they are supported by the `opendal` crate, which `iceberg` uses for FileIO.

Signed-off-by: Leonid Ryzhyk <[email protected]>
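To illustrate the catalog's role described above, here is a toy, self-contained Rust sketch (not the `iceberg` crate API; the `TableIdent` and `ToyCatalog` types are invented for illustration). The single responsibility modeled is the one the commit relies on: mapping a table identifier to the location of its latest metadata file, which is what Glue and REST catalogs do for the connector.

```rust
use std::collections::HashMap;

/// Hypothetical table identifier: namespace plus table name.
/// Mirrors the role of a table identifier in a real catalog,
/// but is a toy type, not the `iceberg` crate's API.
#[derive(Clone, PartialEq, Eq, Hash, Debug)]
pub struct TableIdent {
    pub namespace: String,
    pub name: String,
}

/// Toy in-memory catalog. Its only job here is tracking, per table,
/// the location of the latest metadata file -- the root object from
/// which snapshots, manifests, and data files are all reachable.
pub struct ToyCatalog {
    tables: HashMap<TableIdent, String>,
}

impl ToyCatalog {
    pub fn new() -> Self {
        ToyCatalog { tables: HashMap::new() }
    }

    /// A commit to the table updates the pointer to the latest
    /// metadata file.
    pub fn commit(&mut self, ident: TableIdent, metadata_location: &str) {
        self.tables.insert(ident, metadata_location.to_string());
    }

    /// Loading a table starts by asking the catalog where the root
    /// metadata file lives; without this lookup the table cannot be
    /// read, which is why Iceberg is hard to use without a catalog.
    pub fn metadata_location(&self, ident: &TableIdent) -> Option<&str> {
        self.tables.get(ident).map(|s| s.as_str())
    }
}

fn main() {
    let mut catalog = ToyCatalog::new();
    let ident = TableIdent {
        namespace: "db".to_string(),
        name: "events".to_string(),
    };
    // Before any commit, the catalog knows nothing about the table.
    assert!(catalog.metadata_location(&ident).is_none());
    catalog.commit(
        ident.clone(),
        "s3://bucket/db/events/metadata/v2.metadata.json",
    );
    println!("latest metadata: {}", catalog.metadata_location(&ident).unwrap());
}
```

A real catalog implementation (Glue, REST, SQL, Hive) replaces the in-memory map with a remote service, but the read path is the same: resolve the identifier to a metadata file location, then open that file to find the current snapshot.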