diff --git a/LICENSE.bin b/LICENSE.bin
index 27052f44232..14d44b7d31e 100644
--- a/LICENSE.bin
+++ b/LICENSE.bin
@@ -301,6 +301,7 @@
   Apache Iceberg AWS
   Apache Iceberg core
   Apache Iceberg Hive metastore
+  Apache Iceberg GCP
   Apache Ivy
   Apache Log4j 1.x Compatibility API
   Apache Log4j API
diff --git a/docs/iceberg-rest-service.md b/docs/iceberg-rest-service.md
index e1c8e6f1e86..1d8a20c40fa 100644
--- a/docs/iceberg-rest-service.md
+++ b/docs/iceberg-rest-service.md
@@ -138,6 +138,22 @@ For other Iceberg OSS properties not managed by Gravitino like `client.security-
 Please set the `gravitino.iceberg-rest.warehouse` parameter to `oss://{bucket_name}/${prefix_name}`. Additionally, download the [Aliyun OSS SDK](https://gosspublic.alicdn.com/sdks/java/aliyun_java_sdk_3.10.2.zip) and copy `aliyun-sdk-oss-3.10.2.jar`, `hamcrest-core-1.1.jar`, `jdom2-2.0.6.jar` in the classpath of Iceberg REST server, `iceberg-rest-server/libs` for the auxiliary server, `libs` for the standalone server.
 :::
 
+#### GCS
+
+Supports using a Google credential file to access GCS data.
+
+| Configuration item               | Description                                                                                         | Default value | Required | Since Version |
+|----------------------------------|-----------------------------------------------------------------------------------------------------|---------------|----------|---------------|
+| `gravitino.iceberg-rest.io-impl` | The io implementation for `FileIO` in Iceberg, use `org.apache.iceberg.gcp.gcs.GCSFileIO` for GCS.  | (none)        | No       | 0.6.0         |
+
+For other Iceberg GCS properties not managed by Gravitino, like `gcs.project-id`, you can configure them directly via `gravitino.iceberg-rest.gcs.project-id`.
+
+Please make sure the credential file is accessible to Gravitino, for example by running `export GOOGLE_APPLICATION_CREDENTIALS=/xx/application_default_credentials.json` before the Gravitino Iceberg REST server is started.
+
+:::info
+Please set `gravitino.iceberg-rest.warehouse` to `gs://{bucket_name}/${prefix_name}`. Additionally, download the [Iceberg GCP bundle](https://mvnrepository.com/artifact/org.apache.iceberg/iceberg-gcp-bundle) and place it in the classpath of the Iceberg REST server: `iceberg-rest-server/libs` for the auxiliary server, `libs` for the standalone server.
+:::
+
 #### HDFS configuration
 
 You should place HDFS configuration file to the classpath of the Iceberg REST server, `iceberg-rest-server/conf` for Gravitino server package, `conf` for standalone Gravitino Iceberg REST server package. When writing to HDFS, the Gravitino Iceberg REST catalog service can only operate as the specified HDFS user and doesn't support proxying to other HDFS users. See [How to access Apache Hadoop](gravitino-server-config.md#how-to-access-apache-hadoop) for more details.
@@ -321,7 +337,7 @@ For example, we can configure Spark catalog options to use Gravitino Iceberg RES
 --conf spark.sql.catalog.rest.uri=http://127.0.0.1:9001/iceberg/
 ```
 
-You may need to adjust the Iceberg Spark runtime jar file name according to the real version number in your environment. If you want to access the data stored in S3, you need to download [Iceberg AWS bundle](https://mvnrepository.com/artifact/org.apache.iceberg/iceberg-aws-bundle) jar and place it in the classpath of Spark, no extra config is needed because S3 related properties is transferred from Iceberg REST server to Iceberg REST client automaticly.
+You may need to adjust the Iceberg Spark runtime jar file name according to the real version number in your environment. If you want to access data stored in cloud storage, you need to download the corresponding jars (please refer to the cloud storage sections above) and place them in the classpath of Spark; no extra configuration is needed because the related properties are transferred from the Iceberg REST server to the Iceberg REST client automatically.
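As a concrete illustration of the paragraph above, here is a minimal sketch of a `spark-sql` launch that reads GCS-backed tables through the Gravitino Iceberg REST catalog. The catalog name `rest` and the URI follow the earlier example in this document; the artifact versions and the credential path are placeholders, not values specified by this change.

```shell
# Placeholder versions and paths; align them with your Spark and Iceberg versions.
export GOOGLE_APPLICATION_CREDENTIALS=/xx/application_default_credentials.json

./bin/spark-sql -v \
  --packages org.apache.iceberg:iceberg-spark-runtime-3.4_2.12:1.5.2,org.apache.iceberg:iceberg-gcp-bundle:1.5.2 \
  --conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions \
  --conf spark.sql.catalog.rest=org.apache.iceberg.spark.SparkCatalog \
  --conf spark.sql.catalog.rest.type=rest \
  --conf spark.sql.catalog.rest.uri=http://127.0.0.1:9001/iceberg/
```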
 
 ### Exploring Apache Iceberg with Apache Spark SQL
 
diff --git a/docs/lakehouse-iceberg-catalog.md b/docs/lakehouse-iceberg-catalog.md
index 0fc3af157c0..e73f0ee39e6 100644
--- a/docs/lakehouse-iceberg-catalog.md
+++ b/docs/lakehouse-iceberg-catalog.md
@@ -103,6 +103,22 @@ For other Iceberg OSS properties not managed by Gravitino like `client.security-
 Please set the `warehouse` parameter to `oss://{bucket_name}/${prefix_name}`. Additionally, download the [Aliyun OSS SDK](https://gosspublic.alicdn.com/sdks/java/aliyun_java_sdk_3.10.2.zip) and copy `aliyun-sdk-oss-3.10.2.jar`, `hamcrest-core-1.1.jar`, `jdom2-2.0.6.jar` in the `catalogs/lakehouse-iceberg/libs/` directory.
 :::
 
+#### GCS
+
+Supports using a Google credential file to access GCS data.
+
+| Configuration item | Description                                                                                         | Default value | Required | Since Version |
+|--------------------|-----------------------------------------------------------------------------------------------------|---------------|----------|---------------|
+| `io-impl`          | The io implementation for `FileIO` in Iceberg, use `org.apache.iceberg.gcp.gcs.GCSFileIO` for GCS.  | (none)        | No       | 0.6.0         |
+
+For other Iceberg GCS properties not managed by Gravitino, like `gcs.project-id`, you can configure them directly via `gravitino.bypass.gcs.project-id`.
+
+Please make sure the credential file is accessible to Gravitino, for example by running `export GOOGLE_APPLICATION_CREDENTIALS=/xx/application_default_credentials.json` before the Gravitino server is started.
+
+:::info
+Please set `warehouse` to `gs://{bucket_name}/${prefix_name}`. Additionally, download the [Iceberg GCP bundle](https://mvnrepository.com/artifact/org.apache.iceberg/iceberg-gcp-bundle) and place it in `catalogs/lakehouse-iceberg/libs/`.
+:::
+
 #### Catalog backend security
 
 Users can use the following properties to configure the security of the catalog backend if needed. For example, if you are using a Kerberos Hive catalog backend, you must set `authentication.type` to `Kerberos` and provide `authentication.kerberos.principal` and `authentication.kerberos.keytab-uri`.
diff --git a/docs/spark-connector/spark-catalog-iceberg.md b/docs/spark-connector/spark-catalog-iceberg.md
index b13b0ccf9ef..8552177f8aa 100644
--- a/docs/spark-connector/spark-catalog-iceberg.md
+++ b/docs/spark-connector/spark-catalog-iceberg.md
@@ -127,4 +127,8 @@ You need to add s3 secret to the Spark configuration using `spark.sql.catalog.${
 
 ### OSS
 
-You need to add OSS secret key to the Spark configuration using `spark.sql.catalog.${iceberg_catalog_name}.client.access-key-id` and `spark.sql.catalog.${iceberg_catalog_name}.client.access-key-secret`. Additionally, download the [Aliyun OSS SDK](https://gosspublic.alicdn.com/sdks/java/aliyun_java_sdk_3.10.2.zip) and copy `aliyun-sdk-oss-3.10.2.jar`, `hamcrest-core-1.1.jar`, `jdom2-2.0.6.jar` in the classpath of Spark.
\ No newline at end of file
+You need to add the OSS access key and secret to the Spark configuration using `spark.sql.catalog.${iceberg_catalog_name}.client.access-key-id` and `spark.sql.catalog.${iceberg_catalog_name}.client.access-key-secret`. Additionally, download the [Aliyun OSS SDK](https://gosspublic.alicdn.com/sdks/java/aliyun_java_sdk_3.10.2.zip) and copy `aliyun-sdk-oss-3.10.2.jar`, `hamcrest-core-1.1.jar`, `jdom2-2.0.6.jar` into the classpath of Spark.
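A minimal sketch of how the two OSS properties named above might be supplied at launch time; the catalog name `iceberg_catalog` and the credential values are placeholders, not values from this change.

```shell
# Placeholder catalog name and credentials; append these options to your usual Spark launch command.
./bin/spark-sql \
  --conf spark.sql.catalog.iceberg_catalog.client.access-key-id=YOUR_OSS_ACCESS_KEY_ID \
  --conf spark.sql.catalog.iceberg_catalog.client.access-key-secret=YOUR_OSS_ACCESS_KEY_SECRET
```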
+
+### GCS
+
+No extra configuration is needed. Please make sure the credential file is accessible to Spark, for example by running `export GOOGLE_APPLICATION_CREDENTIALS=/xx/application_default_credentials.json`. Additionally, download the [Iceberg GCP bundle](https://mvnrepository.com/artifact/org.apache.iceberg/iceberg-gcp-bundle) and place it in the classpath of Spark.
diff --git a/gradle/libs.versions.toml b/gradle/libs.versions.toml
index 6980414c01f..4efb10eb220 100644
--- a/gradle/libs.versions.toml
+++ b/gradle/libs.versions.toml
@@ -162,6 +162,7 @@ iceberg-aws = { group = "org.apache.iceberg", name = "iceberg-aws", version.ref
 iceberg-core = { group = "org.apache.iceberg", name = "iceberg-core", version.ref = "iceberg" }
 iceberg-api = { group = "org.apache.iceberg", name = "iceberg-api", version.ref = "iceberg" }
 iceberg-hive-metastore = { group = "org.apache.iceberg", name = "iceberg-hive-metastore", version.ref = "iceberg" }
+iceberg-gcp = { group = "org.apache.iceberg", name = "iceberg-gcp", version.ref = "iceberg" }
 paimon-core = { group = "org.apache.paimon", name = "paimon-core", version.ref = "paimon" }
 paimon-format = { group = "org.apache.paimon", name = "paimon-format", version.ref = "paimon" }
 paimon-hive-catalog = { group = "org.apache.paimon", name = "paimon-hive-catalog", version.ref = "paimon" }
diff --git a/iceberg/iceberg-common/build.gradle.kts b/iceberg/iceberg-common/build.gradle.kts
index fcb0a2b1f4d..d0964e2dacb 100644
--- a/iceberg/iceberg-common/build.gradle.kts
+++ b/iceberg/iceberg-common/build.gradle.kts
@@ -37,6 +37,7 @@ dependencies {
   implementation(libs.iceberg.aliyun)
   implementation(libs.iceberg.aws)
   implementation(libs.iceberg.hive.metastore)
+  implementation(libs.iceberg.gcp)
   implementation(libs.hadoop2.common) {
     exclude("com.github.spotbugs")
     exclude("com.sun.jersey")
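As a follow-up to the Spark connector GCS note above, a minimal sketch of placing the bundle on Spark's classpath and exposing the credential file. The standard Maven Central layout is assumed, and the version is only an example to be matched to your Iceberg version.

```shell
# Pick the bundle version that matches your Iceberg version; 1.5.2 is only an example.
ICEBERG_VERSION=1.5.2
curl -L -o iceberg-gcp-bundle-${ICEBERG_VERSION}.jar \
  "https://repo1.maven.org/maven2/org/apache/iceberg/iceberg-gcp-bundle/${ICEBERG_VERSION}/iceberg-gcp-bundle-${ICEBERG_VERSION}.jar"
cp iceberg-gcp-bundle-${ICEBERG_VERSION}.jar "${SPARK_HOME}/jars/"

# Make the Google credential file visible to Spark before starting it.
export GOOGLE_APPLICATION_CREDENTIALS=/xx/application_default_credentials.json
```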