[#5051] GravitinoClient needs metalake to obtain version #5060

Status: Closed (wanted to merge 14 commits)
@@ -61,7 +61,7 @@
 public class GravitinoClient extends GravitinoClientBase
     implements SupportsCatalogs, TagOperations {

-  private final GravitinoMetalake metalake;
+  private static GravitinoMetalake metalake = null;

   /**
    * Constructs a new GravitinoClient with the given URI, authenticator and AuthDataProvider.
@@ -84,6 +84,24 @@ private GravitinoClient(
     this.metalake = loadMetalake(metalakeName);
   }

+  /**
+   * Constructs a new GravitinoClient with the given URI, authenticator and AuthDataProvider.
+   *
+   * @param uri The base URI for the Gravitino API.
+   * @param authDataProvider The provider of the data which is used for authentication.
+   * @param checkVersion Whether to check the version of the Gravitino server. Gravitino does not
+   *     support the case that the client-side version is higher than the server-side version.
+   * @param headers The base header for Gravitino API.
+   * @throws NoSuchMetalakeException if the metalake with specified name does not exist.
+   */
Review comment (Member): How is the metalake set after calling this constructor?

Reply (Contributor, author): Hello, I will close this PR first and make some modifications. Later, I will open a new PR for further discussion.

+  private GravitinoClient(
+      String uri,
+      AuthDataProvider authDataProvider,
+      boolean checkVersion,
+      Map<String, String> headers) {
+    super(uri, authDataProvider, checkVersion, headers);
+  }
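The reviewer's question above points at a real hazard in this change: a `static` metalake field is shared by every client instance in the JVM, so constructing a second client silently changes what the first one reports. A minimal, self-contained sketch (toy classes, not the actual Gravitino code) of that failure mode:

```java
// Toy illustration of the static-field pitfall flagged in review.
// "Client" here is a stand-in, NOT the real GravitinoClient class.
public class StaticMetalakeSketch {
    static class Client {
        private static String metalake; // shared across ALL instances

        Client(String metalakeName) {
            metalake = metalakeName; // overwrites the value for every existing client
        }

        String currentMetalake() {
            return metalake;
        }
    }

    public static void main(String[] args) {
        Client a = new Client("metalake_a");
        Client b = new Client("metalake_b");
        // Constructing 'b' clobbered the value 'a' was built with:
        System.out.println(a.currentMetalake()); // prints "metalake_b"
        System.out.println(b.currentMetalake()); // prints "metalake_b"
    }
}
```

With an instance field (`private final GravitinoMetalake metalake`, as on the removed line), each client keeps its own metalake and the clobbering cannot happen.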

  /**
   * Get the current metalake object
   *
@@ -537,7 +555,7 @@ public GravitinoClient build() {
metalakeName != null && !metalakeName.isEmpty(),
"The argument 'metalakeName' must be a valid name");
Review comment (Member): I think the checks on `metalakeName` will stop the new constructor from being called if the metalake name is null or empty.


-    return new GravitinoClient(uri, metalakeName, authDataProvider, checkVersion, headers);
+    return new GravitinoClient(uri, authDataProvider, checkVersion, headers);
}
}
}
docs/apache-hive-catalog.md (28 changes: 14 additions & 14 deletions)

@@ -133,25 +133,25 @@
Since 0.6.0-incubating, the data types other than listed above are mapped to Gravitino…
Table properties supply or set metadata for the underlying Hive tables.
The following table lists predefined table properties for a Hive table. Additionally, you can define your own key-value pair properties and transmit them to the underlying Hive database.

| Property Name | Description | Default Value | Required | Since version |
|--------------------|--------------------------------------------------------------------------------------------------------------------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------|----------|---------------|
| `location` | The location for table storage, such as `/user/hive/warehouse/test_table`. | HMS uses the database location as the parent directory by default. | No | 0.2.0 |
| `table-type` | Type of the table. Valid values include `MANAGED_TABLE` and `EXTERNAL_TABLE`. | `MANAGED_TABLE` | No | 0.2.0 |
| `format` | The table file format. Valid values include `TEXTFILE`, `SEQUENCEFILE`, `RCFILE`, `ORC`, `PARQUET`, `AVRO`, `JSON`, `CSV`, and `REGEX`. | `TEXTFILE` | No | 0.2.0 |
| `input-format` | The input format class for the table, such as `org.apache.hadoop.hive.ql.io.orc.OrcInputFormat`. | The property `format` sets the default value `org.apache.hadoop.mapred.TextInputFormat` and can change it to a different default. | No | 0.2.0 |
| `output-format` | The output format class for the table, such as `org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat`. | The property `format` sets the default value `org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat` and can change it to a different default. | No | 0.2.0 |
| `serde-lib` | The serde library class for the table, such as `org.apache.hadoop.hive.ql.io.orc.OrcSerde`. | The property `format` sets the default value `org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe` and can change it to a different default. | No | 0.2.0 |
| `serde.parameter.` | The prefix of the serde parameter, such as `"serde.parameter.orc.create.index" = "true"`, indicating `ORC` serde lib to create row indexes | (none) | No | 0.2.0 |
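The pass-through behavior described above can be sketched in code. A minimal example (the property keys come from the table above; the values and the helper class are illustrative, not the Gravitino API) of assembling Hive table properties, including a custom `serde.parameter.` entry that is forwarded to the underlying serde:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: builds the kind of key-value property map the
// table above documents. Not part of the real Gravitino client API.
public class HiveTablePropertiesSketch {
    static Map<String, String> buildProperties() {
        Map<String, String> props = new HashMap<>();
        props.put("format", "ORC");                               // table file format
        props.put("table-type", "EXTERNAL_TABLE");                // managed vs external
        props.put("location", "/user/hive/warehouse/test_table"); // storage location
        // Keys with the "serde.parameter." prefix are passed through to the serde:
        props.put("serde.parameter.orc.create.index", "true");
        return props;
    }

    public static void main(String[] args) {
        Map<String, String> props = buildProperties();
        System.out.println(props.get("serde.parameter.orc.create.index"));
    }
}
```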

Hive automatically adds and manages some reserved properties. Users aren't allowed to set these properties.

| Property Name | Description | Since Version |
|-------------------------|-------------------------------------------------|---------------|
| `comment` | Used to store a table comment. | 0.2.0 |
| `numFiles` | Used to store the number of files in the table. | 0.2.0 |
| `totalSize` | Used to store the total size of the table. | 0.2.0 |
| `EXTERNAL` | Indicates whether the table is external. | 0.2.0 |
| `transient_lastDdlTime` | Used to store the last DDL time of the table. | 0.2.0 |

### Table indexes

docs/flink-connector/flink-catalog-hive.md (14 changes: 7 additions & 7 deletions)

@@ -59,13 +59,13 @@
The configuration of the Flink Hive connector is the same as the original Flink Hive connector…
Gravitino catalog property names with the prefix `flink.bypass.` are passed to the Flink Hive connector. For example, use `flink.bypass.hive-conf-dir` to pass `hive-conf-dir` to the Flink Hive connector.
The validated catalog properties are listed below. Any other properties with the prefix `flink.bypass.` in the Gravitino catalog will be ignored by the Gravitino Flink connector.

-| Property name in Gravitino catalog properties | Flink Hive connector configuration | Description           | Since Version |
-|-----------------------------------------------|------------------------------------|-----------------------|---------------|
-| `flink.bypass.default-database`               | `default-database`                 | Hive default database | 0.6.0         |
-| `flink.bypass.hive-conf-dir`                  | `hive-conf-dir`                    | Hive conf dir         | 0.6.0         |
-| `flink.bypass.hive-version`                   | `hive-version`                     | Hive version          | 0.6.0         |
-| `flink.bypass.hadoop-conf-dir`                | `hadoop-conf-dir`                  | Hadoop conf dir       | 0.6.0         |
-| `metastore.uris`                              | `hive.metastore.uris`              | Hive metastore uri    | 0.6.0         |
+| Property name in Gravitino catalog properties | Flink Hive connector configuration | Description           | Since Version    |
+|-----------------------------------------------|------------------------------------|-----------------------|------------------|
+| `flink.bypass.default-database`               | `default-database`                 | Hive default database | 0.6.0-incubating |
+| `flink.bypass.hive-conf-dir`                  | `hive-conf-dir`                    | Hive conf dir         | 0.6.0-incubating |
+| `flink.bypass.hive-version`                   | `hive-version`                     | Hive version          | 0.6.0-incubating |
+| `flink.bypass.hadoop-conf-dir`                | `hadoop-conf-dir`                  | Hadoop conf dir       | 0.6.0-incubating |
+| `metastore.uris`                              | `hive.metastore.uris`              | Hive metastore uri    | 0.6.0-incubating |

:::caution
You can set other Hadoop properties (with the prefix `hadoop.`, `dfs.`, `fs.`, `hive.`) in the Gravitino catalog properties. If so, it will override…
docs/flink-connector/flink-connector.md (60 changes: 30 additions & 30 deletions)

@@ -26,11 +26,11 @@
This capability allows users to perform federation queries, accessing data from…
1. [Build](../how-to-build.md) or [download](https://mvnrepository.com/artifact/org.apache.gravitino/gravitino-flink-connector-runtime-1.18) the Gravitino Flink connector runtime jar, and place it on the Flink classpath.
2. Configure Flink to use the Gravitino Flink connector.

-| Property                                         | Type   | Default Value     | Description                                                          | Required | Since Version |
-|--------------------------------------------------|--------|-------------------|----------------------------------------------------------------------|----------|---------------|
-| table.catalog-store.kind                         | string | generic_in_memory | The Catalog Store name, it should set to `gravitino`.                | Yes      | 0.6.0         |
-| table.catalog-store.gravitino.gravitino.metalake | string | (none)            | The metalake name that flink connector used to request to Gravitino. | Yes      | 0.6.0         |
-| table.catalog-store.gravitino.gravitino.uri      | string | (none)            | The uri of Gravitino server address.                                 | Yes      | 0.6.0         |
+| Property                                         | Type   | Default Value     | Description                                                          | Required | Since Version    |
+|--------------------------------------------------|--------|-------------------|----------------------------------------------------------------------|----------|------------------|
+| table.catalog-store.kind                         | string | generic_in_memory | The Catalog Store name, it should set to `gravitino`.                | Yes      | 0.6.0-incubating |
+| table.catalog-store.gravitino.gravitino.metalake | string | (none)            | The metalake name that flink connector used to request to Gravitino. | Yes      | 0.6.0-incubating |
+| table.catalog-store.gravitino.gravitino.uri      | string | (none)            | The uri of Gravitino server address.                                 | Yes      | 0.6.0-incubating |

Set the flink configuration in flink-conf.yaml.
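A minimal sketch of the three catalog-store options from the table above in `flink-conf.yaml`. The metalake name is a placeholder, and the URI assumes a Gravitino server on its default port 8090; substitute your deployment's values.

```yaml
# Placeholders: replace the metalake name and URI with your own deployment's values.
table.catalog-store.kind: gravitino
table.catalog-store.gravitino.gravitino.metalake: my_metalake
table.catalog-store.gravitino.gravitino.uri: http://localhost:8090
```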
@@ -66,28 +66,28 @@
SELECT * FROM hive_students;

The Gravitino Flink connector supports the following data type mapping between Flink and Gravitino.

-| Flink Type                       | Gravitino Type                | Since Version |
-|----------------------------------|-------------------------------|---------------|
-| `array`                          | `array`                       | 0.6.0         |
-| `bigint`                         | `long`                        | 0.6.0         |
-| `binary`                         | `fixed`                       | 0.6.0         |
-| `boolean`                        | `boolean`                     | 0.6.0         |
-| `char`                           | `char`                        | 0.6.0         |
-| `date`                           | `date`                        | 0.6.0         |
-| `decimal`                        | `decimal`                     | 0.6.0         |
-| `double`                         | `double`                      | 0.6.0         |
-| `float`                          | `float`                       | 0.6.0         |
-| `integer`                        | `integer`                     | 0.6.0         |
-| `map`                            | `map`                         | 0.6.0         |
-| `null`                           | `null`                        | 0.6.0         |
-| `row`                            | `struct`                      | 0.6.0         |
-| `smallint`                       | `short`                       | 0.6.0         |
-| `time`                           | `time`                        | 0.6.0         |
-| `timestamp`                      | `timestamp without time zone` | 0.6.0         |
-| `timestamp without time zone`    | `timestamp without time zone` | 0.6.0         |
-| `timestamp with time zone`       | `timestamp with time zone`    | 0.6.0         |
-| `timestamp with local time zone` | `timestamp with time zone`    | 0.6.0         |
-| `timestamp_ltz`                  | `timestamp with time zone`    | 0.6.0         |
-| `tinyint`                        | `byte`                        | 0.6.0         |
-| `varbinary`                      | `binary`                      | 0.6.0         |
-| `varchar`                        | `string`                      | 0.6.0         |
+| Flink Type                       | Gravitino Type                | Since Version    |
+|----------------------------------|-------------------------------|------------------|
+| `array`                          | `list`                        | 0.6.0-incubating |
+| `bigint`                         | `long`                        | 0.6.0-incubating |
+| `binary`                         | `fixed`                       | 0.6.0-incubating |
+| `boolean`                        | `boolean`                     | 0.6.0-incubating |
+| `char`                           | `char`                        | 0.6.0-incubating |
+| `date`                           | `date`                        | 0.6.0-incubating |
+| `decimal`                        | `decimal`                     | 0.6.0-incubating |
+| `double`                         | `double`                      | 0.6.0-incubating |
+| `float`                          | `float`                       | 0.6.0-incubating |
+| `integer`                        | `integer`                     | 0.6.0-incubating |
+| `map`                            | `map`                         | 0.6.0-incubating |
+| `null`                           | `null`                        | 0.6.0-incubating |
+| `row`                            | `struct`                      | 0.6.0-incubating |
+| `smallint`                       | `short`                       | 0.6.0-incubating |
+| `time`                           | `time`                        | 0.6.0-incubating |
+| `timestamp`                      | `timestamp without time zone` | 0.6.0-incubating |
+| `timestamp without time zone`    | `timestamp without time zone` | 0.6.0-incubating |
+| `timestamp with time zone`       | `timestamp with time zone`    | 0.6.0-incubating |
+| `timestamp with local time zone` | `timestamp with time zone`    | 0.6.0-incubating |
+| `timestamp_ltz`                  | `timestamp with time zone`    | 0.6.0-incubating |
+| `tinyint`                        | `byte`                        | 0.6.0-incubating |
+| `varbinary`                      | `binary`                      | 0.6.0-incubating |
+| `varchar`                        | `string`                      | 0.6.0-incubating |