-
Notifications
You must be signed in to change notification settings - Fork 4.8k
HIVE-29233: Iceberg: Validate HiveRESTCatalogClient with external RESTCatalogs like Gravitino #6108
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
Hi @difin, Can we use the standalone HMS instead Gravitino as the Iceberg REST server for E2E testing? |
@zhangbutao, we already have a test for HMS RestCatalog (iceberg_rest_catalog_hms.q). |
import java.util.List; | ||
|
||
@RunWith(Parameterized.class) | ||
public class TestIcebergRESTCatalogGravitinoLlapLocalCliDriver { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is it possible to have a single TestIcebergRESTCatalogLlapLocalCliDriver
with multiple RestCatalog providers, configured via additional param. Something similar to backend db arg?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will try to do it with one CLI driver and extra parameter as you suggested, just currently there is a problem with syncing host warehouseDir with container warehouseDir, if I manage to solve it, I'll try to do it in one driver.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @deniskuzZ, Gravitino test works now in CI and on local, but I don't think it is the best approach to unify the driver for HMS REST server and Gravitino for the following reasons:
- Both drivers are very different and their implementations share nothing in common.
- It will be one very complicated driver that combines 2 different approaches, not the best for readability.
- Having separate drivers is good as a ready to use clean working integration demo examples.
.q
files are slightly different: gravitino .q file has! sleep
command as a workaround to give the manual sync process to complete afterINSERT
and before reading the table.- .
q.out
files are slightly different: HMS REST Catalog has thedefault
database, Gravitino doesn't.
What do you think?
c949a61
to
f3abd30
Compare
f3abd30
to
0497faa
Compare
0497faa
to
b604394
Compare
b604394
to
8656b64
Compare
8656b64
to
54bb487
Compare
I checked out this branch and ran
|
@okumin gravitino test passed on CI https://ci.hive.apache.org/job/hive-precommit/job/PR-6108/5/testReport/org.apache.hadoop.hive.cli/TestIcebergRESTCatalogGravitinoLlapLocalCliDriver/Testing___split_06___PostProcess___testCliDriver_iceberg_rest_catalog_gravitino_/ Passes on my local too:
|
54bb487
to
881dd62
Compare
…TCatalogs like Gravitino
881dd62
to
aa254a8
Compare
My local machine has no luck.
and then
It is likely an issue with my local or Gravitino's setup. I'm not sure what is different from the CI env |
@@ -0,0 +1,86 @@ | |||
-- SORT_QUERY_RESULTS |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I verified the diff is likely expected.
% diff iceberg/iceberg-handler/src/test/queries/positive/iceberg_rest_catalog_hms.q iceberg/iceberg-handler/src/test/queries/positive/iceberg_rest_catalog_gravitino.q
68a69,73
> --! In CI, Testcontainers' .withFileSystemBind() is not able to bind the same host path to the same container path,
> --! so as a workaround, the .metadata.json files from container are manually synced in a daemon process,
> --! since the sync can take some time, need to wait for it to happen after the insert operation.
> ! sleep 20;
>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From #6108 (comment):
- .q files are slightly different: gravitino .q file has ! sleep command as a workaround to give the manual sync process to complete after INSERT and before reading the table.
- .q.out files are slightly different: HMS REST Catalog has the default database, Gravitino doesn't.
@@ -0,0 +1,231 @@ | |||
PREHOOK: query: create database ice_rest |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I verified the diff is likely expected.
% diff iceberg/iceberg-handler/src/test/results/positive/llap/iceberg_rest_catalog_hms.q.out iceberg/iceberg-handler/src/test/results/positive/llap/iceberg_rest_catalog_gravitino.q.out
219d218
< default
233d231
< default
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
.q.out files are slightly different: HMS REST Catalog has the default
database, Gravitino doesn't.
<groupId>org.testcontainers</groupId> | ||
<artifactId>testcontainers</artifactId> | ||
<scope>test</scope> | ||
</dependency> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need these ones?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the library that allows to run docker containers in tests - GenericContainer
is from testcontainers
:
private void startGravitinoContainer() {
gravitinoContainer = new GenericContainer<>(GRAVITINO_IMAGE)
.withExposedPorts(9001)
// Update entrypoint to create the warehouse directory before starting the server
.withCreateContainerCmdModifier(cmd -> cmd.withEntrypoint("bash", "-c",
String.format("mkdir -p %s && exec %s", warehouseDir.toString(), GRAVITINO_STARTUP_SCRIPT)))
* limitations under the License. | ||
*/ | ||
|
||
package org.apache.hadoop.hive.cli; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please follow SonarQube's instructions.
https://sonarcloud.io/project/issues?id=apache_hive&pullRequest=6108&issueStatuses=OPEN,CONFIRMED&sinceLeakPeriod=true
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
|
||
private static final CliAdapter adapter = new CliConfigs.TestIcebergRESTCatalogGravitinoLlapLocalCliDriver().getCliAdapter(); | ||
private static final DockerImageName GRAVITINO_IMAGE = | ||
DockerImageName.parse("apache/gravitino-iceberg-rest:1.0.0-rc3"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's use 1.0.0
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Release v1.0.0 was released just 2 days ago, done.
gravitino.iceberg-rest.httpPort = 9001 | ||
|
||
# --- Iceberg REST Catalog Backend (set to JDBC) --- | ||
gravitino.iceberg-rest.catalog-backend = jdbc |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The memory mode does still not work?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tested in memory mode and confirmed it does work correctly now with the manual sync between the container and host warehouse directories.
That said, I’m not sure memory mode should be the preferred option here. One of the goals of this PR is to demonstrate how HMSRestCatalogClient
integrates with an external RESTCatalog
server like Gravitino. For that reason, I think it makes more sense to highlight jdbc mode in the examples/tests, since it better reflects real-world usage where the catalog state is persisted beyond the lifetime of a single container.
The port 9001 is not the correct one to use on the host. This port is only open inside the gravitino container. It is mapped to a random external port in this place in the
|
|
What changes were proposed in this pull request?
Implemented a new qtest-driver
TestIcebergRESTCatalogGravitinoLlapLocalCliDriver
for testingHiveRESTCatalogClient
with Gravitino Iceberg Rest Server and a q-test for it.Why are the changes needed?
To validate support for external RestCatalogs like Gravitino.
Does this PR introduce any user-facing change?
No
How was this patch tested?
New q-test.