-
Notifications
You must be signed in to change notification settings - Fork 394
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[#5072]Improvement(docs): Add docs for Trino cascading query #5012
Merged
Merged
Changes from 1 commit
Commits
Show all changes
2 commits
Select commit
Hold shift + click to select a range
File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,162 @@ | ||
--- | ||
title: "Apache Gravitino connector Trino cascading query" | ||
slug: /trino-connector/trino-cascading-query | ||
keyword: gravitino connector trino | ||
license: "This software is licensed under the Apache License version 2." | ||
--- | ||
|
||
# Background | ||
With the `Apache Gravitino Trino connector` and the `Gravitino Trino cascading connector`, you can implement cascading queries in Trino. | ||
These connectors allow you to treat other Trino clusters as data sources for the current Trino cluster, | ||
enabling queries across catalogs in different Trino clusters. | ||
|
||
This mechanism prioritizes executing queries in the Trino cluster located in the same region as the data, | ||
based on the data distribution in the catalogs. By doing so, it significantly reduces the amount of data | ||
transferred over the network, addressing the performance issues commonly found in traditional federated query engines | ||
where large volumes of data need to be transmitted across networks. | ||
|
||
# Deploying Trino | ||
|
||
## Deploying Trino | ||
|
||
To set up the Trino cascading query environment, you should first deploy at least two Trino environments. | ||
Next, install the `Apache Gravitino Trino connector` plugin and `Gravitino Trino cascading connector` plugin into Trino. | ||
For detailed steps, please refer to the [Deploying Trino documentation](installation.md). | ||
|
||
Follow these steps: | ||
|
||
1. [Download](https://github.com/apache/gravitino/releases) the `Apache Gravitino Trino connector` tarball and unpack it. | ||
The tarball contains a single top-level directory named `gravitino-trino-connector-<version>`. Rename this directory to `gravitino`. | ||
2. [Download](https://github.com/datastrato/trino-cascading-connector/releases) the `Gravitino Trino cascading connector` tarball and unpack it. | ||
This tarball also contains a single top-level directory named `gravitino-trino-cascading-connector-<version>`. Rename this directory to `trino`. | ||
3. Copy both connector directories to Trino's plugin directory. | ||
Typically, this directory is located at `Trino-server-<version>/plugin` and contains other catalogs used by Trino. | ||
|
||
Ensure that the `plugin` directory includes the `gravitino` and `trino` subdirectories. | ||
Verify the network connectivity between the machines hosting the two Trino clusters, identified as `c1-trino` and `c2-trino`. | ||
|
||
|
||
## Deploying Trino in Containers | ||
|
||
Download the `Apache Gravitino Trino connector` tarball and `Gravitino Trino cascading connector` tarball, then unpack them. | ||
After unpacking, you will find the directories named `gravitino-trino-connector-<version>` | ||
and `gravitino-trino-cascading-connector-<version>`. | ||
|
||
To start Trino on the host `c1-trino` and mount the plugins, execute the following command: | ||
|
||
```bash | ||
docker run --name c1-trino -d -p 8080:8080 <image-name> -v `gravitino-trino-connector-<version>`:/usr/lib/trino/plugin/gravitino \ | ||
-v `gravitino-trino-cascading-connector-<version>`:/usr/lib/trino/plugin/trino | ||
|
||
``` | ||
|
||
Similarly, to start Trino on the host `c2-trino` and mount the plugins, use: | ||
|
||
```bash | ||
docker run --name c2-trino -d -p 8080:8080 <image-name> -v `gravitino-trino-connector-<version>`:/usr/lib/trino/plugin/gravitino \ | ||
-v `gravitino-trino-cascading-connector-<version>`:/usr/lib/trino/plugin/trino | ||
``` | ||
|
||
After starting the Trino containers, ensure the configuration directory `/etc/trino` is correctly set up. | ||
Also, verify that the Trino containers on `c1-trino` and `c2-trino` can communicate with each other over the network. | ||
|
||
## Configuring Trino | ||
|
||
For detailed instructions on configuring Trino, please refer to the [Trino documentation](https://trino.io/docs/current/installation/deployment.html#configuring-trino). | ||
After completing the basic configuration, proceed to configure the Gravitino connector. | ||
Create a `gravitino.properties` file in the `etc/catalog` directory on the `c1-trino` host with the following contents: | ||
|
||
```text | ||
connector.name = gravitino | ||
gravitino.uri = http://GRAVITINO_HOST_IP:GRAVITINO_HOST_PORT | ||
gravitino.metalake = GRAVITINO_METALAKE_NAME | ||
gravitino.cloud.region-code=c1 | ||
``` | ||
|
||
The property `gravitino.cloud.region-code=c1` specifies that the `c1-trino` host is in the `c1` region, | ||
which will handle queries for catalogs in the `c1` region. For handling queries in the `c2` region, | ||
they will be delegated to the `c2-trino` host. | ||
|
||
Similarly, on the `c2-trino` host, create a `gravitino.properties` file in the `etc/catalog directory`: | ||
|
||
```test | ||
connector.name = gravitino | ||
gravitino.uri = http://GRAVITINO_HOST_IP:GRAVITINO_HOST_PORT | ||
gravitino.metalake = GRAVITINO_METALAKE_NAME | ||
gravitino.cloud.region-code=c2 | ||
``` | ||
|
||
The `gravitino.cloud.region-code=c2` indicates that the `c2-trino` host is designated for the `c2` region, | ||
thus queries for catalogs in this region will execute on c2-trino. | ||
|
||
Ensure that the `gravitino.uri` setting on both `c1-trino` and `c2-trino` points to the same Gravitino server. | ||
Verify that the server is operational and properly connected. Restart Trino after making any configuration changes. | ||
|
||
## Creating Catalogs | ||
|
||
To verify federated queries, create catalogs. | ||
Below is an example using the `gt_mysql` catalog in the `c2` region, configured for federated querying from `c1-trino`. | ||
Execute the following command in the `c1-trino` CLI to create the catalog: | ||
|
||
To verify federated queries, start by creating catalogs. Below is an example of configuring the `gt_mysql` catalog | ||
in the `c2` region for federated querying from `c1-trino`. | ||
Execute the following command in the `c1-trino` CLI to create the catalog: | ||
|
||
```sql | ||
CALL gravitino.system.create_catalog( | ||
'gt_mysql', | ||
'jdbc-mysql', | ||
MAP( | ||
ARRAY['jdbc-url', 'jdbc-user', 'jdbc-password', 'jdbc-driver', 'cloud.region-code', 'cloud.trino.connection-url', 'cloud.trino.connection-user', 'cloud.trino.connection-password'], | ||
ARRAY['${mysql_uri}/?useSSL=false', 'trino', 'ds123', 'com.mysql.cj.jdbc.Driver', 'c2', 'jdbc:trino://c2-trino:8080', 'admin', ''] | ||
) | ||
); | ||
``` | ||
|
||
Where: | ||
- `cloud.region-code` specifies that the `gt_mysql` catalog is in the `c2` region. | ||
- `cloud.trino.connection-url` specifies the Trino JDBC connection URL for the Trino in `c2` region. | ||
- `cloud.trino.connection-user` specifies the Trino JDBC user for the Trino in `c2` region. | ||
- `cloud.trino.connection-password` specifies the Trino JDBC user password for the Trino in `c2` region. | ||
|
||
This configuration enables `c1-trino` to access `c2-trino` through the specified Trino JDBC connection information. | ||
After successfully creating the catalog, use the Trino CLI on both `c1-trino` and `c2-trino` to view the catalogs. | ||
|
||
```sql | ||
SHOW CATALOGS; | ||
``` | ||
|
||
## Adding Data | ||
|
||
After creating the catalog, the next step is to add data to verify queries. Since the `Gravitino Trino cascading connector` | ||
does not support data writing directly, execute the following commands using the `c2-trino` CLI to set up your data environment: | ||
|
||
|
||
```sql | ||
CREATE SCHEMA gt_mysql.gt_db1; | ||
|
||
CREATE TABLE gt_mysql.gt_db1.tb01 ( | ||
name VARCHAR(255), | ||
salary INT | ||
); | ||
|
||
INSERT INTO gt_mysql.gt_db1.tb01(name, salary) VALUES ('sam', 11); | ||
INSERT INTO gt_mysql.gt_db1.tb01(name, salary) VALUES ('jerry', 13); | ||
INSERT INTO gt_mysql.gt_db1.tb01(name, salary) VALUES ('bob', 14), ('tom', 12); | ||
``` | ||
|
||
## Query Verification | ||
|
||
After successfully inserting the data, execute the following query using the `c1-trino` CLI: | ||
|
||
```sql | ||
SELECT * FROM gt_mysql.gt_db1.tb01 ORDER BY name; | ||
``` | ||
|
||
Verify the query results using the `c2-trino` CLI: | ||
|
||
```sql | ||
SELECT * FROM gt_mysql.gt_db1.tb01 ORDER BY name; | ||
``` | ||
|
||
The output of the `c1-trino` CLI is the same as `c2-trino`. |
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could you plz add a link for
Gravitino Trino cascading connector
here?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed