334 docs rfc create the docs for tigerlake #4194

Draft · wants to merge 10 commits into base: latest
Changes from all commits
5 changes: 5 additions & 0 deletions use-timescale/page-index/page-index.js
@@ -753,6 +753,11 @@ module.exports = [
href: "limitations",
excerpt: "Current limitations of TigerData product features",
},
{
title: "TigerLake",
href: "tigerlake",
excerpt: "Unifies the Timescale Cloud operational architecture with the data lake (S3 + Iceberg) architectures",
},
{
title: "Troubleshoot TigerData products",
href: "troubleshoot-timescaledb",
176 changes: 176 additions & 0 deletions use-timescale/tigerlake.md
@@ -0,0 +1,176 @@
---
title: BaseLake
excerpt: Unifies the Tiger Cloud operational architecture with data lake architectures. This enables real-time application building alongside efficient data pipeline management within a single system.
products: [cloud]
keywords: [data lake, lakehouse, s3, iceberg]
---

# BaseLake

BaseLake unifies the Tiger Cloud operational architecture with the data lake architectures of S3 and Iceberg.
This enables real-time application building alongside efficient data pipeline management within a single system.

This experimental release is a native integration enabling continuous replication between AWS [S3 Tables][s3-tables] (managed Iceberg and catalog) running in your AWS account and
relational tables and hypertables in Tiger Cloud.

## Getting started

To connect TigerData with AWS S3 Tables, you need the ARN of a table bucket and the ARN of a role with permission to write to that bucket.
There are three ways to create these ARNs:
* [Using the AWS CloudFormation console](#setup-baselake-using-aws-management-console)
* [Through the AWS CLI with a CloudFormation template](#setup-baselake-using-the-aws-cloudformation-cli)
* [A step-by-step guide to creating the bucket and role](#step-by-step-guide)

### Setup BaseLake using AWS Management Console

1. Sign in to the AWS Management Console and open [CloudFormation console][cmc].
2. In the navigation bar on the top of the page:
1. Choose the name of the currently displayed AWS Region
2. Set it to the Region in which you want to create your table bucket. **This must match the region your Tiger Cloud service is running in.** If the regions do not match, AWS charges you for cross-region data transfer.
3. Click **Create stack**. If prompted, choose **With new resources**; this is the standard option.
4. Under **Specify Template**, copy the following URL into the Amazon S3 URL box and click **Next**.
```
https://tigerlake.s3.us-east-1.amazonaws.com/tigerlake-connect-cloudformation.yaml
```
5. Enter the following details, then click `Next`:
* `Stack Name`: the name for this CloudFormation stack
* `BucketName`: the name of the S3 table bucket to create
* `ProjectID` and `ServiceID`: your Tiger Cloud service details, see [these instructions][get-project-id]
6. Check `I acknowledge that AWS CloudFormation might create IAM resources`, then click `Next`.
7. On the review page, click `Submit` and wait for the deployment to complete.
8. Click `Outputs`, then copy all four outputs.
9. Provide the outputs to Timescale. Timescale uses them to provision your BaseLake services.

### Setup BaseLake using the AWS CloudFormation CLI

Replace the following values in the command, then run it from the terminal:
* `Stack Name`: the name for this CloudFormation stack
* `BucketName`: the name of the S3 table bucket to create
* `ProjectID` and `ServiceID`: your Tiger Cloud service details, see [these instructions][get-project-id]

```shell
aws cloudformation create-stack \
--capabilities CAPABILITY_IAM \
--template-url https://tigerlake.s3.us-east-1.amazonaws.com/tigerlake-connect-cloudformation.yaml \
--stack-name {STACK_NAME} \
--parameters \
ParameterKey=BucketName,ParameterValue="{BUCKET_NAME}" \
ParameterKey=ProjectID,ParameterValue="{ProjectID}" \
ParameterKey=ServiceID,ParameterValue="{ServiceID}"
```
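After the command returns, you can wait for the deployment to finish and print the stack outputs to provide to Timescale. This is a sketch using standard AWS CLI commands; `{STACK_NAME}` is the same placeholder as above:

```shell
# Block until CloudFormation has finished creating the stack
aws cloudformation wait stack-create-complete --stack-name {STACK_NAME}

# Print the stack outputs that Timescale needs for provisioning
aws cloudformation describe-stacks \
  --stack-name {STACK_NAME} \
  --query 'Stacks[0].Outputs' \
  --output table
```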

### Step-by-step guide

#### Create an S3 table bucket
1. Log in to the [AWS Management Console][aws-console].
2. Open the [Amazon S3 console][s3-console].
3. In the navigation bar at the top of the page, choose the name of the currently displayed AWS Region, then choose the region in which you want to create your table bucket. **This should match the region your Tiger Cloud service runs in**, or AWS charges you for cross-region data transfer.
4. In the left navigation pane, choose **Table buckets**.
5. Click **Create table bucket**, enter a name for your bucket, and create it. Note down the bucket's Amazon Resource Name (ARN) that is displayed.
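Alternatively, the table bucket can be created with the AWS CLI. This is a sketch using the `aws s3tables` commands; the bucket name and region are placeholders, and the response contains the ARN to note down:

```shell
# Create the table bucket in the same region as your Tiger Cloud service
aws s3tables create-table-bucket \
  --region us-east-1 \
  --name my-table-bucket

# List table buckets to look up the ARN later if needed
aws s3tables list-table-buckets --region us-east-1
```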

#### Create an IAM role
1. Open the [IAM Dashboard][iam-dashboard] to create a new role.
2. In the left navigation pane, click **Roles**, then click **Create role** and select **Custom trust policy**.
3. Replace the entire **Custom trust policy** code block with the following, substituting `{PROJECT_ID}` and `{SERVICE_ID}` with the values for the Tiger Cloud project and service you intend to use with TigerLake. To locate your Project ID and Service ID, [follow these steps][get-project-id].

```json
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Principal": {
"AWS": "arn:aws:iam::142548018081:root"
},
"Action": "sts:AssumeRole",
"Condition": {
"StringEquals": {
"sts:ExternalId": "{PROJECT_ID}/{SERVICE_ID}"
}
}
}
]
}
```

4. Click **Next**, then click **Next** again without selecting any permission policies.
5. Give the role a name and click **Create role**.
6. In the roles overview, select the role you just created, then click **Add permissions** > **Create inline policy**.
7. Select **JSON**, then replace the entire **Policy editor** code block with the following, substituting the two instances of `{S3TABLE_BUCKET_ARN}` with the ARN of the table bucket you created earlier.

```json
{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "BucketOps",
"Effect": "Allow",
"Action": [
"s3tables:*"
],
"Resource": "{S3TABLE_BUCKET_ARN}"
},
{
"Sid": "BucketTableOps",
"Effect": "Allow",
"Action": [
"s3tables:*"
],
"Resource": "{S3TABLE_BUCKET_ARN}/table/*"
}
]
}
```

8. Click **Next**, give the inline policy a name, and click **Create policy**.
9. Provide TigerData with the ARN of this role, the ARN of the S3 table bucket, and your Tiger Cloud Project and Service IDs.
10. TigerData spins up the services with this configuration and lets you know when provisioning is complete.
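The role-creation steps above can also be scripted with the AWS CLI. This is a sketch assuming the trust policy from step 3 is saved as `trust-policy.json` and the inline policy from step 7 as `bucket-policy.json`; the role and policy names are illustrative:

```shell
# Create the role with the custom trust policy
aws iam create-role \
  --role-name tigerlake-s3tables-role \
  --assume-role-policy-document file://trust-policy.json

# Attach the inline policy that grants access to the table bucket
aws iam put-role-policy \
  --role-name tigerlake-s3tables-role \
  --policy-name tigerlake-bucket-access \
  --policy-document file://bucket-policy.json

# Print the role ARN to provide to TigerData
aws iam get-role --role-name tigerlake-s3tables-role \
  --query 'Role.Arn' --output text
```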

## Provisioning

Provisioning takes about 10 to 15 minutes, and your service is restarted during the process.

## API

To stream a Postgres table or hypertable from a Tiger Cloud service to Iceberg, run the following statement:
```sql
ALTER TABLE <table_name> SET (
tigerlake.iceberg_sync = true | false,
tigerlake.iceberg_partitionby = '<partition_specification>'
)
```

* `tigerlake.iceberg_sync`: `boolean`, set to `true` to start streaming and to `false` to stop it. Note that a stream cannot be resumed after being stopped.
* `tigerlake.iceberg_partitionby`: optional field that defines a partition specification for the Iceberg table. By default, the partitioning specification of the hypertable is used. Streamed Postgres tables can also have a partition specification for the Iceberg table, if intentionally defined. For details, see the [Iceberg partition specification][iceberg-partition-spec].
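For example, to start streaming a hypothetical `metrics` hypertable and partition the Iceberg table by day on its `ts` time column (the table and column names are illustrative, and the partition specification is assumed to follow the Iceberg partition transform syntax):

```sql
ALTER TABLE metrics SET (
    tigerlake.iceberg_sync = true,
    tigerlake.iceberg_partitionby = 'day(ts)'
);
```

To stop the stream, set `tigerlake.iceberg_sync = false` in the same way; remember that a stopped stream cannot be resumed.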

When a stream is started, the full table is synchronized to Iceberg; this means that all prior records are imported first.
Write throughput is approximately 40,000 records per second, so a full import of a large table can take some time.

By default, the partition interval of an Iceberg table is the same as that of the hypertable.

Only tables and hypertables with primary keys, including composite primary keys, are supported.
Iceberg needs a primary key to perform update and delete statements.

## Query your data

To execute queries against Iceberg, best practice is to use one of the following products:
* [AWS Athena][aws-athena]: ensure that integration with the AWS analytics services is enabled for the table bucket.
* [DuckDB][duckdb]: support for S3 Tables is in preview.
* [Apache Spark][apache-spark]
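For example, DuckDB can attach the table bucket through its ARN and query streamed tables directly. This sketch is based on the DuckDB Iceberg extension's preview support for S3 Tables; the ARN, namespace, and table names are placeholders:

```sql
-- Load the Iceberg extension (S3 Tables support is in preview)
INSTALL iceberg;
LOAD iceberg;

-- Pick up AWS credentials from the standard credential chain
CREATE SECRET (TYPE s3, PROVIDER credential_chain);

-- Attach the table bucket by its ARN
ATTACH 'arn:aws:s3tables:us-east-1:123456789012:bucket/my-table-bucket'
    AS s3_tables_db (TYPE iceberg, ENDPOINT_TYPE s3_tables);

-- Query a streamed table
SELECT * FROM s3_tables_db.public.metrics LIMIT 10;
```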

## Limitations
* Only Postgres 17 is supported.
* Only the S3 Tables REST Iceberg catalog is supported.
* Certain columnstore optimizations must be disabled in hypertables in order to collect correlating WAL events.

[cmc]: https://console.aws.amazon.com/cloudformation/
[aws-athena]: https://aws.amazon.com/athena/
[duckdb]: https://duckdb.org/docs/stable/extensions/iceberg/amazon_s3_tables
[apache-spark]: https://spark.apache.org/
[s3-tables]: https://aws.amazon.com/s3/features/tables/
[aws-console]: https://console.aws.amazon.com/
[s3-console]: https://console.aws.amazon.com/s3/
[iam-dashboard]: https://console.aws.amazon.com/iamv2/home
[get-project-id]: https://docs.tigerdata.com/integrations/latest/find-connection-details/#find-your-project-and-service-id
[iceberg-partition-spec]: https://iceberg.apache.org/spec/#partition-transforms
110 changes: 110 additions & 0 deletions use-timescale/tigerlake/index.md
@@ -0,0 +1,110 @@
---
title: TigerLake
excerpt: Unifies the Timescale Cloud operational architecture with the data lake (S3 + Iceberg) architectures. This enables real-time application building alongside efficient data pipeline management within a single system.
products: [cloud]
keywords: [data lake]
---

# TigerLake

TigerLake unifies the Timescale Cloud operational architecture with data lake (S3 + Iceberg) architectures. This enables real-time application building alongside efficient data pipeline management within a single system.

This experimental release is a native integration enabling continuous replication
between AWS S3 Tables (managed Iceberg and catalog) running in your AWS account and
relational tables and hypertables in Timescale Cloud.

You interact directly with TigerLake using Timescale Console. Pricing for the integration will be introduced at a later date. This document explains how to get started as an early access partner. There are no costs associated with the Timescale Cloud service; we set everything up and then invite you to the project.

## Get started

You first create an S3 table bucket, then use an IAM role to enable Timescale Cloud to write to this bucket. Timescale Console supplies a CloudFormation template. You set up TigerLake using either:

- The AWS Management Console
- The AWS CloudFormation CLI


### Setup TigerLake using AWS Management Console

1. Sign in to the AWS Management Console and open [CloudFormation console][cmc].
1. In the navigation bar on the top of the page:
1. Choose the name of the currently displayed AWS Region
2. Set it to the Region in which you want to create your table bucket.

This must match the region your Timescale service is running in. If the regions do
not match, AWS charges you for cross-region data transfer.
1. Click `Create stack`.

If prompted, choose `With new resources`. This is the standard option.

1. Under `Specify Template`, copy the following URL into the Amazon S3 URL box and click `Next`.

```
https://tigerlake.s3.us-east-1.amazonaws.com/tigerlake-connect-cloudformation.yaml
```

1. Enter the following details, then click `Next`:
- **Stack Name**: the name for this CloudFormation stack
- **BucketName**: The name of the S3 table bucket which will be created
- **ProjectID and ServiceID**: Your Timescale Cloud service details (see these instructions)

1. Check `I acknowledge that AWS CloudFormation might create IAM resources`, then click `Next`.
1. On the review page, click `Submit` and wait for the deployment to complete.
1. Click `Outputs`, then copy all four outputs.
1. Provide the outputs to Timescale.
Timescale uses them to provision your TigerLake services.


### Setup TigerLake using the AWS CloudFormation CLI

Copy the following command, and replace the values for:
- **Stack Name**: the name for this CloudFormation stack
- **BucketName**: The name of the S3 table bucket which will be created
- **ProjectID and ServiceID**: Your Timescale Cloud service details (see these instructions)

```shell
aws cloudformation create-stack \
--capabilities CAPABILITY_IAM \
--template-url https://tigerlake.s3.us-east-1.amazonaws.com/tigerlake-connect-cloudformation.yaml \
--stack-name {STACK_NAME} \
--parameters \
ParameterKey=BucketName,ParameterValue="{BUCKET_NAME}" \
ParameterKey=ProjectID,ParameterValue="{ProjectID}" \
ParameterKey=ServiceID,ParameterValue="{ServiceID}"
```


## Start streaming

To stream a table from a Timescale Cloud service to Iceberg, run the following statement:
```sql
SELECT create_iceberg_sync('<TABLE_NAME>'::regclass);
```

When a stream is started, the full table is synchronized to Iceberg, meaning that all prior records will be imported first.

## Query your data

To execute queries against Iceberg, best practice is to use the following:

- [AWS Athena][aws-athena]: ensure that integration with AWS analytics services is enabled for the table bucket.
- [DuckDB][duckdb]: support for S3 Tables is in preview.
- [Apache Spark][apache-spark]

## Gotchas and known issues

- Only Postgres 17 is supported.
- The only supported Iceberg catalog is the S3 Tables REST catalog.
- It is not possible to stop a stream from TimescaleDB to Iceberg at the moment; this ability is under implementation.
- When streaming a hypertable to Iceberg, the Iceberg table will be partitioned with a hardcoded one-day partition interval.
- Only tables with primary keys are supported.
- There are limitations on the changes you can make to a table after setting up Iceberg sync: dropping or renaming columns and modifying a column’s data type are not permitted. Support for this will be added soon.
- Certain optimizations must be disabled in hypertables with the columnstore enabled to retrieve correlating WAL events.

## What’s next

We’d love to get your feedback on what worked and what didn’t. If possible, we’d also love to have a call to learn more about the problems you tried to solve and what we could build next to support you.

[cmc]: https://console.aws.amazon.com/cloudformation/
[aws-athena]: https://aws.amazon.com/athena/
[duckdb]: https://duckdb.org/docs/stable/extensions/iceberg/amazon_s3_tables
[apache-spark]: https://spark.apache.org/