Polaris Catalog is an open source catalog for Apache Iceberg™. Polaris Catalog implements Iceberg's open REST API for multi-engine interoperability with Apache Doris™, Apache Flink®, Apache Spark™, StarRocks, and Trino.
Polaris Catalog is open source under an Apache 2.0 license.
- ⭐ Star this repo if you’d like to bookmark and come back to it!
- 📖 Read the announcement blog post for more details!
API docs are hosted via GitHub Pages at https://polaris.io. Every update to the main branch refreshes the hosted docs.
- The Polaris management API docs are found here
- The Apache Iceberg REST API docs are found here
Docs are generated using Redocly. They can be regenerated by running the following commands from the project root directory:

```shell
docker run -p 8080:80 -v ${PWD}:/spec docker.io/redocly/cli join spec/docs.yaml spec/polaris-management-service.yml spec/rest-catalog-open-api.yaml -o spec/index.yaml --prefix-components-with-info-prop title

docker run -p 8080:80 -v ${PWD}:/spec docker.io/redocly/cli build-docs spec/index.yaml --output=docs/index.html --config=spec/redocly.yaml
```
- Java JDK >= 21, see CONTRIBUTING.md.
- Gradle - included in the project via the wrapper; invoke it with `./gradlew` from the project root.
- Docker - if you want to run the project in a containerized environment.
Polaris is a multi-module project with three modules:

- `polaris-core` - the main Polaris entity definitions and core business logic
- `polaris-server` - the Polaris REST API server
- `polaris-eclipselink` - the EclipseLink implementation of the `MetaStoreManager` interface
Build the binary (the first build may require installing a new JDK version). This build runs the integration tests by default:

```shell
./gradlew build
```

To skip tests:

```shell
./gradlew assemble
```
Run the Polaris server locally on localhost:8181:

```shell
./gradlew runApp
```
The server will start in in-memory mode and print its auto-generated credentials to STDOUT in a message like the following:

```
realm: default-realm root principal credentials: <id>:<secret>
```

These credentials can be used as "Client ID" and "Client Secret" in OAuth2 requests (e.g. the curl command below).
While the Polaris server is running, run the regression (end-to-end) tests in another terminal:

```shell
./regtests/run.sh
```
Build the image:

```shell
docker build -t localhost:5001/polaris:latest .
```
Run it in standalone mode. This runs a single container that binds the container's port 8181 to port 8181 on localhost:

```shell
docker run -p 8181:8181 localhost:5001/polaris:latest
```
Unit and integration tests are run using Gradle. To run all tests, use the following command:

```shell
./gradlew test
```
Regression tests, or functional tests, are stored in the `regtests` directory. They can be executed in a Docker environment by using the `docker-compose.yml` file in the project root:

```shell
docker compose up --build --exit-code-from regtest
```

They can also be executed outside of Docker by following the setup instructions in the README.
You can run Polaris as a mini-deployment locally. This will create two pods that bind themselves to port 8181:

```shell
./setup.sh
```
You can check the pod and deployment status like so:

```shell
kubectl get pods
kubectl get deployment
```

If things aren't working as expected, you can troubleshoot like so:

```shell
kubectl describe deployment polaris-deployment
```
Before connecting with Spark, you'll need to create a catalog. To create a catalog, generate a token for the root principal:
```shell
curl -i -X POST \
  http://localhost:8181/api/catalog/v1/oauth/tokens \
  -d 'grant_type=client_credentials&client_id=<principalClientId>&client_secret=<mainSecret>&scope=PRINCIPAL_ROLE:ALL'
```
The response output will contain an access token:

```json
{
  "access_token": "ver:1-hint:1036-ETMsDgAAAY/GPANareallyverylongstringthatissecret",
  "token_type": "bearer",
  "expires_in": 3600
}
```
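Rather than copying the token by hand, you can extract it from the JSON response programmatically. A sketch that parses the sample response above with python3 (to avoid assuming jq is installed); in practice you would pipe the curl output instead:

```shell
# Sample response body; in practice, capture this from the curl call above.
RESPONSE='{"access_token": "ver:1-hint:1036-ETMsDgAAAY/GPANareallyverylongstringthatissecret", "token_type": "bearer", "expires_in": 3600}'

# Pull out the access_token field with a one-line python3 JSON parse.
PRINCIPAL_TOKEN=$(printf '%s' "$RESPONSE" | python3 -c 'import sys, json; print(json.load(sys.stdin)["access_token"])')
export PRINCIPAL_TOKEN

echo "$PRINCIPAL_TOKEN"
```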
Set the contents of the `access_token` field as the `PRINCIPAL_TOKEN` variable. Then use curl to invoke the createCatalog API:
```shell
export PRINCIPAL_TOKEN=ver:1-hint:1036-ETMsDgAAAY/GPANareallyverylongstringthatissecret

curl -i -X POST -H "Authorization: Bearer $PRINCIPAL_TOKEN" -H 'Accept: application/json' -H 'Content-Type: application/json' \
  http://${POLARIS_HOST:-localhost}:8181/api/management/v1/catalogs \
  -d '{"name": "polaris", "id": 100, "type": "INTERNAL", "readOnly": false, "storageConfigInfo": {"storageType": "FILE"}, "properties": {"default-base-location": "file:///tmp/polaris"}}'
```
This creates a catalog called `polaris`. From here, you can use Spark to create namespaces, tables, etc.
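To open a spark-sql shell that talks to this catalog, you can point Spark's Iceberg REST catalog support at the Polaris endpoint. A configuration sketch, not a definitive invocation: the Iceberg runtime coordinates and Spark/Scala versions are assumptions to match to your environment, and `<principalClientId>:<mainSecret>` are the root principal credentials from above:

```shell
# Versions below are assumptions; match them to your Spark/Scala install.
spark-sql \
  --packages org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.5.0 \
  --conf spark.sql.catalog.polaris=org.apache.iceberg.spark.SparkCatalog \
  --conf spark.sql.catalog.polaris.type=rest \
  --conf spark.sql.catalog.polaris.uri=http://localhost:8181/api/catalog \
  --conf spark.sql.catalog.polaris.credential='<principalClientId>:<mainSecret>' \
  --conf spark.sql.catalog.polaris.scope=PRINCIPAL_ROLE:ALL \
  --conf spark.sql.catalog.polaris.warehouse=polaris
```

The catalog name after `spark.sql.catalog.` ("polaris" here) is what you reference from SQL.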
You must run the following as the first query in your spark-sql shell to actually use Polaris:

```sql
use polaris;
```
Apache Iceberg, Iceberg, Apache Spark, Spark, Apache Flink, Flink, Apache Doris, Doris, Apache, the Apache feather logo, the Apache Iceberg project logo, the Apache Spark project logo, the Apache Flink project logo, and the Apache Doris project logo are either registered trademarks or trademarks of The Apache Software Foundation. Copyright © 2024 The Apache Software Foundation.