diff --git a/assets/js/search-data.json b/assets/js/search-data.json
index 8ef3bc29d..b6916fcd7 100644
--- a/assets/js/search-data.json
+++ b/assets/js/search-data.json
@@ -3151,7 +3151,7 @@
 },"450": {
   "doc": "Unity Delta Sharing",
   "title": "Setup",
-  "content": "This guide explains how to set up and use lakeFS Delta Sharing. Currently, you will have to configure lakeFS Delta Sharing in collaboration with Treeverse Customer Success. Once set up is complete, you will of course be able to use lakeFS Delta Sharing on existing and on new tables without further assistance. 1. Collect data and initial setup . | Select a Delta Sharing configuration URL. This is a single location on lakeFS to hold the top-level configuration of lakeFS Delta Sharing across all repositories of your organization. Typically, it will have the form lakefs://REPO/main/lakefs_delta_sharing.yaml for one of your repositories. A longer path may be supplied - however, we do recommend keeping it on the main branch, as this object represents state for the entire installation. | Create a user lakefs-delta-sharing-service for lakeFS Delta Sharing, and an access key for that user. It should have at least read permissions for the configuration URL and for all repositories and all data accesses by Unity. lakeFS Delta Sharing will these credentials to communicate with lakeFS. | . Communicate these items to Customer Success: . | Configuration URL | Access key ID and secret access key for user lakefs-delta-sharing-service. | . 2. Initial configuration . Select a secret authorization token to share Unity catalog. Unity catalog will use to authenticate to the lakeFS Delta Sharing server. You might use this command on Linux: . head -c 50 /dev/random | base64 . Create a file lakefs_delta_sharing.yaml and place it at the config URL selected above. It should look like this: . authorization_token: \"GENERATED TOKEN\" # Map lakeFS repositories to Unity shares repositories: - id: sample-repo share_name: undev # List branches and prefixes to export. Each of these branches (and only # these branches) will be available as a schema on Unity. branches: - main - staging - dev_* - id: repo2 share_name: share_two branches: - \"*\" . Note that a plain “*” line must be quoted in YAML. Upload it to your config URL. For instance if the config URL is lakefs://repo/main/lakefs_delta_sharing.yaml, you might use: . lakectl fs upload -s ./lakefs_delta_sharing.yaml lakefs://repo/main/lakefs_delta_sharing.yaml . 3. Connect Unity to lakeFS Delta Sharing! . You now need to configure Unity to use the lakeFS Delta Sharing server. Create a share provider file config.share.json; see the Delta Sharing manual: . { \"shareCredentialsVersion\": 1, \"endpoint\": \"https://ORG_ID.REGION.lakefscloud.io/service/delta-sharing/v1\", \"bearerToken\": \"GENERATED TOKEN\", \"expirationTime\": \"2030-01-01T00:00:00.0Z\" } . “GENERATED TOKEN” is the secret authorization token use above. Install the databricks cli. We will use it to create Delta Share on Unity. Follow the instructions to configure it. Run the provider creation command: . databricks unity-catalog providers create \\ --name lakefs-cloud \\ --recipient-profile-json-file config.share.json . Go to “Data » Delta Sharing” on the DataBricks environment. Once Treeverse have configured lakeFS Delta Sharing on your account with your config URL, the “lakefs-cloud” provider should appear under “Shared with me”. Click the provider to see its shares. You can now create a catalog from these shares. 
And you can see schemas for each of the branches that you configured in the share. Here branch name dev_experiment1 matches the pattern dev_* that we defined in the configuration object lakefs-delta-sharing.yaml, so it appears as a schema. At this point you have configured Delta Sharing on lakeFS, and DataBricks to communicate with lakeFS delta sharing. No further Treeverse involvement is required. Updates to lakefs_delta_sharing.yaml will update within a minute of uploading a new version. 4. Configure tables . Everything is ready: lakeFS repositories are configured as shares, and branches are configured as schemas. Now you can define tables! Once a repository is shared, its tables are configured as a table descriptor object on the repository on the path _lakefs_tables/TABLE.yaml. Delta Lake tables . Delta Lake format includes full metadata, so you only need to configure the prefix: . name: users type: delta path: path/to/users/ . When placed inside _lakefs_tables/users.yaml this defines a table users on the prefix path/to/users/ (so path/to/users/ holds the prefix _delta_log). Hive tables . Hive metadata server tables are essentially just a set of objects that share a prefix, with no table metadata stored on the object store. You need to configure prefix, partitions, and schema. name: clicks type: hive path: path/to/clicks/ partition_columns: ['year'] schema: type: struct fields: - name: year type: integer nullable: false metadata: {} - name: page type: string nullable: false metadata: {} - name: site type: string nullable: true metadata: comment: a comment about this column . Useful types recognized by DataBricks Photon include integer, long, short, string, double, float, date, and timestamp. For exact type mappings, and whether to specify a field as nullable: false, refer to DataBricks Photon documentation. ",
+  "content": "This guide explains how to set up and use lakeFS Delta Sharing. Currently, you will have to configure lakeFS Delta Sharing in collaboration with Treeverse Customer Success. Once setup is complete, you will of course be able to use lakeFS Delta Sharing on existing and new tables without further assistance. 1. Collect data and initial setup . | Select a Delta Sharing configuration URL. This is a single location on lakeFS to hold the top-level configuration of lakeFS Delta Sharing across all repositories of your organization. Typically, it will have the form lakefs://REPO/main/lakefs_delta_sharing.yaml for one of your repositories. A longer path may be supplied - however, we do recommend keeping it on the main branch, as this object represents state for the entire installation. | Create a user lakefs-delta-sharing-service for lakeFS Delta Sharing, and an access key for that user. It should have at least read permissions for the configuration URL and for all repositories and all data accessed by Unity. lakeFS Delta Sharing will use these credentials to communicate with lakeFS. | . Communicate these items to Customer Success: . | Configuration URL | Access key ID and secret access key for user lakefs-delta-sharing-service. | . Note: All YAML file extensions used in this guide must be yaml. Do not use a yml extension instead. 2. Initial configuration . Select a secret authorization token to share with Unity catalog. Unity catalog will use it to authenticate to the lakeFS Delta Sharing server. You might use this command on Linux: . head -c 50 /dev/random | base64 . Create a file lakefs_delta_sharing.yaml and place it at the config URL selected above. It should look like this: . 
authorization_token: \"GENERATED TOKEN\" # Map lakeFS repositories to Unity shares repositories: - id: sample-repo share_name: undev # List branches and prefixes to export. Each of these branches (and only # these branches) will be available as a schema on Unity. branches: - main - staging - dev_* - id: repo2 share_name: share_two branches: - \"*\" . Note that a plain “*” line must be quoted in YAML. Upload it to your config URL. For instance, if the config URL is lakefs://repo/main/lakefs_delta_sharing.yaml, you might use: . lakectl fs upload -s ./lakefs_delta_sharing.yaml lakefs://repo/main/lakefs_delta_sharing.yaml . 3. Connect Unity to lakeFS Delta Sharing! . You now need to configure Unity to use the lakeFS Delta Sharing server. Create a share provider file config.share.json; see the Delta Sharing manual: . { \"shareCredentialsVersion\": 1, \"endpoint\": \"https://ORG_ID.REGION.lakefscloud.io/service/delta-sharing/v1\", \"bearerToken\": \"GENERATED TOKEN\", \"expirationTime\": \"2030-01-01T00:00:00.0Z\" } . “GENERATED TOKEN” is the secret authorization token generated above. Install the databricks cli. We will use it to create the Delta Share on Unity. Follow the instructions to configure it. Run the provider creation command: . databricks unity-catalog providers create \\ --name lakefs-cloud \\ --recipient-profile-json-file config.share.json . Go to “Data » Delta Sharing” in the DataBricks environment. Once Treeverse has configured lakeFS Delta Sharing on your account with your config URL, the “lakefs-cloud” provider should appear under “Shared with me”. Click the provider to see its shares. You can now create a catalog from these shares. And you can see schemas for each of the branches that you configured in the share. Here the branch name dev_experiment1 matches the pattern dev_* that we defined in the configuration object lakefs_delta_sharing.yaml, so it appears as a schema. At this point you have configured Delta Sharing on lakeFS, and DataBricks to communicate with lakeFS Delta Sharing. No further Treeverse involvement is required. Updates to lakefs_delta_sharing.yaml take effect within a minute of uploading a new version. 4. Configure tables . Everything is ready: lakeFS repositories are configured as shares, and branches are configured as schemas. Now you can define tables! Once a repository is shared, its tables are configured as a table descriptor object on the repository at the path _lakefs_tables/TABLE.yaml. Delta Lake tables . Delta Lake format includes full metadata, so you only need to configure the prefix: . name: users type: delta path: path/to/users/ . Note: The filename of the ‘yaml’ file containing the table definition must match the ‘name’ of the table itself. In the example above, ‘_lakefs_tables/users.yaml’. When placed inside _lakefs_tables/users.yaml this defines a table users on the prefix path/to/users/ (so path/to/users/ holds the prefix _delta_log). Hive tables . Hive metadata server tables are essentially just a set of objects that share a prefix, with no table metadata stored on the object store. You need to configure prefix, partitions, and schema. name: clicks type: hive path: path/to/clicks/ partition_columns: ['year'] schema: type: struct fields: - name: year type: integer nullable: false metadata: {} - name: page type: string nullable: false metadata: {} - name: site type: string nullable: true metadata: comment: a comment about this column . Useful types recognized by DataBricks Photon include integer, long, short, string, double, float, date, and timestamp. 
For exact type mappings, and whether to specify a field as nullable: false, refer to DataBricks Photon documentation. ",
    "url": "/howto/unity-delta-sharing.html#setup",
    "relUrl": "/howto/unity-delta-sharing.html#setup"
diff --git a/howto/unity-delta-sharing.html b/howto/unity-delta-sharing.html
index ecb972b35..f549d91da 100644
--- a/howto/unity-delta-sharing.html
+++ b/howto/unity-delta-sharing.html
@@ -567,6 +567,8 @@

  • Configuration URL
  • Access key ID and secret access key for user lakefs-delta-sharing-service.
  •
+
+    Note: All YAML file extensions used in this guide must be yaml. Do not use a yml extension instead.
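
    For reference, the service user and its access key mentioned in the list above can also be created from the command line. A minimal sketch, assuming lakectl is configured with administrator credentials; granting the actual read permissions (for example via a suitable read-only policy or group) is not shown here:

      # Create the dedicated service user, then generate an access key for it.
      lakectl auth users create --id lakefs-delta-sharing-service
      lakectl auth users credentials create --id lakefs-delta-sharing-service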

@@ -684,6 +686,8 @@

    path: path/to/users/
+
+    Note: The filename of the ‘yaml’ file containing the table definition must match the ‘name’ of the table itself. In the example above, ‘_lakefs_tables/users.yaml’.

    When placed inside _lakefs_tables/users.yaml this defines a table users on the prefix path/to/users/ (so path/to/users/ holds the prefix _delta_log).
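
    As a final sanity check, the Delta Sharing protocol exposes a list-shares endpoint. Querying it with the endpoint and bearer token from config.share.json above should list the shares configured in lakefs_delta_sharing.yaml; a sketch, where ORG_ID, REGION, and GENERATED_TOKEN stand for the placeholders used throughout this guide:

      # Expects a JSON response listing the configured shares, e.g. "undev" and "share_two".
      curl -H "Authorization: Bearer GENERATED_TOKEN" \
        "https://ORG_ID.REGION.lakefscloud.io/service/delta-sharing/v1/shares"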