# Create fabricspark-setup.md #5038

---
title: "Microsoft Fabric Spark setup"
description: "Read this guide to learn about the Microsoft Fabric Spark lakehouse setup in dbt."
id: "fabricspark-setup"
meta:
  maintained_by: Microsoft
  authors: 'Pradeep Srikakolapu'
  github_repo: 'microsoft/dbt-fabricspark'
  pypi_package: 'dbt-fabricspark'
  min_core_version: 'v1.7.0'
  cloud_support: Not Supported
  min_supported_version: 'n/a'
  slack_channel_name: 'db-fabric-synapse'
  slack_channel_link: 'https://app.slack.com/client/T0VLPD22H/C01DRQ178LQ'
  platform_name: 'Fabric Spark'
  config_page: '/reference/resource-configs/fabricspark-configs'
---


<Snippet path="warehouse-setups-cloud-callout" />
<Snippet path="dbt-fabricspark-for-fabricspark" />
> **Reviewer comment:** hey @prdpsvs - it doesn't look like there's a file for this snippet. Is this something you're planning on adding? The deploy is failing because it can't find `fabricspark-for-fabricspark.md`.


import SetUpPages from '/snippets/_setup-pages-intro.md';

<SetUpPages meta={frontMatter.meta} />

```zsh
# livy connections
$ python -m pip install "dbt-fabricspark"
```
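
To confirm the adapter is registered after installation, `dbt --version` lists the installed plugins alongside the dbt Core version (the exact version numbers will vary by environment):

```zsh
# list dbt Core and installed adapter plugin versions
$ dbt --version
```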

<h2> Configuring {frontMatter.meta.pypi_package} </h2>

<p>For {frontMatter.meta.platform_name}-specific configuration, please refer to <a href={frontMatter.meta.config_page}>{frontMatter.meta.platform_name} Configuration</a>.</p>

<p>For further info, refer to the GitHub repository: <a href={`https://github.com/${frontMatter.meta.github_repo}`}>{frontMatter.meta.github_repo}</a></p>

## Connection Methods

dbt-fabricspark can connect to Spark clusters using the `livy` connection method:

- [`livy`](#livy) is the supported method for connecting to the Microsoft Fabric Data Engineering experience. It supports connecting to a job cluster.

### LIVY

Use the `livy` connection method to connect to the Microsoft Fabric Data Engineering experience.

<File name='~/.dbt/profiles.yml'>

```yaml
fabric-spark-profile:
  target: fabricspark-dev
  outputs:
    fabricspark-dev:
      authentication: CLI
      method: livy
      endpoint: https://api.fabric.microsoft.com/v1
      workspaceid: [workspace id]
      lakehouseid: [lakehouse id]
      lakehouse: [lakehouse name]
      schema: [lakehouse name]
      threads: 1
      type: fabricspark
      livy_session_parameter:
        "spark.driver.memory": "4g"
```

</File>
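
Once the profile is saved, a quick way to validate the Livy connection is `dbt debug`, which runs dbt's built-in connection test against the target. This sketch reuses the profile and target names from the example above and assumes it is run from inside a dbt project:

```zsh
# test the connection defined in ~/.dbt/profiles.yml
$ dbt debug --profile fabric-spark-profile --target fabricspark-dev
```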


## Optional configurations

### Retries

Intermittent errors can crop up unexpectedly while running queries against Microsoft Fabric Spark. If `retry_all` is enabled, dbt-fabricspark will naively retry any query that fails, based on the configuration supplied by `connect_timeout` and `connect_retries`. It does not attempt to determine whether the query failure was transient or likely to succeed on retry. This configuration is recommended in production environments, where queries ought to succeed.

For instance, the following instructs dbt to retry all failed queries up to 3 times, with a 5-second delay between each retry:

<File name='~/.dbt/profiles.yml'>

```yaml
retry_all: true
connect_timeout: 5
connect_retries: 3
```

</File>
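
These options sit alongside the other connection settings of the target in `profiles.yml`. A minimal sketch, assuming the same `fabricspark-dev` target as the Livy example above:

```yaml
fabric-spark-profile:
  outputs:
    fabricspark-dev:
      type: fabricspark
      method: livy
      # naive retry: rerun any failed query up to 3 times,
      # waiting 5 seconds between attempts
      retry_all: true
      connect_timeout: 5
      connect_retries: 3
      # ...remaining connection settings as shown above
```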



<VersionBlock firstVersion="1.7">

### Server-side configuration (Livy session parameters)

Fabric Spark can be customized using [Application Properties](https://spark.apache.org/docs/latest/configuration.html). These properties let you customize session execution, for example by allocating more memory to the driver process, and configure the Spark SQL runtime, for example to [set Spark SQL catalogs](https://spark.apache.org/docs/latest/configuration.html#spark-sql).
</VersionBlock>
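
For example, here is a sketch of passing application properties through the `livy_session_parameter` mapping shown in the profile above (the property values are illustrative, not recommendations):

```yaml
livy_session_parameter:
  # give the driver and executors more memory (illustrative values)
  "spark.driver.memory": "8g"
  "spark.executor.memory": "8g"
  # tune shuffle parallelism for the workload
  "spark.sql.shuffle.partitions": "200"
```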

## Caveats

### Supported Functionality

Delta-only features:
1. Incremental model updates by `unique_key` instead of `partition_by` (see the [`merge` strategy](/reference/resource-configs/spark-configs#the-merge-strategy) and the sketch after this list)
2. [Snapshots](/docs/build/snapshots)
3. [Persisting](/reference/resource-configs/persist_docs) column-level descriptions as database comments
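
A minimal sketch of the first item above, setting the `merge` strategy and a `unique_key` from `dbt_project.yml` (the project and folder names are hypothetical):

```yaml
# dbt_project.yml (hypothetical project layout)
models:
  my_fabric_project:
    marts:
      +materialized: incremental
      +incremental_strategy: merge
      +unique_key: id
```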