Add watsonx spark config and setup files #6895

Merged


@bynist bynist commented Feb 10, 2025

What are you changing in this pull request and why?

Adding IBM Watsonx Spark config and setup files

Checklist

  • I have reviewed the Content style guide so my content adheres to these guidelines.
  • The topic I'm writing about is for specific dbt version(s) and I have versioned it according to the version a whole page and/or version a block of content guidelines.
  • I have added checklist item(s) to this list for anything that needs to happen before this PR is merged, such as "needs technical review" or "change base branch."
  • The content in this PR requires a dbt release note, so I added one to the release notes page.

@bynist bynist requested a review from a team as a code owner February 10, 2025 17:44

welcome bot commented Feb 10, 2025

Hello!👋 Thanks for contributing to the dbt product documentation and opening this pull request! ✨
We use Markdown and some HTML to write the dbt product documentation. When writing content, you can use our style guide and content types to understand our writing standards and how we organize information in the dbt product docs.
We'll review your contribution and respond as soon as we can. 😄

vercel bot commented Feb 10, 2025

@ReemaAlzaid is attempting to deploy a commit to the dbt-labs Team on Vercel.

A member of the Team first needs to authorize it.

@runleonarun runleonarun added the new contributor Label for first-time contributors label Feb 10, 2025
@github-actions github-actions bot added content Improvements or additions to content size: large This change will more than a week to address and might require more than one person labels Feb 10, 2025
- [watsonx.data SaaS Catalog](https://cloud.ibm.com/docs/watsonxdata?topic=watsonxdata-reg_database)
- [watsonx.data Software Catalog](https://www.ibm.com/docs/en/watsonx/watsonxdata/2.1.x?topic=components-adding-data-source)

### Extra configuration

Change Extra > Additional


## Installing dbt-watsonx-spark

Since dbt v1.8, installing an adapter no longer installs `dbt-core` automatically.

@DivyaLokesh DivyaLokesh Feb 12, 2025

Combine line 33 and 34 as follows:
Use the following command to install the adapter:
Note: From dbt v1.8, installing an adapter no longer installs 'dbt-core' automatically. This is because adapters and dbt Core versions are decoupled to avoid overwriting dbt-core installations.


## Configuring `dbt-watsonx-spark`
For IBM watsonx.data-specific configuration, please refer to [IBM watsonx.data configs.](https://www.ibm.com/docs/en/watsonx/watsonxdata/2.1.x?topic=spark-configuration-setting-up-your-profile)

remove "please"


## Connecting to IBM watsonx.data Spark

To connect dbt with watsonx.data Spark, you need to configure a profile in your `profiles.yml` file located in the `.dbt/` directory of your home folder. The following is an example configuration for connecting to IBM watsonx.data SaaS and Software instances:

Change, "you need to configure" to "configure a profile in your..."


## Host parameters

The following profile fields are required to configure watsonx.data Spark connections. For IBM watsonx.data SaaS or Software instances, you can get the `profile` details by clicking **View connect details** after `the query server` is in RUNNING stat, The Connection details page opens with the profile configuration.

@DivyaLokesh DivyaLokesh Feb 12, 2025

To get the 'profile' details, click 'View connect details' when the 'query server' is in RUNNING status in watsonx.data (both SaaS and Software). The Connection details page opens...

Remove , after stat.

## Host parameters

The following profile fields are required to configure watsonx.data Spark connections. For IBM watsonx.data SaaS or Software instances, you can get the `profile` details by clicking **View connect details** after `the query server` is in RUNNING stat, The Connection details page opens with the profile configuration.
Copy the connection details. Then Paste the connection details in the profiles.yml file that is located in .dbt of your home directory

Copy and paste the connection details in the profiles.yml...

| `catalog` | Required | The catalog that is associated with the Spark engine. | `my_catalog` |
| `use_ssl` | Optional (default: **false**) | Specifies whether to use SSL. | `true` or `false` |
| `instance` | Required | For **SaaS** set it as CRN of watsonx.data. As for **Software**, set it as instance ID of watsonx.data| `1726574045872688`|
| `user` | Required | Your watsonx.data username | `[email protected]`|

@DivyaLokesh DivyaLokesh Feb 12, 2025

Authentication (username and password) differs between SaaS and Software, and there are some variations. Make sure to give all those details.

Also, change "Your watsonx.data username" > "Username for the watsonx.data instance"


### Schemas and catalogs

When selecting the catalog, ensure the user has read and write access. This selection does not limit your ability to query into the schema specified/created but also serves as the default location for materialized `tables`, `views`, and `incremental`.

Did you mean to say "This selection does not limit your ability to query into the schema specified/created by other users?"

Author

exactly

| ---------- | ----------------------------- | ------------------------------------------------------------------------- | ----------------- |
| `method` | Required | Specifies the connection method to the spark query server. Use `http`. | `http` |
| `schema` | Required| To choose an existing schema within spark engine or create a new schema. | `spark_schema` |
| `host` | Required | Hostname of your watsonx.data console. For more information, see [Getting connection information](https://www.ibm.com/docs/en/watsonx/watsonxdata/2.1.x?topic=references-getting-connection-information#connection_info__conn_info_).| `https://dataplatform.cloud.ibm.com` |

your watsonx.data > the watsonx.data
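
For reference, the profile fields quoted in these hunks might fit together roughly as follows. This is only a sketch: the profile name `my_project`, the target name `dev`, the `type` value, and the flat key layout are assumptions that the quoted hunks do not confirm, and the values are the examples from the tables. The HTTPS host is paired with `use_ssl: true`, following the SSL verification hunk.

```yaml
# Sketch only: names and layout below are illustrative assumptions.
my_project:                        # hypothetical profile name
  target: dev                      # hypothetical target name
  outputs:
    dev:
      type: watsonx_spark          # assumed adapter type name
      method: http                 # per the table: use `http`
      schema: spark_schema         # existing schema, or one to create
      host: https://dataplatform.cloud.ibm.com
      catalog: my_catalog          # the user needs read and write access
      use_ssl: true                # host is HTTPS, so SSL is enabled
      instance: "1726574045872688" # Software instance ID; use the CRN for SaaS
      user: "<username>"           # placeholder for the instance username
      # Credential fields (password or API key) do not appear in the
      # quoted hunks, so they are intentionally omitted here.
```

The connection details page mentioned in the Host parameters hunk is the authoritative source for the real values and key names.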

### SSL verification

- If the Spark instance uses an unsecured HTTP connection, set `use_ssl` to `false`.
- If the instance uses `HTTPS`, this parameter should be set to `true`.

Change "this parameter should be set to 'true'" > "set it to 'true'"


## Additional parameters

The following profile fields are optional to set up. They let you configure your instance session and dbt for your connection.

The following profile fields are optional. You can configure the instance session and dbt for the connection.

Author

@DivyaLokesh I’ve addressed all the requested changes. Please let me know if there’s anything else.

@github-actions github-actions bot added size: medium This change will take up to a week to address and removed size: large This change will more than a week to address and might require more than one person labels Feb 12, 2025
@bynist bynist requested a review from DivyaLokesh February 13, 2025 08:41

vercel bot commented Feb 18, 2025

The latest updates on your projects. Learn more about Vercel for Git ↗︎

| Name | Status | Preview | Updated (UTC) |
| --- | --- | --- | --- |
| docs-getdbt-com | ✅ Ready (Inspect) | Visit Preview | Feb 18, 2025 9:31pm |

@mirnawong1 mirnawong1 disabled auto-merge February 20, 2025 18:47
@mirnawong1 mirnawong1 merged commit f68119e into dbt-labs:current Feb 20, 2025
3 of 5 checks passed
Labels
content Improvements or additions to content new contributor Label for first-time contributors size: medium This change will take up to a week to address
6 participants