AMP-110505 [DWH] Update Snowflake/DataBricks import doc for mutation setup #294

Merged 12 commits on Oct 4, 2024
15 changes: 14 additions & 1 deletion in content/collections/source-catalog/en/databricks.md
@@ -38,6 +38,14 @@ For guided instructions for setting up this integration, view the [Loom video](ht
- [materialized views](https://docs.databricks.com/en/views/materialized.html)
- [streaming tables](https://docs.databricks.com/en/delta-live-tables/index.html#streaming-table)

- SQL input restrictions for the Continuous Sync change data feed type (see the example query after this list):
- Only one source Delta Table (referred to as “main table”)
- Single SELECT statement
  - Common Table Expressions (CTEs), for example a `WITH` clause, aren't supported
  - Set operations like `UNION`, `INTERSECT`, `MINUS`, and `EXCEPT` aren't supported
  - Statements with a `JOIN` clause use mutation metadata from the main table and ignore the mutation history of the joined table. Amplitude uses the latest version of the data in the joined table during synchronization
  - Explicit SQL validation may not cover all edge cases. For example, if you provide more than one source table, validation may succeed during source creation but fail during import execution
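
For illustration, the following query shape satisfies these restrictions: a single `SELECT` over one main Delta Table, with no CTEs or set operations. This is a sketch only; the table and column names are hypothetical.

```sql
-- Hypothetical example: `orders` is the single main Delta Table whose
-- mutation metadata drives Continuous Sync; `customers` is a joined table,
-- so Amplitude uses its latest snapshot and ignores its mutation history.
SELECT
  o.customer_id,
  o.order_status,
  o.updated_at,
  c.plan_type
FROM orders o                       -- main table: change data feed tracked
LEFT JOIN customers c               -- joined table: latest version used
  ON o.customer_id = c.customer_id
```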

## Configure Databricks

Before you start to configure the Databricks source in Amplitude, complete the following tasks in Databricks.
@@ -148,7 +156,12 @@ To add Databricks as a source in Amplitude, complete the following steps.

For the `Event` data type, optionally select *Sync User Properties* or *Sync Group Properties* to sync the corresponding properties **within** an event.

2. Configure the SQL command that transforms data in Databricks before Amplitude imports it.
2. If you selected Event or Profiles as the data type, choose the change data feed type:

   - **Ingestion Only**: Ingest data warehouse data with Amplitude's standard enrichment services, like ID resolution, property and attribution syncing, and location resolution.
   - **Continuous Sync**: Directly mirror the data in Databricks with insert, update, and delete operations. This deactivates Amplitude's enrichment services to remain in sync with your source of truth.

3. Configure the SQL command that transforms data in Databricks before Amplitude imports it.
   - Amplitude treats each record in the SQL execution output as an event to be imported. See the example body in the [Batch Event Upload API](/docs/apis/analytics/batch-event-upload) documentation to ensure each record you import complies with that format. A sketch of such a query follows this list.
   - Amplitude can transform and import data from only the tables you specify in step 1 above.
- For example, if you have access to tables `A`, `B` and `C` but only selected `A` in step 1, then you can only import data from `A`.
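
For illustration, a transform query might alias source columns to the event fields the Batch Event Upload API expects, such as `user_id`, `event_type`, `time`, and `event_properties`. This is a sketch only; the source column names are hypothetical, and `A` refers to the example table above.

```sql
-- Hypothetical example: reshape rows of table `A` (selected in step 1)
-- into records that match the Batch Event Upload event format.
SELECT
  customer_id                      AS user_id,
  action_name                      AS event_type,
  action_timestamp                 AS time,
  to_json(struct(page, referrer))  AS event_properties
FROM A
```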