diff --git a/content/collections/source-catalog/en/databricks.md b/content/collections/source-catalog/en/databricks.md
index d586f3d42..2ffe2bd76 100644
--- a/content/collections/source-catalog/en/databricks.md
+++ b/content/collections/source-catalog/en/databricks.md
@@ -38,9 +38,17 @@ For guided instructions to setting up this integration, view the [Loom video](ht
- [materialized views](https://docs.databricks.com/en/views/materialized.html)
- [streaming tables](https://docs.databricks.com/en/delta-live-tables/index.html#streaming-table)
+- SQL input restrictions for the Continuous Sync change data feed type (see the example after this list):
+ - Only one source Delta Table (referred to as “main table”)
+ - Single SELECT statement
+ - Common Table Expressions (CTEs), for example a `WITH` clause, aren't supported
+ - Set operations like `UNION`, `INTERSECT`, `MINUS`, and `EXCEPT` aren't supported
+ - Statements with a `JOIN` clause use mutation metadata from the main table, ignoring the mutation history of the joined table. Amplitude uses the latest version of data in the joined table during data synchronization.
+ - Explicit SQL validation may not cover all edge cases. For example, if you provide more than one source table, validation may succeed during source creation, but fail during import execution.
+
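+A minimal sketch of a query that satisfies these restrictions, using a hypothetical `events_raw` main table and a `users_dim` lookup table (names and columns are placeholders, not part of the product):
+
+```sql
+-- Single SELECT from one main Delta table; no CTEs or set operations.
+-- The JOIN is allowed, but mutation tracking follows events_raw only;
+-- Amplitude reads the latest version of users_dim at sync time.
+SELECT
+  e.user_id,
+  e.event_name AS event_type,
+  e.event_time AS time,
+  u.plan_type  AS user_plan
+FROM main.analytics.events_raw AS e
+LEFT JOIN main.analytics.users_dim AS u
+  ON e.user_id = u.user_id;
+```
+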
## Configure Databricks
-Before you start to configure the Databricks source in Amplitude, complete the following tasks in Databricks.
+Before you start to configure the Databricks source in Amplitude, complete these tasks in Databricks:
### Find or create an all-purpose compute cluster
@@ -148,7 +156,12 @@ To add Databricks as a source in Amplitude, complete the following steps.
For the `Event` data type, optionally select *Sync User Properties* or *Sync Group Properties* to sync the corresponding properties **within** an event.
-2. Configure the SQL command that transforms data in Databricks before Amplitude imports it.
+2. If you selected `Event` or `Profiles` as the data type, choose the change data feed type:
+
+- **Ingestion Only**: Ingest data warehouse data with Amplitude's standard enrichment services like ID resolution, property and attribution syncing, and location resolution.
+- **Continuous Sync**: Directly mirror the data in Databricks with insert, update, and delete operations. This deactivates Amplitude's enrichment services to remain in sync with your source of truth.
+
+3. Configure the SQL command that transforms data in Databricks before Amplitude imports it.
- Amplitude treats each record in the SQL execution output as an event to import. See the Example body in the [Batch Event Upload API](/docs/apis/analytics/batch-event-upload) documentation to ensure each record you import complies. A sketch of such a query appears after this list.
- Amplitude can transform / import from only the tables you specify in step 1 above.
- For example, if you have access to tables `A`, `B` and `C` but only selected `A` in step 1, then you can only import data from `A`.
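+
+As an illustration only, a transform query might alias columns from a selected table to the event fields described in the Batch Event Upload API. The table and column names below are placeholders:
+
+```sql
+-- Illustrative sketch; adjust names to your schema and selected tables.
+SELECT
+  user_id,
+  event_name AS event_type,
+  event_time AS time
+FROM main.analytics.A;  -- A must be one of the tables selected in step 1
+```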
diff --git a/content/collections/source-catalog/en/snowflake.md b/content/collections/source-catalog/en/snowflake.md
index 51394b6a4..d7053512e 100644
--- a/content/collections/source-catalog/en/snowflake.md
+++ b/content/collections/source-catalog/en/snowflake.md
@@ -23,120 +23,146 @@ exclude_from_sitemap: false
updated_by: 5817a4fa-a771-417a-aa94-a0b1e7f55eae
updated_at: 1726777780
---
-With Amplitude's Snowflake integration, you can ingest Snowflake data directly into your Amplitude project. This article walks you through the steps needed to make that happen.
+With Amplitude's Snowflake integration, you can ingest Snowflake data directly into your Amplitude project. The integration supports four strategies to import your Snowflake data, depending on the data types you select.
+{{partial:admonition type="note" heading="Amplitude regional IP addresses"}}
+Depending on your company's network policy, you may need to add these IP addresses to your allowlist so that Amplitude's servers can access your Snowflake instance (one possible approach appears after this note):
-## Considerations
+| Region | IP Addresses |
+| ------ | ----------------------------------------------- |
+| US | `52.33.3.219`, `35.162.216.242`, `52.27.10.221` |
+| EU | `3.124.22.25`, `18.157.59.125`, `18.192.47.195` |
-- Depending on your company's network policy, you may need add these IP addresses to your allowlist in order for Amplitude's servers to access your Snowflake instance:
-
- - Amplitude US IP addresses:
- - 52.33.3.219
- - 35.162.216.242
- - 52.27.10.221
- - Amplitude EU IP addresses:
- - 3.124.22.25
- - 18.157.59.125
- - 18.192.47.195
+{{/partial:admonition}}
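+
+One possible way to allowlist these addresses is a Snowflake network policy. This is a sketch only; the policy name is hypothetical, and you must also include your organization's existing allowed ranges before applying a policy, or you can lock other clients out of the account:
+
+```sql
+-- Sketch: allowlist Amplitude's US addresses (use the EU addresses for EU projects).
+-- Add your organization's own IP ranges to this list before applying the policy.
+CREATE NETWORK POLICY amplitude_access
+  ALLOWED_IP_LIST = ('52.33.3.219', '35.162.216.242', '52.27.10.221');
+```
+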
## Limits
- Maximum running time for a single Snowflake SQL query is 12 hours.
+
{{partial:admonition type="warning" title="User and Group properties sync"}}
Amplitude's Data Warehouse Import sometimes processes events in parallel, so time-ordered syncing of user and group properties on events isn't guaranteed in the same way as submitting events directly to the Identify and Group Identify APIs.
{{/partial:admonition}}
-## Modeling methods
+## Add and configure the Snowflake source
+
+Complete the following steps to configure the Snowflake source:
+
+1. [Set up and verify the connection](#set-up-and-verify-the-connection)
+2. [Select the data type](#select-the-data-type)
+3. [Select the import strategy](#select-the-import-strategy)
+4. [Map your data](#map-your-data)
+5. [Schedule your sync](#schedule-your-sync)
+
+### Set up and verify the connection
+
+To add Snowflake as a data source in your Amplitude project, follow these steps:
+
+1. In Amplitude Data, navigate to *Catalog → Sources*.
+2. In the Warehouse Sources section, click *Snowflake*.
+3. Enter the required credentials for the Snowflake instance you want to connect:
+
+ - **Account**: Snowflake account name. Case sensitive. This is the first part of your Snowflake URL, before `snowflakecomputing.com`. Don't include ".snowflakecomputing.com" in your account name.
+ - **Database**: Name of the database where Amplitude can find the data.
+ - **Warehouse**: Used by Amplitude to execute SQL.
+ - **Username**: Used by Amplitude for authentication.
+ - **Password**: Used by Amplitude for authentication.
+
+ Amplitude offers password-based and key pair authentication for Snowflake.
-Amplitude's Snowflake Data Import supports two methods for importing data from Snowflake, Change Data Capture and Custom SQL Query.
+ - If you want to use password authentication, select *Password* and enter your password in the *Password* field.
+ - If you want to use key pair authentication, select *Key pair* and then click *Generate Key*. Then provide the organization and account names in the format `ORGNAME-ACCOUNTNAME`.
-| | Change Data Capture | Custom SQL Query |
-| ----------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------- |
-| Import data types | Event, User property, Group Property | Event, User property, Group Property |
-| Import strategy | Change-based | Time-based, Full Sync (only for group and user properties) |
-| When to use | Recommended for most use cases, user-friendly, minimal SQL knowledge required. Limited data source selection functionality, consider creating Snowflake VIEW (see Prerequisites for details). | Use when data selection requires customization, even though it may lead to data discrepancies and higher costs if misconfigured |
+4. Copy the autogenerated SQL query and run it in Snowflake to give Amplitude the proper permissions. For a sense of what this script typically grants, see the sketch after these steps.
-### Change Data Capture
+5. After running the query, click *Next* to test the connection.
-Change Data Capture identifies and captures changes made to data in a database and delivers those changes in real time to a downstream process or system.
+6. After the test succeeds, click *Next* again to move on to the data selection stage.
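+
+The autogenerated query is specific to your configuration. As a rough, hypothetical sketch of what such a grant script can include (all object names below are placeholders):
+
+```sql
+-- Hypothetical sketch only; run the query Amplitude generates for you.
+CREATE ROLE IF NOT EXISTS AMPLITUDE_READER;
+GRANT USAGE ON WAREHOUSE my_warehouse TO ROLE AMPLITUDE_READER;
+GRANT USAGE ON DATABASE my_database TO ROLE AMPLITUDE_READER;
+GRANT USAGE ON SCHEMA my_database.my_schema TO ROLE AMPLITUDE_READER;
+GRANT SELECT ON ALL TABLES IN SCHEMA my_database.my_schema TO ROLE AMPLITUDE_READER;
+GRANT ROLE AMPLITUDE_READER TO USER amplitude_user;  -- the username entered in step 3
+```
+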
-For the Snowflake source in Amplitude, Change Data Capture uses mechanisms available in Snowflake, [Time Travel](https://docs.snowflake.com/en/user-guide/data-time-travel) and [CHANGES](https://docs.snowflake.com/en/sql-reference/constructs/changes) clause, to identify changes made in the data source since the last successfully completed import job.
+### Select the data type
-#### Prerequisites and considerations
+The data type you select defines the strategies and settings available to you for configuration.
-- If a data source is represented as a complex SQL SELECT statement (for instance, with a JOIN clause), create a VIEW in your Snowflake account that wraps the data source to use it with a change-based import strategy.
-- Enable change tracking for the source table or view. See [Enabling Change Tracking on Views and Underlying Tables Snowflake](https://docs.snowflake.com/en/user-guide/streams-manage.html#label-enabling-change-tracking-views) for more information.
-- `DATA_RETENTION_TIME_IN_DAYS` must be greater than or equal to `1`, but Amplitude recommends at least `7` days. Otherwise, the change-based import fails. For more details, see [Time Travel](https://docs.snowflake.com/en/user-guide/data-time-travel) in Snowflake's documentation. Setting `DATA_RETENTION_TIME_IN_DAYS` to `0` disables the change tracking, and causes the connection to become unrecoverable. If this happens, recreate the source.
-- [Data field](#data-fields) requirements also apply.
-- (Optional, recommended) Ensure the data to be imported has a unique and immutable `insert_id` for each row to prevent data duplication if there are any unexpected issues. More about Amplitude deduplication and `insert_id` is [Event Deduplication](/docs/apis/analytics/http-v2/#event-deduplication).
-- If you disable change tracking in Snowflake, or disconnect the Amplitude source for a period longer than the value of `DATA_RETENTION_TIME_IN_DAYS`, Amplitude loses ability to track historical changes. In this case, recreate the connection. To avoid duplicate events, ensure all events have an `insert_id` set, and recreate the connection within seven days.
-- The initial import job transfers all data from the source. Subsequent jobs import the differences from the last successful import.
-- Snowflake [`CHANGES`](https://docs.snowflake.com/en/sql-reference/constructs/changes#usage-notes) limitations apply.
+| Data Type | Description |
+| ---------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
+| Event | Includes user actions associated with either a user ID or a device ID and may also include event properties. |
+| User Properties | Includes dictionaries of user attributes you can use to segment users. Each property is associated with a user ID. |
+| Group Properties | Includes dictionaries of group attributes that apply to a group of users. Each property is associated with a group name. |
+| Profiles | Includes dictionaries of properties that relate to a user profile. Profiles display the most current data synced from your warehouse, and are associated with a user ID. |
-### Custom SQL query
+### Select the import strategy
-The Custom SQL query supports time-based import of events, user properties, and group properties, and full syncs of user properties and group properties.
+Select from the following strategies, depending on your data type selection.
-For Time-based import, Amplitude requires that you use a monotonically increasing timestamp value. This value should show when the record loaded into the source table the SQL configuration is querying. The warehouse import tool brings data into Amplitude by continually updating the maximum value of the column referenced in the *Timestamp Column Name* input within the Import Config UI with each subsequent import.
+| Strategy | Description |
+| --------- | -------------------------------------------------------------------------------------------------------------------------------------------------- |
+| Full Sync | Ingests the entire dataset on a defined schedule. This option is useful for datasets that change over time but don't indicate which rows changed. |
+| Timestamp | Ingests the most recent rows on a schedule, as determined by the Timestamp column. |
+| Change data capture (CDC) | Ingests the most recent rows of data on a schedule, as determined by Snowflake's Change Data Capture feature. CDC supports customization of the feed type (for event data) and data mutability settings.|
-{{partial:admonition type="example" title=""}}
-Upon first import, Amplitude imports all the data returned from the query configured in the Import Config. Amplitude saves a reference of the maximum timestamp referenced in the *Timestamp Column Name*: `timestamp_1`. Upon subsequent import, Amplitude imports all data from the timestamp saved earlier (`timestamp_1`), to what's now the new maximum timestamp (`timestamp_2`). Then after that import, Amplitude saves `timestamp_2` as the new maximum timestamp.
+See the following table to understand which data types are compatible with which import strategies.
+
+| Data type | Supported import strategies |
+| -------| ----- |
+| Event | CDC, Timestamp |
+| User properties | Full Sync, Timestamp |
+| Group Properties | Full Sync, Timestamp |
+| Profiles | CDC |
+
+{{partial:admonition type="note" heading="Change Data Capture options"}}
+For the `Event` data type, the CDC strategy supports configuration of the CDC feed type.
+
+Select *Ingestion Only* to ingest from your warehouse and include Amplitude's enrichment services like ID Resolution, property and attribution syncing, and location resolution.
+
+Select *Continuous Sync* to mirror your Snowflake data with support for `insert`, `update`, and `delete` operations. This option deactivates Amplitude's enrichment services to ensure you remain in sync with your source-of-truth.
+
+*Continuous Sync* also supports Data Mutability settings. Select which operations to enable: `update` or `delete`. `insert` operations are always enabled.
{{/partial:admonition}}
-## Add Snowflake as a source
+### Map your data
-To add Snowflake as a data source in your Amplitude project, follow these steps:
+Depending on the import strategy you choose, either map your data with a SQL statement to transform the data (Timestamp, Full Sync) or use the data selection tool to map column names directly to Amplitude properties.
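+
+For the SQL-based strategies, a sketch of such a statement might look like the following. Table and column names are hypothetical, and the exact output columns depend on the data type you selected (see the Batch Event Upload API for event fields). The `loaded_at` column is the kind of monotonically increasing column a Timestamp import references:
+
+```sql
+-- Illustrative only; align output columns with the fields Amplitude expects.
+SELECT
+  user_id,
+  event_name AS event_type,
+  event_time AS time,
+  MD5(user_id || event_name || event_time::VARCHAR) AS insert_id,  -- helps deduplication
+  loaded_at  -- referenced as the Timestamp Column Name for Timestamp imports
+FROM analytics.raw_events;
+```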
-1. In Amplitude Data, navigate to *Catalog -> Sources*.
-2. In the Warehouse Sources section, click *Snowflake*.
-3. Enter the required credentials for the Snowflake instance you want to connect:
- - *Account*: Snowflake account name. Case sensitive. This is the first part of your Snowflake URL, before `snowflakecomputing.com`. Don't include ".snowflakecomputing.com" in your account name.
- - *Database*: Name of the database where Amplitude can find the data.
- - *Warehouse*: Used by Amplitude to execute SQL.
- - *Username*: Used by Amplitude for authentication.
- - *Password*: Used by Amplitude for authentication.
+### Schedule your sync
+
+Provide a name for the source, and set the frequency with which Amplitude imports your data.
+
+## Choose the best integration for your use case
+
+When choosing an integration strategy, consider the following:
+
+- **Full Sync**: Choose this option if you need to periodically ingest the entire dataset and can't track which rows have changed. This method is best for smaller datasets where tracking incrementally isn't possible. This method isn't suitable for large datasets due to the overhead required to ingest all data each time.
+
+- **Timestamp Import**: Choose this option if you can incrementally import data using a monotonically increasing timestamp column that indicates when Snowflake loaded each record. This method is efficient and works well when you append new data with timestamps.
- Amplitude offers password-based and key pair authentication for Snowflake.
- If you want to use password authentication, select *Password* and enter your password in the *Password* field. If you want to use key pair authentication, select *Key pair* and then click *Generate Key*. Then provide the organization and account names in the format `ORGNAME-ACCOUNTNAME`.
+- **Change Data Capture (CDC) Ingestion Only**: Choose this option to import data based on changes detected by Snowflake's CDC feature while still using Amplitude's enrichment services. This method only supports reading `INSERT` operations from the CDC feed.
-4. Copy the autogenerated SQL query and run it in Snowflake to give Amplitude the proper permissions.
-5. After running the query, click *Next* to test the connection.
-6. After the test is successful, click *Next* again to move on to the data selection stage.
-7. Choose the modeling method, either [Change Data Capture](#table-selection-ui-settings) or [Custom SQL Query](#custom-sql-query-settings).
+- **Change Data Capture (CDC) Continuous Sync**: Choose this option to directly mirror the data in Snowflake with `INSERT`, `UPDATE`, and `DELETE` operations based on changes detected by Snowflake's CDC feature. This method disables Amplitude's enrichment services to remain in sync with your source of truth and is ideal when you need to keep Amplitude data fully synchronized with your Snowflake data. `UPDATE` and `DELETE` operations mutate data in Amplitude.
-### Change Data Capture settings
+{{partial:partials/data/snowflake-strat-comp}}
-Configure the modeling method:
+## Prerequisites and considerations for CDC
-- **Data source**: Choose a table or view from the left panel.
-- **Data type**: Select if the table maps to event, user property, or group property data.
-- **Frequency**: Select the interval with which Amplitude should check for changes in the Snowflake table.
+When using CDC Continuous Sync, keep the following things in mind:
-Map the required and custom fields: Setup name mapping between columns in the Snowflake data source and data field name that Amplitude requires. For more information, see [Data fields](#data-fields) below.
+- **Enable Change Tracking**: Enable change tracking for the source table or view, as shown in the sketch after this list. For more information, see [Enabling Change Tracking on Views and Underlying Tables](https://docs.snowflake.com/en/user-guide/streams-manage.html#label-enabling-change-tracking-views) in Snowflake's documentation.
-When complete, click **Test Mapping** to verify the correct data appears under the right property in Amplitude.
+- **Data Retention Settings**: `DATA_RETENTION_TIME_IN_DAYS` must be greater than or equal to one, but Amplitude recommends at least seven days. Otherwise, the change-based import fails. For more details, see [Time Travel](https://docs.snowflake.com/en/user-guide/data-time-travel) in Snowflake's documentation. Setting `DATA_RETENTION_TIME_IN_DAYS` to `0` disables the change tracking and renders the connection unrecoverable. If this happens, recreate the source.
-### Custom SQL query settings
+- **Disable Change Tracking**: If you disable change tracking in Snowflake, or disconnect the Amplitude source for a period longer than the value of `DATA_RETENTION_TIME_IN_DAYS`, Amplitude loses the ability to track historical changes. In this case, recreate the connection. To avoid duplicate events, ensure all events have an `insert_id` set, and recreate the connection within seven days.
-Choose your configuration options:
+- **Unique and Immutable `insert_id`**: Ensure the data to be imported has a unique and immutable `insert_id` for each row to prevent data duplication if there are any unexpected issues. More about Amplitude deduplication and `insert_id` is available in [Event Deduplication](/docs/apis/analytics/http-v2/#event-deduplication).
-- *Type of data*: This tells Amplitude whether you're ingesting event data, user property data, or group property data.
-- *Type of import:*
- - *Full Sync*: Amplitude periodically ingests the entire dataset, regardless of whether that data has already been imported. This is good for data sets where the row data changes over time, but there is no easy way to tell which rows have changed. Otherwise, the more efficient option would be a time-based import. This option isn't supported for ingesting event data.
- - *Time-based*: Amplitude periodically ingests the most recent rows in the data, as determined by the provided *Timestamp* column. The first import brings in all available data, and later ingests any data with timestamps after the maximum timestamp seen during the last import job. To use this, include the timestamp of the data load into Snowflake. For more information on how this works, see [the time-based import](#time-based-import) section.
-- *Frequency*: Choose from several scheduling options ranging from five minutes to one month. With the one month option, Amplitude ingests data on the first of the month.
-- *SQL query*: This is the code for the query Amplitude uses to decide which data is ingested.
+- **Complex SQL Statements**: If a data source is represented as a complex SQL `SELECT` statement (for instance, with a `JOIN` clause), create a `VIEW` in your Snowflake account that wraps the data source to use it with a change-based import strategy. See [Streams on Views](https://docs.snowflake.com/en/user-guide/streams-intro#streams-on-views) for considerations when using CDC with views in Snowflake.
-Finish the configuration:
+- **Views with JOINs**: While Snowflake CDC is efficient, using views that contain JOINs can have performance implications. Consider syncing joined data as user profiles instead.
-1. After you've set your configuration options, click *Test SQL* to see how the data is coming through from your Snowflake instance. Errors appear on this screen.
-2. If there are no errors, click *Finish*.
+- **Avoid table deletion and re-creation**: Don't delete and recreate tables with the same name, as Snowflake CDC doesn't capture changes in this scenario. Use [incremental models](https://docs.getdbt.com/docs/build/incremental-models) with tools like [dbt](https://www.getdbt.com/) to prevent table replacement.
-Amplitude displays a notification indicating you enable the new Snowflake source and redirects you to the Sources listing page.
+- **Handling schema changes**: CDC supports adding new columns with default `NULL` values to CDC-tracked tables or views. Amplitude recommends against other kinds of schema changes. Snowflake CDC only reflects changes from DML statements. DDL statements that logically modify data (such as adding new columns with default values, dropping existing columns, or renaming columns) affect future data sent to Amplitude, but Snowflake doesn't update historical data with changes caused by DDL statements. As a result, Amplitude doesn't reflect these updates for historical data.
-If you have any issues or questions while following this flow, contact the Amplitude team.
+- **Amplitude enrichment services disabled**: When using CDC **Continuous Sync**, Amplitude disables enrichment services like ID resolution, property and attribution syncing, and resolving location info to remain in sync with your source of truth.
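+
+A minimal Snowflake-side sketch of these prerequisites, using hypothetical `analytics.events` and `analytics.users` objects:
+
+```sql
+-- Wrap a JOIN in a view so the change-based import has a single source.
+CREATE OR REPLACE VIEW analytics.amplitude_events_v AS
+  SELECT e.user_id, e.event_type, e.event_time, u.plan_type
+  FROM analytics.events e
+  LEFT JOIN analytics.users u ON e.user_id = u.user_id;
+
+-- Enable change tracking on the view and its underlying tables.
+ALTER VIEW analytics.amplitude_events_v SET CHANGE_TRACKING = TRUE;
+ALTER TABLE analytics.events SET CHANGE_TRACKING = TRUE;
+ALTER TABLE analytics.users SET CHANGE_TRACKING = TRUE;
+
+-- Keep at least seven days of Time Travel history on the source tables.
+ALTER TABLE analytics.events SET DATA_RETENTION_TIME_IN_DAYS = 7;
+ALTER TABLE analytics.users SET DATA_RETENTION_TIME_IN_DAYS = 7;
+```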
-## Migrate from custom SQL to Change Data Capture
+## Migrate from custom SQL to CDC
To change the modeling method of your Snowflake source:
diff --git a/public/docs/css/site.css b/public/docs/css/site.css
index 3fe5a3cc2..dd2afbbce 100644
--- a/public/docs/css/site.css
+++ b/public/docs/css/site.css
@@ -971,8 +971,6 @@ video {
margin-top: 2rem;
}.mt-auto {
margin-top: auto;
-}.ml-auto {
- margin-left: auto;
}.box-border {
box-sizing: border-box;
}.block {
@@ -1024,8 +1022,6 @@ video {
height: max-content;
}.h-screen {
height: 100vh;
-}.h-5 {
- height: 1.25rem;
}.max-h-12 {
max-height: 3rem;
}.max-h-\[575px\] {
@@ -1057,8 +1053,6 @@ video {
}.w-max {
width: -moz-max-content;
width: max-content;
-}.w-52 {
- width: 13rem;
}.min-w-\[712px\] {
min-width: 712px;
}.max-w-12 {
@@ -1149,8 +1143,6 @@ video {
flex-wrap: nowrap;
}.content-center {
align-content: center;
-}.content-between {
- align-content: space-between;
}.items-start {
align-items: flex-start;
}.items-center {
@@ -1175,16 +1167,12 @@ video {
--tw-space-y-reverse: 0;
margin-top: calc(1rem * calc(1 - var(--tw-space-y-reverse)));
margin-bottom: calc(1rem * var(--tw-space-y-reverse));
-}.place-self-start {
- place-self: start;
}.self-start {
align-self: flex-start;
}.self-end {
align-self: flex-end;
}.self-center {
align-self: center;
-}.justify-self-end {
- justify-self: end;
}.overflow-auto {
overflow: auto;
}.overflow-hidden {
@@ -1289,10 +1277,6 @@ video {
fill: #414349;
}.fill-white {
fill: #ffffff;
-}.fill-mint-700 {
- fill: #028376;
-}.fill-error-red {
- fill: #EC4747;
}.p-0 {
padding: 0px;
}.p-2 {
@@ -2210,6 +2194,16 @@ line-height: 1.25rem;
}
}th {
text-align: left;
+}.snow-comp th:first-child, .snow-comp td:first-child {
+ position: sticky;
+ left: 0;
+ background-color: #fff; /* Optional: Keep the frozen column background intact */
+ z-index: 1; /* Ensure the first column stays above the scrolling part */
+ border-right: 1px solid #ddd;
+}.snow-comp td {
+ min-width: 200px;
+ word-wrap: break-word;
+ white-space: normal;
}.hover\:h-6:hover {
height: 1.5rem;
}.hover\:h-8:hover {
diff --git a/resources/docs/css/site.css b/resources/docs/css/site.css
index 90550af6d..b0813c846 100644
--- a/resources/docs/css/site.css
+++ b/resources/docs/css/site.css
@@ -989,4 +989,18 @@ line-height: 1.25rem;
}
th {
text-align: left;
+}
+
+.snow-comp th:first-child, .snow-comp td:first-child {
+ position: sticky;
+ left: 0;
+ background-color: #fff; /* Optional: Keep the frozen column background intact */
+ z-index: 1; /* Ensure the first column stays above the scrolling part */
+ border-right: 1px solid #ddd;
+}
+
+.snow-comp td {
+ min-width: 200px;
+ word-wrap: break-word;
+ white-space: normal;
}
\ No newline at end of file
diff --git a/resources/views/partials/data/snowflake-strat-comp.antlers.html b/resources/views/partials/data/snowflake-strat-comp.antlers.html
new file mode 100644
index 000000000..07c606b95
--- /dev/null
+++ b/resources/views/partials/data/snowflake-strat-comp.antlers.html
@@ -0,0 +1,54 @@
+
+<table class="snow-comp">
+  <thead>
+    <tr>
+      <th>Import Strategy</th>
+      <th>Data Types Supported</th>
+      <th>Data Mutability</th>
+      <th>Amplitude Enrichment Services</th>
+      <th>Column Mapping Method</th>
+      <th>When to Use</th>
+      <th>Considerations</th>
+    </tr>
+  </thead>
+  <tbody>
+    <tr>
+      <td>Full Sync</td>
+      <td>User Properties, Group Properties</td>
+      <td>N/A</td>
+      <td>Enrichment services applied</td>
+      <td>Custom SQL SELECT Query</td>
+      <td>Use when you need to periodically ingest the entire dataset and cannot track changes incrementally.</td>
+      <td>Not suitable for large datasets due to the need to ingest the entire dataset each time.</td>
+    </tr>
+    <tr>
+      <td>Timestamp</td>
+      <td>Events, User Properties, Group Properties</td>
+      <td>N/A</td>
+      <td>Enrichment services applied</td>
+      <td>Custom SQL SELECT Query</td>
+      <td>Use when you can track new data using a monotonically increasing timestamp column.</td>
+      <td>Requires a timestamp column that indicates when the record was loaded into Snowflake.</td>
+    </tr>
+    <tr>
+      <td>CDC: Ingest only</td>
+      <td>Events</td>
+      <td>Insert operations only</td>
+      <td>Enrichment services applied</td>
+      <td>UI-based table and column selection</td>
+      <td>Use when you want to import data based on changes detected by Snowflake's CDC feature, with Amplitude enrichment services.</td>
+      <td>Requires change tracking to be enabled in Snowflake.</td>
+    </tr>
+    <tr>
+      <td>CDC: Continuous Sync</td>
+      <td>Events, Profiles</td>
+      <td>Supports insert, update, delete operations</td>
+      <td>Enrichment services not applied</td>
+      <td>UI-based table and column selection</td>
+      <td>Use when you want to directly mirror data in Snowflake, including updates and deletions, and keep Amplitude in sync with source data.</td>
+      <td>Disables Amplitude's enrichment services to remain in sync with the source of truth. Requires careful consideration of limitations, such as data retention settings in Snowflake and that deletions/renames of columns may not be captured. See the limitations section for more details.</td>
+    </tr>
+  </tbody>
+</table>