
Commit

markzegarelli committed Sep 27, 2024
1 parent 1c9951b commit e27a643
Showing 1 changed file with 43 additions and 34 deletions.
77 changes: 43 additions & 34 deletions content/collections/source-catalog/en/snowflake.md
With Amplitude's Snowflake integration, you can ingest Snowflake data directly into your Amplitude project.
{{partial:admonition type="note" heading="Amplitude regional IP addresses"}}
Depending on your company's network policy, you may need to add these IP addresses to your allowlist so that Amplitude's servers can access your Snowflake instance:

| Region | IP Addresses |
| ------ | ----------------------------------------------- |
| US | `52.33.3.219`, `35.162.216.242`, `52.27.10.221` |
| EU | `3.124.22.25`, `18.157.59.125`, `18.192.47.195` |

{{/partial:admonition}}
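As an example, you could allow these addresses with a Snowflake network policy. This is a sketch only: the policy name is a placeholder, the IP list shown is the US region from the table above, and how you apply the policy depends on your account's network setup.

```sql
-- Sketch: allow Amplitude's US-region addresses in a network policy.
-- The policy name is a placeholder; use the EU list instead if your
-- Amplitude project is in the EU region.
CREATE NETWORK POLICY amplitude_ingestion
  ALLOWED_IP_LIST = ('52.33.3.219', '35.162.216.242', '52.27.10.221');
```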

Amplitude's Data Warehouse Import sometimes processes events in parallel, so time ordering isn't guaranteed.

Complete the following steps to configure the Snowflake source:

1. [Set up and verify the connection](#set-up-and-verify-the-connection)
2. [Select the data type](#select-the-data-type)
3. [Select the import strategy](#select-the-import-strategy)
4. [Map your data](#map-your-data)
5. [Schedule your sync](#schedule-your-sync)

### Set up and verify the connection

To add Snowflake as a data source in your Amplitude project, follow these steps:


6. After the test is successful, click **Next** again to move on to the data selection stage.
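If you prefer to provision credentials up front, a dedicated read-only role for the Amplitude connection might look like the following sketch. All object names here are placeholders, and the exact grants you need depend on the data you plan to import.

```sql
-- Sketch: a dedicated read-only role and user for the Amplitude connection.
-- All names (AMPLITUDE_ROLE, AMPLITUDE_USER, ANALYTICS_WH, ANALYTICS_DB)
-- are placeholders.
CREATE ROLE IF NOT EXISTS AMPLITUDE_ROLE;
CREATE USER IF NOT EXISTS AMPLITUDE_USER
  PASSWORD = '<strong-password>'
  DEFAULT_ROLE = AMPLITUDE_ROLE;
GRANT ROLE AMPLITUDE_ROLE TO USER AMPLITUDE_USER;

GRANT USAGE ON WAREHOUSE ANALYTICS_WH TO ROLE AMPLITUDE_ROLE;
GRANT USAGE ON DATABASE ANALYTICS_DB TO ROLE AMPLITUDE_ROLE;
GRANT USAGE ON SCHEMA ANALYTICS_DB.EVENTS TO ROLE AMPLITUDE_ROLE;
GRANT SELECT ON ALL TABLES IN SCHEMA ANALYTICS_DB.EVENTS TO ROLE AMPLITUDE_ROLE;
```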

### Select the data type

The data type you select informs the strategies and settings available to you for configuration.

| Data Type | Description |
| ---------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| Event | Includes user actions associated with either a user ID or a device ID and may also include event properties. |
| User Properties | Includes dictionaries of user attributes you can use to segment users. Each property is associated with a user ID. |
| Group Properties | Includes dictionaries of group attributes that apply to a group of users. Each property is associated with a group name. |
| Profiles | Includes dictionaries of properties that relate to a user profile. Profiles display the most current data synced from your warehouse, and are associated with a user ID. |

### Select the import strategy

Select from the following strategies, depending on your data type selection.

| Strategy                  | Description |
| ------------------------- | ----------- |
| Full Sync                 | Ingests the entire dataset on a defined schedule. This option is useful for datasets that change over time but can't show which rows changed. |
| Timestamp                 | Ingests the most recent rows in the data on a schedule, as determined by the Timestamp column. |
| Change data capture (CDC) | Ingests the most recent rows of data on a schedule, as determined by Snowflake's Change Data Capture feature. CDC supports customization of the Feed Type (for Event data) and Data Mutability settings. |
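For the CDC strategy, Snowflake must track changes on the source object. A minimal sketch, with a placeholder table name:

```sql
-- Sketch: enable Snowflake change tracking so the CDC strategy can read
-- inserts, updates, and deletes. The table name is a placeholder.
ALTER TABLE ANALYTICS_DB.EVENTS.RAW_EVENTS SET CHANGE_TRACKING = TRUE;
```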

See the following table to understand which data types are compatible with which import strategies.

| Data type        | Supported import strategies |
| ---------------- | --------------------------- |
| Event            | CDC, Timestamp              |
| User Properties  | Full Sync, Timestamp        |
| Group Properties | Full Sync, Timestamp        |
| Profiles         | CDC                         |

{{partial:admonition type="note" heading="Change Data Capture options"}}
For the Event data type, the CDC strategy supports configuration of the CDC Feed Type.

Select *Ingestion Only* to ingest from your warehouse and include Amplitude's enrichment services like ID Resolution, property and attribution syncing, and location resolution.

Select *Continuous Sync* to mirror your Snowflake data with support for `insert`, `update`, and `delete` operations. This option deactivates Amplitude's enrichment services to ensure you remain in sync with your source of truth.

{{partial:admonition type="tip" heading=""}}
*Continuous Sync* also supports Data Mutability settings. Select which options to enable: `update` or `delete`. `insert` operations are always on.
{{/partial:admonition}}
{{/partial:admonition}}

### Map your data

Depending on the import strategy you choose, map your data with a SQL statement that transforms the data (Timestamp, Full Sync), or use the Data Selection Tool to map column names directly to Amplitude properties.
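For the SQL-based strategies, the transformation statement is a query whose output columns line up with Amplitude fields. The following is a sketch only: the source table and column names are placeholders, and you confirm the output mapping in the UI.

```sql
-- Sketch: transform a source table into columns Amplitude can map.
-- Source object and column names are placeholders.
SELECT
  user_identifier AS user_id,
  event_name      AS event_type,
  event_ts        AS time,
  event_props     AS event_properties
FROM ANALYTICS_DB.EVENTS.RAW_EVENTS
```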


### Schedule your sync

Provide a name for the source, and select the frequency with which Amplitude imports your data.

## Choose the best integration for your use case

When choosing an integration strategy, consider the following:


{{partial:partials/data/snowflake-strat-comp}}

## Prerequisites and considerations for CDC

When using CDC Continuous Sync, be aware of the following limitations:


- **Views with JOINs**: While Snowflake CDC is efficient, using views that contain JOINs can have performance implications. Consider syncing joined data as User Profiles instead.

- **Avoid Table Deletion and Recreation**: Don't delete and recreate tables with the same name, as Snowflake CDC doesn't capture changes in this scenario. Use [incremental models](https://docs.getdbt.com/docs/build/incremental-models) with tools like [dbt](https://www.getdbt.com/) to prevent table replacement.


- **Handling Schema Changes**: CDC supports adding new columns with default `NULL` values to CDC-tracked tables or views. Amplitude recommends against other kinds of schema changes. Snowflake CDC only reflects changes from DML statements. DDL statements that logically modify data (such as adding new columns with default values, dropping existing columns, or renaming columns) affect future data sent to Amplitude, but Snowflake doesn't update historical data with changes caused by DDL statements. As a result, Amplitude doesn't reflect these updates for historical data.

- **Amplitude Enrichment Services Disabled**: When using CDC **Continuous Sync**, Amplitude disables enrichment services like ID resolution, property and attribution syncing, and resolving location info to remain in sync with your source of truth.
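The incremental-model approach mentioned above can be sketched as a dbt model (the model, source, and column names here are hypothetical):

```sql
-- Sketch of a dbt incremental model (models/amplitude_events.sql).
-- Incremental materialization appends or merges rows instead of dropping
-- and recreating the table, so Snowflake CDC keeps capturing changes.
{{ config(materialized='incremental', unique_key='event_id') }}

select event_id, user_id, event_type, event_ts
from {{ source('raw', 'events') }}

{% if is_incremental() %}
  -- On incremental runs, only pull rows newer than the latest already loaded.
  where event_ts > (select max(event_ts) from {{ this }})
{% endif %}
```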

