Merge pull request #306 from amplitude/profiles-article

Profiles article
amplitude · Sep 30, 2024 · ca523a6
2 parents ab481ab + 9a664ab

Showing 5 changed files with 134 additions and 103 deletions.
diff --git a/content/collections/data/en/profile-properties.md b/content/collections/data/en/profile-properties.md
diff --git a/content/collections/data/en/profiles.md b/content/collections/data/en/profiles.md
@@ -0,0 +1,126 @@
+---
+id: 762c167a-caad-4a25-9758-3b35af55857d
+blueprint: data
+title: Profiles
+landing: false
+exclude_from_sitemap: false
+updated_by: 5817a4fa-a771-417a-aa94-a0b1e7f55eae
+updated_at: 1727721833
+---
+**Profiles** enable you to join customer profile data from your data warehouse with existing behavioral product data already in Amplitude. 
+
+Profiles act as standalone properties: they're associated with a user profile rather than with specific events. This sets them apart from traditional user properties and enables more expansive analyses.
+
+Profiles always display the most current data synced from your warehouse.
+
+## Before you begin
+
+### Snowflake users
+If this is your first time importing data from this table, set a data retention time and enable change tracking in Snowflake with the following commands:
+
+```sql
+ALTER TABLE DATAPL_DB_STAG.PUBLIC.PROFILES_PROPERTIES_TABLE_1 SET DATA_RETENTION_TIME_IN_DAYS = 7;
+
+ALTER TABLE DATAPL_DB_STAG.PUBLIC.PROFILES_PROPERTIES_TABLE_1 SET CHANGE_TRACKING = TRUE;
+```
+On Snowflake Standard Edition plans, the maximum retention time is one day. If you're on this plan, set `DATA_RETENTION_TIME_IN_DAYS = 1` in the first command, and set the frequency to 12 hours when you configure the source.
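+
+To confirm that both settings took effect, you can inspect the table's metadata. The following is a minimal sketch that assumes the same example table name used in the commands in this section. In Snowflake, the `SHOW TABLES` output includes `retention_time` and `change_tracking` columns.
+
+```sql
+-- List the table's metadata, including retention_time and change_tracking.
+SHOW TABLES LIKE 'PROFILES_PROPERTIES_TABLE_1' IN SCHEMA DATAPL_DB_STAG.PUBLIC;
+
+-- Narrow the previous result to the two settings of interest.
+-- SHOW output uses lowercase column names, so they must be double-quoted.
+SELECT "name", "retention_time", "change_tracking"
+FROM TABLE(RESULT_SCAN(LAST_QUERY_ID()));
+```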
+
+### Databricks users
+Follow these instructions to [enable change tracking](https://docs.databricks.com/en/delta/delta-change-data-feed.html#enable):
+
+* If you're working with a new table, set the table property `delta.enableChangeDataFeed = true` in the `CREATE TABLE` command:
+    `CREATE TABLE student (id INT, name STRING, age INT) TBLPROPERTIES (delta.enableChangeDataFeed = true)`
+
+    To enable the change data feed by default for all new tables, set `spark.databricks.delta.properties.defaults.enableChangeDataFeed = true`.
+
+* If you're working with an existing table, set the table property `delta.enableChangeDataFeed = true` in the `ALTER TABLE` command:
+    `ALTER TABLE myDeltaTable SET TBLPROPERTIES (delta.enableChangeDataFeed = true)`
+
+Set a [data retention period](https://docs.databricks.com/en/delta/history.html#configure-data-retention-for-time-travel-queries). This must be at least one day, but in most cases you should set this period to seven days or longer. If your retention period is too short, the import process can fail.
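+
+As a sketch of the retention step, assuming the existing table `myDeltaTable` from the `ALTER TABLE` example in this section, the Delta table properties `delta.logRetentionDuration` and `delta.deletedFileRetentionDuration` control how long table history and removed data files are kept. The intervals shown are illustrative values, not requirements; adjust them to your own needs.
+
+```sql
+-- Keep enough history and data files for time travel.
+-- The intervals below are illustrative assumptions.
+ALTER TABLE myDeltaTable SET TBLPROPERTIES (
+  'delta.logRetentionDuration' = 'interval 30 days',
+  'delta.deletedFileRetentionDuration' = 'interval 7 days'
+);
+
+-- Review the table properties, including delta.enableChangeDataFeed.
+SHOW TBLPROPERTIES myDeltaTable;
+```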
+
+## Set up a profile (Snowflake users)
+To set up a profile in Amplitude, follow these steps:
+
+1. In Amplitude Data, navigate to *Connections Overview*. Then in the *Sources* panel, click *Add More*. Scroll down until you find the Snowflake tile and click it.
+2. On the *Set Up Connection* tab, connect Amplitude to your data warehouse by filling in all the relevant fields under *Snowflake Credentials*, which are outlined in the [Snowflake Data Import guide](/docs/data/source-catalog/snowflake#add-snowflake-as-a-source). You can either create a new connection or reuse an existing one. Click *Next* when you're done.
+3. You can see a list of your tables under *Select Table*. To begin column mapping, click the table you're interested in.
+4. In the list of required fields under *Column Mapping*, enter the column names in the appropriate fields to match columns to required fields. To add more fields, click *+ Add field*.
+5. On the *Select Data* tab, select the `profiles` data type. Amplitude pre-selects the required change data capture import strategy for you, which you can see under the *Select Import Strategy* dropdown:
+
+    * **Insert**: Always on, creates new profiles when added to your table.
+    * **Update**: Syncs changes to values from your table to Amplitude.
+    * **Delete**: Syncs deletions from your table to Amplitude.
+
+6. When you're done, click *Test Mapping* to verify your mapping information. Then click *Next*.
+7. Name the source and set the frequency at which Amplitude should refresh your profiles from the data warehouse. If you're on Snowflake Standard Edition, set the frequency to 12 hours.
+
+## Set up a profile (Databricks users)
+To set up a profile in Amplitude, follow these steps:
+
+1. In Amplitude Data, navigate to *Connections Overview*. Then in the *Sources* panel, click *Add More*. Scroll down until you find the Databricks tile and click it.
+2. In the *Set Up Connection* tab, connect Amplitude to your data warehouse. Have the following information ready:
+    * **Server hostname**: This is the hostname of your Databricks cluster. You can find it in your cluster configuration by navigating to *Advanced Options -> JDBC/ODBC -> Server Hostname*.
+    * **HTTP path**: This is the HTTP path of the cluster you would like to connect to. You can find it in your cluster configuration by navigating to *Advanced Options -> JDBC/ODBC -> HTTP Path*.
+    * **Personal access token**: Use a personal access token to authenticate with your Databricks cluster. [Learn how to create one here](https://docs.databricks.com/en/dev-tools/auth/index.html#common-tasks-for-databricks-authentication).
+
+    Click *Next* when you're done.
+3. You can see a list of your tables under *Select Table*. To begin column mapping, click the table you're interested in.
+4. In the list of required fields under *Column Mapping*, enter the column names in the appropriate fields to match columns to required fields. To add more fields, click *+ Add field*.
+5. In the *Data Selection* tab, select the `profiles` data type.
+6. When you're done, click *Test Mapping* to verify your mapping information. Then click *Next*.
+7. Name the source and set the frequency at which Amplitude should refresh your profiles from the data warehouse. The default frequency is 12 hours, but you can change it.
+
+## Data specifications
+Profiles supports a maximum of 200 warehouse properties and applies only to known Amplitude users. Each profile must include a `user_id`.
+
+| Field                | Description                                                                                                   | Example         |
+| -------------------- | ------------------------------------------------------------------------------------------------------------- | --------------- |
+| `user_id`            | Identifier for the user. Must be at least 5 characters long.                                                  | `12345`         |
+| `Profile Property 1` | Profile property set at the user level. The value reflects the customer's source as of the most recent sync. | `10`            |
+| `Profile Property 2` | Profile property set at the user level. The value reflects the customer's source as of the most recent sync. | `Data Engineer` |
+
+Example:
+```json
+{
+  "user_id": 12345,
+  "number of purchases": 10,
+  "title": "Data Engineer"
+}
+```
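+
+Before syncing, it can help to check the source table against the `user_id` requirement. The query below is a sketch, not part of Amplitude's tooling. It borrows the `prod_users.demo_data` table and `uid` column from the sample queries later in this article as stand-ins, and it flags rows whose identifier is missing or shorter than five characters.
+
+```sql
+-- Find rows that don't meet the user_id requirement:
+-- the identifier is missing or has fewer than 5 characters.
+SELECT
+    m.uid AS "user_id"
+FROM
+    prod_users.demo_data m
+WHERE
+    m.uid IS NULL
+    OR LENGTH(CAST(m.uid AS STRING)) < 5
+```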
+
+For information on Snowflake profiles, see the [Snowflake Data Import guide](/docs/data/source-catalog/snowflake#profile-properties).
+
+## SQL template
+
+```sql
+SELECT
+    USER_ID_COLUMN            AS "user_id",
+    PROFILE_PROPERTY_1_COLUMN AS "profile_property_1",
+    PROFILE_PROPERTY_2_COLUMN AS "profile_property_2"
+FROM DATABASE_NAME.SCHEMA_NAME.TABLE_OR_VIEW_NAME
+```
+
+## Clear a profile value
+
+When you remove profile values in your data warehouse, Amplitude picks up those removals during the next sync operation. You can also use Amplitude Data to remove unused property fields from users in Amplitude.
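+
+One way to remove a value at the source is to set the column to `NULL` for the affected user. This is a sketch under stated assumptions: it uses the `prod_users.demo_data` table from the sample queries and a hypothetical user ID. Whether Amplitude treats a `NULL` as a cleared value is an assumption, so confirm the behavior on a test table first.
+
+```sql
+-- Remove the title for one (hypothetical) user in the warehouse.
+-- Amplitude reads the change during the next scheduled sync.
+UPDATE prod_users.demo_data
+SET title = NULL
+WHERE uid = '12345';
+```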
+
+## Sample queries
+
+```sql
+SELECT 
+	user_id as "user_id",
+	upgrade_propensity_score as "Upgrade Propensity Score",
+	user_model_version as "User Model Version"
+FROM
+	ml_models.prod_propensity_scoring
+```
+
+```sql
+SELECT 
+	m.uid as "user_id",
+	m.title as "Title",
+	m.seniority as "Seniority",
+	m.dma as "DMA"
+FROM
+	prod_users.demo_data m
+```
diff --git a/content/trees/collections/en/data.yaml b/content/trees/collections/en/data.yaml
@@ -13,8 +13,6 @@ tree:
     entry: 1c4b9202-0063-4acf-9b99-66b73435630b
   -
     entry: 42e4e239-54c9-4ac4-9170-b535a1fd5eba
-  -
-    entry: 762c167a-caad-4a25-9758-3b35af55857d
   -
     entry: e738a4d5-a463-405e-aad0-665115a2b631
   -

diff --git a/content/trees/navigation/en/data.yaml b/content/trees/navigation/en/data.yaml
@@ -204,6 +204,9 @@ tree:
           -
             id: eb2ff912-6151-4093-b5c0-61762b0f4fc5
             entry: 69cefed6-2b87-4333-8cc6-ba5bac1b41e5
+          -
+            id: d903dce8-3a04-4f78-a85b-5703d0e6ef29
+            entry: 762c167a-caad-4a25-9758-3b35af55857d
           -
             id: 2d36d2b4-0d23-4bf0-9127-224901dfa27a
             entry: 1c4b9202-0063-4acf-9b99-66b73435630b

diff --git a/vercel.json b/vercel.json
@@ -1,6 +1,11 @@
 {
   "trailingSlash": false,
   "redirects": [
+    {
+      "source": "/docs/data/profile-properties",
+      "destination": "/docs/data/profiles",
+      "statusCode": 301
+    },
     {
       "source": "/docs/hc/en-us/categories/5078631395227((?:-[a-zA-Z0-9-]+)?)",
       "destination": "/docs/data",