diff --git a/docs/articles/user_acquisition_analytics.md b/docs/articles/user_acquisition_analytics.md index 1cde0bb88..df9120c6c 100644 --- a/docs/articles/user_acquisition_analytics.md +++ b/docs/articles/user_acquisition_analytics.md @@ -1,43 +1,32 @@ -Notebook 4: Analytics -[Article built around this notebook: https://github.com/superlinked/superlinked/blob/main/notebook/analytics_user_acquisition.ipynb] - # User Acquisition Analytics -Organizations have until recently done their user acquisition analytics on structured data. But now, vector embeddings - because they capture semantic meaning and context - enable orgs to incorporate unstructured data formats such as text into their queries, permitting a more detailed understanding of what drives user behavior, to inform user targeting strategies. Still, vector embeddings present several challenges - including choosing the right embedding techniques, computational complexity, and interpretability. - -[outline...] -In this article, we'll show you how to use the Superlinked library to overcome these challenges - letting you leverage vector embedding power to identify and analyze users on the basis of how they respond to particular ad messaging, target them more precisely, and improve conversion rates in your campaigns. - -## Vector embedding - power & challenges +Organizations have until recently done most of their user acquisition analytics on structured data. Vector embeddings have changed this. Vectors capture the semantic meaning and context of unstructured data such as text, providing more nuanced and detailed insights to inform user acquisition analysis - enabling organizations to create more strategic marketing and sales campaigns. -By capturing intricate relationships and patterns between data points and representing them as high-dimensional vectors in a latent space, embeddings empower you to extract deeper insights from complex datasets, thereby enabling more nuanced analysis, interpretation, and accurate predictions to inform your ad campaign decision-making. +In this article, we'll show you how to use the Superlinked library to leverage vector embedding power to **identify and analyze users on the basis of how they respond to particular ad messaging, target them more precisely, and improve conversion rates in your campaigns**. -But while vector embeddings are a powerful tool for user analysis, they also introduce **additional challenges**: +You can implement each step below as you read, in the corresponding [notebook](https://github.com/superlinked/superlinked/blob/main/notebook/analytics_user_acquisition.ipynb) and [colab](https://colab.research.google.com/github/superlinked/superlinked/blob/main/notebook/analytics_user_acquisition.ipynb). -- *Quality and relevance* - to achieve good retrieval results and avoid postprocessing and reranking, embedding generation techniques and parameters need to be selected carefully -- *Scalability with high-dimensional data* - rich data increases computational complexity and resource requirements, especially when working with large datasets -- *Interpretability* - identifying underlying patterns and relationships (including how specific users respond to certain ad types) embedded in abstract vectors can be tricky +## Embedding with smarter vectors -## Smarter vectors +By capturing intricate relationships and patterns between data points, and representing them as high-dimensional vectors in a latent space, embeddings empower you to extract deeper insights from complex datasets. 
With smart vectors, you can produce more nuanced analysis and interpretation, and more accurate predictions, to inform your ad campaign decision-making.

-Superlinked's framework helps overcome these challenges by: enabling you to create vectors that are smarter representations of your data and let you optimize retrieval - i.e., get high quality, actionable insights - without postprocessing or reranking. In the process, we'll visualize and understand how different users respond to different ad creatives.
+Superlinked's framework lets you create vectors that are smarter representations of your data, empowering you to retrieve high-quality, actionable insights (e.g., understanding why different ad creatives attract different users) without postprocessing or reranking.

-Let's walk through how you can perform user acquisition analytics on different ad sets using Superlinked library elements, namely:
+Let's walk through how you can perform user acquisition analysis on ad creatives using the following Superlinked library elements:

-- **Recency space** - to understand the freshness of information
-- **Number space** - to interpret user activity
-- **TextSimilarity space** - to interpret the ad creatives
-- **Query time weights** - optimize your results by defining how you treat your data when you run a query, without needing to re-embed the whole dataset
+- **Recency space** - encodes when a data point occurred (e.g., users' signup date)
+- **Number space** - encodes the frequency of a data event (e.g., subscribed users' API calls/day)
+- **TextSimilarity space** - encodes the semantic meaning of text data (e.g., campaign ad_creative text)

 ## User data

-We have data from two recent 2023 ad campaigns - one from August (with more generic ad messages), and another from December (assisted by a made-up influencer, "XYZCr$$d"). Our data (for 8000 users) includes:
+We have data from two 2023 ad campaigns - one from August (with more generic ad messages), and another from December (assisted by a made-up influencer, "XYZCr$$d"). Our data (for 8000 users) includes:

-1. signup date, as unix timestamp
-2. the ad creative a user clicked on [before signing up?]
-3. average (user) daily activity, measured in API calls/day [what kind of user activity does this represent? and calls/day over how many days?]
+1. user signup date, as a unix timestamp
+2. the ad creative a user clicked on before signing up
+3. average daily activity (for users who signed up by clicking on a campaign ad_creative), measured in api calls/day (over the user's lifetime)

-To make our ad campaigns smarter, we want to know which users to target with which kinds of ad messaging. We can discover this by embedding our data into a vectorspace, where we can cluster them and find meaningful user groups - using a UMAP visualization to examine the cluster labels' relationship to features of the ad creatives.
+We **want to know which creatives bring in what kinds of users**, so we can create ad messaging that attracts and retains active users. By embedding our data into a vectorspace using Superlinked Spaces, we can cluster users, find meaningful groups, and use a UMAP visualization to examine the relationship between cluster labels and features of our ad creatives.

 Let's get started.

@@ -46,33 +35,36 @@ Let's get started.
 First, we install superlinked and umap.

 ```python
-%pip install superlinked==9.32.1
+%pip install superlinked==9.47.1
 %pip install umap-learn
 ```

-Next, import all our components and constants.
- -(Note: Omit `alt.renderers.enable(“mimetype”)` if you’re running this in [google colab](https://colab.research.google.com/github/superlinked/superlinked/blob/main/notebook/recommendations_e_commerce.ipynb). Keep it if you’re executing in [github](https://github.com/superlinked/VectorHub/blob/main/docs/articles/ecomm-recys.md).) +Next, we import all our dependencies and declare our constants. ```python -from datetime import datetime, timedelta import os import sys + +from datetime import datetime, timedelta +from typing import Any + import altair as alt import numpy as np import pandas as pd -from sklearn.cluster import HDBSCAN import umap +from sklearn.cluster import HDBSCAN + from superlinked.evaluation.charts.recency_plotter import RecencyPlotter from superlinked.evaluation.vector_sampler import VectorSampler from superlinked.framework.common.dag.context import CONTEXT_COMMON, CONTEXT_COMMON_NOW from superlinked.framework.common.dag.period_time import PeriodTime from superlinked.framework.common.embedding.number_embedding import Mode +from superlinked.framework.common.parser.dataframe_parser import DataFrameParser from superlinked.framework.common.schema.schema import Schema from superlinked.framework.common.schema.schema_object import String, Float, Timestamp from superlinked.framework.common.schema.id_schema_object import IdField -from superlinked.framework.common.parser.dataframe_parser import DataFrameParser +from superlinked.framework.common.util.interactive_util import get_altair_renderer from superlinked.framework.dsl.executor.in_memory.in_memory_executor import ( InMemoryExecutor, InMemoryApp, @@ -84,29 +76,29 @@ from superlinked.framework.dsl.space.number_space import NumberSpace from superlinked.framework.dsl.space.recency_space import RecencySpace -alt.renderers.enable("mimetype") # comment this line out when running in Colab to render altair plots +alt.renderers.enable(get_altair_renderer()) alt.data_transformers.disable_max_rows() -os.environ["TOKENIZERS_PARALLELISM"] = "false" pd.set_option("display.max_colwidth", 190) pd.options.display.float_format = "{:.2f}".format +np.random.seed(0) ``` -Now we import our dataset. +Here's where we declare constants. ```python -dataset_repository_url = ( +DATASET_REPOSITORY_URL: str = ( "https://storage.googleapis.com/superlinked-notebook-user-acquisiton-analytics" ) -USER_DATASET_URL = f"{dataset_repository_url}/user_acquisiton_data.csv" -NOW_TS = 1708529056 -EXECUTOR_DATA = {CONTEXT_COMMON: {CONTEXT_COMMON_NOW: NOW_TS}} +USER_DATASET_URL: str = f"{DATASET_REPOSITORY_URL}/user_acquisiton_data.csv" +NOW_TS: int = int(datetime(2024, 2, 16).timestamp()) +EXECUTOR_DATA: dict[str, dict[str, Any]] = { + CONTEXT_COMMON: {CONTEXT_COMMON_NOW: NOW_TS} +} ``` -(If you're waiting something you're executing to finish, you can always find interesting reading in [VectorHub](https://superlinked.com/vectorhub/).) 
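+As a quick optional sanity check (not part of the notebook), you can confirm what our pinned "now" resolves to - all recency scores will be computed relative to this timestamp rather than the wall clock, which keeps results reproducible across runs:
+
+```python
+# EXECUTOR_DATA passes NOW_TS to the executor so that recency scores are
+# computed against a fixed, reproducible "now".
+print(datetime.fromtimestamp(NOW_TS).date())  # 2024-02-16
+```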
- ## Read and explore our dataset -Now that our dataset's imported, let's take a closer look at it: +Let's read our dataset, and then take a closer look at it: ```python NROWS = int(os.getenv("NOTEBOOK_TEST_ROW_LIMIT", str(sys.maxsize))) @@ -115,9 +107,9 @@ print(f"User data dimensions: {user_df.shape}") user_df.head() ``` -We have 8000 users and (as we can see from the first five rows) 4 columns of data: +We have 8000 users, and (as we can see from the first five rows) 4 columns of data: -![first 5 rows of our dataset](/user_id-sign_up_date-ad_creative-activity.png) +![first 5 rows of our dataset](../assets/use_cases/user_acquisition_analytics/user_id-sign_up_date-ad_creative-activity.png) To understand which ad creatives generated how many signups, we create a DataFrame: @@ -125,13 +117,13 @@ To understand which ad creatives generated how many signups, we create a DataFra pd.DataFrame(user_df["ad_creative"].value_counts()) ``` -which looks like this: +...which looks like this: -![ad_creatives by count](/ad_creative-count.png) +![ad_creatives by count](../assets/use_cases/user_acquisition_analytics/ad_creative-count.png) -observation: the influencer (XYZCr$$d) backed ad creatives seem to have worked better - generating many more [signups?] than the August ad creatives. +The influencer (XYZCr$$d) backed ad creatives seem to have worked better - generating many more signups (total 5145) than the August campaign ad creatives (total 2855). -Now, let's take a look at the distribution of users according to activity level. +Now, let's take a look at the **distribution of users by activity level** (api calls/day). ```python alt.Chart(user_df).mark_bar().encode( @@ -140,11 +132,9 @@ alt.Chart(user_df).mark_bar().encode( ).properties(width=600, height=400) ``` -![distribution by activity count](/user_distrib-activity_count.png) +![distribution by activity count](../assets/use_cases/user_acquisition_analytics/user_distrib-activity_count.png) -out[6]: activity = api calls / day? -The activity distribution is bimodal. Here, the first activity level group may be largely new users, who - because they signed up in the most recent (December) campaign - have less time to accumulate activity than users who signed up earlier. -Also, this informs NumberSpace parameters[=? how?] +The activity (api calls/day) distribution is bimodal. Most users are either inactive or have very low activity levels - i.e., they signed up, maybe performed a few actions and then never returned, or returned only occasionally after clicking on a subsequent campaign. The active users distribution has a mode around 0.6-0.8 api calls/day - i.e., returning every 2-3 days, triggering 1-2 api calls/day. (Note: we can also use this activity distribution to derive our NumberSpace min and max.) Now let's examine the distribution of new users per signup date. @@ -158,24 +148,34 @@ alt.Chart(dates_to_plot).mark_bar().encode( ).properties(height=400, width=1200) ``` -![new users per signup date](/new_users-signup_date.png) +![new users per signup date](../assets/use_cases/user_acquisition_analytics/new_users-signup_date.png) + +This distribution confirms that our second campaign (December) works much better than the first (August). The jump in signups at 2023-12-21 coincides with our second campaign, and our data is exclusively users who signed up by clicking on campaign ad_creatives. To get more insights from our user acquisition analysis, we need two periods that fit our distribution. 
A first period of 65 days and a second of 185 days seem appropriate.
+
+Of our 8k users, roughly 2k signed up during the first (August) campaign period, and 6k during the second (December) one. Since we already know which ad_creatives users clicked on before signing up (2855 on old campaign ads, 5145 on new), we know that some of the roughly 6k users clicked on old (August) campaign ads *after* the new (December) campaign began (possibly after seeing the new campaign ads).

-[observations] First, the distribution confirms that our second campaign (December) works much better than the first (August).
-Second, the jump in signups at 2023-12-21 is due to the second campaign (our data is exclusively campaign-related). To analyze the two campaigns, we need two periods: a first period of 65 days and a second of 185 days seems appropriate.
-Of our 8k users, roughly 2k subscribed in the first campaign, and 6k in the second. "Maybe the second push brought in subscribers intrinsically, but also through spillover to old ads [??] as well - will see that by the ad_creatives..."
-...
+Here's what we've established so far:
-summarize what we know, and what we don't know that embedding and Superlinked spaces will help reveal
+- that many more users signed up in response to the influencer-backed ad creatives (second campaign) than the first
+- which specific ad creatives generated the most user signups
+- that the vast majority of users have low to moderate activity
+
+We don't know:
+
+- which ad creatives resulted in low, medium, or high activity users
+- how users cluster in terms of signup date, activity level, *and* the ad_creatives they clicked
+- whether there are user data patterns we can mine to make better decisions when planning our next campaign
+
+Fortunately, embedding with Superlinked spaces will help provide answers to these questions, empowering us to adopt a more effective user acquisition strategy.

 ## Embedding with Superlinked

-Now, let's use Superlinked to embed our data in a semantic space -
-to
-1. inform the model re *which ad creatives* generated **signups**, and *which users* signed up...
-2. group ad creatives that have similar meanings..
+Now, let's use Superlinked to embed our data in a semantic space, so that we can:
+
+1. inform our model as to which ad creatives generated signups, and *which users* (with specific activity level distributions) signed up
+2. group specific ad creatives that have similar meanings, and possibly similar outcomes

-Define a schema for our user data:
+First, we define a schema for our user data:

 ```python
 class UserSchema(Schema):
@@ -189,17 +189,20 @@ class UserSchema(Schema):
 user = UserSchema()
 ```

-Now we create a semantic space for our ad_creatives using a text similarity model. Then encode user activity into a numerical space to represent users' activity level. We also encode the signup date into a recency space, allowing our clustering algorithm to take account of the two specific periods of signup activity (following our two campaign start dates).
+Now we create a **semantic space** for our **ad_creatives** using a text similarity model. Then, we encode user activity into a **numerical space** to represent **users' activity level**. We also encode the **signup date** into a **recency space**, allowing our clustering algorithm to take account of the two specific periods of signup activity (following our two campaign start dates).
```python
+# create a semantic space for our ad_creatives using a text similarity model
creative_space = TextSimilaritySpace(
    text=user.ad_creative, model="sentence-transformers/all-mpnet-base-v2"
)
+# encode user activity into a numerical space to represent users' activity level
activity_space = NumberSpace(
    number=user.activity, mode=Mode.SIMILAR, min_value=0.0, max_value=1.0
)
+# encode the signup date into a recency space
recency_space = RecencySpace(
    timestamp=user.signup_date,
    period_time_list=[PeriodTime(timedelta(days=65)), PeriodTime(timedelta(days=185))],
@@ -207,18 +210,16 @@ recency_space = RecencySpace(
)
```

-Now, let's plot our recency scores by date.
+Let's plot our recency scores by date.

```python
recency_plotter = RecencyPlotter(recency_space, context_data=EXECUTOR_DATA)
recency_plotter.plot_recency_curve()
```

-![](/recency_scores-by-date.png)
+![](../assets/use_cases/user_acquisition_analytics/recency_scores-by-date.png)

-[how exactly do the recency scores work?]
-
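+How do these Space parameters relate to our data? Here's a quick optional check - a plain-Python sketch, not part of the notebook - showing how far back each recency period reaches from our pinned "now" (2024-02-16), and revisiting the activity spread behind our NumberSpace bounds:
+
+```python
+# how far back does each recency period reach from our pinned "now"?
+now = datetime.fromtimestamp(NOW_TS)
+for days in (65, 185):
+    print(f"{days}-day period covers signups since {(now - timedelta(days=days)).date()}")
+# -> the 65-day period covers the December campaign (signups since 2023-12-13)
+# -> the 185-day period reaches back past the August campaign start (since 2023-08-15)
+
+# and the activity spread that informed our NumberSpace min/max choice
+print(user_df["activity"].describe())
+```
+
+Roughly speaking, a signup's recency score is highest near "now" and decays toward the edge of each period - so December signups score markedly higher than August ones.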
-Next, we set up an in-memory data processing pipeline for indexing, parsing, and executing operations on user data, including clustering (where RecencySpace lets our model take account of user signup recency).
+Next, we set up an in-memory data processing pipeline for indexing, parsing, and executing operations on user data.

 First, we create our index with the spaces we use for clustering.

@@ -232,7 +233,7 @@ Now for dataframe parsing.

```python
user_df_parser = DataFrameParser(schema=user)
```

-We create an `InMemorySource` object to hold the user data in memory, and set up our executor (with our user data source and index) so that it takes account of context data.
+We create an `InMemorySource` object to hold the user data in memory, and set up our executor (with our user data source and index) so that it takes account of context data. The executor vectorizes our data according to how the index groups our Spaces.

```python
source_user: InMemorySource = InMemorySource(user, parser=user_df_parser)
@@ -242,15 +243,17 @@ executor: InMemoryExecutor = InMemoryExecutor(
app: InMemoryApp = executor.run()
```

-And then input our user data. (This step make take a few minutes. In the meantime, why not read more about vectors in [Vectorhub](https://superlinked.com/vectorhub/).)
+Now we input our user data.

```python
source_user.put([user_df])
```

+(The step above may take a few minutes or more. In the meantime, why not learn more about vectors in [Vectorhub](https://superlinked.com/vectorhub/).)
+
 ## Load features

-Next, we collect all our vectors from the app.
+Next, we collect all our vectors from the app. The vector sampler helps us export vectors so we can cluster them and visualize them (with UMAP).

```python
vs = VectorSampler(app=app)
@@ -266,31 +269,17 @@ vector_df.shape
```

 Here are the first five rows (of 8000) of the resulting 776-column dataframe:

-![](/collected_vectors.png)
+![](../assets/use_cases/user_acquisition_analytics/collected_vectors.png)

 ## Clustering

 Next, we fit a clustering model.
+We use HDBSCAN with cosine distance; `min_cluster_size=500` means that any group of fewer than 500 users is not treated as a cluster - its members are labeled as outliers (-1).

-NOTE: If you run into issues running this notebook on Colab, we suggest using your own environment. In Colab, the management of python packages is less straight-forward, which can cause issues.
-
```python
hdbscan = HDBSCAN(min_cluster_size=500, metric="cosine")
hdbscan.fit(vector_df.values)
```

-workaround on colab
-```python
-%pip install hdbscan
-from scipy.spatial import distance
-from hdbscan import HDBSCAN
-from scipy.spatial import distance
-# Assuming vector_df is your DataFrame
-mat = distance.cdist(vector_df.values, vector_df.values, metric='cosine')
-hdbscan = HDBSCAN(min_cluster_size=500, metric='precomputed')
-hdbscan.fit(mat)
-```
-
 Let's create a DataFrame to store the cluster labels assigned by HDBSCAN and count how many users belong to each cluster:

```python
label_df = pd.DataFrame(
@@ -300,7 +289,7 @@ label_df = pd.DataFrame(
label_df["cluster_label"].value_counts()
```

-![user distribution by cluster label](/user_distrib-by-cluster_label.png)
+![user distribution by cluster label](../assets/use_cases/user_acquisition_analytics/user_distrib-by-cluster_label.png)

 ## Visualizing the data

@@ -318,10 +307,9 @@ umap_df = pd.DataFrame(
    umap_vectors, columns=["dimension_1", "dimension_2"], index=vector_df.index,
)
umap_df = umap_df.join(label_df)
```

-Next,we join our dataframes and create a chart, letting us visualize the UMAP-transformed vectors, and coloring them with cluster labels.
+Next, we create a chart from the joined dataframe, letting us visualize the UMAP-transformed vectors and color them by cluster label.

```python
-umap_df = umap_df.join(label_df)
alt.Chart(umap_df).mark_circle(size=8).encode(
    x="dimension_1", y="dimension_2", color="cluster_label:N"
).properties(
@@ -331,7 +319,6 @@ alt.Chart(umap_df).mark_circle(size=8).encode(
    anchor="middle",
).configure_legend(
    strokeColor="black",
-    fillColor="#EEEEEE",
    padding=10,
    cornerRadius=10,
    labelFontSize=14,
@@ -341,18 +328,16 @@ alt.Chart(umap_df).mark_circle(size=8).encode(
)
```

-![cluster visualization](/cluster_visualization)
-
+![cluster visualization](../assets/use_cases/user_acquisition_analytics/cluster_visualization.png)

-The dark blue clusters (label -1) are outliers - not large or dense enough to form a distinct group.
-The large number of blobs [better word?] results from the fact that 1/3 of the vector norm mass is made of not very many ad_creatives.
-Note - 2D (UMAP) visualizations often make some clusters look quite dispersed [scattered].
+The dark blue points (label -1) are outliers - not numerous or dense enough to form a distinct group. Our data points are represented by vectors with three attributes (signup recency, ad_creative text similarity, and signed-up user activity level), each attribute accounting for 1/3 of each vector's mass. The limited number of discrete ad_creatives (13) therefore tends to produce many distinct blobs in the UMAP visualization of the vector space, whereas signup_date and activity level - being more continuous variables - produce denser, more continuous spread within each blob.
+(Note: 2D UMAP visualizations can make some clusters look quite dispersed.)
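+If you want a rough numeric read on cluster separation to complement the visual check - an optional aside, not part of the notebook - scikit-learn's silhouette score works directly on the same vectors:
+
+```python
+from sklearn.metrics import silhouette_score
+
+# exclude outliers (label -1), since they aren't a real cluster
+clustered = label_df["cluster_label"] != -1
+score = silhouette_score(
+    vector_df.values[clustered.values],
+    label_df.loc[clustered, "cluster_label"],
+    metric="cosine",
+)
+print(f"silhouette score (cosine, outliers excluded): {score:.3f}")
+```
+
+Scores near 1 indicate well-separated clusters; scores near 0 indicate overlapping ones.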
 ## Understanding the cluster groups

-To understand our user clusters better, we can generate some activity histograms.
+To understand our user clusters better, we can produce some activity histograms.

-First, we join user data with cluster labels, create separate DataFrames for each cluster, generates activity histograms for each cluster, and then concatenates these histograms into a single visualization.
+First, we join user data with cluster labels, create separate DataFrames for each cluster, generate activity histograms for each cluster, and then concatenate these histograms into a single visualization.

```python
# activity histograms by cluster
@@ -366,7 +351,10 @@ by_cluster_data = {
activity_histograms = [
    alt.Chart(user_df_part)
    .mark_bar()
-    .encode(x=alt.X("activity", bin=True), y="count()")
+    .encode(
+        x=alt.X("activity", bin=True, scale=alt.Scale(domain=[0, 1.81])),
+        y=alt.Y("count()", scale=alt.Scale(domain=[0, 1000])),
+    )
    .properties(title=f"Activity histogram for cluster {label}")
    for label, user_df_part in by_cluster_data.items()
]
@@ -374,14 +362,16 @@ activity_histograms = [
alt.hconcat(*activity_histograms)
```

-![histograms of clusters](/cluster_histograms.png)
+![histograms of clusters](../assets/use_cases/user_acquisition_analytics/cluster_histograms.png)
+
+Our users' activity levels follow a power-law-like pattern that's common in user activity data: low activity users are the most numerous, medium activity users are fewer, and very active users are quite rare.

 From our histograms, we can observe that:

-- outliers (cluster -1)... [@mor - what does "outliers are the most active relatively - active users are rare" mean?]
-- cluster 2 and cluster 3 users are quite similar, positive[?], but low activity
-- cluster 0 has the highest proportion of medium activity users
-- cluster 1 users are active, "are not outliers and have a fairly balanced activity profile" [meaning?]
+- cluster -1 users are outliers - not dense enough to form a proper cluster - with a balanced profile: predominantly low activity, a few medium activity, and all of the most active users
+- cluster 0 has mostly low-to-moderate activity users, and the lowest number of (i.e., nearly zero) highly active users
+- cluster 1 has the most low-to-moderate and medium activity users, and the second largest number of highly active users, after the outlier cluster (-1)
+- clusters 2 and 3 contain predominantly low-to-moderate activity users, with a few medium activity users

 To see the distribution of ad_creatives across different clusters, we create a DataFrame that shows each ad_creative's count value within each cluster:

@@ -389,14 +379,14 @@ To see the distribution of ad_creatives across different clusters, we create a D
```python
pd.DataFrame(user_df.groupby("cluster_label")["ad_creative"].value_counts())
```

-![ad creatives distribution across clusters](ad_creatives-per-cluster.png)
+![ad creatives distribution across clusters](../assets/use_cases/user_acquisition_analytics/ad_creatives-per-cluster.png)

-observations:
+What can we observe here?

-- outliers clicked on ad_creatives from both campaigns (as expected)
-- cluster 3 clicked on only one distinct ad_creative - from the influencer based campaign
-- clusters 0 and 2 clicked on only two distinct influencer based creatives
-- cluster 1 clicked on both campaigns' ad_creatives, but more on the first (non-influencer) campaign
+- cluster -1 (with the most high and extremely high activity users) clicked on ad_creatives from both campaigns (as expected): 1324 clicks on new campaign ads, 1344 on old
+- cluster 0 clicked on only the first (non-influencer) campaign's ads; these ads focus on community and competition within the gaming platform
+- clusters 1 and 2 clicked on only two influencer campaign ad creatives each; cluster 1's ads focused on the benefits of premium membership, while cluster 2's ads highlighted experience and achievement in the gaming platform
+- cluster 3 clicked on only one ad_creative - from the influencer campaign; this ad was an immediate appeal based on promised excitement and the exclusive rewards of joining
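+(To read this kind of table as proportions rather than raw counts, a normalized crosstab - an optional one-liner, not in the notebook - shows each ad_creative's share within each cluster:)
+
+```python
+# each row is a cluster; values are the share of that cluster's signups per ad_creative
+pd.crosstab(user_df["cluster_label"], user_df["ad_creative"], normalize="index")
+```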
@@ -412,19 +402,34 @@ for col in desc.columns: desc ``` -![signup date per cluster group](/signup_date-per-cluster.png) +![signup date per cluster group](../assets/use_cases/user_acquisition_analytics/signup_date-per-cluster.png) + +What do our clusters' signup dates data indicate? + +- outliers' (cluster -1) have signup dates that are scattered across both campaign periods +- cluster 0's signups came entirely from clicks on the first campaign's ad_creatives +- clusters 1, 2, and 3 signed up in response to the new (influencer-augmented) campaign only + +## Our findings, in sum + +Let's summarize our findings. + +| cluster label | activity level | ad creative (-> signup) | signup date (campaign) | # users | % of total | +| :--- | :----------------- | :---------------------- | :----------------------- | :---------- | -------: | +| -1 (outliers) | all levels, with *many* highly active users | both campaigns (6 new, 6 old) | all | 2668 | 33% | +| 0 | low to medium | only first campaign | first campaign (5 ads) | 1511 | 19% | +| 1 | low to medium, but balanced | only 2 influencer campaign ads | influencer campaign | 1926 | 24% | +| 2 | low to medium | only 2 influencer campaign ads | influencer campaign | 805 | 10% | +| 3 | low to medium | only 1 influencer campaign ad | influencer campaign | 1090 | 14% | -observations... +Overall, the influencer-backed (i.e., second) campaign performed better. Roughly 58% of user signups came *exclusively* from clicks on second campaign ad_creatives. These ads were influencer-based, included a call to action, emphasized benefits, and had motivating language. (Two ads in particular accounted for 38% of all signups: "Unleash your gaming potential! Upgrade to premium for 2 months free and dominate the competition with XYZCr$$d!" (22%), and "Ready to level up? Join XYZCr$$d now for intense gaming battles and exclusive rewards!" (16%).) -- outliers' (cluster -1) have signup dates that are scattered -- cluster 1's signups mostly (75%) came from clicks on the first campaign's ad_creatives -- clusters 0, 2, and 3 signed up in response to the new (influencer-augmented) campaign only +Users who signed up in response to ads that focused on enhanced performance, risk-free premium benefits, and community engagement - cluster 1 - tended to be predominantly medium activity users, but also included a nontrivial number of low and high activity users. Cluster 1 users also made up the largest segment of subscriptions (35%). Both these findings suggest that this kind of ad_creative provides the best ROI. -## In sum +Cluster 0, though its users signed up exclusively in response to the first (non-influencer) campaign, is still relatively low to medium activity - its overall distribution is left of clusters 1, 2, and 3 - suggesting that users who subscribe in response to the non-influencer campaign ads are less active than users who signup after clicking on new campaign ads. Still, we need to continue monitoring user activity to see if these patterns hold over time. -Superlinked's framework enables you to perform more nuanced user acquisition analytics. +## Conclusion -lets you take advantage of the power of embedding structured *and* unstructured data (e.g., ad campaign text), giving you accurate, relevant results and insights - for more nuanced user acquisition and, more generally, behavioral analytics.. -so you can improve the effectiveness of your user acquisition (but also retention and engagement)... 
+Using Superlinked's Number, Recency, and TextSimilarity Spaces, we can embed various aspects of our user data, and create clusters on top our vector embeddings. By analyzing these clusters, we reveal patterns previously invisible in the data - which ad_creatives work well at generating signups, and how users behave *after* signups resulting from certain ad_creatives. These insights are invaluable for shaping more strategic marketing and sales decisions. -![summary of outcomes](/summary.png) \ No newline at end of file +Now it's your turn. Experiment with the Superlinked library yourself in the [notebook](https://github.com/superlinked/superlinked/blob/main/notebook/analytics_user_acquisition.ipynb)! \ No newline at end of file diff --git a/docs/assets/use_cases/user_acquisition_analytics/ad_creative-count.png b/docs/assets/use_cases/user_acquisition_analytics/ad_creative-count.png new file mode 100644 index 000000000..c9c9fdd57 Binary files /dev/null and b/docs/assets/use_cases/user_acquisition_analytics/ad_creative-count.png differ diff --git a/docs/assets/use_cases/user_acquisition_analytics/cluster_histograms.png b/docs/assets/use_cases/user_acquisition_analytics/cluster_histograms.png index 8e456376a..bd1590ab3 100644 Binary files a/docs/assets/use_cases/user_acquisition_analytics/cluster_histograms.png and b/docs/assets/use_cases/user_acquisition_analytics/cluster_histograms.png differ diff --git a/docs/assets/use_cases/user_acquisition_analytics/cluster_visualization.png b/docs/assets/use_cases/user_acquisition_analytics/cluster_visualization.png index a0247aebf..c79ef2b6e 100644 Binary files a/docs/assets/use_cases/user_acquisition_analytics/cluster_visualization.png and b/docs/assets/use_cases/user_acquisition_analytics/cluster_visualization.png differ diff --git a/docs/assets/use_cases/user_acquisition_analytics/user_id-sign_up_date-ad_creative-activity.png b/docs/assets/use_cases/user_acquisition_analytics/user_id-sign_up_date-ad_creative-activity.png new file mode 100644 index 000000000..01e45eef7 Binary files /dev/null and b/docs/assets/use_cases/user_acquisition_analytics/user_id-sign_up_date-ad_creative-activity.png differ