
MMM Calibration #1034

Open
PrathameshKachkure opened this issue Aug 20, 2024 · 4 comments
PrathameshKachkure commented Aug 20, 2024

While training a model I encountered an issue during calibration. We initially built a model using 2 years of data and then added synthetic spend data for a media channel over the most recent 4 weeks. This channel previously had 0 contribution, and after adding significant spend data for the last 4 weeks, it still showed 0 contribution.
To address this, we calibrated the model using incremental revenue and spend for the same 4-week period, but the channel continued to show 0 contribution. However, when we increased the training size, we started to see contribution for this media channel.
Could anyone share best practices or insights on how to effectively calibrate the model for recent periods?

@gufengzhou @laresbernardo

@laresbernardo (Collaborator)

If the entire spending period was excluded from the training dataset, then no spend data is being used for that new channel, so its coefficient will be zero. If you use all the available data and also add the calibration input (the calibration input is part of the training data), then it should have an impact larger than 0.
Note that the test periods will always be the last periods of the available data.
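A minimal, hypothetical sketch of that setup (the channel name, dates, and amounts below are illustrative, not taken from this thread): the modeling window must cover the 4-week spend period, and the calibration input for that same period is attached via a second `robyn_inputs()` call.

```r
library(Robyn)

# Lift measured for the recent 4-week period in which the channel had spend.
# All values are placeholders for illustration.
calibration_input <- data.frame(
  channel = "new_channel_spend",
  liftStartDate = as.Date("2024-07-01"), # must fall inside the modeling window
  liftEndDate = as.Date("2024-07-28"),
  liftAbs = 150000,        # incremental revenue measured by the experiment
  spend = 50000,           # spend during the lift period
  confidence = 0.9,
  metric = "revenue",
  calibration_scope = "immediate"
)

# Attach the calibration input to an existing InputCollect whose
# window_start/window_end span the 4-week spend period
InputCollect <- robyn_inputs(
  InputCollect = InputCollect,
  calibration_input = calibration_input
)
```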

@PrathameshKachkure (Author)

Thanks @laresbernardo for your response.

I have a few follow-up questions regarding calibration:

1. What should we typically expect from calibration in terms of impact on the model's outputs?
2. For the period we've calibrated, should we expect the ROI to align closely with the experiment's ROI?
3. How does calibration affect the overall contribution split between paid media, baseline, and other variables?
4. After calibration, should we generally expect the contribution of paid media variables to increase, especially if synthetic spend data was added for recent periods?

amanrai2508 commented Sep 13, 2024

Hi @laresbernardo,
Let's assume our MMM model (trained on data up to July 2024) indicated a 0% contribution for a particular channel. We then ran an experiment for August for this channel and recalibrated the model, setting ts_validation to FALSE.
In this case we should be able to get a non-zero contribution, right?

Currently we are not getting it.
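For reference, a hedged sketch of such a recalibration run (iteration and trial counts are illustrative; it assumes `InputCollect` already spans the August experiment period and carries the calibration_input). With `ts_validation = FALSE` there is no train/validation/test split, so the experiment weeks are part of the fitted window and the channel can only pick up a non-zero coefficient if its spend weeks lie inside the modeling window.

```r
library(Robyn)

# Sketch only: assumes InputCollect already includes the August data
# and the calibration_input for the experiment.
OutputModels <- robyn_run(
  InputCollect = InputCollect,
  iterations = 2000,    # illustrative
  trials = 5,           # illustrative
  ts_validation = FALSE # fit on the full window, no time-series validation split
)
```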


asana-neishabouri commented Oct 11, 2024

Hi,
I see a huge accuracy drop after Robyn calibration: R² goes from 0.79 to a negative value.
Here is my code:
```r
library(dplyr)
library(Robyn)

# Build a test/control split over the calibration window
subset_df <- df_input %>%
  filter(week >= as.Date("2024-01-01") & week <= as.Date("2024-05-26"))

n <- nrow(subset_df)
group_assignment <- rep(c("test", "control"), length.out = n)

set.seed(123)
group_assignment <- sample(group_assignment)

subset_df <- subset_df %>%
  mutate(group = group_assignment)

average_sales <- subset_df %>%
  group_by(group) %>%
  summarize(average_units = mean(dep_var))

incremental_lift <- average_sales %>%
  summarize(incremental_lift = average_units[group == "test"] - average_units[group == "control"])

total_test_spend <- subset_df %>%
  # filter(group == "test") %>%
  summarize(total_spend = sum(spend_chanl_in))

calibration_input <- data.frame(
  channel = "spend_chanl_in",
  liftStartDate = as.Date("2024-01-01"), # liftStartDate must be within input data range
  liftEndDate = as.Date("2024-05-26"),   # liftEndDate must be within input data range (here, the last week of data)
  liftAbs = incremental_lift$incremental_lift,
  spend = total_test_spend$total_spend,
  confidence = 0.95,
  metric = "dep_var",
  calibration_scope = "immediate"
)

InputCollect <- robyn_inputs(InputCollect = InputCollect, calibration_input = calibration_input)

OutputModels <- robyn_run(
  InputCollect = InputCollect, # feed in all model specification
  cores = cores_,              # NULL defaults to (max available - 1)
  iterations = iteration_,     # 2000 recommended for the dummy dataset with no calibration
  trials = trials_             # 5 recommended for the dummy dataset
  # ts_validation = TRUE
)

OutputCollect <- robyn_outputs(
  InputCollect, OutputModels,
  pareto_fronts = 5,             # manually set the number of Pareto fronts
  # pareto_fronts = "auto",      # automatically pick how many Pareto fronts fill min_candidates (100)
  min_candidates = 100,          # top Pareto models for clustering; defaults to 100
  calibration_constraint = 0.05, # range c(0.01, 0.1), default 0.1
  csv_out = "pareto",            # "pareto", "all", or NULL (for none)
  clusters = TRUE,               # cluster similar models by ROAS; see ?robyn_clusters
  # export = create_files,       # this will create files locally
  plot_folder = robyn_object,   # path for plot exports and file creation
  plot_pareto = TRUE            # FALSE deactivates plotting and saving model one-pagers
)
```
