Introduce BetaGeoNBD Random Variable #1317
It's a good start, but it looks like a lot of changes were copied over from #1269. This will create merge conflicts and/or require a rebase to preserve @ricardoV94's contributions in that PR.
Can you revert the changes to the other distribution blocks? I would prefer to merge #1269 before proceeding with this one.
@ColtAllen Thanks for taking a look at the draft PR.
Indeed. I need to work on top of pymc>=5.19.1 and pytensor>=2.26.1, since that is the only way to make BLAS work on Apple silicon without hacking C compiler flags. Because of this, I need to make the same signature changes as #1269. So I guess we will need to leave this PR as a draft until #1269 is merged; then we can rebase.
As explained above, I cannot revert, because basically nothing runs on my setup if I do.
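For anyone checking whether their own environment meets the minimums quoted above, here is a minimal sketch using only the standard library. The helper names (`parse_version`, `meets_minimum`, `check_environment`) are made up for illustration, and the parser only understands plain dotted numeric versions, so treat it as a rough check and not a replacement for a real version parser.

```python
from importlib.metadata import PackageNotFoundError, version


def parse_version(text: str) -> tuple[int, ...]:
    """Turn '5.19.1' into (5, 19, 1); stops at the first non-numeric segment."""
    parts = []
    for segment in text.split("."):
        if not segment.isdigit():
            break
        parts.append(int(segment))
    return tuple(parts)


def meets_minimum(installed: str, minimum: str) -> bool:
    """Compare versions numerically, so that 5.9 sorts below 5.19."""
    return parse_version(installed) >= parse_version(minimum)


# Minimum versions quoted in the comment above.
MINIMUMS = {"pymc": "5.19.1", "pytensor": "2.26.1"}


def check_environment(minimums=MINIMUMS):
    """Map each package to True/False, or None if it is not installed."""
    report = {}
    for name, minimum in minimums.items():
        try:
            report[name] = meets_minimum(version(name), minimum)
        except PackageNotFoundError:
            report[name] = None
    return report


if __name__ == "__main__":
    print(check_environment())
```

Comparing integer tuples instead of raw strings matters here: as strings, "5.9.9" would sort above "5.19.1".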
Here's a workaround for macOS.
…ropout|purchase_rate as dropout|purchase_rate to consolidate naming among models
…_rate. Introduce distribution_new_customer_recency_frequency. Improve tests
…ith other CLV models
Looks good so far! Just need to revise the sim_data function, docstrings, and a few other things.
Almost there!
Co-authored-by: Colt Allen <[email protected]>
Thanks for adding this! This will allow use of the utils.plot_purchase_pmf function for BetaGeoModel, as well as permit arviz metrics like WAIC for out-of-sample predictive power.
If you wish to create an equivalent RV for ModifiedBetaGeoModel, you can probably just inherit from these classes and override the rng_fn and logp methods.
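To make the inheritance idea concrete, here is a rough standard-library sketch of the BG/NBD generative process and a modified variant that subclasses it. This is not the code from this PR and not the pymc-marketing API: the class names, the `drops_out_before_first_purchase` hook, and the parameter values in the usage lines are all hypothetical, `logp` is omitted, and the subclass overrides a small hook instead of `rng_fn` itself to keep the example short.

```python
import random


class BetaGeoNBDSimulator:
    """Hypothetical stand-in for an RV class; not the pymc-marketing API."""

    def __init__(self, a, b, r, alpha, T, seed=None):
        self.a, self.b, self.r, self.alpha, self.T = a, b, r, alpha, T
        self.rng = random.Random(seed)

    def drops_out_before_first_purchase(self, p):
        # BG/NBD: dropout can only happen right after a repeat purchase.
        return False

    def rng_fn(self):
        """Draw one (recency, frequency) pair for an observation window T."""
        # Purchase rate ~ Gamma(shape=r, rate=alpha); gammavariate takes a scale.
        lam = self.rng.gammavariate(self.r, 1.0 / self.alpha)
        # Dropout probability ~ Beta(a, b).
        p = self.rng.betavariate(self.a, self.b)
        t, recency, frequency = 0.0, 0.0, 0
        alive = not self.drops_out_before_first_purchase(p)
        while alive:
            # Exponential waiting time until the next repeat purchase.
            t += self.rng.expovariate(lam)
            if t > self.T:
                break  # the purchase falls outside the observation window
            recency, frequency = t, frequency + 1
            # After each repeat purchase the customer drops out with prob. p.
            alive = self.rng.random() >= p
        return recency, frequency


class ModifiedBetaGeoNBDSimulator(BetaGeoNBDSimulator):
    """MBG/NBD variant: only the dropout-at-time-zero behaviour changes."""

    def drops_out_before_first_purchase(self, p):
        # MBG/NBD: a customer may also drop out at time zero with prob. p.
        return self.rng.random() < p


if __name__ == "__main__":
    sim = ModifiedBetaGeoNBDSimulator(a=0.8, b=2.5, r=0.25, alpha=4.4, T=40, seed=0)
    print([sim.rng_fn() for _ in range(3)])
```

Keeping the shared sampling loop in the base class means the modified model only has to state what differs, which is the same motivation as inheriting from the new RV classes.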
I truly appreciate the time you took to go over it.
@wd60622 Could you have a look at this PR? If you don't see any issues, could you consider merging it? I am preparing Thanks!
@@ -483,7 +483,7 @@ def test_clv_fit_mcmc(model_cls, clv_data) -> None:
     run_id = run.info.run_id
     inputs, params, metrics, tags, artifacts = get_run_data(run_id)

-    assert inputs == []
+    assert isinstance(inputs, list)
What was happening here?
@pytest.mark.parametrize("model_cls", [BetaGeoModel])
def test_clv_fit_mcmc(model_cls, clv_data) -> None:
    mlflow.set_experiment("pymc-marketing-test-suite-clv")
    sampler_config = {
        "draws": 2,
        "chains": 1,
        "tune": 1,
    }
    model = model_cls(data=clv_data, sampler_config=sampler_config)
    with mlflow.start_run() as run:
        model.fit()
    assert mlflow.active_run() is None
    run_id = run.info.run_id
    inputs, params, metrics, tags, artifacts = get_run_data(run_id)
>   assert inputs == []
E assert [<DatasetInpu...e='sample'>]>] == []
E Left contains one more item: <DatasetInput: dataset=<Dataset: digest='cebda4ee', name='dataset', profile=('{"features_shape": {}, "features_size": ... '"mlflow.source.type": "LOCAL"}}'), source_type='code'>, tags=[<InputTag: key='mlflow.data.context', value='sample'>]>
E Full diff:
E [
E - ,
E + <DatasetInput: dataset=<Dataset: digest='cebda4ee', name='dataset', profile=('{"features_shape": {}, "features_size": {}, "features_nbytes": {}, '
E + '"targets_shape": {"recency_frequency": [4, 2]}, "targets_size": '
E + '{"recency_frequency": 8}, "targets_nbytes": {"recency_frequency": 64}}'), schema=None, source=('...
E
E ...Full output truncated (4 lines hidden), use '-vv' to show
tests/test_mlflow.py:486: AssertionError
inputs is not an empty array
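A small illustration of why the two assertions behave differently once a dataset gets logged. `LoggedInput` is a hypothetical stand-in for the `DatasetInput` objects shown in the traceback, so this runs without mlflow and makes no claim about mlflow's own classes.

```python
from dataclasses import dataclass


@dataclass
class LoggedInput:
    """Hypothetical stand-in for the DatasetInput objects in the traceback."""

    name: str


def is_exactly_empty(inputs) -> bool:
    # The old assertion: only passes when nothing at all was logged.
    return inputs == []


def is_list(inputs) -> bool:
    # The new assertion: passes for any list, empty or not.
    return isinstance(inputs, list)


no_inputs = []
one_input = [LoggedInput(name="dataset")]

print(is_exactly_empty(no_inputs), is_list(no_inputs))
print(is_exactly_empty(one_input), is_list(one_input))
```

The type check is looser: it no longer fails when an input is logged, but it also no longer verifies what was logged.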
I see. Thanks for the traceback.
Looks good. I just had a question about the MLflow test. I'll still merge.
Description
Context: Prepares the introduction of time-invariant covariates in (M)BG/NBD models
- distribution_new_customer | distribution_new_customer_dropout now accept data as Optional. This is in line with other distributions, contributing to the standardization
- distribution_new_customer_recency_frequency in line with standardization
- ModifiedBetaGeoModel.distribution_new_customer. Note that we cannot use super().distribution_new_customer until a new distribution block is added for ModifiedBetaGeoRV
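The "data as Optional" pattern described above can be sketched as follows. The class names, the dict-based data, and the returned summary are all placeholders for illustration and do not mirror the real pymc-marketing signatures; the subclass also shows the situation from the last item, where the child keeps its own implementation because the parent's is not yet suitable for it.

```python
from typing import Optional


class SketchBetaGeoModel:
    """Hypothetical model class; not the pymc-marketing implementation."""

    name = "BG/NBD"

    def __init__(self, data: dict):
        self.data = data

    def distribution_new_customer(self, data: Optional[dict] = None) -> dict:
        # Fall back to the data the model was built with when none is passed.
        if data is None:
            data = self.data
        return {"model": self.name, "n_customers": len(data["customer_id"])}


class SketchModifiedBetaGeoModel(SketchBetaGeoModel):
    """Hypothetical subclass that cannot yet reuse the parent's method."""

    name = "MBG/NBD"

    def distribution_new_customer(self, data: Optional[dict] = None) -> dict:
        # Own implementation instead of super(): in this sketch the parent
        # is assumed to rely on a distribution block the child does not have.
        if data is None:
            data = self.data
        return {
            "model": self.name,
            "n_customers": len(data["customer_id"]),
            "uses_super": False,
        }


if __name__ == "__main__":
    model = SketchBetaGeoModel({"customer_id": [1, 2, 3]})
    print(model.distribution_new_customer())
    print(model.distribution_new_customer({"customer_id": [9]}))
```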
Related Issue
Checklist
Modules affected
Type of change
📚 Documentation preview 📚: https://pymc-marketing--1317.org.readthedocs.build/en/1317/