Introduce BetaGeoNBD Random Variable #1317

PabloRoque · 2024-12-25T22:04:08Z

Description

Context: Prepares the introduction of time-invariant covariates in (M)BG/NBD models

Introduces `BetaGeoNBDRV`, contributing to CLV API Standardization #527.
`distribution_new_customer` and `distribution_new_customer_dropout` now accept `data` as Optional. This is in line with other distributions and contributes to the standardization.
Adds `distribution_new_customer_recency_frequency`, in line with the standardization.
Overrides `ModifiedBetaGeoModel.distribution_new_customer`. Note that we cannot use `super().distribution_new_customer` until a new distribution block is added for ModifiedBetaGeoRV.
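
The Optional `data` handling described above can be pictured with a minimal, library-free sketch. The class `ToyCLVModel`, its `fit_data` attribute, and the returned dictionary are hypothetical; they only illustrate falling back to the fit-time data when `data` is omitted and are not the pymc-marketing implementation.

```python
from typing import Optional


class ToyCLVModel:
    """Hypothetical stand-in for a CLV model; not the pymc-marketing class."""

    def __init__(self, fit_data: dict):
        # Data the model was fitted on, e.g. {"customer_id": [...], "T": [...]}
        self.fit_data = fit_data

    def distribution_new_customer(self, data: Optional[dict] = None) -> dict:
        # When no data is passed, fall back to the data used for fitting.
        if data is None:
            data = self.fit_data
        # A real model would draw from the posterior predictive here;
        # this sketch only reports which observation windows would be used.
        return {"n_customers": len(data["T"]), "T": list(data["T"])}


model = ToyCLVModel({"customer_id": [1, 2, 3], "T": [10.0, 20.0, 30.0]})
default_result = model.distribution_new_customer()
custom_result = model.distribution_new_customer({"customer_id": [9], "T": [52.0]})
```

Calling the method without arguments reuses the three fitted customers, while passing a dictionary evaluates a different observation window.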

Related Issue

Closes #
Related to CLV API Standardization #527 and CLV Distribution RVs not Model-Specific #128
Related to List of Continuous, Non-Contractual `clv` Models to Consider Adding #1230

Checklist

Checked that the pre-commit linting/style checks pass
Included tests that prove the fix is effective or that the new feature works
Added necessary documentation (docstrings and/or example notebooks)
If you are a pro: each commit corresponds to a relevant logical change

Modules affected

MMM
CLV

Type of change

📚 Documentation preview 📚: https://pymc-marketing--1317.org.readthedocs.build/en/1317/

ColtAllen

It's a good start, but it looks like a lot of changes were copied over from #1269. This will create merge conflicts and/or require a rebase to preserve @ricardoV94's contributions in that PR.

Can you revert the changes to the other distribution blocks? I would prefer to merge #1269 before proceeding with this one.

pymc_marketing/clv/distributions.py

pymc_marketing/clv/models/beta_geo.py

tests/clv/models/test_beta_geo.py

tests/clv/test_distributions.py

PabloRoque · 2024-12-27T15:28:18Z

@ColtAllen Thanks for taking a look at the draft PR.

> It's a good start, but it looks like a lot of changes were copied over from #1269

Indeed. I need to work on top of pymc>=5.19.1 and pytensor>=2.26.1, since that is the only way to make BLAS work on Apple Silicon without hacking C compiler flags. Because of this, I need to make the same signature changes as #1269.

So I guess we will need to leave the PR as draft until #1269 is merged. Then we can rebase.

> Can you revert the changes to the other distribution blocks? I would prefer to merge #1269 before proceeding with this one.

As explained above I cannot revert, because basically nothing runs on my setup if reverted.

ColtAllen · 2024-12-27T20:18:53Z

> As explained above I cannot revert, because basically nothing runs on my setup if reverted.

Here's a workaround for macOS.

pymc_marketing/clv/distributions.py

ColtAllen

Looks good so far! Just need to revise the sim_data function, docstrings, and a few other things.

pymc_marketing/clv/distributions.py

pymc_marketing/clv/models/beta_geo.py

tests/clv/models/test_beta_geo.py

tests/clv/test_distributions.py

ColtAllen

Almost there!

pymc_marketing/clv/distributions.py

Co-authored-by: Colt Allen <[email protected]>

ColtAllen

Thanks for adding this! This will allow use of the utils.plot_purchase_pmf function for BetaGeoModel, as well as permit arviz metrics like WAIC for out-of-sample predictive power.
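
Metrics such as WAIC need a pointwise log-likelihood, which is what a dedicated RV with a `logp` makes available. As a rough illustration, the sketch below evaluates the standard BG/NBD individual-level log-likelihood (Fader, Hardie and Lee, 2005) in plain Python. The function name `bg_nbd_logp` and its argument order are assumptions for this example and do not mirror the `BetaGeoNBD` code in this PR.

```python
import math


def bg_nbd_logp(x, t_x, T, a, b, r, alpha):
    """Log-likelihood of one customer under the BG/NBD model.

    x: number of repeat purchases, t_x: recency, T: observation window.
    a, b: Beta dropout parameters; r, alpha: Gamma purchase-rate parameters.
    """
    lg = math.lgamma
    a1 = lg(r + x) - lg(r) + r * math.log(alpha)
    a2 = lg(a + b) + lg(b + x) - lg(b) - lg(a + b + x)
    # Term for a customer who is still alive at T.
    a3 = -(r + x) * math.log(alpha + T)
    if x == 0:
        # Without repeat purchases, dropout cannot have happened yet.
        return a1 + a2 + a3
    # Term for a customer who dropped out right after the purchase at t_x.
    a4 = math.log(a) - math.log(b + x - 1) - (r + x) * math.log(alpha + t_x)
    # log(exp(a3) + exp(a4)), computed in a numerically stable way.
    hi, lo = max(a3, a4), min(a3, a4)
    return a1 + a2 + hi + math.log1p(math.exp(lo - hi))
```

For a customer with no repeat purchases the expression collapses to `r * log(alpha / (alpha + T))`, which is a convenient sanity check for any implementation.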

If you wish to create an equivalent RV for ModifiedBetaGeoModel, you can probably just inherit from these classes and override the rng_fn and logp methods.
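
The inherit-and-override suggestion can be sketched without PyMC. The classes below are hypothetical toy simulators, not the `BetaGeoNBDRV` classes from this PR: the base class simulates the BG/NBD purchase process for one customer, and the subclass overrides only the piece that differs in the modified model, where a customer may also drop out immediately after the initial purchase at time zero.

```python
import random


class ToyBetaGeoNBD:
    """Toy BG/NBD simulator for a single customer (illustrative only)."""

    def drops_out_at_time_zero(self, p, rng):
        # BG/NBD: dropout can only follow a repeat purchase.
        return False

    def rng_fn(self, rng, lam, p, T):
        """Return (recency, frequency) for purchase rate lam and dropout prob p."""
        t, recency, frequency = 0.0, 0.0, 0
        active = not self.drops_out_at_time_zero(p, rng)
        while active:
            t += rng.expovariate(lam)  # exponential inter-purchase time
            if t > T:
                break
            recency, frequency = t, frequency + 1
            active = rng.random() >= p  # dropout coin flip after each purchase
        return recency, frequency


class ToyModifiedBetaGeoNBD(ToyBetaGeoNBD):
    """Overrides only the time-zero behaviour of the parent."""

    def drops_out_at_time_zero(self, p, rng):
        # MBG/NBD: the customer may also drop out right after the first purchase.
        return rng.random() < p


rng = random.Random(0)
bg_draw = ToyBetaGeoNBD().rng_fn(rng, lam=0.5, p=1.0, T=1000.0)
mbg_draw = ToyModifiedBetaGeoNBD().rng_fn(rng, lam=0.5, p=1.0, T=1000.0)
```

With a dropout probability of 1, the base simulator records exactly one repeat purchase before the customer leaves, while the modified one records none. In an actual RandomVariable subclass, the override would target `rng_fn` and the distribution's `logp` rather than a helper method.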

PabloRoque · 2025-01-12T16:28:13Z

I truly appreciate the time you took to go over it.
Expect the PR you mention, plus one introducing covariates for each of the two models.

PabloRoque · 2025-01-14T09:59:12Z

@wd60622 Could you have a look at this PR? If you don't see any issues, could you consider merging it?

I am preparing ModifiedBetaGeoNBDRV, and it would be good to rebase onto main with this PR merged.

Thanks!

wd60622 · 2025-01-14T12:32:49Z

tests/test_mlflow.py

@@ -483,7 +483,7 @@ def test_clv_fit_mcmc(model_cls, clv_data) -> None:
    run_id = run.info.run_id
    inputs, params, metrics, tags, artifacts = get_run_data(run_id)

-    assert inputs == []
+    assert isinstance(inputs, list)


What was happening here?

PabloRoque

@pytest.mark.parametrize("model_cls", [BetaGeoModel])
def test_clv_fit_mcmc(model_cls, clv_data) -> None:
    mlflow.set_experiment("pymc-marketing-test-suite-clv")
    sampler_config = {
        "draws": 2,
        "chains": 1,
        "tune": 1,
    }
    model = model_cls(data=clv_data, sampler_config=sampler_config)
    with mlflow.start_run() as run:
        model.fit()
    assert mlflow.active_run() is None
    run_id = run.info.run_id
    inputs, params, metrics, tags, artifacts = get_run_data(run_id)
>   assert inputs == []
E   assert [<DatasetInpu...e='sample'>]>] == []
E     Left contains one more item: <DatasetInput: dataset=<Dataset: digest='cebda4ee', name='dataset', profile=('{"features_shape": {}, "features_size": ... '"mlflow.source.type": "LOCAL"}}'), source_type='code'>, tags=[<InputTag: key='mlflow.data.context', value='sample'>]>
E     Full diff:
E       [
E     -  ,
E     +  <DatasetInput: dataset=<Dataset: digest='cebda4ee', name='dataset', profile=('{"features_shape": {}, "features_size": {}, "features_nbytes": {}, '
E     +  '"targets_shape": {"recency_frequency": [4, 2]}, "targets_size": '
E     +  '{"recency_frequency": 8}, "targets_nbytes": {"recency_frequency": 64}}'), schema=None, source=('...
E
E     ...Full output truncated (4 lines hidden), use '-vv' to show
tests/test_mlflow.py:486: AssertionError

`inputs` is not an empty list.

wd60622

I see. Thanks for the traceback.

wd60622

Looks good. I just had a question about the mlflow test. I'll still merge.

PabloRoque added 2 commits December 22, 2024 16:47

Add BGNBD distribution and BGNBD Random Variable

817514d

Add BGNBD excel test

4f6fd14

github-actions bot added CLV tests labels Dec 25, 2024

PabloRoque changed the title ~~[WIP] Introduce BetaGeoNBG Random Variable~~ [WIP] Introduce BetaGeoNBD Random Variable Dec 25, 2024

ColtAllen requested changes Dec 27, 2024

pymc_marketing/clv/distributions.py (outdated, resolved)

pymc_marketing/clv/models/beta_geo.py (outdated, resolved)

tests/clv/models/test_beta_geo.py (outdated, resolved)

tests/clv/test_distributions.py (resolved)

PabloRoque added 2 commits December 27, 2024 16:30

Remove logp in terms of Potential

715d5c4

Rename BGNBD -> BetaGeoNBD

e2a0607

PabloRoque added 17 commits December 28, 2024 10:41

Add logp and test matching lifetimes

ca62118

Add logp param.type.ndim > 1. Add logp pt.switch. Related tests

ed4a0f0

Add test_bg_nbd_sample_prior

e65aa6a

Add _distribution_new_customers and related test. Rename population_d…

b1b891e

…ropout|purchase_rate as dropout|purchase_rate to consolidate naming among models

Adjust test_model_repr expected result to BetaGeoNBD instead of BGNBD

866ef8e

Improve distribution_new_customer, distribution_new_customer_purchase…

df04039

…_rate. Introduce distribution_new_customer_recency_frequency. Improve tests

Revert @pytest.mark.slow to in test_model_convergence

86364d8

Revert distribution changes related to pymc-labs#1269

256b607

Revert more changes related to pymc-labs#1269

5bf47d7

Revert BetaGeoBetaBinomRV changes

7aa31f7

Revert ParetoNBDRV changes

ef55998

Docstring cleanup

4f91de5

Revert changes in ContContract dist

3c5260c

Clean ContContract changes

21cb076

Revert deletion _supp_shape_from_params

55da4d1

Remove commented chunk on fit_result. Opted for data to standardize w…

8beb7fc

…ith other CLV models

BetaGeoNBDRV as pre-pymc-labs#1269 definition

4c46ae8

wd60622 reviewed Jan 2, 2025

pymc_marketing/clv/distributions.py (outdated, resolved)

Fix test_numerically_stable_logp

9356052

PabloRoque marked this pull request as ready for review January 8, 2025 14:17

PabloRoque added 3 commits January 9, 2025 09:59

Merge branch 'main' into BGNBDRV

60c35d3

Adapt BetaGeoNBDRV to pymc-labs#1269

bcc8fa7

Fix test_clv_fit_mcmc

17818ae

ColtAllen requested changes Jan 10, 2025

PabloRoque added 7 commits January 10, 2025 10:19

Modify sim_data to reflect the beta-distributed dropout process

15503ec

Add reference to BetaGeoNBD

db8c796

Delete _logp

378220e

Delete commented weights param in test_bg_nbd

7b9508f

Ammend BetaGeoNBD docstring

c4edfa9

Fix BetaGeoNBD math

5d0db19

Fix test_posterior_distributions to include dropout distributions

7d05269

PabloRoque requested a review from ColtAllen January 10, 2025 15:59

PabloRoque added 2 commits January 10, 2025 16:59

Merge branch 'main' into BGNBDRV

df6a304

Fix #NUM! docstring

ba3625c

ColtAllen requested changes Jan 10, 2025

pymc_marketing/clv/distributions.py (outdated, resolved)

pymc_marketing/clv/distributions.py (resolved)

PabloRoque and others added 3 commits January 11, 2025 00:19

Tweak sim_data

911f8a0

Merge branch 'main' into BGNBDRV

0db4e14

Add co-author.

bb4e82b

Co-authored-by: Colt Allen <[email protected]>

PabloRoque requested a review from ColtAllen January 11, 2025 00:12

ColtAllen approved these changes Jan 12, 2025

PabloRoque requested a review from wd60622 January 13, 2025 10:15

Merge branch 'main' into BGNBDRV

e9fe1ac

wd60622 reviewed Jan 14, 2025

wd60622 added the enhancement New feature or request label Jan 14, 2025

wd60622 approved these changes Jan 14, 2025

wd60622 merged commit 8d94482 into pymc-labs:main Jan 14, 2025
20 checks passed

PabloRoque deleted the BGNBDRV branch January 14, 2025 15:49

PabloRoque mentioned this pull request Jan 15, 2025

Implement ModifiedBetaGeoNBD and ModifiedBetaGeoNBDRV #1375

Merged

14 tasks
