Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introduce BetaGeoNBD Random Variable #1317

Merged
merged 42 commits into from
Jan 14, 2025
Merged

Conversation

PabloRoque
Copy link
Contributor

@PabloRoque PabloRoque commented Dec 25, 2024

Description

Context: Prepares the introduction of time-invariant covariates in (M)BG/NBD models

  • Introduces BetaGeoNBDRV contributing to CLV API Standardization #527
  • distribution_new_customer | distribution_new_customer_dropout now accept data as Optional. This is in line with other distributions, contributing to the standardization
  • Adds distribution_new_customer_recency_frequency in line with standardization
  • Overrides ModifiedBetaGeoModel.distribution_new_customer. Note that we cannot use super().distribution_new_customer until a new distribution block is added for ModifiedBetaGeoRV

Related Issue

Checklist

Modules affected

  • MMM
  • CLV

Type of change

  • New feature / enhancement
  • Bug fix
  • Documentation
  • Maintenance
  • Other (please specify):

📚 Documentation preview 📚: https://pymc-marketing--1317.org.readthedocs.build/en/1317/

@PabloRoque PabloRoque changed the title [WIP] Introduce BetaGeoNBG Random Variable [WIP] Introduce BetaGeoNBD Random Variable Dec 25, 2024
Copy link
Collaborator

@ColtAllen ColtAllen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a good start, but it looks like a lot of changes were copied over from #1269. This will create merge conflicts and/or require a rebase to preserve @ricardoV94's contributions in that PR.

Can you revert the changes to the other distribution blocks? I would prefer to merge #1269 before proceeding with this one.

pymc_marketing/clv/distributions.py Outdated Show resolved Hide resolved
pymc_marketing/clv/models/beta_geo.py Outdated Show resolved Hide resolved
tests/clv/models/test_beta_geo.py Outdated Show resolved Hide resolved
tests/clv/test_distributions.py Show resolved Hide resolved
@PabloRoque
Copy link
Contributor Author

PabloRoque commented Dec 27, 2024

@ColtAllen Thanks for taking a look at the draft PR

It's a good start, but it looks like a lot of changes were copied over from #1269

Indeed. I need to work on top of pymc>=5.19.1 and pytensor>=2.26.1 [since it's the only way to make blas work on apple silicon without having to hack c-compiler flags]. Due to this I need to make changes to the signatures just like #1269

So I guess we will need to leave the PR as draft until #1269 is merged. Then we can rebase.

Can you revert the changes to the other distribution blocks? I would prefer to merge #1269 before proceeding with this one.

As explained above I cannot revert, because basically nothing runs on my setup if reverted.

@ColtAllen
Copy link
Collaborator

As explained above I cannot revert, because basically nothing runs on my setup if reverted.

Here's a workaround for MacOS.

@PabloRoque PabloRoque marked this pull request as ready for review January 8, 2025 14:17
Copy link
Collaborator

@ColtAllen ColtAllen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good so far! Just need to revise the sim_data function, docstrings, and a few other things.

pymc_marketing/clv/distributions.py Outdated Show resolved Hide resolved
pymc_marketing/clv/distributions.py Outdated Show resolved Hide resolved
pymc_marketing/clv/distributions.py Outdated Show resolved Hide resolved
pymc_marketing/clv/models/beta_geo.py Outdated Show resolved Hide resolved
tests/clv/models/test_beta_geo.py Outdated Show resolved Hide resolved
tests/clv/test_distributions.py Outdated Show resolved Hide resolved
@PabloRoque PabloRoque requested a review from ColtAllen January 10, 2025 15:59
Copy link
Collaborator

@ColtAllen ColtAllen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Almost there!

pymc_marketing/clv/distributions.py Outdated Show resolved Hide resolved
pymc_marketing/clv/distributions.py Show resolved Hide resolved
@PabloRoque PabloRoque requested a review from ColtAllen January 11, 2025 00:12
Copy link
Collaborator

@ColtAllen ColtAllen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for adding this! This will allow use of the utils.plot_purchase_pmf function for BetaGeoModel, as well as permit arviz metrics like WAIC for out-of-sample predictive power.

If you wish to create an equivalent RV for ModifiedBetaGeoModel, you can probably just inherit from these classes and override the rng_fn and logp methods.

@PabloRoque
Copy link
Contributor Author

I truly appreciate the time you took to go over it.
Expect that PR you mention, and one introducing covars for each of the 2 models.

@PabloRoque PabloRoque requested a review from wd60622 January 13, 2025 10:15
@PabloRoque
Copy link
Contributor Author

@wd60622 Could you have a look at this PR? If you don't see any issues could you consider merging it?

I am preparing ModifiedBetaGeoNBDRV and would be good to rebase to main with this PR merged.

Thanks!

@@ -483,7 +483,7 @@ def test_clv_fit_mcmc(model_cls, clv_data) -> None:
run_id = run.info.run_id
inputs, params, metrics, tags, artifacts = get_run_data(run_id)

assert inputs == []
assert isinstance(inputs, list)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what was happening here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pytest.mark.parametrize("model_cls", [BetaGeoModel])
    def test_clv_fit_mcmc(model_cls, clv_data) -> None:
        mlflow.set_experiment("pymc-marketing-test-suite-clv")
    
        sampler_config = {
            "draws": 2,
            "chains": 1,
            "tune": 1,
        }
    
        model = model_cls(data=clv_data, sampler_config=sampler_config)
        with mlflow.start_run() as run:
            model.fit()
    
        assert mlflow.active_run() is None
    
        run_id = run.info.run_id
        inputs, params, metrics, tags, artifacts = get_run_data(run_id)
    
>       assert inputs == []
E       assert [<DatasetInpu...e='sample'>]>] == []
E         Left contains one more item: <DatasetInput: dataset=<Dataset: digest='cebda4ee', name='dataset', profile=('{"features_shape": {}, "features_size": ... '"mlflow.source.type": "LOCAL"}}'), source_type='code'>, tags=[<InputTag: key='mlflow.data.context', value='sample'>]>
E         Full diff:
E           [
E         -  ,
E         +  <DatasetInput: dataset=<Dataset: digest='cebda4ee', name='dataset', profile=('{"features_shape": {}, "features_size": {}, "features_nbytes": {}, '
E         +  '"targets_shape": {"recency_frequency": [4, 2]}, "targets_size": '
E         +  '{"recency_frequency": 8}, "targets_nbytes": {"recency_frequency": 64}}'), schema=None, source=('...
E         
E         ...Full output truncated (4 lines hidden), use '-vv' to show

tests/test_mlflow.py:486: AssertionError

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

inputs is not an empty array

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. thanks for the traceback

@wd60622 wd60622 added the enhancement New feature or request label Jan 14, 2025
Copy link
Contributor

@wd60622 wd60622 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. I just had question about the mlflow test. I'll merge still

@wd60622 wd60622 merged commit 8d94482 into pymc-labs:main Jan 14, 2025
20 checks passed
@PabloRoque PabloRoque deleted the BGNBDRV branch January 14, 2025 15:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CLV enhancement New feature or request tests
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants