
Bring structural time series functionality to PyMC #6284

Closed
drbenvincent opened this issue Nov 10, 2022 · 10 comments

Comments

@drbenvincent

The problem

PyMC already has some time series capability, but it needs to be expanded to cover Bayesian Structural Time Series (STS). We do have some good time series example notebooks:

  • One is on a Prophet-like model. This kind of approach does not look too daunting to a beginner/intermediate user, as it basically comes down to creating some Fourier features as predictor variables. That said, it would be convenient if PyMC provided a utility function to do this.
  • Another is on Structural AR time series. While this notebook is excellent, it also demonstrates that such models are non-trivial to implement, which acts as a barrier to entry for some.
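To make the first point concrete, "creating Fourier features" amounts to something like the following minimal sketch (an illustration of the idea, not an existing PyMC utility; the function name `fourier_features` is hypothetical):

```python
import math

def fourier_features(t, period, order):
    """Return [sin(2*pi*k*t/period), cos(2*pi*k*t/period)] for k = 1..order.

    These columns can be used as predictor variables in a Prophet-like
    regression to capture a periodic seasonal pattern.
    """
    feats = []
    for k in range(1, order + 1):
        angle = 2.0 * math.pi * k * t / period
        feats.append(math.sin(angle))
        feats.append(math.cos(angle))
    return feats

# Design matrix for a yearly pattern sampled monthly (period = 12), order 2
X = [fourier_features(t, period=12, order=2) for t in range(24)]
```

The features repeat exactly every `period` steps, which is what makes them useful seasonal regressors.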

What we want

If we look at the Bayesian Structural Time Series section of the excellent Bayesian Modeling and Computation in Python book (by @aloctavodia, @canyon289, and @junpenglao) then we can see that it is trivial to build an STS model using tfp.sts:

def generate_bsts_model(observed=None):
    """
    Args:
        observed: Observed time series, tfp.sts use it to generate prior.
    """
    # Trend
    trend = tfp.sts.LocalLinearTrend(observed_time_series=observed)
    # Seasonal
    seasonal = tfp.sts.Seasonal(num_seasons=12, observed_time_series=observed)
    # Full model
    return tfp.sts.Sum([trend, seasonal], observed_time_series=observed)

observed = tf.constant(us_monthly_birth["birth_in_thousands"], dtype=tf.float32)
birth_model = generate_bsts_model(observed=observed)

# Generate the posterior distribution conditioned on the observed
target_log_prob_fn = birth_model.joint_log_prob(observed_time_series=observed)

So, long story short, it would be excellent if we could expand the native PyMC time series capabilities in general, and Bayesian STS support specifically.

Note: there is already an STS implementation in JAX here. One approach would be to call the JAX code using the kind of approach outlined in How to wrap a JAX function for use in PyMC. However, @ricardoV94 and @lucianopaz point out that this approach would restrict the available backends. A native PyMC/Aesara implementation would keep the C/Python/Numba backends available.

A list of relevant useful functionality that we may want to enable is given on the tfp.sts page.

@NathanielF
Contributor

This is a great idea.

@junpenglao
Member

junpenglao commented Nov 10, 2022

cc @jessegrabowski @nicospinu

Currently, the road map is to build on top of Nicoleta's GSoC work:

  1. Add a Linear Gaussian State Space Model distribution in timeseries
  2. Add a BSTS (similar to tfp.sts) module
  3. High level API to build common BSTS (SARIMA etc)

While I don't have a timeline yet, the initial PR to add 1. is almost done. The majority of the work is adding a Kalman filter, but we need some experimentation to find the most performant approach, as otherwise scan could be quite slow.
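For readers unfamiliar with the filtering step being discussed, here is a minimal, illustrative sketch (not the PyMC implementation) of the Kalman filter log-likelihood recursion for the simplest case, a scalar local-level model. It is the per-observation loop in this recursion that `scan` has to express, which is where the performance concern comes from:

```python
import math

def local_level_loglik(y, sigma_obs2, sigma_state2, m0=0.0, P0=1e6):
    """Log-likelihood of a local-level model via the Kalman filter.

    State:       mu[t] = mu[t-1] + eta,  eta ~ N(0, sigma_state2)
    Observation: y[t]  = mu[t]   + eps,  eps ~ N(0, sigma_obs2)
    """
    m, P = m0, P0  # filtered state mean and variance (diffuse-ish prior)
    loglik = 0.0
    for obs in y:
        # Predict step: state variance grows by the state innovation variance
        P = P + sigma_state2
        # Innovation (one-step-ahead prediction error) and its variance
        v = obs - m
        F = P + sigma_obs2
        loglik += -0.5 * (math.log(2.0 * math.pi * F) + v * v / F)
        # Update step: fold the new observation into the state estimate
        K = P / F
        m = m + K * v
        P = (1.0 - K) * P
    return loglik
```

In the general linear-Gaussian state space case the scalars become matrices and each iteration involves a linear solve, which is why the per-step cost matters so much.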

@jessegrabowski
Member

jessegrabowski commented Nov 10, 2022

I have a prototype library for BSTS here: https://github.com/jessegrabowski/pymc_statespace

Kalman filtering is brutally slow for anything but the most trivial problems. I spent a good chunk of time trying to optimize the Aesara code, but I feel like I went as far as I could, given my limited knowledge of Aesara. I have been exploring alternatives, including:

  1. A pure Numba implementation via Nutpie
  2. Approximating the Kalman log-likelihood by training a surrogate neural network
  3. Approximating gradients using finite differences + an Op wrapper around Numba code

None have been particularly fruitful. (2) is a flight of fancy, and (3) hasn't ended up working very well. I think there is gold in the hills of (1), but I need to make an effort to reach out to @aseyboldt and see what needs to be done to get loop/scan support up and running in Nutpie, and whether Aesara linking could be cut out of the loop entirely to write a log-likelihood function directly with numba.

@juanitorduz
Contributor

+1 to this request. I am not an expert but would be happy to contribute (even unit-testing)

@canyon289
Member

Thanks for the tag, am excited to see this come together.

@jessegrabowski
Member

jessegrabowski commented Nov 10, 2022

Just to make the point that Numba is the way to go, I put together a short gist notebook showing the difference between a Numba-wrapped Kalman filter implementation and the pure Aesara implementation. Gradients aren't available, but using Numba gives a 4x speedup (even faster when using the slice sampler).

I think this could be pushed even higher by making several adjustments, but there is work to do. A faster Kalman filter would use linalg.solve_triangular to avoid a matrix inverse, but there's no Numba overload for solve_triangular. There are also no Numba overloads for linalg.solve_discrete_lyapunov and linalg.solve_discrete_are (nor for linalg.svd, which they also use), which could be used to get more speed. I have implementations for all of these, but they need input from someone more experienced with interfacing C/FORTRAN code with Numba. In particular, they "load" extraordinarily slowly, so the first code execution is agonizing. (Aside: is numba-scipy a dead project? I hoped submitting a draft over there would garner some comment, but it's been gathering dust for months.)
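To illustrate why solve_triangular matters here: a triangular system can be solved directly by substitution in O(n²), without ever forming an explicit inverse (which costs O(n³) and is less numerically stable). A minimal pure-Python sketch of the idea (real implementations defer to LAPACK):

```python
def forward_substitution(L, b):
    """Solve L @ x = b for lower-triangular L by forward substitution.

    Each x[i] depends only on the already-computed x[0..i-1], so the
    solve proceeds row by row and L is never inverted explicitly.
    """
    n = len(b)
    x = [0.0] * n
    for i in range(n):
        s = sum(L[i][j] * x[j] for j in range(i))
        x[i] = (b[i] - s) / L[i][i]
    return x

# Solve L @ x = b with a lower-triangular L
L = [[2.0, 0.0],
     [1.0, 3.0]]
b = [2.0, 7.0]
x = forward_substitution(L, b)  # x = [1.0, 2.0]
```

In a Kalman filter, the innovation covariance is symmetric positive definite, so a Cholesky factor plus two triangular solves replaces the matrix inverse entirely.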

I think a productive avenue of attack right now would be to get a working model compiled with nutpie and see what that offers in terms of speedup, without worrying about anything fancier. The framework I have errors out when compiling to nutpie -- something to do with the gradient of the scan in the logp. I'm going to open an issue over there and see if it goes anywhere.

@devindg

devindg commented Mar 1, 2023

I'll +1 Numba. I essentially ported R's bsts package to Python and used Numba to speed up the Kalman filter calculations. It's also important to use an efficient algorithm for getting the posterior of the state vector. To this end, I used the approach in "A simple and efficient simulation smoother for state space time series analysis" (Durbin and Koopman, 2002). The Kalman filter code can be found here. Note, however, that my package only supports static regression coefficients, even though the Kalman filter/Durbin–Koopman smoother can accommodate dynamic ones.

@mowgliamu

I was wondering if there are any updates on this? TIA

@ricardoV94
Member

I was wondering if there are any updates on this? TIA

Check out the module @jessegrabowski developed over pymc-experimental: https://discourse.pymc.io/t/pymc-experimental-now-includes-state-spaces-models/12773

@twiecki
Member

twiecki commented Nov 20, 2023

We can actually close this as completed I'd say.

@twiecki twiecki closed this as completed Nov 20, 2023