
Dynamic Factor Models (DFM) Implementation (GSoC 2025) #446


Draft: andreacate wants to merge 13 commits into main from DFM_draft_implementation

Conversation

@andreacate (Contributor) commented Mar 31, 2025:

Dynamic Factor Models (DFM) Implementation

This PR provides a first draft implementation of Dynamic Factor Models as part of my proposal for the PyMC GSoC 2025 project. A draft of my application report can be found at this link.

Overview

  • Added DFM.py with initial functionality

Current Status

This implementation is a work in progress, and I welcome any feedback.

Next Steps

  • Vectorize the construction of the transition and selection matrices (possibly by reordering state variables); a rough sketch of the idea is shown after this list.
  • Add support for measurement error.
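
As a rough sketch of what that vectorization could look like (illustrative only: numpy is used here for clarity, the real implementation would use pytensor, and the block layout depends on the final state ordering):

```python
import numpy as np

# Illustrative companion-form transition block for the factor VAR, assuming the
# state is ordered [f_t, f_{t-1}, ..., f_{t-p+1}]. Values here are placeholders.
k_factors, factor_order = 2, 3
factor_ar = np.zeros((k_factors, k_factors * factor_order))  # stacked [A_1, ..., A_p]

n = k_factors * factor_order
transition = np.zeros((n, n))
transition[:k_factors, :] = factor_ar      # top block row: AR coefficients
idx = np.arange(n - k_factors)
transition[k_factors + idx, idx] = 1.0     # shifted identity carries lags forward
```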

@zaxtax (Contributor) commented Apr 1, 2025:

Looks interesting! Just say when you think it's ready for review

@fonnesbeck (Member) commented:

cc @jessegrabowski


@andreacate (Contributor, Author) commented:

Thanks for the feedback!

I'm still exploring the best approach for implementing Dynamic Factor Models.
I've added a simple custom DFM model in a Jupyter notebook, which I plan to use as a prototype and testing tool while developing the main BayesianDynamicFactor class.

# Factor states
for i in range(self.k_factors):
    for lag in range(self.factor_order):
        names.append(f"factor_{i+1}_lag{lag}")
Member:

nit: I've been using stata notation for lagged states, e.g. L{lag}.factor_{i+1}

Not married to it, but consider it for consistency's sake.
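
For example, the names generated above would then look something like this (illustrative only):

```python
# Illustrative: Stata-style lag notation for the factor state names
k_factors, factor_order = 2, 2
names = [f"L{lag}.factor_{i+1}" for i in range(k_factors) for lag in range(factor_order)]
# -> ['L0.factor_1', 'L1.factor_1', 'L0.factor_2', 'L1.factor_2']
```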

if self.error_order > 0:
    for i in range(self.k_endog):
        for lag in range(self.error_order):
            names.append(f"error_{i+1}_lag{lag}")
Member:

as above


# If error_order > 0
if self.error_order > 0:
    coords["error_ar_param"] = list(range(1, self.error_order + 1))
Member:

Suggested change:
- coords["error_ar_param"] = list(range(1, self.error_order + 1))
+ coords[ERROR_AR_PARAM_DIM] = list(range(1, self.error_order + 1))

It's weird to have a global everywhere except here
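
For context, something like the following is presumably what is meant (a sketch; the constant's value is an assumption, mirroring the other dimension-name globals):

```python
# Hypothetical module-level constant, alongside the other dimension-name globals
ERROR_AR_PARAM_DIM = "error_ar_param"

# ... then in the coords construction:
error_order = 2  # example value
coords = {}
if error_order > 0:
    coords[ERROR_AR_PARAM_DIM] = list(range(1, error_order + 1))  # -> [1, 2]
```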


self.ssm["initial_state_cov", :, :] = P0

# TODO vectorize the design matrix
Member:

You're going to have to double-check all of these matrix constructions if you re-ordered the states.

@andreacate force-pushed the DFM_draft_implementation branch 2 times, most recently from 21560db to a459a1a on July 25, 2025 at 10:44
@jessegrabowski (Member) commented:

Some tests are failing due to missing constants. You might have lost some changes in the reset/rebasing process

@andreacate force-pushed the DFM_draft_implementation branch from 1c04f65 to bc3fcf2 on July 25, 2025 at 13:51
@jessegrabowski (Member) left a comment:

Left some comments. I didn't look over the tests because they still seem like WIP, but they seem to be on the right track!

Comment on lines 95 to 101
Internally, this model is represented in state-space form by stacking all current and lagged latent factors and,
if present, autoregressive observation errors into a single state vector. The full state vector has dimension
:math:`k_factors \cdot factor_order + k_endog \cdot error_order`, where :math:`k_endog` is the number of observed time series.
Member:

Show the actual transition equation that is used in block form, using the vectors/matrices that you defined above.
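
For illustration, the block-form equation being asked for might look roughly like this (a sketch only, assuming a factor VAR of order p and AR(1) observation errors; the exact blocks should match the implementation's state ordering):

```math
\begin{bmatrix} f_t \\ f_{t-1} \\ \vdots \\ f_{t-p+1} \\ \varepsilon_t \end{bmatrix}
=
\begin{bmatrix}
A_1 & A_2 & \cdots & A_p & 0 \\
I & 0 & \cdots & 0 & 0 \\
\vdots & \ddots & & \vdots & \vdots \\
0 & \cdots & I & 0 & 0 \\
0 & \cdots & 0 & 0 & \rho
\end{bmatrix}
\begin{bmatrix} f_{t-1} \\ f_{t-2} \\ \vdots \\ f_{t-p} \\ \varepsilon_{t-1} \end{bmatrix}
+
\begin{bmatrix} \eta_t \\ 0 \\ \vdots \\ 0 \\ \nu_t \end{bmatrix}
```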

@andreacate force-pushed the DFM_draft_implementation branch 3 times, most recently from 7846f15 to e15cdd3 on July 29, 2025 at 07:59
@andreacate force-pushed the DFM_draft_implementation branch from e15cdd3 to 3b8bfe4 on August 8, 2025 at 12:36
@andreacate force-pushed the DFM_draft_implementation branch from 6496f38 to 615960b on August 15, 2025 at 21:20
@jessegrabowski (Member) left a comment:

I did a deeper pass on everything except the build_symbolic_graph method. I need to spend more time on that because it's gotten quite complex.

I'll finish ASAP.

Comment on lines 21 to 22
# TODO: check test for error_var=True, since there are problems with statsmodels, the matrices looks the same by some experiments done in notebooks
# (FAILED tests/statespace/models/test_DFM.py::test_DFM_update_matches_statsmodels[True-2-2-2] - numpy.linalg.LinAlgError: 1-th leading minor of the array is not positive definite)
Member:

Could you replace this TODO with a test that fails, and mark it as xfail with this comment about statsmodels maybe doing something wrong?
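
Roughly along these lines, as a sketch (the test name and reason string are placeholders based on the failure quoted in the TODO):

```python
import pytest

@pytest.mark.xfail(
    reason="Possible statsmodels issue when error_var=True: initialization is not "
    "positive definite, even though the state-space matrices appear to match."
)
def test_DFM_update_matches_statsmodels_error_var():
    # Placeholder body; the real test would compare matrices against statsmodels.
    ...
```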

factor_order : int
Order of the VAR process for the latent factors. If 0, the factors are treated as static (no dynamics).
Therefore, the state vector will include one state per factor and "factor_ar" will not exist.
Member:

When you say "no dynamics" do you mean the estimated factors will literally be static, or just that they won't be autoregressive?

I guess I'm asking if they still get stochastic innovations when factor_order = 0

@andreacate (Contributor, Author):

Yes, sorry, that was a bit misleading. The factors won't be autoregressive, but they will still have stochastic innovations.
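
In equations, one way to read that (a sketch, not necessarily the exact parametrization used in the code):

```math
f_t = \eta_t, \qquad \eta_t \sim N(0, \Sigma_\eta)
```

i.e. each factor receives a fresh innovation every period but carries no autoregressive dynamics.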

Names of the exogenous variables. If not provided, but `k_exog` is specified, default names will be generated as `exog_1`, `exog_2`, ..., `exog_k`.
shared_exog_states: bool, optional
Whether exogenous latent states are shared across the observed states. If True, there will be only one set of exogenous latent
Member:

What do you mean by "exogenous latent state"? The learned regression coefficient states?

@andreacate (Contributor, Author):

Yes, they are the beta coefficients.
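
For reference, a generic sketch of how such regression-coefficient states usually enter a state-space model (illustrative; not necessarily the exact layout used here): the betas are carried in the state with an identity transition, and the exogenous data enters through the design matrix,

```math
\beta_t = \beta_{t-1}, \qquad
y_t = \Lambda f_t + \beta_t^\top x_t + \varepsilon_t
```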

Notes
-----
TODO: adding to notes, how exog variables are handled and add them in the example?
Member:

I think you already have them as $x_t$ in the equations?

@andreacate (Contributor, Author):

Yes, but I was also thinking about adding an explanation of how we handle the exogenous variables in the section about the state-space matrices. For example, explaining that we extend the state… Do you think that's unnecessary?

Member:

Yes, that could be nice, but consider it a stretch goal, not a must-have.
