Dynamical Factor Models (DFM) Implementation (GSOC 2025) #446
Looks interesting! Just say when you think it's ready for review.
Thanks for the feedback! I'm still exploring the best approach for implementing Dynamic Factor Models.
pymc_extras/statespace/models/DFM.py
# Factor states
for i in range(self.k_factors):
    for lag in range(self.factor_order):
        names.append(f"factor_{i+1}_lag{lag}")
nit: I've been using stata notation for lagged states, e.g. L{lag}.factor_{i+1}
Not married to it, but consider it for consistency's sake.
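A minimal sketch of what the Stata-style naming could look like, assuming the same loop order as the snippet under review. The helper name `make_state_names` and its arguments are hypothetical and are not part of the PR:

```python
def make_state_names(k_factors, factor_order, k_endog, error_order):
    """Build state names with Stata-style lag prefixes, e.g. "L1.factor_2".

    Hypothetical helper for illustration; the loop order mirrors the
    snippet under review (all lags of one factor, then the next factor).
    """
    names = []
    # Factor states: one name per (factor, lag) pair
    for i in range(k_factors):
        for lag in range(factor_order):
            names.append(f"L{lag}.factor_{i + 1}")
    # Autoregressive error states: one name per (series, lag) pair
    for i in range(k_endog):
        for lag in range(error_order):
            names.append(f"L{lag}.error_{i + 1}")
    return names


names = make_state_names(k_factors=2, factor_order=2, k_endog=1, error_order=1)
```

With these inputs the factor names come out as `L0.factor_1`, `L1.factor_1`, `L0.factor_2`, `L1.factor_2`, followed by `L0.error_1`.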
pymc_extras/statespace/models/DFM.py
if self.error_order > 0:
    for i in range(self.k_endog):
        for lag in range(self.error_order):
            names.append(f"error_{i+1}_lag{lag}")
as above
pymc_extras/statespace/models/DFM.py
# If error_order > 0
if self.error_order > 0:
    coords["error_ar_param"] = list(range(1, self.error_order + 1))
Suggested change:
- coords["error_ar_param"] = list(range(1, self.error_order + 1))
+ coords[ERROR_AR_PARAM_DIM] = list(range(1, self.error_order + 1))
It's weird to have a global everywhere except here
pymc_extras/statespace/models/DFM.py
self.ssm["initial_state_cov", :, :] = P0

# TODO vectorize the design matrix
You're going to have to double-check all of these matrix constructions if you re-ordered the states.
21560db to a459a1a
Some tests are failing due to missing constants. You might have lost some changes in the reset/rebasing process.
1c04f65 to bc3fcf2
Left some comments. I didn't look over the tests because they still seem like WIP, but seem to be on the right track!
pymc_extras/statespace/models/DFM.py
Internally, this model is represented in state-space form by stacking all current and lagged latent factors and,
if present, autoregressive observation errors into a single state vector. The full state vector has dimension
:math:`k_factors \cdot factor_order + k_endog \cdot error_order`, where :math:`k_endog` is the number of observed time series.
Show the actual transition equation that is used in block form, using the vectors/matrices that you defined above.
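For illustration, one possible block form is sketched below. It assumes the states are stacked lag by lag (all current factors, then their lags, then the errors and their lags), which may not match the ordering used in the PR. The symbols are assumptions introduced here: $p$ is `factor_order`, $q$ is `error_order`, $k$ is `k_factors`, $n$ is `k_endog`, $A_j$ are the $k \times k$ factor VAR coefficient matrices, $C_j$ are the $n \times n$ error AR coefficient matrices, and $\eta_t$, $\varepsilon_t$ are the factor and error innovations.

```latex
\alpha_t =
\begin{bmatrix} f_t \\ \vdots \\ f_{t-p+1} \\ u_t \\ \vdots \\ u_{t-q+1} \end{bmatrix},
\qquad
\alpha_{t+1} =
\begin{bmatrix} T_f & 0 \\ 0 & T_u \end{bmatrix} \alpha_t
+
\begin{bmatrix} R_f & 0 \\ 0 & R_u \end{bmatrix}
\begin{bmatrix} \eta_t \\ \varepsilon_t \end{bmatrix}

T_f =
\begin{bmatrix}
A_1 & A_2 & \cdots & A_{p-1} & A_p \\
I_k & 0 & \cdots & 0 & 0 \\
0 & I_k & \cdots & 0 & 0 \\
\vdots & & \ddots & & \vdots \\
0 & 0 & \cdots & I_k & 0
\end{bmatrix},
\qquad
T_u =
\begin{bmatrix}
C_1 & C_2 & \cdots & C_{q-1} & C_q \\
I_n & 0 & \cdots & 0 & 0 \\
0 & I_n & \cdots & 0 & 0 \\
\vdots & & \ddots & & \vdots \\
0 & 0 & \cdots & I_n & 0
\end{bmatrix}

R_f = \begin{bmatrix} I_k \\ 0_{k(p-1) \times k} \end{bmatrix},
\qquad
R_u = \begin{bmatrix} I_n \\ 0_{n(q-1) \times n} \end{bmatrix}
```

Under this layout $T_f$ is $kp \times kp$ and $T_u$ is $nq \times nq$, so the state vector has dimension $k \cdot p + n \cdot q$, in line with the docstring above.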
7846f15 to e15cdd3
e15cdd3 to 3b8bfe4
The notebook contains a comparison between the custom DFM and the implemented DFM (which has a hardcoded version of make_symbolic_graph that works only in this case).
…pymc_extras/statespace/models/structural/components/regression.py
6496f38 to 615960b
I did a deeper pass on everything except the build_symbolic_graph method. I need to spend more time on that because it's gotten quite complex. I'll finish ASAP.
tests/statespace/models/test_DFM.py
# TODO: check test for error_var=True, since there are problems with statsmodels, the matrices looks the same by some experiments done in notebooks
# (FAILED tests/statespace/models/test_DFM.py::test_DFM_update_matches_statsmodels[True-2-2-2] - numpy.linalg.LinAlgError: 1-th leading minor of the array is not positive definite)
Could you replace this TODO with a test that fails, and mark it as xfail with this comment about statsmodels maybe doing something wrong?
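A rough sketch of the xfail pattern being asked for, using `pytest.mark.xfail` with `raises` and `reason`. The test name, the helper `cholesky_of_update`, and the placeholder matrix are hypothetical stand-ins; the real test would build the DFM and compare it against statsmodels:

```python
import numpy as np
import pytest


def cholesky_of_update(cov):
    """Hypothetical stand-in for the step that raised in the failing test."""
    return np.linalg.cholesky(cov)


@pytest.mark.xfail(
    raises=np.linalg.LinAlgError,
    reason="error_var=True: possible statsmodels issue, leading minor not positive definite",
    strict=True,
)
def test_update_matches_reference_error_var():
    # Placeholder input: a symmetric matrix that is not positive definite,
    # so the Cholesky factorization raises LinAlgError
    not_positive_definite = np.array([[-1.0, 0.0], [0.0, 1.0]])
    cholesky_of_update(not_positive_definite)
```

With `strict=True` the suite reports a failure if the test ever starts passing, which would signal that the marker can be removed.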
pymc_extras/statespace/models/DFM.py
factor_order : int
    Order of the VAR process for the latent factors. If 0, the factors are treated as static (no dynamics).
    Therefore, the state vector will include one state per factor and "factor_ar" will not exist.
When you say "no dynamics" do you mean the estimated factors will literally be static, or just that they won't be autoregressive?
I guess I'm asking if they still get stochastic innovations when factor_order = 0
Yes, sorry, maybe that was a bit misleading. The factors won't be autoregressive, but they will still have stochastic innovations.
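A small simulation sketch of the distinction, assuming that factor_order = 0 corresponds to an all-zero transition block so that each factor equals its own innovation. The function `simulate_factors` and the AR(1) coefficient of 0.9 are hypothetical and chosen only for illustration:

```python
import numpy as np


def simulate_factors(n_steps, transition, seed=0):
    """Simulate f_t = T f_{t-1} + eta_t with standard-normal innovations."""
    rng = np.random.default_rng(seed)
    transition = np.asarray(transition, dtype=float)
    k_factors = transition.shape[0]
    factors = np.zeros((n_steps, k_factors))
    for t in range(1, n_steps):
        factors[t] = transition @ factors[t - 1] + rng.normal(size=k_factors)
    return factors


# factor_order = 0: the transition block is all zeros, so f_t = eta_t
white_noise = simulate_factors(200, np.zeros((2, 2)))
# factor_order = 1: a hypothetical diagonal AR(1) coefficient matrix
ar1 = simulate_factors(200, 0.9 * np.eye(2))
```

In both cases the factors vary over time; only the second carries persistence from one step to the next.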
pymc_extras/statespace/models/DFM.py
    Names of the exogenous variables. If not provided, but `k_exog` is specified, default names will be generated as `exog_1`, `exog_2`, ..., `exog_k`.
shared_exog_states: bool, optional
    Whether exogenous latent states are shared across the observed states. If True, there will be only one set of exogenous latent
What do you mean by "exogenous latent state"? The learned regression coefficient states?
Yes, they are the beta coefficients.
pymc_extras/statespace/models/DFM.py
Notes
-----
TODO: adding to notes, how exog variables are handled and add them in the example?
I think you already have them as
Yes, but I was also thinking about adding an explanation of how we handle the exogenous variables in the section about the state-space matrices. For example, explaining that we extend the state… Do you think that’s unnecessary?
Yes that could be nice, but consider it a stretch goal, not a must-have
Dynamical Factor Models (DFM) Implementation
This PR provides a first draft implementation of Dynamical Factor Models as part of my application proposal for the PyMC GSoC 2025 project. A draft of my application report can be found at this link.
Overview
DFM.py with initial functionality
Current Status
This implementation is a work in progress and I welcome any feedback.
Next Steps