Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

AutoARIMA with CSS method does not initialise residuals to zero #957

Closed
andrewscottm opened this issue Dec 19, 2024 · 1 comment · Fixed by #959
Closed

AutoARIMA with CSS method does not initialise residuals to zero #957

andrewscottm opened this issue Dec 19, 2024 · 1 comment · Fixed by #959
Labels

Comments

@andrewscottm
Copy link
Contributor

andrewscottm commented Dec 19, 2024

What happened + What you expected to happen

AutoARIMA with the CSS method includes random garbage values in the residuals, sigma2 and the forecast's fitted values. The issue is traced to the function arima_css, which does not initialise the returned array of residuals. A fix might be to add

  for (size_t l = 0; l < ncond && l < n; ++l) {
    resid[l] = 0;
  }

before L151.

Versions / Dependencies

statsforecast 2.0.0
numpy 2.0.2
pandas 2.2.3
Python 3.11.10 (main, Oct 3 2024, 02:37:52) [Clang 14.0.6 ]
MacOS 15.2

Reproducible example

Ignoring some warnings due to extreme example chosen to highlight problem.

import warnings
warnings.simplefilter("ignore", UserWarning)
from scipy.optimize import OptimizeWarning
warnings.simplefilter("ignore", OptimizeWarning)

import numpy as np
import pandas as pd
from statsforecast import StatsForecast
from statsforecast.models import AutoARIMA
from statsforecast.arima import arima_string

df = pd.DataFrame({
    'unique_id': '0',
    'ds': pd.date_range(start='2000-01-01', periods=185, freq='1D'),
    'y': np.array([ 
          0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
          2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2.,
          2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2.,
          2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2.,
          2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2.,
          2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2.,
          2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2.,
          2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2.,
          2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2.,
          2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2.,
          2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2.,
          2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2.,
          2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2.,
          2., 2., 2., 2., 2., 2., 2., 2., 0., 9., 0., 9., 0.,
          3., 3., 3.])
    })


m = StatsForecast(models=[AutoARIMA(season_length=8, allowdrift=False, allowmean=False, method='CSS')], freq='1D', n_jobs=1)
m.fit(df=df)

model = m.fitted_[0][0].model_
print(arima_string(model))

arma = model['arma']
residuals = model['residuals']
sigma2 = model['sigma2']
ncond = arma[0] + arma[5] + arma[4] * (arma[2] + arma[6])
print(f"sigma2: {model['sigma2']}")
print(f"first {ncond} residuals should be zero: \n{residuals[:ncond]}")

denom = np.nansum(residuals ** 2) / model['sigma2']
sigma2_corrected = np.nansum(residuals[(arma[0] + arma[5] + arma[4] * (arma[2] + arma[6])):] ** 2) / denom
print(f"sigma2_corrected: {sigma2_corrected}")

Multiple runs will produce different results, depending on what the C array initialised to in memory:

ARIMA(5,1,0)(2,0,0)[8]                   
sigma2: 0.4799087477689993
first 22 residuals should be zero: 
[0.0e+000 2.9e-322 0.0e+000 2.9e-322 0.0e+000 0.0e+000 0.0e+000 0.0e+000
 0.0e+000 0.0e+000 0.0e+000 0.0e+000 0.0e+000 2.0e+000 2.0e+000 2.0e+000
 2.0e+000 2.0e+000 2.0e+000 2.0e+000 2.0e+000 2.0e+000]
sigma2_corrected: 0.27651891726052474

ARIMA(5,1,0)(2,0,0)[8]                   
sigma2: 0.2779313466390558
first 22 residuals should be zero: 
[ 0.00000000e+000  2.91498731e-322  6.92791425e-310  1.06718180e-321
  0.00000000e+000  0.00000000e+000  0.00000000e+000  0.00000000e+000
  0.00000000e+000  0.00000000e+000  0.00000000e+000  0.00000000e+000
  0.00000000e+000  0.00000000e+000  0.00000000e+000  0.00000000e+000
  0.00000000e+000  0.00000000e+000  0.00000000e+000  0.00000000e+000
 -5.00000000e-001  0.00000000e+000]
sigma2_corrected: 0.2765189172605247

ARIMA(5,1,0)(2,0,0)[8]                   
sigma2: 1.310270097924344
first 22 residuals should be zero: 
[ 0.00000000e+000  6.16388421e+000  6.92791425e-310  6.20827629e+000
  8.27310868e+000  1.82447942e-008 -1.51752349e-008  1.50125807e-008
  8.97511508e-008  4.64459977e-008 -1.37124736e-009  4.93928500e-008
  5.89794431e-008  1.08460057e-008 -9.49380437e-009  1.01675170e-008
  8.68460121e-008  4.67413914e-008 -8.23158693e-010  4.70780216e-008
  6.27358661e-008  6.16388421e+000]
sigma2_corrected: 0.27651891726052474

Fitted values include the garbage residuals too:

forecast = m.forecast(df=df, h=1, fitted=True)
fitted_values = m.forecast_fitted_values()
print(fitted_values.iloc[:25, :])

   unique_id         ds    y     AutoARIMA
0          0 2000-01-01  0.0  0.000000e+00
1          0 2000-01-02  0.0 -6.163884e+00
2          0 2000-01-03  0.0  0.000000e+00
3          0 2000-01-04  0.0 -6.208276e+00
4          0 2000-01-05  0.0 -8.273109e+00
5          0 2000-01-06  0.0 -1.824479e-08
6          0 2000-01-07  0.0  1.517523e-08
7          0 2000-01-08  0.0 -1.501258e-08
8          0 2000-01-09  0.0 -8.975115e-08
9          0 2000-01-10  0.0 -4.644600e-08
10         0 2000-01-11  0.0  1.371247e-09
11         0 2000-01-12  0.0 -4.939285e-08
12         0 2000-01-13  0.0 -5.897944e-08
13         0 2000-01-14  2.0  2.000000e+00
14         0 2000-01-15  2.0  2.000000e+00
15         0 2000-01-16  2.0  2.000000e+00
16         0 2000-01-17  2.0  2.000000e+00
17         0 2000-01-18  2.0  2.000000e+00
18         0 2000-01-19  2.0  2.000000e+00
19         0 2000-01-20  2.0  2.000000e+00
20         0 2000-01-21  2.0  2.000000e+00
21         0 2000-01-22  2.0 -4.163884e+00
22         0 2000-01-23  2.0  2.000000e+00
23         0 2000-01-24  2.0  2.000000e+00
24         0 2000-01-25  2.0  2.000000e+00

Issue Severity

Low: It annoys or frustrates me.

@jmoralez
Copy link
Member

Thanks a lot for the detailed report. Indeed, seems like I missed this line

resid[:ncond] = 0.0
when migrating. I'll push a fix.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants