CUSUM.py get_chu_stinchcombe_white_statistics speed-up and bug fix #417
Comments
@jasonharris438 Thank you for these improvements! I wrote a list of tests to run in the comments to Issue #348.
@jasonharris438 Also, please be sure to make your local branch from
Thanks @PanPip, I just sent you a message on Slack.
Hi @jasonharris438, has this ticket been resolved?
Hi @Jackal08, this change requires a change to the unit test script. I'll outline it for you over the weekend.
Hey @Jackal08, maybe don't worry about this one. I've had far too much on my plate over the last couple of months and have only just caught up on the discussion around the source code going private. I've been working and studying full time during the stage 4.5 lockdown while also searching for a career change, so unfortunately I haven't had the time to revisit this library. My fork is now quite out of date, so without being able to see the latest version I would probably struggle to put together a PR that is of much benefit. Effectively, what I did here requires a change to the unit test, as it was not asserting the correct output. I've attached my version of the CUSUM.py file; it's very similar to the original, just with the inner loop removed and a fix to the calculation of the denominator on line 55 (it was previously using the variance, whereas it needs to be the standard deviation).
@jasonharris438 I tried your CUSUM and it gives a totally different result from the original code; is this indeed correct? For example, on the same data, the first chart uses the original code with stat > critical value, and the second uses yours.
There is one minor issue in the existing function that I have fixed in my proposed solution; however, the main purpose here is a speed-up.
Change No. 1:
Bug on line 55. Currently: `s_n_t = 1 / (sigma_sq_t * np.sqrt(integer_index - temp_integer_index)) * values_diff`
Change to: `s_n_t = 1 / (np.sqrt(sigma_sq_t * (integer_index - temp_integer_index))) * values_diff`
Taking the square root of sigma_sq_t gives \(\hat\sigma_{t}\), which correctly standardizes the statistic according to the original paper and Marcos' book:
$$S_{n,t} = (y_{t} - y_{n})\,(\hat\sigma_{t}\sqrt{t-n})^{-1}, \qquad t > n$$
The test statistics are currently disproportionately large compared to the critical values as they are not scaled correctly.
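For illustration, here is a minimal standalone sketch (with made-up numbers, not library code) showing how much the incorrect scaling inflates the statistic:

```python
import numpy as np

# Hypothetical values for illustration only
sigma_sq_t = 2.5e-5          # variance estimate of log-price increments up to time t
integer_index = 500          # t
temp_integer_index = 350     # n
values_diff = 0.012          # y_t - y_n

# Current (incorrect) scaling: divides by the variance instead of the standard deviation
s_n_t_old = 1 / (sigma_sq_t * np.sqrt(integer_index - temp_integer_index)) * values_diff

# Proposed fix: divide by sigma_hat_t * sqrt(t - n), i.e. take the square root of the variance
s_n_t_new = 1 / np.sqrt(sigma_sq_t * (integer_index - temp_integer_index)) * values_diff

print(s_n_t_old, s_n_t_new)  # with these inputs the old statistic is roughly 200x larger
```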
To Reproduce
The script from the docs is sufficient to see this.
```python
import pandas as pd
import numpy as np
from mlfinlab.structural_breaks import (get_chu_stinchcombe_white_statistics,
                                        get_chow_type_stat, get_sadf)

# Importing price data
bars = pd.read_csv('BARS_PATH', index_col=0, parse_dates=[0])

# Changing to log prices data
log_prices = np.log(bars.close)  # see p.253, 17.4.2.1 Raw vs Log Prices

# Chu-Stinchcombe test (one-sided and two-sided)
one_sided_test = get_chu_stinchcombe_white_statistics(log_prices, test_type='one_sided')
two_sided_test = get_chu_stinchcombe_white_statistics(log_prices, test_type='two_sided')
```
Change No. 2:
I have implemented a modified _get_s_n_for_t() function that results in a large time saving. Currently the script takes about 7.8 minutes to run on a simulated dataset of 10,000 points. The proposed changes result in a running time of 0.45 minutes on the same dataset.
This also removes the need for _get_values_diff(), as it adds unnecessary overhead.
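For context, here is a minimal sketch of the kind of vectorisation described above, removing the inner loop over n so that all differences y_t - y_n are computed in one array operation. The function name, signature, and the critical-value constant (b_alpha ≈ 4.6) are illustrative assumptions, not the library's actual API:

```python
import numpy as np
import pandas as pd

def _get_s_n_for_t_vectorised(series: pd.Series, t_index: int, sigma_sq_t: float):
    """Sketch: compute S_{n,t} for every n < t in one vectorised pass,
    replacing the inner Python loop (and the separate values-diff helper)."""
    y_t = series.iloc[t_index]

    # All candidate start points n = 1, ..., t - 1 at once
    n_indices = np.arange(1, t_index)
    values_diff = y_t - series.iloc[1:t_index].to_numpy()   # y_t - y_n for every n

    # Corrected scaling: sigma_hat_t * sqrt(t - n)
    s_n_t = values_diff / np.sqrt(sigma_sq_t * (t_index - n_indices))

    # Critical values sqrt(b_alpha + log(t - n)); b_alpha ~ 4.6 is assumed here
    critical_values = np.sqrt(4.6 + np.log(t_index - n_indices))

    max_pos = int(np.argmax(s_n_t))
    return s_n_t[max_pos], critical_values[max_pos]
```

The time saving comes from replacing a Python-level loop with NumPy operations over all candidate n simultaneously, which is where the reported drop from roughly 7.8 minutes to 0.45 minutes would originate.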
Can you please point me to where I can run your unit tests for this? I have no apparent issues in my local environment, but before I create a PR I would like to be certain. I will edit this issue and create a PR once the new script passes.
Thanks guys