
CUSUM.py get_chu_stinchcombe_white_statistics speed-up and bug fix #417

Open
jasonharris438 opened this issue Jul 2, 2020 · 7 comments
Labels
enhancement New feature or request

Comments

@jasonharris438 commented Jul 2, 2020

There is one minor bug in the existing function that I have fixed in my proposed solution; however, the main purpose here is a speed-up.

Change No. 1:

Bug on line 55. Currently: `s_n_t = 1 / (sigma_sq_t * np.sqrt(integer_index - temp_integer_index)) * values_diff`

Change to: `s_n_t = 1 / (np.sqrt(sigma_sq_t * (integer_index - temp_integer_index))) * values_diff`

This takes the square root of `sigma_sq_t` to obtain $\hat\sigma_t$, which standardizes correctly according to the original paper and Marcos' book:

$$S_{n,t} = (y_t - y_n)\left(\hat\sigma_t \sqrt{t-n}\right)^{-1}, \quad t > n$$

The test statistics are currently disproportionately large compared to the critical values as they are not scaled correctly.
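To make the scaling fix concrete, here is a minimal sketch comparing the two formulas (function and argument names are illustrative, not the library's):

```python
import numpy as np

def s_n_t_corrected(y_t, y_n, sigma_sq_t, t, n):
    """Standardized departure S_{n,t} = (y_t - y_n) / (sigma_hat_t * sqrt(t - n)).

    sigma_sq_t is a variance estimate, so it must go under the square
    root to yield the standard deviation sigma_hat_t.
    """
    return (y_t - y_n) / np.sqrt(sigma_sq_t * (t - n))

def s_n_t_buggy(y_t, y_n, sigma_sq_t, t, n):
    # Shape of the original line 55: the variance is used directly
    # where the standard deviation belongs.
    return 1 / (sigma_sq_t * np.sqrt(t - n)) * (y_t - y_n)
```

Since `sigma_sq_t` is typically well below 1 for log-price increments, the buggy version inflates every statistic by a factor of `1 / np.sqrt(sigma_sq_t)`, which is why the observed statistics dwarf the critical values.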

To Reproduce
The script from the docs is sufficient to see this.

```python
import pandas as pd
import numpy as np
from mlfinlab.structural_breaks import (get_chu_stinchcombe_white_statistics,
                                        get_chow_type_stat, get_sadf)

# Importing price data
bars = pd.read_csv('BARS_PATH', index_col=0, parse_dates=[0])

# Changing to log prices data
log_prices = np.log(bars.close)  # see p.253, 17.4.2.1 Raw vs Log Prices

# Chu-Stinchcombe test (one-sided and two-sided)
one_sided_test = get_chu_stinchcombe_white_statistics(log_prices, test_type='one_sided')
two_sided_test = get_chu_stinchcombe_white_statistics(log_prices, test_type='two_sided')
```

Change No. 2:

I have implemented a modified `_get_s_n_for_t()` function that results in a large time saving. Currently the script takes about 7.8 minutes to run on a simulated dataset of 10,000 points; the proposed changes reduce the running time to 0.45 minutes on the same dataset.

This also removes the need for `_get_values_diff()`, as it adds unnecessary overhead.
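The shape of the speed-up is roughly this: compute the statistic for every candidate start point $n < t$ in one array operation instead of a Python-level inner loop (a sketch with illustrative names, not the exact patched function):

```python
import numpy as np

def s_n_for_t_vectorized(series, t, sigma_sq_t):
    """Sketch of the vectorized inner step: S_{n,t} for all n < t at once.

    series is a 1-D array of log prices. The per-n Python loop (and a
    separate values-diff helper) collapses into array arithmetic.
    """
    n = np.arange(t)                     # every candidate start index n < t
    values_diff = series[t] - series[n]  # y_t - y_n for all n simultaneously
    return values_diff / np.sqrt(sigma_sq_t * (t - n))
```

Replacing per-element Python iteration with a single vectorized call per `t` is consistent with the roughly 17x speed-up reported above.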

Can you please point me to where I can run your unit tests for this? I see no apparent issues in my local environment, but before I create a PR I would like to be certain. I will edit this issue and create a PR once the new script passes.

Thanks guys

@PanPip PanPip added the enhancement New feature or request label Jul 2, 2020
@PanPip (Contributor) commented Jul 2, 2020

@jasonharris438 Thank you for these improvements! I wrote a list of tests to run in the comments to Issue #348.

@PanPip (Contributor) commented Jul 2, 2020

@jasonharris438 Also, please be sure to make your local branch from develop branch and then make a PR also to develop and not master.

@jasonharris438 (Author) commented Jul 5, 2020

Thanks @PanPip, I just sent you a message on Slack.
I have passed the styling requirements, but got hit on line 74 of the test_structural_breaks.py test.
I think there might be an issue with the asserted statistics for the one_sided test, as the value of 3729.001 looks too big to me (as do the others that are >100). It should be on the same scale as the values for the two-sided test.

@Jackal08 (Member) commented Aug 5, 2020

Hi @jasonharris438, has this ticket been resolved?

@jasonharris438 (Author)

Hi @Jackal08, this change requires a change to the unit test script. I'll outline it for you over the weekend.

@jasonharris438 (Author)

Hey @Jackal08, maybe don't worry about this one. I've had far too much on my plate over the last couple of months and have only just caught up on all of the discussion around the source code going private. I've been working and studying full time during stage 4.5 lockdown while searching for a career change, and unfortunately haven't had the time to revisit this library. My fork is now quite out of date, so without being able to see the latest version I would probably struggle to make a PR that maximises the benefit. Effectively, what I have done here requires a change to the unit test, as it was not asserting the correct output.

I've attached my version of the CUSUM.py file. It's very similar to the original, just with the inner loop removed and a fix to the calculation of the denominator on line 55 (it was previously using the variance; this needs to be the standard deviation).

cusum.txt

@hpad06 commented Dec 25, 2020

@jasonharris438 I tried your cusum; it gets a totally different result from the original code. Is this indeed correct?

For example, on the same data, the first chart uses the original code with stat > critical value, and the second uses yours:

[image: statistics vs. critical values, original code]

[image: statistics vs. critical values, modified code]
