📈Synthetic financial time series generation with regime clustering📈

kirillzx/CLSGAN

 
 

Synthetic financial time series generation with regime clustering

Authors: Kirill Zakharov, Elizaveta Stavinova

Zakharov, K., Stavinova, E. and Boukhanovsky, A., 2023. Synthetic Financial Time Series Generation with Regime Clustering. Journal of Advances in Information Technology, 14(6).

@article{zakharov2023synthetic,
  title={Synthetic Financial Time Series Generation with Regime Clustering},
  author={Zakharov, Kirill and Stavinova, Elizaveta and Boukhanovsky, Alexander},
  journal={Journal of Advances in Information Technology},
  volume={14},
  number={6},
  year={2023}
}

For questions and recommendations, write to [email protected]

Installation

To use the method, install the following Python libraries:

pip install ruptures
pip install stumpy
pip install scipy
pip install scikit-learn
pip install torch torchvision
pip install yfinance
pip install pyts
pip install statsmodels
pip install fbprophet

Method

We propose three new methods: CLFF, CLGAN, and CLSGAN. The general pipeline of our method is presented below. The main idea is to apply clustering to the detected regimes. For a detailed description, see the article.

[Figure: Pipeline]
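The idea of clustering regimes can be illustrated with a minimal, standard-library-only sketch. It is not the implementation from this repository: the breakpoints are assumed to be already known (in practice they could come from a change-point library such as ruptures), the per-regime features (mean and standard deviation of first differences) are an illustrative choice, and `make_segment`, `segment_features`, and `kmeans` are hypothetical helper names.

```python
import math
import statistics


def make_segment(start, step, n):
    """Toy regime: a zigzag of n points whose step size sets the volatility."""
    values = [start]
    for i in range(n - 1):
        values.append(values[-1] + (step if i % 2 == 0 else -step))
    return values


def segment_features(series, breakpoints):
    """Return (mean, std) of first differences for each regime.

    `breakpoints` holds the end index of every segment, the last one being
    len(series). This layout is an assumption of the sketch.
    """
    features, start = [], 0
    for end in breakpoints:
        seg = series[start:end]
        diffs = [b - a for a, b in zip(seg, seg[1:])]
        features.append((statistics.fmean(diffs), statistics.pstdev(diffs)))
        start = end
    return features


def kmeans(points, k, n_iter=20):
    """Tiny deterministic k-means: the first k points seed the centroids."""
    centroids = list(points[:k])
    labels = [0] * len(points)
    for _ in range(n_iter):
        labels = [min(range(k), key=lambda j: math.dist(p, centroids[j]))
                  for p in points]
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = tuple(statistics.fmean(c) for c in zip(*members))
    return labels


# Toy series: calm, volatile, calm, volatile regimes of 20 points each.
series = []
level = 100.0
for step in (0.1, 2.0, 0.1, 2.0):
    segment = make_segment(level, step, 20)
    series.extend(segment)
    level = segment[-1]

breakpoints = [20, 40, 60, 80]  # assumed output of a change-point detector
features = segment_features(series, breakpoints)
labels = kmeans(features, k=2)
print(labels)
```

Regimes that share a cluster label could then be pooled as training data for one generator, which is the intuition behind clustering before generation.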

We also propose a modification of existing GAN architectures that adds a Supervisor and a second Discriminator.

Experiments

For the experiments, we used three open-access datasets that describe stock prices. All data are available in the Data folder, and more can be found here. For quality assessment, we used distribution statistics (skewness, kurtosis), the sum of absolute squared values of the spectral density, the Jensen-Shannon divergence, the two-sample Kolmogorov-Smirnov test statistic, local extrema, autocorrelation, and machine learning metrics (forecasting MSE under time series cross-validation).
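Several of the distribution metrics listed above can be sketched with the standard library alone. This is a rough illustration and not the evaluation code of the repository: the function names, the histogram-based Jensen-Shannon estimate, and the number of bins are assumptions, and in practice scipy provides tested equivalents.

```python
import math
import statistics
from bisect import bisect_right


def skewness(x):
    """Population skewness: the third standardized moment."""
    m, s = statistics.fmean(x), statistics.pstdev(x)
    return statistics.fmean([((v - m) / s) ** 3 for v in x])


def excess_kurtosis(x):
    """Population excess kurtosis: fourth standardized moment minus 3."""
    m, s = statistics.fmean(x), statistics.pstdev(x)
    return statistics.fmean([((v - m) / s) ** 4 for v in x]) - 3.0


def ks_statistic(x, y):
    """Two-sample KS statistic: largest gap between the empirical CDFs."""
    xs, ys = sorted(x), sorted(y)
    return max(abs(bisect_right(xs, v) / len(xs) - bisect_right(ys, v) / len(ys))
               for v in xs + ys)


def js_divergence(x, y, bins=10):
    """Jensen-Shannon divergence (base 2) between histograms on a shared grid."""
    lo, hi = min(min(x), min(y)), max(max(x), max(y))
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 if all values coincide

    def hist(data):
        counts = [0] * bins
        for v in data:
            counts[min(int((v - lo) / width), bins - 1)] += 1
        return [c / len(data) for c in counts]

    def kl(a, b):
        return sum(ai * math.log2(ai / bi) for ai, bi in zip(a, b) if ai > 0)

    p, q = hist(x), hist(y)
    mix = [(a + b) / 2 for a, b in zip(p, q)]
    return 0.5 * kl(p, mix) + 0.5 * kl(q, mix)


real = [1.0, 2.0, 3.0, 4.0, 5.0]
synthetic = [1.5, 2.5, 3.0, 3.5, 4.5]
print(skewness(real), excess_kurtosis(real))
print(ks_statistic(real, synthetic), js_divergence(real, synthetic))
```

With base-2 logarithms the Jensen-Shannon divergence lies between 0 (identical histograms) and 1 (disjoint histograms), which makes values comparable across datasets.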

The autocorrelation plots show that our approach approximates the autocorrelation of the original series more closely across lags.
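A comparison of autocorrelation across lags can be sketched as follows. The biased sample estimator and the mean absolute gap used as a summary number are assumptions made for illustration; `autocorrelation` and `acf_gap` are hypothetical names, not functions from this repository.

```python
import statistics


def autocorrelation(x, max_lag):
    """Sample autocorrelation for lags 0..max_lag (biased estimator)."""
    n = len(x)
    m = statistics.fmean(x)
    denom = sum((v - m) ** 2 for v in x)
    return [sum((x[t] - m) * (x[t + k] - m) for t in range(n - k)) / denom
            for k in range(max_lag + 1)]


def acf_gap(real, synthetic, max_lag):
    """Mean absolute difference between two autocorrelation functions."""
    a = autocorrelation(real, max_lag)
    b = autocorrelation(synthetic, max_lag)
    return statistics.fmean([abs(u - v) for u, v in zip(a, b)])


real = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]
synthetic = [1.0, -0.5, 0.8, -1.2, 0.9, -1.0]
print(autocorrelation(real, 2))
print(acf_gap(real, synthetic, 3))
```

A smaller gap means the synthetic series reproduces the temporal dependence of the original more closely.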

Q–Q plots of the extremum points in the synthetic and the corresponding original time series:
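The data behind such a Q–Q plot can be sketched in two steps: extract the local extrema of each series, then pair their quantiles. The strict-inequality definition of an extremum and the number of quantiles are assumptions of this sketch, and `local_extrema` and `qq_pairs` are hypothetical names.

```python
import statistics


def local_extrema(x):
    """Values of strict local minima and maxima (interior points only)."""
    return [x[i] for i in range(1, len(x) - 1)
            if (x[i] > x[i - 1] and x[i] > x[i + 1])
            or (x[i] < x[i - 1] and x[i] < x[i + 1])]


def qq_pairs(a, b, n_quantiles=9):
    """Matched quantiles of two samples, ready to be drawn as a Q-Q plot."""
    qa = statistics.quantiles(a, n=n_quantiles + 1, method="inclusive")
    qb = statistics.quantiles(b, n=n_quantiles + 1, method="inclusive")
    return list(zip(qa, qb))


real = [1.0, 3.0, 2.0, 5.0, 4.0, 6.0, 3.0, 4.0, 2.0]
synthetic = [1.2, 2.8, 2.1, 4.7, 4.2, 5.9, 3.1, 3.8, 2.2]

pairs = qq_pairs(local_extrema(real), local_extrema(synthetic))
for q_real, q_synth in pairs:
    print(round(q_real, 3), round(q_synth, 3))
```

Points close to the diagonal indicate that the extrema of the synthetic series are distributed similarly to those of the original.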

The figure below shows the distributions of the original series and of its returns at daily (first-differenced series) and monthly (series differenced with a 20-day lag) scales.
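The two scales can be sketched as lagged differences of the price series. This is a minimal illustration assuming plain (not logarithmic) differences; `lagged_diff` is a hypothetical helper and the toy prices are made up.

```python
def lagged_diff(prices, lag=1):
    """Difference between each value and the value `lag` steps earlier."""
    return [prices[t] - prices[t - lag] for t in range(lag, len(prices))]


# Toy price series that rises by one unit per trading day.
prices = [100.0 + day for day in range(60)]

daily = lagged_diff(prices, lag=1)     # daily scale
monthly = lagged_diff(prices, lag=20)  # roughly one trading month
print(len(daily), len(monthly))
```

A lag of 20 shortens the series by 20 points, so the monthly distribution is estimated from fewer observations than the daily one.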

Hyperparameters

For the training procedure, you can use the following hyperparameters.
