This code base is for the following paper:
Jacky Chen, John Hull, Zissis Poulos, Haris Rasul, Andreas Veneris, Yuntao Wu, "A Variational Autoencoder Approach to Conditional Generation of Possible Future Volatility Surfaces", to appear in The Journal of Financial Data Science, 2025.
We use CVAE + LSTM to generate volatility surfaces based on an arbitrary context length.
This folder contains all the code for data cleaning/generation.
data_preproc.py:
- cleaning up the data downloaded from WRDS
- generating 5x5 volatility surface grid (moneyness x time to maturity)
- The usage can be found in spx_volsurface_generation.ipynb and spx_convert_to_grid.ipynb.
- Note: We might need to download the S&P 500 prices from Yahoo Finance, using ticker ^GSPC.
sabr_gen.py:
- For SABR volatility surface grid (K/S moneyness x time to maturity) generation (Appendix C)
- The usage can be found in sabr_volsurface.ipynb.
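For orientation, one common closed-form route to SABR implied volatilities is the lognormal approximation of Hagan et al. (2002). The sketch below is not taken from sabr_gen.py, which may use a different method or parameterization. The function name, the SABR parameter values, and the grid axes are hypothetical, and the forward is assumed to equal a spot normalized to 1 (zero rates and dividends) so that K/S moneyness can be used directly as the strike.

```python
import math

# Hypothetical grid axes, chosen only for illustration.
MONEYNESS = [0.7, 0.85, 1.0, 1.15, 1.3]      # K/S
TTM = [1 / 12, 0.25, 0.5, 1.0, 2.0]          # years


def hagan_lognormal_vol(f, k, t, alpha, beta, rho, nu):
    """Hagan et al. (2002) lognormal SABR implied-volatility approximation."""
    one_b = 1.0 - beta
    log_fk = math.log(f / k)
    fk_pow = (f * k) ** (one_b / 2.0)        # (f k)^((1 - beta) / 2)
    # First-order correction in time to maturity.
    corr = 1.0 + (
        one_b ** 2 / 24.0 * alpha ** 2 / fk_pow ** 2
        + rho * beta * nu * alpha / (4.0 * fk_pow)
        + (2.0 - 3.0 * rho ** 2) / 24.0 * nu ** 2
    ) * t
    denom = fk_pow * (
        1.0 + one_b ** 2 / 24.0 * log_fk ** 2 + one_b ** 4 / 1920.0 * log_fk ** 4
    )
    z = nu / alpha * fk_pow * log_fk
    if abs(z) < 1e-12:
        z_over_x = 1.0                       # limit as z -> 0 (at the money or nu = 0)
    else:
        x = math.log(
            (math.sqrt(1.0 - 2.0 * rho * z + z * z) + z - rho) / (1.0 - rho)
        )
        z_over_x = z / x
    return alpha / denom * z_over_x * corr


# Assumed parameters; forward = spot = 1, so the strike equals K/S moneyness.
params = dict(alpha=0.2, beta=1.0, rho=-0.7, nu=0.5)
surface = [
    [hagan_lognormal_vol(1.0, m, t, **params) for t in TTM] for m in MONEYNESS
]
```

Each row of surface corresponds to one moneyness level and each column to one time to maturity, matching the (moneyness x time to maturity) layout described above.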
To preprocess the data, use the following two files in the main directory:
- spx_volsurface_generation.ipynb: This uses data_preproc.py, cleans the data and generates a dataframe containing the interpolated IVS data.
- spx_convert_to_grid.ipynb: This converts the dataframe generated by spx_volsurface_generation.ipynb to create the 5x5 numpy grid.
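As a rough illustration of the second step, the sketch below turns long-format records into an array of shape (number of days, 5, 5). It is not the notebook's code: the record layout (date, moneyness, time to maturity, implied volatility), the function name to_grid, and the grid axis values are all assumptions made for this example.

```python
import numpy as np

# Hypothetical grid axes; the values actually used by the notebooks are not listed here.
MONEYNESS = [0.7, 0.85, 1.0, 1.15, 1.3]
TTM = [1 / 12, 0.25, 0.5, 1.0, 2.0]  # years


def to_grid(records):
    """Convert (date, moneyness, ttm, iv) records to (sorted dates, array of shape (n_days, 5, 5))."""
    records = list(records)
    dates = sorted({r[0] for r in records})
    day_idx = {d: i for i, d in enumerate(dates)}
    m_idx = {m: i for i, m in enumerate(MONEYNESS)}
    t_idx = {t: i for i, t in enumerate(TTM)}
    # Cells with no record stay NaN so that gaps are easy to spot.
    grid = np.full((len(dates), len(MONEYNESS), len(TTM)), np.nan)
    for d, m, t, iv in records:
        grid[day_idx[d], m_idx[m], t_idx[t]] = iv
    return dates, grid


# Synthetic records for two days, already interpolated onto the grid points.
records = [
    (d, m, t, 0.20 + 0.01 * i + 0.001 * j)
    for d in ("2023-02-27", "2023-02-28")
    for i, m in enumerate(MONEYNESS)
    for j, t in enumerate(TTM)
]
dates, grid = to_grid(records)
print(grid.shape)  # (2, 5, 5)
```

The first axis indexes trading days in sorted order, so a context window for the models is a contiguous slice along that axis.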
This folder contains code for generating distributions of surfaces for a single day and for multiple days, along with relevant evaluation functions such as histogram plotting and latent manipulation.
This folder contains all the code for VAE definitions.
- base: the base VAE, encoder and decoder classes
- dense_vae: VAE that flattens the input and treats everything as a 1D vector
- conv_vae: VAE that uses 2D convolutional layers for encoder and decoder
- cvae: conditional VAE that uses Conv2D/Linear layers for encoder and decoder
- cvae_with_mem: cvae with memory added; can use LSTM, GRU, or RNN (default: LSTM)
- cvae_with_mem_randomized: same as cvae_with_mem, but with variable context length, generating 1 day forward. Used in the current paper.
Other code:
- datasets: the customized dataset definitions used for the models. The classes whose __getitem__ returns a dictionary are the ones currently used.
- datasets_randomized: the customized dataset definitions used for cvae_with_mem_randomized; generates data points with variable context length.
- utils: code used for setting random seeds, training, testing and evaluation.
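To make "variable context length" concrete, here is a minimal sketch of how one training sample could be drawn: pick a context length at random, pick a start day, and use the day right after the context as the target. This is an illustration only, not the code in datasets_randomized; the function name sample_window and the parameters min_ctx and max_ctx are hypothetical.

```python
import random


def sample_window(n_days, min_ctx, max_ctx, rng):
    """Return (context_indices, target_index) for one sample.

    The context length is drawn uniformly from [min_ctx, max_ctx] (inclusive),
    and the target is the single day immediately after the context.
    Requires n_days > max_ctx.
    """
    ctx_len = rng.randint(min_ctx, max_ctx)
    # Leave room for the one-day-forward target after the context.
    start = rng.randint(0, n_days - ctx_len - 1)
    context = list(range(start, start + ctx_len))
    target = start + ctx_len
    return context, target


rng = random.Random(0)  # fixed seed for reproducibility
samples = [sample_window(100, 1, 5, rng) for _ in range(1000)]
```

Because context lengths differ across samples, batching such samples typically requires either grouping by length or padding; which approach the repository takes is not described here.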
- param_search.py can be used to search for optimal parameters and train the models
- generate_surfaces.py can be used to generate distributions of surfaces over a time horizon
- generate_surfaces_max_likelihood.py can be used to generate the surfaces with maximum likelihood (encoded latent with zero for generated date)
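The difference between the two generation modes can be sketched at the level of the latent sequence. Under a standard normal prior, zero is the most likely latent value, so the maximum-likelihood variant uses a zero latent for the generated date, while the distributional variant draws many latents from the prior. The snippet below only assembles the latent sequences and does not call the repository's models; the function name build_latents, the array shapes, and the use of encoder means for the context days are assumptions.

```python
import numpy as np


def build_latents(context_mu, n_samples, rng, max_likelihood=False):
    """Append a latent for the generated day to the context latents.

    context_mu: array of shape (C, latent_dim), assumed to be encoder outputs
                for the C context days.
    Returns an array of shape (n, C + 1, latent_dim), where n is 1 in
    maximum-likelihood mode and n_samples otherwise.
    """
    latent_dim = context_mu.shape[1]
    if max_likelihood:
        # Mode of the N(0, I) prior: a single all-zero latent.
        future = np.zeros((1, 1, latent_dim))
    else:
        # One prior draw per generated surface.
        future = rng.standard_normal((n_samples, 1, latent_dim))
    ctx = np.broadcast_to(context_mu, (future.shape[0],) + context_mu.shape)
    return np.concatenate([ctx, future], axis=1)


rng = np.random.default_rng(0)
context_mu = rng.standard_normal((3, 5))       # 3 context days, latent_dim = 5
z_ml = build_latents(context_mu, 1, rng, max_likelihood=True)
z_dist = build_latents(context_mu, 10, rng)
print(z_ml.shape, z_dist.shape)  # (1, 4, 5) (10, 4, 5)
```

Decoding z_ml would give a single surface for the next day, whereas decoding z_dist would give ten candidate surfaces from which a distribution can be estimated.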
The following files contain the code for table and plot generation for the final paper:
- main_analysis.py
Detailed implementations are in analysis_code.
The S&P 500 option price data is downloaded from WRDS Get Data, under OptionMetrics/Ivy DB US/Options/Option Prices.
Step 1:
- Date Range: 2000-01-01 to 2023-02-28
Step 2:
- SECID = 108105
- Option Type: Both
- Exercise Type: Both
- Security Type: Both
Step 3:
- Query Variables: all
Step 4:
- Output Format: *.csv
- Compression: *.zip
- Date Format: YYYY-MM-DD
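With the output options above, the download is a zipped CSV with dates in YYYY-MM-DD form. The following is a minimal sketch of reading such a file with the Python standard library; it is not the repository's loader (see data_preproc.py for that). The column names secid and date are assumed, the function name read_option_rows is hypothetical, and the demo file is synthetic.

```python
import csv
import io
import os
import tempfile
import zipfile
from datetime import date


def read_option_rows(zip_path, secid="108105"):
    """Read the first CSV inside a zip archive and keep rows for one SECID."""
    rows = []
    with zipfile.ZipFile(zip_path) as zf:
        csv_name = next(n for n in zf.namelist() if n.endswith(".csv"))
        with zf.open(csv_name) as raw:
            text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
            for row in csv.DictReader(text):
                if row["secid"] != secid:   # assumed column name
                    continue
                # Dates were requested as YYYY-MM-DD, which is ISO format.
                row["date"] = date.fromisoformat(row["date"])
                rows.append(row)
    return rows


# Demo on a tiny synthetic archive (not real WRDS data).
with tempfile.TemporaryDirectory() as tmp:
    demo_zip = os.path.join(tmp, "demo.zip")
    with zipfile.ZipFile(demo_zip, "w") as zf:
        zf.writestr(
            "demo.csv",
            "secid,date,impl_volatility\n"
            "108105,2023-02-28,0.21\n"
            "999999,2023-02-28,0.30\n",
        )
    rows = read_option_rows(demo_zip)
print(len(rows), rows[0]["date"])  # 1 2023-02-28
```

For the full 2000 to 2023 download, a dataframe library with chunked reading is likely a better fit than loading every row into a list.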
Models and parsed data can be downloaded from Google Drive