First clone this repository to your computer.
To ensure your research is reproducible, use a conda environment. If you don't already have it, download a conda distribution from https://conda.io/docs/user-guide/install/index.html.
Create a virtual environment by running:
```
conda create -n env_TradeBot python=3.9
source activate env_TradeBot
pip install -r requirements.txt
```
We construct a TradeBot algorithm for sequential prediction based on Transformer and RNN models. The main objective of the model is to optimize portfolio allocation under uncertainty in the energy market.
Our methodology is based on the following articles:
Probabilistic forecasting with Factor Quantile Regression: Application to electricity trading
Autoformer: Decomposition Transformers with Auto-Correlation for Long-Term Series Forecasting
Foundations of Sequence-to-Sequence Modeling for Time Series
Are Transformers Effective for Time Series Forecasting?
Multi-Period Trading via Convex Optimization
For this task, our goals are:
1- Develop an algorithm that synthesizes a signal from the energy market that is predictive of forward returns over some time horizon.
2- Translate our signal into an algorithm that can turn a profit while also managing risk exposures, in this case minimizing the downside loss, i.e. enforcing a risk constraint.
Traditionally in quantitative finance, the solution to the problem of maximizing returns while constraining risk has been to employ some form of portfolio optimization, but performing sophisticated optimizations is challenging in today's markets.
Algorithmic trading strategies are driven by signals that indicate when to buy or sell assets to generate superior returns relative to a benchmark such as an index. The portion of an asset's return that is not explained by exposure to this benchmark is called alpha, and hence the signals that aim to produce such uncorrelated returns are also called alpha factors.
We first adopt an RNN model that predicts day-ahead energy prices (da) based on real-time prices (rt). In practice, the trader faces two pieces of information:
1- the current energy prices
2- a one-day-ahead estimated price defined by regulators; this price lets the agent determine the maximum price (long), known as the bid, and the minimum price (short) that they are willing to place.
Since information differs from one agent to another, the bid and offer fluctuate with respect to the marketplace (hub node). Besides agents' decisions, there are many other factors that make prices fluctuate (the quantity/price that a particular agent is forced to accept, the liquidity of the market, etc.).
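As a toy illustration (made-up numpy values, not repository code), the payoff of a cleared bid or offer against the day-ahead/real-time spread can be computed as follows:

```python
import numpy as np

# Toy prices for a single hub over 4 hours (illustrative values only).
da = np.array([30.0, 45.0, 80.0, 28.0])   # day-ahead prices
rt = np.array([32.0, 40.0, 120.0, 27.0])  # real-time prices
bid_price, offer_price = 50.0, 60.0       # our limit prices

# A bid (long) clears when da <= bid_price and pays rt - da;
# an offer (short) clears when da > offer_price and pays da - rt.
bid_return = (da <= bid_price) * (rt - da)
offer_return = (offer_price < da) * (da - rt)
print(bid_return)    # [ 2. -5.  0. -1.]
print(offer_return)  # [  0.   0. -40.   0.]
```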
In the first, naive model we consider an RNN; the major innovation of RNNs is that each prediction is a function of both the previous output and new data.
RNNs have been successfully applied to various tasks that require mapping one or more input sequences to one or more output sequences and are particularly well suited to time series forecasting.
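As a rough sketch of such a model (PyTorch, with illustrative layer sizes that are our own assumptions rather than the repository's exact architecture), a GRU can map a window of recent real-time prices to a day-ahead forecast:

```python
import torch
import torch.nn as nn

class PriceRNN(nn.Module):
    """Minimal GRU forecaster: a window of past prices -> next-step prediction.
    Hidden size and layer count are illustrative assumptions."""
    def __init__(self, n_features=1, hidden_size=64, num_layers=2):
        super().__init__()
        self.rnn = nn.GRU(n_features, hidden_size, num_layers, batch_first=True)
        self.head = nn.Linear(hidden_size, 1)

    def forward(self, x):                # x: (batch, seq_len, n_features)
        out, _ = self.rnn(x)             # out: (batch, seq_len, hidden_size)
        return self.head(out[:, -1])     # predict the next value from the last hidden state

# Example: forecast the next hour from the last 24 hours of real-time prices.
model = PriceRNN()
window = torch.randn(8, 24, 1)           # dummy batch of 8 windows
prediction = model(window)               # shape: (8, 1)
```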
In a second phase, we want to incorporate correlated market information into our model, which we treat as having a causal effect on price fluctuations. For modeling such sequential time series data, researchers have used Recurrent Neural Networks (RNNs) such as LSTM or GRU, as discussed earlier, and more recently Transformer-based methods, which fit naturally in the time series forecasting setting.
In this repo, we're going to leverage the vanilla Transformer as presented in Attention Is All You Need.
The architecture is based on an Encoder-Decoder Transformer which is a natural choice for forecasting as it encapsulates several inductive biases nicely.
To begin with, the use of an Encoder-Decoder architecture is helpful at inference time, where we typically wish to forecast some number of prediction steps into the future using the attention mechanism, as shown in the figure below.
We first sample the next token, which amounts to autoregressive generation.
In this implementation, we use an output distribution for the encoder-decoder model and sample from it to provide forecasts up to our desired prediction horizon. This is known as greedy sampling/search; this technique helps the training step avoid local minima while also providing uncertainty estimates for robustness.
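A minimal sketch of this generation loop (assuming a hypothetical `model` object with `encode`/`decode` methods that return a predictive distribution; these names are illustrative, not the repository's API) could look like:

```python
import torch

def autoregressive_forecast(model, context, horizon, greedy=True):
    """Sketch of encoder-decoder generation. `model.encode`/`model.decode` are
    assumed to exist and `decode` to return a predictive distribution per step."""
    memory = model.encode(context)                  # encode the observed context window
    decoder_input = context[:, -1:, :]              # seed the decoder with the last observation
    forecasts = []
    for _ in range(horizon):
        dist = model.decode(decoder_input, memory)  # distribution over the next value(s)
        step = dist.mean if greedy else dist.sample()
        forecasts.append(step[:, -1:, :])           # keep only the newest time step
        decoder_input = torch.cat([decoder_input, forecasts[-1]], dim=1)
    return torch.cat(forecasts, dim=1)              # (batch, horizon, n_features)
```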
Secondly, a Transformer helps us train on time series data that might contain thousands of time points. It might not be feasible to input the entire history of a time series at once to the model, due to the time and memory constraints of the attention mechanism. Thus, as in the figure below, one can consider an appropriate context window and sample this window, together with the subsequent prediction-length-sized window, from the training data when constructing batches for stochastic gradient descent (SGD). The context-sized window is passed to the encoder and the prediction window to a causally masked decoder. This means that the decoder can only look at previous time steps when learning the next value, which is referred to as "teacher forcing".
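A minimal sketch of this windowing (plain numpy, with assumed context and prediction lengths) is shown below:

```python
import numpy as np

def sample_training_window(series, context_length=96, prediction_length=24, rng=None):
    """Randomly pick a context window and the prediction window that follows it.
    The context goes to the encoder; the prediction window is the decoder target."""
    rng = rng or np.random.default_rng()
    max_start = len(series) - context_length - prediction_length
    start = rng.integers(0, max_start + 1)
    context = series[start : start + context_length]
    target = series[start + context_length : start + context_length + prediction_length]
    return context, target

# Example: build a small SGD batch from a long hourly price series.
prices = np.sin(np.linspace(0, 50, 5000))   # dummy series standing in for prices
batch = [sample_training_window(prices) for _ in range(32)]
```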
In the diagram below we show how the procedure works
We consider the energy market over a short time period, e.g. 24 hours, and we want to predict the next 24 hours. We assume we have some amount of money to invest in any of the available trading hubs.

The return matrix collects, for each hour $t$ and each hub $i$, the return of a cleared bid (long) and a cleared offer (short):

$$r^{bid}_{t,i} = \mathbf{1}\{da_{t,i} \le bid_i\}\,(rt_{t,i} - da_{t,i}), \qquad r^{offer}_{t,i} = \mathbf{1}\{offer_i < da_{t,i}\}\,(da_{t,i} - rt_{t,i})$$

We say that there is an arbitrage opportunity in this event if there exists a betting strategy (non-negative weights over the hubs) with a strictly positive expected return.

Our optimization problem is the following:

$$\max_{w_1,\, w_2 \ge 0} \;\; (\bar{r}^{bid})^{\top} w_1 + (\bar{r}^{offer})^{\top} w_2$$

subject to

$$\frac{1}{K} \sum_{t \in \mathcal{W}} \left( (r^{bid}_t)^{\top} w_1 + (r^{offer}_t)^{\top} w_2 \right) \ge L_{max}, \qquad \lVert w_1 \rVert_p \le \gamma, \qquad \lVert w_2 \rVert_p \le \gamma$$

where $\bar{r}^{bid}$ and $\bar{r}^{offer}$ are the average hourly returns per hub, $\mathcal{W}$ is the set of the $K$ worst hours (determined by the chosen percentile), $L_{max}$ is the maximum tolerated loss, $p \in \{1, 2\}$ is the regularization norm, and $\gamma$ bounds the weights.
Conditional Value at Risk (CVaR) is a popular risk measure among professional investors, used to quantify the extent of potential large losses. The metric is computed as the average of the $\alpha\%$ worst-case scenarios over some time horizon.
We want to place our orders/trades in a conservative way, focusing on the less profitable outcomes. For high values of the percentile, more of the worst-case hours are included in the average.
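For intuition, the empirical CVaR of a sample of hourly returns is just the mean of the worst fraction of hours (toy numpy example, not repository code):

```python
import numpy as np

def empirical_cvar(returns, percentile=0.05):
    """Average of the worst `percentile` fraction of outcomes."""
    n_worst = max(1, round(len(returns) * percentile))
    worst = np.sort(returns)[:n_worst]   # most negative returns first
    return worst.mean()

hourly_returns = np.array([0.5, 1.2, -3.0, 0.1, -0.4, 2.0, -1.5, 0.7])
print(empirical_cvar(hourly_returns, percentile=0.25))  # mean of the 2 worst hours -> -2.25
```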
The following code implements the convex optimization based on CVaR.
```python
def maximize_trade_constrain_downside(self, bid_price, offer_price, da_validate,
                                      rt_validate, percentile, max_loss, gamma):
    # Hourly returns per hub: a bid (long) clears when the day-ahead price is at or
    # below our bid price; an offer (short) clears when it is above our offer price.
    bid_return = (da_validate <= bid_price) * (rt_validate - da_validate)
    offer_return = (offer_price < da_validate) * (da_validate - rt_validate)
    # One non-negative weight per hub for the long (bid) and short (offer) legs.
    weights1 = cp.Variable(bid_return.shape[1])
    weights2 = cp.Variable(offer_return.shape[1])
    # Maximize the average hourly portfolio return over the validation window.
    objective = cp.Maximize(bid_return.mean(axis=0) @ weights1
                            + offer_return.mean(axis=0) @ weights2)
    # CVaR constraint: the mean of the `percentile` worst hours must stay above max_loss.
    nsamples = round(bid_return.shape[0] * percentile)
    portfolio_rets = bid_return @ weights1 + offer_return @ weights2
    worst_hours = cp.sum_smallest(portfolio_rets, nsamples) / nsamples
    constraints = [worst_hours >= max_loss, weights1 >= 0, weights2 >= 0,
                   cp.norm(weights1, self.l_norm) <= gamma,
                   cp.norm(weights2, self.l_norm) <= gamma]
    problem = cp.Problem(objective, constraints)
    problem.solve()
    return (weights1.value.round(4).ravel(), bid_return,
            weights2.value.round(4).ravel(), offer_return, problem.value)
```
To ensure that our strategy is robust, we apply a regularization technique, as shown above in the code snippet: we choose a range of gamma values, each representing the upper bound of the regularizer based on the L1 or L2 norm, and sweep over them as sketched after this list. We pick the one that:
1- Ensures a better diversification of our portfolio (risk diversification);
2- Achieves the maximum return under the underlying constraint.
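A simple way to run this selection (a hypothetical sweep built on the method above; the `bot` object, the data arrays, and the thresholds are assumptions for illustration) is to solve the problem for each candidate gamma and keep the best feasible allocation:

```python
import numpy as np

# Hypothetical sweep: `bot` is an instance of the trading class above, and the
# price arrays / thresholds are placeholders for the validation data.
gammas = np.linspace(0.5, 5.0, 10)
best = None
for gamma in gammas:
    w_bid, _, w_offer, _, value = bot.maximize_trade_constrain_downside(
        bid_price, offer_price, da_validate, rt_validate,
        percentile=0.05, max_loss=-10.0, gamma=gamma)
    # Keep the gamma with the highest objective value among the solved problems.
    if value is not None and (best is None or value > best[1]):
        best = (gamma, value, w_bid, w_offer)
```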
The best model is the one that not only exhibits the maximum return but also satisfies the constraint on the downside loss. For both models we retain the one with
In the plot below we see that our strategy respects all constraints; we note that the returns are heavy-tailed, which is common in financial data.
We can display the trade combination; for both models the portfolio is well diversified.
- We limit ourselves to market price conditions, but we could have used additional feature data such as climate conditions; however, we assume that all external information has been captured by the price fluctuations in the market.
- In the general trading setting, we add a slippage model for limit orders (see the sketch below). Slippage refers not only to the calculation of a realistic price but also of a realistic volume: with slippage, the model evaluates whether the order is too big and must therefore be rejected, since we cannot trade more than the market's volume.
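A highly simplified version of such a check (a hypothetical helper with illustrative parameters, not the repository's slippage model) might cap the fill at the available market volume and reject orders that exceed it:

```python
def apply_slippage(order_volume, market_volume, quoted_price,
                   impact_per_unit=0.001, max_participation=1.0):
    """Toy slippage model: worsen the fill price in proportion to how much of the
    available market volume we consume, and reject orders the market cannot absorb."""
    if order_volume > max_participation * market_volume:
        return None                            # order too big for the market: reject it
    fill_price = quoted_price * (1 + impact_per_unit * order_volume / market_volume)
    return fill_price, order_volume

# A 500 MWh order against 400 MWh of available volume is rejected outright.
print(apply_slippage(500, 400, quoted_price=42.0))   # -> None
print(apply_slippage(100, 400, quoted_price=42.0))   # -> (approx. 42.01, 100)
```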
===========================================================================
Implementation of Robust PCA via Principal Component Pursuit, based on Algorithm 1 (Principal Component Pursuit by Alternating Directions) described on page 29 of this paper:
- Candes, Emmanuel J. et al. "Robust Principal Component Analysis?" Journal of the ACM, Vol. 58, No. 3, Article 11, 2011.
You can access it here: https://arxiv.org/abs/0912.3599
The classical Principal Component Analysis (PCA) is widely used for high-dimensional analysis and dimensionality reduction. Mathematically, if all the data points are stacked as column vectors of an $(n, m)$ matrix $M$, classical PCA finds the best low-rank approximation by solving

$$\min_{L} \; \lVert M - L \rVert \quad \text{subject to} \quad \operatorname{rank}(L) \le k,$$

where $L$ is the low-rank estimate of the data and $k$ its target rank, given that rank($L$) is much smaller than $\min(n, m)$. This works well when the residual $M - L$ consists of small, roughly Gaussian noise, but it is very sensitive to grossly corrupted entries or outliers.
To resolve this issue, Candes, Emmanuel J. et al proposed Robust Principal Component Analysis (Robust PCA or RPCA).
The objective is to decompose the observed matrix $M$ into the sum $M = L + S$ of:
1- A low-rank matrix $L$;
2- A sparse matrix $S$.
Electricity prices tend to vary smoothly in response to supply and demand signals, but are subject to intermittent price spikes that deviate substantially from normal behaviour, as shown below.
Forming the price data from one commercial trading hub into a matrix $M$ (one row per day, one column per hour), the smooth daily price structure is approximately low rank, while the spikes are sparse. Since we can only measure the market prices $M = L + S$, the decomposition is recovered by solving the Principal Component Pursuit (PCP) problem:

$$\min_{L, S} \; \lVert L \rVert_* + \lambda \lVert S \rVert_1 \quad \text{subject to} \quad L + S = M.$$

Minimizing the two norms encourages exactly the structure we want:

1- the nuclear norm $\lVert L \rVert_*$ (the sum of singular values) promotes a low-rank $L$;
2- the $\ell_1$ norm $\lVert S \rVert_1$ promotes a sparse $S$.

Here $\lambda > 0$ trades off the two terms; Candes et al. recommend $\lambda = 1/\sqrt{\max(n, m)}$.
The Robust PCA algorithm allows the separation of sparse but outlying values from the original data as shown below
The drawback of the Robust PCA algorithm is its scalability: it is generally slow, since the implementation performs an SVD (singular value decomposition) at every iteration until convergence. We can alternatively look at Stable PCP, which is intuitively more practical since it combines the strengths of classical PCA and Robust PCA. However, we should be careful about the context of the problem and the data provided.
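For reference, a minimal numpy sketch of Algorithm 1 (alternating singular-value thresholding for $L$ and soft thresholding for $S$ under an augmented Lagrangian; parameter defaults follow the paper, but this is not the repository's implementation) looks like this:

```python
import numpy as np

def pcp_alternating_directions(M, mu=None, lam=None, tol=1e-7, max_iter=500):
    """Sketch of Principal Component Pursuit by alternating directions:
    singular-value thresholding for L, soft thresholding for S, dual update on Y."""
    M = np.asarray(M, dtype=float)
    n, m = M.shape
    lam = lam if lam is not None else 1.0 / np.sqrt(max(n, m))
    mu = mu if mu is not None else (n * m) / (4.0 * np.abs(M).sum())
    S = np.zeros_like(M)
    Y = np.zeros_like(M)                          # Lagrange multiplier for L + S = M
    norm_M = np.linalg.norm(M, "fro")
    for _ in range(max_iter):
        # L-update: shrink the singular values of (M - S + Y/mu) by 1/mu.
        U, sig, Vt = np.linalg.svd(M - S + Y / mu, full_matrices=False)
        L = (U * np.maximum(sig - 1.0 / mu, 0.0)) @ Vt
        # S-update: elementwise soft thresholding of (M - L + Y/mu) by lam/mu.
        T = M - L + Y / mu
        S = np.sign(T) * np.maximum(np.abs(T) - lam / mu, 0.0)
        # Dual ascent step and convergence check on the residual.
        Y = Y + mu * (M - L - S)
        if np.linalg.norm(M - L - S, "fro") <= tol * norm_M:
            break
    return L, S
```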
Unroll the daily values to plot the timeseries. Note the spikes we wish to separate.
```python
import pandas as pd

# RobustPCA refers to the implementation provided in this repository.
data = pd.read_csv("Question1.csv", index_col=0, parse_dates=True)

# Unroll the (day x hour) matrix into a single hourly time series for plotting.
timeseries = data.stack()
timeseries.index = timeseries.index.droplevel(1)
timeseries.plot()

# Decompose the price matrix M into a low-rank part L and a sparse part S.
M = data.values
rpca = RobustPCA(max_iter=10000)
rpca.train_pca(M)
L = rpca.get_low_rank_matrix_L()
S = rpca.get_sparse_matrix_S()
```
Here L and S are the desired low-rank matrix and the sparse matrix that contains the price spikes.