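A typical RL task tries to optimize the objective function

$$J(\pi) = \mathbb{E}_{\pi}\left[G_0\right],$$

where

$$G_t = \sum_{k=0}^{\infty} \gamma^k R_{t+k+1},$$

which is a discounted cumulative sum of all one-step rewards. However, such an objective function does not take market risk into consideration. The simplest benchmark to measure risk and return together is the Sharpe ratio, i.e.

$$S = \frac{\mathbb{E}[R_t]}{\sqrt{\mathrm{Var}[R_t]}},$$

with $\mathbb{E}[R_t]$ and $\mathrm{Var}[R_t]$ the mean and variance of the one-step returns $R_t$ (the risk-free rate is taken to be zero). Unlike $G_t$, $S$ is a ratio of moments and cannot be written as a sum of one-step rewards.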
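To make the contrast concrete, here is a minimal sketch of both quantities for a finite sequence. $G_0$ is accumulated one reward at a time, while $S$ needs the whole sample's mean and standard deviation. The helper names `discounted_return` and `sharpe_ratio` are illustrative and do not come from the original code. The sketch assumes a zero risk-free rate and the population standard deviation (`np.std` with its default `ddof=0`).

```python
import numpy as np

def discounted_return(rewards, gamma):
    """G_0 = sum_k gamma**k * R_{k+1} for a finite reward sequence."""
    g = 0.0
    # accumulate backwards using G_t = R_{t+1} + gamma * G_{t+1}
    for r in reversed(rewards):
        g = r + gamma * g
    return g

def sharpe_ratio(returns):
    """Mean over population standard deviation, zero risk-free rate."""
    returns = np.asarray(returns, dtype=float)
    return returns.mean() / returns.std()

print(discounted_return([1.0, 1.0, 1.0], gamma=0.5))  # 1.75
print(sharpe_ratio([0.01, -0.02, 0.03, 0.00]))
```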
J. Moody proposed the differential Sharpe ratio (DSR), which turns the Sharpe ratio into a sum of one-step terms that can be used as rewards.
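First, for a given step $n$, the Sharpe ratio is

$$S_n = \frac{A_n}{\sqrt{B_n - A_n^2}}$$

with

$$A_n = \frac{1}{n}\sum_{i=1}^{n} R_i, \qquad B_n = \frac{1}{n}\sum_{i=1}^{n} R_i^2,$$

both of which can be updated incrementally:

$$A_n = A_{n-1} + \frac{1}{n}\left(R_n - A_{n-1}\right), \qquad B_n = B_{n-1} + \frac{1}{n}\left(R_n^2 - B_{n-1}\right).$$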
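The incremental updates can be checked with a short sketch. It keeps only $A_n$ and $B_n$ and reproduces the batch Sharpe ratio `mean / std`. The function name `running_sharpe` and the synthetic normal returns are assumptions for illustration only. At $n = 1$ the variance $B_1 - A_1^2$ is zero, so $S_1$ is reported as NaN.

```python
import numpy as np

def running_sharpe(returns):
    """Update A_n, B_n incrementally and return the list of S_n."""
    A, B = 0.0, 0.0
    path = []
    for n, r in enumerate(returns, start=1):
        A += (r - A) / n        # A_n = A_{n-1} + (R_n - A_{n-1}) / n
        B += (r * r - B) / n    # B_n = B_{n-1} + (R_n^2 - B_{n-1}) / n
        var = B - A * A
        path.append(A / np.sqrt(var) if var > 0 else np.nan)
    return path

# synthetic daily returns (illustrative only)
rng = np.random.default_rng(0)
r = rng.normal(0.0005, 0.01, size=300)
print(running_sharpe(r)[-1], r.mean() / r.std())
```

The two printed numbers should agree up to floating-point rounding.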
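Now we can extend this formalism to an exponential moving average Sharpe ratio on time scale $\eta^{-1}$,

$$S_t = \frac{A_t}{\sqrt{B_t - A_t^2}},$$

with

$$A_t = A_{t-1} + \eta\,\Delta A_t, \qquad B_t = B_{t-1} + \eta\,\Delta B_t, \qquad \Delta A_t = R_t - A_{t-1}, \qquad \Delta B_t = R_t^2 - B_{t-1},$$

initialized with $A_0 = B_0 = 0$. Setting $\eta = 0$ leaves the moments unchanged, so expanding $S_t$ to first order in $\eta$ gives

$$S_t \approx S_{t-1} + \eta\, D_t + O(\eta^2),$$

where we define the DSR as

$$D_t \equiv \left.\frac{dS_t}{d\eta}\right|_{\eta=0} = \frac{B_{t-1}\,\Delta A_t - \frac{1}{2}A_{t-1}\,\Delta B_t}{\left(B_{t-1} - A_{t-1}^2\right)^{3/2}}.$$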
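As a sanity check of the formula for $D_t$, the sketch below treats the step size as a free variable `h` and computes $S_t(h)$ from fixed $A_{t-1}, B_{t-1}$ and one new return. It then compares two pairs of quantities:

* a finite-difference derivative at `h = 0` against $D_t$;
* the exact one-step change at `h = eta` against $\eta D_t$.

The helper names, the synthetic history and the new return `r_new = 0.02` are all assumptions for illustration.

```python
import numpy as np

def ema_moments(returns, eta, A0=0.0, B0=0.0):
    """EMA estimates A_t, B_t of the first and second moments."""
    A, B = A0, B0
    for r in returns:
        A += eta * (r - A)       # A_t = A_{t-1} + eta * Delta A_t
        B += eta * (r * r - B)   # B_t = B_{t-1} + eta * Delta B_t
    return A, B

def dsr(A_prev, B_prev, r):
    """D_t = (B_{t-1} dA - 0.5 A_{t-1} dB) / (B_{t-1} - A_{t-1}^2)^(3/2)."""
    dA = r - A_prev
    dB = r * r - B_prev
    return (B_prev * dA - 0.5 * A_prev * dB) / (B_prev - A_prev**2) ** 1.5

rng = np.random.default_rng(1)
hist = rng.normal(0.0005, 0.01, size=500)   # synthetic history
eta = 0.004
A_prev, B_prev = ema_moments(hist, eta)     # A_{t-1}, B_{t-1}
r_new = 0.02                                # hypothetical new return R_t

def S_of(h):
    """Exact S_t after one EMA step of size h."""
    A = A_prev + h * (r_new - A_prev)
    B = B_prev + h * (r_new**2 - B_prev)
    return A / np.sqrt(B - A * A)

fd = (S_of(1e-7) - S_of(0.0)) / 1e-7        # numerical dS_t/d(eta) at 0
exact = S_of(eta) - S_of(0.0)               # true one-step change
first_order = eta * dsr(A_prev, B_prev, r_new)
print(fd, dsr(A_prev, B_prev, r_new))
print(exact, first_order)
```

The first pair should match closely. The second pair should differ only by the $O(\eta^2)$ term.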
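Now if we expand every step from a starting point $t_0$ in this way and sum, the increments telescope:

$$S_T \approx S_{t_0} + \eta \sum_{t=t_0+1}^{T} D_t.$$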
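The telescoped sum can be checked numerically. The sketch below does three things:

* warms up $A_t, B_t$ from $A_0 = B_0 = 0$ over `t0` steps;
* tracks the exact EMA Sharpe ratio $S_t$;
* tracks the approximation $S_{t_0} + \eta \sum D_t$.

The synthetic returns, the warm-up length `t0 = 500` and the helper names are assumptions chosen for illustration.

```python
import numpy as np

def ema_update(A, B, r, eta):
    """One EMA step for the first and second moments."""
    return A + eta * (r - A), B + eta * (r * r - B)

def dsr(A, B, r):
    """D_t computed from the moments before step t."""
    return (B * (r - A) - 0.5 * A * (r * r - B)) / (B - A * A) ** 1.5

rng = np.random.default_rng(2)
returns = rng.normal(0.0005, 0.01, size=1200)   # synthetic returns
eta, t0 = 0.004, 500                            # t0: warm-up length (assumed)

# warm-up: build A_{t0}, B_{t0} from zero initial conditions
A = B = 0.0
for r in returns[:t0]:
    A, B = ema_update(A, B, r, eta)
S0 = A / np.sqrt(B - A * A)

exact, approx = [], []
S_approx = S0
for r in returns[t0:]:
    S_approx += eta * dsr(A, B, r)     # accumulate eta * D_t
    A, B = ema_update(A, B, r, eta)    # exact EMA moments
    exact.append(A / np.sqrt(B - A * A))
    approx.append(S_approx)

exact, approx = np.array(exact), np.array(approx)
print("total change of S:", exact[-1] - S0)
print("max |exact - approx|:", np.abs(exact - approx).max())
```

Each $D_t$ is evaluated at the exact moments $A_{t-1}, B_{t-1}$. The $O(\eta^2)$ errors therefore add up across steps without feeding back into later increments.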
The original Sharpe-ratio objective thus becomes additive to first order. The one-step reward $R_t$ is now replaced by the DSR $D_t$, scaled by the constant $\eta$.
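In an RL loop, this substitution amounts to a small stateful wrapper. It maps each realized return $R_t$ to $D_t$ and then updates the EMA moments. The class `DifferentialSharpeReward` below is a hypothetical helper, not part of any library.

* It returns 0 while $B_{t-1} - A_{t-1}^2$ is not yet positive, which is an assumed convention.
* Right after a zero initialization the variance estimate is tiny, so early values of $D_t$ can be large. A warm-up period, like the 200 days used below, is one way to handle this.

```python
import numpy as np

class DifferentialSharpeReward:
    """Hypothetical helper: maps one-step returns R_t to DSR rewards D_t,
    keeping EMA moments A_t, B_t with step size eta."""

    def __init__(self, eta=0.004, A0=0.0, B0=0.0):
        self.eta, self.A, self.B = eta, A0, B0

    def __call__(self, r):
        var = self.B - self.A ** 2
        if var > 0:
            dA = r - self.A
            dB = r * r - self.B
            D = (self.B * dA - 0.5 * self.A * dB) / var ** 1.5
        else:
            D = 0.0  # assumed convention: D_t undefined before variance > 0
        # update the moments only after D_t is computed from A_{t-1}, B_{t-1}
        self.A += self.eta * (r - self.A)
        self.B += self.eta * (r * r - self.B)
        return D

reward_fn = DifferentialSharpeReward(eta=0.004)
rng = np.random.default_rng(3)
rewards = [reward_fn(r) for r in rng.normal(0.0005, 0.01, size=1000)]
print(rewards[-5:])
```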
Here we use the S&P 500 daily close price to illustrate the idea of DSR. We note that the initial condition for SR is set from the first 200 days in the code below.
```python
# encoding: utf-8
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# daily close prices of the S&P 500 (the CSV must have a 'close' column)
dt = pd.read_csv('SP500.csv', index_col=0)['close']
pct = dt.pct_change().ffill().fillna(0.0)
pct = pct.values

def sharpe(ls):
    # Sharpe ratio with zero risk-free rate; np.std uses ddof=0
    return np.mean(ls) / np.std(ls)

# ls1 contains the true SR increments
ls1 = []
sr0 = sharpe(pct[:200])
for i in range(500):
    sr = sharpe(pct[:200+i+1])
    ls1.append(sr - sr0)
    sr0 = sr

# ls2 uses cumulated DSR to approximate SR
ls2 = []
# eta is the step size; for the expanding window in ls1 the exact step is 1/(201+i)
eta = 0.004
# use the first 200 days to set an initial value of SR
sr = sharpe(pct[:200])
for i in range(500):
    # A, B: first and second moments of the returns before day 200+i
    A = np.mean(pct[:200+i])
    B = np.mean(pct[:200+i]**2)
    # new return pct[200+i]: the same day that ls1 adds to its window
    delta_A = pct[200+i] - A
    delta_B = pct[200+i]**2 - B
    Dt = (B*delta_A - 0.5*A*delta_B) / (B - A**2)**1.5
    sr += eta * Dt
    ls2.append(eta * Dt)
```
The comparison between the true SR increments (`ls1`) and the cumulated DSR (`ls2`) is presented as follows.