TurtlePlayer is a reinforcement learning framework for financial trading strategies built on the Turtle Trading system. It differs from most RL traders in that the action space does not map to buy, sell, and hold actions. Instead, the actions adjust the lookback periods used for entries and exits. The type 1 turtle strategy enters when the close exceeds yesterday's 20-day high and exits when the close falls below yesterday's 10-day low. TurtlePlayer can adjust these periods dynamically. Feel free to modify the parameters in config.py to make your own turtle traders.
At its core, TurtlePlayer is an experiment in Reinforcement Learning (RL) for trading where the action space isn't associated with buying or selling. Because the turtle trading strategy and nearly all of its variations have been priced in, it is highly unlikely that TurtlePlayer will generate competitive returns.
TurtlePlayer is built using Gymnasium and PyTorch for RL and NN training, Pandas and NumPy for data loading and manipulation, and Matplotlib, tabulate, and imageio for analysis.
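The type 1 entry and exit rule described above can be sketched in a few lines. This is a minimal illustration in plain Python, not the code from this repository: the function name `turtle_signals`, its parameters, and the choice to measure the channel on bar highs and lows are all assumptions made for the example.

```python
def turtle_signals(closes, highs, lows, entry_period=20, exit_period=10):
    """Return 'enter', 'exit', or None for each bar (long-only sketch).

    Assumption: the N-day high/low is taken over the N bars ending
    yesterday, so today's bar is never part of its own channel.
    """
    signals = []
    in_position = False
    for t in range(len(closes)):
        signal = None
        if not in_position and t >= entry_period:
            # Highest high of the entry_period bars before today.
            prior_high = max(highs[t - entry_period:t])
            if closes[t] > prior_high:
                signal, in_position = "enter", True
        elif in_position and t >= exit_period:
            # Lowest low of the exit_period bars before today.
            prior_low = min(lows[t - exit_period:t])
            if closes[t] < prior_low:
                signal, in_position = "exit", False
        signals.append(signal)
    return signals


# Tiny example with shortened periods so the breakout is easy to follow.
closes = [10, 10, 10, 12, 13, 11, 9]
highs = [c + 0.5 for c in closes]
lows = [c - 0.5 for c in closes]
signals = turtle_signals(closes, highs, lows, entry_period=3, exit_period=2)
print(signals)
```

Changing `entry_period` and `exit_period` per timestep is the kind of adjustment the agent's actions are meant to control.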
Follow these steps to get TurtlePlayer up and running on your system:
I recommend using Anaconda here to avoid library conflicts. (Run all commands in the Anaconda terminal!)
conda create -n turtle python=3.12
conda activate turtle
Clone the TurtlePlayer repository to your local machine and install the dependencies using the following commands:
git clone https://github.com/lordyabu/TurtlePlayer.git
cd TurtlePlayer
pip install -r requirements.txt
cd src
Open the code in an editor such as VS Code.
Be sure to read and configure config.py!
I've developed two agents: 1. BaseAgent (the base turtle trading algorithm) 2. DQNAgent
To run the agents after modifying config.py:
python run_agent.py --agent BaseAgent
python run_agent.py --agent DQNAgent
To analyze a specific log, first go into the logs folder and find the session number and episode numbers you want to look at. As there are many analysis types, I won't show all of the commands.
python analyze.py --type state --session1 1 --episode_nums1 1
python analyze.py --type state --session1 1 --episode_nums1 1 --session2 2 --episode_nums2 1,2,3
python analyze.py --type trade --session1 2
python analyze.py --type train --session1 2
python analyze.py --type performance --session1 1 --session2 2 # For this, a Base agent session must always be first
The Turtle Solver analyzes completed trades to determine the optimal action for each timestep of the trade and to provide additional metadata for the reward function.
After a trade is closed, the Turtle Solver steps back through each timestep of the trade to identify what the optimal actions could have been, based on the available data.
For each timestep, the Turtle Solver identifies one of the following optimal actions:
- BuyRange (Able to buy): Indicates that buying at this timestep is possible and optimal given the turtle parameters.
- AvoidBuyRange (Able to avoid buying): Indicates that avoiding a buy at this timestep is possible and optimal given the turtle parameters.
- ForcedBuy (Unable to avoid buying): Denotes situations where buying was unavoidable given the turtle parameters.
- CantBuy (Unable to buy): Denotes situations where we want to buy, but it is impossible given the turtle parameters.
The solver calculates an 'optimal window range' for each timestep, defined by minimum and maximum values (e.g., min = 15, max = 35). This range indicates where the trader's entry period should ideally fall to align with the best action identified.
A 'smoothed ideal' is also calculated for each timestep as a weighted average of the min and max values, typically using weights of 0.2 and 0.8, respectively. This figure represents a target or ideal value that combines insights from the range boundaries with a bias towards the max value.
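As a quick worked example of the smoothed ideal, here is a minimal sketch. The function name `smoothed_ideal` and its parameter names are hypothetical, and the weights are assumed to sum to 1 as in the 0.2 / 0.8 case above.

```python
def smoothed_ideal(range_min, range_max, w_min=0.2, w_max=0.8):
    """Weighted average of the optimal window range, biased towards the max.

    Assumption: w_min + w_max == 1, so no extra normalization is applied.
    """
    return w_min * range_min + w_max * range_max


# With the example range from the text: 0.2 * 15 + 0.8 * 35 = 3 + 28 = 31.
ideal = smoothed_ideal(15, 35)
print(round(ideal, 2))
```

Because most of the weight sits on the max, the target lands much closer to 35 than to the midpoint of 25.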
A 'transition approaching' flag is calculated for each timestep: it is set when the current optimal action is CantBuy or ForcedBuy and an optimal action of BuyRange or AvoidBuyRange occurs within the next 5 timesteps.
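One way to read the 'transition approaching' rule is sketched below. The function name, the set names, and the use of plain strings for actions are assumptions for illustration. The window is taken to be the 5 timesteps strictly after the current one, which is one possible interpretation of "within the next 5 time steps".

```python
CONSTRAINED = {"CantBuy", "ForcedBuy"}       # no real choice at this timestep
CHOOSABLE = {"BuyRange", "AvoidBuyRange"}    # the period can influence the outcome


def transition_approaching(actions, horizon=5):
    """Flag timesteps that are constrained now but become choosable soon."""
    flags = []
    for t, action in enumerate(actions):
        # Look only at the timesteps after t, up to `horizon` of them.
        upcoming = actions[t + 1:t + 1 + horizon]
        flags.append(action in CONSTRAINED and any(a in CHOOSABLE for a in upcoming))
    return flags


actions = ["CantBuy", "CantBuy", "BuyRange", "ForcedBuy"]
flags = transition_approaching(actions)
print(flags)
```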
The Turtle Solver outputs a list of data for each timestep, which includes the optimal actions, the optimal window range, and the smoothed ideal. This data is subsequently utilized to calculate rewards.
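To make the shape of that per-timestep output concrete, here is a hypothetical record type. The names `OptimalAction`, `SolverStep`, and `contains` are invented for this sketch and may not match the data structures the solver actually uses.

```python
from dataclasses import dataclass
from enum import Enum


class OptimalAction(Enum):
    BUY_RANGE = "BuyRange"
    AVOID_BUY_RANGE = "AvoidBuyRange"
    FORCED_BUY = "ForcedBuy"
    CANT_BUY = "CantBuy"


@dataclass(frozen=True)
class SolverStep:
    """Solver output for one timestep (illustrative field names)."""
    action: OptimalAction
    range_min: int
    range_max: int
    smoothed_ideal: float

    def contains(self, period):
        # True when the chosen entry period falls inside the optimal window range.
        return self.range_min <= period <= self.range_max


step = SolverStep(OptimalAction.BUY_RANGE, range_min=15, range_max=35, smoothed_ideal=31.0)
print(step.contains(20), step.contains(40))
```

A reward function could then take the agent's chosen period and a `SolverStep` as inputs, for example by checking `contains` before deciding between a reward and a penalty.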
1a. Reward for Being Inside the Optimal Range
1b. Penalty for Being Outside the Optimal Range
5. Final reward for element
Performance result tables (comparing Base Turtle and TurtlePlayer when exploration rate == 0) and graphs.
Orange represents the best turtle trader from the training episodes, red represents the worst turtle trader from the training episodes, green represents the average turtle trader from the training episodes, and blue represents the base turtle trader.
Episode | Start Date | End Date | PnL% Change | Avg Period |
---|---|---|---|---|
Base | 2006-01-20 | 2007-01-19 | -0.37% | 20 |
Base | 2010-02-05 | 2011-02-02 | 1.49% | 20 |
Base | 2014-02-24 | 2015-02-20 | 0.07% | 20 |
Base | 2018-03-06 | 2019-03-05 | -0.38% | 20 |
DQN_Average | 2006-01-20 | 2007-01-19 | -0.54% | 27.93 |
DQN_Average | 2010-02-05 | 2011-02-02 | 6.37% | 28.52 |
DQN_Average | 2014-02-24 | 2015-02-20 | -0.23% | 28.71 |
DQN_Average | 2018-03-06 | 2019-03-05 | -0.37% | 28.84 |
Episode | Start Date | End Date | PnL% Change | Avg Period |
---|---|---|---|---|
Base | 2006-01-19 | 2007-01-18 | 0.66% | 20 |
Base | 2010-02-04 | 2011-02-01 | -0.43% | 20 |
Base | 2014-02-21 | 2015-02-19 | 0.21% | 20 |
Base | 2018-03-08 | 2019-03-07 | -0.43% | 20 |
DQN_Average | 2006-01-19 | 2007-01-18 | 1.75% | 30.62 |
DQN_Average | 2010-02-04 | 2011-02-01 | -0.81% | 32.11 |
DQN_Average | 2014-02-21 | 2015-02-19 | 1.41% | 30.32 |
DQN_Average | 2018-03-08 | 2019-03-07 | -0.38% | 29.6 |
Episode | Start Date | End Date | PnL% Change | Avg Period |
---|---|---|---|---|
Base | 2006-01-20 | 2007-01-19 | 0.22% | 20 |
Base | 2010-02-05 | 2011-02-02 | 0.12% | 20 |
Base | 2014-02-24 | 2015-02-20 | 0.67% | 20 |
Base | 2018-03-09 | 2019-03-08 | 0.48% | 20 |
DQN_Average | 2006-01-20 | 2007-01-19 | 1.16% | 30.2 |
DQN_Average | 2010-02-05 | 2011-02-02 | -0.03% | 32.03 |
DQN_Average | 2014-02-24 | 2015-02-20 | 0.47% | 31.39 |
DQN_Average | 2018-03-09 | 2019-03-08 | 0.33% | 30.81 |
Episode | Start Date | End Date | PnL% Change | Avg Period |
---|---|---|---|---|
Base | 2006-01-20 | 2007-01-19 | -0.02% | 20 |
Base | 2010-02-05 | 2011-02-02 | 0.33% | 20 |
Base | 2014-02-24 | 2015-02-20 | -0.24% | 20 |
Base | 2018-03-09 | 2019-03-08 | -0.07% | 20 |
DQN_Average | 2006-01-20 | 2007-01-19 | -0.01% | 30.87 |
DQN_Average | 2010-02-05 | 2011-02-02 | 0.33% | 32.46 |
DQN_Average | 2014-02-24 | 2015-02-20 | -0.09% | 30.64 |
DQN_Average | 2018-03-09 | 2019-03-08 | -0.10% | 30.19 |
Episode | Start Date | End Date | PnL% Change | Avg Period |
---|---|---|---|---|
Base | 2006-01-20 | 2007-01-19 | 1.24% | 20 |
Base | 2010-02-05 | 2011-02-02 | 0.45% | 20 |
Base | 2014-02-24 | 2015-02-20 | -0.08% | 20 |
Base | 2018-03-09 | 2019-03-08 | -0.02% | 20 |
DQN_Average | 2006-01-20 | 2007-01-19 | 0.13% | 32.79 |
DQN_Average | 2010-02-05 | 2011-02-02 | 0.30% | 31.15 |
DQN_Average | 2014-02-24 | 2015-02-20 | -0.10% | 29.33 |
DQN_Average | 2018-03-09 | 2019-03-08 | -0.10% | 30.01 |
Episode | Start Date | End Date | PnL% Change | Avg Period |
---|---|---|---|---|
Base | 2007-09-07 | 2008-09-04 | 0.31% | 20 |
Base | 2011-09-21 | 2012-09-18 | 0.15% | 20 |
Base | 2015-10-08 | 2016-10-05 | -0.00% | 20 |
Base | 2019-10-24 | 2020-10-21 | -0.01% | 20 |
DQN_Average | 2007-09-07 | 2008-09-04 | 0.26% | 29.93 |
DQN_Average | 2011-09-21 | 2012-09-18 | 0.00% | 31.44 |
DQN_Average | 2015-10-08 | 2016-10-05 | -0.01% | 30.23 |
DQN_Average | 2019-10-24 | 2020-10-21 | 0.00% | 29.56 |
Episode | Start Date | End Date | PnL% Change | Avg Period |
---|---|---|---|---|
Base | 2010-06-18 | 2011-06-15 | -0.24% | 20 |
Base | 2014-07-07 | 2015-07-02 | -0.54% | 20 |
Base | 2018-07-20 | 2019-07-19 | -1.04% | 20 |
DQN_Average | 2010-06-18 | 2011-06-15 | 0.30% | 29.01 |
DQN_Average | 2014-07-07 | 2015-07-02 | -0.65% | 30.46 |
DQN_Average | 2018-07-20 | 2019-07-19 | -0.95% | 30.26 |
Episode | Start Date | End Date | PnL% Change | Avg Period |
---|---|---|---|---|
Base | 2006-01-20 | 2007-01-19 | -0.76% | 20 |
Base | 2010-02-05 | 2011-02-02 | 8.22% | 20 |
Base | 2014-02-24 | 2015-02-20 | 0.24% | 20 |
Base | 2018-03-09 | 2019-03-08 | -0.25% | 20 |
DQN_Average | 2006-01-20 | 2007-01-19 | -0.71% | 27.88 |
DQN_Average | 2010-02-05 | 2011-02-02 | 10.38% | 29.95 |
DQN_Average | 2014-02-24 | 2015-02-20 | 0.29% | 29.09 |
DQN_Average | 2018-03-09 | 2019-03-08 | -0.43% | 28.28 |
Episode | Start Date | End Date | PnL% Change | Avg Period |
---|---|---|---|---|
Base | 2006-01-20 | 2007-01-19 | 6.25% | 20 |
Base | 2010-02-05 | 2011-02-02 | 13.07% | 20 |
Base | 2014-02-24 | 2015-02-20 | -0.95% | 20 |
Base | 2018-03-09 | 2019-03-08 | -0.17% | 20 |
DQN_Average | 2006-01-20 | 2007-01-19 | 8.33% | 28.82 |
DQN_Average | 2010-02-05 | 2011-02-02 | 17.29% | 30.14 |
DQN_Average | 2014-02-24 | 2015-02-20 | -1.41% | 28.97 |
DQN_Average | 2018-03-09 | 2019-03-08 | -0.17% | 27.91 |
Episode | Start Date | End Date | PnL% Change | Avg Period |
---|---|---|---|---|
Base | 2006-01-20 | 2007-01-19 | 0.62% | 20 |
Base | 2010-02-05 | 2011-02-02 | -0.73% | 20 |
Base | 2014-02-24 | 2015-02-20 | -0.32% | 20 |
Base | 2018-03-09 | 2019-03-08 | 0.17% | 20 |
DQN_Average | 2006-01-20 | 2007-01-19 | 0.29% | 27.87 |
DQN_Average | 2010-02-05 | 2011-02-02 | -1.10% | 29.47 |
DQN_Average | 2014-02-24 | 2015-02-20 | -0.01% | 28.66 |
DQN_Average | 2018-03-09 | 2019-03-08 | 0.43% | 29.54 |
Ticker | Episode | Initial Total Value | Final Total Value | Cumulative Reward | PnL% Change | Total Units Traded |
---|---|---|---|---|---|---|
MSFT | Base Episode | 10,000,000.00 | 10,236,146.00 | 1031.92 | 2.36% | 436.0 |
MSFT | DQN Episode Average | 10,000,000.00 | 10,356,828.89 | 1287.26 | 3.57% | 377.3 |
NVDA | Base Episode | 10,000,000.00 | 12,059,314.11 | 961.41 | 20.59% | 444.0 |
NVDA | DQN Episode Average | 10,000,000.00 | 13,381,670.81 | 1273.13 | 33.82% | 398.22 |
F | Base Episode | 10,000,000.00 | 10,691,739.21 | 1182.49 | 6.92% | 335.0 |
F | DQN Episode Average | 10,000,000.00 | 12,220,967.77 | 1475.67 | 22.21% | 283.45 |
TGT | Base Episode | 10,000,000.00 | 9,984,240.40 | 1154.38 | -0.16% | 369.0 |
TGT | DQN Episode Average | 10,000,000.00 | 9,964,932.84 | 1455.4 | -0.35% | 323.69 |
M | Base Episode | 10,000,000.00 | 10,017,133.36 | 896.87 | 0.17% | 248.0 |
M | DQN Episode Average | 10,000,000.00 | 9,871,696.82 | 1176.59 | -1.28% | 213.88 |
NFLX | Base Episode | 10,000,000.00 | 14,532,029.97 | 1035.63 | 45.32% | 441.0 |
NFLX | DQN Episode Average | 10,000,000.00 | 14,098,771.12 | 1281.56 | 40.99% | 400.05 |
COKE | Base Episode | 10,000,000.00 | 10,211,641.18 | 1171.98 | 2.12% | 367.0 |
COKE | DQN Episode Average | 10,000,000.00 | 10,217,117.20 | 1532.15 | 2.17% | 317.42 |
CVX | Base Episode | 10,000,000.00 | 10,118,803.04 | 1067.46 | 1.19% | 422.0 |
CVX | DQN Episode Average | 10,000,000.00 | 10,163,114.62 | 1436.41 | 1.63% | 360.75 |
GOOG | Base Episode | 10,000,000.00 | 10,322,082.70 | 965.82 | 3.22% | 456.0 |
GOOG | DQN Episode Average | 10,000,000.00 | 10,342,720.69 | 1244.97 | 3.43% | 402.68 |
AMZN | Base Episode | 10,000,000.00 | 11,063,667.31 | 964.14 | 10.64% | 481.0 |
AMZN | DQN Episode Average | 10,000,000.00 | 10,537,750.99 | 1278.27 | 5.38% | 412.06 |
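
The PnL% Change values in the summary table above are consistent with a simple relative change in total value. The helper below is a small sketch of that arithmetic; the name `pnl_pct` is hypothetical, and the formula is inferred from the table rather than taken from the code.

```python
def pnl_pct(initial_value, final_value):
    """Percentage change in total portfolio value."""
    return (final_value - initial_value) / initial_value * 100.0


# MSFT base episode: 10,000,000.00 -> 10,236,146.00
msft_base = pnl_pct(10_000_000.00, 10_236_146.00)
# NVDA DQN episode average: 10,000,000.00 -> 13,381,670.81
nvda_dqn = pnl_pct(10_000_000.00, 13_381_670.81)
print(f"{msft_base:.2f}% {nvda_dqn:.2f}%")
```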