
TurtlePlayer

TurtlePlayer is a reinforcement learning framework for financial trading strategies built on the Turtle Trading system. It differs from most RL traders in that the action space isn't mapped to buying, selling, and holding. Instead, the actions adjust the lookback periods for entries and exits. The type-1 turtle strategy enters when the close exceeds the previous day's 20-day high and exits when the close falls below the previous day's 10-day low; TurtlePlayer can adjust these periods dynamically. Feel free to modify the parameters in config.py to build your own turtle traders.
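
As a minimal sketch of that rule (an illustration using pandas, which the project already depends on; not the repo's actual implementation):

```python
import pandas as pd

def turtle_signals(close: pd.Series, entry_window: int = 20,
                   exit_window: int = 10) -> pd.DataFrame:
    """Classic type-1 turtle rule: enter when the close breaks above the
    prior day's rolling high, exit when it falls below the prior day's
    rolling low."""
    # shift(1) compares today's close against *yesterday's* rolling extremes
    entry_high = close.rolling(entry_window).max().shift(1)
    exit_low = close.rolling(exit_window).min().shift(1)
    return pd.DataFrame({"enter": close > entry_high,
                         "exit": close < exit_low})
```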

Description

At its core, TurtlePlayer is designed to experiment with reinforcement learning (RL) in trading where the action space isn't associated with buying or selling. Because the turtle trading strategy, and essentially every variation of it, has long been priced in, it is highly unlikely that TurtlePlayer will generate competitive returns.

TurtlePlayer is built using Gymnasium and PyTorch for the RL environment and network training, pandas and NumPy for data loading and manipulation, and Matplotlib, tabulate, and imageio for analysis.
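
Since the environment layer is Gymnasium, the sketch below shows one way a non-trading action space can be wired up: a Discrete(3) action that only nudges the entry lookback window. All names and bounds here are hypothetical, not the repo's.

```python
import gymnasium as gym
import numpy as np
from gymnasium import spaces

class WindowAdjustEnv(gym.Env):
    """Hypothetical sketch: the agent tunes the entry lookback window
    rather than issuing buy/sell/hold orders; the turtle rule itself
    still decides when trades happen."""

    def __init__(self, min_window: int = 10, max_window: int = 40):
        super().__init__()
        self.min_window, self.max_window = min_window, max_window
        self.window = 20                         # classic turtle default
        self.action_space = spaces.Discrete(3)   # 0=Decrease, 1=Nothing, 2=Increase
        self.observation_space = spaces.Box(min_window, max_window,
                                            shape=(1,), dtype=np.float32)

    def reset(self, *, seed=None, options=None):
        super().reset(seed=seed)
        self.window = 20
        return np.array([self.window], dtype=np.float32), {}

    def step(self, action):
        # Map the discrete action (0/1/2) onto a -1/0/+1 window adjustment,
        # clipped to the allowed bounds
        self.window = int(np.clip(self.window + (action - 1),
                                  self.min_window, self.max_window))
        reward = 0.0  # the real reward comes from the Turtle Solver (described below)
        return np.array([self.window], dtype=np.float32), reward, False, False, {}
```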


Installation Guide

Follow these steps to get TurtlePlayer up and running on your system:

I recommend using Anaconda so you can avoid library conflicts (run all commands in an Anaconda terminal):

```
conda create -n turtle python=3.12
conda activate turtle
```

Step 1: Clone the Repository

Clone the TurtlePlayer repository to your local machine using the following command:

```
git clone https://github.com/lordyabu/TurtlePlayer.git
```

Step 2: Navigate to the TurtlePlayer directory

```
cd TurtlePlayer
```

Step 3: Install dependencies

```
pip install -r requirements.txt
```

Step 4: Navigate into the source code

```
cd src
```

Configuration and Running Guide

Open the code in an editor like VSCode

Be sure to read and configure config.py!
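
Purely as an illustration of the kind of knobs to expect (all names here are hypothetical; check the real config.py for the actual ones):

```python
# Hypothetical illustration: consult the actual config.py for real names.
TICKER = "MSFT"                  # instrument to trade
ENTRY_WINDOW = 20                # starting entry lookback (classic turtle default)
EXIT_WINDOW = 10                 # exit lookback
MIN_WINDOW, MAX_WINDOW = 10, 40  # bounds the agent may move the entry window within
NUM_EPISODES = 50                # training episodes for the DQNAgent
```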

I've developed two agents:

1. BaseAgent (the base turtle trading algorithm)
2. DQNAgent

To run the agents after modifying config.py:

```
python run_agent.py --agent BaseAgent
python run_agent.py --agent DQNAgent
```

Analyzing results

To analyze a specific log, first go into the logs folder and find the session and episode numbers you want to look at. As there are many analysis types, I won't show every command; here are a few examples:

```
python analyze.py --type state --session1 1 --episode_nums1 1
python analyze.py --type state --session1 1 --episode_nums1 1 --session2 2 --episode_nums2 1,2,3
python analyze.py --type trade --session1 2
python analyze.py --type train --session1 2
python analyze.py --type performance --session1 1 --session2 2  # a BaseAgent session must always come first
```

Reward function

Part 1: Turtle solver

The Turtle Solver analyzes completed trades, determines the optimal action for each timestep during the trade, and provides additional metadata for the reward function.

After a trade is closed, the Turtle Solver steps back through each timestep of the trade and identifies what the optimal actions would have been, based on the data available at the time.

Actions Analyzed

For each timestep, the Turtle Solver identifies one of the following optimal actions:

  • BuyRange (Able to buy): Indicates that buying at this timestep is possible and optimal given the turtle parameters.
  • AvoidBuyRange (Able to avoid buying): Indicates that avoiding a buy at this timestep is possible and optimal given the turtle parameters.
  • ForcedBuy (Unable to avoid buying): Denotes situations where buying was unavoidable given the turtle parameters.
  • CantBuy (Unable to buy): Denotes situations where we want to buy, but it is impossible given the turtle parameters.
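
Below is a rough sketch of how such a classification could work, inferred from the descriptions above rather than taken from the repo; `rolling_highs` is a hypothetical lookup of yesterday's rolling high for each candidate window length:

```python
def classify_timestep(close: float, rolling_highs: dict[int, float],
                      want_to_buy: bool, min_w: int, max_w: int) -> str:
    """Label one timestep with one of the four solver categories."""
    # Candidate entry windows whose breakout level today's close exceeds
    triggering = [w for w in range(min_w, max_w + 1) if close > rolling_highs[w]]
    can_buy = len(triggering) > 0                      # some window triggers a buy
    can_avoid = len(triggering) < (max_w - min_w + 1)  # some window avoids a buy

    if want_to_buy:
        return "BuyRange" if can_buy else "CantBuy"
    return "AvoidBuyRange" if can_avoid else "ForcedBuy"
```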

Optimal Window Range

The solver calculates an 'optimal window range' for each timestep, defined by minimum and maximum values (e.g., min = 15, max = 35). This range indicates where the trader's entry period should ideally fall to align with the best action identified.

Smoothed Ideal Calculation

A 'smoothed ideal' is also calculated for each timestep as a weighted average of the min and max values, typically using weights of 0.2 and 0.8, respectively. This figure represents a target or ideal value that combines insights from the range boundaries with a bias towards the max value.
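
For example, with the range above (min = 15, max = 35):

$$\mathrm{smoothed\_ideal} = 0.2 \times 15 + 0.8 \times 35 = 31$$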

Transition Approaching Calculation

A 'transition approaching' flag is set for a timestep if its 'optimal action' is in [CantBuy, ForcedBuy] and an 'optimal action' in [BuyRange, AvoidBuyRange] occurs within the next 5 timesteps.
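
A sketch of that check, assuming the solver's per-timestep labels are collected in a list:

```python
def transition_approaching(actions: list[str], t: int, horizon: int = 5) -> bool:
    """True if timestep t is 'stuck' (CantBuy/ForcedBuy) but a flexible
    label (BuyRange/AvoidBuyRange) appears within the next `horizon` steps."""
    if actions[t] not in ("CantBuy", "ForcedBuy"):
        return False
    return any(a in ("BuyRange", "AvoidBuyRange")
               for a in actions[t + 1 : t + 1 + horizon])
```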

Output

The Turtle Solver outputs a list of data for each timestep, which includes the optimal actions, the optimal window range, and the smoothed ideal. This data is subsequently utilized to calculate rewards.

Part 2: Reward calculation

For each element in the list provided by the Turtle Solver:

1a. Reward for Being Inside the Optimal Range

$$d_{\mathrm{ideal}} = \left|\mathrm{agent\_window} - \mathrm{smoothed\_ideal}\right|$$

$$\mathrm{base\_reward} = 0.75 \times \left(1 - \frac{d_{\mathrm{ideal}}}{\max\left(\mathrm{solver\_window_{max}} - \mathrm{solver\_window_{min}},\ 1\right)}\right)$$

1b. Penalty for Being Outside the Optimal Range

$$\mathrm{base\_penalty} = -0.5 \times \left(1 - \frac{1}{\max\left(\frac{1}{\log\left(\max\left(\mathrm{solver\_window_{max}} - \mathrm{solver\_window_{min}},\ 2\right)\right)},\ 1\right)}\right)$$

2. Scaling based on the optimal action and 'transition approaching'

$$
\begin{cases}
\mathrm{base\_reward} \mathrel{\times}= 0.55 & \text{if 'transition approaching' and optimal action} \in \{\mathrm{ForcedBuy}, \mathrm{CantBuy}\} \\
\mathrm{base\_reward} \mathrel{\times}= 0.3 & \text{if not 'transition approaching' and optimal action} \in \{\mathrm{ForcedBuy}, \mathrm{CantBuy}\} \\
\mathrm{base\_penalty} \mathrel{\times}= 1.15 & \text{if optimal action} \notin \{\mathrm{BuyRange}, \mathrm{AvoidBuyRange}\} \text{ and 'transition approaching'} \\
\mathrm{base\_penalty} \mathrel{\times}= 1.75 & \text{if optimal action} \in \{\mathrm{BuyRange}, \mathrm{AvoidBuyRange}\}
\end{cases}
$$

3. Directional adjustments

$$
\begin{cases}
\mathrm{base\_reward} \mathrel{+}= 0.15 & \text{if in optimal window, } \mathrm{agent\_window} > \mathrm{smoothed\_ideal} \text{, and agent action} = \mathrm{Decrease} \\
\mathrm{base\_reward} \mathrel{+}= 0.15 & \text{if in optimal window, } \mathrm{agent\_window} < \mathrm{smoothed\_ideal} \text{, and agent action} = \mathrm{Increase} \\
\mathrm{base\_reward} \mathrel{+}= 0.075 & \text{if in optimal window and agent action} = \mathrm{Nothing} \\
\mathrm{base\_reward} \mathrel{-}= 0.15 & \text{if in optimal window and moving away from the smoothed ideal} \\
\mathrm{base\_penalty} \mathrel{+}= 0.2 & \text{if not in optimal window, } \mathrm{agent\_window} < \mathrm{solver\_window_{min}} \text{, and agent action} = \mathrm{Increase} \\
\mathrm{base\_penalty} \mathrel{+}= 0.2 & \text{if not in optimal window, } \mathrm{agent\_window} > \mathrm{solver\_window_{max}} \text{, and agent action} = \mathrm{Decrease} \\
\mathrm{base\_penalty} \mathrel{-}= 0.2 & \text{if not in optimal window and moving away from the range}
\end{cases}
$$

4. Final reward for the element

$$\mathrm{Reward} = \begin{cases} \mathrm{base\_reward} & \text{if inside the optimal window} \\ \mathrm{base\_penalty} & \text{otherwise} \end{cases}$$
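
Putting the steps together, here is a condensed transcription of the formulas above into Python (a sketch; the variable names are mine, not the repo's):

```python
import math

def element_reward(agent_window: float, agent_action: str,
                   w_min: float, w_max: float,
                   optimal_action: str, transition_approaching: bool) -> float:
    """Per-element reward, following steps 1a/1b and 2-4 above."""
    smoothed_ideal = 0.2 * w_min + 0.8 * w_max

    if w_min <= agent_window <= w_max:
        # 1a: reward shrinks linearly with distance from the smoothed ideal
        d_ideal = abs(agent_window - smoothed_ideal)
        reward = 0.75 * (1 - d_ideal / max(w_max - w_min, 1))
        # 2: damp the reward when the turtle parameters leave no real choice
        if optimal_action in ("ForcedBuy", "CantBuy"):
            reward *= 0.55 if transition_approaching else 0.3
        # 3: nudge the agent toward the smoothed ideal
        if agent_window > smoothed_ideal and agent_action == "Decrease":
            reward += 0.15
        elif agent_window < smoothed_ideal and agent_action == "Increase":
            reward += 0.15
        elif agent_action == "Nothing":
            reward += 0.075
        else:
            reward -= 0.15  # moving away from the smoothed ideal
        return reward

    # 1b: penalty for sitting outside the optimal window
    penalty = -0.5 * (1 - 1 / max(1 / math.log(max(w_max - w_min, 2)), 1))
    # 2: scale the penalty by how avoidable the mistake was
    if optimal_action in ("BuyRange", "AvoidBuyRange"):
        penalty *= 1.75
    elif transition_approaching:
        penalty *= 1.15
    # 3: soften the penalty when the agent is heading back toward the window
    if agent_window < w_min and agent_action == "Increase":
        penalty += 0.2
    elif agent_window > w_max and agent_action == "Decrease":
        penalty += 0.2
    elif agent_action != "Nothing":
        penalty -= 0.2  # moving away from the range
    return penalty
```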

Actual results

Performance result tables (comparing the base turtle and TurtlePlayer when the exploration rate == 0) and graphs.

In the graphs, orange represents the best turtle trader from the training episodes, red the worst, green the average, and blue the base turtle trader.

F Performance Results

| Episode | Start Date | End Date | PnL % Change | Avg Period |
| --- | --- | --- | --- | --- |
| Base | 2006-01-20 | 2007-01-19 | -0.37% | 20 |
| Base | 2010-02-05 | 2011-02-02 | 1.49% | 20 |
| Base | 2014-02-24 | 2015-02-20 | 0.07% | 20 |
| Base | 2018-03-06 | 2019-03-05 | -0.38% | 20 |
| DQN_Average | 2006-01-20 | 2007-01-19 | -0.54% | 27.93 |
| DQN_Average | 2010-02-05 | 2011-02-02 | 6.37% | 28.52 |
| DQN_Average | 2014-02-24 | 2015-02-20 | -0.23% | 28.71 |
| DQN_Average | 2018-03-06 | 2019-03-05 | -0.37% | 28.84 |

(performance graph)

MSFT Performance Results

| Episode | Start Date | End Date | PnL % Change | Avg Period |
| --- | --- | --- | --- | --- |
| Base | 2006-01-19 | 2007-01-18 | 0.66% | 20 |
| Base | 2010-02-04 | 2011-02-01 | -0.43% | 20 |
| Base | 2014-02-21 | 2015-02-19 | 0.21% | 20 |
| Base | 2018-03-08 | 2019-03-07 | -0.43% | 20 |
| DQN_Average | 2006-01-19 | 2007-01-18 | 1.75% | 30.62 |
| DQN_Average | 2010-02-04 | 2011-02-01 | -0.81% | 32.11 |
| DQN_Average | 2014-02-21 | 2015-02-19 | 1.41% | 30.32 |
| DQN_Average | 2018-03-08 | 2019-03-07 | -0.38% | 29.6 |

(performance graph)

COKE Performance Results

| Episode | Start Date | End Date | PnL % Change | Avg Period |
| --- | --- | --- | --- | --- |
| Base | 2006-01-20 | 2007-01-19 | 0.22% | 20 |
| Base | 2010-02-05 | 2011-02-02 | 0.12% | 20 |
| Base | 2014-02-24 | 2015-02-20 | 0.67% | 20 |
| Base | 2018-03-09 | 2019-03-08 | 0.48% | 20 |
| DQN_Average | 2006-01-20 | 2007-01-19 | 1.16% | 30.2 |
| DQN_Average | 2010-02-05 | 2011-02-02 | -0.03% | 32.03 |
| DQN_Average | 2014-02-24 | 2015-02-20 | 0.47% | 31.39 |
| DQN_Average | 2018-03-09 | 2019-03-08 | 0.33% | 30.81 |

(performance graph)

CVX Performance Results

| Episode | Start Date | End Date | PnL % Change | Avg Period |
| --- | --- | --- | --- | --- |
| Base | 2006-01-20 | 2007-01-19 | -0.02% | 20 |
| Base | 2010-02-05 | 2011-02-02 | 0.33% | 20 |
| Base | 2014-02-24 | 2015-02-20 | -0.24% | 20 |
| Base | 2018-03-09 | 2019-03-08 | -0.07% | 20 |
| DQN_Average | 2006-01-20 | 2007-01-19 | -0.01% | 30.87 |
| DQN_Average | 2010-02-05 | 2011-02-02 | 0.33% | 32.46 |
| DQN_Average | 2014-02-24 | 2015-02-20 | -0.09% | 30.64 |
| DQN_Average | 2018-03-09 | 2019-03-08 | -0.10% | 30.19 |

(performance graph)

AMZN Performance Results

| Episode | Start Date | End Date | PnL % Change | Avg Period |
| --- | --- | --- | --- | --- |
| Base | 2006-01-20 | 2007-01-19 | 1.24% | 20 |
| Base | 2010-02-05 | 2011-02-02 | 0.45% | 20 |
| Base | 2014-02-24 | 2015-02-20 | -0.08% | 20 |
| Base | 2018-03-09 | 2019-03-08 | -0.02% | 20 |
| DQN_Average | 2006-01-20 | 2007-01-19 | 0.13% | 32.79 |
| DQN_Average | 2010-02-05 | 2011-02-02 | 0.30% | 31.15 |
| DQN_Average | 2014-02-24 | 2015-02-20 | -0.10% | 29.33 |
| DQN_Average | 2018-03-09 | 2019-03-08 | -0.10% | 30.01 |

(performance graph)

GOOG Performance Results

| Episode | Start Date | End Date | PnL % Change | Avg Period |
| --- | --- | --- | --- | --- |
| Base | 2007-09-07 | 2008-09-04 | 0.31% | 20 |
| Base | 2011-09-21 | 2012-09-18 | 0.15% | 20 |
| Base | 2015-10-08 | 2016-10-05 | -0.00% | 20 |
| Base | 2019-10-24 | 2020-10-21 | -0.01% | 20 |
| DQN_Average | 2007-09-07 | 2008-09-04 | 0.26% | 29.93 |
| DQN_Average | 2011-09-21 | 2012-09-18 | 0.00% | 31.44 |
| DQN_Average | 2015-10-08 | 2016-10-05 | -0.01% | 30.23 |
| DQN_Average | 2019-10-24 | 2020-10-21 | 0.00% | 29.56 |

(performance graph)

M Performance Results

| Episode | Start Date | End Date | PnL % Change | Avg Period |
| --- | --- | --- | --- | --- |
| Base | 2010-06-18 | 2011-06-15 | -0.24% | 20 |
| Base | 2014-07-07 | 2015-07-02 | -0.54% | 20 |
| Base | 2018-07-20 | 2019-07-19 | -1.04% | 20 |
| DQN_Average | 2010-06-18 | 2011-06-15 | 0.30% | 29.01 |
| DQN_Average | 2014-07-07 | 2015-07-02 | -0.65% | 30.46 |
| DQN_Average | 2018-07-20 | 2019-07-19 | -0.95% | 30.26 |

(performance graph)

NFLX Performance Results

| Episode | Start Date | End Date | PnL % Change | Avg Period |
| --- | --- | --- | --- | --- |
| Base | 2006-01-20 | 2007-01-19 | -0.76% | 20 |
| Base | 2010-02-05 | 2011-02-02 | 8.22% | 20 |
| Base | 2014-02-24 | 2015-02-20 | 0.24% | 20 |
| Base | 2018-03-09 | 2019-03-08 | -0.25% | 20 |
| DQN_Average | 2006-01-20 | 2007-01-19 | -0.71% | 27.88 |
| DQN_Average | 2010-02-05 | 2011-02-02 | 10.38% | 29.95 |
| DQN_Average | 2014-02-24 | 2015-02-20 | 0.29% | 29.09 |
| DQN_Average | 2018-03-09 | 2019-03-08 | -0.43% | 28.28 |

(performance graph)

NVDA Performance Results

| Episode | Start Date | End Date | PnL % Change | Avg Period |
| --- | --- | --- | --- | --- |
| Base | 2006-01-20 | 2007-01-19 | 6.25% | 20 |
| Base | 2010-02-05 | 2011-02-02 | 13.07% | 20 |
| Base | 2014-02-24 | 2015-02-20 | -0.95% | 20 |
| Base | 2018-03-09 | 2019-03-08 | -0.17% | 20 |
| DQN_Average | 2006-01-20 | 2007-01-19 | 8.33% | 28.82 |
| DQN_Average | 2010-02-05 | 2011-02-02 | 17.29% | 30.14 |
| DQN_Average | 2014-02-24 | 2015-02-20 | -1.41% | 28.97 |
| DQN_Average | 2018-03-09 | 2019-03-08 | -0.17% | 27.91 |

(performance graph)

TGT Performance Results

| Episode | Start Date | End Date | PnL % Change | Avg Period |
| --- | --- | --- | --- | --- |
| Base | 2006-01-20 | 2007-01-19 | 0.62% | 20 |
| Base | 2010-02-05 | 2011-02-02 | -0.73% | 20 |
| Base | 2014-02-24 | 2015-02-20 | -0.32% | 20 |
| Base | 2018-03-09 | 2019-03-08 | 0.17% | 20 |
| DQN_Average | 2006-01-20 | 2007-01-19 | 0.29% | 27.87 |
| DQN_Average | 2010-02-05 | 2011-02-02 | -1.10% | 29.47 |
| DQN_Average | 2014-02-24 | 2015-02-20 | -0.01% | 28.66 |
| DQN_Average | 2018-03-09 | 2019-03-08 | 0.43% | 29.54 |

(performance graph)

Full time period results (includes all time steps, even when the exploration rate > 0)

| Ticker | Episode | Initial Total Value | Final Total Value | Cumulative Reward | PnL % Change | Total Units Traded |
| --- | --- | --- | --- | --- | --- | --- |
| MSFT | Base Episode | 10,000,000.00 | 10,236,146.00 | 1031.92 | 2.36% | 436.0 |
| MSFT | DQN Episode Average | 10,000,000.00 | 10,356,828.89 | 1287.26 | 3.57% | 377.3 |
| NVDA | Base Episode | 10,000,000.00 | 12,059,314.11 | 961.41 | 20.59% | 444.0 |
| NVDA | DQN Episode Average | 10,000,000.00 | 13,381,670.81 | 1273.13 | 33.82% | 398.22 |
| F | Base Episode | 10,000,000.00 | 10,691,739.21 | 1182.49 | 6.92% | 335.0 |
| F | DQN Episode Average | 10,000,000.00 | 12,220,967.77 | 1475.67 | 22.21% | 283.45 |
| TGT | Base Episode | 10,000,000.00 | 9,984,240.40 | 1154.38 | -0.16% | 369.0 |
| TGT | DQN Episode Average | 10,000,000.00 | 9,964,932.84 | 1455.4 | -0.35% | 323.69 |
| M | Base Episode | 10,000,000.00 | 10,017,133.36 | 896.87 | 0.17% | 248.0 |
| M | DQN Episode Average | 10,000,000.00 | 9,871,696.82 | 1176.59 | -1.28% | 213.88 |
| NFLX | Base Episode | 10,000,000.00 | 14,532,029.97 | 1035.63 | 45.32% | 441.0 |
| NFLX | DQN Episode Average | 10,000,000.00 | 14,098,771.12 | 1281.56 | 40.99% | 400.05 |
| COKE | Base Episode | 10,000,000.00 | 10,211,641.18 | 1171.98 | 2.12% | 367.0 |
| COKE | DQN Episode Average | 10,000,000.00 | 10,217,117.20 | 1532.15 | 2.17% | 317.42 |
| CVX | Base Episode | 10,000,000.00 | 10,118,803.04 | 1067.46 | 1.19% | 422.0 |
| CVX | DQN Episode Average | 10,000,000.00 | 10,163,114.62 | 1436.41 | 1.63% | 360.75 |
| GOOG | Base Episode | 10,000,000.00 | 10,322,082.70 | 965.82 | 3.22% | 456.0 |
| GOOG | DQN Episode Average | 10,000,000.00 | 10,342,720.69 | 1244.97 | 3.43% | 402.68 |
| AMZN | Base Episode | 10,000,000.00 | 11,063,667.31 | 964.14 | 10.64% | 481.0 |
| AMZN | DQN Episode Average | 10,000,000.00 | 10,537,750.99 | 1278.27 | 5.38% | 412.06 |
