ExchangeAgent

Training a stock exchange agent with Reinforcement Learning algorithms and a Decision Transformer.

Environment

  • The environment is gym-anytrading, which loads historical stock prices as a Gymnasium (formerly OpenAI Gym) environment so that users can train custom agents to trade in it.
  • At each step the environment returns a window of recent stock prices as the observation and accepts two discrete actions: 0 to sell and 1 to buy.
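
The observation and action scheme described above can be illustrated with a small standalone sketch. This is not gym-anytrading's code: the `Action` enum and the `window_observation` helper are hypothetical names introduced here, the price list is made-up toy data, and the real environment's observation may carry additional features beyond raw prices.

```python
from enum import IntEnum
from typing import List, Sequence


class Action(IntEnum):
    """Discrete actions, using the 0 = sell / 1 = buy encoding from the text."""
    SELL = 0
    BUY = 1


def window_observation(prices: Sequence[float], t: int, window_size: int) -> List[float]:
    """Return the `window_size` most recent prices ending at index `t` (inclusive)."""
    if t + 1 < window_size:
        raise ValueError("not enough price history to fill the window")
    return list(prices[t - window_size + 1 : t + 1])


# Toy price series (illustrative values only, not real market data).
prices = [10.0, 10.5, 10.2, 11.0, 11.4, 11.1]

# At step t=4 with a window of 3, the agent sees the prices at indices 2, 3, 4.
obs = window_observation(prices, t=4, window_size=3)
```

A policy maps each such window to either `Action.SELL` or `Action.BUY`.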

Reinforcement Learning Algorithms

  • The first set of agents was trained using A2C and PPO algorithms implemented in the stable-baselines3 library.
  • Each algorithm was trained for 25K, 50K, 100K, 200K, 500K, and 1M timesteps.
  • The best-performing agent was PPO trained for 200K timesteps, which collects an average reward of roughly 900 per episode.
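
The sweep over algorithms and timestep budgets can be sketched as a simple grid loop. The helper `run_sweep` and the `train_and_evaluate` callable are hypothetical names, not part of this repository or of stable-baselines3. In practice the callable would presumably wrap a training call such as `PPO("MlpPolicy", env).learn(total_timesteps=n)` followed by an evaluation that returns the mean episode reward.

```python
from typing import Callable, Dict, Iterable, Tuple

Config = Tuple[str, int]  # (algorithm name, training timesteps)

ALGORITHMS = ["A2C", "PPO"]
BUDGETS = [25_000, 50_000, 100_000, 200_000, 500_000, 1_000_000]


def run_sweep(
    algorithms: Iterable[str],
    budgets: Iterable[int],
    train_and_evaluate: Callable[[str, int], float],
) -> Tuple[Config, Dict[Config, float]]:
    """Train every (algorithm, budget) pair and return the best one.

    `train_and_evaluate` is supplied by the caller and must return the
    mean episode reward of the trained agent (higher is better).
    """
    budgets = list(budgets)
    results: Dict[Config, float] = {}
    for algo in algorithms:
        for steps in budgets:
            results[(algo, steps)] = train_and_evaluate(algo, steps)
    if not results:
        raise ValueError("the sweep needs at least one configuration")
    # Pick the configuration with the highest mean episode reward.
    best = max(results, key=results.get)
    return best, results
```

Calling `run_sweep(ALGORITHMS, BUDGETS, my_trainer)` with a user-supplied `my_trainer` would evaluate all twelve configurations covered in this section.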

Performance Comparison between Models:

random-a2c-25-50-100-steps-comparison-chart.png random-a2c-200-500-1000-steps-comparison-chart.png

random-ppo-25-50-100-steps-comparison-chart.png random-ppo-200-500-1000-steps-comparison-chart.png

Sample Trading History (PPO-200K):

ppo-200-stock-exchange-graph.png

Decision Transformer

  • A Decision Transformer, from the paper Decision Transformer: Reinforcement Learning via Sequence Modeling, was also trained offline on trajectories collected from the PPO-200K model.
  • However, the Decision Transformer failed to converge. This is attributed to the small observation space of the time-series environment and to the model not converging easily on a discrete action space.
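
For context on what training on collected trajectories involves, the Decision Transformer conditions each step on the return-to-go, meaning the sum of rewards from that step to the end of the episode. The sketch below shows one way to turn a recorded episode into (return-to-go, state, action) triples. The function names and the toy episode are illustrative assumptions, not this repository's preprocessing code.

```python
from typing import Any, List, Sequence, Tuple


def returns_to_go(rewards: Sequence[float]) -> List[float]:
    """Undiscounted return-to-go: rtg[t] = rewards[t] + rewards[t+1] + ..."""
    rtg = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each step adds its reward to the future total.
    for t in reversed(range(len(rewards))):
        running += rewards[t]
        rtg[t] = running
    return rtg


def to_triples(
    states: Sequence[Any], actions: Sequence[int], rewards: Sequence[float]
) -> List[Tuple[float, Any, int]]:
    """Pair every step's return-to-go with its state and action."""
    if not (len(states) == len(actions) == len(rewards)):
        raise ValueError("states, actions and rewards must have equal length")
    return list(zip(returns_to_go(rewards), states, actions))


# Toy episode (made-up values): four steps with 0 = sell, 1 = buy.
example_rewards = [1.0, 0.0, 2.0, -1.0]
example_actions = [1, 1, 0, 1]
example_states = ["s0", "s1", "s2", "s3"]

example_rtg = returns_to_go(example_rewards)  # [2.0, 1.0, 1.0, -1.0]
triples = to_triples(example_states, example_actions, example_rewards)
```

At inference time the model is prompted with a target return and predicts the action for the current state, so the quality and diversity of the recorded episodes directly bound what it can learn.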