We explore the application of offline Reinforcement Learning (RL), focusing on learning a goal-oriented knowledge representation called the World Value Function (WVF). We benchmark selected offline RL algorithms, including offline Deep Q-Network (DQN) and Batch-Constrained deep Q-learning (BCQ), under varying data buffer sizes; notably, each algorithm was modified to learn goal-oriented value functions. Using a 2D video game and a robotic environment, our experiments span both discrete and continuous action domains. The success rates of the learned WVFs across varying amounts of replay data offer valuable insights into the efficiency of these algorithms under different conditions and domains, highlighting the importance of a large and diverse dataset for learning WVFs in a batch setting.
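To make the goal-oriented modification concrete, the sketch below shows a one-step TD target for a goal-conditioned value function Q(s, g, a), using the WVF-style extended reward that penalises terminating at a state other than the goal. The function name `wvf_target` and the constants `R_MIN` and `GAMMA` are illustrative assumptions, not the exact values or interfaces used in the experiments.

```python
import numpy as np

R_MIN = -10.0  # assumed penalty for terminating at a non-goal state
GAMMA = 0.95   # assumed discount factor

def wvf_target(q_next, reward, done, state, goal):
    """One-step TD target for a goal-conditioned Q(s, g, a).

    q_next: Q-values Q(s', g, a') over actions a' at the next state.
    The extended reward replaces the environment reward with R_MIN
    when the episode ends at a state that is not the commanded goal.
    """
    if done and state != goal:
        r_bar = R_MIN      # reached the wrong terminal state: mistake penalty
    else:
        r_bar = reward     # otherwise keep the environment reward
    bootstrap = 0.0 if done else GAMMA * np.max(q_next)
    return r_bar + bootstrap
```

In a batch setting, the same target is computed from transitions sampled out of the fixed replay buffer (with goals relabelled where applicable) rather than from online interaction.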
Note: I am unable to publish the code for these experiments at this time, as it is currently being used in another ongoing project. However, I am open to sharing it on request; please feel free to reach out, and I will gladly provide it when appropriate.