Implementing FrozenLake-v1 by following examples #618
-
I'm new to reinforcement learning. I'm trying to solve the FrozenLake-v1 game using the gymnasium learning environment (formerly OpenAI Gym) and BindsNet, a library for simulating spiking neural networks with PyTorch. I've gone over the examples provided by BindsNet, mainly BreakoutDeterministic-v4 and SpaceInvaders-v0. I understand that when using a DQN, the number of neurons in the input layer should map to the observation space, while the number of neurons in the output layer should map to the action space. I've followed the RL examples for Breakout and SpaceInvaders and made changes for my requirements (the number of neurons and the shape of the input and output layers).
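For context, a quick check of the FrozenLake-v1 spaces that these layer sizes are based on; the gymnasium calls are standard, but the sizes below assume the default 4x4 map:

```python
import gymnasium as gym

# FrozenLake-v1 on the default 4x4 map: 16 discrete states, 4 discrete actions.
env = gym.make("FrozenLake-v1")
print(env.observation_space)  # Discrete(16) -> 16 input neurons (one-hot state)
print(env.action_space)       # Discrete(4)  -> 4 output neurons (left/down/right/up)
```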
I also had to change the preprocess() function in BindsNet's environment.py: a condition for FrozenLake-v1 needed to be added so that the discrete observation is one-hot encoded.
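A minimal sketch of the one-hot encoding idea, assuming the default 16-state map (the helper name below is made up for illustration and is not BindsNet's actual preprocess() signature):

```python
import torch
import torch.nn.functional as F

def one_hot_obs(obs: int, n_states: int = 16) -> torch.Tensor:
    # FrozenLake returns a single integer state index; the SNN input layer
    # expects a vector, so encode the state index as a one-hot vector.
    return F.one_hot(torch.tensor(obs), num_classes=n_states).float()

# Example: state 5 -> tensor with a 1.0 at index 5 and 0.0 elsewhere.
x = one_hot_obs(5)
```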
After this, the algorithm runs without errors, but it doesn't seem to be learning. While debugging, I can see that the output layer returns 'S' as a 4x4 tensor. I'm confused about what the output should look like and what it represents in terms of Q-values. Since there are 4 output neurons, I would expect 4 outputs, with the one with the maximum value being the action to take. I'm also confused about how to get the Q-values; I think each observation should have 4 associated Q-values, one per action (so a 16 x 4 table). Based on my limited knowledge, and after going over the BindsNet documentation, I'm unable to figure out why my algorithm doesn't seem to be learning.

I'm also confused about why gymnasium specifies the reward for this game as either 1 or 0. How would we get an accumulated reward for an episode? Shouldn't the accumulated reward be affected by whether the agent reaches the goal in 5 steps vs. 10 steps? Any help would be really appreciated.
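To make the 5-steps-vs-10-steps point concrete, here is the kind of discounted-return calculation in question (gamma = 0.99 is an arbitrary illustrative value, not something FrozenLake prescribes):

```python
# Discounted return G = sum_t gamma^t * r_t. FrozenLake only gives r = 1 at the
# goal, so a shorter episode yields a larger discounted return.
gamma = 0.99  # illustrative discount factor

def discounted_return(rewards, gamma=gamma):
    return sum((gamma ** t) * r for t, r in enumerate(rewards))

five_steps = [0, 0, 0, 0, 1]          # goal reached on step 5
ten_steps = [0] * 9 + [1]             # goal reached on step 10

print(discounted_return(five_steps))  # ~0.961
print(discounted_return(ten_steps))   # ~0.914
```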
-
At first glance, the code appears to be fine.

To clarify, the DQN example in BindsNET has two main steps. First, we train an artificial neural network (ANN) using DQN. Next, we copy the weights from the ANN (with slight modifications) to a spiking neural network (SNN) that has the same topology. Did you also train an ANN and then copy the weights in your code?

The other reinforcement learning (RL) code examples in BindsNET serve as proof of concept to demonstrate that the framework can utilize reward-modulated signals to change the weights, similar to the RL framework. However, this arrangement does not perform well.

Regarding your other question, if the weights are not changing, I suggest adding monitors to the network to plot the spike trains of the activity in each layer of the network, including the reward signal. Then, check if they are correlated. To change the weights, the spiking activity needs to be aligned, and only then will the STDP with the reward module be activated to modify the weights.

With regard to the desired output of gymnasium, each game has different controllers and scores.
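A minimal sketch of the kind of monitoring meant here, assuming `network` is your BindsNET Network instance and that you want to record the spike variable "s" of every layer; the simulation time is an assumed placeholder, and depending on the BindsNET version you may need to squeeze a batch dimension before plotting:

```python
from bindsnet.network.monitors import Monitor
from bindsnet.analysis.plotting import plot_spikes

time = 250  # simulation time per environment step (assumed value)

# Attach a monitor that records the spike state variable "s" of each layer.
for name, layer in network.layers.items():
    monitor = Monitor(obj=layer, state_vars=("s",), time=time)
    network.add_monitor(monitor, name=name)

# ... run the network on the environment, then inspect the recorded spikes ...
spikes = {name: network.monitors[name].get("s") for name in network.layers}
plot_spikes(spikes)
```

If the output-layer spike trains show no correlation with the reward signal, the reward-modulated STDP update has nothing to work with, which would explain weights that never change.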
-
Thank you for that clarification. I didn't train an ANN and copy the weights, so I will begin there. Thanks for the explanation and suggestions; things make more sense to me now, and I should be able to move forward. I appreciate you taking the time to respond to all my questions.