Fixed OpenAI ROM Missing Error: "Exception: ROM is missing for pong" - Lab 3 #96

Open · wants to merge 3 commits into master
Binary file added lab3/HC ROMS.zip
Binary file not shown.
111 changes: 60 additions & 51 deletions lab3/RL.ipynb
@@ -611,6 +611,15 @@
"id": "lbYHLr66i15n"
},
"source": [
"from google.colab import files\n",
"# The following command will open an upload window. \n",
"# To load the pong ROMs, please upload both HC_ROMS.zip and ROMS.zip to this window.\n",
"uploaded = files.upload()\n",
"\n",
"!pip install atari_py\n",
"\n",
"!python -m atari_py.import_roms .\n",
"\n",
"def create_pong_env(): \n",
" return gym.make(\"Pong-v0\", frameskip=5)\n",
"env = create_pong_env()\n",
@@ -805,8 +814,8 @@
"id": "YBLVfdpv7ajG"
},
"source": [
"Let's also consider the fact that, unlike CartPole, the Pong environment has an additional element of uncertainty -- regardless of what action the agent takes, we don't know how the opponent will play. That is, the environment is changing over time, based on *both* the actions we take and the actions of the opponent, which result in motion of the ball and motion of the paddles.\r\n",
"\r\n",
"Let's also consider the fact that, unlike CartPole, the Pong environment has an additional element of uncertainty -- regardless of what action the agent takes, we don't know how the opponent will play. That is, the environment is changing over time, based on *both* the actions we take and the actions of the opponent, which result in motion of the ball and motion of the paddles.\n",
"\n",
"Therefore, to capture the dynamics, we also consider how the environment changes by looking at the difference between a previous observation (image frame) and the current observation (image frame). We've implemented a helper function, `pong_change`, that pre-processes two frames, calculates the change between the two, and then re-normalizes the values. Let's inspect this to visualize how the environment can change:"
]
},
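For reference, here is a minimal sketch of what a frame-difference helper along the lines of `pong_change` could look like. The preprocessing details (crop bounds, grayscale conversion, normalization) are illustrative assumptions; the actual `mdl.lab3.pong_change` implementation may differ:

```python
import numpy as np

def preprocess_frame(frame):
    # Crop the playing field, convert RGB to grayscale, and scale to [0, 1].
    # The crop bounds below are assumptions, not the lab's exact values.
    cropped = frame[35:195]                      # drop the scoreboard and borders
    gray = cropped.mean(axis=-1, keepdims=True)  # (H, W, 1) grayscale
    return gray.astype(np.float32) / 255.0

def pong_change_sketch(prev_frame, curr_frame):
    # Difference of two pre-processed frames, re-normalized to [0, 1].
    diff = preprocess_frame(curr_frame) - preprocess_frame(prev_frame)
    if np.ptp(diff) > 0:
        diff = (diff - diff.min()) / np.ptp(diff)
    return diff
```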
@@ -816,15 +825,15 @@
"id": "ItWrUwM87ZBw"
},
"source": [
"next_observation, _,_,_ = env.step(np.random.choice(n_actions))\r\n",
"diff = mdl.lab3.pong_change(observation, next_observation)\r\n",
"\r\n",
"f, ax = plt.subplots(1, 3, figsize=(15,15))\r\n",
"for a in ax:\r\n",
" a.grid(False)\r\n",
" a.axis(\"off\")\r\n",
"ax[0].imshow(observation); ax[0].set_title('Previous Frame');\r\n",
"ax[1].imshow(next_observation); ax[1].set_title('Current Frame');\r\n",
"next_observation, _,_,_ = env.step(np.random.choice(n_actions))\n",
"diff = mdl.lab3.pong_change(observation, next_observation)\n",
"\n",
"f, ax = plt.subplots(1, 3, figsize=(15,15))\n",
"for a in ax:\n",
" a.grid(False)\n",
" a.axis(\"off\")\n",
"ax[0].imshow(observation); ax[0].set_title('Previous Frame');\n",
"ax[1].imshow(next_observation); ax[1].set_title('Current Frame');\n",
"ax[2].imshow(np.squeeze(diff)); ax[2].set_title('Difference (Model Input)');"
],
"execution_count": null,
@@ -845,14 +854,14 @@
"id": "YiJLu9SEAJu6"
},
"source": [
"### Rollout function\r\n",
"\r\n",
"We're now set up to define our key action algorithm for the game of Pong, which will ultimately be used to train our Pong agent. This function can be thought of as a \"rollout\", where the agent will 1) make an observation of the environment, 2) select an action based on its state in the environment, 3) execute a policy based on that action, resulting in some reward and a change to the environment, and 4) finally add memory of that action-reward to its `Memory` buffer. We will define this algorithm in the `collect_rollout` function below, and use it soon within a training block.\r\n",
"\r\n",
"Earlier you visually inspected the raw environment frames, the pre-processed frames, and the difference between previous and current frames. As you may have gathered, in a dynamic game like Pong, it can actually be helpful to consider the difference between two consecutive observations. This gives us information about the movement between frames -- how the game is changing. We will do this using the `pong_change` function we explored above (which also pre-processes frames for us).\r\n",
"\r\n",
"We will use differences between frames as the input on which actions will be selected. These observation changes will be forward propagated through our Pong agent, the CNN network model, which will then predict the next action to take based on this observation. The raw reward will be computed. The observation, action, and reward will be recorded into memory. This will loop until a particular game ends -- the rollout is completed.\r\n",
"\r\n",
"### Rollout function\n",
"\n",
"We're now set up to define our key action algorithm for the game of Pong, which will ultimately be used to train our Pong agent. This function can be thought of as a \"rollout\", where the agent will 1) make an observation of the environment, 2) select an action based on its state in the environment, 3) execute a policy based on that action, resulting in some reward and a change to the environment, and 4) finally add memory of that action-reward to its `Memory` buffer. We will define this algorithm in the `collect_rollout` function below, and use it soon within a training block.\n",
"\n",
"Earlier you visually inspected the raw environment frames, the pre-processed frames, and the difference between previous and current frames. As you may have gathered, in a dynamic game like Pong, it can actually be helpful to consider the difference between two consecutive observations. This gives us information about the movement between frames -- how the game is changing. We will do this using the `pong_change` function we explored above (which also pre-processes frames for us).\n",
"\n",
"We will use differences between frames as the input on which actions will be selected. These observation changes will be forward propagated through our Pong agent, the CNN network model, which will then predict the next action to take based on this observation. The raw reward will be computed. The observation, action, and reward will be recorded into memory. This will loop until a particular game ends -- the rollout is completed.\n",
"\n",
"For now, we will define `collect_rollout` such that a batch of observations (i.e., from a batch of agent-environment worlds) can be processed serially (i.e., one at a time, in sequence). We will later utilize a parallelized version of this function that will parallelize batch processing to help speed up training! Let's get to it."
]
},
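To make the four steps above concrete, here is a minimal sketch of how a serial `collect_rollout` could be structured. It follows the call signature used later in the notebook (`collect_rollout(batch_size, env, model, choose_action)`) and assumes a `Memory` object with an `add_to_memory(observation, action, reward)` method, mirroring the description; the lab's actual implementation may differ in its batching and bookkeeping details:

```python
def collect_rollout(batch_size, env, model, choose_action):
    # One rollout (one full game) per batch element, processed serially.
    memories = []
    for _ in range(batch_size):
        memory = Memory()  # assumed: records (observation, action, reward) triples
        current_frame = env.reset()
        previous_frame = current_frame
        done = False
        while not done:
            # 1) Observe the change between consecutive frames (the model input).
            frame_diff = mdl.lab3.pong_change(previous_frame, current_frame)
            # 2) Select an action by forward-propagating the observation change.
            action = choose_action(model, frame_diff)
            # 3) Execute the action, receiving a reward and the next frame.
            next_frame, reward, done, _ = env.step(action)
            # 4) Record the observation, action, and reward into memory.
            memory.add_to_memory(frame_diff, action, reward)
            previous_frame, current_frame = current_frame, next_frame
        memories.append(memory)
    return memories
```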
@@ -935,17 +944,17 @@
"id": "msNBRcULHbrd"
},
"source": [
"### Rollout with untrained Pong model ###\r\n",
"\r\n",
"# Model\r\n",
"test_model = create_pong_model()\r\n",
"\r\n",
"# Rollout with single batch\r\n",
"single_batch_size = 1\r\n",
"memories = collect_rollout(single_batch_size, env, test_model, choose_action)\r\n",
"rollout_video = mdl.lab3.save_video_of_memory(memories[0], \"Pong-Random-Agent.mp4\")\r\n",
"\r\n",
"# Play back video of memories\r\n",
"### Rollout with untrained Pong model ###\n",
"\n",
"# Model\n",
"test_model = create_pong_model()\n",
"\n",
"# Rollout with single batch\n",
"single_batch_size = 1\n",
"memories = collect_rollout(single_batch_size, env, test_model, choose_action)\n",
"rollout_video = mdl.lab3.save_video_of_memory(memories[0], \"Pong-Random-Agent.mp4\")\n",
"\n",
"# Play back video of memories\n",
"mdl.lab3.play_video(rollout_video)"
],
"execution_count": null,
@@ -979,27 +988,27 @@
"id": "FaEHTMRVMRXP"
},
"source": [
"### Hyperparameters and setup for training ###\r\n",
"# Rerun this cell if you want to re-initialize the training process\r\n",
"# (i.e., create new model, reset loss, etc)\r\n",
"\r\n",
"# Hyperparameters\r\n",
"learning_rate = 1e-3\r\n",
"MAX_ITERS = 1000 # increase the maximum to train longer\r\n",
"batch_size = 5 # number of batches to run\r\n",
"\r\n",
"# Model, optimizer\r\n",
"pong_model = create_pong_model()\r\n",
"optimizer = tf.keras.optimizers.Adam(learning_rate)\r\n",
"iteration = 0 # counter for training steps\r\n",
"\r\n",
"# Plotting\r\n",
"smoothed_reward = mdl.util.LossHistory(smoothing_factor=0.9)\r\n",
"smoothed_reward.append(0) # start the reward at zero for baseline comparison\r\n",
"plotter = mdl.util.PeriodicPlotter(sec=15, xlabel='Iterations', ylabel='Win Percentage (%)')\r\n",
"\r\n",
"# Batches and environment\r\n",
"# To parallelize batches, we need to make multiple copies of the environment.\r\n",
"### Hyperparameters and setup for training ###\n",
"# Rerun this cell if you want to re-initialize the training process\n",
"# (i.e., create new model, reset loss, etc)\n",
"\n",
"# Hyperparameters\n",
"learning_rate = 1e-3\n",
"MAX_ITERS = 1000 # increase the maximum to train longer\n",
"batch_size = 5 # number of batches to run\n",
"\n",
"# Model, optimizer\n",
"pong_model = create_pong_model()\n",
"optimizer = tf.keras.optimizers.Adam(learning_rate)\n",
"iteration = 0 # counter for training steps\n",
"\n",
"# Plotting\n",
"smoothed_reward = mdl.util.LossHistory(smoothing_factor=0.9)\n",
"smoothed_reward.append(0) # start the reward at zero for baseline comparison\n",
"plotter = mdl.util.PeriodicPlotter(sec=15, xlabel='Iterations', ylabel='Win Percentage (%)')\n",
"\n",
"# Batches and environment\n",
"# To parallelize batches, we need to make multiple copies of the environment.\n",
"envs = [create_pong_env() for _ in range(batch_size)] # For parallelization"
],
"execution_count": null,
Binary file added lab3/ROMS.zip
Binary file not shown.