-
-
Notifications
You must be signed in to change notification settings - Fork 877
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Writing tutorial for deep reinforcement learning implementation with pytorch from scratch
#1104
Conversation
Merging latest commits with my added code.
Merging latest commits from main with my own commits.
Integrating latest commits with my own code.
Merging my changes with the latest development.
Merging my development with the latest commits from main branch.
…on an environment successfully.
Merging my changes with the latest developments from main branch.
Merging with the latest commits from upstream/main.
Merging latest updates.
Merging with the latest updates.
Merging with the latest updates
… simplifying the agent class where necessary.
fetching latest changes from upstream.
…or evaluating agent and visualizing it, and some more cleaning up and change of documentation.
Merging latest developments from main with my changes.
…how results are visualized. changed evaluation method to conform to proposed standard by Machado et al 2018. added code to create gifs of evaluation checkpoints. multiple changes in documentation and variable naming. added timing functionality so that runtime is tracked and printed. added results for multiple envs.
Merging my changes with the latest changes from main.
…aulting. Replaced replay memory with a list-based one instead of pre-initialized numpy arrays. Started to use FrameStack for every env. Also, in_features of linear layer is calculated automatically now.
Merging with the latest changes from main.
…rk, and modifying the agent class accordingly.
… and create_env() function, as well as restructuring the code.
…roblems and spellings, etc.
Fetching the latest commits from main.
Was this tutorial discussed with anyone, on Discord or otherwise? |
Yes, see Issues |
Thanks for the tutorial @hahas94, is this a new tutorial or has this been published before? Having a quick look it is very impressive, I haven't checked the implementation My initial reaction is a couple of the more advanced points shouldn't be included, i.e., the I'll try to make a more complete review this week |
Thank you! No it has not been published before, so it is new. Yes please have a thorough look and let me know what changes to make or how to restructure it if necessary. |
nn.Linear(in_features=n_hidden_units, out_features=out_features), | ||
) | ||
|
||
def _output_size(self) -> int: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why does this need to be computed, can't env.action_space.shape
or env.observation_space.shape
be used instead
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure how it could be computed from only env.action_space.shape
or env.observation_space.shape
without going through the steps made by the Conv2d
layer, that is kerner_size
, stride
padding
etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
doesn't output_size == action_space.shape[0]
??
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we are talking about the output of the network, then in C51 it is action_space.shape[0] * n_atoms
.
However, note the two lines
in_features = self._output_size()
out_features = self._params.n_actions * self._params.n_atoms
where the method _output_size
is used to compute the output size of the flatten layer. Maybe the choice of method name is not good?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"in_features = self._output_size()
" I very confused by what I am reading here, maybe you need to rename the function _output_size
to something else, or add some more comments
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suspected that. Changed the method name to:
def _fc_input_size(self) -> int:
"""Compute size of input to first linear layer."""
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
isn't the case that:
in_features = env.observation_space.shape[0]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Partially true.
If the vision part of the network is not utilized, then indeed in_features = env.observation_space.shape[0]
, and this is exactly what is returned by _fc_input_size
, since the convolutional sequential is empty, so the observation is returned as is.
But if the vision part is active, then it depends on the observation shape and convolutional operations.
Assuming that env.observation_space.shape == (4, 84, 84)
and the current convolutional structure, then the size is computed manually as follows:
the output shape of the first conv2d layer is (32, 20, 20)
,
the output shape of the second conv2d layer is (64, 9, 9)
and
the output shape of the third conv2d layer is (64, 7, 7)
which will produce 64 * 7 * 7 = 3136
weights at the flatten layer.
But we can't always assume that the env.observation_space.shape == (4, 84, 84)
or that the network has this structure in general. So I prefer computing it using the current method.
Fetching and merging latest commits from main.
# --- env related --- | ||
env_name: str = "" | ||
obs_shape: tuple = (1,) # shape of observation, ex. (1, 4), (84, 84) | ||
n_actions: int = 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are obs_shape
and n_actions
hyperparameter? They are constants of the environments
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not only these two, but several other parameters are in this class that are not hyperparameters, however this way it is easier to pass these parameters to any function/class that need them.
nn.Linear(in_features=n_hidden_units, out_features=out_features), | ||
) | ||
|
||
def _output_size(self) -> int: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"in_features = self._output_size()
" I very confused by what I am reading here, maybe you need to rename the function _output_size
to something else, or add some more comments
Merge commits from main.
@pseudo-rnd-thoughts, @Kallinteris-Andreas just checking in on this PR. Is there anything more I need to address or any additional feedback you can provide? |
Is there a reason this PR is not merged but closed? would love to help out with this, seems like most of the work is done. |
Description
This PR is for a tutorial I have been working for, namely the
deep reinforcement learning implementation with pytorch from scratch
. There are no changes in other code/files, hence this RP only adds a new file, along with images and gifs.Fixes # (issue)
Type of change
Screenshots
Checklist:
pre-commit
checks withpre-commit run --all-files
(seeCONTRIBUTING.md
instructions to set it up)