
Question about the compute_lambda_values function #302

Open
zichunxx opened this issue Jun 21, 2024 · 1 comment

zichunxx commented Jun 21, 2024

Hi! Thanks for your work on reimplementing DreamerV1 in a simple way.

I tried to learn how DreamerV1 computes its value targets, but I'm confused about the logic of the `compute_lambda_values` function:

```python
last_values = torch.clone(last_values)
last_lambda_values = 0
lambda_targets = []
for step in reversed(range(horizon - 1)):
    if step == horizon - 2:
        next_values = last_values
    else:
        next_values = values[step + 1] * (1 - lmbda)
    delta = rewards[step] + next_values * done_mask[step]
    last_lambda_values = delta + lmbda * done_mask[step] * last_lambda_values
    lambda_targets.append(last_lambda_values)
return torch.stack(list(reversed(lambda_targets)), dim=0)
```

  1. Does the above snippet implement Eq. 6 of the original DreamerV1 paper, i.e.,
$$V_\lambda(s_\tau) = (1-\lambda) \sum_{n=1}^{H-1} \lambda^{n-1} V_N^n(s_\tau) + \lambda^{H-1} V_N^H(s_\tau)$$

I can't see how the loop relates to this weighted sum of n-step returns.

  2. If so, what does `delta` mean? Is `delta` the TD target?

I'm new to the Dreamer series, so please forgive me if this question seems basic. Thanks!
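On question 1: a small numerical sketch (not code from this repository) suggests the loop is the backward, recursive form of Eq. 6. Expanding Eq. 6 by one step gives the identity $V_\lambda(s_\tau) = r_\tau + \gamma\,[(1-\lambda)\,v(s_{\tau+1}) + \lambda\,V_\lambda(s_{\tau+1})]$, and that is what the `if/else` plus `lmbda * done_mask[step] * last_lambda_values` builds from the end of the rollout backwards. The sketch assumes a constant discount γ with no episode ends inside the imagined rollout (i.e. `done_mask[step] == gamma` everywhere), uses plain Python lists instead of tensors, and takes rewards `r_0..r_{H-1}` and values `v(s_0)..v(s_H)`, with the last value playing the role of `last_values`. The names `n_step_return`, `lambda_return_eq6` and `lambda_return_recursive` are hypothetical.

```python
import random


def n_step_return(rewards, values, gamma, k):
    """V_N^k(s_0): up to k discounted rewards, then bootstrap with the value estimate.

    rewards[n] is r_n for n = 0..H-1 and values[n] is v(s_n) for n = 0..H,
    so values[H] is the bootstrap value at the end of the imagined rollout.
    """
    h = min(k, len(rewards))  # the rollout cannot look past its horizon
    partial = sum(gamma**n * rewards[n] for n in range(h))
    return partial + gamma**h * values[h]


def lambda_return_eq6(rewards, values, gamma, lmbda):
    """Eq. 6 of DreamerV1: exponentially weighted average of the n-step returns."""
    H = len(rewards)
    head = sum(
        lmbda ** (n - 1) * n_step_return(rewards, values, gamma, n) for n in range(1, H)
    )
    return (1 - lmbda) * head + lmbda ** (H - 1) * n_step_return(rewards, values, gamma, H)


def lambda_return_recursive(rewards, values, gamma, lmbda):
    """Backward recursion R_t = r_t + gamma * ((1 - lmbda) * v_{t+1} + lmbda * R_{t+1}),
    where the final step bootstraps from the full value v_H (assumed constant gamma)."""
    H = len(rewards)
    returns = [0.0] * H
    for t in reversed(range(H)):
        if t == H - 1:
            returns[t] = rewards[t] + gamma * values[t + 1]
        else:
            returns[t] = rewards[t] + gamma * (
                (1 - lmbda) * values[t + 1] + lmbda * returns[t + 1]
            )
    return returns


if __name__ == "__main__":
    random.seed(0)
    H = 5
    rewards = [random.random() for _ in range(H)]
    values = [random.random() for _ in range(H + 1)]
    direct = lambda_return_eq6(rewards, values, gamma=0.99, lmbda=0.95)
    recursive = lambda_return_recursive(rewards, values, gamma=0.99, lmbda=0.95)[0]
    print(direct, recursive)  # equal up to floating-point rounding
```

Applying `lambda_return_eq6` to a suffix `rewards[t:]`, `values[t:]` reproduces `lambda_return_recursive(...)[t]`, because later states in the rollout have a correspondingly shorter horizon. The recursive form is what an implementation would normally use: it computes all H targets in one backward pass instead of summing H n-step returns per state.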

zichunxx (Author)

Update:

I think I found the answer in Eq. 4 of DreamerV2:

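$$V^\lambda_t = \hat{r}_t + \hat{\gamma}_t \begin{cases} (1-\lambda)\, v_\xi(\hat{z}_{t+1}) + \lambda V^\lambda_{t+1} & \text{if } t < H, \\ v_\xi(\hat{z}_H) & \text{if } t = H. \end{cases}$$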
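On question 2: in the λ-target recursion of DreamerV2's Eq. 4, $V^\lambda_t = \hat{r}_t + \hat{\gamma}_t[(1-\lambda)\,v(\hat{z}_{t+1}) + \lambda V^\lambda_{t+1}]$, the snippet's `delta` appears to be everything that does not depend on $V^\lambda_{t+1}$, namely $\hat{r}_t + \hat{\gamma}_t(1-\lambda)\,v(\hat{z}_{t+1})$. It is therefore not a TD error, which would also subtract $v(\hat{z}_t)$. The pure-Python mirror below is a sketch under two assumptions: `done_mask[t]` holds $\hat{\gamma}_t$ (discount times a continuation flag), and `values[T]` plays the role of `last_values`. The name `compute_lambda_values_py` is hypothetical.

```python
def compute_lambda_values_py(rewards, values, done_mask, lmbda):
    """Pure-Python mirror of the snippet's loop (hypothetical name).

    rewards[t]   : r_t for t = 0..T-1
    values[t]    : v(z_t) for t = 0..T; values[T] plays the role of `last_values`
    done_mask[t] : assumed to be gamma_t = discount * (1 - terminal_t)
    """
    T = len(rewards)
    targets = [0.0] * T
    last_lambda_values = 0.0
    for t in reversed(range(T)):
        if t == T - 1:
            # Final step: bootstrap with the full value (the bootstrap branch of Eq. 4).
            next_values = values[t + 1]
        else:
            # Earlier steps: only (1 - lambda) of the value estimate is used directly...
            next_values = values[t + 1] * (1 - lmbda)
        # delta = r_t + gamma_t * (1 - lambda) * v(z_{t+1}): the part of Eq. 4 that does
        # not depend on V^lambda_{t+1}. Not a TD error, which would also subtract v(z_t).
        delta = rewards[t] + next_values * done_mask[t]
        # ...and the remaining lambda share comes from the next lambda-target.
        last_lambda_values = delta + lmbda * done_mask[t] * last_lambda_values
        targets[t] = last_lambda_values
    return targets


# An episode that terminates after step 1 (done_mask[1] == 0): nothing beyond it leaks in.
print(compute_lambda_values_py([1.0, 2.0, 3.0], [0.5, 1.0, 1.5, 2.0], [0.9, 0.0, 0.9], lmbda=0.5))
```

For that input, Eq. 4 by hand gives $V_1 = 2.0$ (the reward only, since $\hat{\gamma}_1 = 0$) and $V_0 = 1 + 0.9 \cdot (0.5 \cdot 1.0 + 0.5 \cdot 2.0) = 2.35$, which the loop reproduces up to floating-point rounding. Multiplying by `done_mask[t]` in both `delta` and the λ-tail is what stops value and reward information from crossing an episode boundary.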
