Clarification on `carry` variable used during training #123

jren03 · 2024-04-29T00:35:04Z

Would you mind explaining the purpose of the `carry` variable used during training? From what I understand, it's a tuple of latent state and latent action.

From this section, it seems like `prevlat` is replaced with the context from the first sample of each batch. However, why is `prevact` (i.e. `carry[1]`) not replaced in this case? Asked differently, if I were to alternate sampling from two different replay buffers during training, would I need two different `carry` variables, or does a shared one suffice since it is replaced by the context anyway?
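To make the second question concrete, here is a toy sketch of the situation I have in mind. It is not the repo's code: `toy_step`, `make_batch`, the `"context"` and `"action"` keys, and the "latent plus action" update are all hypothetical stand-ins. The only thing it mirrors is my reading that the latent half of the carry is overwritten from the batch while the action half is passed through.

```python
import numpy as np

def toy_step(carry, batch):
    """Hypothetical stand-in for one training step (NOT the repo's actual code)."""
    prevlat, prevact = carry
    prevlat = batch["context"]      # latent half is replaced from the batch
    newlat = prevlat + prevact      # action half still influences the result
    newact = batch["action"]
    return newlat, (newlat, newact)

def make_batch(value):
    # Hypothetical batch layout, for illustration only.
    return {"context": np.full((2,), value), "action": np.full((2,), value * 10.0)}

def run_shared(buffer_a, buffer_b, init):
    """Alternate between two buffers while threading ONE carry through both."""
    carry, outs = init, []
    for a, b in zip(buffer_a, buffer_b):
        for batch in (a, b):
            out, carry = toy_step(carry, batch)
            outs.append(out)
    return outs

def run_separate(buffer_a, buffer_b, init):
    """Alternate between two buffers, keeping one carry per buffer."""
    carry_a, carry_b, outs = init, init, []
    for a, b in zip(buffer_a, buffer_b):
        out, carry_a = toy_step(carry_a, a)
        outs.append(out)
        out, carry_b = toy_step(carry_b, b)
        outs.append(out)
    return outs

buffer_a = [make_batch(1.0), make_batch(2.0)]
buffer_b = [make_batch(-1.0), make_batch(-2.0)]
init = (np.zeros(2), np.zeros(2))

shared_outs = run_shared(buffer_a, buffer_b, init)
separate_outs = run_separate(buffer_a, buffer_b, init)

# True if the two schemes disagree on at least one step.
differs = any(not np.array_equal(s, p) for s, p in zip(shared_outs, separate_outs))
```

In this toy version the two schemes disagree, because with a shared carry the action left over from buffer A feeds into the next step on buffer B. Whether the real training code behaves this way, or whether `prevact` is effectively reset somewhere I missed, is what I am unsure about.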

Thank you in advance for the help!
