Would you mind explaining the purpose of the `carry` variable used during training? From what I understand, it's a tuple of latent state and latent action.
From this section, it seems like `prevlat` is replaced with the context from the first sample of each batch. However, why is `prevact` (i.e. `carry[1]`) not replaced in this case? Asked differently, if I were to alternate sampling from two different replay buffers during training, would I need two different `carry` variables, or does a shared one suffice since it is getting replaced by context anyway?
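To make the question concrete, here is a toy sketch of the two options. It is not the repository's code: `train_step`, `make_batch`, `run`, and the batch keys are hypothetical stand-ins, and it assumes (as I understand it) that only `prevlat` is overwritten by the batch context while `prevact` is taken from the incoming `carry`.

```python
# Toy illustration only: all names here are hypothetical, not the repo's API.

def train_step(batch, carry):
    """Stand-in for a training step that consumes and returns a carry.

    Assumption: carry is (prevlat, prevact); prevlat is overwritten by the
    batch context, prevact is used as passed in.
    """
    prevlat, prevact = carry
    prevlat = batch["context"]  # assumed: replaced from the batch
    used = (prevlat, prevact)   # what the step would actually start from
    new_carry = (batch["final_latent"], batch["final_action"])
    return used, new_carry


def make_batch(source, i):
    # Strings tagged with the buffer name so the origin of each value is visible.
    return {
        "context": f"{source}-ctx{i}",
        "final_latent": f"{source}-lat{i}",
        "final_action": f"{source}-act{i}",
    }


def run(shared, steps=4):
    """Alternate between buffers A and B with a shared or per-buffer carry."""
    sources = ["A", "B"]
    init = (None, None)
    carries = {"shared": init} if shared else {s: init for s in sources}
    log = []
    for i in range(steps):
        src = sources[i % 2]
        key = "shared" if shared else src
        used, carries[key] = train_step(make_batch(src, i), carries[key])
        log.append((src, used))
    return log


shared_log = run(shared=True)
separate_log = run(shared=False)
for row in shared_log:
    print("shared  ", row)
for row in separate_log:
    print("separate", row)
```

Under these assumptions, the shared variant feeds buffer B's step a `prevact` that came from buffer A's previous batch, whereas the per-buffer variant keeps them apart. Whether that mismatch matters in practice is exactly what I am unsure about.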
Thank you in advance for the help!