Hi,

I have a question about the data preprocessing of the medium-replay datasets. In the provided implementation,

https://github.com/kzl/decision-transformer/blob/e2d82e68f330c00f763507b3b01d774740bee53f/gym/data/download_d4rl_datasets.py#L35...L40

whenever `final_timestep` or `done_bool` is true, the collected data is added as a trajectory. However, D4RL's docs say:

> Timeouts in this (medium-replay) dataset are not always marked when the agent reaches the max trajectory length, but rather when 1000 timesteps have been sampled for a particular training iteration.

Thus, there exist trajectories that end neither in a terminal state nor at the maximum trajectory length, but are truncated because the sampling budget for that iteration ran out. Such trajectories are typically short, and if we compute returns on them, the return-to-go will deviate from its true value, since no value estimate is given for the last timestep. Will this be an issue for DT?

Please correct me if I have misunderstood anything =)
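To make the concern concrete, here is a minimal sketch on toy data. It is not the repository's code: `split_trajectories` and `returns_to_go` are hypothetical helpers, the rewards are made up, and the return-to-go is assumed to be undiscounted. It splits a flat array whenever a terminal or timeout flag is set, then compares the return-to-go of a truncated trajectory with what it would have been had the episode run to an assumed full length of 10 steps.

```python
import numpy as np

def split_trajectories(rewards, terminals, timeouts):
    """Cut a flat reward array after every step flagged as terminal or timeout."""
    trajectories, current = [], []
    for reward, done_bool, final_timestep in zip(rewards, terminals, timeouts):
        current.append(reward)
        if done_bool or final_timestep:
            trajectories.append(np.array(current))
            current = []
    return trajectories

def returns_to_go(rewards):
    """Undiscounted return-to-go: rtg[t] = sum(rewards[t:])."""
    return np.cumsum(rewards[::-1])[::-1]

# Toy flat dataset of 7 steps with reward 1 per step (made-up numbers).
# Steps 0-3: an episode cut off by the sampling budget (timeout flag, not terminal).
# Steps 4-6: an episode that really terminates.
rewards = np.ones(7)
terminals = np.array([0, 0, 0, 0, 0, 0, 1], dtype=bool)
timeouts = np.array([0, 0, 0, 1, 0, 0, 0], dtype=bool)

trajs = split_trajectories(rewards, terminals, timeouts)
rtg_truncated = returns_to_go(trajs[0])

# Assumption for illustration: uninterrupted, the first episode would have
# lasted 10 steps, still with reward 1 per step.
rtg_full = returns_to_go(np.ones(10))[: len(trajs[0])]

# Gap between the assumed full return-to-go and the one computed from the data.
gap = rtg_full - rtg_truncated
print(rtg_truncated, rtg_full, gap)
```

In this toy case every timestep of the truncated trajectory is labelled with a return-to-go that is too low by the reward that was never collected. Whether that matters in practice presumably depends on how many such trajectories the dataset contains.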