-
Notifications
You must be signed in to change notification settings - Fork 19
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Question about num_timestep #12
Comments
@YifeiChen777 hi Caroline, yes you need to set that to greater than 1 for the autoregressive Q-learning |
in autoregressive_q_learn() method of QLearner the time steps dimension is moved into batch dimension. E.g. if we have a batch of 16 samples with 3 timesteps each - this will be converted into a batch of 48.
It looks like all the next learning does not take into account history and treats each element of a batch independently because attention mechanism seems to not span across the batch dimension. Thus, if every item in a batch attends only to itself (and cross-attends to encoded_state but still within a single item of a batch), the model does not see and will not recognize any inter-timestep dependencies. Please correct me if miss something? |
When I change num_timestep = 50, a problem of tensor size mismatch occurs |
@Johnly1986 could you try again on the latest version? |
hi, yes this is correct afaict. this is why in the are you able to get things working with just single frames? i'm happy to invest some time building out transformer-xl component for you if you have everything setup, and willing to share your experimental results |
@yuriy10139 also, you should reach out to @2M-kotb, as i think he was playing around with the repo for his research some time back |
I've prepared a small demo in a separate repository https://github.com/yuriy10139/q-transfromer-maniskill-demo On 50 eval runs the model shows 4.054, 4.647, 4.568 of average reward for the 4000-step, 7000-step and 10000-step checkpoints respectively, so probably it learns something, but still quite far from well-trained. If you'll manage to add history-based learning, I hope to find more GPU time to test it with bigger resolution and for more timesteps. |
@yuriy10139 thank you! i'll add a few things soon |
Hi @lucidrains, was the code of demo setup any helpful? |
In the code, there is this "num_timesteps" in the constructor of the ReplayMemoryDataset class. Does this "num_timesteps" correspond to the concept of "window" in the paper? In my understanding, the q function proposed in the paper only takes states within the window range (s_{t-w} to s_t) and feed them into the transformer instead of using all of the previous states. Is this interpretation correct? I am confused because in the code, the default "num_timesteps" is 1, which, in my understanding won't encode any sequential info.
The text was updated successfully, but these errors were encountered: