POS_EMB #15
There is another question. I plan to feed my own dataset (RGB video) into this model. Besides converting the data to 30 fps first, is any other preprocessing needed?
The last question. The paper mentions a ViT-like method that cuts the image into multiple patches and feeds them to the model. However, the patch size on line 52 of the config.yaml is commented out. I'd like to understand your patch division: is each frame of the video treated as one patch, or is each frame cut into fixed-size patches? And if patches are used, why is the patch value not defined?
Hi @polarbear55688
The pos_emb.npy file was part of a new development about a smarter positional embedding, but we ultimately decided to discard that idea. Just ignore it.
No other corrections are needed. As a remark, the AcT model takes human poses (skeletal data) as input, so you need to extract human poses from your videos before feeding them to AcT.
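A minimal sketch of the 30 fps conversion step mentioned above, done by nearest-frame index selection. This is an illustration, not the authors' pipeline: the function name `resample_to_30fps` and the assumption that the source frame rate is known are mine.

```python
import numpy as np

def resample_to_30fps(frames, src_fps):
    """Resample a frame sequence to 30 fps by nearest-frame index selection.

    frames: array-like of shape (T, ...), one entry per source frame.
    src_fps: frame rate of the source video (assumed known).
    """
    frames = np.asarray(frames)
    duration = len(frames) / src_fps        # clip length in seconds
    n_out = int(round(duration * 30))       # target frame count at 30 fps
    # Map each output timestamp to the nearest source frame index,
    # clamped to the last valid index.
    idx = np.minimum(
        np.round(np.arange(n_out) * src_fps / 30).astype(int),
        len(frames) - 1,
    )
    return frames[idx]

# Example: a 60 fps clip of 120 frames (2 s) becomes 60 frames at 30 fps.
clip = np.arange(120)
out = resample_to_30fps(clip, src_fps=60)
print(out.shape)  # (60,)
```

After resampling, each retained frame would still need to go through a pose estimator to produce the skeletal input AcT expects.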
Yeah, that shouldn't be commented out, but we found that a patch dimension of 1 works best, so nothing changes as it's the default value. Hope this helps!
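To make the patch-size-1 answer concrete, here is a rough numpy sketch of what that implies: each frame's pose vector becomes one token via a per-frame linear projection, and a learned positional embedding is added per frame index. The shapes (30 frames, 17 keypoints with 2-D coordinates, a model width of 64) and the random weights are illustrative assumptions, not values from the repository.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed shapes for illustration: T=30 frames, 17 keypoints * (x, y)
# flattened to 34 features per frame; d_model=64 is arbitrary.
T, n_features, d_model = 30, 34, 64
poses = rng.standard_normal((T, n_features))  # one pose vector per frame

# With patch size 1, "patchifying" degenerates to projecting each
# frame's pose vector to a single token.
W = rng.standard_normal((n_features, d_model)) * 0.02
tokens = poses @ W                            # (T, d_model)

# Learned positional embedding (random here), one row per frame index.
pos_emb = rng.standard_normal((T, d_model)) * 0.02
x = tokens + pos_emb                          # transformer input sequence
print(x.shape)  # (30, 64)
```

With a patch dimension larger than 1, several consecutive frames would be concatenated into one token before the projection, which is why a value of 1 makes the commented-out setting behave as a no-op.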
Hello, when I was looking at the config.yaml file, I saw that POS_EMB on line 54 was commented out. I'd like to know how the pos_emb.npy file was generated.