
POS_EMB #15

Closed
polarbear55688 opened this issue Jul 10, 2024 · 3 comments

@polarbear55688

Hello, when I was looking at the config.yaml file, I saw that POS_EMB on line 54 was commented out. I'd like to know how the pos_emb.npy file was generated.

@polarbear55688 (Author)

There is another question: I plan to feed my own dataset (RGB videos) into this model. Besides converting the data to 30 fps first, are there any other adjustments I need to make?

@polarbear55688 (Author)

One last question. The paper mentions using a ViT-like method that cuts the image into multiple patches and feeds them into the model, but the patch size on line 52 of config.yaml is commented out. I'd like to know how you divide the patches: is each video frame treated as one patch, or is each frame cut into fixed-size pieces that serve as patches? If it's the latter, why is the patch value not defined?
Sorry to bother you with so many questions.

@simoneangarano simoneangarano self-assigned this Jul 18, 2024
@simoneangarano (Member) commented Jul 20, 2024

Hi @polarbear55688
Let me try to answer your questions:

Hello, when I was looking at the config.yaml file, I saw that POS_EMB on line 54 was commented out. I'd like to know how the pos_emb.npy file was generated.

That file was part of an experiment with a smarter positional embedding, but we ultimately decided to discard that idea. Just ignore it.
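For context, a transformer like AcT typically falls back to the standard approach here: a positional embedding learned as a trainable weight and added to the token sequence. Below is a minimal Keras sketch of that kind of layer; the class name, shapes, and the `seq_len`/`d_model` values in the usage line are illustrative placeholders, not the repo's exact implementation.

```python
import tensorflow as tf

# Minimal sketch of a standard learnable positional embedding, the kind of
# layer the discarded pos_emb.npy experiment would have replaced.
class LearnablePositionalEmbedding(tf.keras.layers.Layer):
    def __init__(self, seq_len, d_model, **kwargs):
        super().__init__(**kwargs)
        # One trainable vector per position, broadcast over the batch.
        self.pos_emb = self.add_weight(
            name="pos_emb", shape=(1, seq_len, d_model),
            initializer="random_normal", trainable=True)

    def call(self, tokens):
        # tokens: (batch, seq_len, d_model)
        return tokens + self.pos_emb

# Hypothetical usage (31 = 30 frames + a class token, 64 = model width):
# x = LearnablePositionalEmbedding(seq_len=31, d_model=64)(x)
```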

There is another question: I plan to feed my own dataset (RGB videos) into this model. Besides converting the data to 30 fps first, are there any other adjustments I need to make?

No other corrections are needed. One remark: the AcT model takes human poses (skeletal data) as input, so you need to process your videos to extract human poses before feeding them to AcT.
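For anyone preparing their own videos, here is a minimal sketch of that preprocessing step. It uses MediaPipe purely as an illustrative pose extractor (the MPOSE2021 data the paper builds on comes from OpenPose/PoseNet keypoints, so the 33-landmark layout below is an assumption, not the authors' pipeline); resampling to 30 fps can be done beforehand, e.g. with `ffmpeg -i in.mp4 -filter:v fps=30 out.mp4`.

```python
import cv2
import numpy as np
import mediapipe as mp

def extract_poses(video_path):
    """Return per-frame 2D keypoints as a (T, 33, 2) array, coords in [0, 1]."""
    pose = mp.solutions.pose.Pose(static_image_mode=False)
    cap = cv2.VideoCapture(video_path)
    frames = []
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        # MediaPipe expects RGB input; OpenCV decodes frames as BGR.
        result = pose.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        if result.pose_landmarks is None:
            # No detection: repeat the last pose (zeros for the first frame).
            frames.append(frames[-1] if frames else np.zeros((33, 2)))
        else:
            frames.append(np.array(
                [(lm.x, lm.y) for lm in result.pose_landmarks.landmark]))
    cap.release()
    pose.close()
    return np.stack(frames)
```

The resulting array would then be windowed into fixed-length clips and flattened per frame before being fed to the model.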

One last question. The paper mentions using a ViT-like method that cuts the image into multiple patches and feeds them into the model, but the patch size on line 52 of config.yaml is commented out. I'd like to know how you divide the patches: is each video frame treated as one patch, or is each frame cut into fixed-size pieces that serve as patches? If it's the latter, why is the patch value not defined? Sorry to bother you with so many questions.

Yeah, that line shouldn't be commented out, but we found that a patch size of 1 works best, so nothing changes since it's the default value.
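To make the patching concrete: with a patch size of 1, each frame's flattened keypoint vector becomes one transformer token, so no spatial cutting of the image happens at all; a larger patch size would group consecutive frames into a single token. A small sketch with illustrative dimensions (the keypoint and channel counts are assumptions, not the repo's exact values):

```python
import numpy as np

# Illustrative dimensions only: 30 frames, 13 keypoints, 4 channels
# (e.g., x, y plus velocities); the repo's exact values may differ.
T, K, C = 30, 13, 4
seq = np.random.rand(T, K * C)           # one pose sequence: (30, 52)

patch = 1                                # the default patch size in AcT
tokens = seq.reshape(T // patch, patch * K * C)
print(tokens.shape)                      # (30, 52): one token per frame
```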

Hope this helps!
