Structural downsampling and static token sparsification #5

Open
Yeez-lee opened this issue Oct 29, 2021 · 3 comments

Comments

@Yeez-lee

Hi, this is solid and promising work, but I have some questions.
(1) In the paper, you perform average pooling with kernel size 2 × 2 after the sixth block for structural downsampling. But Table 3 reports results for both structural downsampling and static token sparsification. What is the difference between the two, given that their accuracies are not the same?
(2) I'm interested in the 2 × 2 average pooling. Did you run extra experiments on the position of this structural downsampling, for example after the seventh or the tenth block of the ViT?
(3) Could you provide the code for reproducing the structural downsampling and static token sparsification results in Table 3 and the probability heat-map in Figure 6?

Thanks for your help!

@raoyongming
Owner

Hi, thanks for your interest in our work.

  1. "Structural downsampling" means that we downsample the tokens using 2x2 average pooling. "Static token sparsification" means that we learn a fixed parameter for each token to reflect its importance, using our loss and learning method.

  2. We perform the average pooling after the sixth block because the resulting model then has FLOPs similar to our method. In this experiment, we fix the overall complexity of each model and compare the performance.

  3. You can simply add an average pooling layer after the sixth block to implement the structural downsampling method. For the static token sparsification baseline, you can replace the output of the PredictorLG with an nn.Parameter tensor that is shared across all inputs. We will update the code after the CVPR deadline.
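
Until the official code is available, the two baselines described above might be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: `pool_tokens` and `StaticTokenScores` are hypothetical names, the token sequence is assumed to carry a class token first followed by a square grid of patch tokens, and the predictor is assumed to return per-token keep/drop log-probabilities of shape `(B, N, 2)`.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def pool_tokens(x: torch.Tensor, grid: int) -> torch.Tensor:
    """Structural downsampling: 2x2 average pooling over the patch tokens.

    x has shape (B, 1 + grid * grid, C); the class token is assumed to be
    first and is passed through unchanged.
    """
    cls_tok, patches = x[:, :1], x[:, 1:]
    B, N, C = patches.shape
    assert N == grid * grid, "patch tokens must form a square grid"
    # (B, N, C) -> (B, C, grid, grid) so that 2D pooling can be applied
    patches = patches.transpose(1, 2).reshape(B, C, grid, grid)
    patches = F.avg_pool2d(patches, kernel_size=2)
    # (B, C, grid/2, grid/2) -> (B, N/4, C)
    patches = patches.flatten(2).transpose(1, 2)
    return torch.cat([cls_tok, patches], dim=1)


class StaticTokenScores(nn.Module):
    """Static token sparsification: one learned score pair per token
    position, shared across all inputs (hypothetical stand-in for an
    input-dependent predictor)."""

    def __init__(self, num_tokens: int):
        super().__init__()
        # Assumed layout: two logits (drop / keep) per token position.
        self.logits = nn.Parameter(torch.zeros(1, num_tokens, 2))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # The content of x is ignored; only the batch size is used.
        scores = F.log_softmax(self.logits, dim=-1)
        return scores.expand(x.shape[0], -1, -1)


if __name__ == "__main__":
    x = torch.randn(2, 1 + 14 * 14, 8)
    print(pool_tokens(x, grid=14).shape)         # 1 class + 7 * 7 patch tokens
    print(StaticTokenScores(196)(x[:, 1:]).shape)
```

In this sketch, `pool_tokens` would be called once on the output of the sixth block, and `StaticTokenScores` would be used wherever the input-dependent predictor's output is otherwise consumed, so the rest of the training loop stays unchanged.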

@Yeez-lee
Author

Thanks for your quick response! Looking forward to seeing your official code for structural downsampling and static token sparsification after the CVPR deadline.

@Aoshika123

Hello, do you have the code for generating the probability heat-map in Figure 6? I have been trying to reproduce the paper's results but couldn't find the corresponding code. Looking forward to your reply, thank you.
