
Question about "{d}" and "{p,d}" #5

Open
hujunyi96 opened this issue Jul 31, 2023 · 7 comments

Comments

@hujunyi96

As written in your article: "p" stands for using weights in T as initialization, "d" stands for applying knowledge distillation with T as the teacher.
My question is: does "using weights in T as initialization" mean a fine-tuned model? That is, does "p" stand for "fine-tuning", so that 1. MPCBert-B is the most basic pre-trained transformer, 2. MPCBert-B w/o {d} applies KD to the most basic pre-trained transformer, and 3. MPCBert-B w/o {p,d} applies KD to the fine-tuned transformer?

@DachengLi1
Owner

@hujunyi96 Thanks for checking the paper! Please take a look at the baseline subsections in the Experiments section.

@hujunyi96
Author

1. MPCFORMER w/o {d} also constructs the approximated model S', but trains S' on D with the task-specific objective, i.e., without distillation. We note that S' is initialized with weights in T, i.e., with different functions, whose effect has not been studied. We thus propose a second baseline, MPCFORMER w/o {p,d}, which trains S' on D without distillation and with random weight initialization. (from the baseline subsections of the Experiments section);
2. "p" stands for using weights in T as initialization, "d" stands for applying knowledge distillation with T as the teacher. (from Table 2)

Hello @DachengLi1, I think the two descriptions are contradictory, aren't they?
Simply put, could you please directly explain what procedures 1. MPCBert-B, 2. MPCBert-B w/o {d}, and 3. MPCBert-B w/o {p,d} each go through? Thanks a lot!

@DachengLi1
Owner

DachengLi1 commented Jul 31, 2023

@hujunyi96 Definitely! Taking Bert-Base on CoLA as an example:
T is a Bert-Base fine-tuned on CoLA.
(1) MPCBert-B is our method: trained with the distillation objective, with T as the teacher, starting from a pre-trained Bert-Base.
(2) MPCBert-B w/o {d}: trained with the task objective, starting from a pre-trained Bert-Base.
(3) MPCBert-B w/o {p,d}: trained with the task objective, starting from a randomly initialized Bert-Base architecture (not trained at all).

Note: all three of these models are S', which use the approximations. Only T uses GeLU+Softmax, in case that is confusing.
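
In code, the three setups differ only in the initialization and the training loss. Below is a minimal sketch using the Hugging Face API; the teacher checkpoint path, the loss form, and the hyperparameters are placeholders for illustration, not the exact code in this repo, and the swap of GeLU/Softmax for the MPC-friendly approximations that turns these models into S' is not shown.

```python
# Illustrative sketch of the three baselines (placeholder paths/losses; the
# activation/Softmax approximations that define S' are not shown here).
import torch
import torch.nn.functional as F
from transformers import BertConfig, BertForSequenceClassification

num_labels = 2  # CoLA is a binary task

# T: a Bert-Base already fine-tuned on CoLA (hypothetical checkpoint path).
teacher = BertForSequenceClassification.from_pretrained(
    "path/to/bert-base-finetuned-cola", num_labels=num_labels
)

# (1) MPCBert-B: initialized from pre-trained Bert-Base ("p"),
#     trained with a distillation objective against T ("d").
student_full = BertForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=num_labels
)

# (2) MPCBert-B w/o {d}: same pre-trained initialization,
#     trained with the plain task objective (no teacher).
student_no_d = BertForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=num_labels
)

# (3) MPCBert-B w/o {p,d}: same architecture, random initialization,
#     trained with the plain task objective.
student_no_pd = BertForSequenceClassification(BertConfig(num_labels=num_labels))


def distill_loss(student_logits, teacher_logits, temperature=1.0):
    """Generic soft-label KD term for setup (1); a stand-in, not the repo's exact objective."""
    t = temperature
    return t * t * F.kl_div(
        F.log_softmax(student_logits / t, dim=-1),
        F.softmax(teacher_logits / t, dim=-1),
        reduction="batchmean",
    )


def task_loss(student_logits, labels):
    """Plain task objective used in setups (2) and (3)."""
    return F.cross_entropy(student_logits, labels)
```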

@hujunyi96
Author

Hello @DachengLi1, when I was trying to use the parameter "--hidden_act quad" to train baselines with the approximations, which are the first major innovation in your paper (the second being distillation), an error occurred: KeyError: 'quad'. That means the stock transformers library (e.g., the 'hidden_act' handling in the BertConfig class) doesn't support the new activation functions from your paper (the exact library file that raises the error is xxx/site-packages/transformers/activations.py, line 208, in __getitem__).
That said, I wonder how you implemented the quad function, since the current code in this repo hits the error above when run. Did you change the source Python library files?

I am looking forward to your reply, thanks!

@DachengLi1
Owner

@hujunyi96 We have a modified version of Transformers that handles this: https://github.com/DachengLi1/MPCFormer/tree/main/transformers.

Maybe you are using the Transformers installed in your environment instead? It should be easy to fix by checking the file paths.

Or even simpler, you can just copy-paste those few new functions to wherever you need them.
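
If someone wants to stay on a stock Transformers install instead of the modified one, another option (not what this repo does) is to register the activation before building the model. A minimal sketch, assuming a recent Transformers version where ACT2FN maps names to nn.Module classes; the coefficients below are placeholders, so take the exact quad function from the modified transformers/ directory in this repo:

```python
# Sketch: register a "quad" activation in a stock Transformers install.
# Assumes a recent version where ACT2FN instantiates nn.Module classes;
# the coefficients are placeholders, not necessarily the paper's exact ones.
import torch
from torch import nn
from transformers.activations import ACT2FN


class QuadActivation(nn.Module):
    """Quadratic stand-in for GeLU (placeholder coefficients)."""

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return 0.125 * x * x + 0.25 * x + 0.5


# Must run before any model with hidden_act="quad" is constructed.
ACT2FN["quad"] = QuadActivation
```

After that, a config with hidden_act="quad" should resolve without the KeyError, as long as the registration runs before the model is built.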

@hujunyi96
Author

@DachengLi1 I see. I was following the main procedure in the README.md in the /baselines folder; as you can see, the commands listed actually execute run_glue.py, which doesn't seem to import MPCFormer/src/main/transformer/modeling.py as a module. So I didn't notice the modified module was already in the project. Thanks for your help!

@hujunyi96
Author

How does the command "pip install -e ." executed in the path "/MPCFormer/transformers" achieve installing the modules in a different path, "/src/main/transformer/"?
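
As a note on general pip behavior rather than anything specific to this repo: "pip install -e ." installs only the package declared by the setup.py/pyproject.toml in the directory where it is run, i.e. the modified transformers package, in editable mode; it does not install anything under /src/main/transformer/, which the scripts presumably import as an ordinary in-repo module. A quick way to check which copy of transformers your scripts actually pick up:

```python
# Quick check of which `transformers` your environment actually imports.
# After `pip install -e .` inside MPCFormer/transformers, __file__ should
# point into that repo directory rather than into site-packages.
import transformers

print(transformers.__version__)
print(transformers.__file__)
```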
