Question about "{d}" and "{p,d}" #5
@hujunyi96 Thanks for checking the paper! Please take a look at the baseline subsections in the Experiments section.
"1. MPCFORMER w/o {d} also constructs the approximated model S' but trains on D with the task-specific objective, i.e., without distillation. We note that S' is initialized with weights in T, i.e., with different functions, whose effect has not been studied. We thus propose a second baseline MPCFORMER w/o {p,d}, which trains S' on D without distillation and with random weight initialization." (from the baseline subsections in the Experiments section) Hello, @DachengLi1, I think these two statements are contradictory, aren't they?
@hujunyi96 Definitely! Assume BERT-Base on CoLA as the example. Note that all three of these models are S', which uses the approximations; only T uses GeLU + Softmax, if that is what is confusing.
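To make the three setups concrete, here is a minimal PyTorch-style sketch (not the repo's code: ToyModel, task_loss, and distill_loss are illustrative stand-ins, only the activation is swapped to a quadratic placeholder, and distillation is reduced to simple logit matching rather than the staged distillation used in MPCFormer):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ToyModel(nn.Module):
    """Stand-in for a Transformer; approx=True mimics S' (MPC-friendly approximations)."""

    def __init__(self, approx: bool = False):
        super().__init__()
        self.linear = nn.Linear(16, 2)
        # S' swaps GeLU (and Softmax) for cheaper approximations; here only the
        # activation is swapped, as a placeholder.
        self.act = (lambda x: 0.125 * x * x + 0.25 * x + 0.5) if approx else F.gelu

    def forward(self, x):
        return self.act(self.linear(x))


def task_loss(logits, labels):
    return F.cross_entropy(logits, labels)


def distill_loss(student_logits, teacher_logits):
    return F.mse_loss(student_logits, teacher_logits)


teacher = ToyModel(approx=False)  # T: the fine-tuned model with GeLU + Softmax

# MPCFormer (full): S' initialized from T ({p}) and trained with distillation ({d}).
s_full = ToyModel(approx=True)
s_full.load_state_dict(teacher.state_dict())

# MPCFormer w/o {d}: S' initialized from T, but trained on D with the task loss only.
s_wo_d = ToyModel(approx=True)
s_wo_d.load_state_dict(teacher.state_dict())

# MPCFormer w/o {p,d}: S' randomly initialized, trained on D with the task loss only.
s_wo_pd = ToyModel(approx=True)

x, y = torch.randn(4, 16), torch.randint(0, 2, (4,))
with torch.no_grad():
    t_logits = teacher(x)
print("MPCFormer (distillation) loss:", distill_loss(s_full(x), t_logits).item())
print("w/o {d} (task loss only)     :", task_loss(s_wo_d(x), y).item())
print("w/o {p,d} (task loss only)   :", task_loss(s_wo_pd(x), y).item())
```

All three students are S' in the sense above; they differ only in how they are initialized and which loss they are trained with.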
Hello, @DachengLi1, when I was trying to use the parameter "--hidden_act quad" to train the baselines with the approximations, which are the first major innovation in your paper (the second being distillation), an error occurred: KeyError: 'quad'. That means the stock Transformers library, e.g. 'hidden_act' in the BertConfig class, doesn't support the new activation functions in your paper (the exact library file that raises this error is xxx/site-packages/transformers/activations.py, line 208, in __getitem__).
@hujunyi96 We have a modified version of Transformers that supports this: https://github.com/DachengLi1/MPCFormer/tree/main/transformers. In particular, see MPCFormer/src/main/transformer/modeling.py, line 139 (commit 38cb42c).
Or, even simpler, you can just copy-paste these few new functions wherever you want them to be.
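For reference, a hedged sketch of what such a copy-paste could look like: the quadratic GeLU replacement (the Quad form reported in the paper, 0.125x^2 + 0.25x + 0.5) plus a best-effort registration so that "--hidden_act quad" resolves in a stock Transformers install. QuadGELU and register_quad_activation are names made up for this sketch, and the registration branch is an assumption, since the layout of transformers.activations.ACT2FN differs across versions; the modified Transformers bundled in the repo is the supported route.

```python
import torch
import torch.nn as nn


class QuadGELU(nn.Module):
    """Quadratic approximation of GeLU (MPC-friendly), per the paper's Quad form."""

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return 0.125 * x * x + 0.25 * x + 0.5


def register_quad_activation() -> None:
    """Best-effort: make ACT2FN['quad'] resolve; adjust to your transformers version."""
    from transformers import activations

    if type(activations.ACT2FN) is dict:
        # Older releases map an activation name directly to a callable.
        activations.ACT2FN["quad"] = QuadGELU()
    else:
        # Newer releases use a dict-like wrapper that instantiates a class on
        # lookup, so storing the class itself is enough.
        activations.ACT2FN["quad"] = QuadGELU
```

If you go this route, the registration would have to run before the model is built (e.g., near the top of run_glue.py), and it only covers the GeLU side, not the Softmax approximation.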
@DachengLi1 I see. I was following the main procedure in the README.md in the /baselines folder; as you can see, the listed commands actually execute run_glue.py, which doesn't seem to import MPCFormer/src/main/transformer/modeling.py as a module, so I didn't notice the module was already in the project. Thanks for your help!
How does the command "pip install -e .", executed in the path /MPCFormer/transformers, achieve installing modules in a different path, /src/main/transformer/?
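Only a general note, since this is left open in the thread: `pip install -e .` installs whatever package the setup.py in that directory declares, in editable ("development") mode, meaning the checkout itself is linked onto the import path rather than files being copied elsewhere. A quick way to see which copy of Transformers an environment actually resolves (the expected path in the comment is an assumption about the checkout layout, not taken from the repo):

```python
import transformers

print(transformers.__version__)
# If the editable install took effect, this is expected to point into the
# MPCFormer/transformers checkout rather than a regular site-packages copy.
print(transformers.__file__)
```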
As written in your article: “p” stands for using weights in T as initialization, and “d” stands for applying knowledge distillation with T as the teacher.
My question is: does “using weights in T as initialization” mean a fine-tuned model? E.g., “p” stands for “fine-tuning”; namely, 1. MPCBert-B stands for the most basic pre-trained transformer, 2. MPCBert-B w/o {d} stands for applying KD on the most basic pre-trained transformer, and 3. MPCBert-B w/o {p,d} stands for applying KD on a fine-tuned transformer?