To start, here is a quick list of papers that I personally consider influential in this area (thanks to 沐神 and BryanZhu). A minimal sketch of the basic KD loss follows the list.
Distilling the Knowledge in a Neural Network (the original KD paper, proposed by Hinton at NIPS 2014)
https://arxiv.org/abs/1503.02531
Distilling Task-Specific Knowledge from BERT into Simple Neural Networks (distills BERT into a BiLSTM)
https://arxiv.org/abs/1903.12136
Patient Knowledge Distillation for BERT Model Compression (extracts knowledge layer by layer from the teacher model)
https://arxiv.org/abs/1908.09355
DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter (knowledge distillation during the pre-training stage)
https://arxiv.org/abs/1910.01108
TinyBERT: Distilling BERT for Natural Language Understanding (two-stage distillation)
https://arxiv.org/abs/1909.10351
MINILM: Deep Self-Attention Distillation for Task-Agnostic Compression of Pre-Trained Transformers (distills the value-value relation matrix and introduces a teacher assistant, TA)
https://arxiv.org/abs/2002.10957
FastBERT: a Self-distilling BERT with Adaptive Inference Time (self-distillation)
https://arxiv.org/abs/2004.02178
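
For context on the first paper, here is a minimal PyTorch sketch of the classic soft-target distillation loss (temperature-softened KL term plus hard-label cross-entropy). The function name `kd_loss` and the hyperparameters `T` (temperature) and `alpha` (mixing weight) are illustrative choices of mine, not taken from any of these papers' released code.

```python
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    # Soften teacher and student distributions with temperature T.
    soft_teacher = F.log_softmax(teacher_logits / T, dim=-1)
    soft_student = F.log_softmax(student_logits / T, dim=-1)
    # KL divergence between the softened distributions; the T**2 factor
    # keeps its gradient scale comparable to the hard-label term.
    distill = F.kl_div(soft_student, soft_teacher,
                       reduction="batchmean", log_target=True) * (T ** 2)
    # Standard cross-entropy on the ground-truth labels.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * distill + (1.0 - alpha) * hard

# Usage (assumed shapes): student_logits, teacher_logits are [batch, classes],
# labels is [batch]; detach the teacher so no gradients flow into it.
# loss = kd_loss(student_logits, teacher_logits.detach(), labels)
```

The later papers (Patient KD, TinyBERT, MiniLM) add intermediate-layer, attention, or value-relation matching terms on top of this basic objective, but the soft-target loss above is the common starting point.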