To start, here is a quick list of papers that I personally consider influential in this area (thanks to 沐神 and BryanZhu). A minimal sketch of the basic KD loss follows the list.
Distilling the Knowledge in a Neural Network (the original KD paper, proposed by Hinton at NIPS 2014)
https://arxiv.org/abs/1503.02531
Distilling Task-Specific Knowledge from BERT into Simple Neural Networks (distills BERT into a BiLSTM)
https://arxiv.org/abs/1903.12136
Patient Knowledge Distillation for BERT Model Compression (extracts knowledge layer by layer from the teacher model)
https://arxiv.org/abs/1908.09355
DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter (knowledge distillation during the pre-training stage)
https://arxiv.org/abs/1910.01108
TinyBERT: Distilling BERT for Natural Language Understanding (two-stage distillation)
https://arxiv.org/abs/1909.10351
MINILM: Deep Self-Attention Distillation for Task-Agnostic Compression of Pre-Trained Transformers (distills the value-value relation matrix and introduces a teacher assistant, TA)
https://arxiv.org/abs/2002.10957
FastBERT: a Self-distilling BERT with Adaptive Inference Time (self-distillation)
https://arxiv.org/abs/2004.02178
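
For context on the first paper, here is a minimal PyTorch sketch of the classic soft-target distillation loss (temperature-softened KL term plus hard-label cross-entropy). The function name `kd_loss` and the hyperparameters `T` (temperature) and `alpha` (mixing weight) are illustrative choices of mine, not taken from any of these papers' released code.

```python
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    # Soften teacher and student distributions with temperature T.
    soft_teacher = F.log_softmax(teacher_logits / T, dim=-1)
    soft_student = F.log_softmax(student_logits / T, dim=-1)
    # KL divergence between the softened distributions; the T**2 factor
    # keeps its gradient scale comparable to the hard-label term.
    distill = F.kl_div(soft_student, soft_teacher,
                       reduction="batchmean", log_target=True) * (T ** 2)
    # Standard cross-entropy on the ground-truth labels.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * distill + (1.0 - alpha) * hard

# Usage (assumed shapes): student_logits, teacher_logits are [batch, classes],
# labels is [batch]; detach the teacher so no gradients flow into it.
# loss = kd_loss(student_logits, teacher_logits.detach(), labels)
```

The later papers (Patient KD, TinyBERT, MiniLM) add intermediate-layer, attention, or value-relation matching terms on top of this basic objective, but the soft-target loss above is the common starting point.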