Research Article
Training Method and Device of Chemical Industry Chinese Language Model Based on Knowledge Distillation
Table 2
Distillation performance with BERT base.
| Model | Layers | Hidden | Acc (%) | F1 (%) |
| --- | --- | --- | --- | --- |
| BERT (teacher) | 6 | 768 | 94.13 | 92.52 |
| DistillBiLSTM | 3 | 300 | 91.45 | 90.21 |
| BERT-PKD | 3 | 768 | 92.87 | 90.66 |
| DistillBERT [36] | 3 | 768 | 91.77 | 89.63 |
| BERT-of-Theseus | 3 | 768 | 93.43 | 91.14 |
| BERT-EMD [37] | 3 | 768 | 93.77 | 91.34 |
| BiLSTM-KD | 3 | 200 | 93.13 | 91.07 |