Research Article
Training Method and Device of Chemical Industry Chinese Language Model Based on Knowledge Distillation
Table 3
Experimental results of the model when different layers are omitted.
| Distillation details | Layer | Acc (%) | F1 (%) |
| --- | --- | --- | --- |
| BiLSTM-KD | All layers | 93.13 | 91.07 |
| BiLSTM-KD | No embedding layer | 87.44 | 85.69 |
| BiLSTM-KD | No hidden layer | 74.97 | 71.57 |
| BiLSTM-KD | No prediction layer | 84.22 | 81.26 |
| BiLSTM | - | 72.39 | 70.06 |