Review Article

Bidirectional Language Modeling: A Systematic Literature Review

Table 7

Effect of parameters with different layers, hidden layers, and attention heads.

| Parameters | Layers (Less / More / Same) | Hidden (Less / More / Same) | Attention head (Less / More / Same) |
| --- | --- | --- | --- |
| 1–99 M | [31, 41, 42] [48] | [31, 41, 42] [48] | [31, 41, 42] [48] |
| 100–199 M | [14, 27, 43, 45, 47, 52] [15] | [14, 27, 43, 45, 47, 52] [15] | [14, 27, 43, 45, 47, 52] [15] |
| 200–300 M | [33, 35] | [33, 35] | [33, 35] |
| 301–349 M | [20, 39] [10, 28–30, 32, 40, 44, 46, 49, 51] | [39] [10, 20, 28–30, 32, 40, 44, 46, 49, 51] | [39] [10, 20, 28–30, 32, 40, 44, 46, 49, 51] |
| 350–399 M | [11, 37, 38] | [11, 37, 38] | [11, 37, 38] |
| 400–500 M | [50] | [50] | [50] |
| 501-Ow | [36] [34] | [36] [34] | [36] [34] |