Review Article
Bidirectional Language Modeling: A Systematic Literature Review
| Paper | Name | Training data size | Tokens | Training dataset | Model type | Sentence learning | Cross-layer parameter sharing |
|---|---|---|---|---|---|---|---|
| [17] | RoBERTa (large) | 160 GB | ~2.2T | BooksCorpus + Wikipedia + CC-News + OpenWebText + Stories | Autoencoding | None | False |
| [10] | SpanBERT | 13 GB | 3.8B | BooksCorpus + Wikipedia | Autoencoding | None | False |
| [11] | SemBERT | 13 GB | 3.8B | BooksCorpus + Wikipedia | Autoencoding | None | False |
| [14] | ERNIE | 9 GB+ | 4.5B + 140M | English Wikipedia + Wikidata | Autoencoding | NSP | False |
| [15] | ERNIE 2.0 | 13 GB+ | 8B | Encyclopedia + BooksCorpus + Dialog + Discourse Relation Data | Autoencoding | None | True |
| [27] | BERT (base) | 13 GB | 3.8B | BooksCorpus + Wikipedia | Autoencoding | NSP | False |
| [28] | XLNet | 126 GB | 32.89B | BooksCorpus + Wikipedia + Giga5 + ClueWeb 2012-B + Common Crawl | Autoencoding + autoregressive | None | True |
| [29] | UniLM | 13 GB | 3.8B | BooksCorpus + Wikipedia | Autoencoding | NSP | True |
| [20] | — | 13 GB | 3.8B | BooksCorpus + Wikipedia | Autoencoding | None | True |
| [30] | StructBERT | 16 GB | 2.5B+ | English Wikipedia + BooksCorpus | Autoencoding | NSP | False |
| [31] | TinyBERT | 13 GB | 3.8B | BooksCorpus + Wikipedia | Autoencoding | NSP | False |
| [32] | MT-DNN | 13 GB | 3.8B | BooksCorpus + Wikipedia | Autoencoding | NSP | True |
| [33] | ALBERT (xxlarge) | 16 GB | — | BooksCorpus + Wikipedia | Autoencoding | SOP | True |
| [34] | Megatron-LM | 174 GB | — | Wikipedia + CC-Stories + RealNews + OpenWebText | Autoencoding | SOP | True |
| [35] | ALBERT (xxlarge-ensemble) | 16 GB | — | BooksCorpus + Wikipedia | Autoencoding | None | True |
| [36] | T5 | 29 TB | — | Colossal Clean Crawled Corpus | Autoencoding | None | True |
| [37] | SMART (RoBERTa) | 160 GB | — | BooksCorpus + Wikipedia + CC-News + OpenWebText + Stories | Autoencoding | None | False |
| [38] | FreeLB (RoBERTa) | 160 GB | ~2.2T | BooksCorpus + Wikipedia + CC-News + OpenWebText + Stories | Autoencoding | None | False |
| [39] | SesameBERT | 13 GB | 3.8B | BooksCorpus + Wikipedia | Autoencoding | None | False |
| [40] | ELECTRA-1.75M | 126 GB | 33B | BooksCorpus + Wikipedia + Giga5 + ClueWeb 2012-B + Common Crawl | Autoencoding | None | True |
| [41] | MiniLM | 13 GB | 3.8B | BooksCorpus + Wikipedia | Autoencoding | None | False |
| [42] | SBERT-WK | 13 GB | 3.8B | BooksCorpus + Wikipedia | Autoencoding | None | False |
| [43] | PoWER-BERT | 11 GB | 3.4B | BooksCorpus + Wikipedia | Autoencoding | None | False |
| [44] | BAM | 13 GB | 3.8B | BooksCorpus + Wikipedia | Autoencoding | None | True |
| [45] | StackBERT | 11 GB | 3.4B | BooksCorpus + Wikipedia | Autoencoding | None | False |
| [46] | MT-DNN-KD | 13 GB | 3.8B | BooksCorpus + Wikipedia | Autoencoding | None | True |
| [47] | HUBERT | 16 GB | 3.8B | BooksCorpus + Wikipedia | Autoencoding | None | True |
| [48] | AdaBERT | 13 GB | 3.8B | BooksCorpus + Wikipedia | Autoencoding | None | True |
| [49] | BERTQA | 13 GB | 3.8B | BooksCorpus + Wikipedia | Autoencoding | None | False |
| [50] | BART (large) | 160 GB | 2.2T | BooksCorpus + Wikipedia + CC-News + OpenWebText + Stories | Autoregressive | None | False |
| [51] | NEZHA | — | 10.5B | Chinese Wikipedia + Baidu Baike + Chinese News | Autoencoding + autoregressive | NSP | True |
| [52] | UniLMv2 | 160 GB | — | BooksCorpus + Wikipedia + CC-News + OpenWebText + Stories | Autoencoding + partially autoregressive | None | False |
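The "Model type" column distinguishes two pre-training objectives: autoencoding models (BERT-style) reconstruct randomly masked tokens using bidirectional context, while autoregressive models predict each token from its left context only. The sketch below illustrates the two losses on a toy batch; it assumes PyTorch, and the vocabulary, mask token id, and random logits are illustrative stand-ins rather than details from any surveyed paper.

```python
import torch
import torch.nn.functional as F

vocab_size, seq_len, batch = 100, 12, 4
MASK_ID = 0
tokens = torch.randint(1, vocab_size, (batch, seq_len))

# --- Autoencoding (masked language modeling) ---
mask = torch.rand(batch, seq_len) < 0.15          # corrupt ~15% of positions
corrupted = tokens.masked_fill(mask, MASK_ID)      # replace them with [MASK]
logits = torch.randn(batch, seq_len, vocab_size)   # stand-in for encoder output
mlm_loss = F.cross_entropy(
    logits[mask],                                  # predict only at masked slots
    tokens[mask],                                  # original tokens as targets
)

# --- Autoregressive (causal language modeling) ---
logits = torch.randn(batch, seq_len, vocab_size)   # stand-in for decoder output
clm_loss = F.cross_entropy(
    logits[:, :-1].reshape(-1, vocab_size),        # position t predicts token t+1
    tokens[:, 1:].reshape(-1),
)
print(f"MLM loss {mlm_loss.item():.3f}, CLM loss {clm_loss.item():.3f}")
```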
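The "Sentence learning" column records the auxiliary sentence-level objective, if any: next-sentence prediction (NSP, as in BERT) classifies whether a second segment truly follows the first, while sentence-order prediction (SOP, as in ALBERT and Megatron-LM) classifies whether two consecutive segments appear in their original order. A minimal sketch of how the training pairs are constructed follows; the toy corpus and helper names are hypothetical.

```python
import random

corpus = [["sent A1", "sent A2"], ["sent B1", "sent B2"]]  # toy documents

def make_nsp_pair(doc, corpus):
    """Label 1: the true next sentence; label 0: a sentence from another document."""
    if random.random() < 0.5:
        return (doc[0], doc[1]), 1
    other = random.choice([d for d in corpus if d is not doc])
    return (doc[0], random.choice(other)), 0

def make_sop_pair(doc):
    """Label 1: consecutive segments in order; label 0: the same segments swapped."""
    if random.random() < 0.5:
        return (doc[0], doc[1]), 1
    return (doc[1], doc[0]), 0

print(make_nsp_pair(corpus[0], corpus))
print(make_sop_pair(corpus[0]))
```

Because SOP negatives come from the same document, the classifier cannot rely on topic cues alone, which is why ALBERT [33] argues it is a harder task than NSP.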
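Finally, the "Cross-layer parameter sharing" column marks models that, like ALBERT, reuse one transformer layer's weights at every depth instead of allocating independent weights per layer as BERT and RoBERTa do. The sketch below (PyTorch assumed, layer sizes illustrative) shows the difference and its effect on parameter count.

```python
import torch.nn as nn

d_model, n_heads, depth = 64, 4, 6
layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)

# Shared: the same module object, and thus the same parameters, reused at
# every depth (ALBERT-style).
shared = nn.ModuleList([layer] * depth)

# Unshared: a fresh set of parameters for each layer (BERT/RoBERTa-style).
unshared = nn.ModuleList(
    [nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
     for _ in range(depth)]
)

# parameters() deduplicates shared tensors, so the shared stack counts
# a single layer's weights while the unshared stack counts `depth` copies.
count = lambda m: sum(p.numel() for p in m.parameters())
print(f"shared: {count(shared):,} params, unshared: {count(unshared):,} params")
```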