Research Article

Deep Learning Based Abstractive Text Summarization: Approaches, Datasets, Evaluation Measures, and Challenges

Table 2

Dataset preprocessing and word embedding.

Reference | Authors | Dataset preprocessing | Input (word embedding)

[18] | Rush et al. | PTB tokenization, replacing all digits with "#", converting all letters to lower case, and replacing words that occur fewer than 5 times with "UNK" | Bag-of-words embedding of the input sentence
[39] | Chopra et al. | PTB tokenization, replacing all digits with "#", converting all letters to lower case, and replacing words that occur fewer than 5 times with "UNK" | Encodes the position information of the input words
[55] | Nallapati et al. | Tokenization and generation of part-of-speech and named-entity tags | (i) Encodes the position information of the input words; (ii) the input text is represented with a 200-dimensional Word2Vec model trained on the Gigaword corpus; (iii) continuous features such as TF-IDF are discretised into bins, with a one-hot representation over the bins; (iv) lookup embeddings for the part-of-speech and named-entity tags
[52] | Zhou et al. | PTB tokenization, replacing all digits with "#", converting all letters to lower case, and replacing words that occur fewer than 5 times with "UNK" | Word embedding with a dimension of 300
[53] | Cao et al. | Normalization and tokenization, replacing digits with "#", converting words to lower case, and replacing the least frequent words with "UNK" | GloVe word embedding with a dimension of 200
[54] | Cai et al. | Byte pair encoding (BPE) was used for segmentation | Transformer
[50] | Adelson et al. | Converting the articles and their headlines to lower case | GloVe word embedding
[29] | Lopyrev | Tokenization, converting the articles and their headlines to lower case, and replacing rare words with a special symbol | The input was represented using a distributed representation
[38] | Jobson et al. | Not specified | Word embeddings were randomly initialised and updated during training, while GloVe word embeddings were used to represent the words in the second and third models
[56] | See et al. | Not specified | The word embedding of the input was learned from scratch instead of using a pretrained word embedding model
[57] | Paulus et al. | The same as in [55] | GloVe
[58] | Liu et al. | Not specified | CNN maximum pooling was used to encode the discriminator input sequence
[30] | Song et al. | Words were segmented using the CoreNLP tool, with coreference resolution and morphological reduction | A convolutional neural network was used to represent the phrases
[35] | Al-Sabahi et al. | Not specified | Word embeddings learned from scratch during training with a dimension of 128
[59] | Li et al. | The same as in [55] | Learned from scratch during training
[60] | Kryściński et al. | The same as in [55] | Embedding layer with a dimension of 400
[61] | Yao et al. | Not specified | Word embeddings learned from scratch during training with a dimension of 128
[62] | Wan et al. | No word segmentation | Embedding layer learned during training
[65] | Liu et al. | Not specified | BERT
[63] | Wang et al. | WordPiece tokenizer | BERT
[64] | Egonmwan et al. | Not specified | GloVe word embedding with a dimension of 300
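Several of the systems above (e.g., [18, 39, 52]) share the same preprocessing recipe: PTB-style tokenization, lower-casing, replacing every digit with "#", and mapping words that occur fewer than 5 times to "UNK". The following is a minimal Python sketch of that recipe, not the authors' original scripts; the tokenizer choice, the toy corpus, and the 5-occurrence threshold are illustrative assumptions.

```python
# Minimal sketch of the shared preprocessing in [18, 39, 52] (illustrative, not the original code):
# PTB-style tokenization, lower-casing, digit -> "#", and rare words -> "UNK".
import re
from collections import Counter
from nltk.tokenize import TreebankWordTokenizer  # rule-based PTB-style tokenizer

_tokenizer = TreebankWordTokenizer()

def preprocess(text: str) -> list[str]:
    tokens = _tokenizer.tokenize(text.lower())           # lower-case, then PTB tokenization
    return [re.sub(r"\d", "#", tok) for tok in tokens]   # replace every digit with "#"

def replace_rare(corpus_tokens: list[list[str]], min_count: int = 5) -> list[list[str]]:
    # Map words occurring fewer than min_count times across the corpus to "UNK".
    counts = Counter(tok for sent in corpus_tokens for tok in sent)
    return [[tok if counts[tok] >= min_count else "UNK" for tok in sent]
            for sent in corpus_tokens]

# Toy corpus used only to show the pipeline end to end.
corpus = ["The index rose 42 points on Tuesday.", "Analysts expected a rise of 30 points."]
tokenized = [preprocess(s) for s in corpus]
print(replace_rare(tokenized))
```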
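The last column mostly contrasts embedding layers learned from scratch (e.g., the 128-dimensional embeddings in [35, 61]) with layers initialised from pretrained GloVe vectors (e.g., the 200-dimensional vectors in [53]). The PyTorch sketch below illustrates the two options under stated assumptions: the vocabulary size, the random stand-in for the GloVe matrix, and the choice not to freeze the pretrained weights are all illustrative, not taken from the surveyed papers.

```python
# Hedged sketch of the two embedding strategies in the last column of Table 2.
import torch
import torch.nn as nn

vocab_size = 50_000  # illustrative vocabulary size

# (a) Learned from scratch: weights are randomly initialised and updated during training
#     (as in [35, 61], which use 128-dimensional embeddings).
scratch_embedding = nn.Embedding(num_embeddings=vocab_size, embedding_dim=128)

# (b) Pretrained GloVe: weights copied from a pretrained matrix (as in [53], 200 dimensions);
#     freeze=False lets them be fine-tuned during training.
glove_vectors = torch.randn(vocab_size, 200)  # stand-in for a real GloVe weight matrix
glove_embedding = nn.Embedding.from_pretrained(glove_vectors, freeze=False)

token_ids = torch.tensor([[4, 17, 9]])        # a toy batch of word indices
print(scratch_embedding(token_ids).shape)     # torch.Size([1, 3, 128])
print(glove_embedding(token_ids).shape)       # torch.Size([1, 3, 200])
```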