Abstract

The front-page news in authoritative newspapers usually represents extremely significant national policies. Accurately classifying the front-page news within a large amount of news helps us to quickly acquire and deeply understand changing political and economic situations. In this paper, we propose a front-page news classification model, StackText, based on stacking the textual context and attribute information of news. In the proposed model, we first balance the class sizes in the training set through a weighted random sampling algorithm, then construct the context and attribute feature vectors through textual embedding and statistical analysis of news, and finally pretrain a classifier, StackNet, based on a neural network to realize the front-page news classification on the testing set. Taking People’s Daily as the experimental example, we compare the proposed model with benchmark methods based on statistical learning and deep learning. Based on the news sets from four stages of People’s Daily, the experimental results of the front-page news classification show that the proposed model achieves the highest average accuracy and a better balance between precision and recall, which is verified by the corresponding $F_1$-score.

1. Introduction

News in authoritative newspapers reports influential social phenomena and mass media information. Among them, those pieces of news that usually represent extremely significant national policies are printed on the front page of authoritative newspapers, which is the so-called front-page news. For example, People’s Daily, one of the most authoritative newspapers in China, is a propaganda tool for the Chinese government to announce political and economic views to the outside world [1, 2]. It has played a decisive role in guiding Chinese mainland politics at different times through the deliberate composition of the front-page news. There are many ways from different perspectives to study People’s Daily. For the news category in social science, accurately and automatically classifying the front-page news within a large amount of news apparently helps us to quickly acquire and deeply understand changing political and economic situations [3–5].

With the great technological advances in computer science, many text classification models have been consecutively developed based on machine learning and natural language processing [6–11]. There are two kinds of text classification models: one is based on statistical learning and the other is based on deep learning. They are widely applied to the news classification task, where news is mainly represented by a large amount of textual information [12–17]. In the early stage, text classification models based on statistical learning were used to categorize news into different polarities or topics. For example, text classification models based on the support vector machine (SVM) classify news in each group of Twitter into positive and negative categories [18, 19]. Furthermore, more feature extraction methods were introduced to construct text classification models based on SVM and its variants, which are applied to multitopic news classification [20, 21]. In recent years, the rapid development of deep neural networks has promoted the study of text classification models based on deep learning, which has become the mainstream in the field of text classification. Many benchmark methods combining word embedding and deep neural networks, such as FastText, TextCNN, TextRNN, and TextRCNN, have been proposed one after another [22–26]. They are frequently applied to news classification [27–29].

In this paper, we aim to propose a novel text classification model for authoritative newspapers (e.g., People’s Daily) to accurately and automatically categorize candidate news into the front-page news or the other news. Apparently, the front-page news classification task is an imbalanced classification problem because the number of the front-page news is much less than that of the other news. The proposed text classification model should therefore account for the imbalance between the number of the front-page news and the other news. To that end, we study a framework involving three important modules to construct the front-page news classification model StackText based on stacking the textual context and attribute information of news. In particular, the news sampling module in StackText outputs a balanced training set, solving the imbalance between the front-page news and the other news and promoting the classification performance. A more detailed flowchart of StackText will be illustrated in the following section.

In all, we make the following contributions:
(1) We propose a novel text classification model, StackText, that is suitable for front-page news classification. StackText is constructed based on the stacking of the textual context and attribute information of news, and thus blends more features. In addition, it introduces the news sampling module to balance the class sizes in the training set, promoting the classification performance of the pretrained classifier StackNet based on a neural network.
(2) We construct diverse experimental subsets from the dataset of news in People’s Daily. Each subset reflects the varied styles of different leaderships during their ruling periods. These subsets allow us to evaluate StackText more accurately and adequately.
(3) We carry out extensive testing experiments and, according to the experimental results on diverse indexes, verify the better classification performance of StackText by comparing it with other benchmark methods.

The remainder of this paper is organized as follows: In Section 2, we present the problem statement of the front-page news classification. Section 3 illustrates in detail the whole framework of StackText, including its three critical modules. In Section 4, the experiment design gives the construction of the dataset and the division of the training set, the testing set, and the validation set. Section 5 explores the necessity of the news sampling module by contrastively analyzing the experimental results and verifying the effectiveness of StackText. We finally conclude the paper in Section 6.

2. Problem Statement

The news classification task can be described as obtaining a function mapping $f: S \times C \to \{0, 1\}$, where $S = \{s_1, s_2, \ldots, s_n\}$ represents a collection of news that need to be classified, and $C = \{c_1, c_2, \ldots, c_m\}$ represents a collection of categories under a predefined classification model. Note that the output of $f(s_i, c_j)$ has only two possible values. If $f(s_i, c_j) = 1$, $s_i$ belongs to $c_j$; otherwise, it does not belong to $c_j$. In other words, the aim is to find a valuable function mapping $f$ to classify news accurately into categories.

Concretely speaking, the front-page news classification is a binary classification task. If we let $c_1$ indicate the front-page news and $c_2$ indicate the other news, $C$ is simplified as $\{c_1, c_2\}$. Assuming that there are $n$ news items in total, each one has the possibility of being classified into the two categories. We obtain the possibility matrix $P = (p_{ij})_{n \times 2}$, where $p_{i1}$ denotes the possibility of belonging to the category of the front-page news, and $p_{i2}$ denotes the possibility of belonging to the category of the other news. Note that for each news item, the sum of $p_{i1}$ and $p_{i2}$ equals 1. If $p_{i1} > p_{i2}$, the $i$-th news is the front-page news; otherwise, it is the other news.

3. Model

In this section, we introduce the front-page news classification model, StackText. As shown in Figure 1, the schematic flowchart of StackText is composed of three core modules, namely news sampling, feature extraction, and StackNet. Specifically, the news sampling module uses the weighted random sampling algorithm to solve the imbalance of training data labels. The feature extraction module uses the Doc2Vec embedding algorithm to obtain the context feature vector and captures attribute information to construct the attribute feature vector, respectively. The StackNet module is a classifier based on a neural network that stacks the context and attribute feature vectors. To better understand the framework of StackText, each module will be illustrated in the following subsections.

3.1. News Sampling Module

When dealing with the front-page news classification problem of People’s Daily, we find that the number of the front-page news is much less than that of the other news. The imbalance between the number of the front-page news and the other news easily biases the classification result [30]. That is, candidate news will tend to be classified into the other news. To solve this problem, we introduce a weighted random sampling algorithm that balances the samples of the front-page news and the other news in the training set to acquire a better classification model.

Let the training set be $T = \{(s_i, y_i)\}_{i=1}^{N}$, where $y_i \in \{c_1, c_2\}$. Firstly, we quantify the weights of samples (i.e., the front-page news and the other news) in $T$. For the category $c_k$ with size $|c_k|$, the weight of a sample belonging to it can be expressed as $w_i = 1 / (K \cdot |c_k|)$, where $K$ denotes the number of categories, and $K = 2$ according to the description of the front-page news classification task. Note that each sample in the same category has an identical weight, so that $w_i$ takes two values, $w^{(1)}$ and $w^{(2)}$: $w^{(1)}$ denotes the weight of the samples of the front-page news and $w^{(2)}$ denotes the weight of the samples of the other news. Then, we use the weighted random sampling algorithm to select samples from the training set according to their weights. Specifically, the selecting procedure is as follows:
(1) Initially, we randomly select $m$ samples from $T$ to construct the reservoir $R$, where $m \ll N$. Note that the original $R$ includes an imbalanced number of the front-page news and the other news.
(2) For each sample $s_i \in R$ with weight $w_i$, we generate a uniformly distributed random number $u_i \in (0, 1)$ and set a rank-score $r_i = u_i^{1/w_i}$. Across $R$, we obtain the rank-score set of the $m$ samples.
(3) We sort the rank-score set and obtain the minimum rank-score $r_{\min}$.
(4) We randomly select a sample $s_j \in T$ and compute its rank-score $r_j = u_j^{1/w_j}$. If $r_j$ is larger than $r_{\min}$, $s_j$ replaces the sample with the rank-score $r_{\min}$.
(5) Repeat steps 3 and 4 in iterations, and finally obtain the updated $R$.

It can be guaranteed that the updated $R$ includes a balanced number of the front-page news and the other news, where the ratio between the front-page news and the other news tends to be 1 : 1 [31]. Note that a sample in $T$ can occur repeatedly in the updated $R$. We use the updated $R$ as the balanced training set $T_b$.
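The sampling procedure above can be sketched as follows, a minimal implementation assuming the per-sample weight $w_i = 1/(K \cdot |c_k|)$ so that each category carries the same total weight; the function name, the rank-score form $r = u^{1/w}$, and all parameter defaults are our own illustrative choices.

```python
import random
from collections import Counter

def balanced_sample(samples, labels, m, iterations=20000, seed=42):
    """Weighted random sampling sketch: each sample's weight is inversely
    proportional to its category size, so minority-class samples are
    favoured and the reservoir tends toward a balanced label ratio."""
    rng = random.Random(seed)
    sizes = Counter(labels)
    k = len(sizes)  # number of categories (K = 2 here)
    weights = [1.0 / (k * sizes[y]) for y in labels]

    # Step 1: initialise the reservoir with m random picks (repeats allowed).
    idx = [rng.randrange(len(samples)) for _ in range(m)]
    # Step 2: rank-score r_i = u^(1/w_i) for each reservoir member.
    scores = [rng.random() ** (1.0 / weights[i]) for i in idx]

    for _ in range(iterations):
        # Steps 3-4: challenge the minimum-score member with a random sample.
        j_min = min(range(m), key=scores.__getitem__)
        cand = rng.randrange(len(samples))
        r = rng.random() ** (1.0 / weights[cand])
        if r > scores[j_min]:
            idx[j_min], scores[j_min] = cand, r

    return [(samples[i], labels[i]) for i in idx]
```

With a 900 : 100 imbalanced input, the returned reservoir of 100 samples tends toward roughly 50 front-page and 50 other news items, since the two categories carry equal total weight.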

3.2. Feature Extraction Module

The news of People’s Daily not only contains the textual context information but also includes the attribute information. The textual context information cannot be directly described by the news text because of the diversity of news content. Alternatively, the textual context information can be represented by the context feature vector extracted from the news text. Compared with a range of textual feature representation algorithms, Doc2Vec is an unsupervised learning algorithm that can effectively represent the context feature vector of a news text of any length [32]. Figure 2 shows the framework for extracting the context feature vector based on Doc2Vec.

Doc2Vec trains the text vector and the word vector at the same time. Let the one-hot encoding vector corresponding to the text $s_i$ be $d_i$, and the one-hot encoding vector corresponding to the word $w_j$ in the text be $e_j$. The input vector for the occurrence of the word $w_j$ in the text $s_i$ can be constructed by concatenation as follows:

$x_{ij} = d_i \oplus e_j$.

Substituting the vector $x_{ij}$ of the word into the neural network model of Doc2Vec, the output of the hidden layer can be expressed as follows:

$h_{ij} = \sigma(W_1 x_{ij} + b_1)$,

where $W_1$ is the parameter matrix between the input layer and the hidden layer, $b_1$ is the bias in the neural network model of Doc2Vec, and $\sigma$ is the activation function.

Then, we use the output of the hidden layer and the softmax function to predict the vector $\hat{e}_j$ of the word $w_j$, which is expressed as follows:

$\hat{e}_j = \mathrm{softmax}(W_2 h_{ij} + b_2)$,

where $W_2$ is the parameter matrix between the hidden layer and the softmax layer, and $b_2$ is the bias in the neural network model of Doc2Vec.

Finally, we can use the vector $\hat{e}_j$ to construct the loss function of the word $w_j$:

$L_j = D(\hat{e}_j, e_j)$.

The distance function $D$ measures the difference between $\hat{e}_j$ and $e_j$, which is the Euclidean distance in general. By optimizing the loss function, $W_1$ and $b_1$ can be obtained. Taking $d_i$ as input (with the word part zero-padded), the 2048-dimension context feature vector $v_c$ can be obtained:

$v_c = \sigma(W_1 (d_i \oplus \mathbf{0}) + b_1)$.
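A minimal forward pass of the prediction step described above can be sketched as follows, with toy dimensions standing in for the real setup (the paper's hidden layer is 2048-dimension); the tanh activation, the random initialization, and all sizes are assumptions for illustration only, not the trained model.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def one_hot(i, n):
    v = np.zeros(n)
    v[i] = 1.0
    return v

# Toy dimensions (hypothetical): 3 texts, 6-word vocabulary, 4-dim hidden layer.
n_docs, vocab, hidden = 3, 6, 4

W1 = rng.normal(0, 0.1, (hidden, n_docs + vocab))  # input -> hidden
b1 = np.zeros(hidden)
W2 = rng.normal(0, 0.1, (vocab, hidden))           # hidden -> softmax layer
b2 = np.zeros(vocab)

# Input: concatenation of the text's one-hot vector and the word's one-hot vector.
d = one_hot(1, n_docs)            # text s_i
e = one_hot(4, vocab)             # word w_j observed in s_i
x = np.concatenate([d, e])

h = np.tanh(W1 @ x + b1)          # hidden layer output
e_hat = softmax(W2 @ h + b2)      # predicted word distribution

# After training, feeding the text vector alone (word part zero-padded)
# yields the text's context feature vector.
v_c = np.tanh(W1 @ np.concatenate([d, np.zeros(vocab)]) + b1)
```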

In addition, whether candidate news in People’s Daily is determined to be the front-page news mainly relates to its text content. Nevertheless, due to the limitations of the front-page layout and post-typesetting, some candidate news with too long or too short text content cannot be the front-page news. The context feature vector cannot characterize such explicit features of the candidate news. Therefore, we define the attribute information that describes the explicit features of the candidate news. The attribute information includes the title length, the maximum and minimum title lengths in People’s Daily of the day, the text length, the maximum and minimum text lengths in People’s Daily of the day, the page number of People’s Daily of the day, and the label of the time stage (see the definition in the section on the experimental data). They are used to construct the 8-dimension attribute feature vector $v_a$.
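Assembling the 8-dimension attribute feature vector from these attributes can be sketched as follows; the function signature and the character-count length measure are our own assumptions.

```python
def attribute_vector(title, text, day_titles, day_texts, page_count, stage):
    """Sketch of the 8-dimension attribute feature vector v_a for one piece
    of candidate news; `stage` is the label of the time stage (1-4)."""
    title_lens = [len(t) for t in day_titles]   # all titles of that day
    text_lens = [len(t) for t in day_texts]     # all texts of that day
    return [
        len(title),          # title length
        max(title_lens),     # maximum title length of the day
        min(title_lens),     # minimum title length of the day
        len(text),           # text length
        max(text_lens),      # maximum text length of the day
        min(text_lens),      # minimum text length of the day
        page_count,          # page number of People's Daily of the day
        stage,               # label of the time stage
    ]
```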

3.3. StackNet Module

The StackNet module, a classifier based on a neural network, stacks the context feature vector and the attribute feature vector into a refined feature vector and then classifies the candidate news. In this section, we mainly illustrate the framework of StackNet. Denote by $x_i$ the feature vector and by $y_i$ the classification label of a sample in $T_b$, and by $g$ the classification model. The loss function is defined as

$L = \sum_i \ell(g(x_i), y_i)$.

The function $\ell$ is used to measure the consistency of the classification result $g(x_i)$ and the label $y_i$ across the training set. That is, the output of $\ell$ is 0 when $g(x_i)$ equals $y_i$. As mentioned above, the feature of $x_i$ is represented by the context feature vector $v_c$ and the attribute feature vector $v_a$. We rewrite (8) as

$L = \sum_i \ell(g(v_c^{(i)}, v_a^{(i)}), y_i)$.

We have obtained these feature vectors in the feature extraction module. However, the dimension of the context feature vector is much larger than that of the attribute feature vector. When stacking these feature vectors, we need to align their dimensions. As shown in Figure 1, the dimension reduction of $v_c$ is realized by a three-layer fully connected neural network. In the first layer, the input is $v_c$, and the output is the 1024-dimension vector $h_1$, which is expressed as

$h_1 = \sigma(W_1 v_c + b_1)$.

The second and third layers are similar to the first layer; they are also hidden layers that further reduce the dimension of $v_c$. Their outputs are expressed as

$h_2 = \sigma(W_2 h_1 + b_2)$, $h_3 = \sigma(W_3 h_2 + b_3)$.

We finally obtain the 8-dimension context feature vector $h_3$ via the three-layer fully connected neural network, where $\sigma$ is the activation function in each layer. Note that although more hidden layers can characterize the context feature vector more subtly, they also bring more parameters, which results in a higher computational cost. In addition, $v_a$ is normalized by the maximum of each attribute, respectively. The normalized attribute feature vector is denoted as $\tilde{v}_a$.

The 8-dimension context feature vector $h_3$ and the 8-dimension normalized attribute feature vector $\tilde{v}_a$ are directly concatenated. The stacked feature vector is expressed as

$x = h_3 \oplus \tilde{v}_a$.

The classification model $g$ is specifically realized by a forward neural network. The input is the stacked feature vector $x$, and the output is the 2-dimension vector of the classification result via the softmax function, which is expressed as

$\hat{y} = \mathrm{softmax}(W_4 x + b_4)$.

$\hat{y}$ actually includes the two probabilities that the candidate news is classified into the front-page news or the other news. We set the classification label to the category with the larger probability and minimize the loss function across all samples to obtain StackNet.
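The reduction, stacking, and softmax steps above can be sketched as an untrained forward pass; the middle layer width (128), the ReLU activation, and the random initialization are hypothetical choices, and the per-attribute normalization (which in practice uses each attribute's maximum over the dataset) is simplified to a single vector here.

```python
import numpy as np

rng = np.random.default_rng(1)
relu = lambda z: np.maximum(z, 0)   # activation choice is an assumption

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Layer sizes: 2048 -> 1024 -> 128 -> 8 (the middle width 128 is hypothetical).
dims = [2048, 1024, 128, 8]
layers = [(rng.normal(0, 0.02, (m, n)), np.zeros(m))
          for n, m in zip(dims[:-1], dims[1:])]
W_out = rng.normal(0, 0.02, (2, 16))   # stacked 16-dim vector -> 2 classes
b_out = np.zeros(2)

def stacknet(v_c, v_a):
    h = v_c
    for W, b in layers:                 # reduce the context vector to 8 dims
        h = relu(W @ h + b)
    v_a = v_a / np.abs(v_a).max()       # simplified attribute normalization
    x = np.concatenate([h, v_a])        # stack: 8 + 8 = 16 dims
    return softmax(W_out @ x + b_out)   # probabilities for the two classes

probs = stacknet(rng.normal(size=2048), np.arange(1.0, 9.0))
```

The candidate news is then labelled with whichever of the two output probabilities is larger.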

4. Experiment Design

4.1. Dataset

In the experiment, we extracted all news published in People’s Daily from 1946 to 2008 to construct the dataset. The pieces of news printed on the front page are automatically labelled as the front-page news, and the remainder are labelled as the other news. Because of the varied styles of different leaderships during their ruling periods, we divide the dataset into four subsets according to the time scale (see Table 1).

More concretely, the time scale of stage 1 is from May 1, 1946, to July 15, 1977, which is considered as Mao Zedong’s period. The subset has 445,749 news items, including 77,021 (17.28%) front-page news and 368,728 (82.72%) other news. In stage 2, the time scale is from July 16, 1977, to June 24, 1987, which is considered as Deng Xiaoping’s period. The subset has 237,351 news items, including 30,542 (12.87%) front-page news and 206,809 (87.13%) other news. In stage 3, the time scale is from June 25, 1987, to November 14, 2002, which is considered as Jiang Zemin’s period. The subset has 476,690 news items, including 48,142 (10.10%) front-page news and 428,548 (89.90%) other news. In stage 4, the time scale is from November 15, 2002, to December 31, 2008, which is considered as Hu Jintao’s period. The subset has 207,400 news items, including 15,697 (7.57%) front-page news and 191,703 (92.43%) other news. In the experimental design, we divide each subset into the training set, testing set, and validation set according to the ratio 6 : 2 : 2. The proportions of the two kinds of news in each subset are kept in the training set, the testing set, and the validation set.
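The stratified 6 : 2 : 2 split can be sketched as follows, shuffling within each label group so that every part keeps the subset's class proportions; the function name and defaults are our own.

```python
import random

def stratified_split(items, labels, ratios=(0.6, 0.2, 0.2), seed=7):
    """Split into training/testing/validation sets at 6:2:2 while keeping
    each label's proportion roughly the same in every part."""
    rng = random.Random(seed)
    parts = ([], [], [])                # (train, test, validation)
    by_label = {}
    for it, y in zip(items, labels):
        by_label.setdefault(y, []).append(it)
    for y, group in by_label.items():
        rng.shuffle(group)
        n = len(group)
        n_train = int(n * ratios[0])
        n_test = int(n * ratios[1])
        parts[0].extend((it, y) for it in group[:n_train])
        parts[1].extend((it, y) for it in group[n_train:n_train + n_test])
        parts[2].extend((it, y) for it in group[n_train + n_test:])
    return parts
```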

4.2. Evaluation Index

The imbalance between the number of the front-page news and the other news remains in the testing set, which makes us choose indexes suited to imbalanced classification. Thus, four indexes, specificity, precision, recall (i.e., sensitivity), and the $F_1$-score, are introduced to evaluate the effectiveness of StackText. We first compute the confusion matrix of classification results. It includes four values according to the predicted class and the actual class: the true positives (TP), the positive examples (i.e., the front-page news) correctly predicted as positive; the true negatives (TN), the negative examples (i.e., the other news) correctly predicted as negative; the false positives (FP), the negative examples incorrectly labelled as positive; and the false negatives (FN), the positive examples incorrectly labelled as negative. Then, to keep our description as self-contained as possible, we present these indexes briefly.

4.2.1. Specificity

Specificity measures the prediction performance as the percentage of the negative examples correctly predicted among all actual negative examples. Its specific formula is as follows:

$\mathrm{Specificity} = \frac{TN}{TN + FP}$.

4.2.2. Precision

Precision measures the prediction performance as the percentage of the true positive examples among all examples predicted as positive. Its specific formula is as follows:

$\mathrm{Precision} = \frac{TP}{TP + FP}$.

4.2.3. Recall

Recall measures the prediction performance as the percentage of the positive examples correctly predicted among all actual positive examples. Its specific formula is as follows:

$\mathrm{Recall} = \frac{TP}{TP + FN}$.

Note that recall and sensitivity share the same formula.

4.2.4. $F_\beta$-Score

The $F_\beta$-score is generally defined in mathematics to reconcile the results of precision and recall. Its specific formula is as follows:

$F_\beta = \frac{(1 + \beta^2) \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\beta^2 \cdot \mathrm{Precision} + \mathrm{Recall}}$.

In the process of merging, the weight of recall is $\beta$ times that of precision. Ref. [33] illustrates the effect of $\beta$ on the derivation of the $F_\beta$-score. In most cases of evaluating classification models, setting $\beta = 1$ assumes that precision and recall are equally important [34]. Herein, we follow this common practice and use the $F_1$-score to evaluate StackText. The $F_1$-score is defined as follows:

$F_1 = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$.
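The four indexes follow directly from the confusion matrix; this small sketch implements the formulas above (the function name is our own).

```python
def classification_metrics(tp, fp, tn, fn, beta=1.0):
    """Specificity, precision, recall, and F_beta-score from the confusion
    matrix counts; beta=1 weighs precision and recall equally (F1-score)."""
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)                      # a.k.a. sensitivity
    f_beta = ((1 + beta**2) * precision * recall
              / (beta**2 * precision + recall))
    return specificity, precision, recall, f_beta
```

For example, with TP = 40, FP = 10, TN = 90, and FN = 10, this gives specificity 0.9 and precision, recall, and F1-score all equal to 0.8.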

5. Experimental Results

The experimental results are obtained by classifying the above-mentioned subsets based on StackText and other benchmark methods. We use diverse benchmark methods, which include the algorithms based on statistical learning, extreme gradient boosting (XGBoost) and random forest (RF), and the algorithms based on deep learning, FastText, TextCNN, TextRNN, and TextRCNN. The algorithms based on statistical learning only use the 2048-dimension context feature vector as feature engineering to train the classification models. For the algorithms based on deep learning, we directly input the textual context information of news into their corresponding pretrained models to refine the classification models.

Before verifying StackText, we first explore the necessity of the news sampling module. By neglecting the news sampling module in the flowchart of StackText, news in the (imbalanced) training set is directly translated into the context feature vector and attribute feature vector by the feature extraction module. After that, we train StackNet via the input of these two feature vectors and use StackNet to classify candidate news into the front-page news and the other news. The same training set is used for the benchmark methods. Figure 3 shows the classification results based on the four subsets.

Concretely, specificity and precision are shown in Figures 3(a) and 3(b), respectively. It can be seen that all classification models show high values of specificity and relatively low values of precision. We can easily understand that the imbalance between the number of the front-page news and the other news makes the trained classification models prefer to predict the candidate news as negative examples, which results in high values of specificity. This is also verified by the lower values of sensitivity shown in Figure 3(c). However, the interest of the classification task is to correctly predict the candidate news as positive examples. We thus tend to consider precision and recall comprehensively, which is synthetically evaluated by the $F_1$-score. In Figure 3(d), we find that StackText and the algorithms based on deep learning show relatively high values of the $F_1$-score in comparison with the algorithms based on statistical learning. On average, we present the four evaluation indexes in Table 2, and further find that StackText presents the highest value of the $F_1$-score.

According to the above-mentioned analysis, we know that the absence of the news sampling module greatly affects the performance of classification models. Thus, it is expected that the performance (i.e., the $F_1$-score) of classification models can be improved by introducing the news sampling module to obtain a balanced training set. We train the classification models of StackText and the other benchmark methods based on the balanced training set and use them to obtain the classification results. Figure 4 shows the performance of the classification models indicated by the four evaluation indexes.

Concretely, the values of specificity and precision are reduced except for those of the RF classification model (see Figures 4(a) and 4(b)). Nevertheless, the values of sensitivity are improved except for that of the RF classification model (see Figure 4(c)). The reason is that the balanced training set makes the classification models predict candidate news as the front-page news or the other news without bias. More candidate news is classified into the front-page news, which increases (reduces) the number of true and false positive (negative) samples in comparison with the previous classification results. Meanwhile, we find that the increasing number of true and false positive samples greatly improves the values of the $F_1$-score.

We average the values of the four evaluation indexes with respect to the 7 classification models (see Table 3). In comparison with the classification results in Table 2, we explicitly see that although all of them reduce the values of specificity and precision, six classification models (all except the RF classification model) increase the values of recall and the $F_1$-score. This verifies the good effect of the news sampling module. Meanwhile, the classification results also suggest that the six classification models present diverse degrees of increase or decline in the four evaluation indexes. In comparison with the benchmark methods, StackText shows much better degrees of increase or decline and the highest values of specificity, precision, and the $F_1$-score. Thus, we deem that StackText is a better front-page news classification model, and it is necessary to introduce the news sampling module.

6. Conclusion

The task of categorizing the front-page news of national newspapers and periodicals correctly is of great significance for judging changes in national political and economic policies. In this paper, we propose a front-page news classification model (i.e., StackText) to solve such a task. StackText is constructed based on the stacking of the textual context information and attribute information of news, and its framework includes three important modules involving news sampling, feature extraction, and the classifier based on a neural network (i.e., StackNet).

Taking People’s Daily in the period 1946–2008 as an example, we verify the classification performance of StackText based on four evaluation indexes. The experimental results suggest that StackText outperforms the classification models built via both the algorithms based on statistical learning and the algorithms based on deep learning, and that it remains competitive even without the news sampling module. We know that the front-page news classification is an imbalanced classification task. This also suggests that StackText may be suitable for imbalanced text classification tasks to a great extent.

Furthermore, correctly categorizing candidate news into the front-page news via StackText enables us to quickly and deeply mine economic and political policy information. Such a task will be performed in our future work.

Data Availability

The experimental data used to support the findings of this study were supplied by mirror site of People’s Daily under license and so cannot be made freely available. Requests for access to these data should be made to S. C. (e-mail: [email protected]).

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Authors’ Contributions

K. W., K. C., M. C., L. Z., H. Y., and S. C. proposed the motivation; K. W., K. C., and M. C. designed and performed the experiment; Z. Y., L. Z., H. Y., and S. C. prepared and wrote the manuscript.

Acknowledgments

This work was partially supported by the National Natural Science Foundation of China (grant nos. 61673086 and 61803352), the Security Construction Foundation of the Civil Aviation Administration of China (no. AQ20200019), and the Foundation of CAFUC (no. J2021-072).