Abstract

Interactive information between listed companies and investors is an emerging form of information disclosure in the stock market and has a crucial impact on stock market analysis. Interactive information interferes with investors’ decision-making by influencing their sentiments, significantly affecting the stock market’s health. Due to the unique nature of interactive information, the existing approaches to dynamic interactive information sentiment analysis rarely consider a multistep study under the tradeoff of cost and accuracy. To address this problem, we propose a novel unified framework combining DeeBERT with the sequential three-way decision based on the early exiting mechanism and continually mining uncertain regions for interactive information sentiment analysis. Specifically, we treat the question-answer pair interaction information as an entire sample and leverage DeeBERT to allow the samples to exit early without traversing through all the Transformer layers. Subsequently, the uncertain samples are selected at each Transformer layer to be reinvestigated at the next granularity of time-evolving data. Besides, we utilize a manual modification method to assign the determined samples to training sets to update the model. Lastly, a series of comparative experiments demonstrate that our proposed model has outstanding performance in terms of time efficiency and interactive information sentiment index.

1. Introduction

Information asymmetry in the Chinese stock market can cause huge losses to investors every year [1, 2]. To alleviate this problem, the China Securities Regulatory Commission (CSRC) and the Shenzhen Stock Exchange (SZSE) have established many digital interactive media, such as HudongYi [3]. Digital interactive media is an emerging interactive technology for communicating between investors and the senior management of listed companies. Typically, digital interactive media exhibit three characteristics: prompt question and answer via short texts, prompt information dissemination, and rich sentiment information.

Many studies in economics and finance have proven that social media content can directly reflect, influence, and even spread investor sentiments [4–8]. According to the irrational investor hypothesis, individual investors’ sentiment directly impacts their investment decisions [9], and plenty of irrational investment behaviors can result in dramatic fluctuations in stock prices [4, 6, 7, 10]. Consequently, quantifying and analyzing the sentiment of digital interactive media plays a crucial role in probing investor behaviors and predicting stock prices. This topic has received much attention from researchers who have developed many sentiment classification models, such as the emotion space model [11], the heterogeneous information fusion model [12], and the contextual entropy model [13]. Nevertheless, these models suffer from the following two main drawbacks.
(i) Low Accuracy. There are many problems with short texts in digital interactive media, such as the feature sparsity of short texts, the usage of nonstandard words, and irregular grammar. These problems make it difficult for traditional models to learn short texts, thereby leading to low sentiment classification accuracy rates.
(ii) High Time Cost. Classifying the sentiment of interactive information is a time-consuming decision problem that traditional models hardly consider. However, investment decisions in the stock market rely heavily on the promptness of sentiment analysis. Therefore, it is challenging to apply conventional models to practical investment decisions.

To solve the problems existing in the traditional models, this study proposes DeeBERT-S3WD, which combines DeeBERT [14] with the sequential three-way decision (S3WD) [15] to quantify and analyze the sentiment of interactive information in digital interactive media. Specifically, DeeBERT is an enhanced BERT, which uses a dynamic early exiting mechanism to accelerate the inference and decision-making process of the original BERT. That is, DeeBERT introduces an additional decision-making function to each layer of the original BERT. When the decision outcome hits a predetermined threshold, DeeBERT immediately exits the subsequent decision process. DeeBERT preserves the powerful representation learning capability of BERT for texts and can solve problems, such as the feature sparsity of short texts, the usage of nonstandard words, and irregular grammar. Besides, DeeBERT accelerates the decision process via its dynamic early exiting mechanism and can solve the second problem mentioned above. Consequently, DeeBERT is ideally suited for the sentiment classification of interactive information in digital interactive media. However, DeeBERT slightly sacrifices the decision accuracy due to its dynamic early exiting mechanism. To address the shortcoming of DeeBERT, we introduce the concept of S3WD to DeeBERT; that is, we add an uncertain set to cache samples that the current layer of DeeBERT cannot evaluate. These samples in the uncertain set will be further learned and judged by the subsequent layers of DeeBERT.

The ablation study demonstrates that DeeBERT-S3WD may significantly reduce the time cost of interactive sentiment analysis while maintaining its accuracy. In addition, substantial experimental results have shown that the sentiment index performs well in econometric models and machine learning algorithms. In conclusion, this research makes the following three unique contributions:
(i) Based on the character-level embedding technique, DeeBERT-S3WD successfully addresses the representation learning challenges of short texts, leading to improved mining of the interactive information’s features.
(ii) DeeBERT-S3WD employs the concept of dynamic early exiting to establish a balance between accuracy and inference time, which partially compensates for the time burden problems caused by dynamic training.
(iii) DeeBERT-S3WD incorporates the S3WD concept into DeeBERT, thereby enhancing the effectiveness and efficiency of classifying interactive information sentiments.
The key techniques of this study have been implemented and deployed on the Stock++ system [16], a quantitative analysis system for securities market risk that can be accessed at https://intelligentstock.cn/sub-vue/home.

We organize the remaining sections of this study as follows. In Section 2, we briefly introduce information sentiment classification and the concept of multigranularity computing with the three-way decision. In Section 3, we propose DeeBERT-S3WD for classifying the sentiment of interactive information in digital interactive media. Section 4 reports the experimental results, and Section 5 concludes this study.

With the emergence of digital interactive media platforms, investors and listed companies have gained a direct channel to interact and communicate. As a brand-new factor influencing the volatility of the securities market, research on the impact of interactive information on the risk volatility of the Chinese stock market is urgent and essential. The classification of information sentiment has long been of interest to academics, and related research has steadily increased over the past few years. However, there is a lack of research on quantifying the sentiment of interactive information in digital interactive media. Therefore, this study provides a systematic review of relevant research on the classification of information sentiment.

2.1. The Traditional Classification of Information Sentiment

The influence of Internet information on stock markets has been increasing considerably because of its exponentially increased volume and rapid dissemination. As a result, the researchers’ attention has gradually shifted to the sentiment factor of Internet content. Due to the limitations of text mining techniques in the early stages, the amount of Internet information was used as a sentiment indicator. For instance, [17] discovered that the amount of news on Dow Jones and Company was directly related to the aggregate indicators of stock market activity, such as trading volume and market returns, and investors tended to react slowly to negative news. However, utilizing news amounts was insufficient to reflect the influence of news.

To overcome these restrictions, researchers resorted to text mining techniques to extract valuable information from media [18]. As a pilot study, [19] measured the negative (positive) sentiment polarity of an article in terms of the fraction of negative (positive) emotional words in the document, where the sentiment words are determined by the general emotion word dictionary Harvard-IV-4. Because of their simplicity and easy implementation, these proportion-based sentiment analysis methods were widely adopted to study the influence of textual information. However, this proportion-based approach determines the sentiment polarity solely by general emotional words without considering the syntax of sentences, so it can assign the opposite polarity to sentences containing negation words like “not” and “without.” In addition, general sentiment words may not be emotional in finance [20]. For example, the general negative sentiment word “tire” was typically used to identify a specific firm.
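The proportion-based measure and its blind spot for negation can be illustrated with a minimal sketch. The two tiny word lists below are invented for illustration and are not taken from Harvard-IV-4; the function name `word_fractions` is likewise an assumption.

```python
import re
from typing import Tuple

# Toy lexicons, invented for illustration only (not Harvard-IV-4).
NEGATIVE_WORDS = {"loss", "decline", "risk", "weak"}
POSITIVE_WORDS = {"gain", "growth", "strong", "profit"}


def word_fractions(text: str) -> Tuple[float, float]:
    """Return (negative fraction, positive fraction) of the words in text."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0, 0.0
    neg = sum(w in NEGATIVE_WORDS for w in words)
    pos = sum(w in POSITIVE_WORDS for w in words)
    return neg / len(words), pos / len(words)


# Both sentences contain one negative word out of five words,
# so the measure cannot tell the plain statement from its negation.
print(word_fractions("the firm reported a loss"))
print(word_fractions("the firm reported no loss"))
```

Both calls give the same negative fraction, which is the weakness that motivates syntax-aware and finance-specific approaches.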

Furthermore, a word that is emotionless in general usage can carry sentiment in finance. For example, “bear” originally refers to a carnivorous mammal but indicates widespread pessimism in finance, as in “bear market.” To improve the precision of the sentiment analysis models, it is necessary to determine the documents’ sentiments in terms of financial sentiment words rather than general sentiment words. Therefore, some researchers resorted to more complicated and advanced sentiment analysis techniques [19, 21–24]. For example, [25] proposed a statistical model to detect finance-oriented sentiment words and represented news articles with financial sentiment words to study their influence on stock markets.

2.2. The Modern Classification of Information Sentiment

With the substantial increase in computational power, deep learning has begun to receive more attention, providing a new way to analyze information sentiment. In 2018, the Google AI team released a new language model, BERT, which brought a breakthrough in pretrained models in natural language processing and achieved state-of-the-art performance on various natural language processing tasks. Yu and Jiang [26] proposed a multimodal BERT architecture for multimodal target-oriented sentiment classification, and Zhang and Zhang [27] proposed a novel neural network. The main innovations of this approach lie in improving user, item, and word representation learning. They utilized BERT to obtain contextualized word representations and seamlessly feed them into their model to replace the original representations. Lei et al. [28] considered context-aware sentiment analysis as a sequence classification task and proposed a BERT-based hierarchical sequence classification model. It is evident that the BERT has good performance in handling sentiment classification.

BERT-LARGE has up to 340 million parameters, whereas BERT-BASE has 110 million parameters. Even though Google has officially released a pretraining model, the fine-tuning time is still long. The inference time of Fast-BERT [29] may change dynamically to meet various needs. In addition, Fast-BERT is fine-tuned with a novel self-distillation process with enhanced computational performance and little performance degradation. Distilled BiLSTM [30] was proposed to streamline BERT-LARGE into a single layer of BiLSTM, therefore decreasing the number of parameters by 100 times and boosting the speed by 15 times. DeeBERT [14] uses the concept of an early exiting to accelerate BERT inference. If the sample can be classified in the shallow layer, there is no need to traverse all Transformer layers. DeeBERT slightly sacrifices the decision accuracy due to its dynamic early exiting mechanism.

Our study is inspired by DeeBERT, in which the early exiting mechanism saves time and enables researchers to react more sensitively to information from financial markets. However, the acceleration methods above are mainly concerned with model compression and inference speed and pay little attention to model accuracy. We invoke the idea of the sequential three-way decision, in which the samples are divided into positive, negative, and uncertain at each inference stage of the model. The uncertain samples will continue to be classified in the subsequent steps.

2.3. The Multigranularity Computing

Multiple layers and views comprise the multigranularity structure [31]. A granule creates a granular hierarchy and forms an entire granular structure. The granular structure may abstractly describe complex systems, which can then be decomposed or synthesized via granular computing. Using granular computing in three-way decisions can improve the interpretability of the decision phases. In cognitive science, Yao [32] proposed a description of “three” based on the three-way decision, including “three steps,” “three elements,” and “three levels,” and further proposed a trisecting-acting-outcome (TAO) model by considering the cognitive problem in “three.” The TAO model reflects a basic structural framework of the broad sense of the three-way decision process. The objects are classified into three categories via “trisecting,” and decisions are made in “acting.” Finally, the effectiveness of the “trisecting” and “acting” processes is evaluated in “outcome.” The process provides interpretability in classifying uncertain problems and the thinking in “three.” The TAO model provides a feasible interpretation form for multigranularity classification and abstract representation.

With the advent of dynamic decision-making, the traditional single-step (static) decision process has found it difficult to satisfy practical requirements, whereas the necessity for multistep decisions has increased. Complex decision-making processes should be analyzed using multigranularity to get more intelligible and interpretable results. Several researchers have explored this extensively in recent years. For example, a new S3WD model based on a penalty function to improve the classification accuracy by modifying cost parameters was proposed [33], and a cotraining method was incorporated into S3WD to label new samples with higher confidence [34]; a novel dynamic update method of three-way granular concepts was discussed [35], and other research works described granular computing and S3WD [15, 36–39] and three-way granular computing based on multilevel and multiview structure [40–42]. Therefore, in the interactive information sentiment analysis in this study, the application of the multigranularity method combined with S3WD ideas can be well reflected in the sequence classification task of DeeBERT.

2.4. The Sequential Three-Way Decision

To improve the accuracy of the DeeBERT model, we combined it with the sequential three-way decision models (S3WD), which balance efficiency and decision time. Yao [43, 44] introduced three-way decisions in 2010 as a valuable method for handling uncertain problems and processing information based on a decision-theoretic rough set (DTRS) and Bayesian risk decision-making [45]. It divides the analyzed object into positive, negative, and uncertain [44]. Yao [15] was the first to propose the S3WD model, a multilayer framework leading to faster decision-making at a reduced cost. Based on the DTRS architecture, many S3WD models were created. For example, Li et al. [37] presented a cost-sensitive sequential 3WD model for image classification. The cost-sensitive S3WD technique uses the decision-theoretic rough sets (DTRS) model, while conventional 3WD employs a static strategy. This model made an excellent and low-cost decision based on high-quality facial images.

The optimal decision depends on the minimum decision cost according to the currently available information. In the case of insufficient information, the decision may fall into a boundary domain with high-cost loss. Thus, it is necessary to search for an appropriate sequential decision strategy to balance the decision and test costs, leading to a lower overall decision cost. In the past decade, S3WD has been widely applied to different fields in decision-making. Li et al. [37] proposed a sequential three-way decision method based on granular computing for cost-sensitive face recognition. Zhang et al. [46] proposed a cost-sensitive combination technique using sequential three-way decisions in sentiment classification. Li et al. [47] presented a DNN-based sequential granular feature extraction method to extract a hierarchical granular structure from the input images.

Some recent papers focused on introducing the S3WD-based sentiment analysis methods. For example, Basiri et al. [48] proposed two deep fusion models based on three-way decision theory to analyze drug reviews. Yang et al. [49] suggested a temporal-spatial three-way multigranularity learning framework for dynamic text sentiment categorization to address dynamic data uncertainty. Zhang et al. [46, 50] introduced the 3W-CNN model for an upgraded three-way convolutional neural network to improve sentiment classification accuracy.

Nevertheless, few researchers applied S3WD in the financial area. For example, Shen et al. [51] proposed a novel three-stage reject inference learning framework by using transfer learning and three-way decision theory in credit scoring. Maldonado et al. [52] provided an approach for applying three-way decisions for credit scoring problems that are expensive and time-consuming.

In addition, S3WD models have rarely been applied to identifying the sentiment of interactive stock market information. Because S3WD models may improve classification efficacy and precision, this study presents an approach that combines the DeeBERT model with S3WD models to dramatically decrease time cost while ensuring accuracy. With the S3WD model increasing its classification accuracy, DeeBERT can successfully address the short text and class-imbalanced issues. Consequently, this combination is suitable for balancing efficiency and effectiveness.

3. DeeBERT-S3WD and Experiments

This study classifies the sentiment of interactive messages on digital platforms. Therefore, based on previous research, we design an intelligent framework of DeeBERT and sequential three-way decisions to solve this problem efficiently. First, we obtain investors’ inquiries over digital platforms using web crawler technology. Next, we use the DeeBERT-S3WD model to capture the sentiment tendencies of interactive information within the massive data available through the digitized media. Finally, we utilize different predictive analysis models to investigate the effect of interactive information sentiment on the stock market.

3.1. The Research Data

This study collected data from China’s three largest digital interactive platforms: Panorama Network, eHuDong, and HudongYi [53]. Panorama Network (https://www.p5w.net) was founded in 1999 as an independent securities finance and economy website in Mainland China; it has accumulated more than 900,000 pages of data and information over the past ten years and promptly provides users with public information on listed companies. eHuDong (https://sns.sseinfo.com) is a communication platform established by the Shanghai Stock Exchange and used by all participants for free; it aims to guide and promote information communication between listed companies, investors, and other market participants. HudongYi (https://irm.cninfo.com.cn) is an interactive platform for investor relations of listed companies launched by the Shenzhen Stock Exchange; its primary purpose is to meet the communication and interaction needs between listed companies and investors, and on it investors can put questions directly to listed companies, which can respond. The daily number of visitors might reach 2 million. Unlike conventional media, these platforms have forums for each listed company. In other words, users can publish questions on discussion boards, and representatives of each company make formal replies. Since there is no publicly available dataset of interactive information for the digital interactive platform in academia, this study extracted 1.62 million pieces of interactive information (question-answer pairs) from 1,765 listed companies between 01/2012 and 12/2018 via a distributed web crawler system from Panorama Network, eHuDong, and HudongYi. Next, we completed the data preprocessing, such as interpolating missing values and eliminating duplicated values, based on Spark distributed technology. In addition, question-answer interactions represent the entire information disclosure process. The question-answer interactive information is manually labeled with sentiment classes. To reduce the workload of sentiment annotation, we resorted to a professional Chinese financial sentiment lexicon created by the previous work [54].

3.2. The Intelligent Collaborative System Framework

The intelligent framework for classifying the sentiment of interactive information in this study is shown in Figure 1, which mainly includes some essential parts, such as document preprocessing and document vectorization.

3.2.1. The Documents Preprocessing

Data cleaning is an indispensable step in the whole process of data analysis. The quality of the results is directly related to the effect of the model and the conclusion. Therefore, before starting the experiment, we must preprocess the documents obtained. The main steps of document preprocessing include removing missing data, replacing duplicate values, and manual sentiment annotation of interactive information (question-answer pairs) documents.
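As a rough illustration of the cleaning step, the sketch below drops question-answer pairs with a missing side and removes exact duplicates. It is a simplification under stated assumptions: the function name `clean_pairs` and the tuple layout are hypothetical, and missing entries are simply dropped here, which is only one possible treatment.

```python
from typing import List, Optional, Tuple

# A raw pair may have a missing question or a missing reply.
QAPair = Tuple[Optional[str], Optional[str]]


def clean_pairs(pairs: List[QAPair]) -> List[Tuple[str, str]]:
    """Drop pairs with a missing side and exact duplicates, keeping order."""
    seen = set()
    cleaned = []
    for question, answer in pairs:
        if not question or not answer:
            continue  # missing question or missing reply
        key = (question.strip(), answer.strip())
        if not key[0] or not key[1] or key in seen:
            continue  # whitespace-only text or an already seen pair
        seen.add(key)
        cleaned.append(key)
    return cleaned


raw = [("q1", "a1"), ("q1", "a1"), (None, "a2"), ("q3", " ")]
print(clean_pairs(raw))
```

Manual sentiment annotation would then be applied to the pairs that survive this filter.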

3.2.2. The DeeBERT-S3WD Model

The DeeBERT-S3WD model proposed in this study is an optimized improvement of the BERT model. The BERT model combines pretraining with downstream task models, which means that the BERT model is still used when doing downstream tasks, and it naturally supports text classification tasks. However, the inference speed of traditional BERT is slow. It has difficulty effectively solving the sparsity and feature dispersion problems of short texts, and it lacks adaptability in the face of massive data processing tasks. In addition, it is necessary to balance misclassification cost and time cost to improve the accuracy and obtain the optimal decision in dynamic and uncertain decision-making, especially in text classification tasks for the securities market. Therefore, this study proposes an intelligent framework for textual sentiment classification that combines the dynamic early exiting DeeBERT with the sequential three-way decision to analyze the sentiment of interactive information in digital platforms.

(1) Multigranularity Sentiment Classification. The sequential three-way decision provides a valuable and meaningful simulation of the decision-making process. This study combines this model with multigranularity sentiment classification for interactive information sentiment analysis. In the S3WD model, a delayed decision is more reasonable when information is unavailable, and a precise decision cannot be reached quickly. What is more, an incorrect decision may result in a higher cost. The more available information can be used, the more precise granular feature can be obtained. Thus, a delayed decision in the earlier steps could be a low-cost decision with high performance [47].

A multistep (sequential) decision-making procedure is hierarchical and time-consuming. Hierarchical granularity can be seen as a supplement to sequential decision-making. The combination of S3WD and granular computing (GrC) is an efficient method to classify uncertain objects and achieve lower-cost results. The definition of three regions can be described in the following dynamic scenarios.

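Definition 1. The multilevel structure, $GS = \{GS_1, GS_2, \ldots, GS_n\}$, and the $i$th level of granular structure, $GS_i = (U_i, C_i, v_i)$, are given, where $U_i$, $C_i$, and $v_i$, respectively, represent a finite and nonempty set of objects, conditional attributes, and evaluation functions. In the sequential decision-making process, the $i$th level of three regions can be defined as
$$
\begin{aligned}
\mathrm{POS}_{(\alpha_i, \beta_i)}(v_i) &= \{x \in U_i \mid v_i(x) \ge \alpha_i\},\\
\mathrm{BND}_{(\alpha_i, \beta_i)}(v_i) &= \{x \in U_i \mid \beta_i < v_i(x) < \alpha_i\},\\
\mathrm{NEG}_{(\alpha_i, \beta_i)}(v_i) &= \{x \in U_i \mid v_i(x) \le \beta_i\},
\end{aligned}
$$
where ($\alpha_i$, $\beta_i$) is a pair of thresholds under the $i$th step granularity $GS_i$.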
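The three regions of Definition 1 amount to a threshold rule on an evaluation value. The sketch below is a minimal illustration, assuming the evaluation value lies in [0, 1] and that the upper threshold exceeds the lower one; the names `trisect`, `alpha`, and `beta` are assumptions made for this example.

```python
from typing import List, Sequence, Tuple


def trisect(scores: Sequence[float], alpha: float, beta: float) -> Tuple[List[int], List[int], List[int]]:
    """Split object indices into positive, boundary, and negative regions."""
    if not 0.0 <= beta < alpha <= 1.0:
        raise ValueError("thresholds must satisfy 0 <= beta < alpha <= 1")
    positive, boundary, negative = [], [], []
    for index, value in enumerate(scores):
        if value >= alpha:
            positive.append(index)   # accept
        elif value <= beta:
            negative.append(index)   # reject
        else:
            boundary.append(index)   # defer to the next granularity level
    return positive, boundary, negative


pos, bnd, neg = trisect([0.9, 0.5, 0.1, 0.8, 0.2], alpha=0.8, beta=0.2)
print(pos, bnd, neg)
```

Only the boundary objects need to be revisited at the next level, which is what makes the sequential scheme cheaper than re-evaluating every object.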
The S3WD models are built based on information tables, including attribute and value sets. However, interactive information sentiment analysis operates on a data set rather than on a table of attribute values. To address this problem, a more general S3WD model was proposed by [37], which is not restricted to rough sets and information tables, where an S3WD model is formulated as follows.

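Definition 2. An S3WD set $SD = (SD_1, SD_2, \ldots, SD_n)$ is given, where $SD_i$ is the $i$th step in the decision process. The subset $X \subseteq U$ and the multigranular set $G = \{g_1, g_2, \ldots, g_n\}$ are given, where the $i$th granular set is $g_i(X)$. Thus, the sequential decision process is described as follows [55, 56]:
$$
SD = (SD_1, SD_2, \ldots, SD_n) = \big(\phi^{*}(g_1(X)), \phi^{*}(g_2(X)), \ldots, \phi^{*}(g_n(X))\big),
$$
where $\phi^{*}(g_i(X))$ is the optimal three-way decision with the minimum risk cost in the $i$th step of decision-making.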
The S3WD models consider not only the decision cost but also the test cost because high-quality granular features tend to obtain a low misclassification cost with high time consumption, which increases the total cost of a decision. Therefore, it is necessary to balance the total cost and correctly evaluate a decision by considering the test cost [47]. In this study, the test cost of interactive information sentiment analysis denotes the time cost of the $i$th base classifier in the DeeBERT model, including the time cost of the first and next base classifiers [46].
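In the S3WD models, consider a state set $\Omega = \{X, \neg X\}$ with only two states, indicating that the event belongs to $X$ and does not belong to $X$. Given the decision actions set $A = \{a_P, a_N, a_B\}$ as the positive, negative, and boundary decision under the $i$th step, $i = 1, 2, \ldots, n$. In standard scenarios, different decisions will produce various losses, so a loss function value $\lambda$ is derived, and the corresponding decision loss matrix [45] is given in Table 1.
In Table 1, $\lambda_{PP}$, $\lambda_{BP}$, and $\lambda_{NP}$ are the corresponding loss values when $x \in X$ and the three decisions of $a_P$, $a_B$, and $a_N$ are adopted, while the other loss values $\lambda_{PN}$, $\lambda_{BN}$, and $\lambda_{NN}$ are defined when $x \notin X$ and $a_P$, $a_B$, and $a_N$ are adopted in the $i$th step decision-making process. The relationship between the three loss values is
$$
\lambda_{PP} \le \lambda_{BP} < \lambda_{NP}, \qquad \lambda_{NN} \le \lambda_{BN} < \lambda_{PN}.
$$
According to the risk loss matrix and the Bayesian decision method of minimum risk, the expected loss of three different decisions is calculated. Given the expected risk $R(a \mid [x]_i)$ corresponding to the three decisions of $A$ in the $i$th granular layer,
$$
\begin{aligned}
R(a_P \mid [x]_i) &= \lambda_{PP}\Pr(X \mid [x]_i) + \lambda_{PN}\Pr(\neg X \mid [x]_i),\\
R(a_B \mid [x]_i) &= \lambda_{BP}\Pr(X \mid [x]_i) + \lambda_{BN}\Pr(\neg X \mid [x]_i),\\
R(a_N \mid [x]_i) &= \lambda_{NP}\Pr(X \mid [x]_i) + \lambda_{NN}\Pr(\neg X \mid [x]_i).
\end{aligned}
$$
Compute the minimum decision cost according to the three expected decision costs $R(a_P \mid [x]_i)$, $R(a_B \mid [x]_i)$, and $R(a_N \mid [x]_i)$ [55]. The optimal S3WD decision can be calculated as follows:
$$
SD_i = \arg\min_{a \in A} R(a \mid [x]_i).
$$
As the single-step decisions continue, the decision cost increases gradually. The decision cost of a single $i$th step decision $SD_i$ can be computed as
$$
\mathrm{COST}_D(SD_i) = \Lambda_{SD_i, X}\Pr(X \mid [x]_i) + \Lambda_{SD_i, \neg X}\Pr(\neg X \mid [x]_i),
$$
where $\Lambda$ is a decision cost matrix.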
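The minimum-risk rule can be made concrete with a small numeric sketch. The loss values below are invented for illustration, and the dictionary keys `"P"`, `"B"`, and `"N"` are an assumed encoding of the positive, boundary, and negative actions. The closed-form thresholds are the standard ones from the decision-theoretic rough set model, obtained by equating the expected losses of adjacent actions.

```python
from typing import Dict, Tuple

# loss[action] = (loss when x belongs to X, loss when x does not belong to X)
Loss = Dict[str, Tuple[float, float]]


def expected_losses(p: float, loss: Loss) -> Dict[str, float]:
    """Expected loss of each action given p = Pr(X | description of x)."""
    return {a: in_x * p + not_in_x * (1.0 - p) for a, (in_x, not_in_x) in loss.items()}


def min_risk_action(p: float, loss: Loss) -> str:
    """Action with the smallest expected loss (Bayesian minimum risk)."""
    risks = expected_losses(p, loss)
    return min(risks, key=risks.get)


def thresholds(loss: Loss) -> Tuple[float, float]:
    """Probability thresholds (alpha, beta) implied by the loss values."""
    (l_pp, l_pn), (l_bp, l_bn), (l_np, l_nn) = loss["P"], loss["B"], loss["N"]
    alpha = (l_pn - l_bn) / ((l_pn - l_bn) + (l_bp - l_pp))
    beta = (l_bn - l_nn) / ((l_bn - l_nn) + (l_np - l_bp))
    return alpha, beta


toy_loss = {"P": (0.0, 10.0), "B": (2.0, 2.0), "N": (8.0, 0.0)}  # illustrative values
print(thresholds(toy_loss))
for p in (0.9, 0.5, 0.1):
    print(p, min_risk_action(p, toy_loss))
```

With these toy losses, a probability above the upper threshold favours the positive action, one below the lower threshold favours the negative action, and anything in between is deferred.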
In a cost-sensitive sequential three-way decision, the $i$th step decision is acquired by minimizing the decision cost through the $i$th granular feature information. The training time for a DeeBERT is related to the training loop and decision step, which is computed as the test cost. The total cost in the $i$th step can be calculated according to the Bayesian decision procedure [37, 46, 47]:
$$
\mathrm{COST}(SD_i) = \mathrm{COST}_D(SD_i) + \mathrm{COST}_T(SD_i),
$$
where $\mathrm{COST}_T(SD_i)$ denotes the test cost. Based on the decision costs, the decision with the minimum cost is selected as the optimal decision for the $i$th step:
$$
SD_i = \arg\min_{a \in \{a_P, a_B, a_N\}} \mathrm{COST}(a).
$$
The optimal results of sequential decisions are determined by both the misclassification costs and test costs. If the granular features are not precise enough, the decision results in a high misclassification cost and needs more training time to obtain available and valuable information. Thus, the minimal cost is a crucial criterion in the termination step in the sequential decision process [47].
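The trade-off between misclassification cost and test cost can be sketched numerically. This is only an illustration under assumptions: the per-step decision costs and test costs are invented, the test cost is taken to accumulate over the steps already executed, and the function names are hypothetical.

```python
from typing import List, Sequence


def total_costs(decision_costs: Sequence[float], test_costs: Sequence[float]) -> List[float]:
    """Total cost per step: decision cost plus the test cost spent so far."""
    if len(decision_costs) != len(test_costs):
        raise ValueError("one decision cost and one test cost per step are required")
    totals = []
    elapsed = 0.0
    for decision_cost, test_cost in zip(decision_costs, test_costs):
        elapsed += test_cost  # assumption: time cost accumulates across steps
        totals.append(decision_cost + elapsed)
    return totals


def best_step(decision_costs: Sequence[float], test_costs: Sequence[float]) -> int:
    """Index of the step at which the total cost is smallest."""
    totals = total_costs(decision_costs, test_costs)
    return min(range(len(totals)), key=totals.__getitem__)


# Decision cost falls with depth while test cost keeps growing (toy numbers).
decision = [5.0, 3.0, 2.0, 1.8]
test = [0.5, 0.5, 0.5, 0.5]
print(total_costs(decision, test), best_step(decision, test))
```

In this toy setting the total cost stops improving before the last step, which is the situation in which terminating the sequence early is preferable.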
(2) DeeBERT-S3WD Predicting Model. This section describes the process of interactive information sentiment classification using the DeeBERT-S3WD prediction model. BERT has achieved good results as a large-scale pretraining model. The limit that BERT places on input length makes it more suitable for handling short textual data. The interactive information of investors in the market is usually short text, so BERT is useful. However, BERT is notorious for its massive number of parameters, which makes its training and inference time substantially longer.
DeeBERT [14] uses a dynamic inference mechanism to determine, at each Transformer layer except the last one, whether the sample has enough information to exit. If so, the sample leaves early without going through all Transformer layers, which dramatically improves the inference speed of BERT at the expense of a small amount of accuracy. However, the input data of DeeBERT is static. After training, the model is validated on the testing set, and the accuracy and inference time are output. The information in the stock market changes rapidly, with a constant stream of information input daily. A reasonable model should be able to make decisions based on the latest information when judging each sample rather than using unchanging and outdated information.
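The exit test can be sketched without any deep learning library. DeeBERT measures confidence by the entropy of the class distribution produced at each layer and lets a sample exit once that entropy falls below a threshold; the sketch below assumes the per-layer class probabilities are already available, and the function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (natural log) of a class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def exit_layer(layer_probs: List[Sequence[float]], threshold: float) -> Tuple[int, int]:
    """Return (exit layer index, predicted class) for one sample."""
    if not layer_probs:
        raise ValueError("at least one layer output is required")
    last = len(layer_probs) - 1
    for layer, probs in enumerate(layer_probs):
        # Exit when the layer is confident enough; the last layer always outputs.
        if layer == last or entropy(probs) < threshold:
            predicted = max(range(len(probs)), key=probs.__getitem__)
            return layer, predicted
    raise AssertionError("unreachable: the last layer always returns")


# Toy per-layer outputs for one sample: confidence grows with depth.
outputs = [[0.55, 0.45], [0.70, 0.30], [0.97, 0.03], [0.99, 0.01]]
print(exit_layer(outputs, threshold=0.3))
print(exit_layer(outputs, threshold=0.65))
```

Raising the entropy threshold lets the toy sample leave at a shallower layer, which is the speed-for-accuracy trade-off described above.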
Therefore, drawing on previous research, this study proposes a new inference decision system based on an improved DeeBERT, named DeeBERT-S3WD. The traditional DeeBERT makes a decision at each Transformer layer except the last one: a sample exits early if there is enough confidence and continues to the next layer if there is not, and a sample that finally reaches the last Transformer layer is output directly. DeeBERT-S3WD uses three-way decisions at each Transformer layer in the inference stage; that is, the results are divided into the positive, negative, and boundary domains. The samples in the boundary domain go to the next Transformer layer for judgment. DeeBERT-S3WD still makes a judgment at the last layer: a sample that is confident enough exits, and a sample that is not is retained. We manually correct the exited samples and add them to the training set to update the model. The retained samples wait for the model update and are tested again together with the next batch of samples. These steps are shown in Figure 2 and Algorithm 1.
(3) Dynamic Early Exiting Mechanism. We use Algorithm 2 to illustrate the dynamic early exiting mechanism. In this study, a judgment is added to each Transformer layer, which allows the sample to exit early without going through all the layers. The last Transformer layer also judges whether the confidence is sufficient. If it is not, the sample is retained and tested with the next fresh batch of test sets after waiting for the model to be updated.
A smaller threshold sends more samples to deeper layers for their decisions, resulting in a higher accuracy of the output at each layer but a lower speed. A larger threshold lets more samples be decided at shallow layers, resulting in a lower accuracy of the output at each layer but a higher speed. With this criterion, we hope to find a suitable pair of thresholds $\alpha$ and $\beta$ that minimizes the sum of misclassification cost and total time cost.

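Input: data.
while new data arrive do
    take the next batch of samples together with the retained uncertain samples
    initialize the positive, negative, and uncertain sets as empty
    for each sample in the batch do
        for i = 0 to n do
            compute the output of the ith Transformer layer for the sample
            if the output reaches the positive threshold then
                assign the sample to the positive set;
                break;
            end
            if the output reaches the negative threshold then
                assign the sample to the negative set
                break;
            end
        end
        if the sample has not exited, assign it to the uncertain set
    end
end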
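Initiate: use a large-scale text training set for training BERT;
repeat
    Dataset: take the next batch of samples together with the retained uncertain samples;
    Classify: divide the Dataset into three subsets;
    Correction: manually correct the positive and negative sets;
    Update: update the model with the determined set;
until the stopping condition is met;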
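The layer-wise three-way judgment and the retention of uncertain samples can be sketched as follows. This is a simplified illustration rather than the deployed system: it assumes each sample comes with a list of per-layer probabilities of the positive class, and the names `three_way_inference`, `process_batch`, `alpha`, and `beta` are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def three_way_inference(layer_pos_probs: Sequence[float], alpha: float, beta: float) -> Tuple[str, int]:
    """Walk through the layers and exit as soon as a region is decided."""
    for layer, prob in enumerate(layer_pos_probs):
        if prob >= alpha:
            return "positive", layer
        if prob <= beta:
            return "negative", layer
        # otherwise the sample stays in the boundary region and moves on
    return "uncertain", len(layer_pos_probs) - 1


def process_batch(batch: Dict[str, Sequence[float]], alpha: float, beta: float):
    """Split a batch into decided samples and samples retained for the next round."""
    decided: Dict[str, Tuple[str, int]] = {}
    retained: List[str] = []
    for sample_id, probs in batch.items():
        region, layer = three_way_inference(probs, alpha, beta)
        if region == "uncertain":
            retained.append(sample_id)  # waits for the model update
        else:
            decided[sample_id] = (region, layer)  # candidates for manual correction
    return decided, retained


toy_batch = {"a": [0.6, 0.9], "b": [0.1, 0.5], "c": [0.5, 0.6]}  # invented values
print(process_batch(toy_batch, alpha=0.8, beta=0.2))
```

In a full loop, the decided samples would be manually corrected and added to the training set, and the retained ones would be merged into the next batch after the model is updated.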

3.2.3. Validate the Applicational Effect of the DeeBERT-S3WD Model

(1) Regression Model for Index Tests. Here, we use the regression model to test the effectiveness of interactive information sentiment classification. Thus, we adopt the most representative Fama-French 3-factor model [57] in the field of finance in our study:
$$
R_{t} - R_{f,t} = a + b\,(R_{m,t} - R_{f,t}) + s\,\mathrm{SMB}_{t} + h\,\mathrm{HML}_{t} + \varepsilon_{t},
$$
where $R_{t}$ is the expected return rate of a portfolio on the $t$th day, $R_{f,t}$ is the risk-free return rate on the $t$th day, and $R_{m,t}$ is the return of the market portfolio on the $t$th day. $\mathrm{SMB}_{t}$ is the scale factor, which stands for small minus big on the $t$th day, and $\mathrm{HML}_{t}$ is the net asset market value ratio factor, indicating the difference between high and low book-to-market ratios on the $t$th day. $\mathrm{SMB}_{t}$ and $\mathrm{HML}_{t}$ are used to measure the historic excess returns of small capitals over big capitals and value stocks over growth stocks.

We add the interactive information sentiment indicator into the Fama-French 3-factor model to evaluate the effect of textual information on stocks as suggested by [19], and we hope that the indicators of sentiment refined by DeeBERT-S3WD will have good performance in our experiments.
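
To make the idea of adding a sentiment regressor to the three-factor regression concrete, the following sketch fits ordinary least squares in plain Python. The observations are made-up numbers, the helper names `ols` and `design_row` are hypothetical, and solving the normal equations directly is an assumption made for brevity; the paper does not describe its estimation code.

```python
def ols(X, y):
    """Least-squares coefficients for y ~ X b, via the normal equations (X'X) b = X'y."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = []
    for i in range(k):
        row_i = [sum(r[i] * r[j] for r in X) for j in range(k)]
        row_i.append(sum(r[i] * yi for r, yi in zip(X, y)))
        A.append(row_i)
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        pv = A[col][col]
        A[col] = [v / pv for v in A[col]]
        for r in range(k):
            if r != col:
                f = A[r][col]
                A[r] = [vr - f * vc for vr, vc in zip(A[r], A[col])]
    return [A[i][k] for i in range(k)]


def design_row(mkt_excess, smb, hml, sentiment):
    """One regression row: intercept, market excess return, SMB, HML, sentiment S."""
    return [1.0, mkt_excess, smb, hml, sentiment]


# Synthetic daily observations (made-up values, in percent):
# (market excess return, SMB, HML, sentiment S)
observations = [
    (1.0, 0.2, -0.1, 0.3), (-2.0, 0.1, 0.3, -0.1), (1.5, -0.4, 0.2, 0.5),
    (0.0, 0.3, 0.1, -0.4), (-1.0, -0.2, -0.3, 0.2), (2.0, 0.0, 0.4, 0.1),
    (0.5, 0.5, -0.2, -0.3),
]
true_coef = [0.05, 1.2, 0.5, -0.3, 0.8]  # hypothetical, only used to generate y
X = [design_row(*obs) for obs in observations]
y = [sum(c * v for c, v in zip(true_coef, row)) for row in X]

coef = ols(X, y)
print([round(c, 4) for c in coef])  # last entry is the loading on sentiment S
```

In practice one would compare the fit with and without the sentiment column and inspect the significance of its coefficient, typically with a statistics package rather than hand-written elimination.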

(2) Machine Learning Models for Index Tests. To further verify the effectiveness of the quantified indicators of interactive information sentiment, we also design comparative experiments from the computer science field. Specifically, we add the interactive information sentiment indicators to traditional machine learning models (Bayes, SVM) and deep learning models (CNN, LSTM). By comparing each model’s performance before and after the sentiment indicators are added, we can directly observe the difference in the models’ ability to capture future stock price trends and thus assess how the quantified interactive information sentiment indicators perform in the practical stock market.

4. Experimental Results

4.1. Classification Performance of DeeBERT-S3WD

In this section, we evaluate the effectiveness of DeeBERT-S3WD in analyzing the sentiment of interactive information. In particular, we select SVM, CNN, DeeBERT, BERT, RoBERTa [58], and DeBERTa [59] as baseline models and test them on a large-scale corpus containing 100,000 interaction texts with sentiment labels (positive or negative). As shown in Table 2, DeeBERT-S3WD outperforms SVM, CNN, and DeeBERT but is inferior to the state-of-the-art models, such as BERT, RoBERTa, and DeBERTa. This is because we build DeeBERT-S3WD on the dynamic early exiting mechanism of S3WD, which sacrifices accuracy for inference time.

In addition, the interactive information on the digital platforms consists of short texts and is class-imbalanced, which may lead to underperformance in sentiment analysis. Therefore, we compare the performance of the DeeBERT-S3WD-based approaches with the traditional financial methods (proportion-based approaches). In the conventional financial field, the proportion-based approach proposed by [19] is one of the most classic ones: its primary measure of media content is the standardized fraction of negative (or positive) words in each news story. In this experiment, the final accuracy is the average over 10-fold cross-validation. As Table 3 displays, the DeeBERT-S3WD-based sentiment analysis approaches can almost reach human-level judgment and show superior performance compared to the proportion-based approaches.
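
The proportion-based measure mentioned above, the standardized fraction of negative words per text, can be sketched as follows. The tiny word list, the example sentences, and the function names are hypothetical, and standardizing with the population standard deviation over the sample at hand is an assumption; the original approach standardizes against a historical window.

```python
import statistics


def negative_fraction(tokens, negative_words):
    """Share of tokens that appear in the negative word list."""
    if not tokens:
        return 0.0
    hits = sum(1 for t in tokens if t.lower() in negative_words)
    return hits / len(tokens)


def standardize(values):
    """Z-scores of the raw fractions (population standard deviation)."""
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)
    if sd == 0:
        return [0.0 for _ in values]
    return [(v - mean) / sd for v in values]


# Hypothetical negative word list and texts, for illustration only.
negative_words = {"fell", "losses", "decline"}
texts = [
    "Profit fell and losses widened",
    "Revenue grew in the third quarter",
    "The decline continued this month",
]
fractions = [negative_fraction(t.split(), negative_words) for t in texts]
scores = standardize(fractions)
print(fractions)
print(scores)
```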

In particular, the DeeBERT-S3WD-based approaches outperform the proportion-based approaches by 8% on positive sentiment classification and 6% on negative sentiment classification. Note that we favor the proportion-based approach by utilizing financial sentiment words instead of the general sentiment words adopted by [19]. Approximately three-quarters of all negative words in a general emotion word dictionary (Harvard-IV-4) are not considered negative in a financial context [20]. Therefore, we use the financial sentiment word list provided by [25] and revise the proportion-based approach to enhance its effect.

4.2. Decision Efficiency of DeeBERT-S3WD

In fact, time efficiency is also critical for the proposed method, especially when the model is used in practice. Here, we conduct a timing experiment to evaluate time efficiency. In particular, we focus on the actual working time of the model, so we ignore the time required for manual correction and model update.

There are two reasons for this: First, we assume the manual correction happens after the closing, and other models are at rest during the manual correction phase and the DeeBERT-S3WD update phase. Second, the manually corrected samples are only newly added samples daily. Manual correction does not require much time. Thus, our work can ensure that the model decision and update can be completed before the next opening.

As shown in Table 4, DeeBERT-S3WD is more applicable to the actual financial market since it allows a fast sentiment analysis while guaranteeing high accuracy. DeeBERT-S3WD uses S3WD to calculate the threshold sensibly, allowing samples to exit early without going through all encoder layers.

4.3. Application Effectiveness of DeeBERT-S3WD Model
4.3.1. Effectiveness of Sentiment Classification Index in Econometric Model

To verify the validity of the interactive information sentiment classification results, we apply the sentiment analysis results produced by DeeBERT-S3WD to the Fama-French 3-factor model. In this first test, the performance of the sentiment analysis results in the regression serves to evaluate the sentiment classification approach. To estimate the model parameters, we collect stock price data of the SSE 50 index from January 2018 to January 2019 (with a one-month time window). The SSE 50 index consists of the 50 most representative stocks with large scale and good liquidity in the Shanghai stock market, selected according to scientific and objective methods so as to comprehensively reflect the overall situation of a group of leading enterprises with the most market influence in the Shanghai stock market; for a list, see Table 5 in the appendix. The results are shown in Table 6.

In recent years, researchers have found that sentiment words or sentiment indicators capture most media images and extract the core ideas of text information [6, 9, 54, 60, 61]. Therefore, sentiment analysis, as the classification of media information, has become the mainstream method in the academic world. Table 5 shows the performance of the econometric model before and after the interactive information sentiment variable is added. The interactive information sentiment variable has excellent statistical significance and passes the robustness test. Specifically, the DeeBERT-S3WD-based approach performs better in the econometric regression model, and the interactive information sentiment (S) variable is statistically significant. Therefore, the quantified sentiment indicators extracted through the DeeBERT-S3WD-based approaches are validated by the regression model, which confirms the effectiveness of the interactive information sentiment tendency indicators. Many previous studies have demonstrated the practical value of media sentiment indicators, which is consistent with the results of this study.

4.3.2. Effectiveness of Sentiment Classification Index in Machine Learning Models

We further design experiments with machine learning models to verify the effectiveness of the classified indicators of interactive information sentiment. Specifically, we add the fundamental variables (R, SMB, and HML) and the classified interactive information sentiment variable (S) to traditional machine learning models (Bayes and SVM) and deep learning models (CNN and LSTM) to predict the future direction of stock prices. Figure 3 shows the experimental results.

Figure 3(b) shows the results of the traditional machine learning models (Bayes and SVM) and the deep learning models (CNN and LSTM) with only the fundamental variables. The SVM model has a higher prediction ACC (81%) and F1 than the other models. As Figure 3(a) shows, after the classified interactive information sentiment variable is added, all the models perform better in both ACC and F1. The results show the following: (1) After the sentiment variable is added to the traditional machine learning and deep learning models, ACC and F1 improve significantly, which demonstrates the influential role of the sentiment variable. (2) Compared with the traditional machine learning models, the quantified interactive information sentiment variable has a more significant impact on the deep learning models. In short, the quantified sentiment variable enhances the predictive performance of all the machine learning models, and the enhancement is stronger in the deep learning models.
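
For reference, the two metrics used in this comparison, ACC and F1, can be computed as in the sketch below. The label vectors are made-up examples (1 for an upward move, 0 for a downward move), and the function name `accuracy_and_f1` is hypothetical.

```python
def accuracy_and_f1(y_true, y_pred, positive=1):
    """Return (accuracy, F1 of the positive class) for two equal-length label lists."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    accuracy = correct / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, f1


# Made-up predictions from the same model without and with the sentiment variable S.
y_true = [1, 1, 0, 0, 1]
pred_without_s = [1, 0, 0, 1, 1]
pred_with_s = [1, 1, 0, 1, 1]
print(accuracy_and_f1(y_true, pred_without_s))
print(accuracy_and_f1(y_true, pred_with_s))
```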

5. Conclusion

Information is essential to the development of the stock market. Investment decisions rely on the truthfulness, precision, timeliness, and accessibility of information. The emerging digital interactive platforms provide a channel of communication between investors and listed companies, which significantly impacts the securities market. In this study, we develop a classification and analysis framework called DeeBERT-S3WD for interactive information attitudes. The extensive experiments have shown the following: (i) The DeeBERT-S3WD outperforms other machine learning models in class-imbalanced datasets, short texts, and decision time. (ii) Compared to the proportion-based approaches in finance, the DeeBERT-S3WD performs better in terms of accuracy. (iii) The sentiment classification index from the DeeBERT-S3WD can significantly improve the effectiveness of econometric and machine learning models. Finally, the DeeBERT-S3WD has been implemented and deployed on the Stock++, a quantitative analysis system for securities market risk [16].

Appendix

A. BERT Pretraining Tasks

The BERT sets up two unsupervised target tasks to obtain word-level and sentence-level representations, respectively.

(i) Masked LM implements pretraining of bidirectional language models: Unlike other language models such as Word2Vec, which require the prediction of all words in the input sequence, Masked LM randomly selects 15% of the words in the input data for masking and predicts these words from their context, which avoids the influence of the following words on the current word. Thus, this is a true “two-way” operation. Among the 15% of masked words, 80% are replaced by the “[MASK]” symbol, 10% are replaced by randomly selected words from the corpus, and the remaining 10% are kept unchanged.

(ii) Next sentence prediction: This task determines whether two sentences (A, B) are related to each other, as a binary classification task. In 50% of the training pairs (A, B), B is the real next sentence and serves as the positive example; in the remaining 50%, B is randomly selected and serves as the negative example.

Neural networks such as LSTM and GRU have a serial problem due to their temporal structure, which greatly limits the training speed, while BERT adopts the Transformer encoder structure and uses self-attention to compute the relationships between words in parallel to solve this problem.
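
The 15% selection and the 80/10/10 replacement rule of Masked LM can be illustrated with a short sketch. It is a simplification under stated assumptions: each token is selected independently with probability 0.15 rather than sampling an exact 15% of positions, and the vocabulary, the sentence, and the function name `mask_tokens` are hypothetical.

```python
import random


def mask_tokens(tokens, vocab, rng, select_prob=0.15):
    """Return (masked_tokens, targets); targets maps position -> original token."""
    masked = list(tokens)
    targets = {}
    for i, tok in enumerate(tokens):
        if rng.random() >= select_prob:
            continue  # token not selected for prediction
        targets[i] = tok
        roll = rng.random()
        if roll < 0.8:
            masked[i] = "[MASK]"          # 80%: replace with the mask symbol
        elif roll < 0.9:
            masked[i] = rng.choice(vocab)  # 10%: replace with a random word
        # remaining 10%: keep the original token unchanged
    return masked, targets


rng = random.Random(0)  # fixed seed so the run is reproducible
vocab = ["profit", "loss", "dividend", "merger", "stock"]
sentence = "the company will announce its annual dividend plan next week".split()
masked, targets = mask_tokens(sentence, vocab, rng)
print(masked)
print(targets)
```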

B. Encoder Unit of BERT Model

BERT consists of a stack of multiple Transformer encoders: the document is represented as vectors, which are fed into the Transformer encoding units. The Transformer discards the recurrent network structure of the RNN and models a piece of text based entirely on the attention mechanism. The encoding unit is shown in Figure 4.

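To address the problem that the self-attention mechanism cannot extract timing features, the Transformer adopts a position embedding approach that defines sin and cos functions for the even and odd dimensions, respectively, to add timing information, as shown in the following equation:

$$PE_{(pos,\,2i)} = \sin\left(\frac{pos}{10000^{2i/d_{\mathrm{model}}}}\right), \qquad PE_{(pos,\,2i+1)} = \cos\left(\frac{pos}{10000^{2i/d_{\mathrm{model}}}}\right)$$

where $pos$ is the position of the current word in the sentence, $i$ indexes the dimension pairs of the embedding, and $d_{\mathrm{model}}$ is the embedding dimension. The BERT model takes the sum of the word vector, the text vector obtained by embedding, and the position vector obtained by positional encoding as the model input.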
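
The sinusoidal position embedding described above can be computed directly. The sketch below follows the standard Transformer formulation, with sin on even dimensions and cos on odd dimensions; the function name and the small sizes are chosen only for illustration.

```python
import math


def positional_encoding(num_positions, d_model):
    """Return a num_positions x d_model table of sinusoidal position vectors."""
    table = [[0.0] * d_model for _ in range(num_positions)]
    for pos in range(num_positions):
        for j in range(d_model):
            # Dimensions 2i and 2i+1 share the same frequency; j - j % 2 equals 2i.
            angle = pos / (10000 ** ((j - j % 2) / d_model))
            table[pos][j] = math.sin(angle) if j % 2 == 0 else math.cos(angle)
    return table


pe = positional_encoding(num_positions=3, d_model=4)
for row in pe:
    print([round(v, 4) for v in row])
```

Each row is added elementwise to the corresponding word vector before the first encoder layer.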

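Subsequently, the obtained vectors are fed into the main module of the Transformer encoding. The basic formula is shown in the following equation:

$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(\frac{QK^{T}}{\sqrt{d_k}}\right)V$$

where $Q$, $K$, and $V$ are all input word vector matrices and $d_k$ is the dimension of the key vectors.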

The fundamental concept of this method is to calculate the relationship between each word in a sentence and all the other words in the sentence; these word-to-word relationships reflect the relatedness and importance of the different words. They are used to adjust the weight of each word and obtain a new representation of it. This new representation encodes not only the word itself but also its relationships with the other words and is, therefore, a more global representation than a simple word vector.
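
The reweighting of word vectors described above corresponds to scaled dot-product attention, which can be sketched in plain Python as follows. The tiny matrices are made-up values and the function names are illustrative; a real implementation would use batched tensor operations.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention(Q, K, V):
    """softmax(Q K^T / sqrt(d_k)) V for row-major lists of vectors."""
    d_k = len(K[0])
    outputs = []
    for q in Q:
        # Similarity of this query with every key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in K]
        weights = softmax(scores)
        # Weighted sum of the value vectors gives the new word representation.
        outputs.append([sum(w * v[j] for w, v in zip(weights, V))
                        for j in range(len(V[0]))])
    return outputs


# Two words with 2-dimensional vectors (made-up values); Q = K = V for self-attention.
X = [[1.0, 0.0], [0.0, 1.0]]
print(attention(X, X, X))
```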

To avoid the limited information extracted by a single self-attention, to extend the ability of the model to focus on different locations, and to increase the representation subspaces of the attention units, the Transformer adopts the Multi-Head model, as shown in the following equation:

$$\mathrm{MultiHead}(Q, K, V) = \mathrm{Concat}(\mathrm{head}_1, \ldots, \mathrm{head}_h)\,W^{O}, \qquad \mathrm{head}_i = \mathrm{Attention}(QW_i^{Q}, KW_i^{K}, VW_i^{V})$$

Here, $W^{O}$ is the additional weight matrix applied after the vectors obtained from each head are merged; $Q$, $K$, and $V$ are projected and split equally across the heads.

In order to ensure efficient gradient transfer in deep networks, the Transformer adds a residual connection and layer normalization after merging the vectors. This effectively addresses the vanishing gradient problem in deep models, and it can break network symmetry, alleviate network degradation, accelerate convergence, and normalize the optimization space:

$$\mathrm{LN}(x) = \gamma \odot \frac{x - \mu}{\sigma} + \beta$$

where $\gamma$ and $\beta$ are the parameters to be learned and $\mu$ and $\sigma$ are the mean and standard deviation of the input layer. The Feed Forward Network (FFN) layer is then added for spatial transformation; the FFN contains two linear transformations with a ReLU activation function in between.
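
The residual connection, layer normalization, and feed-forward steps described above can be sketched for a single token vector as follows. The parameter names `gamma` and `beta`, the small `eps` added for numerical stability, and the toy weights are assumptions for illustration rather than values from the paper.

```python
import math


def layer_norm(x, gamma, beta, eps=1e-12):
    """Normalize x to zero mean and unit variance, then scale and shift."""
    mu = sum(x) / len(x)
    var = sum((v - mu) ** 2 for v in x) / len(x)
    sigma = math.sqrt(var + eps)
    return [g * (v - mu) / sigma + b for v, g, b in zip(x, gamma, beta)]


def add_and_norm(x, sublayer_out, gamma, beta):
    """Residual connection followed by layer normalization."""
    return layer_norm([a + b for a, b in zip(x, sublayer_out)], gamma, beta)


def ffn(x, W1, b1, W2, b2):
    """Two linear maps with a ReLU in between; each W row holds one output unit's weights."""
    hidden = [max(0.0, sum(xi * w for xi, w in zip(x, row)) + b)
              for row, b in zip(W1, b1)]
    return [sum(hi * w for hi, w in zip(hidden, row)) + b
            for row, b in zip(W2, b2)]


# Toy example with a 3-dimensional token vector and identity-like weights.
x = [1.0, 2.0, 3.0]
ones, zeros = [1.0, 1.0, 1.0], [0.0, 0.0, 0.0]
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
h = add_and_norm(x, [0.5, 0.5, 0.5], ones, zeros)
out = add_and_norm(h, ffn(h, identity, zeros, identity, zeros), ones, zeros)
print(h)
print(out)
```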

Finally, the output vector of the last two steps is combined with another residual operation and normalized to perform the operations in the BERT pretraining task.

C. Fine-Tuning of BERT-Based Single-Text Classification Tasks

For the text classification task, as shown in Figure 5, the BERT model inserts a [CLS] symbol in front of the text and uses the output vector corresponding to this symbol as the semantic representation of the whole text. A simple softmax classifier is added to the top of BERT to predict the probability of the specific label c:

$$p(c \mid h) = \mathrm{softmax}(Wh)$$

Here, $h$ is the output vector of the [CLS] symbol and $W$ is the task-specific parameter matrix. Once the label probabilities are obtained, there is a basis for the multigranularity sentiment classification of three-way decisions.
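
As a minimal sketch of how the label probabilities could feed a three-way decision, the code below applies a softmax classifier to a [CLS] vector and then assigns the sample to a positive, negative, or uncertain region. The threshold names `alpha` and `beta`, their values, and the toy weights are hypothetical; the paper derives its thresholds from cost considerations that are not reproduced here.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of logits."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def classify(h, W):
    """Label probabilities softmax(W h); each row of W belongs to one label."""
    logits = [sum(w * x for w, x in zip(row, h)) for row in W]
    return softmax(logits)


def three_way_region(p_pos, alpha, beta):
    """Accept as positive, reject as negative, or defer when in between."""
    if p_pos >= alpha:
        return "positive"
    if p_pos <= beta:
        return "negative"
    return "uncertain"


# Toy [CLS] vector and weights (made-up values); row 0 = negative, row 1 = positive.
h = [0.2, -0.4, 1.0]
W = [[0.5, 0.1, -0.3], [-0.2, 0.3, 0.8]]
probs = classify(h, W)
print(probs, three_way_region(probs[1], alpha=0.8, beta=0.2))
```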

Data Availability

The data used to support the findings of the study are available at https://drive.google.com/drive/folders/1Et1iNHLgmFIRoEXpJ9UIHNUgATagG6aj?usp=sharing.

Conflicts of Interest

The authors declare that they have no conflicts of interest regarding the publication of this paper.

Acknowledgments

This study was supported by the National Natural Science Foundation of China (NSFC) (72071160 and 62072379), Fundamental Research Funds for the Central Universities (kjcx20210103 and JBK2103016), Financial Intelligence and Financial Engineering Key Lab of Sichuan Province, Chengdu SWUFE Jiaozi Institute of Fintech Innovation Co., Ltd. (cgzh20210204), Research Program of Science and Technology at Universities of Inner Mongolia Autonomous Region (2021GG0164), and Financial Innovation Center of the Southwestern University of Finance and Economics, Joint Lab of Data Science and Business Intelligence at Southwestern University of Finance and Economics.