Abstract

Cross-border e-commerce platforms host many brands, and understanding consumers’ brand preferences helps promote the development of the cross-border e-commerce industry. This paper proposes a method for predicting the brand preference of cross-border e-commerce consumers based on potential tag mining. First, the collected brand comment data are preprocessed and a HowNet emotion dictionary is built, on the basis of which consumers’ emotional tendency towards each brand is calculated. Second, the projection pursuit regression model is optimized by a differential evolution algorithm to reduce the dimension of the resulting brand emotion data. Finally, potential labels are mined from the dimension-reduced data and, by combining the Bayesian personalized ranking method with the paired interaction tensor decomposition method, a brand preference prediction model is constructed to predict the brand preference of cross-border e-commerce consumers. The experimental results show that the proposed method calculates brand tendency with high accuracy, yields a small average absolute error in its predictions, and produces a highly accurate model.

1. Introduction

With the rapid development of Internet technology, cross-border e-commerce faces growing restrictions in the international market. Under fierce mutual competition, e-commerce businesses suffer from both product infringement and shrinking profits [1, 2]. International big brands occupy an important position in the mainstream international market, so domestic e-commerce businesses mainly focus on nonmainstream products. Consumers’ low satisfaction with and preference for domestic brands stems mainly from a series of problems in China’s cross-border e-commerce, including a low level of internationalization, a weak market position, and poor innovation ability [3]. To improve the position of China’s cross-border e-commerce in the international market, it is necessary to understand consumers’ brand preferences.

Li and Maomao [4] designed a consumer preference questionnaire to collect information on the current state of consumer brand preference, extracted brand features from four dimensions of perceived value (social, brand, service, and price) with a deep learning model, and built a consumer brand preference prediction model from the extracted features. However, this method cannot calculate consumer brand preference accurately, and its predictions have a large average absolute error. Sun et al. [5] collected online text and consumer keyword index data and screened them with a sliding time window. Combining several machine learning methods, they built a prediction model of the consumer information index, obtained the index for different brands, and thereby predicted consumer brand preference. The disadvantage of this method is its low accuracy.

To address the problems of the above methods, this paper first calculates the emotional tendency of brand preference and then predicts the brand preference of cross-border e-commerce consumers by mining potential tags.

2. Calculation of Brand Preference and Emotional Tendency

2.1. Comment Preprocessing

One of the main tasks in predicting the brand preference of cross-border e-commerce consumers is comment preprocessing. The specific process is shown in Figure 1.

The comment data are preprocessed according to the process in Figure 1 to ensure the accuracy of the brand preference and emotional tendency prediction. Brand comment data are first obtained from the Internet through web crawler technology. The comment data are then segmented through text annotation, word segmentation, and tagging, and divided into clauses of the smallest unit so that the integrity of their semantic expression is preserved.

Chinese writing differs considerably from English writing. In written Chinese, the words in a sentence are not separated by explicit delimiters such as spaces. The word is the smallest semantic unit in a text, which makes it highly significant for analyzing consumer sentiment towards brands. As the most basic step, word segmentation has been studied extensively in Chinese natural language processing [6, 7]. The function of a word in a sentence can be described by its part of speech, so part-of-speech tagging is an important step in comment preprocessing. Commonly used open-source tools for part-of-speech tagging and Chinese word segmentation are shown in Table 1.

In natural language processing, the word segmentation package of a tool library is usually used to segment the data. Different data are then weighted according to the stop words and the sentiment requirements to build the final comment database and emotion table. This completes the preprocessing of the comment data.
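The preprocessing steps described above can be sketched roughly as follows. This is a minimal illustration rather than the authors' pipeline: the stop-word list, the clause-boundary pattern, and the function names are all assumptions, and the whitespace tokenizer merely stands in for a Chinese word segmentation tool such as those listed in Table 1.

```python
import re

# Hypothetical stop-word list; a real pipeline would load a curated one.
STOP_WORDS = {"the", "a", "is", "and", "but", "it", "this"}

# Western and Chinese punctuation treated as clause boundaries (an assumption).
CLAUSE_BOUNDARY = re.compile(r"[,.;!?，。；！？]+")


def split_clauses(comment):
    """Split a comment into its smallest clauses, dropping empty pieces."""
    return [c.strip() for c in CLAUSE_BOUNDARY.split(comment) if c.strip()]


def tokenize(clause):
    """Whitespace tokenizer standing in for a Chinese word segmenter."""
    return [w.lower() for w in clause.split()]


def preprocess(comment):
    """Return one token list per clause, with stop words removed."""
    return [
        [w for w in tokenize(clause) if w not in STOP_WORDS]
        for clause in split_clauses(comment)
    ]


if __name__ == "__main__":
    print(preprocess("The screen is bright, but the battery is weak!"))
```

For Chinese comments, `tokenize` would be replaced by a call to a dedicated segmenter, while the clause splitting and stop-word filtering stay the same.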

2.2. Construction of Emotional Dictionary

In natural language processing, many countries and organizations have built knowledge bases, such as HowNet in China, Korean WordNet in Korea, and MindNet at Microsoft. HowNet holds a prominent position among Chinese emotional dictionaries. It is a semantic network based on world knowledge that records the relationships between concepts in considerable detail. The main relationships in HowNet are shown in Figure 2.

As Figure 2 shows, the concepts in HowNet are linked by many semantic relationships. Among them, the most important are the synonymy and antonymy relationships between words. Based on the relationships between different words, the attributes of the corresponding words can be decomposed.

Concepts in HowNet are decomposed to obtain multiple sememes. The similarity between words can be calculated from their distance in the sememe tree, as shown in formula (1), where the adjustable parameter, which represents the polarity value of this type of emotion, is greater than zero.

The results of formula (1) are prone to errors. Therefore, the proposed method calculates the similarity between words according to the correlation between the word and the negative reference word b1 and the positive reference word b2, that is, the semantic tendency value, as shown in formula (2), where n and m represent the numbers of positive and negative reference words, respectively. When the semantic tendency value of a word is positive, the word is a negative word; when it is negative, the word is a positive word.

Subject-specific words from the brand preference classification are added to the existing emotional polarity dictionary, and the proposed method describes the attributes of each word through a two-tuple W, as shown in formula (3), where D represents the emotional polarity of the word and takes the value 1 or −1. A value of −1 indicates a negative word and a value of 1 indicates a positive word. The nouns and adjectives of B and V are as shown in formula (4).

The emotion dictionary is constructed from the attributes described by these two-tuples and is used to judge the emotional bias in brand comments.
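The dictionary construction can be illustrated with a short sketch. It rests on several assumptions that are not confirmed by the text: the similarity takes the commonly used form alpha / (distance + alpha), the semantic tendency is the mean similarity to the negative reference words minus the mean similarity to the positive reference words (which matches the sign convention stated above), and the sememe-tree distances are supplied directly as numbers. All names and the value of `ALPHA` are hypothetical.

```python
ALPHA = 1.6  # adjustable parameter, must be > 0; this value is an assumption


def sememe_similarity(distance, alpha=ALPHA):
    """Similarity in (0, 1] that shrinks as the sememe-tree distance grows."""
    return alpha / (distance + alpha)


def semantic_tendency(dists_to_negative, dists_to_positive, alpha=ALPHA):
    """Mean similarity to negative references minus mean similarity to positive ones."""
    neg = sum(sememe_similarity(d, alpha) for d in dists_to_negative) / len(dists_to_negative)
    pos = sum(sememe_similarity(d, alpha) for d in dists_to_positive) / len(dists_to_positive)
    return neg - pos


def dictionary_entry(word, dists_to_negative, dists_to_positive):
    """Return the (word, polarity) pair; polarity is -1 (negative) or 1 (positive)."""
    tendency = semantic_tendency(dists_to_negative, dists_to_positive)
    polarity = -1 if tendency > 0 else 1  # a tie falls to 1 in this sketch
    return (word, polarity)


if __name__ == "__main__":
    # A word close to the negative references and far from the positive ones.
    print(dictionary_entry("poor", [1, 2], [6, 8]))
    # A word close to the positive references and far from the negative ones.
    print(dictionary_entry("great", [6, 8], [1, 2]))
```

In practice the distances would come from a HowNet lookup; they are passed in here so that the sketch stays self-contained.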

2.3. Calculation of Brand Emotional Tendency

The proposed method uses the vector space model [8, 9] to turn text processing into vector operations in a vector space, and semantic similarity is measured by vector similarity.

In text processing, the vector space model represents each document by a multidimensional feature vector. When the data set contains m documents, a multidimensional vector space is used to represent the whole data set, and each text is abstracted into a vector. The proposed method obtains text features through a linear combination feature algorithm.

2.3.1. Word Frequency Method

The importance of a feature can be measured by its word frequency [10, 11]. After word segmentation, the frequency G(t) of every word is counted, entries with G(t) equal to 1 are deleted, and the remaining entries are sorted by word frequency in descending order. The importance of a word is directly proportional to its word frequency.
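The word frequency filter can be sketched in a few lines. The function name is hypothetical, and ties in frequency are broken alphabetically only to make the ordering deterministic.

```python
from collections import Counter


def frequent_terms(tokens):
    """Count G(t) per word, drop words with G(t) == 1, sort by descending frequency."""
    counts = Counter(tokens)
    kept = [(term, count) for term, count in counts.items() if count > 1]
    # Sort by frequency (largest first); alphabetical order breaks ties.
    return sorted(kept, key=lambda pair: (-pair[1], pair[0]))


if __name__ == "__main__":
    tokens = ["good", "good", "cheap", "fast", "fast", "fast"]
    print(frequent_terms(tokens))
```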

2.3.2. Mutual Information Method

Mutual information describes the amount of information generated by the mutual influence of two events that occur at the same time [12, 13]. At the finest granularity, emotional analysis starts with words that carry emotional color, dividing them into positive and negative words, because the main function of the emotional dictionary is to judge, after word segmentation of a brand comment, whether its words appear in the constructed dictionary. The dictionary of emotional polarity is expressed in formula (5), where the set NF is composed of negative words, the set PF is composed of positive words, and the set F is composed of words with frequency higher than one. The candidate word set J is obtained through the above process, as shown in formula (6).

The set J contains n words. The proposed method selects features through the mutual information method [14, 15] and obtains the mutual information between category C and entry y as shown in formula (7):

$$I(y, C) = \log \frac{P(y \mid C)}{P(y)} \tag{7}$$

where P(y) represents the proportion of documents containing entry y among all documents and P(y | C) represents the probability that entry y appears in the data of category C. The calculation results are sorted in descending order. For feature selection, the average value O(y) corresponding to the entry is calculated by formula (8):

$$O(y) = \sum_{i=1}^{k} P(C_i)\, I(y, C_i) \tag{8}$$

where k represents the number of categories and P(C_i) represents the probability corresponding to category C_i.
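As a rough illustration of the mutual information step, the sketch below estimates the quantities from document counts. It assumes the usual feature-selection form log(P(y | C) / P(y)) and a category-weighted average for O(y). The add-one smoothing, the data layout (a list of `(set_of_terms, category)` pairs), and the function names are assumptions introduced here so that the logarithm is always defined.

```python
import math


def mutual_information(docs, term, category):
    """Estimate log(P(term | category) / P(term)) with add-one smoothing."""
    n_docs = len(docs)
    n_cat = sum(1 for _, cat in docs if cat == category)
    df = sum(1 for terms, _ in docs if term in terms)
    df_cat = sum(1 for terms, cat in docs if cat == category and term in terms)
    p_term = (df + 1) / (n_docs + 2)
    p_term_given_cat = (df_cat + 1) / (n_cat + 2)
    return math.log(p_term_given_cat / p_term)


def average_mutual_information(docs, term):
    """Average the per-category mutual information, weighted by P(category)."""
    n_docs = len(docs)
    categories = sorted({cat for _, cat in docs})
    total = 0.0
    for cat in categories:
        p_cat = sum(1 for _, c in docs if c == cat) / n_docs
        total += p_cat * mutual_information(docs, term, cat)
    return total


if __name__ == "__main__":
    docs = [
        ({"good", "fast"}, "pos"),
        ({"good"}, "pos"),
        ({"bad"}, "neg"),
        ({"bad", "slow"}, "neg"),
    ]
    print(mutual_information(docs, "good", "pos"))
    print(average_mutual_information(docs, "good"))
```

A term that is concentrated in one category receives a positive score for that category and a negative score for the others, which is what makes the per-category ranking informative.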

2.3.3. Linear Combination Feature Selection

The proposed method combines word frequency and mutual information to calculate the feature score of consumer brand emotion, as shown in formula (9), which involves a regulating factor and Freq(yi), the word frequency corresponding to entry yi. The brand emotion tendency is then obtained from the brand emotion features extracted in the above process.
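One plausible reading of the linear combination is a weighted sum of the two signals. The sketch below assumes the form lam * Freq + (1 - lam) * mutual information, with both inputs min-max normalized so that they are on a comparable scale. The exact form of formula (9), the normalization, and the default value of `lam` are assumptions, not details taken from the text.

```python
def min_max(values):
    """Rescale values to [0, 1]; a constant list maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def feature_scores(freq, avg_mi, lam=0.5):
    """Combine word frequency and mutual information with regulating factor lam."""
    terms = sorted(freq)
    f_norm = min_max([freq[t] for t in terms])
    m_norm = min_max([avg_mi[t] for t in terms])
    return {t: lam * f + (1 - lam) * m for t, f, m in zip(terms, f_norm, m_norm)}


if __name__ == "__main__":
    freq = {"good": 5, "fast": 3, "ok": 1}
    avg_mi = {"good": 0.2, "fast": 0.6, "ok": 0.0}
    print(feature_scores(freq, avg_mi))
```

Setting `lam` to 1 reduces the score to pure word frequency, and setting it to 0 reduces it to pure mutual information.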

3. Cross-Border E-Commerce Consumer Brand Preference Prediction Regression Algorithm

3.1. Projection Pursuit Regression

Projection pursuit draws on computer science, modern statistics, and applied mathematics. It is an advanced technique for processing and analyzing nonlinear, nonnormal, and high-dimensional data [16, 17]. The proposed method constructs a projection pursuit regression model to process the brand emotional tendency data obtained above and thereby predict consumer brand preference.

3.2. Differential Evolution Algorithm

Compared with evolution strategies, evolutionary programming, and genetic algorithms, the differential evolution algorithm [18, 19] has many advantages. The algorithm derives its mutation operator from the difference vector between parent individuals, crosses the mutant individuals with the parent individuals according to a set probability to generate new individuals (the test individuals), and selects the better of each test individual and its parent according to fitness. The proposed method adopts the differential evolution algorithm to optimize the brand preference prediction method and complete the dimensionality reduction of the brand emotional tendency data. The process is as follows.

Because the brand emotional tendency data obtained by projection pursuit regression have a certain pertinence, scale, and representativeness, this paper uses the differential evolution algorithm to reduce their dimension and obtain the optimal data that meet the requirements. The population is initialized in the feasible solution space, and a new population is generated by applying the mutation and crossover operations to the current population. A greedy selection operation then compares the two populations one-to-one to obtain the next generation. Specifically, at each time t the mutation operation is applied to individual x_i(t) to obtain the corresponding mutant individual v_i(t), as shown in formula (10):

$$v_i(t) = x_{r_1}(t) + F\,\bigl(x_{r_2}(t) - x_{r_3}(t)\bigr) \tag{10}$$

where x_{r_1}(t) is the parent basis vector, x_{r_2}(t) − x_{r_3}(t) is the parent difference vector, and F is the scaling factor. Then x_i(t) and v_i(t) are processed by the crossover operation to obtain the test individual u_i(t). The objective functions of individuals x_i(t) and u_i(t) are evaluated, and the individual with the lower function value is selected for the new population, as shown in formula (11):

$$x_i(t+1) = \begin{cases} u_i(t), & f(u_i(t)) \le f(x_i(t)) \\ x_i(t), & \text{otherwise} \end{cases} \tag{11}$$

where f is the objective function. To reduce the complexity of the algorithm, let the scale of the feasible set be N1, the scale of the infeasible group be N2, and the maximum scale be N3. The total complexity of the diversity group is M = (N1, N3). The complexity of the differential evolution algorithm after one iteration can then be expressed as formula (12).

If the search scope is N = N1 + N2 + N3, it indicates that the data dimension of brand emotional tendency is effectively reduced.
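The mutation, crossover, and greedy selection steps described in this section correspond to the classic DE/rand/1/bin scheme, sketched below in plain Python. This is a generic illustration under assumed settings (population size, scaling factor, crossover probability, and bound clipping are all choices made here), and the sphere function in the demo merely stands in for the model-fitting objective, which the text does not spell out.

```python
import random


def differential_evolution(objective, bounds, pop_size=20, scale=0.5,
                           crossover=0.9, generations=200, seed=0):
    """Minimize `objective` over box `bounds` with DE/rand/1/bin."""
    rng = random.Random(seed)
    dim = len(bounds)
    # Initialize the population inside the feasible box.
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    fitness = [objective(x) for x in pop]
    for _ in range(generations):
        new_pop, new_fit = list(pop), list(fitness)
        for i in range(pop_size):
            # Mutation: base vector plus scaled difference of two other parents.
            r1, r2, r3 = rng.sample([j for j in range(pop_size) if j != i], 3)
            mutant = [pop[r1][d] + scale * (pop[r2][d] - pop[r3][d])
                      for d in range(dim)]
            mutant = [min(max(v, lo), hi) for v, (lo, hi) in zip(mutant, bounds)]
            # Binomial crossover: at least one coordinate comes from the mutant.
            j_rand = rng.randrange(dim)
            trial = [mutant[d] if (rng.random() < crossover or d == j_rand)
                     else pop[i][d] for d in range(dim)]
            # Greedy one-to-one selection between parent and trial individual.
            f_trial = objective(trial)
            if f_trial <= fitness[i]:
                new_pop[i], new_fit[i] = trial, f_trial
        pop, fitness = new_pop, new_fit
    best = min(range(pop_size), key=fitness.__getitem__)
    return pop[best], fitness[best]


if __name__ == "__main__":
    sphere = lambda x: sum(v * v for v in x)
    solution, value = differential_evolution(sphere, [(-5.0, 5.0)] * 3)
    print(solution, value)
```

In the setting of this paper, `objective` would be replaced by the fitting error of the projection pursuit regression model for a candidate projection direction.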

3.3. Regression Algorithm of Brand Preference Prediction Based on Potential Tag Mining

Based on potential tag mining, the proposed method combines the Bayesian personalized ranking method (BPR) and the paired interaction tensor decomposition method (PITF) to build a brand preference prediction model.

Adding tags is a form of user information release. It often reflects the user’s understanding of the content characteristics and themes of the brand being browsed, or expresses the content themes the user is interested in, and therefore indicates clearly that the user is interested in the themes related to the brand. According to the contextual relationship between brands and users, the proposed method mines high-value label samples to obtain the following labels:
(1) Negative class label
(2) Potential label
(3) Positive label

On the basis of the above labels, word frequency statistics are computed for the brand preference words of cross-border e-commerce consumers. The words are arranged in descending order of frequency, the first 100 words are selected, and a word cloud is drawn with the word cloud module, as shown in Figure 3.

Based on the above labels, the brand preference of cross-border e-commerce consumers can be predicted. The Bayesian personalized ranking method (BPR) obtains the correct ranking of brands based on Bayesian theory [20], as shown in formula (13), where a(Γ) represents the prior probability function, Γ represents the parameters of the interaction tensor decomposition model, and the remaining factor is the likelihood function. Assuming that all preferences are independent of each other, the maximum a posteriori probability can be described by formula (14), where yA represents the observed label and D represents the set of triples <user, brand i, label yA>, whose expression is given in formula (15). The probability of the partial order relationship is described through the sigmoid function, as shown in formula (16):

$$\sigma(G_A - G_B) = \frac{1}{1 + e^{-(G_A - G_B)}} \tag{16}$$

In the formula, GA and GB represent the prediction score, and the parameters to be obtained can be calculated by the following formula:

According to the relationship among users, brands, and labels, the proposed method models preference through the paired interaction tensor decomposition method to obtain the preference value for label yA, as shown in formula (18).

Here, Ui, Vi, and the other factors in formula (18) all represent implicit factor matrices.
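To make the interplay of BPR and PITF concrete, the sketch below follows the commonly published form of the two techniques: the score of a label is the sum of a user-label inner product and a brand-label inner product, and one stochastic gradient step raises the log-sigmoid of the score difference between a preferred and a non-preferred label. The class name, factor dimension, learning rate, and regularization weight are assumptions, and this is not the authors' exact model.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class PITF:
    """Pairwise interaction factors: user-label plus brand-label inner products."""

    def __init__(self, n_users, n_brands, n_tags, dim=4, seed=0):
        rng = random.Random(seed)

        def init(rows):
            return [[rng.gauss(0.0, 0.1) for _ in range(dim)] for _ in range(rows)]

        self.U, self.B = init(n_users), init(n_brands)
        self.TU, self.TB = init(n_tags), init(n_tags)

    def score(self, u, b, t):
        """Predicted preference of user u for label t on brand b."""
        return (sum(x * y for x, y in zip(self.U[u], self.TU[t]))
                + sum(x * y for x, y in zip(self.B[b], self.TB[t])))

    def bpr_step(self, u, b, t_pos, t_neg, lr=0.05, reg=0.001):
        """One gradient-ascent step on ln sigmoid(score(t_pos) - score(t_neg))."""
        diff = self.score(u, b, t_pos) - self.score(u, b, t_neg)
        g = 1.0 - sigmoid(diff)  # derivative of ln sigmoid(x) with respect to x
        for f in range(len(self.U[u])):
            uf, bf = self.U[u][f], self.B[b][f]
            tup, tun = self.TU[t_pos][f], self.TU[t_neg][f]
            tbp, tbn = self.TB[t_pos][f], self.TB[t_neg][f]
            self.U[u][f] += lr * (g * (tup - tun) - reg * uf)
            self.B[b][f] += lr * (g * (tbp - tbn) - reg * bf)
            self.TU[t_pos][f] += lr * (g * uf - reg * tup)
            self.TU[t_neg][f] += lr * (-g * uf - reg * tun)
            self.TB[t_pos][f] += lr * (g * bf - reg * tbp)
            self.TB[t_neg][f] += lr * (-g * bf - reg * tbn)


if __name__ == "__main__":
    model = PITF(n_users=2, n_brands=2, n_tags=3)
    for _ in range(300):
        model.bpr_step(u=0, b=1, t_pos=2, t_neg=0)
    print(sigmoid(model.score(0, 1, 2) - model.score(0, 1, 0)))
```

Training on a real data set would sample many (user, brand, preferred label, non-preferred label) combinations instead of repeating a single one as the demo does.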

Based on the above calculation results, build a prediction model of cross-border e-commerce consumer brand preference, as shown in formula (19):

According to the solution results of the prediction model, the brand preference prediction of cross-border e-commerce consumers based on potential tag mining is completed.

4. Experimental Analysis

To verify the effectiveness of the proposed brand preference prediction method based on potential tag mining, the proposed method, the method in [4], and the method in [5] are each used to predict the brand preference of cross-border e-commerce consumers.

4.1. Data Analysis

This paper selects the shopping comment information of 200 consumers on Amazon’s cross-border e-commerce platform as the data source and uses Python web crawler technology to capture the comment data of different brands. The 200 consumers, including college students and company employees, have a certain understanding of cross-border e-commerce brands. The specific situation of 200 users is shown in Tables 2 and 3.

4.2. Analysis of Dimensionality Reduction Effect of Brand Emotional Tendency Data

The dimensionality reduction effect on the brand emotional tendency data affects the response speed of the final prediction model. Therefore, the response time is used as the index for evaluating the dimensionality reduction that the differential evolution algorithm performs on the brand emotional tendency data. The test results are shown in Figure 4.

Figure 4 shows that the response time of the proposed method is always below 60 ms, so brand emotional tendency is predicted quickly. The main reason is that, after constructing the brand emotional tendency data analysis model based on projection pursuit regression, the method uses the differential evolution algorithm to reduce the dimension of the data. Multiple iterations reduce the complexity and dimension of the data and improve the prediction response speed.

4.3. Calculation of Brand Emotional Tendency

The proposed method, the method in [4], and the method in [5] are used to calculate the brand emotion tendency of the above consumers, and the calculation accuracy of the different methods is compared using formula (20):

$$\text{Accuracy} = \frac{N_Z}{N_{all}} \tag{20}$$

where NZ represents the number of correctly calculated samples and Nall represents the total number of samples. According to formula (20), the calculation results of consumers’ emotional tendency under the proposed method, the method in [4], and the method in [5] are obtained, as shown in Table 4.

To better display the emotional tendency results for male and female consumers, Table 4 is plotted as Figure 5, which shows the differences clearly.

Figure 5 compares the calculated consumer emotional tendency across methods. It shows that all three methods calculate the brand emotional tendency of male consumers less accurately than that of female consumers. The main reason may be that male consumers publish fewer shopping comments, so the methods obtain relatively less brand emotional tendency data for them. In addition, male consumers may be more rational than female consumers in their comments, which makes their emotional tendencies harder to distinguish and lowers the accuracy. The accuracy of the proposed method is 89% for male consumers and 94% for female consumers, while the accuracy of the method in [4] and the method in [5] is 70% and 70%, 80% and 69%, respectively. Overall, for both male and female consumers, the proposed method calculates brand emotion tendency better than the methods in [4] and [5], because it constructs a HowNet emotion dictionary for calculating consumers’ brand emotion tendency, which improves the accuracy of the calculation results.

Figures 6 and 7 likewise show that, for both male and female consumers, the brand emotion tendency calculated by the proposed method is better than that of the methods in [4] and [5], because the HowNet emotion dictionary constructed by the proposed method improves the accuracy of the calculation results.

4.4. Prediction Results

The average absolute error (MAE) is used as the index for testing the brand preference prediction results of the proposed method, the method in [4], and the method in [5], as shown in formula (21):

$$\mathrm{MAE} = \frac{1}{k} \sum_{i=1}^{k} \left| a_i - t_i \right| \tag{21}$$

where ai represents the prediction result, ti represents the actual result, and k represents the number of brands.
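The two evaluation indices used in the experiments, the accuracy of formula (20) and the average absolute error of formula (21), can be computed as in the following sketch. The function names and the sample values are illustrative only and are not taken from the experimental data.

```python
def accuracy(n_correct, n_total):
    """Share of correctly calculated samples among all samples."""
    return n_correct / n_total


def mean_absolute_error(predicted, actual):
    """Average of |a_i - t_i| over the k brands."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    return sum(abs(a - t) for a, t in zip(predicted, actual)) / len(predicted)


if __name__ == "__main__":
    print(accuracy(47, 50))
    print(mean_absolute_error([0.2, 0.5, 0.9], [0.1, 0.7, 0.9]))
```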

As can be seen from Figure 8, the proposed method achieves a low average absolute error when predicting brand preference for cross-border e-commerce consumers of different ages, with a highest error of 0.2%, which is lower than the 0.6% of the method in [4] and the 0.5% of the method in [5]. The main reason is that, before prediction, the proposed method obtains more accurate input data from the consumer comment data of the cross-border e-commerce platform through preprocessing and dimensionality reduction, which reduces the average absolute error of prediction and verifies the effectiveness of the method. Figure 8 also shows that the average absolute error of all three methods is relatively high for the 21 ∼ 30 age group. The main reason may be that consumers in this group, besides choosing brands popular in their own age group, are also likely to buy goods for their parents. This group also includes young couples with children, so the brands they are concerned with cover a wide range, resulting in a relatively high average absolute error of prediction.

4.5. Model Evaluation

The ROC curve is used to test the accuracy of the models constructed by the proposed method, the method in [4], and the method in [5]. The farther the ROC curve lies from the pure opportunity line, the stronger the discrimination of the model. The test results are shown in Figure 9.

The abscissa in Figure 9 is the false positive rate and the ordinate is the true positive rate. The ROC curve of the model constructed by the proposed method lies farther from the pure opportunity line, indicating that the accuracy of this model is higher. The main reason is that the proposed method optimizes the model input and improves the prediction accuracy of the model by mining the potential labels of brand preference of cross-border e-commerce consumers.
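For readers who want to reproduce this kind of evaluation, the sketch below shows how the points of an ROC curve and the area under it can be computed from binary labels and model scores. The labels and scores in the demo are made up, and the function names are hypothetical. A curve that lies farther above the pure opportunity line has a larger area, with 0.5 corresponding to the line itself.

```python
def roc_points(labels, scores):
    """Return (false positive rate, true positive rate) pairs for every threshold."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


if __name__ == "__main__":
    labels = [1, 1, 0, 1, 0, 0]
    scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
    curve = roc_points(labels, scores)
    print(curve)
    print(auc(curve))
```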

5. Conclusions and Prospects

To address the low accuracy of brand tendency calculation, the large average absolute error of prediction results, and the low model accuracy caused by the large number of cross-border e-commerce brands, a cross-border e-commerce consumer brand preference prediction method based on potential tag mining is proposed. Based on the calculated emotional tendency towards cross-border e-commerce brands, this paper uses the differential evolution algorithm to optimize the projection pursuit regression model, reduces the dimension of the obtained consumer brand emotion data, and predicts brand preference with the Bayesian method. The experimental results show that the method calculates the brand tendency of consumers of different genders and ages with high accuracy and improves the accuracy of the prediction model, so it can provide a data reference for the development of the cross-border e-commerce industry. However, the prediction efficiency of the proposed method has not been studied sufficiently, and improving it will be the direction of future work.

Data Availability

The datasets generated for this study are included within the article.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This work was supported by the Key Research Project of Humanities and Social Sciences in Colleges and Universities of Anhui Province: Research on the Strategies of Transforming Cross-Border E-Commerce for Anhui Traditional Foreign Trade Enterprises from the Perspective of the “Belt and Road” Initiative (Project no. SK2018A0915), and Major Research Project of Humanities and Social Sciences in Colleges and Universities of Anhui Province: Research on the Policy and Path Of Low Carbon Economy Development in Anhui Province (Project no. SK2018ZD060).