#### Abstract

Traditional prediction models for consumption behavior suffer from low accuracy and unsatisfactory results. Building on the deep neural network (DNN) model, this paper proposes a consumption behavior prediction model based on an improved model, rDNN. The DNN model is improved in two ways: an appropriate function is chosen as the activation function, and negative samples of consumer behavior data are selected by random sampling to determine the ratio of negative to positive samples (the N/P ratio). A consumer behavior prediction model is then constructed on the improved model. The results show that when the relu function is used as the activation function and the N/P ratio is 3, the rDNN model achieves its best prediction of consumption behavior, with an AUC value of 0.8422 and the shortest training time of 434.36 s. Compared with traditional prediction models and with the DNN and KmDNN deep learning models, the proposed model gives more reliable predictions and can be used to predict actual consumption behavior.

#### 1. Related Work

Consumption behavior reflects consumers’ spending characteristics, individual preferences, and underlying patterns. Analyzing consumption behavior helps businesses understand consumers’ real demand and market demand, so that they can recommend commodities accurately and increase the purchase rate. Many factors affect consumer behavior, including the consumer’s economic situation, consumer groups, and commodity value. How to screen effective information out of these massive data to predict consumer behavior is the main problem to be solved at present. The analysis and prediction of consumption behavior have moved beyond the stage of qualitative analysis, and machine learning models are now used to study the data, which effectively improves prediction efficiency. For example, Guo et al. proposed analyzing and predicting consumer behavior with a regression model, using individual consumption interests and habits as the basic data for the whole prediction model; the results indicated that the regression model is feasible [1]. Zhao et al. mainly analyzed the characteristics of consumer behavior, which provided a reference for the feature input of machine learning [2]. Xiao and Tong proposed clustering the data before performing regression prediction to improve efficiency; the results showed that the prediction accuracy of the method was over 98% [3]. Chung et al. analyzed data from 252 real EV charging users and then applied ensemble learning and machine learning for behavior prediction, which improved the accuracy of charging behavior prediction [4]. In addition to behavioral prediction, machine learning algorithms are also used for prediction in other fields. For example, Parhizkar et al. [5] and Shapi Mel Keytingan et al. [6] applied principal component analysis to the prediction of energy consumption, and Ren et al. and Yang et al. applied machine learning algorithms to the prediction of environmental energy consumption and glider energy consumption [7, 8]. These studies provide further references for the application of machine learning.

However, with the development of e-commerce, the above methods can no longer meet the actual demand of consumer behavior prediction. To predict consumption behavior better, this paper learns consumption behavior from different perspectives and realizes the prediction through deep learning. Given the large amount of consumption behavior data and the need to explore the deep features of the data, the deep neural network (DNN) model is used as the basic model, and the rDNN model is constructed to predict consumption behavior by selecting an appropriate proportion of positive and negative samples and an appropriate activation function.

#### 2. Basic Method

##### 2.1. Introduction of the DNN

The DNN is essentially a feedforward neural network with multiple hidden layers. As the number of hidden layers increases, the features learned by the model become richer and the prediction effect of the model improves. Compared with a shallow neural network, the DNN has stronger learning ability; its structure is shown in Figure 1 [9].

Training the DNN model includes forward propagation and backward propagation. Assume that the DNN model contains two hidden layers and uses the tanh function as the activation function, and let the input layer be *x*, where *n* represents the word vector dimension and *k* represents the number of data categories. In forward propagation, the first hidden layer contains *n*_{1} neurons, and the input and output of this layer can be expressed by expressions (1) and (2) [10, 11]:

The second hidden layer contains *n*_{2} neurons, and the input and output of this layer can be expressed by expressions (3) and (4):

The output layer of the model is [12]

In expressions (1)–(5), *W*_{1}, *W*_{2}, and *W*_{3} represent the weight matrices of hidden layer 1, hidden layer 2, and the output layer, respectively, and *b*_{1}, *b*_{2}, and *b*_{3} represent the threshold matrices of hidden layer 1, hidden layer 2, and the output layer, respectively.
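The displayed expressions for forward propagation are missing from this text. A standard form consistent with the description above is given here as a reconstruction under assumed notation, not as the original expressions (1)–(5). It assumes an input vector *x*, hidden-layer inputs *a*_{1} and *a*_{2}, hidden-layer outputs *h*_{1} and *h*_{2}, an output vector *y*, and weights *W*_{j} and thresholds *b*_{j} for layer *j*:

$$
\begin{aligned}
a_1 &= W_1 x + b_1, \\
h_1 &= \tanh(a_1), \\
a_2 &= W_2 h_1 + b_2, \\
h_2 &= \tanh(a_2), \\
y &= W_3 h_2 + b_3.
\end{aligned}
$$

Under these assumptions, *W*_{1} has *n*_{1} rows, *W*_{2} has *n*_{2} rows and *n*_{1} columns, and *W*_{3} has *k* rows and *n*_{2} columns.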

Using *θ* to denote all the parameters in the DNN, we have

Regard the output as a function of *θ*, and let *y*_{i} be the log probability of sample *i* at the output layer; after normalization, it can be expressed as [13]

In this expression, *c*_{i} indicates the category of training sample *i*.
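As an illustration only, the forward pass and the normalization step described above can be sketched in NumPy. The function and variable names (`init_params`, `forward`, `W1`, `b1`, and so on), the layer sizes, and the random initialization are assumptions made for this sketch and are not taken from the original experiments.

```python
import numpy as np


def init_params(n, n1, n2, k, seed=0):
    """Randomly initialise weights and thresholds (illustrative values only)."""
    rng = np.random.default_rng(seed)
    W1, b1 = rng.normal(0.0, 0.1, (n1, n)), np.zeros(n1)
    W2, b2 = rng.normal(0.0, 0.1, (n2, n1)), np.zeros(n2)
    W3, b3 = rng.normal(0.0, 0.1, (k, n2)), np.zeros(k)
    return W1, b1, W2, b2, W3, b3


def forward(x, params):
    """Two tanh hidden layers followed by a softmax-normalised output."""
    W1, b1, W2, b2, W3, b3 = params
    h1 = np.tanh(W1 @ x + b1)   # hidden layer 1
    h2 = np.tanh(W2 @ h1 + b2)  # hidden layer 2
    y = W3 @ h2 + b3            # unnormalised log probabilities
    y = y - y.max()             # shift for numerical stability; softmax is unchanged
    return np.exp(y) / np.exp(y).sum()


# Hypothetical sizes: 8 input features, two hidden layers of 16 neurons, 2 categories.
params = init_params(n=8, n1=16, n2=16, k=2)
probs = forward(np.ones(8), params)
```

The returned vector holds one normalized probability per category, so its entries are positive and sum to 1.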

In backward propagation, a set of labeled samples is used, where *N* is the number of training samples and (*x*_{i}, *c*_{i}) represents training sample *i*. The parameter *θ* is found by maximizing the log-likelihood of the training set with a regularization term, so the likelihood function is given by the following expression [14]:

The stochastic gradient ascent method is used to learn *θ*, and the backward propagation algorithm is used to update the parameters iteratively until the preset accuracy is reached. The iteration formula is shown in the following expression:

In this expression, *α* is the learning rate.
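The displayed expressions for the parameter set, the normalization, the likelihood function, and the update rule are missing from this text. A standard form consistent with the description above is given here as a reconstruction under assumed notation, not as the original expressions (6)–(9). Here *y*_{i,j} is the *j*-th output for sample *i*, *c*_{i} is its category, *k* is the number of categories, *N* is the number of training samples, *λ* is an assumed regularization coefficient, and *α* is the learning rate. The squared L2 penalty is one common choice of regularization term and is an assumption of this sketch.

$$
\begin{aligned}
\theta &= \{W_1, b_1, W_2, b_2, W_3, b_3\}, \\
p(c_i \mid x_i, \theta) &= \frac{\exp(y_{i,c_i})}{\sum_{j=1}^{k} \exp(y_{i,j})}, \\
J(\theta) &= \sum_{i=1}^{N} \log p(c_i \mid x_i, \theta) - \lambda \lVert \theta \rVert_2^2, \\
\theta &\leftarrow \theta + \alpha \, \frac{\partial J(\theta)}{\partial \theta}.
\end{aligned}
$$

In the stochastic variant, the gradient in the last line is estimated from a single sample or a small batch at each iteration.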

The DNN model has good feature learning ability, but its many parameters tend to make training slow [15, 16]. To solve this problem, the DNN model is improved in this paper. According to the above analysis, the parameter quantity is closely related to the selection of the activation function in the DNN model, so the parameter quantity is reduced by selecting a suitable activation function. In addition, the imbalance between positive and negative samples has some influence on the performance of the model. To further improve performance, the DNN model is also improved by choosing an appropriate ratio of negative to positive samples, which is recorded as the N/P ratio.

##### 2.2. Improvement of the DNN Model

###### 2.2.1. Selection of Activation Function

The most frequently used activation functions are sigmoid, tanh, and relu. The mathematical expression of the sigmoid function is shown in expression (10) [17], and that of the tanh function in expression (11). After simplification, the tanh function can be written as expression (12) [18]. The mathematical expression of the relu function is shown in expression (13). Compared with the sigmoid and tanh functions, relu has an advantage in mitigating the vanishing gradient problem. Therefore, the relu function is chosen as the activation function.
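The three activation functions have well-known closed forms, which the following NumPy sketch writes out. The function names are illustrative. The text does not show which simplified form of tanh the original expression (12) used, so the sigmoid-based identity below is one common candidate and is an assumption.

```python
import numpy as np


def sigmoid(x):
    """sigmoid(x) = 1 / (1 + exp(-x))"""
    return 1.0 / (1.0 + np.exp(-x))


def tanh_exp(x):
    """tanh(x) = (exp(x) - exp(-x)) / (exp(x) + exp(-x))"""
    return (np.exp(x) - np.exp(-x)) / (np.exp(x) + np.exp(-x))


def tanh_via_sigmoid(x):
    """Equivalent simplified form: tanh(x) = 2 * sigmoid(2x) - 1"""
    return 2.0 * sigmoid(2.0 * x) - 1.0


def relu(x):
    """relu(x) = max(0, x)"""
    return np.maximum(0.0, x)


xs = np.linspace(-3.0, 3.0, 13)
```

The derivative of relu is 1 for every positive input, whereas the derivative of sigmoid never exceeds 0.25 and that of tanh approaches 0 for large inputs, which is why relu is less prone to vanishing gradients in deep stacks.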

###### 2.2.2. N/P Ratio Selection

An appropriate N/P ratio avoids overly uniform data features and improves the generalization ability of the model. Before the N/P ratio is selected, the positive and negative samples must first be balanced. Following the literature [19, 20], this paper balances the sample data by random sampling of negative samples: a subsample in a given proportion to the positive samples is drawn at random from the negative samples, as shown in Figure 2. In this figure, white represents positive samples and black represents negative samples.
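A minimal sketch of this random negative sampling follows. It assumes labels encoded as 1 (positive) and 0 (negative). The function name `undersample_negatives`, its parameters, and the toy data are hypothetical and serve only to illustrate the idea.

```python
import numpy as np


def undersample_negatives(X, y, np_ratio, seed=0):
    """Keep every positive sample and a random subset of the negatives so that
    (number of negatives) / (number of positives) is about np_ratio."""
    rng = np.random.default_rng(seed)
    pos_idx = np.flatnonzero(y == 1)
    neg_idx = np.flatnonzero(y == 0)
    # Never request more negatives than are available.
    n_neg = min(len(neg_idx), int(round(np_ratio * len(pos_idx))))
    keep_neg = rng.choice(neg_idx, size=n_neg, replace=False)
    keep = np.concatenate([pos_idx, keep_neg])
    rng.shuffle(keep)  # mix positives and negatives before training
    return X[keep], y[keep]


# Toy data: 10 positives and 990 negatives, mimicking a roughly 1 : 99 imbalance.
y = np.array([1] * 10 + [0] * 990)
X = np.arange(1000).reshape(-1, 1)
X_bal, y_bal = undersample_negatives(X, y, np_ratio=3)
```

With `np_ratio=3`, the balanced toy set keeps all 10 positives and 30 randomly chosen negatives.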

Based on the above improvements, this paper constructs an improved DNN model, which is called rDNN model, and this model is used to predict the consumption behavior. The structure of the rDNN model is shown in Figure 3.

#### 3. Prediction Method of Consumption Behavior Based on the rDNN Model

##### 3.1. Feature Selection of Consumer Behavior

Feature selection is the basis for constructing the consumption behavior prediction model, and choosing the data features best suited to consumption behavior helps improve the prediction accuracy and efficiency of the model. Combining the literature [21] with the characteristics of consumption behavior data, this paper selects features from the six dimensions listed in Table 1.

Because the features above differ in scale and units, min-max standardization is applied to them to avoid distorting the model's predictions, as shown in expression (14) [22]. Missing feature values are filled with 0, giving a large-scale sparse feature matrix.
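As a hedged sketch of this preprocessing, min-max standardization maps each feature value *x* to (*x* − min) / (max − min) over its column, which is the usual definition. The code below assumes missing values are stored as NaN, and the function name `min_max_scale` is hypothetical.

```python
import numpy as np


def min_max_scale(X):
    """Scale each column to [0, 1], ignoring NaNs, then fill missing values with 0."""
    X = np.asarray(X, dtype=float)
    col_min = np.nanmin(X, axis=0)
    col_max = np.nanmax(X, axis=0)
    span = col_max - col_min
    span[span == 0] = 1.0  # constant columns map to 0 instead of dividing by zero
    scaled = (X - col_min) / span
    return np.where(np.isnan(scaled), 0.0, scaled)


# Toy feature matrix with one missing value in the second column.
features = [[1.0, 10.0],
            [3.0, np.nan],
            [5.0, 30.0]]
scaled = min_max_scale(features)
```

In the toy matrix, the first column becomes 0, 0.5, 1 and the second becomes 0, 0, 1, with the missing entry replaced by 0.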

##### 3.2. Construction of the rDNN Model

The rDNN model is constructed to reduce the redundancy of the training data, improve training efficiency, and achieve more accurate analysis and prediction of consumption behavior. The specific construction process of the rDNN model is as follows:

1. Collect and preprocess consumption behavior data and divide them into a training set and a verification set in time order.
2. Divide the training set into positive and negative samples according to set rules and balance the data categories by randomly sampling negative samples.
3. Construct the DNN model, initialize the model parameters, and train the model using the training set.
4. Use backward propagation to adjust the model parameters until the model gives its best prediction of the consumption behavior data, and take this model as the optimal model for predicting consumption behavior.
5. Input the preprocessed data into the optimal rDNN model; the output is the prediction result.

The rDNN model building process is illustrated in Figure 4.

#### 4. Simulation Experiment

##### 4.1. Construction of Experimental Environment

The experiment runs on Windows 7 with an i5 processor and 8 GB + 4 GB of memory and is programmed in Python 2.7.

##### 4.2. Data Source and Processing

###### 4.2.1. Data Sources

In this experiment, the real data of consumer behavior in “Tianchi Big Data Competition” were selected as the experimental dataset. The dataset includes two parts: user-commodity behavior dataset and commodity subset. The specific data formats are shown in Tables 2 and 3 [23].

###### 4.2.2. Data Preprocessing

Statistical analysis of the experimental dataset shows that the data have missing feature values and that hash coding handles them poorly, so the affected feature is deleted in this experiment. In addition, consumer behavior is found to be periodic in all data except those for the “Double 12” day. The data are therefore divided into four groups according to these characteristics, as shown in Table 4. Group 3 contains the data for the “Double 12” day, whose surge in volume can easily distort the prediction effect of the model, so it is removed from this experiment. Group 1 is used as the training set, group 2 as the verification set, and group 4 as the test set.

From the statistics of the behavior category data, type 1 is defined as positive samples and the other types as negative samples. The data distribution is shown in Figure 5. The category data of consumer behavior are seriously unbalanced, with 325,797,507 positive samples and 23,058,448 negative samples, and the ratio is close to 1 : 99. To address the poor prediction results caused by unbalanced data categories, millions of negative samples are selected in this experiment according to the characteristics of the dataset, using random sampling. This reduces the large disparity between positive and negative samples and balances the data categories.

##### 4.3. Evaluation Indicators

In this experiment, the AUC value is selected as the performance index for evaluating the model, and it is calculated by the following expression [24–26]:

where *TP* represents true positive, *FN* represents false negative, *FP* represents false positive, and *TN* represents true negative.
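The AUC expression itself is not reproduced in this text. As a hedged sketch, the code below computes the true positive rate TP / (TP + FN) and the false positive rate FP / (FP + TN) at one threshold, and computes AUC as the probability that a randomly chosen positive sample scores higher than a randomly chosen negative one, counting ties as one half. This pairwise form is a standard equivalent of the area under the ROC curve and is not necessarily the formula used in the original. The function names and toy data are illustrative.

```python
import numpy as np


def confusion_rates(y_true, y_pred):
    """Return (TPR, FPR) for binary labels and binary predictions."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_true == 1) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    tn = np.sum((y_true == 0) & (y_pred == 0))
    return tp / (tp + fn), fp / (fp + tn)


def auc_score(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    y_true, scores = np.asarray(y_true), np.asarray(scores, dtype=float)
    pos = scores[y_true == 1]
    neg = scores[y_true == 0]
    diff = pos[:, None] - neg[None, :]  # every positive score minus every negative score
    return (np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / (len(pos) * len(neg))


labels = np.array([0, 0, 1, 1])
scores = np.array([0.1, 0.4, 0.35, 0.8])
auc = auc_score(labels, scores)
tpr, fpr = confusion_rates(labels, (scores >= 0.5).astype(int))
```

On the toy data, three of the four positive-negative pairs are ranked correctly, so the AUC is 0.75. The pairwise matrix grows with the product of the positive and negative counts, so a rank-based implementation would be preferable for datasets with millions of samples.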

##### 4.4. Parameter Settings

The basic parameters of the rDNN model and the DNN model in this experiment are shown in Table 5. The parameters of K-means in the KmDNN model are shown in Table 6.

##### 4.5. Experimental Results

###### 4.5.1. Model Verification

To verify the availability of the proposed model, this paper compared the effects of different N/P ratios and activation functions on the prediction [27, 28].

*(1) Different N/P Ratios*. The N/P ratio is first set to 1, 2, 3, 4, and 5 to test how small changes in the ratio affect the prediction effect of the model. The results are shown in Figure 6: when the N/P ratio is adjusted within this small range, the AUC value stays around 0.8. The ratio is then widened to 10, 20, 30, 40, and 50 to study the prediction effect over a wide range, with the results shown in Figure 7. As the N/P ratio increases, the AUC value fluctuates greatly, and the maximum AUC value of 0.8214 occurs at an N/P ratio of 10. The experiment therefore restricts the N/P ratio to the range (0, 10) and examines the prediction effect there to find the best ratio. N/P ratios of 6, 7, 8, and 9 are tested, and the AUC values are shown in Figure 8. The figure further confirms that small adjustments of the N/P ratio have little influence on the prediction effect of the model. Comparing Figure 6 with Figure 8 shows that the AUC value is largest, 0.8359, when the N/P ratio is 3, indicating that the model predicts best at this setting, so this study sets the N/P ratio to 3.

*(2) Different Activation Functions*. The prediction performance of the rDNN model with different activation functions is shown in Figure 9. When the sigmoid function is used as the activation function, the AUC value of the rDNN model is largest with 2 hidden layers and gradually decreases as more hidden layers are added, so the optimal number of hidden layers for the sigmoid function is 2. When the relu function is used, the largest AUC value corresponds to 3 hidden layers, so the optimal number of hidden layers for the relu function is 3. Comparing the two, the AUC value with the relu function is larger and changes little as the number of hidden layers varies, whereas the AUC value with the sigmoid function drops sharply. This indicates that the number of hidden layers has little influence on the prediction effect of the rDNN-relu model but greatly affects that of the rDNN-sigmoid model, which shows that the relu function helps improve the stability of the rDNN model.

In terms of training time, the rDNN-sigmoid model takes 714.76 s, while the rDNN-relu model takes 434.36 s, which is clearly shorter. In summary, the relu function is more suitable than the sigmoid function as the activation function of the rDNN model. Therefore, this paper chooses the relu function as the activation function of the rDNN model.

###### 4.5.2. Model Comparison

To verify the superiority of the proposed model, the prediction results of different deep learning models are compared. To avoid accidental errors, the experiment was repeated 50 times for each deep learning model, and the average value was taken as the final result, as shown in Table 7. Compared with the DNN model, the rDNN model proposed in this study has a larger AUC value, which indicates that choosing an appropriate N/P ratio to reduce the gap between positive and negative samples helps improve model performance. Compared with the KmDNN model, which uses the K-means algorithm, the proposed rDNN model also has a larger AUC value and a better prediction effect. The reason is that the K-means algorithm requires the number of clusters to be set in advance when clustering negative samples [29–31]. As a result, samples from different clusters cannot be extracted with equal probability, which affects clustering accuracy and leaves the prediction result of the KmDNN model suboptimal. In summary, the proposed rDNN model outperforms both the standard DNN model and the KmDNN model.

To further verify the superiority of the proposed model, its performance is compared with that of traditional prediction models, and the results are shown in Table 8. The deep learning models predict better than the traditional prediction models, and among the deep learning models the rDNN model predicts best. The reason is that during training the rDNN model discards a large number of redundant negative samples from the unbalanced data, which greatly reduces the scale of the training data and lets the positive samples in the balanced data contribute effectively [32–34]. This improves the prediction effect of the model.

*(1) Comparison of Operation Efficiency*. To verify the efficiency of the proposed rDNN model, its training time is compared with that of the standard DNN model and the KmDNN model, and the results are shown in Figure 10. The DNN model has the longest training time because it is trained on the original dataset, which is large and has an unbalanced proportion of positive and negative samples. The KmDNN and rDNN models differ little in training time because both sample the original data, which greatly reduces the amount of training data. However, the rDNN model has a wider range of application and is better suited to big data settings. Taken together, the rDNN model proposed in this study performs better and is more suitable for recommendation systems in a big data environment, where both timeliness and precision are required.

#### 5. Conclusion

In summary, the rDNN model proposed in this paper improves the DNN model by selecting an appropriate activation function and N/P ratio. With the relu function as the activation function, 3 hidden layers, and an N/P ratio of 3, the model predicts consumption behavior more accurately. Compared with traditional prediction models such as random forest, neural network, and logistic regression, and with the DNN and KmDNN models, the proposed model predicts more efficiently and accurately: the AUC value increases from 0.7893 (the value obtained by the DNN model) to 0.8422, and the training time is only 434.36 s, so the model is effective and can be used for actual consumption behavior prediction. Owing to the limited experimental conditions, this study has some shortcomings. For example, the number of negative samples to be processed in practical applications is often larger than in the experiment, so how to reduce the number of negative samples when using the random sampling method should be explored further.

#### Data Availability

The experimental data used to support the findings of this study are available from the corresponding author upon request.

#### Conflicts of Interest

The authors declare that they have no conflicts of interest.