Abstract

With the rapid development of online finance and social networks, a large amount of behavioral data is stored on the Internet, and these data reflect the shopping tendencies and habits of real users. Using big data to analyze consumer behavior is more scientific and accurate than the traditional sampling survey method. Internet consumption behavior data are time series data. Therefore, this paper proposes SeqLearn, a method for analyzing behavioral sequence data that learns personal consumption interests and habits and then predicts payment behavior. The experiments compare the performance of different algorithms on multiple datasets and verify the feasibility and effectiveness of SeqLearn.

1. Introduction

Internet e-commerce platforms contain a wealth of information about events, relationships, and attitudes. A series of technologies such as text mining, statistical theory, association analysis, and visualization are used to realize sentiment analysis, information extraction, and user influence analysis. Learning consumer behavior can help analyze the characteristics of consumers, the relationship between products, and so on. Therefore, it is a very valuable research topic to construct consumption structures according to the consumption behavior records of different consumers. The algorithm SeqLearn proposed in this paper extracts consumers' consumption interests and habits by analyzing the behavior data of e-commerce platforms and calculates the comprehensive probability to predict the next payment behavior. Forecast results can be used for product recommendations, advertising placement, and other applications.

Online shopping is a process in which users browse and search for related products to complete shopping or other related tasks in a virtual shopping environment. A payment decision integrates consumer demand, motivation, activity, and afterthought. By analyzing network data, the potential connections behind a series of consumer behaviors can be mined and used for prediction. This approach has been used to identify and segment consumer groups quickly and effectively, and it facilitates mapping the differences among these groups and comparing the consumption behavior expressed by consumers in different markets [1]. In the field of social network analysis, users are defined as nodes, and relationships between nodes are abstractly represented by edges. Common techniques used to analyze relationships between nodes include causal inference [2–4] and Bayesian networks [5, 6]. Different strategies are used for different purposes and needs, such as emotion-oriented analysis, information extraction, and user influence analysis. For example, Bollen analyzed the sentiment in tweets and found that the creation and dissemination of online information are closely related to the stock market, future commodity prices, and major social events [7]; Asur of HP Labs successfully predicted box office revenue using Twitter data [8]. Based on the big data of social networks, González-Bailón [9] studied the dynamics of protest movements and revealed their influence on political trends.

Group interaction, information dissemination, and other behaviors on the Internet can affect politics, the economy, and society. Network data analysis methods fall into the following categories: (1) analyzing the characteristics of nodes [10]; (2) analyzing relationships (such as advertising push relationships [11], intimate relationships [12], and other types of relationships [13, 14]) between nodes according to sociology [15–18]; (3) analyzing the law of information transmission; and (4) studying group behavior (such as topology analysis, normalization analysis, modular analysis, random data flow analysis, node clustering, and classification) [19–21].

The Internet has become a new medium that gives users a strong sense of participation. Its convenient and efficient mode of communication is a driving force for Internet businesses to analyze and mine network data. A recommendation model based on statistical modelling is used to assist consumers facing choice overload by predicting their interests and consumption behaviors [22]. However, due to the complexity of network data, it is a huge challenge for any enterprise or operator to mine commercially valuable information efficiently and quickly. The development of the Internet has brought about tremendous changes in consumption concepts, consumption patterns, and consumer conditions.

In this paper, a large amount of real network data is collected to support the analysis of consumer behavior. We study personal consumption interests and consumption habits and finally predict the next payment behaviors. Forecast results can be used for product recommendations, advertising placement, and other applications. In the experimental part, online consumer behavior data, such as Ali data (from an online shopping website in China), are used to compare the performance of different algorithms on different datasets and verify the feasibility and effectiveness of SeqLearn.

The remainder of this paper is organized as follows. Section 2 proposes the predictive model of consumer behavior. Section 3 presents the experimental results. Section 4 concludes the paper.

2. Consumer Behavior Prediction Model

Online consumption data can reflect the actual consumption trends and consumption changes of users. This paper proposes a sequence analysis algorithm to study personal consumption preferences and consumption habits. Personal consumption preferences can be used to predict what consumers will buy over a period of time. Consumption habits reflect a time cycle, that is, how long it takes for consumers to pay attention to a certain commodity before they buy it.

The behaviors of Internet users can be divided into four categories: click (browsing), collect, add-to-cart, and payment. Among these four behaviors, payment data only reflect the user's past shopping preferences; they are treated as outdated data and are used as validation data to verify the accuracy of the algorithm. Add-to-cart and collect behaviors are regarded as equal, since they have the same effect on predicting shopping behavior. The amount of browsing behavior data is huge, and its correlation with user interests is weak. Experiments show that analyzing the browsing behavior data does not significantly improve the accuracy of our prediction algorithm. Therefore, this paper does not consider browsing behavior data, so that accurate analysis results are obtained with as little computational effort as possible. In summary, this paper only considers payment (purchase) behavior, add-to-cart behavior, and collect behavior. To verify the feasibility of this choice, we analyzed a large amount of user data and found that these three behaviors are strongly correlated and often appear in a logical order. This supports the feasibility of using only these three kinds of behavioral data to predict consumption trends.
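The behavior selection described above can be sketched as follows. This is a minimal illustration and not the authors' implementation: the `Event` record, its field names, and the choice to map add-to-cart onto collect (one way of treating the two as equal) are assumptions, while the numeric behavior codes follow the Ali data description in Section 3.

```python
from typing import NamedTuple

# Behavior codes as listed for the Ali data in Section 3.
CLICK, COLLECT, ADD_TO_CART, PAYMENT = 1, 2, 3, 4


class Event(NamedTuple):
    """Hypothetical record layout for one user action."""
    user_id: str
    item_id: str
    behavior: int
    timestamp: int  # assumption: any sortable time value


def keep_informative(events):
    """Drop click events and treat add-to-cart the same as collect.

    Returns the remaining events ordered by time.
    """
    kept = []
    for event in events:
        if event.behavior == CLICK:
            continue  # browsing data are not considered
        behavior = COLLECT if event.behavior == ADD_TO_CART else event.behavior
        kept.append(event._replace(behavior=behavior))
    return sorted(kept, key=lambda e: e.timestamp)
```

Mapping add-to-cart onto collect keeps the sequence alphabet small; keeping the two codes separate would work equally well if later steps weight them identically.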

In order to analyze the continuously updated behavioral data sequence and prevent stale data from reducing analysis accuracy, the length of the dynamic sequence to be analyzed is set to n. We transform the training sequence into a fixed-length sequence s = (s1, s2, ..., sn), where n represents the maximum length. If the sequence length is greater than n, only the most recent n actions are considered. If the sequence length is less than n, we repeatedly add a "padding" item to the left until the length is n.
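The truncation and left-padding rule above can be sketched in a few lines. The function name and the `"<pad>"` placeholder are hypothetical; the text only specifies that a padding item is added on the left and that the most recent n actions are kept.

```python
PAD = "<pad>"  # hypothetical padding item


def to_fixed_length(actions, n, pad=PAD):
    """Return a sequence of exactly n items.

    Longer sequences keep only their most recent n actions;
    shorter ones are padded on the left.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    recent = list(actions)[-n:]
    return [pad] * (n - len(recent)) + recent
```

For example, with n = 3 a five-action history keeps its last three actions, while a two-action history gains one padding item at the front.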

The most important task of sequential analysis is to learn each user's personal consumption preferences and consumption habits based on historical data. The prediction formula is shown as formula (1).

Formula (1) calculates the probability that user u performs the predicted action on the predicted next item, given the historical access record and the historical action sequence of user u.

The following sections introduce the formulas for learning personal consumption interests and habits.

2.1. Personal Consumption Preference Analysis

Over the long term, a consumer's consumption trend changes constantly, but over a short period it is stable. By analyzing user behavior over a period of time, we can identify consumers' consumption interests, which reflect the users' recent consumption trends. The following formula illustrates the process of quantifying user interests to predict user behaviors. The interest term in formula (1) reflects the consumption interest of a consumer over a period of time. The probability function is given by formula (2), in which the sequence term represents the sequence of access items for user i. If the length of the sequence is less than n, it is automatically padded. The number of occurrences of each item in the sequence is denoted by x, which follows a Poisson distribution. x is related to the number of times users use online media and the amount of information generated [23], so x ~ Poisson(λ), where λ = ev × c, in which ev represents the number of times a user uses network media in time t and c represents the total amount of data generated in a certain consumer interest area within time t. From this, the following probability formula can be derived:
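For reference, the standard Poisson probability mass function under the stated assumption x ~ Poisson(λ) with λ = ev × c is written out below. The count index k is a symbol introduced here for illustration, so the notation may differ from the paper's own.

```latex
P(x = k) = \frac{\lambda^{k} e^{-\lambda}}{k!},
\qquad \lambda = \mathit{ev} \times c,
\qquad k = 0, 1, 2, \ldots
```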

in formula (2) represents the probability that and appear at the same time. The specific formula is shown as follows:where n represents the sequence length and m represents the number of elements that appear in both and the sequence being studied. Substituting formula (3) and formula (4) into formula (2), the following formula is obtained:

2.2. Consumption Habit Analysis

Consumers differ in their consumption habits and can be roughly divided into an impulse type and a conservative type. Impulse consumers purchase goods intuitively and impulsively under the influence of an urgent desire to buy. Conservative consumers have strict attitudes towards consumption: they take a wait-and-see attitude towards products and are sensitive to prices. Comparing the two types, impulse consumers are influenced by the first suitable product and make quick purchase decisions instead of repeatedly selecting and comparing, whereas conservative consumers have long consumption cycles and need to compare many products before buying.

Consumption habits differ from consumer to consumer. By studying historical data, the consumption habits of different consumers can be derived. The probability formula for predicting the next action of consumers is shown as formula (6).

Formula (7) calculates the frequency of purchase behavior, $p = t/n$, where n is the length of the sequence being studied and t is the number of purchases.
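The purchase frequency above, together with the binomial model B(n, p) that this section builds on it, can be sketched as follows. The function names are hypothetical, and the code illustrates only the standard binomial law, not the combined habit formula of the paper.

```python
from math import comb


def purchase_frequency(t, n):
    """Frequency of purchase behavior: t purchases in a sequence of length n."""
    if n <= 0 or not 0 <= t <= n:
        raise ValueError("require n > 0 and 0 <= t <= n")
    return t / n


def binomial_pmf(k, n, p):
    """Probability of exactly k purchases in n actions under B(n, p)."""
    if not 0 <= k <= n:
        return 0.0
    return comb(n, k) * p**k * (1 - p) ** (n - k)
```

A typical use is to estimate p from a user's history with `purchase_frequency` and then evaluate `binomial_pmf(k, n, p)` for the purchase counts of interest.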

The number of purchases satisfies the binomial distribution B(n, p): let y be the number of times the purchase behavior A occurs in n user actions, so y is a random variable that takes the values 0, 1, 2, ..., n. Set $P(A) = p$ and $P(\bar{A}) = 1 - p$. The distribution law of y is then $P(y = k) = \binom{n}{k} p^{k} (1 - p)^{n - k}$, where k is the number of times the purchase behavior occurs in the sequence. Then, formula (7) and formula (8) are substituted into formula (6), and the following formula can be obtained:

Finally, by substituting formula (5) and formula (9) into formula (1), the following formula can be obtained:

3. Experiment

The parameters used in the experiment are shown in Table 1.

Datasets used in the experiment are as follows:
(1) Ali data (https://tianchi.aliyun.com/dataset/dataDetail?dataId=46): this dataset is provided by Alibaba Group. It possesses a wealth of user data, such as user location information and access time. This dataset contains the following attributes: user ID, item IDs, types of behaviors (including click, collect, add-to-cart, and payment; the corresponding values are 1, 2, 3, and 4, respectively), locations, the category ID, and time.
(2) EP data: it contains 15,890,209 pieces of data collected from http://www.dianping.com/ in August 2018. The dataset contains the following attributes: shop_id (unique), province, city, city_id, area, big_cate, big_cate_id, small_cate, small_cate_id, service_rating, all_remarks, very_good_remarks, good_remarks, common_remarks, bad_remarks, and very_bad_remarks.

The algorithms used in the experiment are as follows:
(1) Statistical learning: statistical learning techniques are the tools we use to understand data, which are divided into supervised learning and unsupervised learning. Broadly speaking, statistical learning builds statistical models based on one or more inputs to predict or estimate outputs.
(2) EC-Structure [23]: this algorithm grabs consumption data of the e-commerce platform, analyzes consumption structures, and predicts consumption behaviors according to consumption structures. The algorithm integrates multiple dimensions of network data to comprehensively study consumer behavior.

To show the relationship between user behavior and payment behavior, Figure 1 describes the behavior data of an Ali user for one month. It visually shows the sequential relationship between different user behaviors.

In Figure 1, the abscissa indicates the order of appearance of the behavior data, and the ordinate indicates the consumption category number. The dots in Figure 1 represent the behaviors of collect, add-to-cart, and payment. The triangles represent the predicted payment behaviors based on historical data. Figure 1 shows that there is a logical connection between users' online behavior and payment behavior, which supports the claim that integrating consumer interests and consumption habits can accurately predict the next payment behavior.

Table 2 illustrates the accuracy of the consumer behavior prediction algorithm SeqLearn (Top@N: among the results calculated by the algorithm, the first N are selected for precision analysis). The horizontal quantity represents the actual value, and the vertical quantity represents the predicted value. To measure the accuracy of the algorithm, the following indicators are used: precision P = tp/(tp + fp); recall R = tp/(tp + fn); and the F-measure FB = 2 × P × R/(P + R), where tp is the number of behaviors correctly predicted to occur, tn is the number of behaviors correctly predicted not to occur, fp is the number of behaviors incorrectly predicted to occur, and fn is the number of behaviors incorrectly predicted not to occur. In addition, the parameter ACC = tp/(tp + tn + fn + fp) is used to measure the prediction accuracy of the algorithms.
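The Top@N selection and the indicators above can be computed as in the following sketch. The function names and the score dictionary are hypothetical; ACC is implemented exactly as stated in the text, and the zero-denominator fallbacks are an added assumption.

```python
def top_n(scores, n):
    """Return the n candidates with the highest predicted score (Top@N)."""
    return sorted(scores, key=scores.get, reverse=True)[:n]


def evaluate(tp, tn, fp, fn):
    """Compute P, R, FB, and ACC from the four confusion counts."""
    total = tp + tn + fp + fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    fb = 2 * precision * recall / denom if denom else 0.0
    # ACC as stated in the text; the conventional accuracy would
    # instead use (tp + tn) / total.
    acc = tp / total if total else 0.0
    return {"P": precision, "R": recall, "FB": fb, "ACC": acc}
```

For instance, `evaluate(tp=6, tn=2, fp=2, fn=2)` gives P = R = FB = 0.75 and ACC = 0.5 under the stated definition.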

Table 3 compares the performance of statistical learning, EC-Structure, and SeqLearn on different datasets. According to Table 3, SeqLearn has the most stable performance and the best results in terms of FB and ACC.

4. Conclusion

If a questionnaire survey is used for research, problems such as low sample coverage, long survey cycles, and lagging survey results arise. In this paper, a large amount of real network data is collected to support the analysis of consumer behavior, and the results of this method are accurate. Internet consumption behavior data are time series data. Therefore, this paper adopts time series analysis to study personal consumption interests and consumption habits and finally predicts the next payment behaviors. The prediction results can be used in areas such as product recommendation and advertising push. The experiment shows the process of predicting consumer behavior and verifies the feasibility and efficiency of the SeqLearn algorithm on multiple datasets. In future research, we hope to study the characteristics of long-term consumption and short-term consumption, so that we can accurately calculate the parameters of different types of goods in the recommendation method.

Data Availability

The data used to support the findings of this study are included within the article.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This work was supported by the Youth Program of the National Social Science Fund of China (Project name: Research on online behavior pattern of customers and multidimensional customer insight method under big data; Grant no. 19CGL024).