Abstract

The emergence and widespread use of mobile Internet technology has led to many new kinds of mobile communications services, such as WeChat, giving users more choices for satisfying their communications needs. The ability to predict how users will use a new mobile communications service is extremely valuable to mobile communications service providers. In this work, we propose a method for predicting how a user will use a new mobile service. Our scheme is inspired by evolutionary game theory. Using large-scale real-world datasets collected from mobile service providers, we first extract benefit-related features for users who are starting to use a new mobile service. We then design training and prediction methods for predicting potential users. We evaluate our scheme in experiments on large-scale real data. The results show that our approach can predict users’ future behavior with satisfactory accuracy.

1. Introduction

With the development of mobile Internet technology [1–3] and large-scale distributed systems [4, 5], many new mobile services are being offered by a diverse range of providers [6, 7], including traditional service providers and Internet service providers. Communications between users through the data domain over an Internet connection are known as “Over the Top (OTT)” communications. These new kinds of mobile services provide users with a greater range of choices. Tencent provides communications services through WeChat and QQ, while, as a result of the development of 4G technology, China Mobile has started to provide communications services through VoLTE technology. For example, WeChat, the most popular OTT communications service, had about 8.2 million monthly active users as of October 2016. More than 60% of WeChat users open WeChat more than 10 times per day, and WeChat users post about 500 million messages every day. Competition between these service providers is becoming increasingly intense.

Users can choose appropriate mobile communications services according to their preferences [8, 9]. Being able to predict the way in which users will likely use a new mobile communications service would be extremely useful to mobile communications service providers. This would help service providers understand consumers’ demands and requirements, provide personalized services according to a user’s preferences, and improve product features and competition policy. Therefore, a method for predicting the way in which users would use a new mobile communications service, in an accurate and timely manner, has become a topic of great concern to service providers.

Many studies have focused on predicting the behavior of mobile service users. Some researchers have studied the reasons and factors that influence the users of a mobile service, some have studied the prediction of app use on a mobile phone, and some have proposed methods for predicting the way in which a user uses a mobile service. However, these methods cannot accurately predict the way in which users will use a new mobile service. Mainstream companies like Google and Apple use signal models considering session duration, operations per session, location, and device data to predict mobile service usage. To predict users’ behavior with a new mobile service, we add network-side data, refine the use of the data, and improve the forecasting model.

In this study, we first analyze the characteristics of users’ behavior and propose our concept for predicting how a user will make use of a new service. We propose a model, based on evolutionary game theory, for predicting a user’s behavior when using a new mobile communications service. We extract features from users’ behavioral data from the perspective of the users’ motivation for using a new mobile service, and we train the model to predict the potential users and heavy users of the new service. Then, we classify the users into different groups according to their features and train the model for each group to obtain better performance. We evaluated the performance of our method with real datasets of users’ usage behavior. The results showed that our approach can predict users’ future usage behavior with a high level of accuracy in a relatively short time.

This paper is organized as follows. In Section 2, we describe other studies related to mobile service prediction. In Section 3, we define the problem being addressed in detail. In Section 4, we illustrate our ideas and methods for predicting users’ behavior regarding a new mobile service. Section 5 describes the experiments used to validate our model. Finally, we summarize our work in Section 6.

2. Related Work

A better understanding of users’ behavior is of great significance. Since the mobile instant messaging service became an important means of communication, researchers have undertaken many interesting studies related to the analysis of users’ behavior. For example, Oghuma et al. [10] studied users’ ongoing intentions to use mobile instant messaging. By applying an expectation confirmation model, they pointed out that service quality significantly affects users’ satisfaction, and they [11] explained a new user’s ongoing intention to use mobile instant messaging applications with a benefit-confirmation model. Ogara et al. [12] studied the main factors that influence a user’s satisfaction with a mobile messaging service. They showed that experience and social factors are the main influences on user satisfaction. Kuo and Yen [13] studied consumers’ behavioral intentions to adopt mobile value-added services. Hong et al. [14] compared the efficiency of the expectation confirmation model and the acceptance model for understanding ongoing service usage behavior. The data used in these studies were collected from user surveys, so they directly reflect users’ intentions.

Many previous works studied users’ behavior as it relates to mobile apps [15]. Sun et al. [16] studied the mobile usage prediction problem using feature comparison analysis. To realize a dynamic home screen application, Shin et al. [17] built a prediction model [18, 19] for the usage of apps in a mobile context. Zhu et al. [20] studied service usage patterns from the perspective of users and proposed an application usage prediction method. Liao et al. [21] predicted app usage through the application of a temporal feature model, and they [22] proposed a method for predicting which apps are most likely to be used with a temporal-based model. Xu [23] proposed a method for predicting Chinese mobile user behavior based on a time sequence analysis. Kloumann et al. [24] analyzed the temporal and social profiles in the app usage prediction problem. These methods of predicting mobile application usage, based on data collected from telephone handsets, are extremely valuable for predicting the way in which an app is used [25–29].

Some studies have focused on the prediction of users’ usage of mobile services. Haddad et al. [30] proposed a means of predicting periodic consumption behaviors based on a Poisson process model. Valera and Gomez-Rodriguez [31] developed a method to predict the adoption and use frequency of similar products in social networks.

Our work differs from the previous studies. The motivation behind our work is the prediction of users’ behavior when a new mobile service is offered on the market. It is difficult to predict users’ usage behavior from usage trace data, since the amount of trace data available for a new service is insufficient. The data used for prediction in our work was obtained from mobile service providers, so the key issue in our work is how to make full use of these data to determine a user’s motivation to use a new service. Our concept was inspired by dynamic game theory [32–34]. Taylor and Jonker [35] studied the evolution of the game in the dynamic case and described the state of the system by differential equations. Goyal et al. [36] proposed a game-theoretic framework to study the initial adoption of competing products.

3. Problem Definition

The main aim of this work was to predict whether a user will continue to use a new mobile service after that user starts to use it and whether a user will become a high-frequency user of the new mobile service. We call a user who will continue to use a new mobile service a “potential user” of the new service, while a user who will use a new service at a high frequency is a “heavy user” of the new service. The datasets collected from a service provider contain the activities of each user $u$ using service $s$ at time $t$. From these datasets, we can determine the use times $N_{u,s}(T)$ and data flow $F_{u,s}(T)$ for a user $u$ using service $s$ over a period $T$. When we combine the two parameters to give the use level $L_{u,s}(T)$ for user $u$ using service $s$ over a period $T$, the equation for the use level will be

$$L_{u,s}(T) = \alpha \frac{N_{u,s}(T)}{N_{0}} + (1 - \alpha) \frac{F_{u,s}(T)}{F_{0}},$$

where $\alpha$ is the weight factor and $N_{0}$ and $F_{0}$ are the normalization factors. When the duration from the first time $u$ uses new service $s$ is sufficiently long, the use level can be taken as the continuous use level $C_{u,s}$. For user $u$ using a new service $s$, the equation of the continuous use level is

$$C_{u,s} = L_{u,s}(T) \quad \text{for a period } T \text{ starting sufficiently long after } t^{0}_{u,s},$$

where $t^{0}_{u,s}$ is the first time user $u$ uses service $s$.
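
As an illustration, the following minimal Python sketch computes a use level as a weighted sum of a normalized use count and a normalized data flow. The weighted-sum form, the function name `use_level`, its parameter names, and the default weight of 0.5 are assumptions made for this example, not values taken from the datasets.

```python
def use_level(use_count, data_flow, alpha=0.5, count_norm=1.0, flow_norm=1.0):
    """Combine use count and data flow into a single use level.

    alpha is the weight factor; count_norm and flow_norm are the
    normalization factors (all hypothetical names and defaults).
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if count_norm <= 0 or flow_norm <= 0:
        raise ValueError("normalization factors must be positive")
    normalized_count = use_count / count_norm
    normalized_flow = data_flow / flow_norm
    return alpha * normalized_count + (1.0 - alpha) * normalized_flow
```

With `alpha` close to 1 the level is driven mainly by how often the service is used, and with `alpha` close to 0 mainly by how much data is transferred.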

We first need to identify the potential and heavy users in our training dataset. We extract the user’s usage data for the period starting when the user first uses the new mobile service and ending when the user has had sufficient time to become familiar with the new service. We then apply the potential user label and the heavy user label by setting a use level threshold.

Subsequently, we can apply a label to a user according to whether that user is a potential user or a heavy user. The problem we want to solve is how to predict, at the point when a user is just starting to use the new service, whether that user will become a potential user or a heavy user. This is a classification problem, allowing us to predict the probability of a user continuing to use a new mobile service, that is, the probability of being a potential user, and the probability of a user being a high-frequency user of the new mobile service, that is, the probability of being a heavy user.
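
The labeling step can be sketched as a simple threshold test on each user’s continuous use level. The sketch below assumes two separate thresholds, one per label; the function name `label_users` and the threshold values in the usage note are hypothetical.

```python
def label_users(continuous_levels, potential_threshold, heavy_threshold):
    """Assign potential-user and heavy-user labels by thresholding.

    continuous_levels maps a user id to that user's continuous use level.
    Both thresholds are assumptions chosen by the analyst.
    """
    labels = {}
    for user, level in continuous_levels.items():
        labels[user] = {
            "potential": level >= potential_threshold,
            "heavy": level >= heavy_threshold,
        }
    return labels
```

For example, `label_users({"u1": 0.9, "u2": 0.4}, 0.3, 0.8)` would mark both users as potential users and only `u1` as a heavy user.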

The datasets we use are collected from service providers. These data include users’ behavior and users’ connection technical parameters for the new service.

4. Methodology

In this section, we first present the concept of predicting users’ usage behavior for new mobile communications services. Then, we introduce a method for addressing the prediction problem, including feature extraction and the prediction method.

4.1. Prediction Model

We need to predict users’ future usage behavior in an accurate and timely manner using data collected from service providers. It is difficult to predict users’ future usage behavior in the initial stages, when a user is only beginning to use a new mobile service, since there is insufficient usage behavior data.

The intuitive concept on which our method is based is that the motivation leading to a user’s use behavior can be explained by evolutionary game theory. Evolutionary game theory studies how biological species choose their dominant strategy and make better decisions. The basic concept is that, when faced with many options, biological species make decisions according to the perceived benefits of each strategy. When some individuals of a species attempt a new strategy, they evaluate its benefits. If the benefits of the new strategy are greater than those of an alternative strategy, there is a greater probability of that strategy being used again. When the mean benefit of the new strategy is greater than the mean benefit over the whole population, the new strategy will become an evolutionarily stable strategy, and the share of the population using the new strategy will gradually increase in the future.
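
The dynamics described above are commonly modeled with the replicator equation, in which the share $x$ of a strategy grows at a rate proportional to the gap between its benefit and the population mean benefit. The following Python sketch integrates the two-strategy case with explicit Euler steps. Holding the benefits constant, and the names `replicator_share`, `benefit_new`, and `benefit_old`, are simplifying assumptions made for this illustration.

```python
def replicator_share(x0, benefit_new, benefit_old, steps=1000, dt=0.01):
    """Share of the population using the new strategy after steps * dt time.

    Implements dx/dt = x * (benefit_new - mean_benefit), where
    mean_benefit = x * benefit_new + (1 - x) * benefit_old.
    """
    x = x0
    for _ in range(steps):
        mean_benefit = x * benefit_new + (1.0 - x) * benefit_old
        x += dt * x * (benefit_new - mean_benefit)
    return x
```

When the new strategy has the higher benefit, the share follows an S-shaped (logistic) curve toward 1; when it has the lower benefit, the share decays toward 0.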

In the same way, a user who first tries a new service will evaluate its perceived benefits based on his or her experience, and these benefits will affect the user’s usage behavior for the new service. Therefore, we can predict users’ usage behavior from the perspective of the users’ motivation. We analyze the data obtained from the service providers and extract the features from the users’ experience and benefit-related data.

4.2. Feature Extraction

By studying the main factors influencing the user’s experience and benefits of mobile services as addressed in related works, we identified four categories of factors that greatly influence the user’s experience and benefits: service technology factors, service ability factors, interaction factors, and cost factors. The notations used in feature extraction [37, 38] are listed in Notations.

4.2.1. Service Technology Feature

The service technology features of a mobile service, including the end-to-end delay, connect rate, and transmit speed, have a direct impact on the users’ experience when using a mobile communications service. From the users’ connection technical datasets, we extract the service technology parameters and transform them into normalized features, namely, the end-to-end delay feature, connect rate feature, and transmit speed feature for each user when using the new service. These features are computed over a period of time, typically one day. The service technology features are described in Table 1.

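The end-to-end delay feature is estimated by the average end-to-end delay over the times $t$ at which user $u$ uses service $s$ in time period $T$:

$$D_{u,s}(T) = \frac{1}{|T|} \sum_{t \in T} d_{u,s}(t),$$

where $d_{u,s}(t)$ is the end-to-end delay at time $t$ and $|T|$ is the number of uses in $T$.

The connect rate feature is estimated by the average connect rate over the times $t$ at which user $u$ uses service $s$ in time period $T$:

$$R_{u,s}(T) = \frac{1}{|T|} \sum_{t \in T} r_{u,s}(t),$$

where $r_{u,s}(t)$ is the connect rate at time $t$.

The transmit speed feature is estimated by the average transmit speed over the times $t$ at which user $u$ uses service $s$ in time period $T$:

$$V_{u,s}(T) = \frac{1}{|T|} \sum_{t \in T} v_{u,s}(t),$$

where $v_{u,s}(t)$ is the transmit speed at time $t$.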
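
A minimal Python sketch of this per-period averaging is given below. It assumes that each use of the service is available as a record with the hypothetical keys `delay`, `connect_rate`, and `speed`; the record layout and the function name are illustrative and do not reflect the operator’s actual data schema. Normalization of the averaged values is omitted.

```python
from statistics import mean


def service_technology_features(records):
    """Average the technical parameters over all uses in one period.

    records is a list of dicts, one per use, with the hypothetical keys
    'delay', 'connect_rate', and 'speed'. Returns None for an empty period.
    """
    if not records:
        return None
    return {
        "delay": mean([r["delay"] for r in records]),
        "connect_rate": mean([r["connect_rate"] for r in records]),
        "speed": mean([r["speed"] for r in records]),
    }
```

Returning `None` for a period without any use keeps missing observations distinguishable from a genuine average of zero.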

4.2.2. Service Ability Feature

The service ability feature indicates the benefit a user can derive from a new service. We compare the usage of the new service with that of other services of the same type and then estimate the service ability feature of the new service. These features are computed over a period of time, typically one day. The service ability features are described in Table 2.

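The use frequency feature is estimated by summing the use count for user $u$ using service $s$ in time period $T$:

$$N_{u,s}(T) = \sum_{t \in T} n_{u,s}(t),$$

where $n_{u,s}(t)$ is the service use count at time $t$.

The use amount feature is estimated by summing the use amount for user $u$ using service $s$ over time period $T$:

$$A_{u,s}(T) = \sum_{t \in T} a_{u,s}(t),$$

where $a_{u,s}(t)$ is the service use amount at time $t$.

The use frequency ratio feature is estimated from the ratio of the use frequency of service $s$ to the use frequency of all the services of the same category for user $u$ in time period $T$:

$$\frac{N_{u,s}(T)}{\sum_{s' \in S_{c}} N_{u,s'}(T)},$$

where $S_{c}$ is the set of services in the same category as $s$.

The use amount ratio feature is estimated from the ratio of the use amount of service $s$ to the use amount of all the services of the same category for user $u$ in time period $T$:

$$\frac{A_{u,s}(T)}{\sum_{s' \in S_{c}} A_{u,s'}(T)}.$$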
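
The sums and ratios above can be sketched in Python as follows. The input layout, a mapping from service name to a pair of per-period totals, and the service names in the usage note are assumptions made for the illustration.

```python
def service_ability_features(usage, service):
    """Compute service ability features for one user and one period.

    usage maps each service of the same category to a hypothetical
    (use_count, use_amount) pair already summed over the period.
    """
    total_count = sum(count for count, _ in usage.values())
    total_amount = sum(amount for _, amount in usage.values())
    use_count, use_amount = usage.get(service, (0, 0.0))
    return {
        "use_frequency": use_count,
        "use_amount": use_amount,
        # Ratios fall back to 0.0 when the user used no service at all.
        "use_frequency_ratio": use_count / total_count if total_count else 0.0,
        "use_amount_ratio": use_amount / total_amount if total_amount else 0.0,
    }
```

For instance, `service_ability_features({"wechat": (6, 30.0), "sms": (2, 10.0)}, "wechat")` gives a use frequency ratio and a use amount ratio of 0.75 each.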

4.2.3. Interaction Feature

The interaction feature indicates the usability of a new service for a user. We can extract the adaptation time needed for a user to become familiar with a new service and the ratio of the use amount during the learning period to the mean learning amount [39]. The adaptation time feature is extracted from a user’s use behavior by calculating the time between the first use and the achievement of stable use. The learning ratio feature is extracted from the ratio of a user’s use amount during the learning time to the mean learning amount. The interaction features are described in Table 3.

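The learning time feature is estimated from the time between user $u$ starting to use service $s$ and the use level of service $s$ first reaching a multiple of the long-period use level corresponding to the learning level:

$$\tau_{u,s} = \min \{\, t : L_{u,s}(t) \ge \beta \, C_{u,s} \,\} - t^{0}_{u,s},$$

where $L_{u,s}(t)$ is the use level at time $t$, $C_{u,s}$ is the long-period use level, $t^{0}_{u,s}$ is the first time user $u$ uses service $s$, and $\beta$ is the learning level parameter.

The learning ratio feature is estimated as the summation of the use amount during the learning period needed for user $u$ to use service $s$, divided by the mean learning amount.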
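
The learning time can be sketched as the first day on which the observed use level reaches a given fraction of the long-period level. In the Python sketch below, the daily granularity, the function name `learning_time`, and the default learning level of 0.8 are assumptions made for illustration.

```python
def learning_time(daily_levels, long_term_level, learning_level=0.8):
    """Days from first use until the use level reaches the learning target.

    daily_levels[0] is the use level on the day of first use.
    Returns None if the target is never reached in the observed window.
    """
    target = learning_level * long_term_level
    for day, level in enumerate(daily_levels):
        if level >= target:
            return day
    return None
```

A user who never reaches the target within the observation window yields `None`, which a downstream model would need to handle explicitly, for example by capping the value at the window length.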

4.2.4. Cost Feature

The cost feature indicates the cost of using a new service. We extract the cost of a new service per use and per amount of use. The cost feature is relatively stable for a given new service over a short period of time. These features are computed over a period of time, typically one day. We can obtain these data from the service provider. The cost features are described in Table 4.

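The time cost feature is estimated as the mean cost of each use of service $s$ by user $u$ in time period $T$:

$$\frac{\sum_{t \in T} c^{\mathrm{time}}_{u,s}(t)}{\sum_{t \in T} n_{u,s}(t)},$$

where $c^{\mathrm{time}}_{u,s}(t)$ is the time cost and $n_{u,s}(t)$ is the service use count at time $t$.

The amount cost feature is estimated as the mean cost for the total amount user $u$ uses service $s$ in time period $T$:

$$\frac{\sum_{t \in T} c_{u,s}(t)}{\sum_{t \in T} a_{u,s}(t)},$$

where $c_{u,s}(t)$ is the cost and $a_{u,s}(t)$ is the service use amount at time $t$.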

These features are extracted from the use behavior and connection technical parameter datasets for each user. They will indicate each user’s experience directly, and from these data we can precisely predict the use behavior for each user.

4.3. Prediction Method

In the prediction step, we first employ a pretraining part that uses regression to train the benefits of the different feature categories, and then we train on the users’ use behavior data with the benefits of the four categories. We also design a classification part that separates different kinds of users to improve the performance of the prediction model.

4.3.1. Pretraining

To train the benefit of the four categories of features, we implement a pretraining step for them. The target value of this pretraining model is the users’ long-term use level, which indicates the users’ benefits to some degree. Therefore, the function we want to train is the benefit factor for the four categories. The training model can be a regression model. After the pretraining step, we can check whether the parameter of each feature is reasonable and test the efficacy of the feature, which makes the model less error-prone.
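
As one possible instance of such a regression model, the Python sketch below fits a linear model to the long-term use level by batch gradient descent on the mean squared error. The choice of a linear model, the function name `fit_linear`, and the learning rate and epoch count are assumptions made for this example.

```python
def fit_linear(X, y, lr=0.05, epochs=2000):
    """Fit y ~ w . x + b by batch gradient descent on mean squared error.

    X is a list of feature vectors and y the list of target use levels.
    Returns the weight list w and the intercept b.
    """
    n, d = len(X), len(X[0])
    w = [0.0] * d
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * d
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = sum(wj * xj for wj, xj in zip(w, xi)) + b - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b
```

Inspecting the sign and magnitude of each fitted weight is one way to check whether a feature’s parameter is reasonable, for example whether a larger end-to-end delay is associated with a lower long-term use level.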

4.3.2. Prediction of Potential and Heavy Users

Our goal is to predict whether a user will continue to use a new mobile service after he or she starts to use it and whether the user will become a high-frequency user of the new mobile service. Once the benefit factors have been trained in the pretraining part, we can calculate each user’s perceived benefit for the four categories. Since the users’ use probability distribution has a sigmoid form, in accordance with evolutionary game theory, we use a logistic regression model, trained on the user labels we have identified, to predict potential users. Figure 1 shows the simulated variation of use probability with relative perceived benefit under evolutionary game theory. Heavy user prediction is performed using the same method as potential user prediction.
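
A logistic regression of this kind can be sketched in plain Python as follows. The gradient descent settings and the function names `sigmoid`, `fit_logistic`, and `predict_proba` are assumptions made for the illustration; in practice an established library implementation would normally be preferred.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(w, b, x):
    """Probability that a user with benefit vector x has label 1."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def fit_logistic(X, y, lr=0.5, epochs=2000):
    """Fit logistic regression by batch gradient descent on log loss."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * d
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(w, b, xi) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b
```

Here each row of `X` would hold a user’s perceived benefits for the four feature categories, and `y` the potential-user (or heavy-user) label.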

4.3.3. Prediction of Potential and Heavy Users after Classification of User Groups

Different users have different preferences, which the above method neglects. We therefore classify the users into several groups and train the above model for each group. The parameters may differ from group to group. The predicted results are compared in the evaluation section. The features used for classification can be a subset of the features extracted in Section 4.2; other features, such as the user’s city or age, can also be used. After classification, we can predict users’ usage behavior for each group of users.
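
Group-wise training can be sketched as partitioning the samples by a grouping key and fitting one model per partition. In the Python sketch below, `group_of` and `train` are hypothetical callables supplied by the caller; any grouping rule and any training routine could be plugged in.

```python
from collections import defaultdict


def train_per_group(samples, group_of, train):
    """Train one model per user group.

    group_of maps a sample to its group key (for example, a city),
    and train maps a list of samples to a fitted model.
    """
    groups = defaultdict(list)
    for sample in samples:
        groups[group_of(sample)].append(sample)
    return {key: train(members) for key, members in groups.items()}
```

At prediction time, a user would be routed to the model stored under the key returned by `group_of` for that user.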

5. Performance Evaluation

5.1. Dataset

To verify the performance of this method, we collected users’ usage behavior data for 1,048,574 users of China Mobile Communications Corporation in 14 cities across China. The data cover the period from September 2012 to March 2013. The new mobile service covered by the dataset was WeChat. We collected the users’ behavior records, technical information, and service plan data from the operator side. After checking the data integrity, we obtained data for 330,331 users for training and testing. We identified the potential and heavy users of WeChat from the last two weeks of our users’ behavior data. We found 238,040 potential users and 90,905 heavy users of WeChat. The statistical information [40] of the samples is listed in Table 5.

5.2. Performance of the Proposed Approach

We extracted the feature data for the three weeks after a user first uses WeChat. After the pretraining step, we examined the efficacy of each feature and then trained the logistic regression model for potential and heavy users. Then, we divided the users into several groups and retrained the model to make a comparison with the nonclassified model. The baseline is a signal model trained on the duration and frequency data from the same three weeks of users’ behavior; it serves as the comparison in this experiment.

We used the precision, recall, F1 score, and the area under the ROC curve (AUC) to measure the models’ performance [41]. First, we divided the users into two groups according to the classification method described in Section 4.3.3 and then retrained the parameters for each group. Then, we compared the model’s performance to the nonclassified model and the baseline model. Table 6 shows the results, where 1-class indicates the nonclassified model and 2-class indicates the model in which users are classified into two classes.
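
For reference, these metrics can be computed as in the following Python sketch. The AUC is obtained from its rank interpretation, that is, the probability that a randomly chosen positive sample is scored above a randomly chosen negative one, with ties counted as one half. The function names are chosen for this illustration.

```python
def precision_recall_f1(y_true, y_pred):
    """Precision, recall, and F1 score for binary labels in {0, 1}."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2.0 * precision * recall / total if total else 0.0
    return precision, recall, f1


def auc(y_true, scores):
    """Area under the ROC curve via pairwise comparison of scores."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

The pairwise AUC is quadratic in the number of samples, so a sort-based implementation would be preferable for a dataset with hundreds of thousands of users.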

Then, we compared the ROC curve for the nonclassified users, two classes of classified users, and the baseline model. Figure 2 shows the comparison between ROC curves for potential user prediction. Figure 3 shows the comparison between ROC curves for heavy users.

We compared the ROC curve for different classes. Figure 4 shows the ROC curve for each of two classes classified as potential users. Figure 5 shows the ROC curve for each class of three classes classified as potential users. Figure 6 shows the ROC curve for each class of two classes classified as heavy users. Figure 7 shows the ROC curve for each class of three classes classified as heavy users.

We used 5-fold cross-validation to evaluate the performance of the models. The ROC curve for each fold was compared. Figure 8 shows the ROC curve for each fold in nonclassified potential user prediction. Figure 9 shows the ROC curve for each fold in nonclassified heavy user prediction.
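
A k-fold split of this kind can be sketched in Python as below. The shuffling seed and the function name `k_fold_indices` are assumptions; each sample index appears in exactly one test fold.

```python
import random


def k_fold_indices(n, k=5, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    # Every k-th shuffled index goes to the same fold.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test
```

The model is trained on `train` and evaluated on `test` for each of the k pairs, and the spread of the per-fold scores indicates how stable the model is.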

The evaluation results show that our models for potential and heavy user prediction outperform the baseline model. The prediction of potential users performs better because potential users constitute a larger proportion of all WeChat users, which makes the prediction easier. When we classify the users into two groups, the mean performance increases by 1% to 2%. When we divide the users into more groups, the performance improves by 1% to 3%, but the additional gain is small. Performance differs between the user classes, since the classification method gathers similar users together, which makes them easier to predict. The difference between the folds is relatively small, indicating that the model is stable.

6. Conclusion

Inspired by evolutionary game theory, in this paper we have proposed a novel scheme for predicting users’ usage of new mobile services. We set out to understand the users’ experience of a new mobile service by extracting benefit-related features from the users’ behavior data. We then designed a training and prediction method for potential and heavy users. We classified the users into several groups so that a separate model could be trained for each group. Finally, we evaluated our approach in comprehensive experiments with data from real-world systems. The results showed that our approach can predict users’ future usage behavior with a high level of performance in a relatively short time. We hope that the concept addressed in this work will help future research in this direction.

Notations

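$n_{u,s}(t)$: Service use count for user $u$ using service $s$ at time $t$
$d_{u,s}(t)$: End-to-end delay for user $u$ using service $s$ at time $t$
$r_{u,s}(t)$: Connect rate for user $u$ using service $s$ at time $t$
$v_{u,s}(t)$: Transmit speed for user $u$ using service $s$ at time $t$
$S$: All service sets
$T_{u,s}$: All time period sets for user $u$ using service $s$
$c^{\mathrm{time}}_{u,s}(t)$: Time cost for user $u$ using service $s$ at time $t$
$c_{u,s}(t)$: Cost for user $u$ to use service $s$ at time $t$
$a_{u,s}(t)$: Service use amount for user $u$ using service $s$ at time $t$.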

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This research is supported in part by the National Key Research and Development Program of China under Grant no. 2016QY02D0202, NSFC under Grants nos. 61370233, 61422202, and 61433019, Foundation for the Author of National Excellent Doctoral Dissertation of China under Grant no. 201345, and Research Fund of Guangdong Province under Grant no. 2015B010131001.