Abstract
Customer churn prediction is one of the most challenging problems and a paramount concern for the telecommunication industry. With the increasing number of mobile operators, users can switch from one mobile operator to another if they are unsatisfied with the service. Marketing literature states that it costs 5–10 times more to acquire a new customer than to retain an existing one. Hence, effective customer churn management has become a crucial requirement for mobile communication operators. Researchers have proposed several classifiers and boosting methods, including deep learning (DL) algorithms, to control the customer churn rate. However, conventional classification algorithms follow an error-based framework that focuses on improving the classifier’s accuracy rather than on cost sensitivity. Typical classification algorithms treat all misclassification errors equally, which is not appropriate in practice. DL algorithms, meanwhile, are computationally expensive as well as time-consuming. In this paper, a novel class-dependent cost-sensitive boosting algorithm called AdaBoostWithCost is proposed to reduce the churn cost. This study presents an empirical evaluation of the proposed AdaBoostWithCost algorithm, which consistently outperforms the discrete AdaBoost algorithm on telecom churn prediction. The key focus of the AdaBoostWithCost classifier is to reduce false-negative errors and the misclassification cost more significantly than AdaBoost.
1. Introduction
In developing countries, smartphones play a significant role in daily life, and the number of mobile operators is rapidly increasing in every technologically advanced country. By the end of 2019, several billion people subscribed to mobile services, accounting for nearly two-thirds of the global population [1]. These ever-growing telecom operators offer various value-added subscriptions to retain their loyal customers. Hence, retaining customers with the same service provider has become uncertain. In the fiercely competitive wireless telecommunication industry, customers are free to migrate from one service provider to another. This phenomenon is known as churn. Common reasons for churn include dissatisfaction with services, such as unattractive recharge plans, frequent call drops, insufficient bandwidth, frequent customer care calls, unreachable networks, and slow Internet speed. In general, several techniques are used to address customer churn prediction, such as statistical learning [2], machine learning [3], evolutionary optimization techniques [4], and deep learning [5]. Boosting is an ensemble technique that attempts to create a robust classifier from several weak classifiers. AdaBoost (adaptive boosting) was the first successful boosting algorithm developed for binary classification to improve accuracy, and it has since become a widely used method across machine learning applications. However, AdaBoost is inherently a cost-insensitive boosting algorithm; therefore, it has limited applicability where different misclassification errors must be assigned different costs. This study attempts to mitigate this limitation.
In many real-world applications, such as anomaly detection scenarios including bank loan default prediction, telecom churn prediction, fraudulent bank transactions, domain feature retrieval [6], and rare disease identification, the problem of cost-sensitive classification is predominant. The critical reasons for rising telecom churn are technological development in telecommunications, liberalization, and aggressive competition. In a highly competitive market, mobile operators rely mainly on continuous profits from existing loyal customers. In practice, the cost of acquiring a new customer is five to ten times higher than the cost of retaining an existing one [7]. An increased churn rate is considered a plague on revenue generation, because losing a loyal customer means losing revenue. Therefore, loyal customer retention is now the leitmotiv of marketing strategy in the telecom industry. In many real-world applications with imbalanced datasets, misclassifying rare or minority classes is usually more expensive than misclassifying the majority classes, especially in telecom churn, medical diagnosis, and prognosis [8]. For effective customer churn management, it is essential to build an accurate churn prediction model.
Recently, cost-sensitive learning [9–14] has gained considerable interest. Given the widespread use of ensemble classifiers to improve accuracy, this paper proposes a misclassification cost-sensitive boosting algorithm as an extension of the widely adopted boosting method AdaBoost. This study empirically evaluates the cost-sensitive boosting method AdaBoostWithCost for predicting customer churn with higher accuracy than the basic AdaBoost classifier. Most classification algorithms treat all kinds of misclassification errors equally, which is not appropriate in many real applications. In telecom churn prediction, misclassifying a customer who will churn has a severe impact on revenue. Therefore, model accuracy may not be the correct measure for real-world cost-sensitive applications; instead of optimizing accuracy, the classification algorithm should minimize the total misclassification cost. Accordingly, the paper’s key focus is the theoretical basis and empirical evaluation of the proposed AdaBoostWithCost algorithm, which reduces the cumulative misclassification cost considerably better than AdaBoost.
1.1. Cost-Sensitive Learning
Cost-sensitive learning is a type of learning that takes misclassification costs into account [15]. Its primary objective is to minimize the cumulative misclassification cost. The key difference between cost-sensitive and cost-insensitive learning is that cost-sensitive learning treats different misclassification errors differently: the cost of labelling a positive example as negative can differ from the cost of labelling a negative example as positive, whereas cost-insensitive learning ignores misclassification costs. When researchers first confronted the variable-cost issue, they introduced cost-sensitive adjustments in binary classification settings [16]. Cost-sensitive learning is a distinct subfield of machine learning that takes the costs of prediction errors into account while training a model. One extra input, the cost matrix, is supplied in the model-building phase to construct cost-sensitive models. When the cost matrix is used in association with boosting, the approach is called cost-sensitive boosting.
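To make the role of the cost matrix concrete, the following Python sketch compares two hypothetical classifiers that make the same number of errors but incur very different costs. The labels, predictions, and the helper names `total_cost` and `error_count` are invented for illustration; the 1 : 10 false-positive to false-negative cost ratio and the C(predicted, actual) indexing mirror the conventions used later in this study, and the zero diagonal costs are an assumption.

```python
# Cost matrix indexed as COST[predicted][actual]; 1 = churn (positive), 0 = stay.
# Diagonal (correct) costs are assumed to be zero for this illustration.
COST = [[0, 10],   # predicted 0: TN costs 0, FN (actual 1) costs 10
        [1, 0]]    # predicted 1: FP (actual 0) costs 1, TP costs 0

def total_cost(actual, predicted):
    """Sum the cost-matrix entry for every (predicted, actual) pair."""
    return sum(COST[p][a] for a, p in zip(actual, predicted))

def error_count(actual, predicted):
    """Count misclassifications, ignoring their type."""
    return sum(a != p for a, p in zip(actual, predicted))

actual  = [1, 1, 0, 0, 0, 0, 0, 0]
model_a = [0, 0, 0, 0, 0, 0, 0, 0]  # misses both churners: 2 false negatives
model_b = [1, 1, 1, 1, 0, 0, 0, 0]  # catches both churners: 2 false positives

print(error_count(actual, model_a), total_cost(actual, model_a))  # 2 20
print(error_count(actual, model_b), total_cost(actual, model_b))  # 2 2
```

Both models have the same error count, so an error-based criterion cannot distinguish them, while the cost-based criterion strongly prefers `model_b`.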
1.1.1. The Problem of Class Imbalance
Most classification algorithms assume a proportionate distribution of examples across class labels, which is not always valid in practice. When the class distribution is highly skewed, the data are said to suffer from a class imbalance problem, and the task is known as an imbalanced classification problem. In this context, many classification learning algorithms have low predictive accuracy for the infrequent class [17]. In addition to assuming a balanced class distribution, most classifiers also assume that all types of misclassification error have equal costs. This assumption does not hold in many real-world applications, so predictive models developed with conventional machine learning algorithms can be biased and inaccurate. Researchers have therefore paid significant attention to minimizing the misclassification cost instead of the number of errors, and in recent years cost-sensitive learning has become a common approach to the class imbalance problem.
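A small illustration of why accuracy can mislead on imbalanced data: with a hypothetical 5 % minority class, a "classifier" that always predicts the majority class reaches 95 % accuracy while detecting none of the minority cases. The numbers are made up for illustration.

```python
# Hypothetical imbalanced labels: 1 = minority (churn), 0 = majority (stay).
labels = [1] * 5 + [0] * 95
# A degenerate "classifier" that always predicts the majority class.
predictions = [0] * len(labels)

accuracy = sum(p == y for p, y in zip(predictions, labels)) / len(labels)
# Fraction of actual minority cases that the classifier detects.
minority_recall = (sum(p == 1 and y == 1 for p, y in zip(predictions, labels))
                   / sum(y == 1 for y in labels))
print(accuracy, minority_recall)  # 0.95 0.0
```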
1.1.2. Issue of Cost Sensitivity
Over the past few years, it has been observed that most classification algorithms assume that all types of misclassification errors generated by a model have equal costs [36], which is often not the case for imbalanced classification problems. In class imbalance problems, wrongly predicting a positive (minority-class) case is worse than incorrectly classifying an example from the negative (majority) class. In recent years, cost-sensitive learning has drawn significant interest because of the growing number of applications that involve costs, such as customer churn prediction [18], fraud detection, and bank loan default prediction.
In Section 1, the problem faced by mobile operators is discussed along with the boosting algorithm AdaBoost, and Section 1.1 discusses cost-sensitive learning along with its problems and issues. Section 2 discusses various classification algorithms and popular cost-sensitive boosting algorithms. Section 3 then proposes AdaBoostWithCost with a detailed algorithm, equations, and explanation. Section 4 presents an empirical evaluation in which the algorithm is investigated on synthesized data generated from a telecom dataset. Finally, Section 5 presents the evaluation of the AdaBoostWithCost algorithm along with the empirical results and visualizations.
2. Related Works
In recent years, there have been countless applications of machine learning [19] and reinforcement learning [20] in diverse areas such as healthcare prediction [21], cloud resource management [22], and mobile robot navigation [23]. A significant surge has also been observed in cyber fraud and in the corresponding countermeasure models, alongside applications such as credit card fraud detection, telecom churn prediction [2–5], and rare medical disease detection. In these applications, classifiers must be trained to handle the most costly errors in preference to others. Many ensemble-based classifiers have been proposed that introduce misclassification costs into cost-sensitive classification, and various algorithms for cost-sensitive classification have been proposed over the past decades. Various authors have modified decision trees in different ways to account for class-dependent costs. In [24], the authors proposed a cost-sensitive boosting framework designed to optimize the loss function by applying cost-sensitive decision rules optimally. An adaptive cost bagging method was proposed in [25]. The doctoral dissertation [21] proposes cost-sensitive tree stacking, in which different decision trees are learned and then merged so that the cost function is minimized. In [26], a survey of cost-sensitive learning applications with decision trees as base classifiers is presented; the survey covers several types of cost-sensitive ensemble methods. The outline of the literature survey is described in Section 2.1.
2.1. Comparison and Discussion
This paper surveys the various cost-sensitive boosting classifiers mentioned below. There are various popular cost-sensitive boosting algorithms [27], such as Boosting [28], Uboost, CostUboost [29], AdaCost [30], and CostBoost [31], in addition to recently emerged algorithms such as , , , , and [32]. Here, CSE stands for Cost-Sensitive Extension. All ten algorithms are compared and summarized in Table 1. CostBoost [31] extends Boosting [28], and CostUboost [29] is a modified version of Uboost. The discrete AdaBoost is extended to , , and , whereas and are extensions of AdaCost. These algorithms differ in how they modify the weights in each iteration. AdaCost [17] (AdaBoost with Cost-Sensitive Adaptation) was an early attempt to make Freund and Schapire’s AdaBoost cost-sensitive. AdaCost is a misclassification cost-sensitive boosting classifier and a variant of AdaBoost that applies misclassification costs in each round of boosting to update the training distribution. The central idea of AdaCost is to incorporate the cost and produce classifiers that reduce the misclassification cost better than AdaBoost. The following are the weight update equations for these cost-sensitive boosting classifiers [33].
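Weight update equation for discrete AdaBoost is as follows:

$$D_{t+1}(i) = \frac{D_t(i)\,\exp\left(-\alpha_t\, y_i\, h_t(x_i)\right)}{Z_t}, \qquad \alpha_t = \frac{1}{2}\ln\frac{1-\epsilon_t}{\epsilon_t},$$

where $D_t(i)$ is the weight of data point $i$ at iteration $t$, $y_i, h_t(x_i) \in \{-1, +1\}$ are the actual and predicted labels, $\epsilon_t$ is the weighted misclassification error, and $Z_t$ normalizes the weights to sum to 1.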
Weight update equation for CSE_{1} is as follows:
Weight update equation for CSE_{2} is as follows:
Weight update equation for CSE_{3} is as follows:
In AdaBoost, no misclassification cost is included in the reweighting step. However, the misclassification cost is incorporated in the weight update equations of some cost-sensitive classifiers such as AdaCost, , and . The symbols defined in the weight update equations (1)–(3) and (4) are specified as follows. = cost of classification and where and = −1 if actual ≠ predicted, and +1 if actual = predicted.
Weight update equation for AdaCost, , and is as follows:
Here, is identical in , , , and AdaBoost, whereas for AdaCost and , where for and and for AdaCost and . Furthermore, does not include in the calculation of [33]. The above weight update equations show that the cost parameter is applied equally to all kinds of misclassification error (false-positive and false-negative) in each boosting round; that is, all of these algorithms give both error types equal weight when reducing the cumulative misclassification cost. Table 1 summarizes the survey of the ten cost-sensitive boosting algorithms.
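The contrast drawn above can be sketched in Python. `adaboost_reweight` implements the standard discrete AdaBoost update $D_{t+1}(i) \propto D_t(i)\exp(-\alpha_t y_i h_t(x_i))$; `uniform_cost_reweight` is a hypothetical, simplified variant (not any specific CSE or AdaCost equation from Table 1) in which every misclassified example is multiplied by the same constant cost. In both cases a false negative and a false positive end up with identical weights.

```python
import math

def adaboost_reweight(weights, y, h):
    """One discrete AdaBoost round; labels y and predictions h are in {-1, +1}."""
    eps = sum(w for w, yi, hi in zip(weights, y, h) if yi != hi)  # weighted error
    alpha = 0.5 * math.log((1 - eps) / eps)
    new = [w * math.exp(-alpha * yi * hi) for w, yi, hi in zip(weights, y, h)]
    z = sum(new)  # normalization factor Z_t
    return [w / z for w in new], alpha

def uniform_cost_reweight(weights, y, h, cost):
    """Hypothetical simplified variant: every misclassified example, false positive
    or false negative alike, is multiplied by the same constant cost."""
    new = [w * (cost if yi != hi else 1.0) for w, yi, hi in zip(weights, y, h)]
    z = sum(new)
    return [w / z for w in new]

# +1 = churn. Example 0 is a false negative, example 2 a false positive.
y = [+1, +1, -1, -1, -1]
h = [-1, +1, +1, -1, -1]
w_ada, alpha = adaboost_reweight([0.2] * 5, y, h)
w_cost = uniform_cost_reweight([0.2] * 5, y, h, cost=2.0)
print(w_ada[0] == w_ada[2], w_cost[0] == w_cost[2])  # True True
```

Neither update can tell the false negative apart from the false positive, which is the limitation the proposed method targets.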
3. Proposed Clairvoyant Method
Different methodologies have been studied, and the most appropriate one is selected for this paper. In practice, there are two schools of thought for dealing with misclassification costs. The first addresses cost sensitivity by preprocessing the data with various sampling techniques that increase the influence of the desired samples; these preprocessing techniques rely on the examples in the training dataset to minimize cost. The second school of thought handles the problem more directly by building cost-sensitive adjustments into the algorithm itself, modifying existing machine learning algorithms to use the cost matrix. This approach has gained significant popularity in practice. In the second methodology, for example in AdaBoost and AdaCost, meta-classifiers are extended to incorporate the cost of misclassification in the weight update step [34]. AdaBoost (adaptive boosting) is a statistical classification meta-algorithm that tweaks subsequent learners in favour of instances misclassified by previous classifiers. AdaCost, in contrast, is a misclassification cost-sensitive boosting method and a variant of AdaBoost. AdaBoostWithCost combines AdaBoost and AdaCost to improve performance. The proposed algorithm belongs to the second methodology described above.
3.1. AdaBoostWithCost
Misclassification cost is not used in AdaBoost’s weight update rule. In many other methods, the weight-updating rule increases the weights of wrong classifications more aggressively by applying a constant misclassification cost equally to all misclassification errors (both false positives and false negatives) in each boosting round. Such a framework assumes that all misclassification errors carry the same cost. The proposed AdaBoostWithCost method instead applies the misclassification cost specifically to costly high-risk errors (false negatives in the telecom churn study) rather than applying a constant cost to all misclassification errors in each boosting iteration. The algorithm therefore focuses on class-dependent cost sensitivity: cumulative misclassification costs are reduced by assigning higher weights to costly high-risk errors than to low-risk errors. The proposed algorithm AdaBoostWithCost is illustrated in Algorithm 1.

3.2. Definitions of Symbols
All mathematical symbols and parameters used in the equations of the proposed AdaBoostWithCost algorithm (described above) and in the flowchart shown in Figure 1 are described in Table 2. The inventive steps are described as follows. The central idea of the proposed AdaBoostWithCost algorithm is to increase the weights of costly misclassified data points more aggressively than those of correctly classified data points. Hence, the weight-updating rule increases the weights of false negatives more than those of false positives, since the false-negative error is more significant in telecom churn prediction. In the above AdaBoostWithCost algorithm, steps 7 and 12 constitute the novel steps. The weight update equation in each boosting round of AdaBoostWithCost is as follows:
In the above equation, denotes the new probability assigned to the data point at iteration and represents the distribution of data point at iteration t. The exponential loss function in the weight update equation is denoted by and consists of two components or subexpressions:
(1) The first subexpression is
(2) The second subexpression, which involves the cost and the false-negative misclassification error, is
It is worth mentioning that the value of the expression will be positive if is negative, because the leading negative sign changes negative to positive (since is always positive). In other words, the expression becomes positive whenever the model misclassifies a sample and negative whenever it classifies a sample correctly. To understand the reweighting formula, consider a misclassification, where (wrong prediction); the expression is then positive because always . Similarly, for an accurate prediction, (correct prediction), so the expression becomes negative by the same logic. Therefore, the first subexpression is identical to AdaBoost’s weight update equation: AdaBoost increases the weights of the data points that have been misclassified by earlier models and decreases the weights of those that have been classified correctly, so that the algorithm can focus more on the misclassified samples in subsequent iterations. In contrast, the second subexpression incorporates the cost derived from the supplied cost matrix (described in Section 4.3) and the parameter , which represents the false-negative error at iteration (whereas is the total misclassification error used in the first subexpression). In the subexpression , the cost computation component is . The other component is evaluated in the same way as in the first subexpression. Hence, the subexpression will be positive if is negative, because the leading negative sign changes negative to positive, and it is multiplied by the cost for the false-negative error (denoted by ). Since both and are always positive, the sign of the entire expression depends on the sign of , as described above.
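A worked example of the sign logic for the first (AdaBoost) subexpression, using an arbitrarily chosen $\alpha_t = 0.5$ (an illustrative value, not one from the study):

$$
\begin{aligned}
\text{misclassified: } & y_i h_t(x_i) = -1 \;\Rightarrow\; -\alpha_t\, y_i h_t(x_i) = +0.5 \;\Rightarrow\; e^{0.5} \approx 1.649,\\
\text{correct: } & y_i h_t(x_i) = +1 \;\Rightarrow\; -\alpha_t\, y_i h_t(x_i) = -0.5 \;\Rightarrow\; e^{-0.5} \approx 0.607.
\end{aligned}
$$

Before normalization by $Z_t$, a misclassified point’s weight is therefore multiplied by about 1.649 and a correctly classified point’s weight by about 0.607, a ratio of $e^{1} \approx 2.718$.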
Therefore, in the second subexpression , the multiplication of the cost by specifically for the false-negative error (denoted by ) is the nucleus of the inventive step. The central idea of AdaBoostWithCost is to add an extra cost specifically for the false-negative error, so that the weights of false negatives are boosted further in addition to the normal weight update performed by AdaBoost. This second subexpression ensures that, to reduce misclassification costs, costly high-risk errors receive higher weights than low-risk errors. In short, the weight-updating rule of AdaBoostWithCost increases the weights of costly misclassified samples more aggressively than those of correctly classified samples. The flowchart for AdaBoostWithCost is depicted below; its inventive step is highlighted to show how AdaBoostWithCost incorporates the cost into the reweighting equation. Table 3 summarizes the key differences between the weight update equations of AdaBoost and AdaBoostWithCost.
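Because Algorithm 1 and the exact weight update equation are not reproduced in this text, the following Python/NumPy sketch shows one possible reading of the update described above; it is not the authors’ implementation. All of the following are assumptions: labels are in {−1, +1} with +1 meaning churn; the weak learners are decision stumps; and the second subexpression is taken to be $-c\,\epsilon^{fn}_t\,y_i h_t(x_i)$, applied only to false negatives, where $c$ is the normalized false-negative cost (the text quotes 0.2) and $\epsilon^{fn}_t$ is the weighted false-negative error. The function names `fit_stump`, `reweight`, `adaboost_with_cost`, and `predict` are hypothetical.

```python
import numpy as np

def fit_stump(X, y, w):
    """Pick the (feature, threshold, polarity) stump with the lowest weighted error."""
    best_err, best = np.inf, None
    for j in range(X.shape[1]):
        for thr in np.unique(X[:, j]):
            for pol in (1, -1):
                pred = np.where(X[:, j] <= thr, pol, -pol)
                err = w[pred != y].sum()
                if err < best_err:
                    best_err, best = err, (j, thr, pol)
    return best

def stump_predict(X, stump):
    j, thr, pol = stump
    return np.where(X[:, j] <= thr, pol, -pol)

def reweight(D, y, h, fn_cost):
    """One reweighting round. First term: standard AdaBoost. Second term (ASSUMED
    form): an extra cost-scaled boost applied to false negatives only."""
    eps = float(np.clip(D[h != y].sum(), 1e-10, 1 - 1e-10))  # total weighted error
    alpha = 0.5 * np.log((1 - eps) / eps)
    fn = (y == 1) & (h == -1)                # false negatives (missed churners)
    eps_fn = D[fn].sum()                     # weighted false-negative error
    exponent = -alpha * y * h                # first subexpression
    exponent[fn] += -fn_cost * eps_fn * y[fn] * h[fn]  # second subexpression
    new_D = D * np.exp(exponent)
    return new_D / new_D.sum(), alpha        # divide by Z_t

def adaboost_with_cost(X, y, rounds=10, fn_cost=0.2):
    D = np.full(len(y), 1.0 / len(y))        # uniform initial distribution
    ensemble = []
    for _ in range(rounds):
        stump = fit_stump(X, y, D)
        h = stump_predict(X, stump)
        D, alpha = reweight(D, y, h, fn_cost)
        ensemble.append((alpha, stump))
    return ensemble

def predict(X, ensemble):
    """Sign of the alpha-weighted vote of all stumps."""
    score = sum(alpha * stump_predict(X, s) for alpha, s in ensemble)
    return np.where(score >= 0, 1, -1)
```

With `fn_cost=0` the update reduces to plain discrete AdaBoost; with a positive `fn_cost`, a false negative ends a round with more weight than a false positive that had the same prior weight.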
3.3. Empirical Evaluation Parameters
The choice of measurement indices is of paramount importance in evaluating a classifier’s performance, and different performance metrics suit different classification algorithms. In the context of the current study, the false-negative classification error plays a pivotal role in telecom churn prediction, so the empirical evaluation focuses primarily on false-negative error counts. The study also evaluates two further parameters that are highly relevant in this context: misclassification cost and mean misclassification cost. These metrics are used to compare the proposed cost-sensitive boosting algorithm AdaBoostWithCost with AdaBoost. The cost of each class error is shown in the confusion matrix in Table 4, which is supplied as an input to measure the total misclassification cost. The normalized weight distribution with respect to cost is shown in Table 5. More details about the cost matrix and the weight normalization method are given in Section 4.3.
4. Empirical Evaluation
4.1. Data Selection
The telecom dataset used in the investigations was taken from Kaggle [35]. It contains 3,335 rows (Call Data Records) and 21 columns (attributes). The data describe various customer behaviours, and the last column states whether the customer is still with the telecom company. However, the study requires synthetic data (about 100,000 samples) to achieve its objective.
4.2. Generating Synthesized Data
The objective of the experiment is to empirically evaluate the performance of the proposed classifier AdaBoostWithCost on a large volume of data, which requires synthesized data. The idea is to generate enough synthesized data points (about 100,000), that is, Call Data Records (CDRs), to compare the robustness of the AdaBoostWithCost method against discrete AdaBoost. The Kaggle dataset has only 21 features and 3,335 CDRs, which is not sufficient for the study’s objective. Hence, synthetic data are generated from the Kaggle source data by oversampling it using Weka [33], which expands the source data from 3,335 CDR observations to 100,000, an amount adequate for the investigations.
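The text does not say which Weka filter or settings were used, so the following Python sketch only illustrates the general idea with plain random oversampling (drawing source records uniformly with replacement) up to the target size. The `oversample` function and the stand-in records are hypothetical, and this is not a reproduction of the Weka procedure.

```python
import random

def oversample(rows, target_size, seed=0):
    """Random oversampling: append records drawn uniformly with replacement
    until target_size records exist. The original rows are kept first."""
    rng = random.Random(seed)
    extra = [rng.choice(rows) for _ in range(target_size - len(rows))]
    return rows + extra

# Stand-in for the 3,335 Kaggle CDRs (each record reduced to an id here).
source = [{"id": i} for i in range(3335)]
synthetic = oversample(source, 100_000)
print(len(synthetic))  # 100000
```

Sampling with replacement only duplicates existing records; techniques such as SMOTE instead interpolate new feature vectors between neighbouring examples.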
4.3. The Input Cost Matrix and Weight Normalizations
Cost-sensitive machine learning methods explicitly use the cost matrix as an input when building cost-sensitive classifiers. Fundamentally, the cost matrix assigns a cost to each cell of the confusion matrix. The effectiveness of cost-sensitive learning depends strongly on the supplied cost matrix, and its parameters are of the utmost importance in both the training and prediction steps [36]. In most cost-sensitive boosting algorithms, the cost matrix is supplied in the model-building phase, and the classifier modifies the weight update equation to incorporate the misclassification cost derived from it. Defining the cost matrix can be challenging because it is domain-specific. In telecom churn prediction, a model is used to predict which customers are likely to abandon a service provider. In this context, failing to detect an actual churning customer (false negative) has a more serious economic impact than failing to identify a non-churning customer correctly (false positive). Hence, the proposed cost-sensitive boosting algorithm specifically focuses on reducing the cumulative high-risk (false-negative) misclassification error, and the cost matrix parameters are defined accordingly.
Ideally, an accurate cost matrix should be defined by a domain expert or economist. In this study, since the incorrect prediction of a churning customer (false negative) has a bigger influence, the proposed AdaBoostWithCost algorithm focuses specifically on reducing high-risk, costly errors. The cost allocated to each class is shown in the cost matrix in Table 6. Literature surveys indicate that telecom experts generally regard the false-negative classification error as 5 to 10 times more expensive than the false-positive error. Considering a worst-case (extreme) scenario for the telecom industry, this study assigns the false-negative error a cost ten times that of the false-positive error; that is, the cost ratio of false-positive to false-negative errors is 1 : 10. The study runs three different sets of iterations for the empirical evaluation of AdaBoostWithCost and AdaBoost. Note that Table 4 depicts a hypothetical cost matrix that is supplied as an input to the AdaBoostWithCost algorithm and used in the weight update equation to calculate the misclassification cost. In Table 4, the notation C() denotes cost: in C(x, y), the first parameter x is the predicted class and the second parameter y is the actual class. Table 4 also represents the confusion matrix, with each cell labelled by its acronym; for example, false positive is FP. Accordingly, the cost of a false positive is denoted by C(1, 0) and the cost of a false negative by C(0, 1).
Table 6 depicts the cost matrix that is supplied as input to the AdaBoostWithCost algorithm and used in the weight update equation. The assignment of a cost to each cell of the confusion matrix is referred to as the cost matrix. Cell C(0, 1) represents the cost of the false-negative error, whereas the false-positive error is designated by cell C(1, 0). Consequently, cell C(0, 1) is assigned a cost of 10 and cell C(1, 0) a cost of 1, in line with the discussion above (the false-negative error is considered 10 times more costly than the false-positive error). Table 6 shows the value of each cell of the cost matrix.
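The C(x, y) convention (x = predicted class, y = actual class) can be captured in a few lines of Python. The off-diagonal values are the study’s (C(1, 0) = 1, C(0, 1) = 10); the zero costs on the diagonal are an assumption, since only the two error cells are specified in the text. The helper `cell_name` is hypothetical.

```python
# Cost matrix keyed by (predicted, actual); 1 = churn, 0 = no churn.
COST = {
    (0, 0): 0,   # true negative  (assumed zero cost)
    (1, 1): 0,   # true positive  (assumed zero cost)
    (1, 0): 1,   # false positive: predicted churn, customer actually stayed
    (0, 1): 10,  # false negative: predicted stay, customer actually churned
}

def cell_name(predicted, actual):
    """Return the confusion-matrix acronym for a (predicted, actual) pair."""
    if predicted == actual:
        return "TP" if actual == 1 else "TN"
    return "FP" if predicted == 1 else "FN"

for cell, cost in sorted(COST.items()):
    print(cell, cell_name(*cell), cost)
# (0, 0) TN 0
# (0, 1) FN 10
# (1, 0) FP 1
# (1, 1) TP 0
```

Note that some libraries (for example scikit-learn’s `confusion_matrix`) put the actual class on the rows, so the index order should be checked before reusing a cost matrix across tools.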
Although the cost matrix has four cells, the true-positive and true-negative cells do not play an important role in telecom churn prediction, and the false-positive cell has only a minor impact in the context of this study. The only significant parameter is the false-negative classification, which has a serious impact in telecom churn modeling; hence the high value of 10 assigned to cell C(0, 1). Calibrating the weight distribution with respect to cost is essential for the weight update step in AdaBoostWithCost. The normalization (rescaling) method that transforms the false-negative value into a weight distribution is given in Table 5. To use the cost matrix in the proposed classifier, its cell values must be rescaled to the range 0 to 1. This normalization, or calibration [37], is an essential step for the weight update operation in the reweighting equation of the AdaBoostWithCost algorithm, as it ensures that the weight (probability) of each training data point lies between 0 and 1. The investigation centres on a false-negative cost of 10 and its corresponding weight of 0.2, highlighted in Table 5.
4.4. Experimental Method
The investigations estimate three measures of utmost importance for telecom churn prediction (false-negative errors, misclassification cost, and mean misclassification cost) to assess the performance of the proposed AdaBoostWithCost classifier. The empirical evaluation benchmarks AdaBoostWithCost against AdaBoost in two respects. First, it measures the performance metrics: false-negative errors, misclassification cost, and mean misclassification cost (the average misclassification cost across all sets of iterations). Second, it plots the misclassification error rate (both training and test error rates) over multiple boosting rounds. For the second criterion, the study computes the training and test misclassification error rates for each boosting round and plots them to compare the performance curves of the AdaBoostWithCost classifier and the basic AdaBoost classifier. The input cost matrix for each category of error is defined in Table 4. The false-negative error is the foremost interest of this study, since it significantly affects revenue generation in telecom churn prediction. False-positive errors are given less weight in the experiment, since they are insignificant compared with false-negative errors in this context.
The literature states that the false-negative classification error is generally 5–10 times more costly than the false-positive classification error in telecom churn modeling. This study considers the worst-case scenario for the telecom industry, that is, it presumes the most severe impact on service providers’ revenue from incorrect false-negative classifications. Accordingly, the experiment assigns the false-negative error a cost ten times (the highest possible impact on business) that of the false-positive error. Cell C(0, 1) of the cost matrix represents the cost of false-negative errors, whereas the false-positive error is designated by cell C(1, 0); consequently, cell C(0, 1) is assigned a cost of 10 and cell C(1, 0) a cost of 1. When estimating the three critical performance metrics, the cost matrix must be rescaled, or normalized, to the range 0 to 1. This normalization (probability calibration) [37] is mandatory for the weight update operation in the reweighting equation of the AdaBoostWithCost algorithm, because the weight (probability) of each data point lies between 0 and 1. The normalization method for transforming the cost matrix’s false-negative value into a weight distribution is given in Table 5.
The first aspect of the empirical evaluation described above uses three sets of iterations (10, 20, and 40) to measure the performance metrics: the false-negative errors, misclassification cost, and mean misclassification cost. For each set of iterations, the misclassification cost of the AdaBoostWithCost algorithm is computed as

$$\text{Misclassification cost} = \sum_{x \in \{0,1\}} \sum_{y \in \{0,1\}} CM(x, y) \times C(x, y),$$

where CM is the confusion matrix and C(row_index, col_index) is the cost of the corresponding cell.
The study computes the cumulative misclassification cost iteration-wise:
(a) Cumulative misclassification cost at the end of the 10th iteration
(b) Cumulative misclassification cost at the end of the 20th iteration
(c) Cumulative misclassification cost at the end of the 40th iteration
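Steps (a) to (c) can be sketched in Python, together with their average, which is the mean misclassification cost defined next. This sketch reads "cumulative" as the cost of the ensemble accumulated up to that boosting round; that reading, the row = predicted / column = true convention, and the toy predictions are all assumptions, and the function names are hypothetical.

```python
def confusion_matrix_pred_by_true(y_true, y_pred):
    """2x2 counts with rows = predicted class and columns = true class (assumed convention),
    so cell [0][1] counts false negatives and [1][0] false positives (1 = churner)."""
    cm = [[0, 0], [0, 0]]
    for t, p in zip(y_true, y_pred):
        cm[p][t] += 1
    return cm


def costs_at_checkpoints(y_true, staged_preds, cost, checkpoints=(10, 20, 40)):
    """Misclassification cost of the ensemble at the end of each checkpoint round.

    staged_preds[k] holds the ensemble's predictions after boosting round k + 1.
    """
    costs = {}
    for r in checkpoints:
        cm = confusion_matrix_pred_by_true(y_true, staged_preds[r - 1])
        costs[r] = sum(cm[i][j] * cost[i][j] for i in range(2) for j in range(2))
    return costs


COST = [[0, 10], [1, 0]]  # C[0][1]: false negative, C[1][0]: false positive

# Hypothetical toy run: 4 churners and 6 non-churners; missed churners drop
# from 3 to 2 to 1 as rounds increase, with one false positive throughout.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]


def toy_prediction(round_no):
    missed = 3 if round_no < 20 else 2 if round_no < 40 else 1
    return [0] * missed + [1] * (4 - missed) + [1] + [0] * 5


staged_preds = [toy_prediction(k) for k in range(1, 41)]
costs = costs_at_checkpoints(y_true, staged_preds, COST)
mean_cost = sum(costs.values()) / len(costs)  # (a + b + c) / 3
print(costs, mean_cost)  # {10: 31, 20: 21, 40: 11} 21.0
```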
The mean misclassification cost is the cumulative misclassification cost of all sets of iterations divided by the number of sets:

$$\text{Mean misclassification cost} = \frac{a + b + c}{3},$$

where a, b, and c are the cumulative misclassification costs obtained in steps (a), (b), and (c) above, one for each of the three sets of iterations (10, 20, and 40) used in the experiment.

The second aspect of the empirical evaluation is to visualize the misclassification error rates, for both training and test errors, by plotting graphs. A salient feature of the investigation is that it shows how the training and test error rates change over each set of boosting rounds.
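For the second aspect of the evaluation, the per-round training and test error rates can be collected as in the sketch below. AdaBoostWithCost itself is not reproduced here: this is plain discrete AdaBoost with decision stumps (the baseline side of the comparison) on synthetic data, and every function name is hypothetical. The resulting lists could be passed to any plotting library to draw error-rate curves against the boosting round.

```python
import numpy as np


def stump_predict(X, f, thr, pol):
    """+1 on one side of the threshold on feature f, -1 on the other; pol picks the side."""
    return np.where(pol * (X[:, f] - thr) >= 0, 1, -1)


def fit_stump(X, y, w):
    """Exhaustively pick the threshold stump with the lowest weighted error (y in {-1, +1})."""
    best = (np.inf, 0, 0.0, 1)  # (weighted error, feature, threshold, polarity)
    for f in range(X.shape[1]):
        for thr in np.unique(X[:, f]):
            for pol in (1, -1):
                err = w[stump_predict(X, f, thr, pol) != y].sum()
                if err < best[0]:
                    best = (err, f, thr, pol)
    return best


def adaboost_error_curves(X_tr, y_tr, X_te, y_te, n_rounds):
    """Plain discrete AdaBoost; returns training and test error rates after every round."""
    w = np.full(len(y_tr), 1.0 / len(y_tr))
    score_tr = np.zeros(len(y_tr))
    score_te = np.zeros(len(y_te))
    train_err, test_err = [], []
    for _ in range(n_rounds):
        err, f, thr, pol = fit_stump(X_tr, y_tr, w)
        err = float(np.clip(err, 1e-10, 1 - 1e-10))  # avoid log(0) for a perfect stump
        alpha = 0.5 * np.log((1 - err) / err)
        h_tr = stump_predict(X_tr, f, thr, pol)
        w = w * np.exp(-alpha * y_tr * h_tr)  # standard cost-insensitive reweighting
        w /= w.sum()
        score_tr += alpha * h_tr
        score_te += alpha * stump_predict(X_te, f, thr, pol)
        train_err.append(float(np.mean(np.where(score_tr >= 0, 1, -1) != y_tr)))
        test_err.append(float(np.mean(np.where(score_te >= 0, 1, -1) != y_te)))
    return train_err, test_err


# Synthetic data (not the paper's churn data): diagonal boundary with 10% label noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = np.where(X[:, 0] + X[:, 1] > 0, 1, -1)
flip = rng.random(200) < 0.1
y[flip] = -y[flip]

train_err, test_err = adaboost_error_curves(X[:150], y[:150], X[150:], y[150:], n_rounds=40)
for r in (10, 20, 40):
    print(f"round {r}: train {train_err[r - 1]:.3f}, test {test_err[r - 1]:.3f}")
```

Tracking the running ensemble score rather than refitting at each checkpoint means one pass of 40 rounds yields the error rates for every set of iterations (10, 20, and 40) at once.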
5. Results and Discussion
5.1. The Evaluation of AdaBoostWithCost and AdaBoost
The error summary of the experimental results focuses on three important performance metrics: the total misclassification error, the false-negative error count, and the training and test error rates. Inspection of this summary shows that the values of all three performance metrics consistently decrease over the sets of 10, 20, and 40 boosting rounds. In particular, the false-negative error, the parameter of utmost importance in this study, is reduced significantly at each successive interval of boosting rounds.
5.2. Interpretation of Empirical Results and Visualizations
The proposed AdaBoostWithCost algorithm and the AdaBoost classifier were evaluated on the three performance metrics that are crucial in the context of this study. The error summary is shown in Table 7, which reveals a significant difference in the experimental results of AdaBoostWithCost and AdaBoost. AdaBoostWithCost significantly reduced the false-negative error count compared with the traditional boosting classifier AdaBoost. Hence, the summarized results show that AdaBoostWithCost outperforms AdaBoost in false-negative error reduction, the most influential parameter in the context of the study.
Figure 2 shows that the misclassification error rates of both classifiers decrease monotonically as the number of iterations increases. However, the steeper descent of the dark blue line (AdaBoostWithCost) indicates that AdaBoostWithCost reduces the error rate faster than traditional AdaBoost, and Figure 2 also shows that AdaBoostWithCost eventually reaches a lower error rate than AdaBoost. The side-by-side graphs show the decreasing pattern of the training and test error rates with each set of iterations for both the AdaBoost and AdaBoostWithCost classifiers: both rates decrease gradually and monotonically as the number of iterations increases. On closer inspection, the gap between the two lines (training and test error rates) indicates that AdaBoostWithCost reduces the training and test error rates considerably faster than the traditional AdaBoost classifier. The study also concludes from Figure 3 that the AdaBoostWithCost model does not tend to overfit, whereas the AdaBoost classifier may overfit slightly.
6. Conclusion
Cost-sensitive learning is not new to the machine learning community, but in recent years it has gained considerable popularity because of the rising demand for critical real-world cost-sensitive applications. Today's state-of-the-art machine learning algorithms are not designed with financial goals in mind, in the sense that the models do not include the real financial costs during the training and evaluation phases. In telecom churn prediction, evaluating a model with a traditional measure such as accuracy does not yield the best results when performance is measured by the actual financial cost: failing to detect true churners hurts telecom operators' revenue far more than incorrectly predicting a non-churning customer as a churner. This paper set out to address the challenges of class-dependent cost-sensitive classification and to handle business-specific cost sensitivity. It surveyed various cost-sensitive boosting algorithms and summarized their comparison in Table 1, and it discussed the weight update equations those cost-sensitive classifiers use when dealing with errors of differing cost. Beyond this survey, the study makes two distinct contributions to class-dependent cost-sensitive boosting classification. First, it devised a novel class-dependent cost-sensitive boosting algorithm, AdaBoostWithCost, whose inventive step lies in the weight update equation, which incorporates a unique cost function. The AdaBoostWithCost classifier applies the misclassification cost in the reweighting equation specifically to the high-risk errors (false-negative errors in the telecom churn case) rather than directly to all misclassification errors in each boosting iteration.
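To make the contrast in the last sentence concrete, the hypothetical sketch below applies an extra cost factor only to false negatives during one reweighting step, leaving all other errors under the standard AdaBoost update. It is not the AdaBoostWithCost equation; `reweight` and `fn_factor` are illustrative names, and how the paper maps the normalized costs into its own equation is not reproduced here.

```python
import numpy as np


def reweight(w, y, h, alpha, fn_factor=1.0):
    """One boosting reweighting step with labels and predictions in {-1, +1} (+1 = churner).

    fn_factor = 1.0 gives the standard discrete AdaBoost update. fn_factor > 1.0
    additionally up-weights false negatives only (y = +1, h = -1).
    HYPOTHETICAL illustration; this is not the AdaBoostWithCost equation.
    """
    w_new = w * np.exp(-alpha * y * h)
    false_negative = (y == 1) & (h == -1)
    w_new[false_negative] *= fn_factor
    return w_new / w_new.sum()


w = np.full(4, 0.25)
y = np.array([1, 1, -1, -1])   # two churners, two non-churners
h = np.array([-1, 1, 1, -1])   # sample 0 is a false negative, sample 2 a false positive
alpha = np.log(2.0)

w_std = reweight(w, y, h, alpha)                  # FN and FP weights end up equal
w_cost = reweight(w, y, h, alpha, fn_factor=2.0)  # FN weight becomes twice the FP weight
print(w_std, w_cost)
```

With the standard update, the missed churner and the false alarm receive the same weight; with the false-negative factor, the next weak learner is pushed harder toward the missed churner only.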
Second, the study carried out an in-depth inspection of the experimental results summarized in Table 7 and interpreted the graph visualizations (Figures 2 and 3). From these, the study concludes that the AdaBoostWithCost algorithm consistently outperforms AdaBoost on all of the study's objectives.
Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.
Conflicts of Interest
The authors declare that they have no conflicts of interest.