Wei Li, Youmeng Luo, Chao Tang, Kaiqiang Zhang, Xiaoyu Ma, "Boosted Fuzzy Granular Regression Trees", Mathematical Problems in Engineering, vol. 2021, Article ID 9958427, 16 pages, 2021. https://doi.org/10.1155/2021/9958427
Boosted Fuzzy Granular Regression Trees
Regression is an important problem in machine learning and has been widely applied in fields such as meteorology, transportation, and materials science. Granular computing (GrC) is an effective approach to modeling human intelligent information processing and is well suited to knowledge discovery. Ensemble learning is easy to execute in parallel. Based on granular computing and ensemble learning, we equivalently convert the regression problem into a granular space and propose boosted fuzzy granular regression trees (BFGRT) to predict a test instance. The idea of BFGRT is as follows. First, a clustering algorithm with automatic optimization of cluster centers is presented. Next, building on this clustering algorithm, we employ MapReduce to perform fuzzy granulation of the data in parallel. Then, we design new operators and metrics on fuzzy granules to build a fuzzy granular rule base. Finally, a fuzzy granular regression tree (FGRT) in the fuzzy granular space is presented. On this basis, BFGRT is constructed by combining multiple FGRTs in parallel via random attribute sampling and MapReduce. Theoretical analysis and experiments show that BFGRT is accurate, efficient, and robust.
Learning ability is a basic feature of human intelligence. Prediction is the human ability to judge the future on the basis of learning, and it is a concrete manifestation of that learning ability. Prediction takes two concrete forms, regression and classification, which are among the fundamental problems in machine learning, data mining, and statistics. The primary research goal of the regression problem is to train a learner on existing data. Such a learner helps people discover the laws of development and change from massive historical data and thereby make scientific, quantitative predictions about the future. In classification problems, the target output takes values in a finite discrete space, and these values can be either ordered or unordered. In regression problems, the range of the output variable is ordered and continuous.
Regression based on ensemble learning has been a hot topic in machine learning in recent years and has received extensive attention. Nevertheless, the application of ensemble learning to regression problems is still unsatisfactory and needs further research. Real-world regression problems often arise from very complex socioeconomic systems, on which various intricate internal and external factors have linear or nonlinear effects. Some of these factors are inherent, and some are accidental. A single learner can only learn a certain type of data well, so it is difficult to obtain satisfactory learning results. Especially in the big data environment, traditional regression learning algorithms cannot meet the requirements of massive, complex data in terms of predictive performance and scalability. Although great progress has been made in the theory and technology of machine learning, continuously improving the generalization ability and learning efficiency of learners remains an important issue and an ongoing pursuit of machine learning research.
GrC is an effective method for simulating human problem-solving and for handling big data analysis tasks. It abstracts and divides a complex problem into several simpler ones, which helps to analyze and solve it. Combining granular computing with ensemble learning is therefore a promising way to solve regression problems.
As the first of the four main research directions in machine learning, ensemble learning trains several different single learners on the training dataset and then combines their prediction results into the final output. Therefore, in most cases, it generalizes better and is more stable than a single learner. One of the main theoretical foundations of ensemble learning is that a weak learner can be upgraded to a strong learner. Kearns and Valiant gave the concepts of weak learning and strong learning from the perspective of classification problems. Avnimelech and Intrator introduced these concepts into the regression problem and proved the equivalence of strong and weak learning in that setting. Another major theoretical basis for ensemble learning is the “No free lunch” theorem proposed by Wolpert. The implementation of ensemble learning has received extensive attention from researchers and has produced a number of results. These results can be summarized into two categories: one is the direct strategy and the other is the overproduce-and-choose approach. Liu and his colleagues proposed an ensemble learning method via negative correlation learning. Fang et al. proposed a selective boosting ensemble learning algorithm. Breiman obtained multiple different training datasets by repeatedly sampling the original dataset. Schapire proposed the boosting method, whose main idea is to continuously strengthen the learning of “difficult samples” in the iterative learning process. Ho presented the random subspace method, which uses different subsets of the feature space to train multiple learners. This method differs from bootstrap sampling, boosting, and cross-validation approaches in that it emphasizes the differences between feature subsets. Breiman designed the output smearing method for regression problems; its primary idea is to inject Gaussian noise into the output variable. The same method is also used to manipulate input variables. Gheyas and Smith presented a dynamic weighted combination method, which dynamically adjusts the weights according to the predictive performance of the individual learners. Research on ensemble learning for regression problems started late, and there are relatively few results in applications, such as power load forecasting [13, 14].
Granular computing has been a very popular research direction in computational intelligence over the past few decades. The core task of GrC is to construct, represent, and process information granules. The information granule is the foundational element of GrC. It is a set of elements gathered together according to indistinguishability, similarity, or functionality. As a key component of knowledge representation and processing, information granules always appear together with information granulation, which occurs in the process of abstracting data or inducing knowledge from information. Information (data) forms information granules through information granulation. Common representations of information granules include intervals, fuzzy sets, and rough sets. The purpose of information granulation is to separate a complex problem into several simpler problems, which allows us to capture the details of the problem. Information granulation and information granules pervade human cognition, decision-making, and reasoning and are closely related to information granularity. For example, in daily life and work, people usually use different time intervals such as day, month, and year to granulate time and obtain information granules of different sizes, and the size of the resulting granules reflects the level of information granularity used in granulation. Through abstraction of the problem, “finer” and “more special” information granules are transformed into “coarser” and “more general” ones, and “coarser” and “more general” information granules can in turn be refined into “finer” and “more special” ones. Through this transformation between information granules, GrC helps people analyze and observe the same problem at very different granularities and finally find the most suitable level for analysis and problem solving.
GrC mainly consists of three parts: the granule, the granular layer, and the granular structure. Granules are the foundational elements of GrC models. The granular layer is an abstract description of the problem space according to a certain granulation criterion. The granular structure is an abstract description of all granular layers. One granulation criterion corresponds to one granular layer, and different granulation criteria correspond to multiple granular layers. This reflects the fact that people observe, comprehend, and solve problems from various views. All the interconnections between the granular layers form an interactive structure called the granular structure. There are two basic problems in GrC: granulation and computation based on granulation. Granulation is the first step of GrC in solving a problem, and it is the process of constructing a problem-solving knowledge space. The human brain’s cognition of external things is a typical granulation process: starting from an overall and rough cognition, it continuously processes and refines information and finally arrives at partial and detailed analysis and reasoning. The granulation process mainly involves granulation criteria, granulation methods, granule descriptions, and other issues. Granulation-based computing refers to solving the original problem or performing logical reasoning with granules as the objects of operation and the granularity level as the carrier. It is mainly divided into two types: mutual reasoning between granules on the same granular layer and conversion between granules on different granular layers. After years of study and development, many GrC models and their extensions have been proposed. The most representative are three theoretical models: the fuzzy set model, the rough set model, and the quotient space model.
1.1. Fuzzy Set Model
Zadeh presented fuzzy set theory in 1965. He pointed out that a single element always belongs to a set to some extent and may also belong to several sets to different degrees. Within the framework of fuzzy set theory, he presented a GrC model based on computing with words. The core idea of this method is to use words instead of numbers for calculation and reasoning in order to achieve fuzzy reasoning and control of complex information systems, which is in line with human thinking. In addition, Wang employed natural language knowledge to establish a linguistic dynamic system based on computing with words and designed a computational theoretical framework for it by fusing concepts and schemes from multiple fields.
1.2. Rough Set Model
The degree to which an object belongs to a certain set varies with the granularity of the attributes. In order to better characterize the ambiguity of set boundaries, Pawlak proposed rough set theory in the 1980s. Its essential idea is to use indistinguishability relations (equivalence relations) to partition the universe into equivalence classes and thereby establish an approximation space. In the approximation space, an upper approximation set and a lower approximation set are employed to approximate a set with fuzzy boundaries. Classic rough set theory is mainly aimed at complete information systems, in which all feature values of the objects processed are known. To make rough set theory suitable for handling incomplete information systems, there are currently two main approaches: one is to fill in the incomplete data and the other is to extend the classic rough set model. Kryszkiewicz proposed an extended rough set model via tolerance relations. Stefanowski and his colleagues presented an extended rough set model based on asymmetric similarity relations and another based on quantitative tolerance relations. Wang analyzed the shortcomings of the previous two extended models and designed a rough set model based on the restricted tolerance relation. He found that the tolerance relation and the asymmetric similarity relation are the two extremes of extending the indistinguishability relation, that is, the condition of the tolerance relation is too loose, the condition of the asymmetric similarity relation is too tight, and the restricted tolerance relation lies between them. Pawlak employed the idea that elements in the same equivalence class have the same membership function and explored the structure and granularity of knowledge granules.
Polkowski and Skowron adopted the rough mereology method, neural network technology, and the idea of knowledge granulation to design a rough neural computing model, which combines the partition blocks of the rough set with the neural network to form an efficient neural computing method. Peters and his colleagues employed the indistinguishability relation to divide the real numbers into multiple subintervals, divided a whole area into several grid units, each of which was regarded as a granule, and proposed metrics between two information granules based on the adjacency relation and the containment relation, respectively.
1.3. Quotient Space Theory Model
B. Zhang and L. Zhang presented the theory of quotient space when studying problem solving. They said that “the recognized feature of quasi-intelligence is that human can analyze and watch the same problem from different granularity”. Quotient space theory established a formal system of quotient structures and proposed a series of theories and approaches to solve problems in the fields of heuristic search, information fusion, reasoning, and path planning. Related research and applications have also been reported [32, 33].
In addition, many new models have been proposed, such as granular matrices for reduction and evaluation, the three-way decision model [35–37], and ensemble learning for big data based on MapReduce [38–44]. In this study, we adopt parallel granulation and ensemble learning based on MapReduce to solve the regression problem from a granular computing perspective and to improve both regression performance and granulation efficiency.
In this study, a regression problem is equivalently transformed into a problem in the fuzzy granular space, and BFGRT is constructed from the perspectives of GrC and ensemble learning to solve the regression problem. The main contributions are as follows.
(i) First, an adaptive clustering algorithm is proposed, which can adaptively find the optimal cluster centers. It is a global optimization algorithm that addresses the problem that classic clustering algorithms rely heavily on the initial cluster centers and easily fall into a local optimum.
(ii) Second, parallel fuzzy granulation of data, based on the above clustering algorithm combined with MapReduce, is presented. It addresses the high complexity of traditional granulation and improves granulation efficiency.
(iii) Third, we define fuzzy granules and related metric operators, design a loss function, construct an individual fuzzy granular regression tree in the granular space by optimizing the loss function, and then, based on MapReduce, integrate in parallel multiple fuzzy granular regression trees built on different attributes into a stronger learner to accurately solve the regression problem.
3. The Regression Problem
The regression problem is divided into two processes: learning and prediction. Suppose that is a regression system, where is the set of instances (, ), is the attribute set, is the set of ranges, is the range of the attribute , and is an information function (it assigns a value to each instance on each attribute, namely, ). Let and , where corresponds to the output of . The learning system constructs a model based on the instance set. For the test instance , the learning system can predict the corresponding output by .
4. The Primary Algorithm
To solve the problem described above, the algorithm is presented in three stages. First, the data are clustered in preparation for parallel fuzzy granulation. For this step, we design a novel clustering algorithm that adaptively optimizes the cluster centers: the cluster centers of the instances are calculated automatically, so the number of clusters does not need to be given in advance. Next, these cluster centers are used as reference objects, independent of the data, to granulate the data in parallel. Finally, we transform the instance regression problem into a fuzzy granular regression problem in the granular space. In the fuzzy granular space, we design related operators and loss functions, construct multiple weak fuzzy granular regression trees by optimizing the loss function, and then integrate these weak trees into a strong learner to predict the regression value. The process is shown in Figure 1.
4.1. Clustering Algorithm with Automatic Optimization of Cluster Centers
Classic clustering algorithms need the number of cluster centers to be specified in advance, and their results depend heavily on this parameter. If the number of cluster centers is unsuitable, the algorithm falls into a local minimum. We present an adaptive clustering algorithm that selects the number of cluster centers automatically; it is a global optimization algorithm. The principle is as follows. It is well known that clustering performance is better when the standard deviation of the cluster centers is larger and the standard deviation within clusters is smaller. Therefore, a loss function is designed as the evaluation criterion, where denotes the standard deviation of the cluster and represents the number of cluster centers. The aim is to decrease the loss function value by adjusting the cluster centers until the maximum number of iterations is reached or the loss function value hardly changes. Each iteration yields a set of cluster centers and the corresponding loss function value, which are added to an evaluated set. Within an iteration, the probability that a remaining instance point is selected as the next cluster center is the ratio of its farthest distance to the cluster centers to the sum of the farthest distances from all instance points to the cluster centers. When the termination condition is met, the set of cluster centers with the minimum loss function value in the evaluated set is returned as the result.
Step 1: remove the instances missing some attribute values
Step 2: normalize instances
Step 3: initialize parameters, such as maximum iteration , evaluated set (contains cluster centers and loss function value), and current iteration
Step 4: initialize current cluster center set and randomly select one instance point as the cluster center ,
Step 5: calculate the farthest distance between the remaining instance points and all cluster centers and let , and the probability of the instance selected as the next cluster center is .
Step 6: if is selected as a cluster center, then
Step 7: if , go Step 5. Otherwise, go Step 8.
Step 8: calculate loss function and update the evaluated set
Step 9: update iteration
Step 10: if or , go Step 11 (here, , is a small positive number). Otherwise, go Step 4.
Step 11: in the evaluated set EV, select the cluster centers according to Namely, and the optimization number of cluster centers ( expresses the number of elements of the set).
Step 12: end.
Algorithm 1 shows the pseudocode of this principle.
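The steps above can be sketched in Python as follows. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the exact loss function and the stopping rule for adding centers are not recoverable from the text, so the loss is assumed to be the mean within-cluster standard deviation divided by the spread of the centers, and the number of centers tried in each iteration is assumed to be drawn at random up to a hypothetical bound `k_max`. The function names are likewise hypothetical.

```python
import numpy as np

def seed_centers(X, k, rng):
    """Pick k centers: the first uniformly at random, each next one with
    probability proportional to its farthest distance to the chosen centers."""
    n = X.shape[0]
    chosen = [int(rng.integers(n))]
    while len(chosen) < k:
        d = np.linalg.norm(X[:, None, :] - X[chosen][None, :, :], axis=2).max(axis=1)
        d[chosen] = 0.0  # already-chosen points cannot be picked again
        total = d.sum()
        if total == 0.0:  # all remaining points coincide with the centers
            break
        chosen.append(int(rng.choice(n, p=d / total)))
    return X[chosen]

def clustering_loss(X, centers):
    """ASSUMED loss: mean within-cluster std divided by the std of the centers."""
    dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    labels = dist.argmin(axis=1)
    within = np.mean([X[labels == j].std(axis=0).mean()
                      for j in range(len(centers)) if np.any(labels == j)])
    between = centers.std(axis=0).mean()
    return within / (between + 1e-12)

def adaptive_centers(X, k_max=6, max_iter=50, eps=1e-6, seed=0):
    """Repeat seeding, record (loss, centers) in an evaluated set, and return
    the centers with the smallest loss."""
    rng = np.random.default_rng(seed)
    evaluated, prev = [], None
    for _ in range(max_iter):
        k = int(rng.integers(2, k_max + 1))  # ASSUMED: random candidate size
        centers = seed_centers(X, k, rng)
        loss = clustering_loss(X, centers)
        evaluated.append((loss, centers))
        if prev is not None and abs(prev - loss) < eps:
            break  # the loss hardly changes any more
        prev = loss
    return min(evaluated, key=lambda pair: pair[0])[1]
```

Because every candidate center is an actual instance, the returned centers can be used directly as reference objects for granulation.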
4.2. Parallel Fuzzy Granulation
Most existing methods granulate data serially. To improve efficiency, we propose parallel fuzzy granulation. First, the cluster centers are obtained by the approach described above. Then, the instances are divided into a set of instance subsets, each of which is fuzzy granulated with respect to the cluster centers. This process is executed in parallel by MapReduce. According to Algorithm 1, the cluster center set can be obtained.
For , , and , the distance between and on the attribute can be written as follows:
where , , and are established. A fuzzy granule induced by the instance and the cluster center can be written as follows:
For simplicity, the fuzzy granule is also denoted by the following equation.
Here, symbol “” is a separator and symbol “+” is a union operation, that is, fuzzy granule denotes the distance set between instance and cluster centers. Its cardinal can be written as
Four operators of fuzzy granule can be designed. For , operator and operator can be written as follows:
where is a parameter. For , and , the operators between the fuzzy granule formed by and the one induced by are written as follows:
For and , fuzzy granular vector induced by on the attribute set can be written as
where symbol “+” denotes a union operation and symbol “−” represents a separator. Its cardinal can be obtained as follows:
Operators of fuzzy granular vector are written as follows:
According to the definitions, the distance between the two fuzzy granular vectors is given by
From the above fuzzy granulation, it can be seen that the fuzzy granules are obtained by calculating instances and cluster centers using the fuzzy operators, and fuzzy granular space consists of these fuzzy granules.
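The granulation process can be illustrated with a short NumPy sketch. The exact distance, cardinal, and operator formulas are not reproduced here, so the code makes explicit assumptions: attributes are normalized to [0, 1], the per-attribute distance is the absolute difference, the cardinal of a granule is the sum of its distances, and the distance between two granular vectors is their mean absolute difference. The chunked loop only stands in for the map and reduce steps of a real MapReduce job, and all function names are hypothetical.

```python
import numpy as np

def granulate(X, centers):
    """Return G with G[i, a, j] = |X[i, a] - centers[j, a]|.
    G[i, a] is the fuzzy granule of instance i on attribute a (its distances
    to every cluster center); G[i] is the fuzzy granular vector of instance i."""
    return np.abs(X[:, :, None] - centers.T[None, :, :])

def cardinality(G):
    """ASSUMED cardinal of each granule: the sum of its distances."""
    return G.sum(axis=-1)

def granular_distance(g1, g2):
    """ASSUMED distance between two granular vectors: mean absolute difference,
    which stays in [0, 1] when attributes are normalized to [0, 1]."""
    return float(np.abs(g1 - g2).mean())

def granulate_parallel(X, centers, n_chunks=4):
    """Map step: granulate each chunk independently (the centers are shared,
    read-only reference objects). Reduce step: concatenate the results."""
    chunks = np.array_split(X, n_chunks)
    return np.concatenate([granulate(c, centers) for c in chunks], axis=0)
```

Since each instance is granulated against the same fixed centers, the chunks have no dependencies on one another, which is what makes the map step straightforward to distribute.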
Theorem 1. For , , and , the distance between fuzzy granular vectors satisfies
Proof. According to the definition of fuzzy granule, we have and . Also, from equation (3), we get and . Because of and , the inequalities and can be obtained. Equation (9) shows and . We can further obtain and . From , we get
We divide both sides of the formula by to achieve , that is, we prove
Theorem 2. For , the attribute subsets and satisfy . Let and be fuzzy granular vectors of on and , respectively. Then, has been proven.
Proof. For , equation (9) shows that . Thanks to , we have that for , . Because of , for , we get and , that is, if , we can obtain . In sum, is proved.
Now, we use an example to illustrate the fuzzy granulation process.
Example 1. As illustrated in Table 1, given an instance set , an attribute set , regression value set , cluster center set , and parameter , the fuzzy granulation is as follows.
We take instance as an example. The distances between and the cluster centers and on the attributes , , and are given as follows:
Equation (5) shows that fuzzy granules of on , , and are calculated as follows:
In the same way, fuzzy granules of on are obtained as follows:
According to , here . When , suppose that and . When , suppose that and ), we have
Similarly, we can get
Thus, we can obtain the distance between fuzzy granular vectors of and on with by
4.3. Fuzzy Granular Regression Tree
After the above fuzzy granulation, the data can be transformed into fuzzy granules. In fuzzy granular space, we give the following definition.
Definition 1. Suppose that is a regression system, where is an instance set, is an input variable, and is the output variable corresponding to . Let , and . is a cluster center set and an attribute set. For , we can obtain fuzzy granule via fuzzy granulation (the following is abbreviated as ). Fuzzy granules and operators can create new fuzzy granules, such as . Repeating this process can expand a fuzzy granular space on . According to training data, a fuzzy granular rule base can be generated as
A fuzzy granular regression tree corresponds to a division of the fuzzy granular space and an output value on each divided unit.
Suppose that the input space has been split into units , and there is a fixed output value on each unit . Thus, the fuzzy granular regression tree can be expressed as
Here, is a fuzzy granular vector and is an indicator function, which can be written as
where denotes the number of the fuzzy granular vector. The question becomes how to divide the fuzzy granular space.
Here, we use a heuristic method to select feature as the segmentation variable and the cardinal of fuzzy granule as the optimal segmentation point. Two areas are defined by
Then, find the optimal segmentation variable and point . Specifically, solve the loss function
For input variable , the optimal segmentation point can be calculated as
where denotes the number of elements.
Traverse all input variables and find the optimal segmentation variable to form a pair . Divide the input space into two areas accordingly. Then, repeat the above division process for each area until the termination condition is met. In this way, a fuzzy granular regression tree is generated. The algorithm is described in Algorithm 2.
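The division procedure can be sketched as a least-squares split search over granule cardinals. This is a minimal sketch under assumptions, not Algorithm 2 itself: the loss is assumed to be the usual sum of squared errors around the mean output of each area, the input `card` is assumed to hold one cardinal per instance and attribute, and the stopping parameters `max_depth` and `min_size` are hypothetical.

```python
import numpy as np

def best_split(card, y):
    """card: (n, m) granule cardinals; y: (n,) outputs.
    Returns (attribute index, split point, loss) with the smallest squared error,
    or (None, None, inf) when no attribute can separate the instances."""
    best = (None, None, np.inf)
    for a in range(card.shape[1]):
        # The largest value is skipped because it would leave one area empty.
        for s in np.unique(card[:, a])[:-1]:
            left = card[:, a] <= s
            loss = (((y[left] - y[left].mean()) ** 2).sum()
                    + ((y[~left] - y[~left].mean()) ** 2).sum())
            if loss < best[2]:
                best = (a, float(s), float(loss))
    return best

def build_tree(card, y, depth=0, max_depth=3, min_size=2):
    """Recursively divide the space; a leaf stores the mean output of its area."""
    if depth >= max_depth or len(y) < min_size:
        return float(y.mean())
    a, s, _ = best_split(card, y)
    if a is None:
        return float(y.mean())
    left = card[:, a] <= s
    return (a, s,
            build_tree(card[left], y[left], depth + 1, max_depth, min_size),
            build_tree(card[~left], y[~left], depth + 1, max_depth, min_size))

def predict_one(tree, row):
    """Walk from the root to a leaf and return the stored output value."""
    while isinstance(tree, tuple):
        a, s, low, high = tree
        tree = low if row[a] <= s else high
    return tree
```

Representing a node as a plain tuple keeps the sketch short; a leaf is simply a float, so prediction needs no separate node class.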
4.4. Boosted Fuzzy Granular Regression Trees
BFGRT is an algorithm that integrates multiple fuzzy granular regression trees through the idea of ensemble learning. It does not rely on a single fuzzy granular regression tree but uses many of them to solve the task together, taking the weighted average of the regression values of the individual trees as the final regression value. Assuming that is the instance set, fuzzy granular space is , fuzzy granular rule base is , the number of instances is , and the number of attributes is , we construct fuzzy granular regression trees in parallel as follows:
Step 1: create tasks
Step 2: instance set extraction: randomly draw fuzzy granular vectors from with replacement and repeat them times. The probability of each fuzzy granular vector being selected is . The unselected fuzzy granular vectors form the out-of-bag data, which serve as the test set.
Step 3: attribute extraction: extract attributes from to compose attribute subset
Step 4: attribute selection: calculate the optimal segmentation attribute and the optimal segmentation point in the data set of node, divide the node into two child nodes, and allocate the remaining fuzzy granular vectors to the child nodes
Step 5: generate a fuzzy granular tree. Repeat Step 3 in the fuzzy granular vector set of each child node to recursively split the nodes until all leaf nodes are generated.
Step 6: repeat steps 2–5 to get different fuzzy granular regression trees, which correspond to tasks
Step 7: BFGRT consists of fuzzy granular regression trees, and the process can be executed by task, that is,
where , and represents the root mean square error (RMSE) of fuzzy granular tree. The details of the algorithm are given in Algorithm 3.
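The ensemble construction can be sketched as follows. This is a sketch under assumptions rather than Algorithm 3: the weighting formula is not reproduced in the text, so each tree is assumed to receive a weight inversely proportional to its out-of-bag RMSE, and a depth-1 stump stands in for a full fuzzy granular regression tree to keep the example short. The loop body has no dependencies between iterations, which is why each tree can be built as a separate task. All function names and the default attribute subset size are hypothetical.

```python
import numpy as np

def fit_stump(F, y):
    """Depth-1 least-squares tree on features F; stands in for a full FGRT."""
    mean = float(y.mean())
    best = (0, np.inf, mean, mean, np.inf)  # default: predict the mean everywhere
    for a in range(F.shape[1]):
        for s in np.unique(F[:, a])[:-1]:
            m = F[:, a] <= s
            low, high = y[m].mean(), y[~m].mean()
            loss = ((y[m] - low) ** 2).sum() + ((y[~m] - high) ** 2).sum()
            if loss < best[4]:
                best = (a, float(s), float(low), float(high), float(loss))
    return best[:4]

def predict_stump(stump, F):
    a, s, low, high = stump
    return np.where(F[:, a] <= s, low, high)

def fit_ensemble(F, y, n_trees=10, n_attrs=None, seed=0):
    """Build trees on bootstrap samples and random attribute subsets."""
    rng = np.random.default_rng(seed)
    n, m = F.shape
    n_attrs = n_attrs or max(1, int(np.sqrt(m)))
    members = []
    for _ in range(n_trees):  # iterations are independent, so they parallelize
        rows = rng.integers(n, size=n)            # draw with replacement
        oob = np.setdiff1d(np.arange(n), rows)    # out-of-bag rows as test set
        if len(oob) == 0:
            oob = np.arange(n)
        attrs = rng.choice(m, size=n_attrs, replace=False)
        stump = fit_stump(F[rows][:, attrs], y[rows])
        err = predict_stump(stump, F[oob][:, attrs]) - y[oob]
        rmse = float(np.sqrt(np.mean(err ** 2)))
        members.append((attrs, stump, 1.0 / (rmse + 1e-9)))  # ASSUMED weight
    return members

def predict_ensemble(members, F):
    """Weighted average of the individual tree predictions."""
    w = np.array([weight for _, _, weight in members])
    preds = np.stack([predict_stump(s, F[:, attrs]) for attrs, s, _ in members])
    return (w[:, None] * preds).sum(axis=0) / w.sum()
```

With inverse-RMSE weights, trees that generalize poorly on their out-of-bag data contribute less to the final regression value; other monotone weightings would fit the same scheme.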