Abstract

Regression is an important problem in machine learning and has been widely applied in fields such as meteorology, transportation, and materials science. Granular computing (GrC) is an effective approach to exploring human-like intelligent information processing and offers advantages for knowledge discovery. Ensemble learning is easy to execute in parallel. Building on granular computing and ensemble learning, we equivalently transform the regression problem into a problem in granular space and propose boosted fuzzy granular regression trees (BFGRT) to predict test instances. The idea of BFGRT is as follows. First, a clustering algorithm with automatic optimization of cluster centers is presented. Next, based on this clustering algorithm, MapReduce is employed to fuzzy-granulate the data in parallel. Then, new operators and metrics on fuzzy granules are designed to build a fuzzy granular rule base. Finally, a fuzzy granular regression tree (FGRT) in the fuzzy granular space is presented. On this basis, BFGRT is constructed by combining multiple FGRTs in parallel via random attribute sampling and MapReduce. Theory and experiments show that BFGRT is accurate, efficient, and robust.

1. Introduction

Learning ability is a basic feature of human intelligence. Prediction is the ability of humans to judge the future based on what they have learned and is a concrete manifestation of learning ability. Prediction takes two concrete forms, regression and classification, which are among the fundamental problems of machine learning, data mining, and statistics. The primary purpose of research on the regression problem is how to train a learner from existing data. It helps people discover the laws of development and change of things from massive historical data so as to make scientific and quantitative predictions about the future. In classification, the target output takes values in a finite discrete space, and these values can be either ordered or unordered. In regression, the range of the output variable is ordered and continuous.

Research on regression problems based on ensemble learning has been a hot topic in machine learning in recent years and has received extensive attention. Nevertheless, the application of ensemble learning to regression problems is still unsatisfactory and needs further research. The regression problems to be solved in reality often come from very complex social and economic systems, on which various intricate internal and external factors have linear or nonlinear effects; some are inherent factors, and some are accidental ones. A single learner can only learn well for a certain type of data, and it is difficult for it to achieve satisfactory learning results. Especially in the big data environment, traditional regression learning algorithms cannot meet the learning requirements of massive complex data in terms of predictive performance and scalability. Although great progress has been made in the theory and technology of machine learning, continuously improving the generalization ability and learning efficiency of learners is still an important issue and a continuing pursuit of machine learning research.

GrC is an effective method for simulating human problem-solving and for handling big data analysis tasks. It abstracts and divides complex problems into several simpler ones, which helps to analyze and solve problems better. Combining granular computing with ensemble learning to solve regression problems is therefore a promising idea.

As the first of the four main research directions in machine learning, ensemble learning trains several different single learners on the training dataset and then combines their respective predictions into the final output. Therefore, in most cases, it performs better than a single learner in terms of generalization and stability [1]. One of the main theoretical foundations of ensemble learning is that a weak learner can be boosted into a strong learner. Kearns and Valiant gave the concepts of weak learning and strong learning from the perspective of classification problems [2]. Avnimelech and Intrator introduced these concepts into the regression problem and proved the equivalence of strong and weak learnability in that setting [3]. Another major theoretical basis for ensemble learning is the "no free lunch" theorem proposed by Wolpert [4]. The implementation of ensemble learning has received extensive attention from researchers and has achieved some research results. These results can be summarized into two categories: one is the direct strategy and the other is the overproduce-and-choose approach. Liu and his colleagues proposed an ensemble learning method via negative correlation learning [5]. Fang et al. proposed a selective boosting ensemble learning algorithm [6]. Breiman obtained multiple different training datasets by repeatedly sampling the original dataset [7]. Schapire proposed the boosting method, whose main idea is to continuously strengthen the learning of "difficult samples" during the iterative learning process [8]. Ho presented the random subspace method, which uses different subsets of the feature space to train and generate multiple learners [9]; this method differs from bootstrap sampling, boosting, and cross-validation approaches and emphasizes the differences between feature subsets. Breiman designed the output smearing method for regression problems, whose primary idea is to inject Gaussian noise into the output variable [10]. The same method has also been used to manipulate input variables [11]. Gheyas and Smith presented a dynamic weighted combination method that dynamically adjusts the corresponding weights according to the predictive performance of each individual learner [12]. Research on ensemble learning for regression problems started late, and there are relatively few application results, such as power load forecasting [13, 14].

Granular computing has been a very popular research direction in the field of computational intelligence over the past few decades. The core task of GrC is to construct, represent, and process information granules. The information granule is the foundational element of GrC. It is a set of elements gathered together according to indistinguishability, similarity, or functionality. As a key component of knowledge representation and processing, information granules always appear together with information granulation, and information granulation occurs in the process of abstracting data or inducing knowledge from information. Information (data) forms information granules through information granulation. The representation forms of information granules include intervals [15], fuzzy sets [16], and rough sets [17]. The purpose of information granulation is to separate complex problems into several simpler problems, which allows us to capture the details of a problem. Information granulation and information granules permeate almost all human cognition, decision-making, and reasoning processes and are closely related to information granularity. For example, in daily life and work, people usually use different time intervals such as day, month, and year to granulate time and obtain information granules of different sizes, and the size of the formed information granules implies the level of information granularity used in granulation. Through abstraction of the problem, "finer" and "more specific" information granules are transformed into "coarser" and "more general" information granules, and "coarser" and "more general" information granules can be refined into "finer" and "more specific" ones. GrC helps people observe and analyze the same problem at very different granularities through transformations between information granules and finally find the most suitable level for analyzing and solving the problem. The basic composition of GrC mainly includes three parts: granules, granular layers, and granular structure [18]. Granules are the foundational elements of GrC models [19]. A granular layer is an abstract description of the problem space according to a certain granulation criterion [20]. The granular structure is an abstract description of all granular layers: one granulation criterion corresponds to one granular layer, and different granulation criteria correspond to multiple granular layers, which shows that people observe, comprehend, and solve problems from various views. All the interconnections between the granular layers form an interactive structure called the granular structure [21]. There are two basic problems in GrC: granulation and computation based on granulation [22]. Granulation is the first step of GrC in solving a problem and is the process of constructing a problem-solving knowledge space. The human brain's cognition of external things is a typical granulation process: starting from an overall and rough cognition, through continuous processing and refinement of information, it finally forms a detailed, localized analysis and reasoning. The granulation process mainly involves granulation criteria, granulation methods, granule descriptions, and other issues [23]. Granulation-based computing refers to solving the original problem or performing logical reasoning with granules as the objects of operation and granularity levels as the carriers. It is mainly divided into two types: mutual reasoning between granules on the same granular layer and conversion between granules on different granular layers. After years of research and development, many models of GrC and their extensions have been proposed. The most representative ones are three theoretical models: the fuzzy set model, the rough set model, and the quotient space model.

1.1. Fuzzy Set Model

Zadeh presented fuzzy set theory in 1965. He pointed out that a single element always belongs to a set to some extent and may also belong to several sets to different degrees [16]. Within the fuzzy set framework, he presented a GrC model based on computing with words. The core idea of this method is to use words instead of numbers for calculation and reasoning, so as to achieve fuzzy reasoning and control of complex information systems, which is in line with human thinking. In addition, Wang employed natural language knowledge to establish a linguistic dynamic system based on computing with words and designed a computational theoretical framework for linguistic dynamic systems by fusing concepts and schemes from multiple fields [24].

1.2. Rough Set Model

The degree to which an object belongs to a certain set varies with the granularity of the attributes. In order to better characterize the ambiguity of set boundaries, Pawlak proposed rough set theory in the 1980s. Its essential idea is to adopt indiscernibility relations (equivalence relations) to partition the universe into equivalence classes and thereby establish an approximation space. In the approximation space, an upper approximation set and a lower approximation set are employed to approximate a set with fuzzy boundaries [17]. Classic rough set theory is mainly aimed at complete information systems, in which all feature values of the processed objects are known. In order to make rough set theory suitable for handling incomplete information systems, there are currently two main approaches: one is to fill in the incomplete data, and the other is to extend the classic rough set model. Kryszkiewicz proposed an extended rough set model via tolerance relations [25]. Stefanowski and his coworkers presented an extended rough set model based on asymmetric similarity relations and an extended rough set model based on quantitative tolerance relations [26]. Wang analyzed the shortcomings of the previous two extended models and designed a rough set model based on a restricted tolerance relation. He found that the tolerance relation and the asymmetric similarity relation are the two extremes of extending the indiscernibility relation, that is, the condition of the tolerance relation is too loose, the condition of the asymmetric similarity relation is too tight, and the restricted tolerance relation lies between them [27]. Pawlak employed the idea that elements in the same equivalence class have the same membership function and explored the structure and granularity of knowledge granules [28]. Polkowski and Skowron adopted the rough mereology method, neural network technology, and the idea of knowledge granulation to design a rough neural computing model, which combines the partition blocks of rough sets with neural networks to form an efficient neural computing method [29]. Peters and his colleagues employed the indiscernibility relation to divide the real line into multiple subintervals, divided the whole area into several grid units, regarded each unit as a granule, and proposed metrics between two information granules based on the adjacency relation and the containment relation, respectively [30].

1.3. Quotient Space Theory Model

B. Zhang and L. Zhang presented the theory of quotient space when studying problem solving. They pointed out that "a recognized feature of human intelligence is the ability to analyze and observe the same problem at different granularities" [31]. The quotient space theory establishes a formal system of quotient structures and proposes a series of theories and approaches to solve problems in the fields of heuristic search, information fusion, reasoning, and path planning. There has been related research, and there have been applications [32, 33].

In addition, many new models have been proposed, such as granular matrices for reduction and evaluation [34], the three-way decision model [35–37], and ensemble learning for big data based on MapReduce [38–44]. In this study, we adopt parallel granulation and ensemble learning based on MapReduce to solve the regression problem from the angle of granular computing and to enhance the regression performance and granulation efficiency.

2. Contributions

In this study, a regression problem is equivalently transformed into a problem in the fuzzy granular space, and BFGRT is constructed from the angle of GrC and ensemble learning to solve it. The main contributions are as follows.
(i) First, an adaptive clustering algorithm is proposed, which can adaptively find the optimal cluster centers. It is a global optimization algorithm and addresses the problem that classic clustering algorithms rely heavily on the initial cluster centers and easily fall into local optima.
(ii) Second, parallel fuzzy granulation of the data, based on the above clustering algorithm combined with MapReduce, is presented, which overcomes the high complexity of traditional granulation and enhances granulation efficiency.
(iii) Third, we define fuzzy granules and related metric operators, design a loss function, construct an individual fuzzy granular regression tree in the granular space by optimizing the loss function, and then integrate, in parallel and based on MapReduce, multiple fuzzy granular regression trees built on different attributes into a stronger learner to solve the regression problem accurately.

3. The Regression Problem

The regression problem is divided into two processes: learning and prediction. Suppose a regression system consists of a set of instances, an attribute set, a set of value ranges (one range per attribute), and an information function that assigns to each instance a value on each attribute. Each training instance is paired with a corresponding output value. The learning system constructs a model from the instance set; for a test instance, the learning system can then predict the corresponding output with this model.

4. The Primary Algorithm

To solve the problem above, the algorithm is presented in three stages. First, the data are clustered to prepare for parallel fuzzy granulation. In this process, we design a novel clustering algorithm that adaptively optimizes the cluster centers: the cluster centers are calculated automatically instead of specifying the number of clusters in advance. Next, these cluster centers serve as reference objects, independent of the data partitioning, for granulating the data in parallel. Finally, we transform the instance regression problem into a fuzzy granular regression problem in the granular space. In the fuzzy granular space, we design related operators and loss functions, construct multiple weak fuzzy granular regression trees by optimizing the loss function, and then integrate these weak trees into a strong learner to predict the regression value. The process is shown in Figure 1.

4.1. Clustering Algorithm with Automatic Optimization of Cluster Centers

Classic clustering algorithms need the number of cluster centers to be specified in advance, and the clustering result depends heavily on this parameter: if the number of cluster centers is unsuitable, the algorithm falls into a local minimum. We therefore present an adaptive clustering algorithm that selects the number of cluster centers automatically; it is a global optimization algorithm. The principle is as follows. As is well known, if the standard deviation between cluster centers is larger and the standard deviation within clusters is smaller, the clustering is better. Therefore, a loss function built from the within-cluster standard deviations and the number of cluster centers is designed as the evaluation criterion. The aim is to decrease the loss function value by adjusting the cluster centers until the maximum number of iterations is reached or the loss function value hardly changes. In each iteration, a set of cluster centers and its loss function value are obtained and added to an evaluated set. Within an iteration, each remaining instance point is selected as the next cluster center with probability equal to the ratio of its farthest distance to the current cluster centers to the sum of the farthest distances of all instance points to the cluster centers. When the termination condition is met, the set of cluster centers with the minimum loss function value is taken from the evaluated set; this is the required result. The procedure is as follows (a code sketch is given after Algorithm 1).
Step 1: remove the instances missing some attribute values.
Step 2: normalize the instances.
Step 3: initialize the parameters, such as the maximum number of iterations, the evaluated set (which stores cluster centers and loss function values), and the current iteration counter.
Step 4: initialize the current cluster center set and randomly select one instance point as the first cluster center.
Step 5: calculate the farthest distance between each remaining instance point and the current cluster centers; the probability of an instance being selected as the next cluster center is its distance divided by the sum of these distances.
Step 6: if an instance is selected, add it to the current cluster center set.
Step 7: if more cluster centers are needed, go to Step 5; otherwise, go to Step 8.
Step 8: calculate the loss function value and update the evaluated set.
Step 9: update the iteration counter.
Step 10: if the maximum number of iterations is reached or the loss function value changes by less than a small positive threshold, go to Step 11; otherwise, go to Step 4.
Step 11: in the evaluated set, select the cluster centers with the minimum loss function value and record their number.
Step 12: end.

Algorithm 1 shows the pseudocode of this principle.

 Input: instance set, maximum number of iterations, threshold value
 Output: optimized cluster center set
(1) Remove instances missing some attribute values.
(2) Normalize each attribute value into [0, 1].
(3) Let the evaluated set be an empty set.
(4) Initialize the current iteration counter.
(5) WHILE the maximum number of iterations has not been reached AND the loss function value still changes by more than the threshold
(6)  Initialize the current cluster center set.
(7)  Initialize the number of cluster centers.
(8)  Randomly choose one instance point as the first cluster center.
(9)  Add it to the current cluster center set.
(10)  Increase the number of cluster centers.
(11)  WHILE more cluster centers are required
(12)   Compute, for each remaining instance point, its farthest distance to the current cluster centers.
(13)   The probability of an instance being selected as the next cluster center is its distance divided by the sum of these distances.
(14)   p = GenProb(); randomly generate a probability.
(15)   IF the instance is selected THEN add it to the current cluster center set and increase the number of cluster centers.
(16)  END WHILE
(17)  Calculate the loss function value and the cluster centers of this iteration and update the evaluated set.
(18)  Update the current iteration counter.
(19) END WHILE
(20) In the evaluated set, choose the cluster center set with the minimum loss function value.
(21) Return the optimized cluster centers and their number (the number of elements of the selected set).
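To make the procedure above concrete, the following is a minimal NumPy sketch of the center-selection idea. It is not the paper's exact algorithm: the growth criterion (a fixed distance threshold of 0.25 on normalized data), the use of the nearest-center distance as the selection weight, and the stand-in loss (mean within-cluster spread divided by the spread of the centers) are all assumptions, since the paper's loss function and loop conditions are not reproduced here.

```python
import numpy as np

def candidate_centers(X, rng):
    """Grow one candidate set of centers: each remaining point is chosen as the
    next center with probability proportional to its distance to the existing
    centers (nearest-center distance is used here as a stand-in weight)."""
    centers = [X[rng.integers(len(X))]]
    while True:
        d = np.min(np.linalg.norm(X[:, None, :] - np.array(centers)[None, :, :], axis=2), axis=1)
        if d.max() < 0.25:          # stop growing once every point is close to some center (assumed threshold)
            return np.array(centers)
        p = d / d.sum()             # selection probability, in the spirit of Step 5
        centers.append(X[rng.choice(len(X), p=p)])

def loss(X, centers):
    """Stand-in loss: small within-cluster spread and large spread between
    centers give a small value (the paper's exact formula is not reproduced)."""
    d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    labels = d.argmin(axis=1)
    within = np.mean([X[labels == j].std() for j in range(len(centers)) if np.any(labels == j)])
    between = centers.std() + 1e-12
    return within / between

def adaptive_centers(X, max_iter=50, tol=1e-4, seed=0):
    """Repeat the randomized construction and keep the candidate set with the
    smallest loss (Steps 4-11 of Algorithm 1)."""
    rng = np.random.default_rng(seed)
    best, best_loss, prev = None, np.inf, np.inf
    for _ in range(max_iter):
        C = candidate_centers(X, rng)
        l = loss(X, C)
        if l < best_loss:
            best, best_loss = C, l
        if abs(prev - l) < tol:     # loss hardly changes: early stop
            break
        prev = l
    return best
```

With data normalized to [0, 1], adaptive_centers(X) returns the candidate center set with the smallest stand-in loss over the random restarts.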
4.2. Parallel Fuzzy Granulation

In granulation, serial granulation is adopted in most methods. To enhance efficiency, we propose parallel fuzzy granulation. First, cluster centers can be obtained by the approach mentioned above. Then, instances are divided into a set of instance subsets, which are fuzzy granulated by cluster centers. This process is executed parallelly by MapReduce. According to Algorithm 1, the cluster center set can be obtained.

For an instance, a cluster center, and an attribute, the distance between the instance and the cluster center on that attribute can be written as follows, where the corresponding boundedness conditions hold. A fuzzy granule induced by the instance and the cluster center can then be written as follows:

For simplicity, the fuzzy granule can also be denoted by the following equation.

Here, the first symbol acts as a separator and the symbol "+" denotes a union operation; that is, a fuzzy granule denotes the set of distances between an instance and the cluster centers. Its cardinal can be written as

Four operators on fuzzy granules can be designed. For a single fuzzy granule and a given parameter, two operators can be written as follows. For two instances, the operators between the fuzzy granule formed by one and the fuzzy granule induced by the other are written as follows:

For an instance and an attribute set, the fuzzy granular vector induced by the instance on the attribute set can be written as follows, where the symbol "+" denotes a union operation and the symbol "−" represents a separator. Its cardinal can be obtained as follows:

Operators on fuzzy granular vectors are written as follows:

According to these definitions, the distance between two fuzzy granular vectors is given by

From the above fuzzy granulation, it can be seen that fuzzy granules are obtained from instances and cluster centers through the fuzzy operators, and the fuzzy granular space consists of these fuzzy granules.
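As an illustration of how instances and cluster centers induce fuzzy granules and granular vectors, here is a small Python sketch. The membership formula (1 − |x_a − c_a| on normalized data), the cardinal as a sum of memberships, and the mean-absolute-difference distance between granular vectors are stand-ins chosen for concreteness; they are not the paper's own operators.

```python
import numpy as np

def fuzzy_granule(x, centers, attr):
    """Fuzzy granule of instance x on attribute `attr`: one membership value per
    cluster center (stand-in formula 1 - |x_a - c_a| on normalized data)."""
    return np.array([1.0 - abs(x[attr] - c[attr]) for c in centers])

def granular_vector(x, centers, attrs):
    """Fuzzy granular vector: the granules of x over all attributes in `attrs`."""
    return np.stack([fuzzy_granule(x, centers, a) for a in attrs])

def cardinal(granule):
    """Cardinal of a fuzzy granule: sum of its membership values (assumption)."""
    return float(np.sum(granule))

def vector_distance(G1, G2):
    """Stand-in distance between two fuzzy granular vectors: mean absolute
    difference of the per-attribute granules."""
    return float(np.mean(np.abs(G1 - G2)))

# Usage: two instances, two cluster centers, three attributes (hypothetical values).
centers = [np.array([0.2, 0.5, 0.8]), np.array([0.6, 0.1, 0.4])]
x1, x2 = np.array([0.3, 0.4, 0.9]), np.array([0.7, 0.2, 0.5])
G1 = granular_vector(x1, centers, attrs=[0, 1, 2])
G2 = granular_vector(x2, centers, attrs=[0, 1, 2])
print(vector_distance(G1, G2))
```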

Theorem 1. For any two instances and any attribute set, the distance between the corresponding fuzzy granular vectors satisfies the following bound.

Proof. According to the definition of a fuzzy granule, the corresponding identities hold, and equation (3) gives the associated bounds. Since the distances are bounded, the intermediate inequalities can be obtained. Equation (9) then relates the cardinals of the fuzzy granular vectors, from which we obtain the required inequalities. Dividing both sides of the resulting formula by the cardinal yields the bound; that is, the theorem is proved.

Theorem 2. For an instance, consider two attribute subsets such that one is contained in the other, and let the two fuzzy granular vectors of the instance be induced on these subsets, respectively. Then, the stated relation between them holds.

Proof. Equation (9) gives the cardinals of the fuzzy granular vectors on the two attribute subsets. Because one attribute subset is contained in the other, the corresponding quantities are ordered accordingly, and the stated relation follows. In sum, the theorem is proved.

Now, we use an example to describe the fuzzy granulation process.

Example 1. As illustrated in Table 1, given an instance set, an attribute set, a regression value set, a cluster center set, and a parameter value, the fuzzy granulation proceeds as follows.
We take one instance as an example. The distances between this instance and the cluster centers on each attribute are computed first. Equation (5) then gives the fuzzy granules of this instance on each attribute. In the same way, the fuzzy granules of the other instances are obtained. Applying the operators with the chosen parameter to the granules of two instances, and proceeding similarly for the remaining pairs, we finally obtain the distance between the fuzzy granular vectors of the two instances on the attribute set.

4.3. Fuzzy Granular Regression Tree

After the above fuzzy granulation, the data can be transformed into fuzzy granules. In the fuzzy granular space, we give the following definition.

Definition 1. Suppose a regression system consists of an instance set, input variables, and an output variable corresponding to each instance, together with a cluster center set and an attribute set. For each instance, a fuzzy granule can be obtained via fuzzy granulation. Fuzzy granules and the operators defined above can create new fuzzy granules, and repeating this process expands a fuzzy granular space over the instance set. From the training data, a fuzzy granular rule base can be generated in which each rule associates the fuzzy granular vector of an instance with its output value. A fuzzy granular regression tree corresponds to a division of the fuzzy granular space together with an output value on each divided unit.
Suppose that the input space has been split into units and that there is a fixed output value on each unit. The fuzzy granular regression tree can then be expressed as a sum, over the units, of the unit output values multiplied by an indicator function of whether a fuzzy granular vector falls into the unit. The question becomes how to divide the fuzzy granular space.
Here, we use a heuristic method that selects a feature as the segmentation variable and the cardinal of the corresponding fuzzy granule as the segmentation point. The segmentation defines two areas: the fuzzy granular vectors whose cardinal on the segmentation variable does not exceed the segmentation point, and those whose cardinal exceeds it. The optimal segmentation variable and segmentation point are then found by minimizing a squared-error loss function: for a candidate pair, the loss is the sum, over the two areas, of the squared deviations of the output values from the area means, where the area mean is the sum of the output values divided by the number of elements in the area; for a fixed input variable, the optimal segmentation point minimizes this loss.
Traverse all input variables, find the optimal segmentation variable, and form a pair with its optimal segmentation point. Divide the input space into two areas accordingly. Then, repeat the above division process for each area until the terminal condition is met. In this way, a fuzzy granular regression tree is generated. The algorithm is described in Algorithm 2, and a code sketch follows it.

 Input: instance set, regression value set
 Output: fuzzy granular regression tree
(1) Remove the instances missing some attribute values.
(2) Normalize each attribute value.
(3) Calculate the cluster center set (Algorithm 1).
(4) Partition the instance set for parallel distributed fuzzy granulation.
(5) //This is a parallel process; take one partition as an example.
  FOR each instance in the partition
   FOR each attribute
    Fuzzy granulate the instance on the attribute with the cluster centers.
   END FOR
   Build the fuzzy granular vector of the instance;
   Get the label (regression value) of the instance;
   Build a fuzzy granular rule from the vector and the label.
  END FOR
(6) Select the optimal segmentation variable (i.e., an attribute) and segmentation point (i.e., a granule cardinal) by solving the loss function, that is, traverse the variables and, for each fixed segmentation variable, scan the segmentation points to find the pair that minimizes the loss.
(7) Divide the area with the selected pair into two subregions and set the output value of each subregion to the average of the regression values of the fuzzy granular vectors it contains.
(8) Continue to call Step 6 and Step 7 for the two subregions until the required number of split nodes is reached.
(9) Divide the input fuzzy granular space into the resulting regions and generate a fuzzy granular regression tree.
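Below is a compact Python sketch of the CART-style splitting that Steps 6–9 describe, assuming each instance is represented by a vector of granule cardinals (matrix F, one column per attribute). The stopping rule (a maximum number of leaves) and the representation of the tree as nested dictionaries are implementation choices, not the paper's.

```python
import numpy as np

def best_split(F, y):
    """Scan every attribute j and every observed cardinal value s and return the
    (j, s) pair minimizing the two-region squared loss."""
    best = (None, None, np.inf)
    for j in range(F.shape[1]):
        for s in np.unique(F[:, j]):
            left, right = y[F[:, j] <= s], y[F[:, j] > s]
            if len(left) == 0 or len(right) == 0:
                continue
            loss = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
            if loss < best[2]:
                best = (j, s, loss)
    return best[0], best[1]

def build_fgrt(F, y, max_leaves=8):
    """Recursively split the granular space (rows of F, one per instance) into
    regions with a constant output value, CART-style."""
    if max_leaves <= 1 or len(np.unique(y)) == 1:
        return {"value": float(y.mean())}
    j, s = best_split(F, y)
    if j is None:
        return {"value": float(y.mean())}
    mask = F[:, j] <= s
    half = max(max_leaves // 2, 1)
    return {"attr": j, "split": s,
            "left": build_fgrt(F[mask], y[mask], half),
            "right": build_fgrt(F[~mask], y[~mask], max_leaves - half)}

def predict_fgrt(tree, f):
    """Route one granular-vector representation f down the tree."""
    while "value" not in tree:
        tree = tree["left"] if f[tree["attr"]] <= tree["split"] else tree["right"]
    return tree["value"]
```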
4.4. Boosted Fuzzy Granular Regression Trees

BFGRT is an algorithm that integrates multiple fuzzy granular regression trees through the idea of ensemble learning. It does not rely on only one fuzzy granular regression tree but adopts many of them to solve the task together, using the weighted average of the regression values of the individual trees as the final regression value. Given the instance set, the fuzzy granular space, the fuzzy granular rule base, the number of instances, and the number of attributes, the fuzzy granular regression trees are constructed in parallel as follows (a code sketch is given after Algorithm 5).
Step 1: create the parallel tasks.
Step 2: instance set extraction: randomly draw fuzzy granular vectors from the rule base with replacement, repeating as many times as there are instances; each fuzzy granular vector is selected with equal probability. The unselected fuzzy granular vectors form the out-of-bag data used as the test set.
Step 3: attribute extraction: extract a subset of attributes to compose the attribute subset.
Step 4: attribute selection: calculate the optimal segmentation attribute and the optimal segmentation point on the data of the current node, divide the node into two child nodes, and allocate the remaining fuzzy granular vectors to the child nodes.
Step 5: generate a fuzzy granular tree: repeat Steps 3 and 4 on the fuzzy granular vector set of each child node to recursively split the nodes until all leaf nodes are generated.
Step 6: repeat Steps 2–5 to get different fuzzy granular regression trees, one per task.
Step 7: BFGRT consists of the linear combination of the fuzzy granular regression trees, where each tree is weighted according to its root mean square error (RMSE): trees with a smaller RMSE receive a larger weight. The details are given in Algorithm 3.

 Input: instance set, regression value set, and the number of fuzzy granular regression trees
 Output: boosted fuzzy granular regression trees
(1) Get a fuzzy granular vector rule base by parallel fuzzy granulation of the dataset (see Algorithm 2, Algorithm 4, and Algorithm 5).
(2) Create the parallel tasks.
(3) Execute the following operations for each independent task:
   MapFunction(key, value), where key = offset of the instance and value indicates the fuzzy granular vectors randomly
     //selected from the rule base.
     //Randomly select attributes from the attribute set to constitute an attribute subset.
     //Form a fuzzy granular rule set, build a fuzzy granular regression tree, and get its RMSE.
(4)   FOR i = 1 to instances-total-number
(5)    SubsetID = i mod J
(6)    context.write(SubsetID, FuzzyGranularVector)
(7)   END FOR
   END MapFunction
(8)  ReduceFunction(key, value)//Here, key = SubsetID, value = FuzzyGranularVector
(9)   Job.addCache(FuzzyGranularVector[SubsetID])
(10)   tree = train(SubsetID, FuzzyGranularVector)//(See Step 6–Step 9 of Algorithm 2.)
(11)   context.write(1, (tree, RMSE))
   END ReduceFunction
(12)//Calculate BFGRT as the linear combination of the fuzzy granular regression trees, weighted according to their RMSE.
 Input: <offset, instance> and t (the number of divided instance sets)
 Output: <SubsetID, instance>
(1) FOR i = 1 to instances-total-number
(2)  SubsetID = i mod t
(3)  context.write(SubsetID, instance)
(4) END FOR
 Input: SubSetID, cluster center set C, and instance subset X[SubSetID]
 Output: fuzzy granular set
(1) Job.addCache(X[SubSetID])
(2) FuzzyGranularSet = Granulation(X[SubSetID], C)//See Step 5 of Algorithm 2.
(3) context.write(1, FuzzyGranularSet)
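The following Python sketch mirrors the ensemble construction of Steps 1–7 and Algorithm 3, reusing the build_fgrt and predict_fgrt helpers sketched after Algorithm 2: bootstrap sampling of granular vectors, random attribute subsets, and a weighted combination of the trees. Weighting each tree by the inverse of its out-of-bag RMSE and then normalizing is an assumption; the paper only states that the weights depend on the RMSE.

```python
import numpy as np

def train_bfgrt(F, y, n_trees=10, n_attrs=None, seed=0):
    """Train several fuzzy granular regression trees on bootstrap samples and
    random attribute subsets; weight each tree by the inverse of its
    out-of-bag RMSE (assumed weighting scheme)."""
    rng = np.random.default_rng(seed)
    n, m = F.shape
    n_attrs = n_attrs or max(1, int(np.sqrt(m)))
    trees = []
    for _ in range(n_trees):
        rows = rng.choice(n, size=n, replace=True)          # bootstrap sample
        oob = np.setdiff1d(np.arange(n), rows)              # out-of-bag instances
        attrs = rng.choice(m, size=n_attrs, replace=False)  # random attribute subset
        tree = build_fgrt(F[np.ix_(rows, attrs)], y[rows])
        if len(oob) == 0:
            rmse = 1.0
        else:
            pred = np.array([predict_fgrt(tree, f) for f in F[np.ix_(oob, attrs)]])
            rmse = np.sqrt(np.mean((pred - y[oob]) ** 2))
        trees.append((tree, attrs, 1.0 / (rmse + 1e-12)))
    total = sum(w for _, _, w in trees)
    return [(t, a, w / total) for t, a, w in trees]          # normalized weights

def predict_bfgrt(model, f):
    """Weighted average of the individual tree predictions."""
    return sum(w * predict_fgrt(t, f[a]) for t, a, w in model)
```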
4.5. Regression

Given a test instance, we fuzzy granulate it to obtain a fuzzy granular vector. Then, BFGRT is used to predict on the fuzzy granular vector to obtain the regression value. Algorithm 6 describes the procedure, and an end-to-end usage sketch follows it.

 Input: test instance, cluster center set, and BFGRT
 Output: regression value of the test instance
(1) Granulate the test instance into a fuzzy granular vector.
(2) FOR each fuzzy granular regression tree in BFGRT
    Restrict the fuzzy granular vector to the attribute subset selected by this tree and compute the tree's prediction.
   END FOR
(3) Weight each tree's prediction according to the tree's RMSE-based weight.
(4) Sum the weighted predictions to obtain the regression value.
(5) Return the regression value.
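Putting the previous sketches together, a toy end-to-end run (granulate, train, granulate the test instance, predict) might look as follows. The data, sizes, and parameter values are invented purely for illustration.

```python
import numpy as np

# Synthetic example: 200 normalized instances with 3 attributes (made-up data).
rng = np.random.default_rng(1)
X = rng.random((200, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + 0.05 * rng.standard_normal(200)

centers = adaptive_centers(X)                  # Section 4.1 sketch
F = np.array([[cardinal(fuzzy_granule(x, centers, a)) for a in range(3)] for x in X])

model = train_bfgrt(F, y, n_trees=20)          # Section 4.4 sketch
x_test = rng.random(3)
f_test = np.array([cardinal(fuzzy_granule(x_test, centers, a)) for a in range(3)])
print("predicted:", predict_bfgrt(model, f_test))
```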

5. Experimental Analysis

In this experiment, the results presented were generated on a server with two Intel Xeon CPUs and 64 GB of memory. The datasets include 3 datasets gathered from the UC Irvine Machine Learning Repository and 3 datasets constructed from them by adding 1% noise (Table 2), and 10-fold cross-validation, which is a form of sampling without replacement, was adopted to test the performance of BFGRT, as illustrated in Algorithm 3. The basic idea of 10-fold cross-validation is to partition the dataset into 10 nonoverlapping parts of equal size. In each round of training, one part of the dataset is used to verify the generalization ability of the learner, and the remaining parts are used to train it; after 10 rounds, 10 learners are obtained (a sketch of this protocol is given below). This method is very similar to the bagging method and also supports parallel learning. Root mean square error (RMSE) and execution time are the performance metrics. We compared classic fuzzy granulation with the proposed parallel fuzzy granulation (Figure 2) and analyzed the performance of support vector regression (SVR), random forest (RF), long short-term memory (LSTM), and the proposed BFGRT (Figures 3–5). We also give the relation between the number of cluster centers and RMSE.
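For reference, a minimal sketch of the 10-fold cross-validation protocol described above, reusing the train_bfgrt and predict_bfgrt helpers from the sketch in Section 4.4; the fold construction and RMSE computation are standard, but the shuffle seed and fold count are arbitrary choices here.

```python
import numpy as np

def kfold_rmse(F, y, k=10, seed=0):
    """k-fold cross-validation (sampling without replacement): split the data
    into k disjoint parts, train on k-1 parts, evaluate RMSE on the held-out part."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    folds = np.array_split(idx, k)
    scores = []
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        model = train_bfgrt(F[train], y[train])
        pred = np.array([predict_bfgrt(model, f) for f in F[test]])
        scores.append(np.sqrt(np.mean((pred - y[test]) ** 2)))
    return float(np.mean(scores))
```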

Fuzzy granulation of the data is an important part of modeling. Some traditional fuzzy granulation methods do not require clustering; their idea is to construct a matrix using the similarity of each sample to every other sample. This kind of method cannot be executed in parallel and can only be executed serially, and its time complexity is O(n²m), where n is the number of instances and m is the number of attributes. The proposed parallel fuzzy granulation is executed as follows. First, we obtain the cluster centers by the clustering algorithm designed in this study. Next, we construct fuzzy granules from each instance and each cluster center. This process can be executed in parallel by MapReduce. The time complexity is O(ncm/t), where c denotes the number of cluster centers and t represents the number of parallel tasks. Theoretically, the efficiency of the proposed granulation is therefore about (nt/c − 1) × 100% higher than that of traditional fuzzy granulation.

MapReduce can be used for parallel fuzzy granulation. The main idea is as follows: partition the sequence file job into multiple independently runnable map tasks, assign them to several processors for execution to produce intermediate results, and then let the reduce tasks collect these results to generate the final output. Map tasks and reduce tasks can both be executed in parallel. The MapReduce process is divided into two parts, i.e., map and reduce. The map function and reduce function of the fuzzy granulation process are shown in Algorithm 4 and Algorithm 5, respectively (a local simulation of this flow is sketched below). The results of parallel granulation are given in Table 3. As demonstrated in Figure 2, when the number of parallel tasks is 3, parallel fuzzy granulation outperforms classic fuzzy granulation by about 252%, 647%, and 438% in terms of minimum efficiency, maximum efficiency, and average efficiency, respectively.
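The experiments use Hadoop MapReduce; as a rough local stand-in, the same map/reduce flow can be imitated with a Python process pool, splitting the instance set into chunks (the map step) and concatenating the partial granulation results (the reduce step). This reuses the fuzzy_granule and cardinal sketches from Section 4.2 and only illustrates the data flow, not the Hadoop API used in Algorithms 4 and 5.

```python
from multiprocessing import Pool
import numpy as np

def granulate_chunk(args):
    """Map step: fuzzy-granulate one chunk of instances against the shared
    cluster centers (stand-in operators from the Section 4.2 sketch)."""
    chunk, centers = args
    return np.array([[cardinal(fuzzy_granule(x, centers, a)) for a in range(chunk.shape[1])]
                     for x in chunk])

def parallel_granulation(X, centers, n_tasks=3):
    """Split the instance set into n_tasks chunks, granulate them in parallel,
    and concatenate the partial results (a reduce-like step).
    On platforms that spawn workers (e.g., Windows), call this from within
    an `if __name__ == "__main__":` guard."""
    chunks = np.array_split(X, n_tasks)
    with Pool(processes=n_tasks) as pool:
        parts = pool.map(granulate_chunk, [(c, centers) for c in chunks])
    return np.concatenate(parts)
```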

As shown in Figure 3(a), tested on the dataset combined cycle power plant, the RMSE curve of BFGRT is low in the middle and high on both sides. When the number of cluster centers is between 3000 and 6000, the RMSE of BFGRT is lower than that of the other three methods. When the number of cluster centers is within 3000, the RMSE of BFGRT drops quickly, and the slope of the curve decreases significantly. When the number of cluster centers is higher than 5000, the curve of BFGRT shows a small local oscillation and remains lower than the other three methods. In particular, when the number of cluster centers is 3986, the RMSE of BFGRT achieves its minimum value of 3.1151, which is about 46.88%, 4.22%, and 32.85% better than SVR, RF, and LSTM, respectively; BFGRT is slightly better than RF and far better than SVR and LSTM. When the number of cluster centers is 1006, the RMSE of BFGRT reaches its maximum value of 3.4681, which is about 24.20% and 25.24% better than SVR and LSTM, respectively, and about 6.64% worse than RF. On average, the RMSE of BFGRT is 3.2441, which is better than 4.5753 of SVR, 3.2522 of RF, and 4.6389 of LSTM (i.e., 29.10%, 0.25%, and 30.07% improvement, respectively).

After we added 1% noise to the dataset, as illustrated in Figure 3(b), the curve of BFGRT resembles an inverted "Mexican straw hat." When the number of cluster centers is 3986, the RMSE of BFGRT reaches its minimum value of 3.3151, while those of SVR, RF, and LSTM are 4.8946, 3.5322, and 4.9289 (i.e., 32.27%, 6.15%, and 32.74% improvement, respectively). When the number of cluster centers is 9360, BFGRT reaches its maximum value of 3.5651, which improves the performance by about 27.16% and 27.67% compared with SVR and LSTM, respectively, and is about 0.93% worse than RF. On average, the RMSE of BFGRT is 3.4300, which is about 29.92%, 2.89%, and 30.41% better than SVR, RF, and LSTM, respectively. In terms of the degree of noise influence, the RMSE of SVR, RF, LSTM, and BFGRT on the noisy dataset increases by about 6.98%, 8.61%, 6.25%, and 5.73%, respectively. BFGRT is less affected by noise and more robust than the other three algorithms.

As demonstrated in Figure 4(a), the dataset bias correction of numerical prediction model temperature forecast contains 7750 instances and 25 features. The RMSE curve of BFGRT can be roughly divided into two segments with 4000 cluster centers as the cutting point: the part below 4000 is a descending curve and the part above 4000 is an ascending one. When the number of cluster centers reaches 4021, the RMSE of BFGRT attains its minimum value of 0.7013, while SVR, RF, and LSTM only reach 0.9129, 0.7292, and 0.9734 (that is, 23.18%, 3.83%, and 27.95% improvement, respectively). When the number of cluster centers is 1592, the RMSE of BFGRT reaches its maximum value of 0.7387, which is about 1.30% higher than RF and 19.08% and 24.11% lower than SVR and LSTM, respectively. The average RMSE of BFGRT is 0.7189, which is about 21.25%, 1.42%, and 26.15% better than SVR, RF, and LSTM, respectively.

The performance on the noisy dataset is shown in Figure 4(b). The RMSE of BFGRT is lower than that of the other three algorithms, with a minimum value of 0.9016, a maximum value of 0.9397, and an average value of 0.9221. By contrast, the average RMSE of SVR, RF, and LSTM is 1.2143, 1.0297, and 1.2745, respectively. The mean RMSE of BFGRT is about 24.07%, 10.45%, and 27.65% better than that of SVR, RF, and LSTM, respectively. On the dataset containing noise, the RMSE of SVR, RF, LSTM, and BFGRT increases by 33.02%, 41.21%, 30.93%, and 27.48%, respectively. It can be seen that BFGRT is less sensitive to noise than the other three algorithms.

The number of instances in the dataset metro interstate traffic volume is more than 4 times that of the datasets mentioned above. As illustrated in Figure 5(a), the maximum RMSE of BFGRT is 2259.1858, while that of SVR is 3050.0148 and that of RF is 2435.1646 (i.e., 25.93% and 7.23% improvement, respectively); compared with LSTM, the maximum RMSE of BFGRT is about 2.27% lower. The minimum RMSE of BFGRT is 2154.1888, which is 29.37%, 11.54%, and 2.49% better than SVR, RF, and LSTM, respectively. The mean RMSE of BFGRT is 2202.8738, which is 27.77%, 9.54%, and 0.28% better than SVR, RF, and LSTM, respectively. On the dataset containing 1% noise, as shown in Figure 5(b), BFGRT improves the performance by about 29.08%, 11.20%, and 1.86% compared with SVR, RF, and LSTM, respectively. The mean RMSE of SVR, RF, LSTM, and BFGRT increases by 2.04%, 2.07%, 2.10%, and 0.20% on the noisy dataset, respectively. Compared with the other three algorithms, BFGRT is the one least affected by noise.

The number of instances in dataset online shopping is more than 160000. As shown in Figure 6(a), the maximum value of RMSE of BFGRT is 1.88, while that of SVR is 2.12, that of RF is 2.03, and that of LSTM is 1.89 (i.e., 11.32%, 7.39%, and 0.53% improvement, respectively). From the dataset containing 1% noise, as shown in Figure 6(b), the maximum value of RMSE of BFGRT is 1.94, which has improved by about 16.01%, 9.77%, and 0.51%, respectively, compared with SVR, RF, and LSTM.

From the above analysis, it can be seen that BFGRT outperforms SVR, RF, and LSTM on the six datasets. In particular, it is stable on the three noisy datasets and is less disturbed by noise. Judging from the shape of the RMSE curve of BFGRT, it is low in the middle and high on both sides; when the number of cluster centers is close to half the number of instances, the performance is close to optimal. For the datasets that contain noise, we also found that BFGRT has better robustness. The main reason is that BFGRT incorporates global comparison in the fuzzy granulation process, which can overcome noisy interference to some extent. This is also a great advantage of the designed BFGRT.

6. Conclusion

In this study, we propose BFGRT for the regression problem. In the algorithm, parallel fuzzy granulation is introduced to improve the efficiency of data granulation. In the process of parallel fuzzy granulation, we design a clustering algorithm with automatic optimization of cluster centers. Through parallel fuzzy granulation, a regression problem can be solved in the fuzzy granular space, where we present new operators and metrics between fuzzy granules. In the fuzzy granular space, we design a loss function to select optimal attributes and split points and recursively construct a fuzzy granular regression tree. On this basis, we build multiple fuzzy granular regression trees on different attribute subsets and combine them into BFGRT to predict test instances. In the future, BFGRT can be combined with cloud computing and the Internet of Things to process big data.

Data Availability

The dataset used to support the findings of this study is available from the UC Irvine Machine Learning Repository.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work was supported in part by the Scientific Research “Climbing” Program of Xiamen University of Technology, China (XPDKT20027), in part by the University Natural Sciences Research Project of Anhui Province, China (KJ2020A0660), and in part by the Natural Science Foundation of Anhui Province, China (2008085MF202).