Abstract

A systematic approach is proposed to optimize the $h$ value for fuzzy linear regression (FLR) analysis using minimum fuzziness criteria with symmetric triangular fuzzy numbers (TFNs). First, a new concept of credibility is defined to evaluate the performance of FLR models with different $h$ values when a set of sample data pairs is given. Second, based on this concept of credibility, a programming model is formulated to optimize the value of $h$. Finally, both the numerical study and the real application show that the proposed approach is effective and efficient; that is, the optimal value of $h$ can be determined definitely with respect to a given set of sample data pairs.

1. Introduction

Statistical regression analysis and fuzzy regression analysis are two types of methods, resting on different philosophies, for assessing the functional relationship between the dependent and independent variables and for determining the best-fit model of that relationship from given input-output data pairs. In statistical regression analysis, deviations between the observed values and the estimates are assumed to be random errors following a probability distribution. In fuzzy regression analysis, by contrast, the deviations are attributed to the imprecision of the observed values and/or the indefiniteness of the model structure. Tanaka et al. [1] first proposed fuzzy linear regression (FLR) analysis using fuzzy functions defined by Zadeh’s extension principle [2], in which the observed values can differ from the estimated values to a certain degree of belief [3]. Thus, the uncertainty in this type of regression model is fuzziness rather than randomness; the disturbance is incorporated into the fuzzy coefficients, and the final objective is to fit the fuzzy coefficients to the available sample data pairs.

According to [3], the existing FLR methods can be roughly classified into two categories based on the criterion function: FLR methods using minimum fuzziness criteria and FLR methods using fuzzy least-squares criteria. In the first category, the FLR model is built by minimizing the system vagueness. The first FLR method in [1] was extended to other types of fuzzy coefficients, including general LP-type fuzzy coefficients [4], exponential fuzzy coefficients [5], triangular fuzzy coefficients [6, 7], and trapezoidal fuzzy coefficients [9]. Chen et al. used symmetric triangular fuzzy numbers to determine the parameters of rock shear strength through least absolute linear regression; practical engineering computations and a comparison with other methods showed that the method is reasonable [8]. Kheirfam and Verdegay extended the dual simplex method to a type of fuzzy linear programming problem involving symmetric trapezoidal fuzzy numbers; their results lead to a solution for fuzzy linear programming problems and to the optimal value function with fuzzy coefficients [10]. Interval regression, in which a model with interval coefficients is assumed, is regarded as the simplest version of fuzzy regression analysis [11–13]. Some fuzzy nonlinear regression approaches have also been proposed [3, 7]; one study indicated that the prediction performance of a nonlinear multiple regression model is higher than that of a fuzzy inference system model [14]. In the second category, the FLR model is built by minimizing the fuzzy distance between the predicted outputs and the observed outputs [15–17]. Celmiņš [15] dealt with quadratic membership functions based on least-squares fitting with indicators of discord, a data spread dilator, and so forth, and Diamond [16] proposed least-squares fitting models for crisp-input fuzzy-output data and for fuzzy input-output data, in which a distance between fuzzy numbers is defined to measure the best fit. Chang [17] formulated a fuzzy least-squares regression model by defining the weighted fuzzy arithmetic (WFA). Lv et al. proposed a novel least squares support vector machine (LSSVM) based ensemble learning paradigm to predict the emission of a coal-fired boiler using real operation data; the results show that the new soft FCM-LSSVM-PLS ensemble method can predict the emission accurately [18].

Regarding the first category of FLR methods, the value of $h$ determines the range of the possibility distribution of the fuzzy parameters [1, 3–8, 19], so it is important to select a suitable value of $h$ in FLR analysis. Moskowitz and Kim [20] studied the relationship among the $h$ value, the shape of the membership function, and the spreads of the fuzzy parameters in FLR with symmetric fuzzy numbers; they also developed a general approach to assess proper $h$ values. Their studies showed that the system fuzziness increases as the $h$ value increases. Tanaka and Watada [19] suggested that the selection of the $h$ value should be based on the sufficiency of the collected dataset: when the dataset is sufficiently large, $h = 0$ should be used, and $h$ is increased as the volume of the collected data decreases. However, neither study suggested how to obtain an optimal $h$ value for the FLR model when a sample dataset is given. In practical situations, the value of $h$ is usually preselected subjectively by the decision makers (DMs) [17, 18].

In fact, if a larger value is given to $h$, the FLR using triangular fuzzy coefficients tends to yield unnecessarily large fuzziness and estimated parameters with too large an aspiration, which makes the fuzzy predictive interval so fuzzy that it has no operational definition or interpretation [8]. On the other hand, if a smaller $h$ value is used, the FLR tends to yield a very low membership degree, which leads to a very narrow fuzzy predictive interval and makes the reliability of the FLR model doubtful. Therefore, it is necessary to develop a systematic approach to help DMs determine the optimal $h$ value for FLR using minimum fuzziness criteria. To tackle the problem, following the suggestion of Tanaka and Watada [19], a new concept of credibility is introduced in this paper to measure the performance of FLR models with different $h$ values, based on which a systematic approach is formulated to optimize the $h$ value for FLR using minimum fuzziness criteria.

The rest of the paper is organized as follows. In Section 2, Tanaka’s FLR method is described. In Section 3, the concept of credibility is proposed to measure the performance of FLR models with different $h$ values. In Section 4, a systematic approach is formulated to optimize the $h$ value for FLR with symmetric triangular fuzzy numbers (TFNs). In Section 5, a numerical example is used to show how the optimal $h$ value can be determined using the proposed approach. In Section 6, a real application is conducted to illustrate the effectiveness of the proposed approach. The conclusions are drawn in Section 7.

2. FLR with Symmetric TFNs

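In an FLR analysis, the explained variable is assumed to be a linear combination of the explanatory variables. This relationship should be obtained from a sample of $m$ observations $(y_i, \mathbf{x}_i)$, $i = 1, 2, \ldots, m$, where $y_i$ is the $i$th crisp observation and $\mathbf{x}_i$ is the $i$th crisp input vector. Moreover, $\mathbf{x}_i = (x_{i1}, x_{i2}, \ldots, x_{in})$, and $x_{ij}$ is the observed value for the $j$th variable in the $i$th case of the sample. In particular, the fuzzy linear function has to be estimated as follows:

$$\tilde{Y}_i = \tilde{A}_0 + \tilde{A}_1 x_{i1} + \cdots + \tilde{A}_n x_{in}, \tag{1}$$

where $\tilde{Y}_i$ is the fuzzy estimation of $y_i$, and $\tilde{A}_j$, $j = 0, 1, \ldots, n$, are fuzzy coefficients in the form of symmetric TFNs that can be uniquely defined by $\tilde{A}_j = (a_j^c, a_j^s)$. Here, $a_j^s$ is the spread value and $a_j^c$ is the centre value of $\tilde{A}_j$ (see Figure 1).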
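The definitions above can be sketched in a few lines of Python. This is an illustrative sketch only: the function names `tfn_membership` and `fuzzy_output` and the sample numbers are assumptions made here, not part of the original method description. A symmetric TFN is represented by its centre and spread, and the fuzzy output of the linear model is again a symmetric TFN whose spread is the sum of the coefficient spreads weighted by the absolute input values.

```python
from typing import Sequence, Tuple


def tfn_membership(value: float, centre: float, spread: float) -> float:
    """Membership degree of a crisp value in the symmetric TFN (centre, spread)."""
    if spread <= 0:
        # Degenerate TFN: only the centre itself has full membership.
        return 1.0 if value == centre else 0.0
    return max(0.0, 1.0 - abs(value - centre) / spread)


def fuzzy_output(x: Sequence[float],
                 centres: Sequence[float],
                 spreads: Sequence[float]) -> Tuple[float, float]:
    """Centre and spread of the fuzzy estimate for one input vector.

    centres[0] and spreads[0] belong to the intercept term;
    x holds the crisp inputs x_1, ..., x_n.
    """
    xs = [1.0, *x]  # prepend the constant input for the intercept
    centre = sum(c * v for c, v in zip(centres, xs))
    spread = sum(s * abs(v) for s, v in zip(spreads, xs))
    return centre, spread


# Hypothetical one-variable model: A0 = (1.0, 0.2), A1 = (0.5, 0.1)
centre, spread = fuzzy_output([2.0], centres=[1.0, 0.5], spreads=[0.2, 0.1])
membership = tfn_membership(2.2, centre, spread)
```

For the hypothetical coefficients above, the estimate at $x = 2$ has centre 2.0 and spread 0.4, so an observation of 2.2 sits halfway between the centre and the edge of the support.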

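The goal in fuzzy linear regression is to determine the fuzzy coefficients $\tilde{A}_j$ by minimizing the system vagueness subject to the following inclusion condition [1, 19]:

$$y_i \in [\tilde{Y}_i]_h, \quad i = 1, 2, \ldots, m, \tag{2}$$

where $[\tilde{Y}_i]_h$ is the $h$-level set of the fuzzy output from the linear fuzzy model corresponding to the input vector $\mathbf{x}_i$. Since $h$-level sets of fuzzy numbers are intervals, (2) can be written as follows by using interval arithmetic:

$$\sum_{j=0}^{n} a_j^c x_{ij} - (1-h)\sum_{j=0}^{n} a_j^s |x_{ij}| \le y_i \le \sum_{j=0}^{n} a_j^c x_{ij} + (1-h)\sum_{j=0}^{n} a_j^s |x_{ij}|, \tag{3}$$

where $a_j^c$ and $a_j^s$ denote the centre and spread values of $\tilde{A}_j$, and $x_{i0} = 1$.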

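The system fuzziness in (1), $J$, can be given as

$$J = \sum_{i=1}^{m} f_i,$$

in which $f_i$ is the fuzziness associated with $\tilde{Y}_i$ and can be given as

$$f_i = \sum_{j=0}^{n} a_j^s |x_{ij}|, \quad x_{i0} = 1.$$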

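Hence, according to [1, 19], the coefficients $\tilde{A}_j = (a_j^c, a_j^s)$, $j = 0, 1, \ldots, n$, in the form of symmetric TFNs, can be determined by solving the following programming model:

$$\min J = \sum_{i=1}^{m} \sum_{j=0}^{n} a_j^s |x_{ij}|$$

subject to

$$\begin{aligned}
\sum_{j=0}^{n} a_j^c x_{ij} + (1-h)\sum_{j=0}^{n} a_j^s |x_{ij}| &\ge y_i, \quad i = 1, 2, \ldots, m,\\
\sum_{j=0}^{n} a_j^c x_{ij} - (1-h)\sum_{j=0}^{n} a_j^s |x_{ij}| &\le y_i, \quad i = 1, 2, \ldots, m,\\
a_j^s &\ge 0, \quad j = 0, 1, \ldots, n,
\end{aligned}$$

in which the first two constraints correspond to the inclusion condition in (2), and the third constraint guarantees that the spread values of $\tilde{A}_j$, $j = 0, 1, \ldots, n$, are nonnegative.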
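The role of $h$ in the constraints can be illustrated with a small feasibility check. The sketch below does not solve the programming model; it only evaluates the system fuzziness and tests the inclusion condition for a candidate set of coefficients. The helper names and the three data pairs are hypothetical and chosen purely for illustration.

```python
from typing import Sequence, Tuple


def fuzzy_output(x: Sequence[float],
                 centres: Sequence[float],
                 spreads: Sequence[float]) -> Tuple[float, float]:
    """Centre and spread of the fuzzy estimate (index 0 is the intercept)."""
    xs = [1.0, *x]
    centre = sum(c * v for c, v in zip(centres, xs))
    spread = sum(s * abs(v) for s, v in zip(spreads, xs))
    return centre, spread


def system_fuzziness(X: Sequence[Sequence[float]],
                     spreads: Sequence[float]) -> float:
    """Objective J: sum over all samples of the spread of the fuzzy output."""
    zeros = [0.0] * len(spreads)
    return sum(fuzzy_output(x, zeros, spreads)[1] for x in X)


def satisfies_inclusion(X, y, centres, spreads, h: float,
                        tol: float = 1e-9) -> bool:
    """True if every observation lies in the h-level interval of its estimate."""
    if any(s < 0 for s in spreads):
        return False  # spreads must be nonnegative
    for x, target in zip(X, y):
        centre, spread = fuzzy_output(x, centres, spreads)
        half_width = (1.0 - h) * spread  # h-level set shrinks as h grows
        if not (centre - half_width - tol <= target <= centre + half_width + tol):
            return False
    return True


# Hypothetical data and candidate coefficients A0 = (1.0, 0.2), A1 = (1.0, 0.0)
X = [[1.0], [2.0], [3.0]]
y = [2.1, 2.9, 4.2]
centres, spreads = [1.0, 1.0], [0.2, 0.0]

feasible_at_0 = satisfies_inclusion(X, y, centres, spreads, h=0.0)   # True
feasible_at_05 = satisfies_inclusion(X, y, centres, spreads, h=0.5)  # False
```

The same coefficients that cover all observations at $h = 0$ fail at $h = 0.5$, which is why a larger $h$ forces larger spreads and hence a larger $J$.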

3. Credibility Measure for the FLR Model

As can be seen from the programming model in Section 2, the value of $h$ determines the range of the possibility distribution of the fuzzy parameters, so it is important to select a suitable $h$ value for FLR. To do this, the first problem to be solved is how to evaluate the FLR models in (1) obtained with different $h$ values. Therefore, a new concept of credibility measure is introduced in this section, based on which the FLR models associated with different $h$ values can be assessed.

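Assume that $h_1$ and $h_2$ are any two values of $h$, with $0 \le h_1 < 1$ and $0 \le h_2 < 1$. We denote by $\tilde{A}_j^{h_1}$ and $\tilde{A}_j^{h_2}$, $j = 0, 1, \ldots, n$, the two sets of symmetric TFNs obtained with respect to $h_1$ and $h_2$, respectively, and by $\tilde{Y}_i^{h_1}$ and $\tilde{Y}_i^{h_2}$ the corresponding estimations of $y_i$ from (1). From (1), $\tilde{Y}_i^{h_1}$ and $\tilde{Y}_i^{h_2}$ are calculated as

$$\tilde{Y}_i^{h_k} = \tilde{A}_0^{h_k} + \tilde{A}_1^{h_k} x_{i1} + \cdots + \tilde{A}_n^{h_k} x_{in}, \quad k = 1, 2.$$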

Now, we are interested in which of $\tilde{Y}_i^{h_1}$ and $\tilde{Y}_i^{h_2}$ is the better estimation of $y_i$; that is, which of $h_1$ and $h_2$ is the better choice for building an FLR model when the sample set of data pairs is given. In our opinion, two factors should be taken into account: the estimating fuzziness $f_i$ and the membership degree $\mu_i$ of $y_i$ in its fuzzy estimation. In general, the smaller $f_i$ is and the higher $\mu_i$ is, the better the performance of the fuzzy estimation in representing $y_i$ will be. To illustrate this point of view, two special cases are considered first.
(1) If the two estimations have equal membership degrees, it is clear that the one with the smaller fuzziness performs better than the one with the larger fuzziness in representing $y_i$. Figure 2 shows such a case: the membership degrees are equal and the fuzziness differs, so the estimation with the smaller fuzziness is the better estimation of $y_i$.
(2) If the two estimations have equal fuzziness, there is no doubt that the one with the higher membership degree performs better than the one with the lower membership degree in representing $y_i$. Figure 3 shows such a case: the fuzziness is equal and the membership degrees differ, so the estimation with the higher membership degree is the better estimation of $y_i$.

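Now let us consider the more general situation in which neither the membership degrees nor the fuzziness values of the two estimations are equal, as shown in Figure 4. To deal with this problem, a new concept of credibility measure is introduced in this paper. We denote the credibility of $\tilde{Y}_i$ in representing $y_i$ as $z_i$, which is expressed as

$$z_i = \frac{\mu_i}{f_i}, \tag{9}$$

where $\mu_i$ is the membership degree of $y_i$ in $\tilde{Y}_i$ and $f_i$ is the fuzziness of $\tilde{Y}_i$.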

The higher the credibility $z_i$ is, the better the performance of $\tilde{Y}_i$ in representing $y_i$ will be. Obviously, the scenarios of equal membership degrees and of equal fuzziness are two special cases of (9).

As a result, based on (9), the total credibility of all sample data, $Z$, can be obtained to assess the performance of the FLR model in (1); it is calculated as follows:

$$Z = \sum_{i=1}^{m} z_i. \tag{10}$$

The higher $Z$ is, the better the performance of the FLR model will be. This helps us select the optimal value of $h$ for an FLR model.
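The credibility idea can be sketched as follows. Two assumptions are made here and should be read as such: credibility is taken as the ratio of membership degree to fuzziness, and the fuzziness of an estimate is taken to be the spread of its symmetric TFN. The function names and numbers are hypothetical.

```python
from typing import Sequence


def credibility(y: float, centre: float, spread: float) -> float:
    """Credibility of the fuzzy estimate (centre, spread) in representing y.

    Assumed form: membership degree divided by fuzziness (the spread).
    """
    if spread <= 0:
        raise ValueError("spread must be positive")
    membership = max(0.0, 1.0 - abs(y - centre) / spread)
    return membership / spread


def total_credibility(ys: Sequence[float],
                      centres: Sequence[float],
                      spreads: Sequence[float]) -> float:
    """Sum of the per-sample credibilities over all sample data."""
    return sum(credibility(y, c, s) for y, c, s in zip(ys, centres, spreads))


# Special case 1: equal membership (both 1.0), different fuzziness.
narrow = credibility(2.0, centre=2.0, spread=0.5)
wide = credibility(2.0, centre=2.0, spread=1.0)

# Special case 2: equal fuzziness (0.5), different membership.
near = credibility(2.0, centre=2.0, spread=0.5)
far = credibility(2.25, centre=2.0, spread=0.5)
```

Under these assumptions the narrower estimate wins in the first case and the estimate with higher membership wins in the second, matching the two special cases discussed in this section.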

4. Formulating a Systematic Approach to Optimizing the $h$ Value

Based on the concept of credibility measure for FLR models introduced in Section 3, a systematic approach is proposed in this section to optimize the $h$ value for FLR models.

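As shown in Figure 5, we denote the optimal solution of the programming model in Section 2 with respect to $h_1$ as $a_j^c(h_1)$, $a_j^s(h_1)$, and $J(h_1)$, and the corresponding fuzziness as $f_i(h_1)$, $i = 1, 2, \ldots, m$. According to Moskowitz and Kim [20], the optimal solution of the same programming model and the corresponding fuzziness with regard to $h_2$ can be obtained as

$$a_j^c(h_2) = a_j^c(h_1), \quad a_j^s(h_2) = \frac{1-h_1}{1-h_2}\, a_j^s(h_1), \quad J(h_2) = \frac{1-h_1}{1-h_2}\, J(h_1), \quad f_i(h_2) = \frac{1-h_1}{1-h_2}\, f_i(h_1). \tag{11}$$

This indicates that the central value of each $\tilde{A}_j$ remains constant when the $h$ value changes from $h_1$ to $h_2$, while the spread value of each $\tilde{A}_j$, the objective function $J$, and the fuzziness $f_i$ are all multiplied by $(1-h_1)/(1-h_2)$.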
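The rescaling property attributed to Moskowitz and Kim [20] means that the model needs to be solved only once. A minimal sketch, assuming the scaling factor $(1-h_1)/(1-h_2)$ for the spreads and unchanged centres; the function name and the sample spreads are hypothetical:

```python
from typing import List, Sequence


def rescale_spreads(spreads_h1: Sequence[float], h1: float, h2: float) -> List[float]:
    """Spreads of the optimal coefficients at level h2, given those at level h1.

    Centres are unchanged; every spread is multiplied by (1 - h1) / (1 - h2).
    """
    if not (0.0 <= h1 < 1.0 and 0.0 <= h2 < 1.0):
        raise ValueError("h1 and h2 must lie in [0, 1)")
    factor = (1.0 - h1) / (1.0 - h2)
    return [factor * s for s in spreads_h1]


spreads_at_0 = [0.2, 0.1]                                 # hypothetical solution at h = 0
spreads_at_05 = rescale_spreads(spreads_at_0, 0.0, 0.5)   # [0.4, 0.2]

# The h-level half-width (1 - h) * spread is the same at both levels,
# which is why the inclusion constraints stay satisfied after rescaling.
half_width_0 = (1.0 - 0.0) * spreads_at_0[0]
half_width_05 = (1.0 - 0.5) * spreads_at_05[0]
```

Doubling the spreads when moving from $h = 0$ to $h = 0.5$ leaves the $h$-level intervals unchanged, so feasibility is preserved while the fuzziness doubles.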

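With respect to $h_1$, the membership degrees $\mu_i(h_1)$, $i = 1, 2, \ldots, m$, are calculated as follows:

$$\mu_i(h_1) = 1 - \frac{d_i}{f_i(h_1)}, \tag{12}$$

in which $d_i$ is given as

$$d_i = \left| y_i - \sum_{j=0}^{n} a_j^c(h_1)\, x_{ij} \right|, \quad x_{i0} = 1. \tag{13}$$

Therefore, according to (9), the estimating credibility for all sample data with regard to $h_1$ can be expressed as

$$z_i(h_1) = \frac{\mu_i(h_1)}{f_i(h_1)}, \quad i = 1, 2, \ldots, m. \tag{14}$$

Therefore, from (10), the total credibility for the FLR model with respect to $h_1$ can be obtained as

$$Z(h_1) = \sum_{i=1}^{m} \frac{\mu_i(h_1)}{f_i(h_1)}. \tag{15}$$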

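From (12), with regard to $h_2$, we have

$$\mu_i(h_2) = 1 - \frac{d_i}{f_i(h_2)}. \tag{16}$$

Therefore, according to (11), $\mu_i(h_2)$ can be calculated as

$$\mu_i(h_2) = 1 - r\,\bigl(1 - \mu_i(h_1)\bigr). \tag{17}$$

Hence, according to (9), (11), and (16), the estimating credibility for all sample data pairs with regard to $h_2$ can be expressed as

$$z_i(h_2) = \frac{\mu_i(h_2)}{f_i(h_2)} = r\,\frac{\mu_i(h_1) + (1-r)\bigl(1 - \mu_i(h_1)\bigr)}{f_i(h_1)}, \tag{18}$$

in which

$$r = \frac{1-h_2}{1-h_1}. \tag{19}$$

Consequently, the total credibility for the FLR model in (1) with regard to $h_2$ can be obtained as

$$Z(h_2) = r\,\bigl[P + (1-r)\,Q\bigr], \tag{20}$$

in which

$$P = \sum_{i=1}^{m} \frac{\mu_i(h_1)}{f_i(h_1)}, \quad Q = \sum_{i=1}^{m} \frac{1 - \mu_i(h_1)}{f_i(h_1)}. \tag{21}$$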

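For simplicity, we denote $h_1 = 0$ and $h_2 = h$; then (11), (16), (17), (18), and (20) can be rewritten as

$$a_j^c(h) = a_j^c(0), \quad a_j^s(h) = \frac{a_j^s(0)}{1-h}, \quad J(h) = \frac{J(0)}{1-h}, \quad f_i(h) = \frac{f_i(0)}{1-h}, \tag{22}$$

$$\mu_i(h) = 1 - \frac{d_i}{f_i(h)}, \tag{23}$$

$$\mu_i(h) = 1 - (1-h)\bigl(1 - \mu_i(0)\bigr), \tag{24}$$

$$z_i(h) = (1-h)\,\frac{\mu_i(0) + h\bigl(1 - \mu_i(0)\bigr)}{f_i(0)}, \tag{25}$$

$$Z(h) = (1-h)\,(P + hQ), \quad P = \sum_{i=1}^{m} \frac{\mu_i(0)}{f_i(0)}, \quad Q = \sum_{i=1}^{m} \frac{1 - \mu_i(0)}{f_i(0)}. \tag{26}$$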

Based on (26), the optimal $h$ value for the FLR model in (1) can be obtained by maximizing the estimating credibility; that is, the optimal value of $h$ can be obtained by solving the following programming model:

$$\max_{h}\; Z(h) \quad \text{subject to} \quad 0 \le h < 1.$$

It is obvious that the objective of this programming model is quadratic in $h$, and many kinds of optimization algorithms, such as the gradient descent method, can be used to solve this nonlinear programming problem.
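Because the objective is quadratic in a single variable, the maximizer can also be written in closed form. The derivation below is a sketch under the assumption that the total credibility has the form $Z(h) = (1-h)(P + Qh)$ with constants $P \ge 0$ and $Q > 0$ evaluated at $h = 0$; the symbols $P$ and $Q$ are notation used here for illustration.

```latex
\begin{aligned}
Z(h) &= (1-h)(P + Qh) = P + (Q-P)\,h - Q\,h^{2},\\
\frac{dZ}{dh} &= (Q-P) - 2Qh = 0
  \;\Longrightarrow\; h^{*} = \frac{Q-P}{2Q},\\
\frac{d^{2}Z}{dh^{2}} &= -2Q < 0
  \quad\text{(so } h^{*} \text{ is a maximum)},\\
Z(h^{*}) &= P + \frac{(Q-P)^{2}}{4Q} = Q\,(1-h^{*})^{2}.
\end{aligned}
```

When $Q \le P$, the stationary point is not positive and the maximum over $0 \le h < 1$ is attained at $h = 0$. When $Q > P$, the stationary point satisfies $0 < h^{*} \le 1/2$, so under this assumed form the optimal $h$ never exceeds 0.5.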

5. Numerical Example

The numerical example in [21] is used to show how the optimal $h$ value can be obtained using the approach proposed in this paper. The eight testing data pairs are listed in Table 1.

When $h$ is specified as 0, the fuzzy coefficients in terms of symmetric TFNs are obtained by solving the programming model in Section 2, which gives the corresponding FLR model, referred to below as (28). Accordingly, the fuzziness $f_i$, the auxiliary parameters defined in Section 4, the membership degree $\mu_i$, and the credibility $z_i$, $i = 1, 2, \ldots, 8$, are calculated, respectively, as shown in Table 2.

From Table 2, the two coefficients of the total credibility function in (26) are calculated as 1.2984 and 2.7811, respectively. According to (26), the total credibility of the FLR model with respect to $h$ can be expressed as follows:

$$Z(h) = (1-h)\,(1.2984 + 2.7811\,h).$$

Maximizing this function according to the programming model in Section 4 gives the optimal $h$ value as 0.2666, from which the optimal coefficients in the symmetric case and the optimal FLR model, referred to as (30), are obtained. The total credibility of the model in (30) is calculated as 1.4960, which is higher than that of the model in (28), that is, 1.2984.
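The reported optimum can be checked numerically. The sketch below assumes that the total credibility has the quadratic form $Z(h) = (1-h)(P + Qh)$ and that the two values read from Table 2 play the roles of $P = 1.2984$ and $Q = 2.7811$; the function names are hypothetical. Under this assumption the stationary point of the quadratic reproduces the optimal $h$ and the total credibility quoted above.

```python
def optimal_h(p: float, q: float) -> float:
    """Maximizer of Z(h) = (1 - h) * (p + q * h) over 0 <= h < 1."""
    if q <= 0.0 or q <= p:
        return 0.0  # no interior maximum: the best choice is h = 0
    return (q - p) / (2.0 * q)


def total_credibility_at(h: float, p: float, q: float) -> float:
    """Value of the assumed quadratic total-credibility function at h."""
    return (1.0 - h) * (p + q * h)


P, Q = 1.2984, 2.7811              # values reported from Table 2
h_star = optimal_h(P, Q)
z_star = total_credibility_at(h_star, P, Q)

print(round(h_star, 4))            # 0.2666
print(round(z_star, 4))            # 1.496
print(total_credibility_at(0.0, P, Q))  # 1.2984, the model with h = 0
```

The agreement with the reported values 0.2666 and 1.4960 suggests that the assumed quadratic form is consistent with the numbers in this example.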

The changes of the fuzziness $f_i$, the membership degree $\mu_i$, the credibility $z_i$, $i = 1, 2, \ldots, 8$, and the total credibility $Z$ with different $h$ values are depicted in Figures 6, 7, 8, and 9, respectively. Notably, the vertical axis in Figure 6 uses a logarithmic scale so that small changes in fuzziness among the eight sample data can be seen, and the membership degrees for sample data 2, 3, 6, and 7 overlap in Figure 7.

From Figure 6, we can see that the fuzziness of all sample data becomes larger as the $h$ value increases; in particular, when the $h$ value is close to 1, the fuzziness becomes extremely large. From Figure 7, it can be found that the membership degree of all sample data becomes higher as the $h$ value increases, and the membership degree is close to 1 when the $h$ value is close to 1. Figure 8 clearly shows that when the $h$ value is close to 1, the credibility measures for all sample data are close to zero because of the extremely large fuzziness (see Figure 6). Figure 9 shows that when the $h$ value is 0.2666, the total credibility of the FLR model reaches its maximum, based on which the best FLR model can be obtained.

To further demonstrate the effectiveness of the approach proposed in this paper, another numerical example with two independent variables is given as follows. The twenty testing datasets are listed in Table 3.

According to the programming model in Section 4, the optimal $h$ value is given as 0.3184. When $h$ is specified as 0, 0.3184, and 0.5, respectively, the fuzzy coefficients in terms of symmetric TFNs and the corresponding total credibility can be obtained by solving the programming model in Section 2. The results are summarized in Table 4.

From Table 4, we can see that when the $h$ value is set to 0.3184, the total credibility of the FLR model reaches its maximum, that is, 0.1371, which indicates that the approach proposed in this paper is effective and reasonable.

6. Real Application

To investigate the effectiveness of the approach proposed in this paper, the modeling of a welding process for electronic manufacturing using fuzzy linear regression was studied. The company involved is a well-known OEM manufacturer of printed circuit boards (PCBs) in PR China. The engineers in this company wanted to improve the welding quality by investigating the relationship between the pull strength ($y$) of the welding line and the proportion of colophony ($x$) in the welding fluid. An engineering experiment was conducted, and the experimental results are shown in Table 5.

Using the 10 experimental data pairs, the optimal $h$ value is calculated as 0.2057. When the $h$ value is given as 0.2057 (the optimal value), 0, and 0.5, respectively, the FLR models and the corresponding total credibility are as summarized in Table 6.

It is not surprising that the FLR model with the optimal $h$ value achieves the highest total credibility among the three FLR models (see Table 6). In fact, when the optimal $h$ value is used, the total credibility of the FLR model improves by 7.19% compared with $h = 0$ and by 15.91% compared with $h = 0.5$.
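These percentages can be cross-checked from the optimal $h$ value alone. The sketch below assumes that the total credibility has the quadratic form $Z(h) = (1-h)(P + Qh)$ with an interior maximum, in which case the ratio $P/Q$ equals $1 - 2h^{*}$ and relative improvements no longer depend on the individual coefficients. The function name is hypothetical.

```python
def improvement_percent(h_opt: float, h_other: float) -> float:
    """Relative gain (in percent) of Z(h_opt) over Z(h_other).

    Assumes Z(h) = (1 - h) * (P + Q * h) with interior optimum h_opt,
    so that P / Q = 1 - 2 * h_opt and Z can be evaluated up to the factor Q.
    """
    ratio_pq = 1.0 - 2.0 * h_opt

    def z_over_q(h: float) -> float:
        return (1.0 - h) * (ratio_pq + h)

    return 100.0 * (z_over_q(h_opt) / z_over_q(h_other) - 1.0)


h_opt = 0.2057                                    # optimal value reported above
print(round(improvement_percent(h_opt, 0.0), 2))  # 7.19
print(round(improvement_percent(h_opt, 0.5), 2))  # 15.91
```

Under the stated assumption, both improvement figures follow directly from $h^{*} = 0.2057$, which is consistent with the values read from Table 6.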

To further investigate the modeling performance of the approach proposed in this paper, we divided the sample data pairs into two groups: one group for modeling, in which 80% of the sample data pairs were used to build the FLR model, and one group for testing, in which the remaining 20% of the sample data pairs were used to test the performance of the FLR model. Each time, two data pairs were randomly selected from the ten as testing data, while the remaining eight were used to develop FLR models with the $h$ value specified as the optimal value, 0, and 0.5, respectively. Their corresponding total credibility was then calculated. This procedure was repeated ten times. Table 7 summarizes the testing results.
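The repeated hold-out procedure described above can be sketched as follows. Only the random splitting is shown; model fitting and the credibility evaluation on the test pairs are left out. The function name `holdout_splits` and the fixed seed are assumptions added here for reproducibility and are not part of the original experiment.

```python
import random
from typing import List, Tuple


def holdout_splits(n_samples: int, n_test: int, repeats: int,
                   seed: int = 0) -> List[Tuple[List[int], List[int]]]:
    """Return `repeats` random (train_indices, test_indices) splits."""
    rng = random.Random(seed)          # local generator, reproducible
    indices = list(range(n_samples))
    splits = []
    for _ in range(repeats):
        test = sorted(rng.sample(indices, n_test))   # drawn without replacement
        train = [i for i in indices if i not in test]
        splits.append((train, test))
    return splits


# Ten data pairs, two held out for testing, repeated ten times.
for train_idx, test_idx in holdout_splits(n_samples=10, n_test=2, repeats=10):
    # Fit the FLR model on train_idx for each h (optimal, 0, 0.5),
    # then compute the total credibility on test_idx.
    pass
```

Each split keeps eight pairs for modeling and two for testing, matching the 80%/20% division used in the experiment.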

From Table 7, it can be seen that the predictive performance of the FLR model with the optimal $h$ value is the best among the three models. In general, when the optimal $h$ value is used, the total credibility of the FLR model improves by 9.73% compared with $h = 0$ and by 22.61% compared with $h = 0.5$. Therefore, the approach proposed in this paper is effective.

7. Conclusions

In this paper, a systematic approach is proposed to select the optimal value of $h$ for FLR analysis with symmetric TFNs. First, a new concept of credibility is introduced by considering both the system fuzziness and the membership degree; it can be used to assess the performance of FLR analysis with different $h$ values when a set of sample data pairs is given. Second, a procedure to obtain the optimal $h$ value is formulated for FLR with symmetric TFNs by maximizing the total credibility of the FLR model. Using the proposed approach, the optimal value of $h$ can be determined definitely with respect to a set of sample data pairs. Both the numerical example and the real application demonstrate that the proposed approach is effective and efficient. Further work in this direction involves extending the approach to FLR analysis with fuzzy outputs, that is, developing a procedure to select the optimal $h$ value for FLR with fuzzy observations, and applying the proposed approach to practical problems.

Acknowledgments

The work described in this paper is supported by a grant from the National Natural Science Foundation of China (Project no. NSFC 71272177/G020902) and by the Innovation Program of Shanghai Municipal Education Commission (Project no. 12ZS101).