#### Abstract

Excess body fat often leads to obesity, which is typically associated with serious diseases such as cancer, heart disease, and diabetes. Knowing one's body fat is therefore important because it affects everyone's health. Although there are several ways to measure body fat percentage (BFP), the accurate methods are often inconvenient, costly, or both. Traditional single-stage approaches use certain body measurements as explanatory variables to predict BFP. Diverging from existing approaches, this study proposes new intelligent hybrid approaches that use fewer explanatory variables, and the proposed forecasting models are able to predict BFP effectively. The proposed hybrid models combine multiple regression (MR), artificial neural network (ANN), multivariate adaptive regression splines (MARS), and support vector regression (SVR) techniques. In the first stage, MR and MARS are used to obtain a smaller set of important explanatory variables. In the second stage, these variables serve as inputs to the other forecasting methods. A real dataset is used to demonstrate the development of the proposed hybrid models. The prediction results reveal that the proposed hybrid schemes outperform typical single-stage forecasting models.

#### 1. Introduction

In recent years, cancer, heart disease, and diabetes have been reported to be leading causes of death in most countries [1, 2]. Obesity is one of the most common risk factors for these diseases, and excess body fat often leads to obesity. One common cause of heart disease is atherosclerosis, a process in which fat accumulates in the blood vessels and thickens their walls. The thickened walls reduce blood flow to the heart, which can damage the heart and result in a heart attack. Excess body fat has been reported to increase the risk of six types of cancer: bowel, oesophageal, pancreatic, kidney, endometrial, and breast cancer [3]. In addition, type II diabetes is common among people who carry too much body fat. Therefore, avoiding obesity has become a very important issue.

Although excess body fat causes obesity, an extremely low BFP is also undesirable, because a minimum amount of body fat is required for normal functions such as brain function. Knowing one's BFP therefore provides a great deal of information about one's current state of health.

Maintaining an appropriate BFP is essential for health. However, measuring body fat accurately and conveniently is not straightforward [4, 5]. For example, hydrostatic weighing has been reported to be a reliable method for measuring body fat content, but it is inconvenient [4]. Dual-energy X-ray absorptiometry (DEXA), a newer technology, measures body fat very accurately and precisely. However, DEXA may suffer from standardization issues; that is, results may vary with the equipment manufacturer, the data collection method, and/or the analysis software. Consequently, convenient methods for predicting BFP are desirable.

While some studies have employed data mining techniques to classify the presence of certain diseases [2, 6, 7], the present study focuses on developing intelligent forecasting techniques that predict BFP effectively. The body fat dataset used in this study is real data obtained from Johnson [5]. It contains BFP values derived from underwater weighing and 13 body circumference measurements for 252 people. Although one can use these variables to predict BFP through multiple regression (MR) or machine learning approaches, the true relationship between the measurements and BFP may not be easy to determine. Several studies have used MR to build forecasting models that estimate BFP [5, 8–10]. However, MR models have been criticized for their strong assumptions, such as homogeneity of variance [2]. In addition, data mining techniques such as artificial neural networks (ANN), multivariate adaptive regression splines (MARS), and support vector regression (SVR) have become alternatives for forecasting problems because they can capture complex nonlinear relationships among variables [11–13], and they have been reported to forecast better than regression [14–18]. Nevertheless, these techniques have limitations. For example, ANN has been criticized for the long training process needed to design the optimal network topology, and it cannot identify the relative importance of potential input variables [19–22]. Moreover, a single technique may not be able to address every prediction problem [23].

To overcome these difficulties while maintaining the prediction accuracy of existing approaches, this study proposes single-stage and hybrid forecasting models for BFP. The single-stage models are MR, ANN, MARS, and SVR. Each hybrid model integrates two components: the first uses its own variable selection capability to identify a smaller set of important explanatory variables, and the second generates predictions from those variables. Because MR and MARS are well suited to selecting important explanatory variables, this study employs six hybrid forecasting models: MR and ANN (MR-ANN), MR and MARS (MR-MARS), MR and SVR (MR-SVR), MARS and MR (MARS-MR), MARS and ANN (MARS-ANN), and MARS and SVR (MARS-SVR).

In terms of prediction capability, this study compares the typical single-stage forecasting models with the proposed hybrid models for BFP. The mean absolute percentage error (MAPE), the root mean square error (RMSE), and the mean absolute difference (MAD) are used as forecasting accuracy measures, and the superior prediction capability of the proposed hybrid approach is demonstrated. The remainder of this study is organized as follows. Section 2 introduces the MR, ANN, MARS, and SVR methodologies and reviews the related literature. Section 3 presents the designed MR, ANN, MARS, and SVR models, verifies the typical and proposed forecasting models on a real BFP dataset, and discusses the performance of all the models. Section 4 summarizes the research findings and concludes this study.

#### 2. Research Methodologies

This study considers MR, ANN, MARS, SVR, and their hybrid modeling schemes as possible forecasting models for BFP. These methodologies are addressed as follows.

##### 2.1. Multiple Regression Modeling

Multiple regression analysis is one of the most widely used statistical methods for modeling real-world applications. MR establishes the relationship between one response variable and several explanatory variables. MR performs acceptably when its assumptions are met; however, these assumptions may also limit its applicability. The general MR model is

$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p + \varepsilon, \tag{1}$$

where $\beta_0, \beta_1, \ldots, \beta_p$ are model parameters and $\varepsilon$ is the error term. The term $\varepsilon$ accounts for the variability in $y$ that cannot be explained by the linear effect of the $p$ explanatory variables. The MR model makes four assumptions about $\varepsilon$:

1. $\varepsilon$ is a normally distributed random variable.
2. $\varepsilon$ has a mean of zero; that is, $E(\varepsilon) = 0$.
3. The variance of $\varepsilon$, denoted by $\sigma^2$, is the same for all values of the explanatory variables $x_1, \ldots, x_p$.
4. The values of $\varepsilon$ are independent.
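As a rough illustration of how the parameters of the general MR model can be estimated, the following Python sketch (assuming NumPy; `fit_mr` and `predict_mr` are hypothetical helper names, not taken from the study) uses ordinary least squares on synthetic, noise-free data rather than the BFP data:

```python
import numpy as np


def fit_mr(X, y):
    """Estimate beta_0..beta_p of y = beta_0 + sum_j beta_j x_j + eps by least squares."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones(len(y)), X])  # intercept column + predictors
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return beta  # [beta_0, beta_1, ..., beta_p]


def predict_mr(beta, X):
    """Fitted values beta_0 + X @ beta_1..p for the rows of X."""
    return beta[0] + np.asarray(X, dtype=float) @ beta[1:]


# Synthetic demo data generated from known coefficients (1, 2, -3), no noise
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(50, 2))
y_demo = 1.0 + 2.0 * X_demo[:, 0] - 3.0 * X_demo[:, 1]
beta_demo = fit_mr(X_demo, y_demo)  # recovers the generating coefficients
```

Least squares only produces point estimates; checking the four error assumptions (for example, by inspecting residuals) is a separate step.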

Because collinearity among explanatory variables leads to imprecise estimates and serious stability problems, a collinearity diagnosis should be performed before important explanatory variables are screened. This study applies a well-known criterion, the variance inflation factor (VIF), to examine collinearity. The VIF of the $j$th explanatory variable is

$$\mathrm{VIF}_j = \frac{1}{1 - R_j^2}, \tag{2}$$

where $R_j^2$ is the coefficient of determination of the regression of $x_j$ on all other explanatory variables. The tolerance, $1 - R_j^2$, is the reciprocal of the VIF. A VIF greater than 10 is commonly taken to indicate serious multicollinearity. This study drops one or more explanatory variables from the model to lessen collinearity and thus reduce the standard errors of the estimated regression coefficients of the variables that remain. Besides being simple and effective, this technique reduces the number of explanatory variables, which makes it well suited to hybrid modeling, whose initial stage usually retains fewer explanatory variables.
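A minimal sketch of the VIF computation, assuming NumPy; the function name `vif` and the synthetic columns are illustrative only. Each column is regressed (with an intercept) on the remaining columns, and $1/(1 - R_j^2)$ is reported:

```python
import numpy as np


def vif(X):
    """Return VIF_j = 1 / (1 - R_j^2) for every column j of X (n, p)."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    out = np.empty(p)
    for j in range(p):
        target = X[:, j]
        # regress column j on an intercept plus all other columns
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, target, rcond=None)
        resid = target - others @ coef
        r2 = 1.0 - (resid @ resid) / np.sum((target - target.mean()) ** 2)
        out[j] = 1.0 / (1.0 - r2)
    return out


# Synthetic example: c is almost a copy of a, b is unrelated
rng = np.random.default_rng(1)
a = rng.normal(size=200)
b = rng.normal(size=200)
c = a + 0.01 * rng.normal(size=200)
vifs = vif(np.column_stack([a, b, c]))
flagged = vifs > 10  # rule-of-thumb threshold mentioned in the text
```

Here `a` and `c` are flagged while `b` is not, which is the situation in which one of the collinear variables would be dropped.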

When many explanatory variables are involved in an MR design, examining all candidate models requires a great deal of computation and produces a large volume of output, much of it associated with poor MR models. This study therefore employs three variable selection procedures: forward selection, backward elimination, and stepwise regression. Given the BFP dataset with 13 explanatory variables, these procedures are used to select the explanatory variables that lead to the best model.

##### 2.2. Artificial Neural Network Modeling

Owing to its associative memory and generalization capability, ANN has been increasingly used to model nonstationary processes [24–28].

ANNs are usually classified into two categories: feedforward and feedback networks [28]. The nodes of an ANN are organized into three kinds of layers: the input layer, the output layer, and one or more hidden layers. The output of each neuron in the input layer equals its input. For each neuron $j$ in the hidden layer and neuron $k$ in the output layer, the net inputs are given by

$$net_j = \sum_i w_{ji}\, o_i, \qquad net_k = \sum_j w_{kj}\, o_j, \tag{3}$$

where $o_i$ and $o_j$ are the outputs of neurons in the preceding layer and $w_{ji}$ ($w_{kj}$) is the connection weight from neuron $i$ ($j$) to neuron $j$ ($k$). The neuron outputs are given by

$$o_i = x_i, \tag{4}$$

$$o_j = \frac{1}{1 + \exp\left[-(net_j + \theta_j)\right]}, \tag{5}$$

$$o_k = \frac{1}{1 + \exp\left[-(net_k + \theta_k)\right]}, \tag{6}$$

where $x_i$ is the input signal from the external source to node $i$ in the input layer and $\theta_j$ and $\theta_k$ are biases. The transformation function in (5) and (6) is the sigmoid function.

The generalized delta rule is the conventional technique for deriving the connection weights of a feedforward network [28]. Initially, the connection weights are set to random numbers. Then, for pattern $p$ with target output vector $t_p = (t_{p1}, \ldots, t_{pM})$, the weights are adjusted to minimize the squared error

$$E_p = \frac{1}{2}\sum_{k=1}^{M} \left(t_{pk} - o_{pk}\right)^2, \tag{7}$$

where $M$ is the number of output nodes and $o_{pk}$ is the output of node $k$ for pattern $p$.
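To make the forward pass and the generalized delta rule concrete, here is a minimal NumPy sketch for a one-hidden-layer network with sigmoid hidden and output neurons. The function names, the 13-11-1 shapes in the demo, and the choice to scale the target into (0, 1) so that a sigmoid output can reach it are assumptions for illustration, not the study's implementation:

```python
import numpy as np


def sigmoid(z):
    """Logistic (sigmoid) transformation."""
    return 1.0 / (1.0 + np.exp(-z))


def forward(x, W1, b1, W2, b2):
    """Forward pass; input nodes pass x through unchanged (o_i = x_i)."""
    h = sigmoid(W1 @ x + b1)  # hidden outputs o_j = f(net_j + theta_j)
    o = sigmoid(W2 @ h + b2)  # output-layer outputs o_k = f(net_k + theta_k)
    return h, o


def delta_rule_step(x, t, W1, b1, W2, b2, lr=0.01):
    """One generalized-delta-rule update for a single pattern.

    Modifies the weight and bias arrays in place and returns
    E_p = 0.5 * sum_k (t_k - o_k)^2 measured before the update.
    """
    h, o = forward(x, W1, b1, W2, b2)
    err = t - o
    delta_out = err * o * (1.0 - o)                 # output deltas (sigmoid derivative)
    delta_hid = (W2.T @ delta_out) * h * (1.0 - h)  # back-propagated hidden deltas
    W2 += lr * np.outer(delta_out, h)
    b2 += lr * delta_out
    W1 += lr * np.outer(delta_hid, x)
    b1 += lr * delta_hid
    return 0.5 * float(err @ err)


# Demo: a 13-11-1 network repeatedly trained on one synthetic pattern
rng = np.random.default_rng(2)
W1 = rng.normal(scale=0.5, size=(11, 13))
b1 = np.zeros(11)
W2 = rng.normal(scale=0.5, size=(1, 11))
b2 = np.zeros(1)
x = rng.uniform(size=13)
t = np.array([0.3])  # target assumed to be scaled into (0, 1)
errors = [delta_rule_step(x, t, W1, b1, W2, b2, lr=0.1) for _ in range(200)]
```

The hidden deltas are computed from `W2` before it is updated, which is what the chain rule for the squared error requires.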

##### 2.3. Multivariate Adaptive Regression Splines Modeling

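MARS has been widely applied in many fields [29–33]. The general MARS function can be represented as follows [33]:

$$\hat{f}(\mathbf{x}) = a_0 + \sum_{m=1}^{M} a_m \prod_{k=1}^{K_m} \left[s_{km}\left(x_{v(k,m)} - t_{km}\right)\right]_{+}, \tag{8}$$

where $[z]_{+} = \max(0, z)$, $a_0$ and $a_m$ are parameters, $M$ is the number of basis functions (BF), $K_m$ is the number of knots in the $m$th basis function, $s_{km}$ takes the value $1$ or $-1$ and indicates the right or left sense of the associated step function, $x_{v(k,m)}$ is the independent variable, and $t_{km}$ is the knot location.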

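A two-stage process is usually used to choose the optimal MARS model. First, a large number of basis functions are used to fit the data. Second, the basis functions that contribute least are deleted according to the generalized cross-validation (GCV) criterion. A measure of variable importance can be obtained by observing how much the GCV value changes when a variable is removed from the model; the larger the deterioration, the more important the variable. The GCV can be expressed as

$$\mathrm{GCV}(M) = \frac{\dfrac{1}{N}\displaystyle\sum_{i=1}^{N}\left[y_i - \hat{f}_M(\mathbf{x}_i)\right]^2}{\left[1 - C(M)/N\right]^2}, \tag{9}$$

where $N$ is the number of observations, $\hat{f}_M$ is the model with $M$ basis functions, and $C(M)$ is a complexity cost that increases with the number of basis functions.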
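The two MARS ingredients above, the truncated linear (hinge) factor and the GCV score, can be sketched in NumPy as follows. This is an illustration, not a MARS implementation; in particular, the specific complexity cost $C(M) = (M + 1) + dM$ with $d = 3$ is an assumed, commonly cited choice, and MARS software packages define the penalty in different ways:

```python
import numpy as np


def hinge(x, knot, sign):
    """Truncated linear function [s * (x - t)]_+ with s = +1 or -1."""
    return np.maximum(0.0, sign * (np.asarray(x, dtype=float) - knot))


def gcv(y, y_hat, n_basis, d=3.0):
    """GCV score of a fitted model with n_basis non-constant basis functions.

    ASSUMPTION: complexity cost C(M) = (M + 1) + d * M.
    """
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    n = len(y)
    cost = (n_basis + 1) + d * n_basis
    if cost >= n:
        return np.inf  # model too complex for the sample size
    mse = np.mean((y - y_hat) ** 2)
    return mse / (1.0 - cost / n) ** 2


x = np.array([1.0, 2.0, 3.0, 4.0])
right = hinge(x, 2.0, +1)  # [0, 0, 1, 2]
left = hinge(x, 2.0, -1)   # [1, 0, 0, 0]
```

With identical residuals, the GCV grows as basis functions are added, which is what lets the backward pass trade fit against complexity.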

##### 2.4. Support Vector Regression Modeling

Support vector machine (SVM) is a powerful machine learning technique, and SVR is its regression form. Owing to its prediction capability, SVR has been used in many fields [34–36]. SVR maps the inputs into a high-dimensional feature space through a nonlinear function and computes a linear regression function in that space. The SVR model can be described as

$$f(\mathbf{x}) = \mathbf{w}^{\top}\phi(\mathbf{x}) + b, \tag{10}$$

where $\mathbf{w}$ is the weight vector, $\mathbf{x}$ represents the model inputs, $b$ is a bias, and $\phi(\cdot)$ is the nonlinear mapping, associated with a kernel function, that transforms the nonlinear input into a linear form in the high-dimensional feature space.

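Regression models usually obtain their coefficients by minimizing the squared error, which can be regarded as an empirical risk based on a loss function. The $\varepsilon$-insensitive loss function was introduced in [37] and can be described as

$$L_{\varepsilon}\big(y, f(\mathbf{x})\big) = \begin{cases} \left|y - f(\mathbf{x})\right| - \varepsilon, & \left|y - f(\mathbf{x})\right| \ge \varepsilon, \\ 0, & \text{otherwise}, \end{cases} \tag{11}$$

where $y$ is the target output and $\varepsilon$ defines the width of the $\varepsilon$-insensitive band. When the predicted value falls inside the band, the loss is zero; when it falls outside, the loss equals the distance between the predicted value and the edge of the band, that is, the amount by which the absolute error exceeds $\varepsilon$.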

When both empirical risk and structural risk are considered, SVR is set up to minimize

$$R(C) = C\,\frac{1}{N}\sum_{i=1}^{N} L_{\varepsilon}\big(y_i, f(\mathbf{x}_i)\big) + \frac{1}{2}\lVert\mathbf{w}\rVert^2, \tag{12}$$

where $N$ is the number of training data, the first term represents the empirical risk, $\frac{1}{2}\lVert\mathbf{w}\rVert^2$ is the structural risk, which prevents overlearning and a lack of generality, and $C$ is a modifying coefficient representing the trade-off between empirical and structural risk. This problem becomes a quadratic program once slack variables are introduced. With an appropriate modifying coefficient $C$, band width $\varepsilon$, and kernel function, the optimal parameters can be obtained by the Lagrange multiplier method.
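The $\varepsilon$-insensitive loss and the regularized risk can be evaluated directly. The NumPy sketch below is illustrative only: the function names are hypothetical, and it treats $\mathbf{w}$ as an explicit vector, whereas a kernel SVR never forms $\mathbf{w}$ and instead solves the dual problem:

```python
import numpy as np


def eps_insensitive_loss(y, f, eps):
    """L_eps = max(0, |y - f| - eps): zero inside the eps-band, linear outside."""
    return np.maximum(0.0, np.abs(np.asarray(y, dtype=float) - np.asarray(f, dtype=float)) - eps)


def svr_risk(w, y, f, C, eps):
    """Regularized risk C * (1/N) * sum L_eps(y_i, f_i) + 0.5 * ||w||^2."""
    empirical = C * np.mean(eps_insensitive_loss(y, f, eps))
    structural = 0.5 * float(np.dot(w, w))
    return empirical + structural


y_true = [1.0, 1.0, 1.0]
y_pred = [1.05, 1.5, 0.2]
losses = eps_insensitive_loss(y_true, y_pred, eps=0.1)  # [0.0, 0.4, 0.7]
risk = svr_risk(np.array([1.0, 2.0]), y_true, y_pred, C=3.0, eps=0.1)  # 1.1 + 2.5
```

The first prediction lies inside the band and costs nothing; the other two are penalized only for the part of the error beyond $\varepsilon = 0.1$.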

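The SVR-based regression function can be described as follows [37]:

$$f(\mathbf{x}) = \sum_{i=1}^{N} \left(\alpha_i - \alpha_i^{*}\right) K(\mathbf{x}_i, \mathbf{x}) + b, \tag{13}$$

where $\alpha_i$ and $\alpha_i^{*}$ are Lagrange multipliers that satisfy the equality $\alpha_i \alpha_i^{*} = 0$, and $K(\mathbf{x}_i, \mathbf{x}) = \phi(\mathbf{x}_i)^{\top}\phi(\mathbf{x})$ is the kernel function. Because the radial basis function (RBF) is the most widely used kernel function [36], this study adopts it for the experiments. The RBF is defined as

$$K(\mathbf{x}_i, \mathbf{x}_j) = \exp\left(-\frac{\lVert\mathbf{x}_i - \mathbf{x}_j\rVert^2}{2\sigma^2}\right), \tag{14}$$

where $\sigma$ denotes the width of the RBF; equivalently, $K(\mathbf{x}_i, \mathbf{x}_j) = \exp\left(-\gamma\lVert\mathbf{x}_i - \mathbf{x}_j\rVert^2\right)$ with $\gamma = 1/(2\sigma^2)$.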
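Once the dual coefficients are known, evaluating the SVR regression function with an RBF kernel is straightforward. The sketch below assumes NumPy, uses the $\gamma$ parameterization of the kernel, and takes the differences $\alpha_i - \alpha_i^{*}$ and the bias $b$ as given (they would come from a QP solver, which is not shown). Function names and the toy numbers are hypothetical:

```python
import numpy as np


def rbf_kernel(A, B, gamma):
    """K[a, b] = exp(-gamma * ||A[a] - B[b]||^2) for all row pairs."""
    A = np.atleast_2d(np.asarray(A, dtype=float))
    B = np.atleast_2d(np.asarray(B, dtype=float))
    sq = np.sum(A ** 2, axis=1)[:, None] + np.sum(B ** 2, axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))  # clip tiny negative round-off


def svr_predict(X_new, support_vectors, dual_coef, b, gamma):
    """f(x) = sum_i (alpha_i - alpha_i^*) K(x_i, x) + b.

    dual_coef holds alpha_i - alpha_i^* for each support vector (assumed
    already obtained from a solver).
    """
    K = rbf_kernel(support_vectors, X_new, gamma)  # shape (n_sv, n_new)
    return np.asarray(dual_coef, dtype=float) @ K + b


sv = np.array([[0.0, 0.0], [1.0, 1.0]])
pred = svr_predict([[0.0, 0.0]], sv, dual_coef=[2.0, -1.0], b=0.5, gamma=0.5)
```

For the reported $\gamma = 2^{-7}$, the equivalent RBF width is $\sigma = \sqrt{1/(2\gamma)} = 8$.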

#### 3. BFP Data and Modeling Results

##### 3.1. The BFP Dataset

To compare the forecasting performance of the single-stage and proposed hybrid models, a real BFP dataset is analyzed [5]. The dataset consists of 252 records, each with 15 variables, which are summarized in Table 1 (see http://wiki.stat.ucla.edu/socr/index.php/SOCR_Data_BMI_Regression for more details and descriptions of the dataset).

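In this dataset, the response variable BFP is denoted by $y$. Because BFP is difficult to measure directly, it is computed from body density with Siri's equation:

$$y = \frac{495}{D} - 450, \tag{15}$$

where $D$ is the body density (g/cm³) determined by underwater weighing.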
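Siri's equation is simple arithmetic; a one-line Python helper (hypothetical name `siri_bfp`) shows the conversion from density in g/cm³ to BFP in percent:

```python
def siri_bfp(density):
    """Siri's equation: BFP (%) = 495 / density - 450, density in g/cm^3."""
    return 495.0 / density - 450.0


bfp_at_1_0 = siri_bfp(1.0)  # 45.0 %
bfp_at_1_1 = siri_bfp(1.1)  # about 0 %
```

Because the result drops to zero at a density of 1.1 g/cm³, densities at or above that value yield zero or negative BFP, which is why such values stand out during data cleaning.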

In the initial “cleaning” phase, this study found that the body fat values of the 96th, 172nd, and 182nd cases are 0.4, 0.7, and −3.6, respectively. Because these three values deviate too far from plausible levels, the three cases were deleted. The 42nd case, whose recorded height is only 74.93 cm, was also deleted. The sample size therefore becomes 248 cases. The first 174 cases (about 70% of the total) were used as the training sample, and the remaining 74 cases (about 30%) were retained as the testing sample.
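The cleaning and sequential split described above can be sketched as follows, assuming NumPy arrays with the 252 records in their original order; `clean_and_split` and the dummy arrays in the demo are hypothetical:

```python
import numpy as np

# 1-based case numbers removed in the cleaning phase described above
DROPPED_CASES = (42, 96, 172, 182)


def clean_and_split(X, y, n_train=174):
    """Drop the listed cases, then keep the first n_train rows for training."""
    X = np.asarray(X)
    y = np.asarray(y)
    keep = np.ones(len(y), dtype=bool)
    keep[[case - 1 for case in DROPPED_CASES]] = False  # convert to 0-based indices
    X, y = X[keep], y[keep]
    return (X[:n_train], y[:n_train]), (X[n_train:], y[n_train:])


# Dummy data standing in for the 252 records (values encode the row index)
X_all = np.arange(252 * 2).reshape(252, 2)
y_all = np.arange(252)
(X_train, y_train), (X_test, y_test) = clean_and_split(X_all, y_all)
```

The split is sequential (first 174 versus last 74 cases), not random, so the record order in the source file determines which cases are used for testing.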

##### 3.2. Typical Single-Stage Modeling Approaches

This study uses BFP ($y$) as the response variable and the thirteen body circumference measurements ($x_1, \ldots, x_{13}$) as the explanatory variables. The first variable in Table 1, density determined by underwater weighing, is used only to compute BFP. To exclude highly collinear variables, the Pearson correlation coefficients between variables are used; Table 2 shows the results. When the correlation coefficient between variables $x_i$ and $x_j$ is greater than 0.7, the variable with the weaker relationship with $y$ (i.e., the smaller correlation coefficient with $y$) is excluded. After the highly collinear variables are discarded, eight explanatory variables remain in the MR models.
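A minimal sketch of the correlation screening rule, assuming NumPy. The text does not say in which order correlated pairs are resolved, so visiting pairs from the most to the least correlated is an assumption, as are the function name and the synthetic variables:

```python
import numpy as np


def drop_collinear(X, y, names, threshold=0.7):
    """For each pair with |r(x_i, x_j)| > threshold, drop the variable whose
    |r| with y is smaller. Pairs are visited from most to least correlated
    (ASSUMED order). Returns the names of the retained variables."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    p = X.shape[1]
    R = np.corrcoef(X, rowvar=False)
    r_y = np.array([abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(p)])
    pairs = sorted(
        ((abs(R[i, j]), i, j) for i in range(p) for j in range(i + 1, p)),
        reverse=True,
    )
    dropped = set()
    for r, i, j in pairs:
        if r <= threshold:
            break  # remaining pairs are below the threshold
        if i in dropped or j in dropped:
            continue  # pair already resolved
        dropped.add(i if r_y[i] < r_y[j] else j)
    return [names[j] for j in range(p) if j not in dropped]


rng = np.random.default_rng(3)
y_demo = rng.normal(size=300)
a = y_demo + 0.3 * rng.normal(size=300)  # strongly related to y
b = a + 0.3 * rng.normal(size=300)       # collinear with a, weaker link to y
c = rng.normal(size=300)                 # unrelated
kept = drop_collinear(np.column_stack([a, b, c]), y_demo, ["a", "b", "c"])
```

In the demo, `a` and `b` are highly correlated, and `b` is dropped because its correlation with `y` is smaller.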

In addition, this study employed three selection techniques to develop alternative MR models for BFP: forward selection, backward elimination, and stepwise regression.

Forward selection works similarly to stepwise selection. The first explanatory variable considered for entry into the regression equation is the one with the largest positive or negative correlation with the response variable $y$. This variable is entered only if it satisfies the entry criterion. Once the first variable is entered, the explanatory variable not yet in the equation that has the largest partial correlation with $y$ is considered next. The procedure stops when no remaining explanatory variable meets the entry criterion.

Backward elimination starts with all explanatory variables in the regression equation and removes them sequentially. The variable with the smallest partial correlation with the response variable is considered first for removal and is removed if it meets the elimination criterion. After the first variable is removed, the variable remaining in the equation with the smallest partial correlation is considered next. The procedure stops when no variable in the equation satisfies the removal criterion.

At each step of stepwise selection, the explanatory variable not in the regression equation that has the smallest probability of $F$ is entered, provided that this probability is sufficiently small. Variables already in the equation are removed if their probability of $F$ becomes sufficiently large. The procedure stops when no more variables are eligible for inclusion or removal.
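Forward selection can be sketched with partial $F$ statistics, assuming NumPy. To stay self-contained, this sketch compares each candidate's partial $F$ against a fixed F-to-enter threshold (assumed to be 4.0) instead of the probability-of-$F$ criterion used by statistical packages; picking the largest partial $F$ is equivalent to picking the largest partial correlation. Backward elimination and stepwise selection follow the same pattern with removal checks and are not shown:

```python
import numpy as np


def rss(X, y):
    """Residual sum of squares of an OLS fit with intercept (X may have 0 columns)."""
    design = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    r = y - design @ beta
    return float(r @ r)


def forward_select(X, y, f_enter=4.0):
    """Greedy forward selection; f_enter is an ASSUMED F-to-enter threshold."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    selected = []
    while len(selected) < p:
        rss_cur = rss(X[:, selected], y)
        best_f, best_j = -np.inf, None
        for j in range(p):
            if j in selected:
                continue
            cand = selected + [j]
            rss_new = rss(X[:, cand], y)
            df = n - len(cand) - 1  # residual degrees of freedom
            f = (rss_cur - rss_new) / (rss_new / df)  # partial F for adding x_j
            if f > best_f:
                best_f, best_j = f, j
        if best_f < f_enter:
            break  # no candidate meets the entry criterion
        selected.append(best_j)
    return selected


rng = np.random.default_rng(4)
X_demo = rng.normal(size=(200, 4))
y_demo = 3.0 * X_demo[:, 0] - 2.0 * X_demo[:, 2] + 0.5 * rng.normal(size=200)
chosen = forward_select(X_demo, y_demo)
```

On this synthetic data, the strongly related columns 0 and 2 enter first, in that order.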

In this study, all three selection procedures resulted in the same MR model, whose parameter estimates are listed in Table 3. All VIF values of the remaining variables in Table 3 are smaller than 10, indicating no serious collinearity among these explanatory variables. The fitted model takes the form

$$\hat{y} = b_0 + \sum_{j \in S} b_j x_j, \tag{16}$$

where $S$ is the set of six selected explanatory variables and $b_0$ and $b_j$ are the estimated coefficients reported in Table 3.

It has been reported that more than 75% of neural network applications employ the back-propagation network (BPN) structure; therefore, this study uses a BPN to build the ANN forecasting model [8, 38]. In addition, because a network with one hidden layer has been reported to be sufficient for modeling complex systems, this study uses a single hidden layer [39–41]. The network has 13 input nodes (explanatory variables) and one output node, the predicted BFP. The number of hidden nodes ranges from $n-2$ to $n+2$, where $n$ is the number of input variables; hence, 11, 12, 13, 14, and 15 hidden nodes are tested. Following the findings of [42], learning rates of 0.01, 0.005, and 0.001 are used.

Because MAPE is one of the most important measures of forecasting capability, this study uses the smallest testing MAPE as the criterion for selecting the ANN topology. Table 4 presents the MAPE values for the various ANN settings. The {13-11-1} topology with a learning rate of 0.01 yields the minimum testing MAPE and is therefore chosen for the single-stage ANN model. Here, {$i$-$j$-$k$} denotes the numbers of neurons in the input, hidden, and output layers, respectively.
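The topology search amounts to a small grid over hidden-layer sizes and learning rates. In this Python sketch, `test_mape(hidden, lr)` is a hypothetical callback that would train one network and return its testing MAPE; the demo replaces it with a fake scoring function purely to exercise the selection logic:

```python
from itertools import product

LEARNING_RATES = (0.01, 0.005, 0.001)


def hidden_node_grid(n_inputs):
    """Hidden-layer sizes from n - 2 to n + 2, as described in the text."""
    return list(range(n_inputs - 2, n_inputs + 3))


def best_topology(n_inputs, test_mape):
    """Return ("{n-h-1}", lr) with the smallest testing MAPE.

    test_mape(hidden, lr) is a hypothetical callback that trains and
    evaluates one network configuration.
    """
    grid = product(hidden_node_grid(n_inputs), LEARNING_RATES)
    hidden, lr = min(grid, key=lambda cfg: test_mape(*cfg))
    return f"{{{n_inputs}-{hidden}-1}}", lr


# Fake scorer (not real results) that happens to favour 11 hidden nodes, lr = 0.01
fake_mape = lambda hidden, lr: abs(hidden - 11) + abs(lr - 0.01)
choice = best_topology(13, fake_mape)
```

The same grid rule gives 4 to 8 hidden nodes for six inputs and 2 to 6 for four inputs, matching the ranges used for the hybrid ANN models.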

This study also applied MARS modeling to the BFP dataset; the selection results are displayed in Table 5. The selection process chose four important explanatory variables, whose relative importance indicators are shown in the last column of Table 5. These four variables serve as input variables for the intelligent hybrid modeling process. For SVR modeling, as for ANN modeling, all 13 explanatory variables are used as inputs. The two parameters, $C$ and $\gamma$, were determined to be $2^{3}$ and $2^{-7}$, respectively.

##### 3.3. Proposed Hybrid Modeling Approaches

The initial stage of the proposed hybrid modeling obtains a smaller set of important input variables for the second-stage forecasting models. Because this study uses MR and MARS for variable selection, the explanatory variables selected by the MR and MARS models serve as input variables for the other prediction models. Accordingly, this study employs six candidate hybrid models to predict BFP: MR-ANN, MR-MARS, MR-SVR, MARS-MR, MARS-ANN, and MARS-SVR.

For the hybrid MR-ANN model, this study uses 6 input nodes, and the number of hidden nodes is set to 4, 5, 6, 7, and 8. The learning rates are identical to those of the single-stage ANN model. The {6-6-1} topology with a learning rate of 0.01 gives the best results for the MR-ANN model. For the MARS-ANN hybrid model, this study uses 4 input nodes and sets the number of hidden nodes to 2, 3, 4, 5, and 6; the {4-3-1} topology with a learning rate of 0.01 gives the best results. For the other hybrid models, the 6 and 4 important explanatory variables selected by the MR and MARS models, respectively, are used to predict BFP.
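The two-stage structure of the hybrid models can be expressed as a generic pipeline. In this NumPy sketch, `select`, `fit`, and `predict` are caller-supplied callables: stage 1 would be MR or MARS variable selection and stage 2 any of the forecasters. The OLS learner and the fixed column list in the demo are placeholders, not the study's models:

```python
import numpy as np


def hybrid_predict(X_train, y_train, X_test, select, fit, predict):
    """Two-stage hybrid forecaster (sketch).

    Stage 1: select(X, y) returns column indices of the important variables.
    Stage 2: fit(X, y) trains a forecaster on those columns only, and
             predict(model, X) produces the forecasts.
    """
    cols = list(select(X_train, y_train))
    model = fit(X_train[:, cols], y_train)
    return cols, predict(model, X_test[:, cols])


# Placeholder stage-2 learner: ordinary least squares
def ols_fit(X, y):
    design = np.column_stack([np.ones(len(y)), X])
    return np.linalg.lstsq(design, y, rcond=None)[0]


def ols_predict(beta, X):
    return beta[0] + X @ beta[1:]


rng = np.random.default_rng(5)
X_tr, X_te = rng.normal(size=(60, 4)), rng.normal(size=(20, 4))
y_tr = 1.0 + 2.0 * X_tr[:, 0] - X_tr[:, 2]
cols, y_pred = hybrid_predict(
    X_tr, y_tr, X_te,
    select=lambda X, y: [0, 2],  # hypothetical stage-1 selection result
    fit=ols_fit,
    predict=ols_predict,
)
```

Under this framing, MR-MARS would pair an MR-based `select` with a MARS `fit`/`predict`, and so on for the other five combinations.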

##### 3.4. Prediction Results and Performance Comparison

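In addition to the single-stage prediction models, this study develops various hybrid models for predicting BFP. The forecasting accuracy measures MAPE, RMSE, and MAD are used to evaluate the performance of the typical single-stage and proposed hybrid models. They are defined as

$$\mathrm{MAPE} = \frac{100\%}{n}\sum_{t=1}^{n}\left|\frac{e_t}{y_t}\right|, \tag{17}$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{t=1}^{n} e_t^2}, \tag{18}$$

$$\mathrm{MAD} = \frac{1}{n}\sum_{t=1}^{n}\left|e_t\right|, \tag{19}$$

where $e_t = y_t - \hat{y}_t$ is the residual for the $t$th case, $y_t$ and $\hat{y}_t$ are the actual and predicted BFP values, and $n$ is the number of cases evaluated. Lower MAPE, RMSE, or MAD values indicate better forecasting accuracy.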
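The three accuracy measures translate directly into NumPy; the function names are illustrative:

```python
import numpy as np


def mape(y, y_hat):
    """Mean absolute percentage error, in percent (undefined if any y_t == 0)."""
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    return float(100.0 * np.mean(np.abs((y - y_hat) / y)))


def rmse(y, y_hat):
    """Root mean square error."""
    e = np.asarray(y, dtype=float) - np.asarray(y_hat, dtype=float)
    return float(np.sqrt(np.mean(e ** 2)))


def mad(y, y_hat):
    """Mean absolute difference of the residuals."""
    e = np.asarray(y, dtype=float) - np.asarray(y_hat, dtype=float)
    return float(np.mean(np.abs(e)))


actual, predicted = [10.0, 20.0], [11.0, 18.0]
scores = (mape(actual, predicted), rmse(actual, predicted), mad(actual, predicted))
# residuals are -1 and 2, so MAPE = 10 %, RMSE = sqrt(2.5), MAD = 1.5
```

Because MAPE divides by $y_t$, it is undefined when an actual BFP is zero and becomes unstable for values close to zero.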

Table 6 lists the prediction results for the single-stage and proposed hybrid models. Among the single-stage approaches, the SVR model performs best on all three indices (MAPE, RMSE, and MAD). Among the hybrid models, MR-MARS has the best prediction performance. Comparing the two groups in Table 6 shows that some of the proposed hybrid models provide more accurate results than the single-stage models; for example, the proposed hybrid MR-MARS model has the lowest MAPE, RMSE, and MAD values of all the models.

Relative to the single-stage MR and MARS models, the proposed MR-MARS model improves MAPE by 4.2% and 9.5%, respectively. Likewise, the proposed MR-SVR model improves MAPE by 4.1% and 3.8% over the single-stage MR and SVR models, respectively. Although the hybrid ANN models do not yield significant improvements, they use fewer body circumference measurements, which is another significant advantage of the proposed hybrid models.
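The text does not spell out how the percentage improvements are computed. A common definition, assumed here, is the relative reduction in MAPE; a small Python helper (hypothetical name) makes that arithmetic explicit:

```python
def pct_improvement(mape_single, mape_hybrid):
    """ASSUMED definition: relative MAPE reduction of a hybrid over a single model, in %."""
    return 100.0 * (mape_single - mape_hybrid) / mape_single


example = pct_improvement(10.0, 9.0)  # a drop from 10 % to 9 % MAPE is a 10 % improvement
```

If the improvements were instead reported as absolute differences in MAPE points, the same MAPE values would give different percentages, so the definition matters when comparing results.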

#### 4. Conclusion

Maintaining appropriate body fat is crucial to human health, yet measuring BFP is not straightforward. This study therefore proposes hybrid models to predict BFP effectively. Although the real dataset includes 13 body circumference measurements, the proposed models provide better predictions with fewer of them. In practice, predicting BFP from fewer body circumference measurements is much more convenient for most people.

The rationale of the proposed hybrid modeling is first to obtain a smaller set of important explanatory variables using MR and MARS and then to use these variables as inputs for the other prediction models. According to the modeling results, the proposed hybrid approaches, particularly MR-MARS and MR-SVR, were well suited to predicting BFP. The hybrid MR-MARS model was the best alternative because it used fewer explanatory variables and achieved the lowest prediction MAPE.

To obtain more accurate BFP predictions, one may collect other measurements in addition to the existing 13 body circumference measurements. Moreover, the proposed hybrid modeling is not the only applicable prediction technique; other data mining techniques, such as rough sets or genetic algorithms, may be combined to refine the structure of the hybrid models. Applying the same procedure to combine other methods, such as evolving systems, deserves further research.

#### Conflict of Interests

The author declares that there is no conflict of interests regarding the publication of this paper.

#### Acknowledgments

This work was partially supported by the National Science Council of the Republic of China (Grant no. NSC 102-2221-E-030-019). The author also gratefully acknowledges the helpful comments and suggestions of the reviewers, which have improved the presentation.