Abstract

Curve fitting is a challenging problem that arises in a wide variety of scientific and engineering applications. Given a set of data points, possibly noisy, the goal is to build a compact representation of the curve that corresponds to the best estimate of the unknown underlying relationship between two variables. Despite the large number of methods available to tackle this problem, it remains challenging and elusive. In this paper, a new method to tackle this problem using strictly a linear combination of radial basis functions (RBFs) is proposed. To be more specific, we divide the parameter search space into linear and nonlinear parameter subspaces. We use a hierarchical genetic algorithm (HGA) to minimize a model selection criterion, which allows us to automatically and simultaneously determine the nonlinear parameters, and then to compute the linear parameters by the least-squares method through Singular Value Decomposition (SVD). The method is fully automatic and does not require subjective parameters, for example, a smoothing factor or centre locations, to perform the solution. In order to validate the efficacy of our approach, we perform an experimental study with several tests on benchmark smooth functions. A comparative analysis with two successful methods based on RBF networks is included.

1. Introduction

In the literature, there are many methods to tackle the curve fitting problem, which remains challenging and elusive. In this study, the goal is to build a compact representation of the curve that corresponds to the best estimate of the unknown relationship between two variables from a set of data points. Curve fitting is a fundamental tool in scientific and engineering applications such as system identification, data analysis and visualization, geometric modeling, CAD/CAM systems, medical imaging, and reverse engineering [16].

The curve fitting problem has mainly been addressed by using typical methods based on linear models [7–10]. These methods consider that, given a set of data points, any function can be properly approximated on a specific interval using a linear combination of a set of fixed functions often called basis functions. The main basis functions used to address the curve fitting problem are polynomials, piecewise polynomials (splines), and radial basis functions (RBFs). RBFs have typically shown successful performance in interpolation-based methods, such as in [11, 12].

In [13], the authors present a method for spike classification enhancement based on 3-Gaussian model fitting. In [14], the peak wavelength detection accuracy in fiber Bragg grating (FBG) sensors is improved by using a wavelet filter and curve fitting based on RBFs. In [15], it is demonstrated that Gaussian curve fitting substantially reduces the measurement errors of spectrally distorted FBG sensors by employing a binary search algorithm that calculates and compares MSE values only at logically selected positions. In [12], the authors propose a method based on RBF interpolation for subpixel mapping of remote sensing images that fully exploits the spatial information in the input images; the method uses RBF interpolation to predict the soft values at each subpixel. However, interpolation methods present some limitations, especially when the number of data samples increases or the data samples are disturbed by noise. To overcome these limitations, other methods based on regression techniques have been proposed. Nevertheless, these methods require the specification of global parameters, such as the number of RBFs or the centre locations, to obtain their solution. If instead the basis functions and any parameters they contain are allowed to change, the model becomes nonlinear, and the adequate choice of parameters turns into a continuous multimodal and multivariate nonlinear optimization problem, which must be addressed using modern Computational Intelligence techniques.

In [16], the authors propose a method based on curve fitting with Gaussian functions for modeling carotid and radial artery pulse pressure waveforms. The method uses a fixed number of Gaussian functions, and their parameters are determined by using a particle swarm optimization algorithm. Neural networks based on RBFs have been applied to curve fitting in [17]. That work considers fitting noisy curves using two neural networks: the multilayer feedforward network and the radial basis function network. A comparative analysis shows that, as the noise level increases, RBF networks are better suited for the reconstruction of noisy curves. However, most of these approaches use predefined parameter values without justification, that is, values chosen empirically. In [18], the spread parameter value of a neural network is selected by trial and error from a reasonably small interval determined beforehand.

We find that RBFs have rarely been applied to curve fitting through a linear combination in which the number of radial basis functions is varied and their parameters are optimized simultaneously. This paper elucidates the feasibility of using RBFs in a linear combination to fit a set of data points disturbed by noise; that is, we propose a new method to tackle the curve fitting problem using strictly a linear combination of RBFs. The proposed approach divides the parameter search space into linear and nonlinear parameter subspaces. We use a hierarchical genetic algorithm to minimize a model selection criterion, which allows us to automatically and simultaneously determine the nonlinear parameters: the spreads, the centre locations, and the number of RBFs. The coefficients of the linear model (linear parameters) are computed by solving a set of linear equations using the least-squares method through the Singular Value Decomposition (SVD) method. In order to validate the efficacy of the proposed approach, we perform an experimental study with several tests on benchmark smooth functions. A comparative analysis with two successful methods based on RBF networks is included.

2. Background

2.1. Radial Basis Functions

RBFs are a particular class of functions whose value depends only on the distance from the centre. Specifically, their response decreases (local response) or increases (global response) monotonically with respect to the distance from a central point. In general, a radial basis function is represented by the following equation:
\[
\phi(x) = \varphi(\lVert x - c \rVert),
\]
where \(\lVert \cdot \rVert\) denotes the norm used to measure the distance between any point \(x\) and the centre \(c\) of the basis function, and \(\varphi\) is a specific type of RBF. Usually, the norm used is the Euclidean distance [19], and the types of RBFs commonly used are expressed in Table 1, where \(r = \lVert x - c \rVert\).
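For illustration, two of the RBF types commonly listed in such tables can be written directly as functions of the distance \(r\); the following is a minimal C++ sketch (the function names are ours, and the Gaussian uses one common parameterization):

```cpp
#include <cmath>

// Gaussian RBF (one common parameterization): local response that
// decays monotonically as the distance r grows.
double gaussian(double r, double sigma) {
    return std::exp(-(r * r) / (2.0 * sigma * sigma));
}

// Multiquadric RBF: global response that grows monotonically with r.
double multiquadric(double r, double sigma) {
    return std::sqrt(r * r + sigma * sigma);
}

// Radial evaluation phi(x) = phi(||x - c||); in one dimension the
// Euclidean norm reduces to the absolute value.
double rbfAt(double x, double c, double sigma) {
    return gaussian(std::fabs(x - c), sigma);
}
```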

2.2. Linear Models

A linear model for a function \(f(x)\) can be expressed as a linear combination of a set of basis functions as
\[
f(x) = \sum_{j=1}^{m} w_j\, h_j(x),
\]
where \(w_j\) are the coefficients and \(h_j\) are the basis functions of the model. In a linear model, the basis functions and any parameters which they might contain are fixed; the ability to fit different functions derives from the freedom to choose different coefficient values. Otherwise, if the basis functions or their parameters are not fixed, the model is nonlinear. Any set of functions can be used as the basis set; however, models containing only basis functions from a particular class are of special interest [20].
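A minimal sketch of evaluating such a linear model, assuming one-dimensional inputs and Gaussian basis functions (the helper name is ours):

```cpp
#include <cmath>
#include <vector>

// Evaluate f(x) = sum_j w_j h_j(x) for fixed Gaussian basis functions
// h_j(x) = exp(-(x - c_j)^2 / (2 sigma_j^2)). The bases are fixed;
// only the coefficients w_j vary, which is what makes the model linear.
double evaluateLinearModel(double x,
                           const std::vector<double>& w,
                           const std::vector<double>& c,
                           const std::vector<double>& sigma) {
    double y = 0.0;
    for (std::size_t j = 0; j < w.size(); ++j) {
        const double d = x - c[j];
        y += w[j] * std::exp(-(d * d) / (2.0 * sigma[j] * sigma[j]));
    }
    return y;
}
```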

2.3. Hierarchical Genetic Algorithms

The HGA is a revised version of the genetic algorithm, an optimization scheme based on biological evolution. Unlike the traditional genetic algorithm, where the genotype structure is fixed, the chromosome in the HGA does not have this restriction. The HGA has a bicoded chromosome arranged in a hierarchical manner, which consists of two kinds of genes: control and parametric genes [21].

Located at the first level, the control genes are encoded as binary digits, while the parametric genes at the next levels are generally encoded as real numbers, and their activation depends upon the value of the associated control genes. Specifically, the control genes activate or deactivate the parametric genes; this is particularly important for producing phenotypes of different lengths within the same chromosomal representation. Hence, the HGA can search over a larger space and converge to the right solution with a high degree of accuracy.

2.4. Curve Fitting with RBFs

The goal of curve fitting is to find the best estimate of the unknown functional relationship from a set of pairs of measurements \(\{(x_i, y_i)\}_{i=1}^{n}\), where \(y_i\) is the \(i\)th observation at the point \(x_i\). This relationship can be expressed as
\[
y_i = f(x_i) + \varepsilon_i, \qquad i = 1, \ldots, n,
\]
where \(f\) is the function to be estimated and the term \(\varepsilon_i\) represents the \(i\)th element of a vector of zero-mean random errors.

In this study, we assume that \(f\) is a smooth function, which can be properly approximated on a given interval by a linear combination of a set of basis functions. The linear model is constructed from basis functions placed at certain locations, considering that \(C = \{c_1, \ldots, c_{m_{\max}}\}\) is a set of \(m_{\max}\) locations placed along the domain of the independent variable and that the model has at most \(m_{\max}\) basis functions. Under these assumptions, the function can be written as the following linear combination:
\[
f(x) = \sum_{j=1}^{m} w_j\, \phi_j(x),
\]
where \(m \le m_{\max}\) and \(w_j\) is the coefficient associated with the \(j\)th basis function \(\phi_j\), which is defined on the set of centres \(C\).

If the type of basis function is specified beforehand, we can define \(f\) completely by finding the number of basis functions \(m\) and the parameters associated with them and then calculating the model coefficients \(w_j\).

3. Automatic Curve Fitting Using RBF and HGA

In this study, we use a typical radial function with local response: the Gaussian function. The popular Gaussian RBF is radially symmetric, with its maximum value at the central point \(c\). It is selected for maximal trend sensing with minimal parameter representation for function approximation [22]; such parameters are the centre \(c\) at which the maximum value occurs (frequently called the mean) and the width \(\sigma\) of the function, referred to as the spread.

When the type of basis function has been specified, our method applies the HGA to simultaneously determine the number of basis functions (model structure) and the centre locations and spreads (basis parameters). Then, the coefficients of the linear model are calculated using the least-squares method through the Singular Value Decomposition (SVD) method. To find the best estimate of the unknown functional relationship by using the HGA, the fitness function is defined to minimize a model selection criterion. Next, we specify the main features of the HGA.
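As an illustration of the linear step, the following sketch (ours, assuming the Eigen library) builds the design matrix for the active Gaussian bases and computes the coefficients by least squares through the SVD:

```cpp
#include <Eigen/Dense>
#include <cmath>
#include <vector>

// Given active centres and spreads and the data (x, y), build the design
// matrix Phi with Phi(i, j) = exp(-(x_i - c_j)^2 / (2 sigma_j^2)) and
// solve min_w ||Phi w - y||^2 by least squares through the SVD.
Eigen::VectorXd fitCoefficients(const std::vector<double>& x,
                                const std::vector<double>& y,
                                const std::vector<double>& c,
                                const std::vector<double>& sigma) {
    const int n = static_cast<int>(x.size());
    const int m = static_cast<int>(c.size());
    Eigen::MatrixXd Phi(n, m);
    Eigen::VectorXd yv(n);
    for (int i = 0; i < n; ++i) {
        yv(i) = y[i];
        for (int j = 0; j < m; ++j) {
            const double d = x[i] - c[j];
            Phi(i, j) = std::exp(-(d * d) / (2.0 * sigma[j] * sigma[j]));
        }
    }
    // Thin SVD; Eigen's solve returns the minimum-norm least-squares
    // solution, which is robust to near-singular design matrices.
    Eigen::JacobiSVD<Eigen::MatrixXd> svd(
        Phi, Eigen::ComputeThinU | Eigen::ComputeThinV);
    return svd.solve(yv);
}
```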

3.1. Chromosome Encoding

To simultaneously determine the number of basis functions, the centre locations, and the spread values, we propose a two-level bicoded description, where each bit in the binary chromosome carries the implication of presence (“1”) or absence (“0”) of a specific basis function in the solution. The real chromosome represents the magnitude of the spread parameters of the basis functions. We represent the chromosome of an individual as
\[
(b_1, b_2, \ldots, b_{m_{\max}} \mid \sigma_1, \sigma_2, \ldots, \sigma_{m_{\max}}),
\]
where each bit \(b_j\) is a control gene and \(\sigma_j\) is a real value. Each control bit simultaneously enables or disables one basis function and its associated parameter \(\sigma_j\). In the present paper, each control gene corresponds to one basis function located at a specific centre \(c_j\). Therefore, if the \(j\)th control gene is “1,” the \(j\)th basis function, located at the \(j\)th centre and with spread \(\sigma_j\), exists in the candidate solution. We use a fixed-length binary string to represent the maximal number of basis functions \(m_{\max}\). The general structure of a chromosome is shown graphically in Figure 1.
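A possible in-memory form of this two-level encoding is sketched below (the struct and helper names are ours):

```cpp
#include <vector>

// Two-level bicoded chromosome: control genes (binary) switch basis
// functions on or off; parametric genes (real) hold their spreads.
struct Chromosome {
    std::vector<bool>   control;  // control[j] == true: basis j is active
    std::vector<double> spread;   // spread[j]: sigma of basis j
};

// Decode the chromosome against the fixed set of candidate centres:
// only basis functions whose control gene is "1" enter the model.
void decode(const Chromosome& ch,
            const std::vector<double>& centres,  // c_1, ..., c_mmax
            std::vector<double>& activeCentres,
            std::vector<double>& activeSpreads) {
    activeCentres.clear();
    activeSpreads.clear();
    for (std::size_t j = 0; j < ch.control.size(); ++j) {
        if (ch.control[j]) {
            activeCentres.push_back(centres[j]);
            activeSpreads.push_back(ch.spread[j]);
        }
    }
}
```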

3.2. Fitness Function

To determine the model structure, as well as the basis parameters, we propose a fitness function based on Akaike's Information Criterion (AIC). This criterion was originally developed for parametric problems as a measure of both the fitness and the complexity of a model. The AIC is given by
\[
\mathrm{AIC} = n \ln\!\left(\frac{1}{n}\sum_{i=1}^{n}\bigl(y_i - \hat{f}(x_i)\bigr)^2\right) + 2k,
\]
where \(n\) is the number of points in the data set and \(\hat{f}\) is the estimation of \(f\). The term \(k\) is the number of model parameters. In the AIC, the residual sum of squares in the first term is used as a measure of the deviation of the estimated function from the observed data, and the second term is a penalty for increasing the number of parameters \(k\). In this study, the parameters consist of \(w_j\), \(c_j\), and \(\sigma_j\) for each basis function. As a result, the number of parameters is 3 times the number of basis functions \(m\), and thus \(k = 3m\).
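Assuming the AIC form reconstructed above, the fitness evaluation can be sketched as follows (the helper name is ours):

```cpp
#include <cmath>
#include <vector>

// AIC-based fitness: n * ln(RSS / n) + 2k, with k = 3m because each
// active basis function contributes a coefficient w_j, a centre c_j,
// and a spread sigma_j. Lower values indicate better models.
double aicFitness(const std::vector<double>& y,     // observations y_i
                  const std::vector<double>& yHat,  // estimates f^(x_i)
                  int m) {                          // active basis functions
    const double n = static_cast<double>(y.size());
    double rss = 0.0;
    for (std::size_t i = 0; i < y.size(); ++i) {
        const double e = y[i] - yHat[i];
        rss += e * e;
    }
    const int k = 3 * m;
    return n * std::log(rss / n) + 2.0 * k;
}
```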

3.3. Reproduction Process and Genetic Operators

Reproduction is the process whereby the fittest individuals in the population receive a correspondingly large number of copies in the next generation. We use the roulette wheel method as the reproduction process. In this method, each individual in the population has a roulette wheel slot sized in proportion to its fitness value. This selection strategy favors the fittest individuals but gives the less fit individuals a chance to reproduce. Furthermore, in order to prevent premature convergence, the method keeps the selection pressure relatively constant over the entire evolution process by using the sigma scaling method [23], which is calculated according to
\[
f_i' = f_i - (\bar{f} - c\,\sigma_f),
\]
where \(f_i'\) is the new scaled fitness value, \(f_i\) is the current fitness value, \(\bar{f}\) is the average fitness, \(\sigma_f\) is the standard deviation of the population fitness, and \(c\) is a constant that allows the selection pressure to be controlled. In addition, elitism is used to keep the elite individuals in the next population and prevent losing the best solution found. After a collection of good chromosomes is selected, they exchange information using the crossover operator.
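A sketch of this selection step follows; since the HGA minimizes the AIC, we assume the raw criterion values have already been mapped to a larger-is-better fitness before scaling (the helper names are ours):

```cpp
#include <algorithm>
#include <cmath>
#include <numeric>
#include <random>
#include <vector>

// Sigma scaling f'_i = f_i - (mean - c * sd), clamped at zero, which
// keeps the selection pressure roughly constant across generations.
std::vector<double> sigmaScale(const std::vector<double>& f, double c) {
    const double n = static_cast<double>(f.size());
    const double mean = std::accumulate(f.begin(), f.end(), 0.0) / n;
    double var = 0.0;
    for (double v : f) var += (v - mean) * (v - mean);
    const double sd = std::sqrt(var / n);
    std::vector<double> scaled(f.size());
    for (std::size_t i = 0; i < f.size(); ++i)
        scaled[i] = std::max(0.0, f[i] - (mean - c * sd));
    return scaled;
}

// Roulette wheel: each individual occupies a slot proportional to its
// scaled fitness, so fitter individuals are favored but less fit ones
// still have a chance (assumes at least one positive weight).
std::size_t rouletteSelect(const std::vector<double>& scaled,
                           std::mt19937& rng) {
    std::discrete_distribution<std::size_t> wheel(scaled.begin(),
                                                  scaled.end());
    return wheel(rng);
}
```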

The crossover process selects genes from the parent chromosomes and creates new offspring. In the uniform crossover scheme, two parents are chosen to recombine into a new individual. Each bit of the new individual is selected from one of the parents depending on a fixed probability. We use the uniform crossover operator for both the control and the parametric genes with the same probability.
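The operator can be sketched as follows; whether a control bit and its associated spread are exchanged together or independently is an implementation choice, and here they are drawn independently (the Chromosome struct is the one sketched in Section 3.1):

```cpp
#include <random>
#include <vector>

struct Chromosome {                    // as sketched in Section 3.1
    std::vector<bool>   control;
    std::vector<double> spread;
};

// Uniform crossover: each gene of the offspring is drawn from one of
// the two parents with a fixed probability (0.5 here), applied to the
// control genes and the parametric genes with the same probability.
Chromosome uniformCrossover(const Chromosome& a, const Chromosome& b,
                            std::mt19937& rng) {
    std::bernoulli_distribution coin(0.5);
    Chromosome child = a;
    for (std::size_t j = 0; j < child.control.size(); ++j) {
        if (coin(rng)) child.control[j] = b.control[j];
        if (coin(rng)) child.spread[j]  = b.spread[j];
    }
    return child;
}
```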

In the mutation process, there are also two types of mutation. The bit mutation method is applied over the control genes: each bit is inverted or not depending on the mutation probability. For the parametric genes' mutation, each numeric value is changed, depending on the same mutation probability, according to
\[
\sigma_j' = \sigma_j + \Delta\,(2\,\mathrm{rand} - 1),
\]
where \(\Delta\) is the maximum increment or decrement of the real value and rand is a function that generates a random value between 0 and 1.
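Both mutation types can be combined in one pass, using the update rule reconstructed above (a sketch; the names are ours):

```cpp
#include <random>
#include <vector>

struct Chromosome {                    // as sketched in Section 3.1
    std::vector<bool>   control;
    std::vector<double> spread;
};

// Bit mutation for the control genes and bounded real mutation for the
// parametric genes: sigma' = sigma + delta * (2 * rand - 1), where
// delta is the maximum increment or decrement of the real value.
void mutate(Chromosome& ch, double pMut, double delta, std::mt19937& rng) {
    std::bernoulli_distribution flip(pMut);
    std::uniform_real_distribution<double> rand01(0.0, 1.0);
    for (std::size_t j = 0; j < ch.control.size(); ++j) {
        if (flip(rng)) ch.control[j] = !ch.control[j];   // invert bit
        if (flip(rng))                                   // perturb spread
            ch.spread[j] += delta * (2.0 * rand01(rng) - 1.0);
    }
}
```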

4. Experimental Results

In order to evaluate the effectiveness of the proposed method (HGA), we carried out numerical simulations on five test functions, which are mathematically defined by the equations in Table 2. Our experimental study includes a spatially inhomogeneous smooth function with a sharp peak, an oscillating but not periodic function, and a function with a discontinuity. Note that these test functions have been used in previous studies [24–28] and can be considered a benchmark for smooth functions. Furthermore, we present a comparison with the well-known methods based on radial basis neural networks (RBNN) and general regression neural networks (GRNN).

The problem addressed in this paper is automatic curve fitting given a set of noisy data points. Thus, the experimental setup was designed to study the effects of varying the number of samples and the noise level on the curve fitting task. Each test function is evaluated at \(n\) design points uniformly distributed over its domain. Then, zero-mean normal noise with a known standard deviation is added to the data set in order to obtain a specific signal-to-noise ratio (SNR).
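The data generation step can be sketched as follows; the exact SNR definition is not recoverable from the text, so the sketch assumes the common convention SNR = sd(f)/σ:

```cpp
#include <cmath>
#include <random>
#include <vector>

// Generate noisy samples y_i = f(x_i) + eps_i on the design points.
// The noise level sigma is derived from the target SNR assuming
// SNR = sd(f) / sigma (an assumption; the paper's exact definition
// may differ).
std::vector<double> addNoise(const std::vector<double>& fx, double snr,
                             std::mt19937& rng) {
    const double n = static_cast<double>(fx.size());
    double mean = 0.0;
    for (double v : fx) mean += v;
    mean /= n;
    double var = 0.0;
    for (double v : fx) var += (v - mean) * (v - mean);
    const double sigma = std::sqrt(var / n) / snr;
    std::normal_distribution<double> noise(0.0, sigma);
    std::vector<double> y(fx.size());
    for (std::size_t i = 0; i < fx.size(); ++i) y[i] = fx[i] + noise(rng);
    return y;
}
```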

For each function, a collection of 101 noisy data sets is generated for each of the different configurations. In this study, we use three different SNR values (2, 3, and 4) and three different sample sizes \(n\) (50, 100, and 200); hence, there are altogether 9 different configurations. Thus, we tested our method over a total of 909 data sets for each test function. For completeness, in test function 3 the design points are uniformly distributed and then scaled to the interval on which the function is defined. Finally, the proposed method and the RBNN and GRNN methods were used to estimate the test functions.

For the HGA, we use a maximal number of basis functions \(m_{\max}\), and the set of centres \(C\) is defined as a subset of the design points. At the beginning, the population is randomly initialized: each control gene is randomly selected from \(\{0, 1\}\), and each parametric gene is drawn as a random real number over a predefined range. The parameters used for the HGA were tuned experimentally as in [29] and are summarized in Table 3.

To compare and qualitatively evaluate the performance of the methods, we took the best estimate produced by each method for each configuration. These best estimates correspond to the minimal MSE value obtained by each method over the collection of 101 noisy data sets generated for the given configuration. For practical reasons, we present the results of only 1 of the 9 configurations from the experimental study. These results are shown in Figure 2 and correspond to data sets with 50 samples and SNR = 3. The figure suggests that the HGA adapts better to discontinuities and sharp peaks in curves, despite the noise level and the small number of samples, than the GRNN and RBNN methods, which generate less smooth curves with overfitting in some sections. Similar results were obtained for the other configurations.

For each simulated data set, the numerical measure used to quantitatively evaluate the quality of the estimated function \(\hat{f}\) is the mean square error (MSE), given by
\[
\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\bigl(f(x_i) - \hat{f}(x_i)\bigr)^2.
\]
A smaller value corresponds to a better quality of \(\hat{f}\); a computational sketch is given after this paragraph. To facilitate the comparison of the distribution of MSE values among the different methods, we use boxplots. In descriptive statistics, a boxplot is a convenient visual representation that partitions a data distribution through its quartiles. Boxplots of the MSE values for the different configurations are shown in Figures 3, 4, and 5. In Figure 3, the boxplots of MSE values for data sets with 50 samples are shown. In each panel, the boxplots correspond to a specific configuration: the columns correspond, from left to right, to noise ratios 2, 3, and 4, and the rows correspond, from top to bottom, to test functions 1 to 5. Similarly, Figures 4 and 5 show the boxplots of MSE values for data sets with 100 and 200 samples, respectively.
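The corresponding computation is straightforward (a sketch):

```cpp
#include <vector>

// Mean square error between the true function values f(x_i) and the
// estimates f^(x_i) over the n design points.
double mse(const std::vector<double>& f, const std::vector<double>& fHat) {
    double sum = 0.0;
    for (std::size_t i = 0; i < f.size(); ++i) {
        const double e = f[i] - fHat[i];
        sum += e * e;
    }
    return sum / static_cast<double>(f.size());
}
```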

In Figure 3, it can be observed that in most cases the medians of the MSE values from the HGA are lower than those of the other methods, although there is no significant difference among the results. For example, for test function 5 with 50 samples and SNR = 2 (bottom left of the figure), the GRNN yields a slightly smaller median MSE than the HGA method, but the difference is not significant. In this case, the upper whisker of the MSE values for the HGA method is considerably longer than those of the other methods, which implies a larger dispersion for 25% of its MSE values. The most significant difference among the boxplots can be observed for test function 5 with 50 samples and SNR = 4 (bottom right of the figure). In this case, the upper quartile of the MSE values for the HGA lies below the first quartile of those from the GRNN, which implies that 75% of the MSE values from the HGA are lower than 75% of the MSE values from the GRNN. In most cases in Figures 4 and 5, the upper quartile of the MSE values for the HGA lies below the first quartile of those from the RBNN or GRNN methods. Although the whiskers of the HGA boxplots have different lengths, in all cases the lower whisker is shorter than the upper whisker, which implies an asymmetric distribution of the MSE values. Although the dispersion of the MSE values for the HGA method is considerably larger than that of the other methods, in most cases the values themselves are lower.

Additionally, we compute the mean and the standard error of the mean of the MSE values for all methods. These results are summarized in Tables 4, 5, and 6, corresponding to data sets with 50, 100, and 200 samples, respectively. Each table includes the results for noise ratios 2, 3, and 4. Tables 7, 8, and 9 list the parameters of the best estimated functions for the experimental setup with 50 samples and SNR = 2, 3, and 4, respectively.

To clarify the effects of varying the number of samples and the noise level on the curve fitting task using the HGA method, we include Figures 6 and 7. The first shows the best estimates obtained using the HGA for data sets with 50, 100, and 200 samples (from left to right) and SNR = 3. The second shows the best estimates obtained using the HGA for data sets with 50 samples and SNR = 2, 3, and 4 (from left to right). In both figures, the rows correspond, from top to bottom, to test functions 1 to 5. The performance of the methods improves significantly as the noise level is reduced or the number of samples increases, as shown in Tables 4, 5, and 6 and Figures 6 and 7. This comes as no surprise because, as we increase the number of samples or decrease the noise level, we have more information with which to build the curve.

Finally, the success of our approach can be attributed to the simultaneous global search performed over all the parameters of the linear model, which shows the potential of the HGA to handle loss of information and high noise levels. The average execution time for the methods based on neural networks was 0.29 seconds for experiments with 200 samples, compared to 132 seconds when using the HGA. However, this drawback can be addressed by a parallel implementation of the HGA. Our algorithm was implemented in C++. All methods were executed on a PC with an Intel Core i7 processor running at 2.2 GHz and 8 GB of RAM.

5. Conclusions

In this paper, we have proposed an efficient hierarchical genetic algorithm to tackle the automatic curve fitting problem. The method introduces a novel hierarchical gene structure for the chromosomal representation that permits the optimization of all degrees of freedom of a linear model based on RBFs. To be more specific, our algorithm simultaneously finds the linear model with the fewest basis functions, the optimal spreads and centre locations, and the coefficients of the model. The method does not require subjective parameters, such as smoothing factors or centre locations, to perform the solution. On the other hand, its main drawback is the computational time required; however, this can be mitigated using parallel computing. This study focused on smooth functions, but the method can be extended to approximate nonsmooth functions. The algorithm was tested on several benchmark functions and compared to two successful methods based on RBF neural networks. Simulation results show that the proposed method presents qualitatively and quantitatively better results than the methods based on RBF neural networks because it handles loss of information and high noise levels more effectively. One of the attractive features of our method is that it can easily be extended to surface fitting.

Given the performance characteristics of the proposed approach, our future work will be to apply this method to fit real experimental data. We are also interested in extending our approach to surface fitting and in experimenting with variable-length chromosomes and different basis functions.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

The authors acknowledge the support of CONACYT Mexico, DAIP of the University of Guanajuato, the Information Technology Laboratory of CINVESTAV, Tamaulipas, and ITESI, Mexico. Also, they gratefully acknowledge the help of Dr. Carlos Montoro Rubin to improve the English language of their paper.