#### Abstract

Quasi-linear autoregressive with exogenous inputs (Quasi-ARX) models have received considerable attention for their usefulness in nonlinear system identification and control. In this paper, identification methods for quasi-ARX type models are reviewed and categorized into three main groups, and a two-step learning approach is proposed as an extension of the parameter-classified methods to identify the quasi-ARX radial basis function network (RBFN) model. First, a clustering method is used to provide statistical properties of the dataset for determining the parameters that enter the model nonlinearly, which are interpreted meaningfully as interpolation parameters of a local linear model. Second, support vector regression is used to estimate the parameters that enter the model linearly; meanwhile, an explicit kernel mapping is given in terms of the nonlinear parameter identification procedure, by which the model is transformed from nonlinear-in-nature to linear-in-parameter. Finally, numerical and real cases are studied to demonstrate the effectiveness and generalization ability of the proposed method.

#### 1. Introduction

Many real-world systems exhibit complex nonlinear characteristics and hence cannot be identified directly by linear methods. In the last two decades, nonlinear models such as neural networks (NNs), radial basis function networks (RBFNs), neurofuzzy networks (NFNs), and multiagent networks have received considerable research attention for nonlinear system identification [1–4]. However, from a user’s point of view, conventional nonlinear black-box models have been criticized mostly for not being user-friendly: they neglect some good properties of successful linear black-box modeling, such as the linear structure and simplicity [5, 6]; moreover, an easy-to-use model should help interpret properties of the nonlinear dynamics rather than serve merely as a vehicle for adjusting the fit to the data [7]. Therefore, careful modeling is needed to obtain a model structure favorable to certain applications.

To obtain nonlinear models favorable to applications, a quasi-linear autoregressive with exogenous inputs (quasi-ARX) modeling scheme has been proposed that consists of two parts: a macro-part and a core-part [14]. As shown in Figure 1, the macro-part is a user-friendly interface favorable to specific applications, and the core-part is used to represent the complicated coefficients of the macro-part. To this end, by using Taylor expansion or other mathematical transformation techniques, a class of ARX-like interfaces is constructed as macro-parts, in which useful properties of linear models can be introduced, while their coefficients are represented by nonlinear models such as RBFNs. In this way, a quasi-ARX predictor that is linear in the input variable can be further designed, wherein the core-part is skillfully replaced by an extra variable. Thereafter, a nonlinear controller can be generated directly from the quasi-ARX predictor, in a way similar to simple linear control methods [15, 16]. In contrast, a complex nonlinear controller design has to be considered in NN-based control methods, which often involve two independent NNs: one used as the predictor and the other as the controller [17].

Actually, similar block-type models have been extensively studied and named in several forms according to their features, such as the state-dependent parameter models [18–20] and local linear models [10, 21]. Basically, identification methods can be categorized into three schemes:

(1) Hierarchical identification scheme: the quasi-ARX model structure can be considered as an “ARX submodel + NN” when NNs are utilized in the core-part [15, 16], and a hierarchical method has been proposed to identify the ARX submodel and the NN by a dual-loop scheme. In one loop, the parameters of the ARX submodel are fixed and treated as constants while the NN is trained by a back propagation (BP) algorithm (only a small number of epochs are implemented); in the other loop, the resultant NN is fixed to estimate the parameters of the ARX submodel. The two loops are executed alternately to achieve a great approximation ability for nonlinear systems.

(2) Parameter-classified identification scheme: when nonlinear basis function models are embedded in the core-part of the quasi-ARX models, all the parameters can be classified as nonlinear (e.g., the center and width parameters in the embedded RBFNs) or linear (e.g., the linear weights in the embedded RBFNs) with respect to the model. A structured nonlinear parameter optimization method (SNPOM) has been presented in [9] to optimize both the nonlinear and the linear parameters simultaneously for an RBF-type state-dependent parameter model, and further improvements have been given in [19, 22]. On the other hand, by using heuristic prior knowledge, the authors in [14, 23] estimate the nonlinear parameters of a quasi-ARX NFN model, and the least squares algorithm is used to estimate the linear parameters. Similarly, prior knowledge has been used for the nonlinear parameters in a quasi-ARX wavelet network (WN) model, where identification can be explained in an integrated approach [24, 25].

(3) Global identification scheme: in this category, all the parameters in the quasi-ARX models are optimized regardless of the parameter features and model structure. For instance, a hybrid algorithm of particle swarm optimization (PSO) with diversity learning and a gradient descent method has been proposed in [10] to identify the WN-type quasi-ARX model, which is commonly used in time series prediction. Moreover, NN [26] and support vector regression (SVR) [13] are applied, respectively, to identify all the quasi-ARX model parameters.

In this paper, specific efforts are made to extend the second identification scheme, which is based on classifying the model parameters. Compared with the other schemes, this one explores the model properties deeply and provides a promising solution to a wide range of basis function embedded quasi-ARX models. SNPOM is known to be an efficient optimization method falling into this category, which makes good use of the features of the model parameters and gives impressive performance in time series prediction and nonlinear control. However, this technique is still considered a “nontransparent” approach since it is aimed at data-fitting only, and the model parameters are difficult to interpret in terms of a physical explanation of the real world or the nonlinear dynamics of systems [7]. Therefore, it may constrain further development of the model. In contrast, a prior knowledge based nonlinear parameter estimation makes it possible to interpret system properties meaningfully, especially with respect to the quasi-ARX RBFN model as discussed later in Section 3. Useful prior knowledge can evolve a quasi-ARX model from a “black-box” tool into a “semianalytical” one [27], which makes some parameters interpretable by intuition, following the principle of application favorability in quasi-ARX modeling. Owing to this fact, nonlinear parameters are determined in terms of prior interpretable knowledge, and linear parameters are adjusted to fit the data. It may contribute to low computational cost and high generalization of the model as parallel computation. Nevertheless, the problem is how to generate useful prior knowledge for an accurate nonlinear parameter estimation.

In the current study, a two-step approach is proposed to identify the quasi-ARX RBFN model for nonlinear systems. First, a clustering method is applied to generate the data distribution information for the system, whereby the center parameters of the embedded RBFN are determined as cluster centers, and the width parameter of each RBF is set in terms of the distances to other nearby centers. Then, it is straightforward to utilize the linear SVR for linear parameter estimation. The main purpose of this work is to provide an interpretable identification approach for the quasi-ARX models, which can be regarded as complementary to the identification procedures in [6, 9, 13]. Compared with the heuristic prior knowledge used in quasi-ARX NFN model identification, the clustering based method gives an alternative approach to prior knowledge for nonlinear parameter estimation, and the quasi-ARX RBFN model is interpreted as a local linear model with interpolation. Moreover, when linear SVR is applied for linear parameter estimation, identification of the quasi-ARX RBFN model can be treated as an SVR with a novel kernel mapping and associated feature space, and the kernel mapping is equivalent to the nonlinear parameter estimation procedure, which transforms the model from nonlinear-in-nature to linear-in-parameter. Unlike the SVR-based method [9], the kernel function proposed in this study takes an explicit mapping, which is effective in coping with potential overfitting for some complex and noisy learning tasks [28]. Finally, in the proposed method, nonlinear parameters are estimated directly based on the prior knowledge; to some extent, it can be considered as an algorithmic approach for the initialization of SNPOM.

The remainder of the paper is organized as follows. Section 2 introduces a quasi-ARX RBFN modeling scheme. Section 3 proposes the identification method of the quasi-ARX RBFN model. Section 4 investigates two numerical examples and a real case. Finally, some discussions and conclusions are made in Section 5.

#### 2. Quasi-ARX RBFN Modeling

Let us consider a single-input-single-output (SISO) nonlinear time-invariant system whose input-output dynamics is described as

$$y(t) = g(\varphi(t)) + e(t), \tag{1}$$

where $t = 1, 2, \ldots$; $u(t)$, $y(t)$, and $e(t)$ are the system input, output, and a stochastic noise of zero-mean at time $t$, respectively; $n_u$ and $n_y$ are the unknown maximum delays of the input and output, respectively. $\varphi(t) = [y(t-1), \ldots, y(t-n_y), u(t-1), \ldots, u(t-n_u)]^T \in \mathbb{R}^n$ with $n = n_y + n_u$ is the regression vector composed of the delayed input-output data. $g(\cdot)$ is an unknown function (black-box) describing the dynamics of the system under study, which is assumed to be continuously differentiable and satisfies $g(0) = 0$.

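Performing the Taylor expansion to $g(\varphi(t))$ at $\varphi(t) = 0$, one has

$$y(t) = g(0) + g'(0)\varphi(t) + \frac{1}{2}\varphi^T(t)\,g''(0)\,\varphi(t) + \cdots + e(t). \tag{2}$$

Then (1) is reformulated with an ARX-like linear structure:

$$y(t) = \varphi^T(t)\,\theta(\varphi(t)) + e(t), \tag{3}$$

where

$$\theta(\varphi(t)) = \left(g'(0) + \frac{1}{2}\varphi^T(t)\,g''(0) + \cdots\right)^T = [a_{1,t}, \ldots, a_{n_y,t}, b_{0,t}, \ldots, b_{n_u-1,t}]^T. \tag{4}$$

In (4), coefficients $a_{i,t} = a_i(\varphi(t))$ and $b_{k,t} = b_k(\varphi(t))$ are nonlinear functions of $\varphi(t)$ for $i = 1, \ldots, n_y$ and $k = 0, \ldots, n_u - 1$; thus they can be represented by an RBFN as

$$\theta(\varphi(t)) = \Omega_0 + \sum_{j=1}^{M} \Omega_j\, N(p_j, \varphi(t)), \tag{5}$$

where $p_j = \{\mu_j, \sigma_j\}$ includes the center parameter vector $\mu_j$ and the width parameter $\sigma_j$ of the $j$th RBF $N(p_j, \varphi(t))$, $M$ denotes the number of basis functions utilized, and $\Omega_j = [\omega_{1j}, \ldots, \omega_{nj}]^T$ is a connection matrix between the input variables and the associated basis functions. According to (3) and (5), a compact representation of the *quasi-ARX RBFN model* is given as

$$y(t) = \varphi^T(t)\,\Omega_0 + \sum_{j=1}^{M} \varphi^T(t)\,\Omega_j\, N(p_j, \varphi(t)) + e(t), \tag{6}$$

in which the set of RBFs with scaling parameter $\lambda$ (the default value of $\lambda$ is $1$) is

$$N(p_j, \varphi(t)) = \exp\left(-\frac{\|\varphi(t) - \mu_j\|^2}{\lambda \sigma_j^2}\right), \quad j = 1, \ldots, M. \tag{7}$$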
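The structure of the model can be illustrated with a minimal sketch. It assumes a Gaussian RBF of the form $\exp(-\|\varphi - \mu_j\|^2 / (\lambda\sigma_j^2))$ and a constant coefficient term `omega0`; the function and argument names are hypothetical and chosen only for illustration.

```python
import math

def rbf(phi, center, width, lam=1.0):
    """Gaussian RBF value exp(-||phi - center||^2 / (lam * width^2)) (assumed form)."""
    sq_dist = sum((x - m) ** 2 for x, m in zip(phi, center))
    return math.exp(-sq_dist / (lam * width ** 2))

def quasi_arx_predict(phi, omega0, omegas, centers, widths, lam=1.0):
    """One-step output phi^T * theta(phi), with theta(phi) = omega0 + sum_j omega_j * N_j(phi)."""
    theta = list(omega0)  # state-dependent ARX coefficients, start from the constant part
    for omega_j, mu_j, sigma_j in zip(omegas, centers, widths):
        n_j = rbf(phi, mu_j, sigma_j, lam)
        theta = [th + w * n_j for th, w in zip(theta, omega_j)]
    # ARX-like macro-part: inner product of regression vector and coefficients
    return sum(x * th for x, th in zip(phi, theta))
```

With an empty list of RBFs the function reduces to an ordinary linear ARX prediction, which is the sense in which the macro-part keeps a linear structure.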

#### 3. Parameter Estimation of Quasi-ARX RBFN Model

From (6) and (7), it is known that $p_j$ (i.e., $\mu_j$, $\sigma_j$) for $j = 1, \ldots, M$ and $\lambda$ are nonlinear parameters for the model, whereas $\Omega_j$ ($j = 0, 1, \ldots, M$) become linear when all the nonlinear parameters are determined/fixed. In the following, the clustering method and SVR are, respectively, applied to estimate those two types of parameters.

##### 3.1. Nonlinear Parameters Estimation

The choice of the center parameters plays an important role in the performance of the RBF-type model [29]. In this paper, these parameters are estimated by means of prior knowledge from the clustering method rather than by minimizing the mean square of the training error. It should be mentioned that using a clustering method for initializing the center parameters is not a new idea in RBF-type models, and sophisticated clustering algorithms have been proposed in [30, 31]. In the present work, the nonlinear parameters are estimated by clustering, which gives them meaningful interpretations. From this point of view, (6) is investigated as a local linear model with submodels $\varphi^T(t)\Omega_j$, and the $j$th RBF $N(p_j, \varphi(t))$ is regarded as a time-varying interpolation function for the associated linear submodel to preserve the local property. Figure 2 gives a schematic diagram illustrating the quasi-ARX RBFN model in terms of local linear models.

In this way, the local linear information of the data can be generated by means of a clustering algorithm, where the number of clusters (linear subspaces) is equivalent to the number of RBF neurons, and *each cluster center is set as the center parameter of the associated RBF*. In order to determine appropriately the operating area of each local linear submodel, the width of each RBF is set to cover the corresponding subspace well. Generally speaking, we can *set the width parameters of the RBF neurons according to the distances among those centers*. For instance, a proper width parameter of a certain RBF can be obtained as the mean value of the distances from its center to its two nearest neighbors. From (7), one knows that an excessively small value of the width parameters may result in insufficient local linear operating areas for all data, while an overly wide setting will make all the RBFs overlap and hence weaken the local property of each linear submodel.
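The width rule described above (mean distance from a center to its two nearest neighboring centers) can be sketched as follows. This is only an illustration under the assumption of Euclidean distance; the function name `rbf_widths` is hypothetical.

```python
import math

def rbf_widths(centers):
    """Width of each RBF: mean Euclidean distance to its two nearest other centers."""
    if len(centers) < 2:
        raise ValueError("need at least two centers to measure distances")
    widths = []
    for i, c in enumerate(centers):
        # distances from center i to every other center, smallest first
        dists = sorted(math.dist(c, other) for k, other in enumerate(centers) if k != i)
        nearest = dists[:2]  # only one neighbor is available when there are two centers
        widths.append(sum(nearest) / len(nearest))
    return widths
```

For three collinear centers such as `[[0, 0], [3, 4], [6, 8]]`, the middle center gets a smaller width than the two outer ones, since both of its neighbors are close.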

*Remark 1.* Figure 2 only gives a meaningful interpretation of the model parameters. In real applications, since the data distribution is complex and the exact local linear subspaces may not exist, the clustering partition approach is used to provide several rational operating areas, and the scaling parameter $\lambda$ can be set to adjust the width parameters for good weighting to each associated area.

##### 3.2. Linear Parameters Estimation

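After estimating and fixing the nonlinear parameters, (6) can be rewritten in a linear-in-parameter manner as

$$y(t) = \Phi^T(t)\,\Theta + e(t), \tag{8}$$

where $\Phi(t)$ is an abbreviation of $\Phi(\varphi(t))$ with

$$\Phi(t) = [\varphi^T(t), N_1(t)\varphi^T(t), \ldots, N_M(t)\varphi^T(t)]^T, \tag{9}$$

$$\Theta = [\Omega_0^T, \Omega_1^T, \ldots, \Omega_M^T]^T, \tag{10}$$

in which, since $p_j$ in the $j$th RBF has already been estimated, we represent the $j$th RBF $N(p_j, \varphi(t))$ by the shortened form $N_j(t)$ as in (9). Therefore, the nonlinear system identification problem is reduced to a linear regression one with respect to $\Phi(t)$, and all the linear parameters are denoted by $\Theta$.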

*Remark 2.* As a result of nonlinear parameter estimation, $\Phi(t)$ plays an important role in transforming the quasi-ARX RBFN models from nonlinear-in-nature to linear-in-parameter with respect to $\Theta$. Accordingly, it also provides the nonlinear mapping from the original input space of $\varphi(t)$ into a high-dimensional feature space; that is, $\varphi(t) \mapsto \Phi(t)$. This explicit mapping will be utilized for an inner-product kernel in the later part.
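The explicit mapping can be made concrete with a short sketch. It assumes the feature vector stacks the regression vector and its copies weighted by each fixed RBF, with the Gaussian RBF form assumed as $\exp(-\|\varphi - \mu_j\|^2 / (\lambda\sigma_j^2))$; the names `rbf` and `feature_map` are hypothetical.

```python
import math

def rbf(phi, center, width, lam=1.0):
    """Gaussian RBF value exp(-||phi - center||^2 / (lam * width^2)) (assumed form)."""
    sq_dist = sum((x - m) ** 2 for x, m in zip(phi, center))
    return math.exp(-sq_dist / (lam * width ** 2))

def feature_map(phi, centers, widths, lam=1.0):
    """Explicit mapping phi -> [phi, N_1*phi, ..., N_M*phi] of length (M + 1) * n."""
    features = list(phi)  # unweighted copy pairs with the constant coefficient block
    for mu_j, sigma_j in zip(centers, widths):
        n_j = rbf(phi, mu_j, sigma_j, lam)
        features.extend(n_j * x for x in phi)  # copy weighted by the j-th RBF
    return features
```

Once the centers and widths are fixed, a linear model in `feature_map(phi, ...)` is equivalent to the nonlinear model in `phi`, which is what allows a linear estimator to be used for the remaining parameters.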

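In the following, *the linear parameters are estimated by a linear SVR*, considering the structural risk minimization principle as

$$\min_{\Theta, \xi, \xi^*} \; J = \frac{1}{2}\Theta^T\Theta + C\sum_{t=1}^{N}(\xi_t + \xi_t^*) \tag{11}$$

subject to

$$\begin{cases} y(t) - \Phi^T(t)\Theta \le \epsilon + \xi_t, \\ -y(t) + \Phi^T(t)\Theta \le \epsilon + \xi_t^*, \\ \xi_t, \xi_t^* \ge 0, \end{cases} \tag{12}$$

where $N$ is the number of observations, $\xi_t$ and $\xi_t^*$ are slack variables, and $C$ is a nonnegative weight determining how much the prediction errors that exceed the threshold value $\epsilon$ are penalized. The solution can be transformed to find a saddle point of the associated Lagrange function:

$$L = \frac{1}{2}\Theta^T\Theta + C\sum_{t=1}^{N}(\xi_t + \xi_t^*) - \sum_{t=1}^{N}\alpha_t\left(\epsilon + \xi_t - y(t) + \Phi^T(t)\Theta\right) - \sum_{t=1}^{N}\alpha_t^*\left(\epsilon + \xi_t^* + y(t) - \Phi^T(t)\Theta\right) - \sum_{t=1}^{N}(\beta_t\xi_t + \beta_t^*\xi_t^*), \tag{13}$$

where $\alpha_t$, $\alpha_t^*$, $\beta_t$, and $\beta_t^*$ are nonnegative parameters to be designed later. The saddle point could be acquired by minimizing $L$ with respect to $\Theta$, $\xi_t$, and $\xi_t^*$:

$$\frac{\partial L}{\partial \Theta} = 0 \Rightarrow \Theta = \sum_{t=1}^{N}(\alpha_t - \alpha_t^*)\Phi(t), \qquad \frac{\partial L}{\partial \xi_t} = 0 \Rightarrow \alpha_t + \beta_t = C, \qquad \frac{\partial L}{\partial \xi_t^*} = 0 \Rightarrow \alpha_t^* + \beta_t^* = C. \tag{14}$$

Thus, one can convert the primal problem (11) into an equivalent dual problem as

$$\max_{\alpha, \alpha^*} \; W = -\frac{1}{2}\sum_{t=1}^{N}\sum_{t'=1}^{N}(\alpha_t - \alpha_t^*)(\alpha_{t'} - \alpha_{t'}^*)\,\Phi^T(t)\Phi(t') + \sum_{t=1}^{N}(\alpha_t - \alpha_t^*)\,y(t) - \epsilon\sum_{t=1}^{N}(\alpha_t + \alpha_t^*) \tag{15}$$

subject to

$$0 \le \alpha_t \le C, \qquad 0 \le \alpha_t^* \le C. \tag{16}$$

To do this, the training results $\alpha_t$ and $\alpha_t^*$ are obtained from (15), and the linear parameter vector $\Theta$ is then obtained by the training value:

$$\Theta = \sum_{t=1}^{N}(\alpha_t - \alpha_t^*)\,\Phi(t). \tag{17}$$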
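As a rough illustration of the quantities involved, the sketch below evaluates a standard $\epsilon$-insensitive SVR objective without a bias term (at the optimum, the slack variables equal the $\epsilon$-insensitive losses) and recovers the weight vector from dual coefficients. It is a textbook formulation under that assumption, not necessarily the exact variant solved by a given toolbox, and all names are hypothetical.

```python
def dot(a, b):
    """Inner product of two equally long sequences."""
    return sum(x * y for x, y in zip(a, b))

def eps_insensitive(residual, eps):
    """Loss that ignores residuals smaller than eps in magnitude."""
    return max(0.0, abs(residual) - eps)

def primal_objective(theta, features, targets, c, eps):
    """0.5*||theta||^2 + C * sum of eps-insensitive losses (slacks at their optimum)."""
    reg = 0.5 * dot(theta, theta)
    loss = sum(eps_insensitive(y - dot(f, theta), eps) for f, y in zip(features, targets))
    return reg + c * loss

def theta_from_dual(alpha, alpha_star, features):
    """Weight vector as sum_t (alpha_t - alpha_t^*) * feature_t."""
    theta = [0.0] * len(features[0])
    for a, a_s, f in zip(alpha, alpha_star, features):
        for k, value in enumerate(f):
            theta[k] += (a - a_s) * value
    return theta
```

Only training points with a nonzero difference between the two dual coefficients contribute to the weight vector, which is why the solution size depends on the training data rather than on the feature dimension.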

In the above way, contributions of the SVR-based linear parameter estimation method can be summarized as follows.

(1) Robust performance for parameter estimation is introduced because of the structural risk minimization of SVR.

(2) There is no need to calculate the linear parameter $\Theta$ directly. Instead, the problem becomes a dual form of the quadratic optimization, which is represented by $\alpha_t$ and $\alpha_t^*$ and depends on the size of the training data. This is very useful for alleviating the computational cost, especially when the model suffers from the curse of dimensionality.

(3) Identification of the quasi-ARX model is specified as an SVR with explicit kernel mapping $\Phi(t)$, which has been mentioned in Remark 2. To this end, the quasi-ARX RBFN model is reformulated as

$$y(t) = \Phi^T(t)\,\Theta + e(t) = \sum_{t'=1}^{N}(\alpha_{t'} - \alpha_{t'}^*)\,K(t, t') + e(t), \tag{18}$$

where $t'$ is the time index of the training data, and a quasi-linear kernel, which is explicitly explained in the following remark, is defined as an inner product of the explicit nonlinear mapping $\Phi(t)$:

$$K(t, t') = \Phi^T(t)\Phi(t') = \left(1 + \sum_{j=1}^{M} N_j(t)\,N_j(t')\right)\varphi^T(t)\varphi(t'). \tag{19}$$

*Remark 3.* The name of the quasi-linear kernel has a twofold meaning. Firstly, it is derived from the quasi-ARX modeling scheme. Secondly, from (19) it is known that when $M$ is as small as zero, the kernel is reduced to a linear one, and the nonlinearity of the kernel mapping increases with the value of $M$. Compared with conventional kernels with implicit kernel mapping, the nonlinear mapping of the quasi-linear kernel is tunable by $M$, which also reflects the nonlinearity of the quasi-ARX RBFN models in the sense of the number of local linear subspaces utilized. A proper value of $M$ is essentially helpful in coping with potential overfitting, which will be shown in the following simulations.
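A minimal sketch of such a kernel is given below, assuming it is the inner product of the stacked features `[phi, N_1*phi, ..., N_M*phi]`, which factors into a linear kernel multiplied by a gating term. The Gaussian RBF form and the function names are assumptions made for illustration.

```python
import math

def rbf(phi, center, width, lam=1.0):
    """Gaussian RBF value exp(-||phi - center||^2 / (lam * width^2)) (assumed form)."""
    sq_dist = sum((x - m) ** 2 for x, m in zip(phi, center))
    return math.exp(-sq_dist / (lam * width ** 2))

def quasi_linear_kernel(phi_a, phi_b, centers, widths, lam=1.0):
    """(1 + sum_j N_j(phi_a) * N_j(phi_b)) * <phi_a, phi_b>."""
    linear = sum(x * y for x, y in zip(phi_a, phi_b))  # ordinary linear kernel
    gate = 1.0 + sum(
        rbf(phi_a, mu, sigma, lam) * rbf(phi_b, mu, sigma, lam)
        for mu, sigma in zip(centers, widths)
    )
    return gate * linear
```

With no RBFs (`centers=[]`) the gating term equals one and the function returns the plain linear kernel, matching the remark that the kernel becomes linear when the number of local subspaces is zero.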

#### 4. Experimental Studies

In this section, identification performance of the above proposed approach to quasi-ARX RBFN model is evaluated by three examples. The first one is an example to show the performance of quasi-ARX RBFN model for time series prediction. Second, a rational system generated from Narendra and Parthasarathy [17] is simulated with a small amount of training data, which is used to demonstrate the generalization of the proposed quasi-linear kernel. At last, an example modeling a hydraulic robot actuator is carried out for a general comparison.

In the nonlinear parameter estimation procedure, affinity propagation (AP) clustering algorithm [32] is utilized to partition the input space and automatically generate the size of clusters in terms of data distribution, where Euclidean distance is evaluated as the similarity between exemplars. Then, centers of all clusters are selected as the RBF center parameters in the quasi-ARX model, and the width parameter of a certain RBF is decided as the mean value of distances from the associated center to the nearest two others. For the linear parameter estimation, LibSVM toolbox [33] is applied, where -SVR is used with default setting by Matlab 7.6. Finally, the model performance is evaluated by root mean square error (RMSE) as where is the prediction value of the system output and is the number of regression vectors.
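For reference, the RMSE criterion can be computed as in the following sketch, which assumes the usual definition (square root of the mean squared difference between measured and predicted outputs); the function name is hypothetical.

```python
import math

def rmse(y_true, y_pred):
    """Root mean square error between measured and predicted outputs."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((a - b) ** 2 for a, b in zip(y_true, y_pred))
    return math.sqrt(squared / len(y_true))
```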

##### 4.1. Modeling the Mackey-Glass Time Series

The time series prediction on the chaotic Mackey-Glass differential equation is one of the most famous benchmarks for comparing the learning and generalization abilities of different models. This time series is generated from the following equation: where , , and , which are the most often used values in the previous research, and the equation does show chaotic behavior with them. To make the comparisons fair with the earlier works, we will predict using the input variables , , , and . Two thousand data points are generated with initial condition taken as for based on the fourth-order Runge–Kutta method with time step . Then, one thousand input-output data pairs are selected from to , which is shown in Figure 3. The first data pairs are used as training data, while the remaining are used to predict followed by withwhere .
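A series of this kind can be generated with the sketch below. It assumes the standard form of the Mackey-Glass equation, $\dot{x}(t) = a\,x(t-\tau)/(1 + x^{10}(t-\tau)) - b\,x(t)$, with the commonly quoted benchmark values $a = 0.2$, $b = 0.1$, $\tau = 17$, a step of $0.1$, and a constant initial history of $1.2$. These defaults are assumptions taken from the usual benchmark setup, not values confirmed by the text above, and the delayed term is held constant within each Runge-Kutta step as a simplification.

```python
def mackey_glass(n_steps, a=0.2, b=0.1, tau=17.0, dt=0.1, x0=1.2):
    """Integrate dx/dt = a*x(t-tau)/(1 + x(t-tau)^10) - b*x(t) with fourth-order Runge-Kutta."""
    delay = int(round(tau / dt))
    history = [x0] * (delay + 1)  # assumed constant history x(t) = x0 for t <= 0

    def deriv(x, x_tau):
        return a * x_tau / (1.0 + x_tau ** 10) - b * x

    for _ in range(n_steps):
        x, x_tau = history[-1], history[-1 - delay]
        # delayed value x_tau is frozen over the step (simplifying assumption)
        k1 = deriv(x, x_tau)
        k2 = deriv(x + 0.5 * dt * k1, x_tau)
        k3 = deriv(x + 0.5 * dt * k2, x_tau)
        k4 = deriv(x + dt * k3, x_tau)
        history.append(x + dt * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0)
    return history[delay:]  # samples x(0), x(dt), ..., x(n_steps * dt)
```

Sampling the returned list every `round(1 / dt)` entries gives the series at integer time instants, from which delayed input variables and prediction targets can be assembled.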

The prediction of the Mackey-Glass time series using a quasi-ARX RBFN model starts, where clusters are obtained from the AP clustering algorithm, and thus RBF neurons are correspondingly constructed. Thereafter, SVR is used for linear parameter estimation, in which the super-parameter is set as . The predicted result is compared with the original time series of test data in Figure 4, which gives a RMSE of .

In Figure 4, the predicted result fits the original data very well; however, it is still not as good as the results from some well-known models/methods listed in Table 1. Since no disturbance is contained in this example, it is found that the prediction performance can be easily improved by minimizing the training prediction error. In the comparison list, SNPOM for RBF-AR model, hybrid learning method for local linear wavelet neural network (LLWNN), and genetic algorithm (GA) for RBFN are all optimization-based identification methods, and it is relatively easy for them to achieve small RMSEs of the prediction by iterative training. However, these methods are much more time-consuming in comparison with only seconds by the proposed method for the quasi-ARX RBFN model. In addition, although the $k$-means clustering method for RBFN is implemented in a deterministic way and shows an efficient result, the number of RBF neurons used is as large as 238. In fact, a small prediction RMSE obtained from these methods does not necessarily mean good identification of the models, since overtraining may sometimes happen.

In the present example, we confirm the effectiveness of the optimization-based method given above and propose a hybrid approach for identification of the quasi-ARX RBFN model, where the prediction result from the proposed method can be further improved by SNPOM (the function “lsqnonlin” in the Matlab Optimization Toolbox is used [9]). It is seen that the prediction RMSE can be improved to by only iterations of implementation in SNPOM, and the result becomes comparable with the others. However, such optimization is not always effective, especially in model simulations on testing data, such as in the model , where is the prediction value of . In the following, a rational system is evaluated by simulated quasi-ARX RBFN models to show advantages of the proposed method.

##### 4.2. Modeling a Rational System

Accurate identification of nonlinear systems usually requires quite long training sequences which contain a sufficient amount of data from the whole operating region. However, as the amount of data is often limited in practice, it is important to study the identification performance for shorter training sequences with a limited amount of data. The system under study is a nonlinear rational model described aswhere and is the white noise.

Difficulty of this example lies in the fact that only samples are provided for training, which is created by random sequences distributed uniformly in the interval , while testing data samples are generated from the system with input:The excited training signal and system output are illustrated in Figure 5.

In this case, clusters are automatically obtained from the AP clustering algorithm; then the nonlinear parameters and of the quasi-ARX RBFN model are estimated as Section 3 described. SVR is utilized thereafter for linear parameter estimation, where super-parameters are set with different values for testing. Following the training, the simulated model is with where and denotes the simulated result in the previous step. Figure 6 simulates the quasi-ARX RBFN model on the testing data, which gives a RMSE of under the super-parameter .
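The difference between one-step prediction and simulation is that, in simulation, past outputs in the regression vector are replaced by the model's own previous outputs. The following sketch illustrates this feedback loop under assumed conventions: the regressor is ordered as past outputs followed by past inputs, values before the first sample are set to `y_init` and zero, and `predict` stands for any identified one-step model. All names are hypothetical.

```python
def simulate(predict, inputs, n_y, n_u, y_init=0.0):
    """Free-run simulation: feed the model's own past outputs back into the regressor."""
    y_hat = []
    for t in range(len(inputs)):
        # past simulated outputs y_hat(t-1), ..., y_hat(t-n_y)
        past_y = [y_hat[t - i] if t - i >= 0 else y_init for i in range(1, n_y + 1)]
        # past measured inputs u(t-1), ..., u(t-n_u)
        past_u = [inputs[t - j] if t - j >= 0 else 0.0 for j in range(1, n_u + 1)]
        y_hat.append(predict(past_y + past_u))
    return y_hat
```

For example, `simulate(lambda phi: 0.5 * phi[0] + phi[1], [1.0, 1.0, 1.0, 1.0], 1, 1)` propagates a first-order linear model driven by a unit step. Because errors accumulate through the feedback, simulation is a harder test of a model than one-step prediction.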

Since identification of the quasi-ARX RBFN model can be regarded as an SVR with quasi-linear kernel, a general comparison is given to show the advantages of the quasi-ARX RBFN model among SVR-based identification methods. Not only the short training sequence but also a long sequence with pairs of samples, which is generated and implemented in the same manner as the short one, is used for comparison. Table 2 presents the comparison results of the proposed method (i.e., SVR with quasi-linear kernel), SVR with linear kernel, SVR with Gaussian kernel, and quasi-ARX model identified directly by an SVR (Q-ARX SVR), where various choices of SVR super-parameters and for Gaussian kernel are provided. From the simulation results under a short training sequence ( samples), it is seen that when the design parameters are optimized, SVR with quasi-linear kernel performs much better than the ones with Gaussian kernel and linear kernel, and the quasi-linear kernel is also only slightly sensitive to the SVR super-parameter setting. Moreover, although the Q-ARX SVR method utilizes the quasi-ARX model structure, it only provides a simulation RMSE similar to that of SVR with Gaussian kernel. However, these simulation results cannot be used to refute the effectiveness of the SVR with Gaussian kernel and the Q-ARX SVR method for nonlinear system identification. In the simulations, for a long training sequence ( samples), it is found that the Q-ARX SVR method outperforms all the others, and SVR with Gaussian kernel also performs much better than the ones with quasi-linear and linear kernel.

On the other hand, from the perspective of the performance variation caused by different training sequences, histograms of the simulated errors for the SVR-based methods are given in Figure 7, where the simulation performance is illustrated using the short training sequence and the long training sequence, respectively. It indicates that the SVR with linear kernel is the most robust to the amount of training data, and the quasi-linear kernel is also robust compared with the Gaussian kernel and the Q-ARX SVR method, for which significant deterioration is found in the simulations when a limited number of training samples is used. This result implies that the Gaussian kernel and Q-ARX SVR may overfit, since their implicit nonlinear mapping has strong nonlinear learning ability but provides no indication of how strong the nonlinearity needs to be. In contrast, the reason behind the impressive and robust performance of the quasi-linear kernel is that prior knowledge is utilized in the kernel learning (nonlinear parameter estimation), and a number of parameters are determined in terms of the data distribution, where the complexity of the model (nonlinearity) is tunable according to the number of local linear subspaces clustered. In other words, the quasi-ARX RBFN model performs in a local linear way; hence it can be trained in a multilinear way, which is better than some unknown nonlinear approaches in situations with insufficient training samples.

Moreover, the RBF-AR model is utilized with the SNPOM estimation method for this identification problem, where the number of RBF neurons is determined by trial-and-error, and their initial values are given randomly. Considering the randomness of the algorithm, ten runs are performed, excluding those for which the simulation fails, and the maximum iterations value in SNPOM is set to . Consequently, four RBFs are selected for the RBF-AR model, which gives a mean RMSE of using the short training sequence, compared with the result of when the long training one is utilized. Although the parameter setting for this method may not be optimal, we can draw the same conclusion as for the Q-ARX SVR method, which is overfitted in the case of training by the short sequence.

##### 4.3. Modeling a Real System

This is an example modeling a hydraulic robot actuator, where the position of the robot arm is controlled by a hydraulic actuator. The oil pressure in the actuator is controlled by the size of the valve opening through which the oil flows into the actuator. What we want to model is the dynamic relationship between the position of the valve and the oil pressure .

A sample of pairs of has been observed as shown in Figure 8. The data is divided into two equal parts, the first samples are used as training data, and the rest are used to test the simulated model. For the purpose of comparison, the regression vector is set as . We simulate the quasi-ARX RBFN model on the testing data by with where and is set as heuristically due to the complex dynamics and data distribution in this case, which ensures that the RBFs are wide enough to cover the whole space well. A similar setting of can also be found in the literature for the same purpose [34, 35].

To determine the nonlinear parameters of the quasi-ARX RBFN model, AP clustering algorithm is implemented, and clusters are generated automatically. Then, SVR is utilized for the linear parameter estimation. Finally, the model is identified and simulated in Figure 9 by the testing data, which gives a RMSE of . This simulation result is compared with the ones of linear ARX model, NN, WN, and SVR-based methods shown in Table 3. From Table 3, it is known that the proposed method outperforms the others for the real system. In addition, RBF-AR model with SNPOM estimation method fails to be simulated in this case, where the number of RBF neurons is tested from to , and their initial values are given randomly.

#### 5. Discussions and Conclusions

The proposed method has a twofold role in the quasi-ARX model identification. For one thing, the clustering method has been used to uncover the local linear information of the dataset. Although similar methods have appeared in the parameter estimation of RBFNs, a meaningful interpretation has been given here to the nonlinear parameters of the quasi-ARX model in the manner of a multilocal linear model with interpolations. In fact, explicit local linearity does not always exist in real problems, whereas clustering can provide at least a rational multidimensional space partition approach. In the future, a more accurate and general space partition algorithm is to be investigated for identification of quasi-ARX models. For another, SVR has been utilized for the model’s linear parameter estimation; meanwhile, a quasi-linear kernel is deduced and acts as a composite kernel. The parameter $M$ in the kernel function (19) corresponds to the number of subspaces partitioned, and it is therefore preferably not set to a large value in order to cope with potential overfitting.

In this paper, a two-step learning approach has been proposed for identification of the quasi-ARX model. Unlike the conventional black-box identification approaches, prior knowledge is introduced and contributes to the interpretability of quasi-ARX models. The linear parameters of the model are estimated by minimizing the training data error. In the simulations, the quasi-ARX model is expressed in the form of an SVR with quasi-linear kernel, which shows approximation ability comparable to that of optimization-based methods for quasi-ARX models but outperforms them when the training sequence is limited. Finally, the best performance of the proposed method has been demonstrated with a real system identification problem.

#### Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

#### Acknowledgments

This work was supported by the National Natural Science Foundation of China under Grants 81320108018 and 31570943 and the Six Talent Peaks Project for the High Level Personnel from the Jiangsu Province of China under Grant 2015-DZXX-003.