Abstract

Choosing an accurate model order is one of the important steps in system identification. Traditionally, the model order selection of a nonlinear system depends on a predetermined model. However, this requires excessive calculation, and once a specific model has been chosen, the burden of designing its structure cannot be avoided. A false nearest neighbor (FNN) algorithm that relies only on input-output data to estimate the model order is proposed here. Because the FNN algorithm is sensitive to its threshold, a crucial constant for evaluating the model structure, Gaussian mixture model (GMM) clustering based on a genetic version of the expectation-maximization (EM) algorithm and the minimum description length (MDL) criterion is developed in this paper, so that the order can be determined without relying on a specific model. GMM clustering is used to calculate the threshold of the FNN algorithm. The genetic algorithm and the MDL criterion are then embedded to optimize the EM computation, which both reduces the influence of initial values and makes the algorithm less prone to falling into local extrema. Three examples are given to indicate the superiority of this technique: the simulation of a strongly nonlinear system, an isothermal polymerization process, and the Van der Vusse reaction from the relevant literature. Finally, several typical modeling methods are applied to confirm the validity of this approach.

1. Introduction

Employing input-output data to drive nonlinear models has attracted great interest. Artificial neural networks (ANNs), support vector machines (SVMs), fuzzy systems, and other empirical methods based on data or rules have been widely utilized [1-4]. These methods all belong to the same modeling framework, the Nonlinear Auto-Regressive model with eXogenous inputs (NARX). To build such a structure, the "dimensions" and other required information must be specified before the model can be assembled. The NARX construction process involves many inherent parameters, such as the number of neurons in a neural network, the order of the difference equation, and the form of the nonlinearity, all of which have a considerable impact on the capability of the resulting model [5]. However, the model will overfit, and a great deal of computation time will be consumed, once the dimension of the input variables grows beyond a certain range. The search space also becomes large, which increases the complexity of structure detection once the maximum degree of nonlinearity is raised. For example, a polynomial NARX model with its maximum degree of nonlinearity and maximum input/output lags of 10 yields a search space of more than 30 million candidate models [6]. Hence, seeking the appropriate dimension of the input data is the first step. The input dimension of a NARX model usually corresponds to the true order of the controlled system, which is very hard to identify for a nonlinear system in practice. In response to this issue, previous scholars have presented some effective methods, which fall mainly into two types: model-driven and model-free.

1.1. Model-Driven

In most data-driven identification algorithms, the model structure is presumed to be known, and parameter identification is implemented on the basis of this predetermined model. In other words, the structure and parameters of the model are obtained simultaneously. For the Auto-Regressive (AR) model, [7] treated the system as bilinear and proposed an auxiliary model-based least-squares iteration (AM-LSI) algorithm and a hierarchical auxiliary model-based least-squares iteration (H-AM-LSI) algorithm to estimate the parameters for modeling purposes. For systems with heavy-tailed noise, [8] used an RBF network to match the nonlinear blocks of a Hammerstein model and added the cuckoo search, a heuristic algorithm, to the identification process, which reduced the influence of heavy-tailed noise on model identification. A new Wiener-type recurrent neural network with the MDL principle was designed by [9] for identifying unknown dynamic nonlinear systems, converting the problem of Wiener model identification into the identification of RNN structural parameters; constructing and evaluating models through the MDL criterion is then highly convenient. However, model order selection for nonlinear systems along these lines requires excessive computation and diverse model designs, which wastes resources. Therefore, we introduce a false nearest neighbor algorithm that relies only on input-output data to estimate the model order.

1.2. Model-Free

There is relatively little research on model-free structure selection for nonlinear systems; the false nearest neighbor (FNN) algorithm is the most typical method for this task. It was originally applied to determine the minimum embedding dimension of chaotic time series [10], and FNN-based model order selection from data was first introduced by [11]. The structure is determined by calculating the percentage of false neighbors (i.e., vectors that violate the assumptions above), but the user must specify an appropriate threshold constant for the scheme. Unfortunately, a fixed threshold is not suitable across different time series; it depends on the particular system being studied. In this regard, [12] developed a technique for approximating the threshold constants by Gath-Geva fuzzy clustering. However, that clustering algorithm inevitably increases computational complexity, so combining model parameter estimation with the selection of the number of cluster components is expected in subsequent research.

In order to obtain a concise, interpretable model that does not depend on transformed input variables, a new clustering algorithm is established here based on expectation-maximization (EM) estimation of a Gaussian mixture of models. An important model for statistical machine learning, pattern recognition, and array data analysis, Gaussian mixture model (GMM) clustering has been widely utilized in system identification and modeling [13], process monitoring [14], and fault diagnosis [15, 16]. At present, several powerful information-theoretic criteria provide a basis for component selection in input-output models. The prevalent criteria include the final prediction error (FPE), Akaike's information criterion (AIC), the Bayesian information criterion (BIC), MDL, and the minimum message length (MML), all of which can be used to evaluate the structure of a model. Determining the components of a model with these tools is a fairly simple task [17]. The MDL criterion developed by [18] can produce a consistent estimate of the structure of a linear dynamic model. In [19, 20], these methods are applied to component selection in Wiener neural network models and Hammerstein recurrent neural networks. Although these tools do not rest on complicated theory, they are convenient and effective for most models. Some applications and improvements can be seen in [21, 22].

Another issue is whether the estimated parameters of a candidate model correctly match the corresponding data. The EM algorithm furnishes an iterative procedure for maximizing posterior densities or likelihood functions. It increases the posterior probability at each step; nevertheless, convergence to the unique global optimum among multiple local optima cannot be guaranteed, so changing the initial parameter values of the EM algorithm may lead to completely different results. A number of practical remedies exist, including split-and-merge operations for EM, deterministic annealing, and the Bayesian Ying-Yang technique [23-25]. The genetic algorithm, a typical evolutionary algorithm, was introduced into the EM algorithm by [26], yielding a GA-EM version. The method attempts to improve future results through the "knowledge" of past iterations. Research on and applications of GA-EM have gradually become popular [27]. The MDL criterion has also been embedded in the GA evolution so that the algorithm can find the optimal number of components and determine the parameters of the mixture components simultaneously. An improved EM algorithm based on a standard GA was proposed by [28] for bridge damage detection; that work highlights robustness in finding the optimal number of GMM clusters and their parameters and in enhancing the performance of damage classification. In [29], a hybrid genetic algorithm combined with a variant EM (GA-VEM) was exploited to greatly improve the performance of brain MR image segmentation.

Therefore, we introduce a false nearest neighbor algorithm that relies only on input-output data to estimate the model order. Because the FNN algorithm is sensitive to its threshold, a crucial constant for evaluating the model structure, Gaussian mixture model clustering based on a genetic version of the expectation-maximization algorithm and the minimum description length criterion is developed in this paper. The GMM clustering is used to calculate the thresholds of the FNN algorithm, and the genetic algorithm and the MDL criterion are embedded to optimize the computation. The advantages of this approach are that (1) the order can be determined without relying on a specific model; (2) the main drawback of the FNN algorithm is avoided by GMM clustering; and (3) the genetic algorithm optimizes the EM computation, reducing the influence of initial values and making the algorithm less prone to falling into local extrema. It should be kept in mind that this paper aims to provide beneficial guidance for choosing a tentative model and decomposing this complex problem: once the order of the nonlinear model is selected precisely, the subsequent selection of parameters and structure is greatly facilitated.

This paper is organized as follows: Section 2 states the geometric concepts and principles of the FNN algorithm. Section 3 shows how the FNN algorithm is improved by Gaussian mixture model clustering and explains the detailed derivation of the clustering covariance matrix. Section 4 illustrates the details and workflow of the GA-EM algorithm according to the theory of GA and MDL. Section 5 demonstrates the feasibility of the developed approach on three nonlinear simulation processes. Section 6 concludes the paper.

2. False Nearest Neighbors Algorithm

In this section, we introduce the NARX framework and the FNN algorithm and describe how to build a NARX-type model based on the geometric idea of the FNN algorithm.

Most nonlinear systems can be represented by the NARX framework. The input of this framework is expressed as the regression vector

x(k) = [y(k - 1), ..., y(k - m), u(k - 1), ..., u(k - n)], (1)

where y(k - 1), ..., y(k - m) are the system outputs over the m time steps before k, and u(k - 1), ..., u(k - n) are the system inputs over the n time steps before k. A nonlinear regression function f is involved here to associate x(k) with the output:

y(k) = f(x(k)), (2)

where f can be a polynomial, a neural network, or any other nonlinear mapping. The numbers of past outputs and inputs are represented by m and n, respectively, and their values are usually called the model orders. It is assumed here that all the input information we need to construct the model is contained in x. In addition, there are unmeasurable input noise and observation noise. Therefore, our task is to choose the best m and n to construct a NARX-type model, as shown in Figure 1.
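As a concrete illustration of the regressor construction above, the following minimal sketch builds the lagged NARX input matrix and the matching target vector from raw input-output sequences (the function name `build_regressors` is ours, not from the paper):

```python
import numpy as np

def build_regressors(y, u, m, n):
    """Build the NARX regression vectors x(k) = [y(k-1..k-m), u(k-1..k-n)]
    and the matching targets y(k) for all k with enough history."""
    start = max(m, n)            # first index with m past outputs and n past inputs
    X, Y = [], []
    for k in range(start, len(y)):
        past_y = [y[k - i] for i in range(1, m + 1)]   # y(k-1), ..., y(k-m)
        past_u = [u[j] for j in range(k - 1, k - n - 1, -1)]  # u(k-1), ..., u(k-n)
        X.append(past_y + past_u)
        Y.append(y[k])
    return np.asarray(X, dtype=float), np.asarray(Y, dtype=float)
```

Each row of X then lives in the (m + n)-dimensional regression space in which the FNN algorithm of the next section measures distances.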

In view of this task, we introduce the FNN algorithm, which is based on a geometric idea: if the dimension of the input vector is large enough, then input points that are close in the regression space will remain relatively close in the future output. For two arbitrary regression vectors x(i) and x(j) that are close in a regression space embedded in the proper dimensions, the relationship between their outputs can be expressed as

y(i) - y(j) = J(x(j)) [x(i) - x(j)] + o(||x(i) - x(j)||), (3)

where J(x(j)) is the matrix of partial derivatives (the Jacobian) of f at x(j), and o(||x(i) - x(j)||) denotes a quantity that approaches zero much faster than the distance between x(i) and x(j) as x(i) approaches x(j); this is the first-order Taylor approximation of f around x(j). Since the two regression vectors are very close, the higher-order portion can be discarded, and the Cauchy-Schwarz inequality can be written as

|y(i) - y(j)| <= ||J(x(j))|| ||x(i) - x(j)||. (4)

If |y(i) - y(j)| / ||x(i) - x(j)|| accords with the above expression, i.e., does not exceed the threshold, the vector is considered a true neighbor; otherwise, the two points are false neighbors.

In view of this theory, the following is an outline of the FNN algorithm: (1) For a certain data vector x(k) and its nearest neighbor x(j) in the regression space, compute the distance between them, d = ||x(k) - x(j)||. (2) Determine whether they are true neighbors of each other by evaluating |y(k) - y(j)| / ||x(k) - x(j)||. If this expression is less than R, the data point is saved as a true neighbor; otherwise, it is a false neighbor. Here, R is the predetermined threshold constant. (3) Execute the procedure for all data points. (4) Calculate the percentage of false nearest neighbors. (5) Increase m and n until the percentage of FNN drops to an acceptably small value.
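The outline above can be sketched in a few lines; this is a plain fixed-threshold version (the refined, clustering-based threshold comes in Section 3), and the function name `fnn_percentage` is ours:

```python
import numpy as np

def fnn_percentage(X, Y, R):
    """Fraction of points whose nearest neighbour in the regression space
    violates |y(i) - y(j)| / ||x(i) - x(j)|| <= R, i.e. is a false neighbour."""
    N = len(X)
    false_count = 0
    for i in range(N):
        d = np.linalg.norm(X - X[i], axis=1)
        d[i] = np.inf                       # exclude the point itself
        j = int(np.argmin(d))               # index of the nearest neighbour
        ratio = abs(Y[i] - Y[j]) / max(d[j], 1e-12)
        if ratio > R:
            false_count += 1
    return false_count / N
```

Sweeping (m, n) and watching where this percentage collapses toward zero is exactly steps (3)-(5) of the outline.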

However, this is only an idealized solution. As [10] noted, the accuracy of the FNN algorithm directly depends on whether the selection of the threshold R is reasonable. It is generally selected from an empirical interval. Yet it is worth noting that no invariable threshold can be applied to all data points. In this case, a more refined method is to estimate R in equation (4) from the maximum value of the Jacobian, as developed by [11].

However, this approach involves the Jacobians of an identified model, whose computation may yield inaccurate values, and the estimation of the model order may then be corrupted by the model's inherent structure and parameters. Thus, the design of the model used to calculate the Jacobian of a nonlinear system must be considered carefully. To improve the robustness of the FNN algorithm, an approach based on GMM clustering is presented in the following section.

3. GMM Clustering Based FNN

In this section, we introduce GMM clustering of data and typical EM algorithms and describe the estimation of R thresholds. And, the FNN algorithm is improved by the GMM clustering and the detailed derivation process of the clustering covariance matrix is explained.

When the available input-output data are approximately concentrated in the joint space of the regressors and the system output, and the proper regressors are employed, the collected clusters can be viewed as the regression surface of the model. In this condition, each cluster can be approximated as a local linear patch of the system and can contribute to estimating R, as depicted in Figure 2(a).

3.1. GMM Clustering for the Data

The GMM is a mixture density model built from Gaussian distributions; it is the weighted average of multiple Gaussian probability density functions, and each Gaussian density function is called a component. When the number of components is large enough, the Gaussian mixture model can approximate arbitrary continuous distributions with high accuracy. Therefore, it characterizes the spatial distribution of the data and its features well.

First, we define a data matrix Z, composed of the regression matrix x and the output vector y, whose rows are z_k = [x(k)^T, y(k)]. Since each row of Z contains n + m + 1 elements, the column vector satisfies z_k in R^(n+m+1). Z is divided into c clusters by GMM clustering, and the posterior probability matrix is calculated, whose element is the posterior probability that z_k is generated by the ith Gaussian mixture submodel, i = 1, ..., c.

Definition 1. In the sample space, a Gaussian mixture model refers to a probability distribution of the following form:

p(z) = sum_{i=1..c} alpha_i N(z | v_i, F_i),

where N(z | v_i, F_i) is a Gaussian density, v_i is the mean vector of the ith cluster, F_i is the covariance matrix of the ith cluster, and alpha_i is the weight of each component, called the mixing weight, which satisfies

sum_{i=1..c} alpha_i = 1, alpha_i >= 0.

Each component density is a normal probability distribution and can be written as

N(z | v_i, F_i) = (2*pi)^(-(n+m+1)/2) |F_i|^(-1/2) exp[-(1/2)(z - v_i)^T F_i^(-1) (z - v_i)].

Using maximum likelihood (ML) theory to acquire the parameters of the GMM, the log-likelihood expression is given by

log L = sum_{k=1..N} log sum_{i=1..c} alpha_i N(z_k | v_i, F_i).

The estimation procedure for the GMM parameters with the EM algorithm is listed below and is denoted as Algorithm 1.

(i) Initialization. Choose the number of components c, the maximum number of iterations iter_max, the tolerance epsilon, and initial values of the parameters (alpha_i, v_i, F_i), i = 1, ..., c
(ii) Repeat:
(iii) while the increase of the log-likelihood exceeds epsilon and the number of iterations is below iter_max
(iv) for k = 1, ..., N (E-step)
  Compute the posterior probability generated by each component: p(i | z_k) = alpha_i N(z_k | v_i, F_i) / sum_{j=1..c} alpha_j N(z_k | v_j, F_j)
 end for
(v) for i = 1, ..., c (M-step)
  Mean vector: v_i = sum_k p(i | z_k) z_k / sum_k p(i | z_k)
  Covariance matrix: F_i = sum_k p(i | z_k) (z_k - v_i)(z_k - v_i)^T / sum_k p(i | z_k)
  Mixing coefficient: alpha_i = (1/N) sum_k p(i | z_k)
(vi) end for
 Update the parameters and recompute the log-likelihood
(vii) end while
(viii) The clusters are divided according to the posterior probabilities p(i | z_k), and the completed covariance matrices F_i are obtained
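A compact numerical sketch of Algorithm 1 is given below. It is a minimal full-covariance EM in NumPy, not the paper's implementation; regularization constants and the initialization scheme are our assumptions:

```python
import numpy as np

def em_gmm(Z, c, iters=100, seed=0):
    """Minimal EM for a c-component full-covariance Gaussian mixture on rows of Z.
    Returns mixing weights, means, covariances, and posterior probabilities."""
    rng = np.random.default_rng(seed)
    N, d = Z.shape
    w = np.full(c, 1.0 / c)                         # mixing weights alpha_i
    mu = Z[rng.choice(N, c, replace=False)]         # means initialized at samples
    cov = np.array([np.cov(Z.T) + 1e-6 * np.eye(d) for _ in range(c)])
    for _ in range(iters):
        # E-step: posterior probability of each component for each sample
        resp = np.empty((N, c))
        for i in range(c):
            diff = Z - mu[i]
            inv = np.linalg.inv(cov[i])
            expo = -0.5 * np.sum(diff @ inv * diff, axis=1)
            norm = np.sqrt(((2 * np.pi) ** d) * np.linalg.det(cov[i]))
            resp[:, i] = w[i] * np.exp(expo) / norm
        resp = np.maximum(resp, 1e-300)             # numerical floor
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: update weights, means, and covariances
        Nk = resp.sum(axis=0)
        w = Nk / N
        mu = (resp.T @ Z) / Nk[:, None]
        for i in range(c):
            diff = Z - mu[i]
            cov[i] = (resp[:, i, None] * diff).T @ diff / Nk[i] + 1e-6 * np.eye(d)
    return w, mu, cov, resp
```

The returned covariance matrices F_i are exactly what the next subsection eigendecomposes to estimate the threshold R.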

3.2. Estimation of R Threshold

As shown in Figure 2(a), the collection of c clusters can be approximately seen as several regression surfaces. Therefore, each cluster can be approximated by a local linear subspace depicted by a cluster ellipsoid, as shown in Figure 2(b). According to the characteristics of the data, the smallest eigenvalue of the covariance matrix of a given cluster is usually smaller than the remaining eigenvalues by orders of magnitude.

In the same cluster, the eigenvector belonging to the smallest eigenvalue is regarded as the normal vector of the hyperplane spanned by the remaining eigenvectors [12].

The mean vector v_i is written in the same format as the observation vector z and is partitioned into two parts: a vector v_i^x corresponding to x(k), and a scalar v_i^y corresponding to y(k). The smallest eigenvector s_i is partitioned in the same pattern, s_i = [(s_i^x)^T, s_i^y]^T. The hyperplane equation (11), s_i^T (z - v_i) = 0, can then be partitioned as follows:

(s_i^x)^T (x(k) - v_i^x) + s_i^y (y(k) - v_i^y) = 0. (12)

After the analysis of the above linear hyperplane, equation (12) is further rewritten in the form of a linear equation:

y(k) = -(s_i^x / s_i^y)^T x(k) + (1 / s_i^y) s_i^T v_i. (13)

In this way, the local model parameters are estimated from the partial derivatives of (13), calculated from the shape of the clusters:

theta_i = dy/dx = -s_i^x / s_i^y.

The threshold can then be computed as

R(k) = ||theta_i||,

where the ith cluster is the one with the largest posterior probability for the data point z_k.

Although this method extracts the parameters from the regression space based on geometric theory, it can be shown that equation (13) is still a weighted total least-squares estimate of the parameters. Through this linear transformation, the threshold of the FNN algorithm changes from a constant to an adaptive function that varies with the data point: based on the results of GMM clustering, a different value of R is calculated for each input-output data point.
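The eigenvector manipulation described above can be sketched as follows; the helper name `cluster_thresholds` is ours, and the input is the list of cluster covariance matrices of the joint data z = [x, y]:

```python
import numpy as np

def cluster_thresholds(covs):
    """For each cluster covariance F of the joint data z = [x, y], take the
    eigenvector of the smallest eigenvalue as the normal of the local
    hyperplane, split it into (s_x, s_y), and read off the local gradient
    theta = -s_x / s_y.  The cluster threshold R is the norm of theta."""
    thresholds = []
    for F in covs:
        vals, vecs = np.linalg.eigh(F)    # eigenvalues in ascending order
        s = vecs[:, 0]                    # eigenvector of the smallest eigenvalue
        s_x, s_y = s[:-1], s[-1]          # split like z = [x, y]
        theta = -s_x / s_y                # local linear-model coefficients
        thresholds.append(float(np.linalg.norm(theta)))
    return np.asarray(thresholds)
```

Note the sign ambiguity of eigenvectors cancels in theta, but a near-zero s_y (a cluster hyperplane almost parallel to the y axis) would need regularizing in practice.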

4. The Genetic Version of EM Algorithm

In this section, the GA and MDL algorithms are introduced. Then, the GA-EM algorithm is constructed based on the theory of the GA and the MDL. Meanwhile, the details and workflow of the GA-EM algorithm are described.

4.1. Genetic Algorithm

The Genetic Algorithm imitates natural evolution. Under the fitness function (the natural environment), the fittest individuals of the initial population are selected. Breeding a new generation of individuals involves recombination (the crossover operation) and mutation. This genetic mechanism guarantees that the population evolves in a better direction, while the diversity of the population avoids local convergence. Reproduction continues until the optimal solution is reached. Finally, a mixture model with the optimal number of components is selected according to the model selection criterion [30]. The specific steps are as follows.

4.1.1. Encoding

As shown in Figure 3(a), each individual in the population consists of three parts. The first part (Part A) is a binary encoding in which the total number of bits equals the user-defined maximum number of components Mmax. Each bit represents one particular Gaussian component of the model. If a bit is 0, its corresponding component is omitted from the mixture; if it is set to 1, the component is responsible for some data points in the mixture. The second part (Part B) uses floating-point encoding to record the weights of the Mmax components. Note that the weights may need to be reset to a uniform distribution, except for the best individual (the elitist), because components are switched among individuals during evolution; the principle is to keep the weights as long as possible. Similarly, the third part (Part C) uses the same encoding to record the means and covariances of the Mmax components.

4.1.2. Recombination

In the crossover operation, two parent individuals are randomly selected from the current population and recombined to generate two offspring. A total of H children (a multiple of 2, smaller than the population size) are generated in this step. Through single-point crossover, a crossover position within Part A of the individual is randomly chosen, and the genes to the right of that position are exchanged between the two selected parents (see Figure 3(b)). Part B of the offspring is reset to a uniform distribution, and Part C is exchanged correspondingly.
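The Part A exchange can be sketched on the binary component-activation strings alone (a hypothetical helper; in the full algorithm Parts B and C travel with the corresponding bits):

```python
import random

def single_point_crossover(a, b, rng=random.Random(0)):
    """Exchange the bits to the right of a random cut point between two
    binary component-activation strings (Part A of two parent individuals)."""
    assert len(a) == len(b)
    cut = rng.randrange(1, len(a))   # cut falls strictly inside the string
    child1 = a[:cut] + b[cut:]
    child2 = b[:cut] + a[cut:]
    return child1, child2
```

Because only bits switch parents, the total number of active components across the two children equals that across the two parents.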

4.1.3. Selection

The (T, H)-strategy is a common method for retaining elite offspring in GA and is applied here. The method involves a parent population of T individuals and an offspring population of H individuals. From the two populations together, the T best individuals are chosen to form the next generation.

4.1.4. Enforced Mutation

The aim of the mutation operation is to ensure population diversity and consequently avoid the local convergence that crossover alone may produce. The correlation coefficient between components p and q is calculated pairwise from their posterior probabilities; it serves as an index of the similarity of two components. If the correlation coefficient is higher than a preset threshold, one of the two components is randomly chosen and added to the candidate set for mutation. Once the candidate set for enforced mutation has been established, a binary value is drawn from a uniform distribution for each candidate component. According to its value, each candidate component is then either deleted by resetting the corresponding bit in Part A of the individual, or its mean is set to a randomly selected data point.
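The candidate-selection step can be sketched as below, assuming `resp` is the N-by-c posterior probability matrix of an individual's mixture (the function name and the tie-breaking details are our assumptions):

```python
import numpy as np

def mutation_candidates(resp, t_corr, rng=np.random.default_rng(0)):
    """Enforced-mutation candidates: for every component pair (p, q) whose
    posterior-probability columns correlate above t_corr, randomly add one
    of the two components to the candidate set."""
    c = resp.shape[1]
    candidates = set()
    for p in range(c):
        for q in range(p + 1, c):
            corr = np.corrcoef(resp[:, p], resp[:, q])[0, 1]
            if corr > t_corr:                       # the two components overlap
                candidates.add(int(rng.choice([p, q])))
    return candidates
```

A high correlation means two components explain the same data points, so removing or re-seeding one of them frees model capacity without hurting the fit much.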

4.1.5. Mutation

The form of mutation used here is relatively simple: the value of an original gene in the individual's coding string is replaced with a random number according to the mutation probability constant pm. For Part C, the mutated genes of an individual are set to uniformly distributed random numbers sampled within the lower and upper limits. The mutation rate of the value encoding is reduced in proportion to L. Since GA-EM is elitist, no mutation needs to be applied to the best individual.

The uniform mutation operation is particularly suitable for the initial running stage of GA-EM. It allows the search points to move freely throughout the search space, thereby the algorithm can handle more patterns in a consequence of the increased diversity of the population.

4.2. Model Selection Criterion: MDL

The MDL criterion is one of the most commonly employed selection criteria; it was developed by [18] in 1978 while studying universal coding. The principle of MDL is to minimize the total code length of the model together with its residual. In model selection, MDL usually consists of two coding terms and is defined as follows:

MDL(Theta, K) = -log p(Z | Theta) + (L/2) log N,

where M and N are the dimension and number of observation vectors, respectively, and K is the number of GMM components. The number of free parameters L of a full-covariance GMM is

L = K (1 + M + M(M + 1)/2) - 1,

counting K - 1 independent mixing weights plus the mean and covariance entries of each component.

From the above formulas, it can be seen that MDL not only fits the estimated model parameters to the maximum likelihood under the existing sample conditions but also simultaneously penalizes the order of the generated model relative to the amount of training data. Therefore, the phenomenon of a model becoming excessively tuned to the data, i.e., overfitting, is avoided.
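The two-part score above reduces to simple arithmetic once the negative log-likelihood of a fitted mixture is known; a minimal sketch (function name ours):

```python
import numpy as np

def mdl_score(neg_log_lik, K, M, N):
    """Two-part MDL code length for a K-component full-covariance GMM on
    N observations of dimension M: data cost (negative log-likelihood)
    plus a penalty of (log N)/2 per free parameter."""
    # free parameters: K-1 mixing weights, K*M mean entries,
    # and K*M*(M+1)/2 covariance entries per the formula above
    L = (K - 1) + K * M + K * M * (M + 1) // 2
    return neg_log_lik + 0.5 * L * np.log(N)
```

Evaluating this score over a range of K and keeping the minimum is exactly how the criterion picks the number of components: adding a component only pays off if it buys more likelihood than its (log N)/2-per-parameter cost.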

4.3. The Procedure of GA-EM

The key purpose of combining GA with the EM algorithm is to exploit the strengths of both. In the genetic population, each individual is a candidate solution, i.e., a Gaussian mixture model. The MDL criterion introduced in the previous section serves as the fitness function of the GA. In this work, two measures are used in evaluating individuals. First, the EM algorithm is run Vmax times for each individual, where Vmax is the maximum number of iterations set by the user; the parameter set is updated with each execution of the EM algorithm until the relative change of the log-likelihood drops below a threshold epsilon. Second, the MDL value of each updated individual is computed to evaluate the model. The strength of GA-EM is that the offspring can inherit the best individual from the parent generation. This mechanism makes the algorithm elitist: the best individual in generation t + 1 is never worse than that in generation t. Therefore, the weight of the best individual should also be retained in the next generation along with its other parameters.

The evolutionary process of GA-EM is not terminated until the components of the optimal model no longer change over the next few generations; from practical experience, this generally takes no more than five generations. Once the relative log-likelihood of the mixture model has fallen below an appropriate threshold epsilon, the EM algorithm serves to refine the best individual amin found so far.

The procedure of the GA-EM algorithm is given in Algorithm 2, and the whole flow chart of this project is illustrated in Figure 4.

begin
 Initialize the population of T individuals (Parts A, B, and C) and run EM for Vmax iterations on each
 t <- 0, evaluate the MDL value of each individual, set amin to the best individual
 while the best individual has changed within the last few generations
  Recombination: generate H offspring by single-point crossover
  Run EM for Vmax iterations on each offspring
  Evaluate the MDL value of each offspring (fitness)
  Selection: keep the T best of the T + H individuals ((T, H)-strategy)
  Enforced mutation: mutate components whose posterior correlation exceeds the threshold
  Mutation: mutate genes with probability pm (the elitist is excluded)
  if the MDL value of the best individual is lower than that of amin then
   amin <- best individual
   reset the stagnation counter
  else
   increase the stagnation counter
  end if
  t <- t + 1
 end while
 Run GA-EM until the terminal condition of convergence is reached
end

5. Application

The designed model order selection algorithm based on GMM cluster analysis is illustrated by three nonlinear cases: a strongly nonlinear difference function with no physical object, an isothermal polymerization process, and a third-order Van der Vusse reactor from the literature. All three examples are typical nonlinear systems, so the differences between the results highlight the necessity of the proposed method. Finally, typical modeling methods are used to calculate the posterior error and analyze the accuracy of this work.

5.1. Simulated Function

The first case is a strongly nonlinear difference equation with true orders m = 2 and n = 2.

Owing to the theoretical characteristics of the FNN algorithm and experimental experience, a pseudorandom binary signal (PRBS), which is commonly selected as the excitation signal, cannot be applied in this work. When a small amount of identification data with different control signal sequences is generated by a PRBS, the data are approximately Gaussian, which means reliable derivative information cannot be accurately extracted from the clusters. As a result, the sensitivity of the nearest-neighbor proportion to the input order n is reduced. Thus, a PRBS is not suitable for generating identification data for the FNN method, and in this research the process was excited by a random amplitude signal. The data distribution for m = 0, n = 1 is described in Figure 5, together with the cluster division obtained by the MDL criterion, where GA is introduced in (b) but not in (a).

The input values in this example are drawn from a uniform distribution over 0.12-1.89; in addition, the candidate orders are selected in turn. To approach a more realistic working situation, Gaussian noise with zero mean and different variances is added to the observed variables of the system. The results are set out in Table 1. In this task, the values of R in Table 1(a) are all fixed at the constant 1; the eigenvector information of the clusters is computed by the EM algorithm in Table 1(b); and GA-EM is used in Table 1(c), where the parameters are K = 10, H = 4, R = 5, Mmax = 10, t = 0.95, and pm = 0.02.

It can be seen from Table 1 that the expected result cannot be discerned by directly using the fixed threshold R. When the genetic algorithm is not introduced, the FNN ratio for m = 1, n = 2 is 0.1, which is very close to the FNN ratio of 0.05 for m = 2, n = 2. This means the proposed method is not valid for this case; that is, the order information of the model cannot be identified. After adding the genetic algorithm, the influence of the initial values and local extrema of the EM algorithm is effectively reduced, and the order information of the model can be clearly identified. Moreover, the parameters of the GA-EM algorithm can be selected by the lowest MDL value, as compared in Figure 6 (the result is based on the data at m = 1, n = 0).

5.2. Isothermal Polymerization System

In this case, an isothermal polymerization reaction model is introduced; the control structure is shown in Figure 7. The reaction is a free-radical polymerization of methyl methacrylate (MMA) with azobisisobutyronitrile (AIBN) as the initiator and toluene as the solvent, carried out in a jacketed continuously stirred tank reactor. We simulate the model on a computer to generate the data. The model is given by a set of nonlinear differential equations.

At the temperature of 335 K, the steady-state operating points and model parameters are given in Tables 2 and 3. For a more detailed introduction, please refer to [31].

The number average molecular weight (NAMW) is the product, controlled by manipulating the inlet initiator flow rate FI. As shown in Table 3, NAMW is the process output y and FI is the process input u. To estimate the model order, 500 input-output data points are used as the raw data of the system (see Figure 8).

GMM clustering converts the nonlinear features of the discrete data into local linear combinations, so the linear feature information extracted from the clusters can be applied directly to estimate the working state and the order of the model. Because Example 1 has already shown that a constant threshold R is not feasible, that variant is not repeated here.

It can be seen from Tables 4 and 5 that introducing the genetic algorithm has no significant effect on the FNN ratio here. The likely explanation is that this case does not use a large number of sampling points; as the dataset grows, the difference in FNN ratios between the genetic version and the plain version becomes more apparent.

5.3. Van der Vusse Reactor

The process considered in Case 3 is a third-order exothermic Van der Vusse reaction in a continuous stirred tank reactor (CSTR) with a cooling jacket. It is a strongly nonlinear process with nonminimum-phase behavior and multiple inputs [32, 33]. The energy equation of the system involves the inlet and outlet concentrations, the reactor temperature, the temperature difference between the reactor interior and the jacket, and the dilution rate. The parameters are given in Table 6; for more details on the parameters and the initial states of the reaction, see [32]. The inlet flow is selected as the input to the system, while the other input remains constant. To estimate the model order, the data points from [12] are used.

With both GMM-clustering-based calculation methods, the FNN ratio drops to 0 at n = 2 and m = 2 (see Tables 7 and 8). A similar model structure has appeared in earlier work on control systems for this process [34].

5.4. Verification Using Determinate Models

As a powerful nonlinear mapping tool, the artificial neural network (ANN) has, in theory, the inherent capacity to fit arbitrary functions and to process information in parallel. In many cases, ANNs can provide fairly precise models for nonlinear control when sufficient data are available or when the governing equations of the process are unknown. Owing to their inherently nonlinear structure and their ability to be trained using only input data and target labels, ANNs perform remarkably well in soft sensing of industrial data and in capturing the dynamic characteristics of complex control systems [35].

5.4.1. Several Algorithms

A Radial Basis Function (RBF) network, Support Vector Regression (SVR), SVR optimized by the Bat Algorithm (BA-SVR), an Extreme Learning Machine (ELM), and a Back Propagation network trained with the Levenberg–Marquardt algorithm (LM-BP) are used here to verify the validity of this approach. The error criterion is the Root Mean Squared Error (RMSE):

\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(y_i - \hat{y}_i\right)^2},

where N is the number of training data and y_i and \hat{y}_i represent the target (desired) output and the neural network output, respectively. A 10-fold cross-validation was performed on the original data after scaling. In addition, white noise with mean 0 and variance 0.001 was added to the data to simulate real working conditions. The results of this procedure are summarized in Table 9.
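The evaluation protocol (scaling aside) can be sketched with a plain least-squares model standing in for the five learners; the regressors and targets below are synthetic placeholders, not the paper's data.

```python
import numpy as np

def rmse(y, yhat):
    """Root mean squared error between targets and predictions."""
    return float(np.sqrt(np.mean((y - yhat) ** 2)))

rng = np.random.default_rng(0)
X = rng.standard_normal((500, 4))                 # regressors (m = n = 2)
y = X @ np.array([0.5, -0.3, 0.2, 0.1])
y = y + rng.normal(0.0, np.sqrt(0.001), len(y))   # white noise, variance 0.001

# Manual 10-fold cross-validation.
idx = rng.permutation(len(y))
folds = np.array_split(idx, 10)
scores = []
for k in range(10):
    test = folds[k]
    train = np.concatenate([folds[j] for j in range(10) if j != k])
    w, *_ = np.linalg.lstsq(X[train], y[train], rcond=None)
    scores.append(rmse(y[test], X[test] @ w))

print(np.mean(scores))  # close to the injected noise standard deviation
```

When the candidate order matches the true structure, the cross-validated RMSE bottoms out near the noise floor, which is the pattern read off from Table 9.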

As the values in Table 9 show, the different modeling tools achieve different error accuracies. Comparing results within each of the five tools, the RMSE at m = 2 and n = 2 dips relative to the surrounding values; in other words, this position is a local minimum. For higher orders, noise errors accumulate during simulation and the models begin to overfit, so the RMSE rises again. This result indicates that the improved FNN algorithm for model order selection is fairly reliable.

6. Conclusion

This article has provided a new approach to order selection within the NARX model framework. The basic premise of this paper is that, given adequate historical input-output information, the model order best suited to the system can be chosen directly from the data. The validity and superiority of this approach were demonstrated on one simulation model and two real instances. First, Gaussian mixture model clustering was applied to the input-output data space; the model order was then estimated from the eigenvalues of each covariance matrix obtained by the GMM clustering. To address the limitations of the EM algorithm and the unknown number of clustering components, the genetic algorithm and the MDL criterion were embedded into the EM calculation, which improved the accuracy of the FNN ratios.
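The MDL criterion used to pick the number of mixture components K can be written down in a few lines: it penalizes the negative log-likelihood with half the free-parameter count times log N, and the K that minimizes it wins. The helper below assumes full covariance matrices; the log-likelihood values are illustrative placeholders, not results from the paper.

```python
import numpy as np

def gmm_free_params(K, d):
    """Free parameters of a K-component, d-dimensional full-covariance GMM:
    means (K*d) + covariances (K*d*(d+1)/2) + weights (K-1)."""
    return K * d + K * d * (d + 1) // 2 + (K - 1)

def mdl(loglik, K, d, N):
    """Minimum description length score; smaller is better."""
    return -loglik + 0.5 * gmm_free_params(K, d) * np.log(N)

# Illustrative log-likelihoods for K = 1..4 on N = 500 points in d = 4:
# the fit improves with K, but the penalty eventually dominates.
N, d = 500, 4
loglik = {1: -3200.0, 2: -3050.0, 3: -3035.0, 4: -3030.0}
scores = {K: mdl(L, K, d, N) for K, L in loglik.items()}
best = min(scores, key=scores.get)
print(best)  # -> 2
```

Embedding this score in the genetic search lets the component count be selected jointly with the EM fit rather than fixed in advance.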

Without the proposed scheme, one would need to construct many models of different orders and then analyze the results after estimating each of them. With the FNN criterion based on the GA-EM-GMM algorithm, however, model order selection is no longer affected by the parameter identification algorithm or the structural parameters of a predetermined nonlinear model, so the efficiency of the entire nonlinear system identification procedure is greatly improved. This work therefore has research significance and practical guidance value for the modeling stage of control systems.

In future work, to improve the robustness of the clustering algorithm, we will further consider ensemble clustering techniques to obtain better clustering results [36, 37]. Moreover, we will apply deep learning or long short-term memory (LSTM) networks to optimize the parameters and improve the adaptability of the GMM clustering method.

Data Availability

The data that support the findings of this study are available from the corresponding author upon reasonable request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

The work of this paper was supported by the National Natural Science Foundation of China (Grant no. 21676012) and the Fundamental Research Funds for the Central Universities (Project no. XK1802-4).