Abstract

This paper presents a systematic methodology for the generation of high-level performance models for analog component blocks. The transistor sizes of the circuit-level implementations of the component blocks along with a set of geometry constraints applied over them define the sample space. A Halton sequence generator is used as a sampling algorithm. Performance data are generated by simulating each sampled circuit configuration through SPICE. Least squares support vector machine (LS-SVM) is used as a regression function. Optimal values of the model hyper parameters are determined through a grid search-based technique and a genetic algorithm- (GA-) based technique. The high-level models of the individual component blocks are combined analytically to construct the high-level model of a complete system. The constructed performance models have been used to implement a GA-based high-level topology sizing process. The advantages of the present methodology are that the constructed models are accurate with respect to real circuit-level simulation results, fast to evaluate, and have a good generalization ability. In addition, the model construction time is low and the construction process does not require any detailed knowledge of circuit design. The entire methodology has been demonstrated with a set of numerical results.

1. Introduction

An analog high-level design process is defined as the translation of analog system-level specifications into a proper topology of component blocks, in which the specifications of all the component blocks are completely determined so that the overall system meets its desired specifications optimally [1–3]. The two important steps of an analog high-level design procedure are high-level topology generation/selection [4, 5] and high-level specification translation [6]. At the high-level design abstraction, a topology is defined as an interconnection of several analog component blocks, such as amplifiers, mixers, and filters. The detailed circuit-level implementations of these component blocks are not specified at this level of abstraction. The analog component blocks are represented by their high-level models.

During the past two decades, many optimization-based approaches have been proposed to handle the task of topology generation/selection [7–11]. These approaches involve the task of topology sizing, where the specification parameters of all the component blocks of a topology are determined such that the desired system specifications are optimally satisfied. The two important modules for this type of design methodology are a performance estimation module and an optimization engine. The implementation of the design methodology is based upon the flow of information between these two modules.

The performance models that are used in the high-level design abstraction are referred to as high-level performance models. An analog high-level performance model is a function that estimates the performance of an analog component block when some high-level design parameters of the block are given as inputs [12, 13]. The important requirements for a good high-level performance model are as follows. (i) The model needs to be low dimensional. (ii) The predicted results need to be accurate. The model accuracy is measured as the deviation of the model-predicted value from the true function value, which in this case is the performance parameter obtained from transistor-level simulation [12]. (iii) The evaluation time must be short. This is measured by the CPU time required to evaluate a model. (iv) The time required to construct an accurate model must be small, so that the design overhead does not become high. As a rough estimate, the construction cost is measured as
\[ T_{\mathrm{construction}} = T_{\mathrm{data\ generation}} + T_{\mathrm{training}}, \tag{1} \]
where the terms are self-explanatory. There exists a tradeoff between these requirements, since a model with lower prediction error generally takes more time to construct and evaluate.

In this work, we have developed the performance models using least squares support vector machine (LS-SVM) as the regressor. The transistor sizes of the circuit-level implementations of the component blocks along with a set of geometry constraints applied over them define the sample space. Performance data are generated by simulating each sampled circuit configuration through SPICE. The LS-SVM hyper parameters are determined through formal optimization-based techniques. The constructed performance models have been used to implement a high-level topology sizing process. The advantages of this methodology are that the constructed models are accurate with respect to real circuit-level simulation results, fast to evaluate and have a good generalization ability. In addition, the model construction time is low and the construction process does not require any detailed knowledge of circuit design. The entire methodology has been demonstrated with a set of experimental results.

The rest of the paper is organized as follows. Section 2 reviews related work. Section 3 presents the background concepts of least squares support vector machines. An outline of the methodology is provided in Section 4. The model generation methodology is described in detail in Section 5. The topology sizing process is described in Section 6. Numerical results are provided in Section 7, and finally, conclusions are drawn in Section 8.

2. Related Work

A fairly complete survey of related work is given in [14]. An analog performance estimation (APE) tool for high-level synthesis of analog integrated circuits is described in [15, 16]. It takes the design parameters (e.g., transistor sizes, biasing) of an analog circuit as inputs and determines its performance parameters (e.g., power consumption, thermal noise) along with anticipated sizes of all the circuit elements. The estimator is fast to evaluate, but the accuracy of the estimated results with respect to real circuit-level simulation results is not good, because the performance equations are based on simplified MOS models (SPICE level 1 equations). A power estimation model for ADCs using empirical formulae is described in [13]. Although this is fast, the estimates are not accurate under all conditions and can be off by orders of magnitude with respect to real simulation results. The technique for generating posynomial equation-based performance estimation models for analog circuits such as op-amps, multistage amplifiers, switched-capacitor filters, and so forth, is described in [17, 18]. An important advantage of such a modeling approach is that the topology sizing process can be formulated as a geometric program, which can be solved through very fast techniques. However, there are several limitations of this technique. The derivation of the performance equations is often a manual process, based on simple MOS equations. In addition, although many analog circuit characteristics can be cast in posynomial format, this is not true for all characteristics; for such characteristics, an approximate representation is often used. An automatic procedure for generating posynomial models using a fitting technique is described in [19, 20]. This technique overcomes several limitations of the handcrafted posynomial modeling techniques. The models are built from a set of data obtained through SPICE simulations; therefore, the full accuracy of SPICE simulation is achieved through such performance models. A neural network-based tool for automated power and area estimation is described in [21]. Circuit simulation results are used to train a neural network model, which is subsequently used as an estimator. Fairly recently, the support vector machine (SVM) has been used for modeling performance parameters of RF and analog circuits [22–24]. In [25], an SVM optimized by a GA has been used to develop a soft fault diagnosis method for analog circuits. In [26], a GA and an SVM have been used in conjunction to develop a feasibility model, which is then used within an evolutionary computation-based optimization framework for analog circuit optimization.

2.1. Comparison with Existing Methodologies

The present methodology uses a nonparametric regression technique for constructing the high-level performance models. Compared with other modeling methodologies employing symbolic analysis techniques or simulation-based techniques, the advantages of the present methodology are as follows. (i) The full accuracy of SPICE simulations and advanced device models, such as BSIM3v3, is used to generate the performance models. The models are thus accurate with respect to real circuit-level simulation results. (ii) There is no need for any a priori knowledge about the unknown dependency between the inputs and the outputs of the models to be constructed. (iii) The generalization ability of the models is high. (iv) The model construction time is low and the construction process does not require any detailed circuit design knowledge.

The EsteMate methodology [21], using an artificial neural network (ANN), and the SVM-based methodology discussed in [22, 23] are closely related to the present methodology. The methodology that we have developed, however, has a number of advantages over them. These are as follows.
(1) In the EsteMate methodology, the specification parameters of a component block constitute the sample space for training data generation. The specification parameters are electrical parameters, and there exist strong nonlinear correlations amongst them. Therefore, sophisticated sampling strategies are required for constructing models with good generalization ability in the EsteMate methodology. In our method, on the other hand, the transistor sizes along with a set of geometry constraints applied over them define the sample space. Within this sample space, the circuit performance behavior becomes weakly nonlinear. Thus simple sampling strategies suffice in our methodology to construct models with good generalization ability.
(2) In EsteMate, for each sample, a complete circuit sizing task using a global optimization algorithm is required to generate the training data. This is usually prohibitively time consuming. In our method, on the other hand, simple circuit simulations using the sampled transistor sizes are required for data generation. Therefore, the cost of training data generation in our method is much lower than in the EsteMate methodology [21]. With the EsteMate methodology, the training sample points are generated such that performances such as power are optimized. In our methodology, on the other hand, the task of performance optimization is treated as a separate issue, isolated from the performance model generation procedure. Our strategy is the one actually followed in all practical optimization-based high-level design procedures [1, 27].
(3) The generalization ability of the models constructed with our methodology is better than that of models generated through the EsteMate methodology, because the latter uses an ANN regression technique. Neural network-based approaches suffer from difficulties with generalization, producing models that can overfit the data. This is a consequence of the optimization algorithms used for parameter selection and the statistical measures used to select the "best" model. The SVM formulation, on the other hand, is based upon the structural risk minimization (SRM) principle [28], which has been shown to be superior to the traditional empirical risk minimization (ERM) principle employed by conventional neural networks. SRM minimizes an upper bound on the expected risk, as opposed to ERM, which minimizes the error on the training data. Therefore an SVM has greater generalization capability.
(4) The SVM-based methodology, as presented in [23], uses heuristic knowledge to determine the model hyper parameters. The present methodology uses optimization techniques to determine optimal values for them. The GA-based technique for determining optimal values of the model hyper parameters is found to be faster than the grid search technique employed in [22].

3. Background: Least Squares Support Vector Regression

In recent years, the support vector machine (SVM) has been developed as a powerful new tool for data classification and function estimation [28]. Suykens and Vandewalle [29] proposed a modified version of the SVM called the least squares SVM. In this section, we briefly outline the theory behind the LS-SVM as a function regressor.

Consider a given set of training samples $\{x_k, y_k\}$, $k = 1, 2, \ldots, N_{\mathrm{tr}}$, where $x_k$ is the input vector and $y_k$ is the corresponding target value for the $k$th sample. With an SVR, the relationship between the input vector and the target is given as
\[ \hat{y}(x) = w^T \phi(x) + b, \tag{2} \]
where $\phi$ is the mapping of the vector $x$ to some (possibly high-dimensional) feature space, $b$ is the bias, and $w$ is the weight vector of the same dimension as the feature space. The mapping $\phi(x)$ is generally nonlinear, which makes it possible to approximate nonlinear functions. The approximation error for the $k$th sample is defined as
\[ e_k = y_k - \hat{y}(x_k). \tag{3} \]
The minimization of the error together with the regression is given as
\[ \min \mathcal{J}(w, e) = \frac{1}{2} w^T w + \frac{\gamma}{2} \sum_{k=1}^{N_{\mathrm{tr}}} e_k^2, \tag{4} \]
with the equality constraints
\[ y_k = w^T \phi(x_k) + b + e_k, \quad k = 1, 2, \ldots, N_{\mathrm{tr}}, \tag{5} \]
where $N_{\mathrm{tr}}$ denotes the total number of training samples, the suffix $k$ denotes the index of the $k$th training sample, and $\gamma$ is the regularization parameter.

The optimization problem (4) is a constrained optimization problem, and a Lagrange function is used to solve it. Instead of minimizing the primal objective (4), a dual objective, the so-called Lagrangian, is formed, of which the saddle point is the optimum. The Lagrangian for this problem is given as
\[ \mathcal{L}(w, b, e, \alpha) = \mathcal{J}(w, e) - \sum_{k=1}^{N_{\mathrm{tr}}} \alpha_k \left( w^T \phi(x_k) + b + e_k - y_k \right), \tag{6} \]
where the $\alpha_k$ are the Lagrange multipliers. The saddle point is found by setting the derivatives equal to zero:
\[
\begin{aligned}
\frac{\partial \mathcal{L}}{\partial w} = 0 &\;\Rightarrow\; w = \sum_{k=1}^{N_{\mathrm{tr}}} \alpha_k \phi(x_k), \qquad &
\frac{\partial \mathcal{L}}{\partial b} = 0 &\;\Rightarrow\; \sum_{k=1}^{N_{\mathrm{tr}}} \alpha_k = 0, \\
\frac{\partial \mathcal{L}}{\partial e_k} = 0 &\;\Rightarrow\; \alpha_k = \gamma e_k, \qquad &
\frac{\partial \mathcal{L}}{\partial \alpha_k} = 0 &\;\Rightarrow\; w^T \phi(x_k) + b + e_k - y_k = 0.
\end{aligned} \tag{7}
\]
By eliminating $e_k$ and $w$ through substitution, the final model is expressed as a weighted linear combination of inner products between the training points and a new test point. The output is given as
\[ \hat{y}(x) = \langle w, \phi(x) \rangle + b = \sum_{k=1}^{N_{\mathrm{tr}}} \alpha_k \langle \phi(x_k), \phi(x) \rangle + b = \sum_{k=1}^{N_{\mathrm{tr}}} \alpha_k K(x_k, x) + b, \tag{8} \]
where $K(x_k, x)$ is the kernel function. The elegance of using the kernel function lies in the fact that one can deal with feature spaces of arbitrary dimensionality without having to compute the map $\phi(x)$ explicitly. Any function that satisfies Mercer's condition can be used as the kernel function. The commonly used Gaussian kernel, adopted in the present work, is defined as
\[ K(x_k, x) = \exp\!\left( \frac{-\left\| x_k - x \right\|^2}{2\sigma^2} \right), \tag{9} \]
where $\sigma^2$ denotes the kernel bandwidth. The two important parameters, the kernel parameter $\sigma^2$ and the regularization parameter $\gamma$ defined in (4), are referred to as hyper parameters. Their values have to be determined carefully in order to make the network efficient.
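In practice, the conditions (7) reduce LS-SVM training to a single linear solve in $(b, \alpha)$. The following minimal numpy sketch illustrates this for the Gaussian kernel (9); the function names are ours, not from the paper or any specific toolbox.

```python
# Minimal LS-SVM regression sketch: training solves the KKT linear system
# [[0, 1^T], [1, K + I/gamma]] [b; alpha] = [0; y], cf. (7); prediction uses (8).
import numpy as np

def rbf_kernel(X1, X2, sigma2):
    # K[i, j] = exp(-||x_i - x_j||^2 / (2 * sigma2)), cf. (9)
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2.0 * sigma2))

def ls_svm_train(X, y, gamma, sigma2):
    n = len(y)
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = 1.0                      # sum of alphas equals zero
    A[1:, 0] = 1.0                      # bias column
    A[1:, 1:] = rbf_kernel(X, X, sigma2) + np.eye(n) / gamma
    sol = np.linalg.solve(A, np.concatenate(([0.0], y)))
    return sol[0], sol[1:]              # b, alpha

def ls_svm_predict(X_test, X_train, alpha, b, sigma2):
    # y_hat(x) = sum_k alpha_k K(x_k, x) + b, cf. (8)
    return rbf_kernel(X_test, X_train, sigma2) @ alpha + b
```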

4. An Outline of the Methodology

The high-level performance model of an analog component block is mathematically represented as
\[ \rho = \mathcal{P}(X), \tag{10} \]
where $\rho$ is a set of performance parameters and $X$ is a set of specification parameters. The input specification parameters are referred to as the high-level design parameters. It is to be noted that, out of the various possible specification parameters, only the dominant parameters are considered as inputs. The selection of these is based upon the designer's knowledge [12]. These high-level design parameters describe a space referred to as the sample space. This sample space is explored to extract sample points through suitable algorithms. The numerical values of the sample points (both inputs and outputs of the performance model to be constructed) are generated through SPICE simulations. The data points so generated are divided into two sets, referred to as the training set and the test set. A least squares SVM network approximating a performance model is constructed by training the network with the training set. The test dataset is used to validate the SVM model. Suitable kernel functions are selected for constructing the SVM. An initial SVM model is constructed with some initial values of the hyper parameters. An iterative process is then executed to construct the final LS-SVM, maximizing its efficiency through optimal determination of the hyper parameters. An outline of the process for constructing the performance model of a single component block is illustrated in Figure 1(a).

For a complex system, consisting of many component blocks, the high-level performance model of the complete system is constructed at the second level of hierarchy, where the high-level models of the individual component blocks are combined analytically (see Figure 1(b)). The constructed performance models are used to implement a high-level topology sizing process. For a given unsized high-level topology of an analog system, the topology parameters (which are the specification parameters of the individual blocks of the high-level topology) are determined such that the desired design goals are satisfied. The entire operation is performed within an optimization procedure, which in the present work is implemented through GA. The constructed LS-SVM models are used within the GA loop. An outline of the sizing methodology is shown in Figure 1(c).

The following two important points may be noted in connection with the present methodology. First, the high-level performance model of a complete system is generated in a hierarchical manner. The major advantage of this hierarchical approach is the reusability of the high-level models of the individual component blocks. The high-level model of a component block can be reused whenever the corresponding component block is part of a system, provided the functionality and performance constraints are identical, which is commonly the case. The issue of reusability of the component-level high-level models is demonstrated in Experiment 3, provided later. However, this advantage comes at the cost of reduced accuracy of the model of the complete system. This tradeoff is a general phenomenon in the analog design automation process. It may, however, be noted that it is possible to construct the high-level performance model of a complete system directly using the regression technique discussed here; for some customized applications, this may be done. Second, the requirement of low dimensionality of the models must be carefully taken care of. The scalability of our approach to model generation is not high compared to an analytical approach, although it is higher than that of other black-box approaches such as ANN-based ones. In addition, many global optimization algorithms suffer from the "curse of dimensionality": for a topology sizing procedure employing a high-dimensional model, the design space in which to search for optimal design points becomes too large to be handled by simple optimization algorithms. Therefore, while selecting the inputs of the model, only the dominant specification parameters need to be considered.

The detailed operations of each of the steps outlined above are discussed in the following sections and subsections.

5. High-Level Performance Model Generation

In this section, we describe the various steps of the performance model generation procedure in detail.

5.1. Sample Space Definition, Data Generation, and Scaling

In (10), both $\rho$ and $X$ are taken to be functions of a set of geometry parameters $\alpha$ (transistor sizes) of a component block, expressed as
\[ X = \mathcal{R}(\alpha), \qquad \rho = \mathcal{Q}(\alpha). \tag{11} \]
$\mathcal{R}$ and $\mathcal{Q}$ represent the mappings of the geometry parameters to electrical parameters. This is illustrated in Figure 2. The multidimensional space spanned by the elements of the set $\alpha$ is defined as the circuit-level design space $\mathcal{D}_\alpha$. The sample space is a subspace within $\mathcal{D}_\alpha$ (see Figure 3), defined through a set of geometry constraints. These geometry constraints include equality constraints as well as inequality constraints. For example, for matching purposes, the sizes of the transistors of a differential pair are equal. The inequality constraints are determined by the feature size of a technology and by the condition that the transistors are not excessively large. With elementary algebraic transformations, all the geometry constraints are combined into a single nonlinear vector inequality, which is interpreted element wise as
\[ C_g(\alpha) \ge 0 \iff \forall i \in \{1, \ldots, q\}: \; C_{g_i}(\alpha) \ge 0. \tag{12} \]
Within this sample space, the circuit performance behavior becomes weakly nonlinear [27, 30]. Therefore, simple sampling strategies suffice to construct models with good generalization ability. In the present work, the sample points are extracted through Halton sequence generation, a quasirandom number generator that produces uniformly distributed points in the sample space [31] and thus ensures a uniform and unbiased representation of it (a sketch is given below). The number of sample data points plays an important role in determining the efficiency of the constructed LS-SVM model. Using a separate algorithm, it is possible to determine an optimum training set size, such that models built with fewer training samples have lower accuracy, while models built with more samples show no significant gain in accuracy. In the present work, however, in order to keep the sampling procedure simple, the number of sample data points is fixed and determined through a trial and error method.
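For illustration, a short Python sketch of Halton-sequence sampling over a box-bounded sample space follows; the prime bases are standard, but the transistor-size ranges in the usage example are hypothetical placeholders, not the paper's values.

```python
# Halton-sequence sampling sketch: quasirandom, uniformly distributed points
# inside a box defined by per-dimension (low, high) bounds.
import numpy as np

def van_der_corput(n, base):
    """nth element of the van der Corput sequence in the given base."""
    q, denom = 0.0, 1.0
    while n > 0:
        denom *= base
        n, rem = divmod(n, base)
        q += rem / denom
    return q

def halton(n_samples, bounds, primes=(2, 3, 5, 7, 11, 13)):
    dim = len(bounds)
    pts = np.array([[van_der_corput(i + 1, primes[d]) for d in range(dim)]
                    for i in range(n_samples)])
    lo = np.array([b[0] for b in bounds])
    hi = np.array([b[1] for b in bounds])
    return lo + pts * (hi - lo)

# e.g., widths of three transistors in micrometres (illustrative ranges only):
samples = halton(5000, [(0.5, 50.0), (0.5, 50.0), (1.0, 100.0)])
```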

The training data generation process is outlined in Figure 4. For each input sample (transistor sizes) extracted from the sample space $\mathcal{D}_g$, the chosen circuit topology of a component block is simulated with SPICE (the Cadence Spectre tool) using the BSIM3v3 model. Depending upon the selected input-output parameters of an estimation function, it is necessary to construct a set of test benches that provide sufficient data to facilitate automatic extraction of these parameters via postprocessing of the SPICE output files. A set of constraints, referred to as feasibility constraints, is then applied over the generated data to ensure that only feasible data are taken for training.

The generated input-output data are considered feasible if either they themselves satisfy a set of constraints or the mapping procedures $(\mathcal{R}, \mathcal{Q})$ through which they are generated satisfy a set of constraints. The constraints are as follows [30].
(1) Functionality constraints $C_f$: these constraints are applied on the measured node voltages and currents. They ensure correct functionality of the circuit and are expressed as
\[ C_f = \left\{ f_k(v, i) \ge 0, \; k = 1, 2, \ldots, n_f \right\}. \tag{13} \]
For example, the transistors of a differential pair must work in saturation.
(2) Performance constraints $C_p$: these are applied directly on the input-output parameters, depending upon the application system. They are expressed as
\[ C_p = \left\{ f_k(\rho) \ge 0, \; f_k(X) \ge 0, \; k = 1, 2, \ldots, n_p \right\}. \tag{14} \]
For example, the phase margin of an op-amp must be greater than 45°.

The total set of constraints for feasibility checking is thus $C = \{C_f \cup C_p\}$. It is to be noted that through the process of feasibility checking, various simulation data are discarded. At first glance, this may look like a waste of costly simulation time. However, for an analog designer (who is a user of the model), it is an important advantage, because the infeasible data points will never appear as solutions when the model is used for design characterization/optimization. Even from the model developer's perspective, this is not a serious matter, considering that the construction process is in general a one-time process [24]. The feasibility constraints remain invariant if the performance objectives are changed. Even if the design migrates by a small amount, these constraints usually do not change [27]. This, however, demands an efficient determination of the feasibility constraints.
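As a sketch of this feasibility filtering per (13)-(14), the fragment below keeps only those simulated samples that pass saturation and phase-margin checks; the record keys and thresholds are illustrative assumptions, not the paper's actual test benches.

```python
# Feasibility filtering sketch: apply functionality (13) and performance (14)
# checks to values extracted from each SPICE run (hypothetical record layout).
def is_feasible(sim):
    # functionality: every transistor stays in saturation (vds > vdsat)
    functionality = all(vds > vdsat for vds, vdsat in sim["saturation_pairs"])
    # performance: e.g., phase margin greater than 45 degrees
    performance = sim["phase_margin_deg"] > 45.0
    return functionality and performance

simulated_samples = [
    {"saturation_pairs": [(0.4, 0.2), (0.5, 0.3)], "phase_margin_deg": 60.0},
    {"saturation_pairs": [(0.1, 0.2)], "phase_margin_deg": 50.0},  # infeasible
]
feasible_samples = [s for s in simulated_samples if is_feasible(s)]
```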

Data scaling is an essential step to improve the learning/training process of SVMs. The data of the input and/or output parameters are scaled. The commonly suggested scaling schemes are linear scaling, log scaling, and two-sided log scaling. The present methodology employs both linear and logarithmic scaling, depending upon the parameter chosen. The following formulae are used for linear and logarithmic scaling within an interval $[0, 1]$ [32]:
\[ \text{Linear:} \quad d_j' = \frac{d_j - l_b}{u_b - l_b}, \qquad \text{Logarithmic:} \quad d_j' = \frac{\log\left( d_j / l_b \right)}{\log\left( u_b / l_b \right)}, \tag{15} \]
where $d_j$ is the unscaled $j$th data point of any parameter bounded within the interval $[l_b, u_b]$. Linear scaling balances the ranges of different inputs or outputs. Applying a log scale to data with large variations balances large and small magnitudes of the same parameter in different regions of the model.
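A small numpy sketch of the two scaling schemes in (15) follows; the function names are ours.

```python
# Scaling sketch per (15): map data bounded by [lb, ub] into [0, 1].
import numpy as np

def scale_linear(d, lb, ub):
    return (d - lb) / (ub - lb)

def scale_log(d, lb, ub):
    # assumes 0 < lb <= d <= ub, as the log scaling in (15) requires
    return np.log(d / lb) / np.log(ub / lb)

d = np.array([1e-6, 1e-4, 1e-2])            # e.g., power values in watts
print(scale_linear(d, d.min(), d.max()))     # approx. [0, 0.0099, 1]
print(scale_log(d, d.min(), d.max()))        # [0, 0.5, 1]
```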

5.2. LS-SVM Construction

In this subsection, we discuss the various issues related to the construction of the LS-SVM regressor.

5.2.1. Choice of Kernel Function

The first step in the construction of an LS-SVM model is the selection of an appropriate kernel function $K(x_k, x)$, for which there are several alternatives. Some of the commonly used functions are listed in Table 1, where $d$, $\sigma$, $\kappa$, and $\theta$ are constants, referred to as hyper parameters. In general, in any classification or regression problem, if the hyper parameters of the model are not well selected, the predicted results will not be good. Optimum values for these parameters therefore need to be determined through a proper tuning method. Note that the Mercer condition holds for all $\sigma$ and $d$ values in the radial basis function (RBF) and the polynomial case, but not for all possible choices of $\kappa$ and $\theta$ in the multilayer perceptron (MLP) case. Therefore, the MLP kernel is not considered in this work.

5.2.2. Tuning of Hyper Parameters

As mentioned earlier, when designing an effective LS-SVM model, the hyper parameter values have to be chosen carefully. The regularization parameter $\gamma$ determines the tradeoff between minimizing the training error and minimizing the complexity of the model. The kernel parameter $\sigma$ or $d$ defines the nonlinear mapping from the input space to some high-dimensional feature space [33].

Optimal values of the hyper parameters are usually determined by minimizing the estimated generalization error. The generalization error measures the generalization ability of the constructed models, that is, their ability to predict correctly the performance of an unknown sample. The techniques used for estimating the generalization error in the present methodology are as follows (a cross-validation sketch follows this list).
(1) Hold-out method: this is a simple technique for estimating the generalization error. The dataset is separated into two sets, called the training set and the test set. The SVM is constructed using the training set only and is then tested using the test dataset, which is completely unknown to the estimator. The errors it makes are accumulated to give the mean test set error, which is used to evaluate the model. This method is very fast. However, its evaluation can have a high variance: the result may depend heavily on which data points end up in the training set and which in the test set, and thus may differ significantly depending on how the division is made.
(2) $k$-fold cross-validation method: in this method, the training data is randomly split into $k$ mutually exclusive subsets (the folds) of approximately equal size [33]. The SVM is constructed using $k - 1$ of the subsets and then tested on the subset left out. This procedure is repeated $k$ times. Averaging the test error over the $k$ trials gives an estimate of the expected generalization error. The advantage of this method is that the accuracy of the constructed SVM does not depend upon how the data gets divided, and the variance of the resulting estimate is reduced as $k$ is increased. The disadvantage is that it is time consuming.
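A compact sketch of the $k$-fold estimate for one hyper parameter pair follows, assuming training arrays X and y and reusing the ls_svm_* helpers sketched after Section 3.

```python
# k-fold cross-validation sketch: average relative test error over k folds
# for one (gamma, sigma2) pair, using the earlier ls_svm_train/ls_svm_predict.
import numpy as np

def cv_error(X, y, gamma, sigma2, k=5, seed=0):
    idx = np.random.default_rng(seed).permutation(len(y))
    folds = np.array_split(idx, k)
    errs = []
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        b, alpha = ls_svm_train(X[train], y[train], gamma, sigma2)
        pred = ls_svm_predict(X[test], X[train], alpha, b, sigma2)
        errs.append(np.mean(np.abs(pred - y[test]) / np.abs(y[test])))
    return float(np.mean(errs))
```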

Primarily, there are three different approaches for optimal determination of the SVM hyper parameters: heuristic methods, local search methods, and global search methods. The $\sigma$ value is related to the distance between training points and to the smoothness of the interpolation of the model. A heuristic rule discussed in [34] estimates the $\sigma$ value within $[\sigma_{\min}, \sigma_{\max}]$, where $\sigma_{\min}$ is the minimum nonzero distance between two training points and $\sigma_{\max}$ is the maximum distance between two training points. The regularization parameter $\gamma$ is determined based upon the tradeoff between the smoothness of the model and its accuracy: the bigger its value, the more importance is given to the error of the model in the minimization process. Choosing a low value is not advisable when using the exponential RBF to model performances, which are often approximately linear or weakly quadratic in most input variables. While constructing LS-SVM-based analog performance models, a heuristic method was applied for determining the hyper parameters in [23]. The hyper parameters generated through heuristic methods are often found to be suboptimal, as demonstrated in [12]. Therefore, determination of the hyper parameters through a formal optimization procedure is suggested [33].

The present methodology employs two techniques for selecting optimal values of the model hyper parameters: a grid search technique and a genetic algorithm-based technique. These are explained below, considering the RBF as the kernel function; for other kernels, the techniques are applied analogously.

(1) Grid Search Technique
The basic steps of the grid search-based technique are outlined below.
(1) Consider a grid space of $(\gamma, \sigma^2)$, defined by $\log_2 \gamma \in [lb_\gamma, ub_\gamma]$ and $\log_2 \sigma^2 \in [lb_{\sigma^2}, ub_{\sigma^2}]$, where $[lb_\gamma, ub_\gamma]$ and $[lb_{\sigma^2}, ub_{\sigma^2}]$ define the boundary of the grid space.
(2) For each pair within the grid space, estimate the generalization error through the hold-out/$k$-fold cross-validation technique.
(3) Choose the pair that leads to the lowest error.
(4) Use the best parameters to create the SVM model as the predictor.
The grid search technique is simple. However, it is computationally expensive, since it is an exhaustive search. The accuracy and the time cost of the grid method trade off against each other through the grid density: increasing the grid density makes the computation more expensive, while a sparse grid lowers the accuracy. The grid search is therefore performed in two stages. In the first stage, a coarse grid search is performed; after identifying a promising region on the grid, a finer grid search is conducted on that region in the second stage (a sketch follows). In addition, the grid search process is a tricky task, since a suitable sampling step varies from kernel to kernel and the grid interval may not be easy to locate without prior knowledge of the problem. In the present work, these parameters are determined through a trial and error method.
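A two-stage grid search over $(\log_2 \gamma, \log_2 \sigma^2)$ might look as follows; the ranges, step counts, the training arrays X and y, and the cv_error helper from the previous sketch are all assumptions of this sketch.

```python
# Two-stage grid search sketch: coarse pass over the full grid, then a finer
# pass around the coarse optimum; axes are log2(gamma) and log2(sigma2).
import itertools
import numpy as np

def grid_search(X, y, g_range, s_range, steps=8):
    grid_g = np.linspace(*g_range, steps)     # log2(gamma) axis
    grid_s = np.linspace(*s_range, steps)     # log2(sigma2) axis
    return min(itertools.product(grid_g, grid_s),
               key=lambda p: cv_error(X, y, 2.0 ** p[0], 2.0 ** p[1]))

g1, s1 = grid_search(X, y, (-4, 10), (-6, 6))                       # stage 1
g2, s2 = grid_search(X, y, (g1 - 1, g1 + 1), (s1 - 1, s1 + 1), 10)  # stage 2
gamma, sigma2 = 2.0 ** g2, 2.0 ** s2
```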

(2) Genetic Algorithm-Based Technique
In order to reduce the computational time required to determine the optimal hyper parameter values without sacrificing accuracy, a numerical gradient-based optimization technique can be used. However, it has been found that the SVM model selection criteria often have multiple local optima with respect to the hyper parameter values [28]. In such cases, a gradient-based method risks being trapped in bad local optima. Considering this fact, we use a genetic algorithm-based global optimization technique for determining the hyper parameter values.
In the GA-based technique, the task of selecting the hyper parameters is cast as an optimum-searching task, in which each point in the search space represents one feasible solution (a specific hyper parameter pair). Each feasible solution is scored by its estimated generalization ability, so that determining a solution amounts to finding an extreme point of the search space.
An outline of a simple GA-based process is shown in Figure 5. The chromosomes consist of two parts, $\log_2 \gamma$ and $\log_2 \sigma^2$. The encoding of the hyper parameters into a chromosome is a key issue. A real-coded scheme is used as the representation of the parameters in this work; therefore, the solution space coincides with the chromosome space. In order to produce the initial population, the initial values of the design parameters are distributed evenly in the solution space. The selection of the population size is one of the factors that affect the performance of a GA. The GA evaluation duration is proportional to the population size. If the population size is too large, a prohibitive amount of optimization time will be required; on the other hand, if it is too small, the GA can prematurely converge to a suboptimal solution, reducing the final solution quality. There is no generally accepted theory for determining the optimal population size. Usually, it is determined by experimentation or experience.
During the evolutionary process of the GA, a model is trained with the current hyper parameter values. The hold-out method as well as the $k$-fold cross-validation method are used for estimating the generalization error. The fitness function is an important factor for the estimation and evolution of SVMs providing satisfactory and stable results. It expresses the user's objective and favours SVMs with satisfactory generalization ability. The fitness of the chromosomes in the present work is determined by the average relative error (ARE) calculated over the test samples. The fitness function is defined as
\[ F = \frac{1}{\mathrm{ARE}(\gamma, \sigma^2)}. \tag{16} \]
Thus, maximizing the fitness value corresponds to minimizing the prediction error. The ARE function is defined as
\[ \mathrm{ARE} = \frac{1}{N_{\mathrm{te}}} \sum_{i=1}^{N_{\mathrm{te}}} \left| \frac{\rho_i' - \rho_i}{\rho_i'} \right|. \tag{17} \]
Here $N_{\mathrm{te}}$, $\rho$, and $\rho'$ are the number of test data, the SVM estimator output, and the corresponding SPICE-simulated value, respectively. The fitness of each chromosome is taken to be the average of five repetitions. This reduces the stochastic variability of the model training process in the GA-based LS-SVM.
The genetic operator includes three basic operators: selection, crossover, and mutation. The roulette wheel selection technique is used for the selection operation. The probability $p_i$ of selecting the $i$th solution is given by
\[ p_i = \frac{F_i}{\sum_{j=1}^{N_{\mathrm{pop}}} F_j}, \tag{18} \]
where $N_{\mathrm{pop}}$ is the size of the population. Besides, in order to keep the best chromosome in every generation, the idea of elitism is adopted. The crossover operator uses a pair of real-parameter decision variable vectors to create a new pair of offspring vectors. For two parent solutions $x_1$ and $x_2$ such that $x_1 < x_2$, the blend crossover operator (BLX-$\beta$) randomly picks a solution in the range $[x_1 - \beta(x_2 - x_1), x_2 + \beta(x_2 - x_1)]$. Thus, if $u$ is a random number in the range $(0, 1)$ and $\alpha = (1 + 2\beta)u - \beta$, then the following is an offspring:
\[ x_{\mathrm{new}} = (1 - \alpha) x_1 + \alpha x_2. \tag{19} \]
If $\beta$ is zero, this crossover creates a random solution in the range $(x_1, x_2)$. It has been reported for a number of test cases that BLX-0.5 (with $\beta = 0.5$) performs better than BLX operators with any other $\beta$ value. The mutation operator is used with a low probability to alter the solutions locally, in the hope of creating better solutions; its purpose is to maintain a good diversity of the population. A normally distributed mutation operator is used in this work: a zero-mean Gaussian perturbation with standard deviation $\eta_i$ for the $i$th solution gives the new solution
\[ x_{\mathrm{new}} = x_i + N\left( 0, \eta_i \right). \tag{20} \]
The parameter $\eta_i$ is user defined and problem dependent. Also, it must be ensured that the new solution lies within the specified upper and lower limits. When the difference between the estimated error of the child population and that of the parent population remains below a predefined threshold over a fixed number of generations, the whole process is terminated and the corresponding hyper parameter pair is taken as the output. A compact sketch of this loop is given below.
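The sketch assumes training arrays X and y and the cv_error helper sketched earlier; the population size, bounds, generation count, and mutation scale are illustrative choices, not the paper's settings.

```python
# GA sketch for tuning (log2 gamma, log2 sigma2): roulette selection (18),
# BLX-0.5 crossover (19), Gaussian mutation (20), elitism, fitness F = 1/ARE (16).
import numpy as np

rng = np.random.default_rng(1)
LO, HI = np.array([-4.0, -6.0]), np.array([10.0, 6.0])  # [log2 g, log2 s2] bounds

def fitness(p):
    return 1.0 / cv_error(X, y, 2.0 ** p[0], 2.0 ** p[1])

pop = rng.uniform(LO, HI, size=(20, 2))
for gen in range(30):
    fit = np.array([fitness(p) for p in pop])
    elite = pop[fit.argmax()].copy()                        # keep best chromosome
    parents = pop[rng.choice(len(pop), size=len(pop), p=fit / fit.sum())]
    beta = 0.5
    a = (1 + 2 * beta) * rng.uniform(size=(len(pop), 1)) - beta   # BLX-0.5
    children = (1 - a) * parents + a * parents[::-1]
    mutate = rng.uniform(size=children.shape) < 0.05        # low mutation rate
    children += mutate * rng.normal(0.0, 0.3, children.shape)
    pop = np.clip(children, LO, HI)                         # respect the bounds
    pop[0] = elite
best = pop[np.argmax([fitness(p) for p in pop])]
gamma, sigma2 = 2.0 ** best
```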

It may be mentioned here that there is no fixed method for defining the GA parameters, which are all empirical in nature. However, the optimality of the hyper parameter values depends upon the values of the GA parameters. In the present work, the values of the GA parameters are selected primarily by a trial and error method over several runs.

5.3. Quality Measures

Statistical functions are generally used to assess the quality of the generated estimator. The ARE function defined in (17) is one such measure. Another commonly used measure is the correlation coefficient ($R$), defined as
\[ R = \frac{N_{\mathrm{te}} \sum \rho \rho' - \sum \rho \sum \rho'}{\sqrt{\left[ N_{\mathrm{te}} \sum \rho^2 - \left( \sum \rho \right)^2 \right] \left[ N_{\mathrm{te}} \sum \rho'^2 - \left( \sum \rho' \right)^2 \right]}}. \tag{21} \]
The correlation coefficient is a measure of how closely the LS-SVM outputs fit the target values; its magnitude is a number between 0 and 1. If there is no linear relationship between the estimated values and the actual targets, the correlation coefficient is 0. If it is equal to 1.0, there is a perfect fit between the targets and the outputs. Thus, the higher the correlation coefficient, the better.
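Both quality measures are straightforward to compute; a short numpy sketch follows (function name ours).

```python
# Quality-measure sketch: ARE per (17) and correlation coefficient R per (21),
# from SPICE targets (rho_spice) and LS-SVM predictions (rho_svm).
import numpy as np

def quality(rho_spice, rho_svm):
    are = np.mean(np.abs(rho_svm - rho_spice) / np.abs(rho_spice))
    n = len(rho_spice)
    num = n * np.sum(rho_spice * rho_svm) - rho_spice.sum() * rho_svm.sum()
    den = np.sqrt((n * np.sum(rho_spice ** 2) - rho_spice.sum() ** 2) *
                  (n * np.sum(rho_svm ** 2) - rho_svm.sum() ** 2))
    return are, num / den
```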

6. Topology Sizing Methodology Using GA

The topology sizing process is defined as the task of determining the topology parameters (specification parameters of the constituent component blocks) of a high-level topology such that the desired specifications of the system are satisfied with optimized performances. In this section, we discuss a genetic algorithm-based methodology for a topology sizing process employing the constructed LS-SVM performance models.

An outline of the flow is shown in Figure 6. A high-level topology is regarded as a multidimensional space in which the topology parameters are the dimensions. The valid design space for a particular application consists of those points which satisfy the design constraints. The optimization algorithm searches this valid design space for the point which optimizes a cost function. The optimization targets, that is, the performance parameters to be optimized and the system specifications to be satisfied, are specified by the user. The GA optimizer generates a set of chromosomes, each representing a combination of topology parameters in the given design space. Performance estimation models for the topology of the entire system are constructed by combining the LS-SVM models of the individual component blocks through analytical formulae. The performance estimation models take each combination of topology parameters and produce an estimate of the desired performance cost of the topology as the output. A cost function is computed using these estimated performance values. The chromosomes are updated according to their fitness, which is related to the cost function. This process continues until a desired cost function objective is achieved or a maximum number of iterations is executed.

7. Numerical Results

In this section, we provide experimental results demonstrating the methodologies described above. The entire methodology has been implemented in the MATLAB environment, and the training of the LS-SVM has been done using a MATLAB toolbox [35].

7.1. Experiment 1

A two-stage CMOS operational transconductance amplifier (OTA) is shown in Figure 7. The technology is a 0.18 μm CMOS process with a supply voltage of 1.8 V. The transistor-level parameters along with the various feasibility constraints are listed in Table 2. The functional constraints ensure that all the transistors are on and in the saturation region with some user-defined margin. We consider the problem of modeling the input-referred thermal noise ($\rho_1$), the power consumption ($\rho_2$), and the output impedance ($\rho_3$) as functions of the DC gain ($X_1$), the bandwidth ($X_2$), and the slew rate ($X_3$). From the sample space defined by the transistor sizes, a set of 5000 samples is generated using a Halton sequence generator. These are simulated through AC analysis, operating point analysis, noise analysis, and transient analysis using SPICE. Out of all the samples, only 1027 are found to satisfy the functional and performance constraints listed in Table 2.

The estimation functions are generated using the LS-SVM technique. The generalization errors are estimated through the hold-out method and the 5-fold cross-validation method. The hyper parameters are computed through the grid search and the GA-based technique. In the grid search technique, the hyper parameters $(\sigma^2, \gamma)$ are restricted within the ranges $[0.1, 6.1]$ and $[10, 510]$, respectively. The grid search algorithm is performed with a step size of 0.6 in $\sigma^2$ and 10 in $\gamma$. These parameters are fixed based on heuristic estimates and repeated trials. The determined hyper parameter values, along with the quality measures and the training time, are reported in Tables 3 and 4 for the hold-out method and the cross-validation method, respectively. From the results, we observe that the average relative errors for the test samples are lower (i.e., the generalization ability of the models is higher) when the errors are estimated using the cross-validation method. However, the cross-validation method is much slower than the hold-out method.

For the GA, the population size is taken to be ten times the number of optimization variables. The crossover probability and the mutation probability are taken as 0.8 and 0.05, respectively, determined through a trial and error process. The hyper parameter values and the quality measures are reported in Tables 5 and 6. The observations made above also hold for these results.

A comparison between the grid search technique and the GA-based technique with respect to accuracy (ARE), correlation coefficient ($R$), and required training time is made in Table 7. All the experiments are performed on a PC with a PIV 3.00 GHz processor and 512 MB RAM. We observe from the comparison that the accuracies of the SVM models constructed using the grid search technique and the GA-based technique are almost the same. However, the GA-based technique is at least ten times faster than the grid search method. From (1), we conclude that the construction cost of the GA-based method is much lower than that of the grid search-based method, since the data generation time is the same for both methods.

The scatter plots of SPICE-simulated and LS-SVM estimated values for normalized test data of the three models are shown in Figures 8(a), 8(b), and 8(c), respectively. These scatter plots illustrate the correlation between the SPICE simulated and the LS-SVM-estimated test data. The correlation coefficients are very close to unity. Perfect accuracy would result in the data points forming a straight line along the diagonal axis.

7.2. Experiment 2

The objective of this experiment is to compare our methodology quantitatively with EsteMate [21]. The power consumption model is reconstructed using the EsteMate technique. The specification parameter space is sampled randomly, and a set of 5000 samples is considered. For each selected sample, an optimal sizing is performed and the resulting power consumption is measured. Following the EsteMate procedure, the sizing is done with a simulated annealing-based optimization procedure and standard analytical equations relating transistor sizes to the specification parameters [36]. Of these samples, 3205 are accepted and the rest are rejected. The generation of the training set took 10 hours of CPU time. The training is done through an artificial neural network with two hidden layers; the first hidden layer has 9 neurons and the second has 6. The hold-out method is used for estimating the generalization ability.

A comparison between the two methodologies is reported in Table 8. From the results, we find that the data generation time is much less in our method compared to the EsteMate method. In addition, we find that the accuracy of our method is better than the EsteMate method. The experimental observations verify the theoretical arguments given in Section 2.1.

7.3. Experiment 3

The objective of this experiment is to demonstrate the process of constructing the high-level performance model of a complete system and the task of topology sizing.

System Considerations
We choose a complete analog system: the interface electronics for a MEMS capacitive sensor, as shown in Figure 9(a). In this configuration, a half-bridge consisting of the sense capacitors $C_1, C_2$ is formed and driven by two pulse signals with 180° phase difference. The amplitude of the bridge output $V_x$ is proportional to the capacitance change $\Delta C$ and is amplified by a voltage amplifier. The final output voltage $V_{\mathrm{out}}$ is given by
\[ V_{\mathrm{out}} = V_0 \frac{2 \Delta C}{2 C_0 + C_p} A_v, \tag{22} \]
where $C_0$ is the nominal capacitance value, $C_p$ is the parasitic capacitance at the sensor node, $V_0$ is the amplitude of the applied ac signal, and $A_v$ is the gain of the system, set by the desired output voltage sensitivity. The topology employs a chopper modulation technique to suppress $1/f$ noise.
The desired functional specifications to be satisfied are (i) the output voltage sensitivity (i.e., the total gain, since the input sensitivity is known) and (ii) the cutoff frequency of the filter. The performance parameters to be optimized are (i) the input-referred thermal noise, (ii) the total power consumption, and (iii) the parasitic capacitance at the sensor node $V_x$. The functional specifications and design constraints for the system are based on [37] and are listed in Table 9.

Identification of the Component Blocks and the Corresponding Performance Models
The synthesizable component blocks are the preamplifier (PA), the inverter (IN) of the phase demodulator, the low-pass filter (LF), and the output amplifier (OA). These are constructed using OTAs and capacitors. Figure 9(b) shows the implementations of the amplifier and the filter blocks using OTAs and capacitors [38, 39].
High-level performance models are constructed for the synthesizable component blocks for three performance parameters: (i) input-referred thermal noise, (ii) power consumption, and (iii) sensor node parasitics. The specification parameters that have a dominant influence on the first two performances, as well as on the functional specifications (the output voltage sensitivity and the cutoff frequency), are the transconductance values of all the OTAs involved. For the last performance parameter, the sensor node parasitics, the transconductance of the first OTA of the preamplifier block is the single design parameter. Thus the $G_m$ values of the OTAs are considered as the high-level design parameters. In summary, we construct three performance models, for input-referred thermal noise, power consumption, and sensor node parasitics, as functions of the $G_m$ values of the OTAs.

Construction of Performance Models for the PA Block
The geometry constraints and the feasibility constraints for the PA block of the topology are tabulated in Table 10. Similar types of constraints are considered for the other component blocks as well. The input-output parameters of the models to be constructed are extracted through the techniques discussed earlier. The sensor node parasitic capacitance is measured using the half-bridge circuit shown in Figure 9(a), with only one amplifier block. Considering $\Delta C = 5$ fF and $C_0 = 65$ fF, a square-wave signal with amplitude $V_0 = 500$ mV is applied and transient analysis is performed. Measuring the signal at the node $V_x$, $C_p$ is calculated using (22).
Table 11 shows the hyper parameter values, percentage average relative error, and correlation coefficient of the constructed performance models for the preamplifier, with respect to SPICE-simulated values.

Reusability of Models and Construction of High-Level Model for the Complete System
The performance models corresponding to the noise and the power consumption for the PA block are reused for the other component blocks. This is because all the component blocks have topological similarities and each of them is constructed from OTA circuits, as demonstrated in Figure 9(b). The reusability of individual high-level models within a complete system is thus exercised here.
The high-level models of the PA, IN, LF, and OA blocks are combined analytically to construct the model of the complete system. The input-referred noise and the power consumption of the total system are given by
\[ V_{nT}^2 = V_{n1}^2\left( G_{m1}, G_{m2} \right) + \frac{V_{n2}^2\left( G_{m3}, G_{m4} \right)}{A_1^2} + \frac{V_{n3}^2\left( G_{m5}, G_{m6} \right)}{A_1^2} + \frac{V_{n4}^2\left( G_{m7}, G_{m8} \right)}{A_1^2}, \tag{23} \]
\[ P_T = P_1\left( G_{m1}, G_{m2} \right) + P_2\left( G_{m3}, G_{m4} \right) + P_3\left( G_{m5}, G_{m6} \right) + P_4\left( G_{m7}, G_{m8} \right). \tag{24} \]
$A_1$ is the gain of the preamplifier. $V_{n1}(G_{m1}, G_{m2})$ is the thermal noise model for the PA block, $V_{n2}(G_{m3}, G_{m4})$ is that for the IN block of the phase demodulator, and so on. It is to be noted that $V_{n2}(G_{m3}, G_{m4})$ need not be constructed again; it is the same model as $V_{n1}(G_{m1}, G_{m2})$, evaluated at different arguments. The same holds for $V_{n3}(G_{m5}, G_{m6})$ and $V_{n4}(G_{m7}, G_{m8})$. This reusability principle is applied for the power consumption models of all the blocks. The sensor node parasitics $P_a = P_a(G_{m1})$ is the same as the input parasitics of the preamplifier. It is to be noted that, while constructing the high-level performance model of a complete system, the interactions between the transistors are captured at the component level through the SPICE simulation data, and the coupling between the blocks is considered through the analytical equations. A small sketch of this analytic combination is given below.
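In the sketch, the hypothetical callables noise_pa and power_pa stand in for the reused PA-block LS-SVM models; the function name is ours.

```python
# Analytic combination sketch per (23)-(24): evaluate the reused PA-block models
# at each block's transconductance pair and combine them into system-level
# input-referred noise and total power.
def system_noise_power(Gm, A1, noise_pa, power_pa):
    """Gm: sequence [Gm1..Gm8]; noise_pa/power_pa: block-level model callables."""
    pairs = [(Gm[0], Gm[1]), (Gm[2], Gm[3]), (Gm[4], Gm[5]), (Gm[6], Gm[7])]
    vn2 = [noise_pa(g1, g2) ** 2 for g1, g2 in pairs]
    vn_total2 = vn2[0] + sum(v / A1 ** 2 for v in vn2[1:])   # (23)
    p_total = sum(power_pa(g1, g2) for g1, g2 in pairs)      # (24)
    return vn_total2 ** 0.5, p_total
```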

Optimization Problem Formulation and Results
With these, the optimization problem for the topology sizing task is formulated as follows:
\[
\begin{aligned}
\text{Minimize} \quad & \omega_1 V_{nT} + \omega_2 P_T + \omega_3 P_a \\
\text{such that} \quad & \left| \left( V_{\mathrm{out}} \right)_{\mathrm{target}} - V_{\mathrm{in}} \frac{G_{m1}}{G_{m2}} \frac{G_{m3}}{G_{m4}} \frac{G_{m5}}{G_{m6}} \frac{G_{m7}}{G_{m8}} \right| \le \epsilon_1, \\
& \left| f_c - \frac{G_{m6}}{2 \pi C_L} \right| \le \epsilon_2, \\
& G_{mi}^{\min} \le G_{mi} \le G_{mi}^{\max}, \qquad C_L^{\min} \le C_L \le C_L^{\max},
\end{aligned} \tag{25}
\]
where the $\omega_i$ are the associated weights. (A sketch of a penalty-based cost function for this problem is given below.)
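As a rough sketch, (25) can be folded into a single penalty-based cost for the GA; the helper names, weights, tolerances, and the models dictionary below are illustrative assumptions building on the earlier sketches, not the paper's implementation.

```python
# Penalty-based cost sketch for (25): weighted objectives plus penalties on the
# sensitivity and cutoff-frequency constraint violations; Gm is [Gm1..Gm8].
import math

def sizing_cost(Gm, CL, models, gain_target, fc_target,
                w=(1.0, 1.0, 1.0), penalty=1e3, eps=(1e-3, 1e3)):
    vn, p = system_noise_power(Gm, A1=Gm[0] / Gm[1],
                               noise_pa=models["noise"], power_pa=models["power"])
    pa = models["parasitic"](Gm[0])                  # sensor node parasitics
    cost = w[0] * vn + w[1] * p + w[2] * pa
    gain = (Gm[0]/Gm[1]) * (Gm[2]/Gm[3]) * (Gm[4]/Gm[5]) * (Gm[6]/Gm[7])
    fc = Gm[5] / (2 * math.pi * CL)                  # Gm6 / (2*pi*CL)
    cost += penalty * max(0.0, abs(gain_target - gain) - eps[0])
    cost += penalty * max(0.0, abs(fc_target - fc) - eps[1])
    return cost
```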
The target output voltage sensitivity of the system (i.e., the total gain of the system) is taken as 145 mV/g and the cutoff frequency as 35 kHz. The synthesis procedure took 181 seconds on a PC with a PIV 3.00 GHz processor and 512 MB RAM. The crossover and the mutation probabilities are taken as 0.85 and 0.05, respectively, determined through a trial and error process. Table 12 lists the synthesized values of the topology parameters obtained from the synthesis procedure.

Validation
To validate the synthesis procedure, we simulate the entire system at the circuit level using SPICE. Exact values of $G_m$ are often not achievable; in such cases, the nearest realizable values are used. An approximate idea of the transistor sizes required to implement the synthesized $G_m$ values is obtained from the large set of data gathered during estimator construction. A comparison between the predicted and simulated performances is presented in Table 13. We observe that the relative error between predicted and simulated performances is acceptable in each case, although for the output sensitivity and the cutoff frequency the error is higher. This is because the circuit-level nonideal effects have not been considered in the topology sizing process while formulating the final cost function and constraint functions. Following conventional practice, this has been done purposefully in order to keep the functions simple and the process smoothly convergent [1, 27]. The acceptability and feasibility of the results are ensured to a large extent, since the utilized models are based on SPICE simulation results. The robustness of the results could, however, be verified by process corner analysis [27].

8. Conclusion

This paper presents a methodology for the generation of high-level performance models for analog component blocks using a nonparametric regression technique. The transistor sizes of the circuit-level implementations of the component blocks, along with a set of geometry constraints applied over them, define the sample space. Performance data are generated by simulating each sampled circuit configuration through SPICE. The least squares support vector machine (LS-SVM) is used as the regression function. The generalization ability of the constructed models has been estimated through a hold-out method and a 5-fold cross-validation method. Optimal values of the model hyper parameters are determined through a grid search-based technique and a GA-based technique. The high-level models of the individual component blocks are combined analytically to construct the high-level model of a complete system. The entire methodology has been implemented in the MATLAB environment and demonstrated with a set of experiments. The advantages of the present methodology are that the constructed models are accurate with respect to real circuit-level simulation results, fast to evaluate, and have a good generalization ability. In addition, the model construction time is low and the construction process does not require any detailed knowledge of circuit design. The constructed performance models have been used to implement a GA-based topology sizing process, demonstrated by considering the interface electronics for a MEMS capacitive accelerometer sensor as an example. It may be noted that multiobjective optimization algorithms [40] can also be used in the proposed approach for solving (25).

Acknowledgment

The first author would like to thank the Department of Science and Technology, Government of India, for partial financial support of the present work through the Fast Track Young Scientist Scheme, no. SR/FTP/ETA-063/2009.