New Advances in Distributed Control of Large-Scale SystemsView this Special Issue
A Simulation Perspective: Error Analysis in the Distributed Simulation of Continuous System
To construct a corresponding distributed system from a continuous system, the most convenient way is to partition the system into parts according to its topology and deploy the parts on separated nodes directly. However, system error will be introduced during this process because the computing pattern is changed from the sequential to the parallel. In this paper, the mathematical expression of the introduced error is studied. A theorem is proposed to prove that a distributed system preserving the stability property of the continuous system can be found if the system error is limited to be small enough. Then, the compositions of the system error are analyzed one by one and the complete expression is deduced, where the advancing step T in distributed environment is one of the key factors associated. At last, the general steps to determine the step T are given. The significance of this study lies in the fact that the maximum T can be calculated without exceeding the expected error threshold, and a larger T can reduce the simulation cost effectively without causing too much performance degradation compared to the original continuous system.
Engineering problems involve dealing with physical systems. Since most physical laws are described by differential equations, simulation in engineering is in fact related to numerical resolution of differential equations. This is called continuous system simulation and even the components (i.e., the subsystems or submodels) of it are generally described in time discretization manner .
The large engineering system has sustained requirement on distributed simulation for two reasons. First, the continuous growth on system scale and complexity results in intense demand on computing capacity, which can be mitigated in distributed environment by scattering the computing load to networked nodes. Second, the system itself is sometimes geographically fragmented; thus, the distributed structure is needed to be consistent with its topology. Except those constructed in distributed manner from scratch, there are still many classic continuous systems that were constructed in the nondistributed way, in that the system was designed and tested without considering the possible scenario of distributed simulation. The demand to transform such system into the distributed one emerges.
The classic continuous system is characterized by computing its components in pipeline, where a computing sequence is set up and the components are computed one by one within a step; the sequence is determined by the component’s input/output characteristics and the data dependence. By contrast, the computation on each component in the distributed environment will start simultaneously and advance in a parallel manner; the updating data is exchanged periodically through the network for synchronization.
The difference between two computing patterns can be formulized as follows. Suppose a system containing components: . A single component can be represented as , where is the input, is the output, and is the model function. We have . Assuming a computing sequence has been determined in the nondistributed environment, the pipeline computing within one step can be described aswhere is the step size and the subscript refers to the index in . In the pipeline computing, the first component is assumed to get the input from a signal source , and the others get inputs following the data dependence among themselves.
By contrast, the one-step computing in distributed environment can be described aswhere is the advancing step of the distributed system. It should be noted that the computing sequence is not held any more in this case.
There are two differences comparing these two computing processes. First, the computing is synchronous in the nondistributed environment. For each component, the input is updated to time step firstly, and then the output at is computed (the data dependence among components is assumed to be consistent with the computing sequence). On the other hand, the computation is parallel in the distributed environment and each component starts simultaneously. The input of a component will still hold the value of step when the output at needs to be computed. Second, the advancing steps of two environments, and , may be different. In the nondistributed environment, is determined by the equation solver; either fixed or variable step is employed. However, the determination of needs to consider the bandwidth of the underlying network, signal frequency, human perception limitation, and so forth. It is often the case where the whole system advances with a fixed and .
A formal approach capable of partitioning a system into parallel parts and properly handling the above two issues is the key to build a distributed simulation from a nondistributed one. This problem, denoted as the Partitioning Problem, was early studied in the simulation of complex mechanical systems [2, 3] and then became a research focus in the field of discrete event simulation [2, 4, 5] and the decentralized, large-scale control system [6, 7]. However, the proposed approaches all involved extra efforts to reformulize or reengineer the simulated system and did not consider the possible difference on and either. These drawbacks set up barriers to apply these approaches broadly.
In , anovel partitioning approach is proposed using the system’s topology. This approach is easy to perform since most classic simulations have block-diagram structures, but it is “lossy” because the disorder of the component’s input/output data cannot be avoided, thus leading to the error on state trajectory of the resulting distributed system. To reduce the error, a portioning rule was proposed to eliminate the possible cumulative delays caused by improper partitioning, which contribute a lot to the overall error.
This paper will study the mathematical expression of this error. The correlation between the error and the advancing step in distributed environment is revealed. Then, a maximum can be calculated according to the error threshold specified by the system engineer. The significance to find the maximum lies in the fact that a larger will improve the parallelism of the partitioned system by reducing the synchronization frequency between nodes, without causing too much performance degradation at the same time.
2. Literature Review
The Partitioning Problem was early studied in the coupled problem [2, 5] of complex mechanical system simulation, where the fluids, thermal, control, and structure subsystems interacted with each other and formed multi-physical-field system . The partitioning is applied to decompose such system into partitions with physical or computational considerations. The resulting system could separately advance in time over each partition, thus gained high simulation efficiency. However, these studies were focused on the finite element analysis, not for general purpose simulation.
The Partitioning Problem is also critical to build the decentralized, large-scale control system [6, 7]. A graph-based approach was employed in that the system states were connected as vertexes and the coupling strength (measured by weight factor) between any two of them was evaluated. Small weight connections were more likely to form the partitioning edge by which the system was split into relative independent parts. Another methodwas to use the delay differential equations [9, 10] to describe the dynamic of the distributed system. Networked control scheme  was constructed to obtain the convergence and stability in control of the system. However, this approach focuses on the influence of signal delay associated with network latency, rather than the errors caused by the change of computing pattern. In our opinion, such errors will still exist even though the underlying network is perfect and has no delay.
In the Modeling and Simulation (M&S) domain, DEVS (Discrete Event System Specification) theory casts light into the solving of Partitioning Problem. The basic idea was to transform a continuous system into the DEVS form by quantizing the system states. A DEVS system is comprised of a set of connected components (called “atomic model” or “coupled model”). The output of each component will hold unchanged until the values of the states it maintained exceed some predefined quantized level. As a result, the time driving continuous system is transformed into an event driving system; thus, it can be decoupled and partitioned easily [12–14]. The transformed system is suitable for asynchronous, distributed environment in nature; however, the “illegitimacy” phenomenon, where the states may transit for unlimited times within a limited period, often causes the simulation to fail to converge correctly.
As an extension of DEVS, the QSS approach [15–17] was proposed to resolve the “illegitimacy” problem by using a special “hysteretic quantization” method, where the output of each component (or model, as called in DEVS system) will not transit to the next quantization level during its descending period until the change has exceeded certain threshold. QSS provides a general approach for the Partitioning Problem since each QSS model interacts with discrete, states updating events. However, the QSS model needs to be particularly designed to allow states transiting between quantization levels. The changes of states are triggered by unpredicted threshold crossing events in QSS. In other words, the advancing “step” of QSS system is unpredicted and time-varying. This characteristic makes QSS unsuitable for applications such as the Hardware-In-the-Loop (HIL) or Man-In-the-Loop (MIL) simulation, where the system has to advance with some fixed step to align the hardware’s working frequency or to take care of the needs of human perception.
The approaches mentioned above all involve extra efforts to reformulize or reengineer the simulated system. For example, in the graph-based approach, the detailed analysis of the coupling strength between system states requests the engineer to have full knowledge of system dynamics. The QSS approach also needs the system components to be modified to follow the DEVS formulation. Additionally, the possible difference between and was not considered either. These drawbacks set up barriers to apply these approaches broadly.
By contrast, the partitioning approach proposed in  took the advantage of the system’s structure characteristics; no extra work is needed except for decomposing the system according to the data transferring route. However, the incurred error needs to be further studied considering the possible correlation with .
The content of this paper is organized as follows: in Section 3, the previous work is briefly reviewed. In Section 4, the mathematical expression of the errors is given with theorems. In Section 5, a series of steps are concluded to determine the maximum .
3. Problem Description
The components within this system can be classified into two categories: the Direct-Feed-Through (DFT) component and the Non-Direct-Feed-Through (NDFT) component. The output of DFT component is directly associated with its input; that is, the output at time is determined by the input at the same time. On the other hand, the output of NDFT component is only determined by its current state and has nothing to do with the current input. It implies that, when computing a DFT component, the components it depends on should be computed firstly to maintain the correct data dependence. The NDFT component has no such requirement since its current output does not rely on the current input.
With parallel computing in the distributed environment, more deductions can be deduced from the foregoing. First, if part of the system deployed on a separate node formed a NDFT component (both DFT and NDFT component can have recursive structure), its output would be delayed for when updating to other nodes. Second, if two or more DFT components were cascaded, they should not be divided into different nodes; otherwise, the output delay would accumulate from the first DFT component. For example, if the components , , and in Figure 1 were deployed to different nodes, the output of component will be delayed compared to the original system, component is delayed, and component is delayed.
These outcomes have been observed in the experiments of , and a partitioning rule was proposed to reduce the accumulated delays. The rule is simple: the distributed DFT component should be avoided; each partitioned part should be ensured to form a NDFT component. The system performance was improved by applying this rule, as shown in Figure 2.
Although the partitioning rule made the distributed system more close to the original continuous system, there is still one step () delay among each node. This delay cannot be eliminated completely because of the parallelism nature of distributed environment, leading to the system error (denoted as ). To understand the influence of to , the mathematical expression of needs to be formulized.
To simplify the analysis of , an abstract control system containing two components is employed here: one component is the Controller and the other is the Plant. This system is assumed to be Asymptotically Stable. Denoting the original continuous system as and the distributed system as , the controller and plant can be described as differential equations in :where and are system states, and are outputs, and are inputs, and are state functions, and and are output functions. In , the controller and the plant are described aswhere is the discretization operator. This operator indicates the operand updates its value to the external world following the advancing step , just like being discretized.
Perturbation Analysis is used here to analyze the composition of . Perturbation analysis treats as the outcome of disturbances to such as the output delay and discretization. The difference between and can be determined without formulizing the differential equations of both systems and then comparing the eigenvalues. First of all, the system is expressed in the form of perturbation equations:where(i), the error of controller state caused by time delay;(ii), the error of plant state caused by time delay;(iii), the error of plant output caused by time delay;(iv), the error of controller output caused by discretization;(v), the plant output error caused by discretization.
The system error is represented asTo find out the detailed expression of , a preparation theorem is proposed as follows.
4. The Mathematical Analysis
4.1. A Preparation Theorem
Based on the assumption of Asymptotically Stable, the functions of , , , and in (3a), (3b), (4a), and (4b) are further assumed to be Lipschitz Continuous over a region , where is a nonsaturation region defined by and :For convenience, the time symbol is omitted unless necessary. and are nonsaturation regions for and , respectively, that is, the state spaces of the controller and the plant. and are nonsaturation regions for and , respectively, that is, the output spaces of the controller and the plant. With this assumption, a preparation theorem  is given.
Theorem 1 (preparation theorem). For an asymptotically stable as shown in (3a) and (3b), if (a) the functions and are continuously differentiable and (b) a Lyapunov function is defined over an open region containing the original point (assumed to be the equilibrium point), then a can be obtained from , in that all start positions lying in an arbitrary interior region (denoted as ) of are attracted to another arbitrary interior region () of in finite time. and are defined by sections of crossed by two level surfaces.
This theorem is critical because it guarantees that can be derived from a given that is stable. The assumption that the original point is the equilibrium is easy to be satisfied by translation transformation of the state variables.
Proof. Define an auxiliary function:We havewhere and is the Lyapunov function of .
Define the second auxiliary function over a region :where . is continuous since is continuous. Then, we haveand thenA positive real number can be found since is the negative definition to satisfy:Combining (11b), thenAs a result, a region can always be found in the vicinity of the origin of :where is a positive real number defining the radius of . Another positive real number () can be found to satisfy (14) over :Considering inequality (11a), we haveConsider (9), and we haveTo integrate both sides of (16) in ,where is the start point of state trajectory. Inequation (17b) means the state trajectory in is strictly constrained by a diminishing function in . Considering the fact that , we havewhere is the value of where the state trajectory crosses the level surface that defines region . Inequations (18a) and (18b) mean that the state trajectory of will stay inside . Considering , we also havewhere is the value of where the state trajectory crosses the level surface that defines region . Then, the solution trajectory of will go from the start point to within finite time:Inequation (20) means a can be found, whose trajectory will converge to the equilibrium point of eventually within finite time, given CS being asymptotically stable and the system error being constrained as (13) shows.
4.2. The Mathematical Expression of System Error
The mathematical expression of is studied by analyzing each component of it.
is the variation of the controller’s state value within time interval . According to the definition in (5a) and (5b),Integrating both sides of (21b), is bounded in since it is continuously differentiable (one of the conditions in Theorem 1). Then, we havewhere is the upper bound of within .
is the variation of the plant’s state value within time interval . Following a similar approach in the analysis of , we have where is the upper bound of within .
is the variation of the plant’s output within time interval . According to (5a) and (5b), we haveConsidering is Lipschitz Continuous (Theorem 1), we haveConsidering (24), we have where is the Lipschitz Const of .
is caused by discretization on the plant’s output, where the discrete interval is . is bounded by the changing rate of the output, as shown in Figure 3.
Similarly, we have Combining (6), (23), (24), (26b), (28), and (29),where defines the upper bound of . Once an expected is given (denoted as ), can be determined: for any , and the overall error will be less than . Equation (31) also indicates the way to partition will influence : the more the parts partitioned (supposing each part is deployed on a node), the greater the denominator of the right part of (31), thus the smaller that of the allowable .
A distributed simulation is constructed in this section to show the determination of . The original continuous system is a point mass system with fraction as follows:where is spring const, is mass, and is friction coefficient. This system is asymptotically stable; the trajectories of and are shown in Figure 4.
A distributed system can be partitioned from the original system, where the structure has the block-diagram representation (see Figure 5).
The differential equations of these two parts arewhere , are the output of components I and II, respectively. These two parts are deployed on two nodes, forming the simple distributed system.
Equation (31) indicates that the parameters reflecting system dynamic need to be determined firstly, which are as follows::the upper bound of function within any time interval ;:the Lipschitz Const of function ;:the upper bound of function within any time interval ;:the Lipschitz Const of function .From the engineering perspective, these parameters can be directly observed from the simulation results of the original system. In this case, is determined by the lower bound of as Figure 6 shows.
As a result, . According to the definition of Lipschitz Const, is limited by the upper bound of , that is, the upper bound of in this case. Then, we have . Following similar procedures, we have , . According to (31), is estimated as follows:The distributed system described in (33a) and (33b) is simulated. The trajectories of the system states are compared to those of the original continuous system, as Figure 7 shows. The threshold value of is 0.5, and then is estimated to be 0.1055 s according to (34). Three configurations, where s, s, and s, are tested in the form of distributed simulation. As expected, only the case s satisfies that both and do not exceed the threshold value of .
However, the actual error does not exceed the threshold value immediately when is configured to be greater than 0.1 s, as shown in Figures 7(c) and 7(d). The reason is that is not strictly estimated since many associated parameters use their boundary values.
The mathematical expression of helps us to gain the insight of system error produced in the construction of the distributed system using the partitioning approach. Giving an expected threshold of , a proper advancing step of the distributed system can be determined. A larger , compared to the integral step in the nondistributed system, will reduce the data-exchange frequency between simulation nodes, leading to the reduction of demands on system timing performance and network bandwidth. Then, the simulation cost is saved eventually. In fact, this approach can also play a role in multicore or multi-CPU parallel computing environments.
However, for systems that may become unstable after system partitioning, this approach is not so convenient since a Lyapunov function of the continuous system needs to be found firstly. The parameters defining specific regions satisfying (13)–(15) also need to be determined. More work needs to be done to improve this approach’s availability.
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
E. Hairer, S. P. Nørsett, and G. Wanner, Solving Ordinary Differential Equations I: Nonstiff Problems, vol. 8 of Springer Series in Computational Mathematics, Springer, Berlin, Germany, 2nd edition, 1993.View at: MathSciNet
E. Hinton, P. Bettess, and R. W. Lewis, Eds., Numerical Methods in Coupled Problems, Pineridge Press, Swansea, UK, 1981.
R. W. Lewis, Ed., Numerical Methods in Coupled Problems, John Wiley & Sons, London, UK, 1984.
R. W. Lewis, Ed., Numerical Methods in Coupled Problems, Wiley, London, UK, 1984.
E. Kofman, “Quantization-based simulation of differential algebraic equation systems,” Simulation-Transactions of the Society for Modeling and Simulation Internatinal, vol. 79, no. 9, pp. 363–376, 2003.View at: Google Scholar
E. Kofman, J. S. Lee, and B. P. Zeigler, “DEVS representation of differential equation systems: review of recent advances,” in Proceedings of the 13th European Simulation Symposium, pp. 591–595, October 2001.View at: Google Scholar
E. Kofman, Discrete event based simulation and control of continuous [Ph.D. thesis], Universidad Nacional de Rosario, Santa Fe, Argentina, 2003.