Abstract

A real-time infrastructure, called MLRTI, is proposed in this paper to fulfill the requirements of real-time simulation in a distributed environment. There are two novel contributions in this work. Firstly, a flexible timing mechanism is proposed that integrates an external time source with the local timer utility, enabling the distributed nodes to advance their timelines simultaneously at different speeds with high precision. A data transmission solution is also presented in which the reflective memory card (VMIC) is employed to provide fast data transmission with minimum delay. Secondly, a system partition schema is proposed in MLRTI to reduce the solution errors introduced by transforming a continuous system into a distributed system. This transformation is common in a class of control applications where the system is designed as a centralized model but simulated in a distributed environment, either because of constraints on the system structure or because of the need to balance the computation load. Experiments are conducted, and the results show that this schema effectively reduces the possible errors by properly partitioning the system into parts that are suitable for deployment in a distributed environment.

1. Introduction

Real-time simulation has been applied in many domains such as defense [1, 2], aerospace systems [3, 4], embedded automotive electronics [5], and so forth. These domains often involve hardware-in-the-loop or human-in-the-loop applications, where the response speed to control signals/commands is critical and the model equations must be solved within limited time intervals. For example, in a robotic control system, the vision system needs to respond with low latency because the produced image is required within the feedback loops of the robotic application [6].

Sometimes a real-time system has to be deployed in a distributed environment. One case is that the system’s structure is scattered; for example, in a training system the human-operated device is located at one place while the controller is at another. Another case is that the scale of the system is large, so it has to be distributed to even out the computation load on each node [7]. As a result, a real-time infrastructure for distributed simulation must handle the following issues.

(i) Time advancements on distributed nodes must be synchronized. There are often multiple subsystems coexisting, and each has different timing requirements. For example, in combat simulation, virtual entities created by computer may not need to interact with the human operator directly. Thus their time advancement step can be longer (e.g., 50 milliseconds), and an occasional delay in time advancement would not lead to fatal failure. On the other hand, nodes operated by human beings (e.g., a manned flight simulator) need strict real-time performance, for which a simulation step of 1~5 milliseconds or shorter is needed.

(ii) The data between subsystems must be exchanged in a timely manner. The performance of data transmission is affected by multiple factors like bandwidth, data load, topology, and so forth. Unpredictable delays among simulation nodes are unacceptable in critical real-time applications.

(iii) A common but rarely discussed issue is the transformation problem that arises when constructing the distributed system. In continuous system applications, the original system is normally designed and tested in a centralized manner; that is, the system is constructed as a whole and tested on a standalone computer, and the data transmission does not cross the network. When such a system is deployed in a distributed environment, solution errors, that is, differences between the state trajectories of the two systems, can be introduced.

There do exist formal approaches like QSS [8–10] to transform a continuous model/system into one that is suitable for a distributed environment. However, QSS requires a quantization operation, named “hysteretic quantization,” to hold the outputs of submodels until some predefined thresholds are crossed. This operation needs to modify the system models, which leads to problems when too many models exist or when modifications are not allowed at all for reasons of classification. Additionally, the QSS approach does not discuss the situation where the model resolver step is different from the distributed simulation step, which is the common case in real-world applications.

In this paper, a multiple layer real-time infrastructure (MLRTI) is proposed to address these problems. MLRTI highlights the following three characteristics: (i) high precision global timing capability achieved by integrating an external timing source and the local timer; (ii) low latency data transmission, together with a publish/subscribe mechanism borrowed from the High Level Architecture (HLA) (IEEE standard 1516) to specify the data exchanging map between components; (iii) a novel partition schema to reduce the solution errors (for control systems) incurred by transforming the centralized system into the distributed one.

The content of this paper is organized as follows. In Section 2, the structure of MLRTI is introduced. Two critical characteristics, the high precision timing and the low latency data communication, are discussed in detail, and a publishing and subscribing mechanism is presented to form the data exchanging map in a distributed environment. In Section 3, the system partition schema is given to minimize the possible solution errors caused by transforming a centralized system into a distributed system. Comparative experiments are conducted to verify its effectiveness.

2. System Structure

MLRTI is designed to allow the distributed parts of a system to work together, and each part can advance time at a different speed. Normally, there are different requirements on timing performance within a system:

(a) the non-real-time (NRT) tasks, like resource management, deployment configuration, and so forth; these tasks are normally performed in the presimulation period and need not advance time in a real-time manner;

(b) the soft real-time (SRT) parts, which advance their times in real time but in which failures (e.g., the inconsistency caused by inaccurate timing or delays in data transmission) are tolerated; examples include the display part or virtual entities that do not interact with human operators directly; the time interval here can be 1 ms–100 ms;

(c) the hard real-time (HRT) parts, including the human-operated equipment or the models whose inputs must be sampled from the external environment; the computation of these parts must be completed within a specified interval (e.g., 1 μs–1 ms), and their times need to be advanced with high precision.

To satisfy all these requirements, a layered infrastructure is proposed, as Figure 1 shows.

The NRT layer, SRT layer, and HRT layer are presented from the top to the bottom, respectively. The differences among layers are distinguished by two factors: the timing precision and the data transmission speed. They are often connected with each other: high data transmission speed helps to improve the timing precision, and high timing precision helps to align data updating interval, thus eliminating the data jitters.

In NRT layer, the network media can be common Ethernet. Currently, the bandwidth of Ethernet can reach 1 Gb/s or more. However, the underlying mechanism of detecting data collision on Ethernet (CSMA/CD according to IEEE802.3) could cause inevitable delays on data transmission. According to our test, the average transmission delay in a commonly configured Ethernet LAN is about 10~15 milliseconds [11].

To reduce this delay, one solution is to measure and compensate for the time delay with a special scheduling mechanism [12]. However, it is not a complete approach. A better solution is to improve the collision detection mechanism of the Ethernet protocol. An excellent example is the PowerLink protocol [13], whose PowerLink stack, called the PowerLink Layer, can replace the TCP/IP and UDP/IP layers seamlessly. This layer realizes fast, real-time data transmission in which a collision prevention mechanism called “Slot Communication Network Management (SCNM)” is employed to synchronize the nodes by polling. In the best case, the minimum interval between sending data packets is about 100 μs, which can meet the requirement of SRT.

As for HRT layer, the requirements on timing and data transmission are higher. Specialized hardware needs to be employed to meet such requirement. High precision time sources, like GPS and BDS (BeiDou Navigation Satellite System), can be introduced into the simulation for high precision synchronizing signal. Additionally, reflective memory card (VMIC) connected with optical fiber provides a mechanism for high speed, low latency data transmission capability (e.g., GE PCI-5565piorc’s data rate ≥ 170 Mb/s; time delay can reach nanosecond level). The specialized hardware and protocol enable CPUs not to be involved in data sending/receiving process; the big bandwidth of optical fiber also contributes to the high performance on data communication.

2.1. Flexible Timing Solution

As mentioned previously, precise timing is one of the key factors of a real-time system. Time synchronization is important to maintain a correct timeline and causal relationship between nodes. Two issues need to be addressed [14]:

(a) absolute synchronization: simulation time needs to align precisely with external time in the real world, that is, the physical time; it is important in hybrid simulation like the military drill STOW 99 held in the US, where real military forces (human and equipment) and virtual forces interact with each other;

(b) relative synchronization: the distributed nodes also need to align their own times during simulation.

Relative synchronization is easier to realize. Timer utilities provided by the operating system (OS) or other commercial software can be used to fulfill it. Four utilities are introduced here.

The first one is the timer utility provided by the OS. For example, the multimedia timer provided by Windows can provide periodic timing with an accuracy of 1 ms, and its built-in error compensation mechanism can check and remove accumulated timing error.

The second one is the crystal oscillator clock embedded on computer’s motherboard, which could provide more accurate time signal with nanoseconds interval. However, the precision of this signal will vary with motherboard type, manufacturing technology, or working temperature.

The third one is the commercial software RTX (real-time extension), which is designed to overcome the lack of real-time capability of the Windows operating system. By supplying its own clock signals, RTX can provide time intervals of 1 ms, 1 μs, or the one-tick interval produced by the computer hardware.

The fourth one is specialized hardware. VMIC is often used in the HRT domain to get high precision timing capability. It can broadcast interrupt signals to the network, and the predefined interrupt function on each node will be invoked to process them.

The former two utilities are low cost since they are easy to access. However, their performance is limited. The latter two provide improved performance at the cost of an expensive investment.

External time source needs to be imported into system when absolute synchronization is required. The most convenient way is to use timing signals from global navigation satellite system (GNSS). However, it is difficult to get high resolution time signals from such system due to cost or authorization reason. The common resolution is 1 s [15].

As a result, a “Master-Slave” structure is proposed here to manage the timing in distributed manner, as Figure 2 shows.

The master node is responsible for timing control throughout the system. Its time is pinned to the external time source, and it produces synchronization signals. The signals are broadcast to the whole network, as Figure 2 shows. The slave nodes keep listening for the signals. Each slave node first checks the received signal. If the signal is intended for it, the slave node carries out the following process: (i) update its local time; (ii) read in the latest data from its source node (if any exists); and (iii) do the computation and update its outputs. When using a VMIC device, this process is done via interrupt functions. The timing precision is ensured by both the external time source and the local timer.

To maintain different time advancement speeds on simulation nodes, two information arrays are maintained in the master node: (i) the time advancement request array, in which the request from each slave node specifies the next time point it needs to advance to; (ii) the priority array, which records a predefined priority sequence by which the distributed nodes are allowed to advance their time one by one. A complete procedure is described as follows.

(1) The master node polls the request array and picks the nearest next time point. The slave nodes that sent this request (there could be multiple nodes asking to advance to the same time point) are recorded in a set variable.

(2) The master node re-sorts this set in descending priority order and then sends the interrupt signal to each item when its time point is reached.

(3) The slave nodes that are allowed to advance their time do the model computation within the specified time interval and then send a new time advancement request to the master node. The next time point is communicated in this way.

(4) The master node keeps waiting until all time advancement requests (from the nodes in this set) have arrived; then the request array is updated. If some node does not respond in time, the master node has three options: (a) ignore the delay and keep going; (b) warn about the delay and ask the user whether to continue; (c) warn about the delay and stop the simulation. The choice is made by the user.

(5) Go back to step (1).
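The polling procedure can be illustrated with a minimal single-process sketch. It is not the MLRTI implementation: the names `Master`, `request`, `next_round`, and `run` are hypothetical, time is counted in integer ticks, interrupt signals are replaced by direct function calls, and every slave is assumed to answer in time (so the three timeout options of step (4) are not modeled).

```python
from dataclasses import dataclass, field


@dataclass
class Master:
    # Priority array: node name -> priority (larger value advances first).
    priority: dict
    # Request array: node name -> next time point the node asked for.
    requests: dict = field(default_factory=dict)

    def request(self, node, t):
        """Record a time advancement request from a slave node."""
        self.requests[node] = t

    def next_round(self):
        """Steps (1)-(2): pick the nearest time point and order its requesters."""
        t_next = min(self.requests.values())
        ready = [n for n, t in self.requests.items() if t == t_next]
        ready.sort(key=lambda n: self.priority[n], reverse=True)
        return t_next, ready


def run(master, steps, until):
    """Advance all nodes up to `until`; `steps` maps node -> step size in ticks."""
    log = []
    for node, h in steps.items():          # initial requests
        master.request(node, h)
    while True:
        t, ready = master.next_round()
        if t > until:
            break
        for node in ready:                 # step (2): signal in priority order
            log.append((t, node))          # step (3): slave computes its models ...
            master.request(node, t + steps[node])  # ... and sends a new request
        # step (4): all requests from `ready` have arrived (synchronous here)
    return log


master = Master(priority={"flight_sim": 2, "cgf": 1})
print(run(master, {"flight_sim": 1, "cgf": 5}, until=5))
```

In this example the hypothetical node `flight_sim` advances every tick while `cgf` advances every fifth tick; when both request the same time point, the higher-priority node is signaled first.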

2.2. Data Exchanging Map between Nodes

The data exchange between simulation nodes is defined before the simulation starts. The data is classified into two classes according to importance: “property” and “message.” “Property” refers to data that are periodically produced by models and exchanged during simulation; they are not critical, and a small amount of loss does not lead to fatal consequences for the stability and correctness of the whole system. On the other hand, “message” data are important for notifying critical events and keeping causality correct. In Figure 1, the “messages” are kept and the “properties” are discarded when data exchange happens between real-time domains with different advancing speeds.

To describe the data exchanging between nodes, the concept of “publishing and subscribing” is borrowed from HLA in which “publishing” node means it can produce data to simulation space and “subscribing” node consumes data produced by others.

The publishing and subscribing map is defined as a tree, in which publishing or subscribing a parent node covers all of its child nodes. For example, a “vehicle” node may have child nodes “fighter” and “tank”; subscribing to “vehicle” data means subscribing to both “fighter” and “tank” data. On the other hand, publishing or subscribing a leaf node only triggers the corresponding nodes when that leaf node (data or event) updates itself. This mechanism provides extra flexibility in the description of data exchange.
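The tree-structured map can be sketched as follows. This is only an illustration of the inheritance rule described above, not the MLRTI or HLA API; the class `ClassTree` and its methods, as well as the subscriber names `display` and `radar`, are hypothetical.

```python
class ClassTree:
    """Toy publish/subscribe map organized as a tree of data classes."""

    def __init__(self):
        self.children = {}     # class name -> list of child class names
        self.subscribers = {}  # class name -> set of subscribing node names

    def add_class(self, name, parent=None):
        self.children.setdefault(name, [])
        if parent is not None:
            self.children.setdefault(parent, []).append(name)

    def descendants(self, name):
        """Return `name` followed by all classes below it in the tree."""
        out = [name]
        for child in self.children.get(name, []):
            out.extend(self.descendants(child))
        return out

    def subscribe(self, node, name):
        # Subscribing to a parent class covers all of its descendant classes.
        for cls in self.descendants(name):
            self.subscribers.setdefault(cls, set()).add(node)

    def publish(self, name, payload):
        # An update of class `name` triggers only the nodes subscribed to it.
        return sorted((node, name, payload)
                      for node in self.subscribers.get(name, ()))


tree = ClassTree()
tree.add_class("vehicle")
tree.add_class("fighter", parent="vehicle")
tree.add_class("tank", parent="vehicle")
tree.subscribe("display", "vehicle")  # receives fighter and tank updates
tree.subscribe("radar", "fighter")    # receives fighter updates only
print(tree.publish("tank", "position"))
```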

2.3. Performance Experiments

A distributed HRT environment was built up to test the delays that may exist in timing and data exchange. In our experiment scenario, three computers equipped with VMICs (GE PCI-5565piorc) are connected with optical fiber. Time advancing mechanism takes “Master-Slave” structure, and the local timer of master node is enhanced by RTX middleware working with Windows XP.

The performance of relative synchronization is tested in this scenario. On each timing step, delay comes from three sources: (i) the timing deviation of the RTX timer, (ii) the data transmission delay caused by the VMIC’s ring network protocol, and (iii) the response latency of the VMIC interrupt function after receiving the interrupt signal.

According to the product specification, the data transmission latency between adjacent VMICs is about 0.4 μs. In Figure 3, 1000 measurements of the RTX timer callback with an interval of 1 ms are recorded. As we can see, the maximum deviation is less than 1.8 μs and the average deviation is 0.3758 μs.

In Figure 4, 1000 measurements of the response latency of the VMIC interrupt function after receiving the interrupt signal are recorded. The maximum latency is less than 23 μs; the average latency is about 13.228 μs.

The average time advancement error between simulation nodes is 0.3758 + 13.228 + 0.4 = 14.0038 μs. For an HRT domain advancing at the millisecond level, this error is about 1.4% of a 1 ms time step (well below 5%) and can be neglected in most cases.

The experiment on timing performance with an external time source is not conducted here, because external time sources like GPS and BDS are technologically mature and have stable timing precision. Their timing error can be added to the time delay inside the simulation (Figures 3 and 4) to obtain the final performance.

3. System Partition Schema

MLRTI guarantees that the distributed system can advance its global time synchronously with high precision and exchange data swiftly with low latency. However, for a class of continuous system applications, the possible errors introduced by the system transformation have not yet been considered. It is common for a control system to be designed in a “centralized” manner; that is, the system is constructed as a whole and tested on a standalone computer. When such a system is deployed in a distributed environment, the new system differs from the original one. In brief, the input/output sequence between the distributed parts can become disordered. This can be explained with a simple example shown in Figure 5, an inverted pendulum control system.

3.1. Problem Description

The inverted pendulum system contains 9 submodels in it, as Figure 5 shows. In nondistributed simulation, all submodels need to be computed and updated on each step, with certain computing order. The principle to determine this order is not to violate data dependence among submodels. To achieve this, the models can be classified into two categories according to their input/output characteristics.

(a) Direct-Feed-Through (DFT) Model. A DFT port is defined as an (input, output) pair in which the output is determined by the current input value. A DFT model owns one or more DFT ports. Assuming a model can be described with three sets of variables (an input set, a state set, and an output set), the DFT model can be expressed through an input function, a state transition function, and an output function. The latest output is determined by the current input and states, which implies a sequential computing order between this DFT model and its preceding models (those that produce its input). A model whose output depends on the current input alone is a special case of the DFT model, in which the state set is empty. Common DFT models include gain, product, sum, derivative, and so forth.

(b) Non-Direct-Feed-Through (NDFT) Model. An NDFT model has no DFT ports. Its output is associated with the current state rather than the current input, which means that the NDFT model can produce output without waiting for the latest input; thus the constraint on the computing order between it and its preceding models is relaxed. Common NDFT models include integrator, input signal, memory, and so forth.
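The two model classes can be written compactly as below. The notation is an assumption introduced here for illustration and may differ from the paper's own symbols: $x_k$, $s_k$, and $y_k$ denote the input, state, and output at step $k$, $\delta$ is the state transition function, and $\lambda$ is the output function; the input function is omitted for brevity.

```latex
% DFT model: the output depends on the current input (and possibly the state)
s_{k+1} = \delta(s_k, x_k), \qquad y_k = \lambda(s_k, x_k)

% Stateless special case of the DFT model (empty state set), e.g. a gain
y_k = \lambda(x_k)

% NDFT model: the output depends on the current state only, e.g. an integrator
s_{k+1} = \delta(s_k, x_k), \qquad y_k = \lambda(s_k)
```

Under this reading, an NDFT model can emit $y_k$ before $x_k$ is available, whereas a DFT model cannot, which is the source of the ordering rules that follow.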

Based on this classification, the rules to determine the computing order can be stated as follows.

(1) For a DFT model, the models that drive its DFT ports should be computed before it.

(2) An NDFT model can be computed in any order, as long as it is computed before the DFT models it drives.
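One way to obtain a feasible order from these two rules is a topological sort in which only the connections that drive a DFT model act as constraints. The sketch below is a simplification under two assumptions: every input of a DFT model is treated as a DFT port, and the four-model loop used in the example is hypothetical (it is not the inverted pendulum system of Figure 5).

```python
from collections import deque


def computing_order(kinds, edges):
    """kinds: model -> "DFT" or "NDFT"; edges: list of (src, dst) connections."""
    # Only connections that drive a DFT model impose an ordering constraint;
    # inputs of NDFT models are ignored (rule 2).
    constraints = [(u, v) for u, v in edges if kinds[v] == "DFT"]
    indegree = {m: 0 for m in kinds}
    successors = {m: [] for m in kinds}
    for u, v in constraints:
        successors[u].append(v)
        indegree[v] += 1
    queue = deque(m for m in kinds if indegree[m] == 0)
    order = []
    while queue:                      # Kahn's algorithm
        m = queue.popleft()
        order.append(m)
        for v in successors[m]:
            indegree[v] -= 1
            if indegree[v] == 0:
                queue.append(v)
    if len(order) != len(kinds):
        raise ValueError("algebraic loop: DFT models form a cycle")
    return order


kinds = {"source": "NDFT", "sum": "DFT", "gain": "DFT", "integrator": "NDFT"}
edges = [("source", "sum"), ("integrator", "sum"),
         ("sum", "gain"), ("gain", "integrator")]
print(computing_order(kinds, edges))
```

The feedback connection from `gain` to `integrator` does not constrain the order because the integrator is an NDFT model, so the loop can be sorted without a cycle error.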

There could be multiple feasible computing orders for a specific system according to the above rules; the red bracketed figures in Figure 5 indicate one of them. In centralized simulation, the system works well by this order; however, the case becomes complex when it is deployed in distributed environment.

An extreme scenario is to deploy each submodel to a separate node. Obviously, the system would still work well as long as the computation order is maintained. However, it is meaningless to maintain a “sequent” computation in a distributed environment. If we want to make full use of the advantage of distributed environment, that is, to compute in parallel, the input/output between distributed models may become disordered. To describe it, the output sequence of each submodel will be analyzed.

In the following analysis, two time symbols are referred to: the time step of the model resolver and the simulation step of the distributed system. In practice, the continuous model is normally implemented as a “discrete time model” with a specific numerical simulation schema, and the corresponding numerical resolver (e.g., the Euler or Runge-Kutta resolver) is employed to compute it. The resolver can use a fixed or variable step size. In this example, a fixed step of  s is used. The simulation step is the globally allocated time interval for each simulation node to compute the models deployed on it. Data exchanges are performed at the end of each simulation step. Without losing generality, we specify that , .

According to the types of model port, there are 4 NDFT and 5 DFT models in the inverted pendulum system as shown in Table 1.

Each model is described by three variable sets . The subscript of them refers to the index of time step, and the superscript refers to the index of submodel (denoted as ). The computation of each model is displayed in Figure 6. The expressions are briefly explained as follows:(i): receiving current inputs “”; it should be noted that there are two special representations on inputs: (a)   means this model has no inputs, and (b)   means the input is not available at computing time;(ii): transiting states from old state to new state , with current input and time interval . A brief convention is employed here; for example, is represented as ; which represents that the state transition is triggered by time interval , not by input; this case only appears when or ; the model can be considered being out of control when ;DFT models which have no internal states; thus their state transitions are omitted in the following analysis;(iii): computing the outputs ; symbol “” represents (a) the internal states if this is a NDFT model or (b) the current inputs if this is a DFT model.

The outputs and state transitions at the first 3 steps are listed in Table 2.

Figure 7 shows the simulation advancing in distributed environment. Each submodel is computed in parallel manner and data exchange happens at the end of each simulation step.

The outputs and states transition of the first 5 (or 6) steps are listed in Table 3.

The outputs of (NDFT model) and (DFT model) are compared here between centralized and distributed scenario, as Figure 8 shows. Firstly, and ’s outputs are all delayed in distributed environment. The 3rd output of in centralized simulation, , appears at the 7th step in distributed simulation. Similarly, the 3rd output of in centralized simulation, , appears at the 6th step in distributed simulation. Secondly, the values of the corresponding outputs in centralized and distributed scenarios are also changed except the delayed time. For example, it is found that by backtracking their data dependence.

The following facts on output delay can be concluded from the above comparison. All models’ outputs would be delayed except the source model (). Delays start from the first step since all nodes need to advance their computation simultaneously. The first input of each node is missing. For DFT model, the missing input would produce invalid output (denoted as “”); for NDFT model, it is equivalent that the model’s dynamics is changing with inertia rather than external stimulation.

When two or more DFT models are cascaded, the delay will accumulate along the cascading path, but NDFT do not. In the situation where NDFT and DFT models are mixed cascaded, the delays can be counted as follows.(a)Back-track NDFT model’s preceding DFT models until to itself (loop) or another NDFT model. There could be multiple trace paths, among which the longest one determines the delays on the output of this NDFT. For example, there are four incoming paths before : two go back to itself, one to , and one to , as Figure 1 shows. The longest one (DFT model cascaded) is the “”; thus ’s output delay is 3 simulation steps.(b)For DFT model, count the maximum delay of its preceding models and then plus one delay produced by itself. For example, the maximum delay before is 3 (contributed by ); thus ’s output delay is .

3.2. Performance Improvement by Proper System Partition

Obviously, accumulated delays are produced when distributed nodes contain only DFT models, which deteriorates the control quality. For example, in the inverted pendulum system, the pendulum is actually out of control during the time when the control signal is delayed by its preceding DFT models. Thus the system solution, that is, the state trajectories, can contain undesired errors. In some cases, this can lead to instability, as Figure 9 shows.

To reduce the delays caused by DFT models, a schema is proposed here to partition the system properly as follows.

(a) Pick one of the NDFT models. Disconnect its outgoing connections and backtrack its preceding DFT models along each incoming path until a different NDFT model is reached. A loop back to the picked model is allowed and kept as a component of this part. There could be multiple tracking paths; thus the NDFT models at which the backtracking stops form a set.

(b) Disconnect the connections between this set of NDFT models and their adjacent DFT models (along the tracking paths). The separated part forms a new integrated NDFT model.

(c) Repeat steps (a) and (b) until the system is partitioned completely.
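The backtracking in steps (a) to (c) can be sketched as a graph traversal. This is a simplified reading of the schema, not the authors' implementation: the function name `partition` and the example model graph are hypothetical, a DFT model reachable from several NDFT models is assigned to the first part that claims it, and DFT models that do not precede any NDFT model are left unassigned.

```python
def partition(kinds, edges):
    """Group each NDFT model with the DFT models that precede it.

    kinds: model -> "DFT" or "NDFT"; edges: list of (src, dst) connections.
    Returns {ndft_model: set of models forming its part}.
    """
    preds = {m: [] for m in kinds}
    for u, v in edges:
        preds[v].append(u)

    parts = {}
    assigned = set()
    for root in kinds:                     # step (c): repeat for every NDFT model
        if kinds[root] != "NDFT":
            continue
        part = {root}                      # step (a): pick an NDFT model
        stack = list(preds[root])
        while stack:                       # backtrack along incoming paths
            m = stack.pop()
            # Stop at NDFT models (another part, or a loop back to the root)
            # and at DFT models already claimed by an earlier part.
            if kinds[m] == "NDFT" or m in part or m in assigned:
                continue
            part.add(m)
            stack.extend(preds[m])
        assigned |= part                   # step (b): cut the part out
        parts[root] = part
    return parts


kinds = {"source": "NDFT", "sum": "DFT", "gain": "DFT", "integrator": "NDFT"}
edges = [("source", "sum"), ("integrator", "sum"),
         ("sum", "gain"), ("gain", "integrator")]
print(partition(kinds, edges))
```

In this small example every connection that crosses a part boundary leaves an NDFT model, so no separately deployed DFT chain remains to accumulate delay.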

Using this schema, the separately deployed DFT models are eliminated; as a result, the accumulated delays are eliminated as well. In Figure 10, the inverted pendulum system is partitioned into 4 parts; the computing order inside each part is determined in the same way as introduced in Section 3.1.

With this partition schema, the control quality of the system in a distributed environment is improved greatly, as we can see in Figure 11. The main benefit of this schema is that it enlarges the range of simulation steps within which the distributed system retains the stability property of the original system. It relaxes the criterion under which the original system can be deployed in a distributed environment without any modifications to the models.

4. Conclusion

A simulation infrastructure, MLRTI, is proposed in this paper to address some practical issues related to real-time simulation. The integrated timing mechanism and the high data transmission speed achieved with specialized hardware guarantee the performance of the real-time infrastructure. Additionally, the system partition schema reduces the possible errors incurred by improper system distribution. With these characteristics, MLRTI is used in the following domains: (a) the virtual combat simulation domain, where computer generated entities act as friendly or rival forces and human pilots fight alongside or against them using simulators; (b) the distributed control domain, where different parts of the system reside on different nodes.

In the future, more theoretical work will be done to improve the system partition schema in two directions: considering the load balance (computation balance and communication balance) requirement in the partition schema, and finding a formal approach to determine the upper bound of the step size in a distributed environment.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.