Abstract

With the continuous development of science and technology, people’s lifestyles are becoming increasingly intelligent, especially in transportation. However, an intelligent car runs in a random environment, and its safety can be affected by many factors during operation. Guaranteeing operational safety is essential for the passengers in the vehicle, so it is vital to verify the safety of the smart car’s system. This study proposes an integrated formal method combining stochastic Petri nets and Z (SPZN). Stochastic Petri nets can naturally model the random events that occur while an intelligent car is driving. The schema structure of the Z language can describe concurrent processes and the system states before and after operations at different times, and it can also alleviate the state-explosion problem of Petri nets. With this method, the random events that may occur during operation can be formally modeled, and the subsequent behavior of the vehicle can be analyzed and predicted effectively. Using reinforcement learning, the parameters in the stochastic Petri net can be optimized, which reduces the probability of bad states and ensures the stability and security of the system. Moreover, a case study of an intelligent car modeled with stochastic Petri nets and Z is given. The results show that the method can improve the safety and effectiveness of the smart-vehicle driving system.

1. Introduction

With the rise of Internet of Vehicles technology [1–5], smart cars are gradually being integrated into people’s lives. However, smart cars always operate in an environment full of random factors, such as pedestrians, traffic signs, other vehicles, and changeable weather, so their safety is particularly important. Liu et al. [6] proposed an efficient communication method, FedCPF. Compared with traditional federated learning, this method achieves more efficient communication, faster convergence, and higher test accuracy, improving the security of the Internet of Vehicles system on the communication side. Zhao et al. [7] proposed a digital twin-assisted storage strategy for satellite-terrestrial networks (INTERLINK), which leverages digital twins (DTs) to map satellite networks into virtual space for better communication; by enabling more reliable communication, the safety of smart cars is better guaranteed. Soni et al. [8] developed a novel low-cost sensor system that improves safety in intelligent transportation systems by improving vehicle sensor systems. Martinez et al. [9] found that driving style plays an important role in driving safety and used different algorithms to characterize and identify drivers’ driving styles; analyzing a driver’s driving style can improve safety during driving. Colombo et al. [10] proposed a strategy to dynamically decompose the formal verification problem of a large road network, which can effectively prevent vehicle collisions and improve the safety of autonomous vehicles. To sum up, researchers have improved the security of the Internet of Vehicles from different aspects, but the above methods do not effectively handle the random events that occur while smart cars are driving.

The formal method is an effective way to prove system security, reachability, and effectiveness; it is widely used not only in computer hardware technology [11], software requirement verification [12–14], industrial engineering [15], medicine [16], communication [17], and so on, but in recent years it has also been increasingly used in the field of transportation. Qi et al. [18–20] used Petri nets to model intersections and designed strategies to solve the congestion problem. In addition, they used time-delay Petri nets (TPNs) for intersection modeling, designed corresponding strategies to solve accident-induced congestion, and proposed a method to classify driving behavior at congested intersections. Their work effectively relieves congestion at intersections and enhances the management and safety of urban traffic. In the study by Labadi et al. [21], a stochastic Petri net (SPN) was used to model and analyze public bike-sharing systems, and the results showed that SPN is very suitable for analyzing and simulating such discrete-event systems. Moreover, some researchers have applied formal modeling to the Internet of vehicles (IoV). Liu et al. [22] proposed a formal model based on integrating time Petri nets and Z (TPZN). By applying TPZN to the IoV, the behavior of the IoV can be accurately and formally described; the model is validated with a case study, and the results show that the proposed approach can effectively improve the safety and intelligence of the IoV.

Although the above work has improved the security and intelligence of the IoV, it still suffers from an insufficient ability to describe random events. SPN can describe random events, but it still suffers from the state-explosion problem. To solve this problem, this study proposes a formal modeling approach combining stochastic Petri nets and Z (SPZN). SPZN consists of two parts, SPZN-SPN and SPZN-Z: SPZN-SPN defines the structure and flow of the overall model, and SPZN-Z abstracts the structure and describes the related constraints. Through the abstraction of SPZN-Z, the complexity of the SPN can be effectively reduced, the number of states can be decreased, and state explosion can be avoided. In addition, each transition in an SPN has a transition implementation rate λ. Since λ affects the stability and security of the system, setting the value of λ so that the system has high stability and security is an urgent problem. However, setting λ in practical problems is not easy: Song et al. and Sun et al. [23, 24] used assumed values of λ in their work and did not give a method to determine them. Therefore, this study proposes an optimization method for the transition implementation rate λ: we use the actor-critic algorithm in reinforcement learning to optimize λ for SPZN-SPN, which improves the stability and security of the system.

The rest of this study is organized as follows. Section 2 presents the basics required for this study. Section 3 introduces a formal modeling approach based on SPZN, refines the method, and proposes an optimization method for the transition implementation rate. Section 4 analyzes the model in terms of reachability, boundedness, and safety, and lists the advantages of the model. An example of an SPZN-based smart connected car system with random events is given in Section 5 to verify the effectiveness of the method in this study. Section 6 concludes the study and discusses future work.

2. Preliminaries

In this section, we review intelligent connected vehicles, stochastic Petri nets, the relationship between stochastic Petri nets and Markov chains, the Z language structure, and reinforcement learning.

2.1. Intelligent Connected Vehicle

With the rapid development of computer technology and artificial intelligence, the traditional traffic system is gradually being replaced by the intelligent connected vehicle system that integrates people, vehicles, roads, and clouds, that is, “smart transportation” [25, 26]. Figure 1 shows a conceptual diagram of this vehicle-road-human-cloud integrated system. The relationship between the intelligent connected vehicle (ICV) and its various components is shown in Figure 2; the ICV mainly comprises autonomous vehicles and the Internet of vehicles. The yellow region in the figure, where smart transportation and the Internet of vehicles intersect, is the part to which the smart connected car belongs; its technical architecture is shown in Figure 3. In the infrastructure submodule, sensors play an essential role. Sensors are usually divided into three categories: distance, speed, and image. The distance sensor is mainly radar, and the image sensor is mainly a surveillance camera.

Research on autonomous vehicles can be summarized into three aspects: autonomous driving in high-speed environments, in urban environments, and in special environments. Autonomous driving in high-speed environments is mainly applied to highways, and the difficulty lies in guaranteeing the safety of vehicles and passengers at high speed; much research is still needed to overcome this difficulty. In urban environments, driving speeds are lower, vehicle safety requirements are high, applications are broader, and the prospects are better; however, the urban environment is complex and requires more sensitive perception and control algorithms. Autonomous driving in special environments is mainly used in harsh settings such as military operations, where the reliability of vehicles in harsh environments must be considered first.

For the Internet of vehicles, Ji et al. [26] proposed a new Internet of Vehicles architecture. Its purpose is to promote the rapid development of intelligent connected vehicles, improve road-condition perception and traffic management and control, and lay the foundation for intelligent transportation. As shown in Figure 4, the architecture mainly consists of four layers: the security authentication layer, the data acquisition layer, the edge layer, and the cloud platform layer.

2.2. Stochastic Petri Net Extension

A stochastic Petri net is a kind of advanced Petri net that incorporates time factors and probability [27]. The main difference between a stochastic Petri net and a time Petri net is randomness, so a stochastic Petri net is more suitable for describing uncertain systems. Furthermore, the reachable marking graph of a stochastic Petri net with finitely many places and transitions is isomorphic to a continuous-time Markov chain [28, 29].

A stochastic Petri net is defined as a 5-tuple SPN = (P, T, F, M_0, λ):
(1) P = {p_1, p_2, …, p_n} is a finite and non-empty set of places
(2) T = {t_1, t_2, …, t_m} is a finite and non-empty set of transitions
(3) F ⊆ (P × T) ∪ (T × P) represents the arcs from places to transitions and from transitions to places
(4) M_0 is the initial state vector and represents the number of tokens in each place of the model
(5) λ = {λ_1, λ_2, …, λ_m} represents the set of implementation rates of the transitions

Here, for λ_i ∈ λ, λ_i is a nonnegative real number representing the rate of occurrence when the transition t_i satisfies its firing conditions, and 1/λ_i represents the average firing delay or average service time.

F(x) = P(X_t ≤ x) = 1 − e^(−λx), x ≥ 0, represents the probability that transition t fires within time x [30].

If M[t⟩M′ holds, the transition t is triggered and the state changes from M to M′. The token transfer rule is as follows: for each place p, M′(p) = M(p) − 1 if p ∈ •t − t•; M′(p) = M(p) + 1 if p ∈ t• − •t; and M′(p) = M(p) otherwise.
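As an illustration of this rule, the following minimal Python sketch (not part of the original paper; the place and transition names are invented) fires a single transition under the standard token-transfer semantics:

```python
# Minimal sketch of one firing step of a Petri net, using the standard
# token-transfer rule described above. Place names are illustrative.

def enabled(marking, pre):
    """A transition is enabled if every input place holds enough tokens."""
    return all(marking[p] >= n for p, n in pre.items())

def fire(marking, pre, post):
    """Return the new marking M' after firing: remove tokens from the
    input places (pre-set), add tokens to the output places (post-set)."""
    if not enabled(marking, pre):
        raise ValueError("transition not enabled in this marking")
    new = dict(marking)
    for p, n in pre.items():
        new[p] -= n
    for p, n in post.items():
        new[p] = new.get(p, 0) + n
    return new

# Example: transition t moves one token from p1 to p2.
m0 = {"p1": 1, "p2": 0}
m1 = fire(m0, pre={"p1": 1}, post={"p2": 1})
print(m1)  # {'p1': 0, 'p2': 1}
```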

2.3. SPN to Markov Chain

For a stochastic Petri net, each transition t_i has a rate of occurrence λ_i, where λ_i is a nonnegative real number; the time delay of t_i is then a random variable that obeys an exponential distribution. Because the exponential distribution is memoryless, the reachable marking graph of a stochastic Petri net is isomorphic to a finite Markov chain and satisfies the following definition [30].

Definition 1. For a Markov chain that is isomorphic to a stochastic Petri net, where the Markov chain has n states, we define a transition matrix Q = [q_ij] of order n × n. When i is not equal to j, if there is a transition t_k that makes M_i[t_k⟩M_j, then we have q_ij = λ_k; else q_ij = 0. When i is equal to j, we have q_ii = −Σ_{j≠i} q_ij, where λ_k is the average implementation rate of the transition t_k that makes M_i[t_k⟩M_j.
Figure 5 gives a simple example of a reachable marking graph and its isomorphic Markov chain. From the Petri net in Figure 5(a), the reachable marking graph in Figure 5(b) can be obtained; since the reachable marking graph is isomorphic to a Markov chain, we obtain the Markov chain shown in Figure 5(c). There are 5 states in the reachable marking graph, so there are also 5 states in the corresponding Markov chain.
By constructing the reachable marking graph of the stochastic Petri net, we obtain the n states shared by the graph and its isomorphic Markov chain. The steady-state probability over the states is an n-dimensional vector P = (P(M_1), P(M_2), …, P(M_n)), where P(M_i) is the steady-state probability of marking M_i. According to the Markov process, the following equations hold:
P · Q = 0, Σ_{i=1}^{n} P(M_i) = 1. (5)
By solving this system of equations, we can obtain the steady-state probability P(M_i) for each reachable state M_i.
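The system of equations above can be solved numerically. The following Python sketch is an illustration only: the two-state generator matrix is an invented example, and the solver replaces one balance equation with the normalization constraint and applies Gaussian elimination.

```python
# Sketch: solving P . Q = 0 together with sum(P) = 1 for the steady-state
# vector, as in the system of equations above. Pure-Python Gaussian
# elimination; the 2-state generator matrix here is an assumed example.

def steady_state(Q):
    n = len(Q)
    # We solve Q^T x = 0; the last equation is replaced by sum(P) = 1.
    A = [[Q[j][i] for j in range(n)] for i in range(n)]
    A[-1] = [1.0] * n
    b = [0.0] * (n - 1) + [1.0]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, n):
            f = A[r][col] / A[col][col]
            for c in range(col, n):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    # Back substitution.
    P = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(A[r][c] * P[c] for c in range(r + 1, n))
        P[r] = (b[r] - s) / A[r][r]
    return P

Q = [[-2.0, 2.0], [1.0, -1.0]]  # rates: 2 for s0->s1, 1 for s1->s0
print(steady_state(Q))          # ~ [1/3, 2/3]
```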

2.4. Z Frame Structure

The Z language is a specification language mainly based on first-order predicate calculus. It is a functional language that can easily express the state and behavior of things in mathematical symbols. As shown in Figure 6(a), this structure is called a schema in the Z language; it is the basic description unit and the basic structure of the Z language. A schema includes the name of the schema (S), the declaration part (D), and the assertion part (P), and the assertion part is divided into pre-assertions and post-assertions. The schema is similar to a class in the C++ language, as shown in Figure 6(b): a class has a class name, attributes, and functions, which correspond to the name, declaration part, and assertion part of the schema, respectively.

In fact, the Z language is just a set of prescribed mathematical symbols, and a “program” written in Z is an abstract design of a computer software or hardware system. Therefore, content written in the Z language is not a computer program, nor code that can be compiled and run on a computer. It is meant not for the computer to run but for humans to understand and analyze. Using the Z language, users can understand the modules, data types, and processes of a system so that the system can be analyzed, optimized, verified, and tested. In terms of data abstraction, the Z language has a stronger descriptive ability than Petri nets [31]. Using the Z language, the number of Petri net states can be reduced and state explosion can be avoided.

2.5. Reinforcement Learning

In reinforcement learning, when the agent takes an action according to the policy function in the current state, the environment rewards the agent for that action. There are usually two methods for intelligently controlling the agent: the first is policy learning, and the second is value learning.

Policy learning uses a neural network to approximate the policy function π(a|s), so the neural network is also called a policy network. The parameters in the policy network represent the weights; they are updated and optimized through the actions of the agent and the rewards given by the environment so that the policy network achieves the desired result. Policy learning is represented by policy gradient algorithms, among which the Monte Carlo method is the simplest.

Value learning approximates the value function with a neural network called a value network, q(s, a; w), where s, a, and w represent the state, the action, and the weights of the value network, respectively, and s_t and a_t denote the state and action at time t. The value network predicts the score of each action in the current state; the actual reward of the action is then given by the environment, and gradient descent is used to correct the prediction of future rewards. Common value learning algorithms include the Q-learning algorithm and the deep Q-network (DQN) algorithm. Based on Q-learning, DQN uses a deep neural network to approximate the value function and uses experience replay to train the entire reinforcement learning process.

It should be noted that there are many learning methods in reinforcement learning; in this study, we use the actor-critic method. This method approximates the policy function and the value function with two neural networks: the policy network (actor) is like a gymnast, and the value network (critic) is like a gymnastics referee. In the initial state, the athlete can only make random actions, and the referee can only make random evaluations based on the athlete’s actions and the current state. After each action, the environment gives a corresponding reward, and by aligning its evaluations with these rewards, the referee gradually becomes professional. Meanwhile, the referee evaluates each of the athlete’s actions, and by catering to the referee, the athlete’s actions gradually become standardized, thereby optimizing the policy function. Figure 7 depicts the working principle of the actor-critic method.

In the actor-critic method, a state value function V_π(s) = Σ_a π(a|s) · Q_π(s, a) is defined, where π(a|s) is the policy function and Q_π(s, a) is the action-value function. Because two neural networks are used to approximate the policy function and the value function, respectively, the policy network is π(a|s; θ) and the value network is q(s, a; w), and the approximated state value function is as follows:
V(s; θ, w) = Σ_a π(a|s; θ) · q(s, a; w)

By summing the products of the probability and value of each action, we obtain a measure of how good it is to execute the policy function π in the current state s. The main steps of the method are shown in Figure 8. Among them, the temporal difference (TD) algorithm stages the model training process, which fits the idea of discounted returns in reinforcement learning and can be applied to value learning.
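As a tiny numerical illustration of this weighted sum (the action names, probabilities, and q-values below are invented for the example):

```python
# Sketch of the state-value identity above: V(s) is the sum over actions
# of pi(a|s) * q(s, a). The distribution and q-values are assumed.

def state_value(policy, q_values):
    """Weighted sum of action values under the policy's distribution."""
    return sum(policy[a] * q_values[a] for a in policy)

policy = {"accelerate": 0.2, "decelerate": 0.3, "brake": 0.5}  # pi(a|s)
q_vals = {"accelerate": 1.0, "decelerate": 2.0, "brake": 4.0}  # q(s,a;w)
print(state_value(policy, q_vals))  # 0.2*1 + 0.3*2 + 0.5*4 = 2.8
```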

Comparing the above three reinforcement learning algorithms, their respective advantages and disadvantages are shown in Table 1 [32–34]. DQN, the representative value learning algorithm, can store previous data in an experience pool and break the correlations between samples, which lets it handle complex states and actions as well as correlated data; however, it still suffers from overfitting, low sample utilization, and unstable evaluation. Compared with value learning, policy learning is simpler and converges better, but it has high variance, slow convergence, and a learning step size that is difficult to determine. To further reduce the variance, the actor-critic algorithm was proposed: the state value function is used as a baseline and bootstrapping is used for value prediction, which greatly reduces the variance but introduces bias, the major drawback of the actor-critic algorithm.

3. Modeling with SPZN

To improve the abstraction capability and the ability to describe randomness in intelligent connected vehicle systems, this study integrates the stochastic Petri net with the Z frame and proposes SPZN. Using the randomness of SPN and the abstraction ability of Z, these shortcomings can be effectively remedied. Compared with TPN, SPN, and PZN, SPZN can define and describe the system more effectively.

3.1. SPZN

Definition 2. An SPZN is a tuple SPZN = (P, T, F, M_0, λ, PN, SPN, PZN, ZP, ZT, f_P, f_T), where
(1) P is a set of places, P = {p_1, p_2, …, p_n}
(2) T is a set of transitions, T = {t_1, t_2, …, t_m}
(3) F is a set of arcs, each of which links a place and a transition
(4) M_0 is the initial marking, which describes the initial state of the system
(5) λ is the set of transition implementation rates, the same concept as λ in an SPN
(6) PN = (P, T, F, M_0) is a basic Petri net
(7) SPN = (P, T, F, M_0, λ) is a stochastic Petri net
(8) PZN = (PN, ZP, ZT, f_P, f_T) is a PZN
(9) ZP is the set of places based on Z
(10) ZT is the set of transitions based on Z
(11) f_P is the set of one-to-one map relationships between P and ZP
(12) f_T is the set of one-to-one map relationships between T and ZT
To better express SPZN, the corresponding relationship between SPN and Z is shown in Figure 9. Among them, the implementation rate λ of a transition in SPN can be used as a precondition for the occurrence of the transition, so it corresponds to the pre-assertion in the Z language; in other words, the implementation rate can be reflected in the assertion. In Petri nets, a place is treated as an entity that performs no action, so it has only attributes, corresponding to the declaration part of the Z language, whereas a transition is treated as an action that triggers places, so it has both attributes and functions, corresponding to the declaration part and the assertion part of the Z language.

3.2. Model Refining

The modeling process of an intelligent connected vehicle system with SPZN is shown in Figure 10. First, the vehicle’s driving data, node device information, and other data are obtained from the smart car. While the intelligent connected car system is initialized, the obtained node device information is abstracted, and a Z frame is established for each system node from the node device information; the preconditions, postconditions, and input and output parameters are set, and an SPN model is established for the system’s information transmission process.

According to the above flowchart, the system can be modeled and the corresponding Markov chain can be constructed to calculate the steady-state probabilities. By taking measures to protect the vulnerable parts of the system revealed by the steady-state probabilities, and by adjusting the implementation rate of each transition, the stability of the system can be improved.

3.3. Optimization Method of Parameter λ

In stochastic Petri nets, a transition implementation rate λ_i needs to be set for each transition. From equation (5) of the Markov process, the transition implementation rates affect the system’s steady-state probabilities. In practical problems, however, it is not easy to choose transition implementation rates that improve system stability. In the papers of Song et al. [24] and Sun and Li [23], only assumed transition implementation rates are used, and no method for determining them is given. To this end, we propose an optimization method for the transition implementation rate λ, which uses the actor-critic algorithm to train and optimize λ in SPZN-SPN to improve the stability of the system, as shown in Algorithm 1.

Input: SPN, environment, the reward and punishment rules.
Output: λ.
Let the initial global time be t = 0, and let the SPN contain n places and m transitions.
(1) The reachable marking graph and a Markov chain with n states can be derived from the SPN.
(2) Randomly initialize the transition implementation rates λ and let t = 0.
(3) The nth-order square matrix Q is introduced based on the Markov chain obtained in Step 1 and Definition 1.
(4) According to the Markov chain in Step 1, the steady-state probability vector P is an n-dimensional vector.
(5) According to equation (5), the vector P at the current moment is calculated.
(6) Set the set of good states S_g and the set of bad states S_b. From P, obtain the steady-state probability vector P_g of the good states and the steady-state probability vector P_b of the bad states.
(7) iteration = 0;
(8) While (iteration < 100)
 (1) Observe the current state s_t, i.e., the current steady-state probability vector P. Randomly initialize the policy network π(a|s; θ) and the value network q(s, a; w), and randomly sample an action a_t according to the policy network.
 (2) Execute action a_t. The environment generates the next state s_(t+1) according to action a_t. The reward r_t is calculated according to the reward and punishment rules.
 (3) The policy network randomly samples an action a_(t+1) according to the current state s_(t+1) but does not execute it.
 (4) Evaluate q_t = q(s_t, a_t; w) and q_(t+1) = q(s_(t+1), a_(t+1); w) from the value network.
 (5) Calculate the TD error δ_t = q_t − (r_t + γ · q_(t+1)) according to the TD algorithm, where γ is the discount rate.
 (6) Calculate the value network gradient d_w = ∂q(s_t, a_t; w)/∂w.
 (7) Update the value network: w ← w − α · δ_t · d_w.
 (8) Calculate the policy network gradient d_θ = ∂ log π(a_t | s_t; θ)/∂θ.
 (9) Update the policy network: θ ← θ + β · δ_t · d_θ.
 (10) t++; iteration++;
where α and β correspond to the learning rates of the value network and the policy network, respectively.
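The loop above can be sketched in simplified form. The following Python code is an illustrative assumption, not the paper’s implementation: it replaces the two neural networks with a tabular softmax actor and a table-based critic, and substitutes a toy reward for the paper’s reward and punishment rules.

```python
# Simplified sketch of the actor-critic loop that tunes the transition
# implementation rates. Tabular softmax actor + table-based critic stand
# in for the neural networks; the toy reward (prefer raising rate 0 and
# lowering rate 1) stands in for the paper's rules. All values assumed.
import math
import random

random.seed(0)
rates = [1.0, 1.0]                      # two transition rates (lambda)
actions = [(i, s) for i in range(len(rates)) for s in (+1, -1)]  # 2m actions
theta = [0.0] * len(actions)            # actor parameters (softmax prefs)
value = {}                              # critic: state -> estimated V(s)
alpha, beta, gamma = 0.1, 0.1, 0.9      # learning rates and discount

def softmax(prefs):
    m = max(prefs)
    e = [math.exp(p - m) for p in prefs]
    z = sum(e)
    return [x / z for x in e]

state = tuple(rates)
for step in range(100):
    probs = softmax(theta)
    k = random.choices(range(len(actions)), weights=probs)[0]
    i, sign = actions[k]
    rates[i] = max(0.1, rates[i] + 0.1 * sign)     # execute action a_t
    nxt = tuple(rates)
    # Toy reward: improvement of the "good minus bad" score at t+1 vs t.
    r = (nxt[0] - nxt[1]) - (state[0] - state[1])
    td_err = r + gamma * value.get(nxt, 0.0) - value.get(state, 0.0)
    value[state] = value.get(state, 0.0) + alpha * td_err    # critic update
    for j in range(len(theta)):                    # actor (policy gradient)
        grad = (1.0 if j == k else 0.0) - probs[j]
        theta[j] += beta * td_err * grad
    state = nxt

print(rates)
```

This is only a structural sketch; in the paper’s setting the state is the steady-state probability vector recomputed from equation (5) after each rate change.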

We initialize the global time t = 0. Given an SPN containing n places and m transitions, the implementation rates λ are randomly initialized. The corresponding reachable marking graph and a Markov chain containing n states can be obtained from the SPN. From Definition 1, equation (5), and the Markov chain, the steady-state probability of each state at the current moment can be calculated, giving P = (P(M_1), P(M_2), …, P(M_n)).

In the actor-critic algorithm, there are four elements: state, environment, action, and reward. The environment corresponds to equation (5), and solving equation (5) yields the current state, i.e., the steady-state probability of each state at the current moment. We take changing a transition implementation rate as the action in the algorithm. For each transition implementation rate λ_i, there exist two actions, i.e., increasing and decreasing λ_i by an order of magnitude, so a stochastic Petri net with m transitions has 2m actions in total: A = {a_1+, a_1−, …, a_m+, a_m−}, where a_i+ represents the action that increases the value of λ_i and a_i− represents the action that decreases it. The reward should be part of the environment, but equation (5) does not supply one. Therefore, we define the reward and punishment rules of the environment as shown in equations (7) and (8).

The purpose of this algorithm is to increase the probability of stable states and decrease the probability of nonideal states, so we group the stable states into the set of good states S_g and the undesirable states into the set of bad states S_b; note that S_g ∩ S_b = ∅ and S_g ∪ S_b ⊆ S. By selecting the steady-state probabilities of the corresponding states from P, we obtain the good-state probability vector P_g and the bad-state probability vector P_b.

Because changing a transition implementation rate may cause the probabilities of good and bad states to increase or decrease simultaneously, we stipulate that the higher the probability of a good state at time t + 1 is relative to time t, the greater the positive feedback score given by the environment, and vice versa; likewise, the greater the increase in the probability of a bad state, the greater the negative feedback score, and vice versa. If the probabilities of good and bad states are unchanged, the positive and negative feedback scores remain the same. At time t, the score given by the environment is the difference between the positive feedback score and the negative feedback score. This explains equations (7) and (8) and the reward and punishment rules.
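A minimal sketch of this scoring idea (equations (7) and (8) themselves are not reproduced here; the state names and probabilities below are invented for illustration):

```python
# Sketch of the reward-and-punishment idea described above: the
# environment's score at time t is the gain in good-state probability
# minus the gain in bad-state probability. States/probs are assumed.

def env_score(p_now, p_next, good, bad):
    """Positive feedback for rising good-state probability,
    negative feedback for rising bad-state probability."""
    pos = sum(p_next[s] - p_now[s] for s in good)
    neg = sum(p_next[s] - p_now[s] for s in bad)
    return pos - neg

p_t  = {"M0": 0.5, "M1": 0.3, "M2": 0.2}   # steady-state probs at time t
p_t1 = {"M0": 0.6, "M1": 0.3, "M2": 0.1}   # after changing some rate
print(round(env_score(p_t, p_t1, good={"M0"}, bad={"M2"}), 10))  # prints 0.2
```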

4. Modeling Analysis

4.1. Reachability

Reachability is the most basic dynamic property of Petri nets, and all other properties are defined in terms of it, so determining whether a marking of a Petri net is reachable is crucial.

Definition 3. Let N = (P, T, F, M_0) be a Petri net; if there is t ∈ T that makes M[t⟩M′, then M′ is called directly reachable from M. If there are a transition sequence t_1, t_2, …, t_k and a marking sequence M_1, M_2, …, M_k such that M[t_1⟩M_1[t_2⟩M_2 … [t_k⟩M_k, then M_k is said to be reachable from M. The set of all markings reachable from M_0 is denoted as R(M_0).
By abstracting the above definitions, the algorithm for verifying the reachability of Petri nets as shown in Algorithm 2 can be obtained.

Input: N = (P, T, F, M_0), a marking M.
Output: reachable(M).
 if there is t ∈ T such that M_0[t⟩M
  reachable(M) = true
 else if there are transition and marking sequences such that M_0[t_1⟩M_1 … [t_k⟩M
  reachable(M) = true
 else
  reachable(M) = false.
Generally speaking, to verify that a marking of a Petri net is reachable, one can construct the reachable marking graph. As shown in Figure 5(b), each state of the reachable marking graph is reachable and there is no isolated state, so the Petri net has good reachability.
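The construction of the reachable marking graph can be sketched as a breadth-first search over markings. The following Python code is an illustration with an invented three-place net, not the paper’s model:

```python
# Sketch: checking reachability by enumerating the reachable marking
# graph. Transitions are (pre, post) token maps over integer-indexed
# places; the tiny net below is an assumed example.
from collections import deque

def reachable_markings(m0, transitions):
    """BFS over markings: R(M0) is the set of all markings
    reachable from the initial marking M0."""
    seen, queue = {m0}, deque([m0])
    while queue:
        m = queue.popleft()
        for pre, post in transitions:
            if all(m[p] >= n for p, n in pre.items()):  # transition enabled?
                m2 = list(m)
                for p, n in pre.items():
                    m2[p] -= n
                for p, n in post.items():
                    m2[p] += n
                m2 = tuple(m2)
                if m2 not in seen:
                    seen.add(m2)
                    queue.append(m2)
    return seen

# Places indexed 0..2; t1 moves a token p0->p1, t2 moves p1->p2.
ts = [({0: 1}, {1: 1}), ({1: 1}, {2: 1})]
R = reachable_markings((1, 0, 0), ts)
print((0, 0, 1) in R)  # True: the final marking is reachable from M0
```

Note that this enumeration terminates only when the reachable set is finite, which is where the state-explosion problem mentioned earlier arises.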

4.2. Boundedness and Safety

For any Petri net, Definition 4 and Definition 5 are used to verify its boundedness and safety.

Definition 4. Let N = (P, T, F, M_0) be a Petri net; if there is a positive integer B such that M(p) ≤ B for all M ∈ R(M_0), then the place p is called bounded, and the smallest positive integer B that satisfies this condition is called the bound of place p, denoted as B(p). When B(p) = 1, the place p is said to be safe.

Definition 5. Let N = (P, T, F, M_0) be a Petri net; if every place p ∈ P is bounded, N is called a bounded Petri net. We call B(N) = max{B(p) | p ∈ P} the bound of N. When B(N) = 1, we call N safe.
By abstracting Definition 4 and Definition 5, the algorithm for verifying the boundedness and safety of Petri nets, shown as Algorithm 3, can be obtained. In it, safe(p) and safe(N) denote the safety of the place p and of the Petri net, respectively; when the value is 0, the object is unsafe, and when it is 1, the object is safe. For the Petri net shown in Figure 5(a), by observing its operation it is not difficult to see that all places are bounded and their bounds are all 1, so they are all safe. According to Definition 4 and Definition 5, the Petri net is safe and bounded.

Input: N = (P, T, F, M_0).
Output: B(p), B(N), safe(p), safe(N).
 if there is a positive integer B such that M(p) ≤ B for all M ∈ R(M_0)
  B(p) = the smallest such B
  if B(p) = 1
   safe(p) = 1
  else
   safe(p) = 0
  if every place p ∈ P is bounded
   B(N) = max{B(p) | p ∈ P}
   if B(N) = 1
    safe(N) = 1
   else
    safe(N) = 0
  else
   N is unbounded
 else
  p is unbounded.
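The idea of Algorithm 3 can be sketched by enumerating reachable markings and taking the maximum token count. The following Python code is an illustration that assumes the reachable set is finite; the toy net is invented:

```python
# Sketch: a net is k-bounded if no reachable marking puts more than k
# tokens in any place, and safe if k = 1. Enumeration assumes a finite
# reachable set. The toy net below is an assumed example.
from collections import deque

def bound_of_net(m0, transitions):
    """Return the maximum token count observed in any place over all
    reachable markings, i.e. B(N); B(N) == 1 means the net is safe."""
    seen, queue = {m0}, deque([m0])
    while queue:
        m = queue.popleft()
        for pre, post in transitions:
            if all(m[p] >= n for p, n in pre.items()):  # enabled?
                m2 = list(m)
                for p, n in pre.items():
                    m2[p] -= n
                for p, n in post.items():
                    m2[p] += n
                m2 = tuple(m2)
                if m2 not in seen:
                    seen.add(m2)
                    queue.append(m2)
    return max(max(m) for m in seen)

# Toy net: a single token circulating p0 -> p1 -> p0.
ts = [({0: 1}, {1: 1}), ({1: 1}, {0: 1})]
b = bound_of_net((1, 0), ts)
print(b == 1)  # True: every place holds at most one token, so the net is safe
```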

4.3. Advantage

In recent research on the security of intelligent connected vehicles, Abu et al. [35] discussed the methods available for attacking intelligent connected vehicles, the solutions that can defend against such attacks, and the efficiency comparison of different solutions. Vijayarangam et al. [36] conducted a more in-depth discussion of the “data fascination attack,” an attack method on intelligent connected vehicles, and proposed a model to detect such attacks and enhance the security of connected vehicles. Additionally, the authors proposed a model to reduce delay in the case of traffic jams. With these two models, ICVs can be protected from data fascination attacks and throughput efficiency can be improved.

The above two studies propose models and methods to improve the security of intelligent connected vehicles under attack; they do not study the safety of intelligent vehicles during operation, whereas the SPZN model we propose can. Since SPN has randomness that TPN lacks, it can vividly describe random events, and various random events unavoidably occur during a vehicle’s operation, so describing them is crucial for smart cars. Notably, the best way to handle random events is to prevent them in advance.

This study also proposes an optimization method for λ, the transition implementation rate in SPN. Through reinforcement learning, a near-optimal value of λ can be quickly found, which improves the security of the system.

Compared with SPN, SPZN also has the Z frame structure that SPN lacks, which can better describe an abstract system. Because the Z frame constrains the places and transitions, it can reduce the number of states in the system, effectively avoiding the state-explosion problem of traditional Petri nets. As shown in Table 2, the advantages of SPZN are displayed more intuitively.

5. A Case Study

To verify the effectiveness of our modeling method and the analysis and verification algorithms, in this section, we simplify the sensor control system of the smart car and consider only the speed control subsystem when the smart car is driving on a straight road where random events may occur, as shown in Figure 11. Suppose a smart car has four radar sensors, two video monitors, an acceleration controller, a deceleration controller, a brake, a front sensor subsystem, a rear sensor subsystem, an acceleration system, a deceleration system, and a braking system.

The first step of model building is to obtain the Z frame of each node. Due to the limited space, only the Z frame in part of the system model is given.

The frame above shows the definition of the Z frame for the smart car in the system, and the blue dashed box in the picture defines the Z frames for the related equipment in the system. The following picture shows the definition of a Z frame for a certain node device in the system; it is also an element of the set of Z-based places.

In the frame below, we define an action in the system. It is also an element of the set of Z-based transitions, corresponding to a transition in SPZN.

In the first step of model building, we obtained the Z frame structure of each node device, place, and transition. The SPN model of the intelligent vehicle speed control subsystem is constructed in the second step. Before constructing the SPN model, we analyzed the process of this subsystem and obtained the flow chart shown in Figure 12. Part of the constructed SPN model is shown in Figure 13.

In the intelligent vehicle speed control subsystem shown in Figure 13, the transition is triggered from the initial place , indicating that the vehicle has been started. Once the vehicle is started, its various sensors are monitored. If a random event occurs, it is handled according to its type; the vehicle speed control subsystem usually has three processing modes: acceleration, deceleration, and braking. After a random event has been handled, the system resumes monitoring the individual sensors. The meaning of each place and transition is given in Table 3.
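To make the dynamics of such an SPN concrete, the following is a minimal simulation sketch in Python. The place and transition names and rates are illustrative assumptions, not the labels of Figure 13; each enabled transition samples an exponential delay from its rate, and the fastest one fires (the usual race policy for stochastic Petri nets):

```python
import random

# Hypothetical marking for the speed-control loop: one token circulates
# between monitoring, event handling, and recovery.
places = {"monitoring": 1, "event_detected": 0, "handled": 0}

# Transitions: (name, firing rate, input places, output places).
# Names and rates are assumptions for illustration only.
transitions = [
    ("detect_event", 2.0, ["monitoring"], ["event_detected"]),
    ("accelerate",   1.0, ["event_detected"], ["handled"]),
    ("decelerate",   1.5, ["event_detected"], ["handled"]),
    ("brake",        0.5, ["event_detected"], ["handled"]),
    ("resume",       3.0, ["handled"], ["monitoring"]),
]

def enabled(t):
    """A transition is enabled when every input place holds a token."""
    return all(places[p] >= 1 for p in t[2])

def step(rng):
    """Fire one transition under the race policy: every enabled
    transition draws an exponential delay; the smallest delay wins."""
    candidates = [t for t in transitions if enabled(t)]
    if not candidates:
        return None
    _, t = min((rng.expovariate(t[1]), t) for t in candidates)
    for p in t[2]:
        places[p] -= 1
    for p in t[3]:
        places[p] += 1
    return t[0]

rng = random.Random(42)
trace = [step(rng) for _ in range(6)]
print(trace)
```

Which of the three handling transitions fires is random, but the overall cycle detect, handle, resume is fixed by the net structure, mirroring the monitoring loop described above.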

It can be seen from Figure 13 and Table 3 that there are mainly two situations in this case: the normal driving of the vehicle and the random event of the vehicle. Therefore, we will discuss the above two situations separately.

5.1. The Normal Driving Condition of the Vehicle

In this subsection, only the normal driving of the smart car is considered; that is, no random events occur in front of or behind the vehicle. Therefore, it is straightforward to obtain the SPN model of the normal driving of the car, as shown in Figure 14; the meanings of the corresponding places and transitions are given in Table 4.

It is easy to construct the reachable marking graph from the constructed SPN model. Because the reachable marking graph of an SPN is isomorphic to a Markov chain, the Markov chain can be used to calculate the steady-state probability of each possible state, and related mathematical methods can be used to analyze the equilibrium state and dynamics of the system. Therefore, we constructed the isomorphic Markov chain shown in Figure 15 from the reachable marking graph.

It can be seen from Figure 15 that the Markov chain contains 5 states, so a 5 × 5 transition matrix can be obtained from Definition 1, as shown below. Assume that there is a steady-state probability vector ; then the following equation can be obtained from equation (5):

Assume that the values of the transition implementation rates are as shown in Table 5; then we can obtain the steady-state probability of each state, as shown in Table 6.
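The steady-state computation described here can be sketched in a few lines of Python. The generator matrix below uses assumed rates and an assumed ring-shaped chain structure, not the actual matrix or Table 5 rates of this case; it solves the balance equations πQ = 0 together with the normalization Σπ = 1:

```python
import numpy as np

# Assumed firing rates for a 5-state chain (illustrative, not Table 5).
lam = [2.0, 1.0, 1.5, 1.0, 1.0]

# Hypothetical generator matrix Q of a CTMC in which state i moves to
# state (i+1) mod 5 at rate lam[i]; each row of a generator sums to zero.
n = len(lam)
Q = np.zeros((n, n))
for i in range(n):
    Q[i, (i + 1) % n] = lam[i]
    Q[i, i] = -lam[i]

# Steady state: pi @ Q = 0 subject to sum(pi) = 1.  Replace one balance
# equation with the normalization condition to get a unique solution.
A = np.vstack([Q.T[:-1], np.ones(n)])
b = np.zeros(n)
b[-1] = 1.0
pi = np.linalg.solve(A, b)
print(pi)  # steady-state probability of each state
```

For this ring-shaped chain the result is proportional to the reciprocal rates (slower transitions accumulate probability), which already illustrates the paper's point that the implementation rates directly shape the steady-state distribution.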

From Table 6, the steady-state probability of each state under the corresponding transition implementation rates can be obtained. For a state favorable to system stability, the steady-state probability should be as large as possible; for a state unfavorable to system stability, it should be as small as possible. If a state with a small probability is a key state of the system, relevant physical means can be used to increase its probability and thereby improve the stability of the system; for example, the vulnerable parts of the vehicle should be inspected and repaired regularly. In addition, the probability of a state can also be changed by changing the implementation rates of the transitions. The steady-state probability of state is 0.4651, which ranks first among all states. This state means that the vehicle is running normally, so there is no need to adjust the transition implementation rates of the SPN. However, it should be noted that state means that the normal information of the sensor in front of the vehicle is processed first, while state means that the normal information of the sensor behind the vehicle is processed first. Although the two have the same effect, their steady-state probabilities differ, because the corresponding transition implementation rates and are different. It is not difficult to see that different transition implementation rates have a significant impact on the steady-state probabilities.

5.2. The Random Event of the Vehicle

In this subsection, because random events in front of the vehicle and random events behind the vehicle are essentially the same, only random events occurring in front of the vehicle are considered, and corresponding measures are taken according to the particular random event that occurs. The constructed SPN model is shown in Figure 16, the meanings of its places and transitions are given in Table 7, and the corresponding Markov chain is shown in Figure 17.

It can be seen from Figure 17 that the Markov chain contains 8 states, so an 8 × 8 transition matrix can be obtained from Definition 1, as shown below. Assume that there is a steady-state probability vector ; then the following equation can be obtained from equation (5):

Assume that the transition implementation rates are as shown in Table 8; then, according to equation (12), the steady-state probabilities shown in Table 9 can be obtained.

It can be seen from Table 9 that the steady-state probability of state is 0.2907, the largest among all steady-state probabilities. According to Table 10, this state is the vehicle initialization state. For a smart car, the initialization state is obviously not a state we want to maintain for a long time, but neither is it an unacceptable state; it is neither a good state nor a bad state. For the good states in the system, the steady-state probability needs to be increased, and for the bad states, it needs to be reduced, thereby improving system stability.

We keep the other transition implementation rates unchanged and change only the value of the transition implementation rate ; the resulting trend of the steady-state probability of each state is shown in Figure 18. It can be seen that as increases, that is, as the starting trigger rate of the vehicle increases, the steady-state probability of state decreases rapidly, while the probabilities of the other states increase, each by a different magnitude. The steady-state probability of state increases the most in this figure, since this state is directly affected by the transition implementation rate . It is not difficult to see that when , the steady-state probability of each state tends to be stable, and the steady-state probability of state is the highest. This state means that the vehicle is running normally, which is the state we want to reach and maintain for a long time, so increasing the value of helps to maintain the stability of the system.
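The sensitivity experiment described here can be reproduced in miniature: sweep one rate while holding the others fixed and recompute the steady state each time. The ring-shaped chain structure and the rates below are illustrative assumptions, not the model of Figure 16:

```python
import numpy as np

def steady_state(rates):
    """Steady state of a hypothetical ring CTMC: state i moves to
    state (i+1) mod n at rates[i].  Solves pi @ Q = 0, sum(pi) = 1."""
    n = len(rates)
    Q = np.zeros((n, n))
    for i, r in enumerate(rates):
        Q[i, (i + 1) % n] = r
        Q[i, i] = -r
    A = np.vstack([Q.T[:-1], np.ones(n)])
    b = np.zeros(n)
    b[-1] = 1.0
    return np.linalg.solve(A, b)

base = [1.0, 1.0, 1.5, 1.0, 1.0, 1.0, 1.0, 1.0]  # assumed rates, 8 states
probs0 = []
for lam1 in [0.5, 1.0, 2.0, 4.0, 8.0]:
    pi = steady_state([lam1] + base[1:])  # vary only the first rate
    probs0.append(pi[0])
    print(f"lam1={lam1:4.1f}  pi[0]={pi[0]:.4f}")

# pi[0] falls monotonically as lam1 grows: the faster the start
# transition fires, the less time the chain spends in state 0.
```

The printed column reproduces the qualitative trend of Figure 18: raising one firing rate drains probability from its source state and redistributes it over the rest of the chain, with the curves flattening out once the rate dominates.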

We know that in normal driving, smart cars need to prevent and avoid random events as much as possible. Whether it is a random event inside the car, such as brake failure or high engine temperature, or a random event outside the car, such as emergency braking of the car in front or pedestrians passing by, it can have a huge impact on the normal driving of smart cars. Random events outside the car cannot be predicted in advance, but different measures can be formulated to guard against different random events. For random events inside the car, protective facilities can be installed in the relevant parts of the car, and relevant preventive measures can be taken.

The above experiments show that the transition implementation rates have a significant impact on the steady-state probabilities. Therefore, we use a reinforcement learning method to optimize the transition implementation rates and apply corresponding preventive measures and solutions to the system according to the optimized rates, which can effectively improve the security of the system in a stochastic dynamic environment.

In this case, there are a total of 7 states, of which the set of good states contains only and the set of bad states contains only . According to the previous definitions of the action, environment, good state set, and bad state set, Table 11 can be obtained.

After the basic parameters are set, we again use the data in Table 8 to initialize the transition implementation rates λ. The steady-state probability vector of the states corresponding to these rates is . After simulation with Algorithm 1, the optimized vector is obtained. It is not difficult to see that the steady-state probability of state has increased from 0.2180 to 0.3297, and that of state has decreased from 0.1246 to 0.0733, successfully increasing the steady-state probability of the good state and reducing that of the bad state. The optimized transition implementation rates are shown in Table 12.
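The paper's Algorithm 1 is a reinforcement learning procedure whose details are not reproduced here. As a plainly labeled stand-in, the sketch below uses simple stochastic hill-climbing over the rate vector with the same objective: raise the steady-state probability of an (assumed) good state while lowering that of an (assumed) bad state. The chain structure, the rates, and the indices `GOOD` and `BAD` are all illustrative assumptions:

```python
import numpy as np

def steady_state(rates):
    """Steady state of a hypothetical ring CTMC: state i moves to
    state (i+1) mod n at rates[i].  Solves pi @ Q = 0, sum(pi) = 1."""
    n = len(rates)
    Q = np.zeros((n, n))
    for i, r in enumerate(rates):
        Q[i, (i + 1) % n] = r
        Q[i, i] = -r
    A = np.vstack([Q.T[:-1], np.ones(n)])
    b = np.zeros(n)
    b[-1] = 1.0
    return np.linalg.solve(A, b)

GOOD, BAD = 2, 5  # assumed indices of the good and the bad state

def score(rates):
    """Objective: good-state probability minus bad-state probability."""
    pi = steady_state(rates)
    return pi[GOOD] - pi[BAD]

rng = np.random.default_rng(0)
init = np.ones(8)                      # assumed initial rates
best, best_score = init, score(init)
for _ in range(300):
    # perturb all rates multiplicatively; keep the candidate if it helps
    cand = np.clip(best * rng.normal(1.0, 0.1, size=best.size), 0.1, 10.0)
    s = score(cand)
    if s > best_score:
        best, best_score = cand, s

print(f"score improved from {score(init):.4f} to {best_score:.4f}")
```

An RL formulation would replace the blind perturbation with a learned policy over rate adjustments, but the interface is the same: each candidate rate vector is evaluated through the steady-state distribution it induces, exactly as the optimized rates of Table 12 are evaluated through Tables 8 and 9.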

By comparing the optimized transition implementation rates with the previous ones, we find that , , and increase, , , and decrease, and , , , and remain unchanged. According to the meaning of each transition implementation rate, on the basis of the original vehicle settings, increasing the vehicle start trigger rate and the vehicle brake trigger rate helps to improve system stability. Prompt maintenance after a car crash also contributes to the stability and safety of the vehicle.

6. Conclusions

In this study, a formal modeling and verification method for smart cars based on stochastic Petri nets and the Z language framework is proposed. With this method, stochastic events during the driving process of smart cars can be better simulated. In addition, reinforcement learning is used to optimize the transition implementation rates of the proposed model, improving the safety of smart connected cars. A case study is given to explain and evaluate the method in detail. Although the method handles the stochastic nature of events and the abstraction of the system well, it considers only the speed control system of a single smart car driving on a straight road and does not consider the coordinated control of multiple smart vehicles; the complexity and size of such a model would not be conducive to modeling and verification with Petri nets. How to abstract and simplify the model will be our future work.

Data Availability

The experimental data used to support the findings of this study are included within the article.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this study.

Acknowledgments

This work was supported by the National Natural Science Foundation of China under Grants 62273065 and 61903053 and the Science and Technology Research Program of Chongqing Municipal Education Commission in China under Grants KJZD-K201800701 and KJQN201900702.