Abstract

This paper reviews research on the response time analysis of messages in the controller area network (CAN), from the time the CAN specification was submitted for standardization (1990) and became a standard (1993) up to the present (2012). This research includes worst-case response time analysis, which is deterministic, and probabilistic response time analysis, which is stochastic. A detailed view of both types of analysis is presented here. In addition to these analyses, there has been research on the statistical analysis of CAN message response times.

1. Introduction

The arbitration mechanism employed by CAN means that messages are sent as if all the nodes on the network share a single global priority-based queue. In effect, messages are sent on the bus according to fixed priority nonpreemptive scheduling [1]. In the early 1990s, a common misconception was that although the protocol was very good at transmitting the highest priority messages with low latency, it was not possible to guarantee that the less urgent signals carried in lower priority messages would meet their deadlines [1]. In 1994, Tindell et al. [25] showed how research into fixed priority preemptive scheduling for single processor systems could be applied to the scheduling of messages on CAN. This analysis provided a method of calculating the worst-case response times of all CAN messages. Using this analysis it became possible to engineer CAN-based systems for timing correctness, providing guarantees that all messages and the signals that they carry would meet their deadlines. In 2007, Davis et al. [1] refuted this analysis. They showed that, because CAN effectively implements fixed priority nonpreemptive scheduling of messages, multiple instances of a CAN message within a busy period (a period that begins with a critical instant) need to be considered in order to guarantee that messages and the signals they carry meet their deadlines.

Real-time researchers have extended schedulability analysis to a mature technique which for non-trivial systems can be used to determine whether a set of tasks executing on a single CPU or in a distributed system will meet their deadlines or not [1, 2, 4, 5]. The essence of this analysis is to investigate if deadlines are met in a worst-case scenario. Whether this worst case actually will occur during execution, or if it is likely to occur, is not normally considered [6].

In contrast with schedulability analysis, reliability modelling involves the study of fault models, the characterization of distribution functions of faults, and the development of methods and tools for composing these distributions and models in estimating an overall reliability figure for the system [6].

This separation of deterministic (0/1) schedulability analysis and stochastic reliability analysis is a natural simplification of the total analysis. This is because the deterministic schedulability analysis is quite pessimistic, since it assumes that a missed deadline in the worst case is equivalent to always missing the deadline, whereas the stochastic analysis extends the knowledge of the system by computing how often a deadline is violated [7].

There are many other sources of pessimism in the analysis, including considering worst-case execution times and worst-case phasings of executions, as well as the usage of pessimistic fault models. In a related work [8], a model for calculating worst-case latencies of controller area network (CAN) frames (messages) under error assumptions is proposed. This model is pessimistic, in the sense that there are systems that the analysis determines to be unschedulable, even though deadlines will be missed only in extremely rare situations with pathological combinations of errors.

In [9, 10] the level of pessimism is reduced by introducing a better fault model, and in [9] variable phasings between message queuings are also considered, in order to make the model more realistic. In [11] the pessimism introduced by the worst-case analysis of CAN message response times is reduced by using bit-stuffing distributions in place of the traditional worst-case frame sizes referred to in [6, 7].

The organization of the paper is as follows: in Section 2, the review of the research on Worst Case Response Time Analysis of CAN messages is presented, and in Section 3, the review of the research on Probabilistic Response Time Analysis of CAN messages is presented. In both sections, the method of bit stuffing is reviewed.

2. Worst-Case Response Time Analysis of CAN Messages

In automotive applications, the messages sent on CAN are used to communicate state information, referred to as signals, between different ECUs. Examples of signals include wheel speeds, oil and water temperature, engine rpm, gear selection, accelerator position, dashboard switch positions, climate control settings, window switch positions, fault codes, and diagnostic information. In a high-end vehicle there can be more than 2500 distinct signals, each effectively replacing what would have been a separate wire in a traditional point-to-point wiring loom.

Many of these signals have real-time constraints associated with them. For example, an ECU reads the position of a switch attached to the brake pedal. This ECU must send a signal, carrying information that the brakes have been applied, over the CAN network so that the ECU responsible for the rear light clusters can recognise the change in the value of the signal and switch the brake lights on. All this must happen within a few tens of milliseconds of the brake pedal being pressed. Engine, transmission, and stability control systems typically place even tighter time constraints on signals, which may need to be sent as frequently as once every 5 milliseconds to meet their time constraints [1]. Hence it is essential that CAN messages meet their deadlines.

2.1. Related Work

CAN is a serial data bus that supports priority-based message arbitration and non-pre-emptive message transmission. The schedulability analysis for CAN builds on previous research into fixed priority scheduling of tasks on single processor systems [12].

In 1990, Lehoczky [13] introduced the concept of a busy period and showed that if tasks have deadlines greater than their periods (referred to as arbitrary deadlines) then it is necessary to examine the response times of all invocations of a task falling within a busy period in order to determine the worst-case response time. In 1991, Harbour et al. [14] showed that if deadlines are less than or equal to periods, but priorities vary during execution, then again multiple invocations must be inspected to determine the worst-case response time. We note that non-pre-emptive scheduling is effectively a special case of pre-emptive scheduling with varying execution priority—as soon as a task starts to execute, its priority is raised to the highest level. In 1994, Tindell et al. [12] improved upon the work of Lehoczky [13], providing a formulation for arbitrary deadline analysis based on a recurrence relation.

Building upon these earlier results, comprehensive schedulability analysis of non-pre-emptive fixed priority scheduling for single processor systems was given by George et al. in 1996 [15]. In 2006, Bril [16] refuted the analysis of fixed priority systems with deferred pre-emption given by Burns in [17], showing that this analysis may result in computed worst-case response times that are optimistic. The schedulability analysis for CAN given by Tindell et al. in [25] builds upon [17] and suffers from essentially the same flaw. A similar issue with work on pre-emption thresholds [18] was first identified and corrected by Regehr [19] in 2002. A technical report [20] and a workshop paper [21] highlight the problem for CAN but do not provide a specific in-depth solution.

The revised schedulability analysis presented in [1] aims to provide an evolutionary improvement upon the analysis of CAN given by Tindell et al. in [25]. To do so, it draws upon the analysis of Tindell et al. [12] for fixed priority pre-emptive scheduling of systems with arbitrary deadlines and the analysis of George et al. [15] for fixed priority non-pre-emptive systems. It also presents sufficient but not necessary schedulability tests, which avoid the complexity of calculating the response times of multiple instances of CAN messages within the busy period.

2.2. Bit Stuffing in CAN Messages

CAN was designed as a robust and reliable form of communication for short messages. Each data frame carries between 0 and 8 bytes of payload data and has a 15-bit Cyclic Redundancy Check (CRC). The CRC is used by receiving nodes to check for errors in the transmitted message. If a node detects an error in the transmitted message, which may be a bit-stuffing error, a CRC error, a form error in the fixed part of the message or an acknowledgement error, then it transmits an error flag [22]. The error flag consists of 6 bits of the same polarity: “000000” if the node is in the error active state and “111111” if it is error passive. Transmission of an error flag typically causes other nodes to also detect an error, leading to transmission of further error flags.

Figure 1 illustrates CAN error frames, reproduced from [1]. The length of an error frame is between 17 and 31 bits. Hence each message transmission that is signalled as an error can lead to a maximum of 31 additional bits of error recovery overhead plus retransmission of the message itself [22].

One characteristic of the Non-Return-to-Zero (NRZ) coding adopted on the CAN bus is that the signal provides no edges that can be used for resynchronization when a long run of consecutive bits with the same polarity is transmitted. Bit stuffing is therefore used to ensure synchronization of all bus nodes. This means that during the transmission of a message, a maximum of five consecutive bits may have the same polarity. The bit-stuffing area in a CAN bus frame includes the SOF, Arbitration field, Control field, Data field, and CRC field. Since bit stuffing is used, six consecutive bits of the same polarity (111111 or 000000) are considered an error.

As the bit patterns “000000” and “111111” are used to signal errors, it is essential that these bit patterns are avoided in the variable part of a transmitted message (refer to Figure 3). The CAN protocol therefore requires that a bit of the opposite polarity is inserted by the transmitter whenever 5 bits of the same polarity are transmitted. This process, referred to as bit stuffing, is reversed by the receiver. The worst-case scenario for bit stuffing is shown in Figure 2 [1]. Note that each stuff bit begins a sequence of 5 bits that is itself subject to bit stuffing.
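The stuffing rule described above can be illustrated with a minimal Python sketch. The function names `stuff_bits` and `destuff_bits` are hypothetical, bits are represented as strings of "0" and "1" purely for readability, and the sketch covers only the stuffed region of a frame, not the fixed-form fields.

```python
def stuff_bits(bits: str) -> str:
    """Insert a bit of opposite polarity after every run of 5 equal bits.

    The inserted stuff bit itself starts a new run, so it counts towards
    the next group of 5 equal bits.
    """
    out = []
    run_bit, run_len = None, 0
    for b in bits:
        out.append(b)
        if b == run_bit:
            run_len += 1
        else:
            run_bit, run_len = b, 1
        if run_len == 5:
            stuffed = "1" if b == "0" else "0"
            out.append(stuffed)
            run_bit, run_len = stuffed, 1
    return "".join(out)


def destuff_bits(bits: str) -> str:
    """Reverse stuff_bits: drop the bit that follows every run of 5 equal bits."""
    out = []
    run_bit, run_len = None, 0
    skip_next = False
    for b in bits:
        if skip_next:
            # This is a stuff bit: it is not payload, but it starts a new run.
            run_bit, run_len = b, 1
            skip_next = False
            continue
        out.append(b)
        if b == run_bit:
            run_len += 1
        else:
            run_bit, run_len = b, 1
        if run_len == 5:
            skip_next = True
    return "".join(out)


# Worst-case pattern: five equal bits, then alternating groups of four.
WORST_CASE = "11111" + "00001111" * 2
```

For the 21-bit `WORST_CASE` pattern, `stuff_bits` inserts 5 stuff bits (one after the first five bits, then one after every further four), and `destuff_bits` recovers the original sequence.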

Stuff bits increase the maximum transmission time of CAN messages. After including stuff bits and the interframe space, the maximum transmission time $C_m$ of a CAN message containing $s_m$ data bytes is given by

$$C_m = \left(g + 8 s_m + 13 + \left\lfloor \frac{g + 8 s_m - 1}{4} \right\rfloor\right)\tau_{bit}, \tag{1}$$

where $g$ is 34 for standard format (11-bit identifiers) or 54 for extended format (29-bit identifiers), $\lfloor a/b \rfloor$ is notation for the floor function, which returns the largest integer less than or equal to $a/b$, and $\tau_{bit}$ is the transmission time for a single bit.

The formula given in (1) simplifies to

$$C_m = (55 + 10 s_m)\,\tau_{bit} \tag{2}$$

for 11-bit identifiers and

$$C_m = (80 + 10 s_m)\,\tau_{bit} \tag{3}$$

for 29-bit identifiers.
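As a quick check of the frame-length formula, the following Python sketch evaluates the maximum transmission time from the number of data bytes. The function name `max_tx_time` and its parameters are hypothetical; passing `tau_bit=1` returns the length in bit times.

```python
def max_tx_time(data_bytes: int, tau_bit: float, extended: bool = False) -> float:
    """Maximum transmission time of a CAN data frame, including worst-case
    stuff bits and the interframe space.

    data_bytes -- number of payload bytes (0..8)
    tau_bit    -- transmission time of one bit (any time unit)
    extended   -- True for 29-bit identifiers, False for 11-bit identifiers
    """
    if not 0 <= data_bytes <= 8:
        raise ValueError("a CAN data frame carries 0 to 8 data bytes")
    g = 54 if extended else 34             # control bits exposed to stuffing
    stuffable = g + 8 * data_bytes         # all bits exposed to stuffing
    max_stuff_bits = (stuffable - 1) // 4  # worst-case number of stuff bits
    return (stuffable + 13 + max_stuff_bits) * tau_bit
```

With `tau_bit=1`, an 8-byte frame is at most 135 bits long in standard format and 160 bits in extended format, which agrees with the simplified forms 55 + 10·8 and 80 + 10·8.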

2.3. Scheduling Model

The system is assumed to comprise a number of nodes (microprocessors) connected via CAN. Each node is assumed to be capable of ensuring that at any given time when arbitration starts, the highest priority message queued at that node is entered into arbitration [1].

The system is assumed to contain a static set of hard real-time messages, each statically assigned to a node on the network. Each message $m$ has a fixed identifier and hence a unique priority. As priority uniquely identifies each message, in the remainder of this paper we will overload $m$ to mean either message $m$ or priority $m$ as appropriate. Each message has a maximum number of data bytes $s_m$ and a maximum transmission time $C_m$, given by (1).

Each message is assumed to be queued by a software task, process or interrupt handler executing on the host microprocessor. This task is either invoked by, or polls for, the event and takes a bounded amount of time between 0 and $J_m$ to queue the message ready for transmission. $J_m$ is referred to as the queuing jitter of the message and is inherited from the overall response time of the task, including any polling delay.

The event that triggers queuing of the message is assumed to occur with a minimum interarrival time of $T_m$, referred to as the message period. This model supports events that occur strictly periodically with a period of $T_m$, events that occur sporadically with a minimum separation of $T_m$, and events that occur only once before the system is reset, in which case $T_m$ is infinite.

Each message has a hard deadline $D_m$, corresponding to the maximum permitted time from occurrence of the initiating event to the end of successful transmission of the message, at which time the message data is assumed to be available on the receiving nodes that require it. Tasks on the receiving nodes may place different timing requirements on the data; however in such cases we assume that $D_m$ is the tightest of such time constraints.

The worst-case response time $R_m$ of a message is defined as the longest time from the initiating event occurring to the message being received by the nodes that require it.

A message is said to be schedulable if and only if its worst-case response time $R_m$ is less than or equal to its deadline $D_m$. The system is schedulable if and only if all of the messages in the system are schedulable [1].

2.4. Response Time Analysis

Response time analysis for CAN aims to provide a method of calculating the worst-case response time of each message. These values can then be compared to the message deadlines to determine if the system is schedulable.

For systems complying with the scheduling model given in Section 2.3, CAN effectively implements fixed priority non-pre-emptive scheduling of messages. Following the analysis in [25], the worst-case response time of a message can be viewed as being made up of three elements:
(i) the queuing jitter $J_m$, corresponding to the longest time between the initiating event and the message being queued, ready for transmission on the bus,
(ii) the queuing delay $w_m$, corresponding to the longest time that the message can remain in the CAN controller slot or device driver queue before commencing successful transmission on the bus,
(iii) the transmission time $C_m$, corresponding to the longest time that the message can take to be transmitted.

The worst-case response time of message $m$ is given by

$$R_m = J_m + w_m + C_m. \tag{4}$$

The queuing delay comprises blocking $B_m$, due to lower priority messages which may be in the process of being transmitted when message $m$ is queued, and interference due to higher priority messages which may win arbitration and be transmitted in preference to message $m$.

The maximum amount of blocking occurs when a lower priority message starts transmission immediately before message $m$ is queued. Message $m$ must wait until the bus is idle before it can be entered into arbitration. The maximum blocking time $B_m$ is given by

$$B_m = \max_{k \in lp(m)} (C_k), \tag{5}$$

where $lp(m)$ is the set of messages with lower priority than $m$.

The concept of a busy period, introduced by Lehoczky [13], is fundamental in analysing worst-case response times. Modifying the definition of a busy period given in [14] to apply to CAN messages, a priority level-$m$ busy period is defined as follows:
(i) It starts at some time $t^s$ when a message of priority $m$ or higher is queued ready for transmission, and there are no messages of priority $m$ or higher waiting to be transmitted that were queued strictly before time $t^s$.
(ii) It is a contiguous interval of time during which any message of priority lower than $m$ is unable to start transmission and win arbitration.
(iii) It ends at the earliest time $t^e$ when the bus becomes idle, ready for the next round of transmission and arbitration, yet there are no messages of priority $m$ or higher waiting to be transmitted that were queued strictly before time $t^e$.

The key characteristic of a busy period is that all messages of priority $m$ or higher queued strictly before the end of the busy period are transmitted during the busy period. These messages cannot therefore cause any interference on a subsequent instance of message $m$ queued at or after the end of the busy period.

In mathematical terminology, busy periods can be viewed as right half-open intervals $[t^s, t^e)$, where $t^s$ is the start of the busy period and $t^e$ the end. Thus the end of one busy period may correspond to the start of another separate busy period. This is in contrast to the simpler definition given in [13], which unifies two adjacent busy periods as we have defined them, and therefore sometimes results in analysis of more message instances than is strictly necessary. For example, in the extreme case of 100% utilisation, the busy period defined in [13] never ends, and an infinite number of message instances would need to be considered.

The worst-case queuing delay for message $m$ occurs for some instance of message $m$ queued within a priority level-$m$ busy period that starts immediately after the longest lower priority message begins transmission. This maximal busy period begins with a so-called critical instant [1], where message $m$ is queued simultaneously with all higher priority messages, and then each of these messages is subsequently queued again after the shortest possible time intervals. In the remainder of this paper a busy period means this maximum length busy period.

If more than one instance of message $m$ is transmitted during a priority level-$m$ busy period, then it is necessary to determine the response time of each instance in order to find the overall worst-case response time of the message.

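In [25], Tindell gives the following equation for the worst-case queuing delay:

$$w_m = B_m + \sum_{k \in hp(m)} \left\lceil \frac{w_m + J_k + \tau_{bit}}{T_k} \right\rceil C_k, \tag{6}$$

where $hp(m)$ is the set of messages with priorities higher than $m$ and $\lceil a/b \rceil$ is notation for the ceiling function, which returns the smallest integer greater than or equal to $a/b$.

Although $w_m$ appears on both sides of (6), as the right hand side is a monotonic nondecreasing function of $w_m$, the equation may be solved using the following recurrence relation:

$$w_m^{n+1} = B_m + \sum_{k \in hp(m)} \left\lceil \frac{w_m^{n} + J_k + \tau_{bit}}{T_k} \right\rceil C_k. \tag{7}$$

A suitable starting value is $w_m^0 = B_m$. The relation iterates until either $J_m + w_m^{n+1} + C_m > D_m$, in which case the message is not schedulable, or $w_m^{n+1} = w_m^{n}$, in which case the worst-case response time of the first instance of the message in the busy period is given by $J_m + w_m^{n+1} + C_m$.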
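The fixed-point iteration for the first instance can be sketched in a few lines of Python. This is an illustrative sketch rather than code from [25]: the `Msg` record, the function names, the use of integer microseconds, and the convention that the message list is sorted from highest to lowest priority are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Msg:
    C: int      # maximum transmission time
    T: int      # period (minimum interarrival time)
    D: int      # deadline
    J: int = 0  # queuing jitter


def ceil_div(a: int, b: int) -> int:
    """Exact integer ceiling of a / b (b > 0)."""
    return -(-a // b)


def first_instance_delay(msgs: List[Msg], m: int, tau_bit: int) -> Optional[int]:
    """Queuing delay of the first instance of msgs[m] in the busy period.

    msgs is sorted from highest priority (index 0) to lowest.
    Returns None as soon as the deadline is exceeded.
    """
    me, hp, lp = msgs[m], msgs[:m], msgs[m + 1:]
    B = max((k.C for k in lp), default=0)   # blocking by lower priority
    w = B                                   # starting value
    while True:
        w_next = B + sum(ceil_div(w + k.J + tau_bit, k.T) * k.C for k in hp)
        if me.J + w_next + me.C > me.D:
            return None                     # not schedulable
        if w_next == w:
            return w                        # fixed point reached
        w = w_next
```

For a hypothetical set of three 1000 µs messages with periods 2500, 3500, and 3500 µs, deadlines 2500, 3250, and 3250 µs, no jitter, and a bit time of 8 µs, the iteration yields queuing delays of 1000, 2000, and 2000 µs, that is, first-instance response times of 2000, 3000, and 3000 µs.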

The flaw in the previous analysis is that, given the constraint $D_m \le T_m$, it implicitly assumes that if message $m$ is schedulable, then the priority level-$m$ busy period will end at or before $T_m$. We observe that with fixed priority pre-emptive scheduling this would always be the case, as on completion of transmission of message $m$, no higher priority message could be awaiting transmission. However, with fixed priority non-pre-emptive scheduling, a higher priority message can be awaiting transmission when message $m$ completes transmission, and thus the busy period can extend beyond $T_m$ [1].

The length $t_m$ of the priority level-$m$ busy period is given by the following recurrence relation, starting with an initial value of $t_m^0 = C_m$ and finishing when $t_m^{n+1} = t_m^{n}$:

$$t_m^{n+1} = B_m + \sum_{k \in hep(m)} \left\lceil \frac{t_m^{n} + J_k}{T_k} \right\rceil C_k, \tag{8}$$

where $hep(m)$ is the set of messages with priority $m$ or higher. As the right hand side is a monotonic nondecreasing function of $t_m^{n}$, the recurrence relation is guaranteed to converge provided that the bus utilisation $U_m$, for messages of priority $m$ and higher, is less than 1:

$$U_m = \sum_{k \in hep(m)} \frac{C_k}{T_k}. \tag{9}$$

If $t_m \le T_m - J_m$, then the busy period ends at or before the time at which the second instance of message $m$ is queued. This means that only the first instance of the message is transmitted during the busy period. The existing analysis calculates the worst-case queuing time for this instance via (7) and hence provides the correct worst-case response time in this case.
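A minimal Python sketch of the busy-period computation follows. The `Msg` record, the function names, the integer time unit, and the highest-priority-first ordering of the list are assumptions made for illustration; the utilisation check uses exact fractions so that the convergence condition is tested without rounding.

```python
from dataclasses import dataclass
from fractions import Fraction
from typing import List


@dataclass
class Msg:
    C: int      # maximum transmission time
    T: int      # period
    J: int = 0  # queuing jitter


def ceil_div(a: int, b: int) -> int:
    """Exact integer ceiling of a / b (b > 0)."""
    return -(-a // b)


def busy_period_length(msgs: List[Msg], m: int) -> int:
    """Length of the priority level-m busy period.

    msgs is sorted from highest priority (index 0) to lowest.
    """
    hep, lp = msgs[:m + 1], msgs[m + 1:]
    # Convergence is only guaranteed when utilisation at level m is below 1.
    if sum(Fraction(k.C, k.T) for k in hep) >= 1:
        raise ValueError("utilisation at this priority level is not below 1")
    B = max((k.C for k in lp), default=0)   # blocking by lower priority
    t = msgs[m].C                           # initial value
    while True:
        t_next = B + sum(ceil_div(t + k.J, k.T) * k.C for k in hep)
        if t_next == t:
            return t
        t = t_next


def needs_multiple_instances(msgs: List[Msg], m: int) -> bool:
    """True when more than the first instance must be examined."""
    return busy_period_length(msgs, m) > msgs[m].T - msgs[m].J
```

For a hypothetical set of three 1000 µs messages with periods 2500, 3500, and 3500 µs and no jitter, the busy periods are 2000, 5000, and 7000 µs long, so the first-instance analysis alone is sufficient only for the highest priority message.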

If $t_m > T_m - J_m$, then the existing analysis may give an optimistic worst-case response time, depending upon whether the first or some subsequent instance of message $m$ in the busy period has the longest response time.

The analysis presented in the Appendix of [15] suggests that $t_m$ is the smallest value that is a solution to (8); however this is not strictly correct [1]. For the lowest priority message, $B_m = 0$ and so $t_m = 0$ is trivially the smallest solution. This problem can be avoided by using an initial value of $t_m^0 = C_m$ [1].

The number of instances $Q_m$ of message $m$ that become ready for transmission before the end of the busy period is given by

$$Q_m = \left\lceil \frac{t_m + J_m}{T_m} \right\rceil. \tag{10}$$

To determine the worst-case response time of message $m$, it is necessary to calculate the response time of each of the $Q_m$ instances and then take the maximum of these values.

In the following analysis, the index variable $q$ is used to represent an instance of message $m$. The first instance in the busy period corresponds to $q = 0$ and the final instance to $q = Q_m - 1$. The longest time from the start of the busy period to instance $q$ beginning successful transmission is given by

$$w_m^{n+1}(q) = B_m + q C_m + \sum_{k \in hp(m)} \left\lceil \frac{w_m^{n}(q) + J_k + \tau_{bit}}{T_k} \right\rceil C_k. \tag{11}$$

The recurrence relation starts with a value of $w_m^0(q) = B_m + q C_m$ and ends when $w_m^{n+1}(q) = w_m^{n}(q)$ or when $J_m + w_m^{n+1}(q) - q T_m + C_m > D_m$, in which case the message is unschedulable. For values of $q > 0$ an efficient starting value is given by $w_m^0(q) = w_m(q-1) + C_m$. The event initiating instance $q$ of the message occurs at time $q T_m - J_m$ relative to the start of the busy period, so the response time of instance $q$ is given by

$$R_m(q) = J_m + w_m(q) - q T_m + C_m. \tag{12}$$

The worst-case response time of message $m$ is therefore

$$R_m = \max_{q = 0, \ldots, Q_m - 1} R_m(q). \tag{13}$$
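Putting the busy period, the instance count, and the per-instance recurrence together gives the following Python sketch of the revised analysis. It is an illustration under stated assumptions, not the authors' implementation: the `Msg` record, the function names, the integer time unit, and the highest-priority-first list ordering are hypothetical, and the bus utilisation is assumed to be below 1 so that the busy-period loop terminates.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Msg:
    C: int      # maximum transmission time
    T: int      # period
    D: int      # deadline
    J: int = 0  # queuing jitter


def ceil_div(a: int, b: int) -> int:
    """Exact integer ceiling of a / b (b > 0)."""
    return -(-a // b)


def response_time(msgs: List[Msg], m: int, tau_bit: int) -> Optional[int]:
    """Worst-case response time of msgs[m], or None if a deadline is exceeded.

    msgs is sorted from highest priority (index 0) to lowest.
    Assumes utilisation below 1, otherwise the first loop does not terminate.
    """
    me, hp, hep, lp = msgs[m], msgs[:m], msgs[:m + 1], msgs[m + 1:]
    B = max((k.C for k in lp), default=0)

    # Length of the priority level-m busy period.
    t = me.C
    while True:
        t_next = B + sum(ceil_div(t + k.J, k.T) * k.C for k in hep)
        if t_next == t:
            break
        t = t_next

    Q = ceil_div(t + me.J, me.T)   # instances queued within the busy period
    worst, w = 0, B
    for q in range(Q):
        if q > 0:
            w += me.C              # efficient start: w(q-1) + C_m
        while True:
            w_next = B + q * me.C + sum(
                ceil_div(w + k.J + tau_bit, k.T) * k.C for k in hp)
            if me.J + w_next - q * me.T + me.C > me.D:
                return None        # instance q misses its deadline
            if w_next == w:
                break
            w = w_next
        worst = max(worst, me.J + w - q * me.T + me.C)
    return worst
```

For a hypothetical set of three 1000 µs messages with periods and deadlines of 2500, 3500, and 3500 µs, no jitter, and a bit time of 8 µs, the sketch returns 2000, 3000, and 3500 µs. The lowest priority message has a first-instance response time of 3000 µs, but its second instance in the busy period takes 3500 µs, which is the case the first-instance analysis misses.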

The analysis presented previously is also applicable when messages have deadlines that are greater than their periods, so-called arbitrary deadlines [1]. However, if such timing characteristics are specified, then the software device drivers or CAN controller hardware may need to be capable of buffering more than one instance of a message. The number of instances $N_m$ of each message that need to be buffered is bounded by

$$N_m = \left\lceil \frac{R_m}{T_m} \right\rceil. \tag{14}$$

The analysis presented in [15] effectively uses $Q_m = \lfloor (t_m + J_m)/T_m \rfloor + 1$ rather than $Q_m = \lceil (t_m + J_m)/T_m \rceil$. This yields a value which is one too large when the length of the busy period plus jitter is an integer multiple of the message period. Although this does not give rise to problems, the more efficient formulation given by (10) is preferred [1].

The analysis given in this section, as per Davis et al. [1], corrects a significant flaw in the previous schedulability analysis for CAN given by Tindell et al. [25]. However, the schedulability test presented is more complex, potentially requiring the computation of multiple response times.

An upper bound on the queuing delay of the second and subsequent instances of message $m$ within the busy period is therefore given by

$$w_m^{n+1} = C_m + \sum_{k \in hp(m)} \left\lceil \frac{w_m^{n} + J_k + \tau_{bit}}{T_k} \right\rceil C_k. \tag{15}$$

This result suggests a simple but pessimistic schedulability test. An instance of message $m$ can either be subject to blocking due to lower priority messages or to push-through interference of at most $C_m$ due to the previous instance of the same message, but not both. Hence we can modify (7) to provide a correct sufficient but not necessary schedulability test:

$$w_m^{n+1} = \max(B_m, C_m) + \sum_{k \in hp(m)} \left\lceil \frac{w_m^{n} + J_k + \tau_{bit}}{T_k} \right\rceil C_k. \tag{16}$$

A further simplification is to assume that the blocking factor always takes its maximum possible value:

$$w_m^{n+1} = B^{MAX} + \sum_{k \in hp(m)} \left\lceil \frac{w_m^{n} + J_k + \tau_{bit}}{T_k} \right\rceil C_k, \tag{17}$$

where $B^{MAX}$ corresponds to the transmission time of the longest possible CAN message (8 data bytes) irrespective of the characteristics and priorities of the messages in the system.

So far we have assumed that no errors occur on the CAN bus. However, as originally shown in [25], schedulability analysis of CAN may be extended to include an appropriate error model.
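The two sufficient tests differ only in the blocking term, which the following Python sketch makes explicit. The `Msg` record, the function name `sufficient_response_bound`, the optional `b_max` parameter, the integer time unit, and the highest-priority-first list ordering are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Msg:
    C: int      # maximum transmission time
    T: int      # period
    D: int      # deadline
    J: int = 0  # queuing jitter


def ceil_div(a: int, b: int) -> int:
    """Exact integer ceiling of a / b (b > 0)."""
    return -(-a // b)


def sufficient_response_bound(msgs: List[Msg], m: int, tau_bit: int,
                              b_max: Optional[int] = None) -> Optional[int]:
    """Upper bound on the response time of msgs[m] from a single recurrence.

    With b_max=None the blocking term is max(B_m, C_m); otherwise b_max is
    used as a system-wide blocking term (longest possible frame).
    Returns None if the bound exceeds the deadline, which does not by itself
    prove that the message is unschedulable.
    """
    me, hp, lp = msgs[m], msgs[:m], msgs[m + 1:]
    B = max((k.C for k in lp), default=0)
    blocking = max(B, me.C) if b_max is None else b_max
    w = blocking
    while True:
        w_next = blocking + sum(
            ceil_div(w + k.J + tau_bit, k.T) * k.C for k in hp)
        if me.J + w_next + me.C > me.D:
            return None
        if w_next == w:
            return me.J + w + me.C
        w = w_next
```

For a hypothetical set of three 1000 µs messages with periods and deadlines of 2500, 3500, and 3500 µs and a bit time of 8 µs, the first variant bounds the two higher priority messages at 2000 and 3000 µs and rejects the lowest priority message. Passing `b_max=1080` (a 135-bit frame at 8 µs per bit) gives the looser bounds 2080 and 3080 µs for the first two messages.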

In [1] it is assumed that the maximum number of errors present on the bus in some time interval of length $t$ is given by the function $F(t)$. No specific detail about this function is assumed, save that it is a monotonic non-decreasing function of $t$. The schedulability equations are modified to account for the error recovery overhead. The worst-case impact of a single bit error is to cause transmission of an additional 31 bits of error recovery overhead plus retransmission of the affected message. Only errors affecting message $m$ or higher priority messages can delay message $m$ from being successfully transmitted. The maximum additional delay caused by the error recovery mechanism is therefore given by

$$E_m(t) = \left(31\,\tau_{bit} + \max_{k \in hep(m)} (C_k)\right) F(t). \tag{18}$$

Revising (8) to compute the length of the busy period we have

$$t_m^{n+1} = E_m(t_m^{n}) + B_m + \sum_{k \in hep(m)} \left\lceil \frac{t_m^{n} + J_k}{T_k} \right\rceil C_k. \tag{19}$$

Again an appropriate initial value is $t_m^0 = C_m$. Equation (19) is guaranteed to converge, provided that the utilisation including error recovery overhead is less than 1.

As before, (10) can be used to compute the number of message instances that need to be examined to find the worst-case response time:

$$w_m^{n+1}(q) = E_m\left(w_m^{n}(q) + C_m\right) + B_m + q C_m + \sum_{k \in hp(m)} \left\lceil \frac{w_m^{n}(q) + J_k + \tau_{bit}}{T_k} \right\rceil C_k. \tag{20}$$

Equation (20) extends (11) to account for the error recovery overhead. Note that as errors can impact the transmission of message $m$ itself, the time interval considered in calculating the error recovery overhead includes the transmission time of message $m$ as well as the queuing delay. Equations (20), (12), and (13) can be used together to compute the response time of each message instance and hence find the worst-case response time of each message in the presence of errors at the maximum rate specified by the error model.
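The effect of the error recovery term can be sketched in Python as follows. The error function `F` is supplied by the caller; the example used afterwards (at most one error per 10 ms window) is a purely hypothetical choice, as are the `Msg` record, the function names, the integer microsecond time unit, and the highest-priority-first list ordering. The sketch assumes that utilisation including error overhead is below 1, otherwise the busy-period loop does not terminate.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Msg:
    C: int      # maximum transmission time
    T: int      # period
    D: int      # deadline
    J: int = 0  # queuing jitter


def ceil_div(a: int, b: int) -> int:
    """Exact integer ceiling of a / b (b > 0)."""
    return -(-a // b)


def response_time_with_errors(msgs: List[Msg], m: int, tau_bit: int,
                              F: Callable[[int], int]) -> Optional[int]:
    """Worst-case response time of msgs[m] with at most F(t) errors in any
    interval of length t. msgs is sorted highest priority first."""
    me, hp, hep, lp = msgs[m], msgs[:m], msgs[:m + 1], msgs[m + 1:]
    B = max((k.C for k in lp), default=0)
    c_max = max(k.C for k in hep)

    def E(t: int) -> int:
        # 31 bits of error signalling plus retransmission of the longest
        # message of priority m or higher, for each error.
        return (31 * tau_bit + c_max) * F(t)

    t = me.C                       # busy period including error overhead
    while True:
        t_next = E(t) + B + sum(ceil_div(t + k.J, k.T) * k.C for k in hep)
        if t_next == t:
            break
        t = t_next

    Q = ceil_div(t + me.J, me.T)
    worst, w = 0, B
    for q in range(Q):
        if q > 0:
            w += me.C
        while True:
            w_next = E(w + me.C) + B + q * me.C + sum(
                ceil_div(w + k.J + tau_bit, k.T) * k.C for k in hp)
            if me.J + w_next - q * me.T + me.C > me.D:
                return None
            if w_next == w:
                break
            w = w_next
        worst = max(worst, me.J + w - q * me.T + me.C)
    return worst
```

For three hypothetical 540 µs messages with periods and deadlines of 5, 10, and 20 ms, a bit time of 4 µs, and `F = lambda t: ceil_div(t, 10_000)`, the response times grow from 1080, 1620, and 1620 µs without errors to 1744, 2284, and 2284 µs, that is, by one error recovery overhead of 31·4 + 540 = 664 µs each.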

The sufficient schedulability tests given earlier in this section can be similarly modified via the addition of the term $E_m(w_m^{n} + C_m)$ to account for the error recovery overhead [1].

3. Probabilistic Response Time Analysis of CAN Messages

3.1. Probabilistic Bit-Stuffing Distributions

When performing worst-case response-time analysis, the worst-case number of stuff bits is traditionally used. In [7], Nolte et al. introduce a worst-case response time analysis method which uses distributions of stuff bits instead of the worst-case values. This makes the analysis less pessimistic, in the sense that we obtain a distribution of worst-case response times corresponding to all possible combinations of stuff bits of all message frames involved in the response time analysis. Using a distribution rather than a fixed value makes it possible to select a worst-case response time based on a desired probability of violation; that is, the selected worst-case response time is such that the probability of a response time exceeding it is at most $\rho$. The main motivation for calculating such probabilistic response times is that they allow us to reason about tradeoffs between reliability and timeliness.

The number of bits, apart from the data part of the frame, which are exposed to the bit-stuffing mechanism is defined as $g$, which is in the set $\{34, 54\}$. This is because we have either 34 (CAN standard format) or 54 (CAN extended format) bits which are exposed to the bit-stuffing mechanism. 10 bits in the CAN frame are not exposed to the bit-stuffing mechanism (refer to Figure 3). The number of bytes of data in CAN message frame $m$ is defined as $s_m$, which is in the range $[0, 8]$.

Recall that a CAN message frame can contain 0 to 8 bytes of data. According to the CAN standard [22], the total number of bits in a CAN frame before bit stuffing is therefore

$$8 s_m + g + 10, \tag{21}$$

where 10 is the number of bits in the CAN frame not exposed to the bit-stuffing mechanism. Since only $g + 8 s_m$ bits in the CAN frame are subject to bit stuffing, the total number of bits after bit stuffing can be no more than

$$8 s_m + g + 10 + \left\lfloor \frac{g + 8 s_m - 1}{4} \right\rfloor. \tag{22}$$

Intuitively, the above formula captures the number of stuffed bits in the worst-case scenario, shown in Figure 2.

The expression (22) describes the length of a CAN frame in the worst case. In [6], the number of stuff bits is represented as a distribution. By using a distribution of stuff bits instead of the worst-case number of stuff bits, it is possible to obtain a distribution of response times, from which less pessimistic (compared to the traditional worst case) response times can be calculated for a given probability.

Firstly, let us define $\gamma$ as the distribution of stuff bits in a CAN message frame. We express $\gamma$ as a set of pairs containing the number of stuff bits with corresponding probability of occurrence. Each pair is defined as $(x, P(x))$, where $P(x)$ is the probability of exactly $x$ stuff bits in the CAN frame. Note that $\sum_{x} P(x) = 1$.

As shown in [6], we can extract 9 different distributions of stuff bits depending on the number of bytes of data in the CAN message frame. We define $\gamma_s$ as the distribution representing a CAN frame containing $s$ bytes of data. Recall that $s_m$ is the number of bytes of data in a message frame $m$.

We define $\gamma(\rho)$ as the worst-case number of stuff bits, $n$, to expect with a probability $\rho$ based on the stuff-bit distribution $\gamma$, that is, $\sum_{x > n} P(x) \le \rho$, or to express it in another way, the probability of finding more than $n$ stuff bits, based on the stuff-bit distribution $\gamma$, is at most $\rho$.
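Selecting the number of stuff bits for a given violation probability amounts to reading a quantile off the distribution, as the following Python sketch illustrates. The function name, the dictionary representation, and the example distribution are hypothetical; the sketch takes the smallest count whose exceedance probability does not exceed the chosen probability, which is one natural reading of the definition above.

```python
from typing import Dict


def worst_case_stuff_bits(dist: Dict[int, float], rho: float) -> int:
    """Smallest n such that P(more than n stuff bits) <= rho.

    dist maps a number of stuff bits to its probability of occurrence;
    the probabilities are assumed to sum to 1.
    """
    if rho < 0:
        raise ValueError("rho must be a probability")
    for n in sorted(dist):
        # Probability mass strictly above n.
        tail = sum(p for x, p in dist.items() if x > n)
        if tail <= rho:
            return n
    raise ValueError("empty distribution")


# Hypothetical distribution, for illustration only (not measured data).
EXAMPLE_DIST = {0: 0.5, 1: 0.3, 2: 0.15, 3: 0.05}
```

With `EXAMPLE_DIST`, a violation probability of 0.1 selects 2 stuff bits, whereas a probability of 0 falls back to the largest value in the distribution, here 3, which corresponds to the traditional worst case.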

Note that the selection of a probability $\rho$ should be done based on the requirements of the application. With a proper value for $\rho$, the worst-case mean time to failure should sufficiently exceed what is required. Finally, by assuming (as in [6]) that CAN message frames are independent in the sense of number of stuff bits, we can define $\Gamma$ as the joint distribution corresponding to the combination of $a$ distributions of stuff bits; that is, the number of stuff bits caused by a sequence of $a$ messages sent on the bus is described by

$$\Gamma = \gamma_{s_1} \otimes \gamma_{s_2} \otimes \cdots \otimes \gamma_{s_a},$$

where $\otimes$ denotes multiplicative combination of discrete distributions. If the distributions happen to be equal, $\gamma_s^{a}$ is defined as the joint distribution of $a$ equal distributions of stuff bits; that is, the number of data bytes is the same for all messages considered by the expression.
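Under the independence assumption, the distribution of the total number of stuff bits in a sequence of frames is the discrete convolution of the per-frame distributions: the probabilities of each combination multiply and the stuff-bit counts add. A minimal Python sketch follows; the function names and the dictionary representation are hypothetical.

```python
from collections import defaultdict
from functools import reduce
from typing import Dict, Iterable


def combine(a: Dict[int, float], b: Dict[int, float]) -> Dict[int, float]:
    """Distribution of the total stuff bits of two independent frames."""
    out: Dict[int, float] = defaultdict(float)
    for xa, pa in a.items():
        for xb, pb in b.items():
            out[xa + xb] += pa * pb   # counts add, probabilities multiply
    return dict(out)


def joint_distribution(dists: Iterable[Dict[int, float]]) -> Dict[int, float]:
    """Combine any number of per-frame distributions.

    The neutral element {0: 1.0} means 'no frames, zero stuff bits'.
    """
    return reduce(combine, dists, {0: 1.0})
```

For example, combining two copies of the toy distribution `{0: 0.5, 1: 0.5}` gives `{0: 0.25, 1: 0.5, 2: 0.25}`.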

In order to include the bit-stuffing distributions in (12), we need to redefine $C_m$ and $w_m$ as $C_m(\rho)$ and $w_m(\rho)$, where

$$C_m(\rho) = C'_m + \gamma_{s_m}(\rho)\,\tau_{bit},$$

where $\gamma_{s_m}$ is the distribution of stuff bits in the message $m$ and $C'_m = (g + 8 s_m + 13)\,\tau_{bit}$ is the transmission time of message $m$ excluding stuff bits, and

$$w_m(\rho) = B_m + \sum_{k \in hp(m)} \left\lceil \frac{w_m(\rho) + J_k + \tau_{bit}}{T_k} \right\rceil C'_k + \Gamma_m(\rho)\,\tau_{bit},$$

where $\Gamma_m$ is defined as the distribution of the total number of stuff bits of all messages involved in the response time analysis for message $m$.

This approach obtains the maximum number of stuff bits under a given probability $\rho$, reducing the pessimism of the worst-case response time and bus load values.

Anyu Cheng et al. [23] extend the work in [7], giving the probability distribution curves of stuff bits for messages of different lengths by introducing a probability model of stuff bits. They design and develop scheduling analysis software for fixed priority message scheduling and use it to analyse the schedulability of the messages in a hybrid electric vehicle. Furthermore, a simulation experiment based on CANoe was carried out to test the design. Comparison of the results indicates that the algorithm based on the probability model of stuff bits is correct and that the software is accurate and reliable.

3.2. Probabilistic Error Model

The analysis as presented does not cover the effect of transmission errors. Obviously, detected errors trigger the transmission of an error frame as well as a retransmission which increases the busy window and therefore the response time. On the other hand a longer busy window might increase the probability that successive errors might affect the busy window [24]. In order to include effects of errors (e.g., retransmission overhead) different approaches were introduced.

3.2.1. Related Work

A method to analyse worst-case real-time behaviour of a CAN bus was developed by Tindell et al. [5]. By applying processor scheduling analysis to the CAN bus, they showed that in the absence of faults the worst-case response time of any message is bounded and can be accurately predicted. Moreover, the analysis can be extended in order to handle the effect of errors in the channel.

The error recovery mechanism of CAN involves the retransmission of any corrupted messages. An additional term can be introduced into their analysis, called the error recovery overhead function, which is the upper bound of the overhead caused by such retransmissions in a time interval. A very simple fault model is used in [5] to show how the schedulability analysis is performed in the presence of errors in the channel. The model is based on a minimum interarrival time between faults. The authors note that the error recovery function can be more accurately determined either from observation of the behaviour of CAN under high noise conditions or by building a statistical model.

Punnekkat et al. [8] extend the work of Tindell et al. by providing a more general fault model which can deal with interference caused by several sources. Punnekkat’s model assumes that every source of interference has a specific pattern, consisting of an initial burst of errors and then a distribution of faults with a known minimum interarrival time. Except for the more general fault model, the rest of the schedulability analysis is performed like [5].

Both Tindell and Punnekkat use models based on a minimum interarrival time between faults and therefore assume that the number of faults that can occur in an interval is bounded. In the environment where CAN is used, faults are caused mainly by Electromagnetic Interference (EMI) which is often observed as a random pulse train with a Poisson distribution [24]. Therefore the assumption made by the bounded model may not be appropriate for many systems because there is a realistic probability of faults occurring closer than the minimum interarrival time.

Unlike Tindell and Punnekkat, Navet et al. [25] propose a probabilistic fault model, which incorporates the uncertainty of faults caused by EMI. The fault model suggested by Navet uses a stochastic process which considers both the frequency of the faults and their gravity. In that model, faults in the channel occur according to a Poisson law and can be either single-bit faults or burst errors (which have a duration of more than one bit) according to a random distribution. This allows the interference caused by faults in the channel to be modeled as a generalised Poisson process. Note that if the occurrence of faults in the channel follows a Poisson law, the maximum number of transmission errors suffered by the system in a given interval is not bounded, so the probability of having sufficient interference to prevent a message from meeting its deadline is always nonzero; therefore every system is inherently unschedulable.

Hence Navet’s analysis does not try to determine whether a system is schedulable (as in [5, 8]); instead, it calculates the probability that a message does not meet its deadline. Obtaining such a probability, named the Worst Case Deadline Failure Probability (WCDFP), gives a measure of the system reliability, because a lower value of the WCDFP implies a higher resilience to interference.

Navet’s analysis uses the scheduling analysis of Tindell to calculate the maximum number of faults that each message can tolerate before its deadline is missed. This number depends only on the characteristics (length, priority, period, etc.) of the message set. The worst-case response time that this number of faults would generate is then computed. Once both quantities are obtained, they are used with the fault model to find the probability that a message may miss its deadline. Navet defines the WCDFP of a message as the probability that more than the tolerable number of errors occur during this worst-case response time. This probability can be calculated analytically because the fault model assumed by Navet is a generalized Poisson process.
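
The definition of the WCDFP as a tail probability can be sketched numerically. The Python example below is a simplified illustration only: it assumes a plain Poisson process of single faults rather than the generalized Poisson process with bursts used by Navet, and the names `poisson_tail` and `wcdfp`, as well as the numeric values, are hypothetical.

```python
import math


def poisson_tail(k, mean, terms=200):
    """P(N > k) for N ~ Poisson(mean), summed term by term.

    Summing the tail directly avoids the cancellation that occurs in
    1 - CDF when the result is very small. Truncated after `terms` terms.
    """
    term = math.exp(-mean)          # pmf(0)
    for m in range(1, k + 2):
        term *= mean / m            # after the loop: pmf(k + 1)
    total = 0.0
    m = k + 1
    for _ in range(terms):
        total += term
        m += 1
        term *= mean / m            # advance to the next pmf value
    return total


def wcdfp(k_tolerated, fault_rate, worst_case_response_time):
    """Probability of more than k_tolerated faults in the response time.

    Assumption: faults follow a plain Poisson process with the given rate.
    """
    return poisson_tail(k_tolerated, fault_rate * worst_case_response_time)


# Hypothetical message: tolerates 2 faults, 10 ms worst-case response
# time, 30 faults per second on average.
print(wcdfp(2, 30.0, 0.010))
```

Because the tail is never exactly zero for a positive fault rate, the sketch also illustrates why a Poisson fault model yields a failure probability rather than a yes/no schedulability verdict.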

The main drawback of the analysis is that it includes two inaccuracies which increase the pessimism in the estimation of the WCDFP. The first source of pessimism is implicit in the definition of the WCDFP, which does not properly reflect the conditions under which a message can miss its deadline. For a message to miss its deadline, faults in the channel must occur while the message is queued or in transmission; a fault occurring after the message has been received cannot delay it. This condition is more restrictive than the one used in [25], which is that errors occur at any time during the maximum response time of the message, regardless of whether the message has already been received.

The second source of pessimism is an overly pessimistic assumption about the nature of burst errors, where a fault causes a sequence of bits to be corrupted. In Navet’s analysis, a burst error lasting several bits is treated as a sequence of single-bit faults [25], each causing a maximal error overhead (an error frame and the retransmission of a frame of higher or equal priority). This assumption is inconsistent with the CAN protocol specification [22], since in reality a burst error can cause the retransmission of only one frame, because no message is sent again until the effect of the burst has ended. This causes pessimism of several orders of magnitude.

A different method to calculate the probability of deadline failure in CAN under fault conditions is proposed in [9]. This work points out that errors occurring while the bus is idle do not cause any message retransmission, and therefore cause less interference than is typically assumed in scheduling analysis. To avoid this source of pessimism, the effect of errors is modelled with a fixed pattern of interference; this is a simplification of the fault model presented in [8]. Owing to this determinism, interactions between messages and errors can be analysed through simulation, and the probability that a message misses its deadline can then be determined. Nevertheless, this method has important drawbacks. First, an interference pattern for every possible error source is hard to determine. Second, combining several error sources increases the complexity of the analysis to such an extent that it becomes infeasible, so random sampling is used.

Modelling the arrival of errors with a random distribution, as done in [10], allows a more general solution. Broster et al. [26] propose an analysis that provides an accurate probability of deadline failure without excess pessimism, based on the assumption that faults are randomly distributed.

In [27], an approach is presented to tightly bound the reliability for periodic, synchronized messages. To this end, a reliability metric is defined which denotes the probability that CAN communication survives a given time without a deadline miss. The reliability is calculated based on the hyperperiod, which is the time after which the activation pattern of a periodic message set repeats itself. It is defined as the least common multiple of all periods. Hence, the complexity of the algorithm depends on the number of activations in the hyperperiod. This algorithm is suitable for automotive message sets in which periods are typically multiples of 10 ms. However, if messages are not synchronized, or the relative phasing is unknown, the approach is not applicable. In [26], the busy-window approach is used, and a tree-based approach is presented in which different error scenarios are evaluated iteratively. In a second step, these scenarios are translated to probabilities and a worst-case deadline failure probability is calculated. The approach was extended in [28], where the tree-based approach was superseded by a simpler, more accurate one. However, both methods [26, 28] allow only deadlines smaller than the periods, which limits their practical use since bursty CAN traffic is not supported. In [24], existing methods are generalized to support arbitrary deadlines and to derive a probabilistic response time bound.
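
To make the dependence on the hyperperiod concrete, the following Python sketch computes the hyperperiod of a periodic message set and the number of activations it contains. The period values and function names are hypothetical and chosen only to resemble an automotive message set with periods that are multiples of 10 ms; this is not the algorithm of [27] itself, only the quantity that drives its complexity.

```python
from functools import reduce
from math import gcd


def hyperperiod(periods_ms):
    """Least common multiple of all periods (integer milliseconds)."""
    return reduce(lambda a, b: a * b // gcd(a, b), periods_ms)


def activations_per_hyperperiod(periods_ms):
    """Total number of message activations within one hyperperiod."""
    h = hyperperiod(periods_ms)
    return sum(h // p for p in periods_ms)


periods = [10, 20, 50, 100]  # hypothetical message periods in ms
print(hyperperiod(periods))                  # 100
print(activations_per_hyperperiod(periods))  # 18
```

With harmonic or near-harmonic periods the hyperperiod stays short, whereas co-prime periods make it grow with their product, which suggests why such an approach suits message sets of this kind.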

3.2.2. Error Model

In [24, 26] the occurrence of errors is modeled by using a Poisson model. Practically, a Poisson process models independent single-bit errors (without bursts), where $\lambda$ specifies the bit error rate. The probability for the occurrence of $m$ error events in the time window $\Delta t$ is

$$P(m, \Delta t) = \frac{e^{-\lambda \Delta t} (\lambda \Delta t)^{m}}{m!}. \tag{25}$$

It is possible that a single message is hit by multiple error events and only one retransmission occurs (e.g., after reception when the CRC is checked), but it is assumed that in the worst-case condition, each error event will lead to exactly one retransmission. Thus, we can directly use (25) to obtain the probability $P(m, \Delta t)$ that $m$ error events occur during a given time window $\Delta t$. Let $B(k)$ denote the length of the busy window that includes $k$ error events, and let $P(K = k)$ denote the probability that a busy window with exactly $k$ error events actually occurs. The probability for the error-free case is

$$P(K = 0) = P(0, B(0)). \tag{26}$$

For $k > 0$, it is not enough to just calculate $P(k, B(k))$, because error events have to occur in certain segments of the busy window. A more efficient technique was used in [27], which can be applied for the general case in which a busy window includes multiple queued activations which can be affected by errors. The approach works as follows: one error event in the entire busy window $B(1)$ can happen in two ways. The error may actually lead to a $K = 1$ busy window, with probability $P(K = 1)$. Alternatively, we face a busy window of length $B(0)$ and the error event occurs in the interval $(B(0), B(1)]$:

$$P(1, B(1)) = P(K = 1) + P(K = 0)\, P(1, B(1) - B(0)). \tag{27}$$

The value of $P(K = 1)$ can then be obtained by rearranging the equation. Similarly, we can apply this idea to $K = 2$. Two errors in the time window $B(2)$ may occur in the following mutually exclusive ways. (i) A busy window of length $B(2)$ actually occurs assuming two error events, with probability $P(K = 2)$. (ii) $K = 1$ occurred, which implies exactly one error in $B(1)$, and the second error must then happen in the interval $(B(1), B(2)]$. (iii) $K = 0$ occurred, which implies no error in $B(0)$, and exactly two errors must be in the interval $(B(0), B(2)]$:

$$P(2, B(2)) = P(K = 2) + P(K = 1)\, P(1, B(2) - B(1)) + P(K = 0)\, P(2, B(2) - B(0)). \tag{28}$$

By rearranging the equation for $P(K = 2)$, we get the probability for a $K = 2$ busy window. The same argument is valid for the following $k$-error busy windows, and (28) is generalized into the following form:

$$P(K = k) = P(k, B(k)) - \sum_{j=0}^{k-1} P(K = j)\, P(k - j, B(k) - B(j)). \tag{29}$$

The worst-case response time exceedance function can be calculated as

$$\overline{R}(r) = 1 - \sum_{k \,:\, R(k) \le r} P(K = k), \tag{30}$$

where $R(k)$ denotes the worst-case response time in the presence of $k$ error events. Practically, this function denotes a bound for the probability that a response time exceeds a certain threshold $r$, and the probability that a deadline $D$ is exceeded can be bounded by $\overline{R}(D)$.
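
The recursive argument above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions rather than the implementation of [24, 27]: `busy_windows[k]` is taken to be the busy window length when k error events occur, `response_times[k]` the corresponding worst-case response time, errors are assumed to follow a plain Poisson process with rate `rate`, and all names and numeric values are hypothetical.

```python
import math


def poisson_pmf(m, rate, window):
    """Probability of exactly m Poisson events in a window of given length."""
    mean = rate * window
    return math.exp(-mean) * mean ** m / math.factorial(m)


def k_error_probabilities(busy_windows, rate):
    """Probability that a busy window with exactly k errors occurs.

    For each k, start from the probability of k errors in the k-error
    window and subtract the cases where fewer errors produced a shorter
    busy window and the remaining errors fell after it.
    """
    probs = []
    for k, b_k in enumerate(busy_windows):
        p = poisson_pmf(k, rate, b_k)
        for j in range(k):
            p -= probs[j] * poisson_pmf(k - j, rate, b_k - busy_windows[j])
        probs.append(p)
    return probs


def exceedance(threshold, response_times, probs):
    """Bound on the probability that the response time exceeds threshold."""
    covered = sum(p for r, p in zip(response_times, probs) if r <= threshold)
    return 1.0 - covered


busy = [1.0, 1.5, 2.0]       # hypothetical busy windows for 0, 1, 2 errors
resp = [0.8, 1.3, 1.8]       # hypothetical worst-case response times
probs = k_error_probabilities(busy, rate=0.1)
print(probs)
print(exceedance(1.5, resp, probs))  # bound on missing a deadline of 1.5
```

Only error scenarios that were analysed and found to stay within the threshold are subtracted from one, so any scenario left unanalysed counts as a potential miss, which keeps the result a conservative bound.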

4. Conclusion

In this review paper, the worst-case response time analysis of messages in controller area network and the probabilistic response time analysis of CAN messages are reviewed. The worst-case response time analysis includes the analysis presented in the early 1990s by Tindell et al. [25] and the analysis by Davis et al. [1] in 2007. Davis et al. [1] pointed out the flaw in the earlier analysis by Tindell et al. and showed that multiple instances of a CAN message should be analysed to determine the response time and hence the schedulability of the CAN messages. The worst-case response time analysis leads to an excessive level of pessimism; a pessimistic approach may still be chosen, but with as little pessimism as possible, since the worst case does not always occur.

The probabilistic response time analysis of CAN messages is recommended, and two approaches are considered here [6, 7]. In the first, instead of using the worst-case bit-stuffing pattern, a distribution of possible bit-stuffing patterns is considered according to the application and the most probable pattern is selected, which reduces pessimism. The second probabilistic approach considers the probability of occurrence of errors [24–26]. In worst-case analysis, it is assumed that every error flag transmitted has a retransmission associated with it, whereas this is not true, since the same error can cause many error flags and only one retransmission. This assumption causes some level of pessimism.

Two methods are presented in [6] to reduce the number of stuff bits. The first applies an XOR operation to the messages before transmission (encoding) and repeats the XOR after reception (decoding), thus avoiding long runs of consecutive zeros or ones and thereby avoiding bit stuffing. The second chooses the priorities such that the identifier bits do not contain long runs of consecutive ones or zeros, thereby avoiding bit stuffing. Of course, in this method the number of priorities that can be used is reduced.
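
The effect of XOR encoding on bit stuffing can be illustrated with a short Python sketch. It is a simplified illustration, not the method of [6] as published: it counts stuff bits over the payload bytes only (in a real frame, stuffing also covers other fields), and the alternating mask 0xAA and all function names are assumptions made here.

```python
def to_bits(data):
    """Expand bytes into a list of bits, most significant bit first."""
    return [(byte >> i) & 1 for byte in data for i in range(7, -1, -1)]


def count_stuff_bits(bits):
    """Count stuff bits inserted under the CAN rule.

    After five consecutive identical bits, one bit of opposite polarity
    is inserted; the inserted bit itself starts the next run.
    """
    stuffed = 0
    run_bit, run_len = None, 0
    for b in bits:
        if b == run_bit:
            run_len += 1
        else:
            run_bit, run_len = b, 1
        if run_len == 5:
            stuffed += 1
            run_bit, run_len = 1 - b, 1  # the stuff bit begins a new run
    return stuffed


def xor_mask(data, mask=0xAA):
    """XOR every byte with an alternating bit pattern (assumed mask)."""
    return bytes(b ^ mask for b in data)


payload = b"\x00\x00"                 # 16 zero bits: long identical run
encoded = xor_mask(payload)           # sender side
print(count_stuff_bits(to_bits(payload)))  # 3
print(count_stuff_bits(to_bits(encoded)))  # 0
print(xor_mask(encoded) == payload)        # True, receiver recovers data
```

The gain depends on the data: payloads that already alternate would become long runs after masking, so the benefit relies on typical application data containing runs of identical bits.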

Another approach to making the best use of the bandwidth is to schedule the messages with offsets, which leads to a desynchronization of the message streams. This “traffic shaping” strategy is very beneficial in terms of worst-case response times [29, 30]. The worst-case response time (WCRT) of a frame corresponds to the scenario where all higher priority CAN messages are released synchronously. Avoiding this situation, and thus reducing the WCRT, can be achieved by scheduling streams of messages with offsets. Precisely, the first instance of a stream of periodic frames is released with a delay, called the offset, with respect to a reference point, which is the first time at which the station is ready to transmit. Subsequent frames of the stream are then sent periodically, with the first transmission as time origin. The choice of offset values influences the WCRT, and the challenge is to set the offsets so as to minimize the WCRT, which involves spreading the workload over time as much as possible.

Future work is to review the statistical approach to response time analysis. It is proposed that a fusion of methods may be adopted to cater to the requirements of the application: for safety-critical applications, such as automotive and industrial applications, the worst-case response time analysis is recommended, whereas for noncritical applications where some tolerance can be introduced, the probabilistic response time analysis may be applied.