Abstract

The real-time operating system (RTOS) has a wide range of application domains and provides devices with the ability to schedule resources. Because of the restricted resources of embedded devices and the real-time constraints of RTOS, the application of cryptographic algorithms in these devices will affect the running systems. The existing approaches for RTOS ciphers’ evaluation are mainly provided by experimental data performance analysis, which, however, lack a clear judgment on the affected RTOS performance indicators, such as task schedulability, bandwidth, as well as a quantitative prediction of the remaining resources of RTOS. By focusing on task schedulability in RTOS, this paper provides a timed automaton-based quantitative approach to judge the feasibility of ciphers in embedded RTOS. First, a cryptographic algorithm execution overhead estimation model is established. Then, by combining the overhead model with a sensitivity analysis method, we can analyze the feasibility of the cryptographic algorithm. Finally, a task-oriented and timed automaton-based model is built to verify the analysis results. We take AES as a case study and carry out experiments on embedded devices. The experimental results show the effectiveness of our approach, which will provide specific feasibility indicators for the application of cryptographic algorithms in RTOS.

1. Introduction

1.1. Motivation

For applications in critical fields such as industrial control systems (ICSs), aerospace, national defense and military, and communications, most of the embedded devices play an important role and undertake a large number of operation and control tasks. In embedded devices with complex functions, embedded systems are generally used to improve resource utilization. To take the needs of real-time control into account, embedded devices are usually equipped with RTOS, including C/OS-II, eCos, FreeRTOS, Mbed OS, RTX, and Vxworks. In recent years, with the development of the Internet of Things (IoT), increasing embedded devices are connected to the Internet for data transmission and realize more functions. From the perspective of security, this also brings new challenges. As a key component, the security of embedded devices is related to the overall security of the system, so it is necessary to add security technology protection to embedded devices.

However, due to its resource constraints and real-time restrictions of RTOS, most embedded RTOSs will encounter an adaptability problem when implementing security technologies. Taking cryptographic algorithms as an example, in embedded RTOS, the application of ciphers will cause additional overhead based on the original tasks, which not only leads to an increase in the utilization of hardware resources such as processor, RAM, and FLASH, but the longer execution time after encryption also affects the schedulability of tasks in the system and further affects the RTOS real-time performance even at the normal realization of system functions.

It can be seen that when the current common cryptographic algorithms are implemented on resource-constrained devices, the performance of the embedded RTOS may be clearly affected by them. At present, the most common cryptographic algorithms are aimed at PCs, servers, and smart mobile devices that do not require high real-time performance and have sufficient resources. Therefore, how to ensure that they can be effectively deployed in embedded RTOS has become a new challenge. Furthermore, how to quantitatively evaluate the cryptographic algorithm configuration allowed by embedded RTOS has also become the focus of security researchers.

On the evaluation of cryptographic algorithms on embedded devices, the performance [1, 2] of cryptographic algorithms on different devices and simple models of cryptographic overhead [36] are presented in prior work. Meanwhile, in the task scheduling problem of RTOS, the existing work indicates that there is a sensitivity analysis method, which can calculate the feasible region of the task set. This brings us to present: we wonder whether we can combine the cryptographic algorithm overhead model and the sensitivity analysis method to achieve a quantitative measure of the cryptographic algorithm feasibility in the RTOS task set. This is the primary motivation for this study.

1.2. Our Contribution

In this paper, we present a method to provide a quantitative evaluation of the feasibility of ciphers in embedded RTOS. First, we propose a model for symmetric cryptographic algorithm overhead prediction; then, we combine the sensitivity analysis method to analyze the feasibility of the cryptographic algorithm. Finally, the model based on task automaton is used to verify the analysis results formally. Our major contributions can be summarized as follows:(i)We analyze the encryption and decryption processes of symmetric ciphers () and construct a model according to the instruction set of the application platform to predict the time overhead of the cipher execution(ii)By applying the sensitivity analysis method to handle the scheduling problem, we analyze the parameter feasible space of the original task set and combine the cost model to quantify the performance upper bound of the cryptographic algorithm in this task set, that is, the upper throughput bound(iii)We use the labeled transition system () to give the definition of schedulability of the task automaton model and verify our analysis results by modeling the case in the formal verification tool Uppaal to evaluate the impact of ciphers on the real-time performance of the system(iv)We port FreeRTOS to a real embedded platform and implement our case. The tracked running data prove the practicability of our proposed evaluation method.

1.3. Organization

The rest of this paper is organized as follows: Section 2 introduces RTOS and the execution on STM32F103 core Cortex-M3. Section 3 introduces the symmetric cryptographic algorithms in detail and models the algorithm’s overhead estimation on the platform. Section 4 defines the task automaton and the quantitative definition of schedulability. Section 5 mainly describes the experimental method of this paper, including embedded experiments and validation of the automaton model with formal tools. Before giving the conclusions and prospects in Section 8, we present the experimental data and analyze our evaluation model in Section 6 and discuss prior work related to this paper in Section 7.

2. Background

2.1. RTOS and FreeRTOS

A real-time operating system (RTOS) is a kind of operating system that guarantees the completion of functions within a certain time limit. RTOS is divided into hard real time and soft real time. Hard RTOS requires the operations to be completed within the specified time, which is guaranteed by the design of the operating system, while for soft RTOS, the operations are completed as quickly as possible according to the priority of the tasks. A real-time task scheduler must be included in an RTOS to allocate CPU time strictly according to priority.

In RTOS, a task is an executable program unit. Tasks can be divided into two types: periodic and nonperiodic, depending on whether there are periodic changes during task execution. On the one hand, for periodic tasks, the external device periodically sends out an excitation signal to the computer, requiring it to execute cyclically according to a specified period so as to periodically control a certain external device. On the other hand, nonperiodic tasks are different. Excitation signals sent by external devices have no obvious periodicity, but they must be linked to a deadline. It can be divided into two parts: start deadline (task must be executed before a certain time) and completion deadline (task must be completed before a certain time).

The task cycles of periodic tasks generally include task initial offset, deadline, WCET (worst-case execution time), and WCRT (worst-case response time). A periodic task model can be expressed, as shown in Figure 1.

In the embedded field, RTOS is widely used. The application of embedded RTOS can make more reasonable and effective use of CPU resources, simplify the design of application software, shorten the system development time, and ensure the real-time performance and reliability of the system better. FreeRTOS is a mini RTOS kernel with the following characteristics: open-source code, portability, scalable, and flexible scheduling strategy. It can be easily ported to various embedded devices. FreeRTOS has no limit on the number of system tasks, and meanwhile, the scheduling algorithm supports priority and round-robin scheduling algorithms.

2.2. Execution on STM32 Cortex-M3

The internal structure of the CPU divides the execution of an instruction into three stages: fetch, decode, and execute. In the first phase, one or more instructions are loaded from the program counter points. Then, in the next state, the instruction is decoded, the opcodes in the instruction register are taken out, and the instruction is analyzed. After recognition, the instruction will be executed. During execution, it may also need to access the memory bus, depending on what the opcode does.

The popular 32-bit Cortex-M architecture uses the ARM Thumb-2 instruction set [7]. This instruction set opcodes are encoded in a narrow 16-bit or a wide 32-bit form. Most of the instructions in Cortex-M require one or two clock cycles to execute, and for load/store operation, there is an additional overhead that is proportional to the amount of data to load or store.

Like the other architectures in the Cortex-M series, M3 is based on Harvard memory architecture. This means that data and instructions are accessed through different buses, so the loading, decoding, and execution of instructions can be parallelized. This feature is called pipelining, as shown in Figure 2. When the CPU executes the first instruction, it can fetch the next instruction in the same clock cycle. However, if the CPU has prefetched the instruction and fills the pipeline, but it notices that the prefetched instruction is not the next line to execute during the decoding phase, the pipeline needs to be refreshed and repopulated with the correct instruction.

Besides pipeline operation, bus contention and cache also have a great impact on the actual runtime of the system. When devices on the bus are working beyond their designed speed, the bus mediation may fail, which leads to bus competition. Cache misses are associated with the overhead of performing actual reads from the bus because the access to cached data is usually the orders of magnitude faster.

STM32 series products have an official cryptographic algorithm library. It provides a series of compact implementations of cryptographic algorithms [8] on the platform. We decompile the binary files generated by compiling the cryptographic algorithm library to get the assembly code under the Thumb-2 instruction set; then, we analyze the code structure based on the implementation of the cryptographic algorithm. We also classified the instructions for the Thumb-2 instruction set in Table 1.

In Table 1, represents the pipe reload. Branches take 1 cycle for instruction and then pipeline reload for target instruction. Nontaken branches are 1 cycle totally. Taken branches with an immediate are normally 1 cycle of pipeline reload (2 cycles total). Taken branches with register operand are normally 2 cycles of pipeline reload (3 cycles total). means that the pipe reload is counted conditionally when the PC is destined or loaded. represents the count of elements. Generally, load-store instructions take 2 cycles for the first access and 1 cycle for each additional access. Stores with immediate offsets take 1 cycle.

3. Cryptography Algorithm Analysis

3.1. Cipher Analysis Model

To evaluate the impact of the use of cryptographic algorithms on embedded devices, we need to model the resource usage of these ciphers in terms of clock cycles. For a certain microcontroller, the software implementation of cryptographic algorithms is finally determined by the corresponding instructions, and the overhead of the assembly instruction is also fixed. On this basis, we propose an instruction-level overhead model for cryptographic algorithm implementation. The dominant idea of this method is to extract and simplify the assembly instructions in implementing the ciphers; then, by counting the number of instructions and clock cycles required for execution, we can estimate the computational overhead. Essentially, a mathematical model for the overheads of ciphers can be expressed as follows:where represents the overhead of a kind of the cryptographic algorithm, and are the overhead and occurrence number of each kind of assembly instructions.

From the perspective of assembly instruction-level analysis, we can use the instruction cycle to estimate the running cost of the program. For a cryptographic algorithm, we can abstract the model that expresses its structural features according to the algorithm flow. Taking the block cipher as an example, it can be represented by the following model:where represents the overhead required to process a piece of content with a length of , , , and are the overhead of these 3 phases of the cryptographic algorithm, includes all other overheads except block processing, including initial and finish overhead, and are the overhead of each round and the number of rounds, respectively, is the overhead of processing and appending blocks, which depends on the block size and cryptographic mode it applied, and is the number of blocks.

3.2. AES-Rijndael

Rijndael is an iterative block cipher with a variable block and a key length. AES (the Advanced Encryption Standard) [9] is a special implementation of the Rijndael algorithm. The block size of AES is fixed to be 128 bits (16 bytes), the key can be 128, 192, and 256 bits (16, 24, and 32 bytes), and the number of rounds can be 10, 12, and 14, respectively.

The input of the AES algorithm is a 128-bit block mapped to a byte array named “State”, and the operations of AES encryption and decryption are performed on the State. The number of rounds depends on the key length; a 16 -byte key corresponds to 10 rounds, a 24-byte key corresponds to 12 rounds, and a 32-byte key corresponds to 14 rounds. The first rounds comprise 4 different transformations: SubBytes, shiftRows, Mix Columns, and AddRoundKeys, while the last round only contains 3 transformations except the Mix Columns.

The Key Expansion for AES-Rijndael is a simple expansion using XOR and cyclic shifts, which takes a word as a basic unit (a word is 4 bytes), and the purpose is to expand the input key into subkeys.

3.2.1. AES-Rijndael Model

For the AES implementation on an instruction set, according to the above formula, it can be expressed in the following form:

In this model, we use the instruction-level overhead expression in formula (1) to replace the content in formula (2), and then, we get formula (3). The number of instruction cycles in the Thumb-2 instruction set can be queried in Table 1. The is the basic frequency of the microcontroller in the running platform. We can calculate the clock cycle, which is the reciprocal of the basic frequency. Based on formula (3), we can estimate the AES execution time [10] of a specific configuration on Cortex-M3.

3.3. Model Analysis

To evaluate the accuracy of the model, we performed it on an embedded platform STM32F103VE as a concrete example. We use different modes of the AES [11] algorithm to encrypt 64-byte plaintext on Thumb-2. Under our model, the statistical results of the amount of instruction groups are shown in Table 2. Figures 3 and 4 show the comparison between the actual encryption process overhead and our predicted results. Figure 3 mainly shows the difference between the predicted and actual value when the length of the plaintext is fixed to 64 bytes. The predicted execution time is greater than the actual execution time with several configurations of the AES. This is because we chose the longest program flow to calculate in the instruction-level static analysis of the AES assembly file, so our prediction can be viewed as the WCET of this AES implementation for a specific plaintext length. While Figure 4 shows the distinction of different plaintext lengths, we can see that the predicted value of our model is still higher than the actual execution time in terms of changing plaintext length. It can be seen that our model can more accurately predict the implementation cost of the AES algorithm on the instruction set.

4. TA Model and Schedulability Analysis

4.1. Timed Automaton (TA)

Timed automaton is a set of theories for modeling and verifying [12] real-time systems. Many verification tools (such as Uppaal [13]) are based on the theory of timed automaton. A timed automaton is a finite automaton (a graph composed of a finite number of nodes or locations and edges) attached to several real-valued clocks. The transition between the states can only occur if the clock constraints are met. Timed automaton is an abstract model of the time-sequential [14] system. In this paper, the task set scheduling problem in RTOS is our focus, so we choose timed automaton theory for modeling.

Definition 1 (timed automaton). Assuming a finite set of real-valued clocks and is the set of constraints on the clocks in . The clock constraints (guards) are conjunctions of expressions of the form and , where and . A timed automaton over the set of clocks is a tuple , where(i): a set of locations that represent the system status(ii): the initial location(iii): a set of actions(iv): a set of clocks. All clocks are initialized to 0 at and may be reset after executing a transition.(v): a function assigning each location with a clock constraint (a location invariant)(vi) is the set of transitions, where transition from location to location labeled with action is executed only if guard is true and clocks in are resetA transition in TA is denoted by , where and represent the source and destination locations, respectively; is the action taken during the transition; is the guard clock that should hold to execute the transition; is the subset of clocks to be reset when the transition is over. When a timed automaton , we use to represent its system status, initial locations, set of actions, and set of clocks. It is similar for other automatons.

Definition 2 (timed automaton semantics). Let be a timed automaton with the configuration , where location and clock valuation . The semantics are defined by a as follows:(i): a set of state(ii): the initial state of TA, (iii): the transition relation such as–-(i)-(ii)When an action received or sent a clock value that satisfies the clock guard (), the transition will happen, and the TA will change its location as well.

4.2. Timed Automaton with Tasks
4.2.1. Tasks

We consider a multiprocessor system with identical processors and a set of independent, preemptive real-time tasks . A real-time task , where , is represented by a tuple as follows:where is the task identification, is the period (or equivalently, the minimum interarrival time), is the relative deadline, is the initial offset, and is the priority, smaller values of indicate lower priorities. Without loss of generality, we assume global fixed-priority scheduling and that so that is the highest priority task. is the resource, and are the best and worst execution time of the task, respectively, such that and .

We shall use to denote a task instance, and can also be shown as . Then, a task queue can be denoted by a poset ordering by the priority of task set . A set of all task queues including instances of tasks from is denoted .

4.2.2. Scheduling Strategy

The research of this paper is scheduling problems on single-processor systems, and the task queue mentioned in the previous part can be assumed as a list sorted with a specific scheduling strategy. The first element in this list is the task executed by the processor, and the others are tasks waiting for resources. A scheduling strategy can be assumed as a function that can insert released tasks into the task queue, such as (first in first out), (fixed-priority scheduling), and (earliest deadline first).

A scheduling strategy can be indicated by . this function can insert a task instance into the task queue and keep the other tasks in the queue preserved when receiving a task instance and a task queue. Finished tasks are removed from the task queue, and the tasks must finish before its WCET.

4.2.3. Task Automaton

The task automaton model is based on the timed automaton and adds a clock named . It resets at the time when a task finishes its computation removed from the queue and prepares to start a new period.

Definition 3 (task automaton). A task automaton is a timed automaton extended with tasks. A task automaton over actions , clocks , and tasks is a tuple , where(i): a set of locations that represents the system status(ii): the initial location(iii) is the set of transitions, indicated by edges(iv): a function assigning each location with a clock constraint (a location invariant)(v): a partial function assigning locations with tasks(vi): the clock which is reset when the task finishedA transition in task automaton is denoted by .
We use to denote the set of clock assignments for ; a semantic state of an automaton is a triple , where is the current location, denotes the current values of clocks, and is the current task queue. By , we denote a clock assignment such that for all clocks . We can then present the operational semantics in .

Definition 4 (scheduling semantics). Given a scheduling strategy , the semantics of an automaton is a labeled transition system with an initial state , and the transitions are defined by the following rules:(i)(ii),(iii),(iv)where denotes the queue with the task instance inserted into (at the first position), denotes the empty queue, and is a distinct action name.

Definition 5 (reachability). We can use to indicate for an action or for a delay or . We define that is reachable for with the scheduling strategy by .
As all deadlines in task automaton are hard, we define schedulability for a given scheduling strategy as an impossibility of reaching a state where some deadline is missed. We use to denote queues containing a task instance with .

Definition 6 (schedulability). A task automaton with the initial state is nonschedulable with if for some and . Otherwise, we say that is schedulable with . More generally, we say that is schedulable if and only if there exists a scheduling strategy with which is schedulable.
If each task automaton in the task set is schedulable in terms of scheduling strategy and system resources, the task set is schedulable. Fersman et al. and Krcal. [15, 16] proved that the problem of checking schedulability relative to a preemptive scheduling strategy is decidable for task automaton with fixed computation times [17]. Therefore, we fixed the execution time of the task to WCET [18] so that the task automaton is decidable.

4.3. Schedulability Analysis

One of the most interesting properties of task automaton is schedulability. In Definition 6, we define schedulability in the task automaton; then, in this section, we study the schedulability problems with cryptographic parameters [19] related to the overhead model presented in Section 3. Merely, yes or no answers about schedulability do not provide useful results to help designers understand how changes in task parameters will affect the schedulability of the system. [20] Therefore, the schedulability analysis [21, 22] should show the amount of slack in each task that can be used to plan other actions and accurately measure the allowable changes. Thus, we not only consider whether the system is schedulable but also apply sensitivity analysis [23] to provide information about how changes in task WCET parameters affect the system. Starting from a feasible task set, sensitivity analysis will provide an accurate amount of change that can be tolerated in the task calculation time to keep the task set feasible.

4.3.1. Feasibility Region in the Space

The space where each coordinate represents a task’s WCET () is called space. When working in the space, we assume that the variable is only , and the period (), deadline (), and other parameters of tasks are fixed. The origin of space is .

In our feasible region analysis, the task set is regarded as a point in a specific space of task parameters, just as space for WCET feasible region analysis. Therefore, the judgment of schedulability can be equivalent to verifying whether a point belongs to the feasible region of the task set. In this way, the relationship between task parameters can be established, and the feasible value of for a specific task can be determined. In particular, using the FPS scheduling strategy, the feasible region of the task set is defined in the following way:

Lehoczky et al. [24] completed the first attempt to analyze and characterize the region in space. In their work, the deadline is assumed to be equal to the period, and they proved the following theorem:

Theorem 1. Given a periodic task set under fixed priorities,(i) is feasibly schedulable if and only ifwhere .(ii)The entire task set is feasibly schedulable if and only if

Theorem 2. This can be restated in a more expressive form:

Theorem 3. This provides a way to express the feasible region of under rate monotonic (RM) priorities, which is optimal in static scheduling [25]. It can be expressed by the following theorem in the form of logical operators:

Theorem 4. When is equal to , the region of the schedulable task sets is defined as follows:where .

Bini and Buttazzo [26] simplified the above theorem and significantly reduced the number of equations required to define the region. Theorem 3 is proposed to facilitate the establishment of the relationship between the task set parameters, by which we can infer the feasible value of .

Theorem 5. Given a periodic task set under fixed priorities, when is equal to , its feasible region can be defined as follows:where is a set of scheduling points defined as , and is given by the expression as follows:

4.3.2. Sensitivity Analysis in Space

Changes to the task content in RTOS often lead to changes in the schedulability of the task set. In many cases, people hope to accurately describe the slack degree of task scheduling for system scheduling problems. At this time, sensitivity analysis [2729] is very useful. For example, when a developer wants to add a new function to a task, sensitivity analysis can give the allowable range of the task, which can help change the task without affecting the schedulability of the task.

Continuing to focus on space, a task set can be expressed as a point in space, where . Changes to tasks in the task set will lead to linear changes of , and the modified point can be expressed in the following formal form:

is a non-negative vector used to set the direction of change, and is a scalar used to measure the change from the original position. In this paper, only the schedulability after the modification of the original schedulable task set is considered, so .

When the change in the task set is determined, and is also determined accordingly, the satisfaction of the task set schedulability condition is equivalent to restricting in the feasible region so that the range of slack can be determined. If we denote the first elements of by and is consisted of the first WCET in , then is schedulable if and only if:where .

This formula provides the maximum linear slack along the direction.

Changing a task is considered in a given schedulable task set , and its WCET will change accordingly, , the direction vector in the change. Except the -th bit, which is set to 1, all other bits are all 0. We can use the above formula to calculate the maximum value of . The actual meaning of is the maximum WCET increase amount allowed by task under schedulability.where .

4.3.3. CPU Utilization Constraints

The analysis in Section 2 is based on a situation where no additional constraints are imposed. Now, adding utilization constraints to a single CPU system is considered. The ratio is called the utilization of task and represents the processor clock cycles used by that task, and the value is as follows:

is called total processor utilization; it represents the proportion of processor time used by the periodic task set. Obviously, if , there is no feasible plan for this task set. So, we can set the CPU utilization of the original task set as follows:

Then, after changing the task k, the new CPU utilization is as follows:

With constrained , is the constraints of CPU utilization, when , the CPU is overloaded. Obviously, this configuration is not feasible. The constraint of under this condition is

Combined with the sensitivity analysis in Section 4.3.2, under the condition of CPU utilization constraints, the maximum value of the task k whose execution time changes iswhere .

4.3.4. Throughput Evaluation of Cryptographic Algorithms

In Section 3, we propose a model for estimating the CPU clock cycle cost of a specific configuration of the cryptographic algorithm on the chip and the corresponding instruction set. In Sections 4.3.14.3.3, we introduce a sensitivity analysis method for determining the parameter feasible region of a task set. Then, in this section, we combine the model and the sensitivity analysis method to give the evaluation index of the application of the cryptographic algorithm in RTOS: throughput. It can reflect the performance of cryptographic algorithms.

First, Formula (3) is simplified; when the block cipher is the subject of investigation, we have

Encrypting and decrypting a piece of content with a length of is considered as the added part of the task . From the previous formula, we know

The throughput of a cryptographic algorithm is defined as the length of plaintext/ciphertext. It processes in a unit time. Then, according to the above formula, it can be concluded that in RTOS, when a fixed priority scheduling strategy is used on the premise that the task set is schedulable and the CPU occupancy rate is within , a specific configuration of cryptographic algorithm protection for task is added; then, the upper bound of throughput can be expressed as follows:

Among them, is the initial cost of encrypting and decrypting each piece of text, is the size of each block; taking AES as an example, there are three choices of 128, 192, and 256 (in bits), and is the time to process each block.

5. Experimental Method

This section explains the experimental steps of this paper, in addition to the description of the automata model in detail.

Our research goal is to determine the schedulability changes on a single-core processor before and after including a cryptographic algorithm to a specific task set; then, we can determine the applicability of the cryptographic algorithm on this hardware platform under this configuration. The process is outlined as follows:(i)Step 1. Use the cryptographic overhead estimation model in Section 3 to evaluate a cryptographic algorithm on the target platform at the instruction level and get the estimated overhead of the cryptographic algorithm under a certain configuration.(ii)Step 2. Perform sensitivity analysis on the original task set in RTOS and analyze the allowable variation of the task WCET under the guarantee of schedulability. The other parameters of the task are fixed.(iii)Step 3. Add the estimated overhead of the cryptographic algorithm in Step 1 to the task and judge whether the WCET of this task meets the schedulability constraint analyzed in Step 2.(iv)Step 4. Model the task scheduling of RTOS according to the content of Section 4, construct a set of task automata models with time constraints, set parameters for the automata according to the research case, and verify the schedulability of the task set through the automata operation.(v)Step 5. Port RTOS to the embedded platform and set task attributes. After running the system, use the Tracealyzer [30] to track the execution of the tasks and compare with the results in the previous steps.(vi)Step 6. Compare and analyze the experimental data to judge the accuracy of this process and the feasibility of this method.

The fundamental idea of the verifying step is to convert the schedulability analysis problem into the reachability problem of timed automata and use the real-time model validator Uppaal to find the WCRT and processor utilization, and check whether all deadlines are satisfied. In the response time analysis, to prove schedulability, it is sufficient to calculate the WCRT of each task and compare it with the deadline [31]. If the WCRT of each task does not exceed the corresponding deadline, then the system is schedulable and vice versa. Some tasks are periodic, and some are sporadic, but we simplify the model by treating all tasks as periodic.

We use STM32F103VE (Cortex-M3) [32] as a development platform and port FreeRTOS [33] as the operating system on this platform, which employs a preemptive fixed priority scheduler and a time slice round scheduler under the same priority. In a hard RTOS, a task that misses the deadline means that the system is not schedulable. The workflow of our experiment is shown in Figure 5.

As the cryptographic overhead prediction, automaton theory, and sensitivity analysis are mentioned in the former sections, we will introduce the model construction of task automata in detail.

We use Uppaal [34] as a formal verification tool to verify our schedulability analysis model by using verification queries [35]. We use a stopwatch to record the progress of the task, and the stopwatch will stop running when the task is preempted. In Uppaal, the stopwatch is implemented as clock derivatives, where the running rate can be set to 1 or 0.1 means the clock runs at a normal speed, and 0 means the clock stops. Syntactically, the stopwatch expression appears in the invariant constraint of the node as , where is declared as a clock type and is an integer. By verifying whether the stopwatch satisfies the clock constraints, you can check whether the task meets the real-time restrictions of the system.

5.1. The Uppaal Framework

The Uppaal framework [36] consists of the following process models: a fixed-priority preemptive scheduling strategy, a system resource model, a task model, and a model to ensure global invariance. The template we provide for the task model is parameterized using an explicit sequence of task attributes, and we can customize the protocols that the tasks share on specific resources [37]. The main modeling elements are summarized below:(i)A system resource sharing template. The system has two states: Idle and Inuse. The states switch through the interaction with tasks and scheduling strategy; therefore, the resource utilization rate can be calculated;(ii)A scheduling strategy template to process tasks in the queue according to different scheduling strategies;(iii)A task template that can set attributes according to the definition in Section 4.2 task automaton. Each task uses the following clocks and data variables:(iv)the task and its clocks are parameterized by the identifier id; the local clock x is used to control the execution status of its own tasks, and the global clock time [id] is used to control the execution status of the task set in the entire system.

5.2. Resource and Scheduler Model

Figures 6 and 7 show a model of system resources (CPU) and scheduling strategy. First, the system resource model is initialized. Since no user task is running, the CPU runs idle tasks and is in the Idle position. When a task is ready, it sends the signal ready [id]! on the resource preparation channel so that the system resource moves from Idle to an intermediate state position and then sends the signal insert_task [policy]! on the channel to insert the new task into the scheduling queue. Then, the scheduling strategy model FPS receives the signal. If the scheduling queue is not empty, the priority of the current task is compared with the priority in the queue in turn and then inserted into the task queue by insert_task_in_buffer (); otherwise, the current task is directly inserted into the head position of the task queue by insert_at (0, ready_task, id). Then, the resource selects the highest priority task from the task queue and enters the running state position Inuse.

The task releases the CPU through the signal finished [id]!. In this case, the scheduling policy model removes the task from the task queue by removeTask (). If the task queue is empty at this time, the CPU runs the idle task and moves to the Idle position. If it is not empty, the highest priority task is extracted from the task queue, and the CPU is still in the running state, Inuse position. It also uses the stopwatch usedTime to calculate the CPU usage time and then to calculate the total CPU load as .

5.3. Task Template

Figure 8 shows the task model. The task model is built according to the task automaton template proposed in Section 4.2. Each task is instantiated by , , , , , , , , and . First, the task is initialized; after initialization, a new cycle is started, and it is on the WaitingOffset position and waits for the initial offset. Then, the task is on the WaitingDepedency position and waits for the end of the predecessor task it depends on. When the dependency constraint is reached, the task will send ready[id]! to the CPU resource. The system resource and scheduling strategy will insert the task into the queue, and the task will enter the Ready position. After that, the task will switch between Running and Suspend according to its position in the task scheduling queue, at a different rate of local clock x. When completed, the task will send finished[id]! to the CPU resource, and the system resource and scheduling policy will remove the task from the scheduling queue. At last, the task is completed; the periodic task will continue to start a new cycle, and the nonperiodic task will enter the Done position.

5.4. Verification Queries

We use the following query statement to verify schedulability:(i)Check the schedulability of the system (the error location is not reachable):A[] forall (i: t_id) not Task (i).Error(ii)Find the total worst CPU usage:sup: usedTime, idleTime, globalTime(iii)Find the worst-case response times in tasks:sup:WCRT[0], ..., WCRT[n-1]

6. Case Study and Data Analysis

We present a case study to evaluate our method. The original task set configuration is shown in Table 3. The column attributes include WCET, period, deadline, priority, scheduled point, and CPU utilization of each task. Now, adding the AES algorithm to task 3 is considered to protect it without affecting its schedulability. Task 3 needs to encrypt of data and send it in each period. And the CPU occupancy rate is required to be less than . Under this condition, we evaluate the feasibility of the AES algorithm with different configurations. Figure 9 shows the algorithm’s flow diagram for the analysis of tasks related to AES.

Sensitivity analysis is performed on the original task set, and the maximum allowable change in WCET of task 3 is calculated. According to the formula, we have

We can see that in the sensitivity analysis of this task set, the factor restricting the WCET of task 3 is schedulability rather than CPU utilization. After calculating the maximum change in WCET of task 3, we calculate the upper bound of the throughput of different configurations of AES under this constraint, which can be expressed as

Through formula (24) and the instruction-level overhead prediction model in Section 3, we can calculate the upper throughput bound of AES with different key lengths and encryption modes in the task set of the case. As a reference, it can be a performance measurement for the selection of cryptographic algorithms and encryption mode. For example, for the configuration of AES-192-CBC, the upper bound of its throughput rate is , and for AES-256-CBC, the value is .

The encryption requirement of task 3 is per period, so the required throughput rate is . According to the sensitivity analysis, we can conclude that AES-192-CBC is feasible but AES-256-CBC is not. Next, we use the task automaton model to verify this conclusion.

The results of our model-based framework include three parts: task set schedulability judgment, worst-case response time estimation, and cycle-limited verification benchmark CPU utilization estimation.

Schedulability. In the task automata model, the task state that violates the time constraints will be transferred to Error. By retrieving the state space list, the formal verification tool Uppaal can transform the schedulability problem into the automata reachability problem. By using query statement 1 in Section 5.4, the correctness of our analysis results has been verified.

Worst-case response time. In the response time analysis, the WCRT of each task is calculated and compared with the deadline. This is a visual proof of schedulability that if the WCRT of each task does not exceed the corresponding deadline, the system is schedulable. Table 4 compares the WCRT of the original task set and the task set after adding AES-192-CBC and AES-256-CBC, respectively. Task 3, which adds AES-256-CBC, will enter the Error state in Uppaal, and the task will be terminated, so its WCRT cannot be estimated.

CPU utilization. In the periodic task model of RTOS, tasks do not have a fixed progress measurement. To control the running progress of tasks and even the entire task set, we use the cycle count to indicate the number of cycles up to the current time. When the cycle count reaches a preset upper limit (CYCLELIMIT), it is reset to 0 along with all global clocks.

Before resetting the cycle counter Cycle, exploring the hyperperiod (the least common multiple of periods) of all periodic tasks is an effective way to simplify the set of infinite periodic tasks. There are three task periods in the case: , , and , so the potential hyperperiod is . If the global cycle (CYCLE) is set to , the number of hyperperiods is 10 global cycles. The results are shown in Table 5.

In this case, the CPU utilization is not the major factor limiting the task WCRT, so only the CPU utilization of the task set using AES-192-CBC under different global cycles is shown. The CPU utilization is verified by query 2 in Section 5.4. When the cycle counter Cycle is 1 to 10; that is, during the first hyperperiod, the CPU utilization fluctuates. When the cycle counter Cycle is 10, the CPU utilization stabilizes to 90.861% for the first time, and the fluctuation of the CPU utilization gradually decreases. When the cycle counter Cycle is a multiple of 10, that is, when the time window taken is an integer number of hyperperiod, the CPU utilization is stable at 90.861%.

6.1. STM32 Experimental Analysis

We port FreeRTOS10.0.0.1 on the STM32F103VE platform and implement the case. We record the experimental results to judge the accuracy of our method and the correctness of task automata verification. The results are shown in Figures 1013.

Through Tracealyzer’s tracking results, we can see that in the original task set and the task set of adding AES-192-CBC and AES-256-CBC, the overall CPU usage is stable at 89.6%, 90%, and 90.6%, respectively. These are similar to automata estimation and are all lower than the required values because in this case, the CPU usage is not the key factor affecting feasibility. In Figure 13, we can see that, compared to task 3, to which the AES algorithm is directly added, the WCRT of Tasks 1 and 2 has no major changes before and after the application of the AES algorithm. There are only small fluctuations, and they are all within their deadline. It can prove the schedulability of these two tasks. For Task 3, in the original task set and after the application of AES-192-CBC, its WCRT is within the deadline and can be scheduled; but after the AES-256-CBC is added, its WCRT exceeds the deadline range, and Task 3 cannot be scheduled. This is also consistent with our analysis and formal verification results.

7.1. Cipher Overhead Evaluation

Liu et al. [4] proposed a mathematical model based on the basic operation cost frequently used in cryptographic algorithms to predict the cost of these algorithms. Since these basic operations cannot be ignored in implementation, the prediction result can be seen as the lower bound of the overhead. Bauer and Freiling [38] gave an overview of cycle-accurate simulation problems and showed a method for cycle-accurate ARM Thumb-2 simulators. The classification of the Thumb-2 instruction set was used for reference. Granelli and Boato [5] proposed a method to evaluate the complexity and computational cost of different block ciphers that are independent of the platform on which they are implemented. As a representative of resource-constrained devices, the cost evaluation and measurement of cryptographic algorithms on wireless sensor networks [2, 6, 3941] have been widely studied.

7.2. Schedulability Analysis

Yalcinkaya et al. [42] determined the sensitivity of task execution time. The paper measured the influence of parameters on schedulability and proved its correctness and optimality. Punnekkat et al. [43] provided a general method for sensitivity analysis of the task set. This method can help developers integrate changes to the system while ensuring that schedulability remains unchanged. Bini and Buttazzo [26] proposed a novel method to analyze the schedulability of periodic tasks under RM priority allocation [25]. This method can accurately describe the feasible region in the task calculation time space (C space). Afterwards, they [44] summarized a theoretical method for the sensitivity analysis of a real-time system containing a set of periodic tasks. The proposed method allows designers not only to verify the feasibility of the application but also to determine the specific operations to be performed on the design variables to ensure the schedulability of the task set or improve resource utilization.

7.3. Formal Verification

Fersman et al. [15] provided a model, task automaton, for real-time systems with nonuniform repetitive computing tasks. It is an extended version of the timed automaton, which can be used to specify resource constraints and strict timing restrictions in computation. The paper proves that the schedulability check problem related to preemptive scheduling is decidable for task automaton. Mikučionis et al. [45] proposed a modeling framework that uses Uppaal real-time model checker to perform schedulability analysis. Contributions included the modeling framework, its application in industrial case studies, and the comparison of the results with classical response time analysis.

The related work is elaborated in three parts: the overhead estimation analysis and modeling of cryptographic algorithms on embedded platforms, task schedulability and sensitivity analysis in RTOS, and the establishment of RTOS models based on timed task automaton. The related research results in the three fields are introduced, respectively. However, as far as we know, there is no method to analyze the application feasibility of cryptographic algorithm in the RTOS task set, which stimulates the thinking and research in this paper.

8. Conclusions and Future Work

In this paper, we analyze the instruction-level overhead of commonly used symmetric cryptographic algorithms on a chip, establish an automaton model based on the task set in RTOS, and introduce the parameter-based sensitivity analysis. In the experimental part, for a concrete case, we conducted a feasibility analysis of AES in different modes, and then, we construct a model for RTOS on a single-core CPU with a fixed priority preemptive scheduling strategy using Uppaal and verify its schedulability before and after adding a cryptographic algorithm for tasks. Comparing the results with the running data on a real embedded RTOS tracked by the tracealyzer tool, it shows that the estimation results of our method are substantially consistent with the actual data. It is proved that this method can effectively estimate the impact of a cryptographic algorithm on RTOS system task scheduling and provide a quantitative index for the application of cryptographic algorithms on embedded platforms.

In future work, we intend to establish a more accurate estimation model for the overhead of cryptographic algorithms and incorporate more cryptographic algorithms into our evaluation model. We also plan to conduct a sensitivity analysis of other factors affecting RTOS scheduling and study the impact of different scheduling strategies, processor frequency, jitter, and deadlines on models and analysis methods, to build a model that can describe the running process of tasks more accurately.

Data Availability

The figures, tables, codes, and models used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

We thank the support of the National Foundation for Development of Cryptography, under grant no. MMJJ20180211.