Abstract
As time-delay systems become more common, testing them has attracted much attention. In practice, devising a good test strategy also requires evaluating cost and effectiveness. In this paper, we take time-delay and five other factors (state number, input number, output number, completeness degree, and accessibility degree) into account and present a timer embedded FSM (TEFSM) model, on which we design a comparative strategy for assessing the coverage criteria and test suite generation methods for time-delay systems. We explore how the test suite generation methods, coverage criteria, and TEFSM model parameters affect the average length of test suites.
1. Introduction
In the real world, time-delay exists in various engineering systems, such as chemical processes and long transmission lines in pneumatic systems, and the study of time-delay systems has gained considerable attention over the past years [1, 2].
The attendant increase in the complexity of time-delay systems has made the timely development of reliable systems extremely challenging. To ensure the quality and reliability of time-delay systems, model-based testing techniques have been widely used. El-Fakih et al. present a method for deriving test suites with guaranteed fault coverage for deterministic, possibly partial, timed finite state machines (TFSMs) [3]. Abramovici and Stroud present a delay-fault testing approach for field programmable gate arrays based on built-in self-test (BIST) [4]. Many conformance test derivation methods are based on a specification given in the form of a finite state machine (FSM), such as the W [3, 5], partial W (Wp) [6, 7], HIS [8], and [9] test derivation methods.
However, project schedules are always tight, and it is impossible to test a system completely. Thus, the efficiency of the testing method becomes a key factor for ensuring system reliability. Some researchers study the evaluation of the effectiveness of system testing. Briand [10] presents a critical analysis of empirical research in software testing and provides a structured overview and discussion of the validity issues most commonly encountered in empirical studies of testing techniques; this guides how we design our experiments and analyze the validity of our research. Other researchers explore methods for empirical research. For example, the work in [11], based on statecharts, proposes a precise simulation and analysis procedure and investigates the cost effectiveness of four adequacy criteria: all transitions (AT), all transition pairs (ATP), and all the paths in the transition tree (TT).
However, these studies do not analyze further how the parameters of FSMs affect the derivation methods, so testers cannot determine which derivation method to use. Moreover, no work studies these aspects in combination (generation methods versus coverage criteria, and model parameters versus generation methods and coverage criteria).
Based on the previous discussion, and aiming at decreasing the testing cost of time-delay systems, we present an experimental evaluation method for obtaining a minimum test suite for time-delay systems. This method incorporates recent advances in the literature [10–12]. We first take time-delay and five other factors (state number, input number, output number, completeness degree, and accessibility degree) into account and present a timer embedded FSM (TEFSM) model. Moreover, by taking time-delay as an element of the input, the TEFSM can be converted to a normal FSM model, which makes the evaluation task much easier. Furthermore, our investigation focuses on minimizing the length of the test suite, since the shorter the test sequences are, the lower the testing cost is. We present a novel method to evaluate the efficiency of FSM-based testing methods, and we explore whether a test sequence generation method works relatively well for one or several particular coverage criteria and how the parameters of an FSM model affect the FSM test suite. Namely, we investigate the relationship between generation methods, coverage criteria, FSM parameters, and testing cost.
The rest of the paper is organized as follows. Section 2 describes some fundamental concepts such as FSM, TEFSM, and some classic coverage criteria of FSM-based systems. Section 3 presents the research method, which converts a TEFSM to an FSM and decreases the size of the test sequences to save test cost. Section 4 presents the experiments and analyzes the results. Finally, Section 5 concludes the paper and presents directions for further research.
2. Related Concepts Definition
This section describes the basic concepts used in the proposed approach, in particular the TEFSM model.
2.1. FSM Model
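Formally, an FSM is a system $M = (I, O, S, s_0, \delta, \lambda)$, where $I$ is a finite input character set, $O$ is a finite output character set, $S$ is a finite state set, $s_0 \in S$ is the initial state, $\delta$ is the transition function, and $\lambda$ is the output function [3].
If each state of an FSM responds to all the inputs, it is a complete FSM (CFSM); otherwise, it is a partial FSM. That is, if $\delta(s, i)$ and $\lambda(s, i)$ are defined for all $s \in S$ and $i \in I$, it is a CFSM.
If one state of an FSM can reach different states for the same input, it is a nondeterministic FSM; otherwise, it is a deterministic FSM. That is, if there exist $s \in S$ and $i \in I$ such that the FSM can move from $s$ under $i$ to two states $s' \neq s''$, it is a nondeterministic FSM.
If the states of an FSM are pairwise distinguishable, it is a reduced FSM; otherwise, it is nonreduced. That is, if there exist $s_i, s_j \in S$ with $s_i \neq s_j$ such that $\lambda(s_i, \alpha) = \lambda(s_j, \alpha)$ for all $\alpha \in I^{*}$, where $I^{*}$ indicates the set of input sequences and $\lambda$ is extended to sequences, it is a nonreduced FSM.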
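The three properties defined in this subsection (determinism, completeness, and reducedness) can be checked mechanically. The following is a minimal sketch, not part of the original study: the transition-list representation `(state, input, output, next_state)`, the function names, and the three-state example machine are all assumptions made for illustration. Reducedness is checked with a standard partition-refinement procedure that assumes a complete, deterministic FSM.

```python
from itertools import product

def is_deterministic(transitions):
    """True if no (state, input) pair leads to two different next states."""
    target = {}
    for s, i, _o, t in transitions:
        if target.setdefault((s, i), t) != t:
            return False
    return True

def is_complete(states, inputs, transitions):
    """True if every state has a transition defined for every input."""
    defined = {(s, i) for s, i, _o, _t in transitions}
    return all(pair in defined for pair in product(states, inputs))

def is_reduced(states, inputs, transitions):
    """Partition refinement; assumes a complete, deterministic FSM."""
    step = {(s, i): (o, t) for s, i, o, t in transitions}

    def renumber(signature):
        ids = {}
        return {s: ids.setdefault(sig, len(ids)) for s, sig in signature.items()}

    # Start by grouping states with identical one-step outputs.
    block = renumber({s: tuple(step[(s, i)][0] for i in inputs) for s in states})
    while True:
        refined = renumber({
            s: (block[s],) + tuple(block[step[(s, i)][1]] for i in inputs)
            for s in states
        })
        if len(set(refined.values())) == len(set(block.values())):
            # Stable partition: reduced iff every state is alone in its block.
            return len(set(block.values())) == len(states)
        block = refined

# Hypothetical example machine (not Figure 1 of the paper).
STATES = ["s1", "s2", "s3"]
INPUTS = ["a", "b"]
TRANSITIONS = [
    ("s1", "a", 0, "s2"), ("s1", "b", 1, "s1"),
    ("s2", "a", 1, "s3"), ("s2", "b", 1, "s2"),
    ("s3", "a", 0, "s1"), ("s3", "b", 0, "s3"),
]

print(is_deterministic(TRANSITIONS),
      is_complete(STATES, INPUTS, TRANSITIONS),
      is_reduced(STATES, INPUTS, TRANSITIONS))
```

For this example machine all three checks hold. Adding a second `a`-transition from `s1` to a different state makes the machine nondeterministic, and dropping any transition makes it partial.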
2.2. TEFSM
The TEFSM model is an extension of the FSM for dealing with time-delay systems. A timer is embedded in the model, hence the name timer embedded FSM. The timer gives the delay time of a state, and time-delay is considered as a factor during state transition: timers and entry actions are associated with states, conditions and actions are associated with transitions, and time is implicitly associated with inputs and outputs [12]. A state transition occurs automatically after timeout unless further input is received before timeout. In a TEFSM, each input contains a time-delay: it is a pair $(i, t)$, where $i$ is an input symbol and $t$ is a nonnegative rational giving the time-delay of the state transition, and input $i$ is applied after time $t$.
2.3. Coverage Criteria
Coverage criteria state the sufficient conditions for deciding when to end the testing process; that is, a test suite must satisfy a given kind of testing requirements [13]. Let $C$ indicate a coverage criterion and $T$ a test suite, let $R_C(M)$ be the set of testing requirements of a given FSM $M$ for criterion $C$, and let $R_C(M, T)$ be the set of requirements satisfied by the test suite $T$. Obviously, $R_C(M, T) \subseteq R_C(M)$. The coverage rate of $T$ is $cov_C(M, T) = |R_C(M, T)| / |R_C(M)|$. If $R_C(M, T) = R_C(M)$ or, equivalently, $cov_C(M, T) = 1$, $T$ is a test suite adequate for criterion $C$.
Figure 1 is an example of an FSM model with four states and eight transitions; it is used to explain the coverage criteria in this section.
(1) State Coverage (SC) Criterion. Cover all states of ; hence, , where is the state set of and . For the FSM in Figure 1, there are four states , , , and , and the test suite is SC-adequate, since all four states are covered by this test suite. Note that the initial state is reachable with the empty transfer sequence, while the prefix of the test is a transfer sequence to state .
(2) Transition Coverage (TC) Criterion. A transition is a state change triggered by an input event; for example, in Figure 1 there are eight transitions between the four states [11]. To cover all transitions of , conditions and need to be satisfied.
The test suite is TC-adequate for Figure 1 because it covers all 8 transitions.
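To make the coverage rate concrete for the SC and TC criteria, the sketch below counts the states visited and the transitions fired by a test suite and divides by the totals. It is only an illustration under stated assumptions: the dictionary `STEP` (mapping `(state, input)` to `(output, next_state)`), the function name `coverage_rates`, and the three-state machine are hypothetical and do not reproduce Figure 1. Each test sequence is assumed to start from the initial state after a reset.

```python
def coverage_rates(step, s0, suite):
    """Return (state coverage rate, transition coverage rate) of a test suite."""
    all_states = {s for s, _ in step} | {t for _o, t in step.values()}
    visited, fired = {s0}, set()
    for seq in suite:
        s = s0  # each test sequence starts after a reset to the initial state
        for i in seq:
            fired.add((s, i))
            s = step[(s, i)][1]
            visited.add(s)
    return len(visited) / len(all_states), len(fired) / len(step)

# Hypothetical complete deterministic FSM with inputs "a" and "b".
STEP = {
    ("s1", "a"): (0, "s2"), ("s1", "b"): (1, "s1"),
    ("s2", "a"): (1, "s3"), ("s2", "b"): (1, "s2"),
    ("s3", "a"): (0, "s1"), ("s3", "b"): (0, "s3"),
}

print(coverage_rates(STEP, "s1", ["aa"]))
print(coverage_rates(STEP, "s1", ["aaba", "ab", "b"]))
```

In this example the single sequence `aa` is SC-adequate but fires only 2 of the 6 transitions, whereas the suite `aaba`, `ab`, `b` is adequate for both SC and TC.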
(3) Initialization Fault (IF) Coverage Criterion. This criterion avoids confusing other states with the initial state: it checks whether the behavior of other states is the same as that of the initial state, denoted as . There might be no states that can be distinguished from the initial state [11]. However, for a minimal , there are states which are different from the initial state. We can express formally as follows:
It indicates that there are some sequences in to distinguish the initial state from other states.
For the TEFSM in Figure 1, the test suite is IF-adequate. We know that the input sequence is a transfer sequence to state and is followed by , which distinguishes state from the initial state. Similarly, the sequences and also satisfy the requirements related to state and .
(4) Transition Fault (TF) Coverage Criterion. Detect whether the output or the reachable state of one transition is correct to certain input and whether the ending state could be distinguished from other states [14]. We can express formally in mathematics as follows:
defines whether there are some sequences in that could distinguish the ending states of all transitions of from other states. We can express it formally as follows:
For the example TEFSM in Figure 1, the test suite is TF-adequate. Consider, for instance, the transition whose tail state is . Test covers this transition. States and are distinguished by , which follows and ; states and are distinguished by , which follows and ; and states and are distinguished by , which follows and empty sequence. Thus, all requirements related to transition are satisfied. One can check that the other requirements are also satisfied.
2.4. Test Sequences Generation Methods
There are many methods reviewed in Section 5 of the literature [15]. This part describes some test sequences generation methods.
(1) W Method. It uses a characterization set $W$ to distinguish each state of the specification FSM [5]. For the FSM of Figure 1, and has the state cover set ( means empty sequence), after removing those sequences that are proper prefixes of other sequences; the test suite is
(2) Wp Method. It is an improvement of the W method [7]. In the transition-testing phase, it does not use the whole characterization set; it uses a subset instead. According to the ending state of each transition, it chooses the corresponding state identifier set, which requires fewer test sequences than the W method. For the FSM in Figure 1, , , , and . In the first phase, test sequences that check the state are .
(3) HIS Method. It utilizes a separating family, built from the identifiers of each state, to harmonize these identifiers [8]. For the example of Figure 1, we can obtain the separating family set , where , , , and . So, in the first phase, we get ; in the second phase, we compute . The test suite is
Besides, there are other methods, such as UIO presented in the literature [8], DS, and proposed in the literature [16]. Some researchers also study derivation methods for nondeterministic FSMs.
3. Research Method
3.1. Conversion between TEFSM and FSM
According to the definition of TEFSM presented in Section 2.2, a TEFSM diagram can be described with the notation in Table 1. A number of symbols are used to increase readability: “?” for input; “!” for output; “%” for probability; “~” for condition and probability; and “Δ” for time-delay [12].
Figure 2 is an example of TEFSM model, which has three states named (initial state), , and .
There are transitions between each pair of states with different transition conditions. and indicate the minimum delay time from state to state with transition conditions and , respectively. Set and ; then, the TEFSM in Figure 2 can be converted to Figure 3, which is now a normal FSM model.
In this way, the theory of normal FSMs can reasonably be applied to the TEFSM model.
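The conversion idea, folding the time-delay into the input so that an ordinary FSM results, can be sketched as follows. This is an illustrative sketch only: the tuple layout `(state, input, delay, output, next_state)`, the function name `tefsm_to_fsm`, and the example transitions are assumptions, and timeout transitions without an input are not modeled here.

```python
from fractions import Fraction

def tefsm_to_fsm(timed_transitions):
    """Fold each (input, delay) pair into one composite input symbol."""
    step = {}
    for state, inp, delay, out, nxt in timed_transitions:
        # The composite symbol (inp, delay) plays the role of a plain FSM input.
        step[(state, (inp, Fraction(delay)))] = (out, nxt)
    alphabet = sorted({symbol for _state, symbol in step})
    return alphabet, step

# Hypothetical timed transitions; delays are nonnegative rationals.
TIMED = [
    ("s0", "a", Fraction(1, 2), "x", "s1"),
    ("s0", "b", Fraction(3), "y", "s2"),
    ("s1", "a", Fraction(1, 2), "y", "s2"),
    ("s2", "b", Fraction(3), "x", "s0"),
]

alphabet, step = tefsm_to_fsm(TIMED)
print(alphabet)
print(step[("s0", ("a", Fraction(1, 2)))])
```

After the conversion, each distinct `(input, delay)` pair is one symbol of the input alphabet, so FSM-based test derivation can treat the machine as untimed.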
3.2. Testing Cost
Testing cost is a main aspect to evaluate testing technique performance. Briand presents that the testing cost involves two dimensions at least [10]: human effort and machine CPU time. In addition, Namin and Andrews describe the relationship among three properties of test suites: size, structural coverage, and fault location effectiveness [17].
We consider a technique more effective if it can generate a minimum test suite and is highly active; hence, we use the length of test suites as a measure of the cost. The length of a test suite is the total length of the test cases within it. The study in [4] does not consider the accessibility degree and changes only one factor at a time in its experiments. Here, we set five parameters of the FSM model as the effect factors: the number of states, the number of inputs, the number of outputs, the completeness degree, and the accessibility degree.
3.3. Evaluation Method
This paper mainly focuses on three test derivation methods: W, Wp, and HIS. The W method, as the most classic method, should be taken into consideration. The Wp method improves the W method by reducing the number of test sequences and thus should be considered. The HIS method is the method used in the literature [4, 14], so it is necessary to consider it here for comparison.
Moreover, we use the same coverage criteria as those considered in the literature [13]: state coverage (SC), transition coverage (TC), initialization fault (IF), and transition fault (TF) coverage criteria. When an FSM is the specification for testing, tests covering the FSM specification target one or several elements such as inputs, outputs, states, and fragments of its transition graph. Paths are typical fragments of the transition graph considered for coverage; therefore, path coverage has to be taken into account, and we choose the traditional TC criterion. The latter two criteria (IF, TF) target fault models. The IF criterion assumes that the only possible faults in an FSM implementation are related to a wrong initial state of the specification FSM, while the TF criterion assumes that implementation faults occur in transitions. The four criteria are representative of the criteria for the FSM model.
There are three ways to carry out the evaluation.
(1) Similar to the method proposed in the literature [1], first generate a test set that is adequate for all coverage criteria as a test pool; then minimize it to get the minimum test set that is adequate for one specific coverage criterion.
(2) Generate the test set that satisfies each coverage criterion directly by the generation method.
(3) According to the relation among the coverage criteria shown in Table 3 of the literature [4], a test suite covering the TC/IF/TF criteria almost covers the test requirements of SC (the coverage rate and the standard deviation in the first row are 1.000/0.000, 0.970/0.064, and 0.994/0.034). Since there may be some states that are not distinguishable from any other state, the latter two (IF/TF) do not cover all the states. However, we can generate the approximate minimum test set that is adequate for the SC criterion and then add test sequences to it to satisfy the other criteria. In contrast to (1), this resembles an incremental method. Because the generation methods already yield test suites adequate for some coverage criteria, the incremental method is preferable when the coverage criterion is very strict and a single generation method cannot generate a test set adequate for it.
We use the first evaluation method in this paper, as illustrated in detail in Section 4.
After generating the final test set, we analyze the experimental data using the orthogonal testing method. The orthogonal test is useful when many factors influence one outcome. Because the inputs then take many values, it is effective to choose a suitable orthogonal table, which has the advantages of balance and dispersion. In this study, we choose the orthogonal table with four levels and five factors, which can be written as $L_{16}(4^5)$ and is shown in Section 4.2 in detail. The whole evaluation process is also a general process for assessment, and we can choose a way to solve the problem according to the situation. Figure 4 describes the process in detail.
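As an illustration of what balance means for an orthogonal table with four levels and five factors, the sketch below builds one 16-run table of this shape with a standard finite-field (GF(4)) construction and checks that every pair of columns contains each combination of levels equally often. This is an assumption-laden sketch: the construction, the names `GF4_MUL`, `l16_array`, and `is_orthogonal`, and the level coding 0 to 3 are chosen for illustration, and the resulting rows need not match the arrangement of the design scheme used in the experiments.

```python
from collections import Counter
from itertools import combinations, product

# Multiplication table of GF(4) with elements coded 0, 1, 2, 3; addition is XOR.
GF4_MUL = [
    [0, 0, 0, 0],
    [0, 1, 2, 3],
    [0, 2, 3, 1],
    [0, 3, 1, 2],
]

def l16_array():
    """Build 16 runs of 5 four-level factors: a, b, a+b, a+2b, a+3b over GF(4)."""
    rows = []
    for a, b in product(range(4), repeat=2):
        rows.append((a, b, a ^ b, a ^ GF4_MUL[2][b], a ^ GF4_MUL[3][b]))
    return rows

def is_orthogonal(rows, levels):
    """Check that each pair of columns shows every level pair equally often."""
    expected = len(rows) // (levels * levels)
    for c1, c2 in combinations(range(len(rows[0])), 2):
        counts = Counter((row[c1], row[c2]) for row in rows)
        if len(counts) != levels * levels or set(counts.values()) != {expected}:
            return False
    return True

TABLE = l16_array()
print(len(TABLE), is_orthogonal(TABLE, 4))
```

With 16 runs instead of the 4^5 = 1024 full combinations, each level of each factor still appears four times, which is the equilibrium property the orthogonal method relies on.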
How can we select a minimum test suite from to satisfy the test coverage criteria? The literature [13] applies a greedy algorithm to deal with this problem. Test suite selection for the SC/TC criteria is a weighted set-cover problem, while the selection for the IF/TF criteria is defined as set-cover with pairs. However, given the definition of the IF criterion, we can also treat the selection of the minimum test suite for the IF criterion as a weighted set-cover problem.
We also define the cost of a sequence $\alpha$ in the test suite as $|\alpha| + 1$, that is, the length of the sequence plus the reset symbol used to bring the FSM back to the initial state.
In the literature [13], each round picks the sequence with the minimum ratio between the cost and the coverage increment induced by in , that is, . However, this may generate redundant test sequences. For example, in the case of the SC criterion, the FSM model shown in Figure 3 has three states , and . The initial test set is: , . The first sequence “” covers state and state with min ratio . So, we add it to and then check the second sequence “”. Because its second input makes the system reach state , which was not covered before, the ratio min ratio, so it is ignored in this round. In the second loop, “” is added; as a result, the selection contains “” and “.” In fact, “” is enough to satisfy the SC criterion.
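The ratio-based selection described above, and the redundancy it can produce, can be sketched for the SC criterion as follows. This is a hedged illustration, not the algorithm of [13] nor Algorithm 1: the machine `STEP`, the pool of sequences, the helper names, and the tie-breaking rule (ties on the ratio are broken by string order) are all assumptions. The cost of a sequence is taken as its length plus one reset symbol.

```python
def covered_states(step, s0, seq):
    """States visited when seq is applied from the initial state."""
    s, seen = s0, {s0}
    for i in seq:
        s = step[(s, i)][1]
        seen.add(s)
    return seen

def suite_cost(suite):
    """Total cost: each sequence costs its length plus one reset symbol."""
    return sum(len(seq) + 1 for seq in suite)

def greedy_select(step, s0, pool):
    """Repeatedly pick the sequence with the smallest cost / new-coverage ratio."""
    uncovered = {s for s, _ in step} | {t for _o, t in step.values()}
    chosen = []
    while uncovered:
        scored = []
        for seq in pool:
            gain = len(covered_states(step, s0, seq) & uncovered)
            if gain:
                scored.append(((len(seq) + 1) / gain, seq))
        if not scored:
            break  # the pool cannot cover the remaining states
        _ratio, best = min(scored)
        chosen.append(best)
        uncovered -= covered_states(step, s0, best)
    return chosen

# Hypothetical complete deterministic FSM with inputs "a" and "b".
STEP = {
    ("s1", "a"): (0, "s2"), ("s1", "b"): (1, "s1"),
    ("s2", "a"): (1, "s3"), ("s2", "b"): (1, "s2"),
    ("s3", "a"): (0, "s1"), ("s3", "b"): (0, "s3"),
}

selected = greedy_select(STEP, "s1", ["a", "aa", "b", "ab"])
print(selected, suite_cost(selected), suite_cost(["aa"]))
```

In this example `a` and `aa` tie on the ratio in the first round, `a` is picked, and `aa` must still be added in the second round, giving a total cost of 5, although `aa` alone is SC-adequate at cost 3. This mirrors the kind of redundancy that motivates a different weighting.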
To deal with this problem, we present a novel algorithm, MinTSByWeight (minimum test suite by weight), which selects test sequences by weight. The algorithm is based on the principle that the smaller the test suite is, the less it costs. We do not need to consider time-delay in this algorithm, since we have already presented a method to convert a TEFSM to an FSM by taking time-delay as an element of the input.
As shown in Algorithm 1, there are three inputs, the FSM model, the initial test set , and the test coverage criterion , and one output , which is the minimum test suite of the test object that satisfies the coverage criterion . At step (3), the weight of each test sequence or each pair of test sequences is calculated based on the formula . Steps (4) to (7) select the minimum weight among the weights of all sequences or pairs, and steps (8) to (10) add the corresponding sequence or pair “” or “” to the result set . From the second round on, the weights of the remaining sequences or pairs in the initial test set are readjusted by removing the points already covered in or , and the minimum weight is selected as before. This process repeats until the selected sequences are adequate for the criterion or all sequences in the initial set have been checked.

4. Experiments and Results Analysis
4.1. Coverage Criteria for Different Method
Premise. The FSM is a deterministic and complete finite state machine. When the generated FSM is nonreduced, the program does not proceed. In other words, although the automatic process of generating FSMs may produce nonreduced FSMs, this kind of model is not considered in the experiment. Therefore, we generate deterministic and complete models directly.
Coverage Criteria and Generation Method. The experiment measures the average length of the test suites that satisfy the four coverage criteria (SC/TC/IF/TF) and are generated by the W, Wp, and HIS methods.
Experiment Application Scenario. In practical applications, it is impossible to model the whole system; only some important parts are modeled. The experiment therefore contains eight experimental groups, with the number of states set to 3, 6, 8, 10, 12, 14, 16, and 20. Besides, we set the number of inputs to 2, 5, and 7 and the number of outputs to 2, 5, and 7. Each group automatically generates 50 FSMs.
Experiment Result. The results of the three types under the three generation methods are shown in Tables 2, 3, and 4.
Figures 5(a) to 5(e) show the impact of the generation methods on the four criteria: (a) test suites satisfying the SC-criterion; (b) test suites satisfying the TC-criterion; (c) test suites satisfying the IF-criterion; (d) test suites satisfying the TF-criterion; (e) original test suites.
Result Analysis from Test Suite Length. Figures 5(a) and 5(c) show that the test suite length is only slightly affected by the generation methods. Almost all groups obtain sequence suites with nearly the same average length. Figure 5(b) shows that the gap between method and method under TC-adequacy is small, and method can generate a test set with shorter length. Figure 5(d) shows that the impact trend of these three methods on the TF-criterion is similar, but the average length of the TF-adequate test suites generated by method and method is relatively longer. Figure 5(e) shows the difference in the average length of the initial test suite among the three generation methods. The method generates the longest test suite, and the length gap between the suites derived by method and method is insignificant, but method can generate a shorter test suite because it does not need to harmonize the state identifiers. Moreover, method generates shorter TC-adequate and TF-adequate test suites as well as a shorter initial test suite.
Result Analysis from Effort. When choosing the minimum set from a large pool for each coverage criterion, the reduction rate differs. If the reduction rate is low, we can directly use the initial test suite instead of choosing the minimized set, which improves test efficiency. Because the SC/IF criteria can be covered by fewer test sequences, while the TC/TF criteria need more test sequences to be covered, we only consider the latter two criteria under different derivation methods. The impact of each derivation method on the TC-criterion is shown in Figure 6(a), and the impact on the TF-criterion is shown in Figure 6(b).
Figure 6: (a) effort of minimizing test suites satisfying the TC-criterion; (b) effort of minimizing test suites satisfying the TF-criterion.
From Figure 6, we know that choosing the minimization test suite needs the largest effort. For the TC-criterion, Figures 5(b) and 5(e) show the reduction intensity of removing the useless sequences, and Figure 6(a) reflects the high reduction rate, while there is little difference in the reduction rate between the other two generation methods (method and method) when choosing the minimum set to satisfy the TF-criterion. However, for the TC-criterion, when the number of inputs is small, the gap is small, but when it is large, there is a clear gap. For example, if groups 7, 8, and 9 (the number of states, the number of inputs, and the number of outputs) are 8/2/2, 8/4/7, and 8/5/5, respectively, then in group No. 7 the selection rate is 0.62637 for method and 0.7059 for method, while in group No. 8 the selection rate is 0.31609 for method and 0.6409 for method. So, if the input number is large, method can be used to improve the quality of the test sequences.
4.2. The Impact of FSM Parameters on Coverage Criteria
Consider five factors: the number of states, the number of inputs, the number of outputs, the completeness degree, and the accessibility degree. From the results above, for CFSMs, the minimization cost of method is the minimum. Hence, here, we choose method as generation method.
Using the orthogonal experimental method, we choose some representative samples from all level combinations for experiments and then find out the optimal level combination by analyzing the results of these samples to comprehend experiments completely.
In this experiment, there are 4 levels and 5 factors. Each group generates 500 FSMs automatically. We control the level of the number of inputs and accessibility degree related to the number of states to make the FSMs more similar to the reality. As one level among these factors is on the same situation, though the level values among each factor may be different, the control would not lead to the inequality among the factors.
Table 5 shows the levels for each factor and Table 6 shows the design scheme (number in brackets indicates level number).
Experimental Results Analysis. Applying the range-analysis method, we can obtain some values such as , , , , and ; from these range values, we can easily know . The number of states has the greatest impact on the test sequence length, followed by the number of inputs and the number of outputs (outputs affect the verification of transitions). With the orthogonal design tool, we obtain the effect curve shown in Figure 7(a). The effect curve for the SC-criterion is shown in Figure 7(b).
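The range analysis used here can be sketched as follows: for each factor, average the response over the runs at each level and take the difference between the largest and smallest level mean; a larger range indicates a stronger influence. The sketch is illustrative only. The small two-level design, the response values, and the function name `range_analysis` are made up for the example and are not the experimental data of this paper.

```python
def range_analysis(design, responses, levels):
    """Return, per factor, the range (max - min) of the mean response per level."""
    ranges = []
    for col in range(len(design[0])):
        means = []
        for level in range(levels):
            ys = [y for row, y in zip(design, responses) if row[col] == level]
            means.append(sum(ys) / len(ys))
        ranges.append(max(means) - min(means))
    return ranges

# Illustrative two-level, three-factor orthogonal design with made-up responses.
DESIGN = [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0)]
RESPONSES = [10, 12, 20, 22]

print(range_analysis(DESIGN, RESPONSES, 2))  # prints [10.0, 2.0, 0.0]
```

In this toy data the first factor dominates (range 10), the second has a small effect (range 2), and the third has none, which is the kind of ordering the range values are used to establish.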
Figure 7: (a) effect curve 1; (b) effect curve 2.
Figure 7 shows that the number of states has a great impact on the average length, while the other three parameters have little effect on the average length of the test suites which satisfy the SC-criterion. The number of inputs has a positive influence on the average length, as Figure 7(a) shows. An increase in the number of outputs reduces the length of the test sequences, because the more outputs there are, the shorter the characterization sequences and the fewer such sequences are needed. As each derivation method is closely related to the sequences reaching each state, the impact of the accessibility degree takes a wave form and is not positively correlated. Figure 7(a) shows that when the accessibility degree is 0.8 times the maximum length (the number of states − 1), the test suite length nearly reaches the maximum. When it equals 60 percent of the maximum length, the test suite length nearly reaches the minimum. After all, represents the length of each sequence reaching one state.
Range analysis cannot distinguish data changes caused by the test conditions from those caused by experimental errors, and it does not tell whether a factor effect is significant, so it cannot establish the accuracy of the analysis. We therefore perform a variance analysis of the impact of factors and errors, treating the accessibility degree column as the error column. The result is shown in Table 7.
We can see that the number of states has the most significant impact on the length of the test suites.
To sum up the above results, we draw the following conclusions.
(a) For the SC-criterion and the IF-criterion, the length of the test suites differs little among the derivation methods. method can produce relatively shorter TC-adequate test suites, but it needs to pick out a relatively small part of the whole initial test suite. To satisfy the TF-criterion, method has some advantages. Relatively speaking, it generates the smallest test set length, and when choosing the minimum test suite, few sequences are removed; so, if the coverage criterion is the TF-criterion, we can use method directly and do not need to select the minimal adequate test suite. Moreover, if we only need to satisfy the TC-criterion, we can make a tradeoff and consider the method, because method is influenced by the number of inputs of the FSM in the process of selecting the minimum test set for the TC-criterion, and in case the number of test sequences derived by method is more than that derived by method, the rate of the sequences needing to be removed is relatively stable.
(b) One of the FSM model parameters, the number of states , has the most significant impact on the length of the test suites. As the number of states increases, the length increases much faster. In addition, an increase in the number of inputs causes an increase in length, but the change rate is lower than that caused by increasing the number of states. As the number of outputs increases, the length decreases, because more outputs make the sequences that distinguish the states shorter and limit the number of distinguishing sequences; and as the accessibility degree goes up, the length changes in a wave form.
So, when designing the complete FSM models, in the conditions of the same number of states, inputs, and outputs, if we try to make the accessibility degree be 0.6 times as long as the maximum length (the number of states − 1), the length of the generated test set will be fluctuating around the minimum.
4.3. Threats to Validity
The FSMs considered in the experiments are generated randomly, and we only consider complete and deterministic FSMs together with the time-delay factor, so we cannot be sure that they are close to the FSM specifications in actual projects. Therefore, the conclusions drawn from the random FSMs may not apply well in actual situations. Future work needs further validation in real situations and should extend our study to nondeterministic FSMs and partial FSMs.
Measuring the cost of testing in such an experimental context is also a challenge. Briand states that the testing cost involves at least two dimensions, human effort and machine CPU time, details of which are in Figure 3 of the literature [10]. It is rather clear that test suite size, regardless of how it is measured, is a very rough cost measure. Moreover, the literature [11] uses the cumulative length of a test set to measure the cost because it is easy to count. Though many follow this simple way of measuring cost, it would be better to consider other factors.
We can assess the ability of the test criteria not only by the average length of the test suites but also through faulty versions (mutants) generated by applying mutation operators to a specification FSM together with time-delay. In this case, we compare their mutation scores, which measure the number of mutants killed, and use them to assess the fault detection capabilities of the various test suite generation methods. However, this is only used for finding faults. It follows that the former (the average length of the test suites) mainly evaluates the cost and the latter (mutation scores) mainly evaluates the effectiveness. Whether there are valuable suggestions on finding the balance point of cost-effectiveness is not probed here.
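A mutation-based assessment of this kind can be sketched as follows. The sketch assumes two simple mutation operators, changing the output of one transition and redirecting one transition to another state, and counts a mutant as killed when some test sequence produces a different output sequence than the specification. The machine `STEP`, the operator choice, and all function names are assumptions for illustration; time-delay is not modeled, and some mutants may be equivalent to the specification and therefore impossible to kill.

```python
def outputs(step, s0, seq):
    """Output sequence produced when seq is applied from the initial state."""
    s, produced = s0, []
    for i in seq:
        o, s = step[(s, i)]
        produced.append(o)
    return produced

def mutants(step, states, output_symbols):
    """Yield single-fault mutants: one wrong output or one wrong target state."""
    for key, (o, t) in step.items():
        for o2 in output_symbols:
            if o2 != o:
                yield {**step, key: (o2, t)}
        for t2 in states:
            if t2 != t:
                yield {**step, key: (o, t2)}

def mutation_score(step, s0, suite, states, output_symbols):
    """Fraction of mutants whose outputs differ from the specification."""
    pool = list(mutants(step, states, output_symbols))
    killed = sum(
        any(outputs(m, s0, seq) != outputs(step, s0, seq) for seq in suite)
        for m in pool
    )
    return killed / len(pool)

# Hypothetical complete deterministic FSM with inputs "a" and "b".
STATES = ["s1", "s2", "s3"]
OUTS = [0, 1]
STEP = {
    ("s1", "a"): (0, "s2"), ("s1", "b"): (1, "s1"),
    ("s2", "a"): (1, "s3"), ("s2", "b"): (1, "s2"),
    ("s3", "a"): (0, "s1"), ("s3", "b"): (0, "s3"),
}

score = mutation_score(STEP, "s1", ["aaba", "ab", "b"], STATES, OUTS)
print(round(score, 3))
```

A suite that fires every transition kills every output mutant in this setting, so its score is at least the share of output mutants; how many transfer mutants it kills depends on whether the sequences continue long enough after the faulty transition to expose the wrong state.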
5. Conclusion
The main contribution of this paper is a comparative strategy for assessing the coverage criteria and test cost in testing time-delay systems. A TEFSM model is adopted for generating the test suites of time-delay systems. In the future, coverage criteria or test suite generation methods that are not applied in this paper can be added directly to the comparative architecture for assessment. Besides, we take many factors of the FSM model into account (FSM state number, input number, output number, completeness degree, accessibility degree, and time-delay, which are not considered in the literature [13]). Through empirical study, we explore how the test suite generation methods, coverage criteria, and FSM model parameters influence the average length of test suites. Based on these relationships, test engineers can devise a good test strategy for conformance testing based on the TEFSM model. For example, if the coverage criterion is the TF-criterion, we can use that directly because of its relatively minimal cost. When designing complete FSM models with the same number of states, inputs, and outputs, if we try to make the accessibility degree 0.6 times the maximum length (the number of states − 1), the length of the generated test set will fluctuate around the minimum. In industrial practice, one may design FSM models for critical components and automatically obtain test sequences by suitable methods. If, based on the requirements, one can judge in which situation the test cost is lower, one knows which method to choose or how to improve the models. However, because the FSMs for the experiments are generated randomly and we only consider complete and deterministic FSMs, the results may vary in actual situations. Next, we will extend our work to nondeterministic FSMs and partial FSMs.
The approach requires further validation in more practical situations. The literature [13] explores the correlation between structural testing and model testing. We will benefit from these efforts if we solve problems such as how to extract what the testing phases have in common, how to understand the kinds of errors detected at each stage, and how to use the various test adequacy criteria and test suite generation methods reasonably to test efficiently.
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
Acknowledgments
The authors would like to thank Professor Adenilso da Silva Simao for his help with their questions. The reviewers are sincerely thanked for their useful comments.