Abstract

In this paper, a methodology for the automatic generation of test scenarios for intelligent driving systems is proposed, based on the combination of the test matrix (TM) and combinatorial testing (CT) methods. With a hierarchical model of influence factors, an evaluation index for scenario complexity is designed. Then an improved CT algorithm is proposed to balance test efficiency, condition coverage, and scenario complexity. This method ensures the required combinational coverage and at the same time increases the overall complexity of the generated scenarios, which is not considered by conventional CT. Furthermore, the way to find the best compromise between efficiency and complexity and the bounds of the scenario number are analyzed theoretically. To validate its effectiveness, the method has been applied in the hardware-in-the-loop (HIL) test of a lane departure warning (LDW) system. The results show that the proposed method can ensure the required coverage with a significantly improved scenario complexity, and the generated test scenarios can find system defects more efficiently.

1. Introduction

In recent years, with the development of intelligent driving technologies, advanced driver assistance systems (ADAS) such as automatic emergency braking (AEB), forward collision warning (FCW), and lane departure warning (LDW) have been brought to market rapidly [1]. Compared with traditional onboard electronic systems, the working conditions of such systems cannot be exactly defined or controlled, which makes the traditional test scenario design approaches inapplicable. On the other hand, since these systems directly influence driving safety, sufficient and comprehensive tests are necessary before they go public [2, 3].

At present, the approaches for designing test scenarios for ADAS can be divided mainly into five types: the naturalistic field operational test (N-FOT), Monte Carlo simulation (MCS), the accelerated evaluation method (AE), worst-case scenario evaluation (WCSE), and the test matrix approach (TM). In N-FOT, vehicles equipped with the ADAS to be tested are driven by multiple drivers in real traffic to collect data over a long period in order to achieve enough coverage [4]. It can test the system under real working conditions, but the occurrence probability of critical conditions is very low, and most of the time the system may be tested under similar and simple scenarios. Moreover, real traffic is uncontrollable, which makes it difficult to ensure good test consistency [5]. To improve N-FOT, some researchers use data collected from naturalistic traffic to build a stochastic driving behavior model to generate test scenarios, which is called MCS. Based on this approach, Yang et al. developed a car-following driver model to evaluate FCW and AEB [6]. Compared with N-FOT, MCS can partly extend the test conditions through the driving behavior model, ensure good consistency with real traffic, and reduce the number of similar and simple scenarios to increase test efficiency. Unfortunately, critical conditions can still hardly be generated because of the lack of original traffic data.

To generate more critical scenarios, Zhao et al. [7] and Huang et al. [8] proposed AE, which applies importance-sampling theory to speed up the evaluation process by finding the most critical scenarios. However, this method only takes a limited number of influence factors into account for a specific driving assistance function, which restricts its application range. In order to directly extract the most critical scenarios, WCSE was proposed based on a database built from traffic accidents. Andreas validated the emergency lane assist system based on these most dangerous scenes [9]. Such extreme scenarios help to find the faults and achievable performance of ADAS, but the coverage of all related factors cannot be measured quantitatively, which makes it difficult to determine when to stop the validation and evaluation process.

As a widely used method to construct standardized test scenarios, TM has also been applied in the formulation of test standards for ADAS, e.g., ISO 15622 for ACC [10] and AEB in Euro NCAP [11]. TM can take all factors related to the tested system into consideration. Therefore, compared with other methods, it can be used to design scenarios with higher diversity and complexity at an acceptable testing cost, with better controllability and repeatability. However, the current TM has the following problems when applied to ADAS testing [12]: (1) the number of test scenarios and considered factors is too small to achieve the coverage requirement, while the commonly used orthogonal experiment method (OE) for TM leads to a "dimensional disaster" when the number of considered influence factors increases; (2) its test efficiency is very low when failures are triggered only by the interaction of a small number of factors.

To solve these issues of TM, we introduce the combinatorial testing (CT) method, which has already been widely used in computer software testing [13]. The motivation of CT is the fact that most program errors are caused by the interaction of a small number of influence factors [14]. The set of test scenarios generated by CT can guarantee full coverage of the required n-wise combinations with as few scenarios as possible. Cohen et al. proposed the Automatic Efficient Test Generator (AETG) algorithm, which generates a certain number of candidate test scenarios at each step using a greedy algorithm and then picks the one that covers the most n-wise combinations of factors still uncovered [15]. Czerwonka developed another CT tool called the Pairwise Independent Combinatorial Testing tool (PICT), which, unlike the AETG algorithm, does not need to produce candidate test scenarios beforehand [16]; instead it uses a fixed random seed to ensure the consistency of the output. To further reduce the number of test scenarios, James developed a tool called Allpairs to find a reasonably small set of test scenarios satisfying full coverage of arbitrary pairwise combinations [17]. Wu et al. used the particle swarm optimization algorithm to generate test suites ensuring the coverage criteria with smaller test sets than other metaheuristic strategies [18]. Garvin et al. modified the original simulated annealing algorithm to further improve the efficiency of test suite generation [19]. Gonzalez-Hernandez used the Mixed Tabu Search to build test suites with uniform strength [20].

All the aforementioned CT methods focus on reducing the test set while ensuring the combinatorial coverage. The complexity of scenarios, which is beneficial to test effectiveness because a system tends to malfunction under more complex conditions, has not been taken into account when generating combinatorial test scenarios. Therefore, a new method called combinatorial testing based on scenario complexity (CTBC) is presented in this paper for automatically generating test scenarios for intelligent driving systems, which achieves full coverage of n-wise combinations with a relatively compact test set while increasing the scenario complexity. The rest of the paper is organized as follows: Section 2 describes the model of influence factors and the potential problems; Section 3 introduces the procedure and principle of the proposed CTBC, whose performance is analyzed theoretically in Section 4; Section 5 validates CTBC by application and comparative analysis; and finally Section 6 concludes the paper.

2. Problem Description

Before discussing the problem of automatic generation of test scenarios for intelligent driving systems, a mathematical model is introduced to describe the influence factors of an intelligent driving system, which act as the input of the automatic scenario generation process.

2.1. Tree Structure Model of Influence Factors

The components used to construct a test scenario are called influence factors in this paper; they are related to the functionality of the intelligent driving system. These factors can be obtained through a variety of ways, such as technical specifications, naturalistic traffic data, etc. To ensure comprehensiveness, systematic methods such as the classification tree [21] are suggested for analyzing the possible influence factors. In general, the influence factors of an intelligent driving system can be divided into three types: environment, self-state, and other road participants.

To realize automatic generation of test scenarios, the influence factors should be discretized into distinct values, which can be obtained by black-box test design methods such as equivalence class partitioning and boundary value analysis [12]. Finally, a tree structure model of influence factors can be derived, as shown in Figure 1 as an example. Table 3 in the Appendix shows the detailed influence factors of LDW.

Then the factors and their hierarchical relations can be described in a general mathematical form as shown by (1) and (2):

$$F_i = \{f_{i,1}, f_{i,2}, \dots, f_{i,N_i}\}, \quad i = 1, 2, \dots, L, \tag{1}$$

$$f_{i,j} = \{f_{i+1,k} \mid f_{i+1,k} \text{ is a subfactor of } f_{i,j}\}, \tag{2}$$

where $f_{i,j}$ is the $j$-th factor at the $i$-th layer, a non-bottom factor $f_{i,j}$ is composed of all subfactors of $f_{i,j}$ at the next layer, $L$ is the number of factor layers, and $N_i$ is the number of factors in the $i$-th layer.

All subfactors belonging to the same parent factor can appear in one scenario, while different values of the same factor at the bottom layer cannot. Therefore, a different formula is needed to describe all possible values of the factors located at the bottom factor layer:

$$V_j = \{v_{j,1}, v_{j,2}, \dots, v_{j,a_j}\}, \quad j \in B,$$

where $v_{j,k}$ is the $k$-th value of the $j$-th bottom-layer factor and $B$ is composed of the subscripts of all factors at the bottom layer.

Finally, a test scenario can be built by combining all the factors at the bottom factor layer, each with one of its corresponding values. To simplify the description, let $|\cdot|$ denote the number of elements of a set. Then the end node factors can be denoted as $F_1, F_2, \dots, F_m$, where $m = |B|$, and the values of each factor $F_j$ can be denoted as $v_{j,1}, v_{j,2}, \dots, v_{j,a_j}$ ($a_j = |V_j|$). The mathematical expression of the test scenarios is

$$TS = \begin{bmatrix} s_{1,1} & s_{1,2} & \cdots & s_{1,m} \\ \vdots & \vdots & & \vdots \\ s_{N,1} & s_{N,2} & \cdots & s_{N,m} \end{bmatrix},$$

where $TS$ is a matrix including all test scenarios, $N$ is the number of test scenarios, and $s_{i,j}$ denotes the value of the $j$-th factor in the $i$-th scenario. The construction process of a test scenario is shown in Figure 2 as an example.
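To make the data model concrete, the following is a minimal Python sketch of the tree structure of influence factors and of an exhaustive scenario matrix built from the bottom-layer values; the factor names, values, and the nested-dictionary representation are illustrative assumptions, not the LDW model of Table 3.

    # Minimal sketch of the factor tree and the scenario matrix TS.
    # Factor names and values are illustrative placeholders only.
    from itertools import product

    factor_tree = {
        "environment": {"weather": ["sunny", "rain", "fog"],
                        "illumination": ["day", "night"]},
        "self-state":  {"speed_kmh": ["60", "90", "120"]},
        "road_users":  {"lead_vehicle": ["none", "present"]},
    }

    def bottom_factors(tree):
        """Collect the bottom-layer factors and their discretized values."""
        leaves = {}
        for subtree in tree.values():
            for factor, values in subtree.items():
                leaves[factor] = values
        return leaves

    leaves = bottom_factors(factor_tree)                 # end-node factors F_1..F_m
    exhaustive_TS = [dict(zip(leaves, combo))            # every row is one scenario s_i
                     for combo in product(*leaves.values())]
    print(len(leaves), "bottom-layer factors,", len(exhaustive_TS), "exhaustive scenarios")

The exhaustive product corresponds to the impractically large scenario count discussed in Section 2.2; the CT-based generation of Section 3 replaces it with a much smaller set.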

2.2. Problem Analysis

Based on the structure model of the influence factors of LDW shown in Table 3 in the Appendix, the problem of generating effective test scenarios is discussed in this section. There are 16 influence factors at the bottom level, which have 58 values in total. The number of test scenarios defined in ISO 17361 is only 8 [22], which is far from covering all possibilities, and its error detection capability is poor; such standard test scenarios can only be used to validate the primary functionality. On the other hand, the scenarios generated by OE can ensure the coverage of all influence factors, but the number of scenarios reaches 349,920,000, which is unacceptable in practice because of the test cost. By using PICT, the number of scenarios ensuring pairwise combination coverage is reduced to 53 [23]. However, the percentage of scenarios with higher complexity, under which LDW may easily malfunction, cannot be improved. Therefore, in order to increase the proportion of scenarios with higher complexity and further improve the test efficiency, an improved CT method called CTBC is introduced in the next section.

3. Combinatorial Test Generation Method Based on Complexity

To control the complexity of the generated test scenarios, we need an index to measure the contribution of each factor value to the complexity of a scenario.

3.1. Complexity Index of Scenario

The contribution of each factor or value to the complexity of a scenario can be determined by the Analytic Hierarchy Process (AHP), which is widely used in the field of engineering because of its ability to quantify subjective evaluations [24]. The calculation process is based on the tree structure model of influence factors shown in Figure 1. According to their relative contributions, a judgment matrix can be constructed as

$$A = (a_{pq})_{n \times n},$$

where $a_{pq}$ represents a comparison of the relative importance of any two elements in $A$, and it can be determined by Saaty's scaling law [24]. It also has the following properties:

$$a_{pq} > 0, \qquad a_{qp} = \frac{1}{a_{pq}}, \qquad a_{pp} = 1.$$

Then, the normalized eigenvector corresponding to the maximum positive eigenvalue $\lambda_{\max}$ of $A$ is

$$A w = \lambda_{\max} w, \qquad w = (w_1, w_2, \dots, w_n)^{T}, \qquad \sum_{p=1}^{n} w_p = 1,$$

where $w_p$ represents the relative importance of the compared influence factors or their corresponding values according to [24].
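As an illustration of this AHP step, the sketch below computes the normalized principal eigenvector of a small judgment matrix with NumPy; the pairwise comparison values are hypothetical and only need to satisfy the properties above.

    import numpy as np

    # Hypothetical 3x3 judgment matrix obeying a_pq > 0, a_qp = 1/a_pq, a_pp = 1
    # (values taken from Saaty's 1-9 scale).
    A = np.array([[1.0, 3.0, 5.0],
                  [1/3, 1.0, 2.0],
                  [1/5, 1/2, 1.0]])

    eigvals, eigvecs = np.linalg.eig(A)
    k = np.argmax(eigvals.real)             # index of the maximum positive eigenvalue
    w = np.abs(eigvecs[:, k].real)
    w = w / w.sum()                         # normalized relative importance weights
    print("local weights:", np.round(w, 3))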

However, the relative importance degrees are not comparable for factors belonging to different categories. Therefore, to measure the importance of the different values used to construct a test scenario uniformly and equally, the following importance index of a value is used:

$$I(v_{j,k}) = \prod_{(p,q) \in R_{j,k}} w_{p,q}, \tag{4}$$

where $w_{p,q}$ is the local weight of node $f_{p,q}$ obtained from the eigenvector of its corresponding judgment matrix and $R_{j,k}$ is composed of the subscripts of the nodes on the route from $v_{j,k}$ to the root node. Table 3 shows the importance index of the factor values of LDW. Then, with the importance index, the complexity index of the $i$-th test scenario in $TS$ is defined as

$$C_i = \sum_{j=1}^{m} I(s_{i,j}),$$

where $I(s_{i,j})$ is the importance index of $s_{i,j}$.
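A minimal sketch of evaluating the importance index and the scenario complexity index is given below, assuming the hierarchical composition used in (4), i.e., the global index of a value is the product of the local weights along its route to the root; all numbers are placeholders rather than the LDW values of Table 3.

    # Local AHP weights of each node, keyed by its path from the root.
    # All numbers are illustrative, not the LDW values of Table 3.
    local_weight = {
        ("environment",): 0.5,
        ("environment", "weather"): 0.6,
        ("environment", "weather", "fog"): 0.7,
    }

    def importance_index(path):
        """Global importance index of a bottom-layer value: product of the local
        weights of every node on the route from the value up to the root."""
        idx = 1.0
        for depth in range(1, len(path) + 1):
            idx *= local_weight[path[:depth]]
        return idx

    def complexity_index(scenario_paths):
        """Complexity index C_i of a scenario: sum of its values' importance indexes."""
        return sum(importance_index(p) for p in scenario_paths)

    print(importance_index(("environment", "weather", "fog")))     # 0.5*0.6*0.7 = 0.21
    print(complexity_index([("environment", "weather", "fog")]))   # 0.21 for this one value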

3.2. Complexity Index Based CT Algorithm

In this section, the CTBC algorithm is introduced to improve the test scenario complexity while ensuring the required combinational coverage. One idea to improve the scenario complexity is to prefer factor values with a higher importance index when generating test scenarios. Naturally, more complex scenarios can be found at the beginning of the automatic generation process, but the importance indexes of the factor values left uncovered become very small. It is then impossible to construct complex scenarios using these factor values, which is bad for the overall complexity of all generated scenarios.

To overcome this problem, a threshold is defined to choose which aspect is considered when generating test scenarios. When generating a new scenario, if the summed importance index of the already selected factor values is larger than this threshold, the unselected factor values covering the most uncovered combinations are preferred. Otherwise, the importance index is considered preferentially when determining the values of the remaining factors of the scenario. In this way, a tradeoff between combinational coverage and scenario complexity is made.

For a given tree structure model of influence factors with their importance indexes defined by (4), there exist bounds on the achievable maximum and minimum complexity of a scenario. If the selected threshold is less than the minimum complexity, the CTBC algorithm generates scenarios considering only the combinational coverage requirement, like AETG. In Section 4.1, a complexity improvement coefficient is proposed to determine a proper threshold.

The CTBC algorithm generates one test scenario at a time and continues until every required combination of possible values has been covered by at least one scenario. Based on the aforementioned idea and the variables defined in Table 1, the pseudocode of CTBC is shown in Algorithm 1.

Input: bottom-layer factors and their value sets, importance indexes, coverage strength t, threshold
Output: test scenario set TS
Obtain uncovered, the set of all t-wise value combinations not yet covered
for all comb in uncovered do
    combweight = sum of the importance indexes of the values in comb
    Add combweight to setofweight
end for
while uncovered is not empty do
    best = combination with max(combweight) in uncovered
    Assign the factors and values corresponding to best to testscenario
    Remove best from uncovered
    if combweight of best > threshold then
        while (length of testscenario) < number of bottom-layer factors do
            nextbest = combination with max(combweight) in uncovered
            Compare the values of the same factors between nextbest and testscenario
            if there exist conflicting values then
                continue with the next combination in descending order of combweight
            end if
            Assign the factors and values of nextbest that are not yet in testscenario to testscenario
            Remove nextbest from uncovered
        end while
    else
        Assign each factor not contained in testscenario the value with the highest importance index
    end if
    Add testscenario to TS
end while

When programming the executable code, the following should be noted:

(a) In order to accelerate the searching process, all uncovered combinations are kept in a red-black tree sorted by the sum of importance indexes in descending order [25].

(b) Moreover, reproducibility and determinism are important requirements for the test of ADAS. Therefore, the lexicographical order is used to ensure the consistency of the generated scenarios when there exist multiple choices providing the same sum of importance indexes [26].
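For illustration, the following is a compact, runnable Python sketch of the greedy loop of Algorithm 1 for pairwise coverage; it re-sorts the uncovered combinations with an ordinary sort instead of a red-black tree, fills any factors left unset with their most important value, and uses hypothetical factors, values, importance indexes, and threshold.

    from itertools import combinations, product

    # Hypothetical bottom-layer factors with per-value importance indexes.
    factors = {
        "weather":      {"sunny": 0.05, "rain": 0.15, "fog": 0.30},
        "illumination": {"day": 0.05, "night": 0.20},
        "curvature":    {"straight": 0.05, "curve": 0.25},
        "speed":        {"60": 0.05, "90": 0.10, "120": 0.20},
    }
    T, THRESHOLD = 2, 0.35     # coverage strength and complexity threshold (illustrative)

    def weight(comb):
        """Summed importance index of a t-wise value combination."""
        return sum(factors[f][v] for f, v in comb)

    # All t-wise value combinations that still need to be covered.
    uncovered = {frozenset(zip(pair, vals))
                 for pair in combinations(factors, T)
                 for vals in product(*(factors[f] for f in pair))}

    TS = []
    while uncovered:
        best = max(uncovered, key=weight)            # heaviest uncovered combination
        scenario = dict(best)
        uncovered.discard(best)
        if weight(best) > THRESHOLD:                 # above threshold: prefer coverage
            for comb in sorted(uncovered, key=weight, reverse=True):
                if len(scenario) == len(factors):
                    break
                if any(f in scenario and scenario[f] != v for f, v in comb):
                    continue                         # conflicting value, skip this combination
                scenario.update(dict(comb))
                uncovered.discard(comb)
        for f, values in factors.items():            # remaining factors (or below threshold):
            scenario.setdefault(f, max(values, key=values.get))   # take most important value
        uncovered = {c for c in uncovered
                     if not set(c) <= set(scenario.items())}      # drop newly covered combinations
        TS.append(scenario)

    print(len(TS), "scenarios, first:", TS[0])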

4. Performance Analysis of CTBC

The generated test scenarios are evaluated mainly from two aspects: (a) the test cost, measured by the number of scenarios, and (b) the test effectiveness, evaluated in this paper by the overall scenario complexity. To make an optimal tradeoff between these two aspects, a complexity improvement coefficient is proposed, with which the best test scenarios can be found by a statistical method.

4.1. Optimization by Complexity Improvement Coefficient

The purpose of introducing the complexity improvement coefficient $k \in [0, 1]$ is to improve the overall scenario complexity by setting the threshold used in Algorithm 1 to

$$\mathit{threshold} = C_{\min} + k\,(C_{\max} - C_{\min}),$$

where $C_{\min}$ and $C_{\max}$ are the minimum and maximum achievable scenario complexity indexes. From the fundamentals of CTBC, a more complex scenario can be obtained by increasing $k$, but the number of scenarios needed to ensure combinational coverage may also increase. To balance the test cost and effectiveness and find the best test scenarios, the overall complexity of the test scenarios is defined first:

$$\bar{C}(k) = \frac{1}{N(k)} \sum_{i=1}^{N(k)} C_i(k),$$

where $\bar{C}(k)$, $N(k)$, and $C_i(k)$ are the overall complexity, the number of test scenarios, and the $i$-th scenario's complexity index, respectively, all of which are functions of the complexity improvement coefficient $k$.

Here the AHP method is used again to determine the weighting of test cost and effectiveness [24]. Then the optimization problem can be described by

$$\max_{k}\; E(k) = w_1 \tilde{C}(k) - w_2 \tilde{N}(k), \qquad \tilde{C}(k) = \frac{\bar{C}(k) - C_{\min}}{C_{\max} - C_{\min}}, \qquad \tilde{N}(k) = \frac{N(k) - N_{\min}}{N_{\max} - N_{\min}}, \tag{8}$$

where $E(k)$ denotes the composite test effect, $w_1$ and $w_2$ are the normalized weights for scenario complexity and test cost determined by the AHP method, $\tilde{C}(k)$ and $\tilde{N}(k)$ are the normalized values of complexity and test cost, and $N_{\max}$ and $N_{\min}$ are the maximum and minimum numbers of scenarios. In (8), the "min-max scaling" process is used to put $\bar{C}(k)$ and $N(k)$ on the same scale to avoid the challenge of selecting a proper weight.
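The selection of the coefficient can be sketched as follows, where generate_scenarios is a hypothetical stub standing in for a CTBC run, the weights and bounds are illustrative, and the scenario number bounds are those estimated in Section 4.2; the min-max scaling of (8) is applied to the returned scenario number and overall complexity.

    import numpy as np

    def generate_scenarios(k):
        # Placeholder: in practice, run CTBC with threshold C_min + k*(C_max - C_min)
        # and return (number of scenarios N(k), overall complexity C_bar(k)).
        return 50 + int(270 * k), 0.30 + 0.18 * k

    w_complexity, w_cost = 0.67, 0.33      # normalized AHP weights (illustrative)
    C_min, C_max = 0.20, 0.55              # achievable scenario complexity bounds
    N_min, N_max = 50, 330                 # estimated scenario number bounds (Section 4.2)

    candidates = np.linspace(0.0, 1.0, 21)
    effects = []
    for k in candidates:
        N_k, C_bar = generate_scenarios(k)
        C_tilde = (C_bar - C_min) / (C_max - C_min)      # normalized complexity
        N_tilde = (N_k - N_min) / (N_max - N_min)        # normalized cost
        effects.append(w_complexity * C_tilde - w_cost * N_tilde)

    k_best = candidates[int(np.argmax(effects))]
    print("best complexity improvement coefficient:", k_best)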

For a given tree structure model of influence factors and its importance index values, the achievable maximum or minimum complexity index of a scenario can be derived by selecting the factor values with the highest or lowest importance indexes. Unfortunately, the range of the scenario number, which is used to normalize the test cost in (8), cannot be obtained from previously known information about the tested system. In the following section, an approximate method is presented to estimate the range of the scenario number.

4.2. Estimation of Test Scenario Number

The test scenario number in $TS$ can be discussed in two situations: (a) $TS$ is a Covering Array (CA) and (b) $TS$ is a Mixed Covering Array (MCA). The difference between a CA and an MCA is whether the numbers of values of all factors are the same, i.e., in a CA $a_1 = a_2 = \dots = a_m = a$, whereas in an MCA the numbers of values differ [27].

Firstly, the simpler condition is considered, i.e., $TS$ is a CA. In this condition, if each factor value is only required to appear at least once, then $a$ test scenarios are needed. If all combinations of values of any $t$ factors are required to be covered, the lower bound of the scenario number can be found from the orthogonal table [28], and Bush proposed a construction method of the orthogonal table for a CA, which has the smallest number of scenarios [29]. When the complexity improvement coefficient $k = 0$, the CTBC algorithm gives priority to test cost, and the number of generated scenarios is no less than that of the orthogonal table [29]:

$$N_{\min} = a^{t}.$$

The maximum number of scenarios is reached when $k = 1$. This means that, in the worst case, each newly generated test scenario covers only one combination in the set of uncovered combinations, so the upper bound of the scenario number can be estimated by

$$N_{\max} = \binom{m}{t} a^{t},$$

where $\binom{m}{t}$ denotes the number of options for selecting any $t$ factors from the $m$ ones.

Based on the aforementioned analysis, the condition in which $TS$ is an MCA is discussed next. Firstly, all factors are rearranged according to the number of their values from large to small, giving the reordered value counts $a_{(1)}, a_{(2)}, \dots, a_{(m)}$ satisfying $a_{(1)} \ge a_{(2)} \ge \dots \ge a_{(m)}$. Similar to the CA case, the lower bound of the scenario number may be reached when $k = 0$ and can be estimated by

$$N_{\min} = \prod_{j=1}^{t} a_{(j)}.$$

Analogously, when $k = 1$ the maximum number of scenarios may be reached, which is calculated by

$$N_{\max} = \sum_{c=1}^{|\Phi|} \prod_{j \in \Phi_c} a_j,$$

where $\Phi$ denotes the set of combinations for selecting any $t$ factors from the $m$ factors, and $\Phi_c$ is the $c$-th combination of factors in $\Phi$.
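Under these assumptions, the bound estimates can be computed directly from the value counts, as in the following sketch: the lower bound is the product of the $t$ largest value counts (which reduces to $a^t$ for a CA), and the upper bound is the total number of $t$-wise value combinations.

    from itertools import combinations
    from math import prod

    def scenario_number_bounds(value_counts, t):
        """Estimate (N_min, N_max) for coverage strength t.
        value_counts: list of a_j, the number of values of each bottom-layer factor."""
        a_sorted = sorted(value_counts, reverse=True)
        n_min = prod(a_sorted[:t])                                  # t largest value counts
        n_max = sum(prod(combo) for combo in combinations(value_counts, t))
        return n_min, n_max

    # CA example: 4 factors with 3 values each, pairwise coverage (t = 2)
    print(scenario_number_bounds([3, 3, 3, 3], 2))    # (9, 54), i.e. (a**t, C(m,t)*a**t)
    # MCA example with differing numbers of values
    print(scenario_number_bounds([5, 3, 3, 2], 2))    # (15, 61)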

5. Application and Analysis

In this section, the proposed test scenario generation method is applied to evaluate an LDW system to validate its effectiveness by hardware-in-the-loop test as shown in Figure 3.

In Figure 3, "A" is a workstation running the virtual reality simulation software "Prescan", "B" is the external driver input device, "C" is a host computer that configures and monitors the real-time simulator, "D" is the MicroLabBox produced by dSPACE, acting as a real-time simulator to run the vehicle dynamics model, and "E" is the tested LDW system.

5.1. Optimization of Complexity Improvement Coefficient

In this application, a fixed strength of combination coverage is selected to find the optimal complexity improvement coefficient, and a judgment matrix is used to determine the normalized weights of scenario complexity and test cost. The optimization process of the test effect is shown in Figure 4, from which the complexity improvement coefficient maximizing the composite test effect, together with the corresponding scenario number and overall complexity, is obtained. A tradeoff between the test cost and the scenario complexity is thus successfully made by the proposed approach to achieve better test effectiveness.

5.2. Fundamental Validation of CTBC

The fundamental assumption of the CTBC algorithm is that the ADAS under test is more likely to malfunction under more complex scenarios. The scenarios are generated using the coverage strength and the complexity improvement coefficient optimized in Section 5.1. The number of scenarios generated by CTBC is 324, among which 10 scenarios, whose complexity indexes are distributed uniformly, are selected to test the LDW. These scenarios are numbered from 1 to 10 in descending order of their complexity index. The fault detection rate under the different scenarios is shown in Figure 5.

As can be seen from Figure 5, scenario No. 1 has a fault detection rate of 42.20%, while that of scenario No. 10 is only 5.9%. The fault detection rate fluctuates between 35.65% and 43.80% as the complexity index falls from 0.4729 for scenario No. 1 to 0.4334 for scenario No. 7, and then drops dramatically from scenario No. 7 to No. 10. Overall, the results show that the larger the complexity index of a scenario is, the easier it is to find malfunctions of the tested system. This fact can be used not only to guide the automatic generation algorithm of test scenarios, but also to evaluate the effectiveness of test scenarios generated by other algorithms.

5.3. Performance Analysis

In this section, the CTBC algorithm is compared with other TM methods to validate its improvement of the test effect. The numbers of test scenarios defined in ISO 17361, designed by the OE method, and generated by the CTBC algorithm under different strengths of combination coverage are shown in Figure 6. Similar to the optimization process in Section 5.1, the best complexity improvement coefficient is selected for each coverage strength.

It can be seen clearly that the number of test scenarios defined in ISO 17361 is too small to test the LDW adequately and thoroughly, as there are only 8 scenarios, and these scenarios are quite simple. The number of scenarios generated by OE far exceeds the others, which is almost impossible to carry out because of the test cost. The scenarios generated by CTBC are more reasonable when the test effectiveness is considered synthetically. Furthermore, previous studies have shown that a quite small coverage strength can reveal most of the potential errors [26].

Moreover, the bound of the scenario number is the basis of the optimization process of test scenarios introduced in Section 4.1. From the results in Figure 6, it is found that the number of scenarios generated by CTBC lies in the estimated range under different coverage strengths, which implies that the estimation of the scenario number bounds proposed in Section 4.2 is effective.

5.4. Scenario Complexity Index Improvement

The main advantage of the CTBC algorithm is its improvement of the scenario complexity index, which is beneficial to fault detection, as validated in Section 5.2. The primary focus of this experiment is the comparison with test sets generated by other CT algorithms, which do not consider the complexity of scenarios and form the basis of CTBC; choosing such methods highlights the superiority of the proposed algorithm. Moreover, PICT, Allpairs, and AETG have already been widely applied in engineering. Therefore, they are used to validate the effectiveness of CTBC in increasing the overall complexity index of test scenarios. The comparative results are shown in Figure 7 and Table 2.

It can be seen that, compared with the other CT algorithms, although the maximum value of the complexity index is only increased slightly, the number of scenarios with a higher complexity index is increased significantly, which implies that the scenarios generated by CTBC can find more errors.

The numbers of test scenarios generated by these methods are compared in Figure 8. Compared with the three CT algorithms, the number of scenarios generated by CTBC is increased by 6.1132, 5.4, and 5.3115 times, respectively, but it is still greatly reduced compared with that of OE (as shown in Figure 6). Besides, an appropriate increase of test cost is acceptable in order to give priority to detecting the hidden errors of ADAS, which are related to driving safety.

6. Conclusions

Considering the high safety requirements of intelligent driving systems, this paper proposes a new approach to automatically generate more effective test scenarios. The theoretical analysis and application results show the following:

(1) The larger the complexity index of a scenario is, the easier it is to find the malfunctions of the tested system.

(2) The proposed CTBC algorithm can generate more complex scenarios compared with the traditional CT method, while ensuring the required combinational coverage.

(3) Compared to the widely used OE method, the number of test scenarios generated by the CTBC algorithm is greatly reduced, which leads to a dramatic reduction of test cost.

Appendix

See Table 3.

Data Availability

The tree structure model of LDW and the importance index of each factor are listed in the Appendix. The AETG algorithm is an online program, which can be found at http://alarcostest.esi.uclm.es/CombTestWeb/combinatorial.jsp. The Allpairs and PICT tools are free software, which are available from the corresponding author upon request. The program of the CTBC algorithm, written in Python, and its input file are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This research was supported by the National Key R&D Program of China under grants 2016YFB0101104 and 2017YFB0102504, Scientific Technological Plans of Chongqing under grant cstc2017zdcy-zdzx0042, and Industrial Base Enhancement Project under grant 2016ZXFB06002.