Advances in Software Engineering
Volume 2010 (2010), Article ID 428521, 18 pages
http://dx.doi.org/10.1155/2010/428521
Research Article

Automated Test Case Prioritization with Reactive GRASP

Optimization in Software Engineering Group (GOES.UECE), Natural and Intelligent Computing Lab (LACONI), State University of Ceará (UECE), Avenue Paranjana 1700, Fortaleza, 60740-903 Ceará, Brazil

Received 15 June 2009; Revised 17 September 2009; Accepted 14 October 2009

Academic Editor: Phillip Laplante

Copyright © 2010 Camila Loiola Brito Maia et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract

Modifications in software can affect functionality that had been working until that point. To detect such problems, the ideal solution would be to test the whole system again, but there may be insufficient time or resources for this approach. An alternative solution is to order the test cases so that the most beneficial tests are executed first; in this way, only a subset of the test cases needs to be executed, with little loss of effectiveness. Such a technique is known as regression test case prioritization. In this paper, we propose the use of the Reactive GRASP metaheuristic to prioritize test cases. We also compare this metaheuristic with other search-based algorithms previously described in the literature. Five programs were used in the experiments. The experimental results demonstrated good coverage performance, with some time overhead, for the proposed technique. They also demonstrated a high stability of the results generated by the proposed approach.

1. Introduction

More often than not, when a system is modified, the modifications may affect some functionality that had been working until that point in time. Because the effects of such modifications on the system's functionalities are unpredictable, it is recommended to test the system again, wholly or partially, every time a modification takes place. This is commonly known as regression testing. Its purpose is to ensure that the software modifications have not affected the functions that were working previously.

A test case is a set of tests performed in a sequence and related to a test objective [1], and a test suite is a set of test cases that will execute sequentially. There are basically two ways to perform regression tests. The first is to reexecute all test cases in order to test the entire system once again. Unfortunately, there are usually not enough resources to reexecute all test cases every time a modification is introduced. The second is to order the test cases with respect to how beneficial they are for some attribute, such as coverage, and to reexecute them according to that ordering. In doing this, the most beneficial test cases are executed first, so that only a subset of the test cases needs to be executed, with little loss of effectiveness. Such a technique is known as regression test case prioritization. When the time required to reexecute an entire test suite is sufficiently long, test case prioritization may be beneficial because meeting testing goals earlier can yield meaningful benefits [2].

According to Myers [3], since exhaustive testing is out of the question, the objective should be to maximize the yield on the testing investment by maximizing the number of errors found by a finite number of test cases. As Fewster stated in [1], software testing needs to be effective at finding any defects that are there, but it should also be efficient, performing the tests as quickly and cheaply as possible.

The regression test case prioritization problem is closely related to the regression test case selection problem. The regression test case selection problem can be modeled directly as a set covering problem, which is a well-known NP-hard problem [4]. This fact points to the complexity of the test case prioritization problem.

To order the test cases, a basis for comparison is needed. A straightforward measure to evaluate a test case would be based on APFD (Average Percentage of Faults Detected); higher APFD values mean faster fault detection rates [5]. However, the faults exposed by a test case cannot be known in advance, so this value cannot be estimated before testing has taken place. Therefore, research on test case prioritization concentrates on coverage measures. The following coverage criteria have been commonly used: APBC (Average Percentage Block Coverage), which measures the rate at which a prioritized test suite covers the blocks of the code; APDC (Average Percentage Decision Coverage), which measures the rate at which a prioritized test suite covers the decision statements in the code; and APSC (Average Percentage Statement Coverage), which measures the rate at which a prioritized test suite covers the statements. In this work, these three coverage measures are considered.

As an example, consider a test suite T containing n test cases that covers a set of m blocks. Let TB_i be the position, in the ordering T′ of T, of the first test case that covers block i. The APBC for ordering T′ is given by the following equation (equivalent for the APDC and APSC metrics) [6]:

$$\mathrm{APBC} = 1 - \frac{TB_1 + TB_2 + \cdots + TB_m}{n\,m} + \frac{1}{2n}$$
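To make the formula concrete, here is a minimal Java sketch (Java is the language the experiments were implemented in, although this is not the authors' code). The class name `Apbc` and the boolean coverage matrix are illustrative assumptions, and, as the formula itself requires, every block is assumed to be covered by at least one test case in the suite.

```java
/** Illustrative sketch: APBC of a test-case ordering (APDC and APSC work the same way). */
class Apbc {
    /**
     * @param order  test-case indices in execution order (a permutation of 0..n-1)
     * @param covers covers[t][b] is true when test case t covers block b
     * @return 1 - (TB_1 + ... + TB_m) / (n * m) + 1 / (2n)
     */
    static double apbc(int[] order, boolean[][] covers) {
        int n = order.length;
        int m = covers[order[0]].length;
        long sumTB = 0;
        for (int b = 0; b < m; b++) {
            int tb = 0; // 1-based position of the first test case covering block b
            for (int pos = 0; pos < n && tb == 0; pos++) {
                if (covers[order[pos]][b]) tb = pos + 1;
            }
            if (tb == 0) throw new IllegalArgumentException("block " + b + " is not covered by any test case");
            sumTB += tb;
        }
        return 1.0 - (double) sumTB / ((double) n * m) + 1.0 / (2.0 * n);
    }

    public static void main(String[] args) {
        boolean[][] covers = {
            {true,  false, false, false}, // test 0 covers block 0
            {true,  true,  true,  false}, // test 1 covers blocks 0-2
            {false, false, false, true}   // test 2 covers block 3
        };
        System.out.println(apbc(new int[]{1, 2, 0}, covers)); // about 0.75
        System.out.println(apbc(new int[]{0, 2, 1}, covers)); // about 0.417
    }
}
```

Running test 1 first exposes three of the four blocks immediately, which is why the first ordering scores higher.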

Greedy algorithms have been employed in many studies on test case prioritization in an attempt to find an optimal ordering [2]. Such Greedy algorithms work by iteratively adding a single test case to a partially constructed test suite, choosing the test case that covers as much as possible of the code not yet covered. Despite their wide use, as pointed out by Rothermel [2] and Li et al. [6], Greedy algorithms may not choose the optimal test case ordering. This fact justifies the application of global approaches, that is, approaches that evaluate the ordering as a whole rather than each test case individually. In that context, metaheuristics have become the focus in this field. In this work, we have tested Reactive GRASP, which had not yet been used for test case prioritization.

Metaheuristic search techniques are algorithms that may find optimal or near-optimal solutions to optimization problems [7]. In the context of software engineering, a new research field has emerged from the application of search techniques, especially metaheuristics, to well-known complex software engineering problems. This field has been named SBSE (Search-Based Software Engineering). In SBSE, software engineering problems are modeled as optimization problems, with the definition of an objective function (or a set of functions) and a set of constraints. Solutions to the problems are found by applying search techniques.

The application of genetic algorithms, an evolutionary metaheuristic, has been shown to be effective for regression test case prioritization [8, 9]. In this paper, we examine the application of another well-known metaheuristic, GRASP, which has not yet been applied either to the regression test case selection problem or to any other search-based software engineering problem. GRASP was considered because of the good performance reported by several studies in solving complex optimization problems.

The remainder of this paper is organized as follows. Section 2 describes work related to the regression test case prioritization problem and introduces some algorithms that have been applied to this problem; these algorithms are employed later in the paper to evaluate our approach. Section 3 describes the GRASP metaheuristic and the proposed algorithm using Reactive GRASP. Section 4 presents the details of the experiments, and Section 5 reports the conclusions of this research and outlines future work.

2. Related Work

This section reports on the use of search-based prioritization approaches and metaheuristics. It also describes some algorithms implemented by Li et al. [6], whose performance is compared later in this paper with that of the proposed approach.

2.1. Search-Based Prioritization Approaches

The works below employed search-based prioritization approaches, such as greedy- and metaheuristic-based solutions.

Elbaum et al. [10] analyze several prioritization techniques and identify which technique is more suitable for specific test scenarios and their conditions. The APFD metric is calculated through a greedy heuristic. Rothermel et al. [2] describe a technique that incorporates a Greedy algorithm called Optimal Prioritization, which considers the known faults of the program; the test cases are ordered using their fault detection rates. Walcott et al. [8] propose a test case prioritization technique based on a genetic algorithm that reorders test suites according to testing time constraints and code coverage. This technique significantly outperformed the other prioritization techniques described in that paper, improving APFD by 120% on average over the others.

Yoo and Harman [9] describe a Pareto approach to prioritizing test suites based on multiple objectives, such as code coverage, execution cost, and fault-detection history. The objective is to find an array of decision variables (the test case ordering) that maximizes an array of objective functions. Three algorithms were compared: a reformulation of a Greedy algorithm (the Additional Greedy algorithm), the Non-dominated Sorting Genetic Algorithm (NSGA-II) [11], and a variant of NSGA-II, vNSGA-II. For two objective functions, a genetic algorithm outperformed the Additional Greedy algorithm, but for some programs the Additional Greedy algorithm produced the best results. For three objective functions, the Additional Greedy algorithm had reasonable performance.

Li et al. [6] compare five algorithms: the Greedy algorithm, which adds test cases that achieve the maximum value for the coverage criterion; the Additional Greedy algorithm, which adds test cases that achieve the maximum coverage not already consumed by the partial solution; the 2-Optimal algorithm, which selects the two test cases that together consume the maximum coverage; Hill Climbing, which performs local search in a defined neighborhood; and a genetic algorithm, which generates new test cases based on previous ones. The authors divided the test suites into 1,000 small suites of size 8-155 and 1,000 large suites of size 228-4,350. Six C programs were used in the experiments, ranging from 374 to 11,148 LoC (lines of code). The coverage metrics studied in that work were APBC, APDC, and APSC, as described earlier. For each program, the block, decision, and statement coverage data were obtained with a tailor-made version of a commercial tool, Cantata. The coverage data were obtained over 500 executions of each search algorithm, using a different suite for each execution. For small programs, the performance was almost identical for all algorithms and coverage criteria, considering both small and large test suites. The Greedy algorithm performed the worst, and the genetic algorithm and the Additional Greedy algorithm produced the best results.

2.2. Algorithms

This section describes some algorithms that have been used frequently in the literature to deal with the test case prioritization problem. Their performance is compared with that of the approach proposed later in this paper.

2.2.1. Greedy Algorithm

The Greedy algorithm works as follows: all candidate test cases are ordered by their coverage. The test case with the highest percentage of coverage is added to an initially empty solution, then the test case with the second highest percentage, and so on, until all test cases have been added.

For example, let APBC be the coverage criterion, and let a partial solution contain two test cases of a program with 100 blocks of code. Suppose there are two other test cases that can be added to the solution. The first one covers 80 blocks, but 50 of these were already covered by the current solution. This test case therefore covers 80% of the blocks, but the coverage it actually adds is only 30% (30 blocks). The second test case covers 40 blocks of code, none of which was covered by the current solution, so it covers 40% of the blocks. The Greedy algorithm would select the first test case, because it has the greater overall percentage of block coverage.
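A minimal Java sketch of this (total) Greedy ordering follows. The class name `TotalGreedy`, the boolean coverage matrix, and the stable tie-breaking rule are illustrative assumptions rather than the implementation used in the experiments; the `main` method encodes the example above on a hypothetical 100-block program.

```java
import java.util.Arrays;
import java.util.Comparator;

/** Illustrative sketch of the Greedy algorithm: order test cases by their overall coverage. */
class TotalGreedy {
    // Number of code units (blocks, decisions, or statements) covered by one test case.
    static int count(boolean[] row) {
        int c = 0;
        for (boolean covered : row) if (covered) c++;
        return c;
    }

    static int[] prioritize(boolean[][] covers) {
        Integer[] idx = new Integer[covers.length];
        for (int t = 0; t < idx.length; t++) idx[t] = t;
        // Largest total coverage first; the object sort is stable, so ties keep their original order.
        Arrays.sort(idx, Comparator.comparingInt((Integer t) -> count(covers[t])).reversed());
        return Arrays.stream(idx).mapToInt(Integer::intValue).toArray();
    }

    public static void main(String[] args) {
        // Test 0 covers blocks 10..89 (80 blocks); test 1 covers blocks 60..99 (40 blocks).
        boolean[][] covers = new boolean[2][100];
        for (int b = 10; b < 90; b++) covers[0][b] = true;
        for (int b = 60; b < 100; b++) covers[1][b] = true;
        System.out.println(Arrays.toString(prioritize(covers))); // [0, 1]
    }
}
```

Note that the ordering ignores what earlier test cases already cover, which is exactly the weakness the Additional Greedy algorithm addresses.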

2.2.2. Additional Greedy Algorithm

The Additional Greedy algorithm adds a locally optimal test case to a partial test suite. Starting from an empty solution, at each iteration the algorithm adds the test case that gives the largest coverage gain to the partial solution.

Let us use the same example from Section 2.2.1: a partial solution contains two test cases of a program with 100 blocks of code. There are two remaining test cases: the first one covers 80 blocks, but 50 of these were already covered; the second one covers 40 blocks of code, none of them already covered. The first test case adds an actual 30% of coverage, and the second one adds 40%. The Additional Greedy algorithm would select the second test case, because it yields the greater coverage gain relative to the current partial solution.
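The following Java sketch illustrates the Additional Greedy step under a few assumptions of ours: coverage is a boolean matrix, ties go to the earliest remaining test case, and once no test case adds coverage the rest are appended in their original order (Li et al. describe a variant that resets coverage at that point instead). The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Illustrative sketch of the Additional Greedy algorithm. */
class AdditionalGreedy {
    // Units covered by `row` that are not yet covered by the partial solution.
    static int gain(boolean[] row, boolean[] covered) {
        int g = 0;
        for (int u = 0; u < row.length; u++) if (row[u] && !covered[u]) g++;
        return g;
    }

    /** Orders all test cases, starting from the units already marked in `covered` (all false = empty solution). */
    static int[] prioritize(boolean[][] covers, boolean[] covered) {
        boolean[] cov = covered.clone(); // do not modify the caller's array
        List<Integer> remaining = new ArrayList<>();
        for (int t = 0; t < covers.length; t++) remaining.add(t);
        int[] order = new int[covers.length];
        for (int pos = 0; pos < order.length; pos++) {
            int best = 0; // index into `remaining`; the earliest candidate wins ties
            for (int i = 1; i < remaining.size(); i++) {
                if (gain(covers[remaining.get(i)], cov) > gain(covers[remaining.get(best)], cov)) best = i;
            }
            int t = remaining.remove(best);
            order[pos] = t;
            for (int u = 0; u < cov.length; u++) cov[u] |= covers[t][u];
        }
        return order;
    }

    public static void main(String[] args) {
        boolean[][] covers = new boolean[2][100];
        for (int b = 10; b < 90; b++) covers[0][b] = true;  // 80 blocks, 50 of them already covered
        for (int b = 60; b < 100; b++) covers[1][b] = true; // 40 blocks, none already covered
        boolean[] partial = new boolean[100];
        for (int b = 0; b < 60; b++) partial[b] = true;     // hypothetical partial solution
        System.out.println(Arrays.toString(prioritize(covers, partial))); // [1, 0]
    }
}
```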

2.2.3. Genetic Algorithm

The genetic algorithm is a type of evolutionary algorithm that has been employed extensively to solve optimization problems [12]. In this algorithm, an initial population of solutions (in our case, a set of test suites) is randomly generated. New populations are then generated from the previous one until a stopping criterion is reached [13]. The evolution from one population to the next is performed via “genetic operators”. The first is selection, that is, the biased choice of which individuals of the current population will reproduce to generate individuals for the new population; it favors individuals with a high fitness value, which represents how good a solution is. The other two genetic operators are crossover, that is, the combination of individuals to produce offspring, and mutation, which randomly changes a particular individual.

In the genetic algorithm proposed by Li et al. [6], the initial population is produced by selecting test cases randomly from the test case pool. The fitness function is based on the test case's position in the current test suite. The fitness value was calculated as follows:

$$\mathrm{fitness}(pos) = \frac{2\,(pos - 1)}{n - 1},$$

where pos is the test case's position in the current test suite and n is the population size.

The crossover operator follows the ordering chromosome crossover style adopted by Antoniol [14] and used by Li et al. [6] for the genetic algorithm in their experiments. It works as follows. Let P1 and P2 be the parents, and let O1 and O2 be the offspring. A random position k is selected; the first k elements of P1 become the first k elements of O1, and the remaining positions of O1 hold, in their order in P2, the elements of P2 that remain when the k elements taken from P1 are removed from P2. In the same way, the first k elements of P2 become the first k elements of O2, and the remaining positions of O2 hold, in their order in P1, the elements of P1 that remain when the k elements taken from P2 are removed from P1. Mutation is performed by randomly exchanging the positions of two test cases.
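The crossover and mutation just described can be sketched in Java as follows. The sketch assumes both parents are permutations of the same test cases; `OrderCrossover`, `child`, and `swapMutation` are hypothetical names, and the cut point k is fixed in `main` only to make the example reproducible.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Random;
import java.util.Set;

/** Illustrative ordering crossover and swap mutation for test-case permutations. */
class OrderCrossover {
    /** First k genes of `head`, then the genes of `tail` not used yet, in the order they appear in `tail`. */
    static int[] child(int[] head, int[] tail, int k) {
        int[] out = new int[head.length];
        Set<Integer> used = new HashSet<>();
        for (int i = 0; i < k; i++) {
            out[i] = head[i];
            used.add(head[i]);
        }
        int pos = k;
        for (int gene : tail) if (!used.contains(gene)) out[pos++] = gene;
        return out;
    }

    /** Mutation: exchange the positions of two distinct, randomly chosen test cases (needs length >= 2). */
    static void swapMutation(int[] ind, Random rnd) {
        int n = ind.length;
        int i = rnd.nextInt(n);
        int j = (i + 1 + rnd.nextInt(n - 1)) % n; // any position other than i
        int tmp = ind[i];
        ind[i] = ind[j];
        ind[j] = tmp;
    }

    public static void main(String[] args) {
        int[] p1 = {0, 1, 2, 3, 4};
        int[] p2 = {4, 3, 2, 1, 0};
        int k = 2; // fixed for a reproducible example; the GA draws it at random
        System.out.println(Arrays.toString(child(p1, p2, k))); // O1: [0, 1, 4, 3, 2]
        System.out.println(Arrays.toString(child(p2, p1, k))); // O2: [4, 3, 0, 1, 2]
    }
}
```

Because each child keeps a prefix of one parent and fills the rest from the other parent without repeats, every offspring is still a valid ordering of the same test cases.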

2.2.4. Simulated Annealing

Simulated annealing is a generalization of a Monte Carlo method. Its name comes from annealing in metallurgy, where a melt, initially disordered at high temperature, is slowly cooled with the purpose of obtaining a more organized system (a local optimum solution). The system approaches a frozen ground state with T = 0. Each step of the simulated annealing algorithm replaces the current solution by a random solution in its neighborhood, based on a probability that depends on the energies of the two solutions and on the current temperature.
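The paper does not spell out the acceptance rule, so the Java sketch below assumes the classic Metropolis criterion, adapted to a fitness that is maximized (such as coverage): improvements are always accepted, and a worse neighbour is accepted with probability exp((candidate − current) / T). The class and method names are hypothetical.

```java
import java.util.Random;

/** Illustrative Metropolis acceptance rule for a maximized fitness (assumed, not taken from the paper). */
class Metropolis {
    /** Probability of moving from fitness `current` to a neighbour with fitness `candidate` at temperature t > 0. */
    static double acceptProbability(double current, double candidate, double t) {
        if (candidate >= current) return 1.0;       // improvements are always accepted
        return Math.exp((candidate - current) / t); // worse moves become rarer as t falls
    }

    static boolean accept(double current, double candidate, double t, Random rnd) {
        return rnd.nextDouble() < acceptProbability(current, candidate, t);
    }

    public static void main(String[] args) {
        // A neighbour that loses 0.05 coverage: usually accepted when hot, practically never when cold.
        System.out.println(acceptProbability(0.90, 0.85, 1.0));
        System.out.println(acceptProbability(0.90, 0.85, 0.001));
    }
}
```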

3. Reactive GRASP for Test Case Prioritization

This section presents a novel approach for test case prioritization based on the Reactive GRASP metaheuristic.

3.1. The Reactive GRASP Metaheuristic

Metaheuristics are general search algorithms that find good, sometimes optimal, solutions to optimization problems. In this section, we present in general terms the metaheuristic that the approach proposed in this paper employs to prioritize test cases.

GRASP (Greedy Randomized Adaptive Search Procedures) is a metaheuristic with two phases: construction and local search [15]. It is a multistart algorithm: the procedure is executed multiple times, and the best solution found overall is kept; see Figure 1.

Figure 1: GRASP’s phases.

In the construction phase, a feasible solution is built by applying a Greedy algorithm. The greedy strategy used in GRASP adds elements to an initially empty solution one at a time. Such an algorithm tends to get trapped in a local optimum. Therefore, in order to avoid this, GRASP uses a randomized greedy strategy, as follows. The Restricted Candidate List (RCL) stores the possible elements that can be added at each step of the construction phase, and the element to be added is picked randomly from this list. The RCL is associated with a parameter named α, which limits the length of the RCL. If α = 0, only the best element (the one with the highest coverage) will be present in the RCL, making the construction process a pure Greedy algorithm. Otherwise, if α = 1, the construction phase will be completely random, because all possible elements will be in the RCL. The parameter α should be set to calibrate how random and how greedy the construction process will be. The solution found is then used in the local search phase.
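One common way to let α control the RCL is a value-based threshold: a candidate enters the list when its greedy value is at least c_max − α(c_max − c_min). The Java sketch below assumes that convention (the paper does not state which one it uses). With α = 0 only the best candidate(s) enter the RCL, and with α = 1 every candidate does. `Rcl.build` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative value-based restricted candidate list (RCL). */
class Rcl {
    /** Indices of candidates whose greedy value is at least max - alpha * (max - min). */
    static List<Integer> build(double[] value, double alpha) {
        double max = Double.NEGATIVE_INFINITY, min = Double.POSITIVE_INFINITY;
        for (double v : value) {
            max = Math.max(max, v);
            min = Math.min(min, v);
        }
        double threshold = max - alpha * (max - min);
        List<Integer> rcl = new ArrayList<>();
        for (int i = 0; i < value.length; i++) if (value[i] >= threshold) rcl.add(i);
        return rcl;
    }

    public static void main(String[] args) {
        double[] gain = {30, 40, 10, 25}; // e.g. coverage increments of four candidate test cases
        System.out.println(build(gain, 0.0)); // [1]          pure greedy
        System.out.println(build(gain, 0.5)); // [0, 1, 3]
        System.out.println(build(gain, 1.0)); // [0, 1, 2, 3] pure random
    }
}
```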

In the local search phase, the aim is to find the best solution in the neighborhood of the current solution. A local search is executed in order to replace the current solution by the local optimum in its neighborhood. This local optimum is then compared with the best solution found in earlier iterations. If the local optimum just found is better, it becomes the best solution found so far; otherwise, there is no replacement.
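The multistart structure of GRASP (construct, improve locally, keep the best solution seen) can be sketched generically in Java. The class `Grasp`, its parameters, and the toy problem in `main` are illustrative assumptions. In the prioritization setting, the solution would be a test-case ordering and the fitness one of the coverage metrics.

```java
import java.util.Random;
import java.util.function.Supplier;
import java.util.function.ToDoubleFunction;
import java.util.function.UnaryOperator;

/** Illustrative GRASP multistart skeleton (higher fitness is better). */
class Grasp {
    static <S> S run(int iterations,
                     Supplier<S> construct,          // greedy randomized construction
                     UnaryOperator<S> localSearch,   // moves toward a local optimum of the neighbourhood
                     ToDoubleFunction<S> fitness) {  // e.g. APBC of an ordering
        S best = null;
        double bestFit = Double.NEGATIVE_INFINITY;
        for (int it = 0; it < iterations; it++) {
            S candidate = localSearch.apply(construct.get());
            double f = fitness.applyAsDouble(candidate);
            if (f > bestFit) { // replace the best solution only when strictly better
                best = candidate;
                bestFit = f;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        Random rnd = new Random(7);
        // Toy problem (hypothetical): maximise -|x - 42| over integers in [0, 100).
        Integer best = Grasp.<Integer>run(50,
                () -> rnd.nextInt(100),
                x -> x < 42 ? x + 1 : (x > 42 ? x - 1 : x), // one-step improvement toward 42
                x -> -Math.abs(x - 42));
        System.out.println(best);
    }
}
```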

As can easily be seen, the performance of the GRASP algorithm strongly depends on the choice of the parameter α. In order to decrease this influence, a GRASP variation named Reactive GRASP [15, 16] has been proposed. This approach performs GRASP while varying the value of α according to the previous performance of each value. In practice, Reactive GRASP initially determines a set of possible values for α, and each value has a probability of being selected in each iteration.

Initially, all probabilities are set to 1/m, where m is the number of α values. At each iteration, the probability p_i of each value α_i is reevaluated according to the following equation:

$$p_i = \frac{q_i}{\sum_{j=1}^{m} q_j}, \qquad q_i = \frac{A_i}{z^*},$$

where z* is the coverage of the incumbent solution and A_i is the average coverage of all solutions found with α_i. This way, when a particular α_i generates good solutions, its probability p_i of being selected in the future increases; conversely, if bad solutions are created, the α_i used in the process has its selection probability decreased.
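The probability update can be sketched in Java as follows, assuming the maximization form p_i = q_i / Σ_j q_j with q_i = A_i / z*, and assuming every α value has already been used at least once (so each A_i exists). `ReactiveProbabilities` and its methods are hypothetical names.

```java
import java.util.Arrays;

/** Illustrative Reactive GRASP probability update (maximization form assumed). */
class ReactiveProbabilities {
    /** Initial probabilities: 1/m for each of the m alpha values. */
    static double[] initial(int m) {
        double[] p = new double[m];
        Arrays.fill(p, 1.0 / m);
        return p;
    }

    /** avg[i] = A_i, mean coverage of solutions built with alpha_i; zStar = coverage of the incumbent. */
    static double[] update(double[] avg, double zStar) {
        double[] p = new double[avg.length];
        double sum = 0.0;
        for (int i = 0; i < avg.length; i++) {
            p[i] = avg[i] / zStar; // q_i
            sum += p[i];
        }
        for (int i = 0; i < p.length; i++) p[i] /= sum; // normalise so the p_i sum to 1
        return p;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(initial(3)));
        System.out.println(Arrays.toString(update(new double[]{0.80, 0.90, 0.60}, 0.95)));
    }
}
```

In this unamplified form z* cancels after normalization, so each p_i ends up proportional to A_i; Reactive GRASP implementations sometimes raise q_i to a power to sharpen the preference for better α values.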

3.2. The Reactive GRASP Algorithm

The pseudocode below, in Algorithm 1, describes the Reactive GRASP algorithm.

Algorithm 1: Reactive GRASP for Test Case Prioritization

The first step initializes the probabilities associated with the choice of each α (line 1).

Initially, all probabilities are set to 1/m, where m is the length of Set, the set of α values. Next, the GRASP algorithm runs the construction and local search phases, as described next, until the stopping criterion is reached. At each iteration, the best solution is updated when the new solution is better.

At each iteration, α is selected as follows; see Algorithm 2. Let z* be the incumbent solution, and let A_i be the average coverage of all solutions found with α_i. As described in Section 3.1, the probabilities are reevaluated at each iteration by taking

$$p_i = \frac{q_i}{\sum_{j=1}^{m} q_j}, \qquad q_i = \frac{A_i}{z^*}.$$
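Drawing α according to its current probability is a roulette-wheel selection. The Java sketch below assumes that mechanism; the candidate α values and probabilities in `main` are hypothetical.

```java
import java.util.Arrays;
import java.util.Random;

/** Illustrative roulette-wheel selection of an alpha index. */
class AlphaSelection {
    /** Returns index i with probability p[i]; p is assumed non-negative and summing to 1. */
    static int select(double[] p, Random rnd) {
        double r = rnd.nextDouble(); // uniform in [0, 1)
        double cumulative = 0.0;
        for (int i = 0; i < p.length; i++) {
            cumulative += p[i];
            if (r < cumulative) return i;
        }
        return p.length - 1; // only reached if rounding leaves the total slightly below r
    }

    public static void main(String[] args) {
        double[] alphas = {0.1, 0.3, 0.5}; // hypothetical candidate values in Set
        double[] p = {0.2, 0.5, 0.3};      // their current selection probabilities
        Random rnd = new Random(42);
        int[] counts = new int[alphas.length];
        for (int k = 0; k < 10_000; k++) counts[select(p, rnd)]++;
        System.out.println(Arrays.toString(counts)); // counts roughly in proportion 2:5:3
        System.out.println("alpha for this iteration: " + alphas[select(p, rnd)]);
    }
}
```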

Algorithm 2: Selection of α

The pseudocode in Algorithm 3 details the construction phase. At each iteration, one test case that increases the coverage of the current solution (a set of test cases) is selected by a greedy evaluation function. This element is randomly selected from the RCL (Restricted Candidate List), which holds the best elements, that is, those with the best coverage values. After the element is incorporated into the partial solution, the RCL is updated, and the coverage increment of each remaining candidate is reevaluated.
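Putting the pieces together, a construction phase of this kind can be sketched in Java. The sketch rests on our own assumptions: the greedy evaluation is the additional coverage a test case brings, the RCL uses the value-based α threshold, and the pick from the RCL is uniform. It is not the authors' Algorithm 3, and `GraspConstruction` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Random;

/** Illustrative construction phase: greedy randomized ordering by additional coverage. */
class GraspConstruction {
    // Number of code units covered by `row` that the partial solution does not cover yet.
    static int gain(boolean[] row, boolean[] covered) {
        int g = 0;
        for (int u = 0; u < row.length; u++) if (row[u] && !covered[u]) g++;
        return g;
    }

    static int[] construct(boolean[][] covers, double alpha, Random rnd) {
        int units = covers[0].length;
        boolean[] covered = new boolean[units];
        List<Integer> remaining = new ArrayList<>();
        for (int t = 0; t < covers.length; t++) remaining.add(t);
        int[] order = new int[covers.length];
        for (int pos = 0; pos < order.length; pos++) {
            // Re-evaluate every candidate's coverage increment against the current partial solution.
            int[] g = new int[remaining.size()];
            int max = Integer.MIN_VALUE, min = Integer.MAX_VALUE;
            for (int i = 0; i < g.length; i++) {
                g[i] = gain(covers[remaining.get(i)], covered);
                max = Math.max(max, g[i]);
                min = Math.min(min, g[i]);
            }
            // RCL: candidates whose increment is within alpha of the best (value-based threshold).
            double threshold = max - alpha * (max - min);
            List<Integer> rcl = new ArrayList<>();
            for (int i = 0; i < g.length; i++) if (g[i] >= threshold) rcl.add(i);
            // Pick uniformly from the RCL, append the test case, and mark its units as covered.
            int t = remaining.remove((int) rcl.get(rnd.nextInt(rcl.size())));
            order[pos] = t;
            for (int u = 0; u < units; u++) covered[u] |= covers[t][u];
        }
        return order;
    }

    public static void main(String[] args) {
        boolean[][] covers = new boolean[3][6];
        for (int u = 0; u <= 3; u++) covers[0][u] = true; // test 0 covers units 0-3
        covers[1][4] = true;                              // test 1 covers unit 4
        for (int u = 3; u <= 5; u++) covers[2][u] = true; // test 2 covers units 3-5
        System.out.println(Arrays.toString(construct(covers, 0.0, new Random(1)))); // [0, 2, 1]
        System.out.println(Arrays.toString(construct(covers, 0.5, new Random(1))));
    }
}
```

With α = 0 the procedure reduces to the Additional Greedy ordering; larger α values let GRASP explore different orderings across iterations.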

Algorithm 3: Reactive GRASP for Test Case Prioritization, Construction Phase.

The Set is updated after the solution is found, in order to change the selection probabilities of the Set elements. This update is detailed in Algorithm 4.

Algorithm 4: Update of .

After the construction phase, a local search phase is executed in order to improve the generated solution. This phase is important to avoid the problems mentioned by Rothermel [2] and Li et al. [6], where Greedy algorithms may fail to choose the optimal test case ordering. The pseudocode for the local search is described in Algorithm 5.

Algorithm 5: Reactive GRASP for Test Case Prioritization, Local Search Phase.

Let T be the test suite generated by the construction phase. The local search is performed as follows: the first test case of the suite is exchanged with each of the other test cases, one at a time; that is, n − 1 new test suites are generated by exchanging the first test case with the i-th one, where i varies from 2 to n and n is the length of the original test suite. The original test suite is then compared with all generated test suites. If one of those test suites is better than the original one in terms of coverage, it replaces the original solution. This strategy was chosen because, with very little computational effort, any exchange involving the first test case can produce a very significant difference in coverage. In addition, it would be prohibitive to test all possible exchanges, since that would generate n(n − 1)/2 new test suites instead of n − 1, most of which would exchange the last elements, with no significant difference in coverage.
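A Java sketch of this neighbourhood follows. It generates the n − 1 suites obtained by swapping the first test case with each other position and keeps the best one if it beats the original. Choosing the best improving neighbour, rather than the first one found, is our assumption, as are the class name and the toy fitness in `main`. In the experiments, the fitness would be APBC, APDC, or APSC.

```java
import java.util.Arrays;
import java.util.function.ToDoubleFunction;

/** Illustrative local search: swap the first test case with each other position. */
class FirstSwapLocalSearch {
    static int[] improve(int[] suite, ToDoubleFunction<int[]> fitness) {
        int[] best = suite;
        double bestFit = fitness.applyAsDouble(suite);
        // 0-based i = 1..n-1 corresponds to the 2nd..n-th test case: n - 1 neighbours in total.
        for (int i = 1; i < suite.length; i++) {
            int[] neighbour = suite.clone();
            neighbour[0] = suite[i];
            neighbour[i] = suite[0];
            double f = fitness.applyAsDouble(neighbour);
            if (f > bestFit) { // keep the best neighbour that beats the original
                best = neighbour;
                bestFit = f;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Hypothetical fitness that rewards placing high-weight test cases early.
        double[] weight = {1, 5, 3};
        ToDoubleFunction<int[]> fitness = s -> {
            double v = 0;
            for (int pos = 0; pos < s.length; pos++) v += weight[s[pos]] / (pos + 1.0);
            return v;
        };
        System.out.println(Arrays.toString(improve(new int[]{0, 1, 2}, fitness))); // [1, 0, 2]
    }
}
```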

4. Empirical Evaluation

In order to evaluate the performance of the proposed approach, a series of empirical tests was executed. More specifically, the experiments were designed to answer the following question:
(1) How does the Reactive GRASP approach compare, in terms of coverage and execution time, with other search-based algorithms, including the Greedy algorithm, the Additional Greedy algorithm, the genetic algorithm, and simulated annealing?

In addition, the experiments can confirm results previously described in the literature, including the performance of the Greedy algorithm.

4.1. Experimental Design

Four small programs (print_tokens, print_tokens2, schedule, and schedule2) and a larger program (space) were used in the tests. These programs were assembled by researchers at Siemens Corporate Research [17] and are the same programs used by Li et al. [6] in their experiments on test case prioritization. Table 1 describes the characteristics of the five programs.

Table 1: Programs used in the Evaluation.

Besides Reactive GRASP, other search algorithms were also implemented in order to compare their effectiveness: the Greedy algorithm, the Additional Greedy algorithm, the genetic algorithm, and simulated annealing. These algorithms were implemented exactly as described in Section 2.2 of this paper. For the genetic algorithm, as presented by Li et al. [6], the population size was set to 50 individuals and the algorithm was terminated after 100 generations. Stochastic universal sampling was used in selection and mutation, the crossover probability (per individual) was set to 0.8, and the mutation probability was set to 0.1. For the Reactive GRASP approach, the maximum number of iterations was set to 300, based on preliminary experimentation.

For the simulated annealing approach, the initial temperature was set to a random number between 20 and 99. For each iteration, the new temperature is given by the following steps:(1)dividend = actualTemperature + initialTemperature, (2)(3)

In the experiments, we considered the three coverage criteria described earlier (APBC, APDC, and APSC). In addition, we considered different percentages of the pool of test cases. For example, if the percentage is 5%, then 5% of the test cases are randomly chosen from the pool to compare the performance of the algorithms. We tested with 1%, 2%, 3%, 5%, 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, and 100% for the four small programs and with 1%, 5%, 10%, 20%, 30%, 40%, and 50% for space. For each coverage criterion and each percentage, each algorithm was executed 10 times for the four small programs and once for the space program.
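Drawing a given percentage of the pool at random can be sketched in Java as a shuffle followed by taking a prefix. The rounding rule (round up, and take at least one test case) is our assumption, since the paper does not say how fractional counts were handled. `PoolSample.sample` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** Illustrative random selection of a percentage of the test case pool. */
class PoolSample {
    /** Randomly picks `percent`% of the pool (rounded up, at least one test case). */
    static List<Integer> sample(int poolSize, double percent, Random rnd) {
        List<Integer> ids = new ArrayList<>();
        for (int t = 0; t < poolSize; t++) ids.add(t);
        Collections.shuffle(ids, rnd);
        int k = Math.max(1, (int) Math.ceil(poolSize * percent / 100.0));
        return new ArrayList<>(ids.subList(0, Math.min(k, poolSize)));
    }

    public static void main(String[] args) {
        // 5% of a hypothetical pool of 200 test cases: 10 randomly chosen test case ids.
        System.out.println(sample(200, 5, new Random(1)));
    }
}
```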

The pools of test cases used in the experiments were collected from SEBASE [18]. Each test case is represented as a sequence of “0”s and “1”s, where “0” means “code not covered” and “1” means “code covered”. The length of a test case is the number of code units (blocks, decisions, or statements) in the program. For example, when analyzing decision coverage, the length of each test case is the number of decisions in the program: for APDC, a “0” in the first position means that the first decision is not covered by the test case, a “1” in the second position means that the second decision is covered, and so on.
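The 0/1 representation maps naturally onto a bit set, and the coverage of a group of test cases is then the size of the union of their bits. The Java sketch below assumes one string of “0”s and “1”s per test case; the actual SEBASE file layout may differ, and `CoverageVector` is a hypothetical name.

```java
import java.util.BitSet;
import java.util.List;

/** Illustrative parsing of 0/1 coverage vectors and suite coverage. */
class CoverageVector {
    /** Parses a string such as "0110" (1 = code unit covered) into a BitSet. */
    static BitSet parse(String bits) {
        BitSet b = new BitSet(bits.length());
        for (int i = 0; i < bits.length(); i++) {
            char c = bits.charAt(i);
            if (c == '1') b.set(i);
            else if (c != '0') throw new IllegalArgumentException("unexpected character: " + c);
        }
        return b;
    }

    /** Percentage of the `units` code units covered by at least one of the given test cases. */
    static double coveragePercent(List<BitSet> tests, int units) {
        BitSet union = new BitSet(units);
        for (BitSet t : tests) union.or(t);
        return 100.0 * union.cardinality() / units;
    }

    public static void main(String[] args) {
        List<BitSet> suite = List.of(parse("1100"), parse("0110"));
        System.out.println(coveragePercent(suite, 4)); // 75.0
    }
}
```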

All experiments were performed on Ubuntu Linux workstations with kernel 2.6.22-14, a Core Duo processor, and 1 GB of main memory. The algorithms used in the experiments were implemented in the Java programming language.

4.2. Results

The results are presented in Tables 2 to 18 and Figures 2 to 17, with the four small programs reported separately from the space program. Tables 2, 3, 4, and 5 give, for print_tokens, print_tokens2, schedule, and schedule2, respectively, the coverage percentage achieved by each algorithm for each coverage criterion, averaged over 10 executions. Table 12 gives this information for the space program. The TSSp column is the percentage of test cases selected from the test case pool. The mean differences in execution time, in seconds, are presented in Tables 6 and 13 for the small programs and space, respectively.

Table 2: Results of Coverage Criteria (Average of 10 Executions), Program Print-tokens.
Table 3: Results of Coverage Criteria (Average of 10 Executions), Program Print-tokens2.
Table 4: Results of Coverage Criteria (Average of 10 Executions), Program Schedule.
Table 5: Results of Coverage Criteria (Average of 10 Executions), Program Schedule2.
Table 6: Coverage Significance and Time Mean Difference, Small Programs.
Table 7: Weighted Average for the Metrics, Small Programs.
Table 8: Difference in Performance between the Best and Worst Criteria, Small Programs.
Table 9: Average for Each Algorithm (All Metrics), Small Programs.
Table 10: Standard Deviation of the Effectiveness for the Four Algorithms, Small Programs.
Table 11: Summary of Results, Small Programs.
Table 12: Results of Coverage Criteria (1 Execution), Program Space.
Table 13: Coverage Significance and Time Mean Difference, Program Space.
Table 14: Weighted Average for the Metrics, Program Space.
Table 15: Difference in Performance between the Best and the Worst Criteria, Program Space.
Table 16: Average for Each Algorithm (All Metrics), Program Space.
Table 17: Standard Deviation of the Effectiveness for the Four Algorithms, Program Space.
Table 18: Summary of Results, Program Space.
Figure 2: APBC (Average Percentage Block Coverage), Comparison among Algorithms for Small Programs.

Tables 7 and 14 show the weighted average for the metrics (APBC, APDC, and APSC) for each algorithm. Figures 2 to 17 demonstrate a comparison among the algorithms for the metrics APBC, APDC, and APSC, for the small programs and space program.

4.3. Analysis

Analyzing the results obtained from the experiments, which are detailed in Tables 2, 3, 4, 5, and 9 and summarized in Tables 6 and 13, several relevant results can be pointed out. First, the Additional Greedy algorithm had the best effectiveness overall. It performed significantly better than the Greedy algorithm, the genetic algorithm, and simulated annealing, both for the four small programs and for the space program. The good performance of the Additional Greedy algorithm had already been demonstrated in several works, including Li et al. [6] and Yoo and Harman [9].

4.3.1. Analysis for the Four Small Programs

The Reactive GRASP algorithm had the second best performance. This approach also significantly outperformed the Greedy algorithm, the genetic algorithm, and simulated annealing, considering the coverage results. When compared to the Additional Greedy algorithm, there were no significant differences in terms of coverage. Comparing the metaheuristic-based approaches, the better performance obtained by the Reactive GRASP algorithm over genetic algorithm and simulated annealing was clear.

In 168 experiments, the genetic algorithm generated better coverage only once (block criterion, the schedule program, with 100% of the test cases considered), and the two algorithms tied once. In all other tests, Reactive GRASP outperformed the genetic algorithm. The genetic algorithm performed fourth best in our evaluation. In Li et al. [6], the genetic algorithm was also worse than the Additional Greedy algorithm; the results in the present paper show that it is also worse than the proposed Reactive GRASP approach. The simulated annealing algorithm had the third best performance, outperforming only the Greedy algorithm.

Figures 2, 3, and 4 show a comparison among the five algorithms used in the experiments. The best performance was that of the Additional Greedy algorithm, followed by that of the Reactive GRASP algorithm. Reactive GRASP surpassed the genetic algorithm and simulated annealing for all coverage criteria, and it had the best performance for the APDC criterion. The Additional Greedy algorithm was better for the APBC and APSC criteria, and the Greedy algorithm was the worst of all.

Figure 3: APDC (Average Percentage Decision Coverage), Comparison among Algorithms for Small Programs.
Figure 4: APSC (Average Percentage Statement Coverage), Comparison among Algorithms for Small Programs.

For better visualization, Figures 5 and 6 show these comparisons among the algorithms. To make the results clearer, Figures 7 and 8 present the same information for the three most efficient algorithms in this experiment. Figure 9 shows the final coverage average for each algorithm.

Figure 5: Weighted Average for the Metrics (Comparison among the Metrics), Small Programs.
Figure 6: Weighted Average for the Metrics (Comparison among the Algorithms), Small Programs.
Figure 7: Weighted Average for the 3 More Efficient Algorithms (Comparison among the Metrics), Small Programs.
Figure 8: Weighted Average for the 3 More Efficient Algorithms (Comparison among the Algorithms), Small Programs.
Figure 9: Average for Each Algorithm (All Metrics), Small Programs.

To investigate statistical significance, we used the t-test, whose results are given in Table 6. For each pair of algorithms, the table gives the mean coverage difference and the significance level. If the significance is smaller than 0.05, the difference between the algorithms is statistically significant [6]. As can be seen, there is no significant difference in coverage between Reactive GRASP and Additional Greedy. Table 6 also shows that there is no significant difference between simulated annealing and the genetic algorithm.

Table 6 also gives the mean difference in execution time for each pair of algorithms. Notably, Reactive GRASP took about 61.53 times longer to execute than the Additional Greedy algorithm.

Another conclusion that can be drawn from the graphs is that the performance of the Reactive GRASP algorithm remained similar across all metrics, while the Additional Greedy algorithm behaved slightly differently for each metric.

Table 7 shows the weighted average of each algorithm for each coverage criterion, with the best results highlighted in bold. Table 8 shows, for each algorithm, the difference in coverage percentage between its best and worst criterion. In this experiment, Reactive GRASP had the smallest difference between its best and worst coverage criteria, which highlights an interesting characteristic of this algorithm: its stability.
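The best-minus-worst difference behind Table 8 is simple to compute from per-criterion scores. The Python sketch below assumes the weighted averages are already available as a nested dictionary; the algorithm labels "A", "B", "C" and all numbers are hypothetical placeholders, not values from Table 7.

```python
from typing import Dict

def criterion_spread(scores: Dict[str, Dict[str, float]]) -> Dict[str, float]:
    """For each algorithm, best minus worst score across the coverage criteria."""
    return {alg: max(by_crit.values()) - min(by_crit.values())
            for alg, by_crit in scores.items()}

# Hypothetical weighted-average scores (%) for three unnamed algorithms;
# these are NOT the values reported in Table 7.
scores = {
    "A": {"APBC": 96.0, "APDC": 94.5, "APSC": 95.8},
    "B": {"APBC": 95.1, "APDC": 95.0, "APSC": 95.3},
    "C": {"APBC": 85.0, "APDC": 82.9, "APSC": 86.1},
}
spread = criterion_spread(scores)
most_stable = min(spread, key=spread.get)  # smallest best-worst gap
print(spread, most_stable)
```

A small spread says nothing about the absolute level of coverage, which is why it is read alongside the averages rather than on its own.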

Table 9 gives, for each algorithm, the average effectiveness over all coverage criteria (APBC, APDC, and APSC). Together with Figure 9, Table 9 reinforces that the best performance was obtained by the Additional Greedy algorithm, followed by the Reactive GRASP algorithm. Note that the performance of the Reactive GRASP algorithm differs only slightly from that of the Additional Greedy algorithm.

The standard deviation shown in Table 10 is taken over the three metrics (APBC, APDC, and APSC) and was calculated from the weighted average percentage of each algorithm. According to Table 10, the coverage criterion has the least influence on the effectiveness of the proposed Reactive GRASP algorithm, since its standard deviation is the lowest among the algorithms. In other words, the proposed technique is the one whose performance varies least across coverage criteria, which again demonstrates its greater stability.
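The text does not say whether Table 10 uses the population or the sample standard deviation, so the hedged Python sketch below computes both, together with the cross-criterion mean reported in Table 9. The function name and the input scores are hypothetical.

```python
from statistics import mean, pstdev, stdev
from typing import Dict, Tuple

def criterion_summary(by_criterion: Dict[str, float]) -> Tuple[float, float, float]:
    """Mean, population SD and sample SD of one algorithm's scores across criteria."""
    values = list(by_criterion.values())
    return mean(values), pstdev(values), stdev(values)

# Hypothetical APBC/APDC/APSC scores (%) for one algorithm (illustrative only).
avg, sd_pop, sd_sample = criterion_summary({"APBC": 95.0, "APDC": 96.0, "APSC": 97.0})
print(avg, sd_pop, sd_sample)
```

Since every algorithm is scored on the same three criteria, the sample SD is always the population SD times sqrt(3/2), so the stability ranking is the same whichever definition was used.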

4.3.2. Analysis for the Space Program

The results for the space program were similar to those for the four small programs. The Reactive GRASP algorithm had the second-best performance. The Additional Greedy algorithm, the genetic algorithm, simulated annealing, and Reactive GRASP all significantly outperformed the Greedy algorithm. Among the metaheuristic-based approaches, Reactive GRASP clearly performed better than both the genetic algorithm and simulated annealing.

The Reactive GRASP algorithm was followed by simulated annealing, which obtained the third-best result, and then by the genetic algorithm, which performed fourth best in our evaluation.

Figures 10, 11, and 12 compare the five algorithms used in the experiments for the space program. These figures show that the best performance was that of the Additional Greedy algorithm, followed by the Reactive GRASP algorithm, which surpassed the genetic algorithm, simulated annealing, and the Greedy algorithm. One difference between the results for the space program and those for the small programs is that the Additional Greedy algorithm was better on all criteria, whereas for the small programs Reactive GRASP had the best result on the APDC criterion. Another difference concerns execution time: as program size increases, the execution time of the Reactive GRASP algorithm becomes relatively less unfavorable compared with that of the other algorithms.

Figure 10: APBC (Average Percentage Block Coverage), Comparison among Algorithms for Program Space.
Figure 11: APDC (Average Percentage Decision Coverage), Comparison among Algorithms for Program Space.
Figure 12: APSC (Average Percentage Statement Coverage), Comparison among Algorithms for Program Space.

For easier comparison, Figures 13 and 14 summarize these results across the algorithms. Figures 15 and 16 present the same information for the three most efficient algorithms in this experiment, and Figure 17 shows the final coverage average for each algorithm.

Figure 13: Weighted Average for the Metrics (Comparison among the Metrics), Program Space.
Figure 14: Weighted Average for the Metrics (Comparison among the Algorithms), Program Space.
Figure 15: Weighted Average for the 3 More Efficient Algorithms (Comparison among the Metrics), Program Space.
Figure 16: Weighted Average for the 3 More Efficient Algorithms (Comparison among the Algorithms), Program Space.
Figure 17: Final Average for Each Algorithm, Program Space.

The t-test was also used to investigate statistical significance for the space program; the results are given in Table 13. As in the analysis of the small programs, the significance level was set to 0.05. As with the small programs, there is no significant difference in coverage between Reactive GRASP and Additional Greedy for the space program, nor between simulated annealing and the genetic algorithm.

4.3.3. Final Analysis

These results qualify the Reactive GRASP algorithm as a good global coverage solution for the test case prioritization problem.

It is also important to mention that the results were consistently similar across coverage criteria, as previously reported by Li et al. [6]. This suggests that there is no need to consider more than one criterion in order to generate good test case prioritizations. In addition, we found no significant difference in the coverage performance of any algorithm when varying the percentage of test cases considered.

Note that, for the four small programs, we tried from 1% to 100% of the test cases for each program and criterion, and the performance of all algorithms remained unaltered. This indicates that the effectiveness of the five algorithms discussed here does not depend strongly on the number of test cases to be ordered.

In terms of time, as expected, the use of global approaches, such as both metaheuristic-based algorithms evaluated here, adds an overhead to the process. Considering time efficiency, Tables 6 and 13 show that the Greedy algorithm was faster than all other algorithms. For the small programs, it was on average 1.491 seconds faster than the Additional Greedy algorithm, 8.436 seconds faster than the genetic algorithm, 0.057 seconds faster than simulated annealing, and almost 50 seconds faster than the Reactive GRASP approach. In relative terms, Reactive GRASP was 61.53 times slower than Additional Greedy, 11.68 times slower than the genetic algorithm, 513.87 times slower than simulated annealing, and 730.92 times slower than the Greedy algorithm. This result demonstrates, once again, the strong performance of the Additional Greedy algorithm compared with the Greedy algorithm: it achieved significantly better coverage with a very similar execution time. At the other end of the spectrum is the Reactive GRASP algorithm, which was on average 48.456 seconds slower than the Additional Greedy algorithm and 41.511 seconds slower than the genetic algorithm. In favor of both metaheuristic-based approaches is the fact that one may calibrate the time required for prioritization according to time constraints and to the characteristics of the programs and test cases. This flexibility is not present in the Greedy algorithms.
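The absolute time differences in this paragraph can be cross-checked against each other. Writing t_G, t_AG, t_GA, and t_RG for the mean execution times (in seconds, small programs) of Greedy, Additional Greedy, the genetic algorithm, and Reactive GRASP (notation introduced here for illustration only):

```latex
\begin{aligned}
t_{\mathrm{RG}} - t_{\mathrm{G}}  &= (t_{\mathrm{RG}} - t_{\mathrm{AG}}) + (t_{\mathrm{AG}} - t_{\mathrm{G}})
                                   = 48.456 + 1.491 = 49.947 \approx 50~\mathrm{s},\\
t_{\mathrm{RG}} - t_{\mathrm{GA}} &= (t_{\mathrm{RG}} - t_{\mathrm{G}}) - (t_{\mathrm{GA}} - t_{\mathrm{G}})
                                   = 49.947 - 8.436 = 41.511~\mathrm{s}.
\end{aligned}
```

Both identities reproduce the "almost 50 seconds" figure and the 41.511-second gap, which confirms that these Reactive GRASP overheads are a few tens of seconds (48.456 s and 41.511 s), not tens of thousands.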

Tables 11 and 18 summarize the results described above.

5. Conclusions and Future Works

Regression testing is an important component of any software development process. Test Case Prioritization is intended to avoid the execution of all test cases every time a change is made to the system. Modeled as an optimization problem, this prioritization problem can be solved with well-known search-based approaches, including metaheuristics.

This paper proposed the use of the Reactive GRASP metaheuristic for the regression test case prioritization problem and compared its performance with that of other solutions previously reported in the literature. The Reactive GRASP algorithm performed significantly better, in terms of coverage, than the genetic algorithm and simulated annealing, and similarly to the Additional Greedy algorithm. Moreover, it avoids the problem noted by Rothermel et al. [2] and Li et al. [6] that Greedy algorithms may fail to choose the optimal test case ordering. The Reactive GRASP algorithm is therefore recommended for the test case prioritization problem, especially when time constraints are not too critical, since it adds considerable overhead.
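The weakness of plain Greedy can be seen on a tiny example. The Python sketch below implements the two baseline strategies as they are commonly described (total coverage versus additional coverage, with a reset once everything is covered) and scores them with a small APxC helper. The lowest-index tie-breaking rule, the function names, and the test suite are illustrative assumptions, not the implementations used in the experiments.

```python
from typing import List, Sequence, Set

def greedy(coverage: Sequence[Set[int]]) -> List[int]:
    """Total-coverage Greedy: sort tests by number of elements covered (stable for ties)."""
    return sorted(range(len(coverage)), key=lambda t: -len(coverage[t]))

def additional_greedy(coverage: Sequence[Set[int]]) -> List[int]:
    """Additional Greedy: repeatedly pick the test covering the most not-yet-covered elements."""
    remaining = set(range(len(coverage)))
    uncovered = set().union(*coverage)
    order: List[int] = []
    while remaining:
        if not uncovered:  # everything covered: reset over the remaining tests
            uncovered = set().union(*(coverage[t] for t in remaining))
        best = max(sorted(remaining), key=lambda t: len(coverage[t] & uncovered))
        order.append(best)
        remaining.remove(best)
        uncovered -= coverage[best]
    return order

def apxc(order: Sequence[int], coverage: Sequence[Set[int]], elements: Set[int]) -> float:
    """APFD-style score; assumes every element is covered by some test."""
    n, m = len(order), len(elements)
    first = {}
    for pos, t in enumerate(order, start=1):
        for e in coverage[t]:
            first.setdefault(e, pos)
    return 1 - sum(first[e] for e in elements) / (n * m) + 1 / (2 * n)

# Hypothetical suite: T0 and T1 overlap heavily, T2 covers new blocks.
cov = [{0, 1, 2, 3}, {0, 1, 2}, {4, 5, 6}]
blocks = set().union(*cov)
g, ag = greedy(cov), additional_greedy(cov)
print(g, apxc(g, cov, blocks))    # order [0, 1, 2]
print(ag, apxc(ag, cov, blocks))  # order [0, 2, 1], higher score
```

Greedy schedules T1 second because it covers many blocks in total, even though all of them are already covered by T0; Additional Greedy avoids that redundancy here, but it too commits to one choice per step and can miss the optimum on harder instances.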

Our experimental results also confirmed the good performance of the Additional Greedy algorithm previously reported in the literature. However, some results point to interesting characteristics of the Reactive GRASP solution. First, its coverage performance was not significantly worse than that of the Additional Greedy algorithm. Second, it behaved more stably than all the other solutions. Finally, GRASP can be configured to run for as much or as little time as is available.
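GRASP's time flexibility comes from its iteration (or time) budget. As a hedged illustration, here is a compact Python sketch of the Reactive GRASP scheme [16] applied to test case prioritization, maximizing an APxC score. The additional-coverage greedy value, the swap-based local search, the candidate α values, the block size, the exponent δ, the rule that gives untried α values full weight, and all function names are illustrative assumptions, not the configuration used in the experiments.

```python
import random
from typing import List, Sequence, Set, Tuple

def apxc(order: Sequence[int], coverage: Sequence[Set[int]], elements: Set[int]) -> float:
    """APFD-style coverage score; assumes every element is covered by some test."""
    n, m = len(order), len(elements)
    first = {}
    for pos, t in enumerate(order, start=1):
        for e in coverage[t]:
            first.setdefault(e, pos)
    return 1 - sum(first[e] for e in elements) / (n * m) + 1 / (2 * n)

def construct(coverage: Sequence[Set[int]], alpha: float, rng: random.Random) -> List[int]:
    """Greedy randomized construction using additional coverage as the greedy value."""
    remaining, order = set(range(len(coverage))), []
    uncovered = set().union(*coverage)
    while remaining:
        if not uncovered:  # all elements covered: reset over the remaining tests
            uncovered = set().union(*(coverage[t] for t in remaining))
        gain = {t: len(coverage[t] & uncovered) for t in remaining}
        g_max, g_min = max(gain.values()), min(gain.values())
        # Restricted candidate list: tests whose gain is within alpha of the best.
        rcl = sorted(t for t in remaining if gain[t] >= g_max - alpha * (g_max - g_min))
        t = rng.choice(rcl)
        order.append(t)
        remaining.remove(t)
        uncovered -= coverage[t]
    return order

def local_search(order: List[int], coverage, elements) -> Tuple[List[int], float]:
    """Hill climbing over pairwise swaps until no swap improves the score."""
    best, best_val = list(order), apxc(order, coverage, elements)
    improved = True
    while improved:
        improved = False
        for i in range(len(best) - 1):
            for j in range(i + 1, len(best)):
                cand = best[:]
                cand[i], cand[j] = cand[j], cand[i]
                val = apxc(cand, coverage, elements)
                if val > best_val + 1e-12:
                    best, best_val, improved = cand, val, True
    return best, best_val

def reactive_grasp(coverage, iterations=100, alphas=(0.1, 0.2, 0.3, 0.4, 0.5),
                   block=10, delta=10, seed=0):
    """Draw alpha per iteration; every `block` iterations, re-weight the alpha
    probabilities toward values whose solutions scored better on average."""
    rng = random.Random(seed)
    elements = set().union(*coverage)
    probs = [1 / len(alphas)] * len(alphas)
    sums, counts = [0.0] * len(alphas), [0] * len(alphas)
    best, best_val = None, float("-inf")
    for it in range(1, iterations + 1):
        k = rng.choices(range(len(alphas)), weights=probs)[0]
        order, val = local_search(construct(coverage, alphas[k], rng), coverage, elements)
        sums[k] += val
        counts[k] += 1
        if val > best_val:
            best, best_val = order, val
        if it % block == 0:  # reactive update: q_i = (A_i / best) ** delta
            q = [(sums[i] / counts[i] / best_val) ** delta if counts[i] else 1.0
                 for i in range(len(alphas))]
            total = sum(q)
            probs = [qi / total for qi in q]
    return best, best_val

cov = [{0, 1, 2, 3}, {0, 1, 2}, {4, 5, 6}]  # hypothetical suite
order, score = reactive_grasp(cov, iterations=20, block=5)
print(order, score)
```

Here `iterations` is the budget knob; replacing the loop bound with a wall-clock deadline (for example, checking `time.monotonic()`) would give a time-based budget instead.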

As future work, we will evaluate Reactive GRASP with different numbers of iterations. This will elucidate whether its good performance was due to its intelligent search heuristics or to its computational effort. Finally, other metaheuristics will be considered, including Tabu Search and VNS, among others.

References

  1. M. Fewster and D. Graham, Software Test Automation, Addison-Wesley, Reading, Mass, USA, 1st edition, 1994.
  2. G. Rothermel, R. H. Untch, C. Chu, and M. J. Harrold, “Prioritizing test cases for regression testing,” IEEE Transactions on Software Engineering, vol. 27, no. 10, pp. 929–948, 2001.
  3. G. J. Myers, The Art of Software Testing, John Wiley & Sons, New York, NY, USA, 2nd edition, 2004.
  4. T. H. Cormen, C. E. Leiserson, R. L. Rivest, and C. Stein, Introduction to Algorithms, MIT Press, Cambridge, Mass, USA; McGraw-Hill, New York, NY, USA, 2nd edition, 2001.
  5. G. Rothermel, R. H. Untch, C. Chu, and M. J. Harrold, “Test case prioritization: an empirical study,” in Proceedings of the International Conference on Software Maintenance (ICSM '99), pp. 179–188, Oxford, UK, September 1999.
  6. Z. Li, M. Harman, and R. M. Hierons, “Search algorithms for regression test case prioritization,” IEEE Transactions on Software Engineering, vol. 33, no. 4, pp. 225–237, 2007.
  7. F. Glover and G. Kochenberger, Handbook of Metaheuristics, Springer, Berlin, Germany, 1st edition, 2003.
  8. K. R. Walcott, M. L. Soffa, G. M. Kapfhammer, and R. S. Roos, “Time-aware test suite prioritization,” in Proceedings of the International Symposium on Software Testing and Analysis (ISSTA '06), pp. 1–12, Portland, Me, USA, July 2006.
  9. S. Yoo and M. Harman, “Pareto efficient multi-objective test case selection,” in Proceedings of the International Symposium on Software Testing and Analysis (ISSTA '07), pp. 140–150, London, UK, July 2007.
  10. S. Elbaum, G. Rothermel, S. Kanduri, and A. G. Malishevsky, “Selecting a cost-effective test case prioritization technique,” Software Quality Journal, vol. 12, no. 3, pp. 185–210, 2004.
  11. K. Deb, S. Agrawal, A. Pratap, and T. Meyarivan, “A fast elitist non-dominated sorting genetic algorithm for multi-objective optimization: NSGA-II,” in Proceedings of the 6th Parallel Problem Solving from Nature Conference (PPSN '00), pp. 849–858, Paris, France, September 2000.
  12. J. H. Holland, Adaptation in Natural and Artificial Systems: An Introductory Analysis with Applications to Biology, Control, and Artificial Intelligence, University of Michigan, Ann Arbor, Mich, USA, 1975.
  13. M. Harman, “The current state and future of search based software engineering,” in Proceedings of the International Conference on Software Engineering: Future of Software Engineering (FoSE '07), pp. 342–357, Minneapolis, Minn, USA, May 2007.
  14. G. Antoniol, M. Di Penta, and M. Harman, “Search-based techniques applied to optimization of project planning for a massive maintenance project,” in Proceedings of the IEEE International Conference on Software Maintenance (ICSM '05), pp. 240–252, Budapest, Hungary, September 2005.
  15. M. Resende and C. Ribeiro, “Greedy randomized adaptive search procedures,” in Handbook of Metaheuristics, F. Glover and G. Kochenberger, Eds., pp. 219–249, Kluwer Academic Publishers, Dordrecht, The Netherlands, 2001.
  16. M. Prais and C. C. Ribeiro, “Reactive GRASP: an application to a matrix decomposition problem in TDMA traffic assignment,” INFORMS Journal on Computing, vol. 12, no. 3, pp. 164–176, 2000.
  17. M. Hutchins, H. Foster, T. Goradia, and T. Ostrand, “Experiments on the effectiveness of dataflow- and control-flow-based test adequacy criteria,” in Proceedings of the 16th International Conference on Software Engineering (ICSE '94), pp. 191–200, Sorrento, Italy, May 1994.
  18. SEBASE, Software Engineering By Automated Search, September 2009, http://www.sebase.org/applications/.