Abstract

Over the past decades, many automated software fault diagnosis techniques, including Spectrum-Based Fault Localization (SBFL), have been proposed to improve the efficiency of software debugging. In SBFL, suspiciousness calculation is closely related to the numbers of failed and passed test cases. Studies have shown that the ratio of failed to passed test cases has a more significant impact on the accuracy of SBFL than the total number of test cases, and that a balanced test suite is more beneficial to the accuracy of SBFL. Based on theoretical analysis, we propose a PNF (Passed test cases, Not execute Faulty statement) strategy to reduce a test suite and build a more balanced one for SBFL, which can be used in regression testing. We evaluated the strategy experimentally on the Siemens and Space programs. The experiments indicate that the PNF strategy can construct a new test suite effectively. Compared with the original test suite, the new one is smaller (on average, 90% of the test cases were removed in the experiments) and has a more balanced ratio of failed to passed test cases, while retaining the same statement coverage and fault localization accuracy.

1. Introduction

Software fault localization is the activity of identifying the exact locations of program faults. It is one of the most tedious and time-consuming tasks in program debugging [1]. To decrease the cost of software debugging, many automatic software fault localization techniques have been proposed in recent years [2–8]. Among them, Spectrum-Based Fault Localization (SBFL) techniques, such as Tarantula [9] and Ochiai [10], have attracted considerable attention. SBFL techniques are simple and efficient; they calculate the suspiciousness of program entities from the different dynamic behaviours of failed and passed test executions.

In the field of SBFL, more than 30 formulas have been proposed to calculate the suspiciousness of a program entity [11]. Every one of them relies on counting the failed and passed test cases. We can therefore speculate that the ratio of failed to passed test cases affects the accuracy of software fault localization. In this paper, we call the ratio of the number of failed test cases to the number of passed test cases the class ratio. Test case reduction is an important topic concerning the number of test cases. Some studies [12–15] focused on test case reduction techniques for SBFL, but they primarily reduced the total number of test cases without considering the class ratio.

Only a few studies have focused on the impact of class ratio on the accuracy of software fault localization [16–18]. The experiments in Gong et al.'s work [16] showed that class imbalance in test suites negatively affects the efficiency of SBFL. They used two methods to change the class ratio of failed to passed test cases: (1) the first method fixed the total size of the test suite and changed the ratio of failed to passed test cases; (2) the second method fixed the number of failed test cases and changed the number of passed test cases. When generating a new test suite, they selected test cases from the original test suite randomly. Based on that work, Gao et al. [17] conducted a theoretical study on generating a balanced test suite. They cloned the failed test cases a suitable number of times to catch up with the number of passed test cases. Their theoretical analysis suggested that, with their strategy, the efficiency of SBFL can be improved under certain conditions and is never impaired.

In this paper, we propose a nonrandom PNF (Passed test cases, Not execute Faulty statement) test case selection strategy to build up a reduced and balanced test suite. This strategy is applicable to some kinds of regression testing; for example, when a software platform is updated, part of the test cases should be selected from the original test suite. Unlike the cloning of failed test cases in [17], we select passed test cases from the original test suite to construct a new balanced test suite. This strategy not only makes the test suite more balanced than before but also significantly reduces the size of the original test suite without decreasing statement coverage.

This paper is structured as follows. In Section 2, we analyse the PNF strategy from a theoretical perspective. The experiments on Siemens and Space are presented in Section 3. Finally, Section 4 concludes the paper and outlines our further work.

2. Theoretical Analysis

In this section, we conduct a theoretical analysis of the impact of increasing passed test cases on the accuracy of SBFL. Here, we take two typical SBFL techniques, Tarantula and Nashi2, as representatives. Based on the analysis, we propose the PNF strategy to build up a balanced test suite that not only reduces the size of the test suite but also preserves the SBFL accuracy of the original test suite.

2.1. PNF Strategy

In order to describe our PNF strategy, we first define the following symbols:
(i) $T_o$: the original test suite.
(ii) $T_i$: the initial test suite, which consists of some test cases selected from $T_o$.
(iii) $T_n$: the increased test suite, in which some test cases are added based on $T_i$.
(iv) $susp(s)$: suspiciousness of the statement $s$.
(v) $P$: the number of all passed test cases in $T_i$.
(vi) $F$: the number of all failed test cases in $T_i$.
(vii) $P'$: the number of all passed test cases in $T_n$.
(viii) $F'$: the number of all failed test cases in $T_n$.
(ix) $a_{ep}(s)$: the number of passed test cases which execute the statement $s$ in $T_i$.
(x) $a_{ef}(s)$: the number of failed test cases which execute the statement $s$ in $T_i$.
(xi) $a'_{ep}(s)$: the number of passed test cases which execute the statement $s$ in $T_n$.
(xii) $a'_{ef}(s)$: the number of failed test cases which execute the statement $s$ in $T_n$.

In the fault localization report of SBFL, all statements are ranked by their suspiciousness in descending order. A smaller rank indicates a higher likelihood of being the faulty statement. Let statement $s_f$ represent the faulty statement and let $s_i$ represent any of the nonfaulty statements. Suppose $susp(s_f) > susp(s_i)$ in the original test suite, where $susp(s)$ is the suspiciousness of statement $s$; if this inequality is not changed after modifying the class ratio of failed to passed test cases, we regard the modification as a positive change strategy for modifying the class ratio.

In this paper, we use the PNF strategy to modify the class ratio and construct the new test suite. In the PNF strategy, we keep the failed test cases unchanged and change the number of passed test cases. The detailed steps are as follows:
(1) Build up an initial test suite $T_i$. Copy all failed test cases from the original test suite $T_o$ and select part of the passed test cases from $T_o$. Here, the number of selected passed test cases is equal to the number of failed test cases $F$. It means that the class ratio is 1 : 1 (failed : passed).
(2) Build up a new test suite $T_n$ in which the class ratio is 1 : $c$ (failed : passed). Here, the number of passed test cases is $c$ times $F$. Moreover, the added test cases do not execute the faulty statement $s_f$.
(3) Take the coverage information into consideration. We also calculate the statement coverage because of the importance of the coverage criterion in software testing. When selecting a new test case, we give priority to the passed test cases which contribute to the statement coverage. If the statement coverage of $T_n$ is lower than that of $T_o$, we add some additional passed test cases to ensure this goal (the statement coverage of $T_n$ is equal to that of $T_o$).

Algorithm 1 of PNF is used to select passed test cases and build up a balanced test suite.

Input:  initial passed test cases: passList,
   initial failed test cases: failList,
   class ratio: c
Output: new passed test cases: newPassList
//(1) find passed test cases which do not execute faulty statements
for each tID (passed test case ID) in passList do
 stmtSet = get executed statements set of tID;
 if stmtSet does not contain faulty lines then
  add (tID, size of stmtSet) into nonFaultMap;
 end
end
if nonFaultMap is empty then
 add all (tID, size of stmtSet) of passList into nonFaultMap;
end
//(2) find passed test cases which execute as many non-faulty statements as possible
sort nonFaultMap by its value (size of stmtSet) in descending order;
validPassList = nonFaultMap.Key;
//(3) build up a new test suite
newPasscnt = c × size of failList;
newPassList = new ArrayList(newPasscnt);
//statement coverage
smSet = get the set of all statements executed by failList;
while validPassList has unvisited test cases and size of newPassList < newPasscnt do
 Ti = a new test case of validPassList;
 smSetTi = get executed statements by Ti;
 if one element in smSetTi ∉ smSet then
  add Ti to newPassList;
  add smSetTi to smSet;
 end
end
if statement coverage of the new test suite < that of the original test suite then
 //improve statement coverage
 find out the valuable passed test cases;
 add valuable passed test cases to newPassList;
end
//(4) make sure of the required class ratio
while size of newPassList < newPasscnt do
 search unselected test case from validPassList;
 add it to newPassList;
end
return newPassList;
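The selection steps of Algorithm 1 can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the function name `pnf_select`, the representation of a test case as a set of executed statement numbers, the tie-breaking by test ID, and the concrete coverage-repair rule are all assumptions made for the example.

```python
from typing import Dict, List, Set


def pnf_select(passed: Dict[str, Set[int]],
               failed: Dict[str, Set[int]],
               faulty: Set[int],
               ratio: int) -> List[str]:
    """Return IDs of passed test cases chosen by a PNF-style selection.

    passed/failed map a test ID to the set of statements it executes
    (an assumed data layout); faulty holds the faulty statement numbers.
    """
    target = ratio * len(failed)

    # (1) Passed tests that never execute a faulty statement.
    candidates = {t: s for t, s in passed.items() if not (s & faulty)}
    if not candidates:  # fall back to all passed tests
        candidates = dict(passed)

    # (2) Prefer tests that execute many statements (ties broken by ID).
    ordered = sorted(candidates, key=lambda t: (-len(candidates[t]), t))

    # (3) Greedy pass: keep a test only if it adds statement coverage.
    covered: Set[int] = set().union(*failed.values())
    selected: List[str] = []
    for t in ordered:
        if len(selected) >= target:
            break
        if candidates[t] - covered:
            selected.append(t)
            covered |= candidates[t]

    # Coverage repair (assumed rule): add passed tests that reach
    # statements the original suite covers but the new suite does not.
    full = covered.union(*passed.values())
    for t in sorted(passed, key=lambda t: (-len(passed[t]), t)):
        if covered == full:
            break
        if t not in selected and passed[t] - covered:
            selected.append(t)
            covered |= passed[t]

    # (4) Top up with unselected candidates to reach the class ratio.
    for t in ordered:
        if len(selected) >= target:
            break
        if t not in selected:
            selected.append(t)
    return selected


passed = {"p1": {1, 2, 3}, "p2": {1, 2, 4, 5}, "p3": {1, 2, 5},
          "p4": {1, 6}, "p5": {1, 2}}
failed = {"f1": {1, 2, 4}}
print(pnf_select(passed, failed, faulty={4}, ratio=3))
```

In this toy input, `p2` executes the faulty statement 4 and is never a candidate, while `p5` adds no coverage and is only used when the requested ratio needs topping up. Note that the coverage-repair step may push the result above `ratio * len(failed)` when coverage cannot be kept otherwise.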

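For the detailed theoretical analysis, we first describe some common points that hold when this strategy is applied to increase passed test cases (primed symbols refer to the increased test suite):
(1) Since we do not increase any failed test cases, the number of failed test cases and the number of failed test cases executing any statement $s$ are unchanged: $F' = F$, $a'_{ef}(s) = a_{ef}(s)$.
(2) $a'_{ep}(s)$ denotes the number of passed test cases that executed the statement $s$, and its value range is $a_{ep}(s) \le a'_{ep}(s) \le a_{ep}(s) + (P' - P)$, where $P' = cP$.
(3) Since the increased passed test cases do not execute the faulty statement $s_f$, we get $a'_{ep}(s_f) = a_{ep}(s_f)$.
(4) Because all failed test cases definitely execute the faulty statement $s_f$, $a_{ef}(s_f) = a'_{ef}(s_f) = F$.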

In the next sections, we do the detailed theoretical analysis by taking typical Tarantula and Nashi2 as examples.

2.2. Theoretical Analysis in Tarantula

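The suspiciousness formula of Tarantula is

$$susp(s) = \frac{a_{ef}(s)/F}{a_{ef}(s)/F + a_{ep}(s)/P}, \tag{1}$$

where $a_{ef}(s)$ and $a_{ep}(s)$ are the numbers of failed and passed test cases that execute statement $s$, and $F$ and $P$ are the total numbers of failed and passed test cases.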
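As a concrete reading of the Tarantula formula, the following sketch computes the suspiciousness of one statement from its four spectrum counts. The function name `tarantula` and the convention of returning 0 for a statement that no test executes are assumptions made for this illustration.

```python
def tarantula(a_ef: int, a_ep: int, total_failed: int, total_passed: int) -> float:
    """Tarantula suspiciousness of one statement from its spectrum counts."""
    fail_ratio = a_ef / total_failed if total_failed else 0.0
    pass_ratio = a_ep / total_passed if total_passed else 0.0
    if fail_ratio + pass_ratio == 0:
        return 0.0  # assumed convention: statement executed by no test
    return fail_ratio / (fail_ratio + pass_ratio)


# A statement executed by both failed tests and by 1 of 4 passed tests:
# (2/2) / (2/2 + 1/4)
print(tarantula(2, 1, 2, 4))
```

Because only the ratios `a_ef / F` and `a_ep / P` enter the formula, adding passed test cases that do not execute a statement lowers its `pass_ratio` and therefore raises its suspiciousness.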

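We try to calculate the object equation:

$$susp'(s_f) - susp'(s_i), \tag{2}$$

where $susp'(s_f)$ and $susp'(s_i)$ denote the suspiciousness of the faulty statement $s_f$ and a nonfaulty statement $s_i$ after increasing passed test cases (a prime marks a value measured on the increased test suite, and $a_{ef}$ and $a_{ep}$ count the failed and passed test cases that execute a statement). We discuss three cases based on the relation of the suspiciousness of the faulty statement and nonfaulty statement in the initial test suite.

According to (1), $F' = F$, $a'_{ef}(s_i) = a_{ef}(s_i)$, and $a'_{ef}(s_f) = F$, (2) can be expressed as follows:

$$susp'(s_f) - susp'(s_i) = \frac{1}{1 + a'_{ep}(s_f)/P'} - \frac{a_{ef}(s_i)/F}{a_{ef}(s_i)/F + a'_{ep}(s_i)/P'} = \frac{P'\,[F\,a'_{ep}(s_i) - a_{ef}(s_i)\,a'_{ep}(s_f)]}{[P' + a'_{ep}(s_f)]\,[a_{ef}(s_i)\,P' + F\,a'_{ep}(s_i)]}. \tag{3}$$

Here, we prove that the PNF strategy is positive in three cases, respectively: $susp(s_f) > susp(s_i)$, $susp(s_f) = susp(s_i)$, and $susp(s_f) < susp(s_i)$.

Case 1 ($susp(s_f) > susp(s_i)$). Because the suspiciousness of statement $s_f$ is greater than the suspiciousness of statement $s_i$, we can express it as follows:

$$susp(s_f) > susp(s_i). \tag{4}$$

That is,

$$\frac{a_{ef}(s_f)/F}{a_{ef}(s_f)/F + a_{ep}(s_f)/P} > \frac{a_{ef}(s_i)/F}{a_{ef}(s_i)/F + a_{ep}(s_i)/P}. \tag{5}$$

Since $a_{ef}(s_f) = F$ and both denominators are positive, (5) can be simplified as follows:

$$F\,a_{ep}(s_i) > a_{ef}(s_i)\,a_{ep}(s_f). \tag{6}$$

To ensure that our strategy is positive, we need to have $susp'(s_f) > susp'(s_i)$. According to the previous calculation in (3), we know

$$susp'(s_f) > susp'(s_i) \iff F\,a'_{ep}(s_i) > a_{ef}(s_i)\,a'_{ep}(s_f). \tag{7}$$

According to the above analysis, this problem can be simplified to the following proof: The following is given: $F\,a_{ep}(s_i) > a_{ef}(s_i)\,a_{ep}(s_f)$. The following is proved: $F\,a'_{ep}(s_i) > a_{ef}(s_i)\,a'_{ep}(s_f)$.

Proof. For the faulty statement $s_f$, because the increased passed test cases do not execute this statement, $a'_{ep}(s_f) = a_{ep}(s_f)$. For any statement $s_i$, because we increased passed test cases, the value range of $a'_{ep}(s_i)$ is

$$a_{ep}(s_i) \le a'_{ep}(s_i) \le a_{ep}(s_i) + (P' - P), \tag{8}$$

where $a_{ep}(s_i)$ denotes the number of passed test cases which executed the statement $s_i$ in the initial test suite. Consider

$$F\,a'_{ep}(s_i) \ge F\,a_{ep}(s_i) > a_{ef}(s_i)\,a_{ep}(s_f) = a_{ef}(s_i)\,a'_{ep}(s_f). \tag{9}$$

Based on the proof, if $susp(s_f) > susp(s_i)$, we can get $susp'(s_f) > susp'(s_i)$ when we use the strategy to select passed test cases. That is to say, for the faulty statement $s_f$ whose suspiciousness is higher than the suspiciousness of the statement $s_i$ before increasing passed test cases, its suspiciousness is still higher than the suspiciousness of statement $s_i$ after increasing passed test cases. It shows that the rank of the faulty statement will not decrease. Therefore, the strategy of increasing passed test cases is a positive approach to select passed test cases in this condition.

Case 2 ($susp(s_f) = susp(s_i)$). Similarly to the first case, the initial relation is now $F\,a_{ep}(s_i) = a_{ef}(s_i)\,a_{ep}(s_f)$, and when we increase passed test cases to $c$ times the previous number, the value range of $a'_{ep}(s_i)$ is $a_{ep}(s_i) \le a'_{ep}(s_i) \le a_{ep}(s_i) + (P' - P)$:
(i) If $a'_{ep}(s_i) = a_{ep}(s_i)$,

$$F\,a'_{ep}(s_i) = a_{ef}(s_i)\,a'_{ep}(s_f) \implies susp'(s_f) = susp'(s_i). \tag{10}$$

(ii) If $a'_{ep}(s_i) > a_{ep}(s_i)$,

$$F\,a'_{ep}(s_i) > a_{ef}(s_i)\,a'_{ep}(s_f) \implies susp'(s_f) > susp'(s_i). \tag{11}$$

Based on the above analysis, if $susp(s_f) = susp(s_i)$, we can get $susp'(s_f) \ge susp'(s_i)$ when we use the strategy to select passed test cases. It implies that the rank of the faulty statement will not decrease and may improve. When we increase passed test cases, selecting passed test cases which execute as many nonfaulty statements ($s_i$) as possible will improve the rank of the faulty statement. Increasing passed test cases in this way is therefore a positive approach to enlarging the test suite.

Case 3 ($susp(s_f) < susp(s_i)$). The object equation is the same as in the previous analysis:

$$susp'(s_f) - susp'(s_i) = \frac{P'\,[F\,a'_{ep}(s_i) - a_{ef}(s_i)\,a'_{ep}(s_f)]}{[P' + a'_{ep}(s_f)]\,[a_{ef}(s_i)\,P' + F\,a'_{ep}(s_i)]}. \tag{12}$$

Since $P' > 0$ and the denominator is positive, we only focus on the numerator:

$$F\,a'_{ep}(s_i) - a_{ef}(s_i)\,a_{ep}(s_f). \tag{13}$$

When $a'_{ep}(s_i) \ge a_{ep}(s_i)$, because $F\,a_{ep}(s_i) - a_{ef}(s_i)\,a_{ep}(s_f) < 0$, the result of (13) could be one of three cases: positive, zero, and negative. But the following condition can effectively reduce the negative effects:

$$a'_{ep}(s_i) > a_{ep}(s_i). \tag{14}$$

Therefore, we should make the value of $a'_{ep}(s_i)$ as large as possible; namely, we should select those passed test cases which execute as many nonfaulty statements as possible.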
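The claims of Cases 1 and 2 can be checked numerically on small spectra. The sketch below is an illustration under assumptions, not part of the original study: it enumerates every combination of counts up to small bounds (the bounds, the function names, and the use of exact fractions are choices made here), adds passed test cases that never execute the faulty statement, and asserts that the ordering of the faulty statement against a nonfaulty one is preserved.

```python
from fractions import Fraction
from itertools import product


def tarantula(a_ef, a_ep, total_failed, total_passed):
    """Exact Tarantula suspiciousness (0 if no test executes the statement)."""
    fail_ratio = Fraction(a_ef, total_failed)
    pass_ratio = Fraction(a_ep, total_passed)
    total = fail_ratio + pass_ratio
    return fail_ratio / total if total else Fraction(0)


def check_tarantula(max_failed=3, max_passed=3, max_ratio=3):
    """Enumerate small spectra and return the number of checked settings."""
    checked = 0
    for F, P, c in product(range(1, max_failed + 1),
                           range(1, max_passed + 1),
                           range(1, max_ratio + 1)):
        new_P = c * P  # passed test cases after the increase
        for ep_f, ef_i, ep_i in product(range(P + 1), range(F + 1), range(P + 1)):
            if ef_i == 0 and ep_i == 0:
                continue  # s_i is executed by no test; skipped by assumption
            before_f = tarantula(F, ep_f, F, P)
            before_i = tarantula(ef_i, ep_i, F, P)
            # 'added' = new passed tests that execute s_i; none execute s_f.
            for added in range(new_P - P + 1):
                after_f = tarantula(F, ep_f, F, new_P)
                after_i = tarantula(ef_i, ep_i + added, F, new_P)
                if before_f > before_i:       # Case 1
                    assert after_f > after_i
                elif before_f == before_i:    # Case 2
                    assert after_f >= after_i
                checked += 1
    return checked


print(check_tarantula())
```

The check passes for all enumerated settings because the sign of the difference depends only on the product comparison between `F * a_ep(s_i)` and `a_ef(s_i) * a_ep(s_f)`, and only the first product can grow under the strategy.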

2.3. Theoretical Analysis in Nashi2

The suspiciousness formula of Nashi2 is

$$susp(s) = a_{ef}(s) - \frac{a_{ep}(s)}{P + 1}. \tag{15}$$
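For comparison with Tarantula, the Nashi2 formula can be written as a one-line function. The name `nashi2` follows the spelling used in this paper, and the argument names are assumptions made for the illustration.

```python
def nashi2(a_ef: int, a_ep: int, total_passed: int) -> float:
    """Nashi2 suspiciousness: failed executions minus a passed-execution penalty."""
    return a_ef - a_ep / (total_passed + 1)


# A statement executed by 2 failed tests and by 1 of 4 passed tests: 2 - 1/5
print(nashi2(2, 1, 4))
```

The penalty term is always smaller than 1, so a statement executed by more failed test cases always outranks one executed by fewer, regardless of the passed test cases.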

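The symbols in (15) are the same as in the formula of Tarantula, and we follow a similar analysis process (a prime marks a value measured after increasing passed test cases). We therefore calculate the difference between $susp'(s_f)$ of the faulty statement and $susp'(s_i)$ of the statement $s_i$ and expect that the rank of the faulty statement in the SBFL report is not impaired. Using $a'_{ef}(s_f) = F$ and $a'_{ef}(s_i) = a_{ef}(s_i)$, the difference between $susp'(s_f)$ and $susp'(s_i)$ can be expressed as follows:

$$susp'(s_f) - susp'(s_i) = [F - a_{ef}(s_i)] + \frac{a'_{ep}(s_i) - a'_{ep}(s_f)}{P' + 1}. \tag{16}$$

We discuss three cases based on the relation of $susp(s_f)$ (suspiciousness of the faulty statement) and $susp(s_i)$ (suspiciousness of the statement $s_i$) in the original test suite, as in Section 2.2.

Case 1 ($susp(s_f) > susp(s_i)$). According to $susp(s_f) > susp(s_i)$, we have

$$[F - a_{ef}(s_i)] + \frac{a_{ep}(s_i) - a_{ep}(s_f)}{P + 1} > 0. \tag{17}$$

In order to ensure $susp'(s_f) > susp'(s_i)$, the object equation of (16), with $a'_{ep}(s_f) = a_{ep}(s_f)$, can be transformed into the following:

$$[F - a_{ef}(s_i)] + \frac{a'_{ep}(s_i) - a_{ep}(s_f)}{P' + 1} > 0. \tag{18}$$

According to the above analysis, this problem can be simplified to the following proof: The following is given: inequality (17). The following is proved: inequality (18).

Proof. Since $a_{ef}(s_i) \le F$, $a'_{ep}(s_i) \ge a_{ep}(s_i) \ge 0$, and $a_{ep}(s_f) \le P \le P'$, consider the following:

$$susp'(s_f) - susp'(s_i) \ge \begin{cases} 1 - \dfrac{a_{ep}(s_f)}{P' + 1} > 0, & a_{ef}(s_i) < F, \\[6pt] \dfrac{a_{ep}(s_i) - a_{ep}(s_f)}{P' + 1} > 0, & a_{ef}(s_i) = F, \end{cases} \tag{19}$$

where the second line uses $a_{ep}(s_i) > a_{ep}(s_f)$, which follows from (17) when $a_{ef}(s_i) = F$.

This proof shows that the suspiciousness of the faulty statement is still higher than the suspiciousness of statement $s_i$ after passed test cases are increased. It means that the rank of the faulty statement does not decrease.

Case 2 ($susp(s_f) = susp(s_i)$). According to this condition, we have

$$[F - a_{ef}(s_i)] + \frac{a_{ep}(s_i) - a_{ep}(s_f)}{P + 1} = 0. \tag{20}$$

In order to ensure $susp'(s_f) \ge susp'(s_i)$, the object equation of (16) can be transformed into the following:

$$\frac{[F - a_{ef}(s_i)](P' + 1) + a'_{ep}(s_i) - a_{ep}(s_f)}{P' + 1} \ge 0. \tag{21}$$

Proof. Because $P' + 1 > 0$, we can focus only on the numerator and, substituting $a_{ep}(s_f) = a_{ep}(s_i) + [F - a_{ef}(s_i)](P + 1)$ from (20), simplify it into

$$[F - a_{ef}(s_i)](P' - P) + [a'_{ep}(s_i) - a_{ep}(s_i)] \ge 0. \tag{22}$$

The above inequality implies that the suspiciousness of the faulty statement must be equal to or higher than the suspiciousness of statement $s_i$ after passed test cases are increased. Moreover, we should make the value of $a'_{ep}(s_i)$ as large as possible, which means that we should select the passed test cases which execute as many nonfaulty statements as possible.

Case 3 ($susp(s_f) < susp(s_i)$). According to this condition, we have

$$[F - a_{ef}(s_i)] + \frac{a_{ep}(s_i) - a_{ep}(s_f)}{P + 1} < 0. \tag{23}$$

We need to determine the relation between $susp'(s_f)$ and $susp'(s_i)$.
Because $a'_{ep}(s_i) \ge a_{ep}(s_i)$ and $P' \ge P$, the relation between $susp'(s_f)$ and $susp'(s_i)$ can be one of three cases: higher, equal, and lower.
When $a'_{ep}(s_i) > a_{ep}(s_i)$, it shows $susp'(s_f) - susp'(s_i) > susp(s_f) - susp(s_i)$. This case can effectively reduce the negative effects on the rank of the faulty statement by increasing passed test cases. Therefore, we should make the value of $a'_{ep}(s_i)$ as large as possible; namely, we should select those passed test cases which execute as many nonfaulty statements as possible.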
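The three Nashi2 cases can also be checked by exhaustive enumeration over small spectra. As with any such sketch, the bounds and names below are assumptions made for illustration; exact fractions are used so that the equality case is not blurred by floating-point rounding.

```python
from fractions import Fraction
from itertools import product


def nashi2(a_ef, a_ep, total_passed):
    """Exact Nashi2 suspiciousness."""
    return a_ef - Fraction(a_ep, total_passed + 1)


def check_nashi2(max_failed=3, max_passed=3, max_ratio=3):
    """Enumerate small spectra and return the number of checked settings."""
    checked = 0
    for F, P, c in product(range(1, max_failed + 1),
                           range(1, max_passed + 1),
                           range(1, max_ratio + 1)):
        new_P = c * P  # passed test cases after the increase
        for ep_f, ef_i, ep_i in product(range(P + 1), range(F + 1), range(P + 1)):
            before = nashi2(F, ep_f, P) - nashi2(ef_i, ep_i, P)
            # 'added' = new passed tests that execute s_i; none execute s_f.
            for added in range(new_P - P + 1):
                after = nashi2(F, ep_f, new_P) - nashi2(ef_i, ep_i + added, new_P)
                if before > 0:        # Case 1: order is preserved
                    assert after > 0
                elif before == 0:     # Case 2: faulty statement never falls behind
                    assert after >= 0
                else:                 # Case 3: the gap never widens
                    assert after >= before
                checked += 1
    return checked


print(check_nashi2())
```

In Case 3 the check only asserts that the gap does not widen; whether the faulty statement actually overtakes $s_i$ depends on how many of the added passed test cases execute $s_i$.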

2.4. Summary of Strategy

According to the analysis and proofs for Tarantula and Nashi2 in Sections 2.2 and 2.3, we conclude that the accuracy of SBFL is not negatively affected by the class ratio of failed to passed test cases when the class ratio is changed using the nonrandom PNF strategy. Here, the PNF strategy means keeping the number of failed test cases unchanged and modifying the class ratio by increasing passed test cases. The selected passed test cases are expected not to execute the faulty statement and to execute as many nonfaulty statements as possible. According to the analysis, we can build up a balanced test suite with the PNF strategy. The new test suite has the following advantages:
(1) A smaller size than the original test suite.
(2) A more balanced ratio of failed to passed test cases than the original one.
(3) The same statement coverage as the original test suite.
(4) At least the same SBFL fault localization accuracy as the original test suite.

Although the PNF strategy relies on knowledge of the fault location, this knowledge can be replaced by other information, such as the top suspicious statements according to the suspiciousness calculation. In this paper, however, the fault location is assumed to be known in order to obtain more exact results. The PNF strategy is not applicable to regression testing in which the target program has been modified. However, it can be used to construct a new test suite for regression testing in which the target program is unchanged. For example, a company often has to run regression tests on all of its developed products when the OS is updated, a package is applied, or the platform is changed.

3. Experiment

The effectiveness of our PNF strategy has been analysed and proven from a theoretical perspective in Section 2. In this section, we verify the strategy experimentally on two small programs (tcas and totinfo) from the Siemens program suite and one large program (Space).

3.1. Experiment Setup

The Siemens and Space (http://sir.unl.edu/portal/index.php) datasets have been widely used in fault localization research, and the original information of the three target programs is listed in Table 1. All the programs are written in C. The final accuracy for a target program is the average fault localization accuracy over all its available versions.

Some faulty versions of tcas and Space contain several faults; we call these multiple-fault versions. In the experiment, we ignored the interactions and interferences between multiple faults presented in [19] and treated one multiple-fault version as several single-fault versions. For these versions, we supposed that only the most suspicious fault could be localized in each iteration; it was then fixed, and the next iteration was used to find another fault. Although this approach is not very efficient, it is to a certain degree similar to the fault localization process in real software testing. In addition, we used 20 of the 38 versions of the Space program and excluded the versions that had compile errors or no failed test cases.

Symbol $c$ refers to the test case class ratio and symbol $v$ denotes the program version. In the experiment, we generated 7 different class ratios for each program suite, $c$ = 1, 2, 4, 8, 16, 32, 64, where $c = 64$ means that the number of passed cases is 64 times the number of failed cases in the test suite. One test is identified by the pair $(v, c)$, meaning the experiment on version $v$ with test case class ratio $c$.

To evaluate the influence of class ratio on fault localization, we applied 4 basic fault localization techniques, Tarantula, Nashi2, Jaccard, and Ochiai, to each faulty program version with different class ratios. Because of time and space constraints, our case studies did not use other advanced fault localization techniques, such as RBF, DStar, and others proposed in [4, 6, 20–22].

Since failed test cases contribute more to fault localization, we generated test suites by retaining all the failed cases and increasing the passed cases in accordance with the different class ratios. We constructed test suites with a random strategy and with the nonrandom PNF strategy, respectively. The random strategy selects passed cases randomly from the original passed test cases to generate a new test suite, while the nonrandom PNF strategy selects passed cases according to Algorithm 1.
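The random baseline can be sketched as follows. The function name `random_suite`, the fixed seed, and the rule of capping the request at the number of available passed test cases are assumptions made for this illustration; the paper does not state how a ratio larger than the available passed test cases is handled.

```python
import random
from typing import List, Sequence


def random_suite(passed_ids: Sequence[str], failed_ids: Sequence[str],
                 ratio: int, seed: int = 0) -> List[str]:
    """Keep every failed test and draw ratio * |failed| passed tests at random."""
    rng = random.Random(seed)  # seeded so that a run can be repeated
    wanted = min(ratio * len(failed_ids), len(passed_ids))  # assumed cap
    chosen = rng.sample(sorted(passed_ids), wanted)
    return list(failed_ids) + chosen


passed_ids = [f"p{i}" for i in range(100)]
failed_ids = ["f0", "f1", "f2"]
for ratio in (1, 2, 4, 8, 16, 32, 64):
    print(ratio, len(random_suite(passed_ids, failed_ids, ratio)))
```

With 3 failed and 100 passed test cases, the suite sizes grow as 6, 9, 15, 27, 51, and 99, and the last ratio is capped at 103 because only 100 passed test cases exist.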

3.2. Results Measured by Score

In this section, we used score to measure the accuracy of fault localization, which has been widely used in software fault localization [1, 2, 23]:

$$score = \frac{N - R}{N}. \tag{24}$$

In (24), $N$ denotes the total number of statements, and $R$ denotes the rank of the faulty statement. $score$ represents the percentage of code that need not be examined before the fault is identified. A higher score means higher efficiency of fault localization. When calculating score, we used the First-Line strategy to deal with ties: all statements sharing the same suspiciousness are assigned the first rank among them. We took the average score over the faulty versions of a program as the final score of the program. For a multiple-fault version, we assigned the highest score among its iterations as the score of the version.
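The score computation with First-Line tie handling can be sketched as below. The form `(N - R) / N` is an assumption, chosen because it agrees with the statement that a higher score is better; the function names and the sample suspiciousness values are likewise made up for the illustration.

```python
from typing import Dict


def first_line_rank(susp: Dict[str, float], faulty: str) -> int:
    """Rank of the faulty statement; tied statements share their first rank."""
    target = susp[faulty]
    return 1 + sum(1 for value in susp.values() if value > target)


def score(susp: Dict[str, float], faulty: str) -> float:
    """Fraction of statements that need not be examined: (N - R) / N."""
    n = len(susp)
    return (n - first_line_rank(susp, faulty)) / n


susp = {"s1": 0.9, "s2": 0.5, "s3": 0.5, "s4": 0.1}
print(first_line_rank(susp, "s3"), score(susp, "s3"))
```

Here `s2` and `s3` are tied, so both receive rank 2, and the score of `s3` is (4 - 2) / 4.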

Figures 1, 2, and 3 present our experiment results on tcas, totinfo, and Space, respectively. On the horizontal axis of these figures, “orig” denotes the original test suite and “1 : $c$” means that the class ratio of failed test cases to passed test cases is 1 : $c$.

The three figures show that, for all four SBFL techniques, class ratio affects fault localization performance when the test suite is generated with the random strategy. For tcas and totinfo, a more balanced class ratio yields better fault localization accuracy, which implies that a balanced test suite is more efficient for SBFL when passed test cases are selected randomly; this is consistent with the result in [16]. But when we use the nonrandom PNF strategy to increase passed test cases, the class ratio has almost no effect on fault localization performance, which conforms to our theoretical analysis. Consequently, whether the class ratio affects the accuracy of SBFL is closely related to the strategy for generating the new test suite. The results also show that the scores of the new test suites constructed by the PNF strategy are higher than the scores of the original test suite for the three target programs, while this advantage does not always hold when the random strategy is used.

As shown in Figure 3, there is a small difference between Space and tcas/totinfo. Figure 3(b) indicates that, with the random strategy, the four classical SBFL techniques achieve better fault localization accuracy on Space with a more balanced test suite, as they do on tcas and totinfo. With the PNF strategy, however, the accuracy on Space improves as the distribution of test cases becomes more unbalanced, a trend that differs from tcas and totinfo. Nevertheless, with either the PNF strategy or the random strategy, we still achieve a higher score than with the original test suite. This experimental evidence indicates that the PNF strategy is effective for building up a relatively balanced test suite. In this figure, 1 : 2 is the best class ratio considering both the accuracy of fault localization and the size of the test suite.

3.3. Results Measured by NScore

This section evaluates our theoretical analysis with the NScore measurement, in conformity with [16]. NScore is calculated by the following equation:

$$NScore = \frac{N_t}{N_v} \times 100\%. \tag{25}$$

In (25), $N_t$ is the number of faulty program versions in which the accuracy of fault localization is higher than a threshold and $N_v$ is the total number of program versions.
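A direct reading of the NScore definition is sketched below. The function name `nscore` and the sample list of per-version scores are illustrative assumptions; only the threshold idea comes from the text.

```python
from typing import Sequence


def nscore(version_scores: Sequence[float], threshold: float) -> float:
    """Percentage of faulty versions whose score is higher than the threshold."""
    if not version_scores:
        raise ValueError("at least one program version is required")
    hits = sum(1 for s in version_scores if s > threshold)
    return 100.0 * hits / len(version_scores)


# Hypothetical scores of four faulty versions, evaluated at threshold 0.8.
print(nscore([0.9859, 0.85, 0.6, 0.1268], 0.8))
```

Two of the four hypothetical versions exceed the threshold, so the NScore at 0.8 is 50%; raising the threshold to 0.9 or 0.95 leaves only one version above it.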

The experimental results of tcas and totinfo using different thresholds (0.8/0.9/0.95) are presented in Figures 4 and 5. For the tcas program, the top-left point in Figure 4(a) means that the scores of 50% of the versions are higher than 0.8. Namely, if the threshold of accuracy is 0.8 and the class ratio is 1 : 1, then 50% of the faults can be localized correctly with Tarantula. The following can be observed: (1) the performance of fault localization is more stable when test suites are generated with the PNF strategy than with the random strategy; (2) higher fault localization accuracy can be achieved with the PNF strategy than with the random strategy. The totinfo program shows the same tendency as tcas.

Figure 6 illustrates the experiment results for the Space program. Similar to the results in Section 3.2, there is a small difference between the Siemens programs and Space. Figures 6(c) and 6(d) indicate that, for Nashi2, the NScore of Space does not change with the class ratio under either the PNF or the random strategy, while for Tarantula, as presented in Figures 6(a) and 6(b), NScore tends to increase as the class ratio becomes more unbalanced. Although the reason for this difference has not yet been analysed thoroughly, the PNF strategy can reduce the original test suite to a more balanced test suite and obtain better fault localization accuracy.

3.4. Experiment Summary and Discussion

The experiments using score and NScore imply that class ratio does influence the accuracy of fault localization. From the experiment results, we conclude that a balanced test suite can be built with the PNF strategy and that the accuracy of fault localization is preserved with the new balanced test suite. Moreover, the new test suite is smaller than the original one: in the experiments, the test case reduction rate was over 95% for tcas, 90% for totinfo, and 90% for Space.

In the experiment, we also found that, for the same SBFL method, the scores of different faulty versions differ significantly, although the results of some versions show the same tendency. The average score hides this difference. Table 2 lists the scores of the tcas versions for several SBFL methods, sorted by Tarantula’s score in descending order. From this table, we observed that the maximum score of tcas is 0.9859, but the minimum score is only 0.1268. Why are the scores of some versions so different? Nashi2, Jaccard, and Ochiai have the same problem. It is worth doing more studies from this perspective.

4. Conclusions

The suspiciousness calculation of every SBFL method is closely related to the numbers of passed and failed test cases. Previous studies have shown that a balanced test suite, in which the numbers of failed and passed test cases are similar, is more beneficial to the accuracy of SBFL. In this paper, we proposed a PNF strategy to build up a balanced test suite according to a theoretical analysis and evaluated it by experiments using different SBFL methods.

In the PNF strategy, in order to construct a new balanced test suite, we kept the failed test cases unchanged and selected passed test cases from the original test suite according to certain rules: the selected passed test cases should not execute the faulty statement and, with statement coverage taken into consideration, should execute as many nonfaulty statements as possible. The experiments indicated that the PNF strategy is effective for SBFL. From the original test suite, it generates a new, more balanced test suite that is smaller (on average, 90% of the test cases are removed) and has the same SBFL accuracy and the same statement coverage.

However, the PNF strategy still has some limitations. For example, when there are only a few failed test cases in the original test suite, the passed test cases selected by the PNF strategy may not be enough for normal testing. The handling of programs with multiple faults is also insufficient. The work can be improved in the following directions: (1) revealing other factors, besides the class ratio, that affect the accuracy of SBFL; (2) combining the PNF strategy with test suite reduction techniques. These studies will improve the efficiency of testing and the accuracy of fault localization. In addition, how to build up a balanced test suite without knowledge of the fault location is part of our future work.

Competing Interests

The authors declare that they have no competing interests.

Acknowledgments

The work is supported by the National Natural Science Foundation of China (61402370, 61502392), Fundamental Research Funds for the Central Universities (3102015JSJ0004, 3102014JSJ0013), and Aerospace Science and Technology Innovation Foundation of China (2014H03FK011).