Abstract

For mission-critical programs, integer overflow is one of the most dangerous faults. Different testing methods provide several effective ways to detect this defect. However, it is hard to validate the testing outputs, because a test oracle is often unavailable or too expensive to obtain, unless the program obviously throws an exception. In the present study, the authors conduct a case study in which they apply a metamorphic testing (MT) method to detect integer overflow defects and alleviate the oracle problem in testing a critical program, the Traffic Collision Avoidance System (TCAS). Experimental results show that, in revealing typical integer mutations, MT with a novel symbolic metamorphic relation is more effective than the traditional safety property testing method in some cases.

1. Introduction

Integer overflow is one of the most dangerous faults in mission-critical programs, which usually implement critical functions or services. More importantly, integer overflow often goes unnoticed until the program under test halts or crashes, by which time it is too late to deal with and the consequences can be catastrophic [1]. Detecting such faults is challenging because conventional software testing processes do not always apply: in particular, it is difficult to detect subtle errors, faults, defects, or anomalies, which could lead to latent failures in many applications in these domains, because there is no reliable "test oracle" to indicate what the correct output should be for an arbitrary input. The general class of software systems for which no reliable test oracle is available is sometimes known as "nontestable programs" [2]. Many of these applications fall into a category of software that Weyuker describes as "programs which were written in order to determine the answer in the first place. There would be no need to write such programs, if the correct answer were known" [2].

Even though an ideal oracle is sometimes hard to obtain, software testing can still be carried out. Baresi and Young [3] proposed using necessary properties of a program to test numerical programs. For example, a common method for testing numerical programs is to check whether the outputs satisfy certain properties, such as a mathematical identity that any correct exponential calculation program must satisfy. Program checkers [4, 5] and self-testing [6, 7] also use relations among the outputs of the target program to check its correctness. The basic technique for designing programs that check their own work is to first determine the necessary properties of the program and then check whether it satisfies these properties on random inputs. The most widely used properties are linear consistency and neighbor consistency [6, 7]. In software fault tolerance, the most relevant technique is data diversity [8], which re-expresses the input in a different form. Compared with "N-version programming," this technique can greatly lower the cost. However, data diversity was originally proposed for fault tolerance, not for fault detection, and the properties it uses are restricted to equality relations.

Integer defects can be considered among the most important causes of mission-critical software failures and are classified into four categories [9]. To test for them, we first enumerate the metamorphic relations that such an application would be expected to exhibit and then, for a given implementation, determine whether each relation is a necessary property of a correct program. If it is, a violation of the relation exhibits an unexpected change in behavior, which indicates an integer defect. If it is not a necessary property of the software under test, the relation cannot be used for validation, even if it would still be expected to hold for the application or algorithm. To validate the effectiveness of metamorphic testing for integer overflow, we conduct a case study in which we apply the method to alleviate the oracle problem in a typical mission-critical system, the Traffic Collision Avoidance System (TCAS), which is used as the critical application for airplane collision avoidance.

The rest of the paper is organized as follows. Section 2 supplies background information on integer overflow defects, introduces the integer defect categories, and gives some basic definitions. Section 3 describes the relationship between MT and metamorphic relations. Section 4 presents the case study, whose results demonstrate that MT can detect integer defects effectively in the mission-critical program TCAS. In that section, mutation testing is used to systematically insert integer mutants into the source code and to compare the effectiveness of metamorphic testing with that of a conventional method in detecting integer defects. Our conclusions and future work are given in Section 5.

2. Preliminaries

2.1. Integer Overflow Defects

An integer overflow is caused by the fixed width of an integer data type. When the value computed by an operation is larger or smaller than the range that the fixed width can represent, an integer overflow occurs. In a broad sense, the term integer overflow defect is sometimes used for any integer bug. For the C language, Seacord [1] proposed that integer defects can be categorized into four types: overflow, underflow, signedness, and truncation defects. Table 1 gives examples of the different kinds of integer defects.
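The four categories can be illustrated with a short C sketch. The function names are hypothetical, and the snippets are not the examples of Table 1. Because signed overflow is undefined behavior in C, the overflow and underflow cases use unsigned arithmetic, whose wraparound is well defined, and the truncation case uses fixed-width unsigned types for the same reason.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Overflow: the result exceeds the type's maximum. Unsigned arithmetic
   wraps modulo 2^N, so UINT_MAX + 1 yields 0. */
unsigned int overflow_example(void) {
    unsigned int x = UINT_MAX;
    return x + 1u;
}

/* Underflow: the result falls below the type's minimum; for unsigned
   types, 0 - 1 wraps around to UINT_MAX. */
unsigned int underflow_example(void) {
    unsigned int x = 0u;
    return x - 1u;
}

/* Signedness: a negative length converted to unsigned becomes a huge
   positive value (-1 becomes UINT_MAX), so a later size comparison
   can behave unexpectedly. */
unsigned int signedness_example(int len) {
    return (unsigned int)len;
}

/* Truncation: storing a wider value in a narrower type drops the
   high-order bits (300 becomes 300 mod 256 = 44). */
uint8_t truncation_example(uint32_t value) {
    return (uint8_t)value;
}
```

For instance, `overflow_example()` returns 0 and `truncation_example(300u)` returns 44, even though neither operation raises any error at run time.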

2.2. Definitions

The test oracle problem has long restricted the development of software testing, and researchers have explored solutions to it in different applications. In 1998, Chen et al. [10] proposed metamorphic testing, an effective technique for the oracle problem. This technique verifies the correctness of a program by checking whether it satisfies certain necessary properties of the program, called metamorphic relations. The definitions related to metamorphic testing are given as follows.

There are two understandings of the test oracle [2]. Some refer to it as the expected outcome of the program under test; others also include the process of comparing the actual outcomes against the expected ones. For example, if program $P$ is an implementation of function $f$ over the input domain $D$, the test oracle in the narrow sense is the set of all $f(x)$ for $x \in D$, and the test oracle in the wide sense is a mechanism that checks whether the expression $P(x) = f(x)$ is true for $x \in D$. Unless otherwise stated, the test oracle in this paper refers to the narrow sense.

Definition 1 (metamorphic relation (MR) [11]). Suppose program $P$ is an implementation of function $f$, $x_1, x_2, \ldots, x_n$ ($n > 1$) are groups of inputs of $f$, and $f(x_1), f(x_2), \ldots, f(x_n)$ are their corresponding outputs. If, whenever $x_1, x_2, \ldots, x_n$ satisfy relation $r$, it follows that $f(x_1), f(x_2), \ldots, f(x_n)$ satisfy relation $r_f$, that is,
$$r(x_1, x_2, \ldots, x_n) \Rightarrow r_f\big(f(x_1), f(x_2), \ldots, f(x_n)\big), \tag{1}$$
then $(r, r_f)$ is called a metamorphic relation of $f$.
Therefore, if $P$ is correct, it must satisfy the following expression:
$$r(I_1, I_2, \ldots, I_n) \Rightarrow r_f\big(P(I_1), P(I_2), \ldots, P(I_n)\big), \tag{2}$$
where $I_1, I_2, \ldots, I_n$ are the actual inputs of $P$ corresponding to $x_1, x_2, \ldots, x_n$, and $P(I_1), P(I_2), \ldots, P(I_n)$ are the corresponding outputs. The correctness of $P$ can therefore be checked during testing by verifying whether expression (2) is satisfied.
If program $P$ is correct, then $P(I_i) = f(x_i)$ should hold, where $I_i$ is the input of $P$ corresponding to $x_i$ and $P(I_i)$ is the output. In this paper, we use $x_i$ to denote the input $I_i$. Hence, if the outputs of the test cases do not satisfy expression (2), the hypothesis that $P$ is correct is false and there are faults in the program. MRs are the key to judging the execution results of a set of test cases, and their quality greatly affects the effectiveness of testing. A given SUT usually has more than one MR; we write $MR_i$ for the $i$th metamorphic relation of $P$ and $\{MR_1, MR_2, \ldots, MR_k\}$ for the set of its metamorphic relations.
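Definition 1 is not easy to understand; more importantly, it needs more than two formulas to represent a metamorphic relation. Therefore, based on the traditional definition, we propose an integrated formula, expression (3), to describe a metamorphic relation, in which the hypothesis that $P$ is correct is written above the line and the relation it implies is written below:
$$\frac{P(x_i) = f(x_i),\; i = 1, 2, \ldots, n}{r(x_1, x_2, \ldots, x_n) \Rightarrow r_f\big(P(x_1), P(x_2), \ldots, P(x_n)\big)} \tag{3}$$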
For example, for program , a typical metamorphic relation is

Definition 2 (original test cases). Original test cases are also called original test inputs (OTI). Suppose a metamorphic relation has inputs $x_1, x_2, \ldots, x_n$; then the OTI are the test cases among $x_1, x_2, \ldots, x_n$ that are generated by other testing methods, such as special value testing, random testing, and iterative testing.

Definition 3 (follow-up test cases [12]). Follow-up test cases are also called follow-up inputs (FTI). Suppose a metamorphic relation has inputs $x_1, x_2, \ldots, x_n$; the FTI are all the test cases among $x_1, x_2, \ldots, x_n$ other than the original test cases. Follow-up test cases are generated from the original test cases according to the metamorphic relation. The input of a metamorphic relation can thus be written as (OTI, FTI).

3. Relationships between Metamorphic Testing and Metamorphic Relation

The metamorphic testing technique [10, 11] was proposed by Chen and colleagues. It checks the relations between inputs and outputs to determine whether the program satisfies necessary properties, called metamorphic relations. With these relations, it is unnecessary to assume the existence of an ideal oracle for the test inputs, yet it is still possible to check whether the test outputs deviate from the expected ones. Several prerequisite concepts were introduced in the previous section; this section illustrates the relationships between these concepts of metamorphic testing and metamorphic relations.

Because the definition of a metamorphic relation is complex, it is not easy to evaluate the effectiveness of different relations. Therefore, in Figure 1, the metamorphic relation is decomposed into three subrelations. Original test cases and follow-up test cases come from the input space, and their corresponding test outputs lie in the output space.

If $P$ is a program, its program function is denoted by $f$ [13]. $x_1, x_2, \ldots, x_n$ represent a finite number of test inputs, and $f(x_1), f(x_2), \ldots, f(x_n)$ represent the corresponding outputs of the program function $f$.

First of all, a metamorphic relation can be written in the fraction form of expression (3), where the numerator is the hypothesis about $P$: the MR is based on the hypothesis that the output of program $P$ equals the output of its program function $f$, and, under this hypothesis, if the inputs satisfy $r$, then the outputs of the program function satisfy the relation $r_f$.

Definition 4 (input relation of metamorphic relation: IR). Given a program $P$ that implements program function $f$, let $x_1, x_2, \ldots, x_n$ be different input variables and $f(x_1), f(x_2), \ldots, f(x_n)$ the corresponding outputs. If there is a relation $r$ such that $r(x_1, x_2, \ldots, x_n)$ holds, then $r$ is called the input relation, also denoted IR.

Definition 5 (output relation of metamorphic relation: OR). The relation $r_f$ that the outputs $f(x_1), f(x_2), \ldots, f(x_n)$ must satisfy is called the output relation, also denoted OR.
In this paper, the program function $f$ is also called the self-relation (SR). With these definitions, the structure of an MR, shown in Figure 1, is easy to understand: a metamorphic relation can be represented as
$$MR = (IR, SR, OR).$$
For example, formula (4) can be decomposed in this way into its IR, SR, and OR components.
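The decomposition of an MR into IR, SR, and OR can be mirrored in C with three function pointers. This is a minimal sketch under stated assumptions, not the authors' tooling: the type `mr_t`, the helper `mr_holds`, and the example program $P(x) = x + 10$ with IR $x' = x + 1$ and OR $P(x') = P(x) + 1$ are all hypothetical. The mutant `p_mutant` stores its result in an 8-bit `signed char` and relies on GCC's documented behavior of reducing out-of-range values modulo $2^N$ when converting to a narrower signed type.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical representation of MR = (IR, SR, OR) for single-integer inputs. */
typedef struct {
    int  (*input_rel)(int oti);                   /* IR: derives the FTI from an OTI */
    int  (*program)(int input);                   /* SR: the program under test      */
    bool (*output_rel)(int out_oti, int out_fti); /* OR: checks the two outputs      */
} mr_t;

/* Returns true if the MR holds for one original test input. */
bool mr_holds(const mr_t *mr, int oti) {
    int fti = mr->input_rel(oti);                 /* build the follow-up input */
    return mr->output_rel(mr->program(oti), mr->program(fti));
}

/* Example instance (hypothetical): P(x) = x + 10, IR: x' = x + 1,
   OR: P(x') == P(x) + 1. */
static int next_input(int x) { return x + 1; }
static int p_correct(int x)  { return x + 10; }

/* Hypothetical narrowed mutant: the result is stored in an 8-bit signed char.
   Under GCC, 128 is reduced modulo 256 and becomes -128. */
static int p_mutant(int x) {
    signed char r = (signed char)(x + 10);
    return r;
}

static bool plus_one(int out_oti, int out_fti) { return out_fti == out_oti + 1; }
```

With `p_correct`, the relation holds at $x = 117$. With `p_mutant`, it still holds at $x = 5$ (outputs 15 and 16) but fails at $x = 117$, because the follow-up value 128 wraps to −128. Only original inputs near the boundary of the narrowed type expose the fault, which is why the choice of OTI matters.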

4. A Case Study

Section 3 introduced the relationships between metamorphic testing and metamorphic relations, which guide the testing in this section. Here we conduct a case study to investigate the effectiveness of our method in detecting integer faults. We choose a kernel component of TCAS, tcas.c, which can be downloaded from the Software-artifact Infrastructure Repository [14].

4.1. TCAS

TCAS [15] is an on-board aircraft conflict detection and resolution embedded system. The system is intended to alert the pilot to the presence of nearby aircraft that pose a midair collision threat and to propose maneuvers to resolve these potential conflicts. In case of a collision threat, TCAS estimates the time remaining until the two aircraft reach the closest point of approach (CPA) and presents two main levels of alert. When an intruder aircraft enters a protected zone, TCAS issues a Traffic Advisory (TA) to inform the pilot of a potential threat. If the danger of collision increases, a Resolution Advisory (RA) is issued, providing the pilot with a proposed maneuver that is likely to resolve the conflict. A C component called tcas.c can be downloaded from the Software-artifact Infrastructure Repository [14]. It is a preliminary version of TCAS, is freely and publicly available, and is responsible for issuing Resolution Advisories. The component is modest in size, consisting of 173 lines of C code. Our experiments were run on the Linux operating system with the GCC compiler.

4.2. Designing Typical Metamorphic Relations

According to the safety properties formally described in [15], if the relative position between the TCAS-equipped airplane and the threat airplane stays fixed, the outputs are the same regardless of the absolute altitudes of the two airplanes (provided that the altitudes are high enough). Control-flow and data-flow analysis of the source code of tcas.c shows that the input parameters Own_Tracked_Alt (the altitude of the TCAS-equipped airplane) and Other_Tracked_Alt (the altitude of the "threat") are independent of the other input parameters. We therefore conclude that the outputs remain the same as long as the relation between Own_Tracked_Alt and Other_Tracked_Alt is unchanged and the values of the other parameters are also unchanged.

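Based on the above, we propose a typical metamorphic relation that uses both black-box and white-box information. Let $x$ represent Own_Tracked_Alt and $y$ represent Other_Tracked_Alt; then
$$(x_2 = x_1 - c) \wedge (y_2 = y_1 - c) \wedge (Z_2 = Z_1) \Rightarrow P(I_2) = P(I_1),$$
where $c$ is the offset, whose values in this experiment are discussed in Section 4.4. $Z_1$ and $Z_2$ represent the other parameters, whose values do not influence the metamorphic relation, and $I_1 = (x_1, y_1, Z_1)$ and $I_2 = (x_2, y_2, Z_2)$ represent individual input parameter combinations of TCAS.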
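A minimal C sketch of this altitude-shift relation follows. It does not reproduce tcas.c: the struct `tcas_input`, its field `other_params`, and the two programs `alt_sep_stub` and `alt_sep_short_mutant` are hypothetical stand-ins that only mimic the property that the advisory depends on relative rather than absolute altitude. The follow-up input subtracts the offset $c$ from both altitudes and leaves all other parameters unchanged. The mutant assumes a 16-bit `short` and GCC's modulo reduction on narrowing conversions.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, reduced TCAS input: only the two altitudes are explicit;
   the remaining inputs are folded into other_params and never modified. */
typedef struct {
    int own_tracked_alt;    /* Own_Tracked_Alt   (x) */
    int other_tracked_alt;  /* Other_Tracked_Alt (y) */
    int other_params;       /* stand-in for the other parameters (Z) */
} tcas_input;

typedef int (*tcas_program)(tcas_input in);

/* Input relation: subtract the same offset c from both altitudes. */
tcas_input follow_up(tcas_input oti, int c) {
    tcas_input fti = oti;               /* other parameters are copied unchanged */
    fti.own_tracked_alt -= c;
    fti.other_tracked_alt -= c;
    return fti;
}

/* Output relation: the advisory for the FTI must equal that for the OTI. */
bool altitude_mr_holds(tcas_program p, tcas_input oti, int c) {
    return p(oti) == p(follow_up(oti, c));
}

/* Hypothetical stand-in program (not the logic of tcas.c):
   0 = no advisory, 1 = climb, 2 = descend, decided only by relative altitude. */
int alt_sep_stub(tcas_input in) {
    if (in.other_params == 0) return 0;
    int diff = in.other_tracked_alt - in.own_tracked_alt;
    return diff > 0 ? 2 : 1;
}

/* Hypothetical narrowed variant: Own_Tracked_Alt held in a short int
   (16 bits with GCC on Linux), so 33000 wraps to -32536. */
int alt_sep_short_mutant(tcas_input in) {
    if (in.other_params == 0) return 0;
    short own = (short)in.own_tracked_alt;
    int diff = in.other_tracked_alt - own;
    return diff > 0 ? 2 : 1;
}
```

With `alt_sep_stub`, the relation holds for the original input (33000, 32600) and offset 1000. The mutant returns 2 for that original input, because 33000 does not fit in a 16-bit `short`, but returns 1 for the follow-up input (32000, 31600), so the MR is violated without any expected-output oracle.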

4.3. Mutant Generation

To gain an understanding of how effective metamorphic testing is at detecting integer defects in applications without test oracles, we use mutation testing to systematically insert defects into the applications of interest. Mutation testing has been shown to be suitable for evaluation of effectiveness, as experiments comparing mutants to real faults have suggested that mutants are a good proxy for comparisons of testing techniques [16].

In this experiment, the source code of tcas.c contains few arithmetic operations, so the most likely integer bug is that the programmer misjudges the range of an input and declares it with a type that has a shorter bit width. With GCC on typical platforms, the int type ranges from −2147483648 to +2147483647 and the char type from −128 to +127. We introduce three types of integer mutants, shown in Table 2. For the first mutant, we change the type of parameter Cur_Vertical_Sep from "int" to "char," giving the mutant 1 version of the program. For the second, we change the type of Cur_Vertical_Sep from "int" to "short int," giving the mutant 2 version. For the third, we change it from "int" to "long int," giving the mutant 3 version. A mutant code segment of tcas.c, which exhibits a kind of width overflow fault, is shown in Box 1.
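Box 1 is not reproduced here. As a hedged illustration of why narrowing the declaration matters, the sketch below applies the same threshold test to Cur_Vertical_Sep declared as int, short int, and char. The threshold `SEP_THRESHOLD` (600) and the `sep_check_*` function names are assumptions for illustration, not code copied from tcas.c. It uses `signed char` because the signedness of plain char is implementation-defined (it is signed for GCC on x86), and it relies on GCC reducing out-of-range values modulo $2^N$ when converting to a narrower signed type.

```c
#include <assert.h>
#include <stdbool.h>

#define SEP_THRESHOLD 600   /* hypothetical threshold in feet */

/* Original declaration: int holds any realistic vertical separation. */
bool sep_check_int(int input_sep) {
    int Cur_Vertical_Sep = input_sep;
    return Cur_Vertical_Sep > SEP_THRESHOLD;
}

/* Mutant 2 style: short int (16 bits with GCC on Linux);
   values above 32767 wrap to negative numbers. */
bool sep_check_short(int input_sep) {
    short int Cur_Vertical_Sep = (short int)input_sep;
    return Cur_Vertical_Sep > SEP_THRESHOLD;
}

/* Mutant 1 style: 8-bit char; 700 is reduced modulo 256 to 188,
   which as a signed 8-bit value is -68. */
bool sep_check_char(int input_sep) {
    signed char Cur_Vertical_Sep = (signed char)input_sep;
    return Cur_Vertical_Sep > SEP_THRESHOLD;
}
```

With the int declaration, a separation of 700 exceeds the threshold; with the char mutant, 700 wraps to −68 and the check silently becomes false. The short int mutant behaves correctly at 700 but fails for values above 32767, such as 40000, which wraps to −25536. GCC may also warn that the char comparison is always false, which is itself a hint of the defect.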

4.4. Test Cases Execution and Results Comparison

The subtest sets in testplans-bigcov.tar cover all branches in the program, and the tests are generated randomly for each branch. We select suite 8 (87 test cases) in testplans-bigcov.tar as test inputs and execute these test cases on the three mutant versions. Because the mutants are injected deliberately, we regard the output differences between the original and mutant versions as actual mutant-induced failures. Then, setting this execution information aside, we insert several "if" statements at the end of the program that encode the formal safety properties as the test oracle. A test "fails" if and only if none of these properties is satisfied and "passes" if and only if at least one of them holds. Finally, we use the MR and the original test cases in suite 8 to generate follow-up test cases and check the results.
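The pass/fail rule for the safety-property oracle can be sketched as follows. The function `judge`, the `safety_property` signature (one integer input and one integer output), and the two placeholder properties are hypothetical; the formal safety properties of [15] are not reproduced here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum { VERDICT_PASS, VERDICT_FAIL } verdict;

/* A safety property evaluated on one test's input and output
   (hypothetical signature; real TCAS properties use many inputs). */
typedef bool (*safety_property)(int input, int output);

/* Pass if at least one property holds; fail only if none holds. */
verdict judge(const safety_property *props, size_t n, int input, int output) {
    for (size_t i = 0; i < n; i++) {
        if (props[i](input, output)) {
            return VERDICT_PASS;
        }
    }
    return VERDICT_FAIL;
}

/* Placeholder properties for illustration only. */
static bool output_is_nonnegative(int input, int output) {
    (void)input;
    return output >= 0;
}
static bool output_not_above_input(int input, int output) {
    return output <= input;
}
```

Because a single satisfied property is enough to pass, this verdict rule is permissive, which is consistent with the low detection ratios reported for the safety property method below.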

To evaluate the effectiveness of different MRs, we adopt the fault detection ratio (FDR) [8], the percentage of test cases that detect a given mutant $m$:
$$\mathrm{FDR}(m) = \frac{F}{N - I} \times 100\%,$$
where $F$ is the number of times the SUT fails, $N$ is the number of test cases, including both original and follow-up test cases, and $I$ is the number of infeasible test cases. The test execution results are shown in Table 3. For each entry in the table, the numerator is the number of test cases that do not satisfy the relation and the denominator is the total number of test cases.
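Assuming that infeasible test cases are excluded from the denominator, so that FDR = F / (N − I), the computation is simple arithmetic; the sketch below makes that treatment explicit. The function name `fdr_percent` is hypothetical, and the counts in the usage note are illustrative, not results from Table 3.

```c
#include <assert.h>

/* Fault detection ratio in percent, assuming infeasible test cases are
   excluded from the denominator: FDR = F / (N - I) * 100. */
double fdr_percent(int failures, int total, int infeasible) {
    int feasible = total - infeasible;
    assert(feasible > 0 && failures >= 0 && failures <= feasible);
    return 100.0 * failures / feasible;
}
```

For instance, with hypothetical counts of 5 failures among 40 test cases and no infeasible ones, `fdr_percent(5, 40, 0)` returns 12.5.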

In this experiment, we select two values of the offset $c$: the median of the Own_Tracked_Alt values and the median of the Other_Tracked_Alt values in suite 8. The test result is affected by $c$, because after subtracting $c$, some values of Own_Tracked_Alt and Other_Tracked_Alt move from the invalid domain to the valid domain. Selecting an appropriate $c$ may therefore yield a higher FDR.
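The offset $c$ is taken as a median of altitude values in suite 8. A sketch of that computation follows. The function name `median_offset` is hypothetical, and for an even number of values it takes the lower middle element so that $c$ stays an integer; the paper does not say which convention was used.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Three-way comparison that cannot overflow, unlike returning x - y. */
static int cmp_int(const void *a, const void *b) {
    int x = *(const int *)a, y = *(const int *)b;
    return (x > y) - (x < y);
}

/* Median of n altitude values; for even n, returns the lower middle value.
   Sorts a copy so the caller's array is left unchanged. */
int median_offset(const int *alts, size_t n) {
    assert(n > 0);
    int *copy = malloc(n * sizeof *copy);
    if (copy == NULL) abort();
    memcpy(copy, alts, n * sizeof *copy);
    qsort(copy, n, sizeof *copy, cmp_int);
    int m = copy[(n - 1) / 2];
    free(copy);
    return m;
}
```

The comparator returns `(x > y) - (x < y)` rather than `x - y`, because the subtraction can itself overflow for operands of large magnitude, which is exactly the kind of integer defect studied here.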

Gotlieb [15] proposed ten safety properties of TCAS, which are more effective than traditional testing methods, so we compare our metamorphic testing with safety property testing. The average integer fault detection ratios of metamorphic testing with different MRs and of a safety property are shown in Figure 2. The figure shows that, for the mutants in this experiment, the metamorphic testing method is more effective than the formal safety property method. For the mutant 1 version, the average FDR of the MR is 21.84%, much higher than the 2.30% of the formal safety property method. For the other two mutant versions, the average FDRs of the MR are 20.92% and 22.30%, compared with 1.49% and 2.30% for the safety property method.

4.5. Experiment Conclusions

This experiment shows that latent faults can be detected by a typical symbolic metamorphic relation, which represents a series of relations. Admittedly, designing such a metamorphic relation is not easy and requires an in-depth understanding of the software under test. However, MRs of this kind, which are related to the kernel function of the SUT, are more effective than other relations. This result is also consistent with the conclusion of [17].

However, in this case study, we assume that mutant injection introduces only one integer overflow defect into the program under test. A single metamorphic relation can easily detect one integer defect when the relation is violated, but it is hard to detect multiple faults with only one metamorphic relation. In addition, the safety property method detected a design failure that MT did not. This design bug, which the programmer could not account for in the implementation or design process, is beyond the scope of our discussion of integer defect detection.

5. Conclusions

In this paper, we have presented a case study applying MT to the mission-critical program TCAS, showing that MT can detect integer faults effectively. We compared its integer fault detection ratios with those of the traditional safety property technique. The results show that the proposed metamorphic testing is more effective than the safety property technique, although generating such novel relations relies on an in-depth understanding of the program structure and algorithm.

Acknowledgments

This work was supported by the National High Technology Research and Development Program of China Project (no. 2009AA01Z402) and the Natural Science Foundation of Jiangsu Province, China (no. BK2012059, no. BK2012060).