Abstract

Control flow graphs are a well-known graphical representation of programs that capture the control flow but abstract from program details. In this paper, we derive decision graphs that reduce control flow graphs but preserve the branching structure of programs. As an application to software engineering, we use decision graphs to compare and clarify different definitions of branch covering in software testing.

1. Introduction

Graphs that represent the control flow of programs have been studied for many years and are known as control flow graphs or program graphs. There are mainly two types of such graphs: one that associates one node with each statement in a program (see, for example, [1], where control flow graphs are applied to optimization, or [2] for the application in software engineering), and another that replaces maximal sets of consecutive nodes with a single entry and a single exit, called blocks or segments, by single nodes (for example, [3, 4]). Blocks can be derived from the control flow graphs of the first type or constructed directly from the programs. Both types capture the control flow by abstracting from the program details.

Since the control flow through programs is determined by the decisions, for example, the if-then-else-constructs, based on the data and the conditions in such constructs, it is promising to keep only the decisions and the control flow between them in graphical representations of programs and thereby define a reduction of control flow graphs that preserves the branching structure.

In this paper, we will study control flow graphs (of the first type) and derive decision graphs [5, 6] that represent the branching structure of programs based on the definition of program graphs reduced to DD-paths by Paige [7].

Statement coverage and branch coverage are widely used in software testing. The first property can be checked with control flow graphs since each node represents a statement (or block of statements). For branch coverage, the analogous question arises: in which type of graph does each edge represent a branch?

More generally, decision graphs can be derived not only from control flow graphs but also from arbitrary directed graphs and thus represent the branching structure of these graphs. We show that the branches in a graph correspond to the edges in the derived decision graph.

As an application, we compare different definitions of branch covering in software testing that already existed but were specified in different ways in order to find where they differ from each other and thus get new results on branch coverage that clarify the definitions found in the related literature.

The contribution of this paper is to solve the problem of finding a graph type that has one edge for each branch, analogously to control flow graphs that have one node for each statement. Furthermore, we apply decision graphs to software engineering and clarify the different notions of branch covering in software testing, one of them based on decision graphs, in order to avoid confusion when using them in practice. We propose decision graphs, independently of the application to software testing, as a means to abstract from details and focus on the decision structure.

The remainder of this paper is organized as follows. We start with the necessary definitions and results about directed graphs, control flow graphs, and decision graphs. In Section 3, we show the correspondence of branches in a graph and the edges in the derived decision graph. As an application, we define three different notions of branch coverage and compare them in Section 4. Related work and other applications of control flow graphs are discussed in Section 5. Section 6 concludes the paper.

2. Basic Definitions and Results

This section presents definitions and results—partially taken from [5, 6]—necessary for the following.

2.1. Directed Graphs

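Definition 1. A directed graph with multiple edges is a pair $G = (N, E)$ consisting of a finite set $N$ of nodes and a finite set $E$ of edges with $N \cap E = \emptyset$, together with functions $\mathrm{start}: E \to N$ and $\mathrm{end}: E \to N$ that associate a start node and an end node, respectively, with each edge.
For a node $n \in N$, the sets $\mathrm{preset}(n) = \{\mathrm{start}(e) \mid e \in E, \mathrm{end}(e) = n\}$ and $\mathrm{postset}(n) = \{\mathrm{end}(e) \mid e \in E, \mathrm{start}(e) = n\}$ are called preset and postset of $n$, respectively. An edge that ends in a node $n$ is called incoming edge of $n$; an edge that starts in $n$ is called outgoing edge of $n$. The indegree and outdegree of $n$ are the numbers of the incoming and of the outgoing edges of $n$, respectively, that is, $\mathrm{indegree}(n) = |\{e \in E \mid \mathrm{end}(e) = n\}|$; $\mathrm{outdegree}(n) = |\{e \in E \mid \mathrm{start}(e) = n\}|$. Nodes with indegree 0 are called entry nodes and those with outdegree 0 exit nodes of the graph.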
A path is a nonempty, finite sequence of edges such that for . The start node of the first edge is called the start node of , the end node of the last edge —the end node of . The length of a path is the number of its edges. The nodes are called inner nodes of . A path which contains one node twice and all other nodes only once is called loop. A loop is called unconditional loop, if the inner nodes and the end node have outdegree 1. A special case of an unconditional loop is an isolated loop, that is, a loop that has only nodes with indegree 1 and outdegree 1. A path is contained in a path if for indices . A prefix of a path is an initial part of the path for denoted by . If , we write . For a path , the set and, for a set of paths, the set are the sets of prefixes of and , respectively. A path starting with an entry node is called S-path. An S-path that ends in an exit node is called complete path. By and , we denote the set of nodes and edges, respectively, that are contained in the path . A node is reachable from a node if or if a path with start node and end node exists.
If it is not clear from the context which graph is meant, we add a subscript to the functions , , , , , , and , for example, .
A detailed introduction to graphs can be found, for example, in [8, 9].
In many cases, we do not need multiple edges between nodes. If no two different edges have both the same start node and the same end node, the graph is simply called a directed graph. In such a graph the notation can be simplified and an edge can be written as a pair $(n, m)$ of nodes. A path can then briefly be denoted by the sequence of the contained nodes $(n_1, \dots, n_k)$, where $(n_i, n_{i+1}) \in E$ for $1 \le i < k$. Since there are no multiple edges in such a graph, the numbers of incoming and outgoing edges of a node $n$ are equal to the numbers of elements of the preset and postset of $n$, respectively, that is, $\mathrm{indegree}(n) = |\mathrm{preset}(n)|$, $\mathrm{outdegree}(n) = |\mathrm{postset}(n)|$.
A simple fact that will be used later is that in a directed graph with multiple edges, the number of edges is equal to the sum of the numbers of incoming edges of all nodes and also equal to the sum of the numbers of outgoing edges of all nodes: $|E| = \sum_{n \in N} \mathrm{indegree}(n) = \sum_{n \in N} \mathrm{outdegree}(n)$. The same holds for an unconnected component of a graph. An isolated loop forms a component of the graph, consisting of sequential nodes, that is unconnected to the rest of the graph. Therefore, the above equation is also valid for isolated loops.
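The degree-sum fact can be checked mechanically. The following C sketch stores a directed graph with multiple edges as two parallel arrays of start and end nodes. The type `Multigraph`, the node numbering, and all function names are illustrative assumptions and not notation used in this paper.

```c
#include <stddef.h>

/* Directed graph with multiple edges (hypothetical representation):
   edge k leads from start[k] to end[k]; nodes are numbered
   0 .. node_count-1. Parallel edges are two indices k with the
   same start node and the same end node. */
typedef struct {
    size_t node_count;
    size_t edge_count;
    const int *start;
    const int *end;
} Multigraph;

/* Number of incoming edges of node n. */
size_t indegree(const Multigraph *g, int n) {
    size_t d = 0;
    for (size_t k = 0; k < g->edge_count; k++)
        if (g->end[k] == n) d++;
    return d;
}

/* Number of outgoing edges of node n. */
size_t outdegree(const Multigraph *g, int n) {
    size_t d = 0;
    for (size_t k = 0; k < g->edge_count; k++)
        if (g->start[k] == n) d++;
    return d;
}

/* Entry nodes have indegree 0, exit nodes have outdegree 0. */
int is_entry(const Multigraph *g, int n) { return indegree(g, n) == 0; }
int is_exit(const Multigraph *g, int n) { return outdegree(g, n) == 0; }

/* Sum of the indegrees of all nodes; equals edge_count. */
size_t sum_indegrees(const Multigraph *g) {
    size_t sum = 0;
    for (size_t n = 0; n < g->node_count; n++)
        sum += indegree(g, (int)n);
    return sum;
}

/* Sum of the outdegrees of all nodes; equals edge_count. */
size_t sum_outdegrees(const Multigraph *g) {
    size_t sum = 0;
    for (size_t n = 0; n < g->node_count; n++)
        sum += outdegree(g, (int)n);
    return sum;
}
```

For a graph with the four edges 0→1, 1→2, 1→2, 2→3 (the two edges from 1 to 2 are parallel), both sums evaluate to 4, the number of edges.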

2.2. Control Flow Graphs

The control flow in a function written in a programming language can be modeled by a directed graph called control flow graph, which contains one node for each statement in the function and edges that represent the control flow between statements. We add an entry node and an exit node as unique entry and exit points of the function. When a function is called within a function, the control flow also leaves the function and enters it again after the execution of the called function. But since we discuss only single functions, we do not interpret function calls as exit and entry points of the function. Statements in the programming language C are function calls, assignments, and other expressions with semicolon, return-, break-, continue-, goto-, if-, switch-, do-while-, for-, and while-statements and the null statement. Syntactically, a block is also a statement, but since it consists of statements, only the statements in the block are nodes in the control flow graph, not the block itself. Declarations and definitions are not statements and are also not included in the control flow graph.

Definition 2. The control flow graph of a function $f$ is a directed graph $G_f = (N, E)$ that consists of the set $N$ of nodes, which contains one node for each statement in $f$ and the two additional nodes $\mathit{in}$ and $\mathit{out}$, and the set $E$ of edges. This set contains an edge $(s_1, s_2)$ if the statement $s_2$ is executed immediately after the statement $s_1$. For the first statement $s$ in the function, we introduce an edge $(\mathit{in}, s)$. Furthermore, we add edges $(s, \mathit{out})$ for each node that is associated to a statement $s$, after which the control flow leaves the function because of a return-statement or the right brace that terminates the function. The control flow graph of an empty function, that is, a function without any statements, consists of $N = \{\mathit{in}, \mathit{out}\}$ and $E = \{(\mathit{in}, \mathit{out})\}$.
From the definition, it follows that in the control flow graph $G_f$, each node, with the exception of $\mathit{in}$ and $\mathit{out}$, corresponds to a unique statement in the function $f$. Figure 1 shows as an example the control flow graph of the following function Search:

void Search(int values[], int key,
            int *found, int *index)
{
 int i = 0;
 *found = 0;
 while (i < N)
 {
  if (values[i] == key)
  {
   printf("found positive");
   *found = 1;
  }
  else
  {
   printf("non-positive");
   if (values[i] == -key)
   {
    printf("found negative");
    *found = -1;
   }
  }
  if (*found)
  {
   printf("found at index %i", i);
   *index = i;
   return;
  }
  i++;
 }
}

This function checks whether the array parameter values contains the integer parameter key or -key. In these cases, it sets the output parameter found to 1 or to -1, respectively, and sets the parameter index to the index at which the value was found. The length of the array is given by the constant N.

The nodes of the control flow graph are labeled with “=”, “while”, and so forth to show which kind of statements is represented or with “in”, “out” for better readability. The preset of the exit node consists of two nodes, one representing the return-statement and the other the while-statement after which the function reaches the terminating brace.

2.3. Decision Graphs

As in [5, 6], we reduce directed graphs and keep in decision graphs only entry and exit nodes and such nodes that represent decisions, that is, nodes with postsets that have two or more elements, called D-nodes [7]. The following definitions and results are independent from the modeling of software by control flow graphs and can be applied to all directed graphs.

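Definition 3. Let $G = (N, E)$ be a directed graph. A node $n \in N$ is called D-node if it is an entry node or an exit node or if $|\mathrm{postset}(n)| \ge 2$. A DD-path is a path $(n_1, \dots, n_k)$, $k \ge 2$, where the start and end nodes $n_1$, $n_k$ are D-nodes and the other nodes $n_2, \dots, n_{k-1}$ are not D-nodes.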
A node is not a D-node if its indegree is at least 1 and its outdegree is exactly 1. The D-nodes of the control flow graph of the function Search (Figure 1) are the entry node, the exit node, the while-node, and the three if-nodes.
If an inner node occurred twice in a DD-path, no D-node would be reachable from it. Therefore, all inner nodes are different [5]. Furthermore, there are at most $\mathrm{outdegree}(n)$ different DD-paths that start in a D-node $n$, since there is no branching possible after leaving $n$.
In order to reduce graphs but to preserve the branching structure, we follow the idea of Paige [7] and replace DD-paths with single edges. There may be more than one DD-path between two D-nodes, for example, between the second and the third if-node in the control flow graph of Search (Figure 1), and therefore multiple edges are necessary to avoid the merging of branches in the decision graph.

Definition 4. Let $G = (N, E)$ be a directed graph. The decision graph of $G$ is a directed graph with multiple edges that consists of the set $N_D$ of nodes, which contains the D-nodes of $G$, and the set $E_D$ of edges that contains an edge with start node $n$ and end node $m$ for each DD-path in $G$ that starts in $n$ and ends in $m$.
The number of edges in the decision graph between two nodes $n$ and $m$ is equal to the number of different DD-paths in $G$ that start in $n$ and end in $m$, and the outdegree of $n$ in the decision graph is equal to the (finite) number of different DD-paths in $G$ that start in $n$. The assignment of edges in the decision graph to DD-paths in $G$ is a bijective function that allows us to identify the edge in the decision graph that corresponds to a given DD-path in the graph and vice versa. This will be necessary to distinguish multiple edges when we compare different coverage notions. In Figure 2, the decision graph of the control flow graph of the function Search is depicted.

Figure 3 shows the decision graph of the following function f1:

void f1(int x) { int y; if (x > 0) x = 1; y = x; g(y); }

This example shows that DD-paths need not be disjoint: the two DD-paths that start in the if-node share the assignment to y, the call of g, and the exit node.
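The construction of Definitions 3 and 4 can be sketched in C as follows. The sketch assumes a graph without multiple edges, so that the size of the postset equals the outdegree; the type `Graph`, the function names, and the node numbering are hypothetical choices made for illustration. An edge that leads into an unconditional loop never reaches a D-node, which the sketch detects after more than `node_count` steps.

```c
#include <stddef.h>

/* Directed graph without multiple edges (hypothetical representation):
   edge k leads from start[k] to end[k]; nodes are 0 .. node_count-1. */
typedef struct {
    size_t node_count;
    size_t edge_count;
    const int *start;
    const int *end;
} Graph;

/* Counts how often node n occurs in the given array of edge ends. */
static size_t degree(const int *ends, size_t edge_count, int n) {
    size_t d = 0;
    for (size_t k = 0; k < edge_count; k++)
        if (ends[k] == n) d++;
    return d;
}

/* D-node: entry node, exit node, or node with two or more successors. */
int is_d_node(const Graph *g, int n) {
    size_t in = degree(g->end, g->edge_count, n);
    size_t out = degree(g->start, g->edge_count, n);
    return in == 0 || out == 0 || out >= 2;
}

/* Follows the path whose first edge is `first` (start node: a D-node).
   Returns the D-node in which the DD-path ends, or -1 if the edge leads
   into an unconditional loop, so that no D-node is ever reached. */
int dd_path_end(const Graph *g, size_t first) {
    int cur = g->end[first];
    for (size_t steps = 0; steps <= g->node_count; steps++) {
        if (is_d_node(g, cur)) return cur;
        /* cur is not a D-node, so it has exactly one outgoing edge. */
        for (size_t k = 0; k < g->edge_count; k++)
            if (g->start[k] == cur) { cur = g->end[k]; break; }
    }
    return -1;
}

/* Writes one decision-graph edge (d_start[i], d_end[i]) per DD-path and
   returns their number; both arrays need room for edge_count entries. */
size_t decision_graph(const Graph *g, int *d_start, int *d_end) {
    size_t count = 0;
    for (size_t k = 0; k < g->edge_count; k++) {
        if (!is_d_node(g, g->start[k])) continue;
        int target = dd_path_end(g, k);
        if (target < 0) continue;
        d_start[count] = g->start[k];
        d_end[count] = target;
        count++;
    }
    return count;
}
```

If the nodes of the control flow graph of f1 are numbered 0 (entry), 1 (if), 2 (assignment to x), 3 (assignment to y), 4 (call of g), and 5 (exit), the edges are 0→1, 1→2, 2→3, 1→3, 3→4, 4→5, and `decision_graph` yields the three edges 0→1, 1→5, 1→5. The two parallel edges from 1 to 5 are the reason why the decision graph needs multiple edges.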

Not only can graphs be reduced to decision graphs, but paths can also be reduced to decision paths.

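Definition 5. Let $G$ be a directed graph and $p = (n_1, \dots, n_m)$ a path in $G$ that starts with a D-node and contains at least a second D-node. Let $n_{i_1}, \dots, n_{i_k}$ be the D-nodes in $p$ with $1 = i_1 < i_2 < \dots < i_k \le m$. Then $p_j = (n_{i_j}, \dots, n_{i_{j+1}})$ are DD-paths for $1 \le j < k$. From Definition 4, it follows that there are edges $e_j$ with start node $n_{i_j}$ and end node $n_{i_{j+1}}$ in the decision graph associated to the DD-paths $p_j$. The decision path of $p$ is defined as $(e_1, \dots, e_{k-1})$.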
The nodes following the last D-node are not a complete DD-path and are therefore clipped. From for , we obtain that the decision path is a path in the decision graph [5]. The path in the control flow graph of the function Search contains nine ( ) D-nodes (underlined), can be split into eight DD-paths and therefore induces the decision path with eight edges. The last two nodes are not represented in the decision path.
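The splitting step of Definition 5 can be sketched in C as follows. The sketch assumes that the path is given as a sequence of node numbers and that the D-nodes are marked in an array `is_d`; both the representation and the function names are assumptions made for illustration.

```c
#include <stddef.h>

/* Splits a path, given as the node sequence path[0..len-1], at its
   D-nodes. is_d[n] is nonzero if node n is a D-node. On return,
   cut[0..*cuts-1] holds the positions of the D-nodes in the path;
   two consecutive positions delimit one DD-path.
   Returns the number of DD-paths, that is, the length of the decision
   path. The result 0 means that no decision path exists because the
   path does not start with a D-node or has no second D-node. */
size_t split_into_dd_paths(const int *path, size_t len, const int *is_d,
                           size_t *cut, size_t *cuts) {
    *cuts = 0;
    if (len == 0 || !is_d[path[0]]) return 0;
    for (size_t i = 0; i < len; i++)
        if (is_d[path[i]]) cut[(*cuts)++] = i;
    return *cuts - 1;
}

/* Number of nodes after the last D-node. These nodes are clipped
   because they do not form a complete DD-path. */
size_t clipped_nodes(size_t len, const size_t *cut, size_t cuts) {
    return cuts == 0 ? len : len - 1 - cut[cuts - 1];
}
```

With the nodes of f1 numbered 0 (entry), 1 (if), 2 (assignment to x), 3 (assignment to y), 4 (call of g), and 5 (exit), the complete path 0, 1, 2, 3, 4, 5 splits into two DD-paths and nothing is clipped. The incomplete path 0, 1, 3, 4 yields a decision path of length one, and its last two nodes are clipped.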

3. Branches in Directed Graphs

In this section, we will discuss branches in directed graphs and their relationship to edges in the derived decision graphs. Let us examine the graph shown in Figure 4. This graph can be a control flow graph, for example, of the following function f2:

void f2(int x) { if (x) label: goto label; }

Since the goto-node will not appear in the decision graph, this example shows that unconditional loops are not represented in decision graphs. Therefore, the number of DD-paths that start in a D-node $n$ where an unconditional loop branches off is lower than $\mathrm{outdegree}(n)$. In the example, the if-node has two outgoing edges and a postset with two elements, but only one DD-path that starts in the if-node.

In programming languages like C, the only possibility to create unconditional or isolated loops is using goto-statements.

Lemma 6. Let $G$ be a directed graph. Then the following holds [6]:
(1) each node that is not contained in an isolated loop is reachable from a D-node,
(2) each edge that is not contained in an isolated loop is contained (as last edge) in a path that starts with a D-node and whose inner nodes are not D-nodes,
(3) each edge that is not contained in an unconditional loop is contained in a DD-path.

Proof. (1) Let be the set of nodes from which is reachable and assume that does not contain a D-node. Let be an edge that starts in . If , the node cannot be reached from . But can be reached from . Therefore, there must be a second edge that starts in , and would be a D-node, which contradicts the assumption. This means that . Let be an edge that ends in . Then is also reachable from and . Together this means that, if there is a connection of with by an edge, this connection can only be the outgoing edge of (which exists since is not a D-node).
In the case that this connection does not exist and the end node of the outgoing edge of is in , the nodes form an unconnected component and (because of the assumption that does not contain D-nodes, for all ). Therefore, all nodes in must have indegree 1. Let us denote the successor of by , the successor of by , and so on. Finally, we reach a node (which must be ), a second time. This means that we found an isolated loop that contains which is forbidden.
In the case that the end node of the outgoing edge of is not in , it follows that and at least one node in must have indegree 0 and is a D-node, which contradicts the assumption.
(2) If the node is a D-node, is contained in the path . If is not a D-node, it follows from part 1 of this lemma that there is a path that starts in a D-node and ends in . We prolong this path by the node and get the path that contains as last edge. If an inner node in the path is a D-node, we shorten the path from the start node to the last inner D-node (let us denote it by ) in the path. The resulting path, denoted also by , starts with the D-node , which could be , and ends with the edge . The inner nodes of are not D-nodes (Figure 5).
Part (1) of this lemma can be applied since would be contained in the same isolated loop, if is contained in an isolated loop.
(3) If the last node in the path (this is ) according to part (2) of this lemma is not a D-node, we follow the edges until we find a D-node (an exit node is a D-node) or detect after at most steps an unconditional loop that contains which is forbidden by the assumption. Thus, we constructed a DD-path that contains the edge .

If we exclude unconditional loops, we can show that a decision graph has exactly one edge for each branch in the directed graph and thus abstracts from branch details. We identify branches by their first edges—that are outgoing edges of D-nodes. Such an edge leads from an entry node into the graph or selects a branch in a node with a postset of two or more.

Theorem 7. Let $G$ be a directed graph without unconditional loops. Then, the branches in $G$, that is, the outgoing edges of D-nodes, correspond bijectively to the edges in the decision graph of $G$.

Proof. Let $G = (N, E)$ be a directed graph. The set of outgoing edges of D-nodes will be denoted by $B$. Let $e \in B$. With part (3) of Lemma 6, it follows that there exists a DD-path $p_e$ that contains $e$. Since $\mathrm{start}(e)$ is a D-node, $e$ must be the first edge in $p_e$. It is not possible that two different DD-paths start with the same first edge, and therefore the association $e \mapsto p_e$ is well-defined. Furthermore, $p_e \ne p_{e'}$ for two different edges $e, e' \in B$ since $e$, $e'$ are the first edges in $p_e$ and $p_{e'}$, respectively. Of course, every DD-path starts with an edge in $B$. Together, that means that $e \mapsto p_e$ is a bijective association from $B$ to the set of all DD-paths in $G$. In Definition 4, a bijective association was defined (let us call it $d$) that associates an edge in the decision graph of $G$ with each DD-path in $G$. This means that the composition $e \mapsto d(p_e)$ is bijective.

Note that this result holds for arbitrary directed graphs, not only for control flow graphs, and thus is independent from the modeling of software by graphs.
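To make the correspondence concrete, the following worked example lists the bijection of Theorem 7 for the function f1 and shows why the exclusion of unconditional loops is needed for f2. The node names $a_x$, $a_y$, and $c_g$ (for the assignment to x, the assignment to y, and the call of g) and $\mathit{goto}$ are labels chosen here for illustration; $B$ denotes the set of outgoing edges of D-nodes.

```latex
% f1: the D-nodes are in, if, and out. Each outgoing edge of a D-node
% is the first edge of exactly one DD-path.
\begin{align*}
(\mathit{in}, \mathit{if}) &\mapsto (\mathit{in}, \mathit{if}) \\
(\mathit{if}, a_x) &\mapsto (\mathit{if}, a_x, a_y, c_g, \mathit{out}) \\
(\mathit{if}, a_y) &\mapsto (\mathit{if}, a_y, c_g, \mathit{out})
\end{align*}
% f2: the edge (if, goto) enters an unconditional loop and is not the
% first edge of any DD-path, so |B| = 3 but there are only 2 DD-paths.
\begin{align*}
B &= \{(\mathit{in}, \mathit{if}),\ (\mathit{if}, \mathit{goto}),\ (\mathit{if}, \mathit{out})\} \\
\text{DD-paths} &= \{(\mathit{in}, \mathit{if}),\ (\mathit{if}, \mathit{out})\}
\end{align*}
```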

4. Branch Coverage

In this section, we apply decision graphs to software testing and compare different definitions of branch coverage.

4.1. Test Cases

When a test case for a function is executed, it runs through the function and induces a path in the control flow graph. This path always starts with the edge $(\mathit{in}, s)$, where $s$ is the first executable statement, or with $(\mathit{in}, \mathit{out})$ if the function does not contain any statements, and therefore is an S-path. In most cases, the execution reaches the end of the function, and the induced path is complete. But there are also cases where the exit node is not reached, for example, when a function is called that does not terminate or a division by 0 is encountered. In both cases, we observe a finite but not a complete path in the control flow graph of the function. The execution could also encounter an infinite loop in the function. Then, we observe in theory an infinite path. In practice, we have to stop the execution of the test case after some time and also get a finite path. This means that we always observe a finite path while executing a test case for a finite observation time. Mostly, we can distinguish these cases in practice while debugging the application. In general, however, this is impossible because termination is undecidable.

Definition 8. Let be a test case of a function . If is the observation time, we denote by the observed finite S-path in the control flow graph of the function that is induced by the execution of the test case for time . The set is the set of all observed paths. For a set of test cases, we write
If a path is in , any prefix can also be observed with shorter observation time. This means that is prefix closed.
The execution of the test case of the function Search with the array 1, 2, 3, 4 for the parameter values and for the second parameter key induces the complete path in the control flow graph. Therefore, (Search , , , . The test case of the function f2 (Figure 4) with parameter 1 results in an infinite execution that is represented by the infinite set of paths (f2 .

4.2. Coverage

The basic coverage notion in software testing is a statement coverage which is obtained if in a test of a function, the test cases in a set execute all statements in the function. For the control flow graph of the function, it follows that all nodes are covered by the paths that are induced by the test cases, that is, . For this reason, this coverage criterion is also called all-nodes criterion [4, 10]. The set of test cases with as above and with parameters values = 1, 2, 3, 4 and key = 2, which induces the complete path satisfies statement coverage for the function Search since each node in the control flow graph of the function is covered by the paths in (Search . A test case of a function induces paths in the control flow graph and thus also paths in the decision graph of the control flow graph if it runs through at least a second D-node.

Definition 9. Let be a test case of a function . We define
The set is also prefix closed: Let (where ) and a prefix with . Then, a path exists such that is the decision path of . With Definition 5, it follows that the edges , correspond to the DD-paths . The path induces then . Since is prefix closed, and .
The set of test cases for the function Search as above induces two complete paths and from which two decision paths , and the set of decision paths (Search can be derived.
In software testing, the notion of branch coverage where the test cases should cover all branches of the software is very popular because it is stronger than statement coverage but easier to obtain than more sophisticated coverage definitions like those that consider not only decisions but also the Boolean conditions that occur in the decisions or like data-flow-oriented coverage notions, for example, [4, 10], which can give better results [11]. In the following, we will give three definitions of branch coverage and investigate the relationship and differences between them. The first one captures the notion that in decisions like if-statements the conditions should be at least once true and once false during testing and thus all branches should be taken, the second one is edge covering of the control flow graph, and the last one is edge covering of the decision graph.

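Definition 10. Let $f$ be a function, $G$ the control flow graph of $f$, and $D$ the decision graph of the control flow graph.
(i) A set $T$ of test cases satisfies decision coverage if and only if each outgoing edge of a D-node of $G$ is contained in a path of $G$ that is induced by a test case in $T$ (Definition 8).
(ii) A set $T$ of test cases satisfies edge coverage (of the control flow graph) if and only if each edge of $G$ is contained in a path of $G$ that is induced by a test case in $T$.
(iii) A set $T$ of test cases satisfies branch coverage [5] if and only if each edge of $D$ is contained in a decision path that is induced by a test case in $T$ (Definition 9).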
Edge covering (of the control flow graph) is often called all-edges criterion or branch coverage by many authors, for example, [4, 10, 12], whereas our notion of branch coverage is defined as edge covering of the decision graph. Hierons et al. [13] define branch coverage based on outgoing edges of D-nodes similar to our definition (i). Frankl and Weyuker [11] do not distinguish between branch testing and decision coverage based on the Boolean conditions. Further definitions of branch coverage arise when other graph reductions are used. For example, Bertolino and Marré [14] define branches that start/end in D-nodes, junction nodes (with indegree 2), the entry node, or the exit node. In the reduced graphs, called ddgraphs, branches are replaced by edges. We do not consider these definitions of branches because we concentrate on the reduction to decision graphs.
Note that in the rest of the paper we mean by branch coverage the edge coverage of the decision graph, as in Definition 10(iii), unless otherwise stated.
The set of test cases for the function Search does not satisfy any of these coverage notions since the edge between the while-node and the exit node is neither covered in the control flow graph nor in the decision graph. We need a third test case with parameters values = 1, 2, 3, 4 and key = 0 to cover all edges in both graphs.
It is obvious that a set of test cases that satisfies edge coverage for a given function also satisfies decision coverage for that function. Such a relation between coverage criteria is often called subsumption: a coverage $C_1$ subsumes a coverage $C_2$ if, for all functions (or for all programs) and all specifications, all sets of test cases that satisfy $C_1$ also satisfy $C_2$ [4, 11, 15]. The specification of the program is not used in our coverage definitions and is therefore left out in this paper.

4.3. Comparison of Coverage Definitions

The example f2 (Figure 4) shows that in the case of unconditional loops all DD-paths and thus all edges in the decision graph can be covered (e.g., by the test case with parameter 0) but not all edges in the control flow graph are executed. This leads to the following lemma.

Lemma 11. Let $f$ be a function such that the control flow graph of $f$ does not contain unconditional loops, and let $T$ be a set of test cases that satisfies branch coverage. Then, $T$ also satisfies edge coverage and decision coverage.

Proof. Let be an edge in the control flow graph. From part (3) of Lemma 6, follows that there exists a DD-path that contains . This DD-path induces an edge in the decision graph of (Definition 4). Since the set of test cases satisfies branch coverage, a path with exists (Definition 10), and furthermore there is a path such that is the decision path of (Definition 9). Since the edge occurs in the decision path , we know from Definition 5 that is part of the path and thus .

Unconditional loops can be allowed in cases where all test case sets that satisfy branch coverage also run through all unconditional loops. Figure 6 shows the control flow graph of the following function f3:

void f3(int x) { if (x > 0) { while (x > 0) x = x - 1; label: goto label; } }

In this example, we need at least two test cases, one with a positive and one with a nonpositive parameter, in order to satisfy branch coverage. The test case with positive parameter executes also the unconditional loop.

Nonterminating function calls result in incomplete test case paths. For example, when we assume that the function g called in f1 (Figure 3) does not terminate if it is called with the parameter 0, the path induced when the function f1 is called with the parameter 0 does not lead to the exit node. The consequence is that all edges are covered when the function f1 is called with parameters 0 and 1, but the DD-path that leads from the if-node directly to the assignment to y and on to the exit node, and thus the corresponding edge in the decision graph, is not executed.
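The three coverage notions can be compared on this example with a short C sketch. It rests on several assumptions that are not taken from this paper: observed paths are given as sequences of edge indices, the control flow graph has no unconditional loops and at most `MAX_EDGES` edges, decision coverage is read as "every outgoing edge of a D-node is covered", and a DD-path is identified by its first edge (Theorem 7). All type and function names are hypothetical.

```c
#include <stddef.h>

#define MAX_EDGES 64

/* Control flow graph without multiple edges: edge k leads from
   start[k] to end[k]; is_d[n] is nonzero if node n is a D-node. */
typedef struct {
    size_t edge_count;
    const int *start;
    const int *end;
    const int *is_d;
} Cfg;

/* One observed path, given as a sequence of edge indices. */
typedef struct {
    const size_t *edges;
    size_t len;
} Path;

/* Marks in covered[] every edge that occurs in some path and in
   dd_covered[] every first edge of a DD-path that some path
   traverses completely, from D-node to D-node. */
static void mark(const Cfg *g, const Path *paths, size_t n,
                 int *covered, int *dd_covered) {
    for (size_t k = 0; k < g->edge_count; k++)
        covered[k] = dd_covered[k] = 0;
    for (size_t p = 0; p < n; p++) {
        int open = 0;
        size_t first = 0;
        for (size_t i = 0; i < paths[p].len; i++) {
            size_t k = paths[p].edges[i];
            covered[k] = 1;
            if (g->is_d[g->start[k]]) { first = k; open = 1; }
            if (open && g->is_d[g->end[k]]) { dd_covered[first] = 1; open = 0; }
        }
    }
}

/* Every edge of the control flow graph is covered. */
int edge_coverage(const Cfg *g, const Path *paths, size_t n) {
    int covered[MAX_EDGES], dd[MAX_EDGES];
    mark(g, paths, n, covered, dd);
    for (size_t k = 0; k < g->edge_count; k++)
        if (!covered[k]) return 0;
    return 1;
}

/* Every outgoing edge of a D-node is covered (assumed reading). */
int decision_coverage(const Cfg *g, const Path *paths, size_t n) {
    int covered[MAX_EDGES], dd[MAX_EDGES];
    mark(g, paths, n, covered, dd);
    for (size_t k = 0; k < g->edge_count; k++)
        if (g->is_d[g->start[k]] && !covered[k]) return 0;
    return 1;
}

/* Every DD-path, and thus every decision-graph edge, is covered. */
int branch_coverage(const Cfg *g, const Path *paths, size_t n) {
    int covered[MAX_EDGES], dd[MAX_EDGES];
    mark(g, paths, n, covered, dd);
    for (size_t k = 0; k < g->edge_count; k++)
        if (g->is_d[g->start[k]] && !dd[k]) return 0;
    return 1;
}
```

Number the edges of the control flow graph of f1 as 0 (entry to if), 1 (if to assignment to x), 2 (assignment to x to assignment to y), 3 (if to assignment to y), 4 (assignment to y to call of g), and 5 (call of g to exit). The call with parameter 1 then induces the edge sequence 0, 1, 2, 4, 5, and the call with parameter 0 stops after 0, 3, 4. For these two paths the sketch reports edge coverage and decision coverage but no branch coverage, since the DD-path that starts with edge 3 is never completed.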

Definition 12. Let $G$ be a directed graph. A set $P$ of paths in $G$ is called complete if, for each path $p \in P$ that does not end in an exit node, there exists a path $p' \in P$ such that $p$ is a proper prefix of $p'$.
Infinite sequences of paths (which are induced in the control flow graph of a function by infinite loops) are allowed in complete sets of path, but not paths that stop before reaching an exit node. The set f1 where is the test case with parameter 0 is not a complete set since f1 but (f1 .

Lemma 13. Let $G$ be a directed graph, $P$ a complete set of paths, and $p = (n_1, \dots, n_k)$ a path with inner nodes that are not D-nodes such that there exists a path $p' \in P$ that contains the first edge $(n_1, n_2)$ of $p$. Then, there is a path in $P$ that contains $p$.

Proof. If , we have and is contained in since . Assume that the proposition holds for all paths with inner nodes that are not D-nodes with length . Let be a path with length where are not D-nodes such that a path with exists. From the assumption, it follows that there is a path that contains   . If is the last node in , we can prolong uniquely by since is not a D-node. This new path contains   and is in since is complete and is not an exit node (Definition 12). If is not the last node in , the node following in must be , since is not a D-node, and therefore contains .

When we apply this lemma to DD-paths , we can prove the following proposition.

Lemma 14. Let $f$ be a function and $T$ a set of test cases that satisfies edge coverage or decision coverage such that the set of paths induced by $T$ is complete. Then, $T$ also satisfies branch coverage.

Proof. Let be an edge in the decision graph of the control flow graph of . Then, there exists a DD-path in such that is the associated edge in (Definition 4). Since the set of test cases satisfies edge coverage of or decision coverage, a path with exists (Definition 10). The set is complete, and therefore there is a path that contains (Lemma 13). It follows from Definition 5 that the decision path of contains the edge . This means that for the path holds .

Since branch coverage does not cover branches with unconditional loops, we could weaken the condition for complete sets and allow that the executions of unconditional loops are not fully contained in the path sets, that is, a path is allowed to end in a node that is contained in an unconditional loop without the existence of a path with .

In the case that a control flow graph has an isolated loop, it is impossible to get edge coverage but decision coverage possibly can be achieved. Of course, such a loop can only appear in functions with unreachable code. This is not the only case in which edge coverage does not follow from decision coverage. Another case arises if the set of paths induced by the test cases is not complete. When, for example, the function f1 in Figure 3 is called with 0 and 1, we get decision coverage, but not edge coverage, if we assume that g never terminates.

Lemma 15. Let $f$ be a function such that the control flow graph of $f$ does not contain isolated loops and let $T$ be a set of test cases that satisfies decision coverage such that the set of paths induced by $T$ is complete. Then, $T$ also satisfies edge coverage.

Proof. Let be an edge in the control flow graph. Let be the path according to part (2) of Lemma 6 that contains as last edge. Since the set of test cases satisfies decision coverage, a path with exists (Definition 10). From Lemma 13, follows that there is a path that contains . Therefore, .

If we exclude unconditional and thus isolated loops, for example, by not allowing gotos, which is a simple syntactical criterion, we can summarize the results as follows.

Theorem 16. In the set of all functions with control flow graphs without unconditional loops, branch coverage subsumes edge coverage and edge coverage subsumes decision coverage.

For functions (with control flow graphs without unconditional loops) and test case sets that induce complete sets of paths the reverse directions also hold, that is, from decision coverage follows edge coverage and from edge coverage follows branch coverage.

5. Related Work

Control flow graphs can be used for white box testing to support test data selection and coverage notions as shown in [5] for statement, segment, and branch coverage or as discussed by Laski and Korel [16] and Rapps and Weyuker [12] for data flow oriented testing. In the latter papers, only complete paths, that is, paths that start in the entry node and end in the exit node, are considered, whereas in the first paper also, paths that do not end in the exit node are allowed in order to capture infinite loops which can occur in practical applications that run until switched off, for example, in embedded control systems. Jalote [10] and Zhu et al. [4] base the definitions of statement and branch coverage and of data flow coverage notions on control flow graphs, where the nodes represent blocks of statements. Different coverage definitions are compared in [4, 11]. In the first of these papers, the authors argue that a coverage that subsumes another coverage does not necessarily give better results with respect to the detection of faults and introduce a relation called “properly covers" with which they prove that decision coverage is weaker than condition based and data flow oriented coverages. White [17] models the structure of programs with control flow graphs in order to discuss different aspects of testing. Program transformation techniques also use control flow graphs to represent the program structure, for example, as shown by Hierons et al. [13] with the aim to apply automated test data generation to transformed unstructured programs. An approach to generate test data that uses control flow graphs to describe all paths that lead from the entry node to the branch which should be tested is shown in [18]. Bertolino and Marré [14] propose an algorithm to generate path covers for branch testing which is based on ddgraphs that reduce graphs to D-nodes and junction nodes and the paths between them. 
The difference between ddgraphs and our decision graphs is the inclusion of the junction nodes in ddgraphs.
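
To illustrate this difference, the following Python sketch collapses a small control flow graph to its decision nodes. It rests on a simplified reading of the reduction and is not the formal definition used in this paper: we assume that the kept nodes are the entry node, the exit node, and every node with at least two successors, and that each run of single-successor nodes between two kept nodes becomes one edge. The function name decision_graph, the dictionary encoding, and the node names are hypothetical.

```python
def decision_graph(cfg, entry, exit_node):
    """Collapse a CFG (dict: node -> list of successors) to its decision nodes.

    Assumption (simplified, hypothetical): kept nodes are the entry, the exit,
    and every node with two or more successors; pure junction nodes are dropped.
    Returns the kept nodes and a list of edges (source, target, skipped_nodes).
    """
    kept = {entry, exit_node} | {n for n, succs in cfg.items() if len(succs) >= 2}
    edges = []
    for start in cfg:
        if start not in kept:
            continue
        for first in cfg[start]:
            skipped, node, seen = [], first, set()
            # Follow single-successor nodes until the next kept node is reached.
            while node not in kept:
                if node in seen or not cfg.get(node):
                    break  # unconditional loop or dead end: no edge is produced
                seen.add(node)
                skipped.append(node)
                node = cfg[node][0]
            else:
                edges.append((start, node, skipped))
    return kept, edges


# A while loop ("w") whose body is an if-then-else ("i") with join node "j".
example_cfg = {
    "entry": ["w"],
    "w": ["i", "exit"],
    "i": ["t", "e"],
    "t": ["j"],
    "e": ["j"],
    "j": ["w"],
    "exit": [],
}
example_nodes, example_edges = decision_graph(example_cfg, "entry", "exit")
```

In this example the junction node j disappears, whereas a ddgraph would retain it. The two branches of the if-then-else become two parallel edges from i to w, which is why each edge records the nodes it skips.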

Another principal usage of control flow graphs is control flow analysis in compiler construction and optimization [19]. Aho et al. [3] use control graphs to represent intermediate code in the form of three address statements for code generation during the compilation of programs. These statements have the form x := y op z, or are unconditional goto-statements (goto label), or conditional goto-statements (if condition goto label). A conditional goto is treated as one statement. Nodes represent basic blocks of sequential statements, which can be entered only by the first statement in the block and left by the last statement. Entry and exit nodes are separate nodes and not part of blocks. Ferrante et al. [1] derive program dependence graphs from control flow graphs that describe the data and control dependences in the program and use them for transformation and optimization of programs.
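
The partitioning of three address code into basic blocks can be sketched as follows. This is a minimal illustration of the usual leader-based approach (the first statement, every jump target, and every statement following a jump start a new block), not a reproduction of the algorithm in [3]. The tuple encoding of statements, the use of statement indices as jump targets, and the function name basic_blocks are assumptions made for this example.

```python
def basic_blocks(stmts):
    """Split three address statements into basic blocks (lists of indices).

    Hypothetical encoding: ("assign", text), ("goto", target) or
    ("if", condition, target), where target is a statement index.
    """
    if not stmts:
        return []
    leaders = {0}  # the first statement always starts a block
    for i, stmt in enumerate(stmts):
        if stmt[0] in ("goto", "if"):
            target = stmt[-1]
            if target < len(stmts):
                leaders.add(target)   # a jump target starts a block
            if i + 1 < len(stmts):
                leaders.add(i + 1)    # the statement after a jump starts a block
    starts = sorted(leaders)
    ends = starts[1:] + [len(stmts)]
    return [list(range(a, b)) for a, b in zip(starts, ends)]


# Summation loop written as three address code.
example_program = [
    ("assign", "i := 0"),        # 0
    ("if", "i >= n", 5),         # 1: if i >= n goto 5
    ("assign", "s := s + i"),    # 2
    ("assign", "i := i + 1"),    # 3
    ("goto", 1),                 # 4: goto 1
    ("assign", "r := s"),        # 5
]
example_blocks = basic_blocks(example_program)
```

For the example program, the statements 2 to 4 form one block because control can enter it only at statement 2 and leave it only at statement 4.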

Kosaraju [20] defines flow charts recursively using different types of basic constructs and compares them to study the computational power of the underlying constructs. Analysis of programs by partitioning using segments, DD-paths, and other approaches is discussed by Paige [7].

A further application is to support the definition and evaluation of source-code-based metrics. For example, cyclomatic complexity can be defined, based on the cyclomatic number in graph theory [21], by counting the linearly independent circuits in the graphs [10, 22]. Sommerville [2] combines cyclomatic complexity and independent paths to design test cases in the white box test. Cyclomatic complexity for sets of functions can be defined in several ways. In the original paper, McCabe [22] defines the complexity of $p$ components by $v = e - n + 2p$, where $e$ is the number of edges and $n$ the number of nodes in the components. Henderson-Sellers and Tegarden [23] argue that if the components represent calling and called functions, the control flow graphs of the called functions can be expanded in the control flow graph of the caller. Function call nodes are split into two nodes, a call node and a return-from-node, with additional edges from the call node to the entry node of the called function and from the exit node of the called function to the return-from-node. Thus, $2(p-1)$ edges and $p-1$ nodes are added if the $p$ components consist of one calling and $p-1$ called functions. The expanded graph consists of one component and defines an alternative cyclomatic complexity of a set of functions: $v' = e - n + p + 1$. In [6], we compare the cyclomatic complexity of a control flow graph with that of its decision graph and prove how the two are related.
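
The two ways of counting can be checked with a short calculation. The sketch below assumes McCabe's formula e - n + 2p for p components and assumes that each of the p - 1 called functions is called exactly once, so that the expansion adds one node and two edges per called function. The function names and the sample counts are hypothetical.

```python
def mccabe(e, n, p=1):
    """Cyclomatic complexity e - n + 2p of a graph with p components."""
    return e - n + 2 * p


def expanded(e, n, p):
    """Complexity after expanding p - 1 called functions into the caller.

    Assumption: every called function is called exactly once, so node
    splitting adds p - 1 nodes and 2(p - 1) edges and leaves one component.
    """
    return mccabe(e + 2 * (p - 1), n + (p - 1), 1)


# Hypothetical totals for one caller and one called function (p = 2).
e, n, p = 9, 8, 2
separate_value = mccabe(e, n, p)   # 9 - 8 + 4 = 5
expanded_value = expanded(e, n, p)  # 11 - 9 + 2 = 4
```

Under these assumptions the expanded value equals e - n + p + 1, so it is smaller than the value for separate components by p - 1.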

In order to do interprocedural analysis, Reps et al. [24] define a framework that consists of the set of control flow graphs for all functions in a program using the technique of node-splitting and expansion as described above. For data-flow analysis, especially, the interprocedural approach gives much better results than intraprocedural analysis [25]. Kapfhammer [15] defines test coverage notions based on interprocedural control flow graphs. When classes are considered, interprocedural control flow graphs can be restricted to the methods of single classes. With this approach, Harrold and Rothermel [26] give a framework for data flow oriented testing of classes. One difference between procedural and object-oriented programming languages is polymorphism. In another paper, Harrold and Rothermel [27] solve this by the introduction of polymorphic call and return nodes.

So far, all approaches mentioned model the control flow at the level of high-level programming languages or intermediate code. It is, however, also possible to analyze the control flow of machine-level programs.

One application of interprocedural control flow graphs on the lower level is the detection of self-mutating malware. Bruschi et al. [28] try to find the control flow graph of the malicious code being searched for as a subgraph in the control flow graphs of the program and thus identify malware.

Abadi et al. [29] use the control flow graph of machine-level programs to detect deviations from the control flow caused by attacks on the program. Computed jumps, in particular, have to be secured against destination addresses forged by attackers.

Usually, the control flow graphs are known when software is analyzed. If only the executable code is available, the control flow graphs have to be extracted before the analysis can take place. Various problems make the construction of the graphs imprecise, for example, when jump tables with data dependent target addresses are used. Theiling [30] describes a software framework that extracts the control flow graphs such that they can be used in a safe analysis of the worst-case execution time (WCET). An approach to construct the control flow graphs based on XML representations of the executable code in assembly form is proposed by Wenjian et al. [31].

Several tools exist that visualize control flow graphs. Figure 7 shows the control flow graph of the function Search generated by Crystal FLOW from SGV Software Automation Research Corporation (http://www.sgvsarc.com/). Such tools usually show more information in the graphs in order to support the understanding of the code or to be used in code reviews or documentation. A tool that uses control flow graphs to show the results of program analysis, in this case the WCET, is aiT by AbsInt (http://www.absint.com/ait/). With the visualization tool aiSee, the user can explore the graphs and thus inspect the WCET analysis results [32].

Control flow graphs can also be applied to the testing of hardware descriptions in VHDL. Zhang and Harris [33] introduce timing nodes to represent the timing information in VHDL descriptions and define data flow oriented du pairs coverage for hardware descriptions. Flow graphs are also useful in business process modeling. Sadiq and Orlowska [34] define workflow graphs where nodes represent tasks and edges represent the workflow between tasks. In that paper, workflow graphs are defined as acyclic graphs. A special iteration task is used to express the repetition of tasks. Workflow graphs are checked for deadlocks and lack of synchronization by graph reduction. Workflow charts that model human-computer interactions that support business processes and allow loops are studied in [35]. There, computer screens, forms, and links are modeled by nodes in the graphs. One difference between these graphs and control flow graphs is that concurrency is necessary in order to model the workflow.

In the testing of graphical user interfaces, event flow graphs model the events that occur as a reaction to the interaction of the user with the interface [36, 37]. Coverage criteria are based on paths in the graphs modeling event sequences. Memon et al. [38] call the analogue to edge coverage event selection coverage because an edge models the selection of an event after another event has occurred. Just as programs are decomposed into functions, graphical user interfaces are built up from components, and intercomponent criteria can be defined.

Decision trees, introduced by Raiffa and Schlaifer [39], are well known in probability theory. A decision tree consists of decision nodes where decisions are taken and of chance nodes where unknown states are modeled by different successors with assigned probabilities. The leaves of decision trees are utility nodes that specify the outcome of the decisions. Each path from the root to a leaf thus models a sequence of decisions and state assumptions that lead to the outcome under assigned probabilities. The drawback that decision trees grow exponentially with the number of decisions can be addressed by more general structures such as influence diagrams [40]. The main difference between these graph types and our decision graphs is that decision trees and influence diagrams are acyclic, and thus the length of decision sequences always has a fixed upper bound. Oliver [41] defines decision graphs as generalizations of decision trees where duplicated subtrees are joined and applies them to construct decision procedures from sets of examples. A practical application of decision trees is shown, for example, in [42], where decision trees for the prediction of the diagnosis and the outcome of Dengue illness are constructed from simple clinical and haematological data of 1200 patients using a decision tree classifier software tool. The authors of this study conclude that their algorithms are expected to help disease management.

6. Conclusion

In this paper, we derived decision graphs from directed graphs such that the branching structure is preserved. It can be shown that the branches in a graph without unconditional loops correspond to the edges in the decision graph. One useful application is the modeling of programs with control flow graphs. Decision graphs form an abstraction from control flow graphs that displays to the programmer only the decisions, for example, if-then-else-constructs, and the paths between decisions. With this approach, we compared different existing definitions of branch covering in software testing and showed the differences. When we exclude unconditional loops, branch coverage based on the edges in decision graphs subsumes edge coverage of the control flow graph and decision coverage. Control flow graphs are popular not only in software modeling but also in various other fields. Therefore, it seems promising to apply decision graphs to other domains and exploit their advantages.

Acknowledgment

The author wishes to thank the anonymous reviewers for their careful reading and helpful suggestions.