Abstract

Language-dependent software metrics are proposed as quantitative measures to assess internal quality factors, like cohesion and complexity, at both the method and class levels. External quality factors, like reliability and maintainability, are in general predicted using different metrics of internal attributes. The literature shows a lack of software metrics proposed for reliability measurement and prediction. In this context, a suite of four semantic, language-independent metrics was proposed by Mili et al. (2014) to assess program redundancy using the Shannon entropy measure. The main objective of these metrics is to monitor program reliability. Despite their important purpose, they are manually computed and only theoretically validated. Therefore, this paper aims to assess the redundancy metrics and empirically validate them as significant reliability indicators. As software reliability is an external attribute that cannot be directly evaluated, we employ other measurable quality factors that are direct reflections of this attribute. Among these factors, defect density is widely used to measure and predict software reliability based on software metrics. Therefore, a linear regression technique is used to show the usefulness of these metrics as significant indicators of software defect density. A quantitative model is then proposed to predict software defect density based on the redundancy metrics in order to monitor software reliability.

1. Introduction

Software quality is one of the main concerns of all organizations using software systems. According to [1], it refers to the capability of a software process to produce a software product of good quality. In both software processes and products, quality is described through a set of attributes or characteristics that may be internal or external. Concerning product quality, reliability is identified as one of the most important software quality attributes. Different techniques, including fault prevention, fault removal, fault tolerance, and fault forecasting, are defined to produce reliable software systems [2, 3]. Furthermore, the literature shows that software reliability, like other external attributes, is difficult to measure directly [4–6]. Thus, it is generally measured and predicted based on other quality attributes, like defect density and fault-proneness, identified as direct reflections of reliability [7]. These attributes are directly measurable through software metrics and are widely used to validate different suites of software metrics; examples include complexity, cohesion, defect density, and fault-proneness [6, 8–10].

Software metrics are quantitative measures used for the evaluation, improvement, and prediction of different software quality attributes [11–15]. Numerous software metrics (or combinations of them) have been proposed to assess internal attributes like complexity, cohesion, and size, as well as fault count and defect density. External attributes, like reliability and maintainability, are generally expressed through internal attributes using the metrics proposed to measure them. For instance, reliability can be reflected through cohesion, coupling, and complexity using their related metrics [5].

Continuous research on the assessment and prediction of software reliability is needed. In this context, a suite of four semantic (language-independent) metrics was proposed in [16] to assess program redundancy in order to monitor reliability. However, these metrics were presented theoretically and computed manually for basic arithmetic operations, and it is unclear how they can be computed for more complex programs (functions and classes). From basic programs containing simple operations, we cannot conclude that the metrics are related to program redundancy; an empirical assessment based on more complex software projects is needed to convincingly show that the proposed metrics are related to program redundancy. Moreover, concerning the hypothesis that these metrics are useful indicators of software reliability, there are no concrete examples or evidence of how they can be linked to reliability. Therefore, empirical studies establishing a concrete relationship between these metrics and various reliability factors are required.

The authors in [17, 18] focused on the empirical validation of the redundancy metrics as measures of the programs' mutant rate. The manual assessment of these metrics for basic arithmetic operations and certain types of computation-based programs represents the main limitation of these studies.

Software metrics validation consists in establishing the metrics' utility by studying the relationship between the metrics and a quality attribute. The quality attributes most used as reliability indicators are defect density and fault-proneness [6, 8–10, 14, 19]. This relationship can be exploited to propose a quantitative model that predicts the quality attribute from the metrics [9, 14].

To perform this validation, we used an empirical dataset including the different metrics values and the defect density attribute. The metrics values are computed on a set of classes taken from Apache projects, and the defect density attribute is obtained by applying a fault injection procedure to the same classes. Using the linear regression technique, the relationship between the metrics and defect density is exploited to propose a quantitative model that reflects programs' defect density through these metrics. This model is also useful to predict defect density for further datasets.

This paper is organized as follows: Section 2 presents the related works and motivation. Section 3 discusses the proposed empirical assessment approach. Section 4 outlines the empirical validation of redundancy metrics as defect density predictors. Section 5 analyses the experimental results. Conclusion and perspectives are reported in Section 6.

2. Related Works and Motivation

This section focuses on software reliability measurement, software redundancy, and entropy concepts. Furthermore, it describes the redundancy metrics suite.

2.1. Software Reliability

Software reliability is one of the most important software quality attributes [11, 20]. It is defined from two main sides:
(i) The mathematical side considers reliability as the probability of failure-free operation for a specified period of time in a specified environment [3].
(ii) The broader side considers reliability as the degree to which a system, product, or component performs specified functions under specified conditions for a specified period of time [2]. It is described by combining different subcharacteristics, namely, maturity, availability, fault tolerance, and recoverability [3].

Different techniques, namely, fault prevention, fault removal, fault tolerance, and fault forecasting, are defined to improve software reliability [11, 21, 22]. Fault prevention mechanisms help to prevent faults; however, they cannot guarantee the avoidance of all software faults [23]. Consequently, further protective mechanisms like fault removal are required. According to [22, 24], fault removal techniques are important and critical for software reliability. However, these techniques cannot guarantee the elimination of all faults, since they are based on software testing and formal inspections, which have their own limitations. Besides, Mili and Tchier [22] argued that while removing one fault, other ones can appear. They also note that some faults cannot be removed since they are never sensitized or are masked (the actual states equal the expected ones), yet these faults can still cause software failure. Consequently, fault tolerance and fault forecasting techniques are required to complement fault avoidance and removal, in order to reduce faults and to improve reliability [23–25].

Bansiya et al. [19] noted that reliability is one of the high-level quality attributes, which are abstract concepts that cannot be directly observed and measured. Different models based on direct metrics have been proposed to predict various external quality attributes, including reliability. For instance, the Chidamber and Kemerer (C&K) suite [19, 26–29] has been used for reliability prediction. The proposed reliability prediction models use software metrics (the independent variables) to evaluate measurable reliability attributes (the dependent variable), like defect density, fault-proneness, and defect count [4, 8–10, 14, 30]. Mili et al. [16] also proposed a suite of four metrics to monitor program reliability based on redundancy.

2.2. Software Redundancy

The redundancy concept was first used in hardware systems, where it provides multiple physical copies of components, i.e., processors and memories, to improve reliability [31]. Different forms of redundancy were defined in software systems, like information redundancy (code redundancy), functional redundancy, and time redundancy [22, 32, 33]:
(i) Information redundancy indicates the excess of information, expressed in Shannon bits, used to represent the state of a program [34].
(ii) Functional redundancy consists of using the same program specification to generate different algorithm or program versions performing the same functionality [33].
(iii) Time redundancy is the time used to repeat the execution of a failed process [24, 31].

The literature shows that redundancy can be used in different applications. For instance, in 1976, redundancy was exploited through the N-version programming technique to achieve reliable systems [35]; this technique was also used to compare the probability of failure between single-version and N-version systems. Furthermore, in [21], the authors noted that redundancy is useful for self-checking programs. In 2003, Lyu [25] showed that redundancy, structured in mutation testing, is exploited through the injection of faults into numerous program versions. In 2015, redundancy was also used to identify the semantic similarity of code fragments [36].

The redundancy metrics proposed by Mili et al. [16, 22] assess the information redundancy provided by the different states of a program [34]. A program state is given by the set of variables manipulating the related data and reflects the uncertainty about the outcome of these variables. To develop a better understanding of the redundancy metrics, we present the terminology related to program states:
(i) State space: the set of values that the declared program variables may take [16].
(ii) Initial state space: the state of the program (function/class) represented by the input variables [22].
(iii) Current state (actual state): the set of states that the program may be in at any given point [16, 37].
(iv) Final state space: the state of the program produced by the output variables [22].
(v) State redundancy: arises when the representation of the program state allows a wider range of values than needed to represent the different states [32].

These definitions are illustrated in the following example:

Consider the following program:

    int s, x, result; // state space of the program
    s = 2;            // initial state of the program
    s = s + 1;        // internal state 1 of the program
    s = 2 * s;        // internal state 2 of the program
    s = s % 3;        // internal state 3 of the program
    s = s + 12;       // final state of the program

2.3. Entropy Concept

As mentioned, the redundancy metrics were defined based on the Shannon entropy measure. For a given random variable X that takes its values in a finite set {x1, ..., xN}, its entropy is the function denoted by H(X) and defined by [34, 38]

H(X) = −Σi p(xi) log2(p(xi)),  (1)

where p(xi) is the probability that X = xi.

Intuitively, this function measures (in bits) the uncertainty pertaining to the outcome of X; it takes its maximum value H(X) = log2(N) when the probability distribution is uniform, where N is the cardinality of X's value set.
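To make the definition concrete, the following minimal Java sketch (our illustration, not part of the study's scripts) computes H(X) for a discrete probability distribution; for a uniform distribution over N outcomes, it returns log2(N):

    import java.util.Arrays;

    public class EntropyExample {
        // Shannon entropy H(X) = -sum over i of p_i * log2(p_i), in bits.
        static double entropy(double[] p) {
            double h = 0.0;
            for (double pi : p) {
                if (pi > 0) {
                    h -= pi * Math.log(pi) / Math.log(2);
                }
            }
            return h;
        }

        public static void main(String[] args) {
            double[] uniform = new double[8];
            Arrays.fill(uniform, 1.0 / 8);
            System.out.println(entropy(uniform)); // 3.0 bits = log2(8)
        }
    }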

2.4. Redundancy Metrics Suite

The redundancy metrics were defined based on the Shannon entropy measure of program code. Mili et al. [16] adopted the hypothesis of a uniform probability distribution in order to perform the analytical evaluation of these metrics. The authors in [16, 17] argued that, for a random variable X taking the values of a 32-bit integer, N equals 2^32, and log2(N) is then merely equal to 32 bits. Under the uniform-probability assumption, the entropy of any program variable is basically the number of bits in that variable. More generally, the entropy of a given variable X whose current value is N is given as

H(X) = log2(N), rounded up to a whole number of bits.  (2)

Four metrics were defined: initial state redundancy, final state redundancy, functional redundancy, and noninjectivity. To define these metrics, the following assumptions were made by Mili et al. [16]:
(i) The probability distribution of the different variables is uniform.
(ii) Variables are 32 bits in size.
(iii) Metrics are computed at the method level, for methods manipulating input and output variables of integer type: the input states are represented by the declared variables, and the output states by the modified states of these variables.

2.4.1. Initial and Final State Redundancy Metrics

We recall that the state of a given program is defined by its declared variables. It is very common to declare ranges of values for these variables that are much wider than what is really required. For instance, the age of an employee is generally declared as an integer variable; however, only a restricted range, e.g., between 0 and 120, is really required. This means that 7 bits are sufficient to store the age variable, yet the typical 32-bit size of an integer is used. The unused bits measure the code redundancy. Thus, state redundancy represents the gap between the declared state of a program and its actual (really used) state [16, 18, 22].

Mathematically, let S be the declared state of the program (its variables) and σ be its actual state (the actual values of these variables); the state redundancy is then given by the difference between their respective entropies, H(S) and H(σ). Since the program moves from its initial state (σ1) to its final state (σf), two state redundancy measures, namely, initial state redundancy (ISR) and final state redundancy (FSR), were defined by the following equations [18] (Table 1):

ISR = (H(S) − H(σ1)) / H(S),  (3)
FSR = (H(S) − H(σf)) / H(S),  (4)

where:
(i) ISR is the gap between the declared state and the initial state of the program.
(ii) FSR is the gap between the declared state and the final state of the program.
(iii) H(S) is the entropy of the individual variable declarations (the number of bits required to store these variables). For the Java language, the entropy of variable declarations of basic data types is illustrated in Table 1.
(iv) σ1 and σf are, respectively, the initial and final actual states of the program.
(v) H(σ1) and H(σf) are, respectively, the entropies of these states.
(vi) To compute the SR metrics (ISR and FSR), each data type has to be mapped to its width in bits.

Example 1. Consider the following program:

    int x, y, z;     // the program state is represented by the x, y, and z variables
    x = 21;          // initial state of x
    y = 90;          // initial state of y
    z = (x + y) / 2; // final state

The declared space of this program is defined by three integer variables, x, y, and z; hence, using the metric definitions provided in [18], H(S) = 96 bits, since three 32-bit integer variables are used. The initial state is defined by the same three variables: the input variables x and y require, respectively, 5 and 7 bits to be stored, and the output variable z has a free range (32 bits). Hence, H(σ1) = 5 + 7 + 32 = 44 bits. The final state is determined by the value of the variable z (here (21 + 90)/2 = 55, which needs 6 bits); then H(σf) = 6 bits, and we find

ISR = (96 − 44) / 96 ≈ 0.54,
FSR = (96 − 6) / 96 ≈ 0.94.

2.4.2. Functional Redundancy (FR) Metric

According to [18], functional redundancy is a function from initial states to final states: it reflects how initial states are mapped to final states, how initial states are affected by input data, and how final states are projected onto output data. Mathematically, for a program (function), FR is the ratio of the output data delivered by the program to the input data it receives, given by

FR = H(Y) / H(X),  (6)

where:
(i) H(S) is the entropy of the program's declared space, defined by the declared variables.
(ii) H(Y) is the entropy of the output data delivered by the program.
(iii) H(X) is the entropy of the input data passed through parameters, global variables, and read statements.

Considering the same previous example, H(S) = 96 bits. The random variable Y is defined by the integer variable z, represented on 32 bits; then, H(Y) = log2(2^32) = 32 bits. H(X) is the entropy of the input data received by the program, represented by the two integer variables x and y; then, H(X) = 2 × log2(2^32) = 64 bits, and FR is given by

FR = 32 / 64 = 0.5.

2.4.3. Noninjectivity (NI)

According to [28], a major source of program (function) redundancy is its noninjectivity. An injective function is a function whose value changes whenever its argument does; a function is noninjective when it maps several distinct arguments (initial states σ1) onto the same image (final states σf). Mathematically, NI is the conditional entropy of the initial state given the final state: if we know the final state, how much uncertainty do we have about the initial state? According to [34], this conditional entropy equals the difference between the entropies of these two states. Hence, NI was defined as

NI = (H(σ1) − H(σf)) / H(σ1).  (8)

Using the previous example, NI is equal to (44 − 6) / 44 ≈ 0.86. A summary of the presented metrics is given in Table 2.
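The four metric computations can be gathered in a few lines of Java; the following sketch of ours plugs in the entropy values of the running example:

    public class RedundancyMetricsExample {
        public static void main(String[] args) {
            // Entropies (in bits) from the running example: declared space S,
            // initial state, final state, input data X = {x, y}, output data Y = {z}.
            double hS = 96, hInitial = 44, hFinal = 6, hX = 64, hY = 32;

            double isr = (hS - hInitial) / hS;           // (96 - 44) / 96 ~ 0.54
            double fsr = (hS - hFinal) / hS;             // (96 - 6) / 96  ~ 0.94
            double fr  = hY / hX;                        // 32 / 64 = 0.5
            double ni  = (hInitial - hFinal) / hInitial; // (44 - 6) / 44 ~ 0.86

            System.out.printf("ISR=%.2f FSR=%.2f FR=%.2f NI=%.2f%n", isr, fsr, fr, ni);
        }
    }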

2.5. Motivation and Objective

As mentioned, the main objective of the redundancy metrics defined in [16] is to monitor product reliability, which makes them important measures, since software reliability is one of the most important quality attributes. However, the metrics composing this suite were presented theoretically and computed manually for basic arithmetic operations, and it is unclear how they can be computed for more complex programs. With very simple programs, we cannot conclude that the metrics are related to program redundancy; an empirical assessment based on more complex software programs (functions/classes) is needed to convincingly show this relation. In addition, there are no concrete examples or evidence of how these metrics can be linked to reliability, so empirical studies establishing a concrete relationship between these metrics and various reliability factors are required. Given these statements, the main objectives of this study are twofold:
(i) First, we aim to propose an approach to empirically assess the redundancy metrics. This consists first of considering complex programs taken from real-world software projects rather than being limited to basic operations and certain types of computation-based programs [29, 39]. The basic idea is to generate an empirical database including, for each program (function/class), the values of the redundancy metrics.
(ii) Second, we aim to empirically validate the proposed metrics as reliability indicators by considering the defect density attribute as a direct reflection of software reliability [5, 7]. The basic idea is to exploit the generated database to study the relationship between the metrics and defect density using regression techniques [8, 28, 40].

Therefore, we propose in Section 3 an empirical assessment approach to calculate the different metrics. In addition, we present in Section 4 an empirical validation approach to demonstrate the concrete relationship between these metrics and software reliability.

3. Empirical Assessment of Entropy-Based Redundancy Metrics

The redundancy metrics were computed manually in [16] at the function level, for simple examples with specific data types and values, e.g., the greatest common divisor of two integer variables. In this paper, we consider more complex examples taken from real-world software projects and compute these metrics automatically at the class level, since software projects are organized in classes. Three main steps are used:
(1) Selection of software classes: we select the different classes from which the metrics will be generated.
(2) Computation of the redundancy metrics: once the different classes are selected, appropriate scripts are used to compute the metrics.
(3) Construction of the database: the two previous steps yield an empirical database that contains, for each class, the values of the different metrics.
These steps are detailed in the following sections.

3.1. Selection of Software Classes

According to Radjenović et al. [29] and Kumar et al. [39], the software repositories used to validate most software metrics are of three main types:
(i) Private/commercial repositories: these repositories are used and maintained by companies for organizational use. In them, the source code and other related information, like fault datasets, are not available [29].
(ii) Partially public repositories: Radjenović et al. [29] noted that in these repositories only the product source code and the related software faults are available, whereas the values of software metrics usually are not; they need to be calculated from the available source code and then mapped to the fault information. According to [29], this mapping may lead to biased results.
(iii) Public repositories: in these repositories, the values of software metrics and other information, like software faults, are usually available, which justifies their use in many research projects [27]. Examples include the PRedictOr Models In Software Engineering repository (PROMISE, http://promise.site.uottawa.ca/SERepository/datasets-page.html) of NASA projects, the Software-artifact Infrastructure Repository (SIR, http://sir.csc.ncsu.edu/portal/index.php), and the Bug Prediction Dataset (BPD, http://bug.inf.usi.ch/index.php).

To perform the empirical assessment of the redundancy metrics, we focused on repositories containing programs of the input/output type explained above [16].

Given that computing the redundancy metrics requires the availability of the source code, private and commercial repositories were not considered. We therefore focused on partially public and public repositories. The literature [7, 41] shows that most studies on metrics validation used programs (classes or methods) taken from NASA projects like CM1, JM1, and KC1. In this context, the authors in [40] showed that, out of 64 metrics validation studies performed from 1991 to 2013, NASA projects were the most used (60%), followed by PROMISE repository datasets (15%) and other open-source projects (12%).

In our research project, we proceeded as follows:
(1) First, we considered NASA and then PROMISE projects, for which different information is available, including the values of software metrics. Fault datasets of these projects are also available, and for each class, we can identify whether it is fault-free or not (true/false). However, we decided not to use these repositories, because the source code, which is mandatory for our study since we need it to compute the redundancy metrics, is unavailable.
(2) Next, we focused on a set of open-source projects. Some of them are not of input/output type, like SIR's Velocity and Camel projects; others, such as the BPD repository, do not include source code. The literature [39] shows that the Apache Commons library, which includes different Java projects with available source code of input/output type, was also used to validate software metrics. Moreover, in this repository, the unit tests related to the classes are available.

Consequently, we selected the Apache Commons library, which meets all our requirements and hypotheses. From this repository, we considered a set of 43 classes (see Table 3) containing functions that manipulate variables in the input and output states.

A description of each class and its related function is available at http://commons.apache.org/proper/commons-math/javadocs/api-3.6/.

3.2. Automatic Metrics Computing and Database Construction

The process we used to compute the redundancy metrics (ISR, FSR, FR, and NI) is summarized in Figure 1.

Figure 1 presents the different steps used to compute the redundancy metrics; these steps are the same for all the selected classes presented in Table 3. To compute the metrics, we used the Eclipse development environment (version Neon.3 Release (4.6.3)). Since the selected source code is organized in classes, we present below how the redundancy metrics are computed at the class level. The computing process comprises the following steps.

3.2.1. Compute Program State Space H(S)

To compute the state space entropy H(S), we first identified, for each class, the input/output functions manipulating the different states of the class variables. Next, we computed H(S) as the maximum entropy of all function variables (input/output). For a better understanding of how H(S) and the other metrics are computed, an example of the used script is illustrated in Figure 2.

In Figure 2, H(S) is computed as shown in line 33; its value equals the maximum entropy of the input and output variables used in lines 24 to 31. The input data related to these variables are randomly generated, as shown in lines 34 to 36.

3.2.2. Compute H(σ1) and ISR Metric

H(σ1) reflects the initial entropy of the input variables and the maximum entropy of the output ones. To compute the entropy of a variable (the exact number of bits used), given by equation (2), a Java function called sizeOfBits is used; more details about this function are presented in Appendix A (see Figure 3). H(σ1) is computed as illustrated in lines 48 and 49 of Figure 2. Using equation (3), the ISR metric value is deduced, as illustrated in line 64 of Figure 2.
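A minimal version of such a function could look as follows (our sketch, consistent with the worked examples in Section 2, where the values 21, 90, and 55 require 5, 7, and 6 bits; the actual sizeOfBits in Figure 3 may differ in its details):

    public class SizeOfBits {
        // Number of bits needed to represent a positive value v, i.e., ceil(log2(v)).
        static int sizeOfBits(long value) {
            long v = Math.abs(value);
            if (v <= 1) {
                return 1; // 0 and 1 fit in a single bit
            }
            // For v >= 2, ceil(log2(v)) equals the bit length of (v - 1).
            return 64 - Long.numberOfLeadingZeros(v - 1);
        }

        public static void main(String[] args) {
            System.out.println(sizeOfBits(21)); // 5
            System.out.println(sizeOfBits(90)); // 7
            System.out.println(sizeOfBits(55)); // 6
        }
    }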

3.2.3. Compute H(σf) and FSR Metric

As mentioned, H(σf) reflects the entropy of the variables used to produce the output ones. The sizeOfBits function is used to provide their values as illustrated in lines 76, 78, and 79 of Figure 2. Using equation (4), the FSR metric value is deduced as illustrated in line 81 of Figure 2.

3.2.4. Compute FR and NI Metrics

To compute FR and NI metrics, we have used equations (6) and (8). We have first computed H(Y) and H(X) values as shown in lines 84 and 85 of Figure 2. Then, FR and NI values are deduced as illustrated in lines 86 and 88 of Figure 2.

3.2.5. Generate Metrics Values to Excel Files

Note that the metrics values were generated over 1000 iterations to test different possibilities of random inputs, as shown in lines 19 and 20 of Figure 2. To store the values of these metrics, we exported them to .xls files, as shown in lines 16 and 91. Then, for each class, we computed the average of the 1000 generated metric values to construct the final database. A part of this database is illustrated in Figure 4.
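A minimal sketch of this iteration-and-averaging step is given below (our illustration; computeMetrics is a hypothetical placeholder standing in for the per-class metric computation of Figure 2):

    import java.util.Random;

    public class MetricsAveraging {
        static final int ITERATIONS = 1000;

        public static void main(String[] args) {
            Random random = new Random();
            double[] sums = new double[4]; // ISR, FSR, FR, NI
            for (int i = 0; i < ITERATIONS; i++) {
                // Random inputs exercise different initial states of the class.
                double[] m = computeMetrics(random.nextInt(), random.nextInt());
                for (int j = 0; j < sums.length; j++) {
                    sums[j] += m[j];
                }
            }
            for (int j = 0; j < sums.length; j++) {
                System.out.println(sums[j] / ITERATIONS); // value stored in the database
            }
        }

        // Hypothetical placeholder for the per-class metric computation.
        static double[] computeMetrics(int input1, int input2) {
            return new double[] { 0.5, 0.9, 0.5, 0.8 };
        }
    }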

The presented process can also be used to compute the redundancy metrics at the function level; an example of the used script is illustrated in Figure 5 of Appendix A. The source code of the metrics generator is available at https://gitlab.com/dalilaamara/redundancymetrics.

4. Empirical Validation of Semantic Metrics as Reliability Indicators

According to [5], a valid metric is one whose values are statistically associated with a quality attribute. Empirical validation aims not only to identify the utility of a proposed metric and to compare it with other metrics but also to identify which metrics are not useful. Reliability can be reflected by other measurable attributes, including defect density and fault-proneness. In our research, we focused on the defect density attribute, because fault-proneness only indicates whether a class contains faults (1) or not (0), and all classes in the constructed redundancy database contain faults.

4.1. Formulation of Research Hypotheses

To study the relationship between the redundancy metrics and the software defect density attribute, the following hypotheses are designed:
(i) H1: the ISR redundancy metric is a significant indicator of software defect density.
(ii) H2: the FR redundancy metric is a significant indicator of software defect density.
(iii) H3: the NI redundancy metric is a significant indicator of software defect density.
(iv) H4: ISR, FR, and NI (or a combination of them) are jointly indicators of software defect density.

Through these hypotheses, we aim to verify whether a relationship between the different metrics and the defect density attribute exists. Once a significant correlation between the redundancy metrics and defect density is identified, it can be stated that these metrics are useful to monitor software reliability.

4.2. Software Defect Density

Defect density (DD) is defined as the number of defects divided by the size of the delivered code in thousands of lines [5, 42, 43]. It is given as follows [5, 7]:

DD = number of defects / product size (in KLOC),  (9)

where:
(i) The product size is generally measured in thousands of lines of code (KLOC) [42, 43].
(ii) According to [5], defect counts can include postrelease failures, residual faults (all faults discovered after release), all known faults, and the set of faults discovered after some arbitrary fixed point in the software life cycle (e.g., after unit testing).

Once the quality attribute is identified, the next step consists in studying the existence of a relationship between this attribute, as the dependent variable, and the different redundancy metrics, as independent variables.

4.3. Empirical Validation Approach

A software metric shall be validated [26], as validation helps to identify the best metrics, i.e., those providing the information required for the metric's purpose [14].

Different studies [4, 19, 26] detail the validation of various software metric suites. Table 4 compares the common validation approaches based on their objectives, validation processes, and the repositories used.

As illustrated in Table 4, different studies have been proposed to validate software metrics as appropriate indicators of various quality factors, like defect density, maintainability, and fault-proneness. We note the following:
(i) The different validation approaches are based on three main steps: dataset collection, dataset analysis and model building, and model performance evaluation.
(ii) The data related to the software metrics and to the quality attribute used to validate them are available in public datasets, including NASA projects.

Therefore, our proposed methodology to validate the redundancy metrics consists of the following steps:
(1) First, we collected data related to the dependent and independent variables, represented, respectively, by defect density and the redundancy metrics. As explained above, the redundancy metrics are computed for a set of classes selected from the Apache Commons library. For these classes, defect density was not available; thus, we used a defect injection procedure to compute this attribute for the same classes.
(2) Second, we studied the independent and joint impact of the different redundancy metrics on software defect density using data analysis tools.
(3) Third, we proposed a defect density prediction model based on the redundancy metrics.

4.4. Defect Density Data Collection

Based on equation (9), defect density is derived from the number of faults in the source code divided by the number of thousands of lines of code (KLOC), for the classes presented in Table 3.

4.4.1. KLOC Computing

To compute the KLOC measure, we used the Metrics tool [47]. Within the Eclipse environment, this tool provides, for each of the used classes, the number of lines of code, as illustrated in Figure 6.

4.4.2. Defect Count Computing

As mentioned, we used the Apache Commons Math library, which includes only the source code and the associated unit tests. Thus, we used a fault injection procedure to obtain the values of this measure. One well-known fault injection technique is mutation testing, which consists of automatically seeding into each class' code a number of faults (or mutations); the new classes are called mutants. Tests are then run, and two cases are possible [48, 49]:
(i) If tests fail, the injected fault is detected (or killed), since the test gives different results for the original program (function in the class) and the faulty one.
(ii) If tests pass, the injected fault is masked (or survives), since the original program and the faulty one (mutant) give the same results when the tests are run.

The fault injection procedure is performed using automated mutation tools like MuJava, MuClipse, PiTest, and many more [50]. In our research work, PiTest (https://pitest.org/) is used within the Maven (https://maven.apache.org/) environment. To inject faults, we proceeded as follows:
(i) All faults that are active by default in PiTest are injected into the considered classes [50]. These include the replacement of binary arithmetic operations by other ones (+ by −, − by +, * by /, / by *, and % by *), the replacement of increments with decrements and vice versa, etc.
(ii) PiTest runs, and the related reports are generated. The PIT report indicates, for each function of the different classes, the type and location of each injected fault, and whether the fault was detected or masked. An example of the PIT report for the Erf class is illustrated in Figures 7(a) and 7(b). Further details of the fault injection process are presented in Appendix B.
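As an illustration of how a mutator alters the code, consider the following assumed example (not taken from the studied classes), where the Math mutator replaces + with −:

    public class MathMutatorExample {
        // Original code:
        static int add(int a, int b) {
            return a + b;
        }

        // Mutant generated by the Math mutator (+ replaced by -):
        static int addMutant(int a, int b) {
            return a - b;
        }
    }

A unit test asserting add(2, 2) == 4 kills this mutant, since the mutated version returns 0 instead of 4; if no test distinguishes the two versions, the mutant survives.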

Figure 7(a) indicates the line coverage and mutation coverage for each class; line coverage represents the rate of source code executed when the test suite is launched. Figure 7(b) presents a report of the faults injected into the Erf class, specifically the erf(double, double) function. Green lines indicate that the injected fault was detected, whereas pink ones indicate that it was masked. Based on these reports, the number of injected faults is used to compute the defect density measure for each class. The final database contains, for each class, the values of the four metrics and the defect density attribute. This database is available at https://gitlab.com/dalilaamara/redundancymetrics/.

4.5. Redundancy Metrics Data Collection

We presented in the previous section the procedure to inject faults into the source code of the classes selected from the Commons Math library. To test whether the metrics values can be affected by the injected faults, we computed the redundancy metrics after the fault injection process, adopting the following steps (Table 5):
(i) Step 1: we selected five mutators, namely, Conditionals Boundary (Mut 1), Negate Conditionals (Mut 2), Return Values (Mut 3), Invert Negatives (Mut 4), and Math (Mut 5), among the mutators active by default in the PiTest tool used for fault injection. A brief description of these mutators is given in Table 5; further details are available at https://pitest.org/quickstart/mutators/.
(ii) Step 2: we computed the values of the redundancy metrics, using the metric equations presented in Section 2, for each injected mutator (fault) type.
(iii) Step 3: the constructed dataset contains the values of the redundancy metrics for each type of the selected mutators. It is used to determine whether the values of the redundancy metrics remain unchanged when faults are injected.

4.6. Dataset Analysis

The dataset analysis phase requires other important steps, like data normalization/standardization and correlation analysis. Normalization and standardization are feature scale transformations. According to [51], when there is a large difference between the maximum and minimum values, e.g., 0.01 and 1000, the features should be rescaled to the range [0, 1]. In this study, the used metrics are defined in such a way that they already range between 0 and 1 [18].

Correlation analysis is required to identify the association between the different independent variables (redundancy metrics), in order to retain only significant (not intercorrelated) ones. To test the correlation between the redundancy metrics, we used the Python language. The results are illustrated in Figure 8.

The correlation coefficients between the independent variables are analyzed based on Hopkins' statements [52]:
(i) Correlation coefficients greater than 0.5 are considered large.
(ii) Correlation coefficients between 0.3 and 0.5 are considered moderate.
(iii) Correlation coefficients between 0.1 and 0.3 are considered small.
(iv) Correlation coefficients smaller than 0.1 are considered insubstantial.

Figure 8 indicates a strong, significant correlation between the NI and FSR metrics, with a correlation coefficient of 0.99. Thus, one of these variables, FSR, is omitted from the subsequent analysis.

5. Experiments and Results

In this section, we evaluate the usefulness of the redundancy metrics as predictors of defect density in order to test the stated hypotheses. According to [7, 30], regression techniques are used, since the defect density attribute we aim to predict takes quantitative values.

5.1. Experiments

As mentioned, linear regression (simple and multiple) can be used to predict defect density based on the redundancy metrics. The general forms of the two regression types are as follows:
(i) The general form of the simple linear regression model is

Y = β0 + β1X + ε.  (10)

(ii) The general form of the multiple linear regression model is

Y = β0 + β1X1 + β2X2 + ... + βnXn + ε.  (11)

In these equations, Y represents the dependent variable, the Xi represent the independent variables, the βi are the estimated parameters, and ε is the random error. Using formulas (10) and (11), three main experiments are performed:
(iii) Experiment 1: before studying the linear regression between the redundancy metrics and defect density, we tested whether the values of the redundancy metrics are affected by the injected faults. For each metric, we represented its variation across the five selected mutators described in Table 5.
(iv) Experiment 2: we used univariate linear regression to test hypotheses H1 to H3 presented in Section 4.1.
(v) Experiment 3: multivariate linear regression is used to test hypothesis H4.
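To make the estimation in equation (10) concrete, the following self-contained Java sketch of ours computes the least-squares estimates of β0 and β1 (the study itself relied on standard statistical tooling):

    public class SimpleLinearRegression {
        // Returns {beta0, beta1} minimizing the squared error of y = beta0 + beta1 * x.
        static double[] fit(double[] x, double[] y) {
            int n = x.length;
            double meanX = 0, meanY = 0;
            for (int i = 0; i < n; i++) {
                meanX += x[i];
                meanY += y[i];
            }
            meanX /= n;
            meanY /= n;
            double covXY = 0, varX = 0;
            for (int i = 0; i < n; i++) {
                covXY += (x[i] - meanX) * (y[i] - meanY);
                varX  += (x[i] - meanX) * (x[i] - meanX);
            }
            double beta1 = covXY / varX;          // slope
            double beta0 = meanY - beta1 * meanX; // intercept
            return new double[] { beta0, beta1 };
        }

        public static void main(String[] args) {
            double[] x = { 0.1, 0.2, 0.4, 0.8 };
            double[] y = { 1.0, 2.0, 4.0, 8.0 };
            double[] beta = fit(x, y);
            System.out.printf("beta0=%.3f beta1=%.3f%n", beta[0], beta[1]); // 0.000 10.000
        }
    }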

5.2. Results
5.2.1. Experiment 1: Variation in the Redundancy Metrics with the Injected Faults

We studied the variation in the ISR, FSR, and NI redundancy metrics for each type of injected mutator, on a set of classes selected from the constructed dataset. We focused only on these metrics, since the FR redundancy metric is computed from the maximum entropies of the input and output data, which are insensitive to changes in the variables' states (see equation (6)). Results are depicted in Figure 9.

Figure 9 shows some variation in the values of the redundancy metrics across the different mutators. This variation indicates that mutators of different types affect the redundancy of the source code: the states of the variables used to assess this redundancy change with the injected faults.

Figure 10 illustrates a part of the mutated source code of the BesselJ class.

In line 446, two injected faults of the Math mutator survived; these faults consist of replacing, respectively, the multiplication with a division and the subtraction with an addition. The state of the p variable is therefore modified, which affects the state of the subsequent instructions. In line 464, other types of mutators were injected, namely, Mut 1, Mut 5, and Mut 2. These mutators affect the states of the different variables used by the program, producing a variation in the redundancy metrics, since their values depend on the variables' states.

In experiment 1, the variability of the redundancy metrics across the different mutators is negligible in some classes, as these faults cause only a small change in the number of bits needed to represent the variable states. For instance, in line 446 of Figure 10, if we consider the variables en = 2, plast = 1, x = 1, and pold = 1, then the value of the p variable before and after the injection of the Math mutators (replace multiplication with division and subtraction with addition) equals 1. Thus, the redundancy provided by the state of the p variable is unchanged. In line 465, if we consider l = 7, then the states of the ncalc variable before and after the Math mutator (replace subtraction with addition) are, respectively, 6 and 8. The entropy required to represent each of these states is 3 bits; therefore, there is no variation in the redundancy metrics between the two states (6 and 8). This explains the similar values of the redundancy metrics for the different mutators in Figure 9.

5.2.2. Experiment 2: Univariate Linear Regression

Univariate linear regression uses only one independent variable (one of the redundancy metrics) to predict defect density. We performed univariate linear regressions to test the first three hypotheses separately. The results of this experiment are shown in Table 6.

The results shown in Table 6 are analyzed based on the p value. This measure is the probability of error used as the significance level for accepting or rejecting a hypothesis [7]. Two cases are possible:
(i) If the p value is less than or equal to 0.05, the hypothesis is accepted.
(ii) Otherwise, the hypothesis is rejected.

Taking defect density as the dependent variable and the redundancy metrics as predictors, and based on the previous statements, the results in Table 6 are summarized as follows:
(i) For ISR, the p value is 0.000, less than 0.05. Hypothesis H1 is therefore accepted, which means that the ISR metric can be considered a significant predictor of defect density.
(ii) For FR, the p value is 0.000. Thus, hypothesis H2 is accepted, which indicates that the FR redundancy metric can be considered a significant predictor of the defect density attribute.
(iii) For NI, the p value is 0.000. Hypothesis H3 is also accepted, which means that NI can be considered a significant predictor of the defect density attribute.

5.2.3. Experiment 3: Multivariate Linear Regression

Having performed univariate linear regressions to identify the relationship between each redundancy metric and defect density separately, in this experiment we join these metrics and study their combined effect on defect density. For this, we tested the multivariate regression for hypothesis H4 based on equation (11). Results are summarized in Figure 11.

Taking defect density as the dependent variable and the redundancy metrics as predictors, the results in Figure 11 show the following:
(i) For ISR and NI, the p values are less than 0.05, equal, respectively, to 0.011 and 0.000. Consequently, hypothesis H4, which supposes that the redundancy metrics are jointly useful as defect density predictors, is accepted for ISR and NI.
(ii) For the FR metric, the p value is 0.189, greater than 0.05. This metric is therefore omitted from the multivariate regression, and only ISR and NI are retained as significant predictors of the defect density attribute.

5.3. Model Performance Evaluation

We present in this section the overall evaluation of the linear regression model and a summary of the results:
(i) Model performance evaluation: model evaluation is required to assess the significance of the model and identify whether it fits the data well. Among the performance evaluation measures, the coefficient of determination (R-squared) is the most used [41, 53]. It represents the proportion of variance in the dependent variable that can be predicted from the independent variables.
(ii) Model parameters: the regression beta coefficients (βi) describe the change in the response variable for each 1-unit change in an independent variable.

A summary of the results for the above hypotheses, based on the presented performance measures and model parameters, is given in Table 7.

For the different previous hypotheses and based on the presented performance evaluation measures, Table 7 shows the following:
(i) For H1, H2, and H3, the R-squared values indicate that 38.8%, 47.3%, and 65.8% of the variability of defect density is predicted separately by ISR, FR, and NI, respectively. These moderate R-squared values indicate that using the redundancy metrics as separate independent variables explains the variation in defect density only moderately.
(ii) For H4, the adjusted R-squared shows that ISR and NI jointly predict 73.6% of the variability of defect density. This indicates that, overall, the multiple regression significantly predicts defect density, so the multiple regression between the redundancy metrics and defect density is justified.

To sum up, the combined impact of the ISR and NI redundancy metrics was analyzed, and the results show that the multiple regression is justified: using these metrics jointly improves the prediction of defect density. The multiple linear regression provides a model reflecting the relationship between the redundancy metrics and defect density, depicted by the following equation based on the beta coefficients presented in Table 7:

DD = β0 + 0.727 × ISR + 0.462 × NI,

where β0 is the intercept reported in Table 7.

From the previous statements, we note the following:
(i) For ISR, the regression coefficient is positive and equal to 0.727: for each 1-unit increase in the ISR metric, defect density increases by 0.727 units.
(ii) For NI, the regression coefficient is positive and equal to 0.462: for each 1-unit increase in the NI variable, defect density increases by 0.462 units.
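Applying the fitted model is then straightforward, as in the following sketch of ours (beta0 stands for the intercept reported in Table 7):

    public class DefectDensityModel {
        // Predicted defect density from the multiple regression model;
        // beta0 is the intercept reported in Table 7.
        static double predictDefectDensity(double beta0, double isr, double ni) {
            return beta0 + 0.727 * isr + 0.462 * ni;
        }
    }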

5.4. Discussion and Threats to Validity

Reliability is in general predicted using models developed from two basic elements: software metrics and software faults [8, 54]. The proposed linear regression model is justified and can be used to predict the defect density of new datasets based on their redundancy measures. This defect prediction model can serve as an early quality indicator for developers and testing teams to manage and control test execution activities. Different code alterations yield different values of the ISR and NI redundancy metrics, and the variation in these values can affect the defect density attribute.

We obtained promising results, validating the ISR and NI redundancy metrics as significant reliability indicators. However, we note several threats to validity. First, the proposed redundancy metrics are semantic, as they depend on the program functionality: each program (function or class) has its own state, represented by the manipulated variables. Each time the variables in the program input state change, the output state changes, and the values of the redundancy metrics change too. Therefore, the computing process described in Section 3 is not fully automated and is implemented separately for each program. Second, the larger the training dataset and the better the model parameters are optimized, the better the model prediction performance [55]; our dataset can be extended to enhance the performance of the proposed prediction model. Third, the literature on software metrics validation [7, 10, 12, 13] shows that numerous quality attributes can be used to validate a software metric; hence, other reliability subcharacteristics, like fault-proneness, could be used to show the utility of the redundancy metrics as reliability indicators.

6. Conclusion and Perspectives

The initial state redundancy, final state redundancy, noninjectivity, and functional redundancy metrics were proposed to assess code redundancy in order to monitor software reliability. However, these metrics were only computed manually and presented theoretically. In this research, we aimed to empirically assess these metrics and validate them as significant reliability indicators, using the defect density attribute as a direct reflection of software reliability.

We have built an empirical database including a set of Java classes taken from the Commons Math Library, all related redundancy metrics’ values, and the defect density as a direct reliability indicator. This database has allowed us to empirically assess and validate the redundancy metrics as reliability indicators.

Regression techniques have been used to propose a predictive model based on the defect density attribute as a dependent variable and initial state redundancy and noninjectivity metrics as independent variables.

The proposed model is useful for testers and developers and can be used to predict defect density and to monitor software reliability for further datasets.

As the state redundancy metrics only measure the program redundancy in its initial and final states, without considering the redundancy of its internal states, we propose, in future work, to improve these metrics by taking the internal states into account in order to reflect the overall program redundancy. In addition, we plan to develop an automated support tool for computing the redundancy metrics, to improve the performance of the computing process.

Appendix

A

In this appendix, we present examples of the scripts used to perform the empirical assessment and validation of the redundancy metrics as useful indicators of software reliability.

Figure 3 presents an example of the sizeOfBits function, which computes the entropy (number of bits) needed for the variables used in the different program (function/class) states.

In Figure 5, the different redundancy metrics are computed at the method level. The computing process is the same as at the class level; however, at the function level, only one function and its associated input and output variables are considered.

B

This appendix presents the fault (mutation) injection process. Three main steps are adopted to seed faults into the Java programs, consisting of executing three Maven command lines:
(1) mvn install: installs the Maven packages into the local repository; various actions are printed, ending with a build success result, as shown in Figure 12.
(2) mvn test: compiles and runs the test classes, as shown in Figure 13.
(3) mvn org.pitest:pitest-maven:mutationCoverage: runs the mutations; a build success result is obtained, as shown in Figure 14.

Data Availability

Datasets used to perform our empirical research work are available through https://gitlab.com/dalilaamara/redundancymetrics. This link is also included in the manuscript.

Conflicts of Interest

The authors declare that they have no conflicts of interest.