Abstract

In general, software-testing time may be measured on two kinds of time scales: calendar time and test-execution time. In this paper, we develop two-dimensional software reliability models with these two time measures and incorporate both of them to assess software reliability with higher accuracy. Since the resulting software defect models are based on the familiar nonhomogeneous Poisson processes with two time scales, and are natural extensions of one-dimensional software defect models, it is possible to treat both kinds of time data simultaneously and effectively. We investigate the dependence of test-execution time, regarded as a testing effort, on the software reliability assessment and quantitatively validate the software defect models with two time scales. We also consider an optimization problem of when to stop software testing in terms of the two time measures.

1. Introduction

Reliable software plays a central role in developing dependable and high-assurance computer-based systems. Since software debugging cycle times are often shortened by tighter release schedules, accurate estimation of software reliability becomes more and more important, especially in the earlier testing phases. Software defect models (SDMs) are used to measure quantitative software reliability and to control the software testing process effectively. Since software reliability is defined as the probability that software failures caused by faults do not occur for a specified time period, the time evolution of the failure occurrence process (the fault detection process in software testing) should be modeled by a stochastic counting process. In fact, during the last three decades since the seminal contribution by Jelinski and Moranda [1], a huge number of SDMs have been developed by many authors to help us estimate the number of initial fault contents, understand the effect of faults on software operation, and predict software reliability [2-5]. These models are characterized by the software intensity function, which represents the instantaneous fault-detection rate in the testing phase and is equivalent to the transition rate of the underlying stochastic point process.

When one counts the number of software faults detected in the testing phase, the cumulative number of faults is described by a generalized pure birth process with time- and/or state-dependent transition rate [6, 7]. If the software intensity function depends only on the number of remaining faults, the process reduces to a time-homogeneous continuous-time Markov chain (CTMC). The Jelinski-Moranda SDM [1] is the simplest example of time-homogeneous CTMCs. On the other hand, Goel and Okumoto [8] developed an alternative CTMC model with a time-dependent transition rate, based on the nonhomogeneous Poisson process (NHPP). Since the NHPP has a simple mathematical structure and its analytical treatment is quite easy, many SDMs belonging to this category have been developed under different software debugging scenarios with deterministic intensity functions. Langberg and Singpurwalla [9] provided a bridge between the Jelinski-Moranda SDM [1] and the Goel-Okumoto SDM [8] from the Bayesian point of view. Miller [10] extended Langberg and Singpurwalla's idea to more general cases with the concept of exponential order statistics. Gokhale and Trivedi [11] took account of the testing coverage and proposed a different unification framework for NHPP-based SDMs. Apart from these sophisticated probabilistic approaches, Huang et al. [12] explained the deterministic behavior of NHPP-based SDMs, namely, the mean value functions, by introducing several kinds of mean operations. Pham and Zhang [13] solved a generalized differential equation governing the mean value function and proposed a unified NHPP-based SDM with many parameters.

The above modeling frameworks may unify the NHPP-based SDMs, but they never refer to how to use them effectively in practice. For instance, the software-testing time is usually measured on two kinds of time scales: calendar time (hour, day, week, etc.) and test-execution time (CPU second, CPU minute, CPU hour, etc.). Musa et al. [3] developed the calendar-time theory and gave a relationship between calendar-time modeling and test-execution time modeling in software reliability. Parr [14] used the Rayleigh curve to describe the software development effort (see [15, 16] for other static approaches). An alternative approach based on NHPP-based SDMs was proposed by Yamada et al. [17, 18], Huang and Kuo [19], Huang [20], and Kuo et al. [21]. They defined the mean value functions of NHPPs as functions of test-execution time and developed several testing-effort dependent SDMs. These SDMs are quite simple for practical use, but involve several technical problems in terms of parameter estimation and model validation, as we point out later. Chen et al. [22] took both execution time and code coverage into consideration as factors that influence the cumulative number of software failure occurrences. Grottke [23] considered a number of factors influencing the cumulative number of failure occurrences, such as calendar time, execution time, and code coverage, by specifying their consecutive relationships within the framework of CTMC modeling.

In this paper, we summarize two-dimensional SDMs with two time measures and incorporate both of them to assess quantitative software reliability with higher accuracy. Ishii and Dohi [24] developed two-dimensional NHPP-based SDMs and estimated the model parameters with an elementary least squares method. In the subsequent paper [25], the same authors considered the maximum likelihood estimation for the same SDMs and investigated the goodness-of-fit and predictive performance with real software fault data. Ishii et al. [26] focused on discrete-time models with the number of test cases spent for testing and proposed similar two-dimensional SDMs. The modeling framework developed in [24-26] was widely used to derive analogous SDMs [27, 28]. Since the resulting SDMs are based on the familiar NHPPs with two time scales and are natural extensions of one-dimensional SDMs, it is possible to treat both kinds of time data simultaneously and effectively. In that sense, our two-dimensional SDMs are quite different from that of Grottke [23]. It is also worth noting that the two-dimensional SDMs in [25, 26] should be distinguished from the earlier stochastic models developed in the reliability field [29-33], because they are consistent with the existing one-dimensional NHPP-based SDMs in the literature and include them as special cases. The modeling approach employed here enables us to apply the maximum likelihood method for estimating the model parameters, as well as to incorporate the testing-effort expenditure as one of the testing metrics. As pointed out by Musa et al. [3], the test-execution time is the best time scale to measure the net testing time and effort. On the other hand, the software reliability should be defined on the calendar time based on the operational profile, and the testing control to release the software product will also be made on the calendar time. That is, it is essential to treat the two kinds of time measures in the software reliability assessment. We investigate the dependence of test-execution time as a testing effort on the software reliability assessment and quantitatively validate our new SDMs with two time scales. We also consider an optimization problem of when to stop software testing in terms of the two time measures.

2. One-Dimensional Software Defect Modeling

2.1. NHPP-Based SDMs

We make the following assumptions. (A-1) Each software fault detected in the testing phase is instantly fixed and removed. (A-2) The number of initial fault contents in the program, $N$, is given by a nonnegative discrete random variable having the probability mass function (p.m.f.) $\Pr\{N = n\}$, $n = 0, 1, 2, \ldots$. (A-3) The $i$th software fault can be detected at the independent and identically distributed (i.i.d.) random calendar time $T_i$ ($i = 1, 2, \ldots, N$) having the cumulative distribution function (c.d.f.) $F(t) = \Pr\{T_i \le t\}$.

We call the above c.d.f. the fault-detection probability in this paper. From assumptions (A-1)-(A-3), it can be seen that the total number of software faults detected by an arbitrary calendar time $t$ (day, week, month, etc.), say $X(t)$, is a univariate random variable having the conditional binomial p.m.f.
$$\Pr\{X(t) = m \mid N = n\} = \binom{n}{m} F(t)^{m} \{1 - F(t)\}^{n-m}, \quad m = 0, 1, \ldots, n.$$
If $F(t)$ is the exponential distribution and $T_1, T_2, \ldots, T_N$ are the exponential order statistics sampled from it, then the corresponding stochastic point process is equivalent to the well-known Jelinski-Moranda SDM [1].

In addition, suppose that the initial number of software faults $N$ is unknown and obeys the Poisson distribution with mean $\omega$ ($>0$), that is, $\Pr\{N = n\} = (\omega^{n}/n!)\,e^{-\omega}$. Then the cumulative number of software faults detected by time $t$ is given by
$$\Pr\{X(t) = m\} = \frac{\{\omega F(t)\}^{m}}{m!}\, e^{-\omega F(t)}, \quad m = 0, 1, 2, \ldots.$$
This implies that the probabilistic law of $X(t)$ is governed by the well-known nonhomogeneous Poisson process (NHPP) with mean value function $\Lambda(t) = \omega F(t)$, and that $\mathrm{E}[X(t)] = \Lambda(t)$. By substituting several kinds of fault-detection probabilities into $F(t)$, it is possible to develop a number of NHPP-based SDMs. Goel and Okumoto [8] assumed the exponential distribution $F(t) = 1 - e^{-bt}$ with constant $b$ ($>0$) and developed the exponential NHPP-based SDM with $\Lambda(t) = \omega(1 - e^{-bt})$. Yamada et al. [34] and Goel [35] used the two-phase Erlang distribution and the Weibull distribution and proposed the S-shaped NHPP-based SDM with $\Lambda(t) = \omega\{1 - (1 + bt)e^{-bt}\}$ and the generalized exponential NHPP-based SDM with $\Lambda(t) = \omega(1 - e^{-bt^{c}})$, respectively. In Figure 1, we show the configuration of one-dimensional software debugging with the i.i.d. fault-detection times to obtain the NHPP-based SDMs.
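To make the correspondence between the fault-detection probability $F(t)$ and the mean value function $\Lambda(t) = \omega F(t)$ concrete, the following Python sketch encodes the three one-dimensional SDMs mentioned above; the numerical parameter values are placeholders chosen only for illustration.

```python
import numpy as np

# One-dimensional NHPP-based SDMs: mean value function Lambda(t) = omega * F(t),
# where F(t) is the fault-detection probability (c.d.f. of the detection time).

def mvf_exponential(t, omega, b):
    # Goel-Okumoto SDM: F(t) = 1 - exp(-b t)
    return omega * (1.0 - np.exp(-b * t))

def mvf_s_shaped(t, omega, b):
    # Delayed S-shaped SDM (two-phase Erlang c.d.f.): F(t) = 1 - (1 + b t) exp(-b t)
    return omega * (1.0 - (1.0 + b * t) * np.exp(-b * t))

def mvf_generalized_exponential(t, omega, b, c):
    # Generalized exponential SDM (Weibull c.d.f.): F(t) = 1 - exp(-b t^c)
    return omega * (1.0 - np.exp(-b * t ** c))

if __name__ == "__main__":
    t = np.arange(1, 18)                              # 17 calendar weeks (illustrative)
    print(mvf_exponential(t, omega=60.0, b=0.1))      # placeholder parameters
```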

On the other hand, when the cumulative test-execution time (CPU time) $s$ is observed in the testing phase, it can be used as an alternative to the calendar time. Musa et al. [3] advocated using the test-execution time instead of the calendar time in software reliability modeling and pointed out that execution-time modeling is superior to calendar-time modeling. However, execution-time modeling has limitations for real software reliability assessment and management. To overcome this problem, Musa et al. [3] developed the calendar-time modeling in order to transform the execution time into the calendar time and gave a relationship (deterministic formula) among test-execution time, calendar time, and resource usage parameters through a simple linear regression analysis. Unfortunately, since this is an intuitive model under a specific testing circumstance, it is not applicable to general reliability assessment in practice. In other words, the calendar-time modeling by Musa et al. [3] involves some theoretical problems and has seldom been used in actual software reliability management.

2.2. Testing-Effort Dependent SDMs

It is common to assume that the cumulative test-execution time is regarded as a function of the calendar time, that is, $s = W(t)$. Yamada et al. [17, 18] assumed the fault-detection probability $F(t) = 1 - e^{-rW(t)}$ and proposed the testing-effort dependent NHPP-based SDMs, where $r$ ($>0$) is a constant. More specifically, define the function
$$W(t) = \int_{0}^{t} w(x)\, \mathrm{d}x, \quad (5)$$
where $w(t)$ is called the testing-effort function and means the testing-effort expenditure spent per unit calendar time. From (5), the function $W(t)$ can be considered as the cumulative testing-effort expenditure consumed up to time $t$. Yamada et al. [17, 18] further assumed the Weibull testing-effort function
$$W(t) = \alpha\bigl(1 - e^{-\beta t^{m}}\bigr) \quad (6)$$
with arbitrary parameters $\alpha$ ($>0$), $\beta$ ($>0$), and $m$ ($>0$). Huang and Kuo [19] and Kuo et al. [21] proposed the logistic testing-effort function
$$W(t) = \frac{\gamma}{1 + A e^{-\kappa t}}. \quad (7)$$
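As a small illustration of the testing-effort dependent formulation, the following Python sketch evaluates the Weibull and logistic cumulative testing-effort functions in (6) and (7) and the induced mean value function $\omega(1 - e^{-rW(t)})$; all parameter values are placeholders, not estimates from real data.

```python
import numpy as np

def weibull_effort(t, alpha, beta, m):
    # Cumulative Weibull testing-effort (6): W(t) = alpha * (1 - exp(-beta * t**m))
    return alpha * (1.0 - np.exp(-beta * t ** m))

def logistic_effort(t, gamma, A, kappa):
    # Cumulative logistic testing-effort (7): W(t) = gamma / (1 + A * exp(-kappa * t))
    return gamma / (1.0 + A * np.exp(-kappa * t))

def mvf_effort_dependent(t, omega, r, effort, **effort_params):
    # Testing-effort dependent SDM: Lambda(t) = omega * (1 - exp(-r * W(t)))
    return omega * (1.0 - np.exp(-r * effort(t, **effort_params)))

# Example with placeholder parameters:
t = np.arange(1, 18)
print(mvf_effort_dependent(t, omega=60.0, r=0.05,
                           effort=weibull_effort, alpha=40.0, beta=0.1, m=1.5))
```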

Let $t_k$ ($= k$), $s_k$, and $y_k$ denote the calendar time, the test-execution time, and the cumulative number of software faults detected by the $k$th calendar week, respectively. Given the fault data $(t_k, s_k, y_k)$ ($k = 1, 2, \ldots, K$), the parameters $(\alpha, \beta, m)$ or $(\gamma, A, \kappa)$ in (6) and (7) are estimated by means of regression analysis with the data $(t_k, s_k)$, and the remaining parameters $(\omega, r)$ are estimated by the method of maximum likelihood with the data $(t_k, y_k)$. Since this two-stage method is a combination of the least squares method and the maximum likelihood method, the resulting estimates have no rigorous justification in terms of statistical theory. That is, the maximum likelihood estimates of $(\omega, r)$ with the testing-effort parameters fixed do not possess the invariance property and/or asymptotic efficiency. In addition, since the testing resource is finite and $W(\infty) < \infty$, it is evident that $\lim_{t \to \infty} F(t) = 1 - e^{-rW(\infty)} < 1$, so that the fault-detection probability is defective [36]. This implies that the number of software faults eventually detected is a Poisson distributed random variable with mean $\omega\{1 - e^{-rW(\infty)}\}$ ($< \omega$), which is inconsistent with the model assumptions in Section 2.1. In the testing-effort dependent modeling, the fundamental idea is that the test-execution time is given by a function of the calendar time, $s = W(t)$. However, if the execution time $s$ can be observed directly, substituting $W(t) = s$ formally leads to $\Lambda = \omega(1 - e^{-rs})$, which is equivalent to the mean value function of the simple exponential NHPP-based SDM with test-execution time $s$. Consequently, if one uses only the execution time $s$, the model reduces to execution-time modeling based on the exponential NHPP and is independent of the calendar-time data. In this way, the classical testing-effort dependent SDMs [17-19, 21] provide no satisfactory framework to deal with the two time measures consistently.

3. Two-Dimensional SDMs

3.1. Bivariate Fault-Detection Time Distributions

To cope with the two time scales $(t, s)$ simultaneously, we extend the above univariate binomial modeling framework to a bivariate one. Let $T$ and $S$ be the fault-detection times measured by the calendar time (week) and the test-execution time (CPU hr), respectively. Suppose that $(T, S)$ is a continuous bivariate random variable having the bivariate c.d.f. [37, 38]
$$F(t, s) = \Pr\{T \le t,\, S \le s\},$$
where $t \ge 0$ and $s \ge 0$. For convenience, let $F_T(t) = F(t, \infty)$ and $F_S(s) = F(\infty, s)$ denote the marginal distributions. It is worth noting for the bivariate probability distribution [37] that
$$\max\{F_T(t) + F_S(s) - 1,\, 0\} \le F(t, s) \le \min\{F_T(t),\, F_S(s)\}.$$
Also, it can be easily checked that the bivariate c.d.f. cannot be determined uniquely by its marginal distributions alone, that is, infinitely many bivariate c.d.f.s with the same marginal distributions can be defined (see [39-43]).

In this paper, we focus on the two simplest cases where $(T, S)$ obeys the bivariate exponential distribution by Marshall and Olkin (Marshall-Olkin distribution) [42, 43] or the bivariate Weibull distribution by Lu and Bhattacharyya [44], because the objective here is not to develop a large number of SDMs with different fault-detection probabilities. (i) Marshall-Olkin distribution:
$$F(t, s) = 1 - e^{-(\lambda_1 + \lambda_0)t} - e^{-(\lambda_2 + \lambda_0)s} + e^{-\lambda_1 t - \lambda_2 s - \lambda_0 \max(t, s)}. \quad (11)$$
(ii) Bivariate Weibull distribution:
$$\bar{F}(t, s) = \Pr\{T > t,\, S > s\} = \exp\left[-\left\{\left(\frac{t}{\theta_1}\right)^{\beta_1/\delta} + \left(\frac{s}{\theta_2}\right)^{\beta_2/\delta}\right\}^{\delta}\right], \quad (12)$$
where $\lambda_1$ ($>0$), $\lambda_2$ ($>0$), $\lambda_0$ ($\ge 0$), $\theta_1$ ($>0$), $\theta_2$ ($>0$), $\beta_1$ ($>0$), $\beta_2$ ($>0$), and $\delta$ ($0 < \delta \le 1$) are arbitrary parameters. Figure 2 depicts the configuration of two-dimensional software debugging with the two time measures.
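The two bivariate fault-detection probabilities can be evaluated directly. The following Python sketch implements the Marshall-Olkin c.d.f. (11) and the bivariate Weibull c.d.f. obtained from the survivor function in (12) by inclusion-exclusion; it is an illustrative sketch under the parameterization given above, and the function and parameter names are ours.

```python
import numpy as np

def marshall_olkin_cdf(t, s, lam1, lam2, lam0):
    # F(t,s) = 1 - exp(-(lam1+lam0) t) - exp(-(lam2+lam0) s)
    #          + exp(-lam1 t - lam2 s - lam0 max(t, s))
    joint_surv = np.exp(-lam1 * t - lam2 * s - lam0 * np.maximum(t, s))
    return 1.0 - np.exp(-(lam1 + lam0) * t) - np.exp(-(lam2 + lam0) * s) + joint_surv

def bivariate_weibull_cdf(t, s, theta1, theta2, beta1, beta2, delta):
    # Survivor function: Fbar(t,s) = exp{-[(t/theta1)^(beta1/delta)
    #                                      + (s/theta2)^(beta2/delta)]^delta}
    def surv(u, v):
        return np.exp(-((u / theta1) ** (beta1 / delta)
                        + (v / theta2) ** (beta2 / delta)) ** delta)
    # c.d.f. by inclusion-exclusion with the marginal survivor functions
    return 1.0 - surv(t, 0.0) - surv(0.0, s) + surv(t, s)
```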

For the initial fault contents $N$, let $(T_i, S_i)$ ($i = 1, 2, \ldots, N$) be the i.i.d. bivariate fault-detection times and $X(t, s)$ the number of software faults detected by the time point $(t, s)$. Since each software fault-detection time distribution is given by $F(t, s)$, from assumptions (A-1)-(A-3), it is immediate to see that
$$\Pr\{X(t, s) = m\} = \frac{\{\omega F(t, s)\}^{m}}{m!}\, e^{-\omega F(t, s)}, \quad m = 0, 1, 2, \ldots.$$
This is a two-dimensional NHPP with the two time scales $t$ and $s$. In this paper, we call this type of NHPP the two-dimensional SDM with mean value function $\Lambda(t, s) = \omega F(t, s)$.

It is reasonable to assume that the test-execution time is proportional to the number of test cases. In the two-dimensional SDM, if either the ratio $s/t$ or $t/s$ takes an extremely large value, that is, if a great number of test cases are consumed during an extremely short period or very few test cases are tried over an extremely long period, then the fault-detection probability becomes small. This agrees with our intuition about the relationship between the testing time and the number of test cases. It can be found that $F(t, s) = F_T(t) F_S(s)$ and $\Lambda(t, s) = \omega F_T(t) F_S(s)$ if $T$ and $S$ are statistically independent of each other. This means that the cumulative number of test cases has no correlation with the length of the testing period measured by calendar time, which seems quite unreasonable. In fact, we expect in modeling that the regression functions $\mathrm{E}[S \mid T = t]$ and $\mathrm{E}[T \mid S = s]$ are both increasing functions of $t$ and $s$, respectively. This is the main reason to use the Marshall-Olkin distribution [42, 43], because the other bivariate exponential distributions with an explicit form of the c.d.f., such as the Gumbel distribution [40], have negative correlation [27, 28]. In the Marshall-Olkin distribution, we have the nonnegative correlation coefficient
$$\mathrm{corr}(T, S) = \frac{\lambda_0}{\lambda_1 + \lambda_2 + \lambda_0} \ (\ge 0),$$
and the regression functions $\mathrm{E}[S \mid T = t]$ and $\mathrm{E}[T \mid S = s]$ are available in closed form. Also, it holds in this case that $\Pr\{T = S\} = \lambda_0/(\lambda_1 + \lambda_2 + \lambda_0)$, which is positive whenever $\lambda_0 > 0$, so that a software fault can be detected with positive probability at the time point $t = s$.

On the other hand, for the bivariate Weibull distribution in (12), the explicit forms of the regression functions $\mathrm{E}[S \mid T = t]$ and $\mathrm{E}[T \mid S = s]$ are not available, but they can be calculated numerically. These regression functions may be used to predict the future cumulative test-execution time at an arbitrary calendar time, after estimating the model parameters in $F(t, s)$.

3.2. Maximum Likelihood Estimation

In the seminal paper [24], we applied the least squares estimation to determine the model parameters. However, an advantage of the two-dimensional SDMs is that the model parameters can easily be estimated by means of the maximum likelihood method. Let $y_k$ denote the cumulative number of software faults detected by the $k$th calendar week. Given the fault data $(t_k, s_k, y_k)$ ($k = 1, 2, \ldots, K$), the problem is to estimate the model parameter $\boldsymbol{\theta}$, where $\boldsymbol{\theta} = (\omega, \lambda_1, \lambda_2, \lambda_0)$ and $\boldsymbol{\theta} = (\omega, \theta_1, \theta_2, \beta_1, \beta_2, \delta)$ for the Marshall-Olkin distribution and the bivariate Weibull distribution, respectively. Then, the likelihood function is given by
$$L(\boldsymbol{\theta}) = \prod_{k=1}^{K} \frac{\{\Lambda(t_k, s_k) - \Lambda(t_{k-1}, s_{k-1})\}^{y_k - y_{k-1}}}{(y_k - y_{k-1})!}\; e^{-\Lambda(t_K, s_K)}. \quad (17)$$
Taking the logarithm of both sides of (17) yields the following logarithmic likelihood function:
$$\ln L(\boldsymbol{\theta}) = \sum_{k=1}^{K} \Bigl[(y_k - y_{k-1}) \ln\{\Lambda(t_k, s_k) - \Lambda(t_{k-1}, s_{k-1})\} - \ln\{(y_k - y_{k-1})!\}\Bigr] - \Lambda(t_K, s_K),$$
where $(t_0, s_0, y_0) = (0, 0, 0)$.

Since the logarithmic function is monotonically increasing, the maximum likelihood estimate of the parameter $\boldsymbol{\theta}$ maximizes the logarithmic likelihood function $\ln L(\boldsymbol{\theta})$. If the function $\ln L(\boldsymbol{\theta})$ is strictly concave in $\boldsymbol{\theta}$, then the maximum likelihood estimate has to satisfy the first-order condition of optimality $\partial \ln L(\boldsymbol{\theta})/\partial \boldsymbol{\theta} = \mathbf{0}$, where $\mathbf{0}$ is the zero vector. The above modeling approach provides a valid parameter estimation procedure based on the maximum likelihood method, without the two-stage procedure based on the least squares method.
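As a sketch of how the estimation can be carried out numerically (this is not the authors' original implementation), the grouped-data logarithmic likelihood above can be maximized, for example, with scipy.optimize; the fault-count data below are placeholders rather than DS no. 1 or DS no. 2.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import gammaln

def mo_mvf(t, s, omega, lam1, lam2, lam0):
    # Two-dimensional mean value function Lambda(t,s) = omega * F(t,s)
    # with the Marshall-Olkin bivariate exponential c.d.f.
    F = (1.0 - np.exp(-(lam1 + lam0) * t) - np.exp(-(lam2 + lam0) * s)
         + np.exp(-lam1 * t - lam2 * s - lam0 * np.maximum(t, s)))
    return omega * F

def neg_log_likelihood(log_theta, t, s, y):
    # Grouped-data NHPP log-likelihood; parameters are optimized on the log
    # scale so that they stay positive.
    omega, lam1, lam2, lam0 = np.exp(log_theta)
    L = mo_mvf(t, s, omega, lam1, lam2, lam0)
    dL = np.diff(np.concatenate(([0.0], L)))    # Lambda(t_k,s_k) - Lambda(t_{k-1},s_{k-1})
    dy = np.diff(np.concatenate(([0], y)))      # y_k - y_{k-1}
    dL = np.maximum(dL, 1e-12)                  # numerical safeguard
    return -(np.sum(dy * np.log(dL) - gammaln(dy + 1.0)) - L[-1])

# Placeholder data: t_k in weeks, s_k in CPU hours, y_k cumulative detected faults.
t = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
s = np.array([0.5, 1.5, 3.0, 5.0, 6.5])
y = np.array([4, 9, 15, 19, 22])

res = minimize(neg_log_likelihood, x0=np.log([30.0, 0.1, 0.1, 0.05]),
               args=(t, s, y), method="Nelder-Mead")
print(np.exp(res.x))   # estimates of (omega, lam1, lam2, lam0)
```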

4. Software Release Planning

Our next concern is the formulation of the optimal software release problem (e.g., [45-49]) based on the two-dimensional SDMs. The main objective here is to derive simultaneously the optimal software release time measured by calendar time and the optimal testing-effort expenditure, which minimize the relevant expected software cost. This leads to the determination of an economic software testing schedule. Define:
$t_0$: software release time measured by calendar time,
$s_0$: cumulative test-execution time measured by CPU time,
$c_1$: software-testing cost per unit calendar time,
$c_2$: software-testing cost per unit execution time,
$c_3$: software-fault correction/removal cost per unit fault in the testing phase,
$c_4$: software-fault correction/removal cost per unit fault in the operational phase,
$T_L$: lifetime of the software product (given constant).

During the testing phase, two kinds of testing costs, proportional to the calendar time and the test-execution time, respectively, are considered, where $c_1$ is the cost to keep the testing team or testing personnel and $c_2$ is the cost to develop the test cases. Here, it is assumed again that the development cost of the test cases is proportional to the length of the software test-execution time. On the other hand, when a software fault is detected in the testing phase or the operational phase, the unit costs to fix/remove it are given by $c_3$ and $c_4$, respectively. The expected debugging cost during the testing phase is $c_3 \Lambda(t_0, s_0)$ with fixed $(t_0, s_0)$. If the lifetime $T_L$ of the software product is given in advance, then the expected number of software faults detected in the operational phase is given by $\Lambda(t_0 + T_L, s_0) - \Lambda(t_0, s_0)$, that is, the expected debugging cost during the operational phase becomes $c_4\{\Lambda(t_0 + T_L, s_0) - \Lambda(t_0, s_0)\}$. Thus, the expected total software cost during the software life cycle is given by
$$C(t_0, s_0) = c_1 t_0 + c_2 s_0 + c_3 \Lambda(t_0, s_0) + c_4\{\Lambda(t_0 + T_L, s_0) - \Lambda(t_0, s_0)\}.$$
If $T_L \to \infty$ as a special case, then $\lim_{T_L \to \infty} \Lambda(t_0 + T_L, s_0) = \omega F_S(s_0)$, and the underlying optimization problem can be simplified as
$$\min_{t_0 \ge 0,\, s_0 \ge 0}\; c_1 t_0 + c_2 s_0 + c_3 \Lambda(t_0, s_0) + c_4\{\omega F_S(s_0) - \Lambda(t_0, s_0)\}.$$

For the expected total software cost during the software life cycle, the optimal software release policy is defined by the optimal pair $(t_0^{*}, s_0^{*})$ which minimizes it. That is, the pair $(t_0^{*}, s_0^{*})$ is the minimizer of $C(t_0, s_0)$, provided that the associated Hessian matrix is positive definite at $(t_0^{*}, s_0^{*})$. In this situation, a unique interior solution, if it exists, must satisfy the first-order condition of optimality $\partial C(t_0, s_0)/\partial t_0 = \partial C(t_0, s_0)/\partial s_0 = 0$. Fortunately, it can be shown that the two-dimensional minimization problem is rather tractable. Figure 3 illustrates the behavior of the expected total software cost function with respect to $(t_0, s_0)$ for a representative set of cost parameters $(c_1, c_2, c_3, c_4)$, lifetime $T_L$, and Marshall-Olkin SDM parameters. It can be easily seen that the function $C(t_0, s_0)$ is unimodal in $(t_0, s_0)$ and that there exists a unique optimal software release policy $(t_0^{*}, s_0^{*})$. In this simple example, the time length of software testing should be planned with the release time $t_0^{*}$ (weeks) and the test-execution time $s_0^{*}$ (CPU hr). When the software delivery schedule is fixed, $t_0$ is replaced by the delivery time, and the underlying problem reduces to the simpler one-dimensional minimization problem with respect to $s_0$ for the given $t_0$.
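A minimal numerical sketch of the release-planning step follows: it evaluates the expected total software cost $C(t_0, s_0)$ defined above on a grid of candidate release times and testing-effort expenditures and returns the minimizing pair. All cost and model parameter values are illustrative placeholders and are not the values used for Figure 3.

```python
import numpy as np

def mo_mvf(t, s, omega, lam1, lam2, lam0):
    # Lambda(t,s) = omega * F(t,s) for the Marshall-Olkin SDM
    F = (1.0 - np.exp(-(lam1 + lam0) * t) - np.exp(-(lam2 + lam0) * s)
         + np.exp(-lam1 * t - lam2 * s - lam0 * np.maximum(t, s)))
    return omega * F

def expected_cost(t0, s0, c1, c2, c3, c4, TL, theta):
    # C(t0,s0) = c1*t0 + c2*s0 + c3*Lambda(t0,s0)
    #            + c4*[Lambda(t0+TL, s0) - Lambda(t0,s0)]
    testing = mo_mvf(t0, s0, *theta)
    operational = mo_mvf(t0 + TL, s0, *theta) - testing
    return c1 * t0 + c2 * s0 + c3 * testing + c4 * operational

# Grid search over candidate release times (weeks) and test-execution times (CPU hr).
theta = (60.0, 0.05, 0.1, 0.02)              # placeholder (omega, lam1, lam2, lam0)
t_grid = np.linspace(1.0, 40.0, 200)
s_grid = np.linspace(1.0, 60.0, 200)
T, S = np.meshgrid(t_grid, s_grid, indexing="ij")
C = expected_cost(T, S, c1=10.0, c2=5.0, c3=1.0, c4=50.0, TL=100.0, theta=theta)
i, j = np.unravel_index(np.argmin(C), C.shape)
print("optimal (t0, s0) =", (t_grid[i], s_grid[j]), "minimum cost =", C[i, j])
```

In this sketch a simple grid search is used because the cost surface is cheap to evaluate; any standard two-dimensional optimizer could be substituted once the unimodality of the cost function has been checked.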

5. Real Data Analysis

5.1. Data Sets

In this section, we consider two real data sets collected in actual software development projects for real-time command and control systems [50]. We present these data in Tables 1 and 2, where the data set DS no. 1 (DS no. 2) consists of fault-count data with 54 (53) faults detected during 17 (16) weeks and a total of 32.8 (18.5) CPU hours of test-execution time. More precisely, DS no. 1 is the fault data for a middle-sized program with 27.7 KLOC of a real-time command and control system, and DS no. 2 is the fault data for a similar program with 33.5 KLOC.

5.2. Goodness-of-Fit Test

Using these two data sets, we derive the maximum likelihood estimate of the parameter $\boldsymbol{\theta}$, say $\hat{\boldsymbol{\theta}}$, and perform the goodness-of-fit test of the two-dimensional SDMs with the maximum logarithmic likelihood (LLF), the Akaike information criterion (AIC), and the residual sum of squares (RSS) at the data observation points of 100%, 75%, 50%, and 25%. Tables 3 and 4 present the goodness-of-fit results for the data sets DS no. 1 and DS no. 2, respectively, where MO, BW, GO, GL, LT, and WT denote the Marshall-Olkin SDM and the bivariate Weibull SDM among the two-dimensional NHPPs, and the one-dimensional exponential SDM, the one-dimensional generalized exponential SDM, the one-dimensional logistic testing-effort dependent SDM, and the one-dimensional Weibull testing-effort dependent SDM, respectively. From these results, it can be seen that the two-dimensional SDMs provide better goodness-of-fit performance in terms of AIC and RSS in many cases. Of course, it is worth noting that the comparison between the two-dimensional and the one-dimensional SDMs is not entirely fair, because the corresponding AICs measure the approximate distances, in the sense of distribution, between the real (but unknown) software fault-count process and probability models with different dimensions. In other words, the goodness-of-fit comparison based on the AIC should be made among SDMs with the same dimension. In that comparison, the BW shows better goodness-of-fit performance than the MO in almost all cases. In Figures 4 and 5, we show the temporal behavior of the estimates of the cumulative number of detected software faults with 100% of the data of DS no. 1 and DS no. 2, respectively. It is observed that the two-dimensional SDMs clarify the relationship between the calendar time and the testing-effort expenditure and enable us to measure the software testing progress effectively.

5.3. Prediction Performance

Next, we predict the behavior of the number of software faults detected in the future. Two possible cases are considered. In Case 1, the model parameters are estimated at the observation point $(t_K, s_K)$, and at each future calendar time $t_k$ ($k > K$) the testing-effort expenditure is either scheduled in advance (given) or estimated with the regression function $\mathrm{E}[S \mid T = t_k]$ (regression). In Case 2, we consider the other situation where, after estimating the model parameters at the observation point $(t_K, s_K)$, the testing-effort expenditure to consume the scheduled test cases, $s_k$ ($k > K$), is known in advance, but the calendar time to complete the respective test cases is estimated with the regression function $\mathrm{E}[T \mid S = s_k]$. Tables 5 and 6 (Tables 7 and 8) present the predictive squared error (PSE) of the estimates of the cumulative number of software faults detected in the future, where the PSE is the averaged squared error between the observed and estimated cumulative numbers of faults at the unobserved points, and $n$ is the total number of data for analysis. In this example, we set $n = 17$ and $16$ for DS no. 1 and DS no. 2, respectively. From these results, the two-dimensional SDMs show better prediction performance at the 75% observation point. However, at the other observation points, no remarkable differences are found, so that the prediction performance depends on the kind of data.
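A small sketch of the prediction-error computation is given below; it assumes, as a working definition, that the PSE averages the squared differences between the observed and predicted cumulative fault counts over the predicted points, and the arrays and the mean value function passed in are placeholders.

```python
import numpy as np

def predict_cumulative_faults(t_future, s_future, theta, mvf):
    # Case 1 / Case 2 prediction: plug the scheduled (or regressed) time pairs
    # into the fitted mean value function Lambda(t, s; theta_hat), e.g. mo_mvf above.
    return mvf(np.asarray(t_future), np.asarray(s_future), *theta)

def pse(y_observed, y_predicted):
    # Working definition: averaged squared error over the predicted points.
    d = np.asarray(y_observed, dtype=float) - np.asarray(y_predicted, dtype=float)
    return np.mean(d ** 2)
```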

Next, we assess the quantitative software reliability as the probability that no software failure occurs during a specified time period. When the software is released at the observation point $(t_K, s_K)$, the software reliability for an operational period of length $x$ is defined by
$$R(x \mid t_K, s_K) = \exp\bigl[-\{\Lambda(t_K + x, s_K) - \Lambda(t_K, s_K)\}\bigr].$$
Once the software is released at the calendar time $t_K$ to the user or market, it should be noted that no software test is carried out in general, so that the cumulative testing-effort expenditure is regarded as a constant ($s = s_K$) and the software reliability is a function of only the calendar time. Figures 6 and 7 illustrate the estimation results of the software reliability at the 100% observation point. From these figures, it can be found that the two-dimensional SDMs (MO and BW) estimate the reliability to be much higher than the typical one-dimensional SDMs do (note that the testing-effort dependent SDMs are not available in the situation with constant testing-effort expenditure after release, because $W(t)$ is an increasing function of the calendar time). Since the corresponding software products were actually released at the observation points, however, it can be concluded that the values of software reliability based on the one-dimensional SDMs are too small and rather questionable.
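For completeness, a small sketch of the reliability evaluation is given below: after release at $(t_K, s_K)$ the testing effort is held constant, and $R(x \mid t_K, s_K)$ is evaluated as a function of the operational time $x$. The fitted parameter values are placeholders, not actual estimates obtained from the data sets.

```python
import numpy as np

def mo_mvf(t, s, omega, lam1, lam2, lam0):
    # Lambda(t,s) = omega * F(t,s) for the Marshall-Olkin SDM
    F = (1.0 - np.exp(-(lam1 + lam0) * t) - np.exp(-(lam2 + lam0) * s)
         + np.exp(-lam1 * t - lam2 * s - lam0 * np.maximum(t, s)))
    return omega * F

def reliability(x, tK, sK, theta):
    # R(x | tK, sK): probability of no failure in the operational interval
    # (tK, tK + x], with the testing effort frozen at sK after release.
    return np.exp(-(mo_mvf(tK + x, sK, *theta) - mo_mvf(tK, sK, *theta)))

theta_hat = (60.0, 0.05, 0.1, 0.02)   # placeholder estimates of (omega, lam1, lam2, lam0)
print(reliability(np.array([0.5, 1.0, 2.0]), tK=17.0, sK=32.8, theta=theta_hat))
```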

5.4. Optimal Software Release Policy

Finally, we investigate the dependence of the cost parameters characterizing the software testing process on the optimal software release policy (Tables 9 and 10). In Tables 11 and 12, we present the optimal software release policies minimizing the expected total software cost with the two-dimensional SDMs proposed in this paper, where the Marshall-Olkin SDM is assumed with the same parameters as in Figure 3. In these tables, the policy $(t_0^{*}, s_0^{*}) = (0, 0)$ denotes that it is economically optimal not to carry out the software test at all. Of course, such a situation, with a relatively small difference between the fault correction/removal costs in the testing and operational phases, will not be realistic. As a remarkable point in these tables, as both cost ratios increase, the associated optimal pair $(t_0^{*}, s_0^{*})$ monotonically decreases; on the contrary, the minimized expected total software cost increases in one of the cost ratios when the other is fixed, but decreases in the other.

6. Conclusions

In this paper, we have developed two-dimensional software reliability models based on bivariate fault-detection time distributions involving both calendar time and test-execution time, and have compared their goodness-of-fit and prediction performance with real project data. The numerical examples have suggested that our new SDMs provide better performance than the existing NHPP-based SDMs in many cases.

In the future, we will apply the same technique to cases with many factors influencing the cumulative number of failure occurrences. For instance, when the code coverage is considered, as in Grottke [23], a different modeling framework will be needed to incorporate it. Also, if a third time measure such as the number of test cases is introduced, higher-dimensional SDMs should be developed to handle such a complex case.

Acknowledgment

The present research was partially supported by the Ministry of Education, Science, Sports and Culture, Grant-in-Aid for Scientific Research (C), Grant no. 23510171 (2011–2013). This work is an extended version of our conference paper [25].