Abstract

Most Software Reliability Growth Models (SRGMs) based on the Nonhomogeneous Poisson Process (NHPP) assume either perfect or imperfect debugging. However, environmental factors introduce great uncertainty into the debugging process during the development and testing phases. We propose a novel NHPP model based on a partial differential equation (PDE) to quantify the uncertainties associated with the perfect or imperfect debugging process. We represent the environmental uncertainties collectively as a noise of arbitrary correlation. Under the new stochastic framework, one can compute the full statistical information of the debugging process, for example, its probability density function (PDF). Through a number of comparisons with historical data and existing methods, such as the classic NHPP model, the proposed model exhibits a closer fit to observations. In addition to the conventional focus on the mean number of detected faults, the newly derived full statistical information can further help software developers make decisions on system maintenance and risk assessment.

1. Introduction

Software reliability, defined as the probability of failure-free operation under specified conditions within a specified time [1], is one of the most significant attributes of the software development life cycle. As software systems mature with ever growing complexity, evaluating, predicting, and improving their reliability becomes a crucial and daunting task for developers in both the development and testing phases. Numerous Software Reliability Growth Models (SRGMs) have been developed [2-5], which generally agree that the fault debugging process follows a Nonhomogeneous Poisson Process (NHPP) but diverge on a key assumption: perfect versus imperfect debugging [6, 7].

A common assumption is that removing detected faults will not introduce new faults, a scenario usually called perfect debugging. Based on the earlier work of Jelinski and Moranda [8], Goel and Okumoto [9] developed the exponential Software Reliability Growth Model (G-O model) with a constant fault detection rate. Later models incorporate smoothly varying fault detection rates. Yamada et al. [10] observed that fault detection follows an S-shaped growth curve and proposed the delayed S-shaped SRGM. Incorporating the learning phenomenon in the software fault detection process, Ohba [11] later developed the alternative inflection S-shaped SRGM. Yamada et al. [12] further incorporated the relationship between testing effort and the number of detected faults into the model. Huang et al. [13] noted that the fault detection and repair processes differ between software development and operation; they proposed a unified theory to merge these considerations into an NHPP-based SRGM and found a regularly changing fault detection rate in the detection process. Later, Huang and Lyu [6] also used multiple change points to deal with imperfect debugging problems. However, given the complexity of the software debugging process, debugging activity may be imperfect in practical software development environments, and thus these perfect-debugging models may oversimplify the underlying dynamics.

To address this limitation, later models incorporate imperfect debugging. During development and testing, new errors can be introduced into the software and detected faults may not be corrected perfectly; this phenomenon is known as imperfect debugging. According to these two phenomena, SRGMs can be divided into two categories: perfect and imperfect debugging models. Kapur and Garg [14] discussed a fault removal phenomenon in which, as the team gains experience, it may remove detected faults without introducing new ones; in practice, however, the testing team may not correct a fault perfectly, so the original fault may remain or generate new faults. Goel [15] first introduced the concept of imperfect debugging. Later, Ohba and Chou [16] developed an error generation model based on the G-O model, known as an imperfect debugging model. Pham et al. [17] proposed a general imperfect software debugging model (P-N-Z model). Pham [18] also developed an SRGM for multiple failure types incorporating error generation. Zhang et al. [19] proposed a testing efficiency model that includes both imperfect debugging and error generation. Kapur et al. [20] proposed a flexible SRGM with imperfect debugging and error generation using a logistic function for the fault detection rate, which reflects the efficiency of the testing and removal team. Employing the learning phenomenon in the software fault detection process, Yamada et al. [21] also developed an imperfect debugging model. In imperfect software debugging models, the fault introduction rate per fault is generally assumed to be constant or decreasing over time [22]. Recently, Wang et al. [22] also proposed a model representing the imperfect debugging process with a log-logistic distribution. To sum up, most models presume a deterministic relationship between the cumulative number of detected faults and the time span of the software fault debugging process.

However, software debugging is a stochastic and uncertain process, which can be affected by many environmental factors, such as the distribution of resources, strategy, and running environment [23]. In particular, in an uncertain network environment, the aforementioned deterministic assumptions become problematic. The environmental noise introduces great uncertainties that significantly affect the traditional debugging process. To address such challenges, new stochastic models [24-27] were proposed, which treat the debugging process as perfect and stochastic; they assume each failure occurrence is independent and follows an identical random distribution. By collectively modelling the environmental factors as white noise, that is, temporally uncorrelated random fluctuations, they used a flexible stochastic differential equation to model the irregular changes. Compared to conventional models, such a white-noise-based approach assumes perfect debugging and provides a closer approximation to the uncertain fluctuations in reality, while retaining great mathematical simplicity. However, debugging activity is usually imperfect in practical software development, and recent data [28, 29] show that fault detection is highly susceptible to noise that is generally correlated in time; thus the earlier assumptions become problematic. Because of its mathematical simplicity, the white-noise approach may also considerably misrepresent the imperfect debugging process and the temporal correlation in a dynamic environment.

In this study, we propose an alternative stochastic framework based on a partial differential equation (PDE) to describe the perfect/imperfect debugging process, in which the environmental noise is of arbitrary correlation. Details of the model are provided in Section 2, together with an equation for the probability density function (PDF) of the system states. In Section 3, we validate our approach with both historical failure data and results from conventional methods; main features of the model, including the temporal evolution of its full statistical information, are addressed in Section 4. The final conclusions are given in Section 5.

2. Problem Formulation

2.1. Model Formulation

Widely used in many practical applications, the NHPP model assumes the following:
(1) the fault debugging process is modeled as a stochastic process within a continuous state space;
(2) the system is subject to failures at random times caused by the remaining errors in the system;
(3) the mean number of detected faults in a time interval is proportional to the mean number of remaining faults;
(4) when a fault is detected, it may be removed with probability $p$;
(5) when a fault is removed, new faults may be generated with a constant probability $q$.

In order to describe the stochastic and uncertain behavior of the fault debugging process in a continuous state space, the temporal evolution of $m(t)$ is routinely described as [12]
$$\frac{dm(t)}{dt} = b(t)\,\big[N - p\,m(t) + q\,m(t)\big], \quad (1)$$
where $m(t)$ is the expected number of faults detected in the time interval $(0, t]$, $N$ denotes the total number of faults, and $b(t)$ is a nonnegative function representing the fault detection rate per remaining fault at testing time $t$. We further note that $q\,m(t)$ is the number of newly introduced bugs, while $p\,m(t)$ is the number of successful removals; together they represent a (perfect/imperfect) debugging process in which the probabilities $p$ and $q$ are assigned [30-32]. Without loss of generality, we denote $r = p - q$ to facilitate the subsequent presentation; $r = 1$ indicates a perfect debugging process.
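As an illustration of the mean-field dynamics only (not part of the original derivation), the sketch below integrates (1) numerically. The symbols N, p, q, and b(t) follow the reconstruction above, and all numerical values are hypothetical.

```python
import numpy as np
from scipy.integrate import solve_ivp

# Illustrative sketch of the mean-field debugging model (1):
#   dm/dt = b(t) * (N - p*m + q*m)
# N: total number of faults, p: removal probability,
# q: introduction probability, b(t): detection rate per remaining fault.
# All numerical values are hypothetical.
N, p, q = 100.0, 0.95, 0.05

def b(t):
    # Constant detection rate for simplicity; a generalised G-O rate
    # such as b*c*t**(c-1) could be substituted here.
    return 0.1

def rhs(t, m):
    return [b(t) * (N - p * m[0] + q * m[0])]

sol = solve_ivp(rhs, (0.0, 50.0), [0.0])
print(sol.y[0][-1])  # expected number of detected faults at t = 50
```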

Since $b(t)$ is subject to a number of random environmental effects [23], we follow the previous studies [24, 25, 27] and represent $b(t)$ as a sum of its mean value $\bar{b}(t)$ and a zero-mean (random) fluctuation $b'(t)$:
$$b(t) = \bar{b}(t) + b'(t). \quad (2)$$

By substituting (2) into (1), we obtain a stochastic differential equation:
$$\frac{dm(t)}{dt} = \big[\bar{b}(t) + b'(t)\big]\,\big[N - p\,m(t) + q\,m(t)\big]. \quad (3)$$

In contrast to the earlier works [24, 25, 27], which approximate the fluctuation term as white noise, we relax this assumption and denote the noise's two-point covariance more broadly as
$$\langle b'(t_1)\, b'(t_2) \rangle = \sigma^2 \,\rho\!\left(\frac{|t_1 - t_2|}{\tau}\right), \quad (4)$$
where $\sigma^2$ represents the strength of the noise, $\rho$ denotes its correlation function, $t_1$ and $t_2$ indicate two temporal points, and $\tau$ is the noise correlation time.
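The sketch below shows one standard way to draw realisations of a zero-mean stationary Gaussian fluctuation b'(t) with the exponential covariance of (4), via a Cholesky factorisation of the covariance matrix; the values of sigma2 and tau are hypothetical.

```python
import numpy as np

def sample_correlated_noise(t, sigma2, tau, n_samples=1, seed=0):
    """Draw zero-mean Gaussian samples with covariance
    C(t1, t2) = sigma2 * exp(-|t1 - t2| / tau) on the grid t."""
    rng = np.random.default_rng(seed)
    cov = sigma2 * np.exp(-np.abs(t[:, None] - t[None, :]) / tau)
    # A small jitter keeps the Cholesky factorisation numerically stable.
    chol = np.linalg.cholesky(cov + 1e-10 * np.eye(len(t)))
    return chol @ rng.standard_normal((len(t), n_samples))

t = np.linspace(0.0, 20.0, 200)
b_prime = sample_correlated_noise(t, sigma2=0.01, tau=1.0, n_samples=5)
print(b_prime.shape)  # (200, 5): five correlated realisations of b'(t)
```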

2.2. PDF Method

Our goal is to derive an equation for the probability density function of $m(t)$, that is, $p(m; t)$. To be specific, we adopt the PDF method originally developed in the context of turbulent flow [33] and use the Dirac delta function to define a "raw" PDF of the system states at time $t$:
$$\Pi(m; t) = \delta\big(m(t) - m\big), \quad (5)$$
whose ensemble average over random realisations of $b'(t)$ yields
$$p(m; t) = \langle \Pi(m; t) \rangle. \quad (6)$$

Following a recent study on the PDF method [34], we multiply the original equation (3) by $\partial \delta(m(t) - m)/\partial m$ and apply the properties of the Dirac delta function. This yields the stochastic evolution equation for $\Pi$ in the probability space $(m)$:
$$\frac{\partial \Pi}{\partial t} + \frac{\partial}{\partial m}\Big\{\big[\bar{b}(t) + b'(t)\big]\big[N - p\,m + q\,m\big]\,\Pi\Big\} = 0. \quad (7)$$

Now we take the ensemble average of (7) and apply definition (6); by employing a closure approximation, variously known as the large-eddy-diffusivity (LED) closure [35] or the weak approximation [36], an equation for $p(m; t)$ can be obtained:
$$\frac{\partial p}{\partial t} + \frac{\partial}{\partial m}\big[v(m, t)\, p\big] = \frac{\partial}{\partial m}\left[D(m, t)\,\frac{\partial p}{\partial m}\right], \quad (8a)$$
where the eddy diffusivity $D(m, t)$ and effective velocity $v(m, t)$ are given by (8b) and (8c) in terms of the noise covariance (4) and Green's function, and Green's function satisfies the deterministic partial differential equation (8d), subject to the homogeneous initial (and boundary) conditions corresponding to their (possibly inhomogeneous) counterparts for the raw PDF (7).
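As a sanity check on the closure-based PDF equation (8a), p(m; t) can also be estimated by brute-force Monte Carlo: sample correlated realisations of b'(t), integrate (3) for each, and histogram the resulting m(t). The sketch below assumes the symbols introduced above and purely illustrative parameter values.

```python
import numpy as np

def monte_carlo_pdf(t, N_faults, p, q, b_mean, sigma2, tau,
                    n_samples=2000, seed=1):
    """Sample correlated noise realisations and integrate
    dm/dt = (b_mean + b')(N - p m + q m) with explicit Euler;
    the returned samples of m(t[-1]) can be histogrammed to
    estimate p(m; t)."""
    rng = np.random.default_rng(seed)
    dt = t[1] - t[0]
    cov = sigma2 * np.exp(-np.abs(t[:, None] - t[None, :]) / tau)
    chol = np.linalg.cholesky(cov + 1e-10 * np.eye(len(t)))
    b_prime = chol @ rng.standard_normal((len(t), n_samples))

    m = np.zeros(n_samples)
    for k in range(len(t) - 1):
        rate = b_mean + b_prime[k]
        m = m + dt * rate * (N_faults - p * m + q * m)
    return m

t = np.linspace(0.0, 20.0, 400)
samples = monte_carlo_pdf(t, N_faults=100.0, p=0.95, q=0.05,
                          b_mean=0.1, sigma2=0.01, tau=1.0)
hist, edges = np.histogram(samples, bins=40, density=True)
```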

3. Model Validation

In this section, we validate our PDE model (8a), (8b), (8c), and (8d) by computing the mean fault detection and comparing it with historical data and two other methods, namely, the classic deterministic SRGM (generalised NHPP model) and the popular stochastic SRGM (white-noise model). As shown in Table 1, four data sets are used: the first (DS1) is from the WebERP system [37]; the second (DS2) originates from an open source project management software [38]; and the third (DS3) and fourth (DS4) are from the first and fourth product releases of the software products at Tandem Computers [32]. Among these four data sets, the two UDP/TCP-based systems (DS1 and DS2) are more likely to be affected by environmental factors, whereas the TCP-based software development at Tandem Computers Inc. takes place in a more enclosed/stable environment. UDP is a connectionless transport protocol and introduces a larger correlation strength for systems based on it, while TCP is connection-oriented and generates a smaller correlation strength. We encapsulate this difference in terms of the noise variance, that is, a larger $\sigma^2$ for the UDP/TCP-based systems (DS1 and DS2). Moreover, DS1, DS2, and DS3 come from imperfect debugging processes, while DS4 comes from a perfect debugging process; we capture this diversity with different configurations of the parameter $r$. The testing time of DS1 is measured in months, while those of DS2, DS3, and DS4 are measured in weeks. For detailed information, please refer to the data values tabulated in the Appendix.

It is also noted here that we consider a special case, that is, constant $r$, following earlier works [30, 39, 40]. A stationary Gaussian process is prescribed for the fluctuation term $b'(t)$, with an exponential correlation $\rho = \exp(-|t_1 - t_2|/\tau)$.

3.1. Model Solution for Constant $r$

For model validation, we adopt the popular generalised Goel-Okumoto (G-O) model, as this enables us to derive an analytical solution of (3) by separation of variables. The mean fault detection rate per remaining fault in the generalised G-O model is given by [41]
$$\bar{b}(t) = b\,c\,t^{\,c-1},$$
where $b$ is a constant and $c$ is the shape factor.
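For completeness, the generalised G-O detection rate and its classical mean value function m(t) = N(1 - exp(-b t^c)), the perfect-debugging special case, are straightforward to code; the parameter values below are hypothetical.

```python
import numpy as np

def detection_rate(t, b, c):
    """Generalised G-O fault detection rate per remaining fault:
    b * c * t**(c - 1)."""
    return b * c * np.power(t, c - 1)

def mean_faults_go(t, N, b, c):
    """Classical generalised G-O mean value function for the
    perfect-debugging special case: m(t) = N * (1 - exp(-b * t**c))."""
    return N * (1.0 - np.exp(-b * np.power(t, c)))

t = np.linspace(0.1, 20.0, 100)          # illustrative testing times
rate_mid = detection_rate(10.0, b=0.1, c=1.2)
print(rate_mid, mean_faults_go(t, N=100.0, b=0.1, c=1.2)[-1])
```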

Following the PDF method, we can obtain an analytical solution for $p(m; t)$ in terms of the random integral $B(t) = \int_0^t b'(s)\,ds$, whose PDF follows a Gaussian distribution with zero mean and variance $\sigma_B^2(t)$, as outlined in Table 2.

The cumulative distribution function (CDF) can then be derived in closed form, where $\operatorname{erf}(\cdot)$ represents the error function.

The mean fault detection is then $\langle m(t) \rangle = \int m\, p(m; t)\, dm$.
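When only a tabulated PDF is available (for example, from a numerical solution of (8a)), the mean and variance can be recovered by quadrature; a minimal sketch, assuming p(m; t) is given on a grid of m values (the toy PDF below is purely illustrative):

```python
import numpy as np

def mean_from_pdf(m_grid, pdf_values):
    """<m(t)> = integral of m * p(m; t) dm via the trapezoidal rule."""
    return np.trapz(m_grid * pdf_values, m_grid)

def variance_from_pdf(m_grid, pdf_values):
    mean = mean_from_pdf(m_grid, pdf_values)
    return np.trapz((m_grid - mean) ** 2 * pdf_values, m_grid)

m_grid = np.linspace(0.0, 120.0, 600)
pdf = np.exp(-(m_grid - 60.0) ** 2 / 50.0)
pdf /= np.trapz(pdf, m_grid)              # normalise a toy PDF
print(mean_from_pdf(m_grid, pdf))          # close to 60 for this toy case
```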

It is noted here that the fault detection rate model introduced above describes an imperfect debugging process for constant $r$. In many circumstances, time delays, failed removals, and newly introduced faults make $r$ a time-dependent variable [6, 7, 13, 42], in which case an analytical solution is not available and only a numerical solution can be obtained. In the current study, our focus lies on the uncertainty quantification framework for environmental factors in the imperfect debugging process; without loss of generality, the approach could also be extended to the imperfect debugging process with time-dependent $r$.

3.2. Performance Criteria

To evaluate the performance of our PDE model, we employ four comparison criteria, namely, the predicted relative error (PRE), the mean square error (MSE), the Coefficient of Determination ($R^2$), and the root mean square prediction error (RMSPE).

3.2.1. Predicted Relative Error (PRE)

The relative difference between the observed and estimated number of faults at time $t_k$, known as predictive validity [7, 43], is
$$\mathrm{PRE} = \frac{\hat{m}(t_k) - m_k}{m_k},$$
where $m_k$ is the fault number from the observed data at time $t_k$ and $\hat{m}(t_k)$ is the model estimate. Hence, for a better fit to the data, we seek values of PRE close to zero [44].
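A direct transcription of this criterion, with the sign convention assumed to be (estimated - observed)/observed:

```python
import numpy as np

def predicted_relative_error(m_estimated, m_observed):
    """PRE at each observation time; values near zero indicate a better fit."""
    m_estimated = np.asarray(m_estimated, dtype=float)
    m_observed = np.asarray(m_observed, dtype=float)
    return (m_estimated - m_observed) / m_observed
```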

3.2.2. Mean Square Error (MSE)

Consider
$$\mathrm{MSE} = \frac{1}{n}\sum_{k=1}^{n}\big[\hat{m}(t_k) - m_k\big]^2,$$
where $\hat{m}(t_k)$ and $m_k$ denote the estimated and observed number of faults detected at time $t_k$, respectively, and $n$ is the total number of observation data points. A lower value of MSE represents better model prediction [41, 45].
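An equivalent short implementation of the MSE:

```python
import numpy as np

def mean_square_error(m_estimated, m_observed):
    """MSE over the n observation points; lower is better."""
    diff = np.asarray(m_estimated, dtype=float) - np.asarray(m_observed, dtype=float)
    return np.mean(diff ** 2)
```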

3.2.3. Coefficient of Determination ($R^2$)

We define this coefficient as the ratio of the sum of squares resulting from the trend model to that from the constant model:
$$R^2 = 1 - \frac{\sum_{k=1}^{n}\big[m_k - \hat{m}(t_k)\big]^2}{\sum_{k=1}^{n}\big(m_k - \bar{m}\big)^2},$$
where $\bar{m} = \frac{1}{n}\sum_{k=1}^{n} m_k$ is the average number of detected faults. $R^2$ measures the percentage of the total variation about the mean accounted for by the fitted curve. The larger $R^2$ is, the better the model explains the variation in the data.
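A transcription of $R^2$ under the 1 - SSE/SST form given above:

```python
import numpy as np

def coefficient_of_determination(m_estimated, m_observed):
    """R^2 = 1 - SSE/SST: SSE is the sum of squared residuals of the
    fitted model, SST the sum of squares about the observed mean."""
    m_estimated = np.asarray(m_estimated, dtype=float)
    m_observed = np.asarray(m_observed, dtype=float)
    sse = np.sum((m_observed - m_estimated) ** 2)
    sst = np.sum((m_observed - m_observed.mean()) ** 2)
    return 1.0 - sse / sst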

3.2.4. Root Mean Square Prediction Error (RMSPE)

The average of the prediction errors (PEs) is known as the Bias [46]. The standard deviation of the prediction errors is known as the predicted relative variation (PRV) [46]. The root mean square prediction error (RMSPE) is a measure of the closeness with which the model predicts the observations [46]:
$$\mathrm{RMSPE} = \sqrt{\mathrm{Bias}^2 + \mathrm{PRV}^2},$$
given
$$\mathrm{PE}_k = \hat{m}(t_k) - m_k, \qquad \mathrm{Bias} = \frac{1}{n}\sum_{k=1}^{n} \mathrm{PE}_k, \qquad \mathrm{PRV} = \sqrt{\frac{1}{n-1}\sum_{k=1}^{n}\big(\mathrm{PE}_k - \mathrm{Bias}\big)^2}.$$
The lower the values of Bias, PRV, and RMSPE, the better the goodness of fit.
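A transcription of Bias, PRV, and RMSPE under the definitions above (the sample standard deviation with n - 1 is assumed for PRV):

```python
import numpy as np

def bias_prv_rmspe(m_estimated, m_observed):
    """Bias = mean prediction error, PRV = standard deviation of the
    prediction errors, RMSPE = sqrt(Bias^2 + PRV^2); lower is better."""
    pe = np.asarray(m_estimated, dtype=float) - np.asarray(m_observed, dtype=float)
    bias = pe.mean()
    prv = np.sqrt(np.sum((pe - bias) ** 2) / (len(pe) - 1))
    rmspe = np.sqrt(bias ** 2 + prv ** 2)
    return bias, prv, rmspe
```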

3.3. Model Parameter Estimation

There are six input parameters in our PDE method: the total number of initial faults $N$, the fault detection rate $b$, the shape factor $c$, the debugging parameter $r$, the noise strength $\sigma^2$, and the correlation time $\tau$. Given a data set, we used SPSS to estimate the parameters $N$, $b$, and $c$; the available methods for estimating these parameters include the method of moments, the least squares method, maximum likelihood, and Bayesian methods [47-49]. For the parameter $r$, we use the empirical value reported in [30, 39, 40]. For the noise parameters, we adopt earlier works on white noise [24] for the strength $\sigma^2$ of our correlated noise, which yields the optimal solution as discussed in the sensitivity analysis of Section 3.5. Regarding the noise correlation time $\tau$, we use the fact that a software failure is typically repaired within days [50] and hence set $\tau$ accordingly (in months or weeks, matching the time unit of each data set) for the later analysis. A stationary Gaussian process is prescribed for the fluctuation term $b'(t)$, with an exponential correlation.
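The paper uses SPSS for parameter estimation; as an illustrative alternative, the sketch below fits N, b, and c of the perfect-debugging generalised G-O mean curve by nonlinear least squares with SciPy. The observed counts, initial guesses, and bounds are hypothetical.

```python
import numpy as np
from scipy.optimize import curve_fit

def mean_faults_go(t, N, b, c):
    # Generalised G-O mean value function (perfect-debugging case).
    return N * (1.0 - np.exp(-b * np.power(t, c)))

# Hypothetical cumulative fault counts observed at weekly intervals.
t_obs = np.arange(1, 13, dtype=float)
m_obs = np.array([5, 11, 18, 26, 33, 39, 44, 48, 51, 53, 55, 56], dtype=float)

popt, pcov = curve_fit(mean_faults_go, t_obs, m_obs,
                       p0=[80.0, 0.05, 1.0],                 # initial guesses
                       bounds=([1.0, 1e-4, 0.1], [1e4, 5.0, 5.0]))
N_hat, b_hat, c_hat = popt
print(N_hat, b_hat, c_hat)
```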

3.4. Validation Results

In this section, we validate our PDE model by computing the mean fault detection and comparing it with historical data and two other methods, namely, the classic deterministic SRGM (generalised NHPP model) and the popular stochastic SRGM (white-noise model). Figure 1 shows the expected number of faults from our PDE model, the G-O model, and the white-noise model against the four data sets over the testing time. Parameters of all three models are adjusted for their best fit. As demonstrated in Figure 1, although our PDE model overestimates the number of faults for DS1 and underestimates it for DS2, DS3, and DS4, it provides the best overall match to the four data sets compared with the G-O model and the white-noise model.

Figure 2 shows the PRE results for our PDE model, the G-O model, and the white-noise model over the testing time. Similar trends are exhibited by all three models: large estimation error (deviation from zero) at the beginning but gradual convergence to the observations at later times. This is expected, since few data points are available early on to calibrate the models' parameters; as time elapses and more data become available, the models' accuracy improves and hence their PRE approaches zero. Overall, the PDE model yields the smallest PRE.

We tabulate the aforementioned evaluation results of our PDE model, the G-O model, and the white-noise model for DS1, DS2, DS3, and DS4 in Table 3. Despite a slightly higher value of PRV for DS1, the PDE model presents the best goodness of fit for all four data sets on all other performance indicators.

3.5. Sensitivity Analysis

Having validated our PDE model with four real observation data sets, we now conduct a sensitivity analysis to study the impact of the correlation time $\tau$, the strength $\sigma^2$, and the correlation function $\rho$ on our fault prediction.

We first investigate the impact of the correlation time $\tau$. Figure 3 and Table 4 present the PRE, MSE, $R^2$, Bias, PRV, and RMSPE results on DS2 for our PDE model, the G-O model, and the white-noise model. They demonstrate that the PDE model fits the real data best and that the repair time is about one week. Hence a correlation time on the order of the fault repair time (in months or weeks) is a good approximation in reality.

Next we study the impact of the correlation strength. Figure 4 and Table 5 show the results on DS2 for our PDE model, the G-O model, and the white-noise model. They demonstrate that $\sigma^2$ has a large effect on the white-noise model, while its influence on the PDE model is very limited.

Lastly, we conduct a sensitivity analysis of the correlation function. Five types of correlation function are studied: exponential, besselj, normal, cosine, and cubic. Figure 5 and Table 6 show the results on DS2 for our PDE model, the G-O model, and the white-noise model. It is clear that the correlation type significantly affects our model estimation, and the PDE model with the exponential correlation function fits best.

4. Reliability Assessment

Having validated our model with historical data, we now apply our framework, first to study the behaviour of the detection process through its full statistical information in Section 4.1 and then to assess software reliability in Section 4.2.

4.1. Temporal Evolution of PDF

In addition to the conventional focus on the mean value of fault detection, $\langle m(t) \rangle$, we now use the full statistical information, that is, the probability density function (PDF) and the cumulative distribution function (CDF), to study the system behaviour.

Three snapshots of the PDF and CDF at three testing times (in weeks) are plotted in Figure 6. The uncertainty of our prediction, encapsulated in the width of the PDF and the slope of the CDF, initially rises between the first two snapshots; it later falls as most of the faults are detected. Here we propose employing the CDF as a criterion to ensure prescribed levels of reliability: for a given reliability level, the CDF in Figure 6 directly yields the minimum number of faults that must be detected at each snapshot.
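Operationally, this CDF criterion is a quantile lookup: given a tabulated CDF of the number of detected faults at a fixed time, return the smallest fault count whose CDF value reaches a prescribed probability level (whether that level is the confidence itself or its complement depends on the convention adopted). A minimal sketch with hypothetical inputs:

```python
import numpy as np

def min_faults_for_level(m_grid, cdf_values, level):
    """Smallest fault count m on the grid with CDF(m) >= level."""
    cdf_values = np.asarray(cdf_values, dtype=float)
    idx = np.searchsorted(cdf_values, level, side="left")
    if idx >= len(m_grid):
        raise ValueError("requested level not reached on this grid")
    return m_grid[idx]

# Hypothetical grid and CDF values at a fixed testing time.
m_grid = np.arange(0, 101)
cdf = 1.0 - np.exp(-0.05 * m_grid)        # toy monotone CDF
print(min_faults_for_level(m_grid, cdf, level=0.9))
```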

4.2. Software Reliability and Reliability Growth Rate

Software reliability is broadly defined as the probability that no software fault is detected in the time interval $(t, t + \Delta t]$, given that the last fault occurred at time $t$. The software reliability can be expressed as
$$R(\Delta t \mid t) = \exp\!\big\{-\big[\langle m(t + \Delta t)\rangle - \langle m(t)\rangle\big]\big\}.$$

Thus, the software reliability growth rate (RGR), a measure of the rate of change of reliability, can be computed by differentiating the reliability with respect to time; the temporal derivative of the variance appearing in this expression is listed in Table 2.
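For illustration only, the sketch below evaluates the classical NHPP reliability expression R(dt | t) = exp(-[<m(t+dt)> - <m(t)>]) and differentiates it numerically. This is a simplified stand-in that uses only the mean curve; it is not the paper's RGR formula, which also involves the variance derivative from Table 2. All parameter values are hypothetical.

```python
import numpy as np

def reliability(delta_t, t_last, mean_m):
    """Classical NHPP reliability, with mean_m a callable for <m(t)>."""
    return np.exp(-(mean_m(t_last + delta_t) - mean_m(t_last)))

def growth_rate(delta_t, t_last, mean_m, h=1e-3):
    """Central finite-difference derivative of R with respect to delta_t."""
    return (reliability(delta_t + h, t_last, mean_m)
            - reliability(delta_t - h, t_last, mean_m)) / (2.0 * h)

# Example with the generalised G-O mean curve (hypothetical parameters).
mean_m = lambda t: 100.0 * (1.0 - np.exp(-0.1 * t ** 1.2))
x = np.linspace(0.0, 10.0, 200)
R = reliability(x, t_last=5.0, mean_m=mean_m)
print(R[0], R[-1], growth_rate(1.0, t_last=5.0, mean_m=mean_m))
```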

Three snapshots of the software reliability, $R$, at three last-failure times (in weeks) are presented in Figure 7. Reliability in all three timeframes exhibits an initial drop, but this deterioration gradually slows down and eventually approaches zero. This is expected: at earlier times, most of the faults are yet to be removed; as time elapses, more are debugged, which leads to a fall in software reliability; when the system is cleared of all faults, $R$ reaches a stationary state. The later timeframe therefore also presents a sharper drop to zero.

For a closer examination, we now look at the software reliability growth rate (RGR) over the same timeframes. As shown in Figure 7, the RGR in all three timeframes demonstrates an initial rise to a maximum and a later fall to a stationary value of zero, although the earlier timeframe exhibits a higher peak (RGR = 600), since fewer faults have been removed at the earlier stage.

5. Conclusion

We presented a novel model to quantify the uncertainties associated with the perfect or imperfect debugging process. Treating the random environmental impacts collectively as a noise with arbitrary correlation, an equation for the distribution of fault detection, that is, its probability density function (PDF), is derived. Our analysis leads to the following major conclusions.
(i) Under the conventional evaluation criteria on the mean value of detected faults in the perfect or imperfect debugging process, our model provides the best fit to the observation data compared to the classic G-O model and the white-noise model.
(ii) The proposed approach also goes beyond uncertainty quantification based on the mean and variance of system states by computing the PDF and cumulative distribution function (CDF). This enables one to evaluate probabilities of rare events, which are necessary for probabilistic risk assessment.
(iii) The predictive uncertainty of fault detection initially increases but gradually diminishes as time elapses; hence more maintenance effort is needed at the initial stage.
(iv) Software developers could also use the CDF as a new criterion to assess system reliability, for example, by computing the minimum number of faults to be detected for a prescribed confidence level.
(v) The proposed correlated model describes an imperfect debugging process for constant $r$, which enables us to obtain an analytical solution. The approach could also be extended to the imperfect debugging process with time-dependent $r$, in which case the PDF would be computed numerically. This will be the focus of our future work.

Appendix

Data Sets

The four data sets used in this study are presented in this Appendix. Detailed information about the two data sets DS1 and DS2 is available for further study at the SourceForge website. Originally, these two data sets were recorded as time-between-failures data, as tabulated in Tables 7 and 8. The other software testing data were collected from two major releases of the software products at Tandem Computers [4], as summarized in Table 9.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.