Abstract

In our previous work, we proposed wavelet shrinkage estimation (WSE) for nonhomogeneous Poisson process (NHPP)-based software reliability models (SRMs), where WSE is a data-transform-based nonparametric estimation method. Among the many variance-stabilizing data transformations, the Anscombe transform and the Fisz transform were employed. We showed through numerical experiments with real software-fault count data that, in spite of its nonparametric nature, WSE can provide higher goodness-of-fit performance than the conventional maximum likelihood estimation (MLE) and least squares estimation (LSE) in many cases. With the aim of improving the estimation accuracy of WSE, in this paper we introduce three other data transformations to preprocess the software-fault count data and investigate the influence of the different data transformations on the estimation accuracy of WSE through goodness-of-fit tests.

1. Introduction

In the field of software reliability engineering, the quantitative assessment of software reliability has become one of the main issues. In particular, there is strong interest in estimating the software intensity function from the software-fault count data observed in the software testing phases, since the software intensity function in discrete time denotes the number of software faults detected per unit time. This directly makes it possible to estimate the number of remaining software faults and the quantitative software reliability, which is defined as the probability that a software system does not fail during a specified time period under a specified operational environment. Moreover, these evaluation measures can be used in decision making such as the allocation of development resources and software release scheduling. Therefore, we are interested in developing a high-accuracy estimation method for the software intensity function.

Among the hundreds of software reliability models (SRMs) [1–3], nonhomogeneous Poisson process (NHPP)-based SRMs have gained much popularity in actual software testing phases. In many cases, the NHPP-based SRM is formulated as a parametric model, where the mean value function, or its difference in discrete time or derivative in continuous time, called the "software intensity function," can be considered as a unique parameter governing the probabilistic property. One class of parametric NHPP-based SRMs is concerned with modeling the number of software faults detected in testing phases, initiated by Goel and Okumoto [4]. Afterwards, many parametric NHPP-based SRMs were proposed in the literature [5–8] from various points of view. However, it is well known in the software reliability engineering community that there does not exist a uniquely best parametric NHPP-based SRM which can fit every type of software-fault count data. This fact implies that nonparametric methods, which do not assume a parametric form, should be used to describe the software debugging phenomenon, which differs in each testing phase.

Apart from the traditional Bayesian framework, several frequentist approaches based on non-parametric statistics have been introduced to estimate the quantitative software reliability. Sofer and Miller [9] used an elementary piecewise linear estimator of the NHPP intensity function from the software-fault detection time data and proposed a smoothing technique by means of quadratic programming. Gandy and Jensen [10] applied a kernel-based estimator to estimate the NHPP intensity function in a non-parametric way. Barghout et al. [11] also proposed a kernel-based non-parametric estimation for the order-statistics-based SRMs, where the likelihood cross-validation and the prequential likelihood approaches were used to estimate the bandwidth. Wang et al. [12] applied a similar kernel method to the NHPP-based SRMs, where they focused on the local likelihood method with a locally weighted log-likelihood function. By combining non-parametric estimation with the Bayesian framework, El-Aroui and Soler [13] and Wilson and Samaniego [14] developed non-parametric Bayesian estimation methods. It should be noted that these non-parametric estimation methods generally require high computational cost, which, in some cases, is comparable to or greater than the effort of model selection in the parametric SRMs.

Another class of non-parametric estimation methods for NHPP-based SRMs is the wavelet analysis-based approach, initiated by Xiao and Dohi [15]. They proposed the wavelet shrinkage estimation (WSE), which does not require solving any optimization problem, so that the estimation algorithms are easier to implement than the other non-parametric methods. They compared their method with the conventional maximum likelihood estimation (MLE) and the least squares estimation (LSE) through goodness-of-fit tests. It was shown through numerical experiments with real software-fault count data that WSE can provide higher goodness-of-fit performance than MLE and LSE in many cases, in spite of its non-parametric nature.

The fundamental idea of WSE is to remove the noise included in the observed software-fault count data so as to obtain a noise-free estimate of the software intensity function. This is performed through the following three-step procedure. First, the noise variance is stabilized by applying a data transformation to the data. This produces time-series data in which the noise can be treated as Gaussian white noise. Second, the noise is removed using a Haar-wavelet-based denoising algorithm. Third, an inverse data transformation is applied to the denoised time-series data, yielding the estimate of the software intensity function. Among the many variance-stabilizing data transformations, the Anscombe transform [16] and the Fisz transform [17] were employed in the previous work [15]. Other well-known square-root data transformations are the Bartlett transform [18] and the Freeman-Tukey transform [19]. Both the Anscombe transform and the Freeman-Tukey transform are in fact natural extensions of the Bartlett transform.

This paper focuses on the first step of WSE and aims at identifying and highlighting the influence that the data transformation exerts on the accuracy of WSE. The remainder of this paper is organized as follows. In Section 2, we give preliminaries on NHPP-based software reliability modeling. Section 3 is devoted to introducing the data transformations. Section 4 describes the WSE for NHPP-based SRMs in detail. In Section 5, we carry out real project data analyses and illustrate numerical examples to examine the effectiveness of the proposed methods. Finally, the paper is concluded with future research directions in Section 6.

2. NHPP-Based Software Reliability Modeling

Suppose that the number of software faults detected through a system test is observed at discrete time $i = 1, 2, \ldots$. Let $y_i$ and $x_i$ denote the number of software faults detected at testing date $i$ and its cumulative value, that is, $x_i = \sum_{k=1}^{i} y_k$, where $x_0 = 0$ is assumed without any loss of generality. The stochastic process $\{X_i : i = 0, 1, 2, \ldots\}$ is said to be a discrete non-homogeneous Poisson process (D-NHPP) if the probability mass function at time $i$ is given by
$\Pr\{X_i = x\} = \frac{\Lambda_i^{x}}{x!}\, e^{-\Lambda_i}, \quad x = 0, 1, 2, \ldots,$
where $\Lambda_i$ is called the mean value function of a D-NHPP and means the expected cumulative number of software faults detected by testing date $i$. The function $\lambda_i = \Lambda_i - \Lambda_{i-1}$ is called the discrete intensity function and implies the expected number of faults detected at testing date $i$, say $\lambda_i = \mathrm{E}[Y_i]$.
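To make the D-NHPP definition concrete, the following short Python sketch evaluates the probability mass above for a hypothetical discrete Goel-Okumoto-type mean value function; the functional form and the parameter values are illustrative assumptions, not taken from this paper.

    import math

    def dnhpp_pmf(x, mean_value):
        # P{X_i = x} for a D-NHPP whose mean value function equals mean_value at date i
        return mean_value ** x * math.exp(-mean_value) / math.factorial(x)

    # Hypothetical mean value function: Lambda_i = omega * (1 - (1 - p) ** i)
    omega, p = 120.0, 0.05
    Lambda = lambda i: omega * (1.0 - (1.0 - p) ** i)

    # Probability that 80 faults have been detected cumulatively by testing date 20
    print(dnhpp_pmf(80, Lambda(20)))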

MLE is one of the most commonly used parametric estimation methods. Let $\theta$ denote the vector of parameters in the mean value function $\Lambda_i(\theta)$, and let $(x_1, \ldots, x_n)$ denote a realization of $(X_1, \ldots, X_n)$. When $x_n$ software faults are detected by testing date $n$, the log-likelihood function of a D-NHPP is given by
$\ln L(\theta) = \sum_{i=1}^{n} y_i \ln \lambda_i(\theta) - \Lambda_n(\theta) - \sum_{i=1}^{n} \ln (y_i!),$
where $y_i = x_i - x_{i-1}$ and $\lambda_i(\theta) = \Lambda_i(\theta) - \Lambda_{i-1}(\theta)$. Then, the maximum likelihood estimate of $\theta$, say $\hat{\theta}$, is given by the solution of $\arg\max_{\theta} \ln L(\theta)$. Therefore, the estimate of the software intensity function can be obtained by $\hat{\lambda}_i = \Lambda_i(\hat{\theta}) - \Lambda_{i-1}(\hat{\theta})$ with $\Lambda_0(\hat{\theta}) = 0$. In the following sections, we consider the problem of estimating the software intensity function $\lambda_i$ from the noise-involved observation $y_i$ in a non-parametric way.
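As a sketch of the MLE step just described, the code below maximizes the D-NHPP log-likelihood numerically. The discrete Goel-Okumoto-type mean value function, the parameter names omega and p, and the fault-count data are assumptions made for illustration only; they are not the SRMs listed in Table 2.

    import numpy as np
    from scipy.optimize import minimize
    from scipy.special import gammaln

    y = np.array([3, 5, 4, 7, 6, 5, 4, 3, 2, 1], dtype=float)  # hypothetical faults per date
    t = np.arange(1, len(y) + 1)

    def neg_loglik(theta):
        omega, p = theta
        Lam = omega * (1.0 - (1.0 - p) ** t)            # Lambda_i(theta), i = 1..n
        lam = np.diff(np.concatenate(([0.0], Lam)))     # lambda_i = Lambda_i - Lambda_{i-1}
        # negative of: sum_i [ y_i ln lambda_i - ln(y_i!) ] - Lambda_n
        return -(np.sum(y * np.log(lam) - gammaln(y + 1)) - Lam[-1])

    res = minimize(neg_loglik, x0=[50.0, 0.1],
                   bounds=[(1e-6, None), (1e-6, 0.999)], method="L-BFGS-B")
    omega_hat, p_hat = res.x
    lambda_hat = np.diff(np.concatenate(([0.0], omega_hat * (1.0 - (1.0 - p_hat) ** t))))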

3. Variance Stabilizing Data Transformation

It is common to use data transformations (DTs) to stabilize the variance of Poisson data. By using a DT, software-fault count data which follow a D-NHPP are approximately transformed into Gaussian data. The most fundamental data-transform tool in statistics is the Bartlett transform (BT) [18]. Let $\epsilon_i$ denote the Poisson white noise, that is, $y_i = \lambda_i + \epsilon_i$. Taking the BT, the random variables $Z_i = 2\sqrt{Y_i}$ can be approximately regarded as Gaussian random variables with the normal distribution $N(2\sqrt{\lambda_i}, 1)$, so that the realizations $z_i = 2\sqrt{y_i}$ can be considered as samples from $N(2\sqrt{\lambda_i}, 1)$. That is, the transformed realizations by the BT are samples from the normally distributed random variables $Z_i = \mu_i + e_i$, where $\mu_i = 2\sqrt{\lambda_i}$ is the transformed software intensity function, and $e_i$ is the Gaussian white noise with unit variance.

Bartlett [18] also showed that $z_i = 2\sqrt{y_i + 1/2}$ is a better transformation, since it provides a variance closer to the constant 1 even when the mean of $Y_i$ is not large.

The Anscombe transform (AT) [16] is a natural extension of BT and was employed in our previous work [15]. The AT takes the form $z_i = 2\sqrt{y_i + 3/8}$, where $z_i$ can be considered as an observation of a Gaussian random variable with approximately unit variance. Freeman and Tukey [19] proposed the following square-root transform (we call it FT), which is also an extension of BT: $z_i = \sqrt{y_i} + \sqrt{y_i + 1}$. They showed that the variance of the transformed Gaussian random variable is the nearest to 1 among BT, AT, and FT if the mean of $Y_i$ is small. Recently, these variance stabilization techniques were applied to the LSE of the mean value function for the NHPP-based SRMs [20]. Table 1 summarizes the DTs mentioned above.

As mentioned in Section 1, the first step of WSE is to apply the normalizing and variance-stabilizing DTs to the observed software-fault count data. In this paper, we employ BT, AT, and FT in the first and the third steps of WSE. The target of denoising in the second step of WSE is then the transformed data $z_i$. Letting $\hat{\mu}_i$ denote the denoised $z_i$, the estimate $\hat{\lambda}_i$ of the original software intensity function can be obtained by taking the inverse DT of $\hat{\mu}_i$, as given in Table 1.
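The forward and inverse DTs can be written down directly, as in the following sketch. The inverses shown are the simple algebraic inverses; bias-corrected inverses also exist but are not considered here. The added constant 1/2 in the modified Bartlett transform is the classical choice and is assumed to match the second BT of this section.

    import numpy as np

    def bartlett(y):          return 2.0 * np.sqrt(y)
    def bartlett_inv(z):      return (z / 2.0) ** 2

    # Modified Bartlett transform (constant 1/2 assumed, see the lead-in above)
    def bartlett2(y):         return 2.0 * np.sqrt(y + 0.5)
    def bartlett2_inv(z):     return (z / 2.0) ** 2 - 0.5

    def anscombe(y):          return 2.0 * np.sqrt(y + 3.0 / 8.0)
    def anscombe_inv(z):      return (z / 2.0) ** 2 - 3.0 / 8.0

    def freeman_tukey(y):     return np.sqrt(y) + np.sqrt(y + 1.0)
    def freeman_tukey_inv(z):
        # algebraic inverse of z = sqrt(y) + sqrt(y + 1)
        return ((z ** 2 - 1.0) / (2.0 * z)) ** 2

    y = np.array([0.0, 1.0, 3.0, 7.0, 2.0, 5.0])       # hypothetical faults per testing date
    print(np.allclose(anscombe_inv(anscombe(y)), y))   # True: the inverse recovers the data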

4. Wavelet Shrinkage Estimation for NHPP-Based SRM

4.1. Haar-Wavelet-Based Denoising Procedure

The Haar-wavelet-based shrinkage technique can be used as a denoising algorithm for the second step of WSE. In general, the noise removal is performed through the following three steps: (i) expanding the transformed time-series data to obtain the empirical wavelet coefficients, (ii) removing the noise included in the empirical wavelet coefficients using a thresholding method, and (iii) making use of the denoised coefficients to calculate the estimate of the transformed software intensity function.

4.1.1. Haar Wavelet Transform

The Haar scaling function $\phi(t)$ and the Haar wavelet function $\psi(t)$ are defined as
$\phi(t) = 1$ for $0 \le t < 1$ and $\phi(t) = 0$ otherwise,
$\psi(t) = 1$ for $0 \le t < 1/2$, $\psi(t) = -1$ for $1/2 \le t < 1$, and $\psi(t) = 0$ otherwise,
respectively. By introducing the scaling parameter $j$ and the shifting parameter $k$, the Haar father wavelet $\phi_{j,k}(t) = 2^{j/2}\phi(2^{j}t - k)$ and the Haar mother wavelet $\psi_{j,k}(t) = 2^{j/2}\psi(2^{j}t - k)$ are defined, respectively. Then the target function, the transformed software intensity function $\mu(t)$, can be expressed as
$\mu(t) = \sum_{k} c_{j_0,k}\,\phi_{j_0,k}(t) + \sum_{j \ge j_0} \sum_{k} d_{j,k}\,\psi_{j,k}(t), \quad (12)$
where
$c_{j,k} = \int \mu(t)\,\phi_{j,k}(t)\,dt, \quad (13)$
$d_{j,k} = \int \mu(t)\,\psi_{j,k}(t)\,dt \quad (14)$
are called the scaling coefficients and the wavelet coefficients, respectively, for any primary resolution level $j_0$. For implementability, it is reasonable to set a finite upper limit instead of $\infty$ for the resolution level $j$; in other words, the highest resolution level must be finite in practice. We use $J$ to denote the highest resolution level. That is, the range of $j$ in the second term of (12) is $j_0 \le j \le J$. The mapping from the function $\mu(t)$ to the coefficients $(c_{j_0,k}, d_{j,k})$ is called the Haar wavelet transform (HWT).
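To make the HWT operational on sampled data, the following sketch computes the empirical scaling and wavelet coefficients by the usual orthonormal Haar filter-bank recursion. It assumes the input length is a power of two (shorter software-fault series can be padded); this is an implementation assumption, not something stated in this paper.

    import numpy as np

    def haar_dwt(z):
        # Decompose z (length 2^J) into the coarsest scaling coefficient and the
        # wavelet (detail) coefficients at each resolution level j = J-1, ..., 0.
        approx = np.asarray(z, dtype=float).copy()
        details = {}
        level = int(np.log2(len(approx))) - 1
        while len(approx) > 1:
            even, odd = approx[0::2], approx[1::2]
            details[level] = (even - odd) / np.sqrt(2.0)   # empirical wavelet coefficients
            approx = (even + odd) / np.sqrt(2.0)           # empirical scaling coefficients
            level -= 1
        return approx, details

    def haar_idwt(approx, details):
        # Inverse HWT: rebuild the signal from the scaling and wavelet coefficients.
        approx = np.asarray(approx, dtype=float)
        for level in sorted(details):
            d = details[level]
            rec = np.empty(2 * len(approx))
            rec[0::2] = (approx + d) / np.sqrt(2.0)
            rec[1::2] = (approx - d) / np.sqrt(2.0)
            approx = rec
        return approx

    z = np.array([4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0])
    a, d = haar_dwt(z)
    print(np.allclose(haar_idwt(a, d), z))   # True: perfect reconstruction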

Since the transformed data $z_i$ can be considered as noisy observations of $\mu(t)$ at $t = i$, the empirical scaling coefficients $\hat{c}_{j,k}$ and the empirical wavelet coefficients $\hat{d}_{j,k}$ of $\mu(t)$ can be calculated by (13) and (14) with $\mu(t)$ replaced by the transformed data. The noise involved in the empirical wavelet coefficients can be removed by the thresholding method introduced below. Finally, the estimate of $\mu(t)$ can be obtained by taking the inverse HWT with the denoised empirical coefficients.

4.1.2. Thresholding

In denoising the empirical wavelet coefficients, the common choices of thresholding method are the hard thresholding
$\delta^{\mathrm{hard}}_{\lambda}(\hat{d}_{j,k}) = \hat{d}_{j,k}\,\mathbf{1}(|\hat{d}_{j,k}| > \lambda)$
and the soft thresholding
$\delta^{\mathrm{soft}}_{\lambda}(\hat{d}_{j,k}) = \mathrm{sgn}(\hat{d}_{j,k})\,(|\hat{d}_{j,k}| - \lambda)_{+}$
for a fixed threshold level $\lambda$, where $\mathbf{1}(A)$ is the indicator function of an event $A$, $\mathrm{sgn}(x)$ is the sign function of $x$, and $(x)_{+} = \max(x, 0)$. There are many methods to determine the threshold level $\lambda$. In this paper, we use the universal threshold [21], $\lambda_{\mathrm{UT}} = \sqrt{2 \ln n}$, and the "leave-out-half" cross-validation threshold [22], where $n$ is the length of the observed time series. Hard thresholding is a "keep-or-kill" rule, while soft thresholding is a "shrink-or-kill" rule. Both thresholding methods and both threshold levels are employed to denoise the empirical wavelet coefficients.
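The two thresholding rules and the universal threshold translate directly into code, as in the sketch below; the "leave-out-half" cross-validation threshold of [22] is more involved and is omitted here.

    import numpy as np

    def hard_threshold(d, lam):
        # keep-or-kill: zero out coefficients whose magnitude does not exceed lam
        return d * (np.abs(d) > lam)

    def soft_threshold(d, lam):
        # shrink-or-kill: shrink magnitudes toward zero by lam, killing the small ones
        return np.sign(d) * np.maximum(np.abs(d) - lam, 0.0)

    def universal_threshold(n, sigma=1.0):
        # lam_UT = sigma * sqrt(2 ln n); sigma = 1 after variance stabilization
        return sigma * np.sqrt(2.0 * np.log(n))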

4.2. Wavelet Shrinkage Estimation for NHPP-Based SRM

Since the software-fault count data are Poisson data, preprocessing is necessary before the Haar-wavelet-based denoising procedure can be applied. Xiao and Dohi [15] combined data transformation with the standard denoising procedure to propose the wavelet shrinkage estimation (WSE) for the D-NHPP-based SRMs. They call the HWT combined with AT the Haar-Anscombe transform (HAT). Similarly, we call the HWT combined with BT and FT the Haar-Bartlett transform (HBT) and the Haar-Freeman transform (HFT), respectively. In the numerical study, we investigate the goodness-of-fit performance of the HBT- and HFT-based WSEs and compare them with the HAT-based WSE [15].
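Combining the pieces sketched above, an end-to-end WSE for a D-NHPP-based SRM reads roughly as follows. This is a sketch that assumes the hypothetical helper functions from the earlier sections are in scope; the exact handling in the paper of series whose length is not a power of two, and of the primary resolution level, may differ.

    import numpy as np

    def wse_intensity(y, transform, inverse, threshold_rule, lam):
        # Wavelet shrinkage estimate of the software intensity function lambda_i.
        z = transform(np.asarray(y, dtype=float))        # step 1: variance-stabilizing DT
        approx, details = haar_dwt(z)                     # step 2a: empirical coefficients (HWT)
        details = {j: threshold_rule(d, lam)              # step 2b: denoise wavelet coefficients
                   for j, d in details.items()}
        z_hat = haar_idwt(approx, details)                # step 2c: inverse HWT
        return np.maximum(inverse(z_hat), 0.0)            # step 3: inverse DT, clipped at zero

    # HAT-based WSE with soft thresholding and the universal threshold (hypothetical data):
    y = np.array([1, 3, 2, 6, 5, 8, 4, 7, 6, 5, 3, 2, 4, 1, 2, 0], dtype=float)
    lam_hat = wse_intensity(y, anscombe, anscombe_inv,
                            soft_threshold, universal_threshold(len(y)))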

5. Numerical Study

5.1. Data Sets and Measures

We use six real project data sets taken from [1], where they are named J1, J3, DATA14, J5, SS1, and DATA8. These data sets are software-fault count data (group data). For convenience, we rename them DS1~DS6 in this paper. Let $(n, x_n)$ denote the pair of the final testing date and the total number of detected faults. Then these data sets are represented by (62, 133), (41, 351), (46, 266), (73, 367), (81, 461), and (111, 481), respectively. We employ the MSE (mean squared error) as a goodness-of-fit measure, where
$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{\lambda}_i\right)^{2}.$
Additionally, we calculate the LL (log likelihood), which is defined as
$\mathrm{LL} = \sum_{i=1}^{n}\left\{y_i \ln \hat{\lambda}_i - \hat{\lambda}_i - \ln(y_i!)\right\},$
where $y_i = x_i - x_{i-1}$, and $\hat{\lambda}_i$ is the WSE estimate of the software intensity function $\lambda_i$.
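Under the reconstruction of MSE and LL given above (that is, assuming both measures are computed on the per-testing-date counts), the two measures can be computed as in the following sketch.

    import numpy as np
    from scipy.special import gammaln

    def mse(y, lam_hat):
        # mean squared error between observed faults per date and the estimated intensity
        y, lam_hat = np.asarray(y, dtype=float), np.asarray(lam_hat, dtype=float)
        return np.mean((y - lam_hat) ** 2)

    def log_likelihood(y, lam_hat, eps=1e-12):
        # Poisson log likelihood of the per-date counts under the estimated intensity
        y = np.asarray(y, dtype=float)
        lam_hat = np.maximum(np.asarray(lam_hat, dtype=float), eps)
        return np.sum(y * np.log(lam_hat) - lam_hat - gammaln(y + 1))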

5.2. Goodness-of-Fit Test

A total of 16 wavelet-based estimation methods are examined in this paper, since each DT-based WSE is applied with four thresholding schemes: hard thresholding (h) or soft thresholding (s), each combined with either the universal threshold (ut) or the "leave-out-half" cross-validation threshold (lht). Let HBT, HAT, and HFT denote the WSEs based on the Haar-Bartlett transform, the Haar-Anscombe transform, and the Haar-Freeman transform, respectively, where HBT1 and HBT2 correspond to the original and the modified Bartlett transforms introduced in Section 3, respectively. Additionally, the HAT-based WSE was introduced in [15], but only its results with hard thresholding were reported there. Here, we present all of the results comprehensively for further discussion.

We present the goodness-of-fit results based on the two threshold levels in Tables 3 and 4, respectively. HBT2 provides smaller MSE and larger LL than the others when "ut" is used, regardless of the thresholding method employed. It is worth mentioning that "ut" provides the same estimates with hard and soft thresholding in three data sets (DS1, DS2, and DS4). This is due to the relatively large value of "ut": when the threshold exceeds the magnitude of every empirical wavelet coefficient, both hard and soft thresholding set all of the coefficients to zero and therefore yield identical estimates. In our numerical experiments, "ut" is relatively large in these three software-fault count data sets, so the same estimate is obtained whichever thresholding rule is applied. HBT2 also looks better than the others when soft thresholding is applied with "lht." However, when "lht" is combined with hard thresholding, HBT1 gives the best results in DS3 and DS5, HAT in DS1, and HFT in DS6. Since "lht" is considered a more appropriate threshold level than "ut" for analyzing software-fault count data, we suggest that HBT2 be selected as an appropriate DT for WSE.

5.3. Comparison with MLE

Our concern in this section is the comparison of WSE with MLE. We estimate the software intensity function by using the three parametric D-NHPP-based SRMs listed in Table 2, where MLE is applied to estimate the model parameters for comparison. The HBT2-based WSE is selected as a representative among the 16 wavelet-based estimation methods. Figures 1 and 2 depict the estimation behavior of the cumulative number of software faults and of its increment per testing date (the number of software faults detected per testing date) for DS1. The observed data are plotted as bar graphs in both figures. Looking at (i) and (iii) in these figures, it is clear that the parametric D-NHPP-based SRMs with the maximum likelihood estimator can fit the real data. However, since the parametric D-NHPP-based SRMs assume a smooth software intensity function, they can estimate only the average tendency of the number of software faults detected at each testing date and cannot follow the microscopically fluctuating behavior in (ii) and (iv) of Figures 1 and 2. In other words, the apparent estimation accuracy based on the cumulative number of faults benefits from the "cumulative effects" in (i) and (iii). The experimental results demonstrate the potential applicability of the wavelet-based estimation methods with different thresholding schemes. The methods employed here do not incur expensive computational cost compared with MLE (an estimate is obtained in less than one second). This is a powerful advantage in applying the D-NHPP-based SRMs, in addition to the fact that practitioners do not need much time and effort to implement the wavelet-based estimation algorithms.

6. Concluding Remarks

In this paper, we have applied wavelet-based techniques to estimate the software intensity function. Four data transformations were employed to preprocess the software-fault count data. Through the numerical evaluation, we conclude that the wavelet-based estimation methods with the Bartlett transform have greater potential applicability to software reliability assessment practice than those with the other data transformations. Moreover, practitioners are not required to carry out troublesome model selection procedures or to take care of computational issues such as judging convergence and selecting initial parameter guesses for general-purpose optimization algorithms. Note that the result obtained here does not mean that the other data transformations are poor, because the performance evaluation was carried out only through goodness-of-fit tests. Although the prediction ability of the proposed methods is beyond the scope of this paper, their predictive performance should be examined and compared in future work.