An Improved Approach for Reduction of Defect Density Using Optimal Module Sizes
Software developers face the challenge of minimizing the number of defects introduced during software development. The defect density parameter helps developers identify where a product can be improved. Because the total number of defects depends on module size, there is a need to calculate the module size that minimizes defect density. In this paper, an improved model is formulated that relates defect density to variable module size. This relationship can be used to optimize overall defect density through an effective distribution of module sizes. Three available data sets have been examined with the proposed model for distinct values of its variables and parameters, subject to some constraints on the parameters. Curve fitting has been used to obtain the module size with minimum defect density, and goodness-of-fit measures have been computed to validate the proposed model on the data sets. Larger modules can be broken into smaller modules, and smaller modules can be merged, to minimize the overall defect density.
1. Introduction
The reliability of a software product is an important factor that developers consider before its formal release. Defect density (DD) is an important attribute that affects software reliability, and defect prediction is essential for estimating it. Many methods are available for predicting the number of defects in software during the testing phases. Estimating defect density is one of the approaches used to establish readiness for release. Fehlmann used six sigma methods for advanced prediction of defect density in software that has already been developed and is moving toward production. Mark presented the DevCOP method, which estimates defect density using verification and validation certificates. A few crucial factors affect the initial defect density. Analyzing these factors is important because it provides a quantitative means of identifying techniques for reducing the rate at which defects are inserted. This analysis can also be used to estimate the initial defect density, which may later be used when planning the required testing effort. Many models are currently available for calculating defect density. Most studies use multiple factors in their models, and the performance of these models improves as factors are added. Examples are the RADC model [5, 6] and the ROBUST model [7, 8], which combine factors multiplicatively to enhance performance.
Most earlier models in software reliability engineering focused on code complexity, the skill or experience of developers, and the development process. When requirements change, however, considering only these factors does not capture the overall effect. Earlier studies show that defects are randomly distributed in a software system, so the size of a particular module may affect its defect density. Ferdinand's study argued that the number of defects increases with the number of code segments. Specifically, this theory asserts that the number of defects is proportional to a power of the number of code segments. This consideration has been used in the proposed approach to relate defect density to module size.
2. Related Work
Designing software with small modules is easier than designing it with large modules, so the design may be improved by minimizing module size. However, Basili and Perricone reported that larger modules had a lower defect density, and this observation held even for the more complex modules. Their analysis used a data set in which most of the modules were small.
Alsmadi and Najadat proposed a technique that generates a large number of randomly selected modules from tested data sets and, in every cycle, compares two modules, one faulty and one not. This comparison can be used to estimate the defect density attributable to the identified faulty modules.
The study by Withrow shows that defect density decreases with increasing module size up to a certain point and increases with module size thereafter. These results support the hypothesis given by Banker and Kemerer.
Hatton proposed logarithmic growth in the number of defects for software of up to 200 lines of code, with reference to Withrow’s data. Malaiya and Denton evaluated Rosenberg's argument that the observed decrease in defect density with rising module size is misleading, and they proposed a model that accounts for such observations.
Malaiya and Denton proposed a model that takes into account both the decreasing and the rising trends of defect density, which define two different regions. They also applied the model to actual data to obtain parameter values.
In the next section, the defect density model proposed by Malaiya and Denton is considered, and a modified model with two additional parameters is proposed, which can be used to examine the effect of the module size distribution on the overall defect density.
3. Proposed Model
In this paper, two mechanisms responsible for faults in a software system are considered. Most software systems are composed of modules, and the modules themselves are composed of instructions. Using these two considerations, Malaiya and Denton proposed a composite model with parameters that control the defect density. That model establishes only the relation between defect density and module size; it does not provide parameters for adjusting module size in order to reduce defect density. Parameters are therefore needed that indicate how module sizes should change to minimize defect density. In this paper, Malaiya's model is modified by considering the argument given by Ferdinand and by adding two parameters to further reduce the defect density of software systems. The first category consists of module-related faults, which arise from the interaction of a module with other modules. The second category consists of instruction-related faults, which arise from the incorrect implementation of an individual module. In the proposed model, the defect density for module-related faults is denoted by $D_m(s)$, that for instruction-related faults by $D_i(s)$, and the overall defect density by $D(s)$, where $s$ is the module size.
First Category of Faults. These faults are associated with parameter passing among modules. Some may be related to global data or to assumptions that modules make about one another. Such faults are assumed to be uniformly distributed among the modules. In terms of defect density, they can be regarded as an overhead that declines in proportion to the increase in module size.
Defect density for module-related faults in a module of size $s$ is given by
$$D_m(s) = \frac{a}{s^{\alpha}}, \tag{2}$$
where $\alpha$ is a nonzero integer added to the model to reduce the defect density exponentially and $a$ is a proportionality constant. In (2) the minimum possible value of $s$ is one and $a$ is a suitable parameter. The model in (2) appears consistent with the model proposed by Shen et al.
A Second Category of Faults. This category represents faults that occur due to the incorrect implementation of an instruction or module, so they can be treated as instruction faults. To obtain the defect density for these faults, the probability of an incorrect instruction, which has two components, is considered first. The first component is assumed to be a constant $b$, and the other component depends on the interaction among instructions. The second component is therefore proportional to the module size $s$. The defect density due to instruction-related defects can be expressed as
$$D_i(s) = b + c\,s^{\beta}, \tag{3}$$
where $c$ is another proportionality parameter and $\beta$ is a nonzero integer added to reduce the defect density exponentially. From (2) and (3), the total defect density is
$$D(s) = \frac{a}{s^{\alpha}} + b + c\,s^{\beta}. \tag{4}$$
Equation (4) represents the total defect density due to module-related and instruction-related faults, where $D(s)$ is the dependent variable and $s$ is the independent variable.
From (4), the total defect density can be optimized with respect to the module size $s$. The minimum can be found using the calculus of maxima and minima, under the assumption that (4) is well defined and differentiable within its domain.
To find the minimum value of defect density, differentiating (4) with respect to $s$ gives
$$\frac{dD}{ds} = -\frac{\alpha a}{s^{\alpha+1}} + \beta c\,s^{\beta-1}. \tag{5}$$
For maxima and minima, equate (5) to zero and put $s = s_{\min}$. Consider the following:
$$-\frac{\alpha a}{s_{\min}^{\alpha+1}} + \beta c\,s_{\min}^{\beta-1} = 0, \tag{6}$$
$$s_{\min} = \left(\frac{\alpha a}{\beta c}\right)^{1/(\alpha+\beta)}. \tag{7}$$
Equation (7) gives the module size $s_{\min}$ at which the defect density is stationary. To verify that this size gives the minimum defect density, differentiate (5) once more. Consider the following:
$$\frac{d^{2}D}{ds^{2}} = \frac{\alpha(\alpha+1)\,a}{s^{\alpha+2}} + \beta(\beta-1)\,c\,s^{\beta-2}. \tag{8}$$
Substituting the value of $s_{\min}$ from (7) in (8), and using (6) to eliminate $c$, gives
$$\left.\frac{d^{2}D}{ds^{2}}\right|_{s=s_{\min}} = \frac{\alpha(\alpha+\beta)\,a}{s_{\min}^{\alpha+2}}. \tag{9}$$
This is positive when the constants are positive and $\alpha$ and $\beta$ are positive integers. So, at the point $s = s_{\min}$ the defect density is a minimum under these restrictions.
So the minimum defect density is
$$D_{\min} = \frac{a}{s_{\min}^{\alpha}} + b + c\,s_{\min}^{\beta}. \tag{10}$$
It is also possible to formulate a model for defect density versus module size with more variational parameters, but it may become difficult to justify the assumptions made for those parameters.
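The optimization described above can be sketched numerically. This is a minimal sketch, assuming the total defect density has the form D(s) = a/s^alpha + b + c*s^beta; the symbol names `a`, `b`, `c`, `alpha`, `beta`, the function names, and the constants are assumptions made for illustration and are not taken from the paper's tables.

```python
def defect_density(s, a, b, c, alpha=1, beta=1):
    """Assumed model form: D(s) = a / s**alpha + b + c * s**beta."""
    return a / s ** alpha + b + c * s ** beta


def optimal_size(a, c, alpha=1, beta=1):
    """Size where dD/ds = 0: (alpha * a / (beta * c)) ** (1 / (alpha + beta))."""
    if min(a, c) <= 0 or min(alpha, beta) <= 0:
        raise ValueError("a, c, alpha and beta must all be positive")
    return (alpha * a / (beta * c)) ** (1.0 / (alpha + beta))


# Illustrative constants only; they are not fitted to any data set in the paper.
A, B, C = 4.0, 1.0, 0.01

for alpha, beta in [(1, 1), (2, 2)]:
    s_min = optimal_size(A, C, alpha, beta)
    d_min = defect_density(s_min, A, B, C, alpha, beta)
    # A true minimum is no larger than the density at nearby sizes.
    assert d_min <= defect_density(0.9 * s_min, A, B, C, alpha, beta)
    assert d_min <= defect_density(1.1 * s_min, A, B, C, alpha, beta)
    print(f"alpha={alpha}, beta={beta}: s_min={s_min:.3f}, D_min={d_min:.3f}")
```

With equal exponents the minimum density works out to b + 2*sqrt(a*c) in both settings, while the optimal size shrinks as the exponent grows.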
Special Cases. (i) If the value of and is taken, then it gives and , and total defect density is given by Differentiating (11), with respect to , gives So, the module size for minimum defect density is and the minimum defect density is given by
(ii) If the value of and is taken, then it gives and , and total defect density is given by So, the module size for minimum defect density is given by and the minimum defect density is given by
(iii) If the value of and is taken, then it gives and , and total defect density is given by , and under some suitable constraints over , , and . The point will give minimum point. Consider the following:
(iv) If the value of and is taken, then it gives and , and total defect density is given by , and under some suitable constraints over , , and . So, is a point of minima and the minimum value is given by Under some restrictions on the constants, this type of model can be divided into two distinct regions. In the first region defect density decreases with increasing module size, and in the second region defect density increases with module size.
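As a worked illustration of such special cases, the derivation below assumes that the total defect density has the form $D(s) = a/s^{\alpha} + b + c\,s^{\beta}$ with positive constants $a$, $b$, $c$. The symbols and the four exponent pairs are assumptions chosen for this sketch; their correspondence to cases (i) to (iv) above is not confirmed by the surviving text.

```latex
\begin{align*}
\alpha=1,\ \beta=1:&\quad D(s)=\frac{a}{s}+b+cs, &
  \frac{dD}{ds}&=-\frac{a}{s^{2}}+c=0 \;\Rightarrow\; s_{\min}=\sqrt{\frac{a}{c}}, &
  D_{\min}&=b+2\sqrt{ac},\\[4pt]
\alpha=2,\ \beta=2:&\quad D(s)=\frac{a}{s^{2}}+b+cs^{2}, &
  \frac{dD}{ds}&=-\frac{2a}{s^{3}}+2cs=0 \;\Rightarrow\; s_{\min}=\left(\frac{a}{c}\right)^{1/4}, &
  D_{\min}&=b+2\sqrt{ac},\\[4pt]
\alpha=1,\ \beta=2:&\quad D(s)=\frac{a}{s}+b+cs^{2}, &
  \frac{dD}{ds}&=-\frac{a}{s^{2}}+2cs=0 \;\Rightarrow\; s_{\min}=\left(\frac{a}{2c}\right)^{1/3}, &
  D_{\min}&=b+\frac{3a}{2\,s_{\min}},\\[4pt]
\alpha=2,\ \beta=1:&\quad D(s)=\frac{a}{s^{2}}+b+cs, &
  \frac{dD}{ds}&=-\frac{2a}{s^{3}}+c=0 \;\Rightarrow\; s_{\min}=\left(\frac{2a}{c}\right)^{1/3}, &
  D_{\min}&=b+\frac{3a}{s_{\min}^{2}}.
\end{align*}
```

In each line the second derivative at $s_{\min}$ equals $\alpha(\alpha+\beta)\,a/s_{\min}^{\alpha+2}$, which is positive for positive $a$, so each stationary point is a minimum under these assumptions.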
4. Analysis of Proposed Model
As discussed in the previous section, the model was formulated to show the relationship between software defect density and module size. This section discusses the data collected for analyzing the proposed model, the method used to calculate the parameter values for the two cases, the calculation of defect density from those parameter values, and finally the optimal module size for minimum defect density.
4.1. Data Collection
The proposed model has been analyzed for two distinct settings of its two additional integer parameters. Other settings could also be analyzed, but in this paper only two special cases, case (i) and case (ii), are used to validate the model. Three different data sets have been used to analyze the model. The data made available by Basili and Perricone has been taken as the first data set. It comprises mostly small modules; very few modules are larger than 200 in size. This data set is given in Table 1. The second data set, given by Withrow, comprises larger module sizes and is given in Table 2. Because declining defect density is noticed only for small modules, a third data set has been collected from the PDR (Promise data repository). It covers 23 closed-source projects for which the number of modules, size, and defect density are available. This data set is given in Table 3.
4.2. Parameter Calculation
The previous section described the three data sets used to analyze the proposed model. Curve fitting has been used to calculate the values of the three model constants. The values are obtained for all three data sets under the two settings of the additional parameters, case (i) and case (ii). The first data set comprises only small modules, so, according to the model, the coefficient of the term that grows with module size plays very little role. This data set shows no values in the second region, where defect density grows with increasing module size. Only the first region is seen, where defect density decreases with increasing module size. The optimal module size and minimum defect density fields for the first data set are therefore left blank in Tables 1 and 2. The parameter values are given in Table 1 for case (i) and in Table 2 for case (ii). The optimal module size in case (i) has been calculated using (13), and the minimum defect density for this optimal size has been calculated using (14); both are given in Table 1. Similarly, the optimal module size and the corresponding minimum defect density for case (ii) are given in Table 2.
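The fitting step can be sketched as follows. This is a minimal sketch, assuming the model form D(s) = a/s^alpha + b + c*s^beta with the exponents fixed in advance, so that the three constants enter linearly and ordinary least squares applies. The paper does not state which fitting algorithm was used; the function names, the symbols, and the noise-free synthetic data are assumptions for illustration only.

```python
def solve_linear(matrix, rhs):
    """Solve a small dense system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular system; more distinct module sizes needed")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[r][k] -= factor * aug[col][k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        tail = sum(aug[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def fit_constants(sizes, densities, alpha=1, beta=1):
    """Least-squares estimates of (a, b, c) for fixed integer exponents."""
    rows = [[float(s) ** -alpha, 1.0, float(s) ** beta] for s in sizes]
    # Normal equations: (X^T X) [a, b, c]^T = X^T y
    normal = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    moment = [sum(r[i] * d for r, d in zip(rows, densities)) for i in range(3)]
    return solve_linear(normal, moment)


# Synthetic data generated from known constants; not the paper's data sets.
sizes = [5, 10, 20, 40, 80, 160, 320]
densities = [4.0 / s + 1.0 + 0.01 * s for s in sizes]

a_hat, b_hat, c_hat = fit_constants(sizes, densities, alpha=1, beta=1)
s_opt = (a_hat / c_hat) ** 0.5  # optimal size when alpha = beta = 1
print(f"a={a_hat:.4f}, b={b_hat:.4f}, c={c_hat:.4f}, optimal size={s_opt:.2f}")
```

When a data set covers only small modules, the estimate of `c` in such a fit is poorly constrained, which matches the observation that no optimal size is reported for the first data set.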
4.3. Calculation of Defect Density
After calculating the parameter values for the two cases, defect densities have been calculated with the proposed model. The calculated values for the two special cases are then compared graphically with the observed values of defect density. The calculated values of defect density for the data made available by Basili and Perricone are presented in Table 3 and plotted in Figure 1.
The decreasing trend of defect density with increasing module size has been analyzed on the data made available by Basili and Perricone. The plot indicates the relationship between the observed data and the fitted values for case (i) and case (ii).
The graph shows no region in which defect density grows with increasing module size. Only the first region is seen, where defect density decreases with increasing module size. Case (i) and case (ii) both give a more linear relation between defect density and module size than the observed data. In case (ii), geometrically progressing sizes are used, which give the minimum defect density with respect to size.
The data set given by Basili and Perricone contains no larger modules, so another data set, given by Withrow, which consists mostly of larger modules, has been taken for analysis. Because of these larger modules, both regions can be seen. These data, for Ada modules, are given in Table 4 together with the calculated values of defect density and are plotted in Figure 2.
The growth of defect density with increasing size of larger modules has been analyzed on the data made available by Withrow. The data show both trends indicated by the proposed model, and both trends are seen in both special cases.
This increasing trend in defect density for larger modules may arise because large modules are not tested as thoroughly as smaller modules, resulting in a relatively higher defect density in larger modules. The graph shows that case (ii) gives better results in terms of minimizing defect density with increasing module size.
The third data set has been collected from the PDR (Promise data repository). These data, with the calculated values of defect density in case (i) and case (ii), are presented in Table 5 and plotted in Figure 3.
This data set also shows the two regions of decreasing and increasing defect density. The two trends have been analyzed on the data collected from the PDR and on the calculated values for the two special cases.
From Figure 3, it can be inferred that case (ii) gives improved results in comparison with case (i), and both regions are seen in the graph. The results for this data set show very small variation in defect density because the modules are very large. This suggests that once module size increases in geometric progression, the rate of increase of defect density falls with module size.
4.4. Goodness of Fit
In the previous section, defect densities for case (i) and case (ii) were calculated using the proposed model. The fitted values must then be assessed with goodness-of-fit measures. For this purpose, the SPSS statistical analysis tool has been used to calculate several statistics. The sum of squares due to error (SSE) measures the total deviation of the response values from the fit. A value closer to 0 indicates that the model has a smaller random error component and that the fit will be more useful for prediction. The R-square ($R^2$) statistic indicates how successful the fit is in explaining the variation of the data. It can take any value between 0 and 1, with a value closer to 1 indicating that a greater proportion of the variance is accounted for by the model. For example, an $R^2$ value of 0.59 means that the predictors explain 59% of the variation in the dependent variable. The $p$ value is the probability, assuming the null hypothesis is true, of obtaining a result at least as extreme as the one observed. If the $p$ value is less than 0.05, the null hypothesis can be rejected. In this study, the null hypothesis is “there is no goodness of fit between the observed values of defect density and the fitted values of defect density.” Table 6 shows these measures for all three data sets using case (i) and case (ii).
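The two descriptive measures above can be computed directly from observed and fitted values. This is a minimal sketch with hypothetical function names and made-up numbers; it does not reproduce the SPSS output or Table 6, and it omits the $p$ value, which depends on the test SPSS applies.

```python
def sum_squared_error(observed, fitted):
    """SSE: total squared deviation of the observed values from the fit."""
    return sum((o - f) ** 2 for o, f in zip(observed, fitted))


def r_squared(observed, fitted):
    """R^2 = 1 - SSE / SST, where SST is the variation about the observed mean."""
    mean = sum(observed) / len(observed)
    total = sum((o - mean) ** 2 for o in observed)
    if total == 0:
        raise ValueError("observed values are constant; R^2 is undefined")
    return 1.0 - sum_squared_error(observed, fitted) / total


# Made-up defect densities for illustration; not values from the paper.
observed = [2.0, 4.0, 6.0, 8.0]
fitted = [2.5, 3.5, 6.5, 7.5]

sse_value = sum_squared_error(observed, fitted)
r2_value = r_squared(observed, fitted)
print(f"SSE={sse_value:.3f}, R^2={r2_value:.3f}")
```

Here each fitted value is off by 0.5, so the SSE is 1.0 against a total variation of 20, giving an R-square of 0.95.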
From Table 6, it is inferred that, in all three data sets, case (ii) fits the observed data better than case (i).
5. Performance Evaluation
The model proposed by Malaiya and Denton relates module size to defect density in the SDLC. In that model the modules are sized within small ranges, whereas in the model proposed in this paper the range of module sizes is extended by using two additional parameters as integer powers of module size. The proposed approach is given in (4). Analysis of the proposed model with the data sets (the two data sets used by Malaiya and Denton and the data set from the PDR) shows an improvement over Malaiya’s model in minimizing defect density with an optimum module size. The graph in Figure 1 shows results closer to the observed defect density when module size is raised to the power 2. Similarly, the graph in Figure 2 indicates improved optimization of defect density when the power 2 is used, and a further improvement can be seen in Figure 3 for the data set collected from the PDR with the power 2. Furthermore, as the power of module size is increased, the proposed model is likely to provide better results.
6. Conclusions and Future Work
This paper has proposed a model that describes the impact of module size on defect density. Defect density decreases as module size increases up to an optimal size and gradually rises beyond it. In this paper the module size has been considered in geometric progression. Two additional parameters have been added to the model proposed by Malaiya et al. The proposed model has been analyzed with the data sets used by Malaiya et al. and with a data set collected from the PDR. Analysis of two special cases shows an optimization of defect density with respect to the observed defect density. The proposed model also identifies an optimal module size that may be used for an effective distribution of module sizes during the SDLC, together with a condition for the optimal distribution of module sizes. It may therefore be concluded that defect density can be optimized through an effective distribution of module sizes: larger modules can be broken into smaller modules, and very small modules can be merged, to optimize the overall defect density during the SDLC. In later work, the proposed model can be extended with more variables and different values, which may lead to a better feasible solution. In all such cases the trade-offs with the complexity of the system should also be taken care of. The implications of minimizing defect density can be analyzed further to investigate the failure rate of software.
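The effect of redistributing module sizes can be illustrated with a short sketch. It assumes the model form D(s) = a/s^alpha + b + c*s^beta and, as a further assumption not stated in the paper, that the overall defect density of a system is the size-weighted average of its module densities (total expected defects divided by total size). All names and constants are hypothetical.

```python
def defect_density(s, a, b, c, alpha=1, beta=1):
    """Assumed model form: D(s) = a / s**alpha + b + c * s**beta."""
    return a / s ** alpha + b + c * s ** beta


def overall_density(sizes, a, b, c, alpha=1, beta=1):
    """Size-weighted overall density: expected defects / total size (assumption)."""
    defects = sum(s * defect_density(s, a, b, c, alpha, beta) for s in sizes)
    return defects / sum(sizes)


# Illustrative constants; with alpha = beta = 1 the optimal size is sqrt(a/c) = 20.
A, B, C = 4.0, 1.0, 0.01

unbalanced = [5, 5, 5, 5, 400]  # four very small modules and one very large one
balanced = [20] * 21            # same total size, every module at the optimum

before = overall_density(unbalanced, A, B, C)
after = overall_density(balanced, A, B, C)
print(f"overall density before={before:.3f}, after={after:.3f}")
```

Under these assumptions, merging the very small modules and splitting the very large one lowers the overall density to the single-module minimum.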
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
References
T. Fehlmann, “Defect density prediction with six sigma,” in Proceedings of the 6th Software Measurement European Forum, Rome, Italy, 2009.
S. Mark, “Utilizing verification and validation certificates to estimate software defect density,” in Proceedings of the 10th European Software Engineering Conference Held Jointly with 13th ACM SIGSOFT International Symposium on Foundations of Software Engineering, pp. 381–384, 2005.
M. Takahashi and Y. Kamayachi, “An empirical study of a model for program error prediction,” in Proceedings of 8th International IEEE Conference on Software Engineering, pp. 330–333, August 1985.
J. C. Munson and T. M. Khoshgoftar, “Software metrics in reliability assessment,” in Handbook of Software Reliability Engineering, M. R. Lyu, Ed., IEEE-CS Press/McGraw-Hill, 1996.
W. Farr, “Software reliability modeling survey,” in Handbook of Software Reliability Engineering, M. R. Lyu, Ed., IEEE-CS Press/McGraw-Hill, 1996.
N. Li and Y. K. Malaiya, “ROBUST: a next generation software reliability engineering tool,” in Proceedings of the 6th International Symposium on Software Reliability Engineering, pp. 375–380, October 1995.
Y. K. Malaiya and J. A. Denton, “Module size distribution and defect density,” in Proceedings of the 11th International Symposium on Software Reliability Engineering (ISSRE '00), pp. 62–71, October 2000.
J. Rosenberg, “Some misconceptions about lines of code,” in Proceedings of the 4th International Software Metrics Symposium, pp. 137–142, November 1997.
SPSS, IBM SPSS Statistics Version 20 64-bit, http://www.spss.com/statistics.