Abstract

Researchers in reliability engineering regularly encounter variables that are discrete in nature, such as the number of events (e.g., failures) occurring in a certain spatial or temporal interval. The methods for analyzing and interpreting such data are often based on asymptotic theory, so that their accuracy is suspect when the sample size is not large. This paper discusses statistical inference for the reliability of stress-strength models when stress and strength are independent Poisson random variables. The maximum likelihood estimator and the uniformly minimum variance unbiased estimator are presented and empirically compared in terms of their mean square error; confidence intervals based on these point estimators are then derived via the delta method, and their performance is investigated through a simulation study, which assesses coverage rate and average length under several scenarios and for various sample sizes. The study indicates that the two estimators possess similar properties and that their accuracy remains satisfactory even when the sample size is small. An application to an engineering experiment is also provided to illustrate the use of the proposed methods.

1. Introduction

A stress-strength model, in the simplest terms, considers a unit/system that is subjected to an external stress, modeled by r.v. $X$, against which the unit sets its own strength, modeled by r.v. $Y$, in order to properly operate. The probability that the unit withstands the stress is then given by $R = P(X < Y)$, which is usually called reliability.

A great deal of work has been done on this topic: most of it deals with the computation of reliability when the distributions of stress and strength are known, or with its estimation under various parametric assumptions on $X$ and $Y$ when samples from $X$ and $Y$ are available. A complete review is available in [1]. By their very nature, many applications of the stress-strength model are related to engineering or military problems, where it is also referred to as a load-strength model [2]. However, there are also natural applications in medicine or psychology, which involve the comparison of two r.v. representing, for example, the effect of a specific drug or treatment administered to two groups (control and test); here, reliability assumes a wider meaning.

Almost all of these papers consider continuous distributions for $X$ and $Y$, since many practical applications of the stress-strength model in engineering presuppose continuous quantitative data. A relatively small amount of work is devoted to discrete or categorical data. Data may be discrete by nature, for example, the number of events occurring in a certain spatial or temporal interval; sometimes discrete data are derived from continuous ones by grouping, discretization, or censoring, and then, instead of numerical measurements on $X$ and $Y$, they are presented in the form of ordered categories.

Among the r.v. modeling discrete data, the Poisson can be of interest in several practical applications. The Poisson r.v. is often used to model rare events such as the number of claims in automobile insurance, the number of times a website is accessed, the number of calls to a phone operator, the number of words mistyped per page in a book, and so forth [3, 4]. The distribution of the difference between two independent r.v. each having a Poisson distribution has already attracted some attention [5]. Strackee and van der Gon [6] stated that “in a steady state the number of light quanta, emitted or absorbed in a definite time, is distributed according to a Poisson distribution. In view thereof, the physical limit of perceptible contrast in vision can be studied in terms of the difference between two independent variates each following a Poisson distribution”. Irwin [7] studied the case when the two variables have the same expected value; Skellam [8] was the first to discuss the problem when the two expected values differ. Strackee and van der Gon [6] gave tables of the approximate values of the cumulative probability for several combinations of the values of the two parameters. More recently, Karlis and Ntzoufras [9] used the Poisson difference distribution to model the difference in the decayed, missing, and filled teeth index before and after treatment; Karlis and Ntzoufras [10] applied it to model the difference in the number of goals in football games.

In this paper, we examine point and interval estimation for the reliability of the stress-strength model with independent Poisson stress and strength. Although the maximum likelihood (ML) and uniformly minimum variance unbiased (UMVU) estimators of reliability have known analytical expressions, their statistical properties cannot be easily derived and thus need to be assessed through a Monte Carlo simulation study. Confidence intervals for reliability based on an approximate expression for the variance are also presented, and their performance in terms of coverage rate and average width is empirically investigated.

The paper is laid out as follows: in Section 2, reliability for the Poisson stress-strength model and its ML and UMVU estimators are presented and discussed. Section 3 introduces approximate variance estimators and confidence intervals for reliability. Section 4 is devoted to a Monte Carlo (MC) study, which empirically investigates the performance of the ML and UMVU estimators and of the corresponding confidence intervals for different combinations of distributional parameters and sample sizes. Section 5 describes an application, and Section 6 gives final remarks.

2. Point Estimators

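Let $X$ and $Y$ be independent r.v. modeling stress and strength, respectively, with $X \sim \text{Poisson}(\lambda_x)$ and $Y \sim \text{Poisson}(\lambda_y)$. Then, the reliability of the stress-strength model is given by (see [1])

$$R = P(X < Y) = \sum_{i=0}^{\infty} \frac{e^{-\lambda_x}\lambda_x^{i}}{i!}\left[1 - \sum_{j=0}^{i} \frac{e^{-\lambda_y}\lambda_y^{j}}{j!}\right]. \tag{1}$$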

The terms of the external sum rapidly converge to zero, so reliability can in practice be computed from only its first few terms. As an example, we compute the reliability when and ; the partial sums are reported in Table 1: the value of $R$ is already stable at the 7th decimal digit when .
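The truncation argument above can be illustrated with a minimal Python sketch. It assumes the usual decomposition of the reliability as the sum over i of P(X = i) P(Y > i); the function names, the tolerance `tol`, and the stopping rule are illustrative choices, not taken from the paper.

```python
import math


def poisson_pmf(k, lam):
    """Poisson probability P(K = k), evaluated in log space for stability."""
    if lam == 0.0:
        return 1.0 if k == 0 else 0.0
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))


def reliability(lam_x, lam_y, tol=1e-12, max_terms=10_000):
    """Approximate R = P(X < Y) = sum_i P(X = i) * P(Y > i).

    The outer sum is truncated once the neglected stress probability
    P(X > i), which bounds the remaining terms, drops below `tol`
    (a hypothetical stopping rule chosen for this sketch).
    """
    total = 0.0
    cdf_x = 0.0
    cdf_y = 0.0
    for i in range(max_terms):
        p_x = poisson_pmf(i, lam_x)
        cdf_x += p_x
        cdf_y += poisson_pmf(i, lam_y)
        # max() guards against tiny negative values caused by rounding
        total += p_x * max(0.0, 1.0 - cdf_y)
        if 1.0 - cdf_x < tol:
            break
    return total
```

For instance, `reliability(2.0, 5.0)` evaluates the reliability for the hypothetical parameter values 2 (stress) and 5 (strength); since the Poisson tail decays faster than geometrically, the loop stops after a few dozen terms at most for parameters of this size.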

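If two simple random samples of size $n_x$ and of size $n_y$ from $X$ and $Y$, respectively, are available, reliability can be estimated with the ML estimator, obtained by substituting into (1) the maximum likelihood estimators of the unknown parameters $\lambda_x$ and $\lambda_y$, that is, the sample means $\bar{x}$ and $\bar{y}$:

$$\hat{R} = \sum_{i=0}^{\infty} \frac{e^{-\bar{x}}\bar{x}^{i}}{i!}\left[1 - \sum_{j=0}^{i} \frac{e^{-\bar{y}}\bar{y}^{j}}{j!}\right]. \tag{2}$$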

Otherwise, one can use the UMVU estimator [1]: where . Note that formula (3) is represented via a finite sum, whereas formula (2) contains a rapidly converging series. The number of calculations that formula (3) performs depends on the sample means and the sample sizes, which jointly define the number of terms of the external sum; in formula (2) the terms of the external sum rapidly converge to zero, so that it may practically need fewer calculations than (3).
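As a rough illustration of an estimator given by a finite sum, the following Python sketch implements the standard Rao-Blackwell construction for Poisson samples: conditionally on the sample total, a single observation is binomial with success probability equal to the reciprocal of the sample size, so that P(X = i) and P(Y > i) admit unbiased binomial estimators. This is an assumption about the form of the estimator; it may be arranged differently from formula (3), and all names below are hypothetical.

```python
import math


def binom_pmf(k, n_trials, p):
    """Binomial probability of k successes in n_trials trials."""
    if k < 0 or k > n_trials:
        return 0.0
    return math.comb(n_trials, k) * p**k * (1.0 - p) ** (n_trials - k)


def umvu_reliability(x_sample, y_sample):
    """Unbiased estimate of R = P(X < Y) built on the sample totals.

    Suitable for moderate totals only: math.comb returns exact integers,
    which may overflow when converted to float for very large counts.
    """
    n_x, n_y = len(x_sample), len(y_sample)
    t_x, t_y = sum(x_sample), sum(y_sample)
    total = 0.0
    cdf_y = 0.0
    # Terms with i > t_x or i >= t_y vanish, so the sum is finite.
    for i in range(min(t_x, t_y - 1) + 1):
        cdf_y += binom_pmf(i, t_y, 1.0 / n_y)
        total += binom_pmf(i, t_x, 1.0 / n_x) * (1.0 - cdf_y)
    return total
```

The number of loop iterations is determined by the two sample totals, which is consistent with the remark above that the computational cost of the finite-sum formula depends jointly on the sample means and the sample sizes.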

In Table 2, the values for the UMVU estimator are reported when and , for different combinations of sample sizes and . Note that the values of are very close to the value of even for small sample sizes and get closer as the sample sizes increase. These results are pictorially displayed in Figure 1 for and .

Due to the complex expressions involved, the bias of $\hat{R}$ and the variance of either estimator, $\hat{R}$ or $\tilde{R}$, cannot be analytically derived; a comparison of their performance (in terms of mean square error) can be carried out through MC simulations.

3. Variance Estimators and Confidence Intervals

Whereas the exact variance or mean square error of either estimator introduced in Section 2 is impractical to derive, an approximate value can be readily obtained through the delta method [11]. For the ML estimator $\hat{R}$, since $\bar{x}$ and $\bar{y}$ are independent estimators of $\lambda_x$ and $\lambda_y$, the variance of $\hat{R}$ can be approximated as with , and

An analogous approximation can be carried out for the variance of the UMVU estimator; remembering that , can be rewritten as and then the two first-order partial derivatives are given by

The approximate variances of $\hat{R}$ and $\tilde{R}$ derived through the delta method can be estimated by substituting the sample means for the unknown parameters in (4), thus getting and an analogous result for $\tilde{R}$.
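For the ML estimator, the plug-in delta-method variance can be sketched in a few lines of Python. The sketch relies on two identities obtained by differentiating the Poisson probabilities term by term, namely that the derivative of R with respect to the stress parameter equals -P(Y = X + 1) and the derivative with respect to the strength parameter equals P(X = Y); these are offered as an equivalent shortcut, not necessarily the expressions used in the paper. Function names and the fixed truncation `n_terms` are assumptions that are adequate only for moderate parameter values.

```python
import math


def poisson_pmf(k, lam):
    """Poisson probability P(K = k), evaluated in log space."""
    if lam == 0.0:
        return 1.0 if k == 0 else 0.0
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))


def reliability(lam_x, lam_y, n_terms=200):
    """Truncated series for R = P(X < Y)."""
    total, cdf_y = 0.0, 0.0
    for i in range(n_terms):
        cdf_y += poisson_pmf(i, lam_y)
        total += poisson_pmf(i, lam_x) * max(0.0, 1.0 - cdf_y)
    return total


def reliability_gradient(lam_x, lam_y, n_terms=200):
    """Partial derivatives of R with respect to lam_x and lam_y."""
    # dR/dlam_x = -P(Y = X + 1): more stress lowers reliability
    d_x = -sum(poisson_pmf(i, lam_x) * poisson_pmf(i + 1, lam_y)
               for i in range(n_terms))
    # dR/dlam_y = P(X = Y): more strength raises reliability
    d_y = sum(poisson_pmf(i, lam_x) * poisson_pmf(i, lam_y)
              for i in range(n_terms))
    return d_x, d_y


def delta_variance(x_bar, y_bar, n_x, n_y):
    """Plug-in delta-method variance of the ML estimator.

    Uses Var(sample mean) = lambda / n for Poisson data, with the
    sample means substituted for the unknown parameters.
    """
    d_x, d_y = reliability_gradient(x_bar, y_bar)
    return d_x**2 * x_bar / n_x + d_y**2 * y_bar / n_y
```

A quick sanity check is to compare `reliability_gradient` with a central finite difference of `reliability`; the two should agree to several decimal places.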

The Gamma function $\Gamma(\cdot)$ and its first derivative, $\Gamma'(\cdot)$, involved in the partial derivatives of $\tilde{R}$, have to be computed numerically. In the R software environment [12], this task is easily performed through the gamma and digamma functions, the latter providing the ratio $\Gamma'(\cdot)/\Gamma(\cdot)$.

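Once $\widehat{\operatorname{Var}}(\hat{R})$ has been computed, an approximate $(1-\alpha)$ confidence interval (CI) for $R$ can be built by recalling the asymptotic normality of $\hat{R}$:

$$\left(\hat{R} - z_{1-\alpha/2}\sqrt{\widehat{\operatorname{Var}}(\hat{R})},\ \hat{R} + z_{1-\alpha/2}\sqrt{\widehat{\operatorname{Var}}(\hat{R})}\right), \tag{9}$$

where $z_{1-\alpha/2}$ denotes the $(1-\alpha/2)$ quantile of the standard normal distribution, and in an analogous way for $\tilde{R}$. Since $R$ is bounded in $[0, 1]$, special care is needed when the estimate is close to one (close to zero) and/or the sample sizes are small: the upper bound may exceed one (the lower bound may fall below zero), and the CI in (9) is then modified as follows:

$$\left(\max\left\{0,\ \hat{R} - z_{1-\alpha/2}\sqrt{\widehat{\operatorname{Var}}(\hat{R})}\right\},\ \min\left\{1,\ \hat{R} + z_{1-\alpha/2}\sqrt{\widehat{\operatorname{Var}}(\hat{R})}\right\}\right). \tag{10}$$

More sophisticated asymptotic confidence intervals for $R$ can be built using normalizing transformations, such as the logit and the arcsine [13].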
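The interval construction described in this section can be sketched in Python as follows. The first function is the normal-approximation interval truncated to the unit interval; the second applies the delta method on the logit scale, which is one common way of implementing the transformation idea and is an assumption of this sketch rather than the procedure of [13]. Both take a point estimate and its estimated variance as inputs; the function names are hypothetical.

```python
import math
from statistics import NormalDist


def wald_interval(estimate, variance, level=0.95):
    """Normal-approximation CI, truncated so that it stays within [0, 1]."""
    z = NormalDist().inv_cdf(1.0 - (1.0 - level) / 2.0)
    half_width = z * math.sqrt(variance)
    return max(0.0, estimate - half_width), min(1.0, estimate + half_width)


def logit_interval(estimate, variance, level=0.95):
    """CI built on the logit scale and mapped back to (0, 1).

    Requires 0 < estimate < 1. By the delta method, the standard error
    of logit(estimate) is sqrt(variance) / (estimate * (1 - estimate)).
    """
    z = NormalDist().inv_cdf(1.0 - (1.0 - level) / 2.0)
    centre = math.log(estimate / (1.0 - estimate))
    half_width = z * math.sqrt(variance) / (estimate * (1.0 - estimate))

    def expit(t):
        return 1.0 / (1.0 + math.exp(-t))

    return expit(centre - half_width), expit(centre + half_width)
```

With a hypothetical estimate of 0.98 and variance 0.0004, `wald_interval` returns an upper bound truncated at exactly one, whereas `logit_interval` returns an asymmetric interval lying strictly inside the unit interval.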

4. Simulation Study

The simulation study aims at empirically comparing the performance of the ML and UMVU estimators, in terms of bias and mean square error, and of the confidence intervals based on them, in terms of coverage rate and average length. Since the approximation of the variance derived through the delta method (4) holds for large samples, we investigate to what extent it still holds for small and moderate sample sizes, and how it affects inferential results. In this MC study, the value of the parameter of the Poisson distribution for stress is first set equal to a “reference” value, , and the parameter of the Poisson distribution modeling strength is allowed to vary in order to obtain five different levels of reliability $R$, namely, , , , , and . Note that a value of is needed in order to get while lead only to . Then, is set equal to greater values (namely , , and ), and is allowed to vary in order to ensure the five values of reliability above. The corresponding values of for each combination of and values are reported in Table 3.

For each pair , a large number () of samples of size and of size are drawn from and independently. Different and unequal sample sizes are considered (all nine possible combinations of the values , and ). The ML and UMVU estimators are computed on each sample, their approximate variances are calculated, and the corresponding confidence intervals for $R$ are built. Some measures of performance for these estimators are supplied. In more detail, the MC root mean square error (RMSE) and the percentage relative bias (RB) of the ML estimator are provided: where denotes the value of $\hat{R}$ for the th sample. Analogous indexes are derived for $\tilde{R}$, which is unbiased, so that its MC relative bias is expected to be close to zero.

As for variance estimation, the true variance is approximated by its MC mean: with , and then the MC relative bias and RMSE of are calculated in the same way as for .

The MC coverage rate of the CIs is simply defined as follows: where $I(\cdot)$ is the indicator function, taking value 1 if its argument is true and 0 otherwise. The length of a confidence interval is equal to the difference between its upper and lower bounds. The same performance indexes are derived for $\tilde{R}$.
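The performance indexes used in the study can be computed from the simulated estimates and intervals along the following lines. This Python sketch assumes the conventional definitions (percentage relative bias as 100 times the mean error divided by the true value, RMSE as the root of the mean squared error, coverage as the share of intervals containing the true value); the function names and the input layout are hypothetical.

```python
import math


def relative_bias_pct(estimates, true_value):
    """MC percentage relative bias: 100 * (mean estimate - truth) / truth."""
    mean_estimate = sum(estimates) / len(estimates)
    return 100.0 * (mean_estimate - true_value) / true_value


def rmse(estimates, true_value):
    """MC root mean square error around the true value."""
    squared_errors = ((e - true_value) ** 2 for e in estimates)
    return math.sqrt(sum(squared_errors) / len(estimates))


def coverage_and_length(intervals, true_value):
    """MC coverage rate and average length of (lower, upper) intervals."""
    hits = sum(1 for lower, upper in intervals if lower <= true_value <= upper)
    average_length = sum(upper - lower for lower, upper in intervals) / len(intervals)
    return hits / len(intervals), average_length
```

In a full study, `estimates` would hold one value of the estimator per simulated pair of samples and `intervals` the corresponding confidence bounds, all evaluated against the reliability implied by the parameters used to generate the data.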

The simulation results for are reported in Table 4 (RB and RMSE for ML and UMVU point estimators), Table 5 (RB and RMSE for variance estimators), and Table 6 (coverage rate and average length of confidence intervals).

The simulation results show that the ML estimator always presents a very small bias even for small samples: in absolute value, the MC percentage relative bias is always smaller than for all the scenarios considered (whereas the maximum absolute MC percentage relative bias for $\tilde{R}$, which is theoretically unbiased, is ). In 42 scenarios out of 45, $\hat{R}$ underestimates $R$. Regarding the RMSE, the ML estimator performs better than the UMVU estimator in 27 cases out of 45 and worse in 7 cases, while in 11 cases the two RMSEs are equal to the third decimal digit. In any case, under each scenario, even for the smaller sample sizes, the RMSE values of the ML and UMVU estimators are very close. The ML outperforms the UMVU estimator as the value of gets close to ; their performances tend to be alike as and increase. For both estimators, for fixed sample sizes, the RMSE increases as decreases; for a fixed , the RMSE increases as the sample sizes decrease (as expected). Figure 2 displays the MC distribution of the ML and UMVU estimators in the case , for three values of sample size; it highlights their very similar behaviour.

Regarding the approximate variance estimators, their performance is surprisingly good even for the moderate sample sizes considered in this study; the percentage relative bias, in absolute value, is never greater than 8%: the worst performance occurs for and . Indeed, when both sample sizes equal , the RB is greater than for small sample sizes, whereas one would expect the RB to decrease in absolute value as the sample sizes increase. The results of further simulations not reported here show that the RB actually decreases to zero for . For both estimators, the rate of underestimates is almost equal to the rate of overestimates. Under each scenario, and especially when , the value of RB of the variance estimator is quite close to the corresponding value of the RB of , whereas the RMSE of is smaller than the RMSE of in each of the 45 cases considered. The RB of the two approximate variance estimators does not present a clear trend in terms of , while their RMSEs, for each of the pairs explored here, seem to present a maximum near and a minimum for .

The confidence intervals built upon the point estimators and these variance estimators present coverage that is always greater than 87% for UMVU and 90.5% for ML: the lowest value is obtained for and . They attain the nominal level (95%) for ; in 31 and 21 cases out of 45, respectively, the coverage rate of the ML and UMVU interval estimators is greater than or equal to 92.5%. Overall, the CIs present better coverage when $R$ is close to 0.5. In fact, in this case, the distributions of $\hat{R}$ and $\tilde{R}$ tend to be symmetrical and are more closely approximated by the normal distribution; the confidence intervals (9), which assume an underlying normal distribution, then show a better performance. The CIs based on the ML estimator almost always show a coverage rate greater than those based on the UMVU estimator; moreover, the latter are always a bit wider, except when . This feature tends to be negligible when the sample sizes are increased. As one would expect, the average length decreases as sample sizes increase, for fixed , and as increases, for fixed sample sizes.

The results for , which are not reported here for the sake of brevity, confirm the previous findings. Although the study is obviously not exhaustive, since only a limited number of scenarios have been covered, these general features can nevertheless be outlined.

5. An Example of Application

In this section, we apply the inferential techniques presented in Section 3 to a real dataset. The application is based on the data from an engineering experiment discussed in [3], carried out in an electric company, under several experimental conditions (called “runs”), corresponding to different combinations of 8 factors. The blackening experiment was conducted in a three-layer oven; when each run was completed, 30 masks from each layer in the oven were collected to examine the number of defects in each mask. The total number of defects in the 30 masks from the upper layer for each experimental run is observed (see Table 7). We focus on runs 1 and 2, where and defects, respectively, are counted.

Since the number of defects in a mask is a nonnegative integer count, the Poisson is a natural candidate distribution. Denoting with $X$ and $Y$ the variables modeling this number for run 1 and run 2, respectively, we are interested in determining a point estimate and an interval estimate for the probability that the number of defects in run 1 is smaller than in run 2, that is, $R = P(X < Y)$. Since the sample size is for both variables, then and ; then, supposing that $X$ and $Y$ follow a Poisson distribution, the ML and UMVU estimators and their corresponding approximate variances can be computed according to (2), (3), and (8); the associated confidence intervals can be computed recalling (10).

The results are presented in Table 8 and show the closeness between the two approaches. All the confidence intervals for $R$ exclude 0.5, meaning that the difference in sample means is evidence of the statistical dominance of $X$ over $Y$: the number of defects under run 1 is stochastically larger than the number of defects under run 2.

6. Conclusions

In this paper, point and interval estimators for the reliability of a Poisson stress-strength model are presented, discussed, and empirically compared through a Monte Carlo simulation study. The results show that the maximum likelihood and uniformly minimum variance unbiased estimators possess similar sampling properties, and that the former is slightly preferable to the latter in terms of dispersion around the true value of reliability. Moreover, although the variance estimators proposed here are approximate (i.e., biased), being based on the delta method for asymptotically normal r.v., the empirical results show that their bias is small even for moderate sample sizes, and these estimators can be usefully employed to build approximate confidence intervals, whose coverage is shown to be overall close to the fixed nominal level, especially when the value of reliability is close to 0.5. However, when $R$ is close to 1 (or, symmetrically, 0), the intervals can show a poorer performance; caution is then needed when constructing a confidence interval based on a point estimate close to 1 (0). In this case, one can resort, for example, to some variance-stabilizing transformation of the estimate.

Acknowledgments

The author thanks the editor and two anonymous referees for their comments and suggestions on the original paper. Special thanks go to Riccardo Inchingolo for his moral support.