Research Article | Open Access
Tolerance Intervals in a Heteroscedastic Linear Regression Context with Applications to Aerospace Equipment Surveillance
A heteroscedastic linear regression model is developed from plausible assumptions that describe the time evolution of performance metrics for equipment. The inherited motivation for the related weighted least squares analysis of the model is an essential and attractive selling point to engineers with interest in equipment surveillance methodologies. A simple test for the significance of the heteroscedasticity suggested by a data set is derived and a simulation study is used to evaluate the power of the test and compare it with several other applicable tests that were designed under different contexts. Tolerance intervals within the context of the model are derived, thus generalizing well-known tolerance intervals for ordinary least squares regression. Use of the model and its associated analyses is illustrated with an aerospace application where hundreds of electronic components are continuously monitored by an automated system that flags components that are suspected of unusual degradation patterns.
The model and analyses developed in this paper address a problem encountered when analyzing data from service life tests of aerospace hardware packages. Data for as many as 700 performance metrics per part type are automatically stored during surveillance testing and subsequently input into a software program where, up to this point, an ordinary least squares line based on normal-theory has been routinely fit to the data using time-in-service as the explanatory variable. The software program also outputs tolerance intervals based on the ordinary least squares analysis. Engineers monitoring this process are alerted only to those cases where observations in the scatter plot fall outside the tolerance intervals or the tolerance interval crosses a given limit within some specified future time interval (e.g., 60 months). In cases where the alert suggests an increasing accelerated degradation, a proactive corrective action (e.g., part replacement) may be initiated. In cases where the alert suggests less than expected degradation, a cursory investigation to determine if the part is being utilized properly is initiated. Tolerance intervals are often similarly used for monitoring environmental applications .
Compared to having engineers individually examine the data from all the combinations of metrics and part types, the automated monitoring and flagging process is quite cost-effective. However, when the variance of the metric increases with time, the tolerance intervals that result from the ordinary least squares analyses fail in the sense that the intervals become too narrow as time increases. Figure 1 illustrates this point, where the 111 observations are the performance metric for one type of part. It is evident from the figure that the random errors associated with the regression model are heteroscedastic. Heteroscedasticity is, of course, not unique to our application. It arises routinely in other applications such as economics, behavioral sciences, social sciences, environmental science, and computer vision. The works in [2–6] provide examples within these disciplines, respectively.
Figure 1 also shows two sets of pointwise 95%-content tolerance intervals constructed at the 90% confidence level. The dashed lines represent the tolerance intervals derived from an ordinary least squares analysis, while the solid lines represent the tolerance intervals derived from an estimated weighted least squares analysis that is described in Section 1.2. The bold line in the figure is the estimated weighted least squares regression line. It is evident in Figure 1 that the ordinary least squares tolerance intervals are inadequate in the sense that they are too wide for small time values and too narrow for large time values. On the other hand, the more pronounced curvature associated with the tolerance intervals derived from the estimated weighted least squares analysis more adequately captures the range of the observations at both ends of the time spectrum.
1.2. Model for Heteroscedasticity
Examination of aerospace data for many part types and many performance metrics led us to propose the following model for heteroscedasticity. Let denote the metric level at time for a particular part, and for motivational purposes, assume that the underlying process is a discrete-time valued process ( = ). Assume that the initial value of the process is , where is an unknown constant and is a normally distributed random variable with zero mean and unknown variance . If the degradation process has both a constant deterministic trend, say , and also a random stochastic perturbation, it would follow that , where is a normally distributed random variable with zero mean and unknown variance . It follows from successive substitutions that , or equivalently, , where is a normally distributed random variable with zero mean and unknown variance . The model for is (drifting) Brownian motion, except for the fact that the variance function is displaced from the origin by . In our application, the process is measured one time on each unit; so the data are a collection of independent observations , where is the process associated with the th unit. Thus, we have the model () for the observations.
Equivalently, the model implies that observations are independent and normally distributed with means and variances . The fact that a variance component, , is responsible for the heteroscedasticity is somewhat unique to our model since the large literature pertaining to heteroscedasticity models usually assumes variances of the form , where is an arbitrary function (e.g., a polynomial or the exponential function); see [2, 7] and references therein. While these types of models have seen many successful applications, the lucid interpretation of the heteroscedastic variance function was crucial when eliciting buy-in from our aerospace engineering project stakeholders.
1.3. Tolerance Interval for Heteroscedastic Regression
Defining , the log-likelihood function for based on the observations is
For a fixed the conditional MLEs, say and , of and satisfy the linear system of equations:
and the conditional MLE of is . The conditional MLEs, and , are the weighted least squares estimator of the slope and intercept and have an estimated variance-covariance matrix equal to , where is the matrix whose first column is all ones and whose second column is the set of values and =
The profile log-likelihood function (e.g., see reference ) of is , which can be maximized to find the MLE of , say . The MLEs of , and are then are , , and . Unbiasedness of and follows from general results in .
A pointwise -content tolerance interval with confidence level for a regression line in the context of homoscedastic normal errors was derived in . The following theorem, which is proved in the Appendix A, extends that result to our context.
Theorem 1. If is known, a pointwise -content tolerance interval with confidence level for a normal distribution that has mean and variance is where is the upper percentile of a chi-square distribution with degrees of freedom and is the solution to , with .
Proof. See Appendix A.
1.4. Testing for Heteroscedasticity
Of special importance when analyzing the data for our application is the test of that the data are homoscedastic. If the hypothesis is not rejected, all the pertinent inference can be drawn from an efficient ordinary least squares analyses. On the other hand, rejecting protects a practitioner from the pitfalls of using an inappropriate least squares analysis which include a higher than expected false alert rate. Deriving a test for is particularly interesting from the standpoint that there are alternative tests available in the literature and in subsequent sections in this paper we present a unifying framework for relating alternative tests to each other.
1.5. Overview of Remaining Sections
In Section 2, we position the proposed heteroscedastic model as a special case of a general mixed linear model. We then propose a test of that is motivated by informally combining the root arguments used when deriving locally most powerful tests and score tests. We show that the test statistic has a distribution that only depends on the underlying variance ratio and thus is amenable to a significance test of . A Monte Carlo algorithm for computing the required critical values is outlined. In Section 3, we relate our proposed test to some common tests for heteroscedasticity, namely, the Breusch and Pagan, White, and likelihood ratio tests. In Section 4, we show additional examples of our test for heteroscedasticity and the use of tolerance intervals within the context of aerospace engineering case study. We also report results of a simulation study that examined the power of all the tests that are discussed in this paper. We close the paper with a short summary in Section 5.
2. Proposed Test for Heteroscedasticity
The heteroscedastic regression model developed in Section 1 is a special case of a more general mixed linear model defined by
where is an matrix (we assume without loss of generality that has a full column rank) of known constants, is vector of fixed-effects, is matrix of known constants, is a vector of unobservable random-effects that follow an distribution, and is an vector of unobservable random error terms, independent of , and having a distribution. The special case considered in Section 1 is simply where and with a diagonal matrix with the values equal to .
We begin the derivation of our proposed test for heteroscedasticity by writing the log-likelihood of the general mixed linear model, based on , as follows:
Next, there exists an orthogonal matrix such that , where is a diagonal matrix whose elements are the eigenvalues of . It follows that , and hence . Equivalently, . Combining this result with the definitions and , the log-likelihood can be rewritten as
If and were known, the locally most powerful test statistic would be
To obtain a useable test statistic we follow the procedure used when deriving the score test (e.g., see ) by replacing the unknown and by their conditional MLEs given , which are and , where and . The resulting statistic is
The distribution of r is and by the nature of it is clear that its distribution only depends on . A simple approach for finding critical values to test is to simulate vectors from an , compute the corresponding values of from (8), and estimate the critical value from the resulting empirical distribution (edf) function of values. Denoting the upper- percentile of the edf by , the test rejects if . We emphasize the computational simplicity of , and the ease in which critical values can be obtained. It is also possible to reexpress the null distribution of (8) as a ratio of quadratic forms in independent standard normal random variables and then the saddlepoint approximation described by  can be employed to approximate the P-value associated with the test.
A locally most powerful invariant (LMPI) test of was derived in [12–14]. The derivation of this test is sketched in Appendix B. It can be seen there that the complexity of computing LMPI test is nontrivial. The key result in Appendix B is the demonstration that the LMPI test and the test are equivalent under the general mixed linear model. The practical significance of this finding is that the computation of the LMPI test, as originally proposed, involves a complicated eigenvalue and eigenvector analysis, whereas computing the equivalent form test only requires fitting the model with ordinary least squares tools. To characterize this computational contrast another way, the computations associated with the equivalent form test can be done with software as simple as Excel, whereas the computations associated with the originally proposed form LMPI test will require a rich matrix processing software package.
From a conceptual perspective, the equivalence between the LMPI test and the test is interesting in the sense that the original complicated derivation of the LMPI test can be replaced by the simpler derivation of the test. In addition, the demonstrated equivalence also provides further motivation for a procedure introduced in  where a test statistic is derived by replacing nuisance parameters in the locally most powerful test statistic by their conditional MLEs. Results in  show this procedure produces an asymptotically uniformly most powerful test. Our equivalence finding shows that when we used this approach to derive the R test, in a finite sample context, we arrive at a locally most powerful invariant test.
3. Other Tests for Heteroscedasticity
The Breusch and Pagan test for homogeneity is a partial score test derived for a general setting where the observations are independent and normally distributed with means and variances , where is a vector of known covariates, is a vector of unknown parameters, is a vector of known covariates (whose components may or may not overlap with the components of ), is a vector of unknown parameters, and is an arbitrary function that is only assumed to possess a first derivative. By setting the first component of to unity, a test of is a test of heteroscedasticity. The Breusch and Pagan test  is the partial score test of . For application to our problem, we have , and and the test statistic simplifies to
where is an diagonal matrix whose th diagonal element is equal to .
The White test  is positioned to be robust to the normality assumption. For our application, the statistic can be obtained by first computing the vector of squared residuals, say where , and then computing the test statistic as
where is an matrix of ones, , and is a matrix whose first column is all ones, whose second column is the values and whose third column is the values . Note that the ratio term in the definition of is simply the -square value of the regression of on the matrix. The distributions of both BP and depend on and the same Monte Carlo method for approximating the null distribution of can be used to approximate the null distributions of BP and .
The log-likelihood function for the general mixed linear model was given in (5) and computation of the MLE of and the conditional MLE of was discussed in Section 2. It follows that the likelihood ratio test statistic is
The LRT requires relatively significant computations. For example, computations of the full model MLEs needed by the LRT are not readily available unless software such as PROC MIXED in SAS is available. In standard situations the asymptotic null distribution of LRT is chi-square with degrees of freedom equal to the difference in the dimension of the unconstrained and constrained parameter spaces. However, our situation is not standard since the reduced parameter space lies on the boundary of the unconstrained parameter space. Results in  show that the asymptotic distribution of LRT in (11) is a 50: 50 mixture of zero and a chi-square distribution with one degree of freedom. A size- LRT therefore rejects if , where denotes the upper-2 percentile of a chi-square distribution with one degree of freedom.
4. Application to Aerospace Case Study
4.1. Illustrative Examples
Figure 1 was used in Section 1 as illustrative of a data set that exhibits heteroscedasticity. Figure 2 shows a second illustrative data set (109 observations) which does not exhibit evidence of heteroscedasticity. As in Figure 1, two sets of pointwise 95%-content tolerance intervals constructed at the 90% confidence level are shown corresponding to an ordinary least squares analysis and the estimated weighted least squares analysis described in Section 1.2. In contrast to Figure 1, the two sets of tolerance intervals are nearly identical.
Table 1 shows the computed test statistics , BP, , and LRT for the two data sets shown in Figures 1 and 2, along with their -values for testing . The tests all agree that heteroscedasticity appears to exist in Data Set 1 but not in Data Set 2. We note that of all the tests considered, is by far the simplest to compute with BP and only slightly more complicated. As described in the introduction, our application context has hundreds of metrics that need to be monitored with this type of analysis methodology. Figures 3, 4, 5, and 6 provide four additional illustrative examples, including the computed statistic and associated -value for . There is evidence of heteroscedasticity in all four of these data sets.
4.2. Power Comparison of Alternative Tests
It is easy to show that the distributions of the , BP, , and LRT test statistics depend only on . It follows that the power functions (i.e., the probability of rejecting as function of ) of the tests can be evaluated using simulated data sets from the mixed model in (3) using the parameters . We implemented our simulation study using the S-Plus package. For each simulated data set, each of the test statistics can be evaluated and compared to their respective critical values corresponding to nominal size significance tests. The power of the tests is estimated by the fraction of data sets for which they reject .
Column 2 of Table 2 shows the estimated power of the 10% nominal size test based on this procedure for , using 5000 simulated data sets and the same set of values associated with Data Set 1, which is shown in Figure 1. Columns 3–5 of Table 2 give the ratio of the power for the LRT, BP, and tests, relative to the test. The results show that the test and the LRT have nearly identical power, though the test has slightly better power for small . The test has uniformly better power than BP and tests. A somewhat surprising observation is the pronounced dominance of with respect to BP and , which are perhaps the two most widely known tests in the economics literature. Table 3 shows similar power comparisons using the associated with Data Set 2, that is shown in Figure 2, and the conclusions are consistent with those drawn from Table 2.
Our aerospace case study, which pertains to the time evolution of performance metrics for electronic equipment, motivated the derivation of a model for heteroscedastic regression errors. A test for homoscedasticity was proposed and compared with various other tests that are prevalent in the literature. The proposed test was shown to be equivalent to an LMPI test associated with a general mixed linear model of which this paper’s heteroscedastic regression model can be regarded as a special case. Aside from what we judge to be a simpler intuitive motivation associated with the test formulation, it has the practical benefit of being much simpler to compute than the formula that results from the classical LMPI derivation. The power of the test was compared to some popular alternative tests, and only the considerably more difficult to compute likelihood ratio test has power that is comparable to the test. We extended the classical application of tolerance intervals for regression to our heteroscedastic case and illustrated their use with six data sets from our aerospace case study.
A. Tolerance Intervals for Heteroscedastic Model
Using observations that follow a simple linear regression model , (), Wallis  derived a tolerance interval for a normal distribution that has mean and variance . The Wallis interval easily extends to the more general case where the observations are of the form , where the are not necessarily equal to unity. In particular, a tolerance interval with content for the normal distribution with mean (where and are arbitrary) and variance is
where and denote the ordinary least squares estimates of and using an matrix that has the values in its first column, rather than unity, and moreover, when computing the value from
The heteroscedastic regression model described in Section 1.2 generalizes the problem solved by Wallis by having independent and normally distributed with zero means and variances equal to , where is a new parameter. The derivation in this Appendix is for the case where is known so that the observations can equivalently be expressed as
where are independent and normally distributed with zero means and variances equal to . Defining, and , it follows that a tolerance interval with content for a normal distribution with mean and variance is
where = with being the matrix whose first column is the values and whose second column is the values , = , and where is the solution to (A.2) but with N replaced by .
Finally, by multiplying the endpoints of (A.5) by a tolerance interval with content for a normal distribution that has mean and variance is obtained. The resulting interval has the form
as reported in Section 1.3.
B. Equivalence of Test and LMPI Test
Following , where = , an appropriate ANOVA table for the mixed model is based on the sum-of-squares decomposition = . The first two terms on the right-hand-side of this decomposition, which we denote as and , respectively, correspond to fitting the fixed-effects first and then fitting the random-effects after the fixed-effects. The final term, denoted as , corresponds to the residual error sum-of-squares. It can be shown that the degrees of freedom associated with these three quadratic forms are , and , respectively.
Let the nonzero eigenvalues of be denoted by , and define . Let denote the number of distinct eigenvalues of , and denote their values by and their multiplicities by . Define to be the matrix of orthornormal eigenvectors corresponding to the . Finally, let and define to be a solution to . The following results are proved in .(a) has a (b)is independent of (c)For (note that the cardinality of is ) and , are independent with and .
It is shown in  that the locally most powerful invariant test statistic for testing is
Although explicitly appears in the denominator of (B.1), use of the LMPI test does not require . For the cases where (), such as what will occur with our application since is a diagonal matrix of rank , the LMPI test is still useable with the test statistic simply reducing to . Alternative derivations of the LMPI test were proposed in [14, 15].
To prove the equivalence of the LMPI test based on (B.1) and the test based on (8), we first note that = = = , which establishes the denominator of the two test statistics coincide. Next, we observe that the numerator of LMPI test statistic, , can be reexpressed as . Since = implies = , we have = = = = , which establishes that the numerators of the two statistics are also identical.
- G. F. Piepel, S. K. Cooley, and M. R. Paul, “Upper tolerance intervals adjusted for multiple nuisance uncertainties,” Journal of Quality Technology, vol. 40, no. 3, pp. 245–258, 2008.
- R. Davidson and J. G. MacKinnon, Estimation and Inference in Econometrics, Oxford University Press, New York, NY, USA, 1993.
- J. Cohen, P. Cohen, S. G. West, and L. S. Aiken, Applied Multiple Regression/Correlation Analysis for the Behavioral Sciences, Lawrence Erlbaum Associates, Hillsdale, NJ, USA, 2nd edition, 2003.
- A. Gelman and J. Hill, Data Analysis Using Regression and Multi-Level/Hierarchical Models, Cambridge University Press, New York, NY, USA, 2007.
- V. Barnett, Environmental Statistics: Methods and Applications, John Wiley & Sons, New York, NY, USA, 2004.
- Y. Leedan and P. Meer, “Heteroscedastic regression in computer vision: problems with bilinear constraint,” International Journal of Computer Vision, vol. 37, no. 2, pp. 127–150, 2000.
- R. J. Carroll and D. Ruppert, Transformation and Weighting in Regression, Chapman & Hall, New York, NY, USA, 1988.
- J. O. Berger, B. Liseo, and R. L. Wolpert, “Integrated likelihood methods for eliminating nuisance parameters,” Statistical Science, vol. 14, no. 1, pp. 1–28, 1999.
- R. Kackar and D. A. Harville, “Approximations for standard errors of estimators of fixed and random effects in mixed linear models,” Journal of the American Statistical Association, vol. 79, pp. 853–862, 1984.
- W. A. Wallis, “Tolerance intervals for linear regression,” in Proceedings of the 2nd Berkeley Symposium on Mathematical Statistics, pp. 43–51, 1951.
- O. Lieberman, “Saddlepoint approximation for the distribution of a ratio of quadratic forms in normal variables,” Journal of the American Statistical Association, vol. 89, pp. 924–928, 1994.
- T. Lin and D. A. Harville, “Some alternatives to Wald's confidence interval and test,” Journal of the American Statistical Association, vol. 86, pp. 179–187, 1991.
- J. Nyblom and T. Makelainen, “Comparisons of tests for the presence of random Walk coefficients in a simple linear model,” Journal of the American Statistical Association, vol. 78, pp. 856–864, 1983.
- P. H. Westfall, “Power comparison for invariant variance ratio tests in mixed ANOVA models,” The Annals of Statistics, vol. 17, pp. 318–326, 1989.
- F. Marohn, “A comment on locally most powerful tests in the presence of nuisance parameters,” Communications in Statistics: Theory and Methods, vol. 31, no. 3, pp. 337–349, 2002.
- T. S. Breusch and A. R. Pagan, “A simple test for heteroscedasticity and random coefficient variation,” Econometrica, vol. 47, pp. 1287–1294, 1979.
- H. White, “A heteroskedasticity-consistent covariance matrix estimator and a direct test for heteroskedasticity,” Econometrica, vol. 48, pp. 817–838, 1980.
- S. G. Self and K. Liang, “Asymptotic properties of maximum likelihood estimators and likelhood ratio tests under nonstandard conditions,” Journal of the American Statistical Association, vol. 82, pp. 605–610, 1987.
- D. A. Harville and A. O. Fenech, “Confidence intervals for a variance ratio, or for heritability, in an unbalanced mixed linear model,” Biometrics, vol. 41, no. 1, pp. 137–152, 1985.
Copyright © 2009 Janet Myhre et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.