Detecting Changes under Multivariate Normal Distributions via the Generalized Inference
In many fields, one needs to detect whether a population changes after a special process. Based on observations describing the population before and after the process, we formulate this problem as two statistical hypothesis testing problems within a framework of multivariate statistical analysis and then propose a generalized inference approach to solve them. The corresponding generalized p-values and their calculation details are provided. The proposed method is also extended to multiple testing problems. Simulation studies show that the proposed p-values have satisfactory frequentist performance. We illustrate our methods with a real application in the manufacturing of bearings that are used in medical devices.
In many practical problems, we need to analyze the uncertainty in a population induced by an operation. Specifically, let $X = (X_1, \ldots, X_p)'$ be the indices that describe the population. For $n$ independent individuals from the population, let $x_1, \ldots, x_n$ denote their values of $X$. After some operation acts on these individuals, their values become $y_1, \ldots, y_n$, respectively. These can be viewed as an independent sample from $Y$, the indices of the population after the operation. We are interested in whether the operation has a significant influence on the population, i.e., whether there is a significant change from $X$ to $Y$.
There are many examples of this problem in real applications. In medicine, the population can be all patients, whose indices $X$ become $Y$ after they take a certain drug. In manufacturing, the population can be all products, whose indices $X$ become $Y$ after a certain process. We want to know whether the change from $X$ to $Y$ is statistically significant.
Note that $p$ is usually larger than 1, i.e., we need to consider more than one index simultaneously. This paper studies the problem of testing changes within a framework of multivariate statistical analysis. Let $Z = Y - X$, and assume that $Z$ follows a multivariate normal distribution $N_p(\mu, \Sigma)$ with unknown mean $\mu$ and covariance matrix $\Sigma$, where $\Sigma$ is positive definite. The value of $\theta = E\|Z\|^2 = \mu'\mu + \operatorname{tr}(\Sigma)$ can be used to quantify the uncertainty of the change from $X$ to $Y$, where $\operatorname{tr}$ denotes the trace of a matrix, and we let it be the parameter of interest. Let the independent sample of $Z$ be $z_1, \ldots, z_n$, where $z_i = y_i - x_i$ for $i = 1, \ldots, n$. With a prespecified $\delta > 0$, the problem of testing changes is formulated as the following statistical hypothesis testing problems:
$$H_0: \theta \le \delta \quad \text{versus} \quad H_1: \theta > \delta \tag{1}$$
and
$$H_0: \theta \ge \delta \quad \text{versus} \quad H_1: \theta < \delta, \tag{2}$$
where $\delta$ is prespecified by experienced practitioners and can be viewed as a threshold value for judging whether a change significantly occurs from $X$ to $Y$. In other words, we say that a change significantly occurs if and only if $\theta$ is greater than $\delta$. When we need strong evidence to support that a change occurs, we should consider testing (1). This can be used in testing whether a drug is effective. When we need strong evidence to support that there is no change, we should consider testing (2). This can be used in manufacturing, where we expect that some process does not influence the product.
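The quantity being tested can be illustrated numerically. The following is a minimal sketch, assuming the parameter of interest is $\theta = \mu'\mu + \operatorname{tr}(\Sigma)$ for the paired differences (after minus before); the function name `change_index_estimate` and the array layout (one row per individual, one column per index) are hypothetical choices made here for illustration. It computes the plug-in estimate obtained by replacing the mean and covariance with the sample mean and the sample covariance with divisor $n$.

```python
import numpy as np

def change_index_estimate(x, y):
    """Plug-in estimate of theta = mu'mu + tr(Sigma) from paired data.

    x, y: arrays of shape (n, p) holding the indices of the same n
    individuals before and after the operation (assumed layout).
    """
    z = np.asarray(y, dtype=float) - np.asarray(x, dtype=float)  # paired differences
    n = z.shape[0]
    zbar = z.mean(axis=0)              # sample mean of the differences
    centered = z - zbar
    S = centered.T @ centered          # sum-of-squares-and-products matrix
    # mean part plus trace of the covariance estimate S / n
    return float(zbar @ zbar + np.trace(S) / n)
```

Under this assumption the estimate equals the average squared Euclidean length of the observed differences, which gives a quick sanity check on any implementation.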
It should be pointed out that the above statistical problems are completely different from the conventional two-sample comparison test, since the indices after the process are not independent of those before it. They are indices of the same population in two phases. Problems (1) and (2) are also different from mean tests or covariance tests of multivariate normal distributions, which can be derived based on the Wilks statistics (likelihood ratio statistics), since the parameter of interest involves both the mean and a part of the covariance matrix. To the best of our knowledge, there is no work on the above statistical hypothesis testing problems for detecting changes. Existing studies on change detection in the literature [2–4] address different statistical models or objectives. Note that the distributions of the likelihood ratio statistics under one-sided null hypotheses like those in (1) and (2) are very complicated. In this paper, we use the generalized inference approach [6, 7] to provide generalized p-values for testing (1) and (2). The concept of the generalized p-value was introduced by Tsui and Weerahandi; this idea was extended to interval estimation by Weerahandi. Since then, the generalized inference approach has been successfully applied to many complex inferential problems, many of which appear in industry and medical science; see Mu et al., Chen and Ye, and Feng and Tian, among many others. For multivariate normal distributions, Gamage et al. and Park proposed generalized inference methods for specific problems, which are different from the testing problems in (1) and (2). The generalized inference approach has been proven to be asymptotically correct under mild conditions [13–15]. Simulation results in the literature indicate that this approach usually possesses good frequentist properties. Another appealing feature of the generalized inference approach is its simplicity under some parametric models, especially normal models. We do not need to derive complex (asymptotic) distributions of test statistics such as the likelihood ratio statistics for one-sided hypotheses.
This paper is organized as follows. Section 2 presents the generalized inference approach for testing (1) and (2). Section 3 extends the proposed method to multiple testing problems. In Section 4, we conduct a simulation study of the Type-I error and power performance of the proposed methods. We apply our method to a manufacturing dataset in Section 5. We conclude the paper with some remarks in Section 6.
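Based on the sample $z_1, \ldots, z_n$ independently and identically distributed from $N_p(\mu, \Sigma)$, we first construct the generalized pivotal quantities of the unknown parameters $(\mu, \Sigma)$, which induce the generalized pivotal quantity of $\theta = \mu'\mu + \operatorname{tr}(\Sigma)$. Then, the generalized p-values for testing (1) and (2) can be given by the generalized pivotal quantity of $\theta$.
Let $\bar{Z} = n^{-1}\sum_{i=1}^{n} Z_i$ and $S = \sum_{i=1}^{n} (Z_i - \bar{Z})(Z_i - \bar{Z})'$. It is known that $S \sim W_p(n-1, \Sigma)$, the Wishart distribution with $n-1$ degrees of freedom and scale parameter matrix $\Sigma$. Let $A$ be the Cholesky factor of $\Sigma$. That is, $\Sigma = AA'$, where $A$ is a lower triangular matrix with positive diagonal elements. Such a decomposition is unique for any positive definite matrix. Let $L$ be the Cholesky factor of $S$ with positive diagonal elements. Write
$$V = A^{-1}L, \tag{4}$$
and we see that $V$ is a lower triangular matrix with positive diagonal elements and $VV' = A^{-1}S(A')^{-1} \sim W_p(n-1, I_p)$, which indicates that $V$ is the unique Cholesky factor of $VV'$. On the other hand, we write
$$U = \sqrt{n}\,A^{-1}(\bar{Z} - \mu) \sim N_p(0, I_p), \tag{5}$$
and $U$ is independent of $V$. Based on the standard construction method of generalized pivotal quantities, we can obtain from (4) and (5) that the generalized pivotal quantities of $\Sigma$ and $\mu$ are
$$R_\Sigma = l V^{-1}(V')^{-1} l', \qquad R_\mu = \bar{z} - n^{-1/2}\, l V^{-1} U, \tag{6}$$
where $\bar{z}$ and $l$ denote the observed values of $\bar{Z}$ and $L$, respectively.
Therefore, the generalized pivotal quantity of $\theta$ is
$$R_\theta = R_\mu' R_\mu + \operatorname{tr}(R_\Sigma). \tag{7}$$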
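The construction above can be simulated directly. The sketch below is an illustration under stated assumptions, not the paper's Algorithm 1: it assumes the parameter of interest is $\theta = \mu'\mu + \operatorname{tr}(\Sigma)$, draws the Cholesky factor of a standard Wishart matrix via the Bartlett decomposition, and uses the conventional generalized p-value forms $P(R_\theta \le \delta)$ for (1) and $P(R_\theta \ge \delta)$ for (2), which may differ from the paper's own expressions. The function names `sample_gpq_theta` and `generalized_p_values` are hypothetical.

```python
import numpy as np

def sample_gpq_theta(z, n_draws=5000, rng=None):
    """Monte Carlo draws of a generalized pivotal quantity for theta.

    z: array of shape (n, p) of paired differences, with n > p.
    Assumes theta = mu'mu + tr(Sigma) (an assumption of this sketch).
    """
    rng = np.random.default_rng(rng)
    z = np.asarray(z, dtype=float)
    n, p = z.shape
    zbar = z.mean(axis=0)
    centered = z - zbar
    L = np.linalg.cholesky(centered.T @ centered)  # observed Cholesky factor of S
    rows, cols = np.tril_indices(p, k=-1)
    draws = np.empty(n_draws)
    for b in range(n_draws):
        # Bartlett decomposition: V V' ~ Wishart(n - 1, I_p)
        V = np.zeros((p, p))
        V[rows, cols] = rng.standard_normal(rows.size)
        V[np.diag_indices(p)] = np.sqrt(rng.chisquare(n - 1 - np.arange(p)))
        U = rng.standard_normal(p)               # standardized sample mean
        R_A = np.linalg.solve(V.T, L.T).T        # equals L V^{-1}
        R_mu = zbar - R_A @ U / np.sqrt(n)       # pivotal quantity for the mean
        draws[b] = R_mu @ R_mu + np.sum(R_A**2)  # squared norm + tr(R_A R_A')
    return draws

def generalized_p_values(z, delta, n_draws=5000, rng=None):
    """Return (p-value for H0: theta <= delta, p-value for H0: theta >= delta)."""
    r = sample_gpq_theta(z, n_draws, rng)
    return float(np.mean(r <= delta)), float(np.mean(r >= delta))
```

A small first p-value supports the claim that a change occurs, while a small second p-value supports the claim that the change stays below the threshold. The loop over draws is kept explicit for readability; a vectorized version would be faster for large Monte Carlo sample sizes.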
Note that the generalized pivotal quantity of the parameter of interest can be viewed as a distribution estimator or a generalized bootstrap variable for that parameter. Some experience shows that it overestimates the parameter in finite-sample cases, which leads to biases in testing one-sided hypotheses such as (1) and (2). To overcome this difficulty, we use Xiong's generalized bootstrap variable method to present new p-values for (1) and (2). Let be the maximum likelihood estimator of the parameter of interest. By Xiong's method, can be used to approximate the distribution of , and this leads to p-values for testing (1) and (2): respectively. We can use some combinations of the above results to give p-values. Algorithm 1 provides implementation details for computing the p-value in (11), and those for computing (8), (9), and (12) can be obtained similarly.
3. Extension to Multistage Detections
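We sometimes need to detect the changes after a multistage process. Let $X_0$ denote the initial indices of the population. After the $k$th stage, $X_{k-1}$ becomes $X_k$ for $k = 1, \ldots, K$. Denote $Z_k = X_k - X_{k-1}$, where $Z_k \sim N_p(\mu_k, \Sigma_k)$ with unknown mean $\mu_k$ and unknown positive definite covariance matrix $\Sigma_k$ for $k = 1, \ldots, K$. We want to know which ones among the $K$ changes, measured by $\theta_k = \mu_k'\mu_k + \operatorname{tr}(\Sigma_k)$, are statistically significant. Then, problems (1) and (2) should be extended to the following two multiple testing problems:
$$H_{0k}: \theta_k \le \delta_k \quad \text{versus} \quad H_{1k}: \theta_k > \delta_k, \quad k = 1, \ldots, K, \tag{13}$$
and
$$H_{0k}: \theta_k \ge \delta_k \quad \text{versus} \quad H_{1k}: \theta_k < \delta_k, \quad k = 1, \ldots, K, \tag{14}$$
respectively. Here, the threshold values $\delta_k$, $k = 1, \ldots, K$, are prespecified.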
Based on the independent samples of the stagewise changes, we provide a Bonferroni inequality method for (13) and (14). The Bonferroni inequality is commonly used in multiple comparison problems. Algorithm 2 presents implementation details of our method for (13), and those for (14) can be obtained similarly. The Bonferroni inequality method is simple to implement. It does not depend on the correlations between the changes at different stages and controls the family-wise error rate.
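The Bonferroni step can be sketched as follows. This is a generic illustration of the Bonferroni correction, assuming one p-value per stage has already been computed; it is not a transcription of Algorithm 2, and the function name `bonferroni_decisions` is hypothetical.

```python
def bonferroni_decisions(p_values, alpha=0.05):
    """Reject the k-th null hypothesis when its p-value is at most alpha / K.

    p_values: sequence of K stagewise p-values.
    Returns a list of booleans (True means reject).
    """
    k = len(p_values)
    threshold = alpha / k  # Bonferroni-adjusted per-test level
    return [p <= threshold for p in p_values]

# Usage: three stages at family-wise level 0.05 (threshold 0.05 / 3)
decisions = bonferroni_decisions([0.001, 0.02, 0.04], alpha=0.05)
```

Comparing each p-value with the nominal level divided by the number of stages keeps the family-wise error rate at or below the nominal level regardless of how the stagewise statistics are correlated, at the cost of some power when the number of stages is large.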
4. Simulation Study
We now illustrate the proposed procedure with some simulated data. We consider the following cases:
For each case, we use Algorithm 1 to implement our tests when . The combinations of are given in Table 1. The Monte Carlo sample size in Algorithm 1 is set as . The empirical Type-I errors over 10,000 repetitions are shown in Table 1. We can see that most errors are close to or less than the nominal level . As increases, the errors become closer to . These findings indicate that the proposed testing methods are effective.
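An empirical Type-I error of this kind can be estimated with a loop of the following shape. This is a rough sketch under several assumptions: the parameter of interest is taken to be $\theta = \mu'\mu + \operatorname{tr}(\Sigma)$, data are generated on the boundary $\theta = \delta$ of the null hypothesis in (1), and the test uses the plain generalized pivotal quantity p-value $P(R_\theta \le \delta)$ rather than the corrected p-value of Algorithm 1. The function names and the default repetition counts are hypothetical and much smaller than those used in the paper.

```python
import numpy as np

def gpq_p_value(z, delta, n_draws, rng):
    """Monte Carlo estimate of P(R_theta <= delta) for H0: theta <= delta."""
    n, p = z.shape
    zbar = z.mean(axis=0)
    centered = z - zbar
    L = np.linalg.cholesky(centered.T @ centered)
    rows, cols = np.tril_indices(p, k=-1)
    hits = 0
    for _ in range(n_draws):
        # Bartlett factor of a Wishart(n - 1, I_p) matrix
        V = np.zeros((p, p))
        V[rows, cols] = rng.standard_normal(rows.size)
        V[np.diag_indices(p)] = np.sqrt(rng.chisquare(n - 1 - np.arange(p)))
        R_A = np.linalg.solve(V.T, L.T).T  # equals L V^{-1}
        R_mu = zbar - R_A @ rng.standard_normal(p) / np.sqrt(n)
        hits += (R_mu @ R_mu + np.sum(R_A**2)) <= delta
    return hits / n_draws

def empirical_type1_error(mu, sigma, n, alpha=0.05, reps=200, n_draws=500, seed=0):
    """Fraction of repetitions rejecting H0 when the truth sits at theta = delta."""
    rng = np.random.default_rng(seed)
    mu = np.asarray(mu, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    delta = float(mu @ mu + np.trace(sigma))  # boundary of the null hypothesis
    rejections = 0
    for _ in range(reps):
        z = rng.multivariate_normal(mu, sigma, size=n)
        rejections += gpq_p_value(z, delta, n_draws, rng) <= alpha
    return rejections / reps
```

Because this sketch uses the uncorrected p-value, its rejection rate for (1) would be expected to fall below the nominal level in small samples, in line with the overestimation noted in Section 2.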
In addition, we compute the powers of our test for (1) under Case (I). First, we let increase and compute the powers at and . The results are shown in the left side of Figure 1. Then, we fix , and compute the powers at and with increasing . The results are displayed in the right side of Figure 1. It can be seen that the power performance of our method is reasonable: the power tends to one fast as or increases.
5. Case Study
A bearing is the basic element of any rotating mechanism. For a certain type of bearing used in medical devices, we investigate the manufacturing process of its retainers. Figure 2 shows a type of bearing retainer. The manufacturing process includes a postprocessing phase consisting of two cleaning stages and one heat treatment stage. The objective is to assess whether the inner and outer diameters of the retainers significantly change after each stage in the postprocessing.
In this study, we obtain 34 sample retainers and measure their inner and outer diameters before and after the postprocessing. The original data cannot be made public due to a confidentiality agreement, and thus, they were slightly modified via a linear transformation. We use the modified data to conduct our analysis. For each retainer and each stage, we record two-dimensional vectors of its diameters before and after the stage, where the first and second components represent the inner and outer diameters, respectively. The corresponding scatter plots are presented in Figure 3. Based on the data, we test the multiple hypotheses (14) using the Bonferroni inequality method in Algorithm 2. The threshold values in the three stages are, respectively, , , and , which are suggested by engineers. With the Monte Carlo sample size , we obtain the generalized p-values , , and , respectively. Given the nominal level , we should reject the null hypotheses for the second cleaning stage and the heat treatment stage, and this indicates that the inner and outer diameters do not significantly change after these two stages.
6. Concluding Remarks
In this paper, we have proposed a multivariate statistical approach to detect whether a change occurs after some process. The parameter of interest in our approach describes changes in both the mean and the variance terms and can act as an index for detecting the change of the population. The generalized p-values for testing whether the parameter of interest exceeds a threshold value have been presented, and we have shown them to have good performance in our simulations. We also apply our methods to analyze bearing retainers and conclude that they do not significantly change after two stages in the postprocessing. A future topic is to extend our methods to other models when the normal assumption does not match the data. For such cases, it may be difficult to use the generalized inference approach. We will consider combining this approach with other popular inferential methods such as the bootstrap method [20, 21]. Another possible future direction is to study the false discovery rate control method when the number of stages in Section 3 is large.
In the case study (Section 5), we obtain 34 sample retainers and measure their inner and outer diameters before and after the postprocessing. The data cannot be made public due to a confidentiality agreement, and thus, they were slightly modified.
Conflicts of Interest
The authors declare that they have no conflicts of interest.
This work was supported by the National Natural Science Foundation of China (Grant nos. 11671386, 11871033, and 11871294).
T. Robertson, F. T. Wright, and R. L. Dykstra, Order Restricted Statistical Inference, Wiley, New York, NY, USA, 1988.
W. Mu, X. Xu, and S. Xiong, “Inference on system reliability for independent series components,” Communications in Statistics: Theory and Methods, vol. 38, pp. 409–418, 2009.
D. A. Harville, Matrix Algebra from a Statistician’s Perspective, Springer, New York, NY, USA, 1997.
S. Dudoit and M. J. van der Laan, Multiple Testing Procedures with Applications to Genomics, Springer, New York, NY, USA, 2008.
T. A. Harris and M. N. Kotzalas, Essential Concepts of Bearing Technology, CRC Press Marcel Dekker, New York, NY, USA, 2006.