Research Article | Open Access
Posterior Propriety of an Objective Prior in a 4-Level Normal Hierarchical Model
The use of hierarchical Bayesian models in statistical practice is extensive, yet it is dangerous to implement the Gibbs sampler without checking that the posterior is proper. Formal approaches to objective Bayesian analysis, such as the Jeffreys-rule approach or the reference prior approach, are implementable only in simple hierarchical settings. In this paper, we consider a 4-level multivariate normal hierarchical model and demonstrate that the posterior under our recommended prior is proper in this model. A primary advantage of the recommended prior over other proposed objective priors is that it can be used at any level of a hierarchical model.
1. Introduction
Bayesian hierarchical models have a wide range of modern applications including engineering, astrophysics, economics, environmental sciences, climatology, survival analysis, and genetics. Their flexibility makes them appealing, but hyperparameter priors are often chosen in a casual fashion. Statisticians often use improper priors to express ignorance or to obtain good frequentist properties. However, it is dangerous to implement the Gibbs sampler without checking that the posterior is proper. As Hobert and Casella pointed out, without proper precaution, simple noninformative priors can be misused, sometimes unknowingly, and lead to further difficulties, such as nonconvergence of the Gibbs sampler. Skipping this verification therefore risks drawing inference from an improper posterior distribution. There are many examples of this in the statistical and related literatures; see especially Wang et al.; Hobert and Casella [8, 10]; Berger and Strawderman; Berger et al.; Speckman et al.; Roy and Dey; Michalak and Morris; Ramos et al.; and Tomazella et al. The importance of posterior propriety motivates us to explore it.
The normal hierarchical distribution has received enormous attention and is also of substantial importance in contemporary statistical theory and application. Ning proposed a 2-level multivariate normal hierarchical model for the degradation data of multiple units with change point. Wang and Coit handled the reliability prediction problem using a multivariate normal distribution model, considering multiple degradation measures. Heuristically, one could also build a multivariate normal hierarchical model by adding a normal distribution for the unknown mean vectors to handle this reliability prediction problem. Because of either convenience or a lack of prior information, the unknown parameters are often modeled with improper objective priors. Some references on objective priors can be found in Berger and Strawderman; Berger; Pollo et al.; and Ferreira et al. However, formal approaches to objective Bayesian analysis, such as the Jeffreys-rule approach or the reference prior approach, are implementable only in simple hierarchical settings. It is thus common to use less formal approaches, such as utilizing formal priors from nonhierarchical models in hierarchical settings. However, this can be fraught with danger. For instance, nonhierarchical Jeffreys-rule priors for variances or covariance matrices result in improper posterior distributions if they are used at higher levels of a hierarchical model (see ).
Berger et al. addressed the question of the choice of hyperpriors in the following 2-level normal hierarchical model:
where
for , in which the are the observation vectors and the are the unknown mean vectors (thus ), is an unknown “hypermean” vector, and is an unknown “hypercovariance matrix.” Some commonly used priors for the hyperparameter can be considered, for example, the constant prior and the conjugate normal prior. For the prior of the covariance matrix , a large literature exists; see, for example, the articles by Dey and Srinivasan; Lin and Perlman; Haff; Yang and Berger; Daniels and Kass [26, 27]; Ledoit and Wolf; Sun and Berger; Rajaratnam et al.; and Hoff. Berger et al. explored the posterior propriety and admissibility of various priors, but no overall conclusion was reached as to a specific prior in their paper. Following this seminal work, Berger et al. recommended a particular objective prior for the choice of and :
where are the ordered eigenvalues of . Their recommendation is based on considerations of posterior propriety, admissibility, ease of implementation (including computational considerations), and performance. A primary advantage of the recommended prior over other proposed objective priors is that it can be used at any level of a hierarchical model. However, the propriety of the resulting posterior is not clear, and it represents a very difficult question to answer. Michalak and Morris pointed out that, except in the simplest models, when improper priors are used it can be daunting and time-consuming to verify that the resulting posterior distribution is proper. While Berger et al. suspected that propriety holds in normal hierarchical models of any level, they were not able to prove it rigorously; they only showed that the posterior is proper in a 3-level hierarchical model. In this paper, we continue this line of work and consider the posterior propriety of the recommended prior in a 4-level normal hierarchical model.
We demonstrate that the posterior using the recommended prior is still proper in the 4-level normal hierarchical model.
The structure of this paper is as follows. Section 2 presents a 4-level normal hierarchical model and proposes the recommended hyperpriors for the “hypermean” vector and “hypercovariance” matrices. In Section 3, we demonstrate that the posterior under the recommended prior is proper in the 4-level normal hierarchical model. In Section 4, we consider MCMC computation from the posterior. Section 5 compares the performance of this prior with other objective priors studied in Berger et al., presenting strong numerical evidence of the superiority of (3). Some concluding remarks are provided in Section 6.
2. A 4-Level Normal Hierarchical Model
Consider the following 4-level normal hierarchical model:
where the are the observation vectors; the are the unknown mean vectors; the are the unknown vectors; the are the unknown vectors; is an unknown q-dimensional “hypermean” vector; the are the known matrices; the are the known matrices; and , , and are the unknown , , and “hypercovariance” matrices, respectively. Following the assumptions of Berger et al., let and . We also assume that , , , and .
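To make the layered structure concrete, here is a minimal Python sketch that simulates data from a generic chain of normal levels of this kind. The dimensions, covariances (`V1`, `V2`, `V3`), and names are illustrative assumptions only (the known design matrices of the model are omitted for simplicity); this is not the paper's notation.

```python
import numpy as np

rng = np.random.default_rng(0)
p, m = 3, 10          # assumed dimension and number of units (illustrative)

# Hypothetical covariances for the levels above the data (assumed positive definite)
V1, V2, V3 = np.eye(p), 2 * np.eye(p), 3 * np.eye(p)
xi = np.zeros(p)      # "hypermean" vector

# Draw down the hierarchy: hypermean -> level-3 latents -> level-2 means -> data
eta = rng.multivariate_normal(xi, V3, size=m)                      # latent vectors
theta = np.array([rng.multivariate_normal(e, V2) for e in eta])    # mean vectors
y = np.array([rng.multivariate_normal(t, V1) for t in theta])      # observations

print(y.shape)  # (10, 3)
```

Each level adds its own covariance, so the marginal covariance of the data accumulates contributions from every layer; this compounding is what makes hyperprior choice delicate.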
There are four unknown parameters: . From Berger et al., the recommended (independent) priors for become
where are the ordered eigenvalues of , are the ordered eigenvalues of , and are the ordered eigenvalues of .
As discussed by Berger et al. , the prior can be represented by the following hierarchical structure:
For both intuitive and technical reasons, it is convenient to write , , and , where , , and are the matrices of eigenvectors corresponding to , , and , respectively. Consider the one-to-one transformation from to and rewrite the prior as
where and denotes the invariant Haar measure over the space of orthonormal matrices (see Anderson for the definition). From Farrell, the functional relationship between and is
Therefore, the prior for becomes the prior of :
Use of the invariant prior on (essentially a uniform prior over rotations) is natural and noncontroversial. This transformation reveals a significant difficulty with any prior that can be written as a function of ; in the transformed space, such priors contain the Jacobian term , which gives low mass to close eigenvalues and hence effectively forces the eigenvalues apart. This is a known criticism of the inverse Wishart and Jeffreys priors, and it is contrary to common intuition, in that one often prefers a prior that pushes the eigenvalues closer together.
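The repulsion effect can be seen numerically. A small sketch (the function name is ours) evaluates the Jacobian-type term prod_{i<j}(d_i - d_j) for well-separated versus nearly equal ordered eigenvalues; any prior carrying this factor assigns tiny mass near equal eigenvalues.

```python
import numpy as np

def repulsion(d):
    """Compute prod_{i<j} (d_i - d_j) for eigenvalues sorted in decreasing order."""
    d = np.sort(np.asarray(d, dtype=float))[::-1]
    k = len(d)
    return float(np.prod([d[i] - d[j] for i in range(k) for j in range(i + 1, k)]))

well_separated = repulsion([4.0, 2.0, 1.0])    # (4-2)(4-1)(2-1) = 6.0
nearly_equal = repulsion([2.01, 2.0, 1.99])    # ~2e-6, vanishing as eigenvalues merge
print(well_separated, nearly_equal)
```

The factor goes to zero as any two eigenvalues coincide, which is exactly the "forcing apart" criticized in the text.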
Similarly, one can obtain the prior of and given by
3. Posterior Propriety
In this section, we consider the posterior propriety of the recommended prior in the 4-level normal hierarchical model (4).
3.1. The Case
Theorem 1. Consider the hierarchical Bayes model (4) with . Assume that has rank and has rank . Then, the posterior distribution is always proper.
Proof. For technical reasons, we define
Then, (4) is equivalent to
We marginalize out over , , and , one by one. This yields the marginal distribution of given as
where and
Note that , and marginalizing out yields the marginal distribution of given :
Therefore, the likelihood of given is
where and ϵ is a positive constant assuring
Note that
Let be the marginal density of . Clearly,
Note that
From the relationship (8) between and ,
Next, consider the integration over . From the matrix identity , we obtain
Let be the smallest eigenvalue of ; then
Thus,
Using the above inequality, it follows that
Let be the eigenvalues of . Using the matrix identity again, it follows that
Therefore,
Last, consider the integration over . Let be the smallest eigenvalue of . Then,
Thus,
which is finite if
Since ϵ can be chosen arbitrarily small, the integration over is finite if
Since , (29) holds. The theorem is proved.
3.2. The Case
Theorem 2. Consider the hierarchical Bayes model (4) with . Assume that has rank and has rank n. Then, the posterior distribution is proper if
Proof. In the case , note that is just a variance, so that
The prior of ξ becomes
Integrating out ξ in (22) with its constant prior and dropping all exponential terms of the likelihood (as they are less than one), the marginal likelihood for and satisfies
Thus, for posterior propriety, we need to verify
Let denote the largest eigenvalue of , and let and denote the minimum and maximum eigenvalues of , respectively. Then,
Noting that is an vector, it follows that
Let and , where ϵ is a small positive constant satisfying . Then,
Clearly,
where . Note that
which is finite since . Similarly,
which is finite since . Next, consider the integration over . It follows that
if . Since ϵ can be chosen arbitrarily small, and ; this holds if
This completes the proof.
4. MCMC Computation
In this section, we consider MCMC computation from the posterior arising from model (4). Normal hierarchical models are typically handled today by Gibbs sampling, and one computational difficulty is sampling the covariance matrices efficiently. The main new development discussed in Berger et al. is an efficient computational algorithm for dealing with priors on covariance matrices as in (3); it overcomes the computational bottleneck mentioned in the introduction.
Fact 1. Here are the full conditional distributions:
(a) For , the conditional posterior of given and is the usual conjugate posterior density:
(b) Define . The conditional posterior of given is
(c) Defining , the conditional posterior density of for given can be written as
Here and in the following, represents for a square matrix , and
(d) Define . The conditional posterior of given is
(e) Defining , the conditional posterior density of for given can be written as
where
(f) To compute with the recommended prior for , use the equivalent representation as follows:
(g) Defining , the conditional posterior density of for given can be written as
where
(i) To sample λ from its full conditional, use the inverse gamma density . If , this step is not needed, as the hyperprior for ξ is constant.
(ii) Given , Gibbs sampling of can be done from its full conditional, which is (when , set )
Sampling of , , , and can simply be carried out with a Gibbs step, as each full conditional is a normal distribution. To sample the covariance matrices , , and from Fact 1, the Metropolis–Hastings algorithm and the Hit-and-Run method of Chen and Schmeiser could be used, based on proposal distributions that generate full candidate matrices. But these two well-known methods are both inefficient, especially for high-dimensional data. Fortunately, Berger et al. proposed a new and efficient computational algorithm for sampling the covariance matrices , , and from Fact 1.
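The conjugate normal Gibbs steps for the mean vectors are routine; as an illustration, here is a hedged two-dimensional sketch of drawing a mean vector from its normal full conditional. The names and parameterization are generic, not the paper's notation: data y ~ N(theta, V1) with prior theta ~ N(mu, V2).

```python
import numpy as np

rng = np.random.default_rng(3)

def normal_full_conditional(y, V1_inv, mu, V2_inv, rng):
    """Draw theta from its conjugate normal full conditional given
    y ~ N(theta, V1) and theta ~ N(mu, V2): a standard Gibbs step.
    All names here are illustrative placeholders, not the paper's notation."""
    prec = V1_inv + V2_inv                   # posterior precision
    cov = np.linalg.inv(prec)
    mean = cov @ (V1_inv @ y + V2_inv @ mu)  # precision-weighted mean
    return rng.multivariate_normal(mean, cov), mean

draw, mean = normal_full_conditional(
    y=np.array([2.0, 0.0]), V1_inv=np.eye(2),
    mu=np.zeros(2), V2_inv=np.eye(2), rng=rng)
print(mean)  # [1. 0.]
```

With equal precisions, the full-conditional mean sits halfway between the data and the prior mean, the usual shrinkage behavior of these Gibbs steps.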
Take sampling from (45) as an example. For the conditional density of given in (45), we use the eigenvalue–eigenvector decomposition , where is orthogonal and is the diagonal matrix of ordered eigenvalues. For defined in (46), note that , so the conditional density of given in (45) can be transformed to
Gibbs sampling of : following Lemma 1 in Berger et al., i.e.,
we can first sample given from
where is the element of . Thus, we can directly sample independently from inverse gamma distributions and then simply rearrange the so that .
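The eigenvalue step just described can be sketched as follows. The inverse gamma shape and scale values here are placeholders, since the exact full-conditional parameters depend on quantities defined in (45)–(46); the point is the draw-independently-then-sort mechanism.

```python
import numpy as np

rng = np.random.default_rng(1)

def gibbs_eigenvalues(shape, scale, rng):
    """One Gibbs update of the ordered eigenvalues: draw each d_i independently
    from an inverse gamma distribution (shape/scale are illustrative stand-ins
    for the full-conditional parameters), then sort in decreasing order."""
    d = 1.0 / rng.gamma(shape, 1.0 / scale)  # inverse gamma via reciprocal gamma
    return np.sort(d)[::-1]

d = gibbs_eigenvalues(shape=np.array([3.0, 3.0, 3.0]),
                      scale=np.array([2.0, 1.0, 0.5]), rng=rng)
print(bool(np.all(np.diff(d) <= 0)))  # True: eigenvalues are in decreasing order
```

Because the full conditional factorizes over the eigenvalues, this step is exact and avoids any Metropolis accept/reject on the diagonal part.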
Gibbs sampling of : from (19), the conditional density of given and is
Write , where is the matrix of normalized eigenvectors and is the diagonal matrix of corresponding eigenvalues with . Denote . Then, the conditional density of given and is
Hoff introduced an MCMC algorithm to sample from the posterior of by updating two randomly selected columns of . Alternatively, Berger et al. suggested updating two randomly selected rows of (essentially equivalent to Hoff's method in full-rank cases, but considerably faster in situations of less than full rank). For example, to update the first two rows of , we write , where is the first 2 rows of , contains the remaining rows of , and
Here, and for . Write and . Then, the conditional posterior of ω is
Write
where and . From (59), the conditional posterior of θ is
where . Let . Then, the full conditional density of α is
Sampling can be done with a rejection sampler with a proposal distribution.
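The two-row update preserves orthogonality because it acts as a planar rotation on the selected pair of rows. A small sketch (generic names, a fixed angle rather than a draw of θ from its full conditional) illustrates this invariance:

```python
import numpy as np

rng = np.random.default_rng(2)

# A random orthogonal matrix (via QR) stands in for the eigenvector matrix
k = 4
Q, _ = np.linalg.qr(rng.standard_normal((k, k)))

def rotate_two_rows(G, i, j, theta):
    """Replace rows i and j of G by their rotation through angle theta.
    This is the flavor of two-row update described in the text; it keeps
    G orthogonal because a planar rotation preserves inner products."""
    G = G.copy()
    c, s = np.cos(theta), np.sin(theta)
    G[[i, j]] = np.array([[c, s], [-s, c]]) @ G[[i, j]]
    return G

G_new = rotate_two_rows(Q, 0, 1, theta=0.7)
print(bool(np.allclose(G_new @ G_new.T, np.eye(k))))  # True
```

Restricting each move to a one-dimensional angle is what makes the update cheap: sampling the full conditional of a single angle replaces proposing an entire candidate matrix.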
From the studies in Berger et al. , the new method substantially outperforms the Metropolis and Hit-and-Run algorithms in moderate dimensions and succeeds for k up to 100, whereas the other methods break down in much lower dimensions.
5. Performance Comparison
From the mean squared error (MSE) perspective, we compare the performance of 6 objective hyperpriors (considered in Berger et al.), created from the product of three objective hyperpriors for :
(1) Constant prior:
(2) Conjugate prior:
(3) Recommended prior:
and two objective hyperpriors for :
(1) Constant prior: , and
(2) Recommended reference prior:
Besides the constant prior and the recommended reference prior, Berger et al. also studied the nonhierarchical independence Jeffreys prior () and the hierarchical independence Jeffreys prior () in model (1). From Berger et al., the nonhierarchical independence Jeffreys prior for the covariance matrix cannot be used in hierarchical models, since it results in an improper posterior. For the hierarchical independence Jeffreys prior, the common sampling methods are very inefficient for sampling from the posterior distribution. Therefore, we do not consider either of these two Jeffreys priors.
Set . We generate , where for . Similarly, , where for . We simulate the Bayes risks of the posterior means resulting from the 6 priors in every combination of the following cases:
(i) or ;
(ii) or (, , ).
Each choice of specifies a “true” hierarchical model for the simulation, and we wish to compute the Bayes risk corresponding to the posterior means for that arise from each of the 6 objective priors. To simulate these risks, we generate 2000 random and generate observations , . To obtain the posterior means, , of under the 6 priors, we run MCMC cycles after burn-in cycles, using the algorithms described above. Finally, we approximate the Bayes risk by the average observed mean squared error (MSE) as follows:
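The risk approximation in the last step is simply an average of squared-error losses over the simulated replications. A minimal sketch (generic names, toy numbers) of this averaging is:

```python
import numpy as np

def average_mse(theta_true, theta_post_mean):
    """Approximate the Bayes risk by averaging squared-error loss over
    replications; inputs are (replications x dimension) arrays of true values
    and corresponding posterior means."""
    return float(np.mean(np.sum((theta_post_mean - theta_true) ** 2, axis=1)))

# Toy check with two replications of a 3-vector
truth = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
est = np.array([[1.0, 0.0, 1.0], [0.0, 0.0, 0.0]])
print(average_mse(truth, est))  # (1 + 1) / 2 = 1.0
```

With 2000 replications, as in the study, this Monte Carlo average is an unbiased estimate of the frequentist risk under the simulated "true" model.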
The results are given in Table 1. The recommended prior (in the last line) dominates all the others in terms of risk. Within each three-row segment, comparing the first row with the third row compares the constant prior for with the recommended prior; the gains with the recommended prior are large. Comparing each of the first three rows with each of the last three rows compares the constant prior for with the recommended prior; the gains with the recommended prior are again large. The risk results present strong numerical evidence of the superiority of the recommended priors (5).
6. Concluding Remarks
In this paper, we follow the work of Berger et al. and study a 4-level normal hierarchical model. We demonstrate that the posterior under the recommended prior is still proper in the 4-level normal hierarchical model. We do not, however, prove Berger et al.'s conjecture that the posterior under the recommended prior is proper in normal hierarchical models of any level, but our method provides a useful guideline toward completing the story.
In addition, normal hierarchical models are typically handled by Gibbs sampling, and one computational difficulty is sampling the covariance matrices efficiently. The common sampling methods for covariance matrices, for example, the Metropolis–Hastings algorithm and the Hit-and-Run method, are inefficient in higher dimensions. To overcome this computational bottleneck, we can use the powerful and efficient method proposed by Berger et al. for sampling from the conditional densities of the covariance matrices. Therefore, there is no computational difficulty for the 4-level normal hierarchical model with the recommended priors (5). Moreover, the simulation results present strong numerical evidence of the superiority of the recommended priors (5).
Data Availability
No data were used to support this study.
Conflicts of Interest
The authors declare that they have no conflicts of interest.
Acknowledgments
The research was supported by the Chinese 111 Project (Grant no. B14019), the National Natural Science Foundation of China (Grant nos. 11671146 and 11901519), and the China Postdoctoral Science Foundation (Grant no. 2019M661416).
References
- D. Shermon, Systems Cost Engineering Program Affordability Management and Cost Control, Gower Publishing, Farnham, UK, 2017.
- P. Léna, D. Rouan, F. Lebrun, F. Mignard, and D. Pelat, Observational Astrophysics, Springer Science & Business Media, Berlin, Germany, 2012.
- K. Shimotsu, “Exact local whittle estimation of fractional integration with unknown mean and time trend,” Econometric Theory, vol. 26, no. 2, pp. 501–540, 2010.
- N. Shoari, J.-S. Dubé, and S. E. Chenouri, “Estimating the mean and standard deviation of environmental data with below detection limit observations: considering highly skewed data and model misspecification,” Chemosphere, vol. 138, pp. 599–608, 2015.
- C. Duchon and R. Hale, Time Series Analysis in Meteorology and Climatology: An Introduction, vol. 7, John Wiley & Sons, Hoboken, NJ, USA, 2012.
- V. L. D. Tomazella, S. R. de Jesus, F. Louzada, S. Nadarajah, and P. L. Ramos, “Reference Bayesian analysis for the generalized lognormal distribution with application to survival data,” Statistics and Its Interface, vol. 13, no. 1, pp. 139–149, 2020.
- J. Schafer and K. Strimmer, “A shrinkage approach to large-scale covariance matrix estimation and implications for functional genomics,” Statistical Applications in Genetics and Molecular Biology, vol. 4, p. 32, 2005.
- J. P. Hobert and G. Casella, “The effect of improper priors on Gibbs sampling in hierarchical linear mixed models,” Journal of the American Statistical Association, vol. 91, no. 436, pp. 1461–1473, 1996.
- C. Wang, J. Rutledge, and D. Gianola, “Bayesian analysis of mixed linear models via Gibbs sampling with an application to litter size of Iberian pigs,” Genetics Selection Evolution, vol. 26, pp. 1–25, 1994.
- J. P. Hobert and G. Casella, “Functional compatibility, Markov chains, and Gibbs sampling with improper posteriors,” Journal of Computational and Graphical Statistics, vol. 7, no. 1, pp. 42–60, 1998.
- J. O. Berger and W. E. Strawderman, “Choice of hierarchical priors: admissibility in estimation of normal means,” The Annals of Statistics, vol. 24, no. 3, pp. 931–951, 1996.
- J. O. Berger, W. Strawderman, and D. Tang, “Posterior propriety and admissibility of hyperpriors in normal hierarchical models,” The Annals of Statistics, vol. 33, no. 2, pp. 606–646, 2005.
- P. L. Speckman, J. Lee, and D. Sun, “Existence of the MLE and propriety of posteriors for a general multinomial choice model,” Statistica Sinica, vol. 19, pp. 731–748, 2009.
- V. Roy and D. K. Dey, “Propriety of posterior distributions arising in categorical and survival models under generalized extreme value distribution,” Statistica Sinica, vol. 24, pp. 699–722, 2014.
- S. E. Michalak and C. N. Morris, “Posterior propriety for hierarchical models with log-likelihoods that have norm bounds,” Bayesian Analysis, vol. 11, no. 2, pp. 545–571, 2016.
- P. L. Ramos, F. Louzada, and E. Ramos, “Posterior properties of the Nakagami-m distribution using noninformative priors and applications in reliability,” IEEE Transactions on Reliability, vol. 67, no. 1, pp. 105–117, 2018.
- S. Ning, “Bayesian degradation analysis considering competing risks and residual-life prediction for two-phase degradation,” PhD thesis, Ohio University, Athens, OH, USA, 2012.
- P. Wang and D. W. Coit, “Reliability and degradation modeling with random or uncertain failure threshold,” in Proceedings of the 2007 Annual Reliability and Maintainability Symposium, pp. 392–397, IEEE, Orlando, FL, USA, January 2007.
- J. Berger, “The case for objective Bayesian analysis,” Bayesian Analysis, vol. 1, no. 3, pp. 385–402, 2006.
- M. Pollo, V. Tomazella, G. Gilardoni, P. L. Ramos, M. J. Nicola, and F. Louzada, “Objective Bayesian inference for repairable system subject to competing risks,” 2018, https://arxiv.org/abs/1804.06466.
- P. H. Ferreira, E. Ramos, P. L. Ramos et al., “Objective Bayesian analysis for the Lomax distribution,” Statistics & Probability Letters, vol. 159, Article ID 108677, 2019.
- D. K. Dey and C. Srinivasan, “Estimation of a covariance matrix under Stein’s loss,” The Annals of Statistics, vol. 13, no. 4, pp. 1581–1591, 1985.
- S. P. Lin and M. D. Perlman, “A Monte Carlo comparison of four estimators of a covariance matrix,” in Proceedings of the Sixth International Symposium on Multivariate Analysis VI, Pittsburgh, PA, USA, 1985.
- L. R. Haff, “The variational form of certain Bayes estimators,” The Annals of Statistics, vol. 19, no. 3, pp. 1163–1190, 1991.
- R. Yang and J. O. Berger, “Estimation of a covariance matrix using the reference prior,” The Annals of Statistics, vol. 22, no. 3, pp. 1195–1211, 1994.
- M. J. Daniels and R. E. Kass, “Nonconjugate Bayesian estimation of covariance matrices and its use in hierarchical models,” Journal of the American Statistical Association, vol. 94, no. 448, pp. 1254–1263, 1999.
- M. J. Daniels and R. E. Kass, “Shrinkage estimators for covariance matrices,” Biometrics, vol. 57, no. 4, pp. 1173–1184, 2001.
- O. Ledoit and M. Wolf, “A well-conditioned estimator for large-dimensional covariance matrices,” Journal of Multivariate Analysis, vol. 88, no. 2, pp. 365–411, 2004.
- D. Sun and J. O. Berger, “Objective priors for the multivariate normal model (with discussion),” in Proceedings of the Valencia/ISBA 8th World Meeting on Bayesian Statistics, pp. 525–564, Alicante, Spain, June 2007.
- B. Rajaratnam, H. Massam, and C. M. Carvalho, “Flexible covariance estimation in graphical Gaussian models,” The Annals of Statistics, vol. 36, no. 6, pp. 2818–2849, 2008.
- P. D. Hoff, “A hierarchical eigenmodel for pooled covariance estimation,” Journal of the Royal Statistical Society: Series B (Statistical Methodology), vol. 71, no. 5, pp. 971–992, 2009.
- J. O. Berger, D. Sun, and C. Song, “An objective prior for hyperparameters in normal hierarchical models,” Journal of Multivariate Analysis, 2019.
- T. W. Anderson, An Introduction to Multivariate Statistical Analysis, Wiley, Hoboken, NJ, USA, 1984.
- R. H. Farrell, Multivariate Calculation. Use of the Continuous Groups, Springer, New York, NY, USA, 1985.
- J. O. Berger, D. Sun, and C. Song, “Bayesian analysis of the covariance matrix of a multivariate normal distribution with a new class of priors,” The Annals of Statistics, 2019.
- W. K. Hastings, “Monte Carlo sampling methods using Markov chains and their applications,” Biometrika, vol. 57, no. 1, pp. 97–109, 1970.
- M.-H. Chen and B. Schmeiser, “Performance of the Gibbs, Hit-and-Run, and Metropolis samplers,” Journal of Computational and Graphical Statistics, vol. 2, no. 3, pp. 251–272, 1993.
Copyright © 2020 Chengyuan Song et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.