A Measure of Uncertainty regarding the Interval Constraint of Normal Mean Elicited by Two Stages of a Prior Hierarchy
This paper considers a hierarchical screened Gaussian model (HSGM) for Bayesian inference of normal models when an interval constraint in the mean parameter space needs to be incorporated in the modeling but when such a restriction is uncertain. An objective measure of the uncertainty regarding the interval constraint, as accounted for by the HSGM, is proposed for the Bayesian inference. For this purpose, we derive a maximum entropy prior of the normal mean, eliciting the uncertainty regarding the interval constraint, and then obtain the uncertainty measure by considering the relationship between the maximum entropy prior and the marginal prior of the normal mean in the HSGM. A Bayesian estimation procedure for the HSGM is developed, and two numerical illustrations pertaining to the properties of the uncertainty measure are provided.
1. Introduction
Consider the following model for normally distributed data: Bayesian analysis of the model (1) begins with the specification of prior distributions for unknown parameters and the noise variance . Specifically, we assign a normal prior distribution for and an inverse gamma (IG) prior for , that is, and , which are commonly used in a normal model as conjugate priors, where all the hyperparameters and , , and are assumed to be known in the first place.
On the other hand, when we are completely sure about a functional constraint of a priori; a suitable restriction on the parameter space , such as using a truncated normal distribution, is expected. However, it is often the case that the actual observations of (1) may violate the constraint on account of the measurement error or due to some other reasons. Further, the data may provide strong evidence that the constraint is inappropriate and therefore may appear to contradict the theory associated with the constraint. In this respect, it is expected that the uncertainty about the constraint is taken into account in the estimation procedure. O’Hagan and Leonard  indeed proposed two stages of a prior hierarchy based on the truncated prior distribution, which reflects the uncertainty about the parameter constraint. Liseo and Loperfido , Kim , and Kim and Choi  among others considered the Bayesian estimation of normal models with uncertain interval constraints using the idea of two stages of a prior hierarchy. In particular, Kim  obtained the marginal prior of as the normal selection distribution (e.g., ) and thus exploited the class of weighted normal distribution by Kim  for reflecting the uncertain prior belief on .
Although the two-stage prior is applied by many investigators, there is no objective method to measure (or control) the degree of uncertainty regarding the interval constraint of accounted for by using the two stages of a prior hierarchy. This is a major hindrance to developing the idea of the two stages of a prior hierarchy advocated by O’Hagan and Leonard . Thus, this practical problem motivates us to develop a formal measure of uncertainty about the constraint in order to show how the uncertainty of the prior information regarding the interval-constrained parameter space of can be reflected by utilizing the two stages of a prior hierarchy. This topic is tackled in this paper.
To propose the uncertainty measure, we consider the Bayesian inference of the normal mean in (1), but subject to an uncertain interval constraint. Because the maximum entropy prior by Jaynes [7, 8] is useful for describing (or measuring) the relative levels of uncertainty about the distribution of the prior parameter, our investigation focuses on the theoretical relationship between the two stages of a prior hierarchy of by O’Hagan and Leonard  and the maximum entropy prior subject to an uncertain interval constraint. The remainder of this paper is organized as follows. In Section 2, we briefly discuss the two-stage prior of , which will be used for the Bayesian analysis of subject to uncertainty regarding the interval constraint. Accordingly, influenced by the seminal work of O’Hagan and Leonard , we provide a normal model based on the two-stage prior distribution of , referred to as hierarchical screened Gaussian model (HSGM). In Section 3, we explore the theoretical properties of the two-stage prior of by using Boltzmann’s maximum entropy theorem [9, 10]. Based on the properties, we propose an objective measure of uncertainty regarding the interval constraint of that is accounted for by the two-stage prior. In Section 4, we explore Bayesian estimation procedure by analytically deriving the posterior distribution of the unknown parameters under HSGM, and we discuss the properties of the proposed measure of uncertainty that can be explained in the context of HSGM. Finally, the concluding remarks along with a discussion are made in Section 5.
2. Hierarchical Screened Gaussian Model
Let us assume the normal model (1) and consider the two stages of a prior hierarchy in the following way: where denotes a doubly truncated distribution with the lower truncation point and upper truncation point . In practice, there are certain cases in which we have a priori information that is highly likely to have an interval constraint, and thus the value of needs to be located with uncertainty in a restricted space ,
In order to elicit the prior distribution on the uncertain interval constraint, we utilize the two-stage hierarchical model as in (2), which was initially advocated by O’Hagan and Leonard  in which values of that do not belong to are penalized to a lesser extent. In this respect, the normal model structure of (2) is referred to as HSGM in the remainder of the paper, because the two stages of a prior hierarchy on are considered and the resulting marginal prior distribution of becomes the weighted normal (or interval screened normal) distribution studied by Kim . This is shown as follows. Since , the marginal prior of under HSGM is where and , respectively, denote the density of and the distribution function of and .
We see that (4) is the density of a weighted normal (WN) distribution by Kim . This leads to the following assertion in Lemma 1.
Lemma 1. The two-stage prior of in HSGM of (2), eliciting the uncertain interval constraint (3), is marginally distributed as a weighted normal distribution, with its density (4), where
Note that , where , a bivariate normal distribution. See Kim  for various properties of the distribution. According to O’Hagan and Leonard , the first stage variance, , of the mean may measure the degree of uncertainty in the constraint. However, there is no systematic method to assign the values of (or ), according to an a priori specified degree (say, ) of uncertainty about the interval constraint (3). This is a major hindrance to developing the HSGM.
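The marginalization behind Lemma 1 can be illustrated numerically. The sketch below assumes a specific form of the hierarchy, namely a first-stage mean mu0 drawn from N(m, s2^2) truncated to (a, b) and theta given mu0 drawn from N(mu0, s1^2). The names m, s2, s1, a, b and both function names are hypothetical labels chosen for this illustration and need not match the notation of (2) and (4). Under that assumption the marginal density is a normal density multiplied by a ratio of interval probabilities, which is the weighted normal shape described above.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal helper


def two_stage_marginal_pdf(theta, m, s2, s1, a, b):
    """Marginal prior density of theta under the ASSUMED hierarchy
    mu0 ~ N(m, s2^2) truncated to (a, b),  theta | mu0 ~ N(mu0, s1^2)."""
    total_var = s1 ** 2 + s2 ** 2
    total_sd = total_var ** 0.5
    # Law of mu0 given theta in the untruncated bivariate normal.
    cond_mean = m + (s2 ** 2 / total_var) * (theta - m)
    cond_sd = s1 * s2 / total_sd
    # Weight: P(a < mu0 < b | theta) / P(a < mu0 < b).
    num = STD_NORMAL.cdf((b - cond_mean) / cond_sd) - STD_NORMAL.cdf((a - cond_mean) / cond_sd)
    den = STD_NORMAL.cdf((b - m) / s2) - STD_NORMAL.cdf((a - m) / s2)
    return NormalDist(m, total_sd).pdf(theta) * num / den


def trapezoid(f, lo, hi, n=4000):
    """Composite trapezoid rule on [lo, hi] with n panels."""
    h = (hi - lo) / n
    inner = sum(f(lo + i * h) for i in range(1, n))
    return h * (0.5 * f(lo) + inner + 0.5 * f(hi))


if __name__ == "__main__":
    pdf = lambda t: two_stage_marginal_pdf(t, m=0.0, s2=1.0, s1=0.5, a=-1.0, b=1.0)
    print("total mass:", trapezoid(pdf, -12.0, 12.0))
    print("mass inside (a, b):", trapezoid(pdf, -1.0, 1.0))
```

The density integrates to one but places some mass outside (a, b), which is the sense in which values outside the interval are penalized rather than excluded.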
3. The Measure of Uncertainty
3.1. A Maximum Entropy Prior
Sometimes we have a situation where partial prior information is available, outside of which it is desired to use a prior that is as noninformative as possible. A useful method of dealing with this problem is through the concept of entropy by Jaynes [7, 8]. As discussed by Rosenkrantz , entropy has a direct relationship to information theory and in a sense measures the amount of uncertainty inherent in the probability distribution.
Assume now that we can specify the partial information concerning a location parameter (including the normal mean) with continuous space of the form but with nothing else about our prior distribution . Then the maximum entropy prior can be obtained by choosing that maximizes the entropy in the presence of the partial information in the form of (7). A straightforward application of the calculus of variations leads us to Boltzmann’s maximum entropy theorem. This tells us that the density that maximizes , subject to the constraints , takes the -parameter exponential family form where can be determined, via the -constraints, in terms of , . See Leonard and Hsu  for the proof.
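For orientation, the general result invoked here can be written in generic notation as follows. The symbols $\pi$, $g_j$, $c_j$, $\lambda_j$, and $k$ are placeholders chosen for this sketch and may differ from the notation of (7) to (9).

```latex
\begin{align*}
\text{maximize}\quad & H(\pi) = -\int_{\Theta} \pi(\theta)\,\log \pi(\theta)\, d\theta \\
\text{subject to}\quad & \int_{\Theta} g_j(\theta)\,\pi(\theta)\, d\theta = c_j,
  \quad j = 1,\dots,k, \qquad \int_{\Theta} \pi(\theta)\, d\theta = 1, \\
\text{solution}\quad & \pi(\theta) =
  \frac{\exp\bigl\{\sum_{j=1}^{k} \lambda_j\, g_j(\theta)\bigr\}}
       {\int_{\Theta} \exp\bigl\{\sum_{j=1}^{k} \lambda_j\, g_j(t)\bigr\}\, dt}.
\end{align*}
```

The multipliers $\lambda_1,\dots,\lambda_k$ are then fixed by substituting the solution back into the $k$ moment constraints.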
3.2. Maximum Entropy Prior for Constrained Normal Mean
Let the location parameter of interest be the normal mean in (1). Then the results of the previous subsection can be applied to the prior for the normal mean . This subsection considers the case where the partial a priori information of is in the form of an interval, that is, with , and examines how the maximum entropy prior of , that is, , takes different forms according to the degree of uncertainty regarding the interval constraint.
Case 1. , and with certainty.
Suppose we have partial a priori information about that lets us specify values for both the mean and the variance . Then the prior specification is the maximum entropy prior for (e.g., ). Thus the prior density and its entropy for Case 1 are respectively.
Case 2. , and with certainty.
On the other hand, when we are completely sure about the a priori interval constraint of , a suitable restriction on the parameter space such as using a truncated distribution is expected. This case supposes that we can specify values for both and on the space by a priori information. Further suppose that we have certain prior information that the parameter is concentrated on the region , that is, but nothing else about our prior distribution . The last condition is equivalent to . Therefore, by Boltzmann’s maximum entropy theorem, the prior density of for Case 2 is by (9), because , and . Since , the maximum entropy prior, subject to these three restrictions, is thus provided that we choose , where and .
Some algebra using the moments of the distribution in Johnson et al.  yields the entropy of which is
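The entropies of Cases 1 and 2 can be compared numerically with standard closed forms. The sketch below assumes that the Case 1 prior is N(mean, sd^2) and that the Case 2 prior is the same normal truncated to (a, b). The parameter names and both function names are hypothetical, and the truncated-normal expression is the textbook formula rather than a transcription of the paper's equation.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def normal_entropy(sd):
    """Differential entropy of N(mean, sd^2); it does not depend on the mean."""
    return 0.5 * math.log(2.0 * math.pi * math.e * sd ** 2)


def truncated_normal_entropy(mean, sd, a, b):
    """Differential entropy of N(mean, sd^2) truncated to (a, b)."""
    alpha = (a - mean) / sd
    beta = (b - mean) / sd
    # Probability mass kept by the truncation.
    z = STD_NORMAL.cdf(beta) - STD_NORMAL.cdf(alpha)
    # Correction coming from E[(theta - mean)^2] under the truncation.
    tail_term = (alpha * STD_NORMAL.pdf(alpha) - beta * STD_NORMAL.pdf(beta)) / (2.0 * z)
    return math.log(math.sqrt(2.0 * math.pi * math.e) * sd * z) + tail_term


if __name__ == "__main__":
    print("Case 1 (untruncated):", normal_entropy(1.0))
    print("Case 2 (truncated to (-1, 1)):", truncated_normal_entropy(0.0, 1.0, -1.0, 1.0))
```

Imposing the interval with certainty lowers the entropy, and the gap closes as the interval widens relative to the standard deviation.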
Case 3. , and with uncertain.
Now suppose that we have partial a priori information that lets us specify values for both the mean and the variance of for . In addition, suppose that we have a priori uncertain interval constraint information that . Then, along with the a priori moment conditions and , the additional uncertain prior information about the interval constraint can be expressed by where (or ) denotes the degree of uncertainty. Thus the a priori uncertain interval constraint is equivalent to Applying this partial prior information to Boltzmann’s maximum entropy theorem, we have the following lemma.
Lemma 2. If , the maximum entropy prior distribution of , reflecting the uncertain interval constraint in (15) is where , , denote a rectangle probability of and whose joint distribution is a bivariate normal ,
Proof. Taking , and , and setting and , the maximum entropy prior in (9) reduces to by Boltzmann’s maximum entropy theorem. Now the second exponential term in the right hand side of (19) can be determined by using the condition (16) as follows. Among all the possible proper prior densities of the form (19), the choice of yields the proper prior density (22). Further, this choice leads to the only proper prior that satisfies in the condition (16), because (22) is the density of .
Thus the maximum entropy prior density for the Case 3 is given by where and .
Note, from Lemma 2, that for and . This is consistent with our partial priori information that and for . Some algebra using the moments of the distribution given in Kim  yields the entropy of (i.e., ) given by As seen in (23), the calculation of involves a complicated integration. Instead, by using a Monte Carlo integration, we may calculate it approximately. According to Kim , it follows that the stochastic representation of the prior distribution with density is where and , and they are independent. This stochastic representation is useful for generating ’s from the prior distribution , and hence implementing the Monte Carlo integration. The following corollary asserts the relationship among , in terms of .
Corollary 3. As , the maximum entropy prior approximates to in (13), while is equivalent to for .
Proof. Note from (22) that and as for . Also note, from the conditional property of , that is equivalent to for finite values of , , , and with . Applying these two results to , we see that it approximates to as . It is straightforward to see from Lemma 2 that for , because the distribution is equivalent to for .
Each graph of Figure 1 depicts the difference between and as a function of for three values of , two cases of , and . In Figure 1, the difference is denoted by . Since each graph coincides with the results of Corollary 3, we can obtain the following implications from the figure. (i) As expected, we see that for . (ii) The entropy of is a monotone decreasing function of . (iii) Each entropy of the three priors increases as becomes large. (iv) is closely related to the degree of uncertainty (i.e., ), for it is a monotone decreasing function of and is a function of . (v) is a monotone increasing function of for the case where a value of is given.
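The Monte Carlo route to the Case 3 entropy mentioned above can be sketched as follows. The code assumes that the stochastic representation takes the hierarchical form "truncated normal part plus independent normal part", that is, theta = mu0 + s1 * Z with mu0 drawn from N(m, s2^2) truncated to (a, b). This form, the parameter names, and all function names are assumptions made for the illustration and are not a transcription of (24).

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()


def two_stage_marginal_pdf(theta, m, s2, s1, a, b):
    """Weighted-normal-type density implied by the assumed representation."""
    total_var = s1 ** 2 + s2 ** 2
    total_sd = math.sqrt(total_var)
    cond_mean = m + (s2 ** 2 / total_var) * (theta - m)
    cond_sd = s1 * s2 / total_sd
    num = STD_NORMAL.cdf((b - cond_mean) / cond_sd) - STD_NORMAL.cdf((a - cond_mean) / cond_sd)
    den = STD_NORMAL.cdf((b - m) / s2) - STD_NORMAL.cdf((a - m) / s2)
    return NormalDist(m, total_sd).pdf(theta) * num / den


def draw_two_stage(rng, m, s2, s1, a, b):
    """One prior draw: inverse-CDF truncated normal plus independent noise."""
    lo = STD_NORMAL.cdf((a - m) / s2)
    hi = STD_NORMAL.cdf((b - m) / s2)
    u = lo + (hi - lo) * rng.random()
    u = min(max(u, 1e-12), 1.0 - 1e-12)  # keep inv_cdf strictly inside (0, 1)
    mu0 = m + s2 * STD_NORMAL.inv_cdf(u)
    return mu0 + s1 * rng.gauss(0.0, 1.0)


def mc_entropy(n_draws, m, s2, s1, a, b, seed=0):
    """Monte Carlo estimate of -E[log p(theta)] under the prior p."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_draws):
        theta = draw_two_stage(rng, m, s2, s1, a, b)
        total -= math.log(two_stage_marginal_pdf(theta, m, s2, s1, a, b))
    return total / n_draws


if __name__ == "__main__":
    print(mc_entropy(20000, m=0.0, s2=1.0, s1=0.5, a=-1.0, b=1.0))
```

Averaging the negative log density over prior draws replaces the integral with a sample mean, so the accuracy improves at the usual square-root rate in the number of draws.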
3.3. Objective Measure of Uncertainty
From a relationship between Lemmas 1 and 2, we can propose an objective measure for the degree of uncertainty regarding a prior interval constraint accounted for by . The following theorem proposes the objective measure using the same notations as used in Lemmas 1 and 2.
Theorem 4. Let , and . Then the two-stage prior of defined by in (2) is equivalent to and the degree of uncertainty regarding the interval constraint, , accounted for by the is
Proof. If , and , the marginal prior distribution of in Lemma 1 is equivalent to the density of . Thus the prior density in (4) is equal to the maximum entropy prior in (22). Now Lemma 2 asserts that reflects uncertainty about the interval constraint by the degree of with .
Theorem 4 provides an exact measure of the uncertainty about the a priori interval constraint on accounted for by , and it shows that the uncertainty measure is different from that of O’Hagan and Leonard , which mainly depends on the first stage variance, , of in . Theorem 4 also indicates that in (2) can be used to elicit the a priori uncertain interval information associated with . Further, the entropy of the two-stage prior defined by the (i.e., in (4)) can be calculated by using the formula of in (23). We can visualize the degree of uncertainty about the a priori interval constraint, , by plotting for different values of in Figure 2.
From Figure 2, we can see, for fixed value of , that with small values of tends to increase the uncertainty regarding the priori constraint. This coincides with the result which is asserted by Corollary 3. Further, we see from the figure that, for a fixed value of , we can enlarge (or reduce) the uncertainty about the priori constraint by increasing (or decreasing) the amount of (or equivalently and ) in the two stages of a prior hierarchy of .
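The way the hierarchy's parameters drive the degree of uncertainty can be explored numerically. The sketch below takes one natural reading of that degree, the prior probability that theta falls outside (a, b), and evaluates it under an assumed hierarchy with mu0 drawn from N(m, s2^2) truncated to (a, b) and theta given mu0 drawn from N(mu0, s1^2). This reading, the parameter names, and the function name are assumptions made for the illustration; the closed form stated in Theorem 4 is not reproduced here.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def prob_outside_interval(m, s2, s1, a, b, n=2000):
    """P(theta not in (a, b)) under the assumed two-stage prior,
    computed by trapezoid quadrature over the first-stage mean mu0."""
    z = STD_NORMAL.cdf((b - m) / s2) - STD_NORMAL.cdf((a - m) / s2)
    first_stage = NormalDist(m, s2)
    h = (b - a) / n
    inside = 0.0
    for i in range(n + 1):
        u = a + i * h
        weight = 0.5 if i in (0, n) else 1.0
        # P(a < theta < b | mu0 = u) for theta | mu0 ~ N(mu0, s1^2).
        p_theta_in = STD_NORMAL.cdf((b - u) / s1) - STD_NORMAL.cdf((a - u) / s1)
        # Density of the truncated first stage at u.
        inside += weight * p_theta_in * first_stage.pdf(u) / z
    return 1.0 - h * inside


if __name__ == "__main__":
    for s1 in (0.01, 0.1, 0.5, 1.0, 2.0):
        print(s1, prob_outside_interval(m=0.0, s2=1.0, s1=s1, a=-1.0, b=1.0))
```

Under these assumptions the mass outside the interval grows with the second-stage spread s1 and shrinks toward zero as s1 goes to zero, which recovers the certain constraint as a limiting case.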
When the data information is , it is well known that the maximum entropy priors (4) and (9) are conjugate priors for the normal mean when is known. This is obtained from the following posteriors: where and . Thus, we see that each prior satisfies the conjugate property that and belong to the same family of distributions for . The following corollary provides this conjugate property which also applies to in (22).
Corollary 5. Let with known . Then the prior yields the posterior distribution of given by where , and .
Proof. Since with known , under the prior in (22), the posterior density of is given by in that , where and . This is a kernel of density.
4. Posterior Estimation
Let us revisit the HSGM with the following two stages of a prior hierarchy of : According to Theorem 4, we see that the two-stage prior of is essentially the same as the maximum entropy prior which properly elicits the partial priori information of an uncertain interval constraint, , with degree of uncertainty, where . Here and are true prior mean and variance of the parameter when the uncertain priori interval condition does not exist.
Based on the marginal prior distribution of in Lemma 1, we have the joint posterior distribution of and proportional to the product of likelihood and the prior distribution, where is the inverse-gamma density with parameters and , and is the density (22), that is, the density of . Note that the joint posterior is not simplified in an analytic form of the known density and thus intractable for posterior inference. Instead, we derive each of the conditional posterior distributions of and in an explicit form, which will be useful for posterior inference such as Gibbs sampling (e.g., ).
Corollary 6. Given the joint posterior distribution (31), we have the following. (i) The full conditional posterior distribution of is given by where , and . (ii) The full conditional posterior distribution of is the inverse-Gamma distribution
Proof. The unnormalized conditional density of given that and is proportional to
where . Thus direct application of Corollary 5 yields the result.
It is straightforward to see from (31) that the full conditional posterior distribution of is which is a kernel of distribution.
It is not difficult to construct the Gibbs sampling scheme working with because their full conditional distributions are given in Corollary 6. A routine Gibbs sampler would work to generate posterior samples of . The only difficulty would lie in generating random samples from the WN distribution, that is, as given in (32). In order to generate random samples from the WN distribution, we can make use of the stochastic representation given in (24). Also we can note that the stochastic representation (24) of the full conditional posterior distribution (32) provides the posterior mean and variance given by where , and are the respective mean and variance of the truncated standard normal distribution, with , and . Johnson et al.  show that they are where denotes the density of . It is seen that each of the first terms in (37), that is, and , is the posterior mean and variance of when the prior is put on (e.g., [15, 16]), instead of the two-stage prior of Theorem 4, that is, . That is, each of the second terms in the posterior mean and variance vanishes when is assumed to be a priori without any interval constraint. In this sense, the second terms in (37) can be interpreted as a constraining effect obtained by adopting the . See, for example, Kim , Kim and Choi , and H. J. Kim and H.-M. Kim  for practical applications of the methodology to get the constraining effect in various constrained parameter problems.
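A Gibbs scheme of this kind can be sketched in a few lines. Note that the code below is a data-augmented variant, not the sampler of Corollary 6: rather than drawing the mean from its weighted normal full conditional, it keeps the first-stage mean mu0 as a latent variable so that every step is a standard draw. It assumes the model y_i ~ N(theta, sigma2), theta given mu0 ~ N(mu0, s1^2), mu0 ~ N(m, s2^2) truncated to (a, b), and sigma2 ~ IG(shape, scale). All names and this parameterization are hypothetical choices made for the illustration.

```python
import math
import random
from statistics import NormalDist, fmean

STD_NORMAL = NormalDist()


def draw_truncated_normal(rng, mean, sd, a, b):
    """Inverse-CDF draw from N(mean, sd^2) truncated to (a, b)."""
    lo = STD_NORMAL.cdf((a - mean) / sd)
    hi = STD_NORMAL.cdf((b - mean) / sd)
    u = lo + (hi - lo) * rng.random()
    u = min(max(u, 1e-12), 1.0 - 1e-12)  # keep inv_cdf strictly inside (0, 1)
    x = mean + sd * STD_NORMAL.inv_cdf(u)
    return min(max(x, a), b)  # guard against rounding in extreme tails


def gibbs_hsgm(y, m, s2, s1, a, b, shape, scale, n_iter=3000, burn_in=500, seed=0):
    """Augmented Gibbs sampler over (theta, mu0, sigma2); returns (theta, sigma2) draws."""
    rng = random.Random(seed)
    n, ybar = len(y), fmean(y)
    theta, sigma2, mu0 = ybar, 1.0, min(max(m, a), b)
    draws = []
    for it in range(n_iter):
        # theta | mu0, sigma2, y: normal-normal conjugate update.
        prec = n / sigma2 + 1.0 / s1 ** 2
        mean = (n * ybar / sigma2 + mu0 / s1 ** 2) / prec
        theta = rng.gauss(mean, math.sqrt(1.0 / prec))
        # mu0 | theta: normal update, still truncated to (a, b).
        prec0 = 1.0 / s1 ** 2 + 1.0 / s2 ** 2
        mean0 = (theta / s1 ** 2 + m / s2 ** 2) / prec0
        mu0 = draw_truncated_normal(rng, mean0, math.sqrt(1.0 / prec0), a, b)
        # sigma2 | theta, y: inverse gamma, drawn as the reciprocal of a gamma.
        sse = sum((yi - theta) ** 2 for yi in y)
        sigma2 = 1.0 / rng.gammavariate(shape + 0.5 * n, 1.0 / (scale + 0.5 * sse))
        if it >= burn_in:
            draws.append((theta, sigma2))
    return draws


if __name__ == "__main__":
    data_rng = random.Random(42)
    y = [data_rng.gauss(1.5, 1.0) for _ in range(50)]
    out = gibbs_hsgm(y, m=0.5, s2=1.0, s1=1.0, a=0.0, b=1.0, shape=2.0, scale=2.0)
    print("posterior mean of theta:", fmean(d[0] for d in out))
```

Marginalizing mu0 out of this augmented chain targets the same posterior for the mean and the variance as a sampler that works with the weighted normal marginal prior, at the cost of one extra truncated normal draw per iteration.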
5. Conclusion
This paper considered normal models that include the two-stage prior of the normal mean, referred to as the hierarchical screened Gaussian model (HSGM). The HSGM is based on the two stages of a prior hierarchy advocated by O’Hagan and Leonard  and elicits partial a priori information obtained for the case where an interval constraint of the normal mean needs to be incorporated in the modeling but such a restriction is uncertain. Then we proposed an objective method to measure (or control) the uncertainty accounted for by the HSGM. For this purpose, we derived the maximum entropy prior, reflecting the uncertainty about an interval constraint on the normal mean, by using Boltzmann’s maximum entropy theorem. As a result, we found that both the maximum entropy prior and the two-stage prior in the HSGM belong to the family of weighted normal distributions considered by Kim . By exploring the distributional relationship between the two priors, we proposed the objective measure of uncertainty. This paper also proposed a Bayesian estimation procedure for the HSGM and investigated some properties of the procedure by deriving the full conditional posterior distributions of the unknown parameters under the HSGM in analytic forms.
Conflict of Interests
The author declares that there is no conflict of interests regarding the publication of this paper.
Acknowledgment
This research was supported by Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Science, ICT, and Future Planning (2013R1A2A2A01004790).
References
A. O'Hagan and T. Leonard, “Bayes estimation subject to uncertainty about parameter constraints,” Biometrika, vol. 63, no. 1, pp. 201–203, 1976.
B. Liseo and N. Loperfido, “A Bayesian interpretation of the multivariate skew-normal distribution,” Statistics & Probability Letters, vol. 61, no. 4, pp. 395–401, 2003.
H. J. Kim, “On a class of multivariate normal selection priors and its applications in Bayesian inference,” Journal of the Korean Statistical Society, vol. 40, no. 1, pp. 63–73, 2011.
H. J. Kim and T. Choi, “On Bayesian estimation of regression models subject to uncertainty about functional constraints,” Journal of the Korean Statistical Society, vol. 43, no. 1, pp. 133–147, 2014.
R. B. Arellano-Valle, M. D. Branco, and M. G. Genton, “A unified view on skewed distributions arising from selections,” The Canadian Journal of Statistics, vol. 34, no. 4, pp. 581–601, 2006.
H. J. Kim, “A class of weighted multivariate normal distributions and its properties,” Journal of Multivariate Analysis, vol. 99, no. 8, pp. 1758–1771, 2008.
E. T. Jaynes, “Prior probabilities,” IEEE Transactions on Systems Science and Cybernetics, vol. 4, no. 3, pp. 227–241, 1968.
R. D. Rosenkrantz, Ed., E. T. Jaynes: Papers on Probability, Statistics and Statistical Physics, Reidel, Dordrecht, The Netherlands, 1983.
C. Cercignani, The Boltzmann Equation and Its Applications, Springer, Berlin, Germany, 1988.
R. D. Rosenkrantz, E. T. Jaynes: Papers on Probability, Statistics, and Statistical Physics, Kluwer Academic, Norwell, Mass, USA, 1989.
R. D. Rosenkrantz, Inference, Method, and Decision: Towards a Bayesian Philosophy and Science, Reidel, Boston, Mass, USA, 1977.
T. Leonard and J. S. J. Hsu, Bayesian Methods: An Analysis for Statisticians and Interdisciplinary Researchers, Cambridge Series in Statistical and Probabilistic Mathematics, Cambridge University Press, Cambridge, UK, 1999.
N. L. Johnson, S. Kotz, and N. Balakrishnan, Continuous Univariate Distributions, vol. 1, John Wiley & Sons, New York, NY, USA, 1994.
A. E. Gelfand, A. F. M. Smith, and T. Lee, “Bayesian analysis of constrained parameter and truncated data problems using Gibbs sampling,” Journal of the American Statistical Association, vol. 87, no. 418, pp. 523–532, 1992.
C. E. Rasmussen and C. K. I. Williams, Gaussian Processes for Machine Learning, The MIT Press, Boston, Mass, USA, 2006.
J. Q. Shi and T. Choi, Gaussian Process Regression Analysis for Functional Data, Monographs on Statistics and Applied Probability, Chapman & Hall, London, UK, 2011.
H. J. Kim and H.-M. Kim, “A class of rectangle-screened multivariate normal distributions and its applications,” Statistics: A Journal of Theoretical and Applied Statistics, 2014.