Research Article | Open Access
Bayesian Analysis for Dynamic Generalized Linear Latent Model with Application to Tree Survival Rate
Logistic regression model is the most popular regression technique, available for modeling categorical data especially for dichotomous variables. Classic logistic regression model is typically used to interpret relationship between response variables and explanatory variables. However, in real applications, most data sets are collected in follow-up, which leads to the temporal correlation among the data. In order to characterize the different variables correlations, a new method about the latent variables is introduced in this study. At the same time, the latent variables about AR (1) model are used to depict time dependence. In the framework of Bayesian analysis, parameters estimates and statistical inferences are carried out via Gibbs sampler with Metropolis-Hastings (MH) algorithm. Model comparison, based on the Bayes factor, and forecasting/smoothing of the survival rate of the tree are established. A simulation study is conducted to assess the performance of the proposed method and a pika data set is analyzed to illustrate the real application. Since Bayes factor approaches vary significantly, efficiency tests have been performed in order to decide which solution provides a better tool for the analysis of real relational data sets.
Logistic regression model, a widely appreciated model for analyzing categorical response data especially for binary/dichotomous data in social science research, marketing, biomedical studies, and ecology, has been received a substantive attention [1, 2]. The primary concern of logistic regression analysis is to build the interrelationships between explanatory variables and responses and explain the variability of the response probability in terms of the variability in the observed covariates. The common analysis for the logistic regression model is based on the independent subjects, and statistical inference is carried out via the maximum likelihood approach (see [3–5]).
However, in practice, many studies need to account for correlations among the items within time as well as correlations among the same items between times. Independent assumptions cannot capture such correlations. In order to interpret the correlations among the responses, one popular choice is to introduce the latent variables. Even more, taking into account the temporal correlations, a dynamic system is established which leads to the latent variable models [6–8]. However, this dynamic system is associated with dynamic factor. So, a Bayesian approach is proposed in this paper. With an alternative approach about the ML, the development is based on the generalized logistic model in the well-known linear dynamic factor analysis model. To cope with this complicated model, Gibbs sampler  with MH algorithm [10, 11] is implemented to generate a sequence of observations from the appropriate joint posterior distribution. The joint Bayes estimates of the parameters and latent variables scores associated with standard error estimates can be obtained directly based on the simulated observations. Beyond the estimation problem, another main objective of this paper is to introduce the Bayes factor (see ) to test various hypotheses about the posited model. In general, the computation of the Bayes factor involving multiple integrals is rather intractable and, more often, are evaluated via various computational methods (see [13–19], and among others). Among easy-to-construct, we adopted the well-known path sampling technique  for computing Bayes factor. The nice feature of path sampling lies in its simplicity in implementation and effectiveness over such an important sampling (e.g., ) and bridge sampling .
A basic intent of modeling dynamic model is to represent series of observations generated in time and to predict the future evolution of the series. The well-known Kalman recursion  is commonly employed for forecasting and provides the optimal forecasts in the case where the state transitions are linear and dynamic system is Gaussian. However, if Gaussian assumptions, in particular, of the distributions of the measurement errors are violated, Kalman recursion only yields the best linear predictor for linear dynamic system which is substantially different from the optimal forecasts. Many authors have suggested various alternatives to the Kalman filter; see, among others, Kitagawa , Meinhold and Singpurwalla , Carlin et al. , and Smith and West . Motivated by the key idea in Carlin et al. , in this paper, the missing data is treated as the future observations and augmented them with parameters of latent variables as observed data. Therefore, it can use Bayesian analysis approach to deal them. Simulated observations generated by Gibbs sampler from the joint posterior distribution can be directly applied to forecasts and/or smoothing.
This paper is organized as follows. In Section 2, we describe the problem under consideration. Section 3 gives the estimation procedure and explores the Bayesian approach in detail. For model selection, Bayes factor and Bayesian forecasting are introduced in Section 4. Simulation study and a real example are given in Section 5. Some concluding remarks are presented in Section 6.
2. Materials and Methods about Model
In this section, the problem and model structure are presented as follows, which will be considered throughout the paper. For , , let denote the number of living tree within the th plot in the th month. We assume that follows the binomial distribution with the unknown survival rate where is the total number of tree being investigated at time in theth plot. Further, the probability is in relation to the covariates via the following link function: in which “” denotes the function, is a vector of fixed covariates, is a regression parametric vector, is the common factor, and is the unique error with distribution where is the diagonal matrix. The factor loading vector is , which measures the effect of on the functions of the response probability. Moreover, it assumes that satisfies the following one-order autoregressive model where and is the iid random errors distributed according to the normal distribution (). For the initial variable , we assume that with mean and variance . and are mutually independent and both are independent of . In this paper, we treat and to be known and fix them at some preassigned values.
Let , . Then, (1), (2), and (3) can be reformulated as follows: The main aim of introducing the latent variables is to characterize the temporal correlation between the consecutive observations and explain the interrelationships among the response probability. To give more details, note that for , and , the correlation between and is given by in which A path diagram of latent variables and manifest variables at time and is given in Figure 1. Following the conventions of path diagrams, the ellipses represent the latent variables, the rectangles denote the observed measurements, and the arrows represent the direct effect.
For ease of exposition, let , , and , which denote the vector of random observations, the vector of latent variables, and the vector of covariates cross items at time , respectively. Let be the collection of the observed data, let be the collection of the factor variables, let be the collection of the latent variables, and let be the vector of unknown parameters in , , , , and involved in the model and then the joint distribution of () conditioning on is given by In the framework of Bayesian analysis, the following priors for unknown parameter vector are used to complete Bayesian specifications: and in which , , , , , , , and are the known superparameters.
3. Parameters Estimation via Gibbs Sampler with MH Algorithm
For the Bayesian analysis, we are required to evaluate the complicated posterior distribution which involves the high-dimensional integrals. Data augmentation  technique is used to cope with the posterior analysis in relation to the complicated . Specifically, the observed data are augmented with the latent quantities in the posterior analysis. A sequence of random observations will be generated by the Gibbs sampler , coupled with the Metropolis-Hastings algorithm [10, 11] from the joint posterior distribution , specifically, at the th iteration with current values . We do the following:(i)draw from ;(ii)draw from ;(iii)draw from ;(iv)draw from ;(v)draw from ;(vi)draw from .
It has been shown [9, 29] that under mild conditions and, for sufficiently large , such as , the joint distribution of converges at an exponential rate to the desired posterior distribution . Hence, can be approximated by the empirical distribution of , where is chosen to give sufficient precision to the empirical distribution. The convergence of the hybrid Gibbs sampler with MH algorithm can be monitored by the “estimated potential scale reduction (EPSR)” values as suggested by Gelman and Rubin . To implement Gibbs sampler with MH algorithm, it requires the full conditional distributions for each parameters and latent quantities. For saving space, these technical details are omitted.
Simulated observations obtained from the posterior values can be used for statistical inference via straightforward analysis procedure. For brevity, let be the random observations of generated by the Gibbs sampler from . The joint Bayesian estimate of and can be obtained easily via the corresponding sample means of the generated observations as follows: The consistent estimates of covariance matrix of () can be obtained as follows:
4. Model Selection and Bayesian Forecasting
Model selection is an important issue for Bayesian generalized logistic analysis since it is of interest to determine whether the latent variables are involved or whether the significant effects exist among the competing covariates. Consider the problem of comparing competing models and which are nested or nonnested. Let and denote the marginal density of the data under and , respectively. A popular choice for selecting models (e.g., [16, 18]) is achieved via the following Bayes factor: where and are the prior density of and the joint probability density of given the values of under , respectively. Kass and Rafter  provide the following categories to furnish appropriate guidelines in Table 1.
Computing or log involves the high-dimensional integrations of likelihood with respect to which is rather difficult. Various techniques are explored to address this problem (see ). Among easy-to-construct, we consider the path sampling method . The core of path sampling is to construct a series of linked models which link up the competing models.
Specifically, consider a class of densities indexed by a continuous parameter in such that corresponding to and to . Note that where is the normalizing constant of given by A path using the parameter in is constructed so that . Taking logarithm and differentiating (13) with respect to give in which denotes the expectation with respect to the distribution and Hence, where are the fixed grids such that , , and in which are observations sampled from . The MCMC method proposed for estimation produced can be directly applied to simulate the above observations for computing or 2log .
With all the complete conditionals available for sampling, a predictive or estimated values for a future can be obtained, provided that was available, which offers a solution to the so-called forecasting problem. More generally, the predictions of the future observations can be obtained in terms of the observed data and the covariates . As mentioned in introduction, we treat as missing data and augment them with , and and data in the posterior analysis. Gibbs sampler now is implemented to sample observations from the posterior . Predictive values for are given by which can be solved by where are the random observations generated by the Gibbs sampler with MH algorithm from . Consistent estimates of covariance matrix of (18) can be obtained via sample covariance matrix.
5. Results of Experiment
5.1. Simulation Study
A simulation study is presented to assess the performance of our proposed procedure. The data set is simulated from the model defined by (1), (2), and (3) with , , and . The true population values of the unknown parameters are taken as , , , , , and in which elements with asterisk are treated as fixed known parameters for model identification (see ). For , , we first sampled independently the observations from the normal distribution to form the covariates , where is a identity matrix, and then generate from model (3) with . Based on these settings, we generate the observations , from the binomial distribution Bin with . Priors inputs for the hyperparameters in the prior distributions are taken as (I) , , , , , , , and ; (II) and are set to be equal to the true values, respectively; , , , , , and . Note that prior (I) approaches the noninformative prior and prior (II) gives the informative prior.
At first, four parallel methods are conducted for different starting values as a pilot study to obtain some idea about the number of the Gibbs sampler iterations in getting convergence. For MH algorithm, the acceptance rate is also adjusted in posterior sampling to give about 40% . Hence, in the following analysis, experiment always takes such values for the tuned parameters. Figure 2 gives the estimated potential scaled reduction (EPSR) values  of the parameters against the number of the iterations in Gibbs sampler under prior (I).
In Figure 2, it can be seen clearly that all these EPSR values are less than 1.2 in about 2000 iterations. To be conservative, we collect 5000 observations after 5000 “burn-in” in computing the absolute bias (BIAS) and the root mean squares (RMS) of the estimates and the true values in 100 replications. Results obtained from the Gibbs sampler are summarized in Table 2, where SD denotes the standard deviation of the estimates. Based on these results, it can be seen that our proposal is rather accurate and effective. It also shows that there are no significant differences of estimates between prior (I) and prior (II). Therefore, the proposal method is more robust against the choices of values of hyperparameters. An interesting phenomenon is that the estimates of variance parameters in unique errors under prior (I) are more accurate than those under prior (II). This reflects that it provides more information in estimation.
For model comparison, we consider the following competing models to illustrate the performance of the Bayes factor.
is the aforementioned model given by (4). Consider Note that is nested to in the sense that all the parameters in model are included in model . indicates that no temporal correlation exists among the random effects. The linked model between and is taken as in which and correspond to and , respectively. For computation, we choose in and draw 5000 observations after 5000 burn-ins deleted from the posterior distribution for each to calculate the log . Figure 3 gives the histogram of the values of log across 100 replications. The estimate of log is 11.2222 with standard deviation 6.6242. Based on the guidelines given by Kass and Raftery , is selected with positive evidence.
To investigate the smoothing and forecasting of our proposal, we regenerate a data set with sample size 100 and treat the last 5 × 4 observations as unobserved. The Gibbs sampler with MH algorithm is used to compute the first ninety-five smoothing values and the last five forecasting values of the latent variables . Figure 4 gives the true values and smoothing/forecasting values based on 5000 simulated observations after deleting first 5000 observations. It can be seen that it fits well between the estimated values and true values.
5.2. Pika’s Data
An application is considered in relation to a study of Guo et al., 2009 , to illustrate the developed methodology. The data set is collected from Qinshui Forest farm of Jincheng Coal Industry Group sited on the northwest of Qinshui County of Jincheng City in Shanxi Province, 112°19′40′′–112°19′52 east longitude and 35°48′43′′–35°48′57 north latitude. Details about experiment design can be found in Guo et al., 2009 . The original data set is constituted of the number of the living tree and pika within four plots in five months. The primary concern is to assess the relationship between the survival rate of young plantation and the pika population and model the optimal pika population per hectare. The data set is reanalyzed by Xia et al.  to investigate the correlations among the observed responses. In these studies the latent variables model is established through AR model (3) coupled with model (1) and (2), which interprets the correlations among the outcomes resulting from the time dependence.
Firstly, the following hypothesis is considered: . If is true, the model becomes the common generalized logistic model considered by Xia et al. . Path sampling is used to compute log . The linked model is defined similarly as (20). grids were chosen in , and, for each (), 10000 random observations were drawn from the each posterior distributions and the first 5000 observations were deleted in view of the burn-in phase. The logarithm of Bayes factor is 3.0143. It shows that there exists significant evidence against ; hence, our proposal is appropriate.
To assess the effect of forecasts of the proposal methodology, the original data set is divided into two parts: one is formed by the previous sixteen observations in four months within four plots and the other is made up of four observations in the following fifth month. The proposal algorithm computes the predictive values of the proportion of the living tree in the fifth month based on the previous 20 observations. Figure 5 shows the forecasts and actual values of the proportions of the living tree within four plots. It can be seen that fitted values match the actual values closely.
6. Concluding Remarks
Logistic regression model is the most popular model used to interpret the relationship between the responses and covariates and to assess the effect of covariates on the responsible probability. Bayesian analysis of logistic model has received a lot of attention recently; see Johnson and Albert  and Congdon . For example, Choi et al.  discussed a Bayesian statistical inference about missing information on the basis of the logistic regression model. They also used Gibbs sampler to get the estimates and the posterior analysis of the posited model. Since their model is not an AR model, the underlying development is less complicated. Recently, McCormic et al.  proposed an online binary classification procedure based on the dynamic logistic regression and dynamic model averaging. However, their developments are restricted within a single response , which do not need to explore the relationships among the observed variables. Though Xia et al.  establish the Bayesian analysis for generalized logistic model, the essential difference lies in that their latent variables are identically and independently distributed according to normal distribution and, hence, can not capture the temporal correlation.
In the present paper, dynamic factor model is established to characterize the temporal correlation among responses. One contribution in this paper is the development of a feasible estimation procedure for obtaining the Bayesian estimates of the parameters and latent factor scores. The hybrid Gibbs sampler with MH algorithm was implemented to provide a convenient mechanism for implementing our method. Another contribution is the development of test statistics on the Bayes factor for testing hypothesis of the model. Computation of the Bayes factor in the current complex model is on the basis of path sampling. Further, Bayesian forecasting procedure which takes advantages of the full conditional distribution is presented. Results from empirical studies indicate that these procedures can be usefully applied to real studies.
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
The paper is supported by the Natural Science Foundation (no. 61306046) and the Anhui Provincial Educational Natural Science Foundation (no. KJ2013A177).
- D. W. Hosmer and S. Lemeshow, Applied Logistic Regression, John Wiley & Sons, New York, NY, USA, 2nd edition, 2000.
- A. Agresti, Categorical Data Analysis, John Wiley & Sons, New York, NY, USA, 2nd edition, 2002.
- J. Nelder and R. W. M. Wedderburn, “Generalized linear models,” Journal of the Royal Statistical Society A, vol. 135, pp. 370–384, 1972.
- P. McCullagh and J. A. Nelder, Generalized Linear Models, Chapman and Hall, London, UK, 2nd edition, 1989.
- B. C. Wei, Exponential Family Nonlinear Models, Springer, Singapore, 1998.
- K. A. Bollen, Structural Equations with Latent Variables, John Wiley & Sons, New York, NY, USA, 1989.
- S. Y. Lee, Structural Equation Modeling: A Bayesian Perspective, Wiley Series in Probability and Statistics, John Wiley & Sons, New York, NY, USA, 2007.
- A. Skrondal and S. Rabe-Hesketh, Generalized Latent Variable Modeling: Multilevel, Longitudinal, and Structural Equation Models, Taylor & Francis, London, UK, 2004.
- S. Geman and D. Geman, “Stochastic relaxation, gibbs distributions, and the Bayesian restoration of images,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 6, no. 6, pp. 721–741, 1984.
- N. Metropolis, A. W. Rosenbluth, M. N. Rosenbluth, A. H. Teller, and E. Teller, “Equation of state calculations by fast computing machines,” The Journal of Chemical Physics, vol. 21, no. 6, pp. 1087–1092, 1953.
- W. K. Hastings, “Monte carlo sampling methods using Markov chains and their applications,” Biometrika, vol. 57, no. 1, pp. 97–109, 1970.
- J. O. Berger, Statistical Decision Theory and Bayesian Analysis, Springer Series in Statistics, Springer, New York, NY, USA, 2nd edition, 1985.
- M. Aitkin, “Simpson's paradox and the Bayes factor,” Journal of the Royal Statistical Society B, vol. 60, no. 1, pp. 269–270, 1998.
- A. E. Raftery, “Bayesian model selection in structural equation models,” in Testing Structural Equation Models, K. A. Bollen and J. S. Long, Eds., pp. 163–180, Sage, Newbury Park, Calif, USA, 1993.
- A. E. Raftery, “Bayesian model selection in social research,” in Sociological Methodology, A. E. Raftery, Ed., Blackwell, Oxford, UK, 1995.
- S. Chib, “Marginal likelihood from the Gibbs output,” Journal of the American Statistical Association, vol. 90, no. 432, pp. 1313–1321, 1995.
- A. O'Hagan, “Fractional Bayes factors for model comparison,” Journal of the Royal Statistical Society B Methodological, vol. 57, no. 1, pp. 99–138, 1995.
- R. E. Kass and A. E. Raftery, “Bayes factors,” Journals of the American Statistical Association, vol. 90, pp. 773–795, 1995.
- T. J. DiCiccio, R. E. Kass, A. Raftery, and L. Wasserman, “Computing Bayes factors by combining simulation and asymptotic approximations,” Journal of the American Statistical Association, vol. 92, no. 439, pp. 903–915, 1997.
- A. Gelman and X. Meng, “Simulating normalizing constants: from importance sampling to bridge sampling to path sampling,” Statistical Science, vol. 13, no. 2, pp. 163–185, 1998.
- A. E. Gelfand and D. K. Dey, “Bayesian model choice: asymptotics and exact calculations,” Journal of the Royal Statistical Society B. Methodological, vol. 56, no. 3, pp. 501–514, 1994.
- X. Meng and W. H. Wong, “Simulating ratios of normalizing constants via a simple identity: a theoretical exploration,” Statistica Sinica, vol. 6, no. 4, pp. 831–860, 1996.
- R. E. Kalman, “A new approach to linear filtering and prediction problems,” Transactions of the ASME D: Journal of Basic Engineering, vol. 83, pp. 35–45, 1960.
- G. Kitagawa, “Non-Gaussian state-space modeling of nonstationary time series,” Journal of the American Statistical Association, vol. 82, no. 400, pp. 1032–1063, 1987.
- R. J. Meinhold and N. D. Singpurwalla, “Robustification of Kalman filter models,” Journal of the American Statistical Association, vol. 84, no. 406, pp. 479–486, 1989.
- B. P. Carlin, N. G. Polson, and D. S. Stoffer, “A Monte Carlo approach to nonnormal and nonlinear state-space modeling,” Journal of the American Statistical Association, vol. 87, pp. 493–500, 1992.
- A. F. Smith and M. West, “Monitoring renal transplants: an application of the multiprocess Kalman filter,” Biometrics, vol. 39, no. 4, pp. 867–878, 1983.
- M. A. Tanner and W. H. Wong, “The calculation of posterior distributions by data augmentation,” Journal of the American Statistical Association, vol. 82, no. 398, pp. 528–550, 1987.
- L. Tierney, “Markov chains for exploring posterior distributions,” The Annals of Statistics, vol. 22, no. 4, pp. 1701–1728, 1994.
- A. Gelman and D. B. Rubin, “Inference from iterative simulation using multiple sequences,” Statistical Science, vol. 7, no. 4, pp. 457–472, 1992.
- M. K. Cowles, “Accelerating Monte Carlo Markov chain convergence for cumulative-link generalized linear models,” Statistics and Computing, vol. 6, no. 2, pp. 101–111, 1996.
- A. H. Guo, Y. Chen, and L. Y. Fu, “Harm about the artifical growth forest with Pikas,” Journal of Shandong Forestry Science and Technology, vol. 180, no. 1, pp. 54–55, 2009.
- Y. M. Xia, Y. A. Liu, and Z. Fang, “Modeling survival rate of forest through Bayesian generalized logistic regression model,” Journal of the Nanjing Forestry, vol. 34, pp. 47–50, 2010.
- V. E. Johnson and J. H. Albert, Ordinal Data Modeling, John Wiley & Sons, New York, NY, USA, 1999.
- P. Congdon, Bayesian Models for Categorical Data, John Wiley & Sons, New York, NY, USA, 2005.
- T. Choi, M. J. Schervish, and K. A. a. Schmitt, “A Bayesian approach to a logistic regression model with incomplete information,” Biometrics. Journal of the International Biometric Society, vol. 64, no. 2, pp. 424–430, 2008.
- T. H. McCormick, A. E. Raftery, D. Madigan, and R. S. Burd, “Dynamic logistic regression and dynamic model averaging for binary classification,” Biometrics, vol. 68, no. 1, pp. 23–30, 2012.
Copyright © 2014 Yu-sheng Cheng et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.