Research Article  Open Access
A Logistic Regression Model with a Hierarchical Random Error Term for Analyzing the Utilization of Public Transport
Abstract
Logistic regression models have been widely used in previous studies to analyze public transport utilization. These studies have shown travel time to be an indispensable variable for such analysis and usually treat it as a deterministic variable. This formulation cannot capture travelers’ perception error regarding travel time, and recent studies have indicated that this error can have a significant effect on modal choice behavior. In this study, we propose a logistic regression model with a hierarchical random error term. The proposed model adds a new random error term for the travel time variable. This structure enables us to investigate travelers’ perception error regarding travel time from a given choice behavior dataset. We also propose an extended model that allows the sign of this error to be constrained. We develop two Gibbs samplers to estimate the basic hierarchical model and the extended model. The performance of the proposed models is examined using a well-known dataset.
1. Introduction
Understanding the utilization of public transport is important for policy design and urban traffic planning. From a behavior analysis perspective, such utilization can be analyzed as a binary choice problem in which the traveler must choose between public transit and a private mode of transport. Previous studies usually employ logistic regression models to address this binary choice problem. These models can be used to predict choice probability and to evaluate the effect of various attributes on the utilization of public transport. The effects of demography, travel cost, travel time, and accessibility on such utilization can be analyzed by estimating the parameters of these models.
McGillivray [1] proposed the binary choice problem in the two-modal (public and private) case and applied a logistic model to investigate the dependence of modal choice on the individual values of time and cost by mode. Gebeyehu and Takano [2] showed that the logistic model is a useful tool for evaluating transit systems; their study developed a logistic model to investigate citizens’ perceptions of bus conditions in Addis Ababa. Hess [3] applied a logistic regression model to assist planners and policymakers in expanding the mobility and accessibility of public transport for older adults by analyzing the influence of the accessibility of public transport on ridership for older people in California and New York. Johansson et al. [4] further formulated the binary logit model with latent variables; this structure can examine the effects of attitudes and personality traits on mode choice. Muley et al. [5] and Badland et al. [6] focused on microscopic behavior in the binary choice problem. Muley et al. [5] employed a binary logistic regression model to explore the impact of personal and transit characteristics on the utilization of public transport. Badland et al. [6] adopted a logistic regression model to discuss how parking availability and public transport accessibility influenced the split between uptakes of the two modes. In contrast, Buehler and Pucher [7] studied the binary choice problem from a macroscopic perspective and employed a logistic model to compare utilization of public transport in the US and Germany.
One benefit of using a logistic regression model to analyze the utilization of public transport is that the model can consider the combined effects of attributes through a linear combination and can easily evaluate the contribution of each attribute to such utilization. Previous studies agree that travel time is an important variable when formulating a logistic regression model for this binary choice problem. Although previous studies usually treat travel time as a deterministic variable in the model, one cannot neglect that perception error can prevent travelers from accurately evaluating their actual travel time. Moreover, travelers cannot judge exactly how long a given travel time, for instance 10 minutes, actually is. Carrion [8] provided a case analysis of this error using self-reported as well as GPS-measured travel times and suggested that perception error regarding travel time can influence travel behavior. Cheng and Tsai [9] studied travelers’ perceptions of travel time through a questionnaire survey. They found that travel time perceptions can be influenced by personal characteristics such as gender and age.
As noted above, researchers have recognized the effect of perception errors regarding travel time on travelers’ choice behavior. However, the classic formulation of a random error term in logistic regression models cannot reflect this perception error appropriately (Chen et al. [10]). Analyzing travel behavior using the classic logistic regression model might therefore prevent us from gaining further insights into the utilization of public transport. Hence, this study proposes a hierarchical logistic regression model to fill the gap. Previous studies have successfully applied hierarchical error terms to model heterogeneous unobservable utility in logistic regression models (e.g., Tilahun et al. [11] and Czado and Prokopenko [12]). Inspired by these studies, we adopt hierarchical error terms to solve the problem. However, we attach these terms to important attributes such as travel time rather than to the entire utility function in order to capture travelers’ perception error regarding the attributes. This study is not merely an effort to improve the performance of the regression model; more importantly, the proposed model offers an alternative way to estimate the statistical characteristics of perception error regarding travel time and allows us to explore the properties of this error. Although this study focuses on modeling perception error regarding travel time, we believe that individuals might have perception errors on other attributes such as travel cost. The proposed model can easily be used to estimate such errors as well.
We first propose a basic hierarchical model and then develop an extended model. The extended model allows us to constrain the sign of perception error regarding travel time. Correspondingly, this study also develops two Gibbs samplers to estimate the parameters of the proposed models. We evaluate the performance of the proposed models using a well-known dataset provided by Horowitz [13, 14].
2. The Hierarchical Logistic Regression Model
For simplicity, we describe the proposed model based on a binary choice problem that was provided by Horowitz [13]. In this choice problem, an agent must choose between using a private car and using public transport. The attributes considered in this choice problem are CARS, DCOST, DOVTT, and DIVTT, where CARS is “private cars owned by the traveler’s household,” DCOST is “public transport fare minus private car travel cost,” DOVTT is “public transport out-of-vehicle time minus private car out-of-vehicle time,” and DIVTT is “public transport in-vehicle time minus private car in-vehicle time.”
We formulate the choice problem through a logistic regression model. If we express this logistic regression model as a latent-variable model, then the model can be obtained as follows: where and are the indexes of individual and survey questions, respectively. denotes that the choice result is private car and is otherwise public transport. The logistic regression model uses a random error term to represent stochastic behavior; however, this formulation does not allow us to investigate the perception error of attributes. In this study, we introduce a random variable into the logistic regression model. Here denotes the random perception error on DIVTT for individual . We consider that the value of can be different among individuals and that follows a probability distribution. The logistic regression model with a hierarchical random error term can be obtained as follows: In the above equation, we use to denote the parameter vector . In the proposed model, is a latent random variable, which serves as the random error term for DIVTT rather than the random term of the utility function. This structure enables us to investigate the perception error of DIVTT. We assume that follows the normal distribution with unknown mean and specify the variance of the normal distribution as for identification purposes. and denote the prior distributions of and , respectively. We wish to estimate the value of along with . In this hierarchical model, DIVTT is divided between and ; this leads to DIVTT no longer being a deterministic variable in the model.
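The latent-variable reading of a standard logistic regression can be checked numerically. The following is a minimal sketch, not the authors' code: it assumes a single hypothetical systematic utility value `v` (standing in for the linear combination of CARS, DCOST, DOVTT, and DIVTT), adds a standard logistic error, and compares the simulated share of positive latent utilities with the closed-form logistic probability.

```python
import math
import random


def logistic_cdf(v):
    """Closed-form choice probability of the binary logit model."""
    return 1.0 / (1.0 + math.exp(-v))


def draw_logistic(rng):
    """Draw a standard logistic error by inverting its CDF."""
    u = rng.random()
    while u == 0.0:  # avoid log(0); random() can return exactly 0.0
        u = rng.random()
    return math.log(u / (1.0 - u))


def simulate_choice_share(v, n_draws, rng):
    """Share of draws with latent utility v + error > 0 (choice coded as 1)."""
    hits = 0
    for _ in range(n_draws):
        if v + draw_logistic(rng) > 0.0:
            hits += 1
    return hits / n_draws


rng = random.Random(0)
v = 0.8  # hypothetical systematic utility, chosen only for illustration
share = simulate_choice_share(v, 100_000, rng)
print(share, logistic_cdf(v))
```

The two printed numbers should agree up to simulation noise, which is the sense in which the logit model can be written with a latent utility and a logistic error term.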
The logistic regression model shown by (2) can be further extended. Here, we consider that indicates that individual subjectively weakens the difference of in-vehicle travel time when the agent makes the choice decision; on the other hand, suggests that the individual subjectively enlarges the difference of in-vehicle travel time. In addition, we impose the value of to be a nonnegative (nonpositive) value if (≤0). Following these considerations, we rewrite of (2) as where In (4), if then the value of is 1 or else 0.
As shown by (2) and (3), the combination of and leads to a hierarchical random error term structure; this structure can be presented as a directed acyclic graph (see Figure 1).
3. Estimation Methods
First, we discuss how to estimate the parameters of the model shown by (2). Stefanski [15] indicated that the logistic distribution can be represented as a normal scale mixture. Accordingly, Holmes and Held [16] suggested an auxiliary variable method to represent the logistic regression model. Along the same lines, the regression model presented by (2) can also be derived in the following form: and are the parameters of the proposed model. , , and are the latent parameters of the proposed model. is the joint distribution of these parameters, which can be uniquely identified by (2). and are the prior distributions for and , respectively. KS denotes the Kolmogorov-Smirnov distribution.
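For orientation, the auxiliary variable representation of a plain logistic regression given by Holmes and Held [16] can be written as below. This is a sketch for the standard model only; the symbols $z_i$ (latent utility), $\lambda_i$ (mixing variance), $\psi_i$ (Kolmogorov-Smirnov variate), and $x_i$ (attribute vector) are notation assumed here, and the proposed model would additionally carry the perception error term inside the linear predictor.

```latex
\begin{aligned}
y_i &= \begin{cases} 1, & z_i > 0 \\ 0, & \text{otherwise} \end{cases} \\
z_i &= x_i^{\top}\beta + \varepsilon_i, \qquad \varepsilon_i \sim N(0, \lambda_i) \\
\lambda_i &= (2\psi_i)^2, \qquad \psi_i \sim \mathrm{KS} \\
\beta &\sim \pi(\beta)
\end{aligned}
```

Integrating out $\lambda_i$ leaves $\varepsilon_i$ with a standard logistic distribution, which is why conditioning on $\lambda_i$ turns each update into a normal or truncated normal draw.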
We develop a Gibbs sampler to draw the samples of , , , , and from the joint distribution and calculate the estimates of the parameters through aggregating the samples. Let , , and denote the vector , , and , respectively. We use to denote the indexes of the sample. The outline of the Gibbs sampler can then be given as follows:
Algorithm 1.
Step 0 (initialization). Set initial values: , , , , and .
Set .
Step 1. Draw th sample of
Step 2. Draw th sample of
Step 3. Draw th sample of
Step 4. Draw th sample of
Step 5. Draw th sample of
If then go to Step 1; otherwise, stop the sampling procedure.
The set of conditional distributions , , , , and is referred to as the full set of conditional distributions of . Sampling , , , , and in turn from the conditional distributions is equivalent to sampling these parameters from simultaneously. We now illustrate in detail how the Gibbs sampler draws th samples for , , , , and .
Algorithm 2.
Step 1. Update for drawing th samples for to do for to do end end
Step 2. Draw th sample for .
The conditional distribution is equivalent to . On replacing in Albert and Chib [17] by , the sample of can be generated through the sampling scheme proposed by Albert and Chib [17]. As shown by them, can be specified as a diffuse distribution, and can be derived as a normal distribution.
Step 3. Draw th sample for .
The conditional distribution is equivalent to . Albert and Chib [17] indicated that is a truncated normal distribution.
Step 4. Draw th sample for .
The conditional distribution is equivalent to . Holmes and Held [16] proposed a rejection sampling scheme that can be used to draw samples from .
Step 5. Draw th sample for .
can be generated through sampling in turn. The conditional distribution of is equivalent to . The probability density function of is a normal distribution and is defined by (11)–(13).
Step 6. Draw th sample for .
The conditional distribution is equivalent to . The probability density function of is a normal distribution and is given by (16).
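Step 3 of Algorithm 2 relies on a truncated normal draw in the style of Albert and Chib [17]. The sketch below shows one common way to do this by inverting the normal CDF; the function name, the arguments `mean` and `sd` (the conditional mean and standard deviation of the latent utility), and the coding of the observed choice `y` as 1 or 0 are assumptions made for illustration.

```python
import random
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, used for cdf and inverse cdf


def draw_truncated_latent(mean, sd, y, rng):
    """Draw z ~ N(mean, sd^2) truncated to z > 0 if y == 1, else to z <= 0.

    Inverse-CDF method. It is adequate when the truncation region has
    non-negligible mass; far in the tails a dedicated sampler is preferable.
    """
    p_below_zero = _STD_NORMAL.cdf((0.0 - mean) / sd)
    if y == 1:
        lo, hi = p_below_zero, 1.0
    else:
        lo, hi = 0.0, p_below_zero
    u = lo + (hi - lo) * rng.random()
    # inv_cdf requires 0 < u < 1, so keep u strictly inside the interval
    u = min(max(u, 1e-12), 1.0 - 1e-12)
    return mean + sd * _STD_NORMAL.inv_cdf(u)


rng = random.Random(1)
positive_draws = [draw_truncated_latent(-0.5, 1.0, 1, rng) for _ in range(1000)]
negative_draws = [draw_truncated_latent(0.5, 1.0, 0, rng) for _ in range(1000)]
print(min(positive_draws), max(negative_draws))
```

Draws for an observed choice of 1 stay on the positive side and draws for 0 stay on the nonpositive side, which is what keeps the latent utilities consistent with the data at every Gibbs iteration.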
Applying Bayes’ theorem, we obtain as a normal distribution. Without loss of generality, we only present how to obtain for the case for : The above equation shows that the conditional probability of is a normal distribution. The mean of the distribution is and the variance of the distribution is In (12), represents We consider the prior distribution of as a normal distribution with mean 0 and variance . If we further consider the prior distribution as a diffuse prior then this probability function of the prior can be given as Accordingly, the conditional distribution can be obtained as This conditional distribution is a normal distribution too; the mean of the distribution is and the variance of the distribution is .
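The last update can be illustrated with the textbook conjugate result. The derivation below is a sketch under stated assumptions: the individual perception errors are written as $\eta_i$, their common mean as $\mu$, the number of individuals as $N$, the variance of $\eta_i$ is fixed at 1, and the prior on $\mu$ is flat. These symbols are introduced here for illustration and may differ from the original notation.

```latex
\begin{aligned}
p(\mu \mid \eta_1,\dots,\eta_N)
  &\propto \prod_{i=1}^{N} \exp\!\left(-\tfrac{1}{2}(\eta_i-\mu)^2\right) \\
  &\propto \exp\!\left(-\tfrac{N}{2}(\mu-\bar{\eta})^2\right),
  \qquad \bar{\eta} = \frac{1}{N}\sum_{i=1}^{N}\eta_i \\
\Rightarrow\quad \mu \mid \eta_1,\dots,\eta_N &\sim N\!\left(\bar{\eta},\ \tfrac{1}{N}\right)
\end{aligned}
```

The second line uses $\sum_i(\eta_i-\mu)^2 = \sum_i(\eta_i-\bar{\eta})^2 + N(\mu-\bar{\eta})^2$, where the first term does not depend on $\mu$.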
Now, we discuss how to estimate the parameters of the extended model described by (3). In this case, we can also employ Steps 1 to 3 and 5 of Algorithm 2 to draw the samples of , , , and . We need only to use and to replace and in Step 1, respectively, where is defined by (4) and is defined as . On the other hand, instead of Step 4, we develop a Metropolis-Hastings (MH) sampling scheme to draw.
We derive the formulation of as where is the probability density function of the normal distribution with mean and variance 1. Following (3), we get as where is defined by (3). The MH sampling scheme is as follows.
Algorithm 3.
Step 0 (initialization). Set .
Step 1. Generate a candidate value for .
Draw from a normal distribution: , where denotes the candidate value for.
Step 2. Calculate the value of
Step 3. Get , if then else . If then and go to Step 1.
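The propose, evaluate, and accept-or-reject pattern of Algorithm 3 can be sketched generically as follows. This is not the authors' sampler: it assumes a random-walk normal proposal centered on the current value, and it uses a toy target (a normal density with mean −1 and variance 1 restricted to nonpositive values) as a stand-in for the sign-constrained conditional density of the perception error. The names `mh_step`, `toy_log_target`, and `step_sd` are hypothetical.

```python
import math
import random


def mh_step(current, log_target, step_sd, rng):
    """One random-walk Metropolis-Hastings update for a scalar parameter."""
    candidate = rng.gauss(current, step_sd)
    log_ratio = log_target(candidate) - log_target(current)
    # 1 - random() lies in (0, 1], so the logarithm is always defined
    if math.log(1.0 - rng.random()) < log_ratio:
        return candidate  # accept the candidate
    return current  # reject and keep the current value


def toy_log_target(x):
    """Log density (up to a constant) of N(-1, 1) restricted to x <= 0."""
    if x > 0.0:
        return float("-inf")  # sign constraint: positive values have zero density
    return -0.5 * (x + 1.0) ** 2


rng = random.Random(2)
state = -1.0  # starting value inside the allowed region
chain = []
for _ in range(20_000):
    state = mh_step(state, toy_log_target, 1.0, rng)
    chain.append(state)
print(sum(chain) / len(chain))
```

Because a candidate that violates the sign constraint has log density minus infinity, it is always rejected, so every retained value respects the constraint without any separate truncation step.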
The proposed model structure also allows us to further investigate travelers’ perception error regarding both DIVTT and DOVTT. To do this, we just need to modify (3) as follows: where We use to denote the traveler’s perception error on DOVTT and use to denote the mean of . Let be ; the outline of the sampling scheme for th sample of , , , , , , and can be described as follows.
Algorithm 4.
Step 1. Consider .
Step 2. Consider .
Step 3. Consider .
Step 4. Consider .
Step 5. Consider .
Step 6. Consider .
Step 7. Consider .
, , and can be generated along the same lines as Algorithm 2. Since and have the same formulation, and can be sampled in the same way as and (see Algorithm 2 and (16)).
To estimate the parameters of the proposed model, we can apply the sampling algorithm to draw random samples for , , , , , , and . These samples can be denoted as , , , , , , and for to , where is the number of drawings. The estimates of the parameters can be obtained through averaging the samples. For example, the estimates of and can be obtained as follows: and reflect the perception error estimate regarding DOVTT and DIVTT, respectively.
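Averaging the retained draws is straightforward to express in code. The sketch below assumes the draws for one parameter are stored in a plain list and that an initial segment is discarded as burn-in, as is done in the numerical example; the function name and arguments are hypothetical.

```python
def posterior_mean(samples, burn_in):
    """Average the draws that remain after discarding the first burn_in draws."""
    kept = samples[burn_in:]
    if not kept:
        raise ValueError("burn_in discards every sample")
    return sum(kept) / len(kept)


# Toy chain: the first two draws are treated as burn-in and ignored.
toy_chain = [5.0, 4.0, -1.0, -1.5, -0.5]
estimate = posterior_mean(toy_chain, burn_in=2)
print(estimate)  # prints -1.0
```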
4. Numerical Example
In this section, we use a modal choice dataset provided by Horowitz [13] to examine the performance of the proposed models. This dataset was collected in Washington D.C. and contains 842 persons’ modal choice results (private car or public transport) for the daily trip from home to work. Table 1 provides the data structure of the dataset.

We first estimate the hierarchical model defined by (2) using the dataset. We conduct the estimation using the Gibbs sampler, which generates 20,000 samples for each parameter. The first 5,000 samples are discarded as burn-in. The remaining samples are averaged to obtain the estimates of the parameters. Table 2 shows the estimation results. The signs of the estimates of , , and suggest that DCOST, DOVTT, and DIVTT have a positive effect on the choice probability of car (and have a negative effect on the utilization of public transport). The DIC of the model is 465.764.

We estimate the parameters of the extended model through the sampling scheme with the MH step. We draw 20,000 samples of each parameter and again discard the first 5,000 samples as burn-in. The DIC of the extended model is 461.687. This result indicates that the performance of the extended model is better than that of the basic model defined by (2). We also estimate the parameters of the logistic regression model defined by (1). The DIC of this model is 465.848. This value is slightly higher than the DIC of the basic model defined by (2) and clearly higher than the DIC of the extended model defined by (3).
The samples of the parameters are used to investigate the shapes of the distributions of the parameters. Table 3 reports the estimates of the parameters of the extended model. Figures 2 and 3 provide the histograms for and of the extended model. The distributions of the parameters are symmetric and unimodal.
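For readers unfamiliar with the criterion, the deviance information criterion is commonly computed from MCMC output as the posterior mean deviance plus an effective number of parameters. The sketch below uses that standard definition; how the DIC values reported here were actually computed is not stated in the text, so the function and its inputs are assumptions for illustration.

```python
def dic_from_deviances(deviance_draws, deviance_at_posterior_mean):
    """Return (DIC, p_D) using DIC = mean deviance + p_D.

    deviance_draws: deviance (-2 * log-likelihood) evaluated at each retained draw.
    deviance_at_posterior_mean: deviance evaluated at the posterior mean parameters.
    """
    mean_deviance = sum(deviance_draws) / len(deviance_draws)
    p_d = mean_deviance - deviance_at_posterior_mean  # effective number of parameters
    return mean_deviance + p_d, p_d


# Toy numbers only; they are not taken from the paper's results.
dic_value, p_d = dic_from_deviances([10.0, 12.0, 14.0], 11.0)
print(dic_value, p_d)  # prints 13.0 1.0
```

A lower value indicates a better balance between fit and model complexity, which is the sense in which the models are compared here.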

In addition to improving performance, the main contribution of the proposed model is that it allows us to capture the perception error on travel time and analyze the characteristics of this error. Let us look at the extended model. As shown by Table 3, the estimate of is −1.022; this result implies that people tend to weaken the difference of in-vehicle travel time when making the choice decision. The sampling schemes also draw samples of for to . The latent parameter directly reflects the perception error on DIVTT of individual .
To further investigate the property of of the extended model, we group these samples according to the value of DCARS, which is related to the demography of an individual. We calculate the mean and variance of the samples of for each group. Figure 4 plots the mean of the samples, and Figure 5 shows the variances of the samples versus the value of DCARS. Since the values of DCARS in the dataset do not contain 6 and only one record has DCARS equal to 7, we study the case where the values of DCARS range from 1 to 5 to avoid bias.
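The grouping described above amounts to a split-and-summarize step. The sketch below assumes one car-ownership value and one perception error value per record, held in two parallel lists, and it uses the population variance; whether the original analysis used the population or the sample variance is not stated. All names and numbers are hypothetical.

```python
from collections import defaultdict
from statistics import mean, pvariance


def summarize_by_group(group_values, error_values):
    """Return {group: (mean, population variance)} of errors within each group."""
    buckets = defaultdict(list)
    for group, error in zip(group_values, error_values):
        buckets[group].append(error)
    return {g: (mean(v), pvariance(v)) for g, v in sorted(buckets.items())}


# Toy records only; they are not taken from the dataset used in this study.
cars = [1, 1, 2]
errors = [-1.0, -2.0, -0.5]
summary = summarize_by_group(cars, errors)
print(summary)  # prints {1: (-1.5, 0.25), 2: (-0.5, 0.0)}
```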
One can see that the mean of the samples of moves closer to 0 as DCARS increases. The variance, in turn, decreases with the value of DCARS. This result implies that travelers who own more cars are more sensitive to the difference of travel time between public transport and private car. For example, the mean of for travelers who own 5 cars is −0.357; this value can change to −1.5623 for travelers who do not own any cars. As mentioned above, indicates that individual subjectively weakens the difference of travel time when he/she makes the choice decision; therefore, if , a smaller can reduce the contribution of the difference of travel time to the travel mode choice decision even more.
To investigate travelers’ perception error regarding both DIVTT and DOVTT, we also use Algorithm 4 to estimate the model defined by (20). Table 4 shows that the DIC of the model is significantly smaller than that of the standard logistic model defined by (1). In the model defined by (3), travelers’ perception error regarding travel time can be characterized by the estimate of . On the other hand, in the model defined by (20), travelers’ perception error regarding travel time should be characterized by the sum of the estimates of and . Looking at Table 4, one can see that the sum of the estimates of and is −1.011. This value is close to the estimate of (−1.022) in the model defined by (3).

5. Conclusions
This study proposes a logistic regression model with a hierarchical random error term to analyze the binary choice problem. The proposed model can account for travelers’ perception errors regarding attributes. Since a number of studies have shown perception error regarding travel time to have a significant impact on modal choice, this study focuses in particular on how to capture this error from behavior data.
We construct a hierarchical random error term structure in the logistic regression model through introducing a random error term for the travel time variable. In the proposed model, travel time is no longer a deterministic variable. To make the model more sensible, we also propose an extended model, where the sign of the perception error on the DIVTT and DOVTT variables can be constrained. We develop a Gibbs sampler to estimate the basic hierarchical model and a Gibbs sampler with an MH step to estimate the parameters of the extended model. The binary choice dataset provided by Horowitz [13] is used to examine model performance and demonstrate the proposed model’s benefit. The numerical example shows that the proposed model can estimate the perception error regarding travel time and that the samples of and can be used to further analyze the characteristics of this error. Although this study focuses on the perception error regarding travel time, the proposed methodology can also be used to investigate travelers’ perception error regarding travel cost, among others.
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
Acknowledgments
This work is jointly supported by the National Basic Research Program of China (no. 2012CB725403), the Fundamental Research Funds for the Central Universities (no. 2014JBM056), and the National Natural Science Foundation of China (no. 51408035).
References
[1] R. G. McGillivray, “Demand and choice models of modal split,” Journal of Transport Economics and Policy, vol. 4, no. 2, pp. 192–207, 1970.
[2] M. Gebeyehu and S. Takano, “Diagnostic evaluation of public transportation mode choice in Addis Ababa,” Journal of Public Transportation, vol. 10, no. 4, pp. 27–50, 2007.
[3] D. B. Hess, “Access to public transit and its influence on ridership for older adults in two U.S. cities,” Journal of Transport and Land Use, vol. 2, no. 1, pp. 3–27, 2009.
[4] M. V. Johansson, T. Heldt, and P. Johansson, “The effects of attitudes and personality traits on mode choice,” Transportation Research Part A: Policy and Practice, vol. 40, no. 6, pp. 507–525, 2006.
[5] D. Muley, J. Bunker, and L. Ferreira, “Investigation into travel modes of TOD users: impacts of personal and transit characteristics,” International Journal of ITS Research, vol. 7, no. 1, pp. 3–13, 2009.
[6] H. M. Badland, N. Garrett, and G. M. Schofield, “How does car parking availability and public transport accessibility influence work-related travel behaviors?” Sustainability, vol. 2, no. 2, pp. 576–590, 2010.
[7] R. Buehler and J. Pucher, “Demand for public transport in Germany and the USA: an analysis of rider characteristics,” Transport Reviews, vol. 32, no. 5, pp. 541–567, 2012.
[8] C. Carrion, Travel time perception errors: causes and consequences [Ph.D. thesis], The University of Minnesota, Minneapolis, Minn, USA, 2013.
[9] Y.-H. Cheng and Y.-C. Tsai, “Train delay and perceived-wait time: passengers' perspective,” Transport Reviews, vol. 34, no. 6, pp. 710–729, 2014.
[10] A. Chen, Z. Zhou, and W. H. K. Lam, “Modeling stochastic perception error in the mean-excess traffic equilibrium model,” Transportation Research Part B: Methodological, vol. 45, no. 10, pp. 1619–1640, 2011.
[11] N. Y. Tilahun, D. M. Levinson, and K. J. Krizek, “Trails, lanes, or traffic: valuing bicycle facilities with an adaptive stated preference survey,” Transportation Research Part A: Policy and Practice, vol. 41, no. 4, pp. 287–301, 2007.
[12] C. Czado and S. Prokopenko, “Modelling transport mode decisions using hierarchical logistic regression models with spatial and cluster effects,” Statistical Modelling, vol. 8, no. 4, pp. 315–345, 2008.
[13] J. L. Horowitz, “Semiparametric estimation of a work-trip mode choice model,” Journal of Econometrics, vol. 58, no. 1–2, pp. 49–70, 1993.
[14] J. L. Horowitz and N. E. Savin, “Binary response models: logits, probits and semiparametrics,” Journal of Economic Perspectives, vol. 15, no. 4, pp. 43–56, 2001.
[15] L. A. Stefanski, “A normal scale mixture representation of the logistic distribution,” Statistics & Probability Letters, vol. 11, no. 1, pp. 69–70, 1991.
[16] C. C. Holmes and L. Held, “Bayesian auxiliary variable models for binary and multinomial regression,” Bayesian Analysis, vol. 1, no. 1, pp. 145–168, 2006.
[17] J. H. Albert and S. Chib, “Bayesian analysis of binary and polychotomous response data,” Journal of the American Statistical Association, vol. 88, no. 422, pp. 669–679, 1993.
Copyright
Copyright © 2015 Chong Wei et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.