#### Abstract

With the rapid development of urbanization and motorization, urban commuting problems are becoming increasingly serious due to the unbalanced distribution of residence and workplace land-use types in most Chinese cities. To explore the inherent interrelations among residence location, workplace, and commute trip, an integrated model framework of joint residence-workplace location choice and commute behavior is put forward based on the personal trip survey data of Beijing in 2005. First, to extract households’ different choice characteristics, this paper presents a latent class model, clusters all households into several groups, and analyzes the conditional probability of each group. Second, the paper integrates the residence location and workplace together as the joint choice alternative, employs the socioeconomic factors, individual attributes, household attributes, and trip characteristics as explanatory variables, and formulates the joint residence-workplace location choice model using the mixed logit method. Estimations of the latent class model show that four latent groups fit the data best. Further results of the joint residence-workplace location choice model indicate that choice characteristics differ significantly across the latent groups. Generally, the integrated model framework outperforms traditional location choice methods.

#### 1. Introduction

In most Chinese cities, with the rapid development of urbanization and motorization, the density of urban land-use is increasing very fast, and the spatial distribution of residence locations and workplaces is becoming unbalanced. As a result, urban transportation systems, and commute trips in particular, are facing more and more serious problems.

During the past two decades, integrated models of urban land-use and transportation systems have been studied extensively, especially the residential location choice models using decision behavior approaches. As a competitive tool, the discrete choice model has been used widely in location choice models. Lerman (1976) [1] introduced household car ownership, housing type, and mode to work to the residential location choice and formulated a logit model. Freedman and Kern (1997) [2] studied both workplace and residence locations in two-earner households. To make the model closer to reality, McFadden (1978) [3], Boots and Kanaroglou (1988) [4], Gabriel and Rosenthal (1989) [5], Waddell (1993) [6], Abraham and Hunt (1997) [7], Ben-Akiva and Bowman (1998) [8], Deng et al. (2003) [9], Hunt et al. (2004) [10], Bhat and Guo (2004) [11], Waddell et al. (2007) [12], Jiao and Harata (2007) [13], Vega and Reynolds-Feighan (2009) [14], and Li et al. (2010) [15] presented different models of household residence location or workplace choices, as well as household members’ activity and travel schedules. All the above work made use of discrete choice methods. According to the modeling techniques, existing studies can be classified into the following four categories: multinomial logit model (MNL), nested logit model (NL), generalized extreme value model (GEV), and mixed logit model (ML). The MNL model is the most widespread in the pioneering work due to its simplicity, for instance, Gabriel and Rosenthal (1989) [5]; however, it assumes the independence of irrelevant alternatives (IIA). The NL model allows for correlations between the alternatives in each “nest”; however, the alternatives in different “nests” are still independent, for example, Boots and Kanaroglou (1988) [4], Abraham and Hunt (1997) [7], Deng et al. (2003) [9], and Hunt et al. (2004) [10]. The GEV model was first developed by McFadden (1978) [3] and was applied to the job-housing balance study by Vega and Reynolds-Feighan (2009) [14] due to its flexible structure and correlated alternatives. The ML model is the most flexible in structure, and most existing studies have used it to incorporate the random taste variations of different households, as well as the spatial correlations among different land locations.

More recently, Li et al. (2014) [16] presented a multiobjective optimization model of residential distribution using operations research method; Ibeas et al. (2013) [17] and Jiao et al. [18] also proposed some location choice models to formulate spatial interactions between residence location and workplace. Furthermore, Jiao et al. [18] combined the residence location and workplace together as the choice alternative and put forward a joint residence-workplace location choice model. The case study also indicated the advantages of the joint choice model.

With some models focusing on location choice of specific household types (such as two-earner or single worker households), most of the above works have analyzed the location choice behavior using data of all kinds of households. However, there should exist different choice characteristics for different households. Jiao and Harata (2007) [13] clustered households into several groups according to their life styles and analyzed the residential location choice behavior of each group. Nevertheless, the simple clustering analysis method is too straightforward to extract the inherent characteristics of different households.

Fortunately, the latent class model (LCM) has been used widely in category analysis. Lazarsfeld (1950) [19] first formulated a latent class model. Goodman (1974) [20] developed an algorithm to estimate the model parameters and made the model more applicable in practice. Haberman (1979) [21] analyzed the relations between the latent class model and the log-linear model. Vermunt (1997) [22] proposed a general latent class model for categorical data analysis with discrete latent variables. The LCM has been widely used in research on health behavior (Lanza and Rhoades, 2013 [23]) and social interactions (Harvey and Taylor, 2000 [24], Arentze and Timmermans, 2008 [25]), but it has rarely been applied in analyses of urban land-use and transportation systems.

To analyze the different residence location and workplace choice characteristics according to household types, one key feature of this paper is to formulate a latent class model and to extract the inherent household groups. Another key feature is to further combine the residence location and workplace together as the choice alternatives and present the joint residence-workplace location choice models for each latent class using mixed logit methods.

The rest of this paper is organized as follows. The general model framework is proposed in Section 2, including both the latent class model and the mixed logit model. The latent class model for household clustering is formulated in Section 3. The integrated model of joint residence-workplace location choice is put forward in Section 4 based on the combined choice alternatives. Both models are estimated using the personal trip survey data of Beijing in 2005, and the estimation results are reported and analyzed in Section 5. Conclusions and potential future research are summarized in Section 6.

#### 2. General Model Framework

The theory of the LCM is based on probability distribution principles and log-linear models, with the objective of explaining the interrelations among manifest variables using the fewest latent categories while achieving local independence. The LCM is mainly used to analyze categorical data. Compared with continuous variables, the biggest difference of categorical variables is that their values are discrete, with each value denoting a different attribute or classification, for instance, gender, residence location, trip mode, and so forth.

The mixed logit (ML) model is a kind of discrete choice model. By assuming that the parameters follow random distributions, it is capable of incorporating the random taste variations of different households, as well as the spatial correlations among different land locations. Therefore, the ML model is widely used in location choice research.

##### 2.1. Structure of Latent Class Model

The LCM is a kind of model to transform the probabilities of categorical variables to some parameters, that is, probabilistic parameterization. There are two kinds of categorical variables in classical LCM: manifest variable and latent variable. Meanwhile, there are two groups of parameters: latent class probability and conditional probability.

The manifest variable can be observed directly, for example, time, distance, and so on. It is also called an observable variable or measurable variable. The latent variable, however, cannot be observed directly, for instance, psychological expectation, individual preference, and so forth.

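A latent class model can be formulated as

$$\pi_{ijk}^{ABC} = \sum_{t=1}^{T} \pi_{t}^{X}\, \pi_{it}^{\bar{A}X}\, \pi_{jt}^{\bar{B}X}\, \pi_{kt}^{\bar{C}X}, \tag{1}$$

where $\pi_{ijk}^{ABC}$ is the joint probability of the LCM; $A$, $B$, and $C$ denote three manifest variables; $\pi_{t}^{X}$ is the latent class probability, which means the probability of latent variable $X$ in class $t$, $t = 1, 2, \ldots, T$; $\pi_{it}^{\bar{A}X}$, $\pi_{jt}^{\bar{B}X}$, and $\pi_{kt}^{\bar{C}X}$ are three conditional probabilities; $\pi_{it}^{\bar{A}X}$ shows the conditional probability that manifest variable $A$ is in level $i$ given latent class $t$: that is, $\pi_{it}^{\bar{A}X} = P(A = i \mid X = t)$, $i = 1, 2, \ldots, I$; $\pi_{jt}^{\bar{B}X}$ indicates the conditional probability that manifest variable $B$ is in level $j$ given latent class $t$: that is, $\pi_{jt}^{\bar{B}X} = P(B = j \mid X = t)$, $j = 1, 2, \ldots, J$; $\pi_{kt}^{\bar{C}X}$ denotes the conditional probability that manifest variable $C$ is in level $k$ given latent class $t$: that is, $\pi_{kt}^{\bar{C}X} = P(C = k \mid X = t)$, $k = 1, 2, \ldots, K$; $P(\cdot)$ is a probability function.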
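The following minimal Python sketch illustrates the structure just described: the joint probability of a cell is the sum, over latent classes, of the latent class probability times the conditional probabilities of the observed levels. The function name `lcm_joint_probability` and all numerical values are hypothetical and chosen only for illustration; they are not estimates from the survey data.

```python
from itertools import product


def lcm_joint_probability(class_probs, cond_probs, levels):
    """Joint probability of one cell under local independence.

    class_probs[t]         : latent class probability of class t
    cond_probs[v][t][lvl]  : probability that manifest variable v takes
                             level lvl given latent class t
    levels                 : observed level index of each manifest variable
    """
    total = 0.0
    for t, p_class in enumerate(class_probs):
        term = p_class
        for v, level in enumerate(levels):
            term *= cond_probs[v][t][level]
        total += term
    return total


# Hypothetical example: two latent classes, three binary manifest variables.
class_probs = [0.6, 0.4]
cond_probs = [
    [[0.9, 0.1], [0.2, 0.8]],  # first manifest variable
    [[0.7, 0.3], [0.4, 0.6]],  # second manifest variable
    [[0.5, 0.5], [0.1, 0.9]],  # third manifest variable
]

# Joint probability of every cell; the cells should sum to one.
joint = {
    cell: lcm_joint_probability(class_probs, cond_probs, cell)
    for cell in product(range(2), repeat=3)
}
total_mass = sum(joint.values())
```

Because each set of conditional probabilities sums to one within a class, the cell probabilities in `joint` form a proper distribution, which is a quick sanity check on any set of estimated parameters.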

##### 2.2. Structure of Mixed Logit Model

Similar to the MNL, NL, and GEV models, the mixed logit model is also based on the assumption of random utility maximization. With a rather flexible structure, it has the following main advantages: the model has no IIA property; the error term of the utility function can follow any random distribution, which removes the constraint of the Gumbel distribution in the logit model or the normal distribution in the probit model; the estimated parameters follow some kind of random distribution, which incorporates the taste variations of different decision makers.

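Similar to our previous work [18], using $U_{ni}$ to denote the utility function for decision maker $n$ to select alternative $i$, it can be divided into two items: the systematic item $V_{ni}$ and the random item $\varepsilon_{ni}$; that is,

$$U_{ni} = V_{ni} + \varepsilon_{ni}. \tag{2}$$

To incorporate random taste variations in the model, $V_{ni}$ is further formulated as below:

$$V_{ni} = \sum_{k=1}^{K} \alpha_{k} x_{nik} + \sum_{l=1}^{L} \beta_{l} z_{nil}, \tag{3}$$

where $\alpha_{k}$ and $\beta_{l}$ are two kinds of parameters to be estimated; $\alpha_{k}$ is the fixed parameter, just like in the MNL model; $\beta_{l}$ is the unfixed parameter following some random distribution to incorporate the random taste variations; $x_{nik}$ and $z_{nil}$ are explanatory variables corresponding to $\alpha_{k}$ and $\beta_{l}$, respectively; $K$ is the total number of variables corresponding to $\alpha_{k}$; $L$ is the total number of variables corresponding to $\beta_{l}$.

Based on the fundamental theory of discrete choice analysis, the mixed logit model can be formulated as

$$L_{ni}(\beta) = \frac{\exp(V_{ni})}{\sum_{j=1}^{J} \exp(V_{nj})}, \tag{4}$$

where $L_{ni}(\beta)$ is the probability for decision maker $n$ to select alternative $i$, conditional on the unfixed parameters $\beta$, and $J$ is the total number of alternatives.

Therefore, the unconditional probability for decision maker $n$ to select alternative $i$ can be further formulated as

$$P_{ni} = \int L_{ni}(\beta)\, f(\beta)\, d\beta, \tag{5}$$

where $P_{ni}$ is the unconditional probability for decision maker $n$ to select alternative $i$ and $f(\beta)$ is the density function which the unfixed parameter $\beta$ follows.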
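Because the unconditional probability is an integral of the logit formula over the density of the unfixed parameters, it generally has no closed form and is approximated by averaging over random draws. The sketch below shows this idea in plain Python. The function names, the single fixed and single random coefficient, and all numerical values are hypothetical assumptions made for illustration; the normal distribution used for the draws is likewise only a placeholder.

```python
import math
import random


def logit_probs(utilities):
    """Logit choice probabilities for one decision maker (numerically stable)."""
    top = max(utilities)
    exps = [math.exp(u - top) for u in utilities]
    total = sum(exps)
    return [e / total for e in exps]


def simulated_mixed_logit(x_fixed, z_random, alpha, draw_beta, n_draws=500, seed=0):
    """Approximate the unconditional choice probabilities by simulation.

    x_fixed[j]  : variables of alternative j with fixed coefficients alpha
    z_random[j] : variables of alternative j with random coefficients
    draw_beta   : callable taking a random.Random and returning one draw
                  of the random coefficients (a list)
    """
    rng = random.Random(seed)
    n_alt = len(x_fixed)
    averaged = [0.0] * n_alt
    for _ in range(n_draws):
        beta = draw_beta(rng)
        utilities = [
            sum(a * x for a, x in zip(alpha, x_fixed[j]))
            + sum(b * z for b, z in zip(beta, z_random[j]))
            for j in range(n_alt)
        ]
        # Average the conditional logit probabilities over the draws.
        for j, p in enumerate(logit_probs(utilities)):
            averaged[j] += p / n_draws
    return averaged


# Hypothetical data: three alternatives, one fixed and one random coefficient.
x_fixed = [[1.0], [2.0], [3.0]]
z_random = [[2.0], [5.0], [9.0]]
alpha = [-0.3]

probs = simulated_mixed_logit(
    x_fixed, z_random, alpha, lambda rng: [rng.gauss(-0.2, 0.1)]
)
# With a degenerate (constant) coefficient the result collapses to plain logit.
degenerate = simulated_mixed_logit(x_fixed, z_random, alpha, lambda rng: [-0.2])
```

When the random coefficient has zero variance, as in `degenerate`, the simulated probabilities coincide with those of an ordinary multinomial logit model, which makes the relation between the two models concrete.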

#### 3. Latent Class Model for Household Clustering

To explore the inherent characteristics of urban residence location choice and workplace choice, this paper further formulates the latent class model for commute trips based on the personal trip survey data of Beijing in 2005. The study area is divided into eight zones according to the urban districts: Xicheng, Dongcheng, Chongwen, Xuanwu, Haidian, Chaoyang, Fengtai, and Shijingshan. Based on the thorough analyses of influence factors of residence location and workplace choices, we introduce the following five variables into the LCM: residence location, workplace, commute distance, commute mode, and household monthly income. Here residence location, workplace, and commute mode are discrete variables; however, commute distance and household monthly income are continuous in nature. For convenience, these two continuous variables are also discretized and transformed to categorical variables. For the important commute mode, we mainly select five modes, that is, walk, bicycle, bus, subway, and car.

Variables in the latent class model are summarized in Table 1.

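Based on the variables in Table 1, the latent class model is formulated as

$$\pi_{ijklm}^{ABCDE} = \sum_{t=1}^{T} \pi_{t}^{X}\, \pi_{it}^{\bar{A}X}\, \pi_{jt}^{\bar{B}X}\, \pi_{kt}^{\bar{C}X}\, \pi_{lt}^{\bar{D}X}\, \pi_{mt}^{\bar{E}X}, \tag{6}$$

where $X$ is the latent variable; $T$ is the number of latent classes; $\pi_{t}^{X}$ denotes the latent class probability; $A$, $B$, $C$, $D$, and $E$ show the manifest variables; $i$, $j$, $k$, $l$, and $m$ are levels of manifest variables, respectively; $\pi_{it}^{\bar{A}X}$, $\pi_{jt}^{\bar{B}X}$, $\pi_{kt}^{\bar{C}X}$, $\pi_{lt}^{\bar{D}X}$, and $\pi_{mt}^{\bar{E}X}$ are conditional probabilities; $\pi_{it}^{\bar{A}X}$ is the conditional probability that manifest variable $A$ is in level $i$ given latent class $t$: that is, $\pi_{it}^{\bar{A}X} = P(A = i \mid X = t)$; $\pi_{jt}^{\bar{B}X}$ is the conditional probability that manifest variable $B$ is in level $j$ given latent class $t$: that is, $\pi_{jt}^{\bar{B}X} = P(B = j \mid X = t)$; $\pi_{kt}^{\bar{C}X}$ is the conditional probability that manifest variable $C$ is in level $k$ given latent class $t$: that is, $\pi_{kt}^{\bar{C}X} = P(C = k \mid X = t)$; $\pi_{lt}^{\bar{D}X}$ is the conditional probability that manifest variable $D$ is in level $l$ given latent class $t$: that is, $\pi_{lt}^{\bar{D}X} = P(D = l \mid X = t)$; $\pi_{mt}^{\bar{E}X}$ is the conditional probability that manifest variable $E$ is in level $m$ given latent class $t$: that is, $\pi_{mt}^{\bar{E}X} = P(E = m \mid X = t)$; $\pi_{ijklm}^{ABCDE}$ is the joint probability of the LCM.

Based on the above symbols and equations, we can further formulate the conditional probability of latent variable $X$ as

$$\pi_{t \mid ijklm}^{X \mid ABCDE} = \frac{\pi_{t}^{X}\, \pi_{it}^{\bar{A}X}\, \pi_{jt}^{\bar{B}X}\, \pi_{kt}^{\bar{C}X}\, \pi_{lt}^{\bar{D}X}\, \pi_{mt}^{\bar{E}X}}{\pi_{ijklm}^{ABCDE}}. \tag{7}$$

Using (7), we can obtain the conditional probability of the latent variable. Every observation is assigned to the latent class with the largest value of $\pi_{t \mid ijklm}^{X \mid ABCDE}$. Therefore, for urban residence location and workplace choice problems, households are clustered into several groups using the above LCM method, and each group has different choice characteristics.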
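The assignment rule can be sketched in a few lines of Python: compute, for each latent class, the class probability times the conditional probabilities of the observed levels, normalize over classes, and pick the class with the largest posterior probability. The function names and the two-class, three-variable parameter values below are hypothetical and serve only as an illustration, not as the estimates of this paper.

```python
def posterior_class_probs(class_probs, cond_probs, levels):
    """Posterior probability of each latent class given the observed levels."""
    weights = []
    for t, p_class in enumerate(class_probs):
        w = p_class
        for v, level in enumerate(levels):
            w *= cond_probs[v][t][level]
        weights.append(w)
    total = sum(weights)  # joint probability of the observed cell
    return [w / total for w in weights]


def assign_class(class_probs, cond_probs, levels):
    """Index of the latent class with the largest posterior probability."""
    post = posterior_class_probs(class_probs, cond_probs, levels)
    return max(range(len(post)), key=post.__getitem__)


# Hypothetical parameters: two latent classes, three binary manifest variables.
class_probs = [0.6, 0.4]
cond_probs = [
    [[0.9, 0.1], [0.2, 0.8]],
    [[0.7, 0.3], [0.4, 0.6]],
    [[0.5, 0.5], [0.1, 0.9]],
]

post_000 = posterior_class_probs(class_probs, cond_probs, (0, 0, 0))
class_000 = assign_class(class_probs, cond_probs, (0, 0, 0))
class_111 = assign_class(class_probs, cond_probs, (1, 1, 1))
```

In this toy example an observation at the first level of every variable is assigned to the first class, while an observation at the second level of every variable is assigned to the second class.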

#### 4. Integrated Model of Joint Residence-Workplace Location Choice

Based on the clustered household groups from the LCM, we can further formulate residence location and workplace choice models for each kind of household using mixed logit method, just like our previous work [18]. Explanatory variables are summarized in Table 2, including house renting price, commute distance, commute time, household monthly income, population density, number of employment opportunities, and GDP of workplace.

In this model, we assume that households make residence location and workplace choices simultaneously, that is, the joint location choice. Since there are eight zones in the study area, totally we have 64 choice alternatives; that is, each residence-workplace location pair denotes one alternative.
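As a small illustration of how the joint choice set is built, the sketch below enumerates every residence-workplace pair over the eight zones of the study area. The variable names and the ordering of the pairs are assumptions made for this example.

```python
from itertools import product

# The eight zones of the study area.
ZONES = [
    "Xicheng", "Dongcheng", "Chongwen", "Xuanwu",
    "Haidian", "Chaoyang", "Fengtai", "Shijingshan",
]

# Each alternative is an ordered (residence, workplace) pair.
alternatives = list(product(ZONES, ZONES))

# Map each pair to an integer index usable in a choice model.
alt_index = {pair: k for k, pair in enumerate(alternatives)}

n_alternatives = len(alternatives)
```

Pairs with identical residence and workplace zones, such as `("Haidian", "Haidian")`, are kept as valid alternatives, since many households live and work in the same zone.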

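Based on many estimation experiments, commute distance and commute time are assumed to correspond to the unfixed parameters $\beta$. Here $\beta$ is assumed to follow a logarithmic normal distribution. Therefore, the integrated model of joint residence-workplace location choice is presented as

$$P_{ni} = \int \frac{\exp(V_{ni})}{\sum_{j=1}^{J} \exp(V_{nj})}\, f(\beta)\, d\beta, \tag{8}$$

where $J$ is the total number of choice alternatives, $f(\beta)$ follows the logarithmic normal distribution in (9), and definitions of other variables are the same as those stated before:

$$f(\beta) = \frac{1}{\beta \sigma \sqrt{2\pi}} \exp\left(-\frac{(\ln \beta - \mu)^{2}}{2\sigma^{2}}\right), \quad \beta > 0, \tag{9}$$

where $\beta$ is the random variable and $\mu$ and $\sigma^{2}$ are the expectation and variance of $\ln \beta$, respectively.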
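The logarithmic normal assumption can be illustrated with a short Python sketch of its density and of how draws are generated by exponentiating normal draws. The function names and the values of the location and scale parameters are hypothetical. How a strictly positive draw is combined with the negative effect of commute distance and commute time (for example, through the sign of the variable) is not specified here and is left outside the sketch.

```python
import math
import random


def lognormal_pdf(x, mu, sigma):
    """Density of a lognormal variable whose logarithm is N(mu, sigma^2)."""
    if x <= 0.0:
        return 0.0
    z = (math.log(x) - mu) / sigma
    return math.exp(-0.5 * z * z) / (x * sigma * math.sqrt(2.0 * math.pi))


def draw_lognormal(rng, mu, sigma):
    """One lognormal draw, obtained by exponentiating a normal draw."""
    return math.exp(rng.gauss(mu, sigma))


# Hypothetical parameter values for illustration only.
mu, sigma = -1.0, 0.5
rng = random.Random(42)
draws = [draw_lognormal(rng, mu, sigma) for _ in range(20000)]

# The mean of the logged draws should be close to mu.
mean_log = sum(math.log(d) for d in draws) / len(draws)
density_at_one = lognormal_pdf(1.0, 0.0, 1.0)
```

A function such as `draw_lognormal` could serve as the source of random coefficients when the choice probabilities are approximated by simulation.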

Using (8) and (9), we can formulate every household group’s joint residence-workplace location choice behavior.

#### 5. Estimation Results

Based on the personal trip survey data of Beijing, we can estimate the integrated model of joint residence-workplace location choice and commute behavior.

##### 5.1. Estimation Results of Latent Class Model

Parameter estimation of the latent class model is usually implemented using two kinds of iterative algorithms based on the maximum likelihood method: the expectation maximization algorithm and the Newton-Raphson algorithm. The iterative process consists of two steps: the first step maximizes the likelihood from a starting point, which is taken as the initial estimate in the algorithm, and the second step re-estimates the parameters from the result of the first step; the process is repeated until the accuracy requirement is met.
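As an illustration of the iterative idea, the sketch below implements a basic expectation maximization loop for a latent class model in plain Python. It is a minimal sketch under stated assumptions, not the estimation code used in this study: the function name `fit_lcm_em`, the random starting values, the fixed number of iterations in place of a convergence test, and the toy data are all hypothetical, and the Newton-Raphson alternative is not shown.

```python
import math
import random


def fit_lcm_em(data, n_levels, n_classes, n_iter=100, seed=0):
    """Fit a latent class model by EM.

    data     : list of tuples, one level index per manifest variable
    n_levels : number of levels of each manifest variable
    Returns (class_probs, cond_probs, log_lik); log_lik is evaluated at the
    parameters entering the final iteration.
    """
    rng = random.Random(seed)
    n_obs, n_vars = len(data), len(n_levels)
    class_probs = [1.0 / n_classes] * n_classes
    cond = []  # cond[v][t][level], random positive starting values
    for v in range(n_vars):
        per_class = []
        for _ in range(n_classes):
            raw = [rng.random() + 0.5 for _ in range(n_levels[v])]
            per_class.append([r / sum(raw) for r in raw])
        cond.append(per_class)

    log_lik = float("-inf")
    for _ in range(n_iter):
        # E-step: posterior class probabilities of every observation.
        resp, log_lik = [], 0.0
        for row in data:
            weights = []
            for t in range(n_classes):
                w = class_probs[t]
                for v, level in enumerate(row):
                    w *= cond[v][t][level]
                weights.append(w)
            total = sum(weights)
            log_lik += math.log(total)
            resp.append([w / total for w in weights])
        # M-step: update parameters from the expected class memberships.
        class_mass = [sum(r[t] for r in resp) for t in range(n_classes)]
        class_probs = [m / n_obs for m in class_mass]
        for v in range(n_vars):
            for t in range(n_classes):
                counts = [0.0] * n_levels[v]
                for row, r in zip(data, resp):
                    counts[row[v]] += r[t]
                cond[v][t] = [c / class_mass[t] for c in counts]
    return class_probs, cond, log_lik


# Hypothetical toy data: three binary variables with two noisy clusters.
data = (
    [(0, 0, 0)] * 30 + [(1, 1, 1)] * 30
    + [(0, 1, 0)] * 5 + [(1, 0, 1)] * 5 + [(1, 0, 0)] * 5
    + [(0, 1, 1)] * 5 + [(0, 0, 1)] * 5 + [(1, 1, 0)] * 5
)
class_probs, cond_probs, log_lik = fit_lcm_em(data, [2, 2, 2], n_classes=2)
```

In practice the loop would stop once the change in log-likelihood falls below a tolerance, and several random starts would be compared, because EM may converge to a local maximum.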

To obtain the LCM with the best fit, this paper makes use of the maximum likelihood method for estimation. The estimation process begins by fitting a complete independence model with $T = 1$ and then increases the number of latent classes one by one until an appropriate $T$ is achieved. The likelihood ratio chi-square value increases as the differences between observed values and expected values increase. The Akaike information criterion (AIC) and Bayesian information criterion (BIC) are employed to evaluate the model and to determine the appropriate number of latent classes $T$. For the LCM, the fit improves as AIC and BIC decrease.
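The selection procedure can be sketched as follows. The sketch assumes the log-likelihood forms of the two criteria, AIC = −2 ln L + 2p and BIC = −2 ln L + p ln N, where p is the number of free parameters and N the number of observations; some latent class software instead reports versions based on the likelihood ratio chi-square, and which form was used here is not stated. All function names and numerical values are hypothetical.

```python
import math


def lcm_free_parameters(n_levels, n_classes):
    """Free parameters of an unrestricted latent class model."""
    # (T - 1) class probabilities plus, per class, (levels - 1) per variable.
    return (n_classes - 1) + n_classes * sum(k - 1 for k in n_levels)


def aic(log_lik, n_params):
    return -2.0 * log_lik + 2.0 * n_params


def bic(log_lik, n_params, n_obs):
    return -2.0 * log_lik + n_params * math.log(n_obs)


def select_n_classes(log_liks, n_levels, n_obs):
    """Return the number of classes with the lowest BIC.

    log_liks maps a candidate number of classes to its maximized
    log-likelihood.
    """
    scores = {
        t: bic(ll, lcm_free_parameters(n_levels, t), n_obs)
        for t, ll in log_liks.items()
    }
    return min(scores, key=scores.get)


# Hypothetical log-likelihoods for models with 1, 2, and 3 latent classes.
log_liks = {1: -1000.0, 2: -900.0, 3: -895.0}
n_levels = [3, 3, 3, 3]
best_t = select_n_classes(log_liks, n_levels, n_obs=500)
```

In this made-up example the third class improves the log-likelihood only slightly, so the penalty on the extra parameters leads BIC to prefer two classes.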

Based on several estimation experiments, the fit criteria of the proposed LCM are summarized in Table 3.

From Table 3, one can see that the model fits best when $T = 4$; that is, we obtain 4 latent classes in total.

The detailed parameter estimation results are further summarized in Table 4.

From Table 4, we can find out the following results.

*Class 1*. Households mainly reside in Haidian and Chaoyang districts and work in the same zones. Their commute distances are rather long, with 55.3% within 8 kilometers and 29.6% between 8 and 16 kilometers. The frequently used commute modes are bicycle, bus, and car. Household income is rather high. Furthermore, according to the latent class probability, this latent class accounts for the largest proportion of households.

*Class 2*. Households mainly reside in Xicheng and Dongcheng districts. Some of them work in zones close to home, while others tend to work in Haidian and Chaoyang districts. The possible reasons are that there are many scientific, technological, and cultural zones in Haidian district, and the central business district (CBD) is located in Chaoyang district. These households have higher income than class 1, and their commute distances are also longer than those of the first household group. Meanwhile, the most frequently used commute mode is car, and the second one is bus.

*Class 3*. Most households reside in Dongcheng and Chaoyang districts and work in the same area; that is, these households mainly live and work in the east area of Beijing. Therefore, the commute distance is the shortest, with 87.7% within 8 kilometers. Meanwhile, the frequently used commute modes are walk and bicycle, which are most appropriate for short distance trips. Almost no household uses subway; however, the percentage of car is about 20%. The possible reasons are that there were few subway lines in operation when the survey was carried out, and some people tend to use car due to rather high income.

*Class 4*. Most households reside in Fengtai and Shijingshan districts and work in the same zones. Besides, some households tend to live and work in Chongwen and Xuanwu districts. The percentage is rather small, because there are a large number of historical and cultural protection areas in these two districts, which are not very appropriate for residence. Commute distances of this latent class are rather short, with 85.5% within 8 kilometers. Therefore, corresponding to the distance, the most frequently used commute mode is the bicycle, covering 52.4%. Furthermore, since the economic level of Fengtai and Shijingshan districts is not very high, household income of this group is the lowest among all latent classes.

From the above analyses, we can further find out that the differences among four latent classes are distinct. Moreover, the clustered results from the LCM method are much more logical than those from traditional simple cluster analysis methods [13].

##### 5.2. Estimation Results of Mixed Logit Model

The joint residence-workplace location choice model based on mixed logit is estimated using the maximum simulated likelihood (MSL) method, which was shown to be rather effective and efficient by Bhat and Guo (2004) [11]. Furthermore, to implement the MSL algorithm, we integrate randomly scrambled Halton draws [26] into the estimation algorithm. Similar to our previous work, we also code the algorithm on the GAUSS platform.
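To give a feel for the draws used in simulation, the sketch below generates a standard (unscrambled) Halton sequence and transforms it into standard normal draws with the inverse normal distribution function. It is written in Python rather than GAUSS, it omits the random scrambling step, and the function names and the `skip` parameter are assumptions for this illustration.

```python
from statistics import NormalDist


def halton(index, base):
    """Element `index` (starting at 1) of the Halton sequence in `base`."""
    result, fraction = 0.0, 1.0
    i = index
    while i > 0:
        fraction /= base
        result += fraction * (i % base)
        i //= base
    return result


def halton_normal_draws(n_draws, base=2, skip=10):
    """Standard normal draws from Halton points.

    The first `skip` elements are discarded, a common practice to reduce
    correlation between sequences built on different bases.
    """
    std_normal = NormalDist()
    return [
        std_normal.inv_cdf(halton(k, base))
        for k in range(skip + 1, skip + n_draws + 1)
    ]


first_points = [halton(k, 2) for k in (1, 2, 3)]
draws = halton_normal_draws(100, base=3)
```

Compared with pseudorandom draws, Halton points cover the unit interval more evenly, so fewer draws are typically needed for a given simulation accuracy; a different prime base would be used for each random coefficient.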

Using the above method, the joint residence-workplace location choice model is estimated under the assumption that all households choose residence location and workplace simultaneously; that is, these two kinds of land-use types influence each other. The estimated mean values of all parameters are reported in Table 5, with the $t$-statistics shown in parentheses to indicate the significance of the explanatory variables. As stated before, for the unfixed parameters of commute distance and commute time, estimates of both their mean values and standard deviations are reported. Furthermore, to compare different household groups, the estimates for the four latent classes are summarized in four separate columns.

From Table 5 we can see that all estimated parameters have the expected signs and significance, which generally supports the effectiveness of the integrated model of joint residence-workplace location choice and commute behavior.

For all latent classes, we can get the following results from the signs of parameters.

(1) The expected negative sign of house renting price shows that, with other conditions fixed, households tend to live in areas with rather low housing prices.

(2) Both commute distance and commute time between residence location and workplace have negative signs as expected, which indicates that households tend toward job-housing balance when they consider their residence location and workplace choices; that is, proximity to workplace is very important for households choosing a residence location, and, at the same time, proximity to residence location is very important for households choosing a workplace.

(3) The positive sign of household monthly income is also consistent with expectation, which means that households are more likely to reside or work in places which could bring them higher income.

(4) The sign of number of employment opportunities is positive, showing that job opportunities are a rather important factor influencing households’ residence location and workplace choices. It means that people tend to live and work in locations with more opportunities.

(5) GDP of workplace has the expected positive sign, indicating that households are more inclined to work in places with a good economic environment.

(6) Interestingly, there is an exception in population density; that is, in latent class 1, the parameter is positive, while in latent classes 2, 3, and 4, the parameters are negative. Characteristics of each latent class could explain this exception. In class 1, most households live in Haidian and Chaoyang districts, which are two rather big zones with many residential land-uses, but the residence density is not very high; therefore, households tend to locate in places with high population density, which is also a reflection of the population clustering effect. Conversely, in classes 2, 3, and 4, most households live in the other six districts of Beijing, which are rather small areas with very high residence density; therefore, households in these three classes tend to reside in areas with low population density, which reflects that low residence density and a comfortable community environment are more important for these people.

Further comparisons of the estimations among the four latent classes reveal the following results.

(1) For all 4 latent classes, the magnitude of the household monthly income parameter is much bigger than that of other parameters. It indicates that this factor is much more important than other factors for household residence location and workplace choices. Moreover, house renting price also has a rather big magnitude, showing that housing price is also a very important factor in location choices.

(2) For latent class 1, house renting price, population density, and GDP of workplace have much bigger magnitudes, showing that these three factors are more important for households in Haidian and Chaoyang districts to make their residence location and workplace choices.

(3) For latent class 2, much bigger magnitudes in population density and GDP of workplace again show their importance. As stated before, the sign of population density is negative, because most households in class 2 reside in Xicheng and Dongcheng districts, which are located in the central city of Beijing, with very high residence density. Therefore, different from class 1, these households tend to live in areas with low population density and a comfortable environment.

(4) For latent class 3, commute distance has a rather big magnitude, which means that people give more weight to commute distance when they make residence location and workplace choices. Results from the latent class analyses give the reason; that is, most households in this group walk or cycle in commute trips, and these two modes are more sensitive to trip distance.

(5) For latent class 4, the magnitude of number of employment opportunities is clearly bigger than others. Once again, the reason can be found in the latent class analyses. Most households in this group reside and work in Fengtai and Shijingshan districts, which are relatively underdeveloped economically. There are fewer employment opportunities in these two districts than in the other six zones, and income is also rather low. Therefore, households in this group pay more attention to the number of employment opportunities in residence location and workplace decisions.

Generally, all the estimation results are consistent with expectations. The detailed analyses based on latent classes reveal many interesting and logical results.

#### 6. Conclusions

This paper presents an integrated model of joint residence-workplace location choice and commute behavior using latent class and mixed logit methods. The general model framework consists of two single models. We first present a latent class model to extract households’ different choice characteristics and cluster households into several groups. Based on the latent class analyses, we further combine the residence location and workplace together as the joint choice alternative and formulate a joint residence-workplace location choice model using the mixed logit method. A large amount of data is extracted from the personal trip survey data of Beijing in 2005 for the case study. Estimation results of the latent class model show that households are properly clustered into four groups, and every kind of household has different characteristics. The mixed logit models for all four latent classes are then estimated separately using the maximum simulated likelihood method. The estimated parameters are all consistent with expectations. For all latent classes, household monthly income and housing price are very important for residence location and workplace choices. Further comparisons of the estimated parameters among the four latent classes show that there exist substantial differences in the location choice behaviors, and the joint residence-workplace location choice model using latent class and mixed logit methods is very effective.

Future research is directed towards the following aspects. The first is to employ more recent socioeconomic data, census data, and trip survey data and to update the case studies of this research. The second is to further explore the differences among different decision makers, for instance, male, female, and children in the same household, and to analyze more detailed choice behaviors. The third is to track the development histories of residential and employment land-uses based on panel data.

#### Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

#### Acknowledgments

This research has been supported by National Natural Science Foundation of China Project (51208024), Beijing Philosophy and Social Science Project (14CSC014), Beijing Nova Programme (Z151100000315050), Science and Technology Project of Ministry of Housing and Urban-Rural Development of China (2013-K5-6), and the Importation and Development of High-Caliber Talents Project of Beijing Municipal Institutions (CIT&TCD201404071).