#### Abstract

This paper developed a mixed multinomial probit (MMNP) model with alternative error specification and random coefficients (for both generic variables and personal attributes) to accommodate flexible covariance structure and taste variation. The MMNP model can be efficiently estimated with analytic approximations of multivariate normal cumulative distribution functions, which avoid defects of simulation-based integration in the mixed multinomial logit (MMNL) model. The integral dimension of the MMNL model increases as random coefficients increase, but it only depends on the number of available alternatives in the MMNP model. Simulation experiments and empirical analysis of Shanghai commuters’ mode choice behavior were undertaken to examine the performance of MMNP models. Both simulation results and empirical results show that MMNP models can well accommodate flexible covariance structures and taste variation reflected through random coefficients being associated with both generic and personal variables. Empirical results indicate that the MMNP model performs better than traditional discrete choice models, such as the multinomial logit, the cross-nested logit, MMNL, and multinomial probit models. Random coefficients of “in-vehicle time of car” and “number of companions” indicate taste heterogeneity and the identifiability of random coefficients associated with both generic and personal attributes. Pairwise positive correlations between car/taxi, bus/metro, and bus/bus and metro are to be expected. However, the positive correlation between the car and metro modes may be unique to the Chinese city, Shanghai, because of the developed metro system. Unequal error variances reflect heterogeneities in unspecified factors in commute modes’ utilities. The MMNP model will offer an alternative efficient way to accommodate taste heterogeneity and flexible error covariance structure in discrete choice models. Compared with the MMNL model, the MMNP model can accommodate more random coefficients without increasing computational complexity.

#### 1. Introduction

In general, two methods are being applied to overcome the limitation of the IIA property in the multinomial logit (MNL) model by accommodating random taste variation and correlation among alternatives in a discrete choice model. The first method adopts the mixed multinomial logit (MMNL) model, which combines random coefficients varying across individuals to represent taste variation with random error components varying across alternatives to represent correlation among alternatives [1–4]. The second method is a more complex mixed generalized extreme value (GEV) model, such as using a nested logit (NL) model to represent correlation among alternatives and random coefficients to represent taste variation [5–7]. The flexibility of these two methods in representing correlation and random taste variation comes at a cost that choice probabilities can no longer be expressed in a closed-form function. Compared with the first method, the benefit of the second one is that it has fewer dimensions of integration. The disadvantage is that it has a more complex log-likelihood function. Garrow [8] compared these two methods using NL mixtures to approximate an NL model because, theoretically, mixture analogs can approximate any random utility model [9, 10]. The results show that the NL model cannot be well approximated by its mixture analog because it is difficult to obtain precise correlation estimates from mixture models, particularly for nests that have small correlations, and the accuracy of NL mixture coefficients is sensitive to the number of draws and number of random coefficients in the simulation. Empirical findings in Garrow [8] recommend adopting the second method to accommodate the correlation and random taste variation.

Although the second method, the mixed GEV models, has fewer dimensions of integration, the dimension of integration is the number of random coefficients in the model. With the increase of random coefficients, the integration will be more complicated with higher dimensions. Besides, GEV estimates are biased when negative correlations appear between choices [11]. However, modelers usually do not know the covariance structure in advance. Under this condition, the use of multinomial probit (MNP) models that can accommodate negative correlation will not cause the problem of estimation bias. In addition to negative correlations or covariances, an MNP model can also naturally accommodate unequal variances in its flexible error covariance structure. As well as accounting for random taste variation, this paper considers using a mixed multinomial probit (MMNP) model with random coefficients to represent random taste variation and its error terms to represent a flexible covariance structure among alternatives.

The MMNP model is rarely applied in the transportation field because of the difficulty to evaluate its likelihood function without a closed-form expression. Even if there are some applications, random error terms are usually assumed to be independently distributed [12, 13]. They only use correlated random coefficients to capture the correlation between alternatives. If random error terms are also correlated, will these correlations, as well as correlated random coefficients, be identifiable? Generally, in a mixed model, generic variables, which vary across alternatives, are assumed to take random coefficients. Are all the elements in a covariance-variance matrix of random coefficients for a set of generic variables (e.g., in-vehicle travel times across travel modes) identifiable? Can personal attributes, which do not vary across alternatives, take random coefficients? If so, how can the elements in a covariance-variance matrix of those random coefficients be identified? Some simulation experiments will be conducted to answer those questions in this paper.

The estimation of the MMNP model depends on the computation of a multivariate normal cumulative distribution (MVNCD) function. As the evaluation method of the MVNCD function improves, to avoid the above-mentioned defects of simulation techniques in mixed GEV models, several analytic approximation techniques have been developed [14]. Bhat [14] proposed four new analytic MVNCD approximation methods and compared them with other existing analytic approximation methods and simulation-based integration methods. Regardless of the dimensionality of the MVNCD function, the two-variate bivariate screening (TVBS) approach performs best in the ability to recover MNP parameters accurately and quickly. Therefore, we employed the TVBS analytic approximation evaluation of the MVNCD function to estimate the MMNP model for this study.

This paper aims to develop an MMNP model with correlated random coefficients (for both generic variables and personal attributes) and correlated error structure to accommodate a flexible covariance structure and taste variation, which can be efficiently estimated using analytic approximations of the MVNCD function. And the MMNP model will be applied to analyze commute mode choice behavior in Shanghai, China. The rest of this paper is organized as follows. The second section details the methodology of the MMNP model accommodating random taste variation and flexible covariance structures. In the third section, simulation experiments are undertaken to validate the identifiability when correlated random coefficients are associated with generic variables and personal characteristics. In the fourth section, an empirical study is conducted to explore Shanghai residents’ commute mode choice behavior. Discussions and conclusions will be presented in the last section.

#### 2. Modeling Methodology

In this section, the MMNP model will be formulated to accommodate correlated random coefficients and a flexible error covariance structure. For an -alternative MMNP model, the utility function of alternative (, where denotes the total number of alternatives) is given bywhere is a column vector of explanatory variables, and is a row vector of corresponding coefficients. and are the elements of vectors and , respectively. . is the random error term for an alternative . follows a multivariate normal distribution with a mean vector of zeros and variance-covariance matrix , or .

is assumed to be a normally distributed random coefficient with mean and standard deviation . That is, , and herein follows a standard normal distribution. Then, Equation (1) can be rewritten aswhere is the systemic utility of alternative , which is the total random utility combining the random error term and the random utility from the random coefficient . The covariance matrix of can be expressed as .

Since , mean values of and are zeros, the mean value of the total random utility is also zero, . Then, the variance-covariance matrix of can be calculated aswhere and , which are both column vectors. and are the transpose of and , respectively. “” represents the matrix multiplication, and “” represents Hadamard product or element-wise multiplication. Since (where ), the variance-covariance matrix of random coefficients can be expressed as .

Based on the principle of random utility maximum (RUM), the probability that an alternative is chosen can be expressed as

Let , with . Then, the probability in Equation (4) can be transformed intowhere is the probability density function of an -dimensional multivariate normal distribution (MVN).

. The covariance matrix of is calculated as , where is the variance-covariance matrix of as shown in Equation (3). And is an identity matrix of size with an extra column of −1 added as the column [15]. For example, if and . is the transpose of .

Therefore, the likelihood function of the maximum likelihood estimation procedure or probability of choosing an alternative can be expressed as an -dimensional multivariate normal cumulative distribution (MVNCD) function of as shown in Equation (5). With the development of analytic approximations of the MVNCD function, the two-variate bivariate screening (TVBS) approach is adopted to estimate the MMNP model in this paper. The TVBS approach can recover underlying parameters accurately and efficiently, regardless of the dimensionality involved in the MVNCD function. For more details of the TVBS approach, one can refer to Bhat [14].

Additionally, in an -alternative MMNP model, elements in the covariance matrix of correlated error terms can be identified [15]. If correlated random coefficients are associated with generic variables, all means and all elements in the variance-covariance matrix are identifiable. If correlated random coefficients are associated with a personal attribute that appears in utility functions of all the available alternatives, similar to an alternative-specific constant, only means of random coefficients for this personal attribute are identifiable. And the covariance matrix of random coefficients is similar to that of error terms in identifiability. That is, elements in the variance-covariance matrix of random coefficients can be identified. Simulation experiments will be conducted in the next section to demonstrate the number of identifiable elements in covariance matrices.

#### 3. Simulation Experiments

Correlations among alternatives may exist in both correlated error structure and correlated random coefficients. To verify the identifiability of parameters, two simulation experiment scenarios were designed when correlated random coefficients were associated with generic variables and personal characteristics. The estimation of a model with independent random coefficients across alternatives is a special case of that with correlated random coefficients. All the correlations can be fixed at 0. Random error terms are correlated across alternatives in both simulation experiments.

In simulation experiments, we assume that all alternatives are available for each individual. Explanatory variables follow IID uniform distributions between 0 and 10. Dependent variables are determined based on the RUM principle:

If is the maximum utility among all of the available alternatives for an individual, ; otherwise, .

##### 3.1. Experimental Design

###### 3.1.1. Correlated Random Coefficients Associated with Generic Variables

For a four-alternative MMNP model, we assumed that four random coefficients are associated with four independent generic variables , , , and . Utility functions of four alternatives are given as follows:where random coefficients are normally distributed with a mean vector and a standard deviation vector . Since random coefficients are assumed to be correlated across alternatives, , , , and are correlated and follow a standard multivariate normal distribution with a mean vector and a covariance matrix . Then, the variance-covariance matrix of is calculated as . Random error terms , , , and follow a standard multivariate normal distribution with a mean vector and covariance matrix .

All elements in the covariance matrix can be estimated in the MMNP model. Since only five elements in the error term covariance matrix can be identified in a four-alternative MMNP model, we fixed all the diagonal elements at 1 and the correlation . Thus, parameters to be estimated include all values, all values, all elements in the covariance matrix , and five covariance elements in the covariance matrix . and are set as negative to validate the ability of the MMNP model in accommodating negative correlation.

###### 3.1.2. Correlated Random Coefficients Associated with Personal Attributes

Generally, in a mixed model, random coefficients are assumed to be associated with generic variables, such as travel time, cost, or other level-of-service variables of different travel modes. The identifiability when random coefficients are associated with personal attributes has not been explored in previous studies. In this subsection, random coefficients of the same personal attribute are assumed to be correlated across alternatives and vary across individuals and alternatives. A three-alternative MMNP model with utility functions is considered as follows:where random coefficients follow normal distributions with a mean vector and a standard deviation vector . Since random coefficients are correlated across alternatives, and follow a standard multivariate normal distribution with a mean vector and a covariance matrix . Then, the variance-covariance matrix of is calculated as . Random error terms , , and follow a standard multivariate normal distribution with a mean vector and a covariance matrix .

In this case, random coefficients are associated with the same explanatory variable in utility functions of three alternatives. With consideration of the identification problem, one of the mean values of random coefficients (, or ) should be fixed as a reference, similar to an alternative-specific constant. The covariance matrix of random coefficients is similar to that of error terms in identifiability. In a three-alternative MMNP model, only two elements in the covariance matrix of error terms can be identified. Therefore, all parameters to be estimated in this scenario are all values, two of values ( was fixed), one covariance element and one standard deviation in the covariance matrix , and two covariance elements ( and ) in the covariance matrix of error terms . and are set as negative to validate the ability of the MMNP model in accommodating negative correlation.

##### 3.2. Simulation Results

Simulation results of the MMNP models with correlated error structure and correlated random coefficients associated with generic variables and personal attributes are shown in Tables 1 and 2. Variances in both covariance matrices of error terms are assumed to be identical in the two experiments. Parameters are also consistently estimated for heteroscedastic error terms. For brevity, only the results of homoscedastic error terms are presented here.

According to significant t-statistics at the 0.01 level shown in Tables 1 and 2, all identifiable parameters can be estimated and are seemingly consistent with their true values, indicating that correlations among alternatives can be accommodated in both error structure and random coefficients for generic variables. In empirical studies, if correlations from only one source are considered, it may lead to bias in model estimation. Results in Table 2 show that, in addition to generic variables, personal attributes can also be specified with random coefficients. In real data, respondents may have different preferences for the same generic variables (such as travel time and cost of different travel modes), and individuals with different characteristics also have different preferences for alternative modes. And the MMNP model can well accommodate random taste variation in both generic variables and personal attributes through random coefficients. As significant t-statistics at the 0.01 level are shown in Tables 1 and 2, all negative covariance elements in covariance matrices of random error terms are correctly estimated, which indicates that the MMNP model can well accommodate various correlated error structures, even with negative correlations. It can perfectly avoid defects in previous MMNP models and mixed GEV models since error terms are assumed to be independent in previous MMNP models [12, 13], and negative correlations cannot be accommodated in the mixed GEV models [15], which may lead to bias in model estimation [11].

However, the sample size for ensuring the significance of all the identifiable parameters is great in MMNP models. For the four-alternative MMNP model with random coefficients associated with generic variables, the sample size is around 500,000 in our experiment to ensure the significance of all the critical coefficients. The sample size for the three-alternative MMNP model with random coefficients of personal attributes is around 5,000,000 to ensure significance. Both sample sizes are much larger than those in usual empirical studies. As per Walker [16], although model parameters are theoretically identifiable, there are issues of empirical identification, namely, whether the data can support the model specification. If the empirical data cannot allow for estimating all the theoretically identifiable parameters, some parameters may be fixed, and only parameters of particular interest need to be estimated.

To improve the efficiency of estimation and decrease the required sample size for statistically significant identification, , , , and in the random coefficients of the four-alternative MMNP model can be set as . That is, , , , and are linearly correlated with a correlation coefficient of 1. And they follow the standard normal distribution. Then, the required sample size for identification can be decreased to 5000. The simulation results with a sample size of 5000 observations are shown in Table 3.

#### 4. Empirical Study

To examine the performance of the MMNP model in accommodating random taste variation and correlations among alternatives, an application is presented in this section to analyze the commute mode choice behavior of Shanghai residents.

##### 4.1. Data and Sample Description

###### 4.1.1. Commute Trip Data

The commute trip data used in this study were derived from a web-based travel survey of Shanghai conducted in 2018. The survey collected detailed socioeconomic and demographic information, and a complete 24-hour travel diary reported by 2033 individuals. The detailed trip data included origin and destination locations, trip beginning and ending times, travel mode, trip purpose, and the number of companions. Six travel modes are considered in this study including car, taxi, metro, bus, a combination of bus and metro (hereinafter referred to as Bus and Metro), and the nonmotorized mode (bicycling and walking). After data screening to remove records with missing data for explanatory variables of interest and travel mode choice, the final sample comprised 1743 commute trips.

###### 4.1.2. Level-of-Service Data

The LOS characteristics of different travel modes were extracted from the zone-to-zone travel impedance matrices generated from transportation networks of Shanghai, which were integrated into TransCAD based on the GIS data of roads (links and nodes), bus lines and stops, metro lines and stations, and residential committees. Travel time calculated from floating car data was added to each link of the road network [17, 18]. Travel times, service frequencies, and fares by the time of day were applied to each trip OD pair in the sample.

###### 4.1.3. Sample Description

Table 4 provides a summary of the socioeconomic and demographic characteristics of the sample. Most commuters are 20–40 years old, married, and well educated, hold a driving license, and live with family. Personal monthly income mainly falls between 4.5k and 15k RMB Yuan. More than half (62.2%) of commuters have an available private car on the travel day. Origin TAZs (or residential areas) have greater population density and lower employment density than destination TAZs (or workplaces), as expected.

Descriptive statistics of LOS characteristics are shown in Table 5. The average trip distance of nonmotorized mode is 4.204 km, which indicates that nonmotorized mode can meet most of the short-distance commute demands. In terms of average fare, the taxi has the highest fare because Shanghai taxi has a high starting price. In terms of average in-vehicle time, car, taxi, and metro are all around 20 minutes, while the bus and Bus and Metro rise obviously to 37.999 and 38.747 minutes, respectively. In the Bus and Metro mode, the in-vehicle time of the metro is longer than that of the bus, indicating that the bus is a complementary mode to the metro. The average initial waiting time and transfer waiting time of the metro are shorter than those of bus and Bus and Metro, and they vary much less, meaning that the metro system provides a higher service frequency, and the service frequency distribution is more even among different lines. In terms of average access/egress distance, bus and Bus and Metro have substantially shorter distances than the metro because the bus has a higher station coverage rate. And metro’s average access distance is slightly longer than the average egress distance presumably due to the denser distribution of metro stations in workplaces than in residential areas. The Bus and Metro mode takes a slightly greater number of transfers than bus and metro, as expected.

##### 4.2. Empirical Results

In this section, the empirical results of models, including the MNL, CNL (cross-nested logit model, shown in Figure 1), MMNL, MNP, and MMNP models, are presented in Table 6. And then, the data fit of the MMNP model is compared with that of the MNL, CNL, MMNL, and MNP models.

The MMNP model can accommodate both random taste heterogeneity and flexible covariance structure among alternatives. Compared to the MMNP model, the MNL, the CNL, and the MNP models cannot accommodate random coefficients. Although the MMNL model can capture the taste variation across individuals by random coefficients, it cannot accommodate the variance heterogeneity of different travel modes. The following paragraph will focus on model results of the MMNP model because coefficients of explanatory variables have the same signs and similar values.

###### 4.2.1. Model Results of the MMNP Model

*(1) Random Coefficients.* In the empirical study, both random coefficients of a generic variable and a personal attribute appear to be significant, indicating that there are variations among commuters’ sensitivities to those variables.

As a generic variable, the in-vehicle time of the car has a mean coefficient of −2.662 and a standard deviation of 1.050. The negative mean value indicates that the utility of the car will decrease with the increase of in-vehicle time on average across individuals, as expected. The value of standard deviation is different from zero significantly, implying that random taste heterogeneity does exist among commuters in this LOS variable. And the value of standard deviation in this MMNP model is quite similar to that in MMNL models [19, 20].

“The number of companions” has a random coefficient with a mean value of 0.423 and a standard deviation of 0.828 in the utility function of the car mode. The positive mean value indicates that “the number of companions,” on average, has a positive impact on the choice of car because commuters can share the cost while enjoying the flexibility and mobility of the car, which is more cost-effective than driving alone. The standard deviation is significantly different from zero. “The number of companions” is a kind of personal attribute because the variable remains the same in all alternatives’ utility functions. Therefore, it verifies that the random coefficient of a personal attribute can also be estimated in an empirical study.

*(2) Error Covariance Structure.* As shown in Table 6, the CNL, the MNP, and the MMNP models all reveal positive correlations for the pairwise alternatives of car/taxi, car/metro, bus/metro, and bus/bus and metro. In the context of single choice models (that is, when only one alternative can be chosen), such positive correlations imply a higher level of substitutability between the corresponding pairs of modes relative to assuming zero correlation (as in the MNL).

Of the pairwise correlations, those between car/taxi, bus/metro, and bus/bus and metro are to be expected. Cars and taxis are characterized by their mobility and flexibility, and both belong to the category of a private transportation mode. Meanwhile, bus and metro are both public transportation modes and have many aspects in common. However, the positive correlation between the car and metro modes, even though less than the positive correlations between car/taxi and bus/metro, is particularly interesting because one is a private mode, while the other is a public transportation mode. This result, though perhaps likely to be unique to many Chinese cities, is probably because the Shanghai metro has been well developed and provides a convenient, high-frequency, and punctual service. Shanghai metro operates on a strict schedule, making it attractive to most commuters. The Shanghai metro operates at a speed (almost 30 km/h) even faster than the speed of road traffic during peak hours. In addition, it is difficult to find a temporary parking lot during peak hours, and it is also highly expensive to reserve a parking lot because of limited parking spaces around workplaces. This finding suggests that improvement in metro LOS can be effective in attracting commuters away from private cars to the metro and alleviate traffic congestion during peak hours.

The variance of the bus mode is also shown in Table 6. Due to the limited sample size, not all identifiable parameters appear to be significant. Variances of the other modes (car, taxi, metro, bus and metro, and nonmotorized modes) are fixed at 1 as a base. Consistent with the results in Bhat [21], the variance of the bus mode is greater than that of car, taxi, and metro, indicating that unobserved variables affecting the choice of bus have greater variances than those affecting car, taxi, and metro. For example, the comfort level is an unobserved variable whose values may vary considerably within bus modes because different kinds of bus services in Shanghai, such as customized buses (or demand-response transit), bus rapid transit (BRT), and regular buses, may have different occupancy rates. However, comfort varies little in the car, taxi, and metro. Therefore, the random component of the bus mode has greater variance than that of the other modes.

*(3) Level-of-Service Variables.* The in-vehicle time takes expected negative effects on the choice of car, taxi, metro, and bus. Taxi has the most negative coefficient value because the taxi fare ranks first among all the modes. Commuters can spend their in-vehicle time entertaining and do not have to focus on traffic conditions when they take public transit. As a result, coefficients of in-vehicle time for metro and bus are less than those for cars and taxis.

Access/egress distance reduces utilities of public transit (metro, bus, and Bus and Metro), indicating that longer access/egress distance will discourage commuters from choosing public transit. And the coefficient of access/egress distance for the metro is lower than that for bus, Bus and Metro. Because the bus has a higher station coverage rate than the metro, and the metro has a much more comfortable environment than the bus, commuters are more likely to accept the larger access/egress distance of the metro.

A negative coefficient of “the number of transfers” implies that the greater the number of transfers is, the less likely the commuters are to take the bus. Usually, the bus is not as punctual as the metro. The more the number of transfers is, the greater the probability that commuters will encounter bus delays. Besides, the transferring environment at public transit stations is crowded and uncomfortable. Commuters are usually anxiously waiting or in a hurry. Similarly, the coefficient of “transfer waiting time” is also negative for Bus and Metro.

While the nonmotorized mode mainly serves short-distance commute trips as shown in Table 5, “trip distance” takes a significantly negative coefficient, as expected.

*(4) Sociodemographic and Employment Variables*. Married commuters tend to drive a car because they usually need to drive their spouse to work or drive children to school during the commute trip. Commuting by car can provide maximum flexibility and convenience. Likewise, commuters living with families also prefer to use a car. “Personal monthly income” has positive coefficients in utilities of car, taxi, and metro because high-income commuters tend to use high-quality, but high-cost services. With the emerging car-sharing services, having a driving license becomes the only requirement for commuters to drive a car as the commute mode. Therefore, having a driving license will increase the likelihood of commuting by car. Compared with using a car-sharing service, “having available private cars on the trip day” will encourage commuters to drive a car to commute because commuters do not have to walk to car-sharing stations from home and return the car to designated stations. On the other hand, “having available private cars on the trip day” will decrease the likelihood of taking public transit. “Having available private bikes on the trip day” provides a more convenient opportunity for commuters to commute by the nonmotorized mode as evidenced by a positive coefficient. Higher employment densities of workplaces indicate lower parking facility densities and higher parking fees. Commuters working in these areas will be less likely to drive to work, and the likelihood of taking a taxi, bus, and metro will increase.

###### 4.2.2. Comparisons of Data Fit with the MNL, CNL, MMNL, and MNP Models

To evaluate the data fit of the MMNP model, other four models were also estimated: a basic six-alternative MNL model, a CNL model with nesting structure shown in Figure 1, an MMNL model with random coefficients associated with the same variables (in-vehicle time of the car, and the number of companions) as in the MMNP model, and an MNP model with the same correlated error structure as in the MMNP model.

To compare the model fitting performance of the MMNP model with those of the other four models, multiple goodness-of-fit measures are calculated, including , adjusted , Akaike Information Criteria (AIC), and Bayesian Information Criterion (BIC):where is the log-likelihood value of the model without any explanatory variables. is the log-likelihood value at convergence.where is the number of parameters estimated in the model, and is the number of observations.

Further, to compare the logit-based models with the MMNP model using a statistical test, a nonnested test statistic is applied (see [22–24]). The adjusted log-likelihood ratio statistic of each model is first computed:

The nonnested test statistic iswhere represents the CDF of standard normality, and takes a positive value. That is, the probability that could have occurred by chance is no larger than . A small value for the nonnested test statistic indicates a small probability of erroneously selecting the incorrect model. And the model with a higher value for the adjusted likelihood statistic is preferred. The nonnested test is then used to compare the MMNP model with the other four models.

Based on , adjusted , the model with the highest value performs best. And the model with the smallest values of AIC and BIC has the best performance. As shown in Table 7, all the measures indicate the same result that the MMNP model performs better than the other four models with the highest values of and adjusted , and the smallest AIC and BIC values. It indicates that the MMNP can not only accommodate random taste variation and flexible covariance structures among alternatives, but also perform better than the traditional discrete choice models. From the informal nonnested likelihood statistic values provided in Table 7, it can be inferred that the probability of erroneously selecting the MMNP model is literally zero.

#### 5. Discussion and Conclusion

This paper developed an MMNP model with correlated error structure and random coefficients for both generic variables and personal attributes to accommodate flexible covariance structure and random taste variation. And the MMNP model can be efficiently estimated with analytic approximations of the multivariate normal cumulative distribution (MVNCD) function proposed by Bhat [14]. Both simulation experiments and empirical analysis of Shanghai commuters’ mode choice behavior are carried out to examine the performance of the MMNP model.

Compared to the MMNP model, the MNL, the CNL, and the MNP model cannot accommodate random coefficients. Although the MMNL model can capture the taste variation across individuals by random coefficients, it cannot accommodate the variance heterogeneity of different alternatives. Simulation results show that the MMNP model can accommodate a flexible covariance structure, even with negative correlations that cannot be accommodated in mixed GEV models. However, the neglect of negative correlation in mixed GEV models may lead to estimation bias. Correlations among alternatives can exist in both error structure and random coefficients for generic variables, while most previous applications of MMNP models assumed that random error terms were independent of each other, which may also lead to bias in model estimation. In addition to correlated random coefficients for generic variables, correlated random coefficients for personal attributes can also be significantly identified.

Empirical results are consistent with simulation results in the fact that random coefficients can exist in both generic variables and personal attributes in MMNP models. As a generic variable, the in-vehicle time of the car has a negative mean coefficient and a standard deviation being significantly different from zero. “The number of companions” remains the same in all alternatives’ utility functions and can be regarded as a kind of personal attribute. The random coefficient of “the number of companions” has a positive mean and a standard deviation being significantly different from zero. Simulation results show that all theoretically identifiable elements in variance-covariance matrices of correlated random coefficients can be estimated with a large enough sample size, while in empirical studies, the real data may not allow for estimating all these parameters due to limited sample size or little variation in explanatory variables. In these cases, some parameters may be fixed, and only parameters of interest can be estimated. The variance and correlation structures from the MMNP model provide useful insights. A greater variance for bus than metro indicates much higher variability in the quality of bus-related equipment, bus stop environment, and bus service relative to the more streamlined and less variable metro rail service. Interestingly, whenever correlations were allowed in the models, these turned out to be positive. Of the pairwise positive correlations, those between car/taxi, bus/metro, and bus/bus and metro are to be expected. However, the positive correlation between the car and metro modes, even though less than the positive correlations between car/taxi, bus/metro, and bus/bus and metro, is particularly interesting because one is a private mode, while the other is a public transportation mode. But this result, though perhaps likely to be unique to many Chinese cities, suggests that improvement in the metro level of service can be effective in attracting commuters from private cars to the metro and alleviate traffic congestion during peak hours.

Empirical results also show that the MMNP model has a better model fit than traditional discrete choice models, such as the MNL, MMNL, and MNP models, evidenced by the highest values of and adjusted , and the smallest AIC and BIC values. And the nonnested test statistics show that the probability of erroneously selecting the MMNP model is literally zero. A model comparison could also be achieved with a synthetic population of larger size. Researchers and practitioners may not get access to large samples of available travel data in practice. To better test the performance of these models in the real world, this paper uses a small sample of real data.

Combined with analytic approximations of MVNCD functions, the MMNP model can be estimated more efficiently and quickly and avoid the high-dimensional simulation integration of using the traditional MMNL model or the mixed NL model with random coefficient to relax the response homogeneity assumption and the independence assumption of the MNL model. In summary, the MMNP model can well accommodate random taste variation and flexible covariance structure in both simulation experiments and empirical study and can be efficiently estimated by employing analytic approximations of MVNCD functions. The developed MMNP model will offer researchers and practitioners another simple and efficient way to accommodate taste heterogeneity and flexible covariance structure. In this paper, random coefficients are assumed to be normally distributed, but the possibility to develop an MMNP model based on different distributional assumptions may be explored in future research.

#### Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

#### Conflicts of Interest

The authors declare that they have no conflicts of interest.

#### Authors’ Contributions

The authors confirm their contribution to the paper as follows. Xin Ye and Ke Wang conceptualized and designed the study; Ke Wang collected the data; Ke Wang and Xin Ye interpreted and analyzed the results; Ke Wang, Xin Ye, and Hongcheng Gan prepared the draft manuscript. All authors reviewed the results and approved the final version of the manuscript.

#### Acknowledgments

This research was partially supported by the general project “Study on the Mechanism of Travel Pattern Reconstruction in Mobile Internet Environment” (no. 71671129) and “The Investigation of the Mechanism for the Impact of 'Auto/P + R′ Multi-Modal Traveler Information on Mode Choice” (no. 71871143) and the key project “Research on the Theories for Modernization of Urban Transport Governance” (no. 71734004) from the National Natural Science Foundation of China.