Abstract

This study further explores multinomial probit-based integrated choice and latent variable (ICLV) models. The LDLT matrix-based analytic approximation methods, including the Mendell–Elston (ME) method, the bivariate ME (BME) method, and the two-variate bivariate screening (TVBS) method, were adapted to calculate the multivariate cumulative normal distribution (MVNCD) function in the ICLV model because of their superior accuracy and computational time. Integrated with the composite marginal likelihood (CML) estimation approach, the ICLV model, which requires high-dimensional integration, can be estimated accurately within a reasonable time. In this study, three-alternative and four-alternative ICLV models are simulated to examine their ability to recover model parameters. The parameter estimates and standard error estimates are found to be acceptable for both models, and the computational time is expected to decrease when tensor data structures are used on the TensorFlow platform. For the four-alternative ICLV models, the TVBS method achieves the highest accuracy, while the BME method is a good alternative to TVBS when computational time is of great concern. The application of the automatic differentiation (AD) technique frees researchers from coding analytical gradients of log-likelihood functions and thereby greatly reduces their workload.

1. Introduction

In a discrete choice model (DCM), the choice depends not only on the transport characteristics and the observable characteristics of decision-makers but also on unobservable characteristics, namely latent variables. Among these, psychological and attitudinal factors, as parts of the unobservable latent variables, have been shown to be significant when integrated into DCMs [1].

One way to incorporate latent variables into discrete choice models is the integrated choice and latent variable (ICLV) model [2, 3], which can be viewed as a variant of the traditional structural equation model (SEM) with the advantage of representing the underlying choice process in terms of psychological concepts. The ICLV model has been a popular research topic in transport choice studies since the early 2000s, and its application has been increasing in recent years [37]. Earlier ICLV models were generally developed on the basis of the mixed logit model, whose error terms follow independent and identical Gumbel distributions. These logit kernel-based ICLV models ignore the correlation among the utilities of alternatives and among the error terms of latent variables. They also have several deficiencies regarding the estimation method (i.e., the maximum simulated likelihood (MSL) approach [8, 9]), the convergence of model estimation with multiple latent variables [7], and the computational time of multidimensional integration. To address these limitations, Bhat and Dubey [10] first proposed a multinomial probit (MNP) kernel-based ICLV formulation, which can accommodate a large number of latent variables and improves convergence and computational efficiency. In particular, the maximum approximate composite marginal likelihood (MACML) inference approach is used to estimate the model parameters, which further simplifies estimation. Because the integral dimension of the log-likelihood function is independent of the number of latent variables and relates only to the number of alternatives in the choice model, the computational time of this method is relatively short. Besides, ordered and continuous response indicators for the latent variables can easily be incorporated into the model.
Up to now, the new MNP-based ICLV model has not been extensively applied [11, 12] and holds broad promise for practice.

A critical step in estimating the proposed ICLV model is computing the multivariate cumulative normal distribution (MVNCD) function, and this step can be further improved. In earlier studies, simulation techniques were used to evaluate MVNCD functions, most commonly the Geweke–Hajivassiliou–Keane (GHK) simulator [13]. However, because simulation carries a large time cost at a given level of accuracy, it can be replaced by analytic approximation. Mendell and Elston (ME) proposed a univariate conditioning approach early on [14], which researchers then improved in a series of studies. The probit-based ICLV model above adopts the MACML approach of Bhat [15], which combines the Switzer–Solow–Joe (SSJ) approximation for the MVNCD function with the CML inference approach for MNP models. Here, we intend to apply a newer LDLT matrix-based analytic approximation method proposed by Bhat in 2018 [16] to evaluate the MVNCD function, which has been demonstrated to outperform the SSJ approximation. The method can effectively reduce the difficulties in convergence and covariance matrix computation that occur routinely in the maximum-likelihood estimation of choice models with analytically intractable likelihood functions. Meanwhile, applying the LDLT decomposition within the algorithms helps ensure that there is no substantial speed reduction in each iteration.

In the process of maximum-likelihood estimation (MLE), analytical gradients generally need to be programmed to improve the speed and accuracy at convergence. However, since deriving and coding analytical gradients is usually complicated and time-consuming for complex models, introducing automatic differentiation (AD) into the estimation process yields gradients without coding effort, which significantly reduces the workload of researchers. The AD method exploits the fact that every computer program, no matter how complicated, executes a sequence of elementary arithmetic operations and elementary functions. The MLE procedure usually adopts the reverse mode of AD, first published by Linnainmaa in 1976 [17]. Compared with numerical differentiation, it has obvious advantages in calculating higher-order derivatives and partial derivatives of functions with many inputs. The theory of AD has developed considerably over the past decades; nevertheless, it has so far seen limited application in econometrics and statistics [18, 19], which provides opportunities for further investigation.
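To make the reverse-mode mechanics concrete, the following minimal sketch (plain Python, not a real AD library) builds a tiny expression graph and back-propagates adjoints from the output to the inputs; all class and function names are illustrative.

```python
import math

class Var:
    """A node in a tiny reverse-mode AD graph (illustrative sketch only)."""
    def __init__(self, value, parents=()):
        self.value = value
        self.parents = parents   # list of (parent_node, local_gradient)
        self.grad = 0.0

    def __mul__(self, other):
        return Var(self.value * other.value,
                   [(self, other.value), (other, self.value)])

    def __add__(self, other):
        return Var(self.value + other.value, [(self, 1.0), (other, 1.0)])

def v_log(x):
    return Var(math.log(x.value), [(x, 1.0 / x.value)])

def backward(output):
    """Reverse accumulation: propagate adjoints from the output to the inputs."""
    output.grad = 1.0
    stack = [output]
    while stack:
        node = stack.pop()
        for parent, local_grad in node.parents:
            parent.grad += node.grad * local_grad
            stack.append(parent)

# f(x, y) = log(x*y + y); df/dx = y/(x*y + y), df/dy = (x + 1)/(x*y + y)
x, y = Var(2.0), Var(3.0)
f = v_log(x * y + y)
backward(f)
```

The same idea, applied automatically to an entire log-likelihood program, is what the reverse mode of AD provides without any hand-derived gradient code.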

Furthermore, the graph computation technique in the TensorFlow platform, based on tensors represented as multidimensional arrays, can avoid the loop operations over two-dimensional matrices used in conventional computational platforms. TensorFlow should therefore have an advantage in computational speed over conventional platforms when the LDLT-based analytic approximation method is applied across a sample. On the one hand, graph computation over tensors ensures that the entire LDLT-based analytic approximation runs at the speed of compiled C code, rather than at the speed of scripting-language code that manipulates matrices repeatedly inside loops. On the other hand, in MLE procedures using matrix-based analytic approximations, a higher-dimensional tensor is desirable because each individual observation in the sample requires a matrix-based operation for its log-likelihood computation, and this computation must be repeated as many times as the sample size. A computational platform with a tensor data structure can thus avoid repeated matrix operations in a large loop and further improve computational speed. In this study, we explore the feasibility of MNP-based ICLV model estimation using LDLT-based analytic approximation and tensor data structures on the TensorFlow platform.
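The speedup argument can be illustrated with a small NumPy sketch: the per-observation quadratic forms that a matrix-based platform would compute in a scripting-level loop collapse into a single batched tensor contraction (the array shapes and values here are arbitrary illustrations, not the model's actual computation).

```python
import numpy as np

rng = np.random.default_rng(0)
Q, K = 1000, 4                      # Q observations, K-dimensional terms
A = rng.standard_normal((Q, K, K))  # one K x K matrix per observation
x = rng.standard_normal((Q, K))     # one K-vector per observation

# Loop version: one 2-D matrix operation per observation.
loop_out = np.empty(Q)
for q in range(Q):
    loop_out[q] = x[q] @ A[q] @ x[q]

# Batched version: a single 3-D tensor contraction over the whole sample.
batch_out = np.einsum('qi,qij,qj->q', x, A, x)
```

Both versions produce identical results; the batched form is what a tensor-based platform executes in compiled code without a per-observation loop.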

The rest of this study is organized as follows. The second section elaborates on the MNP-based ICLV model and the LDLT-based analytic approximation method and then introduces the AD mechanism. The third section designs simulation experiments to validate the three-alternative and four-alternative ICLV models. The next section presents the evaluation results for the MVNCD function and the estimation results for the ICLV model under the different analytic approximation methods (ME, BME, and TVBS). Conclusions and discussion are summarized in the last section.

2. Methodology

2.1. MNP-Based ICLV Model

The MNP-based integrated choice and latent variable (ICLV) model proposed by Bhat and Dubey in 2014 [10] contains the following three components. Assume there are L latent variables, G ordinal indicator variables, I alternatives' utilities, and Q individuals in this model. To simplify the presentation, matrix notation is used in the formulas below and the individual-level index is suppressed.

2.1.1. Latent Variable Structural Equation Model

where each latent variable is written as a linear function of a vector of observed covariates (not including a constant) through a corresponding matrix of coefficients (i.e., the loadings of the exogenous variables on the latent variables), plus a vector of errors assumed to be standard multivariate normally distributed with a correlation matrix. Each error term is thus assumed to follow the standard normal distribution. In the structural equation model, the same exogenous variables can be used for different latent variables.
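As an illustration of this structural equation, the sketch below simulates latent variables as a linear function of covariates plus correlated multivariate normal errors; the names (`alpha`, `Gamma`, `w`, `z_star`) and all numeric values are assumptions for the example, not quantities from the study.

```python
import numpy as np

rng = np.random.default_rng(42)
Q, L, A_dim = 5000, 2, 3            # individuals, latent variables, covariates

alpha = np.array([[0.5, -0.3, 0.2],     # covariate loadings on latent var 1
                  [0.1,  0.4, -0.6]])   # covariate loadings on latent var 2
Gamma = np.array([[1.0, 0.5],
                  [0.5, 1.0]])          # error correlation matrix

w = rng.standard_normal((Q, A_dim))                   # observed covariates
eta = rng.multivariate_normal(np.zeros(L), Gamma, Q)  # correlated errors
z_star = w @ alpha.T + eta                            # simulated latent vars
```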

2.1.2. Latent Variable Measurement Equation Model

where the first three vectors represent the observed ordered indicator-dependent variables, the intercepts of the measurement equation, and the error terms of the measurement equation, respectively, and a matrix of coefficients captures the effects of the latent variables on the observed indicators. The correlation matrix of the measurement errors is left unrestricted except for its diagonal, which is normalized for identification, as in the usual ordered-response model. Each ordered indicator is partitioned into its ordinal categories by thresholds. Following Bhat (2014), one vector stacks the lower thresholds and another stacks the upper thresholds.

2.1.3. Choice Model (Multinomial Probit Model)

where the utility vector (of dimension I) is a linear function of a matrix of exogenous variables in the choice model and of a utility error vector that follows a multivariate normal distribution with variance-covariance matrix Λ. The equation also contains a matrix of variables interacting with the latent variables and a matrix of coefficients capturing the effects of the latent variables and their interactions with exogenous variables. In the case without interactions, the interaction matrix reduces to an identity matrix. Since only the covariance matrix of error differences is estimable, only a subset of the elements of Λ can be identified after normalizing the covariance matrix.

The MNP-based ICLV model framework can be presented in Figure 1.

In the process of model estimation, the structural equation (1) can be substituted into the right-hand sides of equations (2) and (3) to simplify the expressions. We assume that the three error vectors are independent of each other. The following system can be obtained:

Consider the vector that stacks the latent indicator responses and the alternative utilities. Its mean vector and covariance matrix are defined as follows:

Then, this stacked vector is multivariate normally distributed with the mean vector and covariance matrix defined above.

To estimate the model, we first define a transformation matrix of size (G + I − 1) × (G + I) to convert the choice model into utility-difference form. When an alternative is chosen, the submatrix formed by the first G rows and first G columns is an identity matrix, and the remaining block is an identity matrix of size I − 1 with an extra column of −1 inserted in the position of the chosen alternative. Premultiplying the stacked vector by this matrix reduces its dimension to G + I − 1.
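A minimal sketch of this difference transformation, under the block structure described above (the function name and example values are illustrative):

```python
import numpy as np

def make_M(G, I, chosen):
    """Utility-difference transform: identity on the G indicator rows, and an
    (I-1) x I block that subtracts the chosen alternative's utility."""
    M = np.zeros((G + I - 1, G + I))
    M[:G, :G] = np.eye(G)
    block = np.delete(np.eye(I), chosen, axis=0)  # drop the chosen alt's row
    block[:, chosen] = -1.0                       # subtract the chosen utility
    M[G:, G:] = block
    return M

# Example: G = 2 indicators, I = 3 alternatives, alternative 1 (0-based) chosen.
M = make_M(2, 3, 1)
y = np.array([0.3, -0.7, 1.0, 2.5, 0.4])   # [y*_1, y*_2, U_1, U_2, U_3]
out = M @ y                                # [y*_1, y*_2, U_1 - U_2, U_3 - U_2]
```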

To integrate the multivariate ordered probit model and the MNP model for parameter estimation, the upper and lower threshold vectors are extended with (I − 1)-dimensional columns of zeros and negative infinities, respectively. Thus, the likelihood function of the ICLV model may be written as follows, where the integration domain is defined by these thresholds and the integrand is the multivariate normal density function of dimension G + I − 1. The integral dimension of this likelihood function does not increase with the number of latent variables.

To proceed further, the composite marginal likelihood (CML) approach can be used to simplify model estimation when the full likelihood function is infeasible to evaluate. The CML estimator maximizes the compounded probability of all pairwise events, which yields much easier-to-compute, lower-dimensional marginal likelihoods. In the present case, the pairwise marginal likelihood function of the ICLV model may be written as follows, where the indices run over the ordinal outcome categories of the ordinal variables. A selection matrix (analogous to the difference matrix above) selects each pair of ordered variables together with all alternatives of the choice model. The resulting term is a multivariate standard normal cumulative distribution function of dimension E, whose correlation matrix is obtained by scaling the relevant covariance matrix by the diagonal matrix of its standard deviations. The function is expanded in detail as follows:

In equations (7) and (8), the CML function can be divided into two components: the first corresponds to each pair of ordinal indicators (for which the integral dimension of the MVNCD is 2), while the second corresponds to each pair of a choice outcome and an ordinal indicator outcome (for which the integral dimension equals I). Accordingly, the maximum integral dimension of the MVNCD function in equation (8) does not exceed I. Solving this multidimensional integral is the pivotal step in model estimation. The matrix-based analytic approximation method proposed by Bhat [16] can be applied to evaluate the MVNCD function, as detailed in a later section. Finally, accounting for the index q for individuals, the pairwise marginal log-likelihood function is defined as the sum of the logarithms of the individual pairwise likelihoods.
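One pairwise term of this CML function, for a pair of ordinal indicators, reduces to a rectangle probability of a bivariate standard normal. A sketch using SciPy's bivariate normal CDF (the threshold values and the correlation are illustrative, and ±8 stands in for ±∞ numerically):

```python
import numpy as np
from scipy.stats import multivariate_normal as mvn

def rect_prob(lo, hi, corr):
    """P(lo < (Z1, Z2) < hi) for a bivariate standard normal with correlation
    corr, by inclusion-exclusion over the four corners of the rectangle."""
    cov = np.array([[1.0, corr], [corr, 1.0]])
    cdf = lambda p: mvn.cdf(p, mean=[0.0, 0.0], cov=cov)
    return (cdf([hi[0], hi[1]]) - cdf([lo[0], hi[1]])
            - cdf([hi[0], lo[1]]) + cdf([lo[0], lo[1]]))

# One pairwise CML term: observed categories map to threshold intervals.
thresholds = [-8.0, 0.0, 8.0]       # one threshold -> two categories; +-8 ~ inf
cat_g, cat_h = 1, 0                 # observed categories of the indicator pair
lo = [thresholds[cat_g], thresholds[cat_h]]
hi = [thresholds[cat_g + 1], thresholds[cat_h + 1]]
ll_pair = np.log(rect_prob(lo, hi, corr=0.4))
```

Summing such log rectangle probabilities over all pairs gives the first component of the pairwise CML function.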

The pairwise estimator obtained by maximizing the logarithm of the pairwise marginal likelihood function with respect to the parameter vector is consistent and asymptotically normally distributed, with asymptotic mean equal to the true parameter vector and covariance matrix given by the inverse of Godambe's [20] sandwich information matrix [21], whose two component matrices can be estimated straightforwardly at the CML estimate:

and
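The assembly of the sandwich covariance can be sketched as follows; the per-individual scores and the Hessian here are synthetic placeholders, and only the algebra of the Godambe estimator is illustrated.

```python
import numpy as np

rng = np.random.default_rng(1)
Q, P = 2000, 3                      # individuals, parameters

# Per-individual score vectors of the pairwise log-likelihood at the CML
# estimate (synthetic stand-ins for the real gradients).
scores = rng.standard_normal((Q, P)) @ np.diag([1.0, 0.5, 2.0])
# "Bread": minus the average Hessian (synthetic positive-definite stand-in).
H_hat = np.eye(P) * 2.0
# "Filling": average outer product of the per-individual scores.
J_hat = scores.T @ scores / Q

H_inv = np.linalg.inv(H_hat)
G_inv = H_inv @ J_hat @ H_inv / Q   # sandwich covariance of the CML estimator
std_err = np.sqrt(np.diag(G_inv))   # asymptotic standard errors
```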

2.2. Matrix-Based Analytic Approximation Method

The multidimensional integrals of the ICLV model above are evaluated as MVNCD functions. Bhat's earlier ICLV model adopted the MACML approach of 2011 [15]. In this section, the newer matrix-based analytic approximation method proposed by Bhat in 2018 [16] replaces the MACML approach for evaluating the MVNCD function, in order to obtain better estimation results [22, 23].

Bhat (2018) mentions and summarizes five methods for approximating the MVNCD function. As per Bhat, the TVBS method is recommended as the most accurate evaluation approach for the MVNCD function, while the BME method has a significant speed advantage over the other approaches. Besides, the ME method, as the foundation of this series of analytic approximation methods, is also included in the evaluation in this study. These three typical methods (ME, BME, and TVBS) are therefore elaborated below. Conceptually, the ME method is based on univariate conditioning, while BME and TVBS are based on a bivariate conditioning mechanism.

Consider a multivariate normally distributed random vector with zero means, unit variances, and a given correlation matrix, where K (K > 2) is the dimensionality of the MVNCD function. The K-dimensional MVNCD function is the joint probability that each of the K components falls below its corresponding integration limit.

The joint probability of the MVNCD function under the three methods (ME, BME, and TVBS) may be expressed as follows. In each case, the joint probability is written as the product of a marginal probability and conditional probabilities, and a normal distribution is used to approximate the skew normal distribution arising in the conditional probabilities. In the equations below, a superscript n denotes the number of truncations already applied; the maximum number of truncations governs accuracy, with a lower maximum implying a more accurate analytic approximation. An intermediate parameter counts the truncation steps: it equals N for the ME method, while for the BME (or TVBS) method it depends on whether the dimension is even or odd. The expected values and variances after single-sided truncation in the following equations are obtained from the properties of the truncated multivariate normal distribution, which are not detailed here (refer to Section 2.1 of Bhat in 2018).

2.2.1. The ME Method

The ME method relies on a single-sided truncation of a multivariate normal distribution in which some variables are truncated while others are not. After a single variable is truncated, the remaining variables follow a skew normal distribution. The univariate conditional probability of each successive variable, given the truncations applied so far, is approximated by a univariate normal distribution. The MVNCD value can thus be transformed into a product of recursively computed univariate normal cumulative distribution function (CDF) values, as in equation (11).

For instance, the joint probability of the multivariate normal distribution can be written as the product of a first, exact marginal probability and a sequence of conditional probabilities, where each remaining variable follows a skew normal distribution after the preceding truncations. After each truncation, the expected values and variances of the remaining variables are updated using the properties of the truncated normal distribution. Similarly, the conditional probabilities in equation (11) are calculated by iteratively updating the expected value and covariance matrix. In the example above, the maximum number of truncations is 5.
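A compact implementation of the ME univariate-conditioning recursion, using the standard formulas for the mean and variance of an upper-truncated normal (this is a sketch of the basic scheme, without the LDLT bookkeeping or variable reordering used in Bhat's implementation):

```python
import math
import numpy as np

def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def norm_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def mvncd_me(b, R):
    """Mendell-Elston approximation of P(X_1 < b_1, ..., X_K < b_K) for
    X ~ N(0, R): truncate one variable at a time and treat the remaining
    variables as normal with moments updated by regression on it."""
    mu = np.zeros(len(b))
    S = np.array(R, dtype=float)
    prob = 1.0
    for k in range(len(b)):
        sd = math.sqrt(S[k, k])
        a = (b[k] - mu[k]) / sd
        p = norm_cdf(a)
        prob *= p
        if k == len(b) - 1:
            break
        lam = norm_pdf(a) / p                          # inverse Mills ratio
        mean_t = mu[k] - sd * lam                      # E[X_k | X_k < b_k]
        var_t = S[k, k] * (1.0 + a * lam - lam * lam)  # Var[X_k | X_k < b_k]
        c = S[:, k] / S[k, k]                          # regression coefficients
        mu = mu + c * (mean_t - mu[k])
        S = S + np.outer(c, c) * (var_t - S[k, k])
    return prob

# Example: a 3-dimensional probability with modest correlations.
R3 = np.array([[1.0, 0.3, 0.2],
               [0.3, 1.0, 0.4],
               [0.2, 0.4, 1.0]])
p3 = mvncd_me([0.5, 0.3, -0.2], R3)
```

With an identity correlation matrix the recursion reproduces the exact product of univariate CDFs, and for the bivariate orthant probability with correlation 0.5 it is accurate to roughly three decimal places.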

2.2.2. The Bivariate ME (BME) Method

When is even,

When is odd,

The BME method follows a bivariate truncation scheme, assuming that the marginal distribution of the third and fourth untruncated variables, given that the first two are truncated, remains bivariate normal, and so on. As in the ME method, a bivariate normal distribution is used to approximate the bivariate skew normal distribution. The MVNCD function is thus approximated by a product of bivariate normal CDF values when the dimension K is even, with one additional univariate normal CDF factor when K is odd. The correlation coefficients between the error terms are updated after each bivariate truncation, and the calculation of the expected values and variances follows the same principle as in the ME method. As an extension of the ME method, the BME method truncates two dimensions at a time (as does the TVBS method), as shown in equations (12) and (13).

Similarly, in the example above, the BME approximation is the product of an exact bivariate marginal probability, a bivariate factor after one truncation, and a bivariate factor after two truncations. The maximum number of truncations is 2, which is smaller than that of the ME method. This indicates that the bivariate truncation scheme is more accurate than the univariate truncation scheme.

2.2.3. The Two-Variate Bivariate Screening (TVBS) Method

When is even,

When is odd,

Similarly, the TVBS method combines the bivariate truncation scheme with the assumption that the bivariate distribution is a rectangle-screened normal (RSN) distribution [24]. The corresponding conditional CDF at each step can be expressed as the ratio of a four-variate normal CDF to a two-variate normal CDF. Since computing the four-variate normal CDF is time-consuming, the TVBS method approximates it as the product of the trivariate CDF of the first three variates and the CDF of the skew normal distribution of the fourth variable given the first three. The MVNCD function is then composed of such four-variate factors when K is even, with one additional trivariate normal CDF factor when K is odd. In equations (14) and (15), the expected value and covariance matrix are updated after each bivariate truncation.

When applied to the example above, the TVBS approximation is composed of an exact leading factor and conditional factors in which the first three variables undergo one truncation and the remaining two variables undergo two truncations.

Furthermore, the matrix-based analytic approximation method applies an LDLT decomposition to the correlation matrix, followed by rank-1 (ME) or rank-2 (BME and TVBS) updates of the factorization. This implementation is much easier to code and more computationally efficient.

As the above discussion shows, the multidimensional integral in the analytic approximation of the MVNCD function can be completely converted into integrals of no more than three dimensions. At present, the CDFs of the bivariate normal (BVN) and trivariate normal (TVN) distributions are generally computed by numerical methods based on Legendre–Gauss quadrature. The integration routines adopted in most existing packages are relatively old, with limited accuracy and speed. In this study, following Genz (2004) [25], a newer numerical integration method is used to replace the original one. For BVN probabilities, we adopt the algorithm of Drezner and Wesolowsky [26] with a higher-order Taylor expansion; for TVN probabilities, the second Plackett formula method studied by Gassmann [27] is applied (not detailed here). After these algorithms are implemented with matrix operations, both functions improve in accuracy and speed.
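For the BVN probability, a short sketch of the quadrature idea: Plackett's identity expresses the BVN CDF as the independent-case product plus an integral of the bivariate density over the correlation, which Gauss-Legendre quadrature evaluates accurately for moderate correlations (this is a simplified variant for illustration, not the exact Drezner-Wesolowsky algorithm cited above).

```python
import math
import numpy as np

def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def bvn_cdf(h, k, rho, n_nodes=20):
    """Bivariate standard normal CDF via Plackett's identity:
    Phi2(h, k; rho) = Phi(h) Phi(k) + integral_0^rho phi2(h, k; r) dr,
    with the integral done by Gauss-Legendre quadrature."""
    nodes, weights = np.polynomial.legendre.leggauss(n_nodes)
    r = 0.5 * rho * (nodes + 1.0)          # map [-1, 1] onto [0, rho]
    w = 0.5 * rho * weights
    omr2 = 1.0 - r * r
    dens = np.exp(-(h * h - 2.0 * r * h * k + k * k) / (2.0 * omr2)) / np.sqrt(omr2)
    return norm_cdf(h) * norm_cdf(k) + np.sum(w * dens) / (2.0 * math.pi)
```

For example, the orthant probability with correlation 0.5 recovers the closed-form value 1/3 to high precision.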

2.3. Automatic Differentiation (AD)

Automatic differentiation (AD) is a chain rule-based technique for evaluating the derivatives of functions defined by a computer program with respect to their input variables [28]. There are two AD strategies: the forward mode and the reverse mode. The reverse mode has a pronounced advantage over the forward mode when the number of parameters is large, which makes it more suitable for the MLE procedure.

For a simple example, consider a binary probit model in which the probability of choosing the alternative is the standard normal CDF evaluated at a linear index of the covariates. The log-likelihood function of the model is the sum, over observations, of the log choice probabilities of the observed outcomes. In the reverse mode of AD, the derivative is computed with respect to each sub-expression recursively via the chain rule:

The derivative of the log-likelihood with respect to each intermediate variable is its adjoint, and the reverse accumulation over the computation graph is illustrated in Figure 2. In this figure, the partial derivatives along the graph are combined by the chain rule; the derivative of the normal CDF has a closed form (the normal density), while the CDF itself can be approximated by numerical integration.
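For this binary probit example, the gradient that reverse-mode AD would deliver automatically can be written down and cross-checked against finite differences; the sketch below does exactly that in NumPy (the data and coefficient values are illustrative).

```python
import math
import numpy as np

def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def norm_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def loglik(beta, X, y):
    p = np.array([norm_cdf(v) for v in X @ beta])
    return np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

def score(beta, X, y):
    """Analytic gradient that AD would otherwise produce automatically."""
    xb = X @ beta
    phi = np.array([norm_pdf(v) for v in xb])
    Phi = np.array([norm_cdf(v) for v in xb])
    w = y * phi / Phi - (1 - y) * phi / (1 - Phi)
    return X.T @ w

rng = np.random.default_rng(7)
X = rng.standard_normal((200, 2))
beta_true = np.array([0.8, -0.5])
y = (X @ beta_true + rng.standard_normal(200) > 0).astype(float)

beta = np.array([0.3, 0.1])
g = score(beta, X, y)
# Central finite differences as a cross-check on the analytic gradient.
eps = 1e-6
g_fd = np.array([(loglik(beta + eps * e, X, y)
                  - loglik(beta - eps * e, X, y)) / (2 * eps)
                 for e in np.eye(2)])
```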

TensorFlow (https://www.tensorflow.org/), an open-source framework for large-scale distributed numerical computation, is widely applied in machine learning and deep neural networks. The framework is built around a data-flow graph and uses the tensor as its core data representation. It builds a gradient graph corresponding to the original computation graph via backward propagation and the chain rule, which is more mature and flexible than other programming modes for automatic gradients. Owing to these advantages, we use TensorFlow to develop the econometric models in this study.

AD has obvious advantages when applied to the MLE of parameters in econometric models. Using this technique, modelers do not need to derive and code the complex analytic gradient expressions of the log-likelihood function, which greatly reduces the workload. When the CML approach is used to estimate model parameters, the Hessian and Jacobian matrices of the estimated parameters must be calculated. Numerical differentiation becomes more error-prone and complex when second-order derivatives are computed, and it is slow when calculating partial derivatives of a function with many inputs. Therefore, using automatic differentiation instead of numerical differentiation can yield more accurate estimation results in less computational time.

3. Simulation Experiments

To evaluate the ability of the proposed ICLV model to recover parameters from finite samples, Monte Carlo simulation experiments are necessary. The study of Bhat [10] considered four ordered indicator variables and three alternatives. In this section, following Bhat's experiment and further increasing the number of alternatives, we apply the LDLT matrix-based analytic approximation method to a higher integral dimension in the simulation experiments.

Here, we design the simulation experiments and discuss the identifiability of the model system based on two assumptions (in addition to those mentioned in the last section): (1) the number of indicators is not less than the number of latent variables (i.e., each latent variable has at least one indicator that loads only on that latent variable and on no other); (2) there is no variable common to the rows of the coefficient matrix and the exogenous variable vector. To explain the model clearly and concisely, the following ICLV model is designed as an example. Consider an ICLV model composed of four latent variables, four ordered indicator variables, and three alternatives' utilities, where the structural equation model of the latent variables in equation (1) can be represented as follows:

Assume that each ordered variable has only one threshold (i.e., two ordinal categories); the four-variate ordered probit model for the latent variable measurement model of equation (2) is then as follows:

Then, the choice model with three alternatives in equation (3) can be specified as follows, where the two variables entering the utilities are the exogenous explanatory variables in the choice model:

In the above model, the correlation matrix of the multivariate ordered probit model is fixed as an identity matrix, and the correlation matrix of latent variables and the covariance matrix Λ of the choice model can be defined as follows:

The free parameters of the latent variable correlation matrix and of Λ need to be estimated. The covariance matrix of the whole model, after accounting for the latent variables, can be expressed as follows:

Applying the CML approach of equations (7) and (8), the number of identifiable parameters in the covariance structure is 16 (6 + 8 + 2) in the above example. Of these, the parameters of the latent variable correlation matrix can be identified (after normalizing the diagonal elements of the covariance matrix to 1 in advance), a limited number of parameters can be identified in each row of the loading matrix, and 2 parameters can be identified in the matrix Λ, as mentioned in the methodology section.

As shown in Figure 3, we conduct simulation experiments on two models: one with three alternatives in the choice model and the other with four, with the same settings in the latent variable structural and measurement equations, in which four of the parameters are fixed at zero. For model 1 with three alternatives, the simulation experiment is repeated 50 times for a sample size of 10,000, with different realizations of the error vectors generating 50 different datasets. The estimator is then applied to each dataset to estimate dataset-specific values of the 31 parameters, and the numerical integration method for the CDF of the TVN distribution is used for the MVNCD function. For model 2 with four alternatives, we perform the data generation once with a large sample size of 100,000 and then carry out the Monte Carlo simulation experiments on the same dataset using the ME, BME, and TVBS methods, respectively, to approximate the four-variate normal CDF. Thirty-seven parameters are estimated in this ICLV model.

4. Model Estimation Results

4.1. The MVNCD Function Evaluation Results

To evaluate the accuracy of MVNCD computation across dimensions, we generate 1000 groups of random numbers consisting of four sets of 250 MVNCD evaluations each, corresponding to (a) low correlations, high MVNCD values; (b) low correlations, low MVNCD values; (c) high correlations, high MVNCD values; and (d) high correlations, low MVNCD values. On the one hand, to generate datasets with low and high MVNCD values, the upper integration limits are drawn from a range of larger values for the high-MVNCD cases and from a range of smaller values for the low-MVNCD cases. On the other hand, to generate datasets with low and high correlations, positive-definite covariance matrices are constructed from a matrix of random univariate standard normal variates and a diagonal matrix with standard uniform random variates on the diagonal, where a scalar determines the magnitude of the diagonal elements relative to the off-diagonal elements. Two different values of this scalar distinguish matrices with low correlations from those with high correlations.
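A sketch of such a generator, assuming a recipe of the form S Sᵀ + ω D, which is consistent with the description but is an assumption (the scalar ω plays the role of the diagonal-magnitude parameter):

```python
import numpy as np

def random_corr(K, omega, rng):
    """Random correlation matrix from Sigma = S S' + omega * D, rescaled to
    unit diagonal. Larger omega inflates the diagonal, lowering correlations.
    (The exact recipe is an assumption consistent with the text.)"""
    S = rng.standard_normal((K, K))
    D = np.diag(rng.uniform(size=K))
    sigma = S @ S.T + omega * D
    d = np.sqrt(np.diag(sigma))
    return sigma / np.outer(d, d)

rng = np.random.default_rng(3)
R_high = random_corr(5, omega=0.1, rng=rng)    # high correlations
R_low = random_corr(5, omega=20.0, rng=rng)    # low correlations
```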

The MVNCD evaluation results for the different methods are presented in Tables 1 and 2. Here, we use Gaussian quadrature based on 400 nodes as the exact value when K ≤ 3 and the value calculated by the CDFMVNe function of GAUSS (up to an accuracy of 1e − 6) as the exact value when K > 3. The mean absolute error (MAE) and the mean absolute percentage error (MAPE) are chosen to evaluate the accuracy of the MVNCD evaluations. "%MAE > 0.005" denotes the percentage of evaluations whose absolute error exceeds 0.005, and "%MAPE > 2" denotes the percentage whose absolute percentage error exceeds 2%.

From Table 1, the following findings can be obtained. Compared with the results of the cdfBvn and cdfTvn functions in GAUSS (https://www.aptech.com/wp-content/uploads/2016/05/GAUSS_16_PDF_UG.pdf), the accuracy (as measured by the MAE) of the BVN and TVN probabilities calculated with the improvements of Section 2.2 on TensorFlow improves from the order of 1e−10 to 1e−15 and from 1e−10 to 1e−11, respectively, at a slight cost in time. When the integration dimension K is 2 or 3, as shown in Table 1, the MAE of the ME and BME methods is on the order of 1e−3, far less accurate than the bivariate and trivariate normal distribution functions calculated by numerical integration. A similar conclusion is obtained using the MAPE. Therefore, numerical integration should be used to calculate BVN and TVN values.

When K ≥ 4, we employ the LDLT matrix-based analytic approximation methods on the GAUSS and TensorFlow platforms. From Table 2, as the dimension K increases, the performance of the three methods can be tracked through "MAE," "MAPE," "%MAE > 0.005," and "%MAPE > 2." Taking the TVBS method as an example, as K goes from 4 to 15, "MAE" falls from 0.00117 to 0.00037, "%MAE > 0.005" falls from 5.8% to 0.1%, and "%MAPE > 2" rises from 8.4% to 32.0%. As shown in Figure 4(a), "%MAE > 0.005" gradually decreases and "%MAPE > 2" gradually increases with dimension for all three methods. To explain this, we checked the data and found that the overall average of the 1000 MVNCD values generated by the random-number procedure decreases as the dimension increases. Because MVNCD values are lower in higher dimensions, the relative error (MAPE) tends to grow with dimension. Meanwhile, as expected, the time required by the analytic approximations increases gradually with K for all methods. In addition, comparing the three analytic approximation methods at the same dimension, the BME method consumes the least time: in general, less than half the time of the ME method, with the TVBS method falling in between. Compared with the ME method, the errors of the BME and TVBS methods are relatively low, as both are based on the bivariate truncation scheme. Overall, TVBS is the most accurate in evaluating individual MVNCD functions. These findings are consistent with Bhat's (2018) study and corroborate it.

Furthermore, some findings emerge from comparing the calculation error and calculation time of the different analytic approximation methods (i.e., ME, BME, and TVBS) on Gauss and TensorFlow. In terms of accuracy, Table 2 shows that the "MAE," "MAPE," "%MAE > 0.005," and "%MAPE > 2" of the three analytic approximation methods are essentially the same on the two platforms. Therefore, although the new method improves the accuracy of the BVN and TVN probabilities in TensorFlow, its contribution to accuracy is limited for high-dimensional analytic approximation. In terms of calculation time, Table 2 and Figure 4(b) show that, for the same dimension and algorithm, the TensorFlow platform is much faster than the Gauss platform, with the average running time of Gauss being about 5-6 times that of TensorFlow. As shown in Table 1, there is no significant difference between the two platforms in the calculation time of the BVN and TVN values; thus, the decline in calculation time is mainly due to the time advantage of TensorFlow's tensor data structure and graph computation technique.

4.2. Simulation Results for ICLV Model

Following Section 3, two ICLV models are designed to evaluate the ability to recover the model parameters. For the three-alternative ICLV model, whose maximum integral dimension is 3, the performance of the improved trivariate normal integral approach in estimating the parameters and standard errors of the ICLV model is evaluated in Table 3. The evaluation system is as follows: (1) For the parameter estimates, the mean estimate for each model parameter across the 50 datasets is obtained. Then, the absolute bias of the estimator is computed as the absolute difference between the mean estimate and the true value, and the absolute percentage bias (APB) as the absolute bias divided by the absolute true value, multiplied by 100. (2) For the standard error estimates, the standard deviation of each estimated parameter across the 50 datasets is computed; this is labeled the finite sample standard deviation or FSSD (essentially, the empirical standard error). We also compute the mean standard error for each model parameter across the 50 datasets and denote this as the asymptotic standard error or ASE (the standard error here is computed using the Godambe (sandwich) matrix at convergence as a theoretical approximation to the FSSD). Next, to evaluate the accuracy of the ASE formula for the finite sample size used, the relative efficiency of the estimator is computed as the ratio of the ASE to the FSSD. As per Bhat [10], the closeness between the ASE and FSSD may be examined via the 0.75–1.25 range for the relative efficiency value.
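As a concrete illustration, the recovery metrics above (absolute bias, APB, FSSD, ASE, and relative efficiency) can be computed from the stacked results of the repeated runs as sketched below; the function name and the array shapes (runs × parameters) are our own assumptions:

```python
import numpy as np

def recovery_metrics(estimates, ases, true_vals):
    """Parameter-recovery metrics across repeated simulation runs.

    estimates: (n_runs, n_params) parameter estimates, one row per run
    ases:      (n_runs, n_params) asymptotic standard errors, one row per run
    true_vals: (n_params,) true parameter values used to generate the data
    """
    mean_est = estimates.mean(axis=0)
    abs_bias = np.abs(mean_est - true_vals)
    apb = 100.0 * abs_bias / np.abs(true_vals)     # absolute percentage bias
    fssd = estimates.std(axis=0, ddof=1)           # finite sample standard deviation
    ase = ases.mean(axis=0)                        # mean asymptotic standard error
    re = ase / fssd                                # relative efficiency (ideally ~1)
    return abs_bias, apb, fssd, ase, re
```

A relative efficiency inside the 0.75–1.25 band then indicates that the asymptotic (sandwich) standard errors track the empirical ones well.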

The following aspects can be observed in Table 3 (the experiment is repeated 50 times with a sample size of 10,000). Firstly, the overall mean APB across parameters is 3.7940%, with individual parameter APBs ranging from 0.1% to 19%, which is close to the values reported by Bhat [10]. The simulation errors are also similar and therefore considered reasonable and acceptable. Secondly, the mean APB values for two of the coefficient vectors (9.0% and 4.3%) and for the elements in the Γ matrix (9.0%) are noticeably higher than the overall mean APB. This indicates that the coefficients on the latent variable vector in the measurement equation and the choice model are somewhat more difficult to recover than the other parameters. This is expected, since these elements enter the covariance matrix in a nonlinear fashion, as shown in equation (6), and the covariance matrix itself enters the composite likelihood function (equation (8)) in a complex manner. Lastly, as for the standard error estimates, the overall mean relative efficiency (RE) across the parameters is 1.1190 (and the RE of almost all parameters lies between 0.75 and 1.25), which suggests that the FSSD and ASE values are close to each other and that the asymptotic formula performs well in estimating the finite sample standard errors.

The simulation results (with a sample size of 100,000) of the four-alternative ICLV model estimated by the three analytic approximation methods, namely ME, BME, and TVBS, are presented in Tables 4–6, respectively. As shown in the experimental results, all the identifiable parameters are estimated significantly and appear consistent with their true values. The absolute percentage bias (APB) values fall in the range of 0–15% in the three tables, and the APBs of four parameters exceed 10%. As in the three-alternative case, the parameter estimators related to the latent variables are recovered less accurately. Additionally, in these three tables (also shown in Figure 5), the overall mean absolute bias of the three methods (i.e., ME, BME, and TVBS) is 0.0266, 0.0262, and 0.0259, and the mean APB is 4.269%, 4.240%, and 4.204%, respectively; both indicators decrease slightly from ME to BME to TVBS. Meanwhile, the standard errors of all methods are about 0.037, so the standard errors of the four-alternative ICLV model appear reasonable and acceptable. Overall, we have validated the parameter recovery ability of these three LDLT-based analytic approximation methods for the ICLV model and can conclude that TVBS is superior to BME and BME is superior to ME. These findings are consistent with the evaluation results for individual MVNCD functions. Although the model parameters can be estimated significantly with a large sample, there remain issues of empirical identification, namely whether the data can support the model specification. If the empirical data do not allow all the theoretically identifiable parameters to be estimated significantly, some parameters may be fixed and only the parameters of particular interest estimated.

Furthermore, we examined the operational efficiency of the models when the automatic differentiation technique is used. The convergence time of the three-alternative ICLV model in Table 3 has a median value of about 8 minutes for the case of 10,000 observations, which is much shorter than that reported in Bhat's 2014 study (about one hour for 1,000 observations). It can also be inferred that the computational time of the improved four-alternative ICLV model based on TensorFlow is reduced. This can be attributed to two factors. One is that the LDLT-based analytic approximation methods (instead of the MACML approach) shorten the calculation time of the ICLV model, as noted by Bhat; the improvement in the bivariate and trivariate normal CDFs also facilitates this process. The other is the time advantage of TensorFlow in computing with tensors and multidimensional arrays. In addition, without hand-coding the analytic gradient, the convergence of the model can be accelerated by using automatic differentiation to obtain the gradient of the likelihood function. For the four-alternative ICLV model with a sample size of 100,000, the convergence times of the three analytic approximation methods are 3 h 7 min (ME), 1 h 14 min (BME), and 2 h 6 min (TVBS). Among them, the BME algorithm has an obvious advantage in time consumption. Hence, if a somewhat lower accuracy level is acceptable (e.g., in a variable screening process), the BME algorithm can be chosen to save computational time. In this section, it is not necessary to evaluate Bhat's ICLV model for comparison, because Bhat has already shown that the LDLT matrix-based analytic approximation methods run much faster than other methods (such as the SSJ method) in evaluating MVNCD functions and estimating the MNP model.
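As a minimal illustration of this AD workflow (a toy normal negative log-likelihood, not the actual ICLV composite likelihood), the sketch below lets `tf.GradientTape` supply the exact gradient, so no analytic gradient needs to be derived or coded by hand:

```python
import math
import tensorflow as tf

# toy data and parameters for a univariate normal model (illustrative only)
x = tf.constant([0.5, -1.2, 0.3, 2.0], dtype=tf.float64)
mu = tf.Variable(0.0, dtype=tf.float64)
log_sigma = tf.Variable(0.0, dtype=tf.float64)  # log-parameterized for positivity

def neg_loglik():
    # negative log-likelihood of x under N(mu, sigma^2)
    sigma = tf.exp(log_sigma)
    z = (x - mu) / sigma
    return tf.reduce_sum(0.5 * z ** 2 + log_sigma + 0.5 * math.log(2.0 * math.pi))

# TensorFlow records the forward pass and differentiates it exactly;
# an optimizer would consume these gradients at each iteration.
with tf.GradientTape() as tape:
    nll = neg_loglik()
grads = tape.gradient(nll, [mu, log_sigma])
```

The same pattern applies to the ICLV log-likelihood: the MVNCD approximation is expressed as TensorFlow operations, and the gradient with respect to all model parameters comes for free from the tape.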
It is worth noting that all simulation experiments were conducted on a desktop computer with an Intel Core™ processor and 16 GB of RAM.

5. Conclusions and Discussion

On the basis of the research advances of Bhat in 2018 and the automatic differentiation (AD) technique on the TensorFlow platform, this study integrates the LDLT matrix-based analytic approximation method into the probit kernel-based ICLV formulation, which can further improve the performance of the ICLV model in terms of accuracy level and computational speed. The main achievements of this study can be summarized as follows.

Firstly, the individual MVNCD function was evaluated at different dimensions. By improving the numerical integration algorithm for the CDFs of the BVN and TVN, this study enhances the accuracy of the BVN and TVN functions (compared with the algorithms in GAUSS). Based on the TensorFlow platform, we also tested the errors and operation times of the three LDLT matrix-based analytic approximation methods across dimensions. At the same dimension, the operation time of BME is the shortest and the accuracy of TVBS is the highest. Comparing the calculation of MVNCD functions on the Gauss and TensorFlow platforms shows that TensorFlow is much faster than Gauss.

Subsequently, we designed simulation experiments for the three-alternative and four-alternative ICLV models. The estimates and standard errors of both models are within reasonable ranges, with the parameters related to the latent variables being somewhat more difficult to recover. For the three-alternative ICLV model, the numerical integration method for the TVN value was employed and the simulation experiments were repeated 50 times with a sample size of 10,000. The results show that the overall mean APB is 3.79% and the mean relative efficiency is 1.1190, exhibiting a desirable capability of parameter recovery. This study then applies the three analytic approximation methods (i.e., ME, BME, and TVBS) to the MVNCD calculation in the ICLV model and conducts simulation experiments on the four-alternative ICLV model with a sample size of 100,000; significant and consistent estimates are obtained. The mean absolute bias and mean APB decrease in the order ME, BME, TVBS, indicating increasing accuracy. To sum up, when estimating an ICLV model with high-dimensional integration, the TVBS method offers the highest accuracy, while the BME method is an alternative when computational time must be saved.

Meanwhile, with the assistance of the AD technique, researchers are freed from coding analytical gradients. Compared with the probit-based ICLV model proposed by Bhat, the computational time of the proposed model is reduced remarkably, which is mainly attributable to two factors: the adoption of the LDLT matrix-based analytic approximation methods to evaluate the MVNCD function, and the advantages of TensorFlow in computing with tensors (multidimensional arrays). Estimating a three-alternative ICLV model with a sample size of 10,000 takes only about 8 min, which can facilitate the application of the proposed method in practice.

In future work, we will further explore the practical application of the improved ICLV model with multiple alternatives and incorporate more psychological and attitudinal variables into the model, so as to explore the potential mechanisms influencing travel behavior using real data. If possible, further comparisons of parameter estimation results between the ICLV model and previous models will also be discussed.

Data Availability

The data used to support the findings of this study are simulation data and can be obtained from the corresponding author upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This research was partially supported by the general project “Study on the Mechanism of Travel Pattern Reconstruction in Mobile Internet Environment” (no. 71671129) and the key project “Research on the Theories for Modernization of Urban Transport Governance” (no. 71734004) from the National Natural Science Foundation of China.