Abstract

This paper addresses some recent developments in efficiency measurement using stochastic frontier (SF) models in selected areas. The following three issues are discussed in detail: first, estimation of SF models with input-oriented technical efficiency; second, estimation of latent class models to address technological heterogeneity as well as heterogeneity in economic behavior; and finally, estimation of SF models using the local maximum likelihood method. In the past, estimation of some of these models was considered too difficult. We focus on the advances that have been made in recent years to estimate some of these so-called difficult models, and we complement these with some developments in other areas as well.

1. Introduction

In this paper we focus on three issues. First, we discuss issues (mostly econometric) related to input-oriented (IO) and output-oriented (OO) measures of technical inefficiency and discuss the estimation of production functions with IO technical inefficiency. We examine the implications of the IO and OO measures from both the primal and dual perspectives. Second, the latent class (finite mixture) modeling approach is extended to accommodate behavioral heterogeneity. Specifically, we consider profit- (revenue-) maximizing and cost-minimizing behaviors with technical inefficiency. In our mixing/latent class model, we first consider a system approach in which some producers maximize profit while others simply minimize cost, and we then use a distance function approach and mix the input and output distance functions (in which it is assumed, at least implicitly, that some producers maximize revenue while others minimize cost). In the distance function approach the behavioral assumptions are not explicitly taken into account. The prior probability in favor of profit- (revenue-) maximizing behavior is assumed to depend on some exogenous variables. Third, we consider stochastic frontier (SF) models that are estimated using the local maximum likelihood (LML) method to address the flexibility issue (functional form, heteroskedasticity, and determinants of technical inefficiency).

2. The IO and OO Debate

The technology (with or without inefficiency) can be viewed from either a primal or a dual perspective. In a primal setup, two measures of technical efficiency are mostly used in the efficiency literature: (i) input-oriented (IO) technical inefficiency and (ii) output-oriented (OO) technical inefficiency.1 There are some basic differences between the IO and OO models so far as features of the technology are concerned. Although some of these differences and their implications are well known, no one other than Kumbhakar and Tsionas [1] has estimated a stochastic production frontier model with IO technical inefficiency econometrically using cross-sectional data.2 Here we consider estimation of a translog production model with IO technical inefficiency.

2.1. The IO and OO Models

Consider a single output production technology where is a scalar output and is a vector of inputs. Then the production technology with the IO measure of technical inefficiency can be expressed as where is a scalar output, is IO efficiency (a scalar), is the vector of inputs, and indexes firms. The IO technical inefficiency for firm is defined as and is interpreted as the rate at which all the inputs can be reduced without reducing output. On the other hand, the technology with the OO measure of technical inefficiency is specified as where represents OO efficiency (a scalar), and is defined as OO technical inefficiency. It shows the percent by which actual output could be increased without increasing inputs (for more details, see Figure 1).
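As a point of reference for the discussion that follows, the two orientations can be written in standard frontier notation; the symbols below ($y_i$ for output, $x_i$ for the input vector, $\eta_i \geq 0$ and $u_i \geq 0$ for the IO and OO inefficiency terms) are our notational choices and are not meant to reproduce (2.1) and (2.2) exactly:

```latex
\text{IO:}\quad y_i = f\!\left(x_i\, e^{-\eta_i}\right), \qquad\qquad
\text{OO:}\quad y_i = f(x_i)\, e^{-u_i},
```

so that $e^{-\eta_i}$ is IO technical efficiency (all inputs shrink by a common factor) and $e^{-u_i}$ is OO technical efficiency (output shrinks with inputs held fixed); noise is added when the econometric model is specified in Section 2.3.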

It is clear from (2.1) and (2.2) that if is homogeneous of degree then , that is, independent of and . If homogeneity is not present their relationship will depend on the input quantities and the parametric form of .

We now show the IO and OO measures of technical efficiency graphically. The observed production plan () is indicated by the point A. The vertical length AB measures OO technical inefficiency, while the horizontal distance AC measures IO technical inefficiency. Since the former measures percentage loss of output while the latter measures percentage increase in input usage in moving to the production frontier starting from the inefficient production plan indicated by point A, these two measures are, in general, not directly comparable. If the production function is homogeneous, then one measure is a constant multiple of the other, and they are the same if the degree of homogeneity is one. In the more general case, they are related in the following manner: .
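The homogeneity statement can be made precise in one line, using the notation introduced above and assuming $f$ is homogeneous of degree $r$:

```latex
f\!\left(x_i\, e^{-\eta_i}\right) = e^{-r\eta_i} f(x_i)
\quad\Longrightarrow\quad u_i = r\,\eta_i ,
```

so the two measures are proportional under homogeneity and coincide when $r = 1$; without homogeneity, the mapping between $\eta_i$ and $u_i$ depends on $x_i$ and on the form of $f$.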

Although we consider technologies with a single output, the IO and OO inefficiency can be discussed in the context of multiple output technologies as well.

2.2. Economic Implications of the IO and OO Models

Here we ask two questions. First, does it matter whether one uses the IO or the OO representation so far as estimation of the technology is concerned? That is, are features of the estimated technology, such as elasticities, returns to scale, and so forth, invariant to the choice of efficiency orientation? Second, are efficiency rankings of firms invariant to the choice of efficiency orientation? That is, does one get the same efficiency measures (converted in terms of either output loss or increase in costs) in both cases? It is not possible to provide general theoretical answers to these questions. These are clearly empirical issues, so it is necessary to engage in applied research to get a feel for the similarities and differences of the two approaches.

Answers to these questions depend on the form of the production technology. If it is homogeneous, then there is no difference between these two models econometrically. This is because for a homogeneous function , where is the degree of homogeneity. Thus, rankings of firms with respect to and will be exactly the same (one being a constant multiple of the other). Moreover, since , the input elasticities as well as returns to scale measures based on these two specifications of the technology will be the same.3

This is, however, not the case if the technology is nonhomogeneous. In the OO model the elasticities and returns to scale are independent of technical inefficiency because technical efficiency (which is assumed to be independent of inputs) enters multiplicatively into the production function. This is not true for the IO model, where technical inefficiency enters multiplicatively with the inputs. This will be shown explicitly later for a nonhomogeneous translog production function.

2.3. Econometric Modeling and Efficiency Measurement

Using lowercase letters to indicate the log of a variable, and assuming that the production function has a translog form, the IO model can be expressed as where is the log of output, denotes the vector of ones, is the vector of inputs in log terms, is the trend/shift variable, , and are scalar parameters, , are parameter vectors, is a symmetric matrix containing parameters, and is the noise term. To make nonnegative we defined it as .
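As a reference form (in our notation, with $x_i$ in logs and $\mathbf{1}$ the vector of ones; details may differ from the paper's (2.3)), a translog production function with IO technical inefficiency can be written as:

```latex
y_i = \alpha_0 + \beta'(x_i - \eta_i \mathbf{1})
      + \tfrac{1}{2}\,(x_i - \eta_i \mathbf{1})'\,\Gamma\,(x_i - \eta_i \mathbf{1})
      + \delta_t\, t + \tfrac{1}{2}\,\delta_{tt}\, t^2
      + \delta_{xt}'(x_i - \eta_i \mathbf{1})\, t + v_i ,
```

with $\eta_i \ge 0$ entering every log input because inputs are scaled by $e^{-\eta_i}$.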

We rewrite the IO model above as where , , and , . Note that if the production function is homogeneous of degree , then . In such a case the function becomes a constant multiple of (namely, ), and consequently the IO model cannot be distinguished from the OO model. The function shows the percentage by which output is lost due to technical inefficiency. For a well-behaved production function, for each .

The OO model, on the other hand, takes a much simpler form, namely, where we defined to make it nonnegative.4 The OO model in this form is the one introduced by Aigner et al. [2] and Meeusen and van den Broeck [3], and since then it has been used extensively in the efficiency literature. Here we follow the framework used in Kumbhakar and Tsionas [1] when θ is random.5

We write (2.4) more compactly as Both and are functions of the original parameters, and also depends on the data ( and ).

Under the assumption that and is distributed independently of with the density function , where is a parameter, the probability density function of can be expressed as where denotes the entire parameter vector.

We consider a half-normal and an exponential specification for the density , namely, The likelihood function of the model is then where has been defined above. Since the integral defining is not available in closed form, we cannot find an analytical expression for the likelihood function. However, we can approximate the integrals by simulation as follows. Suppose is a random sample from . Then it is clear that and an approximation of the log-likelihood function is given by which can be maximized by numerical optimization procedures to obtain the ML estimator. For the distributions we adopted, random number generation is trivial, so implementing the SML estimator is straightforward.6
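The SML recipe just described can be sketched in a few lines. The sketch below assumes a half-normal specification for the IO inefficiency term and an illustrative parameterization of the translog; the function names, argument layout, and numerical values are our assumptions rather than the authors' implementation.

```python
import numpy as np
from scipy.stats import halfnorm, norm

def translog(x, t, beta, Gamma, delta):
    """Translog right-hand side in log inputs x (illustrative parameterization)."""
    d_t, d_tt, d_xt = delta
    return (x @ beta + 0.5 * x @ Gamma @ x
            + d_t * t + 0.5 * d_tt * t ** 2 + (x @ d_xt) * t)

def simulated_loglik(params, y, X, t, base_draws, sigma_v, sigma_u):
    """SML objective: average the normal density of the residual over draws
    eta = sigma_u * |z| of the IO inefficiency (inputs scaled down by eta),
    then sum the logs over firms."""
    alpha0, beta, Gamma, delta = params
    eta_draws = sigma_u * base_draws            # rescale fixed standardized draws
    ll = 0.0
    for yi, xi, ti in zip(y, X, t):
        dens = np.array([norm.pdf(yi - alpha0
                                  - translog(xi - e, ti, beta, Gamma, delta),
                                  scale=sigma_v)
                         for e in eta_draws])
        ll += np.log(dens.mean() + 1e-300)      # guard against underflow
    return ll

# Standardized half-normal draws, generated once and held fixed across optimizer
# iterations so that the simulated likelihood is smooth in the parameters:
base_draws = halfnorm.rvs(size=200, random_state=1)
```

Maximizing this objective over the frontier parameters and $(\sigma_v, \sigma_u)$ with any numerical optimizer yields the SML estimator; the exponential case changes only the drawing distribution.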

Inefficiency estimation is accomplished by considering the distribution of conditional on the data and estimated parameters where a tilde denotes the ML estimate, and denotes the data. For example, when is half-normal we get This is not a known density, and even the normalizing constant cannot be obtained in closed form. However, the first two moments and the normalizing constant can be obtained by numerical integration, for example, using Simpson’s rule.
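As an illustration of the numerical integration step, the following sketch computes the posterior mean of the IO inefficiency term for one firm with Simpson's rule. It reuses the hypothetical translog() helper (and distributional choices) from the previous sketch; the grid and truncation point are arbitrary.

```python
import numpy as np
from scipy.integrate import simpson
from scipy.stats import halfnorm, norm

def posterior_mean_eta(yi, xi, ti, params, sigma_v, sigma_u, n_grid=501):
    """E[eta | data] = int eta f(y|eta) p(eta) d eta / int f(y|eta) p(eta) d eta,
    with both integrals evaluated by Simpson's rule on a truncated grid.
    Assumes the translog() helper from the previous sketch is in scope."""
    alpha0, beta, Gamma, delta = params
    grid = np.linspace(1e-6, 5.0 * sigma_u, n_grid)    # truncate the upper tail
    resid = np.array([yi - alpha0 - translog(xi - e, ti, beta, Gamma, delta)
                      for e in grid])
    kernel = norm.pdf(resid, scale=sigma_v) * halfnorm.pdf(grid, scale=sigma_u)
    return simpson(grid * kernel, x=grid) / simpson(kernel, x=grid)
```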

To make inferences on efficiency, define efficiency as and obtain the distribution of and its moments by changing the variable from to . This yields

The likelihood function for the OO model is given in Aigner et al. [2] (hereafter ALS).7 The maximum likelihood method for estimating the parameters of the production function in the OO model is straightforward and has been used extensively in the literature starting with ALS.8 Once the parameters are estimated, technical inefficiency () is estimated from —the Jondrow et al. [4] formula. Alternatively, one can estimate technical efficiency from using the Battese and Coelli [5] formula. For an application of this approach see Kumbhakar and Tsionas [1].
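For reference, in the normal-half-normal OO model with $\varepsilon_i = v_i - u_i$ and $\sigma^2 = \sigma_u^2 + \sigma_v^2$, these two formulas take the familiar form:

```latex
\mu_{*i} = -\,\frac{\varepsilon_i\,\sigma_u^2}{\sigma^2}, \qquad
\sigma_*^2 = \frac{\sigma_u^2\,\sigma_v^2}{\sigma^2}, \qquad
E\!\left[u_i \mid \varepsilon_i\right]
   = \mu_{*i} + \sigma_*\,\frac{\phi(\mu_{*i}/\sigma_*)}{\Phi(\mu_{*i}/\sigma_*)}
   \quad \text{(Jondrow et al.)},

E\!\left[e^{-u_i} \mid \varepsilon_i\right]
   = \exp\!\left(-\mu_{*i} + \tfrac{1}{2}\,\sigma_*^2\right)
     \frac{\Phi\!\left(\mu_{*i}/\sigma_* - \sigma_*\right)}{\Phi(\mu_{*i}/\sigma_*)}
   \quad \text{(Battese--Coelli)}.
```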

2.4. Looking Through the Dual Cost Functions
2.4.1. The IO Approach

We now examine the IO and OO models when behavioral assumptions are explicitly introduced. First, we examine the models when producers minimize cost to produce the given level of output(s). The objective of a producer is to from which conditional input demand functions can be derived. The corresponding cost function can then be expressed as where is the minimum cost function (cost frontier) and is the actual cost. Finally, one can use Shephard’s lemma to obtain , where the superscripts and * indicate actual and cost-minimizing levels of input .

Thus, the IO model implies (i) a neutral shift in the cost function which in turn implies that RTS and input elasticities are unchanged due to technical inefficiency, (ii) an equiproportional increase (at the rate given by ) in the use of all inputs due to technical inefficiency, irrespective of the output level and input prices.

To summarize, result (i) is just the opposite of what we obtained in the primal case (see [6]). Result (ii) states that when inefficiency is reduced firms will move horizontally to the frontier (as expected by the IO model).

2.4.2. The OO Model

Here the objective function is written as from which conditional input demand functions can be derived. The corresponding cost function can then be expressed as where, as before, is the minimum cost function (cost frontier) and is the actual cost. Finally, . One can then use Shephard’s lemma to obtain where the last inequality will hold if the cost function is well behaved. Note that unless is a constant.

Thus, the results from the OO model are just the opposite of those from the IO model. Here (i) inefficiency shifts the cost function nonneutrally (meaning that the shift depends on output and input prices as well as ); (ii) increases in input use are not equiproportional (they depend on output and input prices); (iii) the cost shares are not independent of technical inefficiency; and (iv) the model is harder to estimate (similar to the IO model in the primal case).9

More importantly, the result in (i) is just the opposite of what we reported in the primal case. Result (ii) is not what the OO model predicts (increase in output) when inefficiency is eliminated. Since output is exogenously given in a cost-minimizing framework, input use has to be reduced when inefficiency is eliminated.

The results from the dual cost function models are just the opposite of what the primal models predict. Since the estimated technologies using cost functions are different in the IO and OO models, as in the primal case, we do not repeat the results based on the production/distance functions here.

2.5. Looking Through the Dual Profit Functions
2.5.1. The IO Model

Here we assume that the objective of a producer is to from which unconditional input demand and supply functions can be derived. Since the above problem reduces to a standard neoclassical profit-maximizing problem when is replaced by , and is replaced by , the corresponding profit function can be expressed as where is actual profit, is the profit frontier (homogeneous of degree one in and ) and is profit inefficiency. Note that the function depends on , , and in general. Application of Hotelling’s lemma yields the following expressions for the output supply and input demand functions: where the superscripts and * indicate actual and optimum levels of output and inputs . The last inequality in the above equations will hold if the underlying production technology is well behaved.

2.5.2. The OO Model

Here the objective function can be written as which can be viewed as a standard neoclassical profit-maximizing problem when is replaced by and is replaced by ; the corresponding profit function can then be expressed as where . Similar to the IO model, applying Hotelling’s lemma we get The last inequality in the above equations will hold if the underlying production technology is well behaved.

To summarize: (i) the shift in the profit function is nonneutral for both the IO and OO models; therefore, estimated elasticities, RTS, and so on are affected by the presence of technical inefficiency, whichever form is used. (ii) Technical inefficiency leads to a decrease in output and a decrease in input use in both models; however, the predicted reductions in input use and output are not the same under the two models.

Even under profit maximization, which recognizes the endogeneity of both inputs and outputs, it matters which model is used to represent the technology. These results are different from those obtained under the primal models and from the cost minimization framework. Thus, it matters (both theoretically and empirically) whether one uses an input- or output-oriented measure of technical inefficiency.

3. Latent Class Models

3.1. Modeling Technological Heterogeneity

In modeling production technology we almost always assume that all producers use the same technology. In other words, we do not allow the possibility that more than one technology is being used by the producers in the sample. Furthermore, the analyst may not know who is using which technology. Recently, a few studies have combined the stochastic frontier approach with a latent class structure in order to estimate a mixture of several technologies (frontier functions). Greene [7, 8] proposes a maximum likelihood estimator for a latent class stochastic frontier model with more than two classes. Caudill [9] introduces an expectation-maximization (EM) algorithm to estimate a mixture of two stochastic cost frontiers.10 Orea and Kumbhakar [10] estimated a four-class stochastic frontier cost function (translog) with time-varying technical inefficiency.

Following the notation of Greene [7, 8], we specify the technology for class as where is a nonnegative random term added to the production function to accommodate technical inefficiency.

We assume that the noise term for class follows a normal distribution with mean zero and constant variance, . The inefficiency term is modeled as a half-normal random variable following standard practice in the frontier literature, namely, That is, a half-normal distribution with scale parameter for each class.

With these distributional assumptions, the likelihood for firm , if it belongs to class , can be written as [11] where , and . Finally, and are the pdf and cdf of a standard normal variable.

The unconditional likelihood for firm is obtained as the weighted sum of its -class likelihood functions, where the weights are the prior probabilities of class membership. That is, where the class probabilities can be parameterized by, for example, a logistic function. Finally, the log likelihood function is
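Written out in our own notation (with $LF_{ij}$ the class-$j$ likelihood of firm $i$, $z_i$ the covariates driving class membership, and $\delta_j$ the associated parameters), the construction described above is:

```latex
LF_i(\theta) = \sum_{j=1}^{J} \pi_{ij}\, LF_{ij}, \qquad
\pi_{ij} = \frac{\exp(z_i'\delta_j)}{\sum_{m=1}^{J}\exp(z_i'\delta_m)}, \quad \delta_J = 0,
\qquad
\ln LF(\theta) = \sum_{i=1}^{N} \ln LF_i(\theta).
```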

The estimated parameters can be used to compute the conditional posterior class probabilities. Using Bayes’ theorem (see Greene [7, 8] and Orea and Kumbhakar [10]) the posterior class probabilities can be obtained from

This expression shows that the posterior class probabilities depend not only on the estimated parameters in , but also on parameters of the production frontier and the data. This means that a latent class model classifies the sample into several groups even when the are fixed parameters (independent of ).

In the standard stochastic frontier approach, where the frontier function is the same for every firm, we estimate inefficiency relative to that frontier for all observations, namely, inefficiency from and efficiency from . In the present case, we estimate as many frontiers as there are classes. So the question is how to measure the efficiency level of an individual firm when there is no unique technology against which inefficiency is to be computed. This is solved by using the following method, where is the posterior probability of being in the th class for a given firm (defined in (3.9)), and is its efficiency using the technology of class as the reference technology. Note that here we do not have a single reference technology; the measure takes into account technologies from every class. The efficiency results obtained by using (3.10) would be different from those based on choosing the most likely frontier and using it as the reference technology. The magnitude of the difference depends on the relative importance of the posterior probability of the most likely cost frontier: the higher the posterior probability, the smaller the differences. For an application see Orea and Kumbhakar [10].
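In the same notation, the posterior class probabilities and the probability-weighted efficiency measure described above take the form:

```latex
P(j \mid i) = \frac{\pi_{ij}\, LF_{ij}}{\sum_{m=1}^{J}\pi_{im}\, LF_{im}}, \qquad
TE_i = \sum_{j=1}^{J} P(j \mid i)\; TE_{ij},
```

where $TE_{ij}$ is firm $i$'s efficiency computed against the class-$j$ frontier.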

3.2. Modeling Directional Heterogeneity

In Section 2.3 we talked about estimating IO technical inefficiency. In practice most researchers use the OO model because it is easy to estimate. Now we address the question of choosing one over the other. Orea et al. [12] used a model selection test procedure to determine whether the data support the IO, OO, or hyperbolic model. Based on such a test result, one may decide to use the direction that fits the data best. This implicitly assumes that all producers in the sample behave in the same way. In reality, firms in a particular industry, although using the same technology, may choose different directions to move to the frontier. For example, some producers might find it costly to adjust input levels to attain the production frontier, while for others it might be easier to do so. This means that some producers will choose to shrink their inputs while others will augment their output level. In such a case, imposing one direction on all sample observations is not appropriate. The other practical problem is that no one knows in advance which producers follow which direction. Thus, we cannot estimate the IO model for one group and the OO model for another.

The advantage of the LCM is that it is not necessary to impose an a priori criterion to identify which producers are in which class. Moreover, we can formally examine whether some exogenous factors are responsible for the choice of the input or the output direction by making the probabilities functions of exogenous variables. Furthermore, when panel data are available, we do not need to assume that producers follow one direction at all times, so we can accommodate switching behavior and determine when they move in the input (output) direction.

3.2.1. The Input-Oriented Model

Under the assumption that , and is distributed independently of , according to a distribution with density , where is a parameter, the distribution of has density where denotes the entire parameter vector. We use a half-normal specification for , namely,

The likelihood function of the IO model is where has been defined in (3.8). Since the integral defining in (3.11) is not available in closed form, we cannot find an analytical expression for the likelihood function. However, we can approximate the integrals using Monte Carlo simulation as follows. Suppose is a random sample from . Then it is clear that and an approximation of the log-likelihood function is given by which can be maximized by numerical optimization procedures to obtain the ML estimator. To perform SML estimation, we consider the integral in (3.11). We can transform the range of integration to by using the transformation which has a natural interpretation as IO technical efficiency. Then, (3.11) becomes Suppose is a set of standard uniform random numbers, for . Then the integral can be approximated using the Monte Carlo estimator where .
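A hedged sketch of the change-of-variable estimator just described, with $\tau = e^{-\eta}$ interpreted as IO technical efficiency; it reuses the hypothetical translog() helper from the Section 2.3 sketch, and the names and argument conventions are ours.

```python
import numpy as np
from scipy.stats import halfnorm, norm

def io_density_mc(yi, xi, ti, params, sigma_v, sigma_u, unif_draws):
    """Monte Carlo estimate of the IO observation density after the change of
    variable tau = exp(-eta) on (0, 1):
        f(y) ~ (1/R) sum_r f(y | eta_r) p(eta_r) / tau_r,   eta_r = -log(tau_r),
    where tau_r are standard uniform (or Halton) draws.  Assumes the translog()
    helper from the Section 2.3 sketch is in scope."""
    alpha0, beta, Gamma, delta = params
    eta = -np.log(unif_draws)                   # the saved log transformation
    resid = np.array([yi - alpha0 - translog(xi - e, ti, beta, Gamma, delta)
                      for e in eta])
    vals = (norm.pdf(resid, scale=sigma_v)
            * halfnorm.pdf(eta, scale=sigma_u) / unif_draws)
    return float(vals.mean())
```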

The standard uniform random numbers and their log transformation can be saved in an matrix before maximum likelihood estimation and reused to ensure that the likelihood function is a differentiable function of the parameters. An alternative is to maintain the same random number seed and redraw these numbers for each call to the likelihood function. This option increases computing time but implies considerable savings in terms of memory. An alternative to the use of pseudorandom numbers is to use the Halton sequence to produce quasi-random numbers that fill the interval . The Halton sequence has been used in econometrics by Train [13] for the multinomial probit model and by Greene [14] to implement SML estimation of the normal-gamma stochastic frontier model.
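For completeness, a minimal generator for the Halton points mentioned here; this is the standard radical-inverse construction and is not tied to any particular implementation in the cited papers.

```python
import numpy as np

def halton(n, base=2):
    """First n Halton points in (0, 1) for a prime base (radical-inverse rule),
    a deterministic low-discrepancy substitute for pseudo-random uniforms."""
    points = []
    for i in range(1, n + 1):
        f, x, k = 1.0, 0.0, i
        while k > 0:
            f /= base
            x += f * (k % base)
            k //= base
        points.append(x)
    return np.array(points)

# e.g., unif_draws = halton(210, base=3)[10:]   # drop a short burn-in
```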

3.2.2. The Output-Oriented Model

Estimation of the OO model is easy since the likelihood function is available analytically. The model is We make the standard assumptions that , , and that the two are mutually independent as well as independent of . The density of is [11, page 75] where , , , and and denote the standard normal pdf and cdf, respectively. The log likelihood function of the model is
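In the usual parameterization ($\varepsilon_i = v_i - u_i$, $\sigma^2 = \sigma_u^2 + \sigma_v^2$, $\lambda = \sigma_u/\sigma_v$), this is the familiar ALS normal-half-normal log likelihood:

```latex
\ln L = \sum_{i=1}^{N}\left[\ln\frac{2}{\sigma}
        + \ln\phi\!\left(\frac{\varepsilon_i}{\sigma}\right)
        + \ln\Phi\!\left(-\frac{\lambda\,\varepsilon_i}{\sigma}\right)\right].
```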

3.3. The Finite Mixture (Latent Class) Model

The IO and OO models can be embedded in a general model that allows model choice for each observation in the absence of sample separation information. Specifically, we assume that each observation is associated with the OO class with probability , and with the IO class with probability . To be more precise, we have the model with probability , and the model with probability , where the stochastic elements obey the assumptions that we stated previously in connection with the OO and IO models. Notice that the technical parameters, , are the same in the two classes. Denote the parameter vector by . The density of will be where , and are subsets of . The log likelihood function of the model is The log likelihood function depends on the IO density , which is not available in closed form but can be obtained with the aid of simulation using the principles presented previously to obtain where has been defined in (3.14) and in (3.16). This log likelihood function can be maximized using standard techniques to obtain the SML estimates of the LCM.
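In symbols (our notation, with $p_i$ the prior probability of the OO class, possibly a logistic function of covariates, and $f_{OO}$, $f_{IO}$ the class densities, the latter replaced by its simulated counterpart):

```latex
f(y_i \mid \theta, p_i) = p_i\, f_{OO}(y_i \mid \theta)
                        + (1 - p_i)\, f_{IO}(y_i \mid \theta),
\qquad
\ln L = \sum_{i=1}^{N} \ln f(y_i \mid \theta, p_i).
```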

3.3.1. Technical Efficiency Estimation in the Latent Class Model

A natural output-based efficiency measure derived from the LCM is where is the posterior probability that the th observation came from the OO class. These posterior probabilities are of independent interest since they can be used to provide inferences on whether a firm came from the OO or IO universe, depending on whether, for example, or . This information can be important in deciding which type of adjustment cost (input- or output-related) is more important for a particular firm.

From the IO component of the LCM we have the IO-related efficiency measure, say , and its standard deviation, say , that can be compared with and from the IO model. Similarly we can compare with the output efficiency of the IO model () and/or the output efficiency of the OO component of the LCM.

3.4. Returns to Scale and Technical Change

Note that returns to scale (defined as ) is not affected by the presence of technical inefficiency in the OO model. The same is true for input elasticities and elasticities of substitution (not explored here). This is because inefficiency in the OO model shifts the production function in a neutral fashion. In contrast, the magnitude of technical inefficiency affects RTS in the IO model. Using the translog specification in (2.4), we get whereas the formula for RTS in the OO model is
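Using the hedged translog notation from the sketch in Section 2.3 (not the paper's numbered formulas), the contrast can be written explicitly as:

```latex
RTS^{OO} = \sum_{j}\Bigl(\beta_j + \sum_{k}\gamma_{jk}\, x_k + \delta_{xt,j}\, t\Bigr),
\qquad
RTS^{IO} = \sum_{j}\Bigl(\beta_j + \sum_{k}\gamma_{jk}\,(x_k - \eta) + \delta_{xt,j}\, t\Bigr),
```

so the two coincide only when $\eta = 0$ or $\sum_j \sum_k \gamma_{jk} = 0$, a restriction implied by homogeneity.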

We now focus on estimates of technical change from the IO and OO models. Again, TC in the IO model can be measured conditional on () or at the frontier (), namely, These two formulas will give different results unless technical change is neutral and/or the production function is homogeneous (i.e., ). The formula for is the same as except for the fact that the estimated parameters in (3.3) are from the IO model, whereas the parameters used to compute are from the OO model.

It should be noted that in the LCM we enforce the restriction that the technical parameters, , are the same in the IO and OO components of the mixture. This implies that RTS and TC will be the same in both components if we follow the first approach, but they will be different if we follow the second approach. In the second approach, a single measure of RTS and TC can be defined as the weighted average of both measures using the posterior probabilities, , as weights. To be more precise, suppose is the type II RTS measure derived from the IO component of the LCM, and is the RTS measure derived from the OO component of the LCM. The overall LCM measure of RTS will be . Similar methodology is followed for the TC measure.

4. Relaxing Functional Form Assumptions (SF Models with LML)

In this section we introduce the LML methodology [15] in estimating SF models in such a way that many of the limitations of the SF models originally proposed by Aigner et al. [2], Meeusen and van den Broeck [3], and their extensions in the last two and a half decades are relaxed. Removal of all these deficiencies generalizes the SF models and makes them comparable to the DEA models. Moreover, we can apply standard econometric tools to perform estimation and draw inferences.

To fix ideas, suppose we have a parametric model that specifies the density of an observed dependent variable conditional on a vector of observable covariates , a vector of unknown parameters , and let the density be . The parametric ML estimator is given by

The problem with the parametric ML estimator is that it relies heavily on the parametric model, which can be incorrect if there is uncertainty regarding the functional form of the model, the density, and so forth. A natural way to convert the parametric model to a nonparametric one is to make the parameter a function of the covariates . Within LML this is accomplished as follows. For an arbitrary , the LML estimator solves the problem where is a kernel that depends on a matrix bandwidth . The idea behind LML is to choose an anchoring parametric model and maximize a weighted log-likelihood function that places more weight on observations near rather than weighting each observation equally, as the parametric ML estimator would do.11 By solving the LML problem for several points , we can construct the function that is an estimator for , and effectively we have a fully general way to convert the parametric model into a nonparametric approximation to the unknown model.

Suppose we have the following stochastic frontier cost model: where is log cost and is a vector of input prices and outputs12; and are the noise and inefficiency components, respectively. Furthermore, and are assumed to be mutually independent as well as independent of .

To make the frontier model more flexible (nonparametric), we adopt the following strategy. Consider the usual parametric ML estimator for the normal () and truncated normal () stochastic cost frontier model that solves the following problem [16]: where , and denotes the standard normal cumulative distribution function. The parameter vector is and the parameter space is . Local ML estimation of the corresponding nonparametric model involves the following steps. First, we choose a kernel function. A reasonable choice is where is the dimensionality of , , is a scalar bandwidth, and is the sample covariance matrix of . Second, we choose a particular point , and solve the following problem: A solution to this problem provides the LML parameter estimates , and . Also notice that the weights do not involve unknown parameters (if is known) so they can be computed in advance and, therefore, the estimator can be programmed in any standard econometric software.13 For an application of this methodology to US commercial banks see Kumbhakar and Tsionas [17, 18] and Kumbhakar et al. [15].
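A compact sketch of the two steps (kernel weights at an anchoring point, then weighted likelihood maximization). To keep it short, the sketch uses the normal-half-normal cost-frontier likelihood in place of the truncated-normal one used in the text, and the function names, the local frontier specification, and the log parameterization of $\sigma$ and $\lambda$ are our choices.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

def kernel_weights(X, x0, h):
    """Gaussian kernel in the Mahalanobis distance to the anchoring point x0,
    scaled by the sample covariance matrix of the covariates."""
    S_inv = np.linalg.inv(np.cov(X, rowvar=False))
    d = X - x0
    return np.exp(-0.5 * np.einsum('ij,jk,ik->i', d, S_inv, d) / h ** 2)

def lml_fit(y, X, x0, h, theta0):
    """Local ML at x0: maximize the kernel-weighted log likelihood of a
    normal/half-normal cost frontier (substituted for the truncated-normal
    version purely to keep the sketch short)."""
    w = kernel_weights(X, x0, h)
    Z = np.column_stack([np.ones(len(y)), X])    # local frontier: intercept + covariates

    def negloglik(theta):
        beta = theta[:-2]
        sigma, lam = np.exp(theta[-2]), np.exp(theta[-1])
        eps = y - Z @ beta                       # cost-frontier residual (v + u)
        ll = (np.log(2.0 / sigma) + norm.logpdf(eps / sigma)
              + norm.logcdf(lam * eps / sigma))
        return -np.sum(w * ll)

    return minimize(negloglik, theta0, method='BFGS').x   # theta-hat at x0

# Usage sketch: theta0 = np.zeros(X.shape[1] + 3), then estimates at the mean:
# theta_x0 = lml_fit(y, X, X.mean(axis=0), h=0.5, theta0=theta0)
```

Repeating the call over a grid of anchoring points traces out the nonparametric parameter functions described in the text.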

5. Some Advances in Stochastic Frontier Analysis

5.1. General

There have been many innovative empirical applications of stochastic frontier analysis in recent years. One of them is in the field of auctions, a particular area of game theory. Advances and empirical applications in this field are likely to accumulate rapidly and contribute positively to the advancement of empirical game theory and empirical IO. Kumbhakar et al. [19] propose a Bayesian analysis of an auction model in which systematic over-bidding and under-bidding is allowed. Extensive simulations are used to show that the new techniques perform well and that ignoring measurement error or systematic over-bidding and under-bidding matters for the final results.

Kumbhakar and Parmeter [20] derive the closed-form likelihood and associated efficiency measures for a two-sided stochastic frontier model under the assumption of normal-exponential components. The model has an important application in the labor market, where employees and employers have asymmetric information and each side tries to manipulate the situation to its own advantage: employers would like to hire for less, and employees try to obtain more in the bargaining process. The precise measurement of these components is clearly important.

Kumbhakar et al. [21] acknowledge explicitly the fact that certain decision-making units can be fully (i.e., 100%) efficient, and propose a new model which is a mixture of (i) a half-normal component for inefficient firms and (ii) a mass at zero for efficient firms. Of course, it is not known in advance which firms are fully efficient. The authors propose classical methods of inference organized around maximum likelihood and provide extensive simulations to explore the validity and relevance of the new techniques under various data generating processes.

Tsionas [22] explores the implications of the convolution in stochastic frontier models. The fundamental point is that even when the distributions of the error components are nonstandard (e.g., Student-t and half-Student-t, or normal and half-Student-t, gamma, symmetric stable, etc.), it is possible to estimate the model by ML via the fast Fourier transform (FFT) when the characteristic functions are available in closed form. These methods can also be used in mixture models, input-oriented efficiency models, two-tiered stochastic frontiers, and so forth. The properties of ML and some GLS techniques are explored, with an emphasis on the normal-truncated normal model, for which the likelihood is available analytically, and simulations are used to determine various quantities that must be set in order to apply ML by FFT.

Starting with Annaert et al. [23], stochastic frontier models have been applied very successfully in finance, especially to the important issue of mutual fund performance. Schaefer and Maurer [24] apply these techniques to German funds and find that a fund “may be able to reduce its costs by 46 to 74% when compared with the best-practice complex in the sample.” Of course, much remains to be done in this area to connect stochastic frontier models more closely with practical finance and better mutual fund performance evaluation.

5.2. Panel Data

Panel data have always been a source of inspiration and new models in stochastic frontier analysis. Roughly speaking, panel data are concerned with models of the form , where the ’s are individual effects, random or fixed, is a vector of covariates, is a parameter vector and, typically, the error term .

An important contribution in panel data models of efficiency is the incorporation of factors, as in Kneip et al. [25]. Factors arise from the necessity of incorporating more structure into frontier models, a point that is clear after Lee and Schmidt [26]. The authors use smoothing techniques to perform the econometric analysis of the model.

In recent years, the focus of the profession has shifted from the fixed effects model (e.g., Cornwell et al. [27]) to a so-called “true fixed effects model” (TFEM) first proposed by Greene [28]. Greene’s model is , where . In this model, the individual effects are separated from technical inefficiency. Similar models have been proposed previously by Kumbhakar [29] and Kumbhakar and Hjalmarsson [30], although in these models firm-effects were treated as persistent inefficiency. Greene shows that the TFEM can be estimated easily using special Gauss-Newton iterations without the need to explicitly introduce individual dummy variables, which is prohibitive if the number of firms () is large. As Greene [28] notes: “the fixed and random effects estimators force any time invariant cross unit heterogeneity into the same term that is being used to capture the inefficiency. Inefficiency measures in these models may be picking up heterogeneity in addition to or even instead of inefficiency.” For important points and applications see Greene [8, 31].

Greene’s [8] findings are somewhat at odds with the perceived incidental parameters problem in this model, as he himself acknowledges. His findings motivated a body of research that tries to deal with the incidental parameters problem in stochastic frontier models and, of course, with efficiency estimation. The incidental parameters problem in statistics began with the well-known contribution of Neyman and Scott [32] (see also [33]). In stochastic frontier models of the form , for and , the essence of the problem is that as gets large, the number of unknown parameters (the individual effects , ) increases at the same rate, so consistency cannot be achieved. Another route to the incidental parameters problem is well known in efficiency estimation with cross-sectional data (), where JLMS estimates are not consistent.

To better appreciate the incidental parameters problem, note that the TFEM implies a density for the th unit, say . The problem is that the ML estimator is not consistent. The source of the problem is that the concentrated likelihood using (the ML estimator) will not deliver consistent estimators for all elements of .

In frontier models we know that the ML estimators for and seem to be all right, but the estimator for or the ratio can be wrong. This is also validated in a recent paper by Chen et al. [34].

There are several approaches in the literature on nonlinear panel data models to correct such biases.
(i) Correct the bias to first order using a modified score (first derivatives of the log likelihood).
(ii) Use a penalty function for the log likelihood. This can of course be related to [2] above.
(iii) Apply the panel jackknife (a generic delete-one-period version is sketched after this list). Satchachai and Schmidt [35] have done this recently in a model with fixed effects but without a one-sided component. They derive some interesting results regarding convergence depending on whether or not there are ties. First differencing produces but with ties we have (for the estimator applied when there is a tie).
(iv) In line with (ii), one could use a modified likelihood of the form , where is some weighting function, for which there is clearly a Bayesian interpretation.
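For concreteness, one generic delete-one-period panel jackknife (referred to in item (iii)) removes the leading $O(1/T)$ bias term; this is a textbook statement, not the specific estimator studied in [35]:

```latex
\tilde{\theta} = T\,\hat{\theta} - \frac{T-1}{T}\sum_{t=1}^{T}\hat{\theta}_{(-t)},
```

where $\hat{\theta}_{(-t)}$ is the fixed-effects ML estimate computed with period $t$ deleted.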

Since Greene [8] derived a computationally efficient algorithm for the true fixed effects model, one would think that applying the panel jackknife would reduce the first-order bias of the estimator, and for empirical purposes this might be enough. For further reductions in the bias, there remains only the possibility of asymptotic expansions along the lines of related work on nonlinear panel data models. This point has not been explored in the literature, but it seems that it could be used profitably.

Wang and Ho [36] show that “first-difference and within-transformation can be analytically performed on this model to remove the fixed individual effects, and thus the estimator is immune to the incidental parameters problem.” The model is, naturally, less general than a standard stochastic frontier model in that the authors assume , where is a positive half-normal random variable and is a positive function. In this model, the dynamics of inefficiency are determined entirely by the function and the covariates that enter into this function.

Recently, Chen et al. [34] proposed a new estimator for the model. If the model is , taking deviations from the mean gives . Given , we have , where = “data”. The distribution of belongs to the family of the multivariate closed skew-normal (CSN), so estimating and σ is easy. Of course, the multivariate CSN depends on evaluating a multivariate normal integral in . With , this is not a trivial problem (see [37]).

There is reason to believe that “average likelihood” or a fully Bayesian approach can perform much better relative to sampling-theory treatments. Indeed, the true fixed effects model is nothing but another instance of the incidental parameters problem. Recent advances suggest that the best treatment can be found in “average” or “integrated” likelihood functions. For work in this direction, see Lancaster [33] and Arellano and Bonhomme [38], Arellano and Hahn [39, 40], Berger et al. [41], and Bester and Hansen [42, 43]. The performance of such methods in the context of TFE remains to be seen.

Tsionas and Kumbhakar [44] propose a full Bayesian solution to the problem. The approach is obvious in a sense, since the TFEM can be cast as a hierarchical model. The authors show that the obvious parameterization of the model does not perform well in simulated experiments and, therefore, they propose a new parameterization that is shown to effectively eliminate the incidental parameters problem. They also extend the TFEM to models with both individual and time effects. Of course, the TFEM is cast in terms of a random effects model so it is at first sight not directly related to [35].

6. Thoughts on Current State of Efficiency Estimation and Panel Data

6.1. Nature of Individual Effects

If we think about the model , , one natural question is: do we really expect to be so agnostic about the fixed effects as to allow to be completely different from what we already know about ? This is rarely the case. But we do not adopt the true fixed effects model for that reason; there are other reasons. If we set this choice aside, we can adopt a finite mixture of normal distributions for the effects.

In principle this can approximate any distribution of the effects well, so with enough latent classes we should be able to approximate the weight function quite well. That would impose some structure on the model and would avoid the incidental parameters problem (if the number of classes grows slowly, at a lower rate than ), so for fixed there should be no significant bias. For really small , a further bias-correction device, such as asymptotic expansions or the jackknife, could be used.

Since we do not adopt the true fixed effects model for that reason, why do we adopt it? Because the effects and the regressors are potentially correlated in a random effects framework, so it is preferable to think of them as parameters. It could be that , or perhaps , when . Mundlak [45] first wrote about this model. In some cases it makes sense. Consider the alternative model:

Under stationarity this has the same implications as Mundlak’s original model, but in many cases it makes much more sense: mutual fund rating and evaluation is one such case. But even if we stay with Mundlak’s original specification, many other possibilities are open. For small , the most interesting case, approximation of by some flexible functional form should be enough for practical purposes. By “practical purposes” we mean bias reduction to order or better. If the model becomes , adaptation of known nonparametric techniques should provide that rate of convergence. Reduction to would facilitate the analysis considerably without sacrificing the rate. It is quite probable that or should be low-order polynomials or basis functions with some nice properties (e.g., Bernstein polynomials) that can overcome the incidental parameters problem.
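For reference, Mundlak’s [45] original device models the individual effects as a function of the time means of the regressors; in standard notation:

```latex
\alpha_i = \gamma'\bar{x}_i + \varepsilon_i, \qquad
\bar{x}_i = \frac{1}{T}\sum_{t=1}^{T} x_{it}, \qquad
\varepsilon_i \sim \mathrm{iid}\,(0, \sigma_\varepsilon^2),
```

so that, conditional on $\bar{x}_i$, the correlation between the effects and the regressors is absorbed and a random-effects treatment becomes defensible.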

6.2. Random Coefficient Models

Consider [46]. Typically we assume that . For small to moderate panels (say to 10) adaptation of the techniques in the paper by Chen et al. [34] would be quite difficult to implement in the context of fixed effects. The concern is again with evaluation of ()-dimensional normal integrals, when is large. Here, again we are subject to the incidental parameters problem—we never really escape the “small sample” situation.

One way to proceed is the so-called CAR (conditionally autoregressive) prior (model) of the form: . In the multivariate case we would need something like the BEKK factorization of a covariance matrix, as in multivariate GARCH processes. The point is that the coefficients cannot be too dissimilar, and their degree of dissimilarity depends on a parameter that can be made a function of covariates, if any. Under different DGPs, it would be interesting to know how the Bayesian estimator of this model behaves in practice.

6.3. More on Individual Effects

Related to the above discussion, it is productive to think about sources, that is, where these s or s come from. Of course, we have Mundlak’s [45] interpretation in place. In practice we have different technologies (represented by cost functions, say). The standard cost function with input-oriented technical inefficiency results in , . The presence of allocative inefficiency results in a much more complicated model: where is the vector of price distortions (see [47, 48]). So under some reasonable economic assumptions and a common technology we end up with a nonlinear effects model through the function. Of course, one can apply the TFEM here, but that would not correspond to the true DGP, so consistency is at stake.

It is hard to imagine a situation where, in a TFEM, the s can be anything and are subject to no “similarity” constraints. We can, of course, accept that , so at least for the translog we should have a rough guide on what these effects represent under allocative inefficiency. First-order approximations to the complicated term are available when the s are small. Of course, one then has to think about the nature of the allocative distortions, but at least that is an economic problem.

6.4. Why a Bayesian Approach?

Chen et al.’s transformation that used the multivariate CSN is one class of transformations, but there are many transformations that are possible because the TFEM does not have the property of information orthogonality [33]. The “best” transformation, the one that is “maximally bias reducing,” cannot be taking deviations from the means because the information matrix is not block diagonal with respect to . Other transformations would be more effective and it is not difficult to find them, in principle.

Recently, Tsionas and Kumbhakar [44] considered a different model, namely , where is persistent inefficiency. They used a Bayesian approach. Colombi et al. [49] used the same model but a classical (ML) approach to estimate the parameters as well as the inefficiency components. The finite sample properties of the Bayes estimators (posterior means and medians) in Tsionas and Kumbhakar [44] were found to be very good for small samples with values typically encountered in practice (of course, one needs to keep away from zero in the DGP). The moral of the story is that in the random effects model, an integrated likelihood approach based on reasonable priors, a nonparametric approach based on low-order polynomials, or a finite mixture model might provide an acceptable approximation to parameters like .

Coupled with a panel jackknife device, these approaches can be really effective in mitigating the incidental parameters problem. For one, in the context of the TFEM, we do not know how the Chen et al. [34] estimator would behave under strange DGPs, that is, under strange processes for the incidental parameters. We have some evidence from Monte Carlo experiments, but we need to think about more general “mitigating strategies.” The integrated likelihood approach is one, and it is close to a Bayesian approach. Finite mixtures also hold great promise since they have good approximating properties. The panel jackknife device is certainly something to think about. Analytical devices for bias reduction to order or are also available from the likelihood function of the TFEM (score and information). Their implementation in software should be quite easy.

7. Conclusions

In this paper we presented some new techniques to estimate technical inefficiency using stochastic frontier methods. First, we presented a technique to estimate a nonhomogeneous technology with IO technical inefficiency. We then discussed the IO and OO controversy in light of distance functions and the dual cost and profit functions. The second part of the paper addressed the latent class modeling approach incorporating behavioral heterogeneity. The third part addressed the LML method, which can address the functional form issue in parametric stochastic frontier models. Finally, we added a section that deals with some very recent advances.

Endnotes

  1. Another measure is hyperbolic technical inefficiency that combines both the IO and OO measures in a special way (see, e.g., [50], Cuesta and Zofio (1999), [12]). This measure is not as popular as the other two.
  2. On the contrary, the OO model has been estimated by many authors using DEA (see, e.g., [51] and references cited in there).
  3. Alvarez et al. [52] addressed these issues in a panel data framework with time invariant technical inefficiency (using a fixed effects model).
  4. The above equation gives the IO model (when the production function is homogeneous) by labeling .
  5. Alvarez et al. [52] estimated an IO primal model in a panel data model where technical inefficiency is assumed to be fixed and parametric.
  6. Greene [14] used SML for the OO normal-gamma model.
  7. See also Kumbhakar and Lovell [11, pages 74–82] for the log-likelihood functions under both half-normal and exponential distributions for the OO technical inefficiency term.
  8. It is not necessary to use the simulated ML method to estimate the parameters of the frontier models if the technical inefficiency component is distributed as half-normal, truncated normal, or exponential, along with the normality assumption on the noise component. For other distributions, for example, gamma for technical inefficiency and normal for the noise component, the standard ML method may not be ideal (see Greene [14], who used the simulated ML method to estimate OO technical efficiency in the gamma-normal model).
  9. Atkinson and Cornwell [53] estimated translog cost functions with both input- and output-oriented technical inefficiency using panel data. They assumed technical inefficiency to be fixed and time invariant. See also Orea et al. [12].
  10. See, in addition, Beard et al. [54, 55] for applications using a non-frontier approach. For applications in the social sciences, see Hagenaars and McCutcheon [56]. Statistical aspects of mixture models are discussed in detail in McLachlan and Peel [57].
  11. LML estimation has been proposed by Tibshirani [58] and has been applied by Gozalo and Linton [59] in the context of nonparametric estimation of discrete response models.
  12. The cost function specification is discussed in detail in Section 5.2.
  13. An alternative, which could be relevant in some applications, is to localize based on a vector of exogenous variables instead of the 's. In that case, the LML problem becomes where are the given values of the vector of exogenous variables. The main feature of this formulation is that the parameters as well as , , and will now be functions of instead of .