#### Abstract

Data envelopment analysis (DEA) measures the relative efficiency of decision making units (DMUs) without considering noise in the data. The least efficient DMU is the one in the worst situation. In this paper, we measure the efficiency of an individual DMU when it loses the maximum output, while the efficiency of the other DMUs is measured in their observed situation. This efficiency is the minimum efficiency of the DMU. The stochastic data envelopment analysis (SDEA) proposed in this study is a DEA method that considers noise in the data. Using the bounded Pareto distribution, we estimate the DEA efficiency from the efficiency interval. A small value of the shape parameter estimates the efficiency more accurately under the Pareto distribution. Rank correlations were estimated between observed and minimum efficiencies as well as between observed and estimated efficiencies. These correlations indicate the effectiveness of the SDEA model.

#### 1. Introduction

Producer performance is influenced by three phenomena: the efficiency with which management organizes production, the effect of environmental factors, and random error [1]. If all of these phenomena influence production positively, the DMU is in the best situation for production; if all of them influence production negatively, it is in the worst situation. Several single-stage and multistage models have been proposed to incorporate environmental factors in DEA. Banker and Morey [2] proposed a single-stage DEA method for environmental variables. An obstacle of this method is that the direction of the impact of the environmental factors on production must be known in advance. Other research uses a two-stage approach to describe the effect of environmental factors: the first stage calculates efficiency from DEA based on inputs and outputs, and a second-stage regression analysis tries to explain the variation in the first-stage efficiency scores. McCarty and Yaisawarng [3] and Bhattacharyya et al. [4] developed the two-stage method further by using the second-stage regression residuals to adjust the first-stage efficiency scores. In a related approach, input-oriented DEA is applied to the inputs and environmental factors, or output-oriented DEA to the outputs and environmental factors, in the first stage. The inputs or outputs are then replaced by their residual projections. The second stage of this method again applies DEA to an expanded data set consisting of the originally efficient observations, the originally inefficient observations, and the radial projections of the originally inefficient observations. Camanho et al. [5] suggested a DEA method that considers internal and external nondiscretionary factors. In this method, they generalized the models of Banker and Morey [2] and their extension to both nondiscretionary inputs and outputs by Golany and Roll [6], as well as the model of Ruggiero [7] and its extension described by Ruggiero [8]. Lotfi et al. [9] measured the relative efficiency of decision making units with nondiscretionary inputs and interval discretionary data. They obtained the upper bound by assuming the best performance of a DMU against the worst performance of the remaining DMUs and, similarly, the lower bound by assuming the worst performance of a DMU against the best performance of the remaining DMUs. Sadjadi and Omrani [10] presented a DEA model with uncertain data. Simar and Wilson [11-13] proposed a bootstrap algorithm to study the statistical properties of DEA models. In the bootstrap method, it is difficult to find an appropriate value of the smoothing parameter, and the method requires a large number of iterations. Entani et al. [14] introduced a method to calculate interval efficiency, and Kao and Liu [15] used interval data to measure interval efficiency. Our method also provides an interval efficiency; its novelty is that it assumes noisy data within the existing DEA method and uses the bounded Pareto distribution to estimate the efficiency scores.

The main focus of this paper is to develop a DEA method that includes noise in the data. To this end, we formulate an SDEA method based on the BCC model. The objective of the paper is to measure the minimum efficiency of DMUs. The rest of the paper is organized as follows. Section 2 describes the background of this study. Section 3 develops the SDEA methodology and the characteristics of the bounded Pareto distribution. Section 4 presents and discusses an empirical example, and the final section concludes.

#### 2. Background

Data envelopment analysis is a nonparametric method in operations research for measuring the production efficiency of DMUs, and it is a popular method for measuring DMU performance. Farrell [16] was motivated to develop better methods and models for evaluating productivity performance.

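The initial DEA model, presented by Charnes et al. [17] and built on the earlier work of Farrell [16], is known as the CCR model. It is a constant returns to scale model. The CCR model is as follows:

$$\min_{\theta,\lambda}\ \theta \quad \text{s.t.}\quad -y_i + Y\lambda \ge 0,\quad \theta x_i - X\lambda \ge 0,\quad \lambda \ge 0,$$

where $x_i$ is the input vector and $y_i$ is the output vector of the $i$th firm, $X$ and $Y$ collect the inputs and outputs of all $N$ firms, $\lambda$ is the $N \times 1$ vector of intensity variables, and $\theta$ is the efficiency of the $i$th firm.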

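If the convexity constraint $\mathbf{1}'\lambda = 1$ (where $\mathbf{1}$ is an $N \times 1$ vector of ones) is added to the CCR model, it becomes the BCC model [18], which allows variable returns to scale.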
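The difference between the two models can be illustrated in the simplest possible setting. The sketch below assumes a single input and a single output (the models above allow vectors) and uses hypothetical toy data; in that case the output-oriented CCR score reduces to each firm's output/input ratio divided by the best observed ratio. The function name `ccr_output_efficiency` is illustrative only.

```python
def ccr_output_efficiency(x, y):
    """Output-oriented CCR (constant returns to scale) scores for one input and one output.

    Under CRS the best attainable output at input x_i is x_i * max_j(y_j / x_j),
    so the score 1/phi_i reduces to the firm's output/input ratio divided by the best ratio.
    """
    ratios = [yj / xj for xj, yj in zip(x, y)]
    best = max(ratios)
    return [r / best for r in ratios]


# Hypothetical toy data (not from the paper): one input x and one output y for five firms.
x = [2.0, 4.0, 4.0, 6.0, 8.0]
y = [3.0, 6.0, 5.5, 5.0, 8.0]
print([round(e, 3) for e in ccr_output_efficiency(x, y)])  # [1.0, 1.0, 0.917, 0.556, 0.667]
```

Adding the BCC convexity constraint shrinks the feasible set of the envelopment problem, so BCC scores for the same data can never be lower than these CCR scores.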

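Chance-constrained DEA (CCDEA) [19] considers two sets of constraints, one of which is a set of chance constraints requiring that the probability of efficiency of all DMUs be only 0.05 or less; the CCDEA model itself is formulated in [19].

Fuzzy DEA is an extension of the CCR model incorporating fuzzy numbers [20]. The fuzzy DEA model is

$$\max_{u,v}\ u^{T}\tilde{Y}_o \quad \text{s.t.}\quad v^{T}\tilde{X}_o = 1,\quad u^{T}\tilde{Y}_j - v^{T}\tilde{X}_j \le 0,\ j = 1,\dots,n,\quad u, v \ge 0,$$

where $\tilde{X}_j$ and $\tilde{Y}_j$ are the $m$-dimensional fuzzy input vector and the $s$-dimensional fuzzy output vector of the $j$th DMU, $v$ and $u$ are the coefficient vectors of $\tilde{X}_j$ and $\tilde{Y}_j$, respectively, and the index $o$ indicates the evaluated DMU.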

The bootstrap method introduced by Efron [21] is used in bootstrap DEA [11-13]. The algorithm of bootstrap DEA is as follows [11].

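*Step 1.* Transform the input-output vectors using the original efficiency estimates $\hat{\theta}_i$ as $(\hat{x}_i, y_i) = (\hat{\theta}_i x_i, y_i)$, $i = 1, \dots, n$.

*Step 2.* Given the set of estimated efficiencies $\{\hat{\theta}_i\}$, use $h = 0.9 \min\{\hat{\sigma}_{\hat{\theta}},\ R_{13}/1.34\}\, n^{-1/5}$ to obtain the bandwidth parameter $h$, where $\hat{\sigma}_{\hat{\theta}}$ denotes the standard deviation of the efficiency estimates and $R_{13}$ denotes the interquartile range of their empirical distribution.

*Step 3.* Generate $\beta_1^*, \dots, \beta_n^*$ by resampling, with replacement, from the empirical distribution of the estimated efficiencies; $\beta_i^*$ is the nonsmoothed resample of the original efficiencies.

*Step 4.* Generate the sequence $\tilde{\theta}_i^*$ using

$$\tilde{\theta}_i^* = \begin{cases} \beta_i^* + h\varepsilon_i^* & \text{if } \beta_i^* + h\varepsilon_i^* \le 1,\\ 2 - \beta_i^* - h\varepsilon_i^* & \text{otherwise,} \end{cases}$$

where the $\varepsilon_i^*$ are independent standard normal draws.

*Step 5.* Generate the smoothed pseudo efficiencies using $\theta_i^* = \bar{\beta}^* + \left(1 + h^2/\hat{\sigma}_{\hat{\theta}}^2\right)^{-1/2} (\tilde{\theta}_i^* - \bar{\beta}^*)$, where $\bar{\beta}^* = n^{-1}\sum_{i} \beta_i^*$; $\theta_i^*$ is the smoothed resample of efficiencies, $i = 1, \dots, n$.

*Step 6.* Let the bootstrap pseudo data be given by $(x_i^*, y_i^*) = (\hat{x}_i/\theta_i^*,\ y_i)$.

*Step 7.* Estimate the bootstrap efficiencies $\hat{\theta}_i^*$ using the pseudo data and the linear program $\hat{\theta}_i^* = \min\{\theta : y_i \le Y^*\lambda,\ \theta x_i \ge X^*\lambda,\ \lambda \ge 0\}$.

*Step 8.* Repeat Steps 2 to 7 $B$ times to create a set of firm-specific bootstrapped efficiency estimates $\{\hat{\theta}_{ib}^*\}_{b=1}^{B}$.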
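Steps 2 to 5 are plain arithmetic on the efficiency estimates, so they can be sketched without a DEA solver. The code below is a minimal illustration, not the implementation of [11]: it assumes Python's standard `statistics` and `random` modules, hypothetical efficiency estimates, and Silverman's rule in the form of Step 2. Steps 1, 6, and 7, which need the input-output data and an LP solver, are omitted, and the function names are illustrative.

```python
import math
import random
import statistics


def silverman_bandwidth(theta_hat):
    """Step 2: h = 0.9 * min(sigma, IQR / 1.34) * n ** (-1/5)."""
    n = len(theta_hat)
    sigma = statistics.stdev(theta_hat)
    q1, _, q3 = statistics.quantiles(theta_hat, n=4)
    return 0.9 * min(sigma, (q3 - q1) / 1.34) * n ** (-0.2)


def smoothed_pseudo_efficiencies(theta_hat, h, rng):
    """Steps 3-5: resample, perturb with reflection at 1, then rescale the variance."""
    n = len(theta_hat)
    var_hat = statistics.variance(theta_hat)
    beta = [rng.choice(theta_hat) for _ in range(n)]          # Step 3: naive resample
    beta_bar = sum(beta) / n
    theta_tilde = []
    for b in beta:                                             # Step 4: kernel noise
        t = b + h * rng.gauss(0.0, 1.0)
        theta_tilde.append(t if t <= 1.0 else 2.0 - t)         # reflect values above 1
    scale = 1.0 / math.sqrt(1.0 + h * h / var_hat)             # Step 5: variance correction
    return [beta_bar + scale * (t - beta_bar) for t in theta_tilde]


theta_hat = [1.0, 0.95, 0.9, 0.88, 1.0, 0.82, 0.93, 0.97]   # hypothetical estimates
rng = random.Random(0)
h = silverman_bandwidth(theta_hat)
pseudo = smoothed_pseudo_efficiencies(theta_hat, h, rng)
print(round(h, 4), [round(p, 3) for p in pseudo])
```

Because each smoothed value is a shrinkage of a reflected draw toward $\bar{\beta}^*$, every pseudo efficiency stays at or below 1. Repeating this $B$ times and solving the Step 7 LP after each repetition would give the bootstrap distribution.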

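Imprecise DEA (IDEA) is an extension of the CCR model that incorporates imprecise data, and the resulting program is nonlinear and nonconvex [22]. Outputs and inputs are sometimes imprecise data in the form of bounded data, ratio-bounded data, weak ordinal data, or strong ordinal data [23, 24]. If the data take any of these forms, the DEA model becomes the following IDEA model:

$$\max\ \theta_o = \sum_{r=1}^{s} u_r \hat{y}_{ro} \quad \text{s.t.}\quad \sum_{i=1}^{m} v_i \hat{x}_{io} = 1,\quad \sum_{r=1}^{s} u_r \hat{y}_{rj} - \sum_{i=1}^{m} v_i \hat{x}_{ij} \le 0\ (j = 1, \dots, n),\quad (\hat{y}_{rj}) \in D_r^{+},\ (\hat{x}_{ij}) \in D_i^{-},\quad u_r, v_i \ge 0,$$

where $\hat{y}_{rj}$ and $\hat{x}_{ij}$ represent any form of imprecise data, $D_r^{+}$ and $D_i^{-}$ denote the sets of admissible values implied by the imprecise-data statements, and $u_r$ and $v_i$ are the weights of output and input, respectively.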

The robust optimization framework proposed by Ben-Tal and Nemirovski [25] and Bertsimas and Sim [26] was applied to DEA by Sadjadi and Omrani [10], who called the result robust DEA. The robust DEA model [10] is expressed in terms of the efficiency of the evaluated DMU, the uncertain $r$th output and $i$th input of the $j$th DMU, a budget of uncertainty $\Gamma_j$ for each constraint $j$, the output and input weights, the utmost probability of violating a constraint, and the dual variables of the robust counterpart.

In this paper, we describe a methodology based on the BCC model that incorporates noise in the data.

#### 3. Method

We propose a DEA method to assess the performance of a DMU in the worst situation. In the first stage, we apply DEA to the input and output data to determine the efficiency level of each DMU. In the second stage, we maximize the gap between frontier output and observed output to assess the maximum loss the DMU would incur in the worst situation. In the final stage, we again use DEA, combining the worst-case output of an individual DMU with the observed outputs of the other DMUs, to determine its efficiency level.

##### 3.1. DEA Incorporating Error

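*Stage 1.* DEA has input-oriented and output-oriented forms. We choose the output-oriented DEA model of Banker et al. [18], which can be expressed as the following linear programming problem:

$$\max_{\phi,\lambda}\ \phi \quad \text{s.t.}\quad -\phi y_i + Y\lambda \ge 0,\quad x_i - X\lambda \ge 0,\quad \mathbf{1}'\lambda = 1,\quad \lambda \ge 0, \tag{a}$$

where $\phi$ is the proportional expansion of the output $y_i$, so that $1 \le \phi < \infty$ and the technical efficiency of the $i$th firm is $1/\phi$. This LP problem is solved $N$ times, once for each firm, where $N$ is the number of firms.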
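A minimal sketch of model (a), assuming a single input and a single output and hypothetical toy data (the paper's Stage 1 uses DEAP 2.1 with the actual firm data). With one input and one output, the LP has only two constraints besides nonnegativity, so an optimal solution mixes at most two firms and can be found by enumeration instead of a general LP solver. The function names are illustrative.

```python
def vrs_frontier_output(x, q, x0):
    """Largest output attainable at input level x0 under variable returns to scale.

    Solves max sum_j l_j*q_j s.t. sum_j l_j*x_j <= x0, sum_j l_j = 1, l >= 0
    for ONE input and ONE output. With only two constraints, an optimal basic
    solution uses either a single firm with x_j <= x0, or two firms mixed so
    that the input constraint binds, so enumerating these cases is exact.
    """
    n = len(x)
    best = max(q[j] for j in range(n) if x[j] <= x0)
    for j in range(n):
        for k in range(n):
            if x[j] < x0 < x[k]:
                w = (x[k] - x0) / (x[k] - x[j])  # weight on firm j
                best = max(best, w * q[j] + (1.0 - w) * q[k])
    return best


def bcc_output_efficiency(x, y):
    """Model (a) for every firm: returns (phi_i, 1/phi_i, frontier output phi_i*y_i)."""
    results = []
    for xi, yi in zip(x, y):
        y_hat = vrs_frontier_output(x, y, xi)
        phi = y_hat / yi
        results.append((phi, 1.0 / phi, y_hat))
    return results


x = [2.0, 4.0, 4.0, 6.0, 8.0]   # hypothetical toy data, not the paper's
y = [3.0, 6.0, 5.5, 5.0, 8.0]
for phi, te, y_hat in bcc_output_efficiency(x, y):
    print(round(phi, 3), round(te, 3), y_hat)
```

Here firm 4 (input 6.0, output 5.0) is projected onto an equal mix of firms 2 and 5, giving frontier output 7.0, $\phi = 1.4$, and an efficiency of about 0.714. With several inputs or outputs, (a) would instead be solved with a general LP solver.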

*Stage 2.* Models can be classified into two groups, neoclassical models and frontier models [27], depending on the interpretation of the deviation terms $\varepsilon_i$. Under the assumptions of the neoclassical model, all firms are efficient, and the deviations are random, uncorrelated noise terms that satisfy the Gauss-Markov assumptions. In frontier models, by contrast, all deviations from the frontier are attributed to inefficiency, which implies that $\varepsilon_i \le 0$ for all $i$ [28]. The function

$$y_i = f(x_i) + \varepsilon_i$$

is a frontier model if the $\varepsilon_i$ are interpreted as composite error terms that include both inefficiency and noise components, where $f$ is the production frontier (Aigner et al. [29]). Then, the output lost to environmental factors and inefficiency is $u_i = \hat{y}_i - y_i \ge 0$, where $\hat{y}_i$ is the frontier output from Stage 1, because frontier output is always greater than or equal to observed output. To maximize this output $u$, we use the output-oriented DEA method, which can be expressed as the following linear programming problem:

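$$\max_{\psi,\lambda}\ \psi \quad \text{s.t.}\quad -\psi u_i + U\lambda \ge 0,\quad x_i - X\lambda \ge 0,\quad \mathbf{1}'\lambda = 1,\quad \lambda \ge 0, \tag{b}$$

where $\psi$ is the proportional expansion of the gap $u_i$ and $U$ collects the gaps of all $N$ firms.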
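Stage 2 can be sketched with the same one-input, one-output enumeration, again on hypothetical data. Two assumptions are worth flagging: the inputs $x$ are kept as the inputs of model (b), and the maximized gap $u_i^{*}$ is taken as the frontier value of the gaps at firm $i$'s input level. The second choice matters for firms with $u_i = 0$, for which a purely radial expansion of a zero gap would be unbounded; it is consistent with Section 4, where the efficient Firm 1 still has a positive maximum loss. Function names are illustrative.

```python
def vrs_frontier_output(x, q, x0):
    """Max of sum(l*q) s.t. sum(l*x) <= x0, sum(l) = 1, l >= 0 (one input, one output).

    Exact for this case: an optimal basic solution mixes at most two firms.
    """
    n = len(x)
    best = max(q[j] for j in range(n) if x[j] <= x0)
    for j in range(n):
        for k in range(n):
            if x[j] < x0 < x[k]:
                w = (x[k] - x0) / (x[k] - x[j])
                best = max(best, w * q[j] + (1.0 - w) * q[k])
    return best


def stage2_maximum_loss(x, y):
    """Return frontier outputs y_hat, gaps u = y_hat - y, and maximized gaps u*."""
    y_hat = [vrs_frontier_output(x, y, xi) for xi in x]        # Stage 1 frontier output
    u = [yh - yi for yh, yi in zip(y_hat, y)]                   # output lost, u_i >= 0
    u_star = [vrs_frontier_output(x, u, xi) for xi in x]        # gap frontier at x_i
    return y_hat, u, u_star


x = [2.0, 4.0, 4.0, 6.0, 8.0]   # hypothetical toy data
y = [3.0, 6.0, 5.5, 5.0, 8.0]
y_hat, u, u_star = stage2_maximum_loss(x, y)
print(u)       # [0.0, 0.0, 0.5, 2.0, 0.0]
print(u_star)  # [0.0, 1.0, 1.0, 2.0, 2.0]
```

Firm 4 has the largest gap relative to its input level, so its maximized gap equals its observed gap, while the efficient firm 2 inherits a positive maximum loss from the mix of firms 1 and 4.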

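*Stage 3.* Let $u_i^{*}$ denote the maximized quantity of $u_i$ obtained from model (b). To calculate the efficiency of the $i$th firm in the worst production situation, the output of the $i$th firm is taken to be $y_i^{\min} = \hat{y}_i - u_i^{*}$, whereas the inputs of all firms and the outputs of the other firms are the observed data.

This can be expressed mathematically as follows:

$$\max_{\phi,\lambda}\ \phi \quad \text{s.t.}\quad -\phi\, y_i^{\min} + \lambda_i y_i^{\min} + \sum_{j \ne i} \lambda_j y_j \ge 0,\quad x_i - X\lambda \ge 0,\quad \mathbf{1}'\lambda = 1,\quad \lambda \ge 0, \tag{c}$$

where $y_i^{\min}$ depends on the frontier output $\hat{y}_i$ and the output lost to environmental factors $u_i^{*}$, and $1/\phi_i$ at the optimum shows the minimum level of efficiency of the $i$th firm.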
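Putting Stages 1 to 3 together, the sketch below computes each firm's minimum output (frontier output minus maximized gap) and its worst-situation efficiency, replacing only firm $i$'s output while all other data stay as observed. It assumes one input and one output, hypothetical data, and takes the maximized gap as the frontier value of the gaps at firm $i$'s input level; the function names are illustrative.

```python
def vrs_frontier_output(x, q, x0):
    """Max of sum(l*q) s.t. sum(l*x) <= x0, sum(l) = 1, l >= 0 (one input, one output).

    Exact for this case: an optimal basic solution mixes at most two firms.
    """
    n = len(x)
    best = max(q[j] for j in range(n) if x[j] <= x0)
    for j in range(n):
        for k in range(n):
            if x[j] < x0 < x[k]:
                w = (x[k] - x0) / (x[k] - x[j])
                best = max(best, w * q[j] + (1.0 - w) * q[k])
    return best


def minimum_efficiency(x, y):
    """Stages 1-3: minimum output y_min = y_hat - u* and worst-situation efficiency."""
    y_hat = [vrs_frontier_output(x, y, xi) for xi in x]       # Stage 1
    u = [yh - yi for yh, yi in zip(y_hat, y)]
    u_star = [vrs_frontier_output(x, u, xi) for xi in x]       # Stage 2
    y_min = [yh - us for yh, us in zip(y_hat, u_star)]
    effs = []
    for i, xi in enumerate(x):                                 # Stage 3, model (c)
        y_worst = list(y)
        y_worst[i] = y_min[i]             # only firm i is placed in its worst situation
        frontier = vrs_frontier_output(x, y_worst, xi)
        effs.append(y_min[i] / frontier)  # 1 / phi_i
    return y_min, effs


x = [2.0, 4.0, 4.0, 6.0, 8.0]   # hypothetical toy data
y = [3.0, 6.0, 5.5, 5.0, 8.0]
y_min, effs = minimum_efficiency(x, y)
print(y_min)                          # [3.0, 5.0, 5.0, 5.0, 6.0]
print([round(e, 3) for e in effs])    # [1.0, 0.909, 0.833, 0.714, 1.0]
```

Firm 4, whose gap is already the largest relative to its input level, keeps its observed efficiency (about 0.714), much as Firms 7 and 16 do in Section 4, while the efficient firm 2 drops to about 0.909.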

##### 3.2. Estimation of DEA Efficiency

Using the above methodology, we obtain the minimum efficiency of each DMU, and the highest attainable efficiency is 1. The Pareto distribution is usually used to describe the allocation of wealth among individuals, but in this paper we use the bounded Pareto distribution because of its properties. The Pareto distribution is applicable when a variable ranges from a certain positive value to infinity, whereas the efficiency of the output-maximizing BCC model ranges from a certain value to 1. This bounded range justifies the use of the bounded Pareto distribution.

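The probability density function of the bounded Pareto distribution is

$$f(x) = \frac{\alpha L^{\alpha} x^{-\alpha-1}}{1 - (L/H)^{\alpha}}, \qquad L \le x \le H,$$

with shape parameter $\alpha > 0$ and bounds $0 < L < H$.

Mean

$$E(X) = \begin{cases} \dfrac{L^{\alpha}}{1 - (L/H)^{\alpha}} \cdot \dfrac{\alpha}{\alpha - 1} \left( \dfrac{1}{L^{\alpha-1}} - \dfrac{1}{H^{\alpha-1}} \right), & \alpha \ne 1,\\[2ex] \dfrac{HL}{H - L} \ln\dfrac{H}{L}, & \alpha = 1. \end{cases}$$

Variance

$$\operatorname{Var}(X) = E(X^2) - [E(X)]^2, \qquad E(X^2) = \begin{cases} \dfrac{L^{\alpha}}{1 - (L/H)^{\alpha}} \cdot \dfrac{\alpha}{\alpha - 2} \left( \dfrac{1}{L^{\alpha-2}} - \dfrac{1}{H^{\alpha-2}} \right), & \alpha \ne 2,\\[2ex] \dfrac{2H^2L^2}{H^2 - L^2} \ln\dfrac{H}{L}, & \alpha = 2. \end{cases}$$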

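Then, because the upper bound of efficiency is 1, the probability density function of the bounded Pareto distribution for efficiency is

$$f(\theta) = \frac{\alpha\, \theta_{\min}^{\alpha}\, \theta^{-\alpha-1}}{1 - \theta_{\min}^{\alpha}}, \qquad \theta_{\min} \le \theta \le 1,$$

where $\theta_{\min}$ is the minimum efficiency. Using this pdf, we generate random numbers of $\theta$.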
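Random numbers from this pdf can be drawn by inverting its cumulative distribution function $F(\theta) = (1 - \theta_{\min}^{\alpha}\theta^{-\alpha})/(1 - \theta_{\min}^{\alpha})$, which gives $\theta = \theta_{\min}[1 - U(1 - \theta_{\min}^{\alpha})]^{-1/\alpha}$ for $U$ uniform on $[0, 1)$. The sketch below is an illustration under that choice (the paper does not say which generator it used); it also evaluates the closed-form mean so the sample mean can be checked against it. The lower bound 0.8 and shape 1.5 are arbitrary, and the function names are illustrative.

```python
import math
import random


def sample_bounded_pareto(theta_min, alpha, size, rng, upper=1.0):
    """Draw `size` values from the bounded Pareto on [theta_min, upper].

    Inverse-CDF method: with U ~ Uniform[0, 1), solving F(theta) = U gives
    theta = L * (1 - U * (1 - (L/H)**alpha)) ** (-1/alpha).
    """
    ratio = (theta_min / upper) ** alpha
    return [theta_min * (1.0 - rng.random() * (1.0 - ratio)) ** (-1.0 / alpha)
            for _ in range(size)]


def bounded_pareto_mean(theta_min, alpha, upper=1.0):
    """Closed-form mean of the bounded Pareto distribution (separate alpha = 1 case)."""
    L, H = theta_min, upper
    if math.isclose(alpha, 1.0):
        return H * L / (H - L) * math.log(H / L)
    return (L ** alpha / (1.0 - (L / H) ** alpha) * alpha / (alpha - 1.0)
            * (L ** (1.0 - alpha) - H ** (1.0 - alpha)))


rng = random.Random(1)
draws = sample_bounded_pareto(0.8, 1.5, 100_000, rng)
# The two printed numbers should agree to about two decimal places.
print(round(sum(draws) / len(draws), 3), round(bounded_pareto_mean(0.8, 1.5), 3))
```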

#### 4. Empirical Example

Table 1 shows the efficiency computed from the observed outputs and inputs, referred to as the observed efficiency. Firms 1, 2, 4, 12, 13, and 20 are perfectly efficient among the 20 firms, and Firm 16 has the lowest efficiency. Frontier output is the projected output obtained from the software DEAP version 2.1. The gap between observed and frontier outputs is the amount of output lost at a DMU because of inefficiency. Inefficiency is influenced by environmental factors; if a DMU could control these factors, it would be efficient. The maximum loss of output is calculated from Stage 2 of the methodology. We calculated the minimum output as

$$\text{minimum output} = \text{frontier output} - \text{maximum loss of output}.$$

The minimum output and the observed output are the same for Firm 7 and Firm 16 because these firms are in the worst position among the firms from the beginning. The efficiency in the worst situation is the efficiency of an individual firm when it follows the situation of Firm 7 and Firm 16. The maximum loss is the loss of output when unexplained factors have a completely negative effect on production. The maximum loss of Firm 1 will be 273 units if all unexplained factors affect it as they affect Firm 7 or Firm 16, since these two firms are in the worst production situation.

To calculate the efficiency in the worst situation, we use the minimum output of the firm together with the observed outputs of the other firms. For the efficiency of Firm 1 in the worst situation, we use its minimum output of 2202 units, the observed outputs of Firms 2 to 20, and the observed inputs of all firms. The observed efficiency of Firm 16 (0.823) is the lowest, and in the worst situation Firms 18 and 19 also show this lowest efficiency level (0.823), together with Firm 16.

Table 2 shows the estimated efficiencies from the bounded Pareto distribution. We arbitrarily chose 0.5, 1.5, 2.5, 3.5, and 4.5 as shape values, with the efficiency in the worst situation as the lower bound and 1 as the upper bound, and drew 10 random numbers for each firm. The mean of the random numbers is the estimated efficiency. Bias is the difference between the observed efficiency and the estimated efficiency. In robust bootstrap DEA, the original efficiency is greater than the estimated efficiency [30], but in our case the bias is both positive and negative. The mean bias is lowest for shape 0.5 and increases as the shape value increases.
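The estimation procedure behind Table 2 can be mimicked as follows: for each firm, draw 10 numbers from the bounded Pareto distribution on [minimum efficiency, 1], average them, and take bias = observed efficiency - estimated efficiency. This is a sketch with hypothetical efficiencies (not the values of Tables 1 and 2) and an inverse-CDF sampler chosen for illustration; the results depend on the random seed, and the function names are illustrative.

```python
import random


def sample_bounded_pareto(theta_min, alpha, size, rng):
    """Inverse-CDF draws from the bounded Pareto distribution on [theta_min, 1]."""
    ratio = theta_min ** alpha
    return [theta_min * (1.0 - rng.random() * (1.0 - ratio)) ** (-1.0 / alpha)
            for _ in range(size)]


def estimate_efficiency(observed, minimum, alpha, rng, draws=10):
    """Estimated efficiency = mean of `draws` random numbers; bias = observed - estimated."""
    estimated = [sum(sample_bounded_pareto(m, alpha, draws, rng)) / draws
                 for m in minimum]
    bias = [o - e for o, e in zip(observed, estimated)]
    return estimated, bias


observed = [1.0, 1.0, 0.917, 0.714, 1.0]   # hypothetical, not Table 1
minimum = [1.0, 0.909, 0.833, 0.714, 1.0]
rng = random.Random(42)
for alpha in (0.5, 1.5, 2.5, 3.5, 4.5):
    estimated, bias = estimate_efficiency(observed, minimum, alpha, rng)
    mean_bias = sum(bias) / len(bias)
    print(alpha, [round(e, 3) for e in estimated], round(mean_bias, 4))
```

A firm whose minimum efficiency is 1 always gets an estimate of exactly 1 and zero bias, since its efficiency interval collapses to a single point.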

Figure 1 presents the interval efficiency of the firms. Among the perfectly efficient firms, Firm 12 is the most consistent and Firm 4 the most inconsistent; that is, Firm 12 will be the least affected in the worst situation. The range of efficiency is almost the same for Firms 7, 10, 16, 17, 18, and 19.

Figure 2 compares the observed, minimum, and estimated efficiencies. The estimated efficiency follows almost the same pattern for different values of the shape parameter. The estimated efficiency curve is smooth, possibly because of the parametric estimation. The observed and minimum efficiency lines coincide for Firm 7 and Firm 16. Again, the minimum efficiency in the worst situation occurs for Firms 16, 18, and 19.

The correlation between observed and estimated efficiencies is high (0.842) for shape parameter 4.5. Table 3 presents the rank correlation matrix among observed efficiency, minimum efficiency, and estimated efficiency for different values of the shape parameter. All correlations are significantly high at the 1 percent level of significance, which indicates the effectiveness of the developed SDEA method. This result supports the result of Yuichiro [31].

The correlation between minimum efficiency and observed efficiency is 0.827. Minimum efficiency is relatively more correlated with estimated efficiency (0.903) when the shape parameter is 2.5. Although the estimated efficiency with shape parameter 0.5 has the minimum bias, its correlation with both minimum efficiency and observed efficiency is relatively low.
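The rank correlations in Table 3 can be computed as Pearson correlations of ranks. The text does not say which rank correlation was used; the sketch below assumes Spearman's rho with average ranks for ties, which matter here because several firms share an efficiency of 1. The efficiency values and function names are hypothetical.

```python
import math


def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average 1-based position of the tied block
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(a, b):
    """Spearman's rho: Pearson correlation of the (tie-averaged) ranks."""
    ra, rb = average_ranks(a), average_ranks(b)
    n = len(ra)
    ma, mb = sum(ra) / n, sum(rb) / n
    cov = sum((p - ma) * (q - mb) for p, q in zip(ra, rb))
    var_a = sum((p - ma) ** 2 for p in ra)
    var_b = sum((q - mb) ** 2 for q in rb)
    return cov / math.sqrt(var_a * var_b)


observed = [1.0, 1.0, 0.917, 0.714, 1.0]   # hypothetical values
minimum = [1.0, 0.909, 0.833, 0.714, 1.0]
print(round(spearman_rho(observed, minimum), 3))  # 0.918
```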

Figure 3(a) shows the scatter diagram of minimum efficiency against observed efficiency; the correlation coefficient is 0.707. Scatter diagrams of observed efficiency against estimated efficiency for different values of the shape parameter are shown in Figures 3(b) to 3(f), with correlation coefficients of 0.521, 0.667, 0.592, 0.759, and 0.731 for shape parameters 0.5, 1.5, 2.5, 3.5, and 4.5, respectively. The correlation between observed and estimated efficiency tends to increase as the shape parameter increases, although not monotonically. All correlations are significant at the 5 percent level of significance. The estimated efficiency with shape parameter 2.5 has the highest correlation with minimum efficiency, and the lowest correlation between minimum and estimated efficiency occurs when the shape parameter is 0.5.

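**Figure 3:** Scatter diagrams of (a) minimum versus observed efficiency and (b) to (f) observed versus estimated efficiency for shape parameters 0.5, 1.5, 2.5, 3.5, and 4.5.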

To select an appropriate value of the shape parameter, we set up the following hypotheses:

(H0) There is no significant difference between observed efficiency and estimated efficiency.

(Ha) There is a significant difference between observed efficiency and estimated efficiency.

Table 4 shows that there is no significant difference between observed and estimated efficiencies for shape parameters 0.5, 1.5, 2.5, and 3.5, but there is a difference between observed and estimated efficiencies for shape parameter 4.5 at the 10% level of significance. In Table 4, the $p$-value decreases for higher values of the shape parameter. To estimate efficiency, shape parameters 0.5, 1.5, 2.5, and 3.5 can be used, but a small value of the shape parameter is more appropriate.

#### 5. Conclusion

This paper presents a stochastic data envelopment analysis method that handles noisy data in DEA. The method consists of three stages. In the first stage, we use the BCC DEA model to measure the gap between observed and frontier production. In the second stage, we assess the minimum production level; as a policy implication, this can help determine the size of inventory. The third stage measures the lower bound of efficiency for each DMU, providing an efficiency interval to which statistical properties can be applied. For small values of the shape parameter, the mean bias is also small, so a small shape parameter is more appropriate for estimating efficiency with the Pareto distribution. The results show an almost similar pattern for observed, minimum, and estimated efficiencies. Significantly high correlations are recorded between observed and minimum efficiency as well as between observed and estimated efficiency, and these rank correlations show the effectiveness of the developed SDEA method.