Research Article  Open Access
A Data Envelopment-Based Clustering Approach for Public Sugar Factories in Privatizing Process
Abstract
Turkish Sugar Inc., a public enterprise comprising 25 factories, is among the first industrial corporations in Turkey. Under government policy, the public sugar factories (PSFs) are to be privatized within two years as six geography-based portfolio groups. Because the performance of the PSFs matters to the government, sugar producers, and several unions during privatization, a systematic approach is needed to measure the factories' efficiencies and to group them. This paper uses a new clustering approach based on Data Envelopment Analysis (DEA) to measure the efficiency scores of the PSFs and to group them as an alternative to the geography-based portfolio groups. This approach can help decision makers in the privatization process. In addition, target values obtained from the dual model can be used to eliminate the inefficiencies of some PSFs.
1. Introduction
Sugar factories are among the first industrial corporations in Turkey. The first sugar factory was established in Alpullu in 1926 at the direction of Kemal Ataturk. Turkey's annual sugar demand of 2.3 million tons is supplied by three producers: Turkish Sugar Inc., a public enterprise comprising 25 factories; Pankobirlik (the beet producers' union), with 6 factories; and starch-based sugar producers, with 5 factories. Their market shares are 70%, 20%, and 10%, respectively. Turkish Sugar Inc. and Pankobirlik produce sugar from beet rather than starch.
Under government policy, the sugar factories of Turkish Sugar Inc. are to be privatized within two years as six geography-based portfolio groups: A (Kars, Ercis, Agri, Mus, and Erzurum), B (Elazig, Malatya, Erzincan, and Elbistan), C (Kastamonu, Kirsehir, Turhal, Yozgat, Corum, and Carsamba), D (Bor, Eregli, and Ilgin), and E (Usak, Alpullu, Burdur, and Afyon). Several Turkish and foreign corporations, such as Pankobirlik, Keskinkilic, Torunlar (Turkish), Sudzucker (German), Saint Louys Sucre (French), and British Sugar, aspire to buy the sugar factories.
According to supporters of privatization, the average production season is shorter in Turkey (64 days) than in the EU (120 days), and the average number of personnel per factory is 500 in Turkey versus 200 in the EU, which is far too large by European standards. In addition, the use of uneconomic beets with a low polarization (sugar content) value raises costs. According to opponents, prices are high because of the large starch-based sugar quota (15% in Turkey versus 2% in the EU) and outdated technology; if the starch-based sugar quota were reduced and new technology adopted, the PSFs would become profitable. Opponents also argued that, to prevent external dependency and rural-urban migration and to protect the sector, the Council of State should stay the enforcement of the privatization decision. In the end, the Council of State did stay enforcement, for the following reasons: (i) the specification contains a 5-year production obligation and a 50-million-dollar guarantee, (ii) it does not guarantee supply-demand balance and stability, and (iii) it does not guarantee the sustainability of production, and it creates external dependency.
For these reasons, performance measurement of the PSFs is an important task that affects the government, sugar producers, and several unions in the privatization process, and a systematic approach is needed to measure their efficiencies. This paper is the first real-life application of the DEA-based clustering approach developed by Po et al. [1], used here to measure the efficiency scores of the PSFs and to group them as an alternative to the geography-based portfolio groups. This approach can help decision makers in the privatization process of the PSFs. In addition, target values obtained from the DEA model can be used to eliminate inefficiencies and help make inefficient factories profitable. To the best of the author's knowledge, no previous scientific study has measured the efficiency of public sugar factories, in Turkey or elsewhere.
The rest of this paper is organized as follows. Section 2 discusses DEA and the DEA-based clustering approach developed by Po et al. [1], focusing on why and how the piecewise production functions derived from DEA models are used to cluster data. Section 3 applies the DEA-based clustering approach to measure the efficiency scores of the PSFs and to group them in support of decision makers in the privatization process; the results are compared with the geography-based portfolio groups, and target values obtained from the CCR model are given to eliminate the inefficiencies of some PSFs. Finally, Section 4 states the conclusions.
2. DEA-Based Clustering Approach
Most conventional clustering algorithms are procedures that minimize total dissimilarity; examples of such algorithms are given by Po et al. [1].
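A general clustering method is to find cluster centers $a=\{a_1,\dots,a_c\}$ for a data set $X=\{x_1,\dots,x_n\}\subset\mathbb{R}^p$ so that the total dissimilarity measure $J(z,a)=\sum_{i=1}^{c}\sum_{j=1}^{n}z_{ij}\,d(x_j,a_i)$, with $z_{ij}\in\{0,1\}$ and $\sum_{i=1}^{c}z_{ij}=1$, is minimized. Here $d(x_j,a_i)$ is usually defined as a distance-based function, and the problem is to select a useful and reasonable distance measure $d$.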
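As a minimal illustration of this objective, the Python sketch below computes the total dissimilarity $J=\sum_i\sum_j z_{ij}\,d(x_j,a_i)$, assuming Euclidean distance for $d$ and hard memberships encoded as one cluster label per point. The data, centers, and the helper names `total_dissimilarity` and `nearest_center_labels` are hypothetical. For fixed centers, assigning every point to its nearest center minimizes $J$, which is the assignment step of c-means-type algorithms.

```python
import math


def total_dissimilarity(data, centers, labels):
    """J(z, a) = sum_i sum_j z_ij * d(x_j, a_i) with hard memberships.

    labels[j] = i encodes z_ij = 1, so each point belongs to exactly one cluster;
    d is the Euclidean distance (one common, assumed choice).
    """
    return sum(math.dist(x, centers[i]) for x, i in zip(data, labels))


def nearest_center_labels(data, centers):
    """For fixed centers, J is minimized by assigning each point to its nearest center."""
    return [min(range(len(centers)), key=lambda i: math.dist(x, centers[i]))
            for x in data]


# Hypothetical two-feature data and two candidate centers.
data = [(1.0, 1.0), (1.5, 2.0), (8.0, 8.0), (9.0, 8.5)]
centers = [(1.0, 1.5), (8.5, 8.0)]
labels = nearest_center_labels(data, centers)
print(labels, total_dissimilarity(data, centers, labels))
```

Note that nothing in $J$ distinguishes inputs from outputs; every feature enters the distance symmetrically, which is the limitation discussed next.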
The clustering approaches above can be seen as feature-analysis techniques. They treat the feature items as multiple features, so minimizing the total dissimilarity groups data that are close in their features and makes such decision-making units (DMUs) more likely to be classified into the same cluster. However, clustering results derived from minimizing total feature dissimilarity may not be helpful for clustering some DMUs, especially production units, which are better clustered by their production data. Suppose that the production data $x_j=(x_{1j},\dots,x_{pj})$ have $p$ feature items, with $x_{1j}$ to $x_{mj}$ being input items and $x_{(m+1)j}$ to $x_{pj}$ being output items. Conventional clustering approaches can only reveal which DMUs are more similar to one another. The more important information, however, is the production feature (function) implied by the production data of all DMUs, that is, $(x_{m+1},\dots,x_p)=f_k(x_1,\dots,x_m)$ for each cluster $k$. From these derived production functions $f_k$, all DMUs are classified into different clusters (production functions). Each DMU therefore knows not only the cluster it belongs to but also the type of production function it confronts, and it can compare its production feature with the other production functions to readjust its combination of input resources, or of inputs and outputs. Hence, for data with input and output items, clusters derived from production functions are more valuable than those derived from feature-dissimilarity measures.
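A toy example makes the argument concrete. The three single-input, single-output DMUs below are made up; under constant returns to scale, A and B lie on the same production ray (same output/input ratio), yet a Euclidean feature distance places A much closer to the less productive C.

```python
import math

# Hypothetical single-input, single-output DMUs: (input, output).
dmus = {"A": (1.0, 1.0), "B": (10.0, 10.0), "C": (1.0, 0.5)}


def ratio(dmu):
    """Output/input ratio, the productivity compared by a constant-returns frontier."""
    x, y = dmu
    return y / x


# Feature (Euclidean) view: C is far closer to A than B is.
print(math.dist(dmus["A"], dmus["C"]), math.dist(dmus["A"], dmus["B"]))
# Production view: A and B share the same productivity, C does not.
print(ratio(dmus["A"]), ratio(dmus["B"]), ratio(dmus["C"]))
```

A dissimilarity-based method would tend to put A with C, whereas a production-function view puts A with B.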
The idea of Po et al. [1] is to use production functions to cluster production data. The method supporting this idea is DEA, as initiated and developed by Charnes et al. [2]. DEA is a data-oriented method for evaluating the relative efficiency of DMUs, where each DMU is an entity responsible for converting multiple inputs into multiple outputs. Because DEA uses nonparametric mathematical programming to estimate piecewise frontiers that envelop the DMU data, each piecewise frontier is regarded as one cluster of production functions, and all piecewise frontiers serve as the basis for clustering the production data. In other words, Po et al. abandon traditional clustering based on feature dissimilarity and instead cluster all DMUs by the production functions revealed by the observed data.
DEA is a nonparametric method for estimating production frontiers and a useful tool for evaluating the relative efficiency of a group of DMUs. Over the roughly 30 years since Charnes et al. [2] first proposed the DEA method with the CCR model, DEA has been widely studied and applied in various areas. The main DEA models and their extensions include the BCC model [3], the additive model [4], and imprecise DEA models [5, 6]. Further modifications and extensions include assurance region models [7, 8], super-efficiency models [9, 10], and cone-ratio models [11, 12]. Stochastic and chance-constrained extensions have been considered by several authors [13–17], and a taxonomy and general model frameworks for DEA can be found in [18, 19]. The CCR model is the original DEA model (see the M1 model) and is used in this study to explain the DEA-based clustering approach.
The DEA model generalizes the usual output/input ratio measure of efficiency for a given unit in terms of a fractional linear programming formulation. Following the economic notion of Pareto optimality, a DMU is considered inefficient if some other DMU, or some combination of other DMUs, produces at least the same amount of every output with less of some input and not more of any other input. Conversely, a DMU is Pareto efficient if this is not possible. Suppose that there are $n$ DMUs to be evaluated, $x_{ij}$ is the observed amount of the $i$th input ($i=1,\dots,m$) for the $j$th DMU, and $y_{rj}$ is the observed amount of the $r$th output ($r=1,\dots,s$) for the $j$th DMU. The output multipliers are $u_r$ (one for each output) and the input multipliers are $v_i$ (one for each input). The mathematical formulation that determines the relative efficiency of $\text{DMU}_k$ is summarized in the M1 model [20].
M1 Model:
$$\max\ h_k=\frac{\sum_{r=1}^{s}u_r y_{rk}}{\sum_{i=1}^{m}v_i x_{ik}}\quad\text{s.t.}\quad\frac{\sum_{r=1}^{s}u_r y_{rj}}{\sum_{i=1}^{m}v_i x_{ij}}\le 1,\ \ j=1,\dots,n;\qquad u_r\ge 0,\ v_i\ge 0.$$
The DEA model is essentially a fractional programming problem: its objective is the ratio of a weighted sum of outputs to a weighted sum of inputs, and the weights for both inputs and outputs are selected to give the evaluated unit its highest possible efficiency. The original form of the DEA model is therefore a nonlinear, nonconvex problem. Charnes et al. [21] showed that this fractional programming problem can be transformed into linear programming formulations. The first formulation is "input based": it constrains the weighted sum of outputs to unity and minimizes the weighted sum of inputs. The second formulation is "output based": it constrains the weighted sum of inputs to unity and maximizes the weighted sum of outputs (see the M2 model). Under the constant returns to scale assumption, the result of the input-based model is the reciprocal of that of the output-based model; under variable returns to scale, no such direct relation holds between the two models.
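The key step of the linearization can be checked numerically. The sketch below (hypothetical data and weights; the helper names are ours) scales the weights by $t=1/\sum_i v_i x_{ik}$. Because the ratio is homogeneous of degree zero in $(u,v)$, the ratio is unchanged, while DMU $k$'s weighted input becomes 1, so maximizing the ratio reduces to maximizing its linear numerator, as in the output-based M2 formulation.

```python
def ratio_efficiency(u, v, y, x):
    """M1 objective for one DMU: weighted outputs over weighted inputs."""
    return sum(ur * yr for ur, yr in zip(u, y)) / sum(vi * xi for vi, xi in zip(v, x))


def normalize_weights(u, v, x_k):
    """Scale (u, v) by t = 1 / (v . x_k) so that DMU k's virtual input equals 1.

    The ratio is homogeneous of degree zero in (u, v), so every DMU's ratio is
    unchanged; with the denominator fixed at 1, maximizing the ratio reduces to
    maximizing the linear numerator.
    """
    t = 1.0 / sum(vi * xi for vi, xi in zip(v, x_k))
    return [t * ur for ur in u], [t * vi for vi in v]


# Hypothetical DMU k with two inputs and one output, and arbitrary weights.
x_k, y_k = [4.0, 2.0], [3.0]
u, v = [0.5], [0.25, 0.5]
u_n, v_n = normalize_weights(u, v, x_k)
print(ratio_efficiency(u, v, y_k, x_k), ratio_efficiency(u_n, v_n, y_k, x_k))
print(sum(vi * xi for vi, xi in zip(v_n, x_k)))  # virtual input after scaling
```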
In the clustering approach used in this study, the results for PSFs that are not on the production frontier can differ depending on whether the input-based or the output-based model is applied. The choice between them depends on the production process of the firm, that is, whether it aims to minimize the inputs used to produce a given output or to maximize the output obtained from given inputs. This study uses the M2 model to find, for each PSF, the set of output and input coefficients that gives the evaluated PSF its highest possible efficiency. Target values are then calculated from this model to eliminate the inefficiencies of some PSFs.
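M2 Model:
$$\max\ h_k=\sum_{r=1}^{s}u_r y_{rk}\quad\text{s.t.}\quad\sum_{i=1}^{m}v_i x_{ik}=1,\qquad\sum_{r=1}^{s}u_r y_{rj}-\sum_{i=1}^{m}v_i x_{ij}\le 0,\ \ j=1,\dots,n;\qquad u_r\ge 0,\ v_i\ge 0.$$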
DEA differs from the production theory of economics in that it is nonparametric. In economics, the production function summarizes the process of converting multiple inputs into a single output, so its general mathematical form can be expressed as $y=f(x_1,x_2,\dots,x_m)$, where $y$ is the quantity of output and $x_1,\dots,x_m$ are the quantities of inputs. DEA, by contrast, is a mathematical programming model for evaluating a process that converts multiple inputs into multiple outputs, that is, $(y_1,\dots,y_s)=f(x_1,\dots,x_m)$. Several previous studies have discussed the properties of the production functions hidden in DEA methods [8–10, 14, 15, 17, 22].
Because the number of DMUs is usually much larger than the number of inputs and outputs, the linear program is preferably expressed in its dual form, which has one constraint per input and output rather than one per DMU. The dual form also has a clear geometric interpretation and shows how much resources must be conserved, or outputs expanded, to move a DMU from inefficiency to efficiency.
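For reference, a standard form of the dual (envelopment) problem of the M2 model is sketched below, with $x_{ij}$ and $y_{rj}$ the $i$th input and $r$th output of DMU $j$ ($m$ inputs, $s$ outputs, $n$ DMUs). The symbols $\theta_k$, $\lambda_j$, $s_i^-$, and $s_r^+$ are our notation, assumed for illustration. By LP duality the optimal $\theta_k^*$ equals the optimal value of M2, and the slacks (commonly maximized in a second stage once $\theta_k^*$ is fixed) give the usual CCR projection, or target, of $\text{DMU}_k$.

```latex
\begin{aligned}
\min\quad & \theta_k\\
\text{s.t.}\quad & \sum_{j=1}^{n}\lambda_j x_{ij} + s_i^- = \theta_k x_{ik}, && i=1,\dots,m,\\
& \sum_{j=1}^{n}\lambda_j y_{rj} - s_r^+ = y_{rk}, && r=1,\dots,s,\\
& \lambda_j \ge 0,\ s_i^- \ge 0,\ s_r^+ \ge 0, && \theta_k \text{ free};\\[4pt]
\text{targets:}\quad & \hat{x}_{ik} = \theta_k^* x_{ik} - s_i^{-*}, && \hat{y}_{rk} = y_{rk} + s_r^{+*}.
\end{aligned}
```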
If $h_k^*$ is the optimal value of the M2 objective $h_k$, $\text{DMU}_k$ is said to be efficient if and only if $h_k^*=1$; if $h_k^*$ is less than 1, $\text{DMU}_k$ is inefficient. According to the efficiency ratio, DMUs may be grouped as good ($h_k^*=1$) and poor ($h_k^*<1$) performers or clustered by assigning different efficiency ratio grades [23–27]. Although clustering by efficiency ratio gives some information about the rationality of output/input, it does not reveal the intrinsic relationship between the input and output production features. Therefore, this study adopts piecewise production functions derived from the DEA method to cluster data.
In the M2 model, the constraint $\sum_{r=1}^{s}u_r y_{rj}-\sum_{i=1}^{m}v_i x_{ij}\le 0$ is clearly an inequality form of a production function. Solving the M2 model for $\text{DMU}_k$ yields the virtual multipliers $u_r^*$ and $v_i^*$, from which the production function $f_k:\ \sum_{r=1}^{s}u_r^* y_r-\sum_{i=1}^{m}v_i^* x_i=0$ is derived. Running the M2 model for every DMU gives all the production functions, and all DMUs are then classified into different clusters by these piecewise production functions. This implements a clustering method using production functions via DEA. Po et al. [1] observe that these production functions had received little attention as a reference for classifying the evaluated DMUs, so they propose a clustering approach, based on the properties of DEA and its production possibility set, that uses them in exactly this way. The details of the algorithm are given in their paper.
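To make the procedure concrete, here is a small, dependency-free Python sketch under several simplifying assumptions: M2 is solved by brute-force vertex enumeration (viable only for tiny data sets; a real application would use an LP solver), DMUs are grouped by their optimal multipliers scaled to unit sum, and alternative optima and degenerative frontiers with zero multipliers are not handled, unlike in the full algorithm of Po et al. [1]. The data are hypothetical: two inputs per unit of a single output, mimicking a per-unit-of-output layout. All function names are illustrative.

```python
from itertools import combinations


def solve(a, b):
    """Solve the square system a z = b by Gauss-Jordan elimination; None if singular."""
    n = len(a)
    m = [list(row) + [bi] for row, bi in zip(a, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        if abs(m[p][c]) < 1e-12:
            return None
        m[c], m[p] = m[p], m[c]
        for r in range(n):
            if r != c:
                f = m[r][c] / m[c][c]
                m[r] = [a_rc - f * a_cc for a_rc, a_cc in zip(m[r], m[c])]
    return [m[r][n] / m[r][r] for r in range(n)]


def ccr_multipliers(X, Y, k, tol=1e-9):
    """Solve M2 for DMU k by enumerating the vertices of its feasible region.

    Variables z = (u_1..u_s, v_1..v_m). Maximize sum_r u_r y_rk subject to
    sum_i v_i x_ik = 1, sum_r u_r y_rj - sum_i v_i x_ij <= 0 for all j, z >= 0.
    """
    s, m = len(Y[0]), len(X[0])
    d = s + m
    eq = [0.0] * s + [float(x) for x in X[k]]
    # Each inequality (row, rhs) means row . z <= rhs.
    ineq = [([float(y) for y in Y[j]] + [-float(x) for x in X[j]], 0.0)
            for j in range(len(X))]
    ineq += [([-1.0 if t == c else 0.0 for t in range(d)], 0.0) for c in range(d)]
    best, best_z = -1.0, None
    # A vertex makes d independent constraints tight: the equality plus d - 1 others.
    for active in combinations(ineq, d - 1):
        z = solve([eq] + [row for row, _ in active], [1.0] + [rhs for _, rhs in active])
        if z is None:
            continue
        feasible = all(sum(a * b for a, b in zip(row, z)) <= rhs + tol
                       for row, rhs in ineq)
        obj = sum(u * y for u, y in zip(z[:s], Y[k]))
        if feasible and obj > best + tol:
            best, best_z = obj, z
    return best, best_z[:s], best_z[s:]


def frontier_key(u, v, digits=4):
    """Scale (u, v) to unit sum, so DMUs projected onto the same facet share a key."""
    total = sum(u) + sum(v)
    return tuple(round(w / total, digits) for w in u + v)


def dea_clusters(X, Y):
    """Group DMUs by the frontier (normalized optimal multipliers) found for each."""
    clusters = {}
    for k in range(len(X)):
        h, u, v = ccr_multipliers(X, Y, k)
        clusters.setdefault(frontier_key(u, v), []).append((k, round(h, 4)))
    return clusters


# Hypothetical data: two inputs per unit of one output (output fixed at 1).
X = [(1, 4), (2, 2), (4, 1), (2, 4), (4, 2), (3, 6)]
Y = [(1,)] * len(X)
for key, members in dea_clusters(X, Y).items():
    print(key, members)
```

With these data, DMUs 3 and 5 lie on the same ray, project onto the same facet, and therefore share a cluster key, while DMU 4 projects onto a different facet. The efficient DMUs 0 to 2 have non-unique optimal multipliers, so their keys depend on which optimal vertex the enumeration meets first; this sketch does not try to resolve that ambiguity.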
3. DEA-Based Clustering of PSF
In this study, we have an efficiency evaluation problem with 25 PSFs (DMUs), each with three inputs and one output taken from the 2009–2010 annual activity reports. Data on the actually processed beet quantity (PBQ), fuel consumption (FC), total number of personnel (TP), sugar production (SP), and molasses production (MP) are given in the annual activity reports of the PSFs, and all of them are actual reported figures. PBQ, FC, and TP are considered as inputs. Only SP is selected as the output, because MP is correlated with it.
Table 1 shows the simplified production data of the PSFs, that is, the quantity of each input required to produce one unit (one metric ton) of sugar. For example, according to Table 1, PSF 22 uses 9.96 t of beet, 0.419 t of fuel, and 0.0213 personnel to produce one ton of sugar.
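The per-ton layout of Table 1 can be obtained from annual totals by dividing each input by sugar production, as sketched below. The annual totals are invented for illustration, chosen only so that the ratios reproduce the PSF 22 row quoted above; they are not reported data. Under the constant-returns-to-scale assumption of the CCR model, dividing a factory's inputs and output by the same positive number does not change its efficiency score, so this normalization is harmless there, and it gives every PSF an output of 1.

```python
# Hypothetical annual totals for one factory (NOT reported data), chosen so the
# per-ton ratios reproduce the PSF 22 row quoted in the text.
annual = {"PBQ": 298_800.0, "FC": 12_570.0, "TP": 639.0, "SP": 30_000.0}


def per_ton_of_sugar(record, output="SP"):
    """Divide every input by sugar production, so the output becomes 1 ton."""
    return {name: value / record[output]
            for name, value in record.items() if name != output}


print(per_ton_of_sugar(annual))
```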

The objective is to find the set of coefficients $u_r$ associated with each output and $v_i$ associated with each input that gives the evaluated PSF its highest possible efficiency. Solving the M2 model for each PSF yields its efficiency ratio and the virtual multipliers $u_r^*$ and $v_i^*$. The input multipliers $v_i^*$ measure the relative increase in efficiency for each unit reduction of the corresponding input, whereas $u_r^*$ measures the relative decrease in efficiency for each unit reduction of the output.
The analytical results are shown in Table 2.
 
*They are reevaluated by , because of zero virtual multipliers. **They are reevaluated by because of zero virtual multipliers. ***They are reevaluated by because of zero virtual multipliers. 
Selecting the sets of virtual multipliers that are all nonzero yields four frontiers of production functions (PFs), PF1 to PF4.
The PSFs marked (*), (**), and (***) in Table 2 confront a degenerative frontier, that is, one with zero virtual multipliers. Po et al. [1] suggest reclassifying such units into the nearest effective frontier (a frontier with all virtual multipliers nonzero). In this application, the PSFs marked (*) can be seen by inspection to confront the nearest effective frontier, so their efficiency ratios are reevaluated by that frontier. In more complicated applications with more input and output items, however, the nearest effective frontier cannot be identified by inspection. Hence, for PSF 7 (Carsamba), we follow the procedure of Po et al. [1]: the input and output values of PSF 7 are substituted into PF1, PF2, PF3, and PF4, and the ratio $\sum_r u_r^{(l)} y_{r7}/\sum_i v_i^{(l)} x_{i7}$ is calculated for each frontier $l$, where $u_r^{(l)}$ and $v_i^{(l)}$ are the multipliers of frontier $l$. Taking the maximal value, the efficiency ratio of PSF 7 is reevaluated as 0.6307, and PSF 7 is classified into the cluster determined by the corresponding envelope, PF1.
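The reevaluation step can be sketched as follows. Because every effective frontier satisfies the M2 constraints, the ratio of any observed DMU against it is at most 1, and the largest ratio identifies the frontier closest to the DMU in this ratio sense. The three frontiers and the DMU below are hypothetical (inputs PBQ, FC, TP per ton of sugar); they are not the PF1 to PF4 estimated in this study.

```python
def frontier_ratio(u, v, y, x):
    """Ratio sum_r u_r y_r / sum_i v_i x_i of a DMU against one frontier (u, v)."""
    return sum(a * b for a, b in zip(u, y)) / sum(a * b for a, b in zip(v, x))


def reevaluate(frontiers, y, x):
    """Return (index, ratio) of the effective frontier giving the maximal ratio."""
    ratios = [frontier_ratio(u, v, y, x) for u, v in frontiers]
    best = max(range(len(ratios)), key=ratios.__getitem__)
    return best, ratios[best]


# Hypothetical effective frontiers (u, v); NOT the PF1-PF4 of this study.
frontiers = [([1.0], [0.08, 0.4, 5.0]),
             ([1.0], [0.05, 1.0, 10.0]),
             ([1.0], [0.09, 0.2, 2.5])]
y, x = [1.0], [10.0, 0.5, 0.02]
print(reevaluate(frontiers, y, x))
```

Here the third frontier gives the largest ratio (1/1.05), so the DMU would be reevaluated by, and clustered with, that frontier.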
In this study, PSFs that achieve 100 percent efficiency are referred to as relatively efficient units, and those with efficiency ratings below 100 percent as inefficient units. According to Table 2, there are six efficient PSFs (PSF1, PSF8, PSF11, PSF14, PSF21, and PSF25) and 19 inefficient ones, 5 of which have efficiency ratios of at least 0.95. The DEA results are also consistent with the net revenues of the PSFs. Based on the four production functions, the 25 PSFs are classified into four clusters, as shown in Table 3.
According to PF1, PF2, and PF3, TP is the most critical input for the PSFs in clusters 1, 2, and 3, because the multiplier of TP has the largest value in these clusters. For the inefficient PSFs in clusters 1 and 3, the relative increase in efficiency is 5.36 for each unit reduction of TP; for those in cluster 2, it is 1.415 (see Table 3). The ordering of the multipliers of the other inputs varies; for example, for the PSFs in cluster 2, the multipliers of FC and TP are similar and higher than that of PBQ. In contrast, FC is the most critical input for the PSFs in cluster 4, where the relative increase in efficiency of the inefficient PSFs is 1.018 for each ton reduction of FC (see Table 3). The DEA-based and geography-based clustering results are compared in Table 4.
 
*These PSFs are not included in any geography-based portfolio under the government policy.
As Table 4 shows, the DEA-based clusters cut across the geography-based portfolio groups: cluster 1 contains PSFs from groups E, C, and D; cluster 2 from A, B, and D; cluster 3 from C; and cluster 4 from A, B, and E. The geography-based portfolio groups may therefore not be helpful for clustering the PSFs. From the derived production functions PF1, PF2, PF3, and PF4, all PSFs are classified into four different clusters, so each PSF knows the type of production function it confronts and can compare its production feature with the other production functions to readjust its combination of input resources, or of inputs and outputs. For data with input and output items, clusters derived from production functions are thus more valuable than the geography-based portfolio groups, and they point to ways of eliminating inefficiencies. For example, inefficient PSFs in clusters 1, 2, and 3 should give priority to reducing the total number of personnel, whereas inefficient PSFs in cluster 4 should first reduce fuel consumption. It is therefore more meaningful to support privatization decisions with the DEA-based clustering results than with the geography-based portfolio groups.
In addition, the input target values for the inefficient PSFs are calculated from the slack variables of the dual of the M2 model and are given in Table 5. Target values can help decision makers eliminate inefficiencies. For example, PSF 22 uses 9.96 t of beet, 0.419 t of fuel, and 0.0213 personnel to produce one ton of sugar; it would become efficient if it used 6.477 t of beet, 0.273 t of fuel, and 0.0014 personnel per ton of sugar.
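The size of the adjustment implied by the PSF 22 targets can be computed directly from the figures quoted above, as in the sketch below. The helper `ccr_input_target` shows the usual CCR input projection $\hat{x}_i=\theta^* x_i-s_i^-$; that formula, and the $\theta$ and slack values passed to it, are assumptions for illustration rather than values reported in this study.

```python
# Per-ton inputs and targets for PSF 22 as quoted in the text.
current = {"PBQ": 9.96, "FC": 0.419, "TP": 0.0213}
target = {"PBQ": 6.477, "FC": 0.273, "TP": 0.0014}


def pct_reduction(now, goal):
    """Percentage cut needed to move an input from its current level to its target."""
    return 100.0 * (now - goal) / now


reductions = {name: round(pct_reduction(current[name], target[name]), 2)
              for name in current}
print(reductions)


def ccr_input_target(theta, x, slack):
    """Usual CCR input projection x_hat_i = theta * x_i - s_i^- (assumed form)."""
    return [theta * xi - si for xi, si in zip(x, slack)]


# Hypothetical theta and slacks, for illustration only.
print(ccr_input_target(0.65, [10.0, 0.4, 0.02], [0.0, 0.0, 0.01]))
```

The near-equal cuts in beet and fuel (about 35%) are consistent with a radial contraction, with the much larger personnel cut presumably coming from a nonzero input slack; this reading is an inference from the quoted numbers, not a statement from the text.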

4. Conclusions
This study applies a DEA-based clustering approach to the evaluation of PSFs. The approach employs the piecewise production functions derived from the DEA method to cluster data with input and output items. Compared with geography-based clustering, which considers only the geographical location of the PSFs, the DEA-based approach reveals the input-output relationships hidden in the data. Thus, for each evaluated PSF, we know not only the cluster it belongs to but also the type of production function it confronts. This is important for managerial decision making, because decision makers want to know what changes in the combination of input resources would move a PSF into a different, desired cluster during the privatization process.
This paper focuses on the CCR model of DEA and the DEA-based clustering built on it. Although the approach has been carried out here with the CCR model, it can be extended to other DEA models. The clustering results of the DEA-based approach are unit invariant, that is, they are not affected by the units in which the data are measured.
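A short sketch of why CCR ratios, and hence the clusters, are unit invariant: if input $i$ is re-expressed in a unit $c$ times smaller (so its values are multiplied by $c$), dividing the multiplier $v_i$ by $c$ leaves every virtual input $v_i x_{ij}$, and therefore every ratio and constraint of M1, unchanged, so the optimal efficiency is the same. The data and weights below are hypothetical.

```python
def ratio(u, v, y, x):
    """CCR ratio sum_r u_r y_r / sum_i v_i x_i."""
    return sum(a * b for a, b in zip(u, y)) / sum(a * b for a, b in zip(v, x))


# Hypothetical per-ton data (PBQ t, FC t, TP persons) and weights.
x, y = [9.96, 0.419, 0.0213], [1.0]
u, v = [1.0], [0.06, 0.6, 8.0]

# Measure beet in kilograms instead of tons and divide its weight by 1000:
# the virtual input v_1 * x_1 is unchanged, so the ratio is unchanged too.
x_kg = [x[0] * 1000.0] + x[1:]
v_kg = [v[0] / 1000.0] + v[1:]
print(ratio(u, v, y, x), ratio(u, v_kg, y, x_kg))
```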
The DEA-based clustering approach is suitable for clustering problems in which there are input-output or cause-and-effect relationships between the features, such as the classification of the PSFs in this study by their input-output data.
In summary, given its advantages, the DEA-based clustering approach is well suited to such clustering problems, and further research is needed to realize its full potential in various clustering applications. The DEA-based clustering algorithm developed by Po et al. [1] is robust to slight changes in the input and output data but not to outliers; future research will consider developing a robust DEA-based clustering algorithm.
References
[1] R. W. Po, Y. Y. Guh, and M. S. Yang, "A new clustering approach using data envelopment analysis," European Journal of Operational Research, vol. 199, no. 1, pp. 276–284, 2009.
[2] A. Charnes, W. W. Cooper, and E. Rhodes, "Measuring the efficiency of decision making units," European Journal of Operational Research, vol. 2, no. 6, pp. 429–444, 1978.
[3] R. D. Banker, A. Charnes, and W. W. Cooper, "Some models for estimating technical and scale inefficiencies in data envelopment analysis," Management Science, vol. 30, no. 9, pp. 1078–1092, 1984.
[4] A. Charnes, W. W. Cooper, B. Golany, L. Seiford, and J. Stutz, "Foundations of data envelopment analysis for Pareto-Koopmans efficient empirical production functions," Journal of Econometrics, vol. 30, no. 1-2, pp. 91–107, 1985.
[5] W. W. Cooper, K. S. Park, and G. Yu, "IDEA and AR-IDEA: models for dealing with imprecise data in DEA," Management Science, vol. 45, no. 4, pp. 597–607, 1999.
[6] J. Zhu, "Imprecise data envelopment analysis (IDEA): a review and improvement with an application," European Journal of Operational Research, vol. 144, no. 3, pp. 513–529, 2003.
[7] R. G. Thompson, R. D. Singleton, R. M. Thrall, and B. A. Smith, "Comparative site evaluations for locating a high energy physics laboratory in Texas," Interfaces, vol. 16, no. 6, pp. 35–49, 1986.
[8] S. H. Zanakis, C. Alvarez, and V. Li, "Socio-economic determinants of HIV/AIDS pandemic and nations efficiencies," European Journal of Operational Research, vol. 176, no. 3, pp. 1811–1838, 2007.
[9] P. Andersen and N. C. Petersen, "A procedure for ranking efficient units in data envelopment analysis," Management Science, vol. 39, no. 10, pp. 1261–1264, 1993.
[10] S. Li, G. R. Jahanshahloo, and M. Khodabakhshi, "A super-efficiency model for ranking efficient units in data envelopment analysis," Applied Mathematics and Computation, vol. 184, no. 2, pp. 638–648, 2007.
[11] A. Charnes, W. W. Cooper, Q. L. Wei, and Z. M. Huang, "Cone ratio data envelopment analysis and multi-objective programming," International Journal of Systems Science, vol. 20, no. 7, pp. 1099–1118, 1989.
[12] A. Charnes, W. W. Cooper, Z. M. Huang, and D. B. Sun, "Polyhedral cone-ratio DEA models with an illustrative application to large commercial banks," Journal of Econometrics, vol. 46, no. 1-2, pp. 73–91, 1990.
[13] K. C. Land, C. A. K. Lovell, and S. Thore, "Productivity and efficiency under capitalism and state socialism: an empirical inquiry using chance-constrained data envelopment analysis," Technological Forecasting and Social Change, vol. 46, no. 2, pp. 139–152, 1994.
[14] O. B. Olesen and N. C. Petersen, "Chance constrained efficiency evaluation," Management Science, vol. 41, no. 3, pp. 442–457, 1995.
[15] W. W. Cooper, Z. Huang, and S. X. Li, "Satisficing DEA models under chance constraints," Annals of Operations Research, vol. 66, no. 4, pp. 279–295, 1996, Extensions and new developments in data envelopment analysis.
[16] R. Lahdelma and P. Salminen, "Stochastic multicriteria acceptability analysis using the data envelopment model," European Journal of Operational Research, vol. 170, no. 1, pp. 241–252, 2005.
[17] W. W. Cooper, H. Deng, Z. Huang, and S. X. Li, "Chance constrained programming approaches to technical efficiencies and inefficiencies in stochastic data envelopment analysis," Journal of the Operational Research Society, vol. 53, no. 12, pp. 1347–1356, 2002.
[18] S. Gattoufi, M. Oral, and A. Reisman, "A taxonomy for data envelopment analysis," Socio-Economic Planning Sciences, vol. 38, no. 2-3, pp. 141–158, 2004.
[19] A. Kleine, "A general model framework for DEA," Omega: The International Journal of Management Science, vol. 32, no. 1, pp. 17–23, 2004.
[20] W. W. Cooper, L. M. Seiford, and K. Tone, Data Envelopment Analysis: A Comprehensive Text, with Models, Applications, References and DEA-Solver Software, Springer, New York, NY, USA, 2nd edition, 2007.
[21] A. Charnes, W. W. Cooper, and E. Rhodes, "Evaluating program and managerial efficiency: an application of data envelopment analysis to program follow through," Management Science, vol. 27, no. 6, pp. 668–697, 1981.
[22] W. W. Cooper, J. L. Ruiz, and I. Sirvent, "Choosing weights from alternative optimal solutions of dual multiplier models in DEA," European Journal of Operational Research, vol. 180, no. 1, pp. 443–458, 2007.
[23] G. Yu, Q. Wei, P. Brockett, and L. Zhou, "Construction of all DEA efficient surfaces of the production possibility set under the Generalized Data Envelopment Analysis Model," European Journal of Operational Research, vol. 95, no. 3, pp. 491–510, 1996.
[24] R. G. Thompson, E. J. Brinkmann, P. S. Dharmapala, M. D. Gonzalez-Lima, and R. M. Thrall, "DEA/AR profit ratios and sensitivity of 100 large US banks," European Journal of Operational Research, vol. 98, no. 2, pp. 213–229, 1997.
[25] G. R. Jahanshahloo, F. H. Lotfi, N. Shoja, G. Tohidi, and S. Razavyan, "A one-model approach to classification and sensitivity analysis in DEA," Applied Mathematics and Computation, vol. 169, no. 2, pp. 887–896, 2005.
[26] A. Bick, F. Yang, S. Shandalov, and G. Oron, "Data envelopment analysis for assessing optimal operation of an immersed membrane bioreactor equipped with a draft tube for domestic wastewater reclamation," Desalination, vol. 204, no. 1–3, pp. 17–23, 2007.
[27] W. D. Cook and K. Bala, "Performance measurement and classification data in DEA: input-oriented model," Omega: The International Journal of Management Science, vol. 35, no. 1, pp. 39–52, 2007.
Copyright
Copyright © 2011 Ezgi A. Demirtas. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.