A Data Envelopment-Based Clustering Approach for Public Sugar Factories in Privatizing Process
Turkish Sugar Inc., a public enterprise comprising 25 factories, is the first corporation of Turkish industry. According to government policy, its public sugar factories (PSFs) will be privatized within two years as six geography-based portfolio groups. Because the performance measures of the PSFs affect the government, sugar producers, and several unions in the privatization process, a systematic approach is needed to measure the efficiencies of the factories and to group them. This paper uses a new DEA- (Data Envelopment Analysis-) based clustering approach to measure the efficiency scores of the PSFs and to group them, as an alternative to the geography-based portfolio groups. This approach can help decision makers in the privatization process. In addition, the target values obtained from the dual model can be used to eliminate the inefficiencies of some PSFs.
1. Introduction
Sugar factories are the first corporations of Turkish industry. The first sugar factory was established at the direction of Kemal Ataturk in Alpullu in 1926. Turkey's annual sugar demand of 2.3 million tons is supplied by three groups of producers: Turkish Sugar Inc., a public enterprise comprising 25 factories; Pankobirlik (the beet producers' union) with 6 factories; and starch-based sugar producers with 5 factories. Their market shares are 70%, 20%, and 10%, respectively. Turkish Sugar Inc. and Pankobirlik produce sugar from beet rather than starch.
According to government policy, the sugar factories of Turkish Sugar Inc. will be privatized within two years as six geography-based portfolio groups: A (Kars, Ercis, Agri, Mus, and Erzurum), B (Elazig, Malatya, Erzincan, and Elbistan), C (Kastamonu, Kirsehir, Turhal, Yozgat, Corum, and Carsamba), D (Bor, Eregli, and Ilgin), and E (Usak, Alpullu, Burdur, and Afyon). Several Turkish and foreign corporations aspire to buy sugar factories, among them Pankobirlik, Keskinkilic, and Torunlar (Turkish), Südzucker (German), Saint Louis Sucre (French), and British Sugar.
According to the supporters of privatization, the average production season is shorter in Turkey (64 days) than in the EU (120 days), and the average number of personnel per factory is 500 in Turkey against 200 in the EU, which is too large in comparison with Europe. In addition, the use of uneconomic beets with low polar sugar value brings high costs. According to the opponents, the reasons for high prices are the high starch-based sugar quota (15% in Turkey and 2% in the EU) and outdated technology; if the starch-based sugar quota is decreased and new technology is used, the PSFs will become profitable. The opponents also argue that, to prevent external dependency and rural-urban migration and to protect the sector, the Council of State should stay the execution of the privatization decision. In the end, the Council of State stayed the execution for the following reasons: (i) the specification contains a 5-year production obligation and a 50-million-dollar assurance, (ii) it does not guarantee supply-demand balance and stability, and (iii) it does not guarantee production sustainability and creates external dependency.
For the reasons mentioned above, performance measurement of the PSFs is an important task that affects the government, sugar producers, and several unions in the privatization process. A systematic approach is necessary to measure the efficiencies of the PSFs. This paper is the first real-life application of the DEA-based clustering approach developed by Po et al. for measuring the efficiency scores of the PSFs and grouping them, as an alternative to the geography-based portfolio groups. This approach can help decision makers in the privatization process. In addition, the target values obtained from the DEA model can be used to eliminate inefficiencies and to make inefficient factories profitable. To the best of the author's knowledge, there is no scientific study on the efficiency measurement of public sugar factories in our country or elsewhere.
The rest of this paper is organized as follows. Section 2 discusses DEA and the DEA-based clustering approach developed by Po et al., focusing on why and how piecewise production functions drawn from DEA models are employed to cluster data. Section 3 illustrates the DEA-based clustering approach for measuring the efficiency scores of the PSFs and grouping them to help decision makers in the privatization process. The results obtained by DEA-based clustering are compared with the geography-based portfolio groups, and target values obtained by the CCR model are given to eliminate the inefficiencies of some PSFs. Finally, conclusions are stated in Section 4.
2. DEA-Based Clustering Approach
Conventionally, most clustering algorithms are procedures that minimize total dissimilarity; examples of such algorithms are given in the paper of Po et al.
A general clustering method finds cluster centers such that a total dissimilarity measure is minimized. This measure is usually defined as a distance-based function, and the problem is to select a useful and reasonable distance measure.
On the other hand, these clustering approaches can be seen as a feature analysis technique. The underlying assumption is that the data items of each decision-making unit (DMU) are treated as multiple features, so that minimizing the total dissimilarity brings together DMUs whose features are close and makes it more likely that they are classified into the same cluster. However, the clustering results derived from minimizing the total feature dissimilarity may not be helpful in some cases of clustering DMUs, especially production units. In these cases, we use their production data to cluster them. Suppose that the production data consist of several feature items, some of them being input items and the others output items. The clustering information obtained from conventional clustering approaches can only reveal which DMUs are more similar to one another. However, the more important information is the production feature, that is, the production functions implied by the production data of all DMUs. From these derived production functions, all DMUs are classified into different clusters, one per production function. Therefore, each DMU knows not only the cluster that it belongs to but also the type of production function that it confronts. Each DMU can compare its production feature with the other production functions so that the combination of its input resources, or the combination of inputs and outputs, can be readjusted. That is, for data with input and output items, the clusters derived from production functions are more valuable than those derived from feature dissimilarity measures.
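As a minimal sketch of this objective, the snippet below assigns each point to its nearest center and sums the resulting distances. The Euclidean distance, the sample points, and the function names are illustrative assumptions; the text does not fix a particular distance measure.

```python
import math


def assign_to_nearest(points, centers):
    """Return, for each point, the index of the closest cluster center."""
    return [
        min(range(len(centers)), key=lambda k: math.dist(p, centers[k]))
        for p in points
    ]


def total_dissimilarity(points, centers, labels):
    """Sum of Euclidean distances from each point to its assigned center."""
    return sum(math.dist(p, centers[k]) for p, k in zip(points, labels))


# Hypothetical two-feature data and two candidate centers.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
centers = [(0.0, 1.0), (10.0, 1.0)]

labels = assign_to_nearest(points, centers)
print(labels, total_dissimilarity(points, centers, labels))
```

A conventional clustering algorithm would search over the centers to make this total as small as possible.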
The idea of Po et al.'s study is to employ production functions to cluster production data. The method supporting this idea is DEA, as initiated and developed by Charnes et al. DEA is a data-oriented method for evaluating the relative efficiency of DMUs, where each DMU is an entity responsible for converting multiple inputs into multiple outputs. Since DEA uses a nonparametric mathematical programming approach to estimate piecewise frontiers that envelop the DMU data set, each piecewise frontier is regarded as one cluster of production functions, and all piecewise frontiers are used as the basis for clustering the production data. That is, Po et al. give up the traditional clustering approaches based on feature dissimilarity and propose a new approach that uses the production functions revealed by the observed data to cluster all DMUs.
DEA is a nonparametric method for estimating production frontiers and a useful tool for evaluating the relative efficiency of a group of DMUs. It has been widely studied and applied in various areas for 30 years, since Charnes et al. first proposed the DEA method with the CCR model. The main forms of DEA models and their extensions include the BCC model, the additive model, and the imprecise DEA models [5, 6]. Modifications and extensions include the assurance region models [7, 8], superefficiency models [9, 10], and cone ratio models [11, 12]. Stochastic and chance-constrained extensions are considered by some authors [13–17]. Taxonomies and general model frameworks for DEA can be found in [18, 19]. The CCR model is the original DEA model (see the M1 model) and is used in this study to explain the DEA-based clustering approach.
The DEA model generalizes the usual input/output ratio measure of efficiency for a given unit in terms of a fractional linear program formulation. According to the economic notion of Pareto optimality, a DMU is considered inefficient if some other DMU, or some combination of other DMUs, produces at least the same amount of output with less of some resource input and not more of any other resource. Conversely, a DMU is considered Pareto efficient if this is not possible. Suppose that there are $n$ DMUs to be evaluated, $x_{ij}$ is the observed amount of the $i$th input for the $j$th DMU, and $y_{rj}$ is the observed amount of the $r$th output for the $j$th DMU. The output multipliers are $u_r$ (one for each output item) and the input multipliers are $v_i$ (one for each input item). The mathematical formulation that determines the relative efficiency of $\mathrm{DMU}_k$ is the M1 model.
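For reference, the ratio form of the CCR model described above is commonly written as follows. The notation is an assumption based on the standard DEA literature rather than on this text: $h_k$ denotes the efficiency of the evaluated $\mathrm{DMU}_k$, $m$ and $s$ are the numbers of inputs and outputs, $x_{ij}$ and $y_{rj}$ are the $i$th input and $r$th output of the $j$th DMU, $v_i$ and $u_r$ are the input and output multipliers, and $\varepsilon$ is a small positive constant.

```latex
\begin{aligned}
\max_{u,\,v}\quad & h_k = \frac{\sum_{r=1}^{s} u_r\, y_{rk}}{\sum_{i=1}^{m} v_i\, x_{ik}} \\
\text{s.t.}\quad & \frac{\sum_{r=1}^{s} u_r\, y_{rj}}{\sum_{i=1}^{m} v_i\, x_{ij}} \le 1, \qquad j = 1, \ldots, n, \\
& u_r \ge \varepsilon > 0, \quad r = 1, \ldots, s, \qquad v_i \ge \varepsilon > 0, \quad i = 1, \ldots, m.
\end{aligned}
```

Each DMU is free to choose the multipliers that make it look as efficient as possible, subject to no DMU exceeding a ratio of one under the same multipliers.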
The M1 model is essentially a fractional programming problem: a ratio of a weighted sum of outputs to a weighted sum of inputs, where the weights for both inputs and outputs are selected so as to calculate the efficiency of the evaluated unit. Therefore, the original form of the DEA model is both nonlinear and nonconvex. Charnes et al. proved that this fractional programming problem can be transformed into linear programming formulations. The first formulation is “input based”: it constrains the weighted sum of outputs to be unity and minimizes the weighted sum of inputs. The second formulation is “output based”: it constrains the weighted sum of inputs to be unity and maximizes the weighted sum of outputs (see the M2 model). Under the constant returns to scale assumption, the result of the input-based model is the reciprocal of that of the output-based model. If variable returns to scale are assumed, no direct relation can be found between the two models.
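The second linear formulation described above (weighted inputs normalized to one, weighted outputs maximized) is conventionally written as below. This is a sketch in standard CCR notation, which is assumed rather than taken from this text: $x_{ij}$ and $y_{rj}$ are the $i$th input and $r$th output of the $j$th DMU, $v_i$ and $u_r$ are the corresponding multipliers, $\mathrm{DMU}_k$ is the unit being evaluated, and $\varepsilon$ is a small positive constant.

```latex
\begin{aligned}
\max_{u,\,v}\quad & h_k = \sum_{r=1}^{s} u_r\, y_{rk} \\
\text{s.t.}\quad & \sum_{i=1}^{m} v_i\, x_{ik} = 1, \\
& \sum_{r=1}^{s} u_r\, y_{rj} - \sum_{i=1}^{m} v_i\, x_{ij} \le 0, \qquad j = 1, \ldots, n, \\
& u_r \ge \varepsilon, \quad r = 1, \ldots, s, \qquad v_i \ge \varepsilon, \quad i = 1, \ldots, m.
\end{aligned}
```

Fixing the denominator of the ratio to one removes the fraction, so the problem becomes an ordinary linear program that is solved once for each DMU.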
For the clustering approach used in this study, the results for PSFs that are not on the production frontier can differ depending on whether the input-based or the output-based model is applied. The choice between the two depends on the production process characterizing the firm (i.e., minimizing the use of inputs to produce a given output, or maximizing the output with given levels of inputs). The objective of this study is to find, by using the M2 model, the set of coefficients associated with each output and input that gives the PSF being evaluated the highest possible efficiency. Target values are then calculated with this model to eliminate the inefficiencies of some PSFs.
DEA differs from the production theory of economics in that it is nonparametric. In economics, the production function summarizes the process of converting multiple inputs into a single output. Thus, a general mathematical form of the production function can be expressed as $y = f(x_1, x_2, \ldots, x_m)$, where $y$ is the quantity of output and $x_1, x_2, \ldots, x_m$ are the quantities of inputs. DEA, however, is a mathematical programming model for evaluating a process that converts multiple inputs into multiple outputs. Most previous studies have mentioned and discussed the properties of the production function that are hidden in DEA methods [8–10, 14, 15, 17, 22].
Since the number of DMUs is usually much larger than the number of inputs and outputs, we prefer to express the linear program in its dual form. Further, the dual form reveals the geometric meaning of DEA and provides information about the conservation of resources or the expansion of outputs needed to move DMUs from inefficiency to efficiency.
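As an illustration, the dual (envelopment) form of the input-normalized CCR model is commonly written as follows. All symbols are assumptions in standard notation, not taken from this text: $x_{ij}$ and $y_{rj}$ are the $i$th input and $r$th output of the $j$th DMU, $\theta$ is the efficiency score of the evaluated $\mathrm{DMU}_k$, $\lambda_j$ are the weights on the $n$ observed DMUs, $s_i^-$ and $s_r^+$ are the input and output slacks, and $\varepsilon$ is a small positive constant.

```latex
\begin{aligned}
\min_{\theta,\,\lambda,\,s}\quad & \theta - \varepsilon \left( \sum_{i=1}^{m} s_i^- + \sum_{r=1}^{s} s_r^+ \right) \\
\text{s.t.}\quad & \sum_{j=1}^{n} \lambda_j\, x_{ij} + s_i^- = \theta\, x_{ik}, \qquad i = 1, \ldots, m, \\
& \sum_{j=1}^{n} \lambda_j\, y_{rj} - s_r^+ = y_{rk}, \qquad r = 1, \ldots, s, \\
& \lambda_j \ge 0, \quad s_i^- \ge 0, \quad s_r^+ \ge 0.
\end{aligned}
```

Under this form, the input and output targets of an inefficient DMU are usually taken as $\hat{x}_{ik} = \theta^* x_{ik} - s_i^{-*}$ and $\hat{y}_{rk} = y_{rk} + s_r^{+*}$, which is the sense in which the dual gives information about conserving resources or expanding outputs.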
A DMU is said to be efficient if and only if the optimal value of its efficiency ratio equals 1; if the optimal value is less than 1, the DMU is inefficient. According to the efficiency ratio, DMUs may be grouped as good (ratio equal to 1) and poor (ratio less than 1) performers, or clustered by assigning different efficiency ratio grades [23–27]. Although clustering by efficiency ratio gives some information about the rationality of output/input, it does not reveal the intrinsic relationship between the input and output production features. Therefore, this study adopts piecewise production functions derived from the DEA method to cluster data.
In the M2 model, the constraint $\sum_{r} u_r y_{rj} - \sum_{i} v_i x_{ij} \le 0$ is an inequality form of a production function. Solving the M2 model yields the virtual multipliers $u_r^*$ (outputs) and $v_i^*$ (inputs), from which the production function $\sum_{r} u_r^* y_r - \sum_{i} v_i^* x_i = 0$ is derived. Running the M2 model for $k = 1$ to $n$ gives all production functions. All DMUs are then classified into different clusters by these piecewise production functions, which implements a clustering method using production functions via DEA. Po et al. observe that little attention has been paid to using these production functions as a reference for classifying the evaluated DMUs, and they propose a clustering approach, based on the properties of DEA and its production possibility set, that uses them for this purpose. The details of the algorithm are given in their paper.
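The classification step can be sketched as grouping DMUs whose optimal multipliers describe the same production function. The code below is a simplified illustration, not the algorithm of Po et al.: the multiplier values are hypothetical, and it assumes that two multiplier vectors belong to the same frontier when they are proportional, since the normalization in the linear program depends on the DMU being evaluated.

```python
import math
from collections import defaultdict


def normalize(weights, ndigits=6):
    """Scale a multiplier vector to unit length so proportional vectors compare equal."""
    norm = math.sqrt(sum(w * w for w in weights))
    return tuple(round(w / norm, ndigits) for w in weights)


def cluster_by_frontier(multipliers):
    """Group DMUs whose (v_1, ..., v_m, u_1, ..., u_s) vectors define the same hyperplane."""
    clusters = defaultdict(list)
    for dmu, weights in multipliers.items():
        clusters[normalize(weights)].append(dmu)
    return dict(clusters)


# Hypothetical optimal multipliers (v1, v2, u) for five DMUs.
multipliers = {
    "DMU1": (0.50, 0.25, 1.0),
    "DMU2": (1.00, 0.50, 2.0),   # proportional to DMU1: same frontier
    "DMU3": (0.10, 0.80, 1.0),
    "DMU4": (0.05, 0.40, 0.5),   # proportional to DMU3: same frontier
    "DMU5": (0.50, 0.25, 1.0),
}

clusters = cluster_by_frontier(multipliers)
members = sorted(sorted(group) for group in clusters.values())
print(members)
```

Each key of `clusters` plays the role of one piecewise production function, and the DMUs listed under it form one cluster.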
3. DEA-Based Clustering of PSF
In this study, we have an efficiency evaluation problem with 25 PSFs (DMUs), each with three inputs and one output obtained from the 2009-2010 annual activity reports. The reports give the actually processed beet quantity (PBQ), fuel consumption (FC), number of total personnel (TP), sugar production (SP), and molasses production (MP) of each PSF, and all figures are actual reported values. PBQ, FC, and TP are considered as inputs. Only SP is selected as the output because MP is correlated with it.
The simplified production data of the PSFs are shown in Table 1. This table shows the quantity of each input required to produce one unit (one metric ton) of sugar. For example, according to Table 1, PSF 22 uses 9.96 tons of beet, 0.419 tons of fuel, and 0.0213 personnel to produce one ton of sugar.
The objective is to find the set of coefficients $u_r$ associated with each output and $v_i$ associated with each input that gives the PSF being evaluated the highest possible efficiency. By using the M2 model for each PSF, its efficiency ratio and the solution of virtual multipliers are obtained. The multipliers $v_i$ measure the relative increase in efficiency with each unit reduction of input value, whereas $u_r$ measures the relative decrease in efficiency with each unit reduction of output value.
The analytical results are shown in Table 2.
By selecting the set of virtual multipliers to be all nonzero, four frontiers of production functions (PFs) are found:
PSFs marked with (*), (**), or (***) in Table 2 confront a degenerate frontier. Po et al. suggest that they should be reclassified into the nearest effective frontier (a frontier with nonzero virtual multipliers). In this application, the nearest effective frontier for the PSFs marked with (*) can be identified by observation, so their efficiency ratios are re-evaluated on that frontier. However, in complicated applications (with more input and output data items), it is impossible to judge the nearest effective frontier by observation. Hence, for PSF 7 (Carsamba), we follow the procedure of Po et al.: its input and output values are substituted into (PF1), (PF2), (PF3), and (PF4) in turn, and the efficiency value implied by each frontier is calculated. By taking the maximal value, the efficiency ratio of PSF 7 is re-evaluated as 0.6307, and PSF 7 is classified into the cluster determined by the corresponding envelope (PF1).
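The re-evaluation step can be sketched as follows: the unit's data are scored against every effective frontier, and the frontier giving the largest ratio is kept. The frontier coefficients and the unit's data below are hypothetical placeholders, not the values of (PF1) to (PF4) or of PSF 7, and a single output is assumed.

```python
def frontier_efficiency(inputs, output, v, u):
    """Efficiency of a unit against one frontier: u*y divided by sum(v_i * x_i)."""
    return (u * output) / sum(vi * xi for vi, xi in zip(v, inputs))


def reclassify(inputs, output, frontiers):
    """Score a unit on each frontier and return the best frontier, its score, and all scores."""
    scores = {
        name: frontier_efficiency(inputs, output, v, u)
        for name, (v, u) in frontiers.items()
    }
    best = max(scores, key=scores.get)
    return best, scores[best], scores


# Hypothetical frontiers, each given as (input multipliers, output multiplier).
frontiers = {
    "PF_a": ((0.25, 0.25), 1.0),
    "PF_b": ((0.50, 0.25), 1.0),
}

# Hypothetical unit: two inputs per unit of output.
best, score, scores = reclassify(inputs=(2.0, 4.0), output=1.0, frontiers=frontiers)
print(best, round(score, 4), scores)
```

Taking the maximum is consistent with the multiplier model itself, which lets each unit pick the weights under which it appears most efficient.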
In this study, PSFs that achieve 100 percent efficiency are referred to as relatively efficient units, whereas units with efficiency ratings below 100 percent are referred to as inefficient units. According to Table 2, there are six efficient PSFs (PSF1, PSF8, PSF11, PSF14, PSF21, and PSF25) and 19 inefficient PSFs, 5 of which have an efficiency ratio greater than or equal to 0.95. Additionally, the DEA results are consistent with the net revenues of the PSFs. According to the four different production functions, the 25 PSFs are classified into four clusters. The clustering results are shown in Table 3.
Considering (PF1), (PF2), and (PF3), TP is the most critical input for the PSFs in clusters 1, 2, and 3: the multiplier of TP has the largest value in these clusters. For the inefficient PSFs in clusters 1 and 3, the relative increase in efficiency is 5.36 with each unit reduction of TP. Similarly, it is 1.415 for the inefficient PSFs in cluster 2 (see Table 3). The order of the multipliers of the other inputs varies between clusters. For example, for the PSFs in cluster 2, the multipliers of FC and TP are similar and higher than that of PBQ. On the other hand, FC is the most critical input for the PSFs in cluster 4: the relative increase in efficiency is 1.018 with each ton reduction of FC for the inefficient PSFs in this cluster (see Table 3). The DEA-based and geography-based clustering results are compared in Table 4.
As Table 4 shows, each DEA-based cluster contains several geography-based portfolio groups: (1-E, C, D), (2-A, B, D), (3-C), and (4-A, B, E). The clustering derived from the geography-based portfolio may therefore not be helpful for clustering the PSFs. From the derived production functions (PF1, PF2, PF3, and PF4), all PSFs are classified into four different clusters, so each PSF knows the type of production function that it confronts. Additionally, each PSF can compare its production feature with the other production functions so that the combination of its input resources, or the combination of inputs and outputs, can be readjusted. That is, for data with input and output items, the clusters derived from production functions are more valuable than the geography-based portfolio groups. It is possible to eliminate inefficiencies by considering DEA-based clustering. For example, the inefficient PSFs in clusters 1, 2, and 3 should give priority to decreasing the number of total personnel, whereas the inefficient PSFs in cluster 4 should first decrease fuel consumption. It is therefore more meaningful to support privatization decisions with DEA-based clustering results than with geography-based portfolio groups.
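The correspondence quoted above from Table 4 can be inverted to show how each geography-based group is spread over the DEA-based clusters. The sketch below uses only the cluster-to-group pairs stated in the text; the variable names are illustrative.

```python
# DEA-based cluster -> geography-based groups it contains, as quoted from Table 4.
dea_clusters = {
    1: ["E", "C", "D"],
    2: ["A", "B", "D"],
    3: ["C"],
    4: ["A", "B", "E"],
}

# Invert the mapping: geography-based group -> DEA-based clusters it appears in.
group_to_clusters = {}
for cluster, groups in dea_clusters.items():
    for group in groups:
        group_to_clusters.setdefault(group, []).append(cluster)

for group in sorted(group_to_clusters):
    print(group, group_to_clusters[group])
# A [2, 4]
# B [2, 4]
# C [1, 3]
# D [1, 2]
# E [1, 4]
```

Every listed geographic group is split across two DEA-based clusters, which is why factories in the same portfolio group can face different production functions.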
In addition, target values of the inputs are calculated by using the slack variables of the M2 model and are given in Table 5 for the inefficient PSFs. Target values can help decision makers to eliminate the inefficiencies. For example, PSF 22 uses 9.96 tons of beet, 0.419 tons of fuel, and 0.0213 personnel to produce one ton of sugar. It becomes efficient when it uses 6.477 tons of beet, 0.273 tons of fuel, and 0.0014 personnel per ton of sugar.
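The size of the adjustment implied by these targets can be checked with simple arithmetic. The snippet below uses only the PSF 22 figures quoted above; the dictionary keys are illustrative names.

```python
# Inputs per ton of sugar for PSF 22: actual values and target values from the text.
actual = {"beet_t": 9.96, "fuel_t": 0.419, "personnel": 0.0213}
target = {"beet_t": 6.477, "fuel_t": 0.273, "personnel": 0.0014}

# Fraction of each input that would have to be cut to reach the target.
reduction = {name: 1.0 - target[name] / actual[name] for name in actual}

for name, share in reduction.items():
    print(f"{name}: {share:.1%} reduction")
```

Beet and fuel are reduced by nearly the same proportion (about 35%), as would be expected from a proportional contraction of inputs, while the quoted personnel target implies a much larger cut, which would be consistent with an additional slack on that input.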
4. Conclusions
This study applies a DEA-based clustering approach to the evaluation of PSFs. The approach employs the piecewise production functions derived from the DEA method to cluster data with input and output items. Compared with geography-based clustering, which considers only the geographical location of the PSFs, the approach reveals the input-output relationships hidden in the data. Thus, for each evaluated PSF, we know not only the cluster that it belongs to but also the type of production function that it confronts. This is very important for managerial decision making, where decision makers are interested in knowing what changes in the combination of input resources are required for a PSF to be reclassified into a different, desired cluster in the privatization process.
The focus of this paper is to examine the CCR model of DEA and then establish the DEA-based clustering. Although the approach has been carried out for the CCR model, it can easily be extended to other DEA models. The clustering results drawn from DEA-based clustering are unit invariant, meaning that they are not affected by the units in which the data are measured.
The DEA-based clustering approach is suitable for most clustering problems in which there are input-output or cause-and-effect relationships between the features. For example, the approach can be used for industry classification or for sorting PSFs by their input-output data.
In summary, in view of its advantages, the DEA-based clustering approach is well suited to such clustering problems and has considerable potential to be used for a variety of them. We believe that further research is necessary to realize this potential. The DEA-based clustering algorithm developed by Po et al. is robust to slight changes in the input and output data sets, but not to outliers. Future research will consider developing a robust DEA-based clustering algorithm.
A. Charnes, W. W. Cooper, B. Golany, L. Seiford, and J. Stutz, “Foundations of data envelopment analysis for Pareto-Koopmans efficient empirical production functions,” Journal of Econometrics, vol. 30, no. 1-2, pp. 91–107, 1985.
K. C. Land, C. A. K. Lovell, and S. Thore, “Productivity and efficiency under capitalism and state socialism: an empirical inquiry using chance-constrained data envelopment analysis,” Technological Forecasting and Social Change, vol. 46, no. 2, pp. 139–152, 1994.
W. W. Cooper, H. Deng, Z. Huang, and S. X. Li, “Chance constrained programming approaches to technical efficiencies and inefficiencies in stochastic data envelopment analysis,” Journal of the Operational Research Society, vol. 53, no. 12, pp. 1347–1356, 2002.
W. W. Cooper, L. M. Seiford, and K. Tone, Data Envelopment Analysis: A Comprehensive Text, with Models, Applications, References and DEA Solver Software, Springer, New York, NY, USA, 2nd edition, 2007.