#### Abstract

Electricity sector regulators benchmark distribution companies in order to regulate their allowed revenue. This is mainly done using the relative efficiency scores produced by frontier benchmarking techniques. Some of these techniques, for example, the Corrected Ordinary Least Squares method and Stochastic Frontier Analysis, use an econometric approach to estimate efficiency scores, while a method like Data Envelopment Analysis uses linear programming. The relative efficiency scores are then used to calculate the efficiency factor (X-factor), which is a component of the revenue control formula. In the electricity distribution industry in Sri Lanka, the allowed revenue for a particular distribution licensee is calculated according to the allowed revenue control formula specified in the tariff methodology of the Public Utilities Commission of Sri Lanka. This control formula contains the X-factor, but its effect has not yet been considered; it has simply been kept at zero, since the utility regulator has not carried out any relative benchmarking study to decide its actual value. This paper focuses on producing a suitable benchmarking methodology by studying prominent benchmarking techniques used in international regulatory regimes and by analyzing their applicability to the Sri Lankan context, where only five Distribution Licensees are operating at present.

#### 1. Introduction

Regulators of the distribution sector around the world expect increased investment in electrification together with reductions in losses, in the number of employees, and so forth. In the Sri Lankan context the regulator also seeks to reduce the tariff, as the distribution sector is run with government funds. The main target of the Sri Lankan regulator is therefore to give the utilities incentives to improve their operating efficiency, so that customers receive quality electricity at a low price. There are five electricity Distribution Licensees operating in Sri Lanka. The allowed revenue for a particular distribution licensee (DL) is calculated according to the allowed revenue control formula specified in the tariff methodology of the Public Utilities Commission of Sri Lanka (PUCSL). The Operational Expenditures (OPEX) component of the base allowed revenue needs to be adjusted each year at a rate defined by an efficiency factor. In successive tariff periods, the regulator (PUCSL) can revise the methodology for computing the efficient OPEX to be included in the distribution allowed revenue. A relative OPEX efficiency score obtained from a benchmarking study is an input for formulating this efficiency factor (X-factor), so PUCSL can decide on the X-factor using the result of a benchmarking study. At present PUCSL does not take the X-factor into account when deciding the allowed revenue for each DL, because no benchmarking study has been done on the DLs to obtain relative OPEX efficiency scores. Without these relative efficiency scores (percentage values such as 100% for one DL, 60% for another, etc.) the X-factor cannot be obtained. Therefore PUCSL requires a suitable methodology to benchmark the Distribution Licensees in Sri Lanka.

This paper describes a methodology to benchmark the five Distribution Licensees in Sri Lanka, which enables PUCSL to regulate the allowed revenue for each DL according to its relative OPEX efficiency. The regulator can set differentiated price limits based on the companies’ efficiency performance estimated from a benchmarking analysis [1]. It can also decide which companies deserve closer examination, so that scarce investigative resources are allocated efficiently [2].

International regulators use different benchmarking techniques [2–5]. The most appropriate benchmarking methodology was selected after considering the principles discussed in CEPA’s reports on benchmarking [3, 6]. The rest of the paper is organized as follows. A brief overview of different benchmarking techniques is presented in Section 2. Data Envelopment Analysis and the related methodologies are discussed in Section 3. Section 4 presents the results and discussion, and the conclusions are given in Section 5.

#### 2. Benchmarking Techniques

The following were identified from the literature review as prominent techniques to be considered against the principles discussed in CEPA’s report [7]:

(i) Partial Performance Indicators (PPIs),

(ii) Corrected Ordinary Least Squares (COLS),

(iii) Data Envelopment Analysis (DEA),

(iv) Stochastic Frontier Analysis (SFA).

##### 2.1. Partial Performance Indicators (PPIs)

These indicators compare the ratio of a single output to a single input across firms (e.g., energy sold per OPEX). They are often significantly affected by capital substitution effects [7]. PPIs used in isolation cannot account for differences in operating conditions that directly affect a utility's costs. For example, a utility may experience a relatively high or low unit cost simply because of its mix of customer categories. Therefore PPIs may not provide a meaningful comparison across different DLs, as they operate under different conditions [8].

##### 2.2. Data Envelopment Analysis (DEA)

Data Envelopment Analysis (DEA) is the prominent technique used by the researchers for benchmarking in the literature [9–13]. Thakur has used DEA and Malmquist Productivity Index to find the rate at which the efficiency frontier has moved over recent years after implementing the reforms process in India [14]. Cui et al. applied DEA and Malmquist Productivity Index to calculate the energy efficiencies of nine countries during 2008–2012 and explained the reasons for energy efficiency changing with respect to technical and management factors [15]. Javier Ramos-Real et al. have estimated the changes in the productivity of the Brazilian electricity distribution sector using Data Envelopment Analysis in terms of productivity change [16]. Chien et al. applied Malmquist Productivity Index comparing the performance of different thermal power plants in Taiwan [17].

DEA uses linear programming to determine the efficient firm(s) in a sample relative to the other firms in the sample [18–20], while the Malmquist Productivity Index evaluates the efficiency change over time [21]. In this nonparametric technique, the ratio of weighted outputs to weighted inputs is maximized subject to constraints, which requires solving an individual linear programming problem for each firm in the sample. An efficient firm is one for which no other firm, or linear combination of other firms, can produce more of all the outputs using less input [6]. It is important to select input and output variables that reflect the use of resources, since misspecification of variables can lead to wrong results [6]. DEA can also accommodate environmental variables that are beyond the control of the firms but can affect their performance (e.g., the population density of a particular area of operation). The method is multidimensional, and inefficient firms are compared to actual firms (or linear combinations of these) rather than to some statistical measure. It does not require a cost or production function to be specified. Importantly, DEA can be implemented on a small dataset, whereas regression analysis tends to require a larger minimum sample size. However, with a small sample and a high number of input and/or output variables there is a danger of overspecifying the model and eventually obtaining “made-up” efficiency scores [22]: as more variables are included in the model, the number of firms on the efficient frontier increases.
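As an illustration of the linear program described above, the following is a sketch of the constant-returns-to-scale (CCR) multiplier form of DEA for a firm $o$. The paper does not state which DEA variant was used, so the choice of the CCR form and all symbols ($x_{ij}$ for input $i$ of firm $j$, $y_{rj}$ for output $r$ of firm $j$, weights $v_i$ and $u_r$, score $\theta_o$) are assumptions made here for exposition.

```latex
% Ratio form: maximize weighted outputs over weighted inputs for firm o
\max_{u,\,v}\ \theta_o = \frac{\sum_{r} u_r\, y_{ro}}{\sum_{i} v_i\, x_{io}}
\quad \text{s.t.} \quad
\frac{\sum_{r} u_r\, y_{rj}}{\sum_{i} v_i\, x_{ij}} \le 1 \ \ \forall j,
\qquad u_r \ge 0,\ v_i \ge 0 .

% Equivalent linear program (normalizing the weighted input of firm o to 1)
\max_{u,\,v}\ \sum_{r} u_r\, y_{ro}
\quad \text{s.t.} \quad
\sum_{i} v_i\, x_{io} = 1, \qquad
\sum_{r} u_r\, y_{rj} - \sum_{i} v_i\, x_{ij} \le 0 \ \ \forall j,
\qquad u_r \ge 0,\ v_i \ge 0 .
```

One such program is solved per firm. Every additional input or output adds a free weight, which is one way to see why more firms reach a score of 1 as variables are added.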

##### 2.3. Corrected Ordinary Least Squares (COLS)

This regression technique estimates the most efficient firm, that is, the frontier. This “corrected” form of ordinary least squares assumes that all deviations from the frontier are due to inefficiency [6]. The method requires a cost or production function to be specified, together with assumptions about technological properties. COLS is easy to implement and allows statistical inference about which parameters to include in the frontier estimation [6]. It requires a large volume of data in order to establish a robust regression relationship and is sensitive to data quality.
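The COLS idea can be sketched in a few lines. The example below is a minimal illustration, not the specification used in this study: it assumes a log-log cost function with a single output as the only regressor, fits it by ordinary least squares, shifts the fitted line down to the lowest residual so that it becomes a cost frontier, and reads each firm's efficiency as frontier cost over actual cost. The function name `cols_efficiency` and all figures are hypothetical.

```python
from math import exp, log


def cols_efficiency(costs, outputs):
    """COLS efficiency scores from a one-regressor log-log cost function."""
    xs = [log(y) for y in outputs]   # ln(output)
    zs = [log(c) for c in costs]     # ln(cost)
    n = len(xs)
    mean_x = sum(xs) / n
    mean_z = sum(zs) / n
    # Ordinary least squares for ln(cost) = intercept + slope * ln(output)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sum((x - mean_x) * (z - mean_z) for x, z in zip(xs, zs)) / sxx
    intercept = mean_z - slope * mean_x
    residuals = [z - (intercept + slope * x) for x, z in zip(xs, zs)]
    # "Correction": move the line down to the firm with the lowest residual,
    # so every deviation above the frontier is treated as inefficiency.
    shift = min(residuals)
    return [exp(-(e - shift)) for e in residuals]


# Hypothetical OPEX and energy sold for five firms (NOT data from this study).
opex = [120.0, 150.0, 90.0, 200.0, 60.0]
energy = [1000.0, 1100.0, 800.0, 1500.0, 600.0]
scores = cols_efficiency(opex, energy)
print([round(s, 3) for s in scores])
```

The firm with the lowest residual defines the frontier and receives a score of 1; all other scores fall below 1. With only five observations, each extra regressor uses up one of very few degrees of freedom.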

##### 2.4. Stochastic Frontier Analysis (SFA)

Similar to COLS, SFA requires the specification of a production function based on input variables. In this model, however, random errors are incorporated explicitly, and it is not assumed that all deviations from the frontier are due to inefficiency [23]. A model of the form described under COLS is estimated with two error terms. The first is assumed to have a one-sided distribution. The second has a symmetric distribution with mean zero. The Cobb-Douglas stochastic frontier model takes the form [24]

$$\ln y_i = \beta_0 + \beta_1 \ln x_i + v_i - u_i,$$

where $y_i$ is an output, $x_i$ is an input, and $v_i$, $u_i$ are error terms ($u_i \ge 0$ being the one-sided term and $v_i$ the symmetric term). SFA is theoretically the most appealing technique but the hardest to apply. Since it is difficult to implement in small samples, regulators have traditionally been reluctant to use SFA techniques in setting X-factors [6].

Further, it is important to note that reliable panel data on OPEX were not available. Published/audited historical OPEX data were unavailable mainly because four of the DLs belong to the same legal entity and had no separate audited accounts until the year 2010 (OPEX for the years 2010 and 2011 was the only available data). Techniques that rely on panel data were therefore avoided in this study.

#### 3. Data Envelopment Analysis

A number of variables can be considered when implementing any of the benchmarking techniques described in Section 2. From the regulator's point of view, the following factors have to be considered when selecting variables: quality of the data, availability, ease of collection, relevance to the business, international practices/reviews, use of statistical indicators (such as correlation), nonredundancy to minimize overlapping, high discriminating power, and reflection of the scale of operation and cost drivers.

The regulator must therefore take care to keep the number of variables to a minimum while ensuring that the chosen variables are strong drivers of cost (i.e., OPEX). The relevant data should be accurate and, importantly, practical to collect from the DLs in a timely manner. Several reports were analyzed in order to find good-quality, feasible data. These include reports published by PUCSL [25–28] and by the Licensees [29–40]. After studying these reports, data on the following set of variables were collected:

(i) energy sold,

(ii) total number of consumers: the number of consumer accounts or the number of consumer connection points,

(iii) number of new connections provided,

(iv) number of employees,

(v) total length of distribution lines: this includes the MV and LV network length,

(vi) number of substations,

(vii) authorized operation area: this is a constant for each licensee,

(viii) operational expenditure.

Note that, in international benchmarking practices, the use of supply/service quality as a variable is rare. Most of the countries reviewed separately run a quality-of-service reward/penalty regime [23]. In Sri Lanka, the supply/service quality is to be determined according to the drafted electricity distribution performance regulations, where penalties have been introduced for underperformance [41].

##### 3.1. Justification of Selected Variables

###### 3.1.1. Cost Drivers

Cost clearly depends on the scale of operation. Accurate data on energy distributed (the production of the distribution business), the number of consumer accounts, network length (MV and LV line lengths), and the number of distribution substations can be obtained from the DLs in Sri Lanka in a timely manner. The regulator can therefore perform the benchmarking exercise in time to determine the allowed revenue for each year.

###### 3.1.2. Dispersion of Consumers

Distribution line length per consumer can be taken as an indication of how concentrated the consumers are. It is also an indication of the extent of the rural electrification efforts made by the DLs. This value differs for each DL. For example, DL5 has a lower value, indicating a higher concentration of consumers, whereas DL4 has a larger value, as indicated in Table 1.

###### 3.1.3. Correlation

Applying too many explanatory variables to a sample of few observations (i.e., the number of Distribution Licensees) would result in all DLs appearing 100% efficient. It is therefore necessary to combine several parameters into one single parameter in order to preserve sufficient degrees of freedom. It is also important not to include highly correlated variables simultaneously in a benchmarking method. Correlation coefficients were calculated for each DL using past data from 2006 to 2011. The results are given in Table 2.
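A correlation check of this kind can be sketched as follows. The paper does not say which correlation measure was used; the Pearson coefficient is assumed here, and the yearly figures are hypothetical placeholders rather than values from Table 2.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    # Undefined (division by zero) if either series is constant.
    return cov / sqrt(var_x * var_y)


# Hypothetical yearly figures for one DL, 2006 to 2011 (NOT data from this study).
energy_gwh = [1000.0, 1060.0, 1110.0, 1150.0, 1240.0, 1330.0]
consumers_thousands = [400.0, 420.0, 445.0, 460.0, 490.0, 520.0]
r = pearson(energy_gwh, consumers_thousands)
print(round(r, 4))
```

A coefficient close to 1 for a pair of candidate variables suggests that one of the two can be dropped from the model without losing much information.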

For example, the correlation coefficient between energy delivered and the number of consumer accounts is 0.9683, which is the highest correlation coefficient, while that between energy delivered and the number of employees is the second highest. For further verification Figures 1 and 2 were plotted.

The energy delivered and the number of employees also showed a high correlation. It can be concluded that, among the selected set of variables, energy delivered and the number of consumers are sufficiently highly correlated that it is enough to include only one of the two. Since energy delivered (an output) is highly correlated with the number of employees (an input), it is justifiable to take the number of employees as another input variable.

###### 3.1.4. Input, Output, and Environmental Variables

Operational Expenditure (OPEX) was taken as the main input variable for assessing efficiency, and energy delivered as the main output produced. The number of new connections provided was taken as an output, while the number of employees, which includes management and operational staff, was taken as an input variable. Demand for new connections depends on the conditions in each DL's authorized area of operation and is not under the direct control of the DL's management, yet the DL has to commit resources to provide the connections demanded. Table 3 depicts the variation between DLs [41], which reflects how the demand for new connections varies with the area of operation. DLs need to meet this demand and therefore need to commit their resources accordingly.

As given in Table 3, DL1 is giving 40 new connections per day (on average) whereas DL5 is only providing 6 new connections per day (on average). Obviously DL1 needs to input more resources than DL5 to cope with the demand for new connections. The demand for connection is out of the control of the DL’s management. In some areas, a lot of infrastructure developments, resettlements, and rural developments are going on due to ending of the war with terrorists. This has caused high demand for new connections. Therefore, when evaluating the overall performance, the number of new service connections provided by respective DLs has to be considered.

Network length and the number of substations can be considered as either inputs or outputs. Viewing network length as an output runs the risk that a network that increases its length of lines is rewarded even if there is no real-world impact on the delivery of services to customers [23]. In international regulatory practice network length has been treated both as an input and as an output. Hence both scenarios were taken into consideration.

##### 3.2. Selection of Benchmarking Techniques and Models

The results of applying a benchmarking method directly affect the allowed revenue of each DL. If the method itself is complicated and hard to understand, the DLs will doubt the efficiency results. DEA, COLS, and PPIs fulfill desirable characteristics such as ease of computation and understanding, transparency, and the ability to be implemented on small samples.

If a benchmarking method requires a large number of data points, it will be hard to implement with a sample as small as the five DLs in Sri Lanka. DEA can easily be implemented with five DLs, but care has to be taken to verify the results with other methods. International practice is that the number of DLs has to be sufficiently large relative to the number of inputs and outputs in the model [42]. Otherwise all the DLs would get close to 100% efficiency and discrimination would be difficult. In other words, with a small sample and a high number of input/output variables, there is a danger of obtaining made-up efficiency scores [22]: when more variables are included in the model, the number of DLs on the efficient frontier increases. The feasibility of COLS has to be decided by actually implementing the COLS method with a Cobb-Douglas cost function and the same set of variables; the COLS implementation can also be used to verify the results from DEA. At least two different benchmarking methods must be used to verify the results (efficiency scores). The selected methods should have different characteristics so that the regulator can convince the DLs of the efficiency scores. In this case DEA and COLS are feasible to implement.

##### 3.3. Implementation of Benchmarking Techniques Using DEA

After considering the factors discussed in Sections 3.1 and 3.2, energy sales, number of new connections, number of employees, OPEX, number of substations, area per consumer, and network line length per consumer were selected for implementing DEA. Note that if “*total network length*” is taken as an input, then “*number of substations*” has to be taken as an input also. On the other hand, if “*network length*” is taken as an output, then “*number of substations*” has to be taken as an output also.

Efficiency scores were obtained for each model (from 8 input/output variables down to 3 input/output variables). Note that every possible input/output configuration (model) was taken into consideration when obtaining the results. The average efficiency scores of each DL for the different models are shown in Table 4.

It can be seen that the discrimination between the DLs’ efficiency scores decreases as the number of variables considered increases. DL2 is the lowest performer, while DL5, DL1, DL3, and DL4 are ranked from highest to lowest according to the average efficiency scores. Even in the 8-variable models, an efficiency gap of about 10% can be observed between DL2 and all other DLs. It is therefore possible to take the 8-variable models as the base and use these efficiency values to calculate the X-factor. Note that the implementation was done using data for the year 2011. DL2 has a high degree of freedom to improve its efficiency score, since the model contains 8 variables.

The values of the variables change with the year of implementation. If all DLs get close to 100% when the DEA method is implemented with the 8-variable models and the then-current values, the reduced-variable models (from 7 variables down to 3 variables) can be considered instead. This allows higher discrimination between efficiency scores, as observed in Table 4.

##### 3.4. Implementation of Benchmarking Techniques Using COLS

The COLS method was implemented according to the description given in Section 2.3. Suitable variables have to be selected for the “benchmark cost function.” The variables should represent the output produced by the business, the input prices paid, and the environmental conditions that affect the production cost.

In Sri Lanka, the OPEX of the DLs consists mainly of human resource expenses, which account for about 50% to 60% of their respective OPEX. Therefore cost per employee is used as the main input price in the cost function. As energy sold (GWh) reflects the main output produced by the distribution business, it is included in the cost function. Each of the five DLs has its own designated area of operation, so their customer densities differ from each other. Table 5 illustrates the differences in customer densities as of the year 2011.

Therefore the analysis must account for these differences in their business which is out of their (DLs) control. This variable is to capture the heterogeneity dimension of the distribution business [43, 44]. Further, the consumer density also can be accommodated in the model by using the consumers per unit network length, that is, number of consumers per kilometer of line length. Table 6 indicates the extent of heterogeneity. DL5 has a higher number since its area of operation is highly populated.

The efficiency scores for these models are given in Table 7. The average results indicate efficiencies of more than 90% for all DLs; the scores of all DLs lie in a band of 90.5% to 96.9%. Hence discrimination is low. The analysis was therefore repeated using three variables, and the results are shown in Table 8.

It can be seen that the average efficiency scores are more dispersed than the averages of the 4-variable models. The efficiency scores are spread over a band of 75.6% to 100%. Hence discrimination is higher. Note that DL2 is the lowest performer in every model in Table 8; its efficiency score always ended up below 77%.

##### 3.5. Implementation of Benchmarking Techniques Using PPIs

PPIs assume a linear relationship between input and output. As explained in Section 2.1, they cannot measure the overall performance of the business. These partial indicators can be misleading, so care should be taken to identify misleading information. PPIs were calculated for each DL by taking OPEX and the number of employees as inputs. Line lengths and the number of substations were not taken into account, since they can be considered as either inputs or outputs. OPEX and the number of employees, on the other hand, can only be considered as inputs to the system, while energy delivered to consumers and the number of consumers can only be taken as outputs from the system. Table 9 depicts the results from the PPIs.

The efficiencies obtained from the PPIs are not used to conclude the relative efficiency score of a particular DL directly but to verify qualitatively the results obtained from DEA and COLS. It can be seen that DL1, DL3, DL5, DL4, and DL2 have efficiencies ranked from highest to lowest, respectively.
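A partial performance indicator of this kind is just an output-to-input ratio. The sketch below computes one indicator (energy delivered per unit of OPEX) and, as an assumption made here for comparability, scales it so that the best performer equals 1. The function name `ppi_relative`, the unit labels, and the figures are all hypothetical.

```python
def ppi_relative(outputs, inputs):
    """Single-output / single-input ratio per unit, scaled so the best is 1.0."""
    ratios = {name: outputs[name] / inputs[name] for name in outputs}
    best = max(ratios.values())
    return {name: ratio / best for name, ratio in ratios.items()}


# Hypothetical figures for three units (NOT data from this study).
energy_gwh = {"U1": 1000.0, "U2": 900.0, "U3": 500.0}
opex = {"U1": 100.0, "U2": 150.0, "U3": 40.0}

relative = ppi_relative(energy_gwh, opex)
# Units ordered from the highest to the lowest indicator value
order = sorted(relative, key=relative.get, reverse=True)
print(relative, order)
```

Each indicator looks at one output and one input only, which is why such rankings serve as a qualitative cross-check rather than as efficiency scores in their own right.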

#### 4. Results and Discussion

Let us consider the model, among the DEA 3-variable models given in Table 10, in which “energy sales” and “total network length” are taken as outputs of the electricity distribution business while OPEX is taken as the input. This model examines how efficiently, in relative terms, a DL has used its OPEX to provide electrical energy to its consumers and to maintain the total network length that it owns.

In that case all DLs except DL2 obtained a relative efficiency score of 100%, while DL2 obtained a score of 77.4%. This means that, relative to the other DLs only, DL2 is about 77.4% efficient. It does not imply that the DLs with a 100% efficiency score are strictly efficient; it is possible that they could be operated more efficiently. DEA compares each DL with all other DLs and identifies those that are operating inefficiently compared with the other DLs’ actual operating results. It achieves this by locating the best-practice, or relatively efficient, DLs. This can be illustrated graphically (Figure 3) using the ratios given in Table 11.

In Figure 3, points A, B, C, D, and E represent DL5, DL1, DL3, DL4, and DL2, respectively. The 100% efficient boundary is demarcated by the line connecting ABCD. The target “*efficient reference point*” for DL2 (i.e., point E) is the point E′ at which the extended line OE intersects line AB.

In other words, E′ is the efficient reference point against which DL2 was found to be most directly inefficient. DL2 (point E) was found to have inefficiencies in direct comparison to DL1 (point B) and DL5 (point A). The efficiency of DL2 is given by the ratio OE/OE′, which is equal to 77.4%. DL2 (point E) can approach point E′, and so become 100% relatively efficient, by increasing the respective output/input ratios. Equivalently, DL2 can become 100% relatively efficient by reducing its OPEX by 22.6% while keeping its actual outputs at the same level.
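For one input and two outputs, the graphical construction above can be reproduced numerically. The sketch below assumes constant returns to scale and works on the two output/input ratios of each unit: it maximizes the weighted ratio of the unit under evaluation subject to no unit scoring above 1, by enumerating the corner points of the two-dimensional feasible region of weights. The resulting score equals the distance from the origin to the unit's point divided by the distance to the frontier along the same ray. The function name and the coordinates are hypothetical and are not the values in Table 11.

```python
from itertools import combinations


def dea_two_output_efficiency(ratios):
    """CRS DEA score per unit for one input and two outputs.

    ratios: list of (output1/input, output2/input) pairs, all positive.
    Solves max u.p_o subject to u.p_j <= 1 for all j and u >= 0
    by enumerating the vertices of the 2-D feasible region of weights u.
    """
    # Each boundary is a line a*u1 + b*u2 = c.
    lines = [(p[0], p[1], 1.0) for p in ratios]
    lines += [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]  # axes u1 = 0 and u2 = 0
    tol = 1e-9
    vertices = []
    for (a1, b1, c1), (a2, b2, c2) in combinations(lines, 2):
        det = a1 * b2 - a2 * b1
        if abs(det) < tol:
            continue  # parallel lines have no unique intersection
        u1 = (c1 * b2 - c2 * b1) / det  # Cramer's rule
        u2 = (a1 * c2 - a2 * c1) / det
        feasible = u1 >= -tol and u2 >= -tol and all(
            p[0] * u1 + p[1] * u2 <= 1.0 + tol for p in ratios
        )
        if feasible:
            vertices.append((u1, u2))
    # A linear objective attains its maximum at a vertex of the region.
    return [max(p[0] * u1 + p[1] * u2 for u1, u2 in vertices) for p in ratios]


# Hypothetical (output1/OPEX, output2/OPEX) points for five units A..E.
points = [(1.0, 4.0), (3.0, 3.0), (4.0, 2.0), (4.5, 1.0), (1.5, 2.5)]
scores = dea_two_output_efficiency(points)
opex_cut_for_e = 1.0 - scores[4]  # input reduction that would put E on the frontier
print([round(s, 3) for s in scores], round(opex_cut_for_e, 3))
```

In this made-up example the first four units lie on the frontier and the fifth is compared against the segment joining the first two, mirroring the role of points A, B, and E in the figure. With more than two outputs or more than one input, a general linear programming solver would be needed instead of vertex enumeration.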

According to the average efficiency scores obtained from the DEA 3-variable models, DL5 is the efficient performer with 100% relative efficiency. This means that DL5 is relatively efficient only, not strictly efficient. That is, no other unit is clearly operating more efficiently than DL5, but it is possible that all DLs, including DL5, could be operated more efficiently. The efficient DL (DL5 in the 3-variable models) therefore represents the best existing (but not necessarily the best possible) practice with respect to efficiency. Given the small sample size (5 DLs), DEA is theoretically more appealing than the COLS technique, because COLS requires a number of coefficients to be estimated, which leads to unsatisfactory results purely because of the small sample.

When more variables are included in the model, the number of DLs on the efficient frontier increases. To avoid low discrimination between efficiency scores, the 3-variable models are the most suitable in this context, as shown in Figure 4. The models selected must be robust to changes in the technique implemented. In particular, the ranking of firms, especially with respect to the “best” and “worst” performers, must show reasonable stability, and the different approaches should give comparable results. COLS and DEA are the two main techniques used here to measure overall efficiency. Therefore the robustness of the results obtained using these two techniques has to be analyzed.

The authors selected the COLS 3-variable models over the 4-variable models because the 4-variable results gave average efficiency scores for all DLs lying in a band of 90.5% to 96.9% (i.e., low discrimination). The COLS 3-variable technique gave higher discrimination, with the efficiency scores of all DLs lying in a band of 75.6% to 100%. Since the DEA models covered a wider range of variables (from 8 down to 3), the DEA results as a whole cannot be compared directly with the COLS results. The COLS method adopted used 3 variables including OPEX, so its results can be compared with those of the 3-variable models in DEA: both methods use 3 input and output variables, and hence the degrees of freedom are the same.

It can be seen that the results produced by DEA and COLS are robust for DL1, DL2, DL3, and DL4, as the differences are very small. For DL5 there is a considerable difference, but its efficiency score is above 90% under both techniques. It is important to note that the operating conditions of DL5 differ substantially from those of the remaining four DLs with respect to consumer density, authorized area of operation, and energy demand per consumer. According to the results given in Table 12, it can be concluded that the average efficiency scores given by the DEA 3-variable models are robust and reliable.

##### 4.1. Ranking of DLs according to Overall Efficiency

Since Sri Lanka is in the initial stage of electricity regulation (the Electricity Act came into force in 2009), it is more important to persuade underperforming DLs to attain at least the next level of efficiency achieved by their peer DLs. Further, according to the efficiency scores, the regulator can decide which companies deserve closer examination, so that scarce investigative resources are allocated efficiently [2]. Table 13 depicts the ranking of each DL under each technique used, together with the verification using PPIs.

DL2 is the lowest performer, while DL5, DL1, DL3, and DL4 are ranked from highest to lowest according to the average efficiency scores. It can be recommended that DL2 deserves the closest supervision by the electricity regulator (i.e., PUCSL), and that DL4 also requires close supervision, as both are underperforming relative to the other three DLs.

##### 4.2. Influence on X-Factor

The regulator can officially obtain data for the relevant variables, perform the DEA analysis, and use the average results from the DEA 3-variable models as the efficiency scores. It can then verify those DEA results against the efficiency scores obtained by the COLS method using 3-variable models and verify the rankings with PPIs. The average efficiency scores given by the DEA 3-variable models can then be used to decide on an X-factor that persuades the most underperforming DLs to improve.
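The averaging and cross-checking steps of this procedure can be sketched as follows. This is only an outline under stated assumptions: the helper names (`average_scores`, `ranking`, `same_extremes`), the unit labels, and every score are hypothetical, and the agreement check on the best and worst performers is one simple way to express the stability requirement, not a rule taken from the tariff methodology.

```python
def average_scores(scores_by_model):
    """Mean efficiency score per unit across several models."""
    units = scores_by_model[0].keys()
    count = len(scores_by_model)
    return {u: sum(model[u] for model in scores_by_model) / count for u in units}


def ranking(scores):
    """Unit names ordered from the highest to the lowest score."""
    return sorted(scores, key=scores.get, reverse=True)


def same_extremes(rank_a, rank_b):
    """True if two rankings agree on the best and the worst performer."""
    return rank_a[0] == rank_b[0] and rank_a[-1] == rank_b[-1]


# Hypothetical scores from two DEA models and one COLS average (NOT study data).
dea_models = [
    {"U1": 1.00, "U2": 0.70, "U3": 0.90},
    {"U1": 1.00, "U2": 0.80, "U3": 0.95},
]
cols_average = {"U1": 0.98, "U2": 0.76, "U3": 0.93}

dea_average = average_scores(dea_models)
dea_rank = ranking(dea_average)
cols_rank = ranking(cols_average)
print(dea_average, dea_rank, same_extremes(dea_rank, cols_rank))
```

If the two rankings disagree on the best or worst performer, that would be a signal to revisit the variable selection before any score is translated into an X-factor.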

The regulator can decide on how to determine the X-factor (the translation of efficiency scores into X-factors), and the method of determining the X-factor may vary among the regulators [10, 13]. For example, X-factor can be calculated as shown in [11]. It is important to note that the relative efficiency scores resulting from this benchmarking exercise give an indication to the regulator (PUCSL) on how these DLs are operating relative to each other.

#### 5. Conclusions

The relative efficiencies of the five Distribution Licensees operating in Sri Lanka were analyzed using prominent benchmarking techniques. International practices in electricity distribution regulatory regimes were considered when performing this benchmarking study. Data Envelopment Analysis (DEA), the Corrected Ordinary Least Squares (COLS) method, and Partial Performance Indicators (PPIs) were applied with several input/output models in order to assess efficiency from several angles. Care was taken to address the heterogeneity of operating conditions, such as the consumer density and authorized area of operation of each DL, which are outside management control.

The efficiency scores obtained for the various possible models were scrutinized, and a suitable methodology for obtaining efficiency scores was developed, taking into account the data availability and the low number of Distribution Licensees. The proposed methodology uses DEA with 3 input/output variables and takes the average efficiency scores as the final scores, since these give higher discrimination between the DLs.

In parallel, these efficiency scores were verified against the average results obtained by the COLS method (3 variables including OPEX). Further, the ranking of the Distribution Licensees was verified across DEA, COLS, and PPIs. Under every method DL2 is the lowest ranked and DL4 is the next lowest. DL1, DL3, and DL5 showed average efficiencies of more than 90% under both DEA and COLS.

Considering that Sri Lanka is at an early stage of regulatory implementation, it is recommended to persuade the underperforming DLs to improve. These efficiency scores would give the regulator a strong platform for deciding on the X-factor in order to control the allowed revenue of the Distribution Licensees. The methodology produced by this research takes into account the inherent constraints prevailing in the Sri Lankan context, such as the low number of samples (i.e., 5 Distribution Licensees) and the unavailability of published/audited historical OPEX data (four DLs come under one legal entity and therefore had no separate audited accounts until the year 2010). Hence, the electricity regulator can use the proposed methodology to start evaluating efficiencies and begin incorporating the efficiencies of Distribution Licensees in the electricity distribution revenue control formula. This would encourage Distribution Licensees to minimize inefficiencies in their operations and maintenance. Further, the possible reduction in allowed revenue would eventually be passed down to the consumers.

#### Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.