#### Abstract

The optimal management of a water distribution system includes determining the operating regime of each hydrophore station. Optimal operation means meeting the water demand with minimum electrical energy consumption. Analyzing the load profiles of a water distribution system can be the first step that water companies take to assess their electrical energy consumption. This paper presents a new method to assess the electrical load in water distribution systems, taking into account the time-dependent evolution of loads from the hydrophore stations. The proposed method is tested on a real urban water distribution system, showing its effectiveness in obtaining the electrical energy consumption with a relatively low computational burden.

#### 1. Introduction

Water and energy are critical resources that affect virtually all aspects of daily life. A huge amount of electrical energy is necessary for the transportation, treatment, and distribution of water for drinking and industrial consumption and for different internal technological processes of water distribution systems. Water distribution systems are massive consumers of energy, which is consumed at each stage of the water production and supply chain: pumping the water to the water treatment plant, treating it, and distributing it via the network. The Watergy report by the Alliance to Save Energy asserts that 2–3% of the world’s electrical energy consumption is used to pump and treat water for civil and industrial supply [1]. Energy costs constitute the largest expenditure for nearly all water utilities worldwide and can consume up to 65 percent of a water utility’s annual operating budget [2]. The energy requirements vary significantly from city to city, depending on local factors such as topography, location and quality of water sources, pipe dimensions and configurations, treatment standards required, and the types and numbers of consumers [1–8]. Water industry decisions on operational strategies and technology selection can also significantly influence electrical energy consumption [5]. High electrical energy consumption may have various causes: inefficient pump stations, poor design, installation or maintenance, old pipes with high head loss, bottlenecks in the supply networks, excessive supply pressure, or inefficient operation strategies of various supply facilities [2–4, 9–16].

Energy-saving measures in water supply systems can be realized in many ways, from decreasing the volume of water pumped (e.g., adjusting pressure zone boundaries) to reducing the price of energy (e.g., avoiding peak hour pumping and making effective use of storage tanks) or increasing the efficiency of pumps (e.g., ensuring that pumps are operating near their best efficiency point). These energy-saving measures often pay for themselves in months; most do so within a year, and almost all recover their costs within three years. Delaying their implementation would increase the investment required in the long term [11–16].

Utilities can further reduce energy costs by implementing on-line telemetry and control systems (SCADA), by managing their energy consumption more effectively, and by improving the overall operation of water supply systems [2]. The motivation for introducing such systems is due to the following factors [2, 9–15]:

(i) the operation of water supply systems is in many cases becoming more complex, with rising demands, water from a variety of sources, and aging systems;

(ii) operating costs are high, which justifies investments to improve efficiency;

(iii) control and computer hardware and software are available and more reliable;

(iv) as more computer control systems are installed, there is more experience from which to learn.

Therefore, a permanent policy for reducing electrical energy consumption not only involves technical improvements in the water distribution system but also requires software tools that facilitate the operation process. Based on this concept, the paper proposes a new approach that exploits the similarities between the daily load profiles of the hydrophore stations in a water distribution system and groups these profiles into representative clusters. By knowing the load profile, water companies can simplify the demand determination for their supply zone. Thus, they can devise better and more efficient marketing strategies.

Different techniques have been used in the literature for classification and load profiling, but most of them were implemented to solve problems in power systems. Table 1 presents a synthesis of the solutions proposed in the literature, grouped by type of technique.

It can be seen that most techniques belong to Artificial Intelligence (clustering, data mining, self-organizing maps, neural networks, and fuzzy logic) and the number of references is higher after the year 2000. This aspect can be explained by the fact that more and more information is becoming available, faster than ever before. The impact is felt in the control and operation of the distribution and transport networks (electricity, steam, gas, and water). Under these circumstances of momentous changes, Artificial Intelligence has the potential to play a more important role.

This paper proposes an extension of profiling techniques to the area of water distribution systems for the assessment of electrical load using a hybrid algorithm. The algorithm uses the *K*-means clustering method to obtain representative load profiles and a statistical approach to assess the electrical load in water distribution systems. The remainder of this paper is organized as follows. In Section 2, the *K*-means clustering method is presented. Section 3 presents the steps of the process for obtaining representative load profiles. Section 4 presents the statistical method for the assessment of the electrical load in a water distribution system. Section 5 presents the steps of the proposed algorithm. Section 6 shows the results of testing the proposed method on a real system. Finally, Section 7 contains the concluding remarks.

#### 2. *K*-Means Clustering Method

*K*-means clustering is an algorithm that classifies or groups objects, based on their attributes/features, into $K$ groups ($K$ is a positive integer). The grouping is done by minimizing the sum of squares of distances between the data and the corresponding cluster centroid [14, 17–20]:

$$J = \sum_{j=1}^{K} \sum_{x \in S_j} \lVert x - z_j \rVert^{2}, \tag{1}$$

where $z_j$ is the center of cluster $S_j$, while $\lVert x - z_j \rVert$ is the Euclidean distance between a point $x$ and $z_j$.

Thus, the criterion function attempts to minimize the distance of each point from the center of the cluster to which the point belongs. More specifically, the algorithm begins by initializing a set of cluster centers. Then, it assigns each object of the dataset to the cluster whose center is the nearest and recomputes the centers. The process continues until the centers of the clusters stop changing.

The steps of the algorithm are the following [14, 19–22].

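*Step 1.* Choose $K$ initial cluster centres $z_1(1), z_2(1), \dots, z_K(1)$.

*Step 2.* At the *k*th iterative step, distribute the samples $x$ among the $K$ clusters using the relation

$$x \in S_j(k) \quad \text{if} \quad \lVert x - z_j(k) \rVert < \lVert x - z_i(k) \rVert \quad \text{for all } i = 1, 2, \dots, K,\ i \neq j, \tag{2}$$

where $S_j(k)$ denotes the set of samples whose cluster centre is $z_j(k)$.

*Step 3.* Compute the new cluster centres $z_j(k+1)$, $j = 1, 2, \dots, K$. The new cluster centre is given by

$$z_j(k+1) = \frac{1}{N_j} \sum_{x \in S_j(k)} x, \tag{3}$$

where $N_j$ is the number of objects in cluster $S_j(k)$.

*Step 4.* Repeat Steps 2 and 3 until convergence is achieved, that is, until a pass through the training sample causes no new assignments.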

It is clear from this algorithm that the final clusters depend on the initial cluster centers chosen and on the value of $K$.
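Steps 1 to 4 can be sketched in a few lines of NumPy. This is a minimal illustration rather than the implementation used in this study: the function name `k_means`, the random choice of initial centres, the `seed` and `max_iter` parameters, and the handling of empty clusters are all assumptions made for the example.

```python
import numpy as np


def k_means(samples, n_clusters, max_iter=100, seed=0):
    """Plain K-means following Steps 1-4 (illustrative sketch only)."""
    samples = np.asarray(samples, dtype=float)
    rng = np.random.default_rng(seed)
    # Step 1: choose the initial centres (assumption: distinct random samples).
    start = rng.choice(len(samples), size=n_clusters, replace=False)
    centres = samples[start]
    labels = np.full(len(samples), -1)
    for _ in range(max_iter):
        # Step 2: assign every sample to its nearest centre (Euclidean distance).
        dists = np.linalg.norm(samples[:, None, :] - centres[None, :, :], axis=2)
        new_labels = dists.argmin(axis=1)
        # Step 4: stop when a full pass causes no new assignments.
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
        # Step 3: recompute each centre as the mean of its members.
        for j in range(n_clusters):
            members = samples[labels == j]
            if len(members) > 0:  # assumption: an empty cluster keeps its old centre
                centres[j] = members.mean(axis=0)
    return labels, centres
```

Because the result depends on the initial centres, a practical run would typically repeat the procedure with several values of `seed` and keep the partition with the smallest sum of squared distances.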

The optimal number of clusters can be determined using the following algorithm [19–21]:

(1) Determine the maximum number of clusters $K_{\max}$, which is bounded by a condition on $N$, the number of clustered objects in the database.

(2) Apply the *K*-means clustering method with a given $K$ to the set of objects in the database.

(3) Evaluate the partition quality of the obtained cluster structure.

(4) Increase the number of clusters to $K + 1$ to see whether the *K*-means clustering method finds a better grouping of the data (repeat Steps 2 and 3).

(5) Report the number of clusters that yields the optimal value of the global silhouette coefficient.

Assessing the results of the *K*-means algorithm is the main subject of cluster validity. In the process of cluster analysis, the following properties of clusters are examined: density, size and form of the clusters, separability of the clusters, and robustness of the classification. There are many approaches to cluster validation [17, 20, 23–27], but internal cluster validation tests are more popular in the practice of cluster analysis. The test based on the calculation of the Silhouette Global Index is one of the most used [20, 23, 24, 28]. It calculates the silhouette width for each sample, the average silhouette width for each cluster, and the overall average silhouette width for the data set. Using this approach, each cluster can be represented by a so-called silhouette, which is based on the comparison of its tightness and separation. Cluster validity checking is one of the most important issues in cluster analysis and is related to the inherent features of the data set under concern. It aims at evaluating clustering results and selecting the scheme that best fits the underlying data.
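The selection of the number of clusters by silhouette can be illustrated as follows. This is a hedged sketch: it uses the common definition of the silhouette width, $s = (b - a) / \max(a, b)$, and takes the overall average over all samples as the global coefficient. The function names, the `cluster_fn` callback (any routine returning one label per sample), and the default upper bound $\sqrt{N}$ on the number of clusters are assumptions, not values taken from this paper.

```python
import numpy as np


def silhouette_widths(samples, labels):
    """Silhouette width of every sample; needs at least two clusters."""
    samples = np.asarray(samples, dtype=float)
    labels = np.asarray(labels)
    dist = np.linalg.norm(samples[:, None, :] - samples[None, :, :], axis=2)
    clusters = np.unique(labels)
    widths = np.zeros(len(samples))
    for i in range(len(samples)):
        own = labels == labels[i]
        n_own = int(own.sum())
        if n_own == 1:
            continue  # convention: a singleton cluster gets width 0
        # a: mean distance to the other members of the same cluster
        # (the distance of the sample to itself is 0, hence n_own - 1).
        a = dist[i, own].sum() / (n_own - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(dist[i, labels == c].mean() for c in clusters if c != labels[i])
        denom = max(a, b)
        widths[i] = (b - a) / denom if denom > 0 else 0.0
    return widths


def global_silhouette(samples, labels):
    """Overall average silhouette width of the data set."""
    return float(silhouette_widths(samples, labels).mean())


def choose_n_clusters(samples, cluster_fn, k_max=None):
    """Try K = 2..k_max and return the K with the highest global silhouette."""
    if k_max is None:
        k_max = int(np.sqrt(len(samples)))  # assumption: sqrt(N) rule of thumb
    scores = {}
    for k in range(2, k_max + 1):
        scores[k] = global_silhouette(samples, cluster_fn(samples, k))
    best = max(scores, key=scores.get)
    return best, scores
```

Values of the global coefficient close to 1 indicate compact, well-separated clusters, while values near 0 or below indicate overlapping or misassigned samples.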

#### 3. Determination of Representative Load Profiles

Load profiling is an alternative to settlement based on energy meters because, in many countries, water distribution systems lack the metering and monitoring systems needed to collect data. The load diagram of a hydrophore station is reconstructed from its normalized load profile and its daily (or monthly or yearly, depending on the case) electrical energy consumption. The load curve can be sampled at one-hour intervals, in which case a load profile is represented by 24 load values throughout the day. The shape of a load profile is influenced by the type of hydrophore station and by the type of day or season of the year. Because handling a large number of load profiles from various hydrophore stations is cumbersome, and because similarities exist between load profiles, the profiles can be grouped into coherent groups. For this purpose, the *K*-means clustering method is applied to classify the profiles of the hydrophore stations into coherent groups, each described by a representative load profile (RLP).

Each RLP is represented by a vector $\mathbf{r}_i$ for $i = 1, \dots, K$, and the comprehensive set of RLPs is contained in the set $R$. The time scale along the day is partitioned into $T$ time intervals of duration $\Delta t_h$, for $h = 1, \dots, T$. Hourly values are used in this paper to exemplify the application. The variables used in the calculations are assumed to be represented as constant (average) values within each time interval. The clustering process forms $K$ clusters corresponding to the hydrophore stations. Further, an RLP is assigned to each station.

The algorithm is based on the load profiling process. The major steps are as follows.

*(1) Measurements*. In this step, a representative sample of the set of load profiles is identified, the most relevant attributes to be measured are chosen, and the cadence for data collection is defined. Finally, the collected data are gathered in a large database.

*(2) Data Cleaning and Preprocessing*. In real problems like this one, which involve a large number of measurements spread over a large geographic area and collected over a considerable period of time, different kinds of problems affect the quality of the database. The most relevant and frequent are communication problems, outages, failure of meters, and irregular, atypical behavior of some consumers. The result is a very large database with problems such as noise, missing values, and outliers. After being cleaned, preprocessed, and reduced, these data are used in the clustering process [43, 55].

*(3) Classification*. The *K*-means clustering method is used for this classification. The normalized load profile of each hydrophore station in the water distribution system is determined using a suitable normalizing factor (e.g., the energy over the surveyed period):

$$p_{j,h} = \frac{P_{j,h}}{W_j}, \quad j = 1, \dots, N, \tag{4}$$

where $p_{j,h}$ is the normalized value [p.u.] of profile $j$ at hour $h$; $P_{j,h}$ is the actual value [kW]; $W_j$ is the energy over the surveyed period [kWh]; and $N$ is the total number of vectors in the database.

*(4) Determination of Representative Load Profiles.* Using the *K*-means clustering method, the normalized load profiles are refined so as to discard the unrepresentative profiles. The representative load profile of each cluster is obtained by averaging the normalized values for each hour. These values (called the load factors) lead to the representative load profiles corresponding to the active powers. The load factors for each cluster are calculated with the relation

$$f_{i,h} = \frac{1}{N_i} \sum_{j=1}^{N_i} p_{j,h}^{(i)}, \tag{5}$$

where $p_{j,h}^{(i)}$ is the normalized value at hour $h$ of the $j$th hydrophore station in cluster *i* and $N_i$ represents the number of hydrophore stations in cluster *i*.

The significance of the load factors $f_{i,h}$ from relation (5) is the following: these factors transform the electrical energy consumed by the average member of cluster $i$ into the average power demanded by it.

The representative load profiles can characterize very well the operation mode of the water hydrophore stations (identified by the clusters obtained), related to the electrical energy consumption.
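The averaging step can be illustrated with a small sketch, assuming the normalized profiles are stored one station per row and that a cluster label is already available for each station. The function name `representative_profiles` is hypothetical.

```python
import numpy as np


def representative_profiles(normalized, labels):
    """Average the normalized profiles of each cluster, hour by hour.

    normalized: array of shape (n_stations, n_hours).
    labels: one cluster label per station.
    Returns a dict mapping each cluster label to its representative profile.
    """
    normalized = np.asarray(normalized, dtype=float)
    labels = np.asarray(labels)
    return {
        int(c): normalized[labels == c].mean(axis=0)  # hourly load factors
        for c in np.unique(labels)
    }
```

Multiplying the representative profile of a cluster by the energy consumption of one of its stations then gives an approximate power curve for that station.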

*(5) Assignment*. Finally, a representative load profile is assigned to each water hydrophore station.

#### 4. The Assessment of Electrical Load in Water Distribution Systems

The assessment of electrical load in water distribution systems can be made using an improved simulation method based on the representative load profiles of hydrophore stations. The method is based on the following hypotheses [30, 31, 56]:

(i) The mean loads corresponding to a cluster of hydrophore stations from the water distribution system in any hour during the analyzed period are approximately proportional to the electrical energy consumption of those stations.

(ii) The loads for any hour during the analyzed period have a statistical distribution that can be regarded as normal.

Using these hypotheses, the load estimation of a water distribution system, at any hour *h*, is given by the following formula:
where is the load of the water distribution system at the hour* h*; is the number of clusters corresponding to the hydrophore stations from water distribution system. is the number of the hydrophore stations from cluster* i*; is the average energy consumption of the hydrophore stations from cluster* i*; is the average load factor of the hydrophore stations from cluster* i*, at the hour* h*; is the standard deviation for load factor from hydrophore stations corresponding to cluster* i*.

The standard deviations for each cluster are calculated with relation
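As a rough numerical illustration of hypothesis (i), the sketch below gathers the per-cluster quantities named above (number of stations, average energy consumption, hourly average load factor, and hourly standard deviation of the load factors) and combines the first three into an expected hourly load. It is a sketch under stated assumptions and not relation (6) itself: it computes only the mean term (number of stations times average energy times average load factor, summed over clusters), it does not model how the standard deviation enters the estimate, and it uses the population form of the standard deviation (`ddof=0`). All function and key names are hypothetical.

```python
import numpy as np


def cluster_statistics(normalized, energy_kwh, labels):
    """Per-cluster inputs of the load estimate (illustrative names)."""
    normalized = np.asarray(normalized, dtype=float)
    energy_kwh = np.asarray(energy_kwh, dtype=float)
    labels = np.asarray(labels)
    stats = {}
    for c in np.unique(labels):
        members = labels == c
        stats[int(c)] = {
            "n_stations": int(members.sum()),
            "mean_energy_kwh": float(energy_kwh[members].mean()),
            "mean_load_factor": normalized[members].mean(axis=0),
            # Assumption: population standard deviation (ddof=0).
            "std_load_factor": normalized[members].std(axis=0),
        }
    return stats


def expected_system_load(stats):
    """Mean hourly load [kW]: sum over clusters of N_i * W_i * load factor."""
    return sum(
        s["n_stations"] * s["mean_energy_kwh"] * s["mean_load_factor"]
        for s in stats.values()
    )
```

When all stations of a cluster share the same energy consumption, this mean term reproduces the sum of their loads exactly; otherwise it is an approximation whose spread is characterized by the standard deviations.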

#### 5. Algorithm for Assessment of Electrical Load in Water Distribution Systems

The algorithm adopts a procedure composed of two calculation stages:

(i) a *clustering* technique (the *K*-means method) is used in the first stage to determine the patterns of electrical load and a subset of representative load profiles to be processed by the second stage: at each iteration, the clustering outcomes simplify the process of selecting a relatively small number of load profiles corresponding to hydrophore stations;

(ii) *load simulation* is used in the second stage: this approach is based on a statistical method which assumes that the loads for any hour during the analyzed period have a statistical distribution that can be regarded as normal.

The flow chart of the proposed method is shown in Figure 1.

For a water distribution system, the input information refers to the technical characteristics (rated power of the force pumps, rated water flow) and the hourly load patterns of hydrophore stations.

##### 5.1. Determination of Representative Load Profiles

The load profile of each hydrophore station is normalized relative to its daily electrical energy consumption, because this quantity is always known: it is recorded with meters placed in each hydrophore station. The optimal number of clusters is obtained using the *K*-means clustering algorithm presented in Section 2. Finally, the global silhouette coefficient is calculated to assess the partition quality. After aggregation of the normalized load profiles of each cluster, the representative profiles are determined and an RLP is assigned to each hydrophore station.

##### 5.2. The Assessment of Electrical Loads

In the second step of the study, the hourly loads of the analyzed water distribution system are obtained using the information from the clustering process (hourly average load factors, average energy consumption, and standard deviation of the load factors of the hydrophore stations in each cluster $i$, $i = 1, \dots, K$) and relation (6).

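The accuracy of the estimates can be expressed in different ways, depending on the data available. Thus, if the actual value of the estimated quantity is available (such as during method development and testing), the following quantity can be used to verify the method:

$$\mathrm{MAPE} = \frac{100}{T} \sum_{h=1}^{T} \frac{\left| P_h^{\mathrm{real}} - P_h^{\mathrm{est}} \right|}{P_h^{\mathrm{real}}} \ [\%], \tag{8}$$

where $P_h^{\mathrm{real}}$ and $P_h^{\mathrm{est}}$ represent the real and estimated values of the load of the water distribution system at hour $h$, and $T$ is the number of hourly intervals.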

The mean absolute percentage error (MAPE) from (8) is dimensionless and thus it can be used to compare the accuracy of the model on different data sets.
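The error measure can be computed as in the following sketch, which uses the standard definition of the MAPE; the function names and the sample values in the usage example are illustrative only and are not data from the case study.

```python
import numpy as np


def hourly_percentage_errors(real, estimated):
    """Absolute percentage error of the estimate at each hour [%]."""
    real = np.asarray(real, dtype=float)
    estimated = np.asarray(estimated, dtype=float)
    return 100.0 * np.abs(real - estimated) / real


def mape(real, estimated):
    """Mean absolute percentage error over all hours [%]."""
    return float(hourly_percentage_errors(real, estimated).mean())


# Illustrative values: real loads of 100 and 200 kW estimated as 110 and 190 kW
# give hourly errors of 10% and 5%, so the MAPE is 7.5%.
example = mape([100.0, 200.0], [110.0, 190.0])
```

The hourly errors also give the maximum and minimum estimation errors directly, which is useful when the worst hour matters more than the average.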

#### 6. Case Study

In order to show the characteristics of the proposed method for assessment of electrical energy consumption, a real water distribution system with hydrophore stations is considered. For this system, the input information refers to the technical characteristics (Table 2) and the hourly load patterns of the hydrophore stations.

Thus, in all stations, three or four similar force pumps with rated powers between 2.2 and 7.5 kW are installed. The required water flow is delivered at constant pressure by changing the frequency of the source supplying the electric motors of the force pumps. The load patterns are represented by the load profiles of the hydrophore stations. The individual load profiles were measured using an electronic meter composed of a sensor and an electronic device for pulse counting and data storage. The profiles were processed for the day on which the maximum load was registered in the water distribution system. The time interval is defined by taking hourly steps within a day (24 intervals of 1 hour).

The load profiles were normalized in relation to the daily electrical energy consumption. Further, the optimal number of clusters was determined using the algorithm described in Section 2. First, the maximum number of clusters was calculated. Then, the *K*-means clustering method was applied to the set of normalized active power profiles for each given number of clusters $K$. Finally, the global silhouette coefficient was calculated to assess each partition. Because the global silhouette coefficient has the highest value for $K = 3$, this represents the optimal solution of the clustering process (Figure 2). For this solution, the silhouette plot is presented in Figure 3.

The characteristics of the clusters are presented in Table 3. After aggregation of the normalized load profiles of each cluster (Figure 4), the representative profiles were determined. The representative load profile of each cluster is obtained by averaging the values for each hour, represented by the load factors. The representative load profiles corresponding to the three clusters obtained (*C*1, *C*2, and *C*3) are shown in Figures 5, 6, and 7.


One hourly value from the representative load profiles denotes the load of a water hydrophore station in per unit of the total average daily load of this station. The hourly load pattern can be employed to approximate the load pattern of any water hydrophore station within the same cluster.

The results of calculations carried out during the first stage of the clustering procedure are presented in Table 4.

From Table 4 it can be seen that the most consistent clusters are *C*1 and *C*3, which together account for about 85% of the total load profiles of the water hydrophore stations. In terms of technical characteristics, the water hydrophore stations in cluster *C*2 belong to types I and II (installed rated power less than 8.8 kW), while the stations in clusters *C*1 and *C*3 belong to types III–VI (installed rated power between 16 and 30 kW). In cluster *C*3, however, approximately 50 percent of the stations have an installed rated power of 22 kW. In terms of electrical energy consumption, the highest average value is registered for cluster *C*1 and the lowest for cluster *C*2.

In the second step of the study, the hourly loads of the analyzed water distribution system were obtained using the information from Tables 3 and 4 and relation (6). The real loads, estimated loads, and estimation errors are presented in Table 5 and Figures 8 and 9. The maximum estimation error is 4.79%, and the minimum error is 1.37%. Overall, the MAPE is 3.11%.

This may be considered a very good result, especially if we take into account the arbitrariness inherent in load behavior and the fact that not all hydrophore stations have a monitoring system.

#### 7. Conclusions

The investigation of combined actions, which account for connected water and electrical aspects, allows improving the global efficiency of water distribution systems and supplying the consumers using the least possible amount of water and energy.

In water distribution companies, the system operator predicts the consumption for today and tomorrow based on recent consumption trends, weather forecasts, the day of the week, knowledge of future events, and historical knowledge of utility system performance. This approach leads to a large percentage error (more than 5%) in the assessment of load.

The proposed method is based on two stages, in which the *K*-means clustering method and a load simulation technique are exploited to estimate the electrical energy consumption. The *K*-means clustering algorithm was used to determine the representative load profiles of the hydrophore stations. A comparison of the results obtained using the proposed method with the real registered data indicates an error of 3.11%, which is very close to the error expected by water companies (2.5–3%) [57].

The results obtained demonstrate the ability of the proposed method to become the first step in an efficient management of water distribution systems. On the one hand, the representative load profiles can be used to model overall water company demands in a way that shows how changes in use by one category affect the hourly load profiles of the system as a whole. On the other hand, these profiles can contribute to a better understanding of the opportunities for linking water-efficiency and energy-efficiency programs.

#### Conflict of Interests

The author declares that there is no conflict of interests regarding the publication of this paper.