#### Abstract

This research focuses on forecasting future water supply requirements from the current water requirement and on identifying the possibility of cracks and leaks using the ARIMA (autoregressive integrated moving average) model. The experiments were conducted using real-time experimental hardware. The *p*-value of the pressure data obtained is less than 0.05, which represents the stability of the data in the ARIMA model. The forecasted pressure data at normal condition range between 0.451379 N/m^{2} and 2.022273 N/m^{2}. The frequency of the forecasted pressure ranges between 1.706869 N/m^{2} and 3.065836 N/m^{2} (maximum peak) and between −0.81046 N/m^{2} and 1.042164 N/m^{2} (minimum peak). The forecasted pressure data at damaged condition lie between 2.880788 N/m^{2} and 3.29797 N/m^{2}, and the frequency ranges between 4.866227 N/m^{2} and 5.664348 N/m^{2} (maximum peak). Similarly, the forecasted water requirement for the next 1 year ranges between 614.6292 and 620.0099 liters/week, with the frequency of the forecast value having maximum peaks ranging from 617.0086 to 628.5465 liters/week and minimum peaks ranging from 611.0967 to 612.2914 liters/week. The above data are for a single water distribution pipeline.

#### 1. Introduction

Water pipelines face significant problems as a result of chemical leaks, fires, and deformations such as particle accumulation, corrosion, and cracks caused by a variety of factors. These problems have serious consequences, because the distribution of clean water is a major objective on which the whole world depends. Hopkins states that today's water supply system comprises the infrastructure that collects, manages, stores, and delivers water from water sources to consumers. Because of the lack of new natural water sources and a steadily growing population, innovative water resource management approaches are required [1]. Water distribution systems currently face several major challenges, including aging infrastructure, the demand for potable water, the protection of potable water quality, infrastructure degraded by system failures, environmental concerns, and rising energy costs [2]. Another significant issue facing water utilities is leakage; until recently, most efforts addressed this issue only after a break or leak had occurred [3, 4]. Many researchers have studied the mean-shift algorithm with a Gaussian profile and applied it in tracking systems to improve object-tracking performance [5, 6]. Monica et al. report that breaks and leaks are caused primarily by large variations in pipe pressure within a water distribution system [7, 8]. The proposed research is related to oil pipeline transportation monitoring using a machine learning algorithm [9–12], which is the base work for the present research. Consequently, this work aims to identify in advance the problems that may occur in water distribution systems, using a time-series forecasting algorithm. The ARIMA model predicts future values from the past history of variations in a dataset. However, it cannot be used for highly uncertain long-range predictions [13].
Jianbin Huang et al. explain that time-series forecasting is the process of fitting models to historical data and then using those models to predict future observations [14]. In the first step, past observations are collected and analyzed to build a mathematical model that captures the underlying data-generating process. In the second step, the model is used to forecast future events. This approach is particularly useful when no satisfactory explanatory model is available [15, 16]. Various kinds of time-series forecasting are widely used [17]. One study used an integrated genetic algorithm (GA) to forecast monthly electrical energy consumption [18]. The authors of [19] carried out a detailed review of the different forecasting models. Among them, ANN models, support-vector machine models, and ARIMA models were the most preferred. The present research work uses the ARIMA model to forecast the future requirements of the water management system. Such analysis of pipeline breaks and damage is also used to monitor gas and oil pipelines, which are frequently damaged and pollute the environment [20–22]. Ling Yang et al. state that forecasting can be carried out using both seasonal and nonseasonal ARIMA models [23]. Time series are used in a variety of fields, including mathematical finance, manufacturing, event data, IoT data, and any other domain of applied science and engineering that requires temporal measurements [24–26]. As the fastest-growing segment of the database industry, time-series DBMSs attest to the industry's growing need for time-series analysis [27, 28].
A stormwater and drainage management system with an IoT module was developed by researchers to predict drainage using machine learning algorithms. The results show that a well-trained algorithm can predict drainage conditions. Hence, the proposed research work focuses on a time-series-based ARIMA model implemented in the RStudio software, which is one of the widely used platforms for stock predictions. The detailed mathematical study and the results are provided in the subsequent sections to establish the prediction accuracy.

#### 2. Methodology

##### 2.1. Experimental Hardware for Water Distribution System

The real-time data are collected from the experimental hardware for the water distribution system, which consists of various components: process tank, reservoir tank, I/P converter, pressure control valve, gate valve, manual hand valve, drain valve, pressure gauge, flow control valve, orifice flow meter, etc. [29], as shown in Figure 1. The experimental hardware comprises three different water control loops from the reservoir to the distribution station for measuring pressure, flow, and level in the water distribution system [30]. Pressure, flow, and level are measured using sensors, and the corresponding data are monitored using a master control engineering station [31, 32]. The control mechanism involves two-layer control, namely, the operator workstation and the engineering station. The former has higher control capability than the latter.

The output of the transmitters is fed as input to the input/output hub module, which is interfaced with the field control station. The process flow diagram for the experimental setup is shown in Figure 2, and the water flow through the setup is as follows [33, 34]. Water is drawn from the water reservoir, fed into the process level tank by a pump, and transmitted to the pipeline system. Separate setups for level, flow, and pressure recording are used to record the real-time data from the water distribution system [35, 36], as shown in Figures 3, 4, and 5. The water distribution system is pilot scale and is used for real-time data monitoring based on feedback from the pressure and flow sensors.

The data are obtained in the form of graphs and datasets through the HMI front end [37], as shown in Figures 4 and 6. After the recordings are made, the water is drained back into the water reservoir and the whole system is shut down. An auto-tuning PID controller is used in the experimental hardware for flow and pressure control [38, 39].

###### 2.1.1. The ARIMA Model for the Future Prediction

The autoregressive integrated moving average (ARIMA) model is used to predict future time-series values based on current pressure and flow data in the water distribution system. The ARMA and ARIMA models have many similarities. The general autoregressive model estimates predictions using previous values of the dependent variable [40, 41]. ARIMA uses a linear regression model in which lags of the series serve as predictors. The ARIMA model uses a forecasting equation based on the time-series regression equation, and it is represented as follows:

Predicted value *M* = weighted sum of recent values of *M* + weighted sum of recent values of the errors.
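As a minimal sketch of the word equation above, the following Python snippet computes a one-step prediction as a weighted sum of recent values plus a weighted sum of recent errors. The function name `arma_one_step`, the weights, and the sample numbers are hypothetical and chosen only for illustration; the study itself was carried out in R, where the weights would be estimated from the data.

```python
def arma_one_step(recent_values, recent_errors, ar_weights, ma_weights):
    """Return a one-step prediction as two weighted sums.

    All lists are ordered most recent first. The weights are assumed to be
    known here; in practice they are estimated when the model is fitted.
    """
    ar_part = sum(w * v for w, v in zip(ar_weights, recent_values))
    ma_part = sum(w * e for w, e in zip(ma_weights, recent_errors))
    return ar_part + ma_part


# Hypothetical values (not taken from the paper's datasets).
recent_values = [1.2, 1.0]   # M at t-1 and t-2
recent_errors = [0.1]        # forecast error at t-1
forecast = arma_one_step(
    recent_values, recent_errors, ar_weights=[0.5, 0.25], ma_weights=[0.4]
)
print(round(forecast, 4))
```

The two weight lists play the roles of the autoregressive and moving average parameters, respectively.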

The time-series forecasting equation is constructed as follows [33, 34]. First, the *n*th difference of *M* is defined, and it can be represented as follows:

When *n* = 2,

From equation (2), when *n* = 2, the result is the discrete analog of a second derivative. It represents the local acceleration of the time series rather than its local trend.
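In standard textbook notation, the differencing described here can be sketched as follows. The symbol $m_t$ for the *n*th difference of the series $M_t$ is an assumption introduced only for this sketch.

$$
\begin{aligned}
n = 0:&\quad m_t = M_t \\
n = 1:&\quad m_t = M_t - M_{t-1} \\
n = 2:&\quad m_t = (M_t - M_{t-1}) - (M_{t-1} - M_{t-2}) = M_t - 2M_{t-1} + M_{t-2}
\end{aligned}
$$

The *n* = 2 case is the difference of the first differences, not the difference from two periods earlier, which is why it behaves like a discrete second derivative.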

Hence, for the future predicted value *M*, the forecasting formula is represented as follows:

where *θ* is the moving average parameter, *M* is the predicted value, and *e* is the error term sampled with zero mean. Equation (3) can be rewritten and represented as the autoregressive model equation as follows:

where *L* represents the lag operator, and *α* represents the parameters of the autoregressive part of equation (3). Ordinary least squares (OLS) is used to estimate the slope coefficients in the ARIMA model. The unit root test method is used for this purpose. In equation (4), the autoregressive polynomial has a unit root with multiplicity *U*, as follows:

The ARIMA process is then obtained by rewriting equation (4) on the basis of equation (5), as follows:

From equation (5), the generalized ARIMA equation is defined by the following term, ARIMA (P, *Q*+1) process with “” roots.
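For orientation, the derivation above follows the standard textbook construction of an ARIMA process, which can be sketched as below. Several symbols are assumptions that are not defined in the text: $p'$ and $q$ for the autoregressive and moving average orders, $\varphi_i$ for the autoregressive parameters that remain after factoring out the unit roots, and $p = p' - U$.

$$
\begin{aligned}
M_t &= \sum_{i=1}^{p'} \alpha_i M_{t-i} + e_t + \sum_{i=1}^{q} \theta_i e_{t-i} \\
\Bigl(1 - \sum_{i=1}^{p'} \alpha_i L^i\Bigr) M_t &= \Bigl(1 + \sum_{i=1}^{q} \theta_i L^i\Bigr) e_t \\
1 - \sum_{i=1}^{p'} \alpha_i L^i &= \Bigl(1 - \sum_{i=1}^{p'-U} \varphi_i L^i\Bigr)(1 - L)^{U} \\
\Bigl(1 - \sum_{i=1}^{p} \varphi_i L^i\Bigr)(1 - L)^{U} M_t &= \Bigl(1 + \sum_{i=1}^{q} \theta_i L^i\Bigr) e_t
\end{aligned}
$$

In order, the four lines give the forecasting relation, its lag-operator form, the unit-root factorization, and the resulting ARIMA process. The sign convention for the $\theta_i$ terms varies between references.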

The following steps are involved in data processing using the ARIMA model for future prediction using R software functions.

(i) The dataset values are first imported into the environment pane of RStudio and then tested for class. The class should be a time-series format in order to predict and forecast future data.

(ii) The dataset is then converted from .xlsx to .ts format and checked with the autocorrelation function acf(), the partial autocorrelation function pacf(), and the augmented Dickey-Fuller test adf.test() to assess the stationarity of the time-series dataset values.

(iii) The time-series dataset values are partially or completely converted into a simple or fully formed statistical model by using the ARIMA model, and the *p*-value of the newly formed model is checked to assess the sustainability of the statistical model. The *p*-value should be less than alpha (0.05) to conclude that the model is statistically significant [9, 10].

(iv) The acf(), pacf(), and adf.test() functions are applied to the newly formed ARIMA model to check the significance of the fitted model.

(v) The responses with the forecasted values are obtained from RStudio, and the predicted data are collected by executing the commands in the RStudio console.
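The checks in step (ii) can be illustrated with a small, self-contained Python sketch. It is not the R workflow used in this study: the helper names `difference`, `acf`, and `significance_bound` are hypothetical, the series is synthetic, and the bound of 1.96/√N is the usual large-sample approximation for white noise. The augmented Dickey-Fuller test is not reproduced here.

```python
import math


def difference(series, order=1):
    """Apply first differencing `order` times (the 'integrated' part of ARIMA)."""
    values = list(series)
    for _ in range(order):
        values = [b - a for a, b in zip(values, values[1:])]
    return values


def acf(series, max_lag):
    """Sample autocorrelation of `series` for lags 0..max_lag."""
    n = len(series)
    mean = sum(series) / n
    centered = [x - mean for x in series]
    denom = sum(c * c for c in centered)
    return [
        sum(centered[t] * centered[t - lag] for t in range(lag, n)) / denom
        for lag in range(max_lag + 1)
    ]


def significance_bound(n):
    """Approximate 95% bound for the autocorrelations of white noise."""
    return 1.96 / math.sqrt(n)


# Synthetic trending series (an assumption, not the paper's data).
trend = [0.5 * t + (0.3 if t % 2 else -0.3) for t in range(40)]
raw_acf = acf(trend, max_lag=5)
bound = significance_bound(len(trend))

# A lag-1 autocorrelation far outside the bound suggests non-stationarity.
print(raw_acf[1] > bound)

# Second differences of a quadratic sequence are constant.
second_diff = difference([0, 1, 4, 9, 16], order=2)
print(second_diff)
```

Autocorrelations that stay outside the bound over many lags are the kind of pattern that motivates differencing before a model is fitted.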

#### 3. Results

In order to predict the future needs in supplying water through a pipeline, historical output data of the water pipeline system were recorded regularly. With the help of the experimental hardware, data were collected periodically and used as datasets for predicting future water requirements. These pressure and flow datasets play a vital role in future prediction and forecasting [11, 12].

##### 3.1. Forecasting the Possibility of Occurrence of Cracks and Leaks Based on Pressure Datasets

###### 3.1.1. Input Time-Series Data Graph

The input dataset values were obtained by adjusting the pressure control valve attached to the pipeline over equal intervals of time under three different conditions, namely, normal, crack, and damaged. These values, stored in Excel and fed to RStudio, are plotted in Figure 7.

###### 3.1.2. Forecasting Process-Normal Condition

The input dataset values of the normal condition are imported through the SCADA system to RStudio. The plot of the values associated with the pressure at normal condition is shown in Figure 8. Because the forecasting process requires statistical data, RStudio converts the input pressure data into time-series data with the time-series function, and the resulting time-series values at normal condition are plotted with respect to time [42].

The stability and sustainability of the time-series data of pressure at normal condition are checked, and the results of acf() and pacf() plots are shown in Figure 9.

Figure 9 shows that the data are not within the expected range for stability, so an ARIMA model must be fitted to obtain statistically stable data. The ARIMA model is built for prediction; the stability and sustainability of the pressure residuals of the ARIMA model at normal condition are checked; and the resulting acf() and pacf() plots are shown in Figure 10.

Figure 10 shows that the data are within the expected range for stability, and the *p*-value is less than 0.05, which represents the stability of the data in the ARIMA model of pressure at normal condition over a period of time, so the model can be used for forecasting. The lag value represents the lead time period over which the average of moving values obtained from the preceding data values remains the same for the same lead time period in the succeeding data. The predicted value of the ARIMA model of pressure forecast at normal condition is plotted on the graph as shown in Figure 11.

Figure 11 represents the future forecasted data of pressure at normal condition for the next 1 hour, where the blue line represents the average of the forecast value, which ranges from a minimum peak of 0.451379 N/m^{2} to a maximum peak of 2.022273 N/m^{2}. The shaded region represents the frequency of the forecast value, with maximum peaks ranging from 1.706869 N/m^{2} to 3.065836 N/m^{2} and minimum peaks ranging from −0.81046 N/m^{2} to 1.042164 N/m^{2}. The average of the forecast value decreases as the pressure from the pressure control valve is reduced over equal intervals of time, and it is assumed to be constant when the pressure from the pressure control valve remains constant. The predicted time-series data are based on pressure data collected over a short period, about 125 minutes of cumulative data. If data were collected over a longer period, say monthly or yearly data, the prediction accuracy would be higher.

###### 3.1.3. Forecasting Process-Crack Condition

The input dataset values of the crack condition are fed to RStudio, and the plot of the values associated with the pressure at the crack condition is shown in Figure 12.

The input dataset values of pressure at the crack condition in the pipe are fed to RStudio. Because the forecasting process requires statistical data, RStudio converts the input data into time-series data with the time-series function, and the plot of the time-series values associated with the pressure at the crack condition with respect to time is shown in Figure 13.

The stability and sustainability of the time-series data of pressure at crack condition are checked, and the results of acf() and pacf() plots are shown in Figure 14.

Figure 14 shows that the data are not within the expected range for stability, so an ARIMA model must be fitted to obtain statistically stable data. The ARIMA model is built for prediction; the stability and sustainability of the pressure residuals of the ARIMA model at crack condition are checked; and the resulting acf() and pacf() plots are shown in Figure 15.

Figure 15 shows that the data are within the expected range for stability, and the *p*-value is less than 0.05, which represents the stability of the data in the ARIMA model of pressure at crack condition over a period of time, so the model can be used for forecasting. The lag value represents the lead time period over which the average of moving values obtained from the preceding data values remains the same for the same lead time period in the succeeding data. The predicted value of the ARIMA model of pressure forecast at crack condition is plotted on the graph as shown in Figure 16.

Figure 16 represents the future forecasted data of pressure at crack condition for the next 1 hour, where the blue line represents the average of the forecast value, which remains at 0.144865 N/m^{2} because the pressure at the crack region stays the same, without any rise or fall, irrespective of the change in pressure made by the pressure control valve. The minimum peak of the forecast value remains at −0.02742 N/m^{2}, and the maximum peak of the forecast value remains at 0.317153 N/m^{2}. This implies that the pressure remains unchanged over time until the crack in the pipeline is sealed. The estimated value appears constant with respect to time because only a short record of pressure data is used for the forecasting analysis. Expanding the dataset would improve prediction accuracy.

###### 3.1.4. Forecasting Process-Damaged Condition

The input dataset values of the damaged condition are fed to RStudio, and the plot of the values associated with the pressure at damaged condition is shown in Figure 17.

The input dataset values of pressure at the damaged condition in the pipe are fed to RStudio. Because the forecasting process requires statistical data, RStudio converts the input data into time-series data with the time-series function, and the plot of the time-series values associated with the pressure at the damaged condition with respect to time is shown in Figure 18.

The stability and sustainability of the time-series data of pressure at damaged condition are checked, and the results of acf() and pacf() plots are shown in Figure 19.

Figure 19 shows that the data are not within the expected range for stability, so an ARIMA model must be fitted to obtain statistically stable data. The ARIMA model is built for prediction; the stability and sustainability of the pressure residuals of the ARIMA model at damaged condition are checked; and the resulting acf() and pacf() plots are shown in Figure 20.

Figure 20 shows that the data are within the expected range for stability, and the *p*-value is less than 0.05, which represents the stability of the data in the ARIMA model of pressure at damaged condition over a period of time, so the model can be used for forecasting. The lag value represents the lead time period over which the average of moving values obtained from the preceding data values remains the same for the same lead time period in the succeeding data. The predicted value of the ARIMA model of pressure forecast at damaged condition is plotted on the graph as shown in Figure 21.

Figure 21 represents the future forecasted data of pressure at damaged condition for the next 1 hour, where the blue line represents the average of the forecast value, which increases from a minimum peak of 2.880788 N/m^{2} to a maximum peak of 3.29797 N/m^{2} and then becomes constant. The shaded region represents the frequency of the forecast value, with maximum peaks ranging from 4.866227 N/m^{2} to 5.664348 N/m^{2} and minimum peaks ranging from 0.895349 N/m^{2} to 0.931593 N/m^{2}. The constant average value of the forecast implies that the pressure remains unchanged over time until the damaged pipeline is replaced with a new pipeline.

##### 3.2. Flow Forecasting

###### 3.2.1. Input Time-Series Data Graph

The input dataset values, obtained from Excel by monitoring the flow through a pipeline on a weekly basis and fed to RStudio, are plotted in Figure 22.

###### 3.2.2. Forecasting Process

The input dataset values of flow in the pipe are fed to RStudio. Because the forecasting process requires statistical data, RStudio converts the input data into time-series data with the time-series function, and the plot of the time-series values associated with the flow with respect to time is shown in Figure 23.

The stability and sustainability of the time-series data of flow are checked, and the results of acf() and pacf() plots are shown in Figure 24.

Figure 24 shows that the data are not within the expected range for stability, so an ARIMA model must be fitted to obtain statistically stable data. The ARIMA model is built for prediction; the stability and sustainability of the flow residuals of the ARIMA model are checked; and the resulting acf() and pacf() plots are shown in Figure 25.

Figure 25 shows that the data are within the expected range for stability, and the *p*-value is less than 0.05, which represents the stability of the data in the ARIMA model of flow over a period of time, so the model can be used for forecasting. The lag value represents the lead time period over which the average of moving values obtained from the preceding data values remains the same for the same lead time period in the succeeding data. The predicted value of the ARIMA model of flow forecast is plotted on the graph as shown in Figure 26.

Figure 26 represents the future forecasted data of water requirement for the next 1 year, where the blue line represents the average of the forecast value, which increases from a minimum peak of 614.6292 liters/week to a maximum peak of 620.0099 liters/week and then becomes constant. The shaded region represents the frequency of the forecast value, with maximum peaks ranging from 617.0086 to 628.5465 liters/week and minimum peaks ranging from 611.0967 to 612.2914 liters/week. This implies that the required water quantity increases in the next year, as represented by the average of the forecast data.

#### 4. Conclusions

The identification of cracks, pressure differences, leakages, and blockages from abnormal pressure rises or drops in the municipal water distribution system was analyzed [43, 44]. In addition, the water flow in the pipeline was monitored, and future requirements were predicted by analyzing present utilization from the flow datasets obtained over the preceding two years. The flow datasets were collected periodically for use with the time-series forecasting algorithm [45, 46] and were used as input to RStudio. As an output of the algorithm, the need for the next year was predicted and presented in the form of a graph. This provides a strategy for forecasting the needs of the upcoming year. The stability of the statistical data was confirmed by the *p*-value (less than 0.05) provided by the ARIMA model for both the pressure and flow datasets. The future prediction of the pressure in damaged condition would range between 2.880788 and 3.29797 N/m^{2}. The future prediction of the pressure in normal condition would range between 0.451379 and 2.022273 N/m^{2}. The future prediction of the pressure in crack condition would remain unchanged but lower than in the normal condition. The future prediction of the flow requirement would range between 614.6292 and 620.0099 liters/week. Furthermore, future studies may incorporate smart pigs for fault identification and data transmission in oil pipelines.

#### Data Availability

The required data can be obtained from the corresponding author upon request.

#### Conflicts of Interest

The authors declare that they have no conflicts of interest.

#### Acknowledgments

Editing and writing were carried out at the Debre Tabor University, Ethiopia.