Abstract

Drought is a complex natural hazard that occurs frequently in many parts of the world, so accurate drought forecasting is essential to mitigate its adverse impacts. This research investigates the applicability and suitability of the extreme learning machine (ELM) algorithm for drought forecasting. For numerical evaluation, time series of the Standardized Precipitation Temperature Index (SPTI) are used for nine meteorological stations located in various climatological zones of Pakistan. To benchmark ELM, this research also fits multilayer perceptron (MLP) and autoregressive integrated moving average (ARIMA) models. The performance of each model is assessed using root mean square error (RMSE), mean absolute error (MAE), mean absolute percentage error (MAPE), Kling-Gupta efficiency (KGE), the Willmott index (WI), and Karl Pearson’s correlation coefficient (R). The graphical results generally show that ELM performs better than the MLP and ARIMA models. For the SPTI-1 training data, ELM performed best at Chitral station (RMSE = 0.374, KGE = 0.838, WI = 0.960, MAE = 0.272, MAPE = 259.59, R = 0.93). For SPTI-1 at Astore station, the results are RMSE = 0.688, KGE = 0.988, WI = 0.997, MAE = 0.798, and MAPE = 247.35. Overall, ELM outperformed the other models, producing the smallest RMSE, MAE, and MAPE values and the largest KGE, WI, and correlation values at almost all selected meteorological stations for the 1-, 3-, 6-, 9-, and 12-month time scales. In summary, this research supports the use of ELM for accurate drought forecasting.

1. Introduction

Drought is a recurrent natural climatic phenomenon that occurs in most parts of the world. It results from a lack of precipitation over an extended period in a particular region [1, 2]. Drought is often described as a creeping phenomenon because it develops gradually in response to climatic fluctuations [3, 4]. Its impacts extend to agriculture, livestock, ecological systems, and the socio-economic and energy sectors [5, 6]. Moreover, drought can be categorized as meteorological, hydrological, agricultural, or socio-economic. Consequently, investigators must adopt a specific perspective to define the type of drought under study [1, 2, 7, 8].

Because of the complex nature of drought, its impacts are difficult to monitor and assess [1, 2, 6]. Accurately predicting drought, especially its onset and end, is considered difficult [9, 10]. Prolonged droughts adversely affect the economic, agricultural, and social sectors. These severe impacts are attributed to sudden and widespread climatic changes [11]. Drought can have devastating economic effects, with worldwide losses of around USD 9 billion per year; the US livestock industry lost USD 400 million during a severe drought in 2002 [12]. A comprehensive drought early warning system is therefore necessary to reduce these impacts. However, only a few studies have addressed the mitigation of this stochastic natural hazard [13–15]. Numerous drought indices have recently been developed to identify and monitor droughts and to inform mitigation policies [16–19]. Reference [20] proposed a new drought indicator, the Normalized Ecosystem Drought Index (NEDI), to monitor dryness conditions in a transitional ecosystem; NEDI is expected to quantify dryness conditions more accurately.

Drought indices expressed numerically are easier to interpret than raw rainfall data [1, 2, 21]. They are valuable tools for detecting the onset and termination of drought, information that is necessary for recovery planning, mitigation, and decision-making [22, 23]. Drought indices aim to quantify how drought conditions evolve and to classify the severity of drought events. They also facilitate drought modeling with stochastic time series models, neural network algorithms, and water balance models. The most commonly used drought indices are the Palmer Drought Severity Index (PDSI) [24], Surface Water Supply Index (SWSI) [25], Standardized Precipitation Index (SPI) [26], Effective Drought Index (EDI), and Standardized Precipitation Evapotranspiration Index (SPEI) [27]. Ali et al. [28] proposed a multiscalar drought index named the Standardized Precipitation Temperature Index (SPTI). These indices are calculated from different meteorological variables [29] and have been used to characterize, estimate, and forecast drought conditions.

Current long-range drought forecasts have limited reliability [30], and conventional stochastic models are inadequate for accurate drought prediction [31]. Machine learning (ML) models have been applied extensively in climatology, including the naïve Bayes classifier, Bayesian networks [32], support vector machines (SVMs), wavelet gene expression programming [33], maximum entropy, and artificial neural networks (ANNs). Several studies have found that ML models perform better than conventional stochastic and dynamic models for drought estimation [34, 35]. ANN models are inspired by the human brain and can be classified according to their neuron structure, number of hidden layers, and activation functions. Many researchers have successfully applied MLP neural networks for drought estimation and forecasting [36, 37], and MLP has been shown to forecast soil temperature accurately in semi-humid and arid regions [38]. Aghelpour et al. [39] improved agricultural drought modeling by coupling the dragonfly optimization algorithm with SVM. Furthermore, [40] modeled the RDI efficiently using hybrid support vector regression (SVR) coupled with the firefly algorithm (FA), the whale optimization algorithm (WOA), and wavelet analysis (WA); the results showed that the hybrid and coupled SVR techniques improved drought forecasting. Although ML models have an outstanding reputation in estimation, prediction, and forecasting, many are computationally slow [41]. Among ANN algorithms, ELM is widely used in various fields and has gained popularity in climatology and engineering [42–47]. Mouatadid and Adamowski [48] successfully forecasted urban water demand in Montreal (Canada) using ELM.

This research aims to assess the applicability and suitability of the extreme learning machine (ELM) algorithm for drought forecasting. ELM has previously been applied in different disciplines, including classification [49], regression [50, 51], clustering [52, 53], feature selection [54], pattern recognition [55], image processing [56], sediment transport estimation [57], and drought forecasting [30, 58, 59]. ELM offers significantly faster learning, better generalization performance, minimal human intervention, and accurate forecasting [60].

2. Materials and Methods

2.1. Data and Study Area

This research is based on nine meteorological stations distributed across Pakistan. The topographic map of the study region and the locations of the selected meteorological stations are shown in Figure 1. The study area is in South Asia and lies between 23.8° and 37°N latitude and 60.9° and 75.37°E longitude. The region has been classified into clusters of meteorological stations with diverse spatial characteristics [61], and the stations were selected to cover as much climatic variability as possible. The study area also encompasses five major river basins: the Ravi, Chenab, Sutlej, Jhelum, and Indus. These rivers are the backbone of the country’s agriculture and hydropower projects.

For this research, time series of monthly precipitation and minimum and maximum air temperatures were obtained from the Karachi Data Processing Center (KDPC) through the Pakistan Meteorological Department (PMD). The records span January 1951 to December 2016. The full record was split into two parts: January 1951 to December 2013 forms the training set, and the remaining three years, January 2014 to December 2016, form the test set. A relatively short test period was retained because the accuracy of climatological forecasts degrades at long ranges, whereas hazard mitigation requires accurate forecasts. Errors and irregularities in the data were detected and removed by KDPC itself, and missing values were filled by generating values from cumulative distributions over lead periods.
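The chronological split can be checked with a few lines of Python. This is a minimal sketch using NumPy's monthly `datetime64` type; `split_series` and the mask names are hypothetical and are not taken from the study's R code. Counting months shows that January 1951 to December 2013 contains 756 training months (63 years), and January 2014 to December 2016 contains 36 test months.

```python
import numpy as np

# Monthly time axis, January 1951 through December 2016 (the stop value is exclusive).
months = np.arange("1951-01", "2017-01", dtype="datetime64[M]")

# Chronological split: training up to December 2013, test from January 2014 onward.
train_mask = months <= np.datetime64("2013-12")
test_mask = ~train_mask


def split_series(values):
    """Split a monthly series aligned with `months` into training and test parts."""
    values = np.asarray(values)
    if values.shape[0] != months.shape[0]:
        raise ValueError("series must have one value per month, 1951-01 to 2016-12")
    return values[train_mask], values[test_mask]
```

Splitting by date rather than by a random shuffle keeps the test months strictly after the training months, which is what a forecasting evaluation requires.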

2.2. Standardized Precipitation Temperature Index (SPTI)

Standardized drought indices (SDIs) are widely used for drought monitoring. SDIs are standardized and spatially invariant tools for monitoring and assessing drought characteristics, and various methods for constructing them have been proposed in the literature. Examples include the Standardized Precipitation Index (SPI) [26], Standardized Precipitation Evapotranspiration Index (SPEI) [27], and Standardized Precipitation Temperature Index (SPTI) [28]. Precipitation and temperature are two essential climatological indicators that capture vital dynamics of climate and hydrology, so a standardized drought index based on both variables is particularly useful for drought monitoring and forecasting. The SPTI was therefore chosen as the SDI for this study. The calculation of SPTI is similar to that of SPI; a more detailed discussion can be found in [41]. SPTI is a multiscalar drought index and can be calculated for time scales from 1 to 48 months. Negative and positive values of the index indicate drought and wet conditions, respectively, and these conditions are classified in Table 1 [62, 63].

SPTI is a modified form of the De Martonne Aridity Index (DAI) (de Martonne, 1926), and its mathematical properties closely mirror those of SPI, an index used extensively for drought monitoring in many parts of the world. To compute SPTI, DAI is first calculated from the monthly total precipitation and the average monthly temperature. An appropriate probability distribution is then fitted to obtain cumulative probabilities for standardization. Many researchers use the gamma distribution for this step; however, the index values depend on the fitted distribution, and no single distribution is appropriate for all stations and time scales. Therefore, 32 candidate distributions were fitted to DAI at the different time scales, and the Bayesian Information Criterion (BIC) was used as the criterion for selecting the most appropriate distribution.
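The SPTI steps above can be sketched as follows. This is an illustrative sketch under stated assumptions, not the reference implementation of [28]: it assumes the classic De Martonne form $P/(T + 10)$ with $P$ in mm and $T$ in °C, a hypothetical $k$-month moving mean for the multiscalar aggregation, and a two-parameter gamma fit in place of the BIC-selected distribution. Months with zero precipitation, which SPI-type indices usually handle with a mixed distribution, are not treated specially. All function names are hypothetical.

```python
import numpy as np
from scipy import stats


def dai(precip_mm, temp_c):
    """De Martonne aridity index in its classic form P / (T + 10) (assumed form)."""
    return np.asarray(precip_mm, float) / (np.asarray(temp_c, float) + 10.0)


def aggregate(x, k):
    """Hypothetical k-month aggregation: moving mean of the current and k - 1 previous months."""
    return np.convolve(np.asarray(x, float), np.ones(k) / k, mode="valid")


def standardize(x):
    """Fit a two-parameter gamma (loc fixed at 0) and map CDF values to N(0, 1) quantiles."""
    a, loc, scale = stats.gamma.fit(x, floc=0)
    p = stats.gamma.cdf(x, a, loc=loc, scale=scale)
    p = np.clip(p, 1e-6, 1 - 1e-6)  # keep the normal quantiles finite in the tails
    return stats.norm.ppf(p)


def spti_sketch(precip_mm, temp_c, k=1):
    """Illustrative SPTI-k: DAI, k-month aggregation, then standardization."""
    return standardize(aggregate(dai(precip_mm, temp_c), k))
```

Because the mapping from DAI to the standard normal quantile is monotone, wetter months (larger DAI) receive positive index values and drier months negative ones.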

2.3. Candidate Algorithms

An artificial neural network (ANN) is a computational paradigm introduced in the 1950s. It is a data-driven technique in which information passes through multiple layers of neurons arranged in a structure inspired by biological neural networks. Unlike stochastic models, it imposes no constraints on the input variables used to train the model. These algorithms learn from the relationships among observations of input and auxiliary variables, and they can handle high-dimensional, high-frequency, complex datasets [58]. ANN algorithms have broad applications in mathematics, engineering, medicine, economics, neurology, and hydrology [64–67]. Kuligowski and Barros [68] reported that weather prediction could be improved using ANN algorithms, and this class of algorithms can help forecast natural hazards such as drought. The multilayer perceptron (MLP) is a widely used, fully connected feedforward ANN. It consists of an input layer, one or more hidden layers with a nonlinear activation function (commonly two hidden layers with multiple nodes each), and an output layer. Its neuron structure and estimation accuracy make MLP prominent among ANN algorithms. MLPs are typically trained with error backpropagation, a supervised learning technique. Another approach used for drought prediction is the stochastic ARIMA process [69].
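To make the backpropagation step concrete, the following NumPy sketch trains a small fully connected MLP by full-batch gradient descent on the mean squared error. It is not the nnfor implementation used in the study; the tanh hidden activation, the initialization, the learning rate, and the function names are illustrative assumptions. Passing `sizes=[23, 10, 5, 1]` would mirror the MLP structure described in Section 3.2.

```python
import numpy as np


def init_mlp(sizes, seed=0):
    """Random weights and zero biases for a fully connected net, e.g. sizes=[23, 10, 5, 1]."""
    rng = np.random.default_rng(seed)
    return [(rng.normal(0.0, 1.0 / np.sqrt(m), size=(m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]


def forward(params, X):
    """Return the activations of every layer: tanh in hidden layers, linear output."""
    acts = [X]
    for i, (W, b) in enumerate(params):
        z = acts[-1] @ W + b
        acts.append(z if i == len(params) - 1 else np.tanh(z))
    return acts


def mse(params, X, y):
    return float(np.mean((forward(params, X)[-1] - y) ** 2))


def backprop_step(params, X, y, lr=0.05):
    """One full-batch gradient-descent step on the mean squared error."""
    acts = forward(params, X)
    delta = 2.0 * (acts[-1] - y) / y.size  # dLoss/dz at the linear output layer
    new_params = list(params)
    for i in range(len(params) - 1, -1, -1):
        W, b = params[i]
        grad_W, grad_b = acts[i].T @ delta, delta.sum(axis=0)
        if i > 0:
            # Propagate the error to the previous layer; tanh'(z) = 1 - tanh(z)^2.
            delta = (delta @ W.T) * (1.0 - acts[i] ** 2)
        new_params[i] = (W - lr * grad_W, b - lr * grad_b)
    return new_params
```

Each step moves every weight against its error gradient; this iterative tuning over many epochs is the cost that ELM avoids by solving for its output weights in closed form.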

2.3.1. Seasonal Autoregressive Integrated Moving Average Model (SARIMA)

Yule [70] pioneered autoregressive (AR) models, in which the time series under analysis is a linear function of its own lagged values. Slutzky [71] modeled a time series as a function of past error terms, known as the moving average (MA) model. Wold [72] combined the AR and MA specifications into the generalized ARMA form, which can model stationary time series given an appropriate choice of AR and MA orders. Many time series exhibit trends and are therefore non-stationary; such series can be made stationary by appropriate differencing, and a series that becomes stationary after differencing is called an integrated series. ARIMA modeling follows a systematic identification, estimation, and diagnostic-checking procedure to reach an appropriate model. Many hydrological and meteorological time series have inherent seasonal components [73]. Such data can be modeled efficiently with the seasonal ARIMA model, which requires only a few parameters to be estimated [74]. The seasonal ARIMA model is written as $\mathrm{ARIMA}(p, d, q)(P, D, Q)_s$, where $(p, d, q)$ is the non-seasonal component and $(P, D, Q)_s$ is the seasonal component of the specification. The general seasonal ARIMA specification is as follows:

$$\phi_p(B)\,\Phi_P(B^s)\,\nabla^d\,\nabla_s^D\,y_t = \theta_q(B)\,\Theta_Q(B^s)\,\varepsilon_t$$

Here, $p$ is the order of the non-seasonal autoregressive terms and $q$ is the number of non-seasonal MA terms included in the model. Similarly, $P$ is the number of seasonal autoregressive terms and $Q$ is the number of seasonal MA terms, with $\phi_p$, $\Phi_P$, $\theta_q$, and $\Theta_Q$ the corresponding lag polynomials. $B$ is the backshift operator ($B y_t = y_{t-1}$), $\nabla^d = (1 - B)^d$ is the non-seasonal difference operator applied $d$ times to make the series stationary, $\nabla_s^D = (1 - B^s)^D$ is the seasonal difference operator applied $D$ times to obtain an integrated stationary series, $s$ is the length of the season, and $\varepsilon_t$ is a white-noise error term. The mathematical details of ARIMA specifications can be found in [75]. Building an ARIMA model involves identification, estimation, and diagnostic checking; following these steps yields a parsimonious ARIMA specification for estimating and forecasting a time series.
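The differencing operators $\nabla^d = (1 - B)^d$ and $\nabla_s^D = (1 - B^s)^D$ in the seasonal ARIMA specification can be illustrated directly. The hypothetical helper below applies them to a NumPy array; it performs only the differencing step and does not estimate the AR or MA polynomials.

```python
import numpy as np


def difference(y, d=0, D=0, s=12):
    """Apply (1 - B^s)^D and then (1 - B)^d to a series; each pass shortens it."""
    y = np.asarray(y, float)
    for _ in range(D):
        y = y[s:] - y[:-s]  # seasonal difference: y_t - y_{t-s}
    for _ in range(d):
        y = y[1:] - y[:-1]  # ordinary difference: y_t - y_{t-1}
    return y
```

For a series with a linear trend and a fixed 12-month cycle, one seasonal difference removes the cycle and leaves a constant (the trend slope times 12), and one further ordinary difference reduces the series to zeros. Because both operators are linear, the order in which they are applied does not change the result.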

2.3.2. Extreme Learning Machine (ELM)

The extreme learning machine (ELM) is a modern learning algorithm for single-hidden-layer feedforward neural networks (SLFNs) proposed by [76]. ELM addresses the same kinds of problems as feedforward back-propagation ANN (FFBP-ANN) and least-squares support vector regression (LSSVR) models, and it has proven to be a strong candidate among ANN algorithms for solving complex linear and nonlinear regression problems. It contains a single hidden layer with multiple hidden nodes. Most ANN-based methods have specific limitations, such as slow computation, many learning epochs, large biases, and the need to tune parameters (weights). To overcome these weaknesses, ELM has gained prominence within the class of ANN algorithms [77].

Studies have shown that ELM retains the universal approximation capability of SLFNs even when the hidden-node weights are generated randomly [46, 47, 78]. In ELM, the input weights are assigned randomly, and the output weights are obtained uniquely as the least-squares solution computed with the generalized inverse [76]. Once the hidden-node input weights and biases are fixed at random values, the SLFN can be treated as a linear system. Because the output weights connect the hidden layer to the output layer of this linear system, they are determined analytically through the generalized inverse of the hidden-layer output matrix. ELM can therefore solve regression problems in much shorter simulation times than the FFBP-ANN and LSSVR algorithms; it has been reported to be up to a thousand times faster [59, 79–81]. It also exhibits high generalization performance. The topological structure of ELM is shown in Figure 2, where three layers of neurons form the architecture.

The input layer receives the input variables, the single hidden layer contains a variable number of neurons in which the data are processed, and the output layer produces the desired results through its activation function. The activation function used in the ELM algorithm is the sigmoid function $g(x) = 1/(1 + e^{-x})$, and most of the ELM training time is spent computing the Moore-Penrose generalized inverse of the hidden-layer output matrix.

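Huang et al. [76] argued that the maximal margin property of SVM and the minimal norm of weights theory of ELM are consistent, and that ELM and SVM perform equally well in standard optimization settings. For $M$ arbitrary distinct samples $(\mathbf{x}_j, \mathbf{t}_j)$, where $\mathbf{x}_j = [x_{j1}, \ldots, x_{jn}]^{T} \in \mathbb{R}^{n}$ and $\mathbf{t}_j = [t_{j1}, \ldots, t_{jm}]^{T} \in \mathbb{R}^{m}$, a standard ELM with $N$ hidden nodes and activation function $g(x)$ is modeled as

$$\sum_{i=1}^{N} \boldsymbol{\beta}_i \, g(\mathbf{w}_i \cdot \mathbf{x}_j + b_i) = \mathbf{o}_j, \qquad j = 1, \ldots, M,$$

where $\mathbf{w}_i = [w_{i1}, \ldots, w_{in}]^{T}$ is the weight vector connecting the $i$th hidden node and the input nodes, $\boldsymbol{\beta}_i = [\beta_{i1}, \ldots, \beta_{im}]^{T}$ is the weight vector connecting the $i$th hidden node and the output nodes, $b_i$ is the threshold of the $i$th hidden node, and $\mathbf{o}_j$ is the network output for the $j$th sample. $\mathbf{w}_i \cdot \mathbf{x}_j$ denotes the inner product of $\mathbf{w}_i$ and $\mathbf{x}_j$.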
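In matrix form, the $M$ ELM equations read $\mathbf{H}\boldsymbol{\beta} = \mathbf{T}$, where $\mathbf{H}$ is the $M \times N$ hidden-layer output matrix with entries $g(\mathbf{w}_i \cdot \mathbf{x}_j + b_i)$, and the output weights follow as $\hat{\boldsymbol{\beta}} = \mathbf{H}^{\dagger}\mathbf{T}$, with $\mathbf{H}^{\dagger}$ the Moore-Penrose generalized inverse. The NumPy sketch below implements this basic ELM; the class name `BasicELM`, the uniform $[-1, 1]$ initialization, and the use of `numpy.linalg.pinv` are illustrative assumptions rather than the study's configuration.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


class BasicELM:
    """Single-hidden-layer ELM: random hidden weights, analytic output weights."""

    def __init__(self, n_hidden, seed=0):
        self.n_hidden = n_hidden
        self.rng = np.random.default_rng(seed)

    def _hidden_output(self, X):
        # H[j, i] = g(w_i . x_j + b_i), an M x N matrix
        return sigmoid(X @ self.W + self.b)

    def fit(self, X, T):
        """X has shape (M, n); T has shape (M,) or (M, m)."""
        X = np.asarray(X, float)
        # Input weights and biases are drawn once and never trained.
        self.W = self.rng.uniform(-1.0, 1.0, size=(X.shape[1], self.n_hidden))
        self.b = self.rng.uniform(-1.0, 1.0, size=self.n_hidden)
        H = self._hidden_output(X)
        # Minimum-norm least-squares solution beta = H^+ T (Moore-Penrose inverse).
        self.beta = np.linalg.pinv(H) @ np.asarray(T, float)
        return self

    def predict(self, X):
        return self._hidden_output(np.asarray(X, float)) @ self.beta
```

Training involves no iterative weight updates: the only numerical work is one pseudoinverse, which is why ELM training is fast compared with backpropagation.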

ELM attains optimal generalization performance as long as the number of hidden nodes is sufficiently large. In our simulations with the sigmoid activation function, the number of hidden nodes was selected automatically to attain optimal prediction and forecast performance.

2.4. Model Evaluation Metrics

Assessing model performance requires quantifying the agreement between observed and predicted hydrological patterns. The most basic assessment method is visual inspection of the observed and the predicted or forecasted time series. For quantitative evaluation, around 20 performance metrics are available for selecting a hydrological model [82]. The choice of the most appropriate model has been observed to change significantly when precision-based metrics are used instead of error-based metrics [83]. Numerous accuracy criteria have been developed, but each has its own strengths and weaknesses, and no single metric is universally accepted as a benchmark [84]. In this study, the error-based metrics root mean square error (RMSE), mean absolute error (MAE), and mean absolute percentage error (MAPE) were used for validation. The Kling-Gupta efficiency (KGE) and the Willmott index of agreement (WI) are also well suited for assessing the performance of stochastic, machine learning, and hydrological models [85, 86]. The simple correlation coefficient between the observed and predicted values of SPTI also serves as a measure of the closeness of an algorithm’s predictions. The same metrics are used to assess the forecasting ability of the candidate algorithms.

The RMSE measures the deviation of the predicted values $\hat{D}_t$ from the observed values $D_t$ of the drought index over $T$ predictions:

$$\mathrm{RMSE} = \sqrt{\frac{1}{T}\sum_{t=1}^{T}\left(D_t - \hat{D}_t\right)^2} \tag{6}$$

Because RMSE is sensitive to outliers, measures that are more robust to extreme values are also needed. The mean absolute error (MAE) is less influenced by extreme values than RMSE [87]. Equation (7) gives the MAE:

$$\mathrm{MAE} = \frac{1}{T}\sum_{t=1}^{T}\left|D_t - \hat{D}_t\right| \tag{7}$$

Another accuracy measure is the mean absolute percentage error (MAPE), a unit-free tool for assessing an algorithm’s prediction and forecasting ability. Unlike RMSE and MAE, MAPE is scale independent. These performance metrics are widely used in climatology. MAPE is defined as

$$\mathrm{MAPE} = \frac{100}{T}\sum_{t=1}^{T}\left|\frac{D_t - \hat{D}_t}{D_t}\right| \tag{8}$$

The Kling-Gupta efficiency (KGE) index was developed to assess model performance by comparing simulated and observed time series [88]:

$$\mathrm{KGE} = 1 - \sqrt{(r - 1)^2 + (\alpha - 1)^2 + (\beta - 1)^2} \tag{9}$$

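Here, $r$ is the correlation coefficient between the observed and predicted values of SPTI, $\alpha = \sigma_{\hat{D}}/\sigma_{D}$ is the ratio of the standard deviations of the predicted ($\hat{D}$) and observed ($D$) values, and $\beta = \mu_{\hat{D}}/\mu_{D}$ is the ratio of their means.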

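Willmott [89] proposed the index of agreement (WI) as a standardized measure of the degree of model prediction error:

$$\mathrm{WI} = 1 - \frac{\sum_{t=1}^{T}\left(D_t - \hat{D}_t\right)^2}{\sum_{t=1}^{T}\left(\left|\hat{D}_t - \bar{D}\right| + \left|D_t - \bar{D}\right|\right)^2} \tag{10}$$

where $\bar{D}$ is the mean of the observed values.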

The model with the smallest RMSE, MAE, and MAPE, the largest KGE, and a WI closest to 1 is selected as the adequate algorithm for estimating existing drought conditions and forecasting future drought episodes.
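The metrics defined in this section (RMSE, MAE, MAPE, KGE, WI, and the correlation coefficient) can be computed with a few NumPy functions. This is a minimal sketch with hypothetical function names; it uses population standard deviations in KGE and does not guard against zero observed values in MAPE.

```python
import numpy as np


def _arrays(obs, pred):
    return np.asarray(obs, float), np.asarray(pred, float)


def rmse(obs, pred):
    obs, pred = _arrays(obs, pred)
    return float(np.sqrt(np.mean((obs - pred) ** 2)))


def mae(obs, pred):
    obs, pred = _arrays(obs, pred)
    return float(np.mean(np.abs(obs - pred)))


def mape(obs, pred):
    """Percent; undefined if any observed value is exactly zero."""
    obs, pred = _arrays(obs, pred)
    return float(100.0 * np.mean(np.abs((obs - pred) / obs)))


def kge(obs, pred):
    obs, pred = _arrays(obs, pred)
    r = np.corrcoef(obs, pred)[0, 1]
    alpha = pred.std() / obs.std()  # ratio of standard deviations
    beta = pred.mean() / obs.mean()  # ratio of means
    return float(1.0 - np.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2))


def willmott(obs, pred):
    obs, pred = _arrays(obs, pred)
    obar = obs.mean()
    num = np.sum((obs - pred) ** 2)
    den = np.sum((np.abs(pred - obar) + np.abs(obs - obar)) ** 2)
    return float(1.0 - num / den)


def evaluate(obs, pred):
    """All metrics used in this study for one observed/predicted pair."""
    return {
        "RMSE": rmse(obs, pred), "MAE": mae(obs, pred), "MAPE": mape(obs, pred),
        "KGE": kge(obs, pred), "WI": willmott(obs, pred),
        "R": float(np.corrcoef(*_arrays(obs, pred))[0, 1]),
    }
```

Because SPTI values are centered on zero, observed values close to zero inflate MAPE, so MAPE values well above 100% can occur for drought indices even when RMSE and MAE are small. Note also that MAE can never exceed RMSE for the same pair of series.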

3. Results

The meteorological and climatic variables are briefly described using summary statistics. The minimum (Min.), first quartile (Q1), median, mean, and third quartile (Q3) are reported in Table 2. These results indicate that the annual and seasonal meteorological characteristics of the selected stations are quite diverse.

Muzaffarabad has the highest mean monthly precipitation (125.65 mm), and Kalat has the lowest (15.35 mm). Sialkot has the highest maximum monthly rainfall (917.6 mm), and Chhor has the lowest maximum monthly rainfall (11.47 mm), while the minimum monthly rainfall at all selected locations was zero (0 mm). The sources of precipitation vary across stations: some stations receive heavy rainfall during the monsoon season (June–September), after which precipitation declines sharply and remains low through December, until western depressions bring rainfall in the winter season (December–March). These statistics reveal distinct dry and wet seasonal cycles at some stations. Temperature is the other climatic variable used to calculate SPTI, and the corresponding descriptive statistics for minimum and maximum temperature are given in Table 3. The results show that the monthly minimum and maximum temperatures differ markedly among the stations.

3.1. Estimation of SPTI

As a first step in the computational analysis, we prepared SPTI time series for all stations following the guidelines in [90]. As described in Section 2.2, the CDF of the appropriate probability distribution was standardized for each station and time scale. The distribution with the minimum BIC value was considered the most appropriate, and we also inspected quantile plots of the theoretical and empirical distributions. The 32 candidate distributions include highly parameterized and extreme value distributions. Parameter estimation and BIC computation were carried out with the propagate package [91] for R. Table 4 shows the BIC values obtained by fitting all candidate distributions to SPTI-1 at the selected stations; the distribution with the lowest BIC was chosen as the appropriate fitted distribution.
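The BIC-based selection can be sketched in Python with SciPy as a stand-in for the R propagate package used in the study. The sketch fits only five of the 32 candidate families by maximum likelihood and computes $\mathrm{BIC} = k\ln n - 2\ln\hat{L}$, where $k$ is the number of fitted parameters and $n$ is the sample size; the candidate list and function names are illustrative assumptions.

```python
import numpy as np
from scipy import stats

# Illustrative subset of candidate families (the study fitted 32).
CANDIDATES = {
    "normal": stats.norm,
    "gamma_3p": stats.gamma,  # shape, loc, scale
    "weibull_3p": stats.weibull_min,  # shape, loc, scale
    "lognormal_3p": stats.lognorm,  # shape, loc, scale
    "johnson_su": stats.johnsonsu,  # two shapes, loc, scale
}


def bic(dist, params, x):
    """BIC = k ln(n) - 2 ln(L) for a fitted SciPy distribution."""
    loglik = np.sum(dist.logpdf(x, *params))
    return len(params) * np.log(len(x)) - 2.0 * loglik


def select_distribution(x):
    """Fit every candidate by maximum likelihood and return the name with the lowest BIC."""
    x = np.asarray(x, float)
    scores = {}
    for name, dist in CANDIDATES.items():
        try:
            params = dist.fit(x)
            score = bic(dist, params, x)
        except Exception:
            score = np.inf  # a failed fit simply drops out of the comparison
        scores[name] = score if np.isfinite(score) else np.inf
    best = min(scores, key=scores.get)
    return best, scores
```

The $k\ln n$ term penalizes extra parameters, so a four-parameter family is preferred only when its likelihood gain outweighs that penalty.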

At the Sialkot, Muzaffarabad, and Kalat stations, the four-parameter beta distribution has the lowest BIC values (Sialkot, −722.18; Muzaffarabad, −584.52; Kalat, −664.62). Only at Chhor does the Johnson SU distribution give the best fit (BIC = −445.58). Overall, the three-parameter Weibull distribution is the most dominant, attaining the lowest BIC most often (see the bold values in Table 4). For the Astore station, Figure 3 presents the histogram with the fitted probability density, the associated quantile plot, and the temporal behavior of the standardized SPTI-1 series.

For brevity, plots for the other stations are omitted. The red spikes indicate drought severity and the conditional dependence structure among drought episodes. The selected probability distributions and their BIC values for all time scales (1, 3, 6, 9, and 12 months) are presented in Table 5. Finally, standardized SPTI time series for the selected time scales were prepared at all stations using the appropriate probability distributions.

3.2. ELM and Its Comparative Assessment

We compared the performance of ELM with that of MLP and ARIMA in two phases. For each station, the full record was divided into two independent parts: the training set and the validation (test) set. For most stations, precipitation and temperature records were available from 1951 to 2016. In the training phase, 63 years of monthly precipitation and minimum and maximum air temperature data (January 1951 to December 2013) are used to train the candidate algorithms. In the testing phase, the remaining 36 months of data (2014–2016) are used to validate the forecasts. ELM and MLP simulations were carried out with the R package nnfor [92], and the R package forecast [93] was used to select the appropriate seasonal and non-seasonal orders of the ARIMA model.

All ELM, MLP, and ARIMA simulations were carried out in R 3.5.3 on an Intel Core i7 CPU with a clock speed of 2.3 GHz. The optimum ARIMA specifications and their estimated parameters are detailed in Table 6. These specifications were obtained by fitting ARIMA models over all candidate combinations of seasonal and non-seasonal lag orders of the input time series. The ELM algorithm was trained with 23 input-layer nodes and a single hidden layer of 100 hidden nodes. Training was repeated 20 times, and the resulting outputs were combined using the median. ELM assigns random starting weights to the input-to-hidden connections and estimates the weights of the hidden-to-output connections; here, these output weights are estimated with the least absolute shrinkage and selection operator (LASSO) to keep the model parsimonious. The resulting network still involves high-dimensional parameter matrices, which cannot feasibly be tabulated. The MLP was trained with two hidden layers containing 10 and 5 hidden nodes, respectively, to obtain optimum results.
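The ensemble scheme described above (20 ELM repetitions combined by the median) can be sketched as follows. This is a hypothetical illustration, not the nnfor code: it uses a fixed number of lagged inputs (`n_lags`, assumed to be 12 here rather than the study's 23 input nodes), estimates output weights by ordinary least squares via `numpy.linalg.pinv` instead of LASSO, and produces multi-step forecasts recursively by feeding each forecast back as an input, which is an assumption about how the 36-month test horizon is reached.

```python
import numpy as np


def lag_matrix(y, n_lags):
    """Rows are [y_{t-1}, ..., y_{t-n_lags}]; the matching target is y_t."""
    y = np.asarray(y, float)
    X = np.column_stack([y[n_lags - k: len(y) - k] for k in range(1, n_lags + 1)])
    return X, y[n_lags:]


def _hidden(X, W, b):
    return 1.0 / (1.0 + np.exp(-(X @ W + b)))  # sigmoid hidden layer


def fit_elm(X, t, n_hidden, rng):
    W = rng.uniform(-1.0, 1.0, size=(X.shape[1], n_hidden))
    b = rng.uniform(-1.0, 1.0, size=n_hidden)
    beta = np.linalg.pinv(_hidden(X, W, b)) @ t  # least squares, not LASSO
    return W, b, beta


def elm_ensemble_forecast(y, n_lags=12, n_hidden=100, reps=20, horizon=36, seed=0):
    """Median of `reps` independently initialised ELMs, iterated `horizon` steps ahead."""
    X, t = lag_matrix(y, n_lags)
    rng = np.random.default_rng(seed)
    members = [fit_elm(X, t, n_hidden, rng) for _ in range(reps)]
    history = list(np.asarray(y, float))
    forecasts = []
    for _ in range(horizon):
        # Most recent value first, matching the column order of lag_matrix.
        x_new = np.array(history[-1:-n_lags - 1:-1])[None, :]
        preds = [(_hidden(x_new, W, b) @ beta)[0] for W, b, beta in members]
        step = float(np.median(preds))  # combine the members with the median
        forecasts.append(step)
        history.append(step)  # feed the forecast back as the newest lag
    return np.array(forecasts)
```

Taking the median rather than the mean limits the influence of any single member whose random hidden weights happen to produce an unstable fit.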

The computational cost of MLP increases drastically with the number of hidden layers and the number of nodes in each hidden layer. Hence, the MLP structure was finalized with 23 input-layer nodes, two hidden layers with 10 and 5 nodes, respectively, and a single output layer. Even this parsimonious structure involves a large matrix of estimated parameters, so these parameter estimates are not reported. The performance of the ELM, MLP, and ARIMA algorithms was assessed using RMSE, MAE, MAPE, KGE, and the Willmott index of agreement.
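A quick count shows why even the parsimonious MLP involves a sizable parameter matrix, and how it compares with the ELM described above. The helper names are hypothetical, and the ELM count assumes no output bias term.

```python
def dense_param_count(layer_sizes):
    """Weights plus biases of a fully connected feedforward network."""
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]))


def elm_param_count(n_inputs, n_hidden, n_outputs=1):
    """(random, untrained) input weights and biases vs. trained output weights."""
    random_part = n_inputs * n_hidden + n_hidden
    trained_part = n_hidden * n_outputs  # assumes no output bias term
    return random_part, trained_part


mlp_total = dense_param_count([23, 10, 5, 1])  # 230+10 + 50+5 + 5+1 = 301
elm_random, elm_trained = elm_param_count(23, 100)  # 2400 random, 100 trained
```

The MLP must tune all 301 parameters iteratively, whereas the ELM leaves its 2400 input-side parameters at their random values and solves for only 100 output weights.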

Table 7 reports these performance metrics on the training data for the ELM, MLP, and ARIMA models at the selected stations for the 1-, 3-, 6-, 9-, and 12-month time scales. The results indicate that ELM performs better than the MLP and ARIMA models. The SPTI-1 results are discussed in detail. ELM performed best at Chitral, with the lowest RMSE (0.374), whereas both MLP and ARIMA performed best at Chhor, with minimum RMSE values of 0.598 and 0.632, respectively. The minimum MAE values for ELM, MLP, and ARIMA are 0.272, 0.449, and 0.5, respectively. For MAPE, these values are 259.59, 161.33, and 324.7, respectively. The KGE values for ELM, MLP, and ARIMA at Astore are 0.712, 0.517, and 0.314, respectively.

Similarly, the WI values at Astore are 0.999, 0.748, and 0.664, respectively. The KGE index confirms ELM’s superior performance at all stations, where it gives higher values than MLP and ARIMA, and the Willmott index of agreement (WI) consistently gives the highest values for ELM. KGE and WI are considered the most appropriate metrics for assessing meteorological and hydrological models. KGE combines the correlation coefficient, the ratio of standard deviations, and the ratio of means of the predicted and observed series, as given in equation (9). The WI values of the ELM model at all selected stations are close to 1, which supports ELM as the best-performing model. ELM’s superior performance persisted at the other time scales, and the overall training-phase results show that the ELM model agrees well with the observations at all selected stations.

The performance of the proposed algorithm improves as the time scale of the drought index increases, and ELM outperformed its competing algorithms (MLP and ARIMA). Based on a comparison of all performance metrics, ELM was selected as the adequate model for estimating and forecasting drought indices (see Table 7).

The consistency and co-movement of the observed and estimated SPTI values are further assessed with Karl Pearson’s product-moment correlation coefficient. Table 8 gives the correlation coefficients between the observed and predicted SPTI values for the ELM, MLP, and ARIMA models on the training data. At Astore, the correlations for ELM, MLP, and ARIMA are 0.87, 0.59, and 0.56, respectively, indicating that ELM predicts SPTI-1 in closer agreement with the observations than the other candidate algorithms. At Chitral, the corresponding values are 0.93, 0.74, and 0.69. For SPTI-3, the values are 0.96, 0.88, and 0.86 at Astore and 0.97, 0.93, and 0.90 at Chitral. These results show that the ELM model’s performance improves markedly as the time scale increases, and the same pattern of superior ELM prediction holds for the other time scales at all stations. Although all models show reasonable prediction performance, the results confirm that SPTI values estimated with ELM correlate strongly with the observed SPTI values at every time scale.

Climatic and meteorological studies usually involve high-frequency datasets that require fast algorithms, so speed is an important criterion when choosing an algorithm. Algorithm selection for climatic studies involves a subjective trade-off between speed and relative efficiency. ELM is distinguished as one of the fastest algorithms in the ANN class for complex datasets: ELM training and testing were reported to be almost 32 times faster than an ANN [58], indicating its advantage over other ANN algorithms.

Figure 4 shows line graphs of the observed and predicted SPTI values for ELM and the other algorithms at the Astore station; the ELM predictions deviate much less from the observed values than those of the other algorithms.

MLP and ARIMA failed to capture all the shocks in the historical SPTI values, and their departures from the observed values were substantial. In contrast, ELM produced more precise and accurate predictions. Figures 5– detail ELM’s prediction performance for all time scales at the selected stations and confirm its better prediction performance.

The prediction performance of the algorithms improves substantially as the time scale increases. For SPTI-12, the paired observed and predicted values of the drought index deviate much less. These multi-line plots indicate that the ELM model incurs smaller errors than its two counterparts, and the ELM predictions follow the observed SPTI values more closely. These graphical presentations show that ELM estimates SPTI more accurately than MLP and ARIMA for the various time scales at the selected meteorological stations.

Scatter plots of the observed and predicted time series are another way to assess the prediction performance of probabilistic, machine learning, and ANN algorithms. Figures 8–10 show scatter plots of the observed and predicted SPTI-6 values from the ELM, MLP, and ARIMA models at all the selected meteorological stations. The scatter plots show that the ELM predictions correlate strongly with the observed SPTI values, which further supports the superior performance of ELM. ELM is more accurate and can potentially predict drought conditions across the different climatic zones represented by the selected stations.

Figure 11 presents Taylor diagrams for the Astore station at the 1-, 3-, 6-, 9-, and 12-month time scales for the training phase. Taylor diagrams provide a comprehensive and compact way to represent the estimation and forecast ability of a model: the similarity between the predicted and actual SPTI values is measured by their correlation, and their difference in variation by the standard deviation (SD) and the centered root-mean-square difference. For SPTI-1, the correlation between the ELM output and the actual observations is about 0.9, compared with about 0.6 for both MLP and ARIMA. Prediction performance improves markedly as the time scale increases: for SPTI-3, ELM lies closer to the actual values, with a correlation of about 0.97, than MLP (0.9) and ARIMA (0.85). Across the 1-, 3-, 6-, 9-, and 12-month time scales, the Taylor diagrams show the superior performance of ELM in estimating SPTI (see Figure 11).

Figure 12 shows violin plots for the training phase of ELM and the other candidate models. The red dot marks the mean, the thick white bar the interquartile range, and the thin blue line the spread of the whole data set. These elements form an embedded box plot, while the colored area on either side of the blue line is a kernel density estimate of the shape of the data distribution: wider sections indicate higher probability density and narrower sections lower density. Violin plots thus offer another systematic way to compare the prediction performance of models. For ELM, the means of the predicted and observed SPTI-6 values are almost identical at all the stations, whereas the other models show slight differences. All the models agree reasonably well with the observations, but the violin plots of the ELM predictions and the observed data are nearly identical.
These graphical illustrations confirm that the proposed ELM model estimates the actual SPTI values better than its counterparts at all selected meteorological stations.
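A Taylor diagram places each model at radius σ_p (the SD of its predictions) and at angle arccos R from the horizontal axis, where R is its correlation with the observations, which sit at radius σ_o on that axis. The distance between a model point and the observation point is the centered RMS difference E′, because the three statistics satisfy the law-of-cosines relation E′² = σ_p² + σ_o² − 2σ_pσ_oR. The NumPy sketch below computes these statistics and the diagram coordinates and checks the relation; the function name and the sample series are hypothetical, and the study's own plotting code is not given in the source.

```python
import numpy as np

def taylor_stats(observed, predicted):
    """Statistics summarised by a Taylor diagram (population SDs, ddof=0)."""
    o = np.asarray(observed, dtype=float)
    p = np.asarray(predicted, dtype=float)
    sd_o, sd_p = o.std(), p.std()
    r = np.corrcoef(o, p)[0, 1]
    # Centered RMS difference: RMSE after removing each series' own mean
    crmsd = np.sqrt(np.mean(((p - p.mean()) - (o - o.mean())) ** 2))
    # Cartesian position of the model point: radius sd_p, angle arccos(r)
    xy = (sd_p * r, sd_p * np.sqrt(1.0 - r**2))
    return {"sd_obs": sd_o, "sd_pred": sd_p, "r": r, "crmsd": crmsd, "xy": xy}

# Hypothetical observed series and an imperfect, slightly biased prediction of it
rng = np.random.default_rng(7)
obs = rng.normal(size=120)
pred = 0.8 * obs + 0.3 * rng.normal(size=120) + 0.1
s = taylor_stats(obs, pred)

# Law of cosines: crmsd^2 = sd_p^2 + sd_o^2 - 2 * sd_p * sd_o * r
lhs = s["crmsd"] ** 2
rhs = s["sd_pred"] ** 2 + s["sd_obs"] ** 2 - 2 * s["sd_pred"] * s["sd_obs"] * s["r"]
print(f"r = {s['r']:.3f}, centered RMSD = {s['crmsd']:.3f}, identity holds: {np.isclose(lhs, rhs)}")
```

Because E′ removes the means, a constant offset (the +0.1 above) does not move a model on the diagram, so the diagram is best read alongside bias-sensitive measures such as RMSE or KGE.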

After the algorithms are calibrated and validated on the training data, the next step assesses the generalization capability of the proposed algorithm. Out-of-sample forecasts of SPTI at all time scales (1–12 months) are produced for the 36 months from 2014 to 2016; forecasts of this length are considered sufficient to support drought preparedness and mitigation policies. The same performance metrics are used to compare the observed and forecasted SPTI values for the different time scales at all selected meteorological stations, and the results are given in Table 9. At Astore station, for example, the KGE values for ELM, MLP, and ARIMA are 0.988, 0.771, and 0.671, respectively. The KGE of ELM is higher than that of its counterparts at all time scales and all selected stations, which supports the better forecast capability of ELM. The WI and MAE results likewise indicate that ELM outperforms the MLP and ARIMA models, and RMSE and MAPE identify ELM as the better forecasting model at most stations.
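The metrics named above can be written compactly. The sketch below uses common textbook definitions, which are assumptions because the section does not state the formulas: KGE in its original form, 1 − sqrt((r − 1)² + (α − 1)² + (β − 1)²) with α = σ_p/σ_o and β = μ_p/μ_o, and Willmott's index of agreement, d = 1 − Σ(P − O)² / Σ(|P − Ō| + |O − Ō|)². The function names and sample values are hypothetical. Two properties of these definitions matter for a standardized index such as SPTI, whose values cluster around zero: MAPE divides by the observed values, so near-zero observations inflate it (which may help explain why the MAPE values reported in this study exceed 100%), and the bias ratio β becomes unstable when the observed mean is close to zero.

```python
import numpy as np

def _arrays(observed, predicted):
    return np.asarray(observed, dtype=float), np.asarray(predicted, dtype=float)

def rmse(observed, predicted):
    o, p = _arrays(observed, predicted)
    return float(np.sqrt(np.mean((p - o) ** 2)))

def mae(observed, predicted):
    o, p = _arrays(observed, predicted)
    return float(np.mean(np.abs(p - o)))

def mape(observed, predicted):
    """Mean absolute percent error; undefined if any observed value is exactly zero."""
    o, p = _arrays(observed, predicted)
    return float(100.0 * np.mean(np.abs((o - p) / o)))

def kge(observed, predicted):
    """Kling-Gupta efficiency (original form); beta is unstable when mean(observed) is near 0."""
    o, p = _arrays(observed, predicted)
    r = np.corrcoef(o, p)[0, 1]
    alpha = p.std() / o.std()   # variability ratio
    beta = p.mean() / o.mean()  # bias ratio
    return float(1.0 - np.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2))

def willmott_d(observed, predicted):
    """Willmott's index of agreement, between 0 and 1."""
    o, p = _arrays(observed, predicted)
    o_bar = o.mean()
    denom = np.sum((np.abs(p - o_bar) + np.abs(o - o_bar)) ** 2)
    return float(1.0 - np.sum((p - o) ** 2) / denom)

# Hypothetical observed and forecast values (not from Table 9)
obs = [1.0, 2.0, 3.0, 4.0]
fc = [1.5, 2.0, 2.5, 4.5]
for name, f in [("RMSE", rmse), ("MAE", mae), ("MAPE", mape), ("KGE", kge), ("WI", willmott_d)]:
    print(f"{name}: {f(obs, fc):.3f}")

# Every absolute error below is 0.1, yet MAPE exceeds 100% because observations are near zero
print(f"MAPE near zero: {mape([0.05, -0.1, 0.2], [0.15, 0.0, 0.1]):.1f}%")
```

RMSE, MAE, and MAPE are best at 0, while KGE and WI are best at 1, which is why the text ranks models by the smallest values of the first group and the largest values of the second.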

Figure 13 shows the observed and forecasted SPTI values from the ELM, MLP, and ARIMA models for all the time scales at Astore station over the 36-month test phase (January 2014 to December 2016). The ELM forecasts deviate much less from the observed SPTI values. Although MLP forecasts reasonably well compared with the stochastic seasonal ARIMA model, ELM clearly shows superior forecast performance.

To check the appropriateness of the ELM model, scatter plots of the observed and forecasted SPTI at Astore station were prepared for all time scales (Figure 14). These scatter plots depict the correlation, goodness of fit, and extent of agreement between the observed and forecasted SPTI, and they show a clear difference between the forecast performance of ELM and that of the other models. ELM clearly outperforms MLP and ARIMA in the testing phase at all the selected time scales.

3.3. Discussion

Drought is a multifaceted and commonly occurring hazard in several parts of the world, with impacts across the agricultural, socio-economic, and energy sectors. Precise drought monitoring and estimation techniques can help reduce society's vulnerability to drought. The primary objective of this study was to test the appropriateness and usefulness of the ELM model for drought forecasting relative to another ANN model (MLP) and a stochastic model (ARIMA). The prediction and forecast performance of the models was assessed with several performance metrics: RMSE, MAE, MAPE, KGE, WI, and Karl Pearson's correlation coefficient. The quantitative assessment showed that both ANN models (ELM and MLP) performed better than the stochastic ARIMA model and that, between the ANN models, ELM performed best, producing the smallest RMSE, MAE, and MAPE values and the largest KGE, WI, and correlation coefficient values at almost all the meteorological stations.

Furthermore, the performance metrics show that ELM agrees better with the observed SPTI than its counterparts in both the training and test phases at all climatic stations. This advantage persists at higher time scales, consistent with earlier studies [30, 58]. The computational time of drought modeling algorithms also matters: real-time drought modeling usually relies on large input datasets, which affects how long different models take to run, and ELM is considerably faster than its counterparts. Taken together, the performance metrics and graphical illustrations indicate that ELM gives the most accurate drought forecasts in both the training and test phases. The study therefore finds ELM to be the most appropriate, reliable, and efficient of the algorithms considered for drought prediction and forecasting, and suggests that it can be used as an early warning drought forecasting tool to support the development of drought mitigation policies.

4. Conclusions

Reliable, efficient, and fast drought forecasting algorithms are useful to freshwater resource managers and drought mitigation policymakers. This study examined the forecast performance of the extreme learning machine (ELM) model using the Standardized Precipitating Temperature Index (SPTI). Monthly time series of precipitation and minimum and maximum temperatures were collected from nine meteorological stations located in various climatological zones of Pakistan. The prediction and forecast performance of ELM was then compared with that of MLP and ARIMA models using several statistical performance metrics and graphical illustrations. The primary objective was to investigate how well ELM predicts the nonlinear and complex temporal behavior of SPTI. ELM outperformed MLP and ARIMA by producing the smallest root mean square error, mean absolute error, and mean absolute percent error values and the largest KGE, WI, and Karl Pearson’s correlation coefficient values at most of the selected meteorological stations and time scales. KGE and WI unanimously identified ELM as the best forecasting model at all the stations. For the one-month time scale, RMSE identified ELM as the superior forecasting model at five stations: Astore (RMSE = 0.688), Chitral (RMSE = 0.643), Kohat (RMSE = 0.956), Mianwali (RMSE = 0.773), and Sialkot (RMSE = 0.720). MLP performed best at Multan (RMSE = 0.666), and ARIMA at three stations: Chhor (RMSE = 0.559), Kalat (RMSE = 0.727), and Muzaffarabad (RMSE = 0.724). 
With MAE as the performance measure, ELM performs better at six stations (Astore 0.798, Chitral 0.495, Kalat 0.676, Kohat 0.791, Mianwali 0.652, and Sialkot 0.554), MLP at two stations (Chhor 0.447 and Multan 0.577), and ARIMA at one station (Astore 0.765). MAPE identifies ELM as the appropriate algorithm at seven stations (Astore 247.3, Chitral 142.1, Kalat 500.3, Mianwali 309.1, Multan 545.5, Muzaffarabad 321.8, and Sialkot 91.0), MLP at one station (Chhor 269.2), and ARIMA at one station (Kohat 268.2). Unlike the MLP and ARIMA models, ELM also offers very fast computation for drought modeling. These comparisons support the appropriateness of the proposed ELM algorithm.

In summary, this study suggests that ELM, applied to time series of the Standardized Precipitating Temperature Index, can be used as an early warning drought forecasting tool to support the development of drought mitigation policies. The study could be extended by applying a wavelet transform to the input data to further improve the forecast performance of ELM. Machine learning algorithms also have limitations, such as requiring fast computing hardware. Among machine learning algorithms, ANN models such as ELM and MLP have relatively complex network structures, and there is no specific rule for determining the final network structure; instead, an appropriate structure is found by trial and error.

Data Availability

Data and code are available on request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Authors’ Contributions

All authors contributed equally.

Acknowledgments

The authors thank the Deanship of Scientific Research at King Khalid University for funding this work through the Large Groups Project under grant number RGP.2/34/43.