Abstract

Drought often extends over large spatial and temporal scales and is more complex than other natural disasters, damaging economies and natural resources worldwide. Improved drought monitoring and forecasting techniques can reduce the vulnerability of society to drought and its consequences, which emphasizes the need for monitoring tools and assessment techniques that provide more precise information about drought occurrences. Therefore, this study develops a new method, Model-Based Clustering for Spatio-Temporal Categorical Sequences (MBCSTCS), that uses state selection procedures through finite mixture modeling and model-based clustering. MBCSTCS uses first-order Markov model components to model each data group, and the suitable order of the components is selected by the Bayesian information criterion (BIC). The estimated mixing proportions and posterior probabilities are used to compute the probability distribution associated with future steps of transitions. In this way, MBCSTCS predicts future drought occurrences using spatiotemporal categorical sequences of various drought classes. MBCSTCS is applied to six meteorological stations in the northern area of Pakistan, where it is found to provide expeditious information for long-term spatiotemporal categorical sequences. These findings may help in planning early warning systems, water resource management, and drought mitigation policies to reduce the severe effects of drought.

1. Introduction

Drought is more volatile than other natural disasters, and traditional assessment and forecasting procedures often fail to predict it. Its barely perceptible onset and multifaceted impacts call for new assessment methodologies [1–5]. In recent decades, it has become more prominent than other natural hazards in distressing the environment and economic sectors worldwide [6–8]. Moreover, determining the onset and end of a drought remains challenging for drought management. The effects of drought accumulate slowly over time and may linger for a long period [8–10]. Drought can be characterized by a precipitation deficiency, which has substantial impacts on agriculture, hydrological systems, and the living standards of people [11, 12]. Without appropriate measures, these effects grow in severity and persist for a long time even after the drought has ended [9].

Advances in drought assessment and monitoring procedures can lead to better drought preparedness and reduce the susceptibility of society to drought and its consequent influences [8, 10, 13]. It is therefore essential to find more suitable techniques and procedures to predict drought occurrences more promptly. Improved methods can help in planning early warning systems, drought mitigation policies, and water resource management and in reducing the severe effects of drought. The occurrences and characteristics of drought have also prompted discussion of various methodologies and techniques. Based on these occurrences and characteristics, authors generally categorize drought into four groups: meteorological, hydrological, agricultural, and socioeconomic [14]. Chang [15] and Eltahir [16] defined meteorological drought as a shortage of precipitation over a region for some time. Several studies have used precipitation data to analyze meteorological drought [17, 18], and streamflow data have frequently been used to analyze hydrological drought [19–21]. Agricultural drought is usually caused by a reduction in soil moisture, which can in turn be affected by meteorological and hydrological droughts. Socioeconomic drought is linked to a shortfall in water resource systems, in which the water supply is unable to meet water demands.

In the past few decades, numerous drought indices have been proposed to assess drought occurrences [22–26]. These indices are frequently used to characterize drought and are based on various parameters that describe its spatial and temporal extent. Obtaining accurate and precise information about drought occurrences from drought indices is crucial for an early warning policy, and consistent, reliable drought information plays a crucial part in preparing drought monitoring and mitigation policies. Numerous drought indices, each with strengths and weaknesses, exist in the literature and are used by decision-makers who build action plans for drought early warning systems and mitigation policies. For example, Palmer [27] developed the Palmer Drought Severity Index (PDSI), which worked well especially for subhumid and semiarid regions. The PDSI provided weekly information on abnormal evapotranspiration deficit for various regions; this information can be used to assess the moisture condition of a region and can be helpful for its crops. Gommes and Petrassi [28] proposed the National Rainfall Index (NRI), which was used to provide a synthetic description of rainfall in sub-Saharan African countries and to recognize rainfall patterns in various regions. The Surface-Water Supply Index (SWSI) was introduced by Shafer and Dezman [24]. Its computation is based on two major sources of irrigation water supply, namely, spring-summer streamflow runoff and reservoir carryover, which are analyzed together to determine the total surface water supply available in a season. Van Rooy [29] developed the Rainfall Anomaly Index (RAI), which helped to find geographical anomalies of the rainfall pattern in varying regions. Weghorst [26] introduced the Reclamation Drought Index, and Palmer [22] introduced the Crop Moisture Index (CMI). Bhalme and Mooley [23] developed the Bhalme and Mooley Drought Index (BMDI), which used precipitation data and provided both negative and positive values to measure drought intensities. McKee et al. [25] developed the Standardized Precipitation Index (SPI), which is computed from a long-term precipitation record for a climatic area. A key characteristic of SPI is that it can be computed for different time scales and used to compare varying climatic areas. Therefore, SPI is used extensively for evaluating and recording drought characteristics [30–35]. Despite discrepancies among them, the indices mentioned above have been used frequently for drought monitoring in different studies to gain consistent interpretation across several regimes and climates. This study uses SPI, which is often employed to assess and monitor meteorological drought and is recommended by the World Meteorological Organization [36].

Furthermore, many clustering techniques are considered in the literature [37–41]. Clustering techniques group data so that observations with similar characteristics fall within the same cluster, while observations in different clusters are dissimilar. Various clustering techniques are frequently used in statistics, computer science, and machine learning owing to the variety of their applications [41–45]. Among these techniques, model-based clustering assumes that each data cluster arises from a probability distribution [46, 47]. When different data groups follow different distributions, finite mixture models are preferred [48]. Model-based clustering has performed well on spectrometry data, text classification, social networks, and grouping distinct objects. It has also been used for time series [49] and regression time series analyses [50]. Although several studies of model-based clustering are available in the literature, it has not yet received much attention in drought analysis. Therefore, this study develops a new technique, Model-Based Clustering for Spatio-Temporal Categorical Sequences (MBCSTCS), to predict drought occurrences precisely for spatiotemporal categorical sequences. The performance of the proposed technique is assessed using six meteorological stations in the northern area of Pakistan.

2. Methods

2.1. Standardized Precipitation Index (SPI)

SPI can be computed from a long-term record of precipitation observed as a time sequence in a climatic area. Its vital feature is that it can be computed for various time scales, and it is widely used to calculate and record drought occurrences [34, 35, 51, 52]. Analysis at different time scales provides different information. For example, SPI at a three-month time scale can be used to assess moisture conditions in different seasons, whereas SPI at a twelve-month time scale can be used to assess water deficiency. Furthermore, as a probabilistic approach, SPI is well suited to forecasting and risk analysis [31, 35, 53]. SPI has been frequently used for drought monitoring in several contexts, for example, spatiotemporal analysis, forecasting, frequency analysis, and climatic studies [33, 35, 51, 52]. Because it uses only precipitation to determine the climatic condition of a particular area, it offers spatially consistent interpretations across various climates [32, 34, 35]. Therefore, it can be advantageous for areas where the other parameters required to calculate other indices are not available, and it remains relevant under varying environmental and temporal circumstances [54]. This study focuses on a new methodology for monitoring drought more precisely and comprehensively in a specific area. SPI at time scales of 1, 3, 6, 9, 12, and 24 months is used for the current analysis.
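The role of the time scale can be illustrated with a short sketch. It accumulates monthly precipitation over a moving window and then standardizes the totals. Note the assumptions: rather than fitting a candidate probability distribution, as is done in this study, the sketch swaps in empirical plotting positions (rank divided by n + 1) followed by the inverse standard normal; the function names and the precipitation values are hypothetical and serve only as illustration.

```python
from statistics import NormalDist

def rolling_sums(precip, scale):
    """Accumulate precipitation over a moving window of `scale` months."""
    return [sum(precip[i - scale + 1:i + 1]) for i in range(scale - 1, len(precip))]

def empirical_spi(precip, scale):
    """Standardize accumulated precipitation with empirical plotting positions.

    Assumption: a nonparametric stand-in for the fitted-distribution step.
    """
    totals = rolling_sums(precip, scale)
    n = len(totals)
    order = sorted(range(n), key=lambda i: totals[i])  # indices from driest to wettest
    spi = [0.0] * n
    for rank, idx in enumerate(order, start=1):
        # Map the cumulative probability rank / (n + 1) to a standard normal score.
        spi[idx] = NormalDist().inv_cdf(rank / (n + 1))
    return spi

# Hypothetical monthly precipitation (mm), for illustration only.
precip = [12.0, 30.5, 8.2, 0.0, 45.1, 22.3, 5.4, 60.0, 18.7, 3.3, 27.9, 14.6]
spi3 = empirical_spi(precip, scale=3)  # three-month time scale
```

A longer window smooths short dry spells, which is why the three-month and twelve-month scales can tell different stories for the same record.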

2.2. Model-Based Clustering for Spatio-Temporal Categorical Sequences (MBCSTCS)

Model-based clustering has been used for time series [49] and regression time series analyses [50]. Although the technique is important for many applications, it has not yet received much attention in drought analysis. Furthermore, drought classification requires categorical sequences to obtain reliable results for drought characterization. From this perspective, this study proposes MBCSTCS to analyze categorical drought sequences for various time scales and stations. By grouping categorical sequences, MBCSTCS is intended to provide more informative results than traditional prediction approaches. It reflects the behavior of drought classes across various time scales and stations. The drought classes (states) considered for the region are Extremely Dry (ED), Severely Dry (SD), Normal Dry (ND), Median Dry (MD), Median Wet (MW), Severely Wet (SW), and Extremely Wet (EW) [55].

The first-order Markov model is well established in statistical modeling, and MBCSTCS adopts first-order Markov model components as the functional form for each data group. In MBCSTCS, the data groups consist of sequences of drought states. Let $\boldsymbol{x} = (x_1, x_2, \ldots, x_L)$ denote an observation that is an ordered sequence of length $L$, where each element is a categorical value representing a drought state coded by a natural number. It is assumed that the number of unique drought states equals $p$, i.e., $x_l \in \{1, 2, \ldots, p\}$ for $l = 1, 2, \ldots, L$. The joint probability of the sequence can be written as $P(\boldsymbol{x}) = P(x_1, x_2, \ldots, x_L)$. The first-order Markov model provides a convenient way to describe the transitions between states: the probability of the drought state at the next step depends only on the present state and not on states observed earlier. The joint probability under the first-order Markov model is given in the following equation:

$$P(\boldsymbol{x}) = P(x_1) \prod_{l=2}^{L} P(x_l \mid x_{l-1}). \quad (1)$$

To simplify the notation, we use $\alpha_j$ to denote the initial state probability and $\gamma_{jj'}$ to represent the transition probability. That is, $\alpha_j = P(x_1 = j)$ is the probability that the initial state is $j$, and $\gamma_{jj'} = P(x_l = j' \mid x_{l-1} = j)$ is the probability of a transition from $j$ to $j'$. As there are $p$ states in the Markov model, the initial state probabilities can be collected in the vector $\boldsymbol{\alpha} = (\alpha_1, \ldots, \alpha_p)$ and the transition probabilities in the $p \times p$ matrix $\Gamma = (\gamma_{jj'})$. Moreover, for the $k$th component of a finite mixture with $K$ components and mixing proportions $\tau_k$, $\boldsymbol{\alpha}$ and $\Gamma$ are replaced by $\boldsymbol{\alpha}_k$ and $\Gamma_k$, and the model can be written as follows:

$$f(\boldsymbol{x}) = \sum_{k=1}^{K} \tau_k \, \alpha_{k x_1} \prod_{l=2}^{L} \gamma_{k x_{l-1} x_l}. \quad (2)$$

The log-likelihood of equation (2) for $n$ observed sequences $\boldsymbol{x}_i = (x_{i1}, \ldots, x_{iL_i})$, $i = 1, \ldots, n$, can be expressed as follows:

$$\log \mathcal{L} = \sum_{i=1}^{n} \log \sum_{k=1}^{K} \tau_k \prod_{j=1}^{p} \alpha_{kj}^{I(x_{i1} = j)} \prod_{l=2}^{L_i} \prod_{j=1}^{p} \prod_{j'=1}^{p} \gamma_{kjj'}^{I(x_{i,l-1} = j,\; x_{il} = j')}. \quad (3)$$

In equation (3), $I(\cdot)$ is the indicator function and $L_i$ indicates the length of the $i$th categorical sequence. The expectation-maximization (EM) algorithm is employed to estimate the parameters [56].
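To make the mixture structure concrete, the following minimal sketch evaluates the first-order Markov log-probability of one sequence and the posterior probability of each mixture component for that sequence, which is the quantity updated in the E-step of an EM algorithm. It is not the implementation used in this study: the function names, the two-state coding (0 and 1), and all parameter values are hypothetical, and all probabilities are assumed to be strictly positive.

```python
import math

def sequence_log_prob(seq, alpha, gamma):
    """Log-probability of a state sequence under one first-order Markov chain.

    alpha: initial state probabilities; gamma: transition matrix (rows sum to 1).
    """
    logp = math.log(alpha[seq[0]])
    for prev, curr in zip(seq, seq[1:]):
        logp += math.log(gamma[prev][curr])
    return logp

def posterior_probs(seq, tau, alphas, gammas):
    """Posterior probability that `seq` arose from each mixture component."""
    logs = [math.log(t) + sequence_log_prob(seq, a, g)
            for t, a, g in zip(tau, alphas, gammas)]
    m = max(logs)                          # log-sum-exp for numerical stability
    weights = [math.exp(v - m) for v in logs]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical two-state, two-component mixture (illustrative values only).
tau = [0.6, 0.4]                           # mixing proportions
alphas = [[0.5, 0.5], [0.5, 0.5]]          # initial state probabilities
gammas = [
    [[0.9, 0.1], [0.2, 0.8]],              # component 1: persistent states
    [[0.3, 0.7], [0.7, 0.3]],              # component 2: frequent switching
]
post = posterior_probs([0, 0, 0, 1, 1, 1], tau, alphas, gammas)
```

In this toy case the sequence changes state only once, so the persistent component receives most of the posterior weight.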

2.3. Prediction of Future Drought Occurrences for Spatial-Temporal Categorical Sequences

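Given a set of $K$ transition probability matrices $\Gamma_1, \Gamma_2, \ldots, \Gamma_K$ and a probability distribution $\delta_1, \delta_2, \ldots, \delta_K$ associated with the mixture components, the $M$-step transition probability matrix can be constructed by

$$\Gamma^{(M)} = \sum_{k=1}^{K} \delta_k \Gamma_k^{M}, \quad (4)$$

where $\Gamma_k^{M}$ indicates the matrix $\Gamma_k$ raised to the power $M$. The choice of the appropriate distribution $\delta_1, \delta_2, \ldots, \delta_K$ depends on the application. In particular, $\hat{\boldsymbol{\tau}} = (\hat{\tau}_1, \hat{\tau}_2, \ldots, \hat{\tau}_K)$ and $\hat{\boldsymbol{\pi}}_i = (\hat{\pi}_{i1}, \hat{\pi}_{i2}, \ldots, \hat{\pi}_{iK})$, which are the estimated vector of mixing proportions and the estimated vector of posterior probabilities associated with a particular sequence $i$, respectively, are particularly relevant for computing the probability distribution of future drought occurrences.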
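The multi-step forecast described above can be sketched as follows: each component's transition matrix is raised to the required power, the row for the last observed state is taken, and the rows are averaged with the component weights (for example, the posterior probabilities of the sequence). The helper names, the two-state coding, and the numerical values are hypothetical and chosen only for illustration.

```python
def mat_mul(a, b):
    """Product of two square matrices stored as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_pow(mat, power):
    """Matrix raised to a non-negative integer power by repeated multiplication."""
    n = len(mat)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(power):
        result = mat_mul(result, mat)
    return result

def forecast(last_state, steps, weights, gammas):
    """Distribution over states `steps` transitions after `last_state`.

    weights: one probability per mixture component (must sum to 1).
    """
    n = len(gammas[0])
    dist = [0.0] * n
    for w, g in zip(weights, gammas):
        row = mat_pow(g, steps)[last_state]
        for j in range(n):
            dist[j] += w * row[j]
    return dist

# Hypothetical component transition matrices and weights (illustration only).
gammas = [
    [[0.9, 0.1], [0.2, 0.8]],
    [[0.3, 0.7], [0.7, 0.3]],
]
weights = [0.9, 0.1]
three_step = forecast(last_state=1, steps=3, weights=weights, gammas=gammas)
```

The state with the largest entry in `three_step` is the most likely state three steps ahead under these assumed parameters.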

3. Application

The region was chosen for its structural impacts and climatic characteristics, which affect other parts of the country. The results of the study are obtained from six selected stations in the northern area of Pakistan (Figure 1), with time-series data from January 1971 to December 2017, using SPI at various time scales. The selected stations are important for the selected region and for other regions of the country. For example, the reservoir system and the agriculture sector are closely tied to the selected region; therefore, climatic variability in the region is significant for other parts of the country [57, 58]. Fluctuations in weather patterns in other regions of the country also affect the socioeconomic and environmental sectors. Most parts of the country have been experiencing very high temperatures and are strongly influenced by global warming [58, 59]. Extreme climate events, including high temperatures, rainstorms, and droughts, are frequently associated with global warming, which affects the entire globe and often leads to high temperatures and water deficiency. These issues are associated with drought occurrences that damage the environment, natural resources, and people's lives more than other natural hazards, with complex consequences for society and the economic sectors of the country. Therefore, it is vital to recognize drought occurrences more promptly by developing comprehensive and efficient frameworks and techniques. In this regard, a new technique is applied to the selected stations to expand the capability of detecting drought occurrences and to improve drought evaluation and assessment.

3.1. Results

The findings of this study are obtained using long time series data collected from six climatological stations in the northern area of Pakistan. The selected stations have been observed to show homogeneous results for specific indices when calculated at a single time scale [55]. However, the observations of the indices may vary across time scales. This inconsistency, together with the varying data-generating processes of the drought states, motivates the development of a new method (i.e., MBCSTCS). MBCSTCS treats the various time scales for a particular station as sequences of differing lengths and with differing data-generating processes in order to analyze the spatiotemporal behavior of the drought states. That is, the observations of SPI at scale-1 (SPI-1) for the Astore station form sequence-1, the observations of SPI at scale-3 (SPI-3) for the Astore station form sequence-2, and so on up to the last scale (SPI-24). Sequences are assigned in the same way for the other stations and time scales. The observations of each sequence are assumed to come from specific mixture components that are selected appropriately for the data. A drought state is assigned to every calculated value of SPI, and these states are treated as categorical data in the computations of this study.

Niaz et al. [55] proposed a technique for monthly forecasting of drought intensities using model-based clustering of categorical drought state sequences. That study was performed on various stations using a single time scale. In this study, by contrast, the various time scales are considered jointly for the monthly prediction of drought severity in a region. The outcomes of the current analysis are appropriate especially for the selected stations and can help policymakers to make better policies related to various kinds of drought, including meteorological, hydrological, and socioeconomic. The current analysis is performed using the R package ClickClust [45], which handles observations arising from several probability distributions (i.e., $K$ mixture components). The package is based on finite mixtures with Markov model components and is used to obtain the outcomes for each specific sequence. The appropriate order of the mixture model (i.e., the number of components) is identified by minimizing the Bayesian information criterion (BIC) [60]. For a specific sequence, the estimated vector of mixing proportions and the associated estimated vector of posterior probabilities are used to calculate the probability distribution associated with future steps of transitions from the last state of the sequence. Climatological statistics of the data at the various stations are provided in Figure 2. To fit candidate probability distributions to the data, the R package propagate is used. This package considers various distributions, and the appropriate distribution is chosen on the basis of the BIC values. This selection criterion helps to find the best fit for each time scale and station specified for the analysis.

The BIC values of the probability distributions selected for the various time scales and stations are given in Table 1. For example, at the Astore station for scale-1, the BIC value (−1036.5) of the three-parameter (3P) Weibull distribution is the minimum among the candidate distributions. Therefore, the 3P Weibull distribution is considered the best-fitting distribution for the Astore station at scale-1. At the Astore station for scale-3, the Gamma distribution is selected with the minimum BIC value (−1279.1). For scale-6 and scale-9, the Gamma distribution is also the best fit at the Astore station, with minimum BIC values of −892.8 and −896.1, respectively. The Cosine and Skewed-normal distributions are selected for scale-12 and scale-24, respectively, at the Astore station. At the Bunji station for scale-1, the 3P Weibull distribution shows the minimum BIC value (i.e., 1,031.0) and is used for the computation. For scale-3, scale-6, scale-9, scale-12, and scale-24 at the Bunji station, the Gamma, Skewed-normal, Normal, Laplace, and Laplace distributions are selected with BIC values of −824.9, −1162.2, −649.1, −688.1, and −843.7, respectively. At the Gupis station, the 4P Beta distribution has the minimum BIC value (−788.7) for scale-1. For the other scales (3, 6, 9, 12, and 24), the Gamma, Gumbel, Johnson SU, and scaled/shifted t distributions have minimum BIC values of −1264.9, −1305.4, 1519.0, −937.6, and 1408.0, respectively.
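The selection rule can be illustrated with a small sketch that computes the BIC from a maximized log-likelihood and picks the candidate with the smallest value. The standard definition BIC = −2 log L + k log n is assumed here; the log-likelihoods, parameter counts, and candidate names are hypothetical and are not taken from Table 1. The sample size of 564 corresponds to monthly data from January 1971 to December 2017.

```python
import math

def bic(log_likelihood, n_params, n_obs):
    """Bayesian information criterion; smaller values indicate a better trade-off."""
    return -2.0 * log_likelihood + n_params * math.log(n_obs)

# Hypothetical (log-likelihood, number of parameters) pairs, for illustration only.
candidates = {
    "gamma": (-410.2, 2),
    "weibull_3p": (-408.9, 3),
    "normal": (-415.6, 2),
}
n_obs = 564  # 47 years of monthly observations

scores = {name: bic(ll, k, n_obs) for name, (ll, k) in candidates.items()}
best = min(scores, key=scores.get)  # distribution with the minimum BIC
```

In this made-up case the three-parameter candidate attains the highest log-likelihood but is not selected, because the penalty for its extra parameter outweighs the gain in fit.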

Accordingly, distributions are selected for the various time scales (1, 3, 6, 9, 12, and 24) at the other three stations (Chilas, Gilgit, and Skardu). After standardization with the selected probability distribution, the next step is to classify the SPI values into drought states (Table 2). Figure 3 presents the temporal behavior of SPI at scale-1 for the various stations; the behavior of SPI at the other selected time scales can be presented in the same way. After calculating SPI at the various time scales, we first categorize it by magnitude. The drought classes for SPI at a one-month time scale at the selected stations are provided in Table 3, where the months of 2017 are taken as an example. The drought classes for other years and time scales are calculated in the same way. These observed drought classes are then used to find the probability distribution associated with the three-step transition from the last state of each sequence. The posterior vector related to these sequences specifies the parameter values (briefly described in Section 2.2). The results show that the most likely state to visit in three steps is ND; that is, the probability associated with ND is higher than that of the other selected states across the sequences (Table 4). For example, for the Astore station, the probability of ND occurrence in sequence-1 is 0.6668, which is higher than that of the other selected states. In sequence-2, the probability of ND occurrence after three months is 0.6729. The probabilities of ND in sequence-3, sequence-4, sequence-5, and sequence-6 are 0.6611, 0.6221, 0.6450, and 0.6729, respectively. Other information can be read from the sequences for the different time scales. Because ND prevails at all time scales in the selected region, policymakers should plan accordingly and work to mitigate the negative impacts of this specific drought state.

3.2. Discussion

The time series data were collected from six meteorological stations in the northern area of Pakistan, and the drought index SPI was used at various time scales for the selected stations. Reliable and efficient outcomes of such an analysis provide strong indications about drought occurrences that can significantly help an early warning system [31, 53, 58, 59, 61]. Therefore, a new method, MBCSTCS, is developed for drought monitoring and mitigation policies that explicitly incorporates spatiotemporal information. The proposed technique uses the long-run behavior of drought states (categorical sequences) from various time scales and stations in the selected region. When the time scale changes, the lengths of the categorical sequences change as well. For this reason, past studies have not considered various time scales jointly, owing to the inconsistency in sequence lengths and in the processes that generate the observations at different stations. The current technique addresses these issues and provides more substantial outcomes for the selected drought states across time scales and stations. MBCSTCS uses state selection procedures through finite mixture modeling and model-based clustering. Niaz et al. [55] developed a model-based clustering technique that predicts probabilities for various drought classes: they computed categorical drought state sequences for selected drought classes and predicted their future probabilities using a single time scale at various stations. In this study, by contrast, the various time scales are considered jointly for the monthly prediction of drought severity at the selected stations, which makes it a novel method for predicting drought severity using spatiotemporal categorical sequences. MBCSTCS is applied to six meteorological stations in the northern area of Pakistan and is found to provide expeditious information for long-term spatiotemporal categorical sequences. The results are suitable especially for the selected region and can help policymakers make better policies related to various kinds of drought, including meteorological, hydrological, and socioeconomic. MBCSTCS may help in planning early warning systems, water resource management, and drought mitigation policies to reduce the severe effects of drought.

4. Conclusions

MBCSTCS provides the future probabilities of each drought state across stations and time scales. The results show that the most likely state to visit is ND; that is, the probability associated with ND is higher than that of the other selected states across the sequences. For instance, in sequence-1, the probability of ND is 0.6668, which is higher than that of the other selected states. In sequence-2, the probability of ND after three months is 0.6729, and ND also prevails in the other sequences; in sequence-6, for example, ND again has the highest probability of future occurrence (0.6729). Therefore, policymakers should work to reduce the negative impacts of this drought state (ND). In conclusion, this study proposes a technique for evaluating drought occurrences more promptly. MBCSTCS can help policymakers make better policies related to various kinds of drought, including meteorological, hydrological, and socioeconomic, and the analysis provides a basis for bringing more attention to early warning systems. The outcomes of the current analysis are valid only for the present circumstances of the application site, because changes in the circumstances of the selected stations would alter the results and limit extrapolation. Future work could examine drought propagation and compute thresholds for different drought severities in the region. Moreover, other drought indices can be incorporated to predict drought occurrences effectively.

Data Availability

The data used for the preparation of the manuscript are available from the corresponding author and can be provided upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work was supported by grants from the National Natural Science Foundation of China program (41801339). The authors are also thankful to the Deanship of Scientific Research at King Saud University for the support, through research group no. RG-1439-015. Finally, the author also extends his appreciation to the Deanship of Scientific Research at King Khalid University for funding this work under grant number (RGP.1/26/42), received by Mohammed M. Almazah (https://www.kku.edu.sa).