Abstract

In this paper, comparison results of parametric methodologies of change points, applied to maximum temperature records from the municipality of Tlaxco, Tlaxcala, México, are presented. Methodologies considered are likelihood ratio test, score test, and binary segmentation (BS), pruned exact linear time (PELT), and segment neighborhood (SN). In order to compare such methodologies, a quality analysis of the data was performed; in addition, lost data were estimated with linear regression, and finally, SARIMA models were adjusted.

1. Introduction

Global warming is the most evident manifestation of climate change; it refers to the average increase in global, terrestrial, and marine temperatures [1] and is attributed to the increase in greenhouse gases emissions (mainly , , and ). Although greenhouse gases are naturally present in the atmosphere, consumption of fossil fuels for the production of energy and supplies, product of human development, causes emissions of such gases faster than in its natural cycle [2, 3]. The Intergovernmental Panel on Climate Change (IPCC) estimates radical impacts that imply increases in temperature and the rise in sea level (derived from the melting of the poles), as well as the destruction of ecosystemic services [4]; therefore, it is important to monitor and analyze the behaviour of climatological data in order to establish policies, in case of being necessary, that contribute to diminish impacts on the environment and society.

Based on the previous remarks, it was considered pertinent to perform an analysis of the behaviour of the maximum temperature in a region of the Mexican state of Tlaxcala, which has a climatological station that registers, among other variables, the maximum temperature; such station is located at municipality Tlaxco-Tlaxcala, and it is the responsibility of the Civil Protection Commission of Tlaxcala State and the Climate Change Research Center of the Autonomous University of Tlaxcala. With aforementioned records, change point methodologies were applied and compared for identifying changes in maximum temperature of the early mentioned municipality; for this, data quality analysis was performed, lost data were completed by linear regression, and SARIMA models were adjusted. In order to contrast the goodness of fit of considered models, the Akaike information criterion (AIC) as well as the Bayesian information criterion (BIC) was considered. In Section 2, applied parametric methodologies are described. In Section 3, characterization of the area under study is presented as well as the results obtained from the applied methodologies. In Section 4, the results obtained are discussed.

2. Parametric Methodologies for Detecting Change Points

Change point problem started in control theory, and it dates back to 1950s decade [5, 6] where the existence of a single change point was tested. In order to detect a single change point, the first case analyzed dealt with independent random variables considering parametric methodologies [713]. To improve criteria for identifying and estimating change points, Bayesian and nonparametric methodologies have been used [7, 8, 1421].

Let be a sequence of n independent random vectors each one with probability distribution functions , respectively. The goal of change point problem is to testversuswhere , q is the unknown number of change points, and are the respective unknown positions that have to be estimated. When the distributions belong to a common parametric family , with , then the hypothesis tests are about the population parameters:versuswhere q and have to be estimated.

Since methodologies to detect change points in the context of independent random variables cannot be applied directly in time series [22, 23], parametric, nonparametric, and Bayesian methodologies have also been developed [12, 16, 2431].

Due to the natural disasters recorded in recent decades attributed mainly to anthropogenic activities, there have been an increasing number of studies using change point methods to detect shifts in climate. According to Beaulieu et al. [32], change point detection techniques have been used to detect changes in temperature and in precipitation to detect regime shifts, to detect shifts in aerosol and cloud data, and also to study past changes in the land uptake of carbon. There exist an extensive literature about detecting shifts into environmental data, which apply parametric and nonparametric methodologies [3238]. The main objective of this work is to compare some parametric methodologies to detect abrupt and multiple changes in the maximum temperature in the municipality of Tlaxco, Tlaxcala, Mexico.

2.1. Parametric Methodologies for a Single Change Point in Time Series

For detecting change points in environmental time series, most studies have to apply Bayesian and nonparametric methodologies; various nonparametric tests, including Mann–Kendall test and Pettit’s test, are widely used to detect trend and change point in the historical series of climatic and hydrological variables [35]. In this study, we consider identifying a single change point under the assumption that the data fit an autoregressive model.

Let , be consecutive observations from the modelwhere , , is a white-noise sequence that has a finite fourth moment, and .

To testunder and normality assumption, via likelihood ratio technique conditioned on the first p observations and applying to the likelihood ratio, is possible to establish

For studying the asymptotic behaviour of the distribution of the likelihood ratio statistic under , Davis et al. [39] considered a quadratic form and defined the process over and assumed that , for some . The details are in the following theorem.

Theorem 2.1. Let be the process defined byand assume that under , satisfies , with , for some , and is strongly related with for some . Then, under , we have that where and are normalization constants and is the gamma function.

Proof 2.1. Details of the proof can be consulted in [39].
The methodology proposed by Davis et al. [39] uses a quadratic form and does not allow identifying which parameter of the autoregressive model has changed. Gombay [40] has proposed a method that establishes one- or two-sided hypothesis tests to identify which parameter of an autoregressive model has changed; for this, the proposed methodology uses the components of the efficient score vector as test statistic.
Consider the following process:where is a white noise sequence with .
To testversuscomponents of the efficient score vector under Gaussian assumption are needed.
Let , the components arewhere denotes the logarithm function of likelihood based on observations . Information matrix of size is given bywith V being the covariance matrix of . In addition, let , and be the maximum likelihood estimators of parameters, and consider under as test statistic. For testing the change in d parameters with an exact level of significance α, Gombay [40] suggested the use of on each component, for both unilateral and bilateral tests, as follows:Unilateral test: if , then there exists a change in parameter over the sequence . Critic value is obtained from the relationFor unilateral test, the estimator time is Bilateral test: if , then there exits a change in parameter over the sequence . Critic value is obtained from the relationOnce a change in over , is guaranteed, the estimator for τ is

Proof 2.2. Details of the proof can be consulted in [40].

2.2. Parametric Methodologies for Multiple Change Points

Let be an ordered data sequence, assuming that there are m ordered change points in random times , with , ; hence, such change points induce a partition over the data into segments, where the th segment will contain observations . Something commonly used to identify multiple points is to minimizewith respect to m and . Here, is a cost function for a segment and is a penalty term in order to avoid over fitting [19]. Some commonly used cost functions are the additive inverse of the log likelihood function, accumulated sums [12], and other methods which include applying entropy information, for the penalization of [24]. The most common choice of penalty is one which is linear in the number of change points, that is, . Examples of such penalties include Akaike information criterion (AIC) and Schwarz information criterion (BIC) [41].

For the following methods, the change point hypothesis test isversus

2.2.1. Pruned Exact Linear Time Method

Pruned exact linear time (PELT) method is based on the method proposed by Jackson et al. [42] and Yao [43]. Let be a set of observations; the idea of the method is the search for an optimal partition of data given on a time interval I whose elements are sets of data cells, resulting in no significant loss of information.

A partition of an interval I is a set of blocks, where the blocks are sets of data cells defined by index sets : satisfying the usual conditions and if , and the number of blocks must satisfy . Let be the (finite) set of all partitions of I into blocks. Take as given an additive fitness function that assigns a value to any partition in the formwhere is the fitness of block and has to be minimized in order to detect a change point. According to [19], the PELT method proposes a pruning step, which increases the computational efficiency of the previous method while ensuring a global minimum of the cost function which is linear in n (22). The optimal segmentation is , where

The pseudocode of the PELT method can be consulted in [19, 44] and proofs about the optimal partition in [42].

2.2.2. Binary Segmentation Method

In the binary segmentation (BS) method, consider the model where is a deterministic, one-dimensional, piecewise-constant signal with change points whose number N and locations are unknown, and the random sequence is i.i.d. Gaussian with mean zero and variance one.

The method uses the CUSUM statistic defined by the inner product between the vector and a particular vector of contrast weights given bywhere , with . In its first step, the BS algorithm computes and then takes to be the first change point candidate, whose significance is to be judged against a certain criterion. If it is considered significant, the domain is split into two subintervals to the left and to the right of , and the recursion continues by computing and possibly resulting in further splits [44].

This method extends any single change point method for multiple change points by iteratively repeating the method on different subsets of the sequence for this. It begins by initially applying the single change point method to the entire data and prove if there exists τ such that

If (25) does not hold, it is guaranteed that there is no change point along , and the process ends; otherwise, the data are partitioned into two sets, which consist of the data sequence before and after the change point, ; the process is applied again in each partition until no more change points are found (see pseudocode in [44, 45]).

2.2.3. Segment Neighborhood Method

Segment neighborhood (SN) method uses dynamic programming to detect change point; it begins by setting an upper limit on the size of the segmentation space (i.e., the maximum number of change points) that is required, and this method continues by computing the cost function for all possible segments.

It takes the constrained case which segments the data up to t, for , into segments (using k change points) and denote the minimum value of the cost by . The idea of segment neighborhood search is to derive a relationship between and for for :

If this is run for all values of t up to n and for , then the exact segmentations with segments can be acquired. To extract the exact segmentation, we first let to denote the optimal position of the last change point if we segment data using l change points. This can be calculated as

The pseudocode of the SN method can be consulted in [14].

PELT, BS, and SN methods are implemented in R within the library changepoint; according to [46], the library changepoint creates class objects .cpt. The structure in R to detect change in the mean and variance is as follows:where data = a vector of objects that contain the data where a change in mean will be tested; method = choice for a single change point (AMOC) or multiple change points (PELT, SN, or BS); test.stat = statistic test, i.e., a distribution is assumed; choice “Normal” or “CUSUM” = see [11] or [6] for the normality assumption or the CUSUM test, respectively.

For variancewhere data = vector of objects that contain the data where a change in variance will be tested. know.mean = logical argument required only under the normality assumption; if the logical value is TRUE, then the mean is assumed known, and in the other case, mean is estimated via maximum likelihood. test.stat = choice normal or CUSUM test.

3. Study Area and Application of Methodologies

3.1. Study Area and Data Quality Analysis

To detect change points in temperature data applying the methodologies described in the last section, the Civil Protection Commission of the State of Tlaxcala and the Climate Change Research Center of the Autonomous University of Tlaxcala provided information of 6 years (September 2, 2011, to September 2, 2017,) from a climatological station, located at the center of the municipality of Tlaxco, Tlaxcala, México.

For the quality analysis, the missing data were estimated with regression models; also, to understanding the behaviour of the maximum temperature of the region studied, a box-plot of the data was made (see Figure 1), where the maximum temperature per week was considered; for this, it was taken as the first week of the year, the week 1 of January (the week that begins with the first Sunday of the month) and week 52 as the last week of December. We verify that the assumptions of the methodologies studied were fulfilled to apply the studied theory and finally analyzed the results.

In Figure 1, the increase in maximum temperature is observed until it reaches its maximum values in weeks 11 to 23 (the spring season); with the arrival of rainy season (month of June, week 24), it can be seen a decrease in temperature in the box-plot. It also highlights the effect of the heat or drought intraestival (climatic event which consists of a decrease in the amount of precipitation in the middle of the rainy season which occurs mainly in the warm months of July and August); this event is also characterized by the decrease in rainfall due to an increase in temperature (weeks 28 to 33).

3.2. Application of Methodologies
3.2.1. Detection of a Single Change Point

In order to apply the methodologies of Davis et al., as well as Gombay, ARIMA models were fitted applying the Box–Jenkins methodology [47]. To verify stationarity and seasonality, autocorrelation function (acf) and partial autocorrelation function (apf) were analyzed. Figure 2 shows the time series for maximum temperature; according to the behaviour of the autocorrelation function (Figure 2(b)), it is observed that the time series is not stationary, with a difference the acf decreases in a fast way guaranteeing the stationarity of the series.

Via the analysis of the acf, ARIMA models were fitted; we consider as the best model the one with the lowest AIC value. Table 1 shows all fitted models and its corresponding AIC values. According to the AIC criterion, the best model to fit the data was the model, and coefficients of this model are shown in Table 2. After fitting the model, a residual analysis was made. Noncorrelation of the residuals was verified with the Ljung-Box test. In addition, the Shapiro test was used for verifying normality (Figure 3).

With presented coefficients in Table 2, we have that the model for fitting the maximum temperature iswhere denotes the maximum temperature at time t and is a white noise process.

After verifying assumptions of Theorem 2.1, values of statistic were calculated for (Figure 4), and the maximum value was 29.17452 with . Considering , critic value in order to accept or reject a change in parameters of model expressed by (3.2.1) is 15.76829; therefore, there exists sufficient evidence to guarantee a change in some parameters.

For applying Gombay’s methodology, each component of vector was estimated and plotted (Figure 5). In addition, in Table 3, maximum values of with as well as the value of α were presented; hence, it may be concluded that there is not considered enough evidence to guarantee a change in any parameter of the model given in Section 3.2.1.

Applying the library changepoint of R, was detected a single change point (AMOC) in the mean and variance of database (Table 4 Figure 6).

3.2.2. Detection of Multiple Change Points

With the library changepoint of R, change point analysis of our data was performed with the PELT, BS, and SN methods. Tables 5 and 6 present the number of change points as well as their estimate for mean and variance of the maximum temperature, respectively.

Tables 7 and 8 present all mean and variance change points detected with the PELT method. To have a better conception of the location of change identified points by PELT, BS, and SN methods, and in order to interpret the location of change points, Figures 7 and 8 are presented.

3.3. Fitted Models after Detecting Change Points in the Mean of Maximum Temperature

After applying methods AMOC, BS, PELT, and SN for mean, we observe that PELT method realizes various segmentations on data; hence, an ARIMA fit is not possible. Method BS realizes six partitions to the database; however, last segmentation has only seven data (from weeks 307 to 313) and then is not possible a SARIMA fit in this segment.

In order to model the time series partitions and compare their results, this paper considered partitions from methods AMOC and SN. In the SN method, we considered, like one segmentation the partitions of weeks 276 to 286 (see Figure 7), three ARIMAs were fitted for the maximum temperature in lapses: weeks 1 to 237, 238 to 286, and 287 to 313 applying the methodology of Box-Jenkins.

Table 9 shows the ARIMA models fitted; Figure 9 presents the partitioned time series and its autocorrelation function for each segment.

Table 10 presents fitted ARIMA models for three partitions considered with the SN method; Figure 10 shows each partition for the time series.

To choose the best model for the partitioned time series, the Akaike information criteria were considered, and after applying the AMOC method, the fit to the time series is presented in (31); equation (32) represents the fit after applying the SN method,

4. Comparison of Methodologies and Conclusion

The methodologies of Davis et al. [39] and Gombay [40] allow us to test if there exists a single change point or not in some parameter of the autoregressive model of order p; however, the methodology of Davis et al. does not allow us to estimate the time in which it occurs. When applying these methodologies in the database of maximum temperature, found results show with the methodology of Davis et al., evidence of a change in parameters of the proposed autoregressive model; on the other hand, the methodology of Gombay does not provide sufficient evidence to consider a change; this is attributed to the fact that for fitting the autoregressive model to database, a difference had to be made.

The other applied methodologies do not consider the structure of data, apart from being designed to estimate change points in mean and variance. The AMOC method estimates a single change point, while the PELT method estimates 48 change points in the considered database. Segmentation of the latter methodology makes it hard to propose models to fit the database.

For change in the mean of maximum temperature, methods AMOC, PELT, SN, and BS coincided with the conclusion of a change in week 250 when comparing this with methodology of Gombay, which proposes a possible change in week 245 is very close to that obtained by the other methods.

Gombay’s methodology does not have enough evidence to decide a change in variance; the AMOC method estimates a change in variance in week 27, and the PELT, SN, and BS methods coincide with a change in weeks 18, 194, 227, and 306. With the AMOC method, the change is reported in week 250, which represents the third week of June 2016, reviewing the cold and warm episodes per season reported by the Climate Prediction Center [48]; in the June month of 2016, it is observed that the ocean changed from a heating to a cooling, and this allows us to validate the change reflected with the AMOC method; similarly, for the SN method, estimated change points coincide with the transition from a heating to a cooling of the ocean (week 237, which represents the last week of February month, 2016) and a period without heating of the ocean to a cooling (week 287, which represents June month, 2017).

Although PELT and BS methods detect a change in week 306 (Tables 58) which is almost at the end, this week is in the latest record; therefore, it might be an unreliable change point.

The study of the points of change in climate data at the regional level is essential for decision-makers and for the local development of the communities to be analyzed, since the level of impact on their economic growth can be dimensioned at the same time as they will be able to establish measures to adapt to these changes with society and political-economic sectors.

For the subsequent studies, we intend to consider seasonal models as well as multivariate models that may be related to describe the climate of some regions.

Data Availability

The data from maximum temperature used to support the findings of this study were provided by the Climate Change Research of the Autonomous University of Tlaxcala and the State Civil Protection Commission. Requests for access to these data should be made to Rogelio Bernal Morales, M.S. ([email protected]) or Tomás Morales Acoltzi, Ph.D. ([email protected]).

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work was partially supported by VIEP-BUAP, via the project “La caminata aleatoria del elefante y el Teorema Central del Límite Casi seguro.”