Abstract

The goal of change-point detection is to recognize abrupt changes in time series data. It is useful, for instance, for finding events that characterize the financial market or for inspecting data streams of stock returns. Regression models, categorized as supervised methods, have played a significant role in change-point detection. However, because change points might not be available beforehand to train the model, and because the series data might be statistically atypical, the applicability of regression models is limited. To avoid statistical assumptions, this study uses grey theory, a kind of artificial intelligence tool, to measure the relationships between sequences by grey relational analysis (GRA). The contribution of this paper is an unsupervised method that detects possible change points in time series by GRA. Change-point analysis with the proposed method was performed on S&P100 stock returns. Experimental results on recognition accuracy show that the proposed method performs well compared with the other change-point detection methods considered.

1. Introduction

A change point represents a transition between different sequences or states of a time series [1]. From the perspective of statistics, the probability density functions of the two consecutive sequences separated by a change point are different [2, 3]. Detecting transitions in time series data has become increasingly important. Many change-point detection (CPD) methods have been proposed for a range of real-world problems to detect and react to interesting events, such as climate change detection [4, 5], image analysis [6], hydrology [2, 7, 8], medical issues [9, 10], and tourism forecasting [11, 12]. Statistical methods, such as the likelihood ratio test, the standard normal homogeneity procedure [13], and regression, have also played a significant role in CPD.

Learning methods applied to CPD problems can be either supervised or unsupervised. Regression models such as logistic regression [14] and support vector machines [15, 16] can be treated as supervised approaches, for which sufficient labeled training data must be provided to achieve reasonable performance [1, 17, 18]. However, change points may be entirely unknown, or only a few may be available prior to training. Furthermore, the collected time series data might not conform to statistical properties such as homogeneity and normality of errors. As such, regression is restricted when applied to CPD. In contrast to supervised methods, unsupervised learning uses unlabeled data to find desired patterns [19]. Thus, to expand the applicability of regression, we use a regression-like method to measure the relationship between response and explanatory variables in order to develop an unsupervised method for CPD.

Given a time series, each variable in the series can be extended into a sliding window containing a sequence of time series variables [14, 20, 21]. Grey theory, categorized as an artificial intelligence tool [22], can effectively measure the degrees of relationship among sequences by grey relational analysis (GRA) [23–27]. To estimate the relationship between a reference sequence and a set of comparative sequences, GRA treats the reference sequence as the desired goal or the response variable [28]. Indeed, GRA has been widely applied to diverse real-world problems (e.g., [29–40]). In practice, GRA assigns a so-called grey relational grade (GRG) to each comparative sequence, such that the greater the GRG, the closer the relationship to the reference sequence. For CPD, given a reference sequence, once the respective GRGs of time series variables are obtained from a set of comparative sequences, we can investigate how these grades can be used to determine all possible change points. The contribution of this paper is an unsupervised CPD method, requiring no prior training, that detects multiple change points in time series with GRA.

Despite the usefulness of GRA, few studies have addressed its adoption for CPD. For instance, Wong et al. [2] used GRA to analyze change points in hydrological time series. The main difference between the proposed grey CPD method and Wong et al.’s GRA-based method (WGM) is that the proposed method can detect multiple change points in the collected data, whereas WGM can detect only one. Since multiple change points often exist in time series data, this restricts the applicability of WGM. To evaluate whether the proposed grey CPD method detects change points correctly, several metrics can be adopted, including the sensitivity, specificity, and G-mean [1].

The rest of the paper is organized as follows. Section 2 introduces regression-like GRA. Section 3 presents the proposed grey CPD method. In Section 4, we examine the detection performance of the proposed method using real data from daily log returns of stocks listed on the Standard and Poor’s 100 (S&P100) index. Section 5 presents a discussion and the conclusions of this study.

2. Regression-Like GRA

Let (x1, x2, …, xn) denote a time series sequence of length n, where xi (1 ≤ i ≤ n) represents a variable at time stamp i. In a time series, a sliding window Xi = (xi, xi + 1, …, xi + s − 1) of length s (1 ≤ s ≤ n) can be used instead of xi, provided that i + s − 1 is not greater than n. The reference sequence Xi = (xi(1), xi(2), …, xi(s)) = (xi, xi + 1, …, xi + s − 1) and the m + 1 comparative sequences, Xi, Xi + 1 = (xi + 1(1), xi + 1(2), …, xi + 1(s)) = (xi + 1, xi + 2, …, xi + s), Xi + 2 = (xi + 2(1), xi + 2(2), …, xi + 2(s)) = (xi + 2, xi + 3, …, xi + s + 1), …, Xi + m = (xi + m(1), xi + m(2), …, xi + m(s)) = (xi + m, xi + m + 1, …, xn), are prepared for CPD, where m = n − s − i + 1. For instance, given n = 6, i = 3, and s = 3, it follows that m = 1 and that i + s − 1 ≤ n holds. Thus, X3 = (x3, x4, x5) is the reference sequence, and both X3 and X4 = (x4, x5, x6) are the comparative sequences.
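The window construction can be illustrated with a short Python sketch. The function name `sliding_windows` and the dictionary keyed by window index are illustrative choices, not part of the original method description; indices are kept 1-based to match the notation above.

```python
def sliding_windows(series, i, s):
    """Return the reference window X_i and the comparative windows X_i..X_{i+m}.

    Indices are 1-based to match the text (x_1, ..., x_n).
    """
    n = len(series)
    if not (1 <= s <= n) or i < 1 or i + s - 1 > n:
        raise ValueError("need 1 <= s <= n and i + s - 1 <= n")
    m = n - s - i + 1
    # X_j = (x_j, ..., x_{j+s-1}); Python slices are 0-based.
    windows = {j: tuple(series[j - 1:j - 1 + s]) for j in range(i, i + m + 1)}
    return windows[i], windows


x = ["x1", "x2", "x3", "x4", "x5", "x6"]
reference, comparatives = sliding_windows(x, i=3, s=3)
print(reference)      # ('x3', 'x4', 'x5')
print(comparatives)   # {3: ('x3', 'x4', 'x5'), 4: ('x4', 'x5', 'x6')}
```

The call reproduces the worked example with n = 6, i = 3, and s = 3, for which m = 1.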

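From the perspective of multiple attribute decision making, GRA can be used to evaluate a decision problem with m + 1 alternatives and s attributes. Therefore, s should not be smaller than two. The relationship between Xi on xi(k) and Xj (i ≤ j ≤ i + m) on xj(k) (1 ≤ k ≤ s) can be derived by the grey relational coefficient (GRC), denoted by ξk(Xi, Xj), as follows [24]:

$$\xi_k(X_i, X_j) = \frac{\Delta_{\min} + \rho\,\Delta_{\max}}{\Delta_{ij}(k) + \rho\,\Delta_{\max}},$$

where ρ (0 ≤ ρ ≤ 1) is the discriminative coefficient and

$$\Delta_{ij}(k) = |x_i(k) - x_j(k)|, \qquad \Delta_{\min} = \min_j \min_k \Delta_{ij}(k), \qquad \Delta_{\max} = \max_j \max_k \Delta_{ij}(k).$$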

It is noted that ρ is often specified as 0.5 [23–25], but this is not necessarily an optimal setting.

The overall relationship, r(Xi, Xj), between Xi and Xj can be obtained by aggregating ξk(Xi, Xj) as follows:

$$r(X_i, X_j) = \frac{1}{s} \sum_{k=1}^{s} \xi_k(X_i, X_j).$$

This means that each variable in a sequence is of equal weight. Xi and Xj are similar to dependent and independent variables in traditional regression, and r(Xi, Xj) is analogous to the regression coefficient of Xj to Xi.
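A minimal Python sketch of the GRG computation follows. It assumes the standard form of the GRC, (Δmin + ρΔmax)/(Δij(k) + ρΔmax) with Δij(k) = |xi(k) − xj(k)|, and an equal-weight average over the s positions. The function name and the restriction to ρ > 0 (which avoids a zero denominator) are choices made for this illustration only.

```python
def grey_relational_grades(reference, comparatives, rho=0.5):
    """Return r(X_i, X_j) for each comparative sequence X_j."""
    if not 0 < rho <= 1:
        raise ValueError("this sketch needs 0 < rho <= 1")
    # Delta_ij(k) = |x_i(k) - x_j(k)| for every comparative sequence.
    deltas = [[abs(a - b) for a, b in zip(reference, seq)] for seq in comparatives]
    d_min = min(min(row) for row in deltas)
    d_max = max(max(row) for row in deltas)
    if d_max == 0:                      # every sequence equals the reference
        return [1.0] * len(comparatives)
    grades = []
    for row in deltas:
        coeffs = [(d_min + rho * d_max) / (d + rho * d_max) for d in row]
        grades.append(sum(coeffs) / len(coeffs))   # equal weights
    return grades


reference = (1.0, 2.0, 3.0)
comparatives = [(1.0, 2.0, 3.0), (2.0, 2.0, 5.0)]
print([round(g, 3) for g in grey_relational_grades(reference, comparatives)])
# [1.0, 0.611]
```

In this toy example the reference is also the first comparative sequence, as in the setup above, so its grade is 1 and the grade of the second sequence is smaller.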

3. The Proposed Grey CPD Method

r(Xi, Xj) is the foundation of the proposed grey CPD method. Among the set of variables {xi, xi + 1, …, xi + m}, CPD can be conducted by inspecting the absolute ratio of variety with respect to xj, denoted δ(Xj), which measures the degree of variety of r(Xi, Xj). If xq is a candidate change point, then δ(Xq) has the maximum value among all δ(Xk), and vice versa. xq can be judged as a change point when δ(Xq) ≥ θ, where θ denotes a nonnegative cut value. The greater the value of θ, the fewer the change points that can be discovered.

The next possible change point can be detected among {xq + 1, xq + 2, …, xq + m}, where m = n − s − q, following the previous candidate change point (xq). In practice, Xq + 1 = (xq + 1(1), xq + 1(2), …, xq + 1(s)) = (xq + 1, xq + 2, …, xq + s) can serve as the reference sequence, and Xq + 1, Xq + 2 = (xq + 2(1), xq + 2(2), …, xq + 2(s)) = (xq + 2, xq + 3, …, xq + s + 1), Xq + 3 = (xq + 3(1), xq + 3(2), …, xq + 3(s)) = (xq + 3, xq + 4, …, xq + s + 2), …, Xq + m + 1 = (xq + m + 1(1), xq + m + 1(2), …, xq + m + 1(s)) = (xq + m + 1, xq + m + 2, …, xn) serve as comparative sequences. Then, i is set to q + 1. Given the sequence length s between s1 and s2 (1 < s1 < s2), this CPD process is iteratively performed until i + s − 1 > n for each possible value of ρ ranging from 0 to 1. Figure 1 shows the flowchart of the proposed grey CPD method.
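The iterative scan can be sketched in Python for a single pair (s, ρ). Several details are assumptions made only for illustration and should not be read as the authors' exact procedure: the GRC is taken in its standard form; δ(Xj) is assumed to be |r(Xi, Xj) − r(Xi, Xj−1)| / r(Xi, Xj−1); the scan restarts at q + 1 whether or not the candidate passes the cut value θ; and the way results are combined across values of s and ρ is left out.

```python
def grey_relational_grades(reference, comparatives, rho=0.5):
    """GRG of each comparative window to the reference (standard GRC form, assumed)."""
    if not 0 < rho <= 1:
        raise ValueError("this sketch needs 0 < rho <= 1")
    deltas = [[abs(a - b) for a, b in zip(reference, seq)] for seq in comparatives]
    d_min = min(min(row) for row in deltas)
    d_max = max(max(row) for row in deltas)
    if d_max == 0:                      # every window equals the reference
        return [1.0] * len(comparatives)
    return [sum((d_min + rho * d_max) / (d + rho * d_max) for d in row) / len(row)
            for row in deltas]


def detect_change_points(series, s, theta, rho=0.5):
    """Return the 1-based indices q of the detected change points x_q."""
    n = len(series)
    found = []
    i = 1
    while i + s - 1 <= n:
        m = n - s - i + 1
        if m == 0:                      # only the reference window is left
            break
        windows = [series[j - 1:j - 1 + s] for j in range(i, i + m + 1)]
        r = grey_relational_grades(windows[0], windows, rho)
        # ASSUMED delta(X_j): relative change of the GRG between X_{j-1} and X_j.
        delta = {i + t: abs(r[t] - r[t - 1]) / r[t - 1] for t in range(1, m + 1)}
        q = max(delta, key=delta.get)   # candidate change point
        if delta[q] >= theta:
            found.append(q)
        i = q + 1                       # restart the scan after the candidate
    return found


series = [0.0] * 5 + [10.0] * 5         # toy series with one step at x_6
print(detect_change_points(series, s=2, theta=0.3))   # [6]
```

On the toy step series, the largest assumed δ occurs at the first window that lies entirely after the step, so x6 is reported. Raising θ to a large value returns an empty list, in line with the remark that a greater θ yields fewer change points.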

4. Empirical Results

4.1. CPD Methods Considered

Since the main applications of GRA are alternative evaluation and clustering, unsupervised methods with distinct features for CPD were considered, namely, Wong et al.’s GRA-based method (WGM) [2], the clustering-based change detector (CBCD) [41], and the piecewise linear function. These methods are briefly described as follows.

4.1.1. WGM

WGM was originally designed to detect change points in hydrological time series. The method considers the reference sequence X1 = (x1, x2, …, xs), and n − 2s + 1 comparative sequences, Xs + 1 = (xs + 1, xs + 2, …, x2s), Xs + 2 = (xs + 2, xs + 3, …, x2s + 1), …, and Xn − s + 1 = (xn − s + 1, xn − s + 2, …, xn). After computing r(X1, Xs + 1), r(X1, Xs + 2), …, and r(X1, Xn − s + 1), the relational degree of X1 to all comparative sequences is defined as

$$r(s) = \frac{1}{n - 2s + 1} \sum_{j=s+1}^{n-s+1} r(X_1, X_j).$$

Subsequently, X1 is replaced with (x1, x2, …, xs + 1), whereas Xs + 2 = (xs + 2, xs + 3, …, x2s + 2), Xs + 3 = (xs + 3, xs + 4, …, x2s + 3), …, and Xn − s = (xn − s, xn − s + 1, …, xn) become comparative sequences, and r(s + 1) can thus be obtained by averaging over the n − 2s − 1 comparative sequences. This process is performed until r(n/2) is obtained. CPD can then be conducted by inspecting the relative variety ratio of the relational degree, η(k), where k = s, s + 1, …, n/2 − 1. WGM detects xj as a change point when η(j) satisfies the detection criterion of [2].

It is obvious that only a single change point can be detected in a time series sequence with WGM, regardless of the length of the sequence. Moreover, the way the proposed method generates comparative sequences for a time series variable differs from that of WGM. Since multiple change points often exist in time series data, WGM was not considered in the empirical study.

4.1.2. CBCD

The CBCD performs CPD by K-means clustering. Initially, a reference window (x1, x2, …, xs) is given, from which K clusters are created. The centroid cp and the radius rp of cluster p (1 ≤ p ≤ K) are computed from its samples xp,k (1 ≤ k ≤ np), where np denotes the size of cluster p, and xp,k represents sample k in cluster p. Then, a current window (x2, x3, …, xs + 1) is generated by replacing x1 with a new incoming xs + 1, and the distance between xs + 1 and cp, denoted d(xs + 1, cp), is computed.

If d(xs + 1, cp) > rp, then xs + 1 is not a member of the cluster p.

As a result, xs + 1 can be considered a change point when it cannot be categorized into any cluster. At this time, (x2, x3, …, xs + 1) becomes the reference window, and K new clusters can be created from the new reference window. Subsequently, a current window catches xs + 2 to generate (x3, x4, …, xs + 2), and xs + 2 is inspected with regard to whether it is a change point. This process terminates after checking xn.
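The CBCD loop can be sketched in Python for a scalar series. This is an illustration under stated assumptions rather than the implementation of [41]: the K-means routine is a plain Lloyd iteration with a deterministic start written for this example, the distance is taken as the absolute difference, and membership is tested against a user-specified `radius_threshold` (the revised rule used in Section 4.3.2) instead of the cluster radius rp.

```python
def kmeans_1d(values, k, iters=50):
    """Centroids from plain Lloyd iterations on scalars (deterministic start)."""
    ordered = sorted(values)
    # Spread the k initial centroids over the sorted sample.
    centroids = [ordered[(2 * p + 1) * len(ordered) // (2 * k)] for p in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda p: abs(v - centroids[p]))
            clusters[nearest].append(v)
        # An empty cluster keeps its previous centroid.
        updated = [sum(c) / len(c) if c else centroids[p]
                   for p, c in enumerate(clusters)]
        if updated == centroids:
            break
        centroids = updated
    return centroids


def cbcd(series, s, k, radius_threshold):
    """Return 1-based indices of points that fall outside every cluster."""
    change_points = []
    window = list(series[:s])                 # reference window (x_1, ..., x_s)
    centroids = kmeans_1d(window, k)
    for t in range(s, len(series)):           # series[t] is x_{t+1}
        x_new = series[t]
        window = window[1:] + [x_new]         # current window
        if all(abs(x_new - c) > radius_threshold for c in centroids):
            change_points.append(t + 1)
            centroids = kmeans_1d(window, k)  # current window becomes the reference
    return change_points


series = [1.0, 1.1, 0.9, 1.0, 5.0, 5.1, 4.9, 5.0]
print(cbcd(series, s=4, k=2, radius_threshold=0.5))   # [5]
```

In the toy series, x5 is far from both centroids of the first window and is flagged; after re-clustering, the following points fall within the threshold of the new centroid near 5 and are not flagged.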

4.1.3. Piecewise Linear Function

The piecewise linear function has been used for CPD by finding the joints of the pieces such that the approximation function is continuous [42]. Li and Yu [43] proposed a piecewise regression analysis that requires users to prespecify the number of change points, although this might well be unknown beforehand. Keogh et al. [44] presented several useful CPD methods that do not entail prespecifying the number of change points, among which the bottom-up method seems to perform best. Since the bottom-up method determines a separate piece to approximate each state, the approximation function is not continuous. We slightly revise the bottom-up method to preserve continuity.

At first, a piece denoted seg(xi − 1, xi) is generated for (xi − 1, xi) by connecting xi − 1 and xi (2 ≤ i ≤ n), and the cost of merging each pair of adjacent pieces is then calculated. For instance, if the cost of merging seg(xa, xa + 1) and seg(xa + 2, xa + 3) is the lowest, and if it is less than a prespecified merging threshold, then a new piece seg(xa, xa + 3) can be generated by deleting seg(xa, xa + 1) and seg(xa + 2, xa + 3). The cost of generating seg(xa, xa + 3) is computed by summing all the r between seg(xa, xa + 3) and xa, xa + 1, xa + 2, xa + 3. The cost of merging seg(xa − 2, xa − 1) and seg(xa, xa + 3) and that of merging seg(xa, xa + 3) and seg(xa + 4, xa + 5) can be computed. Then, the method iteratively merges the pair with the lowest cost until the cost of merging any pair of adjacent pieces is greater than the threshold.
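The merging loop can be sketched in Python as follows. The sketch is a simplified, continuity-preserving variant written for illustration: adjacent pieces share a joint, merging two pieces means removing their shared joint, and the merging cost is assumed to be the sum of absolute vertical residuals between the merged piece (the straight line through its two end points) and the data it covers. The exact cost used in the study may differ.

```python
def chord_cost(series, left, right):
    """Sum of absolute residuals of points left..right from the line joining the ends."""
    span = right - left
    cost = 0.0
    for t in range(left, right + 1):
        fitted = series[left] + (series[right] - series[left]) * (t - left) / span
        cost += abs(series[t] - fitted)
    return cost


def bottom_up(series, threshold):
    """Return the 0-based indices of the joints that survive merging."""
    knots = list(range(len(series)))    # start with one piece per adjacent pair
    while len(knots) > 2:
        # Removing knots[j] merges the two pieces that meet there.
        costs = [chord_cost(series, knots[j - 1], knots[j + 1])
                 for j in range(1, len(knots) - 1)]
        best = min(range(len(costs)), key=costs.__getitem__)
        if costs[best] >= threshold:    # no merge is cheaper than the threshold
            break
        del knots[best + 1]
    return knots


series = [0, 1, 2, 3, 2, 1, 0]          # rises to a peak, then falls
print(bottom_up(series, threshold=0.5))  # [0, 3, 6]
```

The interior joints that remain (here index 3, the peak) are the candidate change points. A larger threshold allows more merges and therefore leaves fewer joints.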

A new method, called the sliding window and bottom-up (SWAB) method, was developed for online detection [44]. SWAB efficiently produces results that are identical to those of the bottom-up method. We used the SWAB to implement the piecewise linear function, but we omit an introduction to this method for simplicity.

4.2. CPD Performance Evaluation

To compare alternative CPD methods, appropriate measures of performance are needed. Since the ratio of change points to total data is small, CPD is typically a learning problem with an unbalanced class distribution. When CPD is treated as a pattern classification problem, the G-mean is a commonly used indicator of CPD performance [16].

The confusion matrix used to evaluate the performance of a CPD method is presented in Table 1.

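The G-mean utilizes both sensitivity and specificity to assess performance, where sensitivity refers to the ratio of correctly recognized change points, and specificity refers to the ratio of correctly recognized nonchange points:

$$\text{G-mean} = \sqrt{\text{Sensitivity} \times \text{Specificity}},$$

where sensitivity and specificity are formulated as

$$\text{Sensitivity} = \frac{TP}{TP + FN}, \qquad \text{Specificity} = \frac{TN}{TN + FP}.$$

Here TP and FN are the numbers of change points that are correctly detected and missed, respectively, and TN and FP are the numbers of nonchange points that are correctly recognized and falsely flagged as change points, respectively.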
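As a small worked example in Python, the three measures can be computed from the four cells of a confusion matrix. The counts below are hypothetical; they were chosen only because they give rates of about 0.714 and 0.72, and they are not counts reported in this study.

```python
import math


def cpd_scores(tp, fn, tn, fp):
    """Return (sensitivity, specificity, G-mean) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)        # share of change points that were detected
    specificity = tn / (tn + fp)        # share of nonchange points left unflagged
    return sensitivity, specificity, math.sqrt(sensitivity * specificity)


# Hypothetical counts: 21 change points and 75 nonchange points.
sens, spec, g = cpd_scores(tp=15, fn=6, tn=54, fp=21)
print(round(sens, 3), round(spec, 3), round(g, 3))   # 0.714 0.72 0.717
```

Because the G-mean is a geometric mean, a method that flags almost everything (high sensitivity, very low specificity) scores poorly, which is why it suits unbalanced problems such as CPD.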

4.3. Application to S&P100 Stock Returns

4.3.1. Data Collection and Preparation

Empirical studies were conducted using a real dataset to compare the CPD ability of the proposed method with that of CBCD and SWAB. CPD was performed on log returns of the daily closing values of stocks constituting the S&P100 index. Change points found in stock returns can indicate events that characterize the financial market. Using 88 series obtained from different institutions, Barigozzi et al. [45] utilized time series factor models to derive multiple primary change points in daily log returns between 4 January 2000 and 10 August 2016. The change points they discovered can be taken as true change points. Among these 88 series, two representative series mentioned in [45], namely, Goldman Sachs (GS) and Bank of America (BAC), were taken into account. All data are available from Yahoo Finance.

The aim of the empirical study was to examine the G-mean of the CBCD, SWAB, and the proposed grey CPD method by carefully tuning parameter specifications. Because time periods that include too few or no change points are unhelpful for finding parameter specifications, data from 2004–2006 and 2011–2016 were excluded from the GS and BAC series. Therefore, 8 yearly time series remained for GS and BAC with 31 change points (indicated by dates for each series).

In addition, since change points are correlated with events, it is reasonable to analyze the change points and their corresponding events by month rather than by the specific days reported in [45]. For instance, the burst of the dot-com bubble occurred between March 2000 and October 2002. Thus, the 31 change points originally indicated by specific dates were reduced to 21 change points indicated by month. For instance, both 4 May 2000 and 10 May 2000 were detected as change points, but we replaced these two days with a single change point: May 2000.
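The reduction from daily to monthly change points amounts to removing duplicate (year, month) pairs, as the following Python sketch shows. The function name `to_months` is illustrative, and the third date in the example is hypothetical; only the two May 2000 dates come from the text.

```python
from datetime import date


def to_months(change_dates):
    """Collapse daily change points into distinct (year, month) pairs, in order."""
    return sorted({(d.year, d.month) for d in change_dates})


daily = [
    date(2000, 5, 4),     # from the text
    date(2000, 5, 10),    # from the text
    date(2001, 9, 17),    # hypothetical extra date for illustration
]
print(to_months(daily))   # [(2000, 5), (2001, 9)]
```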

4.3.2. CPD Results

The programs implementing the proposed grey CPD method were coded in Delphi 7.0 on a personal computer with an Intel Core i3-8100 CPU (3.60 GHz), 8 GB RAM, and Microsoft Windows 10. Two parameters significantly influenced the performance of the proposed grey CPD method on the eight time segments: the sequence length s and the cut value θ. For segment k (1 ≤ k ≤ 8), given s between s1 and s2, θ ranging from 0 to 6 was carefully tuned so as to maximize G-meank, as follows:

$$\text{G-mean}_k = \sqrt{\text{Sensitivity}_k \times \text{Specificity}_k},$$

where Sensitivityk and Specificityk denote the sensitivity and specificity for segment k, respectively. For instance, G-mean2, Sensitivity2, and Specificity2 are associated with the time segment between January 2001 and December 2001. The maximum G-mean2 value can be found by tuning θ. Finally, the optimal results from all eight segments were used to summarize the overall CPD results. The results of the proposed method on the GS series are summarized in Table 2. The best performance obtained by the proposed method was G-mean = 0.713.

Table 2 shows that the performance of the proposed method can be improved by choosing appropriate s1 and s2. Therefore, the proposed method was further applied to the BAC series by finding appropriate parameter specifications within the range of s (3 ≤ s ≤ 12). As a result, when s1 = 5 and s2 = 12, the sensitivity and specificity were 0.714 and 0.72, respectively, obtaining the best G-mean (0.717) of the proposed method on the BAC series.

(1) Comparison with the CBCD. To improve the CPD results of the CBCD, the rule for determining whether xs + 1 is a member of cluster p was revised. In our design, if d(xs + 1, cp) is greater than a user-specified radius threshold rather than rp, then xs + 1 is not a member of cluster p, and vice versa. The CBCD is sensitive to the number of clusters K, the sequence length s, and the radius threshold. With radius thresholds of 0.005 and 0.01, the performance values of the CBCD on the GS series are depicted in Figures 2–7, in which the dashed lines denote the performance values of the proposed method. The performance values on the BAC series are depicted in Figures 8–13. The best G-mean of the CBCD was 0.598 on the GS series (K = 4, s = 15, and a radius threshold of 0.01) and 0.556 on the BAC series (K = 3, s = 5, and a radius threshold of 0.01); the proposed method therefore outperformed the CBCD in terms of the G-mean.

Moreover, a greater radius threshold (0.01) can lead to the discovery of fewer change points with higher specificity. By contrast, a lower radius threshold (0.005) leads to better sensitivity at the expense of specificity. This is the reason why the sensitivity and specificity of the proposed method were considerably superior to those of the CBCD in the cases of higher or lower radius thresholds, respectively.

(2) Comparison with the SWAB. The merging threshold significantly influenced the performance of SWAB. The performance of SWAB at different merging thresholds on the GS and the BAC series is depicted in Figures 14 and 15, respectively. The greater the merging threshold, the fewer the change points that can be discovered, which lowers sensitivity and raises specificity. The best G-mean of SWAB was 0.690 on the GS series with a merging threshold of 1.8, and 0.661 on the BAC series with a merging threshold of 1.9. Therefore, in terms of the best G-mean, the proposed method outperformed SWAB on both series.

5. Discussion and Conclusion

In the financial market, CPD can be applied to discover abnormal volatility in a stock returns series. Using the detected change points, we can find and account for events that characterize this volatility. This can help relevant authorities examine or set up managerial mechanisms to cope with these extraordinary situations. To correctly identify change points, it is crucial to develop an accurate CPD method. This paper addresses the case where variables in the collected series cannot be labeled because the change points are unknown. In such a case, traditional regression methods cannot perform CPD. Thus, we proposed an unsupervised regression-like method using GRA. GRA is a multiple attribute decision-making method capable of effectively evaluating the overall performance of alternatives [46].

There are several advantages to using the proposed grey CPD method. First, there are no constraints on the time series data, such as stationarity, and the data do not need to be independent and identically distributed. Second, it is not necessary to prespecify the number of change points. Finally, the proposed method is simple to implement as a program and requires no statistical assumptions. Empirical results on two representative series of stock returns were encouraging in terms of the performance obtained using the proposed grey CPD method. This suggests that using GRA to measure the relationship between response and explanatory time series variables can boost CPD performance. It should be noted that both the CBCD and SWAB could obtain better sensitivity than the proposed method, but only at the expense of specificity. For instance:

(1) For the GS series, the best sensitivity of the CBCD was 0.905 with K = 3, s = 5, and a radius threshold of 0.005. That of the SWAB was 1 with a merging threshold of 0.5. The specificity of the former and the latter was 0.36 and 0.053, respectively.

(2) For the BAC series, the best sensitivity of the CBCD was 0.762 with K = 4, s = 20, and a radius threshold of 0.005. That of the SWAB was 0.857 with a merging threshold of 0.5. The specificity of the former and the latter was 0.347 and 0.147, respectively.

This is the reason why the G-mean, which combines sensitivity and specificity, is commonly used to measure the performance of CPD methods from the perspective of classification. As such, it is not possible to conclude whether any classification method was “best” insofar as there is no such thing as a best classifier [47].

This study suggests several directions for further research. First, the proposed grey CPD method can be further applied to other real-world problems. For instance, it can be used to discover abnormal trades in the stock market. Detecting anomalies in the volatility of stock prices can provide regulators with useful information about investments and help prevent crime. We will explore this application in future research. Furthermore, the GRG was implemented using a weighted-average method, where noninteraction was assumed among the attributes involved. Nevertheless, the assumption of additivity may not be realistic in many applications [48]. Thus, our future work will explore the development of a nonadditive grey CPD method using a nonadditive GRG with the fuzzy integral [39, 49] and check the resultant impact on performance. It should be noted that the fuzzy integral has proven to be effective in dealing with performance evaluation and preferential dependence among attributes [50–52].

Data Availability

The authors declare that they have no conflict of interest. This article does not contain any studies with human participants performed by the authors.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This research was financially supported by the Ministry of Science and Technology, Taiwan, under grant MOST 110-2410-H-033-013-MY2.