Abstract

Understanding and predicting extreme turning points in financial markets, such as financial bubbles and crashes, has attracted much attention in recent years. Empirical observations of superexponential price increases before crashes suggest that financial extremes are predictable. In this study, we aim to forecast extreme events in the stock market using 19 years of daily data (January 2000–December 2018) covering 12 major stock indices worldwide. We propose an indicator of extremes based on a network constructed from the price time series using a weighted visual graph algorithm. Experimental results on the 12 stock indices show that the proposed indicators have strong predictive power for financial extremes.

1. Introduction

The stock market is an important part of global financial markets. Because entry is relatively easy and returns can be considerable, the stock market has become a major venue for investment by ordinary investors. However, compared with the capital markets of developed countries such as the United States, emerging stock markets, represented by China, are more volatile and carry much greater systemic risk owing to their short history and imperfect institutional systems. Therefore, modelling the stock market and making accurate predictions are very useful for both investors and regulatory authorities in managing systemic risk [1]. Financial extremes, such as bubbles, crashes, and rebounds, play a crucial role in stock market research, and predicting financial extremes from stock market indices is an active topic in the study of financial markets [2–4].

In the last decade, a growing body of literature has applied complex network methods to characterize dynamical systems from time series. There are at least three main classes of approaches for transforming time series into network representations [5]: proximity networks [6], transition networks [7], and visibility graphs [8]. The connectivity of proximity networks is determined by the mutual statistical similarity or metric proximity between different segments of a time series. Zhang and Small introduced a method to convert pseudoperiodic time series into networks, in which cycles of the time series are treated as nodes and edges are determined by the strength of the temporal correlation between cycles [6]. Xu et al. proposed another method in which phase-space points are treated as nodes and each node is linked to its k nearest neighbors to form a complex network [9]. To construct an ordinal partition transition network, the time series is symbolized using ordinal patterns; these patterns serve as the nodes of the network, and directed edges follow the temporal succession of the ordinal patterns [10]. The visibility graph algorithm was proposed by Lacasa et al. in 2008: nodes correspond to the data points of the time series, and an edge connects two nodes if they can see each other. The visibility graph algorithm can map any time series into a network, converting a periodic series into a regular graph, a random series into a random graph, and a fractal series into a scale-free network [8, 11–14].

Building on the visibility framework, the horizontal visibility algorithm [15] and the limited penetrable visibility algorithm [16] were developed. Stephen et al. extracted all segments of a time series with a predefined window size and mapped each segment to a visibility graph; successive visibility graphs are linked in turn, and the link weights reflect the transitions between distinguishable states [17]. In addition, Yan and Serooskerken proposed an absolute invisibility graph, the opposite of the visibility algorithm, to predict trough points in stock prices [18]. Owing to its low complexity and good geometric properties, the visibility graph has been widely applied to many kinds of time series, including turbulence [19], sunspot series [20], electrocardiograms (ECGs) [21], the Construction Cost Index (CCI) [22, 23], and financial markets [24–28].

Building on these achievements, we construct a weighted visual graph (covering both the visibility graph and the absolute invisibility graph), in which the edge weight combines the price difference and the time interval between the corresponding nodes. We then propose a new predictive indicator of financial extremes based on the weighted visual graph. In this paper, the extremes of the financial market are defined as peak (or trough) points, that is, the maximum (or minimum) index value within a period of stock prices. Experiments on 12 indices show the strong predictive power of the proposed indicators.

The rest of this paper is organized as follows. In Section 2, we describe the data used in this work and propose the indicators of financial extremes. In Section 3, we show the experimental results on 12 stock indices. Conclusions are drawn in Section 4.

2. Methodology and Data Description

2.1. Data Description

Stock market indices reflect the overall movement of markets. We collected 12 major stock market indices from Yahoo Finance (https://finance.yahoo.com) and used their daily closing prices for approximately 19 years, from January 2000 to December 2018. This period contains about 4500 trading days (the exact number differs slightly between indices). The extremes (peak or trough points) of the financial market are defined as the maximum (or minimum) index value within a given period. Table 1 shows the information and basic statistics of the 12 stock indices, where a = 45 and b = 131 (these parameters are explained in Section 2.2). In this work, we propose an indicator of extremes on these datasets.

2.2. Problem Definition

In this study, we define the extremes of the financial market as the peak (or trough) points, that is, the maximum (or minimum) index value within a given period. Our aim is to find an indicator with strong predictive power for these peak (or trough) points. Mathematically, for a given stock price time series $\{y_t\}$, where $t$ is the time variable and $y_t$ is the price at time $t$, the point at time $t$ is a peak (or trough) point if $y_t$ is the maximum (or minimum) price over the period $[t-b,\, t+a]$:

$$y_t = \max_{t-b \le \tau \le t+a} y_\tau \qquad \Big(\text{or } y_t = \min_{t-b \le \tau \le t+a} y_\tau\Big),$$

where a and b are the numbers of trading days after and before the current day, respectively. As in previous work [18], we chose a = 45 and b = 131, which correspond to approximately 2 months and 6 months of trading days, respectively. The numbers of peak and trough points for each stock index over the study period are listed in Table 1. Figure 1 illustrates the peak and trough points of the Shanghai Stock Exchange (SSE) index. Our goal in this work is to predict whether peak (or trough) points will appear in the next several days.
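The labelling rule for peak and trough points can be sketched in a few lines of Python. This is a minimal illustration, assuming the prices are held in an array indexed by trading day; the helper name `label_extremes` is hypothetical, and leaving the first b and last a days unlabelled (where the full window is unavailable) is our assumption rather than something the text specifies.

```python
import numpy as np


def label_extremes(y, a=45, b=131):
    """Hypothetical helper: mark peak and trough points of a price series.

    Day t is a peak (trough) if y[t] is the maximum (minimum) of
    y[t-b], ..., y[t+a]. Days without a complete window on both sides
    are left unlabelled (an assumption).
    """
    y = np.asarray(y, dtype=float)
    n = len(y)
    peaks = np.zeros(n, dtype=bool)
    troughs = np.zeros(n, dtype=bool)
    for t in range(b, n - a):
        window = y[t - b : t + a + 1]  # b days before, the day itself, a days after
        peaks[t] = y[t] == window.max()
        troughs[t] = y[t] == window.min()
    return peaks, troughs


# Toy example with short horizons (a = b = 2) so the result is easy to check by hand.
prices = [1, 2, 3, 2, 1, 2, 3, 4, 3, 2]
peaks, troughs = label_extremes(prices, a=2, b=2)
print(np.flatnonzero(peaks), np.flatnonzero(troughs))  # [2 7] [4]
```

With the study's settings one would call `label_extremes(prices, a=45, b=131)` on each index's closing-price series.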

2.3. Construction of Visibility Graph and Absolute Invisibility Graph
2.3.1. Visibility Graph

In this work, we seek an indicator of extremes from the network perspective. We first briefly introduce the visibility graph algorithm proposed by Lacasa et al., the most commonly used method for converting a time series into a network [8]. For a series $\{(t_i, y_i)\}_{i=1}^{N}$, a visibility edge exists between two nodes $(t_i, y_i)$ and $(t_j, y_j)$, $t_i < t_j$, if every node $(t_k, y_k)$ located between them ($t_i < t_k < t_j$) satisfies

$$y_k < y_j + (y_i - y_j)\,\frac{t_j - t_k}{t_j - t_i}.$$

Figure 2(a) is a schematic of the visibility graph converted from the SSE index’s daily closing prices in January 2015. Trading days are labelled with natural numbers. The points and the lines between them constitute the visibility graph: nodes correspond to the data points in the same order, and an edge connects two nodes if each can see the other. To illustrate the concept of “visibility,” consider points 10 and 16 in Figure 2(a): the five points between them (11 to 15) all lie below the red line from point 10 to point 16, so a link (visibility) exists between nodes 10 and 16. By construction, nodes with high prices tend to have more links, which provides the basis for predicting peak points.
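The visibility criterion can be checked for all pairs in O(N²) time by noting that every intermediate point lies strictly below the segment from i to j exactly when the slope from i to j exceeds the slope from i to every intermediate point. The sketch below is a minimal illustration under that reformulation, assuming equally spaced integer times for trading days; `visibility_graph` and `degrees` are hypothetical names, not code from the original study.

```python
import math


def visibility_graph(y):
    """Hypothetical sketch: natural visibility graph as a set of edges (i, j), i < j.

    Nodes i and j are linked when every k with i < k < j satisfies
    y[k] < y[j] + (y[i] - y[j]) * (j - k) / (j - i),
    i.e. every intermediate point lies strictly below the i-j segment.
    Times are assumed to be the integer positions 0..n-1.
    """
    n = len(y)
    edges = set()
    for i in range(n - 1):
        max_slope = -math.inf  # largest slope from i to any point already passed
        for j in range(i + 1, n):
            slope = (y[j] - y[i]) / (j - i)
            # All intermediate points lie below segment i-j
            # exactly when slope(i, j) exceeds every slope(i, k), i < k < j.
            if slope > max_slope:
                edges.add((i, j))
            max_slope = max(max_slope, slope)
    return edges


def degrees(edges, n):
    """Number of links of each node 0..n-1."""
    deg = [0] * n
    for i, j in edges:
        deg[i] += 1
        deg[j] += 1
    return deg


prices = [1, 5, 2, 3, 1]
g = visibility_graph(prices)
print(sorted(g))                # [(0, 1), (1, 2), (1, 3), (2, 3), (3, 4)]
print(degrees(g, len(prices)))  # [1, 3, 2, 3, 1]
```

In the toy series the highest price (node 1) has the largest degree (tied with node 3), in line with the observation that high-priced nodes tend to collect more links.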

2.3.2. Absolute Invisibility Graph

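The absolute invisibility graph algorithm [18] is the opposite of the visibility algorithm. For a series $\{(t_i, y_i)\}_{i=1}^{N}$, an absolute invisibility edge exists between two nodes $(t_i, y_i)$ and $(t_j, y_j)$, $t_i < t_j$, if every node $(t_k, y_k)$ located between them ($t_i < t_k < t_j$) satisfies

$$y_k > y_j + (y_i - y_j)\,\frac{t_j - t_k}{t_j - t_i}.$$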

Figure 2(b) is a schematic of the absolute invisibility graph. To illustrate the concept of “absolute invisibility,” consider points 12 and 16 in Figure 2(b): the three points between them (13, 14, and 15) all lie above the line from point 12 to point 16. Every point between 12 and 16 therefore obstructs the visibility between them, so a link (absolute invisibility) exists. By construction, nodes with low prices tend to have more links, which provides the basis for predicting trough points.
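The absolute invisibility condition admits the same O(N²) slope trick with the inequality reversed: every intermediate point lies strictly above the segment from i to j exactly when the slope from i to j is smaller than the slope from i to every intermediate point. This is a minimal sketch, again assuming equally spaced integer times; `absolute_invisibility_graph` is a hypothetical name.

```python
import math


def absolute_invisibility_graph(y):
    """Hypothetical sketch: absolute invisibility graph as a set of edges (i, j), i < j.

    Nodes i and j are linked when every k with i < k < j satisfies
    y[k] > y[j] + (y[i] - y[j]) * (j - k) / (j - i),
    i.e. every intermediate point lies strictly above the i-j segment.
    """
    n = len(y)
    edges = set()
    for i in range(n - 1):
        min_slope = math.inf  # smallest slope from i to any point already passed
        for j in range(i + 1, n):
            slope = (y[j] - y[i]) / (j - i)
            # All intermediate points lie above segment i-j
            # exactly when slope(i, j) is below every slope(i, k), i < k < j.
            if slope < min_slope:
                edges.add((i, j))
            min_slope = min(min_slope, slope)
    return edges


prices = [5, 1, 4, 3, 5]
print(sorted(absolute_invisibility_graph(prices)))
# [(0, 1), (1, 2), (1, 3), (2, 3), (3, 4)]: the lowest price (node 1) has 3 links
```

Because the condition is the visibility condition with the inequality reversed, the same edges result from running a visibility-graph routine on the negated series.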

Based on the visibility graph and absolute invisibility graph algorithms, Yan and Serooskerken proposed an indicator to predict extreme values in time series [18]. Their method suggests that an extreme value is more likely to appear when the degree of the corresponding node is much higher than that of the other nodes.

2.4. Indicator of Extremes

Note that the above methods consider only whether an edge exists between two nodes, which discards much detailed information in the original series. Taking the visibility graph as an example (Figure 2(a)), the link between points 11 and 12 and that between points 11 and 13 are indistinguishable in the original visibility graph, although the price variations along them differ markedly, and this variation is also an important factor related to extremes. Therefore, we propose a weighted visual graph (WVG), which accounts for the variation between two points on top of the original visibility graph or absolute invisibility graph. As shown in Figure 2(c), the dotted line represents the horizontal line of sight, and the angle $\theta_{ij}$ between the solid line and the dotted line is defined as the depression angle. For a pair of nodes $(t_i, y_i)$ and $(t_j, y_j)$, $t_i < t_j$, that satisfy the condition of the visibility graph (or absolute invisibility graph), the weight of the edge between them is defined as the tangent of the depression angle:

$$w_{ij} = \tan\theta_{ij} = \frac{y_j - y_i}{t_j - t_i}.$$

Compared with the original visibility graph (or absolute invisibility graph) algorithm, the WVG algorithm retains more detail, namely the time interval and the price variation between two points of the time series. Note that if the price increases from the earlier to the later point, the depression angle is positive and so is the weight; otherwise the weight is negative. We slide an observation window of S trading days over the entire time series and construct a weighted visual graph from each window. For the graph of each window, we define $W_P(i)$ and $W_T(i)$ as the indicators for predicting the appearance of peak and trough points, respectively, in the following days. To predict peak points, we use the weighted visibility graph and define

$$W_P(i) = \sum_{j=i-S+1}^{i-1} A^{\mathrm{VG}}_{ji}\, w_{ji},$$

where $w_{ji}$ is the weight of the edge between nodes $j$ and $i$, and $A^{\mathrm{VG}}_{ji} = 1$ if nodes $j$ and $i$ are linked in the visibility graph of the window and $0$ otherwise.

To predict trough points, we use the weighted absolute invisibility graph and define

$$W_T(i) = \sum_{j=i-S+1}^{i-1} A^{\mathrm{AIG}}_{ji}\, w_{ji},$$

where $A^{\mathrm{AIG}}_{ji}$ is the corresponding adjacency of the absolute invisibility graph, $i$ represents the rightmost point of the observation window, and $S$ is the length of the observation window.
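Because the indicators only involve edges attached to the rightmost node, they can be computed with one backward scan per window: node j is visible from the rightmost node r exactly when the slope from j to r is smaller than the slope from every intermediate node k to r (for the absolute invisibility graph, larger). The sketch below assumes the indicator is the sum of the weights of the edges attached to the rightmost node, with each weight equal to the price change divided by the number of trading days between the two nodes; `rightmost_weight_sum` and `wvg_indicator` are hypothetical names.

```python
import math


def rightmost_weight_sum(window, mode="peak"):
    """Hypothetical sketch: sum of WVG edge weights attached to the rightmost node.

    mode="peak": weighted visibility graph (W_P);
    mode="trough": weighted absolute invisibility graph (W_T).
    Edge weight w_jr = (y_r - y_j) / (r - j), the tangent of the line-of-sight
    angle, positive when the price rose from day j to day r.
    """
    r = len(window) - 1
    total = 0.0
    # peak: smallest slope(k, r) over intermediate k seen so far;
    # trough: largest slope(k, r) over intermediate k seen so far.
    bound = math.inf if mode == "peak" else -math.inf
    for j in range(r - 1, -1, -1):
        w = (window[r] - window[j]) / (r - j)
        if mode == "peak":
            # Visibility: all intermediate points strictly below segment j-r.
            if w < bound:
                total += w
            bound = min(bound, w)
        else:
            # Absolute invisibility: all intermediate points strictly above segment j-r.
            if w > bound:
                total += w
            bound = max(bound, w)
    return total


def wvg_indicator(y, S=262, mode="peak"):
    """Indicator for every day i with a full window y[i-S+1..i]; NaN before that."""
    out = [math.nan] * len(y)
    for i in range(S - 1, len(y)):
        out[i] = rightmost_weight_sum(y[i - S + 1 : i + 1], mode)
    return out


# An accelerating rise: the last day sees every earlier day, all with positive weights.
print(rightmost_weight_sum([1, 1.5, 2.5, 4, 6], "peak"))    # 6.5
# An accelerating fall gives a large negative trough indicator.
print(rightmost_weight_sum([6, 5.5, 4.5, 3, 1], "trough"))  # -6.5
```

With this sign convention the trough indicator becomes strongly negative after a steep decline, so applying an "indicator above threshold" rule to troughs would, as an assumption on our part, mean thresholding its negative.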

In the weighted visual graph, the structure around each node is very sensitive to neighboring values, so indicators based on the visibility graph and the absolute invisibility graph fluctuate frequently. For example, Figures 2(a) and 2(b) show that, although the prices on days 16 and 17 are similar, their indicators are very different. To reduce the influence of individual neighboring points, we treat each observation and its neighboring nodes as a whole (Figure 2(d)) and calculate the accumulated weighted indicators as

$$AW_P(i) = \sum_{k=i-n}^{i} W_P(k), \qquad AW_T(i) = \sum_{k=i-n}^{i} W_T(k),$$

where $W_P(k)$ and $W_T(k)$ are the weighted indicators of the window whose rightmost point is $k$, and $n$ determines how many neighbors are accumulated; for $n = 0$ the accumulated indicators reduce to $W_P(i)$ and $W_T(i)$.
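The accumulation step can be sketched as a trailing sum over the last n + 1 daily indicator values. This is a minimal illustration, assuming the daily indicator series is already available (with NaN for days that lack a full observation window); the name `accumulate` is hypothetical.

```python
import numpy as np


def accumulate(w, n):
    """Hypothetical sketch: AW(i) = W(i-n) + ... + W(i); NaN where any term is missing.

    With n = 0 this returns W unchanged.
    """
    w = np.asarray(w, dtype=float)
    aw = np.full_like(w, np.nan)
    for i in range(n, len(w)):
        aw[i] = w[i - n : i + 1].sum()  # NaN propagates if the window includes NaN
    return aw


w = [np.nan, 1.0, 2.0, 3.0]
print(accumulate(w, 1).tolist())  # [nan, nan, 3.0, 5.0]
```

A trailing (rather than centered) sum is used so that the accumulated indicator on day i only uses information available on day i.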

3. Results and Discussions

3.1. Comparison Methods

Indicators based on the degree (D) and accumulated degree (AD) of the visibility graph and absolute invisibility graph serve as comparison methods. The indicators used in this work are summarized in Table 2.

3.2. Metrics

We set the observation window length to 262 trading days (S = 262) and move it forward one day at a time. For each observation window, we calculate the indicators listed in Table 2 and expect a peak (or trough) point to appear within the following 45 days (a = 45) if the indicator is significant. We therefore apply different thresholds to the indicators: once an indicator value exceeds the threshold, we predict a peak (or trough) point within 45 days after the rightmost point of the corresponding window. To evaluate the proposed indicators, we calculate precision (P) and recall (R). Let $N_e$ be the number of extremes (peak or trough points) in the whole time series, $N_p$ the number of predicted extremes, that is, windows whose indicator value exceeds the threshold, and $N_c$ the number of predicted extremes that correspond to real extremes. Then precision is $P = N_c / N_p$ and recall is $R = N_c / N_e$. High precision means that the predictions are accurate, and high recall means that more of the extremes are predicted. Because precision and recall compete with each other, we use the F1 score as the main measure:

$$F_1 = \frac{2PR}{P + R}.$$
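The evaluation protocol can be sketched as follows. The text counts correctly predicted extremes once for both measures; here, as an assumption, an alarm on day i counts as correct if a real extreme occurs within the next a days, and a real extreme counts as predicted if at least one alarm occurred within the a days before it. The function name `precision_recall_f1` and this matching rule are hypothetical.

```python
import numpy as np


def precision_recall_f1(indicator, is_extreme, threshold, a=45):
    """Hypothetical evaluation of an extreme indicator.

    An alarm is raised on day i when indicator[i] > threshold; it predicts an
    extreme (peak or trough) on one of the days i+1, ..., i+a.
    """
    ind = np.asarray(indicator, dtype=float)
    extremes = np.flatnonzero(np.asarray(is_extreme, dtype=bool))
    alarms = np.flatnonzero(ind > threshold)  # NaN (no full window yet) never alarms

    # Alarms followed by a real extreme within a days.
    correct_alarms = sum(bool(np.any((extremes > i) & (extremes <= i + a))) for i in alarms)
    # Real extremes preceded by at least one alarm within a days.
    caught = sum(bool(np.any((alarms >= e - a) & (alarms < e))) for e in extremes)

    precision = correct_alarms / len(alarms) if len(alarms) else 0.0
    recall = caught / len(extremes) if len(extremes) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall > 0 else 0.0
    return precision, recall, f1


indicator = [0, 0, 5, 0, 0, 0, 0, 0, 0, 0]
is_extreme = [False] * 10
is_extreme[4] = is_extreme[9] = True
# One alarm (day 2) correctly anticipates the extreme on day 4 but misses day 9:
print(precision_recall_f1(indicator, is_extreme, threshold=1, a=3))  # precision 1.0, recall 0.5, F1 about 0.667
```

Counting alarms for precision and extremes for recall keeps both ratios in [0, 1] even when several consecutive alarms point at the same extreme.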

3.3. Experimental Results

First, we take the SSE index as an example to illustrate the prediction process. For each observation window, we obtain an indicator value according to the equations listed in Table 2. Figures 3 and 4 show the indicator values obtained by the various methods for peak and trough points, respectively. The yellow bars represent the indicator values, and the blue dots are the (log) price series of the index. Note that the indicator cannot be calculated for the first year, as the window size is 262 trading days (approximately 1 year). In both figures, a markedly large indicator value consistently appears before the peak (or trough) points, which suggests that all the indicators carry predictive information about extremes. However, comparing the indicators based on node degree (VG, Figures 3(a) and 4(a)) with those based on edge weights (WVG, Figures 3(b) and 4(b)), the weight-based indicators single out significant signals more clearly: most of their values are close to 0, and the very few large values are closely associated with peak (or trough) points.

We focus on a particular extreme event (a financial crash) in the SSE during 2014–2016 to show the interaction between extreme events and the indicators. Figure 5 shows part of the formation and collapse of the corresponding SSE index bubble. In late November 2014, the SSE index began to rise gradually owing to macroeconomic expectations and loose monetary policy. From December 2014 to January 2015, the SSE index rose from 2680 to 3210 (nearly 20%), a rapid increase consistent with faster-than-exponential price growth, indicating that a bubble was forming. As Figure 5(a) shows, the fluctuation of the peak indicator increased sharply in this period, and the peak indicator reached its maximum on December 8, 2014. After that, the SSE index continued to rise, further increasing stock market risk. Meanwhile, the financial regulatory authorities adopted more stringent measures, and the rise stopped at 5166 on June 12, 2015. In the following two calendar months, the SSE index fell by more than 42%, a faster-than-exponential decrease that marks a significant negative bubble. The trough indicators constructed in this work also capture this process. As shown in Figure 5(b), the trough indicator increased rapidly during the negative bubble stage and fluctuated sharply in late August 2015. On August 26, the index reached its lowest point, 2927.29, and the trough indicator reached a corresponding minimum. Figure 5(a) shows that the peak indicators also decreased during the negative bubble, but the trough indicators respond more sensitively.
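The percentage moves quoted for the 2014–2015 episode follow directly from the index levels; a quick arithmetic check in Python (the helper name `pct_change` is ours):

```python
def pct_change(start, end):
    """Relative change from start to end."""
    return (end - start) / start


rise = pct_change(2680, 3210)     # December 2014 to January 2015
drop = pct_change(5166, 2927.29)  # June 12 to August 26, 2015
print(f"rise {rise:.1%}, drop {drop:.1%}")  # rise 19.8%, drop -43.3%
```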

To test the performance of the proposed method, Figure 6 shows the precision, recall, and F1 score for peak and trough prediction. For the accumulated methods, the number of neighbors n is held fixed. The horizontal axis represents different thresholds; an indicator value above the threshold signals a peak (or trough) point within the following 45 trading days. Because the indicators of different methods differ greatly in scale (Figures 3 and 4), raw threshold values are hard to interpret, so Figure 6 expresses thresholds as percentages: for example, “top 20%” means that the largest 20% of indicator values are treated as extreme signals. Precision increases with the threshold (Figures 6(a) and 6(d)), because a small threshold yields many false positives. A similar phenomenon has been observed in other forecasting scenarios, such as recommender systems [29] and link prediction in social networks [30]. Interestingly, recall remains very high (nearly 100%) even at high thresholds, meaning that almost all real extremes (peak and trough points) are forecast by the indicators. Figure 6 also shows that the accumulated-weight indicators on the WVG (AW, red bars) are more accurate than the other methods for both peak and trough prediction, and this improvement is robust across thresholds.
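A percentage threshold such as “top 20%” can be turned into a numeric threshold with a quantile of the valid indicator values. This is a minimal sketch, assuming days without a full observation window are stored as NaN and using NumPy's default linear quantile interpolation; the name `top_fraction_threshold` is hypothetical.

```python
import numpy as np


def top_fraction_threshold(indicator, fraction):
    """Threshold above which roughly the top `fraction` of valid indicator values lie."""
    values = np.asarray(indicator, dtype=float)
    values = values[~np.isnan(values)]  # skip days without a full observation window
    return float(np.quantile(values, 1.0 - fraction))


indicator = np.concatenate([np.full(5, np.nan), np.arange(100.0)])
thr = top_fraction_threshold(indicator, 0.20)  # "top 20%"
print(thr, int(np.sum(indicator > thr)))       # about 79.2, and 20 values exceed it
```

Ranking by quantile makes thresholds comparable across methods whose indicators live on very different scales, which is the motivation given for the percentage axis.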

Figure 6 indicates that the accumulated-weight indicators on the WVG are the best predictors of extremes in the SSE index series. Computing AW requires choosing the number of considered neighbors (n). Figure 7 illustrates the influence of n on prediction accuracy, with bars showing the F1 score. Note that AW with n = 0 is simply the weighted indicator, which ignores the neighbors' influence. Figure 7 shows that the F1 score differs greatly between n = 0 and n > 0 but changes only slightly as n increases further. Thus, the specific value of n matters little, whereas accounting for the neighbors' influence is very important.

We checked the performance of the proposed indicators on 12 major financial indices. Figures 8 and 9 present the F1 scores of the four indicators for peak and trough points, respectively. Consistent with the SSE results, the indicators based on the accumulated weights of the WVG significantly outperform the others on all 12 datasets.

In the preceding experiments, the parameters were fixed (a = 45, b = 131, and S = 262). To test their influence, we calculate the F1 score on the SSE data for different parameter combinations, with a = 20, 30, 40, 50, 60, 70, 80, and 90; b = 90, 125, 160, 195, 230, 265, 300, 335, and 370; and S = 101, 131, 181, 221, 262, 350, and 400. Figures 10–13 illustrate the influence of the combinations of a and b and of a and S for peak and trough prediction, respectively, with color representing the F1 score. Over the full range, the parameters strongly affect the F1 score. However, within a restricted region of the parameter space, the F1 score varies only slightly and is also much higher. Moreover, the AW-based results outperform the other methods in most cases. Analogous to the AUC, we also calculate the area under the precision-recall (PR) curve to compare the four methods; a larger area indicates both higher precision and higher recall. Figure 14 shows the distributions of these areas. The results show that the proposed AW methods perform better than the others in most situations.
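The area under a PR curve sampled at several thresholds can be approximated with the trapezoidal rule. This is a minimal sketch under that assumption (points sorted by recall, no extrapolation beyond the sampled range); the name `pr_curve_area` is hypothetical. An alternative is scikit-learn's `average_precision_score`, which uses a step-wise sum rather than trapezoidal interpolation.

```python
def pr_curve_area(recalls, precisions):
    """Trapezoidal area under a precision-recall curve sampled at several thresholds.

    Points are sorted by recall; no extrapolation beyond the sampled range.
    """
    points = sorted(zip(recalls, precisions))
    area = 0.0
    for (r0, p0), (r1, p1) in zip(points, points[1:]):
        area += (r1 - r0) * (p0 + p1) / 2.0
    return area


print(pr_curve_area([0.0, 0.5, 1.0], [1.0, 1.0, 0.5]))  # 0.875
```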

4. Conclusions

Financial market extremes attract much attention because of their connection to financial bubbles and crashes. Given the extreme complexity of financial markets, phenomenological investigation of stock price data plays a crucial role in understanding financial dynamics. In this work, we aimed to predict financial extremes in stock indices from a complex network perspective, defining the extremes as the peak (or trough) points over a long period. We proposed indicators based on the accumulated weights of the WVG of the stock price series. Experimental results on 12 major stock indices show the strong predictive power of these indicators, suggesting that they could serve as an effective tool for investors adjusting their strategies.

Data Availability

The data (12 stock index prices) used in this work can be accessed from the following online address: https://finance.yahoo.com/world-indices.

Conflicts of Interest

The authors declare no conflicts of interest.

Acknowledgments

This study was partially supported by the Zhejiang Provincial Natural Science Foundation of China (grant nos. LR18A050001 and LY18A050004) and Natural Science Foundation of China (grant nos. 61873080 and 61673151).