Abstract

Stock price prediction based on K-line patterns is the essence of candlestick technical analysis. However, academics dispute whether K-line patterns have predictive power. To help resolve the debate, this paper uses the data mining methods of pattern recognition, pattern clustering, and pattern knowledge mining to study the predictive power of K-line patterns. A similarity match model and a nearest neighbor-clustering algorithm are proposed to solve the similarity match and clustering problems of K-line series, respectively. The experiment tests the predictive power of the Three Inside Up and Three Inside Down patterns on the K-line series data of Shanghai 180 index component stocks over the latest 10 years. Experimental results show that the predictive power of a pattern varies a great deal across shapes and that each existing K-line pattern requires further classification based on shape features to improve prediction performance.

1. Introduction

A time series is a series of observations listed in time order. It is the most commonly encountered data type, touching almost every aspect of human life [1]. Examples include meteorological time series, time series of stock prices (stock time series for short), which are composed of stock price observations, and personal health time series, which consist of observations of blood pressure, temperature, white blood cell count, and so forth.

Research shows that time series have two important features. (a) Historical information affects the future trend [2]. That is, the historical values of observations influence the future values in the time series. The influence can be described by the period, nonstationarity, varying volatility, and so on of the time series. (b) History repeats itself [3]. That is to say, some special time subseries recur throughout the entire time series. Because of these two features, time series forecasting of all kinds has become a hot research topic, one branch of which is the prediction of stock time series, stock prediction for short. Stock time series are typical time series and share these features; moreover, the trend of stock prices is directly related to people’s vital interests. Therefore, stock prediction has aroused the interest of a wide variety of researchers.

There are many technical analysis methods for stock prediction, the best known of which is candlestick technical analysis, also called K-line technology analysis in Asia. In the stock market, in order to study the fluctuation of stock prices in a more intuitive way, people invented the candlestick chart (also called K-line) to represent stock time series graphically. Taking a daily K-line as an example, a K-line represents the fluctuation of stock prices in one day: it not only shows the close price, open price, high price, and low price for the day but also reflects the difference and size relationship between any two prices (all K-lines in this paper are daily K-lines, unless otherwise indicated). If the K-lines of a stock are listed in time order, they form a series that reflects the fluctuation of the stock price over a period of time, which is called a K-line series. As each K-line consists of four prices, a K-line series is in essence a stock time series with four observations per time point.

In a K-line series, if a K-line subseries contains knowledge that can be used to predict the stock, then this subseries is called a K-line pattern series, a K-line pattern for short. For instance, if the stock price often rises or falls after a certain subseries appears, then this subseries is a typical pattern series. Stock prediction based on K-line patterns is the essence of K-line technology analysis. How to mine K-line patterns and how to use them for prediction are the main research topics of K-line technology analysis.

By manually observing the K-line series of the stock market (or the Japanese rice market), people have found many K-line patterns; the leading figure is the founder of the K-line, Munehisa Honma, a Japanese rice trader in the 18th century. References [4, 5] introduce the existing patterns and their features in detail, such as Three Inside Up (TIU), Three Inside Down (TID), and Doji. Some papers [6–10] conclude from experiments that the existing K-line patterns have good capability for forecasting stock trends. Some other papers [11–15] have studied stock prediction based on these patterns and have achieved some research results. However, a number of papers [5, 16–18] challenge these patterns’ predictive power. They argue that K-line technology analysis violates the efficient market hypothesis, so stock investment based on K-line patterns is not feasible. They also report experiments showing that the existing K-line patterns have no predictive power.

The above analysis shows that academics dispute whether K-line patterns have predictive power. However, few papers analyze why there are two different positions regarding the patterns’ predictive power. Paper [19] also pays attention to the debate, but it does not analyze the K-line patterns themselves; instead it attempts to answer the following question: are trend reversals accompanied more often by some types of candlesticks than by others? Paper [19] finds that some types of candlesticks frequently appear close to trend-reversal regions, while others cannot be found in such regions. Although that research shows that K-line patterns exist, it does not explain why there is a debate on the K-line patterns’ predictive power.

After reviewing the relevant literature, this paper considers that the main reason is that the existing K-line patterns lack a rigorous mathematical definition. For example, the shadow length and body size are not defined clearly in the definition of K-line patterns, which means that a K-line pattern has many different shapes, and the predictive power of a pattern may vary a lot across shapes. If we ignore the shape difference and study the predictive power of a pattern by taking all of its shapes as a whole, instead of classifying the pattern further based on its shape features, then the result may be biased. For instance, suppose a TIU pattern has three shapes, shape A, shape B, and shape C, as shown in Figure 1, where shape A is the generic form of the TIU pattern and shapes B and C are infrequent forms. Suppose further that shape A has predictive power and shapes B and C do not. When studying the predictive power of the TIU pattern, if we ignore the difference between the three shapes and study them as a whole, then we may come to the wrong conclusion that the TIU pattern has no predictive power. However, if the three shapes are classified further based on shape features and studied separately, then we can reach the correct conclusion that the TIU pattern has predictive power only for shape A.
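The dilution effect described in this paragraph can be made explicit with a short worked equation. The symbols $n_X$ (number of occurrences of shape X) and $p_X$ (fraction of those occurrences followed by a rise) are introduced here only for illustration, and the numbers are hypothetical, not experimental results.

```latex
% Success rate of the pooled pattern is the count-weighted average of the shapes' rates.
\[
p_{\text{pooled}} = \frac{n_A p_A + n_B p_B + n_C p_C}{n_A + n_B + n_C}
\]
% Purely illustrative numbers (assumed, not taken from the experiments):
\[
\frac{60 \times 0.8 + 20 \times 0.1 + 20 \times 0.1}{60 + 20 + 20} = \frac{52}{100} = 0.52
\]
```

In this hypothetical case the pooled rate sits close to 0.5 even though shape A alone rises 80% of the time, which is the kind of masking that a per-shape analysis is meant to avoid.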

In addition, another reason is that, because the existing K-line patterns were mined manually, some of them may be spurious.

In order to resolve the debate and verify the two inferences, this paper studies the predictive power of K-line patterns using data mining methods such as pattern recognition, pattern clustering, pattern knowledge mining, and statistical analysis. The rest of this paper is organized as follows. In Section 2, we first briefly introduce the K-line, K-line technology analysis, and K-line patterns. Then we define the similarity match model and nearest neighbor-clustering algorithm of K-line series. In Section 3, we define the mining method of patterns’ predictive power. Section 4 presents the experimental results and discussion. Section 5 concludes the paper.

2. K-Line and K-Line Series Clustering

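First, we give the mathematical definition of a K-line series. Let $S_i$ represent the $i$-th K-line series of any stock, and let $k_{i,t}$ represent the $t$-th K-line in $S_i$; then

$$S_i = \{k_{i,1}, k_{i,2}, \ldots, k_{i,n}\}, \qquad k_{i,t} = (c_{i,t}, o_{i,t}, h_{i,t}, l_{i,t}), \tag{1}$$

where $n = |S_i|$ is the number of elements in $S_i$, which is also called the length of $S_i$, and $c_{i,t}$, $o_{i,t}$, $h_{i,t}$, and $l_{i,t}$ are the $t$-th day’s close price, open price, high price, and low price in $S_i$, respectively. In this paper, the “$|\cdot|$” symbol indicates the number of elements in a set or series.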
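As a minimal sketch of this definition, a K-line can be held as a record of four prices and a K-line series as an ordered list of such records. The class name `KLine`, the alias `KLineSeries`, and the sample prices are illustrative assumptions, not part of the paper.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class KLine:
    """One daily K-line: close, open, high and low price (hypothetical container)."""
    close: float
    open: float
    high: float
    low: float

    def __post_init__(self) -> None:
        # A well-formed K-line keeps its body between the low and the high price.
        body_low = min(self.open, self.close)
        body_high = max(self.open, self.close)
        if not (self.low <= body_low <= body_high <= self.high):
            raise ValueError("inconsistent K-line prices")


# A K-line series is simply the K-lines of one stock listed in time order.
KLineSeries = List[KLine]

series: KLineSeries = [
    KLine(close=10.2, open=10.0, high=10.4, low=9.9),
    KLine(close=10.1, open=10.2, high=10.3, low=10.0),
    KLine(close=10.5, open=10.1, high=10.6, low=10.1),
]
length = len(series)  # the number of elements, i.e. the length of the series
```

The validation in `__post_init__` is a design choice of this sketch; it rejects records whose high or low price contradicts the body.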

2.1. K-Line
2.1.1. K-Line Introduction

As defined in literature [4–6], the K-line is drawn from four basic elements: close price, open price, high price, and low price. The part between the close price and open price is drawn as a rectangle called the body of the K-line, the part between the high price and the body is drawn as a line called the upper shadow, and the part between the low price and the body is drawn as a line called the lower shadow. The distinctive figure consisting of upper shadow, lower shadow, and body is called a K-line.

If the open price is lower than the close price, the K-line is called a Yang line, and the body is usually filled with white or green, as shown in Figure 2(a). If the open price is higher than the close price, the K-line is called a Yin line, and the body is usually filled with black or red, as shown in Figure 2(b). If the open price is equal to the close price, the K-line is called a Doji line, and the body collapses into a single horizontal line, as shown in Figure 2(c). It is important to note that the body colors of the Yin line and Yang line differ between the Chinese stock market and the stock markets of Europe and America. In the Chinese stock market, the body color of the Yang line and Yin line is red and green, respectively, whereas in the stock markets of Europe and America it is green and red, respectively.
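The body, shadows, and line type described above follow directly from the four prices. The sketch below shows one way to compute them; the function names `kline_parts` and `kline_type` are hypothetical.

```python
def kline_parts(open_price: float, close_price: float,
                high_price: float, low_price: float) -> dict:
    """Return the lengths of the upper shadow, body and lower shadow."""
    body_top = max(open_price, close_price)
    body_bottom = min(open_price, close_price)
    return {
        "upper_shadow": high_price - body_top,   # high price down to the body
        "body": body_top - body_bottom,          # between open and close
        "lower_shadow": body_bottom - low_price, # body down to the low price
    }


def kline_type(open_price: float, close_price: float) -> str:
    """Classify a K-line as a Yang, Yin or Doji line."""
    if close_price > open_price:
        return "yang"
    if close_price < open_price:
        return "yin"
    return "doji"
```

For instance, `kline_type(10.0, 10.5)` classifies a day that closes above its open as a Yang line.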

2.1.2. K-Line Technology Analysis

First, we introduce and define some key concepts of K-line technology analysis. Let $k_t$ represent the $t$-th day’s K-line of any stock.

(1) Moving Average. It is the average of the stock price over a period of time. The three-day moving average at time $t$ is defined by

$$MA_3(t) = \frac{c_{t-2} + c_{t-1} + c_t}{3}, \tag{2}$$

where $c_t$ denotes the close price of $k_t$.

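(2) K-Line Trend. It describes the trend in which a K-line sits, either uptrend or downtrend. $k_t$ is said to be in a downtrend if the three-day moving averages of the consecutive days ending at day $t$ are strictly decreasing, with at most one violation of the inequalities. Uptrend is defined analogously.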

(3) Stock Price Trend. It describes the general trend of stock prices over a period of time, either uptrend or downtrend. If the future trend of the stock price is rising, the market is called bullish; if it is falling, the market is called bearish. Moreover, a more intense rise or fall indicates a more typical bullish or bearish market. The capability of a K-line pattern for predicting the bullish market and bearish market is defined in formulas (17) and (18), respectively.

Note that the concepts of “moving average” and “K-line trend” are defined in paper [6], while the concept of stock price trend is first defined in this paper.
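The moving average and K-line trend concepts above can be sketched as follows. The length of the chain of inequalities is not recoverable from the text here, so the `window` parameter and its default value of 6 are assumptions of this sketch, as are the function names.

```python
from typing import Sequence


def ma3(closes: Sequence[float], t: int) -> float:
    """Three-day moving average of the close price at index t (needs t >= 2)."""
    if t < 2:
        raise IndexError("three closes are needed for a three-day average")
    return (closes[t - 2] + closes[t - 1] + closes[t]) / 3.0


def _trend(closes: Sequence[float], t: int, window: int,
           max_violations: int, falling: bool) -> bool:
    if t - window < 2:
        return False  # not enough history to evaluate the chain of inequalities
    averages = [ma3(closes, s) for s in range(t - window, t + 1)]
    pairs = zip(averages, averages[1:])
    if falling:
        violations = sum(1 for a, b in pairs if not a > b)
    else:
        violations = sum(1 for a, b in pairs if not a < b)
    return violations <= max_violations


def is_downtrend(closes: Sequence[float], t: int,
                 window: int = 6, max_violations: int = 1) -> bool:
    """Moving averages strictly decrease up to day t, allowing one violation.

    `window` (assumed default 6) is the number of inequalities checked.
    """
    return _trend(closes, t, window, max_violations, falling=True)


def is_uptrend(closes: Sequence[float], t: int,
               window: int = 6, max_violations: int = 1) -> bool:
    """Analogous definition with strictly increasing moving averages."""
    return _trend(closes, t, window, max_violations, falling=False)
```

Returning `False` when there is too little history is a conservative choice of this sketch; a caller could equally skip such days.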

2.1.3. K-Line Patterns

Many K-line patterns have been mined up to now, as shown in literature [4–6]. Limited by space, only the TIU and TID patterns are introduced below. Let $\{k_1, k_2, k_3\}$ represent a three-day K-line series.

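The conditions for a three-day K-line series $\{k_1, k_2, k_3\}$ to be a TIU pattern, with $o_t$ and $c_t$ the open and close price of $k_t$, are as follows: (1) $k_1$ is in a downtrend, and $o_1 > c_1$. (2) $c_2 > o_2$, $o_1 \ge c_2$, and $o_2 \ge c_1$, where at most one of the two equalities holds. That is, the second day is a Yang line and must be contained within the body of the first day. (3) $c_3 > o_3$; $c_3 > o_1$. That is, the third day is a Yang line and closes above the open of the first day. A standard TIU pattern is shown in Figure 3(a).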

According to the existing literature, TIU is a trend-reversal pattern that gives a bullish market signal. This means that when the TIU pattern appears, the stock price is likely to turn from a downtrend into an uptrend; that is, the market is likely to change from bearish to bullish, and the stock price is likely to rise gradually.

The conditions for a three-day K-line series $\{k_1, k_2, k_3\}$ to be a TID pattern, with $o_t$ and $c_t$ the open and close price of $k_t$, are as follows: (1) $k_1$ is in an uptrend, and $c_1 > o_1$. (2) $o_2 > c_2$, $c_1 \ge o_2$, and $c_2 \ge o_1$, where at most one of the two equalities holds. That is, the second day is a Yin line and must be contained within the body of the first day. (3) $o_3 > c_3$; $c_3 < o_1$. That is, the third day is a Yin line and its close is lower than the first day’s open. A standard TID pattern is shown in Figure 3(b).

According to the existing literature, TID is a trend-reversal pattern that gives a bearish market signal. This means that after the TID pattern appears, the stock price is likely to turn from an uptrend into a downtrend; that is, the market is likely to change from bullish to bearish, and the stock price is likely to fall gradually.
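The verbal conditions for TIU and TID (first-day trend and color, second day contained within the first day’s body, third-day color and close relative to the first day’s open) can be checked mechanically. The sketch below is one reading of those conditions; the `Bar` type and the function names are hypothetical, and the trend of the first day is passed in as a flag so that the example stays self-contained.

```python
from typing import NamedTuple


class Bar(NamedTuple):
    """Open and close price of one day (hypothetical helper type)."""
    open: float
    close: float


def is_tiu(k1: Bar, k2: Bar, k3: Bar, first_day_in_downtrend: bool) -> bool:
    """Three Inside Up: Yin day in a downtrend, inside Yang day, confirming Yang day."""
    cond1 = first_day_in_downtrend and k1.open > k1.close
    # Second day is a Yang line inside the first body; at most one equality may hold.
    cond2 = (k1.open >= k2.close > k2.open >= k1.close
             and not (k1.open == k2.close and k2.open == k1.close))
    # Third day is a Yang line that closes above the first day's open.
    cond3 = k3.close > k3.open and k3.close > k1.open
    return cond1 and cond2 and cond3


def is_tid(k1: Bar, k2: Bar, k3: Bar, first_day_in_uptrend: bool) -> bool:
    """Three Inside Down: Yang day in an uptrend, inside Yin day, confirming Yin day."""
    cond1 = first_day_in_uptrend and k1.close > k1.open
    # Second day is a Yin line inside the first body; at most one equality may hold.
    cond2 = (k1.close >= k2.open > k2.close >= k1.open
             and not (k1.close == k2.open and k2.close == k1.open))
    # Third day is a Yin line that closes below the first day's open.
    cond3 = k3.open > k3.close and k3.close < k1.open
    return cond1 and cond2 and cond3
```

Note how loose these conditions are: nothing constrains shadow lengths or body sizes, which is why one pattern admits many shapes.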

2.2. Similarity Match of K-Line Series

The similarity match of K-line series is an essential and basic task for K-line series clustering. In the literature, however, few papers focus on the similarity match of K-line series. Paper [20] studies a similarity match method and search algorithm for K-line series using image retrieval technology. In addition, papers [19, 21] propose similarity match models of K-line series based on the traditional Euclidean distance.

From the view of stock prediction, the similarity of K-line series refers to the trend similarity of the K-lines in the series. The K-line trend, in turn, is determined by the close price change, open price change, high price change, low price change, and the size relationship between close price and open price. Therefore, to match the similarity between two K-line series, we should calculate the similarity of K-line price changes instead of the similarity of price values. As the changes of K-line prices are not shown in the K-line chart, the similarity match models of literature [19–21] use the distance between K-line prices rather than the distance between K-line price changes. These models are therefore similarity match methods based on K-line price values rather than price changes, and they cannot accurately measure the similarity of the stock price trend in K-line series.

For example, assume that the similarity of two K-line series needs to be matched, and consider the close price change rate of each series on each day. When the similarity is judged by these change rates, the similarity match models in literature [19–21], which compare price values, cannot calculate the correct result. The same problem occurs when calculating the similarity of the open price, high price, or low price.
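The difference between matching on price values and matching on price changes can be illustrated with a small sketch. The change rate is taken here as $(c_t - c_{t-1}) / c_{t-1}$, which is an assumption of this example, and the two price series are made up for illustration.

```python
import math
from typing import List, Sequence


def change_rates(closes: Sequence[float]) -> List[float]:
    """Day-over-day close price change rate (assumed form: (c_t - c_{t-1}) / c_{t-1})."""
    return [(cur - prev) / prev for prev, cur in zip(closes, closes[1:])]


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Plain Euclidean distance between two equally long sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Two hypothetical stocks that both rise 10% on each day, at different price levels.
low_priced = [10.0, 11.0, 12.1]
high_priced = [100.0, 110.0, 121.0]

price_distance = euclidean(low_priced, high_priced)
rate_distance = euclidean(change_rates(low_priced), change_rates(high_priced))
```

Here `price_distance` is large while `rate_distance` is zero up to floating-point rounding, so a distance on price values would call the two series dissimilar even though their trends coincide.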

Therefore, this paper proposes a new similarity match model based on K-line price changes to measure the trend similarity between two K-line series. In this model, the similarity of K-line series is composed of two parts: one is the shape similarity of K-line, which is the similarity of the corresponding K-line’s shape features in the two K-line series; the other is the position similarity of K-line, which is the similarity of the corresponding K-line’s position features in the two K-line series. Therefore, this paper will define K-line series’ shape similarity model and position similarity model, respectively. Then based on these two kinds of similarity models, the similarity model of the entire K-line series could be built.

2.2.1. The Shape Similarity of K-Line Series

According to the shape feature of the K-line, this paper proposes using the shape distance to measure the shape similarity between two K-lines. First, based on the shape structure of the K-line, three components of the K-line shape are extracted: the upper shadow shape, the lower shadow shape, and the body shape. Second, the similarity match methods of the three shapes are defined, respectively. Finally, the shape similarity of the K-line is calculated by summing the similarities of the three shapes. Assuming that $k_{1,t}$ and $k_{2,t}$ denote the $t$-th day’s K-line of two K-line series $S_1$ and $S_2$, respectively, the shape similarity model of K-line series is defined as follows:

Let denote the upper shadow length of , as defined in the following formula:where is used to normalize the upper shadow length. According to the related regulation of Chinese A-share market, the range of daily fluctuations of stock prices cannot exceed 10% of the previous day’s close price. So can be used to normalize the length of the K-line’s upper shadow, lower shadow, and body.

Let denote the upper shadow similarity between and , as defined by

Let denote the lower shadow length of , as defined in the following formula:

Let denote the lower shadow similarity between and , as defined by

Let denote the body length of , as defined in the following formula:

Let denote the body similarity between and , as defined by

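Let $ShapeSim(k_{1,t}, k_{2,t})$ denote the shape similarity between $k_{1,t}$ and $k_{2,t}$, as defined by

$$ShapeSim(k_{1,t}, k_{2,t}) = w_{us} \cdot Sim_{us}(k_{1,t}, k_{2,t}) + w_{ls} \cdot Sim_{ls}(k_{1,t}, k_{2,t}) + w_{body} \cdot Sim_{body}(k_{1,t}, k_{2,t}), \tag{10}$$

where $w_{us}$, $w_{ls}$, and $w_{body}$ represent the weights of the upper shadow similarity $Sim_{us}$, the lower shadow similarity $Sim_{ls}$, and the body similarity $Sim_{body}$, respectively.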

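Let $ShapeSim(S_1, S_2)$ denote the shape similarity between two K-line series $S_1$ and $S_2$ of length $n$, as defined by

$$ShapeSim(S_1, S_2) = \sum_{t=1}^{n} w_t \cdot ShapeSim(k_{1,t}, k_{2,t}), \tag{11}$$

where $w_t$ represents the weight of the $t$-th K-line. Because each K-line can be given a different weight, K-line series with special shape features can be identified well.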
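The shape similarity model can be sketched in code under explicit assumptions, since the individual formulas are not recoverable from the text here. In this sketch each length is normalized by 10% of the previous day’s close (the A-share daily limit mentioned above), the body length is signed so that Yang and Yin lines differ, and a component similarity is taken as 1 minus the absolute difference, floored at 0. All names, the similarity form, and the equal default weights are hypothetical.

```python
from typing import NamedTuple, Optional, Sequence, Tuple


class KLine(NamedTuple):
    close: float
    open: float
    high: float
    low: float


DAILY_LIMIT = 0.10  # assumed normalizer: 10% of the previous day's close


def shape_features(k: KLine, prev_close: float) -> Tuple[float, float, float]:
    """Normalized upper shadow, lower shadow and signed body length."""
    scale = DAILY_LIMIT * prev_close
    body_top, body_bottom = max(k.open, k.close), min(k.open, k.close)
    return (
        (k.high - body_top) / scale,
        (body_bottom - k.low) / scale,
        (k.close - k.open) / scale,  # positive for Yang, negative for Yin
    )


def component_similarity(a: float, b: float) -> float:
    """Assumed similarity form: 1 minus the absolute difference, floored at 0."""
    return max(0.0, 1.0 - abs(a - b))


def kline_shape_similarity(k1: KLine, prev1: float, k2: KLine, prev2: float,
                           weights: Sequence[float] = (1 / 3, 1 / 3, 1 / 3)) -> float:
    """Weighted sum of upper shadow, lower shadow and body similarity."""
    f1, f2 = shape_features(k1, prev1), shape_features(k2, prev2)
    return sum(w * component_similarity(a, b) for w, a, b in zip(weights, f1, f2))


def series_shape_similarity(s1: Sequence[KLine], prev1: float,
                            s2: Sequence[KLine], prev2: float,
                            day_weights: Optional[Sequence[float]] = None) -> float:
    """Weighted sum of per-day shape similarities; prev1/prev2 are the closes
    of the day before each series starts."""
    if len(s1) != len(s2):
        raise ValueError("series must have the same length")
    n = len(s1)
    day_weights = day_weights or [1.0 / n] * n
    total = 0.0
    for w, k1, k2 in zip(day_weights, s1, s2):
        total += w * kline_shape_similarity(k1, prev1, k2, prev2)
        prev1, prev2 = k1.close, k2.close
    return total


cheap = [KLine(10.2, 10.0, 10.4, 9.9), KLine(10.1, 10.2, 10.3, 10.0)]
expensive = [KLine(c * 10, o * 10, h * 10, l * 10) for c, o, h, l in cheap]
same_shape = series_shape_similarity(cheap, 10.0, expensive, 100.0)
```

Because every length is divided by the previous close, the two example series, which differ only in price level, receive a shape similarity of 1 up to rounding.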

2.2.2. The Position Similarity of K-Line Series

To compute the similarity between two K-line series, we consider not only the shape similarity of the series but also their position similarity. If we considered only the shape similarity, two K-line series with the same shape features but different position features would be judged as identical.

For example, suppose that the K-line series charts of two series are as shown in Figure 4. According to the shape feature definition of the K-line, all of the corresponding K-lines of the two series have the same shape features, so the two series have identical shape features. However, as Figure 4 shows, some corresponding K-lines occupy different relative positions in their series, although others occupy the same relative position. Therefore, the overall stock price trends of the two series are not identical. If we considered only the shape similarity, we would draw the wrong conclusion that the two series are identical.

To solve this problem, the concept of K-line coordinate is introduced hoping to implement the position match of K-line by defining K-line’s coordinate in the K-line series. In this paper, the sequence of K-line in the K-line series is called coordinate of K-line; the increase range of close price is called coordinate of K-line; in addition the first K-line’s coordinate is set to 1 in the K-line series. Therefore, the position similarity model of K-line series based on K-line coordinate is defined as follows.

Let denote the coordinate of , which are defined in the following formula:

Let denote the position similarity between and , as defined by

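Let $PosSim(S_1, S_2)$ denote the position similarity between two K-line series $S_1$ and $S_2$ of length $n$, as defined by

$$PosSim(S_1, S_2) = \sum_{t=1}^{n} v_t \cdot PosSim(k_{1,t}, k_{2,t}), \tag{14}$$

where $PosSim(k_{1,t}, k_{2,t})$ is the position similarity between the $t$-th K-lines of the two series and $v_t$ represents the weight of the $t$-th K-line. Because each K-line can be given a different weight, K-line series with special coordinates can be identified well.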

2.2.3. The Similarity of K-Line Series

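Finally, based on the shape similarity and position similarity, the similarity of K-line series can be obtained. The similarity match model between two K-line series $S_1$ and $S_2$ is defined by

$$Sim(S_1, S_2) = w_{shape} \cdot ShapeSim(S_1, S_2) + w_{pos} \cdot PosSim(S_1, S_2), \tag{15}$$

where $ShapeSim(S_1, S_2)$ and $PosSim(S_1, S_2)$ are the shape similarity and position similarity of the two series, and $w_{shape}$ and $w_{pos}$ represent the shape similarity weight and position similarity weight of K-line series, respectively.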
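A minimal sketch of the position similarity and of the final weighted combination is given below. It assumes that the price coordinate of a K-line is its close price relative to the first close of the series (so the first K-line has coordinate 1), that the per-day position similarity is 1 minus the absolute coordinate difference floored at 0, and that the two parts are combined as a weighted sum. These forms, the function names, and the default weights are assumptions of the sketch, not the paper’s formulas.

```python
from typing import List, Optional, Sequence


def y_coordinates(closes: Sequence[float]) -> List[float]:
    """Assumed price coordinate: close relative to the first close (first value is 1)."""
    first = closes[0]
    return [c / first for c in closes]


def position_similarity(closes_a: Sequence[float], closes_b: Sequence[float],
                        day_weights: Optional[Sequence[float]] = None) -> float:
    """Weighted sum of per-day position similarities between two series."""
    if len(closes_a) != len(closes_b):
        raise ValueError("series must have the same length")
    ya, yb = y_coordinates(closes_a), y_coordinates(closes_b)
    n = len(ya)
    day_weights = day_weights or [1.0 / n] * n
    return sum(w * max(0.0, 1.0 - abs(a - b))
               for w, a, b in zip(day_weights, ya, yb))


def series_similarity(shape_sim: float, position_sim: float,
                      w_shape: float = 0.5, w_position: float = 0.5) -> float:
    """Weighted combination of shape and position similarity."""
    return w_shape * shape_sim + w_position * position_sim


# Two hypothetical series whose daily shapes could match while their paths diverge.
rising = [10.0, 10.5, 11.0]
falling = [10.0, 9.5, 9.0]
pos = position_similarity(rising, falling)
overall = series_similarity(shape_sim=1.0, position_sim=pos)
```

With a shape similarity of 1 the overall similarity still falls below 1 for these two series, because their positions drift apart from the second day on.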

2.3. Cluster of K-Line Series

A more accurate classification of K-line patterns can be obtained by clustering them with a nearest neighbor-clustering algorithm based on the similarity match model of K-line series. The nearest neighbor-clustering algorithm of K-line series (KNNCA) is described in Algorithm 1.

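Input:
  D = {S_1, S_2, ..., S_n}  //  the data set of K-line series
  θ  //  Similarity threshold
Output:
  C  //  the set of clusters
KNNCA Algorithm:
 Assign initial value for parameters: m = 1;
 n = |D|;
 C_m = {S_1};  //  C_m represents the m-th cluster
 C = {C_m};
 FOR i = 2 TO n DO
   maxSim = 0;
   FOR EACH C_j IN C
     FOR EACH S IN C_j
       Get Sim(S_i, S) based on formula (15);
       IF (Sim(S_i, S) > maxSim)
       {
         maxSim = Sim(S_i, S);
         id = j;  //  id represents the ID of a cluster whose element is most similar to S_i
       }
     END
   END
   IF (maxSim ≥ θ) THEN
     C_id = C_id ∪ {S_i};
   ELSE
   {
     m = m + 1;
     C_m = {S_i};
     C = C ∪ {C_m};
   }
 END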

In addition, $n$ represents the number of elements in the input data set $D$. As each K-line series is matched once against all of the K-line series already stored in the clusters, the time complexity and space complexity of KNNCA are both $O(n^2)$.
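The clustering procedure can be sketched as a generic nearest neighbor clustering routine that takes any similarity function. The function name, its signature, and the toy similarity in the usage example are assumptions for illustration, not the paper’s implementation.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def nearest_neighbor_clustering(items: Sequence[T],
                                similarity: Callable[[T, T], float],
                                threshold: float) -> List[List[T]]:
    """Assign each item to the cluster holding its most similar earlier item,
    or open a new cluster when that best similarity is below the threshold."""
    clusters: List[List[T]] = []
    for item in items:
        best_similarity = float("-inf")
        best_cluster = None
        # Compare the item with every item already placed in a cluster.
        for cluster in clusters:
            for member in cluster:
                current = similarity(item, member)
                if current > best_similarity:
                    best_similarity, best_cluster = current, cluster
        if best_cluster is not None and best_similarity >= threshold:
            best_cluster.append(item)
        else:
            clusters.append([item])  # the first item always lands here
    return clusters


# Toy usage: numbers stand in for K-line series, similarity is 1 - |a - b|.
toy_clusters = nearest_neighbor_clustering(
    [0.0, 0.05, 1.0, 1.02, 0.1],
    similarity=lambda a, b: 1.0 - abs(a - b),
    threshold=0.9,
)
```

In the toy run the values near 0 end up in one cluster and the values near 1 in another. The number of clusters is not fixed in advance; it is driven entirely by the similarity threshold, and the result can depend on the order of the input.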

3. Mining of Patterns’ Predictive Power

We can mine and analyze the patterns’ predictive power according to the following steps.

(1) Pattern Recognition. Based on the definition of a K-line pattern (such as TIU or TID), we identify all the K-line series belonging to the pattern; together they form the pattern set.

(2) Pattern Clustering. We use the KNNCA algorithm to cluster the pattern set and obtain a set of clusters, in which different clusters represent different shapes of the same pattern.

(3) Knowledge Mining. We define some statistical indicators about stock prices, which we use to mine stock prediction knowledge from each cluster.

The pattern’s predictive power is obtained primarily by analyzing the trend of the K-line series that follows the pattern, called its consequent K-line series. Paper [22] found that K-line technology is suited to short-term investment prediction and that the most efficient time period for prediction is 10 days. Therefore, we mainly analyze the close price trend of the pattern’s consequent K-line series over 10 days. The statistical indicators of the consequent K-line series of a three-day K-line pattern are defined as follows.

(a) Let denote the k-th close price of , let denote that the probability of the trend of is uptrend, and let denote that the probability of the trend of is downtrend. and are calculated bywhere represents the number of patterns meeting the condition of in , represents the number of patterns meeting the condition of in , and represents the close price of . indicates that the future trend of is rising. indicates that the future trend of is descending.

(b) Let denote the probability that the close price will rise in the next days if the pattern appears. denotes the probability that the close price will fall in the next days if the pattern appears. and are calculated bywhere a higher value of or indicates a stronger capability for predicting bullish or bearish market.
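The indicators in (a) and (b) can be sketched as follows, under assumptions that should be read as one plausible interpretation rather than the paper’s exact formulas. Here a pattern counts as "up" on day k if the k-th following close is above the close of the pattern’s last day, a day is flagged as rising if the up probability exceeds the down probability, and the bullish (bearish) score is the fraction of days in the horizon flagged as rising (falling). The data layout and function names are hypothetical.

```python
from typing import List, Sequence, Tuple

# One pattern occurrence: (close on the pattern's last day, closes of the following days).
Occurrence = Tuple[float, Sequence[float]]


def trend_probabilities(cluster: Sequence[Occurrence], k: int) -> Tuple[float, float]:
    """Share of occurrences whose k-th following close is above / below the
    pattern's last close (k is 1-based)."""
    ups = sum(1 for ref, future in cluster if future[k - 1] > ref)
    downs = sum(1 for ref, future in cluster if future[k - 1] < ref)
    n = len(cluster)
    return ups / n, downs / n


def bullish_bearish_scores(cluster: Sequence[Occurrence],
                           horizon: int = 10) -> Tuple[float, float]:
    """Fraction of days in the horizon on which rising (falling) is the more likely outcome."""
    rising_days = falling_days = 0
    for k in range(1, horizon + 1):
        p_up, p_down = trend_probabilities(cluster, k)
        if p_up > p_down:
            rising_days += 1
        elif p_down > p_up:
            falling_days += 1
    return rising_days / horizon, falling_days / horizon


# Made-up cluster with three occurrences and a two-day horizon.
example_cluster: List[Occurrence] = [
    (10.0, [10.5, 10.8]),
    (20.0, [20.4, 19.0]),
    (5.0, [4.9, 4.8]),
]
scores = bullish_bearish_scores(example_cluster, horizon=2)
```

In the made-up cluster, two of three occurrences are up on day 1 and two of three are down on day 2, so each score is 0.5. The default horizon of 10 follows the 10-day prediction period cited from paper [22].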

(4) Analysis. Based on the statistical results, we analyze the pattern’s predictive power.

4. Experiment and Result Analysis

4.1. Experiment Data and Method

Yahoo provides a finance stock API for downloading the transaction data of the Chinese stock market, so the transaction data of the Chinese A-share market for any period can be acquired through the API. To obtain representative test data, we select the K-line series data of Shanghai 180 index component stocks over the latest 10 years (from 2006-01-04 to 2016-08-24). Limited by space, only the predictive power of the TIU and TID patterns is analyzed in the experiment. The parameters of the KNNCA algorithm are set as follows: , , , , , , and ().

4.2. Experiment One

The aim of the first experiment is to analyze the TIU pattern’s predictive power based on the method defined in Section 3. First, based on the definition of TIU, 1516 TIU patterns are identified from the test data. Then we cluster these patterns using the KNNCA algorithm and obtain 554 clusters. We choose the 20 clusters with the most elements for statistical analysis, as shown in Table 1.

In Table 1, represents the cluster composed of 1516 TIU Patterns. Its is only 0.5 which means that it may be a spurious pattern to predict bullish market. However, after further classifying the TIU patterns, we can see that (1) , , , and so forth have a strong capability for predicting bullish market, because their both are above 0.8, (2) , , , and so forth have a moderate capability for predicting bullish market, as their are only in 0.5~0.7, and (3) , , , and so forth have a weak capability for predicting bullish market, as their entire are below 0.5. Particularly for , its is only 0.1.

By comparing the predictive power of and , as shown in Figure 5, we can see that the predictive result of is bullish market while that of is bearish market, which means that their predictive power is opposite. The result of experiment one shows that (1) the predictive power of TIU varies a great deal for different shapes and (2) to be a better pattern for predicting bullish market, the TIU pattern badly needs to be further classified, which are consistent with the expected analysis.

4.3. Experiment Two

The aim of the second experiment is to analyze the TID pattern’s predictive power based on the method defined in Section 3. First, based on the definition of TID, 1498 TID patterns are identified from the test data. Then we cluster these patterns using the KNNCA algorithm and obtain 572 clusters. We choose the 20 clusters with the most elements for statistical analysis, as shown in Table 2.

Similarly, the TID pattern may be also a spurious pattern to predict bearish market because of is 0, where represents the cluster composed of 1498 TID patterns. Moreover, after further classifying the TID patterns, we can see that except for and , almost all of the clusters have a weak capability for predicting bearish market, as their entire are below 0.5 and even and still not have a higher value of , which are only 0.5. Therefore, we can consider that the TID pattern is definitely a spurious pattern, which is also consistent with the expected analysis.

4.4. Experiment Conclusion

From the above experiments, we can draw the following conclusions. (1) The predictive power of a pattern varies a great deal across shapes. Take TIU, for example: TIU patterns of some shapes have a strong capability for predicting a bullish market, while others have the opposite predictive power. Therefore, the predictive power of a pattern should be analyzed shape by shape. (2) There are some spurious patterns among the existing K-line patterns. Therefore, in order to improve the stock prediction performance based on K-line patterns, we need to further classify the existing patterns based on shape features, identify all the spurious patterns, and choose the patterns with stronger predictive power to predict the stock price.

5. Conclusion

Stock prediction is a popular research field in time series prediction. Stock price prediction based on K-line patterns is a primary technical analysis method for stock prediction; although it is widely used in practice, opinions on it differ in the academic world. To help resolve the debate, this paper uses data mining methods such as pattern recognition, similarity match, clustering, and statistical analysis to study the predictive power of K-line patterns. Experimental results show that one reason for the debate is that the definitions of K-line patterns are loose and lack mathematical rigor. The other is that there are some spurious patterns among the existing K-line patterns. In addition, the method presented in the paper can be used not only to test the predictive power of patterns but also for K-line pattern mining and stock prediction. Future work is therefore as follows. First, it will be a necessary and significant task to identify all the spurious patterns using the proposed method. Second, on the basis of the proposed method, we can study an automatic pattern mining method to discover more useful patterns for stock prediction.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

The Key Basic Research Foundation of Shanghai Science and Technology Committee, China (Grant no. 14JC1402203), and the Science and Technology Support Program of China (Grant no. 2015BAF10B01) financially supported this work.