Complexity

Volume 2019, Article ID 4132485, 12 pages

https://doi.org/10.1155/2019/4132485

## Stock Price Pattern Prediction Based on Complex Network and Machine Learning

Business School, Sun Yat-sen University, Guangzhou 510275, China

Correspondence should be addressed to Hongduo Cao; caohd@mail.sysu.edu.cn and Ying Li; mnsliy@mail.sysu.edu.cn

Received 7 March 2019; Accepted 14 May 2019; Published 28 May 2019

Guest Editor: Benjamin M. Tabak

Copyright © 2019 Hongduo Cao et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

#### Abstract

Complex networks in the stock market and stock price volatility pattern prediction are important issues in stock price research. Previous studies have used historical information regarding a single stock to predict the future trend of the stock’s price, seldom considering comovement among stocks in the same market. In this study, in order to extract information about related stocks for prediction, we combine the complex network method with machine learning to predict stock price patterns. Firstly, we propose a new pattern network construction method for multivariate stock time series. The price volatility combination patterns of the Standard & Poor’s 500 Index (S&P 500), the NASDAQ Composite Index (NASDAQ), and the Dow Jones Industrial Average (DJIA) are transformed into directed weighted networks. It is found that network topology characteristics, such as average degree centrality, average strength, average shortest path length, and closeness centrality, can identify periods of sharp fluctuations in the stock market. Next, the topology characteristic variables for each combination symbolic pattern are used as the input variables for K-nearest neighbors (KNN) and support vector machine (SVM) algorithms to predict the next-day volatility patterns of a single stock. The results show that the optimal models corresponding to the two algorithms can be found through cross-validation and search methods, respectively. The prediction accuracy rates for the three indexes on the testing data set are greater than 70%. In general, the prediction ability of the SVM algorithm is better than that of the KNN algorithm.

#### 1. Introduction

The classification and prediction of stock price volatility patterns is an important problem in stock market research. The prediction of stock price trends is in effect a classification of stock price fluctuation patterns [1]. The literature has shown that forecasting stock price patterns is sufficient to generate profitable trades and enable the execution of profitable trading strategies [2]. Therefore, many studies have focused on predicting stock price patterns rather than predicting the absolute prices of stocks [2–4].

To date, most studies have focused on the volatility patterns of a single stock based on its own historical attributes [5, 6] and have paid less attention to the comovement of related stocks and information pertaining to the overall market. A few studies have used historical information regarding related stocks as the input variables for prediction and shown that the price fluctuations in a single stock are not isolated and are often influenced by the trends of multiple related stocks [7, 8]. Thus, how to extract the comovement of multiple stocks and apply this information to the prediction of the fluctuation patterns of a single stock is a problem worth studying.

Complex network analysis provides a new explanation for stock market behavior from a systematic perspective. Using complex network theory to study stock prices not only allows us to analyze the relationship between different stocks, but also allows us to explore the macroaspects of the comovement characteristics of the market in different periods [9–11]. Previous studies have proposed a variety of methods to build complex networks using the time series of stock prices, including visibility graphs [12–14], recurrence networks [15–17], correlation networks [11, 18, 19], pattern networks [10, 20], and K-neighbors networks [21, 22]. Of all the network construction methods, the symbolic pattern network is favored by many scholars because it can more accurately reflect the degree of correlation and direction of the primitive elements in a complex system [10, 20, 23, 24]. In a stock price volatility pattern network, each volatility pattern is regarded as a network node, and the relationship between patterns is regarded as a connection between nodes [10]. By analyzing the topological properties of the network, the characteristics of stock price fluctuations can be better understood. Huang et al. used coarse-grained symbolization methods to construct a network of market prices and transaction volume data in different periods based on the Shanghai Stock Exchange (SSE) composite index, and the results showed that the out-degree distribution of network nodes obeyed the power law and the basic fluctuations exhibited different patterns during different periods [24]. Wang et al. converted the yields of gasoline and crude oil stocks into five patterns and studied the characteristics of crude oil and gasoline node networks in different periods using sliding windows and then accurately predicted the crude oil and gasoline stock price pattern based on the conversion characteristics of the price network [10, 20].

However, most of the existing studies on stock price volatility pattern networks have focused on univariate time series. On this basis, we propose a new network construction method to build the volatility pattern networks of the three most important indexes in the US stock market, namely, the Standard & Poor’s 500 Index (S&P 500), the NASDAQ Composite Index (NASDAQ), and the Dow Jones Industrial Average (DJIA). Firstly, the combination symbolic patterns for the three stock indexes are derived using a coarse-grained method. Then, the combination symbolic patterns are used as the nodes of the network, and the frequencies and directions of the conversion of the patterns are used as the weights and directions of the network connections. Finally, we construct directed and weighted networks for the US stock market. By analyzing the network topology properties, we can identify periods of sharp fluctuations in the market.
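The construction just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two-symbol coarse-graining in `symbolize` (up or down by the sign of the daily return), the function names, and the return values are all hypothetical and are not the symbolization rule or the data used in this study.

```python
from collections import Counter

def symbolize(ret):
    """Hypothetical coarse-graining: 'U' for a non-negative return, 'D' otherwise."""
    return "U" if ret >= 0 else "D"

def build_pattern_network(returns_by_index):
    """Return {(source_pattern, target_pattern): weight} for consecutive days."""
    # One combination pattern per day, e.g. "UDU" for three indexes.
    patterns = ["".join(symbolize(r) for r in day) for day in zip(*returns_by_index)]
    # Each day-to-day conversion adds 1 to the weight of the directed edge.
    return Counter(zip(patterns, patterns[1:]))

# Invented daily returns for three indexes (not market data).
sp500 = [0.4, -0.2, 0.1, 0.3, -0.5, 0.2]
nasdaq = [0.6, -0.1, 0.2, 0.1, -0.7, 0.3]
djia = [0.2, 0.1, 0.1, 0.2, -0.3, 0.1]

edges = build_pattern_network([sp500, nasdaq, djia])
for (source, target), weight in edges.items():
    print(f"{source} -> {target}: weight {weight}")
```

The keys of `edges` are the directed connections between combination patterns and the values are the conversion frequencies, so node strength or degree can be obtained by summing over the edges that start or end at a given pattern.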

Meanwhile, many machine learning algorithms have been applied to stock price volatility classification and prediction, such as neural networks [25], random forests [26], decision trees [27], support vector machines (SVM) [3, 7], and K-nearest neighbors (KNN) [1, 28]. Among them, the KNN and SVM algorithms have been widely used in pattern recognition and forecasting, machine learning, information retrieval, and data mining. KNN is a simple and effective classification method that is easy to compute, and its performance is comparable to that of the most advanced classification methods [29, 30]. SVM, which can map data that are not linearly separable into a high-dimensional space and use hyperplanes for classification, is highly suitable for small sample classification because of its excellent classification ability [26]. Both KNN and SVM algorithms have a mature theoretical basis in relation to classification prediction. Ballings et al. compared the accuracy of SVM, KNN, and other algorithms in predicting stock price movements one year ahead for 5767 publicly listed European companies, and the results showed that SVM has better prediction ability than KNN [2]. Teixeira proposed an automatic stock trading method that combined technical analysis with KNN classification. Using 15 stocks from the Sao Paulo Stock Exchange (Bovespa), that study found that the proposed method generated considerably higher profits than the buy-and-hold method for most of the companies, with few buy actions generated [1]. Huang et al. used SVM algorithms to predict the weekly fluctuations in the Nikkei 225 index and found that SVM outperformed the other classification methods, such as quadratic discriminant analysis and Elman backpropagation neural networks [3].

The literature has demonstrated the ability of SVM and KNN to predict stock patterns. However, these studies predicted the stock price based on information about the single stock itself, without considering information from the network system composed of related stocks. Therefore, another aim of this study is to predict the next-day pattern of a single stock for each combination pattern of stocks, using the network topology properties as input variables for the SVM and KNN algorithms. To the best of our knowledge, this is the first such attempt in the literature. We then compare the prediction accuracy on the testing data set after identifying the best models using the training set. The stock price volatility pattern network includes price information for single stocks and related stocks and portrays the macronature of the market, so it contains more information than is available from the historical information of single stocks alone. The results show that the pattern network can provide some information to enable us to forecast the price volatility patterns of single stocks. For both prediction methods, the optimal parameter search strategy, which combines cross-validation with search methods, enables us to find models that perform well on the testing data set. Overall, the performance of the SVM algorithm is better than that of the KNN algorithm. Combining complex networks with machine learning can provide investors with information on profitable strategies.

The remainder of this paper is organized as follows. In the next section, we introduce the theoretical background for KNN and SVM algorithms. In Section 3, the methodology of constructing the network and of predicting the next-day patterns for each stock index is presented. In Section 4, we show the empirical results and compare the prediction accuracy for KNN and SVM. The last section is devoted to a summary.

#### 2. Theoretical Background for KNN and SVM

##### 2.1. KNN

The K-nearest neighbor (KNN) algorithm is a nonparametric classification algorithm that assigns the query data to the category to which most of its neighbors belong [31]. We use the Euclidean distance metric to find the $K$ nearest neighbors from a sample set of known classifications. Suppose that the known data set has four feature variables $x_1, x_2, x_3, x_4$ and four categories $c_1, c_2, c_3, c_4$. The steps to determine the category of a new data point $i$ with the KNN algorithm are as follows.

Firstly, the Euclidean distance between the feature variables of data point $i$ and those of each data point $j$ in the training data set is calculated:

$$d_{ij} = \sqrt{\sum_{m=1}^{4} \left(x_{im} - x_{jm}\right)^2}, \tag{1}$$

where $x_{im}$ denotes the value of feature $x_m$ for data point $i$.

Secondly, all the data in the training set are sorted in ascending order according to their distance from data point $i$.

Thirdly, the $K$ data points with the smallest distance from data point $i$ are selected.

Finally, the category with the largest proportion among these $K$ data points is taken as the category of data point $i$.

An important parameter to be determined in the KNN algorithm is $K$, which represents the number of nearest neighbors to be considered when classifying unknown samples [1, 2].
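The four steps above can be sketched as follows. This is a minimal illustration in plain Python, not the implementation used in this study; the function names, the toy training data, and the choice `k=3` are assumptions made for the example.

```python
import math
from collections import Counter

def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

def knn_predict(train_x, train_y, query, k):
    """Classify `query` by majority vote among its k nearest training points."""
    # Steps 1-2: compute the distances and sort in ascending order.
    ranked = sorted(zip(train_x, train_y), key=lambda pair: euclidean(pair[0], query))
    # Step 3: keep the labels of the k closest points.
    nearest_labels = [label for _, label in ranked[:k]]
    # Step 4: the most frequent label wins (ties go to the label seen first).
    return Counter(nearest_labels).most_common(1)[0][0]

# Toy data invented for illustration: four features, categories 1 to 4.
train_x = [
    (0.1, 0.2, 0.1, 0.0), (0.2, 0.1, 0.0, 0.1),
    (0.9, 0.8, 0.9, 1.0), (1.0, 0.9, 0.8, 0.9),
    (0.1, 0.9, 0.1, 0.9), (0.9, 0.1, 0.9, 0.1),
]
train_y = [1, 1, 2, 2, 3, 4]

prediction = knn_predict(train_x, train_y, (0.15, 0.15, 0.05, 0.05), k=3)
print(prediction)
```

With a small $K$ the vote follows the closest points only, while a larger $K$ smooths the decision over a wider neighborhood, which is why $K$ is usually chosen by validation rather than fixed in advance.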

##### 2.2. SVM

SVM was introduced by Vapnik [32] and has been widely used in pattern prediction in recent years. The basic idea of SVM is to nonlinearly transform the input vector into a high-dimensional feature space and then search for the optimal linear classification surface in this feature space, that is, the surface that maximizes the distance between the classification plane and the nearest points. The training samples closest to the classification plane are called support vectors. The SVM algorithm can be briefly described as follows.

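Consider the binary linear classification problem for the training data set $\{(x_i, y_i)\}$, $i = 1, \ldots, n$; $x_i \in \mathbb{R}^d$, $y_i \in \{-1, +1\}$, where $x_i$ is a feature vector and $y_i$ is a class label. Suppose these two classes can be separated by a linear hyperplane $w \cdot x + b = 0$. In order to classify the samples correctly and obtain the largest classification margin, the optimization problem for constructing the optimal hyperplane is described as

$$\min_{w,\,b} \ \frac{1}{2}\lVert w \rVert^2 \quad \text{s.t.} \quad y_i\,(w \cdot x_i + b) \ge 1, \quad i = 1, \ldots, n. \tag{2}$$

The optimal solution for $w$ and $b$ can be obtained by introducing Lagrange multipliers $\alpha_i$. Then we obtain the optimal classification function (3):

$$f(x) = \operatorname{sgn}\left(\sum_{i=1}^{n} \alpha_i y_i\,(x_i \cdot x) + b\right). \tag{3}$$

For a nonlinear classification problem, the feature vector is first transformed into a high-dimensional space. Then the optimal classification hyperplane is constructed. Suppose the transformation function is $\phi(x)$; then the optimization problem can be described as

$$\min_{w,\,b,\,\xi} \ \frac{1}{2}\lVert w \rVert^2 + C \sum_{i=1}^{n} \xi_i \quad \text{s.t.} \quad y_i\,(w \cdot \phi(x_i) + b) \ge 1 - \xi_i, \quad \xi_i \ge 0, \quad i = 1, \ldots, n, \tag{4}$$

where $\xi_i$ are slack variables and $C$ is the penalty parameter, which specifies the trade-off between classification distance and misclassification [2]. Finally, the optimal classification hyperplane can be described as in (5):

$$f(x) = \operatorname{sgn}\left(\sum_{i=1}^{n} \alpha_i y_i\,K(x_i, x) + b\right), \tag{5}$$

$$K(x_i, x_j) = \phi(x_i) \cdot \phi(x_j). \tag{6}$$

The function (6) is called a kernel function. Because the Gaussian radial basis function (RBF) performs well when additional information about the data is limited, it is widely used in financial time series analysis [3]. The RBF is used as the kernel function to implement the SVM algorithm in this study. The RBF kernel function can be expressed as

$$K(x_i, x_j) = \exp\left(-\frac{\lVert x_i - x_j \rVert^2}{2\sigma^2}\right), \tag{7}$$

where $\sigma$ is the constant of the radial basis function. Before implementing the SVM algorithm, the parameters $\sigma$ and $C$ need to be determined.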
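To make the role of the kernel concrete, the sketch below evaluates an RBF kernel and a kernelized binary decision function in plain Python. It assumes the common convention $K(x, z) = \exp(-\lVert x - z \rVert^2 / (2\sigma^2))$ for the kernel; the support vectors, multipliers `alphas`, and bias `b` are hand-picked hypothetical values, not the result of solving the optimization problem.

```python
import math

def rbf_kernel(x, z, sigma):
    """Gaussian RBF kernel; the 2 * sigma**2 scaling is an assumed convention."""
    sq_dist = sum((p - q) ** 2 for p, q in zip(x, z))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

def decision(x, support_vectors, labels, alphas, b, sigma):
    """Sign of sum_i alpha_i * y_i * K(x_i, x) + b for a binary SVM."""
    score = sum(a * y * rbf_kernel(sv, x, sigma)
                for sv, y, a in zip(support_vectors, labels, alphas))
    return 1 if score + b >= 0 else -1

# Hypothetical, hand-picked values (not the output of a real training run).
support_vectors = [(0.0, 0.0), (1.0, 1.0)]
labels = [-1, 1]
alphas = [1.0, 1.0]
b = 0.0
sigma = 0.5

near_negative = decision((0.1, 0.0), support_vectors, labels, alphas, b, sigma)
near_positive = decision((0.9, 1.0), support_vectors, labels, alphas, b, sigma)
print(near_negative, near_positive)
```

A small `sigma` makes each support vector influence only its immediate neighborhood, while a large `sigma` gives a smoother boundary; together with the penalty parameter this is what a parameter search has to tune.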

A multiclass classification problem can be converted into multiple binary classification problems [33]. In this study, the four-class problem is converted into six binary classification problems by the “one-versus-one” approach of SVM.
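The count of six follows from pairing the four classes, since $\binom{4}{2} = 6$. The sketch below enumerates the pairs and combines the binary decisions by majority vote. The stub classifiers are hypothetical stand-ins for trained binary SVMs, and majority voting is assumed here as the usual way of combining one-versus-one outputs.

```python
from collections import Counter
from itertools import combinations

classes = [1, 2, 3, 4]
pairs = list(combinations(classes, 2))  # every unordered pair of classes

def make_stub(a, b):
    """Hypothetical binary classifier: picks whichever label is closer to scalar x."""
    return lambda x: a if abs(x - a) <= abs(x - b) else b

def ovo_predict(x, binary_classifiers):
    """Each pairwise classifier casts one vote; the most voted class wins."""
    votes = Counter(clf(x) for clf in binary_classifiers.values())
    return votes.most_common(1)[0][0]

binary_classifiers = {pair: make_stub(*pair) for pair in pairs}
winner = ovo_predict(2.9, binary_classifiers)
print(len(pairs), winner)
```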

#### 3. Methodology

In this section, we introduce the methodology for predicting stock price patterns using network topology characteristic variables. Figure 1 shows a general framework of the proposed pattern prediction system. It consists of two parts: complex network analysis and pattern prediction using machine learning. We present the procedure in more detail in the following subsections.