Abstract

A stock price is a typical but complex type of time series data. Effective prediction of long-term time series data can be used to schedule an investment strategy and obtain higher profit. Because of economic, environmental, and other factors, it is very difficult to obtain a precise long-term stock price prediction. The exponentially segmented pattern (ESP) is introduced here and used to predict the fluctuation of different stock data over five future prediction intervals. A new feature of the stock price in the subinterval, named the interval slope, is proposed to characterize fluctuations in stock price over specific periods. The cumulative distribution functions (CDFs) of the MSE obtained with the minimized-MSE-based Bayes classifier (MMSE-BC) and with support vector regression (SVR) were compared for the interval slope and the interval mean. We concluded that the interval slope developed here can capture more complex dynamics of stock price trends. The mean stock price can then be predicted relatively accurately over specific time intervals, and multiple mean values over time intervals are used to express the long-term behavior of the time series. In this way, the prediction of long-term stock price can be more precise and the development of cumulative errors can be prevented.

1. Introduction

Stock price is a typical and complex type of time series data. The prediction of stock prices has been an active area of research in econometrics, signal processing, pattern recognition, and machine learning for some time. Stock traders and investors are extremely interested in stock market prediction because of the considerable profits that can be reaped by trading stocks. Traditionally, financial time series have been modeled with statistical methods such as the autoregressive moving average (ARMA) model, the autoregressive integrated moving average (ARIMA) model, and the generalized autoregressive conditional heteroskedasticity (GARCH) model, which require the linear variation in the stock prices to remain stationary. In general, these statistical models cannot adapt to changes in the process. Accordingly, traditional statistical methods cannot track the complexity of the stock markets and do not predict stock performance very well [1].

Recently, many machine learning systems have been used to predict stock prices. These include artificial neural networks (ANN) [2, 3], Bayes networks [4], genetic programming [5], support vector regression (SVR) [6], user analysis [7], sentiment analysis [8], and hybrid networks [9–15]. Machine learning methods can track the complexity and nonstationary nature of the stock markets in short-term prediction, but they predict long-term trends only with great difficulty. Existing methods of long-term stock prediction fall mainly into two groups. The first uses recursive iteration to obtain the long-term prediction trend [12]; however, the cumulative error of this method increases with the number of steps in the prediction process. The second uses a moving window algorithm that deletes older data and takes in new data, so that the prediction model is updated in sequence [6]. The length of the moving window has considerable influence on the accuracy of the modeling process, and this approach can show only the mean stock price during the prediction interval, not the details of changes in the stock trend during this interval. In addition, the present methods use the mean value directly as a feature for stock trend prediction. For time series with larger fluctuations, the interval mean weakens the fluctuation characteristics of the series and reduces the long-term accuracy of the forecast. For these reasons, the prediction of stock price is still a worthwhile issue.

Long-term time series forecasts have other applications, such as host load prediction. Load prediction is crucial to efficient resource utilization in dynamic cloud computing environments. Di et al. used the exponentially segmented pattern (ESP) [16] to predict the host load in the cloud. They proposed nine different features to characterize the recent load fluctuation in the evidence subinterval and were able to predict the mean load over consecutive time intervals.

In this paper, the exponentially segmented pattern (ESP) is used to predict the fluctuation of stock price over consecutive future time intervals. We also propose a new feature of the stock price in the subinterval, namely, the interval slope, to characterize the stock price fluctuation over a set period. The interval slope can be used to determine the mean stock price in the subinterval. Support vector regression and the Bayes classifier were used to predict the stock price trend and verify the effectiveness of the interval slope of the stock price in the subinterval.

In this paper, the following contributions are made:
(i) The exponentially segmented pattern (ESP) is here used to predict fluctuations in different stock data over a long period and can accurately predict not only the mean stock price over a future time interval but also the mean stock price over consecutive future time intervals. In this way, the prediction of long-term stock price can be more precise and the generation of cumulative errors can be prevented.
(ii) The use of a new feature of the stock price in the subinterval, namely, the interval slope, is here proposed to better characterize the stock price fluctuation over some time period.

The rest of the paper is organized as follows. In Section 2, the long-term stock price prediction model is introduced. In Section 3, experiments and comparisons of different models are made. Conclusions are given in Section 4.

2. Long-Term Stock Price Prediction Model

The objective is to predict the fluctuation of the opening price over a long period. Multiple precise mean values over time intervals are used to express the long-term trend of the time series.

The proposed stock price trend prediction model involves the following three steps. First, using the ESP principle, the estimated data segment is split into a set of consecutive segments whose lengths increase exponentially. Second, the interval slope is used to describe the features of each interval. Third, two machine learning methods, SVR and MMSE-BC, are used to produce the transformation model of the data, which is used to predict the mean stock price for the next interval. Multiple precise mean values over time intervals are used to express the long-term trends in the time series. In this way, the prediction of stock price in the long term can be performed precisely without generating cumulative errors.

2.1. Exponentially Segmented Pattern (ESP) and Transformation of Segments

The objective of the current work is to predict trends in the patterns of fluctuation of stock price over consecutive future time intervals. The most important step of the proposed stock price trend pattern prediction is that the estimated data segment is split by the ESP principle into a set of consecutive segments whose lengths increase exponentially. An example of ESP is shown in Figure 1. At the current time point $t_0$, the first segment has length $b$, and the length of each following segment is $b \cdot 2^{i-2}$, where $i = 2, 3, 4, \ldots$. For each segment over the consecutive future time intervals, the mean values are denoted by $l_i$, where $i = 1, 2, 3, \ldots$.

However, the mean stock price over the consecutive time intervals is hard to predict, whereas the mean stock price over a single future time interval starting at $t_0$ is easy to predict. The length of each such interval is $b \cdot 2^{i-1}$, where $i = 1, 2, 3, \ldots$. The mean predicted stock price of each interval is given here as $\eta_i$, where $i = 1, 2, 3, \ldots$.

A set of the mean stock prices for single time intervals is then available. The aim of the current work is to predict the mean stock price over the consecutive future time intervals ($l$). In fact, the vector $l$ can be converted from the vector $\eta$ through the following induction according to a previous work [16].

Suppose that the current moment is $t_0$, and the user has already predicted two mean stock prices (shown by $\eta_{i-1}$ and $\eta_i$, the solid red line segments) over two different intervals ($[t_0, t_{i-1}]$ and $[t_0, t_i]$), where $t_i = t_0 + b \cdot 2^{i-1}$. Then, because the two shaded areas are of equal size, the mean stock price in $[t_{i-1}, t_i]$ can be derived. The transformation is given in (1):

$$l_i = \frac{\eta_i (t_i - t_0) - \eta_{i-1} (t_{i-1} - t_0)}{t_i - t_{i-1}}, \quad i \ge 2. \tag{1}$$

Here, $l_i$ is the predicted mean stock price in the new segment, corresponding to the black line segment in Figure 1.

Taking into account $t_i - t_0 = b \cdot 2^{i-1}$ and $t_{i-1} - t_0 = b \cdot 2^{i-2}$, (1) can be simplified further, producing

$$l_i = \begin{cases} \eta_1, & i = 1, \\ 2\eta_i - \eta_{i-1}, & i \ge 2. \end{cases} \tag{2}$$
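The segment transformation can be illustrated with a minimal sketch. It assumes that each single interval starts at the current time and is exactly twice as long as the previous one, so the newly covered segment is as long as the shorter interval and its mean is twice the longer-interval mean minus the shorter-interval mean. The function name and the sample prices are hypothetical and serve only as an illustration.

```python
def single_to_consecutive(eta):
    """Convert means over nested intervals into means over consecutive segments.

    eta[i] is the mean over an interval that starts at the current time and is
    twice as long as the interval behind eta[i - 1] (assumed doubling layout).
    """
    if not eta:
        return []
    segment_means = [eta[0]]  # the first segment coincides with the first interval
    for i in range(1, len(eta)):
        # longer interval = shorter interval + a new segment of equal length
        segment_means.append(2 * eta[i] - eta[i - 1])
    return segment_means


# Hypothetical prices for 8 days; nested intervals of 2, 4, and 8 days.
prices = [10, 12, 14, 16, 20, 20, 24, 24]
eta = [sum(prices[:n]) / n for n in (2, 4, 8)]
print(single_to_consecutive(eta))  # [11.0, 15.0, 22.0]
```

The printed values match the direct means of days 1 to 2, days 3 to 4, and days 5 to 8, which is a quick way to check the transformation.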

2.2. Features of Fluctuations in Stock Price

The aim of the current work is to predict the mean stock price over a future time interval ($\eta_i$) starting at the current time $t_0$. Here, every future time interval is called a subinterval. Based on the features of the stock price trend, a new feature of the stock price in the subinterval is proposed here. This feature is called the interval slope, and it can be used to characterize the fluctuations in stock price over a specific time period. The time series in the subinterval is denoted as $x = (x_1, x_2, \ldots, x_s)$, where $s$ is the length of the subinterval and $x_j$ is the $j$th sample stock price in the subinterval. For example, when the subinterval is 4 days (i.e., $s = 4$), the features of the stock price fluctuation are as shown in Figure 2.

Interval Mean Price. $\bar{x}$ is the mean stock price when the length of the subinterval equals $s$:

$$\bar{x} = \frac{1}{s} \sum_{j=1}^{s} x_j. \tag{3}$$

Interval Last Price. The last price is the most recent price value in the subinterval.

Interval Slope ($k$). The interval slope is the slope of the linear equation fitted to the samples when the length of the subinterval equals $s$.

First, the last price and the mean price of the subinterval were computed. Then, the linear equation was used to fit the samples in the subinterval; that is,

This produces the following:

The interval slope can be transformed into the mean by (5).
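As an illustration of how a slope and a mean are related, the following sketch fits an ordinary least-squares line to the samples of one subinterval. The least-squares fit over sample indices 1 to s is an assumption made for this example and is not necessarily the exact linear equation used in this paper. Under that assumption, the fitted line passes through the point formed by the mean index and the mean price, so the mean can be recovered from the slope and intercept. All function names are hypothetical.

```python
def interval_features(prices):
    """Return (mean, last price, slope, intercept) for one subinterval.

    Assumption: the line is an ordinary least-squares fit over indices 1..s.
    """
    s = len(prices)
    if s < 2:
        raise ValueError("need at least two samples to fit a line")
    xs = range(1, s + 1)
    x_mean = (s + 1) / 2
    y_mean = sum(prices) / s
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, prices))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return y_mean, prices[-1], slope, intercept


def mean_from_slope(slope, intercept, s):
    """Recover the interval mean from a least-squares line over indices 1..s."""
    return intercept + slope * (s + 1) / 2


# Hypothetical 4-day subinterval.
mean, last, slope, intercept = interval_features([10.0, 11.0, 13.0, 14.0])
recovered = mean_from_slope(slope, intercept, 4)
```

Unlike the mean alone, the slope also records whether the price rose or fell within the subinterval.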

For example, the value of can be set as follows: . The feature of stock price fluctuation based on , history data, and predicted data are shown in Figure 2.

Next, the features of future data (the interval mean and the interval slope) were predicted by learning from the features of historical data. The prediction methods are presented in the next section.

2.3. Long-Term Forecasting Based on Interval Slope

To verify the effectiveness of the interval slope as a feature of the stock price in the subinterval, support vector regression and the Bayes classifier were used for long-term forecasting. These two machine learning methods, SVR and MMSE-BC, are used to produce the transformation model of the data, which predicts the mean stock price and the interval slope of the next interval.

(1) Support Vector Regression (SVR). The aim of the SVR algorithm is to minimize the ε-insensitive error on a subset of the data, called the support vectors. SVR uses nonlinear kernel functions to project the initial data into a higher dimensional space, where a linear model is fitted; this linear model corresponds to a nonlinear function in the original space. The formulation of SVR is represented as follows:

$$\min_{w,\,b}\ \frac{1}{2}\|w\|^{2} + C \sum_{i=1}^{n} L_{\varepsilon}\bigl(y_i, f(x_i)\bigr). \tag{6}$$

Here, $w$ is a weight vector which is used to determine the maximum margin hyperplane, and the term $\frac{1}{2}\|w\|^{2}$ is called the regularized term; minimizing it keeps the function as flat as possible. The second term is the empirical error as measured by Vapnik’s ε-insensitive loss function, $L_{\varepsilon}(y, f(x)) = \max(0, |y - f(x)| - \varepsilon)$. $C$ is the regularization constant.
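The two terms of the SVR objective can be made concrete with a small sketch. It assumes a linear model f(x) = w·x + b and the standard form of the ε-insensitive loss; the function names and the sample values are hypothetical, and the code only evaluates the objective rather than solving the optimization problem.

```python
def epsilon_insensitive_loss(y_true, y_pred, epsilon):
    """Errors smaller than epsilon cost nothing; larger errors cost the excess."""
    return max(0.0, abs(y_true - y_pred) - epsilon)


def svr_objective(w, b, xs, ys, C, epsilon):
    """Regularized term plus C times the empirical epsilon-insensitive error."""
    regularizer = 0.5 * sum(wi * wi for wi in w)
    empirical = 0.0
    for x, y in zip(xs, ys):
        prediction = sum(wi * xi for wi, xi in zip(w, x)) + b
        empirical += epsilon_insensitive_loss(y, prediction, epsilon)
    return regularizer + C * empirical


# Hypothetical one-dimensional data: predictions are 2.0 and 4.0.
value = svr_objective([2.0], 0.0, [[1.0], [2.0]], [2.0, 5.0], C=10.0, epsilon=0.5)
print(value)  # 7.0
```

A larger C penalizes errors outside the ε tube more heavily, whereas a smaller C favors a flatter function.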

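The following commonly used kernel functions are included:
Linear: $K(x_i, x_j) = x_i^{T} x_j$.
Polynomial: $K(x_i, x_j) = (\gamma x_i^{T} x_j + r)^{d}$, $\gamma > 0$.
Sigmoid: $K(x_i, x_j) = \tanh(\gamma x_i^{T} x_j + r)$.
Radial basis function: $K(x_i, x_j) = \exp(-\gamma \|x_i - x_j\|^{2})$, $\gamma > 0$.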
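For illustration, the four kernels can be written as short functions. The parameter names `gamma`, `coef0`, and `degree` and their default values are assumptions that follow common SVM library conventions; the values used in the experiments are those listed in Table 1.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def linear_kernel(u, v):
    return dot(u, v)


def polynomial_kernel(u, v, gamma=1.0, coef0=0.0, degree=3):
    return (gamma * dot(u, v) + coef0) ** degree


def sigmoid_kernel(u, v, gamma=1.0, coef0=0.0):
    return math.tanh(gamma * dot(u, v) + coef0)


def rbf_kernel(u, v, gamma=1.0):
    squared_distance = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-gamma * squared_distance)


u, v = [1.0, 2.0], [3.0, 4.0]
print(linear_kernel(u, v))                                  # 11.0
print(polynomial_kernel(u, v, gamma=1.0, coef0=1.0, degree=2))  # 144.0
```

The radial basis function kernel depends only on the distance between the two inputs, so it equals 1 when the inputs coincide and decays toward 0 as they move apart.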

(2) Bayes Classifier. The MMSE-BC has been considered the best strategy that uses Bayes method with the single feature mean load based on the evaluation type A in a previous work [16]. The MMSE-BC used here was the minimized MSE (MMSE) based Bayes classifier. It is a classic supervised learning classifier used in data mining. The formulation of MMSE-BC is represented in

It is important for the Bayes classifier to compute the prior probability distribution for the target states based on the samples and compute the joint probability distribution for each state . Then, the posterior probability was computed according to Formula (7).
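The computation described above can be sketched for discrete features and states. This is a minimal illustration under stated assumptions, not the implementation used in the experiments: probabilities are estimated by simple counting, the prediction is the posterior-weighted average of a representative value per state (the minimum-MSE estimate), and all names and sample data are hypothetical.

```python
from collections import Counter, defaultdict


def fit_bayes(features, states):
    """Estimate P(state) and P(feature | state) by counting training samples."""
    n = len(states)
    state_counts = Counter(states)
    prior = {w: c / n for w, c in state_counts.items()}
    likelihood = defaultdict(dict)
    for (x, w), c in Counter(zip(features, states)).items():
        likelihood[w][x] = c / state_counts[w]
    return prior, likelihood


def mmse_predict(x, prior, likelihood, state_values):
    """Posterior-weighted average of the state values given feature x."""
    scores = {w: likelihood[w].get(x, 0.0) * p for w, p in prior.items()}
    total = sum(scores.values())
    if total == 0.0:  # feature never seen in training: fall back to the prior
        scores, total = dict(prior), sum(prior.values())
    return sum(state_values[w] * s / total for w, s in scores.items())


# Hypothetical training data: a discretized feature and a target state per sample.
features = ["up", "up", "down", "down", "up"]
states = [1, 1, 0, 0, 0]
prior, likelihood = fit_bayes(features, states)
estimate = mmse_predict("up", prior, likelihood, {0: 100.0, 1: 110.0})
```

With these sample data, the posterior of state 1 given "up" is 2/3, so the estimate lies two thirds of the way from 100 to 110.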

2.4. Trend Prediction Model

The following trend prediction model is proposed here as a way of preventing the generation of cumulative errors. The proposed stock price trend prediction model has the following three steps. First, using the ESP principle, the estimated data segment is split into a set of consecutive segments whose lengths increase exponentially. Second, the interval slope is used to describe the features of each interval. Third, the machine learning methods, SVR and MMSE-BC, are used to produce the transformation model of the data, by which the mean stock price of the next interval is predicted.

First, the stock opening price data were selected. Second, the time series (stock opening price data) was split into a set of future time interval segments whose lengths increase exponentially; with the settings of Algorithm 1, the subintervals are 2, 4, 8, 16, and 32 days long. Third, the mean and interval slope were computed for every subinterval, and the feature dataset was split into a training dataset and a prediction dataset. Next, MMSE-BC and SVR were trained in order to produce the model parameters. For example, MMSE-BC computes the prior probability and the conditional probability (see (8)), and SVR produces the boundary with the largest margin; both then predict the mean stock price and the interval slope over a single prediction interval. The interval slope must be transformed into the mean of the interval based on (5) because the mean values over consecutive future time intervals are used to express the long-term trends in the time series. Then, the mean values over consecutive future time intervals can be converted from the single-interval means based on (2). Finally, the mean squared error of this dataset can be calculated.

In order to evaluate the performance of MMSE-BC and SVR, the prediction mean squared error over the entire dataset was computed. For example, the prices over the first 1000 trading days were selected for training, and the prices over the next 32 days were selected for prediction, following the procedure of the trend prediction model described above. Then, the prices over the first 1040 trading days were learned and the prices over the next 32 days were predicted, and so on. The mean squared error was computed for each of these prediction processes. The process with the higher prediction performance continues to predict the future stock price. The method of setting the time window is shown in Figure 3.
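The time window setting can be sketched as a generator of index pairs. The defaults of 1000 training days, a 32-day prediction horizon, and a step of 40 days come from the description above; treating the training window as expanding from the first day, as well as the function name, are assumptions of this example.

```python
def rolling_windows(n_samples, first_train=1000, horizon=32, step=40):
    """Yield (train_end, test_end) pairs: train on [0, train_end), test on the rest."""
    train_end = first_train
    while train_end + horizon <= n_samples:
        yield train_end, train_end + horizon
        train_end += step


# Hypothetical series of 1100 trading days.
print(list(rolling_windows(1100)))  # [(1000, 1032), (1040, 1072)]
```

Each pair gives one prediction process; a third window would need data up to day 1112 and is therefore not generated for a 1100-day series.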

Algorithm 1 gives the pseudocode of the stock price trend prediction model.

Input: stock dataset, interval, duration of prediction
Output: CDF of the prediction MSE on different datasets and different methods
(1) Split dataset into training dataset and testing dataset
(2) for (newdataset = dataset[1 : n]) /* n is the number of data points, increasing by 40 */ do
(3)   for (interval = 2, 4, 8, 16, 32) do
(4)     Determine the mean and interval slope features in every interval
(5)     Predict the mean price η using the SVR or MMSE-BC method on the training dataset
        /* Use the mean and interval slope as features of the stock price trend */
(6)   end for
(7)   Segment transformation based on (2): η → l
(8)   Calculate the MSE of this dataset
(9) end for
(10) Collect the 80 MSE values of the different datasets and plot the cumulative distribution function (CDF) of the MSE
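The control flow of Algorithm 1 can be sketched in a few lines. To keep the example self-contained, the learning step is replaced by a naive stand-in predictor (the mean of the most recent prices), which is explicitly not SVR or MMSE-BC; the function names, the length-weighted error, and the test series are likewise assumptions of this sketch.

```python
def naive_mean_forecast(train, length):
    """Stand-in predictor (NOT SVR or MMSE-BC): mean of the last `length` prices."""
    return sum(train[-length:]) / length


def evaluate(prices, first_train=1000, step=40, intervals=(2, 4, 8, 16, 32),
             predict=naive_mean_forecast):
    """Return one MSE per training window, following the loop of Algorithm 1."""
    horizon = intervals[-1]
    bounds = (0,) + tuple(intervals)
    lengths = [bounds[i + 1] - bounds[i] for i in range(len(intervals))]
    mses = []
    train_end = first_train
    while train_end + horizon <= len(prices):
        train = prices[:train_end]
        future = prices[train_end:train_end + horizon]
        # Steps (3)-(6): predict the mean over each nested interval.
        eta = [predict(train, n) for n in intervals]
        # Step (7): nested-interval means -> consecutive-segment means.
        pred = [eta[0]] + [2 * eta[i] - eta[i - 1] for i in range(1, len(eta))]
        true = [sum(future[bounds[i]:bounds[i + 1]]) / lengths[i]
                for i in range(len(intervals))]
        # Step (8): squared errors weighted by segment length.
        mses.append(sum(s * (p - t) ** 2
                        for s, p, t in zip(lengths, pred, true)) / horizon)
        train_end += step
    return mses


# A constant hypothetical series is predicted without error in every window.
print(evaluate([5.0] * 1100))  # [0.0, 0.0]
```

Passing a different `predict` callable with the same two arguments is enough to plug in a trained model in place of the stand-in.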

3. Experiments and Comparison

This section presents experiments with the trend prediction model on forecasting the stock opening price. The trend prediction model is shown to be able to capture the dynamics of highly nonlinear, nonstationary time series.

3.1. Evaluation Indicator

To evaluate the accuracy of these predictions, the overall mean squared error (MSE) between the predicted stock price values and the true values in the prediction interval can be calculated as follows:

$$\mathrm{MSE}(s) = \frac{1}{s} \sum_{i=1}^{n} s_i (l_i - L_i)^{2}.$$

Here, $s_i$ is the length of the $i$th segment and $s = \sum_{i=1}^{n} s_i$, $l_i$ is the predicted mean of the testing dataset in that segment, $L_i$ is the true mean of the testing dataset, and $n$ is the total number of segments in the prediction interval.
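A short sketch of this indicator follows. It assumes that the squared error of each segment is weighted by the segment length, as in the ESP formulation of [16]; the function name and the numbers are hypothetical.

```python
def segment_weighted_mse(predicted, actual, lengths):
    """MSE over a prediction interval, weighting each segment by its length."""
    if not (len(predicted) == len(actual) == len(lengths)):
        raise ValueError("inputs must have one entry per segment")
    total_length = sum(lengths)
    return sum(s * (p - a) ** 2
               for s, p, a in zip(lengths, predicted, actual)) / total_length


# Hypothetical segment means for segments of 2, 2, and 4 days.
print(segment_weighted_mse([11.0, 15.0, 22.0], [10.0, 15.0, 20.0], [2, 2, 4]))  # 2.25
```

Weighting by length keeps a long segment from counting the same as a short one, so the result is comparable to a per-day error.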

3.2. Method of Training and Evaluation

Eight opening stock price datasets were selected at random for these experiments: IBM, Coca Cola, Microsoft, Amazon, Sony, Kimberly-Clark, Bank of America, and Walgreens, covering the period from January 1, 1999, to October 30, 2014.

SVR and MMSE-BC were here used to predict the trends in opening stock price, and some key parameters are listed in Table 1.

3.3. Experimental Results

The results of MMSE-BC and SVR were compared using the classic interval mean and the interval slope as features. The eight stock opening price datasets (IBM, Coca Cola, Microsoft, Amazon, Sony, Kimberly-Clark, Bank of America, and Walgreens, from January 1, 1999, to October 30, 2014) were compared in terms of MSE. Figures 4 and 5 show the cumulative distribution function (CDF) of the MSE of the different prediction methods, in which SVR’s kernel is sigmoid.

As shown in Figures 4 and 5, the interval slope curve is above the interval mean curve. This indicates that the interval slope’s cumulative probability is greater than that of the interval mean when the value of MSE is below a certain threshold. For example, for IBM, the cumulative probability of the interval slope was larger than that of the interval mean when the MSE value was less than 100. Specifically, 88% of the MSE values using the interval slope were below 100, and only 52% of the MSE values using the interval mean were below 100.
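The cumulative probability read from such a curve is simply the fraction of MSE values that do not exceed a threshold. The following sketch shows the computation; the function name and the MSE values are hypothetical and are not the experimental results.

```python
def empirical_cdf(values, threshold):
    """Fraction of values that are less than or equal to the threshold."""
    if not values:
        raise ValueError("at least one value is required")
    return sum(1 for v in values if v <= threshold) / len(values)


# Hypothetical MSE values from four prediction windows.
print(empirical_cdf([50.0, 80.0, 120.0, 200.0], 100.0))  # 0.5
```

A curve that lies higher at a given threshold therefore means that a larger share of the prediction windows achieved an MSE at or below that threshold.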

It is clear that interval slope’s performance was better than that of the mean as indicated by the MMSE-BC and SVR methods. In this way, the interval slope can indicate more complex dynamics, such as change trends. In contrast, the mean can smooth out the dynamic fluctuations in stock price.

As an example of the prediction results, Figure 6 shows the IBM stock price trend prediction for trading days 3440–3972 by SVR based on the interval slope, in which SVR’s kernel is a radial basis function.

Both the mean stock price over a future time interval and the mean stock price over consecutive future time intervals can be predicted accurately. This shows that the prediction of long-term stock price can be performed precisely without generating cumulative errors. The mean stock price over consecutive future time intervals can express future trends, such as sharp falls, slight falls, oscillations, slight increases, sharp increases, falls followed by increases, and increases followed by falls. Based on the predicted fluctuation of the opening price over a long-term period, fund allocation models and trading strategies can be developed in advance.

4. Conclusion and Future Work

In this paper, ESP, which does not generate cumulative errors, was introduced and used to predict fluctuations in the opening prices of different stocks over a long period. The use of a new feature of stock price in the evidence subinterval, interval slope, was proposed to characterize the stock price fluctuation over some time period. It can be concluded that the interval slope can capture complex dynamics, such as trends in the changes in stock price.

The premise of this method of trend prediction is that future markets will change gradually rather than abruptly. The complexities of changes in stock price can greatly increase the difficulty of prediction. Future work should evaluate different learning methods and even combine different learning methods. Some new methods of evaluation should be used to evaluate the interval slope, the classic mean method, and the rate of return.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

This work is partially supported by NSFC under Grant no. 61273002, the Importation and Development of High-Caliber Talents Project of Beijing Municipal Institutions no. CIT&TCD201304025, and the Key Science and Technology Project of Beijing Municipal Education Commission of China no. KZ201510011012.