Yang Yujun, Li Jianping, Yang Yimei, "An Efficient Stock Recommendation Model Based on Big Order Net Inflow", Mathematical Problems in Engineering, vol. 2016, Article ID 5725143, 15 pages, 2016. https://doi.org/10.1155/2016/5725143
An Efficient Stock Recommendation Model Based on Big Order Net Inflow
In general, the stock trend is mainly driven by big order transactions. Believing that a stock rise on large volume is closely associated with the big order net inflow, we propose an efficient stock recommendation model based on big order net inflow. To compute the big order net inflow of a stock, we use the M/G/1 queue system to measure all tick-by-tick transaction data. Using the big order net inflow as an indicator, we select the stocks with the highest net inflow to constitute the prerecommended stock set for the target investor. To recommend stocks whose style is familiar to the target user, we divide the investors into several categories using a fuzzy clustering method and choose stocks, as far as possible, from the set once traded by investors in the same category as the target user. The experimental results show that the recommended stocks have better gains during the several days after the recommendation day, and that the proposed model can provide reliable investment guidance for target investors and help them obtain higher returns.
In research on stock recommendation methods, most work focuses on two areas: stock recommendation based on stock comments and stock recommendation based on price forecasting. The former is easy for investors to understand and master. However, in such a complicated stock market, investors do not know which of the many stock comments of dubious authenticity to believe, and every choice carries great risk. The latter is difficult for investors to understand and master, since its application is relatively complex and involves a lot of profound mathematical knowledge. Given this situation, many scholars have done a lot of research on stock recommendation.
Currently, stock recommendation based on price forecasting relies mainly on mathematical and statistical methods, time series models [9, 10], and machine learning models. Sonsino and Shavit have researched a stock prediction and selection method based on unidentified historical data. M.-Y. Chen and B.-T. Chen have proposed a stock price forecasting method based on hybrid fuzzy time series and granular computing. Xin et al. have given a strategy for filtering out users with similar demand characteristics by using a collaborative recommendation algorithm with a fuzzy clustering method, which shows an excellent recommendation effect.
In the present financial field, how to integrate multiple technologies, such as data mining, machine learning, herd psychology, and other nontraditional technologies, into stock recommendation has become a hot topic. Few papers use money net inflow as a stock recommendation technique. Given this situation, we propose an efficient stock recommendation model based on big order net inflow. First, we divide the users into several categories using a collaborative filtering algorithm based on user fuzzy clustering. We take stocks from those once traded by users in the same category to form a prerecommended stock set. Then, we use a method based on M/G/1 to compute the big order net inflow of every stock in the prerecommended stock set. Finally, we sort the prerecommended stock set in descending order of big order net inflow and choose the stocks with the highest values as the final recommendation stock set for the target user.
In general, we believe that a stock rise on large trading volume is closely related to big order purchases. To analyze the big order net flow of a stock, we need to observe its trading volume and turnover. The money net flow and the money flow are different concepts. A big order is a single transaction whose amount exceeds one million yuan or whose volume exceeds fifty thousand. The big order net inflow thus refers to the net amount of money of big buy and sell orders for the same stock within a day. Most of the time, the money flow is greater than zero while the money net flow is less than zero. In individual cases, the money net flow is greater than zero while the big order net flow is less than zero. Against this background, we propose an efficient stock recommendation model based on big order net inflow. The new model can measure the capital and the pulse of the stock market and takes investors' preferences and behavior characteristics into account, which remedies deficiencies of some current stock recommendation methods. In addition, it can identify and filter out stocks with lower future returns and so improve investors' gains. The experimental results show that the recommended stocks have better gains during the several days after the recommendation day and that the proposed model can provide reliable investment guidance for target investors and help them obtain higher investment returns.
The rest of this paper is organized as follows. Section 2 briefly reviews the definitions and theorems of fuzzy clustering and the framework of the collaborative filtering algorithm based on user fuzzy clustering. Section 3 demonstrates the method for computing big order net inflow and the framework of the proposed model. Section 4 presents the simulation experiment and empirical analyses of the proposed model; finally some conclusions are given and some future works are pointed out in Section 5.
2. Theoretical and Modeling Framework
The concept of the fuzzy set used in fuzzy clustering was put forward by Zadeh in 1965. Gath and Bar-On were early to apply the theory, using it to score polygraphic sleep recordings. In this section, we briefly review the definitions and theory of fuzzy clustering.
2.1. Fuzzy Clustering Theory
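Definition 1. Let $R = (r_{ij})_{m \times n}$ be a matrix with $m$ rows and $n$ columns; if $0 \le r_{ij} \le 1$, then $R$ is called a fuzzy matrix. When $r_{ij}$ is only 0 or 1, $R$ is called a Boolean matrix. When the diagonal elements of a fuzzy matrix $R$ are all 1, $R$ is called a reflexive fuzzy matrix.
Definition 2. If $R$ is an $n$-order square matrix, define $R^2 = R \circ R$, $R^3 = R^2 \circ R$, and $R^k = R^{k-1} \circ R$.
Definition 3. Let $R = (r_{ij})_{m \times n}$; then $R^T = (r^T_{ij})_{n \times m}$ is called the transposed matrix of $R$, where $r^T_{ij} = r_{ji}$.
Definition 4. Let $R = (r_{ij})_{m \times n}$; for any $\lambda \in [0, 1]$, $R_\lambda = (r_{ij}(\lambda))_{m \times n}$ is called the $\lambda$-cut matrix of the fuzzy matrix $R$, where $r_{ij}(\lambda) = 1$ when $r_{ij} \ge \lambda$ and $r_{ij}(\lambda) = 0$ when $r_{ij} < \lambda$. $R_\lambda$ is a Boolean matrix, and $\lambda$ is called the confidence level parameter or cut level parameter.
Definition 5. Suppose two finite universes of discourse $X = \{x_1, x_2, \ldots, x_m\}$ and $Y = \{y_1, y_2, \ldots, y_n\}$; then a fuzzy relation from $X$ to $Y$ is an $m \times n$ fuzzy matrix $R = (r_{ij})_{m \times n}$, where $r_{ij}$ represents the relevance of $(x_i, y_j)$ under the fuzzy relation $R$.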
Theorem 6. If $R$ is a fuzzy similar matrix, then, for any natural number $k$, $R^k$ is also a fuzzy similar matrix.
Theorem 7. If $R$ is an $n$-order fuzzy similar matrix, then there is a minimum natural number $k$ ($k \le n$) such that, for all natural numbers $l \ge k$, $R^l = R^k$; namely, $R^k$ is a fuzzy equivalent matrix. In this case, $R^k$ is called the transitive closure of $R$, denoted by $t(R)$.
Method 8. If $R$ is a fuzzy similar matrix, then its transitive closure $t(R)$ can be obtained by the square self-synthesis method ($R \to R^2 \to R^4 \to \cdots \to R^{2^k} = t(R)$).
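The square self-synthesis method can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it assumes the composition of fuzzy matrices is the usual max-min composition, that the input is a reflexive, symmetric fuzzy matrix given as a list of rows, and the function names `compose` and `transitive_closure` are hypothetical.

```python
def compose(a, b):
    """Max-min composition of two fuzzy matrices given as lists of rows."""
    inner, cols = len(b), len(b[0])
    return [
        [max(min(row[k], b[k][j]) for k in range(inner)) for j in range(cols)]
        for row in a
    ]


def transitive_closure(r):
    """Square R repeatedly (R -> R^2 -> R^4 -> ...) until it stops changing.

    Assumes r is a fuzzy similar (reflexive, symmetric) matrix, for which
    the repeated squaring is guaranteed to reach a fixed point.
    """
    current = r
    while True:
        squared = compose(current, current)
        if squared == current:
            return current
        current = squared


# Illustrative 3x3 fuzzy similar matrix (made-up values).
R = [
    [1.0, 0.8, 0.2],
    [0.8, 1.0, 0.5],
    [0.2, 0.5, 1.0],
]
closure = transitive_closure(R)
```

In this toy example one squaring is enough: the weak link between the first and third objects is raised from 0.2 to 0.5 because both are connected through the second object.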
2.2. Fuzzy Clustering Analysis
Fuzzy clustering analysis is a clustering and classification method that establishes a fuzzy similarity relationship between objects based on their characteristics, their degree of closeness, and their similarity.
As shown in Figure 1, the fuzzy clustering process can be divided into four stages: data preprocessing, data standardization, construction of the fuzzy similar matrix (FSM), and clustering and analysis.
2.2.1. Data Preprocessing
In China, individual investors can buy or sell stocks only through a securities company, not directly on the stock exchange. To trade stocks, an individual investor first has to register as a member of a securities company. According to the National Security Act, investors must take a risk tolerance test when registering. The securities company gives them a risk test paper with fifteen or more questions, each of which has four or five options. For storage and processing, the answer chosen by the investor is recorded as an integer value in the user risk database. In this paper, we choose twelve of the answers to study. The user risk database of any securities company holds at least 1 million records of investor risk information, because there are more than 200 million stock investors in China. We therefore assume that an investor risk tolerance surveying database exists, which includes the investors' risk tolerance surveying data. The characteristics of these data, which comprise the following twelve contents, reflect the investment style and risk tolerance of the investors. We define the set of investor risk tolerance surveying data as $X = \{x_1, x_2, \ldots, x_n\}$, where $x_i$ denotes the surveying data of the $i$th investor in the set $X$ and is constituted by the contents vector in a fixed order $x_i = (x_{i1}, x_{i2}, \ldots, x_{i12})$, where $i$ belongs to $[1, n]$. Table 1 shows the detailed contents of the $i$th investor in the investor risk tolerance surveying database.
2.2.2. Data Standardization
We first scan every column of the raw matrix $X$ to obtain the minimum and the maximum of each column, and then compress each entry of $X$ to $[0, 1]$ using the following transformation formula. After this standardizing process, we obtain the standardized matrix $X'$ with standardized data:
$$x'_{ij} = \frac{x_{ij} - \min_{1 \le i \le n} x_{ij}}{\max_{1 \le i \le n} x_{ij} - \min_{1 \le i \le n} x_{ij}}.$$
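The column-wise compression to $[0, 1]$ described here is ordinary min-max scaling. A minimal sketch follows, assuming the raw data are a list of rows of integer answers; the function name `min_max_standardize` and the convention of mapping a constant column to 0 are assumptions made for illustration.

```python
def min_max_standardize(rows):
    """Scale every column of a row-major table to the interval [0, 1]."""
    columns = list(zip(*rows))
    lows = [min(column) for column in columns]
    highs = [max(column) for column in columns]
    standardized = []
    for row in rows:
        scaled = []
        for value, low, high in zip(row, lows, highs):
            # A constant column carries no information; map it to 0.0
            # (an assumption, the text does not cover this case).
            scaled.append(0.0 if high == low else (value - low) / (high - low))
        standardized.append(scaled)
    return standardized


# Three hypothetical investors, three survey answers each.
answers = [[1, 4, 2], [3, 2, 2], [5, 3, 2]]
standardized = min_max_standardize(answers)
```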
2.2.3. Constructing Fuzzy Similar Matrix
After obtaining the standardized matrix with standardized data, we use the direct Euclidean distance method as the similarity coefficient method to determine the similarity coefficient among investors and construct the fuzzy similar matrix. Consider
$$r_{ij} = 1 - c \, d(x_i, x_j),$$
where $c$ is a suitably chosen parameter such that $0 \le r_{ij} \le 1$ and $d(x_i, x_j)$ represents the distance between $x_i$ and $x_j$ as follows:
$$d(x_i, x_j) = \sqrt{\sum_{k=1}^{m} (x_{ik} - x_{jk})^2},$$
where belongs to and belongs to . We can choose any value for . If the value is too high, parameter will be less, which will lead to increasing the accuracy of computing. Therefore, we chose an appropriate value for . In fact, we can choose zero value for which does not affect the experiment results; is equal to 1.
Thus, from the similarity coefficients $r_{ij}$ between investors, we obtain the fuzzy similar matrix $R = (r_{ij})_{n \times n}$.
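As an illustration of the direct Euclidean distance method, the sketch below builds a similarity matrix of the form "one minus a scaled distance". It is a sketch under assumptions rather than the paper's code: the scaling constant `c` defaults to the reciprocal of the largest pairwise distance so that every coefficient stays in $[0, 1]$, and the function name `fuzzy_similarity_matrix` is hypothetical.

```python
import math


def fuzzy_similarity_matrix(rows, c=None):
    """Return the matrix of 1 - c * (Euclidean distance) for all row pairs."""
    n = len(rows)
    dist = [[math.dist(rows[i], rows[j]) for j in range(n)] for i in range(n)]
    if c is None:
        # Assumed default: scale by the largest distance so values lie in [0, 1].
        largest = max(max(row) for row in dist)
        c = 1.0 / largest if largest > 0 else 0.0
    return [[1.0 - c * dist[i][j] for j in range(n)] for i in range(n)]


# Three hypothetical standardized investor vectors.
vectors = [[0.0, 0.0], [1.0, 0.0], [1.0, 1.0]]
similarity = fuzzy_similarity_matrix(vectors)
```

The result is reflexive (ones on the diagonal) and symmetric, which is what the subsequent transitive closure step requires.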
2.2.4. Clustering and Analysis
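Since the fuzzy similar matrix $R$ is an $n$-dimensional square matrix, we can compute its transitive closure $t(R)$ using the square self-synthesis method:
$$R \to R^2 \to R^4 \to \cdots \to R^{2^k} = t(R),$$
where $R^2 = R \circ R$ and $(R \circ R)_{ij} = \bigvee_{k=1}^{n} (r_{ik} \wedge r_{kj})$.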
According to the actual situation, we choose an appropriate value $\lambda$ between 0 and 1 for the $\lambda$-cut matrix of the transitive closure $t(R)$ of the fuzzy matrix $R$. Classifying the investors on the basis of this $\lambda$-cut matrix, we obtain the equivalence classification under the given value of $\lambda$.
Thus, we obtain the clustering categories of the investors. When a new investor becomes a target investor of the stock recommendation system, he is added to the investor risk tolerance surveying database. To determine which category the new investor belongs to, we use the fuzzy clustering method to subdivide the database into several groups based on the above $\lambda$-cut value.
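The classification step can be illustrated with a short sketch. It assumes the input is already a fuzzy equivalent matrix (for example, a transitive closure), so that the rows of its cut matrix coincide within each class; the function name `lambda_cut_clusters` and the variable `lam` for the cut level are hypothetical.

```python
def lambda_cut_clusters(equivalence, lam):
    """Group indices whose pairwise equivalence degree is at least lam.

    Assumes `equivalence` is a fuzzy equivalent matrix, so reading one row
    of the cut matrix yields a complete class.
    """
    n = len(equivalence)
    clusters = []
    assigned = set()
    for i in range(n):
        if i in assigned:
            continue
        members = [j for j in range(n) if equivalence[i][j] >= lam]
        clusters.append(members)
        assigned.update(members)
    return clusters


# Illustrative fuzzy equivalent matrix (made-up values).
T = [
    [1.0, 0.8, 0.5],
    [0.8, 1.0, 0.5],
    [0.5, 0.5, 1.0],
]
coarse = lambda_cut_clusters(T, 0.5)
medium = lambda_cut_clusters(T, 0.8)
fine = lambda_cut_clusters(T, 1.0)
```

Raising the cut level splits the data into more, smaller classes: one class at 0.5, two at 0.8, and three singletons at 1.0.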
2.3. Collaborative Filtering Algorithm
2.3.1. Nearest Neighbors Choice
Currently, the collaborative filtering algorithm is the most successful personalized recommendation algorithm. It can be classified into two categories: the item-based collaborative filtering algorithm and the user-based collaborative filtering algorithm. The former was first put forward by Sarwar et al.: it first calculates the similarity between items to select the most similar items and then predicts the rating. The latter was first proposed by Goldberg et al. and predicts the rating of the target item from the ratings given by the target user's nearest neighbors. This algorithm can be computed offline, which shortens the online calculation time and increases the speed of online recommendation.
To improve the accuracy of the choice of nearest neighbor investors, we use the user-based collaborative filtering algorithm to compute the degree of behavioral similarity between the target investor and the other investors in the same fuzzy cluster. Then, according to the stocks of the nearest investors and of the target investor, we generate an optimized stock list from the stock set calculated by the fuzzy clustering algorithm and recommend the top-$N$ stocks to the target investor.
First, we need to calculate the degree of similarity between investors based on their risk tolerance surveying data. There are several methods for calculating the similarity between investors, such as cosine-based similarity, adjusted-cosine similarity, and Pearson correlation-based similarity. According to the degree of similarity between the target investor and the other investors, we can generate the neighbor set for the target investor. We then sort the investors by similarity in descending order and choose the top-$K$ investors as the neighbor investors of the target investor.
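The neighbor selection can be sketched with plain cosine similarity, one of the measures named above. This is an illustrative sketch only: the investor identifiers, the vectors, and the function names `cosine_similarity` and `nearest_neighbors` are hypothetical, and the paper's own modified similarity may differ.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def nearest_neighbors(target, others, k):
    """Return the ids of the k investors most similar to the target."""
    scored = [(cosine_similarity(target, vec), name) for name, vec in others.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [name for _, name in scored[:k]]


# Hypothetical survey vectors for three investors in the same cluster.
others = {"a": [2.0, 0.0], "b": [0.0, 3.0], "c": [1.0, 1.0]}
neighbors = nearest_neighbors([1.0, 0.0], others, k=2)
```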
2.3.2. Constructing Stock Cluster Set
To recommend more accurate stocks to the target investor, we subdivide the stocks into several categories using the fuzzy clustering method of Section 2.2, and so construct the stock cluster set. Each stock has dozens of attributes. We choose the six attributes that we consider most important for stock clustering to construct a stock attributes database: the daily average gains, the daily average amplitude, the days of price rise, the net profit in the last year, the daily net amount of big order, and the days with a net amount of big buying order. Using these six attributes and a data format like that of Table 1, we construct the stock attributes database. We then cluster the stocks in the database and subdivide them into several categories, which lets us effectively distinguish poor stocks from good stocks. Such clustering results reflect the operational characteristics of the stocks and provide more accurate and effective recommendation information for the target investor. In general, the stocks in the same cluster have similar trading dynamics. If we have a recommended stock for a target investor, we try to find other stocks in its stock cluster set. This improves the accuracy of the recommended stocks and makes the search for further recommended stocks easier. To reduce the computational complexity of clustering the stocks, we keep the number of clusters below 10. The number of clusters varies with the recommended stocks; in our experiments it generally varied between five and seven.
2.3.3. Generating Recommended Stocks List
At first, we get the stocks list of the target investor, and then we get the score of each stock in stocks list of target investor and the score of same stock like in stocks list of neighbor investors. If some stocks are not in the stocks list of neighbor investors, we can forecast their scores using the following formula:where represents the target investor ’s average scores for all stocks, represents the th neighbor investor scores of the target investor , represents the average scores of neighbors investor for all stocks, and represents the th stocks forecast score of target investor . is a modified similarity formula based on the cosine Similarity; it is defined as follows:where is the th stock average scores for all investors, is the th stock average scores for all investors, represents the th neighbor investor scores for the th stock, and represents the th neighbor investor scores for the th stock.
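For orientation, the sketch below shows the widely used mean-centered, similarity-weighted prediction of user-based collaborative filtering. It is offered as an assumption about the general shape of such a forecast, not as a transcription of the paper's formula; the function name `predict_score` and the tuple layout are hypothetical.

```python
def predict_score(target_mean, neighbors):
    """Forecast the target investor's score for one stock.

    `neighbors` is a list of (similarity, neighbor_score, neighbor_mean)
    tuples, one per neighbor investor who has scored the stock.
    """
    weight = sum(abs(sim) for sim, _, _ in neighbors)
    if weight == 0:
        # No usable neighbor: fall back to the target's own average.
        return target_mean
    offset = sum(sim * (score - mean) for sim, score, mean in neighbors)
    return target_mean + offset / weight


# Two hypothetical neighbors who both scored the stock one point above
# their personal averages.
forecast = predict_score(3.0, [(0.5, 4.0, 3.0), (0.5, 5.0, 4.0)])
```

Because both neighbors rate the stock one point above their own mean, the forecast lands one point above the target investor's mean.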
Now we give a simple example for the process of generating recommendation stock list. Assuming that there are two stocks in the stocks list of the target investor, such as stock and stock , here we write them as a set . Then we get the stocks set of four or more neighbors, such as set , , , and , and we can select the rated top- stocks as the target investor’s extend stocks list from the above five stock set. When we set as 2, we can get the top-2 stocks set .
Secondly, in order to get the more large stock extend set of the target investor, such as stock set , we need to revise the value. If we recommend to the target investor only five stocks, we will stop revising the value when the number of stock in the above stock set is greater than or equal to five.
Finally, we calculate net inflow amount of big order for each stock of the above stock extend set recently, then we sort the five stocks in descending order according to the net inflow amount of big order and select top- stocks recommend to the target investor. If we set as three, we get the recommend stock list . Because the process of judging money inflow is more complex, we propose a new method based on M/G/1 to compute the net inflow amount of big order and we can forecast the direction of stock price movements in the future.
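The final ranking step amounts to sorting the extended stock set by big order net inflow and keeping the first few entries. A minimal sketch, with made-up stock labels and inflow values and a hypothetical function name `rank_by_net_inflow`:

```python
def rank_by_net_inflow(net_inflow_by_stock, top_n):
    """Return the top_n stock ids in descending order of net inflow."""
    ranked = sorted(net_inflow_by_stock, key=net_inflow_by_stock.get, reverse=True)
    return ranked[:top_n]


# Hypothetical big order net inflow amounts for four candidate stocks.
inflows = {"A": 5.0, "B": -2.0, "C": 9.0, "D": 1.5}
recommended = rank_by_net_inflow(inflows, top_n=3)
```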
2.4. M/G/1 Queue System
The M/G/1 queue system  has been extensively studied for the last three decades . According to the circumstances of the securities transaction, here we review the single server queue system which behaves like the usual M/G/1 queue when the server is working. We assume that the server goes through cycles of idle and busy periods. The idle periods include two cases: one is when there is no work to do, and the other is when there is work to do but the server is on vacation. The busy periods are the times when the server is actually working on the customers of primary customers . During the busy periods of the simple M/G/1 queue system, we assume that the customers arrival time follows Poisson distribution with parameter . Let be the customer arrival rate and let be the distribution of the service times of customers arriving during busy periods; then the customers arrival time has a general distribution function . Let be the epoch at the end of the th busy period and let be the epoch beginning the th busy period. Then, . We assume that the arrival process and the service times of the customers arriving in the interval are independent of those arriving in for . Let be the work in the queue system at time . Let and , where . Then, is the work in the queue system at the end of the th busy period and is the work in the queue system immediately after the beginning of the st busy period.
Let be the th workload step in all workload process. Let and .
If , then .
If , then and .
Clearly, if , then . The process behaves like the work in a simple M/G/1 queue.
3. Proposed Method
This section consists of two subsections: Section 3.1 describes the method based on M/G/1 to compute the net inflow amount of big order and Section 3.2 describes the proposed model and method for stock recommendation.
3.1. Compute the Net Inflow Method
3.1.1. Funds Flow Theory
The flow of funds is the direction of movement actively chosen by funds in the stock market. Analyzing the flow of funds from the perspective of amount means observing trading volume and turnover; in actual operation, trading volume and turnover are directional, that is, they result from buying or from selling. For both market trend analysis and operations on individual stocks, determining the funds flow plays a vital role, yet the process of the funds flow is complicated and not easy to grasp. The funds flow can help investors see through the fog of index (price) changes to what others are actually doing. For example, a rise in the index (price) of a stock up to a certain point may be driven by 10 million or by a billion in funds, and the two have completely different significance for investors. In general, the funds flow and the trend of the stock index are very similar, but in the following two cases the funds flow measure has obvious significance. The first is when the day's funds flow and the index change move in opposite directions; for example, the overall index is down throughout the day, but the funds flow shows a positive net inflow. The second is when the magnitudes of the funds flow and the index change differ greatly; for example, the index rises sharply throughout the day, but the actual net inflow is small or even negative, which is called a net outflow. When the funds flow and the index change diverge, the funds flow reflects the actual future direction of the stock better than the index change does.
3.1.2. Funds Flow Concepts
To describe the proposed method more clearly, we review some concepts about funds flow as follows.
Definition 9. Funds inflow refers to the amount of active buying. It is active buying transactions where the buyer actively buys stock with price equal to or higher than the first selling price.
Definition 10. Funds outflow refers to the amount of active selling. It is active selling transactions where the seller actively sells stock with price equal to or less than the first buying price.
Definition 11. Funds net inflow refers to the amount of funds inflow minus the amount of funds outflow. If it is positive, the probability that the stock price rises is higher than the probability that it falls; if it is negative, the probability that the stock price falls is higher than the probability that it rises.
Definition 12. Big order refers to a buying or selling order of large size. According to the size of the order, we divide orders into four categories: small order, general order, big order, and very big order (king order). Measured by order size, we regard an order of more than 5 million shares or more than 20 million yuan as a big order and an order of more than 20 million shares or more than 100 million yuan as a king order.
Definition 13. Big order net amount refers to the difference between the amount of big buying order and the amount of big selling order. If it is positive, we call it the big order net inflow. If not, we call it the big order net outflow.
Definition 14. Tick-by-tick data refer to the records of each single transaction during trading. They reflect the true circumstances of the transaction process and are proprietary Level-2 data.
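Definitions 9 to 13 can be turned into a direct tick classification, sketched below. This is not the M/G/1-based method of the paper; it only illustrates the definitions. The `Tick` record, its field names, and the function name `big_order_net_amount` are hypothetical, and the size thresholds are passed in as parameters rather than fixed.

```python
from dataclasses import dataclass


@dataclass
class Tick:
    price: float     # transaction price
    volume: int      # shares traded in this single transaction
    best_bid: float  # first buying price when the trade occurred
    best_ask: float  # first selling price when the trade occurred


def big_order_net_amount(ticks, min_amount, min_volume):
    """Amount of big active buys minus amount of big active sells."""
    net = 0.0
    for tick in ticks:
        amount = tick.price * tick.volume
        # Keep only big orders: large by amount or large by volume.
        if amount <= min_amount and tick.volume <= min_volume:
            continue
        if tick.price >= tick.best_ask:
            net += amount  # active buying counts as inflow
        elif tick.price <= tick.best_bid:
            net -= amount  # active selling counts as outflow
    return net


# Three made-up ticks: a big buy, a big sell, and a small trade.
ticks = [
    Tick(price=10.0, volume=3_000_000, best_bid=9.5, best_ask=10.0),
    Tick(price=9.5, volume=6_000_000, best_bid=9.5, best_ask=10.0),
    Tick(price=10.0, volume=1_000, best_bid=9.5, best_ask=10.0),
]
net = big_order_net_amount(ticks, min_amount=20_000_000, min_volume=5_000_000)
```

A negative result, as in this example, corresponds to a big order net outflow in the sense of Definition 13.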
3.1.3. Method Based on M/G/1
We review the classical M/G/1 queue system, which has the following four characteristics.
(1) The arrival process is a Poisson process with arrival rate parameter $\lambda$; M indicates a memoryless Poisson process.
(2) The service time follows a general random distribution. Let $S_n$ be the service time of the $n$th customer, which is an independent random variable with general distribution function $B(t)$:
$$B(t) = P(S_n \le t).$$
The mean value and variance of the service time are given by
$$E[S_n] = \int_0^\infty t \, dB(t), \qquad \operatorname{Var}[S_n] = \int_0^\infty (t - E[S_n])^2 \, dB(t).$$
(3) There is a single server; the arrival times and service times are independent of each other.
(4) The system allows a queue of infinite capacity; the queue discipline is first come, first served (FCFS).
Assuming that interarrival time follows Poisson distribution with parameter in , let be the number of arrival customers in time, let be the number of customers in queue system at time , let be the number of customers after the th customer departure instant, let be the departure time of the th customer, and let be the arrival time of the th customer. If the number of customers is greater than zero at time , thenorAt this time if , thenFrom (11) and (12) the following can be obtained:Transition probability matrix of Markov Chain iswhere chain has Markov property and is given by
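As a textbook illustration of the embedded Markov chain of an M/G/1 queue, observed at departure instants, the sketch below builds a truncated transition matrix. It makes several assumptions that are not taken from the paper: the service time is deterministic (so the number of arrivals during one service is Poisson distributed), the state space is cut off at `size` states with the tail probability lumped into the last column, and the function names `arrivals_pmf` and `embedded_chain` are hypothetical.

```python
import math


def arrivals_pmf(rate, service_time, k):
    """Probability of k Poisson arrivals during one fixed service time."""
    mean = rate * service_time
    return math.exp(-mean) * mean ** k / math.factorial(k)


def embedded_chain(rate, service_time, size):
    """Truncated transition matrix of the queue length at departures."""
    a = [arrivals_pmf(rate, service_time, k) for k in range(size)]
    matrix = []
    for i in range(size):
        # From state i >= 1 one customer leaves, then arrivals are added;
        # from state 0 the next customer arrives, is served, and leaves.
        shift = max(i - 1, 0)
        row = [0.0] * size
        for j in range(shift, size - 1):
            row[j] = a[j - shift]
        row[size - 1] = 1.0 - sum(row)  # lump the tail into the last state
        matrix.append(row)
    return matrix


P = embedded_chain(rate=0.5, service_time=1.0, size=4)
```

The first two rows coincide, and each later row is the previous one shifted right by one column, which is the characteristic structure of this chain.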
3.2. The Framework of the Proposed Model
As shown in Figure 2, the process and data flow framework of the proposed model can be divided into two stages: the user clustering stage and the stock recommendation stage. Generally, investors with similar characteristics have similar investment interests. To obtain accurately the stocks that suit the target user, we therefore first cluster the users and obtain a set of similar users. The clustering threshold affects the number and size of the user categories, and hence the accuracy of the stock set selected from the stocks of users in the same category, so choosing the user clustering threshold is critical. In the stock recommendation stage, computing the big order net amount for a large number of stocks is the most important part. The stock trend is mainly driven by big order transactions. It is generally believed that a stock rise on large volume is closely associated with net buying by big orders, so stocks generally rise in price under a trend driven by big order net inflow, which is called big order net buying. In contrast, stocks generally fall in price under a trend driven by big order net outflow, which is called big order net selling. The big order net amount covers both big order net buying and big order net selling. The traditional model uses the funds inflow and funds outflow to predict the stock trend, which is unsuitable in some special situations. For example, on a given day the funds inflow of a stock may be far greater than zero while its big order net inflow is far smaller than zero. In that case, we can predict that the stock will enter a falling trend over a period of time after that day. After observing many stocks, we found that this is indeed the case. By using the big order net amount, the proposed model handles this situation, avoids or reduces some forecast errors, and improves the accuracy of the predicted trend of the recommended stocks.
Then, according to the big order net inflow, we select some optimal stocks and recommend that the target users buy them. Correspondingly, according to the big order net outflow, we select some stocks and recommend that the target users sell them.
4. Simulation Experiment
In this section, we study and compare the performance of the proposed model. In general, while the Shanghai and Shenzhen Composite Index (CSI) is rising, the accuracy of a recommendation algorithm is higher, but while the CSI is falling, the accuracy is very low or the recommendations are even completely incorrect. Results obtained during a falling market are therefore a real test of a recommendation algorithm. To make the experimental results more objective and realistic, we use the stock return to test whether a recommended stock yields a good return within 10 to 30 days after the recommendation. For the target investor, the higher the yield obtained, the better the effect of the recommendation model.
4.1. Data Selection
To examine whether the proposed model improves prediction accuracy, we select data from four different periods of the real stock market in China as the experimental data. The four periods are the bottom period (2012/11/28–2012/12/04, see A in Figure 3), the middle period (2012/12/17–2012/12/21, see B in Figure 3), and the top period (2013/02/04–2013/02/08, see C in Figure 3) of a wave with a rising trend, and the middle period (2013/06/17–2013/06/21, see D in Figure 3) of a wave with a falling trend. The experimental data consist of three parts: the investor user data, the CSI data, and the tick-by-tick transaction data of stocks. As the user data involve personal privacy, we use data from which the private information has been removed as the experimental data. The CSI and stock data are freely available market data, but the tick-by-tick transaction data are Level-2 data, which must be paid for.
4.2. Data Processing
To reduce the amount of computation and to suit parallel computing, we randomly select 2560 users from the investor user database for fuzzy clustering and divide them into several categories according to the clustering threshold. By the nature of the clustering, the threshold controls the number of user categories: the higher the threshold, the more categories are produced. The threshold can range from 0 to 1. If it is set too high, we get a very large number of user categories; if it is set too low, we get very few. For example, if the threshold is set to 1, we get 2560 user categories, and if it is set to 0, we get a single user category. The complexity of clustering the users increases exponentially with the number of user categories. To reduce the computational complexity, we therefore have to choose an appropriate threshold. For the 2560 users, we ran several clustering experiments while adjusting the threshold. With the threshold set to 0.6268, we obtain seven user categories and a better distribution of users, as shown in Table 2. The threshold may of course be set higher or lower than 0.6268, but the distribution of users then becomes worse. After these experiments, we set the clustering threshold to 0.6268 in this paper and divide the 2560 users into the seven categories in Table 2.
After the user clustering, we use formula (7), based on the cosine similarity, to calculate the similarity between the target user and the other users. We can then find the nearest neighbor users of the target user according to the similarity value. In the similarity calculation there are two cases: (1) the target user belongs to the clustered users; (2) the target user does not belong to the clustered users. In case (1), the target user and his nearest neighbors must belong to the same category of clustered users, so we only need to find the nearest neighbors in that category according to the values of the fuzzy similarity matrix. In case (2), we calculate the similarity between the target user and the clustered users using the Binary Search Algorithm, which bounds the number of similarity calculations in terms of the number of clustered users. To reduce the amount of calculation in the subsequent recommendation steps, we add the target user to the category of the most similar clustered user. To make the experimental results more objective and realistic, we choose as the study target an investor who does not belong to the clustered users; the seven nearest neighbors of this target user are listed in Table 3.
Once the target user has been classified, we obtain the clustered users and the stock set of those users, which we call the prerecommended stock set. We use a method based on M/G/1 to compute the big order net inflow of every stock in the prerecommended stock set. From the prerecommended stock set, sorted in descending order of big order net inflow, we choose the stocks with the highest values at the front of the set as the final recommendation stock set for the target user. Due to limited space, we selected five stocks in each period of Figure 3 as the research stocks in Tables 4, 5, 6, and 7. If the value of any stock in the stock set is smaller than 0, we have to enlarge the selection parameter and repeat the previous calculation steps. If the prerecommended stock set contains all stocks of the China Stock Market and the big order net inflow of any stock is smaller than 0, we do not recommend any stock and instead recommend that the target user wait and hold cash.
We study the stocks recommended by the proposed model and analyze the maximum possible return over four different intervals, namely 10 days, 30 days, 90 days, and 1 year, counted from the day on which the stocks are recommended to the target user. The experimental results in Table 8 show that the recommended stocks have better gains, especially those whose big order net inflow accounts for a higher proportion of the trading volume of those days. In addition, the stock price is the highest stock price, without ex-rights or ex-dividend adjustment, as of the last trading day of each of the four intervals.
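One plausible reading of the "maximum possible return" over an interval is the highest price reached within the interval relative to the price on the recommendation day. The sketch below follows that reading, which is an assumption; the prices are made up and the function name `max_return` is hypothetical.

```python
def max_return(recommend_price, later_prices, days):
    """Best achievable return within the first `days` trading days."""
    window = later_prices[:days]
    if not window:
        raise ValueError("no prices in the requested interval")
    return max(window) / recommend_price - 1.0


# Hypothetical daily highs after a recommendation made at a price of 10.0.
prices = [10.5, 9.8, 12.0, 11.0]
short_term = max_return(10.0, prices, days=2)
longer_term = max_return(10.0, prices, days=4)
```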
4.3. Experimental Results Analysis
In this section, we briefly analyze and explain the experimental results. After selecting an investor who does not belong to the clustered users as the target user of the experiment, we calculate the similarity between the target user and the clustered users using formula (7), which is based on cosine similarity. Because the nearest neighbors of the target user all belong to category 3 of the clustered users, the target user is assigned to category 3, following the clustering property that a user and his nearest neighbors belong to the same cluster. The seven nearest neighbors of the target user are shown in Table 3. We use a method based on the M/G/1 queue to compute the big order net inflow of every stock in the prerecommended stock set selected from those users. The five stocks with the highest big order net inflow during the period from 2012/11/28 to 2012/12/04 are the recommended stocks for the target user in Table 4. For simplicity, we sort the stocks by the daily mean ratio of big order net inflow in descending order and select five stocks to recommend to the target user.
Table 4 shows the daily big order net flow, ratio, and money for the five recommended stocks. To keep the data consistent and objective, we calculate the daily big order net flow of every stock over five consecutive trading days and compute the average. Together with the big order net money inflow, we use the big order net inflow ratio as the first evaluation criterion for the stocks. From the data in Table 4, we find that the number of days with big order net money inflow is generally greater than the number of days with big order net money outflow. Although Faw Car shows four days of big order net money outflow, its net money inflow on the fifth day is greater than the sum of the net money outflows of the previous four days. For that reason we believe that the stock will rise in the future. Generally, the big order net inflow affects only the short-term price of a stock and shows that some investors are upbeat about the stock in the near future. The short-term price of a stock may be consistent with the size of the big order net money inflow, but the long-term price may not be fully consistent with it. The long-term price of a stock depends more on factors such as the company’s development, the company’s profitability, and the traders’ goals and skill. We can see this phenomenon in Table 8. Similar characteristics appear in Tables 5, 6, and 7.
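The five-day averaging and ranking described above can be sketched as follows. The sketch assumes that the daily ratio is the big order net money inflow divided by the total turnover of that day, which is one plausible reading of the text; the function names and all figures are hypothetical and are not taken from Table 4.

```python
def mean_inflow_ratio(daily_net_inflow, daily_turnover):
    """Average of the daily ratios net inflow / turnover over the sample days."""
    if not daily_net_inflow or len(daily_net_inflow) != len(daily_turnover):
        raise ValueError("need two non-empty lists of equal length")
    ratios = [n / t for n, t in zip(daily_net_inflow, daily_turnover)]
    return sum(ratios) / len(ratios)


def rank_by_mean_ratio(stocks):
    """Sort stock codes by mean inflow ratio, largest first."""
    scores = {code: mean_inflow_ratio(*series) for code, series in stocks.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical series: four outflow days followed by one large inflow day.
late_inflow = ([-1.0, -2.0, -1.5, -0.5, 8.0], [50.0] * 5)
steady_inflow = ([0.2, 0.3, 0.1, 0.2, 0.2], [50.0] * 5)

total_late = sum(late_inflow[0])  # net money over the five days
ranking = rank_by_mean_ratio({"X": late_inflow, "Y": steady_inflow})
```

In this made-up case the single large inflow outweighs the four earlier outflows, so the five-day total and the mean ratio are both positive, which mirrors the argument made for Faw Car.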
Table 8 shows the maximum possible gains of the recommended stocks over the four intervals. For ease of comparison, we place the corresponding data of the Shanghai Composite Index below the data of the recommended stocks. Because the CSI 800 and the Shanghai Composite Index follow similar trends, we use the Shanghai Composite Index as the CSI in Table 8. If the highest price of a stock or of the CSI is the same in two or more consecutive intervals, the price never exceeded its earlier peak during the later interval, which indicates that the stock or the CSI was declining over that interval. For example, the CSI value under “highest price within 90 days” equals the value under “highest price within 1 year”; both are 2444.80, and the CSI moves within the interval from C to D in Figure 3. From Table 8, we find that the maximum possible gains of most of the recommended stocks are larger than those of the CSI. Of course, the target investor cannot expect to realize the maximum possible gains, because the timing of his buying and selling is uncertain. It is easy for the recommended stocks to achieve better gains between period A and period C in Figure 3, but it is much harder between period C and period D or after period D, when the CSI was falling or fluctuating. These results illustrate the effectiveness of the proposed model.
To show the return of the recommended stocks clearly, we choose periods C and D in Figure 3. Because these are the start and the middle of the period in which the CSI falls or fluctuates, ordinary stocks fall or fluctuate with the CSI. If the recommended stocks achieve better returns in this falling period, this is strong evidence that the proposed method works well. We select the last two of the five recommended stocks in periods C and D. Figures 4 and 5 compare the returns of the two stocks with the CSI over the 20 weeks after the stocks were recommended. Because the market was closed for holidays, Figure 4 shows only 18 trading weeks and Figure 5 shows only 19 trading weeks. Over these 20 weeks, both stocks achieve returns that are far better than those of the CSI in the same period. Victory Precision (002426) reaches its best return of 107.96% at the 17th trading week in Figure 4, and Newcapec (300248) reaches its best return of 49.21% at the 10th trading week in Figure 5. Certainly, we cannot obtain the best return in practice, but experienced investors can still obtain far better returns than the CSI by purchasing the recommended stocks.
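The weekly comparison with the index can be sketched as follows. This is a hedged illustration that assumes returns are cumulative and measured from the price on the recommendation date; the function names and all prices are hypothetical and do not reproduce the data behind Figures 4 and 5.

```python
def cumulative_returns(weekly_closes, base_price):
    """Cumulative return at the end of each trading week, relative to base_price."""
    return [close / base_price - 1.0 for close in weekly_closes]


def best_week(returns):
    """Return (1-based week number, return) for the highest cumulative return."""
    index = max(range(len(returns)), key=returns.__getitem__)
    return index + 1, returns[index]


# Hypothetical weekly closing prices, not data from the paper.
stock_returns = cumulative_returns([10.5, 11.0, 12.0, 11.5], base_price=10.0)
index_returns = cumulative_returns([2000.0, 1980.0, 1990.0, 1970.0], base_price=2000.0)

# Excess return of the stock over the index, week by week.
excess = [s - i for s, i in zip(stock_returns, index_returns)]
week, peak = best_week(stock_returns)
```

In this toy series the stock peaks in the third week while the index drifts down, so the excess return is positive in every week. An investor would realize the peak only by selling exactly in that week, which is why the best return is an upper bound rather than an expected outcome.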
Figure 6 shows that different stock recommendation models yield different stock returns. We compute the mean return rate over 18 trading weeks for 20 stocks in two different stock sets, which were recommended by the proposed method and by the traditional method in the four periods above; Figure 6 shows the result. Assuming that the target user bought the top-ranked stocks in the recommended stock set, we find that with appropriate trading the target user obtains a far better return rate, as indicated by the top line in Figure 6.
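The averaging behind such a comparison can be sketched as below, assuming an equal-weighted mean of the per-week return rates across the stocks in each set. The weighting scheme, the function name, and the return values are assumptions made for illustration; the text does not state how the mean was formed.

```python
def mean_weekly_return(returns_by_stock):
    """Equal-weighted mean return across stocks for each trading week."""
    if not returns_by_stock:
        raise ValueError("need at least one stock")
    series = list(returns_by_stock.values())
    n_weeks = min(len(r) for r in series)  # use the weeks common to all stocks
    return [sum(r[week] for r in series) / len(series) for week in range(n_weeks)]


# Hypothetical weekly return rates for two small stock sets.
proposed_set = {"A": [0.10, 0.20], "B": [0.00, 0.10]}
baseline_set = {"C": [0.02, 0.00], "D": [-0.02, 0.04]}

proposed_mean = mean_weekly_return(proposed_set)
baseline_mean = mean_weekly_return(baseline_set)
```

Plotting the two resulting series against the week number would give one line per recommendation method, in the manner of Figure 6.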
In Section 4.1, we selected data from four different periods of the real stock market in China. Each period is very short and contains only five trading days. Why, then, do we validate the recommended stocks over a longer period of one to eighteen weeks in Figures 4 and 5? It is well known that buying stocks with big orders is easy, whereas selling stocks with big orders is very difficult. The investors who can buy or sell stocks with big orders are generally institutional investors. Institutional investors are mainly financial institutions, including banks, insurance companies, investment trust companies, credit unions, pension funds established by national bodies, and other organizations.
It is very difficult, if not impossible, for institutional investors to sell out their stocks within a few days after buying large amounts with big orders. There are two possible cases. In the first case, the institutional investors want to sell all of their stocks in the following period. In general, selling out takes two to four weeks, depending on the size of their holdings, which is generally larger than the amount they bought during the days from which we selected data. In the second case, the institutional investors want to keep buying and not to sell in the following period. For example, the days from which we selected data may be the start of the institutional investors’ buying period. After some days, their holdings reach the desired amount and they stop buying. They then want to sell all of the stocks once the price reaches a certain range. In general, twelve to sixteen weeks pass between buying and selling out for institutional investors. In fact, nobody knows when they will buy or sell. To validate the proposal, we therefore run the experiment over a long period, from one to eighteen weeks after our recommendation date. As shown in Figures 4 and 5, Victory Precision (002426) and Kingsun Optoelectronic (002638) achieve their best returns at the 17th and the 14th trading weeks, respectively, in Figure 4, and Newcapec (300248) achieves better returns at the 4th trading week and its best returns at the 10th trading week in Figure 5. Certainly, we cannot obtain the best return in practice, because nobody knows the highest price of a stock in a given period or when to sell.
5. Conclusions and Future Works
In this paper, we proposed an efficient stock recommendation model based on big order net inflow. The proposed model filters out stocks with low investment value and improves prediction accuracy, so that it recommends stocks with higher investment value to the target users and helps them obtain higher returns. It also speeds up computation by using advanced algorithms, such as the Binary Search Algorithm, and meets the investment needs of investors in real life. The experimental results show that the proposed model performs better than the model based on money flow, filters out stocks with low or negative investment returns, and improves the investment gains of target investors.
In future work, we will study how to improve the accuracy of the recommendation model. We will reduce the impact of corporate events and trader rumors on the model and apply the model to more stock markets or futures markets in different countries. We think that three aspects are worth studying. Firstly, we can improve model performance with machine learning algorithms, time series analysis, or the latest fuzzy clustering techniques. For example, we may improve the accuracy by training machine learning algorithms on a large number of historical stock data sets from different markets and countries. Secondly, we may improve the accuracy of the clustering of stocks or users by selecting appropriate attributes or by adding a weight to every attribute. Finally, we may study how to identify big order transactions in the real trading process; this is very difficult, because some institutional investors use quantitative trading software to split a big order into many random small orders.
Conflict of Interests
All the authors of this paper declare that they have no conflict of interests in connection with the work submitted.
Acknowledgments
This work is supported by the National Natural Science Foundation of China (no. 61370073), the National High Technology Research and Development Program of China (no. 2007AA01Z423), Sichuan Province Science and Technology Support Program (no. 2013GZX0165), the Constructing Program of the Key Discipline in Huaihua University, the Scientific Research Fund of Hunan Provincial Education (nos. 12C0840 and 14C0886), the Scientific Research Fund of Huaihua University (nos. HHUY2012-15 and HHUY2011-17), the Science and Technology Plan Projects of Huaihua City, and the Key Laboratory of Intelligent Control Technology for Wuling-Mountain Ecological Agriculture in Hunan Province (no. ZNKZ2014-9).
References
- J. Duan, H. Liu, and J. Zeng, “Posterior probability model for stock return prediction based on analyst's recommendation behavior,” Knowledge-Based Systems, vol. 50, pp. 151–158, 2013.
- E. J. de Fortuny, T. de Smedt, D. Martens, and W. Daelemans, “Evaluating and understanding text-based stock price prediction models,” Information Processing & Management, vol. 50, no. 2, pp. 426–441, 2014.
- L. Huo, B. Jiang, T. Ning, and B. Yin, “A BP neural network predictor model for stock price,” in Intelligent Computing Methodologies, D. S. Huang, K. H. Jo, and L. Wang, Eds., Lecture Notes in Computer Science, pp. 362–368, Springer, Berlin, Germany, 2014.
- A. A. Adebiyi, A. O. Adewumi, and C. K. Ayo, “Comparison of ARIMA and artificial neural networks models for stock price prediction,” Journal of Applied Mathematics, vol. 2014, Article ID 614342, 7 pages, 2014.
- K. Miwa and K. Ueda, “Slow price reactions to analysts' recommendation revisions,” Quantitative Finance, vol. 14, no. 6, pp. 993–1004, 2014.
- T. Geva and J. Zahavi, “Empirical evaluation of an automated intraday stock recommendation system incorporating both market data and textual news,” Decision Support Systems, vol. 57, no. 1, pp. 212–223, 2014.
- C. S. Han and C. Y. Wan, “Future stock price predicting system for use in enterprise, has future prediction compute server setting up weighted value and producing virtual total amount of market price information based on basis predictive model,” KR2014120416-A, to Cs Co Ltd, Korea Advanced Institute of Science & Technology, 2014.
- N. C. Brown, K. D. Wei, and R. Wermers, “Analyst recommendations, mutual fund herding, and overreaction in stock prices,” Management Science, vol. 60, no. 1, pp. 1–20, 2014.
- M.-Y. Chen, “A high-order fuzzy time series forecasting model for internet stock trading,” Future Generation Computer Systems, vol. 37, pp. 461–467, 2014.
- B. Q. Sun, H. F. Guo, H. R. Karimi, Y. Ge, and S. Xiong, “Prediction of stock index futures prices based on fuzzy sets and multivariate fuzzy time series,” Neurocomputing, vol. 151, no. 3, pp. 1528–1536, 2015.
- J. Patel, S. Shah, P. Thakkar, and K. Kotecha, “Predicting stock and stock price index movement using trend deterministic data preparation and machine learning techniques,” Expert Systems with Applications, vol. 42, no. 1, pp. 259–268, 2015.
- D. Sonsino and T. Shavit, “Return prediction and stock selection from unidentified historical data,” Quantitative Finance, vol. 14, no. 4, pp. 641–655, 2014.
- M.-Y. Chen and B.-T. Chen, “A hybrid fuzzy time series model based on granular computing for stock price forecasting,” Information Sciences, vol. 294, pp. 227–241, 2015.
- Z. Xin, Z. Ma, and M. Gu, “Fuzzy clustering collaborative recommendation algorithms served for directional information recommendation,” Computer Science, vol. 34, no. 9, pp. 128–130, 2007.
- L. Chapple and J. E. Humphrey, “Does board gender diversity have a financial impact? Evidence using stock portfolio performance,” Journal of Business Ethics, vol. 122, no. 4, pp. 709–723, 2014.
- L. A. Zadeh, “Fuzzy sets,” Information and Control, vol. 8, no. 3, pp. 338–353, 1965.
- Q. Xu, “Continuous time M/G/1 queue with multiple vacations and server close-down time,” Journal of Computational Information Systems, vol. 3, no. 2, pp. 753–757, 2007.
- A. Frazzini and O. A. Lamont, “Dumb money: mutual fund flows and the cross-section of stock returns,” Journal of Financial Economics, vol. 88, no. 2, pp. 299–322, 2008.
- I. Gath and E. Bar-On, “Computerized method for scoring of polygraphic sleep recordings,” Computer Programs in Biomedicine, vol. 11, no. 3, pp. 217–223, 1980.
- H.-W. Ma, G.-W. Zhang, and P. Li, “Survey of collaborative filtering algorithms,” Mini-Micro Systems, vol. 7, pp. 1282–1288, 2009.
- B. Sarwar, G. Karypis, J. Konstan, and J. Riedl, “Item-based collaborative filtering recommendation algorithms,” in Proceedings of the 10th International World Wide Web Conference, pp. 285–295, May 2001.
- D. Goldberg, D. Nichols, B. M. Oki, and D. Terry, “Using collaborative filtering to weave an information tapestry,” Communications of the ACM, vol. 35, no. 12, pp. 61–70, 1992.
- H. J. Ahn, “A new similarity measure for collaborative filtering to alleviate the new user cold-starting problem,” Information Sciences, vol. 178, no. 1, pp. 37–51, 2008.
- G. Choudhury, “Some aspects of M/G/1 queue with two different vacation times under multiple vacation policy,” Stochastic Analysis and Applications, vol. 20, no. 5, pp. 901–909, 2002.
- B. T. Doshi, “Conditional and unconditional distributions for M/G/1 type queues with server vacations,” Queueing Systems, vol. 7, pp. 229–252, 1990.
- H. W. Lee, “M/G/1 queue with exceptional first vacation,” Computers & Operations Research, vol. 15, no. 5, pp. 441–445, 1988.
- K. K. Leung, “On the additional delay in an M/G/1 queue with generalized vacations and exhaustive service,” Operations Research, vol. 40, supplement 2, pp. S272–S283, 1992.
- A. Hatamlou, “In search of optimal centroids on data clustering using a binary search algorithm,” Pattern Recognition Letters, vol. 33, no. 13, pp. 1756–1760, 2012.
Copyright © 2016 Yang Yujun et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.