A Novel AI-Based Stock Market Prediction Using Machine Learning Algorithm

M, Iyyappan.; Ahmad, Sultan; Jha, Sudan; Alam, Afroj; Yaseen, Muhammad; Abdeljaber, Hikmat A. M.

doi:https://doi.org/10.1155/2022/4808088

Scientific Programming

Special Issue

Artificial Intelligence Approaches for Bridging Business Processes and Internet of Things and Big Data

Research Article | Open Access

Volume 2022 | Article ID 4808088 | https://doi.org/10.1155/2022/4808088

Iyyappan. M,¹ Sultan Ahmad,² Sudan Jha,¹ Afroj Alam,³ Muhammad Yaseen,⁴ and Hikmat A. M. Abdeljaber⁵

Academic Editor: Zhongguo Yang

Received 08 Feb 2022

Revised 01 Mar 2022

Accepted 07 Mar 2022

Published 01 Apr 2022

Abstract

A time series forecasting system can support investment decisions by reducing the chance of loss. In this work, the Holt–Winters algorithm and a neural network are used to forecast prices from multiple observed factors, and a final recommendation module filters the forecasts and assigns a rating to each stock. The system takes a real-time dataset of fifteen stocks as input and, based on these data, forecasts the future stock prices of companies belonging to different sectors. From these forecasts, the user can decide whether or not to invest in a particular company; the forecasts are intended to give the customer an accurate basis for investment decisions.

1. Introduction

The major goal of this work is to help a person invest in the stock market by providing a forecast of the closing price of stocks from various sectors. The system takes input from the user about the amount they want to invest, the duration of the investment, and how much loss or profit they can bear. It then applies machine learning algorithms to this information and suggests where to invest the money to maximize profit and minimize the risk of loss. The database already present in the system is used to analyze the market situation and find an optimal solution. Investing in the stock market is tricky; this project is therefore intended to give the user an upper hand in the process. The forecasts aim to be as accurate as the historical data allow. Machine learning algorithms (MLA) process the data in real time, providing an efficient way to arrive at a good solution. With the help of machine learning, the system recognizes previous patterns and uses them to suggest the future price of a stock. The algorithms used to achieve this goal are the Holt–Winters algorithm (HWA) [1], a recurrent neural network (RNN) [2], and a recommendation system (RS) [3] that suggests the best stock in which to invest.

1.1. Stock Hypothesis

The efficient market hypothesis (EMH) is an investment theory stating that it is impossible to beat the market because current share prices already incorporate and reflect all relevant information [4]. According to the EMH, stocks always trade at their fair value on stock exchanges [5]. Thus, investors can neither buy undervalued stocks nor sell stocks at inflated prices. This in turn means that it should be impractical to outperform the overall market through expert stock selection, so higher returns can only be obtained by purchasing riskier investments. Proponents therefore consider it pointless to search for undervalued stocks or to predict market trends through either fundamental or technical analysis. Although the EMH is perceived as a cornerstone of modern financial theory, it is controversial. Its supporters in academia point to a large body of evidence, but the theory has opponents too; Warren Buffett is a prominent example, as he has consistently beaten the market over the years, which the theory states is impossible. Critics of the EMH also point to events such as a fall of about 20% in the Dow Jones Industrial Average (DJIA) [6] in a single day as evidence that stock prices can deviate from their fair values. Supporters of the EMH conclude that, because of the random nature of the market, investors are better off with a low-cost, passive portfolio. This is supported by the 2015 Active/Passive Barometer study by Morningstar, Inc., in which the returns of active managers in all categories were compared with those of related index funds and exchange-traded funds. The study concludes that, year after year, only two groups of active managers surpassed passive funds more than 50% of the time: United States small growth funds and diversified emerging market funds.

This work uses three main concepts for forecasting. The first, used for stocks that show periodic change across seasons, is Holt–Winters triple exponential smoothing (HWTES). This algorithm takes three components into consideration: base level (BL), trend level (TL), and seasonal factor (SF). The time series is decomposed into these components by the Holt–Winters algorithm, and their values are calculated in our experiments. The second concept is the RNN. The specific RNN model [7] used is long short-term memory (LSTM) [8]. It resembles a normal neural network (NNN); the difference is that each intermediate cell is a memory cell that retains its value until the next feedback loop. The third concept is the recommendation system, a subclass of information filtering systems that seeks to predict a rating based on different factors.

2. Literature Review

As per our literature survey, many systems can perform stock market prediction (PSMP). In [9], a dataset of statistical information is predicted well, but the research covers only the deep learning model. A fuzzy approach has been used for short-term prediction, but this prediction is not sufficient for a higher level of stock investment [10]. The stock investment system proposed in [11] gives strong predictions for investment and forecasting, but it was not tested on real dataset values. A survey article was helpful for understanding the various techniques and for comparing the different models and their specific functionalities [12]. Early prediction techniques used the ARIMA model together with various classification processes; the drawback of the solution proposed in [13] is that the author could not make any changes to the existing model. Neural networks are suitable for stock investment because they follow a definite algorithm and protocol that can be used to build a better model [14]. Optimization algorithms have been proposed for selecting an accurate model based on software metrics; these metrics help in understanding the prediction model for customer investments [15, 16]. A temporal element has always been present in our lives, and time-stamped data can easily be gathered in various applications [17]. For instance, when a user buys an item online or clicks on “add the item,” the event is registered. Hence, the analysis of temporal information can be applied in various areas. It is essential to note that user preferences change over time. For instance, a person who watched animated television shows in childhood will most probably switch to watching the news in adulthood. It is important to incorporate such changes in a recommendation system. One approach predicts user preferences by learning from purchase history. It is composed of three steps: first, obtaining user features based on matrix factorization and purchase time; then, using Kalman filtering to predict user preference vectors from the user features; and finally, generating a recommendation list, for which two types of recommendation methods are suggested. On a real-world dataset, this approach outperforms other competitive methods, such as the first-order Markov model [18]. With the increasingly fierce competition among e-commerce platforms, the trust relationship among users has become a research hotspot for recommendation systems [19]. In that line of work, a trust relationship is introduced into the recommendation system, and a multisource attribute trust prediction method based on improved evidence theory is proposed. The user attributes are investigated quantitatively from each user's data, and four attributes are selected. A reduced attribute set is then obtained from the quantitative one. Finally, the attribute evidence is fused repeatedly using a weight allocation method to obtain the strength of the trust relationship. The trust prediction result is verified by a fold-based cross-validation method. In addition, the method is compared with other machine learning methods, and the results indicate the advantage of the proposed trust fusion method.

3. Problem Description

Many conventional techniques for stock market prediction are based on news feeds, but they cannot forecast prices in the long term because news about future events cannot be predicted. The proposed system therefore predicts stock market prices using a recurrent neural network and Holt–Winters triple exponential smoothing, relying only on historical data to predict the closing price of individual stocks. The system takes input from the user about the amount they want to invest, the duration of the investment, and how much loss or profit they can bear. It then applies machine learning algorithms to this information and suggests where to invest the money to maximize profit and minimize the risk of loss. The database already present in the system is used to analyze the market situation and find an optimal solution [20]. Investing in the stock market is tricky; the project is intended to give the user an upper hand in the process. The forecasts aim to be as accurate as the historical data allow. Machine learning algorithms process the data in real time, providing an efficient way to arrive at a good solution. With the help of machine learning, the system recognizes previous patterns and uses them to suggest the future price of the stock.

4. Data Gathering and Analysis

The proposed system requires a dataset from one of the world's stock exchanges, so the first step is to gather the data. Various datasets are available on stock exchange websites at different granularities, such as intraday, daily, weekly, or monthly data. However, our system is meant for general investing and not for active intraday trading. Thus, we opted for daily data. The following subsections describe the selection of a particular stock exchange and a particular data format.

4.1. Choosing Stock Exchange

Many stock exchanges are available throughout the world; here, the National Stock Exchange (NSE), the biggest stock exchange available to stock traders in India, was chosen. It has the biggest online digital exchange, which lets users buy or sell their stocks online without any problems. The NSE has more than fifteen thousand securities listed under the equity section, making it the biggest stock exchange in Asia. With the stock exchange chosen, the next step is to select particular stocks.

4.2. Selecting Suitable Stocks

The system requires a list of stocks that are traded on a daily basis and well established in the market. Stocks that are constantly traded and that give the most periodic graph of closing price are chosen. Because our system ignores intraday trading prices, we chose stocks that give linear curves. All stocks on the NSE are divided into sectors, and each sector has hundreds of stocks. There are fifteen sectors in total, and one stock from each sector is selected. Table 1 shows the stock selected from each sector for optimal system performance.

4.3. Data Format of Downloaded Files

The data are received as comma-separated values (.csv) files, which follow the dataset structure shown in Table 2.

4.4. ER Diagram of Dataset

The ER diagram in Figure 1 shows that the dataset consists of three types of entities.
(i) The all stocks entity is the central table and records the values of every stock available for evaluation. This table includes columns such as stock id, stock name, sector of the stock, current P/E ratio of the stock, dividend yield, and RMSE value of our forecast.
(ii) Fifteen stock entity tables are linked to the all stocks table; each stock table is named after the symbol of that entity. Each table records the daily data of the stock for the last three years, one row for each trading day. It includes columns such as stock id, symbol, date, previous close, open, high, low, close, last, average, traded quantity, turnover, deliverable quantity, and number of trades.
(iii) The customer table records the login id of every customer and is updated when a new customer registers. It stores data such as name, e-mail, and password.

4.5. Brief Description of Each Attribute

All Stocks (Entity)
(i) Stock ID: ID assigned to each stock (primary key)
(ii) Name: name of the stock
(iii) Sector: sector to which the stock belongs
(iv) P/E Ratio: price per share divided by earnings per share
(v) Dividend Yield: dividend distributed to stockholders per share, as a percentage of the share price
(vi) RMSE: root mean square error on the testing data of each stock

Stocks (Entity)
(i) Stock ID: ID assigned to each stock (primary key)
(ii) Symbol: symbol of the stock
(iii) Date: date of the data that the row represents
(iv) Previous Close: closing price of the previous day
(v) Open: opening price of that particular day
(vi) High: highest recorded price of the day
(vii) Low: lowest recorded price of the day
(viii) Last: price of the last trade of the day
(ix) Close: closing price of the stock on that particular day
(x) Average: average price of the stock throughout the day
(xi) Traded Quantity: total quantity traded on that day
(xii) Turnover: turnover of the company's stock on that particular day
(xiii) Deliverable Quantity: number of stocks that need to be delivered to new shareholders within two days
(xiv) Number of Trades: number of trades on that day
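As a rough illustration of how one row of a downloaded daily file could map onto the stock entity described above, the following sketch parses CSV text into a typed record. It is written in Python for brevity, whereas the project itself is described as a Java application. The header names (`Symbol`, `Prev Close`, and so on), the `DD/MM/YYYY` date format, and the symbol `ABC` are assumptions made for the example, since Table 2 is not reproduced here.

```python
import csv
import io
from dataclasses import dataclass
from datetime import date, datetime


@dataclass
class DailyRow:
    """One trading day of one stock; field names are illustrative."""
    symbol: str
    day: date
    prev_close: float
    open_price: float
    high: float
    low: float
    last: float
    close: float
    average: float
    traded_quantity: int
    turnover: float
    deliverable_quantity: int
    number_of_trades: int


def parse_daily_rows(csv_text, date_format="%d/%m/%Y"):
    """Parse CSV text (with a header line) into a list of DailyRow records."""
    rows = []
    for rec in csv.DictReader(io.StringIO(csv_text)):
        rows.append(DailyRow(
            symbol=rec["Symbol"],
            day=datetime.strptime(rec["Date"], date_format).date(),
            prev_close=float(rec["Prev Close"]),
            open_price=float(rec["Open"]),
            high=float(rec["High"]),
            low=float(rec["Low"]),
            last=float(rec["Last"]),
            close=float(rec["Close"]),
            average=float(rec["Average"]),
            traded_quantity=int(rec["Traded Quantity"]),
            turnover=float(rec["Turnover"]),
            deliverable_quantity=int(rec["Deliverable Quantity"]),
            number_of_trades=int(rec["Number of Trades"]),
        ))
    return rows


# Hypothetical one-row file for a made-up symbol "ABC".
SAMPLE = (
    "Symbol,Date,Prev Close,Open,High,Low,Last,Close,Average,"
    "Traded Quantity,Turnover,Deliverable Quantity,Number of Trades\n"
    "ABC,02/01/2017,100.0,101.0,105.5,99.5,104.0,104.5,102.7,"
    "150000,15405000.0,90000,4200\n"
)
rows = parse_daily_rows(SAMPLE)
print(rows[0].symbol, rows[0].day, rows[0].close)
```

Only the `close` field is needed by the forecasting steps; the remaining fields are kept so that a record mirrors the columns listed for the stock tables.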

4.6. Description of Tables of MySQL

Figures 2–4 show a description of MySQL tables being used to implement this work.

5. Basic Structure and Implementation

The programming language used for the project development is Java, a general-purpose, object-oriented programming language. However, using machine learning in Java is not an easy task, as there are no suitable predefined functions. The stock market prediction system uses three different algorithms: the Holt–Winters triple exponential algorithm, a recurrent neural network, and a recommendation system. NetBeans 8.1, a Java integrated development environment (IDE), is used for development. The project works as a standalone Java application; that is, it is platform independent and requires only MySQL and the latest Java Development Kit (JDK) to run. It is delivered as a single JAR file that can be transferred to any platform to run the required code, and it uses several libraries for deep learning and user interface development. All the libraries used are open source and released under open-source licenses. The implementation can be divided into three sections, which are based on different concepts and use different algorithms:
(i) Holt–Winters triple exponential algorithm: this algorithm gives a time series forecast of the closing price of stocks using three factors, which are discussed in detail later
(ii) Recurrent neural network: the long short-term memory RNN is a deep learning algorithm and is used to predict stock prices
(iii) Recommendation system: this algorithm recommends a particular stock to the user by giving each stock a rating based on different factors

5.1. Basic Structure of Project

Figure 5 shows the project package that has been created in order to implement this work.

5.2. System Implementation

This project works on three major concepts. The system is provided with three years of data for fifteen stocks and is used to predict the next quarter of the following year. Thus, we have about 750 rows of historical data for each security and wish to predict the closing price for the next 59 days.

5.2.1. HWTES Algorithm

The Holt–Winters triple exponential algorithm is used when data exhibit trend or seasonality. It captures the level of demand and its trend over time. Each demand observation is broken into three components:
(i) Base level
(ii) Trend level
(iii) Seasonal factor

When the forecast is developed, the demand expectation is recomposed by summing all three components. The method is called exponential smoothing with trend and seasonality because, as in simple exponential smoothing, smoothing factors are applied. The difference is that the demand observation is broken down into three components and a smoothing constant is applied to each of them, which is why there are three smoothing factors: α, β, and γ. To initialize the seasonal factors, the method starts with the first period of data and subtracts the average demand, taken over the course of the three years, from the actual demand observation. This research computed the first two weeks in this way and then applied the same formula to the remaining weeks of the year. Because the average demand is subtracted from each actual demand, the resulting seasonal factors sum to 0. The base level is then built up from week fifty-two onward by subtracting the seasonal factor from the actual demand.

The basic equations of this method are given as follows.

Overall smoothing:

Trend smoothing:

Seasonal smoothing:

Forecast:

where y is the observation, S is the smoothed observation, b is the trend factor, I is the seasonal index, F is the forecast at m periods ahead, and t is an index denoting a time period.
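For reference, a standard additive form of these four recursions, which matches the earlier description of recomposing the forecast by summing the components, can be written as follows. This is a sketch rather than the authors' exact formulation: the season length L is an added symbol, and pairing β with the trend term and γ with the seasonal term is an assumption.

```latex
\[
\begin{aligned}
S_t     &= \alpha\,(y_t - I_{t-L}) + (1-\alpha)\,(S_{t-1} + b_{t-1}) \\
b_t     &= \beta\,(S_t - S_{t-1}) + (1-\beta)\,b_{t-1} \\
I_t     &= \gamma\,(y_t - S_t) + (1-\gamma)\,I_{t-L} \\
F_{t+m} &= S_t + m\,b_t + I_{t-L+m}
\end{aligned}
\]
```

In the multiplicative variant of the method, the seasonal index divides the observation instead of being subtracted from it, and it multiplies the trend-adjusted level in the forecast instead of being added to it.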

For the given base level, the smoothing factors are as follows:
(i) Alpha is equal to 0.200
(ii) Beta is equal to 0.300
(iii) Gamma is equal to 0.200

This combination of the three factors gave the best results. The base level for week 53 is obtained from the actual demand of that week minus the seasonal factor of the corresponding week. Each observation is thus broken into three distinct components, all of which are estimates. Each component is estimated in two different ways, and the two estimates are weighted by the respective smoothing factor. The same formulas are applied to the remaining weeks up to week 5 of year 3. The forecast then runs from week 6 of year 3 to the end of week 52 of year 3, using the most recent seasonal factor for each week, starting with that of week 6 of year 2, and repeating the pattern for the rest of the year.
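The procedure described in this subsection can be sketched in a few lines. The following is a minimal Python illustration of additive triple exponential smoothing, not the project's Java implementation. It assumes that `beta` smooths the trend and `gamma` smooths the seasonal factors, that the level starts from the mean of the first season, and that the trend starts from the change between the first two seasonal means; the toy series and its season length of 4 are made up for the example.

```python
def holt_winters_additive(y, season_length, alpha, beta, gamma, horizon):
    """Additive Holt-Winters forecast.

    y: list of observations covering at least two full seasons.
    Returns `horizon` forecasts for the periods after the last observation.
    """
    L = season_length
    if len(y) < 2 * L:
        raise ValueError("need at least two full seasons of data")

    first_mean = sum(y[:L]) / L
    second_mean = sum(y[L:2 * L]) / L

    level = first_mean                       # initial base level
    trend = (second_mean - first_mean) / L   # initial per-period trend
    # Initial seasonal factors: actual minus average, so they sum to zero.
    seasonal = [y[i] - first_mean for i in range(L)]

    for t in range(L, len(y)):
        prev_level = level
        s = seasonal[t % L]  # seasonal factor estimated one season earlier
        level = alpha * (y[t] - s) + (1 - alpha) * (prev_level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
        seasonal[t % L] = gamma * (y[t] - level) + (1 - gamma) * s

    n = len(y)
    # Recompose the forecast by summing level, trend, and seasonal factor.
    return [level + m * trend + seasonal[(n + m - 1) % L]
            for m in range(1, horizon + 1)]


# Toy series: constant level of 100 with a repeating four-period pattern.
pattern = [2.0, -1.0, -3.0, 2.0]
series = [100.0 + pattern[t % 4] for t in range(12)]
forecast = holt_winters_additive(series, season_length=4,
                                 alpha=0.2, beta=0.3, gamma=0.2, horizon=4)
print(forecast)
```

On a series with no trend and a perfectly repeating pattern, the forecast simply repeats the pattern around the level. For real closing prices, the season length has to be chosen by the user; the text implies a yearly cycle of 52 weeks, which is not the value used in this toy example.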

5.2.2. Separating Train and Test Data

The training data and testing data need to be separated. To obtain an error measure for our algorithm, the developer first fits the model on the training data and then runs the algorithm on the testing data; the resulting RMSE is our error parameter. Training data are from 01/01/2017 to 30/09/2019, a total of 743 rows. Testing data are from 01/10/2019 to 31/12/2020, a total of 59 rows. Forecasted data are from 01/01/2021 to 31/03/2021, a total of 58 rows.
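A date-based split and the RMSE error parameter can be illustrated as follows. This is a minimal Python sketch under stated assumptions: rows are `(day, close)` pairs sorted by date, the prices are made up, and the "forecast" is a naive stand-in that repeats the last training close, used only so that there is something to score.

```python
import math
from datetime import date


def split_by_date(rows, train_end):
    """Split (day, close) pairs into train (day <= train_end) and test (later)."""
    train = [r for r in rows if r[0] <= train_end]
    test = [r for r in rows if r[0] > train_end]
    return train, test


def rmse(actual, predicted):
    """Root mean square error between two equally long sequences."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("sequences must be non-empty and of equal length")
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


# Made-up closing prices around the train/test boundary.
rows = [
    (date(2019, 9, 27), 100.0),
    (date(2019, 9, 30), 102.0),
    (date(2019, 10, 1), 101.0),
    (date(2019, 10, 3), 104.0),
]
train, test = split_by_date(rows, train_end=date(2019, 9, 30))

# Naive stand-in forecast: repeat the last training close for every test day.
naive = [train[-1][1]] * len(test)
error = rmse([close for _, close in test], naive)
print(len(train), len(test), error)
```

Because RMSE is expressed in the same units as the closing price, values are only comparable between stocks that trade at similar price levels.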

5.2.3. Recurrent Neural Network LSTM

The LSTM is an evolution of the recurrent neural network, commonly known as RNN. A normal RNN module passes its input and the output of the previous step through a single tanh function. The LSTM instead uses a feedback loop and gates to remember, and it has four interacting neural network layers in each module. The core idea behind LSTMs is the “module state β,” which is the main chain of data flow. It allows data to flow along almost unchanged, with only some linear transformations. However, the LSTM can add data to or remove data from “β” via sigmoid “gates,” though the implementation of LSTM modules differs. In a normal RNN, if the output is repeatedly multiplied by a value even slightly below 1, it shrinks quickly, and this slows down learning. The LSTM behaves differently; it has four functions in one. The module state allows the LSTM modules to carry memory around the loop. Gates take the input and then decide how much of it should be used to update the module state. For example, a sigmoid outputs a value between 0 and 1: for “0,” nothing is updated; for “1,” the whole module state is updated; and values in between give a partial update. The gates thus allow the neurons to keep a long-term memory of what is going on.
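The gating behaviour described above can be made concrete with a single-unit LSTM step. The following Python sketch uses the standard LSTM update with four layers (forget gate, input gate, candidate, output gate) and scalar weights; the variable `c` plays the role of the module state. The weight values are hypothetical and chosen only to show a gate that is almost fully closed or fully open.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function with output in (0, 1)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def lstm_step(x, h_prev, c_prev, w):
    """One step of a single-unit LSTM with scalar input x.

    w maps each layer name to a tuple (input weight, recurrent weight, bias).
    Returns the new output h and the new module state c.
    """
    def pre(name):
        w_x, w_h, b = w[name]
        return w_x * x + w_h * h_prev + b

    f = sigmoid(pre("forget"))       # how much of the old state to keep
    i = sigmoid(pre("input"))        # how much of the candidate to add
    g = math.tanh(pre("candidate"))  # candidate values for the state
    o = sigmoid(pre("output"))       # how much of the state to expose

    c = f * c_prev + i * g           # state update: mostly linear data flow
    h = o * math.tanh(c)
    return h, c


# Forget gate open and input gate closed: the old state is carried through.
keep = {"forget": (0.0, 0.0, 50.0), "input": (0.0, 0.0, -50.0),
        "candidate": (1.0, 0.0, 0.0), "output": (0.0, 0.0, 0.0)}
h_keep, c_keep = lstm_step(x=0.3, h_prev=0.0, c_prev=0.7, w=keep)

# Forget gate closed and input gate open: the state is overwritten.
overwrite = dict(keep, forget=(0.0, 0.0, -50.0), input=(0.0, 0.0, 50.0))
h_new, c_new = lstm_step(x=0.3, h_prev=0.0, c_prev=0.7, w=overwrite)
print(c_keep, c_new)
```

In the first call the state stays at its previous value, and in the second it is replaced by the candidate, which corresponds to the gate outputs "1" and "0" discussed in the text; a trained network learns weights that place the gates anywhere in between.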

Example 1. Time series approach: the data preparation has the following steps:
(1) SIN wave predictor
(2) Load 5001 time periods
(3) Split into windows of length 50
(4) Split into train/test sets (90/10)
(5) Reshape into a 3-dimensional NumPy array: total windows, window size, 1 [[[x], [x1], [x2]], [[x], [x1], [x2], [x3]], …]
The steps for this approach are as follows:
(1) Create LSTM model:
  (i) (1, 50, 100, 1)
  (ii) 50 and 100 are LSTM neurons
  (iii) 1 fully connected output (linear activation)
(2) Train the model:
  (i) Only 1 epoch is required
(3) Test prediction for next 500 steps:
  (i) Initialize with 1st sequence of testing data
  (ii) Feedforward sequence windows to get singular next step prediction
  (iii) Shift sequence windows to remove 0th element and push predicted value as nth element
  (iv) After >50 steps window will be only predicting on predictions
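The data preparation and the shifting-window prediction loop of Example 1 can be sketched without any deep learning library. The Python code below uses nested lists in place of a NumPy array and replaces the trained LSTM with a hypothetical stand-in predictor based on the exact recurrence of a sine wave, so only the windowing and feedback logic is illustrated. The angular step `OMEGA` is an arbitrary choice.

```python
import math


def make_windows(series, window):
    """Slide a fixed-length window over the series, one step at a time."""
    return [series[i:i + window] for i in range(len(series) - window + 1)]


def split_windows(windows, train_fraction=0.9):
    """Split the windows into a leading train part and a trailing test part."""
    cut = int(len(windows) * train_fraction)
    return windows[:cut], windows[cut:]


def to_3d(windows):
    """Reshape to (total windows, window size, 1) using nested lists."""
    return [[[v] for v in w] for w in windows]


def predict_sequence(predict_next, seed_window, steps):
    """Forecast `steps` values by feeding each prediction back into the window."""
    window = list(seed_window)
    predictions = []
    for _ in range(steps):
        nxt = predict_next(window)
        predictions.append(nxt)
        # Remove the 0th element and push the prediction as the last element.
        window = window[1:] + [nxt]
    return predictions


OMEGA = 0.05  # radians per time step (arbitrary)
series = [math.sin(OMEGA * t) for t in range(5001)]
windows = make_windows(series, window=50)
train, test = split_windows(windows, train_fraction=0.9)
train_3d = to_3d(train)


def stand_in_model(window):
    """Stand-in for a trained LSTM: sin((n+1)w) = 2cos(w)sin(nw) - sin((n-1)w)."""
    return 2.0 * math.cos(OMEGA) * window[-1] - window[-2]


predicted = predict_sequence(stand_in_model, test[0], steps=100)
print(len(windows), len(train), len(test), len(predicted))
```

After 50 steps the window contains only predicted values, so any error made by a real model is fed back into its own input; the stand-in predictor avoids this effect only because it is exact for a sine wave.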

5.2.4. Modified Recommendation System

The recommendation system, a subclass of information filtering systems, seeks to predict the best possible stock available for the user. A normal recommendation system accepts Boolean values, for example, whether a commodity is liked or disliked. Here, however, a weighted recommendation system is used, which accepts various stock market factors as its parameters in order to compare different stocks. These factors are as follows:
(i) RMSE: the RMSE values are obtained when the data are forecasted using the RNN or the Holt–Winters triple exponential algorithm. This factor indicates how accurate our prediction is. A higher RMSE value leads to a lower rating for the stock; a lower RMSE value means that our forecast is good and can be trusted, so a higher rating is given to that stock.
(ii) DY: dividend yield is a percentage value that stays constant for a particular stock. Stock market securities pay a dividend to their shareholders. It is paid monthly or quarterly as an incentive for long-term investors to hold their stocks; the dividend also varies in direct proportion to the profits of the security. Thus, the higher the dividend yield, the better the stock, and vice versa.
(iii) P/E Ratio: P/E ratio stands for price-earnings ratio, the ratio of the current price per share to the earnings per share. It is used to evaluate a company and is sometimes also known as the price multiple or the earnings multiple. A larger P/E ratio suggests that the stock is overvalued and its price may fall; thus, a lower rating is given, and vice versa.
(iv) ROI: this stands for return on investment, calculated in its standard form as $\mathrm{ROI} = \dfrac{\text{current value of investment} - \text{cost of investment}}{\text{cost of investment}} \times 100$. This is a percentage quantity: the higher the ROI, the better the rating given to the stock.

Example 2. Recommending the best stock (Table 3)
After a rating is given to each stock, all available options to buy are displayed to the user in a list sorted by the ratings given by the recommendation system.
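One way to turn the four factors into a single rating is sketched below in Python. The scheme is an assumption made for illustration, since the weights and Table 3 are not reproduced here: each factor is min-max normalized across the candidate stocks, inverted where a lower value is better (RMSE and P/E ratio), and combined with equal weights. The stock names and figures are made up, and ROI is computed from a current and a forecasted closing price.

```python
def roi_percent(buy_price, sell_price):
    """Return on investment as a percentage of the buy price."""
    return (sell_price - buy_price) / buy_price * 100.0


def min_max(values):
    """Scale values to [0, 1]; identical values all map to 0.5."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.5] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def rate_stocks(stocks, weights=None):
    """Rate stocks from 0 (worst) to 1 (best) and return them best first.

    stocks: list of dicts with keys name, rmse, dividend_yield, pe_ratio, roi.
    """
    if weights is None:
        weights = {"rmse": 0.25, "dividend_yield": 0.25,
                   "pe_ratio": 0.25, "roi": 0.25}
    higher_is_better = {"rmse": False, "dividend_yield": True,
                        "pe_ratio": False, "roi": True}

    scores = [0.0] * len(stocks)
    for key, better_high in higher_is_better.items():
        normalized = min_max([s[key] for s in stocks])
        for idx, v in enumerate(normalized):
            scores[idx] += weights[key] * (v if better_high else 1.0 - v)

    names = [s["name"] for s in stocks]
    return sorted(zip(names, scores), key=lambda pair: pair[1], reverse=True)


# Hypothetical candidates; prices are (current close, forecasted close).
stocks = [
    {"name": "A", "rmse": 10.0, "dividend_yield": 3.0, "pe_ratio": 15.0,
     "roi": roi_percent(100.0, 108.0)},
    {"name": "B", "rmse": 30.0, "dividend_yield": 2.0, "pe_ratio": 25.0,
     "roi": roi_percent(200.0, 210.0)},
    {"name": "C", "rmse": 50.0, "dividend_yield": 1.0, "pe_ratio": 40.0,
     "roi": roi_percent(50.0, 51.0)},
]
ranked = rate_stocks(stocks)
for name, rating in ranked:
    print(name, round(rating, 3))
```

Different weights can be passed to `rate_stocks` to emphasize, for example, forecast accuracy over dividend yield; the user's investment duration and risk tolerance would be natural inputs for choosing them.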

6. User Interface Module

Snapshots of the interface modules of this recommendation system are shown in Figures 6–12.

7. Conclusion

The stock market prediction tool was executed on the real-time dataset and displayed the desired results. The sample database chosen worked well for the system and represented the behavior of every sector correctly. Further, the forecasting algorithms, namely the Holt–Winters algorithm and the recurrent neural network LSTM, forecasted the closing price for one quarter of the year, from 01-08-2021 to 30-11-2021, with an average RMSE value of less than 50 in many cases. The recommendation system also worked properly, displaying the best stock to buy in a given interval of time.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

References

[1] X. Wang, “The short-term passenger flow forecasting of urban rail transit based on Holt-Winters’ seasonal method,” in 2019 4th International Conference on Electromechanical Control Technology and Transportation (ICECTT), pp. 265–268, Guilin, China, 26-28 April 2019.
[2] W. Chen, Y. Zhang, C. K. Yeo, C. T. Lau, and B. S. Lee, “Stock market prediction using neural network through news on online social networks,” in 2017 International Smart Cities Conference (ISC2), pp. 1–6, Wuxi, China, 14-17 September 2017.
[3] S. E. C. Gelper, R. Fried, and C. Croux, “Robust forecasting with exponential and Holt-Winters smoothing,” Journal of Forecasting, vol. 29, no. 3, pp. 285–300, 2010.
[4] E. F. Fama, L. Fisher, M. C. Jensen, and R. Roll, “The adjustment of stock prices to new information,” International Economic Review, vol. 10, no. 1, pp. 1–21, 1969.
[5] E. F. Fama, “Efficient capital markets: II,” The Journal of Finance, vol. 46, no. 5, pp. 1575–1617, 1991.
[6] E. N. Biktimirov and Y. Xu, “Market reactions to changes in the Dow Jones industrial average index,” International Journal of Managerial Finance, vol. 15, no. 5, pp. 792–812, 2019.
[7] A. Rajagopal, S. Jha, M. Khari, S. Ahmad, B. Alouffi, and A. Alharbi, “A novel approach in prediction of crop production using recurrent cuckoo search optimization neural networks,” Applied Sciences, vol. 11, no. 21, p. 9816, 2021.
[8] A. Sherstinsky, “Fundamentals of recurrent neural network (RNN) and long short-term memory (LSTM) network,” Physica D: Nonlinear Phenomena, vol. 404, p. 132306, 2020.
[9] J. Shen and M. O. Shafiq, “Short-term stock market price trend prediction using a comprehensive deep learning system,” Journal of Big Data, vol. 7, no. 1, p. 66, 2020.
[10] G. S. Atsalakis and K. P. Valavanis, “Forecasting stock market short-term trends using a neuro-fuzzy based methodology,” Expert Systems with Applications, vol. 36, no. 7, pp. 696–707, 2009.
[11] H. Nekoeiqachkanloo, B. Ghojogh, A. S. Pasand, and M. Crowley, “Artificial counselor system for stock investment,” 2019, arXiv:1903.00955.
[12] N. Rouf, M. B. Malik, T. Arif et al., “Stock market prediction using machine learning techniques: a decade survey on methodologies, recent developments, and future directions,” Electronics, vol. 10, no. 21, p. 2717, 2021.
[13] S. M. Idrees, M. A. Alam, and P. Agarwal, “A prediction approach for stock market volatility based on time series data,” IEEE Access, vol. 7, pp. 17287–17298, 2019.
[14] R. Hafezi, J. Shahrabi, and E. Hadavandi, “A bat-neural network multi-agent system (BNNMAS) for stock price prediction: case study of DAX stock price,” Applied Soft Computing, vol. 29, pp. 196–210, 2015.
[15] M. Iyyappan and A. Kumar, “Optimization of software package selection using cohesion measurement and complexity metric for CBSS development,” International Journal on Emerging Technology, vol. 10, no. 1, pp. 211–217, 2019.
[16] H. U. Rahman, M. Raza, P. Afsar, A. Alharbi, S. Ahmad, and H. Alyami, “Multi-criteria decision making model for application maintenance offshoring using analytic hierarchy process,” Applied Sciences, vol. 11, no. 18, p. 8550, 2021.
[17] J. T. Mentzer and R. Gomes, “Further extensions of adaptive extended exponential smoothing and comparison with the M-Competition,” Journal of the Academy of Marketing Science, vol. 22, no. 4, pp. 372–382, 1994.
[18] C. C. Tan and N. C. Beaulieu, “First-order Markov modeling for the Rayleigh fading channel,” IEEE GLOBECOM 1998 (Cat. NO. 98CH36250), vol. 6, pp. 3669–3674, 1998.
[19] A. Singh, S. Ahmad, and M. I. Haque, “Big data science and EXASOL as big data analytics tool,” International Journal of Innovative Technology and Exploring Engineering, vol. 8, no. 9S, pp. 933–937, 2019.
[20] S. Ahmad, M. M. Afzal, and A. Alharbi, “Big data analytics with fog computing in integrated cloud fog and IoT architecture for smart devices,” International Journal of Computer Science and Network Security (IJCSNS), vol. 06, pp. 171–177, 2020, https://paper.ijcsns.org/07_book/202006/20200620.pdf.

Copyright

Copyright © 2022 Iyyappan. M et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
