Abstract

Developing reliable equity market models allows investors to make more informed decisions. A trading model can reduce the risks associated with investment and allow traders to choose the best-paying stocks. However, stock market analysis is complicated with batch processing techniques since stock prices are highly correlated. In recent years, advances in machine learning have created opportunities to combine forecasting theory with risk optimization. This study proposes a unique two-stage framework. First, the mean-variance approach is utilized to select probable stocks (portfolio construction), thereby minimizing investment risk. Second, we present an online machine learning technique, a combination of the “perceptron” and the “passive-aggressive algorithm,” to predict future stock price movements for the upcoming period. For the performance evaluation, we calculated the classification reports, AUC score, accuracy, and Hamming loss of the proposed framework on real-world datasets of 20 health sector equities from four different geographical regions. Lastly, we conduct a numerical comparison of our method’s outcomes against conventional solutions from previous studies. Our results reveal that learning-based ensemble strategies combined with portfolio selection are effective in comparison.

1. Introduction

Before the end of the twentieth century, low-frequency financial data were available for analysing and forecasting the stock market. Few professionals and academics used these low-frequency data for empirical studies, and without sufficient related data, empirical research could not succeed [1]. Due to the rapid development of science and technology, the cost of data capture and storage has been reduced dramatically, which makes it easy to record each day’s trading data related to the financial market. As a result, financial data analysis has become a prominent area of research in economics and a variety of other disciplines [2].

With the recent rapid economic expansion, the quantity of financial activities has expanded, and their fluctuating trend has also become more complex. Asset price trend forecasting is a classic and fascinating issue that has piqued the interest of numerous academics from several fields. Academic and financial researchers seek to understand stock market patterns and anticipate their growth and changes. Portfolio construction through competent stock selection has long been a critical endeavour for investors and fund managers. Portfolio enhancement and optimization have emerged among the most pressing issues in modern financial studies and investment decision-making in this era [3]. Portfolio development success is highly contingent on the future performance of financial markets. Forecasts that are realistic and exact can provide substantial investment returns while mitigating risk [4]. The prevailing economic and financial theory is the efficient market hypothesis (EMH) [5]. According to this hypothesis, forecasting the valuation of capital assets is challenging. However, according to past research, equity markets and yields can be predicted [6]. Before the invention of efficient machine learning algorithms, academics generated prediction models for research using a variety of alternative and econometric approaches [7]. Traditional statistical and econometric tools require linear models and cannot anticipate or analyse financial products until nonlinear models are turned into linear models. Many studies have proven that nonlinearities arise in financial markets and that statistical models cannot effectively handle them. With the rapid rise of AI and machine learning over the last decade, an increasing number of financial professionals have begun to build measurable models of index values and to experiment with diverse methodologies [8]. K-NNs [9], Bayes classifiers [10], decision trees [11], and SVMs [12] are presently widely used for classification tasks [13]. However, in practice, these solutions fail to function when data are collected over an extended period and storage space is limited (processing all data at once is impractical).

Due to the ever-growing volume of incoming data, such as stock market indices, sensor readings, and live coverage, online learning has become highly significant [14]. When it comes to online learning, a system should absorb more training data without having to retrain from the beginning. Traditional AI frameworks, such as supervised learning tasks, usually work in a batch learning mode: a training dataset is supplied beforehand, and the model is trained with some learning algorithm. Due to the high cost of training, this paradigm necessitates the availability of the full training set before the learning assignment, so the learning process is frequently conducted offline [15]. Besides being inefficient, with high costs in both time and space, batch learning approaches have the disadvantage of being unable to scale for a range of applications, since models frequently need to be retrained from scratch for new data. With incremental learning, a learner attempts to acquire and improve the best predictor for a model through a sequential flow of information rather than using batch classifiers. Online learning overcomes the limitations of batch learning by allowing prediction models to be updated quickly in response to current data examples. Since machine learning jobs in real-world data analytic platforms tend to involve large volumes of information arriving at high speed, online learning techniques offer a much more flexible and effective method for handling massive data inputs. In the real world, online learning can solve problems in several different application areas, just as traditional (batch) machine learning can.

Asset price and economic forecasting are among the most complex and challenging activities in finance. Most traders depend on technical, fundamental, and quantitative analysis for making forecasts or creating price signals. With the advent of AI in different fields, its rippling effect may also be seen in finance and price forecasting. As stock prices are updated every second, there is always a possibility of drift in the data distribution [16]. Continual advances in computational science and data innovation are essential to the globalization of the economy [17]. Numerous methods exist for estimating financial exchange prices, and they have long been a focus of investigation, yet we still encounter many issues and constraints in extracting more critical information. Because classic analytical approaches have apparent flaws in dealing with nonlinear difficulties, several machine learning algorithms are being used in stock exchange inquiries [18].

Financial backers can make sound decisions, increase productivity, and reduce possible losses using a model capable of forecasting the growth path of a stock’s value. As a result, accurate forecasting and stock market research have become more complex, and we must constantly improve our approaches to stock price prediction. Previously, several domestic and international researchers were dedicated to developing measurable monetary frameworks to forecast index growth. Before the advent of expert AI computations, analysts routinely used a variety of statistical approaches to create prediction models. Stock market predictions can be made with linear and nonlinear models: most linear models are based on statistics, while most nonlinear models are based on machine learning techniques [19]. In principle, both traditional financial structures and emerging AI models can estimate stock prices, but their predictive effects differ considerably [20]. Systems that improve the prediction of future outcomes through model combination and inspection are advantageous for analysts and also have excellent theoretical value [21]. In practice, historical data may be fed into financial systems to forecast future data. For instance, if the predicted stock value is greater than the end-of-day price, the model indicates that the future share price will climb, and investors may choose to retain their shares to obtain a higher return on investment [22]. If the predicted asset value is less than the day’s end price, the share price will likely fall later, suggesting a sale [23]. As a result, developing a financial framework for asset value estimation is very feasible. Furthermore, correctly forecasting asset price movements and price flow patterns provides a significant incentive for governments, listed companies, and private financial backers [24].

Recently, there has been a growth in research that looks at the path or pattern of financial market changes, and the examination is advancing by inspecting the interest in and patterns of securities exchanges. Academics have long been interested in equity market forecasting as an appealing and challenging subject. The amount of information available daily continuously increases; as a result, we are confronted with new issues in processing data to extract information and estimate its impact on asset values. There is always challenge and disagreement in determining the optimal strategy to forecast the stock market’s daily return trend. Since the aim is to anticipate the future market, the topic has a self-undermining character that has nonetheless proven fascinating and prevalent in stock market forecasting. Researchers can always discover industry secrets and analyse the market using their own methods, thanks to the fast development of machine learning models, techniques, and technologies. Machine learning models can improve their prediction performance through suitable feature selection; a poor feature selection reduces a model’s performance and produces biased outcomes. Developing a reliable forecasting technique capable of identifying risk factors and indicating favourable and unfavourable market directions is as important as appropriate feature extraction throughout the modelling procedure.

The purpose of this study was to develop an online learning framework based on machine learning that can reduce investment risk (by constructing an optimal portfolio) and make a predictive judgement regarding the direction of the selected indices. In addition, this study provides a new method for minimizing investment risk by building a framework that combines the mean-variance model for selecting stocks with minimal risk and an online learning framework for index forecasting.

This framework, in particular, has two major phases: portfolio selection and stock prediction. This study’s primary contributions are summarized as follows:

(i) Compared with previous studies on portfolio development and machine learning-based forecasting strategies in general, finding hidden features is always a top task. The suggested approach therefore uses a unique combination of features generated from the raw transaction data with little human effort.

(ii) The system is intended for real-world use. Therefore, we adopted a unique framework that combines the mean-variance model for portfolio development and the online learning model for financial market prediction.

(iii) Our experiment examined the performance of health sector equities from four different geographical regions throughout volatile and smooth trending periods, as well as robustness to financial crises and clustering. For this objective, a large amount of data was collected over a lengthy period.

The remainder of the article is organized as follows. Section 2 portrays the related work, Section 3 describes the materials and methods, and Section 4 presents our proposed framework. Section 5 reports the experimental findings and discusses the most significant observations made during our investigation. Finally, Section 6 concludes the work and outlines the future scope of our research.

2. Related Work

Investment proponents and professionals have long held that stock price movements are unpredictable. This point of view arises from the “efficient market hypothesis” (EMH), a phrase coined by Fama [25]. The nonstationary and dynamic nature of financial market data, according to Fama, makes it impossible to make predictions about the capital market [25]. According to the EMH, the market reacts immediately to new information about financial assets; as a result, it is impossible to beat the market. According to Shiller [26], as the financial sector entered the 1990s, behavioural finance came to dominate academic debate. From 1989 to 2000, Shiller’s [26] study found that fluctuations in the stock market were driven by investor mood. When Thaler [27] predicted that the Internet stock boom would collapse, he criticized the generally accepted EMH assumption that all investors behave rationally and make plausible forecasts about the future.

On the other hand, according to Shiller [26], behavioural finance argues that stock market movements are not always based on real knowledge. Shiller [26] showed that short-term stock prices are unpredictable, while long-term stock market movements are predictable. Both fundamental and technical variables are important when it comes to financial market forecasting [28, 29]. Fundamental analysis considers how much money a company has left, how many workers it employs, how the board of directors makes decisions, and what the company’s yearly report looks like [30]. It also takes into account unnatural or catastrophic events, as well as political information. The main quantities examined include GDP, CPI, and P/E ratios [31]. Stock market forecasting can benefit from a fundamental strategy that prioritizes the long term over the immediate [32]. Technical analysts [33] use trend lines and technical indicators to forecast the securities market. They can make educated guesses using mathematical algorithms and previous price data [34].

Researchers now have more resources to work with as AI techniques improve and datasets become more widely available, opening up new directions for investigation. According to Marwala and Hurwitz [35], advances in AI technologies have challenged the EMH and fuelled a need to learn from the market. A growing corpus of studies suggests that capital markets can be predicted to some extent [36, 37]. Consequently, investors have the chance to minimize their losses while maximizing their earnings when dealing with the stock market [38]. Recent research distinguishes two approaches: statistical and machine learning [39].

Statistical techniques were utilized before machine learning to analyse market trends and to analyse and forecast stocks. Several statistical models have been employed to assess the financial market [40–42]. Traditional statistical approaches have struggled tremendously, and machine learning approaches have been developed to circumvent their drawbacks [43]. Numerous machine learning algorithms have been used to anticipate the stock market [44–49]. Prior research has established that machine learning techniques outperform all others in predicting stock market directionality [50]. Traditional models are less flexible than AI approaches [51, 52]. Several machine learning algorithms have been investigated in the past [53, 54]. Some examples are logistic regression, SVM, K-NN, random forests, decision trees [34, 37, 40], and neural networks [37, 38]. As described in the literature, SVM and ANN are the most frequently used algorithms for stock market forecasting. A long-term financial market forecasting classification system was proposed by Milosevic et al. [55]. They deem a stock excellent if its value improves by 10% in a fiscal year and bad otherwise. Eleven fundamental ratios were extracted during the model-building process and used as input features by several algorithms. They found that the random forest achieved an F score of 0.751, outperforming naive Bayes and SVM. Choudhury and Sen [56] trained a back propagation neural network and a multilayer feedforward network to forecast stock value, obtaining a regression value of 0.996 with their proposed model.

Boonpeng and Jeatrakul [57] framed stock selection as a multi-class categorization problem to determine whether a stock is a good investment. According to their findings, one-against-all neural networks beat the traditional one-against-one and classic neural network models with a 72.50% accuracy rate. According to Yang et al. [58], an effective forecasting model requires understanding the nonlinear components of stocks. Multiple machine learning models for stock market direction prediction have been created, as Ballings et al. [59] noted. In addition to datasets from European businesses, they used a range of ensemble machine learning algorithms, as well as neural networks and logistic regression. Using the random forest approach, they could predict the long-term fluctuations in the stock market on their dataset. According to Leung et al. [60–62], accurate forecasts of the growth of the stock value index are essential for creating effective trading approaches that can protect investors against the predicted dangers of the securities exchange. Even a small gain in predictive accuracy is a considerable benefit. When it comes to predicting the financial markets, machine learning techniques often fall into one of two camps: predicting the stock market using solitary machine learning algorithms or employing many models. According to a number of studies, ensemble models are more accurate than solitary forecasting models, yet only a few studies have looked into ensemble models [63].

Many ensemble approaches have been developed in machine learning platforms to improve prediction performance and reduce bias and variance [64]. The most often used algorithms for machine learning-based ensemble learning include AdaBoost [65], XGBoost [66], and GBDT [67]. In Nobre and Neves [66], an XGBoost-based binary classifier is introduced; the results demonstrate that the framework may provide greater average returns. Furthermore, a stock forecasting model employing technical indicators as input features was proposed by Yun et al. [68]. According to the researchers, their XGBoost models outperformed both SVM and ANN. Based on risk categorization, Basak et al. [25] created a methodology for forecasting whether the stock price will rise or fall. Employing random forest and XGBoost classifiers, the researchers demonstrated that hybrid models perform much better with the right set of indicators as input features for a classifier. Ecer et al. [63] claim that ensemble machine learning approaches are superior to individual machine learning models in terms of performance. Multilayer perceptron, genetic algorithms, and particle swarm optimization are included in two new methods proposed by Ecer et al. [63]. A total of nine technical indicators were used to train their model, yielding RMSEs of 0.732583 for MLP–PSO and 0.733063 for MLP–GA. According to the researchers, a combination of machine learning techniques can improve prediction accuracy.

Yang et al. [69] presented a feedforward network composed of many layers for Chinese stock market forecasting. Back propagation and Adam algorithms were used to train the model, and an ensemble was created using the bagging approach. The model’s performance may be enhanced by further normalizing the dataset. Wang et al. [70] constructed a combined approach that forecasts the financial markets every week using BPNN, ARIMA, and ESM. In predicting stock market direction, they found that hybrid models beat regular individual models with an accuracy of 70.16 percent. Chenglin et al. [71] proposed a model for forecasting the direction of the fiscal market. According to the researchers, mixed models, which included SVM and ARIMA, outperformed standalone models. Tiwari et al. [72] proposed a hybrid model that combines the Markov model and a decision tree to forecast the BSE, India, with an accuracy of 92.1 percent. Prasad et al. [39] investigated three algorithms, XGBoost, Kalman filters, and ARIMA, as well as two datasets, the NSE and NYSE. First, they looked at how well individual algorithms could predict and how well a hybrid model built with Kalman filters and XGBoost worked. Comparing four models, they found that the ARIMA and XGBoost models did well on both datasets, but the accuracy of the Kalman filter was not consistent. Jiayu et al. [62] developed a combined LSTM and attention mechanism, called WLSTM + attention, and demonstrated that the suggested model’s MSE fell below 0.05 on three independent measures. Moreover, they asserted that proper feature selection might enhance the model’s forecasting accuracy.

Portfolio enhancement is the distribution of wealth among different assets, in which two parameters, namely, anticipated returns and risks, are vital. The ultimate goal of investors is usually to increase returns and decrease risks; usually, as the return margin increases, the risk margin correspondingly increases. The model introduced by Markowitz [73] is popularly known as the mean-variance (MV) model, whose main objective is to solve the problems arising during portfolio optimization. The main parameters of this model are means and variances, quantified by returns and risks, respectively, which help investors strike a balance between maximizing expected return and reducing risk. Following the exploration of Markowitz’s mean-variance model, some researchers tried to develop modified versions of this model in different ways: (i) optimizing portfolio selection over multiple periods [74–76]; (ii) introducing alternative risk assessment methods, such as the safety-first model [77], the mean-semi-variance model [78], the mean absolute deviation model [79], and the mean-semi-absolute deviation model [80]; and (iii) incorporating many real-world constraints, such as cardinality constraints and transaction costs [81–84]. Nonetheless, the above studies focus on improving and extending the mean-variance model but overlook that it is essential to select high-quality assets for creating an optimal portfolio. In general, if high-quality assets are provided as input to the investment strategy process, there is greater assurance of constructing a reliable optimized portfolio. In the last few years, a few studies have attempted to make asset selection and portfolio determination models work together.

For investment decisions, Paiva et al. [85] developed a model: first, they use an SVM algorithm to classify assets, and then, they use the mean-variance model to construct a portfolio. A hybrid model proposed by Wang et al. [86] combines LSTM and Markowitz’s mean-variance model for optimized portfolio creation and asset price prediction. These investigations showed that combining stock forecasting with portfolio selection can offer a new perspective for financial analysis. So, in our current study, we use the mean-variance model for portfolio selection and determine individual assets’ contribution to our model-building process.

Machine learning strategies have been broadly utilized for classification-related issues [13], and a couple of techniques and models are discussed in the literature above. However, when we move from theoretical concepts to a practical environment, stock market data arrive continuously over a long duration, and executing on all current data at once is impossible, so these techniques do not work properly for forecasting in a real environment. As a result, online learning is becoming increasingly important in dealing with never-ending incoming data streams such as sensor data, video streaming data, and financial market indexes [14]. When it comes to online learning, a system should absorb extra training data without retraining from the start.

During the online learning process, continuous data flows arrive in sequence, and the predictive model generates a prediction on each round of the data flow. Then, according to the current data, the predictive online learning model may update its forecasting mechanism. Perceptron [87] is a basic yet effective incremental learning algorithm that has been widely researched to improve its generalization capability. Crammer et al. [88] introduced the passive-aggressive (PA) algorithm, which is faster than perceptron and sometimes shows more promise than perceptron. When a new sample arrives, the model is updated so that it suffers minimal loss on the current sample while remaining close to the previous model. A few online machine learning approaches have been developed to cope with massive streaming data. New rules may be discovered when new data arrive, while current ones may be revised or partially deleted [89]. In the training phase of traditional machine learning algorithms, each sample is considered equally valuable. However, in real-world applications, different samples should contribute to the decision boundary of participating classifiers in distinct ways [90]. A perceptron-based projection algorithm was proposed by Orabona et al. [91], in which the number of online hypotheses is limited by projecting the data into the space spanned by the primary online hypotheses rather than rejecting them. In ALMAp [92], the maximum margin hyperplane is estimated for a collection of linearly separable data.

Furthermore, SVMs have been extended in numerous iterative versions [93–95], which define a broad online optimization problem. For example, Laughter [96] introduced two families of online evolving classifiers for image classification; the classifiers are first trained on specific pre-labeled training data and then updated on newly recorded samples. In [97], a robust membership computation approach that performs extraordinarily well when confronted with noisy data was presented. However, many membership generation algorithms are designed for specific data distributions or presuppose batch delivery of training samples. Because the early phases of distribution information are erroneous, transferring such approaches directly to online learning may introduce additional issues. More significantly, when a fresh instance is obtained, the new decision boundary must be computed using the complete existing training set, which takes more time. As far as we know, very few articles have addressed this subject. As a result, a solid and efficient incremental forecasting framework is required for online stock market classification. Overall, this research line has demonstrated significant promise for incremental model parameter modification and for the excellent understandability of online learning systems in dynamically changing contexts. Table 1 shows the numerous research studies conducted based on batch learning techniques.

When we forecast the financial market using these methodologies, the literature mentioned above has some shortfalls: the techniques follow traditional batch learning, which can be improved with the help of online learning techniques. Moreover, some studies face imbalanced classification problems when multiple indices from different countries are examined. Therefore, instead of focusing on randomly selected stocks, we selected quality stocks using the mean-variance model. Accordingly, we have introduced a framework that can handle the situations outlined above.

3. Materials and Methods

Before discussing our framework, we emphasize the significant methodologies and datasets employed in our proposed framework. Online learning, or incremental learning, is a machine learning approach for sequential data in which the learner attempts to develop and demonstrate the best predictor for each new dataset. By allowing the prediction model to be modified quickly for any current data instance, online learning makes up for the weaknesses of batch learning. As we all know, stock market data always arrive sequentially and regularly, so the batch learning process suffers greatly. We therefore constructed two online learning algorithms for our experimental goals, which are briefly described below.

3.1. Perceptron

The perceptron algorithm is the oldest method for online learning. The perceptron algorithm for online binary classification is described in Algorithm 1.

(1) initialize: w_1 = 0
(2) for t = 1, 2, 3, 4, …, T do
(3)  receive the current incoming instance x_t
(4)  predict ŷ_t = sign(w_t · x_t)
(5)  receive the true class label y_t ∈ {−1, +1}
(6)  if ŷ_t ≠ y_t then
(7)   w_{t+1} = w_t + y_t x_t
(8)  end if
(9) end for

In general, if a specific margin can separate the data, the perceptron technique makes at most $(R/\gamma)^2$ errors, where the margin is specified as $\gamma = \min_t y_t(\mathbf{w}_* \cdot \mathbf{x}_t)$ for some unit-norm separating vector $\mathbf{w}_*$, and $R$ is a constant such that $\|\mathbf{x}_t\| \leq R$ for all $t$. The higher the margin $\gamma$, the narrower the error bound.

Numerous variations of perceptron algorithms have been presented in the literature. A straightforward modification is the normalized perceptron method, which varies only in its updating rule: $\mathbf{w}_{t+1} = \mathbf{w}_t + y_t \mathbf{x}_t / \|\mathbf{x}_t\|$.
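To make the update concrete, the following minimal Python sketch (our illustration, not the paper’s original code; the function and variable names are ours) implements one pass of the online perceptron of Algorithm 1, with the normalized variant as an option:

import numpy as np

def perceptron_online(stream, n_features, normalized=False):
    # One pass of the online perceptron; stream yields (x, y) with y in {-1, +1}.
    w = np.zeros(n_features)                    # step (1): w_1 = 0
    mistakes = 0
    for x, y in stream:                         # step (2): t = 1, 2, ..., T
        y_hat = 1 if w @ x >= 0 else -1         # step (4): predict sign(w_t . x_t)
        if y_hat != y:                          # step (6): update only on a mistake
            step = x / np.linalg.norm(x) if normalized else x
            w = w + y * step                    # step (7): w_{t+1} = w_t + y_t x_t
            mistakes += 1
    return w, mistakes

Feeding the rows of a feature matrix one at a time, e.g., perceptron_online(zip(X, labels), X.shape[1]), mimics the sequential arrival of trading days.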

3.2. Passive-Aggressive Classifier

When a new piece of data comes in, the model is updated to ensure that minimal loss is incurred on the new data while the updated model remains close to the existing one [15, 105, 106].

This algorithm falls under the family of first-order online learning algorithms, and it works with the principle of margin-based learning [95].

Given an instance $\mathbf{x}_t$ at round $t$, the passive-aggressive algorithm generates its update by solving the following optimization:

$\mathbf{w}_{t+1} = \arg\min_{\mathbf{w}} \tfrac{1}{2}\|\mathbf{w} - \mathbf{w}_t\|^2 \ \text{subject to} \ \ell(\mathbf{w}; (\mathbf{x}_t, y_t)) = 0,$

where $\ell(\mathbf{w}; (\mathbf{x}_t, y_t)) = \max(0, 1 - y_t(\mathbf{w} \cdot \mathbf{x}_t))$ is the hinge loss of the classifier. When the hinge loss is zero, the classifier stays unchanged (“passive”); when the loss is nonzero, the classifier updates forcefully (“aggressive”), which is why the algorithm is named “passive-aggressive” (PA) [95]. The aim of the passive-aggressive classifier is thus to update the classifier while staying close to the previous one.

In particular, PA aims to keep the updated classifier close to the previous one (“passiveness”) while ensuring that the incoming instance is correctly classified by the update (“aggressiveness”).

It is critical to recognize a significant distinction between the PA and perceptron algorithms. Perceptron updates only when a classification error occurs, whereas a PA algorithm updates aggressively whenever the loss is nonzero (even if the classification is correct). Although PA algorithms have error bounds equivalent to perceptron algorithms in principle [95], they frequently outperform perceptron considerably in practice.
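As a sketch of this behaviour (our own illustration of the PA-I variant of Crammer et al. [88]; the closed-form step size is standard, but the function name is ours):

import numpy as np

def pa_update(w, x, y, C=1.0):
    # One PA-I step: hinge loss of the current classifier on (x, y), y in {-1, +1}.
    loss = max(0.0, 1.0 - y * (w @ x))
    if loss == 0.0:
        return w                      # passive: margin satisfied, no change
    tau = min(C, loss / (x @ x))      # aggressiveness capped by C (PA-I variant)
    return w + tau * y * x            # smallest update that removes the loss

Unlike the perceptron, pa_update changes w whenever the hinge loss is positive, even on correctly classified examples that fall inside the margin.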

3.3. Modern Portfolio Theory

Modern portfolio theory (MPT), sometimes referred to as mean-variance analysis, is a mathematical framework for designing an asset portfolio to maximize expected return for a given level of risk. It is a formalization and extension of diversification in investing, which maintains that possessing a diverse portfolio of financial assets is less risky than owning only one type. Its fundamental premise is that an asset’s risk and return should not be evaluated in isolation but rather by its contribution to the portfolio’s overall risk and return. Asset price volatility is used as a proxy for risk. Economist Harry Markowitz [73] popularized MPT in a 1952 article, for which he was later awarded the Nobel Memorial Prize in Economic Sciences; the approach became known as the Markowitz model.

As per the following multi-objective optimization formulation, the mean-variance model maximizes profit and reduces risk simultaneously:

$\max \ \sum_{k=1}^{n} x_k \mu_k, \qquad \min \ \sum_{k=1}^{n} \sum_{l=1}^{n} x_k x_l \sigma_{kl}, \qquad \text{subject to} \ \sum_{k=1}^{n} x_k = 1, \ x_k \geq 0,$

where $\sigma_{kl}$ denotes the covariance between assets $k$ and $l$, $x_k$ denotes the fraction of the original capital invested in asset $k$, and $\mu_k$ is the expected return on asset $k$.
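Given the reconstruction above, the return and risk of a candidate weight vector can be computed directly; a minimal numpy sketch under our own naming (returns is assumed to be a DataFrame of historical asset returns, one column per stock):

import numpy as np
import pandas as pd

def portfolio_stats(returns: pd.DataFrame, x: np.ndarray):
    # x: fractions of capital per asset, summing to 1
    mu = returns.mean().to_numpy()      # expected return mu_k per asset
    sigma = returns.cov().to_numpy()    # covariance matrix sigma_kl
    exp_return = x @ mu                 # sum_k x_k mu_k
    variance = x @ sigma @ x            # sum_k sum_l x_k x_l sigma_kl
    return exp_return, variance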

3.4. Imbalanced Data Handling Techniques

Imbalanced data distributions are frequently encountered in machine learning and data science. They occur when the number of observations in one class is considerably larger or smaller than the number of observations in the other classes. Because machine learning algorithms maximize accuracy by decreasing mistakes [107], they ignore the class distribution. In more technical words, if our dataset includes an unequal data distribution, our model is more susceptible to situations in which the minority class has very little or no recall.

3.4.1. Synthetic Minority Oversampling Technique (SMOTE)

The synthetic minority oversampling technique (SMOTE) can be used to deal with unevenly distributed data. It balances the classes by synthesizing new minority class examples:

Step 1. Define the minority class set A; for each $x \in A$, find the k-nearest neighbors of x by calculating the Euclidean distance between x and every other sample in A.

Step 2. Determine the sampling rate N according to the imbalanced proportion. For each $x \in A$, construct the set $A_1$ by randomly selecting N samples (i.e., $x_1, x_2, \ldots, x_N$) from its k closest neighbors.

Step 3. For each selected neighbor $x_i \in A_1$, generate a new example using the formula $x_{\text{new}} = x + \text{rand}(0, 1) \times (x_i - x)$, where rand(0, 1) denotes a random value between 0 and 1.

SMOTE is a well-known data preparation approach for addressing the issue of class imbalance. Sun et al. [103] apply the SMOTE technique, which generates new samples by identifying the k-nearest neighbors of each minority class sample and randomly interpolating between them to achieve sample class balance before training classifiers [103]. As one of the most often used oversampling techniques for dealing with class imbalance, SMOTE creates additional minority class samples that are combined with the initial training set to form an equitable training set, hence enhancing the model’s classification performance. In our prediction model, the SMOTE approach is used to balance the original training dataset before starting the ensemble approach.
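In practice, this is nearly a one-liner with the imbalanced-learn library; a sketch under our assumed variable names (X_train and y_train are the imbalanced training features and labels):

from collections import Counter
from imblearn.over_sampling import SMOTE

smote = SMOTE(k_neighbors=5, random_state=42)       # k nearest neighbours as in Step 1
X_bal, y_bal = smote.fit_resample(X_train, y_train)
print(Counter(y_train), "->", Counter(y_bal))       # class counts before and after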

3.5. Evaluation Metrics

To assess the efficiency of the suggested model, we used performance metrics such as accuracy, the area under the receiver operating characteristic (ROC) curve (AUC), and the F score. We thus evaluated the framework’s performance using a mixture of metrics rather than a single one. The performance metrics are as follows:

$\text{Accuracy} = \dfrac{T_P + T_N}{T_P + T_N + F_P + F_N}$, $\text{Precision} = \dfrac{T_P}{T_P + F_P}$, $\text{Recall} = \dfrac{T_P}{T_P + F_N}$, $F_1 = \dfrac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$,

where $T_P$ denotes the total number of true positive values, $T_N$ the total number of true negative values, $F_P$ the total number of false-positive values, and $F_N$ the total number of false-negative values.

According to Shen and Shafiq [101], the area under the curve is an acceptable assessment metric for classification issues; as the AUC value grows, so does the model’s prediction ability.

3.5.1. Hamming Loss

In general, there is no magical metric that is best for every problem; each problem has different needs, and the metric should be optimized for them. The Hamming loss is the proportion of wrongly predicted labels relative to the total number of labels, calculated as the Hamming distance between the actual and predicted label vectors. Generally, the Hamming loss is relevant to imbalanced classification problems.
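All of the metrics above are available in scikit-learn; a sketch of how they might be computed (the names y_test, y_pred, and y_score are our illustrative placeholders):

from sklearn.metrics import (accuracy_score, classification_report,
                             hamming_loss, roc_auc_score)

print("accuracy    :", accuracy_score(y_test, y_pred))
print("AUC         :", roc_auc_score(y_test, y_score))  # y_score: decision scores
print("Hamming loss:", hamming_loss(y_test, y_pred))    # fraction of wrong labels;
                                                        # equals 1 - accuracy for
                                                        # single-label binary tasks
print(classification_report(y_test, y_pred))            # per-class precision/recall/F1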

3.6. Instrumentation and Systems Employed

We used the Python open-source environment and Google Colab for the scientific experiments of our suggested framework. We used the Scikit-learn library to access the predefined functions related to machine learning models and the TA-Lib library to compute the technical indicators used in our experiment. The complete development procedure was run on an Intel Core i5-1035G1 CPU (1.19 GHz) with 8 GB of RAM under 64-bit Windows.

3.7. Dataset

In particular, for our innovations, five different health sector equities were selected from each of four stock markets in four nations: the United Kingdom (London), Germany, France, and the United States. The equity index data are updated daily, and all datasets include each trading day’s equity indexes. Initially, we chose five high-quality companies from each nation based on their performance over time and asset size. When considering the stock market for forecasting, the nature of the dataset is random, so if we take only one dataset for our experiment, we cannot conclude that our model will also perform well on other datasets. Hence, to make the model general, we use multiple datasets. The details of the stocks are shown in Tables 2 and 3.

4. Proposed Framework

This study suggests a new technique to minimize investment risk by developing a framework that combines the mean-variance model for selecting minimal-risk stocks and a machine learning-based online learning model for stock forecasting. The proposed framework is shown in Figure 1. This framework, in particular, has two major phases: portfolio selection and stock prediction. Accordingly, our empirical research went through four stages:

(1) Initial asset selection.
(2) Developing a mean-variance model for return prediction and final stock selection for the experiment.
(3) Predictive model setup.
(4) Outcome evaluation.

The Python programming language is used to prepare the computations, Scikit-learn is used to configure and train the online predictive model, and PyPortfolioOpt is used to implement the optimization strategies for finding the most valuable stocks for investment.

Initially, we selected 20 stocks from four different geographic regions; they are enumerated in Table 2. The stock selection is based on past and current performance according to market capitalization, and the selected stocks have existed in the market for more than 20 years.

After selecting the 20 individual stocks across the geographic regions, we have to find the potential stocks that give an investor minimum loss and maximum profit; these will be taken into the final experiment. For the selection of potential stocks, we apply modern portfolio theory, sometimes referred to as the mean-variance model, which was presented by Markowitz [73].

For the selected stocks, we collected one year of historical data. From the historical prices, we extracted the mean return and covariance matrix of the stocks in each geographic region. The extracted parameters, i.e., the mean returns and the covariance matrix, were then passed to the mean-variance model using the efficient frontier optimizer. After applying this optimization technique, we obtained the potential stocks for future productivity; Tables 4–7 show the weights of each stock according to geographic region.

According to modern portfolio theory, stocks with weight values other than zero are considered potential stocks. So, from Tables 4–7, we recorded the weights of each stock and selected only the stocks with weights greater than zero. To minimize the risk factor further, we added the condition that only stock groups with a Sharpe ratio greater than 1 are retained, since such a ratio gives investors a great advantage in deciding whether to purchase or sell a particular stock. With this, the actual stocks to be carried into our experiment can be extracted.
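A sketch of this selection step with PyPortfolioOpt, the library named above (the yfinance download and the tickers shown are our illustrative assumptions, not the paper’s exact pipeline):

import yfinance as yf                                  # hypothetical data source
from pypfopt import expected_returns, risk_models
from pypfopt.efficient_frontier import EfficientFrontier

tickers = ["UNH", "NVO", "DHR"]                        # illustrative candidates
prices = yf.download(tickers, period="1y", auto_adjust=False)["Adj Close"]

mu = expected_returns.mean_historical_return(prices)   # mean returns
S = risk_models.sample_cov(prices)                     # covariance matrix

ef = EfficientFrontier(mu, S)
ef.max_sharpe()                                        # optimize for the Sharpe ratio
weights = ef.clean_weights()
_, _, sharpe = ef.portfolio_performance()

# keep assets with nonzero weight, provided the group's Sharpe ratio exceeds 1
selected = [t for t, w in weights.items() if w > 0] if sharpe > 1 else []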

After successfully selecting the potential stocks, our next objective is to create an online training approach. Before developing it, it is crucial to find the essential features that reveal the hidden truth behind each stock. Data preprocessing makes the dataset more powerful and meaningful for a machine learning model. Here, we highlight the steps.

4.1. Data Preprocessing

Earlier studies of related topics lacked explicit instructions for picking relevant input features to predict an index’s flow direction. As a result, we can confidently assert that there is hidden behaviour behind every technical feature. For example, according to Weng et al. [18], investors analyse hidden behaviour in the present circumstances to determine whether to purchase or sell. Ultimately, given indicators that evaluate the concealed performance of these input data, it is possible to anticipate the fiscal market in a unique way. As a result, we used indicators alongside other elements in our investigation to forecast asset price movement.

The raw dataset must be preprocessed once it has been received. As part of the data preprocessing, we followed these steps:

(1) In general, an index collected from an online site has specific preexisting attributes such as open, close, low, high, and so on. Therefore, we needed to deal with the null and missing values in the dataset at hand.

(2) In the second phase, we extracted eight technical indicators from the attributes stated above. In addition, our process included two additional features: the difference between the opening and closing prices of a stock on a given day, which represents growth or decline in its value, and the difference between the high and low prices, which represents the volatility of that day’s stock price.

(3) As our anticipated response variable, we created a binary feedback variable for individual trading days, i.e., Dn ∈ {0, 1}. The forecasted feedback variable on the nth day is computed as follows:

If Opn < Cln Then
 Dn = 1
Else
 Dn = 0
End If

In this case, Dn is our forecasted variable, since we used the prediction label “TREND.” On the nth day of the index’s life, Opn is its opening price and Cln is its closing price. For instance, if Dn returns a value of “1,” the fund’s value will rise, but if Dn returns a value of “0,” the fund’s value will fall. Traders and researchers use the TA-Lib library to calculate the technical indicators used in technical analysis [108]. We use the variance inflation factor (VIF) technique to find the best set of features out of the many available.
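The following sketch (our reconstruction, assuming an OHLCV DataFrame layout with columns Open, High, Low, Close, and Volume) shows how the features listed below and the TREND label can be generated with TA-Lib, and how a VIF table supports the feature screening just described:

import numpy as np
import pandas as pd
import talib
from statsmodels.stats.outliers_influence import variance_inflation_factor

def build_features(df: pd.DataFrame) -> pd.DataFrame:
    o, h, l, c, v = (df[k].to_numpy(dtype=float)
                     for k in ("Open", "High", "Low", "Close", "Volume"))
    out = pd.DataFrame(index=df.index)
    out["SAR"] = talib.SAR(h, l)                       # parabolic SAR
    out["SAREXT"] = talib.SAREXT(h, l)                 # parabolic SAR extended
    out["AROON_DN"], out["AROON_UP"] = talib.AROON(h, l)
    out["BOP"] = talib.BOP(o, h, l, c)                 # balance of power
    out["DMI"] = talib.DX(h, l, c)                     # directional movement index
    out["ADOSC"] = talib.ADOSC(h, l, c, v)             # Chaikin A/D oscillator
    out["OBV"] = talib.OBV(c, v)                       # on balance volume
    out["TRANGE"] = talib.TRANGE(h, l, c)              # true range
    out["COS"] = talib.COS(c)                          # vector cosine
    out["OPEN_CLOSE"] = o - c                          # daily growth/fall
    out["HIGH_LOW"] = h - l                            # daily volatility
    out["TREND"] = np.where(o < c, 1, 0)               # label D_n
    return out.dropna()

def vif_table(X: pd.DataFrame) -> pd.Series:
    # variance inflation factor per feature; high values flag collinear inputs
    return pd.Series([variance_inflation_factor(X.values, i)
                      for i in range(X.shape[1])], index=X.columns)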

4.2. Extracted Features
4.2.1. SAR Indicator

The parabolic SAR (stop and reverse) is an indicator used to anticipate how the price trend of a specific asset will change direction over time.

4.2.2. Parabolic SAR Extended

It is a variant of the parabolic SAR designed to be reactive at the beginning of a trend but to remain little influenced by subsequent movements that, although significant, do not change the current trend. Buy signals are generated when the indicator is above 0; sell signals are generated when the indicator is below 0.

4.2.3. Aroon Indicator

The Aroon indicator determines whether a price is trending or trading inside a range. Additionally, it can show the start of a new trend and its strength, and it assists in anticipating transitions from trading ranges to trends.

4.2.4. The Balance of Power (BOP)

The balance of power (BOP) indicator measures a price trend by evaluating the strength of buy and sell signals, determining how strongly the price moves between extraordinarily high and low levels. The BOP oscillates between −1 and 1, with positive values indicating stronger buying pressure and negative values indicating intense selling pressure. When the indicator approaches zero, the buyers’ and sellers’ strengths are equalizing.

4.2.5. The Directional Movement Index

The directional movement index (DMI) is a useful metric for cutting down the number of false signals. It analyses both the degree and the direction of a price movement; the greater the spread between the two main lines, the stronger the trend.

4.2.6. The Chaikin A/D Oscillator

This indicator applies the moving average convergence-divergence formula to the accumulation-distribution line, showing how much money flows into or out of an asset. A cross above the accumulation-distribution line means that market participants are buying more shares, securities, or contracts, which is usually a good sign.

4.2.7. OBV

On balance volume (OBV) is a straightforward indicator that uses turnover and pricing to determine buying and selling pressure. There is strong buying pressure when positive volume outnumbers negative volume, and the OBV line rises.

4.2.8. True Range

It measures the day’s trading range plus any gap from the previous day’s closing price.

4.2.9. COS

Vector cosine calculates the trigonometric cosine of each element in the input array.

4.2.10. Open

This is a preexisting feature. It shows the stock price at the start of each trading day.

4.2.11. Open-Close

This feature is the disparity between the entry and exit prices in each day’s transactions.

4.2.12. High-Low

This feature shows how volatile each trading day is: it is the gap between that day’s top and bottom price points.

4.2.13. Close

This preexisting feature shows the stock price at each day’s close.

4.2.14. Volume

This feature shows the total quantity bought and sold on each trading day.

As this is a binary classification problem (whether the stock goes up or down), and the stocks found during the data preprocessing stage come from different geographic regions, we faced a data imbalance problem. To avoid this problem, we used the SMOTE technique to balance the dataset. As part of the data processing stage, we then used a scaling approach to normalize the features fed to our model. After acquiring a balanced dataset, we divided it into two groups: 75 percent for training and 25 percent for testing. As we know, there is a chance of overfitting during ML model training, so during our practical experiment, we adopted the cross-validation technique to avoid overfitting issues.
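A sketch of this preprocessing stage under our assumed names (X holds the engineered features and y the TREND labels):

from imblearn.over_sampling import SMOTE
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y)   # 75/25 split

scaler = StandardScaler().fit(X_train)                   # fit scaling on training data only
X_train_s = scaler.transform(X_train)
X_test_s = scaler.transform(X_test)

X_bal, y_bal = SMOTE(random_state=42).fit_resample(X_train_s, y_train)  # balance classes

Applying SMOTE only to the training split keeps synthetic samples out of the test set, so the reported test metrics reflect only real trading days.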

Having discussed the demerits of offline learning models, we implemented two online learning models that allow the prediction model to be upgraded quickly for any current data instance. Consequently, online learning algorithms are substantially more efficient and scalable than traditional machine learning algorithms and can be applied to a wide range of machine learning problems encountered in actual data analytic applications. We used two online learning algorithms: one is the perceptron, and the other is the passive-aggressive classifier. Instead of developing a single predictive model for forecasting, we combine the models’ predictive capabilities and pass them to a voting classifier. Finally, the voting classifier merges the performances and builds a highly reliable online predictive forecasting model.
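A sketch of this ensemble with scikit-learn (hard voting is our assumption, since these linear online learners do not expose class probabilities; variable names continue from the preprocessing sketch above):

from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import PassiveAggressiveClassifier, Perceptron
from sklearn.model_selection import cross_val_score

ensemble = VotingClassifier(
    estimators=[("perceptron", Perceptron(max_iter=1000)),
                ("pa", PassiveAggressiveClassifier(max_iter=1000))],
    voting="hard")

print(cross_val_score(ensemble, X_bal, y_bal, cv=5).mean())  # cross-validated accuracy
ensemble.fit(X_bal, y_bal)
y_pred = ensemble.predict(X_test_s)

For truly incremental updating, each base learner’s partial_fit can be called on newly arriving samples; the VotingClassifier shown here is refit in batch for evaluation.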

5. Results and Discussion

To evaluate the suggested method's performance, we employed 10 potential indices from different geographic regions, as reflected in Table 3. In this research, we assume that optimization techniques represented by the mean-variance model are well suited to enhancing the Sharpe ratio when building a portfolio of stocks from different geographic regions. For selecting potential stocks, the primary performance measure is the Sharpe ratio, and we calculate the weights of each index using the mean-variance optimization technique; the results are reflected in Tables 4–7. To test our online learning predictive model, we selected stocks from the different geographic regions, whose performance measures are given below by region.

As shown in Table 8, our proposed model was applied to three American stock indices, namely UNH, NVO, and DHR, whose performance and accuracy measures are reported in Table 8. The recorded results show that the UNH index has a training accuracy of 99.16 and a testing accuracy of 99.60. The recorded AUC score is 99.6041, and the Hamming loss is 0.00399. The precision for class 0 is 0.99 and for class 1 is 1.00, whereas the recall for class 0 is 1.00 and for class 1 is 0.99; the f1 score is 1.00 for both classes. The NVO index has a training accuracy of 99.68 and a testing accuracy of 99.73. The recorded AUC score is 99.7322, and the Hamming loss is 0.002368. The precision for class 0 is 0.99 and for class 1 is 1.00, whereas the recall for class 0 is 1.00 and for class 1 is 0.99; the f1 score is 1.00 for both classes. The DHR index has a training accuracy of 99.47 and a testing accuracy of 99.36. The recorded AUC score is 99.3724, and the Hamming loss is 0.00635. The precision for class 0 is 0.99 and for class 1 is 1.00, whereas the recall for class 0 is 1.00 and for class 1 is 0.99; the f1 score is 0.99 for both classes. Table 9 shows the confusion matrix.

Our proposed model was applied to four French stock indices, namely CVS.F, EWL.F, MTO.F, and TN8.F, whose performance and accuracy measures are shown in Table 10. From the recorded results, we found that the CVS.F index has a training accuracy of 99.72 and a testing accuracy of 99.49. The recorded AUC score is 99.50, and the Hamming loss is 0.00508. The precision for class 0 is 0.99 and for class 1 is 1.00, whereas the recall for class 0 is 1.00 and for class 1 is 0.99; the f1 score is 0.99 for class 0 and 1.00 for class 1. The EWL.F index has a training accuracy of 99.87 percent and a testing accuracy of 99.61 percent. The recorded AUC score is 99.60, and the Hamming loss is 0.00386. The precision for class 0 is 0.99 and for class 1 is 1.00, whereas the recall for class 0 is 1.00 and for class 1 is 0.99; the f1 score is 1.00 for both classes. The MTO.F index has a training accuracy of 99.94 and a testing accuracy of 99.91. The recorded AUC score is 99.91, and the Hamming loss is 0.00086. The precision, recall, and f1 score are 1.00 for both classes. The TN8.F index has a training accuracy of 99.81 and a testing accuracy of 99.84. The recorded AUC score is 99.84, and the Hamming loss is 0.00151. The precision, recall, and f1 score are 1.00 for both classes. Table 11 shows the confusion matrix.

As shown in Tables 12 and 13, our proposed model was applied to two German stock indices, namely AFX.DE and MRK.DE, whose performance and accuracy measures are shown there. From the recorded results, we found that the AFX.DE index has a training accuracy of 99.1627 and a testing accuracy of 99.1631. The recorded AUC score is 99.15, and the Hamming loss is 0.00836. The precision for class 0 is 0.99 and for class 1 is 1.00, whereas the recall for class 0 is 1.00 and for class 1 is 0.99; the f1 score is 0.99 for both classes. For the MRK.DE index, the training accuracy is 99.4553 and the testing accuracy is 99.4771. The recorded AUC score is 99.15, and the Hamming loss is 0.00522. The precision for class 0 is 0.99 and for class 1 is 1.00, whereas the recall for class 0 is 1.00 and for class 1 is 0.99; the f1 score is 0.99 for both classes. Table 13 shows the confusion matrix.

As shown in Table 14, our proposed model was applied to one London stock index, namely DPH.L, whose performance and accuracy measures are shown in Table 14. From the recorded results, we found that the DPH.L index has a training accuracy of 98.74 and a testing accuracy of 98.40. The recorded AUC score is 98.44, and the Hamming loss is 0.01591. The precision for class 0 is 0.97 and for class 1 is 1.00, whereas the recall for class 0 is 1.00 and for class 1 is 0.97; the f1 score is 0.98 for both classes. Table 15 shows the confusion matrix.

5.1. Performance Comparison with Past Works

Table 16 compares the performance of our proposed model with that of past works.

5.2. Practical Implications

Nowadays, machine learning-based systems give investors recommendations about certain companies so that they have a basic notion and can minimize their investing losses. By analysing vast quantities of data and developing simple, widely accessible solutions that benefit everyone, not just businesses, AI has a big impact on stock exchanges. In contrast to humans, who can become overly enthusiastic about trading assets, AI makes reasoned, accurate, and impartial speculative decisions. This strategy could be used to develop new trading strategies or to manage investment portfolios by rebalancing holdings based on trend forecasts. It will help various financial institutions collect information about share prices so they can advise their clients on how to maximize earnings and reduce losses. In addition, it pushes the research community in a new direction by illustrating how online algorithms can be combined with various technical indicators and what the ramifications of modifying various parameters are [102, 113].

6. Conclusion

In this manuscript, we come up with a method that uses an ensemble-based incremental learning approach for short-term stock price forecasting that can minimize investment risk (through an optimal portfolio) and make a predictive decision about the direction of the selected stocks. Our proposed framework is composed of two models. The first is the mean-variance model, which is used to minimize the risk of individual stocks. The second is an incremental ensemble model composed of two online learning algorithms, i.e., the perceptron and the passive-aggressive classifier. Besides the historical stock market data (closing price, open price, and volume), nine indicators and two extracted features are inserted to boost the ensemble framework’s performance. Initially, we selected 20 stocks from four countries, i.e., the United States, Germany, France, and the United Kingdom; after implementing the mean-variance model, 10 indices remained for the final experiment. Since the stock market produces data continuously at regular intervals, it is always an attractive subject from a researcher’s point of view; therefore, instead of a batch learning technique, an online learning technique is used for forecasting the financial market. Our experimental bench reveals that the random selection of stock market data for forecasting is a meaningless practice for the researcher, and it should be avoided with the help of portfolio selection techniques. Our study found that our proposed online learning ensemble models perform well; Table 17 shows the average accuracy level of the indices belonging to the four geographic regions.

The study revealed that performance levels may increase when we add indicators as input features and handle the imbalanced dataset. Therefore, instead of using a batch learning-based single predictive model for forecasting, it is always better practice to use an ensemble model based on online learning to improve forecasting performance, as shown in Table 16. Regarding the runtime of this framework, training takes between 4 and 12 seconds on the different datasets; the training time increases both with the number of cross-validation folds and with the size of the data.

6.1. Limitations and Future Work

However, despite the excellent prediction performance of our proposed methodology, certain limitations may be resolved in the future. First, as our current study only predicts the direction of stocks one day in advance, it will require extension for long-term market direction predictions. Second, this study was limited to four stock exchanges; it could have been more comprehensive if it had included stock exchanges from additional countries. Finally, future research should enrich this dataset, as the study did not investigate additional information sources, such as fundamental and sentiment analysis [52, 105, 106, 112–114].

Data Availability

The data used to support the findings of this study are available from the first author upon request.

Conflicts of Interest

The authors declare no conflicts of interest.

Acknowledgments

Jana Shafi would like to thank the Deanship of Scientific Research, Prince Sattam bin Abdulaziz University, for supporting this work.