Abstract

Artificial intelligence (AI) based business process optimization has a significant impact on a country’s economic development. We argue that using artificial neural networks in business processes helps optimize these processes, ensuring the necessary level of functioning and compliance with the foundations of sustainable development. In this paper, we propose a mathematical model that uses AI to detect outliers in the daily return of the Saudi stock market (Tadawul). An outlier is defined as a data point that deviates markedly from the rest of the observations in a data sample. Based on the Engle and Granger causality test, we selected inflation rate, repo rate, and oil prices as input variables. To build the mathematical model, we first used the Tukey method to detect outliers in the stock return data from Tadawul collected during the period from October 2011 to December 2019. In this way, we categorized the stock return data into two classes, namely, outliers and nonoutliers. These data were then used to train an artificial neural network in conjunction with the particle swarm optimization algorithm. To assess the performance of the proposed model, we employed the mean squared error function. The proposed model attains a mean squared error value of 0.05. It is capable of detecting outlier values directly from the inflation rate, repo rate, and oil prices. The proposed model can be helpful in developing and applying intelligent optimization techniques to solve problems in business processes.

1. Introduction

The Kingdom of Saudi Arabia is an active member of the Organization of Petroleum Exporting Countries that plays a significant role in the oil markets. Tadawul was launched as an informal organization during the 1970s with only 14 companies listed. In 1984, the government created a committee to develop and regulate the market. In 2003, the government established the Capital Market Authority to regulate the market where only Saudi investors were allowed to invest in the market. In 2007, Tadawul started functioning as a formal organization with nearly 200 companies listed. In addition, the investors from the Gulf Cooperation Council countries were permitted to invest in Tadawul. In 2008, the Capital Market Authority approved a new regulation that allowed non-Arab foreign investors to participate in stock trading. In 2015, the financial regulators opened Tadawul to qualified foreign investment firms. In 2018, the Saudi Financial Supervision Authority took additional steps by allowing foreign investors to own up to 49% in listed securities. These measures helped Tadawul to attract foreign investors in order to become one of the most dynamic capital markets in the region.

The Saudi stock market (Tadawul) is considered the largest financial market in developing countries [1]. The economy of the Kingdom of Saudi Arabia relies on oil as a major source of revenue, and stock market volatility depends on the fluctuation of oil prices. Saudi Arabia is also a member of the Group of Twenty (G20), an international economic cooperation forum that involves representatives from 19 countries plus the European Union. The members of the G20 account for more than 66% of the worldwide population, 75% of international trade, and 85% of the global economy [2]. Moreover, Saudi Arabia is currently implementing an economic reform package and opening up its venues to the world. These factors have exposed the Kingdom to external crises, due to which Tadawul has suffered from heavy volatility during different periods.

Stock price variations are summarized by volatility, which reflects the behavior of the stock market. Volatility indicates whether a stock price changes rapidly over time (high volatility) or slowly over time (low volatility), and it is commonly measured by the standard deviation of stock prices. Stock market volatility (reflected in data that contain outliers) is a measure of how risky stocks are and plays an important role in supporting both market practitioners and policymakers, especially in emerging markets. Volatility draws a wide variety of responses from market players: some participants see it as a chance to make money whereas others see it as a threat. Therefore, practitioners are always concerned about the behavior of stock markets. In order to safeguard financial and macroeconomic stability, economists try to reduce excessive volatility. An effective quantitative approach is therefore needed to model the volatility of a stock market in order to protect against its negative effects.

Outliers in time series data are defined as data anomalies in which observed values deviate from their expected values, and they naturally correspond to critical events [3, 4]. The problem of outlier detection has been considered in several application areas such as customized marketing, credit card fraud detection, sensor event detection, fault diagnosis in industry, weather prediction, and loan approval-related applications. Outliers may occur for several reasons, such as poor data quality or inaccurate measurements [5]. On the other hand, outliers can also carry interesting and meaningful information, represented by periods of high or low volatility, particularly in financial time series data. Detecting outlier values is beneficial since these values reflect important information in many application domains [6–8]. Similarly, in financial data and stock markets, outliers are defined as extreme points that deviate strongly from the other data points. Index and asset prices may demonstrate such behavior. Prior to modeling any financial time series, it is necessary to identify the data points that are unlikely under the fitted model assumed to have generated the data. Financial time series are frequently contaminated by outliers due to the influence of unusual and nonrepetitive events. Forecast accuracy in such situations decreases dramatically because of a carry-over effect of the outliers on the point forecast and a bias in the estimation of the model parameters.

Metaheuristic optimization algorithms are used to estimate optimal values for the parameters of optimization problems and computational models [9, 10]. They are characterized by their ability to approach the global optimum of a problem quickly, and they are easy to implement and to adapt to different problem models. Particle swarm optimization (PSO) [11], genetic algorithm (GA) [12], and prey-predator algorithm (PPA) [13] are some of the optimization algorithms widely used in many fields. Machine learning, data mining, artificial intelligence (AI), and engineering applications are some of the real-world areas where metaheuristic algorithms can be used effectively [9, 14]. Artificial neural networks (ANNs) are widely used for forecasting and classification; they can be fed with raw data and automatically construct the desired feature representation [15]. Several factors add to the importance of ANNs, the most important of which are accuracy, speed, and convergence.

There are many outlier detection methods available in the literature. In [16], a model named RBFNDDA is proposed, which combines the radial basis function network with dynamic decay adjustment to learn information from a dataset and group it in terms of prototypes. A neighborhood procedure based on rough sets is then applied to detect prototype outliers. Interested readers may refer to [17] for a more in-depth understanding of outlier detection. Moreover, ANNs have been applied to solve real-world problems in various application domains such as medical diagnosis [18], pattern recognition [19], and other related applications. An ANN learns the nonlinearity of the input-output data mapping through a topological structure that is self-adaptive and has a universal function approximation capability [20]. An ANN generates a good knowledge base that represents different patterns of data samples [21, 22]. The literature review indicates that the radial basis function neural network (RBFNN) has not been applied to detect and classify the outliers in Tadawul from inflation rate and repo rate that are collected from Saudi Authority for Statistics [23] and oil prices that are collected from Saudi Central Bank [24]. We have used RBFNN in conjunction with PSO for the classification of Saudi Arabia stock prices. The obtained results confirmed the effectiveness of our proposed technique for this task. It is worth mentioning that other representative computational intelligence algorithms can also be used for training RBFNN, such as Earthworm Optimization Algorithm (EWA) [25], Moth Search (MS) algorithm [26], Slime Mould Algorithm (SMA) [27], and Harris Hawks Optimization (HHO).

This paper is structured as follows. In Section 2, we first present a description of the data that is followed by an overview of RBFNN, PSO, and Tukey method. In Section 3, we discuss the variable selection, correlation, causality test, and the proposed model along with its performance. In Section 4, we draw the conclusion.

2. Materials and Methods

2.1. Data Description

The sample data of closing prices are collected from the stock market (Tadawul) in Saudi Arabia. The day-to-day closing prices were collected during the period from October 2011 to December 2019. The sample size is 2026 [28, 29]. Table 1 shows the descriptive statistics of the dataset. Note that the natural logarithm of standard deviation for closing stock price is indicated by LSCS. In addition, the symbols Repo and Loil are used to represent the repo rate and the logarithm of oil price, respectively. The mean and standard deviation of LSCS are 6.75 and 0.6923, respectively, and its minimum and maximum values are 3.83 and 7.22. The mean and standard deviation of Repo are 0.70 and 0.28, respectively, and its minimum and maximum values are 0.13 and 4.55. The mean and standard deviation of Loil are 4.30 and 0.35, respectively, and its minimum and maximum values are 3.33 and 4.84.

2.2. Radial Basis Function Neural Network

RBFNN is a special three-layered network consisting of an input layer, a hidden layer, and an output layer, as shown in Figure 1. Each layer consists of a set of neurons. The input values of the RBFNN are passed from the input layer to the hidden layer via the “input weights”. The output values of the hidden layer are then forwarded to the output layer via the “output weights”. In this work, all hidden neurons use the same activation function, the Gaussian function

$$\varphi_j(x) = \exp\left(-\frac{\left(\sum_{i} w_{ij}\, x_i - c_j\right)^2}{2\sigma_j^2}\right), \tag{1}$$

where $\varphi_j$ is the Gaussian function in the hidden neuron $j$, $w_{ij}$ is the input weight between the input neuron $i$ and the hidden neuron $j$, $c_j$ is the center, and $\sigma_j$ is the width. In this study, we have used three input neurons, five hidden neurons, and one output neuron (see Figure 1).
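The forward pass of such a 3-5-1 network can be sketched in a few lines of Python. This is only an illustration under stated assumptions: each hidden neuron applies the Gaussian to the weighted sum of its inputs with a scalar center and width, the output neuron is a plain weighted sum of the hidden activations, and the function name `rbf_forward` and all argument names are hypothetical.

```python
import math


def rbf_forward(x, input_weights, centers, widths, output_weights):
    """Forward pass of a small RBF network with Gaussian hidden units.

    x              : list of input values (three in this study)
    input_weights  : one list of weights per hidden neuron
    centers, widths: one scalar center and width per hidden neuron (assumed)
    output_weights : one weight per hidden neuron
    """
    hidden = []
    for w_j, c_j, s_j in zip(input_weights, centers, widths):
        # Weighted sum of the inputs feeding hidden neuron j.
        z = sum(w * xi for w, xi in zip(w_j, x))
        # Gaussian activation around the center c_j with width s_j.
        hidden.append(math.exp(-((z - c_j) ** 2) / (2.0 * s_j ** 2)))
    # Linear output neuron (an assumption of this sketch).
    return sum(v * h for v, h in zip(output_weights, hidden))
```

With all centers at zero and a zero input vector, every hidden unit outputs 1, so the network output reduces to the sum of the output weights, which is a convenient sanity check.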

2.3. Particle Swarm Optimization Algorithm

PSO, first proposed in [30], is a popular swarm intelligence-based metaheuristic technique that can solve complex mathematical problems; it is inspired by the behavior of birds flying in flocks or fish swimming in schools. Similar to evolutionary algorithms, PSO starts the optimization process with a population of randomly generated solutions that are improved with each passing generation. However, PSO does not incorporate evolutionary operators such as mutation and crossover. Particles (parameters) in PSO determine their new location by following the current optimum particle in the problem space. PSO has been found to be effective when applied to various optimization problems including artificial neural networks [31], mechanical engineering design optimization problems [32–34], and chaotic systems [35–37]. Moreover, it readily achieves high accuracy with fast convergence [38, 39]. It has been widely used in many real-life optimization problems in different domains [31, 33, 34]. In this study, we have used the PSO algorithm to determine the optimal parameter values of our RBFNN model in order to find the minimum value of the mean squared error (MSE).

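The search for the optimal solution is governed by equations (2) and (3), written here in the standard PSO form. The algorithm keeps updating the parameter values until an appropriate solution is obtained:

$$v_{ij}(t+1) = \omega\, v_{ij}(t) + c_1 r_1 \left(p_{ij}(t) - x_{ij}(t)\right) + c_2 r_2 \left(g_j(t) - x_{ij}(t)\right), \tag{2}$$

$$x_{ij}(t+1) = x_{ij}(t) + v_{ij}(t+1), \tag{3}$$

where $x_i(t)$ indicates the position of particle $i$ at time $t$ and $v_i(t)$ represents the velocity of particle $i$ at time $t$. Similarly, $j$ indicates the dimension along which the vectors $x_i$ and $v_i$ are updated at time $t$. In this standard form, $p_i(t)$ is the best position found so far by particle $i$, $g(t)$ is the best position found so far by the swarm, $\omega$ is the inertia weight, $c_1$ and $c_2$ are acceleration coefficients, and $r_1$ and $r_2$ are random numbers drawn uniformly from $[0, 1]$. Note that $i = 1, \dots, N$ and $j = 1, \dots, D$, where $N$ represents the number of swarm particles and $D$ indicates the dimension of particles. In this study, we have 15 input weight parameters, 10 hidden neuron parameters, and 5 output weight parameters, so that $D = 30$. Figure 2 presents the procedure of the PSO algorithm in this work.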
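The parameter count above (15 + 10 + 5 = 30) suggests that each particle is a flat vector holding all RBFNN parameters. The following Python sketch shows one way such a vector could be split back into network parameters. The ordering inside the vector (input weights, then centers, then widths, then output weights) and the name `decode_particle` are assumptions made for illustration; the paper does not specify a layout.

```python
N_INPUT = 3   # input neurons (inflation rate, repo rate, oil price)
N_HIDDEN = 5  # hidden neurons


def decode_particle(p):
    """Split a flat 30-value particle into RBFNN parameters.

    Assumed layout: 15 input weights, 5 centers, 5 widths, 5 output weights.
    """
    k = N_INPUT * N_HIDDEN
    if len(p) != k + 3 * N_HIDDEN:
        raise ValueError("expected a particle of length 30")
    # One row of three input weights per hidden neuron.
    input_weights = [p[j * N_INPUT:(j + 1) * N_INPUT] for j in range(N_HIDDEN)]
    centers = p[k:k + N_HIDDEN]
    widths = p[k + N_HIDDEN:k + 2 * N_HIDDEN]
    output_weights = p[k + 2 * N_HIDDEN:]
    return input_weights, centers, widths, output_weights
```

Keeping all parameters in one vector lets the optimizer treat the network as a single point in a 30-dimensional search space.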

The steps of the PSO procedure are summarized as follows (see Figure 2). PSO starts with N initial solutions (N particles), which are generated randomly. In this work, the initial values represent the parameters of the RBFNN, which include the input weights, the output weights, and the parameters of the activation function. In each iteration, the solutions evolve to create a new generation by using equations (2) and (3). Throughout the run, the algorithm memorizes the best solution found so far according to the MSE.
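The procedure can be illustrated with a compact Python sketch of a generic PSO loop that minimizes an arbitrary fitness function. It is not the authors' implementation: the inertia weight, the acceleration coefficients, the initialization range, and the name `pso_minimize` are all assumptions chosen for illustration, and the default population size and iteration count are deliberately smaller than the 80 particles and 1500 iterations reported in this study.

```python
import random


def pso_minimize(fitness, dim, n_particles=20, n_iter=200,
                 inertia=0.7, c1=1.5, c2=1.5, bound=1.0, seed=0):
    """Minimize `fitness` over `dim` dimensions with a basic PSO."""
    rng = random.Random(seed)
    # Random initial positions, zero initial velocities.
    pos = [[rng.uniform(-bound, bound) for _ in range(dim)]
           for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    # Personal bests and the swarm-wide best found so far.
    pbest = [p[:] for p in pos]
    pbest_val = [fitness(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    for _ in range(n_iter):
        for i in range(n_particles):
            for j in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Velocity update: inertia + pull toward personal and swarm best.
                vel[i][j] = (inertia * vel[i][j]
                             + c1 * r1 * (pbest[i][j] - pos[i][j])
                             + c2 * r2 * (gbest[j] - pos[i][j]))
                # Position update.
                pos[i][j] += vel[i][j]
            val = fitness(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val
```

For RBFNN training, `fitness` would map a 30-dimensional particle to the MSE of the corresponding network on the labeled data; a simple quadratic such as `lambda p: sum(v * v for v in p)` is enough to check that the loop converges.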

2.4. Tukey Method

Tukey’s boxplot is a well-known method for detecting outliers. It reveals the spread, skewness, and location of the data. The method works well for detecting outlier values when the data are symmetric [40]. It depends on the upper fence $Q_3 + 1.5\,\mathrm{IQR}$ and the lower fence $Q_1 - 1.5\,\mathrm{IQR}$, where $\mathrm{IQR}$ is the interquartile range, which is the difference between the third quartile $Q_3$ and the first quartile $Q_1$, i.e., $\mathrm{IQR} = Q_3 - Q_1$. In this study, outliers are defined as the observations that lie outside the interval $[Q_1 - 1.5\,\mathrm{IQR},\ Q_3 + 1.5\,\mathrm{IQR}]$; that is, observations that fall more than 1.5 times the interquartile range below the first quartile or above the third quartile are regarded as suspected outliers. The fence constants, which are both fixed at 1.5, are considered too liberal for detecting outliers in random normally distributed data [5, 41]. In this study, the Tukey method is used to separate outliers from nonoutliers in the stock return data. After this categorization, RBFNN in conjunction with PSO is used to develop the mathematical model that learns the features of the two categories. Once the training is complete, the proposed model is used to predict outliers.
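The fences can be computed with a few lines of Python. This sketch assumes quartiles obtained by linear interpolation between order statistics (the `"inclusive"` method of Python's `statistics.quantiles`); other quartile conventions give slightly different fences, and the function name `tukey_fences` is illustrative.

```python
import statistics


def tukey_fences(values, k=1.5):
    """Return (lower fence, upper fence) for Tukey's boxplot rule."""
    # Quartiles by linear interpolation between order statistics.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


sample = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
lower, upper = tukey_fences(sample)
print(lower, upper)  # -3.5 14.5, so only the value 100 is flagged
```

In the small sample, the quartiles are 3.25 and 7.75, the interquartile range is 4.5, and the value 100 is the only observation outside the fences.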

3. Results and Discussion

The experiments were performed using MATLAB 2019 and R software running on a 64-bit OS (Windows 8 platform) equipped with a 2.2 GHz Intel Core i7 processor and 8 GB RAM.

The proposed model is constructed across two stages (see Figure 3). In the first stage, we used the Tukey method to determine the outlier values of the stock return dataset using R software. We found that the outlier values are those that are located outside the interval (−0.01783, 0.01908), as discussed in Section 3.2. After that, we constructed the RBFNN model to detect outliers that are outside the interval (−0.01783, 0.01908) in the daily return of Saudi stock market (Tadawul). Based on the Engle and Granger Causality test, we selected inflation rate, repo rate, and oil prices as input values to the RBFNN model with the same period of stock return. The optimal values of RBFNN parameters are obtained using the PSO algorithm.

3.1. Selecting Variables

We have selected three independent variables (inflation rate, repo rate, and oil prices) based on correlation and causality tests. These macroeconomic variables have a strong effect on stock returns.

3.1.1. Correlation

In this section, we select the independent variables from a larger set of candidate variables, eliminating the others on the basis of certain tests. First, as shown in Table 2, we removed variables because of multicollinearity among the independent variables. The absence of perfect multicollinearity, that is, of an exact (nonstochastic) linear relationship between two or more independent variables, is generally referred to as no multicollinearity. We therefore dropped candidate variables that were strongly related to other independent variables. Table 2 gives the correlations between the independent and the dependent variables.
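A correlation-based screen of this kind can be sketched as follows in Python. The threshold of 0.8 for flagging a pair of candidate variables as strongly related, the toy data, and the function names are assumptions for illustration only; the study does not report the cutoff that was used.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def collinear_pairs(columns, threshold=0.8):
    """List variable pairs whose absolute correlation exceeds the threshold."""
    names = list(columns)
    pairs = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = pearson(columns[a], columns[b])
            if abs(r) > threshold:
                pairs.append((a, b, r))
    return pairs
```

One variable from each flagged pair would then be a candidate for removal before the causality test is applied.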

3.1.2. Engle and Granger Causality Test

The Engle and Granger test assesses causal relationships through cointegration. It generates residuals (errors) from a static regression. The augmented Dickey–Fuller test, or a similar test, is then applied to the residuals to check whether unit roots are present. If the time series are cointegrated, the residuals will be approximately stationary [42]. In its standard form, the associated error-correction model is

$$\Delta y_t = \alpha + \beta\, \Delta x_t + \gamma\, \mathrm{ECT}_{t-1} + \varepsilon_t,$$

where $y_t$ is the dependent variable, $x_t$ is the independent variable set, and ECT indicates error correction. The parameters are $\alpha$, $\beta$, and $\gamma$.

The null hypothesis of the Engle–Granger test ($H_0$: there is no cointegration) is rejected if the test statistic is negative and greater than 1.96 in absolute value. Rejecting the null hypothesis suggests that the dependent variable is caused by the independent variables. In this study, the results of the Engle and Granger test for the dependent and independent variables are given in Table 3. The findings indicate that, at the 5 percent significance level, the independent variables are cointegrated with the dependent variable. This outcome suggests that the dependent variable is caused by the independent variables. Therefore, we have significant evidence to include the independent variables (inflation rate, repo rate, and oil price) in our study.
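The residual-based two-step idea can be illustrated with a simplified Python sketch for a single regressor. It fits the static regression by ordinary least squares and then computes a Dickey–Fuller-type t-statistic on the residuals without lagged difference terms, so it is a reduced version of the augmented test, not the procedure used to produce Table 3. All function names are illustrative, and the appropriate critical value for the statistic should be taken from the relevant tables or a statistical package.

```python
import math


def ols_simple(x, y):
    """Intercept and slope of the static regression y = a + b * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
         / sum((xi - mx) ** 2 for xi in x))
    return my - b * mx, b


def engle_granger_stat(x, y):
    """t-statistic of rho in  d(e_t) = rho * e_{t-1} + u_t  on OLS residuals."""
    a, b = ols_simple(x, y)
    # Step 1: residuals of the static regression.
    e = [yi - a - b * xi for xi, yi in zip(x, y)]
    # Step 2: Dickey-Fuller regression (no lags, no constant) on the residuals.
    lag = e[:-1]
    diff = [e[t] - e[t - 1] for t in range(1, len(e))]
    sum_lag_sq = sum(l * l for l in lag)
    rho = sum(l * d for l, d in zip(lag, diff)) / sum_lag_sq
    resid = [d - rho * l for l, d in zip(lag, diff)]
    s2 = sum(r * r for r in resid) / (len(resid) - 1)
    return rho / math.sqrt(s2 / sum_lag_sq)
```

A strongly negative statistic indicates stationary residuals and hence evidence of cointegration; a statistic close to zero indicates that the residuals still contain a unit root.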

3.2. Outlier Detection

Referring to Figure 3, this study has two stages. In the first stage, the outlier values of the return data are obtained using the Tukey method, which is based on the evaluation of the upper and lower fences. We found that the upper fence and the lower fence values are 0.01908 and −0.01783, respectively. Therefore, any value outside this interval is an outlier value. In this study, we have labeled the two classes as outlier = 1 and nonoutlier = 0. The procedure is illustrated in Figure 4, where the return data have been sorted from smallest to largest. That is why the outliers appear on the tails: these values lie outside the two red bounding lines.
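The labeling rule with the reported fences can be written as a short Python sketch. The fence values are those stated above; the function name and the sample returns are made up for illustration.

```python
LOWER_FENCE = -0.01783  # lower fence reported in this study
UPPER_FENCE = 0.01908   # upper fence reported in this study


def label_return(r):
    """Return 1 (outlier) if r lies outside the fences, else 0 (nonoutlier)."""
    return 1 if (r < LOWER_FENCE or r > UPPER_FENCE) else 0


# Illustrative daily returns, sorted from smallest to largest as in Figure 4.
returns = sorted([0.004, -0.021, 0.019, 0.025, -0.003, 0.0])
labels = [label_return(r) for r in returns]
print(list(zip(returns, labels)))
```

After sorting, the observations labeled 1 sit at the two ends of the list, which mirrors the tails outside the bounding lines in the sorted-return plot.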

The current study examines the closing price data of Tadawul. This market was chosen for several reasons; in particular, emerging markets have an interesting history of stock market volatility. The Saudi market is an example of significant volatility due to information imbalance, random trading, and unprofessional financial analysis. In addition, investors from countries outside the Gulf Cooperation Council (GCC) are not allowed to invest in Saudi stocks. Tadawul is the largest stock market in the Middle East. From 2011 to 2019, Tadawul experienced numerous fluctuations. For example, the general index decreased to 6417.7 points in 2011 and increased to 8535 points in 2013. Market management changed the trading mechanism from SAXESS to X-Stream INET and developed an interactive multiuser system (IFSAH) to enhance the market’s efficiency and effectiveness [43]. The fluctuation of stock prices is one of the problems faced by economies around the world. Both domestic and foreign economies influence the Saudi stock market, and external financial crises are transmitted to the domestic market. One such incident occurred when the global financial crisis hit the Saudi stock market in 2008 [44].

3.3. RBFNN Model

In the second stage (see Figure 3), we defined the RBFNN architecture, whose classification performance depends on the labels generated by the Tukey method. For the stock return dataset used in this study, the network has three input neurons, five hidden neurons, and one output neuron (see Section 2.2). The input neurons accept the three variables (Inf., Repo, and oil prices), whereas the output neuron takes one of two values, either 0 or 1. Using the PSO algorithm, we designed an RBFNN model that finds the outlier values of the return data from the input values (macroeconomic variables). The RBFNN model is trained with the PSO algorithm for 20 trials of 1500 iterations each, with a population size of 80 in each trial. The best model attains an MSE value of less than 0.05, as evident from Figure 5. Therefore, this PSO-trained RBFNN (PSO-RBFNN) can be used to classify future, unseen return data of the Tadawul market.
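The MSE criterion used to compare candidate networks can be sketched in Python as follows. Converting the continuous network output into a class label with a cutoff of 0.5 is an assumption of this sketch, as are the function names and the toy numbers; the study does not state how the output is discretized.

```python
def mse(targets, outputs):
    """Mean squared error between 0/1 labels and network outputs."""
    return sum((t - o) ** 2 for t, o in zip(targets, outputs)) / len(targets)


def classify(outputs, threshold=0.5):
    """Map continuous outputs to labels: 1 = outlier, 0 = nonoutlier."""
    return [1 if o >= threshold else 0 for o in outputs]


targets = [0, 1, 0, 1]            # Tukey labels (toy example)
outputs = [0.1, 0.9, 0.2, 0.6]    # hypothetical network outputs
print(mse(targets, outputs))      # approximately 0.055
print(classify(outputs))          # [0, 1, 0, 1]
```

During training, this MSE would serve as the fitness value that the optimizer minimizes over the network parameters.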

4. Conclusions

The proposed model can support the optimization of business processes for the economic development of a country. The main purpose of this study was to develop an ANN model to detect and classify outlier values in the daily return of the Saudi stock market (Tadawul). We selected inflation rate, repo rate, and oil prices as the independent variables and the daily return data as the dependent variable, from October 2011 to December 2019. The Engle and Granger causality test indicated that these variables are cointegrated with the dependent variable. The outliers and nonoutliers in the stock return data were first categorized using the Tukey method. We observed that the values outside the interval (−0.01783, 0.01908) are outliers. We labeled the data as 1 (outlier) and 0 (nonoutlier). As a result, the proposed model is capable of detecting the outlier values of the stock return data based on the inflation rate, repo rate, and oil prices.

We trained an RBFNN classifier that effectively learned the patterns for detecting and classifying outlier values from the independent variables, without referring to the stock return data (dependent variable). We used the PSO algorithm to construct the optimal RBFNN model, whose effectiveness was evaluated with the MSE measure. The results suggest that the proposed approach can also be used to detect and classify outlier values in other stock return data.

Data Availability

The data used to support the study can be made available upon request from the first author.

Conflicts of Interest

The authors declare that they have no conflicts of interest.