Abstract

In statistics, classification is the problem of identifying which of a set of categories (such as subpopulations) a new observation belongs to, based on a training set of observations (or instances) whose category membership is known. This article uses the Gaussian mixture model to model the daily closing price index in the Kingdom of Saudi Arabia over the period 1/1/2013 to 16/8/2020. The daily closing price index declined over the period, which might be an effect of the coronavirus, and its mean over the study period is about 7866.965. The closing price is the price of the last regular trade that took place during the continuous trading period. If there are no transactions in the stock during the day, the closing price is the previous day’s closing price. The closing auction period comes after the continuous trading period (from 3:00 PM to 3:10 PM), and during it investors can enter buy and sell orders. The experimental results show that, according to the BIC criterion, the best mixture model is E (equal variance) with three components. The expectation-maximization (EM) algorithm converged in 2 iterations. The data source is Tadawul KSA.

1. Introduction

The direction of a stock market index indicates the movement of the price index or the future trend of its fluctuation [1, 2]. Predicting the trend is a practical issue that heavily influences a financial trader’s decision to buy or sell an instrument [3, 4]. An accurate forecast of stock index trends can help investors find opportunities for profit in the stock exchange [5–7]. Hence, precise forecasting of stock price index trends can be extremely advantageous for investors [8]. It is therefore essential to study the extent to which the movement of the stock price index can be predicted using data from emerging markets such as the Saudi stock market (Tadawul), which dates from its inception on 6 June 2003, corresponding to 2/6/1424 AH. On March 19, 2007, the Council of Ministers approved the Saudi Stock Exchange Company “Tadawul” under Article 20 of the Capital Market Authority [3]. Over the years, the local economy has expanded considerably, as has the number of companies that need to reach a wide range of investors. By the end of 2018, total listings across the main and parallel markets had reached 262 companies and securities: 190 companies on the main market, ten firms on the parallel growth market, and 62 debt instrument issues. In addition to shares, Tadawul also lists and trades bonds, Sukuk, real estate investment funds (REITs) [9], and exchange-traded funds (ETFs). The main market index (TASI) includes all companies listed on the main Tadawul market and is one of the leading indicators trusted by investors. It reflects the performance of the companies listed on the stock market in Saudi Arabia. Tadawul also publishes several sector indices that follow the Global Industry Classification Standard (GICS).

2. Gaussian Mixture Model

In this section, we introduce mixture models. Recall that, if our observations $X_1, \ldots, X_n$ come from a mixture model with $K$ mixture components, the marginal probability distribution of $X_i$ is of the form

$$P(X_i = x) = \sum_{k=1}^{K} \pi_k \, P(X_i = x \mid Z_i = k),$$

where $Z_i \in \{1, \ldots, K\}$ is the latent variable representing the mixture component for $X_i$, $P(X_i \mid Z_i = k)$ is the mixture component, and $\pi_k = P(Z_i = k)$ is the mixture proportion representing the probability that $X_i$ belongs to the $k$th mixture component [10].
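As a small illustration of the marginal distribution described above, the following sketch evaluates a mixture density with Gaussian components. The function names and all parameter values are hypothetical and chosen only for illustration; they are not the fitted values of this study.

```python
import math


def normal_pdf(x, mean, variance):
    """Density of a normal distribution with the given mean and variance at x."""
    return math.exp(-(x - mean) ** 2 / (2.0 * variance)) / math.sqrt(2.0 * math.pi * variance)


def mixture_pdf(x, proportions, means, variances):
    """Marginal density: the proportion-weighted sum of the component densities."""
    return sum(p * normal_pdf(x, m, v) for p, m, v in zip(proportions, means, variances))


# Hypothetical two-component mixture (illustration only).
proportions = [0.3, 0.7]  # mixture proportions, must sum to 1
means = [0.0, 5.0]        # component means
variances = [1.0, 4.0]    # component variances

density_at_one = mixture_pdf(1.0, proportions, means, variances)
print(density_at_one)
```

Because the proportions sum to one and each component is a proper density, the mixture density also integrates to one.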

2.1. Expectation Maximization (EM)

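EM is the algorithm used to fit Gaussian mixture models. Let $N(\mu, \sigma^2)$ denote the probability distribution function of a normal random variable. The conditional distribution is then $X_i \mid Z_i = k \sim N(\mu_k, \sigma_k^2)$, so that the marginal distribution of $X_i$ is

$$P(X_i = x) = \sum_{k=1}^{K} P(Z_i = k)\, P(X_i = x \mid Z_i = k) = \sum_{k=1}^{K} \pi_k \, N(x; \mu_k, \sigma_k^2).$$

Similarly, the joint probability of the observations $X_1, \ldots, X_n$ is therefore

$$P(X_1 = x_1, \ldots, X_n = x_n) = \prod_{i=1}^{n} \sum_{k=1}^{K} \pi_k \, N(x_i; \mu_k, \sigma_k^2).$$

See [11] for more details. This section describes the EM algorithm, which aims to determine the maximum likelihood estimates of $\pi_k$, $\mu_k$, and $\sigma_k^2$, given a dataset of observations $\{x_1, \ldots, x_n\}$.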
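The latent-variable description above can also be read as a two-step recipe for generating data: first draw the component label, then draw the observation from that component's normal distribution. The sketch below follows that recipe; the function name `sample_mixture`, the seed, and the parameter values are assumptions made for illustration.

```python
import random


def sample_mixture(n, proportions, means, variances, seed=0):
    """Draw n (label, value) pairs from a one-dimensional Gaussian mixture."""
    rng = random.Random(seed)
    labels, values = [], []
    for _ in range(n):
        # Step 1: draw the latent component label with the mixture proportions.
        k = rng.choices(range(len(proportions)), weights=proportions)[0]
        # Step 2: draw the observation from the chosen normal component.
        x = rng.gauss(means[k], variances[k] ** 0.5)
        labels.append(k)
        values.append(x)
    return labels, values


# Hypothetical parameters (illustration only).
labels, values = sample_mixture(1000, [0.3, 0.7], [0.0, 5.0], [1.0, 4.0])
print(labels[:5], values[:5])
```

In practice only the values are observed; the labels are exactly the latent information that the EM algorithm has to work around.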

2.2. MLE of Normal Distribution

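Suppose we have $n$ observations $X_1, \ldots, X_n$ from a Gaussian distribution with an unknown mean $\mu$ and a known variance $\sigma^2$. To find the maximum likelihood estimate of $\mu$, we write down the log-likelihood $\ell(\mu)$, take its derivative with respect to $\mu$, set it equal to zero, and solve for $\mu$:

$$L(\mu) = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x_i - \mu)^2}{2\sigma^2}\right),$$

$$\ell(\mu) = \sum_{i=1}^{n} \left[\log\frac{1}{\sqrt{2\pi\sigma^2}} - \frac{(x_i - \mu)^2}{2\sigma^2}\right],$$

$$\frac{d\ell(\mu)}{d\mu} = \sum_{i=1}^{n} \frac{x_i - \mu}{\sigma^2}.$$

Setting this derivative equal to zero and solving for $\mu$, we have $\hat{\mu}_{\mathrm{MLE}} = \frac{1}{n}\sum_{i=1}^{n} x_i$. Applying the log function to the likelihood decomposes the product and eliminates the exponential function, so the MLE can be found easily.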
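The result can be checked numerically: at the sample mean, the derivative of the log-likelihood is zero and the log-likelihood is larger than at nearby values. The data and the variance below are hypothetical and serve only as a worked example.

```python
import math


def log_likelihood(mu, data, variance):
    """Gaussian log-likelihood of the mean mu for a known variance."""
    return sum(
        -0.5 * math.log(2.0 * math.pi * variance) - (x - mu) ** 2 / (2.0 * variance)
        for x in data
    )


def score(mu, data, variance):
    """Derivative of the log-likelihood with respect to mu."""
    return sum((x - mu) / variance for x in data)


data = [7.1, 6.8, 7.4, 7.0, 6.9]  # hypothetical observations
variance = 0.04                   # assumed known variance

mu_hat = sum(data) / len(data)    # closed-form MLE: the sample mean
print(mu_hat, score(mu_hat, data, variance))
```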

2.3. MLE of Gaussian Mixture Model

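Now, we attempt the same strategy for deriving the MLE of the Gaussian mixture model. Our unknown parameters are $\theta = \{\mu_1, \ldots, \mu_K, \sigma_1, \ldots, \sigma_K, \pi_1, \ldots, \pi_K\}$ and, based on the joint probability given in Section 2.1, our likelihood is

$$L(\theta \mid X_1, \ldots, X_n) = \prod_{i=1}^{n} \sum_{k=1}^{K} \pi_k \, N(x_i; \mu_k, \sigma_k^2).$$

So, our log-likelihood is

$$\ell(\theta) = \sum_{i=1}^{n} \log\left(\sum_{k=1}^{K} \pi_k \, N(x_i; \mu_k, \sigma_k^2)\right).$$

Considering the expression above, we already see a difference between this scenario and the simple setup in the preceding section. The summation over the components “blocks” the log function from being applied to the normal densities. Following the same steps as before, differentiating with respect to $\mu_k$ and setting the expression equal to zero, the result is

$$\sum_{i=1}^{n} \frac{\pi_k \, N(x_i; \mu_k, \sigma_k^2)}{\sum_{j=1}^{K} \pi_j \, N(x_i; \mu_j, \sigma_j^2)} \cdot \frac{x_i - \mu_k}{\sigma_k^2} = 0.$$

We are now stuck, because this equation cannot be solved analytically for $\mu_k$. However, an important observation can be made: if we knew the latent variables $Z_i$, we could collect all samples $X_i$ such that $Z_i = k$ and use the estimate from the preceding section to estimate $\mu_k$.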
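This observation is the idea behind the EM algorithm: since the latent labels are unknown, each iteration replaces them with their posterior probabilities under the current parameters (E-step) and then applies weighted versions of the single-Gaussian estimates (M-step). The following is a minimal sketch for one-dimensional data, not the implementation used in this study. The synthetic data, the starting values, the stopping tolerance, and the variance floor are all assumptions made for illustration.

```python
import math
import random


def normal_pdf(x, mean, variance):
    return math.exp(-(x - mean) ** 2 / (2.0 * variance)) / math.sqrt(2.0 * math.pi * variance)


def em_gmm(data, means, variances, proportions, max_iter=500, tol=1e-8):
    """Fit a one-dimensional Gaussian mixture by EM from the given starting values."""
    means, variances, proportions = list(means), list(variances), list(proportions)
    n, components = len(data), range(len(means))
    previous = -math.inf
    for iteration in range(1, max_iter + 1):
        # E-step: posterior probability of each component for each observation.
        resp, log_lik = [], 0.0
        for x in data:
            weighted = [proportions[k] * normal_pdf(x, means[k], variances[k]) for k in components]
            total = sum(weighted)
            log_lik += math.log(total)  # log-likelihood at the current parameters
            resp.append([w / total for w in weighted])
        # M-step: responsibility-weighted mean, variance, and proportion.
        for k in components:
            n_k = sum(r[k] for r in resp)
            means[k] = sum(r[k] * x for r, x in zip(resp, data)) / n_k
            spread = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, data)) / n_k
            variances[k] = max(spread, 1e-12)  # assumed floor to avoid a zero variance
            proportions[k] = n_k / n
        # Stop once the log-likelihood has stopped improving.
        if log_lik - previous < tol:
            break
        previous = log_lik
    return means, variances, proportions, log_lik, iteration


# Synthetic data from two well-separated normal components (illustration only).
rng = random.Random(0)
data = [rng.gauss(0.0, 1.0) for _ in range(300)] + [rng.gauss(6.0, 1.0) for _ in range(300)]

overall_mean = sum(data) / len(data)
overall_var = sum((x - overall_mean) ** 2 for x in data) / len(data)
fit_means, fit_vars, fit_props, fit_log_lik, n_iter = em_gmm(
    data, [min(data), max(data)], [overall_var, overall_var], [0.5, 0.5]
)
print(fit_means, fit_vars, fit_props, n_iter)
```

The sketch estimates a separate variance for each component. An equal-variance model would instead pool the weighted squared deviations over all components in the M-step.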

3. Numerical Results

The available historical data consist of the daily closing price index in the Kingdom of Saudi Arabia over the period 1/1/2013 to 16/8/2020. The data source is Tadawul KSA [12].

Figure 1 displays the daily data over the period 1/1/2013 to 16/8/2020. The data contain a great deal of information, and the proposed methodology can answer the questions of interest. Table 1 shows the summary statistics, which can be obtained quickly from any statistical software. From Table 1, the mean is 7866.965 and the standard deviation is 1126.780, which give a first impression of the data.

Table 2 compares the candidate models using BIC, which is computed from the maximized likelihood of each model. Model 3 has the smallest value, although model 2 gives a negative result.
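As an illustration of how such a comparison can be computed, the sketch below assumes the convention BIC = -2 ln L + p ln n, for which smaller values are better; some software reports the negative of this quantity, so that larger values are better. The log-likelihood values, the sample size, and the label "V" for unequal variances are hypothetical and are not taken from Table 2.

```python
import math


def gmm_parameter_count(n_components, equal_variance):
    """Free parameters of a one-dimensional Gaussian mixture."""
    n_variances = 1 if equal_variance else n_components
    # (K - 1) free proportions + K means + the variance parameter(s)
    return (n_components - 1) + n_components + n_variances


def bic(log_likelihood, n_parameters, n_observations):
    """BIC under the assumed convention: smaller is better."""
    return -2.0 * log_likelihood + n_parameters * math.log(n_observations)


# Hypothetical maximized log-likelihoods, keyed by (variance model, components).
candidates = {("E", 2): -1050.0, ("E", 3): -1020.0, ("V", 3): -1019.0}
n_observations = 500

scores = {
    key: bic(log_lik, gmm_parameter_count(key[1], key[0] == "E"), n_observations)
    for key, log_lik in candidates.items()
}
best = min(scores, key=scores.get)
print(best, scores[best])
```

With these made-up numbers, the unequal-variance model has a slightly higher likelihood, but its two extra parameters are penalized enough that the equal-variance model with three components is selected.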

Because a mixture model is used, it is useful to examine the mixing proportions of the normal components. Models 1 and 2 have the same proportion, which is larger than that of model 3 (see Table 2).

Table 3 shows the estimated means after the proposed method is applied. Model 3 gives a high mean, whereas the means of models 1 and 2 are closer to 7385; the variances are similar across the three models.

The selection criterion can also be used to identify the classes in the data. It indicates that the data contain more than one cluster, with the method giving a value of 25.436. This suggests that the data can be divided into two or three groups (see Table 4).

Figure 2 shows two clusters: the red line marks the first group and the green line marks the other. This means that the observations can be gathered into two groups. More precisely, one group in the data is going down and the other is going up. This information is not easy to obtain from the data directly.
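Cluster membership of this kind is commonly obtained by assigning each observation to the component with the largest posterior probability. The sketch below shows that rule for a two-component mixture with a common variance; the parameter values are hypothetical round numbers on the scale of the index, not the fitted values reported in the tables.

```python
import math


def normal_pdf(x, mean, variance):
    return math.exp(-(x - mean) ** 2 / (2.0 * variance)) / math.sqrt(2.0 * math.pi * variance)


def posterior(x, proportions, means, variances):
    """Posterior probability of each component given the observation x."""
    weighted = [p * normal_pdf(x, m, v) for p, m, v in zip(proportions, means, variances)]
    total = sum(weighted)
    return [w / total for w in weighted]


def classify(x, proportions, means, variances):
    """Index of the component with the largest posterior probability."""
    probs = posterior(x, proportions, means, variances)
    return max(range(len(probs)), key=probs.__getitem__)


# Hypothetical parameters (illustration only).
proportions = [0.6, 0.4]
means = [7400.0, 9200.0]
variances = [610000.0, 610000.0]  # common variance for both components

for value in (7000.0, 8300.0, 9500.0):
    print(value, classify(value, proportions, means, variances))
```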

Figure 3 plots the three models; the mixture model attains a larger maximized likelihood than models 1 and 2.

It can be seen that the proposed method gives a curve close to the empirical components. This means that the proposed method gives excellent results, even when compared with the other models (see Figures 4 and 5).

4. Conclusion

This paper used the Gaussian mixture model to classify the daily closing price index in KSA over the period 1/1/2013 to 16/8/2020 and described the problem of predicting the daily closing price index in KSA, which is a major challenge. The mean of the daily closing price index over the study period is 7866.965. The decline of the daily closing price index in KSA, which occurred last year, might have been due to the COVID-19 pandemic. The EM algorithm converged in 124 iterations; according to the Bayesian information criterion, the best mixture model is the equal-variance model with three components (see Table 2). The proportions of the three components vary between 0.261 and 0.370 (see Table 5). The means of the three components range from 7385.066 to 9232.995 (see Table 3). The common variance of the three components is 610676.444 (see Table 6). Finally, we point out that implementing such a mechanism to predict the daily closing price index in KSA is beneficial.

Data Availability

No data were used to support the findings of this study.

Conflicts of Interest

The authors declare that there are no conflicts of interest related to this article.

Acknowledgments

This work was supported by Taif University Researchers Supporting Project (no. TURSP2020/279), Taif University, Taif, Saudi Arabia.