Abstract

Coronavirus disease 2019 (COVID-19) remains a pandemic that continues to spread around the world. In the Gulf Cooperation Council (GCC) countries, there were 1015269 confirmed COVID-19 cases, 969424 recovery cases, and 9328 deaths as of 30 Nov. 2020. This paper therefore fits several statistical models, including classical ARIMA, kth SMA-ARIMA, kth WMA-ARIMA, and kth EWMA-ARIMA, to the daily reported values of these three variables in order to study their trend and to forecast the confirmed, recovery, and death cases of the COVID-19 pandemic in the GCC countries. The data analyzed in this study cover the period from the first case of coronavirus reported in each GCC country to Jan 31, 2021. To compute the best parameter estimates, each model was fitted to 90% of the available data in each country, called the in-sample or training data, and the remaining 10% was used as the out-of-sample or testing data. The AIC was applied to the training data as the criterion for selecting the best model. Furthermore, the statistical measures RMSE and MAPE were computed on the testing data, and the model with the minimum RMSE and MAPE was selected for future forecasting. The main finding, in general, is that the two models WMA-ARIMA and EWMA-ARIMA, besides the cubic and 4th degree polynomial regression, have given better results for in-sample and out-of-sample forecasts than the classical ARIMA models in fitting the confirmed and recovery cases, while SMA-ARIMA and WMA-ARIMA were suitable to model the recovery and death cases in the GCC countries.

1. Introduction

Coronavirus disease 2019 (COVID-19) is an infectious disease caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). The first case was registered in Wuhan, China, in December 2019. It has since spread worldwide, leading to an ongoing pandemic. People all around the world have been infected with the new virus, and the people of the Gulf Cooperation Council (GCC) countries were no exception. GCC countries adopted many strict strategies and policies to face the new crisis, such as early diagnosis, isolating infected people, and social distancing. The first confirmed COVID-19 case in the GCC area was reported in UAE on Jan 29, 2020, followed by Bahrain on Feb 21, 2020, Kuwait and Oman on Feb 24, 2020, Qatar on Feb 27, 2020, and KSA on Mar 2, 2020. The confirmed, recovery, and death cases are recorded daily in each of these countries. Accordingly, there is a need for a statistical methodology that can efficiently predict morbidity cases and thereby aid decision making and logistical planning in healthcare systems for coming challenges. Statistical forecast models are used to predict the future behavior of the disease, which may in turn help to slow and contain the pandemic.

Recently, many studies of COVID-19 have been conducted in the GCC countries; some of these studies used descriptive statistics to explain the disease prevalence and incidence and to outline the prevention and control techniques applied in these countries, while others used mathematical models to analyze and predict the evolution of the disease. Alandijany et al. [1] reviewed the status of COVID-19 in GCC countries, summarized the control measures taken by each government, and highlighted some future challenges. They recommended that these countries take strict and appropriate precautions to limit the spread of infection. Sharif et al. [2] utilized the SIRD and smoothing spline regression models to predict the number of cases in the Eastern Mediterranean Region including Saudi Arabia, Iran, and Pakistan. They concluded that the cumulative infected cases were expected to grow exponentially during the study period from Jan 29 till Apr 14, 2020. Zuo et al. [3] provided a brief comparison of the COVID-19 events, involving the total confirmed cases, total deaths, total recoveries, and active cases reported in the Asian countries up to Apr 8, 2020. Moreover, they introduced a new family of statistical models and proposed a particular submodel called the flexible extended Weibull distribution. Abuhasel et al. [4] applied the classical SIR model besides the ARIMA model to forecast the prevalence and recovery rates of the COVID-19 pandemic; the two models were applied to the daily data from Mar 3 to Jun 30, 2020. Ayinde et al. [5] used statistical curve models to model the daily cumulative confirmed, discharged, and death cases in Nigeria for the period from Feb 27, 2020, until Apr 30, 2020. They concluded that the Cubic Linear Regression with AR (1) models are the best ones.

Elhassan and Gaafar [6] employed ARIMA and logistic growth models to predict the cumulative confirmed, recovery, and death cases of COVID-19 in Saudi Arabia between Mar 2, 2020, and Jun 21, 2020. They inferred that ARIMA (0,2,0), ARIMA (1,2,0), and ARIMA (1,2,3) were the most useful models to fit the cumulative confirmed, recovery, and death cases, respectively. Singh et al. [7] developed an ARIMA model for the daily confirmed cases reported in Malaysia using training data of observed cases from Jan 22 to Mar 31, 2020. Subsequently, they validated it using data on cases from Apr 1 to Apr 17, 2020, deducing that the ARIMA (0,1,0) model produced the best fit to the observed data. Ding et al. [8] analyzed the epidemic data from Feb 24 to Mar 30, 2020, in Italy, based on the ARIMA model. They selected ARIMA (2,1,0) to fit the logarithmic sequence of cumulative diagnoses. Hernandez-Matamoros et al. [9] constructed a model for 145 countries, distributed in 6 geographic regions, using the ARIMA parameters, the population per 1M people, the number of cases, and polynomial functions; the study period was until Apr 25. Dawoud [10] applied classical ARIMA together with the kth MA-ARIMA to model the COVID-19 cumulative confirmed cases in Palestine from Mar 5, 2020, through Aug 27, 2020. He inferred that ARIMA (1,2,4) and the 5th EWMA-ARIMA (2,2,3) were the best models. Duong et al. [11] applied the ARIMA model to the total daily confirmed cases worldwide from Jan 21, 2020, to Mar 16, 2020. They found that ARIMA (1,2,1) could describe and predict the epidemiological trend of COVID-19. Roy et al. [12] used ARIMA to fit COVID-19 cases from Jan 26, 2020, to May 9, 2020, in India. They selected the ARIMA (1,0,2) model to fit the sequence of diagnoses. Verma et al. [13] developed some models based on the ARIMA and FUZZY time-series methodology to forecast COVID-19 cases, mortality, and recovery in India from Mar. 2, 2020, to May 17, 2020. They deduced that the forecasts of the ARIMA and FUZZY time-series models would be useful for decision makers.

The main objective of this article is to model confirmed, recovery, and death cases of COVID-19 in the GCC countries using classical ARIMA besides the three types of kth Moving Average-ARIMA (kth MA-ARIMA): kth Simple Moving Average-ARIMA (kth SMA-ARIMA), kth Weighted Moving Average-ARIMA (kth WMA-ARIMA), and kth Exponential Weighted Moving Average-ARIMA (kth EWMA-ARIMA). The study period runs from the first case of coronavirus reported in each GCC country to Jan 31, 2021. This article's main contribution is that it is, to our knowledge, the only study that has used the classical ARIMA together with kth SMA-ARIMA, kth WMA-ARIMA, and kth EWMA-ARIMA to model the three variables, confirmed, recovery, and death cases, of COVID-19 in the GCC countries.

The organization of the paper is as follows. The next section describes the study area and data collection. Section 3 briefs the methodology used in the study. The article ends with the results and discussion in Section 4 and conclusions in Section 5.

1.1. Study Area and Data Collection

To achieve this study's objectives, all six countries within the GCC were included (Saudi Arabia, United Arab Emirates, Kuwait, Qatar, Bahrain, and Oman). The sample data consist of the daily reported COVID-19 cases for three variables in each country: confirmed, recovery, and death cases. The data cover the period from the first confirmed case of COVID-19 reported in each country to Jan 31, 2021. The data were extracted from the WHO situation reports, the Sehhty website, and Wikipedia.

2. Methodology

This paper's main goal is to model 3 variables involving daily confirmed, recovery, and death cases in GCC countries using classical ARIMA besides the three types of kth MA-ARIMA including kth SMA-ARIMA, kth WMA-ARIMA, and kth EWMA-ARIMA. Therefore, this section investigates each of these models, discussing model building and model evaluation.

2.1. ARIMA Model

The ARIMA model, which was developed by Box and Jenkins [14], is a statistical model that uses time-series data to study the trend and generate future forecasting of time-series data.

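For a given nonstationary time series $\{x_t\}$, the classical ARIMA$(p, d, q)$ model is defined as

$$\phi_p(B)\,(1 - B)^d x_t = \theta_q(B)\,\varepsilon_t, \qquad (1)$$

where $B$ is the backward shift operator ($B x_t = x_{t-1}$), $(1 - B)^d$ is the difference filter, $d$ is the number of times the series must be differenced to make the data stationary, $p$ is the order of autoregression, $q$ is the order of moving average, $\phi_p(B) = 1 - \phi_1 B - \cdots - \phi_p B^p$, $\theta_q(B) = 1 - \theta_1 B - \cdots - \theta_q B^q$, and $\varepsilon_t$ is a white-noise error term.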
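The difference filter can be illustrated with a short sketch. The function name `difference` and the toy series are hypothetical and are not taken from the study's own R or EViews code; the block only shows what applying $(1 - B)^d$ to a series means.

```python
def difference(series, d=1):
    """Apply the difference filter (1 - B)^d to a list of numbers.

    Each pass replaces x_t by x_t - x_{t-1}, so the result is d items shorter.
    """
    out = list(series)
    for _ in range(d):
        out = [out[i] - out[i - 1] for i in range(1, len(out))]
    return out


# Toy series (illustrative values only, not data from the paper).
toy = [1, 4, 9, 16, 25]
first_diff = difference(toy, d=1)   # removes a linear trend
second_diff = difference(toy, d=2)  # removes a quadratic trend
```

In this toy case the second difference is constant, which is the kind of behavior one looks for when choosing $d$.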

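The ARIMA model is a generalized model that integrates the autoregressive model AR$(p)$ and the moving average model MA$(q)$; ARIMA models that do not require differencing ($d = 0$) are considered ARMA models. Therefore, model (1) can be expressed in terms of the autoregressive part, the moving average of the residuals, and a combination of them as

$$x_t = \phi_1 x_{t-1} + \cdots + \phi_p x_{t-p} + \varepsilon_t, \qquad (2)$$

$$x_t = \varepsilon_t - \theta_1 \varepsilon_{t-1} - \cdots - \theta_q \varepsilon_{t-q}, \qquad (3)$$

$$x_t = \phi_1 x_{t-1} + \cdots + \phi_p x_{t-p} + \varepsilon_t - \theta_1 \varepsilon_{t-1} - \cdots - \theta_q \varepsilon_{t-q}. \qquad (4)$$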

2.2. The kth Moving Average-ARIMA Time-Series Models

The kth moving average ARIMA model technique (kth MA-ARIMA) was proposed by Shih and Tsokos [15] and Tsokos [16]. This model is based on modifying a given time series into a new k-time moving average time series and then developing the autoregressive integrated moving average (ARIMA) model using the Box and Jenkins method. Once the new time series' forecasting model is built, a back-shift operator is applied to obtain estimates of the original phenomenon. The kth MA-ARIMA involves kth SMA-ARIMA, kth WMA-ARIMA, and kth EWMA-ARIMA.

2.3. The kth SMA-ARIMA Model

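The kth SMA-ARIMA process $\{y_t\}$ of a time series $\{x_t\}$ and the corresponding back-shift operator are defined, respectively, by

$$y_t = \frac{1}{k} \sum_{j=0}^{k-1} x_{t-k+1+j}, \qquad (5)$$

$$x_t = k\,y_t - x_{t-1} - x_{t-2} - \cdots - x_{t-k+1}. \qquad (6)$$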
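As a minimal sketch, assuming the kth simple moving average is the mean of the $k$ most recent observations and the back-shift operator inverts that mean, the transform and its inverse could be written as follows. The function names and the toy series are hypothetical.

```python
def sma_transform(x, k):
    """k-th simple moving average: y_t = (x_{t-k+1} + ... + x_t) / k.

    Returns one value for each t with a full window, so len(x) - k + 1 values.
    """
    return [sum(x[t - k + 1 : t + 1]) / k for t in range(k - 1, len(x))]


def sma_backshift(y_t, prev_x, k):
    """Recover x_t from y_t and the k - 1 observations that precede it."""
    tail = prev_x[len(prev_x) - (k - 1):]  # x_{t-k+1}, ..., x_{t-1}
    return k * y_t - sum(tail)


# Toy daily counts (illustrative values only).
x = [10, 12, 15, 11, 18]
y = sma_transform(x, k=3)
recovered_last = sma_backshift(y[-1], x[:-1], k=3)
```

The round trip returns the last original observation, which is the property that lets forecasts of the smoothed series be mapped back to the original scale.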

2.4. The kth WMA-ARIMA Model

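The kth WMA-ARIMA process $\{z_t\}$ of a time series $\{x_t\}$ and the corresponding back-shift operator are given, respectively, as

$$z_t = \frac{1}{k(k+1)/2} \sum_{j=0}^{k-1} (j+1)\, x_{t-k+1+j}, \qquad (7)$$

$$x_t = \frac{1}{k} \left[ \frac{k(k+1)}{2}\, z_t - (k-1)\,x_{t-1} - (k-2)\,x_{t-2} - \cdots - x_{t-k+1} \right]. \qquad (8)$$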
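A sketch of the weighted variant is given below under the assumption of linearly increasing weights $1, 2, \ldots, k$ (the newest observation receives weight $k$), normalized by $k(k+1)/2$. This weighting scheme and the function names are assumptions made for illustration.

```python
def wma_transform(x, k):
    """k-th weighted moving average with linear weights 1..k (newest = k)."""
    denom = k * (k + 1) / 2
    return [
        sum((j + 1) * x[t - k + 1 + j] for j in range(k)) / denom
        for t in range(k - 1, len(x))
    ]


def wma_backshift(z_t, prev_x, k):
    """Recover x_t from z_t and the k - 1 observations that precede it."""
    denom = k * (k + 1) / 2
    tail = prev_x[len(prev_x) - (k - 1):]  # x_{t-k+1}, ..., x_{t-1}
    older = sum((j + 1) * value for j, value in enumerate(tail))
    return (denom * z_t - older) / k


# Toy daily counts (illustrative values only).
x = [10, 12, 15, 11, 18]
z = wma_transform(x, k=3)
recovered_last = wma_backshift(z[-1], x[:-1], k=3)
```

Compared with the simple moving average, the most recent day dominates each window, so the transformed series tracks turning points more closely.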

2.5. The kth EWMA-ARIMA Model

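The kth EWMA-ARIMA process $\{v_t\}$ of a time series $\{x_t\}$ and the corresponding back-shift operator are computed, respectively, as

$$v_t = \frac{\sum_{j=0}^{k-1} (1-\alpha)^j\, x_{t-j}}{\sum_{j=0}^{k-1} (1-\alpha)^j}, \qquad (9)$$

$$x_t = v_t \sum_{j=0}^{k-1} (1-\alpha)^j - (1-\alpha)\,x_{t-1} - (1-\alpha)^2 x_{t-2} - \cdots - (1-\alpha)^{k-1} x_{t-k+1}, \qquad (10)$$

where $\alpha = 2/(k+1)$ is the smoothing factor.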
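The exponential variant can be sketched in the same way. The block assumes geometric weights $(1-\alpha)^j$ on $x_{t-j}$ and, as a default, the smoothing factor $\alpha = 2/(k+1)$; both the default and the function names are assumptions for illustration and should be checked against the definition actually used.

```python
def ewma_weights(k, alpha=None):
    """Weights (1 - alpha)^j for j = 0..k-1; weight j applies to x_{t-j}."""
    if alpha is None:
        alpha = 2 / (k + 1)  # assumed default smoothing factor
    return [(1 - alpha) ** j for j in range(k)]


def ewma_transform(x, k, alpha=None):
    """k-th exponentially weighted moving average over a window of length k."""
    w = ewma_weights(k, alpha)
    total = sum(w)
    return [
        sum(w[j] * x[t - j] for j in range(k)) / total
        for t in range(k - 1, len(x))
    ]


def ewma_backshift(v_t, prev_x, k, alpha=None):
    """Recover x_t from v_t and the k - 1 observations that precede it."""
    w = ewma_weights(k, alpha)
    older = sum(w[j] * prev_x[-j] for j in range(1, k))  # prev_x[-1] is x_{t-1}
    return v_t * sum(w) - older


# Toy daily counts (illustrative values only).
x = [10, 12, 15, 11, 18]
v = ewma_transform(x, k=3)
recovered_last = ewma_backshift(v[-1], x[:-1], k=3)
```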

2.6. Model Selection Criteria

Model selection criteria are rules used to select a statistical model among a set of candidate models based on the observed data. The Akaike information criterion (AIC) is a widely used model selection tool due to its computational simplicity and effective performance in many modeling frameworks. The AIC is given as [17]

$$\mathrm{AIC} = -2 \ln(L) + 2m, \qquad (11)$$

where $L$ is the likelihood of the model and $m$ is the total number of estimated parameters in the model. A good model is the one that has the minimum AIC among all other models.
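The selection rule can be sketched as follows. The log-likelihood values and parameter counts in `candidates` are made up purely for illustration and are not results from this study.

```python
def aic(log_likelihood, n_params):
    """Akaike information criterion: -2 ln(L) + 2 * (number of parameters)."""
    return -2.0 * log_likelihood + 2.0 * n_params


# Hypothetical candidates: name -> (maximized log-likelihood, parameter count).
candidates = {
    "ARIMA(1,1,1)": (-120.0, 2),
    "ARIMA(2,1,3)": (-118.5, 5),
    "ARIMA(1,1,2)": (-117.0, 3),
}

scores = {name: aic(ll, m) for name, (ll, m) in candidates.items()}
best_model = min(scores, key=scores.get)  # smallest AIC wins
```

Note that the larger model is not chosen here even though its log-likelihood beats that of the smallest one, because the penalty term outweighs the gain in fit.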

2.7. Measures of Forecast Accuracy

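The most popular measures of forecast accuracy in univariate time-series data are the Root Mean Square Error (RMSE) and the Mean Absolute Percentage Error (MAPE). The RMSE and MAPE are computed as

$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{t=1}^{n} (x_t - \hat{x}_t)^2}, \qquad (12)$$

$$\mathrm{MAPE} = \frac{100}{n} \sum_{t=1}^{n} \left| \frac{x_t - \hat{x}_t}{x_t} \right|, \qquad (13)$$

where $x_t$ and $\hat{x}_t$ are the actual and predicted values at time $t$, respectively, and $t = 1, \ldots, n$ is a sequence of time points. Lower values of RMSE and MAPE indicate better calibration and, therefore, better performance.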
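Both measures are simple to compute directly. The sketch below uses hypothetical function names and toy numbers; it also raises an error when an actual value is zero, since the MAPE is undefined in that case, which can matter for daily death counts.

```python
import math


def rmse(actual, predicted):
    """Root mean square error between two equally long sequences."""
    n = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)


def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    n = len(actual)
    return 100.0 / n * sum(abs((a - p) / a) for a, p in zip(actual, predicted))


# Toy values (illustrative only).
actual = [10, 20, 30]
predicted = [12, 18, 33]
test_rmse = rmse(actual, predicted)
test_mape = mape(actual, predicted)
```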

2.8. Checking the Model’s Goodness of Fit

After the ARIMA or kth MA-ARIMA model that is considered most appropriate among the alternatives has been selected, it can be tested for goodness of fit. The model is assumed to be a good fit if the residuals behave approximately like white noise. The essential tools are the plots of the ACF and PACF of the residuals. The Box–Ljung test is a diagnostic tool used to test the lack of fit of a time-series model. This test is applied to the residuals of a time series after fitting an ARIMA or kth MA-ARIMA model to the data, and it examines the autocorrelations of the residuals. The null and alternative hypotheses for this test are as follows:
(1) $H_0$: the model does not exhibit a lack of fit, that is, there is no serial correlation among the residuals at the tested lags (the residuals are approximately white noise)
(2) $H_1$: the model exhibits a lack of fit, that is, the residuals are serially correlated
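For illustration, the test statistic $Q = n(n+2) \sum_{h=1}^{m} r_h^2 / (n-h)$, where $r_h$ is the lag-$h$ sample autocorrelation of the residuals, can be computed by hand as in the sketch below. The function name and the toy residuals are hypothetical, and the p-value step (a chi-square tail probability) is left to a statistics library.

```python
def ljung_box_q(residuals, max_lag):
    """Box-Ljung statistic Q = n (n + 2) * sum_{h=1}^{m} r_h^2 / (n - h)."""
    n = len(residuals)
    mean = sum(residuals) / n
    centered = [e - mean for e in residuals]
    denom = sum(c * c for c in centered)
    total = 0.0
    for h in range(1, max_lag + 1):
        # Lag-h sample autocorrelation of the residuals.
        r_h = sum(centered[t] * centered[t - h] for t in range(h, n)) / denom
        total += r_h ** 2 / (n - h)
    return n * (n + 2) * total


# Toy residuals that alternate in sign, so they are clearly autocorrelated.
toy_residuals = [1, -1, 1, -1, 1, -1]
q_stat = ljung_box_q(toy_residuals, max_lag=1)
# The 5% critical value of a chi-square with 1 degree of freedom is about 3.841,
# so a Q above that value would lead to rejecting the null of no autocorrelation.
```

When the residuals come from a fitted ARIMA$(p, d, q)$ model, the reference chi-square distribution is usually taken with $m - p - q$ degrees of freedom rather than $m$.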

3. Results and Discussion

This section first demonstrates summary statistics for the three variables, confirmed, recovery, and death cases, in each GCC country and then reports and discusses the results obtained from applying the ARIMA and kth MA-ARIMA models on these variables.

4. Summary Statistics for COVID-19 Confirmed, Recovery, and Death Cases

Table 1 shows summary statistics, including the mean and standard deviation, of the confirmed, recovery, and death cases of COVID-19 in the GCC countries. Table 1 also reports the prevalence of confirmed cases per 100000 population for the first four weeks.

Based on Table 1, KSA has the highest mean of confirmed cases (1095.10) with a standard deviation of 1178.20, followed by UAE, Kuwait, and Qatar, whereas Bahrain has the lowest mean (297.85) with a standard deviation of 200.24. For recovery cases, KSA has the highest mean, followed by UAE, Kuwait, Qatar, and Oman, while Bahrain has the lowest. KSA also has the highest mean of reported death cases, followed by Oman and Kuwait, while Qatar has the lowest. It can also be seen that, in the first 4 weeks of the COVID-19 outbreak, Qatar and Bahrain had the highest prevalence of confirmed cases, with 18.26 and 15.81 infected persons per 100000, respectively. In contrast, UAE and Oman had the lowest, with 0.13 and 1.08, respectively (see Figure 1).

4.1. Prediction Model for COVID-19 Confirmed, Recovery, and Death Cases

This paper uses the time series of daily COVID-19 confirmed, recovery, and death cases in each GCC country. Therefore, we have a time series presented as follows:

$$\{x_t\}, \quad t = t_0, t_0 + 1, t_0 + 2, \ldots,$$

where $x_t$ represents the confirmed, recovery, or death cases at day $t$ and $t_0$ denotes the date of the first case of COVID-19 detected in a given country. The time-series plots of the daily COVID-19 confirmed, recovery, and death cases for GCC countries are presented in Figures 2–4, respectively.

5. Prediction Method

To compute the best parameter estimates of the ARIMA, kth SMA-ARIMA, kth WMA-ARIMA, and kth EWMA-ARIMA models, these models were fitted to the first 90% of the available data in each country, called the in-sample or training data, and the remaining 10% was used as the out-of-sample or testing data. The AIC of equation (11) was applied to the training data as the criterion for selecting the best model within each of the four types of statistical models. Furthermore, the statistical measures RMSE and MAPE of equation (12) and equation (13) were computed on the testing data. The model with the minimum RMSE and MAPE among the best models was chosen for future forecasting. The calculations were performed using RStudio and EViews 10.
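A chronological 90/10 split of this kind can be sketched as below. The function name is hypothetical, and the sketch assumes the training data are the earliest observations, since shuffling would break the time ordering.

```python
def split_train_test(series, train_fraction=0.9):
    """Split a time series chronologically into training and testing parts."""
    cut = round(len(series) * train_fraction)
    return series[:cut], series[cut:]


# Toy series of 20 daily values (illustrative only).
daily_cases = list(range(100, 120))
train, test = split_train_test(daily_cases)
```

The candidate models would then be fitted on `train` and scored with RMSE and MAPE on `test`.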

5.1. ARIMA Model for COVID-19 Confirmed, Recovery, and Death Cases

To check whether the time series of daily COVID-19 confirmed, recovery, and death cases in each country were stationary, we carried out an augmented Dickey–Fuller (ADF) unit root test. The results of the ADF unit root test are given in Table 2. Based on Table 2, we conclude that all variables are stationary with constant and trend at first differences throughout the study period; therefore, the ARIMA model can be applied. After the stationarity of the confirmed, recovery, and death cases' time series in each country was established, the ARIMA model with the minimum AIC on the training data was selected for each of these 3 variables. Both forecasting evaluation measures, RMSE and MAPE, were computed using the testing data for each model. Table 3 summarizes the best ARIMA model for the confirmed, recovery, and death cases in each country and the corresponding RMSE, MAPE, and AIC.

Based on the results in Table 3, we observed that ARIMA (2,1,3), ARIMA (2,1,3), and ARIMA (1,1,2) are the best models to fit the confirmed, recovery, and death cases of COVID-19 in Saudi Arabia, respectively; these models have the minimum AICs (4356.60, 5227.94, and 1720.18) among all models. These results imply that ARIMA (2,1,3), ARIMA (2,1,3), and ARIMA (1,1,2) are more efficient than the other ARIMA models considered. Consequently, a new confirmed, recovery, and death case can be interpreted based on the current case and the most recent change of the COVID-19 trend. The remaining results of Table 3 can be interpreted in the same manner.

5.2. kth MA-ARIMA Model for COVID-19 Confirmed, Recovery, and Death Cases

We can summarize the process of developing the kth SMA-ARIMA, kth WMA-ARIMA, and kth EWMA-ARIMA models as follows:
(1) Transforming the original time series into the new one for each value of k by using equation (5), equation (7), and equation (9), respectively
(2) Checking the stationarity of the new time series using the ACF, and differencing it until stationarity is achieved
(3) Applying the classical ARIMA (p, d, q) procedure to the new time series, with the order of differencing d determined in step 2
(4) Computing the AIC for each model, and choosing the one with the smallest AIC
(5) Solving for the estimates of the original time series (fitted values) by using the back-shift operator of equation (6), equation (8), and equation (10), respectively
(6) Computing the RMSE and MAPE for each model, and choosing the one with the smallest RMSE and MAPE to be the best model for future forecasting
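Step 5 is the least obvious part of the procedure, because multi-step forecasts of the smoothed series must be converted back one day at a time. The sketch below illustrates this for the simple moving average case only, assuming the back-shift relation $x_t = k\,y_t - x_{t-1} - \cdots - x_{t-k+1}$; the function name is hypothetical, and the forecasts of the smoothed series are supplied by hand here rather than produced by a fitted ARIMA model.

```python
def sma_forecasts_to_original(history, y_forecasts, k):
    """Map forecasts of the k-th SMA series back to the original scale.

    Each recovered value is appended to the history so that later steps can
    use it, which is what makes the back-shift recursive for horizons > 1.
    """
    x = list(history)
    recovered = []
    for y_hat in y_forecasts:
        tail = x[len(x) - (k - 1):]  # the k - 1 most recent (observed or recovered) values
        x_hat = k * y_hat - sum(tail)
        recovered.append(x_hat)
        x.append(x_hat)
    return recovered


# Toy example: observed history and two hand-made forecasts of the 2nd SMA series.
history = [10, 12, 15]
y_forecasts = [13.0, 14.5]
x_forecasts = sma_forecasts_to_original(history, y_forecasts, k=2)
```

Because each step reuses earlier recovered values, forecast errors in the smoothed series accumulate along the horizon, and nothing in the recursion prevents negative values, so the output should be checked before it is reported as case counts.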

After taking the first differences of the transformed data to make it stationary, we fitted 72 models for each type of the 3 kth MA-ARIMA models [6 countries × 3 variables × 4 values of k]. The best 18 out of the 72 different combinations of the kth SMA-ARIMA, kth WMA-ARIMA, and kth EWMA-ARIMA models for forecasting the confirmed, recovery, and death cases of COVID-19, with the corresponding RMSE, MAPE, and AIC for each country, are presented in Tables 4–6, respectively.

Based on the results in Table 4, it can be concluded that the 2nd SMA-ARIMA (2,1,3), 3rd SMA-ARIMA (1,1,1), and 2nd SMA-ARIMA (3,1,1) were selected as the best models for forecasting the confirmed, recovery, and death cases of COVID-19 in Saudi Arabia, respectively. These models have the minimum AICs (3890.6, 4662.5, and 1259.24), the lowest RMSEs (23.58, 30.27, and 1.34), and the lowest MAPEs (11.19, 11.99, and 24.28) among all SMA-ARIMA models. These results imply that the selected models are more efficient than the other SMA-ARIMA models considered. Accordingly, a new confirmed, recovery, and death case can be interpreted based on the current case and the most recent change of the COVID-19 trend. The remaining results of Table 4 and the outputs in Tables 5 and 6 can be interpreted in the same manner. Table 7 lists the best forecasting models among the classical ARIMA and the kth MA-ARIMA models based on the smallest RMSE and MAPE.

After identifying the best model within the classical ARIMA and kth MA-ARIMA models for forecasting the confirmed, recovery, and death cases for each country (see Table 7), the next step is to check the pattern followed by residuals from the specific model by plotting the ACF of the residuals and conducting the Box–Ljung test to examine the goodness of fit for each model. Figures 5(a)–5(r) show ACF plots for all the best models located in Table 7, while Table 8 demonstrates the outputs of the Box–Ljung test.

By looking at the ACF plots in Figure 5, it is observed that, for the first 30 lags, most of the autocorrelations lie inside the 95% confidence bounds, indicating that the residuals behave like white noise, except for the ACFs in Figure 5(d), Figure 5(j), and Figure 5(q), which deviate slightly from randomness. The outputs of the Box–Ljung test in Table 8 confirm that no autocorrelation is left in the residuals for any model in Table 7 except the three models concerning confirmed cases in UAE and Qatar and recovery cases in Oman. For the remaining models, the null hypothesis that the residuals are white noise was not rejected, and therefore, these models exhibited goodness of fit. Thus, we conclude that each model in Table 7 has passed the required checks and is ready for forecasting except the three models 5th EWMA-ARIMA (2,1,3), 2nd EWMA-ARIMA (1,1,3), and 4th WMA-ARIMA (2,1,2) corresponding to the confirmed cases in UAE and Qatar and recovery cases in Oman, respectively. Tables 9–11 show the values of the estimated AR and MA parameters and their standard errors for the kth MA-ARIMA models that have passed the required checks and are ready for forecasting the confirmed, recovery, and death cases in each country. Tables 12–14, respectively, present the forecasts of confirmed, recovery, and death cases of COVID-19 in each country from Feb. 1, 2021, to Feb. 10, 2021 (10 values), based on the corresponding models listed in Tables 9–11. Note that, to perform the forecasting, each kth MA-ARIMA model was written in the form of equation (1), using the estimated AR and MA parameters in Tables 9–11, in conjunction with its corresponding back-shift operator. For example, the 2nd WMA-ARIMA (2,1,3), which is used to forecast COVID-19 confirmed cases in KSA, was written in the form of equation (1) in conjunction with the back-shift operator of equation (8) with $k = 2$.

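On the other hand, the suitable models for the confirmed cases in UAE and Qatar and recovery cases in Oman were the cubic regression, 4th degree polynomial regression, and cubic regression models, respectively. That is, the estimated models for confirmed cases in UAE and for recovery cases in Oman take the cubic form $\hat{x}_t = \hat{\beta}_0 + \hat{\beta}_1 t + \hat{\beta}_2 t^2 + \hat{\beta}_3 t^3$, while the estimated model for confirmed cases in Qatar takes the 4th degree form $\hat{x}_t = \hat{\beta}_0 + \hat{\beta}_1 t + \hat{\beta}_2 t^2 + \hat{\beta}_3 t^3 + \hat{\beta}_4 t^4$, where $t$ is the day index.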

Therefore, the forecast values of the confirmed cases in UAE and Qatar and of the recovery cases in Oman were computed based on the cubic, 4th degree, and cubic polynomial regression models, respectively.
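A polynomial trend of this kind can be fitted by ordinary least squares. The sketch below solves the normal equations with Gaussian elimination using only the standard library; the function names and the toy data are hypothetical, and the normal equations become ill-conditioned for long series with large day indices, so a library least-squares routine would normally be preferred in practice.

```python
def polyfit_normal_equations(t, y, degree):
    """Least-squares polynomial fit; returns [b0, b1, ..., b_degree]."""
    m = degree + 1
    # Augmented normal-equation matrix [A^T A | A^T y] for the basis 1, t, t^2, ...
    aug = [
        [sum(ti ** (r + c) for ti in t) for c in range(m)]
        + [sum((ti ** r) * yi for ti, yi in zip(t, y))]
        for r in range(m)
    ]
    # Forward elimination with partial pivoting.
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, m):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, m + 1):
                aug[r][c] -= factor * aug[col][c]
    # Back substitution.
    beta = [0.0] * m
    for r in range(m - 1, -1, -1):
        known = sum(aug[r][c] * beta[c] for c in range(r + 1, m))
        beta[r] = (aug[r][m] - known) / aug[r][r]
    return beta


def polyval(beta, t):
    """Evaluate b0 + b1 t + b2 t^2 + ... at a single time index t."""
    return sum(b * t ** i for i, b in enumerate(beta))


# Toy data generated from a known cubic (illustrative only).
days = list(range(6))
values = [1 + 2 * d - d ** 2 + 0.5 * d ** 3 for d in days]
beta = polyfit_normal_equations(days, values, degree=3)
next_day_forecast = polyval(beta, 6)
```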

6. Conclusions

Four important models including classical ARIMA, kth SMA-ARIMA, kth WMA-ARIMA, and kth EWMA-ARIMA have been considered in the prediction of the confirmed, recovery, and death cases of the novel COVID-19 pandemic in the GCC countries, and these models have been applied on the daily data from the first case reported in each country until Jan 31, 2021. To compute the best parameter estimates, each model was fitted for 90% of the available data in each country, which is called the in-sample forecast or training data, and the remaining 10% was used for the out-of-sample forecast or testing data. The AIC was utilized for the training data as a criterion method to select the best model. Moreover, the statistical measures RMSE and MAPE were applied to the testing data, and the model with the minimum RMSE and MAPE was selected for future forecasting. The main finding, in general, is that the two models WMA-ARIMA and EWMA-ARIMA, besides the cubic and 4th-degree polynomial regression models, have given better results for in-sample and out-of-sample forecasts than the classical ARIMA models in fitting the confirmed cases while SMA-ARIMA and WMA-ARIMA were suitable to model the recovery and death cases in the GCC countries.

Data Availability

The data that support the findings of this study are openly available in the following:
(1) WHO: Coronavirus Disease (COVID-19) Situation Report, https://www.who.int/emergencies/diseases/novel-coronavirus-2019/situation-reports
(2) Sehhty: Coronavirus Statistics over the world, https://sehhty.com/
(3) Wikipedia: COVID-19 pandemic, https://en.wikipedia.org/wiki/COVID-19_pandemic

No written consent has been obtained from the patients as there are no patient identifiable data included in this study.

Conflicts of Interest

The authors declare no conflicts of interest.