Abstract
Fitting a time series model to process data before applying a control chart to the residuals is essential for meeting the basic assumptions of statistical process control (SPC). The autoregressive integrated moving average (ARIMA) model is a well-established time series modeling approach that is used extensively for this purpose and is widely recognized for its accuracy and efficiency. Nevertheless, researchers have commented that its iterative stages are laborious and time-consuming. To address this gap, this study develops a novel time series modeling technique, termed the logarithmic return (LR) model, whose conceptual assumptions are derived from the geometric Brownian motion (GBM) law. The model was applied to real-world autocorrelated data, and the results were assessed and benchmarked against the ARIMA model. The LR model gave a mean absolute percentage error (MAPE) between 1.5851% and 3.3793% (less than 10%), which was as accurate as the ARIMA model. The running time (in seconds of CPU time) of the LR model was at least 96.2% shorter than that of the ARIMA model. The multivariate control chart constructed from the LR model also led to the same general conclusion as its ARIMA counterpart. The LR model was parsimonious, easier to compute, and faster to run than the ARIMA model. It therefore has potential as an alternative to the ARIMA model for time series modeling in SPC procedures.
1. Introduction
Over the decades, statistical process control (SPC) techniques have been implemented extensively in industrial processes to produce high-quality products [1]. They are simple and effective techniques for learning about the process history and detecting changes in the process. Among the basic SPC tools, control charts are broadly utilized because of their simplicity. Ali et al. [2] gave an overview of control charts for detecting significant variations in the process so as to deliver high-quality products. Each type of control chart has its strengths and limitations in monitoring the process target and process dispersion [3, 4].
Meaningful interpretation of control charts rests on the basic assumption of SPC that observations within samples are independent and identically normally distributed (i.i.n.d.). In many situations, however, the independence assumption is difficult to achieve [5, 6]. If this assumption is not fulfilled, the control chart may be misleading, because autocorrelation deteriorates control chart performance [6–9].
The standard practice in dealing with the autocorrelation problem is to fit a time series model to the process data before applying a control chart to the residuals [10–14]. In general, the goal of time series modeling is to obtain accurate forecasts [15]. In autocorrelated SPC, by contrast, the goal is to obtain i.i.n.d. residuals [16]. This is the main difference between forecasting and autocorrelated SPC. If the residuals are i.i.n.d., classical control charts can be applied directly to the residuals to detect process changes. For the rest of this paper, time series modeling refers to modeling for autocorrelated SPC.
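As a minimal sketch of what applying a classical chart to the residuals can look like, the snippet below computes limits for a Shewhart individuals chart from a list of residuals. The choice of an individuals chart, the function names, and the example values are illustrative assumptions and are not taken from this paper; the constant 2.66 is the usual 3/d2 factor for moving ranges of span 2.

```python
from statistics import mean


def individuals_chart_limits(residuals):
    """Return (lcl, center, ucl) for a Shewhart individuals chart.

    Assumption: the residuals are already approximately i.i.n.d.
    """
    if len(residuals) < 2:
        raise ValueError("need at least two residuals")
    center = mean(residuals)
    # The average moving range of span 2 estimates short-term variation.
    moving_ranges = [abs(b - a) for a, b in zip(residuals, residuals[1:])]
    mr_bar = mean(moving_ranges)
    # 2.66 = 3 / d2, with d2 = 1.128 for moving ranges of span 2.
    half_width = 2.66 * mr_bar
    return center - half_width, center, center + half_width


def out_of_control_points(residuals):
    """Indices of residuals that fall outside the control limits."""
    lcl, _, ucl = individuals_chart_limits(residuals)
    return [i for i, r in enumerate(residuals) if r < lcl or r > ucl]


if __name__ == "__main__":
    example = [0.1, -0.2, 0.0, 0.3, -0.1]  # made-up residuals
    print(individuals_chart_limits(example))
    print(out_of_control_points(example))
```

If the residuals are still autocorrelated, the moving-range estimate of variation is biased, which is the reason the modeling step comes first.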
One of the most widely used time series modeling techniques is the Box and Jenkins methodology, which removes autocorrelation from process data and is thus accepted as standard practice in manufacturing process control [1, 17]. The autoregressive integrated moving average (ARIMA) model, which is based on this methodology, has been employed extensively across various fields, such as production processes [18, 19] and market exchange rates [12, 20, 21]. The ARIMA model is stable and is commonly used as a benchmark for other time series methodologies.
In manufacturing practice, a good model of an autocorrelated process that provides i.i.n.d. residuals is not enough. The model is of limited use if the modeling process is not fast enough. Since quality is a continuous process [1], fast decision-making is always desirable and is considered first, ahead of the accuracy of the decision. Fast decision-making, which in turn requires a fast time series modeling process, is therefore the main concern. Moreover, the model must be as simple as possible, with few parameters, little computational effort, and reasonable accuracy.
Unfortunately, although the most widely adopted method, ARIMA modeling under the Box-Jenkins methodology, provides a rigorous and structured approach to time series modeling, it is in general computationally time-consuming [22]. Eni and Adeyeye [23] commented that ambiguous patterns in the autocorrelation function (ACF) and partial autocorrelation function (PACF) complicate the model identification procedure. Sometimes the best-fitted ARIMA model contains a large number of estimated parameters, which leads to overfitting and suggests that a simpler, more parsimonious model should be considered [24–26]. The decision as to which model is more reliable is subject to the analyst's knowledge, skill, and experience [27]. Researchers have indicated that the Box-Jenkins methodology, which comprises the three iterative stages of model identification, parameter estimation, and model validation, is laborious, time-consuming, and expensive [28]. There is no doubt that the ARIMA model is powerful. However, it is not suitable for work that requires a high-speed modeling process.
Since the most widely adopted methods are not suitable for processes requiring a short modeling time, the problem addressed in this study is to find a method that delivers the desired model as quickly as possible. An accurate model alone is not enough. The goal of this paper is to develop a model that is faster and easier to compute without compromising accuracy. Accordingly, an alternative time series modeling methodology for autocorrelated process data, which overcomes the difficulties and gaps of ARIMA modeling, is proposed in this study and coined the logarithmic return (LR) model. The proposed LR model is easy to compute and parsimonious, has a shorter running time, and produces output as accurate as that of the ARIMA model. These advantages suggest that the LR model is a significant contribution to time series methodology in particular and to the field of SPC in general. Its conceptual assumptions are derived from the geometric Brownian motion (GBM) law. The GBM law, whose underlying Brownian motion is named after Robert Brown, who described it in 1827, is commonly used in stock price prediction, and its application herein is promising. The proposed LR model is used to fit the cocoa powder dataset. The resulting residuals are assessed and benchmarked against the standard ARIMA model for accuracy, in terms of mean absolute percentage error (MAPE), and for processing speed, in terms of running time. In the subsequent section, a detailed mathematical derivation of the LR model is shown. In Section 3, the results of an exploratory study on the applicability of LR modeling are reported. An industrial application is then presented in Section 4. Lastly, conclusions are discussed in Section 5.
2. Proposed LR Modeling for Autocorrelated Process
To deal with an autocorrelated process, an ARIMA model is usually constructed to remove the effect of autocorrelation before a classical control chart is applied to the i.i.n.d. residuals. The standard methodology for developing an ARIMA model can be found in [17].
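Consider a time series process $X_t$ which follows an ARIMA($p$, $d$, $q$) model,
$$\phi_p(B)\,(1 - B)^d X_t = \theta_q(B)\,\varepsilon_t. \tag{1}$$
In equation (1), (i) $\phi_p(B) = 1 - \phi_1 B - \cdots - \phi_p B^p$ and $\theta_q(B) = 1 - \theta_1 B - \cdots - \theta_q B^q$, where $B$ is the back-shift operator such that $B^k X_t = X_{t-k}$, and (ii) the $\varepsilon_t$'s are i.i.n.d. with zero mean and constant variance $\sigma_\varepsilon^2$.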
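As a small worked illustration of the general ARIMA form, the two simplest integrated cases can be written out explicitly. The notation is an assumption made for this sketch ($X_t$ for the series, $B$ for the back-shift operator, $\varepsilon_t$ for the i.i.n.d. errors, no constant term, and the Box-Jenkins sign convention for the moving average coefficient $\theta_1$); these are not the fitted models reported later.

```latex
% ARIMA(0, 1, 0): a random walk
(1 - B) X_t = \varepsilon_t
\quad\Longleftrightarrow\quad
X_t = X_{t-1} + \varepsilon_t

% ARIMA(0, 1, 1): a random walk with a first-order moving average error
(1 - B) X_t = (1 - \theta_1 B)\,\varepsilon_t
\quad\Longleftrightarrow\quad
X_t = X_{t-1} + \varepsilon_t - \theta_1 \varepsilon_{t-1}
```

Higher orders add further lagged terms on each side, which is where the identification and estimation effort grows.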
However, this standard methodology for developing an ARIMA model is laborious and tedious. It requires sophisticated statistical skills to identify the model, estimate the parameter(s), and validate the model. Therefore, a far simpler and more parsimonious model is proposed.
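According to [16], one of the main features of autocorrelated process control is that the time series data are positive. This feature led the current study to conduct a meta-analysis, which identified the geometric Brownian motion (GBM) as one of the potential mathematical laws governing positive time series. The GBM process is the solution of the following stochastic differential equation in the sense of Itô calculus,
$$dX_t = \mu X_t\,dt + \sigma X_t\,dW_t, \tag{2}$$
where $W_t$ is a Wiener process, $\mu$ is the drift, and $\sigma$ is the volatility. To find the solution of equation (2), Itô's lemma provides a good tool (see supplementary Appendix I). If $X_0$ is the initial value, the general solution of that equation is given by
$$X_t = X_0 \exp\left\{\left(\mu - \frac{\sigma^2}{2}\right)t + \sigma W_t\right\}. \tag{3}$$
From equation (3),
$$\ln X_t = \ln X_{t-1} + \left(\mu - \frac{\sigma^2}{2}\right) + \sigma Z_t, \tag{4}$$
where $Z_t = W_t - W_{t-1}$. Since $W_t$ is a Wiener process, $Z_t$ is a standard normal random variable for all $t$. Consequently,
$$\ln X_t = \alpha + \ln X_{t-1} + \varepsilon_t, \tag{5}$$
where $\alpha = \mu - \sigma^2/2$ and $\varepsilon_t = \sigma Z_t$, for $t = 1, 2, \ldots$, is a sequence of i.i.n.d. random variables with mean zero and constant variance $\sigma^2$. Equation (5) can be considered as a special case of the more general linear regression equation
$$\ln X_t = \beta_0 + \beta_1 \ln X_{t-1} + \varepsilon_t, \tag{6}$$
where the error terms $\varepsilon_t$ are i.i.n.d. random variables and $\beta_0$ and $\beta_1$ are the regression parameters. Interestingly, if we consider the logarithmic returns
$$R_t = \ln\left(\frac{X_t}{X_{t-1}}\right), \tag{7}$$
then, under equation (5), the $R_t$ are i.i.n.d. More importantly, $\ln X_t$ is a first-order autoregressive process, AR(1),
$$\ln X_t = \beta_0 + \beta_1 \ln X_{t-1} + \varepsilon_t, \tag{8}$$
where the constant $\beta_0$ is the intercept, $\beta_1$ is the slope in the AR(1) model, and the error terms $\varepsilon_t$ are i.i.n.d. with zero mean and constant variance. Equation (8) can be written as
$$\ln\left(\frac{X_t}{X_{t-1}^{\beta_1}}\right) = \beta_0 + \varepsilon_t. \tag{9}$$
This process leads to the following time series model, which is coined as the LR model,
$$X_t = X_{t-1}^{\beta_1} \exp(\beta_0 + \varepsilon_t). \tag{10}$$
Accordingly, the fitted value of $X_t$ is
$$\hat{X}_t = X_{t-1}^{\hat{\beta}_1} \exp(\hat{\beta}_0), \tag{11}$$
where $\hat{\beta}_0$ and $\hat{\beta}_1$ are the ordinary least squares estimates of $\beta_0$ and $\beta_1$, respectively. Therefore, the residuals at time $t$ are $e_t = X_t - \hat{X}_t$. In the rest of the paper, these residuals are used in control chart construction to monitor the process.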
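The fitting step can be sketched in a few lines. The sketch assumes the LR model is fitted as an ordinary least squares regression of the logarithm of each observation on the logarithm of the previous one, with fitted values and residuals taken on the original scale, which is how the derivation above reads. The study itself used R; the Python function name `fit_lr_model` and the example values are hypothetical.

```python
import math


def fit_lr_model(series):
    """Fit ln(x_t) = b0 + b1 * ln(x_{t-1}) + error by ordinary least squares.

    Returns (b0, b1, fitted, residuals); fitted values and residuals are on
    the original scale and start at the second observation.
    """
    if len(series) < 3 or any(x <= 0 for x in series):
        raise ValueError("need at least three strictly positive observations")
    logs = [math.log(x) for x in series]
    lagged, current = logs[:-1], logs[1:]
    n = len(current)
    mean_lag = sum(lagged) / n
    mean_cur = sum(current) / n
    sxx = sum((u - mean_lag) ** 2 for u in lagged)
    sxy = sum((u - mean_lag) * (v - mean_cur) for u, v in zip(lagged, current))
    if sxx == 0:
        raise ValueError("lagged log values are constant; slope is undefined")
    b1 = sxy / sxx                 # slope estimate
    b0 = mean_cur - b1 * mean_lag  # intercept estimate
    # One-step-ahead fitted values: x_hat_t = x_{t-1}^b1 * exp(b0).
    fitted = [math.exp(b0 + b1 * u) for u in lagged]
    # Assumption: residuals are observed minus fitted on the original scale.
    residuals = [x - f for x, f in zip(series[1:], fitted)]
    return b0, b1, fitted, residuals


if __name__ == "__main__":
    data = [20.1, 20.4, 20.2, 20.6, 20.5, 20.9, 20.7]  # made-up positive series
    b0, b1, fitted, residuals = fit_lr_model(data)
    print(b0, b1)
    print(residuals)
```

Only two parameters are estimated, and both have closed-form expressions, which is the source of the parsimony and speed claimed for the model.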
Therefore, whether an autocorrelated process is governed by the GBM law can be checked using the above properties. Specifically, if the logarithmic returns of a continuous process are i.i.n.d., this indicates that the process is governed by the GBM law. To check this, the following procedure is used.
(i) Transform the original time series data $X_t$ into logarithmic returns, $R_t = \ln(X_t / X_{t-1})$.
(ii) Test whether the $R_t$ are i.i.n.d. If the tests support this, the process is treated as a GBM.
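The two-step check can be sketched as follows. This is an illustration under stated assumptions, not the procedure used in the study: independence is screened with the lag-1 sample autocorrelation against an approximate 95% white-noise band, and normality with the Anderson-Darling statistic using the small-sample correction and the approximate 5% critical value of 0.752 for the case where mean and variance are estimated. All function names are hypothetical.

```python
import math
from statistics import NormalDist, mean, stdev


def log_returns(series):
    """Step (i): logarithmic returns ln(x_t / x_{t-1})."""
    return [math.log(b / a) for a, b in zip(series, series[1:])]


def lag1_autocorrelation(values):
    """Lag-1 sample autocorrelation coefficient."""
    m = mean(values)
    denom = sum((v - m) ** 2 for v in values)
    if denom == 0:
        raise ValueError("values are constant")
    return sum((a - m) * (b - m) for a, b in zip(values, values[1:])) / denom


def anderson_darling_normal(values):
    """Corrected Anderson-Darling statistic with estimated mean and variance."""
    n = len(values)
    m, s = mean(values), stdev(values)
    if s == 0:
        raise ValueError("values are constant")
    std_normal = NormalDist()
    z = sorted((v - m) / s for v in values)
    # Clip the CDF away from 0 and 1 so the logarithms stay finite.
    cdf = [min(max(std_normal.cdf(u), 1e-12), 1 - 1e-12) for u in z]
    total = sum(
        (2 * i + 1) * (math.log(cdf[i]) + math.log(1 - cdf[n - 1 - i]))
        for i in range(n)
    )
    a2 = -n - total / n
    return a2 * (1 + 0.75 / n + 2.25 / n ** 2)


def looks_like_gbm(series):
    """Step (ii): rough screen for i.i.n.d. logarithmic returns."""
    r = log_returns(series)
    independent = abs(lag1_autocorrelation(r)) < 1.96 / math.sqrt(len(r))
    normal = anderson_darling_normal(r) < 0.752  # approximate 5% critical value
    return independent and normal


if __name__ == "__main__":
    import random

    rng = random.Random(1)
    path = [100.0]  # simulated GBM path with made-up drift and volatility
    for _ in range(200):
        path.append(path[-1] * math.exp(0.001 + 0.02 * rng.gauss(0, 1)))
    print(looks_like_gbm(path))
```

A formal analysis would replace the autocorrelation band with a proper test of independence; the structure of the check stays the same.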
If it is affirmative, then the LR model is given by the properties of GBM. According to the preliminary exploratory study on various public time series datasets, LR modeling is very promising. It might be as accurate as, or even outperform, ARIMA. In the next section, the evidence from an exploratory study is presented to support the proposed LR model.
3. Exploratory Study on Applicability of LR Model
Empirical study plays a key role in better understanding the strengths and weaknesses of the two methods used in time series modeling. For this purpose, 119 of the 151 datasets from five references are explored. The summary and its results are given in Table 1.
In the exploratory study conducted on time series model building, the 151 datasets are dominated by positive data (119 datasets), and the remainder are nonpositive data. Further investigation reveals that, when dealing with positive time series data, the GBM law can be considered a potential candidate for the law that governs the data. In other words, the data can be accurately described by an LR model as in equation (11). When LR modeling is applied to the 119 positive datasets, 85 datasets can be described by the LR model. This is an encouraging result, indicating that the LR model is very promising. To compare the accuracy of the LR and ARIMA models, their MAPEs are calculated, where a model is considered highly accurate if MAPE ≤ 10% [33]. The results presented in Table 1 imply that if a time series dataset can be described accurately by the LR model, it can also be described accurately by the ARIMA model, although the quality of the corresponding residuals might differ.
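For reference, the accuracy criterion can be written as a short function. This is a sketch of the standard MAPE definition together with the 10% threshold quoted above; the function names and the example numbers are illustrative and are not results from the study.

```python
def mape(actual, fitted):
    """Mean absolute percentage error, in percent."""
    if len(actual) != len(fitted) or not actual:
        raise ValueError("actual and fitted must be non-empty and equally long")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    return 100.0 * sum(abs((a - f) / a) for a, f in zip(actual, fitted)) / len(actual)


def is_highly_accurate(actual, fitted, threshold=10.0):
    """True when the MAPE does not exceed the threshold (10% by default)."""
    return mape(actual, fitted) <= threshold


if __name__ == "__main__":
    observed = [100.0, 200.0]   # made-up observations
    predicted = [90.0, 210.0]   # made-up fitted values
    print(mape(observed, predicted))               # 7.5
    print(is_highly_accurate(observed, predicted)) # True
```

Because the error is scaled by the observed value, MAPE is only meaningful for series that stay away from zero, which fits the positive data considered here.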
Besides accuracy, the running time of a time series modeling process is vital. In our exploratory study on the 85 datasets shown in Table 1, the average running times of the LR model (0.16 seconds of CPU time) and the ARIMA model (5.54 seconds of CPU time) were computed in R. The running time of the LR model is clearly shorter than that of the ARIMA model. Under the GBM law, time series modeling becomes very cost-effective and rewarding: it is easy to compute and computationally efficient, with a short running time and acceptable accuracy. It is therefore suitable for those who require a speedy modeling process.
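A timing comparison of this kind can be sketched as below. The study measured CPU time in R; this Python version, including the function names, is an assumption made for illustration. The final call uses the two averages reported above, which correspond to a reduction of roughly 97% in running time.

```python
import time


def average_cpu_seconds(func, *args, repeats=5):
    """Average CPU time (not wall-clock time) of func(*args) over several runs."""
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    total = 0.0
    for _ in range(repeats):
        start = time.process_time()
        func(*args)
        total += time.process_time() - start
    return total / repeats


def percent_reduction(baseline_seconds, candidate_seconds):
    """Percentage by which the candidate shortens the baseline running time."""
    if baseline_seconds <= 0:
        raise ValueError("baseline running time must be positive")
    return 100.0 * (baseline_seconds - candidate_seconds) / baseline_seconds


if __name__ == "__main__":
    # Placeholder workload; in practice func would be a model-fitting routine.
    print(average_cpu_seconds(sorted, list(range(100000))))
    # Averages reported above: ARIMA 5.54 s and LR 0.16 s of CPU time.
    print(round(percent_reduction(5.54, 0.16), 1))  # 97.1
```

Averaging over repeated runs reduces the influence of scheduling noise when individual fits take only fractions of a second.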
As mentioned earlier, a time series dataset that can be described accurately by the LR model can also be described accurately by the ARIMA model, although the quality of the corresponding residuals might differ, as shown in Table 1. Consequently, the same control charting method can be implemented regardless of the type of model used. Since nonnormality and time dependence considerably affect the performance of control charts, the robustness of a model depends on the independence and normality of its residuals [16], which are the main input to control charting. As a result, if the data are governed by the GBM law, they can be described accurately by either the ARIMA or the LR model. However, if the LR model works as expected, it is preferable to ARIMA. To demonstrate the advantages of the LR model, an industrial example is presented in the next section.
4. An Industrial Application
The applicability of the proposed methodology was demonstrated using a dataset from a cocoa powder manufacturer (see supplementary Appendix II). The name of the company is withheld for confidentiality. The quality of cocoa powder was determined by the color of the powder, which was measured using the ColorFlex EZ Hunterlab. The color attributes consisted of $L$, $a$, and $b$, which represent the color solid for the $L$, $a$, $b$ color space in Figure 1. $L$ represents the lightness, which varies between 100 for white and 0 for black; $a$ represents the red-green direction, which varies between $+a$ for red and $-a$ for green; $b$ represents the yellow-blue direction, which varies between $+b$ for yellow and $-b$ for blue.

In this study, 112 observations were collected in time order ($t = 1, 2, \ldots, 112$) of the production process. To visualize the presence of autocorrelation in the data, lag-1 scatter plots are presented in Figure 2. These plots indicated that the processes were autocorrelated. The Durbin-Watson (DW) test was then used to confirm the presence of autocorrelation. Table 2 confirms the autocorrelation in all attributes.
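The two diagnostics mentioned here can be sketched as follows. The snippet builds the point pairs of a lag-1 scatter plot and computes the Durbin-Watson statistic; values near 2 suggest no first-order autocorrelation and values near 0 suggest positive autocorrelation. Applying the statistic to a raw series after subtracting its mean is an assumption of this sketch, the function names are hypothetical, and the tabulated bounds needed for a formal test decision are not included.

```python
def lag1_pairs(series):
    """Points (x_{t-1}, x_t) for a lag-1 scatter plot."""
    return list(zip(series[:-1], series[1:]))


def centered(series):
    """Deviations from the mean (residuals of an intercept-only fit)."""
    m = sum(series) / len(series)
    return [x - m for x in series]


def durbin_watson(residuals):
    """Durbin-Watson statistic: squared successive differences over squares."""
    denom = sum(r ** 2 for r in residuals)
    if denom == 0:
        raise ValueError("residuals are all zero")
    num = sum((b - a) ** 2 for a, b in zip(residuals, residuals[1:]))
    return num / denom


if __name__ == "__main__":
    trending = [1.0, 2.0, 3.0, 4.0, 5.0]  # made-up, strongly autocorrelated
    print(lag1_pairs(trending))
    print(durbin_watson(centered(trending)))  # 0.4, far below 2
```

The statistic is approximately 2(1 - r), where r is the lag-1 autocorrelation, which is why a clear linear pattern in the lag-1 scatter plot goes together with a value far from 2.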

Figure 2: Lag-1 scatter plots of the three color attributes, panels (a) to (c).
Therefore, a time series model that removes the effect of autocorrelation was required before a classical control chart on i.i.n.d. residuals could be employed to monitor the process. For benchmarking purposes, both ARIMA and LR models were employed to handle the autocorrelation effect. The properties of the residuals from the ARIMA models were then discussed and compared with those from the LR models. Finally, the outputs of both model building methods were used to construct control charts.
4.1. ARIMA Model
The normal practice in industry is to fit a time series model and thereby remove the autocorrelation. In this study, R was used to construct the best-fitted ARIMA model for each attribute. The resulting fitted models are ARIMA(2, 1, 2) for the lightness attribute $L$, ARIMA(0, 1, 2) for the red-green attribute $a$, and ARIMA(0, 1, 0) for the yellow-blue attribute $b$, as follows:
Figure 3 shows the scatter plots and Q-Q plots of the residuals produced by the ARIMA models for each attribute. The residual scatter plots of the three attributes showed no sign of autocorrelation, which was confirmed by the DW test. Furthermore, the residual Q-Q plots were consistent with normality, which was supported by the Anderson-Darling (AD) test at the chosen significance level. Therefore, the i.i.n.d. conditions were fulfilled for all attributes.

Figure 3: Scatter plots and Q-Q plots of the residuals from the ARIMA models, panels (a) to (c).
4.2. LR Model
To employ the LR model, the three color attributes were first transformed into their respective logarithmic returns, whose independence and normality assumptions were then checked. The results are presented in Table 3. The logarithmic returns for all attributes were found to be i.i.n.d., and thus the process was taken to be governed by GBM.
Thereafter, LR models were fitted to the three color attributes in R, and the corresponding fitted LR models are given as follows:
Figure 4 shows the scatter plots and Q-Q plots of the residuals from the respective LR models. The residual scatter plots showed a random pattern, which was supported by the results of the DW test indicating no autocorrelation. Furthermore, the residual Q-Q plots were consistent with normality, which was supported by the AD test at the chosen significance level. Therefore, i.i.n.d. residuals were obtained from these LR models.

Figure 4: Scatter plots and Q-Q plots of the residuals from the LR models, panels (a) to (c).
4.3. Comparison between the Performance of ARIMA and LR Models
The residuals produced by the ARIMA and LR models were evaluated for accuracy and speed. The MAPE values were less than 10% (high accuracy) for both models, as shown in Table 4. This indicates that the output of the LR models was as accurate as that of the ARIMA models.
As for processing speed, the running time (in seconds of CPU time) of the LR models was much shorter than that of the ARIMA models, as shown in Table 5. For the three attributes, the running time of the LR models was shorter than that of the ARIMA models by 96.2%, 96.7%, and 96.5%, respectively.
Before constructing the multivariate control chart, the randomness of the charting statistic for both the ARIMA and LR models was checked (see Figure 5). At a glance, both scatter plots were random, and no significant trend was identified. Consequently, multivariate control charts were constructed for the ARIMA models and the LR models. The results are shown in Figure 6. Comparison of the charts led to the same general conclusion: no out-of-control signal occurred during the process. Therefore, the process was considered in control.
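The text does not name the multivariate chart that was used. As one common possibility, the sketch below computes Hotelling's T-squared statistic for individual multivariate observations, such as the three residual series taken together. The choice of this statistic, the function names, and the example rows are assumptions made for illustration, and the control limit (which needs a Beta or F quantile) is not computed.

```python
def solve_linear_system(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(aug[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (aug[row][n] - tail) / aug[row][row]
    return x


def hotelling_t2(rows):
    """T-squared value of each row, using the sample mean and covariance."""
    m, p = len(rows), len(rows[0])
    if m <= p:
        raise ValueError("need more observations than variables")
    means = [sum(row[j] for row in rows) / m for j in range(p)]
    dev = [[row[j] - means[j] for j in range(p)] for row in rows]
    cov = [
        [sum(d[i] * d[j] for d in dev) / (m - 1) for j in range(p)]
        for i in range(p)
    ]
    # T2_i = d_i' S^{-1} d_i, computed by solving S y = d_i for each row.
    return [
        sum(a * b for a, b in zip(d, solve_linear_system(cov, d))) for d in dev
    ]


if __name__ == "__main__":
    residual_rows = [  # made-up residual triples, one row per time point
        [1.0, 2.0, 0.0], [2.0, 1.0, 1.0], [3.0, 5.0, 2.0],
        [4.0, 3.0, 5.0], [5.0, 6.0, 3.0], [6.0, 4.0, 4.0],
    ]
    print(hotelling_t2(residual_rows))
```

Each value would be plotted in time order against an upper control limit; a point above the limit signals an out-of-control condition in at least one attribute or in their joint behavior.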

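Figure 5: Scatter plots used to check randomness for the ARIMA and LR models, panels (a) and (b).
Figure 6: Multivariate control charts constructed from the ARIMA and LR models, panels (a) and (b).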
5. Conclusions
Researchers have commented that the ARIMA model is laborious, time-consuming, and expensive. In this study, the LR model, with its conceptual basis drawn from the GBM law, was derived as an alternative time series modeling methodology to the ARIMA model. It was then applied to and tested on a time-dependent cocoa powder dataset from a factory in Malaysia, which describes the color of the cocoa powder produced. The resulting residuals were assessed and benchmarked against the typical ARIMA model for accuracy and CPU running time. The standard classical control chart was subsequently applied to the i.i.n.d. residuals. The findings revealed that the output of the LR model was as accurate as that of the powerful yet complicated ARIMA model. Moreover, the parsimonious LR model was noticeably easier to compute and took a much shorter running time than its counterpart. In addition, the multivariate control chart constructed from the LR models led to the same general conclusion as the one obtained from the ARIMA models. The LR model is thus a promising alternative to the ARIMA model that should be considered in SPC procedures when ensuring high-quality products is of concern. Recommendations for future research include the construction of a model that is at least as parsimonious as the LR model and capable of accommodating real-valued data. Further studies may also consider the applicability of the LR model in other fields where the GBM law is observed, depending on the research objectives.
Data Availability
All relevant data are included in the article.
Conflicts of Interest
The authors declare that they have no conflicts of interest.
Acknowledgments
This work was supported by the Excellence Fund Grant from Universiti Teknologi MARA (UiTM), Cawangan Sarawak (Project Code No. 600-TNCPI 5/3/DDN(13) (016/2020)). The authors acknowledge UiTM, Cawangan Sarawak for providing the resources and facilities and the cocoa powder industry for supplying the data.
Supplementary Materials
Appendix I contains the derivation and proof of Itô's lemma. It provides a framework for handling stochastic differential equations, which yields a closed-form expression for a GBM process. The properties of the GBM process were then derived and exploited in the time series modeling of autocorrelated SPC in the study. Appendix II contains the 112 cocoa powder observations used in the study, which were collected from June 2011 until July 2011 at a cocoa powder manufacturer located in Johor Bahru, Malaysia.