Abstract

The accuracy of the wavelet-ARIMA (WA) model in forecasting monthly fishery landings is investigated in this study. First, the discrete wavelet transform (DWT) is used to decompose the fishery landing time series. Then ARIMA, a powerful forecasting tool, is used to predict each of the resulting wavelet subseries independently. Finally, the predictions of the subseries are summed to form an ensemble forecast for the original fishery landing series. To assess the effectiveness of this model, monthly fishery landing data recorded in the East Johor and Pahang states of Peninsular Malaysia are used as a case study. The results show that the proposed model provides more accurate fishery landing forecasts than the individual ARIMA model.

1. Introduction

Fishing is one of the most important industries in Malaysia. For many years, the fisheries sector has made a significant contribution to the national economy in terms of income, foreign exchange, and employment. It also plays a significant role as a major supplier of animal protein for local consumption. To ensure that local demand can be met without depending heavily on imported fish, the authorities keep track of the annual total fishery production and take the necessary actions to increase or maintain the level of production while maintaining a sustainable ecology. To achieve this aim, it is necessary to forecast uncontrollable events, such as possible abundance or biomass changes [1]. The selection of a proper model for forecasting fishery landings has been a major research effort over the past few decades.

Traditional statistical methods such as linear regression, autoregressive, moving average, and autoregressive integrated moving average (ARIMA) models have been applied to forecast the landings and catch per unit effort of many fish and invertebrates [2–5]. For modelling time series data in fisheries science, the ARIMA model has been popular and widely chosen [1, 2, 6–9]. It has been the standard parametric forecasting model for statistical time series analysis since the 1970s. The ARIMA model is a linear combination of time-lagged variables and error terms. Its popularity is due to its statistical properties, the well-known Box-Jenkins methodology, its forecasting capabilities, and the richness of information it provides on time-related changes. Although ARIMA models have proven effective in many decision support applications, they still have certain shortcomings. They are basically linear models that assume the data are stationary, and they have a limited ability to capture nonstationarities and nonlinearities in series data [10, 11].

Due to the limitations of the traditional statistical methods, another approach has been employed for dealing with the nonstationary and nonlinear characteristics of a time series: the decomposition approach. Forecasting with a decomposition method is often more useful in providing forecasts and information about the components of a time series than trying to predict a single time series [12]. In the last decade, wavelet transforms have become a common tool for analyzing variations, periodicities, and trends in time series [13–16]. Recently, new hybrid models based on the wavelet transform have been proposed for time series forecasting. The corresponding empirical results demonstrated that hybrids of the wavelet transform with another model outperform the individual forecasting model in many cases [16–23]. Wavelet transforms provide useful decompositions of the original time series; wavelet-transformed data therefore improve the ability of a forecasting model by capturing useful information at various resolution levels. However, the existing literature on fishery landing forecasting has not adopted wavelet transform processes, and this study fills that gap.

In this study, we combine the wavelet transform and ARIMA to construct a novel fishery landing forecasting methodology. First, the original fishery landing series is decomposed into several subseries using the wavelet transform by the Mallat algorithm. Second, these subseries are modeled and forecasted using ARIMA. Finally, the forecast of the proposed model is obtained by summing the forecasted values of the subseries. To evaluate the performance of the proposed approach, the monthly fishery landing series in East Johor and Pahang of Peninsular Malaysia are used as an illustrative example, and the prediction performance is compared with that of the individual ARIMA model.

2. Methodology

2.1. ARIMA Model

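The ARIMA models were introduced by Box and Jenkins [24] and have dominated many areas of time series forecasting. The Box-Jenkins approach uses ARIMA models composed of a nonseasonal part and a seasonal part, which are represented in the following way:

$$\phi_p(B)\,\Phi_P(B^{s})\,(1-B)^{d}\,(1-B^{s})^{D}\,x_t = \theta_q(B)\,\Theta_Q(B^{s})\,a_t,$$

where $\phi_p(B)$, $\Phi_P(B^{s})$, $\theta_q(B)$, and $\Theta_Q(B^{s})$ are the polynomials in $B$ of degree $p$, $P$, $q$, and $Q$, respectively, $B$ is the backshift operator, and $s$ is the seasonal period. $p$, $d$, $q$, $P$, $D$, and $Q$ are integers; $p$ and $q$ are the orders of the nonseasonal autoregressive and moving average parts, and $P$ and $Q$ are the orders of the seasonal autoregressive and moving average parts, respectively. $d$ is the number of regular differences, $D$ is the number of seasonal differences, and $a_t$ is the random error.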
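As an illustration of how the operator form turns into a forecasting recursion, consider the ARIMA$(0,1,1)(0,1,1)_{12}$ model. This order is chosen purely as an example and is not claimed to be a model selected in this study. The sketch assumes the usual Box-Jenkins conventions: $B$ is the backshift operator with $B^{k}x_t = x_{t-k}$, and the moving average polynomials are written as $1 - \theta_1 B$ and $1 - \Theta_1 B^{12}$.

$$(1-B)(1-B^{12})\,x_t = (1-\theta_1 B)(1-\Theta_1 B^{12})\,a_t$$

Expanding both products gives

$$x_t - x_{t-1} - x_{t-12} + x_{t-13} = a_t - \theta_1 a_{t-1} - \Theta_1 a_{t-12} + \theta_1\Theta_1 a_{t-13},$$

so that

$$x_t = x_{t-1} + x_{t-12} - x_{t-13} + a_t - \theta_1 a_{t-1} - \Theta_1 a_{t-12} + \theta_1\Theta_1 a_{t-13}.$$

The current value is thus built from the previous month, the same month one year earlier, and a weighted combination of past random errors.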

The Box-Jenkins methodology includes four iterative steps: identification, estimation, diagnostic checking, and forecasting. Figure 1 shows the process of ARIMA modelling.

In the identification step, data transformation is often used to make the time series stationary. The autocorrelation function (ACF) and partial autocorrelation function (PACF) are used to determine whether or not the series is stationary and serve as the basic tools for identifying the appropriate ARIMA model. Once a tentative model is identified, its parameters are estimated. The last step in model building is the diagnostic checking of model adequacy, which is performed by examining the ACF of the residuals and by applying the Ljung-Box test to the residuals. The process is repeated until a satisfactory model is finally selected. The selected model is then used to compute the fitted values and the forecasts.

2.2. Wavelet Transform

Wavelet transforms (WT) provide a useful decomposition of the original time series by capturing useful information at various decomposition levels. WTs can be divided into two categories: continuous wavelet transforms (CWT) and discrete wavelet transforms (DWT). For the time series $x(t)$, the CWT of the time series with respect to a mother wavelet $\psi(t)$ is defined as

$$W(\tau, s) = \frac{1}{\sqrt{|s|}} \int_{-\infty}^{\infty} x(t)\,\psi^{*}\!\left(\frac{t-\tau}{s}\right) dt,$$

where ($*$) corresponds to the complex conjugate, $t$ stands for time, $\tau$ stands for the time step, and $s$ for the wavelet scale. The CWT is not often used for forecasting because it is computationally complex and time-consuming. Instead, the wavelet is usually discretized in forecasting applications to simplify the numerical solution. The DWT requires less computation time and is simpler to implement. For a series of length $N$, the DWT can be defined as

$$W_{m,n} = 2^{-m/2} \sum_{t=0}^{N-1} \psi\left(2^{-m}t - n\right) x(t),$$

where $W_{m,n}$ is the wavelet coefficient for the discrete wavelet at scale $s = 2^{m}$ and $\tau = 2^{m}n$. According to Mallat's theory, the original discrete time series $x(t)$ can be decomposed into a series of linearly independent approximation and detail signals by using the inverse DWT. The inverse DWT is given by Mallat [25]:

$$x(t) = A_M(t) + \sum_{m=1}^{M} \sum_{n=0}^{2^{M-m}-1} W_{m,n}\,2^{-m/2}\,\psi\left(2^{-m}t - n\right),$$

or in a simple format as

$$x(t) = A_M(t) + \sum_{m=1}^{M} D_m(t),$$

where $A_M(t)$ is called the approximation subseries or residual term at level $M$ and $D_m(t)$ ($m = 1, 2, \ldots, M$) are the detail subseries, which can capture small features of interpretational value in the data.
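The additive form of the decomposition (one approximation subseries plus one detail subseries per level) can be illustrated with a minimal sketch. The sketch below assumes the Haar wavelet, for which the level-$m$ approximation is simply the mean over blocks of $2^{m}$ points; it is not the Db2 wavelet used in this study, and the function name `haar_mra` and the toy series are hypothetical.

```python
def haar_mra(x, levels):
    """Additive Haar multiresolution decomposition (illustrative only).

    Returns (approximation, details), where approximation is A_M and
    details[m - 1] is D_m; every component has the same length as x.
    Assumes len(x) is divisible by 2**levels.
    """
    n = len(x)
    if n % (2 ** levels) != 0:
        raise ValueError("len(x) must be divisible by 2**levels")
    previous = [float(v) for v in x]  # A_0 is the series itself
    details = []
    for m in range(1, levels + 1):
        block = 2 ** m
        current = []
        for start in range(0, n, block):
            # Haar approximation at level m: mean over each block of 2**m points
            mean = sum(x[start:start + block]) / block
            current.extend([mean] * block)
        # Detail at level m is what is lost between consecutive approximations
        details.append([p - c for p, c in zip(previous, current)])
        previous = current
    return previous, details


series = [4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0]  # hypothetical data
a2, (d1, d2) = haar_mra(series, 2)
rebuilt = [a + u + v for a, u, v in zip(a2, d1, d2)]
print(rebuilt == series)
```

Because each detail is the difference between two consecutive approximations, the components sum back to the original series exactly, which is the property the ensemble forecast relies on.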

3. Results

3.1. Study Areas and Data

To evaluate the performance of the proposed forecasting model, which combines the wavelet transform with ARIMA, this paper uses monthly fishery landing data obtained from the Annual Fisheries Statistics through the official website of the Department of Fisheries Malaysia, Ministry of Agriculture and Agro-Based Industry Malaysia. This research focuses on the marine fishery landing for two states, East Johor and Pahang of Peninsular Malaysia (Figure 2). The monthly fishery landing data for East Johor and Pahang, covering the period from January 2001 to December 2012 with a total of 144 observations, are used, as shown in Figure 3. The data from January 2001 to December 2011 are used as the training dataset (132 observations) and the remaining data from January 2012 to December 2012 are used as the testing dataset (12 observations).

3.2. Performance Criteria

To compare the forecasting performance of the models, two criteria are used as accuracy measures, namely, the root mean squared error (RMSE) and the mean absolute percentage error (MAPE). These criteria are calculated as follows:

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{t=1}^{n}\left(y_t - \hat{y}_t\right)^{2}}, \qquad \mathrm{MAPE} = \frac{100}{n}\sum_{t=1}^{n}\left|\frac{y_t - \hat{y}_t}{y_t}\right|,$$

where $y_t$ is the actual value and $\hat{y}_t$ is the forecasted value for period $t$, and $n$ is the number of observations. The smaller the values of RMSE and MAPE, the more accurate the model.
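The two criteria can be sketched in a few lines, assuming the standard definitions of RMSE and MAPE (with MAPE expressed in percent). The function names and the numbers in the example are hypothetical and are not taken from the study data.

```python
import math


def rmse(actual, forecast):
    """Root mean squared error between two equally long sequences."""
    n = len(actual)
    return math.sqrt(sum((y - f) ** 2 for y, f in zip(actual, forecast)) / n)


def mape(actual, forecast):
    """Mean absolute percentage error in percent; actual values must be nonzero."""
    n = len(actual)
    return 100.0 * sum(abs((y - f) / y) for y, f in zip(actual, forecast)) / n


actual = [100.0, 200.0]     # hypothetical observations
forecast = [110.0, 180.0]   # hypothetical forecasts
print(rmse(actual, forecast), mape(actual, forecast))
```

RMSE is in the units of the series (tonnes here) and penalizes large errors more heavily, whereas MAPE is scale-free, which is why the two are usually reported together.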

4. Forecasting Results

4.1. Fitting ARIMA Model to the Data

Figure 3 shows the monthly fishery landing series for East Johor, Peninsular Malaysia, in tonnes. The data show nonlinear, nonstationary, and seasonal characteristics. The sample autocorrelation function (ACF) and sample partial autocorrelation function (PACF) for the original fishery landing series are plotted in Figure 4. In the ACF there are significant spikes near lags 12, 24, 36, 48, and 60, and therefore the series was seasonally differenced with a period of 12. The plots of the ACF and PACF after seasonal differencing are shown in Figure 5.
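Seasonal differencing and the sample ACF used in this identification step can be sketched as follows. This is a minimal illustration under the usual definitions; the function names and the toy series (a repeating 12-month pattern with a small upward shift each year) are hypothetical.

```python
def seasonal_difference(x, period=12):
    """Return x[t] - x[t - period] for every t where both values exist."""
    return [x[t] - x[t - period] for t in range(period, len(x))]


def sample_acf(x, max_lag):
    """Sample autocorrelations r_0 .. r_max_lag of the series x."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    c0 = sum(d * d for d in dev)
    if c0 == 0:
        raise ValueError("ACF is undefined for a constant series")
    return [sum(dev[t] * dev[t - k] for t in range(k, n)) / c0
            for k in range(max_lag + 1)]


# Hypothetical series: the same 12-month pattern, raised by 0.5 each year
toy = [(t % 12) + 0.5 * (t // 12) for t in range(48)]
print(sample_acf(toy, 12)[12])        # large spike at the seasonal lag
print(seasonal_difference(toy)[:3])   # the seasonal pattern is removed
```

For the toy series, the spike at lag 12 reflects the yearly pattern, and after differencing with period 12 only the constant year-on-year change remains.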

The ACF damps out in a sine-wave manner with significant spikes near lags 1, 7, and 12. In the PACF, there are significant spikes at lags 1, 6, 11, 12, and 13. This indicates a possible ARIMA model. All combinations of candidate orders were evaluated to determine the best model. The identification of the best model for the fishery landing series is based on the minimum AIC. After extensive investigation, the model finally selected was an ARIMA. The model can be expressed as

Once an appropriate model is chosen, the Box-Jenkins methodology requires examining the residuals of the model to verify that the model is adequate for the series. For a good forecasting model, the residuals left over after fitting the model should be white noise. Figure 6 displays a plot of the standardized residuals, the ACF of the residuals, and the p-values of the Ljung-Box statistic at lags 1–20. Inspection of the time plot of the standardized residuals in Figure 6 shows no obvious patterns. The ACF of the residuals of the ARIMA model is small and lies within the confidence limits, which indicates that the residuals from the best model are white noise. Additionally, the adequacy of the model is confirmed using the Ljung-Box test. The p-values of the Ljung-Box test exceed 0.05 at all lags, so the hypothesis of uncorrelated residuals is not rejected at the 5% significance level. This supports the selected ARIMA as an adequate model of the fishery landing series in East Johor state.
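The statistic behind the Ljung-Box check can be sketched as below, assuming the standard definition $Q = n(n+2)\sum_{k=1}^{h} r_k^{2}/(n-k)$, where $r_k$ is the lag-$k$ sample autocorrelation of the residuals. The function name and the toy residuals are hypothetical. The sketch returns only $Q$; the p-value would additionally require a chi-square distribution function (with degrees of freedom reduced by the number of fitted ARMA parameters), which the Python standard library does not provide.

```python
def ljung_box_q(residuals, max_lag):
    """Ljung-Box Q statistic over lags 1 .. max_lag (no p-value)."""
    n = len(residuals)
    mean = sum(residuals) / n
    dev = [e - mean for e in residuals]
    c0 = sum(d * d for d in dev)
    total = 0.0
    for k in range(1, max_lag + 1):
        # lag-k sample autocorrelation of the residuals
        r_k = sum(dev[t] * dev[t - k] for t in range(k, n)) / c0
        total += r_k ** 2 / (n - k)
    return n * (n + 2) * total


toy_residuals = [1.0, 2.0, 3.0, 4.0]  # hypothetical values
print(ljung_box_q(toy_residuals, 1))
```

A small $Q$ relative to the chi-square reference distribution corresponds to a large p-value, which is the situation described for the selected model.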

Using the above ARIMA model, forecasts of the future fishery landing from East Johor can be obtained. The RMSE and MAPE values of this model for the test data set are 971.29 and 8.62%, respectively.

For the Pahang data, the fitted model generated from the data set is ARIMA. The equation of this model is

Diagnostics for this model are displayed in Figure 6(b), and it appears that this model fits the data well. The RMSE and MAPE values of this model for the test data set are 1776.01 and 14.77%, respectively.

4.2. Fitting Wavelet Transform-ARIMA Model to the Data

The hybrid wavelet and ARIMA model (WA) is obtained by combining two methods, the discrete wavelet transform (DWT) and the ARIMA model. In the WA model, the original fishery landing series is decomposed into a certain number of subseries components, which are entered into the ARIMA model in order to improve the model accuracy. When conducting wavelet analysis, the number of decomposition levels that is appropriate for the data must be chosen. To choose the number of decomposition levels, the following formula is used [17, 19]:

$$L = \operatorname{int}\left[\log_{10}(N)\right],$$

where $L$ is the level of decomposition and $N$ is the number of time series data. According to this formula, the optimal number of decomposition levels for the fishery landing series data in this study is two. The approximation and detail subseries of the original East Johor and Pahang fishery landing series, decomposed at level 2 by the Db2 wavelet, are presented in Figures 7 and 8. Figures 7 and 8 show that the original fishery landing series is decomposed into one approximation (A2) and two detail series (D1 and D2).
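The level-selection rule can be checked directly against the sample sizes of this study. The sketch assumes the logarithm is base 10, which is the reading consistent with the stated result of two levels; the function name is hypothetical.

```python
import math


def decomposition_level(n_obs):
    """Integer part of log10 of the number of observations."""
    return int(math.log10(n_obs))


print(decomposition_level(144))  # full series: 144 monthly observations
print(decomposition_level(132))  # training set: 132 monthly observations
```

Both the full series and the training set give two levels, since $\log_{10}(144) \approx 2.16$ and $\log_{10}(132) \approx 2.12$.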

In this study, we also investigate the effect of the decomposition level on the model efficiency. For this purpose, the time series data were decomposed into one, two, and three levels by the Daubechies-2 (Db2) wavelet. Figure 9 describes the process of the hybrid wavelet and ARIMA model using one-level (WA1), two-level (WA2), and three-level (WA3) wavelet decomposition of the original fishery landing series, respectively. As can be seen from Figure 9, the WA models can be described by the following steps.

(i) Decompose the original fishery landing time series $x(t)$ into one, two, and three levels by DWT.

(ii) Use the ARIMA model to model each of the detail (D) and approximation (A) subseries. The ARIMA models are then applied to forecast the future one-step-ahead (monthly) values of these D and A subseries.

(iii) Obtain the final forecast by summing the prediction results of all the D and A components.
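The decompose, forecast, and sum structure of these steps can be sketched as follows. Two simplifications are assumed and should not be read as the method of this study: a seasonal-naive rule stands in for the ARIMA model fitted to each subseries, and a one-level split into pair means and remainders stands in for the Db2 decomposition. All names and data are hypothetical.

```python
def seasonal_naive(component, horizon, period=12):
    """Placeholder forecaster (NOT ARIMA): repeat the last seasonal cycle."""
    last_cycle = component[-period:]
    return [last_cycle[h % period] for h in range(horizon)]


def wa_forecast(components, horizon, forecaster=seasonal_naive):
    """Steps (ii)-(iii): forecast every subseries, then sum the forecasts."""
    forecasts = [forecaster(c, horizon) for c in components]
    return [sum(step) for step in zip(*forecasts)]


# Step (i), simplified: split the series into pair means (A1) and the
# remainder (D1), so that A1 + D1 reproduces the series exactly.
series = [float((t % 12) + t // 12) for t in range(24)]  # hypothetical data
a1 = [(series[t - t % 2] + series[t - t % 2 + 1]) / 2 for t in range(24)]
d1 = [x - a for x, a in zip(series, a1)]

print(wa_forecast([a1, d1], horizon=3))
```

Because the placeholder forecaster is linear, the summed forecast here coincides with forecasting the original series directly, which makes the sketch easy to verify. Separately fitted models for each subseries, as in the steps above, would not generally have this property.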

To compare forecasting accuracy, the same testing data set is used for the three proposed forecasting models. The performance measures of the selected forecasting models are given in Table 1. For East Johor state, the RMSE and MAPE of the proposed WA2 and WA3 models are almost the same and are smaller than those of the WA1 and ARIMA models. Table 2 shows the percentage improvement of the proposed models over the ARIMA model. The improvement is calculated in terms of RMSE and MAPE by

$$\text{Improvement} = \frac{E_b - E_p}{E_b} \times 100\%,$$

where $E_b$ denotes the error of the basic method used for comparison, which is here the ARIMA prediction error, and $E_p$ denotes the error of the proposed model. Table 2 shows that the WA1 model improves the RMSE in comparison with the single ARIMA model by about 24.21%. The WA2 model improves substantially on ARIMA, with the RMSE and MAPE reduced by 40.88% and 14.86%, respectively. WA3 also forecasts better than ARIMA (reductions of 40.84% in RMSE and 13.83% in MAPE).
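The percentage improvement can be sketched as a relative error reduction. The form below is an assumption (baseline error minus proposed error, divided by the baseline error), and the function name and numbers are hypothetical; they are not values from Tables 1 and 2.

```python
def improvement(base_error, proposed_error):
    """Percentage reduction of the proposed model's error relative to the baseline."""
    return (base_error - proposed_error) / base_error * 100.0


# Hypothetical errors: baseline 200.0, proposed model 150.0
print(improvement(200.0, 150.0))
```

A positive value means the proposed model has the smaller error; a negative value would mean it performs worse than the baseline.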

For Pahang state, the results in Table 1 demonstrate again that the proposed models perform better in fishery landing forecasting. Table 2 shows that the WA1, WA2, and WA3 models lead to reductions of 22.99%, 51.23%, and 56.40% in RMSE and of 23.74%, 47.68%, and 52.62% in MAPE, respectively, in comparison with the ARIMA model alone.

Comparing the results in Table 2 shows that increasing the decomposition level from 1 to 3 generally improves the proposed model's performance: for Pahang, WA3 gives the largest reductions in both RMSE and MAPE, while for East Johor, WA2 and WA3 perform almost identically. Level 3 can therefore be considered a suitable decomposition level for the data.

The actual fishery landing data and forecasted values in East Johor and Pahang states for the ARIMA, WA1, WA2, and WA3 models are illustrated in Figures 10 and 11, respectively. It can be observed from Figures 10 and 11 that the forecasted values obtained from the proposed models are closer to the actual values than those obtained from the ARIMA model.

Obviously, the single ARIMA model does not perform well. The forecasting accuracy of the ARIMA model is the worst among all models investigated in this paper. The overall results obtained in this study indicate that, due to the seasonal, nonlinear, and nonstationary appearance of monthly fishery landing, hybrid models are more suitable for forecasting than the linear model (ARIMA).

5. Conclusion

ARIMA models have been widely used in fisheries science time series forecasting problems. Unfortunately, ARIMA models are basically linear and are not capable of accurately forecasting fishery landing time series, which are often highly nonstationary, nonlinear, and seasonal. A fishery landing forecasting methodology based on the wavelet transform combined with ARIMA is proposed in this study. To assess the effectiveness of this model, monthly fishery landing data recorded in the East Johor and Pahang states of Peninsular Malaysia have been used as a case study. Empirical results indicate that the proposed model gives a substantial improvement in fishery landing modeling and produces better forecasts than the ARIMA models alone. The forecasting accuracy of ARIMA models is enhanced when the wavelet transform is applied to the original fishery landing data. Thus it can be concluded that the proposed wavelet-ARIMA model is a promising methodology for complex problems such as forecasting fishery landing time series with seasonal variations and nonlinearity.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

The authors wish to thank the Ministry of Higher Education (MOHE) for the financial support of the work under the Fundamental Research Grant Scheme (FRGS no. 4F275). The authors also gratefully acknowledge the critical comments and corrections of the anonymous reviewers; their comments significantly improved the original paper.