Computational Intelligence in Modeling Complex Systems and Solving Complex Problems
Research Article (Open Access)
Yuehjen E. Shao, JunTing Dai, "Integrated Feature Selection of ARIMA with Computational Intelligence Approaches for Food Crop Price Prediction", Complexity, vol. 2018, Article ID 1910520, 17 pages, 2018. https://doi.org/10.1155/2018/1910520
Integrated Feature Selection of ARIMA with Computational Intelligence Approaches for Food Crop Price Prediction
Abstract
Because of global climate change, lack of arable land, and rapid population growth, the supplies of the three major food crops (i.e., rice, wheat, and corn) have been gradually decreasing worldwide. The rapid increase in demand for food has contributed to a continuous rise in food prices, which directly threatens the lives of the more than 800 million people around the world who are reported to be chronically undernourished. Consequently, food crop price prediction has attracted considerable attention in recent years. Recent integrated forecasting models have used various feature selection methods (FSMs) to capture fewer, but more important, explanatory variables. However, one major problem is that the future values of these important explanatory variables are not available, so predictions based on them cannot actually be computed. Because an autoregressive integrated moving average (ARIMA) model can extract important self-predictor variables whose future values can be calculated, this study incorporates ARIMA as the FSM for computational intelligence (CI) models to predict the prices of these three food crops. In addition to ARIMA, the components of the proposed integrated forecasting models include artificial neural networks (ANNs), support vector regression (SVR), and multivariate adaptive regression splines (MARS). The predictive accuracies of ARIMA, ANN, SVR, MARS, and the proposed integrated model are compared and discussed. Experimental results reveal that the proposed integrated model achieves superior forecasting performance for predicting food crop prices.
1. Introduction
1.1. Background
Everyone needs food. However, not everyone has enough food to survive. Food crops are a primary source of human food, with rice, wheat, and corn being the most widely consumed grains around the world. In this study, food crops refer to plants cultivated by agriculturists to provide food for human consumption. Demand for and consumption of food crops will increase rapidly in the future, propelled by a 2.3 billion increase in global population and the greater per capita incomes anticipated through midcentury [1, 2]. According to the Food and Agriculture Organization of the United Nations (FAO) [3], the demand for food is expected to grow substantially by 2050. A major factor in this increase is world population growth: the world population has reached 7.6 billion today and may reach 9.7 billion by 2050. This growth, along with rising incomes in developing countries, is driving up global food demand. Consequently, humanity may directly face one question: "Will enough food crops be produced at affordable prices, or will rising prices drive more of humanity into hunger?"
The three most important food crops, rice, wheat, and corn, directly provide more than 50% of all calories consumed by the world population. The area harvested each year is 214 million ha for wheat, 154 million ha for rice, and 140 million ha for corn [4]. Additionally, the rapid rise in food prices has been a burden on the poor in developing countries, who spend roughly half their household income on food. The ability to purchase these food crops affordably plays a very important role in human life. Therefore, from a humanitarian point of view, the prediction of prices for rice, wheat, and corn has become a significant research topic.
1.2. Related Work
The authors of [5] indicated that, owing to various agricultural, environmental, meteorological, and biophysical factors, rice exports are very difficult to predict; they employed autoregressive integrated moving average (ARIMA) and artificial neural network (ANN) methodologies to predict Thailand's rice exports. The wheat price in the Chinese market was predicted by using ARIMA, ANN, and linear combination models [6]; the overall results showed that the ANN technique was the best prediction model. In Africa, around 40% of the rice consumed is imported [7]. This high dependence on rice imports leaves Africa highly exposed to international rice market shocks, sometimes with grave consequences for its food security and political stability. In addition, models were constructed to predict the quarterly prices of two food crops, barley and wheat [8]. In [9], it was reported that world food prices would rise by around 32% by 2022.
The authors of [10] showed the dynamic relationship between acreage response, food crop yield, and price volatility by developing an optimization model. In [11], it was stated that most countries would like to predict their annual food requirements in order to provide food security for their people; the authors proposed an artificial-intelligence-based support vector regression (SVR) model to predict the output energy in rice production.
Because food crop prices are affected by seasonality [5, 6], this study uses ARIMA to predict the prices of rice, wheat, and corn. However, the linearity assumption inherent to ARIMA may cause problems when practical data involve nonlinear relationships [12]. Computational intelligence (CI) techniques have been widely used in many forecasting applications because they are data-driven and require fewer a priori assumptions. Accordingly, in addition to ARIMA modeling, this study uses CI schemes, including ANN, SVR, and multivariate adaptive regression splines (MARS), to predict the prices of the three food crops because they can model nonlinearity and have good forecasting characteristics.
However, CI modeling may face difficulties during training when an optimal topology must be designed for a large number of input variables. Therefore, feature selection methods (FSMs) have been incorporated in order to reduce the number of explanatory or predictor variables [13, 14]. In this study, feature selection refers to the process of identifying a subset of relevant explanatory or predictor variables for use during forecasting model construction. This subset contains fewer but more important input variables that aid in predicting the outcome.
Feature selection techniques have become a research hot topic in many forecasting applications. For example, to accurately predict wind speed, an SVR forecasting model with a feature selection procedure called phase space reconstruction was proposed [15], and a set of general ARIMA models was used for performance comparison. A hybrid forecasting model with a feature selection technique based on mutual information, extreme learning machines, and bootstrap techniques was proposed to predict day-ahead electricity prices [16]. The authors of [17] proposed a hybrid filter-wrapper approach to predict electrical load and electricity prices; its feature selection method identifies a minimum subset of the most informative features by considering the relevancy, redundancy, and interactions of candidate inputs. In [18], the authors proposed a two-step approach that selects a set of candidate features based on data characteristics, after which a hybrid ANN-based model was used to predict day-ahead electricity prices. A hybrid forecasting methodology was designed to predict electrical load in [19]; it used a feature selection method based on entropy and several CI techniques, and its prediction performance was compared with that of various ARIMA models. In [20], a regression model combined with a feature selection methodology based on a multi-objective evolutionary algorithm was proposed to predict online sales. A combined ANN and Kalman filter approach was proposed to predict wind speed in [21], in which an ARIMA feature selection method was used to determine the input structure of the hybrid model.
When reviewing related research, we noticed that, although many studies have focused on using FSMs to obtain accurate forecasts, little attention has been devoted to using FSMs for food crop price prediction. Additionally, practical problems arise when FSMs are used to predict food crop prices. As mentioned earlier, one of the major purposes of an FSM is to select a subset of explanatory variables for subsequent forecasting. However, the future values of the explanatory variables selected by the FSM are unknown, so predictions cannot be computed even though the most important explanatory variables have been identified by effective FSMs. Relatively little research has attempted to address this problem. We propose an integrated forecasting model to remedy the problems caused by the unknown future values of explanatory variables. The proposed integrated model employs ARIMA as an FSM to capture important self-predictor variables, which then serve as input variables for the ANN, SVR, and MARS models. Because the future values of a self-predictor variable can be computed from its previous and current values, food crop price predictions can be obtained.
In this study, we consider four single-stage models (ARIMA, ANN, SVR, and MARS) and three integrated models for predicting food crop prices. A real monthly dataset containing the prices of rice, wheat, and corn from January 1990 to September 2015 was obtained, which makes it possible to compare food crop price predictions from single-stage and integrated models. The remainder of this study is organized as follows. Section 2 presents the modeling methodologies of the different forecasting schemes. Section 3 describes the design structure of our single-stage and two-stage integrated models and verifies them using real price data for the three food crops. Section 4 compares the forecasts of the four single-stage models and the three integrated models. Section 5 addresses practical implications and other concerns. The final section summarizes our research findings and conclusions.
2. Forecasting Methodologies
2.1. Autoregressive Integrated Moving Average
Because seasonal effects are possibly involved in food crop price forecasting, seasonal ARIMA models should be developed [22]. A seasonal ARIMA model can be described as follows:

$$\phi_p(B)\,\Phi_P(B^s)\,W_t = \delta + \theta_q(B)\,\Theta_Q(B^s)\,a_t,$$

where
$B$ is the backward shift operator, defined as $B Y_t = Y_{t-1}$;
$s$ is the number of seasons in a year, and $s = 12$ for monthly data;
$d$ is the value of the nonseasonal difference transformation;
$D$ is the value of the seasonal difference transformation;
$W_t = (1-B)^d (1-B^s)^D Y_t$ is the working series, which is stationary after a suitable difference transformation of the original time series $Y_t$;
$\delta$ is an unknown constant;
$a_t$ is the white noise at time $t$, assumed to be independent and identically distributed (iid) normal;
$p$ is the order of the nonseasonal autoregressive (AR) model;
$P$ is the order of the seasonal AR model;
$q$ is the order of the nonseasonal moving average (MA) model;
$Q$ is the order of the seasonal MA model;
$\phi_p(B) = 1 - \phi_1 B - \phi_2 B^2 - \cdots - \phi_p B^p$ is the nonseasonal AR polynomial;
$\Phi_P(B^s) = 1 - \Phi_1 B^{s} - \Phi_2 B^{2s} - \cdots - \Phi_P B^{Ps}$ is the seasonal AR polynomial;
$\theta_q(B) = 1 - \theta_1 B - \theta_2 B^2 - \cdots - \theta_q B^q$ is the nonseasonal MA polynomial;
$\Theta_Q(B^s) = 1 - \Theta_1 B^{s} - \Theta_2 B^{2s} - \cdots - \Theta_Q B^{Qs}$ is the seasonal MA polynomial.
The original nonstationary time series should be transformed into a stationary working series through differencing. Typically, the appropriate transformation is chosen from four combinations of $d$ and $D$, namely $(d, D) = (0, 0)$, $(1, 0)$, $(0, 1)$, and $(1, 1)$. Once a stationary working series has been obtained, the behavior of the sample autocorrelation function (ACF) and sample partial autocorrelation function (PACF) can be used to determine the values of $p$, $P$, $q$, and $Q$ for the seasonal ARIMA model.
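As a minimal sketch of the differencing step, the Python code below builds the working series $W_t = (1-B)^d(1-B^s)^D Y_t$ for each of the four $(d, D)$ combinations. The series `y` is a hypothetical toy example (a linear trend plus a repeating 12-month pattern), not the crop price data, and the helper names are ours.

```python
def difference(y, lag=1):
    """Apply (1 - B^lag) once: returns Y_t - Y_{t-lag} for every t that has a lagged value."""
    return [y[t] - y[t - lag] for t in range(lag, len(y))]


def working_series(y, d=0, D=0, s=12):
    """Return W_t = (1 - B)^d (1 - B^s)^D Y_t; the result has len(y) - d - s*D values."""
    w = list(y)
    for _ in range(D):
        w = difference(w, s)  # seasonal differencing (1 - B^s)
    for _ in range(d):
        w = difference(w, 1)  # nonseasonal differencing (1 - B)
    return w


# Hypothetical monthly series: linear trend plus a repeating 12-month pattern.
y = [float(t + t % 12) for t in range(36)]
for d, D in [(0, 0), (1, 0), (0, 1), (1, 1)]:
    w = working_series(y, d, D)
    print((d, D), len(w), w[:3])
```

For this toy series the seasonal difference alone already yields a constant series, so $(d, D) = (0, 1)$ would suffice; with real prices the choice would be guided by the sample ACF and PACF of each candidate working series.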
The authors of [22] prefer the maximum likelihood (ML) technique for estimating the model parameters. For the ML technique, the likelihood function is maximized via nonlinear least squares using Marquardt's method. Because the ML approach is more computationally expensive than conditional least squares (CLS) estimation, most computer packages employ CLS as the default approach. Least squares (LS) refers to the parameter estimates associated with the smallest sum of squared errors (SSE). For the CLS approach, the past unobserved errors are assumed to be zero. Treating the working series as mean-corrected, it can be represented in terms of its previous values as follows:

$$W_t = a_t + \sum_{i=1}^{\infty}\pi_i W_{t-i}.$$

The $\pi$ weights are calculated from

$$\frac{\phi_p(B)\,\Phi_P(B^s)}{\theta_q(B)\,\Theta_Q(B^s)} = 1 - \sum_{i=1}^{\infty}\pi_i B^i.$$

The CLS approach produces the parameter estimates that minimize

$$\mathrm{SSE} = \sum_{t=1}^{n}\hat{a}_t^2 = \sum_{t=1}^{n}\left(W_t - \sum_{i=1}^{\infty}\hat{\pi}_i W_{t-i}\right)^2,$$

where the unobserved past values of $W_t$ are set to 0 and $\hat{\pi}_i$ is computed from the estimates of the ARIMA model parameters $\phi$ and $\theta$ at each iteration.
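The CLS computation can be sketched in a few lines of Python: multiply the nonseasonal and seasonal polynomials, expand their ratio into $\pi$ weights by power-series division, and compute residuals with the unobserved past values set to zero. This is an illustration under our own assumptions (coefficient lists in powers of $B$ with a leading 1, a mean-corrected working series, hypothetical parameter values), not the SAS implementation used in the study; an optimizer would minimize the returned SSE over the parameters.

```python
def poly_mul(p, q):
    """Multiply two polynomials in B given as coefficient lists [c0, c1, c2, ...]."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out


def pi_weights(ar_poly, ma_poly, n_weights):
    """Return [pi_1, ..., pi_n] such that ar_poly(B) / ma_poly(B) = 1 - sum_i pi_i B^i.

    Both polynomials must have a leading coefficient of 1.
    """
    c = [0.0] * (n_weights + 1)
    for k in range(n_weights + 1):
        a_k = ar_poly[k] if k < len(ar_poly) else 0.0
        # Power-series division: c_k = a_k - sum_{j>=1} m_j * c_{k-j}
        c[k] = a_k - sum(ma_poly[j] * c[k - j]
                         for j in range(1, min(k, len(ma_poly) - 1) + 1))
    return [-ck for ck in c[1:]]


def cls_residuals(w, ar_poly, ma_poly):
    """a_t = W_t - sum_{i=1}^{t} pi_i W_{t-i}, with unobserved W before the sample set to 0."""
    pis = pi_weights(ar_poly, ma_poly, len(w))
    return [w[t] - sum(pis[i - 1] * w[t - i] for i in range(1, t + 1))
            for t in range(len(w))]


# Seasonal AR operator (1 - 0.5B)(1 - 0.3B^12), hypothetical values.
ar = poly_mul([1.0, -0.5], [1.0] + [0.0] * 11 + [-0.3])
# Pure MA(1) example: theta(B) = 1 - 0.5B, working series W = [1, 2, 3].
res = cls_residuals([1.0, 2.0, 3.0], [1.0], [1.0, -0.5])
sse = sum(r * r for r in res)
print(res, sse)  # [1.0, 2.5, 4.25] 25.3125
```

For the MA(1) case the residuals agree with the familiar recursion $a_t = W_t + 0.5\,a_{t-1}$ started from $a_0 = W_0$, which is a quick sanity check on the $\pi$-weight expansion.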
After the model parameters are estimated, diagnostic checks for lack of fit of the seasonal ARIMA model should be conducted. A logical way to check the adequacy of a seasonal ARIMA model is to analyze the residuals obtained from the underlying model. The Ljung-Box test examines whether the first $K$ sample autocorrelations of the residuals indicate adequacy of the model [23]. The null hypothesis of this test is that the first $K$ autocorrelations are jointly zero; that is,

$$H_0: \rho_1 = \rho_2 = \cdots = \rho_K = 0.$$

The Ljung-Box statistic is

$$Q^* = (n-d)(n-d+2)\sum_{l=1}^{K}\frac{r_l^2(\hat{a})}{n-d-l},$$

where
$Q^*$ is the Ljung-Box statistic, whose asymptotic distribution is chi-square with $K - n_c$ degrees of freedom;
$n$ is the number of observations;
$d$ is the degree of nonseasonal differencing;
$n_c$ is the number of parameters other than $\delta$ that must be estimated in the ARIMA model under consideration;
$r_l^2(\hat{a})$ is the square of the sample autocorrelation of the residuals at lag $l$.
The Ljung-Box test rejects the null hypothesis, indicating that the underlying model has a significant lack of fit, if

$$Q^* > \chi^2_{\alpha}(K - n_c),$$

where $\chi^2_{\alpha}(K - n_c)$ is the chi-square distribution table value with $K - n_c$ degrees of freedom at significance level $\alpha$.
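A minimal pure-Python sketch of the Ljung-Box check follows. It assumes the residual series already has $n - d$ values (so its length plays the role of $n - d$), takes the chi-square table value as an input rather than computing it, and uses a deliberately alternating toy residual series; the function names are ours.

```python
def sample_acf(x, lag):
    """Sample autocorrelation r_lag of the series x."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    denom = sum(e * e for e in dev)
    return sum(dev[t] * dev[t + lag] for t in range(n - lag)) / denom


def ljung_box(residuals, K):
    """Q* = m(m + 2) * sum_{l=1..K} r_l^2 / (m - l), where m = len(residuals) = n - d."""
    m = len(residuals)
    return m * (m + 2) * sum(sample_acf(residuals, l) ** 2 / (m - l)
                             for l in range(1, K + 1))


# Toy residuals that alternate in sign, i.e., are clearly autocorrelated.
residuals = [1.0, -1.0] * 4
K, n_c = 2, 0      # n_c: estimated parameters other than delta (toy value)
critical = 5.991   # chi-square table value for K - n_c = 2 df at alpha = 0.05
q_star = ljung_box(residuals, K)
lack_of_fit = q_star > critical
print(q_star, lack_of_fit)  # 16.25 True
```

Here $r_1 = -7/8$ and $r_2 = 6/8$, so $Q^* = 8 \cdot 10 \cdot (7/64 + 6/64) = 16.25$, well above the table value, and the toy model would be rejected for lack of fit.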
2.2. Artificial Neural Networks
An ANN is composed of a large number of highly interconnected processing elements, referred to as nodes or neurons, working in parallel to solve a particular problem. The leftmost layer of the network is the input layer, the rightmost layer is the output layer, and the middle layer is the hidden layer. In the ANN structure, each layer consists of a number of nodes connected by links. The ANN contains nodes that are connected to themselves, enabling a node to influence other nodes and itself. An ANN maps input variables to output variables: the hidden layer contains a certain number of nodes, which are nonlinear functions of the input variables, and the output variable is a function of the hidden nodes.
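ANN modeling is briefly described as follows. For an ANN model, the relationship between the inputs $(Y_{t-1}, \dots, Y_{t-p})$ and the output $Y_t$ can be represented as

$$Y_t = \alpha_0 + \sum_{j=1}^{q}\alpha_j\, g\!\left(\beta_{0j} + \sum_{i=1}^{p}\beta_{ij}\,Y_{t-i}\right) + \varepsilon_t,$$

where $\alpha_j$ ($j = 0, 1, \dots, q$) and $\beta_{ij}$ ($i = 0, 1, \dots, p$; $j = 1, \dots, q$) are the model connection weights; $p$ is the number of input nodes; $q$ is the number of hidden nodes; and $\varepsilon_t$ is the error term. In the hidden layer, the logistic function $g(x) = 1/(1 + e^{-x})$ is often used as the transfer function.
Consequently, the ANN performs a nonlinear functional mapping from the past observations $(Y_{t-1}, \dots, Y_{t-p})$ to the output $Y_t$; namely,

$$Y_t = f\left(Y_{t-1}, \dots, Y_{t-p}, \mathbf{w}\right) + \varepsilon_t,$$

where $\mathbf{w}$ is a vector of all model parameters and $f$ is a function determined by the ANN structure and connection weights.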
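To make the mapping concrete, the sketch below evaluates $Y_t = \alpha_0 + \sum_j \alpha_j\, g(\beta_{0j} + \sum_i \beta_{ij} Y_{t-i})$ with a logistic $g$ for given weights, omitting the error term. The weight layout (`beta[i][j]` linking lag $i$ to hidden node $j$) and the toy numbers are our own assumptions; training the weights is a separate step.

```python
import math


def logistic(z):
    """Logistic transfer function g(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + math.exp(-z))


def ann_forecast(lags, alpha0, alpha, beta0, beta):
    """One-step forecast Y_t = alpha0 + sum_j alpha_j g(beta0_j + sum_i beta_ij Y_{t-i}).

    lags  : [Y_{t-1}, ..., Y_{t-p}]  (p input nodes)
    alpha : output weights alpha_1..alpha_q  (q hidden nodes)
    beta0 : hidden-node biases beta_01..beta_0q
    beta  : beta[i][j] is the weight from input i to hidden node j
    """
    hidden = [logistic(beta0[j] + sum(beta[i][j] * lags[i] for i in range(len(lags))))
              for j in range(len(alpha))]
    return alpha0 + sum(a * h for a, h in zip(alpha, hidden))


# Toy weights: with all beta = 0, each hidden node outputs g(0) = 0.5.
y_hat = ann_forecast([0.3, 0.1], 1.0, [2.0, 4.0], [0.0, 0.0], [[0.0, 0.0], [0.0, 0.0]])
print(y_hat)  # 4.0
```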
In the ANN structure, the nodes in the input layer receive input signals from an external source, and the nodes in the output layer generate the target output signals. The output of each neuron in the input layer is the same as its input. During training, the connection weights are adjusted until the network's error is minimized. Backpropagation is one method for computing the error contribution of each neuron after a batch of data is processed; these contributions are used to adjust the weights, thereby completing the learning process. For each neuron $j$ in the hidden and output layers, the net input is given by

$$net_j = \sum_{i} w_{ij}\,o_i,$$

where $i$ is a neuron in the previous layer, $o_i$ is the output of node $i$, and $w_{ij}$ is the connection weight from neuron $i$ to neuron $j$. The neuron outputs are given by

$$o_i = x_i \ \text{(input layer)}, \qquad o_j = \frac{1}{1 + \exp\left[-\left(net_j + \theta_j\right)\right]} \ \text{(hidden and output layers)},$$

where $x_i$ is the input signal from the external source to node $i$ in the input layer and $\theta_j$ is a bias.
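The generalized delta rule is the conventional technique used to derive the connection weights of the network. First, random numbers are assigned to the connection weights. Then, for the presentation of a pattern with target output vector $\mathbf{T} = (T_1, \dots, T_m)$, the sum of squared errors to be minimized is

$$E = \frac{1}{2}\sum_{k=1}^{m}\left(T_k - o_k\right)^2,$$

where $m$ is the number of output nodes. Minimizing this error by gradient descent, the connection weights and biases are updated as

$$\Delta w_{ij} = \eta\,\delta_j\,o_i, \qquad \Delta\theta_j = \eta\,\delta_j,$$

where $\eta$ is the learning rate; for output nodes,

$$\delta_j = \left(T_j - o_j\right)o_j\left(1 - o_j\right),$$

and for hidden nodes,

$$\delta_j = o_j\left(1 - o_j\right)\sum_{k}\delta_k\,w_{jk}.$$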
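The generalized delta rule for a network with one logistic hidden layer and logistic output nodes can be sketched as follows. This is a minimal single-pattern illustration under our own assumptions: random initial weights, inputs already scaled to (0, 1), and a learning rate of 0.5 chosen only so that the toy example converges quickly (the study itself used 0.01). It is not the software used by the authors.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, w_ih, b_h, w_ho, b_o):
    """Logistic hidden and output layers; w_ih[i][j] links input i to hidden node j."""
    h = [sigmoid(sum(x[i] * w_ih[i][j] for i in range(len(x))) + b_h[j])
         for j in range(len(b_h))]
    o = [sigmoid(sum(h[j] * w_ho[j][k] for j in range(len(h))) + b_o[k])
         for k in range(len(b_o))]
    return h, o


def pattern_error(target, o):
    """E = 1/2 * sum_k (T_k - o_k)^2 for one pattern."""
    return 0.5 * sum((t - ok) ** 2 for t, ok in zip(target, o))


def delta_rule_step(x, target, w_ih, b_h, w_ho, b_o, eta):
    """One generalized-delta-rule update (gradient descent on E), applied in place."""
    h, o = forward(x, w_ih, b_h, w_ho, b_o)
    d_o = [(target[k] - o[k]) * o[k] * (1.0 - o[k]) for k in range(len(o))]
    # Hidden deltas use the output weights *before* they are updated.
    d_h = [h[j] * (1.0 - h[j]) * sum(d_o[k] * w_ho[j][k] for k in range(len(o)))
           for j in range(len(h))]
    for j in range(len(h)):
        for k in range(len(o)):
            w_ho[j][k] += eta * d_o[k] * h[j]
    for k in range(len(o)):
        b_o[k] += eta * d_o[k]
    for i in range(len(x)):
        for j in range(len(h)):
            w_ih[i][j] += eta * d_h[j] * x[i]
    for j in range(len(h)):
        b_h[j] += eta * d_h[j]


random.seed(1)
n_in, n_hid, n_out = 3, 4, 1
w_ih = [[random.uniform(-0.5, 0.5) for _ in range(n_hid)] for _ in range(n_in)]
w_ho = [[random.uniform(-0.5, 0.5) for _ in range(n_out)] for _ in range(n_hid)]
b_h, b_o = [0.0] * n_hid, [0.0] * n_out
x, target = [0.2, 0.4, 0.6], [0.7]  # hypothetical scaled lagged prices and target
error_before = pattern_error(target, forward(x, w_ih, b_h, w_ho, b_o)[1])
for _ in range(200):
    delta_rule_step(x, target, w_ih, b_h, w_ho, b_o, eta=0.5)
error_after = pattern_error(target, forward(x, w_ih, b_h, w_ho, b_o)[1])
print(error_before, error_after)
```

The bias update $\Delta\theta_j = \eta\,\delta_j$ follows from treating $\theta_j$ as a weight on a constant input of 1.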
Note that the learning rate affects the network’s generalization ability and learning speed to a great extent.
2.3. Support Vector Regression
SVR is an adaptation of the support vector machine (SVM), one of the most powerful techniques in CI [24, 25]. The basic concept of SVR is to find a model function that represents the relationship between the features and the target. SVR modeling can be described as follows. Suppose

$$f(\mathbf{x}) = \langle \mathbf{w}, \phi(\mathbf{x}) \rangle + b,$$

where $\mathbf{x}$ and $f(\mathbf{x})$ represent the model input vector and output, $\mathbf{w}$ is the weight vector, $b$ is a constant, $\phi$ denotes a mapping function into the feature space, and $\langle \cdot, \cdot \rangle$ denotes the dot product in the feature space $F$.
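Typical regression modeling estimates the coefficients by minimizing the squared error, which can be considered an empirical risk based on a loss function. The $\varepsilon$-insensitive loss function was introduced in [25] and can be described as follows:

$$L_{\varepsilon}\bigl(f(\mathbf{x}), y\bigr) = \begin{cases} \left|f(\mathbf{x}) - y\right| - \varepsilon, & \left|f(\mathbf{x}) - y\right| \ge \varepsilon, \\ 0, & \text{otherwise}, \end{cases}$$

where $f(\mathbf{x})$ is the model output, $y$ is the observed value, and $\varepsilon$ defines the region of insensitivity. When the predicted value falls inside the $\varepsilon$-band, the loss is zero. When the predicted value falls outside the band, the loss equals the amount by which the absolute error exceeds $\varepsilon$.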
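The $\varepsilon$-insensitive loss is simple enough to state directly in code. The short Python sketch below (function names and toy values are ours) shows the zero loss inside the band and the excess-over-$\varepsilon$ loss outside it, plus the average loss over a set of observations.

```python
def eps_insensitive_loss(y_pred, y_true, eps):
    """Zero inside the eps-band; otherwise the amount by which |error| exceeds eps."""
    return max(0.0, abs(y_pred - y_true) - eps)


def mean_eps_loss(preds, actuals, eps):
    """Average eps-insensitive loss over the observations."""
    return sum(eps_insensitive_loss(p, a, eps) for p, a in zip(preds, actuals)) / len(preds)


print(eps_insensitive_loss(10.0, 10.3, 0.5))           # 0.0 (inside the band)
print(eps_insensitive_loss(10.0, 11.0, 0.5))           # 0.5 (outside the band)
print(mean_eps_loss([10.0, 10.0], [10.3, 11.0], 0.5))  # 0.25
```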
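When both empirical risk and structural risk are considered, SVR can be designed to minimize the following regularized risk, which is solved as a quadratic programming problem [25]:

$$R(C) = C\,\frac{1}{n}\sum_{i=1}^{n} L_{\varepsilon}\bigl(f(\mathbf{x}_i), y_i\bigr) + \frac{1}{2}\left\|\mathbf{w}\right\|^2,$$

where $n$ is the number of training observations; $C\frac{1}{n}\sum_{i=1}^{n} L_{\varepsilon}(f(\mathbf{x}_i), y_i)$ is the empirical risk; $\frac{1}{2}\|\mathbf{w}\|^2$ is the structural risk, which prevents overlearning and poor generalization; and $C$ is a modifying coefficient representing the tradeoff between empirical and structural risks. With an appropriate modifying coefficient $C$, band width $\varepsilon$, and kernel function, the optimal value of each parameter can be obtained by the Lagrange procedure. The general form of the SVR-based regression function can then be expressed as follows [25]:

$$f(\mathbf{x}) = \sum_{i=1}^{n}\left(\alpha_i - \alpha_i^*\right)K(\mathbf{x}_i, \mathbf{x}) + b,$$

where $\alpha_i$ and $\alpha_i^*$ are the Lagrangian multipliers, which satisfy the equality $\alpha_i\,\alpha_i^* = 0$, and $K(\mathbf{x}_i, \mathbf{x}_j)$ is the kernel function. The value of the kernel equals the inner product of the images of the two vectors $\mathbf{x}_i$ and $\mathbf{x}_j$ in the feature space, $\phi(\mathbf{x}_i)$ and $\phi(\mathbf{x}_j)$; that is, $K(\mathbf{x}_i, \mathbf{x}_j) = \langle \phi(\mathbf{x}_i), \phi(\mathbf{x}_j) \rangle$. Because the radial basis function (RBF) is the most commonly used kernel function [26], we employ it in the experimental study. The RBF is written as follows:

$$K(\mathbf{x}_i, \mathbf{x}_j) = \exp\left(-\frac{\left\|\mathbf{x}_i - \mathbf{x}_j\right\|^2}{2\sigma^2}\right),$$

where $\sigma$ is the width of the RBF.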
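Once the Lagrange multipliers are known, the SVR regression function with the RBF kernel reduces to a weighted sum of kernel evaluations. The sketch below assumes the support vectors, multipliers, bias, and width are already available (solving the quadratic program itself is left to a dedicated solver) and uses toy values chosen by us.

```python
import math


def rbf_kernel(xi, xj, sigma):
    """K(x_i, x_j) = exp(-||x_i - x_j||^2 / (2 sigma^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(xi, xj))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def svr_predict(x, support_vectors, alpha, alpha_star, b, sigma):
    """f(x) = sum_i (alpha_i - alpha_i^*) K(x_i, x) + b."""
    return sum((a - a_s) * rbf_kernel(sv, x, sigma)
               for sv, a, a_s in zip(support_vectors, alpha, alpha_star)) + b


# Toy model with two support vectors; alpha_i * alpha_i^* = 0 holds for each pair.
svs = [[1.0, 2.0], [3.0, 2.0]]
f = svr_predict([1.0, 2.0], svs, alpha=[0.7, 0.0], alpha_star=[0.0, 0.2], b=0.3, sigma=1.0)
print(f)
```

Only observations with a nonzero $\alpha_i - \alpha_i^*$ (the support vectors) contribute, which is why a trained SVR usually needs to store only part of the training set.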
2.4. Multivariate Adaptive Regression Splines
MARS was developed for solving regression-type problems [27]. It is a nonparametric regression procedure that makes no assumption about the functional relationship between the response and explanatory variables. MARS modeling is based on a divide-and-conquer strategy in which the training data are partitioned into separate regions, each of which is assigned its own regression equation. Consequently, MARS is appropriate and effective for problems with more than two input variables. In particular, MARS can select important explanatory variables and capture relationships in complex data structures that are often hidden in high-dimensional data.
A MARS model can be described as follows [27]:

$$\hat{f}(\mathbf{x}) = a_0 + \sum_{m=1}^{M} a_m \prod_{k=1}^{K_m}\left[s_{k,m}\left(x_{v(k,m)} - t_{k,m}\right)\right]_+,$$

where $a_0$ and $a_m$ are the parameters, $M$ is the number of basis functions (BFs), $K_m$ is the number of knots in the $m$th BF, $s_{k,m}$ takes on values of either 1 or −1 and indicates the right or left sense of the associated step function, $v(k,m)$ is the label of the independent variable, $t_{k,m}$ is the knot location, and $[z]_+ = \max(0, z)$. The optimal MARS model is determined in a two-step procedure. In the first step, a model is grown by adding BFs until an overly large model is obtained. In the second step, BFs are deleted in order from least to most contribution using the generalized cross-validation (GCV) criterion. Variable importance is measured by the reduction in GCV attributable to each variable, that is, by how much GCV increases when the variable is removed from the model. The GCV is defined as

$$\mathrm{GCV}(M) = \frac{\frac{1}{n}\sum_{i=1}^{n}\left[y_i - \hat{f}_M(\mathbf{x}_i)\right]^2}{\left[1 - C(M)/n\right]^2},$$

where $n$ is the number of observations and $C(M)$ is the cost penalty measure of a model containing $M$ BFs.
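The building blocks of MARS, hinge-function BFs and the GCV criterion, can be sketched in Python as below. The sketch only evaluates a given model and its GCV; it does not perform the forward growing or backward pruning passes. The cost penalty $C(M)$ is passed in by the caller because its exact form depends on the MARS implementation, and the toy model and data are ours.

```python
def hinge(x, knot, s):
    """[s (x - t)]_+ : s = +1 gives max(0, x - t); s = -1 gives max(0, t - x)."""
    return max(0.0, s * (x - knot))


def mars_predict(x, a0, terms):
    """a0 + sum_m a_m * prod_k [s_km (x_v(k,m) - t_km)]_+.

    terms: list of (a_m, factors); each factor is (s, v, t) = (sign, variable index, knot).
    """
    total = a0
    for a_m, factors in terms:
        prod = 1.0
        for s, v, t in factors:
            prod *= hinge(x[v], t, s)
        total += a_m * prod
    return total


def gcv(y, y_hat, cost):
    """GCV(M) = mean squared residual / (1 - C(M)/n)^2, with C(M) supplied as `cost`."""
    n = len(y)
    mse = sum((a - b) ** 2 for a, b in zip(y, y_hat)) / n
    return mse / (1.0 - cost / n) ** 2


# Toy model: 1 + 2*max(0, x0 - 3) - 1*max(0, 2 - x1)
terms = [(2.0, [(1, 0, 3.0)]), (-1.0, [(-1, 1, 2.0)])]
print(mars_predict([5.0, 1.0], 1.0, terms))                       # 4.0
print(gcv([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 5.0], cost=2.0))  # 1.0
```

The denominator shows how GCV penalizes complexity: the same residual error yields a larger GCV as $C(M)$ grows, which is what drives the backward deletion of weak BFs.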
3. Food Crop Price Forecasting
3.1. Datasets and Forecasting Criteria
In this study, price data for the three most important food crops were collected for the period from January 1990 to September 2015 from the IndexMundi website [28–31]. Each dataset consists of 309 monthly records. The first 297 records were used to develop the forecasting models, and the remaining 12 records were used for model validation, so the forecast horizon of the out-of-sample forecasting experiment is 12 months. The experiment uses a relatively long data series with the aim of obtaining accurate forecasts of food crop prices. Accurate price predictions may assist in planning agricultural supply, although the investigated data do not contain any information on population.
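The forecasting capability of the models was compared using three accuracy criteria: mean absolute percentage error (MAPE), root mean square error (RMSE), and mean absolute error (MAE). These measures are defined as follows:

$$\mathrm{MAPE} = \frac{100\%}{n}\sum_{t=1}^{n}\left|\frac{e_t}{Y_t}\right|, \qquad \mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{t=1}^{n}e_t^2}, \qquad \mathrm{MAE} = \frac{1}{n}\sum_{t=1}^{n}\left|e_t\right|,$$

where $e_t$ is the residual (forecast error) at time $t$, $Y_t$ is the observation at time $t$, and $n$ is the number of forecasts.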
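The three criteria can be computed with a few lines of standard-library Python. The sketch below assumes residuals are defined as actual minus forecast and uses two made-up observations rather than the study's validation data.

```python
import math


def forecast_errors(actual, forecast):
    """Return (MAPE in %, RMSE, MAE) with residuals e_t = actual_t - forecast_t."""
    n = len(actual)
    e = [a - f for a, f in zip(actual, forecast)]
    mape = 100.0 / n * sum(abs(et / at) for et, at in zip(e, actual))
    rmse = math.sqrt(sum(et * et for et in e) / n)
    mae = sum(abs(et) for et in e) / n
    return mape, rmse, mae


# Toy values (not the crop price data): two forecasts, each off by 10.
mape, rmse, mae = forecast_errors([100.0, 200.0], [110.0, 190.0])
print(round(mape, 6), rmse, mae)  # 7.5 10.0 10.0
```

Because MAPE divides by the observed value, the same absolute error of 10 counts twice as much at a price of 100 as at 200, whereas RMSE and MAE treat both errors equally.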
The lower the MAPE, RMSE, and MAE values, the closer the forecasted values are to the actual values.
3.2. Forecast Modeling of Rice Prices
Figure 1 shows the original time plot of the rice price data series. This study uses the SAS package to perform ARIMA modeling. Table 1 shows the parameter estimates; all estimated parameters are significantly different from zero. In Table 1, the notations "AR1,1," "AR1,2," "AR1,3," and "AR1,4" correspond to the parameters $\phi_1$, $\phi_2$, $\phi_7$, and $\phi_{13}$ of the AR model. The ARIMA model presented in Table 1 is a subset model. Because the parameters $\phi_1$, $\phi_2$, $\phi_7$, and $\phi_{13}$ are significantly different from zero, this study refers to the model in Table 1 as an AR(1, 2, 7, 13) model.
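For a pure AR model, conditional least squares amounts to an ordinary least-squares regression of $Y_t$ on its selected lags, conditioning on the first observations. The NumPy sketch below fits a subset AR(1, 2, 7, 13) model in this way to a synthetic series generated from coefficients we chose for illustration; it is not the SAS estimation used in the study, and its output is not the Table 1 estimates.

```python
import numpy as np


def fit_subset_ar(y, lags):
    """CLS/OLS fit of Y_t = delta + sum_k phi_k Y_{t-k} + a_t for the given lags."""
    y = np.asarray(y, dtype=float)
    p = max(lags)
    # Row t uses Y_{t-k} for each selected lag; the first p points only serve as lags.
    X = np.column_stack([np.ones(len(y) - p)] + [y[p - k:len(y) - k] for k in lags])
    target = y[p:]
    coef, *_ = np.linalg.lstsq(X, target, rcond=None)
    return coef[0], dict(zip(lags, coef[1:])), target - X @ coef


# Synthetic series from a subset AR(1, 2, 7, 13) process (hypothetical coefficients).
rng = np.random.default_rng(0)
true_phi = {1: 0.5, 2: 0.2, 7: 0.1, 13: 0.1}
y = np.zeros(3000)
for t in range(13, len(y)):
    y[t] = 1.0 + sum(phi * y[t - k] for k, phi in true_phi.items()) + rng.normal()

delta_hat, phi_hat, resid = fit_subset_ar(y, [1, 2, 7, 13])
print({k: round(float(v), 3) for k, v in phi_hat.items()})
```

The returned residuals are what a Ljung-Box check of the fitted model would be applied to.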

In addition, Table 2 shows the Ljung-Box test results. As mentioned earlier, the Ljung-Box statistic can be used to check the fit of an ARIMA model. A simple way to reach the hypothesis testing result is as follows: we reject the model under consideration at a type I error of $\alpha$ if and only if the $p$ value is less than $\alpha$. Generally, $\alpha$ is set to 0.05. The SAS package computes the Ljung-Box statistics $Q^*$ and their associated $p$ values for $K$ equal to 6, 12, 18, 24, 30, and 36. In Table 2, the first column contains the values of $K$, the second column contains the values of $Q^*$, and the fourth column contains the associated $p$ values. Table 2 shows that all the associated $p$ values are greater than $\alpha$; accordingly, we conclude that the underlying ARIMA model fits the data adequately. Thus, the model expressed in (25) is suitable for modeling rice prices.

For ANN design, there is no fixed rule for deciding the number of hidden nodes. Too few hidden nodes limit the network's generalization capability, whereas too many hidden nodes may lead to overtraining. Therefore, in this study, the number of hidden nodes is varied over five consecutive values determined by $n$, the number of input variables: from $n - 2$ to $n + 2$ for the 12-input designs and from $2n - 2$ to $2n + 2$ for the designs with fewer inputs. We denote ANN parameter settings by {n_i-n_h-n_o}, where $n_i$, $n_h$, and $n_o$ are the numbers of neurons in the input, hidden, and output layers, respectively. Additionally, this study sets the learning rate of all ANN models to the default value (i.e., 0.01) to ensure consistency [32]. The network topology with the minimum MAPE is taken as the optimal network topology because MAPE is one of the most important measures of forecasting capability.
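The topology search can be sketched as follows. The switch between the two hidden-node ranges is passed in as a flag because the exact switching condition is not restated here, and the MAPE values in the usage example are made up for illustration; in practice each candidate would be trained and evaluated on the validation records.

```python
def hidden_node_candidates(n_inputs, doubled):
    """Five candidate hidden-layer sizes: n-2..n+2, or 2n-2..2n+2 when doubled is True."""
    base = 2 * n_inputs if doubled else n_inputs
    return list(range(base - 2, base + 3))


def best_topology(mape_by_hidden, n_inputs, n_outputs=1):
    """Pick the hidden-layer size with minimum MAPE and label it {n_i-n_h-n_o}."""
    n_h = min(mape_by_hidden, key=mape_by_hidden.get)
    return f"{{{n_inputs}-{n_h}-{n_outputs}}}"


print(hidden_node_candidates(12, doubled=False))  # [10, 11, 12, 13, 14]
print(hidden_node_candidates(4, doubled=True))    # [6, 7, 8, 9, 10]
# Made-up MAPE values (%) for illustration only.
print(best_topology({6: 2.9, 7: 2.1, 8: 2.5, 9: 3.0, 10: 3.4}, n_inputs=4))  # {4-7-1}
```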
Because no explanatory variables are available in this study, ANN modeling with explanatory inputs is infeasible for food crop price forecasting. Consequently, this study uses self-predictor variables as the inputs for ANN. Here, a self-predictor variable is defined as a lag (past value) of the variable itself. This study proposes two input designs for modeling with CI techniques. Because seasonal effects may influence food crop prices, the first design selects the preceding 12 observations to forecast the food crop price at time $t$. Accordingly, the first design uses 12 self-predictor variables (i.e., $Y_{t-1}, Y_{t-2}, \dots, Y_{t-12}$) as input variables and a single variable (i.e., $Y_t$) as the output for ANN modeling. That is, the first design has 12 input nodes and one output node. This design is denoted as ANN_{1}.
The second input design for CI modeling uses an FSM. For this proposed modeling, this study integrates the FSM with ANN, SVR, and MARS to develop forecasting models: ARIMA serves as the FSM to extract important self-predictor variables, which then serve as the inputs for the ANN, SVR, and MARS models. The ARIMA model for rice price forecasting obtained in (25) indicates that $Y_{t-1}$, $Y_{t-2}$, $Y_{t-7}$, and $Y_{t-13}$ influence $Y_t$. Consequently, this study selected $Y_{t-1}$, $Y_{t-2}$, $Y_{t-7}$, and $Y_{t-13}$ as the self-predictor (input) variables for the proposed integrated model on the rice price data series. This second input design is denoted as ARIMA-ANN.
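Both input designs amount to building a table of lagged values. The sketch below (the helper name and the toy series 1, 2, ..., 30 are ours) produces the 12-lag inputs of the first design and the inputs at lags 1, 2, 7, and 13 suggested by the AR(1, 2, 7, 13) rice model for the second design.

```python
def lagged_dataset(y, lags):
    """Rows of self-predictor inputs [Y_{t-k} for k in lags] paired with the target Y_t."""
    max_lag = max(lags)
    X = [[y[t - k] for k in lags] for t in range(max_lag, len(y))]
    targets = [y[t] for t in range(max_lag, len(y))]
    return X, targets


prices = [float(v) for v in range(1, 31)]             # hypothetical series 1, 2, ..., 30
X1, y1 = lagged_dataset(prices, list(range(1, 13)))  # first design: Y_{t-1}..Y_{t-12}
X2, y2 = lagged_dataset(prices, [1, 2, 7, 13])       # ARIMA-selected lags for rice
print(X1[0], y1[0])  # [12.0, 11.0, ..., 1.0] 13.0
print(X2[0], y2[0])  # [13.0, 12.0, 7.0, 1.0] 14.0
```

The larger maximum lag of the second design (13 versus 12) means it loses one more training row at the start of the series.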
To forecast rice prices, the two ANN models contain 12 and 4 input nodes for the first and second designs, respectively. The numbers of hidden nodes considered were 10, 11, 12, 13, and 14 for the first design and 6, 7, 8, 9, and 10 for the second design. For the ANN_{1} design, the {12-14-1} structure provided the best results (i.e., the minimum testing MAPE) for rice prices. For the ARIMA-ANN design, the best structure was {4-7-1}. Table 3 shows three forecasting measurements for the different ANN topology settings of the two designs.

For SVR modeling of rice prices, this study adopted the same input structure as for ANN modeling, with two designs for the input variables. The first design used 12 variables, $Y_{t-1}, \dots, Y_{t-12}$, as the input variables and $Y_t$ as the output variable. The second design used four self-predictor variables, $Y_{t-1}$, $Y_{t-2}$, $Y_{t-7}$, and $Y_{t-13}$, as the input variables and $Y_t$ as the output variable. The first and second SVR designs are denoted as SVR_{1} and ARIMA-SVR, respectively. Because the parameter settings (e.g., the modifying coefficient $C$ and the RBF kernel parameter) often affect the performance of SVR modeling, an analytic parameter selection approach and a grid search are used in this study [26]. The best pair of SVR parameter settings is reported in the form {·, ·}_{SVR}. For rice price forecasting, the SVR_{1} design with parameter settings of {2^{−4}, 2^{11}}_{SVR} provided MAPE = 6.912%, and the ARIMA-SVR design with parameter settings of {2^{−5}, 2^{9}}_{SVR} provided MAPE = 2.057%.
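A grid search over powers of two can be sketched generically: evaluate every parameter pair on the grid and keep the pair with the smallest validation error. In the sketch below, `toy_validation_mape` is a hypothetical stand-in for "train an SVR with these two parameters and return its validation MAPE"; the exponent ranges are our assumption, and the analytic parameter selection step mentioned above is not reproduced.

```python
import itertools
import math


def grid_search(evaluate, exps_first, exps_second):
    """Return ((p1, p2), score) minimizing evaluate(p1, p2) over a grid of powers of two."""
    best = None
    for k1, k2 in itertools.product(exps_first, exps_second):
        pair = (2.0 ** k1, 2.0 ** k2)
        score = evaluate(*pair)
        if best is None or score < best[1]:
            best = (pair, score)
    return best


def toy_validation_mape(p1, p2):
    """Hypothetical stand-in for 'train SVR with (p1, p2), return validation MAPE'."""
    return (math.log2(p1) + 5) ** 2 + (math.log2(p2) - 9) ** 2


best_pair, best_score = grid_search(toy_validation_mape, range(-15, 4), range(-5, 16))
print(best_pair, best_score)  # (0.03125, 512.0) 0.0
```

Searching on a logarithmic (power-of-two) grid is a common choice because SVR performance tends to change on a multiplicative rather than additive scale of these parameters.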
For MARS modeling, this study also adopted the same input structure as for ANN and SVR modeling. The first and second MARS designs are denoted as MARS_{1} and ARIMA-MARS, respectively. Table 4 lists the variable selection results and BFs obtained by modeling the rice price data with MARS_{1}. In Table 4, the first column contains the variables included in the model, the second column contains the relative importance of these variables, the third column contains the BFs, and the final column contains the estimated coefficients of the BFs listed in the third column. Table 4 shows that three input variables played important roles in building the MARS forecasting model, with relative importance values (%) of 100, 13.1, and 12.1. The construction of the BFs can be illustrated with one of these variables, which enters the model through two BFs. The value of each BF is the larger of 0 and a signed difference between the variable and a knot value listed in Table 4; the estimated coefficients of the two BFs are 0.985 and −1.258, respectively. Accordingly, the MARS forecasting model for rice prices can be expressed as follows:

For ARIMA-MARS modeling, the variable selection results and BFs are summarized in Table 5, which shows that three input variables played important roles in building the ARIMA-MARS forecasting model. The ARIMA-MARS forecasting model for rice prices can be described as follows:

3.3. Forecast Modeling of Wheat Prices
Figure 2 displays the original time plot of wheat prices. Table 6 shows the parameter estimates obtained by modeling the wheat price data series with the ARIMA procedure. In Table 6, the notations "MA1,1," "MA1,2," "MA1,3," and "MA1,4" correspond to the four parameters of the MA model. Table 7 displays the Ljung-Box test results, which support the adequacy of the underlying ARIMA model. Accordingly, (30) is suitable for modeling wheat prices.


In addition, $a_t$ denotes the white noise at time $t$.
For ANN modeling of wheat prices, this study also used two ANN designs with the same structures as those used for rice prices. The first design used 12 self-predictor variables (i.e., $Y_{t-1}, \dots, Y_{t-12}$) as input variables and a single variable (i.e., $Y_t$) as the output variable. The second design uses ARIMA as an FSM to extract important self-predictor variables, which serve as the inputs for the ANN models. The ARIMA model for wheat price forecasting obtained in (30) indicates that four lagged white noise terms influence $Y_t$.
Because $a_{t-k} = Y_{t-k} - \hat{Y}_{t-k}$, where $\hat{Y}_{t-k}$ is the forecast of $Y_{t-k}$ made one period earlier, $Y_{t-k}$ influences $Y_t$ whenever $a_{t-k}$ appears in the model. Therefore, $Y_{t-k}$ should be selected as an input variable as long as the component $a_{t-k}$ is in the MA model. Following the same logic, because model (30) contains four such MA components, the four corresponding lagged prices influence $Y_t$. Accordingly, these four lagged prices were selected as the input variables for the second proposed ANN design on the wheat price data series.
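The substitution argument above can be written out for a single MA term. As an illustration only, assume the model contains one MA component at lag $k$ and no differencing, so that $Y_t = \delta + a_t - \theta_k a_{t-k}$ in the sign convention of the MA polynomial $\theta_q(B)$; with several MA terms the same substitution is repeated for each one.

```latex
\begin{aligned}
Y_t &= \delta + a_t - \theta_k\, a_{t-k}, \\
a_{t-k} &= Y_{t-k} - \hat{Y}_{t-k}
  \quad \text{(one-step forecast error at time } t-k\text{)}, \\
\Rightarrow\; Y_t &= \delta + a_t - \theta_k \bigl(Y_{t-k} - \hat{Y}_{t-k}\bigr).
\end{aligned}
```

Because $Y_{t-k}$ now appears explicitly on the right-hand side, it is a natural self-predictor input for $Y_t$, and its value is known at the time the forecast is made.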
Table 8 shows the forecast validity measures obtained by ANN modeling of the wheat price data for different ANN topology settings of the two designs. As shown in Table 8, the ANN_{1} design with the {12-11-1} structure has the smallest MAPE. For the second, or ARIMA-ANN, design, the {4-8-1} structure has the smallest MAPE.

For SVR modeling of wheat prices, this study adopted the same input structure as for ANN modeling. The first design employed 12 variables, $Y_{t-1}, \dots, Y_{t-12}$, as the input variables and $Y_t$ as the output variable. The second design employed the four self-predictor variables selected by ARIMA as the input variables and $Y_t$ as the output variable. SVR modeling of the wheat price data series yielded parameter settings of {2^{−8}, 2^{10}}_{SVR}, with MAPE = 4.661%, for the first design and parameter settings of {2^{−7}, 2^{−1}}_{SVR}, with MAPE = 4.347%, for the second design.
For MARS modeling with the first design, Table 9 presents the variable selection results and BFs. As shown in Table 9, three input variables played important roles in building the MARS forecasting model. The MARS forecasting model for wheat prices can be expressed as follows:

For MARS modeling with the second design, the variable selection results and BFs are summarized in Table 10. Furthermore, the MARS forecasting model for wheat prices is expressed as follows:

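A MARS model is a linear combination of basis functions (BFs). Each BF is a hinge such as $\max(0, x - c)$ or $\max(0, c - x)$ on an input variable, with knot $c$. The sketch below evaluates an additive MARS model built from such hinges. The intercept, knots, coefficients, and variable indices are hypothetical and do not reproduce the fitted wheat models.

```python
def hinge(x, knot, direction):
    """MARS hinge BF: max(0, x - knot) if direction == +1, max(0, knot - x) if -1."""
    return max(0.0, direction * (x - knot))


def mars_predict(intercept, terms, x):
    """Evaluate intercept + sum(coef * hinge(x[var], knot, direction)) over additive terms."""
    return intercept + sum(coef * hinge(x[var], knot, d) for coef, var, knot, d in terms)


# Hypothetical model: a mirrored pair of BFs on input 0 and one BF on input 2.
# Each term is (coefficient, input index, knot, direction).
terms = [(0.9, 0, 200.0, +1), (-0.4, 0, 200.0, -1), (0.1, 2, 180.0, +1)]
print(mars_predict(210.0, terms, [250.0, 230.0, 170.0]))
```

Input 1 never appears in `terms`, which mirrors how MARS keeps only the variables that matter (for example, the three inputs listed in Table 9).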
3.4. Forecast Modeling of Corn Prices
Figure 3 presents the time plot of the original corn price series, and Table 11 presents the parameter estimates obtained from ARIMA modeling. In Table 11, the notations “AR1,1,” “AR1,2,” “AR1,3,” “AR1,4,” “AR1,5,” and “AR1,6” denote the six parameters of the AR model. The parameters “AR1,2” and “AR1,4” are retained in the model even though the absolute values of their t-statistics are at most 2 (i.e., they are not significant at the typical Type I error level of 0.05). They are retained because they are important: the Ljung-Box statistics indicate that the model is inadequate when these two parameters are excluded. Table 12 reports the Ljung-Box test results, which support the adequacy of the ARIMA model. Thus, (34) is a suitable ARIMA model for the corn price series.


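The Ljung-Box statistic used to check model adequacy is $Q = n(n+2)\sum_{k=1}^{m} r_k^2/(n-k)$, where $n$ is the number of residuals and $r_k$ is their lag-$k$ sample autocorrelation. A large $Q$, relative to a chi-square distribution whose degrees of freedom are $m$ minus the number of estimated ARMA parameters, signals remaining autocorrelation and hence an inadequate model. The sketch computes $Q$ in plain Python. The choice of $m$ and the toy residuals are illustrative, and the p-value step is omitted.

```python
def autocorrelation(x, k):
    """Sample autocorrelation of x at lag k (mean-centred, full-sample denominator)."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    denom = sum(d * d for d in dev)  # assumes x is not constant
    num = sum(dev[t] * dev[t - k] for t in range(k, n))
    return num / denom


def ljung_box(residuals, max_lag):
    """Ljung-Box Q statistic: n(n+2) * sum_{k=1..m} r_k^2 / (n - k)."""
    n = len(residuals)
    return n * (n + 2) * sum(
        autocorrelation(residuals, k) ** 2 / (n - k) for k in range(1, max_lag + 1)
    )


# Toy residuals; in practice these are the fitted ARIMA model's residuals.
residuals = [0.5, -0.2, 0.1, 0.3, -0.4, 0.2, -0.1, 0.0]
print(ljung_box(residuals, 3))
```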
For ANN modeling of corn prices, the first design used 12 input variables, whereas the second design used six input variables extracted with the ARIMA-based FSM. Table 13 presents the forecast validity measures for different ANN topologies under the two designs. From Table 13, the ANN_{1} design with the {12-14-1} structure has the smallest MAPE, and for the second design, the {6-13-1} structure has the smallest MAPE.

For SVR modeling of the corn price series, this study adopted the same input structures as in ANN modeling. The first design employed the 12 self-predictor variables as the input variables and the current price as the output variable. The second design employed the six self-predictor variables selected with the ARIMA-based FSM as the input variables and the current price as the output variable. SVR modeling yielded the parameter settings {2^{−9}, 2^{7}}_{SVR} with an associated MAPE of 4.060% for the first design and {2^{−11}, 2^{11}}_{SVR} with an associated MAPE of 4.215% for the second design.
For MARS modeling of corn prices with the first design, Table 14 shows the variable selection results and BFs. As shown in Table 14, two variables play important roles in building the MARS forecasting model, which for corn prices can be described as follows:

For MARS modeling with the second design, Table 15 lists the variable selection results and BFs. The MARS forecasting model for the corn prices is expressed as follows:

4. Forecasting Comparison
This study proposed several models to forecast the prices of the three major food crops. These include four single models (ARIMA, ANN_{1}, SVR_{1}, and MARS_{1}) and three integrated models with FSMs (ARIMA-ANN, ARIMA-SVR, and ARIMA-MARS). Tables 16–18 present the forecasting results and the values of the three forecast validity measures for rice, wheat, and corn price forecasting, respectively. Lower values of these measures indicate better forecasting accuracy.
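The only measure named in this section is MAPE. The sketch computes it along with MAE and RMSE, two widely used error measures. Treating these two as the unnamed measures in Tables 16–18 is an assumption. The actual and forecast values are made-up numbers.

```python
import math


def mae(actual, forecast):
    """Mean absolute error."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)


def rmse(actual, forecast):
    """Root mean squared error."""
    return math.sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual))


def mape(actual, forecast):
    """Mean absolute percentage error, in percent."""
    return 100.0 * sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)


actual = [400.0, 410.0, 420.0]    # made-up prices
forecast = [390.0, 415.0, 420.0]  # made-up forecasts
print(mae(actual, forecast), rmse(actual, forecast), mape(actual, forecast))
```

For each measure, a lower value means a more accurate model. RMSE is never smaller than MAE on the same data.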



Among the single models (the first design) in Tables 16–18, the three CI models performed better than the ARIMA models. A possible reason is that ARIMA models have difficulty capturing the nonlinear features of food crop price series. The tables also show that no single model is best for all three crops. For example, among the single models, ANN_{1} appears to forecast rice prices most accurately, whereas SVR_{1} appears to forecast wheat and corn prices most accurately.
Tables 16–18 show that the proposed integrated models achieve better forecasting accuracy than the single models in most cases. For example, in Table 16, the proposed ARIMA-ANN model has values of 10.365%, 12.037%, and 2.650% for the three measures (the last being the MAPE) for rice price forecasting. All three values are smaller than the corresponding values for each of the four single models. Similarly, in Table 17, the ARIMA-SVR model has values of 9.779%, 11.371%, and 4.347% for wheat price forecasting, which are also smaller than the corresponding values for each of the four single models. Thus, in most cases, the proposed models integrated with the FSM provide more accurate forecasts than the single models.
In addition, Table 19 compares the overall percentage improvements (PIs) in forecasting accuracy of the proposed integrated models over the single models. The PIs for the three measures are defined as follows:

Table 19 shows that the proposed integrated models achieve positive PIs in most cases. For example, for rice price forecasting, the PIs of the proposed ARIMA-ANN model over the two single models, ARIMA and ANN_{1}, were 376.038% and 34.226%, respectively, for the first measure; 369.822% and 34.250% for the second measure; and 340.945% and 27.273% for the MAPE. For wheat and corn price forecasting, most of the PIs are also large and positive. Accordingly, the proposed integrated models achieve considerable improvements in forecasting accuracy. However, Table 19 also contains five negative PIs.
For wheat price forecasting, one PI of ARIMA-MARS over MARS_{1} was −1.029%. For corn price forecasting, one PI of ARIMA-ANN over ANN_{1} was −10.973%. The remaining three negative values occurred in corn price forecasting with the ARIMA-SVR model, whose three PIs over SVR_{1} were −2.601%, −5.525%, and −3.677%. A negative PI indicates that the integrated model performed worse on that measure. However, these negative values are relatively small in magnitude and have only a minor effect on overall forecasting performance. Moreover, in the wheat ARIMA-MARS and corn ARIMA-ANN cases, only one of the three PIs was negative. In other words, a model that attains the minimum value of one measure does not necessarily attain the minimum values of the other two.
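The defining equation for the PIs is not reproduced in this section. The sketch assumes $\mathrm{PI} = (E_{\text{single}} - E_{\text{integrated}})/E_{\text{integrated}} \times 100\%$, where $E$ is an error measure. This assumption is consistent with Table 19 on two counts. First, PIs above 100% are possible only when the integrated model's error is in the denominator. Second, the formula reproduces the corn ARIMA-SVR MAPE-based PI of −3.677% from the MAPEs reported in Section 3.4 (4.060% for SVR_{1} and 4.215% for ARIMA-SVR).

```python
def percentage_improvement(single_error, integrated_error):
    """PI in percent; positive when the integrated model has the smaller error.

    Assumed definition: (E_single - E_integrated) / E_integrated * 100.
    """
    return (single_error - integrated_error) / integrated_error * 100.0


# Corn, MAPE values reported in Section 3.4: SVR_1 = 4.060%, ARIMA-SVR = 4.215%.
pi_corn_svr = percentage_improvement(4.060, 4.215)
print(round(pi_corn_svr, 3))  # -3.677
```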
5. Discussion
Whereas the preceding section compared accuracy for each food crop separately, this section compares overall performance across all three food crops.
Table 20 lists the average PIs of the proposed integrated models over their common component, the ARIMA model. Figure 4 plots the average PIs of the three measures for the proposed integrated models over the single ARIMA models and shows considerable accuracy improvements. Additionally, Table 21 reports the average PIs of the three measures for the proposed integrated models over their other component (ANN_{1}, SVR_{1}, or MARS_{1}), and Figure 5 shows that these average PIs are also satisfactory.


In addition, Figure 6 displays the rice price forecasts of the ARIMA, ANN_{1}, and proposed ARIMA-ANN models for the twelve testing records. The forecasts of the ARIMA-ANN model are closest to the actual observations, whereas those of the basic ARIMA model are relatively far from them. The ANN_{1} model is more accurate than the ARIMA model. In the ANN_{1} design, the input vector consists of the 12 self-predictor variables, and the output is obtained through the nonlinear functional mapping in (10). In the ARIMA-ANN design, the input vector consists of the self-predictor variables selected by ARIMA, and the output is obtained through the same nonlinear mapping in (10).
Figure 7 presents the rice price forecasts of the ARIMA, SVR_{1}, and proposed ARIMA-SVR models for the twelve testing records. In the SVR_{1} design, the input vector consists of the 12 self-predictor variables, and the output is obtained through the nonlinear functional mapping in (17). In the ARIMA-SVR design, the input vector consists of the self-predictor variables selected by ARIMA, and the output is obtained through the same mapping in (17).
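A fitted ε-SVR predicts through its dual form, $f(\mathbf{x}) = \sum_i (\alpha_i - \alpha_i^*) K(\mathbf{x}_i, \mathbf{x}) + b$. Using a Gaussian RBF kernel for $K$ is an assumption here, because equation (17) and the kernel choice are not reproduced in this section. The sketch evaluates this mapping. The support vectors, dual coefficients, bias, and kernel width are hypothetical placeholders, not fitted values.

```python
import math


def rbf_kernel(u, v, gamma):
    """Gaussian RBF kernel exp(-gamma * ||u - v||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))


def svr_predict(support_vectors, dual_coefs, bias, gamma, x):
    """f(x) = sum_i (alpha_i - alpha_i*) K(x_i, x) + b; dual_coefs hold alpha_i - alpha_i*."""
    return sum(c * rbf_kernel(sv, x, gamma) for sv, c in zip(support_vectors, dual_coefs)) + bias


# Hypothetical two-input model; in ARIMA-SVR the inputs would be the ARIMA-selected lags.
svs = [[0.2, 0.4], [0.6, 0.8]]
coefs = [1.5, -0.5]
print(svr_predict(svs, coefs, bias=0.1, gamma=2.0, x=[0.2, 0.4]))
```

Only the length of the input vector changes between SVR_{1} and ARIMA-SVR. The form of the mapping stays the same.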
Figure 8 shows the rice price forecasts of the ARIMA, MARS_{1}, and proposed ARIMA-MARS models for the twelve testing records. In the MARS_{1} design, the input vector consists of the 12 self-predictor variables, and the output is obtained through the nonlinear functional mapping in (22). In the ARIMA-MARS design, the input vector consists of the self-predictor variables selected by ARIMA, and the output is obtained through the same mapping in (22). Figures 7 and 8 show that the forecasts of the ARIMA-SVR and ARIMA-MARS models are closest to the actual observations, whereas the forecasts of the ARIMA models are relatively far from them.
Similar results were obtained for wheat and corn prices, as shown in Figures 9–14: the forecasts of the proposed integrated models are closer to the actual observations, whereas those of the ARIMA model are farther away.
In this study, 309 records were collected to build the models for predicting rice, wheat, and corn prices. One assumption in building ARIMA models is that the model structure remains unchanged over time. Therefore, when more sample data become available, the ARIMA component of the proposed integrated models does not need to be rebuilt. The ARIMA feature selection procedure needs to be performed only once, and the same input variables can still be used for the ARIMA-ANN, ARIMA-SVR, and ARIMA-MARS models. This is a significant advantage over feature selection processes that must be repeated whenever more sample data become available, which can be time-consuming. Once feature selection has been performed with ARIMA, the ANN, SVR, and MARS models can still be retrained on the additional data to improve forecast accuracy.
6. Conclusion
Humans depend on food crops for survival. We consume large quantities of rice, wheat, corn, and other staple crops to maintain energy and good health. Accordingly, food crop price forecasting is very important and has drawn considerable attention in recent decades.
Typical CI forecasting techniques require proper explanatory variables to make predictions. However, proper explanatory variables are difficult to identify, and it is often infeasible to obtain their future values. Therefore, we proposed integrated ARIMA-ANN, ARIMA-SVR, and ARIMA-MARS models to forecast the prices of three important food crops. The ARIMA component acts as an FSM that captures important self-predictor variables. These variables, rather than unavailable explanatory variables, serve as the inputs for ANN, SVR, and MARS modeling. The experimental results reveal that the proposed integrated models are desirable alternatives for food crop price forecasting, because they outperformed the single models in most cases. The main contribution of the proposed models is their ability to predict food crop prices without requiring extensive effort to obtain the future values of explanatory variables.
One limitation of our models is that identifying the correct ARIMA model among the many possible candidates may be unintuitive and computationally expensive. However, the modeling procedure in this study can serve as a guideline for developing forecasting models for other food crop price series. Additionally, because MARS is effective at selecting important variables for predicting response variables, extending MARS modeling to serve as an FSM may be a valuable future research direction. Finally, the integrated models may also be combined with other CI techniques, such as extreme learning machines, time delay neural networks, or artificial immune systems, which may be worth investigating in the future.
Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this article.
Acknowledgments
This work is partially supported by the Ministry of Science and Technology of the Republic of China, Grant no. MOST 106-2221-E-030-010-MY2.
References
H. C. J. Godfray, J. R. Beddington, I. R. Crute et al., “Food security: the challenge of feeding 9 billion people,” Science, vol. 327, no. 5967, pp. 812–818, 2010.
D. Tilman, C. Balzer, J. Hill, and B. L. Befort, “Global food demand and the sustainable intensification of agriculture,” Proceedings of the National Academy of Sciences of the United States of America, vol. 108, no. 50, pp. 20260–20264, 2011.
Food and Agriculture Organization of the United Nations, “How to feed the world in 2050,” 2009, http://www.fao.org/wsfs/forum2050/wsfs-forum/en/.
International Development Research Centre, “Facts & figures on food and biodiversity,” 2017, https://www.idrc.ca/en/article/facts-figures-food-and-biodiversity.
H. C. Co and R. Boosarawongse, “Forecasting Thailand’s rice export: statistical techniques vs. artificial neural networks,” Computers & Industrial Engineering, vol. 53, no. 4, pp. 610–627, 2007.
H. F. Zou, G. P. Xia, F. T. Yang, and H. Y. Wang, “An investigation and comparison of artificial neural network and time series models for Chinese food grain price forecasting,” Neurocomputing, vol. 70, no. 16–18, pp. 2913–2923, 2007.
P. A. Seck, E. Tollens, M. C. S. Wopereis, A. Diagne, and I. Bamba, “Rising trends and variability of rice prices: threats and opportunities for sub-Saharan Africa,” Food Policy, vol. 35, no. 5, pp. 403–411, 2010.
A. Jumah and R. M. Kunst, “Seasonal prediction of European cereal prices: good forecasts using bad models?” Journal of Forecasting, vol. 27, no. 5, pp. 391–406, 2008.
U. Chakravorty, M. H. Hubert, M. Moreaux, and L. Nøstbakken, “Long-run impact of biofuels on food prices,” The Scandinavian Journal of Economics, vol. 119, no. 3, pp. 733–767, 2017.
R. Cai, J. D. Mullen, M. E. Wetzstein, and J. C. Bergstrom, “The impacts of crop yield and price volatility on producers’ cropping patterns: a dynamic optimal crop rotation model,” Agricultural Systems, vol. 116, pp. 52–59, 2013.
M. Yousefi, B. Khoshnevisan, S. Shamshirband et al., “Support vector regression methodology for prediction of output energy in rice production,” Stochastic Environmental Research and Risk Assessment, vol. 29, no. 8, pp. 2115–2126, 2015.
M. Khashei and M. Bijari, “An artificial neural network (p, d, q) model for time-series forecasting,” Expert Systems with Applications, vol. 37, no. 1, pp. 479–489, 2010.
Y. E. Shao and C. D. Hou, “Change point determination for a multivariate process using a two-stage hybrid scheme,” Applied Soft Computing, vol. 13, no. 3, pp. 1520–1527, 2013.
Y. E. Shao, C. D. Hou, and C. C. Chiu, “Hybrid intelligent modeling schemes for heart disease classification,” Applied Soft Computing, vol. 14, pp. 47–52, 2014.
G. Santamaria-Bonfil, A. Reyes-Ballesteros, and C. Gershenson, “Wind speed forecasting for wind farms: a method based on support vector regression,” Renewable Energy, vol. 85, pp. 790–809, 2016.
R. Tahmasebifar, M. K. Sheikh-El-Eslami, and R. Kheirollahi, “Point and interval forecasting of real-time and day-ahead electricity prices by a novel hybrid approach,” IET Generation, Transmission & Distribution, vol. 11, no. 9, pp. 2173–2183, 2017.
O. Abedinia, N. Amjady, and H. Zareipour, “A new feature selection technique for load and price forecast of electrical power systems,” IEEE Transactions on Power Systems, vol. 32, no. 1, pp. 62–74, 2017.
A. R. Gollou and N. Ghadimi, “A new feature selection and hybrid forecast engine for day-ahead price forecasting of electricity markets,” Journal of Intelligent & Fuzzy Systems, vol. 32, no. 6, pp. 4031–4045, 2017.
S. Jurado, A. Nebot, F. Mugica, and N. Avellana, “Hybrid methodologies for electricity load forecasting: entropy-based feature selection with machine learning and soft computing techniques,” Energy, vol. 86, pp. 276–291, 2015.
F. Jimenez, G. Sanchez, J. M. Garcia, G. Sciavicco, and L. Miralles, “Multi-objective evolutionary feature selection for online sales forecasting,” Neurocomputing, vol. 234, pp. 75–92, 2017.
O. B. Shukur and M. H. Lee, “Daily wind speed forecasting through hybrid KF-ANN model based on ARIMA,” Renewable Energy, vol. 76, pp. 637–647, 2015.
G. E. P. Box and G. M. Jenkins, Time Series Analysis: Forecasting and Control, Holden-Day, San Francisco, CA, USA, 1970.
G. M. Ljung and G. E. P. Box, “On a measure of lack of fit in time series models,” Biometrika, vol. 65, no. 2, pp. 297–303, 1978.
V. N. Vapnik, “An overview of statistical learning theory,” IEEE Transactions on Neural Networks, vol. 10, no. 5, pp. 988–999, 1999.
V. N. Vapnik, The Nature of Statistical Learning Theory, Springer, New York, NY, USA, 2000.
V. Cherkassky and Y. Ma, “Practical selection of SVM parameters and noise estimation for SVM regression,” Neural Networks, vol. 17, no. 1, pp. 113–126, 2004.
J. H. Friedman, “Multivariate adaptive regression splines,” The Annals of Statistics, vol. 19, no. 1, pp. 1–67, 1991.
2017, https://www.indexmundi.com/.
Rice, https://www.indexmundi.com/commodities/?commodity=rice&months=360.
Wheat, https://www.indexmundi.com/commodities/?commodity=wheat&months=360.
Corn, https://www.indexmundi.com/commodities/?commodity=corn&months=360.
Y. E. Shao and C. C. Chiu, “Applying emerging soft computing approaches to control chart pattern recognition for an SPC–EPC process,” Neurocomputing, vol. 201, pp. 19–28, 2016.
Copyright
Copyright © 2018 Yuehjen E. Shao and JunTing Dai. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.