Complexity

Volume 2018, Article ID 1910520, 17 pages

https://doi.org/10.1155/2018/1910520

## Integrated Feature Selection of ARIMA with Computational Intelligence Approaches for Food Crop Price Prediction

Department of Statistics and Information Science, Fu Jen Catholic University, New Taipei City, Taiwan

Correspondence should be addressed to Yuehjen E. Shao; stat1003@mail.fju.edu.tw

Received 26 September 2017; Revised 22 May 2018; Accepted 3 June 2018; Published 10 July 2018

Academic Editor: Laszlo T. Koczy

Copyright © 2018 Yuehjen E. Shao and Jun-Ting Dai. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

#### Abstract

Because of global climate change, lack of arable land, and rapid population growth, the supplies of three major food crops (i.e., rice, wheat, and corn) have been gradually decreasing worldwide. The rapid increase in demand for food has contributed to a continuous rise in food prices, which directly threatens the lives of over 800 million people around the world who are reported to be chronically undernourished. Consequently, food crop price prediction has attracted considerable attention in recent years. Recent integrated forecasting models have developed various feature selection methods (FSMs) to capture fewer, but more important, explanatory variables. However, one major problem is that the future values of these important explanatory variables are not available. Thus, predictions based on these variables are not actually possible. Because an autoregressive integrated moving average (ARIMA) can extract important self-predictor variables with future values that can be calculated, this study incorporates an ARIMA as the FSM for computational intelligence (CI) models to predict three major food crop (i.e., rice, wheat, and corn) prices. Other than the ARIMA, the components of the proposed integrated forecasting models include artificial neural networks (ANNs), support vector regression (SVR), and multivariate adaptive regression splines (MARS). The predictive accuracies of ARIMA, ANN, SVR, MARS, and the proposed integrated model are compared and discussed. Experimental results reveal that the proposed integrated model achieves superior forecasting performance for predicting food crop prices.

#### 1. Introduction

##### 1.1. Background

Everyone needs food. However, not everyone has enough food to survive. Food crops are a primary source of human food, with rice, wheat, and corn being the most widely consumed grains around the world. In this study, food crops refer to plants cultivated by agriculturists to provide food for human consumption. The demand for and consumption of food crops will rapidly increase in the future, propelled by a 2.3 billion person increase in global population and greater per capita incomes anticipated through the midcentury [1, 2]. According to the Food and Agriculture Organization of the United Nations (FAO) [3], the demand for food is expected to grow substantially by 2050. A major factor in this increase is world population growth. Today, world population has reached 7.6 billion, and it may reach 9.7 billion by 2050. This growth, along with rising incomes in developing countries, is driving up global food demand. Consequently, humanity faces a pressing question: will enough food crops be produced at affordable prices, or will rising prices drive more of humanity into hunger?

The three most important food crops, rice, wheat, and corn, directly provide more than 50% of all calories consumed by the world population. While the area harvested for wheat each year is 214 million ha, the areas harvested for rice and corn are 154 million ha and 140 million ha, respectively [4]. Additionally, the rapid rise in food prices has been a burden on the poor in developing countries, who spend roughly half their household income on food. Whether these food crops can be purchased affordably therefore plays a very important role in human life. Consequently, from the human point of view, the prediction of prices for rice, wheat, and corn has become a significant research topic.

##### 1.2. Related Work

Owing to various agricultural, environmental, meteorological, and biophysical factors, rice exports were reported to be very difficult to predict [5]. The authors of [5] employed the autoregressive integrated moving average (ARIMA) and artificial neural network (ANN) methodologies to predict Thailand’s rice exports. The wheat price in the Chinese market was predicted by using the ARIMA, ANN, and linear combination models [6]. The overall results showed that the ANN technique was the best prediction model. In Africa, around 40% of the rice consumed is imported [7]. This high dependence on rice imports indicates that Africa is highly exposed to international rice market shocks, sometimes with grave consequences for its food security and political stability. In addition, models were constructed to predict the quarterly prices of two types of food crops, barley and wheat [8]. In [9], it was reported that world food prices would rise by around 32% by 2022.

The dynamic relationship between acreage response, food crop yield, and price volatility was demonstrated by developing an optimization model [10]. In [11], it was stated that most countries would like to predict their annual food requirements in order to provide food security for their people. The authors proposed an artificial intelligence-based support vector regression (SVR) model to predict the output energy in rice production.

Because food crop prices are seasonally affected [5, 6], this study uses the ARIMA to predict the prices of rice, wheat, and corn. However, the assumptions inherent to the linear form of ARIMA may encounter problems in adopting nonlinear relationships for practical data [12]. Computational intelligence (CI) techniques have been widely used in many forecasting applications because of data-driven features and fewer a priori assumptions. Accordingly, in addition to ARIMA modeling, this study uses CI schemes, including ANN, SVR, and multivariate adaptive regression splines (MARS), for predicting the prices of the three food crops because they allow nonlinearity modeling and provide good forecasting characteristics.

However, CI modeling may face difficulties in its training process for designing an optimal topology owing to the use of a high number of input variables. Therefore, feature selection methods (FSMs) have been incorporated in order to reduce the number of explanatory or predictor variables [13, 14]. In this study, feature selection refers to the process of identifying a subset of relevant explanatory or predictor variables for use during forecasting model construction. This subset of variables contains fewer but more important input variables that aid in predicting the outcome.

Feature selection techniques have become a research hot spot in many forecasting applications. For example, in order to accurately predict wind speed, an SVR forecasting model with a feature selection procedure called phase space reconstruction has been proposed [15]. Additionally, a set of general ARIMA models was used for performance comparison. A hybrid forecasting model with a feature selection technique based on mutual information, extreme learning machines, and bootstrap techniques was proposed to predict day-ahead electricity prices [16]. The authors of [17] proposed a hybrid filter-wrapper approach to predict electrical load and the price of electricity. The feature selection method used in their approach identifies a minimum subset of the most informative features by considering the relevancy, redundancy, and interactions of candidate inputs. In [18], the authors proposed a two-step approach that selects a set of candidate features based on data characteristics. A hybrid ANN-based model was then used to predict the day-ahead price of electricity. A hybrid forecasting methodology was designed to predict electrical load in [19]. The study used a feature selection method that was developed based on entropy and several CI techniques. The prediction performance of the proposed model was compared to various ARIMA models. A regression model was proposed to predict online sales in [20]. Additionally, a feature selection methodology based on a multiobjective evolutionary algorithm was combined with the forecasting model. A combined ANN and Kalman filter approach was proposed to predict wind speed in [21]. An ARIMA feature selection method was used to determine the input structure for their hybrid model.

When reviewing related research, we noticed that, although many studies have focused on using FSMs to obtain accurate forecasts, little attention has been devoted to the use of FSMs for food crop price prediction. Additionally, we encounter practical problems when using FSMs to predict food crop prices. As mentioned earlier, one of the major purposes for using an FSM is to select a subset of explanatory variables for further use in forecasting. However, the future values of the explanatory variables that are selected by the FSM are unknown. Relatively little research has attempted to address this problem. As a consequence, predictions cannot be computed even though the most important explanatory variables have been identified by effective FSMs. We propose an integrated forecasting model to remedy the problems caused by unknown explanatory variables. The proposed integrated model employs an ARIMA as an FSM to capture important self-predictor variables, which then serve as input variables for the ANN, SVR, and MARS models. Because the future values of a self-predictor variable can be computed based on its previous and current values, food crop price predictions can be obtained.
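To make this two-stage idea concrete, the following Python sketch illustrates the logic under stated assumptions. It is our illustration, not the exact implementation used in this study: it assumes statsmodels and scikit-learn, and the series name, maximum lag, and network settings are placeholders.

```python
# Sketch of the two-stage integrated model: ARIMA acts as the feature
# selection method (FSM), and the selected self-predictor lags become
# inputs to a CI model (an MLP here). Names and settings are illustrative.
import numpy as np
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA
from sklearn.neural_network import MLPRegressor

def significant_lags(series: pd.Series, p_max: int = 13, alpha: float = 0.05):
    """Stage 1: fit AR(p_max) and keep the lags with significant coefficients."""
    res = ARIMA(series, order=(p_max, 0, 0)).fit()
    return [i for i in range(1, p_max + 1) if res.pvalues[f"ar.L{i}"] < alpha]

def lagged_design(series, lags):
    """Build the design matrix of selected lags and the aligned target."""
    y = np.asarray(series, dtype=float)
    k = max(lags)
    X = np.column_stack([y[k - lag:len(y) - lag] for lag in lags])
    return X, y[k:]

# Hypothetical usage with a monthly price series `prices` (pandas Series):
# lags = significant_lags(prices)                  # e.g., [1, 2, 7, 13]
# X, t = lagged_design(prices, lags)
# ann = MLPRegressor(hidden_layer_sizes=(8,), max_iter=2000).fit(X, t)
```

Because every input is a lag of the target series itself, multi-step forecasts can be produced by feeding each prediction back in as the next period's lag-1 value; this is what makes the future values of the selected variables computable.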

In this study, we propose four single-stage models and three integrated models for predicting food crop prices. A real monthly dataset was obtained, which contains the prices of rice, wheat, and corn from January 1990 to September 2015. This real dataset makes it possible to compare predictions of food crop prices using single-stage models and integrated models. The remainder of the study is organized as follows: Section 2 presents the modeling methodologies of different forecasting schemes. In Section 3, we describe the design structure for our single- and two-stage integrated models. Real data on the prices of three food crops are collected to verify the single-stage models and integrated models. Section 4 contains forecasting comparisons for four single-stage models and three integrated models. Some practical implications and other concerns are addressed in Section 5. The final section summarizes our research findings and contains our conclusions.

#### 2. Forecasting Methodologies

##### 2.1. Autoregressive Integrated Moving Average

Because seasonal effects are possibly involved in food crop price forecasting, seasonal ARIMA modeling should be developed [22]. A seasonal ARIMA model can be described as follows:

$$\phi_p(B)\,\Phi_P(B^s)\,W_t = \mu + \theta_q(B)\,\Theta_Q(B^s)\,a_t,$$

where

$B$ is the backward shift operator, defined as $B^k Y_t = Y_{t-k}$;

$s$ is the number of seasons in a year, and $s = 12$ for monthly data;

$d$ is the degree of the nonseasonal difference transformation;

$D$ is the degree of the seasonal difference transformation;

$W_t$ is the working series, which is stationary after fitting a suitable difference transformation, $W_t = (1 - B)^d (1 - B^s)^D Y_t$, to the original time series $Y_t$;

$\mu$ is an unknown constant;

$a_t$ is the white noise at time $t$, which is independent and identically distributed (iid) with a normal distribution;

$p$ is the order of the nonseasonal autoregressive (AR) model;

$P$ is the order of the seasonal AR model;

$q$ is the order of the nonseasonal moving average (MA) model;

$Q$ is the order of the seasonal MA model;

$\phi_p(B)$ is a polynomial function for the nonseasonal AR model, defined as $\phi_p(B) = 1 - \phi_1 B - \phi_2 B^2 - \cdots - \phi_p B^p$;

$\Phi_P(B^s)$ is a polynomial function for the seasonal AR model, defined as $\Phi_P(B^s) = 1 - \Phi_1 B^s - \Phi_2 B^{2s} - \cdots - \Phi_P B^{Ps}$;

$\theta_q(B)$ is a polynomial function for the nonseasonal MA model, defined as $\theta_q(B) = 1 - \theta_1 B - \theta_2 B^2 - \cdots - \theta_q B^q$;

$\Theta_Q(B^s)$ is a polynomial function for the seasonal MA model, defined as $\Theta_Q(B^s) = 1 - \Theta_1 B^s - \Theta_2 B^{2s} - \cdots - \Theta_Q B^{Qs}$.

The original nonstationary time series $Y_t$ should be transformed into a stationary working series $W_t$ through differencing. Typically, the appropriate transformations can be performed by four combinations of $d$ and $D$; that is, $(d, D) = (0, 0)$, $(1, 0)$, $(0, 1)$, and $(1, 1)$, respectively. Once the stationary working series has been attained, we can observe the behavior of the sample autocorrelation function (ACF) and sample partial autocorrelation function (PACF) to determine the values of $p$, $P$, $q$, and $Q$ for the seasonal ARIMA models.
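As a rough sketch of this identification step (assuming statsmodels; the differencing choice and the candidate orders below are placeholders chosen for illustration):

```python
# Box-Jenkins identification sketch for a monthly series (assumed: statsmodels).
import pandas as pd
import matplotlib.pyplot as plt
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
from statsmodels.tsa.statespace.sarimax import SARIMAX

def identify_and_fit(y: pd.Series):
    # Working series with d = 1, D = 1, s = 12 (one of the four (d, D) combinations)
    w = y.diff(1).diff(12).dropna()
    plot_acf(w, lags=36)    # spikes suggest q and Q
    plot_pacf(w, lags=36)   # spikes suggest p and P
    plt.show()
    # Candidate orders chosen from the ACF/PACF; the values here are placeholders
    model = SARIMAX(y, order=(1, 1, 1), seasonal_order=(1, 1, 1, 12))
    return model.fit(disp=False)
```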

The maximum likelihood (ML) technique is generally preferred for obtaining estimates of the model parameters [22]. For the ML technique, the likelihood function is maximized via nonlinear least squares using Marquardt’s method. Because the ML approach is more computationally expensive than conditional least squares (CLS) estimation, most computer packages employ CLS as the default approach. Least squares estimation refers to the parameter estimates associated with the smallest sum of squared errors $\sum \hat{a}_t^2$. For the CLS approach, we assume that the past unobserved errors are zero. The data series can then be represented in terms of its previous observations as follows:

$$W_t = a_t + \sum_{i=1}^{\infty} \pi_i W_{t-i}.$$

The $\pi$ weights are calculated from the ratio of the AR and MA polynomials, as follows:

$$\pi(B) = 1 - \sum_{i=1}^{\infty} \pi_i B^i = \frac{\phi_p(B)\,\Phi_P(B^s)}{\theta_q(B)\,\Theta_Q(B^s)}.$$

The CLS approach produces parameter estimates that minimize the following:

$$\sum_t \hat{a}_t^2 = \sum_t \left( W_t - \sum_{i=1}^{\infty} \hat{\pi}_i W_{t-i} \right)^2,$$

where the unobserved past values of $W_t$ are set to 0 and $\hat{\pi}_i$ is computed from the estimates of the ARIMA model parameters $\phi$ and $\theta$ at each iteration.
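Because $\pi(B)$ is the ratio of the AR polynomial to the MA polynomial, the $\pi$ weights can be obtained by power-series division of the two polynomials. A minimal NumPy sketch of that recursion (our illustration, not a library routine) is:

```python
# Compute the pi weights of pi(B) = phi(B) / theta(B) by power-series division.
# phi and theta hold the coefficients phi_1..phi_p and theta_1..theta_q with
# the sign convention phi(B) = 1 - phi_1 B - ... used in the text.
import numpy as np

def pi_weights(phi, theta, n_weights=20):
    f = np.zeros(n_weights + 1); f[0] = 1.0
    f[1:len(phi) + 1] = -np.asarray(phi)          # coefficients of phi(B)
    t = np.zeros(n_weights + 1); t[0] = 1.0
    t[1:len(theta) + 1] = -np.asarray(theta)      # coefficients of theta(B)
    c = np.zeros(n_weights + 1)                   # coefficients of pi(B)
    for i in range(n_weights + 1):
        # theta(B) * pi(B) = phi(B): solve for c_i term by term
        c[i] = f[i] - np.dot(t[1:i + 1], c[i - 1::-1][:i])
    return -c[1:]                                 # pi_1 .. pi_n

# Example: ARMA(1,1) with phi_1 = 0.5, theta_1 = 0.3 gives
# pi_i = (phi_1 - theta_1) * theta_1**(i-1) = 0.2, 0.06, 0.018, ...
print(pi_weights([0.5], [0.3], n_weights=5))
```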

After estimating the model parameters, diagnostic checks for testing the lack of fit of the seasonal ARIMA model should be conducted. A logical way to check the adequacy of a seasonal ARIMA model is to analyze the residuals obtained from the underlying model. The Ljung-Box test was developed to examine whether the first $K$ sample autocorrelations of the residuals indicate adequacy of the model [23]. The null hypothesis for this test is that the first $K$ autocorrelations are jointly zero; that is,

$$H_0:\ \rho_1 = \rho_2 = \cdots = \rho_K = 0.$$

The Ljung-Box statistic is described as

$$Q^* = n'\,(n' + 2) \sum_{k=1}^{K} \frac{r_k^2}{n' - k},$$

where

$Q^*$ is the Ljung-Box statistic, and the asymptotic distribution of $Q^*$ follows a chi-square distribution with $K - n_p$ degrees of freedom;

$n' = n - d$ is the number of observations of the working series, where $n$ is the number of observations of the original series;

$d$ is the degree of nonseasonal differencing;

$n_p$ is the number of parameters other than the constant that must be estimated in the ARIMA model under consideration;

$r_k^2$ is the square of the sample autocorrelation of the residuals at lag $k$.

The Ljung-Box test rejects the null hypothesis, indicating that the underlying model has significant lack of fit, if

$$Q^* > \chi^2_{\alpha}(K - n_p),$$

where $\chi^2_{\alpha}(K - n_p)$ is the chi-square distribution table value with $K - n_p$ degrees of freedom and significance level $\alpha$.
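In practice, this check is available in standard statistical libraries. For instance, a sketch using statsmodels (recent versions return a DataFrame; the lag count $K$ and parameter count here are placeholders):

```python
# Ljung-Box lack-of-fit check on ARIMA residuals (assumed: statsmodels).
from statsmodels.stats.diagnostic import acorr_ljungbox

def lack_of_fit(residuals, n_params, K=24, alpha=0.05):
    # model_df subtracts the n_p estimated parameters from the degrees of freedom
    lb = acorr_ljungbox(residuals, lags=[K], model_df=n_params)
    return bool(lb["lb_pvalue"].iloc[0] < alpha)  # True => significant lack of fit
```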

##### 2.2. Artificial Neural Networks

An ANN is composed of a large number of highly interconnected processing elements, referred to as nodes or neurons, working in parallel to solve a particular problem. The leftmost layer of the network is referred to as the input layer, the rightmost layer is called the output layer, and the middle layers are called hidden layers. In the ANN structure, each layer consists of a number of nodes connected by links; in recurrent architectures, nodes can also be connected to themselves, enabling a node to influence other nodes and itself. An ANN maps a set of input variables to one or more output variables. In addition, a certain number of nodes are contained in the hidden layer, and the hidden nodes are nonlinear functions of the input variables. The output variable is, in turn, a function of the nodes in the hidden layer.

ANN modeling is briefly described as follows. For an ANN model, the relationship between the inputs $(y_{t-1}, y_{t-2}, \ldots, y_{t-p})$ and the output $y_t$ can be represented as

$$y_t = w_0 + \sum_{j=1}^{q} w_j \, g\!\left( w_{0j} + \sum_{i=1}^{p} w_{ij} \, y_{t-i} \right) + \varepsilon_t,$$

where $w_{ij}$ and $w_j$ are the model connection weights; $p$ is the number of input nodes; $q$ is the number of hidden nodes; and $\varepsilon_t$ is the error term. In the hidden layer, the transfer function $g$ is often a logistic function, $g(u) = 1/(1 + e^{-u})$.

Consequently, the ANN performs a nonlinear functional mapping from the inputs to the output $y_t$; namely,

$$y_t = f\!\left( y_{t-1}, y_{t-2}, \ldots, y_{t-p}, \mathbf{w} \right) + \varepsilon_t,$$

where $\mathbf{w}$ is a vector of model parameters and $f$ is a function determined by the ANN structure and connection weights.

For the ANN structure, the nodes in the input layer receive input signals from an external source, and the nodes in the output layer generate the target output signals. The output of each neuron in the input layer is the same as the input to that neuron. The hidden layers adjust the weights of those inputs until the neural network’s error is minimized. For ANN processing, backpropagation is one method for computing the error contribution of each neuron after a batch of data is processed. This method can be used to adjust the weight of each neuron, thereby completing the learning process for that case. For each neuron $j$ in the hidden layer and each neuron $k$ in the output layer, the net inputs are given by

$$net_j = \sum_{i} w_{ij} \, o_i,$$

where $i$ is a neuron in the previous layer, $o_i$ is the output of node $i$, and $w_{ij}$ is the connection weight from neuron $i$ to neuron $j$. The neuron outputs are given by $o_i = x_i$ for the input layer and

$$o_j = \frac{1}{1 + e^{-(net_j + \theta_j)}}$$

for the hidden and output layers, where $x_i$ is the input signal from the external source to node $i$ in the input layer and $\theta_j$ is a bias.

The generalized delta rule is the conventional technique used to derive the connection weights in the network. First, a set of random numbers is assigned to the connection weights. Then, for a presentation of a pattern with a target output vector $\mathbf{t} = (t_1, t_2, \ldots, t_M)$, the sum of squared errors to be minimized is given by

$$E = \frac{1}{2} \sum_{k=1}^{M} (t_k - o_k)^2,$$

where $M$ is the number of output nodes. By minimizing the error $E$ using the gradient descent technique, the connection weights can be updated by applying the following equations:

$$\Delta w_{jk}(n+1) = \eta\,(\delta_k \, o_j) + \alpha\,\Delta w_{jk}(n),$$

where, for output nodes,

$$\delta_k = o_k (1 - o_k)(t_k - o_k),$$

and for hidden nodes,

$$\delta_j = o_j (1 - o_j) \sum_{k} \delta_k w_{jk}.$$

Note that the learning rate $\eta$ affects the network’s generalization ability and learning speed to a great extent.
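A compact NumPy sketch of these update rules for a single hidden layer (our illustration; logistic activations throughout, biases omitted for brevity, momentum term $\alpha$ included as above):

```python
# One generalized-delta-rule update for a 1-hidden-layer network with logistic
# activations; eta is the learning rate and alpha the momentum coefficient.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_step(x, t, W1, W2, dW1, dW2, eta=0.1, alpha=0.9):
    # Forward pass: o_j = sigmoid(net_j) for hidden and output layers
    h = sigmoid(W1 @ x)                      # hidden outputs o_j
    o = sigmoid(W2 @ h)                      # network outputs o_k
    # delta_k = o_k (1 - o_k)(t_k - o_k) for output nodes
    delta_out = o * (1 - o) * (t - o)
    # delta_j = o_j (1 - o_j) sum_k delta_k w_jk for hidden nodes
    delta_hid = h * (1 - h) * (W2.T @ delta_out)
    # Delta w(n+1) = eta * (delta * o) + alpha * Delta w(n)
    dW2 = eta * np.outer(delta_out, h) + alpha * dW2
    dW1 = eta * np.outer(delta_hid, x) + alpha * dW1
    return W1 + dW1, W2 + dW2, dW1, dW2
```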

##### 2.3. Support Vector Regression

SVR is an adaptation of the support vector machine (SVM), one of the most powerful techniques in CI [24, 25]. The basic concept of SVR is to find a model function to represent the relationship between the features and the target. The modeling of SVR can be described as follows. Suppose

$$f(\mathbf{x}) = \left( \mathbf{w} \cdot \varphi(\mathbf{x}) \right) + b,$$

where $\mathbf{x}$ and $f(\mathbf{x})$ represent the model input vector and output, $\mathbf{w}$ is the weight vector, $b$ is a constant, $\varphi(\mathbf{x})$ denotes a mapping function into the feature space $F$, and $(\cdot)$ describes the dot product in the feature space $F$.

Typical regression modeling estimates the coefficients by minimizing the squared error, which can be considered as an empirical risk based on a loss function. The $\varepsilon$-insensitivity loss function was introduced in [25] and can be described as follows:

$$L_{\varepsilon}\big(y, f(\mathbf{x})\big) = \begin{cases} 0, & \left| y - f(\mathbf{x}) \right| \le \varepsilon, \\ \left| y - f(\mathbf{x}) \right| - \varepsilon, & \text{otherwise}, \end{cases}$$

where $f(\mathbf{x})$ is the model output and $\varepsilon$ defines the region of $\varepsilon$-insensitivity. When the predicted value falls within the band area, the loss is zero. However, when the predicted value falls outside the band area, the loss is defined as the difference between the predicted value and the margin.

When both the empirical risk and the structure risk are considered, the SVR can be designed to minimize the following quadratic programming problem [25]:

$$\min\; R(f) = C \sum_{i=1}^{N} L_{\varepsilon}\big( y_i, f(\mathbf{x}_i) \big) + \frac{1}{2} \lVert \mathbf{w} \rVert^2,$$

where $N$ is the number of training observations; $\sum_{i=1}^{N} L_{\varepsilon}(y_i, f(\mathbf{x}_i))$ is the empirical risk; $\frac{1}{2}\lVert \mathbf{w} \rVert^2$ is the structure risk, which prevents overlearning and a lack of applied universality; and $C$ is a modifying coefficient representing the trade-off between the empirical and structure risks. With an appropriate modifying coefficient $C$, band area width $\varepsilon$, and kernel function, the optimum value of each parameter can be solved by the Lagrange procedure. In addition, the general form of the SVR-based regression function can be expressed as follows [25]:

$$f(\mathbf{x}) = \sum_{i=1}^{N} \left( \alpha_i - \alpha_i^* \right) K(\mathbf{x}_i, \mathbf{x}) + b,$$

where $\alpha_i$ and $\alpha_i^*$ are the Lagrangian multipliers and satisfy the equality $\alpha_i \alpha_i^* = 0$. In addition, $K(\mathbf{x}_i, \mathbf{x}_j)$ is the kernel function, whose value equals the inner product of the two vectors $\mathbf{x}_i$ and $\mathbf{x}_j$ in the feature space, $\varphi(\mathbf{x}_i)$ and $\varphi(\mathbf{x}_j)$; that is, $K(\mathbf{x}_i, \mathbf{x}_j) = \varphi(\mathbf{x}_i) \cdot \varphi(\mathbf{x}_j)$. Because the radial basis function (RBF) is the most commonly used kernel function [26], we employ it for the experimental study. The RBF is written as follows:

$$K(\mathbf{x}_i, \mathbf{x}_j) = \exp\!\left( -\frac{\lVert \mathbf{x}_i - \mathbf{x}_j \rVert^2}{2\sigma^2} \right),$$

where $\sigma$ is the width of the RBF.
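For reference, an $\varepsilon$-SVR with an RBF kernel of this form can be fitted with scikit-learn. A minimal sketch, with placeholder hyperparameter values:

```python
# Epsilon-SVR with an RBF kernel (assumed: scikit-learn); C, epsilon, and
# gamma correspond to the modifying coefficient, band width, and kernel width.
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Note: scikit-learn's RBF is exp(-gamma * ||x - x'||^2), so gamma = 1 / (2 * sigma**2)
svr = make_pipeline(StandardScaler(),
                    SVR(kernel="rbf", C=10.0, epsilon=0.1, gamma=0.1))
# Hypothetical usage with lagged inputs X and prices y:
# svr.fit(X_train, y_train); y_hat = svr.predict(X_test)
```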

##### 2.4. Multivariate Adaptive Regression Splines

MARS was developed for solving regression-type problems [27]. It is a nonparametric regression procedure that makes no assumption about the functional relationship between the response and explanatory variables. MARS modeling is based on a divide-and-conquer strategy where training datasets are partitioned into separate regions, each of which is assigned to its own regression equation. Consequently, MARS is appropriate and effective for problems with more than two input variables. Particularly, MARS can select important explanatory variables and relationships for complex data structures that often hide in higher dimensional data series.

A MARS model can be described as follows [27]:

$$f(\mathbf{x}) = a_0 + \sum_{m=1}^{M} a_m \prod_{k=1}^{K_m} \left[ s_{km} \left( x_{v(k,m)} - t_{km} \right) \right]_+,$$

where $a_0$ and $a_m$ are the parameters, $M$ is the number of basis functions (BFs), $K_m$ is the number of knots, $s_{km}$ takes on values of either 1 or −1 and indicates the right or left sense of the associated step function, $v(k,m)$ is the label of the independent variable, and $t_{km}$ is the knot location. The optimal MARS model is determined in a two-step procedure. In the first step, a model is grown by adding BFs until an overly large model is obtained. The BFs are then deleted in order from least contribution to most contribution by using the generalized cross-validation (GCV) criterion in the second step. A measure of variable importance is obtained by observing the decrease in the computed GCV values when a variable is removed from the model. The GCV is described as follows:

$$\mathrm{GCV}(M) = \frac{\dfrac{1}{N} \displaystyle\sum_{i=1}^{N} \left[ y_i - f_M(\mathbf{x}_i) \right]^2}{\left[ 1 - \dfrac{C(M)}{N} \right]^2},$$

where $N$ is the number of observations and $C(M)$ is the cost penalty measure of a model containing $M$ BFs.
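MARS is not part of scikit-learn itself; one commonly used open-source implementation is the py-earth package. A minimal sketch, under the assumption that py-earth is installed:

```python
# MARS fit via the py-earth package (assumption: the sklearn-contrib
# py-earth package is installed; it grows BFs, then prunes by GCV).
from pyearth import Earth

mars = Earth(max_terms=20, max_degree=1)
# Hypothetical usage with lagged inputs X and prices y:
# mars.fit(X_train, y_train)
# print(mars.summary())   # lists the surviving basis functions and GCV values
```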

#### 3. Food Crop Price Forecasting

##### 3.1. Datasets and Forecasting Criteria

In this study, the price data of the three most important food crops were collected for the period from January 1990 to September 2015 from the IndexMundi websites [28–31]. The datasets consist of 309 monthly records for each food crop price. The first 297 records were used to develop the different forecasting models, while the remaining 12 records were used to perform model validation; that is, the forecast horizon for the out-of-sample forecasting experiment is 12 months. Good prediction of prices may assist in planning agricultural supply, but the investigated data do not contain any information on population.

The forecasting capability of the models was compared using three forecasting accuracy criteria: the mean absolute percentage error (MAPE), root mean square error (RMSE), and mean absolute error (MAE). These forecasting measurements are expressed as follows:

$$\mathrm{MAPE} = \frac{1}{n} \sum_{t=1}^{n} \left| \frac{e_t}{y_t} \right| \times 100\%,$$

$$\mathrm{RMSE} = \sqrt{ \frac{1}{n} \sum_{t=1}^{n} e_t^2 },$$

$$\mathrm{MAE} = \frac{1}{n} \sum_{t=1}^{n} \left| e_t \right|,$$

where $e_t$ is the value of the residual at time $t$ and $y_t$ is the observation at time $t$.

Obviously, the lower the MAPE, RMSE, and MAE values, the closer the forecasted values are to the actual values.
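These three criteria are straightforward to compute; a minimal NumPy sketch is:

```python
# The three accuracy criteria as NumPy helpers (y: actuals, yhat: forecasts).
import numpy as np

def mape(y, yhat):
    return np.mean(np.abs((y - yhat) / y)) * 100.0

def rmse(y, yhat):
    return np.sqrt(np.mean((y - yhat) ** 2))

def mae(y, yhat):
    return np.mean(np.abs(y - yhat))
```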

##### 3.2. Forecast Modeling of Rice Prices

Figure 1 shows the original time plot for the rice price data series. This study uses the SAS package to run the ARIMA modeling. Table 1 shows the parameter estimates, and all the estimated parameters are significantly different from zero. In Table 1, the notations “AR1,1,” “AR1,2,” “AR1,3,” and “AR1,4” correspond to the parameters $\phi_1$, $\phi_2$, $\phi_7$, and $\phi_{13}$ of the AR model. The ARIMA model presented in Table 1 is a subset model. Since the parameters $\phi_1$, $\phi_2$, $\phi_7$, and $\phi_{13}$ are significantly different from zero, this study refers to the model in Table 1 as an AR(1, 2, 7, 13) model.
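This study fits the subset model in SAS; for readers without SAS access, an analogous subset AR specification is possible in statsmodels by passing a list of AR lags. A sketch under that assumption:

```python
# Subset AR(1, 2, 7, 13) model, analogous to the SAS fit reported in Table 1
# (a statsmodels sketch; the paper itself uses SAS).
from statsmodels.tsa.statespace.sarimax import SARIMAX

def fit_subset_ar(prices):
    # Passing a list as the AR order keeps only the named lags in the model
    model = SARIMAX(prices, order=([1, 2, 7, 13], 0, 0), trend="c")
    return model.fit(disp=False)
```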