Abstract

Well-grounded market information, particularly short-term forecasts of agricultural commodity prices, is essential for the sustainable development of the farming community. Such forecasts are mostly produced with time series models. In this study, a soft computing method, the artificial neural network (ANN), is used for short-term forecasting of agricultural commodity prices from time series data. Sunflower seed and soybean seed are the commodities considered. The soybean time series data cover five years (Jan 2014–Dec 2018) for the Akola district market, Maharashtra, India. The sunflower time series data cover six years (Jan 2011–Dec 2016) for the Kadari district market, Andhra Pradesh, India. Both datasets are taken from the Indian government website www.data.gov.in. The ANN model is applied to these datasets, and its performance is compared with that of the traditional ARIMA model. The mean absolute percentage error (MAPE) and the root mean square percentage error (RMSPE) are used as the performance measures of the forecasting models. On both measures, the ANN is a better forecasting model than the ARIMA model.

1. Introduction

In India, two-thirds of the total population depends directly or indirectly on agriculture [1, 2]. According to the “Agriculture Census of India” conducted in 2011, approximately 62% of the Indian population, living in rural areas, depends directly or indirectly on agriculture, and for this part of the population agriculture is the main source of income. India ranks second in the world in the production of agricultural commodities, and the agriculture sector contributes almost 18% of Indian GDP [3]. Agricultural commodities are therefore an important source of earnings, and commodity prices [4] have a crucial influence on the Indian economy. Forecasts of agricultural commodity prices play an important role for farmers, policymakers, and various administrative offices. For example, if a farmer knows in advance the price of a crop in the near future (short term), he can decide how much area to devote to that crop. Government agencies also need to know the probable price of a commodity in advance to implement government schemes (subsidy schemes and import/export activity) smoothly.

Agricultural commodity price forecasting is important for the sustainability of future generations. With ever-increasing demand for agricultural products and shrinking agricultural land, such forecasting is important for the sustainability of farmers, particularly in India, whose economy is largely agriculture based. It can help farmers and other stakeholders sustain their activities over a longer period. Its benefits also reach consumers, who gain access to healthy and economical food products, leading to improved health, and it can support the sustainable use of agricultural land and products.

Forecasting agricultural commodity prices matters in day-to-day life. Agricultural price fluctuations are increasing, which leads to mismanagement of household food expenditure. Agriculturalists need to address this problem for a better future of agricultural commodities.

Fluctuating and rising agricultural prices are one of the major factors affecting the global fight against poverty. Many models are used for forecasting these prices, statistical methods being the most common, but no model yet solves the problem efficiently.

Methods for forecasting agricultural commodity prices fall into two groups: structural and nonstructural. Structural methods [5] mainly consider the supply–demand ratio; however, for developing countries it is very difficult to estimate consumer demand and the production of a particular crop. Nonstructural methods [6] can be categorized into statistical techniques [7, 8] and machine learning techniques. Both use historical data collected as a time series, which can be linear or nonlinear in nature, and there are various forecasting methods [9] for such data. For nonlinear time series data, the ANN is a better alternative than statistical models [10, 11].

Several studies have addressed the forecasting of agricultural commodity prices in developing countries such as India. According to [12, 13], special features of the ANN, such as nonlinearity, adaptability, and its input–output mapping capability, provide strong support for using the ANN as a forecasting model.

In [14], the ARIMA model and a time delay neural network (TDNN) are used for time series forecasting of agricultural commodity prices. The authors concluded that the neural network model performed better because of the nonlinear nature of the time series data. They also presented a hybrid model; surprisingly, it was less accurate than the ANN for soybean data but more accurate for mustard data.

According to the work reported in [15], the neural network is a very good alternative for short-term forecasting, whereas the Box–Jenkins method performs better for very short-term forecasting. The authors also noted that a neural network without a hidden layer can behave similarly to the Box–Jenkins method.

The work presented in [16] used the support vector machine to forecast financial time series data and found it more efficient than the back propagation neural network.

In [17], the authors applied the ANN to multivariate time series data. Using flour prices from three cities, they concluded from the training and testing results that the ANN model is well suited to forecasting.

In [18], the ANN model is used for electrical load forecasting. The authors exploited the ability of the ANN to learn the relationship among past load data and current and future temperatures, and the results on the test data were very satisfactory.

In [19], the Jordan neural network is used to forecast inflation from time series data, using macroeconomic variables such as financial variables, lagged inflation, and labor market variables. In [20], the ANN is used for sales forecasting of apparel retail chain stores, with an observed MAPE of 8.79%. Other applications of the ANN to forecasting based on time series data include the following:
(1) Electricity load forecasting [21, 22]
(2) Financial forecasting [10, 23]
(3) Monthly average rainfall prediction [24]

This study is organized in four sections. The first (current) section briefly introduces the problem statement and the solutions proposed by other researchers. The second section describes the data and the computational models ARIMA and ANN. The third section presents the implementation and analyzes the results to measure the accuracy of the models discussed in the second section. The fourth section concludes the work presented in this study.

2. Materials and Methods

Sunflower and soybean price time series data are used in this research. A statistical description of the data is given in Table 1. The soybean time series data are described as follows:
(1) Taken from “data.gov.in”, an Indian government website
(2) For the period of five years (January 2014–December 2018)
(3) Related to the Akola district market, Maharashtra, India

The sunflower time series data are described as follows:
(1) Taken from “data.gov.in”, an Indian government website
(2) For the period of six years (January 2011–December 2016)
(3) Related to the Kadari district market, Andhra Pradesh, India

2.1. Forecasting Techniques

Forecasting is defined as prediction made by scientific calculation from historical data and demand–supply data. A classification of forecasting techniques [25] is shown in Figure 1. Forecasting techniques are mainly divided into two types: “Quantitative Technique” and “Qualitative Technique.” Qualitative methods rely on information that cannot be measured numerically. They are also known as judgmental forecasting [26], in which the prediction is based on surveys, events, and other noncomputational factors. Quantitative techniques [27] work on numerical data and are also known as statistical or time series techniques. Time series forecasting can be divided into (a) classical Box–Jenkins models [15, 28, 29] and (b) machine learning models [30, 31]. The classical models work well on linear data, whereas machine learning models work well on a wider range of data. The ARIMA and ANN models are discussed later in this section.

The choice of forecasting technique depends on various factors, such as the required level of accuracy, the purpose of forecasting, the type of data available, and the forecasting horizon. Qualitative models for agricultural commodity forecasting are very expensive and not suitable for developing countries. As India is a developing country, time series forecasting models are suitable for forecasting agricultural commodity prices. Because agricultural time series data are nonlinear in nature, the artificial neural network model is well suited [32–35] to forecasting agricultural commodity prices.

2.2. Forecasting Using ARIMA

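ARIMA stands for autoregressive (AR) integrated (I) moving average (MA). It follows the Box–Jenkins methodology [5, 29, 36, 37]. An ARIMA model [38] is specified by three parameters, p, d, and q, as shown in Figure 2: p is the order of the autoregressive part (the number of lagged observations), d is the degree of differencing needed to make the series stationary, and q is the order of the moving average part (the number of lagged forecast errors).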
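For reference, the standard ARIMA(p, d, q) model can be written with the backshift operator $B$ (where $B y_t = y_{t-1}$). The symbols $\phi_i$, $\theta_j$, $c$, and $\varepsilon_t$ are the usual textbook notation and are assumed here; they are not taken from Figure 2:

```latex
\left(1 - \sum_{i=1}^{p} \phi_i B^{i}\right)(1 - B)^{d}\, y_t
  = c + \left(1 + \sum_{j=1}^{q} \theta_j B^{j}\right)\varepsilon_t,
\qquad \varepsilon_t \sim \text{white noise}.
% ARIMA(0,1,0): both polynomials reduce to 1, so
% y_t = y_{t-1} + c + \varepsilon_t   (a random walk, with drift when c \neq 0)
```

The ARIMA(0, 1, 0) special case in the comment is the model selected later in this study.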

The working of the ARIMA model is shown in Figure 3. Visualizing the time series data is the first and most basic step. After visualization, the data can be preprocessed, for example by removing outliers and handling missing values. Visualization also indicates whether the data are stationary. If the series is nonstationary, it must first be made stationary. The optimal parameters of the ARIMA model are then identified with the help of the ACF and PACF plots [39].
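The sketch below shows, under stated assumptions, what the differencing and ACF/PACF step computes. It assumes the monthly prices are held in a one-dimensional numpy array. The helper names `difference`, `acf`, `pacf_from_acf`, and `pacf` are hypothetical and only mirror what R's `diff()`, `acf()`, and `pacf()` return; this is not code from the study.

```python
import numpy as np


def difference(x, d=1):
    """Apply d rounds of first differencing, y_t - y_{t-1}."""
    x = np.asarray(x, dtype=float)
    for _ in range(d):
        x = np.diff(x)
    return x


def acf(x, nlags):
    """Sample autocorrelation at lags 0..nlags (deviations from the sample mean)."""
    xc = np.asarray(x, dtype=float) - np.mean(x)
    denom = np.dot(xc, xc)
    n = len(xc)
    return np.array([np.dot(xc[: n - k], xc[k:]) / denom for k in range(nlags + 1)])


def pacf_from_acf(r, nlags):
    """Partial autocorrelations at lags 1..nlags via the Durbin-Levinson recursion."""
    phi = np.zeros((nlags + 1, nlags + 1))
    out = np.zeros(nlags)
    phi[1, 1] = out[0] = r[1]
    for k in range(2, nlags + 1):
        num = r[k] - np.dot(phi[k - 1, 1:k], r[k - 1:0:-1])
        den = 1.0 - np.dot(phi[k - 1, 1:k], r[1:k])
        phi[k, k] = num / den
        # update the lower-order coefficients for the order-k model
        phi[k, 1:k] = phi[k - 1, 1:k] - phi[k, k] * phi[k - 1, k - 1:0:-1]
        out[k - 1] = phi[k, k]
    return out


def pacf(x, nlags):
    return pacf_from_acf(acf(x, nlags), nlags)
```

In practice one would difference the price series until it looks stationary and then read candidate AR orders from where `pacf` cuts off and MA orders from where `acf` cuts off.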

2.3. Forecasting Using the Artificial Neural Network

A feed-forward neural network with a single hidden layer is used, as shown in Figure 4, and it is trained with back propagation. Let m be the input size (the number of neurons in the input layer) and n the number of nodes in the hidden layer. The inputs are scaled into the interval [0, 1]. The rectified linear unit (ReLU) [1] is used as the activation function from the input layer to the hidden layer, and the sigmoid function $\sigma(x) = 1/(1 + e^{-x})$ is used as the activation function from the hidden layer to the output layer. The ReLU function is defined as

$$f(x) = \max(0, x).$$

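The final output of the network is given by

$$\hat{y}_t = f_2\left(b_0 + \sum_{j=1}^{n} w_j\, f_1\left(b_j + \sum_{i=1}^{m} w_{ij}\, y_{t-i}\right)\right),$$

where $f_1$ and $f_2$ are the activation functions at the hidden layer and output layer, respectively, $y_{t-1}, \dots, y_{t-m}$ are the m lagged (scaled) prices in the input window, $w_{ij}$ is the weight from input i to hidden node j, $w_j$ is the weight from hidden node j to the output, and $b_j$ and $b_0$ are the hidden and output biases.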
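A minimal numpy sketch of the forward pass described in this section: inputs min-max scaled to [0, 1], a sliding window of m past prices, ReLU hidden units, and a sigmoid output. The function names are hypothetical, and the window ordering (oldest value first) is an assumption that only changes which weight index pairs with which lag.

```python
import numpy as np


def min_max_scale(x):
    """Scale a series into [0, 1]; also return (lo, hi) so predictions can be mapped back."""
    x = np.asarray(x, dtype=float)
    lo, hi = x.min(), x.max()
    return (x - lo) / (hi - lo), (lo, hi)


def inverse_scale(z, bounds):
    lo, hi = bounds
    return np.asarray(z, dtype=float) * (hi - lo) + lo


def make_windows(series, m):
    """X[k] holds m consecutive values (oldest first); y[k] is the value right after them."""
    series = np.asarray(series, dtype=float)
    X = np.array([series[k:k + m] for k in range(len(series) - m)])
    return X, series[m:]


def relu(z):
    return np.maximum(0.0, z)


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def forward(X, W1, b1, w2, b0):
    """Single hidden layer: f1 = ReLU on the hidden units, f2 = sigmoid on the output."""
    hidden = relu(X @ W1 + b1)        # shape (samples, n)
    return sigmoid(hidden @ w2 + b0)  # shape (samples,)
```

Because the sigmoid output lies in (0, 1), it matches targets scaled into [0, 1]; `inverse_scale` converts forecasts back to prices.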

2.4. Learning Method

Each training round of the network consists of two parts. In the first part, the network produces an output from the selected input window. In the second part, the error is calculated from the actual value and the predicted value, and this error is propagated back through the output layer to update the weights of the hidden-layer neurons for the next round, as shown in Figure 5.

During training, the error is calculated by comparing the actual value $y_t$ with the predicted value $\hat{y}_t$. The error is back propagated through the neural network to update the connection weights, starting with those between the hidden layer and the output layer.

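The output of a neuron is calculated as

$$O_j^{(L)} = f\left(\text{(Bias)}_j^{(L)} + \sum_{i} w_{ij}^{(L)}\, O_i^{(L-1)}\right),$$

where $O_j^{(L)}$ is the output of neuron j in layer L, $\text{(Bias)}_j^{(L)}$ is its bias, $w_{ij}^{(L)}$ is the weight from neuron i of layer L−1 to neuron j of layer L, $O_i^{(L-1)}$ is the output of neuron i in layer L−1, and f is the activation function of layer L.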

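The weight update at time T is given by

$$\Delta w_{ij}(T) = \eta\, \delta_j\, O_i + \alpha\, \Delta w_{ij}(T-1), \qquad w_{ij}(T) = w_{ij}(T-1) + \Delta w_{ij}(T),$$

where η is the learning rate, α is the momentum, $O_i$ is the input arriving over the connection, and $\delta_j$ [40] is the error signal of neuron j, calculated with the help of the gradient of the neuron's output (activation) function.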
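The back propagation step with a learning rate η and momentum α can be sketched in numpy as follows. This is a minimal illustration, not the study's implementation: it assumes full-batch updates, a half mean squared error loss, and hypothetical defaults (8 hidden units, η = 0.05, α = 0.9) that the text does not specify.

```python
import numpy as np


def train_backprop_momentum(X, y, n_hidden=8, eta=0.05, alpha=0.9, epochs=2000, seed=0):
    """Batch back propagation with momentum: ReLU hidden layer, sigmoid output.

    X: (N, m) windows of scaled prices; y: (N,) scaled targets in [0, 1].
    Returns the learned parameters and the loss (half mean squared error) per epoch.
    """
    rng = np.random.default_rng(seed)
    N, m = X.shape
    W1 = rng.normal(0.0, 0.5, size=(m, n_hidden))  # input -> hidden weights
    b1 = np.full(n_hidden, 0.1)                     # small positive bias keeps ReLUs active at start
    w2 = rng.normal(0.0, 0.5, size=n_hidden)        # hidden -> output weights
    b0 = 0.0                                        # output bias
    # previous updates, needed for the momentum term alpha * delta_w(T - 1)
    prev = {"W1": np.zeros_like(W1), "b1": np.zeros_like(b1), "w2": np.zeros_like(w2), "b0": 0.0}
    losses = []
    for _ in range(epochs):
        # forward pass
        net_h = X @ W1 + b1
        h = np.maximum(0.0, net_h)
        y_hat = 1.0 / (1.0 + np.exp(-(h @ w2 + b0)))
        err = y - y_hat
        losses.append(0.5 * np.mean(err ** 2))
        # error signals: sigmoid'(z) = y_hat * (1 - y_hat); ReLU'(z) = 1 if z > 0 else 0
        delta_out = err * y_hat * (1.0 - y_hat)
        delta_h = np.outer(delta_out, w2) * (net_h > 0)
        # delta_w(T) = eta * delta * input (averaged over the batch) + alpha * delta_w(T - 1)
        upd = {
            "w2": eta * (h.T @ delta_out) / N + alpha * prev["w2"],
            "b0": eta * delta_out.mean() + alpha * prev["b0"],
            "W1": eta * (X.T @ delta_h) / N + alpha * prev["W1"],
            "b1": eta * delta_h.mean(axis=0) + alpha * prev["b1"],
        }
        W1 = W1 + upd["W1"]
        b1 = b1 + upd["b1"]
        w2 = w2 + upd["w2"]
        b0 = b0 + upd["b0"]
        prev = upd
    return (W1, b1, w2, b0), losses
```

The momentum term reuses the previous update, which smooths the descent direction; the hidden-layer error signal is computed with the output weights from before the update, as standard back propagation requires.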

3. Results and Discussion

3.1. Results by Using ARIMA
3.1.1. Analyzing the Time Series Data

The ARIMA model is implemented in R. Figures 6 and 7 show the monthly average price of soybean for 2014–2018 and the monthly average price of sunflower seed for 2011–2016, respectively.

The boxplot of the soybean time series data in Figure 8 suggests that prices have a higher mean and variance in February, March, and April. Similarly, the boxplot of the sunflower time series data in Figure 9 shows a higher mean and variance in March and April.
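A month-wise boxplot compares the distribution of prices for each calendar month. A minimal numpy sketch of the mean and variance behind such a comparison, assuming the monthly series starts in January (true for both datasets here); `monthly_stats` is a hypothetical helper, not code from the study:

```python
import numpy as np


def monthly_stats(prices, start_month=1):
    """Mean and sample variance of prices for each calendar month (1 = January)."""
    prices = np.asarray(prices, dtype=float)
    # calendar month of each observation, given the month of the first one
    months = (np.arange(len(prices)) + start_month - 1) % 12 + 1
    stats = {}
    for month in range(1, 13):
        values = prices[months == month]
        if len(values) > 0:
            var = values.var(ddof=1) if len(values) > 1 else float("nan")
            stats[month] = (values.mean(), var)
    return stats
```

With five years of data, each month contributes five observations, so these month-wise statistics rest on small samples and should be read as indicative.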

3.1.2. Components of Time Series Data

From the analysis of Figures 8 and 9, the time series data contain components such as seasonality, trend, and cycle, which are shown in Figures 10 and 11.
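The component plots are not reproduced here. The sketch below illustrates classical additive decomposition, the method behind R's `decompose()`; whether the study used `decompose()`, STL, or another method is not stated, so the choice of method is an assumption. It assumes an even period (12 for monthly data) and a series starting at the first position of the cycle. `decompose_additive` is a hypothetical name.

```python
import numpy as np


def decompose_additive(x, period=12):
    """Classical additive decomposition: x = trend + seasonal + remainder (period must be even)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    # centered 2 x period moving average: weights 1/(2p), 1/p, ..., 1/p, 1/(2p)
    weights = np.r_[0.5, np.ones(period - 1), 0.5] / period
    half = period // 2
    trend = np.full(n, np.nan)  # undefined for the first and last half-period
    trend[half:n - half] = np.convolve(x, weights, mode="valid")
    detrended = x - trend
    # average the detrended values at each position in the cycle, then center them to sum to zero
    idx = np.arange(n) % period
    seasonal_means = np.array([np.nanmean(detrended[idx == k]) for k in range(period)])
    seasonal_means -= seasonal_means.mean()
    seasonal = seasonal_means[idx]
    remainder = x - trend - seasonal
    return trend, seasonal, remainder
```

A cyclic component is not separated by this method; it stays inside the trend estimate.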

3.1.3. Finding Parameters of the ARIMA Model

Stationarity of the time series data is checked with the augmented Dickey–Fuller (ADF) test, as shown in Figure 12. The ADF test result for the soybean time series data is as follows:
Data: Soyabean
Dickey–Fuller = −2.2649, Lag order = 3, p-value = 0.4683

As the p-value is greater than 0.05, the null hypothesis of a unit root cannot be rejected, so the series is not stationary. To obtain stationary data, a difference operation is applied to the data and the ADF test is performed again, with the following result:
Data: diffSoyabean
Dickey–Fuller = −3.4255, Lag order = 3, p-value = 0.04386
Alternative hypothesis: stationary
Since this p-value is below 0.05, the differenced soybean series is treated as stationary.

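Similarly, the ADF test on the differenced sunflower time series data gives the following result:
Data: diffSunflower
Dickey–Fuller = −3.829, Lag order = 3, p-value = 0.02325
This p-value is also below 0.05, so the differenced sunflower series is stationary.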
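The output format above matches R's `adf.test()` from the tseries package, which is presumably what was used (an assumption). That function regresses the first difference on a constant, a linear trend, the lagged level, and k lagged differences, with default k = trunc((n − 1)^(1/3)). That default gives k = 3 for 60 monthly values, consistent with the reported lag order. The numpy sketch below computes only the test statistic under that assumed regression form. It does not compute the p-value, which comes from Dickey–Fuller tables, and it has not been checked against R's output. `adf_statistic` is a hypothetical name.

```python
import numpy as np


def adf_statistic(y, k=None):
    """ADF t-statistic for the regression
    dy_t = a + b*t + g*y_{t-1} + sum_{i=1..k} c_i*dy_{t-i} + e_t, returning g_hat / se(g_hat).
    The p-value (from Dickey-Fuller tables) is not computed here.
    """
    y = np.asarray(y, dtype=float)
    if k is None:
        k = int(np.trunc((len(y) - 1) ** (1 / 3)))  # default lag order, as in tseries::adf.test
    dy = np.diff(y)  # dy[t] = y[t + 1] - y[t]
    rows, target = [], []
    for t in range(k, len(dy)):
        lagged_diffs = dy[t - k:t][::-1]  # the k previous differences, most recent first
        rows.append(np.r_[1.0, t + 1, y[t], lagged_diffs])  # constant, trend, lagged level, lags
        target.append(dy[t])
    X = np.array(rows)
    z = np.array(target)
    beta, *_ = np.linalg.lstsq(X, z, rcond=None)
    resid = z - X @ beta
    sigma2 = resid @ resid / (X.shape[0] - X.shape[1])
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return beta[2] / np.sqrt(cov[2, 2])  # t-statistic of the lagged-level coefficient
```

A strongly negative statistic is evidence against a unit root (that is, for stationarity); values near zero are consistent with a nonstationary series such as the undifferenced soybean prices.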

3.1.4. Forecasting the Price

The “auto.arima()” function from the R forecast package is used to fit the model automatically to the input time series data and to find the optimal parameters of the ARIMA model. ARIMA(0, 1, 0) is selected as the optimal forecasting model for both the sunflower and the soybean time series data, as shown in Figures 13 and 14. Figures 15 and 16 show the fitted linear models for the soybean and sunflower time series data, respectively.
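ARIMA(0, 1, 0) is the random walk model, so its forecasts have a simple closed form. The sketch below illustrates that form and is not a re-implementation of `auto.arima()`. Without drift, every h-step point forecast equals the last observation. With drift, the forecast adds the mean monthly change times h. The approximate 95% interval widens with √h. The drift option, the variance estimate (dividing by the number of differences), and the name `arima010_forecast` are assumptions for illustration. The intervals also ignore uncertainty in the drift estimate.

```python
import numpy as np


def arima010_forecast(y, h, drift=False, z=1.96):
    """Point forecasts and approximate 95% intervals from an ARIMA(0, 1, 0) model."""
    y = np.asarray(y, dtype=float)
    d = np.diff(y)
    c = d.mean() if drift else 0.0          # drift = mean month-to-month change
    sigma = np.sqrt(np.mean((d - c) ** 2))  # innovation standard deviation
    steps = np.arange(1, h + 1)
    point = y[-1] + c * steps               # flat line (no drift) or straight line (drift)
    half_width = z * sigma * np.sqrt(steps)
    return point, point - half_width, point + half_width
```

Because the forecast is flat (or a straight line with drift), it cannot follow month-to-month swings in the test period.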

3.2. Results by Using the ANN

The ANN is implemented in Python. Two time series, soybean prices and sunflower prices, are used as the experimental dataset. The soybean time series data cover sixty months (Jan 2014–Dec 2018) for the Akola district market, Maharashtra, India [41]. The sunflower time series data cover seventy-two months (Jan 2011–Dec 2016) for the Kadari district market, Andhra Pradesh, India.

3.2.1. Data Preprocessing

Data preprocessing mainly consists of analyzing the data, removing noise, handling missing values, and transforming the input values to the scale required by the model. The first step is to plot the series. Figure 17 shows the monthly average price of soybean for January 2014–December 2018, and Figure 18 shows the corresponding time series data for sunflower.
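The text does not say how missing values are handled. One common option, shown here purely as an assumed example, is linear interpolation between neighbouring months. `fill_missing_linear` is a hypothetical helper. Note that `np.interp` holds the nearest known value constant for gaps at the very start or end of the series.

```python
import numpy as np


def fill_missing_linear(prices):
    """Replace NaN entries by linear interpolation between neighbouring months."""
    x = np.asarray(prices, dtype=float).copy()
    missing = np.isnan(x)
    if missing.any():
        idx = np.arange(len(x))
        # interpolate missing positions from the known (index, price) pairs
        x[missing] = np.interp(idx[missing], idx[~missing], x[~missing])
    return x
```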

3.2.2. Training Dataset and Test Dataset

A supervised learning approach is used. The first 80 percent of the preprocessed data is used to train the model and the last 20 percent to test it, a split commonly adopted in the literature [42]. Figure 19 shows the division of the soybean dataset into training and test data, and Figure 20 shows the same division for the sunflower dataset.
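For time series, the 80/20 split must keep time order, with no shuffling, so that the model is always tested on months after those it was trained on. In the sketch below, the test inputs reuse the last m training values as history so that every test month receives a one-step-ahead forecast. Whether the study built its test windows this way is not stated, so treat it as an assumption. `chronological_split` is a hypothetical name.

```python
import numpy as np


def chronological_split(series, m, train_frac=0.8):
    """Split a series in time order and build m-lag input windows for training and testing."""
    series = np.asarray(series, dtype=float)
    cut = int(round(train_frac * len(series)))
    train, test = series[:cut], series[cut:]
    X_train = np.array([train[k:k + m] for k in range(len(train) - m)])
    y_train = train[m:]
    # prepend the last m training values so the first test month also has a full input window
    history = np.r_[train[-m:], test]
    X_test = np.array([history[k:k + m] for k in range(len(test))])
    return (X_train, y_train), (X_test, test)
```

For the 60-month soybean series this gives 48 training months and 12 test months. Fitting the [0, 1] scaling on the training portion only would also keep test information out of training.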

3.2.3. Forecasting the Price

To produce the forecasts, the trained model is applied to the test data. Figures 21 and 22 compare the actual and forecasted prices for soybean and sunflower, respectively. The forecasts of the ANN model are given in Tables 2 and 3 for soybean and sunflower, respectively.

3.3. Evaluating Forecasting Accuracy

Two measures of forecasting accuracy are used: the mean absolute percentage error (MAPE) and the root mean square percentage error (RMSPE).

3.3.1. MAPE

MAPE [43, 44] is one of the most important measures of the quality of a forecasting system. It is defined as

$$\text{MAPE} = \frac{100\%}{n} \sum_{t=1}^{n} \left| \frac{A_t - F_t}{A_t} \right|,$$

where $A_t$ is the actual price at time t, $F_t$ is the forecasted price at time t, and n is the number of forecasted values. The MAPE of the ANN is 2.4% for the forecasted sunflower time series data and 7.7% for the soybean time series data, as given in Table 4.
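The MAPE definition translates directly into numpy. This is a minimal sketch with a hypothetical function name. It assumes no actual price is zero, which holds for market prices.

```python
import numpy as np


def mape(actual, forecast):
    """Mean absolute percentage error, in percent."""
    a = np.asarray(actual, dtype=float)
    f = np.asarray(forecast, dtype=float)
    return 100.0 * np.mean(np.abs((a - f) / a))
```

For example, actual prices of 100 and 200 forecast as 90 and 220 are each off by 10%, so the MAPE is 10%.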

3.3.2. RMSPE

RMSPE is calculated in the following steps:
Step 1: Calculate the percentage residuals as $e_t = \frac{A_t - F_t}{A_t} \times 100$, where $A_t$ is the actual price and $F_t$ is the forecasted price.
Step 2: Square the residuals, $e_t^2$.
Step 3: Calculate the mean of the squared residuals by adding them and dividing by n.
Step 4: Take the square root of the mean obtained in Step 3, so that $\text{RMSPE} = \sqrt{\frac{1}{n} \sum_{t=1}^{n} e_t^2}$.
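The four RMSPE steps can be written directly in numpy. This is a minimal sketch; the function name is hypothetical, and it assumes nonzero actual prices.

```python
import numpy as np


def rmspe(actual, forecast):
    """Root mean square percentage error, in percent, following Steps 1 to 4."""
    a = np.asarray(actual, dtype=float)
    f = np.asarray(forecast, dtype=float)
    residuals = (a - f) / a * 100.0    # Step 1: percentage residuals
    squared = residuals ** 2           # Step 2: squared residuals
    mean_squared = squared.mean()      # Step 3: mean of the squared residuals
    return float(np.sqrt(mean_squared))  # Step 4: square root
```

Because a root mean square is never smaller than the mean of absolute values, RMSPE is always at least as large as MAPE on the same forecasts; the two are equal only when all percentage errors have the same magnitude.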

4. Conclusion and Future Work

India currently ranks second in the world in the production of agricultural commodities, and agriculture contributes almost 18% of Indian GDP. However, the market prices of these commodities fluctuate across regions. To give stakeholders a better understanding of these fluctuations, this study presents a short-term price forecasting model intended to support their sustainability. For this purpose, the ANN and ARIMA models are compared for forecasting prices, using sunflower and soybean time series data collected from the Indian government portal to train and test the proposed forecasting model. MAPE and RMSPE are used to measure the accuracy of the models. For the soybean and sunflower price series, the MAPE of the ANN is 2.4% and 7.7%, respectively, whereas the MAPE of ARIMA is 19.76% and 15.2%, respectively. Similarly, the RMSPE of the ANN for the soybean and sunflower series is 3.15% and 8.92%, respectively, whereas the RMSPE of ARIMA for the same series is 19.84% and 15.9%, respectively. These results indicate that the ANN is a better model than ARIMA for forecasting agricultural commodity prices. According to the literature, the ANN model suits nonlinear time series data and the ARIMA model suits linear time series data. Future work will therefore focus on developing a hybrid model for forecasting agricultural commodity prices to overcome the limitations of the ANN model.

Data Availability

The data used to support the findings of this study are taken from the website “http://www.data.gov.in” managed by Government of India.

Conflicts of Interest

The authors declare that they have no conflicts of interest.