Abstract

In recent years, few forecasting studies have addressed water transportation compared with other transportation modes. Although the area is neglected, inland waterway volume prediction is an important indicator for investment management and government policymaking. In time-series forecasting, some researchers have tried to narrow the interval of the predicted values. However, certain limitations detract from the popularity of these methods. For instance, if the prediction length is more than ten periods, the result is generally not acceptable. Therefore, we propose a hybrid model that combines the advantages of the individual methods to provide more accurate traffic volume forecasts and a more straightforward forecasting process. The empirical results show that the proposed model improves the long-term predictive accuracy of waterway traffic volume forecasts.

1. Introduction

Long-term forecasts are rarely reliable beyond ten periods: as time passes, the accumulated error makes the result barely acceptable. In addition, how to combine different influence factors with historical time-series data is an open question of interest to researchers. To deal with these problems, scholars have proposed Fuzzy Time Series (FTS), Autoregressive Integrated Moving Average (ARIMA), Artificial Neural Networks (ANNs), and other methods.

Thus, this research focuses on keeping the long-term prediction result within a narrow interval. In other words, the study attempts to combine different methods and exploit their respective strengths to increase forecasting accuracy. This work will benefit the long-term forecasting area. The inland waterway volume (VOL) usually has a long transfer period, and its forecast also contributes to inland waterway investment management and government policymaking.

When fuzzy prediction methods were first introduced to deal with forecasting, the historical data were linguistic values [1–3]. Since then, researchers have generally accepted the standard form of Fuzzy Time Series (FTS) [4], and numerous forecasting studies have built on the FTS method. Some of them forecasted pollution concentrations [5]. Duru adopted a fuzzy integrated logical forecasting model for dry bulk shipping index forecasting [6]. Chang and Chen proposed a temperature prediction method based on fuzzy clustering and fuzzy rule interpolation techniques [7]. Chen and Dang presented a method to construct a fuzzy regression model with variable spreads [8]. However, both of them used fuzzy rule techniques. Aladag and Basaran proposed an approach that used feedforward neural networks to handle high-order fuzzy time series [9]. Huang proposed a fuzzy time-series forecasting method that used a multivariable heuristic function to improve forecasting results by integrating various univariate models [10]. Karnik and Mendel designed a type-2 fuzzy logic system to deal with the Mackey-Glass chaotic time-series forecasting problem [11]. Yu and Huang applied neural networks to fuzzy time series forecasting and proposed bivariate models to improve forecasting quality [12]. Wang and Hsu proposed an improved fuzzy time series model for forecasting short-term time-series data [13]. Lau proposed a method to forecast energy consumption change based on fuzzy logic [14].

Another common approach to prediction is the Autoregressive Integrated Moving Average (ARIMA) model, a typical time-series prediction method. Cullinane applied the Box–Jenkins approach to forecast the shipping market and discussed the different forecasting stages in shipping [15]. Fanoodi et al. predicted blood platelet demand based on Artificial Neural Networks (ANNs) and ARIMA to reduce uncertainty in the supply chain [16]. Pavlyuk examined multivariate models, including ARIMA, Vector Autoregressive Moving Average (VARMA), Error Correction Vector Autoregressive Moving Average (EC-VARMA), Space-time ARMA (STARMA), and Multivariate Autoregressive State-Space (MARSS) models, for short-term traffic flow forecasting [17]. Li and Parson introduced a nonlinear neural network approach and evaluated its short- and long-period forecasting performance relative to ARMA models [18]. Cullinane et al. forecasted the Baltic Freight Index (BFI) and investigated the impact of a change in the BFI composition in 1993; the results indicated that the modification did not significantly affect the behavior of the BFI [19]. Wong generated short- and long-term predictions of the Baltic Dry Index (BDI) using fuzzy heuristic modeling with a Grey System and ARIMA, and the results indicated that ARIMA performs better in the long-term forecast [20].

However, the ARIMA prediction interval is often too broad to be useful for forecasting purposes. To address this, some researchers have combined Fuzzy Time Series and ARIMA models. Duru et al. presented a bivariate long-term fuzzy inference system for time-series forecasting tasks in the freight market by combining FTS and ARIMA models [21]. Wong et al. applied a traditional ARIMA model and the Fuzzy Time Series method to forecast the amount of Taiwan exports [22]. Torbat et al. proposed a fuzzy autoregressive integrated moving average model as an improved ARIMA version to narrow the forecast interval [23]. Li and Hu used three examples to test the forecasting ability of the Neuro-Fuzzy System ARIMA (NFS-ARIMA) model on nonlinear problems [24]. Tseng integrated the time-series ARIMA model and the fuzzy regression model into a Fuzzy ARIMA (FARIMA) model to forecast the exchange rate of NT dollars to US dollars [25]. Tseng and Tseng proposed a Fuzzy Seasonal ARIMA (FSARIMA) forecasting model to predict the total production value of the Taiwan machinery industry [26].

Compared with the previous works, the contributions of this paper are as follows.

The original multivariate information is transformed into integrated information by constructing a fuzzy membership set. As a result, the Correction Fuzzy Time Series Autoregressive Integrated Moving Average (CFTSARIMA) method dramatically simplifies the calculation and processes large amounts of data efficiently.

Generally, the FTSARIMA and FMLR models provide intervals of minimum width for a given confidence degree. However, their forecasts deviate significantly from the real data in the long term. We therefore propose the CFTSARIMA method, whose results remain acceptable. In equation (3), the h level determines the width of the value range. When the h value grows, the tendency of the prediction interval does not change; instead, the space between the Upper Control Limit (UCL) and the Lower Control Limit (LCL) increases. Therefore, it is necessary to recalculate a new center of the fuzzy number and the width around that center.

Compared with railway and road transportation, few studies focus on inland waterway transportation, although precise long-term forecasting could bring economic benefits. The example presented provides both the best and worst possible situations for decision-makers to consider.

3. Problem Statement

In this section, we first briefly review the coupled Fuzzy Clustering, Fuzzy Multiple Linear Regression (FMLR), and Fuzzy Time Series Autoregressive Integrated Moving Average (FTSARIMA) methods for long-term forecasting. However, the results of these methods diverge from the observed data. To solve this issue, we apply the Correction Fuzzy Time Series Autoregressive Integrated Moving Average (CFTSARIMA) method, which specifically addresses the projection bias problem. Table 1 summarizes the nomenclature for fuzzy ARIMA prediction and the correspondence between each symbol and its definition.

3.1. Fuzzy Clustering (See [7, 27])

Different time-series groups are chosen to participate in the prediction. A fuzzy clustering method is useful for reducing the noise in the forecasting information.

To measure the cluster similarity, the membership formula for different fuzzy sets is as follows.

When the predictors are positively correlated with the sample of eigenvalues, .

When the predictors are negatively correlated with the sample of eigenvalues, , in which are fuzzy sets, where , , is the influence factor and is the number of variable collected data sets. Let elements and belong to set . Then, the correlation coefficient lie in [0, 1].

is the similarity between two fuzzy clusters and , which is computed through equation (1), where , , is the predicted output, is the number of data sets, and and are the average values of and , respectively.

In Table 2, the time series are divided into different groups, and Table 3 selects the high similarity data from fuzzy clusters and compares them to y.
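As a rough illustration of a correlation-based similarity between an influence-factor series and the output series, the sketch below computes the absolute Pearson correlation, which always lies in [0, 1]. The function name `similarity` and the choice of the absolute Pearson coefficient are assumptions made for illustration; the exact membership formula of equation (1) may differ.

```python
from math import sqrt


def similarity(x, y):
    """Absolute Pearson correlation between two equal-length series.

    Illustrative stand-in for a cluster similarity in [0, 1]; it treats
    positively and negatively correlated series symmetrically.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("series must have equal length of at least 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("constant series have no defined correlation")
    return abs(cov) / sqrt(var_x * var_y)
```

Under this assumption, a factor that moves exactly opposite to the output receives the same similarity as one that moves exactly with it, which matches the separate treatment of positively and negatively correlated predictors described above.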

3.2. Fuzzy Multiple Linear Regression (See [8, 28])

The general form of a fuzzy multiple linear regression model can be expressed as follows:where , , is the predicted output, is the number of datasets, and , , is the variable of the collected data. The number of independent variables is the fuzzy coefficient of the independent variable.

The criterion of minimizing the total vagueness S is defined as the sum of individual spreads of the fuzzy parameters of the model:

The membership value of each observation must be greater than an imposed threshold h, h ∈ [0, 1]. The choice of the h level influences the widths of the fuzzy parameters. This criterion simply expresses the fact that the fuzzy output of the model should “cover” all the data points to a certain h level. denotes a center of a fuzzy parameter, and shows the fuzziness of its parameter. The fuzzy coefficients can be expressed as , , and .
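For orientation, a commonly used linear programming form of fuzzy linear regression with symmetric triangular coefficients (the Tanaka-type formulation) is sketched below. The symbols $a_j$ (center), $c_j$ (spread), $x_{ij}$ (with $x_{i0} = 1$), $y_i$, $n$, and $m$ are chosen here for illustration and are assumptions; they may not match the notation of equations (2) and (3).

```latex
\begin{aligned}
\min_{a,\,c}\quad & S = \sum_{j=0}^{n} c_j \sum_{i=1}^{m} |x_{ij}| \\
\text{s.t.}\quad
& \sum_{j=0}^{n} a_j x_{ij} + (1-h)\sum_{j=0}^{n} c_j |x_{ij}| \ge y_i, \quad i = 1,\dots,m, \\
& \sum_{j=0}^{n} a_j x_{ij} - (1-h)\sum_{j=0}^{n} c_j |x_{ij}| \le y_i, \quad i = 1,\dots,m, \\
& c_j \ge 0, \quad j = 0,\dots,n .
\end{aligned}
```

In this form, the two constraint families require every observation to lie inside the fuzzy output at level h. As h approaches 1, the factor (1 − h) shrinks, so the spreads must grow to keep all observations covered, which is one way to see why the h level influences the widths.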

In the following, take in equation (2), and the result is as follows:

3.3. FTSARIMA (See [23, 25, 26, 29])

The basis of this technique is the Autoregressive Integrated Moving Average model, ARIMA (p, d, q), where p is the order of the autoregressive part, d is the number of differencing operations needed to achieve stationarity, and q is the order of the moving average. The mathematical formula (5) is given as follows, where is a time series, and y denotes the differenced (stationarized) version of x, e.g., . K is a constant, the random variables and are error/disturbance terms, and and represent the corresponding parameters. The formula gives the mathematical representation of an ARIMA (p, d, q) model. The index t refers to the number of nonfuzzy data used in constructing the model. The problem of estimating the fuzzy regression parameters is then transformed into a linear programming problem [30].
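In standard Box–Jenkins notation, an ARIMA (p, d, q) model can be written as below. The symbols $\varphi_i$, $\theta_j$, $\varepsilon_t$, and the backshift operator $B$ are the usual textbook ones and are assumed here; they are not necessarily those of formula (5), and the sign placed in front of the moving-average terms varies between references.

```latex
\begin{aligned}
y_t &= (1 - B)^{d} x_t, \\
y_t &= K + \sum_{i=1}^{p} \varphi_i\, y_{t-i} + \varepsilon_t - \sum_{j=1}^{q} \theta_j\, \varepsilon_{t-j}.
\end{aligned}
```

For d = 1, the first line reduces to $y_t = x_t - x_{t-1}$, that is, the series of period-to-period changes.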

The fuzzy model interval is given as follows, where is the center of the fuzzy numbers and represents the extension value.
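To make the center-and-extension idea concrete, the following sketch computes the lower bound, center, and upper bound of a fuzzy linear output when every coefficient is a symmetric triangular fuzzy number. The function name `fuzzy_interval` and the argument names are hypothetical, and the symmetric triangular shape is an assumption.

```python
def fuzzy_interval(centers, spreads, inputs):
    """Return (lower, center, upper) of a fuzzy linear output.

    Each coefficient is assumed to be a symmetric triangular fuzzy number
    with the given center and non-negative spread.
    """
    if not (len(centers) == len(spreads) == len(inputs)):
        raise ValueError("centers, spreads and inputs must have equal length")
    if any(c < 0 for c in spreads):
        raise ValueError("spreads must be non-negative")
    # The center of the output is the ordinary linear combination.
    center = sum(a * x for a, x in zip(centers, inputs))
    # The spread of the output accumulates |input| times coefficient spread.
    spread = sum(c * abs(x) for c, x in zip(spreads, inputs))
    return center - spread, center, center + spread
```

The absolute value on the inputs keeps the extension non-negative even when an input is negative, so the lower bound never exceeds the upper bound.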

Despite the advantages of the fuzzy time series autoregressive integrated moving average (FTSARIMA) models, the forecasted intervals are too wide when the data sets contain fluctuations or outliers. Besides, the results also include the maximum and minimum value of . The result is an interval between and , obtained by the FTSARIMA with a confidence degree of .

3.4. Combined Model

As mentioned above, weight factors can participate in the combined model, and variance analysis presents the impact of the different factors through data analysis. Variance decomposition is used here to obtain the weights that combine the FMLR and FTSARIMA results. The new upper bound is and the new lower bound is , where α is the FMLR’s weight and β is the FTSARIMA’s weight. Through equation (3), where and are vectors of unknown variables. An interval with a new width is obtained from the FTSARIMA and FMLR. This combined model considers both the historical data and the other influence factors. The results of this model for long-term forecasting are more accurate than those of the other models.
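The weighting step can be illustrated with a minimal sketch that forms a convex combination of the two models' interval bounds. The function name `combine_bounds` is hypothetical, and the sketch assumes that the two weights sum to one (β = 1 − α), as the weights 0.88 and 0.12 used in the empirical section suggest.

```python
def combine_bounds(fmlr, ftsarima, alpha):
    """Weighted combination of two sequences of (lower, upper) bounds.

    alpha is the weight given to the FMLR bounds; the FTSARIMA bounds
    receive beta = 1 - alpha (an assumption of this sketch).
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if len(fmlr) != len(ftsarima):
        raise ValueError("both models must cover the same periods")
    beta = 1.0 - alpha
    return [
        (alpha * l1 + beta * l2, alpha * u1 + beta * u2)
        for (l1, u1), (l2, u2) in zip(fmlr, ftsarima)
    ]
```

Because the result is a convex combination, each combined bound lies between the corresponding bounds of the two input models.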

4. The Empirical Results and Validation

In general, the Mississippi shipping volume appears to be influenced mainly by the local urbanization rate , gross domestic product , rate of industrialization , proportion of railway transportation , proportion of road transportation , proportion of waterway transportation , and rate of agriculture .

The data consist of 47 annual observations of Mississippi River shipping volume from 1970 to 2016. The first 30 observations are used to formulate the model, and the last 17 observations are used to verify the results (see Supplementary Material for data analysis).
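The chronological split described here can be sketched as follows. The helper name `holdout_split` is hypothetical, and the sketch assumes that the first 30 years are the fitting sample and the remaining 17 years are the validation sample.

```python
def holdout_split(series, n_train):
    """Split a chronologically ordered series into fitting and validation parts."""
    if not 0 < n_train < len(series):
        raise ValueError("n_train must leave at least one validation point")
    return series[:n_train], series[n_train:]


years = list(range(1970, 2017))  # 47 annual observations, 1970 to 2016
fit_years, check_years = holdout_split(years, 30)
```

Under this assumption, the fitting sample covers 1970 to 1999 and the validation sample covers 2000 to 2016.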

The research steps are shown in Figure 1.

As Figure 1 shows, the FTSARIMA correction model is processed in four steps, detailed as follows.

4.1. U.S. Mississippi Inland Waterway Influencing Factors’ Fuzzy Clustering

The purpose of the fuzzy clustering method is to reduce the computational complexity and the disturbance term, so a simple algorithm is introduced. An equation is then proposed to generate fuzzy membership matrices, which are the basis for the final clustering results. The closer the relationship between the sample and a factor, the greater the influence of that predictive factor on the phenomenon. Therefore, factors with a weak relationship to the sample can be considered for removal.

The similarity of fuzzy clusters is computed through equation (1).

Let be fuzzy sets defined on and with membership functions . Let an element and belong to set .

Four different groups are selected and divided from the similarity coefficient which is greater than or equal to 0.9 (see Table 4).Group 1: Group 2: , , and Group 3: and Group 4:

The analysis of the pertinence between y and shows that, in particular, , , and are less correlated with the y set (see Table 5). Therefore, group 1 and group 2 are retained and rearranged as the time series and .
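One possible way to form groups from a similarity threshold is sketched below: factors are linked whenever their pairwise similarity reaches the threshold, and each connected set of links becomes a group. The function name `group_by_similarity`, the dictionary layout of `sim`, and the connected-components rule itself are assumptions for illustration; the grouping procedure behind Table 4 may differ.

```python
def group_by_similarity(names, sim, threshold=0.9):
    """Group items whose pairwise similarity reaches the threshold.

    `sim` maps frozenset({a, b}) to a similarity in [0, 1]; missing pairs
    are treated as 0. Linked items end up in the same group.
    """
    parent = {n: n for n in names}

    def find(n):
        # Follow parent links to the group representative (path halving).
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if sim.get(frozenset((a, b)), 0.0) >= threshold:
                parent[find(b)] = find(a)

    groups = {}
    for n in names:
        groups.setdefault(find(n), []).append(n)
    return list(groups.values())
```

With this rule, two factors can share a group without being directly similar, as long as a chain of sufficiently similar factors connects them.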

4.2. Fuzzy Multiple Linear Regression

This fuzzy multiple linear regression model (FMLR) is applied to the Mississippi River’s shipping volume. The input data are obtained from the fuzzy clusters and the output data (see Figure 2). In the column , two clustering categories are indicated: group 1 and group 2. From these data, the fuzzy linear system is and the fitting model for the data is given.

The results of fuzzy parameters and are given as , where h = 0, denotes a center of a fuzzy parameter, and shows the fuzziness of its parameter:

In this case, the fuzziness of is 2.23. and are positive values and rely on the correlated variables and . The conventional regression model’s confidence interval seems to estimate the upper and lower limits of the observation errors. Setting  = (0.24, 3.44) and h = 0, the resulting linear interval model is shown in Figure 2.

For the 17 FMLR predictions (see Figure 3), the actual values are lower than the estimates, which means the FMLR method for shipping volume forecasting still needs improvement.

4.3. Fuzzy Time Series ARIMA Building the ARIMA Model

The time-series data were preprocessed using the first-order regular differencing to stabilize the variance and remove the growth trend. The derived model is ARIMA (1, 1, 1), and the equation is

After determining the minimal fuzziness equation (3), the FTSARIMA model is

Setting  = (0.536, 1) and h = 0, the linear interval model is obtained, and the results are demonstrated in Figure 4.
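The first-order differencing and the ARIMA (1, 1, 1) structure used in this subsection can be illustrated with a small pure-Python sketch. The function names and the parameters `const`, `phi`, and `theta` are hypothetical placeholders rather than the fitted values of this study; the sketch also assumes a zero initial residual and a minus sign in front of the moving-average term.

```python
def difference(series):
    """First-order differencing: successive changes of the series."""
    return [b - a for a, b in zip(series, series[1:])]


def undifference(first_value, diffs):
    """Invert first-order differencing given the first original value."""
    out = [first_value]
    for d in diffs:
        out.append(out[-1] + d)
    return out


def arima111_forecast(series, const, phi, theta, steps):
    """Multi-step point forecasts from an ARIMA(1, 1, 1) with given parameters."""
    if len(series) < 2:
        raise ValueError("need at least two observations")
    diffs = difference(series)
    resid = 0.0  # assumed starting residual
    prev = diffs[0]
    # Rebuild one-step residuals over the observed differenced series.
    for y in diffs[1:]:
        pred = const + phi * prev - theta * resid
        resid = y - pred
        prev = y
    forecasts = []
    level = series[-1]
    for _ in range(steps):
        y_hat = const + phi * prev - theta * resid
        level += y_hat  # integrate the forecast change back to the level
        forecasts.append(level)
        prev, resid = y_hat, 0.0  # future shocks have zero expectation
    return forecasts
```

Because each forecast feeds the next one, any error in the first step is carried forward, which is consistent with the accumulation of error over long horizons discussed in the Introduction.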

The actual values lie below the prediction interval (see Figure 5), below even the most pessimistic forecasts. To improve the accuracy of the model in the long term, the influence of the other factors needs to be considered.

4.4. Correction Fuzzy Time Series ARIMA Model

From FMLR and FTSARIMA, two different estimated equations (10) and (13) are obtained. To restructure and , the variance decomposition is introduced to assign a weight to them.

Summing up the results of the variance decomposition, the 10th period (see Table 6) can be taken as the weight value: 0.88 is assigned to , and (1–0.88) is allocated to . According to the FMLR model and the FTSARIMA method, there are two different pairs of upper control limit (UCL) and lower control limit (LCL) lines. Based on the weights 0.88 and 0.12 from the table, the new series are as follows, in which is the result of the FTSARIMA model and is the result of the FMLR model. From equation (13), we can obtain equation (14), and Figure 6 reveals the tendency:

By substituting  = (0.536, 1), , and into equation (9), the new c value has been determined, and the following linear interval model is given in equation (14) and plotted in Figure 7:

Usually, three metrics are used to evaluate prediction intervals: coverage rate, calibration, and sharpness [31, 32]. The coverage refers to the statistical consistency between the forecasts and the observations, and it measures how many observations are inside the prediction interval. The properties of sharpness and resolution refer to the concentration of the predictive distribution, or how wide and variable the intervals are, and refer uniquely to the forecasts [33].
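Two of these quantities are simple to compute from a set of interval forecasts. The sketch below measures coverage as the share of observations that fall inside their interval and uses the mean interval width as a basic indicator of sharpness; the function names are hypothetical, and mean width is only one of several possible sharpness measures.

```python
def coverage_rate(observed, lower, upper):
    """Fraction of observations that lie within [lower, upper]."""
    if not (len(observed) == len(lower) == len(upper)) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    inside = sum(1 for y, lo, hi in zip(observed, lower, upper) if lo <= y <= hi)
    return inside / len(observed)


def mean_width(lower, upper):
    """Average width of the prediction intervals (smaller means sharper)."""
    if len(lower) != len(upper) or not lower:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(hi - lo for lo, hi in zip(lower, upper)) / len(lower)
```

The two measures pull in opposite directions: widening every interval raises coverage but worsens sharpness, so they are best read together.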

Comparing FMLR, FTSARIMA, and CFTSARIMA over the 17 predictions, only the CFTSARIMA intervals cover half of the observed values; the other prediction intervals lie well above the observed values. On the other hand, CFTSARIMA’s prediction intervals are wider than those of FMLR and FTSARIMA, although they are narrower than those of the traditional ARIMA method.

5. Results and Discussion

These examples’ empirical results show that the possible interval of the fuzzy ARIMA is narrower than the interval of the classical ARIMA.

The fuzzy ARIMA usually is indicated as formulating the model. The output is fuzziness, leading to the assumption of white noise; in this case, FTSARIMA requires fewer observations than ARIMA.

In Table 7, UCL and LCL denote the upper control limit and the lower control limit and PV indicates the predicted values from different forecasting methods.

The fuzzy linear regression (FMLR) model is based on the probability distribution of statistics. It only models the relationship between the input and output information; therefore, a large amount of information is required. This method is suited to observations with a growth trend, and only its short-term prediction accuracy is acceptable; its long-term forecasts still need to be improved. Table 7 compares four different results. The FTSARIMA is derived from the fuzzy linear regression method and the ARIMA model; compared with both of them, its prediction interval is significantly narrower. A four- or five-period forecast can be acceptable. If the prediction period is longer than five, the actual value falls below the prediction interval, which may be caused by neglecting other influence factors.

According to Figure 7 and Table 7, the CFTSARIMA prediction interval contains more observed values than those of FMLR and FTSARIMA. Although this method also enlarges the range of the forecast values, the LCL line keeps a steady trend, and the prediction interval is still narrower than the others.

To compare the proposed method’s performance with the existing techniques, we apply the proposed approach to long-term forecasting using 30 observations and 7 influence indexes. Table 7 compares the upper control limit (UCL), lower control limit (LCL), and predicted values (PV) from the different methods. The proposed method yields significantly narrower prediction intervals, and its predicted values are more precise than those of the existing methods. In other words, the proposed method produces better forecasting results than the strategies presented in [8, 23, 25, 26, 28, 29] for long-term forecasting.

6. Conclusions

In this study, the primary objective is to find an accurate prediction method for long-term waterway traffic volume prediction, and the proposed algorithm consists of four parts.

In the first part, fuzzy clustering reduces the difficulty of the algorithm design and avoids nonessential double counting, which improves computational efficiency. After clustering, the seven influence factors are classified into four groups, and only two of them satisfy the criteria.

In the second part, based on the fuzzy clustering, the two groups of influence factors are converted into two different time series. A fuzzy multiple linear regression is used to estimate the upper bound and lower bound .

In the third part, a typical fuzzy time series ARIMA method is adopted to define the center of the fuzzy number and the width around the center of the fuzzy number . After that, the FTSARIMA prediction intervals of and can be obtained.

In the final part, variance decomposition is used to identify the weight coefficients, which combine the FMLR interval values and the FTSARIMA interval values into a new result. The FTSARIMA is then repeated, using the new and , to obtain the CFTSARIMA model. When the influence factors participate in the forecasting process, long-term prediction becomes possible.

For future work, the prediction could improve in the following aspects:

The fuzzy clustering still needs to be made more robust to nonstationarity and concept drift; incremental learning may be helpful in this case.

Although most of the actual observations in the long-term prediction fall within the prediction range, there is still a gap compared with an accurate prediction. Combining a Markov chain with the fuzzy clustering could be considered to improve accuracy.

If the influence factors are nonlinear, employing an adaptive filtering algorithm combined with the fuzzy method to forecast the chaotic time series may solve this issue.

Data Availability

The data used to support the findings of this study are included within the supplementary information file.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of the paper.

Acknowledgments

This work was substantially supported by the Research and Demonstration of Regulation Technology of the “Golden Waterway” of the Yangtze River Projects (no. 2016YFC0402103) and a project sponsored by the Yangtze River Survey, Planning, Design and Research Co., Ltd.

Supplementary Materials

In the Supplementary Materials, most data come from “Waterborne Commerce of the United States”, and the first line gives the timeline, from 1970 to 2016. The first part, from lines 2 to 10, refers to the Mississippi River (river system), coal, petro and petro products, chemicals, crude materials, manufactured goods, food and farm, and other. The second part, from lines 11 to 19, refers to the Mississippi River (mainstream), coal, petro and petro products, chemicals, crude materials, manufactured goods, food and farm, and other. The unit of measurement for both the first and second parts is one hundred million tons. The third part, from lines 20 to 26, refers to population (100 million), urban population (100 million), urbanization rate (percentage), GDP (USD 1 billion), per capita GDP (USD 10 million), US manufacturing ($1 billion), and US rate of industrialization (percentage).