Research Article  Open Access
Won Joong Kim, Gunho Jung, SunYong Choi, "Forecasting CDS Term Structure Based on Nelson–Siegel Model and Machine Learning", Complexity, vol. 2020, Article ID 2518283, 23 pages, 2020. https://doi.org/10.1155/2020/2518283
Forecasting CDS Term Structure Based on Nelson–Siegel Model and Machine Learning
Abstract
In this study, we analyze the term structure of credit default swaps (CDSs) and predict future term structures using the Nelson–Siegel model, recurrent neural network (RNN), support vector regression (SVR), long short-term memory (LSTM), and group method of data handling (GMDH) on CDS term structure data from 2008 to 2019. Furthermore, we evaluate the change in the forecasting performance of the models through a subperiod analysis. According to the empirical results, we confirm that the Nelson–Siegel model can be used to predict not only the interest rate term structure but also the CDS term structure. Additionally, we demonstrate that the machine-learning models, namely, SVR, RNN, LSTM, and GMDH, outperform the model-driven method (in this case, the Nelson–Siegel model). Among the machine-learning approaches, GMDH demonstrates the best performance in forecasting the CDS term structure. According to the subperiod analysis, the performance of all models varied with the data period: all models were less accurate in highly volatile data periods than in less volatile periods. This study will enable traders and policymakers to invest efficiently and make policy decisions based on the current and future risk factors of a company or country.
1. Introduction
A credit default swap (CDS) is a credit derivative based on credit risk, similar to a bond. The prices of both CDSs and bonds change depending on the risk of the reference entity. If the reference entity has a higher risk, then the CDS spread is set higher. To manage credit risk, we can use a CDS contract. The CDS seller (protection seller) insures the protection buyer’s risk in the event of a credit default, such as bankruptcy of the reference entity, debt repudiation, or, in the case of a sovereign bond, a moratorium. There are two ways for a protection seller to compensate the protection buyer for a loss. The first is to buy the underlying asset at face value; the second is to pay the difference between the remaining value and the face value. In this way, the protection buyer can hedge his or her credit risk and pays the CDS spread to the protection seller.
A CDS spread is an insurance fee that a protection buyer pays to the protection seller, often quarterly. Its value is determined by factors such as the probability of credit default and recovery rate. The recovery rate is the percentage of the bond value that the reference entity offers to the protection buyer when a credit default happens. Therefore, if the recovery rate is high, the CDS spread will be low. The CDS spread will be high if the default rate is high, which indicates a high probability of credit default. Because the CDS spread indicates the bankruptcy risk of institutions or countries, it is an important economic index that is being actively traded. According to the Bank for International Settlements, the total outstanding notional amount of CDS contracts was $7809 billion in the first half of 2019.
To date, numerous studies have been conducted on the prediction of financial asset values. For example, Li and Tam [1] forecasted stock price movements of different volatilities using a recurrent neural network (RNN) and support vector machine (SVM). Chen et al. [2] predicted the movement of the Chinese stock market using a long short-term memory (LSTM)-based model. Gao et al. [3] also used LSTM to predict stock prices. However, few studies have been conducted on forecasting the CDS term structure. Shaw et al. [4] used the Nelson–Siegel model to make 1-, 5-, and 10-day forecasts of the CDS curve and compared its efficiency with that of the random-walk method. They showed that, although the 1-day forecast was not very effective, the accuracy of the 5- and 10-day forecasts outperformed that of the random-walk model. Avino and Nneji [5] predicted daily quotes of iTraxx Europe CDS indices using linear and nonlinear forecasting models, such as autoregressive (AR) and Markov switching AR models. They found that the AR model often outperforms Markov switching models, but Markov switching models offer a good in-sample fit for iTraxx index data. Sensoy et al. [6] used permutation entropy to test the weak-form efficiency of CDS markets in some countries. They found that CDS markets could be efficient during crisis periods, which implies that the impact of a crisis on CDS market efficiency is limited, and Asian markets outperformed the other tested markets in terms of efficiency. In addition, they showed a negative linear correlation between a country’s CDS efficiency and daily CDS levels. Neftci et al. [7] asserted that CDS markets provide unique information on default probability. They showed that the information provided by a CDS regarding the default risk of a sovereign bond is more accurate than the information from a bond spread provided by the corresponding treasury using a stochastic differential equation based on the Markov process.
Duyvesteyn and Martens [8] used the structural model for a sovereign bond from Gray et al. [9] to predict how exchange rate returns and volatility changes affect market CDS spread movements. The model results, such as default probability and spreads, were strongly correlated with CDS spreads. Their results also rejected their hypothesis that changes in sovereign credit spreads are correlated with changes in sovereign market spreads.
As mentioned above, several studies have attempted to predict various financial market indices with machine-learning methods; however, research on the CDS term structure is limited. The CDS term structure reflects the conditions for monetary policy and companies’ future risk expectations. CDS spreads can be classified into two types. The first is the sovereign CDS, which has a country as its reference entity. Sovereign CDS spreads reflect the creditworthiness of a country. That is, the sovereign CDS spread can be considered as a measure of the sovereign credit risk [10]. Furthermore, sovereign CDS spreads contain some components that are attributed to global risk, according to Pan and Singleton [11] and Longstaff et al. [12]. Studies on sovereign CDS include Pan and Singleton [11], Longstaff et al. [12], Blommestein et al. [10], Galariotis et al. [13], Srivastava et al. [14], Ho [15], and Augustin [16]. The other type of CDS is written with respect to one single reference entity, the so-called single-name CDS. In addition, CDS sector indices are based on the most liquid 5-year term, are equally weighted, and reflect an average mid-spread calculation of the given index’s constituents. However, single-name CDS spreads are much less liquid than indices [17–19]. In several studies, the creditworthiness of individual industries was investigated using CDS sector data [19–22].
The CDS term structure is important because it integrates the future risk expectations of both markets and companies by offering CDS spreads over time. Thus, we can confirm various types of information from the CDS term structure, such as firm leverage and volatility, as shown by Han and Zhou [23]. Furthermore, understanding the implications of the term structure also provides us with a method of extracting this information and predicting the effect of financial events and risk on it. Despite the large number of studies on CDS, studies that attempt to forecast its term structure remain few.
In this study, we analyze the CDS term structure, particularly sovereign CDS, forecast it using machine-learning models, and identify the most suitable model for predicting the CDS term structure. We consider model-driven and data-driven methods: the Nelson–Siegel model, RNN, SVR, LSTM, and GMDH. The Nelson–Siegel model, as a model-driven method, was devised to fit the yield term structure; however, in this study, it was fitted to the CDS term structure to extract the term structure parameters and forecast the CDS term structure with the AR(1) model. RNN, SVR, LSTM, and GMDH are machine-learning models that specialize in predicting time-series data. RNN memorizes previous information and uses it to predict future information. LSTM is basically the same as RNN; however, it memorizes only significant information based on some calculations. SVR is derived from the structural risk minimization principle [24] and has been used for prediction in many fields [25–27]. Among the machine-learning methods, a GMDH network is a system identification method that has been used in various fields of engineering to model and forecast the nature of unknown or complex systems based on a given set of multi-input single-output data pairs [28–30].
Machine learning is widely used in various fields to analyze data and forecast future flow. For example, Yan and Ouyang [31] compared the efficiency of the LSTM model in predicting financial time-series data with that of other machine-learning models, such as SVM and K-nearest neighbors. Baek and Kim [32], Yan and Ouyang [31], Cao et al. [33], and Fischer and Krauss [34] also analyzed and forecasted financial data using machine learning. Machine learning is widely used in medical research: Thottakkara et al. [35], Motka et al. [36], Boyko et al. [37], and Tighe et al. [38] studied and predicted various illnesses and clinical data with machine-learning models. Many studies have also been performed to predict weather conditions using machine learning: Choi et al. [39], Haupt and Kosovic [40], Rhee and Im [41], and James et al. [42] conducted research on forecasting weather conditions. Ma et al. [43] and Li et al. [44] used a convolutional neural network (CNN) to predict a transportation network. Furthermore, GMDH has been widely used for time-series prediction [45–47]. As in these studies, we apply machine-learning methods to forecast the CDS term structure and identify the most efficient method. There are not many studies on financial data using machine-learning methods compared to other areas, and to the best of our knowledge, this work is the first to present a machine-learning forecasting model for CDS data. Therefore, although there are many prediction methods, we focus on methods that are generally used in the prediction of time-series data, such as LSTM, RNN, SVR, and GMDH.
Methodologically, we adopt the Nelson–Siegel model as a model-driven method and RNN, LSTM, SVR, and GMDH as data-driven methods to predict the CDS term structure over the period 2008–2019. We optimize the data-driven models using a grid search algorithm with the Python technological stack. Furthermore, these tests are explored using subperiod analyses to investigate changes in the model performances over the experimental period. Specifically, we split the entire sample period into two subperiods, January 2008–December 2011 (subperiod 1) and January 2012–December 2019 (subperiod 2), because subperiod 1 contains financial market turbulence due to the global financial crisis and the European debt crisis. Through this subperiod analysis, we investigate the change in the forecasting performance of all methods on both high-variance and relatively low-variance data. This kind of subperiod analysis is common in other studies [48–51].
In time-series forecasting, sequence models, either RNN, LSTM, or a combination of both, are frequently used because they explicitly account for temporal order. A sequence model treats time as an ordering and can track how the data change along that ordering; therefore, it can be applied to data such as weather and finance. According to Siami-Namini and Namin [52] and McNally et al. [53], neural network (NN) models, such as RNN and LSTM, outperformed conventional algorithms, such as the autoregressive integrated moving average (ARIMA), when using financial data or bitcoin prices. McNally et al. [53] also evaluated the performance of LSTM using volatile Bitcoin data, and Cortez et al. [54] used data from the Republic of Guatemala to predict emergency events. Furthermore, LSTM is considered better than RNN because it was designed to correct the disadvantages of RNN; however, the result appears to depend on the dataset. For example, Samarawickrama and Fernando [55] demonstrated that LSTM exhibited higher accuracy than RNN when predicting stock prices. However, Selvin et al. [56] also compared RNN with LSTM in forecasting stock prices and found that RNN outperformed LSTM. Therefore, in this study, we used both RNN and LSTM to confirm whether LSTM outperforms RNN when forecasting CDS spreads. Ultimately, the motivation for this study is to compare the CDS forecasting performance of the Nelson–Siegel model with that of the RNN, LSTM, SVR, and GMDH models, to determine the difference between model-driven and data-driven methods.
This paper is organized as follows: in the next section, we review our dataset and present a statistical summary of the CDS term structure; we describe our methods: Nelson–Siegel, RNN, SVR, LSTM, and GMDH, and we explain hyperparameter optimization and its application to the CDS term structure; Section 3 presents our forecasting results on CDS term structure with various error estimates and demonstrates the performance of each model; and Section 4 provides a summary and concluding remarks.
2. Data Description and Methods
2.1. Data Description
The CDS spread can be classified into several categories. The classification method usually depends on the frame of the credit event. The full restructuring clause is the standard term. Under this condition, any restructuring event could be a credit event. The modified restructuring clause limits the scope of opportunistic behavior by sellers when restructuring agreements do not result in a loss. While restructuring agreements are still considered credit events, the clause limits the deliverable obligations to those with a maturity of less than 30 months after the termination date of the CDS contract. Under the modified contract option, any restructuring event, except the restructuring of bilateral loans, could be a credit event. Additionally, the modified-modified restructuring term was introduced because modified restructuring was too severe in its limitation of deliverable obligations. Under this term, the remaining maturity of deliverable assets must be less than 60 months for restructured obligations and 30 months for all other obligations. Under the no restructuring contract option, all restructuring events are excluded under the contract as “trigger events.”
For this type of CDS, we use a full restructuring sovereign CDS spread dataset because the other datasets are unavailable for long periods. The sovereign CDS spread reflects market participants’ perceptions of a country’s credit ratings. Our data cover the period from October 2008 to October 2019 and maturities of six months and 1, 2, 3, 4, 5, 7, 10, 20, and 30 years. All data were sourced from Datastream and correspond to the daily closing price of the CDS spread. The term structure of the CDS spread normally shows upward-sloping curves, as seen in Figure 1. Furthermore, CDS spreads are uniformly lower at observation dates closer to the present. Table 1 provides summary statistics of the CDS data. We can also verify that spreads with longer maturities have higher prices in terms of both mean and percentiles. It is interesting to note that the standard deviation is also higher when the maturity is longer, which implies that market predictions are highly unstable for longer horizons.
 
per.: percentile. 
2.2. Nelson–Siegel Model
Nelson and Siegel [57] proposed a parsimonious model that is widely used to predict the interest rate term structure. The formula is as follows:

$$y(\tau) = \beta_0 + \beta_1 \frac{1 - e^{-\lambda\tau}}{\lambda\tau} + \beta_2 \left( \frac{1 - e^{-\lambda\tau}}{\lambda\tau} - e^{-\lambda\tau} \right),$$

where $\lambda$ is the time-decay parameter; $\tau$ is the maturity; and $\beta_0$, $\beta_1$, and $\beta_2$ are the three Nelson–Siegel parameters. $\beta_0$ is the long-term component of the yield curve, as its loading does not decay to 0 and remains constant for all maturities. $\beta_1$ is the short-term factor, whose loading starts at 1 but quickly decays to 0. Finally, the loading on $\beta_2$ starts at 0 and increases before decaying back to 0; hence, it is the medium-term factor, which creates a hump in the yield curve.
The Nelson–Siegel model is a simple but effective method for modeling a term structure, and various studies have used the model to predict the yield curve or other term structures. For example, Shaw et al. [4] forecasted CDS using the Nelson–Siegel model to fit the CDS curve. Guo et al. [58] used the Nelson–Siegel model to model the term structure of implied volatility. Grønborg and Lunde [59] used it to model the term structure of oil futures contracts and forecast the prices of these contracts, while West [60] determined the future price of agricultural commodities. In particular, the CDS term structure has a strong relationship with the interest rate term structure. For example, Chen et al. [61] found that interest rate factors not only affected credit-spread movements but also forecasted future credit risk dynamics. They claimed that the different frequency components of interest rate movements affected the CDS term structure in various industrial sectors and credit rating classes. Specifically, worsening credit conditions tend to lead to future easing of monetary policy, leading to lower current forward interest rate curves. On the contrary, positive shocks to the interest rate narrow the credit spread at long maturities. Tsuruta [62] tried to decompose the yield and CDS term structures into risk and nonrisk structures and found that credit risk components have a negative relationship with the local equity market.
In this study, we fit the CDS curve to the Nelson–Siegel model by estimating the time-decay parameter $\lambda$ and the Nelson–Siegel parameters $\beta_0$, $\beta_1$, and $\beta_2$. The Nelson–Siegel parameters can be estimated using various models, such as the autoregressive moving average (ARMA) and ARIMA, from which the most accurate model is selected. For example, Shaw et al. [4] used the AR(1) process to estimate $\beta_0$, $\beta_1$, and $\beta_2$. Here, we used the AR(1) process to estimate the Nelson–Siegel parameters and the time-decay parameter. We use the error measures mean squared error (MSE), root MSE (RMSE), mean percentage error (MPE), mean absolute percentage error (MAPE), and mean absolute error (MAE) to compare the efficiency of this method with that of other methods, such as RNN or LSTM.
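As a minimal sketch of this procedure, the daily curve fit and a one-step AR(1) forecast can be written with numpy alone; the maturities, spread values, and fixed decay parameter below are hypothetical illustrations, not the paper's estimates:

```python
import numpy as np

def ns_loadings(tau, lam):
    """Nelson-Siegel factor loadings for maturities tau and decay lam."""
    x = lam * tau
    slope = (1 - np.exp(-x)) / x
    return np.column_stack([np.ones_like(tau), slope, slope - np.exp(-x)])

def fit_ns(spreads, tau, lam=0.5):
    """Estimate (beta0, beta1, beta2) by OLS for one day's CDS curve."""
    X = ns_loadings(tau, lam)
    beta, *_ = np.linalg.lstsq(X, spreads, rcond=None)
    return beta

def ar1_forecast(series):
    """One-step AR(1) forecast y_{t+1} = c + phi * y_t, fitted by OLS."""
    y, ylag = series[1:], series[:-1]
    A = np.column_stack([np.ones_like(ylag), ylag])
    (c, phi), *_ = np.linalg.lstsq(A, y, rcond=None)
    return c + phi * series[-1]

# hypothetical curve: maturities in years, spreads in basis points
tau = np.array([0.5, 1, 2, 3, 4, 5, 7, 10, 20, 30])
spreads = np.array([20, 25, 35, 45, 52, 58, 68, 78, 90, 95.0])
beta = fit_ns(spreads, tau)
fitted = ns_loadings(tau, 0.5) @ beta
```

In a full forecast, `fit_ns` would be run for every trading day, and `ar1_forecast` would then be applied to each factor's time series to project the next day's curve.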
2.3. SVR
SVR is a family of machine-learning models derived from SVM. SVM is an algorithm that returns a hyperplane that separates the training samples into two labels, positive and negative. We refer to the distance between the closest point and the hyperplane as the “margin,” and the goal of SVM is to identify the hyperplane that maximizes the margin. There are two types of margin. The first type is a hard margin, which is for linearly separable datasets, meaning that no point violates its label. In other words, all the points can be classified into their labels with a hyperplane. The second is a soft margin, which is for nonseparable cases. In this case, some points in the dataset, called “outliers,” are incorrectly classified. There are two ways to select a soft margin hyperplane. On the one hand, we can make the margin larger and accept more errors (outliers). This is usually used for datasets that have only a small number of outliers. On the other hand, we can choose a hyperplane that has a small margin and minimizes the empirical errors. This is useful for datasets with dense point distributions, where it is difficult to separate the data explicitly.
Additionally, the kernel trick can be used for linearly nonseparable datasets. A kernel is a function that maps the original data points into a higher-dimensional space in which they become separable. The reason it is called the “kernel trick” is that, although the dimension of the dataset increases, the cost of the algorithm does not increase much.
SVM originated from the statistical learning theory introduced by Vapnik and Chervonenkis. The characteristic idea of SVM is to minimize the structural risk, while artificial neural networks (ANNs) minimize the empirical risk. Furthermore, SVM theoretically demonstrates better forecasting than artificial neural networks, according to Gunn et al. [63] and Haykin [64].
SVR is derived from SVM. It is a nonlinear kernel-based approach, and the main idea is to identify a function whose deviation from the actual data lies within a predetermined scale. SVR is applied to a given dataset $\{(x_i, y_i)\}_{i=1}^{N}$, where $x_i$ is the input vector, $y_i$ is the output, and $N$ is the total number of data points. The following formulation was introduced by Pérez-Cruz et al. [65]. SVR assumes that the function is a nonlinear function of the form $f(x) = w^{\top}\varphi(x) + b$, where $w$ and $b$ are the weight and constant, respectively, and $\varphi$ denotes a mapping function into the feature space. Then, the weight vector $w$ and the constant $b$ are estimated by minimizing the following optimization problem:

$$\min_{w,\,b,\,\xi,\,\xi^{*}}\ \frac{1}{2}\|w\|^{2} + C\sum_{i=1}^{N}(\xi_i + \xi_i^{*}) \quad (2)$$

subject to

$$y_i - w^{\top}\varphi(x_i) - b \le \varepsilon + \xi_i, \quad (3)$$

$$w^{\top}\varphi(x_i) + b - y_i \le \varepsilon + \xi_i^{*}, \qquad \xi_i,\ \xi_i^{*} \ge 0, \quad (4)$$

where $\varepsilon$ is the prespecified value and $\xi_i$ and $\xi_i^{*}$ are slack variables indicating the upper and lower constraints, respectively. Constraints (3) and (4) correspond to the $\varepsilon$-insensitive loss function introduced by Vapnik. $C$ is the regularization parameter, and $\varphi$ is a nonlinear transformation to a higher-dimensional space, also known as the feature space.
Using Lagrange multipliers and the Karush–Kuhn–Tucker conditions, the dual problem for the optimization problem (2)–(4) can be obtained:

$$\max_{\alpha,\,\alpha^{*}}\ -\frac{1}{2}\sum_{i=1}^{N}\sum_{j=1}^{N}(\alpha_i - \alpha_i^{*})(\alpha_j - \alpha_j^{*})K(x_i, x_j) - \varepsilon\sum_{i=1}^{N}(\alpha_i + \alpha_i^{*}) + \sum_{i=1}^{N} y_i(\alpha_i - \alpha_i^{*})$$

subject to $\sum_{i=1}^{N}(\alpha_i - \alpha_i^{*}) = 0$ and $0 \le \alpha_i,\ \alpha_i^{*} \le C$.
To solve the above problem, we do not need to identify the nonlinear function $\varphi$ explicitly. The solution can be obtained as

$$f(x) = \sum_{i=1}^{N}(\alpha_i - \alpha_i^{*})K(x_i, x) + b,$$

where $K$ is called the kernel function, defined as $K(x_i, x_j) = \varphi(x_i)^{\top}\varphi(x_j)$. Any kernel function satisfying Mercer’s condition can be used as the kernel function (see Mohri et al. [66]).
The selection of the kernel has a significant impact on its forecasting performance. It is a common practice to estimate a range of potential settings and use crossvalidation over the training set to determine the best one. In this research, we use three kernel functions: polynomial, Gaussian, and Sigmoid, as presented in Table 2.
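To make the regression above concrete, a short scikit-learn sketch with the Gaussian (RBF) kernel on synthetic data follows; the data and the values of $C$ and $\varepsilon$ are illustrative only and would be chosen by cross-validation in practice:

```python
import numpy as np
from sklearn.svm import SVR

# toy regression target: recover a smooth curve from noisy samples
rng = np.random.default_rng(0)
X = np.linspace(0, 4, 200).reshape(-1, 1)
y = np.sin(X).ravel() + 0.05 * rng.standard_normal(200)

# Gaussian (RBF) kernel; C and epsilon are illustrative, not tuned values
model = SVR(kernel="rbf", C=10.0, epsilon=0.01, gamma="scale")
model.fit(X, y)
pred = model.predict(X)
```

Swapping `kernel="rbf"` for `"poly"` or `"sigmoid"` reproduces the other two kernel choices listed in Table 2.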

Cao and Tay [67] analyzed the sensitivity of SVMs to the parameters $C$ and $\varepsilon$. $C$ and $\varepsilon$ play an important role in the performance of SVR. Therefore, it is necessary to choose these parameters properly.
2.4. RNN
An ANN is a classification or prediction process that imitates human neurons. The output of a simple ANN model is generated by multiplying weights assigned to input data. After comparing the output data and the real values to be predicted, we create new weights adjusted according to the error. The step in which weights are multiplied by the input data is called forward propagation, and the step in which the error is calculated and weights are adjusted is called backpropagation. The final goal of the ANN model is to determine the weights that minimize the error between the predicted and target values.
A CNN is a machine-learning method that uses a neural network algorithm. It consists of convolution layers, pooling layers, and neural network layers. A convolution layer uses a “filter” to analyze data, typically vectorized image data. The filter analyzes small sections while moving over the entire dataset, and each section expresses a “feature” of the data with pooling layers.
An RNN is another representative neural network model that has a special hidden layer. While a simple neural network has a backpropagation algorithm and adjusts its weights to reduce prediction errors, the RNN has a hidden layer that is modified by the hidden layer of the previous state. Each time the algorithm operates, the RNN hidden layer affects the next hidden layer of the algorithm. Because of this characteristic, RNN is an optimized method to analyze and predict nonlinear time-series data, such as stock prices. It is an algorithm operating in sequence with input and output data. It can return a single output from one or more input data and return more than one output from one or more input data. One of its characteristics is that it returns the output in every hidden timestep layer and simultaneously sends it as input data to the next layer; we demonstrate the simplified structure in Figure 2. RNN has a memory cell in the hidden layer, which returns the output through various activation functions, such as the sigmoid and softmax functions. The memory cell memorizes the output from the previous timestep and uses it as input data recurrently. For instance, at a specific time $t$, the output $h_{t-1}$ of the previous timestep and the input $x_t$ of timestep $t$ are used as input data, and the output $h_t$ is among the input data of the next timestep $t+1$.
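The recurrence just described can be condensed into a few lines of numpy; the layer sizes and randomly drawn weights below are purely illustrative:

```python
import numpy as np

def rnn_step(x_t, h_prev, W_x, W_h, b):
    """One recurrent step: the new hidden state mixes the current input
    with the previous hidden state through a tanh activation."""
    return np.tanh(W_x @ x_t + W_h @ h_prev + b)

# toy dimensions and weights, for illustration only
rng = np.random.default_rng(4)
n_in, n_hid = 1, 3
W_x = rng.standard_normal((n_hid, n_in)) * 0.1
W_h = rng.standard_normal((n_hid, n_hid)) * 0.1
b = np.zeros(n_hid)

h = np.zeros(n_hid)  # initial hidden state
for x in [np.array([0.3]), np.array([0.1]), np.array([-0.2])]:
    h = rnn_step(x, h, W_x, W_h, b)  # h carries memory across timesteps
```

In practice the weights would be trained by backpropagation through time; the sketch only shows the forward recurrence.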
The greatest difference between RNN and CNN or multilayer perceptron (MLP) is that CNN and MLP do not consider previous state data in later steps, but RNN considers both the output of the previous state and the input of the present state. Furthermore, as it is optimized to deal with sequential data, it is used in text, audio, and visual data processing.
However, RNN has a vanishing gradient problem in long backpropagation processes. The algorithm of an RNN is based on gradient descent and modifies its weights in each timestep after one forward propagation process. Weights are modified with error differentials, which can rapidly converge to zero under repetitive backpropagation; this is called the vanishing gradient problem. To solve this problem in long-term time-series data, LSTM is widely used.
2.5. LSTM
To solve the vanishing gradient problem of RNN, Hochreiter and Schmidhuber [68] proposed LSTM, and Gers and Schmidhuber [69] added a forget gate to improve it. RNN considers all previous timestep memories, whereas LSTM chooses only the necessary memories to convey to the next timestep, using an algorithm in a special cell called the LSTM cell. Each of the cells has a forget gate, input gate, output gate, and long- and short-term memories ($C_t$, $h_t$) that pass through these cells, as shown in Figure 3.
Input data are deleted, filtered, and added to the long-term memory in the forget gate. The forget gate generally uses a sigmoid function as an activation function, which maps the input data and short-term memory into numbers ranging from zero to one. This implies that if the output of the forget gate is close to zero, then most of the information will not pass through; if the output is close to one, then most of the information will pass to the next cell. Next, the input gate decides which data from the input $x_t$ and the short-term memory $h_{t-1}$ must be added, after substitution into the candidate vector $\tilde{C}_t$ and the gate value $i_t$.
$\tilde{C}_t$ generates new candidate vectors that could be added to the present cell state, and $i_t$ decides how much of the information generated by $\tilde{C}_t$ to save. $i_t$ uses the sigmoid function in the same way as the forget gate, with the same meaning: if the value of $i_t$ is close to one, then most of $\tilde{C}_t$ will pass through, and if it is close to zero, then most will not be taken into this cell. $C_t$ is computed from the input gate value and the forget gate value. By multiplying $f_t$ with $C_{t-1}$, the amount of information from the previous timestep’s cell that will be memorized is determined. Finally, the output gate $o_t$ decides which data will be the output of each cell, considering the memory terms $C_t$ and $h_{t-1}$.
The processes performed by each gate are expressed as follows:

$$f_t = \sigma(W_f x_t + U_f h_{t-1} + b_f),$$
$$i_t = \sigma(W_i x_t + U_i h_{t-1} + b_i),$$
$$\tilde{C}_t = \tanh(W_C x_t + U_C h_{t-1} + b_C),$$
$$C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t,$$
$$o_t = \sigma(W_o x_t + U_o h_{t-1} + b_o),$$
$$h_t = o_t \odot \tanh(C_t),$$

where $\sigma$ is the sigmoid function and $\odot$ denotes elementwise multiplication.
$W$ and $U$ are the weights of the input $x_t$ and the short-term memory $h_{t-1}$, respectively. For example, $W_i$ is the weight of the input data for the input gate $i_t$.
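The gate computations described in this section can be sketched as a single numpy step; the layer sizes and randomly drawn weights are hypothetical, and in practice a library implementation such as Keras would be used:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_cell(x_t, h_prev, c_prev, W, U, b):
    """Single LSTM cell step. W, U, b hold parameters for the
    forget (f), input (i), candidate (g), and output (o) gates."""
    f = sigmoid(W["f"] @ x_t + U["f"] @ h_prev + b["f"])  # forget gate
    i = sigmoid(W["i"] @ x_t + U["i"] @ h_prev + b["i"])  # input gate
    g = np.tanh(W["g"] @ x_t + U["g"] @ h_prev + b["g"])  # candidate memory
    o = sigmoid(W["o"] @ x_t + U["o"] @ h_prev + b["o"])  # output gate
    c_t = f * c_prev + i * g          # long-term memory update
    h_t = o * np.tanh(c_t)            # short-term memory (cell output)
    return h_t, c_t

# toy cell over a short sequence; weights are random illustrations
rng = np.random.default_rng(1)
n_in, n_hid = 1, 4
W = {k: rng.standard_normal((n_hid, n_in)) * 0.1 for k in "figo"}
U = {k: rng.standard_normal((n_hid, n_hid)) * 0.1 for k in "figo"}
b = {k: np.zeros(n_hid) for k in "figo"}

h, c = np.zeros(n_hid), np.zeros(n_hid)  # initial memories set to zero
for x in [np.array([0.2]), np.array([0.5]), np.array([-0.1])]:
    h, c = lstm_cell(x, h, c, W, U, b)
```

The zero initialization of the two memory terms mirrors the choice discussed below following Zimmermann et al. [70].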
To develop an LSTM model, we must assign the initial values of $C_0$ and $h_0$. As mentioned by Zimmermann et al. [70], we set both initial memory term values to zero. LSTM is broadly applied to forecast time-series data; however, owing to its complexity, Chung et al. [71] designed a simpler model called a gated recurrent unit (GRU) while adopting the advantages of LSTM. GRU consists of a reset gate, which decides how to add new input data to the previous cell memory, and an update gate, which decides the amount of memory of the previous cell to save. However, as our dataset is not very large, we used the LSTM model and compared its performance in forecasting the CDS term structure with RNN.
2.6. GMDH
GMDH is a machine-learning method based on the principle of heuristic self-organization, proposed by Ivakhnenko [72]. The advantage of GMDH is that various considerations, including the number of layers, neurons in hidden layers, and optimal model structure, are determined automatically. In other words, we can apply GMDH to model complex systems without a priori knowledge of the systems.
Suppose that there is a set of $n$ input variables $x = (x_1, x_2, \ldots, x_n)$ and one output variable $y$. The GMDH algorithm represents a model as a set of neurons in which different pairs in each layer are connected via quadratic polynomials, and they generate new neurons in the next layer [28, 73]. Figure 4 shows the simplified structure. The formal identification problem of the GMDH algorithm is to identify a function $\hat{f}$ that can be used to forecast the output $\hat{y}$ for a given input vector $x = (x_1, \ldots, x_n)$ as closely as possible to its actual output $y$, instead of the actual function $f$. Therefore, we can describe the observations of $M$ multi-input single-output data pairs as follows:

$$y_i = f(x_{i1}, x_{i2}, \ldots, x_{in}), \quad i = 1, 2, \ldots, M.$$
We train a GMDH network to predict the output $\hat{y}_i$ for any given input vector $(x_{i1}, \ldots, x_{in})$, which is given as

$$\hat{y}_i = \hat{f}(x_{i1}, x_{i2}, \ldots, x_{in}), \quad i = 1, 2, \ldots, M.$$
Now, the GMDH network is determined by minimizing the squared sum of differences between the sample outputs and the model predictions, that is,

$$\min \sum_{i=1}^{M} \left[\hat{f}(x_{i1}, \ldots, x_{in}) - y_i\right]^2.$$
The general connection between the input and output variables can be expressed by a series of Volterra functions:

$$y = a_0 + \sum_{i=1}^{n} a_i x_i + \sum_{i=1}^{n}\sum_{j=1}^{n} a_{ij} x_i x_j + \sum_{i=1}^{n}\sum_{j=1}^{n}\sum_{k=1}^{n} a_{ijk} x_i x_j x_k + \cdots, \quad (10)$$

where $x = (x_1, x_2, \ldots, x_n)$ is the input variable vector and $a$ is the weight vector. Equation (10) is known as the Kolmogorov–Gabor polynomial [28, 45, 72, 74, 75].
In this study, we use the second-order polynomial function of two variables, which is written as

$$\hat{y} = G(x_i, x_j) = a_0 + a_1 x_i + a_2 x_j + a_3 x_i x_j + a_4 x_i^2 + a_5 x_j^2. \quad (11)$$
The main objective of the GMDH network is to build the general mathematical relation between the input and output variables given in equation (10). The weights in equation (11) are estimated using regression techniques so that the difference between the actual output ($y$) and the calculated output ($\hat{y}$) is minimized, described as

$$E = \frac{1}{M}\sum_{i=1}^{M} (y_i - \hat{y}_i)^2 \rightarrow \min.$$
These parameters can be obtained from multiple regression using the least squares method, and we can compute them by solving some matrix equations. Refer to [28, 29, 46, 76] for a detailed description of the parameter estimation process. The GMDH network can be associated with various algorithms, such as the genetic algorithm [77, 78], singular value decomposition [28], and backpropagation [29, 46, 73, 79–81]. We also improved the GMDH network using backpropagation.
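As an illustration of how a single GMDH neuron's six weights can be estimated by least squares, consider the following numpy sketch; the toy data are generated from a known quadratic so that the regression recovers it exactly:

```python
import numpy as np

def quad_features(xi, xj):
    """Design matrix for the second-order GMDH neuron of two inputs."""
    return np.column_stack([np.ones_like(xi), xi, xj, xi * xj, xi**2, xj**2])

def fit_neuron(xi, xj, y):
    """Least-squares estimate of the six polynomial weights a0..a5."""
    A = quad_features(xi, xj)
    a, *_ = np.linalg.lstsq(A, y, rcond=None)
    return a

# toy data from a known quadratic, so the fitted neuron should match it
rng = np.random.default_rng(2)
xi = rng.uniform(-1, 1, 100)
xj = rng.uniform(-1, 1, 100)
y = 1.0 + 2.0 * xi - 0.5 * xj + 0.3 * xi * xj
a = fit_neuron(xi, xj, y)
pred = quad_features(xi, xj) @ a
```

A full GMDH network would fit such a neuron for every pair of inputs in a layer, keep the best-performing neurons on validation data, and repeat layer by layer; the sketch shows only the single-neuron regression step.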
2.7. Hyperparameter Optimization
Hyperparameter optimization refers to the problem of determining the optimal values of the hyperparameters that must be set in advance of training, so that the trained model attains the highest generalization performance. In a deep-learning model, for example, the learning rate, batch size, etc. can be regarded as hyperparameters, and in some cases, hyperparameters that determine the structure of the deep-learning model, such as the number of layers and the convolution filter size, can also be added as search targets. Hyperparameter optimization typically includes manual search, grid search, and random search.
Manual search is a way for users to set hyperparameters individually and compare performances according to their intuition. After selecting the candidate hyperparameter values and performing training using them, the performance results measured against the verification dataset are recorded, and this process is repeated several times to select the hyperparameter values that demonstrate the highest performance. This is the most intuitive method; however, it has some problems. First, it is relatively difficult to ensure that the optimal hyperparameter value to be determined is actually optimal because the process of determining the optimal hyperparameter is influenced by the user’s selections. Second, the problem becomes more complicated when attempting to search for several types of hyperparameters at once. Because there are some types of hyperparameters that have mutually affecting relationships with others, it is difficult to apply an existing intuition to each single hyperparameter.
Grid search selects candidate hyperparameter values at regular intervals within a specified search range, records the performance measured for each combination, and selects the hyperparameter values that yield the highest performance (see Hsu et al. [82]). The user determines the search targets, the length of the range, the interval, etc., but the search is more uniform and global than manual search. On the other hand, the total search time grows exponentially with the number of hyperparameters searched at once.
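As a minimal sketch of the procedure just described, the following code exhaustively evaluates every combination of two hypothetical hyperparameters (learning rate and batch size) against a toy validation objective. The parameter names, grid values, and objective are illustrative assumptions, not the paper's actual setup.

```python
from itertools import product

def grid_search(train_fn, param_grid):
    """Evaluate every combination in param_grid exhaustively.

    train_fn(params) must return a validation error (lower is better).
    """
    names = sorted(param_grid)
    best_params, best_err = None, float("inf")
    for values in product(*(param_grid[n] for n in names)):
        params = dict(zip(names, values))
        err = train_fn(params)
        if err < best_err:
            best_params, best_err = params, err
    return best_params, best_err

# Toy objective with its minimum at lr=0.01, batch=32 (hypothetical values).
grid = {"lr": [0.001, 0.01, 0.1], "batch": [16, 32, 64]}
best, err = grid_search(
    lambda p: (p["lr"] - 0.01) ** 2 + (p["batch"] - 32) ** 2, grid)
```

Note that the number of evaluations is the product of the candidate counts (here 3 × 3 = 9), which is the exponential growth mentioned above.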
Random search (see Bergstra and Bengio [83]) is similar to grid search but selects candidate hyperparameter values by random sampling. This reduces the number of unnecessary repetitions and can probe values lying between the predetermined grid points, so the optimal hyperparameter value can often be found more quickly. Its disadvantage is that, because combinations other than the values the user would have set are tested, unexpected results can be obtained.
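The sampling variant can be sketched in the same style; here each trial draws each hyperparameter uniformly from a continuous range, so values between grid points are reachable. The parameter ranges, trial count, and objective are again illustrative assumptions.

```python
import random

def random_search(train_fn, param_space, n_trials, seed=0):
    """Sample n_trials configurations uniformly from continuous ranges.

    param_space maps each name to a (low, high) interval;
    train_fn(params) returns a validation error (lower is better).
    """
    rng = random.Random(seed)
    best_params, best_err = None, float("inf")
    for _ in range(n_trials):
        params = {name: rng.uniform(lo, hi)
                  for name, (lo, hi) in param_space.items()}
        err = train_fn(params)
        if err < best_err:
            best_params, best_err = params, err
    return best_params, best_err

# Hypothetical search space; the toy objective is minimized near lr=0.01, dropout=0.
space = {"lr": (1e-4, 1e-1), "dropout": (0.0, 0.5)}
best, err = random_search(
    lambda p: (p["lr"] - 0.01) ** 2 + p["dropout"] ** 2, space, n_trials=200)
```

Unlike grid search, the cost here is fixed by `n_trials` rather than growing exponentially with the number of hyperparameters.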
The grid search and random search algorithms are illustrated in Figure 5. In this study, we use grid search because it is the simplest and most widely used method for determining optimal hyperparameters [84]. Although random search can perform much better than grid search for high-dimensional problems, according to Hutter et al. [85], our data are simple time-series data and the candidate parameter set is limited; thus, we use grid search [86, 87]. The experiments were implemented in Python; we built the machine-learning algorithms and the grid search with “Keras,” “TensorFlow,” and “GmdhPy.”
Figure 5: (a) grid search; (b) random search.
3. Empirical Results
We used 2886 daily time-series data points on the CDS term structure from October 2008 to October 2019. Because international financial markets were unstable from 2008 to 2011, we divided the data into two subperiods and measured the forecasting performance of the five methods on both high-variance and relatively low-variance data. The first training dataset runs from 1st October 2008 to 22nd January 2019 (full period), the second from 1st October 2008 to 9th September 2011 (subperiod 1), and the third from 2nd January 2012 to 22nd January 2019 (subperiod 2). For each maturity, the test dataset is the last 200 days (23rd January 2019 to 29th October 2019, test dataset 1) for the full period and subperiod 2, and the last 80 days (12th September 2011 to 30th December 2011, test dataset 2) for subperiod 1. There is a gap between subperiod 1 and subperiod 2 because test dataset 2 follows the subperiod 1 training set. All these cases are summarized in Table 3. Summary statistics for the test datasets are provided in Tables 4 and 5; test dataset 2 has higher standard deviations than test dataset 1. Through this subperiod analysis, we compared the predictive power of the models in a relatively volatile period (subperiod 1) and a less volatile period (subperiod 2). We used grid search to optimize the parameters of the RNN, LSTM, SVR, and GMDH models and calculated the RMSE, MSE, MAPE, MPE, and MAE to compare the performance of the five models. Figures 6–11 show the performance of the Nelson–Siegel, RNN, LSTM, SVR, and GMDH models on the test datasets for each maturity.
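The five error measures can be computed as follows. This sketch uses one common convention for MPE and MAPE (signed and absolute percentage errors relative to the actual value), which may differ in detail from the paper's exact definitions.

```python
import numpy as np

def forecast_errors(y_true, y_pred):
    """Compute the five error measures used to compare the models."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    e = y_true - y_pred
    return {
        "MSE": np.mean(e ** 2),
        "RMSE": np.sqrt(np.mean(e ** 2)),
        "MAE": np.mean(np.abs(e)),
        "MPE": np.mean(e / y_true) * 100.0,   # signed: reveals the bias direction
        "MAPE": np.mean(np.abs(e / y_true)) * 100.0,
    }

# Tiny illustrative example with made-up spreads (basis points).
errs = forecast_errors([100, 110, 120], [98, 113, 119])
```

Because MPE keeps the sign of each error, it can be near zero even when forecasts are poor, which is one reason it can move differently from the other measures.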
Our main findings can be summarized as follows. First, as shown in Figures 6–11, every model provides accurate predictions of the CDS term structure. Figures 12–14 also show that the machine-learning methods have similar accuracy and outperform the Nelson–Siegel model with AR(1) dynamics. This demonstrates that machine-learning models can be applied to forecasting CDS time-series data and that the Nelson–Siegel model fits not only the interest rate term structure but also the CDS term structure. Furthermore, GMDH, SVR, and RNN have very similar accuracy across all periods and maturities. Second, in terms of predictive power, the Nelson–Siegel model performs worst on all test sets. That is, machine-learning algorithms are more effective at predicting CDS spreads than the Nelson–Siegel model, which is based on the interest rate term structure, a key determinant of CDS spread levels. Third, among the machine-learning methods, GMDH gives the best prediction results: its error was the lowest of the five methods, as shown in Tables 6–8. In addition, we expected LSTM to outperform RNN, but the RNN model slightly outperformed the LSTM model. This result remains debatable, as noted in the Introduction; performance comparisons between machine-learning algorithms reach different conclusions in different studies [55, 56, 88–91]. Fourth, the periods with higher standard deviations are generally harder to predict accurately, as seen in Tables 7 and 8. Likewise, the maturities with higher standard deviations are generally harder to predict accurately, as seen in Figures 12–14. The changes in the standard deviation and in the forecasting error move together for most error measures except MAPE and MPE, as shown in Figure 13.
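As a concrete reference point for the model-driven benchmark, the sketch below fits an AR(1) to each Nelson–Siegel factor by ordinary least squares and plugs the one-step-ahead factor forecasts back into the curve. The factor paths, the decay parameter `lam`, and the maturity grid are illustrative assumptions, not the paper's data or fitted values.

```python
import numpy as np

def nelson_siegel(tau, beta0, beta1, beta2, lam):
    """Standard Nelson-Siegel curve evaluated at maturities tau (in years)."""
    x = lam * np.asarray(tau, dtype=float)
    slope = (1.0 - np.exp(-x)) / x                 # level/slope loading
    return beta0 + beta1 * slope + beta2 * (slope - np.exp(-x))

def ar1_forecast(series):
    """One-step-ahead forecast from x_t = c + phi * x_{t-1} fitted by OLS."""
    x = np.asarray(series, dtype=float)
    X = np.column_stack([np.ones(len(x) - 1), x[:-1]])
    c, phi = np.linalg.lstsq(X, x[1:], rcond=None)[0]
    return c + phi * x[-1]

# Hypothetical daily factor paths; in practice these come from fitting the
# curve to each day's CDS spreads across maturities.
rng = np.random.default_rng(0)
factors = {name: base + np.cumsum(rng.normal(0.0, 0.05, 500))
           for name, base in [("beta0", 2.0), ("beta1", -0.5), ("beta2", 0.3)]}
next_f = {name: ar1_forecast(path) for name, path in factors.items()}

maturities = np.array([1, 2, 3, 5, 7, 10])         # years (assumed grid)
curve = nelson_siegel(maturities, next_f["beta0"], next_f["beta1"],
                      next_f["beta2"], lam=0.6)
```

The machine-learning models replace the AR(1) step (or the whole pipeline) with a learned mapping from past spreads to future spreads.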
NS: Nelson–Siegel.
4. Summary and Concluding Remarks
The purpose of this study is to compare the Nelson–Siegel, RNN, LSTM, SVR, and GMDH models in predicting the CDS term structure and to determine the most suitable model for predicting time-series data, especially the CDS term structure. The CDS spread is a default risk index for a country or company; hence, this study is useful because it not only identifies the best time-series forecasting model but also predicts future risk.
Existing studies that predict the CDS term structure or other risk indicators using machine-learning models remain few; most focus on stock price prediction. This study is significant because it demonstrates that various machine-learning models can be applied to other time-series data, and further research on various time-series data using machine-learning models is expected. This study also confirms that data-driven methods, such as RNN, LSTM, SVR, and GMDH, outperform the model-driven Nelson–Siegel method usually used in analyzing the CDS term structure. The performance of model-driven methods can decline when the data contain a significant number of outliers, because such methods depend on the assumption that the dataset follows a specific formula. In our dataset, the presence of outliers made prediction with the model-driven method difficult. By contrast, the data-driven methods were not affected by the outliers (see Solomatine et al. [92]), as they learn directly from the dataset, outliers included. As most data available today contain many outliers, it is not surprising that data-driven methods outperform model-driven ones.
Some studies show that linear models such as AR are better than ANNs for forecasting time series [93–95]. However, CDS series are volatile and lack persistence, as shown in Figure 1, so the Nelson–Siegel model based on the AR process performs worse than the machine-learning methods. In other words, because of this nonlinearity, machine-learning techniques can be used successfully for modeling and forecasting such time series [96–100].
Based on the empirical findings given in Section 3, we draw three implications. First, the data-driven methods have greater predictive power than a theoretical model built from the variables that influence a financial asset's price. Admittedly, a data-driven method has far more parameters than a model-driven method and is much slower to run. However, it is acceptable to use a machine-learning algorithm, which requires no prior knowledge such as the interest rate term structure, to predict the CDS term structure more accurately. Second, the existing Nelson–Siegel model needs improvement. The machine-learning models outperform the Nelson–Siegel model in all three cases, which implies both that the machine-learning methodologies excel at this task and that there is a factor in the CDS term structure that the Nelson–Siegel model does not capture. The Nelson–Siegel model therefore still has room for improvement, especially in forecasting applications. Third, the performance of all models varied with the data period. In the highly volatile period (subperiod 1), all models were less accurate than in the less volatile period (subperiod 2); regardless of the approach, model performance is not stable when the data are highly volatile. Figure 1 shows that the CDS term structure from 2012 to 2019 appears regular but has some unpredictable points related to the financial turbulence of 2008 to 2011. This unusual volatility is one factor that reduced the forecasting performance of all models. Therefore, it is necessary to consider a new approach that can achieve solid forecasting performance regardless of the volatility of the data.
Our findings can help investors and policymakers analyze the risk of companies or countries. The CDS spread is an index that represents the probability of credit default; thus, this study offers a measure to predict future risk. For instance, Zghal et al. [101] showed that CDS can function as a strong hedge against European stock market fluctuations, and Ratner and Chiu [19] confirmed the hedging and safe-haven properties of CDS against stock risk in the U.S. Researchers can also apply machine-learning models to forecast other financial risk time-series data.
Future studies should apply the same experiment to datasets other than CDS data, such as the implied volatility surface, to compare the forecasting performance of model-driven and data-driven methods. The implied volatility surface is a fundamental concept in pricing various financial derivatives; researchers have worked on it for a long time, and various models have been developed [102–106]. Because it is a key part of the evaluation of financial derivatives, comparisons between existing volatility models and data-driven models in predicting implied volatility should draw attention from academics and practitioners. GMDH showed the best predictive performance for the CDS term structure used in this study; it remains to be verified whether GMDH also performs best for other term structures, such as volatility term structures and yield curves, or for other CDS contracts, for example, corporate CDS and CDS indices. As a possible future study, extended Nelson–Siegel models, such as the regime-switching [107] and Nelson–Siegel–Svensson [108] models, could be used to forecast the CDS term structure. Just as grid search optimized the machine-learning algorithms, we expect such extended models, rather than parameter optimization of the basic Nelson–Siegel model, to increase its forecasting power.
Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.
Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this paper.
Acknowledgments
The authors are grateful to the editor Baogui Xin for the valuable comments which helped to significantly improve this paper. This work was supported by the Gachon University Research Fund of 2018 (GCU20180295) and by the National Research Foundation of Korea (NRF) grant funded by the Korean government (MSIT) (no. 2019R1G1A1010278).
References
 Z. Li and V. Tam, “A comparative study of a recurrent neural network and support vector machine for predicting price movements of stocks of different volatilities,” in Proceedings of the 2017 IEEE Symposium Series on Computational Intelligence (SSCI), pp. 1–8, IEEE, Paris, France, June 2017. View at: Google Scholar
 K. Chen, Y. Zhou, and F. Dai, “An LSTM-based method for stock returns prediction: a case study of China stock market,” in Proceedings of the 2015 IEEE International Conference on Big Data (Big Data), pp. 2823–2824, IEEE, Santa Clara, CA, USA, 2015. View at: Google Scholar
 T. Gao, Y. Chai, and Y. Liu, “Applying long short-term memory neural networks for predicting stock closing price,” in Proceedings of the 2017 8th IEEE International Conference on Software Engineering and Service Science (ICSESS), pp. 575–578, IEEE, Beijing, China, November 2017. View at: Google Scholar
 F. Shaw, F. Murphy, and F. O’Brien, “The forecasting efficiency of the dynamic Nelson-Siegel model on credit default swaps,” Research in International Business and Finance, vol. 30, pp. 348–368, 2014. View at: Publisher Site  Google Scholar
 D. Avino and O. Nneji, “Are CDS spreads predictable? an analysis of linear and nonlinear forecasting models,” International Review of Financial Analysis, vol. 34, pp. 262–274, 2014. View at: Publisher Site  Google Scholar
 A. Sensoy, F. J. Fabozzi, and V. Eraslan, “Predictability dynamics of emerging sovereign CDS markets,” Economics Letters, vol. 161, pp. 5–9, 2017. View at: Publisher Site  Google Scholar
 S. Neftci, A. Oliveira Santos, and Y. Lu, Credit Default Swaps and Financial Crisis Prediction, 2005, Technical Report, Working Paper Series.
 J. Duyvesteyn and M. Martens, “Forecasting sovereign default risk with Merton’s model,” The Journal of Fixed Income, vol. 25, no. 2, pp. 58–71, 2015. View at: Publisher Site  Google Scholar
 D. F. Gray, R. C. Merton, and Z. Bodie, “Contingent claims approach to measuring and managing sovereign credit risk,” Journal of Investment Management, vol. 5, no. 4, p. 5, 2007. View at: Google Scholar
 H. Blommestein, S. Eijffinger, and Z. Qian, “Regimedependent determinants of euro area sovereign CDS spreads,” Journal of Financial Stability, vol. 22, pp. 10–21, 2016. View at: Publisher Site  Google Scholar
 J. Pan and K. J. Singleton, “Default and recovery implicit in the term structure of sovereign CDS spreads,” The Journal of Finance, vol. 63, no. 5, pp. 2345–2384, 2008. View at: Publisher Site  Google Scholar
 F. A. Longstaff, J. Pan, L. H. Pedersen, and K. J. Singleton, “How sovereign is sovereign credit risk?” American Economic Journal: Macroeconomics, vol. 3, no. 2, pp. 75–103, 2011. View at: Publisher Site  Google Scholar
 E. C. Galariotis, P. Makrichoriti, and S. Spyrou, “Sovereign CDS spread determinants and spillover effects during financial crisis: a panel var approach,” Journal of Financial Stability, vol. 26, pp. 62–77, 2016. View at: Publisher Site  Google Scholar
 S. Srivastava, H. Lin, I. M. Premachandra, and H. Roberts, “Global risk spillover and the predictability of sovereign CDS spread: International evidence,” International Review of Economics & Finance, vol. 41, pp. 371–390, 2016. View at: Publisher Site  Google Scholar
 S. H. Ho, “Long and short-runs determinants of the sovereign CDS spread in emerging countries,” Research in International Business and Finance, vol. 36, pp. 579–590, 2016. View at: Publisher Site  Google Scholar
 P. Augustin, “The term structure of CDS spreads and sovereign credit risk,” Journal of Monetary Economics, vol. 96, pp. 53–76, 2018. View at: Publisher Site  Google Scholar
 E. Bouri, S. J. H. Shahzad, N. Raza, and D. Roubaud, “Oil volatility and sovereign risk of BRICS,” Energy Economics, vol. 70, pp. 258–269, 2018. View at: Publisher Site  Google Scholar
 E. Bouri, N. Jalkh, and D. Roubaud, “Commodity volatility shocks and BRIC sovereign risk: a GARCHquantile approach,” Resources Policy, vol. 61, pp. 385–392, 2019. View at: Publisher Site  Google Scholar
 M. Ratner and C.C. Chiu, “Hedging stock sector risk with credit default swaps,” International Review of Financial Analysis, vol. 30, pp. 18–25, 2013. View at: Publisher Site  Google Scholar
 S. Hammoudeh, T. Liu, C.-L. Chang, and M. McAleer, “Risk spillovers in oil-related CDS, stock and credit markets,” Energy Economics, vol. 36, pp. 526–535, 2013. View at: Publisher Site  Google Scholar
 S. J. Hussain Shahzad, E. Bouri, J. Arreola-Hernandez, D. Roubaud, and S. Bekiros, “Spillover across eurozone credit market sectors and determinants,” Applied Economics, vol. 51, no. 59, pp. 6333–6349, 2019. View at: Publisher Site  Google Scholar
 S. J. H. Shahzad, S. M. Nor, R. Ferrer, and S. Hammoudeh, “Asymmetric determinants of CDS spreads: U.S. industrylevel evidence through the NARDL approach,” Economic Modelling, vol. 60, pp. 211–230, 2017. View at: Publisher Site  Google Scholar
 B. Han and Y. Zhou, “Understanding the term structure of credit default swap spreads,” Journal of Empirical Finance, vol. 31, pp. 18–35, 2015. View at: Publisher Site  Google Scholar
 W. Huang, Y. Nakamori, and S.Y. Wang, “Forecasting stock market movement direction with support vector machine,” Computers & Operations Research, vol. 32, no. 10, pp. 2513–2522, 2005. View at: Publisher Site  Google Scholar
 K.-J. Kim, “Financial time series forecasting using support vector machines,” Neurocomputing, vol. 55, no. 1-2, pp. 307–319, 2003. View at: Publisher Site  Google Scholar
 C.J. Lu, T.S. Lee, and C.C. Chiu, “Financial time series forecasting using independent component analysis and support vector regression,” Decision Support Systems, vol. 47, no. 2, pp. 115–125, 2009. View at: Publisher Site  Google Scholar
 P.-F. Pai and C.-S. Lin, “A hybrid ARIMA and support vector machines model in stock price forecasting,” Omega, vol. 33, no. 6, pp. 497–505, 2005. View at: Publisher Site  Google Scholar
 N. Amanifard, N. Nariman-Zadeh, M. H. Farahani, and A. Khalkhali, “Modelling of multiple short-length-scale stall cells in an axial compressor using evolved GMDH neural networks,” Energy Conversion and Management, vol. 49, no. 10, pp. 2588–2594, 2008. View at: Publisher Site  Google Scholar
 M. Najafzadeh and G.A. Barani, “Comparison of group method of data handling based genetic programming and back propagation systems to predict scour depth around bridge piers,” Scientia Iranica, vol. 18, no. 6, pp. 1207–1213, 2011. View at: Publisher Site  Google Scholar
 M. Witczak, J. Korbicz, M. Mrugalski, and R. J. Patton, “A GMDH neural networkbased approach to robust fault diagnosis: application to the damadics benchmark problem,” Control Engineering Practice, vol. 14, no. 6, pp. 671–683, 2006. View at: Publisher Site  Google Scholar
 H. Yan and H. Ouyang, “Financial time series prediction based on deep learning,” Wireless Personal Communications, vol. 102, no. 2, pp. 683–700, 2018. View at: Publisher Site  Google Scholar
 Y. Baek and H. Y. Kim, “Modaugnet: a new forecasting framework for stock market index value with an overfitting prevention lstm module and a prediction lstm module,” Expert Systems with Applications, vol. 113, pp. 457–480, 2018. View at: Publisher Site  Google Scholar
 J. Cao, Z. Li, and J. Li, “Financial time series forecasting model based on CEEMDAN and LSTM,” Physica A: Statistical Mechanics and its Applications, vol. 519, pp. 127–139, 2019. View at: Publisher Site  Google Scholar
 T. Fischer and C. Krauss, “Deep learning with long shortterm memory networks for financial market predictions,” European Journal of Operational Research, vol. 270, no. 2, pp. 654–669, 2018. View at: Publisher Site  Google Scholar
 P. Thottakkara, T. OzrazgatBaslanti, B. B. Hupf et al., “Application of machine learning techniques to highdimensional clinical data to forecast postoperative complications,” PloS One, vol. 11, no. 5, Article ID e0155705, 2016. View at: Publisher Site  Google Scholar
 R. Motka, V. Parmarl, B. Kumar, and A. R. Verma, “Diabetes mellitus forecast using different data mining techniques,” in Proceedings of the 2013 4th International Conference on Computer and Communication Technology (ICCCT), pp. 99–103, IEEE, Allahabad, India, 2013. View at: Google Scholar
 N. Boyko, T. Sviridova, and N. Shakhovska, “Use of machine learning in the forecast of clinical consequences of cancer diseases,” in Proceedings of the 2018 7th Mediterranean Conference on Embedded Computing (MECO), pp. 1–6, IEEE, Budva, Montenegro, 2018. View at: Google Scholar
 P. J. Tighe, C. A. Harle, R. W. Hurley, H. Aytug, A. P. Boezaart, and R. B. Fillingim, “Teaching a machine to feel postoperative pain: combining highdimensional clinical data with machine learning algorithms to forecast acute postoperative pain,” Pain Medicine, vol. 16, no. 7, pp. 1386–1401, 2015. View at: Publisher Site  Google Scholar
 S. Choi, Y. J. Kim, B. Simon, and D. Mavris, “Prediction of weatherinduced airline delays based on machine learning algorithms,” in Proceedings of the 2016 IEEE/AIAA 35th Digital Avionics Systems Conference (DASC), pp. 1–6, IEEE, Sacramento, CA, USA, September 2016. View at: Google Scholar
 S. E. Haupt and B. Kosovic, “Big data and machine learning for applied weather forecasts: forecasting solar power for utility operations,” in Proceedings of the 2015 IEEE Symposium Series on Computational Intelligence, pp. 496–501, IEEE, Cape Town, South Africa, December 2015. View at: Google Scholar
 J. Rhee and J. Im, “Meteorological drought forecasting for ungauged areas based on machine learning: using long-range climate forecast and remote sensing data,” Agricultural and Forest Meteorology, vol. 237-238, pp. 105–122, 2017. View at: Publisher Site  Google Scholar
 S. C. James, Y. Zhang, and F. O’Donncha, “A machine learning framework to forecast wave conditions,” Coastal Engineering, vol. 137, pp. 1–10, 2018. View at: Publisher Site  Google Scholar
 X. Ma, Z. Dai, Z. He, J. Ma, Y. Wang, and Y. Wang, “Learning traffic as images: a deep convolutional neural network for largescale transportation network speed prediction,” Sensors, vol. 17, no. 4, p. 818, 2017. View at: Publisher Site  Google Scholar
 Y. Li, R. Yu, C. Shahabi, and Y. Liu, “Diffusion convolutional recurrent neural network: datadriven traffic forecasting,” 2017, http://arxiv.org/abs/1707.01926. View at: Google Scholar
 S. J. Farlow, SelfOrganizing Methods in Modeling: GMDH Type Algorithms, vol. 54, CRC Press, Boca Raton, FL, USA, 1984.
 M. Najafzadeh and H. M. Azamathulla, “Group method of data handling to predict scour depth around bridge piers,” Neural Computing and Applications, vol. 23, no. 7-8, pp. 2107–2112, 2013. View at: Publisher Site  Google Scholar
 N. Nariman-Zadeh, A. Darvizeh, and G. R. Ahmad-Zadeh, “Hybrid genetic design of GMDH-type neural networks using singular value decomposition for modelling and prediction of the explosive cutting process,” Proceedings of the Institution of Mechanical Engineers, Part B: Journal of Engineering Manufacture, vol. 217, no. 6, pp. 779–790, 2003. View at: Publisher Site  Google Scholar
 S.Y. Choi and C. Hong, “Relationship between uncertainty in the oil and stock markets before and after the shale gas revolution: evidence from the OVX, VIX, and VKOSPI volatility indices,” PloS One, vol. 15, no. 5, Article ID e0232508, 2020. View at: Publisher Site  Google Scholar
 R. Y. M. Li, S. Fong, and K. W. S. Chong, “Forecasting the reits and stock indices: group method of data handling neural network approach,” Pacific Rim Property Research Journal, vol. 23, no. 2, pp. 123–160, 2017. View at: Publisher Site  Google Scholar
 I. Pavlova, M. E. de Boyrie, and A. M. Parhizgari, “A dynamic spillover analysis of crude oil effects on the sovereign credit risk of exporting countries,” The Quarterly Review of Economics and Finance, vol. 68, pp. 10–22, 2018. View at: Publisher Site  Google Scholar
 R. Ramezanian, A. Peymanfar, and S. B. Ebrahimi, “An integrated framework of genetic network programming and multilayer perceptron neural network for prediction of daily stock return: an application in tehran stock exchange market,” Applied Soft Computing, vol. 82, Article ID 105551, 2019. View at: Publisher Site  Google Scholar
 S. Siami-Namini and A. S. Namin, Forecasting Economics and Financial Time Series: ARIMA vs. LSTM, 2018, http://arxiv.org/abs/1803.06386.
 S. McNally, J. Roche, and S. Caton, “Predicting the price of bitcoin using machine learning,” in Proceedings of the 2018 26th Euromicro International Conference on Parallel, Distributed and Networkbased Processing (PDP), pp. 339–343, Cambridge, UK, 2018. View at: Google Scholar
 B. Cortez, B. Carrera, Y.J. Kim, and J.Y. Jung, “An architecture for emergency event prediction using LSTM recurrent neural networks,” Expert Systems with Applications, vol. 97, pp. 315–324, 2018. View at: Publisher Site  Google Scholar
 J. Samarawickrama and T. G. I. Fernando, A Recurrent Neural Network Approach in Predicting Daily Stock Prices: An Application to the Sri Lankan Stock Market, IEEE, Peradeniya, Sri Lanka, 2017.
 S. Selvin, R. Vinayakumar, E. A. Gopalakrishnan, V. K. Menon, and K. P. Soman, “Stock price prediction using LSTM, RNN and CNNsliding window model,” in Proceedings of the 2017 International Conference on Advances in Computing, Communications and Informatics (ICACCI), pp. 1643–1647, IEEE, Udupi, India, September 2017. View at: Google Scholar
 C. R. Nelson and A. F. Siegel, “Parsimonious modeling of yield curves,” Journal of Business, vol. 60, no. 4, pp. 473–489, 1987. View at: Publisher Site  Google Scholar
 B. Guo, Q. Han, and B. Zhao, “The Nelson-Siegel model of the term structure of option implied volatility and volatility components,” Journal of Futures Markets, vol. 34, no. 8, pp. 788–806, 2014. View at: Publisher Site  Google Scholar
 N. S. Grønborg and A. Lunde, “Analyzing oil futures with a dynamic Nelson-Siegel model,” Journal of Futures Markets, vol. 36, no. 2, pp. 153–173, 2016. View at: Publisher Site  Google Scholar
 J. West, “Long-dated agricultural futures price estimates using the seasonal Nelson-Siegel model,” International Journal of Business and Management, vol. 7, no. 3, pp. 78–93, 2012. View at: Publisher Site  Google Scholar
 R.R. Chen, X. Cheng, and L. Wu, “Dynamic interactions between interestrate and credit risk: theory and evidence on the credit default swap term structure,” Review of Finance, vol. 17, no. 1, pp. 403–441, 2011. View at: Publisher Site  Google Scholar
 M. Tsuruta, “Decomposing the term structures of local currency sovereign bond yields and sovereign credit default swap spreads,” The North American Journal of Economics and Finance, vol. 51, Article ID 101072, 2020. View at: Publisher Site  Google Scholar
 S. R. Gunn, “Support vector machines for classification and regression,” Tech. Rep. 1, pp. 5–16, 1998, ISIS Technical Report. View at: Google Scholar
 S. Haykin, Neural Networks: A Comprehensive Foundation, Prentice Hall PTR, Upper Saddle River, NJ, USA, 1994.
 F. Pérez-Cruz, J. A. Afonso-Rodríguez, J. Giner et al., “Estimating GARCH models using support vector machines,” Quantitative Finance, vol. 3, no. 3, pp. 163–172, 2003. View at: Publisher Site  Google Scholar
 M. Mohri, A. Rostamizadeh, and A. Talwalkar, Foundations of Machine Learning, MIT Press, Cambridge, MA, USA, 2018.
 L. Cao and F. E. H. Tay, “Financial forecasting using support vector machines,” Neural Computing & Applications, vol. 10, no. 2, pp. 184–192, 2001. View at: Publisher Site  Google Scholar
 S. Hochreiter and J. Schmidhuber, “LSTM can solve hard long time lag problems,” in Proceedings of the 10th Annual Conference on Neural Information Processing Systems (NIPS 1996), pp. 473–479, Denver, CO, USA, December 1996. View at: Google Scholar
 F. A. Gers and J. Schmidhuber, “Recurrent nets that time and count,” in Proceedings of the IEEE-INNS-ENNS International Joint Conference on Neural Networks (IJCNN 2000), pp. 189–194, IEEE, Como, Italy, July 2000. View at: Google Scholar
 H.G. Zimmermann, C. Tietz, and R. Grothmann, “Forecasting with recurrent neural networks: 12 tricks,” in Neural Networks: Tricks of the Trade, Springer, Berlin, Germany, 2012. View at: Google Scholar
 J. Chung, C. Gulcehre, K.H. Cho, and Y. Bengio, “Empirical evaluation of gated recurrent neural networks on sequence modeling,” 2014, http://arxiv.org/abs/1412.3555. View at: Google Scholar
 A. G. Ivakhnenko, “Polynomial theory of complex systems,” IEEE Transactions on Systems, Man, and Cybernetics, vol. SMC-1, no. 4, pp. 364–378, 1971. View at: Publisher Site  Google Scholar
 M. Najafzadeh, G.A. Barani, and H. M. Azamathulla, “GMDH to predict scour depth around a pier in cohesive soils,” Applied Ocean Research, vol. 40, pp. 35–41, 2013. View at: Publisher Site  Google Scholar
 G. P. Liu and V. Kadirkamanathan, “Multiobjective criteria for neural network structure selection and identification of nonlinear systems using genetic algorithms,” IEE ProceedingsControl Theory and Applications, vol. 146, no. 5, pp. 373–382, 1999. View at: Publisher Site  Google Scholar
 E. Sanchez, T. Shibata, and L. Asker Zadeh, Genetic Algorithms and Fuzzy Logic Systems: Soft Computing Perspectives, World Scientific, Singapore, 1997.
 A. M. Ghaedi, M. M. Baneshi, A. Vafaei et al., “Comparison of multiple linear regression and group method of data handling models for predicting sunset yellow dye removal onto activated carbon from oak tree wood,” Environmental Technology & Innovation, vol. 11, pp. 262–275, 2018. View at: Publisher Site  Google Scholar
 I. Ebtehaj, H. Bonakdari, F. Khoshbin, and H. Azimi, “Pareto genetic design of group method of data handling type neural network for prediction discharge coefficient in rectangular side orifices,” Flow Measurement and Instrumentation, vol. 41, pp. 67–74, 2015. View at: Publisher Site  Google Scholar
 R. Shirmohammadi, B. Ghorbani, M. Hamedi, M.H. Hamedi, and L. M. Romeo, “Optimization of mixed refrigerant systems in low temperature applications by means of group method of data handling (GMDH),” Journal of Natural Gas Science and Engineering, vol. 26, pp. 303–312, 2015. View at: Publisher Site  Google Scholar
 M. Najafzadeh, G.-A. Barani, and H. M. Azamathulla, “Prediction of pipeline scour depth in clear-water and live-bed conditions using group method of data handling,” Neural Computing and Applications, vol. 24, no. 3-4, pp. 629–635, 2014. View at: Publisher Site  Google Scholar
 A. Sakaguchi and T. Yamamoto, “A GMDH network using backpropagation and its application to a controller design,” in Proceedings of the 2000 IEEE International Conference on Systems, Man and Cybernetics’ Cybernetics Evolving to Systems, Humans, Organizations, and their Complex Interactions, pp. 2691–2696, IEEE, Nashville, TN, USA, 2000. View at: Google Scholar
 D. Srinivasan, “Energy demand prediction using GMDH networks,” Neurocomputing, vol. 72, no. 1–3, pp. 625–629, 2008. View at: Publisher Site  Google Scholar
 C.W. Hsu, C.C. Chang, C.J. Lin et al., A Practical Guide to Support Vector Classification, National Taiwan University, Taipei, Taiwan, 2003.
 J. Bergstra and Y. Bengio, “Random search for hyperparameter optimization,” Journal of Machine Learning Research, vol. 13, pp. 281–305, 2012. View at: Google Scholar
 N. Schilling, M. Wistuba, L. Drumond, and L. SchmidtThieme, “Hyperparameter optimization with factorized multilayer perceptrons,” Machine Learning and Knowledge Discovery in Databases, Springer, Berlin, Germany, 2015. View at: Publisher Site  Google Scholar
 F. Hutter, J. Lücke, and L. SchmidtThieme, “Beyond manual tuning of hyperparameters,” KIKünstliche Intelligenz, vol. 29, no. 4, pp. 329–337, 2015. View at: Publisher Site  Google Scholar
 J. Sun, C. Zheng, X. Li, and Y. Zhou, “Analysis of the distance between two classes for tuning SVM hyperparameters,” IEEE Transactions on Neural Networks, vol. 21, no. 2, pp. 305–318, 2010. View at: Publisher Site  Google Scholar
 C. Thornton, F. Hutter, H. H. Hoos, and K. LeytonBrown, “Autoweka: combined selection and hyperparameter optimization of classification algorithms,” in Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 847–855, Chicago, IL, USA, August 2013. View at: Google Scholar
N. K. Ahmed, A. F. Atiya, N. El Gayar, and H. El-Shishiny, "An empirical comparison of machine learning models for time series forecasting," Econometric Reviews, vol. 29, no. 5-6, pp. 594–621, 2010.
A. Khosravi, L. Machado, and R. O. Nunes, "Time-series prediction of wind speed using machine learning algorithms: a case study of the Osorio wind farm, Brazil," Applied Energy, vol. 224, pp. 550–566, 2018.
X. Qiu, L. Zhang, Y. Ren, P. N. Suganthan, and G. Amaratunga, "Ensemble deep learning for regression and time series forecasting," in Proceedings of the 2014 IEEE Symposium on Computational Intelligence in Ensemble Learning (CIEL), pp. 1–6, IEEE, Orlando, FL, USA, 2014.
A. V. Seliverstova, D. A. Pavlova, S. A. Tonoyan, and Y. E. Gapanyuk, "The time series forecasting of the company's electric power consumption," in Advances in Neural Computation, Machine Learning, and Cognitive Research II, Springer, Berlin, Germany, 2018.
D. Solomatine, L. M. See, and R. J. Abrahart, "Data-driven modelling: concepts, approaches and experiences," in Practical Hydroinformatics, Springer, Berlin, Germany, 2009.
S. Aras and İ. D. Kocakoç, "A new model selection strategy in time series forecasting with artificial neural networks: IHTS," Neurocomputing, vol. 174, pp. 974–987, 2016.
M. C. Brace, J. Schmidt, and M. Hadlin, "Comparison of the forecasting accuracy of neural networks with other established techniques," in Proceedings of the First International Forum on Applications of Neural Networks to Power Systems, pp. 31–35, IEEE, Seattle, WA, USA, 1991.
W. R. Foster, F. Collopy, and L. H. Ungar, "Neural network forecasting of short, noisy time series," Computers & Chemical Engineering, vol. 16, no. 4, pp. 293–297, 1992.
Ü. Ç. Büyükşahin and Ş. Ertekin, "Improving forecasting accuracy of time series data using a new ARIMA-ANN hybrid method and empirical mode decomposition," Neurocomputing, vol. 361, pp. 151–163, 2019.
A. Lapedes and R. Farber, "Nonlinear signal processing using neural networks: prediction and system modelling," Tech. Rep., 1987.
M. C. Medeiros, A. Veiga, and C. E. Pedreira, "Modeling exchange rates: smooth transitions, neural networks, and linear models," IEEE Transactions on Neural Networks, vol. 12, no. 4, pp. 755–764, 2001.
K. Rasouli, W. W. Hsieh, and A. J. Cannon, "Daily streamflow forecasting by machine learning methods with weather and climate inputs," Journal of Hydrology, vol. 414-415, pp. 284–293, 2012.
J. C. Reboredo, J. M. Matías, and R. Garcia-Rubio, "Nonlinearity in forecasting of high-frequency stock returns," Computational Economics, vol. 40, no. 3, pp. 245–264, 2012.
R. Zghal, A. Ghorbel, and M. Triki, "Dynamic model for hedging of the European stock sector with credit default swaps and Euro STOXX 50 volatility index futures," Borsa Istanbul Review, vol. 18, no. 4, pp. 312–328, 2018.
S.-Y. Choi, J.-P. Fouque, and J.-H. Kim, "Option pricing under hybrid stochastic and local volatility," Quantitative Finance, vol. 13, no. 8, pp. 1157–1165, 2013.
B. Dupire, "Pricing with a smile," Risk, vol. 7, no. 1, pp. 18–20, 1994.
J. Gatheral and A. Jacquier, "Arbitrage-free SVI volatility surfaces," Quantitative Finance, vol. 14, no. 1, pp. 59–71, 2014.
P. S. Hagan, D. Kumar, A. S. Lesniewski, and D. E. Woodward, "Managing smile risk," The Best of Wilmott, vol. 1, pp. 249–296, 2002.
S. L. Heston, "A closed-form solution for options with stochastic volatility with applications to bond and currency options," Review of Financial Studies, vol. 6, no. 2, pp. 327–343, 1993.
J. Xiang and X. Zhu, "A regime-switching Nelson–Siegel term structure model and interest rate forecasts," Journal of Financial Econometrics, vol. 11, no. 3, pp. 522–555, 2013.
R. B. De Rezende and M. S. Ferreira, "Modeling and forecasting the yield curve by an extended Nelson–Siegel class of models: a quantile autoregression approach," Journal of Forecasting, vol. 32, no. 2, pp. 111–123, 2013.
Copyright
Copyright © 2020 Won Joong Kim et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.