Journal of Mathematics


Research Article | Open Access

Volume 2014 |Article ID 753852 | https://doi.org/10.1155/2014/753852

Luca Di Persio, Samuele Vettori, "Markov Switching Model Analysis of Implied Volatility for Market Indexes with Applications to S&P 500 and DAX", Journal of Mathematics, vol. 2014, Article ID 753852, 17 pages, 2014. https://doi.org/10.1155/2014/753852

Markov Switching Model Analysis of Implied Volatility for Market Indexes with Applications to S&P 500 and DAX

Academic Editor: Niansheng Tang
Received 31 May 2014
Revised 26 Nov 2014
Accepted 26 Nov 2014
Published 18 Dec 2014

Abstract

We adopt a regime switching approach to study concrete financial time series with particular emphasis on their volatility characteristics considered in a space-time setting. In particular the volatility parameter is treated as an unobserved state variable whose value in time is given as the outcome of an unobserved, discrete-time and discrete-state, stochastic process represented by a suitable Markov chain. We will take into account two different approaches for inference on Markov switching models, namely, the classical approach based on maximum likelihood techniques and the Bayesian inference method realized through a Gibbs sampling procedure. Then the classical approach shall be tested on data taken from the Standard & Poor’s 500 and the Deutsche Aktien Index series of returns in different time periods. Computations are given for a four-state switching model, and the numerical results obtained are accompanied by explanatory graphs reporting the outcomes of both the smoothing and the filtering algorithms used in the estimation/calibration procedures we propose for inference on the switching model parameters.

1. Introduction

Many financial time series are characterized by abrupt changes in their behaviour, a phenomenon that can be induced by a number of both endogenous and exogenous factors, which are often far from being predictable. Examples of such factors include large financial crises, government policy and political instabilities, natural disasters, and speculative initiatives.

Such phenomena have been frequently observed during the last decade, especially because of the worldwide financial crisis which originated during the first years of 2000 and is still running. Indeed such a crisis, often referred to as the Global Financial Crisis, has caused a severe lack of liquidity for banks in the USA, Europe, and many other countries all over the world, resulting, for example, in the collapse of many financial institutions, a generalized downfall in stock markets, a decline in consumer wealth, and an impressive growth of the European sovereign debt. The reasons behind these phenomena are rather complicated, particularly because of the high number of interconnected points of interest, each of which is driven by specific influences that are often linked to each other. Nevertheless there are mathematical techniques which can be used to point out some general characteristics able to synthesize relevant information and to give effective help in forecasting the future behaviour of certain macroquantities of particular interest.

Although linear time series techniques, for example, the autoregressive (AR) model, the moving average (MA) model, and their combination (ARMA), have been successfully applied in a large number of financial applications, by their own nature they are unable to describe nonlinear dynamic patterns such as asymmetry and volatility clustering. In order to overcome the latter issue various approaches have been developed. Among them, and with financial applications in mind, we recall the autoregressive conditional heteroskedasticity (ARCH) model of Engle, together with its generalised version (GARCH), and the regime switching (RS) model, which involves multiple equations, each characterizing the behavior of the quantities of interest in a different regime, and a mechanism allowing the switch between them. Concerning the switching methods that can be considered in the RS framework, we would like to cite the threshold autoregressive (TAR) model proposed by Tong in [1], in which regime switching is controlled by a fixed threshold, the autoregressive conditional root (ACR) model of Bec et al. (see [2]), where the regime switching between stationary and nonstationary states is controlled by a binary random variable, and its extension, namely, the functional coefficient autoregressive conditional root (FCACR) model, considered by Zhou and Chen in [3]. In particular in this work we aim at using the RS approach to model the aforementioned types of unexpected changes through their dependence on an unobserved variable typically defined as the regime or state. A customary way to formalize such an approach is given through the following state-space representation: where is the vector of problem parameters, is the information set, the state set , with , is the (finite) set of possible values which can be taken by the state process at time , that is, , and is a suitable function determining the value of the dependent variable at any given time .

The simplest type of structure considers two regimes; that is, and at most one switch in the time series: in other words, the first (unknown) observations relate to regime 1, while the remaining data concern regime 2. Such an approach can be generalized by allowing the system to switch back and forth between the two regimes with a certain probability. The latter is the case considered by Quandt in his paper published in 1962, where it is assumed to be theoretically possible for the system to switch between regimes every time a new observation is generated. Note that the previous hypothesis is not realistic in an economic context, since it contradicts the volatility clustering property typical of financial time series.

The best way to represent the volatility clustering phenomenon consists in assuming that the state variable follows a Markov chain and in requiring that the probability of a switch at the next time step is much lower than the probability of remaining in the same economic regime. The Markovian switching mechanism was first considered by Goldfeld and Quandt in [4] and then extended by Hamilton to the case of structural changes in the parameters of an autoregressive process (see [5]). When the unobserved state variable that controls the switching mechanism follows a first-order Markov chain, the RS model is called a Markov Switching Model (MSM). In particular the Markovian property of such a model implies that, given , the value of the state variable depends only on , a property that turns out to be useful to obtain a good representation of financial data, where abrupt changes occur only occasionally.

After Hamilton’s papers, Markov switching models have been widely applied, together with a number of alternative versions, to analyze different types of both economic and financial time series, for example, stock option behaviors, energy market trends, and interest rate series. In what follows we shall often refer to the following, actually rather general, form for an MSM: where the terms are nothing but the transition probabilities, from state at time to state at time , which determine the stochastic dynamics of the process .

In Section 2 we recall the classical approach to MSM following [5] and with respect to both serially uncorrelated and correlated data. Then, in Section 3, we first introduce basic facts related to the Bayesian inference; then we recall the Gibbs sampling technique and related Monte Carlo approximation method which is later used to infer on the MSM parameters. Section 4 is devoted to a goodness of fit (of obtained estimates for parameters of interest) analysis, while in Section 5 we describe our forecasting MSM-based procedure which is then used in Section 6 to analyse both the Standard & Poor’s 500 and the Deutsche Aktien indexes.

2. The Classical Approach

In this section we shall use the classical approach (see, for example, [5]) to develop procedures which allow us to make inference on the unobserved variables and parameters characterizing an MSM. The main idea behind such an approach is split into two steps: first we estimate the model’s unknown parameters by a maximum likelihood method; secondly we infer the values of the unobserved switching variable conditional on the parameter estimates. Along these lines, we shall analyze two different MSM settings, namely, the case in which data are serially uncorrelated and the case in which they are autocorrelated.

2.1. Serially Uncorrelated Data

Let us suppose that , , is a discrete time stochastic process represented by a first-order Markov chain taking values in some set , with transition probabilities: Then the state-space model we want to study is given by where the variables , are introduced in order to have a more compact equation for and : in particular, if , otherwise , which implies that under regime , for , the mean and variance parameters are given by and .

Let us underline that in (4) the are our observed data, for example, historical returns of a stock or some index time series, and we suppose that they are locally distributed as Gaussian random variables, in the sense that jumps could occasionally occur for both the mean and the variance . In particular, we assume that and we want to estimate these unobserved values for the standard deviation, as well as the values for the switching mean. Note that we could also take as a constant, obtaining the so-called switching variance problem, or as a constant, obtaining a switching mean problem. The first one is in general more interesting in the analysis of financial time series, since variance is usually interpreted as an indicator of the market’s volatility.

Given the model described by (4), the conditional density of given is Gaussian with mean and standard deviation ; namely, its related probability density function reads as follows: and we are left with the problem of estimating both the expectation and the standard deviation parameters, a standard task which is solved by maximizing the associated log-likelihood function defined as follows: A different and more realistic scenario is the one characterized by unobserved values for . In such a case, it is possible to consider the MSM-inference problem as a two-step procedure consisting in (i) estimating the parameters of the model by maximizing the log-likelihood function, and (ii) making inferences on the state variable , conditional on the estimates obtained at the previous point.

Depending on the amount of information we can use for inference on , we have (i) filtered probabilities, which refer to inferences about conditional on information up to time , namely, with respect to , and (ii) smoothed probabilities, which refer to inferences about conditional on the whole sample (history), .

In what follows we describe a step-by-step algorithm which allows us to resolve the filtering problem for a sample of serially uncorrelated data. In particular we slightly generalize the approach given in [6], assuming that the state variable belongs to a 4-state space set , at every time . Despite 2-state (expansion/contraction) and 3-state (low/medium/high volatility regime) models being the usual choices, we decided to consider 4-state MSM in order to refine the analysis with respect to volatility levels around the mean. A finer analysis can be also performed, even if one has to take into account the related nonlinear computational growth. We first define the log-likelihood function at time as where is the vector of parameters that we want to estimate. Let us note that the is updated at every iteration, since we maximize with respect to the function at every stage of our step-by-step procedure. In particular the calibrating procedure reads as follows.

Inputs.

(i) Put .

(ii) Compute the transition probabilities of the homogeneous Markov chain underlying the state variable; that is, Since in the applications we can only count on return time series, we first have to calibrate with respect to the transition probabilities :
(1) choose 4 values (e.g., , , , and ) and a positive, arbitrarily small, constant , for example, ,
(2) compute for every and for every the values
(3) simulate a value in for at each time from the discrete probability vector
(4) set the transition probabilities just by counting the number of transitions from state to state , for , in order to obtain the following transition matrix:

(iii) Compute the steady-state probabilities Let us note that, by definition, if is a vector of steady-state probabilities, then for every ; moreover , and (see, for example, [6, p. 71]) we also have that , where is a matrix of zeros and , is the four-dimensional identity matrix, while , that is, the vector of steady-state probabilities is the last column of the matrix .
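As an illustration of points (4) and (iii) above, the following minimal Python sketch counts transitions along a simulated state path and then approximates the steady-state vector. It rests on assumptions that are not part of the original procedure: states are labelled 0, 1, 2, 3 instead of 1 to 4, the matrix is stored with rows summing to one, and the steady-state vector is obtained by power iteration rather than through the matrix formula of [6]. The names `count_transitions` and `steady_state` are hypothetical.

```python
def count_transitions(states, n_states):
    """Row-normalised transition counts for a path with labels 0..n_states-1."""
    counts = [[0] * n_states for _ in range(n_states)]
    for prev, cur in zip(states, states[1:]):
        counts[prev][cur] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        # Assumption: a state that is never left gets a uniform row.
        matrix.append([c / total if total else 1.0 / n_states for c in row])
    return matrix


def steady_state(P, tol=1e-12, max_iter=10_000):
    """Approximate pi such that pi = pi P by power iteration (rows of P sum to 1)."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(max_iter):
        nxt = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
        if max(abs(a - b) for a, b in zip(pi, nxt)) < tol:
            return nxt
        pi = nxt
    return pi
```

Power iteration is used here only because it needs no linear algebra library; for an irreducible, aperiodic chain it should converge to the same steady-state vector as the closed-form expression.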

Next we perform the following steps, for .

Step 1. The probability of conditional to information set at time is given by

Step 2. Compute the joint density of and conditional to the information set The marginal density of is given by the sum of the joint density over all values of

Step 3. Update the log-likelihood function at time in the following way: and maximize with respect to , under the condition , to find the maximum likelihood estimator for the next time period.

Step 4. Once is observed at the end of the th iteration, we can update the probability term: where both and are computed with respect to the estimator .
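Steps 1 to 4 can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: the parameters (means, standard deviations, transition matrix) are held fixed, so the sketch only evaluates the log-likelihood and the filtered probabilities and does not perform the maximization of Step 3; the matrix `P` is stored with `P[i][j]` the probability of moving from state `i` to state `j`; the function names are hypothetical.

```python
import math


def normal_pdf(y, mu, sigma):
    """Gaussian density with mean mu and standard deviation sigma."""
    z = (y - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def hamilton_filter(ys, mus, sigmas, P, pi0):
    """Return (log-likelihood, filtered probabilities) for fixed parameters.

    pi0 is the initial state distribution, e.g. the steady-state vector.
    """
    n = len(mus)
    filtered, loglik, prev = [], 0.0, list(pi0)
    for y in ys:
        # Step 1: state probabilities given information up to t-1.
        pred = [sum(prev[i] * P[i][j] for i in range(n)) for j in range(n)]
        # Step 2: joint density of (y_t, S_t = j), then marginal density of y_t.
        joint = [normal_pdf(y, mus[j], sigmas[j]) * pred[j] for j in range(n)]
        marginal = sum(joint)
        # Step 3: accumulate the log-likelihood (no maximization in this sketch).
        loglik += math.log(marginal)
        # Step 4: update the state probabilities once y_t has been observed.
        prev = [v / marginal for v in joint]
        filtered.append(prev)
    return loglik, filtered
```

In practice the returned log-likelihood would be handed to a numerical optimizer over the parameter vector; that part is deliberately left out here.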

2.2. Serially Correlated Data

In some cases it is possible to argue, and to test mathematically by, for example, the Durbin-Watson statistic or the Breusch-Godfrey test, for the presence of a serial correlation (or autocorrelation) between data belonging to a certain time series of interest. Such a characteristic is often analyzed in signal processing, but examples can also be found in economic, meteorological, or sociological data sets, especially in connection with the autocorrelation of errors in the related forecasting procedures. In particular if we suppose that the observed variable linearly depends on its previous value, then we obtain a first-order autoregressive pattern and the following state-space model applies: where , and , and are the same variables introduced in the previous section; that is, if , otherwise .

In this situation, if the state is known for every , we need and to compute the density of conditional to past information , indeed we have where

If are unobserved (and as before we assume that the state variable can take the four values ), we apply the following algorithm in order to resolve the filtering problem for a sample of serially correlated data.

Inputs.

(i) Put .

(ii) Compute the transition probabilities We apply the same trick as before, but first we have to estimate the parameter : in order to obtain this value we can use the least squares method (see, for example, [7]); that is, Then we compute for every and consider the values (we apply the Normal distribution function to instead of , as done before).

(iii) Compute the steady-state probabilities taking the last column of the matrix (see the procedure in Section 2.1 for details).

Next perform the following steps for .

Step 1. Compute the probabilities of conditional to information set at time , for

Step 2. Compute the joint density of , , and given : where is given by (20) and is computed in Step 1. The marginal density of conditional on is obtained by summing over all values of and :

Step 3. The log-likelihood function at time is again and it can be maximized with respect to , under condition , giving the maximum likelihood estimator for the next time period.

Step 4. Update the joint probabilities of and conditional to the new information set , using the estimator computed in Step 3 by maximizing the log-likelihood function :
Then compute the updated probabilities of given by summing the joint probabilities over as follows:

The Smoothing Algorithm. Once we have run this procedure, we are provided with the filtered probabilities, that is, the values for and for each (in addition to the estimator ).

Sometimes it is required to estimate probabilities of given the whole sample information; that is, which are called smoothed probabilities. We are going to show how these new probabilities can be computed from previous procedure (the same algorithm, although with some obvious changes, can be still used starting from procedure in Section 2.1).

Since the last iteration of the algorithm gives us the probabilities for , we can start from these values and use the following procedure by doing the two steps for every .

Step 1. For , compute

Remark 1. Note that equality , that is, holds only under a particular condition; namely, where (see [6] for the proof). Equation (32) suggests that if were known, then would contain no information about beyond that contained in and . This property does not hold for every state-space model with regime switching (see, for example, [6, Ch. 5]), in which case the smoothing algorithm involves an approximation.

Step 2. For compute
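The backward recursion of the smoothing algorithm can be sketched as follows for the simpler, serially uncorrelated setting of Section 2.1, where only the current state enters the filtered probabilities. The sketch assumes that the filtered probabilities are already available as a list of rows and that `P[i][j]` is the probability of moving from state `i` to state `j`; the name `kim_smoother` is hypothetical.

```python
def kim_smoother(filtered, P):
    """Smoothed state probabilities from filtered ones (backward recursion)."""
    n = len(P)
    T = len(filtered)
    smoothed = [None] * T
    # At the last date the smoothed and filtered probabilities coincide.
    smoothed[-1] = list(filtered[-1])
    for t in range(T - 2, -1, -1):
        # One-step-ahead probabilities of S_{t+1} given information up to t.
        pred = [sum(filtered[t][i] * P[i][k] for i in range(n)) for k in range(n)]
        # Joint smoothed probability of (S_t = j, S_{t+1} = k), summed over k.
        smoothed[t] = [
            filtered[t][j]
            * sum(P[j][k] * smoothed[t + 1][k] / pred[k]
                  for k in range(n) if pred[k] > 0.0)
            for j in range(n)
        ]
    return smoothed
```

A quick sanity check of the recursion: when all rows of `P` are identical the states are independent over time, future observations carry no information about the current state, and the smoothed probabilities reduce to the filtered ones.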

3. The Gibbs Sampling Approach

3.1. An Introduction to Bayesian Inference

Under the general title Bayesian inference we can collect a large number of different concrete procedures; nevertheless they are all based on a smart use of Bayes’ rule, which is used to update the probability estimate for a hypothesis as additional evidence is learned (see, for example, [8, 9]). In particular, within the Bayesian framework, the parameters which characterize a certain statistical model (let us collect them in a vector called ) are treated as random variables with their own probability distribution, let us say , which plays the role of a prior distribution since it is defined before taking into account the sample data . Therefore, exploiting Bayes’ theorem and denoting by the likelihood of for the statistical model of interest, we have that where is the joint posterior distribution of the parameters. The denominator defines the marginal likelihood of and can be taken as a constant, obtaining the proportionality relation It is straightforward to note that the most critical part of the Bayesian inference procedure lies in the choice of a suitable prior distribution, since it has to agree with the parameter constraints. An effective answer to the latter issue is given by the so-called conjugate prior distribution, namely, a prior which, when combined with the likelihood function, yields a posterior distribution belonging to the same family as the prior distribution.

As an example, if the likelihood function is Gaussian, it can be shown that the conjugate prior for the mean is the Gaussian distribution, whereas the conjugate prior for the variance is the inverted Gamma distribution (see, for example, [9, 10]).

3.2. Gibbs Sampling

A general problem in Statistics concerns the question of how a sequence of observations which cannot be directly sampled can be simulated, for example, by means of some multivariate probability distribution, with a prescribed degree of accuracy. Problems of this kind can be successfully attacked by Monte Carlo Markov Chain (MCMC) simulation methods (see, for example, [11–13]), and in particular by the so-called Gibbs sampling technique, which allows one to approximate joint and marginal distributions by sampling from conditional distributions (see, for example, [14–16]).

Let us suppose that we have the joint density of random variables, for example, , fix and that we are interested in obtaining characteristics of the -marginal, namely such as the related mean and/or variance. In those cases when the joint density is not given, or the above integral turns out to be difficult to treat, for example, when an explicit solution does not exist, but we know the complete set of conditional densities, denoted by , , with , then the Gibbs sampling method allows us to generate a sample from the joint density without requiring that we know either the joint density or the marginal densities. With the following procedure we recall the basic ideas on which the Gibbs sampling approach is based, given an arbitrary starting set of values .

Step 1. Draw from .

Step 2. Draw from .

Step 3. Draw from .

Step k. Finally draw from to complete the first iteration.

The steps from 1 through can be iterated times to get , .

In [17] S. Geman and D. Geman showed that both the joint and marginal distributions of generated , converge, at an exponential rate, to the joint and marginal distributions of , as . Thus the joint and marginal distributions of can be approximated by the empirical distributions of simulated values , , where is large enough to assure the convergence of the Gibbs sampler. Moreover can be chosen to reach the required precision with respect to the empirical distribution of interest.
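To make the scheme concrete, here is a small Python sketch of a Gibbs sampler for a toy target that is not taken from the paper: a bivariate standard normal vector with correlation `rho`, whose two full conditionals are the univariate normals N(rho * x2, 1 - rho^2) and N(rho * x1, 1 - rho^2). The burn-in length, the seed, and the function name are illustrative assumptions.

```python
import math
import random


def gibbs_bivariate_normal(rho, n_keep, burn_in=500, seed=0):
    """Draw n_keep pairs from a standard bivariate normal with correlation rho."""
    rng = random.Random(seed)
    cond_sd = math.sqrt(1.0 - rho * rho)
    x1, x2 = 0.0, 0.0  # arbitrary starting values
    draws = []
    for it in range(burn_in + n_keep):
        x1 = rng.gauss(rho * x2, cond_sd)  # draw x1 from its conditional given x2
        x2 = rng.gauss(rho * x1, cond_sd)  # draw x2 from its conditional given x1
        if it >= burn_in:  # discard the first iterations
            draws.append((x1, x2))
    return draws
```

The empirical means and the empirical correlation of the retained draws can then be compared with the known values 0 and `rho`, which is the kind of approximation of joint and marginal characteristics described above.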

In the MSM framework we do not have conditional distributions , , and we are left with the problem of estimating the parameters , . The latter problem can be solved by exploiting Bayesian inference results, as we shall state in the next section.

3.3. Gibbs Sampling for Markov Switching Models

A major problem when dealing with inference for Markov switching models lies in the fact that some parameters of the model depend on an unobserved variable, let us say . We saw that in the classical framework, inference on Markov switching models consists first in estimating the model’s unknown parameters via maximum likelihood; then inference on the unobserved Markov switching variable , conditional on the parameter estimates, has to be performed.

In the Bayesian analysis, both the parameters of the model and the switching variables , are treated as random variables. Thus, inference on is based on a joint distribution, no longer on a conditional one. By employing Gibbs sampling techniques, Albert and Chib (see [14]) provided an easy-to-implement algorithm for the Bayesian analysis of Markov switching models. In particular in their work the parameters of the model and , are treated as missing data, and they are generated from appropriate conditional distributions using the Gibbs sampling method. As an example, let us consider the following simple model with two-state Markov switching mean and variance: where with transition probabilities , . The Bayesian method considers both , and the model’s unknown parameters , , , , and , as random variables. In order to make inference about these variables, we need to derive the joint posterior density , where and . Namely, the realization of the Gibbs sampling relies on the derivation of the distribution of each of the above variables, conditional on all the other variables. Therefore we can approximate the joint posterior density written above by running the following procedure times, where is an integer large enough to guarantee the desired convergence. Hence we have the following scheme.

Step 1. We can derive the distribution of , conditional on the other parameters, in two different ways. (1) Single-move Gibbs sampling: generate each from , , where . (2) Multi-move Gibbs sampling: generate the whole block from .

Step 2. Generate the transition probabilities and from . Note that this distribution is conditioned only on because we assume that and are independent of both the other parameters of the model and the data, .
If we choose the Beta distribution as prior distribution for both and , we have that posterior distribution is again a Beta distribution. So, Beta distribution is a conjugate prior for the likelihood of transition probabilities.

Step 3. Generate and from . In this case the conjugate prior is the Normal distribution.

Step 4. Generate and from . From definition of the model we have that : we can first generate conditional on , and then generate conditional on . We use in both cases the Inverted Gamma distribution as conjugate prior for the parameters.

For a more detailed description of these steps, see [6, pp. 211–218]. Here we examine only the so-called Multi-move Gibbs sampling, originally motivated by Carter and Kohn (see [15]) in the context of state space models and then implemented in [6] for an MSM. For the sake of simplicity, let us suppress the conditioning on the model’s parameters and denote

Using the Markov property of it can be seen that where is provided by the last iteration of filtering algorithm (see Sections 2.1 and 2.2). Note that (39) suggests that we can first generate conditional on and then, for , we can generate conditional on and , namely we can run the following steps.

Step 1. Run the basic filter procedure to get , and save them; the last iteration of the filter gives us the probability distribution , from which is generated.

Step 2. Note that where is the transition probability and has been saved from Step 1. So we can generate in the following way, first calculate and then generate using a uniform distribution. For example, we generate a random number from a uniform distribution between 0 and 1; if this number is less than or equal to the calculated value of , we set , otherwise, is set equal to 0.
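The two steps above amount to a backward sampling pass over the saved filtered probabilities. The following Python sketch illustrates the idea for a general number of states instead of the two-state uniform-draw rule described in the text; it assumes states labelled from 0, a matrix `P` with `P[i][j]` the probability of moving from state `i` to state `j`, and filtered probabilities supplied as a list of rows. The function name is hypothetical.

```python
import random


def sample_states_backward(filtered, P, rng):
    """Draw a whole state path given filtered probabilities (multi-move step)."""
    n = len(P)
    T = len(filtered)
    states = [0] * T
    # Last state: drawn directly from the final filtered distribution.
    states[-1] = rng.choices(range(n), weights=filtered[-1])[0]
    for t in range(T - 2, -1, -1):
        nxt = states[t + 1]
        # Pr(S_t = j | S_{t+1}, data up to t) is proportional to
        # p(j -> S_{t+1}) times the filtered probability of S_t = j.
        weights = [P[j][nxt] * filtered[t][j] for j in range(n)]
        states[t] = rng.choices(range(n), weights=weights)[0]
    return states
```

`random.Random.choices` normalizes the weights internally, so the explicit division by the normalizing constant can be skipped in this sketch.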

In view of applications, let us now consider the following four-state MSM: where if , otherwise . Note that this is a particular case of the model analysed in Section 2.1, where , hence we can perform the procedure for serially uncorrelated data taking to start the Gibbs sampling algorithm; therefore we have

Step 1. Generate conditional on ,
For this purpose, we employ the Multi-move Gibbs sampling algorithm: (1) run the procedure in Section 2.1 with in order to get, from the last iteration, , (2) recalling that for , we can generate from the vector of probabilities where, for ,

Step 2. Generate , conditional on and the data .
We want to impose the constraint , so we redefine in this way: where for , so that , and . With this specification, we first generate , then generate , and to obtain , and indirectly.

Generating  , Conditional on  ,    and  . Define for

and take , in (42). By choosing an inverted Gamma prior distribution, that is, where are the known prior hyperparameters, it can be shown that the conditional posterior distribution from which we generate is given by:

Generating    Conditional on  ,    and  . Note that the likelihood function of depends only on the values of for which . Therefore, take and denote with the size of this sample. Then define hence, for the observation in which , 3 or 4, we have . If we choose an inverted Gamma distribution with parameters for the prior, we obtain from the following posterior distribution: In case put and . Otherwise reiterate this step.

Generating    Conditional on  ,    and  . Operate in a similar way as above. In particular if we define we will obtain

Generating    Conditional on  ,    and  . Operate in a similar way as above. In particular if we define we will have

Step 3. Generate conditional on . In order to generate the transition probabilities we exploit the properties of the prior Beta distribution. Let us first define Hence we have that Given , let , be the total number of transitions from state to ,   and the number of transitions from state to .
Begin with the generation of the probabilities , by taking the Beta distribution as conjugate prior: if we take , where and are the known hyperparameters of the priors, the posterior distribution of given still belongs to the Beta family of distributions, that is, The other parameters, that is, for and , can be computed from the above equation , where are generated from the following posterior Beta distribution: For example, given that is generated, we can obtain and by considering where and . Finally, given , and generated in this way, we have .
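As a partial illustration of Step 3, the sketch below draws only the probabilities of remaining in each state from their Beta posteriors, using the usual conjugate update in which the prior hyperparameters are incremented by the observed numbers of "stay" and "leave" transitions. The off-diagonal probabilities are not treated. The hyperparameter names `u_stay` and `u_leave`, the use of a common prior for all states, and the function name are assumptions made for the example.

```python
import random


def draw_stay_probabilities(states, n_states, u_stay, u_leave, rng):
    """Draw p_ii for each state i from Beta(u_stay + n_ii, u_leave + n_i,other)."""
    stay = [0] * n_states
    leave = [0] * n_states
    for prev, cur in zip(states, states[1:]):
        if prev == cur:
            stay[prev] += 1   # transition i -> i
        else:
            leave[prev] += 1  # transition i -> any other state
    return [rng.betavariate(u_stay + stay[i], u_leave + leave[i])
            for i in range(n_states)]
```

Choosing `u_stay` larger than `u_leave` encodes the prior belief that regime switches are rare.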

Remark 2. When we do not have any information about the prior distributions we employ hyperparameters , . Usually we know that the diagonal elements of the transition matrix are bigger than the off-diagonal elements, because in a financial framework regime switching happens only occasionally: in this case, since we want close to 1 and , , close to 0, we will choose bigger than .

4. Goodness of Fit

Since financial time series are characterized by complex and rather unpredictable behavior, it is difficult to find a possible pattern, if there is any. A typical set of techniques which allows one to measure the goodness of the forecasts obtained by using a certain model is given by residual analysis. Let us suppose that we are provided with a time series of return observations , , for which we choose, for example, the model described in (4) with . By running the procedure of Section 2.1 we obtain the filtered probabilities and, by maximization of the log-likelihood function, we compute the parameters ; therefore we can estimate both the mean and the variance of the process at time , for any , given the information set as weighted averages of four values: If the chosen model fits the data well, then the standardized residuals will have the following form: therefore it is natural to apply a normality test as, for example, the Jarque-Bera test (see [18] for details). We recall briefly that the Jarque-Bera statistic is defined as where the parameters and indicate the skewness and the kurtosis of , respectively. If come from a Normal distribution, the Jarque-Bera statistic converges asymptotically to a chi-squared distribution with two degrees of freedom, and can be used to test the null hypothesis of normality: this is a joint hypothesis of the skewness being zero and the excess kurtosis also being zero.

Remark 3. Note that the Jarque-Bera test is very sensitive and often rejects the null hypothesis only because of a few abnormal observations; this is the reason why one has to point out these outliers, which have to be removed before applying the test to the resulting smoothed data.
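For reference, the Jarque-Bera statistic in its standard textbook form is JB = (n / 6) * (S^2 + (K - 3)^2 / 4), with n the sample size, S the sample skewness, and K the sample kurtosis. The Python sketch below computes it from the central sample moments; the symbols n, S, K and the function name are the usual conventions rather than notation recovered from the text above.

```python
def jarque_bera(xs):
    """Jarque-Bera statistic n/6 * (S^2 + (K - 3)^2 / 4) from sample moments."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in xs) / n  # third central moment
    m4 = sum((x - mean) ** 4 for x in xs) / n  # fourth central moment
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2
    return n / 6.0 * (skew ** 2 + (kurt - 3.0) ** 2 / 4.0)
```

Under the null hypothesis of normality the statistic is compared with a chi-squared distribution with two degrees of freedom, whose 5% critical value is approximately 5.99.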

5. Prediction

The forecasting task is the most difficult step in the whole MSM approach. Let us suppose that our time series ends at time , without further observations; then we have to start the prediction with the following quantities: (i) the transition probability matrix ; (ii) the vector obtained from the last iteration of the filter algorithm, for example, the procedure in Section 2.1. It follows that we have to proceed with the first step of the filter procedure, obtaining the one-step-ahead probability of the state given the sample of observations , that is, Equation (62) can be seen as a prediction for the regime at time , knowing the observations up to time . At this point, the best way to make predictions about the unobserved variable is the simulation of further observations. Indeed, with the new probability and the vector of parameter estimates we can estimate the one-step-ahead mean and variance as follows: Then we simulate by the Gaussian distribution and, once has been simulated, we define . Then we first apply again the filter procedure of Section 2.1 for in order to obtain , then we compute , and , and we simulate by the Gaussian distribution . The same procedure is repeated for all the other time steps , where is the time horizon of our forecast.

Remark 4. We would like to underline that the method just described is not reliable with few simulations, since each , for , may assume a wide range of values and a single draw describes only one of the many possible paths. So we can think of reiterating the previous strategy many times in order to compute the mean behavior of , and . After having obtained a satisfactory number of data, we can construct a confidence interval within which the state probability will most likely take its value. Obviously a high number of iterations of the latter procedure rapidly increases the computational complexity of the whole algorithm because of the computational complexity of the MLE; therefore we will adopt a rather different strategy, which consists in simulating times at each step (e.g., ) and then taking the mean over those values. However, we must pay attention because the mean calculation could cancel the possible regime switching: for example, if we draw many times from and we take the mean, by the law of large numbers we will have zero at any time. To overcome this problem we can take the mean of the absolute values and then multiply this mean by a number , which is a random variable that takes values 1 or −1 with equal probability, hence deciding the sign of at every simulation step.
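A single forecasting step, including the sign trick of Remark 4, can be sketched in Python as follows. The sketch assumes that the one-step-ahead mean and variance are the probability-weighted averages of the state means and of the state variances, that `P[i][j]` is the probability of moving from state `i` to state `j`, and that the parameters are fixed at their estimates; the function names are hypothetical.

```python
import random


def one_step_forecast(filt_last, P, mus, sigmas):
    """One-step-ahead state probabilities, mean and variance."""
    n = len(P)
    pred = [sum(filt_last[i] * P[i][j] for i in range(n)) for j in range(n)]
    mean = sum(p * m for p, m in zip(pred, mus))
    var = sum(p * s * s for p, s in zip(pred, sigmas))
    return pred, mean, var


def simulate_next(mean, var, n_draws, rng):
    """Average of absolute Gaussian draws, multiplied by a random sign."""
    sd = var ** 0.5
    avg_abs = sum(abs(rng.gauss(mean, sd)) for _ in range(n_draws)) / n_draws
    sign = rng.choice((1, -1))  # +1 or -1 with equal probability
    return sign * avg_abs
```

Repeating `one_step_forecast`, `simulate_next`, and one further filter step on the simulated value would extend the forecast over the desired horizon.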

6. Applications

In this section we are going to apply the classical inference approach for an MSM to analyse real financial time series. In particular we will first examine data coming from the Standard & Poor’s 500 (S&P 500) equity index which, being based on the 500 most important companies in the United States, is considered one of the best representations of the U.S. stock market. Secondly, we shall consider the DAX (Deutsche Aktien Index) index, which follows the quotations of the 30 major companies in Germany. Our choice is motivated by a twofold goal: first, we want to test the proposed 4-state MSM on two particularly significant indexes which have been shown to incorporate abrupt changes and oscillations; secondly, we aim at comparing the behaviour of the two indexes with each other.

Computations have been performed following the MSM approach described in the previous sections, namely exploiting the procedures illustrated in Section 2. Let us underline that, instead of a standard 3-state MSM, we shall use a 4-state MSM approach both for the S&P 500 and the DAX returns. Moreover the analysis has been carried out for different intervals of time, focusing mainly on the period of the Global Financial Crisis.

6.1. The S&P 500 Case

Figure 1 reports the graph of the Standard & Poor’s 500 from 1st June 1994 to 27th May 2014, and it points out the dramatic collapse of index prices in the years 2008-2009, when the crisis blew up and the index reached, on the 6th of March 2009 with 683.38 points, its lowest value since September 1996.

Because of the latter fact we decided to focus our analysis on recent years. In particular we take into account data starting from the 1st of June 2007 and until 27 May 2014; therefore, denoting with the set of observations and with , , the price data of the S&P 500, returns are calculated as usual by , , where are the values for which we want to choose the best MSM. Note that in our implementation we grouped the daily data into weekly parcels in order to make the filter procedures less time-consuming and to have a clearer output; therefore we obtain a vector of 365 values, still denoted by , as shown in Figure 2.

The next step consists in understanding whether the returns are serially correlated or serially uncorrelated, a task which can be accomplished by running some suitable test, for example, the Durbin-Watson test (see, for example, [19, 20] or [7]), or by computing directly the value of the autoregressive parameter by the least squares method, namely , which gives us a rather low value, that is, , so that we can neglect the autoregressive pattern and start the analysis by considering S&P 500 returns to be generated by a Gaussian distribution with switching mean and variance, that is, where for we have if , otherwise . Therefore, we suppose that the state variable , , takes its values in the set , and we expect that the probabilities of being in the third and fourth state increase as a financial crisis occurs. Exploiting the procedure provided in Section 2.1, with respect to the returns , , we get the results shown in Figures 3 and 4.

Let us now consider the estimated standard deviation, which we want to compare with the VIX index, also known as the Chicago Board Options Exchange (CBOE) market volatility index, namely one of the most relevant measures of the implied volatility of the S&P 500 index, whose values used in our analysis are reported in Figure 5.
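The least squares estimate of a first-order autoregressive coefficient can be computed in a few lines. The sketch below assumes a regression of each observation on its predecessor without an intercept, which gives the ratio of the sum of lagged cross products to the sum of squared lagged values; whether the original computation included an intercept is not recoverable from the text, and the function name is hypothetical.

```python
def ar1_least_squares(ys):
    """Least squares slope of y_t on y_{t-1}, no intercept (assumption)."""
    num = sum(cur * prev for prev, cur in zip(ys, ys[1:]))
    den = sum(prev * prev for prev in ys[:-1])
    return num / den
```

A value close to zero, as reported for the weekly S&P 500 returns, suggests that the autoregressive term can be neglected.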