Special Issue: Recent Developments on the Stability and Control of Stochastic Systems
Microstructure Models with Short-Term Inertia and Stochastic Volatility
Partially observed microstructure models, containing stochastic volatility, dynamic trading noise, and short-term inertia, are introduced to address the following questions: (1) Do the observed prices exhibit statistically significant inertia? (2) Is stochastic volatility (SV) still evident in the presence of dynamical trading noise? (3) If stochastic volatility and trading noise are present, which SV model matches the observed price data best? Bayes factor methods are used to answer these questions with real data and this allows us to consider volatility models with very different structures. Nonlinear filtering techniques are utilized to compute the Bayes factor on tick-by-tick data and to estimate the unknown parameters. It is shown that our price data sets all exhibit strong evidence of both inertia and Heston-type stochastic volatility.
Financial analysts list speculation, finiteness of assets, interest rates, tick size, price inertia, price clustering, belief heterogeneity, asymmetric information, greed and fear, and so forth as causes for price fluctuations over time. Yet, popular models like geometric Brownian motion (GBM) (e.g., Black and Scholes, Merton) or the Cox-Ross-Rubinstein model try to handle all these factors in an overly simple framework, resulting in unnatural phenomena like the volatility smile. Consequently, stochastic volatility, which has been observed in real prices, is often added to the price value evolution (e.g., Heston, Jackwerth and Rubinstein, Hull and White, and Nelson) to avoid the volatility smile. However, which stochastic volatility model fits the market data best?
Many authors now discuss the misspecification of stochastic price-volatility models (including the Heston model, which we show favorably herein). This leads us to wonder whether ingredients are missing from these very simple models. Even combined stochastic value-volatility models do not address tick size, price inertia, price clustering, hidden liquidity, and fear-greed cycles that traders, especially high frequency traders, must deal with. To handle these issues, one is drawn to tick-by-tick microstructure models and left with the perplexing question: How should one model price inertia in continuous time? We use the term price inertia instead of the related term price momentum because we are not weighting transaction prices by volume. Fractional Brownian motion (FBM), best known for its long memory properties, exhibits inertia and has been used to model markets (Mandelbrot, Shiryaev) even though these models allow arbitrage strategies. We speculate that FBM’s success in modeling observed data is more attributable to inertia than to long memory. However, we introduce an alternative inertia process and show that this new process better satisfies the desired properties of inertia than FBM. We then show strong statistical evidence of price inertia that lasts for hours or days using Bayes estimates and Bayes factor on real price data. We do not consider the possibility of arbitrage nor determine derivative prices for our models but rather leave these interesting mathematical finance questions to the experts. (See Capinski and Zastawniak for an excellent introduction to these types of questions and to mathematical finance in general.) Also, we leave the difficult task of obtaining theoretical error bounds for our particle filter methods to other works. (See, e.g., Kouritzin and Zeng and Del Moral et al. for related work on approximate filters.)
Our focus is solely on modeling observed stock price data and the methodology of determining which of a class of models best fits the observed data.
High frequency data contains complete market-participant trading activities (Engle) and is modeled using microstructure (Black, Chan and Lakonishok, Hasbrouck [16, 17], Engle and Russell, Engle, and Bandi and Russell). Unlike the macrostructure market, the trading noise in the microstructure market is not negligible; thus, the intrinsic asset value is not readily discernible. In this paper, we introduce a class of dynamic microstructure models, where the transaction price is formulated as a distorted and color-noise corrupted variant of the intrinsic asset value, with the intrinsic asset value being a traditional stochastic value-volatility process. Indeed, we view the transaction price data as random counting-measure observations of intrinsic value corrupted by microstructure trading noise with such things as inertia and fear-greed cycles built in. However, trading noise sources themselves introduce volatility to transaction prices. This raises the question, “Do we need to model stochastic volatility explicitly in the presence of dynamic microstructure trading noise?” We will give strong evidence of the presence of stochastic volatility through Bayes factor methods and stochastic filtering theory. Moreover, we also utilize model selection to provide strong evidence of Heston-type volatility over competing stochastic volatility models based on the observed transaction data in a microstructure market. This suggests that the common viewpoint that the Heston model is highly misspecified might be better stated as follows: overly simplistic macrostructure-only models are underspecified. The Bayes factor (see, e.g., Kass and Raftery) is our preferred model selection method since it provides statistical comparisons in real time as to which model best fits the market data while allowing the stochastic value-volatility (signal) models to be singular to one another.
Indeed, to use the Bayes factor method, we need only to be able to transform all microstructure asset-price observation models of interest into the same canonical process via Girsanov-type measure change.
Previously, Zeng  studied a filtering equation for inferring the intrinsic value process in a microstructure model while Xiong and Zeng  proposed a branching particle approximation to this equation. Kouritzin and Zeng  derived a Bayes factor equation and discussed the Bayesian model selection problem to determine whether financial data, such as stock prices, display jump-type stochastic volatility. However, all these works are based on a restricted microstructure model and thus cannot be applied to our general setting. Moreover, our problems of showing statistical evidence of inertia and determining which of the classical stochastic volatility models best represents real data in the presence of microstructure noise were not considered. We also propose a new inertia process, explain its role in modeling prices, and show its statistical significance with real tick-by-tick data.
Section 2 is devoted to explaining our model. First, our five standard value-volatility models (GBM, Hull-White, Log Ornstein-Uhlenbeck, continuous GARCH, and Simplified Heston) are given followed by our microstructure inertia process and its properties and then the other components of our dynamic microstructure model. Together the value-volatility and microstructure components form our price evolution model, which, at the end of Section 2, is interpreted as a filtering model. In Section 3, we discuss model calibration and fair price/value estimation through Bayesian filter estimation. A filtering equation and a branching particle filter approximation algorithm are first given and explained. Then, their use to identify parameters and come up with initial state estimates is discussed. Finally, numeric parameter and initial state estimates for each model are given. As a byproduct, it is demonstrated that proper modeling and estimation of fair price (as is done herein) can provide information about overbought conditions and help avoid financial loss (see Figure 4). Section 4 is dedicated to Bayesian model selection. We first motivate the use of Bayes factor for model selection and explain how to estimate Bayes factor from unnormalized particle filters. Then, we establish strong statistical evidence of inertia and Heston-type volatility in all our price data through model selection using the Bayes factor method to test which fair price-volatility model and what amount of inertia best fit the observed price data.
2. The Partially Observed Market Model
In this section, we build our stochastic model that has macrostructure and microstructure components and interpret this model in terms of a signal that needs to be estimated in real time and observations which are used to form the signal estimates. The macrostructure model consists of fair price, volatility, and related parameters and will be denoted by in the sequel, with being price and volatility and being the parameters for this model. Unlike macrostructure models, we do not assume access to , but rather we take it to be part of the signal to be estimated. Indeed, a model would be judged to be better if the macrostructure price (which represents a “fair” price) is quite different than the observed price and we can use filtering to determine overbought and oversold situations.
The microstructure price construction converts the macrostructure model into the observed price. Such things as inertia (or momentum), fear-greed cycles, and whole-price clustering (or rounding), which are not part of the fair price, are incorporated into the microstructure model. A distinguishing feature in our microstructure is dynamic state: To allow the microstructure to influence price over a period of time so that the observed microstructure price can differ from fair price significantly, one needs to add and then estimate microstructure state . In particular, the inertia process, characterized by a parameter , is introduced to capture price inertia that might be caused by hidden liquidity; various reaction and access times to information as well as momentum traders themselves. This inertia process is not Markov, so we will have to consider the historical version of this state. Further, is also unobservable and hence must be added to the signal along with microstructure parameters and all must be estimated as nuisance parameters.
The nondynamic part of the microstructure noise consists of rounding and clustering noise. It is widely observed in markets that more trades occur at more even prices like whole nickel or whole dollar levels. Therefore, to match observed prices well, we should have a mechanism to convert evenly distributed raw prices into whole-price-biased observed prices. This is done by binning raw prices into sets , , , , and depending on how even they are and then randomly moving raw prices in the less even bins to close prices in the more even bins in order to match the observed prices.
The observations then become the marked counting process of the number of trades that occur at the various prices. We will later use these observations to select and calibrate models and to estimate the augmented signal. The whole point of the microstructure is to allow the macrostructure price to distinguish itself from the observations and instead to represent fair value. We then use filtering on asset prices to estimate implied value (hereafter called fair price) and thereby judge whether an asset is overbought or oversold.
2.1. General Notation
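Let $T > 0$ be a fixed time period and let $(\Omega, \mathcal{F}, \{\mathcal{F}_t\}_{t \in [0,T]}, P)$ be a complete filtered probability space. For any stochastic process $X$, its natural filtration, defined as $\mathcal{F}^X_t = \sigma\{X_s : s \le t\}$, represents the information in $X$ up to time $t$. $\mathbb{N}_0$ denotes the set of nonnegative integers and, for any Polish space $E$, $B(E)$ is the set of all bounded measurable $\mathbb{R}$-valued functions on $E$.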
2.2. Common Macrostructure State Models
We use a macrostructure model for the unobservable fair price together with its volatility and parameters. Here, is the macrostructure financial state (fair price plus volatility) with macrostructure parameter for some . We let be a probability distribution on , take to be a generator with domain , and assume satisfies the martingale problem.
Definition 1. is the unique solution of the -valued martingale problem for with initial distribution . That is,is -martingale for each Moreover, if also satisfies (i) and (ii), then and have the same finite dimensional distributions.
Remark 2. While does not vary in time, we include it in our macrostructure model to be estimated because it is still unknown. Nevertheless, the operator does not act on the variable since for our fixed parameters.
The martingale problem formulation (2) (see Stroock and Varadhan , Ethier and Kurtz  for more details) is general enough to cover most interesting financial models. In this paper, the macrostructure state consists of two components: the fair price and the stochastic volatility (if any). The most common example of in finance is the “geometric Brownian motion” (GBM) utilized in the classical Black-Scholes option pricing formula. Throughout this section, and are two independent standard Brownian motions and .
In the GBM model, the volatility is a constant. To account for the “volatility smile” commonly observed in market option prices (see Jackwerth and Rubinstein for a detailed survey), the GBM model is generalized to stochastic volatility (SV) models, where the constant volatility is replaced by a stochastic process. Some of the popular SV models include the following.
Example 4 (Hull-White model; see Hull and White ). Considerwith parameters and generator
Example 5 (Logarithmic Ornstein-Uhlenbeck model; see Scott ). We have thatwith parameters and generator
Example 6 (continuous GARCH model; see Nelson ). We have thatwith parameters and generator
Example 7 (simplified Heston model; see Heston ). We have thatwith parameters and generator
We label this example as simplified because we do not allow the two driving Brownian motions to be correlated as Heston did. There is no mathematical difficulty in including this correlation, but it would add a parameter to the model, which increases computation time. The Heston model already performed the best without this parameter. GBM (with microstructure) plays a special role in our study as it is our model without stochastic volatility. We will compare our other models against it on real data to determine whether stochastic volatility is present. In summary, refer to Table 1.
Remark 8. The continuous GARCH model is the continuous-time limit of many classical GARCH-type discrete-time processes (Nelson , Drost and Werker ). We did not consider jumping stochastic volatility models (e.g., Elliott et al. , Kouritzin and Zeng , Duffie et al. , Eraker et al. , and Eraker ) or models where , are correlated, due to our need to dedicate our limited computer resources to handling our complicated (non-Markov) microstructure with inertia. Still, we want to emphasize that the computational complexity we experienced is fundamental to the fact that we are using non-Markov (inertia) models and has little to do with our particular methods. Indeed, our Bayes factor filtering methods are what makes the computations possible on an inexpensive contemporary desktop computer.
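As a rough illustration of the value-volatility models in this subsection, the following Python sketch simulates GBM and an uncorrelated (simplified) Heston model with an Euler scheme. The textbook parametrizations and the names `mu`, `sigma`, `kappa`, `theta_v`, and `xi` are assumptions made for illustration; they need not coincide with the parametrization used in our examples or in Table 1.

```python
import numpy as np

def simulate_gbm(s0, mu, sigma, dt, n_steps, rng):
    """Euler path of dS = mu*S dt + sigma*S dB (constant volatility)."""
    s = np.empty(n_steps + 1)
    s[0] = s0
    for k in range(n_steps):
        db = rng.normal(0.0, np.sqrt(dt))
        s[k + 1] = s[k] * (1.0 + mu * dt + sigma * db)
    return s

def simulate_simplified_heston(s0, v0, mu, kappa, theta_v, xi, dt, n_steps, rng):
    """Euler path of the assumed textbook form
         dS = mu*S dt + sqrt(V)*S dB,
         dV = kappa*(theta_v - V) dt + xi*sqrt(V) dW,
       with B and W independent (the 'simplified' case)."""
    s = np.empty(n_steps + 1)
    v = np.empty(n_steps + 1)
    s[0], v[0] = s0, v0
    for k in range(n_steps):
        db, dw = rng.normal(0.0, np.sqrt(dt), size=2)
        v_pos = max(v[k], 0.0)            # truncate so the square root is real
        vol = np.sqrt(v_pos)
        s[k + 1] = s[k] * (1.0 + mu * dt + vol * db)
        v[k + 1] = v[k] + kappa * (theta_v - v_pos) * dt + xi * vol * dw
    return s, v
```

Truncating the variance at zero inside the step is one common way to keep the Euler scheme well defined; other discretizations would serve equally well for this purpose.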
2.3. Construction of Microstructure Price
The fair price-volatility models account for the random variances of the intrinsic asset value; thus, the selection of proper SV model is crucial for investing, derivative pricing, and hedging. On the other hand, microstructure noise (see Black , Hansen and Lunde , Duan and Fulop , etc.) causes random perturbations of transaction price from its intrinsic value and the disregard of such trading noise introduces severe bias into stochastic volatility estimation (see Duan and Fulop ). We incorporate microstructure trading noise into traditional fair price-volatility models and use statistical filtering to reveal such things as short-term inertia in the trading noise and stochastic volatility in the intrinsic value.
In microstructure markets, the price changes occur only at irregularly spaced transaction times with total trading intensity (see Engle ). Here, we assume is just a time-varying measurable function as the empirical analysis illustrates that there is no need to consider more general structures. At each transaction time , the transaction price is formulated aswhere is some nonlinear random field modeling the trading noise. Formulation (13) is similar to that of Hasbrouck , where is the intrinsic and permanent component while introduces the transitory component.
The empirical evidence reported by Hansen and Lunde  suggests strongly that the trading noise is serially correlated. Similar results can be found in Aït-Sahalia et al. . Indeed, there exist situations in which the trading noise variance estimate is zero if the trading noise is simply assumed to be independent (see Duan and Fulop ). This does not mean there is no trading noise but rather that the trading noise is autocorrelated. To characterize this correlation, Hansen and Lunde  assume the trading noise to be some Gaussian random sequence with stationary covariance and finite dependence. However, this model is most suitable for the low-frequency data and ignores many crucial microstructure effects. We build correlation into our microstructure information noise through inertia and mean-reversion while utilizing microstructure rounding and clustering noise to explain the discreteness and whole-price biasing.
The idea of momentum or inertia has been used in many studies (see Jegadeesh and Titman , Moskowitz and Grinblatt , Grundy and Martin , Grundy et al. , etc.). Basically, there is the tendency for a stock to continue to move in one direction. To illustrate our approach, we introduce the following definition.
Definition 9. A process is said to have stochastic inertia at time if is called the inertia function.
The idea behind our definition is that for inertia we should expect and to have the same sign for , but close to and small. We strengthen this condition toMany processes have inertia. However, to model the stock price effect of the information reaching all market participants, we want the following five properties: (1) is Gaussian and driftless and is proportional to so resembles Brownian motion; (2) is finite, not infinite, indicating that the influence of past values on immediate future is not too strong; (3) makes sense from informational and hidden liquidity points of view; more precisely, it can explain well the price effects due to the reactions of all market participants to information and rumor being diffused and simulated over a period of time as well as due to the purchases or sales of an agent spreading out a large change in his/her position over time; (4) is easy to simulate using, for example, the Gaussian property; (5) is easy to analyze.
Neither a Brownian motion nor more generally a square integrable martingale has inertia. Brownian motion with drift has inertia but we do not want drift. For fractional Brownian motion (FBM) ,where is the Hurst parameter. Therefore,Thus, the inertia function of is infinity for all if (and is if ). Neither case satisfies our five properties. Still, standard representations of FBM motivate the creation of driftless inertia by convolving a Brownian motion with the desired impulse response for information dissemination. With this in mind, we consider the following inertia process.
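The contrast between Brownian motion and a convolved Brownian motion can be checked numerically. The Python sketch below compares the correlation of adjacent increments for Brownian paths and for paths obtained by convolving the same noise with an exponential impulse response. The exponential kernel and the names `lam`, `simulate_paths`, and `adjacent_increment_corr` are hypothetical choices for illustration only and are not the inertia process defined next; the FBM helper uses the standard covariance of fractional Brownian motion.

```python
import numpy as np

def fbm_adjacent_increment_corr(hurst):
    """Correlation of two adjacent equal-length FBM increments: 2^(2H-1) - 1."""
    return 2.0 ** (2.0 * hurst - 1.0) - 1.0

def simulate_paths(lam, dt, n_steps, n_paths, rng):
    """Return Brownian paths W and smoothed paths Y driven by the same noise.

    Y_t = int_0^t (1 - exp(-(t - s)/lam)) dW_s, generated through a velocity v
    with dv = (dW - v dt)/lam and dY = v dt (hypothetical kernel).
    """
    w = np.zeros((n_paths, n_steps + 1))
    y = np.zeros((n_paths, n_steps + 1))
    v = np.zeros(n_paths)
    for k in range(n_steps):
        dw = rng.normal(0.0, np.sqrt(dt), size=n_paths)
        w[:, k + 1] = w[:, k] + dw
        y[:, k + 1] = y[:, k] + v * dt
        v = v + (dw - v * dt) / lam
    return w, y

def adjacent_increment_corr(paths, i0, h):
    """Sample correlation across paths of X[i0+h]-X[i0] and X[i0+2h]-X[i0+h]."""
    first = paths[:, i0 + h] - paths[:, i0]
    second = paths[:, i0 + 2 * h] - paths[:, i0 + h]
    return np.corrcoef(first, second)[0, 1]

rng = np.random.default_rng(1)
w, y = simulate_paths(lam=0.5, dt=0.01, n_steps=400, n_paths=4000, rng=rng)
corr_w = adjacent_increment_corr(w, i0=200, h=50)   # close to zero
corr_y = adjacent_increment_corr(y, i0=200, h=50)   # clearly positive
```

In this sketch the smoothed process is driftless and Gaussian, yet successive increments tend to share a sign, which is the qualitative behavior we ask of an inertia process.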
Definition 10. Our stochastic inertia process iswhere is a -dimensional standard Brownian motion, , and
Remark 11. is a weighted average of the historical information (the first term) and fundamental information (the second term). In fact, can be viewed as the impulse response on price created by market participants receiving and simulating the “information” and determines the diffusion speed in the market. This formulation captures the idea that news or rumor and its ramifications require time to be fully disseminated and understood. When , it represents the case of only historical information resulting in the strongest inertia in prices. Alternatively, we can use inertia to explain “hidden liquidity.” If everybody knew that an agent was going to make a big change in a position, then the price would immediately jump. However, if the agent breaks up the desired change into small transactions, then it takes time for this extra buying or selling pressure to be recognized in the market. In this case, represents the case, where all changes in position are done over a period of time and represents the time to effect 58% of the positional change.
Note that is a centered Gaussian process such that the autocovarianceis positive for any . In particular,Thus, converges to as with speed determined by . (Hence, informational noise increases at the same asymptotic rate as Brownian motion.) Moreover,and, using standard antiderivatives,Thus, the inertia function of our inertia process is , the steady-state inertia isand this happens quickly for small . We can thus verify that , defined in (18), satisfies our five desired properties. One can also look upon as the time for new information to be disseminated to fifty-eight percent of the market. Below, we consider three different dissemination times: minutes, hours, and day on real stock data. Finally, the fact that is Gaussian eases its simulation greatly.
2.3.2. Information Noise and Augmented State
Hitherto, we have focused on constructing inertia processes. Now, we include all informational noise into asset prices. Information noise is introduced to represent trading noises due to things like inertia, fear-greed cycles, belief heterogeneity, and asymmetric information. For the th-transaction occurring at , the raw price is defined bywhere is the dynamical part of the microstructure through which inertia is introduced (with our inertia process ) and . The case is of particular importance in the sequel as it represents the nondynamical microstructure case and is used as a calibration model.
The information noise consists of two parts: is a sequence of independent standard Gaussian random variables, ; is Ornstein-Uhlenbeck- (O-U-) like inertia velocity process with mean-reverting parameter Here, , , and are independent and is a constant. provides an intuitive continuous-time model that accommodates the joint presence of the inertia and mean-reversion. Our information noise is more reasonable than that of Zeng  in that (1) we preclude the possibility of negative prices by using multiplicative noise; (2) the stochastic inertia process captures the empirical feature of the inertia observed in transaction prices (e.g., Jegadeesh and Titman ); (3) the mean-reverting structure of when combined with the inertia captures the cyclic property of prices (e.g., Black ). is not a Markov process, so we introduce its historical process aswhich is Markovian. Moreover, , the space of all continuous functions on , since the paths of are continuous. Consequently, we augment the state vector to bewhere is the microstructure noise parameter set. The advantage of this formulation is that we can estimate and thus jointly with other components using particle filtering methods. The generalized state incorporates fair price, volatility, parameters, and the historical trading noise while keeping the tractability of a Markovian framework.
Remark 12. We include neither nor into the model parameters but rather consider different models corresponding to different values of and as well as different SV models 1–5. Indeed, we will provide evidence of inertia in the sequel by using Bayesian methods to select a model with a large value of based upon tick-by-tick stock data.
2.3.3. Rounding and Clustering Noise
Our final modeling goal is to convert uniform raw price into observed whole-price-biased price. While raw price can take any value, the trading price is restricted to integer multiples of the tick size. The tick size on the New York Stock Exchange (NYSE) was switched from $1/8 to $1/16 on June 24, 1997, and then to $0.01 on January 29, 2001. Empirical studies suggest that the tick size plays an important role in microstructure market analysis (e.g., Huang and Stoll). Since we are concerned with price clustering for decimal pricing in stock markets, we take the tick to be $0.01.
It is well documented that there is price clustering to more whole prices. To quantify this price clustering, we examine the price behavior for three NYSE-listed stocks over April 2010 (Figure 1 and Table 2). (In a larger study, we considered eight NYSE stocks in different sectors. However, we only report on three here to conserve space. The results for the other five were similar in nature.)
The transaction data of these stocks shows there is modest clustering at multiples of cents as shown in Figure 1, plotted in terms of pennies. Supposing the raw price falls in the interval , then if there was no clustering noise, the trading price would just be . Thus, the probability of trading at with no clustering noise given , would beEquivalently, we can write in terms of the historical process aswhere is the projection onto time ; that is,Clearly, is a smooth function of for each fixed .
To build the observed whole-price bias into our model, we introduce the following sets:While the raw price will be uniformly distributed over (or rather the continuous interval ), the observed price model must bias over , over either or , and so forth. We distribute the observed price randomly over based upon the raw price in a biased manner favoring the more whole-price ticks in . In particular, if the fractional part of the raw price rounded to the nearest cent is in , then the observed value will stay at the same price with probability or move to the closest multiple of cents, that is, the closest tick level in with probability . Then, if the fractional part of the price is in , it will stay in the same level with probability or move to the closest tick level in with probability . Finally, if the fractional part of the price is in , then it will stay in the same level with probability or move to the closest tick level in with probability and the closest tick level in with probability In summary, the transition probability function is obtained iteratively by the following.
Case 1. If the fractional part of belongs to ,
Case 2. If the fractional part of belongs to ,where
Case 3. If the fractional part of belongs to ,where
Case 4. If the fractional part of belongs to ,
Case 5. If the fractional part of belongs to ,Moreover, we have to handle the case separately to avoid negative prices.
Case 6. For ,
Remark 13. Our clustering setup is designed to work well for intrinsic prices over $1. For real penny stocks, our setup would introduce positive bias and should be modified slightly.
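The stay-or-move mechanism of the cases above can be sketched in Python as follows. This is a deliberately reduced version with only two passes (penny to nickel, then nickel to whole dollar); the bins, the probabilities `p_to_nickel` and `p_to_dollar`, and the function names are hypothetical and do not reproduce the five sets or the fitted values of Table 3.

```python
import numpy as np

def nearest_multiple(cents, step):
    """Round integer cents to the nearest multiple of `step` (ties go up)."""
    return (cents + step // 2) // step * step

def cluster_prices(raw_prices, p_to_nickel, p_to_dollar, rng):
    """Bias prices toward 'more whole' ticks in two passes (hypothetical bins)."""
    cents = np.rint(np.asarray(raw_prices) * 100).astype(np.int64)  # penny tick

    # Pass 1: a price that is not on a nickel moves to the closest nickel
    # with probability p_to_nickel (never to zero), otherwise it stays put.
    target = nearest_multiple(cents, 5)
    off_nickel = cents % 5 != 0
    move = off_nickel & (target > 0) & (rng.random(cents.shape) < p_to_nickel)
    cents = np.where(move, target, cents)

    # Pass 2: a price on a nickel but not on a whole dollar moves to the
    # closest dollar with probability p_to_dollar, unless that would be $0.
    target = nearest_multiple(cents, 100)
    on_nickel = (cents % 5 == 0) & (cents % 100 != 0)
    move = on_nickel & (target > 0) & (rng.random(cents.shape) < p_to_dollar)
    cents = np.where(move, target, cents)
    return cents / 100.0
```

Applying the passes in sequence mirrors the iterative construction of the transition probabilities: uniformly distributed raw prices come out with extra mass on the more even ticks, and the guard against a zero target plays the role of the separate low-price case.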
Using relative frequency analysis on the aggregate of our three stocks, we found the values presented in Table 3.
The large degree of clustering exhibited, especially to the whole dollar, might be considered surprising. However, earlier studies of Huang and Stoll , Chung et al. , and Chung et al.  also showed significant clustering. Moreover, the degree of price clustering in NYSE is weaker than that of NASDAQ. For example, Barclay  examined 472 stocks from NASDAQ before and after their listing in NYSE or American Stock Exchange (AMEX): before the listing, the average fraction of even-eighths () is while thereafter it drops to about 56%.
2.4. Nonlinear Filtering Model
Our price process can be formulated as a marked point process : a sequence of random vectors , where denotes the time of th-trade and the corresponding trading price. Accordingly, the mark space of is , where and is all its subsets. Here, corresponds to the th-tick level . For each , we associate the counting processto count the trades in tick level set up to time . In particular, for ,denotes the total trades at th-tick level until time . Equivalently, we can introduce the random counting measure on byThe natural filtration, that is, information content, of isNow, we assume the following.
(C1) The total trade process admits an intensity for some positive measurable function .
Therefore, using the conditional probabilities defined in the previous subsection, we find that has intensityTo simplify the notation, we rewrite (44) as .
For our present work, we estimated total intensity function from intertrade data allowing for intraday variation. Figure 2 is the intertrade duration histogram of our 3 NYSE-listed stocks averaged over all times of the day. We divided the intertrade data into half-hour periods over the course of the day and took to be constant over these half-hour periods: for in that daily period.
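A piecewise-constant intensity of this kind can be estimated by counting trades per half-hour bin, as in the Python sketch below. The estimator (total count in a bin divided by total time observed in that bin), the 6.5-hour session length of 23400 seconds, and the function names are assumptions for illustration rather than a description of the exact procedure used for Figure 2.

```python
import numpy as np

def half_hour_intensity(trade_times_by_day, session_seconds=23400, bin_seconds=1800):
    """Estimate a piecewise-constant intensity (trades per second).

    trade_times_by_day: list of arrays, one per trading day, holding trade
    times in seconds since the open. Each bin's estimate is the total count
    in that bin over all days divided by the total time observed in the bin.
    """
    n_bins = session_seconds // bin_seconds
    edges = np.arange(n_bins + 1) * bin_seconds
    counts = np.zeros(n_bins)
    for times in trade_times_by_day:
        counts += np.histogram(times, bins=edges)[0]
    return counts / (len(trade_times_by_day) * bin_seconds)

def intensity_at(t, rates, bin_seconds=1800):
    """Look up the estimated intensity at time-of-day t (seconds since open)."""
    k = min(int(t // bin_seconds), len(rates) - 1)
    return rates[k]
```

For example, two days with three trades in total during the first half hour give an estimate of 3/3600 trades per second for that period.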
(C2) There exist some positive constants such that for all
Based on representation (40), (44), is framed by a partial-observation model, where is the state (signal), which is partially observed through the infinite dimensional counting process . One difficulty in calibrating these models is that their transition probability functions are usually unknown in closed form, so maximum likelihood estimation (MLE) methods are difficult to use (see Aït-Sahalia and Kimmel  for further details). Instead, we use Bayesian filtering because (1) Bayes estimates do not require the availability or regularity of the full likelihood functions; (2) Bayes estimates can be computed recursively for our tick-by-tick data; (3) Bayesian hypothesis tests can be conducted through Bayes factor, which is the ratio of marginal likelihoods and is easily computed even when the signals are of different dimension or, more generally, singular to each other.
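To make point (3) concrete, the Python sketch below computes a log Bayes factor from two sets of particle log-weights, treating the average particle weight of each model as its marginal likelihood estimate, and maps the result to the evidence categories of Kass and Raftery. The function names and the use of log-weights as inputs are assumptions for illustration.

```python
import numpy as np

def log_marginal_likelihood(log_weights):
    """Log of the mean particle weight, computed stably via log-sum-exp."""
    log_weights = np.asarray(log_weights, dtype=float)
    m = log_weights.max()
    return m + np.log(np.mean(np.exp(log_weights - m)))

def log_bayes_factor(log_weights_model1, log_weights_model2):
    """Log Bayes factor of model 1 over model 2."""
    return (log_marginal_likelihood(log_weights_model1)
            - log_marginal_likelihood(log_weights_model2))

def evidence_category(log_bf):
    """Kass-Raftery reading of 2*ln(B) as evidence for model 1 over model 2."""
    two_ln_b = 2.0 * log_bf
    if two_ln_b < 2.0:
        return "not worth more than a bare mention"
    if two_ln_b < 6.0:
        return "positive"
    if two_ln_b < 10.0:
        return "strong"
    return "very strong"
```

Because only the two scalar marginal likelihoods enter the ratio, the signals of the two models may live on different state spaces.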
3. Model Calibration
Our foremost goal is to contribute to the process of model building for financial markets both by suggesting elements to be included in the models and by proposing methods to select models based on real observation data. To do this effectively, we must tune each candidate model to obtain good prior (probability distribution) estimates for the complete signal before the test period. We do this through nonlinear filtering and in particular through particle filtering. In this section, we first introduce the filtering equations for our problem. Then, we introduce a branching particle filter algorithm that approximates the unnormalized filter and can be implemented on a computer. Next, we explain how we did the calibration (i.e., came up with this prior distribution) and finally we give the results for the models of interest herein.
3.1. Nonlinear Filtering Equations
The available information about is the observation filtration , defined in (43), and the primary goal of nonlinear filtering is to characterize the conditional distributionor, equivalently,for Here, , is the long memory portion of our information noise and is the state and parameter of our fair price-volatility martingale problem.
Remark 14. Actually, we often only want to estimate , but there is no simple recursive formula for this marginal. The filter is naturally model dependent, so we can produce different filtering processes for each model, that is, for each SV choice (1–5), each value of , and each value of in our inertia process.
Suppose is a positive constant for each such that , and consider the continuous-time likelihood function is a martingale under Condition (C2) and , defined byis called the reference measure. Under , the observations are just a Poisson measure, independent of the state vector , with mean measure . To make the likelihoods more manageable in the particle filters to follow, we choose to be a long time average value of and to be highest where the trades will be more concentrated. Bayes Theorem (see Bremaud , p. 165) then links the desired (real-world) conditional distribution with the unnormalized filter bywhere the unnormalized filter is defined byfor all . Now, we can give the evolution equation for .
Theorem 15. Under (C1) and (C2), the unnormalized filter is the unique measure-valued solution of the stochastic filtering equationfor and .
This theorem is a modest generalization of prior results and can be obtained in much the same manner as results in Kouritzin and Zeng  and Xiong and Zeng . Here, is the generator of the joint martingale problem to obtained from , the generator of state and , the generator of the historical process We do not need an explicit formula for . Instead, we can use particle filters to approximate .
Henceforth, it is convenient to think of the reference measure as the standard measure from which we can construct the measure corresponding to model with parameters and microstructure with parameters , , and .
3.2. Particle Filter
The weighted filter is the simplest of particle filters. The idea behind the weighted filter is that, by the independence of the signal from the observations under , we can create an infinite collection of particles , each having the same law as that are also independent of the observations. Then, it follows from the law of large numbers that for -almost all we have the weak convergence of finite measures. Unfortunately, it is well known that the weighted particle filter may not work well for a fixed number of particles . Roughly speaking, most of the particles diffuse away, do not track the signal well, are assigned low likelihoods, and do not really affect the average . Meanwhile, very few particles do match the observations better and have likelihoods that are orders of magnitude higher than those of the majority of particles. As a result, essentially becomes an average over too few particles to reflect well.
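The weight concentration described above can be reproduced in a toy setting. The following is only a minimal sketch under loudly stated assumptions: it uses a scalar random-walk signal observed in Gaussian noise rather than the Poisson-measure observations of our models, and the names `weighted_filter_weights`, `top_share`, `step_std`, and `obs_std` are hypothetical and do not appear elsewhere in this paper.

```python
import numpy as np

def weighted_filter_weights(observations, n_particles, step_std, obs_std, rng):
    """Normalized weights of a weighted (never resampled) particle filter.

    Toy assumption: random-walk signal, Gaussian observation noise.
    """
    x = rng.normal(0.0, 1.0, size=n_particles)
    log_w = np.zeros(n_particles)
    for y in observations:
        # Particles evolve independently of the observations.
        x = x + step_std * rng.normal(size=n_particles)
        # Accumulate each particle's log-likelihood of the new observation.
        log_w += -0.5 * ((y - x) / obs_std) ** 2
    w = np.exp(log_w - log_w.max())  # subtract the max to avoid underflow
    return w / w.sum()

def top_share(weights, fraction=0.01):
    """Share of the total weight carried by the heaviest `fraction` of particles."""
    k = max(1, int(fraction * len(weights)))
    return float(np.sort(weights)[-k:].sum())

rng = np.random.default_rng(0)
signal = np.cumsum(0.1 * rng.normal(size=200))
obs = signal + 0.2 * rng.normal(size=200)
w = weighted_filter_weights(obs, 1000, 0.1, 0.2, rng)
print("weight share of the top 1% of particles:", top_share(w))
```

When the printed share is close to one, the average is effectively taken over a handful of particles, which is the behavior that motivates resampling.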
To fix the weighted filter particle spread problem, we add particle resampling, resulting in the following novel particle filter. (See Gordon et al. , Del Moral et al. , Del Moral et al. , and Ballantyne et al.  for earlier algorithms.) For some large , the particle system is constructed as follows.
3.2.1. Initialization
At the initial time , we generate independent particles from the joint prior distribution of . The empirical measure at is where is the Dirac measure at . By the strong law of large numbers, so for almost all . Here, for measures so
Remark 16. When , Note that is a constant function defined on . Whereas most particle filters approximate the filter , we will approximate the unnormalized filter to facilitate Bayesian model selection without the storage of prior filter estimates.
We also initialize the number of particles to and particle likelihoods all to .
3.2.2. Evolution
Between observations at and , the particles, , move independently as samples from the transition probability of . In particular, we use the Euler scheme (see, e.g., Kloeden and Platen ) to evolve the dynamics of Examples 3–7 and (25). We let denote the evolved version of .
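To illustrate the Euler scheme, the following minimal sketch advances a cloud of particles by one step. It is not the code used for our experiments: it assumes a textbook Heston-type pair with independent driving Brownian motions, which need not coincide with the simplified Heston model of our examples, and the function name `euler_heston_step` and the parameters `mu`, `nu`, `rho`, and `kappa` are hypothetical.

```python
import numpy as np

def euler_heston_step(x, v, dt, mu, nu, rho, kappa, rng):
    """One Euler step for every particle of an assumed Heston-type model:

        dX = mu * X dt + sqrt(V) * X dB
        dV = (nu - rho * V) dt + kappa * sqrt(V) dW

    B and W are assumed independent. The variance is truncated at zero
    inside the coefficients so that the square root stays real.
    """
    n = x.shape[0]
    db = np.sqrt(dt) * rng.normal(size=n)  # Brownian increment for the price
    dw = np.sqrt(dt) * rng.normal(size=n)  # Brownian increment for the variance
    v_pos = np.maximum(v, 0.0)
    x_new = x + mu * x * dt + np.sqrt(v_pos) * x * db
    v_new = v + (nu - rho * v_pos) * dt + kappa * np.sqrt(v_pos) * dw
    return x_new, v_new

rng = np.random.default_rng(0)
x = np.full(1000, 65.0)    # illustrative initial fair prices
v = np.full(1000, 1e-8)    # illustrative initial variances (per second)
for _ in range(60):        # sixty one-second steps
    x, v = euler_heston_step(x, v, 1.0, 0.0, 1e-12, 1e-4, 1e-6, rng)
print(x.mean(), v.mean())
```

Each particle receives its own independent increments, which matches the requirement that particles move independently between observations.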
3.2.3. Particle Weights and Average Weight
We simulate using the reference measure and we incorporate the observations based upon (48). At the th observation , the th particle’s weight is multiplied by where . Hence, the th particle’s weight becomes and the average weight is . Note that in (57) by continuous paths. Here, depends on the observation and the increment of likelihood ratio of measure over measure defined by (48) given the simulated particle path realized on the interval . These weights do not depend upon the parameters directly. This is common and is why the observations are often called partial observations. We still can estimate and include these parameters as part of the particles’ states since they do affect stock price , which is observed in the presence of noise and distortion. The weights are stored along with the states of particles before resampling.
3.2.4. Resampling
After weighting, we resample the particles, pruning the unlikely ones and duplicating the better ones in an unbiased manner. In particular, we let be a -Bernoulli random variable independent of everything and produce particles at location . We then give all the particles weight and let
3.2.5. Unnormalized Filter
Now, we can estimate the unnormalized filter at the th observation time, , by . The actual algorithm that was implemented is as follows.
Initialize. are independent samples of , , , for all , and for .
Repeat. For , do
(1) evolve to independently of other particles;
(2) weight by observation: for ;
(3) estimate by ;
(4) average weight: ;
(5) repeat: for do
(a) offspring number: , with being -Bernoulli independent of everything;
(b) resample: for ;
(c) add offspring number: .
Remark 17. (i) We extract our estimate before resampling to avoid excess noise. (ii) The key step is (5), which determines the new number of particles and weights in an unbiased manner. The result is zero or more particles all having the average weight at the same location as the parent. (iii) The particle evolution would typically be done via Euler’s or Milstein’s method.
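Steps (3) to (5) of the algorithm can be sketched as follows. This is a minimal illustration rather than our production code. It assumes that the average weight is the total weight divided by the initial number of particles, that each parent produces the integer part of (weight / average weight) offspring plus one more with probability equal to the fractional part, and that every offspring carries the average weight. The names `branch_resample`, `unnormalized_estimate`, and `n_initial` are hypothetical.

```python
import numpy as np

def unnormalized_estimate(particles, weights, n_initial, f):
    """Step (3): weighted sum of f over the particles, divided by the initial count."""
    total = np.sum(np.asarray(weights, dtype=float) * f(np.asarray(particles)))
    return float(total / n_initial)

def branch_resample(particles, weights, n_initial, rng):
    """Steps (4)-(5): unbiased branching of particles around the average weight."""
    weights = np.asarray(weights, dtype=float)
    avg = weights.sum() / n_initial          # assumed form of the average weight
    if avg <= 0.0:
        raise ValueError("all particle weights are zero")
    ratio = weights / avg
    base = np.floor(ratio)
    # One extra offspring with probability equal to the fractional part.
    extra = rng.random(weights.shape[0]) < (ratio - base)
    offspring = base.astype(int) + extra.astype(int)
    # Offspring sit at the parent's location and all carry the average weight.
    new_particles = np.repeat(np.asarray(particles), offspring, axis=0)
    new_weights = np.full(new_particles.shape[0], avg)
    return new_particles, new_weights, offspring

rng = np.random.default_rng(1)
particles = rng.normal(size=8)
weights = rng.random(8) + 0.05
estimate = unnormalized_estimate(particles, weights, 8, lambda x: x)
particles, weights, offspring = branch_resample(particles, weights, 8, rng)
print(estimate, offspring, len(particles))
```

Under these assumptions, the expected number of offspring of a parent times the average weight equals the parent's weight, which is the sense in which the branching is unbiased. The number of particles after resampling is random.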
Since the above algorithm produces unbiased resampling of the weighted particle filter, it is quite reasonable to believe the following result.
Theorem 18. Under (C1) and (C2), for any and almost all observation paths.
The technicality of this result’s proof would detract from our applications so it is omitted.
3.2.6. Bayesian Estimation
By Bayes rule (50), the particle approximation of the normalized filter is for all . To get our parameter estimates, we can just set to one component of these parameters, that is, or .
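In particle terms, the normalized estimate is a ratio of two unnormalized estimates, so the common division by the number of particles cancels. A minimal sketch follows, assuming that `values` holds one parameter component per particle and `weights` holds the corresponding particle weights. The function name `normalized_estimate` is hypothetical.

```python
import numpy as np

def normalized_estimate(values, weights):
    """Weighted average: unnormalized estimate of f divided by that of the constant 1."""
    values = np.asarray(values, dtype=float)
    weights = np.asarray(weights, dtype=float)
    total = weights.sum()
    if total <= 0.0:
        raise ValueError("weights must have positive total mass")
    return float(np.sum(weights * values) / total)

# Two particles carrying candidate values of one parameter component.
print(normalized_estimate([2.0, 4.0], [1.0, 3.0]))
```

The result is unchanged if all weights are multiplied by a common positive constant, so only relative weights matter for the Bayes estimates.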
3.3. Calibration and Historical Training
To keep the problem size manageable, we just used the clustering parameter estimates of , , , and given above as the actual values throughout our simulations.
One is often faced with the problem of estimating initial distributions for fair price, volatility, and the parameters prior to filtering over the time interval of interest (April 2010 here). Our approach was to make arbitrary assignments very far in the past (January 3, 2000, to be precise) and then do an excessive amount of prior particle filtering, relying on the ability of the filter to forget its starting point and to produce reasonable distributions at a much later point, April 1, 2010. (See, e.g., Ocone and Pardoux , Delyon and Zeitouni , and Atar  for mathematical results regarding this phenomenon.) This had to be done for every model, namely, every combination of our three stocks, five SV models, and multiple microstructure models, characterized by inertia parameters. Our main purpose in this historical training was to get a starting joint distribution for as of April 1, 2010, under each model combination. Due to the large number of cases this produced, we first display and discuss two models: the nondynamical microstructure Heston case and the median inertia dynamical case where and s (i.e., hrs) in the inertia microstructure model. Also, to ensure that and did not converge to a single value, we made them vary slightly in a random manner; that is, we replaced the equation with for a very low variance Brownian motion .
In Figure 3, we illustrate our prior filtering of PepsiCo. The choppiest curve is the actual stock price while the smoothest curve is the filter’s fair price estimate using the Heston SV model with (median) microstructure inertia. The middle curve is the filter’s fair price estimate using the Heston SV model without dynamics in the microstructure; that is, . These curves go beyond April 1, 2010. However, the required initial distributions were taken from the filter at that point.
Notice from Figure 3 that the implied fair price process estimate is far less volatile in the presence of dynamical microstructure than without. This lower volatility for fair price is highly desirable. It does not make sense that the fair price of a stock should fluctuate dramatically from day to day or within a day in the absence of an event; rather, these short-term fluctuations are better explained by trading noise. Moreover, fair price is a mathematically more optimal version of moving averages, which are used to judge value and momentum, and so fair price estimates should inherit the smooth nature of such moving averages. From a modeling perspective, this fair price smoothness indicates that dynamical microstructure (with inertia) can replace much of what stochastic volatility tries to do and leads to one of our central questions addressed below: Is stochastic volatility necessary in the presence of dynamical microstructure?
3.4. Numerical Results
The data are one month (April 2010) of transaction prices of our three NYSE-listed stocks. Our filter produces Bayes estimates of the macro- and microparameter vectors and , respectively. These estimates in the nondynamical microstructure case (i.e., using the simpler form in (24)) for PepsiCo are shown in Table 4.
All parameters are estimated using time in seconds. Our PepsiCo Bayes estimates in the median inertia case are as shown in Table 5.
While it is difficult to read much from these numbers, we can see that the main volatility parameters , , and are mostly smaller when dynamics is included in the microstructure. This further justifies our conjecture that at least some stochastic volatility is better replaced by microstructure with dynamics.
Figures 4 and 5 show the conditional expectation fair price estimation for Goldman Sachs and PepsiCo, respectively, in the cases of no dynamics and median inertia dynamics for each of our SV models. There are a total of eleven curves in both figures. The most volatile curve is the stock price itself over this month. The smoothest curves somewhat separated from the stock price are the fair price estimates using the five SV models with (median inertia) dynamical microstructure. The remaining five curves (that hug the stock price in Figures 4 and 5) are our fair price estimates for our five SV models with nondynamical microstructure. In this last case, the microstructure does not have the power to separate the fair price and actual stock price to any large degree.
It is important to realize that these pictures are really just a one-month snapshot of a much bigger multiyear filtering process. This explains why many of the fair price processes are significantly different from the actual stock price on April 1, 2010: the filter is estimating that the difference is due to the microstructure. It is apparent that adding dynamics to the microstructure allows the estimated fair price process to differ significantly from the stock price. Indeed, there is a significant correction of all three stock prices (especially Goldman Sachs) towards the estimated fair price of the models with (median inertia) dynamical microstructure. This produces a compelling reason to use models with microstructure dynamics. You would be estimating that the stocks were significantly overvalued before the correction if you used the model with microstructure dynamics, and this could be used as a warning to lessen one's exposure. You have no such warning when the microstructure does not contain (inertia) dynamics, as the estimated fair price is very close to the observed price. It is interesting to ponder what this possible discrepancy would mean to option prices.
The filters provide conditional distributions and estimates for more than just fair price and parameters. Table 6 shows the average volatility estimates without microstructure dynamics (see (24)) and with (the best performing) microstructure inertia using the simplified Heston SV model. We only highlight Heston here because (1) we will show evidence below that Heston performs the best and (2) the volatility estimates of the other SV models behave similarly. The amount of stochastic volatility estimated when there is (median inertia) dynamics in the microstructure shrank to a couple of percent of what it was without. This strongly suggests that by far the primary use of stochastic volatility is as a proxy for microstructure with dynamics and further raises the question of whether stochastic volatility is needed in the presence of microstructure dynamics.
The final and most difficult quantity the filter estimates (in the dynamical microstructure case) is the historical noise. For practical purposes, we cannot let the historical path go back all the way to year 2000, but we found that there is not much loss if we just update discrete samples over the previous three years, which is still a tremendous amount of data. Also, we cannot plot these historical paths so we just plot the projection onto the current time; that is, we just plot even though we must propagate the Markov process in the filter. Figure 6 shows the noise estimate for PepsiCo. In this graph, we look at the effect of inertia. The curves where represent the no-inertia case, so is just an Ornstein-Uhlenbeck process. Conversely, the case represents the one hundred percent inertia case and is not Markov. We see from these graphs that the amount of estimated noise is very similar, indicating that the amount of inertia modeled might not be that significant. However, the noise processes where are far smoother due to the inertia. Below, we will produce strong evidence that inertia is important and find that the best is in the range , depending upon the stock. We compare the behavior of our models in terms of the SV models and the inertia parameters and within the Bayesian model selection framework in the following section.
4. Evidence for Inertia and Stochastic Volatility
The main objective of this section is to use the Bayes factor to investigate model selection in microstructure markets. To use the Bayes factor method, we need only to be able to transform all observation models of interest into the same canonical process via Girsanov measure change. The signal models can be singular to one another. Kouritzin and Zeng  discuss the Bayesian model selection problem. However, their equations do not apply to our models.
4.1. Model Selection and Bayes Factor
Consider our five SV macrostructure fair price-volatility models where the generators of the martingale problem to are, respectively, for . Normally, we would have to consider a multitude of parameters resulting in a plethora of models. However, by our calibration process we have reduced the setting to one parameter set per martingale problem, so we have a base of five models. We must still consider the various choices for our inertia. For simplicity, we restrict ourselves to three distinct values for , eleven choices for , and we use the calibration process to estimate the other microstructure parameters . Therefore, we have a total of 5 × 3 × 11 = 165 models to test.
The likelihood of being produced by model up until time is . Here, is the counting measure on and the same observations and observation rate information are used for all models. One can think of as the likelihood ratio of the model with distribution characterized by to the simple (or null) model with distribution where the observation prices just arrive according to a Poisson measure with intensity measure , that is, with rate independent of any macrostructure model and independent of any microstructure state. In other words, then transforms the observations into the same Poisson measure with intensity measure regardless of . Unfortunately, depends upon , , which are unknown, so we cannot select models via the likelihood.
4.2. Bayes Factor
The available information in a microstructure market is the observation process , which represents the cumulative transaction records throughout all tick price levels. The normalized filter , , , , satisfies where, for , the unnormalized filter is and is the integrated (or marginal) likelihood of for model .
Now, we use the Bayes factor to compare models. The Bayes factor determines which model best fits the observed data by doing pairwise comparisons. We define the Bayes factor of model to the null model by the conditional likelihood, which is consistent with more basic definitions of the Bayes factor. It then follows that the Bayes factors for two models, characterized by and , are the ratios with the integrated likelihoods , which can be approximated using the algorithm of Section 3.2.5. Kass and Raftery  demonstrate how to interpret the Bayes factor, as shown in Table 7.
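The pairwise comparison can be sketched numerically as follows. This is a minimal illustration, not our implementation. It assumes that each model's integrated likelihood is available as a logarithm, since a product of weight increments over a month of tick data could overflow or underflow in floating point, and it uses the evidence categories published by Kass and Raftery on the scale of twice the natural logarithm of the Bayes factor, whose layout may differ from Table 7. The function names and the numerical log-likelihoods are hypothetical.

```python
import math

def log_bayes_factor(log_lik_a, log_lik_b):
    """Log Bayes factor of model a over model b from log integrated likelihoods."""
    return log_lik_a - log_lik_b

def kass_raftery_label(log_bf):
    """Evidence category on the 2 ln(B) scale of Kass and Raftery."""
    scale = 2.0 * log_bf
    if scale < 0.0:
        return "favors the second model"
    if scale < 2.0:
        return "not worth more than a bare mention"
    if scale < 6.0:
        return "positive"
    if scale < 10.0:
        return "strong"
    return "very strong"

def select_model(log_liks):
    """Name of the model with the largest log integrated likelihood."""
    return max(log_liks, key=log_liks.get)

# Hypothetical log integrated likelihoods, for illustration only.
log_liks = {"GBM": -1250.0, "Heston": -1238.5}
best = select_model(log_liks)
log_bf = log_bayes_factor(log_liks[best], log_liks["GBM"])
print(best, math.exp(log_bf), kass_raftery_label(log_bf))
```

Choosing the model with the largest integrated likelihood is equivalent to choosing the model whose Bayes factor against every other candidate is at least one.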
4.3. Numerical Results on Stochastic Volatility
First, we consider the problem of selecting the best of our five fair price-volatility models, and the resulting partially observed market models. We compare these five models to determine which can best represent the market data. More precisely, we run all unnormalized filters as explained in Section 3.2 with the optimal parameters discovered and reported earlier. Then, we choose Model if is the largest. Naturally, this corresponds to the model whose Bayes factor ends up greater than one when compared to any other model. While we have five basic models, we also consider different market ingestion times and inertia magnitude parameters for each model.
Using GBM with nondynamic microstructure (i.e., ) as the benchmark, we determine which combination of SV model and inertia parameters outperforms GBM most. We first focus on the candidate models (Examples 3–7). In each case, we pick the inertia parameters from the sets and that would yield the highest Bayes factor against the calibration model. The data are the transaction prices of PepsiCo, IBM, and Goldman Sachs for April 2010. Figure 7 and Table 8 summarize the Bayes factor performance. The Bayes factors computed in this table give strong evidence (based upon the Kass and Raftery criterion mentioned before) for the Heston model based on a full month of real tick-by-tick stock price data. Indeed, as we will see below, there would still be strong evidence supporting Heston if we used different values of and . It is also interesting that the order of the models did not change over our three stock selections, with Heston always being preferred and GBM always performing the worst. Recall that all models are tuned to have their best parameters and .
4.4. Numerical Results on Inertia
Next, we look at the ingestion time using nondynamic microstructure Heston as the calibration model. Figure 8 and Table 9 show the effect of varying over for fixed to give the highest Bayes factor. There is a drop in the Bayes factor values from the model determination experiment which is entirely due to the change of calibration model from GBM with nondynamic microstructure to Heston with nondynamic microstructure. Our results show that the best ingestion times for Goldman Sachs, PepsiCo, and International Business Machines stocks are, respectively, 1/2 day, 2 hours, and 1/2 day. The fact that the data supports long-time ingestion might add merit to the case of the momentum trader.
Finally, we investigate the optimal amount of inertia. Figure 9 and Table 10 show the effect of varying the amount of inertia over for fixed to give the highest Bayes factor. The table shows that inertia is important. In fact, the best was always at least and was even in the case of IBM so all microstructure dynamics should be driven by the inertia process.
Herein, we considered five popular SV models to represent intrinsic or fair price and stochastic volatility of this price. These SV models are free of inertia or momentum. We then added microstructure noise with possible dynamics and inertia to these SV models to accommodate trading noise, trend following, information dispersion, and the slow unwinding of big positions. We used Bayesian model selection techniques to determine which of these combined models fits real NYSE data best. In the process of selecting the best model we also investigated characteristics like microstructure dynamics, inertia, and stochastic volatility. For the stock data considered, we can conclude the following:
(1) Bayesian model selection through particle filtering provides a computationally effective means to identify the best finance models on real tick-by-tick data.
(2) The SV and inertia components of the financial models compared can be singular to each other as long as the microstructure can be changed into the same canonical Poisson measure process for all models considered.
(3) There is strong evidence of dynamical microstructure noise.
(4) Adding dynamics to the microstructure allowed much greater deviations of price from intrinsic value, which can be detected by filtering and used as a warning sign to investors and traders.
(5) The simplified Heston stochastic volatility model with microstructure dynamics and significant inertia performed the best in all cases.
(6) There is strong statistical evidence that such simplified Heston stochastic volatility models with microstructure dynamics and inertia match the data better than the classical geometric Brownian motion.
(7) The amount of inertia and the time it lasted varied a little from stock to stock, but in all cases there was significant inertia that lasted for hours.
More complicated SV models can be investigated in our future work. One could also postulate more complicated microstructure dynamics and consider additional real data analysis.
Conflict of Interests
The author declares that there is no conflict of interests regarding the publication of this paper.
B. B. Mandelbrot, Fractals and Scaling in Finance: Discontinuity, Concentration, Risk, Springer, 1997.
A. Shiryaev, Essentials of Stochastic Finance, World Scientific, 1999.
M. Capinski and T. Zastawniak, Mathematics for Finance. An Introduction to Financial Engineering, Springer, London, UK, 2nd edition, 2011.
F. Black, “Noise,” Journal of Finance, vol. 41, pp. 529–543, 1986.
J. Hasbrouck, Handbook of Statistics, vol. 14 of Modelling Market Microstructure Time Series, North-Holland, Amsterdam, The Netherlands, 1996.
R. E. Kass and A. E. Raftery, “Bayes factor and model uncertainty,” Journal of the American Statistical Association, vol. 90, pp. 773–795, 1995.
D. W. Stroock and S. R. S. Varadhan, Multidimensional Diffusion Processes, Springer, Berlin, Germany, 1979.
S. N. Ethier and T. G. Kurtz, Markov Processes: Characterization and Convergence, John Wiley & Sons, New York, NY, USA, 1986.
Y. Aït-Sahalia, P. Mykland, and L. Zhang, “How often to sample a continuous-time process in the presence of market microstructure noise,” Review of Financial Studies, vol. 18, pp. 351–416, 2005.
P. Bremaud, Point Processes and Queues: Martingale Dynamics, Springer, New York, NY, USA, 1981.
P. Del Moral, J.-C. Noyer, and G. Salut, “Maslov optimisation theory: stochastic interpretation, particle resolution,” in 11th International Conference on Analysis and Optimization of Systems Discrete Event Systems, vol. 199 of Lecture Notes in Control and Information Sciences, pp. 312–318, Springer, Berlin, Germany, 1994.
P. E. Kloeden and E. Platen, Numerical Solution of Stochastic Differential Equations, Springer, 1992.
B. Delyon and O. Zeitouni, “Lyapunov exponents for filtering problems,” in Applied Stochastic Analysis, M. H. A. Davis and R. J. Elliott, Eds., pp. 511–521, Gordon and Breach Science Publishers, London, UK, 1991.