Recent Developments on the Stability and Control of Stochastic Systems
Research Article, Open Access
Microstructure Models with Short-Term Inertia and Stochastic Volatility
Abstract
Partially observed microstructure models, containing stochastic volatility, dynamic trading noise, and short-term inertia, are introduced to address the following questions: (1) Do the observed prices exhibit statistically significant inertia? (2) Is stochastic volatility (SV) still evident in the presence of dynamical trading noise? (3) If stochastic volatility and trading noise are present, which SV model matches the observed price data best? Bayes factor methods are used to answer these questions with real data, which allows us to consider volatility models with very different structures. Nonlinear filtering techniques are utilized to compute the Bayes factor on tick-by-tick data and to estimate the unknown parameters. It is shown that our price data sets all exhibit strong evidence of both inertia and Heston-type stochastic volatility.
1. Introduction
Financial analysts list speculation, finiteness of assets, interest rates, tick size, price inertia, price clustering, belief heterogeneity, asymmetric information, greed and fear, and so forth as causes for price fluctuations over time. Yet, popular models like geometric Brownian motion (GBM) (e.g., Black and Scholes [1], Merton [2]) or the Cox-Ross-Rubinstein model [3] try to handle all these factors in an overly simple framework, resulting in unnatural phenomena like the volatility smile. Consequently, stochastic volatility, which has been observed in real prices, is often added to the price value evolution (e.g., Heston [4], Jackwerth and Rubinstein [5], Hull and White [6], and Nelson [7]) to avoid the volatility smile. However, which stochastic volatility model fits the market data best?
Many authors now discuss the misspecification of stochastic price-volatility models (including the Heston model, which performs favorably herein). This leads us to wonder whether these very simple models are missing ingredients. Even combined stochastic value-volatility models do not address tick size, price inertia, price clustering, hidden liquidity, and fear-greed cycles that traders, especially high-frequency traders, must deal with. To handle these issues, one is drawn to tick-by-tick microstructure models and left with the perplexing question: How should one model price inertia in continuous time? We use the term price inertia instead of the related term price momentum because we are not weighting transaction prices by volume. Fractional Brownian motion (FBM), best known for its long memory properties, exhibits inertia and has been used to model markets (Mandelbrot [8], Shiryaev [9]) even though these models allow arbitrage strategies. We speculate that FBM’s success in modeling observed data is more attributable to inertia than to long memory. However, we introduce an alternative inertia process and show that this new process better satisfies the desired properties of inertia than FBM. We then show strong statistical evidence of price inertia that lasts for hours or days using Bayes estimates and Bayes factors on real price data. We neither consider the possibility of arbitrage nor determine derivative prices for our models but rather leave these interesting mathematical finance questions to the experts. (See Capinski and Zastawniak [10] for an excellent introduction to these types of questions and to mathematical finance in general.) Also, we leave the difficult task of obtaining theoretical error bounds for our particle filter methods to other works. (See, e.g., Kouritzin and Zeng [11] and Del Moral et al. [12] for related work on approximate filters.)
Our focus is solely on modeling observed stock price data and the methodology of determining which of a class of models best fits the observed data.
High frequency data contains complete market-participant trading activities (Engle [13]) and is modeled using microstructure (Black [14], Chan and Lakonishok [15], Hasbrouck [16, 17], Engle and Russell [18], Engle [13], and Bandi and Russell [19]). Unlike the macrostructure market, the trading noise in the microstructure market is not negligible; thus, the intrinsic asset value is not readily discernible. In this paper, we introduce a class of dynamic microstructure models, where the transaction price is formulated as a distorted and color-noise corrupted variant of the intrinsic asset value with the intrinsic asset value being a traditional stochastic value-volatility process. Indeed, we view the transaction price data as random counting-measure observations of intrinsic value corrupted by microstructure trading noise with such things as inertia and fear-greed cycles built in. However, trading noise sources themselves introduce volatility to transaction prices. This raises the question, “Do we need to model stochastic volatility explicitly in the presence of dynamic microstructure trading noise?” We will give strong evidence of the presence of stochastic volatility through Bayes factor methods and stochastic filtering theory. Moreover, we also utilize model selection to provide strong evidence of Heston-type volatility over competing stochastic volatility models based on the observed transaction data in a microstructure market. This suggests that the common viewpoint that the Heston model is highly misspecified might be better stated as follows: overly simplistic macrostructure-only models are underspecified. The Bayes factor (see, e.g., Kass and Raftery [20]) is our preferred model selection method since it provides statistical comparisons in real time as to which model best fits the market data while allowing the stochastic value-volatility (signal) models to be singular to one another.
Indeed, to use the Bayes factor method, we need only to be able to transform all microstructure asset-price observation models of interest into the same canonical process via a Girsanov-type measure change.
Previously, Zeng [21] studied a filtering equation for inferring the intrinsic value process in a microstructure model while Xiong and Zeng [22] proposed a branching particle approximation to this equation. Kouritzin and Zeng [23] derived a Bayes factor equation and discussed the Bayesian model selection problem to determine whether financial data, such as stock prices, display jump-type stochastic volatility. However, all these works are based on a restricted microstructure model and thus cannot be applied to our general setting. Moreover, our problems of showing statistical evidence of inertia and determining which of the classical stochastic volatility models best represents real data in the presence of microstructure noise were not considered. We also propose a new inertia process, explain its role in modeling prices, and show its statistical significance with real tick-by-tick data.
Section 2 is devoted to explaining our model. First, our five standard value-volatility models (GBM, Hull-White, Log Ornstein-Uhlenbeck, continuous GARCH, and Simplified Heston) are given, followed by our microstructure inertia process and its properties and then the other components of our dynamic microstructure model. Together the value-volatility and microstructure components form our price evolution model, which, at the end of Section 2, is interpreted as a filtering model. In Section 3, we discuss model calibration and fair price/value estimation through Bayesian filter estimation. A filtering equation and a branching particle filter approximation algorithm are first given and explained. Then, their use to identify parameters and come up with initial state estimates is discussed. Finally, numeric parameter and initial state estimates for each model are given. As a byproduct, it is demonstrated that proper modeling and estimation of fair price (as is done herein) can provide information about overbought conditions and help avoid financial loss (see Figure 4). Section 4 is dedicated to Bayesian model selection. We first motivate the use of the Bayes factor for model selection and explain how to estimate the Bayes factor from unnormalized particle filters. Then, we establish strong statistical evidence of inertia and Heston-type volatility in all our price data through model selection using the Bayes factor method to test which fair price-volatility model and what amount of inertia best fit the observed price data.
2. The Partially Observed Market Model
In this section, we build our stochastic model that has macrostructure and microstructure components and interpret this model in terms of a signal that needs to be estimated in real time and observations which are used to form the signal estimates. The macrostructure model consists of fair price, volatility, and related parameters. Unlike macrostructure models, we do not assume access to this macrostructure state, but rather we take it to be part of the signal to be estimated. Indeed, a model would be judged to be better if the macrostructure price (which represents a “fair” price) is quite different from the observed price and we can use filtering to determine overbought and oversold situations.
The microstructure price construction converts the macrostructure model into the observed price. Such things as inertia (or momentum), fear-greed cycles, and whole-price clustering (or rounding), which are not part of the fair price, are incorporated into the microstructure model. A distinguishing feature of our microstructure is its dynamic state: to allow the microstructure to influence price over a period of time, so that the observed microstructure price can differ from fair price significantly, one needs to add and then estimate a microstructure state. In particular, the inertia process, characterized by a parameter, is introduced to capture price inertia that might be caused by hidden liquidity, by various reaction and access times to information, and by momentum traders themselves. This inertia process is not Markov, so we will have to consider the historical version of this state. Further, the microstructure state is also unobservable and hence must be added to the signal along with the microstructure parameters, and all must be estimated as nuisance parameters.
The nondynamic part of the microstructure noise consists of rounding and clustering noise. It is widely observed in markets that more trades occur at more even prices like whole nickel or whole dollar levels. Therefore, to match observed prices well, we should have a mechanism to convert evenly distributed raw prices into whole-price-biased observed prices. This is done by binning raw prices into five sets depending on how even they are and then randomly moving raw prices in the less even bins to nearby prices in the more even bins in order to match the observed prices.
The observations then become the marked counting process of the number of trades that occur at the various prices. We will later use these observations to select and calibrate models and to estimate the augmented signal. The whole point of the microstructure is to allow the macrostructure price to distinguish itself from the observations and instead to represent fair value. We then use filtering on asset prices to estimate implied value (hereafter called fair price) and thereby judge whether an asset is overbought or oversold.
2.1. General Notation
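Let $[0, T]$ be a fixed time period and let $(\Omega, \mathcal{F}, \{\mathcal{F}_t\}_{t \in [0,T]}, P)$ be a complete filtered probability space. For any stochastic process $X$, its natural filtration, defined as $\mathcal{F}^X_t = \sigma\{X_s : s \le t\}$, represents the information in $X$ up to time $t$. $\mathbb{N}_0$ denotes the set of nonnegative integers and, for any Polish space $E$, $B(E)$ is the set of all bounded measurable $\mathbb{R}$-valued functions on $E$.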
2.2. Common Macrostructure State Models
We use a macrostructure model for the unobservable fair price together with its volatility and parameters. Here, is the macrostructure financial state (fair price plus volatility) with macrostructure parameter for some . We let be a probability distribution on , take to be a generator with domain , and assume satisfies the martingale problem.
Definition 1. is the unique solution of the valued martingale problem for with initial distribution . That is,is martingale for each Moreover, if also satisfies (i) and (ii), then and have the same finite dimensional distributions.
Remark 2. While does not vary in time, we include it in our macrostructure model to be estimated because it is still unknown. Nevertheless, the operator does not act on the variable since for our fixed parameters.
The martingale problem formulation (2) (see Stroock and Varadhan [24], Ethier and Kurtz [25] for more details) is general enough to cover most interesting financial models. In this paper, the macrostructure state consists of two components: the fair price and the stochastic volatility (if any). The most common example of in finance is the “geometric Brownian motion” (GBM) utilized in the classical Black-Scholes option pricing formula. Throughout this section, and are two independent standard Brownian motions and .
Example 3 (GBM model; see Black and Scholes [1], Merton [2]). We have thatwith parameters , corresponds to our martingale problem with the generator
In the GBM model, the volatility is a constant. To account for the “volatility smile” commonly observed in market option prices (see Jackwerth and Rubinstein [5] for a detailed survey), the GBM model is generalized to stochastic volatility (SV) models, where the constant volatility itself is replaced by a stochastic process. Some of the popular SV models include the following.
Example 4 (Hull-White model; see Hull and White [6]). Considerwith parameters and generator
Example 5 (Logarithmic Ornstein-Uhlenbeck model; see Scott [26]). We have thatwith parameters and generator
Example 6 (continuous GARCH model; see Nelson [7]). We have thatwith parameters and generator
Example 7 (simplified Heston model; see Heston [4]). We have thatwith parameters and generator
We label this example as simplified because we do not allow the two driving Brownian motions to be correlated as Heston did. There is no mathematical issue in including this correlation, but it would add a parameter to the model, which increases computation time. The Heston model already performed the best without this parameter. GBM (with microstructure) plays a special role in our study as it is our no-stochastic-volatility model. We will compare our other models against it on real data to determine whether stochastic volatility is present. The models are summarized in Table 1.

Remark 8. The continuous GARCH model is the continuous-time limit of many classical GARCH-type discrete-time processes (Nelson [7], Drost and Werker [27]). We did not consider jumping stochastic volatility models (e.g., Elliott et al. [28], Kouritzin and Zeng [23], Duffie et al. [29], Eraker et al. [30], and Eraker [31]) or models in which the driving Brownian motions are correlated, due to our need to dedicate our limited computer resources to handling our complicated (non-Markov) microstructure with inertia. Still, we want to emphasize that the computational complexity we experienced is inherent in our use of non-Markov (inertia) models and has little to do with our particular methods. Indeed, our Bayes factor filtering methods are what make the computations possible on an inexpensive contemporary desktop computer.
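To make the role of the two independent Brownian motions concrete, the following is a minimal simulation sketch of a Heston-type price-volatility pair using an Euler scheme. The parametrization (drift `mu`, mean-reversion rate `kappa`, long-run variance `theta`, volatility of variance `xi`), the function name `simulate_heston`, and the truncation of negative variance values are assumptions made for illustration; they follow a common textbook form and are not taken from Table 1.

```python
import numpy as np

def simulate_heston(s0, v0, mu, kappa, theta, xi, T, n_steps, seed=0):
    """Euler scheme for a Heston-type model with independent Brownian drivers.

    Assumed (textbook) form, not necessarily the paper's parametrization:
        dS = mu * S dt + sqrt(V) * S dB
        dV = kappa * (theta - V) dt + xi * sqrt(V) dW
    """
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    s = np.empty(n_steps + 1)
    v = np.empty(n_steps + 1)
    s[0], v[0] = s0, v0
    for k in range(n_steps):
        # Independent increments: no price-volatility correlation.
        db, dw = rng.normal(0.0, np.sqrt(dt), size=2)
        v_pos = max(v[k], 0.0)  # truncate so the square root is well defined
        # Log-Euler update for the price keeps it strictly positive.
        s[k + 1] = s[k] * np.exp((mu - 0.5 * v_pos) * dt + np.sqrt(v_pos) * db)
        v[k + 1] = v[k] + kappa * (theta - v_pos) * dt + xi * np.sqrt(v_pos) * dw
    return s, v

prices, variances = simulate_heston(
    s0=50.0, v0=0.04, mu=0.05, kappa=2.0, theta=0.04, xi=0.3, T=1.0, n_steps=1000
)
```

Setting `xi=0` and `v0=theta` freezes the variance, which recovers a GBM path and mirrors the role GBM plays here as the no-stochastic-volatility baseline.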
2.3. Construction of Microstructure Price
The fair price-volatility models account for the random variances of the intrinsic asset value; thus, the selection of a proper SV model is crucial for investing, derivative pricing, and hedging. On the other hand, microstructure noise (see Black [14], Hansen and Lunde [32], Duan and Fulop [33], etc.) causes random perturbations of the transaction price from its intrinsic value, and disregarding such trading noise introduces severe bias into stochastic volatility estimation (see Duan and Fulop [33]). We incorporate microstructure trading noise into traditional fair price-volatility models and use statistical filtering to reveal such things as short-term inertia in the trading noise and stochastic volatility in the intrinsic value.
In microstructure markets, the price changes occur only at irregularly spaced transaction times with a total trading intensity (see Engle [13]). Here, we assume this intensity is just a time-varying measurable function, as the empirical analysis illustrates that there is no need to consider more general structures. At each transaction time, the transaction price is formulated as a nonlinear random field, modeling the trading noise, applied to the intrinsic value. Formulation (13) is similar to that of Hasbrouck [16], where the intrinsic value is the permanent component while the random field introduces the transitory component.
The empirical evidence reported by Hansen and Lunde [32] suggests strongly that the trading noise is serially correlated. Similar results can be found in Aït-Sahalia et al. [34]. Indeed, there exist situations in which the trading noise variance estimate is zero if the trading noise is simply assumed to be independent (see Duan and Fulop [33]). This does not mean there is no trading noise but rather that the trading noise is autocorrelated. To characterize this correlation, Hansen and Lunde [32] assume the trading noise to be some Gaussian random sequence with stationary covariance and finite dependence. However, this model is most suitable for low-frequency data and ignores many crucial microstructure effects. We build correlation into our microstructure information noise through inertia and mean-reversion while utilizing microstructure rounding and clustering noise to explain the discreteness and whole-price biasing.
2.3.1. Inertia
The idea of momentum or inertia has been used in many studies (see Jegadeesh and Titman [35], Moskowitz and Grinblatt [36], Grundy and Martin [37], Grundy et al. [38], etc.). Basically, there is the tendency for a stock to continue to move in one direction. To illustrate our approach, we introduce the following definition.
Definition 9. A process is said to have stochastic inertia at time if is called the inertia function.
The idea behind our definition is that for inertia we should expect and to have the same sign for , but close to and small. We strengthen this condition toMany processes have inertia. However, to model the stock price effect of the information reaching all market participants, we want the following five properties: (1) is Gaussian and driftless and is proportional to so resembles Brownian motion; (2) is finite, not infinite, indicating that the influence of past values on immediate future is not too strong; (3) makes sense from informational and hidden liquidity points of view; more precisely, it can explain well the price effects due to the reactions of all market participants to information and rumor being diffused and simulated over a period of time as well as due to the purchases or sales of an agent spreading out a large change in his/her position over time; (4) is easy to simulate using, for example, the Gaussian property; (5) is easy to analyze.
Neither a Brownian motion nor more generally a square integrable martingale has inertia. Brownian motion with drift has inertia but we do not want drift. For fractional Brownian motion (FBM) ,where is the Hurst parameter. Therefore,Thus, the inertia function of is infinity for all if (and is if ). Neither case satisfies our five properties. Still, standard representations of FBM motivate the creation of driftless inertia by convolving a Brownian motion with the desired impulse response for information dissemination. With this in mind, we consider the following inertia process.
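As a numerical illustration of why FBM carries inertia, the sketch below evaluates the covariance of two adjacent increments using the standard FBM covariance function, 0.5 (s^{2H} + t^{2H} − |t − s|^{2H}). The function names are hypothetical, and the quantity computed is only the adjacent-increment covariance, not the inertia function of Definition 9. By bilinearity the result reduces to h^{2H} (2^{2H−1} − 1), which is positive for H > 1/2, zero for H = 1/2 (Brownian motion), and negative for H < 1/2.

```python
def fbm_cov(s, t, hurst):
    """Standard FBM covariance E[B_s B_t] for s, t >= 0."""
    return 0.5 * (s ** (2 * hurst) + t ** (2 * hurst) - abs(t - s) ** (2 * hurst))

def adjacent_increment_cov(t, h, hurst):
    """Cov(B_{t+h} - B_t, B_t - B_{t-h}) by bilinearity; requires t >= h > 0."""
    return (fbm_cov(t + h, t, hurst) - fbm_cov(t + h, t - h, hurst)
            - fbm_cov(t, t, hurst) + fbm_cov(t, t - h, hurst))

for hurst in (0.3, 0.5, 0.7):
    print(hurst, adjacent_increment_cov(t=1.0, h=0.1, hurst=hurst))
```

The sign pattern matches the intuition stated above: consecutive moves tend to share a sign only when the Hurst parameter exceeds one half.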
Definition 10. Our stochastic inertia process iswhere is a dimensional standard Brownian motion, , and
Remark 11. is a weighted average of the historical information (the first term) and fundamental information (the second term). In fact, can be viewed as the impulse response on price created by market participants receiving and simulating the “information” and determines the diffusion speed in the market. This formulation captures the idea that news or rumor and its ramifications require time to be fully disseminated and understood. When , it represents the case of only historical information resulting in the strongest inertia in prices. Alternatively, we can use inertia to explain “hidden liquidity.” If everybody knew that an agent was going to make a big change in a position, then the price would immediately jump. However, if the agent breaks up the desired change into small transactions, then it takes time for this extra buying or selling pressure to be recognized in the market. In this case, represents the case, where all changes in position are done over a period of time and represents the time to effect 58% of the positional change.
Note that is a centered Gaussian process such that the autocovarianceis positive for any . In particular,Thus, converges to as with speed determined by . (Hence, informational noise increases at the same asymptotic rate as Brownian motion.) Moreover,and, using standard antiderivatives,Thus, the inertia function of our inertia process is , the steady-state inertia isand this happens quickly for small . We can thus verify that , defined in (18), satisfies our five desired properties. One can also look upon as the time for new information to be disseminated to fifty-eight percent of the market. Below, we consider three different dissemination times: minutes, hours, and day on real stock data. Finally, the fact that is Gaussian eases its simulation greatly.
2.3.2. Information Noise and Augmented State
Hitherto, we have focused on constructing inertia processes. Now, we include all informational noise into asset prices. Information noise is introduced to represent trading noises due to things like inertia, fear-greed cycles, belief heterogeneity, and asymmetric information. For the $i$th transaction occurring at , the raw price is defined bywhere is the dynamical part of the microstructure through which inertia is introduced (with our inertia process ) and . The case is of particular importance in the sequel as it represents the nondynamical microstructure case and is used as a calibration model.
The information noise consists of two parts: is a sequence of independent standard Gaussian random variables, ; is an Ornstein-Uhlenbeck (OU)-like inertia velocity process with mean-reverting parameter . Here, , , and are independent and is a constant. provides an intuitive continuous-time model that accommodates the joint presence of the inertia and mean-reversion. Our information noise is more reasonable than that of Zeng [21] in that (1) we preclude the possibility of negative prices by using multiplicative noise; (2) the stochastic inertia process captures the empirical feature of the inertia observed in transaction prices (e.g., Jegadeesh and Titman [35]); (3) the mean-reverting structure of when combined with the inertia captures the cyclic property of prices (e.g., Black [14]). is not a Markov process, so we introduce its historical process aswhich is Markovian. Moreover, , the space of all continuous functions on , since the paths of are continuous. Consequently, we augment the state vector to bewhere is the microstructure noise parameter set. The advantage of this formulation is that we can estimate and thus jointly with other components using particle filtering methods. The generalized state incorporates fair price, volatility, parameters, and the historical trading noise while keeping the tractability of a Markovian framework.
Remark 12. We include neither nor into the model parameters but rather consider different models corresponding to different values of and as well as different SV models 1–5. Indeed, we will provide evidence of inertia in the sequel by using Bayesian methods to select a model with a large value of based upon tick-by-tick stock data.
2.3.3. Rounding and Clustering Noise
Our final modeling goal is to convert uniform raw price into observed whole-price-biased price. While raw price can take any value, the trading price is restricted to multiples of the tick, $1/M$, for some positive integer $M$. The tick size on the New York Stock Exchange (NYSE) was switched to $1/16 from $1/8 on June 24, 1997, and then further to $0.01 on January 29, 2001. The empirical studies suggest that the tick size plays an important role in microstructure market analysis (e.g., Huang and Stoll [39]). Since we are concerned with price clustering for decimal pricing in stock markets, we let $M = 100$.
It is well documented that there is price clustering to more whole prices. To quantify this price clustering, we examine the price behavior for three NYSE-listed stocks over April 2010 (Figure 1 and Table 2). (In a larger study, we considered eight NYSE stocks in different sectors. However, we only report on three here to conserve space. The results for the other five were similar in nature.)

The transaction data of these stocks shows there is modest clustering at multiples of cents as shown in Figure 1, plotted in terms of pennies. Supposing the raw price falls in the interval , then if there was no clustering noise, the trading price would just be . Thus, the probability of trading at with no clustering noise given , would beEquivalently, we can write in terms of the historical process aswhere is the projection onto time ; that is,Clearly, is a smooth function of for each fixed .
To build the observed whole-price bias into our model, we introduce the following sets:While the raw price will be uniformly distributed over (or rather the continuous interval ), the observed price model must bias over , over either or , and so forth. We distribute the observed price randomly over based upon the raw price in a biased manner favoring the more whole-price ticks in . In particular, if the fractional part of the raw price rounded to the nearest cent is in , then the observed value will stay at the same price with probability or move to the closest multiple of cents, that is, the closest tick level in with probability . Then, if the fractional part of the price is in , it will stay in the same level with probability or move to the closest tick level in with probability . Finally, if the fractional part of the price is in , then it will stay in the same level with probability or move to the closest tick level in with probability and the closest tick level in with probability . In summary, the transition probability function is obtained iteratively by the following.
Case 1. If the fractional part of belongs to ,
Case 2. If the fractional part of belongs to ,where
Case 3. If the fractional part of belongs to ,where
Case 4. If the fractional part of belongs to ,
Case 5. If the fractional part of belongs to ,Moreover, we have to handle the case separately to avoid negative prices.
Case 6. For ,
Remark 13. Our clustering setup is designed to work well for intrinsic prices over $1. For real penny stocks, our setup would introduce positive bias and should be modified slightly.
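The stay-or-move mechanism described in this subsection can be sketched as follows. This is a simplified illustration rather than the transition function of Cases 1 to 6: the evenness hierarchy `LEVELS` (nickel, quarter, dollar), the function name `cluster_price`, and the `move_probs` values are all hypothetical, and the sketch promotes a price through one chain of increasingly even levels instead of using the five sets and the fitted probabilities of Table 3.

```python
import random

# Hypothetical evenness hierarchy in cents (assumption): nickel, quarter, dollar.
LEVELS = (5, 25, 100)

def cluster_price(raw_price, move_probs, rng=random):
    """Round a raw price to the cent, then bias it toward more even ticks.

    move_probs[k] is the (hypothetical) probability of moving from a tick that
    is not a multiple of LEVELS[k] to the closest multiple of LEVELS[k];
    otherwise the price stays where it is.
    """
    cents = max(1, round(raw_price * 100))  # never report a price below one cent
    for level, p in zip(LEVELS, move_probs):
        if cents % level != 0 and rng.random() < p:
            nearest = level * round(cents / level)
            cents = max(level, nearest)  # avoid clustering down to zero
    return cents / 100.0

random.seed(0)
observed = [cluster_price(12.34, (0.2, 0.1, 0.05)) for _ in range(10)]
```

Because very small prices can only be moved upward in this sketch, it shares the positive bias for penny stocks noted in Remark 13.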
Using relative frequency analysis on the aggregate of our three stocks, we found the values presented in Table 3.

The large degree of clustering exhibited, especially to the whole dollar, might be considered surprising. However, earlier studies of Huang and Stoll [39], Chung et al. [40], and Chung et al. [41] also showed significant clustering. Moreover, the degree of price clustering on the NYSE is weaker than that on NASDAQ. For example, Barclay [42] examined 472 stocks from NASDAQ before and after their listing on the NYSE or the American Stock Exchange (AMEX): before the listing, the average fraction of even-eighths () is while thereafter it drops to about 56%.
2.4. Nonlinear Filtering Model
Our price process can be formulated as a marked point process : a sequence of random vectors , where denotes the time of thtrade and the corresponding trading price. Accordingly, the mark space of is , where and is all its subsets. Here, corresponds to the thtick level . For each , we associate the counting processto count the trades in tick level set up to time . In particular, for ,denotes the total trades at thtick level until time . Equivalently, we can introduce the random counting measure on byThe natural filtration, that is, information content, of isNow, we assume the following.
(C1) The total trade process admits an intensity for some positive measurable function .
Therefore, using the conditional probabilities defined in the previous subsection, we find that has intensityTo simplify the notation, we rewrite (44) as .
For our present work, we estimated total intensity function from intertrade data allowing for intraday variation. Figure 2 is the intertrade duration histogram of our 3 NYSE-listed stocks averaged over all times of the day. We divided the intertrade data into half-hour periods over the course of the day and took to be constant over these half-hour periods: for in that daily period.
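The half-hour, piecewise-constant intensity estimate can be sketched as below. The function names, the input layout (trade times in seconds after midnight, pooled over several days), and the 9:30 to 16:00 session boundaries are assumptions for illustration; the estimate in each period is simply the trade count divided by the total time observed in that period.

```python
import numpy as np

def half_hour_intensity(trade_times_sec, n_days, open_sec=9.5 * 3600, close_sec=16 * 3600):
    """Estimate trades per second in each half-hour period of the trading day.

    trade_times_sec: seconds after midnight for each trade, pooled over n_days.
    The session hours are an assumption made for this sketch.
    """
    edges = np.arange(open_sec, close_sec + 1, 1800.0)
    counts, _ = np.histogram(trade_times_sec, bins=edges)
    return edges, counts / (1800.0 * n_days)

def intensity_at(t_sec, edges, rates):
    """Piecewise-constant lookup: the rate of the half-hour period containing t_sec."""
    idx = np.searchsorted(edges, t_sec, side="right") - 1
    return rates[np.clip(idx, 0, len(rates) - 1)]

edges, rates = half_hour_intensity(np.array([34210.0, 34220.0, 36005.0]), n_days=1)
```

Pooling over days trades responsiveness for stability; a shorter pooling window would track day-to-day changes in activity at the cost of noisier rates.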
(C2) There exist some positive constants such that for all
Based on representations (40) and (44), our model is framed as a partial-observation model, in which the augmented state is the signal, partially observed through the infinite-dimensional counting process. One difficulty in calibrating these models is that their transition probability functions are usually unknown in closed form, so maximum likelihood estimation (MLE) methods are difficult to use (see Aït-Sahalia and Kimmel [43] for further details). Instead, we use Bayesian filtering because (1) Bayes estimates do not require the availability or regularity of the full likelihood functions; (2) Bayes estimates can be computed recursively for our tick-by-tick data; (3) Bayesian hypothesis tests can be conducted through the Bayes factor, which is the ratio of marginal likelihoods and is easily computed even when the signals are of different dimension or, more generally, singular to each other.
3. Model Calibration
Our foremost goal is to contribute to the process of model building for financial markets both by suggesting elements to be included in the models and proposing methods to select models based on real observation data. To be able to do this effectively, we need to be able to tune each possible model effectively to get good prior (probability distribution) estimates for the complete signal before the test period. We do this through nonlinear filtering and in particular through particle filtering. In this section, we first introduce the filtering equations for our problem. Then, we introduce a branching particle filter algorithm that is an approximation to the unnormalized filter and can be implemented on a computer. Next, we explain how we did the calibration (i.e., came up with this prior distribution) and finally we give the results for the models of interest herein.
3.1. Nonlinear Filtering Equations
The available information about is the observation filtration , defined in (43), and the primary goal of nonlinear filtering is to characterize the conditional distributionor, equivalently,for Here, , is the long memory portion of our information noise and is the state and parameter of our fair price-volatility martingale problem.
Remark 14. Actually, we often only want to estimate , but there is no simple recursive formula for this marginal. The filter is naturally model dependent, so we can produce different filtering processes for each model, that is, for each SV choice (1–5), each value of , and each value of in our inertia process.
Suppose is a positive constant for each such that , and consider the continuous-time likelihood function is a martingale under Condition (C2) and , defined byis called the reference measure. Under , the observations are just a Poisson measure, independent of the state vector , with mean measure . To make the likelihoods more manageable in the particle filters to follow, we choose to be a long-time average value of and to be highest where the trades will be more concentrated. Bayes' theorem (see Bremaud [44], p. 165) then links the desired (real-world) conditional distribution with the unnormalized filter bywhere the unnormalized filter is defined byfor all . Now, we can give the evolution equation for .
Theorem 15. Under (C1) and (C2), the unnormalized filter is the unique measurevalued solution of the stochastic filtering equationfor and .
This theorem is a modest generalization of prior results and can be obtained in much the same manner as results in Kouritzin and Zeng [23] and Xiong and Zeng [22]. Here, is the generator of the joint martingale problem to obtained from , the generator of state and , the generator of the historical process We do not need an explicit formula for . Instead, we can use particle filters to approximate .
Henceforth, it is convenient to think of the reference measure as the standard measure from which we can construct the measure corresponding to model with parameters and microstructure with parameters , , and .
3.2. Particle Filter
The weighted filter is the simplest of particle filters. The idea behind the weighted filter is that, by the independence of the signal from the observations under , we can create an infinite collection of particles , each having the same law as , that are also independent of the observations. Then, it follows from the law of large numbers that for almost all we have the weak convergence of finite measures Unfortunately, it is well known that the weighted particle filter may not work well for a fixed number of particles . Roughly speaking, most of the particles diffuse away, do not track the signal well, are assigned low likelihoods, and do not really affect the average . Meanwhile, very few particles match the observations better and have likelihoods that are orders of magnitude higher than those of the majority of particles. essentially becomes an average over too few particles to reflect well.
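The weight collapse described above can be seen in a small experiment. The following is only a minimal sketch on a toy problem, not the model of this paper: the signal is assumed to be a Gaussian random walk observed in Gaussian noise (rather than through a Poisson measure), and the function name, parameters, and the effective-sample-size diagnostic are illustrative choices.

```python
import numpy as np

def weighted_filter_ess(n_particles=1000, n_steps=50, obs_std=0.5, seed=0):
    """Run a plain weighted (no resampling) particle filter on a toy
    random-walk signal and return the effective sample size at each step."""
    rng = np.random.default_rng(seed)
    signal = 0.0
    particles = np.zeros(n_particles)
    log_w = np.zeros(n_particles)      # log of the particle likelihoods (start at 1)
    ess = []
    for _ in range(n_steps):
        signal += rng.normal()                        # hidden signal moves
        y = signal + obs_std * rng.normal()           # noisy observation
        particles += rng.normal(size=n_particles)     # particles move without seeing y
        log_w += -0.5 * ((y - particles) / obs_std) ** 2   # Gaussian likelihood update
        w = np.exp(log_w - log_w.max())               # rescale for numerical stability
        ess.append(w.sum() ** 2 / (w ** 2).sum())     # effective sample size
    return np.array(ess)

ess = weighted_filter_ess()
print(ess[0], ess[-1])
```

The effective sample size lies between 1 and the number of particles; in this toy setting it typically falls to a handful of particles after a few dozen observations, which is the degeneracy that resampling is meant to repair.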
To fix the weighted filter particle spread problem, we add particle resampling, resulting in the following novel particle filter. (See Gordon et al. [45], Del Moral et al. [46], Del Moral et al. [12], and Ballantyne et al. [47] for earlier algorithms.) For some large , the particle system is constructed as follows.
3.2.1. Initialization
At the initial time , we generate independent particles from the joint prior distribution of . The empirical measure at is where is the Dirac measure at . By the strong law of large numbers, so for almost all . Here, for measures so
Remark 16. When , Note that is a constant function defined on . Whereas most particle filters approximate the filter , we will approximate the unnormalized filter to facilitate Bayesian model selection without the storage of prior filter estimates.
We also initialize the number of particles to and particle likelihoods all to .
3.2.2. Evolution
Between observations at and , the particles, , move independently as samples from the transition probability of . In particular, we use the Euler scheme (see, e.g., Kloeden and Platen [48]) to evolve the dynamics, Examples 3–7 and (25). We let denote the evolved version of .
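As an illustration of the evolution step, the following is a minimal sketch of one Euler step applied to all particles at once. It assumes the standard textbook form of the Heston dynamics, dS = mu S dt + sqrt(V) S dW1 and dV = kappa (theta - V) dt + xi sqrt(V) dW2 with correlation rho, which may differ in parametrization from the Heston example used in this paper. The function name, the parameter names, and the truncation of negative variance at zero inside the square root are assumptions made for the sketch.

```python
import numpy as np

def euler_heston_step(price, var, dt, mu, kappa, theta, xi, rho, rng):
    """Advance arrays of particle prices and variances by one Euler step
    of the (textbook) Heston model. All parameter names are illustrative."""
    v_pos = np.maximum(var, 0.0)                 # truncate so sqrt is defined
    z1 = rng.standard_normal(price.shape)        # drives the price
    z2 = rho * z1 + np.sqrt(1.0 - rho ** 2) * rng.standard_normal(price.shape)
    sq = np.sqrt(v_pos * dt)                     # sqrt(V) * sqrt(dt)
    new_price = price + mu * price * dt + price * sq * z1
    new_var = var + kappa * (theta - v_pos) * dt + xi * sq * z2
    return new_price, new_var

rng = np.random.default_rng(1)
prices = np.full(5, 65.0)      # hypothetical starting prices
variances = np.full(5, 1e-8)   # hypothetical per-second variances
prices, variances = euler_heston_step(
    prices, variances, dt=1.0, mu=0.0, kappa=1e-4, theta=1e-8,
    xi=1e-6, rho=-0.5, rng=rng)
```

Each particle would carry its own parameter values in practice, since the parameters are part of the particle state; passing arrays instead of scalars for mu, kappa, theta, and xi works unchanged because the update is elementwise.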
3.2.3. Particle Weights and Average Weight
We simulate using the reference measure and we incorporate the observations based upon (48). At the th observation , the th particle’s weight is multiplied by where . Hence, the th particle’s weight becomes and the average weight is Note that in (57) by continuous paths. Here, depends on the observation and the increment of likelihood ratio of measure over measure defined by (48) given the simulated particle path realized on the interval . These weights do not depend upon the parameters directly. This is common and is why the observations are often called partial observations. We can still estimate these parameters and include them as part of the particles’ states since they do affect stock price , which is observed in the presence of noise and distortion. The weights are stored along with the states of particles before resampling.
3.2.4. Resampling
After weighting, we resample the particles, pruning the unlikely ones and duplicating the better ones in an unbiased manner. In particular, we let be a Bernoulli random variable independent of everything and produce particles at location . We then give all the particles weight and let
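The resampling step can be sketched as follows. Because the formulas are not reproduced here, the sketch assumes the usual unbiased branching rule: a particle with weight L, where the average weight is A, leaves floor(L/A) offspring plus one more with probability equal to the fractional part of L/A, and every offspring receives weight A. The function and variable names are illustrative.

```python
import numpy as np

def branch_resample(states, weights, rng):
    """Unbiased branching: replace each particle by a random number of
    copies with mean weight/average, all carrying the average weight."""
    avg = weights.mean()                          # average weight A
    ratio = weights / avg                         # expected offspring count
    base = np.floor(ratio)
    extra = rng.random(len(weights)) < (ratio - base)   # Bernoulli(fractional part)
    offspring = (base + extra).astype(int)
    new_states = np.repeat(states, offspring, axis=0)   # copies sit at the parent's location
    new_weights = np.full(len(new_states), avg)
    return new_states, new_weights

rng = np.random.default_rng(2)
states = np.array([0.0, 1.0, 2.0, 3.0])
weights = np.array([2.5, 1.0, 0.4, 0.1])
new_states, new_weights = branch_resample(states, weights, rng)
```

Under this rule the expected total weight after branching equals the total weight before it, and the number of particles varies randomly around its previous value. At least one particle always survives because some weight is at least the average.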
3.2.5. Unnormalized Filter
Now, we can estimate the unnormalized filter at the th observation time, , by
The actual algorithm that was implemented is as follows.
Initialize. are independent samples of , , , for all , and for .
Repeat. For , do
(1) evolve to independently of other particles;
(2) weight by observation: for ;
(3) estimate by ;
(4) average weight: ;
(5) repeat: for do
(a) offspring number: , with being Bernoulli independent of everything;
(b) resample: for ;
(c) add offspring number: .
Remark 17. (i) We extract our estimate before resampling to avoid excess noise. (ii) The key step is (5), which determines the new number of particles and weights in an unbiased manner. The result is zero or more particles, all having the average weight, at the same location as the parent. (iii) The particle evolution would typically be done via Euler’s or Milstein’s method.
Since the above algorithm produces unbiased resampling of the weighted particle filter, it is quite reasonable to believe the following result.
Theorem 18. Under (C1) and (C2), for any and almost all observation paths.
The technicality of this result’s proof would detract from our applications so it is omitted.
3.2.6. Bayesian Estimation
By Bayes rule (50), the particle approximation of the normalized filter is for all . To get our parameter estimates, we can just set to one component of these parameters, that is, or .
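In particle terms, normalizing amounts to dividing a weighted sum over the particles by the sum of the weights. The following is a minimal sketch of that computation; the function name and the layout of the particle array (one row per particle, one column per parameter) are assumptions for illustration.

```python
import numpy as np

def bayes_estimate(param_values, weights):
    """Weighted posterior mean of each parameter over the particles:
    sum_j w_j * x_j / sum_j w_j, computed column by column."""
    param_values = np.asarray(param_values, dtype=float)
    weights = np.asarray(weights, dtype=float)
    return np.average(param_values, axis=0, weights=weights)

# Two particles, two hypothetical parameters per particle.
estimate = bayes_estimate([[1.0, 10.0], [3.0, 30.0]], [1.0, 3.0])
print(estimate)   # [ 2.5 25. ]
```

Because the weights are normalized inside the average, multiplying all weights by a common constant leaves the estimate unchanged, which is why the unnormalized filter suffices for parameter estimation.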
3.3. Calibration and Historical Training
To keep the problem size manageable, we just used the clustering parameter estimates of , , , and given above as the actual values throughout our simulations.
One is often faced with the problem of estimating initial distributions for fair price, volatility, and the parameters prior to filtering over the time interval of interest (April 2010 here). Our approach was to make arbitrary assignments very far in the past (January 3, 2000, to be precise) and then do an excessive amount of prior particle filtering, relying on the ability of the filter to forget its starting point and to produce reasonable distributions at a much later point, April 1, 2010. (See, e.g., Ocone and Pardoux [49], Delyon and Zeitouni [50], and Atar [51] for mathematical results regarding this phenomenon.) This had to be done for every model, namely, every combination of our three stocks, five SV models, and multiple microstructure models, characterized by inertia parameters. Our main purpose in this historical training was to get a starting joint distribution for as of April 1, 2010, under each model combination. Due to the large number of cases this produced, we first display and discuss two models: the nondynamical microstructure Heston case and the median inertia dynamical case where and s (i.e., hrs) in the inertia microstructure model. Also, to ensure that and did not converge to a single value, we made them vary slightly in a random manner; that is, we replaced the equation with for a very low variance Brownian motion .
In Figure 3, we illustrate our prior filtering of PepsiCo. The choppiest curve is the actual stock price while the smoothest curve is the filter’s fair price estimate using the Heston SV model with (median) microstructure inertia. The middle curve is the filter’s fair price estimate using the Heston SV model without dynamics in the microstructure; that is, . These curves go beyond April 1, 2010. However, the required initial distributions were taken from the filter at that point.
Notice from Figure 3 that the implied fair price process estimate is far less volatile in the presence of dynamical microstructure than without. This lower volatility for fair price is highly desirable. It does not make sense that the fair price of a stock should fluctuate dramatically from day to day or within a day in the absence of an event; rather, these short-term fluctuations are better explained by trading noise. Moreover, the fair price estimate can be viewed as a mathematically optimal counterpart of the moving averages that are used to judge value and momentum, and so fair price estimates should inherit the smooth nature of such moving averages. From a modeling perspective, this fair price smoothness indicates that dynamical microstructure (with inertia) can replace much of what stochastic volatility tries to do and leads to one of our central questions addressed below: is stochastic volatility necessary in the presence of dynamical microstructure?
3.4. Numerical Results
The data consist of one month (April 2010) of transaction prices of our three NYSE-listed stocks. Our filter produces Bayes estimates of the macro- and microparameter vectors and , respectively. These estimates in the nondynamical microstructure case (i.e., using the simpler form in (24)) for PepsiCo are shown in Table 4.

All parameters are estimated using time in seconds. Our PepsiCo Bayes estimates in the median inertia case are as shown in Table 5.
