Journal of Probability and Statistics

Volume 2019, Article ID 8740426, 24 pages

https://doi.org/10.1155/2019/8740426

## Hierarchical Models and Tuning of Random Walk Metropolis Algorithms

Département de Mathématiques et Statistique, Université de Montréal, Montréal, Canada H3C 3J7

Correspondence should be addressed to Mylène Bédard; mylene.bedard@umontreal.ca

Received 15 April 2019; Revised 4 July 2019; Accepted 17 July 2019; Published 26 August 2019

Academic Editor: Hyungjun Cho

Copyright © 2019 Mylène Bédard. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

#### Abstract

We obtain weak convergence and optimal scaling results for the random walk Metropolis algorithm with a Gaussian proposal distribution. The sampler is applied to hierarchical target distributions, which form the building block of many Bayesian analyses. The global asymptotically optimal proposal variance derived may be computed as a function of the specific target distribution considered. We also introduce the concept of locally optimal tunings, i.e., tunings that depend on the current position of the Markov chain. The theorems are proved by studying the generator of the first and second components of the algorithm and verifying their convergence to the generator of a modified RWM algorithm and a diffusion process, respectively. The rate at which the algorithm explores its state space is optimized by studying the speed measure of the limiting diffusion process. We illustrate the theory with two examples. Applications of these results on simulated and real data are also presented.

#### 1. Introduction

Random walk Metropolis (RWM) algorithms are widely used to sample from complex or multidimensional probability distributions [1, 2]. The simplicity and versatility of these samplers often make them the default option in the MCMC toolbox. Implementing a RWM algorithm involves a tuning step, to ensure that the process explores its state space as fast as possible and that the sample produced be representative of the probability distribution of interest (the target distribution). In this paper, we solve an aspect of the tuning problem for a large class of target distributions with correlated components. This issue has mainly been studied for product target densities, but attention has recently turned towards more complex target models [3, 4]. The specific type of target distribution considered here is formed of components which are related according to a hierarchical structure. These distributions are ubiquitous in several fields of research (finance, biostatistics, and physics, to name a few) and constitute the basis of many Bayesian inferences.

Bayesian hierarchical models comprise a likelihood function , which is the statistical model for the observed data . The parameters are then modeled using a prior distribution ; since this prior might not be easy to determine, it is common practice to assume that the hyperparameters are themselves distributed according to a noninformative prior distribution . The various models thus represent different levels of hierarchy and give rise to a posterior distribution , which is often quite complex. Most of the time, this distribution cannot be studied analytically or sampled directly, and thus simulation algorithms such as MCMC methods are required to perform a statistical analysis. Samplers such as the RWM, RWM-within-Gibbs, and Adaptive Metropolis (see [5]) are usually the default algorithms for such targets.

The idea behind RWM algorithms is to build a Markov chain having the Bayesian posterior (target) distribution as its stationary distribution. To implement this method, users must select a proposal distribution from which candidates for the Markov chain are generated. This distribution should ideally be similar to the target, while remaining accessible from a sampling viewpoint. A pragmatic choice is to let the proposed moves be normally distributed around the latest value of the sample. Tuning the variance of the normal proposal distribution ($\sigma^2$) has a significant impact on the speed at which the sampler explores its state space (hereafter referred to as “efficiency”), with extreme variances leading to slow-mixing algorithms. In particular, large variances seldom induce suitable candidates and result in lazy processes; small variances yield hyperactive processes whose tiny steps lead to a time-consuming exploration of the state space. Seeking an intermediate value that optimizes the efficiency of the RWM algorithm, i.e., a proposal variance offering sizable steps that are still accepted a reasonable proportion of the time, is called the optimal scaling problem.

The optimal scaling issue of the RWM algorithm with a Gaussian proposal has been addressed by many researchers over the last few decades. It has been determined in [6] that target densities formed of independent and identically distributed (i.i.d.) components correspond to an optimal proposal variance $\hat{\sigma}^2 \approx 5.66 / \{ n \, \mathbb{E}[((\log f(X))')^2] \}$, where $f$ is the density of one target component and $n$ the number of target components. This optimal proposal variance has also been shown to correspond to an optimal expected acceptance rate of 23.4%, where the acceptance rate is defined as the proportion of candidates that are accepted by the algorithm. Generalizing this conclusion is an intricate task, and further research on the subject has mainly been restricted to the case of target distributions formed of independent components (see [7–12]). In the specific case of multivariate normal target distributions, however, the optimal variance and acceptance rate may be easily determined (see [11, 13]). Lately, [3, 4] have also performed scaling analyses of nonproduct target densities. These advances are important, as MCMC methods are mainly used when dealing with complex models, which only rarely satisfy the independence assumption among target components. These results, however, assume that the correlation structure among target components is known and used in generating candidates for the chain. This is a restrictive assumption that leads, as expected, to an optimal acceptance rate of 23.4% (see [12] for an explanation).
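As a quick numerical check of the i.i.d. benchmark quoted above, the following sketch maximizes the speed measure $h(\ell) = 2\ell^2 \Phi(-\ell\sqrt{I}/2)$ of [6], where $I = \mathbb{E}[((\log f(X))')^2]$ is the roughness of one target component and the proposal variance is $\ell^2/n$. The function names and the golden-section search are illustrative choices, not part of the paper. For a standard normal component ($I = 1$), the search should recover $\ell \approx 2.38$ (so $\ell^2 \approx 5.66$) and an acceptance rate close to 0.234.

```python
import math


def std_normal_cdf(z: float) -> float:
    """Cumulative distribution function of a standard normal variable."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def iid_acceptance(ell: float, roughness: float = 1.0) -> float:
    """Asymptotic acceptance rate 2 * Phi(-ell * sqrt(I) / 2) for i.i.d. targets."""
    return 2.0 * std_normal_cdf(-ell * math.sqrt(roughness) / 2.0)


def iid_speed(ell: float, roughness: float = 1.0) -> float:
    """Speed measure ell^2 times the asymptotic acceptance rate."""
    return ell ** 2 * iid_acceptance(ell, roughness)


def optimal_ell(roughness: float = 1.0, tol: float = 1e-8) -> float:
    """Golden-section search for the maximizer of the (unimodal) speed measure."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = 1e-3, 10.0 / math.sqrt(roughness)
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    while b - a > tol:
        if iid_speed(c, roughness) > iid_speed(d, roughness):
            b = d  # the maximum lies in [a, d]
        else:
            a = c  # the maximum lies in [c, b]
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
    return 0.5 * (a + b)


ell_hat = optimal_ell(roughness=1.0)
print(f"ell = {ell_hat:.3f}, ell^2 = {ell_hat ** 2:.3f}, "
      f"acceptance = {iid_acceptance(ell_hat):.3f}")
```

Changing `roughness` rescales the optimal $\ell$ by $1/\sqrt{I}$ but leaves the acceptance rate unchanged, which is why the 23.4% rule is so convenient in the i.i.d. setting.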

In this paper, we focus on solving the optimal scaling problem for a wide class of models that include a dependence relationship, the hierarchical distributions. Weak convergence results are derived without explicitly characterizing the dependency among target components, and thus rely on a Gaussian proposal distribution with diagonal covariance matrix. The optimal proposal variance may then be obtained from these results, i.e., by maximizing the speed measure of the limiting diffusion process. These results constitute a significant advance in understanding the theoretical underpinnings of the RWM sampler. More importantly in practice, the results theoretically support the use of RWM-within-Gibbs over RWM samplers and provide a convenient approach for obtaining a new type of proposal variance. These proposal variances are a function of the current state of the Markov chain; they thus evolve with the chain and lead to more appropriate candidates in the RWM-within-Gibbs algorithm.

In the next section, we describe the target distribution and introduce some notation related to the RWM sampler. The theoretical optimal scaling results are stated in Section 3 and then illustrated with two examples using RWM samplers in Section 4. In Section 5, the potential of RWM-within-Gibbs with local scalings is illustrated in Bayesian contexts through a simulation study and an application on real data. Extensions are briefly discussed in Section 6, while appendices contain proofs.

#### 2. Framework

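Consider an $n$-dimensional target distribution consisting of a mixing component $X_1$ and $n-1$ conditionally i.i.d. components $X_i$ ($i = 2, \ldots, n$) given $X_1$. Suppose that this distribution has a target density $\pi$ with respect to Lebesgue measure, where

$$\pi(\mathbf{x}) = f_1(x_1) \prod_{i=2}^{n} f(x_i \mid x_1). \tag{1}$$

To obtain a sample from the target density in (1), we rely on a RWM algorithm with a Gaussian proposal distribution. This sampler builds an $n$-dimensional Markov chain $\{\mathbf{X}[j];\ j = 0, 1, \ldots\}$ having $\pi$ as its stationary density. Given $\mathbf{X}[j] = \mathbf{x}$, the time-$j$ state of the Markov chain, one iteration is performed according to the following steps:

(1) Generate a candidate $\mathbf{y}$ from a $\mathcal{N}(\mathbf{x}, \Sigma)$, where $\Sigma$ is a diagonal variance matrix with elements $\sigma_i^2$. In particular, set $\Sigma = (\ell^2 / n) I_n$, where $\ell > 0$ is a tuning parameter and $I_n$ the $n$-dimensional identity matrix

(2) Compute the acceptance probability $\alpha(\mathbf{x}, \mathbf{y}) = \min\{1, \pi(\mathbf{y}) / \pi(\mathbf{x})\}$

(3) Generate $U \sim \mathcal{U}(0, 1)$

(4) If $U \le \alpha(\mathbf{x}, \mathbf{y})$, accept the candidate and set $\mathbf{X}[j+1] = \mathbf{y}$; otherwise, the Markov chain remains at the same state for another time interval and $\mathbf{X}[j+1] = \mathbf{x}$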
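The four steps above can be sketched in a few lines of Python. This is a minimal illustration rather than the implementation used for the paper's experiments. It assumes the isotropic proposal variance $\ell^2/n$ and, purely as an example, a normal-normal hierarchical target with $X_1 \sim \mathcal{N}(0, 1)$ and $X_i \mid X_1 \sim \mathcal{N}(X_1, 1)$; the function names `log_target` and `rwm` are hypothetical.

```python
import numpy as np


def log_target(x):
    """Log density (up to a constant) of an assumed toy hierarchical target:
    X_1 ~ N(0, 1) and X_i | X_1 ~ N(X_1, 1) for i = 2, ..., n."""
    return -0.5 * x[0] ** 2 - 0.5 * np.sum((x[1:] - x[0]) ** 2)


def rwm(log_pi, x0, ell, n_iter, rng):
    """Random walk Metropolis with proposal N(x, (ell^2 / n) I_n)."""
    n = x0.size
    sigma = ell / np.sqrt(n)
    chain = np.empty((n_iter + 1, n))
    chain[0] = x0
    log_px = log_pi(x0)
    n_accepted = 0
    for j in range(n_iter):
        # Step 1: Gaussian candidate centred at the current state.
        y = chain[j] + sigma * rng.standard_normal(n)
        # Step 2: log acceptance probability (symmetric proposal).
        log_py = log_pi(y)
        log_alpha = min(0.0, log_py - log_px)
        # Steps 3-4: uniform draw on (0, 1], then accept or stay put.
        if np.log(1.0 - rng.uniform()) <= log_alpha:
            chain[j + 1] = y
            log_px = log_py
            n_accepted += 1
        else:
            chain[j + 1] = chain[j]
    return chain, n_accepted / n_iter


rng = np.random.default_rng(0)
n = 20
x1 = rng.standard_normal()
x0 = np.concatenate(([x1], x1 + rng.standard_normal(n - 1)))  # start in stationarity
chain, acc_rate = rwm(log_target, x0, ell=2.0, n_iter=5000, rng=rng)
print(f"acceptance rate: {acc_rate:.3f}")
```

Working with log densities avoids numerical underflow of $\pi(\mathbf{y})/\pi(\mathbf{x})$ in moderate to large dimensions.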

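Optimal scaling results widely rely on the use of Gaussian proposal distributions which, due to their symmetry, lead to a simplified form of the acceptance probability. Although generally not emphasized in the literature, we note that the proposal variance could also be a function of the current state $\mathbf{x}$, which would result in a nonhomogeneous random walk sampler. In that case, there would be no simplification in the Metropolis–Hastings acceptance probability and Step 2 would then be replaced by

$$\alpha(\mathbf{x}, \mathbf{y}) = \min\left\{1, \frac{\pi(\mathbf{y}) \, q(\mathbf{y}, \mathbf{x})}{\pi(\mathbf{x}) \, q(\mathbf{x}, \mathbf{y})}\right\},$$

where $\mathbf{y}$ is the candidate and $q(\mathbf{x}, \cdot)$ is the density of a $\mathcal{N}(\mathbf{x}, \Sigma(\mathbf{x}))$.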
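To make the role of the proposal densities concrete, here is a hedged sketch of the Metropolis–Hastings log acceptance probability when the proposal variance depends on the current state. It assumes an isotropic proposal $\mathcal{N}(\mathbf{x}, v(\mathbf{x}) I_n)$; the helper names and the variance function `var_fn` are hypothetical and only serve the illustration.

```python
import numpy as np


def log_gaussian_density(y, mean, var):
    """Log density of N(mean, var * I) evaluated at y."""
    d = y.size
    return -0.5 * d * np.log(2.0 * np.pi * var) - 0.5 * np.sum((y - mean) ** 2) / var


def mh_log_acceptance(log_pi, x, y, var_fn):
    """log of min{1, pi(y) q(y, x) / (pi(x) q(x, y))}, with q(x, .) = N(x, var_fn(x) I)."""
    log_q_xy = log_gaussian_density(y, x, var_fn(x))  # density of moving x -> y
    log_q_yx = log_gaussian_density(x, y, var_fn(y))  # density of moving y -> x
    return min(0.0, log_pi(y) + log_q_yx - log_pi(x) - log_q_xy)


def log_pi(x):
    """Standard normal log density (up to a constant), used only as an example target."""
    return -0.5 * np.sum(x ** 2)


x = np.array([1.0, 0.2])
y = np.array([0.5, -1.0])

# Constant variance: the proposal densities cancel and the usual RWM ratio is recovered.
print(mh_log_acceptance(log_pi, x, y, lambda s: 0.3), min(0.0, log_pi(y) - log_pi(x)))

# State-dependent variance: the ratio of proposal densities now matters.
print(mh_log_acceptance(log_pi, x, y, lambda s: 0.1 + s[0] ** 2))
```

With a constant `var_fn`, the two printed values in the first line agree, which is the simplification exploited by the homogeneous sampler.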

In what follows, we work towards finding the optimal value of the tuning parameter $\ell$, i.e., the value leading to an optimally mixing chain. The proofs of the theoretical results rely on CLTs and LLNs; as such, the results are obtained by letting $n \to \infty$. This is a common approach in MCMC theory and does not prevent users from applying the asymptotically optimal value of $\ell$ in lower dimensional contexts (as small as or 15). Indeed, a particularity of optimal scaling results is that the asymptotic behaviour kicks in extremely rapidly, as shall be witnessed in the examples of Section 4.

The first thought of most MCMC users when facing a target density as in (1) would be to use a RWM-within-Gibbs algorithm, which consecutively updates subgroups of the components in a given iteration. The tuning of RWM-within-Gibbs algorithms has been addressed in [11], but only for target distributions with i.i.d. components and Gaussian targets with correlation. Focusing on RWM algorithms is thus a good starting point to understand the behaviour of samplers applied to hierarchical target distributions. The results expounded in this paper lead to the concept of local tunings, which is particularly appealing in the context of RWM-within-Gibbs. Incidentally, the proofs in appendices provide a theoretical justification for the use of locally optimal scalings in RWM-within-Gibbs (see [14]). These findings are illustrated in the examples of Section 5.

In Sections 2.1, 2.2, and 3, we expound how to obtain asymptotically optimal variances and for RWM and RWM-within-Gibbs, respectively. Section 2.1 describes the regularity conditions imposed on , while Section 2.2 explains why the proposal matrix is the optimal choice for obtaining the theoretical results that shall be presented in Section 3.

##### 2.1. Assumptions on the Target Density

To characterize the asymptotic behaviour of the conditionally i.i.d. components (), we impose some regularity conditions on the densities and in (1). The density is assumed to be a continuous function on, with forming an open interval.

For all fixed , is a positive density on and is Lipschitz continuous with constant such that . Here, denotes the space of real-valued functions with continuous second derivative. For all fixed , is a function on and is Lipschitz continuous with constant such that . Furthermore,and, hereafter, the notation means that the expectation is computed with respect to conditionally on the other variables in the expression; the first expectation in (3) is thus obtained according to the conditional distribution of given . Where there is no confusion possible, shall be used to denote an expectation with respect to all random variables in the expression. The above regularity conditions constitute an extension of those stated in [8] for target distributions with independent components and are weaker than would be a Lipschitz continuity assumption on the bivariate function . They also imply that the Lipschitz constants and themselves satisfy a Lipschitz condition.

We now impose further conditions on to account for the movements of the coordinate when studying the asymptotic behaviour of a component (). These movements should not be too abrupt, so for almost all fixed , is Lipschitz continuous with constant such that and

Finally, in order to characterize the asymptotic behaviour of the mixing component , we introduce assumptions that are closely related to the Bernstein von Mises Theorem. Let , , and denote convergence in probability. Assume that , and denote such that as , with . Hereafter, we make a small abuse of notation by letting and sometimes denote the random variable or the realisation, depending on the context. Furthermore, define ; for almost all , the conditional density of given , , is assumed to converge almost surely to , a continuous density on with respect to Lebesgue measure. In fact, the information on increases linearly in , meaning that the limiting density of is degenerate, but that a standard rescaling leads to a nontrivial density on (normal distribution).

##### 2.2. Form of the Proposal Variance Matrix

In Section 3, we focus on deriving weak convergence and optimal scaling results for the RWM algorithm with a Gaussian proposal by letting , the dimension of the target density in (1), approach . Traditionally, asymptotically optimal scaling results have been obtained by studying the limiting path of a given component ( say) as . In the case of target distributions with i.i.d. components (and some extensions), the components of the RWM algorithm are asymptotically independent of each other and their limiting behaviour is regimented by identical one-dimensional Markovian processes. In the current correlated framework, we expect the presence of an asymptotic dependence relationship among () and , in the spirit of (1). In the following section, we thus study the limiting behaviour of components and separately, on their respective conditional space. This approach allows us to quantify the mixing rate of each component conditionally on the others and to propose optimal scalings for the sampler.

To obtain nontrivial limiting processes describing the behaviour of the RWM sampler as $n \to \infty$, we need to fix the form of the proposal scalings $\sigma_i$ ($i = 1, \ldots, n$). Whilst the proposals are independent, a single accept-reject step is used, which makes the paths of the components dependent. We aim to choose the maximal scalings that avoid a degenerate limit (of either 0 or 1) for this acceptance probability. Since the distribution of $X_1$ conditional on $X_2, \ldots, X_n$ contracts at a rate of $n^{-1/2}$, if $\sigma_1 \gg n^{-1/2}$, the proposed jumps in $X_1$ will be too large. If $\sigma_1 \ll n^{-1/2}$, then the change in $X_1$ makes no contribution to the acceptance probability in the limit; to maximise movements, we therefore require $\sigma_1 \propto n^{-1/2}$. Now, the conditional distribution of $X_i$ ($i = 2, \ldots, n$) given $X_1$ does not contract with $n$. Nonetheless, when proposing jumps in $X_2, \ldots, X_n$ using scalings $\sigma_i$ that do not decrease with $n$, the odds of rejecting an $n$-dimensional candidate increase with $n$ and lead to a degenerate (null) acceptance probability. To overcome this problem, we then let the proposal variance be a decreasing function of the dimension. In fact, since Lipschitz conditions control the contribution to the accept-reject ratio coming from the movements of $X_1$, a similar argument to that which leads to $\sigma_i \propto n^{-1/2}$ in the case of i.i.d. targets applies again here. We therefore set $\Sigma = (\ell^2 / n) I_n$, where $\ell > 0$ is a tuning parameter and $I_n$ the $n$-dimensional identity matrix.
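The degeneracy argument can be illustrated with a small simulation. The sketch below is not taken from the paper: it uses the simpler i.i.d. standard normal benchmark (an assumption made to keep the example short) and compares a proposal standard deviation that is fixed in the dimension with one that decreases like $n^{-1/2}$. The function name `empirical_acceptance` is hypothetical.

```python
import numpy as np


def empirical_acceptance(n, sigma, n_iter=2000, seed=0):
    """Acceptance rate of a RWM sampler on an n-dimensional standard normal target."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(n)  # start in stationarity
    log_px = -0.5 * np.sum(x ** 2)
    accepted = 0
    for _ in range(n_iter):
        y = x + sigma * rng.standard_normal(n)
        log_py = -0.5 * np.sum(y ** 2)
        if np.log(1.0 - rng.uniform()) <= log_py - log_px:
            x, log_px = y, log_py
            accepted += 1
    return accepted / n_iter


for n in (10, 50, 100):
    fixed = empirical_acceptance(n, sigma=1.0)                 # variance fixed in n
    scaled = empirical_acceptance(n, sigma=2.38 / np.sqrt(n))  # variance ell^2 / n
    print(f"n = {n:3d}: fixed sigma -> {fixed:.3f}, scaled sigma -> {scaled:.3f}")
```

With the fixed scaling, the acceptance rate collapses towards 0 as $n$ grows, whereas the $n^{-1/2}$ scaling keeps it stable and close to the 0.234 benchmark.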

As , it becomes necessary to speed up time to compensate for the reduced movement along components . The time interval between each proposed candidate is thus set to , and we study the continuous-time, sped up version of the initial Markov chain defined as , where is the floor function. Similarly to the i.i.d. case, a limiting diffusion is obtained for the rescaled one-dimensional process related to (), but this time its behaviour is conditional on .

Since the first coordinate converges to a point , a transformation is required to obtain the limiting behaviour of this component. We thus study the continuous-time process ; in other words, we are now looking at a magnified, centered version of the path associated to . This transformation leads to proposal distributions and , with ; it thus cancels the effect of in . Without the speed up of time, the limiting process for is then a propose-accept-reject on the conditional density for , given the current values of ; this is made precise in Theorem 1. When considering the diffusion limit for with time sped-up, this effectively means that at every instant, is simply a sample from its conditional distribution given the current values of ; this is made precise in Theorem 2.

We note that an alternative scaling of could also be applied. The sped-up limiting process would then be a diffusion for all coordinates and would be easier to describe. However, this would also be a deliberate handicapping of the algorithm since the change in would make no contribution to the acceptance probability in the limit. A suboptimal , besides altering the movements of , would thus also indirectly affect the efficiency according to which explores its state space.

#### 3. Asymptotics of the RWM Algorithm

In this section, we introduce results about the limiting behaviour (as ) of the time- and scale-adjusted univariate processes and (). From these results, we determine the asymptotically optimal scaling (AOS) values and acceptance rate (AOAR) that optimize the mixing of the algorithm.

Hereafter, we let denote weak convergence in the Skorokhod topology and a Brownian motion at time ; the cumulative distribution function of a standard normal random variable is denoted by .

Theorem 1. *Consider a RWM algorithm with proposal distribution used to sample from a target density as in (1). Suppose that satisfies the conditions on and specified in Section 2.1 and that is distributed according to in (1).*

Ifwiththen the magnified process . Here, and () are distributed according to the densities and , respectively, which implies that is distributed according to the density in Section 2.1. Given the time- state , the process evolves as the continuous-time version of a special RWM algorithm applied to the target density ; the proposal distribution of this algorithm is a , and the acceptance rule is defined as

*Proof. *See Appendix A.1.

This result describes the limiting path associated to the coordinate as , which is Markovian with respect to the history of the multidimensional chain . We recall that the conditional distribution of given contracts at a rate of and that . Conditionally on , the transformed thus mixes according to and explores its conditional state space much more efficiently than the other components, as shall be witnessed in Theorem 2. The asymptotic process found can be described as an atypical one-dimensional RWM algorithm, whose acceptance rule and target density both vary according to at every iteration. The acceptance function in (7) satisfies the reversibility condition with respect to (see [8] for more details about this acceptance function).

Theorem 1 is interesting from a theoretical perspective but cannot be used to optimize the global mixing of the algorithm. Although we could try to determine the value of leading to the optimal mixing of on its conditional space, it will be wiser to focus instead on optimizing the mixing rate of on its own conditional space given . Since the distribution of contracts about , the position of this coordinate heavily depends on the current state of . We shall also see in Theorem 2 that given , the coordinates () explore their conditional state space according to . Since these coordinates take more time exploring their conditional distribution and heavily affect the position of , the global performance of the sampler is subjected to the mixing of conditionally on .

Theorem 2. *Consider a RWM algorithm with proposal distribution used to sample from a target density as in (1). Suppose that satisfies the conditions on and specified in Section 2.1 and that is distributed according to in (1).*

For , we have , where is distributed according to , and according to . Conditionally on , the evolution of over an infinitesimal interval satisfieswith

*Proof. *See Appendix A.2.

Equation (8) describes the behaviour of the process at the next instant, (), given its position at . This expression should not come as a surprise: each rescaled component () asymptotically behaves according to a diffusion process that is Markovian with respect to . Examination of (8) also tells us that is invariant for this diffusion process (see [15], for instance). We finally recall that and therefore, conditionally on , the rescaled mixes according to . Each coordinate thus requires more iterations than were required by the coordinate to explore its conditional state space.

Since and use different time rescaling factors, the asymptotic behaviour of these coordinates cannot be expressed as a bivariate diffusion process. To obtain such a diffusion, we would have to rely on inhomogeneous proposal variances to ensure that also mixes in iterations; as mentioned at the end of Section 2, this would require setting , for . This framework would, of course, be suboptimal as it would restrain the movements. Proposed jumps for would then become insignificant, and so the first term in (11) would be null.

*Remark 1. *Studying the limiting behaviour of and () separately does not cause information loss. In fact, studying the paths of simultaneously would require letting the test function of the generator in (A.3) be a function of . Such a generator would however be developed as an expression in which cross-derivative terms (e.g., ) are null, which confirms that given the current state of the asymptotic process, one-dimensional moves are performed independently for each coordinate.

The limiting processes in Theorems 1 and 2 indicate that the component explores its conditional state space at a different (higher) rate than explores its own. Combined to the specific Markovian forms of the limiting processes obtained (with respect to and , respectively), this points towards the need for updating and separately, assessing the superiority of RWM-within-Gibbs samplers for sampling from hierarchical targets. These algorithms update blocks of components successively, a design that allows fully exploiting the characteristics of the target considered. To our knowledge, this is the first time that asymptotic results are used to theoretically validate the superiority of RWM-within-Gibbs over RWM samplers for hierarchical target distributions. This theoretical superiority is obviously tempered in practice by an increased computational effort; the extent of this computational overhead is however difficult to quantify in full generality. To this end, Section 5 presents two examples that illustrate the performance of the RWM-within-Gibbs and compare it to RWM and Adaptive Metropolis samplers.

##### 3.1. Optimal Tuning of the RWM Algorithm

To be confident that the -dimensional chain has entirely explored its state space, we must be certain that every one-dimensional path has explored its own space. In the correlated framework considered, the overall mixing rate of the RWM sampler is only as fast as the slowest component. As explained in Section 3, optimal mixing of the algorithm shall be attained by optimizing the mixing of the coordinates , . In the limit, the only quantity that depends on the proposal variance (i.e., on ) is in (9). To optimize mixing, it thus suffices to find the diffusion process that goes the fastest; i.e., the value of for which the speed measure is optimized.

The speed measure in (9) is quite intuitive; it is in fact similar to that obtained when studying i.i.d. target densities. The main difference lies in the form of which, in the i.i.d. case, is given by the constant term . The second term in (11) is thus equivalent to and consists in a measure of roughness of the conditional density under a variation of (). In the case of hierarchical target distributions, we find an extra term that might be viewed as a measure of roughness of under a variation of . This term is weighted by , the square of the (standardized) candidate increment for the first component; in other words, the further the candidate is from the current , the greater is the weight attributed to the associated measure of roughness. Of course, in optimizing the speed measure function, we do not need to know in advance the exact value of the proposed standardized increment ; the speed measure averages over this quantity.

It is interesting to note that optimizing the speed measure leads to local proposal variances of the form . Such proposal variances would then be used for proposing a candidate at the next instant , given the position of the mixing coordinate at time . These local proposal variances thus vary from one iteration to another, by opposition to usual tunings in the literature that are fixed for the duration of the algorithm. Naturally, if both expectations in (11) are constant with respect to , then the proposal variance obtained by maximizing the speed measure also is constant.

*Remark 2. *It turns out that local proposal variances optimizing (9) are bounded above by , the asymptotically optimal scaling (AOS) values for targets with i.i.d. components given a fixed . Indeed, if is fixed across iterations, we find ourselves in an i.i.d. setting and the associated speed measure is expressed as . The mentioned upper bounds then follow from the fact that the function in (9) decreases faster in than in the above expression.

Relying on a local variance to propose a candidate for the next time interval is usually time-consuming, as it involves numerically solving for the appropriate local proposal variance at every iteration. Since the process is assumed to start in stationarity and explores its conditional state space faster than the other coordinates, we might determine a value that is fixed across iterations by integrating the speed measure over with respect to the marginal distribution . Hence, the global (unconditional) asymptotically optimal scaling value maximizes the functionwhere is the probability density function of a standard normal random variable.

*Remark 3. *The asymptotic process introduced in Theorem 2 naturally leads us to the concept of local proposal variances. It is however unclear whether the local tunings obtained by maximizing (9) really optimize the mixing rate of the algorithm. Indeed, the proof of Theorem 2 is carried out with constant; this allows, among other things, relying on the simplified form for the acceptance probability. In order to claim that the local proposal variances obtained are optimal, a weak convergence result would need to be proven using a general proposal variance of the form . This extension is not trivial, as the ratio of proposal densities would then need to be included in the acceptance probability. Since the concept of locally optimal proposal variances is numerically demanding in the current framework, we choose to focus on constant.

In RWM-within-Gibbs, the blocks and are updated consecutively and the situation is therefore different. In that case, local variances of the form obtained by maximizing (9) may be used to update the block . Since is updated separately, the first term in (11) is null, which makes local variances easier to compute. Furthermore, since local variances only depend on (which is updated separately), the ratio is equal to 1 and does not need to be included in the acceptance probability. Local variances are thus very appealing in that context and shall be studied in Section 5.
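A minimal RWM-within-Gibbs sketch along these lines is given below. It assumes, for illustration only, the normal-normal target $X_1 \sim \mathcal{N}(0, 1)$, $X_i \mid X_1 \sim \mathcal{N}(X_1, 1)$; the function `local_ell` is a hypothetical placeholder for a local tuning (constant here, since $x_1$ acts as a location parameter in this toy model), and the remaining names are likewise illustrative. Because the local variance depends only on $x_1$, which is held fixed while the second block is updated, the proposal is symmetric within that block and no ratio of proposal densities appears.

```python
import numpy as np


def log_f1(x1):
    """Assumed log density of the mixing component: X_1 ~ N(0, 1)."""
    return -0.5 * x1 ** 2


def log_f(x_rest, x1):
    """Assumed conditional log density, summed over i: X_i | X_1 ~ N(X_1, 1)."""
    return -0.5 * np.sum((x_rest - x1) ** 2)


def local_ell(x1):
    """Hypothetical local tuning ell(x_1); a real model would make this depend on x_1."""
    return 2.38


def accept(log_ratio, rng):
    """Metropolis accept-reject decision on the log scale."""
    return np.log(1.0 - rng.uniform()) <= log_ratio


def rwm_within_gibbs(x0, ell_1, n_iter, rng):
    x = np.array(x0, dtype=float)
    n = x.size
    chain = np.empty((n_iter + 1, n))
    chain[0] = x
    n_accepted = np.zeros(2)
    for j in range(n_iter):
        # Block 1: update x_1 given (x_2, ..., x_n).
        y1 = x[0] + ell_1 / np.sqrt(n) * rng.standard_normal()
        log_ratio = (log_f1(y1) + log_f(x[1:], y1)) - (log_f1(x[0]) + log_f(x[1:], x[0]))
        if accept(log_ratio, rng):
            x[0] = y1
            n_accepted[0] += 1
        # Block 2: update (x_2, ..., x_n) given x_1, using the local variance ell(x_1)^2 / n.
        sigma = local_ell(x[0]) / np.sqrt(n)
        y = x[1:] + sigma * rng.standard_normal(n - 1)
        log_ratio = log_f(y, x[0]) - log_f(x[1:], x[0])
        if accept(log_ratio, rng):
            x[1:] = y
            n_accepted[1] += 1
        chain[j + 1] = x
    return chain, n_accepted / n_iter


rng = np.random.default_rng(1)
chain, rates = rwm_within_gibbs(np.zeros(20), ell_1=2.4, n_iter=2000, rng=rng)
print(f"acceptance rates (block 1, block 2): {rates[0]:.3f}, {rates[1]:.3f}")
```

Each block has its own accept-reject step, so each iteration evaluates the target twice; this is the computational overhead mentioned when comparing RWM-within-Gibbs with RWM.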

Rather than tuning the sampler using the global AOS value, one may instead monitor the acceptance rate in order to work with an optimally mixing version of the RWM algorithm. To express optimal scaling results in terms of acceptance rates, we introduce the expected acceptance rate of the -dimensional stationary RWM algorithm with a normal proposal:where denotes the probability density function of an -dimensional standard normal random variable. Optimal mixing results for the RWM sampler are summarized in the following corollary.

Corollary 1. *In the settings of Theorem 2, the global asymptotically optimal scaling value maximizes*

Furthermore, we have thatand the corresponding asymptotically optimal acceptance rate is given by .

In contrast to the i.i.d. case, the AOAR found is not independent of the densities and . Hence, there is not a huge advantage in choosing to tune the acceptance rate of the algorithm over the proposal variance; in fact, both approaches involve the same effort. Although it would also be possible to compute an overall acceptance rate associated to using local proposal variances, it could not be used to tune the algorithm. Building an optimal Markov chain based on local proposal variances would imply modifying the proposal variance at every iteration, which cannot be achieved by solely monitoring the acceptance rate.

For simplicity, the theoretical results expounded in this section attribute the same tuning constant to all components. In practice, when a RWM algorithm is used to sample from a hierarchical target, users will likely want to use a different proposal variance for the mixing component . In fact, the proofs of Theorems 1 and 2 easily generalize to the case of inhomogeneous proposal variances.

Corollary 2. *Let with and , where are independent. Then, Theorems 1 and 2 hold as stated, except that the limiting proposal distribution in Theorem 1 is and the random variable in Theorem 2 is such that .*

In this paper, we consider the simple, yet useful hierarchical model described in (1) and featuring a single mixing component . This is a natural starting point to study weak convergence of RWM algorithms for hierarchical targets, and even for correlated targets in general. There exist many generalizations of (1), just as there are many extensions of the proposal distribution considered. Some extensions of the hierarchical target are considered in the discussion, but we do not aim at presenting a detailed treatment of these cases.

#### 4. Numerical Studies

To illustrate the theoretical results of Section 3, we consider two toy examples: the first target distribution considered is a normal-normal hierarchical model in which the components are related through their mean, while the second one is a gamma-normal hierarchical model in which are related through their variance. In both cases, we show how to compute the optimal variance . We also study the performance of RWM samplers and conclude that even in relatively low-dimensional settings, the samplers behave according to the asymptotic results previously detailed.

##### 4.1. Normal-Normal Hierarchical Distribution

Consider an -dimensional hierarchical target such that and for . To sample from this distribution, we use a RWM algorithm with a proposal distribution. This simple target shall relate Theorem 2 to the theoretical results derived in [8].

Standard calculations lead to ; as , almost surely. If we let and , then . Furthermore, the term is reexpressed as and thus converges in probability to , where denote independent standard normal random variables. By Theorem 1, we can thus affirm that the component asymptotically behaves according to a one-dimensional RWM algorithm with a standard normal target and acceptance function as in (7); these do not, in the current case, depend on .
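As a worked illustration of these standard calculations, assume the unit-variance parameterization $X_1 \sim \mathcal{N}(0, 1)$ and $X_i \mid X_1 \sim \mathcal{N}(X_1, 1)$ for $i = 2, \ldots, n$ (an assumption made here for concreteness). Completing the square in $x_1$ gives

$$\pi(x_1 \mid x_2, \ldots, x_n) \propto \exp\left\{ -\frac{x_1^2}{2} - \frac{1}{2} \sum_{i=2}^{n} (x_i - x_1)^2 \right\} \propto \exp\left\{ -\frac{n}{2} \left( x_1 - \frac{1}{n} \sum_{i=2}^{n} x_i \right)^2 \right\},$$

so that, under this assumption,

$$X_1 \mid X_2, \ldots, X_n \sim \mathcal{N}\left( \frac{1}{n} \sum_{i=2}^{n} X_i, \ \frac{1}{n} \right).$$

The conditional standard deviation thus shrinks like $n^{-1/2}$, and the centered variable magnified by $\sqrt{n}$ is exactly standard normal, in line with the standard normal target of the limiting one-dimensional RWM algorithm.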

Evaluating the function in (11) is a simple task and leads to . The AOS value is then found by maximizingwith respect to , where . This yields an AOS of and a corresponding AOAR of . These values are naturally smaller than those obtained for a target with i.i.d. components (5.66 and 0.234, respectively); indeed, the proposal distribution is formed of i.i.d. components and accordingly better suited for similar targets. Relying on a proposal with correlated components would however require a certain understanding of the target correlation structure, which goes against the general framework we wish to consider.

It is worth pointing out that the speed measure of the limiting diffusion process does not depend on in the present case. This holds for arbitrary densities and satisfying the conditions in Section 2.1, provided that is a location parameter for (). Since a variation in the location parameter does not perturb the roughness of the distribution, the AOS and AOAR found are valid both locally and globally. This means that , which remains fixed across iterations, is the best possible proposal scaling conditionally on the last position of the component (i.e., ).

A second peculiarity of this example is that the target distribution is jointly normal with mean and covariance matrix given by , (), and (). Normal distributions being invariant under orthogonal transformations, we can find a transformation under which the target components become mutually independent. The covariance matrix is thus transformed into a diagonal matrix whose diagonal elements consist in the eigenvalues of . In moderate to large dimensions, the eigenvalues can be approximated by . It turns out that the optimal scaling problem for target distributions of this sort (i.e., formed of components that are i.i.d. up to a scaling term) has been studied in [13]. Solving for the AOS value and AOAR of the transformed target using Theorem 1 and Corollary 2 in [8] leads to values that are consistent with those obtained using Theorem 2 in Section 3.
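The eigenvalue structure can be checked numerically. The sketch below assumes the unit-variance parameterization $X_1 \sim \mathcal{N}(0, 1)$, $X_i \mid X_1 \sim \mathcal{N}(X_1, 1)$, under which $\mathrm{Var}(X_1) = 1$, $\mathrm{Var}(X_i) = 2$, and all covariances equal 1; the function name `covariance` is hypothetical. In that case, $n - 2$ eigenvalues equal 1, and the remaining two multiply to 1 and sum to $n + 1$, so they are roughly $n + 1$ and $1/(n + 1)$.

```python
import numpy as np


def covariance(n):
    """Covariance matrix of (X_1, ..., X_n) under the assumed parameterization
    X_1 ~ N(0, 1) and X_i | X_1 ~ N(X_1, 1) for i = 2, ..., n."""
    cov = np.ones((n, n))          # Var(X_1) = 1 and every covariance equals 1
    idx = np.arange(1, n)
    cov[idx, idx] = 2.0            # Var(X_i) = 2 for i >= 2
    return cov


n = 20
eig = np.linalg.eigvalsh(covariance(n))  # eigenvalues in ascending order
print("smallest:", eig[0], " largest:", eig[-1])
print("all others equal to 1:", np.allclose(eig[1:-1], 1.0))
```

After an orthogonal change of coordinates, the target is therefore a product of normals that are identically distributed up to two differently scaled components, which is the setting of targets that are i.i.d. up to a scaling term.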

To illustrate these theoretical results, we consider the 20-dimensional normal-normal target described above and run 50 RWM algorithms that differ by their proposal variance only. For each sampler, we perform 100,000 iterations (sufficient for convergence according to the autocorrelation function) and measure efficiency by recording the average squared jumping distance

$$\mathrm{ASJD} = \frac{1}{N} \sum_{j=1}^{N} \sum_{i=1}^{n} \left( x_i[j] - x_i[j-1] \right)^2,$$

where $N$ is the number of iterations and $n$ is the dimension of the target distribution. We also record the average acceptance rate of each algorithm, expressed as

$$\frac{1}{N} \sum_{j=1}^{N} \mathbb{1}\left\{ \mathbf{x}[j] \neq \mathbf{x}[j-1] \right\}.$$
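Both diagnostics are straightforward to compute from a stored chain. The sketch below assumes the chain is held as an array of shape $(N + 1, n)$, with one row per state including the initial one; the function names are illustrative.

```python
import numpy as np


def asjd(chain):
    """Average squared jumping distance, summed over coordinates and averaged over iterations."""
    jumps = np.diff(chain, axis=0)
    return np.mean(np.sum(jumps ** 2, axis=1))


def acceptance_rate(chain):
    """Proportion of iterations in which the chain moved to a new state."""
    moved = np.any(np.diff(chain, axis=0) != 0.0, axis=1)
    return np.mean(moved)


# Tiny hand-checkable chain: jumps of squared length 1, 0 and 4.
toy_chain = np.array([[0.0, 0.0],
                      [1.0, 0.0],
                      [1.0, 0.0],
                      [1.0, 2.0]])
print(asjd(toy_chain), acceptance_rate(toy_chain))
```

For the ASJD of a single component, such as the first coordinate, the same function can be applied to `chain[:, [0]]`.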

We repeat these steps for 50- and 100-dimensional normal-normal targets and combine all three curves of efficiency versus acceptance rate on a graph along with the theoretical efficiency curve of versus the expected acceptance rate (Figure 1(b), bottom set of curves). To assess the limiting behaviour of the coordinate , we also plot the ASJD of this single component (for the 20-, 50-, and 100-dimensional cases) along with the ASJD for the limiting one-dimensional RWM sampler described in Theorem 1 (Figure 1(a), top set of curves).