Abstract

The concepts of standard analysis techniques applied in the field of Fourier spectroscopy treat fundamental aspects insufficiently. For example, the spectra to be inferred are influenced by the noise contribution to the interferometric data, by nonprobed spatial domains which are linked to Fourier coefficients above a certain order, by the spectral limits which are in general not given by the Nyquist assumptions, and by additional parameters of the problem at hand like the zero-path difference. To consider these fundamentals, a probabilistic approach based on Bayes’ theorem is introduced which exploits multivariate normal distributions. For the example application, we model the spectra by the Gaussian process of a Brownian bridge stated by a prior covariance. The spectra themselves are represented by a number of parameters which map linearly to the data domain. The posterior for these linear parameters is analytically obtained, and the marginalisation over these parameters is trivial. This allows the straightforward investigation of the posterior for the involved nonlinear parameters, like the zero-path difference location and the spectral limits, and hyperparameters, like the scaling of the Gaussian process. With respect to the linear problem, this can be interpreted as an implementation of Ockham’s razor principle.

1. Introduction

Fourier spectroscopy is a diagnostic application which reveals information about spectral quantities like refractive index, absorption, and transmission of a medium under test. In addition, the characterisation in absolute terms is possible for broadband spectra, for example, those emitted by the electrons of a magnetically confined high-temperature plasma [1].

Commonly, an interferometer diagnostic, for example of the Michelson [2] or Martin-Puplett [3] design type, probes the Fourier transform of a spectral quantity. The corresponding interferometric data is a discrete set of finite length and includes noise contributions. Standard Fourier data analysis techniques [4-6] have been developed. These techniques fail to describe and capture properly several fundamental aspects, like the noisy nature of measured data and possible spectral limits and their impact on the spectral quantity to be inferred.

One misconception, arising from the standard formulation, is that certain spectral information must be lost inherently, because only a finite amount of data is acquired. This is justified in the standard literature by evaluating the convolution function, which has a finite full width at half maximum (FWHM), implying that only a finite number of Fourier coefficients is accessible via measurements. While this conclusion remains valid when a continuous spectrum is probed, the reasoning does not hold in general for a discrete spectrum. This fact was exploited to develop a (self-)deconvolution procedure, so that some discrete lines which were separated by less than the FWHM of the convolution function have been inferred [7].

In contrast to the standard data analysis techniques, a probabilistic ansatz was introduced to estimate model parameters, like amplitude spectra and frequencies, in the field of Fourier spectroscopy [8, 9]. For a spectroscopic problem, a model is formulated via Bayes’ theorem, which allows prior information about the model parameters to be stated. Furthermore, a Gaussian likelihood functionally connects the parameters with the noisy interferometric data. Then, after having measured a noisy data set and framing the reality by a certain model, the knowledge about model parameters is expressed in probabilistic terms by the posterior. This approach [8, 9] demonstrated that the uncertainty on the posterior mean of a frequency to be estimated for a single-frequency problem can be orders of magnitude below the FWHM of the equivalent convolution function. On top of that, a criterion has been derived for how far frequencies need to be separated, so that the Bayesian approach is still able to make a distinction. This separation can be well below the FWHM of the convolution function. These findings are a direct consequence of the use of probabilistic theory.

One of the advantages of the Bayesian approach is that different models and, hence, their assumptions can be compared with each other, which implements the principle known as Ockham’s razor. This enables the identification of the best, that is, the most likely, model complying with the data. Given this context, the (self-)deconvolution procedure mentioned above is interpreted here as an optimisation to find a minimal set of discrete frequencies that describes the data sufficiently. In general, the most fundamental issue is whether the spectrum to be inferred is discrete or continuous. If a discrete spectrum is more likely or follows from a physics model, how many discrete frequencies are involved, and what are their estimates including uncertainties? Each frequency is associated with an amplitude and a phase which need to be estimated as well. If a continuous spectrum is at hand, then the spectral limits are of interest. In addition, if the underlying physics process is understood, what is the uncertainty on the spectrum and the phase following from experimentally inaccessible regions in the data domain?

The investigation of the fundamental issues listed above is quite challenging from a numerical point of view. However, the computational effort is largely reduced when even and odd amplitudes are used instead of the phase and amplitude. This ensures a linear dependence between the even and odd amplitude parameters and the data. Formulating the prior information about all even and odd amplitudes as a multivariate normal with a specific prior mean and covariance straightforwardly gives the posterior mean and covariance. Furthermore, the marginalisation can be carried out analytically for these linear parameters. The remaining posterior quantity carries information about the nonlinear model parameters, like frequencies or spectral limits, and the so-called hyperparameters, which enter merely in the prior.

In Section 2, the basic equations for Fourier spectroscopy and their implications are generally investigated, and the main concepts of the standard analysis and their drawbacks are pointed out. Section 3 of this paper presents a Bayesian formalism, so that it lends itself to applications in the field of Fourier spectroscopy for Gaussian (white) noise. The fundamental information about the spectral quantity, that it must vanish at a lower and upper spectral limit, can be stated by the covariance function of a Brownian bridge process as described in Section 4. This covariance is used as prior in the example application of the Bayesian approach presented in Section 5 to infer continuous but band-limited spectral quantities given an actually measured interferometric data set. Thereby, some diagnostic imperfections like a drifting signal offset, the zero-path difference, and a nonuniform spatial sampling are also taken into account. Section 6 discusses the strategy for a plausibility study of models, using different priors for the spectral quantities, and attempts to compare results and computational efforts obtained with the Bayesian model and with a standard model. The last section presents the conclusions.

2. Fourier Spectroscopy

2.1. General Definitions

Commonly, the complementary coordinates used for Fourier transformations are the frequency and time , or the wavenumber and a spatial coordinate . For the moment, the latter pair is used to state the basic operations of Fourier transformation. Afterwards, the wavenumber is replaced via for convenience.

The real-valued continuous functions and form a Fourier transformation pair stated by The inverse operation reads Note that so far holds. To find a relation, one inserts (1) in the above expression. After applying trigonometric identities, the spatial integral becomes because the sinusoidal contribution vanishes. Hence, follow, because the two wavenumber coordinates equal each other. In fact, the delta distribution occurs, because is a distribution.

Replacing with gives in the spectral domain the quantity . This gives and rescales the inverse operation like to match the units /Hz.

2.2. Selection of Representation

The spectral domain over which the integration is performed in (5) includes the whole negative and the positive ranges. While the cosine transform acts on the even part of , the sine transform is linked only to the odd part . Thus, an alternative formulation reads and it becomes clear that the even and the odd parts of and are connected.
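The split into even and odd parts used above can be illustrated numerically. The following is a minimal sketch, assuming samples on a grid that is symmetric about the origin; the function name `even_odd_parts` and the test function are hypothetical and not taken from the paper.

```python
import numpy as np

def even_odd_parts(values):
    """Split samples on a grid symmetric about zero into even and odd parts."""
    values = np.asarray(values, dtype=float)
    mirrored = values[::-1]              # samples of f(-x) on the same grid
    even = 0.5 * (values + mirrored)     # part seen by a cosine transform
    odd = 0.5 * (values - mirrored)      # part seen by a sine transform
    return even, odd

# Hypothetical test function on a symmetric grid (not from the source).
x = np.linspace(-1.0, 1.0, 201)
f = np.exp(-x**2) + 0.3 * x * np.exp(-x**2)
f_even, f_odd = even_odd_parts(f)
```

The two parts add up to the original samples, and each of them can be treated separately, which is the property the alternative formulation relies on.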

Another representation uses the amplitude and the phase by setting which gives Because the model representations (5) and (7) are linear in , , and , both are favoured over the formulation (9) when linear inversion techniques are to be applied. Of the two favoured representations, the one using even and odd parts has preferable properties. Since the orders of magnitude of the amplitudes can be quite different for the even and odd functions, a separation is logical.
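Why the even/odd form is linear while the amplitude/phase form is not can be checked for a single sinusoid. This is a sketch under an assumed sign convention (phase subtracted in the cosine argument); the paper's own convention is not reproduced here, and all names and values are hypothetical.

```python
import numpy as np

def to_even_odd(amplitude, phase):
    """Convert amplitude and phase to even (cosine) and odd (sine) amplitudes."""
    return amplitude * np.cos(phase), amplitude * np.sin(phase)

theta = np.linspace(0.0, 2.0 * np.pi, 100)   # stands for the sinusoid argument
amplitude, phase = 1.7, 0.4                  # hypothetical values
a_even, a_odd = to_even_odd(amplitude, phase)

# Nonlinear in the phase:
signal_nonlinear = amplitude * np.cos(theta - phase)
# Linear in the even and odd amplitudes:
signal_linear = a_even * np.cos(theta) + a_odd * np.sin(theta)
```

Both expressions describe the same signal, but only the second one can be written as a fixed matrix acting on the unknown amplitudes.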

2.3. Finite Bandwidth

If the spectral domain is band-limited, such that and are finite for the range from and with the bandwidth and the centre frequency , (7) becomes Here, it must be mentioned that both functions are assumed to have the same spectral limits. This is assumed in the following but is not mandatory. Furthermore, for any combination of and the relation must be fulfilled to be meaningful.

2.3.1. Relation: Fourier Transform-Fourier Coefficients

When the bandwidth is finite, one can express the spectral functions by Fourier coefficients, each multiplied with the associated sinusoidal basis function of order (). These coefficients are defined here by the integrals which carry the unit as and . The coefficients label the mean values of and in the spectral domain covered. Then, one can replace the even and odd functions with which allows performing the spectral integration in (10) analytically. Since the result with follows, it becomes clear that the Fourier transform of a band-limited function can be expressed by Fourier coefficients scaled with the bandwidth and each multiplied with the associated continuous basis function in the spatial domain. These basis functions have two contributions. The first is a sum/difference of two sinc functions which depend on the order , the spatial coordinate, and the bandwidth. The latter quantity determines the spatial width of . Furthermore, the localisation is permitted at , where a coefficient for a given mainly acts. Hence, increasing the order implies the localisation at a larger distance from the spatial origin. This explains the occurrence of factor 2 for for which both sinc functions coincide.

The second contribution causes a modulation of and is given by a sine/cosine with the centre frequency and spatial coordinate in the argument. This dependency makes the basis function for vanish at the spatial origin.

With respect to the spatial origin, the transformed basis functions of the coefficients for and are symmetric and antisymmetric, respectively.

Some basis functions in the spatial domain are shown in Figures 1(a) and 1(b) for = 500 GHz and = 1000 GHz.

2.3.2. Embedding into Larger Spectral Domain

The functions and may be finite in the spectral domain with limits and or centre frequency and bandwidth . Embedding this domain in a larger one with limits and (), another set of Fourier coefficients and is obtained with associated basis functions for the spectral domain. Without going into more detail here, these coefficients can be evaluated from and and the scalar products of the basis functions labeled with and . For instance, one finds for the ratio of the means . Basically, and maximise the information per coefficient when and are known. For example, a function which is constant inside a spectral domain and zero outside appears as a boxcar function from outside this domain. Thus, the only coefficient is mapped to an infinite number of coefficients and which are mandatory to capture both discontinuities.

In the spatial domain, the basis functions for and behave differently than the ones for and . Important to mention is the effect which the larger bandwidth has; are spatially narrower than . In addition, the number of coefficients per spatial domain increases which is expressed by .

For = 1873.7 GHz and Figure 2 shows some basis functions (, 1, and 2) in the spatial domain. Indeed, the basis function for with = 500 GHz and = 1000 GHz (see Figure 1(a)) is broader.

2.4. Parseval’s Theorem

Parseval’s theorem states abstractly that the length of a function in the spectral domain equals the length of its Fourier transform counterpart in the spatial domain. The length of the band-limited function is stated by because only the term remains odd and cancels by the integration. Furthermore, the scaling by the factor appears which originates in . Replacing with the expression (12) and exploiting that the basis functions are perpendicular in the spectral domain leaves Thus, the length is given by the sum of the Fourier coefficients squared. According to Parseval’s theorem, the length in the spectral and spatial domain remains unchanged. Inserting (13) in the above expression implies that the spatial basis functions must be orthogonal for , and the spatial integral yields for and for . Analytically, this is hard to prove; however, this was numerically investigated and is considered to be valid.
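A discrete analogue of this statement is easy to verify. The sketch below uses the discrete Fourier transform from NumPy rather than the continuous band-limited formulation of the paper, so it only illustrates the principle; the test signal is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
samples = rng.normal(size=256)        # arbitrary real-valued test signal
spectrum = np.fft.fft(samples)        # unnormalised forward DFT (NumPy default)

# Squared length in the sample domain and in the transform domain.
length_data = np.sum(samples**2)
length_spectrum = np.sum(np.abs(spectrum)**2) / samples.size
```

With the default NumPy normalisation, the factor `1 / samples.size` plays the role of the scaling factor that appears in the continuous case.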

2.4.1. Square-Integrable Functions

The function is said to be square-integrable, when the condition holds. Furthermore, if is square-integrable, then the Fourier series representations in (12) converge towards and almost everywhere in the spectral domain as the order grows [10]. Hence, the requirement on to be square-integrable seems reasonable.

2.5. Interferometric Data and Basic Model
2.5.1. Ideal and Real-World Interferometer

The Fourier transform can be performed by an interferometer, achieving an optical path difference between two partial beams, and the real-valued function can be sampled. From a theoretical point of view, an ideal interferometer diagnostic acquires a purely symmetric and noiseless interferogram. However, a real-world interferometer suffers from diagnostic imperfections like, for example, dispersion of any kind and/or misalignment. As a consequence, any acquired interferogram is to some degree asymmetric, and, hence, an odd feature is inherent due to the measuring principle. Furthermore, a measurement always involves noise.

2.5.2. Spatial Sampling and Implications

In the spatial domain, is sampled at a finite set of optical path difference locations with , and marks the number of sample points. Usually, the sampling with constant increment between subsequent locations is preferred which puts constraints on the diagnostic design. Furthermore, the spatial origin is most likely missed by the sampling, and, thus, the absolute value of might be unknown. If so, it is mandatory to introduce the zero-path difference which is in the following set that holds.

The finite spatial sampling leaves undetermined between the sampling nodes and outside the limits and . Assuming holds, the Nyquist theorem states that the maximum frequency accessible is given by the Nyquist frequency . Hence, for the spectral quantities and to be inferred, a maximum for the upper limit follows from sampling theory. To prevent aliasing, needs to be chosen small enough so that and vanish below . If necessary, can be acted on by reducing the diagnostic throughput via optical filters, the transmission line, the detector sensitivity, and postdetection amplifier settings. In fact, solely by these precautions, one can make sure that no other band well above contributes to . If and only if no such band exists, then the interferogram is smooth with respect to the chosen sampling nodes, and failing to sample exactly at has no profound impact.

A diagnostic limitation is that the distance is finite, and, thus, no sampling is achieved below and above these limits. To gain information about and or the phase (see (8)), needs to be sampled on both sides of the spatial origin, so that the asymmetric feature in the interferogram is captured. Hence, in the following and, thus, are set to be negative, and the double-sided region is identified for the locations . This diminishes the maximum optical path difference achievable, being positive, and, thus, the length of the single-sided domain is identified by the relation . Because scales with the order of the Fourier coefficients (see Section 2.3), only a finite number of coefficients can be probed. According to Parseval’s theorem (see Section 2.4), information about the total length is missing. Furthermore, the Gibbs phenomenon, that is, a ringing, is present when and are inferred. Hence, should be maximal, so that as many coefficients as possible can be probed to decrease the loss of information. However, a trade-off between the lengths of the single-sided and double-sided domains is inevitable, depending on the level of the asymmetric imperfection.

2.5.3. Noise Contribution

Since any measurement has a noise contribution, the noisy data value can be written as , and the actual interferometric data is expressed by the vector . As spectral quantities are investigated, photons are involved in the measuring principle, and, hence, a part of has a Poissonian origin. However, the diagnostic under investigation later probes broadband spectra in the microwave and far-infrared range, and, thus, a large number of photons are present. Hence, the central limit theorem suggests that is a sample of a normal distribution with vanishing mean and a certain variance given by the squared noise level . In any case, dedicated diagnostics tests are mandatory to characterise for a given interferometer.

2.5.4. Basic Model

The combination of the relation (10) with the interferometric data, being noisy and sampled in a finite spatial domain, gives the most basic model. Formally, this model is stated here by with the additional information: the zero-path difference location , the spatial limits are restricted by , and , the sampling increment is constant and known, and have the same spectral limits which obey and , and in case of Gaussian noise . To be precise, the unknowns of the model are , , , and , and .

The basic model is a starting point and must be amended by diagnostic imperfections and specifics to the interferometer design type.

2.6. Inferring Spectra by Standard Analysis Techniques

To infer the spectral quantities and from an interferometric data set , the standard techniques rely on a noiseless model and follow a hierarchical ansatz. After making assumptions on the spectral limits and , the zero-path difference location is estimated. The next step evaluates a phase which is a measure of the ratio , relying on the data located in the double-sided region. Given the model, the spectral limits, the spatial origin, and the phase, and are estimated up to the Nyquist frequency from the whole data set. To reduce the Gibbs phenomenon on the inferred spectral quantities, the interferometric data are multiplied by window functions. In the following, the weak points of the standard analysis techniques are described.

2.6.1. Noiseless Model

The model used and stated by (18) lacks the noise contribution by definition, and, hence, treats the noisy data as being not noisy. Strictly speaking, the model is not applicable for the problem at hand. Only by repeating the measurement sufficiently often so that the total noise contribution given by the vector becomes small, implying , the model applies. Because the integration time remains finite or only a single measurement is possible, the influence of the noise on the inferred quantities cannot be derived from the noiseless model.
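The statement that repeated measurements shrink the noise contribution can be made quantitative for independent Gaussian noise: averaging a number of scans reduces the noise level by the square root of that number. The following sketch uses hypothetical values for the noise level and the number of scans; it is not a description of the diagnostic in the paper.

```python
import numpy as np

rng = np.random.default_rng(1)
noise_level = 0.5        # hypothetical single-scan noise standard deviation
n_samples = 10_000       # samples per scan
n_scans = 100            # number of repeated scans

# Pure-noise scans; a real interferogram would add the same signal to each row.
scans = rng.normal(0.0, noise_level, size=(n_scans, n_samples))
averaged = scans.mean(axis=0)

residual_noise = averaged.std()
expected_noise = noise_level / np.sqrt(n_scans)
```

The residual noise decreases only slowly with the number of scans, which is why a finite integration time leaves a noise contribution that a noiseless model cannot account for.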

2.6.2. Hierarchical Ansatz

Several steps are carried out to deduce the quantities of main interest and . Each step relies on model assumptions, which are not questioned or tested in any way, and results of previous steps, which carry an unstated uncertainty. This hierarchical ansatz lacks the uncertainty propagation onto and entirely.

2.6.3. Spectral Limits: Nyquist Assumptions

Two fundamental assumptions, called Nyquist assumptions in the following, are made by setting the spectral limits to 0 and the Nyquist frequency (see Section 2.5.2). Hence, the chosen spatial sampling would determine the bandwidth of the spectrum which is a misconception. Furthermore, the associated Fourier coefficients are located apart via their basis functions in the spatial domain, and the maximum order probed is artificially blown up to due to the Nyquist assumptions (see Section 2.3). However, if the functions and are finite in the spectral domain with limits and , then the embedding of the smaller domain into the domain leads to the reduction of the information content per Fourier coefficient as discussed in Section 2.3.2. Figure 2 compares the narrower spatial basis functions for the Nyquist assumptions with μm, that is, = 0 GHz and = 3747.4 GHz ( = 1873.7 GHz, ) with the wider basis function for the absolute term for = 0 GHz and = 1000 GHz ( GHz, = 1000 GHz).
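The Nyquist frequency quoted above can be related to the sampling increment. The sketch below assumes the common convention that the Nyquist frequency equals the speed of light divided by twice the optical path difference increment; the increment of 40 μm is an assumption chosen because it reproduces the quoted value of 3747.4 GHz.

```python
SPEED_OF_LIGHT = 299_792_458.0   # m/s

def nyquist_frequency(path_increment_m):
    """Nyquist frequency in Hz for a constant optical path difference increment."""
    return SPEED_OF_LIGHT / (2.0 * path_increment_m)

# Assumed increment of 40 micrometres (not stated explicitly in the text above).
f_nyquist_ghz = nyquist_frequency(40e-6) / 1e9   # about 3747.4 GHz
```

Halving the increment doubles this frequency, which illustrates that the sampling, and not the physics of the source, would set the spectral limit under the Nyquist assumptions.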

The uncertainty of a Fourier coefficient, relying on the Nyquist assumptions, scales like (noise level/maximum bandwidth) which follows from the linear uncertainty propagation for (13). But for the band-limited case, the uncertainty would scale like , where the square root term states that more than one data point is related to one coefficient. Hence, if a band-limitation exists but is not taken into account, then the uncertainty is maximised on the inferred spectral quantities.

2.6.4. Estimation of Spatial Origin

The spatial origin or zero-path difference is most likely missed by the spatial sampling. One of the standard approaches to estimate fits a parabola to the main interferogram peak without any information about the even and odd spectra themselves. However, as one can see from (13), the basis functions for the even and odd absolute terms ( and ) are of leading order close to the spatial origin. Hence, information about the zeroth-order coefficients and the spectral limits should be at hand for the estimation of .

With available, the double-sided and single-sided regions are identified. However, a systematically biased estimate of the origin causes an additional asymmetry in the interferometric data which would result in an increase of and a decrease in which is usually interpreted as a phase ramp feature. Hence, the origin should be determined with the criterion that it minimises the odd spectral function.

2.6.5. Windowing

Having only a finite number of Fourier coefficients probed causes the Gibbs phenomenon to appear for the spectral quantities inferred. To reduce this ringing feature, window functions are applied in the spatial domain to bring the interferometric data smoothly to zero towards the sampling limits. More precisely, probed Fourier coefficients of higher orders are damped out, and a window function corresponds to a certain convolution function in the spectral domain. Hence, a weighted averaging of the spectral quantities is carried out which reduces the ringing. This approach can give a good global approximation of and for regions with no significant gradients. However, the damping of Fourier coefficients worsens the convergence of the inferred quantities in regions with considerable gradients.

Implicitly, the application of window functions excludes the investigation of the uncertainty on and introduced by nonprobed Fourier coefficients. Hence, the requirement of square-integrability of the spectral functions is not taken into account.
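The windowing step described in this subsection can be sketched as follows. The Hann window and the sinc-shaped interferogram are assumptions made for illustration only; the text does not state which window functions are used in practice.

```python
import numpy as np

n_points = 513
x = np.linspace(-1.0, 1.0, n_points)     # normalised optical path difference
interferogram = np.sinc(8.0 * x)         # hypothetical noiseless interferogram

window = np.hanning(n_points)            # Hann window, zero at both ends
windowed = interferogram * window        # data brought smoothly to zero
```

The window leaves the centre of the interferogram untouched and suppresses the outer samples, which is exactly where the higher-order Fourier coefficients are localised.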

3. Bayesian Formalism

3.1. Bayes’ Theorem

The joint probability density function (pdf) captures the chance that the outcome , let us say a data value or set, and the outcome , a single model parameter or a set, are realised simultaneously.

The product rule introduces the conditional probabilities for finding the outcome , if the outcome were true and vice versa. By the theorem of Bayes, one conditional probability can be expressed by the other, when the marginal distributions and are known. Hence, Bayes’ theorem captures the information/knowledge gained for when a certain outcome for has manifested. For the pdfs occurring in Bayes’ theorem, common names are used, that is, the posterior , the likelihood , the evidence , and the prior . The link or functional dependence enters in the likelihood which takes into account known uncertainties like, for example, measurement noise. Any knowledge about before new data is available can be found in the prior .

Bayes’ rule can be extended by introducing a set of hyperparameters which enters per definition solely in the prior . The additional pdf is called hyperprior which allocates trust in . Apart from having the posterior for the parameters , the marginalisation with respect to reveals the posterior for which measures the plausibility of an outcome of the hyperparameter given the data. Since does not depend on , the most likely hyperparameter set is identified by the maximum of , assuming is uniform.
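The roles of likelihood, prior, evidence, and posterior can be made concrete on a parameter grid. The following is a minimal sketch for a single datum with Gaussian noise and a uniform prior; the grid, the datum, and the noise level are hypothetical, and the variable names are not the symbols of the paper.

```python
import numpy as np

theta = np.linspace(-5.0, 5.0, 1001)     # parameter grid (hypothetical)
d_theta = theta[1] - theta[0]
datum, noise_level = 1.2, 0.8            # hypothetical measurement and noise level

# Gaussian likelihood of the datum for each parameter value on the grid.
likelihood = np.exp(-0.5 * ((datum - theta) / noise_level) ** 2) / (
    noise_level * np.sqrt(2.0 * np.pi))
# Uniform prior over the grid range.
prior = np.full_like(theta, 1.0 / (theta[-1] - theta[0]))

# Evidence: marginalisation of likelihood times prior over the parameter.
evidence = np.sum(likelihood * prior) * d_theta
posterior = likelihood * prior / evidence
```

The evidence is a single number for a given model, which is why it drops out when locating the posterior maximum but matters when models are compared.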

3.2. Formalism for Linear, Nonlinear, and Hyperparameter Problem for Gaussian Noise
3.2.1. Multivariate Normal

Let the joint pdf for the random vector () be a multivariate normal with mean () and covariance matrix (); then the pdf becomes with the determinant .
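For numerical work, the logarithm of the multivariate normal pdf is usually evaluated instead of the pdf itself. The sketch below implements the standard textbook expression; the function and argument names are assumptions and do not follow the notation of the paper.

```python
import numpy as np

def mvn_logpdf(x, mean, cov):
    """Log of the multivariate normal pdf with the given mean and covariance."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    mean = np.atleast_1d(np.asarray(mean, dtype=float))
    cov = np.atleast_2d(np.asarray(cov, dtype=float))
    diff = x - mean
    # slogdet avoids overflow or underflow of the determinant.
    sign, logdet = np.linalg.slogdet(cov)
    if sign <= 0:
        raise ValueError("covariance must be positive definite")
    # Quadratic form diff^T cov^-1 diff without forming the inverse.
    mahalanobis = diff @ np.linalg.solve(cov, diff)
    return -0.5 * (x.size * np.log(2.0 * np.pi) + logdet + mahalanobis)
```

For a diagonal covariance the result reduces to a sum of one-dimensional Gaussian log-densities, which provides a simple consistency check.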

3.2.2. Model for Linear Problem

If the dependency between the data and the parameters of interest is linear, and the likelihood and the prior can be expressed by multivariate normals, then the evaluation of the posterior is analytically straightforward. Such a model is the starting point for investigating a more complex model which includes parameters with a nonlinear mapping to the data domain and/or hyperparameters.

(a) Gaussian Likelihood. The data may be represented by the vector (). The parameters of interest () map linearly to the data domain like where the dimensional matrix M encodes the linear mapping, and captures the random noise contribution. When the data is acquired independently, and the noise is independent for each datum and follows a Gaussian with vanishing mean and standard deviation (noise level), then the Gaussian likelihood can be found with the covariance matrix .

(b) Gaussian Prior. The prior information about may be expressed by the multivariate normal with the prior mean and the prior covariance .

(c) Gaussian Posterior and Evidence. Formally, Bayes’ theorem states the posterior by After some algebra, one can show that the posterior is a multivariate normal with posterior mean and covariance which are both analytically obtained. Furthermore, the evidence reads where the first part depends explicitly on the measured data, and the second part, being dimensionless, incorporates the ratio dependent on the means and covariances of the prior and posterior.
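The analytic posterior for the linear Gaussian model can be sketched with the standard textbook result for independent noise of equal level on all data. Apart from the mapping matrix `M`, all names are assumptions. The evidence is written here in its data-space form, which is algebraically equivalent to, but not necessarily the same factorisation as, the two-part expression referred to above.

```python
import numpy as np

def linear_gaussian_posterior(M, d, noise_level, prior_mean, prior_cov):
    """Posterior mean, covariance, and log evidence for d = M a + noise."""
    M = np.asarray(M, dtype=float)
    d = np.asarray(d, dtype=float)
    prior_mean = np.asarray(prior_mean, dtype=float)
    prior_cov = np.asarray(prior_cov, dtype=float)

    # Posterior precision = data precision mapped to parameters + prior precision.
    prior_precision = np.linalg.inv(prior_cov)
    post_cov = np.linalg.inv(M.T @ M / noise_level**2 + prior_precision)
    post_mean = post_cov @ (M.T @ d / noise_level**2 + prior_precision @ prior_mean)

    # Evidence: d is normal with mean M mu0 and covariance M S0 M^T + sigma^2 I.
    data_cov = M @ prior_cov @ M.T + noise_level**2 * np.eye(d.size)
    resid = d - M @ prior_mean
    _, logdet = np.linalg.slogdet(data_cov)
    log_evidence = -0.5 * (d.size * np.log(2.0 * np.pi) + logdet
                           + resid @ np.linalg.solve(data_cov, resid))
    return post_mean, post_cov, log_evidence
```

For a single datum mapped by the identity, with unit noise level and a standard normal prior, the posterior mean is half the datum and the posterior variance is one half, which shows how the prior and the data share the weight equally in that case.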

3.2.3. Model for Linear, Nonlinear, and Hyperparameter Problem

The linear model is amended by hyperparameters, entering in some way in the prior, and parameters with a nonlinear connection to the data domain. Such a model is then applicable in the field of Fourier spectroscopy.

(a) Gaussian Likelihood. The linear mapping of the parameters to the data domain, as stated by (24), should remain valid. However, the mapping itself may depend on the parameters in a nonlinear way, so that . This leaves the Gaussian likelihood in (25) formally unchanged but is symbolically stated as .

(b) Priors. The Gaussian prior for should be given by , where the prior mean and covariance depend on some of the hyperparameters . Similarly, a prior follows for the nonlinear parameters. Finally, the hyperparameters have an assigned prior .

(c) Posteriors and Evidence. According to Bayes’ theorem, one can write the joint posterior like and the conditional amplitude posterior for becomes a multivariate normal Thus, both the conditional posterior mean and covariance evaluated by (29) and (30) depend on the nonlinear parameters and hyperparameters. After the trivial marginalisation with respect to , the joint posterior for and remains. By expressing the posterior, named settings posterior in the following, like the evidence is identified with Note that the dimensionless constant , and, thus, the evidence depend on the chosen model, including likelihood and priors. Hence, is of importance, when the model is even further abstracted or compared with alternative models.

(d) Role of Settings Posterior. The optimisation, that is, the finding of the maximum of the settings posterior , can be interpreted as an implementation of Ockham’s razor principle and/or as a regularisation procedure. This is essential when the number of parameters exceeds the number of data points.

Unfortunately, a general analytical expression is not available for this posterior, and, thus, it needs to be investigated numerically for the problem at hand. In order to do so, the quantity is of interest, because it is numerically accessible. If has a well-distinguishable global maximum, can be approximated by a multivariate normal which is estimated by evaluating the Hessian matrix. Thus, one finds the posterior means and with the associated posterior covariance. This allows an approximate marginalisation with respect to and/or . This can be understood as a propagation of the posterior uncertainties in and to the marginalised posterior for the parameters of interest.
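The approximation of a posterior by a multivariate normal around its maximum can be sketched as follows. The central finite-difference Hessian is a technique chosen here for illustration; the text does not state how the Hessian is evaluated, and the function names and the quadratic test posterior are hypothetical.

```python
import numpy as np

def numerical_hessian(f, x0, h=1e-4):
    """Central finite-difference Hessian of a scalar function f at x0."""
    x0 = np.asarray(x0, dtype=float)
    n = x0.size
    hessian = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            ei = np.zeros(n)
            ej = np.zeros(n)
            ei[i] = h
            ej[j] = h
            hessian[i, j] = (f(x0 + ei + ej) - f(x0 + ei - ej)
                             - f(x0 - ei + ej) + f(x0 - ei - ej)) / (4.0 * h * h)
    return hessian

def laplace_covariance(log_post, x_max, h=1e-4):
    """Covariance of the normal approximation at the maximum of log_post."""
    return np.linalg.inv(-numerical_hessian(log_post, x_max, h))

# Hypothetical log posterior: an exact Gaussian with known covariance.
cov_true = np.array([[0.25, 0.1], [0.1, 0.5]])
precision = np.linalg.inv(cov_true)
x_max = np.array([1.0, -0.5])
log_post = lambda x: -0.5 * (x - x_max) @ precision @ (x - x_max)
cov_est = laplace_covariance(log_post, x_max)
```

For an exactly Gaussian log posterior the estimate recovers the true covariance; for a realistic settings posterior it is only as good as the assumption of a single well-separated maximum.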

(e) Simplifications. For the remainder of this paper, some simplifications are made which modify (34), (35), and (36) accordingly. The prior mean is set to 0, and the priors and are chosen to be uniform. Then, one can set which modifies (37) to

4. Brownian Bridge Covariance

The continuous even and odd spectral functions to be inferred can each be modelled by a Gaussian process [11]. Thereby, the Brownian bridge process is a good starting point, because it exploits a fundamental condition to prevent aliasing for Fourier spectroscopy applications. This condition states that the spectrum and, thus, and must vanish at the spectral origin and at an upper limit which is smaller than the Nyquist frequency (see Section 2.6.3). However, this information is usually not taken into account any further in the analysis. In contrast, a Brownian bridge and its associated covariance function fulfil the boundary conditions for any lower and upper limit. Hence, the covariance can be used in the Gaussian prior for and . In addition, this process has only one scaling hyperparameter which makes it attractive from a data analysis point of view. This scaling can be estimated as well from the Fourier coefficients probed. In fact, this reveals information about the nonprobed coefficients and gives an additional uncertainty on and . After presenting some properties of the Brownian bridge covariance, it is used as prior covariance in the example application (see Section 5).

4.1. Standard Definition

The Brownian bridge is a continuous stochastic process for an interval, say from 0 to . This bridge is constructed by tying down a Brownian motion process to 0 at the end of the interval in question. Furthermore, the tie-down at the beginning of the interval is inherited from the Brownian motion process. The covariance function for the bridge is defined in standard literature by for and .
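The standard Brownian bridge covariance on an interval from 0 to a given length is commonly written as the minimum of the two arguments minus their product divided by the interval length. The sketch below evaluates this textbook form on a grid; the function name and the unit interval are assumptions, and the spectral shift and normalisation of the adapted definition are not included.

```python
import numpy as np

def brownian_bridge_cov(s, t, length):
    """Standard Brownian bridge covariance min(s, t) - s * t / length."""
    s = np.asarray(s, dtype=float)[:, None]   # column of first arguments
    t = np.asarray(t, dtype=float)[None, :]   # row of second arguments
    return np.minimum(s, t) - s * t / length

grid = np.linspace(0.0, 1.0, 101)
cov = brownian_bridge_cov(grid, grid, 1.0)
```

The variance on the diagonal vanishes at both ends of the interval and peaks at a quarter of the interval length in the middle, which is the tie-down property exploited for the spectral boundary conditions.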

4.2. Adapted Definition

The even and odd spectral functions, being finite for the interval , demand some adaptation of the standard covariance expression (39) when modelled by a Brownian bridge process. To keep the same properties on spectral scale, a shift by and the interval length need to be set. Thus, one gets

with the unit = Hz. With the normalisation the modified covariance becomes where = Hz⁻² has now the proper unit with respect to the spectral scale.

The parameters and of unit are introduced which are defined each as a scaling factor for the associated process. With these scalings the covariances are obtained, so that the units = V²/Hz² match.

4.3. Covariance for Fourier Coefficients

The Brownian bridge covariance function for the spectral domain can be studied in the domain of the Fourier coefficients via the coordinate transform stated in (11) [11]. Compactly written, one finds the infinite-dimensional covariance matrix for the Fourier coefficients analytically by for all , and similarly follows. The only finite off-diagonal elements occur for the absolute term in connection with the higher order terms for the even coefficients captured by the infinite-dimensional row and column vectors and , respectively. This is caused by the condition that vanishes at the spectral boundaries where the sine vanishes intrinsically for any but the cosine takes values either 1 or −1 for even and odd orders, respectively. Hence, the covariance imposes the boundary condition.

For orders greater than 0, has no off-diagonal elements due to the Kronecker delta , meaning that these coefficients are independent of each other. Furthermore, for the infinite-dimensional matrices holds, and the amplitudes for even and odd coefficients drop equally with the square of the order.

4.4. Square-Integrable Property

According to Parseval’s theorem (see (16)), is evaluated by summing the squares of the Fourier coefficients. Because the entries of the main diagonal in the covariances and drop with the order squared, the Brownian bridge process ensures square-integrability of and as long as the scalings and remain finite.

4.5. Signal Envelope

For the even process, the signal level can be estimated by the envelope in the data domain. The starting point is the square root of the main diagonal of . Since the argument of (see (14)) localises the even and odd Fourier coefficient at a fixed in the same data domain, the even and odd contributions of must be added for . As can be seen by (13), the mapping of the absolute term to the data domain already includes the factor 2. In addition, the mapping comprises the bandwidth . In total, one finds the envelope as In the above equation, the factor 2 in front of was chosen, so that captures most of the signal. An approximation might be convenient, because 1.

For the envelope of the odd process, the same reasoning can be applied with one modification. The mapping demands that the contribution of the absolute term at the spatial origin vanishes (see (13)). Hence, one finds Both envelopes drop with , and, thus, most of the signal associated with each process would in the data domain.

5. Example Application

5.1. Formulation of Model
5.1.1. Martin-Puplett Interferometer at JET

The Martin-Puplett interferometer diagnostic [12] at the fusion device JET (Culham, UK) probes the spectrum emitted by a broadband source and performs the Fourier transform. The interferogram data is acquired in terms of Volts dependent on the optical path difference . However, two different sources are probed for 20 minutes subsequently to remove a class of diagnostic imperfections not treated here any further. By subtraction of the corresponding two interferograms, the data becomes available in form of the difference interferogram acquired at the spatial grid node . Then, the abstract model for the Martin-Puplett interferometer is stated by using the total amplification of the detection system. Furthermore, the offset marks a diagnostic imperfection which varies with . The Gaussian noise contributes to each data sample described by . The unknown quantities in the diagnostic model are the spatial grid , the lower and upper spectral boundaries and of the Fourier transform integral, the even and odd functions and dependent on frequency , and the offset.

5.1.2. Interferometric Data

The data set , that is, the difference interferogram consists of = 788 values (see Figure 3(a)). Merely for graphical presentation a certain is chosen derived from the standard approach (see Section 5.1.3). Globally, the data shows an upward trend with respect to the zero baseline.

The components are measured independently of each other, and the noise level for each is captured by = 132.29 μV; thus, the variance of the whole data vector is stated by the matrix .

5.1.3. Optical Path Difference

The diagnostic is set up so that the sampling of the interferogram is ideally triggered when the optical path difference has changed by the increment = 40 μm. Hence, the standard model is obvious with , and the zero-path difference is a free parameter. However, this model is not accurate. Applying the standard approach which fits a second-order polynomial to the maximum and its two nearby values, mm, is inferred for the data set shown in Figure 3(a). Furthermore, the difference quotient evaluated by is presented versus the optical path difference in Figure 3(b). The point-to-point variation of the quotient has a zig-zag pattern, which implies that the assumption = const. is incorrect. Indeed, each of the two difference quotients and is smoother. Hence, the model seems more appropriate, making use of two free parameters: the zero-path difference and a shift for every other grid value. The priors and are set to be uniform.

5.1.4. Offset

The upward trend of (see Figure 3(a)) is modelled by the offset . Here, a second-order polynomial is chosen to capture the offset by with the free parameters , , and being summarised by the vector . The corresponding mapping to the data domain can be expressed as with the matrix . The joint prior is expressed by the factorisable multivariate normal distribution: where are considered as hyperparameters to which a uniform prior is assigned.
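The linear mapping of the three polynomial coefficients to the data domain can be sketched as follows. This is a minimal illustration, assuming that the mapping matrix has the columns 1, x, and x² evaluated at the optical path difference grid; the function name, the grid values, and the coefficient values are hypothetical.

```python
import numpy as np

def offset_design_matrix(x):
    """Columns 1, x, x**2, so that offset = B @ [b0, b1, b2]."""
    return np.vander(np.asarray(x, dtype=float), N=3, increasing=True)

# Illustrative optical path differences in metres (not the measured grid).
x = np.array([-0.005, 0.0, 0.01, 0.02])
# Illustrative coefficients in V, V/m, and V/m^2.
b = np.array([1e-6, 5e-6, 2e-6])

B = offset_design_matrix(x)
offset = B @ b          # offset voltage at every grid node
```

Because the offset is linear in the three coefficients, they can be treated together with the spectral amplitudes as linear parameters of the model.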

5.1.5. Spectral Quantities

The kernel of the spectral integration in (46) is finite only for frequencies . Thereby, and are free parameters. The spectral domain is discretised using the constant increment which is considered as a free parameter as well. Then, the spectral domain is represented by the set with , covering the band centred at . To be clear, since , , and are free parameters and will be inferred, the number is variable. This number determines the dimensionality of the vectors and which represent the discretised functions and . The mapping of and to the data domain in (46) is written as with the two matrices The joint prior for the two vectorial quantities and is factorised, and each prior is chosen as a multivariate normal distribution with vanishing mean. Since the Brownian bridge covariance (see Section 4) describes functions which vanish at the boundaries and and are square-integrable, and its signal envelope decays with the optical path difference like the interferometric data at hand, the priors are chosen by with the two hyperparameters and for scaling. For each of these hyperparameters a uniform prior is applied.

Finally, the joint prior should be constant, so that any combination of , , and has the same probability. Furthermore, the conditions , , and (global upper limit) must be fulfilled. For example, the upper limit of is set here to the Nyquist frequency = 3747.4 GHz.
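The quoted Nyquist frequency is consistent with the nominal sampling increment of 40 μm in optical path difference. The short check below assumes the usual relation f = c/(2Δx) for a uniformly sampled interferogram; the function and constant names are illustrative.

```python
C_LIGHT = 299_792_458.0  # speed of light in vacuum, m/s

def nyquist_frequency_hz(delta_x_m):
    """Nyquist frequency for a uniform optical path difference increment."""
    return C_LIGHT / (2.0 * delta_x_m)

# Nominal increment of 40 micrometres, result converted to GHz.
f_nyq_ghz = nyquist_frequency_hz(40e-6) / 1e9
print(f"{f_nyq_ghz:.1f} GHz")
```

In the Bayesian model this value only serves as a global upper bound for the upper spectral limit, which itself remains a free parameter.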

5.2. Bayes’ Theorem

In the following, the linear parameters are summarised by the set , the nonlinear parameters by , and the hyperparameters by .

The matrix maps the parameters to the data domain. Hence, the Gaussian likelihood is written as The prior for the full problem takes the form with the multivariate normal prior for , using the dimensional covariance matrix One gets the joint posterior with the conditional amplitude posterior given by the multivariate normal using the posterior covariance matrix and the mean . Furthermore, one obtains the settings posterior where the constant is unknown so far.
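The structure of this result can be illustrated with the standard formulas for a linear model with Gaussian noise and a zero-mean Gaussian prior. The sketch below is an assumption-laden toy version: the names `M`, `d`, `noise_cov`, and `prior_cov` are placeholders for the mapping matrix, the data vector, the data covariance, and the prior covariance, and the small random problem is purely illustrative. The log evidence is the quantity that, combined with the settings prior, would give the settings posterior up to a constant.

```python
import numpy as np

def amplitude_posterior(M, d, noise_cov, prior_cov):
    """Posterior N(mean, cov) of a for d = M a + noise with a ~ N(0, prior_cov)."""
    Ninv_M = np.linalg.solve(noise_cov, M)            # noise_cov^-1 M
    precision = M.T @ Ninv_M + np.linalg.inv(prior_cov)
    cov = np.linalg.inv(precision)
    mean = cov @ (Ninv_M.T @ d)                       # cov M^T noise_cov^-1 d
    return mean, cov

def log_evidence(M, d, noise_cov, prior_cov):
    """log N(d | 0, M prior_cov M^T + noise_cov), the amplitudes being marginalised."""
    S = M @ prior_cov @ M.T + noise_cov
    _, logdet = np.linalg.slogdet(S)
    quad = d @ np.linalg.solve(S, d)
    return -0.5 * (quad + logdet + d.size * np.log(2.0 * np.pi))

# Toy problem: 6 data points, 2 linear parameters (illustrative values only).
rng = np.random.default_rng(1)
M = rng.standard_normal((6, 2))
noise_cov = 0.1**2 * np.eye(6)
prior_cov = np.diag([4.0, 9.0])
d = M @ np.array([1.0, -2.0]) + 0.1 * rng.standard_normal(6)

mean, cov = amplitude_posterior(M, d, noise_cov, prior_cov)
log_z = log_evidence(M, d, noise_cov, prior_cov)
```

Working with the logarithm of the evidence avoids the numerical overflow and underflow that the raw probabilities would cause for several hundred data points.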

5.3. Investigation of Posterior
5.3.1. Conditional Amplitude Posterior for Chosen Settings

To give some insight, the conditional posterior for the amplitudes is evaluated given the specific set of values for , , , , , and , , , and .

The values for  mm and  mm are chosen to form an optical path difference grid for which all subsequent nodes are separated by . Then,  mm identifies the double-sided domain (256 data points), and the single-sided domain is marked by 5.118 mm 26.362 mm (532 data points).

The spectral domain is covered from = 0 GHz to the Nyquist frequency = 3747.4 GHz (=). With the bandwidth and the optical path difference set, the conversion (see Section 2.3.1) to the order of Fourier coefficient follows. Thus, the double-sided and full spatial domain coefficients up to the orders 64 and 330 are located (see Figure 4(b)), respectively. Some examples of the spatial basis functions associated with this set of Fourier coefficients are shown in Figure 2.

To match the maximum order (330) of probed Fourier coefficient, the discretisation is set to 5.68 GHz which reflects the classical increment. Then, the spectral grid has = 660 elements.

The hyperparameters are set to large values like V² to prevent a posterior that is determined by the prior. This unwanted feature would be present if were chosen too small, so that the signal envelopes do not include the measured data (see next paragraph).

With the chosen spectral priors, sample functions/vectors and are drawn and shown in Figure 4(a). The function values are of the order of  V/Hz decaying towards the spectral boundaries. The mapping of the spectral prior samples to the spatial domain via shows that can be of the order of some Volts close to the spatial origin (see Figure 4(b)) which exceeds the measured data by about two orders of magnitude. Furthermore, the amplitude of drops with the distance from the origin. All these data samples are well bounded by the signal envelopes obtained by which is the scaled version of the expressions (44) and (45).

For the prior covariance is determined by setting V, V/m and V/m².

(a) Posterior Mean. For the above values, the mean of the amplitude posterior is given by .41 nV, 5.75 μV/m, 2.42 μV/m², and as presented in Figure 5(a). Clearly, peaks below 1000 GHz with the maximum of approximately 6 × 10⁻¹⁸ V/Hz around 400 GHz. Furthermore, is much smaller than below 1000 GHz.

The offset contribution to the data has an upward trend but is small (see Figure 5(b)). The even and odd contributions behave differently in the double- and single-sided domains (see Figure 5(b)). While in the double-sided domain is much larger than , and equal each other in the single-sided domain. This indicates an underestimation (overestimation) of the Fourier coefficients above the order 64 for (). The cause of this finding lies in the choice (), which becomes clearer in Section 5.3.3.

Since the quantity equals almost the data set (see residuals in Figure 5(c)), a nearly perfect match is achieved. Hence, all noise contributions are captured by the mean , and an overfitting of the data is obvious.

(b) Posterior Covariance. The posterior covariance is not characterised in detail. From the elements of the main diagonal of , one finds the values = 73.30 nV, = 12.58 μV/m, and = 446.67 μV/m². These posterior uncertainties are quite large when compared to the values of the corresponding mean , especially for the quadratic coefficient.

For the spectral quantities, the square root of the main diagonal elements of is of the order of some  V/Hz (for one sample function drawn from the conditional amplitude posterior see Figure 5(a)). Hence, a considerable deviation from the posterior mean is possible.

(c) Posterior Samples. 100 samples are drawn from the amplitude posterior and mapped to the data domain (see Figure 5(b)). Thereby, the contributions , , and are split to investigate their interplay. has a noticeable width but remains in the vicinity of for the whole spatial domain.

For the double-sided region and form a narrow band around the corresponding posterior mean quantities and . In contrast, for the single-sided region and increase drastically but are restricted by and . The deviation is large between the individual and the posterior mean quantities; nevertheless, the sum remains well inside the 2σ band surrounding (see Figures 5(c) and 5(d)). Hence, the posterior covariance properly captures the correlations.

With the specific values of the settings one gets the number .

5.3.2. Settings Posterior

As the problem is formulated, the settings posterior is proportional to . Its optimisation, that is, finding the global maximum of the settings posterior, is a 10-dimensional problem. Since large numbers are involved, one has to investigate . In general, if only one parameter is changed, a well-distinguishable peak in is found. Hence, the optimisation is currently carried out by varying each parameter separately (coordinate descent algorithm). Thereby, increases when becomes smaller. This implies an increase of the dimensionality of the involved covariances and and, thus, a longer optimisation procedure. To demonstrate this procedure, the parameters = 0 GHz, V², mm, mm, V, V/m, and V/m² are set to the values used for the nonoptimised case (see Section 5.3.1). Scanned roughly in , shows a peak close to 1000 GHz (see Figure 6(a)). Furthermore, this peak increases for GHz to . The peak is localised at 860 GHz which moves to about 910 GHz for the reduced values V² (see Figure 6(b)).
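The one-parameter-at-a-time search can be sketched as below. This is a hedged toy illustration and not the implementation used for the analysis: the function `coordinate_ascent`, the scan grids, and the two-parameter quadratic stand-in for the logarithm of the settings posterior are all hypothetical. Since the posterior is maximised, the sketch is written as an ascent.

```python
import numpy as np

def coordinate_ascent(log_post, start, grids, sweeps=5):
    """Maximise log_post by scanning one parameter at a time over its grid."""
    theta = np.array(start, dtype=float)
    for _ in range(sweeps):
        for k, grid in enumerate(grids):
            # Copy the current point once per grid value and vary parameter k only.
            trial = np.repeat(theta[None, :], len(grid), axis=0)
            trial[:, k] = grid
            values = [log_post(t) for t in trial]
            theta[k] = grid[int(np.argmax(values))]
    return theta, log_post(theta)

def toy_log_post(t):
    """Hypothetical stand-in with a single peak at (1, -2) and a mild coupling."""
    a, b = t
    return -(a - 1.0) ** 2 - (b + 2.0) ** 2 - 0.5 * (a - 1.0) * (b + 2.0)

grids = [np.linspace(-5.0, 5.0, 101), np.linspace(-5.0, 5.0, 101)]
theta_max, value_max = coordinate_ascent(toy_log_post, [0.0, 0.0], grids)
```

Such a scheme relies on each one-dimensional scan showing a single distinguishable peak; with strongly coupled parameters or several local maxima it may need many sweeps or a different optimiser.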

To ease the computational effort while still being able to characterise , its maximum is determined dependent on which is scanned in the values 5.68, 4, 3, 2, 1, and 1/2 GHz. Each maximum is captured by the sets , , , , , , , , and summarised by Table 1. All maxima are located at very similar sets which seem to converge as becomes smaller. However, in relative terms, the maximum for a smaller has higher odds (see Figure 7). For example, the odds read 1 : 0.011 when the maxima at = 1/2 GHz and 5.68 GHz are related. Furthermore, taking the values for the maximum at = 1/2 GHz to evaluate at = 1/3 GHz and 1/4 GHz gives the odds 1 : 1.013 : 1.018. Thus, the global maximum is located somewhere in the range below 1/4 GHz. But this range cannot be investigated in more detail from a numerical point of view. However, the increase in when is decreased below 1/2 GHz is interpreted as confirmation that a continuous spectrum is indeed probed.

For = 5.68 GHz, the odds read 1 : 10⁻¹³⁹⁰ when the associated maximum () is related to the nonoptimised case () investigated in Section 5.3.1. This is caused by having chosen some parameters like several orders of magnitude too large with respect to the maximum settings values.
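Odds of this magnitude cannot be formed from the probabilities themselves in double precision; they have to be computed from differences of logarithms. A minimal sketch follows, in which the function name and the numerical log values are hypothetical and chosen only to reproduce the order of magnitude quoted above.

```python
import math

def log10_odds(log_p_a, log_p_b):
    """Return log10(p_b / p_a) from natural-log posterior values."""
    return (log_p_b - log_p_a) / math.log(10.0)

# Hypothetical natural-log posterior values for two settings.
log_p_max = -1000.0
log_p_other = log_p_max - 1390.0 * math.log(10.0)

exponent = log10_odds(log_p_max, log_p_other)   # about -1390
direct_ratio = math.exp(log_p_other)            # underflows to 0.0
```

The direct exponential underflows to zero, whereas the exponent of the odds is recovered without difficulty from the log values.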

For = 1 GHz, Figures 8(a)–8(f) summarise scans in the parameter pairs (), (), (), (), () and in while the remaining parameters are held at the maximum values. In , , and , a skewed distribution is found. The remaining six-dimensional posterior distribution has a high probability in a narrow region for a given . This distribution is well approximated by a multivariate normal. Its mean is given by the maximum values listed in Table 1. The posterior covariance is estimated from the inverse of the Hessian matrix evaluated numerically via the second-order partial derivatives in the vicinity of the maximum. The off-diagonal elements of this covariance are negligible, so that one can factorise the posterior as a product of individual Gaussians. The posterior standard deviations , , , , , and vary little when is changed (see Table 2). The spectral boundaries are well determined within an interval of some GHz. While the uncertainty in the scaling is small compared to its posterior mean value, is more uncertain. The uncertainty in the zero-path difference is of the order of some μm, and the shift is quite certain within some hundreds of nanometres.
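The covariance estimate from the Hessian can be sketched with central finite differences. The following is an illustrative version under stated assumptions: the function names, the step sizes, and the two-dimensional Gaussian test function are hypothetical, and the covariance is taken as the inverse of the negative Hessian of the log posterior at its maximum.

```python
import numpy as np

def numerical_hessian(f, x0, steps):
    """Central-difference Hessian of the scalar function f at x0."""
    x0 = np.asarray(x0, dtype=float)
    n = x0.size
    H = np.zeros((n, n))
    for i in range(n):
        ei = np.zeros(n)
        ei[i] = steps[i]
        for j in range(n):
            ej = np.zeros(n)
            ej[j] = steps[j]
            H[i, j] = (f(x0 + ei + ej) - f(x0 + ei - ej)
                       - f(x0 - ei + ej) + f(x0 - ei - ej)) / (4.0 * steps[i] * steps[j])
    return H

def laplace_covariance(log_post, x_max, steps):
    """Gaussian approximation: covariance = inverse of the negative Hessian."""
    return np.linalg.inv(-numerical_hessian(log_post, x_max, steps))

# Toy check with a known Gaussian log density (illustrative numbers).
P = np.array([[2.0, 0.3], [0.3, 1.0]])      # precision matrix
mu = np.array([1.0, -1.0])
toy = lambda x: -0.5 * (x - mu) @ P @ (x - mu)
cov_est = laplace_covariance(toy, mu, steps=[1e-3, 1e-3])
```

In practice the step sizes need to be matched to the scale of each parameter, since the settings span very different units and magnitudes.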

5.3.3. Conditional Amplitude Posterior for Maximising Settings

In the following, the amplitude posterior is investigated for the maximising settings given = 1/2 GHz (see Table 1). For the listed settings, the double-sided domain is identified by 5.115 mm; the single-sided domain is bounded by the lower limit and the upper limit of about 26.36 mm. The centre frequency and the bandwidth of the spectral domain read 472.925 GHz and 881 GHz, respectively. In addition, the conversion of the spatial coordinate to the order of the Fourier coefficients reveals that the double- and single-sided domain contain information up to the 15th and 77th order (see Figures 9(c) and 9(d)), respectively. From this follows that the interferometric part of the 256/788 data values in the double-sided/full domain is modelled best by 62/310 Fourier coefficients and their associated basis functions dependent on , , and . The signal envelopes and are evaluated with and .

(a) Posterior Mean. Given the boundaries and increment, the spectral dimension reads = 1762, and, thus, the amplitude posterior mean is 3527-dimensional. The posterior mean values for the three coefficients nV, = 5.997 μV/m, and μV/m² have magnitudes which are close to the corresponding prior standard deviations at the maximum of the settings posterior (see Table 1). Furthermore, the absolute and the linear mean values are similar to the ones obtained for the nonoptimised case (see Section 5.3.1). The offset which follows from mapping is shown in Figures 9(b) and 9(d).

The posterior means and presented in Figure 9(a) approach zero towards the spectral boundaries as expected. Furthermore, is much larger in amplitude than anticipated by the quite different posterior mean scalings and . Hence, the even process describes most of the interferometric data in the double-sided domain. This is seen by mapping both spectral means to the data domain which gives the even and odd quantities and (see Figures 9(b)–9(d)). In addition, the single-sided domain is described mainly by , because in this domain only the envelope for the even process is of the order of the interferometric data (see Figure 9(d)). This differs from the findings for the nonoptimised case () for which and determine the single-sided region almost equally.

The histogram of the residuals (see Figure 9(e)) is approximated very well by the normal distribution . Since the mean almost vanishes and the standard deviation is very close to unity, the data set is well described by the model and the posterior means. Most likely, the data point located at 7 mm (see Figure 9(f)) is an outlier, because its residual is outside the 3.5σ band.

(b) Posterior Covariance. For the linear parameters the posterior covariance matrix is written like and taking the square root of an element of the main diagonal gives the posterior standard deviation for the th parameter. The correlation coefficient between two parameters and can be evaluated by .
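Standard deviations and correlation coefficients follow from any covariance matrix in the usual way: the standard deviation is the square root of a diagonal element, and the correlation coefficient is the covariance element divided by the product of the two standard deviations. A small sketch with a hypothetical 2 × 2 covariance:

```python
import numpy as np

def correlation_matrix(cov):
    """Return (correlation matrix, standard deviations) for a covariance matrix."""
    cov = np.asarray(cov, dtype=float)
    std = np.sqrt(np.diag(cov))
    return cov / np.outer(std, std), std

# Hypothetical covariance of two anticorrelated parameters.
cov = np.array([[4.0, -1.0],
                [-1.0, 1.0]])
corr, std = correlation_matrix(cov)
```

For this example the standard deviations are 2 and 1, and the off-diagonal correlation coefficient is −0.5.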

The standard deviations = 0.501 nV, = 0.093 μV/m, and = 4.119 μV/m² of the three coefficients are reduced by about two orders of magnitude with respect to the uncertainties obtained for the nonoptimised case. However, the quadratic coefficient is inferred with low confidence. Coefficients of neighbouring degree have the highest correlation in magnitude and are anticorrelated. For example, one finds and , while = 0.29 remains small.

The covariances and and their associated correlations and resemble sinc functions centred along the line plus a tip at this condition (see Figures 10(a)–10(c) for cross section at = 100.175 GHz). However, the FWHM widths of 1.5 GHz and 10 GHz are very different for the even and odd covariances/correlations. Regarding the covariances, the tip heights account for V²/Hz² and V²/Hz², and towards the spectral boundaries the heights approach zero, which can be seen by the standard deviations and (see Figure 10(d)).

At the condition , the covariance and the correlation vanish and are asymmetrically close to this line (see Figures 10(b) and 10(c) for cross section at = 100.175 GHz). This means that at a given frequency the amplitudes and are independent but have a finite and opposite correlation with the neighbouring spectral domains to either side.

The correlations and are oscillatory and rise towards the spectral origin (see Figures 10(e) and 10(f)).

(c) Posterior Samples. From the conditional amplitude posterior samples , and are drawn. While Figure 11(a) shows one sample for each of the even and odd spectral functions, Figure 11(b) presents 100 samples. The samples form a band around the corresponding posterior mean with the width of about twice the standard deviation as shown in Figure 10(d). Hence, the band for the odd function is much smaller.

The samples mapped to the data domain give , , and (see Figure 11(c)) which form much narrower bands when compared to the nonoptimised case (see Figure 5(b)). In particular, the transition from the double- to the single-sided domain is smooth. This is a consequence of having used the most likely scalings and and the associated but quite different signal envelopes inferred from the data. Furthermore, this explains why the band for is much smaller than the one for .

For each sample, the sum complies with the measured data and the noise level (see Figure 11(d)).

With the samples and at hand, and are predicted outside the probed spatial domain. Up to the optical path difference of 70 mm, Figures 11(e) and 11(f) show and for one sample and 100 samples, respectively. No difference is obvious in and for an individual prediction in the probed and nonprobed domain. However, many samples take values between outside the probed region. Since the spread in the probed domain is smaller than in the nonprobed domain, the main uncertainty in and originates in Fourier coefficients not probed rather than the noise on measured data.

5.3.4. Marginal Amplitude Posterior

Since only the marginal posterior of the spectral functions is of interest for this specific Fourier spectroscopy problem, formally the marginalisation needs to be carried out. In general, the pdf is not Gaussian but can be approximated by a multivariate normal. Performing the marginalisation, that is, the uncertainty propagation, is not possible analytically but can be achieved numerically, as described below for = 1/2 GHz.

To reduce the numerical efforts, only the most important parameters , , , , , and are taken into consideration for the marginalisation, and the remaining three parameters , , and are held at the maximum values listed in Table 1. The remaining pdf, labeled by in the following, is well approximated by a factorisable multivariate normal, because the corresponding covariances are negligible (see Section 5.3.2). The posterior mean values and standard deviations are listed in Tables 1 and 2, respectively.

From the six-dimensional Gaussian posterior , 1000 samples are drawn. For each sample, mean and covariance for the conditional amplitude posterior are evaluated. Thereby, a difficulty arises due to the lower and upper spectral limits and . For given increment , and determine the spectral grid. To keep this grid fixed, the spectral grid obtained for = 32.425 GHz and = 913.425 GHz is extended by some tens of GHz on either side. Furthermore, the posterior samples for and are compared with the extended but fixed spectral grid points and changed to the closest grid point. With the modified boundaries and the remaining samples unchanged, a conditional amplitude posterior follows. For example, Figure 12(a) shows the mean functions and for the 1000 posterior samples of the settings. While changes little but most at the spectral boundaries, varies considerably, mostly in the centre of the spectral domain covered. This is caused mainly by the posterior uncertainty of the zero-path difference . However, the spread is symmetric around the most likely mean.

For each settings sample, 100 samples are drawn from . Overall, 100000 samples are available for , , and at each spectral grid point. From this large sample set, the marginalised means , and covariance are evaluated which capture the first and second moments of the pdf . In doing so, the marginalisation with respect to is achieved implicitly.
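The first and second moments obtained by pooling amplitude samples over all settings samples can also be written in closed form via the law of total covariance. The sketch below uses this closed form instead of drawing the inner samples, which is a swapped-in simplification and not the sampling procedure described above; the function name and the two-component toy input are hypothetical, and all conditional posteriors are assumed to live on the same fixed spectral grid.

```python
import numpy as np

def mixture_moments(means, covs):
    """Mean and covariance of an equal-weight mixture of Gaussians.

    means: array of shape (K, n), one conditional posterior mean per settings sample
    covs:  array of shape (K, n, n), the matching conditional posterior covariances
    """
    means = np.asarray(means, dtype=float)
    covs = np.asarray(covs, dtype=float)
    mean = means.mean(axis=0)
    centred = means - mean
    between = centred.T @ centred / means.shape[0]   # spread of the conditional means
    within = covs.mean(axis=0)                       # average conditional covariance
    return mean, within + between

# Toy input: two settings samples, one amplitude each (illustrative numbers).
means = [[0.0], [2.0]]
covs = [[[1.0]], [[1.0]]]
marg_mean, marg_cov = mixture_moments(means, covs)
```

The second term shows how the uncertainty of the settings adds a contribution on top of the conditional covariance, which is the effect discussed for the marginalised covariance below.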

(a) Posterior Mean. The means and (see Figure 12(b)) look similar to , at the maximum of the settings posterior. However, and approach zero some GHz below and above and , respectively.

(b) Posterior Covariance. The posterior covariance after marginalisation with respect to the setting parameters is decomposed like (see Figure 13(a) for cross section at = 100.175 GHz) and is very similar to evaluated at the maximum of the settings posterior (see Section 5.3.3). The same holds for and . However, a significant change is found for (see Figures 13(a) and 13(b) for some cross sections) when compared to . Due to the posterior uncertainty in , a broadband additive contribution increases up to about one order of magnitude in the centre of the spectral domain. Comparing the standard deviations with , and with (see Figure 13(c)) shows that towards the spectral limits the propagation of the settings posterior uncertainties has little effect. Furthermore, the correlation is high for a broad spectral range in the central region (see Figure 13(d)).

(c) Posterior Samples. From the marginalised Gaussian posterior 100 samples ( and ) are drawn and shown in Figure 12(b). As expected, the spread of around the mean is almost equal to the one for around the mean (see Figure 11(b)). The samples have a wider spread in the centre of the spectral domain than mainly caused by the uncertainty in the zero-path difference.

5.4. Figure of Merit for Real-World Interferometer

For the ideal Martin-Puplett interferometer, the odd spectral function and, hence, the odd process must vanish from a theoretical point of view. However, imperfections of a real-world interferometer leave the odd contribution finite in general. This imperfection is captured by the scaling and in terms of signal by the envelope . By relating the square root of the scalings like one can define the figure of merit which expresses by a number the signal deviation of a real-world instrument from the ideal case. For an ideal interferometer, = 1 holds (). For the interferometer investigated here, the settings posterior (see Section 5.3.2) carries the information to state the mean as = 0.933 with the uncertainty of about 0.01. Hence, about 7% of the signal is converted from the ideal even process to the odd process by the real-world diagnostic.

6. Discussion

6.1. Choice of Spectral Priors or Model Plausibility

The results obtained in the previous section rely on the model presented here with the assumption that the even and odd spectral functions can be described each by a Brownian bridge process and its associated prior covariance. An alternative model, let us say , with certain assumptions on the spectral functions, leads to different prior covariances and to a different posterior. In addition, the model might be more or less plausible when compared to the model .

In principle, the plausibility of a model relative to an alternative can be investigated within the Bayesian framework by raising the abstraction level. Starting from (36), a further factorisation needs to be carried out with respect to the used models or . Basically, one can assign the model posterior by and with the model priors and . The dimensionless constants and may be obtained by marginalising over all linear and nonlinear parameters and hyperparameters. Then, the model plausibility is captured by the ratio . This ratio becomes , if no model is preferred a priori for which one sets . Such a model plausibility study was not carried out, because it demands an investigation on its own along with a costly numerical treatment. However, some aspects of a plausibility study will be discussed below.

The signal envelope, corresponding to the used prior covariance for the even and odd spectral functions, is expected to be an indicator for a competitive model. This envelope should be able to resemble the global trend of the interferometric data (see Figure 3(a)). The investigated model seems to have these desired characteristics (see Figure 9).

For the even and odd spectral functions, an alternative prior choice could be which assigns no correlation. For example, if the function is chosen to remain constant or as a triangular function, centred at and approaching 0 towards the spectral limits, then the corresponding covariance for the Fourier coefficients has a constant value along the main diagonal for all orders. This can be shown analytically by performing the operation stated in (43) on . As a consequence, no decay is imposed on the Fourier coefficients with rising order which is incompatible with square-integrability. Furthermore, the alternative signal envelope is constant in the optical path difference domain, opposing the fall-off in the interferometric data (see Figure 3(a)). Thus, the data should not be described better by either of the two suggested alternative models.

A competitive model could use a prior covariance for the Fourier coefficients which has a dropping amplitude when the order of the coefficients rises. For instance, the main diagonal of could be chosen like , and a transformation back to the spectral domain would allow further investigations of the properties of . For different but positive exponents , the model plausibility could be examined.

6.2. Comparison with Standard Model

The standard analysis approach for the interferometer investigated here relies on a different model which is set up hierarchically [12], and no covariances of the parameters involved are available. Basically, the quantity of interest, that is, the spectrum, is determined conditionally on the inferred voltage offset , the zero-path difference , while setting the shift and the phase. Furthermore, spatial window functions are multiplied to the data to retrieve the phase and spectrum, using discrete fast Fourier transformation (DFFT) routines, for the Nyquist assumptions. A consideration of nonprobed Fourier coefficients, locating outside the experimentally accessible spatial domain, is missing completely as well as the influence of the measurement noise. Hence, a comparison of the standard model with the Bayesian model is not straightforward. In order to perform a reasonable comparison, the spectral grid with the increment = 3.66 GHz, which follows from the use of DFFT by , and the limits = 0 GHz, = 3747.4 GHz are the same for both models. In addition, settings are determined for which come close to the settings for (see second column of Table 3) to resemble the standard model. Given these settings, the even and odd spectra are compared. Then, the plausibility of these settings can be obtained with respect to the maximum of the settings posterior for by evaluating and the corresponding odds.

The only window function applied here in the standard analysis weighs the single-sided data domain twice as large as the double-sided domain. Furthermore, the settings of the standard approach  mm and vanishing are transferred to the model , and the remaining settings of are optimised (see third column of Table 3). The even and odd quantities and are inferred with , and the means and of the conditional amplitude posterior of the model are evaluated. Figure 14(a) shows the results which are similar in amplitude. For both models, an aliasing feature can be found for the odd spectral quantity in the spectral domain from 3000 GHz to 3500 GHz. This feature originates in the assumed uniform optical path difference grid (). The differences and settle mostly in the range  V/Hz (see Figure 14(b)).

Keeping all settings in fixed but choosing and which are present at the maximum of the settings posterior for = 3.66 GHz (see fourth column of Table 3), the aliasing feature vanishes completely, because the nonuniformity in the optical path difference grid is taken into account properly. This small change in the spatial settings would make the conditional amplitude posterior about 47 times more likely.

The odds 1 : for the values at the maximum of the settings posterior (see fifth column of Table 3) with respect to the settings for the model which mimics mark the Nyquist assumptions made by the standard approach as very unlikely. This is explained when the corresponding residuals are investigated which show an overfitting of the data if the Nyquist assumptions are used (similar to Figure 5(c)).

6.3. Computational Time

The algorithms for the model and the standard model are implemented in Scilab [13]. To obtain the even and odd spectra with via DFFT routines, a computational time of about 4 ms is measured. The Bayesian model exceeds this analysis time by several orders of magnitude.

For the model , the number of linear parameters, dependent on the spectral domain and increment, gives the dimension of the prior and posterior covariance matrices. Hence, determines the computational times , to evaluate the mean and covariance of the conditional amplitude posterior, and , to investigate the settings posterior at one point in the parameter space. For the implemented algorithm and the maximising settings listed in Table 1 with decreasing ( increases by about one order of magnitude), and measured increase at least quadratically with (see Table 4). This is caused mainly by the need to numerically invert the prior and posterior covariance matrices.

The characterisation of the settings posterior (see Section 5.3.2) requires a duration of at least which adds up to hours for small (large ) and to much less than one hour for = 4 GHz.

The numerical marginalisation described in Section 5.3.4 takes about half a day for = 1/2 GHz.

7. Conclusions

The Fourier transform is at the heart of Fourier spectroscopy applications. The interferometric data has a linear dependence on the even and odd continuous spectra to be inferred. Standard analysis techniques lack appropriate handling of fundamental aspects like noisy measurements, the influence of nonprobed spatial domains linked to Fourier coefficients above a certain order, the estimation of spectral limits, and the propagation of uncertainties of additional parameters like the zero-path difference onto the inferred spectra. For instance, the Nyquist assumption implies the fundamental misconception that the upper spectral limit of spectra to be inferred would depend on the spatial sampling. In addition, a broad spectral bandwidth would follow, which artificially increases the number of Fourier coefficients necessary to describe the data. On the contrary, it can be shown analytically that a band-limitation causes spatially extended basis functions (modulated sinc functions) assigned to the Fourier coefficients in the data domain. Thus, several nearby data points are captured sufficiently by fewer coefficients. This example demonstrates that interferometric data contains more information than is usually extracted.
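The origin of the modulated sinc functions can be illustrated with a textbook integral. The symbols are assumptions introduced for this sketch and may differ from the notation used in the main text: \tau is the delay (optical path difference divided by the speed of light), f_1 and f_2 are the lower and upper spectral limits, B = f_2 - f_1 is the bandwidth, and f_c = (f_1 + f_2)/2 is the band centre. For a spectrum that is constant within the band and zero outside, the even (cosine) contribution to the interferogram reads:

```latex
\int_{f_1}^{f_2} \cos(2\pi f \tau)\,\mathrm{d}f
  = \frac{\sin(2\pi f_2 \tau) - \sin(2\pi f_1 \tau)}{2\pi\tau}
  = B\,\frac{\sin(\pi B \tau)}{\pi B \tau}\,\cos(2\pi f_c \tau)
```

The sinc envelope has its first zeros at \tau = \pm 1/B, so a smaller bandwidth gives a spatially wider function. This is why a band-limited description lets each coefficient cover several neighbouring data points.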

As an alternative to the standard analysis techniques, a probabilistic ansatz, relying on Bayes’ theorem, was proposed which is able to capture the fundamental aspects listed above. In general, Bayes’ theorem relates the posterior probability density function of model parameters to the product of the likelihood and the prior probability density function for these parameters. The ansatz presented here uses multivariate normal distributions for the likelihood and the prior for parameters which map linearly to the data domain. This straightforwardly gives an analytical solution for the posterior of these linear parameters in the form of a multivariate normal. However, this amplitude posterior is conditional on the settings parameters, which summarise all nonlinear model parameters and hyperparameters. After the trivial marginalisation over the linear parameters, the remaining quantity can be scanned in the settings parameters to investigate their joint posterior. This can be understood as a means of applying Ockham’s razor for the linear problem. With the settings posterior at hand, the marginalisation projects the uncertainties in the settings onto the linear parameters.
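The analytical step summarised above can be sketched with the standard linear-Gaussian identities. This is a minimal illustration, not the Scilab implementation used for this work; the names `A` (linear map from parameters to data), `d` (data), `noise_cov`, and `prior_cov` are hypothetical, and all settings are assumed to be fixed inside these inputs.

```python
import numpy as np

def linear_gaussian_posterior(A, d, noise_cov, prior_cov, prior_mean=None):
    """Posterior and log-evidence for d = A @ x + noise with Gaussian prior on x.

    Returns the posterior mean and covariance of x, and the log of the
    marginal likelihood of d (x integrated out analytically).
    """
    n_par = A.shape[1]
    if prior_mean is None:
        prior_mean = np.zeros(n_par)
    # Covariance of the data after marginalising over the linear parameters.
    S = A @ prior_cov @ A.T + noise_cov
    r = d - A @ prior_mean
    # Gain K = prior_cov A^T S^{-1}, using the symmetry of S and prior_cov.
    K = np.linalg.solve(S, A @ prior_cov).T
    post_mean = prior_mean + K @ r
    post_cov = prior_cov - K @ A @ prior_cov
    _, logdet = np.linalg.slogdet(S)
    log_evidence = -0.5 * (r @ np.linalg.solve(S, r) + logdet
                           + d.size * np.log(2.0 * np.pi))
    return post_mean, post_cov, log_evidence
```

Scanning `log_evidence` over the settings, and adding the logarithm of a prior on the settings, yields an unnormalised log settings posterior. The log-determinant term penalises unnecessarily flexible models, which is the Ockham's razor effect referred to above.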

The example application for the Bayesian approach infers even and odd spectra, which qualify as linear parameters, in the microwave and far-infrared spectral domain and several settings parameters, like the spectral discretisation increment, the spectral limits, the scalings of the even and odd processes, the zero-path difference, and a shift correction to the spatial sampling, given a measured interferometric data set. Each spectrum is modelled by a scaled Brownian bridge process which is able to capture a band-limitation, and the associated covariance is used in the Gaussian prior. This covariance assigns a broadband correlation, but its transform to the domain of Fourier coefficients reveals no correlation (vanishing off-diagonal elements except in connection with the zeroth-order term) between the coefficients. Furthermore, the diagonal elements drop with the square of the order of the coefficients. Hence, the prior information stated by the Brownian bridge covariance considers functions which are square-integrable and, thus, converge globally in the limit when the discretisation increment approaches zero and the order of the Fourier coefficients tends to infinity. In addition, these functions vanish smoothly at the lower and upper spectral boundaries. In the data domain, a signal envelope follows from the Brownian bridge process. This envelope decays with the optical path difference and the spectral bandwidth.
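The properties of the Brownian bridge prior can be checked numerically. The sketch below assumes the textbook covariance min(s, t) - s t on a normalised spectral axis, multiplied by a squared scaling; the exact parametrisation used in this work may differ, and the function names are hypothetical. It shows that the variance vanishes at both spectral limits and that the eigenvalues of the covariance, which belong to sine modes of increasing order n, fall off as 1/(n π)². This is a closely related textbook property, not the transform to Fourier coefficients derived in the main text.

```python
import numpy as np

def brownian_bridge_cov(f, f_lo, f_hi, scale=1.0):
    """Covariance of a scaled Brownian bridge pinned to zero at f_lo and f_hi."""
    t = (np.asarray(f, dtype=float) - f_lo) / (f_hi - f_lo)  # normalised axis
    return scale**2 * (np.minimum.outer(t, t) - np.outer(t, t))

def bridge_spectrum_decay(n_grid=401, n_modes=5):
    """Compare the leading eigenvalues with the analytical 1 / (n pi)^2 law."""
    t = np.linspace(0.0, 1.0, n_grid)
    K = brownian_bridge_cov(t, 0.0, 1.0)
    h = t[1] - t[0]  # quadrature weight for the continuous operator
    eig = np.sort(np.linalg.eigvalsh(K))[::-1][:n_modes] * h
    orders = np.arange(1, n_modes + 1)
    return eig, 1.0 / (orders * np.pi) ** 2

numeric, analytic = bridge_spectrum_decay()
print(numeric / analytic)  # ratios close to one for the leading modes
```

Samples drawn from a multivariate normal with this covariance start and end at zero, which corresponds to the vanishing of the spectra at the lower and upper spectral boundaries.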

For the linear parameters like the even and odd spectra, a conditional amplitude posterior was briefly examined, relying on the Nyquist assumptions. Due to the large upper spectral limit, all noise contributions to the interferometric data are captured by the posterior mean of the linear parameters. This implies an overfitting. Because large and equal values are taken for the two Brownian bridge scalings (large signal envelopes), the mapped posterior means of the spectra describe the even and odd parts of the interferometric data in equal parts in the single-sided domain, while the even part dominates in the double-sided domain. This is an indicator that the Fourier coefficients located in the single-sided domain are underestimated and overestimated for the even and odd spectra, respectively. The posterior samples for both spectra show large deviations from the means, and the even and odd contributions, obtained by mapping the samples, form much wider bands than the measurement uncertainty, especially in the single-sided domain. This indicates an unnecessarily expanded solution space for the problem. Only through the posterior covariance of the linear parameters does the sum of the mapped samples comply with the data and its uncertainty band. The listed features mark a very unlikely conditional amplitude posterior, as revealed by the settings posterior.

The settings posterior for the most important settings is well approximated by the product of individual normal distributions, because no significant correlations could be found. The corresponding posterior means and standard deviations take reasonable values. These values are affected little by the discretisation increment which tends to be small, confirming the proposition that continuous spectral quantities are probed. The upper spectral limit is about a factor of four smaller than the Nyquist frequency, and the lower limit is well separated from zero. This reduction of the bandwidth implies that the interferometric data can be described by a number of Fourier coefficients with associated spatial basis functions which is about one-quarter of the number of data points. The scaling of the even process exceeds the one for the odd process by about two orders of magnitude.

For values corresponding to the maximum of the settings posterior at a small discretisation increment, the conditional amplitude posterior was investigated. Owing to the discretisation, the number of linear parameters exceeds by one order of magnitude the number of Fourier coefficients needed to describe the data points within the measurement uncertainty. However, the Ockham’s razor principle implemented by the settings posterior limits the solution space, so that the posterior means and samples for the linear spectral parameters, mapped to the data domain, have a smooth transition between the double- and single-sided regions. Due to the much larger scaling with respect to the odd process, the even process and, thus, the even spectrum describe most of the single-sided and double-sided interferogram region. While the probed interferometric data is well described within the uncertainty by the means and samples, the nonprobed data domain, corresponding to Fourier coefficients above a certain order, is filled broadly by these mapped samples. This filling decays with increasing optical path difference and is limited by the signal envelopes which follow from the estimated scalings of the Brownian bridge processes and the bandwidth. Because the spread of the mapped samples in the nonprobed domain exceeds the one in the probed region, the main uncertainties for the even and odd spectra originate in nonprobed Fourier coefficients.

The numerically costly marginalisation over some of the settings shows that the zero-path difference changes the covariance for the odd spectral parameters significantly. Basically, a broad increase of the posterior uncertainties and correlation was found.

A figure of merit was introduced which states the deviation of a real-world interferometer from an ideal diagnostic. By relating the scalings of the even and odd processes, the used interferometer is characterised as being close to the ideal case for which the odd process must vanish.

Disclosure

The views and opinions expressed herein do not necessarily reflect those of the European Commission.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

The first author acknowledges the institutions CCFE (Abingdon, United Kingdom), IPP (Greifswald and Garching, Germany), IFP (Milan, Italy), and KTH (Stockholm, Sweden) and thanks C. Marchetto, Drift, U. May, C. and J. Hastie, C. Giroud, S. Jachmich, K. Anke-Pense and E. Pense, A. Dinse, M. Domin, W. and T. Schlett, W. Mühlenbeck and J. Vieweg, D. and E. and H. Förster, S. and J. Ferdinand, and K. and U. Schmuck for the continuous support and help, especially during a period of severe sickness of his mother Sabine. This work has been carried out within the framework of the Contract for the Operation of the JET Facilities and has received funding from the European Union’s Horizon 2020 research and innovation programme.