Abstract

We apply the minimizing technique of compressive sensing (CS) to nonlinear quadratic observations. For the example of coherent X-ray scattering we provide the formulas for a Kalman filter approach to quadratic CS and show how to reconstruct the scattering data from their spatial intensity distribution.

1. Introduction

The rapidly growing Si technology in semiconductor electronics [1, 2] opens the possibility to grow III–V inorganic nanowires, such as GaAs, InAs, or InP, which were supposed to have potential [3, 4] to become building blocks in a variety of nanowire-based nanoelectronic devices, for example, in nanolaser sources [5] or nanoelectronics [6]. Such epitaxially grown nanowires are repeating the crystal orientation of the substrate and usually grow in Wurtzite (WZ) or Zinc-Blende (ZB) structure differing in the stacking sequences and , respectively, of the atomic bilayers. Theoretical predictions on the electronic properties [7] of these nanowires show that stacking sequences with WZ and ZB segments considerably differ in the conductivity. However, during the nanowire growth stacking faults, the mixing of ZB and WZ segments takes place, and twin defects [8] appear. As these defects have their own impact on the conductivity and band structure there is great interest in knowing the exact stacking sequence which can be studied by, for example, Transmission Electron Microscopy [9]. But, as this is a destructive method, it is impossible to use the nanowire after the structural studies. Nowadays the 3rd generation synchrotron sources and rapidly developing focusing devices like Fresnel Zone Plates opens new fields of nondestructive X-ray imaging. For example, in the Coherent X-ray Diffraction Imaging (CXDI) experiments, an isolated nanoobject is illuminated with coherent X-ray radiation and the scattered intensity is measured by a 2D detector [10, 11] under the Bragg condition. The diffraction patterns are structure-specific and encode the information about the electron density of the sample and thus the stacking sequence of the atomic bilayers formally by Fourier transform. However, because of sensor physics, the phase information is lost in CXDI measurement since the measured intensity pattern is the modulus square of the scattered X-rays and no inverse Fourier transform can be directly applied to recover the stacking sequence. The classical approach with the Patterson function [12] fails as the number of expected randomly distributed stacking faults is too high. Other inversion algorithms [13, 14] could be used instead to reconstruct the lost phase information: although dual space iterative algorithms [15] have been shown to converge under specific conditions [16] there still remain some convergence problems for a number of cases when not enough preliminary information regarding the structure of the object is previously known [17]. Indeed ptychography type of experiments [18, 19] were suggested to determine at least the relative phases of the bilayers by interfering adjacent Bragg reflections. But this type of experiments is more difficult to realize, as it requires higher stability in comparison to conventional CXDI and the longer measurement time can, however, influence the sample [20].

Due to the principal loss in phase information the scattering data can be considered to be undersampled and algorithms tailored to this lack of information, like basis pursuit approaches [21] realized by minimizing the norm of sparse vectors in an appropriate basis or phase retrieval via matrix completion [2224], could be tested for reconstruction from conventional CXDI measurements: as these data are recorded without structured illumination for applying matrix completion we focus on utilizing the minimizing technique to reconstruct undersampled sparse signals [25, 26] as vectors from linear mappings between the signal and the observations. For an overview of this so-called compressive sensing (CS) see the textbook [27]. The method of CS aims at a signal’s sparse support which can be estimated by, for example, Kalman filtering [28]. As the underlying filter formulas also relate observations and signals to be reconstructed by linear mappings this technique meets CS and was also used to explicitly minimize the norm by so-called pseudomeasurements [29]. For an example see the reconstruction from a random sample of Fourier coefficients [30], where the dimensionality of the underlying Kalman gain matrices can be reduced utilizing the null space of the sensing matrix [31]. Even for nonlinear mappings of signals Kalman filtering applies by using Jacobians rather than constant sensing matrices. These extended Kalman filters (EKF) are used for, for example, tracking issues [32] or robotics [33] and matching the sensing problem of the quadratic nonlinearities in the spatial intensity distribution of CXDI.

The paper is organized as follows. In Section 2 we set up the observation model for modulus-squared amplitudes of intensity distributions in coherent X-ray scattering and point out the relation to the minimization. In Section 3 we give a brief overview of the linear Kalman filter model and show how to incorporate the complex nonanalytic norm as a linearized observation to meet the minimizing strategy of CS. In this framework we prove a convergence concept for the minimization in the reconstruction scheme. In Section 4 we apply our findings to 1D and 2D Fourier data and remark in Section 5 on future considerations.

2. Motivation

2.1. Intensity Spectra in Coherent Scattering Experiments

Exposing crystals to nondestructive coherent X-ray radiation in dimensions the amplitude of the elastically scattered radiation is proportional to the Fourier transform [34]of the electron density , where the vector is a parametrization of the direction where radiation is detected and (multiplied by Planck’s constant) also describes the momentum transfer in a kinematical scattering theory; is the incident direction whereas represents the outgoing direction for radiation with wave vectors of wavelength (see Figure 1).

Following standard textbooks, for example, [34], one deals with two sets of basis vectors to characterize the scattering. The spatial vectors are represented in the basis of the grid whereas the reciprocal basis is used for wave vectors relying on the normalization

As the units of lengths and wave vectors are carried by the basis sets the scalar products read, with dimensionless factors and ,

Restricting to a finite grid with lattice sites in each direction with periodic boundary conditions the possible grid positions and wave vectors allowing for a discrete Fourier transform1 read, for all ,

Thus the scalar product separates into the spatial dimensions according towhere are related to the lattice positions in direction of the lattice constant and refer to a discretized wave vector in direction of the corresponding reciprocal basis . For example, a -dimensional regular hexagonal lattice encloses angles (see Figure 2).

The bold parallelogram represents periodic boundary conditions with, for example, and fragmenting the 1st Brillouin zone, the elementary cell spanned by in reciprocal space, into a subgrid according to (4). With the normalization (2) the reciprocal lattice encloses angles of with a fixed orientation with respect to .

2.2. Setting

Of course for the application the formulas are only needed up to dimensions: nanowires can be characterized by the stacking sequence of atomic bilayers which are shifted laterally and vertically with respect to each other. In coherent X-ray diffraction summing up all scattered radiation from a certain bilayer perpendicular to the growth direction yields a complex scattering amplitude associated with the th bilayer spanned by . Due to the hexagonal lattice and with respect to an arbitrary reference bilayer there are three different phase factors:for the amplitudes left [34, 35]. Wave vectors sensitive to the arrangement of the bilayers are selected from directions related to the Bragg condition by with yielding , which directly relates to the three possible lateral shifts. With respect to the experimental setup [17] this is satisfied by and . Recovering the stacking sequence of these relative phase factors by varying directly reveals the Zinc-Blende or Wurtzite structure of the wire along with their stacking faults.

Carrying out the remaining summation over all equidistant bilayers in growth direction mathematically corresponds to the 1D discrete Fourier transform of the periodically continued complex scattering amplitudes combined with the vector . Each nonzero entry represents one bilayer and refers to the discretized component of the vector in the reciprocal basis corresponding to the growth direction. As the outcome of the detector is the measured intensity rather than the amplitude the observation is the squared signal where the Fourier coefficients build a Hermitian Toeplitz matrixshowing orthogonality with respect to multiplication and a completeness property with respect to summation. Clearly, due to the squaring, the signal (8) is invariant under both the transformation with any global phase and the reflection , on the lattice. As the spatial directions exhibit periodic boundary conditions the signal also remains invariant under discrete translations , on the grid.

On the contrary, setting the component corresponding to the growth direction in the reciprocal basis to zero, (1) examines the 2D structure of all the bilayers simultaneously. If they are assumed to be identical without lateral2 shifts the scattering amplitudes of the bilayer’s lattice sites are 2D real data sets with the discrete Fourier transformfor fixed and discretizing and . Squaring (10) yields Toeplitz matrices with Kronecker product structure (A.7) for the detected signalwith multiple row and column indices and , respectively, referring to the ordinary rows and columns of the Toeplitz matrices (9):

The scattering amplitudes in Fourier domain can be assigned with arbitrary phases. As this leaves the given intensity distribution unchanged an related infinite set of broadened scattering data is formed by convolution. But if the signal is assumed to be oversampled the original scattering data in this set appear to be sparse and can be recovered by an minimization run on, for example, (8) or (11)—up to translations, reflections, or global phases.

3. Kalman Filter-Driven Minimization

3.1. Kalman Filter Equations

The equations for the Kalman filter usually apply to vectors over the field of real numbers . We will examine next that the equations can also be extended to the field . Anticipating this in the classical state space approach [36] the vectorial quantity is supposed to evolve according to the linear evolution with a fixed matrix and a determined sequence of evolution matrices . In the model above the quantity is assumed to be only traceable indirectly by linear observations which can be viewed as linear mappings with given sensing matrices by

The stochastic behaviour of all the considered quantities is modelled by zero-mean Gaussian distributions with positive definite matrices and . These covariance matrices and related mean values can be calculated from normalized Gaussian integrals denoted by expectation or mean values :

The main idea behind Kalman filtering is to invert the observation model (14a) and (14b), where the estimation of the quantities from the measurements shows predictor-corrector structure [36]: the prediction step relates the estimates and by extrapolation whereas the correction step updates the estimate by relating it to the new measurement : as the underlying equations utilize conditional mean values the update can be formulated in terms of covariances

An alternative form of the covariance cycle in the correction step above reads, in terms of the explicit so-called Kalman gain matrix ,

3.2. Minimization

The main problem in CS is to minimize subject to the constraint for fixed and . Because this linear structure matches the evolution equation (13a) of Kalman filtering in Section 3.1 we want to follow [29] using a pseudomeasurement to iteratively minimize the norm: here, the th estimate of the state vector can be associated with and the norm minimization is driven by the fixed constraint augmented by the lowered norm from the previous step by a factor of as an additional observation. The main issue of applying the Kalman filter to the minimization procedure consists in suitably linearizing the nonanalytic norm: Solving for in the vicinity of yields for the expression

Thus the norm of any vector can be linearized in the vicinity of utilizing its phase information by

Due to vectors over the field it is necessary to distinguish row vectors from the usual column vectors related by Hermitian conjugation. The notation is based on complex scalar products and matrix multiplication (cf. Appendix A).

3.3. Pseudomeasurements

Involving the observation as a constraint like, for example, a Fourier transform, the Kalman filter equations (16a), (16b), (17a), and (17b) read (for , ) for the estimates to the vector involving a full prediction-correction step with the initial values , and the constant parameters , and to reconstruct from a given as a limiting value. The minimization of the norm is incorporated into the (pseudo)measurements by where the factor adaptively lowers the current norm solving for a corresponding estimate in the next iteration step. For a solution to the linear constraint as initial data can serve

3.4. Convergence Considerations

By inserting (22a) into (22b) the enhancement of the estimate compared to reads with respect to the measurement of a lowered norm

Using the covariance cycle (22a) again the enhancement matrix composed of positive definite matrices , , and on the RHS can be recast into the more suitable3 formyielding the inequalityby components. The key ingredients are the restricted positive eigenvalues of (26) bounded from above by the spectral norm (cf. (B.1)). What remains to prove for the inequality in (26) is the combination to be positive semidefinite.

In the overdetermined case this holds true for to be positive definite: as the combination has only rank the amount of of its eigenvalues is zero. The positive sign of the remaining eigenvalues can be obtained from a Cholesky factorization of with a lower triangular matrix yielding . Due to a singular value decomposition [37] of the remaining eigenvalues belong to which, however, is along with the prerequisite positive definite because of Theorem 1.

Theorem 1 (see [37]). If is positive definite and has rank , then the matrix is also positive definite.

By virtue of the same theorem the combination is always positive definite in the underdetermined case , which is independent of the sensing matrix .

So in each iteration step with adaptive the linearized norm is lowered (or remains at least constant) subject to the approximated constraint . In the real case (20) even holds with “” yielding for the relation implying for all . To show a general convergence let us start with the exact solution and related to it. Because of with zero shifts in the Kalman filter model (13a), (13b), and (13c) the constant (pseudo)measurement already implies with a linear convergence of the series byfor all starting with an in the vicinity of . Thus the series is said to be weakly convergent and reconstructs the original signal as long as the requirement for sparsity in the CS approach is met. In the framework of weak convergence the limiting value of reads for large where “” represents the special case . The limit is only possible for yielding a constant covariance in each iteration.

For an example of linear compressive sensing in the framework of Kalman filtering see [30, 31] using random samples of Fourier coefficients. Applying the method to a quadratic nonlinear observation model is shown for coherent X-ray scattering in Section 5.

4. Linearized Observation Model

Like the norm in Section 3.2 we linearize the squared observations (8) to apply the Kalman filtering scheme. Differentiating with respect to and , respectively, yieldsand Taylor expansion around reads up to 1st order

With the biases , from the linearization the observation model iswhere the real parts over all combinations have to be taken. The model in detail reads and because of the combinations we choose the alternative representation (18a), (18b), and (18c) of the Kalman filter equations to invert (32). Then the iterated estimates to the state vector read with the complex sensing matrix , biases , and the given real (pseudo)measurements according to

A properly chosen factor adaptively lowers the norm in each iteration step to reconstruct the signal as a limiting value from its given squared measurements . Note that form an orthogonal basis. Thusis Hermitian and positive definite due to (B.1) for all with which is sufficient to prove weak convergence of the minimization under the constraint of squared observations (cf. (26)–(28)).

4.1. Example 1: Stacking Sequence

To demonstrate the relative phase recovery from the observed intensity (8) not only restricting to the Zinc-Blende and Wurtzite phases (6) assume a linear chain with lattice sites and 6 equidistant sparse amplitudes with phases from the set forming .

As in the original objective the number of bilayers is roughly known we use as initial values a broadened modulus and phase distribution related to the setting (cf. Figure 3): with the support a leakagefor the moduli and phases of is with shifts , a suitable way to also consider the limit of exactly known positions. Note that for the phases we only used one fixed value for each occupied lattice site broadened by the leakage. So the main effort is the recovery of the relative phases from the measurements.

For an exponential decaying lowering factor within iterations the reconstruction results for noiseless synthetic measurements are shown in Figures 4 and 5. Experientially, good algorithm’s covariances were found to be

To drive the minimization the corresponding entry (here ) in the matrix has to be much lower in magnitude than the ones (here ) related to the observations of the constraint. When reconstructed amplitudes are said to vanish (e.g., in Figure 4) this means only up to numerical precision where the zero value is dominated by the entries of . Empirically, the exponential decay in the lowering factor is chosen such that its values approach about after the maximum number of iterations. A resulting typical smooth convergence of the norm due to the exponential decay is shown in Figure 6: the constant tail can be used to determine a maximal number of iterations as a stopping criterion. Note that the reconstructed results beyond this guessed number are insensitive to further iterations meaning for (34a), (34b), and (34c) to perform fixed point iterations within accuracies related to the covariances and .

In the nanowire the equidistant bilayers are at fixed positions in growth direction . Thus the reconstruction of their complex scattering amplitudes does not suffer from “off-grid” problems in general (see, e.g., [3840]) if integer multiples of the lattice constants are used for the DFT.

4.2. Example 2: Pattern Reconstruction

The 2D reconstruction from (11) can be calculated with the same algorithm already used for Example above, as the 2D data set of size can be mapped to a 1D vector by means of (12) (cf. Figure 7). Like the example from Figure 3, we use as initial values with the same reasoning and broadening parameters the given distribution. Within iterations and an exponentially decaying lowering factor the reconstruction for noiseless measurements is shown in Figure 8 with empirically good algorithm’s covariances:Note that there is a combined translational and reflectional symmetry mapping the vectorized real scattering amplitude distribution onto itself (cf. Figure 7). As this is also a symmetry of the squared observations (8) in the linear configuration, there are two competing solutions to the sensing problem. Interestingly, breaking this symmetry improved the convergence in Figure 8: without the parasitic scatterers the symmetric positions of consecutive sites4 are correctly (with respect to the support) found after iterations (cf. Figure 9) and the remaining discrepancies in the amplitudes do not even vanish completely after inefficient iterations. On the contrary the random sample of scatterers from Figure 10 is already reconstructed with less than iterations suggesting a strong dependence of the algorithm’s convergence from the vector under consideration.

5. Conclusion and Outlook

We aimed to implement nonlinear compressive sensing for complex vectors by inverting the underlying observation model with Kalman filtering. For this reason we proved a weak convergence of the filter equations for complex sensing matrices and Hermitian covariances . For the example of quadratic nonlinearities we applied our formulas to retrieve relative phase information in the objective of simulated noiseless coherent X-ray diffraction. Due to the nonlinearity the noise in the intensity is not Gaussian and needs further considerations to also account for, for example, lattice distortions or the finite coherence of the primary beam. As an outlook we presented a 2D pattern reconstruction which, hopefully, can help to investigate if the cross sections of the wires were grown regularly.

Because of the Jacobians building up the sensing matrix the convergence of the underlying norm minimization does depend on the reconstructed vector and may go beyond the resolution issues [40] for even constant sensing matrices. For this reason a thorough analysis on the relation between maximal sparsity (for the examples here about 10% of the available lattice sites) and the Toeplitz matrix (9) representing the sensing process for a successful CS is needed. As mentioned in the introduction algorithms in general minimizing the norm could be of interest retrieving information out from intensity spectra: to compare our results we applied the nonlinear version [41] of the primal dual algorithm [42] to the 1D sensing problem above yielding similar results with respect to iteration numbers, reconstructed amplitudes, and phases. On the one hand primal dual reconstructed the zeros outside the support (once a solution was isolated from the algorithm) numerically exact compared to accuracies of orders reached by the Kalman filter. On the other hand our approach focused in the first quarter of the iterations on guessing the support by strongly increasing and decreasing the corresponding amplitudes, whereas primal dual homogeneously acted on all the amplitudes during all iterations. For a detailed comparison and maybe a combination, the whole parameter range of the approaches have to be investigated. To apply the Kalman filter-based algorithm above to recorded data [17] we need to deal with several -vector’s entries describing a sparsely (by a factor of roughly 1 : 101) occupied linear grid covering all the approximately bilayers in a nanowire of 500 nm in height. For this reason matrix inversions, cf. (34a), (34b), and (34c), should be reformulated by, for example, sequential processing techniques to make the algorithm more scalable. Furthermore it would also be interesting if other adaptive lowering factors compared to the exponential ones yield faster convergences.

As the reconstruction from quadratic constraints seems to depend only little on the explicit algorithm used for the minimization, it would be interesting to figure out to which extent the linear matrix completion approach [2224] with respect to the nuclear norm can still be applied in the face of seemingly too few numbers of independent observations.

Appendix

A. Vector Notations for Complex Numbers

Vectors over the field consist of complex numbers with and . With a vector we usually associate complex numbers arranged as a column. With the complex conjugate row vectors are considered to be dual to bywhere denotes with the usual standard basis. With the complex scalar product each vector can be represented in a unitary basis yielding the expansion

With we denote matrices with rows and columns consisting of the complex entries . Let represent the Hermitian conjugate; then we get from its column decomposition , Using such a row decomposition the matrix multiplication of commensurable and can be viewed as single-scalar products yielding with including products for . Thus with the identity unitary matrices consist of orthonormal (row and) columns defined by the property

The possibility of multiplying matrices and of arbitrary dimensions is covered by the Kronecker product reading in components with multiple row and column indices and , respectively. For , , and the Kronecker product meets

Introducing the length of row or column vectors can be accomplished with the norms defined for all real by Particularly is no norm but can be viewed as the number of nonzero entries. The corresponding matrix norms of can be related to the vector norms according to

B. Hermitian Matrices

Hermitian matrices can be related by () if is positive (semi)definite which can be written as (). In addition, there is Theorem B.1.

Theorem B.1 (see [43]). If are Hermitian with , then .

Proof. , cf. 9.2.4 in [37] , has eigenvalues larger than that has positive eigenvalues lower than .

Thus all Hermitian matrices with positive definite and positive semidefinite satisfy the inequalities

Competing Interests

The authors declare that they have no competing interests.

Acknowledgments

This work has been supported in part by the Deutsche Forschungsgemeinschaft under Grants Lo 455/20-1 and Pi 214/38-2.

Endnotes

  1. In solid state physics the reciprocal vectors according to (3) and (4) for are said to be from the Brillouin zone which is a fragmentation of the elementary cell spanned by .
  2. However, in the hexagonal lattice there are three possible lateral shifts yielding additional phase factors in (10) for each layer. So carrying out the complete Fourier transform of all bilayers yields a random sequence with rather than the constant in front of (10) and (11). In the general case the vectors corresponding to the bilayers can be expressed by . Then the phase factors readand their dependence can be suppressed by choosing wave vectors related to the Bragg condition by with . One could be misled that this selects a certain amount of data points on the grid spanned by allowing for CS techniques. Switching to the discrete version this reduces to one corner of the Brillouin zone, that is, to the case of one single data point which is in fact too little for CS.
  3. A completely factorized version of (26) reads for
  4. At least in linear CS with a constant sensing matrix the resolution of consecutive frequency bins is also to be considered in discrete Fourier transform (cf. [40] and the references therein).