Abstract

The severe layover problem of complex urban scenarios in SAR data makes SAR data interpretation very difficult, especially for nonexperts. In this paper, we use 3D SAR tomography for SAR data interpretation of dense urban areas. An efficient and robust approach named Butterworth-filter based singular value decomposition (BSVD) is used for tomographic analysis. Two typical dense urban areas of interest located in Shanghai are analyzed. The tomographic results could help users to better understand the backscattering scenario. The experimental results indicate that SAR tomography is a promising and effective way to facilitate SAR data interpretation of complex urban areas.

1. Introduction

With the launch of the new generation of high-resolution synthetic aperture radar (SAR) satellites in 2007, large amounts of high-quality SAR data with up to submeter spatial resolution have become available [1]. With such high-resolution data, details of buildings and large-scale man-made structures can be observed from space. However, since SAR sensors work in a side-looking geometry, SAR images show tremendous complexity [2]. In particular, in dense urban areas where man-made structures form numerous corner reflectors, high-resolution SAR images are mainly corrupted by speckle, shadowing, multibounce, layover, and sidelobes [3–5]. These problems make it difficult for users to interpret the data. Interpretation of such high-resolution SAR images in urban areas has become an interesting but difficult topic.

The extraction of buildings is an important task for SAR image interpretation in urban areas. There are a few ways to extract building information from SAR images. With one single SAR image, the building height can be estimated by analyzing the backscattering properties of pixels and the electromagnetic properties of the building object [6–8]. With one InSAR pair or StereoSAR pair, the topography of the illuminated area can be estimated, including the height of buildings [3, 9]. SAR simulation can also be used to estimate building heights in SAR images [4]. However, none of the above techniques can separate multiple scatterers superimposed onto the same pixel (layover), which commonly occurs in urban areas.

As an advanced technique in radar remote sensing, SAR tomography (TomoSAR) is able to detect multiple targets superimposed onto one resolution cell [10, 11], a situation that commonly occurs in urban areas where many man-made structures exist. It is a multibaseline technique which uses a stack of SAR images acquired from slightly different orbital positions. The concept of radar tomography was proposed in the 1980s [12] and introduced to SAR remote sensing in the 1990s with simulated data [13]. The first experiment with airborne SAR data was carried out in 2000 [11], while the first full application to mid-resolution satellite data was carried out in 2003, which demonstrated the possibility of using TomoSAR to locate interfering targets inside the same resolution cell [14]. The first tomographic results from high-resolution TerraSAR-X spotlight data using Wiener regularized singular value decomposition were published by Zhu and Bamler [15]. However, tomographic results from high-resolution strip map data are still very rare.

Since strip map data has a lower spatial resolution than spotlight data, data interpretation in dense urban areas is even more difficult [16]. In this paper, we address the above-mentioned problem by interpreting complex urban scenarios with 3D SAR tomography using TerraSAR-X StripMap data. We use an efficient and robust Butterworth filter based singular value decomposition (BSVD) [17]. BSVD has been demonstrated to be an efficient and robust tool for tomography with TerraSAR-X spotlight data, since it applies a Butterworth filter to the singular values instead of using a hard threshold to cut off the small singular values. Here, BSVD is applied to the tomographic interpretation of TerraSAR-X StripMap data over complex urban areas in Shanghai. The results are compared with ground truth for validation.

2. Methodology

2.1. Three-Dimensional SAR Tomography

In the typical SAR imaging coordinate system, the azimuth and slant range are the two dimensions of the SAR image plane. Let us define the third coordinate axis orthogonal to the azimuth-range plane as elevation. A focused SAR image can be considered as a 2D projection of the 3D backscattering scenario along the elevation direction into the azimuth-range plane [15]. Therefore, the complex valued measurement of one specific pixel in a focused SAR image is actually the integral of the backscattered signals from all scatterers along the elevation direction.

In typical urban areas, man-made structures form large amounts of corner reflectors. As shown in Figure 1, the measurement of one pixel may contain backscattered signals from three or more different targets, since those targets have the same slant-range distance to the sensor and will be focused into the same pixel. With a stack of coregistered SAR images, the so-called elevation aperture can be established by combining the measurements of each pixel. The aim of SAR tomography is to reconstruct the backscattered signals along the elevation dimension for each pixel.

In a stack of $N$ coregistered SAR images, the complex value $g_n$ of a certain pixel in the $n$th image is considered as the integral of the backscattered signals along the elevation direction, which can be represented by the following equation:

$$g_n = \int_{\Delta s} \gamma(s) \exp(-j 2\pi \xi_n s)\, ds, \quad n = 1, \ldots, N, \tag{1}$$

where $\Delta s$ is the elevation span, $\xi_n$ is the spatial frequency along elevation, and $\gamma(s)$ is the reflectivity along the elevation direction [15]. If $\gamma(s)$ is sampled along the elevation direction with $L$ samples and $L$ is large enough, the above-mentioned equation can be discretized as

$$g = K\gamma + \varepsilon, \tag{2}$$

where $g = [g_1, \ldots, g_N]^T$ is the measurement vector, $\gamma = [\gamma(s_1), \ldots, \gamma(s_L)]^T$, $s_l$ ($l = 1, \ldots, L$) is the sampled elevation position, $\gamma(s_l)$ is the reflectivity at elevation position $s_l$, and $\varepsilon$ is the noise term, which can be ignored if proper preprocessing steps are conducted [6]. $K$ is an imaging operator with $N \times L$ elements; each element of $K$ can be calculated using the following equation:

$$K_{nl} = \exp(-j 2\pi \xi_n s_l). \tag{3}$$

Then, (2) can be simplified as

$$g = K\gamma. \tag{4}$$

The objective of SAR tomography is to retrieve the backscattering profile $\gamma$ for each pixel from the $N$ measurements $g_n$ ($n = 1, \ldots, N$) and then use it to estimate scattering parameters such as the number of scatterers present in a resolution cell, their elevations, and their reflectivity.
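As a minimal sketch of the discretized model in (2)-(4), the following NumPy snippet builds an imaging operator and simulates the noise-free measurements of one layover pixel. The spatial frequencies, the elevation grid, and the two scatterer positions are hypothetical values chosen for illustration only; they are not the parameters of the Shanghai stack, and the helper name `steering_matrix` is our own.

```python
import numpy as np

def steering_matrix(xi, s):
    """Imaging operator K with K[n, l] = exp(-j * 2*pi * xi[n] * s[l])."""
    return np.exp(-2j * np.pi * np.outer(xi, s))

# Hypothetical geometry (assumed values, not taken from the paper).
N = 36                                   # number of images in the stack
s = np.linspace(-150.0, 150.0, 301)      # elevation grid in meters (L = 301)
rng = np.random.default_rng(0)
# Nearly regular spatial frequencies in cycles per meter, with mild jitter.
xi = (np.arange(N) - (N - 1) / 2 + rng.uniform(-0.1, 0.1, N)) / 300.0

K = steering_matrix(xi, s)

# One pixel containing two scatterers, e.g. ground (s = 0 m) and facade (s = 80 m).
gamma = np.zeros(s.size, dtype=complex)
idx_ground = int(np.argmin(np.abs(s - 0.0)))
idx_facade = int(np.argmin(np.abs(s - 80.0)))
gamma[idx_ground] = 1.0
gamma[idx_facade] = 0.7 * np.exp(0.5j)

# Noise-free measurements: one complex value per image, as in (4).
g = K @ gamma
print(K.shape, g.shape)
```

Each row of `K` corresponds to one acquisition and each column to one elevation sample, so `g` is simply the sum of the two columns selected by the scatterers, weighted by their complex reflectivity.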

SAR tomography is an inverse problem and can be solved by spectral analysis of a coregistered SAR data stack of $N$ acquisitions [18]. There are two types of spectral analysis approaches: parametric and nonparametric. In parametric spectral analysis approaches, prior knowledge is necessary for parameter transformation or parameter separation procedures. No prior information is required for nonparametric approaches, because the number of parameters and their positions can be estimated directly from the data. In general, parametric methods may offer better estimates if the data closely agree with the assumed model; otherwise, nonparametric methods are better. Furthermore, parametric estimators require a large computational effort and are not recommended for practical data processing. Therefore, nonparametric spectral estimators are commonly used in practical applications.

2.2. Singular Value Decomposition in SAR Tomography

Singular value decomposition (SVD) is one of the nonparametric spectral analysis methods [19]. It is a simple and valuable tool for signal processing, and it retrieves reliable results from observations in the presence of noise. SVD is chosen for tomographic processing because of its good performance at high noise levels without compromising the spatial resolution in the azimuth-slant range plane [15].

In linear algebra, suppose that $K$ is an $N \times L$ matrix whose entries come from either the field of real numbers or the field of complex numbers. Then, there exists a factorization of the form

$$K = U S V^H, \tag{5}$$

where $U$ is an $N \times N$ unitary matrix (an orthogonal matrix in the real case); the matrix $S$ is an $N \times L$ rectangular diagonal matrix with nonnegative real numbers on the diagonal; the diagonal entries $\sigma_i$ of $S$ are known as the singular values of $K$; and the $L \times L$ unitary matrix $V^H$ denotes the conjugate transpose of the unitary matrix $V$. Such a factorization is called a singular value decomposition of $K$. In the TomoSAR application, the imaging operator $K$ is decomposed by singular value decomposition. Then, (4) can be replaced by

$$g = U S V^H \gamma. \tag{6}$$

The backscattered reflectivity profile can be estimated by inverting (6) as

$$\hat{\gamma} = V S^{-1} U^H g, \tag{7}$$

where $S^{-1}$ contains the reciprocals $1/\sigma_i$ of the nonzero singular values on its diagonal.
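The unfiltered SVD inversion described above can be sketched with `numpy.linalg.svd`. This is an illustration under assumed conditions rather than the processing chain used in the paper: the geometry is hypothetical and deliberately well conditioned, so that dividing by the singular values is numerically harmless.

```python
import numpy as np

rng = np.random.default_rng(1)
N, L = 36, 301
s = np.linspace(-150.0, 150.0, L)        # elevation grid in meters (assumed)
# Hypothetical, nearly regular spatial frequencies (cycles per meter).
xi = (np.arange(N) - (N - 1) / 2 + rng.uniform(-0.1, 0.1, N)) / 300.0
K = np.exp(-2j * np.pi * np.outer(xi, s))

# Two scatterers in one pixel: s = 0 m (index 150) and s = 80 m (index 230).
gamma = np.zeros(L, dtype=complex)
gamma[150] = 1.0
gamma[230] = 0.7
g = K @ gamma

# Economy-size SVD: U is N x N, sv holds N singular values, Vh is N x L.
U, sv, Vh = np.linalg.svd(K, full_matrices=False)

# Plain SVD inversion: project g onto U, divide by the singular values,
# and map back to the elevation grid with V.
gamma_hat = Vh.conj().T @ ((U.conj().T @ g) / sv)

residual = np.linalg.norm(K @ gamma_hat - g)
condition_number = sv[0] / sv[-1]
print(f"condition number: {condition_number:.2f}, residual: {residual:.2e}")
```

Because there are far fewer measurements than elevation samples, `gamma_hat` is the minimum-norm profile that reproduces the measurements, not the original sparse profile. With realistic, irregular baselines some singular values become very small, and the division by `sv` then amplifies noise.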

In order to limit the noise propagation induced by small singular values, the usual solution is to cut off the small singular values with a hard threshold, which is called truncated SVD (TSVD) [14]. The decomposition result is therefore highly dependent on the threshold, which means that choosing the right threshold is vital to the success of TSVD. In order to avoid choosing a hard threshold and to make the tomographic procedure adaptive to different data stacks, we proposed a Butterworth filter based singular value decomposition method in our previous research, which we call BSVD [17]. The transfer function of Butterworth-SVD [17] assigns a weight to each $\sigma_i$, the $i$th nonnegative singular value of the rectangular diagonal matrix $S$. These weights multiply the inverse of the singular values ($1/\sigma_i$) in the SVD reconstruction equation (7), which weakens the influence of small singular values while retaining the influence of large singular values and thereby improves the robustness and adaptability of singular value decomposition.
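The difference between a hard threshold and a smooth weighting of the singular values can be sketched as follows. Note the assumptions: the weight function below is the generic Butterworth magnitude response applied to the singular values, and the cutoff, order, threshold, noise level, and geometry are hypothetical. The exact transfer function and parameter values of BSVD are given in [17] and are not reproduced here.

```python
import numpy as np

def tsvd_weights(sv, threshold):
    """Hard threshold: keep singular values >= threshold, drop the rest."""
    return (sv >= threshold).astype(float)

def butterworth_weights(sv, cutoff, order):
    """Generic Butterworth-type weights (assumed form): close to 1 for
    sv >> cutoff, close to 0 for sv << cutoff, and 1/sqrt(2) at sv == cutoff."""
    return sv**order / np.sqrt(sv**(2 * order) + cutoff**(2 * order))

def filtered_svd_inverse(K, g, weights):
    """Estimate the profile as V diag(w_i / sigma_i) U^H g."""
    U, sv, Vh = np.linalg.svd(K, full_matrices=False)
    w = weights(sv)
    # w / sv, with a zero factor wherever a singular value is exactly zero.
    factors = np.divide(w, sv, out=np.zeros_like(sv), where=sv > 0)
    return Vh.conj().T @ (factors * (U.conj().T @ g))

# Hypothetical ill-conditioned geometry: a narrow band of spatial frequencies.
rng = np.random.default_rng(2)
N, L = 36, 301
s = np.linspace(-150.0, 150.0, L)
xi = np.linspace(-0.01, 0.01, N)
K = np.exp(-2j * np.pi * np.outer(xi, s))

gamma = np.zeros(L, dtype=complex)
gamma[150] = 1.0                          # scatterer at s = 0 m
gamma[230] = 0.7                          # scatterer at s = 80 m
noise = 0.05 * (rng.standard_normal(N) + 1j * rng.standard_normal(N))
g = K @ gamma + noise

# Threshold and cutoff are set relative to the largest singular value sv[0].
gamma_plain = filtered_svd_inverse(K, g, lambda sv: np.ones_like(sv))
gamma_tsvd = filtered_svd_inverse(K, g, lambda sv: tsvd_weights(sv, 0.1 * sv[0]))
gamma_bsvd = filtered_svd_inverse(K, g, lambda sv: butterworth_weights(sv, 0.1 * sv[0], 4))

for name, est in [("plain", gamma_plain), ("TSVD", gamma_tsvd), ("Butterworth", gamma_bsvd)]:
    print(f"{name:12s} norm of estimate: {np.linalg.norm(est):.3e}")
```

In this sketch the smooth weights never exceed 1, so the filtered estimate cannot have a larger norm than the unfiltered one, and no single singular value is switched fully on or off as the cutoff changes.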

3. Experimental Data Stack

High-resolution TerraSAR-X StripMap data over Shanghai is used for tomographic analysis in this paper. We have acquired a stack of 36 TerraSAR-X StripMap images from a descending orbit over Shanghai. The parameter details of the data stack are shown in Table 1. The incidence angle of this stack is about 26 degrees. The data stack has an azimuth resolution of about 3.3 meters and a slant range resolution of about 1.2 meters.

4. Experiments on Regions of Interest

The block diagram of our analysis is given in Figure 2. Before tomographic processing of real data, we need to preprocess the data stack. All SAR images need to be coregistered to a single master image; then interferograms are generated from the coregistered images [20]. After InSAR processing, we need to calibrate the data with deramping and remove the atmospheric phase with PS-InSAR technology [21]. After the preprocessing step, tomographic processing is conducted using BSVD, followed by a model selection procedure [22]. Then, the final estimates, including the number of scatterers and their corresponding reflectivity for each pixel, are given. We choose two regions of interest (ROI) in Shanghai as our study areas. The tomographic results over these two areas are analyzed and compared with ground truth. The tomographic results help us better understand the high-resolution strip map data.

Our first ROI is the Zhengda Wudaokou Plaza, located in Pudong district, with 90 × 140 pixels, as shown in Figure 3(a). There are two office buildings in this area; one of them is a 28-floor high-rise with a height of about 112 meters; the other one is a 13-floor office building with a height of about 52 meters. Because of the side-looking geometry of SAR sensors, the top of a building is at near range and the building bottom is at far range. The amplitude SAR image presented in Figure 3(a) is very difficult for nonexperts to understand. With the assistance of the bird's-eye images from Microsoft Bing Map, we can see the basic structures of the two office buildings, as shown in Figure 3(b). The red arrows in Figure 3(b) indicate the azimuth and range directions in the SAR imaging geometry.
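To give a rough sense of how strongly these facades are spread in range, the layover extent can be estimated from the stack parameters. The sketch below assumes flat terrain, a perfectly vertical facade, and a locally constant incidence angle; the two relations used (slant-range spread equal to height times the cosine of the incidence angle, and height equal to elevation times its sine) are standard geometric approximations rather than formulas stated in this paper, and the function names are our own.

```python
import math

INCIDENCE_DEG = 26.0       # incidence angle of the stack (Section 3)
SLANT_RANGE_RES_M = 1.2    # slant range resolution in meters (Section 3)

def layover_extent_slant_range(height_m, incidence_deg=INCIDENCE_DEG):
    """Slant-range distance between the top and bottom of a vertical facade."""
    return height_m * math.cos(math.radians(incidence_deg))

def height_to_elevation(height_m, incidence_deg=INCIDENCE_DEG):
    """Elevation coordinate (perpendicular to slant range) for a given height."""
    return height_m / math.sin(math.radians(incidence_deg))

def elevation_to_height(elevation_m, incidence_deg=INCIDENCE_DEG):
    """Height corresponding to an elevation estimate from tomography."""
    return elevation_m * math.sin(math.radians(incidence_deg))

for name, height in [("28-floor high-rise", 112.0), ("13-floor building", 52.0)]:
    extent = layover_extent_slant_range(height)
    cells = extent / SLANT_RANGE_RES_M
    print(f"{name}: {extent:.1f} m in slant range, about {cells:.0f} resolution cells, "
          f"elevation span {height_to_elevation(height):.1f} m")
```

Under these assumptions the taller facade is laid over roughly 100 meters of slant range, which illustrates why a large share of the ROI is affected by layover.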

In this ROI, 4131 pixels with strong backscattering signals were selected and analyzed by SAR tomography. The distribution of the selected pixels is depicted in Figure 4(a) and the tomographic result is shown in Figure 4(b). As shown in Figure 4(b), the vertical structures and the L-shaped footprints of both buildings are retrieved by SAR tomography. The estimated height of building 1 is about 120 meters, whereas the estimated height of building 2 is about 60 meters. Both estimates are within about 8 meters of the real heights of 112 and 52 meters.

Besides the facades of both buildings, ground targets located in front of building 1, which are in layover with the building facade, are also retrieved by SAR tomography. This is due to the layover separation ability of SAR tomography, which can discriminate multiple targets focused inside the same layover pixel. Thanks to SAR tomography, the retrieved 3D distribution of targets over the ROI in Figure 4(b) is much easier for all users to understand.

In order to analyze the tomographic estimates in detail, the estimates along four azimuth lines are plotted in Figure 5. The four azimuth lines are located along the four facades of the two buildings, as shown in Figure 4(a). Lines 1 and 2 are along the facades of building 1, whereas lines 3 and 4 are located along the facades of building 2. The estimates along the two azimuth lines over building 1 are presented in Figures 5(a) and 5(b). Both estimates clearly show the structure of the building facades. The height difference between the top and the bottom is about 105 meters, which is close to the real height of 112 meters. More layover pixels are detected at the near range side than at the far range side in Figures 5(a) and 5(b). Due to the intrinsic imaging geometry of SAR sensors, near range corresponds to the top of buildings and far range corresponds to the bottom of buildings. The distance between possible ground targets and facade targets is larger at the top than at the bottom of the building, which makes the two kinds of targets easier to separate. The estimates along azimuth lines 3 and 4 are shown in Figures 5(c) and 5(d). Since lines 3 and 4 are located along building 2, their estimates are identical. The structures along the facades of building 2 are clearly depicted. By estimating the height difference between the pixel on top and the pixel at the bottom, we can tell that the height of building 2 is around 60 meters. This is also consistent with ground truth.

Our second ROI is Dingyuan, located in Xuhui district, with 45 × 170 pixels. This ROI is a much more complicated area than Zhengda Wudaokou Plaza, with a severe layover problem. From the mean amplitude SAR image in Figure 6(a) alone, it is hard to tell how many buildings exist in this area and how high they are, even for someone experienced in SAR image interpretation. With the assistance of the bird's-eye image from Microsoft Bing Map in Figure 6(c), we know that this area is composed of two 35-floor residential buildings (buildings 1 and 2) and another very low building (building 3). Since residential buildings in China are usually designed with a floor height of about 3 meters, we assume that the real height of those two buildings is about 105 meters. Those two buildings cause a severe layover problem and make the mean amplitude SAR image very confusing for users. It should be mentioned that the residential buildings in Dingyuan are designed with 3 groups of large balconies on the facade, as shown in Figure 6(d). These groups of balconies form large numbers of corner reflectors, which make the response signal very strong in the SAR images.

We selected 2176 pixels with strong backscattering signals for tomographic analysis. The distribution of the selected pixels is presented in Figure 6(b). The estimated 3D distribution of scatterers/targets is depicted in Figure 7(a). The two 35-floor residential buildings are successfully retrieved from the confusing SAR images by TomoSAR. The estimated heights of those buildings are very close to their real heights. The three groups of balconies on building 1 are all reconstructed, as shown in Figure 7(a). However, only one group of balconies on building 2 is detected by SAR tomography. The other two groups are lost in the original SAR images. Besides these two high-rise buildings, a third lower building with a height of about 40 meters is also detected.

In order to analyze the tomographic result in detail, the estimated targets along two azimuth lines are plotted in Figures 7(b) and 7(c). The locations of these two azimuth lines are shown in Figure 6(b). Line 1 goes across the three groups of balconies on building 1 and one group of balconies on building 2, which is confirmed by the estimates of line 1 in Figure 7(b). There are three separate short lines along the facade of building 1 and only one short line along the facade of building 2. On the other hand, line 2 goes through the facades of both building 1 and building 2, as shown in Figure 6(b). Therefore, we can see two long lines in Figure 7(c), where each line corresponds to one building facade.

5. Conclusions

In this paper, SAR tomography is used for TerraSAR-X StripMap data interpretation, which overcomes the severe layover problems of dense urban areas. For this purpose, we used a tomographic method called Butterworth-filter based singular value decomposition (BSVD). Two areas of interest in Shanghai are analyzed using SAR tomography. The SAR images are very hard for nonexpert users to understand, and sometimes difficult even for someone experienced in SAR data. With the assistance of SAR tomography, the 3D distributions of targets in the Zhengda Wudaokou Plaza area and the Dingyuan area are derived in this paper. The tomographic results help users better understand the complex urban scenarios in SAR data.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

The authors would like to thank Dr. X. Zhu for the great help during this research work, Prof. Dr. L. Meng and Prof. Dr. J. Krisp for the kind support under IGSSE project 6.08 4D City. This work is financially supported by the National Natural Science Foundation of China under Grants 41174120 and 61331016, and the Research Fund for the Doctoral Program of Higher Education of China under Grant 20110141110057. The TerraSAR-X data was provided by Airbus Defence and Space.