Research Article  Open Access
A Semiparametric Model for Hyperspectral Anomaly Detection
Abstract
Using hyperspectral (HS) technology, this paper introduces an autonomous scene anomaly detection approach based on the asymptotic behavior of a semiparametric model under a multisample testing and minimumorder statistic scheme. Scene anomaly detection has a wide range of use in remote sensing applications, requiring no specific material signatures. Uniqueness of the approach includes the following: (i) only a small fraction of the HS cube is required to characterize the unknown clutter background, while existing global anomaly detectors require the entire cube; (ii) the utility of a semiparematric model, where underlying distributions of spectra are not assumed to be known but related through an exponential function; (iii) derivation of the asymptotic cumulative probability of the approach making mistakes, allowing the user some control of probabilistic errors. Results using real HS data are promising for autonomous manmade object detection in difficult natural clutter backgrounds from two viewing perspectives: nadir and forward looking.
1. Introduction
Hyperspectral (HS) sensors collect the radiation over a wide range of contiguous spectral bands, with each band corresponding to a unique spectral value. The field of view of the sensor is broken into hundreds of thousands of pixels, with each pixel representing from less than one to many squared meters of the region of interest depending on the spatial resolution of the sensor and the height of the sensor during the data collection. A collection of spatialspectral images is put together resulting in an HS data cube, where the length and width represent the spatial dimension, and the depth represents the spectral dimension [1]. The resulting HS data cube consists of hundreds of thousands of pixels. Each pixel has tens or hundreds of data points, each point corresponding to a unique spectral value. In theory, the spectral signature of each pixel should uniquely characterize the physical material in that spatial land area. In practice, the recorded spectral signatures will never be identical for samples of the same material. Owing to the different illumination conditions, atmospheric effects, sensor noise, and so forth, the resulting spectral signatures for HS imagery pixels containing similar materials will exhibit spectral variability.
The discriminant capability, however, of spectral signatures has led to two major applications: object classification and target detection. The former aims to assign all pixels of the image to thematic classes, the latter searches the image for the presence of specific material (the target). As highlighted in [2], from a theoretical point of view, target detection can be considered as a binary classification problem, aiming at classifying the image into the target class and the background class. But, since targets are usually sparsely populated in the scene, while the background is abundant and heterogeneous, a major distinction between the approaches designed for target detection and classification is that target detectors cannot use criteria based on the minimization of classification error since that would lead to labeling every pixel as background. So, a typical solution proposed in the literature for target detection is to use the NeymanPearson approach, as discussed by Manolakis in [3], where maximizing the detection probability for a fixed falsealarm rate is the adopted criterion for the algorithm design.
Due to the availability of spectral signature libraries for a wide range of materials, spectral signaturebased target detectors have been widely examined [3, 4]. These methods assume the target spectral signature is both reliable and known a priori and aim at finding highly correlated spectra in the scene corresponding to the reference target spectra; these methods are also known as target matching.
Target matching approaches are complicated by the large number of possible objects of interest and uncertainty as to the reflectance/emission spectra of these objects. For example, the surface of an object of interest may consist of several materials, and the spectra may be affected by weathering or other unknown factors. One may be interested in a large number of possible objects each with several signatures. Thus, the multiplicity of possible spectra associated with the objects of interest and the complications of atmospheric compensation, as well spectral calibration, acquisition geometry, and contamination from adjacent objects (see, for instance, the discussion in [5, 6]), have led to the development and application of anomaly detectors that seek to distinguish observations of unusual materials from typical background materials without reference to target signatures.
Anomalies are defined with reference to a model of the background. Typically, background models are developed adaptively using reference data from either a local neighborhood of the test pixel or a large section of the image. Local and global spectral anomalies are defined as observations that deviate in some way from the neighboring clutter background or the imagewide clutter background, respectively. Both approaches have their merits.
Local anomaly detectors are typically designed under the assumption that an anomalous material (the target) is spectrally distinct from local neighborhood spectra, which are assumed to be controlled by a known multivariate distribution (Gaussian); also, it is assumed that the scales of targets are known a priori, or the viewing perspective is assumed to be nadir and the altitude of the flying platform carrying the sensor is available for target scale estimation. This kind of detectors is susceptible to unknown spectral mixtures that are often obtained by sampling spectra through a moving window in the imagery, as the window is placed at a transition of distinct regions, forcing the neighboring spectral mixture to be compared against spectra of one of the regions in that transition; this may significantly increase the false alarm rate. The local anomaly detector is also susceptible to false alarms that are isolated spectral anomalies. For example, consider a scene containing isolated trees on a grass plain. Each separate tree may be detected as a local spectral anomaly even if the image contains a separate region with many pixels of trees. The most popular local anomaly detector in the HS research community is based on maximum likelihood estimation under the multivariate normal distribution; this detector is commonly known as the ReedXi (RX) algorithm [7] and has become a benchmark for comparison. Kwon and Nasrabadi proposed a kernelized version of RX in [8], and Matteoli et al. proposed a segmented version of RX in [9], for the estimation of the local background covariance matrix from global background statistics to cope with the small sample size problem in estimating covariance from local patches in the image; a small sample size problem occurs when the number of spectral observations is lesser than the number of spectral bands (see examination of this problem in [2]). Banerjee et al. in [10] leveraged the employment of kernel methods and the method known as support vector data description (SVDD) to propose the kernel SVDD. But, it is widely known that the performance of kernel methods crucially depends on the kernel function and its parameter(s) [11]. More recently, Gurram and Kwon in [12] and Khazai et al. in [13] have also addressed the parameter sensitivity of kernelSVDD based detectors, which is still an open area of research. In the statistical based arena, Stein et al. in [14] presented an overview of the statistical anomaly detectors derived from three background models: local Gaussian model, Gaussian mixture model (GMM), and linearmixing model. More recently, these models were compared against others approaches using the same dataset [15, 16]. In [15], the algorithms RX, GMM, and a clusterbased one were examined. Matteoli et al. in [16] extended the comparison to include the orthogonal subspace projection (OSP) detector and a deterministic signal subspace processing detector. Other classic approaches have also been adapted to the local anomaly detection problem, for example, Fisher’s linear discriminant (see an implementation in [17]).
Global anomaly detectors are based on a simple universal distribution, which is designed to represent the background process in the whole image, thus, the name global. Example of these detectors is the GMM [14] or a different version of the RX algorithm, which estimates its required parameters (mean and covariance) not locally, but using spectra from the entire HS data cube; this version is informally known as the Global RX. By design, these methods are more resistant to the small sample size problem mentioned earlier, but they have limited ability to adapt to all nuances of the background class (sometimes referred to as an underfitting problem), which may result in both high false alarm and low anomaly detection rates. Other earlier versions of global detectors require that an HS data cube is first segmented into its constituent material classes, so detection is achieved by applying a cutoff threshold and automatically locating pixel clusters with pixel values above the threshold, representing the outliers of these classes. These hybrid algorithms vary in the method of segmentation, but also tend to use maximum likelihood detection under the multivariate normal distribution. The stochastic expectation maximization clustering algorithm by Stocker in [18] is a related example; see also Masson and Pieczynski in [19]. But, since segmentation results are known to be also sensitive to algorithms’ parameters (see, for instance, [1]), utility of segmentation algorithms in the context of anomaly detection has not met expectations for varying real world scenarios.
Independently of the kind of anomaly detector in use, the following is a key consideration that should not be ignored: susceptibility to unknown spectral mixtures of unknown distributions often observed by sampling spectra through a moving window in the imagery, where the spectra in test belong to one of the components in that mixture (for instance, a local patch of canopy being compared against neighboring spectral mixture of the same canopy type and a patch of soil). Rosario in [20] examined this particular problem using near infrared HS imagery, where he showed that even on simple real case scenarios (e.g., motor vehicles parked at an open grassy field having also trees in the background on a sunny day) transitions of distinct regions may contribute to 18% of all of the locally observed spectra in the imagery, using a small moving window (10 by 10 pixels). Of course, with increased scene complexity (increased heterogeneity), this percentage reaches higher levels.
This paper presents a quasiglobal, semiparametric anomaly detection (QGSemiP) approach that is less susceptible to problems and issues mentioned above for existing local or global anomaly detectors. In particular, the approach requires only a small fraction of the HS cube to characterize the unknown clutter background (hence, the term quasiglobal), in contrast to existing global anomaly detectors, which require the entire cube. It does not use segmentation and it is less susceptible to spectral mixtures in local neighborhoods of the imagery.
The approach consists of three major parts: (i) a data dimension reduction method, which plays an important role on the overall approach, since it maps the data from their native multivariate space to a univariate domain in order to avoid the small sample size problem mentioned earlier, gaining in the process insensitiveness to illumination on objects, reducing the dominance of the blackbody response produced by Earth (if the longwave infrared region applies), among other benefits to be discussed later; (ii) a univariate semiparametric model [21], which is chosen because of its robustness to samples representing a mixture of material types; (iii) a parallel random sampling scheme that characterizes the unknown background clutter, using a binomial probability density function to model the likelihood of sampling targets by chance and erroneously labeling them as clutter, justifying multiple sampling trials in parallel in order to significantly decrease the labeling error probability.
The semiparametric model is neither parametric, since the specific distribution controlling the data is not assumed to be known, nor nonparametric, since other parameters must still be estimated relating two unknown distributions. The semiparametric model, however, assumes that the distributions of the samples to be tested are related to each other through an exponential function (a distortion), having two unknown parameters. As it will be shown later in detail, the model is appealing for many reasons, including the following: if two spectral samples under test belong to the same material type (i.e., they are not anomalous to each other), then the assumed exponential function relating both distributions is reduced to unity. If the two samples under test belong to different material types (i.e., they are anomalous to each other), then the exponential function will impose a significant weight relating both distributions, indicating that the samples are anomalous to each other; a key point here is that such an outcome is invariant to whether the assumption of exponential relationship between the distributions is satisfied or not—this will be discussed in more detail later. Finally, another benefit, although not recognized earlier by users of semiparametric models in other areas of study, is the model’s natural ability to handle samples representing a mixture of different material types, which also will be shown later. As a note, samples representing a mixture of material types are known to significantly increase the false alarm rate in operational scenarios, requiring autonomous anomaly detection; they can produce dominant edges between spatial regions of different material classes, later to be detected as meaningless (false) anomalies [2].
The strength of the semiparametric model handling the mixture problem is attributed to the fact that a sample under test is expected to contribute to the estimation of the distribution function controlling the sample labeled as reference, where both samples are expected to equally contribute to the estimation (when both are in fact under the same distribution), only the reference sample will be able to contribute (when both samples are from different distributions), or both the reference sample and a portion of the test sample will contribute (when the reference is a mixture and the test sample represents a component in that mixture, or vice versa). These outcomes are produced naturally by the model because of the imposed exponential relationship between the two distributions, as shown later using simulated univariate data to make the point. Another appeal for using an exponential distortion assumption is that many of the known distribution functions can be expressed in terms of an exponential distortion of another distribution, including all of the exponential family of distributions [22].
Rosario in [23] published a much earlier and limited conference paper version of this work, where a twostep semiparametric detector (data transformation and semiparametric test statistic) was introduced to the limited task of local anomaly detection (where prior knowledge of targets’ scales was required, imposing the limitation) and its performance was compared only against the RX algorithm. Relative to the previous work in [23], this paper significantly extends both the overarching methodology and presents additional results using the extended approach to test significantly more challenging scenarios from the ground to ground viewing perspective, where targets’ scales are unknown a priori. In other words, this work enables capability rather than just offers an incremental improvement. The specific contributions in this paper (method extension and additional results) are as follows. (i)The twostep anomaly detector (data transformation and semiparametric test statistic) is employed for the first time in a quasiglobal framework (which is also proposed herein), where only a fraction of the entire data cube being represented by blocks of data are randomly selected from the imagery and used as reference sets of spectra to test the entire imagery. The results are later fused using order statistics, as the sampling scheme is modeled by the Binomial distribution.(ii)An analytical cutoff threshold is derived from the approach’s asymptotic cumulative probability of rejecting a null hypothesis, when either the null or the alternative hypothesis is true (given that the null hypothesis is based on a multisample testing and order statistic scheme).(iii)The extended approach is applied to additional real HS imagery to automatically find manmade objects in the scene, producing excellent results in difficult natural clutter scenarios viewed from both nadir and forward looking viewing perspectives.(iv)Performance of the extended approach is compared via ROC curves against multiple methods found in the literature, for example, global methods (kmeans, Gaussian mixture model, global RX), local methods (kernel RX, standard RX, Fisher’s linear discriminant, and the local semiparametric detector discussed in [23]).(v)An analytical study of the twosample test framework, using the local semiparameter detector in [23] as the base detector. (vi)An analytical study of the multisample fusion test framework, using the semiparametric model embedded in the quasiglobal framework, which relaxes the prior knowledge requirement of target scales, hence, enabling scene anomaly detection tasks independently of the viewing perspective (nadir or forward looking).(vii)A study of the semiparametric model’s behavior in the presence of samples representing a mixture of two different material classes, which is the most common mixture case scenario given the sliding window sizes typically used in anomaly detection operational scenarios. (viii)A subsection specially devoted to discuss the motivation and give a qualitative assessment of the data transformation used in the twostep semiparametric anomaly detection of [23].
For convenience, a list of notations is available after the appendix.
This paper is organized as follows: Section 2 describes spatial data window models for HS sensing, a semiparametric model, and a hypothesis test; Section 3 discusses the sampling method, its probabilistic model, and introduces QGSemiP; Section 4 discusses performance of QGSemiP testing nadir and forward looking HS imagery, consisting of manmade objects in difficult natural background scenarios; Section 5 concludes the paper.
2. Problem Formulation, Data Transformation, and Semiparametric Model Description
The main goal of anomaly detection algorithms testing incoming imagery is to detect objects that are spectrally anomalous to its surroundings, yielding in the process a tolerable number of nuisance detections. In many surveillance and reconnaissance applications, it is desired that manmade objects are detected as being anomalous to the surrounding natural clutter. Both format and model of the data play a significant role in attempting to achieve the intended goal.
2.1. Remote Sensing Data and Data Format
Experiment was carried out on data sets from two distinct sensors and viewing perspectives: (i) the hyperspectral digital imagery collection experiment (HYDICE) sensor, from a nadir looking perspective; (ii) the SOC700 hyperspectral sensor, from a forward looking perspective. Data from these sensors will be referred to in this paper as nadir looking imagery and forward looking imagery, respectively.
The HYDICE sensor records 210 spectral bands in the visible to near infrared (VNIR: bands between 0.38 and 0.97 μm) and shortwave infrared (SWIR: bands between 1.0 and 2.50 μm). An extended description of this dataset can be found, for example, in Schweizer and Moura (2000). The results shown in this section for one data subcube are representative for other subcubes in the HYDICE (forest radiance) dataset. An illustrative subcube (shown as an average of 150 bands; pixels) is depicted in Figure 1 (Cube 1, top). We discarded water absorption and low signal to noise ratio bands; the bands used are the 23rd–101st, 109th–136th, and 152nd–194th. The scene consists of 14 stationary motor vehicles (targets near a treeline) in the presence of natural background clutter (e.g., trees, dirt roads, grasses). Each target consists of about pixels, and each pixel corresponds to an area of about square meters at the given altitude.
The forward looking imagery used for this work was recorded using the SOC700 VNIR HS spectral imager, which is commercially available off the shelf. The system produces HS data cubes of fixed dimensions by pixels by spectral bands between 0.38 and 0.97 μm. Figure 1 (Cube 2 through 4, bottom) depicts examples of the forward looking imagery, where each pixel in any of the three cube examples corresponds to the average of 120 band values. Data cubes Cube 2 and Cube 3 were collected during the summer of 2004 in California, USA; Cube 4 was collected during the spring of 2008 in New Jersey, USA. From actual ground truth, it is known that the scene in Cube 2 (see Figure 1) contains three motor vehicles and a standing person in the center of the scene (i.e., two pickup trucks to the left in proximity to each other, a man slightly forward from the vehicles in the center, and a sport utility to the right). Although the natural clutter in Cube 2 and Cube 3 is dominated by Californian valleytype trees and/or terrain at the same general geolocation, the data in Cube 3 depict a significantly more complex scenario. From actual ground truth, it is known that in Cube 3 a sport utility vehicle is inconspicuously deployed in the shades of a large cluster of trees. Portions of the shadowed vehicle can be observed near the center in Cube 3. Cube 4 was recorded in a wooded region at Picatinny, where (according to the available ground truth) a sport car is located behind multiple tree trunks, partially obscuring the vehicle; see Figure 1 (Cube 4).
The four data cubes in Figure 1 are independently displayed as intensity images after linear mapping the gray scale of each to the range 0–255. Pixel intensities shown in each individual surface is only relative to corresponding values in that surface; in other words, pixel values representing the same material (e.g., general terrain) may be displayed with different intensities in another surface. This fact explains, for instance, the difference in brightness between the same terrain under similar atmospheric conditions shown in Cube 2 and Cube 3. (The strong reflections from certain parts of the vehicles captured by the sensor in Cube 2 are not as dominant in Cube 3 because the vehicle in Cube 3 is in tree shades; the open field in Cube 3 is then the strongest reflector in the scene).
Next, we present a model of observed data using a sliding window in (a data cube). The data format of is shown in (1), where and index pixels in the spatial area , where and . Pixels within a fixed block of data in are indexed from the upper left corner of this block using relative to rows and columns in , where and . A representation of an window at pixel location in is as follows:
Before pixels within a block of data can be used by a detector, they need to be rearranged to a sequence of multivariate variables. The rearrangement is made by concatenating individual rows in the window in (1) as follows: where , , and , such that , and so forth until finally . Since a window can be anywhere in and represents any HS data cube, are considered random vectors and the entire set of spectra that constitutes will be observed through the window.
Using the assumption that the random vectors in are independent and identically distributed (i.i.d.), the distribution of data within the window, using (2), can be simplified to the following: where is a conditional multivariate probability density function (PDF) and is its parameter set; both and are typically unknown for real applications.
Normally, an anomaly detector requires two input sets of spectra and to perform its task on . The test sample is obtained at a fixed location in , as shown in (1) and (2); but, the source to obtain a reference sample will depend on the application, or viewing perspective.
For the nadir looking imagery, the most popular sampling method is to use pixel vectors surrounding a block of data to construct , where is constructed using the block of data to be tested. Notice that this sampling method is not suitable for the forward looking imagery because a priori knowledge of target scales in the imagery are required to set the size of a separation (guard) region between the block of data to be tested and locally surrounding samples.
For the forward looking imagery, the reference input set could be made available from a spectral library, or be randomly selected from the testing data cube. In either case, would be a rearranged version of a block of data. The latter is addressed in Section 3, where (in order to make such a test useful for real applications) is independently compared to multiple spectral sets ; fusing thereafter these comparison results in order to yield a decision (output) surface, as described in Section 3.
Both input sets and feed the anomaly detector.
As mentioned earlier, whether the viewing perspective is nadir or forward looking, mixtures of different materials in and/or in can significantly degrade anomaly detectors’ performances, as examined by Manolakis and Shaw in [2]. It is customary to assume normality in (3), or mixture of Gaussian distributions, but experience has shown (see [1]) that relaxation of these assumptions is needed.
We discuss next a twostep approach for anomaly detection, as introduced in [23], comprising of spectral transformation followed by the application of a univariate semiparametric model.
2.2. Data Transformation
This subsection discusses the employed method to transform spectra from their native multivariate space to a univariate domain, satisfying the univariate data requirement of the semiparametric model. We also provide justification for choosing the employed transformation and give some example cases to reinforce its use.
We consider a data transformation approach in two parts: (i) spectral differencing and (ii) angle mapping.
The rationale for (i) is twofold: (a) since HS samples are contiguous in the spectral domain (i.e., typical spectral resolution is of the order of 10 nanometers), more discriminant and independent information pertaining to a particular material type may be found between adjacent bands, which could augment the statistical power of detectors (this is specially the case in LWIR (longwave infrared) HS imagery where the radiance values observed in calibrated data (collected outdoor) are overwhelmingly influenced by the Planck’s blackbody equation [1]), as the Earth’s landscape (primarily, canopy and soil) behaves as a blackbody in the LWIR region of the electromagnetic spectrum (note: there is a whole topic of study in mathematical statistics on feature exploitation by zero crossing, which uses the output of random variable differencing as used herein for spectra, see [24]); (b) spectral differencing also puts significantly less weight in the importance of spectral magnitude (or bias) in anomaly detection applications, putting focus on the importance of spectral shape, instead. Spectral magnitude relates to the mean average of all measured radiance within a spectral sample, and spectral shape relates to the plotted curve of measured radiance as a function of wavelength. Existing classification and detection algorithms directly or indirectly exploit magnitude and/or shape of spectra in order to perform their tasks.
The benefit of (ii) is that it reduces the detection problem from a multivariate dimensional space to a univariate domain, avoiding the undesired problem of singularity issues during inverse estimations of covariance matrices. Singularity is known to occur when the sample size of a spectral sample is less than its number of spectral bands. Although there are approaches proposed in the literature to overcome this issue, the output of these approaches is not always desirable (see, for instance, [3]), since a typical HS sensor usually delivers between 120 and 1,000 bands, but targets may vary in number of pixels from as large as in the thousands to as small as 1 to 4 pixels, depending on the actual physical sizes of these targets and/or distance between the sensor and targets.
This paper is concerned about ensuring that the data transformation method can in fact reduce algorithm sensitivity to spectral magnitude (which can be achieved via (i)), so that a manmade object, for instance, deployed in tree shades can be considered as much as an anomaly relative to a dominant natural clutter background in the same way that the manmade object would have been if it were deployed, instead, out in the open field. All of this, while simultaneously preserving both a high sensitivity to spectral shape and the discriminant characteristics among spectra of distinct material types (which can be achieved via (ii)). If these requirements are matched, then the data transformation approach has achieved the overall desired goal—some example results are shown later in this subsection to reinforce those points.
Before those examples are presented, however, consider the following: let two spectra—having spectral bands—be available for comparison, and , where are nonnegative radiance values. Further assume that is twenty percent stronger in magnitude than . One way to formalize the disparity in magnitude is to let , , and . Two key points are worth noticing: (a) the difference would provide more discriminant and independent information than jointly using the highly correlated and , ditto for ; (b) the difference would remove from consideration the 20 percent stronger average magnitude of over , if the were used instead for comparison against . In essence, (a) replaces the need for using—for instance—PCA (principal component analysis) to uncorrelate the data, as it is commonly employed in the HS community [2]); from (b), if and are observations of the same material under different illumination conditions (e.g., spectrum representing a shaded object and spectrum representing the same object but not shaded), then the average magnitude difference between the two spectra would not play a role in the comparison test, which is desired. For those readers who may have some concerns about what may be lost in the process of transforming the data from multivariate to univariate in the context of anomaly detection, as it will be shown shortly, the loss—although difficult to quantify—is not relevant to the anomaly detection problem, since an effective anomaly detector is not expected to distinguish a material type that is spectrally similar to another material type. If the detector is designed to be that sensitive, it would likely also produce an unacceptably high false alarm rate due to expected within class variability of the same material types dominating the scene (for instance, the expected within class variability of tree clusters across the scene).
The twopart transformation approach is described next.
Borrowing from the discussion in Section 2.1, the transformation approach requires two sets of spectra, a reference set , where is calibrated spectra from a pixelsize location at the scene observed by the band sensor, during a particular set of atmospheric and illumination conditions; is the vector transposed operator; are radiance values, such that, adjacent radiance values are usually highly correlated; and is the sample size of ; and an independent test set that most likely has the same atmospheric condition captured in (4), but not necessarily the same illumination condition captured in (4), where all of the qualifying comments made for (4) also apply to with .
Letting denote the index that distinguish both sets , the magnitude of in (4) and (5) depends on the amount of illumination (e.g., shaded or nonshaded objects) and the illumination environment, this dependence can be significantly reduced by applying the first order differentiation—an approximation of the first derivative—to the columns of , or
Notice in (6) that and , and the sample means of and are, respectively, where is a column vector of dimension filled with real values of 1’s.
Denoting the columns of (which corresponds to the reference set) as , then the multivariate reference and test samples can be transformed to univariate reference and test samples through the following anglemapping formulas: where , , the operator using a column vector is the square root of (note: although we prefer to use a metric that yields a number having a geometric interpretation, the reader who is concerned about algorithm speed may replace the angle mapper metric with any other comparable metric of choice, for instance, the correlation metric [2] or the normalized dot product showed inside the operator in (8) and (9). The most important aspect about the employed metric is that it must be able to preserve the discriminant characteristics among spectra of different material types, as it will be shown shortly).
From (8) and (9), the following two univariate sequences are constructed: where (reference) and (test) are the input sequences to be used by the univariate based anomaly detection technique discussed in Section 2.3. Note that both samples end up having the same sample size, .
As mentioned earlier, the employed data transformation was specifically chosen to offer a number of desired properties, including reduced sensitivity to spectral magnitude and preservation of discriminant features among spectra of distinct material types. Regarding the latter, notice in (10) that, for the proposed transformation to work as advertised, when both multivariate samples and happen to be observations of the same material type, the component values in and are expected to be comparable and closer to 0° in the scale between 0° and 90°; however, when and are observations of distinct material types, the component values in and should be proportionally apart, where values in are expected to be closer to 0° while values in are closer to 90°. In addition, when the observation represents a mixture of two different material types, the proposed transformation should yield a univariate sample that is representative of the mixture.
Now, we will take a closer look at these expectations, using (8) and (9) to transform two sets of real HS spectra for a qualitative assessment, addressing first the most common sliding windowbased testing scenarios naturally occurring in anomaly detection problems: local testing, which requires a priori knowledge of object scales (range dependent), and global testing, which does not require a priori knowledge of object scales (range invariant). Figure 2 depicts these scenarios (in this context, local testing means comparing clustered spectra against neighboring spectra, while global testing means comparing clustered spectra against spectra located elsewhere across the same imagery).
Figure 2 illustrates the same datacube spatial representation under the twotest case scenarios, where for simplification the scene background is spatially dominated by only two distinct material types (Class A and Class B) and a third material (Class C—also distinct from Class A and Class B) illustrates the presence of an anomalous material relative to the background. Notice also that the two objects of Class C in the scene have significant size and shape differences, so that, the advantage of approaching the anomaly detection problem from a global rather than a local perspective can be pointed out.
The left hand side image in Figure 2 shows the overlaid sliding window locations of the standard approach to local anomaly detection in the HS research community (see [3]), where a sliding window consisting of an interior square (inner window) is concentrically located along with a larger square so that spectra observed through the inner window can be compared against spectra observed through the outer portion (outer window) of the larger square (i.e., the area of the larger square minus the inner window). Furthermore, the spectral set observed through the outer window is labeled as the reference sample, while the spectral set observed through the inner window is labeled as the test sample.
As the test window slides across the image one pixel location per algorithmic test, the labels P1 through P7 (left hand side image in Figure 2) highlight seven key test locations in the image. Table 1 summarizes a list of plausible anomaly declarations by an anomaly detector versus the desired declarations, using the known ground truth information about the scene.

The last column of Table 1 (desired anomaly) shows that out of the seven most distinct window locations for local testing, only two (P6 and P7) are desired, which may contradict declarations made by anomaly detection models currently found in the literature. In fairness, these detectors would be performing the job they are employed to do, as the plausible anomaly declarations shown in Table 1 are indeed correct in the strict sense. For instance, P2 shows a test between observations from Class A (test sample) and observations from a mixture (reference sample: Class A and Class B), while P6 shows a test between observations from the same material type (the intended anomaly: Class C). The reason, however, these declarations are not desired is that locations similar to P2 will likely accentuate edges as anomalies across the image, increasing the probability of false alarms, and locations similar to P6 will suppress the intended anomaly, decreasing the probability of correct detections. Location P6 also emphasizes the lack of robustness using the local testing approach to find anomalies in the scene, as a priori knowledge of object scales (consisting of the anomalous material) are not always available.
Regarding global testing shown in Figure 2 (the illustration at the right hand side), the window locations denoted as P1 through P5 depict distinct testing locations (observed test samples), while locations denoted as R1 through R3 depict distinct observations (fixed reference samples) that may have been randomly sampled from the image (prior to testing) and used to test every possible windowsized regions across the image, including P1 through P5. Table 2 shows the plausible declarations of anomalies versus the desired declarations, using the same ground truth information about the scene.

The last column of Table 2 (desired anomaly) in essence shows that any test that involves a mixture of classes and a component of that mixture should not be declared as an anomaly so that only truly anomalous material types (in this case Class C) relative to the background will be accentuated. An implementation scheme for the global testing approach, which requires fusion of declarations, will be discussed later. For now consider the following: the final declaration for any given window location is to retain the declaration NO, if it is there, out of all of the results produced by the particular testing location. Using this rule, locations P1, P2, and P3 would produce a final declaration of NO, and P4 and P5 would produce a final declaration of YES, as it would be desired by a global detection scheme. Notice also that P4 ensures that the circular object (Class C) would be accentuated as an anomaly, using the same test window size that would have detected the other anomaly of different scale shown in location P5; this is also desired.
Using real spectral samples, we will now qualitatively demonstrate the behavior of transformation output equations (10), when exposed to the key window positions shown in Figure 2. In anticipation, we would like to see that the data transformation will preserve distinction between samples of two different classes and produce results corresponding to a mixture of classes, when such a mixture is being observed. If successful, those results would give us some level of confidence that not much is being lost in terms of class distinction and that examples of mixtures would be shown as mixtures by the data transformation (demonstration of the desired anomaly detection declarations will be shown later, when the semiparametric model is discussed in Section 2.3. We also defer answering questions about what would happen when other possible window location cases appear, besides the ones shown in Figure 2, until discussion of results from testing real HS imagery in Section 4).
Figure 3 shows spectral transformation results using two sets of real spectra (Class A and Class B), where in this case spectral band differencing was not used as input for angle estimation among spectra and spectral means (see (6)), the actual individual radiance values (L’s) were used instead of the difference between adjacent radiance values (L’s). Figure 4 shows results using the same two sets of spectra but this time around using band differencing, following exactly the path from (6) to (10).
(a)
(b)
(c)
(d)
(e)
(f)
(g)
(a)
(b)
(c)
(d)
Class A in Figure 3 consists of 200 real spectra representing a grassy area in an open field; Class B consists of 200 real spectra representing a cluster of tree leaves in the same geographic location of Class A. The employed HS sensor operates in the VNIR (visible to near infrared) region, producing 120 spectral bands per spectrum. To observe the behavior of the proposed data transformation as it processes spectra equivalent to the window location examples shown in Figure 2, Class A is denoted as the reference sample and processed with an independent test set representing Class A (200 spectra), it was also processed with a second test set representing Class B (200 spectra). The spectral sample means of both spectral sets are also shown in Figure 3.
Using both the reference and test samples from the same class (Class A) exemplifies window locations P1, P3, P5, and P6 in the local testing image in Figure 2; for the global testing illustration in Figure 2, it also represents the cases when spectra from R1 are processed with spectra from P1 and when spectra from R3 are processed with spectra from P3. Using spectra Class A as the reference and spectra from Class B as the test exemplifies the local testing locations P7 (i.e., the presence of an anomaly) and global testing location duals (R1, P4), (R1, P5), (R3, P4), and (R3, P5). The key point to notice in Figure 3 is the angle mapped plots shown in the bottom portion of the figure, where the plot at the left hand side shows that both the univariate reference sample (blue bubbles) and the univariate test sample (green bubbles) are comparable, preserving the fact that both samples belong to the same class. On the other hand, the angle mapped plot shown on the right hand side shows that the spectral transformation preserves the distinction between the samples from Class A and Class B. Both results are desired and would be passed as input to the employed detector for a decision.
The same experiment was held, but instead differentiating data in the spectral direction in order to check whether such a step would change the desired outcome from the spectral transformation shown in Figure 3. Figures 4(a) and 4(b) shows the band differencing means using (7) choosing samples from Class A to be both the reference and test sets (Figure 4(a)), and choosing a sample from Class A as reference and a sample of Class B as the test (Figure 4(b)).
The angle plots shown in Figures 4 (c) and 4(d) affirm that the differentiation step shown in (6) and (7), followed by output transformation from multivariate to univariate data, do preserve the lack of distinction between samples from the same class (Figure 4(c)) and a strong distinction between samples from Class A (reference) and samples from Class B (test), see plots in (Figure 4(d)). This example provides a direct assertion that what is lost due to the transformation is not relevant to the anomaly detection problem, since the goal of an anomaly detector is to find those material types that are truly distinct from the material types spatially dominating the background scene, as other factors will conspire against the detector’s effectiveness (e.g., mixture of distinct material classes, within material class variability). In other words, if a specific material type (target) is desired by the user to be considered as an anomaly relative to a natural environment but the target is not sufficiently distinct spectrally from the background, then an effective anomaly detector would most likely not declare the target as an anomaly. Ironically, as mentioned earlier, this is a virtue because a detector that is too sensitive in distinguishing two spectrally similar material types is also likely to produce an unacceptably high number of false alarms due to expected within class variability of dominant background material types in the scene.
Next, we would like to check how a spectral set, representing a mixture, is altered by the data transformation. For this demonstration, we constructed a reference sample by combining 100 spectra of Class A with 100 spectra of Class B, so that, the reference sample represented a mixture of two classes, and arbitrarily chose the test set to be represented by 200 spectra of Class B (the latter were independent from the ones used to construct the reference mixture). Figures 5(a) and 5(b) show both the mixture (reference construct) and the component of that mixture (Class B: test sample), and the resulting angle mapped plots using spectra without band differencing (Figure 5(c)) and spectra after band differencing (Figure 5(d)).
(a)
(b)
(c)
(d)
The key message from Figures 5(c) and 5(d) is that the data transformation with or without the band differencing step does preserve in the univariate domain the fact that the material class of the multivariate test spectra is a component of the mixture of classes represented by the multivariate reference set of spectra. In this particular case, the univariate reference sample (blue bubbles) clearly shows the presence of two classes (i.e., half of the observations has lower angle values and the other half has higher angle values), while the test univariate sample (green bubbles) shows observation values commensurate to the one of the mixture class component (lower angle values). Although this fact is not necessarily desired for anomaly detection (see, for instance, P2 and P4 in Figure 2 (local testing, left image), the data transformation at least does not seem to alter such a scenario involving a mixture, which is fine as long as the employed anomaly detector is designed to handle similar challenging cases.
In summary, the proposed data transformation does preserve distinctions or similarities that exist between spectral sets and also offers additional benefits, as highlighted earlier in this subsection. What is needed now is a model that is more effective finding anomalies, while simultaneously managing the expected negativeimpact nuisances naturally occurring in real operational scenarios (e.g., the presence of mixtures). A semiparametric model is described next for that purpose.
2.3. Univariate Semiparametric Model
Let two multivariate samples and be transformed to and , respectively (using, for instance, (4) through (9)), where and are unknown joint PDFs.
To simplify the anomaly detection problem using the transformed data, suppose the two random samples (real valued, not vector valued) (reference) and (test) are independent, consisting of i.i.d. (see mathematical notation list after the appendix) random variables controlled by unknown marginal PDFs and , respectively. Moreover, let and be related through the following model:
The model in (11)(12) is appealing for many reasons, consider the following examples.
If and are samples of the same distribution, then the assumed exponential relationship is reduced to unity so that (whether or is known or not), indicating that is not anomalous to . If and are samples of different distributions, then the exponential function will impose a significant weight relating both distributions, indicating that and are anomalous to each other. The key point here is that the latter outcome is invariant to and independent of whether the assumption of exponential relationship between the distributions is satisfied or not; that is, if the exponential relationship assumption is satisfied, then is anomalous to , but if this assumption is not satisfied, is still anomalous to . Since the application of interest only requires a determination of whether is anomalous to , satisfying the imposed assumption of (12) is inconsequential. So, a hypothesis test could just be designed to check whether in order to determine the presence of anomalies.
However, the relationship assumption in (12) plays a major role in the mathematical development of the statistical test and, more importantly for the application in context, it also plays an important role in determining whether a portion or the entire test sample () can contribute to the estimation of the reference distribution (). The latter is a subtle feature never before recognized in other areas of study, for example, pharmaceutical [22], where semiparametric models are more commonly employed for their utility. The implication of this subtlety is that when one of the samples represents a mixture of different material types and the other represents a component of that mixture, the model in (11)(12) allows both samples, through the assumed relationship, to contribute to the distribution estimation of the chosen reference sample; in essence stating that the assumption may also be partially satisfied as long as a portion of the test sample belongs to the reference distribution. This is manifested when the test produces an estimation of that is significantly closer to unity than it would produce when the test sample has absolutely no relationship with the reference sample. In the practical scenario this is the difference between having a detector capable or not of naturally handling samples representing a mixture, as it will be shown later in this subsection. As mentioned earlier, samples representing a mixture of material types are known to significantly increase the false alarm rate in operational scenarios for autonomous anomaly detection—they can produce dominant edges between regions of different material classes, later to be detected as false alarms [2].
Notice in (12) that, since is a density, must imply that , as merely functions as a normalizing parameter, following from the requirement that a PDF (in this case ) must integrate to unity, see PDF properties in [25]. Notice also that if then and must belong to the same population (i.e., ). Using this fact, a test statistic can be designed to test the following hypotheses:
In order to estimate , denote the union of and (combined data) by , and following the construction by Qin and Zhang in [21] and Fokianos et al. in [22], a maximum likelihood estimator of —the continuous cumulative distribution function (CDF) corresponding to the reference , can be obtained by maximizing the likelihood over the class of step CDF’s with jumps at the observed values . Accordingly, if , where denotes the differentiation operator, , the likelihood becomes,
One can now express each in terms of and and then substitute the terms back into the likelihood to produce a function of and only. When and are fixed, (15) is maximized by maximizing only the product term , subject to the constraints
Denoting , Qin and Zhang in [21] showed that and that the value of the profile loglikelihood up to a constant can be expressed as a function of and only, or
A system of score equations that maximizes (18) over is shown below [21],
The solution of the score equations yields the maximum likelihood estimators and consequently by substitution also yields an estimator of , or [21]
So, in summary, by using profiling, an estimator (20) for is attained in addition to score equations (19), where both the reference and test samples as shown in (14) are used to estimate (the reference PDF). This is only possible because the model in (12) implies that can be expressed in terms of . This feature allows this model to be robust when either or is bimodal or multimodal (representing a sample mixture) while the other represents a component of the same mixture—a key factor for handling transitions of distinct regions in the anomaly detection application, as it will be shown later.
Using results from Fokianos et al. in [22], the estimator has the normal asymptotic behavior, as the sample size tends to infinity , or where denotes the true parameter, is the variance (a scalor) using components from the combined sequence , , , and means converges to—in this case to a normal distribution having 0 mean and variance equals to .
Both estimators and are required to estimate in (21) via . Denoting the estimator of and using results from [22], where is shown in (20).
Normalizing the left side of (21) with , setting and squaring the final result, and using known properties of the family of Chi square distributions [25], one can test in (13) with the following expression: which has the Chi square distribution asymptotic behavior with 1 degree of freedom, . Under the idealized assumptions of model (11) and (12), a decision can be based on the value of in (23), that is, high values of reject hypothesis , declaring the presence of local anomalies (note: Regarding (23), as typical from any asymptotic expression, the larger the value of is the more accurate and precise the approximation of the expression will be. Since in this context coincides with twice the sample size of the reference sample, sample sizes typically used in anomaly detection applications will suffice (greater than 100, yielding greater than 200). Practitioners in statistics usually recommend that for univariate variables, asymptotic expressions should use at least thirty two observations, indicating that in this case should be at least 32 or greater).
The test statistic in (23) will be referred to from here forward as the test statistic or anomaly detector, which has two steps: data transformation and test statistic estimation.
2.4. Implementation Notes for the Standing Alone Detector
2.4.1. Function Maximization
In order to implement (23), we perform an unconstrained maximization of the log maximum likelihood function in (18), or alternatively one could minimize the negative version of (18), to obtain the extremum estimators and . We used one of the conventional unconstrained nonlinear optimization algorithms—the simplex method [26], which is available in Matlab software (Release 13) under the function name fminsearch. The simplex method is a direct search method that does not use numerical or analytic gradients. If is the length of , a simplex in dimensional space is characterized by the distinct vectors that are its vertices. For instance, in twospace, a simplex is a triangle; in threespace, it is a pyramid. At each step of the search, a new point in or near the current simplex is generated. The function value at the new point is compared with the function’s values at the vertices of the simplex and, usually, one of the vertices is replaced by the new point, giving a new simplex. This step is repeated until the diameter of the simplex is less than the specified tolerance. A limitation using such a method is that it may find a local extremum, so the choices of initial parameters may prove to be critical in some cases; however, we found in practice that by setting the initial values to , the method converges reasonably fast and works very well for all of the cases that we have observed, independently of whether anomalies were present or not in the tests.
2.4.2. Variance under the Null Hypothesis
The term in (23) is computed using , where and is shown in (20).
2.4.3. Decision Threshold
As mentioned earlier, using (23), high values of Z_{SemiP} reject hypothesis in (13), declaring that is an anomaly relative to . Using this detector as a standing alone unit, one could set a decision threshold based on the type I error, that is, based on the probability of rejecting given that is true. Using a standard integral table for the Chi square distribution, with 1 degree of freedom, find a threshold that yields an acceptable probability of error (e.g., 0.001).
2.5. Model Behavior in the Presence of Sample Mixtures
We show in this subsection the robustness of the semiparametric model toward an asymmetric test, that is, when a sample of a mixture is compared against a component of that mixture, which is found locally across the image in the form of spatial transitions. More specifically, we would like to show that (estimator for in (13)) is significantly closer to ZERO when, for two PDFs , and or (representing the union or a mixture of two PDFs) than when and . We illustrate this fact using simulated data and focusing on three specific case studies.
Case 1. versus .
Case 2. versus .
Case 3. versus .
According to [20], Case 2, which represents a transition of distinct regions in the image, appears some 20% of the entire image, or higher, as local patches are observed through a small moving window across typical images. Therefore, Case 2 is a major source of false alarms that could be avoided using a more robust model of the background than the typical models used in the target community. In anomaly detection applications, it is desired that the employed detector declares the presence of an anomaly for Case 1 but no anomalies for Cases 2 and 3; a declaration of no anomalies present is also desired, if Case 2 were reversed, that is, versus , although this case is not shown here, its results are consistent with Table 3.

The results shown in Table 3 were computed using 100 simulated random samples from an i.i.d. Gaussian distribution to represent the reference sequence and another 100 samples to represent the test sequence . A sequence representing a mixture of two classes consists of 50 samples for each class resulting in a total of 100 samples. The formulation and parameters used to generate these sequences are shown in Table 3 for different case studies, where the samples in row 2 (starting from the left upper corner in Table 3) simulates a local test between a genuine isolated object and its homogeneous surrounding background —Case 1, the samples in row 3 simulates a local test at a transition between two classes, where the test sample belongs to one of these classes—Case 2; the samples in row 4 simulates a local test within a homogeneous region—Case 3. Practical implementation details of the detector, which includes the estimation of , are shown in Section 2.3. The parameters were estimated by maximizing the log likelihood function, using an optimization subroutine initialized to (0,0), (0,0), and (0,0) for Cases 1, 2, and 3, respectively, so that convergence to a solution down to a tolerable error could be achieved using the subroutine.
Since the solution of the semiparametric model uses the union of samples and estimators and to estimate (which itself is an estimator of ), we also included in Table 3 the mean estimates , , and and the variance estimates , , ; where estimates variance from the solution of the semiparametric model using the union of samples and ; estimates variance using only the reference sample ; and is the sample variance using . The mean estimates were computed accordingly, see Table 3.
In reference to results shown in Table 3, recall that the null hypothesis is , and notice that the value of in Table 3 are significantly closer to zero for Case 3 (homogeneous region) and Case 2 (a transition of two different region) than for Case 1 (genuine local anomaly), where in Case 1 and do belong to different classes. Notice also that the disparity between the values of and for each case study also reflects how close is to zero. For instance, the disparity between , and for Case 1 is quite large compared to corresponding disparities for Cases 2 and 3.
The semiparametric model handles mixture by showing sensitivity on the estimation of and , such that when the test sample has strong statistical information about one of the subclasses in the reference sample, the semiparametric method responds by keeping both and relatively close to zero in order to maximize the log likelihood function in (18). The estimates affect directly the computation of , which in turn is used to compute .
To shed some light on the effect of and on , we present some results in Figure 2. The plots shown in Figure 2 (row 1) corresponds to Case 1; where, the plots on the left depict the values of as a function of index , for convenience we have marked where the sequences and are relative to each other within ; and the plots on the right depict versus . Likewise for Case 2 (row 2) and Case 3 (row 3).
Let us consider first Cases 3 and 1. As mentioned earlier, because of the semiparametric model in (11) and (12), the union of samples , where is the reference sequence, is used to estimate the reference PDF estimator . Circumstances when both samples belong to the same population (Case 3), the estimated cumulative weight for the test sample , that is, , is expected to be approximately equal to the estimated cumulative weight for the reference sample , that is, ; because the constraint was used in the profiling method to attain in terms of , we would expect both and to be near 0.500. Using simulated samples for the normal distributions and parameters shown in Table 3 for Case 3, we obtained and , which are very close to our expectations, see Figure 6. We interpret the actual values of shown for Case 3 in Figure 6(f) to be an indication that the semiparametric method regards the test sample to be as important in the estimation of as the reference sample is in that estimation, for the right justification, as both and happens to be governed by the same distribution.
(a)
(b)
(c)
(d)
(e)
(f)
In contrast, when both and belong to clearly different classes (see Case 1 in Table 3), the semiparametric method recognizes this difference and virtually shuts down the contribution of in the estimation of . The way it shuts down the contribution of is by maximizing the log likelihood function with values of and that allow the estimates of atomic exponential distortions to be relatively high when corresponds to components of .
And since are inversely proportional to , see (20), the contributions of corresponding components of in estimating are shut down as an indication of nonimportance to this estimation. The implication of this shut down is that the value of is relatively away from zero (relative to Case 3, for instance), which rejects the null hypothesis as desired. Figures 6(a) and 6(b) show the combined samples for Case 1 and the resulting cumulative weights for the test sample and for the reference sample , where denotes approximately equal to. The shutdown is reflected in the results for .
In reference to Case 2, where the information carried in is also contained in , as half of the random components in are governed by the same distribution of the random components in —see Table 3 and Figures 6(c) and 6(d), the semiparametric method recognizes this fact by holding the value of at near zero, and interestingly by allowing the contributions of estimating to be comparable to those contributions of the portion of that are similar to . In other words, since both and are used to estimate and half of the random components in are governed by the same distribution of all of the random components in and the other half are governed by a different distribution, the semiparametric method will not discriminate between and the portion of that is similar to . The outcome of this behavior is that, in order to maximize the log likelihood function, the values of and are kept statistically close to zero (in this case ) to reflect that the information in is important in the estimation of . The way this method explicitly shows this nondiscrimination is by yielding the cumulative weight using and the portion of that is similar to to hold half of the power, while the other half of the power is allocated to the portion of that is dissimilar to . In other words, using results from Figure 6(d) and this claim, we would expect a value of 0.5 (half power) for the cumulative weight using and the portion of that is similar to ; we obtained . And we would expect the other half of the power to be in the cumulative weight using the portion of that is dissimilar to ; we obtained . Notice that adding 0.5007 and 0.4993 yields exactly 1.0 as expected because the constraint was used in the profiling method in order to attain a representation of in terms of free parameters and .
A conclusion that we can draw from this discussion is that the semiparametric method will indirectly compare two samples (reference) and (test) by assuming that the distribution of and the distribution of are related (exponentially) to each other and that, therefore, the information content in both samples can be used to estimate one of these distributions, in particular, .
We found this indirect comparison method to be highly sensitive to the cumulative contribution of estimating . This sensitivity has an important practical value in the anomaly detection application for three reasons.
First, if , both samples and are expected to equally contribute to the estimation of , which in fact would improve that estimation due to the increase of sample size. Result: would be labeled as not being anomalous to in this application.
Second, if , sample is not expected to be allowed to contribute to the estimation of , thus, this estimation would solely rely on the cumulative contribution of . Result: would be labeled as being anomalous to in this application.
And third, if is a mixture of densities, such that, is a component in that mixture, we found that the contribution of would not be suppressed, but proportional to the weight of in that mixture (see Figure 6). Result: would be labeled as not being anomalous to in this application.
This behavior of the semiparametric test statistic is highly desired in the target community because it conforms with be behavior of a human analyst performing the same task in the target application, and it separates this method from existing ones performing the same task.
3. Quasiglobal Semiparametric Approach
The semiparametric test statistic is used as the primary scoring method for the quasiglobal anomaly detection approach. As mentioned in Section 1, the quasiglobal anomaly detection approach was designed to tackle the forward looking anomaly detection problem, although the application of the quasiglobal algorithm using nadir looking imagery are also considered in Section 4.
We start by describing the background sampling method and its probabilistic model, followed by description of the quasiglobal algorithm framework using the semiparametric test statistic.
3.1. Sampling Method and Its Probabilistic Model
Assume that target pixels are present in the spatial area of a HS data cube , denote the total number of target pixels in , the probability of a pixel in belonging to the target, and the relationship , where (or all pixels in ) (in most applications is unknown, and if multiple targets are present in the imagery, will be the total number of all pixels belonging to all targets present in the imagery; also, these targets may or may not have the same material type). In order to represent the unknown clutter background in the imagery, let blocks of data—all having a fixed small area —be randomly selected from the area, see one of the data cubes in Figure 1. In theory, for and using the assumption that target pixels in are disjoint and randomly located across the imagery area (note that in practice, this assumption is not satisfied when targets are present in the scene, but we will use this assumption to establish a guideline), the probability that at least one block of data has a target pixel is where is the binomial density function [27], given parameters and , and is the number of blocks of data containing a target pixel, or (symbols and denote given parameters and the factorial operator, resp.).
For convenience, we will refer to as the probability of contamination and the number of contaminated blocks of data.
The implementation of this contamination model to the autonomous background sampling problem requires that each one of the blocks of data be regarded as an independent reference set representing clutter spectra, where is a rearranged sequence version of the block of data having spectra. By necessity, must be significantly greater than one—for statistical purposes—but yet significantly smaller than (e.g., ) in order to reasonably satisfy the assumption that a block of data has an unit area at the imagery area. A contaminated block of data, then, will be treated qualitatively as a block having target pixels covering a large portion of the block’s area (e.g., greater than 0.70). In addition—when targets are present, since pixels representing a single target are expected to be clustered in the imagery, the assumption that each target pixel is randomly located across the imagery area will be ignored. Using (24), while ignoring the nonclustered target pixel assumption, implies that the probability of contamination will be overestimated, as blocks of data are less likely to be randomly selected from the same cluster of target pixels (for the autonomous background sampling problem, it is more conservative to overestimate the probability of contamination than to underestimate).
Figure 7(b) shows a plot of the probability of contamination versus , for two values of (0.1 and 0.2). It is highlighted in the plot in reference that, for instance, if parameters are set to then. Notice that for , if target pixels are present but cover less than of the imagery area, = 0.90 is overestimated by two fronts: (i) pixels from a single target are not randomly spread across the imagery area, but clustered, and (ii) the cumulative number of target pixels covers less than 0.10 of the imagery area. So, (24) provides an upper bound (conservative) approximation of the probability of contamination, given parameters and .
(a)
(b)
(c)
Figure 7(b) also shows the tradeoff between having a larger number of spectral sets (increasing ) in order to adequately represent the clutter background, which is desired, and the cost of increasing probability of contamination, which is not desired (in particular, contamination implies that once target pixels are randomly selected by chance from the imagery area, they will be used by a detector as reference set to test the entire imagery, in which case targets would be suppressed).
Since the presence of target pixels in the imagery is unknown a priori, finding a way to decrease the probability of contamination becomes a necessity. In order to decrease this probability, using an adequately large and a sensible value for , we propose to independently repeat the random sampling process described in this subsection number of times. Figure 7(c) illustrates the outcome of repetitions. If we denote the probability of contamination of the random sampling process (or repetition) as , , for a fixed and , note that each and, since and these processes will be repeated independently from each other, the overall probability that all of the processes will be contaminated with at least a contaminated block of data will decrease as a function of increasing , or
Figure 7(c) plots as a function of increasing , for and . Taking, as an example, the curve in Figure 7 corresponding to using in (24), notice that for , decreases to virtually zero. This outcome implies that at least one out of the processes has an extremely high probability of not being contaminated, given that and target pixels do not cover significantly more than 10% of the imagery area .
3.2. Algorithmic Fusion
A framework for the quasiglobal semiparametric anomaly detection algorithm can now be developed using (i) the repeated random sampling model discussed in Section 3.1 (needed to characterize the unknown clutter background in the imagery), (ii) the semiparametric anomaly detector discussed in Section 2.3 (needed to test reference data against the entire imagery), (iii) a way to fuse the results from testing randomly chosen blocks of data against the entire imagery using small windows (this will produce a 2dim output surface per repetition), and (iv) a way to fuse independently produced 2dim output surfaces into a single 2dim decision surface.
Start by letting a HS data cube be available for autonomous testing. Let also blocks of data be randomly selected from the ’s spatial area and used as a reference library set representing clutter background spectra, where is a rearranged sequence version of the block of data having spectra, where are dim column vectors. Let be the rearranged version of a window of test data at location in —see (1) for column vectors ; first, we would like to automatically test against all , and produce a single output (scalar) value from these test results.
For better clarity in this subsection, we repeat the data transformation steps discussed in Section 2.2, but with the inclusion of index and letting , where is the radiance value in , , and and, denoting the columns of as , And equivalently for —the rearranged version of a window of test data at location in and the columns of in (28)—, one has From (30) and (31), the following two univariate sequences will be used as inputs to the detector: where .
Following the discussion that led to (33), results from the semiparametric test statistic can be used (or fused) as following: where , , , and index the leftupper corner pixel of an window in ; using (32) and (33), and estimates and can be obtained by replacing with in (18) and then performing an unconstrained maximization of ; for this paper, a standard unconstrained minimization routines available in Matlab software (i.e., fminsearch) was used, setting initial values to .
Notice that if are placed in ascending order and denoted by , such that , then according to (34)—the lowest order statistics (see, for instance, [28]). This fact will be used in estimating the asymptotic behavior of the overall quasiglobal semiparametric algorithm, shown in the Appendix.
Notice also that if is significantly different from all , then all of the corresponding results in (35) would yield high values; this outcome would guarantee the lowest order statistics in (34) to hold a high value. Otherwise, if is significantly similar to at least one of the samples in , then at least one of the corresponding results in would yield a low value; this low value would be assigned to , according to (34).
Since it is unknown a priori whether target spectra are present in , the entire needs to be tested. In order to achieve it, all must be computed according to (34), producing a 2dim output surface . The index has been introduced to results produced by (34) in order to denote the repetition number discussed in Section 3.1, yielding
The computation leading to (39) will be independently repeated number of times in order to significantly reduce the probability of contamination (i.e., samples of targets labeled as clutter background). Applying a cutoff threshold to all pixel values in , such that, pixel values that are above or equal to the threshold become 1 and values below become 0, yielding a binary surface (a probabilistic cutoff threshold for this algorithm is presented in Section 4.3). The binary surfaces are fused using the logic OR operator , leading to the algorithm’s final output surface , or
Figure 8 illustrates (39) and (40) through a repeated random sampling and result fusion diagram. The diagram shows M independent paths, where, in each path, independent blocks of data are randomly selected from the input HS data cube so that the entire data cube can be tested against these blocks of data, using a testing window of the same block size. Each path, which is indexed by , produces a 2dim output surface , where, at the backend of the diagram, all passes through a logical OR operator on a pixelwise fashion (i.e., only the pixel values at the same pixel location are logically OR’ed), producing a final 2dim surface , as shown in (40).
The motivation and functionality shown in Figure 8 are summarized as follows: for a given repetition , assume that the realization of