Abstract
Thanks to the rapid development of hyperspectral sensors, hyperspectral videos (HSV) can now be collected with high temporal and spectral resolutions and used for dynamic monitoring of targets invisible to the naked eye, such as chemical gas plume tracking. However, using such sequential large-scale data effectively is challenging, because processing them directly places huge demands on computation and memory. This paper presents a keyframe and target detection algorithm based on cumulative tensor CANDECOMP/PARAFAC (CP) factorization (CTCF) to select the frames in which the target appears, and a novel super-resolution (SR) method using sparse-based tensor Tucker factorization (STTF) to improve their spatial resolution. In the CTCF method, the HSV sequence is treated as a cumulative tensor, and the correlation between adjacent frames is exploited by CP tensor approximation. In the proposed STTF-based SR method, each HSV frame is regarded as a third-order tensor, and HSV frame super-resolution is transformed into the estimation of the dictionaries along the three dimensions and of the core tensor. To promote a sparse core tensor, a regularizer is incorporated to model the high spatial-spectral correlations. The core tensor and the dictionaries along the three dimensions are estimated through a sparse-based Tucker factorization of each HSV frame. Experimental results on a real HSV data set demonstrate the superiority of the proposed CTCF and STTF algorithms over the state-of-the-art target detection and SR approaches used for comparison.
1. Introduction
Hyperspectral imaging has become one of the most popular research fields owing to its ability to identify materials with very high spectral resolution and coverage. In the last decade, researchers have focused on the processing and application of hyperspectral images (HSIs), such as denoising [1, 2], feature extraction [3, 4], classification [5–11], detection [12–14], and super-resolution (fusion) [15–18]. This section briefly reviews research in the latter two fields, which are related to this paper.
Target detection is essentially a binary classification that labels every image pixel as target or background. In HSIs, pixels whose spectral signatures differ significantly from those of their neighboring background pixels are defined as spectral anomalies. Anomaly detectors are statistical or pattern recognition methods used to detect such distinct pixels. It is worth mentioning that spectral anomaly detection approaches [19–22], such as the Reed-Xiaoli (RX) algorithm [23], neither assume nor use prior information about the target spectral signature. In this paper, however, we focus on the detection of invisible gas plumes, and the spectral characteristics of the desired target are assumed to be known. In such cases, signature-based target detection algorithms are used instead of anomaly detectors. In these algorithms, the spectral characteristics of the target can be represented by a target subspace or a single target spectrum [24]. Likewise, the background can be characterized statistically by a Gaussian distribution or by a subspace describing the local or global background statistics. In this category, the matched subspace detector (MSD) [25] is one of the most typical algorithms. In the MSD, a pixel vector is modeled as a linear combination of target and background spectral signatures, which span the target subspace and the background subspace, respectively. The generalized likelihood ratio test (GLRT) is then applied, using projection matrices associated with the background subspace and the combined target-and-background subspace. Finally, the GLRT output is compared with a preset threshold to decide whether the target is absent or present. Moving from the pixel level to the subpixel level, a single pixel may contain several distinct pure materials (endmembers); such a pixel is known as a mixed pixel.
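The GLRT step of the MSD can be illustrated with a minimal NumPy sketch. It assumes the common ratio form of the matched subspace detector statistic, $\mathbf{x}^T\mathbf{P}_B^{\perp}\mathbf{x} \,/\, \mathbf{x}^T\mathbf{P}_{SB}^{\perp}\mathbf{x}$, where $\mathbf{P}^{\perp} = \mathbf{I} - \mathbf{Z}\mathbf{Z}^{\dagger}$ projects onto the orthogonal complement of a subspace basis $\mathbf{Z}$. The subspace bases, the toy spectra, and the threshold `eta` are hypothetical; this is not the configuration of [25] or of the experiments below.

```python
import numpy as np

def orth_complement_projector(Z):
    """I - Z Z^+ : projects onto the orthogonal complement of span(Z)."""
    return np.eye(Z.shape[0]) - Z @ np.linalg.pinv(Z)

def msd_statistic(x, S, B):
    """Ratio-form matched subspace detector statistic for one pixel spectrum x.

    S: L x p target-subspace basis; B: L x q background-subspace basis.
    The value is >= 1; values well above 1 mean the target subspace explains
    energy that the background subspace alone cannot.
    """
    P_b = orth_complement_projector(B)
    P_sb = orth_complement_projector(np.hstack([S, B]))
    num = x @ P_b @ x
    den = x @ P_sb @ x
    return float(num / max(den, np.finfo(float).tiny))

rng = np.random.default_rng(0)
L = 50                                   # number of spectral bands (toy value)
S = rng.standard_normal((L, 2))          # hypothetical target signatures
B = rng.standard_normal((L, 5))          # hypothetical background basis
background_pixel = B @ rng.standard_normal(5) + 0.01 * rng.standard_normal(L)
target_pixel = background_pixel + S @ np.array([1.0, 0.5])

eta = 2.0                                # hypothetical decision threshold
for name, x in [("background", background_pixel), ("target", target_pixel)]:
    d = msd_statistic(x, S, B)
    print(name, round(d, 2), "target" if d > eta else "no target")
```

Because the combined subspace contains the background subspace, the denominator can never exceed the numerator, so a pure background pixel stays close to 1 while a pixel carrying target energy yields a much larger ratio.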
Mixed pixels are a difficult problem caused by the low spatial resolution of HSIs. Accordingly, several unmixing approaches [26–28] have been designed to compute the fractional abundances of endmembers. In [29], a hyperspectral unmixing approach based on constrained matrix factorization (CMF) was proposed. Unlike conventional methods, it represents each column vector of the endmember matrix as a nonnegative linear combination of pixel spectra. After the endmember matrix and the corresponding fractional abundance matrix are obtained by solving an optimization problem, the abundance map of the target endmember gives the detection result.
As mentioned before, HSIs often suffer from low spatial resolution. To acquire an HSI, the number of photons collected in each spectral band must exceed a minimum value; because an HSI has so many spectral bands, spatial resolution has to be sacrificed. Therefore, super-resolution (SR) techniques have attracted great interest in the last decade. Generally, SR methods for HSIs can be classified into four categories: Bayesian [30], component analysis [31], deep learning [32], and sparse representation. Owing to space limitations, we focus on sparse-based algorithms. In such HSI super-resolution schemes, images are expressed by dictionaries and corresponding sparse coefficients. On the basis of the spatial-spectral sparsity of HSIs, the dictionaries and sparse coefficients are estimated jointly [33]. Huang et al. [34] introduced a method for fusing multispectral images (MSIs) with different spectral and spatial resolutions based on sparse matrix factorization. Akhtar et al. [35] presented an MSI-HSI fusion approach using sparse coding and Bayesian dictionary learning. Moreover, some algorithms based on matrix factorization [36–38] or unmixing [39] can also be regarded as sparse representation schemes, because the source images are decomposed into bases and corresponding coefficients. Yokoya et al. proposed the coupled nonnegative matrix factorization (CNMF) algorithm [40], in which unmixing techniques are employed to yield the endmember matrices and the high-resolution (HR) abundance matrices of the HSI. In [41], Lanaras et al. suggested a joint scheme that solves the spectral unmixing problems of both input images. In [42], Zhang et al. fused the low-resolution (LR) HSI and the HR-MSI based on group spectral embedding and low-rank factorization.
However, matrix factorization based schemes cannot fully exploit the spatial-spectral correlations of HSIs. Treating HSIs as tensors is considered preferable, because an HSI can naturally be expressed as a third-order tensor. In this paper, a detection algorithm based on cumulative tensor CP factorization (CTCF) is proposed. The sequential HSV data are expressed as a four-dimensional (4D) cumulative tensor, and factor matrices are obtained by decomposing the original 4D tensor with CP factorization. When a new frame arrives and is appended along the time dimension of the original tensor, the 4D cumulative tensor is updated together with the factor matrices. A CP tensor approximation of the new frame is then computed from the updated factor matrices, and the fitness between the new frame and its approximation is calculated. By comparing the fitness with a preset threshold, we decide whether the new frame is used to continue updating the cumulative tensor or is a keyframe in which the target is present. The CTCF-based method exploits not only the spatial-spectral correlations of the HSIs, through the tensor model, but also the temporal correlation between adjacent frames of the HSV.
On the other hand, tensor-based analysis has also been widely used in HSI super-resolution [43–45]. To the best of our knowledge, most SR algorithms enhance spatial resolution by fusing a high-resolution MSI (HR-MSI) and a low-resolution HSI (LR-HSI) of the same scene. Unfortunately, this is less practical in real applications, where the LR-HSI is often the only data available. In this paper, we propose an SR algorithm using sparse-based tensor Tucker factorization (STTF). Inspired by the Tucker factorization and related works, the HSV frames are represented as third-order tensors, which are approximated by the product of a core tensor and the dictionaries along the three dimensions (i.e., the dictionaries of the height mode, the width mode, and the spectral mode, called the “three modes dictionaries” for short in the rest of this paper). The SR problem is thereby transformed into the estimation of the three modes dictionaries and of the core tensor. Specifically, the spatial information is represented by the height mode and width mode dictionaries, the spectral information is represented by the spectral mode dictionary, and the interactions among the three modes dictionaries are modeled by the core tensor. Because HSIs are generally self-similar, a sparse prior can be imposed on the core tensor; the estimation of the core tensor and the three modes dictionaries is then formulated as the STTF of the LR and HR HSV frames. In each STTF iteration, the core tensor and the dictionaries are all updated, and accurate estimates are obtained when convergence is achieved.
The remainder of this paper is organized as follows. Section 2 presents the materials and methods, including the basic notations and preliminaries of tensors and tensor factorization, the proposed CTCF approach for keyframe detection, and the proposed STTF method for keyframe super-resolution. Section 3 gives the experimental results on real HSV data and discusses them. Section 4 concludes the paper and outlines ideas for future work along the path presented here.
2. Materials and Methods
2.1. Tensor Notations and Preliminaries
2.1.1. Tensor Notations
In this paper, vectors are denoted by boldface lowercase letters ($\mathbf{x}$), matrices by boldface capital letters ($\mathbf{X}$), and tensors by bold Euler script letters ($\mathcal{X}$). Generally, a tensor is a multidimensional array, denoted by $\mathcal{X} \in \mathbb{R}^{I_1 \times I_2 \times \cdots \times I_N}$. Here, $\mathcal{X}$ is an $N$th-order tensor and $I_n$ is the dimension of the $n$th mode. Obviously, vectors are first-order tensors and matrices are second-order tensors. We use $\mathbf{x}_{i_1 \cdots i_{n-1} : i_{n+1} \cdots i_N}$ to denote a mode-$n$ fiber, i.e., a vector obtained from $\mathcal{X}$ by varying the index $i_n$ with all other indices fixed. The mode-$n$ unfolding matrix of $\mathcal{X}$, denoted by $\mathbf{X}_{(n)} \in \mathbb{R}^{I_n \times (I_1 \cdots I_{n-1} I_{n+1} \cdots I_N)}$, is generated by placing all the mode-$n$ fibers in a matrix as columns.
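An important operation between a tensor and a matrix is the $n$-mode product, defined as
$$\mathcal{Y} = \mathcal{X} \times_n \mathbf{U}, \tag{1}$$
where $\mathcal{X} \in \mathbb{R}^{I_1 \times \cdots \times I_N}$ and $\mathbf{U} \in \mathbb{R}^{J \times I_n}$. Denoting the elements of $\mathbf{U}$ by $u_{j i_n}$, the elements of $\mathcal{Y} \in \mathbb{R}^{I_1 \times \cdots \times I_{n-1} \times J \times I_{n+1} \times \cdots \times I_N}$ are computed by
$$y_{i_1 \cdots i_{n-1}\, j\, i_{n+1} \cdots i_N} = \sum_{i_n=1}^{I_n} x_{i_1 i_2 \cdots i_N}\, u_{j i_n}. \tag{2}$$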
From the definition of the $n$-mode product, we obtain
$$\mathbf{Y}_{(n)} = \mathbf{U}\mathbf{X}_{(n)}. \tag{3}$$
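For successive products of a tensor with matrices in distinct modes, the result does not depend on the order of multiplication:
$$\mathcal{X} \times_m \mathbf{A} \times_n \mathbf{B} = \mathcal{X} \times_n \mathbf{B} \times_m \mathbf{A}, \quad m \neq n. \tag{4}$$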
If the modes are the same, equation (4) becomes
$$\mathcal{X} \times_n \mathbf{A} \times_n \mathbf{B} = \mathcal{X} \times_n (\mathbf{B}\mathbf{A}). \tag{5}$$
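Suppose that $\{\mathbf{A}_n \in \mathbb{R}^{J_n \times I_n}\}_{n=1}^{N}$ is a collection of matrices; we define the tensor $\mathcal{Y} \in \mathbb{R}^{J_1 \times \cdots \times J_N}$ as
$$\mathcal{Y} = \mathcal{X} \times_1 \mathbf{A}_1 \times_2 \mathbf{A}_2 \cdots \times_N \mathbf{A}_N. \tag{6}$$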
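The matricized form of equation (6) is
$$\mathbf{y} = (\mathbf{A}_N \otimes \mathbf{A}_{N-1} \otimes \cdots \otimes \mathbf{A}_1)\,\mathbf{x}, \tag{7}$$
where $\mathbf{y}$ ($\mathbf{y} \in \mathbb{R}^{J_1 J_2 \cdots J_N}$) and $\mathbf{x}$ ($\mathbf{x} \in \mathbb{R}^{I_1 I_2 \cdots I_N}$) are the vectors obtained by stacking the mode-1 fibers of the tensors $\mathcal{Y}$ and $\mathcal{X}$, respectively. The symbol “$\otimes$” denotes the Kronecker product.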
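The notation above maps directly onto array operations. The following NumPy sketch uses our own helper names (`unfold`, `fold`, `mode_n_product`, `vec`; none come from the paper). It implements the mode-$n$ unfolding with earlier modes varying fastest along the columns, computes the $n$-mode product through $\mathbf{Y}_{(n)} = \mathbf{U}\mathbf{X}_{(n)}$, and numerically checks the element-wise definition, the commutation rule for distinct modes, the collapse rule for repeated products in one mode, and the Kronecker form of the vectorized product on small random tensors.

```python
import numpy as np

def unfold(X, n):
    """Mode-n unfolding X_(n): mode-n fibers become columns (earlier modes vary fastest)."""
    return np.reshape(np.moveaxis(X, n, 0), (X.shape[n], -1), order="F")

def fold(M, n, shape):
    """Inverse of unfold for a tensor with the given shape."""
    moved = [shape[n]] + [s for i, s in enumerate(shape) if i != n]
    return np.moveaxis(np.reshape(M, moved, order="F"), 0, n)

def mode_n_product(X, U, n):
    """Y = X x_n U, computed through the unfolding identity Y_(n) = U X_(n)."""
    shape = list(X.shape)
    shape[n] = U.shape[0]
    return fold(U @ unfold(X, n), n, shape)

def vec(T):
    """Stack the mode-1 fibers of T into one long vector."""
    return T.reshape(-1, order="F")

rng = np.random.default_rng(0)
X = rng.standard_normal((4, 5, 6))
A = rng.standard_normal((3, 4))
B = rng.standard_normal((2, 5))
B2 = rng.standard_normal((7, 2))   # applied after B in the same mode
C = rng.standard_normal((8, 6))

# Element-wise definition versus the unfolding route.
Y = mode_n_product(X, A, 0)
Y_loop = np.einsum("ijk,ai->ajk", X, A)

# Products in distinct modes commute.
lhs4 = mode_n_product(mode_n_product(X, A, 0), B, 1)
rhs4 = mode_n_product(mode_n_product(X, B, 1), A, 0)

# Repeated products in the same mode collapse into one matrix product.
lhs5 = mode_n_product(mode_n_product(X, B, 1), B2, 1)
rhs5 = mode_n_product(X, B2 @ B, 1)

# vec(X x_1 A x_2 B x_3 C) = (C kron B kron A) vec(X).
Y6 = mode_n_product(mode_n_product(mode_n_product(X, A, 0), B, 1), C, 2)
y7 = np.kron(C, np.kron(B, A)) @ vec(X)

print(np.allclose(Y, Y_loop), np.allclose(lhs4, rhs4),
      np.allclose(lhs5, rhs5), np.allclose(vec(Y6), y7))   # True True True True
```

The Fortran-order reshape is what makes the column ordering of the unfolding agree with the Kronecker ordering $\mathbf{A}_N \otimes \cdots \otimes \mathbf{A}_1$; with C-order reshapes the Kronecker factors would appear in the opposite order.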
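Moreover, for a tensor $\mathcal{X}$, $\|\mathcal{X}\|_0$ denotes the $\ell_0$ norm, which equals the number of nonzero elements of $\mathcal{X}$; $\|\mathcal{X}\|_1 = \sum_{i_1, \ldots, i_N} |x_{i_1 \cdots i_N}|$ denotes the $\ell_1$ norm; and $\|\mathcal{X}\|_F = \big(\sum_{i_1, \ldots, i_N} x_{i_1 \cdots i_N}^2\big)^{1/2}$ denotes the Frobenius norm.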
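Finally, we introduce the rank-one tensor. An $N$th-order tensor $\mathcal{X}$ is rank-one if it can be written as the outer product of $N$ vectors, i.e., $\mathcal{X} = \mathbf{a}^{(1)} \circ \mathbf{a}^{(2)} \circ \cdots \circ \mathbf{a}^{(N)}$, where the symbol “$\circ$” denotes the vector outer product [46].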
2.1.2. Tensor Factorizations
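CANDECOMP/PARAFAC (CP) factorization decomposes a tensor into a sum of rank-one component tensors [47]. For example, a third-order tensor $\mathcal{X} \in \mathbb{R}^{I \times J \times K}$ may be written as
$$\mathcal{X} \approx \sum_{r=1}^{R} \mathbf{a}_r \circ \mathbf{b}_r \circ \mathbf{c}_r, \tag{8}$$
where $R$ is a positive integer and $\mathbf{a}_r \in \mathbb{R}^{I}$, $\mathbf{b}_r \in \mathbb{R}^{J}$, and $\mathbf{c}_r \in \mathbb{R}^{K}$ ($r = 1, \ldots, R$). The elements of $\mathcal{X}$ can be computed by
$$x_{ijk} \approx \sum_{r=1}^{R} a_{ir}\, b_{jr}\, c_{kr}. \tag{9}$$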
CP factorization is illustrated in Figure 1.
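The factorization can be expressed by factor matrices for the three dimensions. The factor matrices collect the vectors of the rank-one components as columns; i.e.,
$$\mathbf{A} = [\mathbf{a}_1, \ldots, \mathbf{a}_R], \quad \mathbf{B} = [\mathbf{b}_1, \ldots, \mathbf{b}_R], \quad \mathbf{C} = [\mathbf{c}_1, \ldots, \mathbf{c}_R]. \tag{10}$$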
Following [48], the CP model can be concisely represented as
$$\mathcal{X} \approx [\![\mathbf{A}, \mathbf{B}, \mathbf{C}]\!]. \tag{11}$$
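On the basis of the factor matrices, the mode-$n$ unfolding matrices $\mathbf{X}_{(n)}$ ($n = 1, 2, 3$) of $\mathcal{X}$ can be represented as
$$\mathbf{X}_{(1)} \approx \mathbf{A}(\mathbf{C} \odot \mathbf{B})^{T}, \quad \mathbf{X}_{(2)} \approx \mathbf{B}(\mathbf{C} \odot \mathbf{A})^{T}, \quad \mathbf{X}_{(3)} \approx \mathbf{C}(\mathbf{B} \odot \mathbf{A})^{T}, \tag{12}$$
where the symbol “$\odot$” denotes the Khatri-Rao product [49]. In this way, loss functions can be modeled as the approximation errors of the mode-$n$ unfolding matrices, and the CP factor matrices can be obtained by solving the corresponding optimization problem.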
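A small NumPy sketch makes the CP model and its unfolding form concrete. The helper names `khatri_rao` and `cp_to_tensor` are our own; the example builds a tensor from random factor matrices, confirms that it equals the explicit sum of $R$ rank-one outer products, and checks all three Khatri-Rao unfolding identities.

```python
import numpy as np

def unfold(X, n):
    """Mode-n unfolding (earlier modes vary fastest along the columns)."""
    return np.reshape(np.moveaxis(X, n, 0), (X.shape[n], -1), order="F")

def khatri_rao(A, B):
    """Column-wise Kronecker product: column r is kron(A[:, r], B[:, r])."""
    return np.einsum("ir,jr->ijr", A, B).reshape(-1, A.shape[1])

def cp_to_tensor(A, B, C):
    """[[A, B, C]]: x_ijk = sum_r a_ir b_jr c_kr."""
    return np.einsum("ir,jr,kr->ijk", A, B, C)

rng = np.random.default_rng(1)
I, J, K, R = 4, 5, 6, 3
A, B, C = (rng.standard_normal((d, R)) for d in (I, J, K))
X = cp_to_tensor(A, B, C)

# Explicit sum of R rank-one (outer-product) tensors a_r o b_r o c_r.
X_sum = sum(np.multiply.outer(np.multiply.outer(A[:, r], B[:, r]), C[:, r])
            for r in range(R))

# Unfoldings expressed with Khatri-Rao products.
ok = [np.allclose(unfold(X, 0), A @ khatri_rao(C, B).T),
      np.allclose(unfold(X, 1), B @ khatri_rao(C, A).T),
      np.allclose(unfold(X, 2), C @ khatri_rao(B, A).T)]
print(np.allclose(X, X_sum), all(ok))
```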
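Tucker factorization is another popular tensor decomposition approach [50]. It decomposes a tensor into a core tensor multiplied by a matrix along each mode. Thus, in the same case as above, where $\mathcal{X} \in \mathbb{R}^{I \times J \times K}$, the factorization can be described as
$$\mathcal{X} \approx \mathcal{G} \times_1 \mathbf{A} \times_2 \mathbf{B} \times_3 \mathbf{C} = \sum_{p=1}^{P}\sum_{q=1}^{Q}\sum_{r=1}^{R} g_{pqr}\, \mathbf{a}_p \circ \mathbf{b}_q \circ \mathbf{c}_r, \tag{13}$$
where $\mathbf{A} \in \mathbb{R}^{I \times P}$, $\mathbf{B} \in \mathbb{R}^{J \times Q}$, and $\mathbf{C} \in \mathbb{R}^{K \times R}$ are factor matrices that can be regarded as the principal components in each mode. Therefore, Tucker factorization is a form of higher-order principal component analysis (PCA). The core tensor $\mathcal{G} \in \mathbb{R}^{P \times Q \times R}$ has elements that represent the level of interaction between the different components. Similar to (11), the Tucker model can be concisely represented by $\mathcal{X} \approx [\![\mathcal{G}; \mathbf{A}, \mathbf{B}, \mathbf{C}]\!]$. Element-wise, equation (13) can be written as
$$x_{ijk} \approx \sum_{p=1}^{P}\sum_{q=1}^{Q}\sum_{r=1}^{R} g_{pqr}\, a_{ip}\, b_{jq}\, c_{kr}. \tag{14}$$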
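The Tucker model and its higher-order PCA reading can be checked numerically with the sketch below (helper names are ours). It builds a tensor from a random core and orthonormal factor matrices, verifies the mode-1 matrix form $\mathbf{X}_{(1)} = \mathbf{A}\mathbf{G}_{(1)}(\mathbf{C} \otimes \mathbf{B})^T$, and shows that, with orthonormal factors, projecting the tensor onto the factors recovers the core exactly. The orthonormality is an assumption made for the illustration; the Tucker model itself does not require it.

```python
import numpy as np

def unfold(X, n):
    """Mode-n unfolding (earlier modes vary fastest along the columns)."""
    return np.reshape(np.moveaxis(X, n, 0), (X.shape[n], -1), order="F")

def tucker_to_tensor(G, A, B, C):
    """Element-wise Tucker model: x_ijk = sum_pqr g_pqr a_ip b_jq c_kr."""
    return np.einsum("pqr,ip,jq,kr->ijk", G, A, B, C)

rng = np.random.default_rng(2)
I, J, K, P, Q, R = 6, 5, 4, 3, 2, 2
G = rng.standard_normal((P, Q, R))
# Orthonormal factors, so each column acts like a principal component of its mode.
A = np.linalg.qr(rng.standard_normal((I, P)))[0]
B = np.linalg.qr(rng.standard_normal((J, Q)))[0]
C = np.linalg.qr(rng.standard_normal((K, R)))[0]
X = tucker_to_tensor(G, A, B, C)

# Matrix form for mode 1: X_(1) = A G_(1) (C kron B)^T.
print(np.allclose(unfold(X, 0), A @ unfold(G, 0) @ np.kron(C, B).T))

# With orthonormal factors, X x_1 A^T x_2 B^T x_3 C^T recovers the core.
G_rec = np.einsum("ijk,ip,jq,kr->pqr", X, A, B, C)
print(np.allclose(G_rec, G))
```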
The Tucker factorization is illustrated in Figure 2.
2.2. The Proposed CTCF-Based Detection Method
In this subsection, the optimization problem for updating the factor matrices is presented first, followed by the proposed cumulative tensor CP factorization (CTCF) of third-order tensors, which is then extended to $N$th-order tensors. The CTCF-based detection method is described at the end of this subsection, and its flowchart is shown in Figure 3.
2.2.1. CP Tensor Approximation by Factor Matrices
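Similar to equation (12), for an $N$th-order tensor $\mathcal{X}$ with CP factor matrices $\mathbf{A}^{(1)}, \ldots, \mathbf{A}^{(N)}$, the mode-$n$ unfolding matrix can be approximated by the factor matrices; i.e.,
$$\mathbf{X}_{(n)} \approx \mathbf{A}^{(n)}\left(\mathbf{A}^{(N)} \odot \cdots \odot \mathbf{A}^{(n+1)} \odot \mathbf{A}^{(n-1)} \odot \cdots \odot \mathbf{A}^{(1)}\right)^{T}, \tag{15}$$
where the factor matrices are obtained by CP factorization. The corresponding loss function is
$$\mathcal{L}\big(\mathbf{A}^{(n)}\big) = \frac{1}{2}\left\|\mathbf{X}_{(n)} - \mathbf{A}^{(n)}\left(\mathbf{A}^{(N)} \odot \cdots \odot \mathbf{A}^{(n+1)} \odot \mathbf{A}^{(n-1)} \odot \cdots \odot \mathbf{A}^{(1)}\right)^{T}\right\|_F^2. \tag{16}$$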
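The alternating least squares (ALS) algorithm is often applied to obtain the factor matrices by solving the following optimization problem:
$$\min_{\mathbf{A}^{(1)}, \ldots, \mathbf{A}^{(N)}} \frac{1}{2}\left\|\mathcal{X} - [\![\mathbf{A}^{(1)}, \ldots, \mathbf{A}^{(N)}]\!]\right\|_F^2. \tag{17}$$
ALS cycles through the modes, minimizing (16) for one factor matrix while the others are held fixed.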
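A plain CP-ALS for the third-order case can be sketched in NumPy as follows. This is a textbook version written for illustration, not the implementation used in the experiments; the defaults `max_iter=100` and `tol=1e-8` merely mirror the settings reported in Section 3.4, and the random initialization is an assumption. Each sub-step is the exact least-squares solution for one factor, using the identity $(\mathbf{C} \odot \mathbf{B})^T(\mathbf{C} \odot \mathbf{B}) = (\mathbf{C}^T\mathbf{C}) \ast (\mathbf{B}^T\mathbf{B})$.

```python
import numpy as np

def unfold(X, n):
    return np.reshape(np.moveaxis(X, n, 0), (X.shape[n], -1), order="F")

def khatri_rao(A, B):
    return np.einsum("ir,jr->ijr", A, B).reshape(-1, A.shape[1])

def cp_to_tensor(A, B, C):
    return np.einsum("ir,jr,kr->ijk", A, B, C)

def cp_als(X, R, max_iter=100, tol=1e-8, seed=0):
    """Plain CP-ALS for a third-order tensor.

    Each update solves the least-squares problem for one factor with the other two
    fixed; the small R x R Gram matrix is a Hadamard product of two Gram matrices.
    """
    rng = np.random.default_rng(seed)
    A, B, C = (rng.standard_normal((d, R)) for d in X.shape)
    X1, X2, X3 = (unfold(X, n) for n in range(3))
    norm_x = np.linalg.norm(X)
    rel_err = []
    for _ in range(max_iter):
        A = X1 @ khatri_rao(C, B) @ np.linalg.pinv((C.T @ C) * (B.T @ B))
        B = X2 @ khatri_rao(C, A) @ np.linalg.pinv((C.T @ C) * (A.T @ A))
        C = X3 @ khatri_rao(B, A) @ np.linalg.pinv((B.T @ B) * (A.T @ A))
        rel_err.append(np.linalg.norm(X - cp_to_tensor(A, B, C)) / norm_x)
        # Stop once the relative error no longer changes meaningfully.
        if len(rel_err) > 1 and abs(rel_err[-2] - rel_err[-1]) < tol:
            break
    return A, B, C, rel_err

rng = np.random.default_rng(3)
X = cp_to_tensor(*(rng.standard_normal((d, 3)) for d in (8, 7, 6)))
A, B, C, rel_err = cp_als(X, R=3)
print(len(rel_err), rel_err[-1])
```

Because every sub-step is an exact minimization, the recorded error can only stay the same or decrease from one sweep to the next; ALS can still stall in a poor local solution, which is why rerunning it from scratch for every new frame is both slow and fragile.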
When the tensor is updated, the new tensor can be approximated by the updated factor matrices obtained from (17).
2.2.2. CTCF of Third-Order Tensor
Generally, an image is a second-order tensor, and a sequence of images, i.e., a video, forms a third-order tensor by adding a temporal dimension to the two spatial dimensions. When a new video frame arrives and is appended along the time dimension of the original tensor, the result is defined as a three-dimensional (3D) cumulative tensor. As the number of new frames increases, the 3D cumulative tensor is updated frame by frame.
In conventional CP tensor approximation, whenever a new image frame is added along the time dimension, the ALS algorithm must be rerun to approximate the entire new cumulative tensor, which is time-consuming. In addition, the temporal correlation between neighboring frames is not exploited in the decomposition of the cumulative tensor. This paper proposes CTCF, which updates the CP factorization of the original cumulative tensor to obtain the updated factor matrices and approximate the new frame.
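Given an original 3D cumulative tensor $\mathcal{X}_{\mathrm{old}} \in \mathbb{R}^{I \times J \times K_{\mathrm{old}}}$, its CP factorization is denoted by $\mathcal{X}_{\mathrm{old}} \approx [\![\mathbf{A}_{\mathrm{old}}, \mathbf{B}_{\mathrm{old}}, \mathbf{C}_{\mathrm{old}}]\!]$. When a new tensor $\mathcal{X}_{\mathrm{new}} \in \mathbb{R}^{I \times J \times K_{\mathrm{new}}}$ is added along the time dimension, the updated cumulative tensor is $\mathcal{X} \in \mathbb{R}^{I \times J \times (K_{\mathrm{old}} + K_{\mathrm{new}})}$, whose CP factorization is $\mathcal{X} \approx [\![\mathbf{A}, \mathbf{B}, \mathbf{C}]\!]$. We focus on obtaining $\mathbf{A}$, $\mathbf{B}$, and $\mathbf{C}$ by updating $\mathbf{A}_{\mathrm{old}}$, $\mathbf{B}_{\mathrm{old}}$, and $\mathbf{C}_{\mathrm{old}}$.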
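The update is performed in an alternating way. First, the temporal factor matrix $\mathbf{C}$ is computed while the factor matrices $\mathbf{A}$ and $\mathbf{B}$ are fixed; i.e.,
$$\min_{\mathbf{C}}\ \frac{1}{2}\left\|\mathbf{X}_{\mathrm{old}(3)} - \mathbf{C}_{\mathrm{old}}(\mathbf{B} \odot \mathbf{A})^{T}\right\|_F^2 + \frac{1}{2}\left\|\mathbf{X}_{\mathrm{new}(3)} - \mathbf{C}_{\mathrm{new}}(\mathbf{B} \odot \mathbf{A})^{T}\right\|_F^2, \tag{18}$$
where $\mathbf{X}_{(3)}$ and $\mathbf{C}$ are each divided into an old part and a new part. Since $\mathbf{A}$ and $\mathbf{B}$ are fixed as $\mathbf{A}_{\mathrm{old}}$ and $\mathbf{B}_{\mathrm{old}}$, the first term of (18) is minimized by keeping $\mathbf{C}_{\mathrm{old}}$ unchanged. To minimize the second term, according to (12), the optimal solution is $\mathbf{C}_{\mathrm{new}} = \mathbf{X}_{\mathrm{new}(3)}\big[(\mathbf{B}_{\mathrm{old}} \odot \mathbf{A}_{\mathrm{old}})^{T}\big]^{\dagger}$, where the symbol “$\dagger$” denotes the Moore–Penrose pseudoinverse of a matrix [51]. Thus, $\mathbf{C}$ is updated by appending $\mathbf{C}_{\mathrm{new}}$:
$$\mathbf{C} = \begin{bmatrix} \mathbf{C}_{\mathrm{old}} \\ \mathbf{C}_{\mathrm{new}} \end{bmatrix}. \tag{19}$$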
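Second, the factor matrix $\mathbf{A}$ is computed while the factor matrices $\mathbf{B}$ and $\mathbf{C}$ are fixed. Similar to (16), the loss function for estimating $\mathbf{A}$ is
$$\mathcal{L}(\mathbf{A}) = \frac{1}{2}\left\|\mathbf{X}_{(1)} - \mathbf{A}(\mathbf{C} \odot \mathbf{B})^{T}\right\|_F^2. \tag{20}$$
Differentiating $\mathcal{L}$ with respect to $\mathbf{A}$, we have
$$\frac{\partial \mathcal{L}}{\partial \mathbf{A}} = -\mathbf{X}_{(1)}(\mathbf{C} \odot \mathbf{B}) + \mathbf{A}(\mathbf{C} \odot \mathbf{B})^{T}(\mathbf{C} \odot \mathbf{B}). \tag{21}$$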
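To simplify equation (21), denote $\mathbf{P} = \mathbf{X}_{(1)}(\mathbf{C} \odot \mathbf{B})$ and $\mathbf{Q} = (\mathbf{C} \odot \mathbf{B})^{T}(\mathbf{C} \odot \mathbf{B})$; setting $\partial \mathcal{L}/\partial \mathbf{A} = \mathbf{0}$ then gives $\mathbf{A} = \mathbf{P}\mathbf{Q}^{-1}$. According to [47], $\mathbf{Q}$ can be rewritten as
$$\mathbf{Q} = (\mathbf{C}^{T}\mathbf{C}) \ast (\mathbf{B}^{T}\mathbf{B}), \tag{22}$$
where “$\ast$” denotes the Hadamard (element-wise) product.
To compute $\mathbf{P}$ and $\mathbf{Q}$, we also divide each of them into two terms; i.e.,
$$\mathbf{P} = \mathbf{X}_{\mathrm{old}(1)}(\mathbf{C}_{\mathrm{old}} \odot \mathbf{B}) + \mathbf{X}_{\mathrm{new}(1)}(\mathbf{C}_{\mathrm{new}} \odot \mathbf{B}), \quad \mathbf{Q} = (\mathbf{C}_{\mathrm{old}}^{T}\mathbf{C}_{\mathrm{old}}) \ast (\mathbf{B}^{T}\mathbf{B}) + (\mathbf{C}_{\mathrm{new}}^{T}\mathbf{C}_{\mathrm{new}}) \ast (\mathbf{B}^{T}\mathbf{B}). \tag{23}$$
Since $\mathbf{B}$ is fixed as $\mathbf{B}_{\mathrm{old}}$, the first term of $\mathbf{P}$ in equation (23) contains only information from the original tensor and can be expressed by
$$\mathbf{P}_{\mathrm{old}} = \mathbf{X}_{\mathrm{old}(1)}(\mathbf{C}_{\mathrm{old}} \odot \mathbf{B}_{\mathrm{old}}); \tag{24}$$
so, equation (23) is rewritten as
$$\mathbf{P} = \mathbf{P}_{\mathrm{old}} + \mathbf{X}_{\mathrm{new}(1)}(\mathbf{C}_{\mathrm{new}} \odot \mathbf{B}_{\mathrm{old}}). \tag{25}$$
Hence, $\mathbf{P}$ can be updated from $\mathbf{P}_{\mathrm{old}}$ using the mode-1 unfolding matrix of $\mathcal{X}_{\mathrm{new}}$ and the factor matrix $\mathbf{C}_{\mathrm{new}}$ obtained above. Generally, $\mathbf{P}$ is initialized from a small front part of $\mathcal{X}$ and then updated iteratively by (25). Analogously, with $\mathbf{Q}_{\mathrm{old}} = (\mathbf{C}_{\mathrm{old}}^{T}\mathbf{C}_{\mathrm{old}}) \ast (\mathbf{B}_{\mathrm{old}}^{T}\mathbf{B}_{\mathrm{old}})$, the update of $\mathbf{Q}$ can be represented by
$$\mathbf{Q} = \mathbf{Q}_{\mathrm{old}} + (\mathbf{C}_{\mathrm{new}}^{T}\mathbf{C}_{\mathrm{new}}) \ast (\mathbf{B}_{\mathrm{old}}^{T}\mathbf{B}_{\mathrm{old}}). \tag{26}$$
The update of $\mathbf{A}$ may be summarized as
$$\mathbf{A} = \mathbf{P}\mathbf{Q}^{-1} = \left[\mathbf{P}_{\mathrm{old}} + \mathbf{X}_{\mathrm{new}(1)}(\mathbf{C}_{\mathrm{new}} \odot \mathbf{B}_{\mathrm{old}})\right]\left[\mathbf{Q}_{\mathrm{old}} + (\mathbf{C}_{\mathrm{new}}^{T}\mathbf{C}_{\mathrm{new}}) \ast (\mathbf{B}_{\mathrm{old}}^{T}\mathbf{B}_{\mathrm{old}})\right]^{-1}. \tag{27}$$
Finally, the update of the factor matrix $\mathbf{B}$ may likewise be expressed by
$$\mathbf{B} = \left[\mathbf{U}_{\mathrm{old}} + \mathbf{X}_{\mathrm{new}(2)}(\mathbf{C}_{\mathrm{new}} \odot \mathbf{A})\right]\left[\mathbf{V}_{\mathrm{old}} + (\mathbf{C}_{\mathrm{new}}^{T}\mathbf{C}_{\mathrm{new}}) \ast (\mathbf{A}^{T}\mathbf{A})\right]^{-1}, \tag{28}$$
where $\mathbf{U}_{\mathrm{old}} = \mathbf{X}_{\mathrm{old}(2)}(\mathbf{C}_{\mathrm{old}} \odot \mathbf{A}_{\mathrm{old}})$ and $\mathbf{V}_{\mathrm{old}} = (\mathbf{C}_{\mathrm{old}}^{T}\mathbf{C}_{\mathrm{old}}) \ast (\mathbf{A}_{\mathrm{old}}^{T}\mathbf{A}_{\mathrm{old}})$.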
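The cumulative third-order update described above (new temporal rows from a pseudoinverse, then $\mathbf{A}$ and $\mathbf{B}$ from accumulated statistics) can be sketched in NumPy as follows. This is our illustrative reading of the update, not the authors' code: the helper names, the choice to feed the freshly updated $\mathbf{A}$ into the new-data terms of the $\mathbf{B}$ update, and the toy data are assumptions. The toy check uses data that exactly follow a CP model, so a correct update must leave $\mathbf{A}$ and $\mathbf{B}$ unchanged, recover the new temporal row, and reproduce the batch value of $\mathbf{P}$ on the full cumulative tensor.

```python
import numpy as np

def unfold(X, n):
    return np.reshape(np.moveaxis(X, n, 0), (X.shape[n], -1), order="F")

def khatri_rao(A, B):
    return np.einsum("ir,jr->ijr", A, B).reshape(-1, A.shape[1])

def cp_to_tensor(A, B, C):
    return np.einsum("ir,jr,kr->ijk", A, B, C)

def init_stats(X_old, A, B, C):
    """Statistics of the original tensor: P_old, Q_old (for A) and U_old, V_old (for B)."""
    P = unfold(X_old, 0) @ khatri_rao(C, B)
    Q = (C.T @ C) * (B.T @ B)
    U = unfold(X_old, 1) @ khatri_rao(C, A)
    V = (C.T @ C) * (A.T @ A)
    return P, Q, U, V

def ctcf_step(X_new, A, B, C, stats):
    """Cumulative update for a new slab X_new of shape (I, J, K_new)."""
    P, Q, U, V = stats
    # New rows of the temporal factor, with A and B still at their old values.
    C_new = unfold(X_new, 2) @ np.linalg.pinv(khatri_rao(B, A).T)
    # Accumulate P and Q, then solve A Q = P (Q is symmetric).
    P = P + unfold(X_new, 0) @ khatri_rao(C_new, B)
    Q = Q + (C_new.T @ C_new) * (B.T @ B)
    A = np.linalg.solve(Q, P.T).T
    # Same pattern for B, using the freshly updated A in the new-data terms.
    U = U + unfold(X_new, 1) @ khatri_rao(C_new, A)
    V = V + (C_new.T @ C_new) * (A.T @ A)
    B = np.linalg.solve(V, U.T).T
    return A, B, np.vstack([C, C_new]), (P, Q, U, V)

# Toy check: data that exactly follow a CP model with fixed A0 and B0.
rng = np.random.default_rng(4)
I, J, R = 6, 5, 3
A0, B0 = rng.standard_normal((I, R)), rng.standard_normal((J, R))
C_old, c_new = rng.standard_normal((8, R)), rng.standard_normal((1, R))
X_old, X_new = cp_to_tensor(A0, B0, C_old), cp_to_tensor(A0, B0, c_new)

A, B, C, stats = ctcf_step(X_new, A0, B0, C_old, init_stats(X_old, A0, B0, C_old))
X_full = np.concatenate([X_old, X_new], axis=2)
print(np.allclose(C[-1:], c_new), np.allclose(A, A0), np.allclose(B, B0))
```

The point of the design is that only the $R$-column matrices $\mathbf{P}$, $\mathbf{U}$ and the $R \times R$ matrices $\mathbf{Q}$, $\mathbf{V}$ are carried between frames, so the cost of each update depends on the new slab rather than on the full history.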
To make the process clearer, the proposed CTCF of third-order tensors is summarized in Algorithm 1.

2.2.3. CTCF of Nth-Order Tensor
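Building on Section 2.2.2, we extend CTCF to higher-order tensors. Consider an $N$th-order cumulative tensor $\mathcal{X}_{\mathrm{old}} \in \mathbb{R}^{I_1 \times \cdots \times I_{N-1} \times K_{\mathrm{old}}}$ whose last dimension is the temporal dimension. Its CP factorization is represented as $\mathcal{X}_{\mathrm{old}} \approx [\![\mathbf{A}^{(1)}_{\mathrm{old}}, \ldots, \mathbf{A}^{(N)}_{\mathrm{old}}]\!]$. When a new tensor $\mathcal{X}_{\mathrm{new}} \in \mathbb{R}^{I_1 \times \cdots \times I_{N-1} \times K_{\mathrm{new}}}$ is added along the time dimension, the updated cumulative tensor is $\mathcal{X} \in \mathbb{R}^{I_1 \times \cdots \times I_{N-1} \times (K_{\mathrm{old}} + K_{\mathrm{new}})}$, whose CP factorization is denoted by $\mathcal{X} \approx [\![\mathbf{A}^{(1)}, \ldots, \mathbf{A}^{(N)}]\!]$.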
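As in Section 2.2.2, the temporal factor matrix $\mathbf{A}^{(N)}$ is updated first with the other factor matrices fixed. Like (17), the optimization problem for estimating $\mathbf{A}^{(N)}$ is formulated as
$$\min_{\mathbf{A}^{(N)}}\ \frac{1}{2}\left\|\mathbf{X}_{(N)} - \mathbf{A}^{(N)}\left(\mathbf{A}^{(N-1)} \odot \cdots \odot \mathbf{A}^{(1)}\right)^{T}\right\|_F^2. \tag{29}$$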
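We again separate the original part from the newly added part; i.e.,
$$\min_{\mathbf{A}^{(N)}}\ \frac{1}{2}\left\|\mathbf{X}_{\mathrm{old}(N)} - \mathbf{A}^{(N)}_{\mathrm{old}}\left(\mathbf{A}^{(N-1)} \odot \cdots \odot \mathbf{A}^{(1)}\right)^{T}\right\|_F^2 + \frac{1}{2}\left\|\mathbf{X}_{\mathrm{new}(N)} - \mathbf{A}^{(N)}_{\mathrm{new}}\left(\mathbf{A}^{(N-1)} \odot \cdots \odot \mathbf{A}^{(1)}\right)^{T}\right\|_F^2. \tag{30}$$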
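The original part is minimized by keeping the first block $\mathbf{A}^{(N)}_{\mathrm{old}}$ fixed, and the new part is obtained by $\mathbf{A}^{(N)}_{\mathrm{new}} = \mathbf{X}_{\mathrm{new}(N)}\big[(\mathbf{A}^{(N-1)}_{\mathrm{old}} \odot \cdots \odot \mathbf{A}^{(1)}_{\mathrm{old}})^{T}\big]^{\dagger}$, so that $\mathbf{A}^{(N)}$ is formed by stacking $\mathbf{A}^{(N)}_{\mathrm{old}}$ on top of $\mathbf{A}^{(N)}_{\mathrm{new}}$.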
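The updates of the non-temporal factor matrices $\mathbf{A}^{(n)}$ ($n = 1, \ldots, N-1$) follow those of the factor matrices $\mathbf{A}$ and $\mathbf{B}$ in Section 2.2.2, and the loss function for estimating $\mathbf{A}^{(n)}$ is the same as (16). Let $\mathbf{W}^{(n)} = \mathbf{A}^{(N-1)} \odot \cdots \odot \mathbf{A}^{(n+1)} \odot \mathbf{A}^{(n-1)} \odot \cdots \odot \mathbf{A}^{(1)}$, built from the most recent estimates of the non-temporal factor matrices, and introduce the matrices $\mathbf{P}^{(n)}$ and $\mathbf{Q}^{(n)}$; the update of $\mathbf{A}^{(n)}$ may be summarized as
$$\mathbf{A}^{(n)} = \mathbf{P}^{(n)}\big(\mathbf{Q}^{(n)}\big)^{-1} = \left[\mathbf{P}^{(n)}_{\mathrm{old}} + \mathbf{X}_{\mathrm{new}(n)}\big(\mathbf{A}^{(N)}_{\mathrm{new}} \odot \mathbf{W}^{(n)}\big)\right]\left[\mathbf{Q}^{(n)}_{\mathrm{old}} + \big(\mathbf{A}^{(N)T}_{\mathrm{new}}\mathbf{A}^{(N)}_{\mathrm{new}}\big) \ast \big(\mathbf{W}^{(n)T}\mathbf{W}^{(n)}\big)\right]^{-1}, \tag{31}$$
where $\mathbf{P}^{(n)}_{\mathrm{old}} = \mathbf{X}_{\mathrm{old}(n)}\big(\mathbf{A}^{(N)}_{\mathrm{old}} \odot \mathbf{W}^{(n)}_{\mathrm{old}}\big)$ and $\mathbf{Q}^{(n)}_{\mathrm{old}} = \big(\mathbf{A}^{(N)T}_{\mathrm{old}}\mathbf{A}^{(N)}_{\mathrm{old}}\big) \ast \big(\mathbf{W}^{(n)T}_{\mathrm{old}}\mathbf{W}^{(n)}_{\mathrm{old}}\big)$, with $\mathbf{W}^{(n)}_{\mathrm{old}}$ formed from the old factor matrices.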
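The $N$th-order update can be sketched generically in NumPy; with $N = 4$ (height, width, bands, time) and one new frame it is exactly the HSV setting used below. As with the third-order sketch, this is an illustration under our own assumptions (helper names, freshly updated factors reused in $\mathbf{W}^{(n)}$, toy data that follow an exact CP model), not the authors' implementation. It requires $N \ge 3$.

```python
import numpy as np
from functools import reduce

def unfold(X, n):
    return np.reshape(np.moveaxis(X, n, 0), (X.shape[n], -1), order="F")

def khatri_rao(A, B):
    return np.einsum("ir,jr->ijr", A, B).reshape(-1, A.shape[1])

def others_kr(spatial, n):
    """W^(n): Khatri-Rao product of the non-temporal factors except n, highest mode first."""
    return reduce(khatri_rao, [F for m, F in enumerate(spatial) if m != n][::-1])

def init_stats(X_old, spatial, T_old):
    """P_old^(n) and Q_old^(n) for every non-temporal mode n."""
    stats = []
    for n in range(len(spatial)):
        W = others_kr(spatial, n)
        stats.append((unfold(X_old, n) @ khatri_rao(T_old, W),
                      (T_old.T @ T_old) * (W.T @ W)))
    return stats

def ctcf_step_nd(X_new, spatial, T_old, stats):
    """Cumulative CP update for an N-th order tensor (N >= 3) whose last mode is time."""
    N = X_new.ndim
    # New temporal rows from the old non-temporal factors (pseudoinverse solution).
    T_new = unfold(X_new, N - 1) @ np.linalg.pinv(reduce(khatri_rao, spatial[::-1]).T)
    spatial, new_stats = list(spatial), []
    for n in range(N - 1):
        W = others_kr(spatial, n)              # built from the most recent estimates
        P = stats[n][0] + unfold(X_new, n) @ khatri_rao(T_new, W)
        Q = stats[n][1] + (T_new.T @ T_new) * (W.T @ W)
        spatial[n] = np.linalg.solve(Q, P.T).T
        new_stats.append((P, Q))
    return spatial, np.vstack([T_old, T_new]), T_new, new_stats

def cp4(factors, T):
    """Rebuild a height x width x bands x time tensor from its CP factors."""
    return np.einsum("ir,jr,kr,tr->ijkt", *factors, T)

rng = np.random.default_rng(5)
dims, R = (5, 4, 6), 3
spatial0 = [rng.standard_normal((d, R)) for d in dims]
T_old, t_new = rng.standard_normal((7, R)), rng.standard_normal((1, R))
X_old, X_new = cp4(spatial0, T_old), cp4(spatial0, t_new)

spatial, T, T_new, stats = ctcf_step_nd(X_new, spatial0, T_old,
                                        init_stats(X_old, spatial0, T_old))
X_hat_new = cp4(spatial, T_new)   # CP approximation of the new frame
print(np.allclose(T_new, t_new), np.allclose(X_hat_new, X_new))
```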
2.2.4. CTCF-Based Detection Method
In HSV, the sequential data are expressed as a 4D cumulative tensor whose temporal dimension grows as new frames are added. Whenever a new frame arrives, the results of the CP factorization of the original cumulative tensor are updated to obtain the factor matrices of the new cumulative tensor, and the CP tensor approximation of the newly added frame is obtained at the same time. If the target is absent, the CP tensor approximation yields a small error, since the background is similar between adjacent frames. Conversely, if the error is large, the target is likely to be present. We define the fitness between the new frame and its approximation in (34). If the fitness is smaller than the threshold, the target is considered to appear in the new frame. Otherwise, the new frame is added along the temporal dimension and used to update the original cumulative tensor.
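The original 4D cumulative tensor is denoted by $\mathcal{X}_{\mathrm{old}} \in \mathbb{R}^{I_1 \times I_2 \times I_3 \times T}$, where $T$ denotes the number of frames in the initial video. Its factor matrices for the four dimensions are represented as
$$\mathcal{X}_{\mathrm{old}} \approx [\![\mathbf{A}^{(1)}_{\mathrm{old}}, \mathbf{A}^{(2)}_{\mathrm{old}}, \mathbf{A}^{(3)}_{\mathrm{old}}, \mathbf{A}^{(4)}_{\mathrm{old}}]\!], \tag{32}$$
where $\mathbf{A}^{(1)}_{\mathrm{old}} \in \mathbb{R}^{I_1 \times R}$, $\mathbf{A}^{(2)}_{\mathrm{old}} \in \mathbb{R}^{I_2 \times R}$, $\mathbf{A}^{(3)}_{\mathrm{old}} \in \mathbb{R}^{I_3 \times R}$, and $\mathbf{A}^{(4)}_{\mathrm{old}} \in \mathbb{R}^{T \times R}$, and $R$ denotes the number of rank-one component tensors in the CP factorization. When a new frame $\mathcal{X}_{\mathrm{new}} \in \mathbb{R}^{I_1 \times I_2 \times I_3 \times 1}$ is added along the temporal dimension, the 4D cumulative tensor is updated and denoted by $\mathcal{X} \in \mathbb{R}^{I_1 \times I_2 \times I_3 \times (T+1)}$. The factor matrices of $\mathcal{X}$ are expressed by
$$\mathcal{X} \approx [\![\mathbf{A}^{(1)}, \mathbf{A}^{(2)}, \mathbf{A}^{(3)}, \mathbf{A}^{(4)}]\!], \tag{33}$$
where $\mathbf{A}^{(1)} \in \mathbb{R}^{I_1 \times R}$, $\mathbf{A}^{(2)} \in \mathbb{R}^{I_2 \times R}$, $\mathbf{A}^{(3)} \in \mathbb{R}^{I_3 \times R}$, and $\mathbf{A}^{(4)} \in \mathbb{R}^{(T+1) \times R}$. Based on Section 2.2.3, we estimate these factor matrices and obtain the approximation $\hat{\mathcal{X}}_{\mathrm{new}}$ of $\mathcal{X}_{\mathrm{new}}$, where $\hat{\mathcal{X}}_{\mathrm{new}} = [\![\mathbf{A}^{(1)}, \mathbf{A}^{(2)}, \mathbf{A}^{(3)}, \mathbf{A}^{(4)}_{\mathrm{new}}]\!]$ and $\mathbf{A}^{(4)}_{\mathrm{new}} \in \mathbb{R}^{1 \times R}$ is the last row of $\mathbf{A}^{(4)}$. This is the special case of Section 2.2.3 with $N = 4$ and $K_{\mathrm{new}} = 1$.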
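We define the fitness $f$ ($f \le 1$) between the new frame $\mathcal{X}_{\mathrm{new}}$ and its CP approximation $\hat{\mathcal{X}}_{\mathrm{new}}$ as
$$f = 1 - \frac{\left\|\mathcal{X}_{\mathrm{new}} - \hat{\mathcal{X}}_{\mathrm{new}}\right\|_F}{\left\|\mathcal{X}_{\mathrm{new}}\right\|_F}. \tag{34}$$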
If the target does not appear, the approximation error is small and the fitness is large. Given a preset threshold $\tau$, when $f > \tau$, i.e., the fitness is larger than $\tau$, we decide that the target is absent. The non-target frame is then added along the temporal dimension, and the updated 4D cumulative tensor becomes the new original 4D cumulative tensor, which can be expressed as
$$\mathcal{X}_{\mathrm{old}} \leftarrow \mathcal{X}, \quad \mathbf{A}^{(n)}_{\mathrm{old}} \leftarrow \mathbf{A}^{(n)}, \quad n = 1, \ldots, 4. \tag{35}$$
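If the target appears, the approximation error is large and the fitness is smaller than $\tau$. The residual between $\mathcal{X}_{\mathrm{new}}$ and its approximation $\hat{\mathcal{X}}_{\mathrm{new}}$ approximates the target tensor $\mathcal{T}$; i.e.,
$$\mathcal{T} \approx \mathcal{X}_{\mathrm{new}} - \hat{\mathcal{X}}_{\mathrm{new}}. \tag{36}$$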
The target in each frame is shown in 2D form by taking, at every pixel, the maximum value of its spectrum. In this way, the proposed CTCF-based detection method can extract not only the keyframes in which the target is present but also the approximate region of the target in every keyframe. The flowchart of the proposed method is shown in Figure 3. In Section 3, experiments on real HSV data are conducted, and the proposed method is compared with some representative techniques.
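The per-frame decision can be sketched as below. The sketch assumes the fitness takes the common form $f = 1 - \|\mathcal{X}_{\mathrm{new}} - \hat{\mathcal{X}}_{\mathrm{new}}\|_F / \|\mathcal{X}_{\mathrm{new}}\|_F$ and uses $\tau = 0.9$ as in Section 3.4; the function names and the synthetic frame (a uniform background standing in for the CP approximation, plus a hypothetical plume block) are illustrative only. For a target frame it returns the 2D map obtained by taking the maximum of each pixel's residual spectrum.

```python
import numpy as np

def fitness(X_new, X_hat):
    """Assumed fitness form: 1 - ||X_new - X_hat||_F / ||X_new||_F."""
    return 1.0 - np.linalg.norm(X_new - X_hat) / np.linalg.norm(X_new)

def classify_frame(X_new, X_hat, tau=0.9):
    """Return (fitness, target_map); target_map is None when the frame is judged target-free.

    X_new, X_hat: arrays of shape (height, width, bands).
    """
    f = fitness(X_new, X_hat)
    if f > tau:
        return f, None                  # background frame: it would be absorbed into the cumulative tensor
    residual = X_new - X_hat            # residual approximates the target tensor
    return f, residual.max(axis=2)      # 2D map: maximum of each pixel's residual spectrum

rng = np.random.default_rng(6)
background = rng.uniform(1.0, 2.0, size=(8, 8, 10))   # stands in for the CP approximation
frame = background.copy()
frame[2:5, 2:5, :] += 50.0                              # hypothetical plume signal

print(classify_frame(background, background)[1] is None)
f, target_map = classify_frame(frame, background)
print(round(f, 3), target_map.shape)
```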
2.3. The Proposed STTF-Based Super-Resolution Method
In Section 2.2, we presented an approach to detect the frames of an HSV in which the target appears, together with the approximate region of the target. However, as discussed in Section 1, HSI imaging systems must trade off spectral resolution against spatial resolution [52]. Because high spectral resolution is required in HSIs and HSV, the spatial resolution is generally low. We are therefore interested in improving the spatial resolution of the targets after detection. Instead of fusing an HR-MSI and an LR-HSI, we address the target SR problem using only the data at hand, which is more practical in real cases.
2.3.1. Problem Formulation
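In this subsection, HSIs are represented as 3D tensors with three indices $(i, j, k)$, which correspond to the height, width, and spectral modes. $\mathcal{X} \in \mathbb{R}^{H \times W \times S}$ denotes the HR-HSI and $\mathcal{Y} \in \mathbb{R}^{h \times w \times S}$ denotes the LR-HSI, where $h < H$ and $w < W$. The goal is to estimate $\mathcal{X}$ from $\mathcal{Y}$.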
HR-HSIs have two significant characteristics [53]: first, spectral vectors can be well approximated in low-dimensional subspaces; second, HSIs are spatially self-similar. This means that sparsity exists in both the spectral and spatial dimensions. Inspired by sparse representation [54], the low dimensionality of the spectral domain makes it possible to form a spectral mode dictionary $\mathbf{D}_S$ with few atoms, while the self-similarity of the spatial domain guarantees sparse representations of the height and width modes with spatial dictionaries $\mathbf{D}_H$ and $\mathbf{D}_W$. In this way, the conventional Tucker factorization is transformed into the multiplication of a core tensor by the three modes dictionaries. The factorization is illustrated in Figure 4. The HR-HSI is represented as
$$\mathcal{X} = \mathcal{C} \times_1 \mathbf{D}_H \times_2 \mathbf{D}_W \times_3 \mathbf{D}_S, \tag{37}$$
where $\mathbf{D}_H \in \mathbb{R}^{H \times n_H}$, $\mathbf{D}_W \in \mathbb{R}^{W \times n_W}$, and $\mathbf{D}_S \in \mathbb{R}^{S \times n_S}$. The variables $n_H$, $n_W$, and $n_S$ denote the numbers of atoms (i.e., columns) of $\mathbf{D}_H$, $\mathbf{D}_W$, and $\mathbf{D}_S$, respectively. The core tensor $\mathcal{C} \in \mathbb{R}^{n_H \times n_W \times n_S}$ contains the coefficients of $\mathcal{X}$ over the three modes dictionaries. We can see that (37) incorporates the information of the separate modes into a unified framework.
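The LR keyframe of the HSV can be seen as a spatially downsampled version of the HR-HSI $\mathcal{X}$, written as
$$\mathcal{Y} = \mathcal{X} \times_1 \mathbf{P}_H \times_2 \mathbf{P}_W, \tag{38}$$
where $\mathbf{P}_H \in \mathbb{R}^{h \times H}$ and $\mathbf{P}_W \in \mathbb{R}^{w \times W}$ are the downsampling matrices of the height and width modes. Substituting (37) into (38) and applying (5), $\mathcal{Y}$ is represented by
$$\mathcal{Y} = \mathcal{C} \times_1 \mathbf{D}_H^{*} \times_2 \mathbf{D}_W^{*} \times_3 \mathbf{D}_S, \tag{39}$$
where $\mathbf{D}_H^{*} = \mathbf{P}_H\mathbf{D}_H$ and $\mathbf{D}_W^{*} = \mathbf{P}_W\mathbf{D}_W$ denote the downsampled dictionaries of the height and width modes. To recover $\mathcal{X}$, we focus on estimating the dictionaries $\mathbf{D}_H$, $\mathbf{D}_W$, and $\mathbf{D}_S$ and the core tensor $\mathcal{C}$.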
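The degradation model and the "downsampled dictionaries" identity can be verified on a toy example. The sketch below assumes block-averaging downsampling matrices (`block_average_matrix` is a hypothetical helper, and the scale factor of 2 and all sizes are arbitrary); the paper does not specify this operator. It checks that degrading the HR tensor equals building the LR tensor directly from $\mathbf{P}_H\mathbf{D}_H$ and $\mathbf{P}_W\mathbf{D}_W$, and that the operator indeed averages non-overlapping pixel blocks.

```python
import numpy as np

def block_average_matrix(n_low, factor):
    """Hypothetical downsampling operator: average non-overlapping groups of `factor` pixels."""
    P = np.zeros((n_low, n_low * factor))
    for i in range(n_low):
        P[i, i * factor:(i + 1) * factor] = 1.0 / factor
    return P

def tucker3(core, D1, D2, D3):
    """core x_1 D1 x_2 D2 x_3 D3 for a third-order core."""
    return np.einsum("pqr,ip,jq,kr->ijk", core, D1, D2, D3)

rng = np.random.default_rng(0)
H, W, S, f = 8, 6, 10, 2            # toy sizes; the scale factor f is an assumption
nH, nW, nS = 5, 4, 3                # numbers of dictionary atoms
D_H, D_W, D_S = (rng.standard_normal(s) for s in [(H, nH), (W, nW), (S, nS)])
core = rng.standard_normal((nH, nW, nS))
X = tucker3(core, D_H, D_W, D_S)                          # HR-HSI from the dictionaries
P_H, P_W = block_average_matrix(H // f, f), block_average_matrix(W // f, f)

Y = np.einsum("ai,bj,ijk->abk", P_H, P_W, X)              # Y = X x_1 P_H x_2 P_W
Y_dict = tucker3(core, P_H @ D_H, P_W @ D_W, D_S)         # downsampled dictionaries
Y_blocks = X.reshape(H // f, f, W // f, f, S).mean(axis=(1, 3))
print(Y.shape, np.allclose(Y, Y_dict), np.allclose(Y, Y_blocks))   # (4, 3, 10) True True
```

Only the spatial dictionaries are affected by the degradation, which is why the spectral dictionary and the core tensor can be estimated directly from the LR frame.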
2.3.2. The Proposed STTF-Based SR Algorithm
Since $\mathcal{Y}$ is a downsampled version of $\mathcal{X}$, recovering $\mathcal{X}$ from $\mathcal{Y}$ is a typical, severely ill-posed inverse problem. Some prior knowledge of $\mathcal{X}$ is therefore needed to regularize the super-resolution problem. In HSI processing, spectral sparsity is a widely used regularizer for a variety of ill-posed problems [55–58]. Under such regularization, each spectral vector is modeled as a linear combination of a small number of distinct spectral signatures. However, these schemes exploit only the sparsity of the spectral domain. In the proposed algorithm, taking HSI self-similarity into account, sparsity regularization is extended to the spatial domain through the sparse-based tensor Tucker factorization (STTF). In STTF, the HR-HSI has a unified sparse representation in terms of the core tensor and the three modes dictionaries.
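On the basis of equation (39), HSV frame super-resolution is formulated as the constrained least-squares optimization problem
$$\min_{\mathbf{D}_H, \mathbf{D}_W, \mathbf{D}_S, \mathcal{C}} \left\|\mathcal{Y} - \mathcal{C} \times_1 \mathbf{P}_H\mathbf{D}_H \times_2 \mathbf{P}_W\mathbf{D}_W \times_3 \mathbf{D}_S\right\|_F^2 \quad \text{s.t.} \quad \|\mathcal{C}\|_0 \le N_{\mathcal{C}}, \tag{40}$$
where $\|\cdot\|_F$ represents the Frobenius norm and $N_{\mathcal{C}}$ denotes the maximum number of nonzero elements of $\mathcal{C}$. Because of the $\ell_0$ norm constraint, problem (40) is nonconvex. To make the optimization tractable, the $\ell_0$ norm is replaced by the $\ell_1$ norm, and (40) is transformed into the unconstrained problem
$$\min_{\mathbf{D}_H, \mathbf{D}_W, \mathbf{D}_S, \mathcal{C}} \left\|\mathcal{Y} - \mathcal{C} \times_1 \mathbf{P}_H\mathbf{D}_H \times_2 \mathbf{P}_W\mathbf{D}_W \times_3 \mathbf{D}_S\right\|_F^2 + \lambda\|\mathcal{C}\|_1, \tag{41}$$
where $\lambda$ is the parameter of the sparse regularizer. Problem (41) is also nonconvex, and its solutions for $\mathbf{D}_H$, $\mathbf{D}_W$, $\mathbf{D}_S$, and $\mathcal{C}$ are not unique. Nonetheless, if we optimize over only one variable with the others fixed, the objective function in (41) is convex. Inspired by [59, 60], (41) can be solved by a proximal alternating optimization scheme, which is guaranteed to converge under certain conditions. Concretely, $\mathbf{D}_H$, $\mathbf{D}_W$, $\mathbf{D}_S$, and $\mathcal{C}$ are updated iteratively by
$$\begin{aligned}
\mathbf{D}_H &= \arg\min_{\mathbf{D}_H}\ f(\mathbf{D}_H, \mathbf{D}_W, \mathbf{D}_S, \mathcal{C}) + \beta\|\mathbf{D}_H - \mathbf{D}_H^{\mathrm{pre}}\|_F^2, \\
\mathbf{D}_W &= \arg\min_{\mathbf{D}_W}\ f(\mathbf{D}_H, \mathbf{D}_W, \mathbf{D}_S, \mathcal{C}) + \beta\|\mathbf{D}_W - \mathbf{D}_W^{\mathrm{pre}}\|_F^2, \\
\mathbf{D}_S &= \arg\min_{\mathbf{D}_S}\ f(\mathbf{D}_H, \mathbf{D}_W, \mathbf{D}_S, \mathcal{C}) + \beta\|\mathbf{D}_S - \mathbf{D}_S^{\mathrm{pre}}\|_F^2, \\
\mathcal{C} &= \arg\min_{\mathcal{C}}\ f(\mathbf{D}_H, \mathbf{D}_W, \mathbf{D}_S, \mathcal{C}) + \beta\|\mathcal{C} - \mathcal{C}^{\mathrm{pre}}\|_F^2,
\end{aligned} \tag{42}$$
where $(\cdot)^{\mathrm{pre}}$ denotes the estimate from the previous iteration and $\beta$ is a positive number. The objective function $f$ is defined by (41). The optimizations of $\mathbf{D}_H$, $\mathbf{D}_W$, $\mathbf{D}_S$, and $\mathcal{C}$ are presented in detail in the Appendix; the conjugate gradient (CG) method [61] and the alternating direction method of multipliers (ADMM) [62] are used in these optimizations.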
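Two of the sub-problems in (42) can be illustrated with small NumPy steps. The sketch below is not the paper's solver: the spectral-dictionary step is solved in closed form from the mode-3 unfolding (the paper uses CG for its dictionary updates), and the core step is a single proximal-gradient (ISTA) iteration with soft-thresholding, named here as a stand-in for the paper's ADMM. The sizes, `lam`, and `beta` are hypothetical, and `DHs`, `DWs` denote the already downsampled spatial dictionaries $\mathbf{P}_H\mathbf{D}_H$ and $\mathbf{P}_W\mathbf{D}_W$. Both steps can only decrease the objective of (41), which the example checks.

```python
import numpy as np

def unfold(X, n):
    return np.reshape(np.moveaxis(X, n, 0), (X.shape[n], -1), order="F")

def tucker3(core, D1, D2, D3):
    return np.einsum("pqr,ip,jq,kr->ijk", core, D1, D2, D3)

def objective(Y, core, DHs, DWs, DS, lam):
    """Objective of (41) with downsampled spatial dictionaries DHs and DWs."""
    return np.sum((Y - tucker3(core, DHs, DWs, DS)) ** 2) + lam * np.abs(core).sum()

def update_DS(Y, core, DHs, DWs, DS_prev, beta):
    """Proximal D_S step, solved in closed form.

    Y_(3) = D_S M with M = C_(3) (DWs kron DHs)^T, so the minimizer satisfies
    D_S (M M^T + beta I) = Y_(3) M^T + beta D_S_prev.
    """
    M = unfold(core, 2) @ np.kron(DWs, DHs).T
    lhs = M @ M.T + beta * np.eye(M.shape[0])
    rhs = unfold(Y, 2) @ M.T + beta * DS_prev
    return np.linalg.solve(lhs, rhs.T).T

def soft_threshold(Z, t):
    return np.sign(Z) * np.maximum(np.abs(Z) - t, 0.0)

def ista_core_step(Y, core, DHs, DWs, DS, lam):
    """One ISTA step on the core (a stand-in for the paper's ADMM solver)."""
    # Lipschitz constant of the gradient of the data term.
    L = 2.0 * np.prod([np.linalg.norm(D, 2) ** 2 for D in (DHs, DWs, DS)])
    residual = Y - tucker3(core, DHs, DWs, DS)
    grad = -2.0 * np.einsum("ijk,ip,jq,kr->pqr", residual, DHs, DWs, DS)
    return soft_threshold(core - grad / L, lam / L)

rng = np.random.default_rng(7)
h, w, S, nH, nW, nS = 6, 5, 12, 4, 4, 3
DHs, DWs = rng.standard_normal((h, nH)), rng.standard_normal((w, nW))
DS, core = rng.standard_normal((S, nS)), rng.standard_normal((nH, nW, nS))
Y = tucker3(rng.standard_normal((nH, nW, nS)), DHs, DWs, rng.standard_normal((S, nS)))
lam, beta = 0.1, 1e-3               # hypothetical parameter values

f0 = objective(Y, core, DHs, DWs, DS, lam)
DS = update_DS(Y, core, DHs, DWs, DS, beta)
f1 = objective(Y, core, DHs, DWs, DS, lam)
core = ista_core_step(Y, core, DHs, DWs, DS, lam)
f2 = objective(Y, core, DHs, DWs, DS, lam)
print(f0, f1, f2)
```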
2.3.3. Initialization of the Proposed Method
Since the optimization problem in (41) is nonconvex, a careless initialization may lead to poor local minima. In this paper, we initialize the spatial dictionaries $\mathbf{D}_H$ and $\mathbf{D}_W$ using the dictionary-update-cycles KSVD (DUC-KSVD) [63], which promotes sparse representations. The spectral dictionary $\mathbf{D}_S$ is then initialized by the simplex identification via split augmented Lagrangian (SISAL) algorithm [64], which efficiently identifies a minimum-volume simplex containing the spectral vectors.
The proposed STTF-based SR algorithm is summarized in Algorithm 2.

3. Results and Discussion
3.1. Experimental Data Set
To highlight the advantages of HSIs, we choose an invisible gas plume as the target; the proposed algorithms can reasonably be extended to other types of data. In this section, the HSV data set is acquired by the infrared imaging spectrometer “Hyper-Cam LW.” Sulfur hexafluoride (SF₆) is chosen as the target, since it is an odorless and colorless gas with a distinct absorption peak in the long-wave infrared (LWIR) range. The HSV data set consists of 60 infrared hyperspectral frames with the size of . The imaging interval is 4.8 s, and the wavelength of the data ranges from 7.8 μm to 11.8 μm.
For the SR method, only the middle pixels (specifically, columns 71 to 198) are used in the experiment, for reasons related to the algorithm. In addition, spectral bands 41–127 are removed because of water vapor absorption and extremely low SNR. Finally, the size of the input LR-HSI is .
3.2. Compared Methods
The CTCF-based detection method is compared with two representative methods: the matched subspace detector (MSD) [25] and constrained matrix factorization (CMF) [29]. The STTF-based SR method is compared with three algorithms: bicubic interpolation, the sparse representation-based SR method [54], and the sequence information-based SR method [65].
3.3. Qualitative and Quantitative Metrics
For the detection methods, receiver operating characteristic (ROC) curves [66] are used to evaluate performance. Generally, one detector outperforms another if the area under its ROC curve is larger [67]. As suggested in [68], the area under the ROC curve (AUC) is also calculated as a performance measure; a better detector usually has a higher AUC value.
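For reference, the AUC of a detection map can be computed without building the ROC curve explicitly, using its equivalence with the Mann-Whitney statistic: the AUC equals the probability that a randomly chosen target pixel receives a higher detector score than a randomly chosen background pixel, with ties counted as one half. The sketch below is a minimal NumPy version under that definition; the function name is ours, and the pairwise comparison uses memory proportional to the product of the two class sizes, so large images would need a rank-based variant.

```python
import numpy as np

def auc(scores, labels):
    """Area under the empirical ROC curve via the Mann-Whitney statistic.

    Equals the probability that a randomly chosen target pixel scores higher than a
    randomly chosen background pixel (ties count one half).
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    pos, neg = scores[labels], scores[~labels]
    diff = pos[:, None] - neg[None, :]    # all target/background pairs
    return (np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / (pos.size * neg.size)

print(auc([0.9, 0.3, 0.6, 0.1], [1, 1, 0, 0]))   # 3 of 4 pairs ordered correctly: 0.75
```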
For the SR algorithms, since we process the LR-HSI directly, there is no original HR-HSI (i.e., ground truth) for reference. Thus, some popular quantitative metrics, such as the root-mean-square error (RMSE) [69], peak signal-to-noise ratio (PSNR), and spectral angle mapper (SAM), are not available. In this section, entropy and average gradient are used to evaluate the performance of the SR methods.
3.3.1. Entropy
Super-resolution aims to introduce more useful information into images, so the performance of SR methods can be measured by the amount of information contained in the results. The entropy is defined as
$$E = -\sum_{i=0}^{L-1} p_i \log_2 p_i. \tag{43}$$
Here, $p_i$ denotes the probability that a pixel in the image has grey value $i$, and $L$ denotes the number of grey levels, so grey values range over $[0, L-1]$. The larger the entropy of an image, the richer the information it contains.
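A minimal NumPy sketch of the entropy metric follows. The `to_grey` helper, which min-max quantizes a floating-point SR output to 256 integer grey levels before computing the histogram, is an assumed preprocessing step; the text does not state how SR outputs are mapped to grey values.

```python
import numpy as np

def to_grey(img, levels=256):
    """Min-max quantization to integer grey levels 0..levels-1 (an assumed preprocessing step)."""
    img = np.asarray(img, dtype=float)
    span = img.max() - img.min()
    if span == 0:
        return np.zeros(img.shape, dtype=np.int64)
    return np.round((img - img.min()) / span * (levels - 1)).astype(np.int64)

def entropy(grey, levels=256):
    """Image entropy in bits, E = -sum_i p_i log2 p_i, over grey levels 0..levels-1."""
    counts = np.bincount(np.asarray(grey).ravel(), minlength=levels)
    p = counts / counts.sum()
    p = p[p > 0]                       # empty bins contribute nothing (0 log 0 = 0)
    return float(-np.sum(p * np.log2(p)))

print(entropy(np.arange(256).reshape(16, 16)))   # every level exactly once: 8.0 bits
```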
3.3.2. Average Gradient
Another way to assess super-resolution performance is by the amount of detail in the image. The average gradient reflects the ability to express details and measures the clarity of an image; the gradient is large where the grey level changes rapidly along some direction. The average gradient is formulated as
$$\bar{G} = \frac{1}{(M-1)(N-1)} \sum_{i=1}^{M-1}\sum_{j=1}^{N-1} \sqrt{\frac{\big(F(i+1,j) - F(i,j)\big)^2 + \big(F(i,j+1) - F(i,j)\big)^2}{2}}, \tag{44}$$
where $M$ and $N$ denote the height and width of the image, respectively, and $F(i,j)$ denotes the grey value of pixel $(i,j)$. The larger the average gradient of an image, the clearer the image.
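The average gradient can be computed with forward differences over the $(M-1) \times (N-1)$ pixels that have both a lower and a right neighbor, as in the sketch below (function name ours). On a ramp whose grey level rises by 1 per row, every term equals $\sqrt{1/2}$, which makes a convenient sanity check.

```python
import numpy as np

def average_gradient(img):
    """Average gradient over the (M-1) x (N-1) pixels that have both forward differences."""
    F = np.asarray(img, dtype=float)
    d_i = F[1:, :-1] - F[:-1, :-1]     # F(i+1, j) - F(i, j), along the height
    d_j = F[:-1, 1:] - F[:-1, :-1]     # F(i, j+1) - F(i, j), along the width
    return float(np.mean(np.sqrt((d_i ** 2 + d_j ** 2) / 2.0)))

ramp = np.add.outer(np.arange(5.0), np.zeros(4))   # grey level increases by 1 per row
print(average_gradient(ramp), np.sqrt(0.5))
```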
Besides these metrics, the visual quality of the output images is an important qualitative criterion.
3.4. Parameters Setting
In MSD, we pick 463 target (gas) spectra and 846 background spectra from the 12th frame of the HSV to build the training set. The sizes of the target subspace and background subspace are and , respectively. In CMF, the number of endmembers is 3, the sparsity of the factor matrices is 2, and the number of iterations is 3. In the proposed CTCF-based method, the original cumulative tensor is decomposed by ALS with a tensor rank of 3, a maximum of 100 iterations, and a reconstruction error tolerance of $10^{-8}$; in the update stage, the fitness threshold is 0.9. In the proposed STTF-based SR method, the number of iterations is 5; the parameter $\beta$ is the weight in (42); the parameter $\lambda$ controls the sparsity of $\mathcal{C}$; and the size of $\mathcal{C}$ is set by $n_H$, $n_W$, and $n_S$. These parameters were chosen after extensive experiments to balance efficiency and stability.
3.5. Experimental Results and Discussion
In this subsection, we show the experimental results of the various methods for detection and superresolution.
After processing the HSV with the proposed CTCF-based method, we compute the Frobenius norm values of each frame, which are presented in Figure 5. Figure 5 shows that the target gas appears in the 12th frame and disappears in the 51st frame. Figure 6 compares the ROC curves of the tested methods on four frames in detail, and Figure 7 illustrates the general trends of the ROC curves of MSD, CMF, and CTCF, respectively. As can be seen from Figures 6 and 7, the proposed CTCF-based detection algorithm outperforms the other two methods. The AUC values of the three approaches are shown in Table 1, where the bold value in each row marks the highest AUC. Although CMF achieves better AUC values in some frames, its AUC values in some other frames are considerably lower (below 0.98). In contrast, all results of CTCF lie in the range of 0.98 to 1. From the average value and the variance (bold marks the best value), we conclude that the proposed method is superior and more stable. The graphical results are illustrated in Figure 8.
The target in each keyframe is shown in 2D form (as a grey image) by taking the maximum value of every spectrum. For brevity, we choose 8 frames to compare the three detectors, as shown in Figure 9. Rows one to eight present the detection results for frames 15, 18, 22, 28, 31, 39, 48, and 50. The higher the grey level of a pixel, the closer it is to the target. Figure 9 indicates that our method extracts the targets more accurately.
Table 2 shows the entropy and average gradient of the keyframes for the four SR algorithms. Since the sequence-based method needs 5 LR frames to form 1 HR frame, the range of compared frames is changed from 12–50 to 14–48. In each row, the bold values mark the highest entropy and the highest average gradient. From Table 2, we can draw three conclusions. First, although interpolation adds information to the frame, the details of the target are lost. Second, sparse representation SR and sequence information SR have almost the same entropy, but the latter offers more details because its HR dictionary is formed from several LR dictionaries. Finally, the proposed STTF-based SR method outperforms the other three methods in both metrics.
Figure 10 presents the visual quality of the results obtained by the four test methods, using the 16th, 21st, 34th, and 47th frames as representatives. The smaller image is the LR frame in 2D form, and the larger ones are the SR results of the different algorithms. As Figure 10 shows, the proposed approach yields clearer outputs with sharper edges and more texture. One drawback is the presence of “checkerboard artifacts,” which may be caused by the deconvolution operations in the method; we intend to address this in future work.
4. Conclusions
In this paper, we propose for hyperspectral video a novel keyframe and target detection method based on cumulative tensor CP factorization, termed CTCF, and a superresolution algorithm based on sparse-based tensor Tucker factorization, called STTF. Unlike conventional matrix-factorization-based methods, CTCF treats hyperspectral video (HSV) as a 4D cumulative tensor and approximates newly added frames by updating the factor matrices. To overcome the limitations of conventional methods and make superresolution (SR) more practical, STTF exploits the sparsity of HSV frames and factorizes each frame as a sparse core tensor multiplied by dictionaries along its three modes. In this way, the spatial resolution of the LR-HSI is enhanced directly, without HR samples. The experimental results demonstrate that the proposed CTCF and STTF methods outperform other state-of-the-art algorithms.
In future work, we will focus on tensor-factorization-based target tracking methods that can extract target regions more accurately and clearly. For superresolution, we aim to exploit nonlocal similarity within the tensor factorization framework, a prior that has been widely used in inverse problems. Beyond target tracking and superresolution, region of interest (ROI) approaches will be investigated to make HSV target recognition more efficient and fully featured. Inspired by [70] and other related works, we believe that research on chemical gas detection methods will benefit agricultural applications of HSI/HSV. These studies will be of great significance for the Internet of Things (IoT), smart agriculture, pollution monitoring, and related fields.
Appendix
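The optimizations of the height-mode, width-mode, and spectral-mode dictionaries and of the core tensor in Section 2.3.2 are presented as follows.
(1) Optimization of the height-mode dictionary: when the width-mode dictionary, the spectral-mode dictionary, and the core tensor are fixed, the optimization of the height-mode dictionary in (42) is represented as problem (A.1), which also involves the estimate of the height-mode dictionary from the previous iteration. Using the properties of the n-mode product (see (3)), (A.1) can be rewritten as (A.2), which is expressed through mode-1 unfolding matrices. Equation (A.2) is quadratic, and its minimizer is obtained by solving the general Sylvester matrix equation (A.3).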
The conjugate gradient (CG) method is used to solve (A.3). Under certain conditions, CG converges after several iterations; in our experiments, the solution of (A.3) was well approximated after 20 iterations.
(2) Optimization of the width-mode dictionary: when the height-mode dictionary, the spectral-mode dictionary, and the core tensor are fixed, the optimization of the width-mode dictionary in (42) is expressed as problem (A.4), which also involves the estimate of the width-mode dictionary from the previous iteration. As for the height-mode dictionary, (A.4) can be transformed into (A.5), which is expressed through mode-2 unfolding matrices. Equation (A.5) is also quadratic and is solved via the general Sylvester matrix equation (A.6).
Likewise, CG is used to solve (A.6).
(3) Optimization of the spectral-mode dictionary: when the height-mode dictionary, the width-mode dictionary, and the core tensor are fixed, the optimization with respect to the spectral-mode dictionary in (42) can be formulated as a problem that also involves the estimate of the spectral-mode dictionary from the previous iteration. Following the same steps as in the two cases above, this problem is rewritten as (A.8), which is expressed through mode-3 unfolding matrices. Similarly, (A.8) can be solved via the general Sylvester matrix equation (A.9).
We apply CG to solve (A.9), and convergence is achieved within a few iterations.
(4) Optimization of the core tensor: when the height-mode, width-mode, and spectral-mode dictionaries are fixed, the optimization of the core tensor in (42) can be written as (A.10), which also involves the estimate of the core tensor from the previous iteration. Since (A.10) is convex, we employ ADMM to solve it. By introducing two splitting variables, (A.10) can be transformed into the equivalent constrained form (A.11).
Equation (A.11) is a constrained problem in the standard ADMM form. Its augmented Lagrangian function is built with a Lagrangian multiplier and a penalty parameter, and the resulting ADMM iterations are given in (A.13) and (A.14).
Here, the optimizations of the two variables updated in (A.13) are independent because the objective function is decoupled with respect to them. Next, (A.14) is discussed in more detail.
(i) Update of the first variable: based on (A.13), we obtain subproblem (A.15), whose closed-form solution is (A.16).
(ii) Update of the second variable: based on (A.13), we obtain subproblem (A.17).
Based on (6) and (7), (A.17) is equivalent to the vectorized problem (A.18), in which four tensors are replaced by their vectorized forms and the mode products are expressed through a single matrix. Equation (A.18) has the closed-form solution (A.19).
However, the matrix in (A.19) is so large that (A.19) is too costly to solve directly. We therefore rewrite the first term of (A.19) as (A.20), using the eigenvector matrices and eigenvalue matrices (i = 1, 2, 3) of three smaller matrices, one per mode. The resulting middle matrix is diagonal and can be computed easily. Moreover, multiplication by the eigenvector matrices and their transposes reduces to i-mode products, and the multiplication in (A.20) is elementwise. Finally, the second term of (A.19) is computed in a similar closed form.
(iii) Update of the remaining variable: based on (A.14), the remaining variable is updated accordingly.
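Steps (1) to (3) rely on rewriting the Tucker model through mode-n unfoldings, which makes each subproblem linear in one dictionary. The sketch below assumes the unfolding convention of Kolda and Bader, under which the mode-1 unfolding of G ×1 A ×2 B ×3 C equals A G_(1) (C ⊗ B)^T; the paper's equation (3) may order the columns differently. The names `unfold`, `mode_n_product`, and `tucker` are illustrative and do not come from the paper.

```python
import numpy as np


def unfold(X, n):
    """Mode-n unfolding (Kolda-Bader convention): mode-n fibres become columns."""
    return np.reshape(np.moveaxis(X, n, 0), (X.shape[n], -1), order="F")


def mode_n_product(X, U, n):
    """X x_n U: multiply every mode-n fibre of X by the matrix U."""
    return np.moveaxis(np.tensordot(U, X, axes=(1, n)), 0, n)


def tucker(G, A, B, C):
    """Reconstruct G x_1 A x_2 B x_3 C (height, width, and spectral dictionaries)."""
    return mode_n_product(mode_n_product(mode_n_product(G, A, 0), B, 1), C, 2)
```

With B, C, and G fixed, unfold(X, 0) = A M where M = unfold(G, 0) @ kron(C, B).T, so the height-mode subproblem becomes an ordinary matrix least-squares problem in A; the other two modes follow by symmetry.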
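The dictionary updates solve general Sylvester matrix equations with CG, warm-started from the previous estimate. The exact terms of (A.3), (A.6), and (A.9) are not recoverable here, so the sketch below assumes the representative form P X Q + βX = C with P and Q symmetric positive semidefinite and β > 0. Under that assumption the operator is symmetric positive definite in the Frobenius inner product, which is what CG requires. The function names and the default of 20 iterations (mirroring the count reported for (A.3)) are illustrative choices.

```python
import numpy as np


def sylvester_apply(P, Q, beta, X):
    """Left-hand side of the assumed equation P X Q + beta X."""
    return P @ X @ Q + beta * X


def sylvester_cg(P, Q, beta, C, X0=None, iters=20, tol=1e-10):
    """Conjugate gradient for P X Q + beta X = C.

    Assumes P, Q symmetric positive semidefinite and beta > 0.
    X0 allows a warm start, e.g. from the previous outer iteration.
    """
    X = np.zeros(C.shape) if X0 is None else np.array(X0, dtype=float)
    R = C - sylvester_apply(P, Q, beta, X)    # residual
    D = R.copy()                               # search direction
    rs = np.sum(R * R)                         # squared Frobenius norm of residual
    for _ in range(iters):
        if np.sqrt(rs) <= tol:
            break
        AD = sylvester_apply(P, Q, beta, D)
        alpha = rs / np.sum(D * AD)            # exact line search along D
        X = X + alpha * D
        R = R - alpha * AD
        rs_new = np.sum(R * R)
        D = R + (rs_new / rs) * D              # conjugate direction update
        rs = rs_new
    return X
```

Working on the matrix X directly avoids forming the Kronecker-structured system of size (rows·cols) × (rows·cols).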
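The surviving text does not show the exact objective (A.10), the two splitting variables, or the closed-form updates. The sketch below therefore swaps in the textbook scaled-form ADMM for an ℓ1-regularized least-squares problem with a single split, assuming that an ℓ1 penalty is what promotes the sparse core. Here `D` stands in for the vectorized Tucker operator and the returned vector for the vectorized core tensor; this illustrates the ADMM pattern (quadratic step, proximal step, dual step), not the paper's update rules. The names and the defaults for `rho` and `iters` are hypothetical.

```python
import numpy as np


def soft_threshold(v, k):
    """Proximal operator of k * ||.||_1 (elementwise shrinkage)."""
    return np.sign(v) * np.maximum(np.abs(v) - k, 0.0)


def admm_l1_least_squares(D, y, lam, rho=1.0, iters=200):
    """Scaled-form ADMM for min_x 0.5 * ||y - D x||^2 + lam * ||x||_1.

    Split x = z: the x-step is a ridge-type linear solve, the z-step is
    soft-thresholding, and u is the scaled Lagrange multiplier.
    """
    D = np.asarray(D, dtype=float)
    y = np.asarray(y, dtype=float)
    n = D.shape[1]
    M = D.T @ D + rho * np.eye(n)          # same system matrix in every x-step
    Dty = D.T @ y
    z = np.zeros(n)
    u = np.zeros(n)
    for _ in range(iters):
        x = np.linalg.solve(M, Dty + rho * (z - u))   # quadratic step
        z = soft_threshold(x + u, lam / rho)          # sparsity step
        u = u + x - z                                 # scaled dual ascent
    return z
```

In a full Tucker setting the x-step system is far too large to solve with `np.linalg.solve`, which is exactly the bottleneck that the eigendecomposition rewrite of (A.19) addresses.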
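The eigendecomposition rewrite of (A.19) can be illustrated under an assumption that the surviving text does not show explicitly: that the system matrix has the form KᵀK + μI with K = D3 ⊗ D2 ⊗ D1 built from the three mode dictionaries. Then only the three small Gram matrices need eigendecompositions, the middle factor is diagonal, and applying the eigenvector matrices reduces to mode products followed by an elementwise division. The function names and `mu` are hypothetical.

```python
import numpy as np


def mode_n_product(X, U, n):
    """X x_n U for a 3-way array X."""
    return np.moveaxis(np.tensordot(U, X, axes=(1, n)), 0, n)


def kron_ridge_solve(D1, D2, D3, B, mu):
    """Solve (K^T K + mu I) vec(G) = vec(B) with K = D3 kron D2 kron D1.

    vec() stacks entries in column-major (Fortran) order, so that
    vec(G x_1 D1 x_2 D2 x_3 D3) = K vec(G). Only the small Gram matrices
    Dn^T Dn are eigendecomposed; K^T K itself is never formed.
    """
    (s1, U1), (s2, U2), (s3, U3) = (np.linalg.eigh(D.T @ D) for D in (D1, D2, D3))
    T = np.asarray(B, dtype=float)
    for n, U in enumerate((U1, U2, U3)):
        T = mode_n_product(T, U.T, n)      # rotate into the eigenbases
    # Eigenvalues of K^T K + mu I are products of per-mode eigenvalues plus mu.
    T = T / (s1[:, None, None] * s2[None, :, None] * s3[None, None, :] + mu)
    for n, U in enumerate((U1, U2, U3)):
        T = mode_n_product(T, U, n)        # rotate back
    return T
```

The cost is dominated by three small eigendecompositions and six mode products, instead of inverting a matrix whose side equals the number of core-tensor entries.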
Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.
Conflicts of Interest
The authors declare no conflicts of interest.
Acknowledgments
The authors would like to thank Professor Gu from the Heilongjiang Province Key Laboratory of Space-Air-Ground Integrated Intelligent Remote Sensing for his selfless help. This work was supported by the National Natural Science Foundation of China (Grant no. 61671184) and the National Natural Science Foundation of Key International Cooperation of China (Grant no. 61720106002).