Abstract
The low-rank representation (LRR) method has recently gained enormous popularity due to its robust approach to the subspace segmentation problem, particularly for corrupted data. In this paper, the recursive sample scaling low-rank representation (RSS-LRR) method is proposed. The advantage of RSS-LRR over traditional LRR is that a cosine scaling factor is further introduced, which imposes a penalty on each sample to better minimize the influence of noise and outliers. Specifically, the cosine scaling factor is a similarity measure learned to extract each sample's relationship with the low-rank representation's principal components in the feature space. In other words, the smaller the angle between an individual data sample and the low-rank representation's principal components, the more likely it is that the data sample is clean. Thus, the proposed method can effectively obtain a good low-rank representation influenced mainly by clean data. Several experiments are performed with varying levels of corruption on ORL, CMU PIE, COIL20, COIL100, and LFW to evaluate RSS-LRR's effectiveness over state-of-the-art low-rank methods. The experimental results show that RSS-LRR consistently performs better than the compared methods in image clustering and classification tasks.
1. Introduction
The limitations of classical feature learning techniques such as PCA [1] quickly made the robust principal component analysis (RPCA) method an efficient choice for dealing with noise and outliers. Specifically, RPCA is focused on learning a low-rank subspace directly from the original high-dimensional data so as to preserve its geometric structure in a low-dimensional subspace. This strategy has shown tremendous improvements in several applications [2–5]. However, as RPCA only seeks a single low-rank subspace, it may still suffer from noise damage, since high-dimensional data are known to reside in multiple low-dimensional subspaces [6]. Thus, extending RPCA's idea, Liu et al. [4, 7] proposed a method named "low-rank representation" (LRR). LRR's main advantage over RPCA lies in its aim to learn data's multiple low-dimensional subspaces and their membership. This approach makes LRR very robust to the negative effects of noise and outliers [8].
Considering LRR's robustness, several attempts have been made in the literature, such as references [9–13], to improve its performance. For example, to deal with data from nonlinear subspaces, Tang et al. [13] proposed robust kernel LRR (RKLRR). Liu et al. [11] then adopted a fixed-rank strategy to accelerate LRR's computation process. However, these methods' performance can degrade with insufficient or heavily corrupted samples. That is why Xiao et al. [14] had previously proposed latent LRR (LatLRR) for joint subspace segmentation and feature selection. The idea behind LatLRR is to include hidden data in constructing the dictionary to further improve robustness. Even so, how to handle gross data damage remains unsolved: the more the data are corrupted by noise, the larger the degradation in classification and clustering performance.
To address this issue, a recursive sample scaling low-rank representation (RSS-LRR) method is proposed in this paper. Since some data samples will be more damaged than others under gross data corruption, we estimate each data sample's importance using a cosine scaling factor. This scaling factor measures the angle between each data sample and the low-rank representation's principal components in the feature space. In this way, we iteratively subdue noisy data samples to overcome their effect. Thus, the proposed RSS-LRR can effectively obtain a better low-rank representation than existing methods. Our main contributions are summarized as follows:
(1) We propose a novel method named "RSS-LRR," which measures each data sample's importance using a cosine scaling factor. This scaling factor is used in our model to extract each data sample's relationship with the low-rank matrix's principal components.
(2) The proposed RSS-LRR method can effectively handle noisy data samples by iteratively restricting them with the sample scaling factor to suppress noise and thus obtain a good low-rank representation.
(3) Several experiments are performed with varying levels of corruption to evaluate RSS-LRR's effectiveness. The experimental results show that RSS-LRR consistently outperforms other state-of-the-art methods in image clustering and classification tasks.
2. Related Work
This section presents a brief review of the baseline methods RPCA and LRR. First, in this paper, matrices are written in uppercase, e.g., $X$. Thus, $\|X\|_F$ and $\|X\|_*$ denote the Frobenius norm and nuclear norm, respectively. $\|x\|_2$ and $\|X\|_{2,1}$ denote the vector $\ell_2$ norm and the $\ell_{2,1}$ norm, which are defined by $\|x\|_2 = \sqrt{\sum_i x_i^2}$ and $\|X\|_{2,1} = \sum_j \|X_{:,j}\|_2$.
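To make these norms concrete, the following minimal numpy sketch (our own illustrative example, not from the paper) evaluates each of them on a small matrix:

```python
import numpy as np

X = np.array([[3.0, 0.0],
              [4.0, 0.0]])

fro = np.linalg.norm(X, 'fro')         # Frobenius norm: sqrt(3^2 + 4^2) = 5
nuc = np.linalg.norm(X, 'nuc')         # nuclear norm: sum of singular values = 5
l21 = np.linalg.norm(X, axis=0).sum()  # l2,1 norm: sum of column l2 norms = 5
print(fro, nuc, l21)
```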
2.1. Robust Principal Component Analysis
To recover a subspace structure from corrupted data, RPCA was proposed in [2]. Its strategy is to decompose a given data matrix into two component matrices by solving the following optimization problem:

$\min_{L,E}\ \|L\|_* + \lambda\|E\|_1 \quad \text{s.t.}\ X = L + E,$ (1)

where the data matrix $X \in \mathbb{R}^{m \times n}$ holds $n$ samples in the $m$-dimensional space, $L$ is the low-rank matrix, $E$ is a sparse error matrix, and $\lambda$ is the regularization parameter balancing the effects of the two terms. Thus, RPCA's main objective in the above formula is to obtain the low-rank and sparse elements by combining the nuclear norm and the $\ell_1$ norm. This approach is proven to be possible under some assumptions [16]. However, as RPCA assumes a single low-rank subspace, its performance can degrade easily.
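For intuition, the following is a minimal numpy sketch of the RPCA decomposition via an inexact ALM loop with singular value thresholding and soft shrinkage. It is a simplified illustration, not the exact solver of the cited works: the fixed penalty `mu`, the fixed iteration count, and the default $\lambda = 1/\sqrt{\max(m,n)}$ are our assumptions.

```python
import numpy as np

def svt(M, tau):
    # Singular value thresholding: proximal operator of tau * nuclear norm.
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

def rpca(X, lam=None, mu=1.0, n_iter=200):
    # Decompose X into low-rank L plus sparse E (inexact ALM sketch).
    m, n = X.shape
    if lam is None:
        lam = 1.0 / np.sqrt(max(m, n))  # common default in the RPCA literature
    L, E, Y = np.zeros_like(X), np.zeros_like(X), np.zeros_like(X)
    for _ in range(n_iter):
        L = svt(X - E + Y / mu, 1.0 / mu)                        # nuclear-norm step
        T = X - L + Y / mu
        E = np.sign(T) * np.maximum(np.abs(T) - lam / mu, 0.0)   # l1 shrinkage
        Y += mu * (X - L - E)                                    # multiplier update
    return L, E
```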
2.2. LowRank Representation
Liu et al. [5] proposed LRR to tackle RPCA's limitations. Specifically, LRR is focused on pursuing a data representation matrix with the lowest rank. It achieves that by using data's self-expressiveness property, such that the given data themselves are utilized as a self-dictionary. This way, each data sample is represented as a linear combination of similar samples belonging to the same class. The optimal low-rank matrix obtained by LRR is defined as follows:

$\min_{Z,E}\ \|Z\|_* + \lambda\|E\|_{\ell} \quad \text{s.t.}\ X = XZ + E,$ (2)

where the data matrix $X$ denotes the self-dictionary, $E$ is used to capture the error components, and $\|\cdot\|_{\ell}$ denotes a certain norm, which can be determined based on the type of noise corruption. For example, while $\|\cdot\|_F$ is a suitable candidate for data damaged by Gaussian noise, $\|\cdot\|_1$ is good for random noise. Besides, $\|\cdot\|_{2,1}$ is an efficient choice when only a part of the data is contaminated.
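In the idealized noise-free case ($E = 0$), equation (2) even has a known closed-form solution, the shape interaction matrix $Z^* = VV^T$ built from the right singular vectors of $X$ [5]. A short numpy sketch of this special case:

```python
import numpy as np

def lrr_noiseless(X, tol=1e-10):
    # Closed-form minimizer of ||Z||_* s.t. X = XZ on clean data:
    # Z* = V V^T, with V the right singular vectors of X's skinny SVD.
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    V = Vt[s > tol].T          # keep directions with nonzero singular values
    return V @ V.T
```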
Although LRR's approach has been shown to be very effective, particularly in noisy settings, its performance may degrade with insufficient samples. For this reason, Liu et al. [15] also proposed latent LRR, which exploits both the observed and the hidden data to construct the self-dictionary. This strategy is most useful for image restoration [17]. Subsequently, the work of [18] proposed a method for line pattern noise removal to address contaminated instances; it operates in a transform domain by using a line pattern's directional property. Besides, other efforts were made in references [11, 12, 19–31] to improve LRR's discriminative capability.
Notably, Bao et al. [23] used a fixed-rank approach to reduce LRR's singular value decomposition cost. Zhang et al. [24] proposed two instantaneous methods, the first of which can reasonably handle noise interference by decomposing given data into two parts, namely, a low-rank sparse principal feature part and a noise-fitting error part. In [25], Tang et al. introduced a diversity regularization and a rank constraint to suppress the redundancy in different data views. In [26], Zhang et al. presented a method that adaptively preserves the local information of salient features, thus guaranteeing a block-diagonal coefficient structure. Meanwhile, a compressive robust subspace clustering method was proposed in [27] for dimensionality reduction. However, because subspace techniques, including LRR, do not provide linear dimensionality reduction (LDR) functionality, feature selective projection (FSP) [28] was proposed. FSP combines feature extraction, feature selection, and LRR into a unified model to promote robust LDR. Likewise, a method was introduced in [29] that exploits a robust dictionary learning strategy to discover hybrid salient low-rank and sparse representations in a factorized compressed space. Furthermore, in an attempt to keep both similarity and local structures, the hierarchical weighted low-rank representation (HWLRR) [30] was proposed. Similarly, a more recent study [31] focused on capturing cross-view information through an approach that preserves both the diversity and the consensus information of each data view.
3. The Proposed Method
3.1. RSS-LRR
Since real-world data are not always perfect and are inevitably corrupted by noise in practice, most existing low-rank methods cannot guarantee robust performance. Therefore, noise interference must be carefully handled to resolve this drawback. As a result, a rational solution is pursued in this paper, which uses a cosine scaling factor to estimate the importance of each data sample. Essentially, we suppose that clean data samples will receive high significance values, while noisy ones will deviate from the principal component of the data. Thus, the cosine scaling factor is introduced into the LRR formulation in equation (2) using the constraint $XW = XWZ + E$. The motivation behind our approach is straightforward: according to reference [32], $Z$ can be decomposed as $Z = U_Z \Sigma_Z V_Z^T$, where $U_Z$ and $V_Z$ are the left and right low-rank singular vectors. In other words, $V_Z$ (or $V_Z^T$) becomes the pursued projection matrix, such that $XV_Z$ is the data projection in the feature space. Therefore, $p = Xv_1$ is chosen as our maximum projection direction, where $v_1$, the column vector of $V_Z$ associated with the maximum singular value of $Z$, defines the leading direction. So, for an outlier data sample $x_j$, the angle between it and the principal component vector $p$ would differ more than that of a clean data sample $x_i$, as described in Figure 1(a). Hence, $w_i$, expressed in the following, is used to estimate the importance of each data sample:

$w_i = \dfrac{|x_i^T p|}{\|x_i\|_2\,\|p\|_2} + \delta,$ (3)

where $\delta$ is a small constant that keeps $w_i$ away from 0. Thus, using the significance factor $w_i$, a given data matrix can be scaled to minimize the effect of noisy data samples, allowing the low-rank structure to be realized from clean data, as shown in Figure 1(b). As such, the proposed sample scaling low-rank model is obtained as follows:

$\min_{Z,E}\ \|Z\|_* + \lambda\|E\|_{2,1} \quad \text{s.t.}\ XW = XWZ + E,$ (4)

where $Z$ denotes the low-rank matrix and $E$ is used to capture the noise elements, similar to equation (2). Then, $W = \mathrm{diag}(w_1, \ldots, w_n)$ denotes the scaling factor of the samples.
From the above formulation, one can easily check that the scaling factor suits our goal, as we can detect the noisy points from their wider angles to $p$, with lower values being assigned to such points. Illustratively, let us assume $x_i$ is an outlier; then $w_i$ is almost 0, so $x_i$ is subdued with $w_i x_i \approx 0$. Thus, $\hat{X} = XW$ is used to obtain the new training data. Besides, using SVD, $\hat{X} = \hat{U}\hat{\Sigma}\hat{V}^T$, meaning that both new singular vectors and new projection vectors of the sample space are obtained by suppressing the noisy data samples. As a result, our proposed method can learn an optimal low-rank structure using the new data, where the points closer to the principal component vector are enhanced.
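As a rough illustration, the sketch below computes a cosine scaling factor of this kind. Since $Z$ is not yet available on a first pass, the data's own leading left singular vector is used here as a stand-in for the principal projection direction; the exact direction in our method is derived from $Z$'s SVD as described above, so treat this as an assumption-laden approximation:

```python
import numpy as np

def cosine_scaling(X, delta=1e-3):
    # X: m x n matrix whose columns are samples; delta keeps weights above 0.
    # Stand-in principal direction: leading left singular vector of X itself.
    u1 = np.linalg.svd(X, full_matrices=False)[0][:, 0]
    norms = np.linalg.norm(X, axis=0) * np.linalg.norm(u1)
    w = np.abs(u1 @ X) / np.maximum(norms, 1e-12) + delta
    return w  # large w_i: sample i is aligned with the principal direction

# Hypothetical usage: rescale each sample (column) before the next LRR pass.
# X_hat = X * w    # equivalent to X @ np.diag(w)
```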
We summarize our model's main characteristics as follows:
(i) Unlike the existing low-rank methods, which use the input data themselves as the dictionary, a new dictionary $\hat{X} = XW$ is presented by imposing a recursive scaling factor on $X$ to suppress the effect of noisy samples.
(ii) Specifically, our recursive modeling is very useful for learning a good low-rank representation, especially when the data are heavily contaminated. As shown in equation (4), our focus is to find $Z$ by minimizing equation (4) under the constraint $XW = XWZ + E$, thus allowing $Z$ to preserve a better low-rank structure using only data samples with large cosine similarity (referred to as clean data samples, as they have a smaller angle with the principal component). In other words, $Z$ is obtained by equation (4) using the clean data $\hat{X}$.
3.2. Optimization
In this section, we propose an optimization algorithm to solve equation (4). First, following standard practice, we introduce an auxiliary variable $J$ to relax equation (4). Thus, equation (4) can be recast as

$\min_{Z,E,J}\ \|J\|_* + \lambda\|E\|_{2,1} \quad \text{s.t.}\ XW = XWZ + E,\ Z = J.$ (5)
The augmented Lagrangian function of equation (5) is given as

$\mathcal{L} = \|J\|_* + \lambda\|E\|_{2,1} + \langle Y_1, XW - XWZ - E\rangle + \langle Y_2, Z - J\rangle + \frac{\mu}{2}\left(\|XW - XWZ - E\|_F^2 + \|Z - J\|_F^2\right),$ (6)

where $Y_1$ and $Y_2$ are Lagrange multipliers, $\mu > 0$ is a penalty parameter, and $\|\cdot\|_F$ denotes the Frobenius norm of a matrix. Many methods for convex optimization relying on nuclear-norm regularization have been developed [33–35], and the optimization problem can be solved via the method in [36].
3.2.1. Computation of J
According to references [37, 38], nuclear-norm minimization methods have stable performance. For computing $J$, we rewrite equation (6) as

$J = \arg\min_{J}\ \frac{1}{\mu}\|J\|_* + \frac{1}{2}\left\|J - \left(Z + \frac{Y_2}{\mu}\right)\right\|_F^2,$ (7)

which is solved in closed form by singular value thresholding.
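A minimal numpy sketch of this $J$-step, i.e., the singular value thresholding solution of equation (7):

```python
import numpy as np

def update_J(Z, Y2, mu):
    # J-step: argmin_J (1/mu)||J||_* + (1/2)||J - (Z + Y2/mu)||_F^2,
    # solved by shrinking the singular values of Z + Y2/mu by 1/mu.
    U, s, Vt = np.linalg.svd(Z + Y2 / mu, full_matrices=False)
    return U @ np.diag(np.maximum(s - 1.0 / mu, 0.0)) @ Vt
```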
3.2.2. Computation of Z
By fixing $J$ and $E$ and substituting $\hat{X} = XW$, $Z$ can be updated using the following formula:

$Z = \left(I + \hat{X}^T\hat{X}\right)^{-1}\left(\hat{X}^T(\hat{X} - E) + J + \frac{\hat{X}^T Y_1 - Y_2}{\mu}\right).$ (8)
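A sketch of this closed-form $Z$-step, obtained by zeroing the gradient of the smooth terms of equation (6); here `Xh` stands for the scaled data $\hat{X} = XW$:

```python
import numpy as np

def update_Z(Xh, E, J, Y1, Y2, mu):
    # Z-step: solve (I + Xh^T Xh) Z = Xh^T (Xh - E) + J + (Xh^T Y1 - Y2) / mu.
    n = Xh.shape[1]
    A = np.eye(n) + Xh.T @ Xh
    B = Xh.T @ (Xh - E) + J + (Xh.T @ Y1 - Y2) / mu
    return np.linalg.solve(A, B)
```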
3.2.3. Computation of E
With $J$, $Z$, and the multipliers fixed, $E$ can be solved as follows:

$E = \arg\min_{E}\ \frac{\lambda}{\mu}\|E\|_{2,1} + \frac{1}{2}\left\|E - \left(\hat{X} - \hat{X}Z + \frac{Y_1}{\mu}\right)\right\|_F^2.$ (9)
Following reference [39], equation (9) can be solved by the following lemma.
Lemma 1 (see [40]). Let $Q = [q_1, q_2, \ldots, q_n]$ be a given matrix. If $E^*$ is the optimal solution of $\min_{E}\ \tau\|E\|_{2,1} + \frac{1}{2}\|E - Q\|_F^2$, then the $i$-th column of $E^*$ is

$E^*_{:,i} = \begin{cases} \dfrac{\|q_i\|_2 - \tau}{\|q_i\|_2}\, q_i, & \text{if } \|q_i\|_2 > \tau, \\ 0, & \text{otherwise.} \end{cases}$
Based on Lemma 1, given the matrix $Q = \hat{X} - \hat{X}Z + Y_1/\mu$, $E$ can be obtained directly using the above formula, making the computation process very efficient.
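The column-wise shrinkage of Lemma 1 translates directly into a few lines of numpy:

```python
import numpy as np

def l21_shrink(Q, tau):
    # Minimizer of tau*||E||_{2,1} + (1/2)||E - Q||_F^2 (Lemma 1):
    # each column is scaled toward zero, or zeroed if its norm is below tau.
    norms = np.linalg.norm(Q, axis=0)
    scale = np.maximum(norms - tau, 0.0) / np.maximum(norms, 1e-12)
    return Q * scale
```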
The complete solution is given in Algorithm 1.
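For concreteness, the following condensed numpy sketch puts the pieces together in the spirit of Algorithm 1, reusing `update_J`, `update_Z`, and `l21_shrink` from the sketches above. The outer-loop structure, the stopping rule, and all hyperparameter values (`lam`, `mu`, `rho`, iteration counts) are our assumptions for illustration rather than the paper's exact settings:

```python
import numpy as np

def rss_lrr(X, lam=0.1, delta=1e-3, mu=1e-2, rho=1.1, mu_max=1e6,
            n_outer=5, n_inner=100, tol=1e-6):
    # Outer loop: re-estimate the cosine scaling factor w and rescale samples.
    # Inner loop: inexact ALM alternating the J, Z, E, and multiplier updates.
    m, n = X.shape
    w = np.ones(n)
    Z, E = np.zeros((n, n)), np.zeros_like(X)
    for _ in range(n_outer):
        Xh = X * w                                   # Xh = X @ diag(w)
        J, Z = np.zeros((n, n)), np.zeros((n, n))
        E, Y1, Y2 = np.zeros_like(X), np.zeros_like(X), np.zeros((n, n))
        mu_k = mu
        for _ in range(n_inner):
            J = update_J(Z, Y2, mu_k)
            Z = update_Z(Xh, E, J, Y1, Y2, mu_k)
            E = l21_shrink(Xh - Xh @ Z + Y1 / mu_k, lam / mu_k)
            R1, R2 = Xh - Xh @ Z - E, Z - J          # primal residuals
            Y1 += mu_k * R1
            Y2 += mu_k * R2
            mu_k = min(rho * mu_k, mu_max)
            if max(np.abs(R1).max(), np.abs(R2).max()) < tol:
                break
        # Re-estimate weights from the angle between each original sample and
        # the principal component p = Xh v1 of the learned representation.
        v1 = np.linalg.svd(Z)[2][0]
        p = Xh @ v1
        w = np.abs(p @ X) / (np.linalg.norm(X, axis=0) * np.linalg.norm(p)
                             + 1e-12) + delta
    return Z, E, w
```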

3.3. Complexity Analysis
This section analyzes the computational cost of Algorithm 1. In Step 6 of Algorithm 1, the values of $\hat{X}^T\hat{X}$ remain the same throughout the inner loop. Therefore, as the number of iterations increases, the time spent pursuing the low-rank variable decreases, so the computing time of later iterations is far shorter than that of the first. The cost of the SVD in the $J$ subproblem is $O(n^3)$, where $n$ denotes the number of data vectors. Besides, the $Z$ subproblem costs $O(mn^2 + n^3)$ for its matrix computations, where $m$ is the data dimension. The $E$ update in Step 10 then costs $O(mn^2)$. Furthermore, supposing that each subblock requires $t$ iterations until convergence, the subproblems will be calculated for $t$ iterations. Therefore, based on the number of iterations, the combined cost is $O(t(n^3 + mn^2))$, where $t$ represents the number of iterations. Thus, when $m \leq n$, the cost's upper bound would be $O(tn^3)$. Accordingly, the overall computational cost of the proposed method is $O(tn^3)$.
4. Experiments
In this section, our proposed RSS-LRR method's effectiveness is evaluated by comparing it with similar methods, such as LRR [5], latent LRR [15], LRRLC, GLRR [8], and GODEC+ [41], in background modeling from video [42] and image denoising [43].
4.1. Background Modeling from Surveillance Video
4.1.1. Experimental Settings
This experiment is performed using a surveillance video with various illumination settings. It is composed of a sequence of 200 grayscale frames of 32 × 32 dimensions. Each algorithm's effectiveness is evaluated using the precision, recall, and F-score metrics, and their parameters are tuned according to the corresponding literature. Precisely, the background modeling ground truth [44] is obtained by manually marking out the activities. In this experiment, 50% of the frames are randomly selected as the training set, while the remaining frames are treated as the testing set.
4.1.2. Experimental Results
In Figure 2, we show each algorithm's background recovery and activity segmentation performance. Additionally, it can be observed from Table 1 that RSS-LRR outperforms the other methods in activity segmentation, as it generalizes better to the testing frames. This result further substantiates the effectiveness of our sample scaling factor approach, which yields a more reliable low-rank object than the compared methods.
4.2. Image Clustering with Varying Levels of Noise
4.2.1. Experimental Settings
Here, several experiments are performed on three well-known image datasets, namely, ORL, CMU PIE, and COIL20, to evaluate the effectiveness of the proposed algorithm on image clustering. Each algorithm's parameters are tuned according to the corresponding literature using the grid search strategy. We give a brief description of each of these datasets as follows:
ORL: it contains 400 face images of forty individuals, with each of them contributing ten distinct images under various conditions such as facial details and different facial expressions.
CMU PIE: it is a face image repository with images of sixty-eight individuals under different settings. It includes thirteen different poses, four different expressions, and forty-three different illumination conditions.
COIL20: it is an object image dataset consisting of 20 separate objects. Each object contributes 72 grayscale images, amounting to a total of 1440 images.
For each dataset, the images are resized to 32 × 32 pixels in our experiments. As illustrated in Figure 3, these datasets are then corrupted with 5%, 10%, 15%, and 20% random pixel noise to demonstrate each algorithm's robustness to noise. Thus, a spectral clustering algorithm is applied to the similarity matrix produced by each algorithm to obtain the clustering results, averaged over ten runs to ensure fairness [45].
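As an illustration of this protocol, the sketch below corrupts a given fraction of pixels with random 8-bit values and clusters a learned representation with spectral clustering on the symmetric affinity $|Z| + |Z^T|$. The exact corruption model and affinity construction are common conventions that we assume here, not details quoted from the paper:

```python
import numpy as np
from sklearn.cluster import SpectralClustering

def add_pixel_noise(images, ratio, seed=0):
    # Replace a `ratio` fraction of pixels with random 8-bit values
    # (assumes uint8 images; one common reading of "random pixel noise").
    rng = np.random.default_rng(seed)
    noisy = images.copy()
    mask = rng.random(images.shape) < ratio
    noisy[mask] = rng.integers(0, 256, size=int(mask.sum()))
    return noisy

def cluster_from_representation(Z, n_clusters, seed=0):
    # Symmetrize the representation into an affinity matrix and cluster it.
    W = np.abs(Z) + np.abs(Z.T)
    sc = SpectralClustering(n_clusters=n_clusters, affinity="precomputed",
                            random_state=seed)
    return sc.fit_predict(W)
```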
4.2.2. Experimental Results
Tables 2–4 display the clustering accuracy of each algorithm on ORL, CMU PIE, and COIL20, respectively. It is evident that the accuracy of our proposed RSS-LRR method consistently beats those of the compared methods on all three datasets. For example, on the ORL dataset, the accuracy of our proposed method is about 1% higher than that of its closest competitor on clean data. Then, gradually increasing the noise level, one may notice that all the algorithms lose performance. However, our proposed method shows more robustness than the other methods, especially under 20% noise damage. Take, for instance, the clustering result of LRR, which dropped from 0.7505 to 0.3497, while that of the proposed method had a smaller drop, from 0.7609 to 0.5650.
Similarly, in Tables 3 and 4, we can also see that the proposed method maintained its advantage over the other methods on the CMU PIE and COIL20 datasets. Particularly, RSS-LRR's results at the 20% corruption level show that its accuracy is about 4% better than that of GODEC+, which achieves 0.5069 on the COIL20 dataset. Accordingly, we present the clustering variation graph of each method in Figure 4, which further reveals the robustness of the proposed method to noise. Thus, it is safe to conclude that the proposed method's performance is steadier than that of the other algorithms, especially at high levels of corruption. We attribute this to our scaling factor approach, which iteratively overcomes the noise effect.
4.3. Image Recognition with Contiguous Occlusion
4.3.1. Experimental Settings
In order to evaluate the robustness of RSS-LRR on image recognition under different levels of contiguous occlusion [19], we randomly add block occlusions of two different sizes to each dataset, as illustrated in Figure 5. Then, 50% of the samples in each dataset are selected as the training set and the rest as the testing set. Besides, we compare our proposed method's performance with that of similar methods, LRR, LRRLC, latent LRR (LLRR), GLRR, and GODEC+, adopting the relevant experimental settings from Section 4.2. Each algorithm's classification accuracy is evaluated using the K-nearest neighbor (KNN) classifier.
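For reference, a minimal sketch of the occlusion protocol as we understand it; the block placement, fill value, and size are assumptions for illustration:

```python
import numpy as np

def add_block_occlusion(img, block, seed=0):
    # Overwrite one randomly placed block x block patch of a 2-D image.
    rng = np.random.default_rng(seed)
    h, w = img.shape
    r = rng.integers(0, h - block + 1)
    c = rng.integers(0, w - block + 1)
    out = img.copy()
    out[r:r + block, c:c + block] = 0   # black patch; could also be noise
    return out
```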
4.3.2. Experimental Results
Table 5 reports the average classification accuracies obtained on the ORL dataset under the two levels of contiguous occlusion. From Table 5, we can see that the accuracy of RSS-LRR is only slightly higher than that of GODEC+. In contrast, RSS-LRR's accuracy is significantly better than that of LRRLC, LLRR, and LRR, by about 5%, 4%, and 12%, respectively, under occlusion. For example, the classification accuracy of RSS-LRR is 0.6015, compared with 0.4891 for LRR. This indicates that our proposed RSS-LRR can effectively obtain a better low-rank representation than existing methods.
From Table 6, we can see that the accuracy of RSS-LRR is better than that of LRR by about 6%. Under occlusion, the accuracy of RSS-LRR is better than those of GODEC+ and GLRR by about 2%. Thus, our proposed RSS-LRR demonstrates the most consistent classification accuracy among all algorithms. In Table 7, the accuracy of RSS-LRR is higher than those of LRRLC and LRR by about 3% and 6%, respectively, under occlusion. For example, the classification accuracy of GLRR is 0.6985 and that of RSS-LRR is 0.7130, while LRR achieves 0.6522 under occlusion.
Figures 6(a)–6(f) illustrate the variations of classification accuracy with increasing feature dimensionality under different occlusions on the ORL, CMU PIE, and COIL20 datasets. From Figures 6(a) and 6(b), we can see that RSS-LRR achieves the highest accuracies among all algorithms. In Figures 6(c) and 6(d), we can see that the performance of RSS-LRR overtakes that of the others on the CMU PIE dataset when the feature dimensionality exceeds 70. From Figures 6(e) and 6(f), on the COIL20 dataset, we can see that the accuracy of RSS-LRR gradually shows its superiority when the feature dimensionality is more than 70.
From the above discussion, it can be seen that our proposed RSS-LRR achieves better classification accuracy than the other algorithms.
4.4. Experiments on LargeScale Dataset
4.4.1. Experimental Settings
In this experiment, RSS-LRR's effectiveness is further evaluated on two larger datasets, namely, COIL100 and LFW. The relevant settings from the previous sections are adopted for this large-scale experiment. We give a brief description of each dataset as follows:
COIL100: it has 7200 images of 100 objects, amounting to 72 images per object, with each image taken at pose intervals of 5 degrees.
LFW: the Labeled Faces in the Wild (LFW) dataset originally contains more than 13000 face images, mainly from Internet sources. However, 2484 face images from 38 classes were extracted for our experiments, due to the fewer samples in some categories. Each image was resized to 64 × 64 pixels, yielding 4096 features per image.
4.4.2. Experimental Results
From the classification results in Tables 8 and 9, it can be noticed that RSS-LRR's performance is consistently better than that of the compared methods. For instance, while RSS-LRR's accuracy of 0.6052 on the COIL100 dataset corrupted with 6 × 6 block occlusion (Table 8) is better than that of the second-best GODEC+ by just over 1%, it is better by over 2% under 8 × 8 block occlusion. Similar results are obtained on the LFW dataset (Table 9), where the proposed method's performance is also better than that of the closely following GODEC+ method, except that a larger margin of over 4% is obtained under 6 × 6 block occlusion.
From the clustering results displayed in Tables 10 and 11, the following can be observed:
(i) Although all methods obtain comparable performances on both datasets, their accuracies degrade significantly with increasing noise. This, however, is not unexpected, because higher corruption levels mean that more discriminative data features are destroyed, making it difficult to accurately group similar data samples in the same cluster.
(ii) Furthermore, while the relatively newer method GODEC+ shows more robustness to noise than the older methods, its clustering accuracy on clean data is slightly lower than that of the other methods on the LFW dataset, perhaps due to the class imbalance in this dataset.
(iii) Overall, RSS-LRR's advantage grows stronger with the increase in noise level. For example, its performance on COIL100 under 0% noise is merely 1% better than that of its closest competitor GODEC+, but it is over 2% better under 20% noise. The same can be said of the LFW dataset, where RSS-LRR's clustering accuracy is only about 2% better than that of LRR on clean data, whereas it is more than 4% better than that of GODEC+, the closest result, under the 20% noise level.
Additionally, to strengthen the comparison, RSS-LRR's clustering and classification performances on the large-scale datasets are further compared with those of two more recent state-of-the-art (SOTA) methods, namely, nonnegative sparse discriminative low-rank representation (NSDLRR) [47] and low-rank and collaborative representation for hyperspectral anomaly detection (LRCRD) [48]. From the classification results displayed in Figure 7, it can be observed that all three methods obtain comparable results on both datasets. However, RSS-LRR shows more robustness, especially on the LFW dataset. Similarly, the clustering results are shown in Figure 8; these results are also close, with the proposed method displaying the best overall performance.
4.5. Convergence Study
Based on the established standards of the inexact augmented Lagrange multiplier (IALM) optimization strategy in several prior studies [5, 49–51], the objective function values of the $J$, $Z$, and $E$ subproblems are expected to decrease monotonically in each iteration until convergence. Besides, the $E$ subproblem is known to have a closed-form solution [51], since its objective is convex. Also, the convergence of the remaining subproblems is confirmed by references [36, 52–54]. Still, slow convergence might be expected when everything is put together. Fortunately, as shown in Figure 9, the proposed algorithm has a strong convergence property, converging within 150 iterations on the COIL100 and ORL datasets.
5. Conclusion
In this paper, we propose a recursive sample scaling low-rank representation method named "RSS-LRR." Different from the existing methods, each data sample's importance is estimated by introducing a cosine scaling factor. This scaling factor is used to extract each sample's relationship with the low-rank representation's principal components in the feature space. Thus, our proposed model can effectively reduce the noise effect by iteratively lowering the importance of noisy samples while learning the robust low-rank matrix. Several experimental results on well-known benchmark datasets, including various experiments on grossly corrupted data, demonstrate that RSS-LRR performs better in clustering and classification tasks than the state-of-the-art methods. In future work, we will extend RSS-LRR's idea to multi-view data.
Data Availability
The datasets used in this study are open benchmark datasets that are allowed for use in research. A description of and link to each of them follows. ORL contains 400 face images of forty individuals, with each of them contributing ten distinct images under various conditions such as facial details and different facial expressions: http://cam-orl.co.uk/facedatabase.html. CMU PIE is a face image repository with images of sixty-eight individuals under different settings, including thirteen different poses, four different expressions, and forty-three different illumination conditions: https://www.cs.cmu.edu/afs/cs/project/PIE/MultiPie/Multi-Pie/Home.html. COIL20 is an object image dataset consisting of 20 separate objects; each object contributes 72 grayscale images, amounting to a total of 1440 images: https://www.cs.columbia.edu/CAVE/software/softlib/coil-20.php. COIL100 has 7200 images of 100 objects, amounting to 72 images per object, with each image taken at pose intervals of 5 degrees: https://www.kaggle.com/jessicali9530/coil100. The Labeled Faces in the Wild (LFW) dataset originally contains more than 13000 face images, mainly from Internet sources; however, 2484 face images from 38 classes were extracted for our experiments, due to the fewer samples in some categories, and each image was resized to 64 × 64 pixels, yielding 4096 features per image: http://vis-www.cs.umass.edu/lfw/.
Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this study.
Authors’ Contributions
Wenyun Gao and Stanley Abhadiomhen conceptualized the research idea and reviewed and edited the paper. Xiaoyun Li and Sheng Dai were involved in data investigation. Sheng Dai ran the software. Stanley Abhadiomhen wrote the original draft. Wenyun Gao, Xinghui Yin, and Stanley Abhadiomhen supervised the project. Wenyun Gao was involved in funding acquisition.
Acknowledgments
This research was funded in part by the National Key Research and Development Program of China, Grant no. 2020YFC1511800.