Abstract
Clustering of tumor samples can help identify cancer types and discover new cancer subtypes, which is essential for effective cancer treatment. Although many traditional clustering methods have been proposed for tumor sample clustering, advanced algorithms with better performance are still needed. Low-rank subspace clustering has become a popular approach in recent years. In this paper, we propose a novel one-step robust low-rank subspace segmentation method (ORLRS) for clustering tumor samples. For a gene expression data set, we seek its lowest-rank representation matrix and the noise matrix. By imposing a discrete constraint on the low-rank matrix, ORLRS learns the cluster indicators of the subspaces directly, without performing spectral clustering, i.e., it performs the clustering task in one step. To improve the robustness of the method, the capped norm is adopted to remove the extreme data outliers in the noise matrix. Furthermore, we derive an efficient algorithm to solve the ORLRS problem. Experiments on several tumor gene expression data sets demonstrate the effectiveness of ORLRS.
1. Introduction
A tumor is a group of cells that have undergone unregulated growth and often form a mass or lump. Analyzing tumor gene expression data is critical for revealing the pathogenesis of cancer. Advances in sequencing technologies have made it possible to measure the expression levels of thousands of genes simultaneously [1]. An increasingly pressing challenge is how to interpret these gene expression data to gain insights into the mechanisms of tumors [2]. Many advanced machine learning algorithms [3–9] have thus been proposed to analyze such data. Among them, clustering can be used to discover tumor samples with similar molecular expression patterns [10, 11].
Many traditional clustering methods, such as hierarchical clustering (HC) [12, 13], self-organizing maps (SOM) [14], nonnegative matrix factorization (NMF) [15, 16], and principal component analysis (PCA) [17–20], have been used for gene expression data clustering. Gene expression data often contain structures that can be represented and processed by parametric models. Linear subspaces are a natural choice for characterizing a given data set since they are easy to compute and often effective in real applications. Subspace methods such as NMF are essentially based on the assumption that the data are approximately drawn from a low-dimensional subspace. In recent years, these methods have been gaining much attention. For example, Yu et al. proposed a correntropy-based hypergraph regularized NMF (CHNMF) method for clustering and feature selection [21]. Specifically, the correntropy is used in the loss term of CHNMF instead of the Euclidean norm to improve the robustness of the algorithm, and CHNMF also uses hypergraph regularization to explore the high-order geometric information among sample points. Jiao et al. proposed a hypergraph regularized constrained nonnegative matrix factorization (HCNMF) method for selecting differentially expressed genes and classifying tumor samples [22]. HCNMF incorporates a hypergraph regularization constraint to capture higher-order relationships among data samples. A nonnegative matrix factorization framework based on multi-subspace cell similarity learning for unsupervised scRNA-seq data analysis (MscNMF) was proposed by Wang et al. [23]. MscNMF can learn the gene and cell features of different subspaces; the correlation and heterogeneity between cells are more prominent in multiple subspaces, which makes the learned cell similarity more satisfactory.
However, real data can rarely be well represented by a single subspace. A more reasonable model is to assume that the data lie near multiple subspaces, i.e., the data are treated as samples approximately drawn from a mixture of multiple low-dimensional subspaces. Subspace clustering (or segmentation) has been proposed on this basis to improve clustering accuracy: assuming that the data points are drawn from a union of multiple low-dimensional subspaces, the goal of subspace clustering is to recover these subspaces, with each subspace corresponding to a cluster. Subspace clustering has obtained promising results in previous studies and has found widespread application in many areas, such as pattern recognition [24], image processing [25], and bioinformatics [26].
When the data are clean, i.e., the samples are strictly drawn from multiple subspaces, several existing methods, such as sparse subspace clustering (SSC) [27], low-rank representation (LRR) [5], and the low-rank model with discrete group structure constraint (LRS) [28], are able to solve the subspace clustering problem. SSC clusters data drawn from multiple low-dimensional subspaces based on sparse representation (SR) [29]. Since the low-rank structure performs matrix recovery well, the multiple subspaces can be exactly recovered by LRR. Recently, many excellent works based on low-rank representation have been published. For example, Tang et al. proposed a multi-view subspace clustering model that learns a joint affinity graph based on low-rank representation with diversity regularization and a rank constraint [30]. This method can effectively suppress redundancy and enhance the diversity of different feature views; in addition, the cluster number is used to promote affinity graph learning through the rank constraint. In [31], an unsupervised linear feature selective projection (FSP) method was proposed for feature extraction with low-rank embedding and dual Laplacian regularization. FSP can exploit the inherent relationships among data and effectively suppress the influence of noise. LRR involves two steps in the clustering task: building the affinity matrix and performing spectral clustering. How to define a good affinity matrix is crucial. Furthermore, spectral clustering transforms the clustering problem into a graph segmentation problem, and the choice of segmentation criterion directly affects the clustering results. To address these concerns, LRS directly grasps the indicators of the different subspaces via a discrete constraint, so that the multiple low-rank subspaces can be obtained explicitly. Furthermore, Nie et al. introduced a piecewise function to relax the rank constraint, which makes the extended method better at handling noisy data sets than the preliminary version [32].
As pointed out in [33], one major challenge of subspace clustering is dealing with the outliers that exist in data. Therefore, robust subspace clustering has become an active research topic. To address the robustness issue, the main idea is to explore L_{2,1}-norm-based objective functions, since the non-squared residuals of the L_{2,1}-norm can reduce the effects of data outliers. In [34, 35], the L_{2,1}-norm is adopted in robust PCA (RPCA) for detecting outliers. In [33], Liu et al. proposed a robust LRR model via the L_{2,1}-norm for subspace clustering. Although the L_{2,1}-norm is robust to outliers, it still suffers from extreme data outliers: it only reduces, rather than completely removes, their effects. The capped norm is a more robust strategy than the L_{2,1}-norm because it can remove the effects of the outliers entirely. It has recently been studied in many applications [36, 37].
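To make this contrast concrete, the following minimal sketch (in Python with NumPy, treating samples as columns as in this paper; the matrix values are illustrative only) computes the L_{2,1}-norm as the sum of non-squared column norms:

```python
import numpy as np

def l21_norm(E):
    """L2,1-norm: sum of the (non-squared) Euclidean norms of the columns.

    Because each column contributes its norm rather than its squared norm,
    a single large (outlier) column is penalized less aggressively than
    under the squared Frobenius norm."""
    return float(np.sum(np.linalg.norm(E, axis=0)))

# Two samples (columns): a moderate one and a zero one.
E = np.array([[3.0, 0.0],
              [4.0, 0.0]])
# l21_norm(E) is 5.0: only the first column (norm 5) contributes.
```

An extreme outlier column still contributes its full norm here, which is exactly the limitation the capped norm introduced below addresses.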
In this paper, a one-step robust low-rank subspace segmentation (ORLRS) method via a discrete constraint and the capped norm is proposed for clustering tumor samples. For a data set X ∈ R^{m×n} with m genes and n samples, a low-rank representation matrix Z and a noise matrix E, i.e., X = Z + E, are sought. The low-rank representation of the i-th subspace can be denoted as ZG_i. Here, we impose a discrete constraint on the diagonal matrix G_i to obtain the low-rank representation ZG_i, where the diagonal entries of G_i are binary and Σ_{i=1}^{c} G_i = I (c is the number of total subspaces and I is an identity matrix). The indicators of the i-th cluster are encoded in G_i. In contrast to traditional low-rank-based models, we can thus learn the cluster indicators directly. To avoid trivial solutions and approximate the low-rank constraint, the ranks of all subspaces are minimized simultaneously as Σ_{i=1}^{c} ‖ZG_i‖_{S_p}^p, where ‖·‖_{S_p} denotes the Schatten p-norm, which is a better relaxation of the rank than the nuclear norm [38]. For the noise matrix E, the capped norm is used to improve robustness. We define ε as a thresholding parameter for identifying the extreme data outliers; the capped norm of E can then be formulated as Σ_{j=1}^{n} min(‖e_j‖_2, ε), where e_j is the j-th column of E. This function caps the contribution of every column whose norm exceeds ε at the constant ε, so all extreme outliers are treated equally. Hence, it is more robust to outliers than the L_{2,1}-norm. Meanwhile, we derive an efficient optimization algorithm to solve ORLRS with a rigorous theoretical analysis.
The main contributions of our paper are as follows: ① Compared with traditional low-rank representation-based methods, ORLRS obtains the clustering result directly by learning a subspace indicator matrix from the low-rank representation matrix, without spectral clustering. This avoids the graph construction step of spectral clustering and simplifies the clustering process. ② We introduce the capped norm into our model and form a novel objective function for the gene expression data clustering task; the capped norm constrains the noise matrix to improve the robustness of ORLRS. ③ Optimizing the objective function of ORLRS is a nontrivial problem, so we derive a new optimization algorithm to solve it, and we also give a rigorous convergence analysis of ORLRS.
The remainder of the paper is structured as follows. In Section 2, the proposed ORLRS is presented, and the theoretical analysis of the proposed method is provided. Experimental results are presented in Section 3. In Section 4, the conclusions are given.
2. Methods
We start with a brief introduction of several classical clustering methods. Then, the proposed ORLRS is presented, and its optimal solution and convergence analysis are provided.
2.1. Subspace Clustering via LRR
Denote X ∈ R^{m×n} as a data set with m features and n samples. LRR can be defined as

min_{Z,E} ‖Z‖_* + λ‖E‖_{2,1}   s.t.   X = AZ + E,   (1)

where ‖Z‖_*, i.e., the nuclear norm of Z [33], is the sum of the singular values of Z, the L_{2,1}-norm ‖E‖_{2,1} can detect outliers with column-wise sparsity, A is a dictionary, and λ is a balance parameter.
A brief explanation of the LRR subspace clustering process is as follows. First, the low-rank problem in equation (1) is solved. Then, the optimal solution Z* of equation (1) is used to calculate the affinity matrix W = (|Z*| + |Z*|^T)/2, where |·| denotes the element-wise absolute value. Finally, the data are clustered by spectral clustering [39].
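As a rough illustration of this two-step pipeline (an illustrative sketch, not the authors' implementation), the snippet below takes a toy representation matrix Z, builds the symmetric affinity, and performs a minimal 2-cluster spectral step using the sign of the Fiedler vector of the graph Laplacian:

```python
import numpy as np

def lrr_spectral_2way(Z):
    """Two-step LRR-style clustering for the 2-cluster case (NumPy only).

    Step 1: symmetric affinity W = (|Z| + |Z|^T) / 2.
    Step 2: a minimal spectral step -- split the samples by the sign of
    the Fiedler vector (eigenvector of the 2nd smallest eigenvalue of the
    graph Laplacian)."""
    W = (np.abs(Z) + np.abs(Z).T) / 2
    L = np.diag(W.sum(axis=1)) - W          # unnormalized graph Laplacian
    _, vecs = np.linalg.eigh(L)             # eigenvalues in ascending order
    return (vecs[:, 1] > 0).astype(int)     # sign pattern of Fiedler vector

# Toy Z with two clear blocks of three samples each, weakly connected.
Z = np.zeros((6, 6))
Z[:3, :3], Z[3:, 3:] = 1.0, 1.0
Z += 0.01
labels = lrr_spectral_2way(Z)
```

A full implementation would use k-way spectral clustering with K-means on several eigenvectors; this sketch only conveys the affinity-then-spectral structure that ORLRS is designed to avoid.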
2.2. OneStep Robust LowRank Subspace Clustering
In this paper, we propose the one-step robust low-rank subspace clustering (ORLRS) method via a discrete constraint and the capped norm. Different from LRR, ORLRS clusters the data by learning the cluster indicators directly.
Suppose the data matrix X has c subspaces {X_1, X_2, …, X_c}; the low-rank representation of each subspace needs to be optimized, and in the clustering task we want each subspace to correspond to its own cluster. Minimizing the rank of every subspace directly admits a trivial solution, so we need to formulate the problem in another way. We define a cluster indicator matrix F ∈ {0, 1}^{c×n}: F_{ij} = 1 if the j-th sample belongs to the i-th subspace, and F_{ij} = 0 otherwise. The diagonal matrices are then defined as G_i = diag(f_i), where the diagonal elements of G_i are formed by the i-th row f_i of F and Σ_{i=1}^{c} G_i = I, the identity matrix. Then, XG_i represents the i-th subspace of X; that is, X can be rewritten as X = Σ_{i=1}^{c} XG_i. We can obtain the clustering labels in one step by directly optimizing the G_i [28].
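As a small illustration of this construction (a sketch with made-up labels, not part of the algorithm itself), the snippet below builds F and the diagonal matrices G_i from a label vector and checks that they sum to the identity:

```python
import numpy as np

def indicator_diagonals(labels, c):
    """Build the c x n indicator matrix F (F[i, j] = 1 iff sample j lies in
    subspace i) and the diagonal matrices G_i = diag(i-th row of F)."""
    labels = np.asarray(labels)
    n = labels.size
    F = np.zeros((c, n))
    F[labels, np.arange(n)] = 1.0
    G = [np.diag(F[i]) for i in range(c)]
    return F, G

labels = [0, 0, 1, 1, 1]          # hypothetical cluster assignments
F, G = indicator_diagonals(labels, c=2)
# sum_i G_i = I, and X @ G_i zeroes out the columns outside cluster i.
```

Right-multiplying the data by G_i therefore selects exactly the samples of the i-th cluster, which is what makes the one-step formulation possible.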
Finally, the problem of the one-step low-rank subspace clustering method can be defined as

min_{G_1,…,G_c} Σ_{i=1}^{c} ‖XG_i‖_{S_p}^p   s.t.   G_i is diagonal with binary entries, Σ_{i=1}^{c} G_i = I,   (2)

where ‖·‖_{S_p} is the Schatten p-norm. The clustering indicators of each subspace can be obtained directly from the optimized diagonal matrices G_i.
However, equation (2) is sensitive to data outliers in practical problems since it does not account for noise in the data. To address the robustness problem, we represent the gene expression data X with m genes and n samples as the sum of a low-rank representation matrix Z and a noise matrix E, i.e., X = Z + E, the same strategy as in RPCA. Our one-step low-rank subspace clustering problem can then be written as

min_{Z,E,G_1,…,G_c} Σ_{i=1}^{c} ‖ZG_i‖_{S_p}^p + λ f(E)   s.t.   X = Z + E, G_i is diagonal with binary entries, Σ_{i=1}^{c} G_i = I,   (3)

where λ is a balance parameter and f(E) denotes a certain regularization strategy for the noise. Note that the Schatten p-norm is used to approximate the low-rank problem in equation (3) since it is a better relaxation of the rank constraint than the nuclear norm [38]. The Schatten p-norm of a matrix Z is defined as ‖Z‖_{S_p} = (Σ_i σ_i^p)^{1/p}, where σ_i is the i-th singular value of Z. In [38], the convergence of Schatten p-norm minimization with 0 < p ≤ 2 is proved. Here, we keep p within this range to guarantee the convergence of the first term in equation (3); so, the range of p is (0, 2].
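The Schatten p-norm is straightforward to evaluate from the singular values. The following sketch (illustrative, using NumPy's SVD) matches the definition above; it reduces to the Frobenius norm at p = 2 and the nuclear norm at p = 1:

```python
import numpy as np

def schatten_p(Z, p):
    """Schatten p-norm: (sum_i sigma_i^p)^(1/p) over singular values sigma_i."""
    s = np.linalg.svd(Z, compute_uv=False)
    return float(np.sum(s ** p) ** (1.0 / p))

Z = np.diag([3.0, 4.0])            # singular values are 4 and 3
# p = 2 recovers the Frobenius norm (5.0); p = 1 recovers the nuclear norm
# (7.0); smaller p weights small singular values more, approximating rank.
```

Taking p below 1 pushes the penalty closer to the true rank function, which is why the Schatten p-norm is a tighter relaxation than the nuclear norm.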
To seek a more robust strategy against the outliers, we adopt the capped norm to regularize the noise matrix E, i.e., f(E) = Σ_{j=1}^{n} min(‖e_j‖_2, ε). Then, equation (3) becomes

min_{Z,E,G_1,…,G_c} Σ_{i=1}^{c} ‖ZG_i‖_{S_p}^p + λ Σ_{j=1}^{n} min(‖e_j‖_2, ε)   s.t.   X = Z + E, G_i is diagonal with binary entries, Σ_{i=1}^{c} G_i = I,   (4)

where ε is a thresholding parameter for identifying the data outliers. If a data point satisfies ‖e_j‖_2 ≥ ε, we consider it an extreme outlier, and its contribution is capped at ε. In this way, the influence of extreme outliers is bounded. For the other data points, equation (4) minimizes Σ_j ‖e_j‖_2, i.e., the L_{2,1}-norm of E. That is, if ε is set to +∞, Σ_j min(‖e_j‖_2, ε) is equivalent to ‖E‖_{2,1}. Thus, the capped norm is a more robust strategy than the L_{2,1}-norm.
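A minimal numerical sketch of the capped norm (illustrative values; ε would be tuned per data set as in Section 3):

```python
import numpy as np

def capped_l21(E, eps):
    """Capped column-wise norm: sum_j min(||e_j||_2, eps).

    Columns whose norm reaches eps are treated as extreme outliers and
    contribute only the constant eps, so their influence is bounded."""
    col_norms = np.linalg.norm(E, axis=0)
    return float(np.sum(np.minimum(col_norms, eps)))

# Column 1 is an inlier (norm 5); column 2 is an extreme outlier (norm 100).
E = np.array([[3.0, 100.0],
              [4.0,   0.0]])
# With eps = 10 the outlier is capped: 5 + 10 = 15. With a very large eps
# the value reverts to the plain L2,1-norm: 5 + 100 = 105.
```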
As a result, ORLRS provides a more robust low-rank subspace clustering model by using the capped norm, and the clustering indicators of each subspace can be obtained directly from the optimized diagonal matrices. An efficient optimization algorithm for solving equation (4) is proposed in Section 2.3.
2.3. Optimization Algorithm
The objective function in equation (4) is nonconvex, so jointly optimizing Z, E, and G = {G_1, …, G_c} is extremely difficult. The augmented Lagrange multiplier (ALM) algorithm is used to optimize equation (4). The Lagrangian function of equation (4) can be written as

L(Z, E, G) = Σ_{i=1}^{c} ‖ZG_i‖_{S_p}^p + λ Σ_{j=1}^{n} min(‖e_j‖_2, ε) + ⟨Y, X − Z − E⟩ + (μ/2)‖X − Z − E‖_F^2,   (5)

where Y is a Lagrange multiplier, μ > 0 is a penalty parameter, ‖·‖_F is the Frobenius norm, and the discrete constraint on G is kept implicit. We rewrite equation (5) as follows:

L(Z, E, G) = Σ_{i=1}^{c} ‖ZG_i‖_{S_p}^p + λ Σ_{j=1}^{n} min(‖e_j‖_2, ε) + (μ/2)‖X − Z − E + Y/μ‖_F^2 − (1/(2μ))‖Y‖_F^2.   (6)
We divide equation (6) into three subproblems: optimizing Z while fixing E and G, optimizing E while fixing Z and G, and optimizing G while fixing Z and E.
2.3.1. Fixing E and G to Optimize Z
Equation (6) can be simplified to

min_Z Σ_{i=1}^{c} ‖ZG_i‖_{S_p}^p + (μ/2)‖Z − M‖_F^2,   (7)

where M = X − E + Y/μ.
Lemma 1 (Araki–Lieb–Thirring [40, 41]). For any positive semidefinite matrices A, B ∈ R^{n×n}, the following inequality holds when r ≥ 1:

Tr[(BAB)^r] ≤ Tr[B^r A^r B^r],   (8)

while for 0 ≤ r ≤ 1, the inequality is reversed.
Following [28] and Lemma 1 with r = p/2 ≤ 1, the first term in equation (7) can be written as Σ_{i=1}^{c} ‖ZG_i‖_{S_p}^p = Σ_{i=1}^{c} Tr[(G_i Z^T Z G_i)^{p/2}], and, since G_i^{p/2} = G_i, it is bounded by Σ_{i=1}^{c} Tr[G_i (Z^T Z)^{p/2} G_i]. According to Σ_{i=1}^{c} G_i = I, we convert the first term in equation (7) to

‖Z‖_{S_p}^p = Tr[(Z^T Z)^{p/2}].   (9)
Then, equation (7) can be represented as

min_Z ‖Z‖_{S_p}^p + (μ/2)‖Z − M‖_F^2.   (10)

Taking the derivative w.r.t. Z and setting it to zero, the above formula becomes

Z(2D + μI) = μM,   (11)

where D = (p/2)(Z^T Z)^{(p−2)/2}. So, we can achieve the optimal Z:

Z = μM(2D + μI)^{−1}.   (12)
2.3.2. Fixing and to Optimize
Here, we can denote equation (6) as

min_E λ Σ_{j=1}^{n} min(‖e_j‖_2, ε) + (μ/2)‖E − N‖_F^2,   (13)

where N = X − Z + Y/μ. It can be easily verified that the derivative of equation (13) is equivalent to the derivative of

min_E λ Σ_{j=1}^{n} d_j ‖e_j‖_2^2 + (μ/2)‖E − N‖_F^2,   (14)

where

d_j = 1/(2‖e_j‖_2) if ‖e_j‖_2 < ε, and d_j = 0 otherwise.   (15)
Equation (14) can be formulated as

min_E λ Tr(E Q E^T) + (μ/2)‖E − N‖_F^2,   (16)

where Q is a diagonal matrix with Q_jj = d_j. The problem in equation (16) can be optimized by the iteratively reweighted optimization strategy.
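The reweighting step can be sketched as follows. The specific rule below (weight 1/(2‖e_j‖_2) for columns inside the cap, weight 0 for capped columns) is a common choice for capped-norm reweighting in the spirit of [37]; it is an assumption for illustration, not code from the paper:

```python
import numpy as np

def reweight_diag(E, eps, tiny=1e-8):
    """Diagonal reweighting matrix for the capped-norm term (illustrative).

    Columns with ||e_j|| < eps get weight 1 / (2 ||e_j||); capped (outlier)
    columns get weight 0 and therefore drop out of the next weighted
    least-squares update of E."""
    col_norms = np.linalg.norm(E, axis=0)
    d = np.where(col_norms < eps,
                 1.0 / (2.0 * np.maximum(col_norms, tiny)),  # inliers
                 0.0)                                        # capped outliers
    return np.diag(d)

E = np.array([[3.0, 100.0],
              [4.0,   0.0]])
Q = reweight_diag(E, eps=10.0)
# Q[0, 0] = 1/10 for the inlier column (norm 5); Q[1, 1] = 0 for the outlier.
```

Setting the weight of a capped column to zero is what literally "removes" an extreme outlier from the next update, rather than merely shrinking it.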
When fixing Q, taking the derivative w.r.t. E and setting it to zero, equation (16) becomes

2λEQ + μ(E − N) = 0.   (17)

So, we can obtain the optimal E:

E = μN(2λQ + μI)^{−1}.   (18)
When fixing E, the updating rule for Q is to recompute its diagonal entries via equation (15):

Q_jj = 1/(2‖e_j‖_2) if ‖e_j‖_2 < ε, and Q_jj = 0 otherwise.   (19)
2.3.3. Fixing Z and E to Optimize G
We can rewrite equation (6) as

min_{G_1,…,G_c} Σ_{i=1}^{c} ‖ZG_i‖_{S_p}^p   s.t.   G_i is diagonal with binary entries, Σ_{i=1}^{c} G_i = I.   (20)
Taking the derivative w.r.t. G and setting it to zero, the above formulation can be written as the stationarity condition in equation (21), in which the weighting term itself depends on the current G.
Since the weighting term depends on G, an iteration-based algorithm is used to obtain the solution of equation (21). First, the weighting term is calculated using the current solution of G. With the weighting term given, the solution of G to the following objective function will satisfy equation (21):
The current solution of G can then be updated according to the optimal solution of equation (22).
Denoting the resulting per-subspace weighting matrices by S_i, equation (22) can be written as

min_{G_1,…,G_c} Σ_{i=1}^{c} Tr(S_i G_i)   s.t.   G_i is diagonal with binary entries, Σ_{i=1}^{c} G_i = I.   (23)
Since the G_i are diagonal matrices, the above formulation becomes

min Σ_{i=1}^{c} Σ_{j=1}^{n} s_{i,jj} g_{i,jj}   s.t.   g_{i,jj} ∈ {0, 1}, Σ_{i=1}^{c} g_{i,jj} = 1,   (24)

where g_{i,jj} is the j-th diagonal element of matrix G_i and s_{i,jj} is the j-th diagonal element of matrix S_i. Since equation (24) decouples over the samples, we can optimize it by

g_{i,jj} = 1 if i = arg min_k s_{k,jj}, and g_{i,jj} = 0 otherwise.   (25)
The algorithm to solve the problem of ORLRS is summarized in Algorithm 1.

2.4. Convergence Analysis
In this section, the convergence of the proposed algorithm is proved.
Theorem 1. At each iteration, the updating rule in Algorithm 1 for matrix Z while fixing the others will monotonically decrease the objective value in equation (4) when 0 < p ≤ 2.
Proof. It can be verified that equation (12) is the solution to the following problem:

min_Z Tr(Z D Z^T) + (μ/2)‖Z − M‖_F^2,   (26)

where D = (p/2)(Z_t^T Z_t)^{(p−2)/2} is computed from the current iterate Z_t. Then, at the (t+1)-th iteration,

Tr(Z_{t+1} D Z_{t+1}^T) + (μ/2)‖Z_{t+1} − M‖_F^2 ≤ Tr(Z_t D Z_t^T) + (μ/2)‖Z_t − M‖_F^2.   (27)

That is,

(p/2) Tr[Z_{t+1} (Z_t^T Z_t)^{(p−2)/2} Z_{t+1}^T] + (μ/2)‖Z_{t+1} − M‖_F^2 ≤ (p/2) Tr[(Z_t^T Z_t)^{p/2}] + (μ/2)‖Z_t − M‖_F^2.   (28)

Equation (28) can be converted to

(μ/2)‖Z_{t+1} − M‖_F^2 − (μ/2)‖Z_t − M‖_F^2 ≤ (p/2) Tr[(Z_t^T Z_t)^{p/2}] − (p/2) Tr[Z_{t+1} (Z_t^T Z_t)^{(p−2)/2} Z_{t+1}^T],   (29)

which will be combined with Lemma 2 in [38], restated below.
Lemma 2. For any positive definite matrices A, B ∈ R^{n×n}, the following inequality holds when 0 < p ≤ 2:

Tr(A^{p/2}) − (p/2) Tr(A B^{(p−2)/2}) ≤ Tr(B^{p/2}) − (p/2) Tr(B B^{(p−2)/2}).   (30)

Note that, here, we set A = Z_{t+1}^T Z_{t+1} and B = Z_t^T Z_t, so equation (30) is equivalent to

Tr[(Z_{t+1}^T Z_{t+1})^{p/2}] − (p/2) Tr[Z_{t+1} (Z_t^T Z_t)^{(p−2)/2} Z_{t+1}^T] ≤ Tr[(Z_t^T Z_t)^{p/2}] − (p/2) Tr[(Z_t^T Z_t)^{p/2}].   (31)

Then, we have

‖Z_{t+1}‖_{S_p}^p − ‖Z_t‖_{S_p}^p ≤ (p/2) Tr[Z_{t+1} (Z_t^T Z_t)^{(p−2)/2} Z_{t+1}^T] − (p/2) Tr[(Z_t^T Z_t)^{p/2}].   (32)
Combining equations (29) and (32), we have

‖Z_{t+1}‖_{S_p}^p − ‖Z_t‖_{S_p}^p + (μ/2)‖Z_{t+1} − M‖_F^2 − (μ/2)‖Z_t − M‖_F^2 ≤ 0.   (33)
That is to say,

‖Z_{t+1}‖_{S_p}^p + (μ/2)‖Z_{t+1} − M‖_F^2 ≤ ‖Z_t‖_{S_p}^p + (μ/2)‖Z_t − M‖_F^2.   (34)
Thus, the updating rule for matrix Z in Algorithm 1 will not increase the objective value of the problem in equation (10) at each iteration when 0 < p ≤ 2.
Theorem 2. At each iteration, the updating rule in Algorithm 1 for matrix E while fixing the others will monotonically decrease the objective value in equation (4).
Proof. We first recall the following lemma from [37].
Lemma 3. Given any vectors u and v with v ≠ 0, we have the following inequality:

‖u‖_2 − ‖u‖_2^2/(2‖v‖_2) ≤ ‖v‖_2 − ‖v‖_2^2/(2‖v‖_2).   (35)
It can be verified that equation (18) is the solution to the following problem:

min_E λ Tr(E Q E^T) + (μ/2)‖E − N‖_F^2.   (36)
Suppose the updated E in Algorithm 1 is E_{t+1} while fixing the others. Since E_{t+1} is the optimal solution to equation (36), we have

λ Tr(E_{t+1} Q E_{t+1}^T) + (μ/2)‖E_{t+1} − N‖_F^2 ≤ λ Tr(E_t Q E_t^T) + (μ/2)‖E_t − N‖_F^2.   (37)
According to the definition of Q in equation (19) and Lemma 3, we have

Σ_{j=1}^{n} [min(‖e_j^{t+1}‖_2, ε) − Q_jj ‖e_j^{t+1}‖_2^2] ≤ Σ_{j=1}^{n} [min(‖e_j^t‖_2, ε) − Q_jj ‖e_j^t‖_2^2].   (38)
Multiplying equation (38) by λ and summing it with equation (37) at both sides, we can obtain

λ Σ_{j=1}^{n} min(‖e_j^{t+1}‖_2, ε) + (μ/2)‖E_{t+1} − N‖_F^2 ≤ λ Σ_{j=1}^{n} min(‖e_j^t‖_2, ε) + (μ/2)‖E_t − N‖_F^2.   (39)
Therefore, at each iteration, the updating rule in Algorithm 1 for matrix E while fixing the others will monotonically decrease the objective value in equation (4).
Theorem 3. At each iteration, the updating rule in Algorithm 1 for G while fixing the others will monotonically decrease the objective value in equation (4) when 0 < p ≤ 1.
Proof. It can be easily verified that equation (25) is the solution to the following problem:

min_{G_1,…,G_c} Σ_{i=1}^{c} Σ_{j=1}^{n} s_{i,jj} g_{i,jj}   s.t.   g_{i,jj} ∈ {0, 1}, Σ_{i=1}^{c} g_{i,jj} = 1.   (40)

Assume the updated G in Algorithm 1 is G_{t+1}. Since G_{t+1} is the optimal solution to equation (22), we have the inequality in equation (41). According to the definition of S_i in Algorithm 1, equation (41) can be written as equation (42). According to the Cauchy–Schwarz inequality, it can be proved that, when 0 < p ≤ 1, the inequality in equation (43) holds. Thus, combining inequations (42) and (43), we can obtain equation (44). Equation (44) indicates that the updating rule in Algorithm 1 for G while fixing the others will monotonically decrease the objective value in equation (4) during the iterations until the algorithm converges when 0 < p ≤ 1. In practice, convergence is also observed when 1 < p ≤ 2, and if the objective function of equation (40) is modified as described in [28], convergence is also observed.
As a result, the objective in equation (4) is nonincreasing under the updates of Z, E, and G according to Theorems 1–3, respectively. Therefore, the iterative updating in Algorithm 1 converges to a local optimum.
2.5. Complexity Analysis
In Algorithm 1, the most expensive calculations are the updates of Z and E in Step 3. Suppose the low-rank representation matrix Z has rank r with r ≪ min(m, n). The update of Z requires singular value decompositions (SVDs) to form the Schatten p-norm weighting matrix, and the updates of Z and E each involve a matrix inversion and several matrix multiplications; these SVDs and multiplications dominate the cost. For G, we only need to compute the diagonal elements of the matrices S_i, which is comparatively cheap and scales with the number of clusters c. In summary, the computational complexity of Algorithm 1 is linear in the iteration number t, with each iteration dominated by the SVD and matrix multiplication costs above.
3. Results and Discussion
We test ORLRS on six publicly available gene expression data sets, i.e., Leukemia [42], DLBCL [43], Colon cancer [44], Brain_Tumor1 [43], Brain_Tumor2 [43], and 9_Tumors [43].
Following [28, 45–47], clustering accuracy (ACC) is used as an evaluation metric for tumor clustering. Given a data point x_i, let s_i be the predicted label and t_i be the ground-truth label. ACC can be denoted as [45]

ACC = (Σ_{i=1}^{n} δ(t_i, map(s_i))) / n,

where δ(x, y) = 1 if x = y and δ(x, y) = 0 if x ≠ y, map(·) maps each predicted label to the equivalent label from the raw data, and n is the number of tumor samples.
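For concreteness, here is a small self-contained sketch of ACC (illustrative; it realizes map(·) by searching all label permutations, which is adequate for the small cluster counts in these data sets, whereas the Hungarian algorithm would be preferred for many clusters):

```python
import itertools
import numpy as np

def clustering_acc(pred, truth):
    """ACC: fraction of samples whose mapped predicted label matches the
    ground truth, maximized over one-to-one relabelings of the clusters.
    Assumes the predicted and true partitions use the same number of labels."""
    pred, truth = np.asarray(pred), np.asarray(truth)
    pred_labels, true_labels = np.unique(pred), np.unique(truth)
    best = 0.0
    for perm in itertools.permutations(true_labels):
        mapping = dict(zip(pred_labels, perm))
        mapped = np.array([mapping[p] for p in pred])
        best = max(best, float(np.mean(mapped == truth)))
    return best

# A consistent renaming of the true labels scores a perfect 1.0.
# clustering_acc([1, 1, 0, 0], [0, 0, 1, 1]) -> 1.0
```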
We also evaluate the clustering performance by normalized mutual information (NMI) [48]. NMI is defined as

NMI(C, S) = MI(C, S) / sqrt(H(C) H(S)),

where MI(C, S) is the mutual information between the true class labels C and the clustering labels S and H(·) is the entropy function. The larger the NMI value, the better the clustering result.
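A compact sketch of NMI consistent with the definition above (it uses the square-root normalization; other normalizations exist, so treat this as one concrete instance):

```python
import numpy as np

def nmi(C, S):
    """NMI(C, S) = MI(C, S) / sqrt(H(C) * H(S)), computed from the joint
    label histogram. Degenerate single-cluster inputs (zero entropy) are
    not handled in this sketch."""
    C, S = np.asarray(C), np.asarray(S)
    cs, ss = np.unique(C), np.unique(S)
    # Joint probability table P[i, j] = P(C = cs[i], S = ss[j]).
    P = np.array([[np.mean((C == i) & (S == j)) for j in ss] for i in cs])
    Pc, Ps = P.sum(axis=1), P.sum(axis=0)
    nz = P > 0
    mi = np.sum(P[nz] * np.log(P[nz] / np.outer(Pc, Ps)[nz]))
    ent = lambda p: -np.sum(p[p > 0] * np.log(p[p > 0]))
    return float(mi / np.sqrt(ent(Pc) * ent(Ps)))

# Identical partitions (up to renaming) give NMI = 1; independent ones give 0.
```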
3.1. Gene Expression Data Sets
A brief introduction of the six gene expression data sets is presented below, and their detailed information is summarized in Table 1. The Leukemia data contain 25 cases of AML and 47 cases of ALL, packaged into a 7129 × 72 matrix [42]. The DLBCL data consist of 5469 genes and 77 samples, including 58 patients with diffuse large B-cell lymphoma (DLBCL) and 19 patients with follicular lymphoma (FL) [43]. The Colon cancer data [44] consist of a matrix of 2000 genes and 62 tissues, divided into 22 normal and 40 colon tumor samples. The Brain_Tumor1 data set consists of 5920 genes in 90 patient samples covering 5 histological diagnoses: 60 cases of medulloblastoma, 10 cases of malignant glioma, 10 cases of atypical teratoid/rhabdoid tumors (AT/RTs), 4 cases of normal cerebellum, and 6 cases of primitive neuroectodermal tumors (PNETs). The Brain_Tumor2 data set contains 10367 genes in 50 samples covering 4 types of malignant glioma: classic glioblastomas (CG), classic anaplastic oligodendrogliomas (CAO), nonclassic glioblastomas (NCG), and nonclassic anaplastic oligodendrogliomas (NCAO) [34]. The 9_Tumors data set integrates 9 tumor types to develop a genomics-based approach to the prediction of drug response. It contains 5726 genes in 60 samples, distributed over the 9 tumor types as follows: 9 samples of non-small-cell lung carcinoma (NSCLC), 7 samples of colon cancer, 8 samples of breast cancer, 6 samples of ovary cancer, 6 samples of leukemia, 8 samples of renal cancer, 8 samples of melanoma, 2 samples of prostate cancer, and 6 samples of central nervous system cancer (CNS).
3.2. Comparison Algorithms
We compare LRS [28], ExtLRR [32], RPCA [3], PLRR [47], robust LRR [33], LatLRR [49], robust NMF [50], and K-means [51] with the proposed method for tumor clustering. Among these methods, LRS is the basic version of our method implementing one-step clustering, and ExtLRR is a simpler and more effective extension of LRS; RPCA is a classic robust learning algorithm; PLRR (projection LRR) is one of the latest subspace clustering methods for tumor sample clustering; robust LRR and LatLRR are state-of-the-art low-rank subspace segmentation algorithms; robust NMF is a classic NMF-based method widely used for tumor clustering; and K-means is the most commonly used clustering method and is embedded in many methods, including PLRR, robust LRR, and LatLRR, to achieve better performance. Since our proposed method is a novel one-step robust low-rank subspace clustering model, we choose these methods for comparison.
3.3. Parameter Setting
Since gene expression data are high-dimensional with small sample sizes, we use PCA to perform dimensionality reduction, and we use the K-means method to initialize G in the proposed ORLRS. Three parameters need to be determined: the threshold parameter ε, the balance parameter λ, and the low-rank constraint parameter p. In the experiments, we investigated each parameter while fixing the other two. Since the initialization of G introduces some uncertainty, the proposed ORLRS method was run 100 times, and the average accuracy over the 100 runs is reported. The parameter choices below are heuristic and might not be optimal for tumor clustering.
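The PCA preprocessing step can be sketched as follows (illustrative, with random stand-in data; samples are columns, and the retained dimensionality k is a free choice). The reduced matrix would then be fed to K-means to initialize G:

```python
import numpy as np

def pca_reduce(X, k):
    """Project the n samples (columns of X) onto the top-k principal
    directions of the gene (row) space, via an SVD of the centered data."""
    Xc = X - X.mean(axis=1, keepdims=True)      # center each gene
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :k].T @ Xc                      # k x n reduced data

rng = np.random.default_rng(0)
X = rng.standard_normal((2000, 60))             # e.g. 2000 genes, 60 samples
X_red = pca_reduce(X, k=10)                     # 10 x 60: all samples kept
```

Working with an SVD of the centered data avoids ever forming the 2000 × 2000 gene covariance matrix, which matters in the high-dimensional, small-sample regime described above.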
3.3.1. Determination of Threshold Parameter
In the ORLRS model, the data outliers are not heuristically determined based on their magnitude; they are selected during the optimization process. The identified outliers may differ between iterations (with the same thresholding parameter) while we iteratively optimize the objective function of ORLRS. When the algorithm converges, the truly extreme data outliers are likely to have been found. So, we only need to determine one value of ε for each data set.
Figure 1 presents the results of ORLRS with different values of ε. Since the gene expression levels differ greatly across data sets, the magnitudes of the extreme data outliers also differ greatly, so the value of ε spans a large range over the six data sets. From Figure 1, we observe that ORLRS obtains its best performance at a data-set-specific value of ε on the Leukemia, DLBCL, Colon cancer, Brain_Tumor1, Brain_Tumor2, and 9_Tumors data, respectively. The results indicate that ε should be determined appropriately: if ε is too large, some extreme outliers will be missed; if ε is too small, important information may be removed, thereby degrading the clustering performance.
3.3.2. Determination of Balance Parameter
Figure 2 presents the results of ORLRS with different values of λ. ORLRS obtains its best results at a data-set-specific value of λ on the Leukemia, DLBCL, Colon cancer, Brain_Tumor1, Brain_Tumor2, and 9_Tumors data, respectively. On each data set, the clustering accuracy shows an overall upward trend as λ increases toward the best value and an overall downward trend as λ increases beyond it. We therefore suggest choosing λ from a rough range around these best values.
3.3.3. Determination of Schatten PNorm Parameter
Since the algorithm converges for the Schatten p-norm parameter p ∈ (0, 2], we determine the value of p in this range. Figure 3 presents the results of ORLRS with different values of p. ORLRS achieves its best performance at a data-set-specific value of p on the Leukemia, DLBCL, Colon cancer, Brain_Tumor1, Brain_Tumor2, and 9_Tumors data, respectively, which gives general guidance on the choice of p.
3.4. Experimental Results
In this section, the experimental results of our proposed method and the eight comparison algorithms, i.e., LRS, ExtLRR, RPCA, PLRR, robust LRR, LatLRR, robust NMF, and K-means, are reported. ORLRS, LRS, and ExtLRR use K-means to initialize the indicator matrix G. PLRR, robust LRR, and LatLRR use the normalized cuts method to segment the data, which clusters data points using K-means. For the robust NMF method, we initialize the coefficient matrix and basis matrix randomly. To reduce the effect of randomness, we run all methods 100 times, and the mean and standard error of the clustering accuracies over the 100 runs are shown in Table 2. The best result on each data set is indicated in bold.
Based on the results reported in Table 2, we have the following observations. ORLRS extends LRS by adding a noise matrix to the objective function to enhance robustness, which explains why ORLRS outperforms LRS. From Table 2, ORLRS achieves roughly 8%–19% higher clustering accuracy than LRS on four data sets, i.e., Leukemia, DLBCL, Colon cancer, and Brain_Tumor1; on the Brain_Tumor2 and 9_Tumors data sets, ORLRS performs slightly better than LRS. ORLRS also achieves better results than ExtLRR on all data sets. Compared with three classical low-rank-based methods, PLRR, robust LRR, and LatLRR, the clustering accuracy of ORLRS is 1%–9% higher on all six data sets. The main reason is that we use the capped norm to remove the extreme outliers in the noise matrix and the Schatten p-norm to better approximate the low-rank representation. Compared with the traditional methods RPCA, robust NMF, and K-means, ORLRS achieves outstanding results on all six data sets.
The NMI results on five gene expression data sets are shown in Table 3, with the best result on each data set indicated in bold. Because the NMI results of all methods on the Colon data set are less than 0.1, we only report the results on the remaining five data sets. From Table 3, we observe that ORLRS achieves better results than PLRR, robust LRR, LatLRR, robust NMF, RPCA, and ExtLRR on all five data sets. Except on the 9_Tumors data set, our method also outperforms LRS and K-means on the other four data sets.
3.5. Convergence Curves and Running Time
We plotted the convergence curves of ORLRS on the different data sets; they are shown in Figure 4. Our method converges by around the 10th iteration on all six data sets. In Table 4, we also report the running time of ORLRS on the six gene expression data sets without PCA dimensionality reduction. The experiments were implemented in MATLAB R2020b on an ordinary computer with an Intel i9-10900KF CPU (up to 3.70 GHz), 8 GB of RAM, and the Windows 10 operating system.
4. Conclusions
In this paper, a novel one-step robust low-rank subspace clustering method (ORLRS) is proposed for tumor clustering, in which the gene expression data set is represented by a low-rank matrix plus a noise matrix. By using the Schatten p-norm and the discrete constraint, the low-rank representation of each subspace can be obtained well. Different from traditional low-rank-based methods, such as LRR and LatLRR, ORLRS learns the indicators directly and performs the clustering process in one step via the discrete constraint. The capped norm is used to improve the robustness of ORLRS since it can effectively remove the extreme data outliers in the noise matrix. Furthermore, we propose an efficient algorithm to solve the proposed subspace clustering model and prove its convergence. We can thus discover the clusters of tumor data from the optimal cluster indicators. We tested the proposed ORLRS method on six tumor data sets; the results show that ORLRS is an effective method for clustering tumor samples.
There remain several interesting directions for future work. First, it might be beneficial to learn a dictionary for ORLRS, since some low-rank subspace segmentation methods achieve significant improvements by learning a dictionary. Second, ORLRS may be extended to other problems, such as matrix recovery and classification. Third, ORLRS may be employed in other applications, such as gene clustering and co-clustering.
Data Availability
The data used to support the findings of this study are available from the first author upon request.
Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this paper.
Acknowledgments
This work was supported by the National Natural Science Foundation of China (Grant nos. 61906198, 61976215, and 61772532) and the Natural Science Foundation of Jiangsu Province (Grant no. BK20190622).