Abstract

In multiview sample classification with differing distributions, the training and testing samples come from different domains. To improve classification performance in this setting, a multiview sample classification algorithm based on L1-Graph domain adaptation learning is presented. First, a framework of nonnegative matrix trifactorization for domain adaptation learning is constructed, in which the invariant information is regarded as the bridge of knowledge transfer from the source domain to the target domain. Second, an L1-Graph is built on the basis of sparse representation, so that the nearest neighbors of each sample are found adaptively and the geometric structure of the samples and features is preserved. Finally, the two complementary objective functions are integrated into a unified optimization problem that is solved with an iterative algorithm, after which the labels of the testing samples are estimated. Comparative experiments are conducted on the USPS-Binary digital database, the Three-Domain Object Benchmark database, and the ALOI database; the experimental results verify the effectiveness of the proposed algorithm, which improves recognition accuracy and ensures robustness.

1. Introduction

Traditional machine learning algorithms usually assume that the training and testing samples are drawn from the same feature space with the same distribution. When the features or the distribution change, most statistical models must be rebuilt from a newly collected set of training samples. In practice, however, the training and testing samples are often collected in different periods and environments, so their distributions may differ. Transfer learning theory was introduced to address this problem: it aims to transfer knowledge from a labeled source domain to an unlabeled target domain, with applications in natural language processing [1], sentiment analysis [2], and image classification [3]. Using transfer learning to improve recognition accuracy and robustness across different distributions has attracted wide attention from scholars at home and abroad.

In recent years, given that labeled training samples and unlabeled testing samples often come from different domains, more and more researchers have turned their attention to transfer learning [4, 5]. The basic idea of transfer learning is that, although the data distributions of the source domain and the target domain differ, associated domains can share common knowledge structures, which can serve as the bridge of knowledge transfer from the source domain to the target domain. Existing methods usually look for these common structures by optimizing predefined objective functions, built around maximizing the empirical likelihood or preserving the geometric structure. From the perspective of empirical likelihood, Dai et al. proposed a classification-driven coclustering model [6]: the source domain data impose constraints on the word clusters so as to supply the classification structure and part of the class information, and coclustering serves as the bridge through which this structure and information are transferred from the source domain to the target domain. The shortcoming of this method is that it considers only identical concepts in the documents, so Zhuang et al. studied the relationship between word clusters and document classes and, through matrix trifactorization, treated certain invariant factors as the bridge of information transfer from the source domain to the target domain [7]; this method, however, considers only similar concepts in the text. With a similar idea, Wang et al. applied the approach to cross-domain network data classification [8]. To remedy the shortcomings of the above methods, a joint model of similar concepts and identical concepts was built [9], learning the marginal and conditional distributions simultaneously. Then, taking the concepts unique to each domain's documents into account, Zhuang et al. used matrix trifactorization to learn the concepts shared by all domains and those unique to each domain simultaneously [10], which is more flexible in fitting the data and therefore attains a better recognition rate. From the geometric point of view, if two samples in a domain are close in the intrinsic geometric structure of the data distribution, their labels should also be close [11]. To retain this intrinsic structure, Ling et al. explored, through spectral learning, the consistency between the supervision of the source domain and the intrinsic structure of the target domain [12]. Pan et al. put forward transfer component analysis [13], which seeks a set of common transfer components for the two domains; projecting the samples into the resulting subspace reduces the discrepancy between the data distributions of the different domains. With the same idea, Wang and Mahadevan projected the different domains into a new latent space, simultaneously matching corresponding samples and preserving the geometric structure of each domain [14]. Accounting for both views, graph coregularized transfer learning (GTL) [15] was put forward, which preserves the geometric structure while maximizing the empirical likelihood.

Building on graph coregularized transfer learning and on the fact that the L1-Graph [16, 17] offers better adaptivity and stability than the nearest-neighbor graph, we put forward a multiview classification algorithm based on L1-Graph domain adaptation learning, which extends the work of Long et al. [15]. The idea is first to construct a framework of nonnegative matrix trifactorization for transfer learning, in which the invariant information is regarded as the bridge of knowledge transfer from the source domain to the target domain; next to construct an L1-Graph on the basis of sparse representation, so that the nearest neighbors of each sample are found adaptively and the geometric structure of the samples and features is preserved; and finally to solve the unified objective function with an iterative algorithm, after which the testing samples are classified.

2. The Multidomain and Multiview Sample Classification Based on L1-Graph Domain Adaptation Learning

2.1. Description of L1-Graph Domain Adaptation Learning

L1-Graph domain adaptation learning can be applied to any number of domains, but, for purposes of explanation, two domains are considered here: the source domain $\mathcal{D}_s$ and the target domain $\mathcal{D}_t$, with the domain index written as $\pi \in \{s, t\}$. Each domain $\mathcal{D}_\pi$, $\pi \in \{s, t\}$, has a feature-sample matrix $X_\pi \in \mathbb{R}^{m \times n_\pi}$. In order to find the common structure, $X_\pi$ is factorized into three nonnegative matrices, that is, $X_\pi \approx U_\pi H_\pi V_\pi^{\top}$, in which $U_\pi \in \mathbb{R}_{+}^{m \times k}$, $V_\pi \in \mathbb{R}_{+}^{n_\pi \times c}$, and $x_i^\pi$ denotes the $i$th column of the matrix $X_\pi$. In domain $\mathcal{D}_\pi$, the samples of the trifactorized feature matrix are classified by maximizing the empirical likelihood; $H_\pi \in \mathbb{R}_{+}^{k \times c}$ represents the relationship between the feature clusters $U_\pi$ and the sample classes $V_\pi$ and, owing to its cross-domain stability, can be regarded as the bridge of knowledge transfer. In addition, sparse representation is used to construct the graphs $G_U^\pi$ and $G_V^\pi$, which encode the geometric information of the feature space and the sample space of domain $\mathcal{D}_\pi$, respectively. The basic idea of the L1-Graph domain adaptation learning algorithm follows [15]: in general, similar features express the same meaning and, likewise, similar samples carry the same label. The two graphs are therefore used as a joint regularization function so that the learned trifactorization model predicts the sample labels while retaining the intrinsic geometric structure; the geometric information is thus effectively integrated into the clustering process, ensuring that the common structural information effectively promotes transfer learning.

2.2. Model of Matrix Trifactorization

Assume that the source domain is $\mathcal{D}_s$ and the target domain is $\mathcal{D}_t$, with the domain index written as $\pi \in \{s, t\}$. $\mathcal{D}_s$ and $\mathcal{D}_t$ share the same feature space and label space, with $m$ features and $c$ categories in each. $X_\pi = [x_1^\pi, \ldots, x_{n_\pi}^\pi] \in \mathbb{R}^{m \times n_\pi}$, $\pi \in \{s, t\}$, denotes the feature-sample matrix of domain $\mathcal{D}_\pi$, in which $x_i^\pi$ is the $i$th column of the matrix. $Y_s \in \{0, 1\}^{n_s \times c}$ denotes the sample labels of the source domain: if $x_i^s$ belongs to category $j$, then $(Y_s)_{ij} = 1$; otherwise, $(Y_s)_{ij} = 0$.

Common structural information exists across the associated domains $\mathcal{D}_\pi$; to extract this shared structure, nonnegative matrix trifactorization is conducted on each feature-sample matrix $X_\pi$, $\pi \in \{s, t\}$, and the optimization is formulated as follows:

$$\min_{U_\pi, H_\pi, V_\pi \geq 0} \left\| X_\pi - U_\pi H_\pi V_\pi^{\top} \right\|_F^2, \tag{1}$$

in which $\| \cdot \|_F$ is the Frobenius norm of a matrix; $U_\pi = [u_1^\pi, \ldots, u_k^\pi] \in \mathbb{R}_{+}^{m \times k}$, and each column $u_j^\pi$ represents a semantic concept, namely, a feature cluster; $V_\pi = [v_1^\pi, \ldots, v_c^\pi] \in \mathbb{R}_{+}^{n_\pi \times c}$, and each column $v_j^\pi$ represents a sample class. $U_\pi$ and $V_\pi$ are thus the clustering results for the features and the samples, respectively. $H_\pi \in \mathbb{R}_{+}^{k \times c}$ represents the relationship between the feature clusters $U_\pi$ and the sample classes $V_\pi$, and it remains more stable across domains than $U_\pi$ and $V_\pi$. Therefore, assuming that a single $H$ fits every domain, the collective matrix trifactorization is formulated as follows:

$$\min_{U_\pi, H, V_\pi \geq 0} \sum_{\pi \in \{s, t\}} \left\| X_\pi - U_\pi H V_\pi^{\top} \right\|_F^2. \tag{2}$$

The common structural information $H$, as the stable bridge of knowledge transfer, can incorporate the supervision available in the source domain by setting $V_s = Y_s$. Through this bridge, the label knowledge of the source domain is transferred to the samples of the target domain, and the process corresponds to the maximization of the multidomain empirical likelihood.
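To make the collective factorization concrete, the following minimal NumPy sketch (an illustration under assumed, arbitrary matrix sizes, not the authors' implementation) evaluates the objective in (2) with randomly initialized nonnegative factors and with $V_s$ fixed to the source labels $Y_s$:

import numpy as np

rng = np.random.default_rng(0)
# Assumed sizes: m features, n_s/n_t samples, k feature clusters, c classes.
m, ns, nt, k, c = 300, 100, 39, 20, 10

Xs, Xt = rng.random((m, ns)), rng.random((m, nt))  # feature-sample matrices X_s, X_t
Ys = np.eye(c)[rng.integers(0, c, ns)]             # one-hot source labels Y_s (n_s x c)

Us, Ut = rng.random((m, k)), rng.random((m, k))    # nonnegative feature factors U_s, U_t
H = rng.random((k, c))                             # shared bridge matrix H
Vs, Vt = Ys.copy(), rng.random((nt, c))            # V_s is clamped to Y_s

def collective_objective(Xs, Xt, Us, Ut, H, Vs, Vt):
    """Sum of squared Frobenius reconstruction errors over both domains, as in (2)."""
    err_s = np.linalg.norm(Xs - Us @ H @ Vs.T, 'fro') ** 2
    err_t = np.linalg.norm(Xt - Ut @ H @ Vt.T, 'fro') ** 2
    return err_s + err_t

print(collective_objective(Xs, Xt, Us, Ut, H, Vs, Vt))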

2.3. Sparse Representation of L1-Graph Structure

From the geometric point of view, the data points can be regarded as sampled from a distribution supported on a low-dimensional manifold embedded in a high-dimensional data space. To keep the intrinsic data distribution unchanged and to preserve the geometric structure during the transfer, the following assumption is made: if two samples $x_i^\pi$ and $x_j^\pi$ in domain $\mathcal{D}_\pi$ are close in the intrinsic geometric structure of the data distribution, their labels (the corresponding rows of $V_\pi$) should also be close, so a model can be built on the geometric structure of the sample space. Traditional graph construction methods depend heavily on the choice of parameters and can hardly reflect the complexity of the data distribution. According to the theory of sparse representation, any sample can be linearly reconstructed from the remaining samples (allowing for some reconstruction error), and the sparse reconstruction coefficients of a sample can be obtained by solving an L1-norm minimization problem. The reconstruction coefficients, used as weights between pairs of samples, adjust the relationships between samples adaptively, so that the resulting sparse graph, which captures the local relationships between samples, contains more useful structural information. By the duality between features and samples, the features are likewise sampled from a distribution supported on a low-dimensional manifold embedded in a high-dimensional space. If two features in domain $\mathcal{D}_\pi$ are close in the geometric structure of the data distribution, their feature clusters (the corresponding rows of $U_\pi$) should also be close. Therefore, a sparse graph based on the principle of sparse representation can be constructed for the feature space to preserve the feature geometric structure in each domain, just as for the sample space. The sparse sample graph is denoted by $G_V^\pi$ and the corresponding sparse feature graph by $G_U^\pi$; the construction of the sample graph proceeds in the following steps (an illustrative code sketch follows the steps below).

(1) Input. Samples $X_\pi = [x_1^\pi, \ldots, x_{n_\pi}^\pi]$, $\pi \in \{s, t\}$; each sample is normalized so that $\| x_i^\pi \|_2 = 1$.

(2) To Solve the Reconstruction Coefficients. The sparse reconstruction coefficients $\alpha_i^\pi$ of each sample $x_i^\pi$ in each domain are obtained by solving the following L1-norm minimization problem:

$$\min_{\alpha_i^\pi} \left\| \alpha_i^\pi \right\|_1 \quad \text{s.t.} \quad x_i^\pi = B_i^\pi \alpha_i^\pi, \tag{3}$$

in which $B_i^\pi = [x_1^\pi, \ldots, x_{i-1}^\pi, x_{i+1}^\pi, \ldots, x_{n_\pi}^\pi]$ is an overcomplete dictionary and $\alpha_i^\pi \in \mathbb{R}^{n_\pi - 1}$ is the column vector of reconstruction coefficients, indicating the relationship between sample $x_i^\pi$ and the other samples.

(3) To Set the Edge Weights of L1-Graph. The L1-Graph is expressed as $G_V^\pi = (N^\pi, W_V^\pi)$, in which $N^\pi$ represents the set of all nodes in domain $\mathcal{D}_\pi$ and $W_V^\pi$ represents the weight matrix of the L1-Graph in the domain, that is, the similarity matrix. When $i = j$, $(W_V^\pi)_{ij} = 0$; when $j < i$, $(W_V^\pi)_{ij} = |\alpha_i^\pi(j)|$; when $j > i$, $(W_V^\pi)_{ij} = |\alpha_i^\pi(j-1)|$. The number of nearest neighbors of each sample is thus determined by solving the L1-norm optimization problem instead of by manually set parameters.

(4) To Symmetrize the Similarity Matrix. Set $W_V^\pi \leftarrow \frac{1}{2} \big( W_V^\pi + (W_V^\pi)^{\top} \big)$.

Similarly, the weight matrix $W_U^\pi$ of the feature space can be obtained by applying steps (1)-(4) to the features, that is, to the rows of $X_\pi$.
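As an illustration of steps (1)-(4), the following sketch builds the sample-space weight matrix $W_V^\pi$ with the Lasso solver from scikit-learn; the equality constraint in (3) is relaxed to the penalized least-squares form commonly used in practice, and the regularization strength alpha is an assumed free parameter, so this is a sketch of the construction rather than the authors' exact solver:

import numpy as np
from sklearn.linear_model import Lasso

def l1_graph_weights(X, alpha=0.01):
    """Symmetric L1-Graph weight matrix for samples given as the columns of X.

    Relaxes (3) to min ||x_i - B_i a||^2 + alpha * ||a||_1 (Lasso), a common
    practical substitute for the equality-constrained L1-norm problem.
    """
    X = X / np.linalg.norm(X, axis=0, keepdims=True)  # step (1): unit-norm samples
    n = X.shape[1]
    W = np.zeros((n, n))
    for i in range(n):
        B = np.delete(X, i, axis=1)                   # step (2): dictionary of the other samples
        coef = Lasso(alpha=alpha, fit_intercept=False,
                     max_iter=5000).fit(B, X[:, i]).coef_
        W[i] = np.abs(np.insert(coef, i, 0.0))        # step (3): weights |alpha_i|, zero diagonal
    return 0.5 * (W + W.T)                            # step (4): symmetrize

The feature-space matrix $W_U^\pi$ is obtained by the same call on the transposed data, for example, l1_graph_weights(X.T).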

To preserve the geometric structure of the samples in domain $\mathcal{D}_\pi$, the L1-Graph regularization function to be minimized is

$$R_V^\pi = \frac{1}{2} \sum_{i,j=1}^{n_\pi} \left\| (V_\pi)_{i\cdot} - (V_\pi)_{j\cdot} \right\|^2 (W_V^\pi)_{ij} = \operatorname{tr}\!\left( V_\pi^{\top} L_V^\pi V_\pi \right), \tag{4}$$

where $(V_\pi)_{i\cdot}$ denotes the $i$th row of $V_\pi$, $L_V^\pi = D_V^\pi - W_V^\pi$ is the graph Laplacian, and $D_V^\pi$ is the diagonal degree matrix with $(D_V^\pi)_{ii} = \sum_j (W_V^\pi)_{ij}$. To further preserve the geometric structure of the features in domain $\mathcal{D}_\pi$, the L1-Graph regularization function to be minimized synchronously is

$$R_U^\pi = \frac{1}{2} \sum_{i,j=1}^{m} \left\| (U_\pi)_{i\cdot} - (U_\pi)_{j\cdot} \right\|^2 (W_U^\pi)_{ij} = \operatorname{tr}\!\left( U_\pi^{\top} L_U^\pi U_\pi \right), \tag{5}$$

where $L_U^\pi = D_U^\pi - W_U^\pi$.
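As a quick check on (4), the following hypothetical helper (continuing the NumPy sketches above) evaluates the regularizer both as the weighted pairwise sum and as the trace form; with a symmetric W, the two agree up to floating-point error:

def graph_regularizer(V, W):
    """Evaluate (4): 0.5 * sum_ij ||v_i - v_j||^2 * W_ij == tr(V^T L V), L = D - W."""
    L = np.diag(W.sum(axis=1)) - W                    # graph Laplacian from the degree matrix
    pairwise = 0.5 * sum(W[i, j] * np.sum((V[i] - V[j]) ** 2)
                         for i in range(len(V)) for j in range(len(V)))
    trace_form = np.trace(V.T @ L @ V)
    assert np.isclose(pairwise, trace_form)
    return trace_form

The same helper applied to $U_\pi$ and $W_U^\pi$ evaluates (5).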

2.4. Joint Optimization

Evidently, (4) and (5) show that the geometric structures of the samples and the features can be retained through the L1-Graph regularization functions. Therefore, the two terms, combined into a joint L1-Graph regularization function, can be integrated into (2), which defines the optimization problem of L1-Graph coregularized collective matrix trifactorization (L1-GCMF) as follows:

$$\min_{U_\pi, H, V_\pi \geq 0} \sum_{\pi \in \{s, t\}} \left( \left\| X_\pi - U_\pi H V_\pi^{\top} \right\|_F^2 + \lambda \operatorname{tr}\!\left( V_\pi^{\top} L_V^\pi V_\pi \right) + \gamma \operatorname{tr}\!\left( U_\pi^{\top} L_U^\pi U_\pi \right) \right) \quad \text{s.t. } V_s = Y_s, \tag{6}$$

in which $\lambda$ and $\gamma$ are regularization parameters; the optimization problem is made better posed by constraining the norm of each column of $U_\pi$ and $V_\pi$. From the optimization results, the label of an arbitrary sample $x_j^t$ in the target domain can easily be inferred by the following formula:

$$y_j = \arg\max_{1 \leq l \leq c} (V_t)_{jl}. \tag{7}$$
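In code, the inference rule (7) is simply a row-wise argmax over the optimized target factor; a one-line sketch under the earlier hypothetical NumPy setup:

predicted = np.argmax(Vt, axis=1)  # (7): class index with the largest entry in each row of V_t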

During the optimization of (6), the common structural information $H$, found by simultaneously maximizing the empirical likelihood and preserving the geometric structure, becomes smoother over the course of the transfer learning. L1-GCMF can be extended to handle multidomain problems and to explore the commonality of the collective structure. The solution of the optimization problem in (6) can be derived from constrained optimization theory. Specifically, the updating rules are deduced by fixing all but one variable and optimizing the remaining one, and the process is repeated until convergence. Taking the nonnegativity constraints and the column-norm constraints into account, a Lagrange function is constructed as follows:

$$\mathcal{L} = \sum_{\pi \in \{s,t\}} \Big( \left\| X_\pi - U_\pi H V_\pi^{\top} \right\|_F^2 + \lambda \operatorname{tr}\!\left( V_\pi^{\top} L_V^\pi V_\pi \right) + \gamma \operatorname{tr}\!\left( U_\pi^{\top} L_U^\pi U_\pi \right) + \operatorname{tr}\!\left( \Theta_\pi U_\pi^{\top} \right) + \operatorname{tr}\!\left( \Phi_\pi V_\pi^{\top} \right) + \psi_\pi^{\top} \left( \operatorname{diag}\!\left( V_\pi^{\top} V_\pi \right) - \mathbf{1} \right) \Big), \tag{8}$$

in which $\Theta_\pi$ and $\Phi_\pi$ are the Lagrange multipliers of the nonnegativity constraints, $\psi_\pi$ is the multiplier of the column-norm constraint, and $\mathbf{1}$ is the all-ones vector. By the complementary slackness conditions of the Karush-Kuhn-Tucker (KKT) theory, $(\Theta_\pi)_{ij} (U_\pi)_{ij} = 0$ and $(\Phi_\pi)_{ij} (V_\pi)_{ij} = 0$, the constraint conditions on $U_\pi$ and $V_\pi$ are deduced as follows:

$$\begin{aligned} \left( -X_\pi V_\pi H^{\top} + U_\pi H V_\pi^{\top} V_\pi H^{\top} + \gamma L_U^\pi U_\pi \right)_{ij} (U_\pi)_{ij} &= 0, \\ \left( -X_\pi^{\top} U_\pi H + V_\pi H^{\top} U_\pi^{\top} U_\pi H + \lambda L_V^\pi V_\pi + V_\pi \Psi_\pi \right)_{ij} (V_\pi)_{ij} &= 0, \end{aligned} \tag{9}$$

where $\Psi_\pi = \operatorname{Diag}(\psi_\pi)$.

According to the KKT conditions, the updating plan is obtained as follows:

$$V_\pi \leftarrow V_\pi \odot \sqrt{ \frac{ X_\pi^{\top} U_\pi H + \lambda W_V^\pi V_\pi + V_\pi \Psi_\pi^{-} }{ V_\pi H^{\top} U_\pi^{\top} U_\pi H + \lambda D_V^\pi V_\pi + V_\pi \Psi_\pi^{+} } }, \tag{10}$$

where $\Psi_\pi = \Psi_\pi^{+} - \Psi_\pi^{-}$ splits the multiplier into its positive and negative parts. The computation of $\Psi_\pi$ can be avoided through an iterative normalization technique: in each iteration, every column of $V_\pi$ is normalized so that $\| (V_\pi)_{\cdot j} \|_2 = 1$. After this normalization, the two terms depending on $\Psi_\pi$, namely, $V_\pi \Psi_\pi^{-}$ and $V_\pi \Psi_\pi^{+}$, become equal and can be omitted from the updating plan without affecting convergence; therefore, the updating rule is presented as follows:

$$V_\pi \leftarrow V_\pi \odot \sqrt{ \frac{ X_\pi^{\top} U_\pi H + \lambda W_V^\pi V_\pi }{ V_\pi H^{\top} U_\pi^{\top} U_\pi H + \lambda D_V^\pi V_\pi } }. \tag{11}$$

Similarly, the updating rules for $U_\pi$ and $H$ can also be obtained as follows:

$$U_\pi \leftarrow U_\pi \odot \sqrt{ \frac{ X_\pi V_\pi H^{\top} + \gamma W_U^\pi U_\pi }{ U_\pi H V_\pi^{\top} V_\pi H^{\top} + \gamma D_U^\pi U_\pi } }, \qquad H \leftarrow H \odot \sqrt{ \frac{ \sum_{\pi \in \{s,t\}} U_\pi^{\top} X_\pi V_\pi }{ \sum_{\pi \in \{s,t\}} U_\pi^{\top} U_\pi H V_\pi^{\top} V_\pi } }, \tag{12}$$

in which $\odot$ represents the element-wise product, the fraction bar the element-wise division, and $\sqrt{\cdot}$ the element-wise square root.
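The following sketch (continuing the hypothetical NumPy setup; the small EPS added to denominators is an implementation safeguard, not part of the derivation) applies the multiplicative updates (11)-(12) in a given domain:

EPS = 1e-12  # guards against division by zero (implementation choice)

def update_V(X, U, H, V, W, D, lam):
    """Multiplicative update (11) for the sample factor V."""
    num = X.T @ U @ H + lam * (W @ V)
    den = V @ (H.T @ (U.T @ U) @ H) + lam * (D @ V) + EPS
    return V * np.sqrt(num / den)

def update_U(X, U, H, V, W, D, gam):
    """Multiplicative update in (12) for the feature factor U."""
    num = X @ V @ H.T + gam * (W @ U)
    den = U @ (H @ (V.T @ V) @ H.T) + gam * (D @ U) + EPS
    return U * np.sqrt(num / den)

def update_H(Xs, Xt, Us, Ut, H, Vs, Vt):
    """Multiplicative update in (12) for the shared bridge H, summed over both domains."""
    num = Us.T @ Xs @ Vs + Ut.T @ Xt @ Vt
    den = (Us.T @ Us) @ H @ (Vs.T @ Vs) + (Ut.T @ Ut) @ H @ (Vt.T @ Vt) + EPS
    return H * np.sqrt(num / den)

In the full algorithm, the columns of $U_\pi$ and $V_\pi$ are renormalized after each round and $V_s$ is reset to $Y_s$, as described next.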

2.5. Description of a Multiview Sample Classification Algorithm Based on L1-Graph Domain Adaptation Learning

See Algorithm 1.

Input: data sets $X_s$, $Y_s$, $X_t$; parameters $\lambda$, $\gamma$, $T$.
Output: the target domain classification results $V_t$
Start:
  to construct the graphs $G_U^\pi$ and $G_V^\pi$, $\pi \in \{s, t\}$, through the principle of sparse representation; to normalize the data sets so that
each column of $X_\pi$, $\pi \in \{s, t\}$, has unit norm.
  to initialize: set $V_s = Y_s$, generate $U_\pi$ randomly, obtain $H$ by solving (2) with $U_\pi$ and $V_\pi$ fixed, obtain $V_t$ through the training of logistic
regression on $(X_s, Y_s)$.
  for $t = 1$ to $T$ do
    foreach $\pi \in \{s, t\}$ do
      to update $U_\pi$, $V_\pi$, and $H$ through (11)-(12)
      to fix $V_s = Y_s$
      to normalize each column of $U_\pi$ and $V_\pi$ to unit norm.
      to calculate the objective function through (6)
End
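Putting the pieces together, a compact end-to-end reading of Algorithm 1 might look as follows (it reuses the hypothetical helpers l1_graph_weights, update_U, update_V, update_H, and EPS from the earlier sketches; the logistic-regression initialization of $V_t$ uses scikit-learn and assumes every class appears in the source labels; this is an illustrative reading of the algorithm, not the authors' code):

from sklearn.linear_model import LogisticRegression

def l1_gcmf(Xs, Ys, Xt, lam=0.1, gam=0.1, k=20, T=100, alpha=0.01, seed=0):
    """Illustrative L1-GCMF driver following Algorithm 1."""
    rng = np.random.default_rng(seed)
    m, c = Xs.shape[0], Ys.shape[1]
    # Sparse-representation graphs: target sample graph and both feature graphs
    # (V_s stays clamped to Y_s, so the source sample graph is not needed here).
    Wt = l1_graph_weights(Xt, alpha)
    WUs, WUt = l1_graph_weights(Xs.T, alpha), l1_graph_weights(Xt.T, alpha)
    Dt = np.diag(Wt.sum(1))
    DUs, DUt = np.diag(WUs.sum(1)), np.diag(WUt.sum(1))
    # Initialization: V_s = Y_s, random U_pi, V_t from a source-trained classifier.
    Us, Ut = rng.random((m, k)), rng.random((m, k))
    H = rng.random((k, c))
    Vs = Ys.astype(float)
    clf = LogisticRegression(max_iter=1000).fit(Xs.T, Ys.argmax(1))
    Vt = clf.predict_proba(Xt.T)
    for _ in range(T):
        Us = update_U(Xs, Us, H, Vs, WUs, DUs, gam)
        Ut = update_U(Xt, Ut, H, Vt, WUt, DUt, gam)
        Vt = update_V(Xt, Ut, H, Vt, Wt, Dt, lam)     # V_s stays fixed at Y_s
        H = update_H(Xs, Xt, Us, Ut, H, Vs, Vt)
        Us /= np.linalg.norm(Us, axis=0, keepdims=True) + EPS
        Ut /= np.linalg.norm(Ut, axis=0, keepdims=True) + EPS
        Vt /= np.linalg.norm(Vt, axis=0, keepdims=True) + EPS
    return np.argmax(Vt, axis=1)                      # (7): target-domain labels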

3. Experiment Results and Analysis

3.1. Comparative Experiment Results Based on USPS-Binary Digital Database

The experimental samples are selected from two digit databases: USPS and Binary. The USPS database contains 10 groups of handwritten digits, 0 through 9, as shown in Figure 1(a), and each group contains 1100 gray-scale samples; some gray-scale samples of one digit are shown in Figure 1(b). The Binary database contains 36 classes in total, covering the digits 0-9 and the letters A-Z, with 39 samples per class; only the 10 groups of handwritten digits 0-9 are used in the experiment. Figure 2(a) presents the 10 digit categories, and Figure 2(b) presents some binary samples of one category.

All images are cropped to the same size, SIFT features are extracted from each image, and every image is then represented as a 300-dimensional feature histogram. The 10 categories shared by the two databases were taken as the known classes; the USPS database was treated as the source domain and the Binary database as the target domain. A given number of samples were randomly selected from each category of the USPS database to constitute the training set, and the 39 samples of each category of the Binary database were treated as the observation samples. In order to obtain the optimal parameter values, we repeated each experiment 10 times and calculated the average recognition rate; both the GCMF and the L1-GCMF methods reach their optimal values with appropriately tuned regularization parameters $\lambda$ and $\gamma$. In all experiments, the number of nearest neighbors is uniformly set to 10 and the number of iterations to 100. For each number of training samples, 10 random selections were drawn from each group of the USPS database to carry out the experiment, and Figure 3 shows the average recognition rate for the different numbers of training samples and the different methods.

The experimental results in Figure 3 show that, for every number of training samples, the recognition rate of the L1-GCMF algorithm is higher than those of the GCMF [15] and ITML [18] algorithms, and, as the number of training samples changes, L1-GCMF remains more stable than the other two algorithms. Owing to the use of the L1-Graph, the method searches for the nearest neighbors automatically, which favors connections between samples of the same class and thereby improves the classification accuracy of multiview samples and strengthens the stability of the algorithm.

3.2. Comparative Experiment Results Based on Three-Domain Object Benchmark Database

The Three-Domain Object Benchmark database [19] contains three different domains, amazon, dslr, and webcam, which together comprise 4652 images from 31 object categories. There are about 90 images per category in the amazon domain, some of which, from one category, are displayed in Figure 4(a); there are 30 images per category in the dslr and webcam domains, some of which are shown in Figures 4(b) and 4(c), respectively.

With amazon as the source domain and webcam as the target domain, 20 categories shared by the two domains were selected for the experiments, and all images of the selected categories were converted into SIFT features. Then 10 samples per category were randomly selected from the source domain as training samples, and 5 samples per category from the target domain as testing samples. Both GCMF and L1-GCMF reach their optimal values with appropriately tuned $\lambda$ and $\gamma$. Ten random selections of training and testing samples were used for the experiments, and the average recognition rate and standard deviation are shown in Table 1.

The experimental results in Table 1 show that the average recognition rate of the L1-GCMF algorithm is higher than those of information-theoretic metric learning (ITML) and GCMF. Unlike ITML and GCMF, whose choice of the number of neighbors depends on manually set parameters, the algorithm presented in this paper achieves a higher recognition rate through the sparse graph construction, which requires no manual parameter setting.

3.3. Comparative Experiment Results Based on ALOI Database

This set of experimental data comes from the ALOI database, which contains 1000 objects, each photographed under different illuminations and from different angles. Images of 50 objects were selected for the experiments, and all selected images were converted into 800-dimensional SIFT features. The images taken from different angles were treated as the source domain, as shown in Figure 5(a), and the images taken under different illuminations as the target domain, as shown in Figure 5(b).

Thirty images were randomly selected from each group of the source domain as training samples, and 10 images were randomly selected from each group of the target domain as observation samples; for each group, 10 random experiments were carried out, so each entry in the results is the mean of 500 random tests over the 50 categories. The parameter values are the same as in the experiment of Section 3.1, and the average recognition rate and standard deviation are shown in Table 2.

The experimental results in Table 2 show that the average recognition rate of the proposed method is higher than those of the ITML and GCMF algorithms, and its smaller standard deviation demonstrates its robustness. Compared with the other algorithms, the proposed algorithm adapts strongly to the data, avoids manual parameter setting, and achieves better multiview sample classification, which confirms the rationality of using sparse graph construction.

4. Conclusion

Addressing labeled and unlabeled data drawn from different distributions, this paper proposed a multiview sample classification algorithm based on L1-Graph domain adaptation learning. The method first builds a nonnegative matrix trifactorization framework in which the common structural information serves as the bridge of knowledge transfer from the source domain to the target domain; it then constructs the L1-Graph using the principle of sparse representation, so that the number of neighbors of each sample is determined by solving the L1-norm optimization problem and the nearest neighbors are found adaptively. Finally, the joint objective function is optimized with an iterative algorithm, after which the labels of the testing samples are estimated. Comparative experimental results on the USPS-Binary digital database, the Three-Domain Object Benchmark database, and the ALOI database confirm the rationality of the method. However, compared with traditional nearest-neighbor graph construction, the construction based on sparse representation has relatively high complexity and computational cost, owing to the need to compute the reconstruction coefficients of all samples.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.