Mathematical Problems in Engineering

Volume 2017 (2017), Article ID 1239164, 8 pages

https://doi.org/10.1155/2017/1239164

## Domain Adaption Based on ELM Autoencoder

School of Computer, Xi’an University of Posts & Telecommunications, Xi’an 710121, China

Correspondence should be addressed to Wan-Yu Deng; dengwanyu@126.com

Received 21 June 2016; Revised 26 November 2016; Accepted 18 May 2017; Published 19 June 2017

Academic Editor: Jason Gu

Copyright © 2017 Wan-Yu Deng et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

#### Abstract

We propose a new ELM Autoencoder (ELM-AE) based domain adaption algorithm, which describes the subspaces of the source and target domains with ELM-AE and then carries out subspace alignment to project the two domains into a common new space. By leveraging the nonlinear approximation ability and efficient one-pass learning of ELM-AE, the proposed domain adaption algorithm can efficiently seek a better cross-domain feature representation than linear feature representation approaches such as PCA, thereby improving domain adaption performance. Extensive experimental results on the Office/Caltech-256 datasets show that the proposed algorithm achieves better classification accuracy than the PCA subspace alignment algorithm and other state-of-the-art domain adaption algorithms in most cases.

#### 1. Introduction

With the rapid development of the Internet and social networks, a huge amount of data (e.g., web data and social data [1]) is being generated at every moment [2, 3]. With this explosive growth, data processing becomes more and more essential, and feature extraction is one of its most important technologies. Feature extraction is used to represent the sample data and extract its most useful characteristics; the performance of a machine learning algorithm depends on whether the extracted features represent the data well. When the data dimension is too large, many problems arise: computational efficiency decreases, and overfitting may occur. Principal Component Analysis (PCA) [4–6] is one of the state-of-the-art feature extraction methods. PCA reduces the dimensionality of the data under such circumstances while trying to retain as much useful information as possible. In particular, it uses an orthogonal transformation to convert a set of observations of possibly correlated variables into a set of values of linearly uncorrelated variables called principal components. This transformation is defined in such a way that the first principal component has the largest possible variance, and each succeeding component in turn has the highest variance possible under the constraint that it is orthogonal to the preceding components. However, PCA bears inherent limitations when employed to represent data: (1) it requires that the data structure be linear, and (2) the features of the PCA decomposition are orthogonal. In real applications, many datasets do not conform to these properties. Motivated by this, we introduce a new method, ELM Autoencoder (ELM-AE), to extract the feature space of the data and learn the projecting function of the subspace. The feature space it extracts is not restricted to an orthogonal transformation, and the data space can be either linear or nonlinear.

Domain adaptation (DA) mainly aims to train a robust classifier that can recognize data drawn from different distributions. It is widely used in computer vision and pattern recognition. DA methods are usually divided into two classes [7]: (1) semisupervised DA algorithms [8], which have a small amount of labeled data from the target domain, and (2) unsupervised DA algorithms, in which there is no labeled data from the target domain. Here we mainly study the unsupervised setting and propose a DA algorithm based on ELM-AE.

Recently, DA has been applied to many fields, including speech and language processing [9–11], computer vision [12–14], statistics, and machine learning [12–14]. A robust classifier can be trained with DA to handle multidomain mixed classification tasks, and DA is particularly suitable for unsupervised data with no class labels. A typical DA method learns a new space in which the differences in feature representation between the source and target domains are minimized.

PCA-based DA methods have been extensively researched [4–6]; through PCA we can find a common subspace in which the diversity between the two differently distributed datasets of the source and target domains is minimized. In [10], Blitzer et al. proposed a method to learn a new feature space through the feature relationships between different domains. According to Jhuo et al. [15], the source data representation can be obtained by a linear transformation of the target data. In [16], Gong et al. proposed a geodesic flow kernel (GFK), which mainly accounts for the changes of both source and target data in geometry and statistics. Fernando et al. proposed a PCA-based DA method in [7]: they obtained the feature spaces of the source and target data by applying PCA to each, then projected the representative features of the source data into the feature space of the target data and vice versa. Fernando et al. also evaluated three baselines in [7]: DA-SA1, where the subspace is built from the source domain using PCA; DA-SA2, where PCA is used to build the subspace of the target domain; and NA, where only the original input space is used without learning a subspace.
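The subspace alignment idea of Fernando et al. [7] can be sketched as follows. This is a minimal illustration, not the paper's exact implementation; the function name and the use of SVD to obtain the PCA bases are our own choices:

```python
import numpy as np

def subspace_alignment(Xs, Xt, d):
    """Sketch of PCA subspace alignment: build d-dimensional PCA
    subspaces for source and target, then align the source basis
    with the target basis before projecting."""
    # Principal directions of each (centered) domain via SVD;
    # rows of VsT/VtT are eigenvectors of the covariance matrices.
    _, _, VsT = np.linalg.svd(Xs - Xs.mean(0), full_matrices=False)
    _, _, VtT = np.linalg.svd(Xt - Xt.mean(0), full_matrices=False)
    Vs, Vt = VsT[:d].T, VtT[:d].T            # bases, shape (p, d)
    M = Vs.T @ Vt                            # alignment matrix
    Xs_aligned = (Xs - Xs.mean(0)) @ Vs @ M  # source in aligned coordinates
    Xt_proj = (Xt - Xt.mean(0)) @ Vt         # target in its own subspace
    return Xs_aligned, Xt_proj

rng = np.random.default_rng(0)
Xs = rng.normal(size=(100, 20))        # toy source domain
Xt = rng.normal(size=(80, 20)) + 1.0   # toy shifted target domain
Sa, Tp = subspace_alignment(Xs, Xt, d=5)
print(Sa.shape, Tp.shape)  # (100, 5) (80, 5)
```

A classifier trained on `Sa` can then be applied to `Tp`, since both now live in target-aligned coordinates.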

The rest of the paper is organized as follows. We present related work in Section 2. Section 3 is devoted to the presentation of the ELM-AE method and the consistency theorem on the similarity measure deduced from the learned projecting function. In Section 4, the subspace alignment algorithm based on ELM-AE is introduced. We carry out experiments on various datasets in Section 5, and in Section 6 we draw conclusions.

#### 2. Related Work

In this section, we review the feature extraction method (PCA) that has been used in subspace alignment domain adaption. In applications involving many related features, a large number of features not only increases the complexity of the problem but also makes it difficult to analyze the problem reasonably. In general, although each feature provides some information, their importance differs. In many cases, there is a certain correlation among the features, so the information they provide overlaps to some extent. It is therefore desirable to represent these features by a small number of new, uncorrelated features that capture the vast majority of the information in the original features, and then solve the problem more effectively in terms of the new features.

PCA was invented in 1901 by Karl Pearson in [17] as an analogue of the principal axis theorem in mechanics. It can be mathematically defined as an orthogonal linear transformation that transforms the data to a new coordinate system such that the greatest variance by some projection of the data comes to lie on the first coordinate, the second greatest variance on the second coordinate, and so on. In particular, it mainly includes the following steps [17].

(1) Normalize the raw data matrix $X = (x_{ij})_{n \times p}$, which is equivalent to translating and rescaling the original features:
$$x_{ij}^{*} = \frac{x_{ij} - \bar{x}_{j}}{s_{j}}, \quad i = 1, \dots, n,\ j = 1, \dots, p,$$
where $\bar{x}_{j} = \frac{1}{n} \sum_{i=1}^{n} x_{ij}$ represents the average of the $j$th column and $s_{j}$ is its standard deviation.

(2) Compute the covariance matrix $R = (r_{ij})_{p \times p}$, where $r_{ij}$ represents the correlation coefficient between the $i$th column and the $j$th column.

On the normalized data $X^{*}$, the covariance matrix can be obtained by the following equation:
$$R = \frac{1}{n-1} X^{*\mathsf{T}} X^{*}.$$

(3) Calculate the eigenvalues and eigenvectors of the covariance matrix and apply its eigendecomposition [18]:
$$R = U \Lambda U^{\mathsf{T}},$$
where $\Lambda$ is a diagonal matrix composed of the eigenvalues of the covariance matrix, and $U$ is the orthogonal matrix whose columns are the corresponding eigenvectors of $R$; these columns form the principal coordinate axes of the new variables. Each eigenvalue represents the variance of the corresponding new variable, and the eigenvalues are arranged in decreasing order.

(4) When the remaining eigenvalues are small enough, we regard their components as contributing little to the source data. Thus, we choose the eigenvectors corresponding to the $k$ largest eigenvalues to constitute the projecting space $U_{k} = [u_{1}, \dots, u_{k}]$.

(5) Obtain the new data representation $Y = X^{*} U_{k}$. Every row of the matrix $Y$ is the projection of the corresponding row of the original matrix onto the principal component axes. These projected vectors can be used to express the source data.
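Steps (1)–(5) above can be sketched in NumPy as follows; the function name `pca_steps` and the toy data are illustrative:

```python
import numpy as np

def pca_steps(X, k):
    """PCA following steps (1)-(5): normalize, covariance matrix,
    eigendecomposition, keep k leading components, project."""
    n = X.shape[0]
    # (1) Normalize each column (zero mean, unit variance).
    Xs = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # (2) Covariance matrix R = Xs^T Xs / (n - 1).
    R = Xs.T @ Xs / (n - 1)
    # (3) Eigendecomposition R = U diag(lam) U^T.
    lam, U = np.linalg.eigh(R)
    order = np.argsort(lam)[::-1]   # sort eigenvalues in decreasing order
    lam, U = lam[order], U[:, order]
    # (4) Keep the eigenvectors of the k largest eigenvalues.
    Uk = U[:, :k]
    # (5) Project: each row of Y is a sample in principal-component axes.
    return Xs @ Uk, lam

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 6))       # toy data: 200 samples, 6 features
Y, lam = pca_steps(X, k=2)
print(Y.shape)  # (200, 2)
```

In practice the eigendecomposition is often replaced by an SVD of `Xs` for numerical stability, but the result is equivalent.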

The PCA method can find the most important variable combinations of the original data. By exposing the directions of greatest variance, it can effectively and intuitively reflect the relationships between samples, and it can approximately express the original data by projection onto the largest principal components. However, PCA has its limitations: (1) it requires that each principal component be a linear combination of the original data, and (2) it requires that the principal components be uncorrelated. As a result, PCA cannot solve some practical problems well.

#### 3. ELM-AE

In this section, we introduce a new feature representation algorithm, ELM-AE, which is based on a very fast and effective neural network called the Extreme Learning Machine (ELM) [19, 20]. Like the traditional ELM [21–23], ELM-AE contains three layers: an input layer, a hidden layer, and an output layer. The difference is that in ELM-AE the target output is the same as the input. Figure 1 shows ELM-AE’s network structure for compressed, sparse, and equal dimension representation.
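The core of ELM-AE can be sketched as follows. This is a minimal illustration under our own assumptions (sigmoid activation, orthogonalized random input weights, ridge-regularized one-pass solution); the function name and parameters are not from the paper:

```python
import numpy as np

def elm_autoencoder(X, n_hidden, reg=1e-3, seed=0):
    """ELM-AE sketch: random input weights and biases map X to
    hidden activations H; the output weights beta are solved in a
    single pass so that H @ beta reconstructs X (target = input)."""
    rng = np.random.default_rng(seed)
    p = X.shape[1]
    # Random input weights, orthogonalized (assumes n_hidden <= p).
    W = rng.normal(size=(p, n_hidden))
    W, _ = np.linalg.qr(W)
    b = rng.normal(size=n_hidden)
    # Hidden layer with sigmoid activation.
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))
    # One-pass regularized least squares:
    # beta = (H^T H + reg I)^{-1} H^T X.
    beta = np.linalg.solve(H.T @ H + reg * np.eye(n_hidden), H.T @ X)
    return beta  # rows of beta span the learned feature subspace

X = np.random.default_rng(2).normal(size=(50, 10))  # toy data
beta = elm_autoencoder(X, n_hidden=4)
print(beta.shape)  # (4, 10)
```

Because only `beta` is learned (by a closed-form solve rather than iterative backpropagation), training is a single pass over the data, which is the efficiency property the paper leverages.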