Abstract

With the exponentially increasing volume of XML data, centralized learning solutions are unable to meet the requirements of mining applications with massive training samples. In this paper, a solution to distributed learning over massive XML documents is proposed, which provides distributed conversion of XML documents into a representation model in parallel based on MapReduce and a distributed learning component based on Extreme Learning Machine (ELM) for classification or clustering tasks. Within this framework, training samples are converted from raw XML datasets with improved efficiency and representation ability and then fed to distributed learning algorithms in ELM feature space. Extensive experiments are conducted on massive XML document datasets to verify the effectiveness and efficiency of the solution for both classification and clustering applications.

1. Introduction

Classification and clustering are two major XML document mining tasks. One of the most important steps in mining XML documents is to convert them into a representation model. Most traditional representation models are designed for plain text mining applications and take no account of the structural information of XML documents. The Vector Space Model (VSM) [1] is one of the most classic and popular representation models for plain text. Previous work proposed several approaches that consider both semantic and structural information for XML document classification, among which the Structured Link Vector Model (SLVM) [2] extends VSM to generate a matrix by recording the attribute values of each element in an XML document. The Reduced Structured Vector Space Model (RS-VSM) proposed in [3] achieves a higher performance due to its feature subset selection method, in which different weights are assigned to different elements according to their priority and representation ability. The Distribution based Structured Vector Model (DSVM) [4] further improves the calculation of the traditional Term Frequency Inverse Document Frequency (TFIDF) values and takes two factors into consideration, namely, Among Classes Discrimination and Within Class Discrimination.

Extreme Learning Machine (ELM) was proposed by Huang et al. in [5, 6] based on generalized single-hidden layer feedforward networks (SLFNs). Together with its variants [7–11], ELM achieves extremely fast learning speed and good generalization capability in many application fields, including text classification [12], multimedia recognition [13–15], bioinformatics [16], and mobile objects [17]. Recently, Huang et al. [18] pointed out that (1) the maximal margin property of the Support Vector Machine (SVM) [19] and the minimal norm of weights theory of ELM are consistent and (2) from the standard optimization point of view, ELM for classification and SVM are equivalent. Furthermore, it is proved in [20] that (1) ELM provides a unified learning platform with a widespread type of feature mappings and (2) ELM can be applied to regression and multiclass classification in one formula directly. ELM can be linearly extended to SVMs [18], and SVMs can apply the ELM kernel to obtain better performance due to its universal approximation capability [10, 11, 21] and classification capability [20].

It is generally believed that all ELM based algorithms consist of two major stages [22]: (1) random feature mapping and (2) output weight calculation. The first stage, which generates the feature mapping randomly, is the key concept of ELM theory and distinguishes ELM from other feature learning algorithms. In view of the good properties of the ELM feature mapping, most existing ELM based classification algorithms can be viewed as supervised learning in ELM feature space. In [23], unsupervised learning in ELM feature space is studied, drawing the conclusion that the proposed ELM k-Means algorithm and ELM NMF (nonnegative matrix factorization) clustering obtain better clustering results than the traditional algorithms in the original feature space.

Recently, the volume of XML documents has kept increasing explosively in various kinds of web applications. Since a larger training set generally yields a better trained learning model [24], it is a great challenge to implement distributed learning solutions that process massive XML datasets in parallel. MapReduce [25], introduced by Google to process parallelizable problems across huge datasets on clusters of computers, provides tremendous parallel computing power without concerns about the underlying implementation and technology. However, the MapReduce framework requires distributed storage of the datasets and allows no communication among mappers or reducers, which brings challenges to (1) converting XML datasets into a global representation model and (2) implementing learning algorithms in ELM feature space.

To the best of our knowledge, this paper is the first to discuss massive XML document mining problems. We present a distributed solution to XML representation and learning in ELM feature space. Since the raw XML datasets are stored on a distributed file system, we propose the algorithm DXRC to convert the XML documents into training samples in the form of the XML representation model using a MapReduce job. With the converted training samples, we apply PELM [26] and POS-ELM [27] to realize supervised learning and propose a distributed k-Means in ELM feature space based on the ELM k-Means proposed in [23]. The contributions can be summarized as follows.
(1) A distributed representing algorithm is proposed to convert massive XML documents into the XML representation model in parallel.
(2) Existing distributed supervised learning algorithms in ELM feature space, including PELM and POS-ELM, are implemented to compare massive XML document classification performance.
(3) A distributed unsupervised learning algorithm is proposed based on ELM k-Means [23] to realize distributed clustering over massive XML documents in ELM feature space.
(4) Empirical and extensive comparison experiments are conducted on clusters to verify the performance of our solution.

The remainder of this paper is structured as follows. Section 2 introduces XML document representation models and proposes a distributed converting algorithm to represent XML documents stored on a distributed file system. The Extreme Learning Machine feature mapping is presented in Section 3. Section 4 presents classification algorithms based on distributed ELM implementations, and Section 5 proposes a distributed clustering algorithm in ELM feature space based on MapReduce. Section 6 makes a performance comparison among the distributed classification algorithms and evaluates the proposed distributed clustering algorithm. Section 7 concludes this paper.

2. Distributed XML Representation

In this section, we first introduce representation model of XML documents and then propose a distributed converting algorithm, which is able to generate global feature vectors for all the XML documents stored on distributed file system.

2.1. XML Representation Model

For learning problems over texts, such as XML and plain documents, the first important task is to convert the original documents into a representation model. The Vector Space Model (VSM) [1] is often used to represent plain text documents, taking term occurrence statistics as feature vectors. However, representing an XML document in VSM directly loses the structural information. The Structured Link Vector Model (SLVM) is proposed in [2] based on VSM to represent semistructured documents, capturing both semantic and structural information. SLVM represents an XML document $doc_x$ as

$$\Delta_{doc_x} = \left[\Delta_{doc_x(1)}, \Delta_{doc_x(2)}, \ldots, \Delta_{doc_x(m)}\right], \quad (1)$$

where $\Delta_{doc_x(i)}$ is a feature vector of the $i$th XML element calculated as

$$\Delta_{doc_x(i)} = \sum_{j=1}^{n} TF\left(w_j, doc_x.e_i\right) \cdot IDF\left(w_j\right) \cdot \varepsilon_i, \quad (2)$$

where $w_j$ is the $j$th term, $TF(w_j, doc_x.e_i)$ is the frequency of $w_j$ in element $e_i$ of $doc_x$, $IDF(w_j)$ is its inverse document frequency, and $\varepsilon_i$ is a unit vector corresponding to the element $e_i$.

In SLVM, each $\Delta_{doc_x}$ is a feature matrix in $R^{n \times m}$, which can be viewed as an array of VSMs. Each $\Delta_{doc_x(i)}$ consists of the feature terms corresponding to the same XML element, that is, an $n$-dimensional feature vector for each element unit.

Based on SLVM, in [3], we proposed the Reduced Structured Vector Space Model (RS-VSM), which not only inherits the ability of SLVM to represent structural information, but also achieves a better performance due to its feature subset selection based on information gain. We also proposed the Distribution Based Structured Vector Model (DSVM) in [4] to further strengthen the representation ability. Two improved interacting factors were designed, namely, Among Classes Discrimination (ACD) and Within Class Discrimination (WCD). A revised IDF was also introduced to indicate the importance of a feature term in other classes more precisely.

In DSVM, $\Delta_{doc_x(i)}$ is the $i$th term feature described as

$$\Delta_{doc_x(i)} = \sum_{j=1}^{m} TF\left(w_i, doc_x.e_j\right) \cdot IDF'\left(w_i\right) \cdot \varepsilon_j \cdot D\left(w_i\right), \quad (3)$$

where $m$ is the number of elements in document $doc_x$, $e_j$ is the $j$th element of $doc_x$, and $\varepsilon_j$, which is the unit vector of $e_j$ in SLVM, is now the dot product of the $m$-dimensional unit vector and the $m$-dimensional element weight vector. $IDF'(w_i)$ is the revised IDF. The factor $D(w_i)$ is the distribution modifying factor, which equals the reciprocal of the arithmetic product of WCD and ACD. The detailed calculation of $D(w_i)$ can be found in [4].

2.2. Distributed Converting Algorithm

In this section, we propose a distributed converting algorithm, named Distributed XML Representation Converting (DXRC), to calculate TFIDF [28] values of DSVM based on MapReduce. Since the volume of XML documents is so large that the representation model cannot be generated on a single machine, DXRC realizes the representation of XML documents in the form of DSVM in parallel. The map function and reduce function of DXRC are presented as Algorithms 1 and 2, respectively.

  Input: docID, content
  Output: term, docID, element, times, sum
(1) Initiate HashMap mapEle and HashMap mapSum;
(2) foreach element in content do
(3)   Initiate sum = 0;
(4)   Initiate HashMap mapEleTF;
(5)   foreach term in element do
(6)     sum++;
(7)     if mapEleTF.containsKey(term) then
(8)       mapEleTF.put(term, mapEleTF.get(term) + 1);
(9)     else
(10)      mapEleTF.put(term, 1);
(11)  mapEle.put(element, mapEleTF); mapSum.put(element, sum);
(12) foreach itrEle in mapEle do
(13)   element = itrEle.getKey(); sum = mapSum.get(element);
(14)   foreach itrEleTF in itrEle.getValue() do
(15)     term = itrEleTF.getKey();
(16)     times = itrEleTF.getValue();
(17)     emit(term, (docID, element, times, sum));

  Input: term, list((docID, element, times, sum))
  Output: training samples matrix in the form of (position, tfidf)
(1) Initiate HashMap mapDocEleTF;
(2) Initiate HashMap mapTDocs;
(3) totalDocsNum = DistributedCache.get("totalDocsNum");
(4) weights = DistributedCache.get("elementWeightsVector");
(5) foreach itr in list do
(6)   weightedDocEleTF = weights[itr.element] * itr.times / itr.sum;
(7)   mapDocEleTF.put((itr.docID, itr.element), weightedDocEleTF);
(8)   if mapTDocs.containsKey(itr.docID) then
(9)     newTimes = mapTDocs.get(itr.docID) + itr.times;
(10)    mapTDocs.put(itr.docID, newTimes);
(11)  else
(12)    mapTDocs.put(itr.docID, itr.times);
(13) docsNumber = mapTDocs.size();
(14) idf = log(totalDocsNum / docsNumber);
(15) foreach itrDocEleTF in mapDocEleTF do
(16)   position = itrDocEleTF.getKey();
(17)   tfidf = itrDocEleTF.getValue() * idf;
(18)   emit((position, tfidf));

The map function in Algorithm 1 accepts key-value pairs and the MapReduce job context as input. The key of each key-value pair is the XML document ID and the value is the corresponding XML document content. A HashMap mapEle (Line 1) is used to cache all the elements of one XML document (Lines 2–11), using the element name as key and another HashMap mapEleTF (Line 4) as value, while mapSum records the total number of terms in each element. The mapEleTF caches the TF values of all the words in one element (Lines 5–10). That is, for each XML document, the numbers of items in mapEle and of XML elements are the same; for each element, there are as many items in mapEleTF as there are distinct words in this element. Each item in mapEle and mapEleTF is then emitted as output in the form of (term, (docID, element, times, sum)) (Lines 12–17).
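For illustration, the following Java sketch shows how the map function of Algorithm 1 could be written against the Hadoop Mapper API. It is a minimal sketch, not the authors' implementation: it assumes an input format that delivers the document ID as the key and the document content as the value, and a hypothetical parseElements() helper that extracts the text of each XML element.

import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class DXRCMapper extends Mapper<Text, Text, Text, Text> {

    @Override
    protected void map(Text docId, Text content, Context context)
            throws IOException, InterruptedException {
        // element name -> (term -> frequency within that element)
        Map<String, Map<String, Integer>> mapEle = new HashMap<String, Map<String, Integer>>();
        // element name -> total number of terms in that element
        Map<String, Integer> mapSum = new HashMap<String, Integer>();

        for (Map.Entry<String, String> ele : parseElements(content.toString()).entrySet()) {
            Map<String, Integer> mapEleTF = new HashMap<String, Integer>();
            int sum = 0;
            for (String term : ele.getValue().split("\\s+")) {
                sum++;
                Integer old = mapEleTF.get(term);
                mapEleTF.put(term, old == null ? 1 : old + 1);
            }
            mapEle.put(ele.getKey(), mapEleTF);
            mapSum.put(ele.getKey(), sum);
        }

        for (Map.Entry<String, Map<String, Integer>> ele : mapEle.entrySet()) {
            for (Map.Entry<String, Integer> tf : ele.getValue().entrySet()) {
                // key: term; value: docID, element, times, sum
                context.write(new Text(tf.getKey()),
                        new Text(docId + "," + ele.getKey() + "," + tf.getValue()
                                + "," + mapSum.get(ele.getKey())));
            }
        }
    }

    // Assumed helper: element name -> concatenated text content of that element.
    private Map<String, String> parseElements(String xml) {
        return new HashMap<String, String>(); // a real implementation would use a SAX/DOM parser
    }
}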

After the (term, (docID, element, times, sum)) pairs are emitted by the map function, all the key-value pairs with the same key, that is, the pairs of the same word across the XML documents, are combined and passed to the same reduce function in Algorithm 2 as input. For each key-value pair processed by the reduce function, two HashMaps mapDocEleTF (Line 1) and mapTDocs (Line 2) are initiated. The HashMap mapDocEleTF caches the weighted TF values of a word in each element of the corresponding XML document and mapTDocs caches the number of documents containing this word. The total number of documents totalDocsNum (Line 3) and the vector weights (Line 4), which indicates the weights of all the elements in each XML document, are obtained through the distributed cache defined in the MapReduce job configuration. Since reduce now has all the TF values grouped by XML elements along with their weights, the weighted TF values (Line 6) and the number of documents containing each word are calculated and cached in mapDocEleTF and mapTDocs, respectively (Lines 5–12). Then the IDF value can be calculated (Line 14) and multiplied by each item in mapDocEleTF. The output of reduce is the (position, tfidf) pairs, of which position indicates the index in the DSVM matrix and tfidf is the value of the matrix at that index. Finally, the XML representation DSVM can be built from this matrix and the distribution modifying factor, uploaded onto the distributed file system, and used as the input of the training model.
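Correspondingly, a minimal Java sketch of the reduce function of Algorithm 2 could look as follows; the class name, the configuration keys, and the way the element weight vector is loaded are assumptions, and the output key simply concatenates docID, element, and term to encode the DSVM matrix position.

import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class DXRCReducer extends Reducer<Text, Text, Text, Text> {

    private long totalDocsNum;
    private Map<String, Double> weights; // element name -> element weight

    @Override
    protected void setup(Context context) {
        // Assumed: the driver stores both values in the job configuration
        // (or the DistributedCache, as in Algorithm 2).
        totalDocsNum = context.getConfiguration().getLong("totalDocsNum", 1L);
        weights = new HashMap<String, Double>(); // loading of the cached weight vector omitted
    }

    @Override
    protected void reduce(Text term, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        Map<String, Double> mapDocEleTF = new HashMap<String, Double>(); // "docID,element" -> weighted TF
        Map<String, Integer> mapTDocs = new HashMap<String, Integer>();  // docID -> occurrences of the term

        for (Text v : values) {
            String[] f = v.toString().split(",");          // docID, element, times, sum
            String docId = f[0], element = f[1];
            int times = Integer.parseInt(f[2]);
            int sum = Integer.parseInt(f[3]);
            Double w = weights.get(element);
            mapDocEleTF.put(docId + "," + element, (w == null ? 1.0 : w) * times / sum);
            Integer old = mapTDocs.get(docId);
            mapTDocs.put(docId, old == null ? times : old + times);
        }

        double idf = Math.log((double) totalDocsNum / mapTDocs.size());
        for (Map.Entry<String, Double> e : mapDocEleTF.entrySet()) {
            // key: position "docID,element,term" of the DSVM matrix; value: TFIDF
            context.write(new Text(e.getKey() + "," + term),
                    new Text(String.valueOf(e.getValue() * idf)));
        }
    }
}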

3. ELM Feature Mapping

Extreme Learning Machine (ELM) randomly generates the parameters of the single-hidden layer feedforward network without iterative tuning to gain extremely fast learning speed. The output weights can be calculated by matrix multiplication after the training samples are mapped into the ELM feature space.

Given $N$ arbitrary samples $(x_j, t_j)$, ELM is modeled as

$$f_L\left(x_j\right) = \sum_{i=1}^{L} \beta_i\, g\left(a_i, b_i, x_j\right), \quad j = 1, \ldots, N, \quad (4)$$

where $L$ is the number of hidden layer nodes, $\beta_i$ is the output weight from the $i$th hidden node to the output node, $a_i$ is the input weight vector, and $b_i$ is the bias of the $i$th hidden node. $g$ is the activation function used to generate the mapping neurons, which can be any nonlinear piecewise continuous function [22], including the Sigmoid function (5) and the Gaussian function (6), as follows:

$$g\left(a_i, b_i, x\right) = \frac{1}{1 + \exp\left(-\left(a_i \cdot x + b_i\right)\right)}, \quad (5)$$

$$g\left(a_i, b_i, x\right) = \exp\left(-b_i\left\|x - a_i\right\|^2\right). \quad (6)$$

Figure 1 shows the structure of ELM with multiple output nodes and the feature mapping process. The three layers of the ELM network are the input layer, the hidden layer, and the output layer. The input nodes correspond to the $d$-dimensional data space of the original samples, while the $L$ hidden nodes correspond to the $L$-dimensional ELM feature space. With the $m$-dimensional output space, the decision function outputs the class label of the samples.

The ELM feature mapping denoted by $h(x)$ is calculated as

$$h\left(x\right) = \left[g\left(a_1, b_1, x\right), g\left(a_2, b_2, x\right), \ldots, g\left(a_L, b_L, x\right)\right]. \quad (7)$$
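As a small illustration (our sketch, assuming a Sigmoid activation and uniformly random weights in [-1, 1]; the class name is hypothetical), the random ELM feature mapping of (7) can be written in Java as follows.

import java.util.Random;

public class ElmFeatureMapping {
    private final double[][] a;   // L x d random input weights
    private final double[] b;     // L random biases

    public ElmFeatureMapping(int L, int d, long seed) {
        Random rnd = new Random(seed);
        a = new double[L][d];
        b = new double[L];
        for (int i = 0; i < L; i++) {
            b[i] = rnd.nextDouble() * 2 - 1;
            for (int j = 0; j < d; j++) {
                a[i][j] = rnd.nextDouble() * 2 - 1;
            }
        }
    }

    // Maps one d-dimensional sample into the L-dimensional ELM feature space.
    public double[] map(double[] x) {
        double[] h = new double[a.length];
        for (int i = 0; i < a.length; i++) {
            double z = b[i];
            for (int j = 0; j < x.length; j++) {
                z += a[i][j] * x[j];
            }
            h[i] = 1.0 / (1.0 + Math.exp(-z));   // Sigmoid activation, Eq. (5)
        }
        return h;
    }
}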

4. Distributed Classification in ELM Feature Space

In this section, we introduce the learning procedure of classification problems in ELM feature space and distributed implementations based on two existing representative distributed ELM algorithms, which are PELM [26] and POS-ELM [27].

4.1. Supervised Learning in ELM Feature Space

Most of the existing ELM algorithms aim at supervised learning, that is, classification and regression. In supervised learning applications, ELM minimizes both the training error and the norm of the output weights [5, 6]; that is,

$$\text{Minimize: } \left\|H\beta - T\right\| \text{ and } \left\|\beta\right\|, \quad (8)$$

where $H$ is the hidden layer output matrix of the training samples and $T$ is the vector of class labels.

The matrix $\beta$ is the output weight, which is calculated as

$$\beta = H^{\dagger} T, \quad (9)$$

where $H^{\dagger}$ is the Moore-Penrose generalized inverse of $H$.

ELM for classification is presented as Algorithm 3.

(1) for $i = 1$ to $L$ do
(2)   Randomly assign input weight $a_i$ and bias $b_i$;
(3) Calculate the ELM feature space matrix $H$;
(4) Calculate the output weight $\beta = H^{\dagger} T$;

The output weight of ELM can also be calculated as

$$\beta = H^{T}\left(\frac{I}{C} + H H^{T}\right)^{-1} T, \quad (10)$$

where, according to the ridge regression theory [29], the diagonal of the symmetric matrix $H H^{T}$ is incremented by a biasing constant $1/C$ to gain better stability and generalization performance [18].

For the case in which the number of training samples is much larger than the dimensionality of the feature space, considering the computation cost, the output weight calculation can be rewritten as

$$\beta = \left(\frac{I}{C} + H^{T} H\right)^{-1} H^{T} T. \quad (11)$$
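As an illustrative sketch of (11) only, and assuming Apache Commons Math as the linear algebra library (the paper does not prescribe one), the output weight for the many-samples case could be computed as follows; the linear system is solved directly instead of forming an explicit inverse.

import org.apache.commons.math3.linear.LUDecomposition;
import org.apache.commons.math3.linear.MatrixUtils;
import org.apache.commons.math3.linear.RealMatrix;

public final class OutputWeight {
    // h: N x L hidden layer output matrix; t: N x m target matrix; c: ridge constant C.
    public static RealMatrix solve(double[][] h, double[][] t, double c) {
        RealMatrix H = MatrixUtils.createRealMatrix(h);
        RealMatrix T = MatrixUtils.createRealMatrix(t);
        RealMatrix HtH = H.transpose().multiply(H);
        RealMatrix regularized = MatrixUtils
                .createRealIdentityMatrix(HtH.getRowDimension())
                .scalarMultiply(1.0 / c)
                .add(HtH);
        // Solve (I/C + H^T H) beta = H^T T instead of inverting the matrix explicitly.
        return new LUDecomposition(regularized).getSolver()
                .solve(H.transpose().multiply(T));
    }
}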

4.2. Distributed Implementations

Some existing works have introduced distributed implementations of various ELM algorithms. The original ELM was parallelized by PELM in [26]; Online Sequential ELM (OS-ELM) was implemented on MapReduce as POS-ELM in [27].

4.2.1. Parallel ELM

In the original ELM algorithm, in the case that the number of training samples is much larger than the dimensionality of the ELM feature space and $H^{T}H$ is nonsingular, the major cost is the calculation of the Moore-Penrose generalized inverse of the matrix $H$, where the orthogonal projection method is used as in (11). Thus the matrix multiplications $H^{T}H$ and $H^{T}T$ can be calculated by a MapReduce job. In the map function, each term of $H^{T}H$ and $H^{T}T$ can be expressed as follows [26]:

$$\left(H^{T}H\right)_{ij} = \sum_{k=1}^{N} h_{ki}\, h_{kj}, \qquad \left(H^{T}T\right)_{ij} = \sum_{k=1}^{N} h_{ki}\, t_{kj}, \quad (12)$$

where $h_{ki}$ and $t_{kj}$ are the entries of $H$ and $T$ contributed by the $k$th training sample.

In the reduce function, all the intermediate results are merged and added up according to the corresponding elements of the result matrices. Since the training input matrix is stored sample by sample on different machines, the calculation can be parallelized and executed by the MapReduce job. The calculation procedure is demonstrated in Figure 2.
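The following minimal sketch (ours, not the PELM code from [26]) shows the per-sample products of (12) that each mapper contributes; in the actual MapReduce job every product would be emitted with its matrix index (i, j) as the key and summed in the reducer, whereas here they are simply accumulated into local partial sums.

public final class PelmPartialSums {
    // h: ELM feature vector of one sample (length L); t: its target row (length m).
    // hth and htt are the local partial sums of H^T H (L x L) and H^T T (L x m).
    public static void accumulate(double[] h, double[] t, double[][] hth, double[][] htt) {
        for (int i = 0; i < h.length; i++) {
            for (int j = 0; j < h.length; j++) {
                hth[i][j] += h[i] * h[j];   // partial (H^T H)_{ij}
            }
            for (int j = 0; j < t.length; j++) {
                htt[i][j] += h[i] * t[j];   // partial (H^T T)_{ij}
            }
        }
    }
}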

4.2.2. Parallel Online Sequential ELM

The basic idea of Parallel Online Sequential ELM (POS-ELM) is to calculate the ELM feature matrix $H$ in parallel. Taking advantage of the chunk-by-chunk calculation of the partial ELM feature matrix in OS-ELM, POS-ELM calculates its partial matrix $H_{k+1}$ with its own data chunk in the map phase on each machine. The reduce function collects all the $H_{k+1}$ and calculates $\beta$ iteratively as

$$\beta^{(k+1)} = \beta^{(k)} + P_{k+1} H_{k+1}^{T}\left(T_{k+1} - H_{k+1}\beta^{(k)}\right), \quad (13)$$

where

$$P_{k+1} = P_{k} - P_{k} H_{k+1}^{T}\left(I + H_{k+1} P_{k} H_{k+1}^{T}\right)^{-1} H_{k+1} P_{k}. \quad (14)$$

The calculation procedure of POS-ELM is shown in Figure 3.
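A sketch of one sequential update step of (13) and (14), again assuming Apache Commons Math and not taken from [27], is given below; P and beta are the auxiliary matrix and the current output weight, and Hk1, Tk1 are the feature matrix and targets of the new data chunk.

import org.apache.commons.math3.linear.LUDecomposition;
import org.apache.commons.math3.linear.MatrixUtils;
import org.apache.commons.math3.linear.RealMatrix;

public final class OsElmUpdate {
    // Returns { P_{k+1}, beta^{(k+1)} } computed from the previous state and one chunk.
    public static RealMatrix[] update(RealMatrix P, RealMatrix beta,
                                      RealMatrix Hk1, RealMatrix Tk1) {
        RealMatrix I = MatrixUtils.createRealIdentityMatrix(Hk1.getRowDimension());
        // (I + H_{k+1} P_k H_{k+1}^T)^{-1}
        RealMatrix inner = new LUDecomposition(
                I.add(Hk1.multiply(P).multiply(Hk1.transpose()))).getSolver().getInverse();
        RealMatrix Pnew = P.subtract(
                P.multiply(Hk1.transpose()).multiply(inner).multiply(Hk1).multiply(P));
        RealMatrix betaNew = beta.add(
                Pnew.multiply(Hk1.transpose()).multiply(Tk1.subtract(Hk1.multiply(beta))));
        return new RealMatrix[] { Pnew, betaNew };
    }
}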

5. Distributed Clustering in ELM Feature Space

In order to improve the efficiency of clustering massive XML datasets, we also propose a parallel implementation of the ELM k-Means algorithm in this section, named Distributed ELM k-Means (DEK).

5.1. Unsupervised Learning in ELM Feature Space

It is believed that transforming nonlinear data into some high dimensional feature space increases the probability of linear separability. However, many Mercer kernel based clustering algorithms are not efficient to compute, since the feature mapping is implicit and cannot be guaranteed to satisfy the universal approximation condition. Thus, [23] holds that an explicit feature mapping such as the ELM feature mapping is more appropriate.

Generally, the k-Means algorithm in ELM feature space, ELM k-Means for short, has two major steps: (1) transform the original data into the ELM feature space and (2) run the traditional clustering algorithm directly in that space. Clustering in the ELM feature space is much more convenient than kernel based algorithms.

5.2. Distributed ELM k-Means

For massive XML document clustering applications, implementing unsupervised learning methods on MapReduce is a key part of the problem. Since the ELM feature mapping is extremely fast with good generalization performance and universal approximation ability, in this section, we propose Distributed ELM k-Means (DEK) based on ELM k-Means [23].

In the DEK algorithm, the training samples of XML documents are stored on a distributed file system. Each $h(x_i)$ represents a training sample in the ELM feature space with its corresponding class label $t_i$. With a set of initial cluster centroids, in the map phase, the distances between each centroid and each training sample stored on its own site are calculated. Then each sample is assigned to the centroid with the shortest distance. In the reduce phase, all the samples assigned to the same centroid are collected in the same reducer. Then a new centroid of each cluster is calculated, with which the set of cluster centroids is updated. That is, one round of the MapReduce job updates the set of cluster centroids once. In the next round of the MapReduce job, the updated set of centroids is refined again, and this procedure is repeated until convergence or up to a maximum number of iterations (Figure 4).

Algorithm 4 presents the map function of DEK; a minimal code sketch of the assignment step follows the listing. For each sample $h(x_i)$ stored on this mapper (Line 1), the distance between the sample and each cluster centroid $c_j$ is calculated (Lines 2, 3). Then each sample is assigned to the cluster whose centroid is nearest to this sample (Line 4). The intermediate key-value pair is emitted in the form of $(c_j, h(x_i))$ (Line 5), in which $h(x_i)$ is the specific sample and $c_j$ is the centroid of the cluster assigned to $h(x_i)$.

  Input: Training samples $h(x_i)$, centroids $c_1, \ldots, c_k$
  Output: (centroid $c_j$, sample $h(x_i)$)
(1) foreach sample $h(x_i)$ do
(2)   foreach centroid $c_j$ do
(3)     Calculate the distance $d_{ij}$ between $h(x_i)$ and $c_j$;
(4)   Assign $h(x_i)$ to the cluster $c_j$ with minimum $d_{ij}$;
(5)   Emit $(c_j, h(x_i))$;
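A minimal Java sketch of the assignment step of Algorithm 4 (our illustration, using squared Euclidean distance) is given below; it returns the index of the nearest centroid, which the mapper would emit as the key.

public final class NearestCentroid {
    // hx: one sample in ELM feature space; centroids: current k centroids (same dimension).
    public static int assign(double[] hx, double[][] centroids) {
        int best = 0;
        double bestDist = Double.MAX_VALUE;
        for (int j = 0; j < centroids.length; j++) {
            double d = 0.0;
            for (int i = 0; i < hx.length; i++) {
                double diff = hx[i] - centroids[j][i];
                d += diff * diff;           // squared Euclidean distance
            }
            if (d < bestDist) {
                bestDist = d;
                best = j;
            }
        }
        return best;
    }
}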

Algorithm 5 presents the reduce function of DEK; a sketch of the centroid update follows the listing. We add up all the sample vectors assigned to centroid $c_j$ (Lines 1, 2) and then calculate their mean value to represent the new version of the centroid of this cluster (Line 3). When all the cluster centroids have been updated in this MapReduce job, if this version of the centroids is the same as the previous one, or if the maximum number of iterations is reached, DEK holds that the clustering job is done; otherwise, DEK continues to the next iteration of the MapReduce job until convergence.

  Input: centroid $c_j$, samples list($h(x_i)$)
  Output: Updated set of centroids
(1) foreach $h(x_i)$ in list($h(x_i)$) do
(2)   Add $h(x_i)$ to the sum $s_j$;
(3) Calculate the new centroid $c_j$ of cluster $j$ as $c_j = s_j / |\text{list}(h(x_i))|$;
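The centroid update of Algorithm 5 can be sketched as follows (our illustration, assuming a nonempty sample list); the new centroid is the component-wise mean of all ELM-mapped samples collected for that cluster.

import java.util.List;

public final class CentroidUpdate {
    public static double[] mean(List<double[]> samples) {
        double[] sum = new double[samples.get(0).length];
        for (double[] hx : samples) {
            for (int i = 0; i < sum.length; i++) {
                sum[i] += hx[i];
            }
        }
        for (int i = 0; i < sum.length; i++) {
            sum[i] /= samples.size();       // component-wise mean of the cluster
        }
        return sum;
    }
}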

6. Performance Evaluation

All the experiments are conducted to compare the performance in the following aspects:
(i) scalability evaluation of the proposed Distributed XML Representation Converting (DXRC),
(ii) scalability comparison between PELM and POS-ELM in the massive XML document classification problem,
(iii) supervised learning performance of PELM and POS-ELM in the massive XML document classification problem,
(iv) scalability evaluation of the proposed Distributed ELM k-Means (DEK),
(v) unsupervised learning performance of DEK in the massive XML document clustering problem.

6.1. Experiments Setup
6.1.1. Environment

All the experiments on distributed XML representation converting and distributed learning in ELM feature space are conducted on a Hadoop (Apache Hadoop, http://hadoop.apache.org/) cluster of nine machines, which consists of one master node and eight slave nodes. Each machine is equipped with an Intel Quad Core 2.66 GHz CPU, 4 GB of memory, and CentOS 5.6 as the operating system. All the computers are connected via a high speed Gigabit network. The MapReduce framework is configured with Hadoop version 0.20.2 and Java version 1.6.0_24.

6.1.2. Datasets

Three datasets of XML documents are used as original datasets: the Wikipedia XML Corpus provided by INEX, IBM DeveloperWorks (http://www.ibm.com/developerworks/) articles, and ABC News (http://abcnews.go.com/). The Wikipedia XML Corpus is composed of around 96,000 XML documents classified into 21 classes. We also fetched RSS feeds of news and articles in the format of XML from the IBM DeveloperWorks and ABC News official web sites. Each XML document of the RSS feeds is composed of elements such as title, author, summary, and publish information. In order to compare the performance of the algorithms over different datasets, we choose the same numbers of XML documents from all three datasets, namely, 6 classes with 500 documents in each class.

6.1.3. Parameters

According to the universal approximation conditions and classification capability of ELM, a large number of hidden nodes guarantees that the data can be linearly separated [23], especially for learning problems on high-dimensional training samples like XML documents. Thus, after a set of experiments for parameter setting, the only parameter of the learning algorithms in ELM feature space, that is, the number of hidden nodes $L$, is set to 800.

6.1.4. Evaluation Criteria

To clearly evaluate the performance, three sets of evaluation criteria are utilized.
(1) For scalability evaluation, we compare the criteria of speedup, sizeup, and scaleup. Speedup indicates the scalability when increasing the number of running machines, which is measured as

$$\text{Speedup}\left(m\right) = \frac{\text{execution time on one node}}{\text{execution time on } m \text{ nodes}}. \quad (15)$$

Sizeup indicates the scalability when increasing the data size, which is measured as

$$\text{Sizeup}\left(D, m\right) = \frac{\text{execution time for processing } m \cdot D}{\text{execution time for processing } D}. \quad (16)$$

Scaleup measures the scalability of processing $m$-times larger data on an $m$-times larger cluster, which is calculated as

$$\text{Scaleup}\left(D, m\right) = \frac{\text{execution time for processing } D \text{ on one node}}{\text{execution time for processing } m \cdot D \text{ on } m \text{ nodes}}. \quad (17)$$

(2) For classification problems, accuracy, recall, and F-measure are used to evaluate the supervised learning performance in ELM feature space. Accuracy indicates the overall ratio of correctly classified samples, which is measured as

$$\text{Accuracy} = \frac{\text{number of correctly classified samples}}{\text{total number of samples}}. \quad (18)$$

Recall is the ratio of the correctly classified samples with a specific class label to all the samples classified into this class, which is measured as

$$\text{Recall} = \frac{\text{number of correctly classified samples of a class}}{\text{number of samples classified into this class}}. \quad (19)$$

F-measure measures the overall performance considering both accuracy and recall, which is calculated as

$$F\text{-measure} = \frac{2 \cdot \text{Accuracy} \cdot \text{Recall}}{\text{Accuracy} + \text{Recall}}. \quad (20)$$

(3) For clustering problems, since each sample in the datasets used in our experiments is assigned a class label, we treat this class label as the cluster label. Thus, the same evaluation criteria are used for clustering problems as for classification problems.

6.2. Evaluation Results
6.2.1. Scalability of DXRC

The scalability of the representation converting algorithm DXRC is evaluated first. Figure 5(a) demonstrates the speedup of DXRC. As the number of slave nodes varies from one to eight, the speedup tends to be approximately linear at first, but the growth slows down due to the increasing cost of network communication among more and more working machines; in general, DXRC gains good speedup. Figure 5(b) presents the sizeup of DXRC. The x-axis denotes the fraction of the whole dataset; that is, 1 is the full size of the original dataset and 0.5 indicates half of the original dataset, in which the samples are randomly chosen. With a fixed number of slave machines, which is eight, the evaluation result shows good sizeup of DXRC. Since the scaleup of a distributed implementation cannot remain at 1 in practice, the scaleup of DXRC in Figure 5(c) drops slowly as the number of slave nodes and the size of the dataset increase, which indicates a good scaleup of DXRC.

Note that the representation ability and performance influence on XML documents classification of DSVM applied in DXRC can be found in our previous work [4].

6.2.2. Scalability of Massive XML Classification in ELM Feature Space

With the training samples converted by algorithm DXRC, a classifier of massive XML documents can be trained based on MapReduce. The speedup comparison between PELM and POS-ELM on the three datasets is presented in Figure 6.

Algorithm PELM, which implements the original ELM on MapReduce, requires calculating the inverse related to the ELM feature space matrix, while POS-ELM makes use of the idea of online sequential processing to realize parallel computation without communication and requires calculating the output weight $\beta$ and the auxiliary matrix $P$ iteratively in a single reducer. The centralized calculation reduces the scalability of both PELM and POS-ELM to some degree, especially for POS-ELM. Thus, the speedup of PELM is better than that of POS-ELM.

Figure 7 demonstrates the sizeup comparison between PELM and POS-ELM. From this figure, we find that the sizeup of PELM is better than that of POS-ELM on all three datasets.

For the scaleup comparison, Figure 8 demonstrates that both PELM and POS-ELM have good scaleup performance, and PELM outperforms POS-ELM on each of the three datasets.

In summary, both PELM and POS-ELM have good scalability for massive XML documents classification applications, but PELM has better scalability than POS-ELM.

6.2.3. Performance of Massive XML Classification in ELM Feature Space

The parallel implementations of PELM and POS-ELM do not alter the computation theory of the original ELM and OS-ELM, respectively; that is, the classification performances of PELM and POS-ELM are nearly the same as those of their corresponding centralized algorithms. The classification results are shown in Table 1.

From the table we can see that PELM slightly outperforms POS-ELM, because the iterative matrix operations on the output weight in POS-ELM cause a loss of calculation accuracy. However, for massive XML document classification applications, where both the extraction and the reduction of XML document features are complicated, both PELM and POS-ELM provide satisfactory classification performance.

6.2.4. Scalability of Massive XML Clustering in ELM Feature Space

In this set of experiments, we evaluate the proposed distributed clustering algorithm in ELM feature space, that is, Distributed ELM k-Means. In theory, the scalability of distributed k-Means in ELM feature space and in the original feature space is the same, since the only difference is the feature space of the training samples, which has no influence on the computation complexity. Thus, we only present the scalability of DEK without comparison with distributed k-Means in the original feature space.

The scalability of DEK is evaluated on all the three datasets in terms of speedup in Figure 9(a), sizeup in Figure 9(b), and scaleup in Figure 9(c). The experimental results all demonstrate good scalability of DEK for massive XML documents clustering applications.

6.2.5. Performance of Massive XML Clustering in ELM Feature Space

A clustering performance comparison between distributed clustering in ELM feature space and clustering in the original feature space is made for massive XML document clustering applications in this set of experiments. Note that, since manual relabeling of the massive XML datasets is infeasible, we only evaluate the clustering quality with the original number of classes, which is six. The comparison results on the three different datasets are presented in Table 2. It can be seen from the comparison results that DEK achieves better clustering performance due to its ELM feature mapping.

7. Conclusion

This paper addresses the problem of distributed XML document learning in ELM feature space, which, to the best of our knowledge, has not been studied before. The parallel XML document representation converting problem based on MapReduce is addressed by proposing a distributed XML representation converting algorithm, DXRC. The problem of massive XML document classification in ELM feature space is studied by implementing PELM and POS-ELM, while, for the problem of massive XML document clustering in ELM feature space, a distributed ELM k-Means algorithm, DEK, is proposed. Experimental results demonstrate that distributed XML learning in ELM feature space shows good scalability and learning performance.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

This research is partially supported by the National Natural Science Foundation of China under Grants nos. 61272181 and 61173030, the National Basic Research Program of China under Grant no. 2011CB302200-G, the 863 Program under Grant no. 2012AA011004, and the Fundamental Research Funds for the Central Universities under Grant no. N120404006.