Computational Intelligence and Neuroscience

Volume 2015 (2015), Article ID 405890, 6 pages

http://dx.doi.org/10.1155/2015/405890

## A Novel Multiple Instance Learning Method Based on Extreme Learning Machine

School of Electrical Engineering, Zhengzhou University, Zhengzhou 450001, China

Received 18 December 2014; Revised 18 January 2015; Accepted 18 January 2015

Academic Editor: Thomas DeMarse

Copyright © 2015 Jie Wang et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

#### Abstract

Since real-world data sets usually contain a large number of instances, it is meaningful to develop efficient and effective multiple instance learning (MIL) algorithms. As a learning paradigm, MIL differs from traditional supervised learning in that it handles the classification of bags comprising unlabeled instances. In this paper, a novel efficient method based on the extreme learning machine (ELM) is proposed to address the MIL problem. First, the most qualified instance is selected in each bag through a single hidden layer feedforward network (SLFN) whose input and output weights are both initialized randomly, and the single selected instance is used to represent every bag. Second, the modified ELM model is trained by using the selected instances to update the output weights. Experiments on several benchmark data sets and multiple instance regression data sets show that ELM-MIL achieves good performance; moreover, it runs several times or even hundreds of times faster than other similar MIL algorithms.

#### 1. Introduction

Multiple instance learning (MIL) was first developed to solve the problem of drug activity prediction [1]. Since then, a variety of problems have been formulated as multiple instance ones, such as object detection [2], image retrieval [3], computer aided diagnosis [4], visual tracking [5–7], text categorization [8–10], and image categorization [11, 12]. In MIL, a single example object, called a bag, contains many feature vectors (instances), some of which may be responsible for the observed classification of the example, and the label is attached only to bags (training examples) instead of their instances. Furthermore, a bag is classified as positive if at least one of its instances is a positive example; otherwise, the bag is labeled as negative.

Numerous learning methods for the MIL problem have been proposed in the past decade. As the first learning algorithm for MIL, Axis-Parallel Rectangle (APR) [1] was created by adjusting a hyperrectangle in the instance feature space. Then, the famous Diverse Density (DD) [13] algorithm was proposed to measure the cooccurrence of similar instances from different positive bags. Andrews et al. [8] used the support vector machine (SVM) to solve the MIL problem with an algorithm called MI-SVM, where a maximal margin hyperplane is chosen for the bags by regarding the margin of the most positive instance in a bag. Wang and Zucker [14] proposed two variants of the k-nearest neighbor algorithm, namely, Bayesian-kNN and Citation-kNN, by taking advantage of the k-neighbors at both the instance and the bag levels. Chevaleyre and Zucker derived ID3-MI [15] for multiple instance learning from the decision tree algorithm ID3. The key techniques of the algorithm are the so-called multiple instance coverage and multiple instance entropy. Zhou and Zhang presented a multiple instance neural network named BP-MIP [16] with a global error function defined at the level of bags. Nevertheless, it is not uncommon to see that it takes a long time to train most multiple instance learning algorithms.

Extreme learning machine (ELM) provides a powerful learning paradigm with several advantages, such as faster learning speed and higher generalization performance [17–19]. This paper is mainly concerned with extending the extreme learning machine to multiple instance learning. A novel classification method based on neural networks is presented to address the MIL problem. A two-step training procedure is employed to train ELM-MIL. During the first step, the most qualified instance is selected in each bag through an SLFN with a global error function defined at the level of bags, and the single selected instance is used to represent each bag. During the second step, using the selected instances, the modified SLFN's output parameters are optimized the way ELM does. Experiments on several benchmark data sets and text categorization data sets show that ELM-MIL achieves good performance; moreover, it runs several times or even hundreds of times faster than other similar MIL algorithms.

The remainder of this paper is organized as follows. In Section 2, ELM is briefly introduced and an algorithmic view of the ELM-MIL is provided. In Section 3, the experiments on various MIL problems are conducted and the results are reported. In Section 4, the main idea of the method is concluded and possible future work is discussed.

#### 2. Proposed Methods

In this section, we first introduce the ELM theory; then, a modified ELM is proposed to address the MIL problem, in which the most positive instance in a positive bag or the least negative instance in a negative bag is selected.

##### 2.1. Extreme Learning Machine

ELM is a single hidden layer feedforward neural network in which the hidden node parameters (e.g., the input weights and hidden node biases of additive nodes and Fourier series nodes, or the centers and impact factors of RBF nodes) are chosen randomly and the output weights are determined analytically by the least square method. Because updating the input weights is unnecessary, ELM can learn much faster than the back propagation (BP) algorithm [18]. Also, ELM can achieve better generalization performance.

Concretely, suppose that we are given a training set comprising $N$ samples $\{(x_i, t_i)\}_{i=1}^{N}$ and the hidden layer output (with $L$ nodes) denoted as a row vector $h(x) = [h_1(x), \ldots, h_L(x)]$, where $x$ is the input sample. The model of the single hidden layer neural network can be written as

$$f(x) = \sum_{i=1}^{L} \beta_i\, g(w_i \cdot x + b_i), \tag{1}$$

where $\beta_i$ is the weight of the $i$th hidden node connecting to the output node, $f(x)$ is the output of the network with $L$ hidden nodes, and $w_i$ and $b_i$ are the input weights and hidden layer bias, respectively. $g(\cdot)$ is the hidden layer function or kernel. According to the ELM theory [18–20], the parameters $w_i$ and $b_i$ can be randomly assigned, and the hidden layer function can be any nonlinear continuous function that satisfies the universal approximation capability theorems. In general, the popular mapping functions are as follows:

(1) Sigmoid function:
$$g(w, b, x) = \frac{1}{1 + \exp(-(w \cdot x + b))}. \tag{2}$$

(2) Gaussian function:
$$g(w, b, x) = \exp\left(-b \, \| x - w \|^2\right). \tag{3}$$
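As a minimal sketch (not the authors' code; the function name and the $[-1, 1]$ weight range are illustrative assumptions), the random hidden layer mapping with the sigmoid activation of (2) can be implemented as follows:

```python
import numpy as np

def random_hidden_layer(X, n_hidden, rng):
    """Randomly assign input weights w_i and biases b_i, then
    return them together with the hidden layer outputs h(x)."""
    d = X.shape[1]
    W = rng.uniform(-1.0, 1.0, size=(d, n_hidden))  # input weights w_i
    b = rng.uniform(-1.0, 1.0, size=n_hidden)       # hidden biases b_i
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))          # sigmoid g(w_i . x + b_i)
    return W, b, H
```

Each row of `H` is the row vector $h(x)$ for one training sample; since `W` and `b` are never updated, this mapping is computed only once.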

For notational simplicity, (1) can be written as

$$f(x) = h(x)\beta, \qquad H\beta = T, \tag{4}$$

where $H$ is the hidden layer output matrix, whose elements are as follows:

$$H = \begin{bmatrix} g(w_1 \cdot x_1 + b_1) & \cdots & g(w_L \cdot x_1 + b_L) \\ \vdots & \ddots & \vdots \\ g(w_1 \cdot x_N + b_1) & \cdots & g(w_L \cdot x_N + b_L) \end{bmatrix}_{N \times L},$$

and $\beta = [\beta_1, \ldots, \beta_L]^T$ and $T = [t_1, \ldots, t_N]^T$.

The least square solution with minimal norm is analytically determined by using the generalized Moore-Penrose inverse:

$$\beta = H^{\dagger} T, \tag{5}$$

where $H^{\dagger}$ is the Moore-Penrose generalized inverse of the hidden layer output matrix $H$.
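This analytic solution amounts to a single pseudoinverse computation. A sketch under the notation above (`elm_fit` and `elm_predict` are hypothetical helper names; `numpy.linalg.pinv` computes the Moore-Penrose inverse):

```python
import numpy as np

def elm_fit(H, T):
    """Output weights beta = pinv(H) @ T: the minimal-norm
    least squares solution of H @ beta = T."""
    return np.linalg.pinv(H) @ T

def elm_predict(H, beta):
    """Network outputs f(x) = h(x) @ beta for each row of H."""
    return H @ beta
```

Because only `beta` is learned, training reduces to one linear solve, which is the source of ELM's speed advantage over iterative BP training.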

##### 2.2. ELM-MIL

Assume that the training set contains $M$ bags, the $i$th bag $B_i$ is composed of $n_i$ instances, and all instances belong to the $d$-dimensional space; for example, the $j$th instance in the $i$th bag is $x_{ij}$. Each bag is attached a label $y_i$. If the bag is positive, then $y_i = 1$; otherwise, $y_i = 0$. Our goal is to predict whether the label of a new bag is positive or negative. Hence, the global error function is defined at the level of bags instead of at the level of instances:

$$E = \sum_{i=1}^{M} E_i, \tag{6}$$

where $E_i$ is the error on bag $B_i$.

Based on the assumption that a bag is positive if at least one of its instances is positive, we can simply define $E_i$ as follows:

$$E_i = \left( \max_{1 \le j \le n_i} o_{ij} - y_i \right)^2, \tag{7}$$

where $o_{ij}$ is the output of instance $x_{ij}$ for bag $B_i$. Our goal is to minimize this cost function over the bags.

The remaining problem is how to find the instance in each bag that has the maximum output. As we know, ELM chooses the input weights randomly and determines the output weights of the SLFN analytically. At first, the output weights are unknown; thus, the outputs $o_{ij}$ cannot be calculated directly [16]. Therefore, both the input weights/hidden node biases and the output weights are initialized randomly. When the bags are fed into this initial SLFN one by one, the instance with the maximum output in each bag is marked down; that is, the most positive instance of a positive bag or the least negative instance of a negative bag is picked out according to the bag's label. The selected instances, whose number equals the number of training bags, are then used as the training data set to retrain the network by minimizing the least square error.
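The per-bag selection can be sketched as follows, assuming each bag is given as an array of instances and the network parameters `W`, `b`, `beta` have already been drawn at random (names are illustrative, not the authors' code):

```python
import numpy as np

def select_win_instances(bags, W, b, beta):
    """For each bag, keep the single instance with the maximum
    network output o_ij under the randomly initialized SLFN."""
    selected = []
    for X in bags:                                  # X: (n_i, d) instances of one bag
        H = 1.0 / (1.0 + np.exp(-(X @ W + b)))      # sigmoid hidden layer
        o = H @ beta                                # o_ij for every instance
        selected.append(X[np.argmax(o)])            # the "win-instance"
    return np.vstack(selected)                      # one representative per bag
```

The result is a conventional single-instance training set of the same size as the bag-level training set, to which the standard ELM solution can be applied.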

Given a training set $\{(B_i, y_i)\}_{i=1}^{M}$, the bag $B_i$ containing $n_i$ instances $\{x_{i1}, \ldots, x_{in_i}\}$, each instance is denoted as a $d$-dimensional feature vector, so the $j$th instance of the $i$th bag is $x_{ij} = [x_{ij1}, \ldots, x_{ijd}]^T$. The hidden nodes use the sigmoid function, and the hidden node number is defined as $L$. The algorithm can now be summarized step-by-step as follows.

*Step 1.* Randomly assign the input weights $w_k$, the biases $b_k$, and the output weights $\beta$, respectively.

*Step 2.* **For** every bag $B_i$, $i = 1, \ldots, M$

**For** every instance $x_{ij}$ in bag $B_i$, $j = 1, \ldots, n_i$

Calculate the output of the SLFN $o_{ij}$:

$$o_{ij} = \sum_{k=1}^{L} \beta_k\, g(w_k \cdot x_{ij} + b_k), \tag{8}$$

where $g(\cdot)$ is the hidden node function; here the sigmoid function of equation (2) is used.

**End for**

Select the win-instance $x_i^{*}$:

$$x_i^{*} = \arg\max_{x_{ij}} o_{ij}, \quad j = 1, \ldots, n_i. \tag{9}$$

**End for**

Now, we have $M$ win-instances as the model input $\{(x_i^{*}, y_i)\}_{i=1}^{M}$.

*Step 3.* Calculate the hidden layer output matrix $H$ of the $M$ win-instances.

*Step 4.* Calculate the new output weights $\beta$:

$$\beta = H^{\dagger} Y = H^T \left( \frac{I}{C} + H H^T \right)^{-1} Y, \tag{10}$$

where $Y = [y_1, \ldots, y_M]^T$, $H^{\dagger}$ is the Moore-Penrose generalized inverse of the hidden layer output matrix $H$, and $C$ is a regularization parameter added to the diagonal of $H H^T$ for achieving better generalization performance.
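Putting Steps 1-4 together, the whole procedure can be sketched compactly as below. This is a sketch under the assumptions of this section, not the authors' implementation; the hyperparameter defaults, the 0.5 decision threshold, and all names are illustrative:

```python
import numpy as np

def elm_mil_train(bags, y, n_hidden=50, C=100.0, seed=0):
    """Two-step ELM-MIL sketch: random instance selection, then
    regularized ELM fit on the selected win-instances."""
    rng = np.random.default_rng(seed)
    d = bags[0].shape[1]
    sig = lambda Z: 1.0 / (1.0 + np.exp(-Z))
    # Step 1: random input weights, biases, and provisional output weights.
    W = rng.uniform(-1, 1, size=(d, n_hidden))
    b = rng.uniform(-1, 1, size=n_hidden)
    beta0 = rng.uniform(-1, 1, size=n_hidden)
    # Step 2: pick the win-instance (maximum output) from each bag.
    X_sel = np.vstack([X[np.argmax(sig(X @ W + b) @ beta0)] for X in bags])
    # Step 3: hidden layer output matrix for the M selected instances.
    H = sig(X_sel @ W + b)
    # Step 4: regularized output weights beta = H^T (I/C + H H^T)^{-1} Y.
    M = len(bags)
    beta = H.T @ np.linalg.solve(np.eye(M) / C + H @ H.T,
                                 np.asarray(y, dtype=float))
    return W, b, beta

def elm_mil_predict(bags, W, b, beta, threshold=0.5):
    """A bag's score is the maximum output over its instances."""
    sig = lambda Z: 1.0 / (1.0 + np.exp(-Z))
    return [int(np.max(sig(X @ W + b) @ beta) >= threshold) for X in bags]
```

Note that Step 4 solves an $M \times M$ linear system rather than forming the pseudoinverse explicitly, which is the usual way the regularized ELM solution is computed.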

#### 3. Experiments

##### 3.1. Benchmark Data Sets

The five most popular benchmark MIL data sets are used to demonstrate the performance of the proposed method: MUSK1, MUSK2, and the image data sets Fox, Tiger, and Elephant [21]. The data sets MUSK1 and MUSK2 consist of descriptions of molecules (bags). MUSK1 has 92 bags, of which 47 are labeled as positive and the others are negative. MUSK2 has 102 bags, of which 39 are labeled as positive and the others are negative. The number of instances per bag in MUSK1 is about 6 on average, while in MUSK2 it is more than 60 on average. Each instance in the MUSK data sets is described by a 166-dimensional feature vector. The Fox, Tiger, and Elephant data sets from image categorization each contain 100 positive and 100 negative bags, and each instance is a 230-dimensional vector. The main goal is to differentiate images containing elephants, tigers, and foxes, respectively, from those that do not. More information on the data sets can be found in [8].

An ELM-MIL network with 166 input units, where each unit corresponds to a dimension of the feature vectors, is trained over a range of hidden unit numbers. It should be noted that bags whose outputs are no less than 0.5 are classified as positive, while the others are classified as negative. When applied to multiple instance classification, our method involves two parameters, namely, the regularization parameter $C$ and the number of hidden neurons, both of which are selected from predefined candidate sets. For comparison with several typical MIL methods, we conduct 10-fold cross validation, which is further repeated 10 times with different random partitions, and the average test accuracy is reported. In Table 1, our method is compared with iterated-discrim APR, Diverse Density, EM-DD, BP-MIP, MI-SVM, C4.5, and Citation-kNN. All the results taken from the original literature were obtained via 10-fold cross validation (10CV), except Citation-kNN, which used leave-one-out cross validation (LOO). The values in brackets are standard deviations, and unavailable results are marked N/A.