Research Article  Open Access
Divya Tomar, Sonal Agarwal, "Twin Support Vector Machine for Multiple Instance Learning Based on Bag Dissimilarities", Advances in Artificial Intelligence, vol. 2016, Article ID 1269708, 18 pages, 2016. https://doi.org/10.1155/2016/1269708
Twin Support Vector Machine for Multiple Instance Learning Based on Bag Dissimilarities
Abstract
In the multiple instance learning (MIL) framework, an object is represented by a set of instances referred to as a bag. A positive class label is assigned to a bag if it contains at least one positive instance; otherwise the bag is labeled negative. Therefore, the task of MIL is to learn a classifier at the bag level rather than at the instance level. Traditional supervised learning approaches cannot be applied directly in such situations. In this study, we represent each bag by a vector of its dissimilarities to the other bags in the training dataset and propose a multiple instance learning based Twin Support Vector Machine (MIL-TWSVM) classifier. We have used different ways to represent the dissimilarity between two bags and performed a comparative analysis of them. The experimental results on ten benchmark MIL datasets demonstrate that the proposed MIL-TWSVM classifier is computationally inexpensive and competitive with state-of-the-art approaches. The significance of the experimental results has been tested by using the Friedman statistic and Nemenyi post hoc tests.
1. Introduction
Standard pattern recognition problems consider that the objects are represented as a single feature vector which contains sufficient information for the recognition of these objects. However, some complex objects exist in the real world which are difficult to represent by using a single feature vector; that is, single feature vector representation of an object is not sufficient for its separability, for example, a document with several paragraphs, an image containing many regions, each with different characteristics, and a drug with various conformations of a molecule. Traditional supervised learning techniques handle such kind of problems by representing complex objects using single feature vector. This reduction may lose significant information which further degrades the performance of supervised learning techniques. A set of feature vectors or multiple instances representation can be used for the better understanding of complex object [1, 2]. Multiple instances representation of a complex object can preserve more information about it. MIL is a variation of supervised learning in which a classifier is trained on a set of instances known as bag instead of individual instance. The objective of MIL approaches is to predict the class label for a bag. A bag may have different number of instances and may belong to positive or negative class label. Positive class label is assigned to a bag if it contains at least one positive instance while negative class label is assigned to a bag when all of its instances are negative [1, 3–5]. Figure 1 illustrates the framework of single instance learning and multiple instance learning.
Figure 1: Framework of (a) single instance learning and (b) multiple instance learning.
From Figure 1, it is observed that, in single instance learning, each object is represented by a single feature vector or instance and the classifier learns at instance level by assigning the class label to each instance individually. However, in MIL framework, an object is represented by a set of feature vectors or a bag and the classifier trains at bag level instead of instance level and predicts the class label of a bag instead of an instance.
The term MIL was first used for the drug activity prediction problem (Musk odor prediction) [1]. Since then, it has been widely used to solve various real world problems such as image annotation [6–9], document categorization [6, 10, 11], object detection [12, 13], human action recognition [14], visual tracking [15–18], and spam filtering [19]. The MIL approaches proposed so far can be broadly categorized into two groups. The first category, also known as bag-based methods, works only on the bag labels without any knowledge of the individual instance labels. The bag labels are predicted either by converting each bag into a single instance representation and using supervised algorithms or by defining kernels or distances between bags. MIL approaches belonging to the first category include those of Chen et al. [7], Sørensen et al. [20], and Cheplygina et al. [21], which use dissimilarity measures to represent a bag by a derived feature vector. Wang and Zucker [22] defined nearest neighbors among bags, Gärtner et al. [23] and Wang et al. [24] determined kernels between bags, Zhou et al. [10] generated a graph from the instances of a bag, and Zhang et al. [25] incorporated structure information between bags. The second category, also known as instance-based methods, focuses on the instance labels, and a bag label is determined by combining the classifications of its instances. The axis-parallel rectangle method [1] and Diverse Density [2] and its variation [26] are some examples of instance-based approaches. The bag-based methods are widely used because they have shown better performance on a wide range of MIL datasets. Therefore, this study focuses on the first category and extends the recently proposed Twin Support Vector Machine (TWSVM) classifier [27] to multiple instance learning scenarios by obtaining summarized information of each bag using different dissimilarity measures.
In recent years, many nonparallel hyperplane Support Vector Machine (SVM) classifiers have been proposed for binary classification [27–29]. For example, Mangasarian and Wild proposed the Generalized Eigenvalue Proximal Support Vector Machine (GEPSVM), the first nonparallel hyperplane classifier, which aims to find a pair of nonparallel hyperplanes such that each hyperplane is nearest to one of the two classes and as far as possible from the other class [30]. GEPSVM shows excellent performance on several benchmark datasets, especially on the "Cross-Planes" dataset. Later, by utilizing the concepts of traditional SVM and GEPSVM, Jayadeva et al. proposed a novel nonparallel hyperplane based binary classifier named TWSVM [27]. The aim of the TWSVM classifier is to generate two nonparallel hyperplanes such that each hyperplane lies in close affinity to one of the two classes while remaining distant from the data instances of the other class. For this purpose, it solves two SVM-type Quadratic Programming Problems (QPPs), while GEPSVM solves two generalized eigenvalue problems. Since TWSVM solves two smaller QPPs as opposed to a single complex QPP, the learning of the TWSVM classifier is approximately four times faster than that of standard SVM. TWSVM has shown its superiority over other existing machine learning approaches on several benchmark datasets. Therefore, in this study, we have extended TWSVM to multiple instance learning scenarios. This paper proposes a bag dissimilarity based multiple instance learning TWSVM (MIL-TWSVM) classifier. We have defined the dissimilarity between bags using different approaches. The proposed classifier is trained with the summarized information of the instances in each bag, where each bag is represented by a feature vector. This feature vector contains the dissimilarity scores of the bag with respect to the other bags in the training set. The experiment has been performed on ten MIL benchmark datasets.
The results of the proposed approach have been compared with several existing MI learning approaches, such as Diverse Density (DD) [2], Expectation-Maximization Diverse Density (EM-DD) [26], Multi-instance Logistic Regression (MILR) [31], Citation kNN [22], and Multi-Instance Support Vector Machine (MI-SVM) [6]. The effectiveness of the proposed approach has also been analyzed by using Friedman average rank hypothesis tests [32, 33]. The statistical inferences are made from the observed differences in predictive accuracy. A modified version of the Demšar significance diagram has been used to display the output of the Friedman test.
The rest of the paper is organized as follows. Section 2 provides a brief overview of multiple instance learning approaches and their applications in the real world. Section 3 includes the formulation of Twin Support Vector Machine classifier. Section 4 describes different approaches used to measure the dissimilarity between bags. The experimental results are discussed in Section 5 and finally the conclusion is drawn in Section 6.
2. Overview of Multiple Instance Learning Approaches and Their Applications
In Multiple Instance Learning (MIL), a bag is used to represent an object as follows:

$$B_i = \{x_{i1}, x_{i2}, \ldots, x_{in_i}\}, \quad x_{ij} \in \mathbb{R}^d, \tag{1}$$

where $n_i$ is the number of instances or feature vectors in bag $B_i$ and $\mathbb{R}^d$ is the $d$-dimensional feature space. Consider that the training dataset contains $N$ bags. The training dataset for MIL is represented as

$$T = \{(B_1, y_1), (B_2, y_2), \ldots, (B_N, y_N)\}, \tag{2}$$

where $y_i \in \{+1, -1\}$ is the class label corresponding to each bag in the dataset. A bag is labeled with class $+1$ if and only if it contains at least one positive instance; otherwise it is considered a negative bag. The objective of the MIL problem is to learn a model which can determine the class label of an unseen bag.

MIL approaches have been widely used to solve many real world problems. Drug activity recognition is one of the most popular applications. In this problem, for a given chemical molecule, the system must decide whether it is useful for drug design or not. A good drug has the characteristic that it binds strongly to a target "binding site." A molecule can adopt multiple conformations or shapes, and only one or a few of them bind well with the target protein or binding site. Dietterich et al. modeled the MIL framework to deal with drug activity recognition [1]. They predicted whether a new molecule was suitable for drug design by analyzing a set of known molecules. They developed three axis-parallel rectangle algorithms in which combinations of extracted molecule features were used to determine the axis-parallel rectangles (APR). Zhao et al. proposed an MIL approach based on joint instance and feature selection for drug activity prediction [34]. They focused on reducing irrelevant and redundant features in order to improve the interpretability of the drug activity recognition model.

Diverse Density (DD) has been proposed to find a region in the feature space which contains at least one instance from each positive bag and no instances from negative bags [2]. The DD of a given point $x$ is defined in terms of the number of positive bags which have instances near $x$ and the sum of the distances of the negative instances from $x$. The point at which DD is maximized corresponds to the target concept. DD suffers from the local optimum problem, and the best solution can be reached only through many restarts. The Expectation-Maximization Diverse Density (EM-DD) algorithm has been proposed by Zhang and Goldman to solve this problem [26]. They combined the expectation-maximization approach with DD to address the local optimum problem by iteratively updating the previous target point. The expectation step finds the most positive instance from each bag given an initial guess for the target point. Then, the maximization step searches for a new point by maximizing DD on the selected most positive instances. These steps are repeated until the algorithm converges. EM-DD performs well on a variety of MIL problems, but it is also very computationally intensive.

Several regular supervised classifiers have also been extended to MIL scenarios, for example, Citation k-Nearest Neighbor (Citation kNN), Bayesian kNN, ID3-MI, and multiple instance learning SVM. Citation kNN is an extension of kNN in which a bag is labeled by analyzing its neighboring bags and the bags that consider the concerned bag as a neighbor [22]. Citation kNN uses a different distance metric (the minimal Hausdorff distance) in which the focus is shifted from instances to bags; that is, the distance is measured between different bags instead of different instances (see Figure 2). Citation kNN has shown better performance on the Musk datasets. Recently, a variant of Citation kNN has been proposed by Zhou et al. for a web mining task in which the minimum Hausdorff distance has been modified for text features [35]. ID3-MI is a decision tree algorithm which uses a multi-instance entropy criterion to split the tree nodes [36].
SVM has been extended to the MIL scenario by Andrews et al. [6]. They used two approaches for this extension. In the first approach, the hidden labels of the instances are decided under the constraints posed by the class labels of the bags. In the second approach, the objective is to maximize the bag margin directly. MIL Boost is another example of an MIL approach, in which the weights of the instances are updated in each of the boosting rounds [12]. In this approach, the Noisy-OR rule is used to determine the bag labels from the given instance labels. Logistic Regression (LR), a popular probabilistic supervised learning approach, and Neural Networks (NN) have also been extended to the MIL framework. Fu and Robles-Kelly extended LR to MIL problem domains and combined $L_1$ and $L_2$ regularization methods [37]. Xu and Frank also upgraded single instance LR to multi-instance data and showed its effectiveness on artificial and Musk drug activity prediction datasets [31]. They followed several assumptions to form the bag-level probability from instance-level class probabilities. Ramon and De Raedt explored the utility of NN for MIL due to its ability to learn automatically from examples [38]. In another research work, Zhou and Zhang proposed BP-MLP, an extension of NN to MIL [39]. In this approach, the traditional back-propagation (BP) algorithm is extended using a global error function which is defined at the bag level rather than at the instance level. Image classification and retrieval is another significant application of MIL, in which a given image is to be classified into a target category on the basis of its visual content [40–42]. In MIL, an image can be viewed as a bag of local image patches and can be labeled as positive or negative.
A positive label image contains a set of image patches or instances in which at least one patch is conceptual to the user while if all the patches are not conceptual to the user then image is considered as negative label image. For example, in beach scene classification, the target class is beach and by using different regions or contents of the scene image (see Figure 3), the objective is to recognize whether it belongs to beach scene or not. In this case, any visual content that displays a beach may be considered as positive image while negative images show another different visual content.
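The two ways of turning instance-level information into a bag label mentioned above, the standard MIL assumption and the Noisy-OR rule used by MIL Boost, can be sketched as follows. This is a minimal illustration only: the function names `bag_label` and `noisy_or` are hypothetical, and the instance probabilities are made-up numbers rather than outputs of any classifier from the cited works.

```python
from math import prod


def bag_label(instance_labels):
    """Standard MIL assumption: a bag is positive (+1) if at least one
    of its instances is positive, otherwise negative (-1)."""
    return 1 if any(y == 1 for y in instance_labels) else -1


def noisy_or(instance_probs):
    """Noisy-OR rule: probability that at least one instance is positive,
    assuming the instance-level probabilities are independent."""
    return 1.0 - prod(1.0 - p for p in instance_probs)


# Illustrative values only (not taken from any dataset in this paper).
print(bag_label([-1, -1, 1]))   # 1
print(bag_label([-1, -1]))      # -1
print(noisy_or([0.5, 0.5]))     # 0.75
```

Noisy-OR is a soft counterpart of the hard rule: a single confident positive instance pushes the bag probability towards 1, while the bag probability stays low only if every instance probability is low.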
Xu proposed an MIL extension of Neural Network for the retrieval and classification of images [43]. Maron and Ratan applied the DD MIL approach to the classification of natural image scenes by using different kinds of bag generators [44]. Bag generators consider each image as a bag and various subregions in the image as instances. They performed experiments on the COREL photo library. Cheng et al. proposed BP-MLP and BP-SVM approaches for automatic image categorization in the multiple instance learning scenario and performed experiments on 2000 images obtained from the COREL repository [45]. They extracted frequent patterns from each image category and embedded an image bag into a multidimensional data point which characterizes the similarity between the image and every common pattern of an image category. Gondra and Xu proposed a Relevance Feedback (RF) learning based Content Based Image Retrieval (CBIR) framework in the multiple instance learning scenario [8]. In another research work, Pao et al. proposed an EM-DD based MIL method for image classification [46]. Sener and Ikizler-Cinbis proposed an ensemble of multiple instance learning approaches for the problem of image reranking [47]. They constructed bags by using three different approaches: sliding window, dynamic, and dynamic-sliding methods. The constructed bags were then used to develop multi-instance classifiers. They used the multiple instance learning with instance selection (MILES) algorithm as the MIL classifier. A rank was assigned to an image by combining the decision scores of the MIL classifiers. Li and Liu used graph based MIL with instance weighting for the retrieval of images [48]. Different weights were assigned to each region in positive images on the basis of the learning results, and then a rank was calculated for each image. Feng et al. proposed a multi-instance semisupervised learning approach on the basis of hierarchical sparse representation for the categorization of images [49].
They solved the instance confidence value identification problem under the framework of instance-level sparse representation. Several other research works have also focused on the image categorization problem in the MIL framework. Xu et al. utilized deep learning of feature representations with multiple instance learning for colon cancer classification based on histopathology images [50]. A deep learning network obtains high level features from low level features. They proposed a system based on deep learning with a set of linear filters in the encoder and decoder and used the last hidden layer for fully supervised feature learning, as it represents more intrinsic features than the lower level layers. Wu et al. extended deep learning to the multiple instance learning framework for image annotation [51]. They used a deep convolutional neural network which contains five convolutional layers, followed by a pooling layer and three fully connected layers, for learning visual representations with multiple instance learning. The last hidden layer was redesigned for multiple instance learning. Kotzias et al. also combined deep learning and multi-instance learning for knowledge transfer [52]. MIL has also been utilized in disease diagnosis by analyzing medical images. Ding et al. treated breast ultrasound image classification as a multiple instance learning task and proposed an MIL method based on SVM which classifies tumors as benign or malignant [53]. They used a self-organizing map (SOM) to map the instance space into a concept space and constructed the bag feature vector by using the distribution of the instances of each bag. Li et al. proposed a novel computer aided diagnosis scheme for the recognition of the tumor invasion depth of gastric cancer [54].
They extracted both bag-level and instance-level features and applied an improved Citation kNN algorithm for the identification of gastric tumor invasion depth. Tong et al. used an MIL method for the detection of Alzheimer’s disease (AD) and its prodromal stage, mild cognitive impairment (MCI) [55]. They built a graph for each image to identify the relationships among the patches and performed experiments on 834 MRI images taken from the ADNI study. In another research work, Quellec et al. proposed an MIL framework for diabetic retinopathy screening [56]. Text categorization is another popular application of MIL. Wang et al. [11] proposed a novel instance specific distance method for MIL text categorization. They derived their data from the Reuters-21578 collection, with 2000 bags and 243 features. He and Wang investigated the problem of text categorization from the multiple instance view, in which each text is considered as a bag and each of its sentences as an instance [57]. They developed an MIL approach for Chinese text classification using kNN. MIL is also used for the web mining or web index recommendation problem, in which each web page is considered as a bag and each of its linked pages as an instance of that bag. Zhou et al. [35] proposed an algorithm, known as Fretcit-kNN, based on the minimum Hausdorff dissimilarity measure, which determines the class label of an unseen bag by utilizing both references and citers. MIL approaches have also made significant contributions to visual tracking [15, 17, 58] and real time video event detection [59].
3. Twin Support Vector Machine
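Twin Support Vector Machine is a binary classification technique that classifies data instances by constructing two nonparallel hyperplanes instead of the single hyperplane used by the traditional Support Vector Machine. It obtains the two nonparallel hyperplanes by solving two QPPs of smaller size, as compared to the single complex QPP solved by traditional SVM. TWSVM generates a hyperplane for each class in such a way that the data instances of each class lie in close affinity to the corresponding hyperplane and as far as possible from the other hyperplane. The effectiveness of TWSVM over other existing classification approaches has been validated on various benchmark datasets. TWSVM has better generalization ability and faster computational speed, due to which it has been applied to several real life applications such as intrusion detection [60, 61], activity recognition [62], image denoising [63], emotion recognition [64], text classification [65], defect prediction [66, 67], disease diagnosis [68, 69], and speaker identification [70].

Consider a binary classification problem of size $m$. The training dataset for such a problem can be represented as

$$T = \{(x_1, y_1), (x_2, y_2), \ldots, (x_m, y_m)\}, \tag{3}$$

where $x_i \in \mathbb{R}^d$, $i = 1, \ldots, m$, represents the input data instances in the $d$-dimensional feature space and $y_i \in \{+1, -1\}$ indicates the corresponding class label. Consider two matrices $X_1$ and $X_2$ comprising the data instances of class $+1$ and class $-1$, respectively. TWSVM solves the following two QPPs:

$$\min_{w_1, b_1, \xi} \; \frac{1}{2}\|X_1 w_1 + e_1 b_1\|^2 + c_1 e_2^T \xi \quad \text{s.t.} \quad -(X_2 w_1 + e_2 b_1) + \xi \ge e_2, \; \xi \ge 0, \tag{4}$$

$$\min_{w_2, b_2, \eta} \; \frac{1}{2}\|X_2 w_2 + e_2 b_2\|^2 + c_2 e_1^T \eta \quad \text{s.t.} \quad (X_1 w_2 + e_1 b_2) + \eta \ge e_1, \; \eta \ge 0, \tag{5}$$

and seeks the following two nonparallel hyperplanes in $\mathbb{R}^d$:

$$x^T w_1 + b_1 = 0, \qquad x^T w_2 + b_2 = 0. \tag{6}$$

Here, $w_1$ and $w_2$ are normal vectors to the hyperplanes; $b_1$ and $b_2$ represent the bias terms. $e_1$ and $e_2$ are two vectors of 1’s of appropriate dimensions. $c_1$ and $c_2$ are two positive tradeoff constants. $\xi$ and $\eta$ are the slack variables of (4) and (5), which absorb constraint violations by the instances of class $-1$ and class $+1$, respectively. The first term of (4) or (5) is the sum of squared distances of the data instances from their corresponding hyperplane. Minimization of this term keeps the hyperplane closest to the data instances of class $+1$ or class $-1$. The second term of (4) or (5) penalizes the data instances of the other class which are misclassified. The constraints require the hyperplane to stay at a distance of at least 1 from the data instances of the other class. The slack variable measures the error wherever the hyperplane is closer than this distance. In this way, the hyperplane is kept close to the data instances of its respective class and as far as possible from the data instances of the other class.

The Lagrangian corresponding to (4) is given as follows:

$$L(w_1, b_1, \xi, \alpha, \beta) = \frac{1}{2}(X_1 w_1 + e_1 b_1)^T (X_1 w_1 + e_1 b_1) + c_1 e_2^T \xi - \alpha^T\big(-(X_2 w_1 + e_2 b_1) + \xi - e_2\big) - \beta^T \xi, \tag{7}$$

where $\alpha$ and $\beta$ are two vectors of Lagrange multipliers. The Karush-Kuhn-Tucker (KKT) conditions are given by

$$X_1^T (X_1 w_1 + e_1 b_1) + X_2^T \alpha = 0, \tag{8}$$

$$e_1^T (X_1 w_1 + e_1 b_1) + e_2^T \alpha = 0, \tag{9}$$

$$c_1 e_2 - \alpha - \beta = 0, \tag{10}$$

$$-(X_2 w_1 + e_2 b_1) + \xi \ge e_2, \quad \xi \ge 0, \tag{11}$$

$$\alpha^T\big(-(X_2 w_1 + e_2 b_1) + \xi - e_2\big) = 0, \quad \beta^T \xi = 0, \tag{12}$$

$$\alpha \ge 0, \quad \beta \ge 0. \tag{13}$$

Since $\beta \ge 0$, from (10), we can determine

$$0 \le \alpha \le c_1 e_2. \tag{14}$$

Equations (8) and (9) lead to

$$\begin{bmatrix} X_1^T \\ e_1^T \end{bmatrix} \begin{bmatrix} X_1 & e_1 \end{bmatrix} \begin{bmatrix} w_1 \\ b_1 \end{bmatrix} + \begin{bmatrix} X_2^T \\ e_2^T \end{bmatrix} \alpha = 0. \tag{15}$$

Let $H = [X_1 \;\; e_1]$, $G = [X_2 \;\; e_2]$, and $u = [w_1; \, b_1]$. The above equation becomes

$$H^T H u + G^T \alpha = 0, \quad \text{that is,} \quad u = -(H^T H)^{-1} G^T \alpha. \tag{16}$$

In a similar manner, with $v = [w_2; \, b_2]$ and $\gamma$ the vector of Lagrange multipliers of (5),

$$v = (G^T G)^{-1} H^T \gamma. \tag{17}$$

From (16) and (17), it is clear that the solution for the hyperplane parameters requires the inverses of the matrices $H^T H$ and $G^T G$. Sometimes, such a matrix may be ill-conditioned, which makes it difficult to calculate its inverse. To avoid this situation, regularization terms $\epsilon_1 I$ and $\epsilon_2 I$ are added to the above-mentioned equations as follows:

$$u = -(H^T H + \epsilon_1 I)^{-1} G^T \alpha, \qquad v = (G^T G + \epsilon_2 I)^{-1} H^T \gamma, \tag{18}$$

where $\epsilon_1, \epsilon_2 > 0$ are user defined parameters having small values and $I$ is an identity matrix of appropriate dimension. The Wolfe duals of (4) and (5) can be defined as

$$\max_{\alpha} \; e_2^T \alpha - \frac{1}{2}\alpha^T G (H^T H)^{-1} G^T \alpha \quad \text{s.t.} \quad 0 \le \alpha \le c_1 e_2, \qquad \max_{\gamma} \; e_1^T \gamma - \frac{1}{2}\gamma^T H (G^T G)^{-1} H^T \gamma \quad \text{s.t.} \quad 0 \le \gamma \le c_2 e_1. \tag{19}$$

Using these equations, we can determine the Lagrange multipliers, which are then used to obtain the hyperplane parameters. In this way, a hyperplane is constructed for each class using (6). Class $+1$ or $-1$ is assigned to a new data instance depending upon its closeness to the two nonparallel hyperplanes. TWSVM assigns a class label to an instance $x$ by using the following decision function:

$$\text{class}(x) = \arg\min_{k \in \{1, 2\}} |x^T w_k + b_k|, \tag{20}$$

where $|\cdot|$ is the absolute value, $k = 1$ corresponds to class $+1$, and $k = 2$ corresponds to class $-1$.

TWSVM has also been extended to nonlinear cases where the data instances are not separable by linear class boundaries. For this purpose, it uses the kernel trick to transform the data instances into a higher-dimensional feature space. Nonlinear TWSVM seeks the following two kernel surfaces instead of planes:

$$K(x^T, C^T)\mu_1 + b_1 = 0, \qquad K(x^T, C^T)\mu_2 + b_2 = 0, \tag{21}$$

where $K$ is any arbitrary kernel function and $C = [X_1; \, X_2]$. The primal QPPs of nonlinear TWSVM corresponding to the kernel-generated surfaces (21) are given below:

$$\min_{\mu_1, b_1, \xi} \; \frac{1}{2}\|K(X_1, C^T)\mu_1 + e_1 b_1\|^2 + c_1 e_2^T \xi \quad \text{s.t.} \quad -(K(X_2, C^T)\mu_1 + e_2 b_1) + \xi \ge e_2, \; \xi \ge 0,$$

$$\min_{\mu_2, b_2, \eta} \; \frac{1}{2}\|K(X_2, C^T)\mu_2 + e_2 b_2\|^2 + c_2 e_1^T \eta \quad \text{s.t.} \quad (K(X_1, C^T)\mu_2 + e_1 b_2) + \eta \ge e_1, \; \eta \ge 0. \tag{22}$$

Similar to (16) and (17), the kernel-generated surface parameters can be determined as

$$\begin{bmatrix} \mu_1 \\ b_1 \end{bmatrix} = -(S^T S)^{-1} R^T \alpha, \qquad \begin{bmatrix} \mu_2 \\ b_2 \end{bmatrix} = (R^T R)^{-1} S^T \gamma. \tag{23}$$

Here, $S = [K(X_1, C^T) \;\; e_1]$ and $R = [K(X_2, C^T) \;\; e_2]$. Similar to the linear case, regularization terms $\epsilon_1 I$ and $\epsilon_2 I$ are added to (23) to avoid ill-conditioned matrices. A new data instance is labeled with class $+1$ or $-1$ in a similar manner to the linear case.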
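The linear TWSVM training procedure described in this section can be sketched in a few lines of NumPy. This is an illustrative sketch under stated assumptions, not the MATLAB implementation used in this study: the names `fit_linear_twsvm`, `predict`, and `_solve_box_qp` are hypothetical, the box-constrained duals are solved with a simple projected gradient ascent (a swapped-in solver chosen for brevity; any QP solver could be used instead), and a single regularization value `eps` is used for both problems.

```python
import numpy as np


def _solve_box_qp(Q, c, n_iter=5000):
    """Maximise sum(a) - 0.5 * a^T Q a subject to 0 <= a <= c
    by projected gradient ascent (assumed solver, for illustration)."""
    a = np.zeros(Q.shape[0])
    step = 1.0 / (np.linalg.norm(Q, 2) + 1e-12)  # 1 / largest eigenvalue of Q
    for _ in range(n_iter):
        grad = 1.0 - Q @ a
        a = np.clip(a + step * grad, 0.0, c)     # project back onto the box
    return a


def fit_linear_twsvm(X1, X2, c1=1.0, c2=1.0, eps=1e-3):
    """X1: instances of class +1, X2: instances of class -1 (rows).
    Returns u = [w1; b1] and v = [w2; b2]."""
    H = np.hstack([X1, np.ones((X1.shape[0], 1))])   # H = [X1 e1]
    G = np.hstack([X2, np.ones((X2.shape[0], 1))])   # G = [X2 e2]
    I = np.eye(H.shape[1])
    HtH_inv = np.linalg.inv(H.T @ H + eps * I)       # regularised inverse
    GtG_inv = np.linalg.inv(G.T @ G + eps * I)
    alpha = _solve_box_qp(G @ HtH_inv @ G.T, c1)     # dual of the first QPP
    gamma = _solve_box_qp(H @ GtG_inv @ H.T, c2)     # dual of the second QPP
    u = -HtH_inv @ G.T @ alpha
    v = GtG_inv @ H.T @ gamma
    return u, v


def predict(X, u, v):
    """Assign +1 if an instance is closer to the first hyperplane, else -1."""
    Xe = np.hstack([X, np.ones((X.shape[0], 1))])
    return np.where(np.abs(Xe @ u) <= np.abs(Xe @ v), 1, -1)


# Toy data, made up for illustration.
X_pos = np.array([[2.0, 2.0], [3.0, 2.5], [2.5, 3.0], [3.5, 3.5]])
X_neg = -X_pos
u, v = fit_linear_twsvm(X_pos, X_neg)
print(predict(X_pos, u, v), predict(X_neg, u, v))
```

Each dual has only as many variables as there are instances in one class, which is the source of the speed advantage over a single SVM problem defined on all instances.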
4. Bag Dissimilarity Representation
In the multiple instance learning case, a classifier works at the bag level rather than at the instance level and takes a bag as its input. Therefore, the objective of MIL is to develop a classifier which generates a decision function for the bag. In the proposed approach, each bag is represented by a vector of its dissimilarities to the other bags in the training set. The dissimilarities of a bag to all other bags form a feature vector. If there are $N$ bags in the training set and the $i$th bag contains $n_i$ instances, then the $i$th bag can be represented as

$$D(B_i) = [d(B_i, B_1), d(B_i, B_2), \ldots, d(B_i, B_N)]. \tag{24}$$

Thus, each bag has a single feature vector or instance representation and the MIL problem can be treated as a regular supervised learning problem. The dissimilarity between two bags $B_i$ and $B_j$ is measured in different ways, which are classified into two main categories on the basis of the bag representation. If a bag is represented as a point set in the high-dimensional feature space, then the dissimilarity between two bags can be measured using a set distance. The following distance metrics have been used to calculate the dissimilarity between bags.

(a) Hausdorff Distance. The Hausdorff distance is one of the most popular distance metrics used for object recognition in the field of computer vision. Two bags $B_i$ and $B_j$ are said to be close to each other if every instance of bag $B_i$ is close to an instance in bag $B_j$. The dissimilarity between two bags $B_i$ and $B_j$ is defined as

$$d_H(B_i, B_j) = \max\{h(B_i, B_j), \, h(B_j, B_i)\}. \tag{25}$$

Here, $h(B_i, B_j)$ represents the directed distance between the two bags $B_i$ and $B_j$. In detail, given two bags $B_i = \{x_{i1}, \ldots, x_{in_i}\}$ and $B_j = \{x_{j1}, \ldots, x_{jn_j}\}$, the directed distance between $B_i$ and $B_j$ is calculated as

$$h(B_i, B_j) = \max_{x_{ik} \in B_i} \; \min_{x_{jl} \in B_j} \|x_{ik} - x_{jl}\|. \tag{26}$$

Simply, $\|x_{ik} - x_{jl}\|$ measures the Euclidean distance between the instances $x_{ik}$ and $x_{jl}$. The $N$-dimensional representation of the $i$th bag, $D(B_i)$, is formed as a vector of such dissimilarities between the $i$th bag and all the other bags in the training set. The final Hausdorff distance between two bags is symmetrized by taking the maximum of the two directed distances, as $h(B_i, B_j)$ is not symmetric. The dissimilarity between two bags can also be defined by taking the minimum and the average of the squared Euclidean distances as follows:

$$d_{\min}(B_i, B_j) = \min_{k} \min_{l} \|x_{ik} - x_{jl}\|^2, \tag{27}$$

$$d_{\text{avg}}(B_i, B_j) = \frac{1}{n_i} \sum_{k=1}^{n_i} \min_{l} \|x_{ik} - x_{jl}\|^2. \tag{28}$$

Figure 4 shows the minimum Euclidean distances between the instances of two bags. According to Figure 4(a), all the instances in bag $B_i$ have the same closest instance in bag $B_j$, while the instances in bag $B_j$ have two different closest instances in bag $B_i$. As a result, the minimum distances between the instances of two bags are asymmetric: the nearest-neighbor distances from $B_i$ to $B_j$ differ from those from $B_j$ to $B_i$. This distance can be symmetrized by again taking the minimum of these distances; that is, $d_{\min}(B_i, B_j) = d_{\min}(B_j, B_i)$. For the case of the average distance as given by (28), in general $d_{\text{avg}}(B_i, B_j) \ne d_{\text{avg}}(B_j, B_i)$.
Figure 4: Minimum Euclidean distances between the instances of two bags, panels (a) and (b).
(b) City Block Distance. The dissimilarity between two bags can also be defined by using the City Block distance metric.

(c) Chi-Squared Distance. The chi-squared distance is a weighted Euclidean distance which can likewise be used to measure the dissimilarity between two bags.

If a bag is instead viewed as a probability distribution in the instance space, then the dissimilarity between two bags can be defined through a distribution distance. It is not only difficult to determine the probability density function in a higher-dimensional feature space but also very computationally expensive to estimate the true distributions of the instances. Therefore, the instance distributions are approximated and the distance is measured between the approximated distributions by using the following two distance metrics.
(a) Earth Mover’s Distance (EMD). Earth Mover’s distance [71] measures the minimum amount of work needed to transform one probability distribution into another. Consider that each instance carries $1/n_i$ of the total probability mass in a bag $B_i$ of size $n_i$. The Earth Mover’s distance between two bags is computed as

$$d_{\text{EMD}}(B_i, B_j) = \min_{f} \sum_{k=1}^{n_i} \sum_{l=1}^{n_j} f_{kl} \, d(x_{ik}, x_{jl}), \tag{31}$$

where $d(x_{ik}, x_{jl})$ is the Euclidean distance and $f_{kl}$ is the flow between instances $x_{ik}$ and $x_{jl}$, subject to the additional constraints $f_{kl} \ge 0$, $\sum_{k} f_{kl} \le 1/n_j$, $\sum_{l} f_{kl} \le 1/n_i$, and $\sum_{k}\sum_{l} f_{kl} = 1$.

(b) Mahalanobis Distance. Each bag $B_i$ is approximated by a single Gaussian distribution with mean $\mu_i$ and covariance matrix $\Sigma_i$. The dissimilarity between two bags $B_i$ and $B_j$ through the Mahalanobis distance is defined as

$$d_{M}(B_i, B_j) = (\mu_i - \mu_j)^T \left(\tfrac{1}{2}\Sigma_i + \tfrac{1}{2}\Sigma_j\right)^{-1} (\mu_i - \mu_j). \tag{32}$$

In this way, we can calculate the dissimilarity score of a bag with respect to the rest of the bags. The new vector representation of each bag acts as an input to the TWSVM classifier, which now works at the bag level. The problem has thus been converted into a single instance binary classification problem in which a bag has either a $+1$ or a $-1$ class label. Figure 5 depicts an example of multiple instance learning data with four bags. The bags contain different numbers of instances: bag 1 and bag 3 have three instances each, bag 2 contains two instances, and bag 4 consists of four instances. A class label is associated with each bag instead of with each individual instance. Traditional supervised learning approaches are not designed for this type of problem. Thus a bag-level MIL-TWSVM classifier is trained with this summarized data. During the testing phase, the same representation is obtained for the query bag, and the proposed classifier takes a decision for the bag on the basis of the minimum distance criterion.
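The point-set dissimilarities and the resulting bag-level feature vectors described in this section can be sketched as follows. This is an illustrative sketch only: the function names are hypothetical, the toy bags are made-up 2-D data whose sizes (three, two, three, and four instances) merely mirror the example of Figure 5, and only the Hausdorff, minimum, and average distances are shown.

```python
from math import dist  # Euclidean distance between two points (Python 3.8+)


def directed_hausdorff(bag_a, bag_b):
    """Largest distance from an instance of bag_a to its nearest instance in bag_b."""
    return max(min(dist(x, y) for y in bag_b) for x in bag_a)


def max_hausdorff(bag_a, bag_b):
    """Symmetrised Hausdorff distance: maximum of the two directed distances."""
    return max(directed_hausdorff(bag_a, bag_b), directed_hausdorff(bag_b, bag_a))


def min_min(bag_a, bag_b):
    """Minimum squared Euclidean distance over all instance pairs (symmetric)."""
    return min(dist(x, y) ** 2 for x in bag_a for y in bag_b)


def mean_min(bag_a, bag_b):
    """Average over bag_a of the squared distance to the nearest instance in bag_b
    (not symmetric in general)."""
    return sum(min(dist(x, y) ** 2 for y in bag_b) for x in bag_a) / len(bag_a)


def dissimilarity_vector(bag, reference_bags, measure=max_hausdorff):
    """Represent a bag by its dissimilarities to every reference (training) bag."""
    return [measure(bag, ref) for ref in reference_bags]


# Toy training bags with 3, 2, 3, and 4 instances (values are illustrative).
train_bags = [
    [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)],
    [(5.0, 5.0), (6.0, 5.0)],
    [(0.0, 0.5), (1.0, 1.0), (0.5, 0.0)],
    [(5.0, 6.0), (6.0, 6.0), (5.5, 5.5), (6.0, 5.0)],
]
train_features = [dissimilarity_vector(b, train_bags) for b in train_bags]

# A query bag is mapped into the same space using the training bags as references.
query_bag = [(0.2, 0.1), (0.9, 0.8)]
query_features = dissimilarity_vector(query_bag, train_bags)
```

Each row of `train_features` is a fixed-length vector with one entry per training bag, regardless of how many instances the bag contains, so it can be passed to any single instance classifier.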
5. Numerical Experiments
This section presents the experimental results of our proposed MIL-TWSVM classifier on ten benchmark MIL datasets. We have analyzed the performance of the proposed MIL-TWSVM classifier with different dissimilarity metrics. The results of MIL-TWSVM have been compared with several existing MIL approaches such as Diverse Density (DD), Expectation-Maximization Diverse Density (EM-DD), Multi-Instance Logistic Regression (MILR), Citation kNN, and Multi-Instance Support Vector Machine (MI-SVM). All these classifiers have been implemented in MATLAB 2012a on the Windows 7 operating system with an Intel Core i7 processor and 12 GB RAM. This section is divided into four subsections. The first subsection describes the benchmark MIL datasets used in this study. The second subsection analyzes the impact of the parameters on the performance of the proposed classifier. The experimental results are discussed and analyzed in subsections three and four, respectively.
5.1. Dataset Description
In this study, the experiment has been performed on ten MIL benchmark datasets: Musk 1, Musk 2, Mutagenesis-atoms, Winter Wren, Brown Creeper, Elephant, Fox, Tiger, eastWest, and westEast. These datasets are available online at http://www.miproblems.org/, and their detailed description is shown in Table 1. They are widely adopted for the performance evaluation of new MIL approaches and represent four different categories of MIL problems.

Musk 1 and Musk 2 are two standard drug activity prediction benchmark datasets in which a bag represents a molecule and the different conformations (shapes) of that molecule are the instances of the bag. In these datasets, each bag is labeled “musk” or “non-musk” by a human expert. The Musk 1 dataset contains 92 bags, while the Musk 2 dataset contains 102 bags and a larger number of instances (molecule conformations) than Musk 1. The objective of MIL is to predict whether a new molecule is “musk” or “non-musk.” Mutagenesisatoms is another dataset in the drug activity prediction category; it contains 125 positive and 63 negative bags.
Brown Creeper and Winter Wren are two audio MIL datasets that contain recordings of bird songs of different species. A bag represents an audio fragment and is labeled positive if the particular species is heard in that fragment. Since birds of the same species have similar songs, different bird species correspond to different concepts. It is also possible that some species are heard together more often. In this case, audio fragments that are negative for one bird species could still be useful for determining whether an audio fragment contains that species.
Content-Based Image Retrieval (CBIR) is another well-recognized application of MIL, in which the objective is to determine whether a given image is of interest to the user. An image corresponds to a bag, and its regions (image patches) are the instances of that bag. The class labels of the individual instances are unknown. In this study, we have used three image datasets: Elephant, Fox, and Tiger.
The eastWest and westEast datasets belong to an inductive logic programming (ILP) problem and have been collected from the eastWest challenge. The objective of this challenge is to predict whether a train is eastbound or westbound. In these datasets, a bag represents a train, which contains various cars (instances) of different shapes and sizes. Each car carries different loads, and these properties are its instance-level attributes. The challenge yields two MI datasets, eastWest and westEast, because it is not clear whether eastbound or westbound trains should be treated as the positive class. In the eastWest dataset, eastbound trains are regarded as positive examples; in the westEast dataset, westbound trains are regarded as positive examples.
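The bag structure shared by all four dataset categories can be illustrated with a minimal sketch. The `Bag` class, the `summarize` helper, and the toy attribute values below are hypothetical and serve only as an illustration; they are not taken from the datasets themselves.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Bag:
    """One MIL object: a labelled set of instance feature vectors."""
    name: str
    label: int                    # +1 or -1, assigned to the whole bag
    instances: List[List[float]]  # instance-level labels are not observed


def summarize(bags):
    """Return (number of bags, number of positive bags, total instances)."""
    positives = sum(1 for bag in bags if bag.label == 1)
    total_instances = sum(len(bag.instances) for bag in bags)
    return len(bags), positives, total_instances


# Toy "trains": each car is an instance with two made-up attributes.
bags = [
    Bag("train_a", +1, [[1.0, 0.0], [0.5, 2.0], [0.2, 1.0]]),
    Bag("train_b", -1, [[0.9, 0.1]]),
]
summary = summarize(bags)
print(summary)
```

Note that the two bags hold different numbers of instances, which is why a fixed-length single feature vector is not directly available for each object.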
5.2. Parameters Selection
This study uses the Gaussian kernel function for the nonlinear case. The MILTWSVM classifier has four penalty parameters and an additional kernel parameter, sigma, and its predictive performance is affected by the choice of these parameters. We select them with the Grid Search approach, which is one of the most widely used approaches for optimal parameter selection [27, 65, 72–74]. The penalty and kernel parameters are each selected from a predefined set of candidate values. The experiment has been conducted using 10-fold cross-validation. Grid Search trains the proposed classifier with each pair (penalty and sigma) in the Cartesian product of the two candidate sets and evaluates each pair by internal cross-validation on the training set, so multiple MILTWSVMs are trained per pair. Finally, it outputs the setting that achieved the highest score in the validation procedure.
We have analyzed the influence of these parameters on the performance of MILTWSVM on three datasets, Tiger, Fox, and Mutagenesisatoms, as shown in Figures 6, 7, and 8. For the linear case, we constrain the penalty parameters to reduce the computational complexity and analyze their influence on the predictive performance of the linear classifier. For the nonlinear case, we likewise constrain the penalty parameters and analyze the influence of the penalty parameter and sigma on the predictive performance of the nonlinear MILTWSVM classifier.
For the Tiger dataset, the impact of the parameters has been analyzed using the MaxHausdorff dissimilarity measure, as shown in Figure 6. The figure shows that the proposed linear MILTWSVM classifier performs better with a low value of one penalty parameter and a high value of another, and that its performance degrades sharply when one of these parameters takes a low value. In the nonlinear case, MILTWSVM performs better with a high value of sigma and a low value of the penalty parameter. The MaxHausdorff based MILTWSVM classifier performs well for different combinations of the penalty and sigma parameters on the other datasets.
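The grid search with internal cross-validation described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the candidate values are made up, and `score_fn` is a hypothetical callback that would train a classifier on the training indices and return its accuracy on the validation indices.

```python
from itertools import product
from statistics import mean


def k_fold_indices(n_samples, k):
    """Split range(n_samples) into k contiguous validation folds."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def grid_search(penalties, sigmas, n_samples, score_fn, k=10):
    """Return the (penalty, sigma) pair with the best mean validation score."""
    folds = k_fold_indices(n_samples, k)
    best_pair, best_score = None, float("-inf")
    # Every pair in the Cartesian product of the two candidate sets is tried.
    for penalty, sigma in product(penalties, sigmas):
        scores = []
        for val_idx in folds:
            held_out = set(val_idx)
            train_idx = [i for i in range(n_samples) if i not in held_out]
            scores.append(score_fn(penalty, sigma, train_idx, val_idx))
        average = mean(scores)
        if average > best_score:
            best_pair, best_score = (penalty, sigma), average
    return best_pair, best_score


def toy_score(penalty, sigma, train_idx, val_idx):
    """Stand-in for 'train and validate'; it peaks at penalty=1, sigma=4."""
    return -abs(penalty - 1.0) - abs(sigma - 4.0)


candidates = [2 ** e for e in range(-2, 3)]  # assumed grid: 0.25 ... 4
best_pair, best_score = grid_search(candidates, candidates, 20, toy_score)
print(best_pair, best_score)
```

With a grid of this kind, the number of trained models is the product of the two candidate-set sizes and the number of folds, which is why reducing the number of free penalty parameters lowers the cost of the search.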
[Figures 6, 7, and 8 (Tiger, Fox, and Mutagenesisatoms datasets, respectively): each figure has two panels, (a) Linear and (b) Nonlinear.]
The impact of these parameters on the Fox and Mutagenesisatoms datasets has been analyzed using the MinHausdorff and EMD dissimilarity measures, respectively. On the Fox dataset, MinHausdorff based linear MILTWSVM performs better with a low value of one penalty parameter and a high value of another, while the nonlinear MILTWSVM classifier achieves better predictive accuracy with a high value of sigma and a low value of the penalty parameter, as shown in Figure 7. For Mutagenesisatoms, EMD based linear MILTWSVM obtains its highest accuracy for low values of the penalty parameters, and nonlinear MILTWSVM obtains its highest accuracy for low values of both sigma and the penalty parameter, as shown in Figure 8. For every combination of these parameters (penalty and sigma) and dissimilarity measures, the proposed MILTWSVM classifier behaves differently on different datasets. Therefore, appropriate selection of these parameters is essential to obtain good performance from the MILTWSVM classifier.
5.3. Results and Discussion
The predictive accuracy of the proposed MILTWSVM classifier with different dissimilarity measures on ten benchmark datasets is shown in Tables 2 and 3 for linear and nonlinear cases, respectively.


The results include the average and standard deviation of the classification accuracies over 10-fold cross-validation. Bold values indicate the best predictive accuracy. In the linear case, MinHausdorff based MILTWSVM attains the highest accuracy on the Musk 1, Winter Wren, Elephant, Fox, Tiger, and westEast datasets, MaxHausdorff based MILTWSVM attains the highest accuracy on the Musk 2 and Brown Creeper datasets, and EMD based MILTWSVM attains the highest accuracy on the Mutagenesisatoms dataset. MILTWSVM with the other bag dissimilarity measures performs poorly on all types of datasets. Similarly, in the nonlinear case, MinHausdorff and MaxHausdorff based MILTWSVM classifiers perform better on the Winter Wren, Brown Creeper, Elephant, Tiger, Fox, eastWest, and westEast datasets, while EMD based nonlinear MILTWSVM achieves the highest predictive accuracy on the Mutagenesisatoms dataset. Therefore, we can conclude that MILTWSVM performs best with the MinHausdorff and MaxHausdorff dissimilarity scores. Further, we have compared the performance of the MinHausdorff based MILTWSVM classifier with the existing MIL approaches Expectation-Maximization Diverse Density (EMDD), Diverse Density (DD), Multi-Instance Logistic Regression (MILR), Citation kNN, and Multi-Instance Support Vector Machine (MISVM), as shown in Table 4.
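To make the two best-performing measures concrete, the following sketch computes a bag's dissimilarity vector. It assumes the commonly used definitions: MinHausdorff as the smallest distance between any pair of instances from the two bags, and MaxHausdorff as the classical symmetric Hausdorff distance, both with Euclidean instance distance. The function names and toy bags are hypothetical, and the exact formulation used in this study may differ in detail.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two instance feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def min_hausdorff(bag_a, bag_b):
    """Smallest distance between any instance of bag_a and any of bag_b."""
    return min(euclidean(a, b) for a in bag_a for b in bag_b)


def directed_hausdorff(bag_a, bag_b):
    """Largest nearest-neighbour distance from instances of bag_a to bag_b."""
    return max(min(euclidean(a, b) for b in bag_b) for a in bag_a)


def max_hausdorff(bag_a, bag_b):
    """Classical symmetric Hausdorff distance between two bags."""
    return max(directed_hausdorff(bag_a, bag_b),
               directed_hausdorff(bag_b, bag_a))


def dissimilarity_vector(bag, training_bags, measure):
    """Represent a bag by its dissimilarities to every training bag."""
    return [measure(bag, other) for other in training_bags]


# Toy bags with different numbers of two-dimensional instances.
bag1 = [[0.0, 0.0], [3.0, 4.0]]
bag2 = [[0.0, 1.0]]
bag3 = [[6.0, 8.0]]
training = [bag1, bag2, bag3]

min_vec = dissimilarity_vector(bag1, training, min_hausdorff)
max_vec = dissimilarity_vector(bag1, training, max_hausdorff)
print(min_vec)
print(max_vec)
```

Each bag, whatever its size, is mapped to a vector whose length equals the number of training bags, so a standard vector-based classifier can then be trained on these representations.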
 
Friedman test statistic = 36.83. 
Table 4 shows that the proposed MILTWSVM classifier achieves the highest predictive accuracy on all ten benchmark datasets and thus performs better than the other existing MIL approaches.
5.4. Statistical Comparison
The Friedman test [32, 33] ranks the classifiers according to their performance on each dataset independently. The best performing classifier receives the first rank, the second best receives the second rank, and so on. Classifiers that show the same performance receive the average of the tied ranks. Let $r_i^j$ be the rank of the $j$th classifier on the $i$th dataset. The Friedman test statistic is calculated as

$$\chi_F^2 = \frac{12N}{k(k+1)}\left[\sum_{j=1}^{k} R_j^2 - \frac{k(k+1)^2}{4}\right] \qquad (31)$$

Here, $N$ represents the number of datasets used in this study for comparison purposes, $k$ denotes the number of classifiers, and $R_j = \frac{1}{N}\sum_{i=1}^{N} r_i^j$ is the average rank of the $j$th classifier. The Friedman test statistic follows a chi-squared distribution with $k-1$ degrees of freedom. The null hypothesis, which states that there is no difference between the classifiers, is rejected or retained according to the value of the statistic: if the statistic is larger than the critical value corresponding to $k-1$ degrees of freedom, we reject the null hypothesis. The Nemenyi post hoc test [75] then reports the significant differences between individual classifiers. According to this test, two classifiers are significantly different if their average ranks differ by at least the critical difference (CD), which is obtained as

$$CD = q_{\alpha}\sqrt{\frac{k(k+1)}{6N}}$$

where $q_{\alpha}$ is calculated on the basis of the studentized range statistic. The results of the Friedman test are plotted using the modified Demšar significance diagram [76].
We have calculated the average rank of each MIL approach on the basis of its performance on each dataset (see Table 4). Then, the Friedman test statistic is calculated according to (31). Table 4 shows that the MinHausdorff dissimilarity based MILTWSVM classifier achieves the best average rank among all MIL approaches, and the MaxHausdorff based MILTWSVM classifier achieves the second best. For $\alpha = 0.05$, the critical value for 8 degrees of freedom from the chi-squared table is 15.507. The obtained Friedman test statistic is 36.83, which is much higher than this critical value. Hence, we reject the null hypothesis that there is no difference between the classifiers. The critical value $q_{\alpha}$ for nine classifiers is 3.102, and the critical difference for $\alpha = 0.05$ is then determined using (34).
Figure 9 depicts the Demšar significance diagram, in which the MIL approaches are arranged in ascending order of their average rank along one axis and the corresponding ranks are shown on the other axis. The critical difference has been added to the average rank of each MIL approach in order to analyze whether the proposed approach is significantly better than the other MIL approaches. Two red vertical lines mark the end of the tail of the best performing MIL approach and the start of the next significantly different MIL approach. The figure shows that the existing MIL approaches EMDD, DD, MILR, Citation kNN, and MISVM perform significantly worse than the best performing approach, which is MinHausdorff based MILTWSVM. Thus, we can conclude that the proposed MILTWSVM is a suitable choice in multiple instance learning problem domains.
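The ranking, Friedman statistic, and critical difference used in this comparison can be reproduced with a short sketch. It uses the standard formulas from Demšar [33]; the function names and the toy accuracy matrix are hypothetical, and the critical difference is evaluated under the assumption that all nine classifiers are ranked on all ten datasets.

```python
import math


def average_ranks(scores):
    """scores[i][j] is the accuracy of classifier j on dataset i.
    Rank 1 is best; tied classifiers share the average of their ranks."""
    n_datasets, k = len(scores), len(scores[0])
    rank_sums = [0.0] * k
    for row in scores:
        order = sorted(range(k), key=lambda j: -row[j])
        pos = 0
        while pos < k:
            end = pos
            while end + 1 < k and row[order[end + 1]] == row[order[pos]]:
                end += 1
            shared = (pos + end) / 2 + 1  # mean of ranks pos+1 .. end+1
            for j in order[pos:end + 1]:
                rank_sums[j] += shared
            pos = end + 1
    return [total / n_datasets for total in rank_sums]


def friedman_statistic(avg_ranks, n_datasets):
    """Friedman chi-squared statistic computed from the average ranks."""
    k = len(avg_ranks)
    return 12 * n_datasets / (k * (k + 1)) * (
        sum(r * r for r in avg_ranks) - k * (k + 1) ** 2 / 4)


def critical_difference(q_alpha, k, n_datasets):
    """Nemenyi critical difference for k classifiers on n_datasets."""
    return q_alpha * math.sqrt(k * (k + 1) / (6 * n_datasets))


# Toy accuracies: 3 datasets (rows) x 3 classifiers (columns).
toy_scores = [
    [0.90, 0.80, 0.70],
    [0.85, 0.85, 0.60],  # the first two classifiers tie on this dataset
    [0.95, 0.70, 0.80],
]
ranks = average_ranks(toy_scores)
statistic = friedman_statistic(ranks, n_datasets=3)

# Values quoted in the text: q_alpha = 3.102, nine classifiers, ten datasets.
cd = critical_difference(3.102, k=9, n_datasets=10)
print(ranks, statistic, round(cd, 3))
```

Under these assumptions the critical difference evaluates to roughly 3.80, so two approaches would be considered significantly different when their average ranks differ by at least that amount.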
6. Conclusion
This study has focused on multiple instance learning, in which a classifier learns from a set of feature vectors (a bag) instead of a single feature vector, and has proposed an MIL approach based on TWSVM, termed MILTWSVM. Each bag is represented by a vector of its dissimilarities to the other bags in the training set, and the proposed classifier is trained on this summarized information. First, the performance of the proposed MILTWSVM classifier has been compared across different dissimilarity scores on ten benchmark MIL datasets. We have also compared the performance of MILTWSVM with six existing MIL approaches. The experimental results demonstrate that the proposed approach achieves the highest predictive accuracy among the compared MIL approaches on all ten datasets, which supports the suitability of MILTWSVM in multiple instance learning scenarios. These findings are also supported by the statistical analysis performed using the Friedman test, which shows that MILTWSVM is significantly better than EMDD, DD, MILR, Citation kNN, and MISVM. In the future, we are interested in extending MILTWSVM to the multi-instance multi-label scenario.
Competing Interests
The authors declare that they have no conflict of interests regarding the publication of this paper.
References
[1] T. G. Dietterich, R. H. Lathrop, and T. Lozano-Pérez, “Solving the multiple instance problem with axis-parallel rectangles,” Artificial Intelligence, vol. 89, no. 1-2, pp. 31–71, 1997.
[2] O. Maron and T. Lozano-Pérez, “A framework for multiple-instance learning,” Advances in Neural Information Processing Systems (NIPS), vol. 10, MIT Press, 1998.
[3] Z. H. Zhou, “Multi-instance learning: a survey,” Tech. Rep., Department of Computer Science and Technology, Nanjing University, 2004.
[4] J. Foulds and E. Frank, “A review of multi-instance learning assumptions,” The Knowledge Engineering Review, vol. 25, no. 1, pp. 1–25, 2010.
[5] J. Amores, “Multiple instance classification: review, taxonomy and comparative study,” Artificial Intelligence, vol. 201, pp. 81–105, 2013.
[6] S. Andrews, I. Tsochantaridis, and T. Hofmann, “Support vector machines for multiple-instance learning,” Advances in Neural Information Processing Systems, vol. 15, MIT Press, Cambridge, Mass, USA, pp. 561–568, 2003.
[7] Y. Chen, J. Bi, and J. Z. Wang, “MILES: multiple-instance learning via embedded instance selection,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 28, no. 12, pp. 1931–1947, 2006.
[8] I. Gondra and T. Xu, “A multiple instance learning based framework for semantic image segmentation,” Multimedia Tools and Applications, vol. 48, no. 2, pp. 339–365, 2010.
[9] Q. Zhang, S. A. Goldman, W. Yu, and J. E. Fritts, “Content-based image retrieval using multiple-instance learning,” in Proceedings of the International Conference on Machine Learning (ICML '02), vol. 2, pp. 682–689, 2002.
[10] Z.-H. Zhou, Y.-Y. Sun, and Y.-F. Li, “Multi-instance learning by treating instances as non-I.I.D. samples,” in Proceedings of the 26th International Conference on Machine Learning (ICML '09), pp. 1249–1256, Montreal, Canada, June 2009.
[11] H. Wang, F. Nie, and H. Huang, “Learning instance specific distance for multi-instance classification,” in Proceedings of the 25th AAAI Conference on Artificial Intelligence, pp. 507–512, San Francisco, Calif, USA, August 2011.
[12] P. Viola, J. C. Platt, and C. Zhang, “Multiple instance boosting for object detection,” in Proceedings of the Annual Conference on Neural Information Processing Systems (NIPS '05), vol. 18, pp. 1417–1424, December 2005.
[13] B. Babenko, N. Verma, P. Dollár, and S. J. Belongie, “Multiple instance learning with manifold bags,” in Proceedings of the 28th International Conference on Machine Learning (ICML '11), pp. 81–88, 2011.
[14] S. Ali and M. Shah, “Human action recognition in videos using kinematic features and multiple instance learning,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 32, no. 2, pp. 288–303, 2010.
[15] B. Babenko, M.-H. Yang, and S. Belongie, “Visual tracking with online multiple instance learning,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR '09), pp. 983–990, IEEE, Miami Beach, Fla, USA, June 2009.
[16] C. Leistner, A. Saffari, and H. Bischof, “MIForests: multiple-instance learning with randomized trees,” in Proceedings of the European Conference on Computer Vision (ECCV '10), pp. 29–42, Springer, Berlin, Germany, 2010.
[17] Y. Xie, Y. Qu, C. Li, and W. Zhang, “Online multiple instance gradient feature selection for robust visual tracking,” Pattern Recognition Letters, vol. 33, no. 9, pp. 1075–1082, 2012.
[18] B. Zeisl, C. Leistner, A. Saffari, and H. Bischof, “Online semi-supervised multiple-instance boosting,” in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '10), p. 1879, IEEE, San Francisco, Calif, USA, June 2010.
[19] Z. Jorgensen, Y. Zhou, and M. Inge, “A multiple instance learning strategy for combating good word attacks on spam filters,” The Journal of Machine Learning Research, vol. 9, pp. 1115–1146, 2008.
[20] L. Sørensen, M. Loog, D. M. J. Tax, W. J. Lee, M. De Bruijne, and R. P. W. Duin, “Dissimilarity-based multiple instance learning,” in Structural, Syntactic, and Statistical Pattern Recognition, vol. 6218, pp. 129–138, Springer, 2010.
[21] V. Cheplygina, D. M. J. Tax, and M. Loog, “Multiple instance learning with bag dissimilarities,” Pattern Recognition, vol. 48, no. 1, pp. 264–275, 2015.
[22] J. Wang and J. D. Zucker, “Solving multiple-instance problem: a lazy learning approach,” in Proceedings of the 17th International Conference on Machine Learning, pp. 1119–1125, San Francisco, Calif, USA, 2000.
[23] T. Gärtner, P. A. Flach, A. Kowalczyk, and A. J. Smola, “Multi-instance kernels,” in Proceedings of the International Conference on Machine Learning (ICML '02), vol. 2, pp. 179–186, 2002.
[24] H.-Y. Wang, Q. Yang, and H. Zha, “Adaptive p-posterior mixture-model kernels for multiple instance learning,” in Proceedings of the 25th International Conference on Machine Learning, pp. 1136–1143, Helsinki, Finland, July 2008.
[25] D. Zhang, Y. Liu, L. Si, J. Zhang, and R. D. Lawrence, “Multiple instance learning on structured data,” in Advances in Neural Information Processing Systems (NIPS), vol. 24, pp. 145–153, 2011.
[26] Q. Zhang and S. A. Goldman, “EM-DD: an improved multiple-instance learning technique,” in Advances in Neural Information Processing Systems (NIPS), vol. 14, pp. 1073–1080, MIT Press, 2001.
[27] Jayadeva, R. Khemchandani, and S. Chandra, “Twin support vector machines for pattern classification,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 29, no. 5, pp. 905–910, 2007.
[28] Y. Tian, Z. Qi, X. Ju, Y. Shi, and X. Liu, “Nonparallel support vector machines for pattern classification,” IEEE Transactions on Cybernetics, vol. 44, no. 7, pp. 1067–1079, 2014.
[29] D. Tomar and S. Agarwal, “Twin support vector machine: a review from 2007 to 2014,” Egyptian Informatics Journal, vol. 16, no. 1, pp. 55–69, 2015.
[30] O. L. Mangasarian and E. W. Wild, “Multisurface proximal support vector machine classification via generalized eigenvalues,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 28, no. 1, pp. 69–74, 2006.
[31] X. Xu and E. Frank, “Logistic regression and boosting for labeled bags of instances,” in Advances in Knowledge Discovery and Data Mining, H. Dai, R. Srikant, and C. Zhang, Eds., vol. 3056 of Lecture Notes in Computer Science, pp. 272–281, Springer, 2004.
[32] M. Friedman, “A comparison of alternative tests of significance for the problem of m rankings,” The Annals of Mathematical Statistics, vol. 11, no. 1, pp. 86–92, 1940.
[33] J. Demšar, “Statistical comparisons of classifiers over multiple data sets,” Journal of Machine Learning Research, vol. 7, pp. 1–30, 2006.
[34] Z. Zhao, G. Fu, S. Liu et al., “Drug activity prediction using multiple-instance learning via joint instance and feature selection,” BMC Bioinformatics, vol. 14, supplement 14, article S16, 2013.
[35] Z.-H. Zhou, K. Jiang, and M. Li, “Multi-instance learning based web mining,” Applied Intelligence, vol. 22, no. 2, pp. 135–147, 2005.
[36] Y. Chevaleyre and J. D. Zucker, “Solving multiple-instance and multiple-part learning problems with decision trees and rule sets. Application to the mutagenesis problem,” in Advances in Artificial Intelligence: 14th Biennial Conference of the Canadian Society for Computational Studies of Intelligence, AI 2001 Ottawa, Canada, June 7–9, 2001 Proceedings, vol. 2056 of Lecture Notes in Computer Science, pp. 204–214, Springer, Berlin, Germany, 2001.
[37] Z. Fu and A. Robles-Kelly, “Fast multiple instance learning via L1,2 logistic regression,” in Proceedings of the 19th International Conference on Pattern Recognition (ICPR '08), pp. 1–4, IEEE, Tampa, Fla, USA, December 2008.
[38] J. Ramon and L. De Raedt, “Multi instance neural networks,” in Proceedings of the ICML-2000 Workshop on Attribute-Value and Relational Learning, pp. 53–60, 2000.
[39] Z. H. Zhou and M. L. Zhang, “Neural networks for multi-instance learning,” in Proceedings of the International Conference on Intelligent Information Technology, pp. 455–459, Beijing, China, 2002.
[40] W. Shen, X. Bai, Z. Hu, and Z. Zhang, “Multiple instance subspace learning via partial random projection tree for local reflection symmetry in natural images,” Pattern Recognition, vol. 52, pp. 306–316, 2016.
[41] M. Birisan and P. A. Beling, “A multi-instance learning approach to filtering images for presentation to analysts,” Environment Systems and Decisions, vol. 34, no. 3, pp. 406–416, 2014.
[42] I. Gondra and T. Xu, “Image region reweighting via multiple instance learning,” Signal, Image and Video Processing, vol. 4, no. 4, pp. 409–417, 2010.
[43] Y.-Y. Xu, “Multiple-instance learning based decision neural networks for image retrieval and classification,” Neurocomputing, vol. 171, pp. 826–836, 2016.
[44] O. Maron and A. L. Ratan, “Multiple-instance learning for natural scene classification,” in Proceedings of the International Conference on Machine Learning (ICML '98), pp. 341–349, Madison, Wis, USA, 1998.
[45] H. Cheng, K. A. Hua, and N. Yu, “An automatic feature generation approach to multiple instance learning and its applications to image databases,” Multimedia Tools and Applications, vol. 47, no. 3, pp. 507–524, 2010.
[46] H. T. Pao, S. C. Chuang, Y. Y. Xu, and H.-C. Fu, “An EM based multiple instance learning method for image classification,” Expert Systems with Applications, vol. 35, no. 3, pp. 1468–1472, 2008.
[47] F. Sener and N. Ikizler-Cinbis, “Ensemble of multiple instance classifiers for image reranking,” Image and Vision Computing, vol. 32, no. 5, pp. 348–362, 2014.
[48] F. Li and R. Liu, “Graph-based multiple-instance learning with instance weighting for image retrieval,” in Proceedings of the 18th IEEE International Conference on Image Processing (ICIP '11), pp. 2453–2456, IEEE, Brussels, Belgium, September 2011.
[49] S. Feng, W. Xiong, B. Li, C. Lang, and X. Huang, “Hierarchical sparse representation based multi-instance semi-supervised learning with application to image categorization,” Signal Processing, vol. 94, no. 1, pp. 595–607, 2014.
[50] Y. Xu, T. Mo, Q. Feng, P. Zhong, M. Lai, and E. I.-C. Chang, “Deep learning of feature representation with multiple instance learning for medical image analysis,” in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '14), pp. 1626–1630, Florence, France, May 2014.
[51] J. Wu, Y. Yinan, C. Huang, and Y. Kai, “Deep multiple instance learning for image classification and auto-annotation,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR '15), pp. 3460–3469, IEEE, Boston, Mass, USA, June 2015.
[52] D. Kotzias, M. Denil, P. Blunsom, and N. de Freitas, “Deep multi-instance transfer learning,” https://arxiv.org/abs/1411.3128.
[53] J. Ding, H. D. Cheng, J. Huang, J. Liu, and Y. Zhang, “Breast ultrasound image classification based on multiple-instance learning,” Journal of Digital Imaging, vol. 25, no. 5, pp. 620–627, 2012.
[54] C. Li, S. Zhang, H. Zhang et al., “Using the K-nearest neighbor algorithm for the classification of lymph node metastasis in gastric cancer,” Computational and Mathematical Methods in Medicine, vol. 2012, Article ID 876545, 11 pages, 2012.
[55] T. Tong, R. Wolz, Q. Gao, R. Guerrero, J. V. Hajnal, and D. Rueckert, “Multiple instance learning for classification of dementia in brain MRI,” Medical Image Analysis, vol. 18, no. 5, pp. 808–818, 2014.
[56] G. Quellec, M. Lamard, M. D. Abràmoff et al., “A multiple-instance learning framework for diabetic retinopathy screening,” Medical Image Analysis, vol. 16, no. 6, pp. 1228–1240, 2012.
[57] W. He and Y. Wang, “Text representation and classification based on multi-instance learning,” in Proceedings of the 16th International Conference on Management Science and Engineering (ICMSE '09), pp. 34–39, Moscow, Russia, September 2009.
[58] C. Xu, W. Tao, Z. Meng, and Z. Feng, “Robust visual tracking via online multiple instance learning with Fisher information,” Pattern Recognition, vol. 48, no. 12, pp. 3917–3926, 2015.
[59] J. Xu, S. Denman, V. Reddy, C. Fookes, and S. Sridharan, “Real-time video event detection in crowded scenes using MPEG derived features: a multiple instance learning approach,” Pattern Recognition Letters, vol. 44, pp. 113–125, 2014.
[60] X. Ding, G. Zhang, Y. Ke, B. Ma, and Z. Li, “High efficient intrusion detection methodology with twin support vector machines,” in Proceedings of the International Symposium on Information Science and Engineering (ISISE '08), vol. 1, pp. 560–564, Shanghai, China, December 2008.
[61] J. He and S.-H. Zheng, “Intrusion detection model with twin support vector machines,” Journal of Shanghai Jiaotong University (Science), vol. 19, no. 4, pp. 448–454, 2014.
[62] J. A. Nasiri, N. M. Charkari, and K. Mozafari, “Energy-based model of least squares twin Support Vector Machines for human action recognition,” Signal Processing, vol. 104, pp. 248–257, 2014.
[63] H.-Y. Yang, X.-Y. Wang, P.-P. Niu, and Y.-C. Liu, “Image denoising using nonsubsampled shearlet transform and twin support vector machines,” Neural Networks, vol. 57, pp. 152–165, 2014.
[64] D. Tomar, D. Ojha, and S. Agarwal, “An emotion detection system based on multi least squares twin support vector machine,” Advances in Artificial Intelligence, vol. 2014, Article ID 282659, 11 pages, 2014.
[65] M. A. Kumar and M. Gopal, “Least squares twin support vector machines for pattern classification,” Expert Systems with Applications, vol. 36, no. 4, pp. 7535–7543, 2009.
[66] S. Agarwal and D. Tomar, “A feature selection based model for software defect prediction,” International Journal of Advanced Science and Technology, vol. 65, pp. 39–58, 2014.
[67] D. Tomar and S. Agarwal, “Prediction of defective software modules using class imbalance learning,” Applied Computational Intelligence and Soft Computing, vol. 2016, Article ID 7658207, 12 pages, 2016.
[68] D. Tomar and S. Agarwal, “Hybrid feature selection based weighted least squares twin support vector machine approach for diagnosing breast cancer, hepatitis, and diabetes,” Advances in Artificial Neural Systems, vol. 2015, Article ID 265637, 10 pages, 2015.
[69] D. Tomar, B. R. Prasad, and S. Agarwal, “An efficient Parkinson disease diagnosis system based on least squares twin support vector machine and particle swarm optimization,” in Proceedings of the 9th IEEE International Conference on Industrial and Information Systems (ICIIS '14), pp. 1–6, IEEE, Gwalior, India, December 2014.
[70] Z. Wu and C. Yang, “Study to multi-twin support vector machines and its applications in speaker recognition,” in Proceedings of the International Conference on Computational Intelligence and Software Engineering (CiSE '09), pp. 1–4, Wuhan, China, December 2009.
[71] Y. Rubner, C. Tomasi, and L. J. Guibas, “Earth mover's distance as a metric for image retrieval,” International Journal of Computer Vision, vol. 40, no. 2, pp. 99–121, 2000.
[72] C. W. Hsu, C. C. Chang, and C. J. Lin, “A practical guide to support vector classification,” Tech. Rep., Department of Computer Science, National Taiwan University, Taipei, Taiwan, 2013.
[73] R. Khemchandani and S. Sharma, “Robust Least Squares Twin Support vector machine for human activity recognition,” Applied Soft Computing, vol. 47, pp. 33–46, 2016.
[74] C.-W. Hsu and C.-J. Lin, “A comparison of methods for multiclass support vector machines,” IEEE Transactions on Neural Networks, vol. 13, no. 2, pp. 415–425, 2002.
[75] P. Nemenyi, Distribution-free multiple comparisons [Ph.D. thesis], Princeton University, 1963.
[76] I. Brown and C. Mues, “An experimental comparison of classification algorithms for imbalanced credit scoring data sets,” Expert Systems with Applications, vol. 39, no. 3, pp. 3446–3453, 2012.
Copyright
Copyright © 2016 Divya Tomar and Sonal Agarwal. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.