Jinhua Li, Shiji Song, Yuli Zhang, Zhen Zhou, "Robust K-Median and K-Means Clustering Algorithms for Incomplete Data", Mathematical Problems in Engineering, vol. 2016, Article ID 4321928, 8 pages, 2016. https://doi.org/10.1155/2016/4321928
Robust K-Median and K-Means Clustering Algorithms for Incomplete Data
Abstract
Incomplete data with missing feature values are prevalent in clustering problems. Traditional clustering methods first estimate the missing values by imputation and then apply classical clustering algorithms for complete data, such as K-median and K-means. However, in practice, it is often hard to obtain accurate estimates of the missing values, which degrades clustering performance. To enhance the robustness of clustering algorithms, this paper represents the missing values by interval data and introduces the concept of a robust cluster objective function. A minimax robust optimization (RO) formulation is presented to provide clustering results that are insensitive to estimation errors. To solve the proposed RO problem, we propose robust K-median and K-means clustering algorithms with low time and space complexity. Comparisons and analysis of experimental results on both artificially generated and real-world incomplete data sets validate the robustness and effectiveness of the proposed algorithms.
1. Introduction
In data mining and machine learning, data sets commonly contain observations with missing feature values. Such incomplete data occur in a wide array of application domains for various reasons, including improper data collection, the high cost of obtaining some feature values, and missing responses in questionnaires. For example, online shopping users may rate only a small fraction of the available books, movies, or songs, which leads to massive amounts of missing feature values (Marlin [1]). A theoretical study of pattern recognition for incomplete data was first conducted by Sebestyen [2] under certain probabilistic assumptions. Dempster et al. [3] proposed the expectation-maximization (EM) algorithm to compute maximum likelihood estimates from incomplete data. Early empirical studies on incomplete data were reported by Dixon [4] and Jain and Dubes [5].
Clustering analysis has been regarded as an effective method to extract useful features and explore hidden data patterns. Because of missing feature values, there is a pressing need to cluster incomplete data in many fields, such as image analysis [6], information retrieval [7], and clinical medicine [8]. The basic approach to clustering incomplete data is the two-step method, which first estimates the missing feature values by imputation and then applies a classical clustering method. Troyanskaya et al. [9] investigate three imputation methods for gene microarray data: the singular value decomposition, weighted K-nearest neighbors (KNN), and row average methods. They conclude that the KNN method appears to provide more robust and sensitive missing value estimates than the others. Miyamoto et al. [10] use a similar imputation-based fuzzy c-means (FCM) method to handle incomplete data. Acuna and Rodriguez [11] and Farhangfar et al. [12] compare the performance of different imputation methods for classification problems, including single imputation methods, such as the mean, median, hot deck, and naive Bayes methods, and the polytomous regression based multiple imputation method. Saravanan and Sailakshmi [13] propose a fuzzy possibilistic c-means approach, optimized with support vector regression and a genetic algorithm, to impute the missing values.
Besides imputation-based methods, Hathaway and Bezdek [14] propose four strategies to make the classical FCM clustering algorithm applicable to incomplete data. The simplest, the whole data strategy (WDS), deletes all incomplete samples and applies the FCM algorithm to the remaining complete data; it is useful only when a small fraction of the samples contain missing values. The partial distance strategy (PDS) computes the distances required by FCM from the observed features only. PDS has also been used in pattern recognition by Dixon [4] and in fuzzy clustering with missing values by Miyamoto et al. [10] and Timm and Kruse [15]. The third and fourth strategies can be viewed as iterative imputation-based methods. The optimal completion strategy (OCS) imputes the missing values by the maximum likelihood estimate in an iterative optimization procedure, and the nearest prototype strategy (NPS), a simple modification of OCS, imputes each missing element by considering only the nearest prototype. Clustering methods that neither eliminate nor impute incomplete data have also been proposed. Shibayama [16] uses principal component analysis (PCA) to capture the structure of incomplete data, and Honda and Ichihashi [17] propose linear fuzzy clustering methods based on local PCA. Zhang and Chen [18] propose a kernel-based FCM clustering algorithm for incomplete data, which estimates the missing feature values from the fuzzy memberships and cluster prototypes. Sadaaki et al. [19] further combine linear fuzzy clustering with the PDS, OCS, and NPS strategies proposed by Hathaway and Bezdek [14].
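As a rough illustration of PDS, the Python sketch below computes a squared Euclidean distance over the observed features only and rescales it by the ratio of the total number of features to the number of observed ones, which is the form we recall from Hathaway and Bezdek [14]. The exact weighting, the function name `partial_distance`, and the use of `None` to mark a missing entry are assumptions of this sketch, not details given in this paper.

```python
def partial_distance(x, z):
    """PDS-style distance between an incomplete object x and a prototype z.

    Missing entries of x are None (assumption). The squared differences over
    the observed features are summed and scaled by m / (#observed features),
    so objects with fewer observed features are not systematically "closer".
    """
    m = len(x)
    observed = [j for j in range(m) if x[j] is not None]
    if not observed:
        raise ValueError("object has no observed features")
    partial = sum((x[j] - z[j]) ** 2 for j in observed)
    return m / len(observed) * partial
```

For example, `partial_distance([1.0, None, 3.0], [0.0, 5.0, 1.0])` uses features 1 and 3 only, giving (1 + 4) scaled by 3/2, that is 7.5.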
Both direct imputation and iterative imputation methods (such as OCS and NPS) assume that each missing feature value can be well estimated by a single value. However, accurate estimates of the missing values are usually hard to obtain, so imputation-based clustering methods are sensitive to the estimation accuracy. To address this issue, Li et al. [20] use nearest-neighbor intervals to represent the missing values and extend FCM by defining a new distance function for interval data. Interval data have proved to be an effective way to handle missing values and have been used to develop further clustering methods. Li et al. [21] also represent the missing values by interval data but use a genetic algorithm to search for appropriate imputations within the intervals. Wang et al. [22] use an improved backpropagation (BP) neural network to estimate the interval data for missing values. Zhang et al. [23] propose an improved interval construction method based on preclassification results and use particle swarm optimization to search for the optimal clustering. Zhang et al. [8] represent the missing values by probabilistic information granules and design an efficient tri-level alternating optimization method that finds both the optimal clustering results and the optimal missing values simultaneously.
Recently, robust optimization has been widely accepted as an effective way to handle uncertain or missing data and has been applied in data mining and machine learning, for example, in the minimax probability machine [24–27], robust support vector machines [28, 29], and robust quadratic regression [30]. This paper aims at designing robust clustering algorithms for incomplete data. The improved interval construction method based on preclassification [23] is used to obtain interval data for the missing values. Based on this interval representation, we present robust K-median and K-means clustering algorithms. Unlike existing algorithms, which use either an interval distance function or optimal imputation [20, 21, 23], we reformulate the clustering problem as a minimax robust optimization problem over the interval data.
Specifically, for given cluster prototype and membership matrices, we introduce the concept of a robust clustering objective function, defined as the maximum of the clustering objective function when the missing values vary within the constructed intervals. The proposed algorithms then seek cluster prototype and membership matrices that minimize this robust clustering objective function. For both the robust K-median and K-means clustering problems, we give equivalent reformulations of the robust objective function and present effective solution methods. Compared with existing methods, the proposed algorithms are insensitive to estimation errors in the constructed intervals, especially when the missing rate is high. Comparisons and analysis of numerical results on UCI data sets also validate the effectiveness of the proposed robust algorithms.
Compared with existing algorithms, the proposed robust clustering algorithms have two advantages. First, they cluster incomplete data without imputing the missing feature values and provide robust clustering results that are insensitive to estimation errors; our experiments confirm their robustness and accuracy in comparison with existing algorithms. Second, the proposed algorithms are easy to understand and implement. Specifically, the time complexities of the robust K-median and K-means clustering algorithms are $O(nmkT)$ and $O(nm(k + \log n)T)$, respectively, where $n$ is the number of objects, $m$ is the dimension of the features, $k$ is the number of clusters, and $T$ is the number of iterations. Our algorithms have computational complexity similar to that of the classical K-median and K-means clustering algorithms and are more efficient than the clustering algorithm for incomplete data proposed by Zhang et al. [8] with the time complexity of  (when  for the robust K-means clustering algorithm).
The paper is organized as follows. Section 2 reviews the classical K-median and K-means algorithms and presents the robust K-median and K-means clustering problems. Section 3 gives effective algorithms for the proposed robust optimization problems. Section 4 reports experimental results. Finally, Section 5 concludes the paper and outlines directions for further research.
2. Robust Clustering Algorithms
2.1. K-Median and K-Means Clustering for Complete Data
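Consider the problem of clustering a set of $n$ objects $N = \{1, \dots, n\}$ into $k$ clusters. For each object $i \in N$, we have a set of $m$ features $M = \{1, \dots, m\}$, where $x_{ij}$ describes the $j$th feature of object $i$ quantitatively. Let $x_i = (x_{i1}, \dots, x_{im})$ be the feature vector of object $i$ and $X = (x_{ij})_{n \times m}$ be the feature matrix or data set.
The task of clustering can be formulated as an optimization problem that minimizes the following clustering objective function:
$$f_p(Z, W) = \sum_{i=1}^{n} \sum_{l=1}^{k} w_{il} \sum_{j=1}^{m} \left| x_{ij} - z_{lj} \right|^{p} \tag{1}$$
under the following constraints:
$$\sum_{l=1}^{k} w_{il} = 1, \qquad w_{il} \in \{0, 1\}, \qquad \forall i \in N,\ l \in K, \tag{2}$$
where $K = \{1, \dots, k\}$ and $p \in \{1, 2\}$. For $l \in K$, $z_l = (z_{l1}, \dots, z_{lm})$ is the $l$th cluster prototype and, for any $i \in N$ and $l \in K$, $w_{il}$ indicates whether object $i$ belongs to the $l$th cluster. K-median and K-means are effective algorithms for solving the clustering problem with $p = 1$ and $p = 2$, respectively. In the following, let $Z = (z_{lj})_{k \times m}$ denote the cluster prototype matrix and $W = (w_{il})_{n \times k}$ the membership matrix.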
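A direct evaluation of objective (1) under constraints (2) can be sketched in Python as follows; the function name and the list-of-lists data layout are illustrative assumptions.

```python
def clustering_objective(X, Z, W, p):
    """Evaluate f_p(Z, W) = sum_i sum_l w_il * sum_j |x_ij - z_lj|^p, see (1).

    X: n x m feature rows, Z: k x m prototypes, W: n x k 0/1 membership matrix.
    p = 1 corresponds to K-median and p = 2 to K-means.
    """
    if p not in (1, 2):
        raise ValueError("p must be 1 or 2")
    total = 0.0
    for x, w_row in zip(X, W):
        # Constraints (2): each object belongs to exactly one cluster.
        if sorted(w_row) != [0] * (len(w_row) - 1) + [1]:
            raise ValueError("each row of W must contain exactly one 1")
        for z, w in zip(Z, w_row):
            if w:
                total += sum(abs(a - b) ** p for a, b in zip(x, z))
    return total
```

For instance, with `X = [[0.0, 0.0], [1.0, 2.0], [4.0, 4.0]]`, `Z = [[0.0, 0.0], [4.0, 4.0]]`, and `W = [[1, 0], [1, 0], [0, 1]]`, the function returns 3.0 for p = 1 and 5.0 for p = 2.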
Both algorithms solve the clustering problem in iterative ways as follows.
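Step 1. Set the iteration index $t = 0$ and randomly select $k$ different objects as the initial cluster prototypes $z_1^{(0)}, \dots, z_k^{(0)}$.
Step 2. Let $t = t + 1$ and update the membership matrix $W^{(t)}$ by fixing the cluster prototype matrix $Z^{(t-1)}$. For any $i \in N$, randomly select $l^* \in \arg\min_{l \in K} \sum_{j \in M} |x_{ij} - z_{lj}^{(t-1)}|^p$, and set $w_{il^*}^{(t)} = 1$ and, for any $l \neq l^*$, set $w_{il}^{(t)} = 0$.
Step 3. Update the cluster prototype matrix $Z^{(t)}$ by fixing the membership matrix $W^{(t)}$. When $p = 1$, for any $l \in K$ and $j \in M$, set $z_{lj}^{(t)}$ as the median of the $j$th feature values of the objects in cluster $l$. When $p = 2$, for any $l \in K$, set $z_l^{(t)}$ as the centroid of the objects in cluster $l$; that is,
$$z_l^{(t)} = \frac{\sum_{i=1}^{n} w_{il}^{(t)} x_i}{\sum_{i=1}^{n} w_{il}^{(t)}}. \tag{3}$$
Step 4. If $z_{lj}^{(t)} = z_{lj}^{(t-1)}$ for all $l \in K$ and $j \in M$, then stop and return $Z^{(t)}$ and $W^{(t)}$; otherwise, go to Step 2.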
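The four steps above can be sketched in Python as a single alternating loop. This is a minimal illustration, not the authors' code: the function name, the `max_iter` safeguard, and the choice to keep an empty cluster's previous prototype are assumptions; ties in Step 2 are broken at random, as the text describes.

```python
import random
import statistics


def classical_kclustering(X, k, p, max_iter=100, seed=None):
    """Alternate Step 2 (assignment) and Step 3 (prototype update) until the
    prototypes stop changing (Step 4). p = 1: K-median, p = 2: K-means."""
    if p not in (1, 2):
        raise ValueError("p must be 1 or 2")
    rng = random.Random(seed)
    n = len(X)
    # Step 1: k distinct objects serve as the initial prototypes.
    Z = [list(X[i]) for i in rng.sample(range(n), k)]
    for _ in range(max_iter):
        # Step 2: assign each object to a nearest prototype, ties broken at random.
        labels = []
        for x in X:
            d = [sum(abs(a - b) ** p for a, b in zip(x, z)) for z in Z]
            best = min(d)
            labels.append(rng.choice([l for l in range(k) if d[l] == best]))
        # Step 3: coordinate-wise median (p = 1) or centroid (p = 2) of each cluster.
        new_Z = []
        for l in range(k):
            members = [X[i] for i in range(n) if labels[i] == l]
            if not members:  # empty cluster: keep its previous prototype (assumption)
                new_Z.append(Z[l])
            elif p == 1:
                new_Z.append([statistics.median(col) for col in zip(*members)])
            else:
                new_Z.append([sum(col) / len(members) for col in zip(*members)])
        # Step 4: stop once the prototypes no longer change.
        if new_Z == Z:
            break
        Z = new_Z
    W = [[int(labels[i] == l) for l in range(k)] for i in range(n)]
    return Z, W
```

Because the initial prototypes are random, repeated runs with different seeds can end in different local optima, which is why the experiments in Section 4 average over many repetitions.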
2.2. Robust K-Median and K-Means Clustering for Incomplete Data
For various reasons, the feature matrix $X$ may contain missing components. For example, for a certain object $i$ we may have $x_{i3} = *$, which indicates that the third feature value of object $i$ is missing. We refer to a data set as incomplete if it contains at least one missing feature value; that is, there exist at least one $i \in N$ and one $j \in M$ such that $x_{ij} = *$. To describe the incomplete data set, for any $i \in N$, we further partition the feature set $M$ of object $i$ into two subsets:
$$M_i^{o} = \{ j \in M : x_{ij} \text{ is observed} \}, \qquad M_i^{m} = \{ j \in M : x_{ij} = * \}. \tag{4}$$
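In practice, it is difficult to obtain accurate estimates of missing feature values. Thus, in this paper, we represent missing values by intervals. Specifically, for any $i \in N$, we use an interval $[x_{ij}^{-}, x_{ij}^{+}]$ to represent the unknown missing feature value $x_{ij}$ with $j \in M_i^{m}$, and use the degenerate interval $[x_{ij}, x_{ij}]$ to represent the known feature value $x_{ij}$ with $j \in M_i^{o}$. To simplify notation, in the following, let
$$\hat{x}_{ij} = \frac{x_{ij}^{-} + x_{ij}^{+}}{2}, \qquad r_{ij} = \frac{x_{ij}^{+} - x_{ij}^{-}}{2} \qquad \text{for any } j \in M_i^{m},$$
and $\hat{x}_{ij} = x_{ij}$, $r_{ij} = 0$ for any $j \in M_i^{o}$, so that every feature value lies in $[\hat{x}_{ij} - r_{ij}, \hat{x}_{ij} + r_{ij}]$. For details on how to construct these intervals for missing values, see Li et al. [20] and Zhang et al. [23].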
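Converting interval bounds into the midpoints $\hat{x}_{ij}$ and half-widths $r_{ij}$ takes only a few lines. In this sketch an observed value is passed as the degenerate interval $[x_{ij}, x_{ij}]$, so its half-width is automatically zero; the function name is hypothetical.

```python
def interval_center_radius(lower, upper):
    """Turn interval bounds into (X_hat, R) with x_hat = (lo + up) / 2 and
    r = (up - lo) / 2. lower and upper are n x m lists; an observed value is
    given as lower[i][j] == upper[i][j], which yields r_ij = 0."""
    X_hat, R = [], []
    for lo_row, up_row in zip(lower, upper):
        if any(a > b for a, b in zip(lo_row, up_row)):
            raise ValueError("each interval needs lower <= upper")
        X_hat.append([(a + b) / 2 for a, b in zip(lo_row, up_row)])
        R.append([(b - a) / 2 for a, b in zip(lo_row, up_row)])
    return X_hat, R
```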
This paper aims at designing robust clustering methods whose worst-case performance can be guaranteed. The logic of the proposed method can be explained as a two-player game: a clustering decision-maker first makes a clustering decision, and then an adversarial player chooses the values of the missing features from the given intervals. Thus, a robust clustering decision-maker selects the clustering that minimizes the worst-case cluster objective function.
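To introduce the robust clustering problem, we first define the following robust cluster objective function:
$$\hat{f}_p(Z, W) = \max_{\Delta \in \mathcal{U}} \sum_{i=1}^{n} \sum_{l=1}^{k} w_{il} \sum_{j=1}^{m} \left| \hat{x}_{ij} + \delta_{ij} - z_{lj} \right|^{p}, \tag{5}$$
where $\Delta = (\delta_{ij})_{n \times m}$, $\mathcal{U} = \{ \Delta : |\delta_{ij}| \le r_{ij},\ i \in N,\ j \in M \}$, and $\delta_{ij}$ represents the uncertainty in the $j$th feature of object $i$. Thus, the robust clustering problem can be formulated as follows:
$$\text{(RCP)} \qquad \min_{Z, W} \ \hat{f}_p(Z, W) \quad \text{s.t. constraints (2)}. \tag{6}$$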
(RCP) is a discrete minimax problem. When there are no missing data, that is, $M_i^{m} = \emptyset$ for any $i \in N$, all $r_{ij} = 0$ and (RCP) reduces to the classical clustering problem (1). Since problem (1) is NP-hard [31, 32], finding the global optimal solution of (RCP) is a challenging task. In the next section, we propose effective robust K-median and K-means algorithms for (RCP).
3. Algorithms
3.1. Robust K-Median Clustering Algorithm
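In this subsection, we provide a robust K-median clustering algorithm for (RCP) when $p = 1$. We first show how to simplify the robust cluster objective function:
$$\hat{f}_1(Z, W) = \max_{\Delta \in \operatorname{ext}(\mathcal{U})} \sum_{i=1}^{n} \sum_{l=1}^{k} w_{il} \sum_{j=1}^{m} \left| \hat{x}_{ij} + \delta_{ij} - z_{lj} \right| \tag{7}$$
$$\hat{f}_1(Z, W) = \sum_{i=1}^{n} \sum_{l=1}^{k} w_{il} \sum_{j=1}^{m} \max_{\delta_{ij} \in \{-r_{ij},\, r_{ij}\}} \left| \hat{x}_{ij} + \delta_{ij} - z_{lj} \right|, \tag{8}$$
where $\operatorname{ext}(\mathcal{U}) = \{ \Delta : \delta_{ij} \in \{-r_{ij}, r_{ij}\} \}$ is the set of extreme points of $\mathcal{U}$. Here (7) uses the fact that the maximum of a convex function over a convex set is attained at an extreme point, and (8) uses constraints (2): each object belongs to exactly one cluster, so each $\delta_{ij}$ appears in exactly one nonzero term and the maximization separates over the entries of $\Delta$. Since $r_{ij} \ge 0$ and $\max\{|a - r_{ij}|, |a + r_{ij}|\} = |a| + r_{ij}$ for any real $a$, for any $i \in N$ and $j \in M$, we further have
$$\hat{f}_1(Z, W) = \sum_{i=1}^{n} \sum_{l=1}^{k} w_{il} \sum_{j=1}^{m} \left| \hat{x}_{ij} - z_{lj} \right| + \sum_{i=1}^{n} \sum_{j \in M_i^{m}} r_{ij}. \tag{9}$$
Equation (9) shows that the existence of missing values increases the cluster objective function by a constant that does not depend on $(Z, W)$. Based on (9), the robust K-median clustering algorithm is given in Algorithm 1.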
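The vertex argument behind (7) to (9) can be checked numerically on a tiny instance. The sketch below (hypothetical helper names; memberships are passed as one cluster label per object instead of the 0/1 matrix $W$) enumerates every vertex $\delta_{ij} \in \{-r_{ij}, r_{ij}\}$ by brute force and compares the result with the closed form $\sum_{i}\sum_{j}(|\hat{x}_{ij} - z_{lj}| + r_{ij})^p$, which is (9) for $p = 1$ and, by the same argument, the robust K-means form (11) for $p = 2$. The enumeration is exponential in the number of entries and serves only as a sanity check.

```python
import itertools
import math


def robust_objective_bruteforce(X_hat, R, Z, labels, p):
    """Maximize the clustering objective over all vertices of the box U,
    i.e. every sign pattern delta_ij = +/- r_ij, as in (7)."""
    cells = [(i, j) for i in range(len(X_hat)) for j in range(len(X_hat[0]))]
    best = -math.inf
    for signs in itertools.product((-1.0, 1.0), repeat=len(cells)):
        total = sum(abs(X_hat[i][j] + s * R[i][j] - Z[labels[i]][j]) ** p
                    for (i, j), s in zip(cells, signs))
        best = max(best, total)
    return best


def robust_objective_closed_form(X_hat, R, Z, labels, p):
    """Closed form: sum over objects and features of (|x_hat - z| + r)^p."""
    return sum((abs(X_hat[i][j] - Z[labels[i]][j]) + R[i][j]) ** p
               for i in range(len(X_hat)) for j in range(len(X_hat[0])))
```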
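Algorithm 1 (robust K-median clustering algorithm).
Input. The feature matrix $\hat{X} = (\hat{x}_{ij})_{n \times m}$, interval sizes $r_{ij}$ ($i \in N$, $j \in M$), and $k$.
Output. The cluster prototype matrix $Z$ and membership matrix $W$.
Step 1 (initialization). Set the iteration index $t = 0$ and randomly select $k$ different rows of $\hat{X}$ as the initial cluster prototypes $Z^{(0)}$.
Step 2. Let $t = t + 1$ and update $W^{(t)}$ by fixing $Z^{(t-1)}$.
For any $i \in N$, randomly select $l^* \in \arg\min_{l \in K} \sum_{j \in M} |\hat{x}_{ij} - z_{lj}^{(t-1)}|$, and set $w_{il^*}^{(t)} = 1$ and, for any $l \neq l^*$, set $w_{il}^{(t)} = 0$.
Step 3. Update $Z^{(t)}$ by fixing $W^{(t)}$:
For any $l \in K$, let $C_l = \{ i \in N : w_{il}^{(t)} = 1 \}$. For any $j \in M$, set $z_{lj}^{(t)}$ as the median of $\{ \hat{x}_{ij} : i \in C_l \}$.
Step 4 (stop criterion). If $z_{lj}^{(t)} = z_{lj}^{(t-1)}$ for all $l \in K$ and $j \in M$, then stop and return $Z^{(t)}$ and $W^{(t)}$; otherwise, go to Step 2.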
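A minimal Python sketch of Algorithm 1 follows, assuming the midpoints and half-widths are already available as lists of lists. Because (9) adds the same constant $\sum r_{ij}$ to every clustering, the sketch assigns objects by plain $L_1$ distance to the midpoints and uses the half-widths only when reporting the robust objective value. Function and parameter names, the `max_iter` cap, and the empty-cluster rule are assumptions.

```python
import random
import statistics


def robust_kmedian(X_hat, R, k, max_iter=100, seed=None):
    """Sketch of Algorithm 1. X_hat: n x m interval midpoints, R: n x m
    half-widths (0 for observed values). Returns (Z, W, robust value (9))."""
    rng = random.Random(seed)
    n, m = len(X_hat), len(X_hat[0])
    Z = [list(X_hat[i]) for i in rng.sample(range(n), k)]        # Step 1
    for _ in range(max_iter):
        # Step 2: by (9) the r-term is identical for every cluster, so only the
        # L1 distance to the midpoints matters; ties are broken at random.
        labels = []
        for x in X_hat:
            d = [sum(abs(a - b) for a, b in zip(x, z)) for z in Z]
            best = min(d)
            labels.append(rng.choice([l for l in range(k) if d[l] == best]))
        # Step 3: coordinate-wise median of the midpoints in each cluster C_l.
        new_Z = []
        for l in range(k):
            members = [X_hat[i] for i in range(n) if labels[i] == l]
            # Keeping the old prototype of an empty cluster is an assumption.
            new_Z.append([statistics.median(col) for col in zip(*members)]
                         if members else Z[l])
        if new_Z == Z:                                             # Step 4
            break
        Z = new_Z
    W = [[int(labels[i] == l) for l in range(k)] for i in range(n)]
    robust_value = sum(abs(X_hat[i][j] - Z[labels[i]][j]) + R[i][j]
                       for i in range(n) for j in range(m))
    return Z, W, robust_value
```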
3.2. Robust K-Means Clustering Algorithm
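In this subsection, a robust K-means clustering algorithm for (RCP) when $p = 2$ is proposed. Similarly to the analysis of $\hat{f}_1(Z, W)$ when $p = 1$, we first simplify the robust cluster objective function as follows:
$$\hat{f}_2(Z, W) = \sum_{i=1}^{n} \sum_{l=1}^{k} w_{il} \sum_{j=1}^{m} \max_{\delta_{ij} \in \{-r_{ij},\, r_{ij}\}} \left( \hat{x}_{ij} + \delta_{ij} - z_{lj} \right)^2. \tag{10}$$
Since $\max\{(a - r_{ij})^2, (a + r_{ij})^2\} = (|a| + r_{ij})^2$ for any real $a$ and $r_{ij} \ge 0$, we have
$$\hat{f}_2(Z, W) = \sum_{i=1}^{n} \sum_{l=1}^{k} w_{il} \sum_{j=1}^{m} \left( \left| \hat{x}_{ij} - z_{lj} \right| + r_{ij} \right)^2. \tag{11}$$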
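To minimize $\hat{f}_2(Z, W)$, we update $Z$ and $W$ alternately. Specifically, when $Z$ is fixed, each object $i$ can be assigned to any cluster in the following index set:
$$L_i(Z) = \arg\min_{l \in K} \sum_{j=1}^{m} \left( \left| \hat{x}_{ij} - z_{lj} \right| + r_{ij} \right)^2. \tag{12}$$
When $W$ is fixed, for each cluster $l \in K$, let $C_l = \{ i \in N : w_{il} = 1 \}$. Then the optimal value of $Z$ can be obtained by solving the following piecewise convex optimization problem:
$$\min_{Z} \ \sum_{l=1}^{k} \sum_{i \in C_l} \sum_{j=1}^{m} \left( \left| \hat{x}_{ij} - z_{lj} \right| + r_{ij} \right)^2. \tag{13}$$
Note that optimization problem (13) is decomposable in $l$ and $j$. Thus, to obtain the optimal value of $z_{lj}$, it is sufficient to solve the following subproblem:
$$\min_{z \in \mathbb{R}} \ g_{lj}(z) = \sum_{i \in C_l} \left( \left| \hat{x}_{ij} - z \right| + r_{ij} \right)^2. \tag{14}$$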
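Procedure 1 (procedure for solving Subproblem (14)).
Input. Given $l$ and $j$: the index set $C_l$, and $\hat{x}_{ij}$ and $r_{ij}$ ($i \in C_l$).
Output. $z_{lj}$.
Step 1 (ranking). Rank $\hat{x}_{ij}$, $i \in C_l$, in increasing order. To simplify notation, in the following we omit the indices $l$ and $j$ and suppose $\hat{x}_1 \le \hat{x}_2 \le \dots \le \hat{x}_s$, where $s = |C_l|$; let $\hat{x}_0 = -\infty$ and $\hat{x}_{s+1} = +\infty$.
Step 2. Identify potential minimum points.
For $t = 0, 1, \dots, s$, calculate the stationary point of the quadratic piece of $g$ on $[\hat{x}_t, \hat{x}_{t+1}]$, clipped to that piece:
$$z_t = \min\left\{ \max\left\{ \frac{1}{s} \left( \sum_{i=1}^{s} \hat{x}_i - \sum_{i=1}^{t} r_i + \sum_{i=t+1}^{s} r_i \right),\ \hat{x}_t \right\},\ \hat{x}_{t+1} \right\}.$$
Step 3. Return $z_{lj} = z_{t^*}$, where $t^* \in \arg\min_{0 \le t \le s} g(z_t)$.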
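Subproblem (14) is a piecewise convex quadratic optimization problem: each term $(|\hat{x}_{ij} - z| + r_{ij})^2$ is convex in $z$, because $|\hat{x}_{ij} - z| + r_{ij}$ is convex and nonnegative and squaring is nondecreasing on $[0, \infty)$, and $g_{lj}$ is quadratic between consecutive sorted values of $\hat{x}_{ij}$, $i \in C_l$. Hence it can be solved by Procedure 1.
Procedure 1 solves Subproblem (14) by enumerating all potential minimum points: on each piece, the minimizer of the quadratic is its stationary point clipped to the piece. Procedure 1 can be implemented in $O(s \log s)$ time, where $s = |C_l|$, if the ranking step uses an efficient sorting method such as Heapsort; after sorting, running sums of $\hat{x}_{ij} \pm r_{ij}$ and of their squares allow each candidate and its objective value to be computed in $O(1)$ time.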
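Procedure 1 can be sketched in Python as below. This is an illustration under our own assumptions (the function name, the running-sum bookkeeping, and first-wins tie breaking are not specified in the text): after sorting, each piece of $g$ is a quadratic $s z^2 - 2 z \cdot (\text{linear sum}) + (\text{sum of squares})$ whose coefficients are maintained incrementally, so the whole procedure costs $O(s \log s)$.

```python
import math


def procedure1(x_hat, r):
    """Minimize g(z) = sum_i (|x_hat_i - z| + r_i)^2, i.e. Subproblem (14).

    x_hat: midpoints of the objects of one cluster for one feature;
    r: the matching half-widths (>= 0). Sorting dominates the O(s log s) cost.
    """
    pairs = sorted(zip(x_hat, r))                     # Step 1 (ranking)
    s = len(pairs)
    if s == 0:
        raise ValueError("empty cluster")
    # Objects left of z contribute (z - (x - r))^2, objects right of z
    # contribute (z - (x + r))^2; start with every object on the right.
    sum_left = sq_left = 0.0
    sum_right = sum(x + ri for x, ri in pairs)
    sq_right = sum((x + ri) ** 2 for x, ri in pairs)
    best_z, best_g = None, math.inf
    for t in range(s + 1):                             # Step 2: one candidate per piece
        left = pairs[t - 1][0] if t > 0 else -math.inf
        right = pairs[t][0] if t < s else math.inf
        lin = sum_left + sum_right
        z = min(max(lin / s, left), right)             # stationary point clipped to the piece
        g = s * z * z - 2.0 * z * lin + sq_left + sq_right
        if g < best_g:
            best_z, best_g = z, g
        if t < s:                                      # object t moves to the left group
            x, ri = pairs[t]
            sum_left += x - ri
            sq_left += (x - ri) ** 2
            sum_right -= x + ri
            sq_right -= (x + ri) ** 2
    return best_z                                      # Step 3
```

With all $r_i = 0$ the result reduces to the ordinary centroid; for example, `procedure1([1.0, 2.0, 6.0], [0.0, 0.0, 0.0])` returns 3.0.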
Based on the above discussion, the robust K-means clustering algorithm can be described as Algorithm 2.
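Algorithm 2 (robust K-means clustering algorithm).
Input. The feature matrix $\hat{X} = (\hat{x}_{ij})_{n \times m}$, interval sizes $r_{ij}$ ($i \in N$, $j \in M$), and $k$.
Output. The cluster prototype matrix $Z$ and membership matrix $W$.
Step 1 (initialization). Set the iteration index $t = 0$ and randomly select $k$ different rows of $\hat{X}$ as the initial cluster prototypes $Z^{(0)}$.
Step 2. Let $t = t + 1$ and update $W^{(t)}$ by fixing $Z^{(t-1)}$:
For any $i \in N$, randomly select $l_i^*$ from the index set $L_i(Z^{(t-1)})$ defined in (12).
For any $i \in N$ and $l \in K$, set $w_{il}^{(t)} = 1$ if $l = l_i^*$ and $w_{il}^{(t)} = 0$ otherwise.
Step 3. Update $Z^{(t)}$ by fixing $W^{(t)}$.
For any $l \in K$ and $j \in M$, obtain $z_{lj}^{(t)}$ using Procedure 1.
Step 4 (stop criterion). If $z_{lj}^{(t)} = z_{lj}^{(t-1)}$ for all $l \in K$ and $j \in M$, then stop and return $Z^{(t)}$ and $W^{(t)}$; otherwise, go to Step 2.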
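Putting the pieces together, Algorithm 2 can be sketched in Python as follows. The code repeats a compact copy of a Procedure 1 sketch so that it runs on its own; names, the iteration cap, and the empty-cluster rule are illustrative assumptions rather than details fixed by the text.

```python
import math
import random


def _procedure1(x_hat, r):
    """Minimize sum_i (|x_hat_i - z| + r_i)^2 piece by piece (Procedure 1 sketch)."""
    pairs = sorted(zip(x_hat, r))
    s = len(pairs)
    sum_l = sq_l = 0.0
    sum_r = sum(x + ri for x, ri in pairs)
    sq_r = sum((x + ri) ** 2 for x, ri in pairs)
    best_z, best_g = None, math.inf
    for t in range(s + 1):
        left = pairs[t - 1][0] if t > 0 else -math.inf
        right = pairs[t][0] if t < s else math.inf
        lin = sum_l + sum_r
        z = min(max(lin / s, left), right)   # clipped stationary point of piece t
        g = s * z * z - 2.0 * z * lin + sq_l + sq_r
        if g < best_g:
            best_z, best_g = z, g
        if t < s:                            # object t moves to the left group
            x, ri = pairs[t]
            sum_l, sq_l = sum_l + (x - ri), sq_l + (x - ri) ** 2
            sum_r, sq_r = sum_r - (x + ri), sq_r - (x + ri) ** 2
    return best_z


def robust_kmeans(X_hat, R, k, max_iter=100, seed=None):
    """Sketch of Algorithm 2. X_hat: n x m midpoints, R: n x m half-widths
    (0 if observed). Returns (Z, W, value of the robust objective (11))."""
    rng = random.Random(seed)
    n, m = len(X_hat), len(X_hat[0])

    def cost(i, z):  # robust cost of object i for prototype z, as in (11)-(12)
        return sum((abs(X_hat[i][j] - z[j]) + R[i][j]) ** 2 for j in range(m))

    Z = [list(X_hat[i]) for i in rng.sample(range(n), k)]        # Step 1
    for _ in range(max_iter):
        labels = []                                                # Step 2: index set (12)
        for i in range(n):
            d = [cost(i, z) for z in Z]
            best = min(d)
            labels.append(rng.choice([l for l in range(k) if d[l] == best]))
        new_Z = []                                                 # Step 3: Procedure 1
        for l in range(k):
            members = [i for i in range(n) if labels[i] == l]
            if not members:  # keeping an empty cluster's prototype is an assumption
                new_Z.append(Z[l])
                continue
            new_Z.append([_procedure1([X_hat[i][j] for i in members],
                                      [R[i][j] for i in members])
                          for j in range(m)])
        if new_Z == Z:                                             # Step 4
            break
        Z = new_Z
    W = [[int(labels[i] == l) for l in range(k)] for i in range(n)]
    return Z, W, sum(cost(i, Z[labels[i]]) for i in range(n))
```

With all half-widths set to zero, the cost in (11) is the squared Euclidean distance and Procedure 1 returns centroids, so the sketch then behaves like classical K-means.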
3.3. Computational Complexity
It is well known that the time complexity of the classical K-median and K-means algorithms is $O(nmkT)$, where $n$ is the number of objects, $m$ is the dimension of features, $k$ is the number of clusters, and $T$ is the number of iterations. We will show that the proposed robust K-median clustering algorithm has $O(nmkT)$ time complexity and the robust K-means clustering algorithm has $O(nm(k + \log n)T)$ time complexity.
Specifically, the initialization step of Algorithm 1 takes $O(km)$ time to initialize the cluster prototype matrix. For a given cluster prototype matrix, Algorithm 1 takes $O(nmk)$ time to update the membership matrix. Note that the median of $s$ scalars can be computed in $O(s)$ time [33]. Let $n_l = |C_l|$; then $\sum_{l=1}^{k} n_l = n$. Therefore, Step 3 of Algorithm 1 can be implemented in $O(\sum_{l=1}^{k} m\, n_l) = O(nm)$ time. The last step of Algorithm 1 takes $O(km)$ time. Therefore, the time complexity of the robust K-median clustering algorithm is $O(nmkT)$.
For the robust K-means clustering algorithm, the first two steps of Algorithm 2 take $O(km)$ and $O(nmk)$ time, respectively. Let $n_l = |C_l|$. For given $l$ and $j$, Procedure 1 takes $O(n_l \log n_l)$ time to compute $z_{lj}$. Therefore, Step 3 of Algorithm 2 takes $O(nm \log n)$ time, since $\sum_{l=1}^{k} m\, n_l \log n_l \le m \log n \sum_{l=1}^{k} n_l = nm \log n$. The last step of Algorithm 2 also takes $O(km)$ time. Thus, the time complexity of the robust K-means clustering algorithm is $O(nm(k + \log n)T)$.
In addition, both the robust K-median and robust K-means clustering algorithms have a space complexity of $O(n(m + k))$, which covers storing $\hat{X}$, the interval sizes $r_{ij}$, $W$, and $Z$ (note that $k \le n$). Therefore, compared with the classical K-median and K-means algorithms, the proposed robust clustering algorithms consume comparable computational resources.
4. Numerical Experiments
In this section, we compare the proposed robust clustering algorithms with existing methods on data sets from the UCI Machine Learning Repository [35]. Section 4.1 describes the data sets and experimental setup, and Section 4.2 reports and discusses the experimental results.
4.1. Data Sets and Experimental Setup
Two widely used data sets, Iris and Seeds, are used to test the performance of the proposed algorithms. The Iris data set consists of 150 objects, each described by four features of Iris flowers: sepal length, sepal width, petal length, and petal width. It includes three clusters, Setosa, Versicolour, and Virginica, each containing 50 objects. The optimal cluster prototypes of the Iris data have been reported by Hathaway and Bezdek [34]. The Seeds data set consists of 210 kernels of three different varieties of wheat, and each kernel has seven real-valued features: area, perimeter, compactness, length of kernel, width of kernel, asymmetry coefficient, and length of kernel groove.
We generate the missing values under the missing completely at random (MCAR) mechanism, as in Hathaway and Bezdek [14] and Li et al. [20]. Specifically, we randomly select a specified percentage of components and designate them as missing. To make the incomplete data tractable, we also ensure that the following constraints are satisfied:
(1) each object retains at least one feature;
(2) each feature has at least one value present in the incomplete data set.
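The MCAR deletion with the two constraints can be sketched in Python as below. The text does not say how the constraints are enforced; redrawing the whole missing pattern until both hold (rejection sampling), rounding the number of missing entries to the nearest integer, and marking missing entries with `None` are assumptions of this sketch.

```python
import random


def make_mcar(X, missing_rate, seed=None, max_tries=1000):
    """Mark round(missing_rate * n * m) entries of X as missing (None),
    chosen uniformly at random, redrawing until (1) every object keeps at
    least one observed feature and (2) every feature keeps at least one
    observed value."""
    rng = random.Random(seed)
    n, m = len(X), len(X[0])
    n_missing = round(missing_rate * n * m)
    cells = [(i, j) for i in range(n) for j in range(m)]
    for _ in range(max_tries):
        chosen = set(rng.sample(cells, n_missing))
        rows_ok = all(any((i, j) not in chosen for j in range(m)) for i in range(n))
        cols_ok = all(any((i, j) not in chosen for i in range(n)) for j in range(m))
        if rows_ok and cols_ok:
            return [[None if (i, j) in chosen else X[i][j] for j in range(m)]
                    for i in range(n)]
    raise RuntimeError("constraints not met; try a lower missing_rate")
```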
In addition to the Iris and Seeds data sets with artificially generated missing values, we also test the proposed algorithms on a real-world incomplete data set, the Stone Flakes data set [35], which consists of 79 prehistoric stone flake objects, each described by eight attributes. These objects belong to three different historic ages. The Stone Flakes data set contains 6 incomplete objects with 10 missing feature values in total.
Li et al. [20] use the nearest neighbors to construct intervals for missing feature values and, from their numerical experiments, is a good choice. To further test the impact of the interval size on the clustering performance of the proposed robust clustering algorithms, the interval for the missing value is constructed as , where is estimated by the nearest neighbors and .
4.2. Results and Discussion
We first test and compare the performance of the proposed robust K-median algorithm (labelled “RKM1”) on both the Iris and Seeds data sets under different missing rates from  to . The classical K-median algorithm has also been modified based on WDS, PDS, and NPS to handle incomplete data sets. Since the performance of the K-median algorithm depends on the initial cluster prototypes, we repeat each algorithm 100 times and report the averaged performance.
Tables 1 and 2 report the averaged performance of different K-median algorithms on the incomplete Iris and Seeds data, respectively. The first column of each table gives the missing rate. The second to seventh columns give the averaged misclassification rates with respect to the true clustering, where the fifth to seventh columns correspond to RKM1 with different values of the interval-size parameter, ranging from 0.05 to 0.15. In Table 1, the eighth to thirteenth columns give the averaged cluster prototype errors of the different algorithms, which are calculated by  where  represents the cluster prototypes given by a certain K-median algorithm and  is the actual cluster prototypes of the Iris data set without missing values. Since the actual cluster prototypes of the Seeds data set are unknown, such results are not reported in Table 2.
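The text does not spell out how cluster indices are matched to the true classes when computing misclassification rates. A common convention, assumed in the Python sketch below, is to take the best one-to-one matching between clusters and classes; brute force over all $k!$ matchings is adequate for the three-class data sets used here. The function name is hypothetical.

```python
from itertools import permutations


def misclassification_rate(true_labels, cluster_labels, k):
    """Fraction of misclassified objects under the best one-to-one mapping of
    cluster indices to class indices (labels are integers in range(k))."""
    n = len(true_labels)
    best_correct = 0
    for perm in permutations(range(k)):  # perm[c] is the class assigned to cluster c
        correct = sum(1 for t, c in zip(true_labels, cluster_labels) if perm[c] == t)
        best_correct = max(best_correct, correct)
    return (n - best_correct) / n
```

For example, with true classes `[0, 0, 0, 1, 1, 1, 2, 2, 2]` and cluster labels `[2, 2, 2, 0, 0, 1, 1, 1, 1]`, the best matching misclassifies one object, giving a rate of 1/9.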


From Tables 1 and 2, we have the following observations.
(1) When there is no missing value, that is, the missing rate is zero, all K-median algorithms give the same results. As the missing rate increases, in most cases, both the misclassification rate and the prototype error of all algorithms become larger.
(2) When the missing rate is small, the missing data have little adverse effect on the performance of the proposed RKM1. For example, the misclassification rate of RKM1 when the missing rate is around  is even smaller than that of RKM1 when the missing rate is zero.
(3) When the missing rate is large, RKM1 provides clustering results with lower misclassification rates and prototype errors than the WDS, PDS, and NPS based K-median algorithms.
(4) The interval size also affects the performance of RKM1. Specifically, as the interval-size parameter increases from 0.05 to 0.15, in most cases the misclassification rate of RKM1 first decreases and then increases. However, when the missing rate is high (20%), RKM1 with a small interval-size parameter provides the best clustering performance.
The proposed robust K-means algorithm (labelled “RKM2”) is also tested on both the Iris and Seeds data sets and compared with the WDS, PDS, and NPS based K-means algorithms. Tables 3 and 4 report the averaged performance of these algorithms over 100 runs of each algorithm.


Tables 3 and 4 also validate the robustness of the proposed RKM2 against missing values. When there are missing values, RKM2 provides robust clustering results with smaller misclassification rates and prototype errors than the WDS, PDS, and NPS based K-means algorithms. For example, when the missing rate is , the misclassification rate given by RKM2 with  on the Seeds data set is only , while the best misclassification rate given by the other K-means algorithms is . The impact of the interval size on the performance of RKM2 is similar to that on RKM1; that is, in most cases RKM2 with  provides the best clustering performance in terms of both misclassification rate and prototype error.
Finally, we test the performance of the proposed robust clustering algorithms on a real-world incomplete data set, the Stone Flakes data set. Based on the above discussion, we set  for both RKM1 and RKM2. Figure 1 shows the number of misclassified objects for the different algorithms. From Figure 1, we see that RKM1 provides the lowest misclassification rate and RKM2 the second lowest.
5. Conclusion
This paper considers the clustering problem for incomplete data. To reduce the effect of missing values on clustering performance, this paper represents the missing values by interval data and introduces the concept of a robust cluster objective function, defined as the worst-case cluster objective function when the missing values vary within the constructed intervals. We then propose a robust clustering model that minimizes the robust cluster objective function and design robust K-median and K-means algorithms to solve it. The time complexities of the robust K-median and K-means clustering algorithms are $O(nmkT)$ and $O(nm(k + \log n)T)$, respectively. Numerical experiments on both artificially generated and real-world incomplete data sets show that the proposed algorithms are robust against missing data and provide better clustering performance than the existing WDS, PDS, and NPS based K-median and K-means algorithms.
Both the robust K-median and K-means algorithms cluster incomplete data under hard constraints; that is, each object belongs to exactly one cluster. To cluster incomplete data under soft constraints, we will further study robust fuzzy K-median and K-means clustering algorithms in future work.
Competing Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
Acknowledgments
This work was supported by the Major Program of the National Natural Science Foundation of China under Grant nos. 41427806 and 61503211, the Natural Science Foundation of Beijing under Grant 9152002, and the Project of China Ocean Association under Grant DYXM1252502.
References
[1] B. M. Marlin, Missing data problems in machine learning [Ph.D. thesis], University of Toronto, Toronto, Canada, 2008.
[2] G. S. Sebestyen, Decision-Making Processes in Pattern Recognition, ACM Monograph Series, 1962.
[3] A. P. Dempster, N. M. Laird, and D. B. Rubin, “Maximum likelihood from incomplete data via the EM algorithm,” Journal of the Royal Statistical Society: Series B (Methodological), vol. 39, no. 1, pp. 1–38, 1977.
[4] J. K. Dixon, “Pattern recognition with partly missing data,” IEEE Transactions on Systems, Man and Cybernetics, vol. 9, no. 10, pp. 617–621, 1979.
[5] A. K. Jain and R. C. Dubes, Algorithms for Clustering Data, Prentice-Hall, Englewood Cliffs, NJ, USA, 1988.
[6] X. Zhou, R. Zhao, F. Yu, and H. Tian, “Intuitionistic fuzzy entropy clustering algorithm for infrared image segmentation,” Journal of Intelligent & Fuzzy Systems, vol. 30, no. 3, pp. 1831–1840, 2016.
[7] H. P. Lai, M. Visani, A. Boucher, and J.-M. Ogier, “Unsupervised and interactive semi-supervised clustering for large image database indexing and retrieval,” Fundamenta Informaticae, vol. 130, no. 2, pp. 201–218, 2014.
[8] L. Zhang, W. Lu, X. Liu, W. Pedrycz, and C. Zhong, “Fuzzy c-means clustering of incomplete data based on probabilistic information granules of missing values,” Knowledge-Based Systems, vol. 99, pp. 51–70, 2016.
[9] O. Troyanskaya, M. Cantor, G. Sherlock et al., “Missing value estimation methods for DNA microarrays,” Bioinformatics, vol. 17, no. 6, pp. 520–525, 2001.
[10] S. Miyamoto, O. Takata, and K. Umayahara, “Handling missing values in fuzzy c-means,” in Proceedings of the 3rd Asian Fuzzy Systems Symposium, pp. 139–142, Masan, Korea, June 1998.
[11] E. Acuna and C. Rodriguez, “The treatment of missing values and its effect on classifier accuracy,” in Classification, Clustering, and Data Mining Applications, pp. 639–647, Springer, New York, NY, USA, 2004.
[12] A. Farhangfar, L. Kurgan, and J. Dy, “Impact of imputation of missing values on classification error for discrete data,” Pattern Recognition, vol. 41, no. 12, pp. 3692–3705, 2008.
[13] P. Saravanan and P. Sailakshmi, “Missing value imputation using fuzzy possibilistic c means optimized with support vector regression and genetic algorithm,” Journal of Theoretical and Applied Information Technology, vol. 72, no. 1, pp. 34–39, 2015.
[14] R. J. Hathaway and J. C. Bezdek, “Fuzzy c-means clustering of incomplete data,” IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, vol. 31, no. 5, pp. 735–744, 2001.
[15] H. Timm and R. Kruse, “Fuzzy cluster analysis with missing values,” in Proceedings of the IEEE Conference of the North American Fuzzy Information Processing Society (NAFIPS '98), pp. 242–246, Pensacola Beach, Fla, USA, 1998.
[16] T. Shibayama, “A PCA-like method for multivariate data with missing values,” Japanese Journal of Educational Psychology, vol. 40, no. 2, pp. 257–265, 1992.
[17] K. Honda and H. Ichihashi, “Linear fuzzy clustering techniques with missing values and their application to local principal component analysis,” IEEE Transactions on Fuzzy Systems, vol. 12, no. 2, pp. 183–193, 2004.
[18] D.-Q. Zhang and S.-C. Chen, “Clustering incomplete data using kernel-based fuzzy c-means algorithm,” Neural Processing Letters, vol. 18, no. 3, pp. 155–162, 2003.
[19] M. Sadaaki, I. Hidetomo, and H. Katsuhiro, Algorithms for Fuzzy Clustering: Methods in C-Means Clustering with Applications, Springer, Berlin, Germany, 2008.
[20] D. Li, H. Gu, and L. Zhang, “A fuzzy c-means clustering algorithm based on nearest-neighbor intervals for incomplete data,” Expert Systems with Applications, vol. 37, no. 10, pp. 6942–6947, 2010.
[21] D. Li, H. Gu, and L. Zhang, “A hybrid genetic algorithm–fuzzy c-means approach for incomplete data clustering based on nearest-neighbor intervals,” Soft Computing, vol. 17, no. 10, pp. 1787–1796, 2013.
[22] B. L. Wang, L. Y. Zhang, L. Zhang, Z. H. Bing, and X. H. Xu, “Missing data imputation by nearest-neighbor trained BP for fuzzy clustering,” Journal of Information & Computational Science, vol. 11, no. 15, pp. 5367–5375, 2014.
[23] L. Zhang, Z. Bing, and L. Zhang, “A hybrid clustering algorithm based on missing attribute interval estimation for incomplete data,” Pattern Analysis and Applications, vol. 18, no. 2, pp. 377–384, 2015.
[24] G. Lanckriet, L. El Ghaoui, C. Bhattacharyya, and M. I. Jordan, “Minimax probability machine,” in Advances in Neural Information Processing Systems, pp. 801–807, 2001.
[25] K. Huang, H. Yang, I. King, M. R. Lyu, and L. Chan, “The minimum error minimax probability machine,” Journal of Machine Learning Research, vol. 5, no. 4, pp. 1253–1286, 2004.
[26] Y. Wang, Y. Zhang, J. Yi, H. Qu, and J. Miu, “A robust probability classifier based on the modified χ²-distance,” Mathematical Problems in Engineering, vol. 2014, Article ID 621314, 11 pages, 2014.
[27] S. Song, Y. Gong, Y. Zhang, G. Huang, and G.-B. Huang, “Dimension reduction by minimum error minimax probability machine,” IEEE Transactions on Systems, Man, and Cybernetics: Systems, 2016.
[28] T. B. Trafalis and R. C. Gilbert, “Robust support vector machines for classification and computational issues,” Optimization Methods and Software, vol. 22, no. 1, pp. 187–198, 2007.
[29] H. Xu, C. Caramanis, and S. Mannor, “Robustness and regularization of support vector machines,” The Journal of Machine Learning Research, vol. 10, pp. 1485–1510, 2009.
[30] Y. Wang, Y. Zhang, F. Zhang, and J. Yi, “Robust quadratic regression and its application to energy-growth consumption problem,” Mathematical Problems in Engineering, vol. 2013, Article ID 210510, 10 pages, 2013.
[31] P. Drineas, A. Frieze, R. Kannan, S. Vempala, and V. Vinay, “Clustering large graphs via the singular value decomposition,” Machine Learning, vol. 56, no. 1–3, pp. 9–33, 2004.
[32] D. Aloise, A. Deshpande, P. Hansen, and P. Popat, “NP-hardness of Euclidean sum-of-squares clustering,” Machine Learning, vol. 75, no. 2, pp. 245–248, 2009.
[33] M. Blum, R. W. Floyd, V. Pratt, R. L. Rivest, and R. E. Tarjan, “Time bounds for selection,” Journal of Computer and System Sciences, vol. 7, no. 4, pp. 448–461, 1973.
[34] R. J. Hathaway and J. C. Bezdek, “Optimization of clustering criteria by reformulation,” IEEE Transactions on Fuzzy Systems, vol. 3, no. 2, pp. 241–245, 1995.
[35] M. Lichman, UCI Machine Learning Repository, School of Information and Computer Sciences, University of California, Irvine, Calif, USA, 2015.
Copyright
Copyright © 2016 Jinhua Li et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.