(i) Proposes a kNN optimization that automatically balances the data points evenly across all classes to avoid model misfit. (ii) The value of k is set to 7 in their experiment. (iii) Three datasets are used: Reuters-21578, Industry Sector, and TDT-5.
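A minimal sketch of the balanced-kNN idea, assuming the balancing step is a simple per-class undersampling (the cited paper's exact scheme may differ); the fixed k = 7 follows the remark above.

```python
# Hedged sketch: class-balanced kNN with k = 7. The per-class
# undersampling below is an illustrative assumption, not necessarily
# the cited paper's exact balancing procedure.
import numpy as np
from collections import Counter
from sklearn.neighbors import KNeighborsClassifier

def balance_classes(X, y, rng=np.random.default_rng(0)):
    """Undersample every class down to the size of the smallest class."""
    y = np.asarray(y)
    n_min = min(Counter(y).values())
    idx = np.concatenate([
        rng.choice(np.flatnonzero(y == c), size=n_min, replace=False)
        for c in np.unique(y)
    ])
    return X[idx], y[idx]

# Toy imbalanced data: 90 points of class 0, 10 of class 1.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (90, 2)), rng.normal(3, 1, (10, 2))])
y = np.array([0] * 90 + [1] * 10)

Xb, yb = balance_classes(X, y)
clf = KNeighborsClassifier(n_neighbors=7).fit(Xb, yb)  # k = 7 as in the remark
print(Counter(yb))  # both classes now contribute 10 samples each
```

Undersampling trades data for balance; the cited work may instead reweight or resample, but the effect on the kNN vote is the same in spirit.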
(i) Much faster than [30]. (ii) Prototype selection recommends the most representative prototypes for training. (iii) [31] eliminates most of the data points during training to increase speed.
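A hedged sketch of prototype selection in the spirit of [31]: a minimal condensed-nearest-neighbour pass that keeps only the points the current prototype set misclassifies (the cited method's actual selection criterion may differ).

```python
# Hedged sketch: greedy 1-NN condensation as an illustrative stand-in
# for the prototype-selection step described in the table.
import numpy as np

def condense(X, y):
    """Keep a point only if the current prototype set misclassifies it."""
    keep = [0]                                      # seed with the first point
    for i in range(1, len(X)):
        d = np.linalg.norm(X[keep] - X[i], axis=1)  # distances to prototypes
        if y[keep][np.argmin(d)] != y[i]:           # nearest prototype wrong?
            keep.append(i)
    return np.array(keep)

X = np.array([[0.0, 0], [0.1, 0], [0.2, 0],   # class 0 cluster
              [5.0, 0], [5.1, 0], [5.2, 0]])  # class 1 cluster
y = np.array([0, 0, 0, 1, 1, 1])
proto = condense(X, y)
print(proto)  # far fewer prototypes than original points
```

On these two tight clusters only one prototype per cluster survives, which is exactly the speed-up the remark attributes to eliminating most training points.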
(i) The Gini index can easily detect the skew toward the majority class and creates many subtrees to balance the data points. (ii) Can work faster than [30, 31]. (iii) [32] combines feature-subspace selection with the splitting criterion to create multiple subtrees that balance the data.
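The skew detection above can be illustrated with the Gini impurity itself: a node dominated by one class scores close to zero, which is what drives the splitting criterion in [32].

```python
# Gini impurity: 1 - sum(p_c^2) over class proportions p_c.
# A low value signals skew toward a majority class.
from collections import Counter

def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

balanced = [0] * 50 + [1] * 50
skewed   = [0] * 95 + [1] * 5

print(gini(balanced))  # 0.5   (maximum for two classes)
print(gini(skewed))    # 0.095 (low impurity signals class skew)
```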
(i) Handles the imbalance problem better than [32] by performing resampling. (ii) Instance weighting assigns weights to the imbalanced (minority) class so that the final performance (in terms of accuracy) is balanced. (iii) The proposed method is validated with an SVM classifier.
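A hedged sketch of instance weighting with an SVM, using scikit-learn's `class_weight="balanced"` option as one plausible realisation (the table does not specify the cited paper's exact weighting scheme).

```python
# Hedged sketch: weighting the minority class in an SVM. The
# class_weight="balanced" choice is an illustrative assumption.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (95, 2)), rng.normal(2.5, 1, (5, 2))])
y = np.array([0] * 95 + [1] * 5)          # 95 vs 5: heavy imbalance

plain    = SVC(kernel="linear").fit(X, y)
weighted = SVC(kernel="linear", class_weight="balanced").fit(X, y)

# The weighted model penalises minority-class errors more heavily,
# shifting the boundary so the rare class is easier to recover.
print((plain.predict(X[y == 1]) == 1).mean(),
      (weighted.predict(X[y == 1]) == 1).mean())
```

With `class_weight="balanced"`, each class's error penalty is scaled by `n_samples / (n_classes * n_c)`, so the 5 minority points here count roughly 19 times as much as a majority point.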
(i) Instead of balancing the existing data points across all classes, this method uses a topic model to construct new data points in each class, yielding a complete dataset. (ii) It considers more data points than [30, 32, 33] because topic modeling can construct new data points.
(i) Aims to reduce the dimensionality of the document-matrix representation. (ii) Instead of recommending data points from the dataset (as in [31]), the bag-of-concepts model groups one or more data points into topics. (iii) The bag-of-concepts model solves many problems of the traditional bag-of-words model, such as high dimensionality and sparsity.
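A hedged sketch of the bag-of-concepts idea: cluster words into "concepts" (here by KMeans over their document co-occurrence vectors, a deliberate simplification) and represent each document by concept counts, shrinking the dimensionality.

```python
# Hedged sketch: bag-of-concepts via word clustering. The clustering
# features and concept count are illustrative assumptions.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import CountVectorizer

docs = ["stock market price", "market price shares",
        "football goal team", "goal team league"]
vec = CountVectorizer()
X = vec.fit_transform(docs).toarray()          # docs x words

n_concepts = 2
km = KMeans(n_clusters=n_concepts, n_init=10, random_state=0).fit(X.T)

# Map each word column to its concept, then sum counts per concept.
C = np.zeros((X.shape[0], n_concepts))
for w, c in enumerate(km.labels_):
    C[:, c] += X[:, w]

print(X.shape, "->", C.shape)   # 8 word features collapse to 2 concepts
```

The concept matrix is both lower-dimensional and denser than the word matrix, addressing exactly the sparsity and dimensionality issues the remark raises against bag-of-words.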
(i) Addresses a limitation of [35], which does not consider the relationships among the documents. (ii) The features and their relationships are extracted using deep learning. (iii) The ontology enhancement proposed in [36] helps reduce the high dimensionality. (iv) This method consumes more time in training the samples.
Proposed
Weighted feature selection
(i) Resolves the imbalance problem by assigning weights to the most important features. (ii) Three classifiers are used, namely, kNN, SVM, and Naïve Bayes. (iii) [35] fails to detect the relationships among the documents; the proposed system detects these relationships and uses them for classification.
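A hedged sketch of weighted feature selection: score features (a chi-squared test is used here as one plausible choice; the proposed paper's weighting may differ), scale the document matrix by the normalised scores, and feed the result to the three classifiers named above.

```python
# Hedged sketch: chi-squared feature weighting feeding kNN, SVM, and
# Naive Bayes. The data, scoring function, and normalisation are
# illustrative assumptions, not the proposed system's exact pipeline.
import numpy as np
from sklearn.feature_selection import chi2
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.naive_bayes import MultinomialNB

rng = np.random.default_rng(0)
X = rng.integers(0, 5, size=(40, 6)).astype(float)   # toy term counts
y = (X[:, 0] + X[:, 1] > 4).astype(int)              # first two terms matter

scores, _ = chi2(X, y)
weights = scores / scores.sum()        # normalise scores into weights
Xw = X * weights                       # emphasise the important features

for clf in (KNeighborsClassifier(), SVC(), MultinomialNB()):
    clf.fit(Xw, y)
    print(type(clf).__name__, clf.score(Xw, y))
```

Scaling columns by importance lets all three classifiers benefit from the same weighting without per-classifier changes, which matches the multi-classifier validation described in the remark.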