Prototype Generation Using Self-Organizing Maps for Informativeness-Based Classifier

Moreira, Leandro Juvêncio; Silva, Leandro A.

doi:https://doi.org/10.1155/2017/4263064

Computational Intelligence and Neuroscience

Research Article | Open Access

Volume 2017 | Article ID 4263064 | https://doi.org/10.1155/2017/4263064

Leandro Juvêncio Moreira¹ and Leandro A. Silva²

Academic Editor: Toshihisa Tanaka

Received 31 Jan 2017

Revised 13 Jun 2017

Accepted 15 Jun 2017

Published 25 Jul 2017

Abstract

The $k$ nearest neighbor is one of the most important and simplest procedures for data classification tasks. The kNN, as it is called, requires only two parameters: the number of neighbors $k$ and a similarity measure. However, the algorithm has some weaknesses that hinder its use in real problems. Since the algorithm has no model, an exhaustive comparison between the object under classification and the entire training dataset is necessary. Another weakness is the optimal choice of the parameter $k$ when the object analyzed is in an overlap region. To mitigate these negative aspects, this work proposes a hybrid algorithm that uses the Self-Organizing Maps (SOM) artificial neural network and a classifier whose similarity measure is based on information. Since SOM has vector quantization properties, it is used as a Prototype Generation approach to select a reduced training dataset for the classification approach based on the nearest neighbor rule with an informativeness measure, named iNN. The SOMiNN combination was extensively tested, and the results show that the proposed approach achieves notable accuracy in databases where the border region does not have well-defined object classes.

1. Introduction

The main task of a data classifier is to predict the class of an object that is under analysis. The simplest procedure for data classification tasks is the $k$ nearest neighbor (kNN) algorithm. The algorithm strategy for classification comprises three operations: (i) an unlabeled sample is compared to the training dataset through a similarity measure; (ii) the labeled objects are sorted in order of similarity to the unlabeled sample; and finally, (iii) the unlabeled sample is given the majority class of its $k$ nearest neighbors. Because of its simple algorithm (three basic operations) and reduced number of parameters (the similarity measure and the number $k$ of nearest neighbors), this instance-based learning algorithm is widely used in the data mining community as a benchmarking algorithm [1–5].
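The three operations listed above can be sketched in a few lines. This is only an illustrative sketch, assuming Euclidean distance as the similarity measure; the function name `knn_predict` and the toy data are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest training objects."""
    # (i) compare the unlabeled sample with every labeled training object
    dists = [(math.dist(x, query), label) for x, label in zip(train_X, train_y)]
    # (ii) sort the labeled objects by similarity (smallest distance first)
    dists.sort(key=lambda pair: pair[0])
    # (iii) assign the majority class of the k nearest neighbors
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]

# Toy data (assumed for illustration only)
train_X = [(0.0, 0.0), (0.1, 0.2), (1.0, 1.0), (0.9, 1.1), (1.2, 0.8)]
train_y = ["a", "a", "b", "b", "b"]
print(knn_predict(train_X, train_y, (0.2, 0.1), k=3))
```

Note that every call scans the whole training set, which is the exhaustive comparison that prototype-based methods try to avoid.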

Since the kNN algorithm has no model, an exhaustive comparison of the unlabeled sample with all the labeled objects stored in the database is necessary, which increases the computational time of the process. In addition, the decision boundaries are defined by the instances stored in the training set and, for this reason, the algorithm has low tolerance to noise; that is, all training dataset objects are considered relevant patterns. Finally, the optimal choice of $k$ depends upon the dataset, mainly when the object analyzed is in a boundary region, so this parameter has to be tuned according to the application [6–9].

To overcome the drawbacks above, the literature offers different approaches, such as similarity measures alternative to the Euclidean distance to minimize misclassification in boundary regions [10], methods to avoid searching the whole space of the training set [11], and dataset summarization to find representative objects of the training set [9]. For the dataset summarization approach, there are two main strategies to reduce the dataset volume: one based on instance selection and the other based on prototypes. For the approaches based on pattern (or instance) selection, the aim is to find a representative and reduced set of objects from the training dataset that has the same or higher classification accuracy than the raw dataset [8, 12–15]. The strategies based on prototypes, on the other hand, are divided into two approaches: Prototype Selection (PS) [16] and Prototype Generation (PG) [13, 17–19]. The approaches are equivalent in that both can be used to identify an optimal subset of representative prototypes, discarding noise and redundancy. The difference is that PG can also be used to generate an artificial dataset that replaces the raw dataset. The use of prototypes, or of reduced training objects that are represented by prototypes, mitigates some of the kNN drawbacks previously mentioned, such as the exhaustive comparison with the entire training dataset.

Silva and Del-Moral-Hernandez [5] presented combination methods that use the winning neuron and topology preservation concepts of the Self-Organizing Maps (SOM) neural network to define a reduced subset of training set objects that are highly similar to the object under analysis for classification [5, 20]. This object subset is retrieved and then utilized by the kNN to execute the classification task. In other words, the SOM executes a preprocessing for the kNN classifier, recovering the similar objects from the winning neuron and from the adjacent neighbors of the SOM map [21].

With respect to the drawback in tuning the parameter $k$, Zhang et al. proposed a learning computation for this parameter [22]. Song et al., on the other hand, proposed a metric based on informativeness to perform the classification process in a boundary region, where the choice of $k$ is more sensitive [10]. This algorithm was called iNN, and its main idea is to investigate the most informative nearest objects instead of the closest ones. This approach outperforms the use of kNN with Euclidean distance; however, it further increases the complexity of the comparison, consequently increasing processing time [23].

Inspired by the use of PG [5, 20, 21], we introduce a hybrid approach. In a first step, the SOM, which has vector quantization and topology preservation as important features, is used as a preprocessing stage to present to the classifier a reduced set of objects highly similar to the unknown object being investigated. Next, the iNN algorithm attributes a class to the unknown object based on the most informative objects of the selected set. In initial exploratory experiments, we observed promising results for accuracy and classification time [23].

We here formally detail how SOMiNN works as a hybrid architecture for classification problems. In addition, we introduce an experimental methodology to analyze the SOMiNN classifier qualitatively on three artificial datasets, with different distributions in the class overlap region. We also perform experiments on 21 publicly available databases (7 times as many as in the previous study) from the UCI repository, which are also provided already partitioned by 5-fold cross validation on the website complementary to the paper published by Triguero et al. [9]. The results are analyzed using accuracy, kappa, prototype reduction, and time as performance indices.

The rest of the paper is organized as follows: in Section 2, a brief explanation of Prototype Generation and the taxonomy proposed by [9] are shown; Self-Organizing Maps and the methods to use them in classification with kNN are presented in Section 3. In Section 4, the experimental methodology is introduced. Experimental results, discussion, and comparative results are given in Section 5. In the last section, the conclusions are provided.

2. Theoretical Fundamental

2.1. A Brief Introduction to Prototype Generation

For a better understanding of the Prototype Generation idea, let us consider an object $x_p$ of a dataset, defined as a set of descriptive attributes of dimension $m$ with a class attribute $y$; that is, $x_p = [x_{p1}, x_{p2}, \ldots, x_{pm}, y]$. Then, let us assume that $X_{tr}$ is a training dataset with $N$ samples of $x_p$. The purpose of Prototype Generation (PG) is to obtain a reduced set, $X_{red}$, with $r$ instances selected or generated from $X_{tr}$, but with $r \ll N$. The cardinality of this reduced set must be sufficiently small to decrease the evaluation time taken by a classifier (kNN, for example) while maintaining the classification accuracy. In fact, data reduction approaches aim mainly to summarize the raw dataset without damaging its analytical properties, which implies preserving accuracy.

For the PG methods, prototypes are used by classifiers instead of raw datasets, or they are used to generate an artificial dataset. Data generation can be interesting in some cases to eliminate data noise or to handle datasets with unbalanced classes. Since the possibilities of usage are diverse, the literature presents different methods, approaches, and algorithms. For this reason, Triguero et al. [9] proposed a taxonomy of PG methods aimed at addressing the kNN drawbacks, organized as a hierarchy of three levels (generation mechanisms, resulting generation set, and type of reduction), and also reviewed all PG algorithms from the literature (see [9] for a detailed explanation).

In the next sections, we briefly introduce Self-Organizing Maps and the proposed approach, the combination of SOM and kNN.

2.2. A Brief Summary for the Kohonen Self-Organizing Maps

Kohonen Self-Organizing Map (SOM) is a type of neural network that consists of neurons located on a regular low-dimensional grid, usually two-dimensional (2D). Typically, the lattice of the 2D grid is either hexagonal or rectangular [24]. The SOM learning or training process is an iterative algorithm which aims to represent a distribution of the input pattern objects in that regular grid of neurons. The similar input patterns are associated in the same neurons or in the adjacent neurons of the grid.

For the SOM training, a dataset is chosen and divided into two distinct sets. The training set, here called $X_{tr}$, is used to train the SOM. The other set is used to test the trained SOM ($X_{te}$). After this dataset division, the SOM training starts. Formally, an object is randomly selected from $X_{tr}$ during training, defined as $x_p = [x_{p1}, x_{p2}, \ldots, x_{pm}]$, where each element $x_{pi}$ is an attribute or feature of the object, which belongs to $\mathbb{R}$. The object is similar to the one defined before, but without the class information. Additionally, each neuron $j$ of the SOM grid has a weight vector $w_j = [w_{j1}, w_{j2}, \ldots, w_{jm}]$, where $j = 1, 2, \ldots, l$; here $l$ is the total number of neurons of the map.

During the learning process, the input pattern $x_p$ is randomly selected from the training set and compared with the weight vectors $w_j$ of the map, which are initialized randomly. The comparison between $x_p$ and $w_j$ is usually made through the Euclidean distance. The shortest distance indicates the closest neuron $c$, which will have its weight vector $w_c$ updated to get closer to the selected input pattern $x_p$. Formally, neuron $c$ is defined as follows:

$$c = \arg\min_{j} \lVert x_p - w_j \rVert \qquad (1)$$

The closest weight vector $w_c$ and its neighbors are updated using the Kohonen algorithm [24]. The topological neighborhood is defined so that the farther a neuron is from $w_c$, the lower the intensity of its update. The intensity of the neighborhood function is also defined in relation to the training time: it has a high value in the initial iterations and is reduced at each subsequent iteration. See Kohonen [24] for a complete explanation of the training rule of the SOM map.
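The training loop described above can be sketched as follows. This is a minimal illustration rather than the authors' implementation: it assumes a rectangular grid numbered row by row, a Gaussian neighborhood function, and linearly decaying learning rate and radius. The function name `train_som` and all default values are hypothetical.

```python
import math
import random

def train_som(data, rows, cols, epochs=50, lr0=0.5, sigma0=None, seed=0):
    """Train a rows x cols SOM on `data` and return one weight vector per neuron."""
    rng = random.Random(seed)
    dim = len(data[0])
    sigma0 = sigma0 or max(rows, cols) / 2.0  # assumed initial neighborhood radius
    # Random initialization of the weight vectors inside the data range
    lo = [min(x[d] for x in data) for d in range(dim)]
    hi = [max(x[d] for x in data) for d in range(dim)]
    weights = [[rng.uniform(lo[d], hi[d]) for d in range(dim)]
               for _ in range(rows * cols)]
    total = epochs * len(data)
    for t in range(total):
        x = rng.choice(data)  # randomly selected input pattern
        # Winner neuron: the weight vector closest to x (Euclidean distance)
        c = min(range(len(weights)), key=lambda j: math.dist(x, weights[j]))
        frac = t / total
        lr = lr0 * (1.0 - frac)                 # learning rate decays over time
        sigma = sigma0 * (1.0 - frac) + 1e-3    # neighborhood radius shrinks over time
        c_row, c_col = divmod(c, cols)
        for j, w in enumerate(weights):
            j_row, j_col = divmod(j, cols)
            grid_d2 = (j_row - c_row) ** 2 + (j_col - c_col) ** 2
            # Neurons farther from the winner on the grid are updated less
            h = math.exp(-grid_d2 / (2.0 * sigma ** 2))
            for d in range(dim):
                w[d] += lr * h * (x[d] - w[d])
    return weights

data = [(0.0, 0.0), (0.1, 0.1), (1.0, 1.0), (0.9, 1.1)]
weights = train_som(data, rows=2, cols=2, epochs=10)
print(len(weights))
```

Because each update moves a weight vector part of the way toward an input pattern, the weights stay inside the range of the data and end up acting as prototypes of it.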

2.3. Building a Prototype Generation Based on SOM

Once the training phase has been completed, each input pattern object from the training set has to be grouped with its closest neuron. The idea of using SOM as a PG technique is that the index of each instance becomes part of the list of its nearest neuron. The list of each neuron is here called the Best Matching Unit List (BMUL), formally defined as

$$\mathrm{BMUL}_j = \{\, p \mid j = \arg\min_{i} \lVert x_p - w_i \rVert \,\} \qquad (2)$$

where $j$ is the number of the map neuron and $\mathrm{BMUL}_j$ is a list with the indices of the input pattern objects associated with that nearest neuron.

The relationship between the instances of the training set and the best matching unit lists is many-to-one. That is, some units $\mathrm{BMUL}_j$, which we could call microclusters, are associated with one or more instances, and other units may have no associations; that is, the list can be empty ($\mathrm{BMUL}_j = \emptyset$).
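As an illustration of how such lists could be built after training, the sketch below assigns each training index to the list of its nearest weight vector. The function name `build_bmul` and the toy weights are hypothetical; Euclidean distance is assumed.

```python
import math

def build_bmul(weights, train_X):
    """Return one list of training indices per neuron (the best matching unit lists)."""
    bmul = [[] for _ in weights]
    for idx, x in enumerate(train_X):
        # Nearest neuron for this training object
        c = min(range(len(weights)), key=lambda j: math.dist(x, weights[j]))
        bmul[c].append(idx)
    return bmul

# Toy example: the third neuron wins no object, so its list stays empty
weights = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
train_X = [(0.1, 0.0), (0.9, 1.2), (0.2, 0.1), (1.1, 0.9)]
print(build_bmul(weights, train_X))  # [[0, 2], [1, 3], []]
```

Each index lands in exactly one list, which is the many-to-one relationship described above, and empty lists occur naturally for neurons far from all data.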

The classification method proposed herein explores two important characteristics of the SOM: vector quantization and topological ordering [24]. To better understand these features, consider the representation in Figure 1, with the input pattern objects (filled circles) used for training a SOM map and the weight vectors of each neuron (squares) after the training phase. In this figure, each weight vector represents a microcluster of input patterns, which is the quantization characteristic. The relationship between the weight vectors can be interpreted as a boundary, which can be understood as a Voronoi region, as exemplified by the shaded area in Figure 1. In operational terms, this can be exploited in a classification process through the two-step strategy introduced and explored herein. In the first step, a test sample $x_{te}$ (see Figure 1, the unfilled circle) is compared to the weight vectors of the trained SOM map (the squares of Figure 1), and the algorithm defines the closest unit $c$ according to the following equation:

$$c = \arg\min_{j} \lVert x_{te} - w_j \rVert \qquad (3)$$

Hence, as $w_c$ is the nearest unit, we know the list of input pattern indices that should be queried, that is, $\mathrm{BMUL}_c$. Illustratively, consider that weight vector $w_c$ belongs to a Voronoi region (see Figure 1, the shaded area), which has a list with the indices of the known input patterns (filled circles). Also in this figure, the unlabeled sample (unfilled circle) belongs to the region covered by unit $c$ (shaded area); that is, in the second step of the classification process, the kNN algorithm is performed with a reduced set of objects.

However, note that some input pattern objects stored in the dataset (filled circles) that are the closest to the object being classified (unfilled circle) belong to neighboring Voronoi regions and are consequently represented in other lists; see Figure 1, circle with a dotted line.

For that reason, in a classification task with kNN (or iNN, as will be introduced in the next section) combined with SOM, using only the objects represented in the $\mathrm{BMUL}_c$ list results in a substantially reduced classification time but can reduce the accuracy rate. Thus, we explored the second important feature of SOM, the topological ordering of the training dataset objects. In other words, in addition to the $\mathrm{BMUL}_c$ list, the lists of adjacent neurons in the SOM map grid are also consulted.

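The visit of adjacent units depends on the grid initially set at the SOM training phase. For a SOM trained with a rectangular lattice topology, the four adjacent units should be considered. Thus, the list for the unknown pattern $x_{te}$ is defined as

$$\mathrm{BMUL}(x_{te}) = \mathrm{BMUL}_c \cup \bigcup_{j \in \mathcal{N}(c)} \mathrm{BMUL}_j \qquad (4)$$

where $\mathcal{N}(c)$ denotes the units adjacent to the winning unit $c$ on the grid.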

Otherwise, for a hexagonal lattice topology, we have to consider six adjacent units and so on. In previous studies using SOM with NN [5, 20, 21], we compared the two neighborhood topologies (rectangular and hexagonal) and the results were equivalent. For this reason, the rectangular lattice topology was chosen in this work.

Finally, in the second step of the classification method proposed here, the reduced set of objects given by (4) is used to find the $k$ nearest neighbors (kNN). Note that the set of objects extracted from the queried lists of (4), denoted $\mathrm{BMUL}(x_{te})$, is part of the set of input pattern objects used for the SOM training; that is, $X_{red} \subset X_{tr}$ and $r \ll N$. Formally, we have

$$X_{red} = \{\, x_p \in X_{tr} \mid p \in \mathrm{BMUL}(x_{te}) \,\} \qquad (5)$$

Thus, the class of the $k$ nearest instances (or informative instances, as will be explained in the next section) is used to label the unknown sample $x_{te}$. This framework combination was initially called SOMkNN (and the SOMiNN classifier will be introduced here).

In summary, the conventional algorithm kNN (or iNN) compares the unknown sample with all the instances of the dataset; here, the comparison is restricted to a small selection of instances from the training dataset. The main implementation steps are described as pseudocode in Algorithm 1.

Input: the weight vectors of the trained SOM map ($w_j$); the training objects dataset ($X_{tr}$); and an unknown sample ($x_{te}$)
Output: the label of the unknown sample ($y$)
Compare the unknown sample with each weight vector of the trained map using Eq. (3)
The units to be visited are defined by Eq. (4) and the input pattern objects are retrieved by Eq. (5), recovering a reduced training dataset $X_{red}$.
The reduced dataset and the unknown sample are used by a classifier (kNN or iNN) that returns the object class.
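The steps of Algorithm 1 can be sketched as below for the kNN case. This is a sketch under stated assumptions, not the authors' code: neurons are assumed to be numbered row by row on a rectangular grid, the helper names `grid_neighbors` and `som_knn_predict` are hypothetical, and the weight vectors and per-neuron index lists are assumed to have been computed beforehand.

```python
import math
from collections import Counter

def grid_neighbors(c, rows, cols):
    """Winning unit c plus its (up to four) adjacent units on a rectangular grid."""
    r, col = divmod(c, cols)
    units = [c]
    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        nr, nc = r + dr, col + dc
        if 0 <= nr < rows and 0 <= nc < cols:
            units.append(nr * cols + nc)
    return units

def som_knn_predict(weights, bmul, rows, cols, train_X, train_y, query, k=1):
    # Step 1: find the winning unit for the unknown sample
    c = min(range(len(weights)), key=lambda j: math.dist(query, weights[j]))
    # Step 2: retrieve the training objects listed by the winner and its neighbors
    candidates = [i for u in grid_neighbors(c, rows, cols) for i in bmul[u]]
    # Step 3: run kNN only on the reduced set
    ranked = sorted(candidates, key=lambda i: math.dist(train_X[i], query))
    votes = Counter(train_y[i] for i in ranked[:k])
    return votes.most_common(1)[0][0] if votes else None

# Toy 1 x 3 map (values assumed for illustration)
weights = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
bmul = [[0, 2], [1, 3], []]
train_X = [(0.1, 0.0), (0.9, 1.2), (0.2, 0.1), (1.1, 0.9)]
train_y = ["a", "b", "a", "b"]
print(som_knn_predict(weights, bmul, 1, 3, train_X, train_y, (0.4, 0.4), k=3))
```

Returning `None` when every consulted list is empty is a choice made for this sketch; consulting the adjacent units makes that case less likely.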

In this section, we formalized a strategy to select input pattern objects to be used as references in a classification task and to reduce the classification time of the kNN algorithm. The next section introduces the iNN algorithm, which is less sensitive to the parameter $k$ and therefore works better than kNN in datasets with overlapped classes (boundary not well defined).

2.4. Informative Nearest Neighbors

Some data classification approaches based on nearest neighbor, in addition to defining a given range of values to find the nearest neighbors, also utilize new distance metrics, such as the informative nearest neighbor [10]. In other words, they utilize in the analysis of a new object of unknown class a measure that quantifies which training set object is most informative.

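In order to find the informative nearest neighbor, the iNN algorithm, as it is called here following the proposal by Song et al. [10], calculates the informativity through the following equation:

$$I(x_j \mid Q) = -\log\bigl(1 - P(x_j \mid Q)\bigr)\, P(x_j \mid Q) \qquad (6)$$

where $I(x_j \mid Q)$ is the value of the informativity between the neighbor $x_j$ and the object under analysis of unknown class ($Q$), and $P(x_j \mid Q)$ is the probability of the object $x_j$ being the informative nearest neighbor. This probability is defined by the following equation:

$$P(x_j \mid Q) = \Pr(x_j \mid Q)^{\eta} \Bigl( 1 - \prod_{n=1}^{N} \Pr(x_j \mid x_n)^{\mathbb{I}[y_j \neq y_n]} \Bigr) \qquad (7)$$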

The first term in (7), $\Pr(x_j \mid Q)^{\eta}$, is defined as the probability that the object $x_j$ is close to the object $Q$, and $\eta$ is defined as $N_{x_j}/N$, where $N_{x_j}$ is the number of objects that have the same class as $x_j$. The second part in (7) indicates the probability that the object $x_j$ is distant from the other objects $x_n$ of the training dataset $X_{tr}$. The indicator $\mathbb{I}[\cdot]$ will be 1 if the class attributes of the objects $x_j$ and $x_n$ are different; in other words, $y_j \neq y_n$. Therefore, it can be understood as a penalty factor.

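The probability $\Pr(x_j \mid Q)$ in (7) can be defined as a function of the distance between the objects; in other words,

$$\Pr(x_j \mid Q) = \exp\bigl( -\lVert x_j - Q \rVert^{2} / \gamma \bigr) \qquad (8)$$

where $\gamma$ is a scale parameter.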
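To make the informativity computation concrete, the sketch below follows the verbal description above: a closeness term raised to the power $\eta$, multiplied by one minus the product of the closeness to objects of other classes. It assumes an exponential squared-distance closeness with a scale parameter `gamma` and no normalization constant, both taken as assumptions here; the function names are hypothetical, and the single most informative object is used to label the query, which is a simplification.

```python
import math

def closeness(a, b, gamma=1.0):
    """Assumed distance-based probability that a is close to b."""
    return math.exp(-math.dist(a, b) ** 2 / gamma)

def informativeness(j, train_X, train_y, query, gamma=1.0):
    xj, yj = train_X[j], train_y[j]
    # eta: fraction of training objects sharing the class of x_j
    eta = sum(1 for y in train_y if y == yj) / len(train_y)
    near_query = closeness(xj, query, gamma) ** eta
    # Product over objects of a different class only (indicator equals 1)
    prod = 1.0
    for xn, yn in zip(train_X, train_y):
        if yn != yj:
            prod *= closeness(xj, xn, gamma)
    p = min(near_query * (1.0 - prod), 1.0 - 1e-12)  # guard against log(0)
    return -math.log(1.0 - p) * p

def inn_predict(train_X, train_y, query, gamma=1.0):
    """Label the query with the class of its most informative training object."""
    best = max(range(len(train_X)),
               key=lambda j: informativeness(j, train_X, train_y, query, gamma))
    return train_y[best]

train_X = [(0.0, 0.0), (0.2, 0.0), (3.0, 3.0), (3.2, 3.0)]
train_y = ["a", "a", "b", "b"]
print(inn_predict(train_X, train_y, (0.1, 0.1)))
```

The inner loop over the training set for every candidate is what makes the measure costly: the query is compared with the training objects, and the training objects are also compared with each other.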

To understand the iNN algorithm in practical terms, consider the dataset in Figure 2(a), represented by shaded circles, where the shades (dark and light) represent the two classes of the set. Now consider Figure 2(b), which has the same training objects with the addition of an object without class, represented by a circle without shade. In Figure 2(c), the contours around training objects and the test object represent the classification process executed with the traditional kNN, with Euclidean distance and $k$ equal to 5. In this process, the majority class of the nearest neighbors is the one represented by dark shading, and the decision is therefore made in favor of this class. However, the object under analysis has as its nearest neighbor an object of the training set that belongs to the class with light shading, and this object, in turn, also has as its neighbor another object of the same class. With the iNN algorithm, the informativity takes into consideration not only the majority class but also the nearest objects and the concordance that the other objects of the training set have with the nearest object. In conclusion, in the case of the iNN, the classification would be made in favor of the class represented by the lightest shading (Figure 2(d)).

(a) Training dataset

(b) Training dataset with the test object

(c) kNN classification process

(d) iNN classification process

Thus, the concept of informative nearest neighbor has the following definitions. Within the $k$ nearest neighbors, the object that is nearest to the object being classified and is distant from other objects of different classes is considered the most informative object, and its class is attributed to the unknown object. On the other hand, an object that has a different class from the most informative object is considered least informative. An object is also considered least informative within the $k$ nearest neighbors when it has the same class as the most informative object and is nearest to other objects of different classes.

The informativity calculation has a high computational cost because, in addition to comparing the object under analysis with the objects of the training set (first part in (7)), the algorithm also requires a comparison between the training set objects (second part in (7)). In order to reduce the computational effort, Song et al. [10] suggest executing the kNN algorithm before executing the iNN, to define a reduced dataset with the most similar objects according to the Euclidean metric. However, the kNN algorithm has the disadvantages presented in the Introduction (the need to store the training set, noise sensitivity, etc.), and its use before the iNN can affect the performance in the classification of objects that are in a border region, as illustrated in Figure 2(c).

The following section presents a proposal that combines the SOM with the iNN algorithm to build a process that will be named SOMiNN. This section will also show the advantages of the SOMiNN over the iNN.

3. Methodology for Combining SOM and iNN: A Hybrid Classification SOMiNN

The approach utilized by the SOMiNN classifier explores the concepts of quantization, topology preservation, and informativity. As already mentioned, an informative object allows the correct prediction of an unknown object, even when the boundary is not well defined. When talking about information, we cannot assess information quality without first measuring it. Information quality is one of the determining keys for the quality of the decisions and actions that are made according to it [25]. This is exactly what the SOMiNN classifier proposes to do; in other words, it measures the information of the training set objects before making the classification decision for the unknown object.

In order to understand the SOMiNN combination, consider a SOM trained with the objects from Figure 2(a) without using the class information (shaded color). The adjusted prototypes of the trained SOM map (weight vectors) are represented in Figure 3(a). The result of the SOM can be generally understood as a summary of the training set through a set of prototypes, each with a Voronoi region, with the number of prototypes being smaller than the number of training objects; in this example, the twelve objects were summarized into four prototypes. The number of prototypes is a parameter that corresponds to the number of neurons of the SOM map.

(a) Training set and prototypes generated after the training of the SOM

(b) SOM: The object to be classified is represented by the circle and the prototypes are represented by the squares

(c) iNN: The object under analysis will be compared with the objects represented by the nearest or best match prototype

Now consider the classification of the new object presented in Figure 2(b), with the prototype set, instead of the training set, being used in a first comparison. In this case, in the initial phase of the classification process, the object under analysis is compared with the set of prototypes, as illustrated in Figure 3(a). Repeating the process used in SOM training for the selection of the winning neuron, based on the Euclidean distance, the neuron nearest to the object under analysis is selected (winner or best match). Once the nearest prototype is known, it is possible to know which training set objects it represents; see Figure 3(b), where each prototype has a Voronoi region. Thus, the reduced set of training objects is retrieved to start the classification phase with the iNN. Finally, the classification is done with a reduced set, as shown in Figure 3(c).

In summary, the last step in Algorithm 1, the use of the classifier algorithm, is executed with iNN. This process is here called the SOMiNN classifier.

Since the process depends on the selection of the number of neurons of the SOM map, we utilize the empirical proposal of Vesanto et al. [26], which defines the number of neurons as the square root of the number of objects of the training set. After training the SOM map, some prototypes can be empty; in other words, a prototype may represent no object of the training set. In order to handle this situation, the proposal of Silva and Del-Moral-Hernandez [5] is utilized: besides retrieving the objects of the winning prototype, the objects of the adjacent prototypes are also retrieved.

The combination of the SOM neural network with the iNN explores the main characteristics that define the potential of a data classifier, which are storage reduction, noise tolerance, generalization accuracy, and time requirements [9]. In contrast to the iNN, which preprocesses the data with the computationally expensive kNN algorithm, the algorithm proposed in this work reduces the data representation through the SOM. In addition, with the use of the SOM, the classification time of the SOMiNN is expected to be shorter than that of the iNN, and memory use is expected to be lower, improving the classifier's performance in terms of classification time.

The next section describes all the steps carried out in the experiments with the SOMiNN classifier.

4. Experimental Methodology, Results, and Analysis

This section presents the datasets utilized in the experiments and the parameterization of the classifiers utilized for the comparison with our SOMiNN proposal. The experiments use artificial datasets for qualitative and quantitative analysis and public datasets used as benchmarks in the literature to evaluate the efficacy of the proposed algorithm.

4.1. Datasets

In order to provide a qualitative analysis with visualization of the border decision-making area and a quantitative analysis in terms of classifier accuracy, three datasets were generated with the following features: 300 objects, two attributes, two classes, and a balanced number of objects per class. For all datasets, the objects were distributed with the same mean but with difference in the standard deviation value, in order to force an overlapping of classes. Thus, each dataset represents distinct situations on the border of classes: no, low, and high confusion.
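Datasets with these features could be generated along the following lines. The paper does not state the distribution, means, or standard deviations used, so the Gaussian form and every numeric value below are assumptions made only for illustration; the function name `make_two_class_dataset` is hypothetical.

```python
import random

def make_two_class_dataset(std, n_per_class=150,
                           means=((0.0, 0.0), (4.0, 4.0)), seed=0):
    """300 objects, two attributes, two balanced classes; `std` controls the overlap."""
    rng = random.Random(seed)
    X, y = [], []
    for label, (mx, my) in enumerate(means):
        for _ in range(n_per_class):
            X.append((rng.gauss(mx, std), rng.gauss(my, std)))
            y.append(label)
    return X, y

# Same class means in every dataset; only the standard deviation changes (assumed values)
datasets = {
    name: make_two_class_dataset(std)
    for name, std in (("no overlap", 0.5), ("low overlap", 1.5), ("high overlap", 3.0))
}
print({name: len(X) for name, (X, y) in datasets.items()})
```

Keeping the means fixed and widening the spread is one simple way to move from well-separated classes to a confused border region.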

In order to evaluate the efficacy of the proposed algorithm and compare it with others from the literature, 21 public databases were chosen (Repository of the University of California, Irvine, UCI) that are used as benchmarks for Prototype Generation approaches. Table 1 summarizes the properties of each benchmarking dataset in number of objects (Obj), number of attributes (Att), and number of classes (Cla). For all databases, the attributes are numerical. The separation of these datasets into training and test sets was done with 5-fold cross validation.
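For readers unfamiliar with the procedure, a 5-fold split can be sketched as below. The paper relies on partitions that were already available, so this generator is only an illustration of the idea; the name `k_fold_indices` is hypothetical, and the split is a plain shuffled one without class stratification.

```python
import random

def k_fold_indices(n, folds=5, seed=0):
    """Yield (train_indices, test_indices) pairs; each object is tested exactly once."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    parts = [idx[f::folds] for f in range(folds)]  # near-equal sized folds
    for f in range(folds):
        test = parts[f]
        train = [i for g, part in enumerate(parts) if g != f for i in part]
        yield train, test

for train, test in k_fold_indices(10):
    print(len(train), len(test))
```

Accuracy and kappa would then be computed on each test fold and summarized by their mean and standard deviation.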

4.2. Parameterization of Algorithms

The SOMiNN approach will be compared with kNN, iNN, and SOMkNN. The classifier parameterizations are presented in Table 2. The SOM parameterization is the same for SOMkNN and SOMiNN.

The experiments were implemented using the R language, version 3.1.2, with RStudio IDE version 0.98 and using a conventional computer with Windows 10, i7 with 8 GB RAM. The experimental results are presented in the following section.

4.3. Qualitative and Quantitative Analysis Using the Artificial Dataset

The objective of the experiments using artificial datasets was to compare the performance of the kNN, iNN, SOMkNN, and SOMiNN classifiers in situations where there are well-separated classes (Figure 4(a)), partially overlapped classes (Figure 4(b)), and highly overlapped classes (Figure 4(c)).

(a) Dataset without overlapped classes

(b) Dataset with slightly overlapped classes

(c) Dataset with highly overlapped classes

Analyzing qualitatively, starting with the kNN results in Figures 4(a)(A), 4(b)(A), and 4(c)(A), we can note that the boundary separation degrades as soon as the classes start to overlap. In the worst case, a high overlap (Figure 4(c)), one of the kNN disadvantages becomes clear, because the decision boundary is built considering all objects as having the same importance. For the iNN results (Figures 4(a)(B), 4(b)(B), and 4(c)(B)), the separation border is clearly smoother, even when the class overlap increases. This is because the separation was defined by the informative representation of the objects from the same class. This fact is most evident in the last experiment (Figure 4(c)), where we can observe that the boundary separation is created to preserve the predominant class in the border region.

Figures 4(a)(C and D), 4(b)(C and D), and 4(c)(C and D) are the results using SOM as the Prototype Generation approach. That is, the decision boundary was generated without using all objects of the database but, instead, based on objects distributed among the prototypes of the trained SOM map. In this qualitative analysis, the most important point is that the decision boundary was preserved in all experiments, without significant changes.

Finally, we analyzed the experiments with artificial data quantitatively, with an average classification accuracy computed on the dataset of objects used to generate the decision boundary of Figure 4. The results are shown in Figure 5. The conclusion for the experiments using artificial datasets is that iNN is more effective than kNN when the class separation has high confusion, since its accuracy is not abruptly reduced. We also note in this analysis that the use of SOM as a Prototype Generation method does not significantly degrade the accuracy.

The artificial datasets allow both qualitative and quantitative comparisons between the classifiers. The next experiment has the objective of expanding the previous study [23] through analysis with other performance measures, such as kappa, the impact of dataset reduction on accuracy, and classification time. For these new experiments, 12 new public datasets were used that are benchmarks for Prototype Generation approaches [9].

4.4. Experiments and Analysis Using the Benchmarking Dataset

This section shows the experiments and results for datasets introduced in Table 1. The results are analyzed using the following measures as performances: accuracy, kappa, hypothesis test, rate of dataset reduction, and classification time.

Table 3 shows all the classification results for the paper experiments. In this table, the accuracy and kappa measures are shown in terms of average and standard deviation. The other results are also discussed in this section.

In practical aspects, the accuracy and kappa measures are equivalent in terms of performance. For purposes of simplification, only the accuracy will be considered in the extended discussion of the result analysis.
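For reference, both measures can be computed from the true and predicted labels as sketched below. The kappa shown is the standard Cohen's kappa, which corrects the observed agreement for the agreement expected by chance; the assumption here is that this is the kappa used in the experiments, and the function names are hypothetical.

```python
from collections import Counter

def accuracy(y_true, y_pred):
    """Fraction of objects whose predicted class matches the true class."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def cohen_kappa(y_true, y_pred):
    """Agreement beyond chance: (p_o - p_e) / (1 - p_e)."""
    n = len(y_true)
    p_o = accuracy(y_true, y_pred)
    true_counts, pred_counts = Counter(y_true), Counter(y_pred)
    # Chance agreement from the marginal class frequencies
    p_e = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    return (p_o - p_e) / (1.0 - p_e) if p_e != 1.0 else 1.0

y_true = ["a", "a", "a", "b", "b", "b"]
y_pred = ["a", "a", "b", "b", "b", "a"]
print(accuracy(y_true, y_pred), cohen_kappa(y_true, y_pred))
```

In this toy case the accuracy is 4/6 while kappa is 1/3, because half of the agreement would be expected by chance with two balanced classes.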

The accuracy is analyzed by comparing the results of the classifiers in pairs. The mean and standard deviation of each dataset are compared using the t-test at a 95% confidence level. The comparison result is shown in Table 4. In this table, the dataset indices (“#,” see Table 1) are separated according to the classifier results: higher results ($A > B$), equal results ($A = B$), and lower results ($A < B$), with $A$ and $B$ representing the classifiers compared in pairs. The same comparison structure is then used for counting the percentage of incidences.
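One way such a pairwise comparison could be carried out from the reported means and standard deviations is sketched below. It assumes a pooled two-sample t-test over the 5 folds of each classifier (8 degrees of freedom, two-sided critical value 2.306 at the 5% level); the paper does not state which t-test variant was used, and the function name and example values are hypothetical.

```python
import math

FOLDS = 5            # assumed: one accuracy value per cross-validation fold
T_CRIT_DF8 = 2.306   # two-sided 5% critical value of Student's t, 8 degrees of freedom

def compare_classifiers(mean_a, std_a, mean_b, std_b):
    """Return '>', '=' or '<' for classifier A versus classifier B."""
    pooled_var = (std_a ** 2 + std_b ** 2) / 2.0  # equal group sizes
    if pooled_var == 0.0:
        return "=" if mean_a == mean_b else (">" if mean_a > mean_b else "<")
    t = (mean_a - mean_b) / math.sqrt(pooled_var * 2.0 / FOLDS)
    if abs(t) <= T_CRIT_DF8:
        return "="  # difference not significant at the 95% level
    return ">" if t > 0 else "<"

print(compare_classifiers(0.90, 0.02, 0.80, 0.03))  # >
print(compare_classifiers(0.90, 0.05, 0.88, 0.05))  # =
```

Counting how often each symbol occurs over the datasets gives the kind of percentages summarized for each classifier pair.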

The general counting is shown in Table 5. This table has an additional column to represent the sum of the percentages of equal and lower results, in order to show when a classifier performance is really better than the other. This was the reason to compare () with () and () in Table 5.
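The pairwise comparison and the percentage count can be sketched as below. This is only an illustration under stated assumptions: the section does not say how many runs produced each average, so `n_runs=10` is hypothetical, the test is a two-sample t-test with equal run counts, and `T_CRIT_DF18 = 2.101` is the two-sided 95% critical value of Student's t for 18 degrees of freedom. The accuracy values in the example are made up.

```python
import math

# Two-sided 95% critical value of Student's t with 18 degrees of freedom
# (2 * 10 runs - 2); the number of runs is an assumption, not from the paper.
T_CRIT_DF18 = 2.101

def compare(mean_a, std_a, mean_b, std_b, n_runs=10, t_crit=T_CRIT_DF18):
    """Return '>', '=' or '<' for classifier A versus classifier B on one dataset."""
    se = math.sqrt((std_a ** 2 + std_b ** 2) / n_runs)
    if se == 0.0:
        if mean_a == mean_b:
            return "="
        return ">" if mean_a > mean_b else "<"
    t = (mean_a - mean_b) / se
    if abs(t) <= t_crit:
        return "="  # difference is not significant at the 95% level
    return ">" if t > 0 else "<"

def tally(results_a, results_b):
    """Percentage of datasets with higher, equal, and lower results for A."""
    outcomes = [compare(ma, sa, mb, sb)
                for (ma, sa), (mb, sb) in zip(results_a, results_b)]
    return {s: 100.0 * outcomes.count(s) / len(outcomes) for s in (">", "=", "<")}

# Made-up (mean, standard deviation) accuracies for four datasets.
results_a = [(0.90, 0.02), (0.80, 0.05), (0.70, 0.03), (0.95, 0.01)]
results_b = [(0.85, 0.02), (0.81, 0.05), (0.76, 0.03), (0.95, 0.01)]
print(tally(results_a, results_b))  # {'>': 25.0, '=': 50.0, '<': 25.0}
```

Summing the "equal" and "lower" percentages, as the additional column of Table 5 does, gives the share of datasets on which the first classifier is not significantly better.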

Analyzing the results in Table 5, it is possible to note in the first two rows that IkNN proved to be better than kNN in most cases (66.7%, and 52.4% when combined with SOM). The use of SOM in the classification process (last two rows of the table) proved to be slightly better or worse depending on the case. The performance of SOM with kNN is improved (52.4%), and with IkNN there is a little degradation (47.7%). However, an important result that should be emphasized (last row of the table) is that the use of SOM with IkNN maintains or improves the performance in most databases (52.4%).

As a final analysis of the accuracy, and in order to show that the degradation caused by the use of SOM has little impact on the final performance, the same pairs of classifiers presented above are compared using a radar chart (Figure 6). In this graph, the external values (polar scale) indicate the dataset numbers, from 1 to 21, and the internal values show the accuracy, from 0.6 to 1.0. The ideal result would be a contour at 1.0. In this study, most results fall on overlapping lines, which represent equivalent results for the contrasted classifiers. Combining the accuracy results (statistical and chart), we can consider that IkNN is, in the vast majority of cases, superior in classification performance to kNN. It is interesting to note that the IkNN superiority occurs mainly in datasets with accuracy below 90%, namely 7, 9, 11, 12, 14, 15, 16, 18, 19, and 20. On the other hand, the result is lower in experiments with datasets 2 and 21, that is, where the accuracy is close to 1 (100%). Therefore, the results also suggest that IkNN is superior in datasets in which the decision boundary is not well separated.

The next analysis verifies the efficiency of SOM in reducing the number of input objects. For this, the reduction percentage and the accuracy obtained for each dataset are examined. The results are shown in Figure 7. Interestingly, in both results, SOMkNN and SOMIkNN, there are three very well-defined regions in the accuracy reduction experiment. The datasets in the first region have an average of 150.5 objects, those in the second an average of 215, and those in the last an average of 694.5 objects. That is, the reduction varies with the number of objects. Therefore, the results of SOMkNN (Figure 7(a)) and SOMIkNN (Figure 7(b)) show that the more objects in the dataset, the higher the reduction rate.

Figure 7: (a) results for SOMkNN; (b) results for SOMIkNN.
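For reference, a reduction rate of this kind can be computed as in the sketch below. It assumes that the rate is the percentage of training objects that are not presented to the classifier, which is the usual definition in the Prototype Generation literature but is not spelled out in this section. The function name and the sizes are hypothetical.

```python
def reduction_rate(n_training_objects, n_retained_objects):
    """Percentage of training objects removed before classification."""
    if n_training_objects <= 0:
        raise ValueError("the training set must contain at least one object")
    return 100.0 * (1.0 - n_retained_objects / n_training_objects)

# Made-up sizes, for illustration only (not values read from Figure 7).
print(reduction_rate(200, 50))  # 75.0
```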

The next results to be analyzed concern the time consumed in the classification process. The results are shown in Figure 8; for ease of interpretation, the databases are arranged along the vertical axis in ascending order of number of objects. On the vertical axis, each dataset is described by name, number of objects, and number of attributes (described in Table 3). The time shown on the horizontal axis is measured in seconds.

Figure 8: (a) classification time of kNN and IkNN; (b) classification time of SOMkNN and SOMIkNN.

Analyzing in detail the classification times of the kNN and IkNN algorithms in Figure 8(a), it is observed that, up to a certain number of objects, around 180 (datasets appendicitis to wine), the classification time is almost linear. Beyond this point the trend is not clear. The reason is that the remaining databases have an increasing number of objects and also a varying number of attributes. This means that the classification time depends not only on the number of objects but also on the number of attributes. For example, comparing the balance (625 objects) and dermatology (366 objects) databases, the latter has fewer objects yet consumes more time. Another interesting case is observed between mov_libras and vowel. The former has almost half the number of objects and nearly ninefold more attributes than the latter, but both consumed an equivalent time in the classification process. Another point to consider in the graph is that, in every experiment, the classification time of IkNN is higher than that of kNN. This result was expected because, as mentioned earlier, IkNN is computationally more costly: kNN is run before it as a preprocessing step, and only then does it find the closest informative object. Although this seems an obvious result, the experiments confirm it. Finally, to give a general idea of the time, a trend line was added to the results; the best fit was an exponential trend, with a Pearson coefficient above 0.7, which is considered a high value. As it is difficult to find a relationship between the numbers of objects and attributes that explains the processing time, the trend is more indicative of the number of objects. Thus, for this experiment, the classification time is more sensitive to the number of objects.
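An exponential trend line of this kind can be obtained with a straight-line least-squares fit on the logarithm of the time. The sketch below illustrates the idea and is not the authors' procedure: the section does not say how the Pearson coefficient was computed, so here it is assumed to be the correlation between the number of objects and the log of the time. The data points are synthetic, generated from a known curve.

```python
import math

def fit_exponential(xs, ys):
    """Least-squares fit of y = a * exp(b * x) via a straight line on ln(y).

    Returns (a, b, r), where r is the Pearson correlation between x and ln(y).
    """
    logs = [math.log(y) for y in ys]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_l = sum(logs) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sll = sum((l - mean_l) ** 2 for l in logs)
    sxl = sum((x - mean_x) * (l - mean_l) for x, l in zip(xs, logs))
    b = sxl / sxx                      # slope of the line on the log scale
    a = math.exp(mean_l - b * mean_x)  # intercept mapped back to the time scale
    r = sxl / math.sqrt(sxx * sll)
    return a, b, r

# Synthetic data: number of objects versus time following 2 * exp(0.005 * x).
n_objects = [100, 200, 400, 800]
seconds = [2.0 * math.exp(0.005 * x) for x in n_objects]
a, b, r = fit_exponential(n_objects, seconds)
print(round(a, 6), round(b, 6), round(r, 6))
```

With real measurements, points whose time is driven by the number of attributes rather than the number of objects pull r below 1, which is in line with the moderate coefficient reported above.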

The same time experiment was repeated for SOMkNN and SOMIkNN (Figure 8(b)). The behavior of the results in this experiment is similar to that discussed for Figure 8(a). This can be interpreted in two ways. The first is that the above analysis also applies to these results; the second, and more important, is that the objects selected through the reduced set of SOM prototypes maintain the characteristics of the raw database. However, it should be noted that the classification time scale (horizontal axis) here ranges from 0 to 100 seconds, whereas in the earlier results it ranged from 0 to 350 seconds. Nonetheless, the importance of this result is that the trend remains exponential, with a Pearson coefficient at the 0.7 level. It is noteworthy that the times shown for SOM (SOMkNN and SOMIkNN) include the training time.

For a global analysis, Figure 9 shows all classification time results together: kNN, IkNN, SOMkNN, and SOMIkNN. The databases on the horizontal axis are again arranged by number of objects. Analyzing the results in this figure qualitatively, we note that when the number of objects is lower than about 180 (up to the wine dataset), using SOM as a preprocessing step for the kNN and IkNN algorithms brings no significant advantage in classification time. Thus, using SOM to decrease the classification time of the kNN and IkNN algorithms seems to be more advantageous in databases with more than 180 objects (from the sonar dataset onward). This can be observed at the upper end, where the classification time is high (vowel dataset) and the use of SOM reduces the kNN and IkNN classification time by more than 3 times.

4.5. Contrasting the Results of This Work with That in the Literature

To give an idea of the importance of the results obtained here, mainly with IkNN and the combined SOMIkNN approach, the performance indices were compared with results from the literature [9]. The approach chosen for the comparative experiments, the Chen algorithm, belongs to the same Prototype Generation category as SOM. The Chen algorithm [9, 27] was executed on the datasets of Table 3, and the compiled results for this algorithm are shown in terms of average and standard deviation of accuracy, time, and reduction. The comparative results are summarized in Table 6.

Note from the comparative results of Table 6 that IkNN is the algorithm with the best accuracy. This is an important result because it is the algorithm introduced here as an alternative to kNN. In terms of time, the lowest result was obtained by SOMkNN; this is consistent with its use of SOM, the Prototype Generation method introduced in this work, and with the expectation that IkNN is more time consuming than kNN, as discussed in Section 2. Finally, the Chen algorithm has the largest reduction, which is also to be expected since, according to Triguero et al. [9], the prototypes parameter has to be configured as 90% of the number of objects of the dataset.

5. Conclusion

This paper introduces a new classifier named SOMIkNN, which is based on the combination of Self-Organizing Maps (SOM) and informative nearest neighbors (IkNN). The IkNN classifier is computationally costly, because in a classification process the informativeness is calculated not only for the object under classification but also with respect to the other objects of the training set. Song et al. [10] suggested running the kNN algorithm (with the best value of k experimentally found to be 7) before IkNN to minimize the high computational cost, that is, using kNN to find a reduced subset for the classification process with the informative nearest neighbor algorithm.

To build on the work of Song et al. [10], in this paper kNN has been replaced by SOM because of its vector quantization and its preservation of the topology of the raw dataset. In other words, a SOM map is trained with the dataset and, afterwards, the objects of this set are associated with their nearest (or winning) neurons. Thus, each neuron of the map, or prototype, represents a subset of objects. In a classification process, the object is then compared with the map prototypes, and the winner is elected. The objects mapped to this winning neuron and to the adjacent neurons are retrieved and passed to IkNN for execution.
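The retrieval scheme just described can be sketched in a few lines of plain Python. This is a simplified illustration and not the authors' implementation: the SOM is a small rectangular grid trained with a basic online rule, "adjacent neurons" is assumed to mean the eight grid neighbours of the winner, and a plain nearest-neighbour vote stands in for the informativeness-based classifier, whose measure is not defined in this section. All function names, parameters, and data points are hypothetical.

```python
import math
import random
from collections import Counter, defaultdict

def dist2(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def bmu(weights, x):
    """Grid position of the winning neuron (closest weight vector) for x."""
    return min(weights, key=lambda pos: dist2(weights[pos], x))

def train_som(data, rows, cols, epochs=50, lr0=0.5, seed=0):
    """Train a small rectangular SOM with a decaying Gaussian neighbourhood."""
    rng = random.Random(seed)
    # Initialise each neuron with a randomly chosen training object.
    weights = {(r, c): list(rng.choice(data))
               for r in range(rows) for c in range(cols)}
    sigma0 = max(rows, cols) / 2.0
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1.0 - frac)
        sigma = max(sigma0 * (1.0 - frac), 0.5)
        for x in rng.sample(data, len(data)):
            wr, wc = bmu(weights, x)
            for (r, c), w in weights.items():
                h = math.exp(-((r - wr) ** 2 + (c - wc) ** 2) / (2.0 * sigma ** 2))
                for i in range(len(w)):
                    w[i] += lr * h * (x[i] - w[i])
    return weights

def map_objects(weights, data):
    """Associate the index of every training object with its winning neuron."""
    mapping = defaultdict(list)
    for idx, x in enumerate(data):
        mapping[bmu(weights, x)].append(idx)
    return mapping

def candidates(weights, mapping, query):
    """Indices of objects mapped to the winner and to its grid-adjacent neurons."""
    wr, wc = bmu(weights, query)
    found = []
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            found.extend(mapping.get((wr + dr, wc + dc), []))
    return found

def classify(weights, mapping, data, labels, query, k=1):
    """Vote among the k nearest retrieved objects (stand-in for the final classifier)."""
    idxs = candidates(weights, mapping, query) or list(range(len(data)))
    nearest = sorted(idxs, key=lambda i: dist2(data[i], query))[:k]
    return Counter(labels[i] for i in nearest).most_common(1)[0][0]

# Toy data: two small groups of points, for illustration only.
data = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.3), (0.3, 0.2),
        (5.0, 5.1), (5.2, 4.9), (4.9, 5.3), (5.1, 5.0)]
labels = ["a"] * 4 + ["b"] * 4
som = train_som(data, rows=3, cols=3)
mapping = map_objects(som, data)
print(classify(som, mapping, data, labels, (0.15, 0.15), k=3))
```

The point of the design is that the final classifier only sees the objects returned by `candidates`, so its cost depends on the size of that subset rather than on the whole training set.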

Thus, due to the preprocessing performed by SOM for the IkNN algorithm, the overall computational effort to find the informative nearest neighbor is much smaller, which results in a significant reduction in classification time compared to IkNN without preprocessing.

Therefore, the primary objective of the classifier addressed in this paper was to maintain the accuracy of IkNN while reducing the classification time. We conclude that using the objects represented by the winning neuron and its adjacent neurons was effective, since it did not degrade the performance of IkNN. The results presented in Section 4 indicate this time reduction and, in addition, that the classification rates of SOMIkNN are statistically similar to those of IkNN, that is, time reduction with accuracy preservation.

Another important conclusion from the classification experiments, mainly those using artificial datasets but also those on benchmarking datasets where the accuracy was worst, is that the IkNN approach presents more significant accuracy results when the objects of different classes are not well separated, with high mixture in the border region.

As a final conclusion, IkNN is an algorithm with better accuracy than kNN. However, the classification time is a bottleneck for the algorithm, which is mitigated by using SOM as a Prototype Generation technique. Thus, the SOMIkNN classifier proposed here is specialized in solving, in a tolerable time, problems where the border region is not well defined.

Conflicts of Interest

The authors declare that there are no conflicts of interest with regard to the publication of this paper.

Acknowledgments

This work was partially supported by CNPq (Brazilian National Council for Scientific and Technological Development) Process 454363/2014-1.

References

[1] T. M. Cover and P. E. Hart, “Nearest neighbor pattern classification,” IEEE Transactions on Information Theory, vol. 13, no. 1, pp. 21–27, 1967.
[2] S. B. Kotsiantis, B. Sotiris, I. Zaharakis, and P. Pintelas, Supervised Machine Learning: A Review of Classification Techniques, IOS Press, 2007.
[3] M. Sjöberg and J. Laaksonen, “Optimal combination of som search in best-matching units and map neighborhood,” in Proceedings of the 7th International Workshop on Advances in Self-Organizing Maps, vol. 5629, pp. 281–289, Berlin, Germany, 2009.
[4] X. Wu and V. Kumar, The Top Ten Algorithms in Data Mining, CRC Press, 2009.
[5] L. A. Silva and E. Del-Moral-Hernandez, “A SOM combined with KNN for classification task,” in Proceedings of the 2011 International Joint Conference on Neural Network, IJCNN 2011, pp. 2368–2373, San Jose, Calif, USA, August 2011.
[6] R. O. Duda, P. E. Hart, and D. G. Stork, Pattern Classification, Wiley-Interscience, New York, NY, USA, 2nd edition, 2001.
[7] A. Torralba, R. Fergus, and W. T. Freeman, “80 million tiny images: a large data set for nonparametric object and scene recognition,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 30, no. 11, pp. 1958–1970, 2008.
[8] Z. Deng, X. Zhu, D. Cheng, M. Zong, and S. Zhang, “Efficient kNN classification algorithm for big data,” Neurocomputing, vol. 195, pp. 143–148, 2016.
[9] I. Triguero, J. Derrac, S. García, and F. Herrera, “A taxonomy and experimental study on prototype generation for nearest neighbor classification,” IEEE Transactions on Systems, Man and Cybernetics Part C: Applications and Reviews, vol. 42, no. 1, pp. 86–100, 2012.
[10] Y. Song, J. Huang, D. Zhou, H. Zha, and C. L. Giles, “IKNN: Informative K-Nearest Neighbor Pattern Classification,” in Proceedings of the European Conference on Principles of Data Mining and Knowledge Discovery, pp. 248–264, Springer, Berlin, Germany, 2007.
[11] S. Zhang, X. Li, M. Zong, X. Zhu, and R. Wang, “Efficient kNN Classification With Different Numbers of Nearest Neighbors,” IEEE Transactions on Neural Networks and Learning Systems, no. 99, pp. 1–12, 2017.
[12] D. Randall Wilson and T. R. Martinez, “Reduction techniques for instance-based learning algorithms,” Machine Learning, vol. 38, no. 3, pp. 257–286, 2000.
[13] H. Brighton and C. Mellish, “Advances in instance selection for instance-based learning algorithms,” Data Mining and Knowledge Discovery, vol. 6, no. 2, pp. 153–172, 2002.
[14] E. Marchiori, “Class conditional nearest neighbor for large margin instance selection,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 32, no. 2, pp. 364–370, 2010.
[15] X. Zhao, W. Lin, J. Hao, X. Zuo, and J. Yuan, “Clustering and pattern search for enhancing particle swarm optimization with Euclidean spatial neighborhood search,” Neurocomputing, vol. 171, pp. 966–981, 2016.
[16] E. Pȩkalska, R. P. W. Duin, and P. Paclík, “Prototype selection for dissimilarity-based classifiers,” Pattern Recognition, vol. 39, no. 2, pp. 189–208, 2006.
[17] S.-W. Kim and B. J. Oommen, “A brief taxonomy and ranking of creative prototype reduction schemes,” Pattern Analysis & Applications, vol. 6, no. 3, pp. 232–244, 2003.
[18] M. Lozano, J. M. Sotoca, J. S. Sánchez, F. Pla, E. Pekalska, and R. P. W. Duin, “Experimental study on prototype optimisation algorithms for prototype-based classification in vector spaces,” Pattern Recognition, vol. 39, no. 10, pp. 1827–1838, 2006.
[19] H. A. Fayed, S. R. Hashem, and A. F. Atiya, “Self-generating prototypes for pattern classification,” Pattern Recognition, vol. 40, no. 5, pp. 1498–1509, 2007.
[20] L. A. Silva, E. Del-Moral-Hernandez, R. A. Moreno, and S. S. Furuie, “Combining wavelets transform and Hu moments with self-organizing maps for medical image categorization,” Journal of Electronic Imaging, vol. 20, no. 4, Article ID 043002, 2011.
[21] L. A. Silva, E. C. Kitani, and E. Del-Moral-Hernandez, “Fine-tuning of the SOMkNN classifier,” in Proceedings of the 2013 International Joint Conference on Neural Networks, IJCNN 2013, Dallas, TX, USA, August 2013.
[22] S. Zhang, X. Li, M. Zong, X. Zhu, R. Wang, and D. Cheng, “Learning k for knn classification,” ACM Transactions on Intelligent Systems and Technology, vol. 8, no. 3, p. 43, 2017.
[23] L. J. Moreira and L. A. Silva, “Data classification combining Self-Organizing Maps and Informative Nearest Neighbor,” in Proceedings of the 2016 International Joint Conference on Neural Networks, IJCNN 2016, pp. 706–713, Vancouver, BC, Canada, July 2016.
[24] T. Kohonen, “Essentials of the self-organizing map,” Neural Networks, vol. 37, pp. 52–65, 2013.
[25] B. Stvilia, L. Gasser, M. B. Twidale, and L. C. Smith, “A framework for information quality assessment,” Journal of the American Society for Information Science and Technology, vol. 58, no. 12, pp. 1720–1733, 2007.
[26] J. Vesanto, J. Himberg, E. Alhoniemi, and J. Parhankangas, “Som toolbox for matlab 5,” Tech. Rep. A57, Helsinki University of Technology, Finland, 2000.
[27] C. H. Chen and A. Jóźwik, “A sample set condensation algorithm for the class sensitive artificial neural network,” Pattern Recognition Letters, vol. 17, no. 8, pp. 819–823, 1996.

Copyright

Copyright © 2017 Leandro Juvêncio Moreira and Leandro A. Silva. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
