Mathematical Problems in Engineering

Volume 2016, Article ID 9324793, 7 pages

http://dx.doi.org/10.1155/2016/9324793

## Neural Gas Clustering Adapted for Given Size of Clusters

Department of Applied Informatics and Mathematics, University of SS. Cyril and Methodius, J. Herdu 2, 917 01 Trnava, Slovakia

Received 19 April 2016; Revised 8 September 2016; Accepted 26 October 2016

Academic Editor: Dan Simon

Copyright © 2016 Iveta Dirgová Luptáková et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

#### Abstract

Clustering algorithms are among the major topics in big data analysis. Their main goal is to separate an unlabelled dataset into several subsets, with each subset ideally characterized by some unique characteristic of its data structure. Common clustering approaches cannot impose constraints on the sizes of clusters. However, in many applications, the sizes of clusters are bounded or known in advance. One of the more recent robust clustering algorithms is neural gas, which is popular, for example, for data compression and vector quantization in speech recognition and signal processing. In this paper, we introduce an adapted neural gas algorithm that can accommodate requirements on the size of clusters. The convergence of the algorithm towards an optimum is tested on simple illustrative examples. The proposed algorithm provides better statistical results than its direct counterpart, the balanced *k*-means algorithm. Moreover, unlike balanced *k*-means, the quality of the results of our algorithm can be straightforwardly controlled by user-defined parameters.

#### 1. Introduction

The amount of data in various disciplines, ranging from bioinformatics to web documents, increases nonlinearly each year. However, exploiting these data and extracting knowledge from them requires effective processing. Cluster analysis and clustering algorithms are a major topic of big data analysis. The goal of unsupervised clustering as a data mining task is to separate an unlabelled dataset of “observations” into several sets, where each set is ideally characterized by its own unique hidden data structure. Since the definition of the principle underlying such a data structure is subjective, there is no single best clustering algorithm or best definition of a cluster. Major approaches to clustering include hierarchical, partitional, neural network-based, and kernel-based clustering [1].

In many applications, the sizes of clusters are bounded or known in advance. Examples include student study size segmentation [2]; division into test groups [3] (e.g., searching parts of a network for intrusions [4]); customer segmentation for sales groups in marketing, where each team has a given or bounded capacity; the job scheduling problem, where machines have given capacities; document clustering constrained by storage space [5–7]; and divide-and-conquer methods, where the divide step is controlled by clustering [8]. However, traditional clustering techniques cannot impose such constraints on the sizes of clusters. Nevertheless, a few recent attempts have modified classical clustering algorithms such as *k*-means to accommodate requirements like equal cluster size [5, 8, 9], including the use of linear programming optimization techniques [7].

The *k*-means algorithm, whose core idea was proposed about 60 years ago [10], separates observations into clusters so that each observation belongs to the cluster with the nearest mean. The underlying optimization problem is NP-hard, but fast heuristic algorithms are available for solving it. These heuristics converge to local optima, which can sometimes produce counterintuitive results. A standard algorithm starts with a set of centroids and then repeatedly assigns each input “observation” to the closest centroid and recalculates the centroid of each set in the partition. Poorly chosen initial positions of the centroids may cause the algorithm to converge to a local optimum instead of the global optimum.
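As a point of reference, the assign-and-update loop described above can be sketched in a few lines of Python. This is a minimal sketch of plain (Lloyd-style) *k*-means, not balanced *k*-means. The function names, the random initialisation from the observations, and the stopping rule (stop when the centroids no longer change) are illustrative assumptions.

```python
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points: List[Point]) -> Point:
    """Coordinate-wise mean of a non-empty list of points."""
    dim = len(points[0])
    return tuple(sum(p[d] for p in points) / len(points) for d in range(dim))


def kmeans(points: List[Point], k: int, max_iter: int = 100, seed: int = 0):
    """Plain Lloyd-style k-means (illustrative sketch); returns (centroids, labels)."""
    rng = random.Random(seed)
    # Initialise the centroids with k distinct observations chosen at random.
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(max_iter):
        # Assignment step: each observation goes to its nearest centroid.
        labels = [min(range(k), key=lambda j: squared_distance(p, centroids[j]))
                  for p in points]
        # Update step: each centroid becomes the mean of its members;
        # an empty cluster keeps its previous centroid.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            new_centroids.append(mean(members) if members else centroids[j])
        if new_centroids == centroids:
            break  # no centroid moved, so a (possibly local) optimum was reached
        centroids = new_centroids
    return centroids, labels
```

For the four 1D points 0, 1, 2, and 10 used later in this paper, every choice of two distinct initial centroids leads this sketch to the centroids 1 and 10. On larger data, different seeds can end in different local optima, which is the sensitivity to initialisation mentioned above.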

The linear programming technique used in one version of *k*-means to include constraints [7] allows the constraints to be relaxed. Because it involves fixing weights for the importance of these constraints, its results cannot be directly compared with those of our method described below. Like [9], our method does not weight constraint satisfaction, because it satisfies the constraints exactly.

In this paper, we use the neural gas clustering algorithm, which was first introduced in [11] and has not yet been used for clustering with constrained cluster sizes. It was inspired by a type of neural network called SOM (self-organizing map) [12]. Neural gas resembles both neural network-based and partitional clustering. Neural gas gained popularity thanks to its robust convergence compared to online *k*-means clustering. It is mostly used in speech recognition and image processing for data compression or vector quantization.

Like SOM, neural gas can be used to group related data, or for clustering. It finds optimal data representations based on feature vectors (“observations,” represented by data points in multidimensional space) and is typically used in pattern recognition.

Like SOM and a few other artificial neural networks, neural gas adapts by repeating competitive learning and mapping. “Learning” uses the “observations” to move the cluster centers in the feature space through a competitive process called vector quantization. “Mapping” assigns each “observation” to the cluster with the closest center (Euclidean distance is used). Neural gas is composed of neurons, which define the centers of clusters and whose number is fixed in advance. The position of each neuron is defined by a weight vector of the same dimension as the “observation” data vectors. The positions of the neurons move around abruptly during training, similar to the movements of gas molecules, which gave the algorithm its name.
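The mapping step is the simplest part of the procedure. The following minimal sketch assigns each observation to the neuron (cluster center) at the smallest Euclidean distance. The function name is hypothetical, and the sketch relies on `math.dist` from the Python standard library (Python 3.8 or later).

```python
import math
from typing import List, Sequence


def map_observations(observations: List[Sequence[float]],
                     neurons: List[Sequence[float]]) -> List[int]:
    """Mapping step: for each observation, the index of the closest neuron."""
    return [min(range(len(neurons)), key=lambda j: math.dist(x, neurons[j]))
            for x in observations]
```

Ties are resolved in favour of the neuron with the lower index, because `min` returns the first minimal element.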

In our test cases, each weight vector defines the coordinates of its neuron in 1D or 2D space. After the adaptation, the coordinates of the neurons should ideally correspond to the centers of the clusters.

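Initially, each neuron is associated with a random point (vector) from the “observations” data. Then, a randomly selected observation point $\mathbf{x}$ is presented to the neural gas network. The Euclidean distances from the selected point to all the weight vectors of the neurons are calculated. These centers are then sorted by their distance from the selected point, from the closest to the most distant, $\mathbf{w}_{i_1}, \mathbf{w}_{i_2}, \ldots, \mathbf{w}_{i_N}$, where $N$ is the number of neurons. Each weight vector of the ordered sequence is then adapted by

$$\mathbf{w}_i \leftarrow \mathbf{w}_i + \varepsilon \, e^{-k_i/\lambda} \left(\mathbf{x} - \mathbf{w}_i\right), \tag{1}$$

where $k_i$ is the number of neurons whose weight vectors are closer to the current point than $\mathbf{w}_i$ (i.e., the position of $\mathbf{w}_i$ in the ordered sequence, minus 1), $\varepsilon$ is the adaptation step size, and $\lambda$ is the neighbourhood range. The parameters $\varepsilon$ and $\lambda$ are reduced as the number of presented points increases, according to

$$\varepsilon = \varepsilon_i \left(\frac{\varepsilon_f}{\varepsilon_i}\right)^{\mathrm{iter}/\mathrm{iter}_{\max}}, \qquad \lambda = \lambda_i \left(\frac{\lambda_f}{\lambda_i}\right)^{\mathrm{iter}/\mathrm{iter}_{\max}}, \tag{2}$$

where $\varepsilon_i$, $\lambda_i$ and $\varepsilon_f$, $\lambda_f$ are the initial and final values of the parameters, $\mathrm{iter}$ is the number of points presented so far, and $\mathrm{iter}_{\max}$ is the total number of presented points.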
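Equations (1) and (2) translate almost directly into code. The sketch below is a minimal pure-Python illustration of the original neural gas adaptation, not the authors' implementation. The function names, the seed, and the choice to draw observations with replacement are assumptions made for illustration.

```python
import math
import random
from typing import List


def schedule(start: float, end: float, it: int, it_max: int) -> float:
    """Equation (2): exponential decay from `start` (it = 0) to `end` (it = it_max)."""
    return start * (end / start) ** (it / it_max)


def neural_gas_step(neurons: List[List[float]], x: List[float],
                    eps: float, lam: float) -> None:
    """Equation (1): one adaptation step, updating `neurons` in place."""
    # Rank the neurons by distance to x: order[0] is the closest neuron.
    order = sorted(range(len(neurons)), key=lambda j: math.dist(neurons[j], x))
    for k, j in enumerate(order):  # k = number of neurons closer to x than neuron j
        h = eps * math.exp(-k / lam)
        neurons[j] = [w + h * (xi - w) for w, xi in zip(neurons[j], x)]


def neural_gas(data, n_neurons, it_max, eps_i, eps_f, lam_i, lam_f, seed=0):
    """Original (unconstrained) neural gas; returns the final neuron positions."""
    rng = random.Random(seed)
    # Each neuron starts at a randomly chosen observation.
    neurons = [list(p) for p in rng.sample(data, n_neurons)]
    for it in range(it_max):
        x = rng.choice(data)  # present a randomly selected observation
        eps = schedule(eps_i, eps_f, it, it_max)
        lam = schedule(lam_i, lam_f, it, it_max)
        neural_gas_step(neurons, x, eps, lam)
    return neurons
```

For example, `neural_gas([[0.0], [1.0], [2.0], [10.0]], 2, 80, 1.0, 0.05, 2.0, 0.1)` runs 80 iterations on the four 1D points of the illustrative example with $\varepsilon_i = 1$ and $\varepsilon_f = 0.05$. The $\lambda$ values 2.0 and 0.1 are arbitrary placeholders. When $\varepsilon \le 1$, every update moves a neuron along the segment towards $\mathbf{x}$, so the neurons never leave the range spanned by the data.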

The changes during learning can be compared to the gradient descent method in the most common multilayer perceptron neural networks, where the size of the weight adaptation is proportional to the difference between the network output and the ideal output. Not only the winning neuron changes its position: all the other neurons move too, and the more distant neurons move less. Unlike SOM, where the neighbours of the winner are determined by a fixed grid of neurons, neural gas determines the neighbourhood by the distance ranking in the feature space.
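The effect of the rank on the size of each move can be made concrete by evaluating the factor $\varepsilon e^{-k/\lambda}$ from (1) for each rank $k$. The two helpers below are hypothetical. They contrast the neural gas factors with hard winner-take-all competitive learning, in which only the closest neuron moves.

```python
import math
from typing import List


def step_factors(n_neurons: int, eps: float, lam: float) -> List[float]:
    """Fraction of the way towards the presented point moved by the neuron of rank k."""
    return [eps * math.exp(-k / lam) for k in range(n_neurons)]


def winner_take_all_factors(n_neurons: int, eps: float) -> List[float]:
    """Hard competitive learning: only the closest neuron (rank 0) moves."""
    return [eps] + [0.0] * (n_neurons - 1)
```

With three neurons, $\varepsilon = 0.5$, and $\lambda = 1$, the factors are 0.5, about 0.184, and about 0.068. As $\lambda$ shrinks towards 0, the factors for $k \ge 1$ vanish, and the neural gas update approaches the winner-take-all update.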

#### 2. Materials and Methods

##### 2.1. Changes in the Neural Gas Learning

The adapted learning must take into account that a data point (observation) is sometimes closer to the center of one cluster, but because this cluster is already “saturated” by other data points assigned to it and has reached its full size, the point is assigned to the next closest cluster that still has “free capacity.”
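This section does not specify the order in which observations are considered when a cluster becomes saturated. The following is therefore only a hypothetical greedy sketch of a capacity-aware mapping, not the authors' procedure. All (observation, center) pairs are visited in order of increasing distance, and an observation is assigned to the closest center that still has free capacity. The function name and the tie-breaking order are assumptions.

```python
import math
from typing import List, Sequence


def capacity_mapping(observations: List[Sequence[float]],
                     centers: List[Sequence[float]],
                     sizes: List[int]) -> List[int]:
    """Hypothetical greedy mapping in which cluster j holds at most sizes[j] points.

    Pairs are processed by increasing distance, so each observation lands in
    the closest cluster that is not yet saturated when its pair comes up.
    This heuristic does not guarantee the minimum total distance.
    """
    if sum(sizes) < len(observations):
        raise ValueError("cluster sizes cannot accommodate all observations")
    pairs = sorted((math.dist(x, c), i, j)
                   for i, x in enumerate(observations)
                   for j, c in enumerate(centers))
    labels = [-1] * len(observations)  # -1 marks an unassigned observation
    free = list(sizes)                 # remaining capacity of each cluster
    for _, i, j in pairs:
        if labels[i] == -1 and free[j] > 0:
            labels[i] = j
            free[j] -= 1
    return labels
```

With centers at 1 and 10 and both sizes set to 2, the points 0, 1, and 2 all prefer the first cluster. Only two of them fit, so the point 2 is pushed to the second cluster, giving the labels `[0, 0, 1, 1]`.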

In the first iteration of learning, the assignment of observations to clusters is not yet available, so the adaptation of the weight vectors of the cluster centers remains the same as in the original neural gas algorithm, described by (1) and (2).

However, after the mapping step, each observation point is assigned to one of the clusters. When adapting the center of a cluster, we can therefore give the points already assigned to this cluster more weight than the points assigned to other clusters.

This principle has been applied by changing the value of the index $k_i$ in (1). First, we sort the centers by their distance from the selected point, in the same way as in the classical neural gas algorithm. We then modify the sequence by moving to its front the center of the cluster to which the currently selected point was assigned in the previous iteration. The position of this center then changes the most.
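The modified ordering can be sketched as follows. The helper names are hypothetical. The sketch assumes that the cluster assignment of the presented point from the previous iteration is passed in as a neuron index. All other neurons keep their distance-based order, and the update itself is still (1).

```python
import math
from typing import List, Sequence


def adapted_ranking(neurons: List[Sequence[float]], x: Sequence[float],
                    assigned: int) -> List[int]:
    """Neuron indices in the order used for the adapted update.

    The neurons are sorted by distance to x as in classical neural gas; then
    the neuron of the cluster x was assigned to is moved to the front (rank 0).
    """
    order = sorted(range(len(neurons)), key=lambda j: math.dist(neurons[j], x))
    order.remove(assigned)
    return [assigned] + order


def adapted_step(neurons: List[List[float]], x: List[float], assigned: int,
                 eps: float, lam: float) -> None:
    """Equation (1) applied with the adapted ranks; updates `neurons` in place."""
    for k, j in enumerate(adapted_ranking(neurons, x, assigned)):
        h = eps * math.exp(-k / lam)
        neurons[j] = [w + h * (xi - w) for w, xi in zip(neurons[j], x)]
```

For example, consider neurons at 1 and 10 and the point 2, previously assigned to the second cluster. The neuron at 10 receives rank 0 even though it is farther away, so it takes the full step $\varepsilon$ towards the point.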

To provide the simplest possible illustrative example, we use four points in 1D space, placed at positions 0, 1, 2, and 10, which we cluster into two clusters with the neural gas algorithm. During the learning iterations, each point attracts both cluster centers towards itself. In the beginning, the shifts in position are large and the centers move quite abruptly. The exponents in (2) gradually change with the iterations from 0 to 1, while their base is a positive number smaller than 1, so the changes gradually diminish. At the top of the graphs in Figure 1, the two centers move substantially to the left or right with each presented point. At the end of the iterations, at the bottom of the figures, the proposed cluster centers change only slightly. This means that as the number of iterations increases, the positions of the cluster centers converge to their final values. For our illustrative example, we use only 80 iterations, which is quite enough for such a simple case. The initial value of $\varepsilon$ was set to $\varepsilon_i = 1$ and its final value to $\varepsilon_f = 0.05$, and for $\lambda$ these parameters were $\lambda_i = \ldots$ and $\lambda_f = \ldots$. Normally, the number of iterations is set to tens of thousands, and $\varepsilon_f$ would be much closer to 0, which would give better but lengthier convergence.

Obviously, the two clusters of points would normally be $\{0, 1, 2\}$ and $\{10\}$, with cluster centers at 1 and 10, which yields the minimal mean square error (MSE). The left part of Figure 1, which depicts the original neural gas algorithm, shows the cluster centers converging towards the values 1 and 10 as the iterations increase. The right part of Figure 1 shows the convergence of the adapted neural gas algorithm, where the size of both clusters was set to 2. Therefore, the 4 points were clustered with minimum MSE into the clusters $\{0, 1\}$ and $\{2, 10\}$, whose centers are 0.5 and 6.
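The optimal partitions quoted above can be checked by exhaustive search, which is feasible only for tiny examples like this one. The sketch below, with hypothetical function names, minimises the total squared error. For a fixed number of points, this is equivalent to minimising the MSE.

```python
from itertools import combinations


def sse(cluster):
    """Sum of squared deviations of 1D points from their mean."""
    c = sum(cluster) / len(cluster)
    return sum((x - c) ** 2 for x in cluster)


def best_two_clusters(points, size_a=None):
    """Exhaustive search over partitions of 1D points into two clusters.

    If size_a is given, the first cluster must contain exactly size_a points;
    otherwise any split into two non-empty clusters is allowed.
    Returns (total squared error, cluster_a, cluster_b).
    """
    n = len(points)
    sizes = [size_a] if size_a is not None else range(1, n)
    best = None
    for s in sizes:
        for idx in combinations(range(n), s):
            a = [points[i] for i in idx]
            b = [points[i] for i in range(n) if i not in idx]
            cand = (sse(a) + sse(b), a, b)
            if best is None or cand[0] < best[0]:
                best = cand
    return best
```

Without a size constraint, `best_two_clusters([0, 1, 2, 10])` finds a total squared error of 2.0 for the split into {0, 1, 2} and {10}. With `size_a=2`, it returns `(32.5, [0, 1], [2, 10])`, matching the constrained clusters with centers 0.5 and 6.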