Abstract
We propose a new clustering method for data in cylindrical coordinates based on the k-means. The k-means family maximizes an objective function defined through a similarity measure, so a new clustering method for data in cylindrical coordinates requires a new similarity. In this study, we first derive such a similarity by assuming a particular probabilistic model. A data point in cylindrical coordinates has a radius, an azimuth, and a height. We assume that the azimuth is sampled from a von Mises distribution and that the radius and the height are independently generated from isotropic Gaussian distributions. We derive the new similarity from the log likelihood of the assumed probability distribution. Our experiments demonstrate that the proposed method using the new similarity can appropriately partition synthetic data defined in cylindrical coordinates. Furthermore, we apply the proposed methods to color image quantization and show that they successfully quantize a color image with respect to the hue element.
1. Introduction
Clustering is an important technique in many areas such as data analysis, data visualization, image processing, and pattern recognition. The most popular and useful clustering method is the k-means, which partitions data into clusters using the Euclidean distance as a dissimilarity measure. The Euclidean distance is a reasonable measure for data sampled from an isotropic Gaussian distribution. However, because not all data follow isotropic Gaussian distributions, the k-means does not always yield a good clustering result.
The present study focuses on data in cylindrical coordinates. Because such data have a periodic element (the azimuth), clustering methods that use the Euclidean distance can analyze them improperly. For example, azimuths of 1° and 359° are close on the circle but far apart on the real line. Furthermore, a clustering method using the Euclidean distance may fail to extract meaningful centroids. For example, if a distribution in cylindrical coordinates has a strongly curved crescent shape, the centroid calculated by the k-means may not lie on the data distribution. However, no clustering methods have been optimized for data in cylindrical coordinates.
Cylindrical data are found in many fields such as image processing, meteorology, and biology. Movements of plants and animals, and wind direction paired with another environmental measurement, are typical examples of cylindrical data [1]. The most popular example of data in cylindrical coordinates is color defined in the HSV color model. An HSV color has three attributes: hue (direction), saturation (radius), and value, which represents brightness (height). The HSV color model can represent hue information and corresponds more naturally to human vision than the RGB color model [2]. A clustering method for data in cylindrical coordinates is therefore useful in many fields, especially image processing.
The purpose of this study is to develop a new clustering method for data in cylindrical coordinates based on the k-means. We first derive a new similarity for clustering data in cylindrical coordinates. To do so, we assume that the data are sampled from a probabilistic model given by the product of a von Mises distribution and Gaussian distributions. We then propose a new clustering method that uses this similarity. Using numerical experiments, we demonstrate that the proposed method can partition synthetic data. Furthermore, we evaluate the performance of the proposed method on real-world data. Finally, we apply the proposed method to color image quantization and demonstrate that it can quantize a color image according to hue.
2. Related Works
The most commonly used clustering method is the k-means [3], which is one of the top 10 algorithms in data mining [4]. The k-means has been applied in various fields because it is fast, simple, and easy to understand. It uses the Euclidean distance as its clustering criterion, which implicitly assumes that the data are sampled from a mixture of isotropic Gaussian distributions. Thus, the k-means is suitable for such data but not for data generated from other distributions. Because data in cylindrical coordinates have periodic characteristics, the k-means is inappropriate for clustering them.
Periodic data distributed on the surface of a hypersphere can be clustered with the spherical k-means (sk-means). Dhillon and Modha [5] and Banerjee et al. [6, 7] developed the sk-means for clustering high-dimensional text data. The sk-means is a k-means-based method that uses cosine similarity as its clustering criterion. It assumes that the data are sampled from a mixture of von Mises-Fisher distributions with equal concentration parameters and equal mixture weights. However, the sk-means cannot be applied to data that have a direction, a radius, and a height. Appropriately partitioning such data requires a different, nonlinear separation method.
There are many methods for nonlinear separation. One is the kernel k-means [8], which maps the data points into a higher-dimensional feature space with a nonlinear function and partitions them there. Spectral clustering [9] is another popular modern nonlinear clustering method, which uses the eigenvectors of a similarity (kernel) matrix to partition data points. Support vector clustering [10] is inspired by the support vector machine [11]. These kernel-based nonlinear clustering methods can provide reasonable clustering results for non-Gaussian data. However, they perform clustering in a feature space, so they can hardly provide meaningful statistics, such as centroids, in the original data space. This is a problem when we also want to determine features of the data, as in color image quantization. Furthermore, the kernel function and its parameters must be selected experimentally.
Clustering methods are frequently used for color image quantization. Color image quantization reduces the number of colors in an image and plays an important role in applications such as image segmentation [12], image compression [13], and color feature extraction [14]. A color quantization technique consists of two stages: the palette design stage and the pixel mapping stage. These stages correspond, respectively, to calculating the centroids and assigning data points to clusters. Researchers have developed many color quantization methods, including median cut [15], the k-means [16], the fuzzy c-means [17, 18], self-organizing maps [19–21], and particle swarm optimization [22]. However, color quantization is generally performed in the RGB color space, and the HSV color space is rarely adopted.
3. Methodology
3.1. Assumed Probabilistic Distribution
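A data point in cylindrical coordinates, $\mathbf{x}$, is represented by $\mathbf{x} = (r, \theta, z)$ with $r \ge 0$ and $0 \le \theta < 2\pi$, where $r$, $\theta$, and $z$ are called the radius, azimuth, and height, respectively. In this study, we represent the azimuth as a unit vector $\mathbf{u} = (\cos\theta, \sin\theta)^\top$ so that the cosine similarity is simple to calculate. Here, the elements of $\mathbf{x}$ are assumed to be mutually independent. Let a data point in cylindrical coordinates, $\mathbf{x}$, be generated by a probability density function (pdf) of the form
$$p(\mathbf{x}) = f_{\mathrm{vM}}(\mathbf{u})\, f_{\mathrm{G}}(r)\, f_{\mathrm{G}}(z), \tag{1}$$
where $f_{\mathrm{vM}}$ and $f_{\mathrm{G}}$ are the pdfs of a von Mises distribution and an isotropic Gaussian distribution, respectively. The pdf of a von Mises distribution has the form
$$f_{\mathrm{vM}}(\mathbf{u} \mid \boldsymbol{\mu}, \kappa) = \frac{1}{2\pi I_0(\kappa)} \exp\left(\kappa \boldsymbol{\mu}^\top \mathbf{u}\right), \tag{2}$$
where $\boldsymbol{\mu}$ is the mean of the azimuth with $\|\boldsymbol{\mu}\| = 1$, $\kappa \ge 0$ is the concentration parameter, and $I_0$ is the modified Bessel function of the first kind (order 0). The pdf of an isotropic Gaussian distribution has the form
$$f_{\mathrm{G}}(x \mid \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x - \mu)^2}{2\sigma^2}\right), \tag{3}$$
where $\mu$ is the mean and $\sigma^2$ is the variance. $\mu_r$ and $\mu_z$ are the means of the radius and height, respectively, and $\sigma_r^2$ and $\sigma_z^2$ are the variances of the radius and height, respectively. Thus, the density can be written as
$$p(\mathbf{x} \mid \boldsymbol{\Theta}) = \frac{1}{(2\pi)^2 I_0(\kappa)\, \sigma_r \sigma_z} \exp\left(\kappa \boldsymbol{\mu}^\top \mathbf{u} - \frac{(r - \mu_r)^2}{2\sigma_r^2} - \frac{(z - \mu_z)^2}{2\sigma_z^2}\right), \tag{4}$$
where $\boldsymbol{\Theta} = \{\boldsymbol{\mu}, \kappa, \mu_r, \sigma_r^2, \mu_z, \sigma_z^2\}$.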
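The following sketch evaluates the log of density (4) for a single point with NumPy and SciPy, which are the libraries named in Section 4. It serves as a concrete check of the model. The function name `log_density` is a hypothetical choice, not the author's code. So is the parameterization of the mean direction by an angle `mu_theta`, that is, $\boldsymbol\mu = (\cos\mu_\theta, \sin\mu_\theta)^\top$. `scipy.special.i0e` is the exponentially scaled Bessel function, so $\ln I_0(\kappa)$ does not overflow for large $\kappa$.

```python
import numpy as np
from scipy.special import i0e


def log_density(r, theta, z, mu_r, mu_theta, mu_z, kappa, var_r, var_z):
    """Log of density (4): von Mises azimuth times Gaussian radius and height."""
    u = np.array([np.cos(theta), np.sin(theta)])         # azimuth as a unit vector
    mu = np.array([np.cos(mu_theta), np.sin(mu_theta)])  # mean direction, ||mu|| = 1
    log_i0 = np.log(i0e(kappa)) + kappa                  # log I0(kappa), overflow-safe
    log_vm = kappa * (u @ mu) - np.log(2 * np.pi) - log_i0
    log_r = -0.5 * np.log(2 * np.pi * var_r) - (r - mu_r) ** 2 / (2 * var_r)
    log_z = -0.5 * np.log(2 * np.pi * var_z) - (z - mu_z) ** 2 / (2 * var_z)
    return log_vm + log_r + log_z
```

The three factors are independent, so the result should equal the sum of `scipy.stats.vonmises.logpdf` and two `scipy.stats.norm.logpdf` terms. This gives a convenient way to test the function.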
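We can estimate the parameters of density (4) using maximum likelihood estimation. Let the data set $X = \{\mathbf{x}_1, \dots, \mathbf{x}_N\}$ be generated from density (4). The log likelihood function of $X$ is
$$\ln L(\boldsymbol{\Theta}) = \sum_{n=1}^{N}\left[\kappa \boldsymbol{\mu}^\top \mathbf{u}_n - \frac{(r_n - \mu_r)^2}{2\sigma_r^2} - \frac{(z_n - \mu_z)^2}{2\sigma_z^2}\right] - N \ln I_0(\kappa) - \frac{N}{2}\ln \sigma_r^2 - \frac{N}{2}\ln \sigma_z^2 - 2N \ln 2\pi. \tag{5}$$
Maximizing this equation subject to $\boldsymbol{\mu}^\top \boldsymbol{\mu} = 1$, we find the maximum likelihood estimates $\hat{\boldsymbol{\mu}}$, $\hat{\kappa}$, $\hat{\mu}_r$, $\hat{\sigma}_r^2$, $\hat{\mu}_z$, and $\hat{\sigma}_z^2$ from
$$\hat{\boldsymbol{\mu}} = \frac{\sum_{n} \mathbf{u}_n}{\left\|\sum_{n} \mathbf{u}_n\right\|}, \quad \hat{\mu}_r = \frac{1}{N}\sum_{n} r_n, \quad \hat{\mu}_z = \frac{1}{N}\sum_{n} z_n, \quad \hat{\sigma}_r^2 = \frac{1}{N}\sum_{n} (r_n - \hat{\mu}_r)^2, \quad \hat{\sigma}_z^2 = \frac{1}{N}\sum_{n} (z_n - \hat{\mu}_z)^2, \quad \frac{I_1(\hat{\kappa})}{I_0(\hat{\kappa})} = \bar{R}, \tag{6}$$
where $\bar{R}$ is
$$\bar{R} = \frac{1}{N}\left\|\sum_{n=1}^{N} \mathbf{u}_n\right\|. \tag{7}$$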
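The closed-form estimates in (6) and (7) amount to a few vector sums. The sketch below assumes the azimuths are given as angles in radians, and the function name `fit_cylindrical` is hypothetical. It does not return $\hat\kappa$, because that estimate requires solving $I_1(\hat\kappa)/I_0(\hat\kappa) = \bar R$ numerically.

```python
import numpy as np


def fit_cylindrical(r, theta, z):
    """Closed-form ML estimates (6) and the mean resultant length (7)."""
    r, theta, z = (np.asarray(a, dtype=float) for a in (r, theta, z))
    u = np.stack([np.cos(theta), np.sin(theta)], axis=1)  # azimuths as unit vectors
    s = u.sum(axis=0)                                     # resultant vector
    n = len(r)
    mu = s / np.linalg.norm(s)          # mean direction with ||mu|| = 1
    r_bar = np.linalg.norm(s) / n       # R-bar in (7)
    mu_r, mu_z = r.mean(), z.mean()
    var_r = np.mean((r - mu_r) ** 2)    # ML variances divide by N, not N - 1
    var_z = np.mean((z - mu_z) ** 2)
    return mu, mu_r, mu_z, var_r, var_z, r_bar
```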
It is difficult to estimate the concentration parameter $\hat{\kappa}$ because the maximum likelihood equation has no analytic solution. We can only calculate the ratio of the Bessel functions, $I_1(\hat{\kappa})/I_0(\hat{\kappa})$. We approximate $\hat{\kappa}$ using the numerical method proposed by Sra [23] because, compared with other methods, it produces the most accurate estimates of $\hat{\kappa}$.
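We estimate $\hat{\kappa}$ using the recursive function
$$\kappa_{t+1} = \kappa_t - \frac{A(\kappa_t) - \bar{R}}{1 - A(\kappa_t)^2 - A(\kappa_t)/\kappa_t}, \qquad A(\kappa) = \frac{I_1(\kappa)}{I_0(\kappa)}, \tag{8}$$
where $t$ is the iteration number. The recursive calculations terminate when $|\kappa_{t+1} - \kappa_t| < \epsilon$, where $\epsilon$ is a small threshold fixed in this study. We calculate the initial value $\kappa_0$ using the method proposed by Banerjee et al. [6]:
$$\kappa_0 = \frac{\bar{R}\,(2 - \bar{R}^2)}{1 - \bar{R}^2}. \tag{9}$$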
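The following sketch implements recursion (8) with starting value (9) for the circular case. The threshold `eps` and the cap `max_iter` are assumed values, not those used in the study. The sketch also assumes $0 < \bar R < 1$, because at $\bar R = 0$ the update divides by zero. The ratio $A(\kappa)$ is computed from SciPy's exponentially scaled Bessel functions `i1e` and `i0e`, whose scaling factors cancel.

```python
import numpy as np
from scipy.special import i0e, i1e


def bessel_ratio(kappa):
    """A(kappa) = I1(kappa) / I0(kappa); the exp(-kappa) scalings cancel."""
    return i1e(kappa) / i0e(kappa)


def estimate_kappa(r_bar, eps=1e-8, max_iter=100):
    """Newton iteration (8) for A(kappa) = r_bar, started from (9)."""
    kappa = r_bar * (2 - r_bar ** 2) / (1 - r_bar ** 2)   # Banerjee et al. start
    for _ in range(max_iter):
        a = bessel_ratio(kappa)
        # A'(kappa) = 1 - A^2 - A / kappa for the circular (p = 2) case
        new = kappa - (a - r_bar) / (1 - a ** 2 - a / kappa)
        if abs(new - kappa) < eps:
            return new
        kappa = new
    return kappa
```

As a round-trip check, applying the estimator to $A(2)$ should recover $\kappa \approx 2$.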
3.2. Cylindrical Means
The k-means family uses a particular similarity to decide whether a data point belongs to a cluster. The Euclidean distance (a dissimilarity) is the measure most frequently used by the k-means family. Moreover, its square can be derived from the log likelihood of an isotropic Gaussian distribution. Therefore, the k-means with the Euclidean distance can appropriately partition data sampled from isotropic Gaussian distributions but not data from other distributions. The k-means family clusters data by maximizing the sum of the similarities between each cluster centroid and the data points that belong to that cluster. We must therefore develop a new similarity for data in cylindrical coordinates. In this study, we obtain the optimal similarity for partitioning data in cylindrical coordinates from an assumed pdf.
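First, to develop a k-means-based method for data in cylindrical coordinates (cylindrical k-means; cyk-means), we obtain a new similarity measure for data in cylindrical coordinates by assuming a probability distribution. Assume that a data point in a cluster with centroid $\mathbf{c} = (\mu_r, \boldsymbol{\mu}, \mu_z)$ is sampled from the probability distribution denoted by (4). The natural logarithm of $p(\mathbf{x} \mid \mathbf{c})$ is
$$\ln p(\mathbf{x} \mid \mathbf{c}) = \kappa \boldsymbol{\mu}^\top \mathbf{u} - \frac{(r - \mu_r)^2}{2\sigma_r^2} - \frac{(z - \mu_z)^2}{2\sigma_z^2} + \ln C, \tag{10}$$
where $C$ is a normalizing constant given by
$$C = \frac{1}{(2\pi)^2 I_0(\kappa)\, \sigma_r \sigma_z}. \tag{11}$$
Here, we ignore the normalizing constant to obtain
$$S(\mathbf{x}, \mathbf{c}) = \kappa \boldsymbol{\mu}^\top \mathbf{u} - \frac{(r - \mu_r)^2}{2\sigma_r^2} - \frac{(z - \mu_z)^2}{2\sigma_z^2}. \tag{12}$$
In this study, (12) is used as the similarity for the cyk-means, where $S(\mathbf{x}, \mathbf{c})$ denotes the similarity between the data point $\mathbf{x}$ and the centroid $\mathbf{c}$. The terms in (12) consist of the cosine similarity $\boldsymbol{\mu}^\top \mathbf{u}$ and the negative squared Euclidean distances along the radius and the height. The new similarity is thus a weighted sum of these terms. The weights $\kappa$, $1/(2\sigma_r^2)$, and $1/(2\sigma_z^2)$ indicate the concentrations of the distributions. This similarity can also be regarded as a simplified log likelihood.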
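Similarity (12) is cheap to evaluate. The following is a minimal vectorized sketch. It assumes the azimuth is already stored as a unit vector $\mathbf u$, an array of shape `(2,)` or `(N, 2)`, and the function name is hypothetical.

```python
import numpy as np


def cyl_similarity(r, u, z, mu_r, mu, mu_z, kappa, var_r, var_z):
    """Similarity (12) between point(s) (r, u, z) and centroid (mu_r, mu, mu_z)."""
    cosine = u @ mu  # cosine similarity of the azimuth unit vectors
    return (kappa * cosine
            - (r - mu_r) ** 2 / (2 * var_r)   # negative squared distance, radius
            - (z - mu_z) ** 2 / (2 * var_z))  # negative squared distance, height
```

The three weights appear directly in the code. `kappa` scales the cosine term, and `1 / (2 * var_r)` and `1 / (2 * var_z)` scale the squared differences.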
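The cyk-means partitions data points in cylindrical coordinates into clusters using the same procedure as the k-means. Let $X = \{\mathbf{x}_1, \dots, \mathbf{x}_N\}$ be a set of $N$ data points in cylindrical coordinates, and let $\mathbf{c}_k = (\mu_{rk}, \boldsymbol{\mu}_k, \mu_{zk})$ be the centroid of the $k$th cluster. Using the similarity $S$, the objective function is
$$J = \sum_{n=1}^{N} \sum_{k=1}^{K} b_{nk}\, S(\mathbf{x}_n, \mathbf{c}_k), \tag{13}$$
where $b_{nk}$ is a binary indicator value. If the $n$th data point belongs to the $k$th cluster, $b_{nk} = 1$; otherwise, $b_{nk} = 0$. The aim of the cyk-means is to maximize the objective function $J$. The procedure for maximizing the objective function is the same as that of the k-means:
(1) Fix $K$ and initialize $\mathbf{c}_k$.
(2) Assign each data point to the cluster that has the most similar centroid.
(3) Estimate the parameters of the clusters.
(4) Return to Step (2) if the cluster assignment of the data points changes or if the difference between the values of the objective function in the current and the previous iterations exceeds a threshold $\delta$. Otherwise, terminate the procedure.
In this study, this difference is measured as $|J^{(t)} - J^{(t-1)}|$, where $J^{(t)}$ is the objective function at the $t$th iteration. Algorithm 1 shows the details of the cyk-means algorithm. From (6), the elements of the centroid $\mathbf{c}_k$ of the $k$th cluster are
$$\boldsymbol{\mu}_k = \frac{\sum_{n} b_{nk} \mathbf{u}_n}{\left\|\sum_{n} b_{nk} \mathbf{u}_n\right\|}, \qquad \mu_{rk} = \frac{1}{N_k}\sum_{n} b_{nk} r_n, \qquad \mu_{zk} = \frac{1}{N_k}\sum_{n} b_{nk} z_n, \tag{14}$$
where $N_k = \sum_{n} b_{nk}$ is the number of data points in the $k$th cluster. The other values used to calculate the objective function are
$$\sigma_{rk}^2 = \frac{1}{N_k}\sum_{n} b_{nk} (r_n - \mu_{rk})^2, \qquad \sigma_{zk}^2 = \frac{1}{N_k}\sum_{n} b_{nk} (z_n - \mu_{zk})^2, \qquad \bar{R}_k = \frac{1}{N_k}\left\|\sum_{n} b_{nk} \mathbf{u}_n\right\|, \tag{15}$$
and $\kappa_k$ is approximated by Sra's method from the ratio of the Bessel functions, $I_1(\kappa_k)/I_0(\kappa_k) = \bar{R}_k$.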
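The four steps above can be sketched compactly in NumPy. This is an illustrative reconstruction, not the author's Algorithm 1. Several details are assumptions made only to keep the sketch runnable:

- initialization from K random data points, or from indices passed as `init_idx`;
- starting values $\kappa_k = 1$ and global variances;
- the tolerance `tol` and a small variance floor;
- the rule that clusters with fewer than two points keep their previous parameters.

```python
import numpy as np
from scipy.special import i0e, i1e


def _estimate_kappa(r_bar, eps=1e-8, max_iter=100):
    """Newton iteration (8) from the starting value (9)."""
    r_bar = float(np.clip(r_bar, 1e-6, 1 - 1e-6))  # assumed guard for degenerate clusters
    kappa = r_bar * (2 - r_bar ** 2) / (1 - r_bar ** 2)
    for _ in range(max_iter):
        a = i1e(kappa) / i0e(kappa)
        new = kappa - (a - r_bar) / (1 - a ** 2 - a / kappa)
        if abs(new - kappa) < eps:
            return new
        kappa = new
    return kappa


def cyk_means(r, theta, z, K, init_idx=None, tol=1e-6, max_iter=100, seed=0):
    """Sketch of the cyk-means: assign with similarity (12), re-estimate with (6)."""
    r, theta, z = (np.asarray(a, dtype=float) for a in (r, theta, z))
    u = np.stack([np.cos(theta), np.sin(theta)], axis=1)  # azimuth as unit vectors
    N = len(r)
    # Step 1: initialize centroids from K data points (assumed scheme)
    if init_idx is None:
        init_idx = np.random.default_rng(seed).choice(N, K, replace=False)
    idx = np.asarray(init_idx)
    mu_r, mu, mu_z = r[idx], u[idx], z[idx]  # fancy indexing returns copies
    kappa = np.ones(K)
    var_r, var_z = np.full(K, r.var()), np.full(K, z.var())
    labels, J_old = None, -np.inf
    for _ in range(max_iter):
        # Step 2: similarity of every point to every centroid, shape (N, K)
        S = (kappa * (u @ mu.T)
             - (r[:, None] - mu_r) ** 2 / (2 * var_r)
             - (z[:, None] - mu_z) ** 2 / (2 * var_z))
        new_labels = S.argmax(axis=1)
        J = S[np.arange(N), new_labels].sum()  # objective (13)
        # Step 4: stop when the assignment and the objective no longer change
        if labels is not None and np.array_equal(labels, new_labels) and abs(J - J_old) < tol:
            break
        labels, J_old = new_labels, J
        # Step 3: re-estimate each cluster; tiny clusters keep their old parameters
        for k in range(K):
            m = labels == k
            if m.sum() < 2:
                continue
            s = u[m].sum(axis=0)
            mu[k] = s / np.linalg.norm(s)
            mu_r[k], mu_z[k] = r[m].mean(), z[m].mean()
            var_r[k] = max(r[m].var(), 1e-12)
            var_z[k] = max(z[m].var(), 1e-12)
            kappa[k] = _estimate_kappa(np.linalg.norm(s) / m.sum())
    return labels, dict(mu_r=mu_r, mu=mu, mu_z=mu_z, kappa=kappa, var_r=var_r, var_z=var_z)
```

Steps (3) and (4) are reordered inside the loop so that the convergence test sees the assignment before the parameters are re-estimated. The sequence of operations is otherwise that of the numbered procedure.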

The cyk-means method has many parameters. The k-means method for data in three-dimensional Cartesian coordinates has only $3K$ parameters, the product of the number of centroid vectors and the number of dimensions. In contrast, the cyk-means has $7K$ parameters, the product of the number of clusters and the number of parameters per cluster. The parameters of the $k$th cluster are $\mu_{rk}$, $\boldsymbol{\mu}_k$ (two dimensions), $\mu_{zk}$, $\kappa_k$, $\sigma_{rk}^2$, and $\sigma_{zk}^2$. Because the cyk-means has more degrees of freedom, the dead unit problem (i.e., empty clusters) occurs frequently if the initial centroids are poorly chosen.
3.3. Fixed cykMeans
Model-based clustering methods suffer from various problems such as dead units and sensitivity to initial values. One reason is that the log likelihood can have many local optima [9]. These problems tend to occur more frequently when a model has more parameters. In the fixed cyk-means, the concentration parameter $\kappa$ and the variances $\sigma_r^2$ and $\sigma_z^2$ are fixed to particular values. As a consequence, the fixed cyk-means has only $4K$ parameters. Fixing these parameters decreases the complexity of the model and mitigates these problems. Algorithm 2 shows the fixed cyk-means algorithm.
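When $\kappa$, $\sigma_r^2$, and $\sigma_z^2$ are fixed, only the means are updated. Each iteration is then a weighted k-means step with a cosine term for the azimuth. The following is a minimal sketch, not the author's Algorithm 2. The function name is hypothetical, and initialization from given indices is an assumption. Terminating once the assignment stops changing is a simplification of Step (4).

```python
import numpy as np


def fixed_cyk_means(r, theta, z, init_idx, kappa, var_r, var_z, max_iter=100):
    """Sketch of the fixed cyk-means: kappa and the variances stay constant."""
    r, theta, z = (np.asarray(a, dtype=float) for a in (r, theta, z))
    u = np.stack([np.cos(theta), np.sin(theta)], axis=1)
    idx = np.asarray(init_idx)
    mu_r, mu, mu_z = r[idx], u[idx], z[idx]   # fancy indexing returns copies
    labels = None
    for _ in range(max_iter):
        # Assignment with similarity (12) using the fixed weights
        S = (kappa * (u @ mu.T)
             - (r[:, None] - mu_r) ** 2 / (2 * var_r)
             - (z[:, None] - mu_z) ** 2 / (2 * var_z))
        new_labels = S.argmax(axis=1)
        if labels is not None and np.array_equal(labels, new_labels):
            break  # assignment is stable
        labels = new_labels
        # Only the means are re-estimated, following (6)
        for k in range(len(idx)):
            m = labels == k
            if not m.any():
                continue  # an empty cluster keeps its previous centroid
            s = u[m].sum(axis=0)
            norm = np.linalg.norm(s)
            if norm > 0:
                mu[k] = s / norm
            mu_r[k], mu_z[k] = r[m].mean(), z[m].mean()
    return labels, mu_r, mu, mu_z
```

Choosing `kappa` large relative to `1 / (2 * var_r)` and `1 / (2 * var_z)` makes the assignment depend mostly on the azimuth.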

3.4. Computational Complexity
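Assigning the $N$ data points to $K$ clusters has a complexity of $O(KN)$ per iteration. For each cluster, we must estimate six parameters: the three means ($\mu_r$, $\boldsymbol{\mu}$, $\mu_z$), the two variances, and $\kappa$. The means and variances are obtained in a single pass over the data. $\kappa$ additionally requires the iterative procedure (8), whose cost depends on $T$, the number of iterations needed for $\kappa$ to converge. The fixed cyk-means only assigns the data points and updates the three means in each iteration, so the cyk-means is approximately 1.5 times as complex as the fixed cyk-means.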
4. Experimental Results
In our experiments, we use Python and its libraries (NumPy, SciPy, and scikit-learn) to implement the proposed method.
4.1. Synthetic Data
In this subsection, we demonstrate that the cyk-means and the fixed cyk-means can partition synthetic data defined in cylindrical coordinates. The dataset used in this experiment has three clusters, as shown in Figure 1(a). The data points in each cluster are generated from the probability distribution (4) with the parameters shown in Table 1. Figures 1(b), 1(c), and 1(d) show the clustering results of the cyk-means, the fixed cyk-means with fixed $\kappa$, $\sigma_r^2$, and $\sigma_z^2$, and the k-means, respectively. The cyk-means and the fixed cyk-means correctly partition the dataset into its clusters. In contrast, the k-means merges the two upper-right clusters into one and fails to partition the dataset. Table 2 shows the parameters estimated by the cyk-means, the fixed cyk-means, and the k-means. Only the cyk-means estimates the concentration parameters and the variances, and its estimates are close to the true values. The cyk-means gives the most accurate estimates of the number of data points in each cluster. The fixed cyk-means gives the most accurate estimates of all the means, and the estimates of the cyk-means are also close. These results show that both the cyk-means and the fixed cyk-means estimate all the means with sufficient accuracy.
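Figure 1: (a) Synthetic dataset with three clusters; clustering results of (b) the cyk-means, (c) the fixed cyk-means, and (d) the k-means.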
In the next experiment, we compare the effectiveness of the proposed methods with that of the k-means and the kernel k-means. The proposed methods are the cyk-means and the fixed cyk-means with fixed $\kappa$, $\sigma_r^2$, and $\sigma_z^2$. The kernel k-means uses a radial basis function kernel with a fixed parameter. The synthetic data have $K$ clusters and are defined in cylindrical coordinates. Each cluster contains 200 data points. For the $k$th cluster, each of the following is a random number drawn from a fixed interval: the mean azimuth, the concentration parameter, the mean radius, the mean height, and the standard deviations of $r$ and $z$.
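Synthetic data of this kind can be drawn directly from (4) with NumPy's `Generator.vonmises` and `Generator.normal`. The sketch below follows the described design of $K$ clusters of 200 points with random per-cluster parameters. However, the sampling intervals are placeholders chosen for illustration, not the intervals used in the study. The radius interval is chosen so that negative radii are practically impossible.

```python
import numpy as np


def make_cylindrical_clusters(K, n_per_cluster=200, seed=0):
    """Sample K clusters from density (4); all sampling intervals are illustrative."""
    rng = np.random.default_rng(seed)
    r, theta, z, y = [], [], [], []
    for k in range(K):
        mu_theta = rng.uniform(-np.pi, np.pi)       # mean azimuth (placeholder interval)
        kappa = rng.uniform(5.0, 20.0)              # concentration (placeholder interval)
        mu_r = rng.uniform(3.0, 6.0)                # mean radius (placeholder interval)
        mu_z = rng.uniform(-5.0, 5.0)               # mean height (placeholder interval)
        sd_r, sd_z = rng.uniform(0.1, 0.5, size=2)  # standard deviations (placeholder)
        theta.append(rng.vonmises(mu_theta, kappa, n_per_cluster))  # values in [-pi, pi]
        r.append(rng.normal(mu_r, sd_r, n_per_cluster))
        z.append(rng.normal(mu_z, sd_z, n_per_cluster))
        y.append(np.full(n_per_cluster, k))         # ground-truth cluster labels
    return tuple(np.concatenate(v) for v in (r, theta, z, y))
```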
Figure 2 shows the relationship between the number of clusters and the adjusted Rand index (ARI), which evaluates the performance of clustering algorithms [24]. ARI = 1 means that the clustering agrees perfectly with the true clusters. The cyk-means has the largest ARI in almost all cases. The fixed cyk-means performs better than the kernel k-means and the k-means, and the k-means performs the worst. In conclusion, the cyk-means partitions synthetic data defined in cylindrical coordinates most accurately, and the fixed cyk-means also performs well.
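ARI is available in scikit-learn as `sklearn.metrics.adjusted_rand_score`. It is invariant to permutations of the cluster labels. This matters because clustering methods assign arbitrary label indices. In the small example below, a clustering that puts everything into a single cluster scores 0.

```python
from sklearn.metrics import adjusted_rand_score

true_labels = [0, 0, 1, 1, 2, 2]
# Same partition with permuted label indices: ARI is still 1
print(adjusted_rand_score(true_labels, [1, 1, 0, 0, 2, 2]))  # 1.0
# Everything in one cluster: ARI is 0 for this example
print(adjusted_rand_score([0, 0, 1, 1], [0, 0, 0, 0]))  # 0.0
```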
4.2. Real World Data
We evaluate the proposed methods on the iris dataset (http://mlearn.ics.uci.edu/databases/iris/) and the segmentation benchmark dataset (http://www.ntu.edu.sg/home/asjfcai/Benchmark_Website/benchmark_index.html) [25]. The iris dataset has 150 data points from three classes of irises. Each data point consists of four attributes: sepal length, sepal width, petal length, and petal width, all in cm. The segmentation benchmark dataset consists of 100 images from the Berkeley segmentation database [26] and ground truths generated by manual labeling.
Table 3 shows the ARI scores of the cyk-means, the fixed cyk-means, the k-means, and the kernel k-means for the iris dataset. The fixed cyk-means uses fixed values of $\kappa$, $\sigma_r^2$, and $\sigma_z^2$. The parameter of the radial basis function kernel of the kernel k-means is 0.01. In this experiment, we use only three attributes of the iris dataset because the proposed methods are specialized for three-dimensional data. Furthermore, we shift this three-attribute dataset to zero mean. In all cases, the cyk-means performs worse than the other methods. Conversely, the fixed cyk-means performs best in almost all cases. However, the differences in performance among the fixed cyk-means, the k-means, and the kernel k-means are small.
Table 4 shows the ARI scores of the cyk-means, the fixed cyk-means, and the k-means for seven images in the segmentation benchmark dataset. The fixed cyk-means uses fixed values of $\kappa$, $\sigma_r^2$, and $\sigma_z^2$. To evaluate the cyk-means and the fixed cyk-means, we convert the images from RGB color to HSV color. For the k-means, we cluster the images represented in both RGB color and HSV color. We compare each clustering result with the ground truth using the ARI score, and we set the number of clusters to the number of segments in the ground truth. In all cases, the fixed cyk-means consistently performs well. The cyk-means performs either much better or much worse than the other methods; that is, its performance is unstable. This instability is likely because the cyk-means, which has more parameters, is more easily trapped in a local optimum.
4.3. Application to Color Image Quantization
We apply the cyk-means and the fixed cyk-means to color image quantization and compare the results with those of the k-means. Before quantization, images processed by the proposed methods are converted from RGB color space to HSV color space, whereas images processed by the k-means are represented in RGB. Figure 3 shows four test images from the Berkeley segmentation database [26] and their quantization results. Each original color image is quantized into three colors. The quantization results are generated by the cyk-means, the fixed cyk-means with fixed $\kappa$, $\sigma_r^2$, and $\sigma_z^2$, and the k-means. The color of each pixel in a quantized image is the value of the centroid of the cluster that contains the pixel.
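To apply the proposed methods to an image, each pixel must be mapped to cylindrical coordinates. Following Section 1, hue becomes the azimuth, saturation the radius, and value the height. The following is a minimal sketch using Python's standard `colorsys` module. It loops over pixels, so it is only suitable for small images. The helper names are hypothetical, and scaling hue from $[0, 1)$ to $[0, 2\pi)$ is an assumption about the representation.

```python
import colorsys

import numpy as np


def image_to_cylindrical(rgb):
    """Map an RGB image (H, W, 3) with values in [0, 1] to (radius, azimuth, height)."""
    hsv = np.array([colorsys.rgb_to_hsv(*p) for p in rgb.reshape(-1, 3)])
    theta = 2 * np.pi * hsv[:, 0]   # hue in [0, 1) -> azimuth in [0, 2*pi)
    r = hsv[:, 1]                   # saturation -> radius
    z = hsv[:, 2]                   # value (brightness) -> height
    return r, theta, z


def cylindrical_to_rgb(r, theta, z):
    """Inverse mapping, e.g. for turning cluster centroids into palette colors."""
    hue = np.mod(theta, 2 * np.pi) / (2 * np.pi)
    return np.array([colorsys.hsv_to_rgb(h, s, v) for h, s, v in zip(hue, r, z)])
```

A centroid $(\mu_r, \boldsymbol\mu, \mu_z)$ can be turned into a palette color in two steps. First, take its azimuth as `np.arctan2(mu[1], mu[0])`. Then pass it, together with $\mu_r$ and $\mu_z$, to `cylindrical_to_rgb`.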
For image 118035 in Figure 3, the colors of the background, the wall, and the roof clearly differ from one another. The cyk-means and the fixed cyk-means successfully segment this image. In contrast, the k-means separates the shade from the wall and cannot render the wall as a single color. Furthermore, the quantization results of the cyk-means and the fixed cyk-means are very similar.
Image 26098 consists of red and green peppers on a display table. The cyk-means merges the red peppers with the planks of the display table and divides the dark area into two colors. The fixed cyk-means successfully extracts the red peppers. The k-means assigns red to the planks and to part of the green peppers.
Image 299091 consists of sky with clouds, an ocher pyramid, and ocher ground. The cyk-means groups the ocher pyramid and the white clouds into the same color, whereas the fixed cyk-means correctly segments the pyramid and the sky. The k-means is unsuccessful: it divides the pyramid into three regions (an ocher region, a highlight region, and a shade region).
The cyk-means does not perform well for image 295087. It segments the image into two colors even though we set the number of clusters to three; that is, the cyk-means produces a dead unit. This happens when the data points visually appear to form only a few clusters. In that case, the estimated concentration parameter becomes small and the estimated variances become large. A few broad clusters then absorb all the data points, and dead units (empty clusters) appear even if we set the number of clusters to a large value. In contrast, the fixed cyk-means, whose concentration and variance values are fixed, appropriately partitions the ground and the blue and deep blue regions of the sky. The k-means extracts shaded regions from the ground; that is, it cannot group the ground into one region.
Furthermore, the fixed parameters of the fixed cyk-means, $\kappa$, $\sigma_r^2$, and $\sigma_z^2$, can be used to control the quantization results. Figure 4 shows the quantization results generated by the fixed cyk-means with different parameter values. The original image in Figure 4 consists of two objects: a red fish and the arms of an anemone. With one setting of $\kappa$, $\sigma_r^2$, and $\sigma_z^2$, the fixed cyk-means cannot extract the red fish (middle image of Figure 4). With a different setting, however, the fixed cyk-means extracts the red fish (left image of Figure 4). This is because a large $\kappa$ and/or large variances increase the relative weight of the cosine similarity term in (12). As a result, the clustering focuses more on the hue element.
In conclusion, the fixed cyk-means is more suitable for color image quantization than the cyk-means. The fixed cyk-means quantizes color images with respect to hue. Its quantization results differ from those generated by the k-means because the Euclidean metric in RGB space cannot take hue into account.
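A toy example shows the effect of the fixed weights. The sketch below compares one dark reddish pixel with two hypothetical centroids. One has the same hue but is brighter; the other has the same brightness but a green hue. Both comparisons use similarity (12), with the azimuth written as an angle, so that $\boldsymbol\mu^\top\mathbf u = \cos(\theta - \mu_\theta)$. All numbers are illustrative, not values from the study.

```python
import numpy as np


def similarity(r, theta, z, centroid, kappa, var_r, var_z):
    """Similarity (12) with the azimuth given as an angle."""
    mu_r, mu_theta, mu_z = centroid
    return (kappa * np.cos(theta - mu_theta)
            - (r - mu_r) ** 2 / (2 * var_r)
            - (z - mu_z) ** 2 / (2 * var_z))


# Hypothetical (saturation, hue angle, value) triples
pixel = (0.8, 0.0, 0.3)                  # dark, red hue
same_hue = (0.8, 0.0, 0.9)               # bright red centroid
same_value = (0.8, 2 * np.pi / 3, 0.3)   # dark green centroid


def closer(kappa, var=0.05):
    """Return the centroid to which the pixel is assigned for the given weights."""
    s_hue = similarity(*pixel, same_hue, kappa, var, var)
    s_val = similarity(*pixel, same_value, kappa, var, var)
    return "same_hue" if s_hue > s_val else "same_value"


print(closer(1.0))            # same_value
print(closer(10.0))           # same_hue
print(closer(1.0, var=1.0))   # same_hue
```

With `var=0.05`, the hue-matching centroid wins only when $\kappa > 2.4$. Raising the variances to 1.0 lets it win already at $\kappa = 1$, which mirrors the behavior described above.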
5. Conclusion and Discussion
In this study, we develop the cyk-means and the fixed cyk-means, which are new clustering methods for data in cylindrical coordinates. The Euclidean distance cannot properly represent dissimilarities between data points on periodic axes. We therefore derive a new similarity for the cyk-means from a probability distribution that is the product of a von Mises distribution and two Gaussian distributions (see (4)). Our experiments demonstrate that the cyk-means and the fixed cyk-means can properly partition synthetic data in cylindrical coordinates. Furthermore, the experimental results on real-world data show that the fixed cyk-means performs as well as or better than the k-means and the kernel k-means. In the final experiment, the proposed methods are applied to color image quantization and successfully quantize color images with respect to the hue element.
The experiments on synthetic data demonstrate the effectiveness of the cyk-means. In the first experiment, the cyk-means produces good parameter estimates and clustering results. The results of the second experiment show that the cyk-means performs best when clustering synthetic data. However, in the experiments using real-world data, the cyk-means does not provide good clustering results. Furthermore, the color image quantization results suggest that the flexibility of the cyk-means often produces dead units or small clusters containing few data points. Thus, the cyk-means may not be appropriate for practical applications.
The fixed cyk-means is expected to be effective in practical applications. It is stable and performs well on synthetic data, real-world data, and color image quantization. Furthermore, the fixed cyk-means rarely produces dead units because it has fewer parameters than the cyk-means. It also requires less computational time per iteration than the cyk-means (Section 3.4).
In future work, we will improve the performance of the proposed methods. Like the k-means, the proposed methods are susceptible to the ill-initialization problem and to the dead unit problem caused by poor initialization. The k-means++ method proposed by Arthur and Vassilvitskii [27] addresses the ill-initialization problem of the k-means. It improves clustering performance by obtaining an initial set of cluster centers that is close to the optimal solution. The conscience mechanism improves the performance of competitive learning and clustering algorithms [28–30]. It inserts a bias into the competition process so that each unit wins the competition with equal probability. Xu et al. [31] proposed an algorithm based on competitive learning called rival penalized competitive learning [2, 32], which determines the appropriate number of clusters and solves the dead unit problem. Its strategy is to adapt the weights of the winning unit toward the input and to unlearn the weights of the second winner (the rival). By incorporating these approaches into the proposed methods, we aim to improve their performance and reduce the effect of these intrinsic problems.
Conflicts of Interest
The author declares that there are no conflicts of interest regarding the publication of this article.
Acknowledgments
The author would like to thank Toya Teramoto of the University of Electro-Communications for testing the algorithm.