#### Abstract

A circular histogram represents the statistical distribution of circular data; the histogram of the H component of the HSI color model is a typical example. When the H component is used to segment a color image, a feasible approach is to transform the circular histogram into a linear histogram and then apply mature gray-level thresholding methods to the linear histogram to select the threshold. The key step is therefore a reasonable choice of the breakpoint at which the circular histogram is cut and linearized. In this paper, based on the angle mean of the circular histogram and the line mean of the linear histogram, a simple breakpoint selection criterion is proposed, and the range of situations in which it is suitable is analyzed. Compared with the existing breakpoint selection criteria based on the Lorenz curve and on cumulative distribution entropy, the proposed method has a simpler expression, requires less computation, and does not depend on the expansion direction (clockwise or anticlockwise).

#### 1. Introduction

Data obtained from actual observations can be expressed in various measurement spaces, and the space of angles, which describes angular change, is one of them. Processing data in this space belongs to a branch of statistics called directional (circular) statistics [1, 2]. Angle-based data are called directional data, and angles are commonly expressed as unit vectors. Unlike data measured on a linear scale, directional data are inherently periodic (cyclic), which gives them many distinctive characteristics in modeling and statistical processing.

Data describing the angular change of a single variable are called circular data [2]; they can be visualized as points on the unit circle or as unit vectors in a plane. A typical example of circular data is the H component of the HSI color model of a color image [3]. The HSI color model is a mathematical color model proposed by the American colorist A. H. Munsell in 1915; it describes color characteristics by hue (H), saturation (S), and intensity (I). The HSI color model differs from the commonly used RGB color model: the three components R (red), G (green), and B (blue) of the RGB model are linearly dependent, whereas the three components H, S, and I of the HSI model are linearly independent. Because the HSI color model represents colors in a way close to human perception, color image segmentation in HSI color space has achieved good results [4–10].

As a typical example of circular data, hue (H) represents the basic colors of an image; because of its periodicity, it can be expressed as a circular histogram. Early work used the hue histogram to segment color images without considering its periodicity [11, 12]. To preserve the periodicity of hue, Tseng et al. [13] first proposed thresholding the circular histogram for color image segmentation. Wu et al. [14] gave an iterative Otsu's algorithm based on the circular histogram, but this method is not optimal and its convergence cannot be guaranteed. Dimov et al. [15] obtained optimal thresholding and multithresholding of the circular histogram through a symmetry constraint on threshold point pairs, but the computation is very complicated. Exploiting the cyclic nature of a circular histogram, Lai and Rosin [16] showed theoretically that when the circular histogram is expanded into a linear histogram and Otsu's method is applied, only half of the points on the circle need to be searched to obtain the optimal threshold pair, which reduces the time complexity from $O(L^2)$ to $O(L)$, where $L$ is the number of histogram bins. However, this method is not general and applies only to two-class thresholding.

The research of Lai and Rosin [16] shows that a feasible approach is to first expand the circular histogram into a linear histogram and then apply mature linear-histogram thresholding methods (such as Otsu's method [17], fuzzy entropy [18], and context-sensitive [19] thresholding) to obtain the thresholds of the circular histogram. Choosing a suitable breakpoint at which to linearize the circular histogram therefore becomes the key problem. For this purpose, we previously proposed two breakpoint selection criteria. The first is based on the Lorenz curve [20]: we discussed the relation between the area difference and the expansion direction and gave the optimal breakpoint selection criterion for the anticlockwise and clockwise directions. The second is based on cumulative distribution entropy [21]: we built a circular histogram expansion model based on cumulative distribution entropy and discussed the optimal breakpoint selection criteria for different expansion directions. These two expansion methods remove the arbitrariness of breakpoint selection. However, computing the Lorenz curve and the cumulative distribution entropy is relatively expensive, so selecting the optimal breakpoint takes considerable time.

Circular statistics, as a particular branch of statistics, deals with data composed of angles or directions. Because such data are clearly periodic, circular data must be distinguished from linear data. In circular statistics, the angle mean represents the average angle of a set of data on a circle; it is a circular statistical invariant of the circular histogram, in the sense that its position relative to the distribution does not change when the circular histogram is rotated. Correspondingly, the line mean represents the average of a set of linear data and is a linear statistical invariant of the linear histogram. Building on these two invariants, this paper proposes a simple breakpoint selection criterion that minimizes the distance between the angle mean on the circle and the point on the circle corresponding to the line mean of the expanded linear histogram. The proposed criterion can quickly find a reasonable breakpoint that keeps the distribution unchanged after the circular histogram is expanded.

This paper proposes a fast breakpoint selection method for circular histograms, which addresses the low efficiency of expanding a circular histogram into a linear histogram. The paper is organized as follows. Section 2 describes the angle mean of the circular histogram and the line mean of the linear histogram and gives the optimal breakpoint selection criterion. Section 3 determines the suitable range of the proposed method through experiments on artificial and real circular histograms and compares it with existing optimal breakpoint selection criteria, including those based on the Lorenz curve and on cumulative distribution entropy. Section 4 concludes the paper.

#### 2. Criteria for Selection of Breakpoint in Circular Histogram

In this section, we use the histogram of the H component of the HSI color model to explain the circular histogram. Figure 1(a) shows the H component in the HSI color model. Hue changes periodically in the anticlockwise direction: for example, red is at 0, green at $2\pi/3$ (120°), and blue at $4\pi/3$ (240°). Because of this periodicity, a circular histogram is used to represent the statistical distribution of the H component (Figure 1(b)).
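The paper does not state how hue values are computed from RGB pixels. As a minimal sketch, assuming the widely used arccos form of the HSI hue formula and 360 hue bins (the quantization used in Section 3), the circular histogram could be built as follows; `hsi_hue` and `hue_histogram` are hypothetical names.

```python
import math

def hsi_hue(r, g, b):
    """Hue in degrees [0, 360) from RGB values in [0, 1] (arccos form of HSI hue).

    Returns None for achromatic (grey) pixels, where hue is undefined."""
    num = 0.5 * ((r - g) + (r - b))
    den = math.sqrt((r - g) ** 2 + (r - b) * (g - b))
    if den == 0.0:
        return None  # r == g == b: no hue
    # Clamp the ratio to [-1, 1] to guard against rounding error before acos.
    theta = math.degrees(math.acos(max(-1.0, min(1.0, num / den))))
    return theta if b <= g else 360.0 - theta

def hue_histogram(pixels, levels=360):
    """Circular hue histogram: bin i is centred on i * 360 / levels degrees."""
    hist = [0] * levels
    for r, g, b in pixels:
        h = hsi_hue(r, g, b)
        if h is not None:
            # Nearest bin; the modulo wraps values close to 360 back to bin 0.
            hist[int(round(h * levels / 360.0)) % levels] += 1
    return hist
```

Pure red, green, and blue land in bins 0, 120, and 240, matching the positions quoted above, while grey pixels are skipped because their hue is undefined.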

When the H component is used for color image segmentation, a feasible approach is to transform the circular histogram into a linear histogram and then use thresholding methods on the linear histogram to select the threshold. However, linear histograms obtained by cutting the same circular histogram at different breakpoints can carry different distribution information. Figure 2 shows the results of expanding the circular histogram of Figure 1(b) at two different points: although both come from the same circular distribution, the linearized distributions are not similar. To keep the distribution of the linearized histogram as consistent as possible with that of the circular histogram, a new breakpoint selection method is given below.

##### 2.1. Angle Mean and Linearized Mean of the Circular Histogram

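For a circular histogram with $L$ points $i = 0, 1, \dots, L-1$, let $h_i$ be the frequency of point $i$ on the circle and $\theta_i = 2\pi i / L$ the corresponding angle. The trigonometric moments of the circular histogram are

$$C = \sum_{i=0}^{L-1} h_i \cos\theta_i, \qquad S = \sum_{i=0}^{L-1} h_i \sin\theta_i. \qquad (1)$$

The angle mean of the circular histogram is defined as

$$\bar{\theta} = \begin{cases} \arctan\left(S/C\right), & C > 0,\ S \ge 0,\\ \arctan\left(S/C\right) + \pi, & C < 0,\\ \arctan\left(S/C\right) + 2\pi, & C > 0,\ S < 0,\\ \pi/2, & C = 0,\ S > 0,\\ 3\pi/2, & C = 0,\ S < 0, \end{cases} \qquad (2)$$

where $\bar{\theta} \in [0, 2\pi)$; the angle mean is undefined when $C = S = 0$.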

The angle mean given by definition (2) is a statistic describing the location of the circular histogram. It does not depend on the starting point or the direction of rotation and reflects the center of the circular histogram [1, 2]. The red line in Figure 3 marks the angle mean of the circular histogram; for clarity, Figure 3 shows the circular histogram as a rose diagram.
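Equations (1) and (2) can be sketched in Python as follows, assuming bin $i$ sits at angle $2\pi i/L$; `math.atan2` folds the quadrant cases of (2) into a single call. The function names are illustrative, not taken from the paper.

```python
import math

def trig_moments(hist):
    """Trigonometric moments C and S of equation (1); bin i is at angle 2*pi*i/L."""
    L = len(hist)
    C = sum(h * math.cos(2 * math.pi * i / L) for i, h in enumerate(hist))
    S = sum(h * math.sin(2 * math.pi * i / L) for i, h in enumerate(hist))
    return C, S

def angle_mean(hist):
    """Angle mean of equation (2), returned in [0, 2*pi)."""
    C, S = trig_moments(hist)
    # When C and S (numerically) vanish, the mean direction is undefined.
    if math.hypot(C, S) <= 1e-12 * max(1.0, sum(hist)):
        raise ValueError("angle mean undefined: resultant length is zero")
    return math.atan2(S, C) % (2 * math.pi)
```

Rotating the histogram by $k$ bins shifts the returned angle by $2\pi k/L$ (modulo $2\pi$), so the mean stays at the same place relative to the distribution, which is the invariance used by the criterion below.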

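Suppose the circular histogram (Figure 1(b)) is expanded into a linear histogram (Figure 4) in the anticlockwise direction at the breakpoint $b$, where $b \in \{0, 1, \dots, L-1\}$, so that position $j$ ($j = 0, 1, \dots, L-1$) of the linear histogram holds the frequency $h_{(b+j) \bmod L}$. The line mean $\mu_b$ of the linear histogram is

$$\mu_b = \frac{\sum_{j=0}^{L-1} j\, h_{(b+j) \bmod L}}{\sum_{j=0}^{L-1} h_{(b+j) \bmod L}}. \qquad (3)$$

The point on the circular histogram corresponding to the line mean is

$$\theta_{\mu_b} = \frac{2\pi\left(b + \mu_b\right)}{L} \bmod 2\pi. \qquad (4)$$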
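A minimal sketch of equations (3) and (4), assuming the anticlockwise expansion places $h_{(b+j) \bmod L}$ at linear position $j$; `line_mean` and `line_mean_angle` are hypothetical helper names.

```python
import math

def line_mean(hist, b):
    """Line mean mu_b (equation (3)) of the histogram expanded anticlockwise at b."""
    L = len(hist)
    total = sum(hist)
    # Position j of the linear histogram holds hist[(b + j) % L].
    return sum(j * hist[(b + j) % L] for j in range(L)) / total

def line_mean_angle(hist, b):
    """Circle point theta_mu_b of equation (4), in radians in [0, 2*pi)."""
    L = len(hist)
    return (2 * math.pi * (b + line_mean(hist, b)) / L) % (2 * math.pi)

# An 8-bin histogram whose single cluster wraps around bin 0 (bins 6, 7, 0).
hist = [1, 0, 0, 0, 0, 0, 1, 1]
print(line_mean(hist, 6))  # cut outside the cluster: mean at position 1 (bin 7)
print(line_mean(hist, 0))  # cut inside the cluster: mean drifts to 13/3
```

Cutting inside the cluster moves the line mean, and hence $\theta_{\mu_b}$, far from the cluster's actual center, which is the effect the criterion below is designed to detect.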

##### 2.2. Breakpoint Selection Criteria

The goal of linearizing the circular histogram is to preserve the original distribution as completely as possible. To find the optimal breakpoint, note that the angle mean $\bar{\theta}$ is a circular invariant of the circular histogram and $\mu_b$ is a linear invariant of the linear histogram expanded at breakpoint $b$. We therefore want the point $\theta_{\mu_b}$ on the circle corresponding to $\mu_b$ to be as close as possible to $\bar{\theta}$, so that the linear histogram expanded at $b$ retains as much of the original circular distribution as possible.

$\theta_{\mu_b}$ and $\bar{\theta}$ are points on the circle. Because of periodicity, the distance between them differs from the Euclidean distance, and what matters is the difference between their directions, which can be measured by the cosine of the angle between them. Following [1, 2], the distance between $\theta_{\mu_b}$ and $\bar{\theta}$ is expressed as

$$D(b) = 1 - \cos\left(\theta_{\mu_b} - \bar{\theta}\right). \qquad (5)$$

The value of $D(b)$ depends only on the angles $\theta_{\mu_b}$ and $\bar{\theta}$, and $0 \le D(b) \le 2$. When the two angles are the same, $D(b) = 0$; when their directions are opposite, $D(b) = 2$.
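The contrast with a linear distance can be seen in a few lines. This sketch only evaluates $1 - \cos(a - b)$ for angles in radians; `circular_distance` is a hypothetical name.

```python
import math

def circular_distance(a, b):
    """D = 1 - cos(a - b) for angles in radians (the form used in equation (5)).

    D is 0 for identical directions and 2 for opposite ones, and it is
    unchanged if either angle is shifted by a multiple of 2*pi."""
    return 1.0 - math.cos(a - b)

# Two directions just either side of 0 rad: close on the circle, far apart on a line.
a, b = 0.1, 2 * math.pi - 0.1
print(abs(a - b))               # naive linear difference, about 6.08
print(circular_distance(a, b))  # small, about 0.02
```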

The mean-based selection criterion for the optimal breakpoint is

$$b^{*} = \arg\min_{b \in \{0, 1, \dots, L-1\}} D(b). \qquad (6)$$

When the circular histogram is expanded in the clockwise direction at the same cut, the linear histogram is the mirror image of the anticlockwise one, so its line mean is reflected about the center of the axis; mapping it back to the circle gives the same point as equation (4). Therefore, the proposed method does not depend on the expansion direction of the circular histogram.
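The direction independence can be checked numerically. This sketch assumes that a clockwise expansion "at $b$" reads bins $b, b-1, \dots$, so it cuts the circle at the same place as an anticlockwise expansion starting at $b+1$; both function names are hypothetical.

```python
import math

def circle_point_anticlockwise(hist, s):
    """Circle point of the line mean when reading anticlockwise from bin s."""
    L = len(hist)
    mu = sum(j * hist[(s + j) % L] for j in range(L)) / sum(hist)
    return (2 * math.pi * (s + mu) / L) % (2 * math.pi)

def circle_point_clockwise(hist, b):
    """Circle point of the line mean when reading clockwise from bin b.

    Position j holds hist[(b - j) % L], so the mean maps back to bin b - mu."""
    L = len(hist)
    mu = sum(j * hist[(b - j) % L] for j in range(L)) / sum(hist)
    return (2 * math.pi * (b - mu) / L) % (2 * math.pi)
```

Reversing an $L$-bin linear histogram maps its mean $\mu$ to $(L-1) - \mu$, and $b - (L-1) + \mu \equiv (b+1) + \mu \pmod{L}$, so both functions return the same angle for the same cut.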

It should be emphasized that the idea behind the mean-based criterion differs from that of the existing Lorenz curve-based and cumulative distribution entropy-based criteria [20, 21]. The mean-based method uses invariants of circular and linear statistics, whereas the Lorenz curve-based and cumulative distribution entropy-based methods use the cumulative distribution of each linearized histogram and therefore depend on whether the expansion is anticlockwise or clockwise.

The mean-based breakpoint selection algorithm is simple and easy to implement; it is given in Algorithm 1.

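Algorithm 1: Mean-based breakpoint selection.

Input: H histogram Hist; hue magnitude $L$ (number of hue bins).

Output: the optimal breakpoint $b^*$.

1. Set $b^* \leftarrow 0$ and calculate the distance $D_{\min}$ for Hist according to equation (5).
2. For $b = 1, 2, \dots, L-1$:
    - rotate Hist to the right or left by one bin;
    - calculate the distance $D$ for the rotated Hist according to equation (5);
    - if $D < D_{\min}$, set $D_{\min} \leftarrow D$ and $b^* \leftarrow b$.
3. Return $b^*$.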
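Algorithm 1 can be sketched in Python as below. This is an illustrative implementation under the conventions assumed above (bin $i$ at angle $2\pi i/L$, anticlockwise expansion), not the authors' code; `select_breakpoint` is a hypothetical name. The rotation is done by list slicing, and recomputing $\mu_b$ for every breakpoint makes this version $O(L^2)$.

```python
import math

def select_breakpoint(hist):
    """Mean-based breakpoint selection following Algorithm 1 (illustrative sketch).

    hist[i] is the frequency of hue bin i at angle 2*pi*i/L. Returns the b that
    minimises D(b) = 1 - cos(theta_mu_b - theta_bar), criterion (6)."""
    L = len(hist)
    total = sum(hist)
    if total <= 0:
        raise ValueError("histogram must contain at least one sample")
    # Angle mean from the trigonometric moments, equations (1)-(2).
    # If C == S == 0 the mean is undefined and atan2 returns 0 (arbitrary result).
    C = sum(h * math.cos(2 * math.pi * i / L) for i, h in enumerate(hist))
    S = sum(h * math.sin(2 * math.pi * i / L) for i, h in enumerate(hist))
    theta_bar = math.atan2(S, C)

    best_b, best_d = 0, math.inf
    for b in range(L):
        rotated = hist[b:] + hist[:b]                              # expansion at b
        mu_b = sum(j * h for j, h in enumerate(rotated)) / total   # equation (3)
        theta_mu = 2 * math.pi * (b + mu_b) / L                    # equation (4)
        d = 1.0 - math.cos(theta_mu - theta_bar)                   # equation (5)
        if d < best_d:                                             # criterion (6)
            best_b, best_d = b, d
    return best_b

# A single cluster wrapping around bin 0 of a 36-bin histogram (bins 34..2).
hist = [0] * 36
hist[34], hist[35], hist[0], hist[1], hist[2] = 1, 2, 4, 2, 1
print(select_breakpoint(hist))
```

For a single compact cluster, every breakpoint outside the cluster gives $D \approx 0$; the loop keeps the first such breakpoint (here bin 3), while cuts inside the cluster (bins 35, 0, 1, 2) give clearly larger distances.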

#### 3. Experimental Results and Analysis

The experiments are divided into two parts to evaluate the proposed method. They are performed using Python 3.8 on a PC with an Intel Core 2.50 GHz CPU and 8 GB RAM running Windows 10. Among circular models, the von Mises distribution (also known as the circular normal distribution) is the most important; its role is equivalent to that of the normal distribution for linear data, and many theoretical results and applications in circular statistics are discussed for the von Mises distribution [1, 2]. Therefore, the first part shows the breakpoints selected for different types of artificial von Mises distributions and discusses the influence of the parameters $\mu$ (the mean direction) and $\kappa$ (the concentration parameter) of the bimodal von Mises distribution [1, 2] on the proposed method. In the second part, the proposed mean-based criterion is compared with existing breakpoint selection criteria, namely those based on the Lorenz curve [20] (Lorenz-based), cumulative distribution entropy [21] (CDFE-based), and the artificial bee colony algorithm [22] (ABC-based), on the H component circular histograms of 8 images from the Berkeley dataset. For convenience, the H component is quantized to 360 levels in the experiments.

##### 3.1. Artificial Circular Histograms

We assume that the target and background in the circular histogram follow ideal von Mises distributions, and we analyze the linearization of bimodal mixture histograms composed with different parameters $\mu$ and $\kappa$.

In Figure 5, circular histograms (a), (b), and (c) are each a mixture of two von Mises distributions with different parameters. Linear histograms (d)–(f) show the linearization results of (a)–(c), respectively, obtained with the mean-based method.
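Artificial histograms of this kind can be generated as in the sketch below. The parameter values in the example are hypothetical and are not those of Figure 5; each component is normalized by its sum over the bins, which avoids evaluating the Bessel function $I_0$, and `von_mises_mixture_hist` is an illustrative name.

```python
import math

def von_mises_mixture_hist(params, L=360, total=100000):
    """Discretised mixture of von Mises components on L hue bins.

    params: list of (mu_deg, kappa, weight). Each component's density is taken
    proportional to exp(kappa * (cos(theta - mu) - 1)), which is numerically
    stable for large kappa, and is normalised by its sum over the L bins."""
    hist = [0.0] * L
    wsum = sum(w for _, _, w in params)
    for mu_deg, kappa, w in params:
        mu = math.radians(mu_deg)
        dens = [math.exp(kappa * (math.cos(2 * math.pi * i / L - mu) - 1.0))
                for i in range(L)]
        z = sum(dens)
        for i in range(L):
            hist[i] += total * (w / wsum) * dens[i] / z
    return hist

# Hypothetical example: two equally weighted components 60 degrees apart.
hist = von_mises_mixture_hist([(60, 10, 1), (120, 10, 1)])
```

The resulting (real-valued) frequencies can be passed directly to a breakpoint selection routine, since the criterion only needs weighted sums.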

As Figures 5(d)–5(f) show, the mixtures in Figures 5(a) and 5(b) are expanded better than the mixture in Figure 5(c): the linearized histograms of (a) and (b) maintain the original distribution, whereas the linearized histogram of (c) fails to retain it completely.

To illustrate more specifically how the mean-based method linearizes mixtures whose two components have the same $\kappa$, Figure 6 shows the relation between the percentage of the broken distribution (see the red box in Figure 5(f)) and the mean direction difference $\Delta\mu = |\mu_1 - \mu_2|$. Because of the symmetry of the circle, $\Delta\mu$ is only varied from 0° to 180°; larger differences behave like their symmetric counterparts. As $\Delta\mu$ approaches 180°, the proportion of the broken distribution increases sharply, but its maximum does not exceed 0.5%. When $\Delta\mu$ is less than 150°, the linearization results are similar and good.

Similarly, Figures 7–9 show the relation between the percentage of the broken distribution and $\Delta\mu$ when $(\kappa_1, \kappa_2)$ is (5, 10), (10, 15), and (10, 20), respectively.

The maximum percentage of broken distribution is 20% in Figure 7 and 14% in Figure 8, which shows that the lower the overall concentration parameters, the worse the overall linearization.

In Figure 9, the maximum percentage of broken distribution is 16%. Comparing Figures 8 and 9 shows that a small difference between the concentration parameters of the two distributions is conducive to linearizing the circular histogram.

In Figures 7–9, the percentage of broken distribution increases with $\Delta\mu$ for all concentration parameters, and it rises exponentially around 180°. This sharp increase greatly degrades the linearization near 180°.
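One way to see why the method weakens near 180° (our reading, not an analysis given in the paper): for two equally weighted components with the same $\kappa$, the mean resultant length $\sqrt{C^2 + S^2}/N$ of the mixture equals that of a single component times $|\cos(\Delta\mu/2)|$, so it vanishes at $\Delta\mu = 180°$, where the angle mean of equation (2) becomes undefined. The sketch below, with hypothetical helper names, computes this quantity for a few differences.

```python
import math

def resultant_length(hist):
    """Mean resultant length sqrt(C^2 + S^2) / N of a circular histogram."""
    L, N = len(hist), sum(hist)
    C = sum(h * math.cos(2 * math.pi * i / L) for i, h in enumerate(hist))
    S = sum(h * math.sin(2 * math.pi * i / L) for i, h in enumerate(hist))
    return math.hypot(C, S) / N

def two_component_hist(delta_deg, kappa=10.0, L=360):
    """Equal-weight mixture of two von Mises components at 0 and delta_deg degrees."""
    def component(mu_deg):
        mu = math.radians(mu_deg)
        d = [math.exp(kappa * (math.cos(2 * math.pi * i / L - mu) - 1.0))
             for i in range(L)]
        z = sum(d)
        return [x / z for x in d]
    a, b = component(0.0), component(delta_deg)
    return [x + y for x, y in zip(a, b)]

for delta in (0, 90, 150, 170, 180):
    print(delta, round(resultant_length(two_component_hist(delta)), 4))
```

As the resultant shrinks, small perturbations move $\bar{\theta}$ a lot, which is consistent with (though not proof of) the sharp rise in broken distribution reported above.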

In summary, the results in Figures 6–9 show that the mean-based method is suitable when the target and background are not far apart in the circular histogram. The difference between the mean directions should generally stay well below 180°; in Figure 6, for example, the linearization remains good for differences below 150°.

##### 3.2. Real Circular Histograms

To further illustrate the scope and effect of the mean-based breakpoint method, the H component circular histograms of 8 color images from the Berkeley dataset are used for breakpoint selection, and the results are compared with those of the Lorenz-based [20], CDFE-based [21], and ABC-based [22] criteria. The linearization results for the 8 images are shown in Figures 10–17.

To compare the four algorithms more fully, the variance and kurtosis of the linearized histograms are also computed. Tables 1 and 2 give the variance and kurtosis obtained with the existing Lorenz-based, CDFE-based, and ABC-based methods and the proposed mean-based method. The variance, defined in equation (7), describes the dispersion of the data distribution: it is large when the distribution is scattered and small when it is concentrated. The kurtosis, defined in equation (8), is never lower than 1 and never higher than the number of data values; the greater the kurtosis, the steeper the distribution:

$$\sigma^2 = \sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2 p\left(x_i\right), \qquad (7)$$

$$K = \frac{1}{\sigma^4}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^4 p\left(x_i\right), \qquad (8)$$

where $\sigma^2$ is the variance, $K$ is the kurtosis, $x_i$ is the $i$th value of the variable $X$, $\bar{x}$ is the average of $X$, $n$ is the number of values of $X$, and $p(x_i)$ is the probability of the value $x_i$.
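A minimal sketch of equations (7) and (8) applied to a linearized histogram, assuming $x_i$ is the bin position $i$ and $p(x_i)$ the normalized frequency; `variance_kurtosis` is a hypothetical name, and the kurtosis is undefined (division by zero) when all mass sits in one bin.

```python
def variance_kurtosis(hist):
    """Variance (7) and kurtosis (8) of a linear histogram.

    x_i is the bin position i and p(x_i) = hist[i] / sum(hist)."""
    N = sum(hist)
    p = [h / N for h in hist]
    mean = sum(i * pi for i, pi in enumerate(p))
    var = sum((i - mean) ** 2 * pi for i, pi in enumerate(p))
    kurt = sum((i - mean) ** 4 * pi for i, pi in enumerate(p)) / var ** 2
    return var, kurt

print(variance_kurtosis([1, 0, 1]))  # two equal spikes: variance 1, kurtosis 1
```

The two-spike example attains the lower bound of 1 mentioned above. In the comparison, the histogram passed in would be the one obtained after expanding at each method's chosen breakpoint.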

As Figures 10 and 11 show, when the distribution is unimodal (the two modes coincide), the Lorenz-based, CDFE-based, ABC-based, and mean-based methods all break the histogram appropriately and preserve the integrity of the distribution.

The H component histograms in Figures 12 and 13 show that the centers of the two distributions in the circular histogram are close. In terms of preserving the integrity of the circular distribution, the CDFE-based and mean-based methods perform better, and the mean-based method achieves the best variance and kurtosis in Tables 1 and 2.

Comparing Figures 14(c)–14(f), the CDFE-based method performs best, followed by the Lorenz-based method. The centers of the two distributions in the circular histogram (Figure 14(b)) are about 160° apart, and Figure 14(f) shows that the mean-based breakpoint breaks the smaller distribution. This is consistent with the conclusion drawn from the artificial circular histograms: when the centers of the two distributions are far apart, the linearization of the mean-based method deteriorates.

Figure 15 shows that the CDFE-based and mean-based methods linearize better than the Lorenz-based and ABC-based methods. The Lorenz-based result does not maintain the integrity of the circular distribution: a small part of it ends up at the right end of the linearized histogram. The ABC-based result completely destroys the distribution.

Figures 16 and 17 show complex distributions in which most hues appear, with different frequencies. Tables 1 and 2 show that, for these images, the mean-based method has a slight advantage over the ABC-based, CDFE-based, and Lorenz-based methods in variance and kurtosis.

Overall, the linearization results of the 8 circular histograms show that the mean-based method outperforms the CDFE-based, Lorenz-based, and ABC-based methods when the centers of the target and the background are not too far apart, so it is effective in suitable scenarios. Judging from the averages of the metrics over the 8 images in Tables 1 and 2, the mean-based method is the best of the four methods.

Table 3 lists the time spent linearizing the above 8 images with the Lorenz-based, CDFE-based, ABC-based, and mean-based methods. The mean-based method has a large speed advantage: on average it is about 7 times faster than the Lorenz-based method and about 71 times faster than the CDFE-based and ABC-based methods.
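The speed advantage comes from needing only means. As a further optimization that the paper does not describe (a hypothetical sketch), the first moment $M(b) = \sum_j j\, h_{(b+j) \bmod L}$ can be updated in $O(1)$ per breakpoint: moving the cut from $b$ to $b+1$ sends bin $b$ from position 0 to position $L-1$ and shifts every other bin one position left, so $M(b+1) = M(b) - N + L\,h_b$ with $N = \sum_i h_i$. This makes the whole search $O(L)$ instead of the $O(L^2)$ of recomputing each $\mu_b$.

```python
import math

def select_breakpoint_incremental(hist):
    """Mean-based breakpoint search with an O(1) update of the first moment.

    M(b) = sum_j j * hist[(b + j) % L]; moving the cut from b to b + 1 gives
    M(b + 1) = M(b) - N + L * hist[b], where N = sum(hist)."""
    L, N = len(hist), sum(hist)
    C = sum(h * math.cos(2 * math.pi * i / L) for i, h in enumerate(hist))
    S = sum(h * math.sin(2 * math.pi * i / L) for i, h in enumerate(hist))
    theta_bar = math.atan2(S, C)
    M = sum(j * h for j, h in enumerate(hist))  # first moment for b = 0
    best_b, best_d = 0, math.inf
    for b in range(L):
        theta_mu = 2 * math.pi * (b + M / N) / L
        d = 1.0 - math.cos(theta_mu - theta_bar)
        if d < best_d:
            best_b, best_d = b, d
        M += L * hist[b] - N  # first moment for breakpoint b + 1
    return best_b
```

With integer frequencies the update is exact; with floating-point frequencies small rounding errors can accumulate over the $L$ updates.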

#### 4. Conclusions

For the linearization of circular histograms, we propose a new breakpoint selection method that uses a simple mean operation to define the optimal breakpoint criterion, and we discuss the scenarios in which this criterion applies. Experiments show that, in suitable scenarios, the new method ensures a good linearization of the circular histogram while reducing the computational cost of breakpoint selection, providing a better way to linearize circular histograms. In future work, we will build on the mean-based method to explore breakpoint selection methods that linearize better when the centers of the two distributions are far apart.

#### Data Availability

The data used to support the findings of this study are included within the article.

#### Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

#### Acknowledgments

This work was supported by the National Natural Science Foundation of China (nos. 62071378, 62071379, 62071380, and 61901365) and “New Star Team of Xi’an University of Posts and Telecommunications” (no. xyt2016-01).