Granular Computing Classification Algorithms Based on Distance Measures between Granules from the View of Set
Granular computing classification algorithms are proposed based on distance measures between two granules from the view of set. Firstly, granules are represented in the form of a hyperdiamond, hypersphere, hypercube, or hyperbox. Secondly, the distance measure between two granules is defined from the view of set, and a union operator between two granules is introduced to obtain a granule set containing granules of different granularity. Thirdly, the threshold of granularity controls the union of two granules and is used to form the granular computing classification algorithms based on distance measures (DGrC). Benchmark datasets from the UCI Machine Learning Repository are used to verify the performance of DGrC, and the experimental results show that DGrC improves the testing accuracies.
1. Introduction
Granular computing (GrC) is a computing method based on the partition of the problem space and is widely used in pattern recognition, information systems, and so forth. Zadeh identified three fundamental concepts of the human cognition process, namely, granulation, organization, and causation [1, 2]. Granulation is a process that decomposes a universe into parts. Conversely, organization is a process that integrates parts into a universe by introducing operations between granules. Causation involves the association of causes and effects. Information granules based on sets, fuzzy sets or relations, and fuzzy relations are computed in . In general, the fuzzy inclusion measure is induced by a granule and a union granule; for example, the positive valuation functions of granules are used to form the fuzzy inclusion measure [4–6]. This approach has a drawback, however: the fuzzy inclusion measure between two atomic granules is zero no matter how far apart the two atomic granules are. These studies enable us to map the complexities of the world around us into simple theories.
GrC based on an algebraic system is a computing paradigm that regards a set of objects as a granule, and the union operator and the meet operator are its two key operations. Both operators depend on the shape of the granules. Granules can take different shapes, such as hypersphere granules, hypercube granules, hyperdiamond granules, and hyperbox granules.
The present work uses a distance measure between granules of the same shape from the view of set. A granule is represented as a vector, and the distance between granules is defined by the centers of the granules and their granularities, such as half of the diagonal length of a hyperdiamond, the radius of a hypersphere, half of the side length of a hypercube, and the diagonal length of a hyperbox. On this basis, the granular computing classification algorithms based on distance measure (DGrC) are proposed.
The rest of this paper is presented as follows. Granular computing classification algorithm based on distance measure is described in Section 2. Section 3 demonstrates the comparative experimental results on two-class and multiclass problems. Section 4 summarizes the contribution of our work and presents future work plans.
2. Granular Computing Classification Algorithm Based on Distance Measure
For a dataset in a finite-dimensional space, we construct granular computing classification algorithms (GrC) in the following steps. Firstly, each single point of the space is represented as an atomic granule, which is indivisible. Secondly, the distance between two granules is defined from the view of set. Thirdly, the distance and the granularity jointly determine the union process. Finally, the granule set is obtained and used to predict the class of an unknown datum.
2.1. Representation of Granule and Granularity
In reality, the shapes of granules are irregular, the distance between two granules is not easily measured, and the union granule and the meet granule depend on the shapes of the granules. In order to study granular computing, a granule is represented by a regular shape, such as a hyperdiamond, hypersphere, hypercube, or hyperbox, that is, a diamond, sphere, cube, or box in 2-dimensional space. These four shapes of granule are represented as follows.
(1) A hyperdiamond granule is represented as a vector comprising the center of the hyperdiamond and half of its diagonal length.
(2) A hypersphere granule is represented as a vector comprising the center and the radius of the hypersphere.
(3) A hypercube granule is represented as a vector comprising the center of the hypercube and half of its side length.
(4) A hyperbox granule is represented as a vector comprising the beginning point and the end point of the hyperbox.
Granularity is the size of a granule: half of the diagonal length of a hyperdiamond granule, the radius of a hypersphere granule, half of the side length of a hypercube granule, and the maximal diagonal of a hyperbox granule.
For a hyperdiamond granule, the granularity is half of the diagonal length of the hyperdiamond (1a).
For a hypersphere granule, the granularity is the radius of the hypersphere (1b).
For a hypercube granule, the granularity is half of the side length of the hypercube (1c).
For a hyperbox granule, the granularity is the distance between the beginning point and the end point (1d).
Figure 1 shows one granule of each shape: a hyperdiamond granule with a given center and granularity 0.5, a hypersphere granule with a given center and granularity 0.5, a hypercube granule with a given center and granularity 0.5, and a hyperbox granule with a given beginning point and end point and granularity 0.3606. These granules are shown in Figures 1(a), 1(b), 1(c), and 1(d), respectively. Figure 1 shows that granules can have different shapes even if they have the same form of representation.
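As a minimal sketch of the granularity definitions above, the following Python functions compute the granularity of a center-based granule and of a hyperbox granule. The tuple layout `(center, size)`, the function names, and the sample hyperbox points are illustrative assumptions rather than notation from the paper; the points are chosen only because their coordinate differences (0.3 and 0.2) reproduce the value 0.3606 quoted for the hyperbox in Figure 1.

```python
import math


def center_granule_granularity(granule):
    """Granularity of a hyperdiamond, hypersphere, or hypercube granule.

    Assumed layout: granule = (center, size), where size is the half
    diagonal, the radius, or the half side, depending on the shape.
    """
    _center, size = granule
    return size


def hyperbox_granularity(begin, end):
    """Granularity of a hyperbox: distance between begin and end points."""
    return math.dist(begin, end)


# Hypothetical hyperbox whose sides are 0.3 and 0.2 (not the paper's points).
box_granularity = hyperbox_granularity((0.2, 0.3), (0.5, 0.5))
sphere_granularity = center_granule_granularity(((0.5, 0.5), 0.5))
```

Under these assumptions an atomic granule is simply a granule of granularity zero: a center with size 0, or a hyperbox whose beginning point equals its end point.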
2.2. Distance Measure between Granules
The distance between granules refers to the minimal distance between two points which belong to different granules.
For two hyperdiamond granules, the distance is defined from the centers of the two granules and their granularities.
For two hypersphere granules, the distance is likewise defined from the centers of the two granules and their granularities.
For two hypercube granules, the distance is again defined from the centers of the two granules and their granularities.
For two hyperbox granules, the distance is defined from the beginning points and the end points of the two granules, using two operators defined between vectors.
According to the definitions above, the distance between two granules can be any real number. There is a margin between two granules when the distance is positive, the two granules share a point when the distance is zero, and the two granules overlap when the distance is negative. When the distance is positive, a greater distance means a greater margin between the two granules; when the distance is negative, a greater distance means a smaller overlap. Figure 2 shows the distance between two granules in these three cases.
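One possible reading of these definitions, offered here as an assumption rather than as the paper's formulas, is that each center-based granule is a ball in the norm matching its shape (L1 for a hyperdiamond, L2 for a hypersphere, L-infinity for a hypercube). The distance is then the norm of the difference of the centers minus both granularities, which is positive for a margin, zero for contact, and negative for an overlap. The sketch below implements that reading; the function names and the `(center, size)` layout are hypothetical, and the hyperbox case is omitted because its formula cannot be inferred with the same confidence.

```python
import math


def center_gap(c1, c2, shape):
    """Norm of c1 - c2 in the norm assumed to match the granule shape."""
    diffs = [abs(a - b) for a, b in zip(c1, c2)]
    if shape == "hyperdiamond":   # assumed L1 norm
        return sum(diffs)
    if shape == "hypersphere":    # assumed L2 norm
        return math.sqrt(sum(d * d for d in diffs))
    if shape == "hypercube":      # assumed L-infinity norm
        return max(diffs)
    raise ValueError(f"unsupported shape: {shape}")


def granule_distance(g1, g2, shape):
    """Signed distance between two granules given as (center, size)."""
    (c1, r1), (c2, r2) = g1, g2
    return center_gap(c1, c2, shape) - r1 - r2


margin = granule_distance(((0.0, 0.0), 0.5), ((2.0, 0.0), 0.5), "hypersphere")
touch = granule_distance(((0.0, 0.0), 0.5), ((1.0, 0.0), 0.5), "hypersphere")
overlap = granule_distance(((0.0, 0.0), 0.5), ((0.5, 0.0), 0.5), "hypersphere")
```

With this reading, the distance between two atomic granules reduces to the ordinary distance between the two points, so it grows with their separation instead of staying at zero.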
2.3. Operators between Granules
Every point is regarded as an atomic granule, which is indivisible; the union process is the key to obtaining granules larger than atomic granules. Likewise, the whole space is a granule with the maximal granularity; the decomposition process is the key to dividing larger granules into smaller granules.
For two hyperdiamond granules, the union hyperdiamond granule is constructed using the vectors that have a single component equal to 1 and all other components equal to 0.
For two hypersphere granules, the union hypersphere granule is constructed using the vector pointing from one granule to the other.
For two hypercube granules, the union hypercube granule is constructed using the vector of matching length whose components are all 1.
For two hyperbox granules, the union hyperbox granule is defined from the beginning points and the end points of the two granules.
Figure 3 illustrates the union process between granules in 2-dimensional space. For a pair of granules interpreted in turn as two hyperdiamond granules, two hypersphere granules, or two hypercube granules, it shows the union hyperdiamond granule, the union hypersphere granule, and the union hypercube granule. For a pair of hyperbox granules in 2-dimensional space, it shows the union hyperbox granule.
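The union operators can be sketched for two of the four shapes under stated assumptions. For hyperboxes, the sketch takes the componentwise minimum of the beginning points and the componentwise maximum of the end points; for hyperspheres, it takes the smallest hypersphere that encloses both granules, whose diameter runs along the line through the two centers. Both constructions are standard geometry and are assumed, not confirmed, to match the paper's formulas; the function names are hypothetical.

```python
import math


def union_hyperbox(g1, g2):
    """Smallest hyperbox enclosing two hyperboxes given as (begin, end)."""
    (b1, e1), (b2, e2) = g1, g2
    begin = tuple(min(a, b) for a, b in zip(b1, b2))
    end = tuple(max(a, b) for a, b in zip(e1, e2))
    return (begin, end)


def union_hypersphere(g1, g2):
    """Smallest hypersphere enclosing two hyperspheres given as (center, radius)."""
    (c1, r1), (c2, r2) = g1, g2
    d = math.dist(c1, c2)
    if d + r2 <= r1:          # g2 lies inside g1
        return g1
    if d + r1 <= r2:          # g1 lies inside g2
        return g2
    r = (d + r1 + r2) / 2.0   # half the span along the line of centers
    t = (r - r1) / d          # fraction of the way from c1 to c2
    center = tuple(a + t * (b - a) for a, b in zip(c1, c2))
    return (center, r)


box = union_hyperbox(((0.0, 0.0), (1.0, 1.0)), ((0.5, -1.0), (2.0, 0.5)))
ball = union_hypersphere(((0.0, 0.0), 1.0), ((4.0, 0.0), 1.0))
```

Applied to two atomic granules, the hypersphere union yields the sphere centered at their midpoint with radius equal to half their separation.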
2.4. Granular Computing Classification Algorithms Based on Distance between Granules
The granular computing classification algorithms comprise two algorithms: the training algorithm and the testing algorithm.
For a training set, the training algorithm proceeds in the following steps. Firstly, the samples are used to form the atomic granules. Secondly, the threshold of granularity is introduced to conditionally unite the atomic granules by the aforementioned union operator, and the granule set GS is composed of all the union granules. Thirdly, if all atomic granules are included in the granules of GS, the union process terminates; otherwise, the second step is repeated. The training algorithm is described as Algorithm 1.
Suppose that the training set induces five atomic granules with the same class label. The training algorithm can then be described by the tree structure in Figure 4: the leaves denote the atomic granules, each internal node is the union granule of its child nodes, and the root denotes the granule that includes its child nodes. The whole process of obtaining GS is a bottom-up process.
The threshold of granularity, which bounds the granularity computed by formulas (1a)–(1d) for the different shapes of granules, is selected in descending order. A larger threshold means that the granule set induced by Algorithm 1 contains larger granules; conversely, a smaller threshold means that the granule set contains smaller granules. For the same training set, a smaller threshold yields a granule set with more granules than a larger threshold does.
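The training steps described above can be sketched as follows. This is one plausible sequential reading, not a transcription of Algorithm 1, whose listing is not reproduced in the text. It assumes hypersphere granules, the smallest-enclosing-sphere union, and a hypothetical threshold parameter `rho`: each sample becomes an atomic granule and is united with the nearest same-class granule whose union stays within `rho`, otherwise it starts a new granule.

```python
import math


def union_hypersphere(g1, g2):
    """Smallest hypersphere enclosing two hyperspheres (assumed union)."""
    (c1, r1), (c2, r2) = g1, g2
    d = math.dist(c1, c2)
    if d + r2 <= r1:
        return g1
    if d + r1 <= r2:
        return g2
    r = (d + r1 + r2) / 2.0
    t = (r - r1) / d
    return (tuple(a + t * (b - a) for a, b in zip(c1, c2)), r)


def train(samples, labels, rho):
    """Return a granule set [((center, radius), label), ...] for threshold rho."""
    granule_set = []
    for x, y in zip(samples, labels):
        atom = (tuple(x), 0.0)   # atomic granule: a point with zero granularity
        best = None              # (distance, index, merged granule)
        for i, (g, label) in enumerate(granule_set):
            if label != y:
                continue         # only unite granules of the same class
            merged = union_hypersphere(g, atom)
            if merged[1] > rho:
                continue         # union would exceed the granularity threshold
            dist = math.dist(g[0], atom[0]) - g[1]
            if best is None or dist < best[0]:
                best = (dist, i, merged)
        if best is None:
            granule_set.append((atom, y))
        else:
            granule_set[best[1]] = (best[2], y)
    return granule_set


samples = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0)]
labels = [0, 0, 1]
coarse = train(samples, labels, rho=0.1)
fine = train(samples, labels, rho=0.01)
```

In this sketch the result depends on the order in which the samples are visited, and a smaller `rho` leaves more, smaller granules in the granule set.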
The purpose of the training algorithm is to obtain the granule set and the corresponding class labels, which are used to predict the class label of an unknown datum. The testing set, which consists of multiple data and their class labels, is used to verify the performance of the granular computing algorithms. If the predicted class label of a testing datum is the same as its real class label, the datum is correctly classified; otherwise, it is misclassified. The classification accuracy is one of the performance measures of granular classification algorithms. The testing algorithm is described as Algorithm 2.
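A prediction step consistent with this description can be sketched as follows. Since the listing of Algorithm 2 is not reproduced in the text, the rule used here is an assumption: a datum receives the class label of the granule at the smallest signed distance, with hypersphere granules and the hypothetical layout `((center, radius), label)`.

```python
import math


def predict(granule_set, x):
    """Label of the granule nearest to point x (assumed decision rule)."""
    best_label, best_dist = None, math.inf
    for (center, radius), label in granule_set:
        dist = math.dist(center, x) - radius   # negative when x is inside
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def accuracy(granule_set, samples, labels):
    """Percentage of samples whose predicted label equals the real label."""
    correct = sum(predict(granule_set, x) == y for x, y in zip(samples, labels))
    return 100.0 * correct / len(samples)


granule_set = [(((0.0, 0.0), 1.0), "a"), (((5.0, 5.0), 1.0), "b")]
label = predict(granule_set, (0.5, 0.5))
score = accuracy(granule_set, [(0.5, 0.5), (4.0, 4.0)], ["a", "a"])
```

The same `accuracy` function gives the training accuracy when it is applied to the training set and the testing accuracy when it is applied to the testing set.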
3. Experiments
We evaluated the effectiveness of DGrC on both two-class and multiclass problems on an Intel PIV PC with a 2.8 GHz CPU and 2 GB of memory, running Microsoft Windows XP Professional and Matlab 7.0. We mainly analyze and discuss DGrC with different granule shapes in terms of training accuracy (Tr (%)), testing accuracy, that is, generalization ability (Ts (%)), training time (Tr (s)), and testing time (Ts (s)).
3.1. Two-Class Problems
Spiral classification is a difficult problem and is used to evaluate the performance of classifiers. The training data are generated by the method proposed in . The training set and the testing set in reference  are used to evaluate the performance of GrC.
The threshold of granularity is decreased from 0.2 to 0 in steps of 0.001, and the maximal testing accuracy is used as the selection criterion. The performance of GrC with the four granule shapes is listed in Table 1. The training data and their granules are shown in Figure 5, in which the single points are atomic granules. Table 1 shows that GrC with hypersphere granules achieved the best performance because its GS was the smallest, with 88 granules at the selected threshold; GrC with hypercube granules performed worst because its GS was the largest, with 99 granules at the selected threshold; and GrC with hyperdiamond granules reached the best testing accuracy first. Because the training time and testing time depend on the size of the granule set GS, we prefer, among the algorithms that attain the maximal testing accuracy, the one with the smallest granule set.
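The threshold selection described above can be sketched as a simple sweep. The function name and the `evaluate` callback, which stands for training with a given threshold and returning the testing accuracy, are hypothetical. Thresholds are generated by integer counting to avoid accumulated floating-point error, and the strict comparison keeps the first (largest) threshold that attains the maximal accuracy.

```python
def sweep_thresholds(evaluate, start=0.2, stop=0.0, step=0.001):
    """Scan thresholds in descending order and keep the best testing accuracy.

    evaluate(rho) is a hypothetical callback returning the testing
    accuracy obtained with granularity threshold rho.
    """
    n_steps = round((start - stop) / step)
    best_rho, best_acc = None, float("-inf")
    for k in range(n_steps + 1):
        rho = start - k * step
        acc = evaluate(rho)
        if acc > best_acc:        # strict: ties keep the earlier threshold
            best_rho, best_acc = rho, acc
    return best_rho, best_acc


visited = []


def toy_evaluate(rho):
    """Stand-in accuracy curve used only to exercise the sweep."""
    visited.append(rho)
    return 95.0 if rho < 0.1005 else 90.0


best_rho, best_acc = sweep_thresholds(toy_evaluate)
```

In practice `evaluate` would wrap a training run and an accuracy computation for one granule shape; the toy curve above only demonstrates the selection rule.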
3.2. Multiclass Problems
For multiclass problems, datasets listed in Table 2 are selected from the UCI Machine Learning Repository (http://archive.ics.uci.edu/ml/) to test DGrC. They are wall-following robot navigation data (sensor2, sensor4, and sensor24) which are divided into training data and testing data at random, optical recognition of handwritten digits (optdigits) including training data and testing data, pen-based recognition of handwritten digits (pendigits) including training data and testing data, letter recognition (letter) which is divided into training data and testing data, and shuttle including training data and testing data. These datasets are used to verify the performances of DGrC from the aspects of size, Tr (%), Ts (%), Tr (s), and Ts (s) (see Table 3).
For the selected datasets, the optimal testing accuracies obtained by the KNN algorithm are 98.0769% (sensor2), 90.8691% (sensor4), 83.0220% (sensor24), 97.997% (optdigits), 97.799% (pendigits), 94.765% (letter), and 99.883% (shuttle). We selected the parameters that maximized the testing accuracy. DGrC with each of the four granule shapes was run in the same environment, and the performance is listed in Table 3. The table shows that, in terms of the optimal testing accuracy, DGrC is better than KNN on every dataset. (1) DGrC with hyperdiamond granules achieved the best testing accuracies of 92.4092%, 87.8022%, and 99.9448%, which are highlighted in bold, for datasets sensor4, sensor24, and shuttle. (2) DGrC with hypersphere granules achieved the optimal testing accuracies of 98.4615%, 98.1636%, 94.7953%, and 99.9448% for datasets sensor2, optdigits, letter, and shuttle. (3) DGrC with hyperbox granules achieved the optimal testing accuracy of 97.9417% for pendigits.
4. Conclusions
Granular computing classification algorithms with different granule shapes are proposed in this paper based on distance measures. Firstly, a training datum is represented as an atomic granule. Secondly, the distance measure between granules is formed from the centers and granularities of the granules. Thirdly, the training process is constructed from the union operator and the threshold of granularity jointly. Finally, the proposed granular computing classification algorithms are demonstrated on datasets selected from the references. Like other granular computing algorithms, DGrC is affected by the order of the training data. In future work, we will focus on the adaptive selection of the threshold of granularity and apply granular computing to image segmentation.
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
Acknowledgments
This work was supported in part by the Natural Science Foundation of China (Grant no. 61170202) and Natural Science Foundation of Henan (nos. 132300410421, 132300410422).
References
L. A. Zadeh, Fuzzy Sets and Information Granulation. Advances in Fuzzy Set Theory and Applications, North Holland, 1979.
L. A. Zadeh, “Toward a theory of fuzzy information granulation and its centrality in human reasoning and fuzzy logic,” Fuzzy Sets and Systems, vol. 90, no. 2, pp. 111–127, 1997.
L. Wang, X. Liu, and W. Pedrycz, “Effective intervals determined by information granules to improve forecasting in fuzzy time series,” Expert Systems with Applications, vol. 40, no. 14, pp. 5673–5679, 2013.
K. J. Lang and D. J. Witbrock, “Learning to tell two spirals apart,” in Proceedings of Connectionist Models Summer School, pp. 52–59, June 1988.