Research Article
An Efficient MapReduce-Based Parallel Clustering Algorithm for Distributed Traffic Subarea Division
Input: ⟨key, medi⟩
    key: the index of the cluster,
    medi: the list of the samples assigned to the same cluster.
Output: ⟨key′, value′⟩
    key′: the index of the cluster,
    value′: the sum of the values of the samples belonging to the same cluster and the number of samples.
(1) Construct a counter num_s to record the number of samples in the same cluster;
(2) Construct an array sum_v to record the per-dimension sum of the values of the samples belonging to the same cluster (i.e., the samples in the list medi);
(3) Construct the sample examples to hold the data objects extracted from medi.next(), and dimensions to hold the dimensionality of the original data object;
(4) num_s = 0;
(5) while (medi.hasNext()) do
(6)     CurrentPoint = medi.next();
(7)     num_s++;
(8)     for each dimension i do
(9)         sum_v[i] += CurrentPoint.point[i];
(10)        //Accumulate the sum of the values of each dimension of examples
(11)    end for
(12)    for each dimension i do
(13)        mean[i] = sum_v[i]/num_s;
(14)        //Compute the mean value of the samples for each cluster
(15)    end for
(16) end while
(17) index = key;
(18) Construct value′ as a string containing the sum of the values of each dimension sum_v[i] and the number of samples num_s;
(19) return ⟨key′, value′⟩ pairs;
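The aggregation step above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' Hadoop implementation: the function names `combine` and `mean_from`, the representation of a sample as a plain sequence of floats, and the `"s1,s2,...;count"` string encoding of the output value are all assumptions made for the example. It also assumes the sample list is non-empty and that every sample has the same number of dimensions.

```python
from typing import Iterable, List, Optional, Sequence, Tuple


def combine(key: int, medi: Iterable[Sequence[float]]) -> Tuple[int, str]:
    """Sum the samples of one cluster per dimension and count them.

    `key` is the cluster index; `medi` yields the samples assigned to it.
    The returned value string uses a hypothetical "sums;count" encoding.
    """
    num_s = 0                               # number of samples seen so far
    sum_v: Optional[List[float]] = None     # per-dimension running sums
    for point in medi:
        if sum_v is None:
            # The first sample fixes the dimensionality.
            sum_v = [0.0] * len(point)
        num_s += 1
        for i, x in enumerate(point):
            sum_v[i] += x
    sums = ",".join(str(s) for s in (sum_v or []))
    return key, f"{sums};{num_s}"


def mean_from(value: str) -> List[float]:
    """Recover the cluster mean from the (assumed) "sums;count" encoding."""
    sums, count = value.split(";")
    n = int(count)
    return [float(s) / n for s in sums.split(",")]


if __name__ == "__main__":
    k, v = combine(2, [[1.0, 2.0], [3.0, 4.0]])
    print(k, v, mean_from(v))
```

Emitting the partial sums together with the sample count, rather than only a mean, lets a later stage merge the outputs of several such calls for the same cluster by adding sums and counts before dividing.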