Research Article

A Big Data-Driven Approach to Analyze the Influencing Factors of Enterprise’s Technological Innovation

Algorithm 1

Improved semantic similarity and relatedness-based K-means clustering algorithm.
Input: the preprocessed dataset containing N terms; the semantic similarity and relatedness matrix M; the number of clusters K; the iteration termination threshold $\varepsilon$; the maximum number of iterations MaxStep;
Output: K clusters;
BEGIN
(1) Initialization: set O = 0 and k = 0;
 load the dataset, randomly select an initial cluster centre z1 from it, and save z1 to the initial cluster-centre set;
(2) Calculate the distance between each sample and the initial point z1 according to equation (4), find the sample point ci with the largest distance from z1, take ci as the second initial cluster centre z2, and save it to the initial point set;
(3) Repeat step (2) until the Kth initial cluster centre is found;
(4) Assign each sample to the cluster of its nearest initial cluster centre among the K centres;
(5) Update the centre of each cluster through the mean value $z_l = \frac{1}{n_l}\sum_{x_j \in C_l} x_j$, where $n_l$ represents the number of sample points in the lth cluster $C_l$;
(6) Compute the measure function $J = \sum_{l=1}^{K}\sum_{x_j \in C_l} d_M(x_j, z_l)$, where $z_l$ represents the lth cluster centre, $d_M(x_j, z_l)$ represents the distance between the jth data point and the lth cluster centre, and M represents the semantic matrix;
(7) If the number of iterations O reaches MaxStep or $|J^{(O)} - J^{(O-1)}| < \varepsilon$, terminate the iteration;
 otherwise, set O = O + 1
 and return to steps (5) and (6);
(8) END
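The listing above can be sketched in Python as follows. This is a minimal, hypothetical illustration rather than the authors' implementation, and it rests on several assumptions not stated in Algorithm 1: each term is represented by its row of the matrix M; squared Euclidean distance stands in for equation (4), which is not reproduced here; after z2, each further initial centre is the sample whose distance to its nearest already-chosen centre is largest (one common way to generalize step (2)); and, as in standard K-means, samples are reassigned (step (4)) before each centre update, whereas the listing says to return to steps (5) and (6). All function names are illustrative.

```python
import random
from typing import List, Sequence, Tuple

Vector = List[float]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance: a hypothetical stand-in for equation (4)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def init_centres(samples: List[Vector], k: int, rng: random.Random) -> List[Vector]:
    """Steps (1)-(3): pick z1 at random, then repeatedly add the sample whose
    distance to its nearest already-chosen centre is largest."""
    centres = [list(rng.choice(samples))]
    while len(centres) < k:
        farthest = max(samples, key=lambda s: min(sq_dist(s, c) for c in centres))
        centres.append(list(farthest))
    return centres


def assign(samples: List[Vector], centres: List[Vector]) -> List[int]:
    """Step (4): index of the nearest centre for every sample."""
    return [min(range(len(centres)), key=lambda l: sq_dist(s, centres[l]))
            for s in samples]


def update(samples: List[Vector], labels: List[int],
           centres: List[Vector]) -> List[Vector]:
    """Step (5): move each centre to the mean of its members; an empty
    cluster keeps its previous centre."""
    new_centres = []
    for l, old in enumerate(centres):
        members = [s for s, lab in zip(samples, labels) if lab == l]
        if members:
            new_centres.append([sum(col) / len(members) for col in zip(*members)])
        else:
            new_centres.append(list(old))
    return new_centres


def objective(samples: List[Vector], labels: List[int],
              centres: List[Vector]) -> float:
    """Step (6): measure function J, here the sum of squared distances."""
    return sum(sq_dist(s, centres[lab]) for s, lab in zip(samples, labels))


def semantic_kmeans(M: List[Vector], k: int, eps: float = 1e-6,
                    max_step: int = 100, seed: int = 0) -> Tuple[List[int], List[Vector]]:
    """Cluster the N terms of the N x N semantic matrix M into k clusters."""
    rng = random.Random(seed)
    samples = [list(map(float, row)) for row in M]  # term j -> row j of M
    centres = init_centres(samples, k, rng)
    labels = assign(samples, centres)
    prev = objective(samples, labels, centres)
    for _ in range(max_step):  # step (7): at most MaxStep iterations
        centres = update(samples, labels, centres)
        labels = assign(samples, centres)
        cur = objective(samples, labels, centres)
        if abs(prev - cur) < eps:  # step (7): |J(O) - J(O-1)| < eps
            break
        prev = cur
    return labels, centres


if __name__ == "__main__":
    M = [[1.0, 0.9, 0.1, 0.0],
         [0.9, 1.0, 0.0, 0.1],
         [0.1, 0.0, 1.0, 0.8],
         [0.0, 0.1, 0.8, 1.0]]
    labels, _ = semantic_kmeans(M, k=2)
    print(labels)
```

With the toy matrix in the `__main__` block, where terms 0 and 1 and terms 2 and 3 are strongly related, `semantic_kmeans(M, 2)` places terms 0 and 1 in one cluster and terms 2 and 3 in the other; which cluster index each pair receives depends on which term is drawn at random as z1.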