Computational Intelligence and Neuroscience

Research Article

A Big Data-Driven Approach to Analyze the Influencing Factors of Enterprise’s Technological Innovation

Improved semantic similarity and relatedness-based K-means clustering algorithm.

Input: the preprocessed dataset, containing N terms; the semantic similarity and relatedness matrix M; the number of clusters K; the iteration termination condition; the maximum number of iterations MaxStep;
Output: the K resulting clusters;
	BEGIN
(1)	start = 0
	k = 0; //initialization
	load the dataset, select an initial cluster centre z₁ at random from it, and save z₁ to the initial cluster centre set;
(2)	calculate the distance between each sample and the initial point z₁, find the point with the largest distance from z₁ according to equation (4), take that sample point c_i as the second initial cluster centre, and save it to the initial point set;
(3)	repeat step 2 until the Kth initial cluster centre is found;
(4)	according to the computed distances, assign each sample to the class of the nearest of the K initial cluster centres;
(5)	update the centre of each cluster to the mean value of the sample points in the group, that is, their sum divided by the number of sample points in the group;
(6)	evaluate the measure function, which is defined in terms of the cluster centres, the distance between the jth data point and the lth cluster centre, and the semantic matrix;
(7)	if the number of iterations reaches MaxStep or the iteration termination condition is satisfied, the iteration is terminated;
	otherwise, O = O + 1 and return to steps 5 and 6;
(8)	end;
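
The steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: samples are assumed to be numeric tuples, plain Euclidean distance stands in for equation (4), and the measure function is assumed to be the usual sum of squared distances to the assigned centres, so the semantic similarity and relatedness matrix M is not used here. Step 3 is read as repeatedly picking the sample farthest from all centres chosen so far. All function and parameter names (`init_centres`, `assign`, `update_centres`, `measure`, `kmeans`, `eps`, `seed`) are hypothetical.

```python
import math
import random


def euclidean(a, b):
    """Assumed stand-in for the paper's equation (4)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def init_centres(samples, k, dist, rng):
    """Steps 1-3: random first centre, then repeatedly add the farthest sample."""
    centres = [samples[rng.randrange(len(samples))]]
    while len(centres) < k:
        # Farthest sample = the one whose nearest chosen centre is most distant.
        farthest = max(samples, key=lambda s: min(dist(s, c) for c in centres))
        centres.append(farthest)
    return centres


def assign(samples, centres, dist):
    """Step 4: index of the nearest centre for every sample."""
    return [min(range(len(centres)), key=lambda l: dist(s, centres[l]))
            for s in samples]


def update_centres(samples, labels, centres):
    """Step 5: each centre becomes the mean of the samples assigned to it."""
    new_centres = []
    for l, old in enumerate(centres):
        members = [s for s, lab in zip(samples, labels) if lab == l]
        if not members:  # keep the centre of an empty cluster unchanged
            new_centres.append(old)
            continue
        n = len(members)
        new_centres.append(tuple(sum(col) / n for col in zip(*members)))
    return new_centres


def measure(samples, labels, centres, dist):
    """Step 6 (assumed form): sum of squared distances to assigned centres."""
    return sum(dist(s, centres[lab]) ** 2 for s, lab in zip(samples, labels))


def kmeans(samples, k, eps=1e-6, max_step=100, dist=euclidean, seed=0):
    rng = random.Random(seed)
    centres = init_centres(samples, k, dist, rng)
    labels = assign(samples, centres, dist)
    previous = measure(samples, labels, centres, dist)
    for _ in range(max_step):  # step 7: stop after at most MaxStep iterations
        centres = update_centres(samples, labels, centres)
        labels = assign(samples, centres, dist)
        current = measure(samples, labels, centres, dist)
        if abs(previous - current) < eps:  # step 7: assumed termination test
            break
        previous = current
    return labels, centres
```

Calling `kmeans(samples, 2)` on a list of coordinate tuples returns one cluster index per sample together with the final centres. Passing a different `dist` callable is the natural place to plug in a distance built from the semantic matrix M, once its exact form from equation (4) is available.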