A Big Data-Driven Approach to Analyze the Influencing Factors of Enterprise’s Technological Innovation
Algorithm 1
Improved semantic similarity and relatedness-based K-means clustering algorithm.
Input: a preprocessed dataset containing N terms; the semantic similarity and relatedness matrix M; the number of clusters K; the iteration termination condition; the maximum number of iterations MaxStep;
Output: the K resulting clusters;
BEGIN
(1)
start = 0
k = 0; //initialization
load the dataset, select an initial cluster centre z1 at random from it, and save z1 to the initial cluster centre set;
(2)
calculate the distance between each sample and the initial point z1, find the sample point ci with the largest distance from z1 according to equation (4), take ci as the second initial cluster centre, and save it to the initial point set;
(3)
repeat step (2) until the Kth initial cluster centre is found;
(4)
according to the computed distances, assign each sample to the class of the nearest of the K initial cluster centres;
(5)
update the centre of each cluster to the mean value of the sample points in its group, that is, their sum divided by the number of sample points in the group;
(6)
compute the measure function from the cluster centres, the distance between the jth data point and the lth cluster centre, and the semantic matrix;
(7)
if the number of iterations reaches MaxStep or the iteration termination condition is satisfied, the iteration is terminated;
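
The steps of Algorithm 1 can be sketched in Python as follows. This is only an illustrative sketch under stated assumptions, not the authors' implementation. Samples are assumed to be numeric tuples, and a plain Euclidean distance stands in for the semantic similarity and relatedness measure built from M and for equation (4), neither of which is reproduced in this listing. "Largest distance" in steps (2)-(3) is assumed to mean the largest distance to the nearest centre already chosen. The measure function is assumed to be the sum of squared distances, and the termination condition is assumed to be a change in that measure smaller than a threshold. The names `euclidean`, `farthest_point_init`, `kmeans`, `eps`, and `max_step` are hypothetical.

```python
import math
import random


def euclidean(a, b):
    """Placeholder distance (assumption); the paper uses a semantic measure."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def farthest_point_init(points, k, dist=euclidean, rng=None):
    """Steps (1)-(3): random first centre, then repeatedly add the farthest sample."""
    rng = rng or random.Random()
    centres = [points[rng.randrange(len(points))]]
    while len(centres) < k:
        # Assumption: pick the sample whose nearest chosen centre is farthest away.
        nxt = max(points, key=lambda p: min(dist(p, c) for c in centres))
        centres.append(nxt)
    return centres


def kmeans(points, k, eps=1e-6, max_step=100, dist=euclidean, rng=None):
    """Steps (4)-(7): assign, update, evaluate the measure, test for termination."""
    centres = farthest_point_init(points, k, dist, rng)
    prev_cost = math.inf
    labels = []
    for _ in range(max_step):
        # Step (4): assign each sample to its nearest centre.
        labels = [min(range(k), key=lambda l: dist(p, centres[l])) for p in points]
        # Step (5): move each centre to the mean of its members.
        for l in range(k):
            members = [p for p, lab in zip(points, labels) if lab == l]
            if members:  # keep the old centre if a cluster is empty
                centres[l] = tuple(sum(col) / len(members) for col in zip(*members))
        # Step (6): measure function, assumed to be the sum of squared distances.
        cost = sum(dist(p, centres[lab]) ** 2 for p, lab in zip(points, labels))
        # Step (7): stop when the measure no longer changes appreciably.
        if abs(prev_cost - cost) < eps:
            break
        prev_cost = cost
    return labels, centres
```

For example, `kmeans([(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0)], 2)` separates the two groups of points whichever sample is drawn first, because the second centre is always taken from the opposite group. To replace the placeholder with a semantic measure, pass a different function as `dist`.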