Security and Communication Networks

Research Article

Gene Sequence Clustering Based on the Profile Hidden Markov Model with Differential Identifiability

Gene sequence clustering algorithm based on DI-PHMM (DI-GSCA).

	Input: number of clusters, training sequence data, DI parameter, and number of iteration rounds (optional)
	Output: index of the cluster to which each sequence belongs
(1)
(2)	for each iteration round
(3)	for each training sequence
(4)	//Calculate the score of the sequence for each PHMM
(5)
(6)	//Divide the sequence into the corresponding cluster according to the highest score
(7)
(8)	for each cluster
(9)	if ():
(10)
(11)	else
(12)	//The privacy parameter is assigned according to whether the number of iteration rounds is fixed.
(13)	//Construct a new cluster center sub-model
(14)
(15)	//The degree of change of the model from the last iteration (divergence distance)
(16)
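
The loop structure of the listing above (score each sequence against every cluster model, assign it to the best-scoring cluster, rebuild the cluster models, then measure how much the models changed) can be illustrated with a much simplified sketch. This is not the paper's DI-PHMM: it assumes equal-length sequences over the alphabet ACGT, replaces each profile HMM with a plain position-specific frequency profile (no insert or delete states), uses a mean per-column KL divergence as the change measure, and perturbs base counts with Laplace noise controlled by a hypothetical `noise_scale` parameter that only stands in for the differential-identifiability mechanism. All function and parameter names are illustrative.

```python
import math
import random

ALPHABET = "ACGT"

def build_profile(seqs, length, noise_scale=0.0, rng=None):
    """Column-wise log-probabilities from (optionally noised) base counts.

    Assumption: a frequency profile stands in for a full profile HMM.
    """
    rng = rng or random.Random(0)
    profile = []
    for pos in range(length):
        counts = {b: 1.0 for b in ALPHABET}  # pseudocount of 1 per base
        for s in seqs:
            counts[s[pos]] += 1.0
        if noise_scale > 0:
            for b in ALPHABET:
                # Laplace(0, noise_scale) as the difference of two exponentials.
                # Assumption: placeholder for the paper's DI noise calibration.
                noise = (rng.expovariate(1 / noise_scale)
                         - rng.expovariate(1 / noise_scale))
                counts[b] = max(counts[b] + noise, 1e-6)
        total = sum(counts.values())
        profile.append({b: math.log(counts[b] / total) for b in ALPHABET})
    return profile

def score(seq, profile):
    """Log-likelihood of a sequence under a profile."""
    return sum(col[base] for base, col in zip(seq, profile))

def profile_change(old, new):
    """Mean per-column KL divergence KL(new || old)."""
    total = 0.0
    for o, n in zip(old, new):
        total += sum(math.exp(n[b]) * (n[b] - o[b]) for b in ALPHABET)
    return total / len(old)

def cluster(seqs, k, max_rounds=10, tol=1e-4, noise_scale=0.0, seed=0):
    """Assign each sequence to one of k clusters; returns a list of indices."""
    rng = random.Random(seed)
    length = len(seqs[0])
    # Initialise every cluster model from one randomly chosen sequence.
    profiles = [build_profile([s], length) for s in rng.sample(seqs, k)]
    labels = [0] * len(seqs)
    for _ in range(max_rounds):
        # Assign each sequence to the cluster with the highest score.
        labels = [max(range(k), key=lambda c: score(s, profiles[c]))
                  for s in seqs]
        # Rebuild each cluster model from its current members.
        new_profiles = []
        for c in range(k):
            members = [s for s, lab in zip(seqs, labels) if lab == c]
            new_profiles.append(
                build_profile(members, length, noise_scale, rng))
        # Stop once the models barely change between rounds.
        change = max(profile_change(o, n)
                     for o, n in zip(profiles, new_profiles))
        profiles = new_profiles
        if change < tol:
            break
    return labels
```

Calling `cluster(["AAAAAA", "AAAAAT", "AAAATA", "GGGGGG", "GGGGGC", "GGGGCG"], 2)` with the default `noise_scale=0.0` should separate the A-rich sequences from the G-rich ones. With `noise_scale > 0` the assignments become randomized, which is the utility cost a privacy mechanism introduces.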