Research Article

Gene Sequence Clustering Based on the Profile Hidden Markov Model with Differential Identifiability

Algorithm 2

Gene sequence clustering algorithm based on DI-PHMM (DI-GSCA).
Input: Number of clusters, training sequence data, DI parameter, and number of iteration rounds (optional)
Output: Index of the cluster to which each sequence belongs
(1)
(2) for in
(3)     for in
(4)         // Calculate the score of the sequence for each PHMM
(5)
(6)         // Divide the sequence into the corresponding cluster according to the highest score
(7)
(8)     for in
(9)         if ():
(10)
(11)        else
(12)            // The privacy parameter is assigned according to whether the number of iteration rounds is fixed.
(13)        // Construct a new cluster center sub-model
(14)
(15)        // The degree of change of the model from the last iteration (divergence distance)
(16)