input: a set of annotations, in which each biomedical entity annotated by a user carries a correctness rate; the set also refers to the entities' RDF graphs;
a predefined frequency threshold;
the number of correctness-rate groups, defined by the user;
output: a set of frequent patterns.
(1) classify the annotations into different sets so that, in each set, all annotations are submitted by the same user;
(2) for each user's set of annotations
(2.1) ; |
(2.2) cluster the elements of the set into groups according to their correctness rates with k-means,
and take the center of each cluster as the correctness of that group;
(2.3) for each group //find frequent patterns for the given annotator with the given correctness
(2.3.1) collect the Pattern Paths that belong to an entity of the group and whose frequency (their count in the group divided by the count of all Pattern Paths in the group) reaches the frequency threshold // set of frequent pattern paths
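(2.3.2) for each item of the set, where an item can be a Pattern Path or a sub RDF graph //find frequent conjugate items
{ for each other item of the set
{ if the two items are conjugate and their joint frequency (the conjunct appearance of both items in the group) reaches the threshold, then
{ merge the two items into a sub RDF graph whose frequency is that joint frequency;
if a graph in the set includes an item, remove that item from the set; } } }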
(2.3.3) repeat Step (2.3.2) until the set no longer changes;
(2.3.4) ; matches an RDF path of )}
(2.4) ; ; |
(2.5) for any two patterns, if they are the same pattern g, then //merge the same pattern found with different correctness values
{ remove the duplicate from the set; the merged correctness is computed from the number of entities matching g in each of the two groups }
(2.6) ; |
(2.7) if ( and ) ; go to (2.2);} |
(3) repeatedly merge the frequent patterns in the result set with Rule 1 and Rule 2 presented in this section until the set no longer changes;
(4) return the set of frequent patterns;
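Steps (1) and (2.2) amount to grouping annotations by submitter and then running one-dimensional k-means over each user's correctness rates. The Python sketch below illustrates this under stated assumptions: the `Annotation` record, the function names, and the deterministic initialization (centers spread evenly over the sorted distinct rates) are hypothetical choices, not the authors' implementation.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    # Hypothetical record: who annotated which entity, with what correctness rate.
    user: str
    entity: str
    correctness: float


def group_by_user(annotations):
    """Step (1): one list of annotations per submitting user."""
    sets = defaultdict(list)
    for a in annotations:
        sets[a.user].append(a)
    return dict(sets)


def kmeans_1d(values, k, iters=100):
    """Plain 1-D k-means; returns (centers, labels)."""
    distinct = sorted(set(values))
    k = min(k, len(distinct))
    # Deterministic init: k centers spread evenly over the distinct sorted values.
    centers = [distinct[i * (len(distinct) - 1) // max(k - 1, 1)] for i in range(k)]
    labels = []
    for _ in range(iters):
        # Assign each value to its nearest center.
        labels = [min(range(k), key=lambda c: abs(v - centers[c])) for v in values]
        # Recompute each center as the mean of its members (keep it if empty).
        new_centers = []
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            new_centers.append(sum(members) / len(members) if members else centers[c])
        if new_centers == centers:
            break
        centers = new_centers
    return centers, labels


def correctness_groups(user_annotations, k):
    """Step (2.2): (center, members) pairs; the center is the group's correctness."""
    centers, labels = kmeans_1d([a.correctness for a in user_annotations], k)
    groups = []
    for c, center in enumerate(centers):
        members = [a for a, lab in zip(user_annotations, labels) if lab == c]
        if members:
            groups.append((center, members))
    return groups
```

For example, a user whose rates are 0.9, 0.85 and 0.1, clustered into two groups, ends up with one group centered at 0.1 and one centered at 0.875.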
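Step (2.3.1) is a frequency filter over the Pattern Paths of one group. A minimal sketch, assuming each entity's Pattern Paths are given as hashable values (the `"predicate->object"` strings below are a hypothetical encoding) and that "reaching the threshold" means a frequency greater than or equal to it:

```python
from collections import Counter


def frequent_pattern_paths(entity_paths, threshold):
    """Step (2.3.1) for one group.

    entity_paths: one list of Pattern Paths per entity in the group.
    Returns {path: frequency} for paths whose frequency reaches the threshold,
    where frequency = count of the path / count of all Pattern Paths in the group.
    """
    counts = Counter(p for paths in entity_paths for p in paths)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {p: n / total for p, n in counts.items() if n / total >= threshold}


group = [
    ["type->Gene", "locatedIn->Chr7"],  # hypothetical string encoding of paths
    ["type->Gene"],
    ["type->Gene", "partOf->P1"],
]
print(frequent_pattern_paths(group, 0.5))  # {'type->Gene': 0.6}
```

Here "type->Gene" accounts for 3 of the group's 5 Pattern Paths, so it is the only path that passes a threshold of 0.5.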
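Steps (2.3.2) and (2.3.3) grow frequent items into larger sub RDF graphs until a fixpoint is reached. The sketch below assumes, hypothetically, that an item is a `frozenset` of Pattern Paths, that merging two items is set union, that the joint frequency of a graph is the share of the group's entities whose paths contain all of its paths, and that the conjugacy test is supplied by the caller, since its exact definition is not reproduced in this step.

```python
def merge_conjugates(items, entity_paths, threshold, conjugate):
    """Steps (2.3.2)-(2.3.3) for one group.

    items: frozensets of Pattern Paths (one path, or a merged sub RDF graph).
    conjugate: caller-supplied predicate deciding whether two items are conjugate.
    Returns {graph: joint frequency} once the item set stops changing.
    """
    entity_sets = [frozenset(paths) for paths in entity_paths]
    if not entity_sets:
        return {}

    def joint_freq(g):
        # Assumed measure: share of entities whose paths contain every path of g.
        return sum(g <= e for e in entity_sets) / len(entity_sets)

    current = set(items)
    while True:
        merged = set(current)
        for x in current:
            for y in current:
                if x != y and conjugate(x, y) and joint_freq(x | y) >= threshold:
                    merged.add(x | y)  # merge x and y into one sub-graph
        # Remove any item that is included in a larger graph of the set.
        merged = {x for x in merged if not any(x < g for g in merged)}
        if merged == current:  # Step (2.3.3): stop once nothing changes
            return {g: joint_freq(g) for g in current}
        current = merged
```

Keeping only maximal graphs after each round mirrors the rule that an item included in a larger graph is removed. Because every newly added union is strictly larger than the items it covers, the loop terminates on any finite set of Pattern Paths.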
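Step (2.5) collapses a pattern found in several correctness groups into a single entry. The algorithm only states that the merge uses the number of entities matching g in each group; the entity-weighted average below is an assumed rule, shown for illustration, and the function name is hypothetical.

```python
def merge_duplicate_patterns(patterns):
    """Step (2.5): collapse the same pattern found in different correctness groups.

    patterns: (pattern, group correctness, number of entities matching it) triples.
    Returns {pattern: (merged correctness, total matching entities)}.
    """
    merged = {}
    for g, c, n in patterns:
        if g not in merged:
            merged[g] = (c, n)
            continue
        c0, n0 = merged[g]
        total = n0 + n
        # Assumed rule: average the correctness values, weighted by matching entities.
        merged[g] = ((c0 * n0 + c * n) / total if total else c0, total)
    return merged
```

Under this rule, a pattern matched by 3 entities in a group with correctness 0.9 and by 1 entity in a group with correctness 0.5 is kept once, with correctness 0.8 over 4 entities.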