Research Article

A Novel Approach to Word Sense Disambiguation Based on Topical and Semantic Association

Algorithm 1

Leveraging Topic Discriminative Term for Topic Identification.
Input:
 (i) The candidate topic discriminative terms of a given document.
Output:
 (ii) The set of topical clusters with topical describing information.
Procedure:
 (1) Computing the re-occurrences topic span for each candidate topic discriminative term (TDT).
 (2) Ordering the candidate TDTs by the length of their re-occurrences topic spans.
 (3) repeat
 (4)  Proceeding iteratively, starting with the TDT that has the longest re-occurrences topic span and
 ending with the TDT that has the shortest.
 (5) Judging the overlap relationship between the topic span intervals of any two TDTs.
 (6) if (the topic span intervals of the two TDTs intersect)
 (7)  The two vertices corresponding to these topic span intervals are directly connected.
 (8) else
 (9)  The two vertices belong to different topic span intervals, and hence to
 different subgraphs.
 (10) until all independent subgraphs are constructed;
 (11) Labeling each subgraph.
 (12) All independent subgraphs are connected through the vertices that are directly proximal
 in the corresponding TDTs' topic spans, so that a whole topical graph is constructed.
 (13) repeat
 (14) if (there exists a TDT that is monosemous)
 (15)   Select the one with the higher weight as the initial vertex.
 (16)  Iteratively calculate the similarity of topic chains between this vertex and other ones with its
 neighbors by formula (3).
 (17) else
 (18)  Iteratively calculate the similarity of topic chains, starting from the two vertices
 of maximal weight, by formula (4).
 (19) until every candidate TDT has only one unique topic chain;
 (20)  Updating the weight of edges by formula (5) and pruning completely irrelevant edges in
 the topical graph.
 (21) Readjusting the topical conflict interval between the two vertices of the corresponding TDTs that
 were connected by a pruned edge.
 (22) S <- the vertices with the maximum degree in the topical graph.
 (23) Integrating the other vertices into topical clusters according to the adjacency and
 similarity relationship.
 (24) Ignoring isolated vertices and topical clusters that are too small.
 (25)   Generating topical describing information to implement topic identification.
 (26) Return the set of topical clusters with topical describing information.
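
The graph-construction part of Algorithm 1 (steps 1 to 10) can be illustrated with a minimal Python sketch. It rests on assumptions that are not stated in the algorithm itself: the document is given as a list of tokens, the topic span of a term is taken to be the closed interval from its first to its last occurrence, and two spans "intersect" when they share at least one token position. The function names topic_spans, overlaps and build_subgraphs are hypothetical. The similarity and weight updates of formulas (3) to (5) are not reproduced here, so this sketch stops at the independent subgraphs.

```python
def topic_spans(tokens, candidates):
    """Step 1 (assumed definition): map each candidate term to the
    (first, last) token positions at which it occurs."""
    spans = {}
    for pos, tok in enumerate(tokens):
        if tok in candidates:
            first, _ = spans.get(tok, (pos, pos))
            spans[tok] = (first, pos)
    return spans


def overlaps(a, b):
    """True if the closed intervals a and b share at least one position."""
    return a[0] <= b[1] and b[0] <= a[1]


def build_subgraphs(spans):
    """Steps 2-10: connect terms whose spans intersect and return the
    adjacency map together with the independent subgraphs."""
    # Step 2: order terms by span length, longest first.
    order = sorted(spans, key=lambda t: spans[t][1] - spans[t][0], reverse=True)

    # Steps 5-9: an edge joins two vertices only if their spans intersect.
    adj = {t: set() for t in order}
    for i, a in enumerate(order):
        for b in order[i + 1:]:
            if overlaps(spans[a], spans[b]):
                adj[a].add(b)
                adj[b].add(a)

    # Step 10: collect connected components with a depth-first traversal.
    seen, components = set(), []
    for start in order:
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], set()
        while stack:
            v = stack.pop()
            component.add(v)
            for w in adj[v]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        components.append(component)
    return adj, components
```

Calling build_subgraphs(topic_spans(tokens, candidates)) on a tokenized document yields one vertex set per independent subgraph. Under the assumptions above, terms whose occurrences never interleave end up in different subgraphs, which is the property steps 11 and 12 then build on.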