| Input: Text, the text to be parsed; UNISeg, a Vietnamese EDU segmentation model; Patterns, a list of patterns that recognize discourse markers and map them to the symbols used in grammars G1 and G2; G1, a CFG for recognizing reason relations at the intra-sentence level; G2, a CFG for recognizing reason relations at the inter-sentence level. |
| Output: Spans, a list of text spans, each of which is an EDU or part of an EDU from Text; Rels, a list of reason relations of the form (i, j), where the span with index i is the reason for the span with index j. |
(1) | Sents ⟵ SentDetect(Text) |
(2) | LookupTable ⟵ {} |
(3) | TextSyms ⟵ [] |
(4) | for sent_id = 1 to |Sents|: |
(5) |     EDUs ⟵ EDUSegment(Sents[sent_id], UNISeg) |
(6) |     SentSyms ⟵ [] |
(7) |     for edu_id = 1 to |EDUs|: |
(8) |         symbols, lookup ⟵ ConvertToSymbol(EDUs[edu_id], Patterns) |
(9) |         LookupTable.update(lookup) |
(10) |         tree ⟵ Earley(symbols, G1) |
(11) |         SentSyms.extend(tree.childNodes()) |
(12) |     tree ⟵ Earley(SentSyms, G1) |
(13) |     TextSyms.extend(tree.childNodes()) |
(14) | tree ⟵ Earley(TextSyms, G2) |
(15) | subtrees ⟵ tree.childNodes() |
(16) | base_index ⟵ 0 |
(17) | Rels ⟵ [] |
(18) | for subt_id = 1 to |subtrees|: |
(19) |     rel ⟵ GetRelation(subtrees[subt_id], base_index) |
(20) |     Rels.append(rel) |
(21) |     base_index ⟵ base_index + |subtrees[subt_id].leaves()| |
(22) | Spans ⟵ LookupTable.values() |
(23) | return Spans, Rels |
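Step (8) depends on Patterns, whose contents the listing does not spell out. The following Python sketch shows one plausible reading of ConvertToSymbol: a discourse marker at the start of an EDU becomes a grammar symbol, and the remaining text becomes a span symbol whose text is stored in a lookup table keyed by span id. The marker list, the symbol names BECAUSE, SO and SPAN, the `next_id` parameter and the function name `convert_to_symbol` are hypothetical; the actual patterns and symbol inventory of G1 and G2 are not given here.

```python
import re
import unicodedata
from typing import Dict, List, Tuple

# Hypothetical marker patterns: (marker, grammar symbol). Longer markers come
# first so that "bởi vì" is tried before "vì" and "cho nên" before "nên".
PATTERNS: List[Tuple[str, str]] = [
    ("bởi vì", "BECAUSE"),
    ("vì", "BECAUSE"),
    ("cho nên", "SO"),
    ("nên", "SO"),
]


def nfc(s: str) -> str:
    """Normalize to NFC so precomposed and decomposed diacritics compare equal."""
    return unicodedata.normalize("NFC", s)


def convert_to_symbol(
    edu: str, patterns: List[Tuple[str, str]], next_id: int
) -> Tuple[List[str], Dict[int, str]]:
    """Map one EDU to a symbol sequence plus a lookup {span_id: span_text}.

    A leading discourse marker (if any) becomes its grammar symbol; the rest
    of the EDU becomes a single text span, emitted as the symbol SPAN.
    """
    text = nfc(edu).strip()
    symbols: List[str] = []
    for marker, symbol in patterns:
        # Match the marker only as a whole word at the start of the EDU.
        m = re.match(re.escape(nfc(marker)) + r"\b\s*", text, flags=re.IGNORECASE)
        if m:
            symbols.append(symbol)
            text = text[m.end():]
            break
    lookup: Dict[int, str] = {}
    if text:
        symbols.append("SPAN")
        lookup[next_id] = text
    return symbols, lookup
```

For example, `convert_to_symbol("Vì trời mưa", PATTERNS, 0)` ("because it rains") returns the symbols `["BECAUSE", "SPAN"]` and the lookup `{0: "trời mưa"}`. Merging these per-EDU lookups into one insertion-ordered dict keeps spans in text order, which is one way to read `LookupTable.values()` in step (22).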
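Steps (10), (12) and (14) run an Earley parser over symbol sequences. To illustrate that technique, here is a minimal Earley recognizer in Python together with a toy grammar loosely in the spirit of G1. The grammar, its symbols (S, REL, BECAUSE, SO, SPAN) and the function name `earley_recognize` are hypothetical. The sketch only decides whether a sequence is derivable; building the trees whose `childNodes()` the algorithm consumes would additionally require back-pointers in the chart. It also assumes the grammar has no ε-productions, which this simplified completer does not handle.

```python
from typing import Dict, List, Sequence, Set, Tuple

Grammar = Dict[str, List[Tuple[str, ...]]]
# An Earley item: (lhs, rhs, dot position, origin chart index).
Item = Tuple[str, Tuple[str, ...], int, int]

# Hypothetical toy grammar; keys are nonterminals, other symbols are terminals.
TOY_G1: Grammar = {
    "S": [("REL",), ("SPAN",)],
    "REL": [
        ("BECAUSE", "SPAN", "SO", "SPAN"),  # vì X nên Y
        ("SPAN", "BECAUSE", "SPAN"),        # Y vì X
        ("SPAN", "SO", "SPAN"),             # X nên Y
    ],
}


def earley_recognize(tokens: Sequence[str], grammar: Grammar, start: str) -> bool:
    """Return True if `tokens` can be derived from `start` (no ε-rules assumed)."""
    n = len(tokens)
    chart: List[Set[Item]] = [set() for _ in range(n + 1)]
    chart[0] = {(start, rhs, 0, 0) for rhs in grammar[start]}
    for i in range(n + 1):
        agenda = list(chart[i])
        while agenda:
            lhs, rhs, dot, origin = agenda.pop()
            if dot < len(rhs):
                sym = rhs[dot]
                if sym in grammar:
                    # Predict: expand the nonterminal expected at position i.
                    new_items = [(sym, prod, 0, i) for prod in grammar[sym]]
                else:
                    # Scan: advance over a matching terminal into chart[i + 1].
                    if i < n and tokens[i] == sym:
                        chart[i + 1].add((lhs, rhs, dot + 1, origin))
                    new_items = []
            else:
                # Complete: advance every item in chart[origin] waiting for lhs.
                new_items = [
                    (plhs, prhs, pdot + 1, porigin)
                    for plhs, prhs, pdot, porigin in chart[origin]
                    if pdot < len(prhs) and prhs[pdot] == lhs
                ]
            for item in new_items:
                if item not in chart[i]:
                    chart[i].add(item)
                    agenda.append(item)
    return any(
        lhs == start and dot == len(rhs) and origin == 0
        for lhs, rhs, dot, origin in chart[n]
    )
```

With this toy grammar, `["BECAUSE", "SPAN", "SO", "SPAN"]` and `["SPAN", "BECAUSE", "SPAN"]` are accepted, while `["SO", "SPAN"]` is rejected because no rule starts with SO. Earley parsing suits this setting because it handles arbitrary CFGs, including ambiguous ones, without converting them to a normal form first.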
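The final loop turns relations found inside each top-level subtree into text-wide span indices. `base_index` counts the leaves (text spans) of all earlier subtrees, so adding it to a position local to the current subtree yields a global index. The Python sketch below illustrates that bookkeeping. The `Node` class, the labels REL, REASON and RESULT, and the functions `get_relation` and `collect_relations` are hypothetical stand-ins for the tree returned by `Earley(TextSyms, G2)` and for GetRelation. Indices are 0-based rather than following the listing's 1-based loop counters; subtrees without a relation yield None and are skipped; and a multi-span reason or result is identified by its first span.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Node:
    """Hypothetical parse-tree node; a node without children is a text span."""
    label: str
    children: List["Node"] = field(default_factory=list)

    def leaves(self) -> List["Node"]:
        if not self.children:
            return [self]
        return [leaf for child in self.children for leaf in child.leaves()]


def get_relation(subtree: Node, base_index: int) -> Optional[Tuple[int, int]]:
    """Return (reason_index, result_index) as global span indices, or None.

    Assumes a relation subtree is labelled REL with children labelled REASON
    and RESULT; each side is identified by the index of its first leaf.
    """
    if subtree.label != "REL":
        return None
    reason = result = None
    offset = base_index
    for child in subtree.children:
        if child.label == "REASON":
            reason = offset
        elif child.label == "RESULT":
            result = offset
        offset += len(child.leaves())
    if reason is None or result is None:
        return None
    return (reason, result)


def collect_relations(subtrees: List[Node]) -> List[Tuple[int, int]]:
    rels: List[Tuple[int, int]] = []
    base_index = 0  # number of spans covered by the subtrees already visited
    for subt in subtrees:
        rel = get_relation(subt, base_index)
        if rel is not None:
            rels.append(rel)
        base_index += len(subt.leaves())
    return rels
```

For a text whose top-level subtrees are REL(REASON, RESULT), a lone span, and REL(RESULT, REASON), each side holding one span, `collect_relations` returns `[(0, 1), (4, 3)]`: the lone span shifts `base_index` to 3 before the third subtree is visited, and the reason of that subtree (index 4) follows its result (index 3).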