Review Article

Applications of Natural Language Processing in Biodiversity Science

Figure 4

Shallow versus deep parsing. The shallow parsing result was produced by GENIA Tagger (http://text0.mib.man.ac.uk/software/geniatagger/); the deep parsing result was produced by Enju Parser for Biomedical Domain (http://www-tsujii.is.s.u-tokyo.ac.jp/enju/demo.html). GENIA Tagger and Enju Parser are products of the Tsujii Laboratory of the University of Tokyo and are optimized for the biomedical domain. Both parsing results contain errors. For example, “obovate” should be an ADJP (adjective phrase), but GENIA Tagger chunked it as a VP (verb phrase), and “blade” is a noun, but Enju Parser tagged it as a verb (VBD). This is not a criticism of the tools; rather, it shows that language differences between domains can have a significant impact on the performance of NLP tools. Parsers trained for a general domain produce erroneous results on morphological descriptions [16].
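The structural difference between the two kinds of output can be sketched in a few lines of Python. This is a minimal illustration only: the phrase "leaf blade obovate", the FRAG root label, and the helper names (depth, leaves, chunk_label_of) are assumptions made for the example. The structures are written by hand to show the intended analysis and are not actual output of GENIA Tagger or Enju Parser.

```python
# Hand-written, illustrative structures; NOT real GENIA Tagger or Enju output.
# The phrase "leaf blade obovate" is a hypothetical morphological description.

# Shallow parse: a flat sequence of non-overlapping chunks (label, tokens).
shallow = [
    ("NP", ["leaf", "blade"]),
    ("ADJP", ["obovate"]),
]

# Deep parse: a nested tree of (label, children).
# A leaf node is (POS tag, word), where the second element is a string.
deep = ("FRAG", [
    ("NP", [("NN", "leaf"), ("NN", "blade")]),
    ("ADJP", [("JJ", "obovate")]),
])


def depth(node):
    """Return the number of levels in a (label, children) tree."""
    _, children = node
    if isinstance(children, str):  # leaf: (POS, word)
        return 1
    return 1 + max(depth(child) for child in children)


def leaves(node):
    """Return the words of a tree in left-to-right order."""
    _, children = node
    if isinstance(children, str):
        return [children]
    return [word for child in children for word in leaves(child)]


def chunk_label_of(word, chunks):
    """Return the label of the first chunk containing word, or None."""
    for label, tokens in chunks:
        if word in tokens:
            return label
    return None


if __name__ == "__main__":
    print(chunk_label_of("obovate", shallow))
    print(depth(deep), leaves(deep))
```

The shallow result is a single flat layer of labeled chunks, whereas the deep result nests phrases inside phrases, so an error in one part-of-speech tag (such as reading "blade" as a verb) can change the shape of the whole tree rather than just one chunk label.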