Research Article

MT Evaluation in the Context of Language Complexity

Table 1

Dataset composition.

Feature type        Feature name                              GT_MT    GT_PEMT  MT@EC_MT  MT@EC_PEMT

Readability         Average sentence length (words)           8.45     8.81     7.82      8.75
                    Average word length (characters)          5.51     5.89     5.70      5.89
                    Share of short sentences (n < 10 words)   63.37%   56.44%   67.49%    58.35%
                    Share of long sentences (n ≥ 10 words)    36.63%   43.56%   32.51%    41.65%

Lexico-grammatical  Frequency of nouns                        32.81%   36.82%   31.25%    36.23%
                    Frequency of adjectives                   8.22%    9.00%    11.81%    9.05%
                    Frequency of adverbs                      3.16%    2.70%    3.11%     2.73%
                    Frequency of verbs                        16.57%   16.11%   15.00%    15.87%
                    Frequency of pronominals                  3.13%    3.03%    3.02%     3.27%
                    Frequency of participles                  1.63%    1.89%    1.62%     1.97%
                    Frequency of morphemes                    1.45%    1.32%    1.34%     1.36%
                    Frequency of abbreviations                3.01%    2.40%    3.94%     2.44%
                    Frequency of numbers                      3.87%    3.65%    4.32%     3.53%
                    Frequency of undefinable POS tags         0.29%    0.23%    1.06%     0.34%
                    Frequency of foreign words                6.98%    4.84%    6.02%     5.14%
                    Frequency of interjections                0.02%    0.02%    0.02%     0.02%
                    Frequency of numerals                     0.75%    0.49%    0.70%     0.42%
                    Frequency of prepositions & conjunctions  18.10%   17.49%   16.80%    17.63%
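The readability features in Table 1 are simple ratios over sentences and tokens, and the lexico-grammatical features are shares of POS tags. Below is a minimal Python sketch of how such figures could be computed. It assumes each subcorpus has already been split into sentences, tokenized with punctuation removed, and POS-tagged. The function names, the (token, tag) input format, and the toy tags are hypothetical and are not the authors' pipeline. The closing check uses the GT_MT lexico-grammatical column from Table 1. Its 14 shares sum to about 100%, which is consistent with each value being a proportion of all tokens.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical input format (assumption): a subcorpus is a list of sentences,
# each sentence a list of (token, POS tag) pairs with punctuation removed.
Sentence = List[Tuple[str, str]]


def readability_features(sentences: List[Sentence], threshold: int = 10) -> Dict[str, float]:
    """Readability features as listed in Table 1 (illustrative sketch only)."""
    lengths = [len(s) for s in sentences]                 # sentence length in words
    words = [tok for s in sentences for tok, _ in s]
    n_sent = len(sentences)
    n_short = sum(1 for n in lengths if n < threshold)    # short: n < 10
    return {
        "avg_sentence_length_words": sum(lengths) / n_sent,
        "avg_word_length_chars": sum(len(w) for w in words) / len(words),
        "short_sentences_pct": 100 * n_short / n_sent,
        "long_sentences_pct": 100 * (n_sent - n_short) / n_sent,  # long: n >= 10
    }


def pos_frequencies(sentences: List[Sentence]) -> Dict[str, float]:
    """Share of each POS tag among all tokens, in percent."""
    counts = Counter(tag for s in sentences for _, tag in s)
    total = sum(counts.values())
    return {tag: 100 * c / total for tag, c in counts.items()}


# The 14 lexico-grammatical shares reported for GT_MT in Table 1.
GT_MT_LEXICO = [32.81, 8.22, 3.16, 16.57, 3.13, 1.63, 1.45,
                3.01, 3.87, 0.29, 6.98, 0.02, 0.75, 18.10]

if __name__ == "__main__":
    # Toy two-sentence corpus with made-up tags, for illustration only.
    toy = [
        [("The", "DET"), ("cat", "NOUN"), ("sleeps", "VERB")],
        [("Dogs", "NOUN"), ("bark", "VERB")],
    ]
    print(readability_features(toy))
    print(pos_frequencies(toy))
    # If the shares are proportions of all tokens, they should sum to ~100.
    print(round(sum(GT_MT_LEXICO), 2))  # 99.99
```

Under this definition the short and long shares are complementary. This matches the Readability rows of Table 1, where each short/long pair sums to 100%.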