Review Article
A Comprehensive Survey of Abstractive Text Summarization Based on Deep Learning
Table 2
Statistics of the DUC/TAC datasets.
| Dataset | #Documents | Language | #Ground-truth summaries | Summary length |
| --- | --- | --- | --- | --- |
| DUC 2001 | 60 × 10 | Eng. | 3 per cluster | 50, 100, 200, 400 tokens |
| DUC 2002 | 60 × 10 | Eng. | 128 | 10, 50, 100, 200 tokens |
| DUC 2003 | 60 × 10, 30 × 25 | Eng. | 128 | 200, 400 tokens |
| DUC 2004 | 100 × 10 | Ara. & Eng. | 4 per cluster | 100 tokens |
| DUC 2005 | 50 × 32 | Eng. | 4 per cluster | 665 bytes |
| DUC 2006 | 50 × 25 | Eng. | 4 per cluster | 250 tokens |
| DUC 2007 | 25 × 10 | Eng. | 4 per cluster | 250 tokens |
| TAC 2008 | 48 × 20 | Eng. | 4 per cluster | 250 tokens |
| TAC 2009 | 44 × 20 | Eng. | 4 per cluster | 250 tokens |
| TAC 2010 | 46 × 20 | Eng. | 8 per cluster | 100 tokens |
| TAC 2011 | 44 × 20 | Eng. | 8 per cluster | 100 tokens |