Abstract

Learners with reading difficulties often face significant challenges in understanding text-based learning materials. In this regard, there is a need for an assistive summary to help such learners approach learning documents with minimal difficulty. An important issue in extractive summarization is extracting a cohesive summary from the text. Existing summarization approaches focus mostly on informative sentences rather than cohesive sentences. We considered several existing features, including sentence location, centrality, title similarity, and keywords, to extract important sentences. Moreover, learner-dependent readability-related features such as average sentence length, percentage of trigger words, percentage of polysyllabic words, and percentage of noun entity occurrences are considered for summarization. The objective of this work is to extract the optimal combination of sentences that increases readability through sentence cohesion using a genetic algorithm. The results show that summary extraction using our proposed approach performs better in F-measure, readability, and cohesion than the baseline approach (lead) and the corpus-based approach. The task-based evaluation shows the effect of summary-assistive reading in enhancing readability for learners with reading difficulties.

1. Introduction

Students with learning difficulties are diverse in nature, and their difficulties include reading difficulties, math difficulties, writing difficulties, and other problems related to the learning process. About 8% to 10% of students have learning difficulties [1]. As reading is an essential component of life, our focus is mainly on learners with reading difficulties. Such learners face challenges in comprehending and organizing information from text. Learners find greater difficulty when reading at an early stage because of the amount of information to be read [2]. Summarizing can either be taught [3] or understood as one of several strategies [4], and it has been shown to improve comprehension and recall of whatever is read [5].

The impact of summarization on comprehending text is significant for learners with reading difficulties [2]. As a strategy performed either during or after reading, summarizing helps readers focus on the main ideas and other key concepts that have been taught and disregard the less relevant ones [6]. Reading comprehension is defined [7] as “the process of simultaneously extracting and constructing meaning.” This definition emphasizes both decoding and integrating new meanings with old information. The act of comprehending entails three essential elements, namely, (1) the reader, (2) the text, and (3) the activity [8]. For learners with reading difficulty, the activity element (summarization) can be used to improve reading comprehension. When remedial instruction does not produce a significant effect for a learner, the alternative is to support the learner through an assistive tool. As such a learner may have a problem in encoding the text, due to slower encoding speed, units can be lost from working memory before they are processed [9]. Because such learners tend to map the content of the text to an image, trigger words can often create confusion for these individuals [10]. These factors have to be considered while extracting summary sentences. Our objective is to extract a summary of a given text that balances F-measure, readability, and cohesion using a genetic algorithm. The learner benefits from the summary of the given text, which reduces the difficulty of reading and understanding it.

Automatic document summarization takes a source document (or source documents) as input, extracts the essence of the source(s), and presents a well-formed summary to the user [11]. Document summaries can be abstracts or extracts. An extract summary consists of sentences extracted from a document, while an abstract summary may contain words and phrases that do not exist in the original document [11]. The sentences are scored using extracted features based on importance and readability. Sentences with higher informative scores are extracted and ordered chronologically. The major issue in extractive summarization is the lack of cohesion. Cohesion is the extent to which the ideas in the text are expressed clearly and relate to one another in a systematic fashion, avoiding a confusing jumble of information. Several studies have clearly established a link between cohesion and comprehension [12]. Higher cohesion has been found to facilitate comprehension and recall. The optimal level of cohesion depends on the knowledge level of the reader [13], and it has been found that low-knowledge readers benefit from high-cohesion texts, as most researchers would expect. The objective of this work is to improve cohesion in the extracted sentences to form a summary of a specified length by using a genetic algorithm.

The rest of the paper is organized as follows. In Section 2, we discuss related work on text summarization using genetic algorithms and on cohesive summary extraction. Section 3 describes the proposed methodology. In Section 4, the results based on intrinsic evaluation are discussed, and the final section deals with the discussion of the extrinsic evaluation.

2. Related Work

Text summarization based on machine learning has seen tremendous improvement across a wide range of applications. Machine learning algorithms can be applied in almost all phases of text summarization, such as feature selection, finding the optimal weight for each feature, and sentence extraction. Genetic algorithms (GAs) are a relatively new paradigm for search, based on principles of natural selection. GAs, introduced by Holland, have proven to be a powerful optimization technique in large solution spaces [14, 15]. They are used where an exhaustive search for a solution is expensive in terms of computation time. The influence of GAs on each phase of text summarization can be seen widely across research areas.

In general, Vafaie and De Jong [16] suggested that GAs are a useful tool for solving difficult feature selection problems where both the size of the feature set and the performance of the underlying system are important design considerations. In the case of text summarization, Silla et al. [17] investigated the effectiveness of genetic algorithm-based attribute selection in improving the performance of classification algorithms on the automatic text summarization task. Suanmali et al. [18] proposed a feature fusion technique to discover which of the available features are most useful for extracting summary sentences.

To determine the feature weights, Litvak et al. [19] designed a new approach to multilingual single-document extractive summarization using a genetic algorithm that finds an optimal weighted linear combination of 31 statistical, language-independent sentence scoring methods. Yeh et al. [20] proposed a novel trainable summarizer employing several kinds of document features, with two major ideas to improve conventional corpus-based text summarization: first, sentence positions are ranked to emphasize the significance of different positions; second, the score function is trained by the genetic algorithm to obtain a suitable combination of feature weights. Fattah and Ren [21] investigated the effect of each sentence feature on the summarization task and used all features in combination to train genetic algorithm and mathematical regression (MR) models to obtain a suitable combination of feature weights.

To extract the summary sentences, Fatma et al. [22] and Liu et al. [23] used GAs to find sets of sentences that maximize summary quality metrics, starting from a random selection of sentences as the initial population. To choose the best summary, multiple candidates were generated and evaluated for each document. Qazvinian et al. [24] used a GA to produce document summaries, proposing a fitness function based on three factors: a readability factor (RF), a cohesion factor (CF), and a topic-relation factor (TRF). Lack of cohesion is the major drawback of extractive summarization, and it can be improved by various techniques.

Various attempts have been made to improve cohesion in extractive summarization. Smith et al. [25] investigated four different ways of creating 100-word summaries, and the results showed that a summary produced by a traditional vector space-based summarizer is no less cohesive than a summary created by taking the most important sentences from the summarizer. The simplest form of cohesion is lexical cohesion. Morris and Hirst [26] introduced the concept of lexical chains. Researchers have explored various types of cohesion measures, namely, clue words based on stems, semantic distance based on WordNet, and cosine similarity based on TF-IDF word vectors. Mani and Bloedorn [27] introduced methods based on text cohesion that model text in terms of relations between words or referring expressions, to help determine how tightly connected the text is. Silber and McCoy [28] discussed how lexical chains represent the lexical cohesion among an arbitrary number of related words; using lexical chains, the statistically most important concepts can be found by looking at the structure of the document rather than its deep semantic meaning. Barzilay and Elhadad [29] presented a less greedy algorithm that constructs all possible interpretations of the source text using lexical chains; their algorithm selects the interpretation with the strongest cohesion, and the strong chains are then used to generate a summary of the original.

In this direction, we exploit the strength of the genetic algorithm to improve readability through an optimal combination of sentences based on lexical cohesion, where the number of sentences depends on the compression ratio.

3. Proposed Work

For a given text, the input is preprocessed, and features such as average sentence length (f1), title-to-sentence similarity (f2), centrality (f3), percentage of polysyllabic words (f4), percentage of noun occurrences (f5), term frequency (f6), sentence position (f7), percentage of trigger words (f8), and positive keywords (f9) are extracted from the text. The significance of learner-dependent features in summary extraction is discussed in [30]. The sentence score is calculated using the weight and feature vectors. The sentence combination for summary generation is identified using GA, based on sentence similarity and sentence score. The working model of the process is shown in Figure 1. When the input text is large, the number of candidate summary sentences is also large, and the number of feasible combinations forming an extractive summary grows exponentially. The search for optimal solutions under multiple criteria has thus proven to be computationally demanding and time consuming. Evolutionary computation has proven to be an effective way to solve such problems. The genetic algorithm, an evolutionary algorithm, is able to avoid local optima and increases the probability of finding the global best. The proposed model formulates the sentence extraction problem as one that considers both informative score and cohesion using the genetic algorithm. An effective integer coding strategy, which reduces encoding and decoding time, is applied in this process.

3.1. Document Representation

Documents to be summarized can be represented as a graph $G = (V, E)$. Sentences in the document are represented as nodes ($V$), and the similarity between sentences is represented as edges ($E$). The graph representation of a sample document is shown in Figure 2. Graph-based extractive summarization algorithms aim at identifying the most important sentences from a text, based on information drawn exclusively from the text itself [31]. Graph-based methods are unsupervised and rely on the given texts to derive an extractive summary. The proposed algorithm consists of the following steps.
(1) The informative score of the sentences is estimated using feature vectors.
(2) The similarity between the sentences is estimated using cosine similarity.
(3) The optimal combination of sentences for subgraph extraction, which balances both informative score and readability, is found using the genetic algorithm.

3.2. Informative Score

The document $D$ consists of a set of sentences $D = \{s_1, s_2, \ldots, s_n\}$. The informative score can be calculated using feature vectors, where the feature vectors of the document are represented using the vector space model, $s_i = (f_{i1}, f_{i2}, \ldots, f_{i9})$. The vector representation of a given document is shown in Table 1.

The informative score can be calculated as a linear combination of all features. Since the impact of features varies from domain to domain, the feature weights can be estimated using a linear regression model [32]. Mathematical regression is a good model for estimating the feature weights. In this model, a mathematical function relates output to input as in (2):

$$Y = X \cdot W$$

Here, $X$ is the input matrix (feature parameters), $Y$ is the output vector, and $W$ is the linear statistical model of the system (the weight vector $[w_1, \ldots, w_9]^{T}$) in (2),

where $f_{ij}$, the entries of $X$, denote the value of the $j$th feature in the $i$th sentence and $w_j$ is the weight of feature $j$. The sentence score can be calculated as a linear combination of the feature vector and its corresponding weights:

$$Score(s_i) = \sum_{j=1}^{9} w_j \, f_{ij}$$
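As a concrete illustration, the scoring step can be sketched in Python as follows; the function names are ours, and ordinary least squares is used as one straightforward way to realize the regression model of [32].

import numpy as np

# Sketch of the informative score computation. Assumes the nine feature
# values per sentence (f1..f9) are already extracted and normalized.

def informative_scores(features: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """features: (n_sentences, 9) matrix; weights: (9,) vector.
    Returns the linear-combination score of each sentence."""
    return features @ weights

def fit_weights(features: np.ndarray, targets: np.ndarray) -> np.ndarray:
    """Ordinary least-squares estimate of the feature weights from
    annotated sentence scores (the regression step)."""
    weights, *_ = np.linalg.lstsq(features, targets, rcond=None)
    return weights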

3.3. Similarity Score

Similarity between the sentences can be calculated using cosine similarity. The terms ($t$) in the sentences are weighted using Tf-Isf as in (6). The term frequency in the given document is the number of times a given term appears in the document:

$$tf(t) = \frac{n_t}{\sum_k n_k}$$

where $n_t$ is the number of occurrences of the term and $\sum_k n_k$ is the sum of occurrences of all the terms in the document. The inverse sentence frequency is a measure of the importance of the term:

$$isf(t) = \log \frac{N}{n_s}$$

where $N$ is the number of sentences in the document and $n_s$ is the number of sentences containing the significant term. The corresponding weight is, therefore, computed as

$$w_{t,s} = tf_{t,s} \times isf_t$$

where $tf_{t,s}$ is the term frequency of the $t$th index term in the $s$th sentence and $isf_t$ is its inverse sentence frequency. The similarity of two sentences can be measured using cosine similarity as in (7):

$$sim(s_a, s_b) = \frac{\sum_{t} w_{t,a} \, w_{t,b}}{\sqrt{\sum_{t} w_{t,a}^2} \, \sqrt{\sum_{t} w_{t,b}^2}}$$

The similarity matrix can be composed from these similarity values. The lower triangular values are set to zero to avoid backward traversal in the graph and to preserve sentence ordering during extraction, and the diagonal elements are set to zero as there are no self-loops.
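A minimal sketch of the Tf-Isf weighting and the upper-triangular similarity matrix, assuming sentences arrive pre-tokenized (tokenization and stemming are part of preprocessing); all helper names are illustrative.

import math
from collections import Counter

def tf_isf_vectors(sentences):
    """sentences: list of token lists. Returns one sparse Tf-Isf vector
    (a dict term -> weight) per sentence."""
    n = len(sentences)
    sent_freq = Counter(t for s in sentences for t in set(s))
    vectors = []
    for s in sentences:
        counts = Counter(s)
        total = sum(counts.values())
        vectors.append({t: (c / total) * math.log(n / sent_freq[t])
                        for t, c in counts.items()})
    return vectors

def cosine(u, v):
    num = sum(u[t] * v[t] for t in u.keys() & v.keys())
    den = math.sqrt(sum(x * x for x in u.values())) * \
          math.sqrt(sum(x * x for x in v.values()))
    return num / den if den else 0.0

def similarity_matrix(sentences):
    """Pairwise cosine similarities; the lower triangle and the diagonal
    are zeroed to forbid backward traversal and self-loops."""
    vecs = tf_isf_vectors(sentences)
    n = len(vecs)
    return [[cosine(vecs[i], vecs[j]) if j > i else 0.0
             for j in range(n)] for i in range(n)]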

3.4. Subgraph Extraction

The subgraph (summary) extraction technique can be formulated as a combinatorial problem. The idea is to extract sentences with high informative scores that also have good cohesion, where the informative score is calculated from the feature scores and their weights, and cohesion is measured using the sentence similarity.

3.5. Problem Formulation

The objective of this work is to maximize the average similarity between the sentences having higher informative scores. The objective function is as follows:

Maximize

$$F = \frac{1}{m-1} \sum_{i=1}^{m-1} Score(s_{c_i}) \cdot sim(s_{c_i}, s_{c_{i+1}})$$

subject to the constraints

$$c_1 < c_2 < \cdots < c_m, \qquad sim(s_{c_i}, s_{c_{i+1}}) < \theta$$

Here, the first constraint ensures only forward traversal in the graph, and the second ensures that the summary contains cohesive, nonredundant candidate sentences. The basic steps of the genetic algorithm are given in Algorithm 1.

(1) procedure SUMMARY EXTRACTION(.txt file)
(2)   Create initial population    ▷ choose K different combinations of the sentences
(3)   Evaluate initial population    ▷ calculate the fitness based on (13)
(4)   Select the best chromosomes    ▷ select the chromosomes with the current best fitness
(5)   Generate child chromosomes    ▷ apply modified crossover and mutation to produce child chromosomes
(6)   Replace    ▷ weaker chromosomes are replaced by potential chromosomes
(7)   Stopping criteria    ▷ repeat steps (3)–(6) until the specified maximum generation is attained
(8)   Output    ▷ extract the summary sentences from the best chromosome as the final solution
(9)   return Candidate Sentences    ▷ the summary sentences
(10) end procedure
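A minimal Python rendering of Algorithm 1; the population size and generation count are illustrative defaults, and fitness, crossover, mutate, and make_chromosome stand for the operators sketched in the following subsections.

import random

def extract_summary(fitness, crossover, mutate, make_chromosome,
                    pop_size=50, generations=200):
    """GA loop of Algorithm 1: evaluate, select, recombine, replace."""
    population = [make_chromosome() for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)      # evaluate fitness
        parents = population[:pop_size // 2]            # keep the fittest half
        children = []
        while len(children) < pop_size - len(parents):  # breed replacements
            a, b = random.sample(parents, 2)
            for child in crossover(a, b):
                children.append(mutate(child))
        population = parents + children[:pop_size - len(parents)]
    return max(population, key=fitness)                 # best chromosome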

3.5.1. Chromosome Encoding

A chromosome of the proposed algorithm consists of a sequence of positive integers that represent the sentence numbers of the original text selected for the summary. To represent solutions of the sentence extraction problem, an effective population coding strategy must be designed. Usually, binary coding strategies are adopted, in which the size of the chromosome equals the number of sentences in the input text, and whether a particular sentence is selected is indicated by “0” or “1.” Though simple to implement, this may become time consuming when the text grows large. To overcome this problem, integer coding is suggested. Here, the size of the chromosome equals the size of the summary rather than that of the input text. Each locus of the chromosome represents the order of sentences in the summary. The size of the chromosome is fixed and is decided by the compression rate (CR) of the text. With respect to the educational domain, the text at the beginning and at the end is considered more significant, so the genes of the first and last loci are allocated to the first and last sentences of the text, respectively.

The genes of the chromosomes are sorted in ascending order after random generation in order to preserve the chronological order of the summary, which helps in better comprehension.
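A sketch of this integer encoding; the compression rate is assumed to be given as a fraction (e.g., 0.2 for a 20% summary).

import random

def make_chromosome(n_sentences: int, compression_rate: float) -> list:
    """Integer-coded chromosome: sorted sentence indices of summary size,
    with the first and last sentences of the text fixed at the end loci."""
    size = max(3, round(n_sentences * compression_rate))
    middle = random.sample(range(1, n_sentences - 1), size - 2)
    return [0] + sorted(middle) + [n_sentences - 1]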

3.5.2. Initial Population

The initial population is generated using the encoding technique explained in the previous section:

$$P = \{C_1, C_2, \ldots, C_K\}$$

where $K$ is the population size, $m$ is the size of the chromosome, and $G$ is the number of generations. At random generation $g$, any individual chromosome can be represented as

$$C_k^{(g)} = (c_1, c_2, \ldots, c_m), \qquad c_1 < c_2 < \cdots < c_m.$$

3.5.3. Fitness Function

After initialization of the population, the quality of each chromosome is evaluated using the fitness function. The fitness function aids in finding the optimal combination of sentences and tends to balance the informative score and the sentence similarity. The fitness function is as follows:

$$F_k = \frac{1}{m-1} \sum_{i=1}^{m-1} Score(s_{c_i}) \cdot sim(s_{c_i}, s_{c_{i+1}})$$

where $F_k$ represents the fitness value of the $k$th chromosome and $m$ represents the size of the chromosome.
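The sketch below encodes the stated balance between informative score and consecutive-sentence similarity; it is one plausible instantiation of the fitness, not necessarily the paper's exact formula.

def fitness(chromosome, scores, sim):
    """One plausible reading of (13): the average informative score of the
    selected sentences combined with the average similarity of consecutive
    ones. scores[i] is the informative score of sentence i; sim is the
    upper-triangular similarity matrix."""
    m = len(chromosome)
    info = sum(scores[i] for i in chromosome) / m
    cohesion = sum(sim[a][b] for a, b in zip(chromosome, chromosome[1:])) / (m - 1)
    return info * cohesion

For the GA loop sketched earlier, the scores and sim arguments can be bound with functools.partial so that fitness takes a chromosome alone.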

3.5.4. Selection

“Survival of the fittest” is achieved through the selection operator. The average quality of the population is improved by copying good-quality chromosomes into the next generation. To maintain the overall optimal combination, offspring with better fitness always survive across generations.

3.5.5. Modified Crossover

The objective of exploring the search space is attained using the crossover operator, which examines the current solutions in order to find better ones. Crossover in sentence extraction plays the role of exchanging partial summaries of two selected chromosomes in such a way that each offspring produced by the crossover represents a single summary. When a pair of individuals has been selected for crossover, single-cut crossover is the best candidate, because a larger number of intermediate crossover points might introduce redundant information. In single-cut crossover, summaries generated by two different parents are combined at a particular point. In general, after crossover, there is a possibility of redundant sentences occurring in a summary, as the problem is basically a combinatorial one. To avoid duplicate genes in a chromosome, a modified crossover is adopted, shown in Algorithm 2. Here, the crossover between two parent chromosomes is possible only if there is a common gene between them at the same position. The crossover probability depends on the degree of randomness in the genes of the parent chromosomes.

 procedure CROSSOVER(A, B)    ▷ the crossover of chromosomes A and B
  Input: A, B    ▷ chromosomes before crossover
  Output: a, b    ▷ chromosomes after crossover
  while i ≤ m do    ▷ for the entire size of the chromosome
   if A[i] = B[i] then    ▷ check the bit positions of the chromosomes
    list ← list ∪ {i}    ▷ store the matching positions in a list
   end if
   i ← i + 1
  end while
  p ← random(list)    ▷ select a random bit position from the list
  for i = 1 to p do
   a[i] ← A[i]    ▷ copy partial values of parent (A) to child (a)
   b[i] ← B[i]    ▷ copy partial values of parent (B) to child (b)
  end for
  for i = p + 1 to m do
   a[i] ← B[i]    ▷ copy remaining values of parent (B) to child (a)
   b[i] ← A[i]    ▷ copy remaining values of parent (A) to child (b)
  end for
  return a, b    ▷ chromosomes after crossover
 end procedure
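In Python, the modified crossover amounts to drawing the cut point only from positions where both parents carry the same gene, which keeps the children sorted and duplicate-free; a sketch:

import random

def crossover(parent_a, parent_b):
    """Single-cut crossover restricted to loci where both parents share
    a gene; without such a locus, the parents pass through unchanged."""
    common = [i for i, (x, y) in enumerate(zip(parent_a, parent_b)) if x == y]
    if not common:
        return parent_a[:], parent_b[:]
    cut = random.choice(common)
    child_a = parent_a[:cut] + parent_b[cut:]
    child_b = parent_b[:cut] + parent_a[cut:]
    return child_a, child_b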

3.5.6. Modified Mutation

Mutation is a background operator that produces random changes in various chromosomes and maintains diversity in the population. The population undergoes mutation by an actual change or flipping of one of the genes of the candidate chromosomes, which in turn keeps the search away from local optima [33]. A random gene in the chromosome is selected and replaced with a value that does not duplicate another sentence. An arbitrary bit position is chosen and replaced with a value near the values of its neighbors; if such a neighbor does not exist, the other end neighbor is considered for value selection, as given in Algorithm 3. This reduces the probability of big changes in summary fitness, because sentences that are next to each other in the chronological order of the text are likely to address the same idea [24].

 procedure MUTATION(R)    ▷ mutation in chromosome R
  Input: R    ▷ chromosome before mutation
  Output: R′    ▷ chromosome after mutation
  p ← random(2, m − 1)    ▷ select the mutation bit position
  X ← random(R[p − 1], R[p + 1])    ▷ generate a random number such that R[p − 1] < X < R[p + 1]
  if X ≥ R[p − 1] + 1 then    ▷ greater than the lower neighbor
   if X ≤ R[p + 1] − 1 then    ▷ less than the upper neighbor
    if X ≠ R[p] then    ▷ not equal to the same value
     R[p] ← X    ▷ replace the value of the bit by X
    end if
   end if
  end if
  return R    ▷ chromosome after mutation
 end procedure
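A Python sketch of the modified mutation; the mutation rate is an assumed parameter.

import random

def mutate(chromosome, rate=0.1):
    """Replace a random middle gene by a value strictly between its
    neighbors, so the chromosome stays sorted and duplicate-free and
    the change in summary fitness stays small."""
    child = chromosome[:]
    if random.random() < rate and len(child) > 2:
        p = random.randrange(1, len(child) - 1)   # skip the fixed end genes
        lo, hi = child[p - 1] + 1, child[p + 1] - 1
        if lo <= hi:                              # a free in-between value exists
            x = random.randint(lo, hi)
            if x != child[p]:                     # not the same value
                child[p] = x                      # replace the gene by X
    return child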

3.5.7. Stopping Criteria

The stopping criteria for implementation purposes can be decided as follows:
(1) when an upper limit on the number of generations is reached, or
(2) when the chance of changes occurring in consecutive generations is extremely low.

3.6. Experiments and Results
3.6.1. Dataset

A collection of one hundred and fourteen articles from educational texts of grade four to grade seven is used for evaluation. The texts are mainly from science and social subjects because of the challenges faced by learners with reading difficulties in understanding these concepts. The texts belong to three different corpora; one is from textbooks [34] and another consists of grade-level texts from Scales [35]. The statistics of the data corpus used in the experiments are tabulated in Table 2.

The summaries of these texts were created manually by three independent human annotators. The sentences to be included in the summary were ranked by the annotators, and the sentences with the highest ranks were included in the summary according to the compression ratio.

3.7. Evaluation

Extractive summaries can be evaluated using various characteristics such as accuracy, cohesion, and readability. Accuracy in extraction measures how well the technique predicts the correct sentences. Evaluation can be classified into intrinsic and extrinsic evaluation. Intrinsic evaluation judges the summary quality by the coverage between the machine-generated summary and the human-generated summary. Extrinsic evaluation focuses mainly on quality as measured by the summary's effect on other tasks. We consider both methods because the generated summary should be informative as well as readable: the former is objective and can be verified using intrinsic evaluation, and the latter is subjective and can be evaluated using the extrinsic method. In intrinsic evaluation, precision ($P$), recall ($R$), and F-measure ($F$) are used to judge the coverage between the manual and the machine-generated summary:

$$P = \frac{|S_m \cap S_h|}{|S_m|}, \qquad R = \frac{|S_m \cap S_h|}{|S_h|}, \qquad F = \frac{2PR}{P + R}$$

where $S_m$ is the machine-generated summary and $S_h$ is the manual summary. The readability of the summary can be evaluated by existing metrics. Extrinsic evaluation can be done to verify the usability of the summary by its target audience.
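With summaries represented as sets of sentence indices, the three measures reduce to a few lines:

def prf(machine: set, manual: set):
    """Sentence-level precision, recall and F-measure between the
    machine-generated and the manual summary."""
    overlap = len(machine & manual)
    p = overlap / len(machine) if machine else 0.0
    r = overlap / len(manual) if manual else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f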

3.7.1. F-Measure

The F-measure is used to evaluate the extracted summary, as it combines both precision and recall. The results of the various summarization methods based on the F-measure are tabulated in Table 3. When the compression ratio is 10%, GA performs better in summary extraction than the modified corpus-based approach [30] and the lead method. When the compression ratio is 20%, it improves slightly on the modified corpus-based approach; at 30%, it performs better than the lead method but lower than the modified corpus-based approach. The reason is that the objective of this approach is not to improve the accuracy of the extraction but to improve its readability (understandability) through cohesion, while at the same time not diluting accuracy for the sake of readability. The evaluation criterion therefore focuses mainly on the readability and cohesion of the summary.

3.7.2. Readability

To date, there exist numerous readability metrics for predicting the readability of a given text. We considered FOG [36], SMOG [37], and Flesch-Kincaid [38] for predicting the readability of the summary.

The basic difficulty of a text can be measured using average sentence length and average word length. FOG and SMOG predict readability using average sentence length and the percentage of words with at least three syllables as parameters. The Flesch-Kincaid grade level score uses a fixed linear combination of average words per sentence and average syllables per word; combining both features improves overall accuracy. Flesch-Kincaid rates the summary generated using FCBA as more readable than the others, while FOG rates the summaries generated by both FCBA and GA as better. The comparison of the readability scores of the summaries according to Flesch-Kincaid, SMOG, and FOG is shown in Figure 3. Depending upon the application and the features that have to be considered, suitable readability metrics can be used. For independent-level reading, SMOG works better than the other formulae, as it is intended to measure the readability of material that the teacher has suggested a student use independently [39].
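For reference, the standard formulas behind the three metrics compute directly from simple counts; syllable counting itself is left to the caller in this sketch.

import math

def fog(words, sentences, polysyllabic_words):
    """Gunning FOG index."""
    return 0.4 * (words / sentences + 100.0 * polysyllabic_words / words)

def smog(sentences, polysyllabic_words):
    """SMOG grade."""
    return 1.0430 * math.sqrt(polysyllabic_words * 30.0 / sentences) + 3.1291

def flesch_kincaid_grade(words, sentences, syllables):
    """Flesch-Kincaid grade level."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59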

SMOG predicts that the summary generated by GA has better readability. The variations are due to differences in the parameters that the metrics consider for readability prediction.

Despite their strengths, there are theoretical shortcomings in using readability formulae to evaluate texts. In particular, their emphasis on shallow features like word length and sentence length makes it difficult to capture the deep, structural properties of the text that reflect cohesion. The cohesion of the extracted sentences can be measured by applying cosine similarity between consecutive sentences. When sentences in the documents are represented as term vectors, the similarity of two sentences corresponds to the correlation between the vectors.

3.7.3. Cohesion

Cohesion is defined as the set of linguistic means available for creating texture [40], that is, the property of a text being an interpretable whole (rather than unconnected sentences). Lexical cohesion between the sentences in a summary can be measured using cosine similarity and compared across the summaries generated by the various models.
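Measured this way, cohesion is the cumulative cosine similarity of consecutive summary sentences; a sketch reusing the similarity matrix from Section 3.3:

def cumulative_cohesion(summary_indices, sim):
    """Sum of cosine similarities of consecutive summary sentences."""
    return sum(sim[a][b] for a, b in zip(summary_indices, summary_indices[1:]))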

For 10% compression, the baseline and GA perform better in extracting a cohesive summary. For 20% compression, GA extracts a more cohesive summary than FCBA and the baseline. For 30% compression, GA extracts a better summary; the comparison based on cohesion, measured using cumulative sentence similarity, is shown in Figure 4. It is clear that summary cohesion depends mainly on the content distribution in the text. The baseline may be cohesive, but it fails to predict the better summary sentences. GA considers all factors, namely, F-measure, readability, and cohesion, without losing any of them while extracting a summary.

3.7.4. Task-Based Evaluation

To evaluate the learners' satisfaction with the summaries generated by the various methods for overviewing the text, task-based evaluation is used. One way to measure a text's difficulty level is to ask the learner various subjective questions in which they must evaluate how easy it is to understand the text. This metacomprehension of text has been measured using Likert scales in educational studies [41]. A sample set of learners was asked to fill in a questionnaire using a five-point Likert scale as given below.
(1) Do you think the summary will help you to understand the text better?
(2) Do you think the summary is easy to read?

The feedback obtained from the learners is shown in Figure 5. User evaluation using a Likert scale is an indirect way of measuring text readability. Text readability can instead be evaluated directly by measuring the comprehension of information from a text, using objective comprehension questions such as multiple-choice or yes/no questions, which have been used with learners in educational and psychological studies of comprehension. Because the Likert-scale feedback is an indirect measurement, the learner may not make a clear distinction between the various summaries of the text. The direct way to measure text comprehension is to evaluate through objective comprehension questions.

The experiments were carried out on fifteen learners with reading difficulties ranging from the fourth grade to the seventh grade. Each learner was given two texts, one from textbooks with comprehension questions and another consisting of a summary followed by the text and objective questions from the grade-level text. The effect of summary-assistive reading on students with reading difficulties in enhancing readability and comprehension for both the experimental and comparison groups is shown in Table 4. To analyze the results, a paired t-test was used to compare the mean scores and the significance level of the experimental and comparison groups with and without the assistive summary on the comprehension test. There was no significant difference in scores between the two groups when only the text was provided for the comprehension test. On the other hand, when the groups were supplemented with the assistive summary, the test showed a significant difference between the groups, with the experimental group scoring higher than the comparison group. From the results, it is evident that an assistive summary has a positive effect in enhancing the readability and comprehension of a given text. The learner is given easy-to-read important sentences, which increases his/her interest and motivation. The objective questions focus mainly on evaluating recall, recognition, and analysis or comprehension question types from Bloom's taxonomy. It is clear from the questions answered by the learners that the summary helps them answer literal questions better than analytical questions. Moreover, a summary that is easy for one learner may not be easy for another. In future work, we plan to develop graded automatic summarization with various levels that can be better suited to a specific range of learners.
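A sketch of such a paired t-test using SciPy, with placeholder scores in place of the study's data:

from scipy import stats

# Hypothetical comprehension scores for the same learners with and
# without the assistive summary (placeholders, not the study's data).
with_summary = [8, 7, 9, 6, 8, 7, 9, 8]
without_summary = [5, 6, 7, 4, 6, 5, 7, 6]

t_stat, p_value = stats.ttest_rel(with_summary, without_summary)
print(f"t = {t_stat:.3f}, p = {p_value:.4f}")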

4. Conclusion

When the target audience is specific, the usability of summarization can be measured through task-based evaluation. It is equally important that the quality of the summary is not diluted for the sake of easiness. In this paper, we proposed a genetic algorithm-based text summarization that considers the informative score, readability score, and sentence similarity for summary extraction. The performance of GA-based summarization is better in F-measure, readability score, and cohesion when compared to the corpus-based approach and the baseline method (lead). The GA-based summary extraction approach can be used to extract summaries from texts belonging to science and social subjects. Furthermore, the summary can be used as an overview before reading the complete text, to improve readability and comprehension.