Abstract

Nowadays, a vast quantity of information is available on the Web, which makes it hard for users to find the information they need. Automated techniques are required to effectively filter and retrieve useful data from the Web. The purpose of text summarization is to condense content while handling information variety. A key factor in document summarization is extracting useful features. In this paper, we extract word features in three groups called important words, and we extract sentence features based on the extracted words. We then use neutrosophic sets to rank and select sentences for the final summary.

1. Introduction

With the increasing amount of information on the Internet, it has become an extremely time-consuming and tedious task to read whole documents and papers to find relevant information on specific topics. Text summarization is recognized as a solution to this problem, as it generates automatic briefings of the data. Text summarization can be defined as producing an abbreviated version of one or several documents without losing the core content or message of the originals: an expressive summary of a given text that covers the most important parts of the content with minimal redundancy across the different input sources. There are various types of text summarization depending on the number of input sources, the technique used to generate the summary, the goal of the summary, and the input and output languages of the summarization process.

Recently, the theory of neutrosophic logic and sets has been introduced. Florentin [1, 2] presented neutrosophic logic, in which each proposition is evaluated to have three grades: a grade of truth (T), a grade of indeterminacy (I), and a grade of falsity (F). A neutrosophic set is defined as a set in which every element of the universe has a grade of truth, a grade of indeterminacy, and a grade of falsity, each lying in the unit interval [0, 1] [3-5]. There are various applications of neutrosophic logic, as in [6, 7].

In this paper, we propose a neutrosophic logic based multidocument summarization procedure that extracts vital sentences to create a nonredundant summary. The proposed approach is an extraction-based generic summarization system, and the summary in the context of this work is a text summary created from one or many news-related documents.

The paper is structured as follows. In Section 2, we give some basic concepts of text summarization systems. Section 3 introduces the proposed summarization technique. The fundamentals of neutrosophic sets are introduced in Section 4. The basics of information retrieval based on neutrosophic sets are introduced in Section 5. Section 6 is devoted to presenting our approach to document summarization using the distance between neutrosophic sets. The conclusion of the paper is given in Section 7.

2. Text Summarization

As previously stated, a text summary is a condensed version of a document that retains the major points and ideas of the original material(s). The goal of a summarization system is to offer a concise and fluent overview of a given text by addressing the most important parts of the material while minimizing redundancy from the various input sources.

There exists a range of taxonomies for text summarization [8-12] based on the number of input sources, the form of the generated summary, the purpose of the summary, and the language of the input sources.

There are two types of algorithms around which various works on text summarization have been published: extraction-based summarization and abstraction-based summarization.

The extraction-based technique works by extracting sentences from a document. No compression of any form is performed in this technique; important sentences are simply selected verbatim to form a more compact summary.

Abstraction-based summarization, on the other hand, goes further. Apart from retaining the most important sentences, it alters the way the text is organized: the retrieved text is regenerated. Depending on the number of input sources considered for generating the summary, summarization is categorized as single-document or multidocument. When a single document is provided as input, it is known as single-document summarization, whereas multidocument summarization uses a collection of papers as input to create the summary.

The summary of a domain-specific system is generated using domain-specific data, whereas the summary of a domain-independent (generic) system is generated using generic features. Domain-specific summarization approaches have become popular among academics.

In this research, we offer a document summarization system based on neutrosophic logic for extracting relevant sentences and generating a summary. The proposed approach is an extraction-based generic summarization system, and the summary in this work is a text summary created from one or more news-related papers.

3. The Proposed Document Summarization Technique

It is not sufficient for a summary merely to reproduce words and phrases from the source document; it must also be accurate and read fluently as a new standalone document. Text summarization [3, 13-15] is the task of creating a brief and fluent summary while retaining the overall meaning and information content. The summarization process takes several steps: first, preprocessing of the data; second, word feature extraction; third, sentence feature extraction; and last, organization of the set of documents to produce the summary. In the last step, we use neutrosophic logic, as illustrated later.

3.1. Input Preprocessing

Some preprocessing activities are required for the set of raw documents before they can be entered into the proposed technique.
(i) Stop-Word Removal. The most commonly used terms, such as "a," "an," and "the," carry no semantic information about the text. All stop words are predefined and stored in a separate file.
(ii) Stemming. This is the process of converting all words to their root form by eliminating prefixes and suffixes. We employed a Porter stemmer for the stemming procedure.
(iii) Removal of Special Characters. This step removes all special characters, including punctuation, question marks, and exclamation marks, from the collection of input documents.
(iv) Segmentation. This is the method of extracting each sentence from a document independently. All sentences from the documents are retrieved and stored in this manner.
(v) Tokenization. After sentence segmentation, tokenization is applied to all sentences. It isolates words from sentences and identifies character structures such as dates, times, punctuation, and numbers.
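The preprocessing steps above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the stop-word list is a small inline sample rather than the stored file, and the suffix-stripping rule is a crude stand-in for the Porter stemmer.

```python
import re

# Illustrative stop-word sample; the paper stores its full list in a separate file.
STOP_WORDS = {"a", "an", "the", "is", "of", "in", "to", "and"}

def segment_sentences(text):
    """Step (iv): split a document into sentences. A simple sketch that
    splits on '.', '?' and '!' rather than a full segmenter."""
    parts = re.split(r"[.?!]+", text)
    return [p.strip() for p in parts if p.strip()]

def strip_suffix(word):
    """Crude stand-in for Porter stemming: drop a few common suffixes."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

def preprocess_sentence(sentence):
    """Steps (i)-(iii) and (v): remove special characters, tokenize,
    drop stop words, and stem each remaining token."""
    cleaned = re.sub(r"[^\w\s]", " ", sentence.lower())
    tokens = [t for t in cleaned.split() if t not in STOP_WORDS]
    return [strip_suffix(t) for t in tokens]
```

For example, `preprocess_sentence("The computing of algorithms!")` yields the stemmed content tokens with stop words and punctuation removed.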

3.2. Feature Extraction

To perform efficient document summarization, we rely on feature extraction. Feature extraction is not limited to words but also covers sentences. In the following subsections, we illustrate our method for extracting words with different levels of strength; the sentence features depend on the word features.

The preprocessed word-level data is used to compute sentence scores in the feature extraction phase. The effectiveness of different sentence evaluation methods depends on the type, genre, language, and structure of the input text. The underlying assumption is that distinct topics exhibit different characteristics, which can be distinguished by a variety of features.

All text features are divided into two categories: word-level and sentence-level features. We ran tests on various combinations of shallow text features on various datasets to find the mix that delivers the best results in terms of coverage and relevancy for the news domain. The features used in the proposed strategy are listed below.

3.2.1. Word Features

Previous text summarization methods rely on word information from the whole document. Alternatively, we can extract features that recognize topics from words without reading the whole document. For example, the word "algorithm" can indicate the document field "computer science"; the appearance of this word in any sentence suggests that the sentence is also important.

The term "document field" refers to basic and common information that is useful in human communication.

A field tree is a visual representation of document field relationships. The field tree's leaf nodes correspond to terminal fields, the nodes connected to the root are super-fields, and the remaining nodes are middle fields. The field of a text can be identified efficiently if the text contains many important words with high frequency. Therefore, we define three levels of important words (IM-W), which are more effective than using full documents as in traditional methods. The three levels of IM-W are defined as follows:
(IM-W) 1. This appears in the title of the document and in one terminal field. For the root of a super-field F with child field F/c, the following formula is used to decide whether or not a word is (IM-W) 1.
(IM-W) 2. This appears with more than one terminal field in one medium field.
(IM-W) 3. This appears only with one medium field.
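One plausible reading of the three IM-W levels can be sketched as below. The data structures are illustrative assumptions, not from the paper: `terminal_hits` is the set of terminal fields a word occurs with, and `medium_of` maps each terminal field to its parent medium field in the field tree.

```python
def imw_level(word, title_words, terminal_hits, medium_of):
    """Assign an IM-W level (1, 2, or 3) to a word, or None if the word
    does not qualify. Follows one reading of the three definitions:
    level 1 needs the title plus exactly one terminal field; level 2
    needs several terminal fields under a single medium field; level 3
    needs a single medium field only."""
    mediums = {medium_of[t] for t in terminal_hits}
    if word in title_words and len(terminal_hits) == 1:
        return 1  # (IM-W) 1: in the title and one terminal field
    if len(terminal_hits) > 1 and len(mediums) == 1:
        return 2  # (IM-W) 2: several terminal fields, one medium field
    if len(mediums) == 1:
        return 3  # (IM-W) 3: only one medium field
    return None
```

For instance, with terminal fields "cs" and "math" both under the medium field "science", a title word seen only in "cs" is level 1, while a non-title word seen in both is level 2.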

3.2.2. Sentence Features

Sentence features are the most important for constructing the summary. Two sentence features are identified: the first is whether the sentence contains an IM-W, and the second is sentence length; short sentences rarely carry vital information, so they are not recommended. The sentence length score is computed as follows:
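The two sentence features can be combined as in the following sketch. The exact length-score formula is not reproduced in the text, so the normalization `len(tokens) / max_len` and the per-level weights are assumptions chosen for illustration.

```python
def sentence_score(tokens, imw_levels, max_len, weights=(3.0, 2.0, 1.0)):
    """Score a sentence from its two features: a weighted count of
    important words (stronger IM-W levels weigh more) and a length
    score that penalizes short sentences relative to the longest
    sentence in the document set. `imw_levels` maps a token to its
    IM-W level (1, 2, or 3); both are illustrative structures."""
    imw_score = sum(weights[imw_levels[t] - 1]
                    for t in tokens if t in imw_levels)
    length_score = len(tokens) / max_len if max_len else 0.0
    return imw_score + length_score
```

A four-token sentence containing one level-1 and one level-2 word, with a longest sentence of eight tokens, would score 3.0 + 2.0 + 0.5 = 5.5 under these assumed weights.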

3.3. Summarization Process

The summarization process [16-18] consists of three steps. First, all sentences are ranked from the highest to the lowest score obtained with the neutrosophic approach, and sentences are chosen based on their degree of resemblance to the sentences already in the summary; to determine sentence similarity, we use the Euclidean distance between two neutrosophic sets, which is explained in Section 6. The second step is optimization: we delete repeated sentences and, among similar sentences, delete the one containing the largest number of similar words. The third step is sentence arrangement: sentences are organized in the final summary in the order in which they appeared in the source documents. We follow these rules: (1) sentences are arranged in decreasing order of importance; (2) if two sentences have the same score and the same position in their documents, the sentence from the earlier document is given priority.
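The three steps can be sketched as one routine. The paper uses the Euclidean distance between neutrosophic sets as the similarity measure; here any pairwise `similarity` callable can be plugged in, and the sentence tuple layout and threshold value are illustrative assumptions.

```python
def build_summary(scored_sentences, similarity, threshold=0.8, top_k=3):
    """Step 1: rank sentences by score. Step 2: drop near-duplicates,
    keeping a candidate only if it is sufficiently dissimilar to every
    sentence already selected. Step 3: restore source order, earlier
    documents first. Each sentence is (doc_index, position, score, text)."""
    ranked = sorted(scored_sentences, key=lambda s: s[2], reverse=True)
    chosen = []
    for cand in ranked:
        if all(similarity(cand[3], kept[3]) < threshold for kept in chosen):
            chosen.append(cand)
        if len(chosen) == top_k:
            break
    # Step 3: order by (document, position) within the final summary.
    chosen.sort(key=lambda s: (s[0], s[1]))
    return [s[3] for s in chosen]
```

With a simple word-overlap similarity, a sentence that repeats most of an already-selected sentence is skipped, while a dissimilar lower-scoring sentence from a later document is kept and placed after it.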

4. Neutrosophic Sets

The neutrosophic set is a powerful general framework recently proposed by F. Smarandache [1, 2]. He introduced the grade of indeterminacy (I) as an independent component. Here, the grades of truth, indeterminacy, and falsity of any element of a neutrosophic set lie in the ordinary unit interval [0, 1].

Neutrosophic set definition: Let X be a universal set. A single-valued neutrosophic set A in X is an object characterized by three membership functions: a truth-membership function T_A(x), an indeterminacy-membership function I_A(x), and a falsity-membership function F_A(x). For any element x in X, the sum T_A(x) + I_A(x) + F_A(x) lies in the closed interval [0, 3].
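The definition translates directly into a small data type; this is a minimal sketch for checking the constraints, with illustrative names.

```python
from dataclasses import dataclass

@dataclass
class SVNElement:
    """One element of a single-valued neutrosophic set: truth,
    indeterminacy, and falsity grades, each in [0, 1], with their
    sum lying in [0, 3] as required by the definition."""
    t: float
    i: float
    f: float

    def is_valid(self):
        in_unit = all(0.0 <= g <= 1.0 for g in (self.t, self.i, self.f))
        return in_unit and 0.0 <= self.t + self.i + self.f <= 3.0
```

Note that, unlike a fuzzy membership grade, the three grades are independent: (0.7, 0.2, 0.1) and (0.9, 0.9, 0.9) are both valid elements.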

5. Information Retrieval Based on Neutrosophic Sets Ñ

El in [19] discusses the fundamentals of information retrieval using neutrosophic sets as follows.

Let D be a limited set of documents, D = {d_1, d_2, ..., d_m}, and let W be a set of words, W = {w_1, w_2, ..., w_n}. The neutrosophic set Ñ in D is characterized by a truth-membership function T_Ñ, an indeterminacy-membership function I_Ñ, and a falsity-membership function F_Ñ, where T_Ñ, I_Ñ, and F_Ñ are functions from D into [0, 1]. Consider ⟨T_Ñ(d), I_Ñ(d), F_Ñ(d)⟩ a neutrosophic single-valued element of Ñ.

A neutrosophic single-valued set Ñ over a limited universe [8-12, 20] is characterized by membership grades computed from three occurrence counts: the number of appearances of the word in the document, the number of appearances of the word in the whole set, and the number of appearances of the word in the subset.
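The three counts that the membership functions depend on can be gathered as below. The exact formulas mapping these counts to T, I, and F follow [19] and are not reproduced in the text, so this sketch stops at the counts; the representation of documents as token lists is an assumption.

```python
def occurrence_counts(word, document, doc_set, subset):
    """Return the three quantities the membership grades are built from:
    occurrences of `word` in one document, in the whole document set,
    and in the chosen subset. Documents are token lists; `doc_set`
    and `subset` are lists of such token lists."""
    def count_in(documents):
        return sum(tokens.count(word) for tokens in documents)
    return (document.count(word), count_in(doc_set), count_in(subset))
```

For a word appearing twice in one document, three times across the whole set, and once in the subset, the function returns (2, 3, 1).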

6. Document Summarization Based on Neutrosophic Sets

We use the distance between two neutrosophic sets [21, 22] to create a summary with related and closely related sentences. Single-valued neutrosophic sets [18, 23] are a type of neutrosophic set motivated by practical considerations and can be employed in real-world applications such as science and engineering. Distance and similarity are important concepts in a variety of fields, including psychology, linguistics, and computational intelligence.

6.1. Neutrosophic Summarization Technique Using the Euclidean Distance between Two Neutrosophic Sets

We introduce the distance between two sentences represented as single-valued neutrosophic sets.

Let A and B be two single-valued neutrosophic sets defined over the finite universe X = {x_1, x_2, ..., x_n}. Then, the distances between A and B are defined as follows.

The Euclidean distance between A and B is defined as follows:

$d_E(A, B) = \sqrt{\sum_{i=1}^{n} \left[ (T_A(x_i) - T_B(x_i))^2 + (I_A(x_i) - I_B(x_i))^2 + (F_A(x_i) - F_B(x_i))^2 \right]}$

The normalized Euclidean distance between A and B is defined as follows:

$d_{NE}(A, B) = \sqrt{\frac{1}{3n} \sum_{i=1}^{n} \left[ (T_A(x_i) - T_B(x_i))^2 + (I_A(x_i) - I_B(x_i))^2 + (F_A(x_i) - F_B(x_i))^2 \right]}$
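A direct implementation of the Euclidean and normalized Euclidean distances between single-valued neutrosophic sets, each represented here as a list of (T, I, F) triples over a shared universe:

```python
from math import sqrt

def euclidean_distance(a, b):
    """Euclidean distance between two single-valued neutrosophic sets:
    the square root of the summed squared differences of the truth,
    indeterminacy, and falsity grades over every element."""
    return sqrt(sum((ta - tb) ** 2 + (ia - ib) ** 2 + (fa - fb) ** 2
                    for (ta, ia, fa), (tb, ib, fb) in zip(a, b)))

def normalized_euclidean_distance(a, b):
    """Same summed term, scaled by 1/(3n) inside the square root so the
    result always lies in [0, 1]."""
    return euclidean_distance(a, b) / sqrt(3 * len(a))
```

Identical sets are at distance 0, and the normalized form reaches 1 only when every grade differs maximally in every element, which makes it convenient as a similarity threshold between sentences.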

Example 1. In this example, we explain the whole method on one document. Consider a topic called "computer and math"; this topic constitutes a field, and a part of the field tree is shown in Figure 1.
We take an article from the subfield "computer science" under the title "Environmental impact of computation and the future of green computing." Assume that S is the set of sentences extracted from the document, the set of important words is W = {Environmental, impact, computation, future, green, computing}, and Ñ is a subset of sentences from S. They were selected according to the occurrence of the set of keywords W, where T, I, and F are the degree of "strong occurrence of important words," the degree of "indeterminacy of important words," and the degree of "poor occurrence of important words," respectively. The next step is to determine the Euclidean distance between two sentences such as S1 and S2. The numbers of occurrences of the keywords in the document are as follows: Environmental "7," impact "6," computation "6," future "3," green "4," and computing "13." The single values for the neutrosophic set Ñ are given in Table 1.

Example 2. From the data of Example 1 and Table 1, the normalized Euclidean distance between S1 and S2 is given as follows:

7. Conclusions and Future Works

The aim of our work is to study a new method of text summarization based on neutrosophic sets. The benefit of neutrosophic sets is that they serve as a good mathematical tool for document summarization via the distance between two neutrosophic sets.

The expected future work is to compare this document summarization method with other methods, such as those based on fuzzy logic and fuzzy ontology.

Data Availability

No data were used to support this study.

Conflicts of Interest

The authors declare that they have no conflicts of interest.