OMFM: A Framework of Object Merging Based on Fuzzy Multisets

Yue, Lin; Zuo, Wanli; Feng, Lizhou; Guo, Lin

doi:https://doi.org/10.1155/2014/304537

Mathematical Problems in Engineering


Research Article | Open Access

Volume 2014 | Article ID 304537 | https://doi.org/10.1155/2014/304537


Lin Yue,^1,2,3 Wanli Zuo,^1,2 Lizhou Feng,^1,2 and Lin Guo^1,2

Academic Editor: Pandian Vasant

Received 02 May 2014

Accepted 16 Oct 2014

Published 10 Nov 2014

Abstract

Information fusion is the process of merging information from multiple sources into a new set of information. Existing work on information fusion applies to various scenarios such as multiagent systems, group decision making, and multidocument summarization. This paper develops a framework for solving the object merging problem based on fuzzy multisets. The objects defined in this paper are data segments in a document fusion task, that is, concepts in which semantically related terms of different semantic relations are embedded. The fundamental operation is the merge function, which maps data segments in multiple fuzzy multisets onto one object, called a solution. Under this framework, we define the quality measures of purity and entropy to quantify the quality of solutions, balancing the accuracy and completeness of the results. A merge function that yields this kind of solution is called a VI-optimal merge function, and a series of its theoretical properties are studied. Finally, we investigate the proposed framework in a specific application scenario, document fusion, which is related to the task of multidocument summarization, and show how the framework works with an illustrative example.

1. Introduction

As an important research area, information fusion is the process of merging information from multiple sources into a new set of information. Applications include heterogeneous databases, multiagent systems, group decision making, and multidocument summarization. Different application scenarios call for different principles and procedures. Many classical mathematical theories of aggregation operators [1–5] have been developed for multiagent systems and group decision making systems, and the information that aggregation operators try to fuse typically expresses facts such as the opinion or score of an agent. Besides this research, a fair amount of work has focused on the situation where each source is regarded as a propositional belief [6–8]. The existence of nonfactual knowledge such as integrity constraints and inference rules distinguishes these two theories. As a result, much work has been done in the heterogeneous database area on first-order theory. In a third type of fusion, each source presents knowledge by means of a possibility distribution [9]; in this case, imperfections in the data such as incorrectness, uncertainty, and incompleteness must be dealt with. The main challenge is how to deal with conflicting information provided by different sources.

To address the issues in the third type of fusion, a framework of object merging based on multiset theory has recently been investigated, which can be used to solve the problem of multidocument summarization (MDS) [10]. Object merging is also an active research topic in many domains, with good prospects for application. The framework of multiset merging for MDS defines a merge function that maps the objects in multisets onto a single object, but its results cannot yet be considered a final summary [11, 12]: they are only keywords without any relations among them, not to mention the context of co-text, the context of culture, the context of situation, and so forth. The essential reason is that the framework defines its quality measures using the multiplicity of an element as the measure of its importance. In other words, multiplicity equals term frequency, which is only a shallow text feature. When performing source selection in MDS, the traditional method transforms a document into a vector of words or a multiset of words, which are simple representations. More advanced approaches are needed that are semantically richer than using words as the source representation. In short, the problem of processing coreferent objects has not yet been investigated in depth. On the one hand, the merging of nonquantitative objects, especially objects carrying semantic information, has not been addressed. On the other hand, object merging functions and the rationality of merging still need further investigation.

In this paper, we also focus on the problem of object merging in information fusion, and our work should be treated as an extension of the framework mentioned above. There are several differences between the two works. The basic difference concerns the definition of coreferent objects: in [11], coreferent objects are objects describing the same entity in the real world, while in our paper an object is a piece of data or information that denotes a concept in which semantically related terms of different semantic relations are embedded. Furthermore, we use fuzzy multiset theory, in which the membership degree function and the length function describe both the uncertainty and the repeatability of natural language. When performing fusion in practical situations, the object merging process considers deep text features in the form of semantic relations such as hypernym, synonym, and antonym. Moreover, two quality measures widely used in the text mining literature, purity [13] and entropy [14, 15], are adopted to quantify the result of a merge function. Thus, the behavior of the merge functions defined in this paper can be characterized by the behavior of the quality measures. With this strategy, we can obtain an optimal merge result. A possible application of this work is document fusion [16], where a collection of textual documents is used to produce the shortest description containing all information found within the document set, but without repetition. Existing solutions to this problem normally rely on the statistical or heuristic methods used in multidocument summarization [17, 18]. In this paper, object merging based on fuzzy multisets (OMFM) is a meaningful attempt in this direction, where a source set of multiple documents is represented as a multiset and each document is represented as a fuzzy multiset of multiple concepts.

This paper is organized as follows. Section 2 reviews the mathematical preliminaries. Section 3 proposes the general framework of objects and object merging, and Section 4 introduces the quality measures and the construction of merge functions. Section 5 demonstrates how the framework works on a practical problem, document fusion, with an illustrative example. Finally, Section 6 gives the conclusion and future work on the proposed framework OMFM.

2. Preliminaries

In mathematics, a fuzzy set, introduced by Zadeh in 1965, is a set whose elements have degrees of membership; it extends the classical notion of a set [19, 20]. Fuzzy set theory is very useful for problems that are not easily handled by classical computing techniques. The use of membership degrees to represent membership also provides a means to measure the possible uncertainty in computational theories of language. The notion of a multiset generalizes the classical notion of a set by allowing members to appear more than once. As a data structure, a multiset stands between strings, where a linear ordering of symbols is present, and sets, where no ordering is considered. Combined with the notion of a fuzzy set, the multiset is generalized to the fuzzy multiset [21], which can describe both the uncertainty and the repeatability of natural language. Consider the following language modeling problem: given some sentences, identify the concepts and words that are similar or identical, and merge these objects to obtain a condensed description. This is a challenging natural language problem involving large amounts of diverse and compositional data. To solve it, we extend fuzzy multisets to produce a language model that maps data segments in multiple fuzzy multisets onto one object, where different semantic relations for one concept are treated as repeated elements with different membership degrees in fuzzy multisets. In this section, the mathematical theories of fuzzy sets, multisets, and fuzzy multisets are briefly reviewed.

Definition 1 (membership function). The membership function $\mu_A(x)$ indicates the degree to which element $x$ belongs to set $A$. $\mu_A(x) = 1$ indicates that element $x$ completely belongs to set $A$; that is, $\mu_A(x) \in \{0, 1\}$ is the concept of a traditional set.

Definition 2 (fuzzy set). The membership function $\mu_A$ over a universe $U$ defines a fuzzy set $A$, which is represented as $A = \{(x, \mu_A(x)) \mid x \in U\}$. A fuzzy set with $n$ elements can be denoted as $A = \{\mu_A(x_1)/x_1, \mu_A(x_2)/x_2, \ldots, \mu_A(x_n)/x_n\}$.
According to the definition of a fuzzy set, the extent to which an object belongs to a set is no longer fixed, and the membership of each object falls in the interval $[0, 1]$.

2.1. Multiset

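Definition 3 (multiset). A multiset $M$ over the universe $U$ is defined as a function $M: U \to \mathbb{N}$, where $\mathbb{N} = \{0, 1, 2, \ldots\}$. For $u \in U$, $M(u)$ denotes the multiplicity of $u$ in $M$.
The cardinality of a multiset $M$ is given by $|M| = \sum_{u \in U} M(u)$ [22].
A multiset also could be denoted as $M = \{(u, M(u)) \mid u \in U \wedge M(u) > 0\}$. The set of all multisets drawn from a universe $U$ is denoted as $\mathcal{M}(U)$.

There are some basic operators and relations of multisets below, for multisets $A$ and $B$ over $U$:
Inclusion: $A \subseteq B \Leftrightarrow \forall u \in U: A(u) \le B(u)$
Equality: $A = B \Leftrightarrow \forall u \in U: A(u) = B(u)$
Intersection: $(A \cap B)(u) = \min(A(u), B(u))$
Union: $(A \cup B)(u) = \max(A(u), B(u))$
Addition: $(A \oplus B)(u) = A(u) + B(u)$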
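The multiset relations and operators named above map naturally onto Python's `collections.Counter`. The following is a minimal sketch, assuming the standard element-wise definitions (comparison, minimum, maximum, and sum of multiplicities); the example multisets `A` and `B` and the helper `included` are illustrative names that do not come from the paper.

```python
from collections import Counter


def included(a: Counter, b: Counter) -> bool:
    """Inclusion: every multiplicity in `a` is at most the one in `b`."""
    return all(count <= b[u] for u, count in a.items())


# Two illustrative multisets over the universe {"x", "y", "z"}.
A = Counter({"x": 2, "y": 1})
B = Counter({"x": 3, "y": 1, "z": 4})

intersection = A & B             # element-wise minimum of multiplicities
union = A | B                    # element-wise maximum of multiplicities
addition = A + B                 # element-wise sum of multiplicities
cardinality = sum(A.values())    # sum of all multiplicities in A
```

Here `included(A, B)` holds while `included(B, A)` does not, because `B` contains `"z"` and more copies of `"x"` than `A`.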

Definition 4 ($\alpha$-cut set of multiset). The $\alpha$-cut set of a multiset $M$ is denoted as $M_\alpha$ and given by $M_\alpha = \{u \mid u \in U \wedge M(u) \ge \alpha\}$.
Note that the difference between the notation $M_{(i)}$ and $M_\alpha$ is that the former assigns an index to the multiset, while the latter denotes the $\alpha$-cut set of the multiset [23].
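As a small illustration of the cut set of a multiset, the sketch below assumes the definition commonly used in the multiset merging literature: the cut at level `alpha` is the ordinary set of elements whose multiplicity is at least `alpha`. The function name `alpha_cut` and the sample multiset are hypothetical.

```python
from collections import Counter
from typing import Hashable, Set


def alpha_cut(m: Counter, alpha: int) -> Set[Hashable]:
    """Return the elements of `m` whose multiplicity is at least `alpha`.

    Assumption: `alpha` is a positive integer threshold on multiplicity.
    """
    return {u for u, count in m.items() if count >= alpha}


M = Counter({"x": 3, "y": 1, "z": 2})
cut_2 = alpha_cut(M, 2)   # elements that occur at least twice
```

Raising the threshold can only shrink the cut, so the cuts of one multiset form a nested family of sets.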

2.2. Fuzzy Multiset

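Combined with the concept of a fuzzy set, the traditional concept of a multiset could be denoted as $M = \{(\mu_M^1(u), \ldots, \mu_M^{M(u)}(u))/u \mid u \in U\}$, where $\mu_M^i(u)$, $i = 1, \ldots, M(u)$, represents the membership degree of element $u$ recurring in multiset $M$ for the $i$th time. According to the traditional concept of a set, the membership degree of all elements in a multiset is 1. For $\mu_M^i(u) \in [0, 1]$, the traditional sense of a multiset is generalized to a fuzzy multiset.

Definition 5 (fuzzy multiset). A fuzzy multiset $A$ over the universe $U$ is defined as $A = \{(\mu_A^1(u), \mu_A^2(u), \ldots, \mu_A^p(u))/u \mid u \in U\}$, where $\mu_A^i(u) \in [0, 1]$ is the degree of membership of element $u$ ($i = 1, \ldots, p$).

Definition 6 (length function). The concept of multiplicity in a multiset, generalized to a fuzzy multiset, is the length function, which is denoted as $L(u; A) = \max\{j : \mu_A^j(u) \neq 0\}$ [24].

The number of occurrences, or cardinality, of a fuzzy multiset $A$ is given by $|A| = \sum_{u \in U} \sum_{j=1}^{L(u; A)} \mu_A^j(u)$ [22].

There are some basic operators and relations of fuzzy multisets below, for fuzzy multisets $A$ and $B$ over $U$, all $u \in U$, and $j = 1, \ldots, L(u; A, B)$ with $L(u; A, B) = \max(L(u; A), L(u; B))$:
Inclusion: $A \subseteq B \Leftrightarrow \mu_A^j(u) \le \mu_B^j(u)$
Equality: $A = B \Leftrightarrow \mu_A^j(u) = \mu_B^j(u)$
Addition: the membership sequence of $u$ in $A \oplus B$ is obtained by combining the sequences of $u$ in $A$ and in $B$ and arranging the result in decreasing order, so that $L(u; A \oplus B) = L(u; A) + L(u; B)$.
Intersection: $\mu_{A \cap B}^j(u) = \min(\mu_A^j(u), \mu_B^j(u))$
Union: $\mu_{A \cup B}^j(u) = \max(\mu_A^j(u), \mu_B^j(u))$
Note that when applying any operator to two fuzzy multisets, the membership degree sequences $(\mu_A^1(u), \ldots, \mu_A^{L(u; A)}(u))$ and $(\mu_B^1(u), \ldots, \mu_B^{L(u; B)}(u))$ are brought to the same length. For this reason, an adequate number of zeros is appended. For each $u \in U$, the membership degrees are arranged in decreasing order within the sequence, where $\mu_A^1(u) \ge \mu_A^2(u) \ge \cdots \ge \mu_A^{L(u; A)}(u)$ and $\mu_B^1(u) \ge \mu_B^2(u) \ge \cdots \ge \mu_B^{L(u; B)}(u)$.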
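The note on sorting and zero padding can be made concrete in code. The sketch below assumes the usual sequence-based reading of these operators: each element carries a decreasing sequence of membership degrees, sequences are padded with zeros to equal length, and union, intersection, and inclusion are taken position by position. The dictionary representation, the function names, and the sample values are illustrative choices rather than the paper's notation.

```python
from typing import Callable, Dict, List, Tuple

# A fuzzy multiset maps each element to its sequence of membership degrees.
FuzzyMultiset = Dict[str, List[float]]


def normalize(a: FuzzyMultiset) -> FuzzyMultiset:
    """Sort each sequence in decreasing order and drop zero memberships."""
    out: FuzzyMultiset = {}
    for u, seq in a.items():
        kept = sorted((m for m in seq if m > 0), reverse=True)
        if kept:
            out[u] = kept
    return out


def _aligned(a: FuzzyMultiset, b: FuzzyMultiset, u: str) -> Tuple[List[float], List[float]]:
    """Pad the two sequences of `u` with zeros so they have equal length."""
    sa, sb = list(a.get(u, [])), list(b.get(u, []))
    n = max(len(sa), len(sb))
    return sa + [0.0] * (n - len(sa)), sb + [0.0] * (n - len(sb))


def _pointwise(a: FuzzyMultiset, b: FuzzyMultiset,
               op: Callable[[float, float], float]) -> FuzzyMultiset:
    a, b = normalize(a), normalize(b)
    return normalize({u: [op(x, y) for x, y in zip(*_aligned(a, b, u))]
                      for u in set(a) | set(b)})


def fm_union(a: FuzzyMultiset, b: FuzzyMultiset) -> FuzzyMultiset:
    return _pointwise(a, b, max)


def fm_intersection(a: FuzzyMultiset, b: FuzzyMultiset) -> FuzzyMultiset:
    return _pointwise(a, b, min)


def fm_addition(a: FuzzyMultiset, b: FuzzyMultiset) -> FuzzyMultiset:
    """Concatenate the sequences of each element, then re-sort."""
    a, b = normalize(a), normalize(b)
    return normalize({u: a.get(u, []) + b.get(u, []) for u in set(a) | set(b)})


def fm_included(a: FuzzyMultiset, b: FuzzyMultiset) -> bool:
    """True when every membership in `a` is at most the aligned one in `b`."""
    a, b = normalize(a), normalize(b)
    return all(x <= y for u in a for x, y in zip(*_aligned(a, b, u)))


A = {"x": [0.8, 0.5], "y": [0.3]}
B = {"x": [0.6, 0.6, 0.2], "z": [1.0]}
```

With these sample values, the intersection keeps only `"x"` with memberships `[0.6, 0.5]`, because `"y"` and `"z"` each occur in just one of the two fuzzy multisets.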

Example 7. A fuzzy multiset over ; the membership degree of element is ; the membership degree of element is ; the membership degree of element is ; the membership degree of element is , where , , , and .
The set of all fuzzy multisets drawn from a universe is denoted as .

Definition 8 (-cut set of fuzzy multiset). The -cut set of a fuzzy multiset is denoted as and given by Note that the difference between the notation and notation is that is preserved for the -cut set of the fuzzy multiset , while means assigning an index to the fuzzy multiset .
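For intuition about the cut set of a fuzzy multiset, the following sketch assumes a common definition from the fuzzy multiset literature: cutting at a threshold `alpha` in (0, 1] yields an ordinary multiset in which each element is counted once for every membership degree that reaches the threshold. Whether this matches the paper's exact definition is an assumption; `alpha_cut` and the sample data are hypothetical names.

```python
from collections import Counter
from typing import Dict, List


def alpha_cut(a: Dict[str, List[float]], alpha: float) -> Counter:
    """Count, per element, the membership degrees that are >= alpha.

    Assumption: 0 < alpha <= 1, so absent elements never enter the cut.
    """
    cut: Counter = Counter()
    for u, memberships in a.items():
        n = sum(1 for m in memberships if m >= alpha)
        if n > 0:
            cut[u] = n
    return cut


A = {"x": [0.8, 0.5], "y": [0.3]}
cut_half = alpha_cut(A, 0.5)   # both occurrences of "x" survive, "y" does not
```

Lowering the threshold admits more occurrences, so the cut at 0.3 also contains `"y"`.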

3. Objects and Object Merging

3.1. The General Framework

We reviewed the most relevant definitions in the previous section. As mentioned earlier, our framework extends the work in [11], so we first introduce its basis. The basis involves the redefinition of coreferent objects and the merge function in OMFM, and a brief review of the properties of preservation and the majority rule in [11].

Redefinition of coreferent objects and the merge function in OMFM.

Reference function is formalized to describe a concept in the real world, where symbolizes the real world. By definition, two concepts are called coreferent if they describe the same real world concept.

Definition 9 (coreferent objects). Let be a universe set of concepts. Two concepts and are coreferent if and only if .

By the definition above, two objects that describe the same real world concept with semantic-related terms of different semantic relations embedded are formalized axiomatically. Here, we consider the context as the baseline: when describing a theme in a document, some semantic-related terms relating to this concept will be used to extend the theme.

Definition 10 (merge function). The merge function in OMFM is represented by function .
Mapping the fuzzy multisets of objects onto a single object is the job of merge function in our work, and these functions are often idempotent; that is, . This conclusion is also suitable in this paper and corresponding proof will be given in the following section.

A brief review of two important properties.

Property 1 (preservation). A merge function is preservative when merge function only selects one of the elements from the source set, the property of preservation in OMFM is denoted as

Property 2 (majority rule). If the multiplicity value of an element is larger than the half of cardinality value of the source set then this element must be selected by the merge function, which is denoted as

The majority rule above is an important property of merge functions over multisets; it was further studied in [25], and a weaker version has been proved in [26]. So far, the majority rule has not been extended to fuzzy multisets because it does not apply in general, but the preservation rule will be elaborated in our paper.
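To make the two reviewed properties tangible for ordinary multisets, the sketch below checks them for a given merge function. It assumes that the source collection is a `Counter` of hashable objects and that a merge function returns a single object; the names `majority_element`, `satisfies_majority_rule`, `is_preservative`, and the two sample merge functions are illustrative and are not taken from [11].

```python
from collections import Counter
from typing import Callable, Hashable, Optional

MergeFunction = Callable[[Counter], Hashable]


def majority_element(sources: Counter) -> Optional[Hashable]:
    """Return the element whose multiplicity exceeds half the cardinality, if any."""
    total = sum(sources.values())
    for obj, multiplicity in sources.items():
        if 2 * multiplicity > total:
            return obj
    return None


def satisfies_majority_rule(merge: MergeFunction, sources: Counter) -> bool:
    """The rule only constrains the result when a majority element exists."""
    winner = majority_element(sources)
    return winner is None or merge(sources) == winner


def is_preservative(merge: MergeFunction, sources: Counter) -> bool:
    """Preservation: the result is one of the source elements."""
    return merge(sources) in sources


def mode_merge(sources: Counter) -> Hashable:
    """Select the most frequent source element."""
    return sources.most_common(1)[0][0]


def first_alphabetical(sources: Counter) -> Hashable:
    """Select the smallest source element, ignoring multiplicities."""
    return min(sources)


sources = Counter({"b": 3, "a": 1, "c": 1})
```

Both sample functions are preservative on this input, but only `mode_merge` respects the majority rule, since `"b"` accounts for three of the five sources.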

3.2. Merging of Fuzzy Multisets

Within the scope of OMFM, we focus on object merging over a compound of a multiset and multiple fuzzy multisets, with a function of the type below: where the elements of are denoted as , and the elements of are . Here, the multiset could be denoted as , where denotes the multiplicity of in .

The fundamental operator maps the data segments of fuzzy multisets onto one object, which is called a solution. In the following sections the symbol is used to represent an arbitrary solution of a given merge function; that is, .

Example 11. Given a fuzzy multiset over and the multiset , such that Here, consider two merge functions we have mentioned in Section 2, . Then, we can obtain that Here, the fuzzy multiset is referred to as source or source fuzzy multiset; the multiplicity of different source fuzzy multiset is not considered by merge function.

The case is is not an upper bounded lattice. The normalization criterion that is needed when performing merge functions is usually omitted by fuzzy multiset theory. Therefore, we show another property below.

Property 3 (boundedness). A bounded merge function over should satisfy the following constraint:
It indicates that the merge function selects one of the elements from all the source sets. A corresponding inference is that
This inference explains that any element not belonging to any source set should not exist in the outcomes of a bounded merge function. We could easily get this natural property just from the observation, because element with membership degree should not be mixed into a solution arbitrarily. Also, it is a weaker notion of preservation. Besides, we also formulate the enforcing preservation: Then, Property 3 is equivalent to indicating

Paper [11] pointed out that keeping a weaker version of Property 2 is advantageous in the multiset setting. It takes multidocument summarization (MDS) as an example to explain that strict preservation would lead to a bad result in practice: one of the documents itself would become the summary of the entire document set. The task of document fusion (DF), in contrast, is to generate a text containing all the information in the entire document set. So, a weaker version of Property 2 is also advantageous in our framework. The bounded merge function of fuzzy multisets will be further elaborated in subsequent sections.

Theorem 12. The functions , , and are bounded.

Proof. We can get the proof from the case that for any

4. Optimal Merging of Fuzzy Multisets

4.1. Quality Measures

The purpose of defining quality measures is to construct merge functions that perform well for object merging over multiple fuzzy multisets. On the one hand, the behavior of the merge functions we define can be characterized by the values of the quality measures. On the other hand, adjusting a merge function can optimize the balance between the accuracy and completeness of a given solution and thus yield higher values of the quality measures. The relationship between the merge functions and the quality measures is shown in Figure 1.

In this paper, we adopt two quality measures widely used in the text mining literature: the first is purity [13], and the second is entropy [14, 15]. Information entropy measures the amount of information in information theory and is often taken as a measure of “disorder”: the higher the entropy, the greater the disorder. Information purity measures the correlation between a system and its environment, where a higher purity means that the system is more relevant to its environment. Both measures fall into the interval $[0, 1]$. Basically, maximum purity and minimum entropy of the results are the goals we try to achieve. Nevertheless, to analyze the effect of a merge function, we should be able to analyze it at the fundamental level of the elements. So, some local quality measures are introduced first.

Definition 13 (local precision). Given a multiset , the local precision of the element could be defined as such that The local precision judges the accurateness of adding the element with the membership degree into the solution. Here, is a multiset of sources. judges the proportion of fuzzy multisets where the membership degree of element is .

Example 14. Given the multiset in Example 11. The local precision of adding an element into a solution with membership degree 0.5 is calculated. When , , , , and :

Property 4 (monotonicity of ). Local precision is a decreasing function of the membership degree threshold : The monotonicity of is a natural property. A lower membership degree means that more sources will be added into the solution, because a higher membership degree indicates relatively simple relations to a concept (say, a synonym of a word), while a lower membership degree indicates less specific and more layered descriptions of the concept. As a result, we obtain more complete information with higher precision.
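The decreasing behaviour of local precision can be observed with a small sketch. The definition used here is only one possible reading of the text, stated as an assumption: the local precision of an element at a threshold is the multiplicity-weighted proportion of source fuzzy multisets in which the element has a membership degree of at least that threshold. The function `local_precision`, the `(fuzzy multiset, multiplicity)` pair representation, and the sample sources are hypothetical.

```python
from typing import Dict, List, Tuple

FuzzyMultiset = Dict[str, List[float]]
# Each source is paired with its multiplicity in the multiset of sources.
Sources = List[Tuple[FuzzyMultiset, int]]


def local_precision(u: str, alpha: float, sources: Sources) -> float:
    """Proportion of sources in which `u` reaches membership degree `alpha`.

    Assumption: 0 < alpha <= 1, and a source supports (u, alpha) when the
    largest membership degree of `u` in that source is at least `alpha`.
    """
    total = sum(multiplicity for _, multiplicity in sources)
    if total == 0:
        return 0.0
    support = sum(
        multiplicity
        for fm, multiplicity in sources
        if max(fm.get(u, []), default=0.0) >= alpha
    )
    return support / total


sources: Sources = [
    ({"x": [0.8, 0.5]}, 2),   # this source occurs twice
    ({"x": [0.6]}, 1),
    ({"y": [0.9]}, 1),
]

# Evaluate the same element at increasing thresholds.
curve = [local_precision("x", a, sources) for a in (0.1, 0.5, 0.7, 0.9)]
```

Under this reading, raising the threshold can only remove supporting sources, which is why the values in `curve` never increase.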

Definition 15 (purity). Purity is computed using the maximal local precision value for each element in the solution as follows: such that

Example 16. Given the multiset of Example 11 and solution , we could obtain the local precision of all elements in this solution Note that we can also set of each element in solution a different value.
Then, we get the purity

Definition 17 (local entropy). The local entropy of each fuzzy multiset in is calculated as such that

Property 5 (monotonicity of ). Local entropy is an increasing function of the membership degree threshold when , and a decreasing function of the membership degree threshold when :
Property 5 implies that the variation trend of local entropy is affected by both the fuzziness and the proportion of an element in a solution; that is, neither excessively detailed nor excessively brief information, and neither too many nor too few sources in the solution, is appropriate for enriching the information of a fusion system. The proofs of these natural properties are omitted here. This property also reflects the important connection between local precision and local entropy.

Definition 18 (total entropy). The total entropy of is calculated as such that

Example 19. Given the multiset of Example 11 and , we obtain the local precision of all elements in this solution Then, we get the total entropy

Purity and entropy each express one aspect of quality, but their scales of variation may differ. As mentioned above, maximum purity and minimum entropy of the results are the goals we try to achieve. Therefore, we seek an index in which the two measures vary on similar scales.

Definition 20 (validation index value). Given a multiset of sources , the VI-value of the validation index is calculated as such that

Next, we justify this index. Generally, a good result has a high purity value and a low entropy value. That is, if the discrepancy between these two values is large, the value of the validation index is large, and a good result can be identified by this index. In other words, the validation index expresses a balance between purity and entropy. Because the variation scales of these two values may differ, we introduce a constant that brings the purity value and the entropy value to similar variation scales. In practice, the most significant values are determined by selecting the best VI, and the constant is an empirical value that can be obtained during simulation and refined through an iterative procedure. How to determine the value of this constant is not our main concern here, and we do not discuss it in depth in this paper. In future work, we will explore this problem further with experimental analysis.

Note that for any solution , if and only if the local precisions of all elements in this solution differ from zero.

4.2. Optimization of Quality

The effect of a merge function can be judged by the quality measures introduced above. We now investigate the solutions that optimize the values of the quality measures. This type of optimization problem also appears in other research fields. Paper [27] used the transitive closure as an effective mechanism for transforming a matrix into a fuzzy equivalence relation, thereby finding approximate partitions of data sequences; it is a classic example in the field of fuzzy set theory. Another example involves searching for an approximate minimum distance by transforming a fuzzy reciprocal relation into a transitive reciprocal relation [28]. That is to say, the optimization mechanism is not unique. In the next step, we concentrate on the maximum quality in terms of the VI-value (maximizing purity and minimizing entropy). The difficulty lies in finding the solution with the best VI-value. Therefore, the main task here is to define and investigate a suitable merge function.

Definition 21 (VI-optimal merge function). A VI-optimal merge function over should satisfy the following constraint: constrained by
We now study some properties of the VI-optimal merge function. A notable point is that several solutions may share one maximum VI-value. Given the definition of the merge function, selecting a unique solution is an important task. Therefore, a selection criterion that picks one solution from the set of optimal solutions is needed when applying these merge functions. For the specific application area of OMFM, we show the details in the illustrative examples. Another problem is that a solution with a nonzero VI-value does not always exist. Hence, the notion of an invalid solution is given below.
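The selection step described here (find the maximal VI-value, then break ties with an extra criterion) can be sketched independently of how the VI-value itself is computed. In the sketch below, the scoring function `vi` is passed in as a parameter because the exact formula is not reproduced in this text; `vi_optimal`, `break_tie`, the candidate labels, and the scores are all hypothetical and serve only as an illustration.

```python
from typing import Callable, Iterable, List, Tuple, TypeVar

S = TypeVar("S")


def vi_optimal(candidates: Iterable[S], vi: Callable[[S], float],
               tol: float = 1e-9) -> Tuple[float, List[S]]:
    """Return the best VI-value and every candidate that attains it."""
    scored = [(vi(c), c) for c in candidates]
    if not scored:
        raise ValueError("at least one candidate solution is required")
    best = max(score for score, _ in scored)
    # Keep all candidates within `tol` of the best score (the tie set).
    return best, [c for score, c in scored if best - score <= tol]


def break_tie(optimal: List[S], key: Callable[[S], object]) -> S:
    """Pick one solution from the tie set using an application-specific key."""
    return min(optimal, key=key)


# Hypothetical candidates and VI-values, chosen only to produce a tie.
scores = {"S1": 0.2, "S2": 0.5, "S3": 0.5}
best_value, tie_set = vi_optimal(scores, scores.get)
chosen = break_tie(tie_set, key=lambda name: name)
```

Returning the whole tie set, rather than an arbitrary member, keeps the choice of selection criterion explicit and separate from the optimization itself.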

Definition 22 (invalid solution). Assume a VI-optimal merge function and a fuzzy multiset of sources .
A multiset is defined as an invalid solution of if A solution of a VI-optimal merge function that is not invalid is called a valid solution. Note the difference between an invalid solution and a valid solution. Next, we introduce another significant theorem.

Theorem 23. Any solution that is a proper subset of the source intersection or a proper superset of the source union has that

Proof. Assume a fuzzy multiset of source .
(1) A solution that satisfies Also it satisfies Since all the elements of the solution would generate a local precision equal to 0, it follows that
(2) A solution that satisfies also satisfies Since all the elements of the solution would generate a local precision equal to 0, it follows that

The conclusion here is that a valid solution of a VI-optimal merge function should include the intersection of the sources and should be included in the union of the sources. In view of this point, we define the intersection of the sources as the lower bound and the union of the sources as the upper bound. The formalized definition is shown as where the lower bound is denoted as and the upper bound is denoted as . Hence, we shall only consider solution that satisfies in the following section.
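The lower and upper bounds can be checked mechanically for a candidate solution. The sketch below assumes the sequence-based fuzzy multiset operators (sorted membership sequences, zero padding, position-wise minimum and maximum) and a dictionary representation; `is_within_bounds` and the sample sources are illustrative names and values, not the paper's.

```python
from functools import reduce
from typing import Callable, Dict, List

FuzzyMultiset = Dict[str, List[float]]


def _combine(a: FuzzyMultiset, b: FuzzyMultiset,
             op: Callable[[float, float], float]) -> FuzzyMultiset:
    """Apply `op` position by position to sorted, zero-padded sequences."""
    out: FuzzyMultiset = {}
    for u in set(a) | set(b):
        sa = sorted(a.get(u, []), reverse=True)
        sb = sorted(b.get(u, []), reverse=True)
        n = max(len(sa), len(sb))
        sa += [0.0] * (n - len(sa))
        sb += [0.0] * (n - len(sb))
        seq = [m for m in (op(x, y) for x, y in zip(sa, sb)) if m > 0]
        if seq:
            out[u] = seq
    return out


def included(a: FuzzyMultiset, b: FuzzyMultiset) -> bool:
    """True when each membership in `a` is at most the aligned one in `b`."""
    for u, seq in a.items():
        sa = sorted(seq, reverse=True)
        sb = sorted(b.get(u, []), reverse=True)
        sb += [0.0] * max(0, len(sa) - len(sb))
        if any(x > y for x, y in zip(sa, sb)):
            return False
    return True


def is_within_bounds(solution: FuzzyMultiset, sources: List[FuzzyMultiset]) -> bool:
    """Check: intersection of sources <= solution <= union of sources."""
    lower = reduce(lambda a, b: _combine(a, b, min), sources)
    upper = reduce(lambda a, b: _combine(a, b, max), sources)
    return included(lower, solution) and included(solution, upper)


sources = [
    {"x": [0.8, 0.5], "y": [0.3]},
    {"x": [0.6], "y": [0.9], "z": [0.4]},
]
```

For these sources the lower bound is `{"x": [0.6], "y": [0.3]}` and the upper bound is `{"x": [0.8, 0.5], "y": [0.9], "z": [0.4]}`, so a candidate such as `{"x": [0.7], "y": [0.5]}` lies within the bounds, while one that omits `"y"` does not.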

Theorem 24. A VI-optimal merge function is idempotent.

Proof. Assume the fuzzy multiset satisfies .
Thus, for , we have that The corresponding proof is also shown when applying the previous theorem.

Theorem 25. A VI-optimal merge function is bounded.

Proof. The conclusion follows from Theorems 12 and 23.

An important point is that VI-optimal merge functions do not satisfy the property of preservation. Nevertheless, by the theorem just proved, they are bounded, and boundedness offers a weaker version of preservation, as shown in the previous section. Besides boundedness, several interesting theorems on VI-optimal merge functions deserve mention. One of them, given below, states that VI-optimality is invariant under scaling of the multiplicities of the sources.

Theorem 26. Assume a fuzzy multiset and a merge function . A conclusion could be got that

Proof. Several facts could be got that And on the other hand,

Theorem 27. Assume a fuzzy multiset is a VI-optimal merge function and a scaling parameter . If the solution is VI-optimal with regard to the sources , then is VI-optimal with regard to the sources .

Proof. This is a corollary of the previous theorem.

5. An Application: Document Fusion with Illustrative Example

5.1. Document Fusion

One possible application of this fuzzy multiset framework is document fusion, which involves merging elements in which different relations are embedded. Document fusion is closely related to multidocument summarization, so we briefly introduce the latter. The important difference between the two areas is that multidocument summarization aims to generate the shortest description containing the most relevant information, while document fusion aims to generate the shortest description containing all information in the whole document set, excluding redundancy [15, 16]. Loosely speaking, multidocument summarization corresponds to the intersection of the documents and document fusion to their union. Unlike multidocument summarization, document fusion does not yet have an organization like DUC (Document Understanding Conference) [29] providing “ideal” datasets, that is, multiple documents on the same subject together with ideal results for testing. In addition, the intrinsic and extrinsic evaluations used for multidocument summarization systems are not well suited to the fusion task. Intrinsic evaluation, in which humans assess the quality of the fused document itself, makes the evaluation process subjective [30]; moreover, there is so far no collection of human-written fusion results for multiple documents that could serve as a gold standard. Extrinsic evaluation, in which the fused document is evaluated by how well it supports the completion of a specific task, makes the evaluation process more complicated. Thus, there are no standard methods for assessing work on the fusion task as there are for some document summarization tasks [31–33]. Given these problems, the evaluation that we have performed to date is limited.
To demonstrate our work, an article cluster concerning complaints about the spoilage of dairy products of a particular brand was selected from the “315 consumption complaint” website to show the general fusion process and the results of our framework. Although we use Chinese text for illustration, there is no fundamental difference between Chinese, English, or other languages under this framework.

This paper proposes a framework for document fusion, so we aim to obtain not only keywords but comprehensive information. Here, we consider only the situation of fuzzy multisets. With this extension, the membership degree can express the importance and fuzziness of an element, which makes the document representation more granular and semantically richer than the multiset merging model in [11]. Assigning different weights to the same element also makes sense when semantically related terms with different semantic relations are used to identify a concept, which is semantically richer than using words alone. Under our framework, semantic methods and statistical methods can be combined and used in many domains.

5.2. Illustrative Example

The main processing step is to obtain the Extra Strong, Strong, and Medium Strong relations of every concept in each article by using HowNet [34]. As a common-sense knowledge base, HowNet unveils interconceptual and interattribute relations of concepts. In HowNet, every concept of a word or phrase and its description form one entry, with relations such as hypernym, hyponym, synonym, antonym, meronym, and holonym (described in Table 1) presented in the DEF (concept definition), as shown in Box 1.

When processing English text, WordNet, a large lexical database of English, could be used instead of HowNet to identify these relations. Here, the textual intention structure is determined by the three relations of every concept. As indicators in linguistic segments, the three relation segments of every concept tend to indicate the theme segments. That is, once the three relations of every concept have been identified, the corresponding linguistic segments have a determinate tendency. Each concept is defined as an element in the fuzzy multisets, and the three relation segments included in each concept determine the different membership degrees of each element, as shown in Table 2.

In the following example, three semantic segments concerning different concepts , , and will be used in merge process. The corresponding fuzzy multisets obtained from the source set are shown below: where denotes there are Extra Strong semantic relations identified in concept .

As mentioned above, document fusion produces the shortest description containing all information found within the document set, but without repetition. The solution we need is one that includes all the key concepts (, , and in this example), and these concepts are constructed from three relations (Extra Strong, Strong, and Medium Strong relations). Let us consider the solution . We get the local precision of all elements in this solution such that And we calculate the local entropy such that Then, we obtain the validation index by setting the constant value :

Following the same principle, we present the local precision, purity, local entropy, entropy, and VI-value of the solutions that embed only one semantic relation in each concept. When every concept is treated as equally important and embeds a single relation, the maximal VI-value is 0.355 (see Table 3).

As mentioned in the previous section, a valid solution of a VI-optimal merge function must satisfy the following constraint:

In practical applications, only solutions that satisfy the constraint mentioned above should be considered. More complicated solutions can then also be considered. For example, when a concept is important and needs to be described explicitly in the fusion result, more details, such as the Medium Strong relation in Table 2, should be included in its description. When the concepts are not treated as equally important with a single relation embedded, two solutions attain the maximal VI-value of 0.491 (see Table 4). In general, we can calculate the VI-value of any solution and choose the solution with the maximal VI-value. Within the scope of this paper, a tie-breaking criterion does not always exist, so an auxiliary choice criterion that helps select one solution from the set of optimal solutions is necessary. For the two tied solutions here, the merge function must be decided by the actual requirement or by finding a new solution with more semantic relations embedded.
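The selection step described above can be sketched in a few lines of Python. This is a minimal sketch, not the procedure used to produce the tables: the solution labels `s_a`, `s_b`, and `s_c` and the auxiliary score (the number of relations embedded in a solution) are hypothetical, and only the VI-values 0.491 and 0.355 are taken from the text.

```python
import math


def optimal_solutions(vi_values, tol=1e-9):
    """Return all solution labels whose VI-value equals the maximum (ties kept)."""
    top = max(vi_values.values())
    return sorted(
        s for s, v in vi_values.items() if math.isclose(v, top, abs_tol=tol)
    )


def select_solution(vi_values, auxiliary_score):
    """Pick one optimal solution, breaking ties with a caller-supplied criterion."""
    tied = optimal_solutions(vi_values)
    return max(tied, key=auxiliary_score)


# Hypothetical labels; two solutions tie at the maximal VI-value.
vi_values = {"s_a": 0.491, "s_b": 0.491, "s_c": 0.355}

# Assumed auxiliary criterion: prefer the solution with more relations embedded.
relations_embedded = {"s_a": 4, "s_b": 5, "s_c": 3}

tied = optimal_solutions(vi_values)
chosen = select_solution(vi_values, lambda s: relations_embedded[s])
```

Keeping the tie set explicit makes the need for an auxiliary criterion visible: the VI-value alone narrows the candidates to `tied`, and the application-specific score decides among them.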

Tables 5 and 6 show further examples in which two concepts are considered important and need to be described explicitly in the fusion result. When the two concepts embed multiple relations on a similar scale, two solutions attain the same maximum VI-value of 0.410, which means that the two strategies have the same effect. If only one concept embeds multiple relations, the maximum VI-value of 0.491 is attained by either of two solutions. In both situations, the final choice depends on the specific application. In the first case, the two solutions select the Strong and Medium Strong relations for the two concepts, so the importance of each concept should be considered further: if one concept is more important for fusion, the solution that favors it should be selected. In the second case, when the fusion result needs to be described in more detail, the more detailed solution should be selected; otherwise, the other solution is also a candidate for fusion.

Figure 2 shows the distribution of the VI-values for different scales of embedded semantic relations as a box-and-whisker plot and evaluates the effectiveness of the merge function. Only part of the VI-value observations are presented here. The top bar stands for the maximum observed value, and the bottom bar represents the lowest observed value. The bottom of the box is the lower quartile, with 25% of the data less than this value; the top of the box is the upper quartile, with 25% of the data greater than this value; and the red line in the box is the median, with 50% of the data greater than this value. The x-axis presents the functions we examined (single relation embedded in a single concept as shown in Table 3, multiple relations embedded in a single concept as shown in Tables 4, 5, and 6, and the upper solution and lower solution). The y-axis presents the VI-values. The VI-values achieved by different merge functions span a wide range, so simple strategies are unlikely to work well. The strategy with multiple relations embedded outperforms the strategy with only one relation embedded, as it produces very good results on the VI-values we examined. Thus, to obtain high-quality results, it is sufficient to optimize the merge function over different scales of multiple relations embedded in a concept.
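The quantities drawn in a box-and-whisker plot can be computed directly, as in the following Python sketch. The list of VI-values is illustrative only and is not the data behind Figure 2; the quartile convention (the "inclusive" method of the standard library) is likewise an assumption, since the text does not state which convention the figure uses.

```python
import statistics


def box_summary(values):
    """Five-number summary: minimum, lower quartile, median, upper quartile, maximum."""
    data = sorted(values)
    # quantiles(..., n=4) returns the three cut points Q1, median, Q3.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return {
        "min": data[0],       # bottom bar of the plot
        "q1": q1,             # bottom of the box
        "median": median,     # line inside the box
        "q3": q3,             # top of the box
        "max": data[-1],      # top bar of the plot
    }


# Illustrative VI-values for one group of merge functions (hypothetical data).
vi_values = [0.355, 0.410, 0.491, 0.300, 0.250]
summary = box_summary(vi_values)
```

Comparing such summaries across groups of merge functions gives the same reading as the figure: a group whose box and median sit higher on the VI-value axis corresponds to a better strategy.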

We have not evaluated every merge function and its VI-value in this illustrative example; the data and explanation above are intended only to illustrate the proposed framework. The example shows that a VI-value can be computed for any solution and that the VI-value can be used to select the best solution. Our framework is flexible enough to generate the shortest description containing all information found within the document sets, at different levels of detail depending on practical requirements. With this framework, neither the relations between the keywords nor the context of the co-text is lost. To generate a moderately fluent semantic fusion result from a collection of documents, sentence planning and regeneration are then used to combine the segments into a coherent whole. In this paper, the framework addresses the basic issues in developing a document fusion system. The documents are fused at more levels of granularity, because we assign different weights to the different semantic relations embedded in the same concept element. Meanwhile, taking semantic relations into account in the fusion process ensures the readability of the fused document.

6. Conclusion and Future Work

We have presented OMFM, a framework that maps fuzzy multisets of objects onto one object. In this framework, a document set is modeled as a multiset of documents and each document is modeled as a fuzzy multiset of concepts. OMFM extends the work in [11]: it describes both the uncertainty and the repeatability of natural language by using the membership degree to express the semantic fuzziness of the objects. Quality measures widely used in the text mining literature are defined to quantify the result of a merge function: purity (a measure of correctness) and entropy (a measure of completeness), where the maximum purity is attained by the upper solution and the minimum entropy is attained by the lower solution. We then constructed the VI-optimal merge function to obtain the best solution, in which higher purity and lower entropy are achieved simultaneously. Moreover, we have proved properties related to the constraints of the merging problem. Finally, applying OMFM to a document fusion task illustrates the practicality and effectiveness of our work.

Given its theoretical value and application prospects, the object merging problem is likely to attract research interest in many domains. Future work will focus on experimental research and on applying this framework to more related problems.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

This research has been supported by the National Natural Science Foundation of China (Grant no. 61472049), the National Natural Science Foundation for Young Scholars of China (Grant no. 61300148), and the Key Scientific and Technological Project of Jilin Province (Grant no. 20130206051GX). In addition, Dr. Lin Yue has been awarded a scholarship under the State Scholarship Fund to pursue her study at the University of Queensland as a joint Ph.D. student, and this work has also been supported by the China Scholarship Council (CSC). Finally, the authors thank the anonymous reviewers for their constructive comments, which have substantially improved this paper.

References

[1] D. Dubois and H. M. Prade, Fundamentals of Fuzzy Sets, Kluwer Academic, 2000.
[2] R. R. Yager, “On ordered weighted averaging aggregation operators in multicriteria decisionmaking,” IEEE Transactions on Systems, Man, and Cybernetics, vol. 18, no. 1, pp. 183–190, 1988.
[3] M. Mursaleen and A. K. Noman, “Applications of Hausdorff measure of noncompactness in the spaces of generalized means,” Mathematical Inequalities & Applications, vol. 16, no. 1, pp. 207–220, 2013.
[4] G. Beliakov and S. James, “On extending generalized Bonferroni means to Atanassov orthopairs in decision making contexts,” Fuzzy Sets and Systems, vol. 211, pp. 84–98, 2013.
[5] R. R. Yager and A. Rybalov, “Uninorm aggregation operators,” Fuzzy Sets and Systems, vol. 80, no. 1, pp. 111–120, 1996.
[6] J. Lin and A. O. Mendelzon, “Merging databases under constraints,” International Journal of Cooperative Information Systems, vol. 7, no. 1, pp. 55–76, 1998.
[7] S. Konieczny and R. P. Pérez, “Merging information under constraints: a logical framework,” Journal of Logic and Computation, vol. 12, no. 5, pp. 773–808, 2002.
[8] A. M. Pitts, “Nominal logic, a first order theory of names and binding,” Information and Computation, vol. 186, no. 2, pp. 165–193, 2003.
[9] S. Destercke, D. Dubois, and E. Chojnacki, “Possibilistic information fusion using maximal coherent subsets,” IEEE Transactions on Fuzzy Systems, vol. 17, pp. 79–92, 2009.
[10] K. McKeown, R. J. Passonneau, D. K. Elson, A. Nenkova, and J. Hirschberg, “Do summaries help?” in Proceedings of the 28th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '05), pp. 210–217, ACM, August 2005.
[11] A. Bronselaer, D. van Britsom, and G. de Tré, “A framework for multiset merging,” Fuzzy Sets and Systems, vol. 191, pp. 1–20, 2012.
[12] A. Bronselaer and G. de Tré, “Aspects of object merging,” in Proceedings of the Annual North American Fuzzy Information Processing Society Conference (NAFIPS '10), IEEE, July 2010.
[13] P. Pantel and D. Lin, “Document clustering with committees,” in Proceedings of the 25th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, ACM, 2002.
[14] M. Steinbach, G. Karypis, and V. Kumar, “A comparison of document clustering techniques,” in Proceedings of the KDD Workshop on Text Mining, Boston, Mass, USA, 2000.
[15] V. Dodonov, “Purity- and entropy-bounded uncertainty relations for mixed quantum states,” Journal of Optics B: Quantum and Semiclassical Optics, vol. 4, no. 3, pp. S98–S109, 2002.
[16] C. Monz, “Document fusion for comprehensive event description,” in Proceedings of the Workshop on Human Language Technology and Knowledge Management, Association for Computational Linguistics, 2001.
[17] J. Azzopardi and C. Staff, “Fusion of news reports using surface-based methods,” in Proceedings of the 26th International Conference on Advanced Information Networking and Applications Workshops (WAINA '12), pp. 809–814, Fukuoka, Japan, March 2012.
[18] D. R. Radev, “A common theory of information fusion from multiple text sources step one: cross-document structure,” in Proceedings of the 1st SIGdial Workshop on Discourse and Dialogue, vol. 10, Association for Computational Linguistics, 2000.
[19] L. A. Zadeh, “Fuzzy sets,” Information and Control, vol. 8, pp. 338–353, 1965.
[20] H.-J. Zimmermann, Fuzzy Set Theory—and Its Applications, Kluwer Academic, 4th edition, 2001.
[21] S. Miyamoto, “Multisets and fuzzy multisets,” in Soft Computing and Human-Centered Machines, pp. 9–33, Springer, 2000.
[22] J. Casasnovas and F. Rosselló, “Scalar and fuzzy cardinalities of crisp and fuzzy multisets,” International Journal of Intelligent Systems, vol. 24, no. 6, pp. 587–623, 2009.
[23] T. Y. Nishida, “Multiset and K-subset transforming systems,” in Proceedings of the Workshop on Multiset Processing, pp. 203–217, 2000.
[24] S. Miyamoto, “Fuzzy multisets and their generalizations,” in Multiset Processing, vol. 2235 of Lecture Notes in Computer Science, pp. 225–235, Springer, Berlin, Germany, 2001.
[25] J. Lin and A. O. Mendelzon, “Knowledge base merging by majority,” in Dynamic Worlds, pp. 195–218, Springer, Berlin, Germany, 1999.
[26] A. Bronselaer, G. de Tré, and D. van Britsom, “Multiset merging: the majority rule,” in Eurofuse 2011, pp. 279–292, Springer, Berlin, Germany, 2012.
[27] Y.-J. Wang, “A clustering method based on fuzzy equivalence relation for customer relationship management,” Expert Systems with Applications, vol. 37, no. 9, pp. 6421–6428, 2010.
[28] S. Freson, H. de Meyer, and B. de Baets, “An algorithm for generating consistent and transitive approximations of reciprocal preference relations,” in Computational Intelligence for Knowledge-Based Systems Design, vol. 6178 of Lecture Notes in Computer Science, pp. 564–573, 2010.
[29] http://duc.nist.gov.
[30] R. Barzilay, K. R. McKeown, and M. Elhadad, “Information fusion in the context of multi-document summarization,” in Proceedings of the 37th Annual Meeting of the Association for Computational Linguistics on Computational Linguistics, Association for Computational Linguistics, 1999.
[31] Y. Chen, X. Lou, and J. Pan, “Research on multi-document summarization using lexical cohesion,” in Proceedings of the International Conference on Web Information Systems and Mining (WISM '09), pp. 118–122, Shanghai, China, November 2009.
[32] C. Dang and X. Luo, “WordNet-based document summarization,” in Proceedings of the 7th WSEAS International Conference on Applied Computer & Applied Computational Science (ACACOS '08), 2008.
[33] K. R. McKeown, J. L. Klavans, V. Hatzivassiloglou, R. Barzilay, and E. Eskin, “Towards multidocument summarization by reformulation: progress and prospects,” in Proceedings of the 16th National Conference on Artificial Intelligence (AAAI '99), pp. 453–460, July 1999.
[34] http://www.keenage.com/.

Copyright

Copyright © 2014 Lin Yue et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
