Abstract

Functional dependency is the basis of database normalization. Various types of fuzzy functional dependencies have been proposed for fuzzy relational databases and applied to the process of database normalization. However, the problem of achieving lossless join decomposition arises when these fuzzy functional dependencies are applied to database normalization in extended possibility-based fuzzy data models. To resolve the problem, this study defines a fuzzy functional dependency based on a notion of approximate equality for extended possibility-based fuzzy relational databases. Examples show that the notion is more applicable than other similarity concepts to research related to the extended possibility-based data model. We provide a decomposition method that uses the proposed fuzzy functional dependency for database normalization and prove the lossless join property of the decomposition method.

1. Introduction

Database normalization plays a crucial role in the design theory of relational databases: it avoids insertion, deletion, and update anomalies in a database. Normalization involves decomposing a relation schema (table) into several smaller ones. The essential requirement of the decomposition is the lossless join property, which ensures that the original relation can be obtained from its decomposed results via combination operations [1]. Several methods have been proposed to design normalized relation schemes based on the keys and functional dependencies of a relation to achieve lossless join decomposition [2, 3]. The design theory has been applied to fuzzy databases, in which uncertain and imprecise information can be represented and manipulated. Fuzzy databases extend classical databases based on fuzzy sets and possibility theory [4]; they include resemblance-based fuzzy models [5, 6] and possibility-based fuzzy models [7, 8]. In the context of fuzzy databases, fuzzy functional dependency (FFD) has emerged as an extension of classical functional dependency that represents functional relationships between classes/attributes of objects in fuzzy database models. Various FFD definitions have been proposed in some fuzzy data models for database normalization [9, 10].

However, very few studies discuss the lossless join property for normalization in possibility-based fuzzy databases. Achieving lossless join decomposition by using FFDs is more difficult for possibility-based fuzzy databases than for resemblance-based fuzzy databases, especially for the extended possibility-based fuzzy database. The extended possibility-based fuzzy database [7] extends the possibility-based fuzzy database [8] by including a resemblance-based fuzzy model [6]. In this fuzzy database, attribute values can be possibility distributions of the attribute on its domain. Additionally, the elements in a domain have some degree of resemblance. Previous work has applied FFDs to decomposition for this fuzzy database [10–12]. Informally, these FFDs are based on a certain degree of similarity between two attribute values. Namely, two tuples that are similar but not identical might be regarded as redundant. Applying the similarity-based FFDs to relation decomposition makes lossless join decomposition difficult in two respects: (i) redundancy removal: how to eliminate redundant tuples that are not identical from the decomposed results so that the results can later be used to reproduce the original relation without losing information and (ii) tuple merging: how to combine two relations by merging tuples whose attribute values are similar but not identical.

Complicating this problem further, most similarity measures [7, 13, 14] for values in the form of possibility distributions are not transitive. When tuple redundancy is determined by a nontransitive similarity measure, the result of eliminating redundant tuples from a decomposed relation might not be unique (that is, it is order sensitive; see endnote 1). Inconsistent redundancy removal not only leads to unstable results of data integration, as described in [15], but also makes the decomposition lossy. When the decomposition result of a relation is not unique, the combination of the result has many possible outcomes, at least one of which differs from the original relation. Accordingly, the decomposition inevitably violates the lossless join property. Moreover, nonunique results also occur in relation combination when the attribute values to be joined/merged are related by a nontransitive similarity relation.

To avoid nontransitivity, Chen et al. provided an FFD with an embedded classical FD [11, 16], where redundancy removal is restricted to duplicate tuples. However, this restriction pulls the normalization process back to the traditional operations on crisp data. To obtain a transitive relationship among tuples, some research applied the max-min transitive closure to the relationship matrix of similarity degrees between tuples [17]. The max-min transitive closure of a relationship matrix is always a matrix with max-min transitivity [18]. By referring to the transitive closure of the relationship matrix, the tuples whose similarity is higher than a given threshold can be grouped into disjoint sets. The tuples in the same set are regarded as redundant. However, this approach cannot determine the similarity of two tuples by examining those two tuples alone, and the similarity changes when other tuples are inserted or deleted. This nondeterministic and dynamic behavior is impractical for databases.

To our knowledge, very few studies provide a complete guideline for normalization that ensures lossless join decomposition in the fuzzy databases. The purpose of this study is to fill this gap. This study first proposes a notion of approximate equality, which represents a transitive equivalence relation among tuples. Then, it provides new definitions of FFD and lossless join decomposition based on approximate equality for the fuzzy databases. Both functional dependencies and lossless join decomposition in a traditional database are special cases of this proposal. Examples show that the notion is more applicable than other similarity concepts to research related to the fuzzy databases. This work also provides a method for achieving lossless join decomposition in the fuzzy databases.

The remainder of this paper is organized as follows. Section 2 gives a brief introduction to database normalization and fuzzy database and the survey of the similarity measures related to the fuzzy database. Section 3 demonstrates the problem of using nontransitive similarity measures for determining tuple redundancy and provides a notion of approximate equality for it. The FFD is then defined based on the approximate equality in Section 4. Besides, the lossless join decomposition is proposed for the fuzzy databases, and its property is proven as well. Section 5 draws the conclusion of this paper.

2. Preliminaries

This section first briefly reviews the essential operations for lossless join decomposition in traditional databases. Then, it introduces the fuzzy databases considered in this work and the similarity measures of values in form of possibility distribution.

2.1. Essential Operations for Lossless Join Decomposition

In a traditional relational database, a row is called a tuple, a column header is called an attribute, and the table is called a relation. Given an $n$-ary relation schema $R(A_1, \ldots, A_n)$, an instance of $R$, denoted by $r$, is the set of all tuples in $R$. Let $X$ and $Y$ denote sets of attributes of $R$. A functional dependency (FD) $X \to Y$ existing in $R$ states that tuples having the same values on attributes $X$ must be identical on $Y$, where $X, Y \subseteq \{A_1, \ldots, A_n\}$. Two operations are related to lossless join decomposition: projection and natural join. Projection generates a result by selecting certain attributes from a given relation and removing redundant tuples. The result of projecting $r$ over attributes $X$ is $\Pi_X(r) = \{t[X] \mid t \in r\}$, where $t[X]$ represents the composite of values on $X$ in tuple $t$. The natural join (denoted by $\bowtie$) of instances $r_1$ and $r_2$ on schemas $R_1$ and $R_2$ is obtained by removing the duplicate attributes from the result of the equal join on the joined attributes: $r_1 \bowtie r_2 = \{t \mid t[R_1] \in r_1 \text{ and } t[R_2] \in r_2\}$. The natural join and projection operations are used to combine and decompose relations, respectively.

Formally, a decomposition of $R$ into $R_1$ and $R_2$ is lossless join if the equation $r = \Pi_{R_1}(r) \bowtie \Pi_{R_2}(r)$ holds, where $\Pi$ denotes projection and $\bowtie$ denotes natural join. In other words, lossless join decomposition ensures that combining the decomposed results of a relation via natural join yields neither spurious tuples nor missing tuples with respect to the original relation [1].

For example, given a relation , the results of and are shown in Table 1.

In this case, the decomposition of has lossless join property because the natural join result of and is exactly the same as (as shown in Table 2).
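The classical check described in this subsection can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the relation, its attribute names (`emp`, `dept`, `mgr`), and the helper names `project`, `natural_join`, and `is_lossless` are hypothetical and do not reproduce Tables 1 and 2.

```python
def project(relation, attrs):
    """Projection: keep only `attrs`; the set drops duplicate tuples."""
    return {frozenset((a, t[a]) for a in attrs) for t in relation}


def natural_join(r1, r2):
    """Natural join: combine tuples that agree on all shared attributes."""
    out = set()
    for t1 in r1:
        d1 = dict(t1)
        for t2 in r2:
            d2 = dict(t2)
            if all(d1[a] == d2[a] for a in d1.keys() & d2.keys()):
                out.add(frozenset({**d1, **d2}.items()))
    return out


def is_lossless(relation, attrs1, attrs2):
    """True if joining the two projections reproduces the relation exactly."""
    original = {frozenset(t.items()) for t in relation}
    return natural_join(project(relation, attrs1),
                        project(relation, attrs2)) == original


# Hypothetical crisp relation in which dept -> mgr holds.
r = [
    {"emp": "e1", "dept": "d1", "mgr": "m1"},
    {"emp": "e2", "dept": "d1", "mgr": "m1"},
    {"emp": "e3", "dept": "d2", "mgr": "m1"},
]

lossless_split = is_lossless(r, ["emp", "dept"], ["dept", "mgr"])
lossy_split = is_lossless(r, ["emp", "mgr"], ["dept", "mgr"])
```

In this toy instance the split on `dept` is lossless, whereas joining on `mgr` produces spurious tuples because `mgr` does not determine the other attributes.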

2.2. The Fuzzy Databases

In the last two decades, fuzzy concepts have been incorporated in traditional databases [5, 8, 19] and applied to measure the relation between data [20–22]. The fuzzy databases enable dealing with imprecision and uncertainty in the real world based on the theory of fuzzy sets and possibility distribution theory. The possibility-based fuzzy theory has been widely applied in environmental management, such as flood-diversion planning [23], water resources management [24], and air quality management [25]. This work considers the extended possibility-based databases proposed by Chen et al. [7] because this model captures both the possibility-based fuzzy model and the resemblance-based fuzzy concept. This fuzzy database has attracted considerable research on semantic measures, information processing, update operations, and the associated UML class diagrams [20, 26, 27]. The data model of the fuzzy databases is a hybrid of the possibility-based data model in [8] and the resemblance-based data model in [6]. The possibility-based model derives from Zadeh’s fuzzy theory. In the theory [4], a fuzzy set on a universe of discourse is described by , where is a membership function for the fuzzy set and denotes the degree of membership of in . In a possibility-based database [8], the value of an attribute on a domain is a possibility distribution , where denotes the possibility that is the actual value of . For example, and . An example of applying the possibility-based fuzzy theory (see endnote 2) in the real world is shown below. Suppose that the domain of attribute “eye color” is {black, brown, blue, green} and a possibility distribution is given below: Suppose that John’s eye color is an “Asia color.” Then, according to the interpretation for possibility-based fuzzy theory, one concludes that the possibility of John’s eye being brown blue color is 0.3.

In the extended possibility-based database, attribute values are represented by possibility distributions of an attribute on its domain, and a domain is associated with a similarity relation of domain elements. Formally, an -ary relation instance on a schema in the fuzzy database is a subset of Cartesian product of , where represents a set of all possibility distributions of attribute on its domain. For a domain , a proximity relation is given to describe the resemblance between domain elements in . A proximity is a mapping with reflexivity and symmetry; that is, and . The elements in a domain cannot directly be partitioned into disjoint equivalent classes by a threshold cutting on the proximity relation for the domain elements.

To acquire equivalence classes of a proximity relation on a domain, Shenoi et al. [18] proposed the $\alpha$-proximate relation. Two elements are $\alpha$-proximate (denoted by ) if or there exists a sequence , such that . Given a proximity relation and a threshold for domain , the domain can be partitioned into disjoint subsets (called $\alpha$-proximate equivalence classes) such that the elements in a partition are $\alpha$-proximate. The equivalence classes are regarded as basic concepts for the methods reviewed or proposed hereinafter.
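The partition described above can be sketched as the connected components of the graph that links two domain elements whenever their proximity reaches the threshold. The code below is a minimal illustration, not the construction in [18]: the domain, the proximity degrees in `PROX`, and the names `prox`, `proximate_classes`, and `alpha` are assumptions made for the example.

```python
# Hypothetical proximity degrees on a domain of grades (symmetric, reflexive).
PROX = {
    frozenset(("excellent", "good")): 0.8,
    frozenset(("good", "fair")): 0.7,
    frozenset(("excellent", "fair")): 0.4,
    frozenset(("fair", "poor")): 0.3,
}


def prox(x, y):
    """Proximity lookup: 1.0 on the diagonal, 0.0 for unlisted pairs."""
    return 1.0 if x == y else PROX.get(frozenset((x, y)), 0.0)


def proximate_classes(domain, proximity, alpha):
    """Partition `domain` so that chained proximity >= alpha shares a class."""
    parent = {x: x for x in domain}

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, x in enumerate(domain):
        for y in domain[i + 1:]:
            if proximity(x, y) >= alpha:
                parent[find(x)] = find(y)

    classes = {}
    for x in domain:
        classes.setdefault(find(x), set()).add(x)
    return sorted(sorted(c) for c in classes.values())


domain = ["excellent", "good", "fair", "poor"]
classes_07 = proximate_classes(domain, prox, 0.7)
classes_075 = proximate_classes(domain, prox, 0.75)
```

With the threshold 0.7, "excellent" and "fair" fall into the same class through the chain via "good", although their direct proximity is only 0.4. This chaining is what turns a nontransitive proximity relation into disjoint classes.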

By extending traditional functional dependency, researchers have proposed a variety of fuzzy functional dependencies (FFDs) for fuzzy databases [14, 21, 28]. The FFDs are determined by the degree of similarity of attribute values rather than by their identity. Several similarity measures of attribute values have been proposed for the extended possibility-based frameworks [7, 16, 20, 29]. Most of them provide estimates within the interval $[0, 1]$. The similarity measures are briefly restated hereinafter, in which and , respectively, denote the proximity relation and a threshold defined on a given domain ; and represent two possibility distributions on . The degree of closeness between and , denoted by , is defined as follows [29]: where .

The measure of may give low degree of similarity for two values that are very similar to each other, for example, . To prevent some counter-intuitive estimates of , Chen et al. defined the possibility that is true as shown below [7] (here denotes minimum): This assessment is widely adopted in the extended possibility-based databases and is adoptable for the application with subnormal distribution (i.e., , or see [4] for details). For normal distribution, Chen et al. [16] included identity relation (denoted by ) into (4) as follows:

Ma et al. defined the similarity measure from the perspective of the semantic closeness between two attribute values [20] as shown below: where denotes a semantic inclusion degree. Consider the following: The notion of may violate the convention that the similarity degree of two values lies within $[0, 1]$; for the case that and when the similarity of “excellence” and “good” is larger than the given threshold; that is, . It is difficult to set a proper threshold for estimates that fall outside $[0, 1]$ and have an unpredictable upper bound.

Liu et al. [13] extended the semantic equivalence to ensure that the result of the similarity measure lies within $[0, 1]$. The measurement adjusts the possibility distributions of values based on the $\alpha$-proximate equivalence classes of the domain before measuring their similarity. Let be the $\alpha$-proximate equivalence classes of domain . The adjusted value of possibility distribution is defined as follows: where and , . Then,

Although the methods mentioned above differ from each other on measuring similarity of attribute values, most of the methods of measuring the similarity of tuples are the same. The methods adopt the minimum of the similarity of each pair of attribute values. Given tuples and , the resemblance of tuples and , denoted by , is given by where could be either , , , , or in (3)–(9). Tuples and are redundant to each other if , where is a given threshold. The similarity measure of tuples has been applied to extract representative tuples for reducing information redundancy [17].
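The tuple-level rule stated above (take the minimum of the attribute-wise similarities and compare it with a threshold) can be sketched as follows. The value-level measure is deliberately left pluggable; the lookup table `SIM`, the labels in it, and the names `value_similarity`, `tuple_similarity`, `is_redundant`, and `beta` are hypothetical stand-ins rather than any of the measures in (3)–(9).

```python
# Hypothetical value-level similarity degrees between attribute values.
SIM = {
    ("young", "about 25"): 0.8,
    ("high", "fairly high"): 0.6,
}


def value_similarity(a, b):
    """Stand-in for a value-level measure; symmetric and reflexive."""
    if a == b:
        return 1.0
    return SIM.get((a, b), SIM.get((b, a), 0.0))


def tuple_similarity(t1, t2, measure):
    """Resemblance of two tuples: minimum over attribute-wise similarities."""
    return min(measure(a, b) for a, b in zip(t1, t2))


def is_redundant(t1, t2, measure, beta):
    """Two tuples are redundant if their resemblance reaches threshold beta."""
    return tuple_similarity(t1, t2, measure) >= beta


t1 = ("young", "high")
t2 = ("about 25", "fairly high")
resemblance = tuple_similarity(t1, t2, value_similarity)
```

Because the minimum is used, a single dissimilar attribute is enough to prevent two tuples from being treated as redundant.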

Fuzzy functional dependency (FFD) is a concept derived from traditional FD. Both FFD and FD have several applications in databases, for example, redundancy elimination [30], missing data prediction, fuzzy data compression [17, 31], and lossless join decomposition [10, 28, 32]. In the literature, various FFDs are defined for different fuzzy data models. For some fuzzy data representations, such as the similarity-based fuzzy data model [33], FFDs are defined based on the equivalence classes of tuples. In the extended possibility-based databases, the definition of FFD also varies; see, for example, [10, 14, 34, 35]. One of the FFD definitions in the literature is given below.

Definition 1 (see [10], fuzzy functional dependency). Let denote that attribute is fuzzy functionally dependent on attribute in a relation . The FFD: holds in the instance if and only if for every .

The example helps in understanding the problem of applying the FFDs on relation decomposition in the fuzzy databases illustrated in Section 3.

3. Redundancy Removal and Tuple Merging

Several factors determine whether the relation decomposition possesses the lossless join property. They are the ways to decompose a relation, to remove redundant tuples, and to combine the decomposed results. Redundancy removal is to eliminate redundant tuples. If the similarity measures used to measure tuple redundancy are not transitive, the result of redundancy removal could be nonunique. An example of nontransitivity is that tuples and are redundant to each other, and and are redundant as well, but and are not redundant. In this case, the result of redundancy removal will be if is deleted first, which differs from the one-tuple result (either or ) when first deleting the tuples other than . The nontransitivity makes the result of redundancy removal order sensitive and hinders the lossless join decomposition.
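The order sensitivity described above can be reproduced with a small sketch. It assumes a greedy removal procedure (a tuple is kept only if it is not redundant to any tuple already kept) and uses integers as stand-ins for tuples; the relations `near` and `same_class` and the function `remove_redundant` are hypothetical names introduced for illustration.

```python
def remove_redundant(tuples, redundant):
    """Greedy removal: keep a tuple only if no kept tuple is redundant to it."""
    kept = []
    for t in tuples:
        if not any(redundant(t, k) for k in kept):
            kept.append(t)
    return kept


def near(x, y):
    """Reflexive and symmetric but not transitive: 0~1 and 1~2, yet not 0~2."""
    return abs(x - y) <= 1


def same_class(x, y):
    """An equivalence relation: values sharing a bucket of width 2."""
    return x // 2 == y // 2


# Nontransitive case: the surviving tuples depend on the processing order.
result_a = remove_redundant([0, 1, 2], near)
result_b = remove_redundant([1, 0, 2], near)

# Transitive case: every order keeps exactly one representative per class.
result_c = remove_redundant([0, 1, 2], same_class)
result_d = remove_redundant([1, 0, 2], same_class)
```

With `near`, one order keeps two tuples and the other keeps one. With `same_class`, both orders keep one representative per class, and the representatives from the two runs are pairwise equivalent.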

Nevertheless, most well-defined similarity measures [7, 10, 20, 29] for the values of possibility distribution are reflexive and symmetric but not transitive. For example, consider adopting (4) to measure the similarity of tuples. Given three values , , and , then we have , , and . Considering tuples , , and , we have and but for any according to (10). Thus, the similarity measure of tuples is not transitive.

In generalizing projection and equal join operations of traditional database to fuzzy databases, when the redundancy removal is order sensitive, it is hard to obtain lossless join decomposition. Consider the case that holds in the instance of relation based on Definition 1; namely, for every . Assuming that consists of three tuples , , and is a key attribute, it is possible that the two values in each of pairs , , , and are redundant to each other, but and are not. Since , should be decomposed to avoid redundancy. After decomposing to and , if tuple is first removed because it is redundant to tuple , the result of contains two tuples and . The natural join of and generates a four-tuple result , , , , which contains spurious tuple.

To resolve this problem, this study proposes projection and equal join operations for the fuzzy databases, both of which involve the evaluation of redundancy and tuple merging. Since the decomposition of relations is based on FFD, it depends on the similarity of tuples. For the data in the fuzzy model, (3)–(9) can be used to measure the similarity of tuples and define FFDs in the fuzzy databases. However, (5) restricts redundant tuples to exact duplicates, and (3), (4), and (6) lack transitivity. Therefore, this work adopts (9) and (10) to define approximate equality for tuples that might not be identical but have a high degree of similarity. The approximate equality yields a unique result of redundancy removal.

Definition 2 (approximately equal tuples). Two tuples and are approximately equal, denoted by , if it is satisfied that .

In other words, tuples and are approximately equal if their similarity .

Lemma 3. The approximate equality of tuples (or attribute values) is transitive.

Proof. Based on (9), it is obvious that if and , then . Thus, if and , then based on (10).

The tuples of approximate equality are considered to be redundant to each other. The notion of approximate equality can be applied to query processing with the predicate containing fuzzy concept [36] for fuzzy databases in different models. For simplicity, we let denote hereafter.

Example 4. Given values and on domain and equivalent classes and for , the average possibilities of are and , yielding . Likewise, , , and . We have and . Thus, . Given on , we have even though is not identical to .

Proposition 5. The approximate equality can be used to classify values of the fuzzy database into disjoint sets (equivalence classes).

Proof. Based on the definition of (9), it is obvious that is reflexive and symmetric; that is, and for values and . Besides, approximate equality is transitive according to Lemma 3. Therefore, two sets of approximately equal values are either disjoint or the same class, in which any two of the values are approximately equal to each other.

The transitivity of a similarity measure is important to any operation involving redundancy removal or tuple merging. Besides, a transitive measure can be applied to clustering methods or data groupings, such as the ones in [36, 37].

Proposition 6. Given and its adjusted value   following (8), .

Proof. It is obvious by the definition of (9).

Buckles and Petry first proposed the way of tuple merging and applied it to remove redundant tuples in a fuzzy database [5]. Tuple merging can also be used at join operation. This study extends the tuple merging of Chen et al. [16] to be (11) for relation combination as well as redundancy removal. Given tuples and , tuple merging of and , denoted by , is given by where each (or ) is the adjusted value of (or ) according to (8) and denotes fuzzy union. For single-value tuples and , tuple merging is alternatively denoted by .
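As a rough illustration of the merging step, the sketch below applies the standard fuzzy union (pointwise maximum) to each pair of attribute values. It is a simplification: possibility distributions are represented as plain dictionaries, the adjustment of values according to (8) is omitted, and the names `fuzzy_union` and `merge_tuples` are hypothetical.

```python
def fuzzy_union(p1, p2):
    """Standard fuzzy union: pointwise maximum of two possibility distributions."""
    return {x: max(p1.get(x, 0.0), p2.get(x, 0.0)) for x in p1.keys() | p2.keys()}


def merge_tuples(t1, t2):
    """Merge two tuples attribute by attribute (adjustment by (8) not applied)."""
    return tuple(fuzzy_union(a, b) for a, b in zip(t1, t2))


# Hypothetical single-attribute tuples over the eye-color domain.
t1 = ({"black": 1.0, "brown": 0.3},)
t2 = ({"brown": 0.8},)
merged = merge_tuples(t1, t2)
```

Since the pointwise maximum is commutative and associative, the merged value does not depend on the order in which tuples are merged.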

Lemma 7. Let and be two possibility distributions on the same domain. If , then .

Proof. Based on (9) and (11), it is obvious that if , then and .

Based on the literature review and Lemma 7, we summarize the property of different similarity measures with threshold in Table 3 to show the merit of (9) adopted in this work.

4. Approximate Lossless Join Decomposition

This section first offers the operations for relation decomposition and combination. Then, it proposes a notion of approximate lossless join decomposition (ALJD), which incorporates fuzzy concepts into lossless join decomposition. It also provides the method to achieve the ALJD.

Similar to the works in [37], this study generalizes the projection and natural join operations in traditional database to the fuzzy databases, as below. Here, given a relation , denotes a set of attributes in (i.e., ), and denotes the composite of values in tuple over attribute . For example, given , , and , then .

Projection. Projecting the instance of relation on attributes , denoted by , is given by

Natural Join. Natural join instances of and , denoted by , are defined as follows: In (12) and (13), tuple redundancy is determined by approximate equality (e.g., ; see Definition 2), and both redundancy removal and tuple combination use tuple merging in (11).

Proposition 8. The projection result of a relation based on (12) must be unique.

Proof. It can be directly derived from Proposition 5.

Based on the operations (12) and (13), the ALJD is formally defined following the extension of approximate equality from tuple level to relation level in Definition 9.

Definition 9 (approximately equal relation instances). Two relation instances and in the fuzzy database are approximately equal, denoted by , if for every tuple , there must exist a tuple such that and vice versa.
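Definition 9 amounts to a two-way coverage test, which can be sketched as below. The tuple-level predicate is passed in as a parameter; the bucket-based predicate `approx_equal_tuples` used in the example is only a hypothetical stand-in for the approximate equality of Definition 2, chosen because it is an equivalence relation.

```python
def approx_equal_relations(r1, r2, approx_equal):
    """True if every tuple of r1 has an approximately equal tuple in r2 and vice versa."""
    covers_12 = all(any(approx_equal(t, u) for u in r2) for t in r1)
    covers_21 = all(any(approx_equal(u, t) for t in r1) for u in r2)
    return covers_12 and covers_21


def approx_equal_tuples(t, u):
    """Stand-in predicate: components fall into the same bucket of width 10."""
    return all(a // 10 == b // 10 for a, b in zip(t, u))


r1 = [(1, 12), (25, 30)]
r2 = [(3, 15), (21, 39), (9, 11)]
r3 = [(3, 15)]

equal_12 = approx_equal_relations(r1, r2, approx_equal_tuples)
equal_13 = approx_equal_relations(r1, r3, approx_equal_tuples)
```

Note that approximately equal instances need not have the same number of tuples: `r2` holds three tuples and `r1` two, yet each tuple on either side has a counterpart on the other.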

Definition 10 (approximate lossless join). A decomposition of a relation in the fuzzy database is approximate lossless join if .

Approximate lossless join decomposition means that the natural join of all decomposed results of a relation instance is approximately equal to the original relation instance. More specifically, every tuple in the original relation is approximately equal to one of the tuples in the combination result, and vice versa (see Definition 9).

Proposition 11. Consider the following:

Proof. It can be derived from (11) and (12).

Corollary 12. Consider the following:

Proof. It can be derived directly from Proposition 11.

The projection of a relation over the same schema, as shown in Corollary 12, represents no operations other than removing redundant tuples from the instance of the relation via tuple merging. Corollary 12 shows that the result of redundancy removal of a relation instance is approximately equal to the original instance. This property is essential for obtaining the combination result that is approximately equal to the original instance after relation decomposition.

This study proposes FFD for the decomposition in the fuzzy database as shown below.

Definition 13. The FFD holds in the relation instance if satisfies that, for every , ; if , then .
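A pairwise check in the spirit of Definition 13 can be sketched as follows. The sketch assumes that approximate equality on a set of attributes means approximate equality on each attribute of the set; the predicate `same_bucket`, the attribute names, and the function `ffd_holds` are hypothetical.

```python
from itertools import combinations
from operator import eq


def ffd_holds(relation, x_attrs, y_attrs, approx_equal):
    """True if tuples approximately equal on x_attrs are so on y_attrs as well."""
    for t1, t2 in combinations(relation, 2):
        same_x = all(approx_equal(t1[a], t2[a]) for a in x_attrs)
        same_y = all(approx_equal(t1[a], t2[a]) for a in y_attrs)
        if same_x and not same_y:
            return False
    return True


def same_bucket(a, b):
    """Stand-in for value-level approximate equality (buckets of width 10)."""
    return a // 10 == b // 10


r = [
    {"x": 1, "y": 42},
    {"x": 3, "y": 47},
    {"x": 25, "y": 40},
]
holds = ffd_holds(r, ["x"], ["y"], same_bucket)

# Adding a tuple close to the first one on x but far from it on y breaks the FFD.
violated = ffd_holds(r + [{"x": 4, "y": 90}], ["x"], ["y"], same_bucket)

# With strict equality as the predicate, the check reduces to a classical FD test.
classical = ffd_holds(r, ["x"], ["y"], eq)
```

Passing `operator.eq` as the predicate shows the relationship to the classical case: the same loop then tests an ordinary functional dependency.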

Remark 14. An FD in a traditional database is a special case of the FFD. If a FD holds in , then holds as well. It is because must be true for any if .

Lemma 15. Given relations and and , if satisfies a set F of FFDs, then satisfies .

Proof. Proof by contradiction: we assumed that and there exists an FFD such that satisfies and does not. Because exists in , Since does not exist in , there exists , such that and . Since , there exists such that and . Then, we have and but , which contradicts (16).

Note that the FFD in Definition 13 satisfies Armstrong’s axioms (inference rules), including the reflexive rule, the augmentation rule, and the transitive rule (see endnote 3). This property gives the result of lossless join decomposition the dependency preservation property (see endnote 4) [1].

Lemma 16. Let be a relation and be an FFD in . If and , then the decomposition of has approximate lossless join property.

Proof. We proved that based on Definition 10. Let be a relation such that . We first prove that, for all , there exist and and then prove that, for all , there must exist and . Proof by contradiction: let us assume that is the tuple such that no satisfies . Let and be tuples such that and . Based on Proposition 11, there must be and such that and because and . Since and , there must exist such that according to (13). Also, since and , we have by Lemma 7. Thus, , which contradicts the assumption.
Proof by contradiction for second part with renewed symbols: assume that is the tuple such that no satisfies . Since , there exist and such that , , and based on (13). Also, we let and be the tuples such that Since , based on Proposition 11, there must exist such that Based on (17) and (18), we have . Because , holds based on Definition 13. Thus, based on (19), and , which contradicts the assumption.

In Lemma 16, each one of , , and could be a single attribute or a set of attributes.

Definition 17. Let be the set of FFDs. An FFD in is trivial if there exists an FFD in such that .

Based on Armstrong’s inference rules, IR1 and IR3 (see endnote 3), if a set of FFD contains , the closure of will also contain and , which is trivial. When a relation is decomposed into more relations, more join operations are needed to obtain the original data during query processing. Considering the cost of join operations, it is not efficient to decompose a relation that is already in the third normal form. A relation is in the third normal form if there is no functional dependency between nonkey attributes in the relation [1]. Accordingly, the relation decomposition has two prerequisites:
(i) It must avoid decomposing a relation based on trivial FFDs.
(ii) It must ensure that the decomposed result preserves the closure of FFDs in the original relation.
For example, if is not a key in , then will be decomposed based on rather than on trivial FFD or . Based on Lemma 16 and Definition 17, we propose an algorithm for ALJD (see Algorithm 1).

Inputs: R and F, where R is a relation and F is the set of FFDs that exist in R.
Step  1. Let and be the set of all that are not trivial.
Step  2. For a : ~> in , let be the relation chosen from such that both and are in .
Step  3. If X is not a key attribute in , do the following:
   (1) decompose into and , such that and
   (2) let (remove from )
   (3) let .
Step  4. Go to Step  2 if is not empty.
Output is , the set of relations decomposed from .
Note: represents the list of all attributes in other than X.
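The loop of Algorithm 1 can be sketched at the schema level as shown below. This is an interpretation under several assumptions, not a transcription of the algorithm: each FFD is given as a pair of attribute lists and is processed once, trivial FFDs are assumed to have been filtered out beforehand, a relation containing both sides of an FFD is split into one schema holding both sides and one schema without the dependent attributes, and key testing is delegated to a caller-supplied function `is_key` (the paper defines keys on relation instances). All names are hypothetical.

```python
def decompose(schema, ffds, is_key):
    """Split schemas on nontrivial FFDs whose left-hand side is not a key."""
    relations = [frozenset(schema)]
    for lhs, rhs in ffds:
        lhs, rhs = frozenset(lhs), frozenset(rhs)
        # Pick a current relation that contains both sides of the FFD.
        target = next((r for r in relations if lhs | rhs <= r), None)
        if target is None or lhs | rhs == target or is_key(lhs, target):
            continue  # nothing to split, or the FFD involves a key
        relations.remove(target)
        # One schema keeps both sides; the other drops the dependent attributes.
        relations += [lhs | rhs, target - (rhs - lhs)]
    return relations


def emp_is_key(attrs, relation):
    """Hypothetical key test: any attribute set containing 'emp' is a key."""
    return "emp" in attrs


schema = ["emp", "dept", "mgr", "city"]
ffds = [(["dept"], ["mgr"]), (["emp"], ["city"])]
result = decompose(schema, ffds, emp_is_key)
```

In this example the FFD on `dept` triggers a split, while the FFD whose left-hand side contains the key `emp` is skipped, mirroring the key exclusion at Step 3.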

In the ALJD algorithm, an FFD containing key attributes is excluded from the decomposition process at Step 3. This follows the concept of normalization in traditional databases, where only the FDs of nonkey attributes are considered. To have a consistent presentation of data, this work generalizes the definition of key attributes for the fuzzy databases; namely, an attribute is a key attribute in if there do not exist two tuples and in such that . Excluding FFDs that contain key attributes prevents unnecessary decomposition of relations that have no update anomaly problem. Although the decomposition without the key exclusion is still an ALJD, it increases the cost of the join operations in query processing.

Proposition 18. Let be a relation and let be an FFD in . If and , then (i) each FFD existing in or must exist in and (ii) every FFD existing in must either exist in or or be derived via FFDs in and .

Proof. Statement (i) can be derived by Proposition 11 and Lemma 15. Statement (ii) can be derived by Lemma 16 and the property of FFD (namely, Armstrong’s axioms IR1, IR2, and IR3, described in endnote 3).

The above statements show that the ALJD also preserves the closure of FFDs in the original relation, which is important to the issues related to the application of FFDs.

5. Conclusion

The contribution of this work is fourfold. First, it highlights the problem of relation decomposition when tuple elimination is order sensitive. To overcome the problem, it proposes the notion of approximate equality for the tuples or relations in the fuzzy databases and provides a measure of the approximate equality. The measure is reflexive, symmetric, and transitive. It enables classifying tuples into disjoint sets and ensures that a decomposed relation has a unique result after redundancy removal or tuple merging. Therefore, the notion of approximate equality is important for data operations in the fuzzy databases. Second, it proposes approximate lossless join decomposition for the fuzzy databases and defines two operations, projection and equal join, for the decomposition, all of which are based on the approximate equality. The data operations and ALJD can be applied to data compression in the fuzzy databases. Third, this work defines FFDs and proposes an algorithm to decompose relations in the fuzzy databases based on the FFDs. The decomposition by the algorithm ensures the approximate lossless join property. The FFD and ALJD proposed for the fuzzy databases are, respectively, generalizations of the traditional FD and lossless join decomposition. This generality is important for dealing with databases that contain both crisp data and fuzzy data. Fourth, similar to the existing approaches of database normalization on resemblance-based fuzzy databases, this study provides several propositions to prove that the proposed approach of decomposition satisfies a degree of lossless join property. Compared with the normalization approaches for resemblance-based fuzzy databases, achieving lossless join decomposition for the extended possibility-based fuzzy databases is more difficult because the data are more complex.

There are several directions for future work. Future studies can adopt the notion of approximate equality to define data operations for query processing in the fuzzy databases. The notion can also be applied to research on data compression, fuzzy association rules, missing value prediction, relation compactness, and integrity constraints in the fuzzy databases. Studies that incorporate the fuzzy concept into clustering methods or data groupings for decision-making in marketing, healthcare applications, or business operations can adopt the approximate equality for their similarity measures. Since the fuzzy concept has been incorporated into object-oriented databases in the literature, future work can provide the approximate equality specifically for the data in fuzzy object-oriented data models.

Conflict of Interests

The author declares that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

The author would like to acknowledge the support of National Science Council NSC102-2410-H-155-036-MY2 and Innovation Center for Big Data & Digital Convergence.

Endnotes

  1. Different orders on removing redundant tuples could lead to different results.
  2. For further details on possibility distribution and on the difference between possibility and probability measures, the reader is referred to [38].
  3. IR1 (reflexive rule): if Y ⊆ X, then X ~> Y; IR2 (augmentation rule): if X ~> Y, then XZ ~> YZ; IR3 (transitive rule): if X ~> Y and Y ~> Z, then X ~> Z (see [1]).
  4. Each FFD in either directly exists in some individual relations decomposed from or can be derived via Armstrong’s inference rules from the FFDs in these relations.