#### Abstract

In many practical applications, it turns out to be useful to use the notion
of *fuzzy transform*: once we have functions $A_1(x), \dots, A_n(x)$
with $\sum_{k=1}^{n} A_k(x) = 1$, we can then represent each function $f(x)$
by the coefficients $F_k = \frac{\int A_k(x)\,f(x)\,dx}{\int A_k(x)\,dx}$.
Once we know the coefficients $F_k$, we can
(approximately) reconstruct the original function as $\widehat f(x) = \sum_{k=1}^{n} F_k\,A_k(x)$. The original motivation for this transformation came from fuzzy modeling,
but the transformation itself is a purely mathematical transformation.
Thus, the empirical successes of this transformation suggest that
this transformation can also be interpreted in more traditional (nonfuzzy)
mathematical terms.
Such an interpretation is presented in this paper. Specifically, we show
that the 2002 probabilistic interpretation of fuzzy modeling by Sánchez
et al. can be modified into a natural probabilistic explanation of fuzzy
transform formulas.

#### 1. Introduction: Fuzzy Transform and the Need for Its Probabilistic Interpretation

##### 1.1. Fuzzy Transform: A Definition

The notion of a fuzzy transform (*F-transform*, for short) turned out to be very useful in many application areas such as image compression and solving differential equations under initial uncertainty; see, for example, [1, 2] and references therein.

Generally speaking, the F-transform of a function $f$ is a vector whose components are weighted local mean values of $f$. The first step in the definition of the F-transform of $f$ is a selection of a *fuzzy partition* of a universal set $X$ (e.g., a bounded interval on the real line) by a finite set of *basic functions* $A_1(x), \dots, A_n(x)$,
which are continuous and satisfy the condition

$$\sum_{k=1}^{n} A_k(x) = 1 \quad \text{for all } x \in X. \tag{1}$$

Basic functions are called *membership functions* of the respective fuzzy sets, or, alternatively, *granules*, information pieces, etc. Their choice reflects the type of uncertainty which is related to the knowledge of $f$.

Once the basic functions $A_1, \dots, A_n$ are selected, we define the F-transform of a continuous function $f$ as a vector $(F_1, \dots, F_n)$, where

$$F_k = \frac{\int A_k(x)\,f(x)\,dx}{\int A_k(x)\,dx}. \tag{2}$$

The F-transform satisfies the following properties [1, 2]: (i) $F_k$ minimizes the weighted least-squares criterion $\int A_k(x)\,(f(x) - F_k)^2\,dx$; (ii) for a twice continuously differentiable function $f$, $F_k = f(x_k) + O(h^2)$, where $h$ is the length of the support of $A_k$ and $x_k$ is its node.

F-transform is used in applications as a “skeleton model” of $f$. This model provides a compressed image if $f$ is an image [3], values of a trend if $f$ is a time series [4], a numeric model if $f$ is used in numeric computations (integration, differentiation) [5], etc.

Once we know the F-transform components $F_1, \dots, F_n$, we can (approximately) reconstruct the original function as

$$\widehat f(x) = \sum_{k=1}^{n} F_k\,A_k(x). \tag{3}$$

In [1], the formula (3) is called the *F-transform inversion formula*. The formula (3) represents a continuous function that approximates $f$. Under certain reasonable conditions, a sequence of functions represented by (3) uniformly converges to $f$ (see [1] for more details).

*Example 1.* Let us give an example of the F-transform of a function $f$ on a domain $[a, b]$ with respect to basic functions $A_1, \dots, A_n$. For simplicity, we assume that the basic functions are of triangular shape and constitute a uniform fuzzy partition of $[a, b]$, with nodes $x_k = a + (k - 1)\,h$ and step $h = (b - a)/(n - 1)$. Their analytical representation is as follows:

$$A_1(x) = \begin{cases} 1 - \dfrac{x - x_1}{h}, & x \in [x_1, x_2], \\ 0, & \text{otherwise}, \end{cases}$$

$$A_k(x) = \begin{cases} \dfrac{x - x_{k-1}}{h}, & x \in [x_{k-1}, x_k], \\ \dfrac{x_{k+1} - x}{h}, & x \in [x_k, x_{k+1}], \\ 0, & \text{otherwise}, \end{cases} \qquad 1 < k < n,$$

$$A_n(x) = \begin{cases} \dfrac{x - x_{n-1}}{h}, & x \in [x_{n-1}, x_n], \\ 0, & \text{otherwise}. \end{cases}$$

The values of the components of the F-transform are then computed by (2). Figure 1 provides a graphical representation of the basic functions $A_k$, of the function $f$, of its F-transform components $F_k$, and of the inverse F-transform of $f$.

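The definitions above can be sketched numerically. The following snippet is a minimal illustration, not taken from the paper: it builds a uniform triangular partition, computes formula (2) by simple Riemann sums, and evaluates the inverse formula (3); the test function $f(x) = x^2$ on $[0, 1]$ is an illustrative choice.

```python
import numpy as np

def triangular_partition(a, b, n, x):
    """Membership values A_k(x), k = 1..n, of a uniform triangular
    fuzzy partition of [a, b]; the rows sum to 1 (condition (1))."""
    h = (b - a) / (n - 1)
    nodes = a + h * np.arange(n)
    return np.maximum(0.0, 1.0 - np.abs(x[:, None] - nodes[None, :]) / h)

def f_transform(f, a, b, n, num=10001):
    """Direct F-transform, formula (2): F_k = (integral of A_k*f) / (integral of A_k)."""
    x = np.linspace(a, b, num)
    A = triangular_partition(a, b, n, x)
    fx = f(x)
    # the common dx factor cancels between numerator and denominator
    return (A * fx[:, None]).sum(axis=0) / A.sum(axis=0)

def inverse_f_transform(F, a, b, x):
    """Inverse F-transform, formula (3): sum of F_k * A_k(x)."""
    return triangular_partition(a, b, len(F), x) @ np.asarray(F)
```

For $f(x) = x^2$ with $n = 11$, the component $F_6$ (node $x = 0.5$) is close to $f(0.5) = 0.25$, in line with the property $F_k = f(x_k) + O(h^2)$, and the inverse F-transform closely tracks $f$ away from the endpoints.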

##### 1.2. F-Transform: Original Motivation

The original motivation for F-transform came from fuzzy modeling [1, 2]. For example, in the situation corresponding to the inverse F-transform, we have rules

“if $x$ satisfies the property with membership function $A_k(x)$, then $y = F_k$” ($k = 1, \dots, n$).

These rules are Takagi-Sugeno (TSK) rules with singleton (constant) right-hand sides. For TSK rules, the value corresponding to a given input $x$ is

$$y = \frac{\sum_{k=1}^{n} A_k(x)\,F_k}{\sum_{k=1}^{n} A_k(x)}.$$

Since $\sum_{k=1}^{n} A_k(x) = 1$, we get formula (3).

The purpose was to show that this type of modeling can be as useful in applications as more traditional techniques such as Fourier transform and wavelet transform. Moreover, F-transform has a potential advantage over Fourier and wavelet transforms: in contrast to the purely mathematical basic functions used in Fourier and wavelet transforms, the basic functions in a fuzzy partition usually come from natural language terms like “low” or “high” (for a detailed description of fuzzy modeling, see, e.g., [6, 7]).

Just like any other tool of applied mathematics, F-transform is not a panacea. It is more successful in some problems, and in other problems, it is less successful. It is therefore desirable to combine F-transform with other mathematical tools, so as to combine relative advantages of different techniques. For combining F-transform with other mathematical tools, it is desirable to come up with a purely mathematical (nonfuzzy) interpretation for this transform.

In particular, since most mathematical data processing tools are based on probability and statistics, it is desirable to come up with a probabilistic interpretation for F-transform.

##### 1.3. The Known Probabilistic Interpretation of Fuzzy Modeling Leads to a Probabilistic Interpretation of F-Transform

We have mentioned that F-transform was originally designed as a particular case of fuzzy modeling. A seminal paper [8] provided a reasonable probabilistic model for a particular case of fuzzy modeling. Specifically, this paper shows that if we use piecewise constant probability density functions for describing the output, then we get a particular case of a fuzzy model—the case when we use product for “and” and sum for “or.” Since F-transform corresponds to exactly this type of fuzzy modeling, we thus get a probabilistic model for F-transform as well.

##### 1.4. What We Do in This Paper

In this paper, we show that a modification of the probabilistic interpretation from [8] enables us to justify formulas of F-transform without making any additional assumptions about the probability distributions. In mathematical terms, this modification consists of using Bayes formulas—and making assumptions about *prior* distributions (a natural way to describe prior knowledge in statistics) instead of making assumptions about the *actual* distributions.

Thus, we get an even more natural probabilistic interpretation of F-transform. Specifically, (i) the paper [8] shows, in effect, that *there exists* a reasonable probabilistic interpretation of the F-transform formulas; (ii) however, in principle, this interpretation leaves open the possibility that *other* equally reasonable assumptions about the probability distributions could lead to different formulas; (iii) in our modified interpretation, we show that the basic probabilistic setting *uniquely determines* the F-transform formulas—without the need to make any assumptions about the probability distributions.

We also show that a similar modification can be applied to the probabilistic interpretation of general fuzzy modeling formulas.

*Comment 1.* From the mathematical viewpoint, the resulting formulas are very similar to the formulas from [8] (with the exception of the Bayes formula step). However, in our opinion, this mathematically *minor* modification leads to a *major* change in interpretation: now, to probabilistic researchers, F-transform is (i) not just a possible model, corresponding to one of the possible reasonable choices of probability distributions, (ii) but the model uniquely emerging from the natural probabilistic setting.

A similar conclusion can be made about the probabilistic interpretation of more general fuzzy models. In other words, our minor modification uncovers an even deeper fundamental meaning of the probabilistic interpretation originally proposed in [8].

#### 2. A Natural Practical Problem that Leads to F-Transform

##### 2.1. Physical Setting: General Discussion

Let us assume that we have a physical process that is characterized by two quantities $x$ and $y$, and we know that these quantities are related by a functional dependence $y = f(x)$.

In the ideal situation of complete knowledge, (i) we know the exact value of $x$, and (ii) we have the exact description of the function $f(x)$.

In this case, we can compute the corresponding *exact* value $y = f(x)$ of the second quantity.

In practice, we know the value $x$ with uncertainty, that is, several different values of $x$ are consistent with our knowledge. We must therefore provide a reasonable *estimate* for $y$. Finding such an estimate will be the *first problem* with which we will be dealing. In this first problem, we assume that the function $f(x)$ is known *exactly*.

If this function has to be determined *empirically*, then we will transform the empirical (often, partial) knowledge about $f(x)$ into a reasonable estimate for this function. This will be the *second problem* with which we will be dealing in this section.

##### 2.2. First Problem: Estimating the Value $y = f(x)$ for an Imprecisely Known $x$

If we only know one piece of information about $x$, what is a reasonable estimate for $y = f(x)$?

##### 2.3. Second Problem: Estimating the Function $f(x)$ Based on Partial Information about the Dependence between $x$ and $y$

Assume that for every information piece $E_k$, $k = 1, \dots, n$, we have the corresponding measured value of $y$. Since we know only these numerical characteristics of the unknown function $f(x)$, we cannot exactly reconstruct this function. Instead, we need to provide a good estimate for each value of this function.

#### 3. A Natural Probabilistic Problem that Leads to the Probabilistic Interpretation of F-Transform

##### 3.1. Uncertainty in $x$: A General Probabilistic Description

Assume that we have a *model* of the estimation procedure that enables us, given the actual value $x$, to compute the probability $P(E_k \mid x)$ of this procedure resulting in the information piece $E_k$—under the condition that the actual (unknown) value of the estimated quantity is $x$.

To simplify formulas, we denote

$$A_k(x) \stackrel{\text{def}}{=} P(E_k \mid x). \tag{7}$$

Since for every $x$, we must have exactly one of the $n$ possible outcomes, we thus conclude that the probabilities of different estimation results must add up to one, that is, we must have

$$\sum_{k=1}^{n} P(E_k \mid x) = 1.$$

In the above simplified notation, this formula takes the form

$$\sum_{k=1}^{n} A_k(x) = 1.$$

##### 3.2. First Problem: Estimating the Value $y = f(x)$ for an Imprecisely Known $x$

Let us consider the first problem. In practice, we do not know the exact value of the quantity $x$. Instead, we only have one of the information pieces $E_1, \dots, E_n$. Under the assumption that we know $E_k$, what is a reasonable estimate for $y = f(x)$?

In terms of probability theory, we would like to find the conditional expected value $E[f(x) \mid E_k]$ of $y = f(x)$ under the condition $E_k$.

By definition, this expected value is equal to

$$E[f(x) \mid E_k] = \int f(x)\,\rho(x \mid E_k)\,dx. \tag{10}$$

Thus, to compute this expected value, we must know the probability densities $\rho(x \mid E_k)$. Instead, we know the probabilities $P(E_k \mid x)$.

In general, the problem of reconstructing the probabilities of different hypotheses based on an observation from the conditional probabilities of this observation under different hypotheses is well known in probability theory; it is solved by applying the Bayes theorem. The continuous version of this theorem is

$$\rho(H \mid E) = \frac{P(E \mid H)\,\rho_0(H)}{\int P(E \mid H')\,\rho_0(H')\,dH'}, \tag{11}$$

in which $\rho_0(H)$ is a prior probability of the hypothesis $H$ (strictly speaking, $\rho_0(H)$ and $\rho(H \mid E)$ are probability densities).

In our case, different hypotheses correspond to different possible values $x$ of the quantity of interest. Thus, (11) takes the form

$$\rho(x \mid E_k) = \frac{P(E_k \mid x)\,\rho_0(x)}{\int P(E_k \mid x')\,\rho_0(x')\,dx'}. \tag{12}$$

Since there is no a priori reason to prefer one value of $x$ to another, it is reasonable to assume that all the values are equally probable, that is, that all prior values are equal to each other: $\rho_0(x) = \text{const}$.

Substituting $\rho_0(x) = \text{const}$ into the formula (12) and dividing both the numerator and the denominator by this common factor, we get the expression

$$\rho(x \mid E_k) = \frac{P(E_k \mid x)}{\int P(E_k \mid x')\,dx'}.$$

Substituting this expression into formula (10) (and renaming the integration variable in the denominator), we get

$$E[f(x) \mid E_k] = \frac{\int P(E_k \mid x)\,f(x)\,dx}{\int P(E_k \mid x)\,dx}.$$

In terms of the simplified notation (7), we thus get

$$F_k = \frac{\int A_k(x)\,f(x)\,dx}{\int A_k(x)\,dx},$$

that is, exactly the formula (2) corresponding to the F-transform.
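This derivation can be sanity-checked by simulation: if we draw $x$ from a uniform prior, report the piece $E_k$ with probability $A_k(x)$, and average $f(x)$ over the cases where $E_k$ was reported, the empirical conditional means should approach the F-transform components $F_k$. A minimal Monte Carlo sketch (the three-function partition and the choice $f(x) = \sin x$ are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

# uniform triangular partition of [0, 1] with n = 3 basic functions
nodes = np.array([0.0, 0.5, 1.0])
h = 0.5

def A(x):
    """A_k(x) for the 3 basic functions; each row sums to 1."""
    return np.maximum(0.0, 1.0 - np.abs(x[:, None] - nodes) / h)

f = np.sin  # illustrative choice of the function f

# draw x from the uniform prior, then report piece E_k with P(E_k|x) = A_k(x)
x = rng.uniform(0.0, 1.0, 200_000)
k = (rng.uniform(size=x.size)[:, None] > np.cumsum(A(x), axis=1)).sum(axis=1)

# empirical conditional means E[f(x) | E_k]
emp = np.array([f(x[k == j]).mean() for j in range(3)])

# F-transform components by formula (2) (simple Riemann sums)
xs = np.linspace(0.0, 1.0, 10_001)
Axs = A(xs)
F = (Axs * f(xs)[:, None]).sum(axis=0) / Axs.sum(axis=0)
```

With 200,000 samples, the empirical means `emp` agree with the components `F` to within a few thousandths.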

##### 3.3. Second Problem: Estimating the Function $f(x)$ Based on Partial Information about the Dependence between $x$ and $y$

In some practical situations, we do not know the exact expression for the function $f(x)$. Instead, we must estimate $f(x)$ from the empirical data, that is, from the previous results of simultaneously measuring $x$ and $y$.

In each such measurement, the only information that we get about $x$ is one of the pieces $E_1, \dots, E_n$. For each case when the information about $x$ is $E_k$, we have one or several measured values $y$.

Ideally, we should have a large number of values $y$ corresponding to each $x$-measurement result $E_k$. Based on these values $y$, we should then be able to reconstruct the conditional distribution of $y$ under the condition $E_k$. Based on these conditional distributions, we should be able to reconstruct the values $f(x)$ for all $x$.

In practice, however, we have only a few values $y$ corresponding to each $x$-measurement result $E_k$. In this case, at best, instead of the entire conditional probability distribution, we can only reconstruct a single parameter—the conditional mean $E[y \mid E_k]$. Since we only know these characteristics of the unknown function $f(x)$, we cannot exactly reconstruct this function. Instead, we need to describe a good estimate for each value of this function.

Similarly to the first problem, we take the mean as a reasonable estimate. Thus, in the above practical setting, the problem of estimating the function $f(x)$ takes the following form: (i) for every $k$, we know the conditional mean $F_k = E[y \mid E_k]$; (ii) based on these conditional means, for every $x$, we want to estimate the mean value $E[y \mid x]$.

For this problem, the formula of full probability leads to the following result:

$$E[y \mid x] = \sum_{k=1}^{n} E[y \mid E_k] \cdot P(E_k \mid x). \tag{16}$$

By using the notations $F_k$ for $E[y \mid E_k]$, $A_k(x)$ for $P(E_k \mid x)$, and $\widehat f(x)$ for $E[y \mid x]$, we can transform the formula (16) into the form

$$\widehat f(x) = \sum_{k=1}^{n} F_k\,A_k(x),$$

that is, exactly the F-transform inversion formula (3).

##### 3.4. Conclusion

The above (minor) modification of the probabilistic model from [8] uniquely determines both basic formulas (2) and (3) related to the F-transform.

##### 3.5. Relation with the Random Set Interpretation of Fuzzy Sets

It is worth mentioning that the probabilistic interpretation from [8] is related to the random set interpretation of fuzzy sets (see, e.g., [9]).

In this interpretation, the meaning of an imprecise (fuzzy) term like “small” is based on the following idea. The fact that the term is imprecise means that for the same value $x$, some people will say that this value is small, while other people will say that this value is not small. To take this imprecision into account, we can store, for each person, a set of all the values that this person considers small.

Since there is no prior reason to prefer the opinion of one of these people, we consider their opinions equally reasonable. We can then take the ratio of people who consider $x$ to be small as a reasonable measure of smallness (this is actually one of the standard ways to construct a membership function corresponding to a certain term).

We can describe this ratio in probabilistic terms if we assume that all the persons are equally probable. In these terms, the value $\mu(x)$ of the membership function can be interpreted as the probability that a randomly selected person would consider $x$ to be small.

This interpretation of the membership function $A_k(x)$ as the conditional probability $P(E_k \mid x)$ is exactly what we used in our probabilistic interpretation of F-transform.

##### 3.6. Terminological Comment

For completeness, let us explain why the above interpretation is called the random sets interpretation.

For crisp (well-defined) properties, each property can be described by the set of all the values that satisfy this property.

For each imprecise property like “small,” instead of a *single* set describing all the values that satisfy this property, we have *several* sets describing the opinions of several persons. We consider the opinions of all these $N$ persons to be equally valid, so each of the persons has the exact same probability of being correct. In this case, we have $N$ different sets, each occurring with probability $1/N$.

In mathematical terms, we can describe this situation by saying that we have a probability distribution on the class of all possible sets. In probability theory, such a distribution is called a *random set*—similarly to the fact that a probability distribution on the class of all possible numbers is called a *random number*.

#### 4. Discussion

Let us discuss the consequences of the above results for the meaning and usage of F-transforms (the authors are greatly thankful to the anonymous referees who proposed the main ideas of this discussion). To start this discussion, let us recall why F-transforms were proposed in the first place.

##### 4.1. Need for F-Transforms and the Resulting Main Advantage of F-Transforms: Reminder

One of the main objectives of F-transform is to approximate general functions by functions from a selected finite-parametric family. This is a well-known mathematical problem, and many successful techniques have been developed for solving this problem. For example, we can expand the original function by a polynomial, and then use the first few terms in this expansion as the desired approximation. We can also use transforms such as Fourier transform or wavelet transform, and keep only the first few terms in the corresponding expansion as the desired approximation.

All existing approximation techniques take a function and approximate this function. In situations in which the only information that we have about the desired dependence $y = f(x)$ are the values of $y$ measured for several values of $x$, this is the only thing we can do. However, in practice, we often have additional expert knowledge about the dependence $y = f(x)$. It is therefore desirable to take this understanding into account when we approximate a function.

The expert knowledge is often imprecise (fuzzy), that is, formulated in terms of imprecise expert rules. A natural way to describe imprecise rules is to use fuzzy logic and fuzzy modeling, and, as we have shown, the fuzzy modeling approach naturally leads to F-transforms.

The ability to take into account expert knowledge is thus the main advantage of F-transforms, the main reason why F-transform has led to many successful applications.

##### 4.2. The Probabilistic Interpretation of F-Transform Leads to an Additional Advantage of F-Transform in Comparison with Other Approximation Techniques

The above probabilistic interpretation of F-transforms shows that each component $F_k$ of an F-transform can be interpreted as the mean value of the approximated function $f(x)$ under the condition that the unknown value $x$ is consistent with the measurement result $E_k$. It is well known that in probability theory, the mean value can alternatively be described as the value $c$ that minimizes the mean square difference between this value and the actual value $f(x)$, that is, that minimizes the expression $E[(f(x) - c)^2 \mid E_k]$. Thus, the above relation provides an additional advantage of F-transforms in comparison with other approximation tools: (i) F-transforms not only reflect expert knowledge, (ii) F-transforms also provide a solution which is *optimal* (in a well-defined reasonable sense).

##### 4.3. Gauging the Accuracy of the Resulting Approximation

We have shown that each component $F_k$ of the F-transform provides the most accurate approximation to $f(x)$. The next natural question is: how accurate is it? In other words, what is the corresponding mean square difference $E[(f(x) - F_k)^2 \mid E_k]$? It turns out that the answer to this question can also be provided in terms of F-transforms.

Namely, as is known, for the mean square deviation from the mean, we have

$$E[(f(x) - F_k)^2 \mid E_k] = E[(f(x))^2 \mid E_k] - F_k^2.$$

That is, $\sigma_k^2 = M_k - F_k^2$, where $M_k \stackrel{\text{def}}{=} E[(f(x))^2 \mid E_k]$. The expression $M_k$ can also be described in terms of F-transforms. Indeed, our result about the relation between F-transform and conditional expected value applies to all possible functions, including the square $f^2(x)$ of the original function $f(x)$. Thus, each value $M_k$ is equal to the $k$th component of the F-transform of this square.

So, we arrive at the following conclusion. If we only know that $x$ is consistent with the measurement result $E_k$, then (i) a reasonable approximation for $f(x)$ is the value $F_k$: the $k$th component of the F-transform; (ii) the root mean square accuracy of this approximation is determined by the formula $\sigma_k = \sqrt{M_k - F_k^2}$, where $M_k$ is the $k$th component of the F-transform of the function $f^2(x)$.
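As a consistency check, the identity $\sigma_k^2 = M_k - F_k^2$ can be verified numerically by computing both sides from the same discretization; the choice $f(x) = e^x$ and the partition parameters in this sketch are illustrative assumptions:

```python
import numpy as np

xs = np.linspace(0.0, 1.0, 20_001)
n, h = 6, 0.2
nodes = np.linspace(0.0, 1.0, n)
# uniform triangular partition of [0, 1]
A = np.maximum(0.0, 1.0 - np.abs(xs[:, None] - nodes) / h)

def components(g):
    """F-transform components of g(x) w.r.t. the partition A, formula (2)."""
    return (A * g[:, None]).sum(axis=0) / A.sum(axis=0)

f = np.exp(xs)          # illustrative choice of f
F = components(f)       # F-transform of f
M = components(f**2)    # F-transform of f^2

sigma = np.sqrt(M - F**2)  # predicted RMS accuracy of the constant F_k

# direct check: weighted mean-square error of approximating f by F_k
mse = (A * (f[:, None] - F)**2).sum(axis=0) / A.sum(axis=0)
```

Here `sigma` and `np.sqrt(mse)` coincide up to floating-point rounding, since the two expressions are algebraically identical.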

Similarly, for the second problem—reconstructing $f(x)$ when we only know finitely many values $F_k$ corresponding to different $E_k$—the mean square accuracy of the corresponding approximation of the actual (unknown) function $f(x)$ by its inverse F-transform $\widehat f(x)$ is equal to

$$E[(f(x) - \widehat f(x))^2 \mid x] = E[f^2(x) \mid x] - (\widehat f(x))^2.$$

The first term in this difference is equal to the inverse F-transform $\widehat{f^2}(x) = \sum_{k=1}^{n} M_k\,A_k(x)$, where the values $M_k$ form the F-transform of the squared function $f^2(x)$.

Thus, we arrive at the following conclusion: (i) If we only know the values $F_k$ of the F-transform of the actual (unknown) dependence $y = f(x)$, then, as a reasonable approximation to $f(x)$, we can take the inverse F-transform $\widehat f(x)$. (ii) If, in addition to the values $F_k$, we also know the F-transform $M_k$ of the square $f^2(x)$, then we can estimate the root mean square accuracy by using the formula $\sigma(x) = \sqrt{\widehat{f^2}(x) - (\widehat f(x))^2}$, where $\widehat{f^2}(x)$ is the inverse F-transform of the squared function.

#### 5. A Similar Modification of a Probabilistic Interpretation Is Possible for Mamdani-Style Fuzzy Modeling (and Fuzzy Control)

##### 5.1. From F-Transform to Fuzzy Modeling

Let us show that the above modification of a probabilistic interpretation from [8] can be extended from F-transform to a more general case of Mamdani-type fuzzy modeling and fuzzy control.

*Comment 2.* In this section, we concentrate on Mamdani's approach since F-transform can be viewed as a particular case of this approach, and since for Mamdani's approach, a probabilistic interpretation is possible [8]. Please note that while Mamdani's approach was historically the first, at present, there are many different approaches to fuzzy modeling and fuzzy control; we mention some of them in this section, but there are many others; see, for example, [10–12]. How to best interpret these other approaches in probabilistic terms—and whether such an interpretation is at all possible—is an interesting open question.

For example, an interesting question is how to interpret type-2 approaches to fuzzy modeling and fuzzy control; see, for example, [13–16]; maybe via interval-valued probabilities?

##### 5.2. Mamdani's Approach to Fuzzy Modeling and Fuzzy Control: A Brief Reminder

In Mamdani's approach, we start with rules like

“if $x$ is small, then $u$ should be medium”,

and then use membership functions for “small” and “medium” to transform these rules into an exact control strategy.

In general, we have rules

“if $x$ has a property $A_k$, then $u$ has the property $B_k$” ($k = 1, \dots, n$),

with known membership functions $A_k(x)$ and $B_k(u)$ for the corresponding properties. Mamdani's methodology is based on saying that for each input $x$, the value $u$ is a reasonable value of control if and only if one of the above rules is applicable, that is, (i) either the first rule is applicable, that is, $x$ satisfies the property $A_1$ and $u$ satisfies the property $B_1$, (ii) or the second rule is applicable, that is, $x$ satisfies the property $A_2$ and $u$ satisfies the property $B_2$, (iii) …, (iv) or the $n$th rule is applicable, that is, $x$ satisfies the property $A_n$ and $u$ satisfies the property $B_n$.

Once we select functions $f_\wedge(a, b)$ and $f_\vee(a, b)$ to represent “and” and “or” (these functions are called *t-norm* and *t-conorm*), we can thus describe the degree of our belief that $u$ is reasonable (for a given input $x$) as

$$\mu(u) = f_\vee\big(f_\wedge(A_1(x), B_1(u)), \dots, f_\wedge(A_n(x), B_n(u))\big).$$

In particular, if we select $f_\wedge(a, b) = a \cdot b$ and $f_\vee(a, b) = a + b$ (and if the added values do not go beyond 1), we get

$$\mu(u) = \sum_{k=1}^{n} A_k(x) \cdot B_k(u). \tag{23}$$

Once we know this membership function, we can find the appropriate value of $u$ by using the so-called *centroid defuzzification*:

$$\bar u = \frac{\int u\,\mu(u)\,du}{\int \mu(u)\,du}. \tag{24}$$
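To make formulas (23) and (24) concrete, here is a minimal numerical sketch; the two rules and the triangular membership shapes are illustrative assumptions, not taken from the paper:

```python
import numpy as np

us = np.linspace(0.0, 1.0, 2001)  # discretized control domain

def tri(c, w, t):
    """Triangular membership function centered at c with half-width w."""
    return np.maximum(0.0, 1.0 - np.abs(t - c) / w)

# two illustrative rules:
#   "if x is small, then u should be small"
#   "if x is large, then u should be large"
A = [lambda x: tri(0.0, 1.0, x), lambda x: tri(1.0, 1.0, x)]   # A_k(x)
B = [tri(0.2, 0.2, us), tri(0.8, 0.2, us)]                     # B_k(u)

def mamdani_control(x):
    # formula (23): mu(u) = sum_k A_k(x) * B_k(u)  (product "and", sum "or")
    mu = sum(A_k(x) * B_k for A_k, B_k in zip(A, B))
    # centroid defuzzification, formula (24), by Riemann sums
    return (us * mu).sum() / mu.sum()
```

For $x = 0$, only the first rule fires and the output is the centroid $0.2$ of “$u$ is small”; for $x = 0.5$, both rules fire equally and the output is $0.5$.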

##### 5.3. A Natural Probabilistic Analog of Mamdani's Approach to Fuzzy Modeling

In [8], it was shown that in a probabilistic setting, we get formulas similar to Mamdani's rules corresponding to the product “and” and the sum “or”—if we assume a uniform distribution on the outputs. Let us show that by using the Bayes formula, we can avoid this additional assumption, and thus make the resulting probabilistic analog of Mamdani's fuzzy modeling even more natural.

Similarly to the above probabilistic interpretation of F-transform, let us assume that we have $n$ possible pieces of information $E_1, \dots, E_n$ about the quantity $x$, and that for each piece of information $E_k$, we also know the corresponding probability $P(E_k \mid x)$, which we will denote by $A_k(x)$.

Similarly, let us assume that we have $n$ possible pieces of information $E'_1, \dots, E'_n$ about $u$, and we know the corresponding probabilities $P(E'_k \mid u)$, which we will denote by $B_k(u)$.

We know that $u$ depends on $x$, but we do not know the exact dependence. Instead, for each information piece about $x$, we know the corresponding information about the corresponding $u$.

Since we did not select any specific order for the information pieces $E'_k$, we can select the piece corresponding to $E_1$ as $E'_1$, the piece corresponding to $E_2$ as $E'_2$, etc. Under this selection, the available information simply means that if $x$ is described by the piece of information $E_k$, then the corresponding $u$ is described by the piece of information $E'_k$.

Our objective is, given these rules and given a new value $x$, to find a good estimate for the appropriate $u$.

Due to the formula of full probability, the conditional probability density of $u$ under the condition $x$ has the form

$$\rho(u \mid x) = \sum_{k=1}^{n} \rho(u \mid E'_k) \cdot P(E_k \mid x). \tag{25}$$

We know the probabilities $P(E_k \mid x) = A_k(x)$. The probability densities $\rho(u \mid E'_k)$ can be determined by using the Bayes theorem—similarly to the F-transform case—as

$$\rho(u \mid E'_k) = \frac{P(E'_k \mid u)}{\int P(E'_k \mid u')\,du'}, \tag{26}$$

that is, in terms of the values $B_k(u)$, as

$$\rho(u \mid E'_k) = \frac{B_k(u)}{\int B_k(u')\,du'}. \tag{27}$$

Substituting the formula (27) and the expression (7) into the formula (25) (and changing the multiplication order), we get the formula

$$\rho(u \mid x) = \sum_{k=1}^{n} A_k(x) \cdot \frac{B_k(u)}{\int B_k(u')\,du'}. \tag{28}$$

Once we know these probabilities, we can produce the mean as a reasonable estimate for $u$:

$$\bar u = \frac{\int u\,\rho(u \mid x)\,du}{\int \rho(u \mid x)\,du}. \tag{29}$$

These are exactly the formulas derived in [8] from the additional assumption of a piecewise constant output distribution. Thus, our (minor) modification of [8] indeed uniquely determines the corresponding probabilistic analog of Mamdani's formulas.

##### 5.4. In Mamdani-Type Setting, Fuzzy and Probabilistic Formulas Are, in General, Different

It is worth mentioning that (i) while in F-transform, the probabilistic and fuzzy derivations lead to exactly the same formulas, (ii) in the general fuzzy modeling case, as mentioned in [8], the formulas are somewhat different: (a) the formula (29) is exactly the same as (24), with $\rho(u \mid x)$ instead of $\mu(u)$; (b) the formula (28) is slightly different from Mamdani's formula (23)—by the integrals $\int B_k(u')\,du'$ in the denominators.
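This difference can be checked numerically. In the following sketch (the rule weights and the triangular output sets are illustrative assumptions), the two defuzzified values coincide when all the integrals $\int B_k(u)\,du$ are equal, and differ when they are not:

```python
import numpy as np

us = np.linspace(0.0, 1.0, 4001)
du = us[1] - us[0]
tri = lambda c, w: np.maximum(0.0, 1.0 - np.abs(us - c) / w)

def mamdani_u(weights, Bs):
    # formulas (23)-(24): unnormalized sum, then centroid
    mu = sum(w * Bk for w, Bk in zip(weights, Bs))
    return (us * mu).sum() / mu.sum()

def probabilistic_u(weights, Bs):
    # formula (28): each B_k is divided by its integral; then the mean (29)
    rho = sum(w * Bk / (Bk.sum() * du) for w, Bk in zip(weights, Bs))
    return (us * rho).sum() / rho.sum()

w = [0.5, 0.5]                             # A_1(x) = A_2(x) = 0.5 for some x
B_eq  = [tri(0.3, 0.1),  tri(0.7, 0.1)]    # equal widths: same integrals
B_neq = [tri(0.3, 0.25), tri(0.7, 0.05)]   # unequal widths
```

With equal widths both formulas give $\bar u = 0.5$; with unequal widths, Mamdani's centroid is pulled toward the wider output set, while the probabilistic mean stays at $0.5$.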

##### 5.5. Cases when Fuzzy and Probabilistic Formulas Coincide

For F-transform (and, more generally, in all the cases when the value $\int B_k(u)\,du$ is the same for all $k$), this additional denominator simply divides all the values by a constant. This constant appears both in the numerator and in the denominator of the resulting centroid formula (29), and thus it does not affect the resulting value $\bar u$.

Another case when the fuzzy and probabilistic formulas coincide is the case of the Takagi-Sugeno (TSK) approach; see, for example, [10]. This equivalence is, in effect, proven in [8]. In the TSK approach, rules have the type

“if $x$ has a property $A_k$, then $u = f_k(x)$” ($k = 1, \dots, n$),

for known functions $f_k(x)$. In the probabilistic setting, we assume that under a piece of information $E_k$, we must take $u = f_k(x)$. Thus, for a given input $x$, we select $u = f_k(x)$ with probability $p_k$, where $p_k = P(E_k \mid x) = A_k(x)$. The resulting mean is thus equal to $\bar u = \sum_{k=1}^{n} A_k(x)\,f_k(x)$. For the case when $\sum_{k=1}^{n} A_k(x) = 1$, this is exactly the TSK formula.
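The TSK estimate can be sketched in a few lines; the two rules and the linear consequent functions below are illustrative assumptions:

```python
import numpy as np

def tsk_output(x, A_funcs, f_funcs):
    """TSK estimate: u(x) = sum_k A_k(x) f_k(x) / sum_k A_k(x).
    When sum_k A_k(x) = 1 (a fuzzy partition), the denominator drops out
    and this coincides with the probabilistic mean sum_k A_k(x) f_k(x)."""
    w = np.array([A(x) for A in A_funcs], dtype=float)
    vals = np.array([f(x) for f in f_funcs], dtype=float)
    return float((w * vals).sum() / w.sum())
```

For example, with the partition $A_1(x) = 1 - x$, $A_2(x) = x$ on $[0, 1]$ and consequents $f_1(x) = 2x$, $f_2(x) = 1 - x$, the output at $x = 0.25$ is $0.75 \cdot 0.5 + 0.25 \cdot 0.75 = 0.5625$.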

##### 5.6. Comparison between Fuzzy and Probabilistic Modeling

For Mamdani-type situations when fuzzy and probabilistic formulas are different, the comparison of the corresponding probabilistic and fuzzy rules is done, in detail, in [8].

Let us add three more situations to this comparison, situations that are naturally related to our modified derivation.

##### 5.7. Case when Probabilistic Control Is Better

When the values $\int B_k(u)\,du$ are different, probabilistic control and fuzzy control lead, in general, to different values $\bar u$. We will show, using an example originally proposed by R. Yager, that in this case, the result of the probabilistic control is closer to common sense than the result of Mamdani's control.

Indeed, let us consider the situation in which we have two rules: (i)the first rule is a more general rule saying that if is small, then should be small; (ii)the second rule is a very specific rule, saying that if is very close to 0.11, then should be very close to 0.15.

Intuitively, if we have a value $x$ for which the very specific rule is applicable, for example, the value $x = 0.11$, then this specific rule should have priority over the general rule. However, since the width of the membership function corresponding to “very close to 0.15” is small, the corresponding term in (23) will practically not affect the resulting estimate (24).

In contrast, in the probabilistic control, the effect of each $B_k(u)$ is normalized by, crudely speaking, the total width $\int B_k(u)\,du$ of the corresponding function $B_k(u)$. Thus, even the most specific rules will have—as desired—a significant influence on the result (29).
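A Yager-style sketch of these two rules; the specific membership shapes (a wide triangle for “$u$ is small,” narrow triangles around 0.11 and 0.15) are illustrative assumptions:

```python
import numpy as np

us = np.linspace(0.0, 1.0, 4001)
du = us[1] - us[0]
tri = lambda c, w, t: np.maximum(0.0, 1.0 - np.abs(t - c) / w)

# general rule: "if x is small, then u should be small"
A_gen = lambda x: max(0.0, 1.0 - x)      # "x is small"
B_gen = tri(0.0, 0.5, us)                # "u is small" (wide)
# specific rule: "if x is very close to 0.11, then u is very close to 0.15"
A_spec = lambda x: tri(0.11, 0.01, x)    # narrow membership around 0.11
B_spec = tri(0.15, 0.01, us)             # narrow membership around 0.15

x = 0.11
w = [A_gen(x), A_spec(x)]                # here w = [0.89, 1.0]

# Mamdani (23)-(24): the narrow B_spec contributes almost nothing
mu = w[0] * B_gen + w[1] * B_spec
u_mamdani = (us * mu).sum() / mu.sum()

# probabilistic (28)-(29): each B_k is normalized by its total width
rho = w[0] * B_gen / (B_gen.sum() * du) + w[1] * B_spec / (B_spec.sum() * du)
u_prob = (us * rho).sum() / rho.sum()
```

The Mamdani output stays near the centroid of the wide “$u$ is small” set (about $1/6$), while the probabilistic output moves substantially toward the specific rule's target $0.15$.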

*Comment 3.* It should be mentioned that the problem with specific rules occurs only in Mamdani's approach to fuzzy control. In the alternative *logical* approach, this problem does not appear; see, for example, [17].

##### 5.8. Another Case When Probabilistic Control Is Better

The probabilistic interpretation enables us to naturally consider more general situations in which the rules are themselves probabilistic, that is, when, for each $k$ and $\ell$, we know the conditional *probability* $p_{k\ell}$ that if $x$ has the property $A_k$, then $u$ has the property $B_\ell$.

In other words, instead of the original rules

“if $x$ has the property $A_k$, then $u$ has the property $B_k$” ($k = 1, \dots, n$),

we now have rules

“if $x$ has the property $A_k$, then $u$ has the property $B_\ell$ with probability $p_{k\ell}$” ($k, \ell = 1, \dots, n$).

Indeed, in this case, due to the formula of full probability, the conditional probability density of $u$ under the condition $x$ has the form

$$\rho(u \mid x) = \sum_{k=1}^{n} \sum_{\ell=1}^{n} A_k(x)\,p_{k\ell}\,\rho(u \mid E'_\ell). \tag{30}$$

Here, we know the original probabilities $A_k(x)$ and the probabilities $p_{k\ell}$. The probability densities $\rho(u \mid E'_\ell)$ can be determined by using the Bayes theorem as in expression (27). Substituting the formula (27) and the expression (7) into the formula (30) (and changing the multiplication order), we get the formula

$$\rho(u \mid x) = \sum_{k=1}^{n} \sum_{\ell=1}^{n} A_k(x)\,p_{k\ell}\,\frac{B_\ell(u)}{\int B_\ell(u')\,du'}.$$

Once we know these probabilities, we can produce the mean $\bar u$ by using the formula (29).

##### 5.9. In Some Cases, Fuzzy Control Is Better

We have shown that in some situations, probabilistic control is better than the original Mamdani's fuzzy control. However, in other situations, the fuzzy control is better. Let us give two examples.

##### 5.10. Case when Mamdani's Formulas Are Better

The above probabilistic formulas only work for the case when $\sum_{k=1}^{n} B_k(u) = 1$, that is, in probabilistic terms, when the properties $B_k$ are mutually exclusive. In practice, we may have nonexclusive properties, in which case we may have $\sum_{k=1}^{n} B_k(u) \ne 1$.

It is not clear how to handle this situation within the probabilistic approach. However, such situations are not a problem if we apply fuzzy control: its formulas are applicable no matter whether we satisfy this requirement or not.

*Other Cases when Mamdani's Formulas Are Better*

The probabilistic interpretation is only possible when we use multiplication and addition as the “and” and “or” operations: $f_\wedge(a, b) = a \cdot b$ and $f_\vee(a, b) = a + b$.

Fuzzy control does not necessarily have to use these operations; it can use different t-norms and t-conorms. It is an empirical fact that in many control situations, the use of a t-norm different from the product and of a t-conorm different from the sum leads to a much better quality of control—for example, a more stable or a smoother one.

In [18], we formulated the problem of selecting the t-norm and the t-conorm as a precise optimization problem, and for several objective functions like smoothness or stability, we gave explicit analytical solutions to these optimization problems—specifically, we described the selection that leads to the optimal values of smoothness or stability. In many of these cases, the optimal selection is indeed different from the probabilistic case of product and sum. Thus, the fuzzy control methodology indeed leads to a better quality of control.

#### 6. Conclusion

The fuzzy transform (F-transform) techniques have been lately shown to be very successful in various applications, including applications where until recently, only more traditional tools like Fourier transform or wavelet transform have been applied. In many other applications, however, the traditional tools have a clear advantage. It is therefore desirable to combine F-transform with the more traditional tools, so as to combine the relative advantages of both techniques. To make this combination easier, it is desirable to interpret F-transform in traditional mathematical terms.

In this paper, we describe a modification of a probabilistic interpretation described in [8]. In this modification, the corresponding probabilistic model uniquely leads to the formulas of the F-transform. A similar modification is described in a more general situation of fuzzy modeling.

#### Acknowledgments

This paper was supported in part by the National Science Foundation Grant HRD-0734825, by Grant 1 T36 GM078000-01 from the National Institutes of Health, and by Grant MSM 6198898701 from MŠMT of Czech Republic. The authors are thankful to Josef Štěpán and Ron Yager for motivation and valuable discussions, and to the anonymous referees for valuable suggestions.