#### Abstract

In many practical situations, such as weather prediction, we are interested in the large-scale (averaged) values of the predicted quantities. For example, it is impossible to predict the exact future temperature at different spatial locations, but we can predict the average temperature over a region reasonably well. Traditionally, to obtain such large-scale predictions, we first perform a detailed integration of the corresponding differential equation and then average the resulting detailed solution. This procedure is often very time-consuming, since we need to process all the details of the original data. In our previous papers, we have shown that large-scale predictions of similar quality can be obtained by a much faster procedure: first average the inputs (by applying an appropriate fuzzy transform) and then use these averaged inputs to solve the corresponding (discretization of the) differential equation. In this paper, we provide a general theoretical explanation of why our semiheuristic method works, that is, why fuzzy transforms are efficient in large-scale predictions.

#### 1. Formulation of the Problem

##### 1.1. Predictions Are Needed

One of the main objectives of science is to predict the future values of physical quantities. For example, it is desirable to predict tomorrow's weather, the weather for several days ahead, and so forth. For a spreading flu epidemic, it is desirable to predict how this epidemic will spread if we do not introduce any restrictions on travel, and how this spread will change if such restrictions are introduced.

##### 1.2. Detailed Predictions Are Often Impossible

Of course, ideally, it is desirable to have predictions which are as detailed as possible. For example, we would like to know the exact value of tomorrow's temperature and wind speed at all possible spatial locations within a given region, or to predict exactly where the epidemic will spread and exactly how many people will fall ill if we do not introduce any travel restrictions.

However, in many practical situations, such a detailed prediction is impossible. In some of these situations, prediction is potentially possible, but it requires so much computation that, even on the fastest modern computers, the computations finish long after the future event (that we are trying to predict) has already occurred.

##### 1.3. Large-Scale Predictions Are Usually Sufficient

In many practical situations in which we cannot predict the *exact* values of the future quantities, it is often sufficient to predict the *average* values of the future quantities, averaged over certain areas.

For example, from the practical viewpoint, even though we cannot predict the exact value of tomorrow's temperature at all possible spatial locations, it would be beneficial to predict the *average* temperature over a given small geographic region. Similarly, for an epidemic, even though we are unable to predict where exactly it will spread and how many people will fall ill in different small towns, it is very beneficial to be able to predict how many people *on average* will get ill in the region.

For predicting time series, for example, financial time series formed by the prices of different stocks at different moments of time, though it is impossible to predict the exact values of the future prices, it is desirable to at least be able to predict the *trends*, that is, the prices averaged over a certain time period.

*Comment 1.3.* For clarity and simplicity, in the following text, we will describe the case when both the input and the output depend only on time $t$. The exact same formulas can also be applied if we have a spatial dependence; in this case, $t$ and $s$ are the corresponding spatial points.

##### 1.4. Towards a Precise Mathematical Description of Quantities Predicted by Large-Scale Prediction

Instead of predicting the values $y(t)$ for different moments of time $t$, we predict the weighted averages $y_{\mathrm{av}}(t)$, that is, the average of the values $y(s)$ for the values $s$ which are close to $t$.

It is reasonable to assume that for different moments $t$ we use the same averaging, that is, the weight with which the value $y(s)$ contributes to $y_{\mathrm{av}}(t)$ depends only on the difference $t-s$ and not on the absolute values of $t$ or $s$. Under this assumption, the general formula for the weighted average takes the form

$$y_{\mathrm{av}}(t)=\int w(t-s)\,y(s)\,ds, \tag{1}$$

where all the weights $w(t-s)$ are nonnegative and, for each $t$, the total weight of all the values $y(s)$ is equal to 1:

$$\int w(t-s)\,ds=1. \tag{2}$$

##### 1.5. An Example and a Useful Equivalent Reformulation of Averaging

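A natural example of such averaging is a *Gaussian averaging*, where we use Gaussian weights:

$$w(t)=\frac{1}{\sqrt{2\pi}\,\sigma}\exp\left(-\frac{t^2}{2\sigma^2}\right). \tag{3}$$

It is often convenient to represent this Gaussian weight function as

$$w(t)=c\cdot w_0(t) \tag{4}$$

for a constant $c$, where the new weight function $w_0(t)$ is described by a simpler formula

$$w_0(t)=\exp\left(-\frac{t^2}{2\sigma^2}\right). \tag{5}$$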
This new weight function satisfies the property

$$w_0(0)=1 \quad\text{and}\quad 0\le w_0(t)\le 1 \ \text{ for all } t. \tag{6}$$

##### 1.6. Large-Scale Quantities and Fuzzy Transform

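A similar representation is often useful for other weight functions as well. In general, once we know this new weight function $w_0(t)$, we can use the normalization condition (2) to find that

$$w(t)=\frac{w_0(t)}{\int w_0(s)\,ds}. \tag{7}$$

Thus, in terms of the new weight function $w_0(t)$, the weighted average (1) takes the form

$$y_{\mathrm{av}}(t)=\frac{\int w_0(t-s)\,y(s)\,ds}{\int w_0(s)\,ds}. \tag{8}$$

Expression (8) is a particular case of the expression of a *fuzzy transform* [1–3], which is, in general, defined as

$$F=\frac{\int A(s)\,y(s)\,ds}{\int A(s)\,ds} \tag{9}$$

for some function $A(s)$ for which $A(s)\ge 0$. For a special uniform case [2, 3], we have several functions of the form $A_i(s)=A_0(s-t_i)$, where $A_0(s)$ is a given function. The corresponding values of the fuzzy transforms are then equal to

$$F_i=\frac{\int A_0(s-t_i)\,y(s)\,ds}{\int A_0(s)\,ds}, \tag{10}$$

that is, for $A_0(s)=w_0(-s)$ (which is the same as $w_0(s)$ for symmetric weights such as the Gaussian ones), they coincide with the values $y_{\mathrm{av}}(t_i)$ corresponding to different points $t_i$.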

Thus, from the mathematical viewpoint, the weighted averages are simply the values of the fuzzy transform.
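To make this correspondence concrete, here is a minimal Python sketch that approximates the Gaussian-weighted averages on a sampled signal. The function names, the test signal, the grid, and the value of `sigma` are illustrative assumptions, not taken from the experiments cited in this paper. The integrals are replaced by sums over a uniform grid, and each average is normalized by the sum of its own weights, so the grid step cancels.

```python
import numpy as np

def gaussian_w0(t, sigma):
    """Unnormalized Gaussian weight w0(t) = exp(-t^2 / (2 sigma^2))."""
    return np.exp(-t**2 / (2.0 * sigma**2))

def fuzzy_transform(y, s, nodes, sigma):
    """Discrete weighted averages of y (sampled on grid s) around each node t_i.

    Assumption: integrals are approximated by sums over the uniform grid s,
    and each average is divided by the sum of its own weights.
    """
    # weights[i, j] = w0(t_i - s_j)
    weights = gaussian_w0(nodes[:, None] - s[None, :], sigma)
    return (weights @ y) / weights.sum(axis=1)

# Hypothetical detailed signal: a slow component plus a fast oscillation.
s = np.linspace(0.0, 10.0, 2001)
y = np.sin(s) + 0.2 * np.sin(40.0 * s)

# A few large-scale nodes t_i, kept away from the ends of the grid.
nodes = np.linspace(2.0, 8.0, 7)
F = fuzzy_transform(y, s, nodes, sigma=0.5)
```

In this sketch, `F[i]` plays the role of the large-scale value at the node `nodes[i]`: the fast oscillation is largely averaged out, while the slow component survives in damped form.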

##### 1.7. Typical Prediction Procedure: Solving a Differential Equation

Most relations in physics are described by differential equations. In particular, the relation between the observed signals $x(t)$ and the predicted values $y(t)$ can also be described by a differential equation.

##### 1.8. Traditional Procedure for Large-Scale Predictions

Since prediction usually means solving a known differential equation, a usual procedure for large-scale predictions is as follows: (i) first, we use the known values $x(t)$ to solve the differential equations and get the values $y(t)$; (ii) then, we apply the weighted average procedure (8) to the resulting values $y(t)$ and get the desired large-scale predictions $y_{\mathrm{av}}(t)$.

##### 1.9. Drawbacks of the Traditional Procedure

The main drawback of the traditional procedure is that we spend a lot of computation time to get a detailed solution $y(t)$, but in the end, we only return a few values corresponding to large-scale predictions.

For example, in weather prediction, we spend hours of computer time on high-performance supercomputers to solve a complex system of differential equations with thousands of variables and then only use the large-scale weighted average of this solution.

##### 1.10. Natural Idea

We are only interested in *large-scale* predictions, that is, only in the weighted *averages* of the result $y(t)$ of solving the differential equation, averages that ignore the fine structure of the solution $y(t)$. So why not start with the averaged values of the input $x(t)$, that is, why not ignore the fine structure of $x(t)$ from the very beginning and thus save computation time?

In other words, (1) traditionally, we first *integrate* the differential equation and then *average* the solution; (2) what we propose is that we first *average* and only then *integrate*; in this manner, we will need fewer values to integrate and, thus, less computation time.

##### 1.11. Empirically, This Idea Seems to Work

For several differential equations, we implemented the above idea of how to speed up computations. Specifically, (i) instead of the original input $x(t)$, we use the fuzzy transform values $x_{\mathrm{av}}(t_i)$; (ii) then we use the values $x_{\mathrm{av}}(t_i)$ in the discretized version of the original differential equation; (iii) finally, we use the results of this solution as an estimate for the desired large-scale averages $y_{\mathrm{av}}(t_i)$ (= fuzzy transform of $y(t)$).

Surprisingly, we got a very good approximation to the values $y_{\mathrm{av}}(t_i)$ computed based on the detailed solution $y(t)$ [2–12].

##### 1.12. What We Do in This Paper

In this paper, we provide a theoretical explanation for the empirical success of the fuzzy-transform-based methods of speeding up computations.

This explanation makes us confident that this fuzzy transform technique can be successfully used in other large-scale prediction problems as well.

#### 2. Theoretical Explanation

##### 2.1. Linearization

Usually, the effect of each input value $x(s)$ on the prediction results is small. In this sense, we can say that the inputs are relatively small. Thus, we can use the standard technique for dealing with dependence on small values: (1) expand the dependence of $y(t)$ on $x(s)$ in a Taylor series, (2) ignore quadratic and higher order terms, and thus (3) keep only linear terms in this dependence.

In this case, we get the following dependence:

$$y(t)=a_0(t)+\int a(t,s)\,x(s)\,ds \tag{11}$$

for some functions $a_0(t)$ and $a(t,s)$.

##### 2.2. Shift-Invariance

We are interested in systematic predictions, predictions that need to be repeated again and again. In these predictions, there is no fixed moment of time: if we start with the same input repeated later (i.e., shifted in time, from $x(t)$ to $x(t+t_0)$), we get the same result (similarly shifted) $y(t+t_0)$.

For the formula (11), this shift-invariance means that (1) first, we must have $a_0(t+t_0)=a_0(t)$ for all $t$ and $t_0$; in particular, for $t=0$, we conclude that $a_0(t_0)=a_0(0)$, that is, $a_0$ should not depend on time at all: $a_0(t)=a_0=\mathrm{const}$; (2) second, we must have $a(t+t_0,s+t_0)=a(t,s)$ for all $t$, $s$, and $t_0$; in particular, for $t_0=-s$, we conclude that $a(t,s)=a(t-s,0)$ and that the function $a(t,s)$ should only depend on the difference $t-s$.

Thus, we arrive at the following dependence, in which $a(t-s)$ denotes $a(t-s,0)$:

$$y(t)=a_0+\int a(t-s)\,x(s)\,ds. \tag{12}$$
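As a quick numerical illustration of shift-invariance, the sketch below uses a discrete analogue of a linear dependence with a constant term and a kernel that depends only on the time difference. The kernel `a`, the constant `a0`, the random input, and the shift `k` are hypothetical choices made only for this check. It also assumes a unit time step and an input that is zero outside the sampled window.

```python
import numpy as np

def linear_response(x, a, a0):
    """Discrete analogue of y(t) = a0 + integral of a(t - s) x(s) ds (unit time step)."""
    # Full convolution: result[n] = sum over m of a[n - m] * x[m]
    return a0 + np.convolve(a, x)

rng = np.random.default_rng(0)
x = rng.normal(size=50)               # hypothetical input signal
a = np.exp(-0.3 * np.arange(20))      # hypothetical kernel a(t - s)
a0 = 1.5                              # hypothetical constant term
k = 7                                 # shift, in samples

y = linear_response(x, a, a0)

# The same input, started k steps later (zero before it starts).
x_shifted = np.concatenate([np.zeros(k), x])
y_shifted = linear_response(x_shifted, a, a0)

# The response to the shifted input is the original response, shifted by k.
shift_ok = np.allclose(y_shifted[k:], y)
```

Here the input is delayed rather than advanced, which corresponds to a negative shift $t_0$; the argument is the same for either sign.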

##### 2.3. Main Result: Formulation

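In the traditional approach, we first find the detailed output (12) and then average it by applying the averaging

$$y_{\mathrm{av}}(t)=\int w(t-s)\,y(s)\,ds. \tag{13}$$

An alternative approach is to first apply the same averaging to the original signal $x(t)$, resulting in

$$x_{\mathrm{av}}(t)=\int w(t-s)\,x(s)\,ds, \tag{14}$$

and try to use this averaged signal as the input to the corresponding dynamical system (i.e., in effect, to transformation (12)):

$$\tilde y(t)=a_0+\int a(t-s)\,x_{\mathrm{av}}(s)\,ds. \tag{15}$$

Our claim is that these two approaches always lead to the same result, that is,

$$\tilde y(t)=y_{\mathrm{av}}(t) \tag{16}$$

for all moments of time $t$.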

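*Proof.* In terms of the normalized weight function $w$ (7), the averaged output has the form

$$y_{\mathrm{av}}(t)=\int w(t-s)\,y(s)\,ds, \tag{17}$$

where $y(s)$ is determined by formula (12). Substituting the expression

$$y(s)=a_0+\int a(s-u)\,x(u)\,du \tag{18}$$

into formula (17), we conclude that

$$y_{\mathrm{av}}(t)=\int w(t-s)\left(a_0+\int a(s-u)\,x(u)\,du\right)ds, \tag{19}$$

that is, since by (2) the weights integrate to 1,

$$y_{\mathrm{av}}(t)=a_0+\int K(t,u)\,x(u)\,du, \tag{20}$$

where

$$K(t,u)=\int w(t-s)\,a(s-u)\,ds. \tag{21}$$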

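Similarly, in terms of the normalized weight function $w$, we have

$$x_{\mathrm{av}}(t)=\int w(t-s)\,x(s)\,ds. \tag{22}$$

Substituting the corresponding formula

$$x_{\mathrm{av}}(s)=\int w(s-u)\,x(u)\,du \tag{23}$$

into expression (15) for $\tilde y(t)$, we conclude that

$$\tilde y(t)=a_0+\int a(t-s)\left(\int w(s-u)\,x(u)\,du\right)ds, \tag{24}$$

that is,

$$\tilde y(t)=a_0+\int \tilde K(t,u)\,x(u)\,du, \tag{25}$$

where

$$\tilde K(t,u)=\int a(t-s)\,w(s-u)\,ds. \tag{26}$$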

In view of formulas (20) and (25), to prove that the values $y_{\mathrm{av}}(t)$ and $\tilde y(t)$ always coincide, it is sufficient to prove that the corresponding functions $K(t,u)$ and $\tilde K(t,u)$ coincide for all $t$ and $u$. These functions are defined by expressions (21) and (26).

To prove that these expressions coincide, let us try to transform them into each other. In expression (26), we take the value of the normalized weight function $w$ at the point $s-u$. In contrast, in expression (21), we use the value $w(t-s)$ for the corresponding auxiliary variable $s$. To transform expression (26) into the form (21), let us introduce a new auxiliary variable $s'$ for which $s-u=t-s'$. From this formula, we conclude that $s=t+u-s'$, hence $t-s$ takes the form $s'-u$. Thus, in terms of the new variable $s'$, the integrated expression in (26) takes the form

$$a(t-s)\,w(s-u)=a(s'-u)\,w(t-s'). \tag{27}$$

Hence, the integrals of these two expressions must also coincide (the substitution reverses the direction of integration, but over the whole real line this does not change the value of the integral):

$$\int a(t-s)\,w(s-u)\,ds=\int w(t-s')\,a(s'-u)\,ds'. \tag{28}$$

The right-hand side of this equality is exactly the expression (21); the only difference is that we use a different name for the integration variable ($s'$ instead of $s$). Thus, the functions $K(t,u)$ and $\tilde K(t,u)$ indeed coincide and, hence, $\tilde y(t)=y_{\mathrm{av}}(t)$.

The equality is proven.
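The statement just proven can also be checked numerically in a discrete setting. The sketch below is only an illustration under stated assumptions: a unit time step, a hypothetical kernel `a`, constant `a0`, and random input `x`, and Gaussian weights `w` truncated to a finite window and normalized to sum to 1. It compares "integrate, then average" with "average, then integrate".

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=200)                      # hypothetical detailed input
a = np.exp(-0.2 * np.arange(30))              # hypothetical kernel a(t - s)
a0 = 0.7                                      # hypothetical constant term

# Truncated Gaussian averaging weights, normalized so that they sum to 1.
offsets = np.arange(-10, 11)
w = np.exp(-offsets**2 / (2.0 * 3.0**2))
w /= w.sum()

# Traditional order: detailed output first, then averaging.
y = a0 + np.convolve(a, x)
y_av = np.convolve(w, y)

# Proposed order: average the input first, then apply the same linear system.
x_av = np.convolve(w, x)
y_tilde = a0 + np.convolve(a, x_av)

# Near the ends, zero padding distorts the average of the constant term a0,
# so the two results are compared on the interior samples only.
m = len(w) - 1
interior = slice(m, -m)
max_diff = np.max(np.abs(y_av[interior] - y_tilde[interior]))
```

Both orders introduce the same index delay (half the length of `w`), so the two arrays are aligned sample by sample. On the interior samples, `max_diff` is at the level of floating-point rounding, which reflects the fact that discrete convolution, like its continuous counterpart, is commutative and associative.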

*Comment 2.3.* In the ideal case, when quadratic terms can be completely ignored and there is no dependence on absolute time, the new method leads to *exactly* the same large-scale predictions as the traditional one. In practice, if we take into account that (i) the quadratic terms are small but non-zero, and that (ii) there may be an underlying trend-like dependence on absolute time (like global warming in weather prediction), we end up with *approximate* equality between the traditional and fuzzy-transform-based predictions, and this approximate equality is what we observed in our experiments [2–12].

Since large-scale predictions are approximate anyway, this approximate equality means that, in terms of accuracy, the new predictions are, in effect, as good as the traditional ones. Since the new predictions are much faster to compute, they have a clear practical advantage.

#### Acknowledgments

This work was supported in part by the National Science Foundation Grant HRD-0734825, by Grant 1 T36 GM078000-01 from the National Institutes of Health, and by Grant MSM 6198898701 from MŠMT of Czech Republic. The authors are thankful to the anonymous referees for valuable suggestions.