Abstract

A new estimate of the probability density function (PDF) of the sum of a random number of independent and identically distributed (IID) random variables is presented. The sum PDF is represented as a sum of normal PDFs weighted according to the PDF of the number of summands. The analytical model is verified by numerical simulations. The comparison is made by the Chi-Square Goodness-of-Fit test.

1. Introduction

The probability density function (PDF) of the sum of a random number of independent random variables is important for many applications in the scientific and technical area [1]. Such a problem is not at all straightforward and has a theoretical solution only in some cases [2–5]. Further, in some cases the theoretical solution is not viable for engineering purposes (see, e.g., [2–6] and the references therein, especially in [4, 5], where the rates of convergence in various metrics are studied as well).

The purpose of this paper is to find a viable and accurate estimate of the PDF of a random variable (RV) that is the sum of a random number of independent and identically distributed (IID) real RVs. The new model also has the benefit of providing intuitive support for physical interpretation. In simple terms, the sum PDF is represented as a sum of normal PDFs weighted according to the PDF of the number of summands. Two particular results are achieved: the conditions under which the PDF of the sum again approximates a normal PDF, and those under which it approximates the envelope of the PDF of the number of summands, except at the extremes when they are high.

The theory can be applied whenever a random number of IID RVs has to be summed. For instance, the total energy stored in the volume of reverberation chambers (RCs) can be statistically obtained as the sum of a random number of IID RVs representing the energy stored in as many appropriate subvolumes. Following this rationale, the PDF of the total stored energy in an RC can be obtained, as well as the PDF of the relevant quality factor. A similar rationale is used to obtain a generalized stochastic field model for RCs [7].

Another important application arises in coherent narrow-band sensors, when speckle occurs and no appropriate model is available to best mitigate it or to exploit it as a geophysical signal [8].

2. The Model

The RV $S$ is expressed as follows:

$$S = \sum_{i=1}^{N} X_i, \tag{1}$$

where the real IID RVs $X_i$ are assumed to have finite variance. $N$ is a positive integer RV independent of the $X_i$; the values of $N$ are denoted by $n_k$ ($k = 1, \ldots, K$), where $n_1$ and $n_K$ are the minimum and maximum values of $N$, respectively. The value $n_1$ cannot be too small, since the central limit theorem (CLT) has to be applied for each value $n_k$; an adequate minimum value is indicated in [9].

The PDF of $N$ can be written as follows:

$$f_N(n) = \sum_{k=1}^{K} p_k\,\delta(n - n_k), \tag{2}$$

where $p_k$ is the discrete probability relevant to $n_k$ and $\delta(\cdot)$ is the delta function.

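The mean and variance of $S$ are as follows [2]:

$$\mu_S = \mu_N\,\mu_X, \tag{3}$$

$$\sigma_S^2 = \mu_N\,\sigma_X^2 + \sigma_N^2\,\mu_X^2, \tag{4}$$

where $\mu_X$ and $\sigma_X$ are the mean and standard deviation of $X_i$, respectively; $\mu_N$ and $\sigma_N$ are the mean and standard deviation of $N$, respectively.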
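As a quick sanity check of the mean and variance expressions (3) and (4), the following minimal sketch compares them with a small Monte Carlo simulation. The helper names `predicted_moments` and `simulate_random_sums`, as well as the chosen distributions (a number of summands uniform on the integers 30 to 50 and summands uniform on [0, 1)), are illustrative assumptions and are not taken from the paper.

```python
import random
import statistics

def predicted_moments(mu_n, var_n, mu_x, var_x):
    """Mean and variance of a sum of a random number of IID terms,
    with the number of terms independent of the terms themselves."""
    mean_s = mu_n * mu_x
    var_s = mu_n * var_x + var_n * mu_x ** 2
    return mean_s, var_s

def simulate_random_sums(draw_n, draw_x, trials, seed=0):
    """Draw `trials` random sums: first the count, then that many terms."""
    rng = random.Random(seed)
    return [sum(draw_x(rng) for _ in range(draw_n(rng))) for _ in range(trials)]

# Illustrative distributions (assumptions, not the paper's test cases):
# count uniform on the 21 integers 30..50 -> mean 40, variance (21^2 - 1) / 12
# terms uniform on [0, 1)                 -> mean 0.5, variance 1 / 12
mean_s, var_s = predicted_moments(mu_n=40.0, var_n=(21 ** 2 - 1) / 12.0,
                                  mu_x=0.5, var_x=1.0 / 12.0)
samples = simulate_random_sums(lambda rng: rng.randint(30, 50),
                               lambda rng: rng.random(),
                               trials=20000)

print("predicted:", mean_s, var_s)
print("simulated:", statistics.fmean(samples), statistics.variance(samples))
```

The simulated moments should agree with the predicted ones up to the usual statistical fluctuations of a finite sample.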

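The random sum $S$ represents $K$ mutually exclusive random sums, which are selected by the values $n_k$; they are

$$S_k = \sum_{i=1}^{n_k} X_i, \quad k = 1, \ldots, K. \tag{5}$$

The set of the values of $S$, which is the certain event, is given by the sum of the mutually exclusive events, which are the subsets formed by the values of the RVs $S_k$; that is, it can be written as follows:

$$A = A_1 + A_2 + \cdots + A_K, \tag{6}$$

where $A$ is the set of the values of $S$ and $A_k$, $k = 1, \ldots, K$, are the subsets of the values of the corresponding $S_k$. Therefore, it results in

$$P(A) = \sum_{k=1}^{K} P(A_k). \tag{7}$$

The PDF of the random sum $S$ is given by the sum of $K$ PDFs, $f_{S_k}(s)$, which are weighted by the relevant probability $p_k$. That is, it can be written as follows:

$$f_S(s) = \sum_{k=1}^{K} p_k\, f_{S_k}(s). \tag{8}$$

Considering the CLT, the PDFs $f_{S_k}(s)$ can be assumed normal [9, 10]. Hence [9],

$$f_{S_k}(s) \approx \frac{1}{\sqrt{2\pi n_k}\,\sigma_X} \exp\left[-\frac{(s - n_k\mu_X)^2}{2 n_k \sigma_X^2}\right]. \tag{9}$$

By considering (8) and (9), the estimate of $f_S(s)$ can be written as follows:

$$f_S(s) \approx \sum_{k=1}^{K} \frac{p_k}{\sqrt{2\pi n_k}\,\sigma_X} \exp\left[-\frac{(s - n_k\mu_X)^2}{2 n_k \sigma_X^2}\right], \tag{10}$$

where $\mu_X$ and $\sigma_X$ are the mean and standard deviation of the $X_i$. It can be noted that if $\mu_X = 0$, then the PDF of $S$ is the sum of normal PDFs which are centered at zero with increasing variances and weighted by the relevant probability $p_k$.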
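The weighted sum of normal PDFs described above can be sketched in a few lines. This is a minimal illustration, assuming that each normal component for a given number of summands n has mean n times the summand mean and standard deviation sqrt(n) times the summand standard deviation, as the CLT suggests. The function names, the three-point distribution of the number of summands, and the summand parameters are hypothetical and chosen only for illustration.

```python
import math

def normal_pdf(s, mean, std):
    """Normal density with the given mean and standard deviation."""
    z = (s - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

def sum_pdf_estimate(s, n_values, probs, mu_x, sigma_x):
    """Mixture of normal PDFs, one per admissible number of summands n,
    each weighted by the probability of that n."""
    return sum(p * normal_pdf(s, n * mu_x, math.sqrt(n) * sigma_x)
               for n, p in zip(n_values, probs))

def integrate(f, a, b, steps=20000):
    """Plain trapezoidal rule, adequate for these smooth integrands."""
    h = (b - a) / steps
    total = 0.5 * (f(a) + f(b)) + sum(f(a + i * h) for i in range(1, steps))
    return total * h

# Illustrative parameters (assumptions): three admissible counts and
# summands with mean 2 and standard deviation 2.
n_values, probs = [30, 40, 50], [0.25, 0.5, 0.25]
mu_x, sigma_x = 2.0, 2.0

pdf = lambda s: sum_pdf_estimate(s, n_values, probs, mu_x, sigma_x)
area = integrate(pdf, -100.0, 300.0)
mean = integrate(lambda s: s * pdf(s), -100.0, 300.0)
var = integrate(lambda s: (s - mean) ** 2 * pdf(s), -100.0, 300.0)
print(area, mean, var)
```

For these parameters the count has mean 40 and variance 50, so the mixture integrates to one and its mean and variance come out at 80 and 360, which are the values that the mean and variance formulas of the random sum give for the same inputs.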

We stress that (9) and (10) are written under the condition that the summed RVs are identically distributed; this is a precaution with respect to the validity of the CLT [9–13]. The extension of (9) to RVs that are not identically distributed is practicable, provided that the further conditions on the RVs concerning possible convergence to distributions other than the normal one are considered [9–13]; however, the corresponding extension of (10) requires more stringent conditions, as specified below (see the end of Section 4). Note that when the PDF of the sum of the RVs is exactly expressible for any fixed number of summands, regardless of the PDFs of the individual RVs (which may also differ), (10) can be written exactly and no validation procedure is necessary. Sums of independent exponentially distributed RVs, including those with different scale parameters [14–17], are one example (not the only one) in which the RVs are not identically distributed and (9) and (10) can still be written exactly.

3. Model Applicability

Equation (10) can be easily implemented by a simple algorithm, which can be run on a calculator to obtain the PDF values; they are denoted here as analytical solution values.

The error on a single distribution depends on the number of summed RVs and on the form of their density [9, 18–21]; it is largest for the minimum number of summands; however, under the assumed hypothesis that the random variables are IID, the minimum value adopted is adequate for most applications [9]. The total error depends on and , and it can be estimated by considering the error on each distribution. The approximation in (10) improves as the minimum number of summands increases. However, the analytical determination of the total error is too arduous. Here, in order to find the conditions under which the overall approximation is adequate for applications, we compare the histogram obtained from (10) with that obtained by numerical simulation. The latter is obtained by generating RVs having the same PDFs as the summands and their number, which are known, and then by summing the generated summands according to the generated values of their number. The histogram originating from the numerical simulation is used as the null hypothesis for the comparison. The comparison is carried out by the Chi-Square Goodness-of-Fit test at the significance level of 0.05. The analytical solution, which includes the implementation of (10), the numerical simulation, and the Chi-Square Goodness-of-Fit test are carried out with the software LabVIEW by National Instruments. Clearly, the width of the bins and the relevant centers along the horizontal axis are the same for the analytical histogram and for the numerically simulated one.

Even though the results of the tests shown here are obtained by using the PDF of the simulated data as the null hypothesis, it has been verified that the outcome of the test is the same when the model (10) is chosen as the null hypothesis. Actually, since (10) is an analytical model, it allows us to calculate the probability and the number of samples for each bin with no statistical fluctuations, so that it could be used as the null hypothesis for the comparison. This inversion of rationale is again consistent with the theory of the Chi-Square Goodness-of-Fit test. In fact, it corresponds to considering the model as exact, so that it provides the theoretical frequencies, which are compared with the frequencies obtained by numerical simulations; the latter have the typical statistical fluctuations. In order to calculate the number of degrees of freedom for the Chi-Square test, it is specified that the means and standard deviations of the summands and of their number are theoretical parameters, since the PDF of the number of summands is known, as are the mean and variance of the summands. When (10) is considered exact, its mean and variance are readily obtained; they turn out to be equal to (3) and (4), respectively.

The comparisons are made so that the bins with fewer than four samples are discarded. The number of samples of the sum is equal to 10000 and the number of bins is 60.
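The comparison procedure described above can be sketched as follows. This is only an illustrative Python sketch and not the LabVIEW implementation used by the authors. It takes the analytical model as the source of the expected frequencies (the inverted rationale mentioned in the text), uses 10000 samples and 60 bins, and discards bins with fewer than four samples. The function names, the test distributions (number of summands uniform on the integers 30 to 50, summands uniform on [0, 1)), and the use of the Wilson-Hilferty approximation for the critical value (chosen here to avoid external libraries) are all assumptions.

```python
import math
import random
from statistics import NormalDist

def model_bin_probability(lo, hi, n_values, probs, mu_x, sigma_x):
    """Probability that the normal-mixture model assigns to the bin [lo, hi)."""
    total = 0.0
    for n, p in zip(n_values, probs):
        comp = NormalDist(n * mu_x, math.sqrt(n) * sigma_x)
        total += p * (comp.cdf(hi) - comp.cdf(lo))
    return total

def chi2_critical(dof, alpha=0.05):
    """Approximate upper critical value (Wilson-Hilferty approximation)."""
    z = NormalDist().inv_cdf(1.0 - alpha)
    c = 2.0 / (9.0 * dof)
    return dof * (1.0 - c + z * math.sqrt(c)) ** 3

def chi_square_comparison(samples, n_values, probs, mu_x, sigma_x,
                          bins=60, min_count=4):
    """Chi-square statistic of the simulated histogram against the model,
    skipping bins with fewer than `min_count` samples."""
    lo, hi = min(samples), max(samples)
    width = (hi - lo) / bins
    observed = [0] * bins
    for s in samples:
        observed[min(int((s - lo) / width), bins - 1)] += 1
    stat, used = 0.0, 0
    for i, obs in enumerate(observed):
        if obs < min_count:
            continue  # sparsely populated bin: discarded
        a = lo + i * width
        expected = len(samples) * model_bin_probability(
            a, a + width, n_values, probs, mu_x, sigma_x)
        stat += (obs - expected) ** 2 / expected
        used += 1
    return stat, used - 1  # no parameter is estimated from the data

# Illustrative setup (assumptions, not the paper's cases)
rng = random.Random(1)
n_values = list(range(30, 51))
probs = [1.0 / len(n_values)] * len(n_values)
mu_x, sigma_x = 0.5, math.sqrt(1.0 / 12.0)
samples = [sum(rng.random() for _ in range(rng.choice(n_values)))
           for _ in range(10000)]

stat, dof = chi_square_comparison(samples, n_values, probs, mu_x, sigma_x)
print("statistic:", stat, "dof:", dof, "critical:", chi2_critical(dof))
```

The model is not rejected at the 0.05 level when the statistic is below the critical value. Computing the expected counts from bin probabilities rather than from the density at the bin centers avoids a discretization error when the bins are wide.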

In some cases, in order to avoid values close to zero (too much low) or simply to test different values of and (namely, different values) with the same shape of the PDF, the mean of the PDF is shifted by adding a fixed value to it. In these cases, the relevant PDF is shown with the same nomenclature except for the addition of the adjective “shifted” as we have a shift of the mean value only.

The comparison is made by using various pairs of PDFs for the summed RVs and for their number, with different means and variances. In particular, two meaningful PDFs are tested: the uniform distribution and the chi-square distribution with two degrees of freedom; the latter is shifted along the horizontal axis (shifted exponential distribution). However, for both the summed RVs and their number, other PDFs, such as a triangular distribution, a chi distribution with two degrees of freedom (Rayleigh distribution), and a chi-square distribution with six degrees of freedom, are tested as well. To save space, only the results for the uniform, (un)shifted exponential, and (un)shifted Rayleigh PDFs are shown here.

In all the tests performed, the null hypothesis is accepted at the significance level of 0.05; some results are shown below in Figures 1–19.

4. Discussion on the Behaviour of the PDF and Two of Its Useful Approximations

In this section, we consider the behavior of the PDF of the sum in terms of some parameters defined through the means and variances of the summed RVs and of their number. In other words, we write the general conditions under which the PDF of the sum approximates the normal PDF and those under which it approximates the envelope of the PDF of the number of summands; they implicitly include the conditions under which the PDF of the sum progressively changes from the former to the latter. The conditions under which the weighted normal PDFs approximating the PDF of the sum are separated are also examined.

To start from (4), by also taking a cue from (10) and (24) in [2], and by considering that implies , it can be inferred that ifwhere and are adequate threshold values depending on the PDFs of the RVs and , then we can again assume approximately normal. Note that , where is the variation coefficient of . Normally, the thresholds and are both less than one; however, if then can turn out to be considerably greater than one. By setting , it is noted that ; namely, as (11)-(12) show, the thresholds and are not independent in sense that if one decreases, the other can increase and vice versa; clearly, it can occur within adequate limits depending on the PDFs of and . It can be noted that (11)-(12) are similar to (10) and (24) in [2]; actually, they are as a whole less stringent than (10) and (24) in [2]. However, (11)-(12) are very useful for the practical applications, where it is sufficient to know an approximate solution and not just that exact or asymptotically exact.

Similarly, by considering that implies , it can be inferred that ifwhere and are adequate threshold values depending on the PDFs of and , then approximates the envelope of the PDF, as it will be cleared below.

The thresholds and depend on the ; as this last one increases, decreases and increases.

For both inferences (11)-(12) and (13)-(14), it is necessary that the next weighted normal PDFs forming the PDF markedly interfere. In order to find the conditions under which the next weighted normal PDFs markedly interfere, the next unweighted normal PDFs can be considered. Mean and standard deviation of each normal component can be assumed equal to and , respectively; it can be written as follows:Therefore, it has to be (the double symbol means “the same order of magnitude and greater than”); that is Indeed, if (16) is satisfied for it is satisfied for any belonging to its distribution. For the inference (11)-(12), in (16) the symbol meaning “greater than” is prevalent, whereas, for the inference (13)-(14), in (16) the symbol meaning “the same order of magnitude” is prevalent.

Note that condition (16) is included in condition (12). Note also that for a given pair of PDFs of and , by fixing a certain value of , as increases, increases as well; so, gradually is exceeded and condition (16) is no longer satisfied. Under these conditions, the weighted normal PDFs approximating the PDF are partially separated; this separation increases with the increasing of with respect to , so that (10) always approximates the PDF, but this last one does not approximate the envelope of the PDF again. We stress that the limitation is due to condition (16).

Clearly, (15) and (16) can be written for the cases where the difference between two next values is greater than 1; that is, it can be written as follows:where is an integer given by the difference between and .
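The interference between adjacent normal components can be illustrated with a small numerical sketch. It assumes, in line with the CLT-based model, that the component associated with n summands has mean n times the summand mean and standard deviation sqrt(n) times the summand standard deviation, so that two components whose numbers of summands differ by an integer step are spaced by that step times the summand mean. The helper `separation_ratio` is hypothetical; it is not the exact form of conditions (16) or (18), only the ratio between spacing and component width that such conditions compare.

```python
import math

def separation_ratio(mu_x, sigma_x, n, step=1):
    """Spacing between two adjacent component means divided by the
    standard deviation of the component with n summands.
    Values well below 1 suggest marked overlap; values well above 1
    suggest separated components."""
    spacing = step * abs(mu_x)
    width = math.sqrt(n) * sigma_x
    return spacing / width

# Illustrative values (assumptions): summands with mean 2 and std 2.
print(separation_ratio(2.0, 2.0, n=30))            # consecutive counts
print(separation_ratio(2.0, 2.0, n=30, step=100))  # counts far apart
print(separation_ratio(2.0, 2.0, n=50))            # larger n, wider component
```

Because the component width grows with sqrt(n), the ratio is largest for the minimum number of summands; if the components overlap there, they overlap for every larger number of summands, which matches the remark that it suffices to check the condition at the minimum value.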

Therefore, it has to be ; that is,Shortly, the conditions in order that the PDF approximates a normal PDF again or the envelope of the PDF are (11)-(12) and (13), (14), and (16), respectively. Below, both the inferences will be verified by (10) and numerical simulations; the values of the thresholds for which approximates a normal PDF and those for which approximates the envelope of the PDF will also be shown for some examples.

We recall that if the PDF is normal then is approximately normal apart from the conditions given here [2].

5. Experimental Results

In order to verify the model obtained for the PDF of the sum, some simulation results are shown; various pairs of PDFs are used for the summed RVs and for their number, with different means and variances. In particular, two PDFs with very different kurtosis are tested, so as to make the tests very meaningful. These PDFs are the uniform distribution and the chi-square distribution with two degrees of freedom shifted along the horizontal axis (shifted exponential distribution). The uniform distribution is platykurtic, whereas the exponential one is leptokurtic.

A PDF formed by only three Dirac deltas is first used for the number of summands; see Figures 1–6. Each delta is associated with the relevant probability. The PDF of the summed RVs is a chi-square distribution with two degrees of freedom, so that its variation coefficient is a constant equal to 1. Since the forms of the histograms and of the relevant PDFs are the same, the marker points representing the heights of the histogram bins are interpolated by a simple line in all the figures shown here.

When the normal approximation is tested, the null hypothesis is the theoretical normal PDF with mean and variance given by (3) and (4), respectively. Similarly, when the envelope of the PDF of the number of summands is tested, the null hypothesis is that PDF, considered continuous, with mean and variance again given by (3) and (4). In any case, the captions of all the figures are exhaustive and specify the null hypothesis.

In Figures 1–6, the behavior of the PDF of the sum with the increasing of the ratio for the same probability can be noted. In particular, when (11) and (12) are satisfied, the PDF of the sum approximates the normal one (see Figure 1); then, as the ratios and increase, the PDF of the sum converges to the envelope of the PDF of the number of summands, except at its extremes when they are high (see Figures 2 and 3); finally, the PDF of the sum becomes a sequence of separated approximately normal PDFs (see Figures 5 and 6). In Figure 4, the normal PDFs forming the PDF of the sum are only partially separated. Figure 1 shows that if the thresholds in (11) and (12) are limited to 0.3 and the PDFs of the summed RVs and of their number are as specified in the caption of the same figure, then the PDF of the sum can again be considered approximately normal.

In Figures 5 and 6, it can be noted that each weighted normal PDF forming the PDF of the sum lowers and widens as the relevant number of summands increases, as expected. In this connection, it is worth recalling that the variation coefficient of each weighted normal component tends to zero as the relevant number of summands tends to infinity. In Figure 6, note that the probabilities are different, and that in Figures 5 and 6 the weighted normal components are separated because condition (18) is not satisfied. Clearly, this is because the PDF of the number of summands is not formed by consecutive Dirac deltas; rather, adjacent Dirac deltas are far apart, in particular in Figures 5 and 6.

Different PDFs for the summed RVs do not change the essence of the results, except for the thresholds for which the PDF of the sum can again be considered approximately normal and those for which it approximates the envelope of the PDF of the number of summands, with mean and variance given by (3) and (4). As an example, in Figures 7(a), 7(b), and 7(c), the normal approximation is tested where the PDFs of the summed RVs and of their number are both uniform distributions; their parameters are shown in the captions of the relevant figures. In Figures 8–10, the PDFs of the summed RVs and of their number are again both uniform distributions; the results show that the PDF of the sum gradually approximates the envelope of the PDF of the number of summands. In particular, in Figure 9, where , it is shown that and . Figure 11 shows the case where (16) is not satisfied, so that the single normal PDFs are separated; in this case, in order to resolve the PDF graphs well, the number of bins is 1000 and the number of samples is 100000.

Figures 12–15 show the results where the PDF of is a shifted chi-square distribution with two degrees of freedom and the one of is a uniform distribution. It can be noted again that the PDF of the sum converges to the envelope of the PDF of the number of summands as and increase. In Figure 15, the thresholds and for are shown.

Similarly, Figures 16–18 show the results where the PDFs of and are a shifted chi distribution with two degrees of freedom and a chi-square distribution with two degrees of freedom, respectively. In Figure 19, the PDF of is a uniform distribution. In Figure 19, the thresholds and for are also shown.

It is understood that (13), (14), and (16) or (18) are satisfied; it is important to note that with the increasing of the PDF converges to the envelope of the PDF of except at the extremes when they are high. In fact, the extremes of are always tapered and the tapering at the lower extremity is steeper than the one at the upper extremity, as the theoretical model (10) shows. It is shown in Figures 9-10 where the envelope of PDF is uniform. Similarly, when the envelope of the PDF has an extreme high as that of a shifted exponential PDF, where the first extreme is high and the second tends to zero, then for values sufficiently high of , the PDF assumes the same form of the envelope of the PDF of , except for a starting steep rise to which is associated a small amount of probability only. However, for high values of , the probabilities associated with the tapers are very low in all cases apart from the PDF, so that the Chi-Square test is as a rule accepted at the significance level of 0.05, even if the null hypothesis is with no tapers; it is shown in Figures 10, 15, and 19.

Hence, the PDF changes from a normal PDF to the envelope of the PDF, by assuming various forms between the two, and finally, it gradually becomes a sequence of separated approximate normal PDFs according to the ratios , , and . Note that the envelope of the normal PDFs forming the PDF has the same form of the envelope of the PDF also when they are totally separated as they are weighted according to the probabilities .

In order to give results ready for the applications, the next progress of this work is to map the areas of a Cartesian plane () to show where the PDF is approximately normal for the pairs of PDFs of and most common and useful for applications. Similarly, the areas of the Cartesian plane () where the PDF converges to the envelope of the PDF should be mapped. In this last one case, can be taken as a parameter so that the areas can be mapped as a function of and .

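In principle, by considering the PDF of the number of summands as a continuous function, the analytical model for the PDF of the sum could also be written as an integral; that is, it could be written as follows:

$$f_S(s) \approx \int f_N(n)\,\frac{1}{\sqrt{2\pi n}\,\sigma_X} \exp\left[-\frac{(s - n\mu_X)^2}{2 n \sigma_X^2}\right] dn, \tag{19}$$

where $f_N(n)$ is the PDF of the number of summands, considered continuous and integrated over its support, and $\mu_X$ and $\sigma_X$ are the mean and standard deviation of the summed RVs. Clearly, the continuity of $f_N(n)$ affects the quality of the approximation of $f_S(s)$. However, the analytical solution of the integral in (19) is quite difficult even in the case in which $f_N(n)$ is uniform; therefore, the attempt to obtain an analytical solution of (19) is not very feasible.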

The above developments could be extended to the case where the RVs are not identically distributed. In this case, by also considering the above on the further conditions for the convergence to the normal distribution [9–13], the estimates of and can be written as follows:where and are mean and variance of the sum of . Practically, (20) imply that be very high so that and have stable values; otherwise, the situation is much more difficult as it is necessary to know the means and variances and , as well as the PDFs of , so that (20) can be exactly written.

6. Conclusions

A model for the estimate of the PDF of the sum of a random number of IID random variables is shown. The CLT is repeatedly applied, so that the PDF of the sum is represented by a sum of weighted normal PDFs; the model is therefore also intuitive and, as a rule, allows the form of the PDF of the sum to be made out quickly. The validity of the model is verified by the Chi-Square Goodness-of-Fit test, by comparing the corresponding histogram with that obtained by numerical simulations; the latter are carried out by generating RVs having the same PDFs as the summands and their number, which are known, and then by summing the generated summands according to the generated values of their number.

Two particular results are achieved: the conditions under which the sum PDF can again be approximated by a normal PDF and those under which it approximates the envelope of the PDF of the number of summands, with mean and variance given by (3) and (4), respectively.

The model is usable for practical applications, where it is useful and sufficient to know an approximate solution. In order to give results ready for applications, the next step of this work is to map the areas of a Cartesian plane where the PDF of the sum is approximately normal and those where it converges to the envelope of the PDF of the number of summands, for the pairs of PDFs of the summands and of their number that are of interest for applications.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

This work was supported in part by the POR Campania FESR 2007/2013 O.O. 2.2—Contract of Regional Program for the Development of the Production Chains in Campania. The authors are grateful to Luigi De Luca (GGM Lab at the Università degli Studi di Napoli Parthenope) for the useful help in the paper editing.