Research Article | Open Access
An Estimate of the Probability Density Function of the Sum of a Random Number of Independent Random Variables
A new estimate of the probability density function (PDF) of the sum of a random number of independent and identically distributed (IID) random variables is presented. The sum PDF is represented as a sum of normal PDFs weighted according to the PDF of the random number of terms. The analytical model is verified by numerical simulations, and the comparison is made by the Chi-Square Goodness-of-Fit test.
The probability density function (PDF) of the sum of a random number of independent random variables is important for many applications in science and engineering. The problem is not at all straightforward and has a theoretical solution only in some cases [2–5]. Further, in some cases the theoretical solution is not viable for engineering purposes (see, e.g., [2–6] and the references therein, especially [4, 5], where the rates of convergence in various metrics are also studied).
The purpose of this paper is to find a viable and accurate estimate of the PDF of a random variable (RV) $S$ which is the sum of a random number $N$ of independent and identically distributed (IID) real RVs $X_i$. The new model also has the benefit of lending intuitive support to physical interpretation. In simple terms, the PDF of $S$ is represented as a sum of normal PDFs weighted according to the PDF of $N$. Two particular results are obtained: the conditions under which the PDF of the sum again approximates a normal PDF, and those under which it approximates the envelope of the PDF of $N$, except at the extremes when they are high.
The theory can be applied whenever a random number of IID RVs has to be summed. For instance, the total energy stored in the volume of reverberation chambers (RCs) can be statistically obtained as the sum of a random number of IID RVs representing the energy stored in as many appropriate subvolumes. Following this rationale, the PDF of the total stored energy in an RC can be obtained, as well as the PDF of the relevant quality factor. A similar rationale is used to obtain a generalized stochastic field model for RCs.
Another important application arises in coherent narrow-band sensors, where speckle occurs and an appropriate model cannot be obtained to best mitigate it or to exploit it as a geophysical signal.
2. The Model
The RV $S$ is expressed as follows:
$$S = \sum_{i=1}^{N} X_i, \tag{1}$$
where the real IID RVs $X_i$ are assumed to have finite variance. $N$ is a positive integer RV independent of the $X_i$; the values of $N$ are denoted by $n$ ($n_{\min} \le n \le n_{\max}$), where $n_{\min}$ and $n_{\max}$ are the minimum and maximum value of $N$, respectively. The value $n_{\min}$ cannot be too small, since the central limit theorem (CLT) has to be applied for each value $n$.  can be certainly considered an adequate value .
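The PDF of $N$ can be written as follows:
$$f_N(\nu) = \sum_{n=n_{\min}}^{n_{\max}} p_n \, \delta(\nu - n), \tag{2}$$
where $p_n$ is the discrete probability relevant to the value $n$ and $\delta(\cdot)$ is the delta function.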
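The mean and variance of $S$ are as follows:
$$\mu_S = \mu_N \mu_X, \tag{3}$$
$$\sigma_S^2 = \mu_N \sigma_X^2 + \sigma_N^2 \mu_X^2, \tag{4}$$
where $\mu_X$ and $\sigma_X$ are the mean and standard deviation of the $X_i$, respectively; $\mu_N$ and $\sigma_N$ are the mean and standard deviation of $N$, respectively.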
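The two moments of the random sum follow from conditioning on the number of terms. The short derivation below is a sketch added for clarity; it assumes the notation $S$ for the sum, $N$ for the number of terms, $X_i$ for the summed IID RVs with mean $\mu_X$ and variance $\sigma_X^2$, and uses the laws of total expectation and total variance (the `align*` environment assumes the amsmath package).

```latex
\begin{align*}
E[S \mid N = n] &= n\,\mu_X, \qquad \operatorname{Var}[S \mid N = n] = n\,\sigma_X^2,\\
\mu_S &= E\big[E[S \mid N]\big] = \mu_N\,\mu_X,\\
\sigma_S^2 &= E\big[\operatorname{Var}[S \mid N]\big] + \operatorname{Var}\big[E[S \mid N]\big]
            = \mu_N\,\sigma_X^2 + \sigma_N^2\,\mu_X^2 .
\end{align*}
```

The second term of the variance vanishes when $\mu_X = 0$, in which case the spread of $S$ is governed only by the mean number of terms.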
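The random sum $S$ represents $n_{\max} - n_{\min} + 1$ mutually exclusive random sums, which are selected by the values $n$ of $N$; they are
$$S_n = \sum_{i=1}^{n} X_i, \qquad n = n_{\min}, \ldots, n_{\max}. \tag{5}$$
The set of the values of $S$, which is the certain event, is given by the union of the mutually exclusive events, which are the subsets formed by the values of the RVs $S_n$; that is, it can be written as follows:
$$\Omega = \bigcup_{n=n_{\min}}^{n_{\max}} \Omega_n, \tag{6}$$
where $\Omega$ is the set of the values of $S$ and $\Omega_n$, $n = n_{\min}, \ldots, n_{\max}$, are the subsets of the values of the corresponding $S_n$. Therefore, by the total probability theorem, it results in
$$P\{S \le s\} = \sum_{n=n_{\min}}^{n_{\max}} p_n \, P\{S_n \le s\}, \tag{7}$$
where $p_n$ is the probability that $N = n$. The PDF of the random sum $S$ is thus given by the sum of the PDFs $f_{S_n}(s)$, each weighted by the relevant probability $p_n$. That is, it can be written as follows:
$$f_S(s) = \sum_{n=n_{\min}}^{n_{\max}} p_n \, f_{S_n}(s). \tag{8}$$
Considering the CLT, the PDFs $f_{S_n}(s)$ can be assumed normal [9, 10]. Hence,
$$f_{S_n}(s) \approx \frac{1}{\sqrt{2\pi n}\,\sigma_X} \exp\!\left[-\frac{(s - n\mu_X)^2}{2 n \sigma_X^2}\right], \tag{9}$$
where $\mu_X$ and $\sigma_X$ are the mean and standard deviation of the $X_i$. By considering (8) and (9), the estimate of $f_S(s)$ can be written as follows:
$$f_S(s) \approx \sum_{n=n_{\min}}^{n_{\max}} \frac{p_n}{\sqrt{2\pi n}\,\sigma_X} \exp\!\left[-\frac{(s - n\mu_X)^2}{2 n \sigma_X^2}\right]. \tag{10}$$
It can be noted that if $\mu_X = 0$, then the PDF $f_S(s)$ is the sum of normal PDFs which are centered at zero, with increasing variances, and weighted by the relevant probabilities $p_n$.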
We stress that (9) and (10) are written under the condition of identically distributed RVs $X_i$; this is a precaution with respect to the validity of the CLT [9–13]. The extension of (9) to RVs $X_i$ that are not identically distributed is practicable, provided that one considers the further conditions on the RVs that govern possible convergence to distributions other than the normal [9–13]; the corresponding extension of (10), however, requires more stringent conditions, as specified below (see the end of Section 4). Note that when the PDF of the sum of the RVs $X_i$ is exactly expressible for any constant number of terms $n$, whatever the PDFs of the $X_i$ (which could also differ from one another), then (10) can be written exactly and no validation procedure is necessary. Sums of independent exponentially distributed RVs, including those with different scale parameters [14–17], are one example (not the only one) where the $X_i$ are not identically distributed and (9) and (10) can also be obtained exactly.
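As a simple illustration of the exact case, consider the special setting, assumed here only for the sketch and simpler than the different-scale case cited above, in which the summed RVs are IID exponential with a common rate $\lambda$. The sum of $n$ such RVs is Erlang distributed, so the weighted sum over the number of terms needs no CLT approximation. The symbols $\lambda$, $p_n$ (probability that the number of terms equals $n$), $n_{\min}$, and $n_{\max}$ are notation assumed for this example.

```latex
f_{S_n}(s) = \frac{\lambda^{n} s^{\,n-1} e^{-\lambda s}}{(n-1)!}, \quad s \ge 0,
\qquad
f_S(s) = \sum_{n=n_{\min}}^{n_{\max}} p_n \,
         \frac{\lambda^{n} s^{\,n-1} e^{-\lambda s}}{(n-1)!}.
```

In this setting the weighted sum is exact for every $n \ge 1$, so no lower bound on the number of terms is required.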
3. Model Applicability
Equation (10) can be easily implemented by a simple algorithm, which can be run on a calculator to obtain the values of the PDF estimate; they are referred to here as the analytical solution values.
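A minimal sketch of such an algorithm is given below, in Python rather than the LabVIEW environment used for the results. It evaluates a weighted sum of normal PDFs in which the component for $n$ terms has mean $n\mu_X$ and variance $n\sigma_X^2$. The function names, the dictionary `p_n`, and the example parameters (number of terms uniform on 30 to 60, summed RVs with mean 1.0 and standard deviation 0.5) are assumptions made for illustration only.

```python
import math


def normal_pdf(s, mean, variance):
    """Normal PDF with the given mean and variance, evaluated at s."""
    return math.exp(-((s - mean) ** 2) / (2.0 * variance)) / math.sqrt(2.0 * math.pi * variance)


def sum_pdf_estimate(s, p_n, mu_x, sigma_x):
    """Weighted sum of normal PDFs approximating the PDF of the random sum.

    p_n maps each admissible number of terms n to its probability P(N = n);
    the component for n terms has mean n*mu_x and variance n*sigma_x**2.
    """
    return sum(p * normal_pdf(s, n * mu_x, n * sigma_x ** 2) for n, p in p_n.items())


# Hypothetical example: N uniform on 30..60, X with mean 1.0 and std 0.5.
p_n = {n: 1.0 / 31 for n in range(30, 61)}
grid = [0.05 * k for k in range(2001)]  # s from 0 to 100 in steps of 0.05
pdf_values = [sum_pdf_estimate(s, p_n, 1.0, 0.5) for s in grid]
```

Summing `pdf_values` times the grid step gives a quick sanity check: the estimate should integrate to about one, and its first moment should be close to the product of the two means.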
The error on a single distribution $f_{S_n}$ depends on $n$ and on the form of the density of the summed RVs [9, 18–21]; it is maximized for $n = n_{\min}$; however, under the assumed hypothesis that the random variables are IID, the value  is adequate for most applications . The total error depends on  and , and it can be estimated by considering the error on each distribution $f_{S_n}$. The approximation in (10) improves as $n_{\min}$ increases. However, the analytical determination of the total error is too arduous. Here, in order to find the conditions under which the overall approximation is adequate for applications, we compare the histogram obtained from (10) with that obtained by numerical simulation. The latter is obtained by generating RVs having the same PDFs as $X_i$ and $N$, which are known, and then summing the $X_i$ according to the generated values of $N$. The histogram originating from the numerical simulation is used as the null hypothesis for the comparison. The comparison is carried out by the Chi-Square Goodness-of-Fit test at a significance level of 0.05. The analytical solution, which includes the implementation of (10), the numerical simulation, and the Chi-Square Goodness-of-Fit test are all carried out with the software LabVIEW of National Instruments. Clearly, the width of the bins and the relevant centers along the horizontal axis are the same for the analytical histogram and for the numerically simulated one.
Even though the results of the tests shown here are obtained by using the PDF of the simulated data as the null hypothesis, it has been verified that the outcome of the test is the same when the model (10) is chosen as the null hypothesis. Indeed, since (10) is an analytical model, it allows us to calculate the probability and the number of samples for each bin with no statistical fluctuations, so that it could be used as the null hypothesis for the comparison. This inversion of rationale is again consistent with the theory of the Chi-Square Goodness-of-Fit test. In fact, it corresponds to considering the model as exact, so that it provides the theoretical frequencies, which are compared with the frequencies obtained by numerical simulation; the latter have the typical statistical fluctuations. For the calculation of the number of degrees of freedom of the Chi-Square test, it is specified that $\mu_X$, $\sigma_X$, $\mu_N$, and $\sigma_N$ are theoretical parameters, since the PDFs of $X_i$ and $N$, and hence their means and variances, are known. When (10) is considered exact, its mean and variance are readily obtained; they turn out to be equal to (3) and (4), respectively.
It is specified that, in the comparisons, the bins containing fewer than four samples are discarded. It is also specified that the sample number of $S$, which is denoted with , is equal to 10000 and that the number of bins is 60.
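The comparison procedure can be sketched as follows, again in Python rather than LabVIEW. The sketch uses the inverted variant mentioned above, in which the analytical model supplies the expected bin frequencies (here through the CDF of the weighted sum of normals) and the simulation supplies the observed ones. Bins with fewer than four observed samples are discarded, and the degrees of freedom are taken as the number of retained bins minus one, because no parameter is estimated from the data. All function names are hypothetical, and the critical value uses the Wilson-Hilferty approximation of the chi-square quantile, which is an assumption of this sketch and not part of the original procedure.

```python
import math
import random
from statistics import NormalDist


def model_cdf(s, p_n, mu_x, sigma_x):
    """CDF of the weighted sum of normals (component n: mean n*mu_x, std sqrt(n)*sigma_x)."""
    return sum(p * NormalDist(n * mu_x, math.sqrt(n) * sigma_x).cdf(s)
               for n, p in p_n.items())


def simulate_sums(num_samples, p_n, draw_x, rng):
    """Draw the number of terms from p_n, then sum that many independent draws of X."""
    n_values, weights = list(p_n.keys()), list(p_n.values())
    sums = []
    for _ in range(num_samples):
        n = rng.choices(n_values, weights=weights)[0]
        sums.append(sum(draw_x(rng) for _ in range(n)))
    return sums


def chi_square_comparison(samples, p_n, mu_x, sigma_x,
                          num_bins=60, min_count=4, alpha=0.05):
    """Return (chi-square statistic, degrees of freedom, approximate critical value)."""
    lo, hi = min(samples), max(samples)
    width = (hi - lo) / num_bins
    observed = [0] * num_bins
    for s in samples:
        observed[min(int((s - lo) / width), num_bins - 1)] += 1

    chi2, used = 0.0, 0
    for b, obs in enumerate(observed):
        if obs < min_count:          # discard sparsely populated bins
            continue
        left, right = lo + b * width, lo + (b + 1) * width
        expected = len(samples) * (model_cdf(right, p_n, mu_x, sigma_x)
                                   - model_cdf(left, p_n, mu_x, sigma_x))
        chi2 += (obs - expected) ** 2 / expected
        used += 1

    dof = used - 1
    z = NormalDist().inv_cdf(1.0 - alpha)
    a = 2.0 / (9.0 * dof)
    critical = dof * (1.0 - a + z * math.sqrt(a)) ** 3   # Wilson-Hilferty approximation
    return chi2, dof, critical


if __name__ == "__main__":
    # Hypothetical case: N uniform on 30..60, X uniform on [0, 2].
    rng = random.Random(0)
    p_n = {n: 1.0 / 31 for n in range(30, 61)}
    samples = simulate_sums(10000, p_n, lambda r: r.uniform(0.0, 2.0), rng)
    chi2, dof, critical = chi_square_comparison(samples, p_n, 1.0, math.sqrt(1.0 / 3.0))
    print(chi2, dof, critical, "not rejected" if chi2 <= critical else "rejected")
```

The model is not rejected at the chosen significance level when the statistic does not exceed the critical value; the outcome depends on the random seed and on the chosen PDFs.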
In some cases, in order to avoid values close to zero (too low) or simply to test different values of  and  (namely, different  values) with the same shape of the PDF, the mean of the PDF is shifted by adding a fixed value to it. In these cases, the relevant PDF is referred to by the same name with the adjective “shifted” added, since only the mean value is shifted.
The comparison is made by using various pairs of PDFs for $X_i$ and $N$ with different means and variances. In particular, two meaningful PDFs are tested: the uniform distribution and the chi-square distribution with two degrees of freedom; the latter is shifted along the horizontal axis (shifted exponential distribution). However, for both $X_i$ and $N$, other PDFs such as a triangular distribution, a chi distribution with two degrees of freedom (Rayleigh distribution), and a chi-square distribution with six degrees of freedom are tested as well. To save space, only the results for the uniform, (un)shifted exponential, and (un)shifted Rayleigh PDFs are shown here.
4. Discussion on the Behaviour of the PDF and Two of Its Useful Approximations
In this section, we consider the behavior of the PDF of $S$ in terms of some parameters defined from the mean and variance of $X_i$ and $N$. In other words, we write the general conditions under which the PDF of $S$ approximates the normal PDF and those under which it approximates the PDF of $N$; they implicitly include the conditions under which the PDF of $S$ progressively changes from a normal PDF to the PDF of $N$. The conditions under which the weighted normal PDFs approximating the PDF of $S$ are separated are also examined.
Starting from (4), by also taking a cue from (10) and (24) in , and by considering that  implies , it can be inferred that if where  and  are adequate threshold values depending on the PDFs of the RVs $X_i$ and $N$, then we can again assume the PDF of $S$ to be approximately normal. Note that , where  is the coefficient of variation of . Normally, the thresholds  and  are both less than one; however, if  then  can turn out to be considerably greater than one. By setting , it is noted that ; namely, as (11)-(12) show, the thresholds  and  are not independent, in the sense that if one decreases, the other can increase, and vice versa; clearly, this can occur only within adequate limits depending on the PDFs of $X_i$ and $N$. It can be noted that (11)-(12) are similar to (10) and (24) in ; actually, they are as a whole less stringent than (10) and (24) in . However, (11)-(12) are very useful for practical applications, where it is sufficient to know an approximate solution rather than an exact or asymptotically exact one.
Similarly, by considering that implies