Abstract

The linear combination of Student’s random variables (RVs) appears in many statistical applications. Unfortunately, the Student’s distribution is not closed under convolution; thus, deriving an exact and general distribution for the linear combination of Student’s RVs is infeasible, which motivates a fitting/approximation approach. Here, we focus on the scenario where the only constraint is that the number of degrees of freedom of each RV is greater than two. Notice that since the odd moments/cumulants of the Student’s distribution are zero and the even moments/cumulants do not exist when their order is equal to or greater than the number of degrees of freedom, it becomes impossible to use conventional approaches based on moments/cumulants of order one or higher than two. To circumvent this issue, herein we propose fitting such a distribution to that of a scaled Student’s RV by exploiting the second moment together with either the first absolute moment or the characteristic function (CF). For the fitting based on the absolute moment, we start from the case of the linear combination of two Student’s RVs and then generalize to an arbitrary number of RVs through a simple iterative procedure. Meanwhile, the CF-based fitting is direct, but its accuracy (measured in terms of the Bhattacharyya distance metric) depends on the CF parameter configuration, for which we propose a simple but accurate approach. We numerically show that the CF-based fitting usually outperforms the absolute moment-based fitting and that both the scale and number of degrees of freedom of the fitting distribution increase almost linearly with the number of RVs.

1. Introduction

The Student’s t-distribution arises in numerous scenarios, e.g., when estimating the mean of a normally distributed population of unknown variance with relatively few samples and in Bayesian analysis of data from a normal family. Moreover, such a distribution plays a key role in many relevant statistical analyses, including Student’s t-test for assessing the statistical significance of the difference between two sample means, the construction of confidence intervals for the difference between two population means, and linear regression analysis [16].

One of the distinctive properties of the Student’s distribution is its heavy tail. This behavior is also seen in the well-known family of stable distributions; however, the Student’s distribution is more analytically tractable, which makes it possible, for example, to write its likelihood function explicitly [1].

1.1. Main Statistics of the Student’s Distribution

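The probability density function (PDF) and cumulative distribution function (CDF) of a Student’s random variable (RV) $X$ with $\nu$ degrees of freedom, i.e., $X \sim t_\nu$, are given by [6]

$$f_X(x) = \frac{1}{\sqrt{\nu}\, B\left(\frac{1}{2}, \frac{\nu}{2}\right)} \left(1 + \frac{x^2}{\nu}\right)^{-\frac{\nu+1}{2}}, \tag{1}$$

$$F_X(x) = \frac{1}{2} + \frac{\operatorname{sgn}(x)}{2} \left(1 - I_{\frac{\nu}{x^2+\nu}}\left(\frac{\nu}{2}, \frac{1}{2}\right)\right), \tag{2}$$

where $I_z(a, b)$ is the regularized incomplete beta function ([7], Equation (8.17.2)), and

$$B(a, b) = \frac{\Gamma(a)\Gamma(b)}{\Gamma(a+b)}. \tag{3}$$

Here, $B(\cdot,\cdot)$ and $\Gamma(\cdot)$ are the beta ([7], Equation (5.12.1)) and gamma ([7], Equation (5.2.1)) functions, respectively. Meanwhile, the integer moments are [6] as follows:

$$\mathbb{E}\left[X^k\right] = \frac{\nu^{k/2}\, \Gamma\left(\frac{k+1}{2}\right) \Gamma\left(\frac{\nu-k}{2}\right)}{\sqrt{\pi}\, \Gamma\left(\frac{\nu}{2}\right)}, \quad k \text{ even}, \; 0 < k < \nu, \tag{4}$$

while $\mathbb{E}[X^k] = 0$ for $k$ odd with $0 < k < \nu$, and moments of order $\nu$ or higher do not exist. We observe that for the specific case of $k = 2$ (second moment), (4) reduces to $\mathbb{E}[X^2] = \nu/(\nu-2)$.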
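As a quick numerical sanity check of the PDF and the even moments, the following minimal sketch uses only the Python standard library. It assumes the standard closed forms of the Student's t PDF and of its even moments of order k below the number of degrees of freedom `nu`; the function names `t_pdf`, `t_even_moment`, and `integrate_pdf` are illustrative and do not come from the original work.

```python
import math


def t_pdf(x, nu):
    """PDF of a Student's t RV with nu degrees of freedom (standard form)."""
    log_beta = math.lgamma(0.5) + math.lgamma(nu / 2) - math.lgamma((nu + 1) / 2)
    return math.exp(-0.5 * math.log(nu) - log_beta
                    - (nu + 1) / 2 * math.log1p(x * x / nu))


def t_even_moment(k, nu):
    """E[X^k] for even k with 0 <= k < nu, computed in the log domain."""
    if k % 2 != 0 or not 0 <= k < nu:
        raise ValueError("k must be even and satisfy 0 <= k < nu")
    return math.exp(k / 2 * math.log(nu) + math.lgamma((k + 1) / 2)
                    + math.lgamma((nu - k) / 2)
                    - 0.5 * math.log(math.pi) - math.lgamma(nu / 2))


def integrate_pdf(nu, limit=100.0, steps=200_000):
    """Midpoint-rule estimate of the PDF mass on [-limit, limit]."""
    h = 2 * limit / steps
    return h * sum(t_pdf(-limit + (i + 0.5) * h, nu) for i in range(steps))


nu = 5.0
print(t_even_moment(2, nu), nu / (nu - 2))  # the two values should agree
print(integrate_pdf(nu))                    # should be close to 1
```

Working with `math.lgamma` rather than `math.gamma` keeps the evaluation stable when the number of degrees of freedom is large.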

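Two statistics that play a key role in our proposed approach are the absolute moments and the characteristic function (CF). The latter is given by [6]

$$\varphi_X(t) = \frac{K_{\nu/2}\left(\sqrt{\nu}|t|\right) \left(\sqrt{\nu}|t|\right)^{\nu/2}}{\Gamma\left(\frac{\nu}{2}\right) 2^{\nu/2-1}}, \tag{5}$$

where $\nu$ is the number of degrees of freedom of the Student’s RV $X$ and $K_\mu(\cdot)$ is the modified Bessel function of the second kind and order $\mu$ ([7], Sec. 10.25). (Notice that $\varphi_X(t) = \mathbb{E}[e^{jtX}]$ is real-valued because the distribution is symmetric around zero, while the moment generating function of $X$ does not exist.) Meanwhile, the absolute moments can be obtained as follows:

$$\mathbb{E}\left[|X|^k\right] \overset{(a)}{=} \frac{2}{\sqrt{\nu}\, B\left(\frac{1}{2}, \frac{\nu}{2}\right)} \int_0^\infty x^k \left(1 + \frac{x^2}{\nu}\right)^{-\frac{\nu+1}{2}} \mathrm{d}x \overset{(b)}{=} \frac{\nu^{k/2}\, \Gamma\left(\frac{k+1}{2}\right) \Gamma\left(\frac{\nu-k}{2}\right)}{B\left(\frac{1}{2}, \frac{\nu}{2}\right) \Gamma\left(\frac{\nu+1}{2}\right)} \overset{(c)}{=} \frac{\nu^{k/2}\, \Gamma\left(\frac{k+1}{2}\right) \Gamma\left(\frac{\nu-k}{2}\right)}{\sqrt{\pi}\, \Gamma\left(\frac{\nu}{2}\right)}, \quad -1 < k < \nu, \tag{6}$$

where (a) follows from leveraging the symmetry of the PDF around zero and from substituting (1), (b) exploits ([8], Equation (3.241.4)) to solve the definite integral, and (c) is attained after simple algebraic transformations after substituting (3).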
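The closed form of the absolute moments can be cross-checked against direct numerical integration. The sketch below is only illustrative: it assumes the standard expression of the k-th absolute moment of a Student's t RV with `nu` degrees of freedom (valid for -1 < k < nu), and the helper names are hypothetical.

```python
import math


def t_pdf(x, nu):
    """PDF of a Student's t RV with nu degrees of freedom."""
    log_beta = math.lgamma(0.5) + math.lgamma(nu / 2) - math.lgamma((nu + 1) / 2)
    return math.exp(-0.5 * math.log(nu) - log_beta
                    - (nu + 1) / 2 * math.log1p(x * x / nu))


def t_abs_moment(k, nu):
    """Closed-form E[|X|^k] for X ~ t_nu, valid for -1 < k < nu."""
    if not -1 < k < nu:
        raise ValueError("the absolute moment exists only for -1 < k < nu")
    return math.exp(k / 2 * math.log(nu) + math.lgamma((k + 1) / 2)
                    + math.lgamma((nu - k) / 2)
                    - 0.5 * math.log(math.pi) - math.lgamma(nu / 2))


def t_abs_moment_numeric(k, nu, limit=200.0, steps=200_000):
    """2 * int_0^limit x^k f(x) dx via the midpoint rule (uses the symmetry)."""
    h = limit / steps
    total = 0.0
    for i in range(steps):
        x = (i + 0.5) * h
        total += x ** k * t_pdf(x, nu)
    return 2 * h * total


nu = 5.0
print(t_abs_moment(1, nu), t_abs_moment_numeric(1, nu))  # should nearly agree
print(t_abs_moment(2, nu), nu / (nu - 2))                # k = 2 gives the second moment
```

For k = 1 and three degrees of freedom the closed form evaluates to 2*sqrt(3)/pi, which gives a simple hand-checkable value.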

1.2. On the Linear Combination of Student’s RVs

The linear combination of Student’s RVs, denoted as follows:

$$Y = \sum_{i=1}^{n} a_i X_i, \tag{7}$$

where $X_i$ is a Student’s RV with $\nu_i$ degrees of freedom and (without loss of generality) $a_i > 0$, appears in many statistical applications. For instance,

(i) Fairweather [9] proposed a method based on a pivotal quantity of this form to obtain an accurate confidence interval for the common mean of several normal populations. Notice that the problem of characterizing the distribution of independent samples that are collected from different normal populations with a common mean but possibly with different variances appears in many practical applications, e.g., when different instruments/methods/laboratories are used to measure substances or products to assess their average quality [10];

(ii) The Behrens–Fisher distribution of the test statistic for testing the equality of the means of two normal populations with unknown variances is that of a linear combination of two independent Student’s RVs. The problem appears in many traditional statistical problems, e.g., check [3, 11–13];

(iii) The distribution of RVs can approximate other heavy-tailed symmetric distributions, e.g., , where and are respectively Gaussian and Rayleigh-distributed. In such scenarios, the distribution of their linear combination may be extremely valuable. Interestingly, the sum of random variables in the form appears in the scenario proposed in [14], where the goal is to determine the number of active devices in a machine-type wireless communication network by relying on coordinated pilot transmissions without much signaling overhead, which facilitates the posterior data decoding procedures.

Unfortunately, the Student’s distribution is not closed under convolution [15]; thus, deriving the exact distribution of the linear combination has been shown to be a cumbersome task, especially for an arbitrary number of degrees of freedom and an arbitrary number of addends. For instance, the methods proposed in [2–5, 15] are restricted to the case of all RVs having an odd number of degrees of freedom. Meanwhile, the PDF of the linear combination is given in [16] as an infinite series, but only for a specific case.

In general, approximation methods are often more tractable and appealing, which motivates our work in this paper. Specifically, we aim to accurately approximate the distribution of the linear combination in closed form given that the RVs are independent, that each of them has more than two degrees of freedom, and that there are no other constraints (the assumption that the RVs are independent is common in the literature, e.g., [2, 4–6, 16]). To the best of our knowledge, this is the first work to (satisfactorily) address this.

1.3. Our Approach

For the approximation, we resort to a Student’s distribution fitting. Specifically, we aim to accurately fit the distribution of the linear combination with that of a scaled Student’s RV, which should work, at least intuitively, as both distributions share the same symmetric and bell-shaped form. However, what might appear to be a simple and straightforward approach is not so when the only constraint is that the Student’s t-distributions have more than two degrees of freedom. We elaborate on this as follows:

The distribution fitting approaches commonly rely on moments (including moments [17]) or cumulants matching. Specifically, at least two moments and/or cumulants of are needed to match those of a scaled Student’s t distribution since such a distribution is characterized only by the scale , and the number of degrees of freedom . However, the challenge lies in that (and each ) must be greater than the moment/cumulant order, while (i) the odd moments/cumulants cannot be used since they are zero, and (ii) the negative moments do not converge since . This implies that (i) we cannot fit moments/cumulants of order higher than 2 in order to allow , and (ii) we cannot rely on the first moment/cumulant. Meanwhile, fractional moments could be used, but they are complex and difficult to compute in general.

In this work, we resort to a fitting based on the second moment matching together with absolute moment matching, or CF matching, to circumvent the above issues. Note that using second moment matching is a natural choice given its simplicity, while exploiting the absolute moment also seems appealing. However, although absolute moments of any order, , with , could be used, they are cumbersome to derive due to the limited separability of the absolute value of a sum, thus, we focus on the simplest case. Finally, a CF matching performance is intriguing as it is not a commonly adopted approach in the literature for distribution fitting problems, especially because moments or other simpler statistics are often available, so we adopt it here given the special characteristics/challenges of the considered problem.

2. Computation of the Relevant Statistics of the Linear Combination

2.1. Second Moment

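The second moment of the linear combination $Y = \sum_{i=1}^{n} a_i X_i$, where $X_i$ is a Student’s RV with $\nu_i > 2$ degrees of freedom, is given by

$$\mathbb{E}\left[Y^2\right] = \sum_{i=1}^{n} a_i^2 \frac{\nu_i}{\nu_i - 2}, \tag{8}$$

which comes from leveraging the independence and zero-mean features of the RVs $X_i$ and from using (4) with order two.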
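The second moment is simple enough to verify by simulation. The following sketch assumes weights `weights[i]` and degrees of freedom `dofs[i]` (all greater than two) for independent Student's t RVs, and draws each RV as a standard Gaussian divided by the square root of a normalized chi-square variable. All names are illustrative.

```python
import math
import random


def second_moment(weights, dofs):
    """Sum of a_i^2 * nu_i / (nu_i - 2) for independent, zero-mean t RVs."""
    if any(nu <= 2 for nu in dofs):
        raise ValueError("every RV needs more than two degrees of freedom")
    return sum(a * a * nu / (nu - 2) for a, nu in zip(weights, dofs))


def sample_t(nu, rng):
    """Draw X = Z / sqrt(V / nu), with Z ~ N(0, 1) and V ~ chi-square(nu)."""
    chi_square = rng.gammavariate(nu / 2, 2.0)  # shape nu/2, scale 2
    return rng.gauss(0.0, 1.0) / math.sqrt(chi_square / nu)


def sample_combination(weights, dofs, rng):
    """One realization of the weighted sum of independent t RVs."""
    return sum(a * sample_t(nu, rng) for a, nu in zip(weights, dofs))


weights, dofs = [1.0, 0.5], [8.0, 10.0]
rng = random.Random(2024)
draws = [sample_combination(weights, dofs, rng) for _ in range(100_000)]
print(second_moment(weights, dofs))            # closed form
print(sum(y * y for y in draws) / len(draws))  # Monte Carlo estimate
```

The example deliberately uses moderately large degrees of freedom: when they approach two, the sample second moment converges very slowly because the fourth moment does not exist.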

2.2. Characteristic Function

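The CF of the sum of independent RVs equals the product of their individual CFs; thus, for the linear combination $Y = \sum_{i=1}^{n} a_i X_i$ of independent Student’s RVs $X_i$ with $\nu_i$ degrees of freedom, (5) yields

$$\varphi_Y(t) = \prod_{i=1}^{n} \varphi_{X_i}(a_i t) = \prod_{i=1}^{n} \frac{K_{\nu_i/2}\left(\sqrt{\nu_i}\,|a_i t|\right) \left(\sqrt{\nu_i}\,|a_i t|\right)^{\nu_i/2}}{\Gamma\left(\frac{\nu_i}{2}\right) 2^{\nu_i/2-1}}. \tag{9}$$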
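The product form can be illustrated without a special-function library in the restricted case of odd degrees of freedom, where the modified Bessel function of half-integer order reduces to a finite sum. The sketch below relies on that restriction purely for convenience (it is an assumption of the example, not of the method), and its function names are hypothetical.

```python
import math


def bessel_k_half_integer(n, z):
    """K_{n+1/2}(z) for integer n >= 0 and z > 0, via its finite-sum form."""
    series = sum(
        math.factorial(n + k) / (math.factorial(k) * math.factorial(n - k))
        / (2 * z) ** k
        for k in range(n + 1)
    )
    return math.sqrt(math.pi / (2 * z)) * math.exp(-z) * series


def t_cf_odd(t, nu):
    """CF of a Student's t RV with an odd integer number of degrees of freedom."""
    if not isinstance(nu, int) or nu < 1 or nu % 2 != 1:
        raise ValueError("this sketch only supports odd integer nu")
    z = math.sqrt(nu) * abs(t)
    if z == 0.0:
        return 1.0
    n = (nu - 1) // 2
    return (bessel_k_half_integer(n, z) * z ** (nu / 2)
            / (math.gamma(nu / 2) * 2 ** (nu / 2 - 1)))


def combination_cf(t, weights, dofs):
    """CF of sum_i a_i X_i for independent X_i: product of the scaled CFs."""
    return math.prod(t_cf_odd(a * t, nu) for a, nu in zip(weights, dofs))


print(t_cf_odd(0.7, 3))                         # single RV with 3 degrees of freedom
print(combination_cf(0.4, [1.0, 2.0], [3, 5]))  # weighted sum of two RVs
```

For three degrees of freedom the CF collapses to exp(-z)(1 + z) with z = sqrt(3)|t|, which is a convenient value to check by hand.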

2.3. Absolute Moment

The absolute moment of obeys

Notably, further simplifying (10) is not a trivial task. Furthermore, its computational complexity scales with the number of RVs. Therefore, we focus on the case of two RVs, but leverage the corresponding results for the distribution fitting of the linear combination of any number of RVs in Section 3.1.1.

The absolute moment for the case of can be computed as follows:where in (a), we exploit the fact that is symmetric around 0, thus, it can adopt positive and negative values with probability 0.5. Then, (b) comes after applying the integral operator to each of the integrand’s addends and leveraging the symmetry of . Now, observe that , and we are concerned with computing and .

In the case of , we have thatwhere (a) comes from solving the inner integral via ([8], Equation (2.27.7)), while we leverage ([8], Equation (3.259.3)) to solve the remaining integral in (b).

In the case of , we have thatwhere (a) comes from using the CDF definition, and (b) from leveraging , followed by splitting the integration region such that the sign of can be fixed accordingly. The latter, together with the symmetry of , is exploited to attain (c), while (d) is immediately obtained after simple algebraic transformations. Note that , where and require integral computations as shown in (13). Fortunately, their calculation can be further simplified as described next.

In the case of , we have thatwhich comes from using ([8], Equation (3.241.4)). Meanwhilewhere (a) comes from substituting (2), (b) follows from rearranging terms, while (c) is obtained by leveraging (14).

Now, to simplify , we introduce the following variable transformationthus, . Thenwhere , . Now, we leverage the Taylor series expansion of an incomplete regularized beta function [18] to write

Then, by substituting (18) into (17), one getswhere (a) comes from exploiting the variable transformation , while the integral is solved in (b) by leveraging ([8], Equation (3.197.8)). Then can be easily estimated by truncating the infinite sum in (19). As illustrated in Figure 1, the relative approximation error decreases following a power law decay, thus, a relatively small number of addends is needed, especially for small , , and large .

By combining (12), (13), (14), (15), and (19), and substituting them into (11), we obtain

2.4. Special Case: Sum of i.i.d. Student’s RVs

Herein, we consider the special case of the sum of i.i.d. Student’s RVs, i.e., , , and . With this in hand, the computation of , , and (for ) can be more easily obtained as follows:

Under such conditions, (8) and (9) directly simplify to

As for for , for which , one may depart from (a) in (12) to writewhich leverages ([8], Equation (3.251.2)) and (3) followed by some simple algebraic simplifications. Meanwhile, (14) can be directly used with instead of . Now, (19) can be computed by departing from (17) aswhere (a) comes from stating the regularized incomplete beta function in terms of a hypergeometric function according to ([7], Equation (8.17.7)) and using , while (b) follows from leveraging [19] together with iterative integration by parts. Finally, by combining the above results, we obtain

3. Distribution Fitting

Next, we follow two different distribution-fitting approaches. The first approach is based on matching the second and absolute moments, while the second approach relies on matching the second moment and the CF for a certain . We also illustrate their accuracy.

3.1. Fitting Based on Second and Absolute Moments

According to (4) with and (6) with , the set of equations to solve is as follows:with variables . Recall that is given by (8), while is given in (20) for the case of . By isolating in the first equation, i.e., , and substituting it into the second equation, the system of (25) transforms to

We observe that attaining an exact closed-form solution for in (26) is not viable, thus, we resort to a low-complex approximation of . For this, we plot vs. in Figure 2, and realize that has approximately the form of a quotient of two linear functions. Hence, we state

Here, we know that , andwhere (a) comes from using ([7], Equation 5.11.12). Then, in order to satisfy such conditions, we can directly set and as follows:where (30) uses the result in (29) in the last step. Finally, can be obtained easily by standard curve fitting. The accuracy of (27) is also depicted in Figure 2.

With (27) in place, one can estimate and as follows:and use the approximation . Such a distribution fit is illustrated in Figure 3 and evinces the appropriateness of our approach. All in all, the fitting accuracy only seems to be affected by the distribution tails, which is expected considering that only two (low-order) features of the child distributions are used for the fitting. Nevertheless, even in the tails region, the accuracy is surprisingly good, being only critically affected when the child distributions have significantly diverging degrees of freedom and scaling factors in opposite directions, e.g., small and large .
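For readers who want to reproduce the moment-based fit numerically, the sketch below solves the two-equation system directly by bisection instead of using the closed-form approximation (27), whose fitted coefficients are not reproduced here. It assumes that the target second moment `m2` and first absolute moment `m1_abs` are already available, and that the ratio of the first absolute moment to the root of the second moment of a Student's t RV increases monotonically with the number of degrees of freedom, from 0 at two degrees of freedom to sqrt(2/pi) in the Gaussian limit. The function names are illustrative.

```python
import math


def abs_to_rms_ratio(nu):
    """E|X| / sqrt(E[X^2]) for X ~ t_nu with nu > 2 (any scale cancels out)."""
    return math.exp(0.5 * math.log(nu - 2) + math.lgamma((nu - 1) / 2)
                    - 0.5 * math.log(math.pi) - math.lgamma(nu / 2))


def fit_scaled_t_by_moments(m2, m1_abs, nu_max=1e6, tol=1e-12):
    """Return (scale, nu) of the scaled t RV matching m2 and m1_abs."""
    target = m1_abs / math.sqrt(m2)
    lo, hi = 2.0, nu_max
    if not 0.0 < target < abs_to_rms_ratio(hi):
        raise ValueError("moment pair is outside the range reachable with nu <= nu_max")
    while hi - lo > tol * hi:
        mid = 0.5 * (lo + hi)
        if abs_to_rms_ratio(mid) < target:
            lo = mid   # ratio too small: more degrees of freedom are needed
        else:
            hi = mid
    nu = 0.5 * (lo + hi)
    scale = math.sqrt(m2 * (nu - 2) / nu)  # from the second-moment equation
    return scale, nu


# Round trip: moments of 2 * t_5 should give back scale = 2 and nu = 5.
true_scale, true_nu = 2.0, 5.0
m2 = true_scale ** 2 * true_nu / (true_nu - 2)
m1_abs = true_scale * abs_to_rms_ratio(true_nu) * math.sqrt(true_nu / (true_nu - 2))
print(fit_scaled_t_by_moments(m2, m1_abs))
```

The bisection is exact up to numerical tolerance, so it can also serve as a reference when assessing the accuracy of a closed-form approximation such as (27).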

3.1.1. Linear Combination of Student’s RVs

Notice that if a scaled Student’s distribution accurately fits the distribution of the linear combination of two Student’s RVs, then such a Student’s fitting approach applies to the linear combination of any number of Student’s RVs. This can be easily shown by induction as follows:

According to the previous subsection’s results, we can state thatholds approximately. Here, and , where and are transformation functions (given by (31) and (32) in the particular case of ). Then, assume that , and observe that

This proves the hypothesis.

Wrapping up, the linear combination of Student’s RVs is approximately distributed as , where , can be iteratively obtained as follows:, where , , and , are respectively given by (31) and (32), which can be computed by leveraging (8) and (20). The accuracy of such a procedure has been corroborated by several simulation campaigns, and it is illustrated here in Figure 4 for an example set of distribution parameters.

3.2. Fitting Based on Second Moment and Characteristic Function

According to (4) with and (5), the set of equations to solve is as follows:with variables . Recall that is given by (8), while is given in (9). As in Section 3.1, we first isolate in the first equation, i.e., , and then substitute it into the second equation. By doing this, the system of (36) transforms to

It can be shown that is a decreasing function and that and for some decreasing function . This is illustrated in Figure 5 and implies that there is at most one real solution for (37), which can be easily found via the bisection method. After this, one sets and uses the approximation .
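The bisection step can be sketched with the standard library alone. The example below assumes that the second moment `m2` and the value `cf_target` of the CF of the linear combination at a chosen point `t0` are given; `t0` is left as a free input here and is not set according to (38). The modified Bessel function is evaluated from the integral representation K_mu(z) = int_0^inf exp(-z cosh u) cosh(mu u) du with a trapezoidal rule in the log domain. All names and numerical settings (`h`, `nu_max`, `tol`) are assumptions of this sketch.

```python
import math


def log_bessel_k(mu, z, h=0.01):
    """log K_mu(z) for z > 0 via trapezoidal integration in the log domain."""
    def log_integrand(u):
        x = mu * u
        log_cosh = x + math.log1p(math.exp(-2.0 * x)) - math.log(2.0)
        return -z * math.cosh(u) + log_cosh

    logs = [log_integrand(0.0) - math.log(2.0)]  # half weight at u = 0
    peak = logs[0]
    u = h
    while True:
        value = log_integrand(u)
        logs.append(value)
        peak = max(peak, value)
        if value < peak - 60.0:  # the remaining tail is negligible
            break
        u += h
    return math.log(h) + peak + math.log(sum(math.exp(v - peak) for v in logs))


def scaled_t_cf(t, scale, nu):
    """CF of scale * X with X ~ t_nu, evaluated at t."""
    z = math.sqrt(nu) * scale * abs(t)
    if z == 0.0:
        return 1.0
    return math.exp(log_bessel_k(nu / 2, z) + nu / 2 * math.log(z)
                    - math.lgamma(nu / 2) - (nu / 2 - 1) * math.log(2.0))


def fit_scaled_t_by_cf(m2, cf_target, t0, nu_max=200.0, tol=1e-10):
    """Return (scale, nu) matching the second moment m2 and the CF value at t0."""
    def cf_given_nu(nu):
        return scaled_t_cf(t0, math.sqrt(m2 * (nu - 2) / nu), nu)

    lo, hi = 2.0 + 1e-9, nu_max
    if not cf_given_nu(hi) < cf_target < cf_given_nu(lo):
        raise ValueError("no solution in (2, nu_max) for this t0")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if cf_given_nu(mid) > cf_target:
            lo = mid   # CF still too large: move to more degrees of freedom
        else:
            hi = mid
    nu = 0.5 * (lo + hi)
    return math.sqrt(m2 * (nu - 2) / nu), nu


# Round trip with a target that is itself a scaled t RV: 1.5 * t_5.
t0 = 0.5
z = math.sqrt(5.0) * 1.5 * t0
cf_target = math.exp(-z) * (1 + z + z * z / 3)  # closed-form CF for nu = 5
print(fit_scaled_t_by_cf(1.5 ** 2 * 5.0 / 3.0, cf_target, t0))
```

The feasibility check before the loop mirrors the observation that a solution may not exist for some values of the CF parameter: the target must lie between the Gaussian-like limit at large degrees of freedom and the value 1 approached at two degrees of freedom.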

The main challenge with this approach is the proper setting of the CF parameter. On one hand, (37) may not have a solution for certain values of this parameter. On the other hand, different feasible values may lead to significantly different fitting accuracy figures. We investigate these issues in Section 4.3. For now, let us conveniently setso that simplifies to

Observe that attaining an exact closed-form solution for in using (39) is still not viable. Fortunately, can be accurately approximated by a very tractable fractional function of the form

Meanwhile, we know that , and as shown in Figure 5. Then, in order to satisfy such conditions, we can directly set and as follows:where (42) leverages the result in (41) in the last step. Finally, can be obtained easily by standard curve fitting, and then one can set . The accuracy of (40) is depicted in Figure 6.

With (40) in place, one can estimate and asand use the approximation . Such a distribution fitting is illustrated in Figures 7 and 8, and evinces the appropriateness of our approach. Finally, notice that the results here agree also with our previous observations around Figure 3.

4. Fitting Accuracy Analysis

In this section, we assess the accuracy of the fitting methods discussed in Section 3 by adopting the Bhattacharyya distance metric [20]. This metric measures the similarity of two probability distributions, which in this case are the true distribution of , , and the approximate scaled Student’s distribution based on one of the proposed fitting approaches. Notice that since the true/exact distribution of , , is unknown and difficult to compute for a general , we leverage a Monte Carlo approach to estimate the Bhattacharyya distance. Specifically, such a metric is computed as follows:where is the th sample taken from , and is estimated using its histogram. As , (44) approaches the exact Bhattacharyya distance between the continuous probability distributions of and . Finally, we focus on the special case of the sum of i.i.d. RVs for simplicity, adopt , and set without loss of generality.
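A histogram-based Monte Carlo estimate of the Bhattacharyya distance can be sketched as follows. This is a discretized variant written for illustration and is not necessarily identical to (44): the empirical bin probabilities of the samples are compared with the fitted PDF evaluated at the bin centers, and the small probability mass outside the range `[lo, hi]` is ignored. The range, the number of bins, and the function names are assumptions of the example.

```python
import math
import random


def scaled_t_pdf(x, scale, nu):
    """PDF of scale * X with X ~ t_nu."""
    log_beta = math.lgamma(0.5) + math.lgamma(nu / 2) - math.lgamma((nu + 1) / 2)
    u = x / scale
    return math.exp(-0.5 * math.log(nu) - log_beta
                    - (nu + 1) / 2 * math.log1p(u * u / nu)) / scale


def sample_t(nu, rng):
    """Draw a Student's t sample as Gaussian over root of normalized chi-square."""
    return rng.gauss(0.0, 1.0) / math.sqrt(rng.gammavariate(nu / 2, 2.0) / nu)


def bhattacharyya_distance(samples, pdf, lo, hi, bins):
    """-ln(sum_b sqrt(p_b * q_b)) with p_b from a histogram and q_b from pdf."""
    width = (hi - lo) / bins
    counts = [0] * bins
    for y in samples:
        if lo <= y < hi:
            counts[min(int((y - lo) / width), bins - 1)] += 1
    coefficient = 0.0
    for b, count in enumerate(counts):
        center = lo + (b + 0.5) * width
        coefficient += math.sqrt(count / len(samples) * pdf(center) * width)
    return -math.log(coefficient)


rng = random.Random(11)
samples = [sample_t(5.0, rng) for _ in range(50_000)]
good_fit = bhattacharyya_distance(samples, lambda x: scaled_t_pdf(x, 1.0, 5.0), -15.0, 15.0, 150)
poor_fit = bhattacharyya_distance(samples, lambda x: scaled_t_pdf(x, 3.0, 5.0), -15.0, 15.0, 150)
print(good_fit, poor_fit)  # the matching distribution should give the smaller distance
```

The estimate carries a small positive bias that shrinks as the number of samples grows relative to the number of bins, so distances between competing fits should be compared using the same samples and binning.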

4.1. Absolute Moment vs. CF-Based Fitting

Figure 9 shows the fitting accuracy of the two methods proposed in this work together with that of a benchmark approach based on matching the second and fourth moments. As commented earlier, such an approach can only be used when all RVs have more than four degrees of freedom, which was one of the key motivations for our work. From the figure, we can tell that (i) the proposed fitting approaches are more accurate than the benchmark, which is also restricted to the cases with more than four degrees of freedom; (ii) the fitting based on CF matching is generally more accurate than the one based on absolute moment matching, although both approaches tend to converge as the number of degrees of freedom increases; and (iii) for relatively small , the proposed fitting approaches are more accurate for a smaller , while this behavior might be reversed as increases.

4.2. Scaling Laws of the Fitting Parameters

Herein, we leverage the i.i.d. assumption to illustrate in Figure 10 how the parameters of the fitting distribution scale with the number of RVs. For this, we adopt only the CF-based fitting, as our previous results indicate that it is the most accurate. We observe that both parameters of the fitting distribution, the scale and the number of degrees of freedom, increase with the number of RVs. Specifically, with and (concave increase). Moreover, and are respectively increasing and decreasing functions of .

4.3. Performance Impact of the CF Parameter Setting

As commented earlier in Section 3.2, different choices of the CF parameter in (36) (or directly in (37)) may lead to significantly different fitting accuracy figures. Here, we investigate these issues by illustrating the estimated Bhattacharyya distance as a function of the CF parameter in Figure 11. Observe that for a relatively large number of degrees of freedom, the accuracy depends little on the specific value of the CF parameter. This situation changes drastically as the number of degrees of freedom decreases and approaches two, in which case a proper setting becomes more critical. Indeed, there is an accuracy-optimum value of the CF parameter, especially noticeable when is small. Notice that the optimum may be cumbersome to determine beforehand in practice. Nevertheless, as a rule of thumb, relatively small values of the CF parameter are usually preferred, and our proposal in (38) appears to be a valid (and simple) configuration approach.

5. Conclusion

In this work, we proposed two fitting approaches for the distribution of linear combinations of Student’s RVs with more than two degrees of freedom. They leverage the second moment together with either the first absolute moment or the CF to fit the distribution to that of a scaled Student’s RV. For the former, we first analytically obtained the absolute moment of a linear combination of two Student’s RVs and then generalized to an arbitrary number of RVs through a simple iterative procedure, while the fitting is direct for the latter but its accuracy depends on the CF parameter. Notably, we proposed a simple CF parameter configuration and showed that it can lead to high fitting accuracy. We resorted to Monte Carlo simulations and adopted the Bhattacharyya distance metric for numerically quantifying the fitting accuracy. We showed that the CF-based fitting can usually outperform the absolute moment-based fitting, although the accuracy provided by both approaches converges when the RVs have a sufficiently large number of degrees of freedom. Interestingly, both proposed approaches outperform a benchmark fitting based on matching the second and fourth moments, which is only applicable when all RVs have more than four degrees of freedom. Moreover, both the scale and number of degrees of freedom of the fitting distribution were shown to increase almost linearly with the number of RVs.

Finally, notice that our work opens the path to even more general fitting approaches. Indeed, by discarding the second moment and leveraging only the absolute moments and CFs, one may be able to accurately characterize the distribution of the linear combinations of RVs with more than one degree of freedom. Moreover, by solely relying on CFs matching with different parameters , one may completely remove the constraint on the number of degrees of freedom and arbitrarily fit the distribution of any linear combination of RVs to that of a scaled Student’s t RV. However, further studies are required on how to achieve this in an optimal and/or simple manner.

Data Availability

Codes necessary for reproducing the results of this work are openly available at https://github.com/onel2428/Distribution-of-Sum-of-Students-t-Random-Variables.

Conflicts of Interest

The authors declare no conflicts of interest.

Acknowledgments

This work was supported by the Academy of Finland (6G Flagship Program under Grant 346208) and the Finnish Foundation for Technology Promotion.