#### Abstract

We consider the class of those distributions that satisfy Gauss's principle (the maximum likelihood estimator of the mean is the sample mean) and have a parameter orthogonal to the mean. It is shown that this so-called “mean orthogonal class” is closed under convolution. A previous compound gamma characterization of random sums is revisited and clarified. A new characterization of the compound distribution with multiparameter Hermite count distribution and gamma severity distribution is obtained.

#### 1. Introduction

The topic of maximum likelihood characterizations of distributions has a long history and remains an active field of research in the mathematical sciences. It concerns the characterization of a (class of) probability distribution(s) through the structure of the maximum likelihood estimator (MLE) of one or more parameters of interest (e.g., location, scale, etc.). The starting point is a famous result by Gauss [1] on the foundation of least squares theory (see, e.g., [2], [3, afterword, pages 208 and 215]). Consider a location family whose density has a continuous derivative. If the maximum likelihood estimator of the location parameter is the sample mean, then the distribution is normal. This important result has been discussed by Poincaré [4, Chapter 10], Teicher [5], Ferguson [6, 7], Marshall and Olkin [8], Bondesson [9], Azzalini and Genton [10], and many other authors. The property that the MLE of the mean is the sample mean has been called *Gauss’s principle* by Campbell [11] (see also [12, 13] and references therein). A brief account of the present contribution follows.

Within the framework of multiparameter distributions, consider the “mean orthogonal class” of those distributions that, besides satisfying Gauss’s principle, admit a parameterization in which the mean is orthogonal to some parameter vector, a property that can always be achieved (Amari [14, Section 8]). This class has been considered by Sprott [15]. A characterization of the mean orthogonal class through the cumulant generating function (cgf) has been formulated by Hürlimann [16]. Extending a result by Hudson [17], it is shown in Theorem 3 that this class is closed under convolution. Section 3 is devoted to a characterization of random sums through the mean orthogonal class. Hürlimann [18] has established that the mean scaled severity in a compound model is necessarily gamma distributed, provided that the count distribution and the distribution of the random sum belong to the mean orthogonal class and an additional partial differential equation in the parameters can be solved. A follow-up to this construction for the individual model of risk theory is Hürlimann [19]. We clarify and simplify the original proof to obtain a characterization that is used further in Section 4. Based on a result by Puig and Valero [20], we derive in Theorem 17 a most stringent characterization, which allows compounding of the gamma distribution under a single count data family, namely, the multiparameter Hermite distribution. This characterization requires the count distribution family to be closed under convolution and binomial subsampling.

#### 2. Distributions with the Mean Orthogonal Property

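Let $X$ be a random variable whose distribution depends upon a vector of parameters $(\mu, \theta)$, $\theta = (\theta_1, \ldots, \theta_k)$, where the mean $\mu = E[X]$ is functionally independent of $\theta$, that is, $\partial \mu / \partial \theta_i = 0$, $i = 1, \ldots, k$. The log likelihood of $X$ is denoted by $\ell = \ell(x; \mu, \theta)$. We assume throughout that the cumulant generating function (cgf) $C(t) = C(t; \mu, \theta) = \ln E[e^{tX}]$ exists, and we denote the variance by $V = V(\mu, \theta)$. The standard regularity conditions for maximum likelihood estimation are supposed to hold. The vector $\mathbf{x} = (x_1, \ldots, x_n)$ denotes a random sample of size $n$ drawn from the distribution of $X$, and $\bar{x}$ denotes the sample mean. We are interested in the class of distributions that satisfy Gauss’s principle (the maximum likelihood estimator of the mean is the sample mean), that is, such that $\hat{\mu} = \bar{x}$. A distribution belongs to this class if and only if there are functions $a(\mu, \theta)$, $b_i(\mu, \theta)$, $i = 1, \ldots, k$, such that the following equivalent partial differential equations hold (e.g., [15–17]):

$$\frac{\partial \ell}{\partial \mu} = a(\mu,\theta)\,(x - \mu) + \sum_{i=1}^{k} b_i(\mu,\theta)\,\frac{\partial \ell}{\partial \theta_i}, \qquad \frac{\partial C(t)}{\partial \mu} = a(\mu,\theta)\left(\frac{\partial C(t)}{\partial t} - \mu\right) + \sum_{i=1}^{k} b_i(\mu,\theta)\,\frac{\partial C(t)}{\partial \theta_i}. \qquad (1)$$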

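*Definition 1.* The mean $\mu$ is called *orthogonal* to the parameter vector $\theta$, denoted by $\mu \perp \theta$, if one has

$$E\left[\frac{\partial^2 \ell}{\partial \mu \, \partial \theta_i}\right] = 0, \qquad i = 1, \ldots, k. \qquad (2)$$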
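As a concrete instance of Definition 1, consider the negative binomial distribution parameterized by its mean $\mu$ and its size parameter $r$. This parameterization is an assumption of the sketch below and is not fixed by the text. A direct calculation gives $\partial^2 \ell / \partial \mu\, \partial r = (x - \mu)/(r + \mu)^2$, whose expectation vanishes, so $\mu \perp r$. The sketch checks this numerically. It averages a finite-difference mixed derivative of the log pmf over the distribution.

```python
import math

def nb_logpmf(x, mu, r):
    """Log pmf of the negative binomial with mean mu and size r (mean/size parameterization)."""
    return (math.lgamma(x + r) - math.lgamma(r) - math.lgamma(x + 1)
            + r * math.log(r / (r + mu)) + x * math.log(mu / (r + mu)))

def expected_mixed_derivative(mu, r, h=1e-4, xmax=500):
    """Approximate E[d^2 l / (d mu d r)] by a central mixed difference weighted by the pmf."""
    total = 0.0
    for x in range(xmax + 1):
        weight = math.exp(nb_logpmf(x, mu, r))
        mixed = (nb_logpmf(x, mu + h, r + h) - nb_logpmf(x, mu + h, r - h)
                 - nb_logpmf(x, mu - h, r + h) + nb_logpmf(x, mu - h, r - h)) / (4 * h * h)
        total += weight * mixed
    return total

print(expected_mixed_derivative(3.0, 2.0))  # approximately 0: mu is orthogonal to r
```

The truncation point `xmax` only needs to be large enough for the neglected tail mass to be negligible.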

The original motivation for parameter orthogonality is the improvement of maximum likelihood estimation by reparameterization. In this class, the number of maximum likelihood equations is reduced by one. Parameter orthogonality also reduces the often high correlation between the MLEs of the parameters, since the MLEs of orthogonal parameters are asymptotically uncorrelated. Indeed, the expectations in (2) are (up to sign) elements of the expected Fisher information matrix, which determines the asymptotic covariance matrix of $(\hat{\mu}, \hat{\theta})$. In this respect, one is interested in the subclass of all distributions that satisfy, besides Gauss’s principle, the *mean orthogonal property* $\mu \perp \theta$. This so-called *mean orthogonal class* is characterized as follows.

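Theorem 2 (Characterization of the mean orthogonal class). *Let $X$ be a random variable with cgf $C(t) = C(t; \mu, \theta)$ satisfying the above assumptions. Then $X$ belongs to the mean orthogonal class, that is, $X$ satisfies Gauss’s principle and $\mu \perp \theta$, if and only if the following quasi-linear partial differential equation is satisfied:*

$$V(\mu, \theta)\, \frac{\partial C(t)}{\partial \mu} = \frac{\partial C(t)}{\partial t} - \mu. \qquad (3)$$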

*Proof. *This is shown in Hürlimann [16].
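The equation of Theorem 2 can be checked numerically for familiar families. The sketch below assumes that the equation reads $V\,\partial C/\partial\mu = \partial C/\partial t - \mu$, which is the form obtained when the score for $\mu$ equals $(x - \mu)/V$.

Two parameterizations are chosen for illustration:

- the gamma distribution with mean $\mu$ and fixed shape $\alpha$;
- the negative binomial with mean $\mu$ and fixed size $r$.

As a contrast, the gamma family with fixed scale $\beta$ also satisfies Gauss’s principle. However, its mean is not orthogonal to $\beta$, and the residual does not vanish there.

```python
import math

def gamma_cgf(t, mu, alpha):
    # Gamma with mean mu and shape alpha; valid for t < alpha / mu.
    return -alpha * math.log(1.0 - mu * t / alpha)

def gamma_cgf_scale(t, mu, beta):
    # Same gamma family, but with the scale beta held fixed (shape = mu / beta).
    return -(mu / beta) * math.log(1.0 - beta * t)

def negbin_cgf(t, mu, r):
    # Negative binomial with mean mu and size r; valid while 1 + (mu / r)(1 - e^t) > 0.
    return -r * math.log(1.0 + (mu / r) * (1.0 - math.exp(t)))

def pde_residual(cgf, t, mu, theta, var, h=1e-6):
    """V * dC/dmu - (dC/dt - mu) at fixed theta, using central differences."""
    dC_dmu = (cgf(t, mu + h, theta) - cgf(t, mu - h, theta)) / (2 * h)
    dC_dt = (cgf(t + h, mu, theta) - cgf(t - h, mu, theta)) / (2 * h)
    return var * dC_dmu - (dC_dt - mu)

mu, alpha, r, beta = 2.0, 3.0, 1.5, 0.5
for t in (-0.5, 0.1, 0.3):
    print(pde_residual(gamma_cgf, t, mu, alpha, mu**2 / alpha),   # ~0
          pde_residual(negbin_cgf, t, mu, r, mu + mu**2 / r),     # ~0
          pde_residual(gamma_cgf_scale, t, mu, beta, mu * beta))  # nonzero for t != 0
```

The contrast case shows that the equation depends on the parameterization, not only on the family of distributions.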

Hudson [17, Theorem 1] has shown that the class of distributions satisfying Gauss’s principle is closed under convolution. In fact, convolution invariance also holds under the more stringent mean orthogonal property.

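Theorem 3 (Convolution invariance of the mean orthogonal class). *If $X_1, \ldots, X_n$ are independent random variables in the mean orthogonal class, then their sum $X_1 + \cdots + X_n$ also belongs to the mean orthogonal class. More precisely, if $X_i$ has cgf $C_i(t; \mu_i, \theta_i)$ with $\mu_i \perp \theta_i$, then the cgf $C(t)$ of the sum, with mean $\mu = \mu_1 + \cdots + \mu_n$, variance $V = V_1 + \cdots + V_n$, and a suitable parameter vector $\theta$, satisfies the partial differential equation of Theorem 2, and one has $\mu \perp \theta$.*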

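*Proof.* Without loss of generality we assume that $n = 2$. Since $\mu = \mu_1 + \mu_2$ and $V = V_1 + V_2$, one can express $C(t) = C_1(t) + C_2(t)$ as a function of new parameters $(\mu, \nu, \theta_1, \theta_2)$. This uses a parameter transformation $(\mu_1, \mu_2) \mapsto (\mu, \nu)$ that leaves $\theta_1, \theta_2$ unchanged and is chosen such that $\partial \mu_i / \partial \mu = V_i / V$, $i = 1, 2$. This choice is consistent with $\mu = \mu_1 + \mu_2$, since $V_1/V + V_2/V = 1$. Since $V_i\, \partial C_i / \partial \mu_i = \partial C_i / \partial t - \mu_i$ by Theorem 2, one obtains

$$\frac{\partial C(t)}{\partial \mu} = \sum_{i=1}^{2} \frac{\partial C_i(t)}{\partial \mu_i}\, \frac{V_i}{V} = \frac{1}{V} \sum_{i=1}^{2} \left(\frac{\partial C_i(t)}{\partial t} - \mu_i\right) = \frac{1}{V} \left(\frac{\partial C(t)}{\partial t} - \mu\right),$$

which implies the result by Theorem 2.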
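For $n = 2$, the key step is the chain rule. Suppose the reparameterization moves each $\mu_i$ at rate $\partial \mu_i / \partial \mu = V_i / V$ while the other parameters stay fixed; this is our reading of the transformation in the proof, stated here as an assumption. Then the member equations $V_i\, \partial C_i/\partial \mu_i = \partial C_i/\partial t - \mu_i$ add up to $V\, \partial C/\partial \mu = \partial C/\partial t - \mu$ for $C = C_1 + C_2$. The sketch below checks this numerically for two summands chosen for illustration:

- a gamma summand with mean $\mu_1$ and shape $\alpha$;
- a negative binomial summand with mean $\mu_2$ and size $r$.

```python
import math

def gamma_cgf(t, mu, alpha):
    return -alpha * math.log(1.0 - mu * t / alpha)              # mean mu, shape alpha

def negbin_cgf(t, mu, r):
    return -r * math.log(1.0 + (mu / r) * (1.0 - math.exp(t)))  # mean mu, size r

def central_diff(f, x, h=1e-6):
    return (f(x + h) - f(x - h)) / (2 * h)

def sum_residual(t, mu1, alpha, mu2, r):
    """Residual of V dC/dmu = dC/dt - mu for C = C_1 + C_2 when mu_i moves at rate V_i / V."""
    v1, v2 = mu1**2 / alpha, mu2 + mu2**2 / r
    mu, v = mu1 + mu2, v1 + v2
    # Assumed reparameterization: a unit change in mu shifts mu_i by V_i / V.
    dC_dmu = (central_diff(lambda m: gamma_cgf(t, m, alpha), mu1) * v1 / v
              + central_diff(lambda m: negbin_cgf(t, m, r), mu2) * v2 / v)
    dC_dt = central_diff(lambda s: gamma_cgf(s, mu1, alpha) + negbin_cgf(s, mu2, r), t)
    return v * dC_dmu - (dC_dt - mu)

print(sum_residual(0.2, 1.0, 2.0, 2.0, 3.0))  # ~0
```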

*Example 4.* Binomial random variables and their convolutions belong to the mean orthogonal class. For two binomials this is shown in Hürlimann [16, Example 2] (see also [21]). For an arbitrary number of binomials this is derived in the Appendix of Hudson [17].

#### 3. Mean Orthogonal Characterization of the Compound Gamma Distribution

Consider random sums of the type

$$S = \sum_{i=1}^{N} X_i, \qquad (4)$$

where the $X_i$’s are independent and identically distributed nonnegative random variables, and $N$ is a counting random variable defined on the nonnegative integers, which is independent of the $X_i$’s. The mean and variance of $N$, $X = X_1$, and $S$ are denoted, respectively, by $\mu_N$, $V_N$, $\mu_X$, $V_X$, and $\mu_S$, $V_S$. The coefficient of variation of $X$ is denoted by $c = \sqrt{V_X}/\mu_X$. In some applications, it is convenient to scale the severity by its mean, so that the *mean scaled severity* $Y = X/\mu_X$ has mean $1$. The resulting sum

$$S = \mu_X \sum_{i=1}^{N} Y_i, \qquad Y_i = X_i / \mu_X, \qquad (5)$$

is called a *mean scaled compound* random sum. The mean scaled compound model has important insurance risk applications. It has been studied in Hürlimann [18], which establishes that the mean scaled severity is necessarily gamma distributed, provided that the random variables $N$ and $S$ belong to the mean orthogonal class and an additional partial differential equation in the parameters can be solved. A follow-up to this construction for the individual model of risk theory is Hürlimann [19]. We clarify and simplify the original proof to obtain a characterization of (4), which will be used in Section 4. In particular, (3.20) in Hürlimann [18] is not a consequence but an assumption. Since this equation is satisfied in the examples provided there, this error does not affect the results obtained, but it must be rectified for logical correctness. Also, the proof of Lemma 7 there will be simplified (proof of Lemma 8 below).
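Since $E[e^{tS}] = E[M_X(t)^N]$, the cgf of the random sum is the composition $C_S(t) = C_N(C_X(t))$. Differentiating at $t = 0$ gives the familiar moment formulas $\mu_S = \mu_N \mu_X$ and $V_S = \mu_N V_X + V_N \mu_X^2$. The sketch below assumes a Poisson count and a gamma severity purely for concreteness. It recovers these formulas by finite differences of the composed cgf.

```python
import math

def poisson_cgf(s, lam):
    return lam * (math.exp(s) - 1.0)

def gamma_cgf(t, mean, shape):
    return -shape * math.log(1.0 - mean * t / shape)

def compound_moments(lam, mean_x, shape, h=1e-4):
    """Mean and variance of S = X_1 + ... + X_N from C_S(t) = C_N(C_X(t)), by central differences."""
    c_s = lambda t: poisson_cgf(gamma_cgf(t, mean_x, shape), lam)
    mean_s = (c_s(h) - c_s(-h)) / (2 * h)
    var_s = (c_s(h) - 2.0 * c_s(0.0) + c_s(-h)) / h**2
    return mean_s, var_s

lam, mean_x, shape = 4.0, 2.5, 3.0
var_x = mean_x**2 / shape
mean_s, var_s = compound_moments(lam, mean_x, shape)
print(mean_s, lam * mean_x)                  # mu_S = mu_N mu_X
print(var_s, lam * var_x + lam * mean_x**2)  # V_S = mu_N V_X + V_N mu_X^2, with V_N = mu_N = lam
```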

Theorem 5 (Compound gamma characterization). *Let be a counting random variable with cgf . Suppose there exists a one-to-one coordinate transformation mapping to such that , and set . Suppose the cgf of the severity exists, and let be the cgf of the random sum . Assume the cgf of the mean scaled severity is functionally independent of , and set , and . If and , then is gamma distributed with cgf .*

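To show this, some preliminaries are required. First, we review conditions under which $N$ belongs to the mean orthogonal class. Given the probability generating function (pgf) $P(z) = E[z^N]$ of $N$, it is very useful to consider the associated so-called *cumulant* pgf defined by

$$Q(z) = 1 + \frac{\ln P(z)}{\lambda}, \qquad \lambda = -\ln P(0). \qquad (6)$$

The name stems from the following series representation of the cgf, which shows that the $j$th cumulant of $N$ equals $\lambda \sum_{k} k^j q_k$:

$$C_N(t) = \ln P(e^t) = \lambda\,[Q(e^t) - 1] = \sum_{j=1}^{\infty} \left(\lambda \sum_{k=1}^{\infty} k^j q_k\right) \frac{t^j}{j!}, \qquad Q(z) = \sum_{k=1}^{\infty} q_k z^k. \qquad (7)$$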

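*Remark 6.* The sequence $(q_k)$, $k \ge 1$, is the unique solution of the system of equations (e.g., [22, Corollary 2], [23], and [24, Theorem 1]):

$$k\, p_k = \lambda \sum_{j=1}^{k} j\, q_j\, p_{k-j}, \qquad p_k = P(N = k), \quad k \ge 1. \qquad (8)$$

If $q_k \ge 0$, $k \ge 1$, the distribution of $N$ is compound Poisson with parameter $\lambda$ and severity distribution $(q_k)$, $k \ge 1$. Otherwise, one speaks of the so-called *pseudo-compound Poisson* representation of the distribution.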
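The coefficients $q_k$ can be computed recursively from the probabilities. The sketch below assumes the normalization $Q(z) = 1 + \lambda^{-1}\ln P(z)$ with $\lambda = -\ln P(0)$, so that $P(z) = \exp\{\lambda[Q(z) - 1]\}$. Comparing coefficients in $P' = \lambda Q' P$ then gives $k\, p_k = \lambda \sum_{j=1}^{k} j\, q_j\, p_{k-j}$, which we take to be the system of Remark 6.

For a negative binomial with pgf $(p/(1 - qz))^r$, $q = 1 - p$, one finds $Q(z) = \ln(1 - qz)/\ln p$. The recursion should therefore return the logarithmic series weights $q_k = q^k/(k\,(-\ln p))$.

```python
import math

def nb_pmf(k, r, p):
    # Negative binomial: P(N = k) = Gamma(k + r) / (Gamma(r) k!) * p^r * (1 - p)^k
    return math.exp(math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
                    + r * math.log(p) + k * math.log(1.0 - p))

def cumulant_pgf_coefficients(pmf, kmax):
    """Solve k p_k = lam * sum_{j=1}^k j q_j p_{k-j} for q_1..q_kmax, with lam = -ln p_0."""
    probs = [pmf(k) for k in range(kmax + 1)]
    lam = -math.log(probs[0])
    q = [0.0] * (kmax + 1)
    for k in range(1, kmax + 1):
        known = sum(j * q[j] * probs[k - j] for j in range(1, k))
        q[k] = (k * probs[k] / lam - known) / (k * probs[0])
    return lam, q

lam, q = cumulant_pgf_coefficients(lambda k: nb_pmf(k, 2.5, 0.4), 12)
print(lam, q[1:4])  # lam = -2.5 ln 0.4; q_k follows the logarithmic series
```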

Lemma 7. * Let be a counting random variable with cgf of the form (7). Suppose there exists a one-to-one coordinate transformation mapping to , and set . Then with is equivalent to the following conditions:
*

*Proof. *The condition (9) is a restatement of Theorem 2. Applying the chain rule of differential calculus, this condition transforms to
Now, by Lemma 8 below and the chain rule, one has
Inserting into (12) shows that
The statements (10) and (11) follow by using the representation (7).

Lemma 8. *If , then the partial differential parameter equation holds.*

*Proof. *The representation (7) implies that , . Now, using (7) one sees that (9) is equivalent to , . It follows that .

*Proof of Theorem 5. *Let be the moment generating function of . Expressed in terms of the mean scaled severity one has . The relationship (7) for the cgf yields the series expansion:
By Theorem 2 one has if and only if the equation is satisfied. With the series representation for and the assumption , this equation is equivalent to the following condition (use that does not depend on ):
Now, by Lemma 7 and (10), one has the identity (use the differential chain rule)
which, together with , implies that
Inserted into the above expression one obtains the ordinary differential equation:
whose unique solution is . Since , one sees that
is the cgf of a gamma-distributed random variable. The proof is complete.
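For a Poisson count, a compound sum with gamma severity and both $N$ and $S$ in the mean orthogonal class is familiar from Tweedie models. The sketch below relies on these standard Tweedie formulas, which are not taken from the text. Fix a power index $1 < \rho < 2$ and a dispersion $\varphi$, and take:

- a Poisson count with rate $\lambda = \mu^{2-\rho}/(\varphi(2-\rho))$;
- a gamma severity with shape $(2-\rho)/(\rho-1)$ and scale $\varphi(\rho-1)\mu^{\rho-1}$.

The resulting sum has mean $\mu$ and variance $\varphi\mu^\rho$, and its mean scaled severity does not depend on $\mu$. The code checks numerically that the cgf of the sum satisfies $V\,\partial C/\partial\mu = \partial C/\partial t - \mu$ at fixed $(\varphi, \rho)$.

```python
import math

def tweedie_cgf(t, mu, phi, rho):
    """cgf of a compound Poisson sum with gamma severity, in Tweedie form (1 < rho < 2).

    Poisson rate mu^(2-rho) / (phi (2-rho)); gamma shape (2-rho)/(rho-1), which does not
    depend on mu; gamma scale phi (rho-1) mu^(rho-1). Valid while scale * t < 1.
    """
    lam = mu ** (2 - rho) / (phi * (2 - rho))
    shape = (2 - rho) / (rho - 1)
    scale = phi * (rho - 1) * mu ** (rho - 1)
    return lam * ((1.0 - scale * t) ** (-shape) - 1.0)

def pde_residual(t, mu, phi, rho, h=1e-6):
    """V dC/dmu - (dC/dt - mu) at fixed (phi, rho), with variance V = phi mu^rho."""
    var = phi * mu ** rho
    dC_dmu = (tweedie_cgf(t, mu + h, phi, rho) - tweedie_cgf(t, mu - h, phi, rho)) / (2 * h)
    dC_dt = (tweedie_cgf(t + h, mu, phi, rho) - tweedie_cgf(t - h, mu, phi, rho)) / (2 * h)
    return var * dC_dmu - (dC_dt - mu)

for t in (-1.0, 0.2, 0.5):
    print(pde_residual(t, mu=2.0, phi=0.8, rho=1.6))  # ~0 at every t
```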

The proof uses the so-called *natural* parameterization of the compound gamma distribution. It is interesting to obtain explicit parameters orthogonal to the means of , and . By the assumption one has , and since is gamma distributed, one has with . It remains to construct a parameter vector orthogonal to the mean of such that
where must be determined. This task can be solved in a unified way for many counting distributions (see [18, Section 4]). To illustrate the method, it suffices to consider a single example here.

*Example 9 (compound negative binomial gamma distribution). *Let , , , be a negative binomial random variable. Its cumulant pgf (6) reads , . One has the following identity (see [18, equation (4.7)]):
which implies for that . Together, this shows that (10) is satisfied. Therefore, one has and . Now, by Theorem 5 the compound negative binomial will be a compound negative binomial gamma if and is satisfied. Written in terms of the parameter , , the latter equation is equivalent to the condition . With one obtains the differential equation , which has the solution for some . In the coordinates , this constant is equal to
Since one must have .

#### 4. Mean Orthogonal Characterization of the Compound Multiparameter Hermite Gamma

The mean orthogonal characterization of the compound gamma distribution allows for a wide variety of count data distributions in the mean orthogonal class. To further reduce the set of admissible count distributions, one can ask for characterizations under additional assumptions. For example, Puig [25] and Puig and Valero [26] characterize count data distributions satisfying Gauss’s principle and several notions of additivity, which via Theorem 5 can be translated into characterizations of compound gamma distributions. Based on a result by Puig and Valero [20], we derive a most stringent characterization, which allows compounding of the gamma distribution under a single count data family, namely, the multiparameter Hermite distribution. To show this, some additional preliminaries are required.

*Definition 10.* Let $N$ be a counting random variable, let $B_i$, $i = 1, 2, \ldots$, be independent and identically distributed Bernoulli random variables with probability of success $p$, and let $(B_i)$ be independent of $N$. Then $N_p = \sum_{i=1}^{N} B_i$ is called an *independent p-thinning* of $N$.

*Definition 11.* Let **F** be a family of count distributions. It is called closed under *binomial subsampling* if, for any random variable $N$ with distribution in **F**, all its independent p-thinnings, for all $p \in (0, 1)$, have distributions in **F**.

*Definition 12. *Let **F** be a family of distributions. It is called closed under *convolution* if, for any two independent random variables with distributions in **F**, the distribution of the sum also belongs to **F**.

*Definition 13.* Let $N$ be an integer random variable with pgf $P(z)$ and *factorial cumulant generating function* (fcgf) $K(z) = \ln P(1 + z)$. For any integer $j \ge 1$, the $j$th *factorial cumulant* of $N$ is defined and denoted by $\kappa_{[j]} = K^{(j)}(0)$.
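Since $P(1 + z) = E[(1+z)^N] = \sum_{j \ge 0} m_{[j]} z^j / j!$ with factorial moments $m_{[j]} = E[N(N-1)\cdots(N-j+1)]$, the factorial cumulants follow from the factorial moments by the usual moment-to-cumulant recursion. The sketch below is an illustration and is not taken from the text. It checks this on a negative binomial with pgf $(p/(1 - qz))^r$, $q = 1 - p$, for which $K(z) = -r \ln(1 - (q/p) z)$ and hence $\kappa_{[j]} = r\,(j-1)!\,(q/p)^j$.

```python
import math

def nb_pmf(k, r, p):
    # Negative binomial: P(N = k) = Gamma(k + r) / (Gamma(r) k!) * p^r * (1 - p)^k
    return math.exp(math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
                    + r * math.log(p) + k * math.log(1.0 - p))

def factorial_moments(pmf, jmax, kmax=500):
    """m[j] = E[N (N-1) ... (N-j+1)] for j = 0..jmax, truncating the support at kmax."""
    m = [0.0] * (jmax + 1)
    for k in range(kmax + 1):
        w = pmf(k)
        falling = 1.0  # k (k-1) ... (k-j+1), starting from the empty product
        for j in range(jmax + 1):
            m[j] += w * falling
            falling *= k - j
    return m

def factorial_cumulants(m):
    """kappa[n] = m[n] - sum_{i=1}^{n-1} C(n-1, i-1) kappa[i] m[n-i], using m[0] = 1."""
    kappa = [0.0] * len(m)
    for n in range(1, len(m)):
        kappa[n] = m[n] - sum(math.comb(n - 1, i - 1) * kappa[i] * m[n - i] for i in range(1, n))
    return kappa

kappa = factorial_cumulants(factorial_moments(lambda k: nb_pmf(k, 2.0, 0.5), 4))
print(kappa[1:])  # compare with r (j-1)! (q/p)^j = 2 (j-1)! for r = 2, p = q = 1/2
```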

Under mild regularity assumptions, there is only one family of count distributions that is closed under both convolution and binomial subsampling.

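Theorem 14 (Characterization of the multiparameter Hermite distribution). *Let **F** be a family of count distributions parameterized by its first $m$ factorial cumulants $\kappa_{[1]}, \ldots, \kappa_{[m]}$, and assume that its pgf is continuous in $(\kappa_{[1]}, \ldots, \kappa_{[m]})$ over its parameter space. Then **F** is closed under convolution and binomial subsampling if and only if the pgf is of the form*

$$P(z) = \exp\left\{\sum_{j=1}^{m} \kappa_{[j]}\, \frac{(z-1)^j}{j!}\right\}. \qquad (24)$$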

*Proof. *See Puig and Valero [20], proof of Theorem 1.
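Both closure properties are easy to see on the fcgf:

- An independent $p$-thinning replaces $P(z)$ by $P(1 - p + pz)$, hence $K(z)$ by $K(pz)$ and $\kappa_{[j]}$ by $p^j \kappa_{[j]}$.
- Convolution of independent variables adds factorial cumulants.

In both cases the pgf form of Theorem 14 is preserved. The sketch below illustrates this "if" direction numerically. The factorial cumulant values are arbitrary choices for which all coefficients of $z^k$, $k \ge 1$, in the exponent are nonnegative, so that a genuine pgf is obtained. Probabilities come from the recursion $k\, p_k = \sum_i i\, a_i\, p_{k-i}$, which follows from $P' = A' P$ when $P = \exp(A)$ with $A(z) = \sum_i a_i z^i$.

```python
import math

def hermite_pmf(kappa, kmax):
    """pmf on 0..kmax of the multiparameter Hermite law with kappa[j-1] = kappa_[j]."""
    m = len(kappa)
    # Expand A(z) = sum_j kappa_[j] (z - 1)^j / j! into powers of z: A(z) = sum_i a[i] z^i.
    a = [0.0] * (m + 1)
    for j in range(1, m + 1):
        for i in range(j + 1):
            a[i] += kappa[j - 1] / math.factorial(j) * math.comb(j, i) * (-1) ** (j - i)
    # P = exp(A) satisfies P' = A' P, i.e. k p_k = sum_{i=1}^{min(k, m)} i a_i p_{k-i}.
    p = [math.exp(a[0])] + [0.0] * kmax
    for k in range(1, kmax + 1):
        p[k] = sum(i * a[i] * p[k - i] for i in range(1, min(k, m) + 1)) / k
    return p

def thin(pmf, prob):
    """pmf of the independent prob-thinning sum_{i <= N} B_i with B_i ~ Bernoulli(prob)."""
    out = [0.0] * len(pmf)
    for n, w in enumerate(pmf):
        for k in range(n + 1):
            out[k] += w * math.comb(n, k) * prob**k * (1 - prob) ** (n - k)
    return out

def convolve(f, g):
    return [sum(f[i] * g[k - i] for i in range(k + 1)) for k in range(len(f))]

kappa = [2.0, 0.5, 0.1]
base = hermite_pmf(kappa, 80)
thinned = thin(base, 0.5)
target = hermite_pmf([0.5**j * c for j, c in enumerate(kappa, start=1)], 80)
print(max(abs(x - y) for x, y in zip(thinned[:30], target[:30])))  # ~0: thinning scales kappa_[j] by p^j
```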

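Some comments are in order. The case $m = 1$ corresponds to the Poisson distribution, and $m = 2$ to the Hermite distribution (e.g., [27]). For arbitrary $m$, this distribution is called the *multiparameter Hermite* distribution of order $m$ by Milne and Westcott [28]. In terms of the cumulant pgf (6), the representation (24) can be rewritten as

$$P(z) = \exp\{\lambda\,[Q(z) - 1]\}, \qquad Q(z) = \sum_{k=1}^{m} q_k z^k, \qquad (25)$$

where $\lambda$ and $q_k$, $k = 1, \ldots, m$, solve the system (8), that is,

$$\lambda = \sum_{j=1}^{m} (-1)^{j+1} \frac{\kappa_{[j]}}{j!}, \qquad q_k = \frac{1}{\lambda} \sum_{j=k}^{m} (-1)^{j-k} \binom{j}{k} \frac{\kappa_{[j]}}{j!}, \quad k = 1, \ldots, m. \qquad (26)$$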
The case $m = 2$ of (26) is already in A.W. Kemp and C.D. Kemp [29], and for arbitrary $m$ this assertion is equivalent to Lemma 2 in Puig and Valero [20]. The special case $q_k = 0$, $k = 2, \ldots, m-1$, is the generalized Hermite distribution of Gupta and Jain [30]. The multiparameter Hermite distribution also belongs to the Kumar [31] family of distributions. In general, the conditions on the sequence $(q_k)$, $k = 1, \ldots, m$, under which (25) defines a true probability distribution have been identified by Lévy [32]. According to Lukacs [33, page 252] and Johnson et al. [34, page 356], this is the case provided that each negative value is preceded by a positive value and followed by at least two positive values. Simple necessary conditions for (25) to be a pgf are given in [28, Remark 1]. If $q_k \ge 0$ for $k = 1, \ldots, m$, then the multiparameter Hermite distribution is compound Poisson with parameter $\lambda$ and severity distribution $(q_k)$, and it is thus infinitely divisible (Feller [35, Section XII.2]). Because of the next result, the multiparameter Hermite distribution is of interest in the context of Gauss’s principle, parameters orthogonal to the mean, and the related compound gamma characterization of random sums.
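Expanding the exponent of the pgf in powers of $z$ gives $\sum_j \kappa_{[j]}(z-1)^j/j! = \sum_{i=0}^{m} a_i z^i$, where $\sum_i a_i = 0$. It follows that $\lambda = -a_0$ and $q_k = a_k/\lambda$.

The sketch below assumes the normalization $Q(z) = 1 + \lambda^{-1}\ln P(z)$, $\lambda = -\ln P(0)$, of the cumulant pgf. It computes $(\lambda, q_1, \ldots, q_m)$ from given factorial cumulants. It then checks that $\lambda \sum_k k\, q_k$ and $\lambda \sum_k k^2 q_k$ return the mean $\kappa_{[1]}$ and the variance $\kappa_{[1]} + \kappa_{[2]}$. The $\kappa$ values are arbitrary illustrations. In the second case, $q_1 < 0$ forces $p_1 = \lambda q_1 p_0 < 0$, so those values define no probability distribution.

```python
import math

def hermite_cumulant_pgf(kappa):
    """Return (lam, q) with P(z) = exp(lam (Q(z) - 1)) and Q(z) = sum_{k=1}^m q[k] z^k."""
    m = len(kappa)
    # Coefficients a[i] of A(z) = sum_j kappa_[j] (z - 1)^j / j! in powers of z.
    a = [0.0] * (m + 1)
    for j in range(1, m + 1):
        for i in range(j + 1):
            a[i] += kappa[j - 1] / math.factorial(j) * math.comb(j, i) * (-1) ** (j - i)
    lam = -a[0]                      # lam = -ln P(0)
    q = [0.0] + [a[k] / lam for k in range(1, m + 1)]
    return lam, q

lam, q = hermite_cumulant_pgf([2.0, 0.5, 0.1])
print(lam, q[1:])                                     # all q_k >= 0: compound Poisson
print(sum(q))                                         # 1 up to rounding
print(lam * sum(k * q[k] for k in range(len(q))))     # mean, equal to kappa_[1]
print(lam * sum(k * k * q[k] for k in range(len(q)))) # variance, equal to kappa_[1] + kappa_[2]

lam2, q2 = hermite_cumulant_pgf([1.0, 1.5])
print(q2[1:])                                         # q_1 < 0: no valid distribution
```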

Lemma 15. *Let , , be continuous real functions in the parameter vector over some parameter space, and set , , for a parameter . Assume that the cumulant pgf defines a feasible multiparameter Hermite random variable of order over the parameter space. Then and .*

*Proof. *Set . Then one has
Together, this shows that (10) is satisfied. The result follows by Lemma 7.

*Example 16 (Hermite distribution, $m = 2$).* Suppose the Hermite distribution is parameterized by its first two factorial cumulants $(\kappa_{[1]}, \kappa_{[2]})$. Since $\mu = \kappa_{[1]}$ and $V = \kappa_{[1]} + \kappa_{[2]}$, it can equivalently be parameterized by its mean $\mu$ and variance $V$. Consider a parameterization , such that , . There exists a one-to-one mapping between and . Since ,, it is determined by the coordinate transformation:
Therefore, the cumulant pgf defines a feasible two-parameter Hermite distribution such that the corresponding random variable belongs to and . Since one notes that the Hermite distribution is necessarily overdispersed. As noted by Puig and Valero [20] overdispersion holds for all infinitely divisible multiparameter Hermite distributions of arbitrary order . Therefore, it should be useful to analyze data with this property (e.g., claim number data in automobile insurance, up-to-date Hürlimann [36, page 802], and multiparameter Hermite for ).

We are ready for the following new characterization result.

Theorem 17 (Compound multiparameter Hermite gamma characterization). *Let be a counting random variable parameterized by its first factorial cumulants and assume that its cgf is continuous in over its parameter space and set , . Suppose the cgf of the severity exists, and let be the cgf of the random sum . Assume the cgf of the mean scaled severity is functionally independent of , and set , . Assume is closed under convolution and binomial subsampling, and . Then is a multiparameter Hermite distribution of order , and is gamma distributed with cgf . Furthermore, there exists a parameterization of such that its cumulant pgf reads . One has with , , , and, in the coordinates , the constant is equal to
*

*Proof.* The result follows by combining Theorems 5 and 14 with the observation that a multiparameter Hermite distribution can always be put in the form of Lemma 15 (a generalization of Example 16). The assertion about the parameters orthogonal to the means follows by the same arguments as in Example 9, using (27).