Abstract

In this paper, we introduce a new slash distribution constructed from the power normal distribution and the uniform distribution. This new distribution, called the slash power normal distribution, models the skewness and kurtosis of data more appropriately than the power normal distribution. We derive the probability density function and the cumulative distribution function of the slash power normal distribution and plot the density for different parameter values. We also study the basic properties of the moments, derive the maximum likelihood estimation of the parameters, and substantiate our arguments with numerical simulations. Finally, we model hourly measurements of sulphur dioxide concentrations at a station in Hong Kong with the slash power normal, power normal, and skew normal distributions and use the Kolmogorov–Smirnov (K-S) test to evaluate the model fits. The results demonstrate that the slash power normal distribution gives a better fit to the data.

1. Introduction

With the advancement of research in statistics and its applications, there has been a growing interest in the construction of flexible parametric classes of distributions exhibiting skewness or heavy-tailed characteristics that are different from those of the normal distribution. Some skew-symmetric distributions have been studied, for example, by Arnold and Beaver [1], Gupta et al. [2], Wang et al. [3], Azzalini [4], and Arellano-Valle and Azzalini [5]. The motivation originates from the analysis of real-life data, including financial, environmental, and medical data, whose distributions often do not follow the normal law (see, for example, Azzalini et al. [6], Bartoletti and Loperfido [7], Adcock et al. [8], Hossain and Beyene [9], and Cancela and Pires [10]). It is thus necessary to find and study, from both the theoretical and practical points of view, distributions that can better model skewed or heavy-tailed data.

Kafadar [11] introduced the standard slash normal (SN) distribution, which is defined as the distribution of the ratio of a standard normal random variable $Z$ and an independent uniform random variable $U$ on the interval (0,1) raised to the power $1/q$, $q > 0$; that is, $X = Z/U^{1/q}$. We write $X \sim \mathrm{SN}(q)$. As $q \to \infty$, it yields the standard normal distribution.

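In fact, straightforward algebra yields the probability density function (p.d.f.) of a univariate SN distribution,

$$f(x; q) = q \int_0^1 u^{q}\, \phi(xu)\, du, \quad -\infty < x < \infty,$$

and the corresponding cumulative distribution function (c.d.f.) of the univariate slash normal distribution,

$$F(x; q) = q \int_0^1 u^{q-1}\, \Phi(xu)\, du,$$

where $\phi$ and $\Phi$ are the standard normal density and distribution function, respectively. Hence, the expectation and the variance of the standard slash normal distribution are $E(X) = 0$ for $q > 1$ and $\mathrm{Var}(X) = q/(q-2)$ for $q > 2$.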
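As a quick numerical check, the sketch below evaluates the slash normal density $q\int_0^1 u^q \phi(xu)\,du$ and distribution function $q\int_0^1 u^{q-1}\Phi(xu)\,du$ by composite Simpson quadrature. The helper names `simpson`, `sn_pdf`, and `sn_cdf` are illustrative and not part of the paper, and the c.d.f. helper is only intended for $q \ge 1$.

```python
from statistics import NormalDist

STD = NormalDist()  # standard normal: STD.pdf is phi, STD.cdf is Phi


def simpson(g, a, b, n=200):
    """Composite Simpson rule on [a, b]; n must be even."""
    h = (b - a) / n
    total = g(a) + g(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(a + i * h)
    return total * h / 3


def sn_pdf(x, q):
    """Slash normal density: q * integral_0^1 u^q phi(x u) du."""
    return q * simpson(lambda u: u**q * STD.pdf(x * u), 0.0, 1.0)


def sn_cdf(x, q):
    """Slash normal c.d.f.: q * integral_0^1 u^(q-1) Phi(x u) du.

    Assumes q >= 1 so that the integrand stays finite at u = 0.
    """
    return q * simpson(lambda u: u ** (q - 1) * STD.cdf(x * u), 0.0, 1.0)


if __name__ == "__main__":
    print(sn_pdf(0.0, 2.0))                 # (q / (q + 1)) * phi(0) with q = 2
    print(sn_cdf(0.0, 2.0))                 # 0.5, by symmetry about the origin
    print(sn_pdf(4.0, 2.0) > STD.pdf(4.0))  # heavier tail than N(0, 1)
```

At $x = 0$ the density integral can be done by hand, giving $q\,\phi(0)/(q+1)$, which makes a convenient sanity check for the quadrature.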

A general slash normal distribution is obtained by performing scale multiplication and location shift of a standard slash normal random variable (see Rogers and Tukey [12] and Mosteller and Tukey [13] for further properties). Wang and Genton [14] generalized Kafadar’s univariate slash normal distribution to the multivariate slash normal distribution and investigated its properties. Compared to the standard normal distribution, the standard slash normal distribution has heavier tails and is symmetric about the origin. Thus, it has been used to simulate heavy-tailed data (see, for example, Andrew et al. [15] and Gross [16]).

Extensions of the slash normal distribution have been considered by several authors. Among them, Gui [17] introduced a new extension, built on the alpha skew normal distribution of Elal-Olivero [18], which is not only heavy-tailed but can also be unimodal or bimodal. Furthermore, the structure of uniform slash and α-slash distributions in both the discrete and continuous cases is explored by Jones and Higuchi [19].

In this paper, we consider a generalization of the slash normal distribution, called the slash power normal (SPN) distribution, via the power normal (PN) distribution introduced by Durrans [20]. The PN(α) distribution is a special form of fractional order statistical distribution. Formally, we have X ∼ PN(α) if the p.d.f. of X is

$$f(x; \alpha) = \alpha\, \phi(x)\, \{\Phi(x)\}^{\alpha-1}, \quad -\infty < x < \infty,\ \alpha > 0.$$

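The main idea of this distribution was introduced in detail by Lehmann [21]. Gupta and Gupta [22] studied the fundamental properties of the PN distribution. One usual extension that should be noted is the location-scale version of the PN distribution. If Y ∼ PN(α), then X = μ + σY, where −∞ < μ < ∞ and σ > 0, has the p.d.f. given by

$$f(x; \mu, \sigma, \alpha) = \frac{\alpha}{\sigma}\, \phi\!\left(\frac{x-\mu}{\sigma}\right) \left\{\Phi\!\left(\frac{x-\mu}{\sigma}\right)\right\}^{\alpha-1}, \quad -\infty < x < \infty,$$

and is denoted by $X \sim \mathrm{PN}(\mu, \sigma, \alpha)$.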
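The location-scale power normal density $(\alpha/\sigma)\,\phi(z)\{\Phi(z)\}^{\alpha-1}$ with $z = (x-\mu)/\sigma$ is straightforward to evaluate. The following minimal sketch uses only the Python standard library; the function names `pn_pdf` and `pn_cdf` are illustrative assumptions, and the c.d.f. $\{\Phi(z)\}^{\alpha}$ is simply the antiderivative of that density.

```python
from statistics import NormalDist

STD = NormalDist()  # standard normal: STD.pdf is phi, STD.cdf is Phi


def pn_pdf(x, alpha, mu=0.0, sigma=1.0):
    """PN density: (alpha / sigma) * phi(z) * Phi(z)^(alpha - 1), z = (x - mu) / sigma."""
    z = (x - mu) / sigma
    return alpha / sigma * STD.pdf(z) * STD.cdf(z) ** (alpha - 1.0)


def pn_cdf(x, alpha, mu=0.0, sigma=1.0):
    """PN c.d.f.: Phi(z)^alpha, whose derivative in x is pn_pdf."""
    return STD.cdf((x - mu) / sigma) ** alpha


if __name__ == "__main__":
    # alpha = 1 recovers the normal density; alpha = 2 is the density of
    # the maximum of two independent standard normal variables.
    print(pn_pdf(0.0, 1.0), pn_pdf(0.0, 2.0))
```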

We derive the p.d.f. and c.d.f. of the SPN distributions. We also study their moments and the maximum likelihood estimation of their parameters.

The rest of this paper is organized as follows. In Section 2, we propose a generalized skewed slash distribution to model heavy-tailed and skewed data and study its properties. In Section 3, we give the maximum likelihood estimation of the parameters. We substantiate our theoretical arguments with a simulation study in Section 4 and a real-life application in Section 5. We then conclude our paper in Section 6.

2. Slash Power Normal Distribution

We formally define the slash power normal distribution through its stochastic representation.

2.1. Stochastic Representation

We derive the probability density function of the slash power normal distribution in the following theorem.

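Theorem 1. Let Y and U be two independent random variables, where $Y \sim \mathrm{PN}(\alpha)$ and $U \sim U(0,1)$. Then the p.d.f. of $X = Y/U^{1/q}$ is

$$f_X(x; \alpha, q) = \alpha q \int_0^1 u^{q}\, \phi(xu)\, \{\Phi(xu)\}^{\alpha-1}\, du, \quad -\infty < x < \infty, \qquad (6)$$

where $\alpha > 0$ and $q > 0$.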

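Proof. Let $X = Y/U^{1/q}$ and $W = U$; the inverse transformation is $Y = XW^{1/q}$, $U = W$. The Jacobian determinant is $w^{1/q}$. We have the joint p.d.f. of $X$ and $W$,

$$f_{X,W}(x, w) = \alpha\, w^{1/q}\, \phi(x w^{1/q})\, \{\Phi(x w^{1/q})\}^{\alpha-1}, \quad -\infty < x < \infty,\ 0 < w < 1.$$

Now the marginal p.d.f. of $X$ is given by

$$f_X(x; \alpha, q) = \int_0^1 \alpha\, w^{1/q}\, \phi(x w^{1/q})\, \{\Phi(x w^{1/q})\}^{\alpha-1}\, dw = \alpha q \int_0^1 u^{q}\, \phi(xu)\, \{\Phi(xu)\}^{\alpha-1}\, du,$$

where the last equality uses the substitution $u = w^{1/q}$, so that $dw = q u^{q-1} du$. This completes the proof of the theorem.
Since the p.d.f. in (6) reduces to that of the SN distribution for $\alpha = 1$, the distribution given by (6) generalizes the SN distribution. Thus, we have the following definition.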

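Definition 1. If the p.d.f. of a random variable $X$ is given by (6), that is,

$$f_X(x; \alpha, q) = \alpha q \int_0^1 u^{q}\, \phi(xu)\, \{\Phi(xu)\}^{\alpha-1}\, du, \quad -\infty < x < \infty,$$

where $\alpha > 0$ and $q > 0$ are shape parameters, we say that $X$ has the slash power normal (SPN) distribution, denoted by $X \sim \mathrm{SPN}(\alpha, q)$. The c.d.f. of the SPN distribution is given as follows:

$$F_X(x; \alpha, q) = q \int_0^1 u^{q-1}\, \{\Phi(xu)\}^{\alpha}\, du.$$

The graph is given in Figure 1.

In different types of statistical analysis, it is common to study the survival function, the hazard function, and the inverse hazard function. For the SPN distribution, the survival and hazard functions are given, respectively, by

$$S(x) = 1 - F_X(x; \alpha, q), \qquad h(x) = \frac{f_X(x; \alpha, q)}{1 - F_X(x; \alpha, q)}.$$

Furthermore, the inverse hazard function is

$$r(x) = \frac{f_X(x; \alpha, q)}{F_X(x; \alpha, q)}.$$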

2.2. Density Shape and Special Cases

Similar to the power normal distribution, we graphically show a typical p.d.f. of the SPN distributions with various combinations of parameters.

As Figure 2 shows, the p.d.f. of the SPN distribution has heavier tails and a lower peak than that of the PN distribution.

For the ordinary slash normal distribution, it is well known that the thickness of the tail increases with decreasing values of q. From Figure 3, we can observe that for fixed α, the same is also true for the slash power normal distribution.
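The effect of $q$ on the tail can also be checked numerically. The sketch below assumes the integral forms $f_X(x) = \alpha q\int_0^1 u^q\phi(xu)\{\Phi(xu)\}^{\alpha-1}du$ and $F_X(x) = q\int_0^1 u^{q-1}\{\Phi(xu)\}^{\alpha}du$ for the SPN density and distribution function and evaluates them by Simpson quadrature. All function names are illustrative, and the helpers are meant for moderate parameter values ($\alpha \ge 1$, $1 \le q \le 10$ or so), where a fixed grid is accurate enough.

```python
from statistics import NormalDist

STD = NormalDist()  # standard normal: STD.pdf is phi, STD.cdf is Phi


def simpson(g, a, b, n=200):
    """Composite Simpson rule on [a, b]; n must be even."""
    h = (b - a) / n
    total = g(a) + g(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(a + i * h)
    return total * h / 3


def spn_pdf(x, alpha, q):
    """SPN density: alpha * q * integral_0^1 u^q phi(xu) Phi(xu)^(alpha-1) du."""
    return alpha * q * simpson(
        lambda u: u**q * STD.pdf(x * u) * STD.cdf(x * u) ** (alpha - 1.0), 0.0, 1.0
    )


def spn_cdf(x, alpha, q):
    """SPN c.d.f.: q * integral_0^1 u^(q-1) Phi(xu)^alpha du (q >= 1 assumed)."""
    return q * simpson(lambda u: u ** (q - 1.0) * STD.cdf(x * u) ** alpha, 0.0, 1.0)


def spn_survival(x, alpha, q):
    """Survival function 1 - F(x)."""
    return 1.0 - spn_cdf(x, alpha, q)


def spn_hazard(x, alpha, q):
    """Hazard function f(x) / (1 - F(x))."""
    return spn_pdf(x, alpha, q) / spn_survival(x, alpha, q)


if __name__ == "__main__":
    # For fixed alpha, the upper-tail probability P(X > 4) grows as q decreases.
    for q in (1.0, 2.0, 5.0):
        print(q, spn_survival(4.0, 2.0, q))
```

Because the quadrature uses a fixed grid on $(0, 1)$, very large $q$ or very large $|x|$ would need a finer grid or an adaptive rule.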

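The SPN distribution has a unimodal density which is skewed to the left if $\alpha < 1$ and to the right if $\alpha > 1$. Figure 4 depicts a few p.d.f. graphs for different values of $\alpha$. When $\alpha = 1$, the distribution is symmetric since $\{\Phi(xu)\}^{\alpha-1} = 1$ and $\phi$ is an even function. In this case, the p.d.f. of the distribution is given by

$$f_X(x; 1, q) = q \int_0^1 u^{q}\, \phi(xu)\, du.$$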

This means that the SN distribution is a special case of the SPN distribution.

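When $\alpha = 2$, we have the standard univariate skew-slash distribution with skewness parameter 1, which was introduced by Wang and Genton [14], with the p.d.f.

$$f_X(x; 2, q) = 2q \int_0^1 u^{q}\, \phi(xu)\, \Phi(xu)\, du$$

(see [14]).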

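When $\alpha = 1$ and $q = 1$, the distribution is reduced to

$$f_X(x; 1, 1) = \int_0^1 u\, \phi(xu)\, du.$$

That is,

$$f_X(x; 1, 1) = \begin{cases} \dfrac{\phi(0) - \phi(x)}{x^2}, & x \neq 0, \\[2mm] \dfrac{\phi(0)}{2}, & x = 0. \end{cases}$$

The above distribution is just the standard SN distribution $\mathrm{SN}(1)$.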

Also, by the stochastic representation in Theorem 1, the SPN distribution approaches the PN distribution as $q$ approaches infinity.

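Proposition 1. Let $X \mid U = u \sim \mathrm{PN}(0, u^{-1/q}, \alpha)$ and $U \sim U(0,1)$; then $X \sim \mathrm{SPN}(\alpha, q)$.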

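Proof. According to the definition of $\mathrm{PN}(\mu, \sigma, \alpha)$, we have

$$f_X(x) = \int_0^1 f_{X \mid U}(x \mid u)\, du = \int_0^1 \alpha\, u^{1/q}\, \phi(x u^{1/q})\, \{\Phi(x u^{1/q})\}^{\alpha-1}\, du = \alpha q \int_0^1 t^{q}\, \phi(xt)\, \{\Phi(xt)\}^{\alpha-1}\, dt,$$

which is just the same as (6). Thus the proof is completed.
The proposition shows that the SPN distribution can be represented as a scale mixture of PN distributions with a uniform mixing variable. The result gives us an alternative way to generate random numbers from the SPN distribution.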

2.3. Random Number Generation

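First, we employ the inverse function method to generate a random number $y$ from the $\mathrm{PN}(\alpha)$ distribution, whose p.d.f. is

$$f(y; \alpha) = \alpha\, \phi(y)\, \{\Phi(y)\}^{\alpha-1}.$$

Second, we generate a random number $u$ from the uniform distribution on (0, 1). Then, the quotient $x = y/u^{1/q}$ serves as a random number from the $\mathrm{SPN}(\alpha, q)$ distribution.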
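The two steps can be sketched as follows. The sketch assumes the PN(α) c.d.f. is $\{\Phi(y)\}^{\alpha}$, so that the inverse function method gives $y = \Phi^{-1}(v^{1/\alpha})$ for a uniform draw $v$, and it forms the quotient $y/u^{1/q}$ from Theorem 1. The function names `rpn` and `rspn` are illustrative, not the authors' code.

```python
import random
from statistics import NormalDist

STD = NormalDist()  # STD.inv_cdf is the standard normal quantile function


def rpn(alpha, rng):
    """Draw from PN(alpha) by inversion: solve Phi(y)^alpha = v for y."""
    while True:
        p = rng.random() ** (1.0 / alpha)
        if 0.0 < p < 1.0:  # inv_cdf requires a probability strictly inside (0, 1)
            return STD.inv_cdf(p)


def rspn(alpha, q, rng):
    """Draw from SPN(alpha, q) as Y / U^(1/q) with Y ~ PN(alpha), U ~ U(0, 1)."""
    y = rpn(alpha, rng)
    u = 1.0 - rng.random()  # lies in (0, 1], which avoids division by zero
    return y / u ** (1.0 / q)


if __name__ == "__main__":
    rng = random.Random(2015)  # fixed seed only to make the run reproducible
    sample = [rspn(2.0, 5.0, rng) for _ in range(10000)]
    # P(X <= 0) = P(Y <= 0) = 0.5^alpha, which is 0.25 for alpha = 2.
    print(sum(x <= 0.0 for x in sample) / len(sample))
```

Since dividing by a positive number does not change the sign of $Y$, the proportion of non-positive draws should be close to $0.5^{\alpha}$, which gives a simple check on the generator.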

2.4. Moments of the Slash Power Normal Distribution

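If a random variable $X \sim \mathrm{SPN}(\alpha, q)$ has the stochastic representation in Theorem 1, then the mean of X is given by

$$E(X) = E(Y)\, E\!\left(U^{-1/q}\right) = \frac{q}{q-1}\, E(Y), \quad q > 1.$$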

Let

Clearly, ; then,

See Arnold and Beaver [23] for the proof.

We can now easily obtain etc.

Therefore,

In order to obtain the second moment of X, we need to compute the second moment of $Y^2$ ($t < 1/2$). Firstly, when α = 1, it is obvious that

Then, for α = 2, 3, …, we have

Therefore,

Then, $\mathrm{Var}(X)$ can be obtained by using the abovementioned expressions.
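For a numerical cross-check of the mean and variance, one can use the fact that $Y$ and $U$ are independent, so $E(X^k) = E(Y^k)\,E(U^{-k/q}) = \frac{q}{q-k}E(Y^k)$ whenever $q > k$. The sketch below computes $E(Y^k)$ by quadrature rather than by the closed-form expressions of this section. The function names are illustrative, and the integration range $[-12, 12]$ and the restriction to $\alpha \ge 1$ are assumptions made to keep the example simple.

```python
import math
from statistics import NormalDist

STD = NormalDist()  # standard normal: STD.pdf is phi, STD.cdf is Phi


def simpson(g, a, b, n=2000):
    """Composite Simpson rule on [a, b]; n must be even."""
    h = (b - a) / n
    total = g(a) + g(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(a + i * h)
    return total * h / 3


def pn_raw_moment(k, alpha):
    """E(Y^k) for Y ~ PN(alpha), alpha >= 1, integrating over [-12, 12]."""
    return simpson(
        lambda y: y**k * alpha * STD.pdf(y) * STD.cdf(y) ** (alpha - 1.0),
        -12.0, 12.0,
    )


def spn_raw_moment(k, alpha, q):
    """E(X^k) = q / (q - k) * E(Y^k); the moment exists only for q > k."""
    if q <= k:
        raise ValueError("the k-th moment exists only for q > k")
    return q / (q - k) * pn_raw_moment(k, alpha)


def spn_mean_var(alpha, q):
    """Mean and variance of SPN(alpha, q); requires q > 2."""
    m1 = spn_raw_moment(1, alpha, q)
    m2 = spn_raw_moment(2, alpha, q)
    return m1, m2 - m1 * m1


if __name__ == "__main__":
    print(spn_mean_var(1.0, 5.0))  # slash normal case: mean 0, variance q / (q - 2)
    print(spn_mean_var(2.0, 5.0))
```

For $\alpha = 1$ the result should agree with the slash normal values, mean $0$ and variance $q/(q-2)$.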

From the definition of $X$, the general moments are given by

$$E(X^k) = E(Y^k)\, E\!\left(U^{-k/q}\right) = \frac{q}{q-k}\, E(Y^k), \quad q > k.$$

For special cases, when is a positive integer, we havewhere the p.d.f. of V is given by

And is the p.d.f. of the generalized skew normal distribution (see Gupta and Gupta [24]). Then,

3. Maximum Likelihood Estimation

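In this section, we consider the location-scale form of the SPN distribution and introduce an additional location parameter $\mu$ and a scale parameter $\sigma$ to the density function. In this form, for $\mu = 0$ and $\sigma = 1$, the $\mathrm{SPN}(\alpha, q)$ distribution is obtained. The density function of the general form is given by

$$f(x; \mu, \sigma, \alpha, q) = \frac{\alpha q}{\sigma} \int_0^1 u^{q}\, \phi\!\left(\frac{x-\mu}{\sigma}\, u\right) \left\{\Phi\!\left(\frac{x-\mu}{\sigma}\, u\right)\right\}^{\alpha-1} du, \qquad (30)$$

where $-\infty < \mu < \infty$, $\sigma > 0$, $\alpha > 0$, and $q > 0$.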

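Suppose $x_1, \ldots, x_n$ is a random sample of size $n$ from (30). Then, the log-likelihood function is expressed as

$$\ell(\mu, \sigma, \alpha, q) = n \log \alpha + n \log q - n \log \sigma + \sum_{i=1}^{n} \log \int_0^1 u^{q}\, \phi(z_i u)\, \{\Phi(z_i u)\}^{\alpha-1}\, du, \quad z_i = \frac{x_i - \mu}{\sigma}.$$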

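The maximum likelihood estimates (MLE) of the parameters maximize this log-likelihood function. Taking the partial derivatives of the log-likelihood function with respect to $\mu$, $\sigma$, $\alpha$, and $q$, respectively, and setting the resulting expressions to zero yields the likelihood equations $\partial \ell / \partial \mu = 0$, $\partial \ell / \partial \sigma = 0$, $\partial \ell / \partial \alpha = 0$, and $\partial \ell / \partial q = 0$.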

It should be noted that the abovementioned likelihood equations do not have a simple analytical form. Therefore, the estimates must be obtained via numerical procedures such as the Newton–Raphson method or the L-BFGS-B method. Software packages such as MATLAB and R provide routines for solving such nonlinear optimization problems. In this paper, we use the nonlinear optimization routine optim in R with the L-BFGS-B method to compute the estimates.
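To make the objective concrete, here is a minimal sketch of the negative log-likelihood for the location-scale SPN model, assuming the density $\frac{\alpha q}{\sigma}\int_0^1 u^q \phi(zu)\{\Phi(zu)\}^{\alpha-1}du$ with $z = (x-\mu)/\sigma$ and a fixed-grid Simpson rule for the integral. It is not the authors' R code; the function names and the small floor applied to $\Phi$ (to avoid raising zero to a negative power when $\alpha < 1$) are assumptions of this example.

```python
import math
from statistics import NormalDist

STD = NormalDist()  # standard normal: STD.pdf is phi, STD.cdf is Phi


def simpson(g, a, b, n=200):
    """Composite Simpson rule on [a, b]; n must be even."""
    h = (b - a) / n
    total = g(a) + g(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(a + i * h)
    return total * h / 3


def spn_logpdf(x, mu, sigma, alpha, q):
    """Log of the location-scale SPN density at x."""
    z = (x - mu) / sigma

    def integrand(u):
        big_phi = max(STD.cdf(z * u), 1e-300)  # floor is an assumption, see lead-in
        return u**q * STD.pdf(z * u) * big_phi ** (alpha - 1.0)

    return math.log(alpha * q / sigma) + math.log(simpson(integrand, 0.0, 1.0))


def neg_log_likelihood(theta, data):
    """Negative log-likelihood; theta = (mu, sigma, alpha, q)."""
    mu, sigma, alpha, q = theta
    if sigma <= 0 or alpha <= 0 or q <= 0:
        return math.inf  # outside the parameter space
    return -sum(spn_logpdf(x, mu, sigma, alpha, q) for x in data)


if __name__ == "__main__":
    toy = [0.3, 1.2, -0.4, 2.1, 0.8]  # made-up numbers, for illustration only
    print(neg_log_likelihood((0.0, 1.0, 2.0, 3.0), toy))
```

A function of this shape can be handed to a bound-constrained quasi-Newton routine, for example `scipy.optimize.minimize` with `method="L-BFGS-B"` and positive lower bounds on $\sigma$, $\alpha$, and $q$, which plays the same role as `optim` in R.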

4. Simulation Study

In this section, we employ a simulation experiment to illustrate the behavior of the maximum likelihood estimators of the parameters. It is well known that maximum likelihood estimators are asymptotically normal and asymptotically unbiased under some regularity conditions. However, we cannot derive the Fisher information matrix in closed form because the log-likelihood function has a complicated form. It is thus difficult to obtain the theoretical properties of the maximum likelihood estimators. Hence, we study the properties of the estimators only numerically.

We first generate 1000 samples of sizes 20 and 100 from the SPN distribution with fixed parameters. The estimates are computed by the optim function using the L-BFGS-B method in R. The empirical means and standard deviations of the estimates are given in Table 1.

As can be observed from Table 1, the estimates approach the true values as the sample size increases, which suggests that the estimators are consistent.

5. Real-Life Data Analysis

In this section, we demonstrate the applicability of the slash power normal distribution to real-life data. The dataset studied concerns air pollution in Hong Kong, a high-density city in China. The data contain the hourly measurements of sulphur dioxide (SO2) concentrations at a station in Central, Hong Kong, from January to December 2015. Table 2 summarizes the basic descriptive statistics of the dataset. Figure 5 reveals the positive skewness intrinsic in the data, so symmetric distributions are not appropriate for describing such data.

We fit the dataset with the SPN, PN, and skew normal distributions and use the Kolmogorov–Smirnov (K-S) test to evaluate the model fits. The K-S test quantifies a distance between the empirical distribution function of the sample and the cumulative distribution function of the reference distribution. The K-S test statistic is defined as

$$D = \sup_x \left| F_n(x) - F(x) \right|,$$

where $F_n$ is the empirical distribution function and $F$ is the theoretical cumulative distribution function of the distribution being tested. The results are presented in Table 3, where the initial values of each parameter are
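The K-S distance $D = \sup_x |F_n(x) - F(x)|$ can be computed exactly from the sorted sample, because the supremum is attained just before or at one of the jump points of the empirical distribution function. The following sketch takes any fitted c.d.f. as a callable; the function name `ks_statistic` and the toy numbers are illustrative.

```python
def ks_statistic(data, cdf):
    """Return sup_x |F_n(x) - F(x)| for a sample and a continuous c.d.f."""
    xs = sorted(data)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        fx = cdf(x)
        # F_n jumps from (i - 1) / n to i / n at the i-th order statistic,
        # so both one-sided gaps have to be checked.
        d = max(d, i / n - fx, fx - (i - 1) / n)
    return d


if __name__ == "__main__":
    # Toy check against the U(0, 1) c.d.f., F(x) = x on [0, 1].
    print(ks_statistic([0.7, 0.1, 0.4], lambda x: x))
```

In the toy check the largest gap occurs at the last observation, where $F_n = 1$ and $F = 0.7$, so the statistic is 0.3 up to floating-point rounding.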

Table 3 shows the maximum likelihood estimates together with the standard errors for the different distributions. The statistic D indicates that the proposed SPN model gives the best fit, which is consistent with Figure 6. The Akaike information criterion (AIC), Bayesian information criterion (BIC), and corrected Akaike information criterion (CAIC) are also used to measure the goodness of fit of the models: $\mathrm{AIC} = 2k - 2\ln L$, $\mathrm{BIC} = k \ln n - 2\ln L$, and $\mathrm{CAIC} = \mathrm{AIC} + 2k(k+1)/(n-k-1)$, where n is the sample size, k is the number of parameters in the model, and L is the maximized value of the likelihood function for the estimated model. The best model is the one with the smallest AIC (or BIC or CAIC).
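Given a maximized log-likelihood, the three criteria are simple arithmetic. The sketch below uses the standard definitions $\mathrm{AIC} = 2k - 2\ln L$ and $\mathrm{BIC} = k\ln n - 2\ln L$. For the corrected criterion it assumes the usual small-sample correction $\mathrm{AIC} + 2k(k+1)/(n-k-1)$, which may differ from the exact form used for Table 3. The function name and the input numbers are made up for illustration.

```python
import math


def information_criteria(loglik, k, n):
    """Return AIC, BIC and the corrected AIC for a fitted model.

    loglik: maximized log-likelihood, k: number of parameters, n: sample size.
    """
    aic = 2 * k - 2 * loglik
    bic = k * math.log(n) - 2 * loglik
    # Small-sample correction; assumed form, see the paragraph above.
    caic = aic + 2 * k * (k + 1) / (n - k - 1)
    return {"AIC": aic, "BIC": bic, "CAIC": caic}


if __name__ == "__main__":
    # Hypothetical values: a 4-parameter model fitted to 50 observations.
    print(information_criteria(-100.0, 4, 50))
```

Smaller values indicate a better trade-off between fit and model complexity, so models fitted to the same data are ranked by comparing these numbers directly.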

From Table 3, the AIC, BIC, and CAIC all indicate that the proposed SPN model gives the best fit. We can also observe that the SPN model successfully captures the kurtosis and skewness of the SO2 concentration data of Central, Hong Kong (see Figure 5). This shows that the SPN model provides a good vehicle for dealing with skewed and heavy-tailed data.

6. Conclusion

In this paper, we have proposed and analyzed a new slash version of the power normal distribution. It is defined as the distribution of the ratio of two independent random variables, namely, a power normal random variable and a power of a uniform random variable. The slash normal, skew-slash normal, and related distributions turn out to be special cases of the proposed slash power normal distribution. We have studied its basic properties, including the variance and the general moments.

The SPN distribution is fitted to a real-life dataset by the maximum likelihood approach and compared with the power normal and skew normal distributions. The empirical results show that the proposed SPN model fits the dataset better and provides a more appropriate model for skewed and heavy-tailed data.

Data Availability

The data used to support the findings of this study have been deposited in the repository (https://www.epd.gov.hk/epd/sc_chi/environmentinhk/air/air_quality/air_quality.html).

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this study.

Acknowledgments

This research was funded by the earmarked grant CUHK 446213 of the Hong Kong Research Grant Council, the National Natural Science Foundation of China (Grant no. 11261044), and Scientific Research Program Funded by Shaanxi Provincial Education Department (Program no. 20JK0549).