Abstract

We introduce an alternative skew-slash distribution by using a scale mixture of the exponential power distribution. We derive the properties of this distribution and estimate its parameters by maximum likelihood and Bayesian methods. In a simulation study we compute the proposed estimators and their mean square errors, and we provide an example on real data to demonstrate the modeling strength of the new distribution.

1. Introduction

The Exponential Power (EP) distribution can be considered a general distribution for random errors. This distribution has the density function (1.1), where , , and . The normal distribution is obtained from this distribution when , whereas heavier (lighter) tailed distributions are produced when (). In particular, we obtain the double exponential distribution for and the uniform distribution for . This model and its extensions have been studied by [17] and others.
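As a concrete point of reference, the short R sketch below implements one common parametrization of the EP density, with kernel exp(-(|x - mu|/sigma)^q); the exact constants and the form of (1.1) in the paper may differ, so this is only an illustrative assumption.

# Illustrative sketch of an exponential power (EP) density in R.
# Assumed parametrization: f(x) = q / (2*sigma*gamma(1/q)) * exp(-(|x - mu|/sigma)^q);
# the exact form of (1.1) may use different constants.
dep <- function(x, mu = 0, sigma = 1, q = 2) {
  z <- abs(x - mu) / sigma
  q / (2 * sigma * gamma(1 / q)) * exp(-z^q)
}
# Limiting cases mentioned in the text: q = 1 gives a double exponential (Laplace) shape,
# q = 2 a normal shape, and large q approaches the uniform on (mu - sigma, mu + sigma).
curve(dep(x, q = 1), from = -4, to = 4, ylab = "density")
curve(dep(x, q = 2), add = TRUE, lty = 2)
curve(dep(x, q = 10), add = TRUE, lty = 3)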

Hill and Dixon [8] have given evidence that, in real applications, the distribution of the data is often skewed, while virtually all robust methods assume symmetry of the error distribution. Moreover, the distribution of real data is seldom as heavily tailed as the ones employed in theoretical robustness studies. To handle both skewness and heavy tails simultaneously, Azzalini [9] proposed the skew exponential power distribution, with probability density function (pdf) , where , , and ; here is the cumulative distribution function of the standard normal distribution and () is the density function of the distribution with in (1.1). This distribution reduces to the distribution when , to the skew-normal distribution (introduced by [10]) when , and to the normal distribution when .
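For intuition only, a skewed density can be built from any symmetric density by Azzalini's perturbation argument: if f is symmetric about mu and Phi is the standard normal cdf, then 2 f(x) Phi(lambda (x - mu)/sigma) is again a valid density. The sketch below applies this construction to the illustrative EP density dep() above; the exact skewing mechanism used in the paper's pdf may differ.

# Illustrative Azzalini-type skewing of the EP density dep() defined above.
# This is an assumed form and need not coincide with the pdf proposed in the paper.
dsep <- function(x, mu = 0, sigma = 1, lambda = 0, q = 2) {
  2 * dep(x, mu, sigma, q) * pnorm(lambda * (x - mu) / sigma)
}
# lambda = 0 recovers the symmetric EP density; larger |lambda| gives more skewness.
curve(dsep(x, lambda = 3, q = 1.5), from = -4, to = 4, ylab = "density")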

Another type of skew exponential power distribution, proposed by Ferreira et al. [11] and denoted by , has the pdf , which reduces to the skew-normal distribution when . They provide an EM-type algorithm to estimate the parameters of this distribution. Since the names of the two distributions are the same, we use for the second one.

In this paper, we introduce a location-scale mixture exponential power (LSMEP) distribution. This distribution provides a useful asymmetric and heavy-tailed extension of its symmetric counterpart for robust statistical modeling of data sets involving distributions with heavy tails and skewness.

To this end, in Section 2 the location-scale mixture exponential power distribution is introduced, some of its properties are given, and maximum likelihood and Bayesian methods are constructed to estimate its parameters. In Section 3, in order to investigate the performance of the proposed methods, we present some simulation studies and a real data application.

2. Location-Scale Mixture Exponential Power Distribution

In this section we introduce the location-scale mixture exponential power distribution and derive some of its distributional properties. We estimate its parameters by maximum likelihood and Bayesian methods.

Definition 2.1. A random variable has the LSMEP distribution with location parameter , scale parameter , skewness parameter , and shape parameter , denoted by , if it admits the representation (2.1), where , , , , and is independent of .

Using (1.1) and the independence of and , the pdf of the random variable in (2.1) can easily be shown to be (2.2), where .

We draw the density curves of the , , and distributions in Figure 1. We see that the distribution is more skewed and heavier tailed than the other distributions.

We draw the density curves of for different values of and in Figures 2(a) and 2(b), respectively. We can see that as gets larger the curves become heavier tailed, and as gets larger the curves become more skewed. We also draw the density curve of for in Figure 2(c); we see that as gets larger, the curve exhibits higher kurtosis.

2.1. Properties of the Distribution

Some properties of the LSMEP distribution are given in the following theorems.

Theorem 2.2. If and , , then .

Theorem 2.3. If and , then .

Theorem 2.4. If and , then .

The proofs of Theorems 2.2, 2.3, and 2.4 follow easily from (1.1) and (2.2). These theorems allow us to generate LSMEP deviates.
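The exact stochastic representation (2.1) is not reproduced above, so the sketch below only shows how to simulate the EP building block under the illustrative parametrization of dep() from Section 1: if T ~ Gamma(1/q, 1), then sigma * T^(1/q) with a random sign has that EP density. Deviates from (2.1) would then be obtained by combining such draws according to Definition 2.1.

# Sketch: simulate EP(0, sigma, q) deviates under the parametrization of dep() above.
# If T ~ Gamma(shape = 1/q, rate = 1), then S * sigma * T^(1/q), with S a random sign,
# has density q / (2*sigma*gamma(1/q)) * exp(-(|x|/sigma)^q).
rep_ep <- function(n, sigma = 1, q = 2) {
  t <- rgamma(n, shape = 1 / q, rate = 1)
  s <- sample(c(-1, 1), n, replace = TRUE)
  s * sigma * t^(1 / q)
}
set.seed(1)
x <- rep_ep(5000, sigma = 1, q = 1.5)
mean(abs(x)^2)                    # empirical second absolute moment
gamma(3 / 1.5) / gamma(1 / 1.5)   # theoretical value sigma^2 * Gamma(3/q) / Gamma(1/q)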

Theorem 2.5. The nth moment of the standardized LSMEP distribution (i.e., when ) is

Proof. Let and be independent of . Then , and the th moment of is calculated in [6]. The result then follows by a simple calculation.
From Theorem 2.5 we have . Also, the population skewness and kurtosis are easily derived.
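For completeness, under the illustrative EP parametrization used in the sketches above (which may differ from the one in (1.1)), the absolute moments of the EP component referenced in the proof take the form

\[
E\lvert X\rvert^{n} \;=\; \sigma^{n}\,\frac{\Gamma\bigl((n+1)/q\bigr)}{\Gamma\bigl(1/q\bigr)}, \qquad n = 1, 2, \ldots,
\]

and the moments of the standardized mixture would then follow by combining this expression with the representation (2.1).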

2.2. Maximum Likelihood Estimation

Let be a random sample from with observations . We want to find the ML estimates of the parameters of this distribution. From (2.2), the log-likelihood function is given by (2.5). Suppose and are known. Differentiating (2.5) with respect to , , and , and equating the results to zero, we obtain the system of equations (2.6), where . By choosing the initial values , , and , and iterating (2.6) until convergence, we can find the ML estimates.
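Since the estimating equations (2.6) are not reproduced here, the sketch below illustrates the alternative route the paper also uses elsewhere: direct numerical maximization of the log-likelihood with R's optim routine. The negative log-likelihood shown uses the illustrative EP density dep() and generator rep_ep() from Section 1 purely as stand-ins; a real LSMEP fit would replace the density with (2.2).

# Sketch: numerical ML via optim, with the illustrative EP density dep() standing in
# for the LSMEP density (2.2). Parameters: mu, log(sigma), log(q), so that sigma, q > 0.
negloglik <- function(par, x) {
  mu <- par[1]; sigma <- exp(par[2]); q <- exp(par[3])
  -sum(log(dep(x, mu, sigma, q)))
}
set.seed(2)
x <- rep_ep(200, sigma = 2, q = 1.5) + 1          # synthetic data with location 1
fit <- optim(c(0, 0, 0), negloglik, x = x, method = "BFGS", hessian = TRUE)
c(mu = fit$par[1], sigma = exp(fit$par[2]), q = exp(fit$par[3]))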

2.3. Bayesian Method

In this section, we implement the Bayesian methodology, using Markov chain Monte Carlo (MCMC) techniques, to estimate the parameters of the distribution. Let ; then the likelihood function of is given by

Now, to find the posterior distribution, we need to specify the prior distributions of the unknown parameters . We consider a normal prior on both and , a truncated normal (on a ) prior on both and , and a truncated normal (on a ) prior on , and assume, without loss of generality, independence of the parameters, that is, (2.9). The posterior distribution of given can then be obtained from (2.8) and (2.9) as in (2.10). Distribution (2.10) does not have a closed form. Hence, for inference, an algorithm such as the Metropolis-Hastings can be used to generate samples from the posterior distribution of the parameters. We present the following general sampling scheme.
(1) Set and choose starting values for the parameters .
(2) For :
(i) generate from ;
(ii) compute (we take , based on the symmetry of the proposal function);
(iii) generate from ; if , then ; else .
(3) Set and return to step (2) until convergence is achieved.
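A minimal random-walk Metropolis sketch of the scheme above follows. It targets a stand-in log-posterior built from the EP log-likelihood of the previous sketch plus normal priors, purely for illustration; the actual implementation would target the posterior (2.10) with the priors in (2.9).

# Sketch of the random-walk Metropolis-Hastings scheme described above, targeting a
# stand-in log-posterior (EP log-likelihood from the previous sketch plus normal priors).
log_post <- function(par, x) {
  -negloglik(par, x) + sum(dnorm(par, mean = 0, sd = 10, log = TRUE))
}
mh_sample <- function(x, n_iter = 2000, start = c(0, 0, 0), sd_prop = 0.1) {
  draws <- matrix(NA_real_, n_iter, length(start))
  cur <- start
  cur_lp <- log_post(cur, x)
  for (t in 1:n_iter) {
    prop <- rnorm(length(cur), mean = cur, sd = sd_prop)   # symmetric proposal
    prop_lp <- log_post(prop, x)
    if (log(runif(1)) < prop_lp - cur_lp) {                # accept with prob min(1, ratio)
      cur <- prop
      cur_lp <- prop_lp
    }
    draws[t, ] <- cur
  }
  draws
}
set.seed(3)
chain <- mh_sample(x)
colMeans(chain[-(1:500), ])   # posterior means after a burn-in of 500 iterations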

2.4. The Observed Information Matrix

In this section we evaluate the observed information matrix of the distribution, which is defined by

Under some regularity conditions, the covariance matrix of the maximum likelihood estimates can be approximated by the inverse of . The observed information matrix can be obtained as follows, where ; see [12, 13].

We now consider , which is partitioned into components corresponding to all the parameters in as , where . We define . After some algebraic calculations, we obtain
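In practice, when the fit is obtained numerically as in the optim sketch of Section 2.2 (called with hessian = TRUE), the returned Hessian of the negative log-likelihood is a numerical version of the observed information, and approximate standard errors follow directly:

# Approximate standard errors from the (numerical) observed information matrix.
# fit$hessian is the Hessian of the negative log-likelihood at the optimum.
obs_info <- fit$hessian
se <- sqrt(diag(solve(obs_info)))
se   # note: these are on the working scale (mu, log sigma, log q) of the stand-in fit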

In the next section we use the above techniques to estimate the parameters.

2.5. Sensitivity Analysis

In this section, we perform a sensitivity analysis to detect observations that, under small perturbations of the model, exert great influence on the maximum likelihood estimates. The best-known perturbation schemes are based on case deletion, in which the effect of completely removing cases from the analysis is studied through metrics such as the likelihood distance and Cook's distance (see [14]). In this paper, we use these classical measures, namely, the generalized Cook distance and the likelihood displacement.

Let be the estimate of without the th observation in the sample. To assess the influence of the th case on the estimate , the basic idea is to compare and . If deletion of a case seriously influences the estimates, more attention should be paid to that case. Hence, if is far from , then the th case is regarded as an influential observation. A first measure of global influence is defined as the standardized norm of the difference , namely, the generalized Cook distance , where is the observed information matrix (evaluated at ) presented in Section 2.4. Another measure of the difference between and is the likelihood distance
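As an illustration, the sketch below computes both diagnostics for the stand-in fit of Section 2.2, refitting the model with each observation deleted in turn; it uses the usual forms GD_i = (theta_hat_(i) - theta_hat)' J(theta_hat) (theta_hat_(i) - theta_hat) and LD_i = 2[l(theta_hat) - l(theta_hat_(i))], which may differ in normalizing constants from the expressions in the paper.

# Sketch of case-deletion diagnostics for the stand-in optim fit of Section 2.2.
theta_hat   <- fit$par
J           <- fit$hessian                     # observed information at theta_hat
loglik_full <- -negloglik(theta_hat, x)        # full-data log-likelihood at theta_hat
n  <- length(x)
GD <- LD <- numeric(n)
for (i in 1:n) {
  fit_i <- optim(theta_hat, negloglik, x = x[-i], method = "BFGS")   # delete case i
  d     <- fit_i$par - theta_hat
  GD[i] <- drop(t(d) %*% J %*% d)                         # generalized Cook distance
  LD[i] <- 2 * (loglik_full + negloglik(fit_i$par, x))    # likelihood displacement
}
plot(GD, type = "h", xlab = "index", ylab = "generalized Cook distance")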

In the next section we perform sensitivity analysis to illustrate the usefulness of the proposed methodology.

3. Applications

In this section, we present two types of application of the LSMEP distribution: two small simulation studies and a statistical analysis of a real data set.

3.1. Small Simulation Studies

We perform a small simulation study to investigate the estimators proposed in Section 2.2. We first generate 100 samples of different sizes from the distribution for fixed and parameters. We compute the estimates of , , and by the iterative method illustrated in Section 2.2, which we denote by , , and , . Then the mean and mean square error (MSE) of these values are reported as the estimates and their MSEs; for example, and . The estimates and the MSEs are given in Table 1. This table shows that as the sample size increases, the MSEs of the estimates , , and converge to zero. A similar result holds for the bias of these estimators.
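The Monte Carlo summaries reported in Table 1 have the following generic structure; the sketch again uses the EP stand-in generator rep_ep() and the optim-based fit from the earlier sketches rather than the LSMEP itself.

# Sketch of the Monte Carlo mean/MSE computation over repeated samples.
simulate_mse <- function(n, true = c(mu = 1, sigma = 2, q = 1.5), n_rep = 100) {
  est <- replicate(n_rep, {
    y <- rep_ep(n, sigma = true["sigma"], q = true["q"]) + true["mu"]
    f <- optim(c(0, 0, 0), negloglik, x = y, method = "BFGS")
    c(f$par[1], exp(f$par[2]), exp(f$par[3]))   # estimates on the original scale
  })
  list(mean = rowMeans(est),                    # Monte Carlo mean of each estimator
       mse  = rowMeans((est - true)^2))         # Monte Carlo mean square error
}
simulate_mse(n = 100)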

The second small simulation study investigates the performance of the Bayesian method proposed in Section 2.3. We generate 100 samples of size 100 from LSMEP(0,1,1.5,1,3), LSMEP(-1,2,3.5,-5,5.5), and LSMEP(2,3,4.5,3,7). We then compute the Bayes estimates of the parameters for each sample by the MCMC method and derive the final estimates and their MSEs in the same way as in the ML estimation. The estimated parameters and their MSEs are given in Table 2.

3.2. Real Data Application

We use the Australian athletes dataset analyzed in [15–19]. The dataset contains several variables measured on athletes; we consider the variable red cell count (rcc). These authors note skewness to the left as well as heavy-tail behavior. We fit a , an , and an distribution to this data set. In the first method, we use the optim routine in the R software to find the maximum likelihood estimates of the parameters of the and distributions, and the EM-type algorithm to estimate the parameters of the distribution. In the second method, we use the Bayesian estimates of the parameters. The results are shown in Tables 3 and 4. These tables contain the parameter estimates of the , , and distributions, together with their corresponding standard errors, computed via the information-based method presented in Section 2.4 for the LSMEP distribution. We used the Hessian matrix returned by the optim routine to compute the standard errors for the distribution, while those for the distribution were computed as in [11]. To compare the models, we also computed the AIC [20] and EDC [21] criteria. From these criteria and Figures 3 and 4, we see that the LSMEP distribution provides a better fit to this dataset than the other distributions.
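For reference, the AIC comparison can be read directly off each maximized log-likelihood as AIC = -2 * loglik + 2k, where k is the number of free parameters. With an optim fit of a negative log-likelihood, as in the sketches above, this is one line:

# AIC from an optim fit of a negative log-likelihood (smaller values indicate a better fit).
aic <- function(fit) 2 * length(fit$par) + 2 * fit$value   # fit$value is the minimized -loglik
aic(fit)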

3.3. Sensitivity Analysis

We use the results obtained from a simulated data set and a real data set to illustrate the advantages of the proposed methodology.

3.3.1. Simulated Data

We perform a simulation study to investigate the empirical performance of the methods proposed in Section 2.5. We generate 4 samples of size 100 from the distribution for fixed and parameters. We then introduce the following atypical points, where . Next we compute , the estimate of based on the sample , and , the estimate of without the th observation of . We compute the generalized Cook distance and the likelihood distance proposed in Section 2.5. Figure 5 depicts the index plots of and for case-weight perturbation. In Figures 5(a) and 5(b) we see that, for all the perturbation schemes considered, the atypical points 5, 30, and 70 were correctly picked up, indicating that the methodology works very well when suspicious points are present in the data set.

3.3.2. rcc Dataset

In this section, we use the real data set to find the points that are influential on the parameter estimates. Let be the estimate of from the data and let be the estimate of without the th observation; we then compute and as diagnostics for global influence. For the case-deletion diagnostics, the measures and , presented in Figures 6(a) and 6(b), respectively, indicate individuals 55, 161, 166, 174, and 181 as influential. Note that individuals 161 and 174 are very influential.