Likelihood Inference of Nonlinear Models Based on a Class of Flexible Skewed Distributions

Chen, Xuedong; Zeng, Qianying; Song, Qiankun

doi:https://doi.org/10.1155/2014/542985

Abstract and Applied Analysis


Research Article | Open Access

Volume 2014 | Article ID 542985 | https://doi.org/10.1155/2014/542985


Academic Editor: Jinde Cao

Received 28 Aug 2014

Accepted 28 Sept 2014

Published 03 Dec 2014

Abstract

This paper deals with likelihood inference for nonlinear models with a flexible skew-t-normal (FSTN) distribution, which is proposed within the general framework of flexible skew-symmetric (FSS) distributions by combining it with the skew-t-normal (STN) distribution. In comparison with common skewed distributions such as the skew normal (SN), the skew-t (ST), and scale mixtures of skew normal (SMSN), the FSTN distribution offers more flexibility and robustness in the presence of skewed, heavy-tailed, and especially multimodal outcomes. For this distribution, however, the usual approach to maximum likelihood estimation based on the EM algorithm is unavailable, and the alternative is to return to the original Newton-Raphson type method. To improve the estimation, as well as the confidence estimation and hypothesis testing for the parameters of interest, a modified Newton-Raphson iterative algorithm based on the profile likelihood is presented for nonlinear regression models with the FSTN distribution, and the corresponding confidence interval and hypothesis test are developed. Furthermore, a real example and a simulation are conducted to demonstrate the usefulness and superiority of our approach.

1. Introduction

Normality is the common distributional assumption for the random error in statistical modeling. This assumption may lack robustness against departures from normality and/or outliers and may lead to misleading inferential results [1, 2]. In recent years, there has been increasing interest in developing more flexible parametric families capable of fitting real data as closely as possible when the data exhibit substantial nonnormal characteristics such as skewness and heavy tails. In a variety of applications, one popular option is to modify a symmetric probability density function, thereby introducing skewness. An important advantage of this approach over other approaches to robustness is its explicit statement of the probabilistic setting, which leads to a clear interpretation of the results [3]. Following this idea, the skew-normal (SN) distribution was first introduced by [4], the skew-t (ST) distribution by [5], and the skew-t-normal (STN) distribution by [6]; extensions to the multivariate case were studied by [7, 8], among others. Since then, several authors have tried to extend these results to more general forms of skew-symmetric distributions. Among these we mention [9], which proposed a general framework called the flexible skew-symmetric (FSS) distribution. As pointed out by [10], this family is sufficiently flexible in that, with different choices of submodel settings, the FSS distribution includes several known distributions such as the SN and ST as special cases.

However, in many practical applications it is not rare to encounter multimodality, sometimes with an even more irregular shape, and in this case all the distributions mentioned above appear insufficient to describe the multimodal feature of the data. One solution to this problem is to use finite mixture models. In [11], the authors worked with a mixture model whose component densities belong to the STN distribution and developed a computationally feasible EM-type algorithm for calculating the maximum likelihood (ML) estimates of the parameters. Unfortunately, although that methodology is useful for analyzing multimodal asymmetric data, it suffers from the problem of “model identification” because the number of parameters to be estimated is usually large. Therefore, in this paper, we deal with a new extension of the class of FSS distributions, referred to as the flexible skew-t-normal (FSTN) distribution. This new distribution is proposed within the general framework of the FSS distributions in combination with the definition of the STN distribution. In practical applications, it is able to regulate the density in a more flexible way to offer robustness, and it is an appealing option for accommodating data with skewness, heavy tails, and multimodality jointly.

On the other hand, nonlinear regression models are widely applied in economics, engineering, biomedical research, and other fields, where a nonlinear function of unknown parameters is used to explain or investigate the nonlinear relationship of the random phenomena under study. More recently, several authors have used a class of skewed distributions in the context of nonlinear regression models, and some valuable results were obtained. For example, [12] developed robust estimation and local influence analysis for regression models with SMSN distributions. From a Bayesian point of view, [13] considered Bayesian estimation and case influence diagnostics for nonlinear regression models with SMSN distributions. More related literature can be found in [14–17]. Generally speaking, for fitting nonlinear regression models with skewed distributions, a popular approach is to consider a hierarchical representation of the variables, in which the postulated distribution is expressed as several conditional distributions of simpler forms such as the normal, Student’s t, and Gamma. Based on that, the EM algorithm or a Bayesian hierarchical approach can then be implemented effectively for model estimation and statistical inference.

In this paper, our aim is to develop an approach to likelihood inference for nonlinear regression models under the FSTN assumption. As there is no stochastic representation for the FSTN distribution, all the methods cited above become unavailable for our problem, and the alternative is to return to the original Newton-Raphson iterative procedure for model estimation. Under the nonlinear regression paradigm, the accuracy of the estimates is affected by the strength of nonlinearity, and the corresponding confidence interval and hypothesis test require the assumption of normality of the estimators or of the distribution, which is too restrictive. Moreover, in many practical applications we are usually interested in a proper subset of the parameters rather than in all of them. Taking all these factors into account, in this paper we focus on the parameters of interest and propose a modified Newton-Raphson iterative algorithm for calculating the ML estimates based on the profile likelihood. Furthermore, the confidence interval and hypothesis test for the parameters of interest are also considered. We conduct an application and a simulation study to compare the effectiveness of the algorithms and the robustness of the distributions for nonlinear regression models in terms of fitting performance and model selection. The results from the numerical examples illustrate the usefulness and superiority of our methodology.

The remainder of this paper is organized as follows. In Section 2, we briefly discuss the FSS and FSTN distributions. In Section 3, we present the likelihood inference, including the first- and second-order derivatives as well as the standard Newton-Raphson iterative formula. In Section 4, we introduce profile inference for our proposed model, where the confidence estimation and hypothesis test are also presented. Section 5 gives numerical examples using both simulated and real data to illustrate the performance of the proposed methodology. Finally, some concluding remarks are given in Section 6.

2. Models and Notation

The class of skewed distributions such as SN, ST, and STN is plausible for modeling skewness and/or heavy tails underlying the observations. In practice, however, it is not rare to encounter multimodality, sometimes with an even more irregular shape, and in this case the aforementioned distributions become insufficient. In this paper, adopting a sufficiently flexible class of distributions, we consider one of these extensions, the family of flexible skew-symmetric (FSS) distributions introduced by [9], with the following density function: where and are symmetric univariate density and distribution function, respectively, that is, , , and is an odd polynomial of degree (i.e., a polynomial including only terms of odd degree), , is the location parameter, is the scale parameter, and and are shape parameters.
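As a reference point, the flexible skew-symmetric density of [9] is usually written in the location-scale form sketched below. The symbol names ($f_0$, $G_0$, $P_K$, $\xi$, $\sigma$, $\alpha_j$) are assumptions chosen for illustration and may differ from the notation used in (1).

```latex
f(y \mid \xi, \sigma, \boldsymbol{\alpha})
  = \frac{2}{\sigma}\,
    f_0\!\left(\frac{y-\xi}{\sigma}\right)
    G_0\!\left(P_K\!\left(\frac{y-\xi}{\sigma}\right)\right),
\qquad
P_K(z) = \sum_{j=1}^{(K+1)/2} \alpha_{2j-1}\, z^{2j-1},
\quad K \text{ odd}.
```

Because $f_0$ is symmetric, $G_0$ is the distribution function of a symmetric density, and $P_K$ is odd, the factor $2\,G_0(P_K(z))$ reweights $f_0$ without changing its total mass, so the expression integrates to one for any choice of the coefficients.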

In general, the density function of the STN distribution can be represented as , where and , respectively, denote the univariate standard Student’s t density function and the univariate standard normal distribution function and is the degrees of freedom. The skewness is regulated by the shape parameter and the tail thickness of the distribution is controlled by . In comparison with the SN distribution, the STN distribution exhibits markedly heavier tails when .

In this paper, we work with one version of (1) and the specific definition can be presented as follows. Let and , where is defined as before and that is, the density function of univariate Student’s distribution with 0 location, scale, and degrees of freedom. The above extension of (1) is referred to as flexible skew-t-normal (FSTN) distribution, denoted by .
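A minimal numerical sketch of such a density may help fix ideas. It assumes a Student's t base density, a standard normal distribution function as the skewing function, and a cubic odd polynomial `alpha * z + beta * z**3`; the function names, the parameter names, and the cubic degree are illustrative assumptions and are not taken from the definition above.

```python
import math


def student_t_pdf(z, nu):
    """Standard Student's t density with nu degrees of freedom."""
    log_c = (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
             - 0.5 * math.log(nu * math.pi))
    return math.exp(log_c - (nu + 1) / 2 * math.log1p(z * z / nu))


def normal_cdf(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def fstn_pdf(y, xi=0.0, sigma=1.0, alpha=0.0, beta=0.0, nu=4.0):
    """Assumed FSTN-type density: (2/sigma) * t_nu(z) * Phi(alpha*z + beta*z^3).

    xi is the location, sigma the scale, alpha and beta the (hypothetical)
    coefficients of the odd polynomial, nu the degrees of freedom.
    """
    z = (y - xi) / sigma
    return 2.0 / sigma * student_t_pdf(z, nu) * normal_cdf(alpha * z + beta * z ** 3)
```

With `alpha = beta = 0` the skewing factor equals 1/2 everywhere and the function reduces to the symmetric Student's t density, which gives a quick sanity check of the construction.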

It is noted that the FSTN distribution is proposed within the general framework of the FSS distribution by combining it with the definition of the STN distribution and, as a consequence, it shares analogous features with these two distributions. Nevertheless, the FSTN distribution presents some interesting and peculiar features and is able to regulate the density in a more flexible way. In particular, besides modeling skewness and tail thickness, the FSTN distribution allows for multimodality, depending on the specific setting of . For the purpose of comparison and illustration, we assume and in FSS distribution, which is denoted by FSN, and, then, we set in , that is, in STN distribution, and, for this case, different selections of and determine whether the density is unimodal or bimodal. Moreover, the same assumption for and in FSTN distribution is made.

Figure 1 displays the density functions of FSN and STN as well as FSTN distributions with four different situations considered, namely, , , and ; and ; , , and ; , , and , respectively, with , for all cases. By examining Figure 1, we can see how these three densities change with different combinations of and . For instance, in Figure 1(a), FSN, STN, and FSTN appear to be very close, while STN and FSTN are slightly heavier tailed. In Figure 1(b), both FSN and FSTN are unimodal when and have the same sign, and the ranking for the degree of skewness is STN, FSTN, and FSN in turn; that is, the STN and FSTN distributions have thicker tails compared to the FSN distribution. With opposite signs of and , both the FSN and FSTN distributions are bimodal and highly skewed in Figures 1(c) and 1(d); moreover, in the same direction, FSTN has thicker tails than the FSN distribution. Our proposed FSTN distribution can be treated as a compromise between the FSS distribution and the STN distribution. It allows for a wider range of tail behavior than the FSS distribution, while it is able to accommodate multimodality, which cannot be described by the STN distribution. From the applied viewpoint, the FSTN distribution is an appealing option which can be expected to yield robust inferential results in the presence of outlying observations.

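Figure 1: Density functions of the FSN, STN, and FSTN distributions for the four parameter settings, panels (a) to (d).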

3. Likelihood Inference

Consider independent observations satisfying a nonlinear regression model as with for . Here, is a -dimensional vector and is a vector of parameters. Also, let be the design matrix; is a known twice differentiable function. Then, the corresponding log-likelihood for parameter is given by And we have where and . The corresponding second-order derivatives of (3) can be shown as where , , and .

Assume is the gradient or score vector and is the Hessian matrix composed of the above second-order derivatives. To obtain the ML estimate of , the Newton-Raphson iteration algorithm is defined by It is noted that the above iterative procedure is an unpartitioned algorithm; that is, all the parameters including nonlinear regression coefficients , scale parameter , and shape parameter as well as tail thickness parameter are estimated simultaneously. For our problem, at least two difficulties may be encountered with (6). First, once the number of parameters to be estimated becomes large, the computational burden becomes heavy and the estimation error unacceptable. Second, when the strength of nonlinearity of the link function changes, the iterative process may become unstable or even fail to converge, leading to poor estimation results. In practical problems, moreover, we are usually interested in a proper subset of the parameters rather than in the total parameter set. To improve the efficiency of the algorithm and to facilitate statistical inference for nonlinear models with the FSTN distribution, we put forward the following profile likelihood method based on (3) and (6).
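The generic Newton-Raphson update, new value = old value minus the inverse Hessian times the score, can be illustrated on a deliberately simple one-parameter likelihood. The sketch below is not the FSTN likelihood: it assumes exponentially distributed toy data with rate `lam`, and the function names, the toy data, and the tolerance are hypothetical choices made for illustration.

```python
import math


def newton_raphson(loglik, score, hessian, theta0, tol=1e-10, max_iter=100):
    """Scalar Newton-Raphson: theta <- theta - score(theta) / hessian(theta).

    Iteration stops when two successive log-likelihood values differ by
    less than tol (an assumed stopping rule for this sketch).
    """
    theta = theta0
    ll_old = loglik(theta)
    for _ in range(max_iter):
        theta_new = theta - score(theta) / hessian(theta)
        ll_new = loglik(theta_new)
        converged = abs(ll_new - ll_old) < tol
        theta, ll_old = theta_new, ll_new
        if converged:
            break
    return theta


# Toy exponential model: l(lam) = n*log(lam) - lam*sum(x), MLE = n / sum(x).
data = [1.0, 2.0, 3.0]
n, s = len(data), sum(data)

lam_hat = newton_raphson(
    loglik=lambda lam: n * math.log(lam) - lam * s,
    score=lambda lam: n / lam - s,
    hessian=lambda lam: -n / lam ** 2,
    theta0=0.3,
)
```

In the multi-parameter case the division by the Hessian becomes a linear solve, which is where the cost and instability discussed above enter.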

4. Profile Likelihood Inference

4.1. Profile Estimation Algorithm

Let be a partition of , where is a parameter vector of nonlinear regression, an interest parameter, and is a -dimension nuisance parameter with . Similarly, the partition of and is given as , where , , , and , and the diagonal elements of are given by , respectively, and the off-diagonal elements in can be obtained similarly. Let , corresponding to the partition of .

In what follows, we focus on the estimation of based on the profile likelihood method [18]. First, suppose is known and rewrite the original likelihood function (3) as where the notation denotes that is fixed but varies. For each , to estimate we can obtain Alternatively, to estimate , we evaluate the maximum value of over and have where and are referred to as the profile likelihood function and the profile ML estimate, respectively.
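The two-step idea behind the profile likelihood (maximize over the nuisance parameter for each fixed value of the interest parameter, then maximize the resulting function) can be sketched on a normal sample, where the mean plays the role of the interest parameter and the variance is the nuisance parameter. This toy model, the data, and the grid search are assumptions made for illustration and stand in for the FSTN regression model.

```python
import math


def profile_loglik(mu, data):
    """Profile log-likelihood of the mean for a normal sample.

    For fixed mu the variance maximizer is the mean squared deviation,
    which is plugged back into the full log-likelihood.
    """
    n = len(data)
    sigma2_hat = sum((x - mu) ** 2 for x in data) / n  # nuisance estimate given mu
    return -0.5 * n * (math.log(2.0 * math.pi * sigma2_hat) + 1.0)


data = [4.1, 5.3, 4.8, 6.0, 5.2]            # hypothetical observations
grid = [4.0 + 0.001 * i for i in range(2001)]
mu_hat = max(grid, key=lambda m: profile_loglik(m, data))
```

In this toy case the profile maximizer coincides with the sample mean, so the grid search can be checked against a closed form; in the FSTN setting the inner maximization has no closed form and is carried out iteratively.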

Following [19], we define the profile Newton-Raphson iteration formula as follows: where , , and all the matrices and the vectors on the right hand side of (10) are evaluated at and .

Note that, in both (6) and (10), the strength of nonlinearity of the link function is reflected to a large extent by . By examining the expression of , we find that when the element of is much less than the corresponding element of , the following approximation can be obtained: The iteration formulas of , , , and can then be obtained just as before. The above estimation procedure is referred to as the profile modified Newton-Raphson iteration algorithm.

The algorithm stops when some distance between two successive evaluations of the profile likelihood , such as or , is small enough; for example, is adopted in this paper.

The asymptotic covariance matrix of the profile ML estimates can be evaluated by inverting the expected information matrix. Since this matrix does not have a closed-form expression, the observed information matrix can be used instead, which is estimated by , where , , and can be obtained similarly as above.

The choice of the initial values plays an important role in nonlinear regression fitting; in this paper, the starting values are chosen as follows:
(i) compute the initial value based on the nonlinear regression model with standard normal assumption;
(ii) with fixed, compute the initial values and for the SN and finite mixture SN assumptions, respectively.

In order to simplify the estimation of parameter , we consider fixed integer values for from 2 to 40 in steps of one and choose the value of that maximizes the profile likelihood as ; the initial values of , , , that are required in the estimation procedure are then all obtained.
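The grid search over integer degrees of freedom can be sketched as follows. As a stand-in for the FSTN profile likelihood, the sketch uses a plain Student's t log-likelihood of residuals with a fixed scale; this substitution, the function names, and the toy residuals are assumptions made for illustration only.

```python
import math


def t_loglik(residuals, sigma, nu):
    """Log-likelihood of residuals under a scaled Student's t with nu d.f."""
    log_c = (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
             - 0.5 * math.log(nu * math.pi))
    return sum(
        log_c - math.log(sigma) - (nu + 1) / 2 * math.log1p((r / sigma) ** 2 / nu)
        for r in residuals
    )


def select_nu(residuals, sigma, candidates=range(2, 41)):
    """Return the integer nu in 2..40 with the largest log-likelihood."""
    return max(candidates, key=lambda nu: t_loglik(residuals, sigma, nu))


# Hypothetical residuals containing one outlying value.
residuals = [-0.4, 0.1, 0.3, -0.2, 0.5, -0.1, 6.0]
nu_hat = select_nu(residuals, sigma=1.0)
```

Restricting the search to 39 integer candidates avoids differentiating the likelihood with respect to the degrees of freedom, at the cost of a coarser estimate.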

4.2. Profile Confidence Estimation and Hypothesis Test

Confidence intervals and hypothesis tests play an important role in statistical inference; in what follows, we consider profile confidence estimation and hypothesis testing for the parameters of interest in nonlinear regression models. Suppose ; the following regularity conditions for likelihood inference are assumed: (R-i) ; (R-ii) ; (R-iii) , where is an arbitrary measurable function; (R-iv) .

Apart from the above assumptions, in this paper, some additional conditions are assumed to hold: (A-i) is twice continuously differentiable with respect to ; (A-ii) is full rank for all ; (A-iii) assume is the true value of ; then, there exist a neighbourhood of and a constant such that for any .

Under the above assumptions, following [20], we have where is the true value of and the convergence is in the sense of convergence in probability. Based on profile likelihood theory, the confidence region of with the confidence level is given by where denotes the upper percentile of the chi-square distribution with degrees of freedom. Furthermore, the hypothesis test can also be considered and the corresponding test statistic is given by . Unlike the standard likelihood method, profile likelihood confidence intervals and hypothesis tests do not need the assumption of normality of the estimator or of the distribution, which is too restrictive; they are based on the asymptotic chi-square distribution of the log profile likelihood ratio statistic, which makes them convenient and feasible in practical computation.
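A likelihood-ratio confidence region of this kind collects all values of the interest parameter whose profile log-likelihood lies within half a chi-square quantile of its maximum. The sketch below inverts that criterion on a grid for the mean of a normal sample, a toy stand-in for the FSTN regression model; the data, the grid bounds, and the function names are hypothetical.

```python
import math

CHI2_1_95 = 3.841458820694124  # 95% quantile of chi-square with 1 d.f.


def profile_loglik(mu, data):
    """Profile log-likelihood of the mean for a normal sample."""
    n = len(data)
    sigma2_hat = sum((x - mu) ** 2 for x in data) / n
    return -0.5 * n * (math.log(2.0 * math.pi * sigma2_hat) + 1.0)


def profile_interval(data, lower, upper, steps=20000):
    """Grid inversion of 2 * [l_p(mu_hat) - l_p(mu)] <= chi-square quantile."""
    mu_hat = sum(data) / len(data)
    l_max = profile_loglik(mu_hat, data)
    width = (upper - lower) / steps
    inside = []
    for i in range(steps + 1):
        mu = lower + i * width
        if 2.0 * (l_max - profile_loglik(mu, data)) <= CHI2_1_95:
            inside.append(mu)
    return inside[0], inside[-1]


data = [4.1, 5.3, 4.8, 6.0, 5.2]  # hypothetical observations
ci_low, ci_high = profile_interval(data, lower=3.0, upper=7.0)
```

The interval is read directly off the shape of the profile log-likelihood, so it need not be symmetric about the estimate in models where the profile is skewed.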

5. Simulation Study

To investigate the empirical performance of our methodology, we undertake a simulation study to compare the fitting performance under misspecified distributions and to examine the large sample properties of the ML estimates. To this end, we generate artificial data from the following two nonlinear models:

Model : drug responsiveness model:

Model : nonlinear growth-curve model: with two kinds of skewed distributions for random error as follows: Case (I): and Case (II): .

The true parameter values are chosen as follows: is assumed to be the same for the above two models. In Model , let , , , and and in Model let , , , and . Note that, for the degrees of freedom , a relatively small value () yields a heavy-tailed distribution, as required.

In this simulation, there are four simulated data sets in total, corresponding to two nonlinear models, Model and Model , along with two distributions, Case (I) and Case (II), for the random error. Similar to the previous analysis, each simulated data set is fitted under the STN, FSN, and FSTN scenarios using three different estimation algorithms. To shed light on the empirical performance of our methodology, an interesting comparison can be made by examining how often we can recognize the true model.

Table 1 shows the absolute value of the average bias between the true and the estimated parameters, together with the percentage of the 500 replications in which each model is ranked as the best model by the AIC criterion.
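The AIC-based ranking can be sketched as follows, using the usual definition AIC = 2k - 2l, where k is the number of estimated parameters and l is the maximized log-likelihood. The log-likelihood values and parameter counts in the example are hypothetical and are not results from Table 1.

```python
from collections import Counter


def aic(loglik, n_params):
    """Akaike information criterion; smaller values are preferred."""
    return 2 * n_params - 2 * loglik


def best_by_aic(fits):
    """fits maps a model name to (maximized log-likelihood, parameter count)."""
    return min(fits, key=lambda name: aic(*fits[name]))


def selection_rates(replications):
    """Percentage of replications in which each model has the smallest AIC."""
    counts = Counter(best_by_aic(fits) for fits in replications)
    return {name: 100.0 * c / len(replications) for name, c in counts.items()}


# Two hypothetical replications (values invented for illustration).
replications = [
    {"STN": (-120.0, 5), "FSN": (-118.5, 5), "FSTN": (-117.0, 6)},
    {"STN": (-119.0, 5), "FSN": (-121.0, 5), "FSTN": (-120.5, 6)},
]
rates = selection_rates(replications)
```

Because the penalty grows with the parameter count, a richer model is selected only when its gain in log-likelihood exceeds one unit per additional parameter.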

Examining Table 1, we find the following: (i) the estimation results differ markedly between the drug responsiveness model and the growth-curve model, indicating that the nonlinearity of the model has an impact on parameter estimation; (ii) when the true distribution for the random error is STN, the three fitted distributions behave roughly the same; it is hard to tell which estimation approach is better, and similar results are obtained for model selection based on AIC; (iii) when the true distribution for the random error is FSTN, the FSN and FSTN distributions outperform the STN distribution, producing estimates with lower bias and being selected by AIC more often; furthermore, the PNR and MPNR methods generally perform better than the traditional NR method.

To study the consistency properties of the ML estimates, we focus on the situation in which the true distribution for the random error is Case (II) and the fitted distribution is also FSTN. For this case, two estimation algorithms, PNR and MPNR, are adopted for parameter estimation and samples of different sizes (, 100, and 200) are generated from Models and . We compute the 95% confidence interval for the parameters of interest and the mean square error (MSE) for each model, where . The length of the 95% confidence interval and the MSE results are summarized in Table 2. From Table 2 we can see that, as expected, the length of the confidence interval and the MSE tend to decrease as the sample size increases.

Tables 1 and 2 show that, in general, the FSTN distribution is more robust and flexible in modeling data with skewness, heavy tails, and multimodality than the other skewed alternatives, and that the MPNR method improves the accuracy of model estimation in the context of nonlinear regression with this new distribution.

6. Conclusion

We have proposed a new skewed distribution based on the general FSS distribution framework, called the FSTN distribution, which can accommodate multimodality, asymmetry, and heavy tails jointly and thus offers greater flexibility than its SN and STN counterparts. Moreover, we have developed a modified profile Newton-Raphson iterative algorithm for estimating the parameters of interest of nonlinear models with the FSTN distribution, and interval estimation and hypothesis testing in a profile likelihood paradigm are also considered.

Numerical studies reveal that, in the context of nonlinear regression analysis, if the true distribution is STN or FSN whereas the fitting distribution is FSTN, the estimation results are not influenced by this misspecification of distribution assumption. However, once the true distribution is FSTN while the fitting distribution is STN, the estimation results appear to be somewhat disappointing, which shows the robustness of FSTN distribution. In general, the combination of FSTN distribution with MPNR method brings more accuracy and improvement on the estimation of nonlinear regression.

So far the present methodology is limited to the complete data analysis; the extensions of this paper include missing data version as well as Bayesian analysis of this model, which will be reported in another paper.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

This work was supported by the National Natural Science Foundation of China under Grants 11171105 and 61273021 and in part by the Natural Science Foundation Project of CQ cstc2013jjB40008.

References

[1] G. Verbeke and E. Lesaffre, “A linear mixed-effects model with heterogeneity in the random-effects population,” Journal of the American Statistical Association, vol. 91, no. 433, pp. 217–221, 1996.
[2] P. Ghosh, M. D. Branco, and H. Chakraborty, “Bivariate random effect model using skew-normal distribution with application to HIV-RNA,” Statistics in Medicine, vol. 26, no. 6, pp. 1255–1267, 2007.
[3] A. Azzalini and M. G. Genton, “Robust likelihood methods based on the skew-t and related distributions,” International Statistical Review, vol. 76, no. 1, pp. 106–129, 2008.
[4] A. Azzalini, “A class of distributions which includes the normal ones,” Scandinavian Journal of Statistics, vol. 12, no. 2, pp. 171–178, 1985.
[5] A. Azzalini and A. Capitanio, “Distributions generated by perturbation of symmetry with emphasis on a multivariate skew t-distribution,” Journal of the Royal Statistical Society, Series B: Statistical Methodology, vol. 65, no. 2, pp. 367–389, 2003.
[6] H. W. Gómez, O. Venegas, and H. Bolfarine, “Skew-symmetric distributions generated by the distribution function of the normal distribution,” Environmetrics, vol. 18, no. 4, pp. 395–407, 2007.
[7] A. Azzalini and A. Dalla Valle, “The multivariate skew-normal distribution,” Biometrika, vol. 83, no. 4, pp. 715–726, 1996.
[8] A. Azzalini and A. Capitanio, “Statistical applications of the multivariate skew normal distribution,” Journal of the Royal Statistical Society, Series B: Statistical Methodology, vol. 61, no. 3, pp. 579–602, 1999.
[9] Y. Ma and M. G. Genton, “Flexible class of skew-symmetric distributions,” Scandinavian Journal of Statistics, vol. 31, no. 3, pp. 459–468, 2004.
[10] A. Azzalini, “The skew-normal distribution and related multivariate families,” Scandinavian Journal of Statistics, vol. 32, no. 2, pp. 159–200, 2005.
[11] H. J. Ho, S. Pyne, and T. I. Lin, “Maximum likelihood inference for mixtures of skew Student-t-normal distributions through practical EM-type algorithms,” Statistics and Computing, vol. 22, no. 1, pp. 287–299, 2012.
[12] C. B. Zeller, V. H. Lachos, and F. E. Vilca-Labra, “Local influence analysis for regression models with scale mixtures of skew-normal distributions,” Journal of Applied Statistics, vol. 38, no. 2, pp. 343–368, 2011.
[13] V. G. Cancho, D. K. Dey, V. H. Lachos, and M. G. Andrade, “Bayesian nonlinear regression models with scale mixtures of skew-normal distributions: estimation and case influence diagnostics,” Computational Statistics & Data Analysis, vol. 55, no. 1, pp. 588–602, 2011.
[14] M. D. Branco and D. K. Dey, “A general class of multivariate skew-elliptical distributions,” Journal of Multivariate Analysis, vol. 79, no. 1, pp. 99–113, 2001.
[15] F. C. Xie, J. G. Lin, and B. C. Wei, “Diagnostics for skew-normal nonlinear regression models with AR(1) errors,” Computational Statistics & Data Analysis, vol. 53, no. 12, pp. 4403–4416, 2009.
[16] F.-C. Xie, B.-C. Wei, and J.-G. Lin, “Homogeneity diagnostics for skew-normal nonlinear regressions models,” Statistics & Probability Letters, vol. 79, no. 6, pp. 821–827, 2009.
[17] L. H. Vanegas and F. J. Cysneiros, “Assessment of diagnostic procedures in symmetrical nonlinear regression models,” Computational Statistics & Data Analysis, vol. 54, no. 4, pp. 1002–1016, 2010.
[18] T. A. Severini, “An approximation to the modified profile likelihood function,” Biometrika, vol. 85, no. 2, pp. 403–411, 1998.
[19] G. K. Smyth, “Partitioned algorithms for maximum likelihood and other non-linear estimation,” Statistics and Computing, vol. 6, no. 3, pp. 201–216, 1996.
[20] B. Wei, Exponential Family Nonlinear Models, Springer, Singapore, 1998.

Copyright

Copyright © 2014 Xuedong Chen et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
