Journal of Probability and Statistics

Volume 2018, Article ID 4878925, 9 pages

https://doi.org/10.1155/2018/4878925

## Local Influence Analysis for Quasi-Likelihood Nonlinear Models with Random Effects

^{1}School of Mathematics and Statistics, Guizhou University of Finance and Economics, Guiyang 550025, China

^{2}Department of Mathematics and Statistics, University of North Carolina at Charlotte, NC 28223, USA

^{3}Department of Mathematics, Southern University of Science and Technology, Shenzhen 518055, China

Correspondence should be addressed to Xuejun Jiang; jiangxj@sustc.edu.cn

Received 9 May 2018; Accepted 16 July 2018; Published 8 August 2018

Academic Editor: Steve Su

Copyright © 2018 Tian Xia et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

#### Abstract

We propose a quasi-likelihood nonlinear model with random effects, which is a hybrid extension of quasi-likelihood nonlinear models and generalized linear mixed models. It includes a wide class of existing models as examples. A novel penalized quasi-likelihood estimation method is introduced. Based on the Laplace approximation and a penalized quasi-likelihood displacement, local influence of minor perturbations on the data set is investigated for the proposed model. Four concrete perturbation schemes are considered in the local influence analysis. The effectiveness of the proposed methodology is illustrated by some numerical examinations on a pharmacokinetics dataset.

#### 1. Introduction

In this paper, we propose a quasi-likelihood nonlinear model with random effects (QLNMWRE) and investigate local influence of the model. The QLNMWRE is a hybrid generalization of quasi-likelihood nonlinear models [1, 2] and generalized linear mixed models, and it combines the advantages of both models. Generalized linear mixed models (GLMMs) extend the well-known generalized linear models [3] by adding random effects to the linear predictor. GLMMs are effective and flexible for modeling nonnormal responses, repeated measurements, and other forms of clustered data. Efficient inference for GLMMs depends on the underlying distribution of the data, but the exact distribution is rarely known in practice. In contrast, the quasi-likelihood method [4] requires assumptions only on the first and second moments of the distribution and has been widely applied in statistical theory and practice (see, e.g., [5–8]).

Detecting influential observations is important in data analysis. Local influence analysis has become a general tool for detecting groups of points that strongly influence the fitted model under perturbation schemes [9]. This approach has been successfully applied to many models, such as mixed models [10, 11], generalized linear models [12], generalized linear mixed models [13], exponential family nonlinear models [14], nonlinear reproductive dispersion mixed models [15], nonlinear mixed-effect models [16, 17], and multivariate threshold time series models [1]. However, in these references the local influence method relies heavily on the likelihood displacement, which requires the exact likelihood function, and this is rarely known in practice. In contrast, quasi-likelihood methods require only the first two moments of the response variables rather than the exact likelihood function. Hence, we conduct influence analysis of the QLNMWRE using a novel penalized quasi-likelihood estimation method. The proposed methodology is illustrated by analyzing a pharmacokinetics dataset.

The remainder of this paper is organized as follows. In Section 2, we introduce the QLNMWRE and the corresponding estimation method. A Fisher-scoring iteration algorithm is presented to calculate the estimators. In Section 3, a penalized quasi-likelihood displacement (PQLD) is proposed, and assessment of local influence under four different perturbation schemes is investigated. In Section 4, the pharmacokinetics dataset is employed to illustrate the effectiveness of the proposed methodology. Finally, a discussion is given in Section 5.

#### 2. Models and Estimation Method

Let be a response vector of length , and let and be and matrices of explanatory variables associated with fixed and random effects, respectively. Conditional on the dimensional vector of random effects, , the observations, , are independent and satisfy that where is an unknown parameter vector defined in a compact subset , and are defined in a subset of and a subset of , respectively, is a known variance function, is a dispersion parameter that is known or can be estimated separately, is a continuously differentiable function such that the derivative matrix has rank for all , with , and the random effects are assumed to be multivariate normally distributed: with being a known nonnegative definite matrix. Following [2, 3, 18, 19], the conditional log quasi-likelihood on is defined as where The model defined by (1)-(3) is the so-called QLNMWRE.
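As an illustration of a log quasi-likelihood built from a variance function alone, the following minimal sketch evaluates the standard Wedderburn form Q(μ; y) = ∫ from y to μ of (y − t)/(σ²V(t)) dt by Simpson's rule. The assumption here is that this textbook form is representative of the definition above; the function name `quasi_likelihood` and the example values are hypothetical and do not come from the paper.

```python
import math

def quasi_likelihood(y, mu, variance, sigma2=1.0, steps=2000):
    """Approximate Q(mu; y) = integral_y^mu (y - t) / (sigma2 * V(t)) dt
    with the composite Simpson rule (assumed textbook definition)."""
    if steps % 2:
        steps += 1  # Simpson's rule needs an even number of subintervals
    h = (mu - y) / steps

    def integrand(t):
        return (y - t) / (sigma2 * variance(t))

    total = integrand(y) + integrand(mu)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * integrand(y + k * h)
    return total * h / 3.0

# Poisson-type variance V(t) = t has the closed form y*log(mu/y) - (mu - y).
q_num = quasi_likelihood(3.0, 4.5, lambda t: t)
q_exact = 3.0 * math.log(4.5 / 3.0) - (4.5 - 3.0)
print(q_num, q_exact)
```

Only the variance function enters the computation, which is the sense in which the quasi-likelihood needs the first two moments and not the full distribution.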

Clearly, this QLNMWRE encompasses some important special cases. If , then the above model is just the quasi-likelihood nonlinear model discussed by [2]; if , and are independently drawn from a one-parameter exponential family of distributions with density where is a measure, then it reduces to generalized linear models with random effects (see [20, 21]). Hence, the QLNMWRE is a hybrid extension of the quasi-likelihood nonlinear models and the generalized linear models with random effects.

Let be a probability density function of random effect . Then the joint log quasi-likelihood function of and is Similar to the relationship between the joint log-likelihood function and the marginal log-likelihood function, we have where is the marginal log quasi-likelihood function of and is the log quasi-likelihood function of given , i.e., Following the arguments in [20], the integrated log quasi-likelihood function used to estimate is defined by where denotes the deviance measure of fit. If, conditional on , is a member of the exponential family, then is the conditional log-likelihood of given , and is the log-likelihood function.

In general, no analytical expressions are available for the integral in (8) and approximate techniques are needed. The simplest approach is the Laplace approximation [22, 23]. Obviously, the right-hand side of (8) is where . When the Laplace method is applied to approximate the integrated quasi-likelihood function (8), estimates of for fixed are obtained by maximizing the penalized quasi-likelihood (PQL) (8): where , and is the root of for fixed . We will use the penalized quasi-likelihood to estimate and to conduct local influence analysis. To this end, we need the following assumptions.
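The Laplace step can be sketched in one dimension as follows. The sketch approximates ∫ exp(h(b)) db by exp(h(b̂))·sqrt(2π / (−h''(b̂))), where b̂ maximizes h. The scalar random effect, the finite-difference derivatives, and the example function `penalized_h` (a Poisson-type quasi-likelihood term plus a normal penalty with made-up values) are illustrative assumptions, not the paper's model.

```python
import math

def laplace_integral(h, b0=0.0, eps=1e-4, tol=1e-10, max_iter=100):
    """Laplace approximation to the integral of exp(h(b)) over the real line.
    Returns (approximate integral, mode of h)."""
    def d1(b):  # central-difference first derivative
        return (h(b + eps) - h(b - eps)) / (2.0 * eps)

    def d2(b):  # central-difference second derivative
        return (h(b + eps) - 2.0 * h(b) + h(b - eps)) / eps ** 2

    b = b0
    for _ in range(max_iter):  # Newton iterations for the root of h'(b) = 0
        step = d1(b) / d2(b)
        b -= step
        if abs(step) < tol:
            break
    return math.exp(h(b)) * math.sqrt(2.0 * math.pi / -d2(b)), b

# Check on a Gaussian kernel, where the approximation is exact: sqrt(2*pi).
def gaussian_h(b):
    return -0.5 * (b - 1.0) ** 2

approx, mode = laplace_integral(gaussian_h)

# Hypothetical penalized example: y*(eta + b) - exp(eta + b) - b^2 / (2*tau2).
def penalized_h(b, y=3.0, eta=0.5, tau2=1.0):
    return y * (eta + b) - math.exp(eta + b) - b * b / (2.0 * tau2)

pql_value, b_hat = laplace_integral(penalized_h)
print(approx, mode, pql_value, b_hat)
```

The mode `b_hat` plays the role of the root of the penalized score for fixed parameters, which is why maximizing the penalized quasi-likelihood and applying the Laplace method lead to the same quantity.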

*Assumption A.*

(i) ;

(ii) there exists some constant and some compact subset such that

It is easily seen that Assumption A holds in generalized linear mixed models and exponential family nonlinear random effects models. Assumption A guarantees the existence of the variance-covariance matrix of , where with . Let and , where Put , , , and . Under Assumption A, we have the following result.

Theorem 1. *For the model defined by (1)-(3), conditional on , the quasi-score function, the quasi-observed information matrix, and the quasi-Fisher information matrix for admit the following representations: where indicates the array multiplication.*

Let denote the maximum quasi-likelihood estimator (MQLE) of , which is the solution of equation . Then the Fisher-scoring iteration method can be used for computing by iteratively solving the following equation (see [14, 24]): where , and are all evaluated at and .

On the other hand, it follows from (5) that the quasi-score function and the quasi-Fisher information matrix for can be, respectively, expressed as where with , and Hence, the Fisher-scoring iteration algorithm for computing the predictor of under known is given by where and are all evaluated at and . As the iteration scheme (19) converges, converges to .

In general, the choice of initial value is important for the Fisher-scoring iteration algorithm. We use the algorithm in [2] for quasi-likelihood nonlinear models to find the starting values of parameter for QLNMWRE with . Hence, the MQLE of can be obtained by solving (16) and (19) until convergence.
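The Fisher-scoring idea can be illustrated on a deliberately simplified case: a scalar parameter, no random effects, and mean function μ = f(x, β). Each update adds (quasi-Fisher information)⁻¹ × (quasi-score) to the current value. This is a sketch under those assumptions and is not the iteration (16) or (19); the function names and the exponential mean model are hypothetical.

```python
import math

def fisher_scoring(x, y, f, df, variance, beta=0.0, tol=1e-10, max_iter=100):
    """Fisher scoring for a scalar beta in a quasi-likelihood nonlinear model.
    score = sum D_i (y_i - mu_i) / V(mu_i), info = sum D_i^2 / V(mu_i),
    with D_i = d f(x_i, beta) / d beta. The dispersion cancels in the update."""
    for _ in range(max_iter):
        mu = [f(xi, beta) for xi in x]
        d = [df(xi, beta) for xi in x]
        v = [variance(m) for m in mu]
        score = sum(di * (yi - mi) / vi for di, yi, mi, vi in zip(d, y, mu, v))
        info = sum(di * di / vi for di, vi in zip(d, v))
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    return beta

# Hypothetical mean model mu = exp(beta * x) with variance function V(mu) = mu.
x = [0.0, 0.5, 1.0, 1.5, 2.0]
y = [math.exp(0.7 * xi) for xi in x]  # noiseless responses generated at beta = 0.7
beta_hat = fisher_scoring(
    x, y,
    f=lambda xi, b: math.exp(b * xi),
    df=lambda xi, b: xi * math.exp(b * xi),
    variance=lambda m: m,
)
print(beta_hat)
```

Starting from a poor initial value can cost extra iterations or cause divergence in less well-behaved models, which is consistent with the remark above on the importance of starting values.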

To investigate the statistical diagnostic measures for QLNMWRE, we rewrite (16) where . When converges to , can be expressed as where , and are all evaluated at .

#### 3. Local Influence

The aim of local influence analysis is to investigate the behavior of some influence measure when small perturbations are made in the model/data, where is an m-dimensional vector of perturbations restricted to some open subset . For simple statistical models, Cook [9] constructed the likelihood displacement (LD) and used it to assess the local influence of a minor perturbation. Although this approach is very useful, serious difficulties arise when applying it to complicated models because the likelihood function is intractable. To cope with these difficulties, some authors have considered alternatives to the LD. For instance, Zhu et al. [25] proposed the Q-likelihood displacement and established an approach to assess local influence of statistical models with incomplete data, and Jung [26] presented a quasi-likelihood displacement for local influence analysis in generalized estimating equations. Inspired by [25, 26], we define in this work a new penalized quasi-likelihood displacement and then adapt the local influence approach introduced by [9] to the QLNMWRE.

Let and be the penalized quasi-likelihood for the unperturbed and perturbed models, respectively. We assume that there is an such that . Let and be the MQLE of under the unperturbed and perturbed models, respectively. Similar to the likelihood displacement [9], we define the penalized quasi-likelihood displacement (PQLD) as The influence graph is defined as . Following the approach developed in [9, 25, 26], the normal curvature of at in the direction of some unit vector can be used to summarize the local behavior of the penalized quasi-likelihood displacement. As shown in [9], the normal curvature in the unit direction at is given by where , in which is a matrix evaluated at and , is a Hessian matrix evaluated at . The maximum curvature , which is the largest absolute eigenvalue of , and the corresponding direction vector are usually used for identifying locally influential observations. A large value of indicates a serious local problem, and if the -th element in is relatively large, special attention should be paid to the element being perturbed by . To apply the local influence method in [9] to the QLNMWRE, we consider the following four perturbation schemes.
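For intuition, the curvature computation simplifies considerably when the parameter is a scalar. Assuming Cook's form C_l = 2|lᵀΔᵀL̈⁻¹Δl| from [9], with Δ reduced to a vector of length m and L̈ to a single negative number, the maximum curvature is 2‖Δ‖²/|L̈| and the corresponding direction is Δ/‖Δ‖. The sketch below covers only this special case; the function names and the numbers are hypothetical.

```python
import math

def normal_curvature(delta, hess, direction):
    """C_l = 2 * |l' Delta' (1/hess) Delta l| for a scalar parameter.
    `direction` is expected to be a unit vector of the same length as `delta`."""
    proj = sum(d * l for d, l in zip(delta, direction))
    return 2.0 * abs(proj * proj / hess)

def max_curvature(delta, hess):
    """Largest curvature and its direction in the scalar-parameter case,
    where Delta' Delta / hess has rank one."""
    norm = math.sqrt(sum(d * d for d in delta))
    l_max = [d / norm for d in delta]
    return 2.0 * norm * norm / abs(hess), l_max

delta = [3.0, 4.0]  # made-up mixed second derivatives, one per perturbation
hess = -5.0         # made-up second derivative at the estimate
c_max, l_max = max_curvature(delta, hess)
c_first = normal_curvature(delta, hess, [1.0, 0.0])
print(c_max, l_max, c_first)
```

With a vector parameter, the same quantities come from an eigen-decomposition of the m × m matrix ΔᵀL̈⁻¹Δ, and an index plot of the absolute components of the leading eigenvector highlights the influential cases.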

##### 3.1. Case-Weights Perturbation

Let be an perturbation vector such that . The joint log quasi-likelihood function for the perturbed model is given by where . Then the penalized quasi-likelihood function can be expressed as where , , and satisfies Hence, , where . Then and thus where .
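A case-weights diagnostic can be sketched on a toy model without random effects. If the weighted log quasi-likelihood is Σ ωᵢqᵢ(β) with a scalar β, then the i-th entry of Δ at ω = 1 is the individual quasi-score ∂qᵢ/∂β at the estimate, and the direction of maximum curvature is proportional to Δ. The mean model μ = exp(βx), the variance function V(μ) = μ, and the contaminated data are assumptions made for illustration only.

```python
import math

def fit_beta(x, y, beta=0.0, tol=1e-12, max_iter=100):
    """Fisher scoring for mu = exp(beta * x) with V(mu) = mu (scalar beta)."""
    for _ in range(max_iter):
        mu = [math.exp(beta * xi) for xi in x]
        score = sum(xi * (yi - mi) for xi, yi, mi in zip(x, y, mu))
        info = sum(xi * xi * mi for xi, mi in zip(x, mu))
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    return beta

def case_weight_direction(x, y):
    """Absolute components of the maximum-curvature direction under case weights."""
    beta = fit_beta(x, y)
    mu = [math.exp(beta * xi) for xi in x]
    # Delta_i = d q_i / d beta = x_i * (y_i - mu_i) for this toy model.
    delta = [xi * (yi - mi) for xi, yi, mi in zip(x, y, mu)]
    norm = math.sqrt(sum(d * d for d in delta))
    return beta, [abs(d) / norm for d in delta]

x = [0.5, 1.0, 1.5, 2.0, 2.5]
y = [math.exp(0.7 * xi) for xi in x]
y[2] *= 3.0  # contaminate the third case
beta_hat, l_abs = case_weight_direction(x, y)
most_influential = l_abs.index(max(l_abs))
print(beta_hat, l_abs, most_influential)
```

Plotting `l_abs` against the case index gives the kind of index plot used in the numerical section; in the model with random effects, Δ and the Hessian would instead come from the penalized quasi-likelihood.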

##### 3.2. Response Variable Perturbation

A perturbation of the response variables is introduced by replacing by , where , and represents the situation with no perturbation. In this case, the joint log quasi-likelihood function for the perturbed model is given by where C is a constant. It follows from Section 2 that the penalized quasi-likelihood function is where , and satisfies It follows that , where with . Then and where with and .

##### 3.3. Explanatory Variables Perturbation

In this case, we focus on the perturbation of a specific explanatory variable. Under this condition we have the perturbed explanatory matrix with , where is a single explanatory variable of matrix corresponding to and denotes no perturbation. Then the joint log quasi-likelihood function for the perturbed model is where C is a constant, , and . It follows from Section 2 that where , , and satisfies Therefore, and . Let and . Then where indicates the array multiplication.

##### 3.4. Perturbation of Covariates in Random Effects

Consider perturbing the data for the th explanatory variable of , by modifying the data matrix Z as , where is a vector with 1 at th position and zeros elsewhere. Under this situation, the perturbed joint log quasi-likelihood can be expressed as where is a quantity that does not depend on and , and . When , it indicates no perturbation. It follows from Section 2 that where , and satisfies and therefore, with . Then Hence,

#### 4. Numerical Results

To illustrate how to use the proposed methodology, we consider the data set reported by [27]. The data came from a study of the pharmacokinetics of indomethacin following bolus intravenous injection of the same dose in six human volunteers. For each subject, the plasma concentrations of indomethacin were measured at 11 time points from 15 min to 8 hours postinjection. Davidian et al. [28] analyzed the dataset using a nonlinear model for repeated measurements; we model it using the following QLNMWRE: where the response variables follow the Gumbel distribution (cf. [29]) with the density function , and . By [29], we have and , where is the Euler constant, and . It is easily shown that Assumption A holds for our proposed model. Therefore, we can apply our proposed methodology to estimate the parameters in model (43). Using the algorithm in Section 2, we obtain the MQLE of and the predicted values of as follows: and
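The role of the Euler constant in the Gumbel moments can be checked numerically. The sketch below assumes the maximum-type Gumbel density f(y) = (1/s)·exp(−z − e^{−z}) with z = (y − m)/s, for which the mean is m + γs and the variance is π²s²/6. The parameterization used in the model above is not recoverable from the text and may differ, for example by using the minimum-type form, in which case the sign of the γ term changes. The function name and the values of `loc` and `scale` are hypothetical.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant

def gumbel_moments(loc, scale, lower=-10.0, upper=40.0, steps=20000):
    """Mean and variance of a maximum-type Gumbel(loc, scale) variable,
    obtained by Simpson integration of the standardized density."""
    h = (upper - lower) / steps
    m1 = m2 = 0.0
    for k in range(steps + 1):
        z = lower + k * h
        w = 1 if k in (0, steps) else (4 if k % 2 else 2)  # Simpson weights
        dens = math.exp(-z - math.exp(-z))
        m1 += w * z * dens
        m2 += w * z * z * dens
    m1 *= h / 3.0
    m2 *= h / 3.0
    return loc + scale * m1, scale ** 2 * (m2 - m1 ** 2)

loc, scale = 1.5, 2.0
mean_num, var_num = gumbel_moments(loc, scale)
mean_closed = loc + EULER_GAMMA * scale
var_closed = math.pi ** 2 * scale ** 2 / 6.0
print(mean_num, mean_closed, var_num, var_closed)
```

These two moments are all that a quasi-likelihood analysis takes from the Gumbel assumption: the mean links to the regression function and the variance supplies the variance function and dispersion.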

Now we present local influence analysis for the above fitting results. Under case-weight perturbation, cases 23, 45, and 56 are most influential, as depicted in Figure 1(a). Under response variable perturbation, cases 1, 12, 23, 45, and 56 are identified as influential points, and case 23 is the most influential, as shown in Figure 1(b). The index plots of and for perturbation on explanatory variables are given in Figures 2(a) and 2(b), respectively. From Figure 2(a) we can see that cases 12, 23, 45, and 56 are identified as influential points. Figure 2(b) shows that cases 1, 12, 23, 34, 45, and 56 are influential. Figure 3 displays the index plots of for the perturbation of random effects. For these types of perturbation, case 23 is identified as being the most influential. Note that case 23 exerts great influence in each perturbation scheme, which indicates that the results obtained through different perturbation schemes are quite consistent. Special attention should be paid to these influential cases, and a more formal test of whether they are outliers may be worthwhile.