Computational and Mathematical Methods in Medicine

Volume 2016 (2016), Article ID 8578156, 8 pages

http://dx.doi.org/10.1155/2016/8578156

## Multiple Linear Regressions by Maximizing the Likelihood under Assumption of Generalized Gauss-Laplace Distribution of the Error

^{1}Department of Physics and Chemistry, Faculty of Materials and Environmental Engineering, Technical University of Cluj-Napoca, Muncii Boulevard No. 103-105, 400641 Cluj-Napoca, Romania

^{2}Doctoral School of Chemistry, Institute for Doctoral Studies, Babeş-Bolyai University, Kogălniceanu Street No. 1, 400084 Cluj-Napoca, Romania

^{3}Department of Chemistry, Faculty of Science, University of Oradea, Universităţii Street No. 1, 410087 Oradea, Romania

^{4}Department of Medical Informatics and Biostatistics, Faculty of Medicine, Iuliu Haţieganu University of Medicine and Pharmacy, Louis Pasteur Street No. 6, 400349 Cluj-Napoca, Romania

^{5}Doctoral School, University of Agricultural Sciences and Veterinary Medicine Cluj-Napoca, Calea Mănăştur No. 3-5, 400372 Cluj-Napoca, Romania

Received 22 August 2016; Accepted 23 October 2016

Academic Editor: Irini Doytchinova

Copyright © 2016 Lorentz Jäntschi et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

#### Abstract

Multiple linear regression analysis is widely used to link an outcome with predictors for a better understanding of the behaviour of the outcome of interest. Usually, under the assumption that the errors follow a normal distribution, the coefficients of the model are estimated by minimizing the sum of squared deviations. A new approach based on maximum likelihood estimation is proposed for finding the coefficients of linear models with two predictors without any restrictive assumptions on the distribution of the errors. The algorithm was developed, implemented, and tested as proof of concept on fourteen sets of compounds by investigating the link between activity/property (as outcome) and structural feature information incorporated in molecular descriptors (as predictors). The results on real data demonstrated that in all investigated cases the power of the error is significantly different from the conventional value of two when the Gauss-Laplace distribution is used to relax the restrictive assumption of normally distributed errors. Therefore, the Gauss-Laplace distribution of the error could not be rejected, while the hypothesis that the power of the error from the Gauss-Laplace distribution is normally distributed also failed to be rejected.

#### 1. Introduction

The first report on multiple linear regression appeared in 1885 [1] and was detailed in 1886 [2]. Classical treatments of multiple regression were built on the product-moment method introduced in 1846 [3] and later connected with the optimal correlation [4].

In his first published paper, Fisher introduced the method of likelihood maximization [5], later used in conjunction with Pearson's correlation [6]—a paper which started a contentious debate between the method of central moments and the method of likelihood estimation [7], replied to in [8], and finally linked with the partial correlation coefficients [9].

A multiple linear regression model involves more than two variables, one (Y) being assumed dependent and the others (X_1, …, X_m) being assumed independent, and is considered here as a continuation of a previous study [10]. The most important assumption is that the data are paired; that is, a natural association between the values of the variables exists. This kind of association is accomplished when, for instance, a multiple linear regression is constructed involving a measured property/activity (Y) for a series of compounds for which other measured properties/activities or structure-based descriptors are available (X_1, …, X_m), the natural association being in this case the (chemical) compound responsible for that property/activity/descriptor value.

The least squares method is the standard approach for regression analysis; the method is credited to Legendre [11] (for a debate about its inventor, please see [12]) and it also (implicitly) assumes that the error is normally distributed.

Iteratively applying a local quadratic approximation to the likelihood (through the Fisher information [13]), the least squares method was used to fit generalized linear models, unifying classical, logistic, and Poisson (linear) regression [14]: iteratively reweighted least squares leads to the maximum likelihood estimates of the model parameters.

The generalized Gauss-Laplace distribution is the natural extension [15] of Gauss's [16] and Laplace's [17] symmetric distributions. It is a triparametric distribution (location, scale, and shape); parameter estimation via maximum likelihood and via the method of moments has been reported in [18], with the conclusion that the estimates do not have a closed form and must be obtained numerically.

A more general result regarding maximum likelihood estimation can be found in [19], but it unfortunately provides only sufficient conditions under which the maximum likelihood estimate exists and is unique, without the reciprocal (namely, conditions other than the given ones may also ensure that the maximum likelihood estimate exists and is unique). Moreover, for numerical estimates, uniqueness is hardly possible to discuss.

The problem of estimating the parameters of a multiple linear regression under the assumption of generalized Gauss-Laplace distribution of the error is a hard problem which can be solved only numerically; it involves a constrained optimization problem, where m is the number of unknown (to be determined) coefficients of the multiple linear regression. In this paper a mathematical and a numerical treatment of the problem are proposed.

In order to provide proof of concept for the proposed method of relaxing the distribution of the error when linear regression is used to link chemical information with biological measurements, ten previously reported datasets were considered, all with a significant role in human medicine or ecology.

#### 2. Mathematical Treatment

One may define the generalized Gauss-Laplace (GL) distribution as

GL(x; μ, σ, κ) = κ / (2σ·Γ(1/κ)) · exp(−(|x − μ| / σ)^κ)    (1)

where Γ is the Gamma function and μ (location), σ (scale), and κ (shape) are the parameters of the distribution.
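As an illustration, the density (1) can be evaluated directly; the function below is a minimal sketch (the name gl_pdf is this sketch's own). With κ = 1 it recovers the Laplace density, and with κ = 2 a Gaussian (with variance σ²/2 in this parametrization).

```python
import math

def gl_pdf(x, mu, sigma, kappa):
    """Density of the generalized Gauss-Laplace distribution, as in (1).
    mu = location, sigma = scale (> 0), kappa = shape (> 0)."""
    c = kappa / (2.0 * sigma * math.gamma(1.0 / kappa))
    return c * math.exp(-(abs(x - mu) / sigma) ** kappa)
```

For instance, gl_pdf(0, 0, 1, 1) evaluates to 1/2, the Laplace density at its mode.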

This definition will be used here for the Gauss-Laplace distribution to relax the normally distributed constraint on the distribution of the error (ε).

##### 2.1. Statement of the Problem

Multiple linear regression under the assumption of GL distribution (see (1)) for the error ε (σ, κ, and the coefficients a_1, …, a_m are to be determined from sampled data) is stated in the following equation:

ε = Y − (a_1·X_1 + ⋯ + a_m·X_m)    (2)

where ε ~ GL(0, σ, κ) (and σ > 0 and κ > 0).

The case with intercept (a_0) is reduced to the case without intercept by increasing by one the number of independent variables (m → m + 1; X_{m+1} = 1; and a_{m+1} = a_0, when X_{m+1} = 1) and therefore will not be mentioned further. The substitution given as (2) transforms the distribution from a univariate to a multivariate one and can be mathematically characterized by a series of properties, such as those given in [29] (results applicable after rescaling).
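In code, this reduction amounts to augmenting every observation with a constant predictor; a sketch (the name add_intercept is illustrative):

```python
def add_intercept(X):
    """Append a constant predictor X_{m+1} = 1 to every row, so that the
    intercept a_0 becomes an ordinary regression coefficient."""
    return [row + [1.0] for row in X]
```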

Let us take a sample of paired measurements (e.g., (Y_i, X_{1,i}, …, X_{m,i}), i = 1, …, n, where n is the number of paired measurements and m is the number of independent variables). The likelihood for the sample is

L = ∏_{i=1}^{n} GL(ε_i; 0, σ, κ)    (3)

Doing the substitution given in (2) and expressing in full its parameters, the expression of the likelihood function from (3) becomes

L(a_1, …, a_m, σ, κ) = ∏_{i=1}^{n} κ / (2σ·Γ(1/κ)) · exp(−(|Y_i − Σ_{j=1}^{m} a_j·X_{j,i}| / σ)^κ)    (4)
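Numerically, the likelihood of the sample is best handled on the logarithmic scale, since a product of n small densities underflows quickly; a sketch with plain Python lists (the function name and data layout are this sketch's choices):

```python
import math

def log_likelihood(a, X, Y, sigma, kappa):
    """Log-likelihood of the sample: residuals eps_i = Y_i - sum_j a_j*X[i][j],
    each scored under GL(0, sigma, kappa); logs are summed for stability."""
    n = len(Y)
    log_c = math.log(kappa) - math.log(2.0 * sigma) - math.lgamma(1.0 / kappa)
    total = n * log_c
    for i in range(n):
        eps = Y[i] - sum(aj * xj for aj, xj in zip(a, X[i]))
        total -= (abs(eps) / sigma) ** kappa
    return total
```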

The likelihood is at its maximum when all its partial derivatives are zero:

∂L/∂a_1 = 0; …; ∂L/∂a_m = 0; ∂L/∂σ = 0; ∂L/∂κ = 0    (5)

##### 2.2. Simplification of the Problem

The problem of finding the maximum of the likelihood is a typical extreme-point problem, but not an easy one to solve because it depends on a large number of variables. The easiest way is to eliminate one variable, namely, σ. Setting the derivative of LMLRGL with respect to σ to zero provides the value of σ:

σ = ((κ/n) · Σ_{i=1}^{n} |ε_i|^κ)^{1/κ}    (6)
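Given κ and the residuals, the scale from (6) is a one-line computation; a minimal sketch (the name sigma_hat is illustrative):

```python
def sigma_hat(eps, kappa):
    """Closed-form scale as in (6): ((kappa/n) * sum |eps_i|^kappa)^(1/kappa)."""
    n = len(eps)
    return (kappa / n * sum(abs(e) ** kappa for e in eps)) ** (1.0 / kappa)
```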

Please note that, after this substitution, the likelihood no longer depends on σ. Therefore, let LMLRGLS be the function having this constraint. After some calculations, the expression of LMLRGLS (on the logarithmic scale) is

LMLRGLS = n·(ln(κ) − ln(2) − ln(Γ(1/κ)) − 1/κ) − (n/κ)·ln((κ/n)·E_κ)    (7)

where

E_κ = Σ_{i=1}^{n} |ε_i|^κ    (8)
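Once the residuals are fixed, the substituted function (7) depends on κ alone, so it can be evaluated cheaply for many candidate values of κ; a sketch (assuming the residuals are supplied as a plain list):

```python
import math

def lmlrgls(kappa, eps):
    """Profile log-likelihood as in (7): sigma has been eliminated via (6),
    so the value depends only on kappa and, through eps, the coefficients."""
    n = len(eps)
    e_k = sum(abs(e) ** kappa for e in eps)        # E_kappa, as in (8)
    log_sigma = math.log(kappa / n * e_k) / kappa  # log of the scale from (6)
    return n * (math.log(kappa) - math.log(2.0) - log_sigma
                - math.lgamma(1.0 / kappa) - 1.0 / kappa)
```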

On the other hand, for a fixed κ, E_κ depends only on the regression coefficients, and therefore when the derivative of E_κ relative to a coefficient is zero, the derivative of the maximum likelihood estimation function (either of LMLRGL and LMLRGLS) is also zero. Taking the partial derivatives of E_κ, with the above given substitution (as a function of a_1, …, a_m), the following equation is the result:

Σ_{i=1}^{n} w_i·X_{j,i}·(Y_i − Σ_{l=1}^{m} a_l·X_{l,i}) = 0,  j = 1, …, m    (9)

where w_i = |ε_i|^{κ−2}.
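Read as a fixed-point update, (9) is a weighted least squares solve with weights computed from the previous residuals; a sketch under that reading (names are illustrative; residuals of exactly zero would need guarding when κ < 2):

```python
def update_coefficients(X, Y, eps, kappa):
    """One coefficient update in the spirit of (9): solve the weighted normal
    equations with weights w_i = |eps_i|^(kappa-2); kappa = 2 gives ordinary
    least squares."""
    n, m = len(X), len(X[0])
    w = [abs(e) ** (kappa - 2.0) for e in eps]
    # Build X^T W X and X^T W Y.
    A = [[sum(w[i] * X[i][j] * X[i][l] for i in range(n)) for l in range(m)]
         for j in range(m)]
    b = [sum(w[i] * X[i][j] * Y[i] for i in range(n)) for j in range(m)]
    # Gaussian elimination with partial pivoting on the m x m system.
    for col in range(m):
        piv = max(range(col, m), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, m):
            f = A[r][col] / A[col][col]
            for c in range(col, m):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    a = [0.0] * m
    for j in reversed(range(m)):
        a[j] = (b[j] - sum(A[j][c] * a[c] for c in range(j + 1, m))) / A[j][j]
    return a
```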

At this point only the expression of the likelihood function (see (7) and (8)) must be included in the new statement of the problem (see (9)) in order to keep in full the derivative constraints of the initial problem (see (4) and (5)). There is no obvious further reduction of the problem. However, revising the results obtained up to this point, the cancelling of the (likelihood function) derivative relative to σ was included at the beginning of the simplification (see (6)), while the cancelling of the (likelihood function) derivatives relative to the regression coefficients is equivalent to the previously given equation for the regression coefficients (see (9)). On the other hand, (7)–(9) facilitate an iterative solution of the problem.

#### 3. Fixed-Point Theory for Iterating to the Optimal Solution

A convenient notation was used in (9) to suggest the further treatment of the problem. Actually, Fisher and Mackenzie were the first to propose such a numerical treatment in statistics [30]. It is based on the assumption that, near the optimal solution, an iterative evaluation of the coefficients (here a_1, …, a_m and κ) conducted using their previous values (hidden in (9) inside the weights w_i) leads to the optimum. The optimum is reached when their values show no significant change from one step to the next, and, at that point, the function acts as the argument of a contraction mapping [31].

There are some inconveniences for a smooth application of the fixed-point theory. One of them is that obtaining the maximum of the LMLRGLS function (see (7), obtained for known values of the coefficients and of the residuals (where σ vanishes)) is not a very simple problem; from its explicit expression it is expected to have more than one local maximum. Fortunately, some clues exist, such as the domain of κ (ranging from 0) and the expectation on the power of the error (here let us say that κ is expected to be between 0.1 and 10 and is very unlikely, but possible, to be between 0.01 and 100; outside of this range the precision of computations also often fails). But the biggest inconvenience is that (9) is not an equation but a system of equations, and here we may only provide different strategies of iteration, hoping that at least one of them provides the contraction mapping. Namely, we may

(i) start from some initial values of the regression coefficients (a_j^(0)) and of the power of the error (κ^(0));

(ii) use the initial values to obtain the likelihood function LMLRGLS (from (7)) as a function depending only on κ; this requires only the evaluation of E_κ (see (8));

(iii) find the maximum (let this be κ̂) of the LMLRGLS function from (7) (where its derivative is 0 and the point is a global extreme point);

(iv) prepare the start of a loop on the step s, by setting it to 0 (s = 0);

(v) it is possible, especially at the beginning of the iteration (when s = 0), that κ̂ and κ^(s) are largely different from each other; a major change in κ will accelerate the convergence but will also increase the likelihood of divergence; therefore, use the new (κ̂) and old (κ^(s)) values to indicate the gradient of the change in κ, such as κ^(s+1) = κ^(s) + η·(κ̂ − κ^(s)), with a small η to be determined;

(vi) do a loop (s → s + 1) using the new value of κ (namely, κ^(s+1)) to calculate the new values of the coefficients (a_j^(s+1) with (9)) and, using the new values of the coefficients, calculate the new value of κ̂ (turn back to find the maximum from (7));

(vii) repeat until κ (κ^(s+1) ≈ κ^(s)) and the coefficients (a_j^(s+1) ≈ a_j^(s)) show no changes during the iteration.
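The steps (i)–(vii) above can be sketched end-to-end. Everything below is this sketch's choice, not the paper's Algorithms 1–3: the function names, the coarse grid search over κ in [0.1, 10] for step (iii), starting from ordinary least squares with κ = 2, and clipping zero residuals to avoid division by zero in the weights.

```python
import math

def fit_mlr_gl(X, Y, eta=0.3, tol=1e-8, max_steps=200):
    """Fixed-point iteration over kappa and the coefficients, following steps
    (i)-(vii); eta is the damping factor for the kappa update."""
    n, m = len(X), len(X[0])

    def residuals(a):
        return [Y[i] - sum(a[j] * X[i][j] for j in range(m)) for i in range(n)]

    def profile(kappa, eps):
        # LMLRGLS: log-likelihood with sigma eliminated via (6).
        e_k = sum(max(abs(e), 1e-12) ** kappa for e in eps)
        return n * (math.log(kappa) - math.log(2.0)
                    - math.log(kappa / n * e_k) / kappa
                    - math.lgamma(1.0 / kappa) - 1.0 / kappa)

    def best_kappa(eps):
        # Step (iii): coarse grid over the expected range [0.1, 10].
        grid = [10 ** (k / 50.0) for k in range(-50, 51)]
        return max(grid, key=lambda k: profile(k, eps))

    def wls(eps, kappa):
        # Step (vi): coefficients from (9), weights w_i = |eps_i|^(kappa-2).
        w = [max(abs(e), 1e-12) ** (kappa - 2.0) for e in eps]
        A = [[sum(w[i] * X[i][j] * X[i][l] for i in range(n)) for l in range(m)]
             for j in range(m)]
        b = [sum(w[i] * X[i][j] * Y[i] for i in range(n)) for j in range(m)]
        for col in range(m):  # Gaussian elimination with partial pivoting
            piv = max(range(col, m), key=lambda r: abs(A[r][col]))
            A[col], A[piv], b[col], b[piv] = A[piv], A[col], b[piv], b[col]
            for r in range(col + 1, m):
                f = A[r][col] / A[col][col]
                A[r] = [A[r][c] - f * A[col][c] for c in range(m)]
                b[r] -= f * b[col]
        a = [0.0] * m
        for j in reversed(range(m)):
            a[j] = (b[j] - sum(A[j][c] * a[c] for c in range(j + 1, m))) / A[j][j]
        return a

    a, kappa = wls([1.0] * n, 2.0), 2.0   # step (i): OLS start, kappa = 2
    for _ in range(max_steps):            # steps (iv)-(vii)
        eps = residuals(a)
        k_hat = best_kappa(eps)               # steps (ii)-(iii)
        k_new = kappa + eta * (k_hat - kappa)  # step (v): damped update
        a_new = wls(eps, k_new)
        if abs(k_new - kappa) < tol and all(
                abs(x - y) < tol for x, y in zip(a_new, a)):
            return a_new, k_new
        a, kappa = a_new, k_new
    return a, kappa
```

On a perfect linear relation the coefficient estimate coincides with the least squares one, while on real data the loop adjusts κ away from 2 as the residual distribution dictates.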

On arriving at the stationary point, all criteria for the maximization of the likelihood are accomplished; namely, the equations corresponding to the cancellation of all derivatives are assured. The great advantage of this proposed method is that it reduces the problem of finding the maximum of a function of m + 2 variables to finding the maximum of a function of one variable (κ), in a repeated process, of course.

The disadvantage is that the evolution is through a contracting functional whose contraction cannot be assured at all times. This is the reason why there are different strategies for finding such a contracting functional (see example 6.1 in [32] for the construction of a contracting functional from resampling).

Some calculations are the same regardless of the strategy used and are given next as Algorithms 1–3.