
Research Article | Open Access

Volume 2021 | Article ID 5545356 | https://doi.org/10.1155/2021/5545356

Adewale F. Lukman, Issam Dawoud, B. M. Golam Kibria, Zakariya Y. Algamal, Benedicta Aladeitan, "A New Ridge-Type Estimator for the Gamma Regression Model", Scientifica, vol. 2021, Article ID 5545356, 8 pages, 2021. https://doi.org/10.1155/2021/5545356

A New Ridge-Type Estimator for the Gamma Regression Model

Academic Editor: Francisco Ayuga
Received: 26 Jan 2021
Accepted: 04 Jun 2021
Published: 21 Jun 2021

Abstract

The linear regression model (LRM) is widely used to model the quantitative structure-activity relationship (QSAR) between a response variable (biological activity) and one or more physiochemical or structural properties serving as explanatory variables, mainly when the response variable is normally distributed. The gamma regression model is employed instead when the dependent variable is skewed. The parameters of both models are estimated by the maximum likelihood estimator (MLE). However, the MLE becomes unstable in the presence of multicollinearity. In this study, we propose a new estimator, together with some biasing parameters, for estimating the regression coefficients of the gamma regression model when multicollinearity is present. A simulation study and a real-life application were carried out to evaluate the estimators' performance via the mean squared error (MSE) criterion. Both the simulation results and the real-life application reveal that the proposed gamma estimator produces lower MSE values than the other estimators considered.

1. Introduction

The gamma regression model (GRM) is generally adopted to model a skewed response variable that follows a gamma distribution, given one or more independent variables. It is used to model real-life data in several fields, such as the medical sciences, health care economics, and automobile insurance claims [1]. When a positively skewed response variable follows a gamma distribution given a set of independent variables, the gamma regression model is preferred [2, 4]. As in the linear regression model, the assumption of independent explanatory variables rarely holds in practice, so the multicollinearity problem also arises in gamma regression models, rendering the maximum likelihood estimator (MLE) unstable and inflating its variance [5]. Consequently, constructing confidence intervals or testing the regression parameters of the model becomes difficult [6]. Many authors have proposed estimators for handling multicollinearity. The ridge estimator of Hoerl and Kennard [7] is an alternative to the MLE for overcoming multicollinearity in the linear regression model. The estimator has been extended to generalized linear models (GLM) (see [8, 9]). Also, Månsson and Shukur [10] and Månsson [11] introduced the ridge estimator to the Poisson regression model and the negative binomial regression model, respectively. Kurtoglu and Ozkale [12] extended the Liu estimator of Liu [13] to the gamma regression model. Batah et al. [14] proposed a modified Jackknife ridge estimator by combining the ideas of the generalized ridge estimator and the Jackknifed ridge estimator. Also, Algamal [3] developed the modified Jackknifed ridge gamma regression estimator. Recently, modified versions of the ridge regression estimator with two biasing parameters were proposed for both the LRM and the GRM [15, 16]. Kibria and Lukman [17] proposed a new estimator, called the ridge-type estimator, and applied it to the linear regression model.

The main objective of this article is to extend the new ridge-type estimator of Kibria and Lukman [17] to the GRM. The article is organized as follows: in Section 2, we propose the new ridge-type gamma estimator, derive its properties, carry out theoretical comparisons, and explain the estimation of the biasing parameter. A simulation study is conducted in Section 3 to investigate and compare the performance of the new gamma estimator and some existing estimators. We analyze a real-life dataset in Section 4. Finally, some concluding remarks are provided in Section 5.

2. The Statistical Methodology

Consider a response variable y that follows the gamma distribution with nonnegative shape parameter α and nonnegative scale parameter θ, with probability density function

f(y; α, θ) = (1/(θ^α Γ(α))) y^(α−1) exp(−y/θ),  (1)

where y > 0, α > 0, and θ > 0. In the regression setting, the mean response μ_i = E(y_i) = αθ_i is related to the covariates through a link function g, with g(μ_i) = x_i′β. The log-likelihood function of (1) is

ℓ(β) = Σ_{i=1}^n [(α − 1) log y_i − y_i/θ_i − α log θ_i − log Γ(α)],  (2)

where θ_i = μ_i/α.

Equation (2) is nonlinear in β, so it is solved iteratively using the Fisher scoring method as follows:

β^(r+1) = β^(r) + I⁻¹(β^(r)) S(β^(r)),  (3)

where r is the iteration number, S(β) = ∂ℓ/∂β is the score vector, and I(β) is the Fisher information matrix. At the last step, the estimated coefficients are

β̂_MLE = (X′ŴX)⁻¹X′Ŵẑ,  (4)

where Ŵ and ẑ are, respectively, the weight matrix and the adjusted (working) response vector produced by the Fisher scoring iterative procedure (see [12, 18]). The covariance matrix, the mean squared error matrix (MMSE), and the scalar mean squared error (MSE) of β̂_MLE are obtained by Algamal and Asar [19] and written, respectively, as follows:

Cov(β̂_MLE) = φ(X′ŴX)⁻¹,
MMSE(β̂_MLE) = φ(X′ŴX)⁻¹,
MSE(β̂_MLE) = φ Σ_{j=1}^p 1/λ_j,

where φ is the dispersion parameter, λ_j is the jth eigenvalue of the matrix X′ŴX, and X′ denotes the transpose of X.
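The Fisher scoring iteration above can be sketched in code (the paper's computations were done in R; this Python version is a minimal illustration, not the authors' code). It assumes a log link, under which the scoring weights are constant so each iteration reduces to least squares on the adjusted response; the data and variable names are invented for the demonstration.

```python
import numpy as np

def gamma_mle_irls(X, y, tol=1e-8, max_iter=100):
    """Fisher scoring (IRLS) for a gamma regression with log link.

    eta = X beta, mu = exp(eta); the adjusted (working) response is
    z = eta + (y - mu)/mu.  Under the log link the scoring weights are
    constant, so each iteration is ordinary least squares on z.
    """
    # start from a crude fit: regress log(y) on X
    beta, *_ = np.linalg.lstsq(X, np.log(y), rcond=None)
    for _ in range(max_iter):
        eta = X @ beta
        mu = np.exp(eta)
        z = eta + (y - mu) / mu
        beta_new, *_ = np.linalg.lstsq(X, z, rcond=None)
        if np.max(np.abs(beta_new - beta)) < tol:
            return beta_new
        beta = beta_new
    return beta

# simulated check: recover a known coefficient vector
rng = np.random.default_rng(1)
n = 2000
X = np.column_stack([np.ones(n), rng.standard_normal((n, 3))])
beta_true = np.array([0.5, 0.3, -0.2, 0.4])
mu = np.exp(X @ beta_true)
shape = 5.0                       # shape = 1/phi
y = rng.gamma(shape, mu / shape)  # E[y] = mu
beta_hat = gamma_mle_irls(X, y)
```

With a large simulated sample, the fitted coefficients land close to the generating vector, which is a quick sanity check on the scoring loop.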

The gamma ridge estimator (GRE) is given by

β̂_GRE = (X′ŴX + kI)⁻¹X′ŴX β̂_MLE,

where k > 0 is the biasing parameter. The MMSE and MSE of GRE are given by

MMSE(β̂_GRE) = φ Q diag(λ_j/(λ_j + k)²) Q′ + k² Q Λ_k⁻¹ αα′ Λ_k⁻¹ Q′,
MSE(β̂_GRE) = φ Σ_{j=1}^p λ_j/(λ_j + k)² + k² Σ_{j=1}^p α_j²/(λ_j + k)²,

where Λ_k = diag(λ_j + k) and α = Q′β such that Q is the matrix of eigenvectors of X′ŴX.
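The GRE and its scalar MSE can be illustrated numerically (a sketch with hypothetical eigenvalues and α values, not the paper's data): at k = 0 the estimator and its MSE reduce to those of the MLE, and a small positive k sharply reduces the MSE when one eigenvalue of X′ŴX is near zero.

```python
import numpy as np

def gre(S, beta_mle, k):
    """Gamma ridge estimator: (S + kI)^(-1) S beta_MLE, with S = X'WX."""
    return np.linalg.solve(S + k * np.eye(S.shape[0]), S @ beta_mle)

def mse_gre(lam, alpha, phi, k):
    """Scalar MSE of GRE in eigen form: variance part + squared-bias part."""
    return (phi * np.sum(lam / (lam + k) ** 2)
            + k**2 * np.sum(alpha**2 / (lam + k) ** 2))

# hypothetical ill-conditioned spectrum (one near-zero eigenvalue)
lam = np.array([50.0, 10.0, 1.0, 0.01])
alpha = np.full(4, 0.5)
phi = 0.5
mse_mle = mse_gre(lam, alpha, phi, 0.0)   # = phi * sum(1/lam_j)
mse_k = mse_gre(lam, alpha, phi, 0.1)
```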

The gamma Liu estimator (GLE) is given by

β̂_GLE = (X′ŴX + I)⁻¹(X′ŴX + dI) β̂_MLE,

where 0 < d < 1 is the biasing parameter.

The MMSE and MSE of GLE are given by

MMSE(β̂_GLE) = φ Q diag((λ_j + d)²/(λ_j(λ_j + 1)²)) Q′ + (d − 1)² Q Λ₁⁻¹ αα′ Λ₁⁻¹ Q′,
MSE(β̂_GLE) = φ Σ_{j=1}^p (λ_j + d)²/(λ_j(λ_j + 1)²) + (d − 1)² Σ_{j=1}^p α_j²/(λ_j + 1)²,

where Λ₁ = diag(λ_j + 1).
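The GLE admits the same kind of numeric sketch (again with hypothetical values): setting d = 1 returns the MLE, so its MSE at d = 1 equals φ Σ 1/λ_j, while an intermediate d lowers the MSE under an ill-conditioned spectrum.

```python
import numpy as np

def gle(S, beta_mle, d):
    """Gamma Liu estimator: (S + I)^(-1) (S + dI) beta_MLE, with S = X'WX."""
    p = S.shape[0]
    return np.linalg.solve(S + np.eye(p), (S + d * np.eye(p)) @ beta_mle)

def mse_gle(lam, alpha, phi, d):
    """Scalar MSE of GLE in eigen form: variance part + squared-bias part."""
    return (phi * np.sum((lam + d) ** 2 / (lam * (lam + 1) ** 2))
            + (d - 1) ** 2 * np.sum(alpha**2 / (lam + 1) ** 2))

lam = np.array([50.0, 10.0, 1.0, 0.01])   # hypothetical eigenvalues of S
alpha = np.full(4, 0.5)
phi = 0.5
```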

2.1. The New Gamma Estimator

For the linear regression model, Kibria and Lukman [17] proposed the following new ridge-type estimator, called the Kibria–Lukman (KL) estimator, which is defined as

β̂_KL = (X′X + kI)⁻¹(X′X − kI) β̂_OLS,

where k > 0 and β̂_OLS = (X′X)⁻¹X′y.

In this study, we extend the KL estimator to the GRM and refer to it as the gamma KL (GKL) estimator, which is written as follows:

β̂_GKL = (X′ŴX + kI)⁻¹(X′ŴX − kI) β̂_MLE,

where k > 0.

The bias and covariance matrix of the GKL estimator are obtained, respectively, as

Bias(β̂_GKL) = −2k(X′ŴX + kI)⁻¹β,
Cov(β̂_GKL) = φ(X′ŴX + kI)⁻¹(X′ŴX − kI)(X′ŴX)⁻¹(X′ŴX − kI)(X′ŴX + kI)⁻¹.

So, the MMSE and MSE in terms of the eigenvalues are defined, respectively, as

MMSE(β̂_GKL) = φ Q diag((λ_j − k)²/(λ_j(λ_j + k)²)) Q′ + 4k² Q Λ_k⁻¹ αα′ Λ_k⁻¹ Q′,
MSE(β̂_GKL) = φ Σ_{j=1}^p (λ_j − k)²/(λ_j(λ_j + k)²) + 4k² Σ_{j=1}^p α_j²/(λ_j + k)².
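A small numeric sketch of the GKL estimator and its eigen-form MSE (illustrative data, not the paper's): setting k = 0 recovers the MLE, and because the matrix (S + kI)⁻¹(S − kI) has spectral norm below one for k > 0, the GKL estimate always has a smaller norm than the MLE it shrinks.

```python
import numpy as np

def gkl(S, beta_mle, k):
    """GKL estimator: (S + kI)^(-1) (S - kI) beta_MLE, with S = X'WX."""
    I = np.eye(S.shape[0])
    return np.linalg.solve(S + k * I, (S - k * I) @ beta_mle)

def mse_gkl(lam, alpha, phi, k):
    """Scalar MSE of GKL in eigen form: variance part + squared-bias part."""
    return (phi * np.sum((lam - k) ** 2 / (lam * (lam + k) ** 2))
            + 4 * k**2 * np.sum(alpha**2 / (lam + k) ** 2))

rng = np.random.default_rng(0)
X = rng.standard_normal((40, 4))
S = X.T @ X                    # stands in for X'WX in this sketch
b_mle = rng.standard_normal(4)
b_gkl = gkl(S, b_mle, 0.5)
```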

2.2. The Theoretical Comparison for the Estimators

The following lemmas are needed for the theoretical comparison of the estimators.

Lemma 1. Let M be an n × n positive definite (p.d.) matrix and N be a nonnegative definite matrix. Then, M − N is p.d. iff λ_max(NM⁻¹) < 1, where λ_max(NM⁻¹) is the maximum eigenvalue of the matrix NM⁻¹ [20].

Lemma 2. Let M be an n × n p.d. matrix and α be a nonzero vector. Then, M − αα′ is p.d. iff α′M⁻¹α < 1 [21].

Lemma 3. Let β̂_i = A_i y, i = 1, 2, be two linear estimators of β. Suppose that D = Cov(β̂₁) − Cov(β̂₂) is p.d., where Cov(β̂_i) is the covariance matrix of β̂_i, and let b_i = Bias(β̂_i), i = 1, 2. Consequently, MMSE(β̂₁) − MMSE(β̂₂) = D + b₁b₁′ − b₂b₂′ is p.d. iff b₂′(D + b₁b₁′)⁻¹b₂ < 1 [22].

2.2.1. Comparison of GKL and MLE

Theorem 1. β̂_GKL is better than β̂_MLE if b′(Cov(β̂_MLE) − Cov(β̂_GKL))⁻¹b < 1, where b = Bias(β̂_GKL) = −2k(X′ŴX + kI)⁻¹β.

Proof. The difference of the dispersion matrices is

Cov(β̂_MLE) − Cov(β̂_GKL) = φ Q diag(1/λ_j − (λ_j − k)²/(λ_j(λ_j + k)²)) Q′.

We observe that this difference is positive definite (p.d.) since (λ_j + k)² − (λ_j − k)² = 4kλ_j > 0 for k > 0. By Lemma 3, the proof is done.

2.2.2. Comparison of GKL and GRE

Theorem 2. β̂_GKL is superior to β̂_GRE if λ_max(b₂b₂′M⁻¹) < 1, where M = Cov(β̂_GRE) − Cov(β̂_GKL) + b₁b₁′, b₁ = Bias(β̂_GRE) = −k(X′ŴX + kI)⁻¹β, and b₂ = Bias(β̂_GKL) = −2k(X′ŴX + kI)⁻¹β.

Proof. MMSE(β̂_GRE) − MMSE(β̂_GKL) = Cov(β̂_GRE) − Cov(β̂_GKL) + b₁b₁′ − b₂b₂′, where

Cov(β̂_GRE) − Cov(β̂_GKL) = φ Q diag((λ_j² − (λ_j − k)²)/(λ_j(λ_j + k)²)) Q′ = φ Q diag(k(2λ_j − k)/(λ_j(λ_j + k)²)) Q′.

Clearly, for the biasing parameter 0 < k < 2λ_j, j = 1, 2, …, p, this difference is p.d., and therefore M = Cov(β̂_GRE) − Cov(β̂_GKL) + b₁b₁′ is p.d. as well. Hence, MMSE(β̂_GRE) − MMSE(β̂_GKL) = M − b₂b₂′ is p.d. if λ_max(b₂b₂′M⁻¹) < 1, where λ_max(b₂b₂′M⁻¹) is the maximum eigenvalue of the matrix b₂b₂′M⁻¹. By Lemma 1, the proof is done.

2.2.3. Comparison of GKL and GLE

Theorem 3. β̂_GKL is superior to β̂_GLE if b₂′(D + b₁b₁′)⁻¹b₂ < 1, where D = Cov(β̂_GLE) − Cov(β̂_GKL), b₁ = Bias(β̂_GLE) = (d − 1)(X′ŴX + I)⁻¹β, and b₂ = Bias(β̂_GKL) = −2k(X′ŴX + kI)⁻¹β.

Proof. The difference of the dispersion matrices is

D = Cov(β̂_GLE) − Cov(β̂_GKL) = φ Q diag((λ_j + d)²/(λ_j(λ_j + 1)²) − (λ_j − k)²/(λ_j(λ_j + k)²)) Q′.

We observe that D is p.d. since (λ_j + d)(λ_j + k) − (λ_j − k)(λ_j + 1) = λ_j(2k + d − 1) + k(1 + d) > 0 for 0 < d < 1 and k > 0 with 2k + d ≥ 1. By Lemma 3, the proof is done.

2.2.4. Estimation of Parameter k

The optimal value of k in β̂_GKL is adopted from the KL estimator of the study of Kibria and Lukman [17] as follows:

k_j = φ / (2α_j² + φ/λ_j),  j = 1, 2, …, p.

This optimal value of k depends on the unknown parameters φ and α_j². Therefore, we replace them with their corresponding unbiased estimators. Consequently,

k̂ = min_j [ φ̂ / (2α̂_j² + φ̂/λ_j) ].
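The closed-form value of k can be checked numerically: minimizing the jth MSE term of the GKL estimator over a fine grid reproduces the per-coordinate minimizer k_j = φ/(2α_j² + φ/λ_j) (illustrative λ, α, φ values).

```python
import numpy as np

def mse_term(k, lam, alpha, phi):
    """jth-coordinate MSE contribution of the GKL estimator."""
    return (phi * (lam - k) ** 2 / (lam * (lam + k) ** 2)
            + 4 * k**2 * alpha**2 / (lam + k) ** 2)

lam, alpha, phi = 2.0, 0.7, 0.5
k_opt = phi / (2 * alpha**2 + phi / lam)   # closed-form minimizer
ks = np.linspace(1e-4, 2.0, 200001)
k_grid = ks[np.argmin(mse_term(ks, lam, alpha, phi))]
```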

3. Simulation Design

The simulation design of this study was implemented in the R programming language (version 3.4.1). Following Algamal and Asar [19], the response variable is generated from the gamma distribution with mean

μ_i = exp(x_i′β),  i = 1, 2, …, n,

and dispersion parameter φ. The parameter vector β is chosen such that β′β = 1 [1, 23, 24]. Following Kibria [25] and Kibria and Banik [26], the explanatory variables are obtained as follows:

x_ij = (1 − ρ²)^(1/2) z_ij + ρ z_{i,p+1},  i = 1, 2, …, n,  j = 1, 2, …, p,

where the z_ij are independent standard normal pseudo-random numbers and ρ governs the correlation between the explanatory variables. The values of ρ in this study are chosen to be 0.95, 0.99, and 0.999. We obtained the mean function for p = 4 and 7 explanatory variables for the sample sizes n = 20, 50, and 200. The mean square error (MSE) of each estimator is computed over the replications by using the following equation:

MSE(β̂) = (1/R) Σ_{r=1}^R (β̂^(r) − β)′(β̂^(r) − β),

where β̂^(r) is the estimate in the rth of R replications and β̂ is any of the estimators (MLE, GRE, GLE, and GKL). The smaller the mean square error value, the better the estimator. The biasing parameters for GRE and GLE are estimated following Algamal [3] and Algamal and Asar [19].
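The design above can be sketched end-to-end (in Python rather than R; a simplified illustration, not the authors' code). It assumes a log link with constant scoring weights, a Pearson-type dispersion estimate, and a min-rule plug-in for k (one plausible operational choice, flagged as an assumption). Note that under the scheme x_ij = sqrt(1 − ρ²)z_ij + ρz_{i,p+1}, the pairwise correlation between predictors is ρ².

```python
import numpy as np

rng = np.random.default_rng(2021)

def gen_predictors(n, p, rho):
    # x_ij = sqrt(1 - rho^2) z_ij + rho z_{i,p+1}; pairwise corr = rho^2
    Z = rng.standard_normal((n, p + 1))
    return np.sqrt(1 - rho**2) * Z[:, :p] + rho * Z[:, [p]]

def gamma_mle(X, y, iters=50):
    # IRLS for a log-link gamma regression (constant scoring weights)
    beta, *_ = np.linalg.lstsq(X, np.log(y), rcond=None)
    for _ in range(iters):
        mu = np.exp(X @ beta)
        z = X @ beta + (y - mu) / mu
        beta, *_ = np.linalg.lstsq(X, z, rcond=None)
    return beta

n, p, rho, phi = 50, 4, 0.99, 0.5
beta = np.full(p, 1 / np.sqrt(p))        # beta'beta = 1
R = 200
se_mle = se_gkl = 0.0
for _ in range(R):
    X = gen_predictors(n, p, rho)
    mu = np.exp(X @ beta)
    y = rng.gamma(1 / phi, mu * phi)     # mean mu, dispersion phi
    b_mle = gamma_mle(X, y)
    S = X.T @ X
    lam, Q = np.linalg.eigh(S)
    alpha = Q.T @ b_mle
    mu_hat = np.exp(X @ b_mle)
    phi_hat = np.sum((y - mu_hat) ** 2 / mu_hat**2) / (n - p)  # Pearson-type
    k = np.min(phi_hat / (2 * alpha**2 + phi_hat / lam))       # assumed plug-in rule
    b_gkl = np.linalg.solve(S + k * np.eye(p), (S - k * np.eye(p)) @ b_mle)
    se_mle += np.sum((b_mle - beta) ** 2)
    se_gkl += np.sum((b_gkl - beta) ** 2)
mse_mle, mse_gkl = se_mle / R, se_gkl / R

# empirical pairwise correlation of the generated predictors
C = np.corrcoef(gen_predictors(5000, p, rho), rowvar=False)
corr_hat = np.mean(C[np.triu_indices(p, 1)])
```

Under this severe-multicollinearity setting, the shrunken GKL estimate attains a smaller empirical MSE than the MLE, mirroring the pattern reported in Tables 1 and 2.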

We examined two shrinkage parameters, k̂1 and k̂2, for the proposed estimator.

The simulation results for different values of n, φ, and ρ are presented in Tables 1 and 2 for p = 4 and 7, respectively. For a graphical representation, we also plotted MSE vs n, ρ, φ, and p in Figure 1.


Table 1: Estimated MSE values of the estimators when p = 4.

φ     n     ρ        MLE       GRE-k     GLE-d     GKL(k̂1)   GKL(k̂2)
0.5   20    0.95      2.008     0.949     1.643     1.193      0.942
            0.99      8.195     2.761     7.156     4.083      2.018
            0.999    78.599    23.305    75.070    37.119     17.929
      50    0.95      1.265     0.643     1.025     0.763      0.601
            0.99      4.277     1.257     3.532     1.799      1.102
            0.999    38.172     8.044    35.320    13.298      7.051
      200   0.95      0.544     0.444     0.478     0.459      0.435
            0.99      0.923     0.467     0.682     0.551      0.463
            0.999     5.068     0.554     4.067     1.522      0.545
1     20    0.95      3.514     1.758     3.113     2.025      1.357
            0.99     15.677     6.753    14.558     8.226      4.568
            0.999   154.076    63.790   150.439    79.217     61.203
      50    0.95      2.671     1.528     2.406     1.655      1.155
            0.99     11.034     5.410    10.200     6.003      2.205
            0.999   105.109    48.863   102.240    54.610     26.562
      200   0.95      0.628     0.449     0.546     0.473      0.445
            0.99      1.392     0.504     1.050     0.683      0.463
            0.999     9.837     3.220     8.355     2.948      1.276


Table 2: Estimated MSE values of the estimators when p = 7.

φ     n     ρ        MLE       GRE-k     GLE-d     GKL(k̂1)   GKL(k̂2)
0.5   20    0.95      4.049     2.193     3.473     2.784      2.165
            0.99     17.213     6.962    15.174    10.464      6.451
            0.999   172.420    63.921   164.530   102.441     55.631
      50    0.95      2.393     1.525     2.188     1.800      1.520
            0.99      7.742     3.192     7.036     4.588      2.509
            0.999    69.729    22.843    67.015    36.936     22.786
      200   0.95      1.375     1.155     1.282     1.252      1.103
            0.99      2.131     1.210     1.750     1.561      1.207
            0.999     9.941     1.658     8.325     4.507      1.431
1     20    0.95      7.397     4.424     6.884     5.075      3.476
            0.99     34.889    19.071    33.216    22.709     11.262
            0.999   356.808   192.852   350.583   231.657    123.844
      50    0.95      4.790     3.348     4.651     3.564      2.779
            0.99     19.784    12.398    19.291    13.428      5.905
            0.999   191.838   116.591   189.700   126.654     35.276
      200   0.95      1.644     1.462     1.549     1.402      1.348
            0.99      3.269     1.583     2.839     2.125      1.437
            0.999    20.402     4.716    18.550     9.311      4.049

Tables 1 and 2 and Figure 1 show that the MSE increases as the level of multicollinearity increases, keeping the other factors constant. For instance, when n = 50, φ = 0.5, and p = 4, the MSE of the MLE increases from 1.265 to 38.172 as the level of multicollinearity ρ rises from 0.95 to 0.999. We also observe that the MSE increases as the number of explanatory variables grows from p = 4 to p = 7, the other factors being kept constant. For instance, when n = 20, ρ = 0.99, and φ = 1, the MSE of GRE-k rises from 6.753 to 19.071. Conversely, when the other factors are fixed, increasing the sample size n decreases the MSE of all the estimators; for example, for p = 7, φ = 0.5, and ρ = 0.95, the MSE of GLE-d falls from 2.188 at n = 50 to 1.282 at n = 200. Furthermore, the MSE increases as the dispersion parameter φ increases from 0.5 to 1. The maximum likelihood estimator performs worst, as expected, because of the effect of multicollinearity on the estimator. The results in Tables 1 and 2 and Figure 1 show that GKL outperforms the other estimators. Since the performance of the proposed GKL estimator depends on its biasing parameter, we examined two different biasing parameters and observed that the GKL estimator performs best with the second one, k̂2. The simulation results further support the theoretical finding that the GKL estimator performs best. The GRE and GLE both perform better than the MLE. Furthermore, we explore the performance of the proposed estimator and the existing estimators by analyzing real-life data in Section 4.

4. Real-Life Data: Algamal Data

The chemical dataset adopted in this study was employed in the studies of Algamal [3] and Algamal and Asar [19]. They used the quantitative structure-activity relationship (QSAR) model to study the relationship between the biological activities of 65 imidazo[4,5-b]pyridine derivatives (anticancer compounds) and 15 molecular descriptors. The QSAR model is widely used in the chemical sciences, the biological sciences, and engineering. The linear regression model is popularly used to model the QSAR relationship between the response variable (biological activity) and one or more physiochemical or structural properties serving as explanatory variables, especially when the response variable is normally distributed [27]. However, the gamma regression model is employed when the response variable is skewed [3, 19, 24, 28]. In this study, following Algamal [3, 19], the variables of interest are described in Table 3.


Variable name    Description
Mor21v           Signal 21 / weighted by van der Waals volume
Mor21e           Signal 21 / weighted by Sanderson electronegativity
IC3              Information content index
MW               Molecular weight
SpMaxA_D         Normalized leading eigenvalue from topological distance matrix
ATS8v            Broto–Moreau autocorrelation of lag 8 weighted by van der Waals volume
GATS4p           Geary autocorrelation of lag 4 weighted by polarizability
SpMax8_Bh(p)     Largest eigenvalue n. 8 of Burden matrix weighted by polarizability
SpMax3_Bh(s)     Largest eigenvalue n. 3 of Burden matrix weighted by I-state
P_VSA_e_3        P_VSA-like on Sanderson electronegativity, bin 3
TDB08m           3D topological distance-based descriptor, lag 8 weighted by mass
RDF100m          Radial distribution function, 100 / weighted by mass
MATS7v           Moran autocorrelation of lag 7 weighted by van der Waals volume
MATS2s           Moran autocorrelation of lag 2 weighted by I-state
HATS6v           Leverage-weighted autocorrelation of lag 6 / weighted by van der Waals volume

According to Algamal [3] and Algamal and Asar [19], the response variable, y, follows a gamma distribution. Using the chi-square goodness-of-fit test, the authors verified that the response variable is well fitted by the gamma distribution, with a test statistic (p value) of 9.3657 (0.07521). Algamal and Asar [19] reported that the correlation coefficients between the pairs Mor21v and Mor21e, SpMax3_Bh(s) and ATS8v, SpMaxA_D and MW, and MW and ATS8v exceed 0.9, indicating high correlation. The eigenvalues of X′ŴX are 7.6687E + 8, 1.3238E + 6, 85791, 5523.6, 358.71, 250.51, 148.46, 42.731, 27.239, 18.015, 9.1197, 8.6175, 5.7748, 2.4292, 1.6532, and 0.3659, respectively. Thus, the condition number is CN = sqrt(λ_max/λ_min) = 45777.7, which indicates the presence of severe multicollinearity [19]. The results of the gamma regression model and the mean square errors are presented in Table 4.
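The reported condition number follows directly from CN = sqrt(λ_max/λ_min) applied to the listed eigenvalues; a quick check (the small discrepancy comes from the eigenvalues being rounded in the text):

```python
import numpy as np

# eigenvalues of X'WX reported for the chemical data
eig = np.array([7.6687e8, 1.3238e6, 85791, 5523.6, 358.71, 250.51, 148.46,
                42.731, 27.239, 18.015, 9.1197, 8.6175, 5.7748, 2.4292,
                1.6532, 0.3659])
cn = np.sqrt(eig.max() / eig.min())   # condition number
print(round(cn, 1))
```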


Table 4: Estimated regression coefficients and MSE values for the chemical data.

Coef.           MLE       GRE-k     GLE-d     GKL(k̂1)   GKL(k̂2)
Intercept      −0.1568   −0.1597   −0.1568   −0.1624    −0.1573
MW              0.0158    0.0155    0.0158    0.0155     0.0148
IC3             0.8251    0.8254    0.8251    0.8255     0.8260
SpMaxA_D       −0.4681   −0.4418   −0.4681   −0.4407    −0.3816
ATS8v          −2.3347   −2.3161   −2.3347   −2.3165    −2.2691
MATS7v         −1.1565   −1.1382   −1.1565   −1.1392    −1.0903
MATS2s         −2.2127   −2.1479   −2.2127   −2.1452    −1.9987
GATS4p         −2.7097   −2.6510   −2.7097   −2.6511    −2.5068
SpMax8_Bh(p)    2.8041    2.7426    2.8041    2.7425     2.5930
SpMax3_Bh(s)    0.4082    0.3994    0.4082    0.3991     0.3790
P_VSA_e_3       0.0016    0.0017    0.0016    0.0017     0.0020
TDB08m         −1.3127   −1.1859   −1.3127   −1.1811    −0.8954
RDF100m        −0.0004   −0.0004   −0.0004   −0.0005    −0.0006
Mor21v         −0.8682   −0.8448   −0.8682   −0.8446    −0.7882
Mor21e         −0.0504   −0.0593   −0.0504   −0.0597    −0.0795
HATS6v         −0.5290   −0.4030   −0.5290   −0.3803    −0.1723
k or d          —         0.0077    0.9999    0.0824     0.2871
MSE             5.5599    3.5062    5.5599    3.2351     1.6397

The results in Table 4 agree with the simulation results. The MLE performs worst, possessing the highest MSE. The proposed estimator with biasing parameter k̂2 has the least mean square error, followed, in that order, by GKL with k̂1, GRE-k, and GLE-d. Recall that in the simulation study, GKL with k̂2 as the shrinkage parameter also performed best.

5. Some Concluding Remarks

The Kibria–Lukman (KL) estimator [17] was developed to circumvent the problem of multicollinearity in the linear regression model. This estimator belongs to the class of ridge and Liu-type regression estimators and has a single biasing parameter. In the gamma regression model, multicollinearity likewise threatens the performance of the maximum likelihood estimator (MLE) of the regression coefficients. The gamma ridge estimator (GRE) and the gamma Liu estimator (GLE) were introduced in previous studies to mitigate the problem of multicollinearity. Since Kibria and Lukman [17] showed that the KL estimator outperforms the ridge and Liu estimators in the linear regression model, we were motivated to develop the gamma KL (GKL) estimator for effective estimation in the GRM. We derived the statistical properties of the GKL estimator and compared it theoretically with the MLE, GRE, and GLE. Furthermore, a simulation study and a chemical data analysis were conducted in support of the theoretical results. Both the simulation and the application show that the GKL estimator with k̂2 as the shrinkage parameter performs best. In conclusion, the use of the GKL estimator is preferred when multicollinearity exists in the gamma regression model.

Data Availability

The data used to support the findings of this study are available upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

References

  1. M. Amin, M. Qasim, and M. Amanullah, “Performance of Asar and Genç and Huang and Yang’s two-parameter estimation methods for the gamma regression model,” Iranian Journal of Science and Technology, Transactions A: Science, vol. 43, no. 6, pp. 2951–2963, 2019. View at: Publisher Site | Google Scholar
  2. A. M. Al-Abood and D. H. Young, “Improved deviance goodness of fit statistics for a gamma regression model,” Communications in Statistics-Theory and Methods, vol. 15, no. 6, pp. 1865–1874, 1986. View at: Publisher Site | Google Scholar
  3. Z. Y. Algamal, “Developing a ridge estimator for the gamma regression model,” Journal of Chemometrics, vol. 32, no. 10, p. e3054, 2018. View at: Publisher Site | Google Scholar
  4. M. Wasef Hattab, “A derivation of prediction intervals for gamma regression,” Journal of Statistical Computation and Simulation, vol. 86, no. 17, pp. 3512–3526, 2016. View at: Publisher Site | Google Scholar
  5. E. Dunder, S. Gumustekin, and M. A. Cengiz, “Variable selection in gamma regression models via artificial bee colony algorithm,” Journal of Applied Statistics, vol. 45, no. 1, pp. 8–16, 2016. View at: Publisher Site | Google Scholar
  6. S. Perez-Melo and B. M. G. Kibria, “On some test statistics for testing the regression coefficients in presence of multicollinearity: a simulation study,” Stats, vol. 3, no. 1, pp. 40–55, 2020. View at: Publisher Site | Google Scholar
  7. A. E. Hoerl and R. W. Kennard, “Ridge regression: biased estimation for nonorthogonal problems,” Technometrics, vol. 12, no. 1, pp. 55–67, 1970. View at: Publisher Site | Google Scholar
  8. R. L. Schaefer, L. D. Roi, and R. A. Wolfe, “A ridge logistic estimator,” Communications in Statistics-Theory and Methods, vol. 13, no. 1, pp. 99–113, 1984. View at: Publisher Site | Google Scholar
  9. B. Segerstedt, “On ordinary ridge regression in generalized linear models,” Communications in Statistics-Theory and Methods, vol. 21, no. 8, pp. 2227–2246, 1992. View at: Publisher Site | Google Scholar
  10. K. Månsson and G. Shukur, “A Poisson ridge regression estimator,” Economic Modelling, vol. 28, no. 4, pp. 1475–1481, 2011. View at: Publisher Site | Google Scholar
  11. K. Månsson, “On ridge estimators for the negative binomial regression model,” Economic Modelling, vol. 29, no. 2, pp. 178–184, 2012. View at: Publisher Site | Google Scholar
  12. F. Kurtoglu and M. R. Ozkale, “Liu estimation in generalized linear models: application on gamma distributed response variable,” Statistical Papers, vol. 57, no. 4, pp. 911–928, 2016. View at: Google Scholar
  13. K. Liu, “A new class of biased estimate in linear regression,” Communications in Statistics Theory and Methods, vol. 22, no. 2, pp. 393–402, 1993. View at: Google Scholar
  14. F. S. M. Batah, T. V. Ramanathan, and S. D. Gore, “The efficiency of modified jackknife and ridge type regression estimators-a comparison,” Surveys in Mathematics and Its Applications, vol. 3, pp. 111–122, 2008. View at: Publisher Site | Google Scholar
  15. A. F. Lukman, K. Ayinde, S. Binuomote, and O. A. Clement, “Modified ridge‐type estimator to combat multicollinearity: application to chemical data,” Journal of Chemometrics, vol. 33, no. 5, p. e3125, 2019. View at: Publisher Site | Google Scholar
  16. A. F. Lukman, K. Ayinde, B. M. G. Kibria, and E. T. Adewuyi, “Modified ridge-type estimator for the gamma regression model,” Communications in Statistics-Simulation and Computation, pp. 1–15, 2020. View at: Publisher Site | Google Scholar
  17. B. M. G. Kibria and A. F. Lukman, “A new ridge-type estimator for the linear regression model: simulations and applications,” Scientifica, vol. 2020, Article ID 9758378, 16 pages, 2020. View at: Publisher Site | Google Scholar
  18. J. W. Hardin and J. M. Hilbe, Generalized Linear Models and Extensions, Stata Press, College Station, TX, USA, 2012.
  19. Z. Y. Algamal and Y. Asar, “Liu-type estimator for the gamma regression model,” Communications in Statistics-Simulation and Computation, vol. 49, no. 8, pp. 2035–2048, 2018. View at: Publisher Site | Google Scholar
  20. S. G. Wang, M. X. Wu, and Z. Z. Jia, Matrix Inequalities, Chinese Science Press, Beijing, China, 2nd edition, 2006.
  21. R. W. Farebrother, “Further results on the mean square error of ridge regression,” Journal of the Royal Statistical Society: Series B (Methodological), vol. 38, no. 3, pp. 248–250, 1976. View at: Publisher Site | Google Scholar
  22. G. Trenkler and H. Toutenburg, “Mean squared error matrix comparisons between biased estimators-an overview of recent results,” Statistical Papers, vol. 31, no. 1, pp. 165–179, 1990. View at: Publisher Site | Google Scholar
  23. A. F. Lukman, K. Ayinde, S. K. Sek, and E. Adewuyi, “A modified new two-parameter estimator in a linear regression model,” Modelling and Simulation in Engineering, vol. 2019, Article ID 6342702, 10 pages, 2019. View at: Publisher Site | Google Scholar
  24. A. F. Lukman, K. Ayinde, B. Aladeitan, and R. Bamidele, “An unbiased estimator with prior information,” Arab Journal of Basic and Applied Sciences, vol. 27, no. 1, pp. 45–55, 2020. View at: Publisher Site | Google Scholar
  25. B. M. G. Kibria, “Performance of some new ridge regression estimators,” Communications in Statistics-Simulation and Computation, vol. 32, no. 1, pp. 419–435, 2003. View at: Publisher Site | Google Scholar
  26. B. M. G. Kibria and S. Banik, “Some ridge regression estimators and their performances,” Journal of Modern Applied Statistical Methods, vol. 15, no. 1, pp. 206–238, 2016. View at: Publisher Site | Google Scholar
  27. Z. Y. Algamal and M. H. Lee, “A novel molecular descriptor selection method in QSAR classification model based on weighted penalized logistic regression,” Journal of Chemometrics, vol. 31, no. 10, p. e2915, 2017. View at: Publisher Site | Google Scholar
  28. A. F. Lukman, Z. Y. Algamal, B. M. G. Kibria, and K. Ayinde, “The KL estimator for the inverse Gaussian regression model,” Concurrency and Computation: Practice and Experience, p. e6222, 2021, in press. View at: Publisher Site | Google Scholar

Copyright © 2021 Adewale F. Lukman et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
