BioMed Research International
Volume 2015 (2015), Article ID 143712, 18 pages
http://dx.doi.org/10.1155/2015/143712
Review Article

The Current and Future Use of Ridge Regression for Prediction in Quantitative Genetics

¹Erasmus University Rotterdam Institute for Behavior and Biology, Department of Applied Economics, Erasmus School of Economics, Erasmus University Rotterdam, Postbus 1738, 3000 DR Rotterdam, Netherlands
²Econometric Institute, Erasmus School of Economics, Erasmus University Rotterdam, Postbus 1738, 3000 DR Rotterdam, Netherlands

Received 28 November 2014; Accepted 24 December 2014

Academic Editor: Junwen Wang

Copyright © 2015 Ronald de Vlaming and Patrick J. F. Groenen. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

References

  1. P. D. P. Pharoah, A. Antoniou, M. Bobrow, R. L. Zimmern, D. F. Easton, and B. A. J. Ponder, “Polygenic susceptibility to breast cancer and implications for prevention,” Nature Genetics, vol. 31, no. 1, pp. 33–36, 2002.
  2. J. B. Meigs, P. Shrader, L. M. Sullivan et al., “Genotype score in addition to common risk factors for prediction of type 2 diabetes,” The New England Journal of Medicine, vol. 359, no. 21, pp. 2208–2219, 2008.
  3. S. M. Purcell, N. R. Wray, J. L. Stone et al., “Common polygenic variation contributes to risk of schizophrenia and bipolar disorder,” Nature, vol. 460, no. 7256, pp. 748–752, 2009.
  4. J. W. Smoller, K. Kendler, N. Craddock et al., “Identification of risk loci with shared effects on five major psychiatric disorders: a genome-wide analysis,” The Lancet, vol. 381, no. 9875, pp. 1371–1379, 2013.
  5. C. A. Rietveld, S. E. Medland, J. Derringer et al., “GWAS of 126,559 individuals identifies genetic variants associated with educational attainment,” Science, vol. 340, no. 6139, pp. 1467–1471, 2013.
  6. C. A. Rietveld, T. Esko, G. Davies et al., “Common genetic variants associated with cognitive performance identified using the proxy-phenotype method,” Proceedings of the National Academy of Sciences of the United States of America, vol. 111, no. 38, pp. 13790–13794, 2014.
  7. S. M. Purcell, B. Neale, K. Todd-Brown et al., “PLINK: a tool set for whole-genome association and population-based linkage analyses,” The American Journal of Human Genetics, vol. 81, no. 3, pp. 559–575, 2007.
  8. D. M. Evans, P. M. Visscher, and N. R. Wray, “Harnessing the information contained within genome-wide association studies to improve individual prediction of complex disease risk,” Human Molecular Genetics, vol. 18, no. 18, pp. 3525–3531, 2009.
  9. A. E. Hoerl and R. W. Kennard, “Ridge regression: biased estimation for nonorthogonal problems,” Technometrics, vol. 12, no. 1, pp. 55–67, 1970.
  10. N. Malo, O. Libiger, and N. J. Schork, “Accommodating linkage disequilibrium in genetic-association analyses via ridge regression,” The American Journal of Human Genetics, vol. 82, no. 2, pp. 375–385, 2008.
  11. G. Abraham, A. Kowalczyk, J. Zobel, and M. Inouye, “Performance and robustness of penalized and unpenalized methods for genetic prediction of complex human disease,” Genetic Epidemiology, vol. 37, no. 2, pp. 184–195, 2013.
  12. O. González-Recio, D. Gianola, N. Long, K. A. Weigel, G. J. M. Rosa, and S. Avendaño, “Nonparametric methods for incorporating genomic information into genetic evaluations: an application to mortality in broilers,” Genetics, vol. 178, no. 4, pp. 2305–2313, 2008.
  13. D. Gianola and J. B. C. H. M. van Kaam, “Reproducing kernel Hilbert spaces regression methods for genomic assisted prediction of quantitative traits,” Genetics, vol. 178, no. 4, pp. 2289–2303, 2008.
  14. G. de los Campos, D. Gianola, and G. J. M. Rosa, “Reproducing kernel Hilbert spaces regression: a general framework for genetic evaluation,” Journal of Animal Science, vol. 87, no. 6, pp. 1883–1887, 2009.
  15. J. Crossa, G. de los Campos, P. Pérez et al., “Prediction of genetic values of quantitative traits in plant breeding using pedigree and molecular markers,” Genetics, vol. 186, no. 2, pp. 713–724, 2010.
  16. J. B. Endelman, “Ridge regression and other kernels for genomic selection with R package rrBLUP,” The Plant Genome Journal, vol. 4, no. 3, pp. 250–255, 2011.
  17. G. Morota and D. Gianola, “Kernel-based whole-genome prediction of complex traits: a review,” Frontiers in Genetics, vol. 5, article 363, 2014.
  18. M. Yuan and Y. Lin, “Model selection and estimation in regression with grouped variables,” Journal of the Royal Statistical Society. Series B. Statistical Methodology, vol. 68, no. 1, pp. 49–67, 2006.
  19. H. Zou, “The adaptive lasso and its oracle properties,” Journal of the American Statistical Association, vol. 101, no. 476, pp. 1418–1429, 2006.
  20. H. Zou and T. Hastie, “Regularization and variable selection via the elastic net,” Journal of the Royal Statistical Society. Series B. Statistical Methodology, vol. 67, no. 2, pp. 301–320, 2005.
  21. A. Benner, M. Zucknick, T. Hielscher, C. Ittrich, and U. Mansmann, “High-dimensional Cox models: the choice of penalty as part of the model building process,” Biometrical Journal, vol. 52, no. 1, pp. 50–69, 2010.
  22. J. O. Ogutu, T. Schulz-Streeck, and H. P. Piepho, “Genomic selection using regularized linear regression models: ridge regression, lasso, elastic net and their extensions,” BMC Proceedings, vol. 6, supplement 2, article S10, 2012.
  23. I. E. Frank and J. H. Friedman, “A statistical view of some chemometrics regression tools,” Technometrics, vol. 35, no. 2, pp. 109–135, 1993.
  24. H. M. Bøvelstad, S. Nygård, H. L. Størvold et al., “Predicting survival from microarray data—a comparative study,” Bioinformatics, vol. 23, no. 16, pp. 2080–2087, 2007.
  25. W. N. van Wieringen, D. Kun, R. Hampel, and A.-L. Boulesteix, “Survival prediction using gene expression data: a review and comparison,” Computational Statistics & Data Analysis, vol. 53, no. 5, pp. 1590–1603, 2009.
  26. M. G. Usai, M. E. Goddard, and B. J. Hayes, “LASSO with cross-validation for genomic selection,” Genetics Research, vol. 91, no. 6, pp. 427–436, 2009.
  27. J. C. Whittaker, R. Thompson, and M. C. Denham, “Marker-assisted selection using ridge regression,” Genetical Research, vol. 75, no. 2, pp. 249–252, 2000.
  28. J. Yang, S. H. Lee, M. E. Goddard, and P. M. Visscher, “GCTA: a tool for genome-wide complex trait analysis,” The American Journal of Human Genetics, vol. 88, no. 1, pp. 76–82, 2011.
  29. H. D. Patterson and R. Thompson, “Recovery of inter-block information when block sizes are unequal,” Biometrika, vol. 58, no. 3, pp. 545–554, 1971.
  30. A. P. Dempster, N. M. Laird, and D. B. Rubin, “Maximum likelihood from incomplete data via the EM algorithm,” Journal of the Royal Statistical Society Series B: Methodological, vol. 39, no. 1, pp. 1–38, 1977.
  31. N. Hofheinz, D. Borchardt, K. Weissleder, and M. Frisch, “Genome-based prediction of test cross performance in two subsequent breeding cycles,” Theoretical and Applied Genetics, vol. 125, no. 8, pp. 1639–1645, 2012.
  32. C. R. Henderson, “Estimation of genetic parameters,” in Biometrics, vol. 6, pp. 186–187, International Biometric Society, Washington, DC, USA, 1950.
  33. C. R. Henderson, “Estimation of variance and covariance components,” Biometrics, vol. 9, no. 2, pp. 226–252, 1953.
  34. C. R. Henderson, “Selection index and expected genetic advance,” Statistical Genetics and Plant Breeding, vol. 982, pp. 141–163, 1963.
  35. C. R. Henderson, “Best linear unbiased estimation and prediction under a selection model,” Biometrics, vol. 31, no. 2, pp. 423–447, 1975.
  36. C. R. Henderson, “Best linear unbiased prediction of nonadditive genetic merits,” Journal of Animal Science, vol. 60, no. 1, pp. 111–117, 1985.
  37. T. H. E. Meuwissen, B. J. Hayes, and M. E. Goddard, “Prediction of total genetic value using genome-wide dense marker maps,” Genetics, vol. 157, no. 4, pp. 1819–1829, 2001.
  38. L. R. Schaeffer, “Strategy for applying genome-wide selection in dairy cattle,” Journal of Animal Breeding and Genetics, vol. 123, no. 4, pp. 218–223, 2006.
  39. J. Sherman and W. J. Morrison, “Adjustment of an inverse matrix corresponding to a change in one element of a given matrix,” The Annals of Mathematical Statistics, vol. 21, pp. 124–127, 1950.
  40. M. Woodbury, “Inverting modified matrices,” Memorandum Report 42, Statistical Research Group, Princeton University, 1950.
  41. S. R. Searle, G. Casella, and C. E. McCulloch, Variance Components, John Wiley & Sons, Hoboken, NJ, USA, 2006.
  42. B. Efron, T. Hastie, I. Johnstone, and R. Tibshirani, “Least angle regression,” The Annals of Statistics, vol. 32, no. 2, pp. 407–499, 2004.
  43. J. Sabourin, A. B. Nobel, and W. Valdar, “Fine-mapping additive and dominant SNP effects using group-LASSO and fractional resample model averaging,” Genetic Epidemiology, vol. 39, no. 2, pp. 77–88, 2015.
  44. T. Hastie, R. Tibshirani, and J. Friedman, The Elements of Statistical Learning, 2nd edition, Springer, New York, NY, USA, 2009.
  45. T. A. Manolio, F. S. Collins, N. J. Cox et al., “Finding the missing heritability of complex diseases,” Nature, vol. 461, no. 7265, pp. 747–753, 2009.
  46. D. Speed, G. Hemani, M. R. Johnson, and D. J. Balding, “Improved heritability estimation from genome-wide SNPs,” The American Journal of Human Genetics, vol. 91, no. 6, pp. 1011–1021, 2012.
  47. T. Hastie and R. Tibshirani, “Efficient quadratic regularization for expression arrays,” Biostatistics, vol. 5, no. 3, pp. 329–340, 2004.
  48. G. S. Kimeldorf and G. Wahba, “A correspondence between Bayesian estimation on stochastic processes and smoothing by splines,” The Annals of Mathematical Statistics, vol. 41, pp. 495–502, 1970.
  49. C. C. Chang, C. C. Chow, L. C. A. M. Tellier, S. Vattikuti, S. M. Purcell, and J. J. Lee, “Second-generation PLINK: rising to the challenge of larger and richer datasets,” Gigascience, vol. 4, no. 7, 2015.
  50. X. Shen, M. Alam, F. Fikse, and L. Rönnegård, “A novel generalized ridge regression method for quantitative genetics,” Genetics, vol. 193, no. 4, pp. 1255–1268, 2013.
  51. N. Hofheinz and M. Frisch, “Heteroscedastic ridge regression approaches for genome-wide prediction with a focus on computational efficiency and accurate effect estimation,” G3: Genes, Genomes, Genetics, vol. 4, no. 3, pp. 539–546, 2014.
  52. M. A. Aizerman, E. M. Braverman, and L. I. Rozonoer, “Theoretical foundations of the potential function method in pattern recognition learning,” Automation and Remote Control, vol. 25, pp. 821–837, 1964.
  53. N. Aronszajn, “Theory of reproducing kernels,” Transactions of the American Mathematical Society, vol. 68, pp. 337–404, 1950.
  54. D. A. Harville, “Discussion on a section on interpolation and estimation,” in Statistics: An Appraisal, pp. 281–286, 1983.
  55. T. Speed, “[That BLUP is a good thing: the estimation of random effects]: comment,” Statistical Science, vol. 6, no. 1, pp. 42–44, 1991.
  56. H. P. Piepho, “Ridge regression and extensions for genomewide selection in maize,” Crop Science, vol. 49, no. 4, pp. 1165–1176, 2009.
  57. G. Morota, M. Koyama, G. J. M. Rosa, K. A. Weigel, and D. Gianola, “Predicting complex traits using a diffusion kernel on genetic markers with an application to dairy cattle and wheat data,” Genetics Selection Evolution, vol. 45, no. 1, article 17, 2013.
  58. L. Tusell, P. Pérez-Rodríguez, S. Forni, and D. Gianola, “Model averaging for genome-enabled prediction with reproducing kernel Hilbert spaces: a case study with pig litter size and wheat yield,” Journal of Animal Breeding and Genetics, vol. 131, no. 2, pp. 105–115, 2014.
  59. G. Schwarz, “Estimating the dimension of a model,” The Annals of Statistics, vol. 6, no. 2, pp. 461–464, 1978.
  60. F. Dudbridge, “Power and predictive accuracy of polygenic risk scores,” PLoS Genetics, vol. 9, no. 3, Article ID e1003348, 2013.
  61. H. Warren, J.-P. Casas, A. Hingorani, F. Dudbridge, and J. Whittaker, “Genetic prediction of quantitative lipid traits: comparing shrinkage models to gene scores,” Genetic Epidemiology, vol. 38, no. 1, pp. 72–83, 2014.
  62. J. Yang, B. Benyamin, B. P. McEvoy et al., “Common SNPs explain a large proportion of the heritability for human height,” Nature Genetics, vol. 42, no. 7, pp. 565–569, 2010.
  63. F. Dudbridge and A. Gusnanto, “Estimation of significance thresholds for genomewide association scans,” Genetic Epidemiology, vol. 32, no. 3, pp. 227–234, 2008.