Advances in Statistics
Volume 2014, Article ID 502678, 19 pages
http://dx.doi.org/10.1155/2014/502678
Review Article

Entering the Era of Data Science: Targeted Learning and the Integration of Statistics and Computational Data Analysis

Mark J. van der Laan1 and Richard J. C. M. Starmans2

1University of California, Berkeley, 108 Haviland Hall, Berkeley, CA 94720-7360, USA
2Department of Computer Science, Utrecht University, The Netherlands

Received 16 February 2014; Revised 9 July 2014; Accepted 10 July 2014; Published 10 September 2014

Academic Editor: Chin-Shang Li

Copyright © 2014 Mark J. van der Laan and Richard J. C. M. Starmans. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Linked References

  1. M. J. van der Laan and D. Rubin, “Targeted maximum likelihood learning,” The International Journal of Biostatistics, vol. 2, no. 1, 2006.
  2. M. J. van der Laan and S. Rose, Targeted Learning: Causal Inference for Observational and Experimental Data, Springer, New York, NY, USA, 2011.
  3. R. J. C. M. Starmans, “Models, inference and truth: Probabilistic reasoning in the information era,” in Targeted Learning: Causal Inference for Observational and Experimental Data, M. J. van der Laan and S. Rose, Eds., pp. 1–20, Springer, New York, NY, USA, 2011.
  4. M. J. van der Laan, “Estimation based on case-control designs with known prevalence probability,” The International Journal of Biostatistics, vol. 4, no. 1, 2008.
  5. A. Chambaz and M. J. van der Laan, “Targeting the optimal design in randomized clinical trials with binary outcomes and no covariate: theoretical study,” The International Journal of Biostatistics, vol. 7, no. 1, pp. 1–32, 2011, Working paper 258, http://biostats.bepress.com/ucbbiostat/.
  6. A. Chambaz and M. J. van der Laan, “Targeting the optimal design in randomized clinical trials with binary outcomes and no covariate: simulation study,” The International Journal of Biostatistics, vol. 7, no. 1, article 33, 2011, Working paper 258, http://www.bepress.com/ucbbiostat.
  7. M. J. van der Laan, L. B. Balzer, and M. L. Petersen, “Adaptive matching in randomized trials and observational studies,” Journal of Statistical Research, vol. 46, no. 2, pp. 113–156, 2013.
  8. M. J. van der Laan, “Causal inference for networks,” Tech. Rep. 300, University of California, Berkeley, Calif, USA, 2012.
  9. J. Pearl, Causality: Models, Reasoning, and Inference, Cambridge University Press, Cambridge, NY, USA, 2nd edition, 2009.
  10. R. D. Gill, “Non- and semi-parametric maximum likelihood estimators and the von Mises method (part 1),” Scandinavian Journal of Statistics, vol. 16, pp. 97–128, 1989.
  11. A. W. van der Vaart and J. A. Wellner, Weak Convergence and Empirical Processes, Springer Series in Statistics, Springer, New York, NY, USA, 1996.
  12. R. D. Gill, M. J. van der Laan, and J. A. Wellner, “Inefficient estimators of the bivariate survival function for three models,” Annales de l'Institut Henri Poincare, vol. 31, no. 3, pp. 545–597, 1995.
  13. P. J. Bickel, C. A. J. Klaassen, Y. Ritov, and J. Wellner, Efficient and Adaptive Estimation for Semiparametric Models, Springer, 1997.
  14. S. Gruber and M. J. van der Laan, “A targeted maximum likelihood estimator of a causal effect on a bounded continuous outcome,” The International Journal of Biostatistics, vol. 6, no. 1, article 26, 2010.
  15. M. J. van der Laan and S. Dudoit, “Unified cross-validation methodology for selection among estimators and a general cross-validated adaptive epsilon-net estimator: finite sample oracle inequalities and examples,” Tech. Rep., Division of Biostatistics, University of California, Berkeley, Calif, USA, 2003.
  16. A. W. van der Vaart, S. Dudoit, and M. J. van der Laan, “Oracle inequalities for multi-fold cross validation,” Statistics & Decisions, vol. 24, no. 3, pp. 351–371, 2006.
  17. M. J. van der Laan, S. Dudoit, and A. W. van der Vaart, “The cross-validated adaptive epsilon-net estimator,” Statistics & Decisions, vol. 24, no. 3, pp. 373–395, 2006.
  18. M. J. van der Laan, E. Polley, and A. Hubbard, “Super learner,” Statistical Applications in Genetics and Molecular Biology, vol. 6, no. 1, article 25, 2007.
  19. E. C. Polley, S. Rose, and M. J. van der Laan, “Super learning,” in Targeted Learning: Causal Inference for Observational and Experimental Data, M. J. van der Laan and S. Rose, Eds., Springer, New York, NY, USA, 2012.
  20. M. J. van der Laan and J. M. Robins, Unified Methods for Censored Longitudinal Data and Causality, Springer, New York, NY, USA, 2003.
  21. M. L. Petersen and M. J. van der Laan, A General Roadmap for the Estimation of Causal Effects, Division of Biostatistics, University of California, Berkeley, Calif, USA, 2012.
  22. J. Splawa-Neyman, “On the application of probability theory to agricultural experiments,” Statistical Science, vol. 5, no. 4, pp. 465–480, 1990.
  23. D. B. Rubin, “Estimating causal effects of treatments in randomized and non-randomized studies,” Journal of Educational Psychology, vol. 64, pp. 688–701, 1974.
  24. D. B. Rubin, Matched Sampling for Causal Effects, Cambridge University Press, Cambridge, Mass, USA, 2006.
  25. P. W. Holland, “Statistics and causal inference,” Journal of the American Statistical Association, vol. 81, no. 396, pp. 945–960, 1986.
  26. J. Robins, “A new approach to causal inference in mortality studies with a sustained exposure period – application to control of the healthy worker survivor effect,” Mathematical Modelling, vol. 7, no. 9–12, pp. 1393–1512, 1986.
  27. J. M. Robins, “Addendum to “A new approach to causal inference in mortality studies with a sustained exposure period – application to control of the healthy worker survivor effect”,” Computers & Mathematics with Applications, vol. 14, no. 9–12, pp. 923–945, 1987.
  28. J. Robins, “A graphical approach to the identification and estimation of causal parameters in mortality studies with sustained exposure periods,” Journal of Chronic Diseases, vol. 40, supplement 2, pp. 139S–161S, 1987.
  29. A. Rotnitzky, D. Scharfstein, T. L. Su, and J. Robins, “Methods for conducting sensitivity analysis of trials with potentially nonignorable competing causes of censoring,” Biometrics, vol. 57, no. 1, pp. 103–113, 2001.
  30. J. M. Robins, A. Rotnitzky, and D. O. Scharfstein, “Sensitivity analysis for selection bias and unmeasured confounding in missing data and causal inference models,” in Statistical Models in Epidemiology, the Environment and Clinical Trials, IMA Volumes in Mathematics and Its Applications, Springer, Berlin, Germany, 1999.
  31. D. O. Scharfstein, A. Rotnitzky, and J. Robins, “Adjusting for nonignorable drop-out using semiparametric nonresponse models,” Journal of the American Statistical Association, vol. 94, no. 448, pp. 1096–1146, 1999.
  32. I. Diaz and M. J. van der Laan, “Sensitivity analysis for causal inference under unmeasured confounding and measurement error problems,” Tech. Rep., Division of Biostatistics, University of California, Berkeley, Calif, USA, 2012, http://www.bepress.com/ucbbiostat/paper303.
  33. O. Bembom and M. J. van der Laan, “A practical illustration of the importance of realistic individualized treatment rules in causal inference,” Electronic Journal of Statistics, vol. 1, pp. 574–596, 2007.
  34. S. Rose and M. J. van der Laan, “Simple optimal weighting of cases and controls in case-control studies,” The International Journal of Biostatistics, vol. 4, no. 1, 2008.
  35. S. Rose and M. J. van der Laan, “Why match? Investigating matched case-control study designs with causal effect estimation,” The International Journal of Biostatistics, vol. 5, no. 1, article 1, 2009.
  36. S. Rose and M. J. van der Laan, “A targeted maximum likelihood estimator for two-stage designs,” The International Journal of Biostatistics, vol. 7, no. 1, 21 pages, 2011.
  37. K. L. Moore and M. J. van der Laan, “Application of time-to-event methods in the assessment of safety in clinical trials,” in Design, Summarization, Analysis & Interpretation of Clinical Trials with Time-to-Event Endpoints, K. E. Peace, Ed., Chapman & Hall, 2009.
  38. K. L. Moore and M. J. van der Laan, “Covariate adjustment in randomized trials with binary outcomes: targeted maximum likelihood estimation,” Statistics in Medicine, vol. 28, no. 1, pp. 39–64, 2009.
  39. K. L. Moore and M. J. van der Laan, “Increasing power in randomized trials with right censored outcomes through covariate adjustment,” Journal of Biopharmaceutical Statistics, vol. 19, no. 6, pp. 1099–1131, 2009.
  40. O. Bembom, M. L. Petersen, S.-Y. Rhee et al., “Biomarker discovery using targeted maximum likelihood estimation: application to the treatment of antiretroviral resistant HIV infection,” Statistics in Medicine, vol. 28, pp. 152–172, 2009.
  41. R. Neugebauer, M. J. Silverberg, and M. J. van der Laan, “Observational study and individualized antiretroviral therapy initiation rules for reducing cancer incidence in HIV-infected patients,” Tech. Rep. 272, Division of Biostatistics, University of California, Berkeley, Calif, USA, 2010.
  42. E. C. Polley and M. J. van der Laan, “Predicting optimal treatment assignment based on prognostic factors in cancer patients,” in Design, Summarization, Analysis & Interpretation of Clinical Trials with Time-to-Event Endpoints, K. E. Peace, Ed., Chapman & Hall, 2009.
  43. M. Rosenblum, S. G. Deeks, M. van der Laan, and D. R. Bangsberg, “The risk of virologic failure decreases with duration of HIV suppression, at greater than 50% adherence to antiretroviral therapy,” PLoS ONE, vol. 4, no. 9, Article ID e7196, 2009.
  44. M. J. van der Laan and S. Gruber, “Collaborative double robust targeted maximum likelihood estimation,” The International Journal of Biostatistics, vol. 6, no. 1, article 17, 2010.
  45. O. M. Stitelman and M. J. van der Laan, “Collaborative targeted maximum likelihood for time to event data,” Tech. Rep. 260, Division of Biostatistics, University of California, Berkeley, Calif, USA, 2010.
  46. S. Gruber and M. J. van der Laan, “An application of collaborative targeted maximum likelihood estimation in causal inference and genomics,” The International Journal of Biostatistics, vol. 6, no. 1, 2010.
  47. M. Rosenblum and M. J. van der Laan, “Targeted maximum likelihood estimation of the parameter of a marginal structural model,” The International Journal of Biostatistics, vol. 6, no. 2, 2010.
  48. H. Wang, S. Rose, and M. J. van der Laan, “Finding quantitative trait loci genes with collaborative targeted maximum likelihood learning,” Statistics & Probability Letters, vol. 81, no. 7, pp. 792–796, 2011.
  49. I. D. Muñoz and M. J. van der Laan, “Super learner based conditional density estimation with application to marginal structural models,” The International Journal of Biostatistics, vol. 7, no. 1, article 38, 2011.
  50. I. D. Muñoz and M. van der Laan, “Population intervention causal effects based on stochastic interventions,” Biometrics, vol. 68, no. 2, pp. 541–549, 2012.
  51. I. Diaz and M. J. van der Laan, “Sensitivity analysis for causal inference under unmeasured confounding and measurement error problems,” The International Journal of Biostatistics, vol. 9, no. 2, pp. 149–160, 2013.
  52. I. Diaz and M. J. van der Laan, “Assessing the causal effect of policies: an example using stochastic interventions,” The International Journal of Biostatistics, vol. 9, no. 2, pp. 161–174, 2013.
  53. I. Diaz and M. J. van der Laan, “Targeted data adaptive estimation of the causal dose-response curve,” Journal of Causal Inference, vol. 1, no. 2, pp. 171–192, 2013.
  54. O. M. Stitelman and M. J. van der Laan, “Targeted maximum likelihood estimation of effect modification parameters in survival analysis,” The International Journal of Biostatistics, vol. 7, no. 1, article 19, 2011.
  55. M. J. van der Laan, “Targeted maximum likelihood based causal inference: Part I,” The International Journal of Biostatistics, vol. 6, no. 2, Art. pages, 2010.
  56. O. M. Stitelman and M. J. van der Laan, “Targeted maximum likelihood estimation of time-to-event parameters with time-dependent covariates,” Tech. Rep., Division of Biostatistics, University of California, Berkeley, Calif, USA, 2011.
  57. M. Schnitzer, E. Moodie, M. J. van der Laan, R. Platt, and M. Klein, “Modeling the impact of hepatitis C viral clearance on end-stage liver disease in an HIV co-infected cohort with targeted maximum likelihood estimation,” Biometrics, vol. 70, no. 1, pp. 144–152, 2014.
  58. S. Gruber and M. J. van der Laan, “Targeted minimum loss based estimator that outperforms a given estimator,” The International Journal of Biostatistics, vol. 8, no. 1, article 11, 2012.
  59. S. Gruber and M. J. van der Laan, “Consistent causal effect estimation under dual misspecification and implications for confounder selection procedure,” Statistical Methods in Medical Research, 2012.
  60. M. Petersen, J. Schwab, S. Gruber, N. Blaser, M. Schomaker, and M. J. van der Laan, “Targeted minimum loss based estimation of marginal structural working models,” Tech. Rep. 312, University of California, Berkeley, Calif, USA, 2013.
  61. J. Brooks, M. J. van der Laan, D. E. Singer, and A. S. Go, “Targeted minimum loss-based estimation of causal effects in right-censored survival data with time-dependent covariates: warfarin, stroke, and death in atrial fibrillation,” Journal of Causal Inference, vol. 1, no. 2, pp. 235–254, 2013.
  62. J. Brooks, M. J. van der Laan, and A. S. Go, “Targeted maximum likelihood estimation for prediction calibration,” The International Journal of Biostatistics, vol. 8, no. 1, article 30, 2012.
  63. S. Sapp, M. J. van der Laan, and K. Page, “Targeted estimation of variable importance measures with interval-censored outcomes,” Tech. Rep. 307, University of California, Berkeley, Calif, USA, 2013.
  64. R. Neugebauer, J. A. Schmittdiel, and M. J. van der Laan, “Targeted learning in real-world comparative effectiveness research with time-varying interventions,” Tech. Rep. HHSA29020050016I, The Agency for Healthcare Research and Quality, 2013.
  65. S. D. Lendle, M. S. Subbaraman, and M. J. van der Laan, “Identification and efficient estimation of the natural direct effect among the untreated,” Biometrics, vol. 69, no. 2, pp. 310–317, 2013.
  66. S. D. Lendle, B. Fireman, and M. J. van der Laan, “Targeted maximum likelihood estimation in safety analysis,” Journal of Clinical Epidemiology, vol. 66, no. 8, pp. S91–S98, 2013.
  67. M. S. Subbaraman, S. Lendle, M. van der Laan, L. A. Kaskutas, and J. Ahern, “Cravings as a mediator and moderator of drinking outcomes in the COMBINE study,” Addiction, vol. 108, no. 10, pp. 1737–1744, 2013.
  68. S. D. Lendle, B. Fireman, and M. J. van der Laan, “Balancing score adjusted targeted minimum loss-based estimation,” 2013.
  69. W. Zheng, M. L. Petersen, and M. J. van der Laan, “Estimating the effect of a community-based intervention with two communities,” Journal of Causal Inference, vol. 1, no. 1, pp. 83–106, 2013.
  70. W. Zheng and M. J. van der Laan, “Targeted maximum likelihood estimation of natural direct effects,” The International Journal of Biostatistics, vol. 8, no. 1, 2012.
  71. W. Zheng and M. J. van der Laan, “Causal mediation in a survival setting with time-dependent mediators,” Tech. Rep. 295, Division of Biostatistics, University of California, Berkeley, Calif, USA, 2012.
  72. M. Carone, M. Petersen, and M. J. van der Laan, “Targeted minimum loss based estimation of a causal effect using interval censored time to event data,” in Interval Censored Time to Event Data: Methods and Applications, D.-G. Chen, J. Sun, and K. E. Peace, Eds., Chapman & Hall/CRC, New York, NY, USA, 2012.
  73. D. O. Scharfstein, A. Rotnitzky, and J. M. Robins, “Adjusting for nonignorable drop-out using semiparametric nonresponse models, (with discussion and rejoinder),” Journal of the American Statistical Association, vol. 94, pp. 1096–1120, 1999.
  74. H. Bang and J. M. Robins, “Doubly robust estimation in missing data and causal inference models,” Biometrics, vol. 61, no. 4, pp. 962–972, 2005.
  75. A. Chambaz, P. Neuvial, and M. J. van der Laan, “Estimation of a non-parametric variable importance measure of a continuous exposure,” Electronic Journal of Statistics, vol. 6, pp. 1059–1099, 2012.
  76. C. Tuglus and M. J. van der Laan, “Targeted methods for biomarker discovery, the search for a standard,” UC Berkeley Working Paper Series, 2008, http://www.bepress.com/ucbbiostat/paper233/.
  77. C. Tuglus and M. J. van der Laan, “Modified FDR controlling procedure for multi-stage analyses,” Statistical Applications in Genetics and Molecular Biology, vol. 8, no. 1, article 12, 2009.
  78. C. Tuglus and M. J. van der Laan, “Targeted methods for biomarker discoveries,” in Targeted Learning: Causal Inference for Observational and Experimental Data, M. J. van der Laan and S. Rose, Eds., chapter 22, Springer, New York, NY, USA, 2011.
  79. H. Wang, S. Rose, and M. J. van der Laan, “Finding quantitative trait loci genes,” in Targeted Learning: Causal Inference for Observational and Experimental Data, M. J. van der Laan and S. Rose, Eds., chapter 23, Springer, New York, NY, USA, 2011.
  80. L. B. Balzer and M. J. van der Laan, “Estimating effects on rare outcomes: knowledge is power,” Tech. Rep. 310, Division of Biostatistics, University of California, Berkeley, Calif, USA, 2013.
  81. W. Zheng and M. J. van der Laan, “Cross-validated targeted minimum loss based estimation,” in Targeted Learning: Causal Inference for Observational and Experimental Data, M. J. van der Laan and S. Rose, Eds., Springer, New York, NY, USA, 2011.
  82. A. Rotnitzky, Q. Lei, M. Sued, and J. M. Robins, “Improved double-robust estimation in missing data and causal inference models,” Biometrika, vol. 99, no. 2, pp. 439–456, 2012.
  83. D. B. Rubin and M. J. van der Laan, “Empirical efficiency maximization: improved locally efficient covariate adjustment in randomized experiments and survival analysis,” The International Journal of Biostatistics, vol. 4, no. 1, article 5, 2008.
  84. M. J. van der Laan, “Statistical inference when using data adaptive estimators of nuisance parameters,” Tech. Rep. 302, Division of Biostatistics, University of California, Berkeley, Calif, USA, 2012.
  85. M. J. van der Laan and M. L. Petersen, “Targeted learning,” in Ensemble Machine Learning, pp. 117–156, Springer, New York, NY, USA, 2012.
  86. M. J. van der Laan, A. E. Hubbard, and S. Kherad, “Statistical inference for data adaptive target parameters,” Tech. Rep. 314, University of California, Berkeley, Calif, USA, June 2013.
  87. M. J. van der Laan, “Targeted learning of an optimal dynamic treatment and statistical inference for its mean outcome,” Tech. Rep. 317, University of California at Berkeley, 2013, To appear in Journal of Causal Inference.
  88. J. M. Robins, L. Li, E. Tchetgen, and A. W. van der Vaart, “Higher order influence functions and minimax estimation of non-linear functionals,” in Essays in Honor of David A. Freedman, IMS, Collections Probability and Statistics, pp. 335–421, Springer, New York, NY, USA, 2008.
  89. S. Rose, R. J. C. M. Starmans, and M. J. van der Laan, “Targeted learning for causality and statistical analysis in medical research,” Tech. Rep. 297, Division of Biostatistics, University of California, Berkeley, Calif, USA, 2011.
  90. R. J. C. M. Starmans, “Picasso, Hegel and the era of big data,” Stator, vol. 2, no. 24, 2013 (Dutch).
  91. R. J. C. M. Starmans and M. J. van der Laan, “Inferential statistics versus machine learning; a prelude to reconciliation,” Stator, vol. 2, no. 24, 2013 (Dutch).