Applied Computational Intelligence and Soft Computing
Volume 2017, Article ID 5134962, 13 pages
https://doi.org/10.1155/2017/5134962
Research Article

Distributed Nonparametric and Semiparametric Regression on SPARK for Big Data Forecasting

Clausthal University of Technology, Clausthal-Zellerfeld, Germany

Correspondence should be addressed to Jelena Fiosina; jelena.fiosina@gmail.com

Received 22 July 2016; Accepted 22 November 2016; Published 8 March 2017

Academic Editor: Francesco Carlo Morabito

Copyright © 2017 Jelena Fiosina and Maksims Fiosins. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

References

  1. C. Long, Ed., Data Science and Big Data Analytics: Discovering, Analyzing, Visualizing and Presenting Data, John Wiley & Sons, New York, NY, USA, 2015.
  2. J. Dean and S. Ghemawat, “MapReduce: a flexible data processing tool,” Communications of the ACM, vol. 53, no. 1, pp. 72–77, 2010.
  3. Apache Spark: cluster computing framework, http://spark.apache.org/.
  4. D. Peralta, S. del Río, S. Ramírez-Gallego, I. Triguero, J. M. Benitez, and F. Herrera, “Evolutionary feature selection for big data classification: a MapReduce approach,” Mathematical Problems in Engineering, vol. 2015, Article ID 246139, 11 pages, 2015.
  5. “CRAN task view: High-performance and parallel computing with R,” https://cran.r-project.org/web/views/HighPerformanceComputing.html.
  6. SparkR (R on Spark), https://spark.apache.org/docs/latest/sparkr.html.
  7. P. D. Michailidis and K. G. Margaritis, “Accelerating kernel density estimation on the GPU using the CUDA framework,” Applied Mathematical Sciences, vol. 7, no. 29–32, pp. 1447–1476, 2013.
  8. A. Fernández, S. del Río, V. López et al., “Big data with cloud computing: an insight on the computing environment, MapReduce, and programming frameworks,” Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, vol. 4, no. 5, pp. 380–409, 2014.
  9. N. E. Helwig, “Semiparametric regression of big data in R,” in Proceedings of the CSE Big Data Workshop, May 2014.
  10. N. Draper and H. Smith, Applied Regression Analysis, John Wiley & Sons, New York, NY, USA, 1986.
  11. M. S. Srivastava, Methods of Multivariate Statistics, Wiley Series in Probability and Statistics, Wiley-Interscience, New York, NY, USA, 2002.
  12. L. Bottou, “Stochastic gradient descent tricks,” Lecture Notes in Computer Science, vol. 7700, pp. 421–436, 2012.
  13. W. Härdle, Applied Nonparametric Regression, Cambridge University Press, Cambridge, UK, 2002.
  14. W. Härdle, M. Müller, S. Sperlich, and A. Werwatz, Nonparametric and Semiparametric Models, Springer Series in Statistics, Springer-Verlag, New York, NY, USA, 2004.
  15. E. Nadaraya, “On estimating regression,” Theory of Probability and its Applications, vol. 9, no. 1, pp. 141–142, 1964.
  16. G. S. Watson, “Smooth regression analysis,” Sankhyā: The Indian Journal of Statistics, Series A, vol. 26, pp. 359–372, 1964.
  17. B. W. Silverman, Density Estimation for Statistics and Data Analysis, Chapman & Hall, 1986.
  18. D. W. Scott, Multivariate Density Estimation: Theory, Practice, and Visualization, Wiley-Interscience, 1992.
  19. H. Liang, “Estimation in partially linear models and numerical comparisons,” Computational Statistics & Data Analysis, vol. 50, no. 3, pp. 675–687, 2006.
  20. P. Speckman, “Kernel smoothing in partial linear models,” Journal of the Royal Statistical Society, Series B, vol. 50, no. 3, pp. 413–436, 1988.
  21. M. Zaharia, M. Chowdhury, T. Das et al., “Resilient distributed datasets: a fault-tolerant abstraction for in-memory cluster computing,” in Proceedings of the 9th USENIX Conference on Networked Systems Design and Implementation (NSDI '12), USENIX Association, Berkeley, Calif, USA, 2012.
  22. MLlib guide, http://spark.apache.org/docs/latest/mllib-guide.html.
  23. N. Pentreath, Machine Learning with Spark, Packt Publishing Ltd., Birmingham, UK, 2015.
  24. R. B. Zadeh, X. Meng, A. Ulanov et al., “Matrix computations and optimization in Apache Spark,” in Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '16), pp. 31–38, San Francisco, Calif, USA, August 2016.
  25. M. Fiosins, J. Fiosina, J. P. Müller, and J. Görmer, “Agent-based integrated decision making for autonomous vehicles in urban traffic,” Advances in Intelligent and Soft Computing, vol. 88, pp. 173–178, 2011.
  26. J. Fiosina and M. Fiosins, “Cooperative regression-based forecasting in distributed traffic networks,” in Distributed Network Intelligence, Security and Applications, Q. A. Memon, Ed., chapter 1, p. 337, CRC Press, Taylor & Francis Group, 2013.
  27. Airline on-time performance data, ASA Sections on Statistical Computing and Statistical Graphics, http://stat-computing.org/dataexpo/2009/.
  28. Data science with Hadoop: predicting airline delays, part 2, Hortonworks blog, http://de.hortonworks.com/blog/data-science-hadoop-spark-scala-part-2/.
  29. J. Fiosina, M. Fiosins, and J. P. Müller, “Big data processing and mining for next generation intelligent transportation systems,” Jurnal Teknologi, vol. 63, no. 3, 2013.
  30. J. Dromard, G. Roudière, and P. Owezarski, “Unsupervised network anomaly detection in real-time on big data,” in ADBIS 2015 Short Papers and Workshops, BigDap, DCSA, GID, MEBIS, OAIS, SW4CH, WISARD, Poitiers, France, September 8–11, 2015, Proceedings, vol. 539 of Communications in Computer and Information Science, pp. 197–206, Springer, Berlin, Germany, 2015.