The Scientific World Journal
Volume 2014, Article ID 509429, 10 pages
Review Article

Attribute Selection Impact on Linear and Nonlinear Regression Models for Crop Yield Prediction

1 IMTA, Boulevard Cuauhnáhuac 8532, Colonia Progreso, 62550 Jiutepec, MOR, Mexico
2 UPEMOR, Boulevard Cuauhnáhuac 566, Colonia Lomas del Texcal, 62550 Jiutepec, MOR, Mexico

Received 6 December 2013; Accepted 10 February 2014; Published 26 May 2014

Academic Editors: S. Balochian and Y. Zhang

Copyright © 2014 Alberto Gonzalez-Sanchez et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


Efficient cropping requires yield estimation for each crop involved, and data-driven models are commonly applied for this purpose. In recent years, several comparisons of data-driven modeling techniques have been made in search of the best model for yield prediction. However, attributes are usually selected based on expert assessment or on dimensionality reduction algorithms. A fairer comparison should use the best subset of features for each regression technique, and an evaluation that includes several crops is preferable. This paper evaluates the most common data-driven modeling techniques applied to yield prediction, using a complete method to define the best attribute subset for each model. Multiple linear regression, stepwise linear regression, M5′ regression trees, and artificial neural networks (ANNs) were ranked. The models were built using real data from eight crops sown in an irrigation module in Mexico. To validate the models, three accuracy metrics were used: the root relative square error (RRSE), the relative mean absolute error (RMAE), and the correlation factor. The results show that ANNs are more consistent in the best attribute subset composition between the learning and the training stages, obtaining the lowest average RRSE (86.04%), the lowest average RMAE (8.75%), and the highest average correlation factor (0.63).
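As a reading aid for the three accuracy metrics and the attribute subset search named above, the following is a minimal Python sketch, not the authors' implementation. The abstract gives no formulas, so the definitions here are assumptions: RRSE follows the common convention in which 100% corresponds to always predicting the mean of the observed values, RMAE is taken as the mean absolute error relative to each observed value, and the correlation factor is taken as Pearson's coefficient. The function best_subset is a hypothetical helper that reads "complete method" as an exhaustive search over all non-empty attribute subsets; the score argument stands in for whatever error a given regression model produces on a subset.

```python
import math
from itertools import combinations


def rrse(actual, predicted):
    """Root relative squared error in percent (assumed definition).

    The squared error is divided by the squared error of a baseline
    that always predicts the mean of the observed values.
    """
    mean_a = sum(actual) / len(actual)
    num = sum((p - a) ** 2 for a, p in zip(actual, predicted))
    den = sum((a - mean_a) ** 2 for a in actual)
    return 100.0 * math.sqrt(num / den)


def rmae(actual, predicted):
    """Relative mean absolute error in percent (assumed definition).

    Each absolute error is divided by the observed value, so observed
    values must be nonzero.
    """
    total = sum(abs(p - a) / abs(a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


def correlation(actual, predicted):
    """Pearson correlation between observed and predicted values."""
    n = len(actual)
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_a * var_p)


def best_subset(attributes, score):
    """Hypothetical exhaustive search: return the non-empty subset of
    attributes with the lowest score (for example, a model's RRSE)."""
    candidates = (
        subset
        for size in range(1, len(attributes) + 1)
        for subset in combinations(attributes, size)
    )
    return min(candidates, key=score)
```

Under the RRSE definition assumed here, a value below 100% means the model does better than predicting the mean yield. An exhaustive search evaluates 2^n - 1 subsets for n attributes, so this sketch is practical only for a small number of attributes.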