The Scientific World Journal
Volume 2014, Article ID 468405, 11 pages
Research Article

A Procedure for Extending Input Selection Algorithms to Low Quality Data in Modelling Problems with Application to the Automatic Grading of Uploaded Assignments

¹ Computer Science Department, Universidad de Oviedo, Sedes Departamentales, Edificio 1, Campus de Viesques, 33203 Gijón, Spain
² Computer Science Department, Universidad de Granada, C/Periodista Daniel Saucedo Arana s/n, 18071 Granada, Spain
³ Statistics Department, E. U. I. T. Industrial, Universidad de Oviedo, Módulo 1, Planta 4, Campus de Viesques, 33203 Gijón, Spain

Received 11 March 2014; Revised 26 May 2014; Accepted 9 June 2014; Published 7 July 2014

Academic Editor: Anand Paul

Copyright © 2014 José Otero et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


When relevant inputs are selected in modelling problems with low quality data, the ranking of the most informative inputs is itself uncertain. This paper addresses the issue with a new procedure that extends different crisp feature selection algorithms to vague data. The partial knowledge about the ordinal position of each feature is modelled by means of a possibility distribution, and a ranking is then applied to sort these distributions. It is shown that this technique makes the most of the available information in some vague datasets. The approach is demonstrated in a real-world application. In the context of massive online computer science courses, methods are sought for automatically grading students' work through code metrics. Feature selection methods are used to find the metrics involved in the most meaningful predictions. In this study, 800 source code files, collected and revised by the authors in classroom Computer Science lectures taught between 2013 and 2014, are analyzed with the proposed technique, and the most relevant metrics for the automatic grading task are discussed.
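
The idea summarised above can be illustrated with a minimal sketch. It is not the procedure developed in the paper, and every choice in it is an assumption made for illustration: vague inputs are represented as intervals, the crisp selection algorithm is a simple absolute-correlation filter, the possibility distribution over ranks is approximated by sampling crisp datasets compatible with the intervals, and the distributions are sorted by their possibility-weighted mean rank. All function names and the toy data are hypothetical.

```python
import random
from statistics import mean


def abs_correlation(xs, ys):
    """Crisp relevance score: absolute Pearson correlation (0 for a constant column)."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return abs(sxy) / (sxx * syy) ** 0.5


def crisp_ranks(rows, target):
    """Rank the features of one crisp dataset; rank 1 is the most relevant."""
    n_features = len(rows[0])
    scores = [abs_correlation([r[j] for r in rows], target) for j in range(n_features)]
    order = sorted(range(n_features), key=lambda j: -scores[j])
    ranks = [0] * n_features
    for position, j in enumerate(order, start=1):
        ranks[j] = position
    return ranks


def rank_possibilities(interval_rows, target, n_samples=200, seed=0):
    """Approximate, for each feature, a possibility distribution over its rank.

    Assumption: each crisp dataset sampled inside the intervals gives one ranking,
    and rank frequencies are rescaled so the most frequent rank has possibility 1.
    """
    rng = random.Random(seed)
    n_features = len(interval_rows[0])
    counts = [[0] * n_features for _ in range(n_features)]
    for _ in range(n_samples):
        rows = [[rng.uniform(lo, hi) for lo, hi in row] for row in interval_rows]
        for j, rank in enumerate(crisp_ranks(rows, target)):
            counts[j][rank - 1] += 1
    return [[c / max(col) for c in col] for col in counts]


def order_features(possibilities):
    """Sort features by possibility-weighted mean rank (an assumed ordering criterion)."""
    def centroid(dist):
        return sum((k + 1) * p for k, p in enumerate(dist)) / sum(dist)
    return sorted(range(len(possibilities)), key=lambda j: centroid(possibilities[j]))


# Toy data (hypothetical): feature 0 is precise, feature 1 is informative but vague,
# feature 2 is a crisp column only loosely related to the target.
target = [1, 2, 3, 4, 5, 6, 7, 8]
loose = [3, 1, 4, 1, 5, 9, 2, 6]
interval_rows = [
    [(y - 0.1, y + 0.1), (y - 3.0, y + 3.0), (z, z)]
    for y, z in zip(target, loose)
]

possibilities = rank_possibilities(interval_rows, target)
order = order_features(possibilities)
print("possibility of each rank, per feature:", possibilities)
print("features from most to least relevant:", order)
```

In this toy setting the wide intervals of the second feature allow its rank to change from one compatible crisp dataset to another, which is the kind of uncertainty a possibility distribution over ranks is meant to capture. Any other crisp selection criterion could be substituted for `abs_correlation` without changing the rest of the sketch.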