Table of Contents
ISRN Software Engineering
Volume 2012, Article ID 823437, 8 pages
Research Article

Predicting Software Projects Cost Estimation Based on Mining Historical Data

1Computer Information Systems Department, Jordan University of Science and Technology, P.O. Box 3030, Irbid 22110, Jordan
2Computer Information Systems Department, Yarmouk University, P.O. Box 566, Irbid 21163, Jordan

Received 18 November 2011; Accepted 16 January 2012

Academic Editors: J. Cao and O. Greevy

Copyright © 2012 Hassan Najadat et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


In this research, a hybrid cost estimation model is proposed to produce a realistic prediction model that takes into consideration software project, product, process, and environmental elements. A cost estimation dataset is built from a large number of open source projects. Those projects are divided into three domains: communication, finance, and game projects. Several data mining techniques are used to classify software projects in terms of their development complexity. Data mining techniques are also used to study association between different software attributes and their relation to cost estimation. Results showed that finance metrics are usually the most complex in terms of code size and some other complexity metrics. Results showed also that games applications have higher values of the SLOCmath, coupling, cyclomatic complexity, and MCDC metrics. Information gain is used in order to evaluate the ability of object-oriented metrics to predict software complexity. MCDC metric is shown to be the first metric in deciding a software project complexity. A software project effort equation is created based on clustering and based on all software projects’ attributes. According to the software metrics weights values developed in this project, we can notice that MCDC, LOC, and cyclomatic complexity of the traditional metrics are still the dominant metrics that affect our classification process, while number of children and depth of inheritance are the dominant from the object-oriented metrics as a second level.