Optimizing Software Defect Prediction: A Genetic Algorithm Based Comparative Analysis
Keywords:
Software defect prediction, software metrics, machine learning, classification
Abstract
Software quality assurance is a crucial activity during the initial stages of the software
development life cycle. Over the past two decades, various frameworks have been developed to
ensure software quality. By predicting defective modules at the initial stages, the resources
available for software development can be efficiently used to ensure timely delivery of good-
quality software. Numerous software defect prediction models have been proposed and developed
using supervised and unsupervised machine learning methodologies, along with the integration of
statistical methodologies. Software metrics contain hidden patterns that can be extracted and
utilized to identify defective modules using a machine learning approach. This study applies a
genetic algorithm (GA) to select relevant features that play a vital role in predicting defective
modules, and evaluates supervised classification techniques on seven widely used
NASA datasets. The three most widely used classification techniques, namely decision tree, support
vector machine, and naïve Bayes, were selected for the analysis. Precision, accuracy, recall,
Matthews correlation coefficient, F-measure, and receiver operating characteristic were selected
as the performance parameters. The results of this study can serve as a baseline for comparing and
verifying the results of new models that implement GA for optimal feature selection.
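
The idea of GA-based feature selection described above can be illustrated with a minimal sketch. It is not the study's implementation: the data are synthetic stand-ins for a software metrics table, the classifier is a small hand-written Gaussian naive Bayes, and the fitness function (Matthews correlation coefficient on a hold-out split) is an assumption made for this example. So are the population size, the mutation rate, and all function names (`make_data`, `gnb_fit`, `gnb_predict`, `mcc`, `ga_select`).

```python
import math
import random


def make_data(rng, n=200, n_features=8, informative=(0, 3)):
    """Synthetic metrics table (assumption): only `informative` columns carry signal."""
    X, y = [], []
    for _ in range(n):
        defective = int(rng.random() < 0.3)  # roughly 30% defective modules
        row = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
        if defective:
            for j in informative:
                row[j] += 2.0
        X.append(row)
        y.append(defective)
    return X, y


def gnb_fit(X, y, cols):
    """Per-class log prior, means, and variances over the selected columns."""
    model = {}
    for c in (0, 1):
        rows = [[x[j] for j in cols] for x, t in zip(X, y) if t == c]
        n = len(rows)
        means = [sum(col) / n for col in zip(*rows)]
        variances = [sum((v - m) ** 2 for v in col) / n + 1e-9
                     for col, m in zip(zip(*rows), means)]
        model[c] = (math.log(n / len(X)), means, variances)
    return model


def gnb_predict(model, x, cols):
    """Return the class with the highest Gaussian log posterior."""
    best, best_score = 0, -math.inf
    for c, (log_prior, means, variances) in model.items():
        score = log_prior
        for j, m, v in zip(cols, means, variances):
            score += -0.5 * math.log(2 * math.pi * v) - (x[j] - m) ** 2 / (2 * v)
        if score > best_score:
            best, best_score = c, score
    return best


def mcc(y_true, y_pred):
    """Matthews correlation coefficient from the binary confusion matrix."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def ga_select(X_tr, y_tr, X_va, y_va, rng, pop_size=20, generations=15, p_mut=0.1):
    """Evolve bit masks over feature columns; fitness is validation MCC."""
    n = len(X_tr[0])
    cache = {}

    def fitness(mask):
        key = tuple(mask)
        if key not in cache:
            cols = [j for j, bit in enumerate(mask) if bit]
            if not cols:
                cache[key] = -1.0  # an empty feature set is never useful
            else:
                model = gnb_fit(X_tr, y_tr, cols)
                preds = [gnb_predict(model, x, cols) for x in X_va]
                cache[key] = mcc(y_va, preds)
        return cache[key]

    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        nxt = sorted(pop, key=fitness, reverse=True)[:2]  # elitism: keep top two
        while len(nxt) < pop_size:
            a = max(rng.sample(pop, 3), key=fitness)  # tournament selection
            b = max(rng.sample(pop, 3), key=fitness)
            cut = rng.randrange(1, n)                 # one-point crossover
            child = a[:cut] + b[cut:]
            child = [bit ^ (rng.random() < p_mut) for bit in child]  # bit-flip mutation
            nxt.append(child)
        pop = nxt
    best = max(pop, key=fitness)
    return best, fitness(best)


rng = random.Random(0)
X, y = make_data(rng)
best_mask, best_score = ga_select(X[:140], y[:140], X[140:], y[140:], rng)
print("selected columns:", [j for j, bit in enumerate(best_mask) if bit])
print("validation MCC:", round(best_score, 3))
```

Each individual is a bit mask over the metric columns, so crossover and mutation operate directly on candidate feature subsets. In practice the single hold-out split would likely be replaced by cross-validation, and the hand-written classifier by a library implementation of decision tree, support vector machine, or naïve Bayes.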