Classification models for ensemble noise filter

In a recent study, we performed some preliminary noise detection work on the two NASA software case studies, using an ensemble of 25 classifiers.9 The basic idea was to use a set of different classifiers (as an ensemble-classifier filter) to classify software modules and tag as potential noise those misclassified by a majority of classifiers. The underlying assumption is that if a majority of classifiers misclassify a software module, that module’s attribute values likely don’t adhere to the underlying characteristics of the whole software product, and the module is likely noise.
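The majority-vote rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the implementation used in the study: the function name `ensemble_noise_filter`, the 0/1 label encoding (0 for not fault prone, 1 for fault prone), and the reading of "majority" as a strict majority (more than half of the classifiers) are all assumptions made for the example.

```python
from typing import List, Optional, Sequence


def ensemble_noise_filter(
    true_labels: Sequence[int],
    predictions: Sequence[Sequence[int]],
    majority: Optional[int] = None,
) -> List[int]:
    """Return indices of modules misclassified by a majority of classifiers.

    true_labels: recorded class of each module (assumed 0 = not fault prone,
                 1 = fault prone).
    predictions: one sequence of predicted labels per classifier, each the
                 same length as true_labels.
    majority:    number of misclassifications needed to flag a module;
                 defaults to a strict majority (assumption).
    """
    n_classifiers = len(predictions)
    if majority is None:
        majority = n_classifiers // 2 + 1  # e.g. 13 of 25

    flagged = []
    for i, actual in enumerate(true_labels):
        # Count how many classifiers disagree with the recorded label.
        misses = sum(1 for preds in predictions if preds[i] != actual)
        if misses >= majority:
            flagged.append(i)
    return flagged


# Toy example with three hypothetical classifiers and four modules.
labels = [0, 1, 0, 1]
votes = [
    [0, 1, 1, 0],
    [0, 0, 1, 1],
    [0, 1, 1, 1],
]
print(ensemble_noise_filter(labels, votes))  # module 2 is flagged
```

In the toy data, only the third module (index 2) is misclassified by all three classifiers, so it alone is tagged as potential noise; the other modules are each missed by at most one classifier.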
The ensemble noise filter consists of 25 different classifiers. It includes traditional classifiers (such as logistic regression, naive Bayes, and regression trees) and advanced computational intelligence-based methods (such as genetic programming, neural networks, and rough sets). Table 4 lists the 25 classifiers used in our ensemble noise filter. The classifiers encompass different supervised-learning paradigms, including induction, soft computing, and statistical regression. Several of them are implemented in the Weka data mining and machine learning tool, which is written in Java.10 Other works describe each classifier in more detail.9,10
For each classification technique, we first obtained several candidate classification models. From these, we selected as the final model the one with the preferred balance between the FPR and FNR error rates. We adopted this strategy in accordance with our previous studies1 of high-assurance software systems similar to JM1 and KC2. Moreover, such a strategy is a practical solution for high-assurance software systems because supervised classification is difficult for such systems and often yields a classifier that predicts most or all modules as not fault prone.
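As a rough illustration of this selection step, the sketch below computes FPR and FNR for each candidate model and keeps the one whose two error rates are closest. The text does not define "preferred balance" precisely, so the smallest absolute gap |FPR - FNR| is an assumption made for this example, as are the function names `error_rates` and `pick_balanced_model` and the 0/1 label encoding (1 for fault prone).

```python
from typing import Dict, Optional, Sequence, Tuple


def error_rates(actual: Sequence[int], predicted: Sequence[int]) -> Tuple[float, float]:
    """Return (FPR, FNR), treating 1 (fault prone) as the positive class."""
    fp = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 1)
    fn = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)
    negatives = sum(1 for a in actual if a == 0)
    positives = len(actual) - negatives
    fpr = fp / negatives if negatives else 0.0
    fnr = fn / positives if positives else 0.0
    return fpr, fnr


def pick_balanced_model(
    actual: Sequence[int], candidates: Dict[str, Sequence[int]]
) -> Optional[str]:
    """Return the name of the candidate with the smallest |FPR - FNR|.

    Using the absolute gap as the balance criterion is an assumption;
    the study may have weighed the two rates differently.
    """
    best_name, best_gap = None, None
    for name, predicted in candidates.items():
        fpr, fnr = error_rates(actual, predicted)
        gap = abs(fpr - fnr)
        if best_gap is None or gap < best_gap:
            best_name, best_gap = name, gap
    return best_name


# Hypothetical candidates on six modules (four not fault prone, two fault prone).
actual = [0, 0, 0, 0, 1, 1]
candidates = {
    "all_nfp": [0, 0, 0, 0, 0, 0],     # FPR 0.00, FNR 1.00
    "balanced": [0, 0, 0, 1, 1, 0],    # FPR 0.25, FNR 0.50
    "aggressive": [1, 1, 1, 0, 1, 1],  # FPR 0.75, FNR 0.00
}
print(pick_balanced_model(actual, candidates))
```

The `all_nfp` candidate mirrors the degenerate case mentioned above, a classifier that predicts every module as not fault prone: its FPR is zero but it misses every fault-prone module, so a balance criterion rejects it.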
The lines-of-code classifier first sorted the modules in ascending order of their lines of code. Using a specific threshold lines-of-code value, thdLOC, the modules with LOC lower than thdLOC were selected and labeled (predicted) as not fault prone, and the rest as fault prone. We varied the threshold value