Data clustering in applied computer science and software engineering, based on foreign experience and foreign publications


Unsupervised software quality estimation



Date: 15.12.2022
In the context of software quality estimation, most research has focused on supervised-learning methods for building quality-classification or fault-prediction models. Little effort has been devoted to unsupervised-learning methods. The most relevant recent work is an unsupervised analysis and visualization of software measurement data using self-organizing maps (SOMs) [7], which didn’t address quality estimation or noise detection and filtering.
Unsupervised here refers to learning without class labels (that is, the software fault measurement); it doesn’t mean learning without human supervision. Our analysis does involve a human software engineering expert who supervises the labeling efforts.
Our clustering- and expert-based approach is an interactive process. Depending on the expert’s availability, we determine a realistic range for the number of clusters (K). Such an approach is important for software projects with limited resources. We then present each cluster’s mean software measurement values to the expert as that cluster’s representative. The expert also specifies what other statistics of the software measurement data set are necessary to accurately label each cluster as fault prone or not fault prone.
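The interactive process above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: this section does not name the clustering algorithm, so plain k-means stands in for it; the `kmeans` and `label_clusters` helpers, the `toy_expert` callback (a stand-in for the human expert), and the two-metric toy data are all hypothetical.

```python
import random

def kmeans(rows, k, iters=50, seed=0):
    """Plain Lloyd's k-means (an assumed algorithm); returns (centroids, assignments)."""
    rng = random.Random(seed)
    centroids = [list(r) for r in rng.sample(rows, k)]
    assign = [0] * len(rows)
    for _ in range(iters):
        # Assign each module to its nearest centroid (squared Euclidean distance).
        assign = [
            min(range(k),
                key=lambda c: sum((x - m) ** 2 for x, m in zip(r, centroids[c])))
            for r in rows
        ]
        # Recompute each centroid as the mean of its member modules.
        new_centroids = []
        for c in range(k):
            members = [r for r, a in zip(rows, assign) if a == c]
            if members:
                new_centroids.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, assign

def label_clusters(rows, k, expert):
    """Show each cluster's mean vector and size to the expert; spread the label to members."""
    centroids, assign = kmeans(rows, k)
    labels = [expert(mean, assign.count(c)) for c, mean in enumerate(centroids)]
    return [labels[a] for a in assign]

# Hypothetical toy data: (lines of code, cyclomatic complexity) per module.
modules = [(20, 2), (25, 3), (30, 2), (400, 40), (450, 55), (420, 48)]

def toy_expert(mean, size):
    # Stand-in for the human expert's judgment; the threshold is invented.
    return "fault prone" if mean[1] > 20 else "not fault prone"

print(label_clusters(modules, 2, toy_expert))
```

In practice the call would be repeated for each K in the range the expert can realistically review, since the expert's effort grows with the number of clusters to label.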
The data statistics we provide include global mean, minimum, maximum, median, 75th percentile, 80th percentile, 85th percentile, and 90th percentile of each feature (software metric) dimension, as well as each cluster’s size. In the interactive process, the clustering analyst certainly can suggest other useful information to the software engineering expert, who in turn can ask for additional data properties. In our study, the expert had over 15 years of experience in software quality and reliability engineering. Although we used one expert in our study, multiple experts (a realistic scenario in industry) can work as a team in labeling the software modules.
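As a rough sketch, the statistics listed above could be computed as follows. The text does not say which percentile convention was used, so linear interpolation between closest ranks is an assumption here; the helper names and the feature names `loc` and `complexity` are hypothetical.

```python
from collections import Counter
from statistics import mean, median

def percentile(values, p):
    """p-th percentile by linear interpolation between closest ranks (assumed convention)."""
    s = sorted(values)
    pos = (len(s) - 1) * p / 100
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)

def feature_summary(rows, names):
    """Global statistics for each feature (software metric) dimension."""
    summary = {}
    for name, col in zip(names, zip(*rows)):
        summary[name] = {
            "mean": mean(col),
            "min": min(col),
            "max": max(col),
            "median": median(col),
            **{f"p{p}": percentile(col, p) for p in (75, 80, 85, 90)},
        }
    return summary

def cluster_sizes(assignments):
    """Number of modules assigned to each cluster."""
    return dict(Counter(assignments))

# Hypothetical example: 11 modules described by two metrics.
rows = [(v, v * 2) for v in range(1, 12)]
stats = feature_summary(rows, ["loc", "complexity"])
print(stats["loc"])
print(cluster_sizes([0, 1, 1, 0, 1]))
```

Comparing a cluster's mean vector against these global percentiles gives the expert a sense of how extreme the cluster is on each metric.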
Experimental study
Our empirical case study used data sets from two NASA software projects (written in C++), labeled JM1 and KC2. The software measurements and fault data were collected


