Data Clustering in Applied Informatics and Software Engineering: A Review of International Experience and Publications



Table 4. Error rates of the 25 classifiers for data sets JM1-8850 and KC2-520. An asterisk indicates that a cross-validation approach wasn't available for that technique.

Classifier | Symbol | JM1-8850 FPR (%) | JM1-8850 FNR (%) | KC2-520 FPR (%) | KC2-520 FNR (%)
---------- | ------ | ---------------- | ---------------- | --------------- | ---------------
Case-based reasoning [1] | CBR | 30.70 | 30.88 | 21.50 | 20.75
*Treedisc decision tree algorithm | TD | 30.78 | 29.16 | 18.60 | 16.04
*Logistic regression | LR | 34.23 | 33.97 | 20.77 | 21.70
*Lines of code | LOC | 34.85 | 34.08 | 20.53 | 19.81
*Genetic programming | GP | 34.71 | 32.66 | 18.36 | 16.98
*Artificial neural networks | ANN | 38.06 | 30.35 | 21.26 | 21.70
LogitBoost classifier | LBOOST | 34.72 | 32.72 | 22.22 | 20.75
Rule-based modeling [2] | RBM | 33.71 | 33.08 | 17.39 | 16.04
Bagging classifier | BAG | 30.59 | 30.76 | 21.50 | 20.75
*Rough sets-based classifier | RSET | 31.62 | 30.94 | 16.18 | 14.15
MetaCost classifier | MCOST | 33.67 | 33.61 | 23.43 | 21.70
AdaBoost classifier | ABOOST | 33.41 | 33.79 | 28.26 | 29.25
Decision table | DTABLE | 34.29 | 34.32 | 18.84 | 18.87
Alternating decision tree | ADT | 33.83 | 33.61 | 19.81 | 19.81
Sequential minimal optimization | SMO | 34.09 | 33.97 | 20.77 | 20.75
Instance-based (1 nearest neighbor) | IB1 | 34.73 | 34.74 | 23.67 | 24.53
Instance-based (k nearest neighbor) | IBK | 32.70 | 32.48 | 20.53 | 19.81
Partial decision trees | PART | 33.16 | 33.14 | 20.77 | 19.81
OneR algorithm (based on one most informative attribute) | ONER | 34.50 | 34.38 | 20.05 | 19.81
Repeated incremental pruning to produce error reduction | JRIP | 33.18 | 33.08 | 19.81 | 19.81
Ripple down rule algorithm | RDR | 33.94 | 34.02 | 18.84 | 19.81
C4.5 decision tree algorithm | J48 | 32.56 | 32.42 | 19.57 | 19.81
Naive Bayes | NBAYES | 34.12 | 33.97 | 21.26 | 21.70
Hyperpipes algorithm | HPIPES | 37.97 | 38.29 | 23.91 | 23.58
Locally weighted learning | LWLS | 33.59 | 33.61 | 20.05 | 19.81

until we got the desired balance between the FPR and FNR. Software-engineering practitioners often use the LOC metric as a rule of thumb to gauge a software product's quality. The underlying assumption is that a larger program module is likely to have more software faults than a smaller one.
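As an aside, the LOC rule of thumb and the FPR/FNR balancing mentioned above can be illustrated with a one-variable classifier. The sketch below is our own minimal example on synthetic data, not the procedure used in the study: it sweeps a hypothetical LOC threshold and keeps the value where the two error rates are closest.

```python
# Minimal sketch: a LOC-threshold classifier tuned so that the false-positive
# and false-negative rates are roughly balanced. The data are placeholders,
# not the JM1 or KC2 measurements.
import numpy as np

def error_rates(loc, labels, threshold):
    """Predict 'fault-prone' when LOC exceeds the threshold and
    return (false-positive rate, false-negative rate)."""
    pred = loc > threshold
    fp = np.sum(pred & (labels == 0))
    fn = np.sum(~pred & (labels == 1))
    fpr = fp / max(np.sum(labels == 0), 1)
    fnr = fn / max(np.sum(labels == 1), 1)
    return fpr, fnr

def balanced_threshold(loc, labels):
    """Sweep candidate thresholds and keep the one where FPR and FNR
    are closest, mirroring the balancing described in the text."""
    candidates = np.unique(loc)
    return min(candidates,
               key=lambda t: abs(np.subtract(*error_rates(loc, labels, t))))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    loc = rng.integers(10, 500, size=200)                        # hypothetical module sizes
    labels = (loc + rng.normal(0, 120, 200) > 300).astype(int)   # synthetic fault labels
    t = balanced_threshold(loc, labels)
    print("threshold:", t, "rates:", error_rates(loc, labels, t))
```

The same idea carries over to any single-parameter model: adjust the parameter until the two error rates converge, then report both.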
Table 4 shows the FPR and FNR of the 25 classifiers for the two case studies. For 19 of the 25 classifiers, the error rates are based on a tenfold cross-validation approach. For the six classification techniques marked with an asterisk in the table, a cross-validation feature wasn't available owing to limitations of the respective tools used. Table 5 presents descriptive statistics of the error rates for the 25 classification models. The error rates of the selected models are noticeably higher for JM1-8850 than for KC2-520. In addition, the FPR and FNR for a given classifier are similar, reflecting the effect of our model-selection strategy.
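For readers who want to reproduce this kind of table, the sketch below shows one way to obtain tenfold cross-validated FPR and FNR estimates for a single classifier. It uses scikit-learn and synthetic data purely for illustration; the article's experiments used different tools.

```python
# Sketch: tenfold cross-validated false-positive / false-negative rates
# for one classifier, pooled over the folds. Synthetic data only.
import numpy as np
from sklearn.model_selection import StratifiedKFold
from sklearn.tree import DecisionTreeClassifier  # a CART tree as an example learner

def cv_error_rates(X, y, clf, folds=10, seed=0):
    skf = StratifiedKFold(n_splits=folds, shuffle=True, random_state=seed)
    fp = fn = neg = pos = 0
    for train, test in skf.split(X, y):
        model = clf.fit(X[train], y[train])
        pred = model.predict(X[test])
        fp += np.sum((pred == 1) & (y[test] == 0))   # false positives this fold
        fn += np.sum((pred == 0) & (y[test] == 1))   # false negatives this fold
        neg += np.sum(y[test] == 0)
        pos += np.sum(y[test] == 1)
    return fp / neg, fn / pos    # (FPR, FNR) pooled over the ten folds

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.normal(size=(500, 8))                            # hypothetical software metrics
    y = (X[:, 0] + rng.normal(0, 1, 500) > 0).astype(int)    # synthetic fault labels
    print(cv_error_rates(X, y, DecisionTreeClassifier(max_depth=3)))
```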


Noise detection results
We compare the modules tagged as noise by the ensemble noise-filter approach with those mislabeled by the software-engineering expert in the clustering-based method. The results in Figure 3 show an interesting match between the two sets of modules.
Table 5. Descriptive statistics of error rates for the 25 classification models.

Statistic | JM1-8850 FPR (%) | JM1-8850 FNR (%) | KC2-520 FPR (%) | KC2-520 FNR (%)
--------- | ---------------- | ---------------- | --------------- | ---------------
Average | 33.75 | 33.12 | 20.71 | 20.30
Standard deviation | 1.80 | 1.80 | 2.41 | 2.93
Median | 33.83 | 33.61 | 20.53 | 19.81
Minimum | 30.59 | 29.16 | 16.18 | 14.15
Maximum | 38.06 | 38.29 | 28.26 | 29.25



Figure 3. Noise recall results for (a) JM1-8850 and (b) KC2-520.

The x-axis indicates the consensus level among the 25 classifiers used for noise filtering. For example, 13 means that a module is viewed as noise if 13 or more classifiers predict its label wrong. The y-axis shows the recall percentage of the modules considered noise by the ensemble; that is, how many of them are covered by the set of modules the expert mislabeled.
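To make the consensus-and-recall computation concrete, here is a minimal sketch under our own assumptions about the data layout (a boolean modules-by-classifiers misclassification matrix and a set of expert-mislabeled module indices); it is not the authors' implementation.

```python
# Sketch: consensus-based noise filtering and the recall measure of Figure 3.
# Assumed layout: wrong[i, j] is True if classifier j mispredicts module i's label;
# expert_mislabeled is the set of module indices the expert relabeled.
import numpy as np

def noise_recall(wrong, expert_mislabeled, consensus):
    """Tag a module as noise when at least `consensus` classifiers mispredict it,
    then report what fraction of those modules the expert also mislabeled."""
    votes = wrong.sum(axis=1)                    # mispredictions per module
    noisy = np.flatnonzero(votes >= consensus)   # the ensemble's noise candidates
    if noisy.size == 0:
        return 0.0
    covered = sum(1 for i in noisy if i in expert_mislabeled)
    return 100.0 * covered / noisy.size          # recall in percent

if __name__ == "__main__":
    rng = np.random.default_rng(2)
    wrong = rng.random((100, 25)) < 0.3                        # synthetic misclassification matrix
    expert_mislabeled = set(rng.choice(100, size=20, replace=False))
    for consensus in (13, 17, 21, 25):                         # consensus levels as on the x-axis
        print(consensus, round(noise_recall(wrong, expert_mislabeled, consensus), 1))
```

Raising the consensus level shrinks the set of noise candidates, so the recall curve in Figure 3 traces how agreement among the classifiers relates to agreement with the expert.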


In JM1-8850, the noise recall performance of expert-based classification with clusters obtained by Neural-Gas was generally better than with clusters based on k-means. In KC2-520, however, the opposite was true. The absolute noise recall performance of the expert-based classification was generally better for the KC2-520 data set than for the JM1-8850 data set. This indicates that data characteristics, such as the extent of potential noise, among other factors, influence a classifier's performance.
It's interesting that a majority of the modules detected as noise were among the modules mislabeled by the clustering- and expert-based labeling method. Even though we don't yet know which one (the noise-filtering method or the clustering method) was more accurate for this case study, the matching results warrant future research on noise filtering with unsupervised clustering techniques.

This study reflects our initial research into clustering-based analysis for software quality-estimation problems. We plan to continue discussions with software engineers to better evaluate the benefits of clustering-based analysis. For this purpose, we must further interpret the quality-estimation and noise detection results.

It's possible to build a more interactive system that lets software engineers explore software metrics data, identify mislabeled software modules, and pinpoint deficient or inappropriate software metrics. Data analysts and software-engineering experts can then collaborate more closely to construct and collect more informative software metrics.
Data analysts can apply a clustering- and expert-based classification scheme to classification problems in other domains, such as medical research and computer-network-intrusion detection. In the future, they could consider additional clustering techniques and compare them with the techniques in this study. The impact of the number of clusters on classification accuracy also deserves more investigation.
Acknowledgments
We thank the anonymous reviewers for their constructive critique and suggestions, Vedang Joshi and Pierre Rebours for their assistance with experiments, and Kehan Gao for her patient reviews of the manuscript.

References


1. T.M. Khoshgoftaar and N. Seliya, "Analogy-Based Practical Classification Rules for Software Quality Estimation," Empirical Software Eng. J., vol. 8, no. 4, Dec. 2003, pp. 325–350.
2. T.M. Khoshgoftaar, L.A. Bullard, and K. Gao, "Detecting Outliers Using Rule-Based Modeling for Improving CBR-Based Software Quality Classification Models," Case-Based Reasoning Research and Development, LNAI 1689, Springer-Verlag, 2003, pp. 216–230.
3. C.E. Brodley and M.A. Friedl, "Identifying Mislabeled Training Data," J. Artificial Intelligence Research, vol. 11, Jul.–Dec. 1999, pp. 131–167.
4. C.M. Teng, "Correcting Noisy Data," Proc. 16th Int'l Conf. Machine Learning (ICML 99), Morgan Kaufmann, 1999, pp. 239–248.
5. S. Zhong and J. Ghosh, "A Unified Framework for Model-Based Clustering," J. Machine Learning Research, vol. 4, Dec. 2003, pp. 1001–1037.
6. T.M. Martinetz, S.G. Berkovich, and K.J. Schulten, "Neural-Gas Network for Vector Quantization and Its Application to Time-Series Prediction," IEEE Trans. Neural Networks, vol. 4, no. 4, July 1993, pp. 558–569.
7. W. Pedrycz et al., "Self-Organizing Maps as a Tool for Software Analysis," Proc. IEEE Canadian Conf. Electrical and Computer Eng. (CCECE 2001), IEEE Press, 2001, pp. 93–97.
8. L.C. Briand, W.L. Melo, and J. Wust, "Assessing the Applicability of Fault-Proneness Models across Object-Oriented Software Projects," IEEE Trans. Software Eng., vol. 28, no. 7, July 2002, pp. 706–720.
9. V. Joshi, Noise Elimination with Ensemble-Classifier Filtering: A Case Study in Software Quality Eng., master's thesis, Dept. of Computer Science and Eng., Florida Atlantic Univ., 2003.
10. I.H. Witten and E. Frank, Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations, Morgan Kaufmann, 1999.

To assess software quality, software-engineering practitioners usually build quality-classification or fault-prediction models using software metrics and fault data from a previous release of the system or from a similar software project.
Engineers then use these models to predict the likelihood of faults in the program modules under development.
However, building accurate quality-estimation models is a difficult task, because noisy data usually degrade the performance of the trained models. There are two general types of noise in software metrics and quality data. One concerns mislabeled software modules, caused by software engineers failing to detect, not reporting, or simply ignoring existing software faults. The other stems from deficiencies in some of the collected software metrics, which can cause two software modules that are identical with respect to the given metrics to have different fault labels. Removing such noisy instances can significantly improve the performance of calibrated software quality-estimation models. It is therefore desirable to pinpoint the problematic software modules accurately before calibrating any quality-estimation models.
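The second type of noise described above, identical metric vectors carrying different fault labels, can be detected mechanically. The following sketch is a minimal illustration under assumed data formats (a list of metric tuples and a parallel list of 0/1 fault labels), not a method taken from the article:

```python
# Sketch: flag inconsistent instances, i.e. modules that are identical with
# respect to the given metrics but carry different fault labels.
from collections import defaultdict

def inconsistent_modules(metrics, labels):
    """metrics: list of tuples (one metric vector per module);
    labels: list of 0/1 fault labels. Returns indices of modules whose
    metric vector also occurs with a different label."""
    by_vector = defaultdict(set)
    for i, vec in enumerate(metrics):
        by_vector[vec].add(labels[i])
    return [i for i, vec in enumerate(metrics) if len(by_vector[vec]) > 1]

if __name__ == "__main__":
    metrics = [(120, 4, 7), (120, 4, 7), (45, 2, 1), (300, 9, 12)]  # toy metric vectors
    labels = [0, 1, 0, 1]          # the first two modules conflict
    print(inconsistent_modules(metrics, labels))   # -> [0, 1]
```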
The other major problem is that in real-world software projects, software fault measurements (for example, fault labels) may not be available for training a quality-estimation model. This happens when an organization deals with a type of software project it has never handled before. It may also not have recorded or collected software fault data in a previous system release. So how can a quality-assurance team predict the quality of a software project without such collected measurement data? The team cannot use a supervised learning method without software quality indicators such as risk class or number of faults. The estimation task then falls to an analyst (expert), who must determine labels for each software module. Cluster analysis, a data-analysis tool, naturally addresses these two problems.
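As a rough illustration of how a clustering- and expert-based scheme can operate without fault data, the sketch below clusters modules with k-means (a stand-in here; the study also considered Neural-Gas) and simulates the expert with a placeholder function that labels whole clusters. The thresholding rule inside that placeholder is purely hypothetical; in practice a human expert inspects each cluster's metric profile.

```python
# Sketch: clustering- and expert-based labeling when fault data are unavailable.
# k-means groups the modules; the "expert" labels each cluster as a whole.
import numpy as np
from sklearn.cluster import KMeans

def expert_labels_for_clusters(centroids):
    """Placeholder for the human expert: here we simply call a cluster
    fault-prone when its mean metric values are comparatively high."""
    overall = centroids.mean()
    return {k: int(c.mean() > overall) for k, c in enumerate(centroids)}

def label_modules(X, n_clusters=10, seed=0):
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit(X)
    cluster_label = expert_labels_for_clusters(km.cluster_centers_)
    return np.array([cluster_label[c] for c in km.labels_])   # per-module labels

if __name__ == "__main__":
    rng = np.random.default_rng(3)
    X = rng.normal(size=(500, 8))   # hypothetical software metrics, no fault labels
    y_hat = label_modules(X)
    print("fault-prone modules:", int(y_hat.sum()), "of", len(y_hat))
```

The key design choice is that labels are assigned at the cluster level, so the expert inspects a handful of clusters rather than hundreds of individual modules.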

