Data clustering in applied informatics and software engineering, illustrated by foreign experience and foreign publications



Software metric          Minimum  Maximum    Mean  Median   75%   80%   85%   90%
Branch_Count                   1      361    8.79       3     9    11    15    19
Total_Lines_of_Code            1    1,275   37.03      13    45    55    64    91
Executable_LOC                 0    1,107   27.87       8    34    42    52    71
Comments_LOC                   0       44    2.00       0     2     2     3     5
Blank_LOC                      0      121    4.35       1     5     7     9    11
Code_And_Comments_LOC          0       11    0.28       0     0     0     0     1
Total_Operators                1    2,469   57.83      17    64    80   106   142
Total_Operands                 0    1,513   37.16      11    41    53    70    95
Unique_Operators               1       47    9.23       8    14    15    16    18
Unique_Operands                0      325   14.52       7    20    24    28    37
Cyclomatic_Complexity          1      180    4.91       2     5     6     8    10
Essential_Complexity           1      125    2.45       1     1     1     4     5
Design_Complexity              1      143    3.66       2     4     5     6     7



at the program function, subroutine, or method levels, so a software module is a program function, a subroutine, or a method. In JM1, some modules with the same attribute values had different defect labels. When we removed these inconsistent software modules and those with missing values from JM1, 8,850 modules remained. We labeled the reduced JM1 data set JM1-8850. The KC2 data set has 520 software modules, and is labeled KC2-520.
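The paper does not publish its preprocessing code; the following is a minimal sketch of the cleaning step described above, assuming the raw metrics sit in a pandas DataFrame with a fault-count column named "defects" (the file and column names are illustrative, not from the original data set).

```python
import pandas as pd

def clean_modules(df: pd.DataFrame, label_col: str = "defects") -> pd.DataFrame:
    """Drop modules with missing values, then drop 'inconsistent' modules:
    rows whose metric values are identical but whose defect labels differ
    (the JM1 -> JM1-8850 reduction described above)."""
    metric_cols = [c for c in df.columns if c != label_col]

    # 1. Remove modules with missing measurement or label values.
    df = df.dropna()

    # 2. Remove every module that shares its metric vector with another
    #    module carrying a different defect label.
    consistent = df.groupby(metric_cols)[label_col].transform("nunique") == 1
    return df[consistent].reset_index(drop=True)

# Illustrative use (file name and column name are assumptions):
# jm1 = pd.read_csv("jm1.csv")
# jm1_8850 = clean_modules(jm1)   # should leave 8,850 modules
```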
Which software metrics to include in a given set depends on what metrics data collection tools are available and the project under consideration. Another software project might consider a different set of measurements for software quality estimation.8 Tables 1 and 2 show the basic data statistics we provided to the expert for JM1-8850 and KC2-520. We derived them from our interaction with the expert, who used them in his labeling effort.
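The statistics in the table above (minimum, maximum, mean, median, and the 75th to 90th percentiles) can be reproduced for any metrics matrix with a few lines. This is an illustrative sketch, not the authors' tooling.

```python
import numpy as np
import pandas as pd

def basic_statistics(metrics: pd.DataFrame) -> pd.DataFrame:
    """Per-metric summary in the layout of the table above:
    minimum, maximum, mean, median, and the 75/80/85/90th percentiles."""
    stats = {}
    for name, column in metrics.items():
        v = column.to_numpy(dtype=float)
        stats[name] = {
            "Minimum": v.min(), "Maximum": v.max(),
            "Mean": v.mean(), "Median": np.median(v),
            "75%": np.percentile(v, 75), "80%": np.percentile(v, 80),
            "85%": np.percentile(v, 85), "90%": np.percentile(v, 90),
        }
    return pd.DataFrame(stats).T

# e.g. basic_statistics(jm1_8850.drop(columns=["defects"]))
```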
A majority of the software modules had no faults. In JM1-8850, 1,687 modules had at least one and at most 26 faults. In KC2-520, 106 modules had at least one and no more than 13 faults. A module with no defects was labeled 0 (not fault prone), and a module with one or more faults was labeled 1 (fault prone). However, we didn’t use these labels in our clustering- and expert-based analysis. They were only used for evaluating the expert’s labeling performance.
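A sketch of how these evaluation-only labels can be derived from raw fault counts (the "defects" column name is an assumption):

```python
import pandas as pd

def fault_prone_labels(fault_counts: pd.Series) -> pd.Series:
    """0 = not fault prone (no recorded faults), 1 = fault prone (>= 1 fault).
    Used only to score the expert's labels, never as clustering input."""
    return (fault_counts >= 1).astype(int)

# Illustrative check against the counts quoted above:
# fault_prone_labels(jm1_8850["defects"]).sum()   # -> 1,687 for JM1-8850
# fault_prone_labels(kc2_520["defects"]).sum()    # ->   106 for KC2-520
```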
Experimental setting
We implemented both the k-means and Neural-Gas algorithms in C, with an interface to Matlab, and compiled them into Matlab executable (mex) files that you can call directly from Matlab. The executables were in binary code and thus run faster than regular Matlab code. We used batch training for the k-means algorithm with a relative convergence criterion of 1e-4 (that is, the algorithm stops if the relative change of the cost function between two consecutive iterations is less than 1e-4). For our data sets, the k-means algorithm always converged within 100 iterations. We used an online version of the Neural-Gas algorithm, and the temperature parameter λ started at K/2 and gradually decreased to 0.01 at the end of 100 training epochs.
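The authors' implementation is C code behind a Matlab mex interface; the Python sketch below only illustrates the two training details mentioned above: the relative-change stopping rule for batch k-means and an online Neural-Gas update whose neighborhood ("temperature") parameter decays from K/2 to 0.01 over 100 epochs. The learning rate, the exponential decay schedule, and the random initialization are assumptions, not the original code.

```python
import numpy as np

def batch_kmeans(X, K, max_iter=100, rel_tol=1e-4, seed=0):
    """Batch k-means; stops when the relative change of the MSE cost
    between two consecutive iterations falls below rel_tol."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), K, replace=False)].copy()
    prev_cost = np.inf
    for _ in range(max_iter):
        # squared distance of every module to every cluster center
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        assign = d.argmin(axis=1)
        cost = d[np.arange(len(X)), assign].mean()      # MSE objective
        for k in range(K):
            members = X[assign == k]
            if len(members):                            # empty clusters keep their old center
                centers[k] = members.mean(axis=0)
        if abs(prev_cost - cost) / max(cost, 1e-12) < rel_tol:
            break
        prev_cost = cost
    return centers, assign

def neural_gas(X, K, epochs=100, eps=0.05, seed=0):
    """Online Neural-Gas; the neighborhood parameter lambda decays
    from K/2 to 0.01 over the training epochs."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), K, replace=False)].copy()
    lam0, lam_end = K / 2.0, 0.01
    for epoch in range(epochs):
        # exponential decay of the "temperature" parameter (assumed schedule)
        lam = lam0 * (lam_end / lam0) ** (epoch / max(epochs - 1, 1))
        for x in X[rng.permutation(len(X))]:
            dists = ((centers - x) ** 2).sum(axis=1)
            rank = np.argsort(np.argsort(dists))        # 0 = closest center
            centers += eps * np.exp(-rank / lam)[:, None] * (x - centers)
    return centers
```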
We set the number of clusters K (a parameter needed by both the k-means and Neural-Gas algorithms) to 30 for JM1-8850 and 20 for KC2-520. The heuristic consideration for choosing 20 and 30 was a balance between reducing the number of software module representatives the expert must examine and obtaining a fine-granularity representation of the original software measurement data. Both k-means and Neural-Gas generated a few empty clusters (4 and 2, respectively) for KC2-520, so the actual number of clusters the expert evaluated was less than 20 for KC2-520. Instead of evaluating 520 or 8,850 software modules one by one, the expert only had to label at most 20 or 30 groups.
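The paper does not spell out exactly what cluster-level information the expert saw; a plausible minimal sketch is to skip empty clusters and reduce each remaining cluster to its size and mean metric vector, so that the expert inspects one compact record per group instead of every module.

```python
import numpy as np

def cluster_summaries(X, assign):
    """One representative record per non-empty cluster: its id, size,
    and per-metric mean of its member modules."""
    X = np.asarray(X, dtype=float)
    assign = np.asarray(assign)
    summaries = []
    for k in np.unique(assign):          # empty clusters never appear here
        members = X[assign == k]
        summaries.append({
            "cluster": int(k),
            "size": len(members),
            "mean_metrics": members.mean(axis=0),
        })
    return summaries
```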
For clustering quality, we used the MSE objective presented earlier and average purity. A cluster's purity is the fraction of its modules that belong to the dominant category (fault prone or not fault prone), and average purity is the mean purity over all clusters. It ranges between 0 and 1; the higher the value, the better.
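A sketch of the average-purity computation as defined above, where the labels are the evaluation-only 0/1 defect labels:

```python
import numpy as np

def average_purity(assign, labels):
    """Purity of a cluster = fraction of its modules in the majority class
    (fault prone or not); average purity = mean over all non-empty clusters."""
    assign = np.asarray(assign)
    labels = np.asarray(labels, dtype=int)
    purities = []
    for k in np.unique(assign):
        cluster = labels[assign == k]
        purities.append(np.bincount(cluster).max() / len(cluster))
    return float(np.mean(purities))
```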
We used the defect labels provided with the data set to evaluate the expert's labeling decision. Specifically, we reported the overall classification error, the false-positive rate (FPR, the percentage of not-fault-prone modules mislabeled fault prone), and the false-negative rate (FNR, the percentage of fault-prone modules mislabeled not fault prone).
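A sketch of these evaluation measures, comparing the expert's per-module labels against the defect labels shipped with the data (variable names are illustrative):

```python
import numpy as np

def expert_evaluation(true_labels, expert_labels):
    """true_labels, expert_labels: 0 (not fault prone) / 1 (fault prone)."""
    t = np.asarray(true_labels, dtype=int)
    e = np.asarray(expert_labels, dtype=int)
    error = float((t != e).mean())           # overall classification error
    fpr = float((e[t == 0] == 1).mean())     # not fault prone mislabeled fault prone
    fnr = float((e[t == 1] == 0).mean())     # fault prone mislabeled not fault prone
    return error, fpr, fnr
```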

