forked from danhan/fragment_android
===============Folder Description============
1 Paper_Latex
  The LaTeX source files of the paper.
2 archive
  All the data from LDA and Labeled LDA, and the data for plotting.
3 /src, /bin, /data
  These folders are for the source code.
=================Describe the Archive folder==============================
This section describes the files in the folder /doc.
Doc
 ---- HTC
       ---- Regular LDA (regular-lda-htc.scala, htc_35_topics, document-topic-distribution.csv, /2500)
       ---- HTC.csv
       ---- Labeled LDA (labeled_lda_htc.scala, htc_topics_72.txt, document-topic-distribution.csv, /2500)
 ---- Moto
       ---- Regular LDA (the same types as HTC)
       ---- moto.csv
       ---- Labeled LDA (the same types as HTC)
 ---- bug-num-over-time.xlsx (data for plotting the bugs over time in Topic Mining and Analysis)
 ---- topic_relevance_analysis.xlsx (data for plotting the topic relevance in Topic Mining and Analysis)
 ---- similarity_comparison_of_LDA_Labeled_LDA.xlsx (data for plotting the similarity of regular LDA and Labeled LDA for both vendors)
0 HTC.csv/moto.csv
  They contain all the bugs and the labels assigned to them manually.
1 regular-lda-x.scala
  The script given as input to the TMT tools to obtain the topics from regular/Labeled LDA (x stands for the vendor).
2 x_35_topics
  The topics produced by regular/Labeled LDA, which we labeled. Their order follows the output from LDA.
3 document-topic-distribution.csv
  Generated by regular/Labeled LDA. Rows are bugs, columns are topics, and each value is the relevance of the bug to that topic.
4 /2500
  The output from the TMT tools.
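
The layout described for document-topic-distribution.csv (bugs as rows, topics as columns, relevance as values) can be read with a few lines of Python. This is only a sketch: the header row, the bug id in the first column, the topic names and the sample values below are all assumptions made for illustration, and the real TMT output may be laid out differently.

```python
import csv
import io


def load_distribution(text):
    """Parse a bug-by-topic relevance table into {bug_id: {topic: relevance}}.

    Assumption: a header row of topic names and the bug id in the first
    column. The real file may not have either.
    """
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    topics = header[1:]
    table = {}
    for row in reader:
        if not row:
            continue  # skip blank lines
        bug_id, values = row[0], row[1:]
        table[bug_id] = {t: float(v) for t, v in zip(topics, values)}
    return table


def most_relevant_topic(table, bug_id):
    """Return the topic with the highest relevance for one bug."""
    scores = table[bug_id]
    return max(scores, key=scores.get)


# Made-up sample data, for illustration only.
SAMPLE = """bug_id,topic_0,topic_1,topic_2
101,0.10,0.70,0.20
102,0.55,0.05,0.40
"""

table = load_distribution(SAMPLE)
print(most_relevant_topic(table, "101"))
```

To read a real file, pass its contents (for example `open(path).read()`) to `load_distribution` and adjust the column handling to the actual layout.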
=====================For Source Code Description================================
1 source code
  1 The main source code is in the src/evaluation folder.
  2 src/sample is the first version. Its only valuable part is the calculation of the threshold that lets us find the best distribution for LDA. The result is 0.2.
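
One plausible way to use a threshold such as 0.2 is as a cut-off on the per-topic relevance of a bug. The sketch below is not taken from src/sample: the function name and the cut-off interpretation are assumptions, since this README does not say how the threshold is applied.

```python
THRESHOLD = 0.2  # the value reported in this README


def topics_for_bug(relevances, threshold=THRESHOLD):
    """Return the indices of the topics whose relevance meets the threshold.

    Assumption: the threshold is a cut-off on per-topic relevance.
    src/sample may use it differently.
    """
    return [i for i, r in enumerate(relevances) if r >= threshold]


# Made-up relevance values for one bug over three topics.
print(topics_for_bug([0.05, 0.2, 0.75]))
```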
2 data
  In the folder /data/<vendor>:
  1 bug-time.csv
    Used to get the bugs sorted by time.
  2 formated-bug-time.csv
    Intermediate data: the bug id and the formatted time for each bug.
  3 label-bug-topic-entire.csv
    Contains the bug id, bug time, bug title, bug description, and the manually assigned bug label.
  4 label-topic-distribut.csv
    Produced by Labeled LDA: the bug id and the relevance value for each topic.
  5 label-topics.csv
    The manually assigned labels.
  6 matrix-relevance-l.csv
    An output file: the average bug-topic relevance distribution for regular LDA by month. Columns are topics and rows are time periods.
  7 matrix-relevance-r.csv
    An output file with the same format as "matrix-relevance-l.csv", but for Labeled LDA.
  8 regular-topic-distribute.csv
    Generated by regular LDA. Columns are topics (there are no topic names, but the index is the same as in regular-topic.csv) and rows are bug ids.
  9 regular-topcis.csv
    The topics from regular LDA, which we labeled.
  10 topics-bugs-inter.csv
    A matrix whose rows are the topics from regular LDA and whose columns are the topics from Labeled LDA. Each value is the number of bugs shared by the regular LDA topic and the Labeled LDA topic.
  11 topics-bug-l.csv
    A matrix in which only the row is meaningful. It shows the number of bugs for each topic from Labeled LDA.
  12 topcis-bug-r.csv
    A matrix in which only the column is meaningful. It shows the number of bugs for each topic from regular LDA.
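
The two kinds of matrices described above (average relevance per month, and bug intersection between regular and Labeled LDA topics) could be computed roughly as follows. This is a sketch under assumptions, not the code in src/evaluation: the function names, the input shapes (dictionaries keyed by bug id or topic name), the "YYYY-MM" month format and all sample values are hypothetical.

```python
from collections import defaultdict


def monthly_average(bug_month, relevance):
    """Average each topic's relevance over the bugs reported in each month.

    bug_month maps bug id -> "YYYY-MM" (assumed format).
    relevance maps bug id -> list of per-topic relevance values.
    Returns {month: [average per topic]}, i.e. rows are time periods
    and columns are topics.
    """
    grouped = defaultdict(list)
    for bug_id, month in bug_month.items():
        grouped[month].append(relevance[bug_id])
    return {month: [sum(col) / len(col) for col in zip(*rows)]
            for month, rows in sorted(grouped.items())}


def intersection_matrix(regular, labeled):
    """Count the bugs shared by each (regular topic, labeled topic) pair.

    regular and labeled map a topic name to the set of bug ids assigned
    to it. Rows follow the regular topics, columns the labeled topics.
    """
    return [[len(r_bugs & l_bugs) for l_bugs in labeled.values()]
            for r_bugs in regular.values()]


# Made-up sample data, for illustration only.
bug_month = {1: "2011-01", 2: "2011-01", 3: "2011-02"}
relevance = {1: [0.25, 0.75], 2: [0.75, 0.25], 3: [0.5, 0.5]}
averages = monthly_average(bug_month, relevance)

regular = {"r0": {1, 2, 3}, "r1": {4}}
labeled = {"l0": {2, 3, 4}, "l1": {1}}
matrix = intersection_matrix(regular, labeled)
```

How a bug is assigned to a topic in the first place (for example by a relevance cut-off) is left outside this sketch.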