------------------------------------------------------------
[Data Source]
Platform : TNC (Thai National Corpus)
Organization : Chulalongkorn University
Access : https://www.arts.chula.ac.th/ling/tnc/
------------------------------------------------------------
[Data Volume]
total : 40000 articles
each domain : 5000 articles
number of domains : 8
------------------------------------------------------------
[After Cleaning]
remaining : 36000 articles
each domain : 4500 articles
------------------------------------------------------------
[Split Dataset]
train 70% : 25200 articles
validation 15% : 5400 articles
test 15% : 5400 articles
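With 8 domains of 4500 cleaned articles each (36000 in total), 70% is 25200 and 15% is 5400, which matches the split above. Assuming (this file does not say so) that each domain was split separately with the same ratios, every domain contributes 3150 / 675 / 675 articles. A minimal Python sketch of such a per-domain split; the helper name `stratified_split`, the `(domain, text)` input shape and the seed are hypothetical:

```python
import random
from collections import defaultdict


def stratified_split(samples, percents=(70, 15, 15), seed=42):
    """Split (domain, text) pairs into train/validation/test per domain.

    Hypothetical helper: assumes the 70/15/15 ratio was applied inside
    each domain, which the dataset notes do not state explicitly.
    """
    by_domain = defaultdict(list)
    for domain, text in samples:
        by_domain[domain].append((domain, text))

    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train, val, test = [], [], []
    for domain in sorted(by_domain):
        items = by_domain[domain]
        rng.shuffle(items)
        n = len(items)
        # Integer arithmetic avoids float rounding: 4500 * 70 // 100 == 3150.
        n_train = n * percents[0] // 100
        n_val = n * percents[1] // 100
        train.extend(items[:n_train])
        val.extend(items[n_train:n_train + n_val])
        test.extend(items[n_train + n_val:])  # remainder goes to test
    return train, val, test
```

Splitting inside each domain keeps all 8 domains equally represented in every split; a single global shuffle would only keep them balanced on average.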
------------------------------------------------------------
[File Details]
TNC_Dataset_Public.zip (21.3 MB)
|
|------TNC_TrainSet.csv (128 MB)
|
|------TNC_ValidationSet.csv (26.7 MB)
|
|------TNC_TestSet.csv (26.0 MB)
|
|------Dataset_info.txt (993 bytes)
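The three split files can be read straight out of the archive without unpacking it. A minimal sketch using only the Python standard library; the helper name `read_split` is hypothetical, and it assumes (not documented here) that each CSV is UTF-8 encoded with a header row. The column names are not listed in this file, so rows are returned as plain dicts:

```python
import csv
import io
import zipfile

# Long articles can exceed csv's default field limit of 131072 characters.
csv.field_size_limit(10**8)


def read_split(zip_path, basename):
    """Read one split CSV from the archive as a list of dicts (hypothetical helper)."""
    with zipfile.ZipFile(zip_path) as zf:
        # Match on the file name only, in case the CSVs sit in a sub-folder.
        matches = [n for n in zf.namelist() if n.rsplit("/", 1)[-1] == basename]
        if not matches:
            raise KeyError(f"{basename} not found in archive")
        with zf.open(matches[0]) as raw:
            # utf-8-sig also strips a byte-order mark if one is present.
            text = io.TextIOWrapper(raw, encoding="utf-8-sig", newline="")
            return list(csv.DictReader(text))
```

For example, `read_split("TNC_Dataset_Public.zip", "TNC_TrainSet.csv")` loads the training split. Note that this holds the whole 128 MB file in memory; iterating over the `DictReader` instead of calling `list()` would stream it.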
------------------------------------------------------------