forked from tatuylonen/wiktextract
TODO
XXX Some very common words (e.g., green, die, time, plant, be, king)
have separate translation pages (e.g., green/translations). There are
enough of these that they should be handled separately and their
translations added to the senses in question.
- see "translation_link"
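A minimal sketch of recognizing such subpages by title, assuming they
always end in "/translations" as in the example above. The function
name and constant are hypothetical and are not part of wiktextract.

```python
# Hypothetical helper, not wiktextract API: detect translation subpages
# such as "green/translations" and recover the base word.
TRANSLATIONS_SUFFIX = "/translations"


def split_translation_subpage(title: str) -> tuple[str, bool]:
    """Return (base_word, is_translation_subpage) for a page title."""
    if title.endswith(TRANSLATIONS_SUFFIX) and len(title) > len(TRANSLATIONS_SUFFIX):
        return title[: -len(TRANSLATIONS_SUFFIX)], True
    # Anything else is treated as an ordinary page title.
    return title, False
```

The base word returned here could then be used to attach the subpage's
translations to the senses of the main entry.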
Map Participle part-of-speech to verb with the proper tag
Consider combining Symbol, Letter, Character, Punctuation, Glyph part-of-speech
Consider combining affix, prefix, suffix, infix, circumfix, interfix
part-of-speech
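One possible shape for these part-of-speech mappings, if they were
adopted, is a lookup table from the section header to a canonical
part-of-speech plus extra tags. This is only a sketch: the table
contents, the tag names, and the choice of "symbol" and "affix" as
the combined values are assumptions, not existing wiktextract behavior.

```python
# Hypothetical mapping, not wiktextract API: header -> (pos, extra tags).
POS_MAP = {
    "participle": ("verb", ["participle"]),
    # Assumed grouping of character-like headers under "symbol".
    "symbol": ("symbol", []),
    "letter": ("symbol", ["letter"]),
    "character": ("symbol", ["character"]),
    "punctuation": ("symbol", ["punctuation"]),
    "glyph": ("symbol", ["glyph"]),
    # Assumed grouping of affix-like headers under "affix".
    "affix": ("affix", []),
    "prefix": ("affix", ["prefix"]),
    "suffix": ("affix", ["suffix"]),
    "infix": ("affix", ["infix"]),
    "circumfix": ("affix", ["circumfix"]),
    "interfix": ("affix", ["interfix"]),
}


def normalize_pos(header: str) -> tuple[str, list[str]]:
    """Map a section header to (pos, tags); unknown headers pass through."""
    key = header.strip().lower()
    pos, tags = POS_MAP.get(key, (key, []))
    return pos, list(tags)  # copy so callers cannot mutate the table
```

Keeping the original header as a tag means no information is lost by
the combination.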
XXX It is fairly common to have {{en-phrase}} under ==Verb==. What
part-of-speech should these be?
XXX Check Spanish "capitales" - has two es-noun items with different
gender and meaning. Is this pattern common? Currently only one of them
gets included.
XXX Qualifiers specified in linkages after {{l|...}} do not get
included in output, e.g., Galician "ano" Etymology 2. Not sure how
common this is overall.
XXX check handling of "alter" at the beginning of parse_any(), is it correct?
XXX change wiktwords to output senses rather than words
XXX check variant vs. data in page.py; is variant properly added?
XXX Redo formatting of {{place|...}} (need custom code)
XXX lots of other things also need custom code. Perhaps just call the lua
code of modules for that?
- check: Portuguese (need consultation?)
  - pt-pron def (various ways to express pronoun)
  - pt-pronoun-with-n
  - pt-pronoun-with-l
- check: Japanese (need consultation?)
  - ja-verb form of
Check:
Pinyin reading of
mul-kangxi radical-def
mul-suowen radical-def
mul-kanadef
mul-domino def
mul-cjk stroke-def
Brai-def
speciesabbrev
Capture coordinate terms under Coordinate terms header
# XXX pages linked under "Category:English glossaries" may be interesting
# to check out
# XXX pages linked under "Category:English appendices" may be interesting
# to check out
# XXX pages like "Appendix:Glossary of ..." seem interesting, might want to
# extract data from them?
# XXX "Appendix:Animals" seems to contain helpful information that we might
# want to extract.
# XXX Thesaurus:* pages seem potentially useful
# XXX Check out: Appendix:Roget's thesaurus classification. Could this be
# helpful in hypernyms etc?
# Category:<langcode>:All topics and its subcategories seems very interesting.
# The English category tree looks very promising. XXX where are the
# category relationships defined? Wikimedia Commons?
# XXX check Unsupported titles/* and how to get their real title
# XXX test "sama" (Finnish) to check that linkages for list items are correct
# XXX test "juttu" (Finnish) to check that sense is correctly included in
# linkages
# XXX check pronunciations for "house" to see that "noun" and "verb" senses
# are correctly parsed
# XXX test "Friday" - it has embedded template in Related terms (currently
# handled wrong)
# XXX Finnish ällös seems to leave [[w:Optative mood|optative]] in gloss ???
# XXX Finnish binääri leaves binäärinen#Compounds in gloss
XXX check ====Alternative forms====, e.g., clinicopathologic
XXX consider changing audios format to something more usable/extensible
XXX Check the R: tags. They seem to contain interesting links to
other databases (identifiers into those databases)