@@ -1092,7 +1092,7 @@ Then you can first of all turn that blanktag tag into an error tag with

 Now, we could just suggest a wordform on the comma and call it a day:

- COPY ("<, >" SUGGESTWF) TARGET ("," &no-space-after-punct-mark) ;
+ COPY ("<, >"S SUGGESTWF) TARGET ("," &no-space-after-punct-mark) ;

 but that will

@@ -1120,37 +1120,18 @@ word is a "link" word. In the above rules,
 - The `RIGHT` relation says that this is one big error, not two
 separate ones.

- Then we can add a suggestion that puts a space between the forms:
-
- COPY:no-space-after-punct ("<$1 $2>"v SUGGESTWF)
- TARGET ("<(.*)>"r &no-space-after-punct-mark)
- IF (1 ("<(.*)>"r))
- (NOT 0 (co&no-space-after-punct-mark))
- ;
-
- This uses vislcg3's [variable strings / varstrings](http://beta.visl.sdu.dk/cg3/chunked/tags.html#variable-strings) to create the
- wordform suggestion from two regular expression strings matching the
- wordforms of the two cohorts. Note that the `$1` and `$2` refer to the
- first and second regex groups as they appear in the rule, not as they
- appear in the sentence. If the rule referred to the preceding word
- with `(-1 ("<(.*)>"r))`, you'd probably want the suggestion to be `<$2
- $1>`.
-
- We don't put a suggestion-tag on the `co&` cohort (here the word
- `<ja>`), which would lead to some strange suggestions since it is
- already part of the suggestion-tag on the comma `<,>` cohort. See
- [How underlines and replacements are built](#orgb25740d) for more
- on the relationship between `SUGGESTWF` and replacements.
+ We don't have to change the `SUGGESTWF` reading – `divvun-suggest` knows
+ how to extend the underline.

 Now the output is

 "<3>"
 "3" Num Arab Sg Loc Attr @HNOUN
 "3" Num Arab Sg Nom @HNOUN
 "3" Num Arab Sg Ill Attr @HNOUN
- "<,>"
+ "<,>" ,ja → , ja
 "," CLB <NoSpaceAfterPunctMark> &no-space-after-punct-mark ID:3 R:RIGHT:4
- "," CLB <NoSpaceAfterPunctMark> "<, ja >" &no-space-after-punct-mark SUGGESTWF ID:3 R:RIGHT:4
+ "," CLB <NoSpaceAfterPunctMark> "<, >" &no-space-after-punct-mark SUGGESTWF ID:3 R:RIGHT:4
 "<ja>"
 "ja" CC @CNP co&no-space-after-punct-mark ID:4

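A minimal Python sketch of how an underline could be extended along `R:RIGHT` relations, to make the new output concrete. It is not `divvun-suggest`'s code: the `Cohort` class, its fields and `underline_and_replacement` are hypothetical, and the model assumes the replacement is the central cohort's `SUGGESTWF` wordform followed by the unchanged wordforms (with their preceding blanks) of the related cohorts. With the comma's `"<, >"S` suggestion and the related cohort `ja`, it turns the underlined `,ja` into `, ja`, as in the output above.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Cohort:
    """Hypothetical, simplified cohort (not divvun-suggest's data model)."""
    cid: int                         # the ID:n tag
    wordform: str                    # surface form, e.g. "," or "ja"
    blank_before: str = ""           # whitespace preceding this cohort
    suggestwf: Optional[str] = None  # contents of a "<...>"S tag on a SUGGESTWF reading
    right: List[int] = field(default_factory=list)  # targets of R:RIGHT:n


def underline_and_replacement(cohorts: List[Cohort], central_id: int) -> Tuple[str, str]:
    """Assumed model: underline = central cohort + its RIGHT targets;
    replacement = central suggestion + the targets' original wordforms."""
    by_id = {c.cid: c for c in cohorts}
    central = by_id[central_id]
    # Keep sentence order by sorting on position in the input list.
    position = {c.cid: i for i, c in enumerate(cohorts)}
    span = sorted([central] + [by_id[i] for i in central.right],
                  key=lambda c: position[c.cid])

    underline_parts, replacement_parts = [], []
    for i, c in enumerate(span):
        blank = "" if i == 0 else c.blank_before  # only blanks inside the span count
        underline_parts.append(blank + c.wordform)
        if c is central and c.suggestwf is not None:
            replacement_parts.append(blank + c.suggestwf)
        else:
            replacement_parts.append(blank + c.wordform)
    return "".join(underline_parts), "".join(replacement_parts)


# "3,ja" with the comma as central cohort (the ID of "3" is assumed).
sentence = [
    Cohort(2, "3"),
    Cohort(3, ",", suggestwf=", ", right=[4]),
    Cohort(4, "ja"),
]
print(underline_and_replacement(sentence, 3))  # (',ja', ', ja')
```

Under this model, the rule author only states what the central cohort should become; the related cohorts' wordforms are filled in automatically.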
@@ -1209,6 +1190,16 @@ Now we get:
 which should end up as a nice error message, suggestion and
 underline in the UI.

+ ## Creating suggestions using regex capture groups
+
+ Some rules have suggestions like `VSTR:"$1$2"S` along with regex
+ matches like `(0 ("<(.*)>"r) LINK -1 ("<(.*)>"r))`. This uses VISL
+ CG3's [variable strings / varstrings](http://beta.visl.sdu.dk/cg3/chunked/tags.html#variable-strings)
+ to create a wordform suggestion from two regular expression strings
+ matching the wordforms of the two cohorts. Note that `$1` and `$2`
+ refer to the first and second regex groups as they appear in the rule,
+ not as they appear in the sentence. In the above example, `$1` is the
+ wordform of the target cohort (position 0) and `$2` that of the
+ preceding cohort (position -1), so the VSTR would suggest swapping the
+ order of the two words.

 <a id="org26182db"></a>

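Since the numbering is easy to get backwards, here is a small Python sketch that emulates it. This is a simplified model, not CG3 itself: `vstr_from_rule` and its arguments are hypothetical, each context is a (relative position, regex) pair listed in rule order, every position is taken relative to the target (which coincides with `0 ... LINK -1` here), and wordforms are written as plain `<...>` strings. Capture groups are collected in the order the contexts appear in the rule, so the target's wordform becomes `$1` and the preceding word's becomes `$2`.

```python
import re
from typing import List, Optional, Tuple


def vstr_from_rule(template: str,
                   contexts: List[Tuple[int, str]],
                   sentence: List[str],
                   target: int) -> Optional[str]:
    """Hypothetical emulation of varstring expansion.

    contexts are (relative position, regex) pairs in the order they are
    written in the rule; $n refers to the n-th capture group in that order.
    Positions are simplified to be relative to the target.
    """
    groups: List[str] = []
    for offset, pattern in contexts:
        pos = target + offset
        if not 0 <= pos < len(sentence):
            return None  # position outside the sentence: rule does not apply
        match = re.fullmatch(pattern, sentence[pos])
        if match is None:
            return None  # context does not match: rule does not apply
        groups.extend(match.groups())
    # Substitute $1, $2, ... by the capture groups in rule order.
    return re.sub(r"\$(\d+)", lambda m: groups[int(m.group(1)) - 1], template)


words = ["<first>", "<second>"]
# (0 ("<(.*)>"r) LINK -1 ("<(.*)>"r)) with the target on "<second>":
ctx = [(0, r"<(.*)>"), (-1, r"<(.*)>")]
print(vstr_from_rule("$1$2", ctx, words, 1))   # secondfirst
print(vstr_from_rule("$1 $2", ctx, words, 1))  # second first
# Target followed by (1 ...), as in the removed COPY rule: "<,>" then "<ja>".
print(vstr_from_rule("$1 $2", [(0, r"<(.*)>"), (1, r"<(.*)>")], ["<,>", "<ja>"], 0))  # , ja
```

Swapping the rule's contexts, not the sentence, is what changes which word ends up in `$1`.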
@@ -1254,15 +1245,17 @@ underlines and actually show the suggestions, add a rule like

 ADD (&typo SUGGESTWF) (<spelled>) ;

- to the grammar checker CG. The reason we add `SUGGESTWF` and not
- `SUGGEST` is that we're using the wordform-tag directly as the
- suggestion, and not sending each analysis through the generator (as
- `SUGGEST` would do). See also the next section on how replacements
- are built. So if, after disambiguation and grammarchecker CG's, we had
+ to the grammar checker CG (in practice we tend to add the `&typo`
+ tag only if none of the other grammar rules applied). The reason we add
+ `SUGGESTWF` and not `SUGGEST` is that we're using the wordform-tag
+ directly as the suggestion, and not sending each analysis through the
+ generator (as `SUGGEST` would do). See also the next section on how
+ replacements are built. So if, after disambiguation and the grammar
+ checker CGs, we had

 "<coffes>"
- "coffee" N Pl <W:37.3018> <WA:17.3018> <spelled> "<coffees>" &typo SUGGESTWF
- "coffer" N Pl <W:39.1010> <WA:17.3018> <spelled> "<coffers>" &typo SUGGESTWF
+ "coffee" N Pl <W:37.3018> <WA:17.3018> <spelled> "<coffees>"S &typo SUGGESTWF
+ "coffer" N Pl <W:39.1010> <WA:17.3018> <spelled> "<coffers>"S &typo SUGGESTWF

 then the final `divvun-suggest` step would simply use the contents of
 the tags
@@ -1345,16 +1338,6 @@ following CG parse:
 will give us all and only the suggestions we want ("he was" and "we
 were", but not *"he were").

- There is one exception to the above principles; for
- backwards-compatibility, `SUGGESTWF` is still used to mean that the
- whole underline should be replaced by what's in `SUGGESTWF`. This
- means that if you combine `SUGGESTWF` with `RIGHT/LEFT`, you will not
- automatically get the word form for the relation target(s) in your
- replacement, you have to construct the whole replacement yourself.
- This also means you cannot combine `SUGGESTWF` with `SUGGEST` on
- other words. (If we ever change how this works, we will have to first
- update many existing CG3 rules.)
-

 ## Summary of special tags and relations

@@ -1378,7 +1361,7 @@ don't conflict with the below special tags.
 - `SUGGESTWF` on a reading means that `divvun-suggest` should use the
 reading's wordform-tag (e.g. a tag like

- "<Cupertino>"
+ "<Cupertino>"S

 on a *reading*, not as the first line of a cohort) as a suggestion.
 See [Including spelling errors](#org26182db).
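A minimal Python sketch of reading `SUGGESTWF` suggestions off readings, assuming each reading is available as one text line in the format of the `"<coffes>"` example earlier. The function name and the line-based input are assumptions for illustration; `divvun-suggest` works on its own data structures. Only readings carrying `SUGGESTWF` contribute, and the suggestion is the text between `"<` and `>"S`.

```python
import re
from typing import List

# A wordform tag on a reading: "<...>" followed by the S flag.
WORDFORM_TAG = re.compile(r'"<(.*?)>"S(?=\s|$)')


def suggestwf_suggestions(readings: List[str]) -> List[str]:
    """Hypothetical helper: collect wordform-tag suggestions from readings."""
    suggestions = []
    for reading in readings:
        if "SUGGESTWF" not in reading.split():
            continue  # no SUGGESTWF: this reading gives no wordform suggestion
        match = WORDFORM_TAG.search(reading)
        if match:
            suggestions.append(match.group(1))
    return suggestions


coffes = [
    '"coffee" N Pl <W:37.3018> <WA:17.3018> <spelled> "<coffees>"S &typo SUGGESTWF',
    '"coffer" N Pl <W:39.1010> <WA:17.3018> <spelled> "<coffers>"S &typo SUGGESTWF',
]
print(suggestwf_suggestions(coffes))  # ['coffees', 'coffers']
```

Because the wordform is used as-is, no generator lookup is needed, which is the reason given above for preferring `SUGGESTWF` over `SUGGEST` for spelling suggestions.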
@@ -1395,6 +1378,8 @@ don't conflict with the below special tags.
 [How underlines and replacements are built](#orgb25740d) for details.
 Another reason to use `co&` is to ensure we can refer to the
 central error with `$1` in `errors.source.xml`.
+ - `DROP-PRE-BLANK` means the suggestion should trim the preceding
+ space (useful for fixing spaces before punctuation).
 - `&ADDED` means this cohort was added (typically with `ADDCOHORT`)
 and should be a part of the suggestion for the error. It will appear
 after the blank of the preceding cohort, and will not be the central