Commit 39fd20c

📝 Update docs cf. DROP-PRE-BLANK and SUGGESTWF working like SUGGEST
1 parent 9ec735c commit 39fd20c

1 file changed: +27 -42 lines changed

README.md

Lines changed: 27 additions & 42 deletions
@@ -1092,7 +1092,7 @@ Then you can first of all turn that blanktag tag into an error tag with

 Now, we could just suggest a wordform on the comma and call it a day:

-COPY ("<, >" SUGGESTWF) TARGET ("," &no-space-after-punct-mark) ;
+COPY ("<, >"S SUGGESTWF) TARGET ("," &no-space-after-punct-mark) ;

 but that will

@@ -1120,37 +1120,18 @@ word is a "link" word. In the above rules,
 - The `RIGHT` relation says that this is one big error, not two
 separate ones.

-Then we can add a suggestion that puts a space between the forms:
-
-COPY:no-space-after-punct ("<$1 $2>"v SUGGESTWF)
-TARGET ("<(.*)>"r &no-space-after-punct-mark)
-IF (1 ("<(.*)>"r))
-(NOT 0 (co&no-space-after-punct-mark))
-;
-
-This uses vislcg3's [variable strings / varstrings](http://beta.visl.sdu.dk/cg3/chunked/tags.html#variable-strings) to create the
-wordform suggestion from two regular expression strings matching the
-wordforms of the two cohorts. Note that the `$1` and `$2` refer to the
-first and second regex groups as they appear in the rule, not as they
-appear in the sentence. If the rule referred to the preceding word
-with `(-1 ("<(.*)>"r))`, you'd probably want the suggestion to be `<$2
-$1>`.
-
-We don't put a suggestion-tag on the `co&` cohort (here the word
-`<ja>`), which would lead to some strange suggestions since it is
-already part of the suggestion-tag on the comma `<,>` cohort. See
-[How underlines and replacements are built](#orgb25740d) for more
-on the relationship between `SUGGESTWF` and replacements.
+We don't have to change the SUGGESTWF reading – `divvun-suggest` knows
+how to extend the underline.

 Now the output is

 "<3>"
 "3" Num Arab Sg Loc Attr @HNOUN
 "3" Num Arab Sg Nom @HNOUN
 "3" Num Arab Sg Ill Attr @HNOUN
-"<,>"
+"<,>" ,ja → , ja
 "," CLB <NoSpaceAfterPunctMark> &no-space-after-punct-mark ID:3 R:RIGHT:4
-"," CLB <NoSpaceAfterPunctMark> "<, ja>" &no-space-after-punct-mark SUGGESTWF ID:3 R:RIGHT:4
+"," CLB <NoSpaceAfterPunctMark> "<, >" &no-space-after-punct-mark SUGGESTWF ID:3 R:RIGHT:4
 "<ja>"
 "ja" CC @CNP co&no-space-after-punct-mark ID:4
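To make the `,ja → , ja` annotation in the output above concrete, here is a minimal Python sketch of how the underline and replacement could be assembled. The data layout and the function name `underline_and_replacement` are hypothetical, chosen only for illustration; this is not `divvun-suggest`'s actual implementation. It assumes the covered cohorts are adjacent with no blank between them, as in the example.

```python
# Hypothetical model of the two cohorts covered by the error:
# (wordform, suggested wordform or None). Not divvun-suggest's real data model.
cohorts = [
    (",", ", "),   # central cohort, carries SUGGESTWF with "<, >"
    ("ja", None),  # co& cohort reached via R:RIGHT, no suggestion of its own
]


def underline_and_replacement(cohorts):
    """Join the covered cohorts into the underlined text and its replacement.

    Assumption: a cohort without a suggestion keeps its original wordform.
    """
    underline = "".join(form for form, _sugg in cohorts)
    replacement = "".join(
        sugg if sugg is not None else form for form, sugg in cohorts
    )
    return underline, replacement


print(underline_and_replacement(cohorts))
```

Under these assumptions the suggestion on the comma only needs to cover the comma itself, because the wordform of the `RIGHT` target is carried over unchanged.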

@@ -1209,6 +1190,16 @@ Now we get:
 which should end up as a nice error message, suggestion and
 underline in the UI.

+## Creating suggestions using regex capture groups
+
+Some rules have suggestions like `VSTR:"$1$2"S` along with regex
+matches like `(0 ("<(.*)>"r) LINK -1 ("<(.*)>"r))`. This uses VISL
+CG3's [variable strings / varstrings](http://beta.visl.sdu.dk/cg3/chunked/tags.html#variable-strings)
+to create a wordform suggestion from two regular expression strings
+matching the wordforms of the two cohorts. Note that the `$1` and `$2`
+refer to the first and second regex groups as they appear in the rule,
+not as they appear in the sentence (so the above VSTR and regex would
+suggest swapping the order of the two words).
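The rule-order numbering of `$1` and `$2` can be illustrated outside CG3 with a small Python sketch. The helper `suggest` and its parameters are hypothetical and only mimic the substitution described above; they are not part of VISL CG3 or `divvun-suggest`.

```python
import re

# Same pattern as the "<(.*)>"r tag: capture everything between < and >.
WORDFORM = re.compile(r"<(.*)>")


def suggest(current, preceding, template="$1$2"):
    """Fill a varstring-like template from two wordform tags.

    The rule tests position 0 first and then LINK -1, so group 1 comes
    from the current cohort and group 2 from the preceding one.
    """
    groups = [
        WORDFORM.fullmatch(current).group(1),    # $1: position 0
        WORDFORM.fullmatch(preceding).group(1),  # $2: position -1
    ]
    out = template
    for i, value in enumerate(groups, start=1):
        out = out.replace(f"${i}", value)
    return out


# In the sentence "hello" precedes "world", but $1 is the current word:
print(suggest("<world>", "<hello>"))          # current word first
print(suggest("<world>", "<hello>", "$2$1"))  # sentence order
```

With the default template the current word comes out before the preceding one, which is the swap described in the paragraph above; a template of `$2$1` would restore sentence order.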

 <a id="org26182db"></a>

@@ -1254,15 +1245,17 @@ underlines and actually show the suggestions, add a rule like

 ADD (&typo SUGGESTWF) (<spelled>) ;

-to the grammar checker CG. The reason we add `SUGGESTWF` and not
-`SUGGEST` is that we're using the wordform-tag directly as the
-suggestion, and not sending each analysis through the generator (as
-`SUGGEST` would do). See also the next section on how replacements
-are built. So if, after disambiguation and grammarchecker CG's, we had
+to the grammar checker CG (in practice we tend to only add the `&typo`
+tag if none of the other grammar rules applied). The reason we add
+`SUGGESTWF` and not `SUGGEST` is that we're using the wordform-tag
+directly as the suggestion, and not sending each analysis through the
+generator (as `SUGGEST` would do). See also the next section on how
+replacements are built. So if, after disambiguation and grammarchecker
+CG's, we had

 "<coffes>"
-"coffee" N Pl <W:37.3018> <WA:17.3018> <spelled> "<coffees>" &typo SUGGESTWF
-"coffer" N Pl <W:39.1010> <WA:17.3018> <spelled> "<coffers>" &typo SUGGESTWF
+"coffee" N Pl <W:37.3018> <WA:17.3018> <spelled> "<coffees>"S &typo SUGGESTWF
+"coffer" N Pl <W:39.1010> <WA:17.3018> <spelled> "<coffers>"S &typo SUGGESTWF

 then the final `divvun-suggest` step would simply use the contents of
 the tags
@@ -1345,16 +1338,6 @@ following CG parse:
 will give us all and only the suggestions we want ("he was" and "we
 were", but not *"he were").

-There is one exception to the above principles; for
-backwards-compatibility, `SUGGESTWF` is still used to mean that the
-whole underline should be replaced by what's in `SUGGESTWF`. This
-means that if you combine `SUGGESTWF` with `RIGHT/LEFT`, you will not
-automatically get the word form for the relation target(s) in your
-replacement, you have to construct the whole replacement yourself.
-This also means you cannot combine `SUGGESTWF` with `SUGGEST` on
-other words. (If we ever change how this works, we will have to first
-update many existing CG3 rules.)
-

 ## Summary of special tags and relations

@@ -1378,7 +1361,7 @@ don't conflict with the below special tags.
 - `SUGGESTWF` on a reading means that `divvun-suggest` should use the
 reading's wordform-tag (e.g. a tag like

-"<Cupertino>"
+"<Cupertino>"S

 on a *reading*, not as the first line of a cohort) as a suggestion.
 See [Including spelling errors](#org26182db).
@@ -1395,6 +1378,8 @@ don't conflict with the below special tags.
 [How underlines and replacements are built](#orgb25740d) for details.
 Another reason to use `co&` is to ensure we can refer to the
 central error with `$1` in `errors.source.xml`.
+- `DROP-PRE-BLANK` means the suggestion should trim the preceding
+space (useful for fixing spaces before punctuation).
 - `&ADDED` means this cohort was added (typically with `ADDCOHORT`)
 and should be a part of the suggestion for the error. It will appear
 after the blank of the preceding cohort, and will not be the central
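As a rough illustration of what the `DROP-PRE-BLANK` entry in this hunk describes, the following Python sketch applies a replacement to a span of text and optionally trims the blanks directly before it. The function `apply_suggestion` and its parameters are hypothetical; how `divvun-suggest` actually adjusts the underline for this tag is an assumption here, not something the diff states.

```python
def apply_suggestion(text, start, end, replacement, drop_pre_blank=False):
    """Replace text[start:end] with replacement.

    With drop_pre_blank=True, spaces immediately before the span are
    removed as well (assumed behaviour, modelled on the tag description).
    """
    if drop_pre_blank:
        # Extend the replaced span leftwards over any preceding spaces.
        while start > 0 and text[start - 1] == " ":
            start -= 1
    return text[:start] + replacement + text[end:]


# The comma at index 6 has a stray space in front of it.
print(apply_suggestion("Hello , world", 6, 7, ",", drop_pre_blank=True))
```

Without the flag the stray space before the comma would stay in place, which is the kind of space-before-punctuation error the tag is meant to fix.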
