This line in the Hack grammar is an invalid Oniguruma pattern. The behavior for TextMate grammars when an invalid pattern is encountered (dutifully reproduced by vscode-textmate) is to silently fail, causing the regex to never match. So whatever it's intended to do is not happening.
The error in this pattern is the use of [^...\\x7f-\\xff]. From context, it's easy to figure out that the author meant for this to exclude, among other things, the range from code point U+007F to U+00FF. And although that's what it would do in nearly every modern regex engine, that's not what it's doing in Oniguruma.
In Oniguruma, \xHH matches an "encoded byte value", not a code point value like \x{...} does. That means for values from 0 to 7F, \xHH and \x{HH} work the same, but they diverge for hex values 80 to FF. With \xHH above 7F, the token must be part of a valid encoded byte sequence. For example, the three-byte sequence \xEF\xBB\xBF in Oniguruma is equivalent to the single code point \uFEFF in JavaScript (NOT the same as the three code points \xEF\xBB\xBF in JavaScript) and the same as \x{FEFF} in Oniguruma. \xFF, on its own, is not a valid encoded byte sequence, so it is an error in Oniguruma.
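The byte-versus-code-point distinction can be checked outside of Oniguruma. The following is a minimal Python sketch, assuming the encoding in play is UTF-8 (which is what the \xEF\xBB\xBF example implies). The helper name is_valid_utf8 is made up for illustration, and the code does not call Oniguruma itself; it only shows why a lone FF byte cannot be a valid encoded sequence.

```python
# Illustration only: this uses Python's UTF-8 codec, not Oniguruma.
# Assumption: the "encoded byte values" are UTF-8 bytes.

def is_valid_utf8(data: bytes) -> bool:
    """Hypothetical helper: True if the bytes form a complete, valid UTF-8 sequence."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

# The single code point U+FEFF is encoded as the three bytes EF BB BF.
bom_bytes = "\ufeff".encode("utf-8")
print(bom_bytes == b"\xef\xbb\xbf")

# Values up to 7F are single-byte sequences, so byte and code point coincide.
print(is_valid_utf8(b"\x7f"))

# A lone FF byte never appears in valid UTF-8, so it cannot stand on its own.
print(is_valid_utf8(b"\xff"))

# The code point U+00FF is a different thing: it encodes as the two bytes C3 BF.
ff_code_point_bytes = "\u00ff".encode("utf-8")
print(ff_code_point_bytes == b"\xc3\xbf")
```

The last two checks are the crux: U+00FF is a perfectly good code point, but the byte FF is not a valid way to write it, which is why \xff and \x{ff} behave differently.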
To fix this, the \\xff should be replaced with \\x{ff}, which refers to the code point U+00FF rather than an encoded byte. The pattern will then run in Oniguruma.
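As a rough sketch of how this kind of rewrite could be automated, the hypothetical helper below converts \xHH escapes with values of 80 and above into the \x{HH} form. It is not part of any grammar tooling, and it operates on the regex source after JSON unescaping (so \xff, not \\xff). It also assumes the author meant code points; a pattern that deliberately spells out a multi-byte sequence such as \xEF\xBB\xBF would change meaning under this rewrite, so the output should be reviewed by hand.

```python
import re

# Matches any backslash escape. Group 1 is set only for the \xHH form.
# Consuming escapes one at a time keeps an escaped backslash (\\) followed
# by a literal "xff" from being mistaken for a \xff token.
_ESCAPE = re.compile(r"\\(?:x([0-9A-Fa-f]{2})|.)", re.DOTALL)

def use_code_point_escapes(pattern: str) -> str:
    """Hypothetical helper: rewrite \\xHH (HH >= 80) as \\x{HH}."""
    def fix(match: re.Match) -> str:
        hex_digits = match.group(1)
        if hex_digits is not None and int(hex_digits, 16) >= 0x80:
            return "\\x{" + hex_digits + "}"
        # Leave \x00 to \x7F and all other escapes untouched.
        return match.group(0)
    return _ESCAPE.sub(fix, pattern)

# Example with a made-up character class in the style described above.
fixed = use_code_point_escapes(r"[^a-z\x7f-\xff]")
print(fixed)
```

Values up to 7F are left alone because \xHH and \x{HH} already agree there, so only the problematic tokens are touched.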
slevithan changed the title from "Hack grammar contains an invalid Oniguruma pattern" to "Hack grammar contains invalid Oniguruma token \xff" on Nov 23, 2024