From 3a5634660fd0499e85a4d75776968443a210c0e7 Mon Sep 17 00:00:00 2001
From: Steven Levithan
Date: Wed, 6 Nov 2024 16:28:24 +0100
Subject: [PATCH] emulation -> accuracy

---
 README.md                        | 28 ++++++++++++++-----------
 demo/demo.css                    |  2 +-
 demo/demo.js                     |  2 +-
 demo/index.html                  |  4 ++--
 spec/helpers/features.js         |  2 ++
 spec/helpers/matchers.js         | 21 ++++++++-----------
 spec/match-assertion.spec.js     |  4 ++--
 spec/match-backreference.spec.js | 31 ++++++++++++++++++++++++---
 spec/match-recursion.spec.js     |  6 +++---
 src/compile.js                   | 22 +++++++++----------
 src/generate.js                  | 12 +++++------
 src/transform.js                 | 36 ++++++++++++++++----------------
 src/unicode.js                   |  2 +-
 src/utils.js                     |  4 ++--
 14 files changed, 101 insertions(+), 75 deletions(-)

diff --git a/README.md b/README.md
index 919173c..2beb166 100644
--- a/README.md
+++ b/README.md
@@ -13,7 +13,7 @@ Compared to running the actual [Oniguruma](https://github.com/kkos/oniguruma) C

 ### [Try the demo REPL](https://slevithan.github.io/oniguruma-to-es/demo/)

-Oniguruma-To-ES deeply understands all of the hundreds of large and small differences in Oniguruma and JavaScript regex syntax and behavior across multiple JavaScript version targets. It's *obsessive* about precisely following Oniguruma syntax rules and ensuring that the emulated features it supports have **exactly the same behavior**, even in extreme edge cases. And it's battle-tested on thousands of real-world Oniguruma regexes used in TextMate grammars (via the Shiki library). A few uncommon features can't be perfectly emulated and allow rare differences, but if you don't want to allow this, you can set the `emulation` option to `strict` and throw for such patterns (see details below).
+Oniguruma-To-ES deeply understands all of the hundreds of large and small differences in Oniguruma and JavaScript regex syntax and behavior across multiple JavaScript version targets. It's *obsessive* about precisely following Oniguruma syntax rules and ensuring that the emulated features it supports have **exactly the same behavior**, even in extreme edge cases. And it's battle-tested on thousands of real-world Oniguruma regexes used in TextMate grammars (via the Shiki library). A few uncommon features can't be perfectly emulated and allow rare differences, but if you don't want to allow this, you can set the `accuracy` option to throw for such patterns (see details below).

 ## 📜 Contents

@@ -83,7 +83,7 @@ A string with `i`, `m`, and `x` in any order (all optional).

 ```ts
 type CompileOptions = {
-  emulation?: 'strict' | 'default' | 'loose';
+  accuracy?: 'strict' | 'default' | 'loose';
   global?: boolean;
   hasIndices?: boolean;
   maxRecursionDepth?: number | null;
@@ -139,24 +139,28 @@ function toRegexAst(

 These options are shared by functions [`compile`](#compile) and [`toRegExp`](#toregexp).

-### `emulation`
+### `accuracy`

 One of `'strict'`, `'default'` *(default)*, or `'loose'`.

-Sets the level of emulation strictness.
+Sets the level of emulation rigor/strictness.

 - **Strict:** Throw if the pattern can't be emulated with identical behavior (even in rare edge cases) for the given target.
 - **Default:** The best choice in most cases. Permits a few close approximations of Oniguruma in order to support additional features.
 - **Loose:** Useful for non-critical matching like syntax highlighting where having some mismatches is better than not working.

-Each level of increased emulation strictness supports a subset of patterns supported by less strict modes. If a given pattern doesn't produce an error for a particular emulation mode, its generated result will be identical with all lower levels of strictness (given the same `target`).
+Each level of increased accuracy supports a subset of patterns supported by lower accuracies. If a given pattern doesn't produce an error for a particular accuracy, its generated result will be identical with all lower levels of accuracy (given the same `target`).

 More details

-#### `default` mode
+#### `strict`

-Supports all features of `strict` mode, plus the following additional features, depending on `target`:
+Supports slightly fewer features, but the missing features are all relatively uncommon (see below).
+
+#### `default`
+
+Supports all features of `strict`, plus the following additional features, depending on `target`:

 - All targets (`ESNext` and earlier):
   - Enables use of `\X` using a close approximation of a Unicode extended grapheme cluster.
@@ -166,12 +170,12 @@ Supports all features of `strict` mode, plus the following additional features,
 - `ES2018`:
   - Enables use of POSIX classes `[:graph:]` and `[:print:]` using ASCII-based versions rather than the Unicode versions available for `ES2024` and later. Other POSIX classes are always based on Unicode.

-#### `loose` mode
+#### `loose`

 Supports all features of `default`, plus the following:

 - Silences errors for unsupported uses of the search-start anchor `\G` (a flexible assertion that doesn’t have a direct equivalent in JavaScript).
-  - Oniguruma-To-ES uses a variety of strategies to accurately emulate many common uses of `\G`. When using `loose` mode, if a `\G` assertion is found that doesn't have a known emulation strategy, the `\G` is simply removed and JavaScript's `y` (`sticky`) flag is added. This might lead to some false positives and negatives.
+  - Oniguruma-To-ES uses a variety of strategies to accurately emulate many common uses of `\G`. When using `loose` accuracy, if a `\G` assertion is found that doesn't have a known emulation strategy, the `\G` is simply removed and JavaScript's `y` (`sticky`) flag is added. This might lead to some false positives and negatives.

 ### `global`

@@ -190,9 +194,9 @@ Include JavaScript flag `d` (`hasIndices`) in the result.

 *Default: `6`.*

-If an integer between `2` and `100`, common recursion forms are supported and recurse up to the specified depth limit. If set to `null`, any use of recursion results in an error.
+Specifies the recursion depth limit. Supported values are integers `2` to `100` and `null`. If `null`, any use of recursion results in an error.

-Since recursion isn't infinite-depth like in Oniguruma, use of recursion also results in an error if the `emulation` option is set to `'strict'`.
+Since recursion isn't infinite-depth like in Oniguruma, use of recursion also results in an error if using strict `accuracy`.

 More details

@@ -906,7 +910,7 @@ The table above doesn't include all aspects that Oniguruma-To-ES emulates (inclu

 1. Target `ES2018` doesn't allow Unicode property names added in JavaScript specifications after ES2018 to be used.
 2. Unicode blocks are easily emulatable but their character data would significantly increase library weight. They're also a deeply flawed and arguably-unuseful feature, given the ability to use Unicode scripts and other properties.
-3. With target `ES2018`, the specific POSIX classes `[:graph:]` and `[:print:]` are an error if option `emulation` is `'strict'`, and they use ASCII-based versions rather than the Unicode versions available for target `ES2024` and later.
+3. With target `ES2018`, the specific POSIX classes `[:graph:]` and `[:print:]` use ASCII-based versions rather than the Unicode versions available for target `ES2024` and later, and they result in an error if using strict `accuracy`.
 4. Target `ES2018` doesn't support nested *negated* character classes.
 5. It's not an error for *numbered* backreferences to come before their referenced group in Oniguruma, but an error is the best path for Oniguruma-To-ES because (1) most placements are mistakes and can never match (based on the Oniguruma behavior for backreferences to nonparticipating groups), (2) erroring matches the behavior of named backreferences, and (3) the edge cases where they're matchable rely on rules for backreference resetting within quantified groups that are different in JavaScript and aren't emulatable. Note that it's not a backreference in the first place if using `\10` or higher and not as many capturing groups are defined to the left (it's an octal or identity escape).
 6. The recursion depth limit is specified by option `maxRecursionDepth`. Some forms of recursion (multiple recursions in the same pattern, and recursion with backreferences) aren't yet supported. Patterns that would error in Oniguruma due to triggering infinite recursion might find a match in Oniguruma-To-ES since recursion is bounded (future versions will detect this and error at transpilation time).
diff --git a/demo/demo.css b/demo/demo.css
index 18b412f..588f1d1 100644
--- a/demo/demo.css
+++ b/demo/demo.css
@@ -34,7 +34,7 @@ h2 {

 code {
   padding: 0 3px;
-  background-color: #f6f6f6;
+  background-color: #f0f0f0;
 }

 kbd {
diff --git a/demo/demo.js b/demo/demo.js
index ddeb215..4b1b487 100644
--- a/demo/demo.js
+++ b/demo/demo.js
@@ -5,8 +5,8 @@ const state = {
     x: getValue('flag-x'),
   },
   opts: {
+    accuracy: getValue('option-accuracy'),
     allowSubclassBasedEmulation: getValue('option-allowSubclassBasedEmulation'),
-    emulation: getValue('option-emulation'),
     global: getValue('option-global'),
     hasIndices: getValue('option-hasIndices'),
     maxRecursionDepth: getValue('option-maxRecursionDepth'),
diff --git a/demo/index.html b/demo/index.html
index 23c549b..b176da4 100644
--- a/demo/index.html
+++ b/demo/index.html
@@ -44,8 +44,8 @@

Try it