Commit

Recursion tweaks
slevithan committed Oct 30, 2024
1 parent 76c32cd commit 661764e
Showing 5 changed files with 47 additions and 37 deletions.
28 changes: 20 additions & 8 deletions README.md
@@ -24,26 +24,34 @@ These options are shared by functions `compile` and `toRegExp`.

Allows results that differ from Oniguruma in rare cases. If `false`, throws if the pattern can't be emulated with identical behavior for the given `target`.

+*Default: `true`.*
+
+<details>
+<summary>More details</summary>
+
Specifically, this option enables the following additional features, depending on `target`:

 - All targets (`ESNext` and earlier):
   - Enables use of `\X` using a close approximation of a Unicode extended grapheme cluster.
-  - Enables recursion via `\g<0>` and `\g<name>` using a depth limit specified via option `maxRecursionDepth`.
+  - Enables recursion (e.g. via `\g<0>`) using a depth limit specified via option `maxRecursionDepth`.
 - `ES2024` and earlier:
   - Enables use of case-insensitive backreferences to case-sensitive groups.
 - `ES2018`:
   - Enables use of POSIX classes `[:graph:]` and `[:print:]` using ASCII versions rather than the Unicode versions available for `ES2024` and later. Other POSIX classes always use Unicode.
-
-*Default: `true`.*
+</details>

### `maxRecursionDepth`

-If `null`, any use of recursion throws. If an integer between `2` and `100` (and `allowBestEffort` is on), common recursion forms are supported and recurse up to the specified max depth.
-
-Using a higher limit is not a problem if needed. Although it can add a slight performance cost, that's limited to regexes that actually use recursion.
+If `null`, any use of recursion throws. If an integer between `2` and `100` (and `allowBestEffort` is `true`), common recursion forms are supported and recurse up to the specified max depth.

*Default: `6`.*

+<details>
+<summary>More details</summary>
+
+Using a higher limit is not a problem if needed. Although there can be a performance cost (generally small unless exacerbating an existing problem with superlinear backtracking), there is no effect on regexes that don't use recursion.
+</details>
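
A rough way to picture the depth limit: a pattern that recurses into itself can be emulated in a plain JavaScript regex by unrolling it a fixed number of times, which is also why a higher limit produces a longer generated pattern. The sketch below is illustrative only and is not taken from this library or from `regex-recursion`; `unrollRecursion` and its parameters are hypothetical names, and it assumes exactly one whole-pattern recursion token.

```javascript
// Hypothetical helper (not part of any library): unrolls one whole-pattern
// recursion token into `maxDepth` nested copies of the pattern.
function unrollRecursion(source, token, maxDepth) {
  const parts = source.split(token);
  if (parts.length !== 2) {
    throw new Error('Expected exactly one recursion token');
  }
  const [left, right] = parts;
  // Past the depth limit, the recursion point can never match
  let expanded = '(?:(?!))';
  for (let depth = 0; depth < maxDepth; depth++) {
    expanded = `(?:${left}${expanded}${right})`;
  }
  return expanded;
}

// `a\g<0>?b` describes balanced runs such as "ab" and "aabb"
const unrolled = unrollRecursion(String.raw`a\g<0>?b`, String.raw`\g<0>`, 2);
const re = new RegExp(`^${unrolled}$`);
console.log(re.test('aabb')); // true
console.log(re.test('aaabbb')); // false: would need a third level
```

With two levels the expansion is `(?:a(?:a(?:(?!))?b)?b)`; each extra level wraps the pattern once more, so the source grows linearly with the limit.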

### `optimize`

Simplify the generated pattern when it doesn't change the meaning.
@@ -54,6 +62,11 @@ Simplify the generated pattern when it doesn't change the meaning.

Sets the JavaScript language version for generated patterns and flags. Later targets allow faster processing, simpler generated source, and support for additional Oniguruma features.

+*Default: `'ES2024'`.*
+
+<details open>
+<summary>More details</summary>
+
 - `ES2018`: Uses JS flag `u`.
   - Emulation restrictions: Character class intersection, nested negated classes, and Unicode properties added after ES2018 are not allowed.
   - Generated regexes might use ES2018 features that require Node.js 10 or a browser version released during 2018 to 2023 (in Safari's case). Minimum requirement for any regex is Node.js 6 or a 2016-era browser.
@@ -63,8 +76,7 @@ Sets the JavaScript language version for generated patterns and flags. Later tar
 - `ESNext`: Uses JS flag `v` and allows use of flag groups and duplicate group names.
   - Benefits: Faster transpilation, simpler generated source, and duplicate group names are preserved across separate alternation paths.
   - Generated regexes might use features that require Node.js 23 or a 2024-era browser (except Safari, which lacks support).
-
-*Default: `'ES2024'`.*
+</details>

## Unicode, mixed case-sensitivity

2 changes: 1 addition & 1 deletion demo/index.html
@@ -14,7 +14,7 @@ <h1>
<img src="https://upload.wikimedia.org/wikipedia/commons/c/c2/GitHub_Invertocat_Logo.svg" width="25" height="25" alt="GitHub">
</a>
</h1>
-<p>This is a basic REPL for testing the output of <a href="https://github.com/slevithan/oniguruma-to-es">Oniguruma-To-ES</a>, an Oniguruma to JavaScript RegExp transpiler. See <a href="https://github.com/kkos/oniguruma/blob/master/doc/RE">Oniguruma syntax</a>.</p>
+<p>This is a basic REPL for testing the output of <a href="https://github.com/slevithan/oniguruma-to-es">Oniguruma-To-ES</a>, an Oniguruma to JavaScript RegExp transpiler. See <a href="https://github.com/kkos/oniguruma/blob/master/doc/RE">Oniguruma syntax</a> for an overview, but there are many subtleties to its differences from JavaScript that aren't shown in the docs.</p>

<h2>Try it</h2>
<p><textarea id="input" spellcheck="false" oninput="showOutput(this); autoGrow(this)"></textarea></p>
38 changes: 19 additions & 19 deletions dist/index.min.js

Large diffs are not rendered by default.

6 changes: 3 additions & 3 deletions spec/match-recursion.spec.js
@@ -17,7 +17,7 @@ describe('Recursion', () => {
expect(() => compile('', '', {maxRecursionDepth: null})).not.toThrow();
});

-it('should throw if maxRecursionDepth is not null or a positive-integer in range 2-100', () => {
+it('should throw if maxRecursionDepth is not null or a positive-integer 2-100', () => {
for (const value of [-2, 0, 1, 2.5, 101, Infinity, '2', '', undefined, NaN, false, [], {}]) {
expect(() => compile('', '', {maxRecursionDepth: value})).toThrow();
}
@@ -51,15 +51,15 @@ describe('Recursion', () => {

describe('numbered', () => {
// Current limitation of `regex-recursion`
-it('should throw for recursion by number', () => {
+it('should throw for numbered recursion', () => {
expect(() => compile(r`(a\g<1>?)`)).toThrow();
expect(() => compile(r`(a\g<2>(\g<1>?))`)).toThrow();
});
});

describe('relative numbered', () => {
// Current limitation of `regex-recursion`
-it('should throw for recursion by number', () => {
+it('should throw for relative numbered recursion', () => {
expect(() => compile(r`(a\g<-1>?)`)).toThrow();
expect(() => compile(r`(a\g<+1>(\g<-2>?))`)).toThrow();
});
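
The range rule exercised by the `maxRecursionDepth` test in this file can be stated as a small predicate. This is a minimal sketch of the documented rule (`null`, or an integer from `2` to `100`), not the library's actual validation code; `isValidMaxRecursionDepth` is a hypothetical name.

```javascript
// Hypothetical predicate mirroring the documented rule for `maxRecursionDepth`:
// either `null` (recursion disabled) or an integer from 2 to 100.
function isValidMaxRecursionDepth(value) {
  return value === null || (Number.isInteger(value) && value >= 2 && value <= 100);
}

// Values the spec expects to be rejected
const rejected = [-2, 0, 1, 2.5, 101, Infinity, '2', '', undefined, NaN, false, [], {}];
console.log(rejected.some(isValidMaxRecursionDepth)); // false

// Values inside the documented range, plus `null`
console.log([null, 2, 6, 100].every(isValidMaxRecursionDepth)); // true
```

`Number.isInteger` does no type coercion, so strings such as `'2'` are rejected along with `NaN`, `Infinity`, and fractional numbers.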
10 changes: 4 additions & 6 deletions src/transform.js
@@ -372,21 +372,19 @@ const SecondPassVisitor = {
const parentAlt = getParentAlternative(node);

// ## Handle recursion; runs after subroutine expansion
-// TODO: Can this be refactored into conditions for `isDirectRecursion` and `isIndirectRecursion`?
-const isRecursion = openSubroutineRefs.has(ref) || openDirectCaptures.has(origin);
-const isDirectRecursion = isRecursion && !openSubroutineRefs.size;
-if (isRecursion && !isDirectRecursion) {
+if (openSubroutineRefs.has(ref)) {
// Indirect recursion is supportable at the AST level but would require `regex-recursion`
// to allow multiple recursions in a pattern, along with code changes here (after which
// `openDirectCaptures` and `openSubroutineRefs` could be combined)
throw new Error('Unsupported indirect recursion');
}
if (origin) {
// Name or number; not mixed since can't use numbered subroutines with named capture
openSubroutineRefs.add(ref);
} else {
openDirectCaptures.add(node);
}
-if (isDirectRecursion) {
+if (openDirectCaptures.has(origin)) {
// Recursion doesn't change following backrefs to `ref` (unlike other subroutines), so
// don't wrap with a capture for this node's ref
replaceWith(createRecursion(ref));
@@ -546,7 +544,7 @@ function cloneCapturingGroup(obj, originMap, up, up2) {
} else {
if (key === 'type' && value === AstTypes.CapturingGroup) {
// Key is the copied node, value is the origin node
-originMap.set(store, obj);
+originMap.set(store, originMap.get(obj) ?? obj);
}
store[key] = value;
}
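
The change to `originMap.set` means that a copy made from another copy is recorded against the original group instead of the intermediate copy. A minimal sketch of that idea follows, using a hypothetical `cloneWithOrigin` helper and plain objects in place of the real AST nodes; it is not the project's `cloneCapturingGroup`.

```javascript
// Hypothetical stand-in for the real clone logic: shallow-copies a node and
// records where the copy ultimately came from.
function cloneWithOrigin(node, originMap) {
  const copy = {...node};
  // If `node` is itself a copy, reuse its recorded origin so that chains of
  // copies collapse to the root node
  originMap.set(copy, originMap.get(node) ?? node);
  return copy;
}

const originMap = new Map();
const original = {type: 'CapturingGroup', number: 1};
const firstCopy = cloneWithOrigin(original, originMap);
const secondCopy = cloneWithOrigin(firstCopy, originMap);

console.log(originMap.get(firstCopy) === original); // true
console.log(originMap.get(secondCopy) === original); // true
```

With a plain `originMap.set(copy, node)`, the second lookup would return `firstCopy`, so a check such as `openDirectCaptures.has(origin)` against the original node would miss it.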