Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: expose raw options on EmulatedRegExp #15

Merged
merged 1 commit into from
Jan 1, 2025

Conversation

antfu
Copy link
Collaborator

@antfu antfu commented Dec 31, 2024

I would need this for shikijs/shiki#878 where I can serialized it into JS code with like:

// pseudo code
let jsCode = ''

for (const reg of regexps) {
  if (reg instance of EmulatedRegExp)
    jsCode += `new EmulatedRegExp(${JSON.stringify(reg.source)},${JSON.stringify(reg.flags)},${JSON.stringify(reg.rawOptions)})`
}

@slevithan
Copy link
Owner

slevithan commented Dec 31, 2024

I'm fine with adding this if it's needed, but is it possible to avoid?

The raw options are already available from toDetails, which could be used if you're working with Oniguruma patterns rather than precompiled regexes.

// pseudo code
let jsCode = ''

for (const pattern of patterns) {
  const d = toDetails(pattern, {
    global: true,
    // ...more options
  })
  jsCode += d.subclass ?
    `new EmulatedRegExp(${JSON.stringify(d.pattern)}, '${d.flags}', ${JSON.stringify(d.subclass)})` :
    // Avoid escaping already-escaped forward slashes
    `/${d.pattern ? str.replace(/\\?./gsu, m => m === '/' ? '\\/' : m) : '(?:)'}/${d.flags}`
}

This forces setting the target when generating the patterns. But I guess it would be OK for the pre-computed versions of the grammars to always use target: 'ES2024'. I'm assuming your current approach has this same limitation, no?

Very open to any feedback, and like I said, I'm open to exposing this. That said, I don't think the current PR handles everything. I think you'd also need the raw/original pattern in order to cover all the edge cases, due to the underlying behavior from the RegExpSubclass class (from Regex+ internals) that the EmulatedRegExp class is extending. And for the sake of completeness (although it wouldn't affect Shiki) this should also probably preserve the rawOptions property on copies when doing new EmulatedRegExp(existingEmulatedRegExp). I can code these follow-ups if you still think this is the best path to take after reading the above.

@antfu
Copy link
Collaborator Author

antfu commented Jan 1, 2025

toDetails seems to take the raw patterns and compile? Currently what I am doing is to run a real constructor from the JS engine and then serializing them to strings, in shikijs/shiki#880

https://github.com/shikijs/shiki/blob/2ac68cfbdd772466890fac1ddd9b77fe9b2c9bb5/packages/langs-precompiled/scripts/precompile.ts#L8-L14
https://github.com/shikijs/shiki/blob/2ac68cfbdd772466890fac1ddd9b77fe9b2c9bb5/packages/langs-precompiled/scripts/langs.ts#L121-L129

As long as we get the logic work, I am fine with the API design. If you have an idea to improve it without changing the API, would you directly commit to the branch and see? Thanks

@slevithan
Copy link
Owner

toDetails seems to take the raw patterns and compile?

Yes.

Currently what I am doing is to run a real constructor from the JS engine and then serializing them to strings [...]
As long as we get the logic work, I am fine with the API design.

I suspect we could find a way to make it work without changing the EmulatedRegExp API, but I think being able to serialize an EmulatedRegExp instance to a string is a use case worth supporting.

I'll go ahead and merge this as a starting point, but I'll need to do a few follow up diffs (and maybe publish a new version of regex/internals) before it becomes stable.

I'll let you know when it's ready!

@slevithan slevithan merged commit 85c91e7 into main Jan 1, 2025
4 checks passed
@slevithan
Copy link
Owner

oniguruma-to-es v0.10.0 has been published. It includes a rawArgs property on EmulatedRegExp instances so they can be accurately serialized.

rawArgs type:

{
  pattern: string;
  flags: string;
  options: {
    strategy?: string;
    useEmulationGroups?: boolean;
  };
}

Along with other improvements, v0.10.0 also renamed toDetails's subclass property as options for consistency.

@slevithan
Copy link
Owner

For reference, the specific follow up diffs for this were dbfe54f and 5857966

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants