Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Globstar ** causes exponential runtime behavior #237

Open
johnp opened this issue Sep 21, 2024 · 2 comments
Open

Globstar ** causes exponential runtime behavior #237

johnp opened this issue Sep 21, 2024 · 2 comments

Comments

@johnp
Copy link

johnp commented Sep 21, 2024

What is the issue with the URL Pattern Standard?

I've recently come across this bug report in the popular urlpattern-polyfill library. Under "§ 2.2. Converting part lists to regular expressions", a pattern such as

new URLPattern({ hostname: '**.google.com' })

is converted to the regex

/^((?:.*)*)\.google\.com$/u

. This regular expression runs in exponential time under V8, and even breaks on sufficiently long input in SpiderMonkey with the error "InternalError: too much recursion". Of the major JS engines, only WebKit is able to execute it in linear time. The globstar pattern is very common in many globbing languages/libraries, for example minimatch. Software wanting to adopt the standardized URLPattern in exchange for the underspecified space of globbing dialects, will eventually have to deal with end user's code subtly breaking due to this (example).

To avoid this, and under the assumption that there is not much value in emitting ((?:.*)*), would it maybe be possible to interpret globstars as unnamed groups, i.e. (.*)? I suspect that there was likely a reason not to support this pattern, but I haven't been able to find any discussion around the topic of globstars in relation to this spec, so I wanted to at least raise it publicly.

Interestingly, while the exponential runtime behavior happens with the polyfill, the native V8 URLPattern implementation does not appear to suffer the same issue.

@jeremyroman
Copy link
Collaborator

Interesting. The way a series of modifiers works is, in a different way, also interesting.

This tokenizes as asterisk asterisk char(.) char(g) char(o) ..., which gets turned into:

  • a part whose type is full-wildcard and which has a modifier of zero-or-more
  • a fixed part

It seems not too hard to just drop the modifier when we have a full wildcard. Would that be enough? I think you could still make it exponential in such regex engines by writing ([^])* or similar, but it'd at least be less of a footgun.

Honestly I'm not sure of the utility of applying any of the modifiers to full-wildcard parts. Is that syntax (writing **, *? or *+) useful in any way, or is it always exactly equivalent to *, including the result of trying to match it. I think it's equivalent so we could just drop the modifier, or maybe even reject such patterns since the author might be trying to do something else (would need investigation if that's web-compatible to do even if we wanted to).

@johnp
Copy link
Author

johnp commented Sep 24, 2024

Dropping the modifier under those conditions is enough to generate a benign /^(.*)\.google\.com$/u RegEx. I've experimentally added the logic before step 7.1; not sure if it should/can apply if there's a part prefix or suffix, as I haven't fully understood those scenarios.

Regarding *? and *+, I also think they would technically be equivalent to **, but rejecting those patterns might be more helpful, because the developer most likely meant to refer to a literal asterisk in this case, i.e. they should use ([*]?) / ([*]+). But I don't have a strong opinion on that, and, of course, web-compatibility would have to be checked.

I've also ran both through the unmodified urlpattern-polyfill:

  • *? becomes (.*)?, as expected, afaict, under step 7.1. (no exponential runtime)
  • *+ becomes ((?:.*)+), as expected, afaict, under step 7.2. (exponential runtime)

So interestingly, there already is some inconsistency, where *? behaves a bit more sensibly.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

No branches or pull requests

2 participants