Higher Unicode planes support? #147

Closed
f3ath opened this issue Apr 2, 2023 · 2 comments

Comments

@f3ath
Contributor

f3ath commented Apr 2, 2023

Hi @renggli! First of all, thanks for this great piece of software. I was hoping to use petitparser to implement a function that validates I-Regex expressions in my json_path implementation. The I-Regex standard requires full Unicode compatibility, including characters from the higher planes. Here's the corresponding part of the ABNF:

NormalChar = ( %x00-27 / %x2C-2D ; ','-'-'
 / %x2F-3E ; '/'-'>'
 / %x40-5A ; '@'-'Z'
 / %x5E-7A ; '^'-'z'
 / %x7E-10FFFF )

Unfortunately, it seems that petitparser does not support UTF-16 surrogate pairs. Here's example code to reproduce the issue:

import 'package:petitparser/parser.dart';

void main() {
 range("\u{10ff00}", "\u{10ffff}");
}

I tried to quickly "hack" your code by replacing the .codeUnit calls with .runes, but the change goes deeper than I expected. It might also affect performance, since runes cannot be accessed by index. So my question is: do you plan to add support for the higher planes?
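To make the failure mode concrete, here is a minimal sketch in plain Dart (no petitparser involved) of how a code point above U+FFFF ends up as two UTF-16 code units in a Dart string. The helpers toSurrogatePair and hex are hypothetical names introduced only for this illustration.

```dart
/// Splits a supplementary code point (U+10000..U+10FFFF) into its
/// UTF-16 high and low surrogate code units.
/// (Hypothetical helper, for illustration only.)
List<int> toSurrogatePair(int codePoint) {
  final offset = codePoint - 0x10000;
  return [0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)];
}

/// Formats code units as upper-case hex, separated by spaces.
String hex(Iterable<int> units) =>
    units.map((u) => u.toRadixString(16).toUpperCase()).join(' ');

void main() {
  const lower = '\u{10ff00}';

  // One code point (rune), but two UTF-16 code units.
  print(lower.runes.length); // 1
  print(lower.length); // 2
  print(hex(lower.codeUnits)); // DBFF DF00

  // The upper bound of the range from the example above.
  print(hex(toSurrogatePair(0x10FFFF))); // DBFF DFFF
}
```

Anything that works on single code units therefore sees two separate "characters" for each bound of the range.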

@renggli
Member

renggli commented Apr 2, 2023

See #80 for a similar request.

I don't exactly understand what NormalChar is supposed to accept, but you can identify surrogate characters as follows:

final surrogatePair = seq2(
  pattern('\uD800-\uDBFF'),
  pattern('\uDC00-\uDFFF'),
).flatten();

From there, I am pretty sure you can compose a parser that consumes the desired set of single and surrogate characters.
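One possible way to follow this suggestion for NormalChar is sketched below. It is not code from the thread: it assumes the petitparser API as of the 5.x releases (range, seq2, flatten, toChoiceParser, accept), uses range rather than pattern so that no characters need escaping, and splits %x7E-10FFFF so that lone surrogates are rejected, which is an assumption of this sketch rather than something the quoted ABNF states. The names normalChar and isNormalChars are hypothetical.

```dart
import 'package:petitparser/petitparser.dart';

// Code points U+10000..U+10FFFF: a high surrogate followed by a low surrogate.
final Parser<String> surrogatePair = seq2(
  range('\uD800', '\uDBFF'),
  range('\uDC00', '\uDFFF'),
).flatten();

// NormalChar from the ABNF above. The last ABNF range (%x7E-10FFFF) is split
// into two single-code-unit ranges plus the surrogate pair parser.
// Assumption: unpaired surrogates (U+D800..U+DFFF) should not match.
final Parser<String> normalChar = [
  range('\u0000', '\u0027'),
  range('\u002C', '\u002D'),
  range('\u002F', '\u003E'),
  range('\u0040', '\u005A'),
  range('\u005E', '\u007A'),
  range('\u007E', '\uD7FF'),
  range('\uE000', '\uFFFF'),
  surrogatePair,
].toChoiceParser();

// Hypothetical helper: true if the input is one or more NormalChars.
bool isNormalChars(String input) => normalChar.plus().end().accept(input);

void main() {
  print(isNormalChars('abc'));
  print(isNormalChars('a\u{10ff00}'));
  print(isNormalChars('a*'));
}
```

With these definitions, the first two calls should be accepted and the third rejected, since '*' (%x2A) falls outside every listed range.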

@f3ath
Contributor Author

f3ath commented Apr 3, 2023

Thank you, the approach you suggested seems to be working as expected.
