
Fix utf8 state in regex #125


Merged: 2 commits merged on Apr 17, 2025

@coyove (Contributor) commented Apr 13, 2025

This PR proposes the following changes for handling invalid UTF-8 sequences:

  1. During regex matching, an invalid UTF-8 sequence now decodes to 0xFFFD (U+FFFD, the Unicode replacement character).
  2. Other UTF-8 operations no longer crash on invalid input, but their results are undefined, so callers must validate the input beforehand.
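The PR text does not show the decoder itself. As a rough illustration of the first change, here is a minimal C sketch of a decoder that maps any invalid or truncated sequence to U+FFFD instead of failing. It is not STC's actual code: the names `utf8_decode_one` and `utf8_decode_all` are hypothetical, and it assumes a simple resync policy where a bad sequence consumes only its lead byte.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define REPLACEMENT_CHAR 0xFFFDu

/* Hypothetical helper: decode one code point starting at s (requires s < end).
 * Stores it in *cp and returns the number of bytes consumed. Any invalid or
 * truncated sequence yields U+FFFD and consumes one byte, so decoding
 * resynchronizes on the following byte. */
static size_t utf8_decode_one(const unsigned char *s, const unsigned char *end, uint32_t *cp)
{
    assert(s < end);
    unsigned char b = s[0];
    size_t n;
    uint32_t c, min;

    if (b < 0x80) { *cp = b; return 1; }                     /* ASCII */
    else if ((b & 0xE0) == 0xC0) { n = 2; c = b & 0x1F; min = 0x80; }
    else if ((b & 0xF0) == 0xE0) { n = 3; c = b & 0x0F; min = 0x800; }
    else if ((b & 0xF8) == 0xF0) { n = 4; c = b & 0x07; min = 0x10000; }
    else { *cp = REPLACEMENT_CHAR; return 1; }  /* stray continuation or 0xF8..0xFF */

    if ((size_t)(end - s) < n) { *cp = REPLACEMENT_CHAR; return 1; }  /* truncated */

    for (size_t i = 1; i < n; ++i) {
        if ((s[i] & 0xC0) != 0x80) { *cp = REPLACEMENT_CHAR; return 1; }
        c = (c << 6) | (s[i] & 0x3F);
    }
    /* Reject overlong forms, UTF-16 surrogates and values above U+10FFFF. */
    if (c < min || (c >= 0xD800 && c <= 0xDFFF) || c > 0x10FFFF) {
        *cp = REPLACEMENT_CHAR;
        return 1;
    }
    *cp = c;
    return n;
}

/* Decode len bytes into out[] (room for len entries); returns the count. */
static size_t utf8_decode_all(const unsigned char *s, size_t len, uint32_t *out)
{
    const unsigned char *end = s + len;
    size_t n = 0;
    while (s < end) {
        s += utf8_decode_one(s, end, &out[n]);
        ++n;
    }
    return n;
}
```

Because every step consumes at least one byte, the loop always terminates and `out` never needs more than `len` entries. Under this policy the bytes `E2 82 41` decode to two U+FFFD values followed by `A`.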

@tylov (Collaborator) commented Apr 17, 2025

I'll merge this, but I will add recovery that backs up one byte when a REJECT happens on the 2nd, 3rd or 4th byte of a UTF-8 sequence.
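One way the recovery described above might look, sketched under assumptions: a byte-at-a-time decoder with an explicit REJECT outcome, where a REJECT on a continuation position (2nd, 3rd or 4th byte) emits U+FFFD but does not consume the offending byte, so that byte is re-read as a possible lead byte. The names `Utf8State`, `utf8_step` and `utf8_decode_recover` are hypothetical; this is not the code that was committed to STC.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical decoder state for one in-progress sequence. */
typedef struct {
    uint32_t cp;   /* code point bits accumulated so far */
    int need;      /* continuation bytes still expected (0 = expecting a lead byte) */
    uint32_t min;  /* smallest code point allowed for this length (overlong check) */
} Utf8State;

enum { STEP_MORE, STEP_ACCEPT, STEP_REJECT };

/* Feed one byte; resets st->need to 0 whenever a sequence ends or fails. */
static int utf8_step(Utf8State *st, unsigned char b)
{
    if (st->need == 0) {  /* expecting a lead byte */
        if (b < 0x80) { st->cp = b; return STEP_ACCEPT; }
        if ((b & 0xE0) == 0xC0) { st->cp = b & 0x1F; st->need = 1; st->min = 0x80; return STEP_MORE; }
        if ((b & 0xF0) == 0xE0) { st->cp = b & 0x0F; st->need = 2; st->min = 0x800; return STEP_MORE; }
        if ((b & 0xF8) == 0xF0) { st->cp = b & 0x07; st->need = 3; st->min = 0x10000; return STEP_MORE; }
        return STEP_REJECT;  /* stray continuation byte or 0xF8..0xFF */
    }
    if ((b & 0xC0) != 0x80) { st->need = 0; return STEP_REJECT; }  /* not a continuation byte */
    st->cp = (st->cp << 6) | (b & 0x3F);
    if (--st->need > 0) return STEP_MORE;
    /* Reject overlong forms, UTF-16 surrogates and values above U+10FFFF. */
    if (st->cp < st->min || (st->cp >= 0xD800 && st->cp <= 0xDFFF) || st->cp > 0x10FFFF)
        return STEP_REJECT;
    return STEP_ACCEPT;
}

/* Decode len bytes into out[] (room for len entries), emitting U+FFFD per error. */
static size_t utf8_decode_recover(const unsigned char *s, size_t len, uint32_t *out)
{
    assert(s != NULL || len == 0);
    Utf8State st = {0, 0, 0};
    size_t n = 0, i = 0;
    while (i < len) {
        int in_sequence = st.need > 0;  /* is s[i] the 2nd, 3rd or 4th byte? */
        int r = utf8_step(&st, s[i]);
        if (r == STEP_ACCEPT) {
            out[n++] = st.cp;
            ++i;
        } else if (r == STEP_MORE) {
            ++i;
        } else {
            out[n++] = 0xFFFD;  /* U+FFFD replacement character */
            if (!in_sequence)
                ++i;  /* bad lead byte: skip it */
            /* else: back up one byte by not consuming s[i]; it is re-read as a lead byte */
        }
    }
    if (st.need > 0)
        out[n++] = 0xFFFD;  /* input ended mid-sequence */
    return n;
}
```

With this policy the bytes `E2 82 41` decode to a single U+FFFD followed by `A`: the truncated prefix is reported once, and the ASCII byte that caused the REJECT is not swallowed. After a backup the state is reset, so the next iteration always consumes a byte and the loop cannot stall.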

@tylov merged commit 3c8394b into stclib:master on Apr 17, 2025
22 checks passed