Integrate PCRE capture handling #12
Commits on Feb 16, 2023
Commit 8269df1
Commit 36ed4eb
Commit 377c67d
Add `fsm_generate_matches` (src/libfsm/gen.c).
This is mainly used for fuzz testing: we can use gen to walk a DFA and generate matching input strings up to a certain length, then compare capture behavior against PCRE for those particular inputs.
amend: gen tests
Commit 41c8e54
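To illustrate the idea of walking a DFA to enumerate matching inputs up to a length limit, here is a minimal sketch. It does not use libfsm's API or data structures: the transition table, the `toy_` names, and the callback signature are all assumptions made up for this example, and the hard-coded DFA corresponds to `^a*b*$`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy DFA for ^a*b*$ (not libfsm's representation).
 * State 0 is still reading a's, state 1 is reading b's.
 * Both states are accepting. */
enum { TOY_STATES = 2, TOY_SYMBOLS = 2, TOY_MAX_LEN = 16 };

static const char toy_alphabet[TOY_SYMBOLS] = { 'a', 'b' };
static const int toy_next[TOY_STATES][TOY_SYMBOLS] = {
    {  0, 1 },  /* state 0: 'a' -> 0, 'b' -> 1 */
    { -1, 1 },  /* state 1: 'a' -> no edge, 'b' -> 1 */
};
static const int toy_accepting[TOY_STATES] = { 1, 1 };

typedef void toy_match_cb(const char *input, size_t len, void *opaque);

/* Depth-first walk: report the current prefix if the state accepts,
 * then extend it by every symbol that has an outgoing edge. */
static void
toy_walk(int state, char *buf, size_t len, size_t max_len,
    toy_match_cb *cb, void *opaque)
{
    if (toy_accepting[state]) {
        buf[len] = '\0';
        cb(buf, len, opaque);
    }
    if (len == max_len) {
        return;
    }
    for (int i = 0; i < TOY_SYMBOLS; i++) {
        const int to = toy_next[state][i];
        if (to < 0) {
            continue;
        }
        buf[len] = toy_alphabet[i];
        toy_walk(to, buf, len + 1, max_len, cb, opaque);
    }
}

/* Example callback: just count the generated inputs. */
static void
toy_count_cb(const char *input, size_t len, void *opaque)
{
    (void)input;
    (void)len;
    (*(size_t *)opaque)++;
}

static size_t
toy_count_matches(size_t max_len)
{
    char buf[TOY_MAX_LEN + 1];
    size_t count = 0;
    assert(max_len <= TOY_MAX_LEN);
    toy_walk(0, buf, 0, max_len, toy_count_cb, &count);
    return count;
}
```

In a fuzzing setup along the lines described above, the callback would presumably hand each generated input to both matchers and compare the capture offsets, rather than counting.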
Completely rework capture resolution.
This is a big commit, unfortunately difficult to break apart further due to interface changes, metadata being passed through whole-FSM transformations, and so on. Sorry about that.
- Delete code related to capture action metadata on edges. That approach made FSM transformations (determinisation, minimisation, etc.) considerably more expensive, and there were some corner cases that I wasn't able to get working correctly.
- Switch to a somewhat simpler method, adapted from Russ Cox's "Regular Expression Matching: the Virtual Machine Approach". Since the capture resolution metadata (an opcode program for a virtual machine) is associated with individual end states, this combines cleanly when multiple regexes are unioned into a single large DFA that matches them all at once.
- Add lots of capture regression tests, mostly from using libfsm's `fsm_generate_matches` and a fuzzer to compare behavior against PCRE. This brought many, many obscure cases to light.
- Delete capture tests based on the old interface. The new one does not work with state machines built manually using libfsm's interfaces, only via compilation from regex.
- Some performance improvements to trimming and minimisation, mostly due to better utilizing bit-parallelism in the edge set data structure.
- Switch to using new ADTs in several places.
amend: interface changes
Commit d997649
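For readers unfamiliar with the virtual machine approach, the sketch below shows the general shape of an opcode program with `save` instructions that record capture offsets. It is not libfsm's implementation: it uses the simple recursive backtracking variant from Cox's article rather than whatever this PR does internally, the `toy_` names and instruction layout are invented for illustration, and `TOY_MATCH` is assumed to require end of input in order to model the trailing `$`. The program is hand-compiled from `^a(b*)c$`.

```c
#include <assert.h>
#include <stddef.h>

enum toy_op { TOY_CHAR, TOY_MATCH, TOY_JMP, TOY_SPLIT, TOY_SAVE };

struct toy_inst {
    enum toy_op op;
    int c;     /* TOY_CHAR: byte to consume */
    int x, y;  /* TOY_JMP/TOY_SPLIT: targets; TOY_SAVE: slot in x */
};

/* Hand-compiled program for ^a(b*)c$.
 * Slots 0,1 bound capture 0; slots 2,3 bound capture 1. */
static const struct toy_inst toy_prog[] = {
    /* 0 */ { TOY_SAVE,  0,   0, 0 },
    /* 1 */ { TOY_CHAR,  'a', 0, 0 },
    /* 2 */ { TOY_SAVE,  0,   2, 0 },
    /* 3 */ { TOY_SPLIT, 0,   4, 6 },  /* prefer another 'b' (greedy) */
    /* 4 */ { TOY_CHAR,  'b', 0, 0 },
    /* 5 */ { TOY_JMP,   0,   3, 0 },
    /* 6 */ { TOY_SAVE,  0,   3, 0 },
    /* 7 */ { TOY_CHAR,  'c', 0, 0 },
    /* 8 */ { TOY_SAVE,  0,   1, 0 },
    /* 9 */ { TOY_MATCH, 0,   0, 0 },
};

/* Recursive backtracking interpreter. Returns 1 on match, leaving
 * the capture offsets in saved[]; returns 0 otherwise. */
static int
toy_run(const struct toy_inst *prog, int pc, const char *input, int sp,
    int *saved)
{
    for (;;) {
        const struct toy_inst *i = &prog[pc];
        switch (i->op) {
        case TOY_CHAR:
            if (input[sp] != i->c) { return 0; }
            pc++; sp++;
            break;
        case TOY_MATCH:
            return input[sp] == '\0';  /* assumption: models '$' */
        case TOY_JMP:
            pc = i->x;
            break;
        case TOY_SPLIT:
            /* Try the preferred branch first, then fall back. */
            if (toy_run(prog, i->x, input, sp, saved)) { return 1; }
            pc = i->y;
            break;
        case TOY_SAVE: {
            const int old = saved[i->x];
            saved[i->x] = sp;
            if (toy_run(prog, pc + 1, input, sp, saved)) { return 1; }
            saved[i->x] = old;  /* undo on backtrack */
            return 0;
        }
        }
    }
}

static int
toy_match(const char *input, int saved[4])
{
    assert(input != NULL);
    for (int i = 0; i < 4; i++) { saved[i] = -1; }
    return toy_run(toy_prog, 0, input, 0, saved);
}
```

For the input `abbbbbc` this leaves capture 0 at offsets 0,7 and capture 1 at offsets 1,6. Because a program like this is a self-contained unit, it is easy to see how one such program could be attached to each end state of a combined DFA, which is the property the commit message relies on.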
Commit ef08bab
Commit 7ef4a6c
parser.act: Avoid crash in parser from '(*:'.
See katef#386 on katef/libfsm. This is a workaround for a bug in the parser -- once the fuzzer finds it, it tends to get in the way of finding deeper issues.
Commit 9ed77c7
Commit a729833
ast_rewrite: Make ast_rewrite's ALT case deduplication preserve order.
Previously it sorted the ALT case subtrees to find and discard duplicates, but capture results are affected by ALT case ordering, so we need to preserve ordering while eliminating duplicates.
Commit 9089520
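Ordering matters because PCRE prefers the leftmost alternative that leads to a match, so `(a|ab)` and `(ab|a)` can capture different spans of the same input. A minimal sketch of order-preserving deduplication follows. It uses strings as stand-ins for AST subtrees, and the function name is hypothetical; the actual code in ast_rewrite compares subtrees and may be structured quite differently.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Remove duplicates in place, keeping the first occurrence of each
 * entry and leaving the survivors in their original relative order.
 * Returns the new count. Quadratic, which is fine for a sketch. */
static size_t
toy_dedup_preserving_order(const char **cases, size_t count)
{
    size_t kept = 0;
    assert(cases != NULL || count == 0);

    for (size_t i = 0; i < count; i++) {
        int seen = 0;
        for (size_t k = 0; k < kept; k++) {
            if (strcmp(cases[k], cases[i]) == 0) {
                seen = 1;
                break;
            }
        }
        if (!seen) {
            cases[kept++] = cases[i];
        }
    }
    return kept;
}
```

Sorting first and then dropping adjacent duplicates would also remove the repeats, but it would reorder the surviving alternatives, which is the behavior this commit moves away from.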
Commit f584a9c
Commit 8b6ce44
Commit 6428f5e
fuzz/target.c: Update fuzz harness with PCRE comparison modes.
To build, uncomment `PCRE_CMP=1` in fuzz/Makefile. This depends on libpcre2-8.
Commit 16676e3
re: Add flags for generating input, capture resolution, multi-regexes.
This was pretty much the minimal amount I needed for manual testing.
- `-FC`: No captures, do not build capture metadata even when dialects support it.
- Generating input: `build/bin/re -rpcre -G10 '^a*b*$'`
- Capture resolution: `build/bin/re -rpcre -R '^a(b*)c$' abbbbbc`
  -- 0: 0,7
  -- 1: 1,6
- Multiple regexes: `build/bin/re -rpcre -pgC -Y file_with_one_regex_per_line`
Capture resolution is not implemented for -y or multi-regex yet, because `fsm_exec_with_captures` needs all input buffered ahead of time and because multi-regex isn't passing along the capture bases and matching those up based on the result.
Commit f753c72
tests/*/Makefile: Add `-FC` (no captures) for some calls to RE.
There are several tests that have nothing to do with captures; capture behavior is tested directly with `tests/captures/`.
Commit 719acc6
Commit efeae03
Commits on Mar 2, 2023
internedstateset: If EXPENSIVE_CHECKS, add check for sorted input.
Also, add a comment about how the iterators yield collated .to states, so we can avoid the overhead of sorting later.
Commit 1f9d673
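As a rough illustration of a debug-only sortedness check of this kind, consider the sketch below. Only the `EXPENSIVE_CHECKS` macro name comes from the commit message; the `toy_` functions, the `unsigned` state ID type, and the choice to allow repeated IDs are assumptions made for the example and may not match internedstateset.

```c
#include <assert.h>
#include <stddef.h>

#ifndef EXPENSIVE_CHECKS
#define EXPENSIVE_CHECKS 0
#endif

/* Returns 1 if ids[] is in non-decreasing order, 0 otherwise.
 * Whether repeats are actually allowed is an assumption here. */
static int
toy_ids_are_sorted(const unsigned *ids, size_t count)
{
    for (size_t i = 1; i < count; i++) {
        if (ids[i - 1] > ids[i]) {
            return 0;
        }
    }
    return 1;
}

/* Hypothetical entry point that relies on sorted input. The O(n)
 * scan only runs when EXPENSIVE_CHECKS is enabled at build time. */
static size_t
toy_intern(const unsigned *ids, size_t count)
{
#if EXPENSIVE_CHECKS
    assert(toy_ids_are_sorted(ids, count));
#endif
    (void)ids;
    return count;  /* placeholder for the real interning work */
}
```

Gating the scan behind a build-time flag keeps the precondition documented and checkable without paying for it in normal builds.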
Commits on Mar 6, 2023
Commit e9c59bc
capture tests: SHOULD_SKIP -> SHOULD_REJECT_AS_UNSUPPORTED
Update several skipped tests. These should now be run, and expect libfsm to reject them as unsupported. These will be fixed by the next commit. Also, fix the test runner's handling of unsupported inputs.
Commit f3e69c3
ast_analysis: Implement rejection for e.g. `^(($)|x)+$`.
Expand analysis to detect and reject this special case. It's not likely to be worth supporting, but was previously not identified correctly at compile-time. This has not been fuzzed yet, but with it all tests pass, including several that were previously set to SKIP.
Commit 5badf27
Commits on Mar 8, 2023
Commit 0fd0f12
ast_analysis: Ensure unsupported code path is flagged as unsatisfiable.
Add regression test, found via fuzzing.
Commit fddcf6f
Commits on Mar 21, 2023
Commit f39528a