Hopcroft's minimization: Valmari and Lehtinen's variant for a partial transition function #475

Merged: 7 commits into devel from hopcroft on Dec 2, 2024

koniksedy
Collaborator

This PR introduces an implementation of Valmari and Lehtinen's variant of Hopcroft's minimization for deterministic finite automata with a partial transition function.
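
To illustrate what minimizing over a partial transition function means, here is a minimal sketch. It is not the code from this PR and not Valmari and Lehtinen's algorithm: it uses plain Moore-style signature refinement, which reaches the same partition but typically more slowly. `PartialDfa`, `NO_TARGET`, and `refine_partition` are hypothetical names, not Mata's API. The sketch assumes a trim automaton (every state is reachable and can reach a final state), so a missing transition can be treated as its own dead class without adding an explicit sink state.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

// Hypothetical representation for illustration only (not Mata's API).
// delta[q][a] is the target of state q under symbol a, or NO_TARGET if undefined.
constexpr int NO_TARGET = -1;

struct PartialDfa {
    std::size_t num_symbols;
    std::vector<std::vector<int>> delta;  // delta[state][symbol]
    std::vector<bool> is_final;
};

// Returns block[q], the index of the equivalence class of state q.
// Assumption: the automaton is trim, so no state is equivalent to the implicit sink.
std::vector<int> refine_partition(const PartialDfa& dfa) {
    const std::size_t n = dfa.delta.size();
    assert(dfa.is_final.size() == n);

    // Initial partition: final versus non-final states.
    std::vector<int> block(n);
    for (std::size_t q = 0; q < n; ++q) {
        assert(dfa.delta[q].size() == dfa.num_symbols);
        block[q] = dfa.is_final[q] ? 1 : 0;
    }

    std::size_t num_blocks = 0;  // block count of the previous round
    while (true) {
        // Signature of q: its own block plus the block reached under each symbol.
        // A missing transition contributes NO_TARGET instead of a sink block.
        std::map<std::vector<int>, int> block_of_signature;
        std::vector<int> next_block(n);
        for (std::size_t q = 0; q < n; ++q) {
            std::vector<int> signature{block[q]};
            for (int target : dfa.delta[q]) {
                signature.push_back(target == NO_TARGET
                                        ? NO_TARGET
                                        : block[static_cast<std::size_t>(target)]);
            }
            const int fresh = static_cast<int>(block_of_signature.size());
            next_block[q] =
                block_of_signature.emplace(std::move(signature), fresh).first->second;
        }
        // Each round only splits blocks, so an unchanged count means a fixpoint.
        if (block_of_signature.size() == num_blocks) return next_block;
        num_blocks = block_of_signature.size();
        block = std::move(next_block);
    }
}
```

Each resulting block becomes one state of the minimized automaton. The point of Hopcroft-style variants for partial automata is to avoid the per-round scan over all states and symbols that this sketch performs.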

The performance of this method was tested against Brzozowski's minimization and simulation-based reduction on determinized automata obtained from:

  1. Abstract Regular Model Checking,
  2. automata generated during the decision process of WS1S logic,
  3. automata for regular expressions sourced from RegexLib,
  4. automata for regular expressions used in email validation obtained as in MFPS'17 and in APLAS'21,
  5. automata for string constraints derived from Norn, SyGuS-qgen, and others, and
  6. Tabakov-Vardi random automata.

Timeouts

The timeout was set to 10 seconds.

| Method | Timeout count |
|---|---|
| Brzozowski | 2,167 |
| Simulation | 2,308 |
| Hopcroft | 481 |

All Benchmarks

It can be seen that Hopcroft's algorithm is, on average, 100x faster than Brzozowski's algorithm or the simulation-based reduction.

Cactus Plot

[figure omitted: cactus plot]

Scatter Plot Matrix

[figure omitted: scatter plot matrix]

Abstract Regular Model Checking

Cactus Plot

[figure omitted: cactus plot]

Scatter Plot Matrix

[figure omitted: scatter plot matrix]

WS1S logic

These automata have a binary alphabet.

Cactus Plot

[figure omitted: cactus plot]

Scatter Plot Matrix

[figure omitted: scatter plot matrix]

RegexLib

Cactus Plot

[figure omitted: cactus plot]

Scatter Plot Matrix

[figure omitted: scatter plot matrix]

Email Validation

Cactus Plot

[figure omitted: cactus plot]

Scatter Plot Matrix

[figure omitted: scatter plot matrix]

Norn and SyGuS-qgen and Others

Cactus Plot

[figure omitted: cactus plot]

Scatter Plot Matrix

[figure omitted: scatter plot matrix]

Tabakov-Vardi

Automata were generated with 10, 50, 100, and 200 states, for alphabets containing 2, 4, 8, and 16 symbols, and with a transitions-to-states ratio per symbol ranging from 0.1 to 1.
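
For readers unfamiliar with the model, the transition part of such a generator can be sketched as follows. This is an illustration, not the script used for these benchmarks: `Transition` and `random_transitions` are hypothetical names, and the sketch assumes the usual reading of the model, in which each symbol independently receives round(ratio × states) distinct (source, target) pairs drawn uniformly at random. The choice of initial and final states is left out because the description above does not specify it.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <numeric>
#include <random>
#include <vector>

// Hypothetical transition record for illustration only.
struct Transition {
    std::size_t source;
    std::size_t symbol;
    std::size_t target;
};

// Draws, for every symbol, round(ratio * num_states) distinct transitions
// uniformly at random from all num_states * num_states (source, target) pairs.
std::vector<Transition> random_transitions(std::size_t num_states,
                                           std::size_t num_symbols,
                                           double ratio,
                                           std::mt19937& rng) {
    // The ratio cannot exceed num_states, since only num_states^2 pairs exist.
    assert(ratio >= 0.0 && ratio <= static_cast<double>(num_states));
    const auto per_symbol = static_cast<std::size_t>(
        std::llround(ratio * static_cast<double>(num_states)));

    // Encode the pair (source, target) as source * num_states + target.
    std::vector<std::size_t> pairs(num_states * num_states);
    std::vector<Transition> result;
    result.reserve(per_symbol * num_symbols);

    for (std::size_t symbol = 0; symbol < num_symbols; ++symbol) {
        std::iota(pairs.begin(), pairs.end(), std::size_t{0});
        std::shuffle(pairs.begin(), pairs.end(), rng);
        // The first per_symbol entries of a uniform shuffle are a uniform sample
        // without replacement.
        for (std::size_t i = 0; i < per_symbol; ++i) {
            result.push_back({pairs[i] / num_states, symbol, pairs[i] % num_states});
        }
    }
    return result;
}
```

The result is in general nondeterministic, so it would still have to be determinized before being passed to a DFA minimization, as was done for the benchmarks in this PR.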

Cactus Plot

cactus

Scatter Plot Matrix

scatter

@koniksedy koniksedy requested review from jurajsic and Adda0 November 30, 2024 12:41

codecov bot commented Nov 30, 2024

Codecov Report

Attention: Patch coverage is 94.77612% with 7 lines in your changes missing coverage. Please review.

Project coverage is 74.46%. Comparing base (a45f104) to head (094ad6a).
Report is 20 commits behind head on devel.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/nfa/operations.cc | 94.77% | 0 Missing and 7 partials ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##            devel     #475      +/-   ##
==========================================
+ Coverage   73.67%   74.46%   +0.78%     
==========================================
  Files          30       30              
  Lines        4251     4413     +162     
  Branches      968     1003      +35     
==========================================
+ Hits         3132     3286     +154     
- Misses        771      772       +1     
- Partials      348      355       +7     


@jurajsic
Member

That looks really good!

Just to be safe, can you also compare the sizes of the automata minimized by Brzozowski and by Hopcroft, to check that they are actually always the same?

Also, maybe we should make it the default minimization algorithm (for the dispatcher function, if we have one, and maybe have two versions of the dispatcher function, one taking an NFA and one taking a DFA).

@Adda0
Collaborator

Adda0 commented Nov 30, 2024

> Also, maybe we should make it the default minimization algorithm (for the dispatcher function, if we have one, and maybe have two versions of the dispatcher function, one taking an NFA and one taking a DFA).

I agree. Maybe we should have minimize() and minimize_dfa()?

Or even better, minimize() with an enum class flag AutomatonType with possible values NFA and DFA, defaulting to NFA, and two underlying functions called from it: minimize_nfa() and minimize_dfa(). This would solve all the issues, be extensible, and allow reusing the enum class for other functions as well.
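
A rough sketch of how that dispatcher could look. Everything here is a placeholder for discussion: `Automaton` is a stand-in type rather than Mata's `Nfa` class, and the two underlying functions are stubs that only record which path was taken instead of running a real algorithm.

```cpp
#include <string>

// Stand-in automaton type for illustration only (not Mata's Nfa class).
struct Automaton {
    std::string minimized_by;  // records which stub handled the automaton
};

// Proposed flag; could be reused by other functions that care about determinism.
enum class AutomatonType { NFA, DFA };

// Stub: a real version would run an algorithm that is correct for any NFA.
Automaton minimize_nfa(const Automaton& aut) {
    Automaton result = aut;
    result.minimized_by = "minimize_nfa";
    return result;
}

// Stub: a real version would run a DFA-only algorithm such as Hopcroft's.
Automaton minimize_dfa(const Automaton& aut) {
    Automaton result = aut;
    result.minimized_by = "minimize_dfa";
    return result;
}

// Dispatcher: defaults to the NFA path, which is safe for any input.
Automaton minimize(const Automaton& aut, AutomatonType type = AutomatonType::NFA) {
    switch (type) {
        case AutomatonType::DFA:
            return minimize_dfa(aut);
        case AutomatonType::NFA:
            return minimize_nfa(aut);
    }
    return minimize_nfa(aut);  // unreachable for valid enum values
}
```

With this shape, callers that already know their automaton is deterministic write `minimize(aut, AutomatonType::DFA)`, and everyone else keeps calling `minimize(aut)`.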

@koniksedy
Collaborator Author

> Just to be safe, can you also compare the sizes of the automata minimized by Brzozowski and by Hopcroft, to check that they are actually always the same?

I've already tested it. Apart from Brzozowski creating (for some reason) an automaton with one initial state for an empty automaton, the automata are indeed the same in language and size.
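
For reference, a language comparison of two partial DFAs can be sketched as a synchronized search over pairs of states. This is an illustration of the idea, not the test that was actually run: `Dfa`, `NO_STATE`, and `same_language` are hypothetical names, and an automaton without any initial state is modelled here by `initial == NO_STATE`.

```cpp
#include <cassert>
#include <cstddef>
#include <queue>
#include <set>
#include <utility>
#include <vector>

// Hypothetical representation for illustration only (not Mata's API).
constexpr int NO_STATE = -1;  // missing transition, or missing initial state

struct Dfa {
    int initial;                          // NO_STATE if there is no initial state
    std::vector<std::vector<int>> delta;  // delta[state][symbol] or NO_STATE
    std::vector<bool> is_final;
};

// Acceptance and stepping, treating NO_STATE as an implicit rejecting sink.
bool accepts_at(const Dfa& dfa, int q) {
    return q != NO_STATE && dfa.is_final[static_cast<std::size_t>(q)];
}

int step(const Dfa& dfa, int q, std::size_t symbol) {
    return q == NO_STATE ? NO_STATE : dfa.delta[static_cast<std::size_t>(q)][symbol];
}

// Explores all reachable pairs of states; the languages differ exactly when
// some reachable pair disagrees on acceptance.
bool same_language(const Dfa& lhs, const Dfa& rhs, std::size_t num_symbols) {
    const std::pair<int, int> start{lhs.initial, rhs.initial};
    std::set<std::pair<int, int>> seen;
    std::queue<std::pair<int, int>> work;
    seen.insert(start);
    work.push(start);
    while (!work.empty()) {
        const auto [p, q] = work.front();
        work.pop();
        if (accepts_at(lhs, p) != accepts_at(rhs, q)) return false;
        for (std::size_t symbol = 0; symbol < num_symbols; ++symbol) {
            const std::pair<int, int> next{step(lhs, p, symbol), step(rhs, q, symbol)};
            if (next.first == NO_STATE && next.second == NO_STATE) continue;  // both dead
            if (seen.insert(next).second) work.push(next);
        }
    }
    return true;
}
```

Under this encoding, an automaton with no states and one with a single non-final initial state have the same (empty) language but different sizes, which matches the discrepancy described above.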

Collaborator

@Adda0 Adda0 left a comment

The PR looks great. When we finish the discussion regarding the interface, I will approve and merge the PR.

include/mata/nfa/algorithms.hh (resolved)
src/nfa/operations.cc (outdated, resolved)
src/nfa/operations.cc (resolved)
src/nfa/operations.cc (outdated, resolved)
src/nfa/operations.cc (outdated, resolved)
src/nfa/operations.cc (outdated, resolved)
Member

@jurajsic jurajsic left a comment

Looks good to me. I would still make it the default minimization algorithm, but that can be a further PR.

Adda0 merged commit c211c86 into devel on Dec 2, 2024
15 checks passed
Adda0 deleted the hopcroft branch on December 2, 2024, 11:20