Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reduce: Simulation algorithm #434

Draft
wants to merge 8 commits into
base: devel
Choose a base branch
from
Draft

Conversation

samo538
Copy link

@samo538 samo538 commented Aug 26, 2024

Implementation of a reduction algorithm that uses simulation. The pseudo code for this algorithm can be viewed here: https://github.com/ondrik/iny-fix/blob/master/main.pdf

Copy link
Member

@ondrik ondrik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added some comments about how to improve the code

@@ -56,6 +56,126 @@ namespace {
return lts_for_simulation.compute_simulation();
}

bool vec_contains(std::vector<std::pair<State, State>> vec, std::pair<State, State> par){
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. instead of passing the first argument by value, it should be better to pass by constant reference
  2. anyway, using a vector here does not look like a good choice

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function no longer exists .. (Tested the algorithm a bit after removing it ... it made the code like 3x faster)

Nfa result_nfa;
Nfa reverted_nfa;

std::vector<std::pair<State, State>> result_sim {}; // R
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

instead of std::vector, some other data structure with faster lookup would be better (sparse bitmap/std::set/std::unordered_set, ...)

size_t ***matrix;

// Matrix allocation
matrix = static_cast<size_t ***>(malloc(sizeof(size_t **) * alph_syms.size()));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. no malloc/free in C++, better use new/delete
  2. anyway, the best is to completely avoid dynamic allocation and just use a std::vector
  3. also, no need to do nested arrays, just use one-dimensional and write index functions to access the right cell
  4. size_t probably an overkill, a 32b counter should be enough

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using one dimensional array now.

// ! Propagate until fixpoint
size_t worklist_size;
std::pair<State, State> working_pair;
while ((worklist_size = worklist.size()) != 0) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe just use worklist.size()? compiler should be able to optimize the two calls...

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Testing for (non-)emptiness should be done using worklist.empty(), optionally with ! in front of the expression for non-emptiness. And we can then use size() inside the loop body.

@ondrik ondrik self-assigned this Aug 26, 2024
Copy link
Collaborator

@Adda0 Adda0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work. I added some initial thoughts about the code quality and possible optimizations. A more thorough review will follow later on when the PR is more mature.

When the algorithm is finished, tests for the added function(s) should be added, too.

}

Nfa reduce_testing(const Nfa& aut){
// ! Preprocessing
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the format // ! supposed to mean something specific? Like a critical section or something? I use this notation to highlight something important. Like a note of warning etc. This does not look like this case, however.

Copy link
Author

@samo538 samo538 Aug 28, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The pseudo code for this algorithm is "split" into sections with these descriptions. I tried make clear which parts of the real code represent which parts of the pseudo code.

Copy link
Collaborator

@Adda0 Adda0 Aug 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good. Great idea. And it makes reading the code much easier. I would however use a more readable notation. Something like // Pseudocode: Preprocessing. or similar. Otherwise, everyone will be unsure of what the // ! should represent.

std::vector<std::pair<State, State>> worklist {}; // Worklist

// Alphabet extraction
mata::OnTheFlyAlphabet alph;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since you are extracting the alphabet from the automaton, you could also add the alphabet as an optional parameter (probably const Alphabet* alphabet = nullptr) to the reduction function and fill the alphabet manually here only when the optional parameter is left empty.

aut.fill_alphabet(alph);
std::vector<Symbol> alph_syms = alph.get_alphabet_symbols().to_vector();

size_t no_states = aut.num_of_states();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When you define variables which will not change in the future, use const to help the optimizer optimize the code further.

PS: We usually use num_<something>, <something>_num etc. for variable containing a number of something in Mata.

std::vector<Symbol> alph_syms = alph.get_alphabet_symbols().to_vector();

size_t no_states = aut.num_of_states();
size_t ***matrix;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At least a comment would be needed here describing what the variables holds, but as @ondrik said, we should avoid dynamic allocation entirely, or if it is completely necessary, use C++ smart pointers such as std::shared_ptr or std::unique_ptr.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Resolved this

return false;
}

Nfa reduce_testing(const Nfa& aut){
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The name should be descriptive. Something like reduce_by_<name_of_the_algorithm> or similar.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. The new name is a modified copy of an already existing function that computes the simulation relation (just in a different way)

@@ -56,6 +56,126 @@ namespace {
return lts_for_simulation.compute_simulation();
}

bool vec_contains(std::vector<std::pair<State, State>> vec, std::pair<State, State> par){
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Furthermore, one should not create a custom function to search a container, unless they want to use some special non-standard search algorithm, but instead use std::find, or <container_instance>.find() != <container_instance.end()> or <container_instance>.contains() etc.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function no longer exists

}
for (State p: (*symbol_p_).targets.to_vector()) {
if (!vec_contains(result_sim, std::pair(p,q))) {
result_sim.push_back(std::pair(p,q)); // R append
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of push_back(std::pair(...)), one can use emplace_back(...) to construct the pair in-place at the location in the vector.

Comment on lines 169 to 180
//Printig of the final relation:
std::cout << "The final relation is:" << std::endl;
for (std::pair final: result_sim){
std::cout << final.first << " , " << final.second << std::endl;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do not forget to remove the debug prints later on.

@samo538 samo538 force-pushed the simulation branch 5 times, most recently from 5e66ce5 to 4ff753a Compare September 4, 2024 17:52
@ondrik
Copy link
Member

ondrik commented Sep 4, 2024

what's the issue with the automata in 4ff753a ?

@samo538
Copy link
Author

samo538 commented Sep 5, 2024

what's the issue with the automata in 4ff753a ?

Functions load_automata() and load_automaton() used in benchmarking codes could not parse these automata because they used keywords 'true' and 'false'. I consulted this with @Adda0 and changed all occurences of 'true' to '/true' which the functions can parse (and did the same with 'false'). However, i dont know why github shows huges changes in some files (which at the first glance dont change anything, like in tests-integration/automata/b-armc-incl-easiest/aut2.mata).

@samo538 samo538 force-pushed the simulation branch 7 times, most recently from 544c62d to eb4cf7f Compare September 11, 2024 15:24
@samo538 samo538 force-pushed the simulation branch 2 times, most recently from a3e561f to 7d41d03 Compare October 25, 2024 13:31
@Adda0
Copy link
Collaborator

Adda0 commented Oct 31, 2024

@samo538 Could you move the fix for the true and false values in .mata format to a separate PR? We could iterate quickly on this and merge the changes. And resolve why seemingly unrelated lines in the files were changed. Maybe trimming trailing newlines?

@samo538 samo538 force-pushed the simulation branch 2 times, most recently from 8dc0d17 to 6302fe7 Compare November 26, 2024 18:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants