Intersection speedup and refactor #344

kilohsakul · 2023-09-25T14:51:38Z

Mainly better storage for the map from pairs to states,
and a number of other things.

codecov · 2023-09-25T14:59:12Z

Codecov Report

Attention: 24 lines in your changes are missing coverage. Please review.

Comparison is base (3380500) 72.94% compared to head (8614495) 71.64%.
Report is 10 commits behind head on devel.

Additional details and impacted files

@@            Coverage Diff             @@
##            devel     #344      +/-   ##
==========================================
- Coverage   72.94%   71.64%   -1.31%     
==========================================
  Files          33       30       -3     
  Lines        4144     3636     -508     
  Branches      955      846     -109     
==========================================
- Hits         3023     2605     -418     
+ Misses        740      736       -4     
+ Partials      381      295      -86

| Files | Coverage Δ | |
|---|---|---|
| include/mata/nfa/delta.hh | `90.54% <ø> (ø)` | |
| include/mata/nfa/nfa.hh | `100.00% <ø> (ø)` | |
| include/mata/nfa/plumbing.hh | `94.44% <100.00%> (ø)` | |
| src/nfa/inclusion.cc | `90.09% <100.00%> (ø)` | |
| src/nfa/nfa.cc | `80.16% <100.00%> (+0.16%)` | ⬆️ |
| src/nfa/operations.cc | `63.90% <100.00%> (+0.29%)` | ⬆️ |
| src/strings/nfa-noodlification.cc | `71.49% <66.66%> (ø)` | |
| src/nfa/delta.cc | `82.71% <80.00%> (-0.08%)` | ⬇️ |
| src/nfa/intersection.cc | `79.41% <78.35%> (-3.67%)` | ⬇️ |

... and 10 files with indirect coverage changes

vhavlena · 2023-09-25T15:46:17Z

src/nfa/intersection.cc

+    std::vector<State> min_lhs;
+    std::vector<State> max_lhs;
+
+    auto update_ranges = [&min_rhs,&max_rhs,&min_lhs,&max_lhs](State lhs_state, State rhs_state)


This series of lambda functions is quite ugly. I would consider some better code structure or something.

I was going to say that I agree, but as a method with all this stuff as parameters it is even uglier.
In fact, I think this way of doing it is quite succinct and easy to read, even though it is not exactly standard.

It might be a bit slower than functions, but I agree that in this case, unless you want to create a class for intersection, the lambdas here simplify the structure and passing parameters around.

David said to just write [&]; it looks nicer now.

I think the problem in this conversation is about having six or so lambdas one after another, often without any documentation comment explaining what they are for.

Now there are far fewer lambdas and they are simple.
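
For readers following this thread, the trade-off can be illustrated with a small sketch. This is not the code from the PR: `update_ranges_fn` and `collect_ranges` are hypothetical names, and `State` is assumed to be a plain unsigned integer. The free function needs every shared vector passed in, while the lambda with `[&]` captures the locals by reference and only takes the two states.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <limits>
#include <utility>
#include <vector>

using State = unsigned long; // assumption: stand-in for the library's State type

// Hypothetical free-function variant: all shared state is passed explicitly.
void update_ranges_fn(std::vector<State>& min_rhs, std::vector<State>& max_rhs,
                      State lhs_state, State rhs_state) {
    min_rhs[lhs_state] = std::min(min_rhs[lhs_state], rhs_state);
    max_rhs[lhs_state] = std::max(max_rhs[lhs_state], rhs_state);
}

// Lambda variant: for each lhs state, record the smallest and largest rhs state
// it was paired with. Returns {min_rhs, max_rhs}.
std::pair<std::vector<State>, std::vector<State>> collect_ranges(
        std::size_t lhs_size, const std::vector<std::pair<State, State>>& pairs) {
    std::vector<State> min_rhs(lhs_size, std::numeric_limits<State>::max());
    std::vector<State> max_rhs(lhs_size, 0);

    // [&] captures min_rhs, max_rhs and lhs_size by reference, so the call
    // sites only pass the two states.
    auto update_ranges = [&](State lhs_state, State rhs_state) {
        assert(lhs_state < lhs_size);
        min_rhs[lhs_state] = std::min(min_rhs[lhs_state], rhs_state);
        max_rhs[lhs_state] = std::max(max_rhs[lhs_state], rhs_state);
    };

    for (const auto& pair : pairs) {
        update_ranges(pair.first, pair.second);
    }
    return {min_rhs, max_rhs};
}
```

The lambda is only valid while the captured locals are alive, which is fine here because it never leaves the enclosing function.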

vhavlena · 2023-09-25T15:52:49Z

src/nfa/intersection.cc

+        const StatePost& lhs_state_post{lhs.delta[lhs_source] };
+
+        //TODO: handling of epsilons might not be ideal, don't know, it would need some brain cycles to improve.
+        // (handling of normal symbols is more important and it is ok)


Well, I guess this is not true. In noodler we use this epsilon product heavily.

Is there a benchmark with it at TACAS?
Anyway, I currently don't know how to do it better.
Let's think about it if we find it slow on some benchmark.

Well, it depends on which TACAS paper you mean. For mata we don't use it.

vhavlena · 2023-09-25T15:54:20Z

src/nfa/intersection.cc

+    // And every containment test first asks whether lhs and rhs are in each others ranges.
+    // This is several times faster compared to pure product_vec_map, which is turn is notably faster that one unordered map from pairs to states.
+    // (but Juraj says that rewriting hash function may help unordered map significantly, so maybe that would be enough ... ?)
+    // TODO: where to put this magical constant? It should not be here.


Alternative to that is a vector of sets. First state in a pair serves as an index to the vector, the second state is found in the set using binary search.

I tried, replacing unordered_map with map makes it 40% slower.
StackOverflow people say that this is because memory locality is by far the most important thing when the collections are not too large, and set has pointers everywhere.

Besides, when trying this, I also tried disabling the optimization with ranges, and it got faster.
So maybe we can simplify this and remove all that, keeping just the switch between a matrix and a vector of unordered maps. But I will first try to see what happened.
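
A minimal sketch of the storage idea discussed in this thread, under stated assumptions: `ProductStorage`, `NO_STATE` and the `dense_limit` parameter are hypothetical names, `State` is assumed to be an unsigned integer, and the real implementation in the PR may differ. It keeps a flat matrix when the number of pairs is small and falls back to a vector of unordered maps indexed by the lhs state otherwise.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <unordered_map>
#include <vector>

using State = unsigned long; // assumption: stand-in for the library's State type

// Hypothetical marker for "this pair has no product state yet".
constexpr State NO_STATE = std::numeric_limits<State>::max();

// Maps pairs (lhs, rhs) to product states.
class ProductStorage {
public:
    // dense_limit plays the role of the "magical constant": the largest number
    // of pairs for which a full matrix is allocated. The multiplication is not
    // guarded against overflow in this sketch.
    ProductStorage(std::size_t lhs_size, std::size_t rhs_size, std::size_t dense_limit)
        : rhs_size_(rhs_size), dense_(lhs_size * rhs_size <= dense_limit) {
        if (dense_) {
            matrix_.assign(lhs_size * rhs_size, NO_STATE);
        } else {
            sparse_.resize(lhs_size);
        }
    }

    // Returns the product state for the pair, or NO_STATE if none was stored.
    State get(State lhs, State rhs) const {
        if (dense_) {
            return matrix_[lhs * rhs_size_ + rhs];
        }
        const auto it = sparse_[lhs].find(rhs);
        return it == sparse_[lhs].end() ? NO_STATE : it->second;
    }

    void insert(State lhs, State rhs, State product) {
        if (dense_) {
            matrix_[lhs * rhs_size_ + rhs] = product;
        } else {
            sparse_[lhs][rhs] = product;
        }
    }

    bool is_dense() const { return dense_; }

private:
    std::size_t rhs_size_;
    bool dense_;
    std::vector<State> matrix_;                            // used when dense_
    std::vector<std::unordered_map<State, State>> sparse_; // used otherwise
};
```

The matrix gives one contiguous allocation and constant-time lookups, while the sparse variant only pays for pairs that are actually reached.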


tfiedor

I have a few minor suggestions for small speedups. Other than that, it is quite a beast. It would be good to go through it again after the deadlines and try to make it a little bit more readable, as it is sometimes heavy.

Also, I'm wondering if we should keep the old version and maybe have some portfolio intersection method that decides based on, e.g., the size of the input automata. I have a feeling this will work for big automata but might be worse for smaller ones (since there are a lot of vectors, vectors of vectors, etc.). Experiments will tell.

tfiedor · 2023-09-26T14:41:15Z

src/nfa/intersection.cc

+namespace mata::nfa {
+
+Nfa intersection(const Nfa& lhs, const Nfa& rhs, const Symbol first_epsilon, ProductMap *prod_map) {
+


Btw. one small suggestion for potential speedup: test the following:

if lhs.empty: return lhs elif rhs.empty: return rhs

Might show some speed up, if in the product you do some initialization steps.

Might make sense, but I am not sure whether emptiness is really free. We are changing it now, so let's see the performance first.

As I said at the meeting, I am not suggesting testing language emptiness, but maybe delta emptiness? Or could final state emptiness be applied? Basically anything cheap that can be used to avoid costly initializations.

Fine, I added a test for emptiness of final and initial states.
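
The kind of cheap check agreed on here could look roughly like the sketch below. It is an illustration only: `ToyNfa` and `product_trivially_empty` are hypothetical and much simpler than the real `Nfa` class. An automaton without initial states or without final states accepts nothing, so the intersection is empty and the product structures never need to be allocated.

```cpp
#include <cassert>
#include <set>

using State = unsigned long; // assumption: stand-in for the library's State type

// Hypothetical, heavily simplified automaton; only the parts needed for the check.
struct ToyNfa {
    std::set<State> initial;
    std::set<State> final_states;
};

// Returns true when the intersection is certainly empty, judged only by
// constant-time emptiness tests on the sets of initial and final states.
// A false result says nothing about the actual language of the product.
bool product_trivially_empty(const ToyNfa& lhs, const ToyNfa& rhs) {
    return lhs.initial.empty() || lhs.final_states.empty()
        || rhs.initial.empty() || rhs.final_states.empty();
}
```

An intersection routine would call this first and return an empty automaton right away when it holds, before any initialization of product maps.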

tfiedor · 2023-09-26T14:51:00Z

tests-integration/src/utils/utils.cc

@@ -62,7 +62,7 @@ int load_automata(
            std::vector<mata::IntermediateAut> mintermized = mintermization.mintermize(inter_auts);
            TIME_END(mintermization);
            for (mata::IntermediateAut& inter_aut : mintermized) {
-                assert(inter_aut.alphabet_type == mata::IntermediateAut::AlphabetType::BITVECTOR);
+                //assert(inter_aut.alphabet_type == mata::IntermediateAut::AlphabetType::BITVECTOR);


Probably don't commit this at all and keep your repo dirty.

kilohsakul · 2023-09-28T11:02:59Z

> I have few minor suggestions for small speed ups. Other than that, it is quite a beast. It would be good to go through it again after deadlines and try to make it little bit more readable, as it is sometimes heavy.
>
> Also, I'm wondering if we should keep the old version and maybe have some portfolio intersection method deciding wrt e.g. the size of the input automata, since I have a feeling this will work for big automata, but might be worse for smaller ones (since there are lot of vectors, vector of vectors etc.). Experiments will tell.

OK, let's see, but I actually don't believe that it can be slower. The vectors should all be much cheaper to allocate than the deltas of the input automata, so it should cost relatively nothing.
Let's bother with it only if we see something bad in the experiments.

kilohsakul · 2023-09-28T12:17:00Z

I am uncommenting that assert in utils.cc in the integration tests, but maybe it should not be failing.
Could you look at it, @tfiedor? I made an issue.

kilohsakul · 2023-09-28T15:45:15Z

Lets merge this, no?

tfiedor · 2023-09-28T15:46:28Z

> Lets merge this, no?

You are done?

kilohsakul · 2023-09-28T16:19:15Z

Now yes, done.

Adda0

I reviewed the changes again, resolved some unresolved discussions from this PR, and will integrate the changes in another PR.

Adda0 · 2023-10-02T11:21:14Z

@kilohsakul Can the branch be removed? We are trying to keep the number of branches to a minimum in order to keep the repository clean and uncluttered.

kilohsakul added 5 commits September 25, 2023 02:24

fucking works

06b53a8

tests pass

72daba8

done, but sigsegv

6880f4f

almost done, and tests pass with matrix as well as with limits!!!

d5c7ce3

done, olé!

1ffc102

kilohsakul requested review from Adda0, tfiedor, vhavlena and jurajsic September 25, 2023 14:51

vhavlena approved these changes Sep 25, 2023

kilohsakul added 2 commits September 25, 2023 18:53

disable the assert in parsing to run the test (cox inter in test inte…

574e0bb

…greation)

is_lang_empty via get_useful_states (much faster)

bfae9a7

Adda0 reviewed Sep 26, 2023

kilohsakul added 5 commits September 26, 2023 12:34

before removing ranges to optimize product storage

d2881a2

removed ranges optimization and did some little things asked for in r…

2ceecb1

…eviews

renamed some things

b1a9d72

fix binding

5a46057

adding the reverse product to lhs/rhs map as vectors,

49d5d75

optimizing the number of searches in the lhs/rhs to product maps

tfiedor approved these changes Sep 26, 2023

kilohsakul and others added 3 commits September 28, 2023 11:54

Update src/nfa/intersection.cc

0f01d7c

Co-authored-by: Tomas Fiedor <[email protected]>

Update src/nfa/intersection.cc

31fa23c

Co-authored-by: Tomas Fiedor <[email protected]>

Update include/mata/nfa/algorithms.hh

10f110f

Co-authored-by: Tomas Fiedor <[email protected]>

kilohsakul mentioned this pull request Sep 28, 2023

strange assertion fail in integration-tests/utils.cc when parsing #350

Open

kilohsakul added 2 commits September 28, 2023 13:14

tf's review

a375fd1

tf's review

03e911d

kilohsakul and others added 2 commits September 28, 2023 15:50

Update src/nfa/delta.cc

8959594

Co-authored-by: David Chocholatý <[email protected]>

Also optimized language emptiness check a bit (get_useful_states term…

2960deb

…inates after finding first final state).

kilohsakul and others added 5 commits September 28, 2023 16:14

Update src/nfa/intersection.cc

4d54953

Co-authored-by: David Chocholatý <[email protected]>

some smalll things?

12fc2d6

Merge remote-tracking branch 'origin/intersection_faster' into inters…

993be81

…ection_faster

some smalll things?

261c4b7

some smalll things?

7c43663

kilohsakul added 2 commits September 28, 2023 18:01

comment

8025de5

comment

8614495

tfiedor merged commit 025f56a into devel Sep 28, 2023
19 of 20 checks passed

tfiedor deleted the intersection_faster branch September 28, 2023 16:19

kilohsakul restored the intersection_faster branch September 28, 2023 16:24

Adda0 reviewed Oct 2, 2023
