ILP-based extractor that produces an optimal dag-cost extraction #30

TrevorHansen · 2023-12-28T05:40:34Z

I mentioned in #24 that I've made an updated ILP extractor that given enough time, will produce an optimal (least-cost) DAG-extaction. It's my preference to combine this and the work in #24, so we have a faster & simple ILP extractor that can produce optimal DAG extractions.

I now realise, either the bottom-up, or faster-bottom-up will produce an optimal tree-extraction. So with this ILP extractor we can now produce optimal dag or tree extractions.

I'm not sure if it's widely appreciated, but the current ILP extractor isn't good. Looking at the details below, you can see the the current/old extractor is about 95 times slower than the faster-greedy-dag extractor and produces extractions that are much worse than it. I can't see a reason to use the current/old ILP extractor.

cumulative time for faster-greedy-dag: 3146ms
cumulative dag cost for faster-greedy-dag: 77029

(10 second timeout)
cumulative time for ilp-cbc: 1039220ms
cumulative dag cost for ilp-cbc: 76100

cumulative time for ilp-cbc-old: 297838ms
cumulative dag cost for ilp-cbc-old: 118031

cumulative time for ilp-cbc: 1039220ms
cumulative time for ilp-cbc-old: 297838ms
cumulative tree cost for ilp-cbc: 32017750412190
cumulative tree cost for ilp-cbc-old: 35474620331874
cumulative dag cost for ilp-cbc: 76100
cumulative dag cost for ilp-cbc-old: 118031
ilp-cbc / ilp-cbc-old
geo mean
tree: 0.6204
dag: 0.6842
micros: 22.8171
quantiles
tree:   0.0229, 0.5030, 0.5840, 0.8510, 2.3333
dag:    0.0233, 0.5762, 0.6541, 0.8865, 1.1095
micros: 0.1439, 9.9892, 31.2070, 63.5396, 461.6565

The ILP solver in this PR is slow on some instances, with some taking >10 hours to optimise. So has a timeout, that if reached, will return the results from the faster-greedy-dag, which is fast and kinda-ok.

Here are the results at some timeouts for 260 test inputs.

2 second timeout, dag cost 77,011, 343 second cumulative, 125 unfinished
10 second timeout, dag cost  76,100, 1,039 second cumulative, 86 unfinished
30 second timeout, dag cost  75,894, 2,580 second cumulative, 71 unfinished
100 second timeout, dag cost 75,866, 6,509 second cumulative, 50 unfinished

oflatt · 2023-12-30T13:41:13Z

Does this already combine with #24?

TrevorHansen · 2023-12-31T01:27:36Z

Does this already combine with #24?

Not yet. Hopefully when they're combined, and this ILP extractor is using good_lp/highs instead of CBC solver the runtime will drop by 1/3 (like @averyanalex has demonstrated with the current ILP extractor).

Going forward, I'm imagining something like this:

Merge the good_lp/highs ILP extractor (Add good_lp based (with highs solver) extractor #24) into main.
Merge this ILP extractor (ILP-based extractor that produces an optimal dag-cost extraction #30) into main.
Merge the better test code (Test code #31) into main.
Using the better test code, confirm that the updated extractors(Add good_lp based (with highs solver) extractor #24, ILP-based extractor that produces an optimal dag-cost extraction #30) are sound.
Change this extractor to use good_lp/highs

oflatt · 2023-12-31T17:28:10Z

Sounds like a good plan!

oflatt · 2023-12-31T17:28:19Z

Let me know if you are ever blocked on a review

TrevorHansen · 2024-01-01T02:37:36Z

Let me know if you are ever blocked on a review

Thanks @oflatt, I'm happy with this PR now, who should I ask to review it?

…thub workflow

averyanalex · 2024-01-01T09:32:02Z

Does this already combine with #24?

Not yet. Hopefully when they're combined, and this ILP extractor is using good_lp/highs instead of CBC solver the runtime will drop by 1/3 (like @averyanalex has demonstrated with the current ILP extractor).

Going forward, I'm imagining something like this:

* Merge the good_lp/highs ILP extractor ([Add good_lp based (with highs solver) extractor #24](https://github.com/egraphs-good/extraction-gym/pull/24)) into main.

* Merge this ILP extractor ([ILP-based extractor that produces an optimal dag-cost extraction #30](https://github.com/egraphs-good/extraction-gym/pull/30)) into main.

* Merge the better test code ([Test code #31](https://github.com/egraphs-good/extraction-gym/pull/31)) into main.

* Using the better test code, confirm that the updated extractors([Add good_lp based (with highs solver) extractor #24](https://github.com/egraphs-good/extraction-gym/pull/24), [ILP-based extractor that produces an optimal dag-cost extraction #30](https://github.com/egraphs-good/extraction-gym/pull/30)) are sound.

* Change this extractor to use good_lp/highs

It might make sense to merge #30 first, and then I'll try to port the updated extractor to good_lp.

mwillsey · 2024-01-01T18:23:29Z

I like the overall plan, and I like the consolidation of extractor variants (too many makes it hard to compare). The problem with optimal ILP is cycle breaking, and I'm not 100% clear on how this approach works on that. Can you include a comment or something on how cycles are avoided? It seems like cycle avoidance strategies will be a major distinguishing factor for DAG-based extractors.

TrevorHansen · 2024-01-01T22:46:54Z

... The problem with optimal ILP is cycle breaking, and I'm not 100% clear on how this approach works on that. Can you include a comment or something on how cycles are avoided? It seems like cycle avoidance strategies will be a major distinguishing factor for DAG-based extractors.

Sure, I've added this description to the code:

To block cycles, we enforce that a topological ordering exists on the extraction.
Each class is mapped to an integer variable (called its level). Then for each node,
we add a constraint that if a node is active, then the level of the class the node
belongs to must be less than than the level of each of the node's children.

To create a cycle, the levels would need to decrease, so they're blocked. For example,
given a two class cycle: if class A, has level 'l', and class B has level 'm', then
'l' must be less than 'm', but because there is also an active node in class B that
has class A as a child, 'm' must be less than 'l', which is a contradiction.

mwillsey · 2024-01-02T15:51:58Z

is it being an integer variable important? Could it be a real-valued one? That might be easier for the solver.

…ers for the levels

TrevorHansen · 2024-01-03T14:27:25Z

is it being an integer variable important? Could it be a real-valued one? That might be easier for the solver.

Nicely spotted! Combined with some other changes it reduced the runtime by about 15%.

better ilp extractor

36f05d0

TrevorHansen mentioned this pull request Dec 29, 2023

Test code #31

Merged

TrevorHansen mentioned this pull request Dec 31, 2023

Why is fast-greedy-dag worse in terms of dag size? #19

Open

TrevorHansen changed the title ~~better ILP extractor~~ ILP-based extractor that produces an optimal dag-cost extraction Jan 1, 2024

Separate extractor with timeout and without timeout

8c1f9d7

TrevorHansen added 5 commits January 1, 2024 13:46

fix build. enable ilp-cbc by default

a2e2b6d

fix to use ilp-cbc with timeout

af94572

Move the timeout to the caller

9486d8e

Removing cbc because I don't have permission to install cbc during gi…

0a36b03

…thub workflow

re-enable global-greedy-dag

af2eeaa

add description of how cycles are blocked

8619f7f

TrevorHansen added 2 commits January 4, 2024 01:06

Following the suggestion of @mwillsey to use floats rather than integ…

4be170d

…ers for the levels

fix layout

931c3b5

Merge remote-tracking branch 'main/main' into om11

35d28b2

mwillsey merged commit 84d67bf into egraphs-good:main Jan 12, 2024
2 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ILP-based extractor that produces an optimal dag-cost extraction #30

ILP-based extractor that produces an optimal dag-cost extraction #30

TrevorHansen commented Dec 28, 2023

oflatt commented Dec 30, 2023

TrevorHansen commented Dec 31, 2023

oflatt commented Dec 31, 2023

oflatt commented Dec 31, 2023

TrevorHansen commented Jan 1, 2024

averyanalex commented Jan 1, 2024

mwillsey commented Jan 1, 2024

TrevorHansen commented Jan 1, 2024

mwillsey commented Jan 2, 2024

TrevorHansen commented Jan 3, 2024

ILP-based extractor that produces an optimal dag-cost extraction #30

ILP-based extractor that produces an optimal dag-cost extraction #30

Conversation

TrevorHansen commented Dec 28, 2023

oflatt commented Dec 30, 2023

TrevorHansen commented Dec 31, 2023

oflatt commented Dec 31, 2023

oflatt commented Dec 31, 2023

TrevorHansen commented Jan 1, 2024

averyanalex commented Jan 1, 2024

mwillsey commented Jan 1, 2024

TrevorHansen commented Jan 1, 2024

mwillsey commented Jan 2, 2024

TrevorHansen commented Jan 3, 2024