
Use large threads for processing Boogie programs #710

Merged · 10 commits · Mar 31, 2023

Conversation

@keyboardDrummer (Collaborator) commented Mar 29, 2023

Fixes Dafny issue dafny-lang/dafny#3803 and dafny-lang/dafny#3819

Testing

  • Updated the nesting.bpl file so it triggers the stack overflow encountered in the issues above

@keyboardDrummer enabled auto-merge (squash) March 29, 2023 17:28
@robin-aws (Contributor)

I'm a fan of the general approach, and it's nice there's already a single choke point where we're already delegating to another thread anyway. My main concern is the dependency we're taking on a random solution that isn't cleanly distributed or necessarily maintained, but I could be convinced if this is literally the only existing solution out there (with a LOT more dedicated testing).

@@ -0,0 +1,325 @@
// Code taken from https://github.com/ChadBurggraf/parallel-extensions-extras/blob/ec803e58eee28c698e44f55f49c5ad6671b1aa58/TaskSchedulers/WorkStealingTaskScheduler.cs
Contributor:

Oof, is this really the best option out there for this functionality? No commonly used nuget package out there?

@keyboardDrummer (Collaborator, Author) commented Mar 30, 2023

It is, and a NuGet package of this wouldn't work since I had to edit the code to enable customizing the stack size. Doing so is apparently something that nobody does, since there's no library code anywhere to do it :-D

Anyways, I replaced this code with something much simpler written by hand. I'm guessing it's less performant but I think that's fine - this is a temporary solution anyways.
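For context, the .NET primitive that makes a custom stack size possible at all is the Thread constructor overload that takes a maxStackSize argument; Task.Run and the default ThreadPool expose no such knob, which is why the copied scheduler had to be edited. A minimal hedged sketch (the 16 MB figure matches this PR; the recursive workload is a placeholder):

```csharp
using System.Threading;

class LargeStackExample
{
    static void Main()
    {
        // 16 MB instead of the ~1 MB platform default, so deep recursion
        // (e.g. over a large expression tree) does not overflow the stack.
        var worker = new Thread(() =>
        {
            // placeholder for deeply recursive work
        }, maxStackSize: 16 * 1024 * 1024);
        worker.Start();
        worker.Join();
    }
}
```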

Contributor:

Ah! I just started looking at the comment trail on this PR and it took a while for things to make sense. It sounds like @keyboardDrummer was using a copied version of some publicly-available code but has now changed the approach to a simple threadpool class with a large stack size. I saw the new class and indeed it looks quite simple.

Is there any reason to gate the use of the new threadpool behind a command-line flag? In other words, do you think it is possible for the new threadpool to be worse than the custom threadpool in some way? If so, it might be better to use a command-line option, especially since the solution is temporary anyway.

@keyboardDrummer (Collaborator, Author) commented Mar 31, 2023

> Is there any reason to gate the use of the new threadpool behind a command-line flag? In other words, do you think it is possible for the new threadpool to be worse than the custom threadpool in some way? If so, it might be better to use a command-line option, especially since the solution is temporary anyway.

Note that older Boogies, like 2.9.4, also spawned 16MB stack threads, as you can see here: https://github.com/boogie-org/boogie/blob/v2.9.4/Source/ExecutionEngine/ExecutionEngine.cs#L433. They spawned one thread for each Boogie implementation: https://github.com/boogie-org/boogie/blob/v2.9.4/Source/ExecutionEngine/ExecutionEngine.cs#L1018
which could be garbage collected after an implementation had finished running, so the memory usage was limited to 16MB * implementations_running_concurrently.

I've updated this PR so it also spends 16MB * implementations_running_concurrently of memory, so the memory profile should be similar to Boogie 2.9.4.

Regarding running time performance, I think the behavior here will be better than 2.9.4 because threads are re-used between implementations.
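The "one pool of reusable large-stack threads" idea described above can be sketched roughly as follows. This is an illustration of the technique, not the PR's actual CustomStackSizePoolTaskScheduler: a fixed set of large-stack threads is created once and reused across work items, which bounds memory at stackSize * threadCount and avoids the per-implementation thread creation cost of Boogie 2.9.4.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;

// Hypothetical sketch of a fixed pool of large-stack worker threads.
class LargeStackPool
{
    private readonly BlockingCollection<Action> queue = new();

    public LargeStackPool(int stackSize, int threadCount)
    {
        for (int i = 0; i < threadCount; i++)
        {
            // Each worker is created once with a large stack and then
            // reused for every work item it dequeues.
            var thread = new Thread(() =>
            {
                foreach (var work in queue.GetConsumingEnumerable())
                {
                    work();
                }
            }, stackSize) { IsBackground = true };
            thread.Start();
        }
    }

    public void Queue(Action work) => queue.Add(work);
}
```

Usage would look like `new LargeStackPool(16 * 1024 * 1024, Environment.ProcessorCount)`, mirroring the parameters used in the diff below.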

Contributor:

Please compare the running time of the current Boogie and your new version on a few large-ish examples, potentially generated from Dafny. Another option could be to compare the running time for the entire Boogie regression testsuite. Just a sanity check. I expect your intuition that there wouldn't be any significant difference to be valid.

Source/ExecutionEngine/WorkStealingQueue.cs (outdated, resolved)
@robin-aws (Contributor)

Regardless of the outcome here, let's also cut an explicit issue for the root cause of traversing ASTs recursively, which your dafny-lang/dafny#2124 work might be the deeper solution for.

@keyboardDrummer (Collaborator, Author)

> Regardless of the outcome here, let's also cut an explicit issue for the root cause of traversing ASTs recursively, which your dafny-lang/dafny#2124 work might be the deeper solution for.

#711

@robin-aws (Contributor) left a comment

Excellent, thanks - I'm much more comfortable with this! Just a couple of comment suggestions.

private Task BlockUntilTaskIsAvailable()
{
isWorkAvailable.Wait();
queue.TryDequeue(out var task);
Contributor:

Worth a comment saying we don't need to check the return value or if task is null because of the separate semaphore (too bad ConcurrentQueue doesn't have a plain blocking Dequeue method).

Collaborator (Author):

Changed the implementation to use AsyncQueue, which has the behavior you were hoping for from ConcurrentQueue.
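For readers following along, the invariant that makes the unchecked TryDequeue safe can be sketched like this (an illustration of the semaphore-guarded pattern being discussed, not the final AsyncQueue-based code): the SemaphoreSlim count always equals the number of queued items, so once Wait() returns, a matching item is guaranteed to be in the queue.

```csharp
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical sketch of the semaphore-guarded queue pattern.
class TaskQueue
{
    private readonly ConcurrentQueue<Task> queue = new();
    private readonly SemaphoreSlim isWorkAvailable = new(0);

    public void Enqueue(Task task)
    {
        queue.Enqueue(task);
        isWorkAvailable.Release(); // semaphore count tracks queued items
    }

    public Task BlockUntilTaskIsAvailable()
    {
        isWorkAvailable.Wait();
        // TryDequeue cannot come back empty here: a successful Wait()
        // consumes a Release() that was only issued after an Enqueue,
        // so at least one item is still in the queue for this caller.
        queue.TryDequeue(out var task);
        return task;
    }
}
```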

@@ -26,10 +26,14 @@ public record ProcessedProgram(Program Program, Action<VCGen, Implementation, Ve

public class ExecutionEngine : IDisposable
{
private static readonly CustomStackSizePoolTaskScheduler largeThreadScheduler = CustomStackSizePoolTaskScheduler.Create(16 * 1024 * 1024, Environment.ProcessorCount);
Contributor:

Worth a quick comment stating why we need such a deep stack, maybe point to #711

Collaborator (Author):

Done

@shazqadeer (Contributor)

Curious what the core underlying problem is---traversal of deep Boogie ASTs or traversal of deep VCExpr's?

@keyboardDrummer (Collaborator, Author)

> Curious what the core underlying problem is---traversal of deep Boogie ASTs or traversal of deep VCExpr's?

Traversal of deep VCExprs is what we've run into in practice.

@robin-aws (Contributor) left a comment

So little code! Love it! :)

@keyboardDrummer merged commit ec72064 into boogie-org:master on Mar 31, 2023
@keyboardDrummer deleted the largeThreads branch March 31, 2023 22:10