Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix for crash when using NSAutoreleasePool on macOS #1414

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

MarkOates
Copy link
Contributor

Problem

This pull request fixes a crash that can occur when using NSAutoreleasePool during al_destroy_display in macOS. The crash appears inconsistently, at different times, with different types of crashes, and sometimes not at all - but only occurs after a call to al_destroy_display(). Also, the crashes only began to appear after I upgraded macOS to Ventura (13.1) on M1.

I believe it's the result of a double-free.

Possible Solution

To fix this issue, the line al_free(d) was removed from the end of al_destroy_display(). In this case, I'm guessing the declaration of ALLEGRO_DISPLAY_OSX_WIN* dpy = (ALLEGRO_DISPLAY_OSX_WIN*) d; at the top of the function within the scope of a NSAutoreleasePool may have resulted in the [pool drain] destroying dpy. Then, with al_free(d) having already destroyed the resource, the crash could possibly occur at some managed point after a pool drain, once the draining thread had a chance to address it.

Now, I'm not familiar with Objective-C and not entirely confidant with how Allegro 5 manages its memory and contexts internally, so this is just a guess to the solution/cause. Removing the al_free(d) (or removing the [pool drain] line entirely) did stop the crashes.

Recreating the Issue

This issue only appeared on macOS (Ventura 13.1, M1), with the fix here resulting in no more crashing.

Also, I tested the change on macOS (Monterey 12.5.1, Intel), which did not produce the issue before, and it still continues to run normally.

Easiest way to reproduce the problem (if you can reproduce the ✨secret hardware configuration✨) might be with this code below. Running the program will likely reproduce a crash 9 out of 10 runs.

#include <allegro5/allegro.h>
#include <iostream>

int main(int argc, char** argv)
{
   al_init();
   ALLEGRO_DISPLAY *display = nullptr;

   int attempts = 20;

   for (int attempt=0; attempt<attempts; attempt++)
   {
      std::cout << "creating display (attempt " << attempt << ")" << std::endl;
      display = al_create_display(1920, 1080);
      std::cout << "   display created" << std::endl;
      std::cout << "destroying display" << std::endl;
      al_destroy_display(display);
      std::cout << "   display destroyed" << std::endl;
   }

   std::cout << "Process finished." << std::endl;

   al_uninstall_system();
   return 0;
}

@MarkOates
Copy link
Contributor Author

I've attached some crash report info here:

Zip contains the results of 10 runs. Each run that crashed (9 of 10) includes a backtrace of all the threads, and if a crash report was output to the terminal that is included as well.

The terminal's output of the crash reports did appear threaded, so it's possible there could be multiple crash reports threaded together in the output. Backtraces are clean though.

crash_reports.zip

@allefant
Copy link
Contributor

allefant commented Mar 2, 2023

How did you create the backtraces? E.g. the stack trace in run05-crash-report.txt does not match the backtrace in run05-backtrace.txt, ideally it would be the one from the crash but it seems to be from some time afterwards?

@allefant
Copy link
Contributor

allefant commented Mar 2, 2023

I usually do this:

lldb ./my-app
r
(wait for crash)
bt all

Then the console should contain the crash output as well as the backtraces.

@MarkOates
Copy link
Contributor Author

MarkOates commented Mar 2, 2023

lldb ./my-app
r
(wait for crash)
bt all

Yes, that's actually exactly what I did. The stack traces and crash reports do not match. 🤷‍♂️

Here are some screenshots of the terminal (different kinds of crashes, some with reports output to terminal, some not): more_crash_reports-screenshots.zip

It's honestly the weirdest problem I've ever observed/debugged.

@pedro-w
Copy link
Contributor

pedro-w commented Mar 2, 2023

I noticed the crash seems to happen quite often when creating the new display, in [NSOpenGLContext makeCurrentContext] where presumably making a new context current means making the old context not current, but the old context might have been destroyed by then. Is that what you're thinking? (sorry just trying to catch up)
At this point we're in the main thread and we don't do graphics on the main thread, so the reason for making the new context current is (maybe?) just so the calls to glGetString() will succeed in osxgl.m:1640-42.
I don't know if releasing a context when it's current automatically causes it to be made not current, so there are two possible things to look at

  • call [NSOpenGLContext clearCurrentContext] when destroying a display
  • make sure the release is called on the same thread as the alloc, currently it isn't as the former is on the user's thread and the latter is on the main thread

Could you try making some changes to osxgl.m as below and seeing if it makes any difference? Frustrating because I don't have your hardware and I can't ever get it to crash like this.

diff --git a/src/macosx/osxgl.m b/src/macosx/osxgl.m
index a4db583e0..69319365c 100644
--- a/src/macosx/osxgl.m
+++ b/src/macosx/osxgl.m
@@ -1774,9 +1774,11 @@ static void destroy_display(ALLEGRO_DISPLAY* d)
          [dpy->win close];
          dpy->win = nil;
       }
+      // Context was alloc'd on the main thread so should be released on the same thread
+      [NSOpenGLContext clearCurrentContext];
+      [dpy->ctx release];
    });
    _al_ogl_unmanage_extensions(&dpy->parent);
-   [dpy->ctx release];
    [dpy->cursor release];
    _al_event_source_free(&d->es);
    al_free(d->ogl_extras);

@MarkOates
Copy link
Contributor Author

Could you try making some changes to osxgl.m as below and seeing if it makes any difference?

I tried out these changes but unfortunately it didn't seem to have any affect on the amount/number of crashes.

vi 2023-03-03 11-54-55

Frustrating because I don't have your hardware and I can't ever get it to crash like this.

I hear that. I wish there was something I could do to mitigate it. Bear in mind, I'd be more than happy to attempt any other ideas you may want to try out though.

@pedro-w
Copy link
Contributor

pedro-w commented Mar 5, 2023

I'm running a bit low on ideas. Still thinking about the fact the crash seems to come from with [NSOpenGLContext dealloc] could you try moving the call to [NSOpenGLContext clearCurrentContext] (line 1780) in your screenshot to the start of the function (just after the line unsigned int i)?

Because al_free is just a wrapper around free, if removing it fixes the crash it must be that something is holding a reference to part of the display structure which is modified (i.e. corrupted) by the action of free.

@pedro-w
Copy link
Contributor

pedro-w commented Mar 5, 2023

One other thing - I don't think the autorelease pool is needed in destroy_display, maybe try getting rid of it altogether - if it was needed you should see a message printed to the console when you run it

@MarkOates MarkOates force-pushed the remove-double-free branch from b0f62ed to 42588c0 Compare March 28, 2023 15:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants