Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

23.3: crash upon exit #3427

Closed
DarthGandalf opened this issue Sep 25, 2023 · 17 comments · Fixed by #3541
Closed

23.3: crash upon exit #3427

DarthGandalf opened this issue Sep 25, 2023 · 17 comments · Fixed by #3541
Assignees
Labels
bug Something likely wrong in the code importance: critical Real showstopper, program fails here state: confirmed A developer can reproduce the issue
Milestone

Comments

@DarthGandalf
Copy link
Contributor

Expected Behaviour

It should exit cleanly

Actual Behaviour

Downloaded 1 files (0 kbytes) in a session of 41.33 sec (average of 0 kB/s + 0 files from cache (0 kB)).
LandscapeMgr: Clearing cache of 0 landscapes totalling about 0 MB.
malloc(): unsorted double linked list corrupted
[1] 22581 IOT instruction stellarium

Steps to reproduce

  1. Launch Stellarium
  2. Wait until it starts
  3. Move mouse to lower edge of the screen
  4. Click Quit button

System

  • Stellarium version: 23.3, qt5
  • Operating system: Gentoo Linux, amd64
  • Graphics Card: GIGABYTE Radeon RX 6700 XT Gaming OC 12G, x11-drivers/xf86-video-amdgpu-23.0.0
  • Screen type (if applicable): 1920x1080

Logfile

log.txt

Stacktrace: gdb.txt

@DarthGandalf
Copy link
Contributor Author

Might be related to #3391 but I don't do anything special with any script

@10110111
Copy link
Contributor

I often notice this message, without any scripts. But doesn't always appear, I can't predict whether it will appear on any launch. Can you reproduce it reliably?

@DarthGandalf
Copy link
Contributor Author

DarthGandalf commented Sep 25, 2023

Yes, so far it happened every time I launched stellarium since upgrade to 23.3 - five times or so, while debugging this crash

@10110111
Copy link
Contributor

Could you try running it under Valgrind?

@DarthGandalf
Copy link
Contributor Author

Doesn't show such error under valgrind

Downloaded 1 files (0 kbytes) in a session of 673.214 sec (average of 0 kB/s + 0 files from cache (0 kB)).
==26533== Warning: set address range perms: large range [0x45753000, 0x579a0000) (noaccess)
==26533== Warning: set address range perms: large range [0x5d896000, 0x7e8c2000) (noaccess)
LandscapeMgr: Clearing cache of 0 landscapes totalling about 0 MB.
StelGLWidget destroyed
==26533==
==26533== HEAP SUMMARY:
==26533== in use at exit: 491,216 bytes in 3,624 blocks
==26533== total heap usage: 58,744,970 allocs, 58,741,346 frees, 30,950,473,774 bytes allocated
==26533==
==26533== LEAK SUMMARY:
==26533== definitely lost: 5,344 bytes in 104 blocks
==26533== indirectly lost: 134,504 bytes in 112 blocks
==26533== possibly lost: 832 bytes in 2 blocks
==26533== still reachable: 350,536 bytes in 3,406 blocks
==26533== suppressed: 0 bytes in 0 blocks
==26533== Rerun with --leak-check=full to see details of leaked memory
==26533==
==26533== Use --track-origins=yes to see where uninitialised values come from
==26533== For lists of detected and suppressed errors, rerun with: -s
==26533== ERROR SUMMARY: 1000 errors from 1000 contexts (suppressed: 0 from 0)

@gzotti
Copy link
Member

gzotti commented Sep 25, 2023

While I have not seen this, it may be related to using std::list in StelPainter instead of Qt5's QLinkedList. Did I mess up something last spring?

EDIT: No, the report says Qt5...

Any idea about the mem leak?

@DarthGandalf
Copy link
Contributor Author

@gzotti I think that's offtopic for the crash issue, but sure:

$ valgrind --leak-check=full --show-leak-kinds=all --track-origins=yes -s stellarium

val.txt.gz

@alex-w alex-w added the bug Something likely wrong in the code label Sep 26, 2023
@alex-w alex-w added this to the 23.4 milestone Sep 26, 2023
@alex-w alex-w added importance: critical Real showstopper, program fails here state: confirmed A developer can reproduce the issue labels Sep 26, 2023
@github-actions
Copy link

Hello @DarthGandalf!

OK, developers can reproduce the issue. Thanks for the report!

@alex-w
Copy link
Member

alex-w commented Nov 18, 2023

Hmm... serious memory leak, this is very bad news :(

@gzotti
Copy link
Member

gzotti commented Nov 19, 2023

The report lists many errors. Many "conditional jump depends on uninitialized values" (any chance for more hints to find which they are and whether we can do anything about them?), and tons of reports in e.g. libLLVM-16.so or even dbus. Where do we use that? LLVM is used in Mesa on Windows, but this is a Linux run. Also Mesa? How did earlier versions behave here?

@alex-w
Copy link
Member

alex-w commented Dec 9, 2023

I hope it fixed in commit a2010ee

@10110111
Copy link
Contributor

10110111 commented Dec 9, 2023

The first memory error on exit consistent with this comment looks like this:

==537294== Invalid read of size 4
==537294==    at 0xE3693D: StelTexture::~StelTexture() (StelTexture.cpp:64)
==537294==    by 0xE36CAF: StelTexture::~StelTexture() (StelTexture.cpp:90)
==537294==    by 0xE32260: QtSharedPointer::CustomDeleter<StelTexture, QtSharedPointer::NormalDeleter>::execute() (qsharedpointer_impl.h:158)
==537294==    by 0xE310FE: QtSharedPointer::ExternalRefCountWithCustomDeleter<StelTexture, QtSharedPointer::NormalDeleter>::deleter(QtSharedPointer::ExternalRefCountData*) (qsharedpointer_impl.h:176)
==537294==    by 0xA2460E: QtSharedPointer::ExternalRefCountData::destroy() (qsharedpointer_impl.h:114)
==537294==    by 0xD12F6A: QSharedPointer<StelTexture>::deref(QtSharedPointer::ExternalRefCountData*) (qsharedpointer_impl.h:445)
==537294==    by 0xD12DCD: QSharedPointer<StelTexture>::deref() (qsharedpointer_impl.h:440)
==537294==    by 0xD12C8F: QSharedPointer<StelTexture>::~QSharedPointer() (qsharedpointer_impl.h:280)
==537294==    by 0xDA1196: StelApp::~StelApp() (StelApp.cpp:307)
==537294==    by 0xDA125F: StelApp::~StelApp() (StelApp.cpp:334)
==537294==    by 0x9FA996: StelMainView::deinit() (StelMainView.cpp:1449)
==537294==    by 0x9C9E4F: main (main.cpp:462)
==537294==  Address 0xec69840 is 16 bytes inside a block of size 80 free'd
==537294==    at 0x483D1CF: operator delete(void*, unsigned long) (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==537294==    by 0xD12788: StelTextureMgr::~StelTextureMgr() (StelTextureMgr.hpp:37)
==537294==    by 0xDA1025: StelApp::~StelApp() (StelApp.cpp:325)
==537294==    by 0xDA125F: StelApp::~StelApp() (StelApp.cpp:334)
==537294==    by 0x9FA996: StelMainView::deinit() (StelMainView.cpp:1449)
==537294==    by 0x9C9E4F: main (main.cpp:462)
==537294==  Block was alloc'd at
==537294==    at 0x483BE63: operator new(unsigned long) (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==537294==    by 0xDA27DC: StelApp::init(QSettings*) (StelApp.cpp:468)
==537294==    by 0x9F44BC: StelMainView::init() (StelMainView.cpp:929)
==537294==    by 0xA00E2C: StelGLWidget::initializeGL() (StelMainView.cpp:139)
==537294==    by 0x4E7F363: QOpenGLWidgetPrivate::initialize() (in /home/ruslan/opt/qt6.4.1/lib/libQt6OpenGLWidgets.so.6.4.1)
==537294==    by 0x4E80217: QOpenGLWidget::event(QEvent*) (in /home/ruslan/opt/qt6.4.1/lib/libQt6OpenGLWidgets.so.6.4.1)
==537294==    by 0x50BEA55: QApplicationPrivate::notify_helper(QObject*, QEvent*) (in /home/ruslan/opt/qt6.4.1/lib/libQt6Widgets.so.6.4.1)
==537294==    by 0x50CA3BF: QApplication::notify(QObject*, QEvent*) (in /home/ruslan/opt/qt6.4.1/lib/libQt6Widgets.so.6.4.1)
==537294==    by 0x6A4B039: QCoreApplication::notifyInternal2(QObject*, QEvent*) (in /home/ruslan/opt/qt6.4.1/lib/libQt6Core.so.6.4.1)
==537294==    by 0x511B196: QWidgetPrivate::show_helper() (in /home/ruslan/opt/qt6.4.1/lib/libQt6Widgets.so.6.4.1)
==537294==    by 0x511E5AA: QWidgetPrivate::setVisible(bool) (in /home/ruslan/opt/qt6.4.1/lib/libQt6Widgets.so.6.4.1)
==537294==    by 0x511B0C7: QWidgetPrivate::showChildren(bool) (in /home/ruslan/opt/qt6.4.1/lib/libQt6Widgets.so.6.4.1)

@alex-w
Copy link
Member

alex-w commented Dec 9, 2023

I hope it fixed in commit a2010ee

No success, I'll revert it

@10110111
Copy link
Contributor

10110111 commented Dec 9, 2023

Oh, I got it. The problem is that StelTexture and StelTextureMgr are needlessly coupled. In particular, the memory access error happens when StelTextureMgr is already destroyed (in StelApp::~StelApp()), and then the data members of StelApp delete the last shared pointers to some StelTextures, whose destructor then tries to access the no longer existing StelTextureMgr.

The design is obviously broken. Textures shouldn't access the texture manager, at least not through a raw pointer. If we must keep it this way, this should at least happen via some weak pointer, e.g. QPointer.

10110111 added a commit to 10110111/stellarium that referenced this issue Dec 9, 2023
On deletion of StelApp StelTextureMgr is deleted, after which some
instances of StelTexture are deleted. But StelTexture destructor logs
VRAM usage via StelTextureMgr, accessing a no longer existing object.

This commit turns the nonstatic raw pointer to texture manager into a
static weak pointer (QPointer), so that it was possible to check whether
texture manager still exists before trying to access it. Making it
static is supposed to prevent useless copying of this smart pointer
while the texture manager is effectively a singleton.

Fixes Stellarium#3427
10110111 added a commit to 10110111/stellarium that referenced this issue Dec 9, 2023
On deletion of StelApp StelTextureMgr is deleted, after which some
instances of StelTexture are deleted. But StelTexture destructor logs
VRAM usage via StelTextureMgr, accessing a no longer existing object.

This commit turns the nonstatic raw pointer to texture manager into a
static weak pointer (QPointer), so that it was possible to check whether
texture manager still exists before trying to access it. Making it
static is supposed to prevent useless copying of this smart pointer
while the texture manager is effectively a singleton.

Fixes Stellarium#3427
10110111 added a commit to 10110111/stellarium that referenced this issue Dec 10, 2023
On deletion of StelApp StelTextureMgr is deleted, after which some
instances of StelTexture are deleted. But StelTexture destructor logs
VRAM usage via StelTextureMgr, accessing a no longer existing object.

This commit turns the nonstatic raw pointer to texture manager into a
static weak pointer (QPointer), so that it was possible to check whether
texture manager still exists before trying to access it. Making it
static is supposed to prevent useless copying of this smart pointer
while the texture manager is effectively a singleton.

Fixes Stellarium#3427
alex-w pushed a commit that referenced this issue Dec 10, 2023
On deletion of StelApp StelTextureMgr is deleted, after which some
instances of StelTexture are deleted. But StelTexture destructor logs
VRAM usage via StelTextureMgr, accessing a no longer existing object.

This commit turns the nonstatic raw pointer to texture manager into a
static weak pointer (QPointer), so that it was possible to check whether
texture manager still exists before trying to access it. Making it
static is supposed to prevent useless copying of this smart pointer
while the texture manager is effectively a singleton.

Fixes #3427
@alex-w alex-w added the state: published The fix has been published for testing in weekly binary package label Dec 10, 2023
Copy link

Hello @DarthGandalf!

Please check the fresh version (development snapshot) of Stellarium:
https://github.com/Stellarium/stellarium-data/releases/tag/weekly-snapshot

@alex-w alex-w removed the state: published The fix has been published for testing in weekly binary package label Dec 23, 2023
Copy link

Hello @DarthGandalf!

Please check the latest stable version of Stellarium:
https://github.com/Stellarium/stellarium/releases/latest

@DarthGandalf
Copy link
Contributor Author

Confirmed fixed, thanks

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something likely wrong in the code importance: critical Real showstopper, program fails here state: confirmed A developer can reproduce the issue
Development

Successfully merging a pull request may close this issue.

4 participants