Next hangs after a while (Guix build) #680

Closed
Ambrevar opened this issue Apr 19, 2020 · 34 comments
Labels
2-series Related to releases whose major version is 2.

Comments

@Ambrevar
Member

It seems that Next hangs after a while.

It happened to me after many hours, apparently when I tried to open a URL externally.
(Could be a red herring.)

Possible causes:

  • Unhandled exception at the FFI / JavaScript level. If so, it should be easy to fix.
  • Corruption at the FFI library level (like the C-v issue). If so, we need to fix the libraries or find a way to not trigger the issue.

@jellelicht If you have a recipe please share :)

@Ambrevar Ambrevar added the 2-series Related to releases whose major version is 2. label Apr 19, 2020
@Ambrevar
Member Author

@jellelicht reported the following strace:

...
clone(child_stack=0x7f81f766bfb0, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, parent_tid=[15133], tls=0x7f81f766c700, child_tidptr=0x7f81f766c9d0) = 15133
sched_getaffinity(15133, 32, [0, 1, 2, 3, 4, 5, 6, 7]) = 32
futex(0x7f824c2578d8, FUTEX_WAKE_PRIVATE, 1) = 1
recvmsg(4, {msg_namelen=0}, 0)          = -1 EAGAIN (Resource temporarily unavailable)
recvmsg(4, {msg_namelen=0}, 0)          = -1 EAGAIN (Resource temporarily unavailable)
poll([{fd=4, events=POLLIN}, {fd=5, events=POLLIN}, {fd=6, events=POLLIN}], 3, 35648) = 1 ([{fd=5, revents=POLLIN}])
read(5, "\5\0\0\0\0\0\0\0", 16)         = 8
recvmsg(4, {msg_namelen=0}, 0)          = -1 EAGAIN (Resource temporarily unavailable)
recvmsg(4, {msg_namelen=0}, 0)          = -1 EAGAIN (Resource temporarily unavailable)
poll([{fd=4, events=POLLIN}, {fd=5, events=POLLIN}, {fd=6, events=POLLIN}], 3, 35648) = 1 ([{fd=5, revents=POLLIN}])
read(5, "\1\0\0\0\0\0\0\0", 16)         = 8
recvmsg(4, {msg_namelen=0}, 0)          = -1 EAGAIN (Resource temporarily unavailable)
recvmsg(4, {msg_namelen=0}, 0)          = -1 EAGAIN (Resource temporarily unavailable)
poll([{fd=4, events=POLLIN}, {fd=5, events=POLLIN}, {fd=6, events=POLLIN}], 3, 35648) = 1 ([{fd=5, revents=POLLIN}])
read(5, "\1\0\0\0\0\0\0\0", 16)         = 8
write(5, "\1\0\0\0\0\0\0\0", 8)         = 8
write(5, "\1\0\0\0\0\0\0\0", 8)         = 8
write(5, "\1\0\0\0\0\0\0\0", 8)         = 8
--- SIGUSR1 {si_signo=SIGUSR1, si_code=SI_TKILL, si_pid=11441, si_uid=1000} ---
futex(0x7f825101ebc0, FUTEX_WAKE_PRIVATE, 1) = 1
rt_sigreturn({mask=[]})                 = 8
--- SIGUSR1 {si_signo=SIGUSR1, si_code=SI_TKILL, si_pid=11441, si_uid=1000} ---
futex(0x7f825101ebc0, FUTEX_WAKE_PRIVATE, 1) = 1
rt_sigreturn({mask=[]})                 = 8
write(5, "\1\0\0\0\0\0\0\0", 8)         = 8
write(5, "\1\0\0\0\0\0\0\0", 8)         = 8
futex(0x7f824c2f9160, FUTEX_WAIT_PRIVATE, 0, NULL) = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
--- SIGUSR1 {si_signo=SIGUSR1, si_code=SI_TKILL, si_pid=11441, si_uid=1000} ---
futex(0x7f825101ebc0, FUTEX_WAKE_PRIVATE, 1) = 1
rt_sigreturn({mask=[]})                 = 202
--- SIGUSR1 {si_signo=SIGUSR1, si_code=SI_TKILL, si_pid=11441, si_uid=1000} ---
futex(0x7f825101ebc0, FUTEX_WAKE_PRIVATE, 1) = 1
rt_sigreturn({mask=[]})                 = 202
futex(0x7f824c2f9160, FUTEX_WAIT_PRIVATE, 0, NULL) = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
--- SIGUSR1 {si_signo=SIGUSR1, si_code=SI_TKILL, si_pid=11441, si_uid=1000} ---
futex(0x7f825101ebc0, FUTEX_WAKE_PRIVATE, 1) = 1
rt_sigreturn({mask=[]})                 = 202
futex(0x7f824c2f9160, FUTEX_WAIT_PRIVATE, 0, NULL) = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
--- SIGUSR1 {si_signo=SIGUSR1, si_code=SI_TKILL, si_pid=11441, si_uid=1000} ---
futex(0x7f825101ebc0, FUTEX_WAKE_PRIVATE, 1) = 1
rt_sigreturn({mask=[]})                 = 202
futex(0x7f824c2f9160, FUTEX_WAIT_PRIVATE, 0, NULL) = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
--- SIGUSR1 {si_signo=SIGUSR1, si_code=SI_TKILL, si_pid=11441, si_uid=1000} ---
futex(0x7f825101ebc0, FUTEX_WAKE_PRIVATE, 1) = 1
rt_sigreturn({mask=[]})                 = 202
--- SIGUSR1 {si_signo=SIGUSR1, si_code=SI_TKILL, si_pid=11441, si_uid=1000} ---
futex(0x7f825101ebc0, FUTEX_WAKE_PRIVATE, 1) = 1
rt_sigreturn({mask=[]})                 = 202
...

And it keeps looping on SIGUSR1 forever.
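For anyone who wants to check their own trace for the same pattern, here is a minimal Python sketch that tallies signal deliveries and the system calls interrupted with ERESTARTSYS. The function name `summarize_strace` is made up for illustration, and the script assumes the default strace line format shown above; it only counts, it does not diagnose anything.

```python
import re
from collections import Counter

# Matches signal-delivery lines such as "--- SIGUSR1 {si_signo=...} ---".
SIGNAL_RE = re.compile(r"^--- (SIG\w+) \{")
# Matches the syscall name at the start of a line such as "futex(...) = 1".
SYSCALL_RE = re.compile(r"^(\w+)\(")


def summarize_strace(lines):
    """Count signals, syscalls, and syscalls interrupted with ERESTARTSYS.

    Hypothetical helper: assumes one strace record per line, in the
    default output format (no timestamps or PID prefixes).
    """
    signals = Counter()
    syscalls = Counter()
    interrupted = Counter()
    for raw in lines:
        line = raw.strip()
        match = SIGNAL_RE.match(line)
        if match:
            signals[match.group(1)] += 1
            continue
        match = SYSCALL_RE.match(line)
        if match:
            name = match.group(1)
            syscalls[name] += 1
            if "ERESTARTSYS" in line:
                interrupted[name] += 1
    return signals, syscalls, interrupted


# A few lines copied from the trace above.
sample = [
    "futex(0x7f824c2f9160, FUTEX_WAIT_PRIVATE, 0, NULL) = ? ERESTARTSYS (To be restarted if SA_RESTART is set)",
    "--- SIGUSR1 {si_signo=SIGUSR1, si_code=SI_TKILL, si_pid=11441, si_uid=1000} ---",
    "futex(0x7f825101ebc0, FUTEX_WAKE_PRIVATE, 1) = 1",
    "rt_sigreturn({mask=[]})                 = 202",
]
signals, syscalls, interrupted = summarize_strace(sample)
print("signals:", dict(signals))
print("syscalls:", dict(syscalls))
print("interrupted:", dict(interrupted))
```

A trace where the SIGUSR1 count keeps climbing alongside interrupted futex waits, with little else happening, would match the loop reported here.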

@jmercouris
Member

Possibly a starvation issue? Two processes trying to access the same resource? That's the only reason I can think of why it cannot acquire a lock.

@Ambrevar
Member Author

Ambrevar commented Apr 24, 2020 via email

@jmercouris
Member

I never experience hangs, no. Even if a particular view hangs (which I can reliably do by trying to play any audio...) I can always kill the buffer and load another.

@Ambrevar
Member Author

Ambrevar commented May 1, 2020

I can reproduce the hang after 10 minutes using the Guix recipe.

@jmercouris
Member

What is the Guix recipe?

@Ambrevar
Member Author

Ambrevar commented May 1, 2020 via email

@jmercouris
Member

Are you saying that if you use the Guix recipe to build Next it hangs after 10 minutes, but otherwise it hangs after an indeterminate amount of time?

@Ambrevar
Member Author

Ambrevar commented May 1, 2020 via email

@Ambrevar
Member Author

Ambrevar commented May 6, 2020

I've updated most Common Lisp Guix packages: now Next should be using approximately the same Common Lisp library versions when built with Guix or Quicklisp.

That said, I am still experiencing the issue with Guix.
Also see issue #593 which may be related.

  • Either I've missed a key library in Guix.
  • Or the Guix build system produces libraries different from those loaded from Quicklisp. Guix uses "compile-bundle-op", maybe this breaks CFFI / GTK / WebKit somehow.
  • Or...? Any idea?

@khinsen @jellelicht @arunisaac

@Ambrevar Ambrevar changed the title Next hangs after a while Next hangs after a while (Guix build) May 6, 2020
@Ambrevar
Member Author

Ambrevar commented May 7, 2020

Tried something else: In Guix, sbcl-cffi uses the gnu-build-system gcc, which is gcc-7 if I'm not mistaken. I tried switching to gcc-9 (the one I use in my profile) to no avail.

@khinsen
Contributor

khinsen commented May 7, 2020

I have never had Next hang at all. I rarely use it for long under Guix (I start the Guix virtual machine only for precise needs, as it occupies a lot of memory), but under macOS I only quit Next to update to a newer commit. Still, it never hangs.

Could the communication between the two processes via dbus be a potential cause? My impression is that all UI events pass through it, so if it blocks, Next would appear to hang.

@Ambrevar
Member Author

Ambrevar commented May 7, 2020 via email

@khinsen
Contributor

khinsen commented May 7, 2020

Well, I am using Next on master. I wasn't aware that it doesn't use DBus any more!

@Ambrevar
Member Author

Ambrevar commented May 7, 2020 via email

@jmercouris
Member

jmercouris commented May 7, 2020

@Ambrevar

What if you run a Guix build of Next in a VM with SSH X forwarding? Does it crash then?

@jmercouris
Member

The VM will help you determine whether it is hardware dependent (unless you are using direct passthrough access to the hardware).

@Ambrevar
Member Author

Ambrevar commented May 7, 2020 via email

@jmercouris
Member

I don't know enough about virtualization to comment with any certainty. In any case, I would try it.

@arunisaac

arunisaac commented May 7, 2020 via email

@Ambrevar
Member Author

Ambrevar commented May 8, 2020 via email

@arunisaac

arunisaac commented May 8, 2020 via email

@Ambrevar
Member Author

Ambrevar commented May 8, 2020 via email

@jmercouris
Member

Congratulations on the fix!

@jellelicht

jellelicht commented May 8, 2020

Is there anything I can do to easily test this already? I'm also not quite sure why a patch to Guix's asdf-build-system has anything to do with the now-used gnu-build-system 🤔

EDIT: or is it because all of the cl-source packages were kind of 'broken' before?

@Ambrevar
Member Author

Ambrevar commented May 8, 2020 via email

@Ambrevar
Member Author

Ambrevar commented May 8, 2020

After more testing, I've narrowed down the issue to sbcl-cl-cffi-gtk: switching to cl-cffi-gtk fixes the issue. I'll report upstream.

@jellelicht

Well, it seems to work (20 minutes in and no hangs). Congrats on the fix!

@Ambrevar
Member Author

Ambrevar commented May 8, 2020 via email

@arunisaac

arunisaac commented May 9, 2020 via email

@Ambrevar
Member Author

Ambrevar commented May 9, 2020 via email

@ghost

ghost commented May 20, 2020

I've built current master, then ran guix package -f build-scripts/guix.scm, and I'm getting high CPU usage after 10-ish minutes. There is no info in the terminal, not even with the -v flag; Next just freezes and fails to redisplay after switching windows/desktops. I'm willing to do some Swank debugging if someone can guide me through it.

Here is my system info:

Lisp Implementation: SBCL 
Lisp Version: 2.0.4 
Operating System: Linux 5.4.41-gnu 
Features: (WEBKIT2 WEBKIT2-2.28 WEBKIT2-EMOJI WEBKIT2-MEDIA WEBKIT2-SANDBOXING
           GTK-3-22 GTK-3-20 GTK-3-18 GTK-3-16 GTK-3-14 GTK-3-12 GTK-3-10
           GTK-3-8 GTK-3-6 GTK-3-4 GTK GDK-3-22 GDK-3-20 GDK-3-18 GDK-3-16
           GDK-3-14 GDK-3-12 GDK-3-10 GDK-3-8 GDK-3-6 GDK-3-4 CAIRO-1-10
           CAIRO-1-12 GDK-PIXBUF GLIB-2-30 GLIB-2-32 GLIB-2-34 GLIB-2-36
           GLIB-2-38 GLIB-2-40 GLIB-2-42 GLIB-2-44 GLIB-2-46 GLIB-2-48
           GLIB-2-50 GLIB-2-52 GLIB-2-54 GLIB-2-56 GLIB-2-58 GLIB
           FSET-EXT-STRINGS SWANK PLUMP-UTF-32 DECLARE-TYPES PARENSCRIPT
           NAMED-READTABLES OSICAT-FD-STREAMS 21BIT-CHARS CL-FAD CHUNGA
           FLAT-NAMESPACE X86-64 UNIX CFFI FLAT-NAMESPACE FLEXI-STREAMS
           CLOSER-MOP SB-BSD-SOCKETS-ADDRINFO SPLIT-SEQUENCE CL-PPCRE-UNICODE
           CL-UNICODE CL-PPCRE BORDEAUX-THREADS SEQUENCE-EMPTYP ASDF3.3 ASDF3.2
           ASDF3.1 ASDF3 ASDF2 ASDF OS-UNIX NON-BASE-CHARS-EXIST-P ASDF-UNICODE
           X86-64 GENCGC 64-BIT ANSI-CL COMMON-LISP ELF IEEE-FLOATING-POINT
           LINUX LITTLE-ENDIAN PACKAGE-LOCAL-NICKNAMES SB-CORE-COMPRESSION
           SB-LDB SB-PACKAGE-LOCKS SB-THREAD SB-UNICODE SBCL UNIX)

Cheers!

@Ambrevar
Member Author

Ambrevar commented May 20, 2020 via email

@Ambrevar
Member Author

Fixed in b24e9a3.
You'll need to guix pull first (after commit 0fadc00a1a5bfbb6ca9caf1bfae06e839684843c).

Feel free to reopen if the issue persists.


5 participants