Blocked indefinitely in copyIterateeToMVar #14
I have discovered another deadlock; this time I can actually reproduce it quite reliably on my local machine (Mac): https://gist.github.com/wereHamster/fbc3e53837007a5d6f8c43c0ecf7317e

It's essentially a websocket server and client in one. The client connects to the server, and the server starts sending messages (with increasing size) to the client. Compile with …. The number you see is the length of the message that was sent; it increases by 10 in each iteration. Quite often (>75% of the runs) the output stops after less than a second. When it stops, it's because the server is not sending any data and is blocked in …
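For readers without access to the gist, a minimal sketch of the setup described above might look like the following. This is my reconstruction against the plain `websockets` package, not the gist's actual code; the host, port, and message sizes are assumptions.

```haskell
{-# LANGUAGE OverloadedStrings #-}
module Main (main) where

import           Control.Concurrent (forkIO, threadDelay)
import           Control.Monad      (forM_, forever)
import qualified Data.Text          as T
import qualified Network.WebSockets as WS

-- Server: sends messages whose length grows by 10 on each iteration.
server :: WS.ServerApp
server pending = do
    conn <- WS.acceptRequest pending
    forM_ [1 :: Int ..] $ \i ->
        WS.sendTextData conn (T.replicate (i * 10) "x")

-- Client: prints the length of every message it receives.
client :: WS.ClientApp ()
client conn = forever $ do
    msg <- WS.receiveData conn
    print (T.length msg)

main :: IO ()
main = do
    _ <- forkIO (WS.runServer "127.0.0.1" 9160 server)
    threadDelay 100000  -- crude wait for the server to start listening
    WS.runClient "127.0.0.1" 9160 "/" client
```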
I can reproduce it reliably under Mac, but not under Linux. Should I file it as a bug with GHC (IO manager)?
I think we should try to reduce the example to something much smaller first -- you can't expect GHC devs to read through library code. I'll see if I can spot something fishy. I don't currently have a Mac though.
That would mean working through snap internals, then io-streams, then network :( I know the sending side is blocked inside the send(2) syscall, that much I was able to track down.
I tried to make roughly the same server/client setup with just …. Through ThreadScope I was able to find out that the sending thread is blocked in an MVar. If I remove the lock in ….

[ThreadScope screenshot]

This trace is with the lock in …. You see almost 100% activity in the first half of each green 'block'. That's both the sending and receiving side doing their job. Eventually the sending side (thread 4) is blocked on an MVar, and the activity slightly decreases. From then on the receiving thread (thread 6) is mostly active, until it, too, blocks on an MVar. With the lock in …
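The "lock" being discussed is presumably an MVar guarding the send path. A hedged illustration of that pattern follows; it is an assumption about the shape of the code, not the library's actual implementation.

```haskell
import           Control.Concurrent.MVar (MVar, newMVar, withMVar)
import qualified Data.Text               as T
import qualified Network.WebSockets      as WS

-- Every send takes the same MVar lock. If one send blocks inside the
-- kernel's send(2) because the socket buffer is full, any other thread
-- waiting for the lock shows up in ThreadScope as "blocked on an MVar".
newtype SendLock = SendLock (MVar ())

newSendLock :: IO SendLock
newSendLock = SendLock <$> newMVar ()

lockedSend :: SendLock -> WS.Connection -> T.Text -> IO ()
lockedSend (SendLock lock) conn msg =
    withMVar lock $ \() -> WS.sendTextData conn msg
```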
Are there still similar issues when you use a recent Snap (where we don't need the iteratee/MVar trick anymore)?
I compiled the code against stackage nightly 2017-08-25, which contains snap-{core,server}-1.0.3.0. Is that recent enough?
Yeah, that snapshot should no longer contain …
I'm confused, though, about which MVar the writing thread is blocking on, since I removed the one in …
Next insight: inside the websocket handler in the server, at the end I loop forever, reading from the connection and discarding the data (i.e. …).
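The exact code was lost from the quote above, but a drain loop of the kind described would plausibly look like this (a guess, not the author's code):

```haskell
import           Control.Monad      (forever)
import qualified Network.WebSockets as WS

-- Read frames from the connection forever and discard them.
drainConnection :: WS.Connection -> IO ()
drainConnection conn = forever $ do
    _ <- WS.receiveDataMessage conn
    return ()
```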
I noticed that I can prevent this deadlock by sending a ping along with each data frame, forcing the remote side to respond with a pong and thereby waking the local receiver thread up.
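A sketch of that workaround (assumed shape, not the author's exact code): send a ping right after each data frame so the peer replies with a pong.

```haskell
import qualified Data.Text          as T
import qualified Network.WebSockets as WS

-- Piggyback a ping on every data frame; the pong coming back wakes up the
-- receiving thread on this side.
sendWithPing :: WS.Connection -> T.Text -> IO ()
sendWithPing conn msg = do
    WS.sendTextData conn msg
    WS.sendPing conn (T.pack "keepalive")
```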
I have a strange bug which appears to cause the EL.head in copyIterateeToMVar to block indefinitely if the application is compiled with -threaded. I used print-style debugging to track that down: a print statement just before EL.head is executed, while a print statement just after is not.

I was not able to reproduce the bug on my Mac OS X development machine. The OS where the proxy runs is a Linux server. GHC 7.10.1.
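The print-style instrumentation described above amounts to something like the helper below (a generic sketch; `traced` is not from the actual code):

```haskell
import System.IO (hFlush, stdout)

-- Wrap a suspected call with flushed prints so the log shows which side of
-- the call the thread never returns from.
traced :: String -> IO a -> IO a
traced label action = do
    putStrLn (label ++ ": before") >> hFlush stdout
    result <- action
    putStrLn (label ++ ": after")  >> hFlush stdout
    return result
```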
The application is a rather simple websocket proxy: it is a snap server which accepts requests from web browsers and forwards the messages to one of the available backend servers (and messages from backend servers back to the correct clients).
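A rough sketch of that proxy shape (my reconstruction, with a placeholder backend host and port; the handler would be hooked into the Snap server, e.g. via websockets-snap):

```haskell
import           Control.Concurrent (forkIO)
import           Control.Monad      (forever)
import qualified Network.WebSockets as WS

-- Accept the browser connection, open a client connection to a backend,
-- then pump frames in both directions.
proxyApp :: WS.ServerApp
proxyApp pending = do
    browser <- WS.acceptRequest pending
    WS.runClient "backend.example" 9000 "/" $ \backend -> do
        _ <- forkIO $ forever $
            WS.receiveDataMessage browser >>= WS.sendDataMessage backend
        forever $
            WS.receiveDataMessage backend >>= WS.sendDataMessage browser
```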
The symptoms are that the server stops forwarding messages to the backend server, but messages from the backend server do make their way through the proxy to the browser. This happens only after the first message from the backend server passes through to the web browser. Until then the proxy is accepting messages just fine. A typical session looks like this ([C]lient, [P]roxy, [B]ackend):
However, if the client continuously sends messages through the proxy (say, every 100ms), then the bug sometimes disappears (but not always).
What's even more interesting is that the bug appears to depend on the pattern of messages that the backend sends to the client. I have different backends; for some, the bug shows in 100% of the attempts, and for others never, so there is a strong correlation between the backend used and the bug. The message pattern is different for each backend.
I suspect the problem is somewhere in Snap (or how websockets-snap interacts with snap), because:
… netstat), so the proxy (at least the snap part of it) is reading the data from the Linux kernel socket into the application memory.