Socket errors with unixsocket under load #1281
Comments
@nickbabcock thanks for the thorough bug report, we'll dig into this. |
I took another look at this issue, using the latest Jetty master and jnr 0.18. This might be a lead: I was able to cut out haproxy by using a recent version of curl (7.40+). Have two shells running:
while [ 1 ]; do
    curl --unix-socket /var/run/jetty.sock 'http://localhost' -d @<large-file> >/dev/null
done
The two will run side by side for an indeterminate amount of time before one of them hangs indefinitely, while the other continues to run without any problems. Which one hangs is indeterminate. |
@nickbabcock just a note to confirm that we can reproduce the problem. However, there are no indications yet as to exactly what the problem is. It does not look like a simple deadlock or anything similar. But we should be able to work out a bit more now that we can reproduce it. |
@nickbabcock I have updated the Java test client to operate in the same style as curl, but it fails to reproduce the problem. So I'm not sure where in the conversation it is hung.... I wonder if there is an equivalent of Wireshark for unix sockets? |
Using --verbose on curl, I see now that it is hanging at a connect, as the last line logged is the very first line:
So either the connection is not being received by Jetty, is not being noticed by jetty, or is noticed but somehow is lost. hhmmmmmmm.... |
Still struggling to find anything in Jetty: I've tried both async and blocking accepts; I've tried looping on accept to ensure all connections are accepted, but no go. I'm concerned it may be a bug in JNR... let me ask them to see if they can shed light... |
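For readers unfamiliar with "looping on accept", here is a minimal sketch of the idea. It uses a plain `java.nio` TCP listener on loopback as a stand-in, not Jetty's or JNR's code; the class and method names (`AcceptDrain`, `drain`, `acceptAll`) are hypothetical. On a non-blocking channel, `accept()` returns `null` once the queue is empty, so a single readiness event can be drained completely instead of taking one connection per wakeup.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.SocketAddress;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only: TCP on loopback, not a unix socket.
final class AcceptDrain {
    /** Accepts every pending connection; non-blocking accept() returns null when none is left. */
    static int drain(ServerSocketChannel server, List<SocketChannel> sink) throws IOException {
        int count = 0;
        SocketChannel client;
        while ((client = server.accept()) != null) {
            client.configureBlocking(false);
            sink.add(client);
            count++;
        }
        return count;
    }

    /** Connects `clients` sockets to a fresh listener and returns how many were accepted. */
    static int acceptAll(int clients, long timeoutMillis) {
        List<SocketChannel> open = new ArrayList<>();
        try (Selector selector = Selector.open();
             ServerSocketChannel server = ServerSocketChannel.open()) {
            server.bind(new InetSocketAddress(InetAddress.getLoopbackAddress(), 0), clients);
            server.configureBlocking(false);
            server.register(selector, SelectionKey.OP_ACCEPT);
            SocketAddress address = server.getLocalAddress();
            for (int i = 0; i < clients; i++) {
                open.add(SocketChannel.open(address)); // blocking connect
            }
            int total = 0;
            long deadline = System.currentTimeMillis() + timeoutMillis;
            while (total < clients && System.currentTimeMillis() < deadline) {
                selector.select(100);
                selector.selectedKeys().clear();
                total += drain(server, open); // take everything that is queued
            }
            return total;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            for (SocketChannel c : open) {
                try { c.close(); } catch (IOException ignored) { }
            }
        }
    }

    public static void main(String[] args) {
        System.out.println("accepted: " + acceptAll(3, 2000));
    }
}
```

As the comment above reports, draining like this did not make the hang go away in Jetty, so the sketch only shows the technique that was tried, not a fix.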
@gregw check your lsof / file descriptors / open files / ulimit settings. It sounds like you are running low. |
I'm configured for 80000 FDs, and the most unix domain sockets I have seen while running these tests is about 580. A single busy client never fails, but one client will fail soon after a second test is started. I'm currently using two client loops:
while :; do curl -v --unix-socket /tmp/jetty.sock http://localhost/ ; done
and
while :; do echo -e "GET / HTTP/1.1\r\nHost: socket\r\n\r\n" | netcat -U /tmp/jetty.sock ; done
Running two curl clients, one or the other will hang within seconds; running two netcat clients, I never see a failure; running a netcat and a curl, the curl will fail within seconds. So it could still be a curl problem.... |
I don't think it is curl code because load testing tools like wrk and k6 hang / error as well (but this is with haproxy in the middle). |
I can't make the following pure JNR server fail:
/*
* This file is part of the JNR project.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package org.eclipse.jetty.unixsocket;

import jnr.enxio.channels.NativeSelectorProvider;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.util.Set;
import java.util.Iterator;
import java.util.logging.Level;
import java.util.logging.Logger;
import org.eclipse.jetty.util.StringUtil;
import org.eclipse.jetty.util.TypeUtil;
import jnr.unixsocket.UnixServerSocket;
import jnr.unixsocket.UnixServerSocketChannel;
import jnr.unixsocket.UnixSocketAddress;
import jnr.unixsocket.UnixSocketChannel;

public class JNRServer
{
    public static void main(String[] args) throws IOException
    {
        java.io.File path = new java.io.File("/tmp/jetty.sock");
        path.deleteOnExit();
        UnixSocketAddress address = new UnixSocketAddress(path);
        UnixServerSocketChannel channel = UnixServerSocketChannel.open();
        try
        {
            Selector sel = NativeSelectorProvider.getInstance().openSelector();
            channel.configureBlocking(false);
            channel.socket().bind(address);
            channel.register(sel,SelectionKey.OP_ACCEPT,new ServerActor(channel,sel));
            // Dispatch each ready key to its attached Actor
            while (sel.select() > 0)
            {
                Set<SelectionKey> keys = sel.selectedKeys();
                Iterator<SelectionKey> iterator = keys.iterator();
                while (iterator.hasNext())
                {
                    SelectionKey k = iterator.next();
                    Actor a = (Actor)k.attachment();
                    if (!a.rxready())
                    {
                        k.cancel();
                    }
                    iterator.remove();
                }
            }
        }
        catch (IOException ex)
        {
            Logger.getLogger(UnixServerSocket.class.getName()).log(Level.SEVERE,null,ex);
        }
        System.out.println("UnixServer EXIT");
    }

    static interface Actor
    {
        public boolean rxready();
    }

    // Accepts one connection per readiness event and registers it for reads
    static final class ServerActor implements Actor
    {
        private final UnixServerSocketChannel channel;
        private final Selector selector;

        public ServerActor(UnixServerSocketChannel channel, Selector selector)
        {
            this.channel = channel;
            this.selector = selector;
        }

        public final boolean rxready()
        {
            try
            {
                UnixSocketChannel client = channel.accept();
                client.configureBlocking(false);
                client.register(selector,SelectionKey.OP_READ,new ClientActor(client));
                return true;
            }
            catch (IOException ex)
            {
                return false;
            }
        }
    }

    // Reads the request and answers with a fixed response once the headers are complete
    static final class ClientActor implements Actor
    {
        String request = "";
        String response = "HTTP/1.1 200 OK\r\n"
            + "Content-Length: 14\r\n"
            + "Content-Type: text/plain\r\n"
            + "Connection: close\r\n"
            + "\r\n"
            + "Hello World!\r\n";
        private final UnixSocketChannel channel;

        public ClientActor(UnixSocketChannel channel)
        {
            this.channel = channel;
        }

        public final boolean rxready()
        {
            try
            {
                ByteBuffer buf = ByteBuffer.allocate(1024);
                while (true)
                {
                    buf.clear();
                    int n = channel.read(buf);
                    UnixSocketAddress remote = channel.getRemoteSocketAddress();
                    System.err.printf("Read in %d bytes from %s\n",n,remote);
                    if (n == 0)
                        return true;
                    if (n < 0)
                        return false;
                    buf.flip();
                    request += new String(buf.array(),buf.arrayOffset(),buf.remaining());
                    System.err.println(TypeUtil.toHexString(request.getBytes()));
                    if (request.endsWith("\r\n\r\n"))
                    {
                        System.err.println("Read request:");
                        System.err.println(request);
                        channel.write(ByteBuffer.wrap(response.getBytes()));
                        channel.shutdownOutput();
                    }
                    return true;
                }
            }
            catch (IOException ex)
            {
                ex.printStackTrace();
                return false;
            }
        }
    }
}
That is, it does not fail with any client. So I guess it does indicate something in Jetty, but I cannot see what we are doing differently. |
Calling UnixSocketConnector.setAcceptQueueSize(65530) seems to make the issue go away. Lower values could work too, but I did not test them. @gregw @nickbabcock could you confirm? |
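To illustrate why the accept queue size can matter, here is a rough sketch of what happens when a listener's backlog fills up. It is an assumption-laden stand-in: it uses a TCP `java.net.ServerSocket` on loopback rather than a unix socket, it does not touch Jetty or JNR, and the names `BacklogDemo` and `connectsWithoutAccept` are hypothetical. The server never calls `accept()`, so connections pile up in the kernel queue until a further connect attempt stalls and times out.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only: TCP on loopback, not a unix socket.
final class BacklogDemo {
    /** Counts how many of `attempts` connects succeed while the server never accepts. */
    static int connectsWithoutAccept(int backlog, int attempts, int timeoutMillis) {
        List<Socket> clients = new ArrayList<>();
        try (ServerSocket server = new ServerSocket(0, backlog, InetAddress.getLoopbackAddress())) {
            InetSocketAddress address =
                new InetSocketAddress(server.getInetAddress(), server.getLocalPort());
            int connected = 0;
            for (int i = 0; i < attempts; i++) {
                Socket s = new Socket();
                clients.add(s);
                try {
                    s.connect(address, timeoutMillis);
                    connected++;
                } catch (IOException stalled) {
                    break; // queue is full (or the connect failed): stop counting
                }
            }
            return connected;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            for (Socket s : clients) {
                try { s.close(); } catch (IOException ignored) { }
            }
        }
    }

    public static void main(String[] args) {
        // The exact count is OS-dependent, so no expected value is given here.
        System.out.println("connected with backlog 1: " + connectsWithoutAccept(1, 4, 300));
    }
}
```

How many connects get through before one stalls depends on the operating system and socket type, so the sketch is only meant to show the mechanism, not to reproduce the numbers reported in this thread.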
I have found an issue in the JNR implementation that I believe could be responsible for these failures; however, I'm not sure how it relates to your queue size finding (which might just avoid the problem rather than fix it). I have submitted a PR to JNR, but there has been no activity on that project for some time and the PR has not been accepted. However, we have worked around the bug in commit 1921220, which was part of #2014 and included in jetty-9.4.9.v20180320. So can you try this release? |
I verified that I tested jetty-9.4.9.v20180320 originally, and ran the tests again just to make sure, so the two issues are separate. However, as PHP workers behind a unix socket need the same tweak, I do not really think this is a problem, just something that needs a line in the UnixSocketConnector docs. Raising net.core.somaxconn is probably needed too. For the record, with a queue size of 1 I needed 3 curls for one of them to freeze; with a queue size of 2 I needed 4. |
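On the net.core.somaxconn remark: on Linux, a backlog passed to listen() that exceeds /proc/sys/net/core/somaxconn is silently capped to that value, so a large accept queue size may not take effect unless the sysctl is raised as well. A small sketch of that arithmetic follows; the class and method names (`SomaxconnCap`, `effectiveBacklog`, `readSomaxconn`) are hypothetical, and the fallback value is an assumption used when the file cannot be read (for example on non-Linux systems).

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper: shows how the kernel limit caps a requested backlog on Linux.
final class SomaxconnCap {
    /** The backlog the kernel will actually use: the smaller of the two values. */
    static int effectiveBacklog(int requested, int somaxconn) {
        return Math.min(requested, somaxconn);
    }

    /** Reads net.core.somaxconn, or returns `fallback` if it is unavailable. */
    static int readSomaxconn(int fallback) {
        Path path = Paths.get("/proc/sys/net/core/somaxconn");
        try {
            return Integer.parseInt(new String(Files.readAllBytes(path)).trim());
        } catch (IOException | NumberFormatException e) {
            return fallback;
        }
    }

    public static void main(String[] args) {
        int limit = readSomaxconn(128); // 128 is only a fallback assumption
        System.out.println("somaxconn = " + limit);
        System.out.println("requested 65530 -> effective " + effectiveBacklog(65530, limit));
    }
}
```

If the effective value printed is lower than the requested one, raising the sysctl would be needed for the larger queue size to apply.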
Excellent, I load tested again (this time simplified using jetty, socat, and wrk) and I see no errors 😄 Thanks for the hard work, working around that bug in jnr was incredible! |
I have been experimenting with Jetty's unix socket support (using 9.4.0 and c7c183c) and have received a few errors when putting the setup under load (low volume works fine). For instance, in a Jersey app I've seen this stack trace:
so I decided to isolate using just jetty code.
Given a simple echo server that uses unix sockets:
And given the partial config for an ssl terminating HAProxy setup:
Stress testing the setup from another box:
where wrk1M.lua has contents:
wrk reports there are socket errors:
Also I see the following (single) warning in the jetty log:
To me there appear to be a few possibilities:
jnr library