Expose Netty Connection Read Timeout #477

Open
wants to merge 6 commits into
base: develop

Conversation

exell-christopher

We were debugging a network connectivity issue and noticed that a setting for the socket timeout (SO_TIMEOUT) would be useful, so that unresponsive sockets time out instead of hanging forever.

I'm still racking my brain as to how best to test this, so for now all I have is validation tests that verify that the getters and setters work.

@broach
Contributor

broach commented Oct 6, 2014

SO_TIMEOUT would not affect anything here. We're using NIO (non-blocking) connections; there's nothing to time out on because the only time a socket is read from is after it signals it's ready for reading and then the read is non-blocking.

See: http://stackoverflow.com/questions/22892575/so-timeout-in-non-blocking-channel-in-netty

Early on I had written a handler as the Netty guys describe. I can't remember if it was when I was prototyping or if I just didn't include it for some reason.

What is causing you to have "hanging sockets"?
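The point about SO_TIMEOUT and non-blocking reads can be checked with plain `java.nio`, without Netty. The following is a minimal sketch, not code from the Riak client: the class and method names (`NonBlockingReadDemo`, `readFromSilentPeer`) are hypothetical. It connects to a loopback peer that never sends anything, sets SO_TIMEOUT on the socket, and performs one non-blocking read. The read reports zero bytes straight away instead of waiting for the timeout to expire.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;

// Hypothetical demo class; not part of the Riak Java client.
public class NonBlockingReadDemo {

    /** Outcome of a single non-blocking read attempt. */
    public static final class Result {
        public final int bytesRead;
        public final long elapsedMillis;

        Result(int bytesRead, long elapsedMillis) {
            this.bytesRead = bytesRead;
            this.elapsedMillis = elapsedMillis;
        }
    }

    /** Connects to a loopback peer that stays silent and tries to read once. */
    public static Result readFromSilentPeer(int soTimeoutMillis) {
        try (ServerSocketChannel server = ServerSocketChannel.open()) {
            // Port 0 asks the OS for any free port.
            server.bind(new InetSocketAddress(InetAddress.getLoopbackAddress(), 0));
            try (SocketChannel client = SocketChannel.open(server.getLocalAddress());
                 SocketChannel peer = server.accept()) {
                client.configureBlocking(false);
                // SO_TIMEOUT is set, but the channel is in non-blocking mode.
                client.socket().setSoTimeout(soTimeoutMillis);

                long start = System.nanoTime();
                int n = client.read(ByteBuffer.allocate(16)); // peer never writes
                long elapsed = (System.nanoTime() - start) / 1_000_000;
                return new Result(n, elapsed);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Result r = readFromSilentPeer(5_000);
        System.out.println("bytesRead=" + r.bytesRead
                + " elapsedMillis=" + r.elapsedMillis);
    }
}
```

Because the read never blocks, there is nothing for SO_TIMEOUT to interrupt. Detecting a silent peer on such a channel needs a separate timer, which is the role Netty's `ReadTimeoutHandler` plays.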

@exell-christopher
Author

We are seeing sockets block when the start TLS call is issued: execution blocks waiting on the decoder promise, which never completes.

io.netty.util.concurrent.DefaultPromise.await(DefaultPromise.java:260)
com.basho.riak.client.core.RiakNode.doGetConnection(RiakNode.java:730)
com.basho.riak.client.core.RiakNode.start(RiakNode.java:262)
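The trace above shows an unbounded `await()` on a promise. Netty's `Future` also offers `await(long, TimeUnit)`, which returns `false` if the promise has not completed in time. The sketch below illustrates the same idea with the JDK's `CompletableFuture` as a stand-in for the decoder promise. It is an assumption-laden illustration, not the client's code, and `BoundedWaitDemo` and `awaitWithin` are hypothetical names.

```java
import java.util.concurrent.CancellationException;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical demo class; CompletableFuture stands in for a Netty promise.
public class BoundedWaitDemo {

    /**
     * Waits at most timeoutMillis for the promise.
     * Returns true if it completed in time (successfully or not),
     * false if the wait timed out or the thread was interrupted.
     */
    public static boolean awaitWithin(CompletableFuture<?> promise, long timeoutMillis) {
        try {
            promise.get(timeoutMillis, TimeUnit.MILLISECONDS);
            return true;
        } catch (ExecutionException | CancellationException e) {
            return true; // the promise did complete, just not successfully
        } catch (TimeoutException e) {
            return false; // nobody completed the promise in time
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        CompletableFuture<Void> neverCompleted = new CompletableFuture<>();
        System.out.println("completed in time: " + awaitWithin(neverCompleted, 100));
    }
}
```

A caller that gets `false` back can close the channel and report a connection failure instead of hanging in `start()`.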

@exell-christopher
Author

I'm putting a read timeout handler in place right now, to leverage the NIO sockets.

@broach
Contributor

broach commented Oct 7, 2014

The line you reference (RiakNode.java:730) is waiting on the SSL handler's promise (well, erm, or something? In master it's a blank line but L729 does that). Are you trying to use SSL? Or have you just modified that file enough to push things down so that it's the normal connection wait?

Either way, you're in doGetConnection() so ... if something is blocking, it has nothing to do with reads. Either there's something not right with your SSL config (and for some reason we're not handling that error as we should), or you're actually blocking on the normal connect attempt which can be controlled by setting the connection timeout.

@exell-christopher
Author

It's not an issue obtaining the connection; it's once it gets into start TLS land. It's not related to the handshake failing, as that throws errors as expected.

@exell-christopher
Author

Updated to do read timeout w/async sockets. I'm debugging against it locally to see if it exposes more info about the TLS issues.

@exell-christopher
Author

With the added logging, I'm seeing what's below. It looks like a connection is being closed somewhere, or Riak isn't responding to the start TLS request. It's probably worthwhile leaving the timeout in place, if only to catch issues like this.

It does appear that there would need to be some sort of handler for the timeouts, but I wasn't sure how the handling would work with the current flow. Would a timeout surface as a failed operation, or just throw an exception?

APP | 2014-10-07 06:45:51,084 [main] DEBUG com.basho.riak.client.core.RiakNode - Waiting for new connection from channel future to :8087
APP | 2014-10-07 06:45:51,085 [main] DEBUG com.basho.riak.client.core.RiakNode - Connection to :8087 successful
APP | 2014-10-07 06:45:51,085 [main] DEBUG com.basho.riak.client.core.RiakNode - trustStore set starting TLS
APP | 2014-10-07 06:45:51,087 [main] DEBUG com.basho.riak.client.core.RiakNode - Using TLSv1.2
APP | 2014-10-07 06:45:51,087 [main] DEBUG com.basho.riak.client.core.RiakNode - Waiting for authentication to complete with :8087
APP | 2014-10-07 06:45:51,087 [nioEventLoopGroup-2-2] DEBUG c.b.r.c.c.netty.RiakSecurityDecoder - MyStartTlsDecoder Channel Active
APP | 2014-10-07 06:45:51,088 [nioEventLoopGroup-2-2] DEBUG c.b.r.c.c.netty.RiakSecurityDecoder - MyStartTlsDecoder Handler Added
APP | 2014-10-07 06:45:51,088 [nioEventLoopGroup-2-2] DEBUG c.b.r.c.c.netty.RiakSecurityDecoder - Received MSG_RpbStartTls reply from :8087
APP | 2014-10-07 06:46:01,094 [nioEventLoopGroup-2-2] WARN c.b.r.c.c.netty.RiakSecurityDecoder - SSLHandshake Failed with:8087.
javax.net.ssl.SSLException: handshake timed out
APP | 2014-10-07 06:46:45,631 [nioEventLoopGroup-2-1] WARN i.n.channel.DefaultChannelPipeline - An exceptionCaught() event was fired, and it reached at the tail of the pipeline. It usually means the last handler in the pipeline did not handle the exception.
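On the question of how a read timeout could surface: one option is to fail the pending operation's future, so that callers see a failed operation and no exception reaches the tail of the pipeline. The following JDK-only sketch illustrates that option under stated assumptions. `ReadTimeoutWatchdog` and its methods are hypothetical names, a `CompletableFuture` stands in for the client's operation, and this is not how the Riak client or Netty's `ReadTimeoutHandler` is implemented.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical sketch: fails a pending operation if no bytes arrive in time.
public class ReadTimeoutWatchdog implements AutoCloseable {
    private final ScheduledExecutorService timer =
            Executors.newSingleThreadScheduledExecutor();
    private final long timeoutMillis;
    private final CompletableFuture<byte[]> pending;
    private ScheduledFuture<?> deadline;

    ReadTimeoutWatchdog(long timeoutMillis, CompletableFuture<byte[]> pending) {
        this.timeoutMillis = timeoutMillis;
        this.pending = pending;
        arm();
    }

    /** (Re)starts the countdown; on expiry the operation is failed. */
    private synchronized void arm() {
        if (deadline != null) {
            deadline.cancel(false);
        }
        deadline = timer.schedule(() -> {
            pending.completeExceptionally(
                    new TimeoutException("no bytes read for " + timeoutMillis + " ms"));
        }, timeoutMillis, TimeUnit.MILLISECONDS);
    }

    /** Call whenever bytes arrive on the connection. */
    public void onRead() {
        arm();
    }

    /** Call when the full response has been decoded. */
    public synchronized void onResponse(byte[] response) {
        deadline.cancel(false);
        pending.complete(response);
    }

    @Override
    public void close() {
        timer.shutdownNow();
    }

    /** Runs one simulated operation and describes how it ended. */
    public static String run(long timeoutMillis, boolean peerResponds) {
        CompletableFuture<byte[]> op = new CompletableFuture<>();
        try (ReadTimeoutWatchdog watchdog = new ReadTimeoutWatchdog(timeoutMillis, op)) {
            if (peerResponds) {
                watchdog.onResponse(new byte[] {1});
            }
            try {
                op.get(timeoutMillis * 10 + 1_000, TimeUnit.MILLISECONDS);
                return "completed";
            } catch (ExecutionException e) {
                return e.getCause() instanceof TimeoutException ? "read timed out" : "failed";
            } catch (TimeoutException e) {
                return "watchdog never fired";
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return "interrupted";
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(run(200, false));
        System.out.println(run(200, true));
    }
}
```

In Netty itself, `ReadTimeoutHandler` fires an `exceptionCaught()` event carrying a `ReadTimeoutException` and closes the channel, so some handler later in the pipeline has to translate that event into a failed operation. Otherwise it ends up as the "reached at the tail of the pipeline" warning seen in the last log line.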

@exell-christopher changed the title from "Expose Netty Connection SO (read) Timeout" to "Expose Netty Connection Read Timeout" on Oct 7, 2014
@alexmoore
Contributor

@exell-christopher Is this still an issue for you?

@lukebakken
Contributor

FWIW, I added a TLS handshake timeout in basho/riak_api#119 in this code

@exell-christopher
Author

Thanks for adding that!

@lukebakken
Contributor

@exell-christopher - we may still wish to add the code in this PR. @alexmoore ???

@exell-christopher
Author

Unfortunately I don't currently have a project that is using Riak, so I can't comment on whether this is still needed, but I've found that having timeout settings exposed always seems to be useful.
