-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve udp tunnel stability #406
base: master
Are you sure you want to change the base?
Conversation
share/tunnel/tunnel.go
Outdated
} | ||
} | ||
|
||
func (t *Tunnel) getSSHNoWait(ctx context.Context) ssh.Conn { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
isn't getSSH
equivalent when c != nil
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes you are right.
Remove the wait may also cause high cpu usage, I will revert this change.
share/tunnel/tunnel_in_proxy_udp.go
Outdated
return u.Errorf("inbound-udpchan: %w", err) | ||
u.Errorf("inbound-udpchan: %w", err) | ||
// wait for a short while for the ssh connection | ||
time.Sleep(time.Duration(100) * time.Millisecond) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
instead of sleeping, can use channels to know when the ssh connection is ready?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reason for adding sleep before is that, when the connection is lost, getSSH() will return nil immediately, and the runOutbound() loop will take up a lot of memory and cause the process to be killed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ohh the loop spins? I haven’t seen that happen before though that sounds like a bug, where the real fix might be to block with a waitgroup
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After connection lost, activatingConnWait()
cannot block, because t.activatingConn.DoneAll()
is called.
Removing DoneAll()
fixes the issue.
@@ -68,6 +68,10 @@ func New(c Config) *Tunnel { | |||
return t | |||
} | |||
|
|||
func (t *Tunnel) Close() { | |||
t.activatingConn.DoneAll() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
need a lock/unlock + nil check?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
activatingConn is a waitGroup, it is value type and thread safe, so no need for lock and nil checks
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah yea that's right, its a custom wait group (which might be a bad idea itself - anyway)
it should block while disconnected/connecting and it should not-block while connected
DoneAll here would stop blocking, but Tunnel Done means closed?
how does this improve UDP? sorry, I think I'm missing something
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the original code, when the connection is lost, DoneAll in the BindSSH method will be called, which causes activatingConnWait() to not work and cannot block.
This leads to high resource usage of runOutbound in udpListener. Calling u.getUDPChan(ctx)
will immediately gets nil instead of blocking.
Close
is added to correctly release resources in server mode, because DoneAll was removed from BindSSH.
In client mode, there is no need to call DoneAll because the same tunnel is always used.
#405