High CPU usage - Socket never closed #123

Open
paillardf opened this issue Dec 18, 2024 · 11 comments

Comments

@paillardf

Hello,
It seems that the production version used in Rethink DNS (ee0a5ac) has an issue with sockets not being closed. After a day of use, the CPU usage reaches 50% even though no traffic is being generated. When we profile the process, this usage appears to originate from the Go code in Firestack.

When we release the library, a large number (100+) of socket closure logs are generated:
onSocketClosed: SocketSummary{Proto:udp,ID:13d815e967215ae5,PID:Base,UID:-1,Target:192.0.2.255,Rx:0,Tx:983224,Duration:5979,Rtt:0,Msg:read udp 192.0.2.1:53449->192.0.2.255:32761: use of closed network connection,Dup:true,}

The library is configured as follows in our project:

addDnsTransport(it, dnsServer)  
Intra.setSystemDNS(it, systemDns.joinToString(","))  
it.setRoute(InternetProtocol.IPv46.value())  
it.setTunMode(  
    Settings.DNSModePort,  
    Settings.BlockModeFilter,  
    Settings.PtModeAuto  
)  
it.resolver.translate(false)  

Are we supposed to monitor and close some sockets ourselves? Or is it a bug in the library?
Thanks in advance for your support.

@ignoramous (Contributor) commented Dec 26, 2024

Thanks.

we release the library, a large number (100+) of socket closure logs are generated

Netstack had a bug with forever-open UDP sockets; we upstreamed a fix and synced it back to firestack (v055o; the next Rethink version will also carry this fix):

Unsure if ee0a5ac contains the tunnel.CloseConns("") API; an empty string closes all open (TCP/UDP) conns.
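
If it is present, a Go-side sketch of the call (anything beyond what's stated above is an assumption):

// t is the connected Tunnel; per the above, an empty csv of conn-ids
// closes all open TCP/UDP conns.
t.CloseConns("")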

There's a bug we haven't been able to fix, but it involves forever-open TCP conns (not UDP, and only over WireGuard). Or, it could be a bug in our connection tracker: it may be tracking an already-closed TCP connection.

After a day of use, the CPU usage reaches 50% even though no traffic is being generated

Strange. We never noticed this, but we do occasionally get reports from users complaining about battery drain.

If you can, please see if the CPU use goes away on the current tip (there have been some 600+ commits since).

Also, on the current tip, from client code, you can:

  1. intra.PrintStack() dumps all active goroutine stacktraces to stdout / logcat.
  2. intra.Tunnel.Stat() returns a full list of open sockets, their durations, and other metadata about the tunnel and the emulated tun device (GC stats, network stats, etc.) as a Java/Kotlin object (its .toString() will recursively print all its fields when called over the gomobile-generated JNI/FFI) or a Go struct; a sketch of both calls follows.
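
A hedged Go-side sketch of the two calls (over gomobile the same calls surface as Java/Kotlin methods; the error return on Stat() is an assumption, not confirmed API):

intra.PrintStack() // dump all active goroutine stacktraces to stdout / logcat

// t is the Tunnel returned by intra.Connect. Stat's exact return shape
// is an assumption here -- treat it as illustrative only.
if stat, err := t.Stat(); err == nil {
    log.Println(stat)
}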

Are we supposed to monitor and close some socket ourself?

Not really.

is it a bug in the library?

Possibly.

@paillardf (Author)

Thanks for your answer @ignoramous.
I'm trying to upgrade the library to the latest version to see if the leak is resolved, but I run into the following issue when a DNS request is made:

ipmap.go:369: W ipmap: Add: err resolving 1dot1dot1dot1.cloudflare-dns.com: lookup 1dot1dot1dot1.cloudflare-dns.com on localhost: no resolver
ipmap.go:213: W ipmap: Get: zero ips for 1dot1dot1dot1.cloudflare-dns.com:853
dot.go:300: W dot: (Preferred) no proxy for 1dot1dot1dot1.cloudflare-dns.com:853; choosing Base among [Base]
ipmap.go:369: W ipmap: Add: err resolving 1dot1dot1dot1.cloudflare-dns.com: lookup 1dot1dot1dot1.cloudflare-dns.com on localhost: no resolver
ipmap.go:213: W ipmap: Get: zero ips for 1dot1dot1dot1.cloudflare-dns.com
ipmap.go:369: W ipmap: Add: err resolving 1dot1dot1dot1.cloudflare-dns.com: lookup 1dot1dot1dot1.cloudflare-dns.com on localhost: no resolver
ipmap.go:203: W ipmap: Add: zero ips for 1dot1dot1dot1.cloudflare-dns.com
dot.go:188: W dot: tlsdial: (Preferred) nil conn/err for 1dot1dot1dot1.cloudflare-dns.com:853, ech? false; err? unknown network no ips
dot.go:326: W dot: ans? ;; opcode: QUERY, status: SERVFAIL, id: 49375
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 0

;; QUESTION SECTION:
;clients4.google.com.	IN	 A
 err(unknown network no ips) / ans(0)
cacher.go:370: W cache: barrier: hangover(k: clients4.google.com:1); discard ans (has? true)

My implementation is the same as before with ee0a5ac. Is there an example anywhere I can compare against to see what I'm supposed to change to use a DoT/DoH server now?
Thank you

@Lanius-collaris (Contributor)

@ignoramous

onSocketClosed: SocketSummary{Proto:udp,ID:13d815e967215ae5,PID:Base,UID:-1,Target:192.0.2.255,Rx:0,Tx:983224,Duration:5979,Rtt:0,Msg:read udp 192.0.2.1:53449->192.0.2.255:32761: use of closed network connection,Dup:true,}

Does firestack close a pair of net.Conn in one goroutine, and guarantee that these two net.Conn are not closed until the other forwarding goroutine stops?

@Lanius-collaris (Contributor)

lookup 1dot1dot1dot1.cloudflare-dns.com on localhost: no resolver

firestack needs a bootstrap DNS server if you don't pass IPs to AddDoTTransport()

@paillardf (Author)

lookup 1dot1dot1dot1.cloudflare-dns.com on localhost: no resolver

firestack needs a bootstrap DNS server if you don't pass IPs to AddDoTTransport()

How am I supposed to configure the bootstrap DNS server? I thought that was the point of calling Intra.setSystemDNS, but apparently it isn't anymore...

@paillardf (Author)

@ignoramous

onSocketClosed: SocketSummary{Proto:udp,ID:13d815e967215ae5,PID:Base,UID:-1,Target:192.0.2.255,Rx:0,Tx:983224,Duration:5979,Rtt:0,Msg:read udp 192.0.2.1:53449->192.0.2.255:32761: use of closed network connection,Dup:true,}

Does firestack close a pair of net.Conn in one goroutine, and guarantee that these two net.Conn are not closed until the other forwarding goroutine stops?

I'm not sure how to answer you here. All I have are the following logs and a CPU trace.
logcat.txt
cpu-simpleperf-20241216T104527.trace.zip

@ignoramous (Contributor) commented Jan 2, 2025

Does firestack close a pair of net.Conn in one goroutine, and guarantee that these two net.Conn are not closed until the other forwarding goroutine stops?

I think so. conntracker.Untrack should close the pair after both upload+download are done.

firestack/intra/common.go

Lines 229 to 237 in dc54c93

h.conntracker.Track(cid, local, remote)
defer h.conntracker.Untrack(cid)
uploadch := make(chan ioinfo)
go upload(cid, via, local, remote, uploadch)
dbytes, derr := download(cid, via, local, remote)
upload := <-uploadch

Hm, I think Go requires clients to set read/write deadlines on UDP sockets. Maybe that's what's keeping the conns up?
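
For reference, a generic stdlib sketch of that pattern (not firestack code):

import (
    "errors"
    "net"
    "time"
)

// readLoop is illustrative only: without the SetReadDeadline call,
// ReadFrom on an idle UDP conn blocks forever, pinning the goroutine
// and keeping the socket alive.
func readLoop(conn *net.UDPConn) {
    buf := make([]byte, 2048)
    for {
        _ = conn.SetReadDeadline(time.Now().Add(30 * time.Second))
        n, addr, err := conn.ReadFrom(buf)
        if err != nil {
            var ne net.Error
            if errors.As(err, &ne) && ne.Timeout() {
                return // idle long enough: stop and let the conn close
            }
            return // conn closed or fatal error
        }
        _, _ = n, addr // forward buf[:n] onward ...
    }
}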


Create a new default (bootstrap) DNS:

// NewDefaultDNS creates a new DefaultDNS resolver of type typ. For typ DOH,
// url scheme is http or https; for typ DNS53, url is ipport or csv(ipport).
// ips is a csv of ipports for typ DOH, and nil for typ DNS53.
func NewDefaultDNS(typ, url, ips string) (DefaultDNS, error) {

//dns53
intra.NewDefaultDNS(backend.DNS53, "1.1.1.2,8.8.4.4,9.9.9.10", "")

// doh
intra.NewDefaultDNS(backend.DOH, "https://dns.cloudflare-dns.com/dns-query", "1.0.0.1,1.1.1.1")

Pass its ref to intra.Connect:

func Connect(fd, mtu int, fakedns string, dtr DefaultDNS, bdg Bridge) (t Tunnel, err error) {
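
Putting the two calls together, a minimal sketch (fd, mtu, fakedns, and the Bridge implementation bdg are app-specific placeholders):

dtr, err := intra.NewDefaultDNS(backend.DNS53, "1.1.1.2,8.8.4.4", "")
if err != nil {
    log.Fatal(err)
}
// fd is the tun device's file descriptor; bdg implements Bridge.
t, err := intra.Connect(fd, mtu, fakedns, dtr, bdg)
if err != nil {
    log.Fatal(err)
}
_ = t // keep the Tunnel ref around for Stat(), CloseConns(), etc.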


We aren't really paying much attention to the APIs, tbh. So, if you want us to simplify/stabilize them or document them extensively, let us know!

@Lanius-collaris (Contributor)

Hm, I think golang requires clients to set rw deadlines on UDP sockets. May be that's keeping the conns up?

Deadlines are used to "generate" timeout errors to prevent infinite loops.

Does firestack close a pair of net.Conn in one goroutine, and guarantee that these two net.Conn are not closed until the other forwarding goroutine stops?

I think so. conntracker.Untrack should close the pair after both upload+download are done.

It seems not. core.CloseOp() may close a net.UDPConn immediately.

case *net.UDPConn:

@ignoramous (Contributor)

core.CloseOp() may close a net.UDPConn immediately.

For full duplex TCP, it closes read on one end, write on the other. No way to do so for UDP?

Deadlines are used to "generate" timeout errors to prevent infinite loops

True. I believe io.CopyBuffer, which is used for upload+download, loops infinitely until an error?

return io.CopyBuffer(dst, src, b)
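
(A generic sketch of the point, not firestack code; imports io, net, time elided. io.CopyBuffer returns only when src.Read errors or hits EOF, so the deadline is what turns idleness into an error that ends the copy. Note that a single absolute deadline bounds the whole copy; refreshing it per read takes a deadline-extending wrapper.)

func copyWithDeadline(dst io.Writer, src net.Conn, idle time.Duration) (int64, error) {
    // one absolute deadline: the copy errors out no later than `idle`
    // from now, whether or not traffic is still flowing
    _ = src.SetReadDeadline(time.Now().Add(idle))
    return io.CopyBuffer(dst, src, make([]byte, 32*1024))
}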

@paillardf (Author)

CPU usage seems to be under control with the latest version. I will keep an eye on it and let you know.

@Lanius-collaris (Contributor)

Deadlines are used to "generate" timeout errors to prevent infinite loops

True. I believe io.CopyBuffer, which is used for upload+download, loops infinitely until an error?

return io.CopyBuffer(dst, src, b)

Yes, but rwext, one of the net.Conns used by firestack, has a special Read():
https://github.com/golang/go/blob/go1.23.4/src/io/io.go#L398-L456

func (rw rwext) Read(b []byte) (n int, err error) {
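
(firestack's actual rwext isn't reproduced here; this is a generic sketch of the technique such a wrapper enables -- refreshing the read deadline before every Read, so an active conn never times out while an idle one eventually does. Imports net, time elided.)

type deadlineReader struct {
    net.Conn
    idle time.Duration
}

// Read refreshes the deadline on each call, so io.CopyBuffer keeps
// copying while traffic flows and errors out only after `idle` of quiet.
func (r deadlineReader) Read(b []byte) (int, error) {
    _ = r.Conn.SetReadDeadline(time.Now().Add(r.idle))
    return r.Conn.Read(b)
}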

core.CloseOp() may close a net.UDPConn immediately.

For full duplex TCP, it closes read on one end, write on the other. No way to do so for UDP?

In my gvisor-playground, one of the two goroutines (there may be more in some cases) checks a shared variable in addition to setting deadlines, and the other goroutine stores the time of the last WriteTo() in that shared variable.
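
A minimal sketch of that pattern (illustrative names, not the gvisor-playground code): each forwarding goroutine stamps a shared atomic after a successful WriteTo, and on a read timeout gives up only if that stamp is also stale.

import (
    "net"
    "sync/atomic"
    "time"
)

const idleTimeout = 2 * time.Minute

// forward copies src -> dst. lastActive is shared with the goroutine
// running the reverse direction; seed it with time.Now().UnixNano()
// before starting both goroutines.
func forward(src, dst net.PacketConn, peer net.Addr, lastActive *atomic.Int64) {
    buf := make([]byte, 65535)
    for {
        _ = src.SetReadDeadline(time.Now().Add(idleTimeout))
        n, _, err := src.ReadFrom(buf)
        if err != nil {
            if ne, ok := err.(net.Error); ok && ne.Timeout() {
                // our side is idle; stop only if the other direction
                // has also been quiet for idleTimeout
                if time.Since(time.Unix(0, lastActive.Load())) > idleTimeout {
                    return
                }
                continue
            }
            return // closed or fatal error
        }
        if _, err := dst.WriteTo(buf[:n], peer); err != nil {
            return
        }
        lastActive.Store(time.Now().UnixNano()) // "time of last WriteTo()"
    }
}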
