Frequent short-lived connections lead to OOM #1

Open
msakrejda opened this issue May 23, 2013 · 3 comments

@msakrejda

After a lot of activity like this:

May 23 18:32:54 ip-10-60-95-103 pg_logplexcollector: pg_logplexcollector 2013/05/23 18:32:54 client connects with identifier "c867499b-92f9-4ef3-8a08-95b089b17042"
May 23 18:32:54 ip-10-60-95-103 pg_logplexcollector: pg_logplexcollector 2013/05/23 18:32:54 Disconnect client: client disconnects
May 23 18:32:54 ip-10-60-95-103 pg_logplexcollector: pg_logplexcollector 2013/05/23 18:32:54 Disconnect client: client disconnects
May 23 18:32:54 ip-10-60-95-103 pg_logplexcollector: pg_logplexcollector 2013/05/23 18:32:54 client connects with identifier "c867499b-92f9-4ef3-8a08-95b089b17042"
May 23 18:32:54 ip-10-60-95-103 pg_logplexcollector: pg_logplexcollector 2013/05/23 18:32:54 Disconnect client: client disconnects
May 23 18:32:54 ip-10-60-95-103 pg_logplexcollector: pg_logplexcollector 2013/05/23 18:32:54 client connects with identifier "c867499b-92f9-4ef3-8a08-95b089b17042"
May 23 18:32:54 ip-10-60-95-103 pg_logplexcollector: pg_logplexcollector 2013/05/23 18:32:54 client connects with identifier "c867499b-92f9-4ef3-8a08-95b089b17042"
May 23 18:32:55 ip-10-60-95-103 pg_logplexcollector: pg_logplexcollector 2013/05/23 18:32:55 client connects with identifier "c867499b-92f9-4ef3-8a08-95b089b17042"
May 23 18:32:55 ip-10-60-95-103 pg_logplexcollector: pg_logplexcollector 2013/05/23 18:32:55 Disconnect client: client disconnects
May 23 18:32:55 ip-10-60-95-103 pg_logplexcollector: pg_logplexcollector 2013/05/23 18:32:55 Disconnect client: client disconnects
May 23 18:32:55 ip-10-60-95-103 pg_logplexcollector: pg_logplexcollector 2013/05/23 18:32:55 Disconnect client: client disconnects

I get this:

May 23 18:32:57 ip-10-60-95-103 pg_logplexcollector: throw: runtime: out of memory
May 23 18:32:57 ip-10-60-95-103 pg_logplexcollector: 
May 23 18:32:57 ip-10-60-95-103 pg_logplexcollector: goroutine 73591 [running]:
May 23 18:32:57 ip-10-60-95-103 pg_logplexcollector: reflect.unsafe_NewArray(0xf840079000, 0x521a78, 0x4, 0x2, 0x176, ...)
May 23 18:32:57 ip-10-60-95-103 pg_logplexcollector: #011/build/buildd/golang-1/src/pkg/runtime/iface.c:700 +0x67
May 23 18:32:57 ip-10-60-95-103 pg_logplexcollector: reflect.MakeSlice(0xf840079000, 0x591c08, 0x400000004, 0xf840079000, 0x591c08, ...)
May 23 18:32:57 ip-10-60-95-103 pg_logplexcollector: #011/usr/lib/go/src/pkg/reflect/value.go:1650 +0x16d
May 23 18:32:57 ip-10-60-95-103 pg_logplexcollector: encoding/asn1.parseField(0x591c08, 0xf856f8cdd0, 0x176, 0xf858bec7cc, 0x25000000027, ...)
May 23 18:32:57 ip-10-60-95-103 pg_logplexcollector: #011/usr/lib/go/src/pkg/encoding/asn1/asn1.go:640 +0x12e2
...

Perhaps the suicide exit to prevent bloat needs to happen sooner? This is on a machine with just 2GB of RAM.
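
To be concrete about what I mean by a faster suicide: a watchdog that exits the process once memory crosses some threshold and lets the supervisor restart it fresh. This is only a sketch of the idea; the 512MB limit, the 30-second interval, and the use of runtime.MemStats are my assumptions, not necessarily how the collector does it today.

package main

import (
	"log"
	"os"
	"runtime"
	"time"
)

// memoryWatchdog exits the process once the Go runtime has claimed more than
// limitBytes from the OS, so the process supervisor can restart a fresh copy.
func memoryWatchdog(limitBytes uint64, interval time.Duration) {
	var m runtime.MemStats
	for {
		runtime.ReadMemStats(&m)
		if m.Sys > limitBytes {
			log.Printf("watchdog: Sys=%d bytes exceeds limit of %d, exiting", m.Sys, limitBytes)
			os.Exit(1)
		}
		time.Sleep(interval)
	}
}

func main() {
	// Exit well before a 2GB box is exhausted; the threshold is a guess.
	go memoryWatchdog(512<<20, 30*time.Second)
	select {} // stand-in for the real accept loop
}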

@msakrejda (Author)

Happened again to this same DB. I'm kind of skeptical of more frequent suicide at this point. Could Go 1.1 be worth a shot here? Or, alternatively, using a worker pool rather than spawning a new goroutine per connection (kinda gross, but if that's where the problem is, it could help).
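
Roughly what I'm picturing for the worker pool is below. It's only a sketch: the unix socket path and the empty handleConnection are placeholders, not the collector's real code.

package main

import (
	"log"
	"net"
)

// handleConnection stands in for the real per-client work (reading frames,
// forwarding to logplex, etc.).
func handleConnection(c net.Conn) {
	defer c.Close()
}

func main() {
	ln, err := net.Listen("unix", "/tmp/pg_logplexcollector.sock")
	if err != nil {
		log.Fatal(err)
	}

	conns := make(chan net.Conn)

	// A fixed pool of workers bounds the number of concurrent handlers,
	// instead of one new goroutine per accepted connection.
	const nWorkers = 8
	for i := 0; i < nWorkers; i++ {
		go func() {
			for c := range conns {
				handleConnection(c)
			}
		}()
	}

	for {
		c, err := ln.Accept()
		if err != nil {
			log.Printf("accept error: %v", err)
			continue
		}
		conns <- c
	}
}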

@fdr (Member) commented May 24, 2013

I don't think that's where the problem is. Some discussion suggested the more likely thing to finger is the OS threads that are spawned to make blocking system calls.

Go 1.1 may help, but there's also some evidence it could hurt. Faster suicide may also help.

I told Dave Cheney I'd get around to making a test driver to help track Go bloat. I'm committed to that, but if you feel inclined to take it on yourself, of course feel free.
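
For the record, the kind of test driver I have in mind is nothing fancier than the below: hammer the listener with connect/write/close cycles and watch RSS over time. The socket path and the payload are placeholders, not the real protocol framing.

package main

import (
	"log"
	"net"
	"time"
)

func main() {
	for i := 0; ; i++ {
		c, err := net.Dial("unix", "/tmp/pg_logplexcollector.sock")
		if err != nil {
			log.Fatal(err)
		}
		// Mimic the connect/disconnect churn from the logs above: write a
		// token payload and hang up immediately.
		c.Write([]byte("hello"))
		c.Close()

		if i%1000 == 0 {
			log.Printf("completed %d connections", i)
			time.Sleep(10 * time.Millisecond)
		}
	}
}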

@msakrejda (Author)

Interesting. And yeah, perhaps. My plate is equally full at the moment, but maybe when things clear up...

Incidentally, I took a stab at quantifying the problem by polling ps periodically. This is what happened around the restart:

heroku@ip-10-60-95-103:~$ while true
do
  ps --no-headers -opid,start,rss,vsz -C pg_logplexcollector
  sleep 5
done
...
13869 23:23:21 193996 638164
13869 23:23:21 194260 638292
13869 23:23:21 194260 638420
13869 23:23:21 194520 638676
13869 23:23:21 194784 638804
13869 23:23:21 194784 638932
 3737 00:23:24     4  35920
 3737 00:23:24  4188 290120
 3737 00:23:24  4680 291656
 3737 00:23:24  5132 291912
 3737 00:23:24  5560 301324
 3737 00:23:24  5812 301452
 3737 00:23:24  6060 301708

So nearly 200MB of RSS (the rss and vsz columns above are in KiB) for this particular DB after an hour, and this is after the workload was tweaked to use more persistent (but still relatively short-lived) connections.
