Description
I have an application that reads capnproto structures from the network and feeds them to ~600 goroutines, which analyze them (read-only) in parallel. This caused massive lock contention: according to perf, ReadLimiter's spinlock alone was responsible for 40% of cumulative CPU usage. Please see flamegraph.html.gz (compressed to keep GitHub from stripping the embedded JS).
Once we adopted 1b552dd, the application's CPU usage dropped more than 10x, from 60 cores to fewer than 6 cores.
I think this library might benefit from replacing spinlocks with sleeping locks. Spinlocks are generally not the best choice in userspace, where you can't guarantee that you keep the CPU while holding the lock. If your goroutine takes the lock and then gets descheduled, other application threads will burn CPU cycles pointlessly until both the kernel and the Go scheduler agree to give the CPU back to the goroutine holding the lock. Another disadvantage is that CPU usage correlates poorly with the amount of actual work done and can grow without bound.
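
To make the suggestion concrete, here is a minimal sketch of the two locking styles guarding the same shared counter. It is not the library's code: `budget`, `spinBudget`, `mutexBudget`, and `drain` are hypothetical names I made up for illustration, and the real ReadLimiter may be structured differently. The spinning version keeps every waiter runnable and retrying, while `sync.Mutex` lets a waiter be parked by the Go runtime once the lock stays contended.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
	"sync/atomic"
)

// budget is a hypothetical shared read budget (not the library's API).
type budget interface {
	take(n uint64) bool
}

// spinBudget guards the counter with a hand-rolled CAS spinlock.
type spinBudget struct {
	locked    int32
	remaining uint64
}

func (b *spinBudget) take(n uint64) bool {
	// Waiters loop here; Gosched yields, but the goroutine stays runnable
	// and keeps consuming CPU until it wins the CAS.
	for !atomic.CompareAndSwapInt32(&b.locked, 0, 1) {
		runtime.Gosched()
	}
	ok := b.remaining >= n
	if ok {
		b.remaining -= n
	}
	atomic.StoreInt32(&b.locked, 0)
	return ok
}

// mutexBudget guards the same counter with a sleeping lock.
type mutexBudget struct {
	mu        sync.Mutex
	remaining uint64
}

func (b *mutexBudget) take(n uint64) bool {
	b.mu.Lock() // a contended waiter can be parked instead of spinning
	defer b.mu.Unlock()
	if b.remaining < n {
		return false
	}
	b.remaining -= n
	return true
}

// drain runs `workers` goroutines that each try `attempts` one-unit takes
// and returns how many takes were granted in total.
func drain(b budget, workers, attempts int) uint64 {
	var granted uint64
	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for j := 0; j < attempts; j++ {
				if b.take(1) {
					atomic.AddUint64(&granted, 1)
				}
			}
		}()
	}
	wg.Wait()
	return granted
}

func main() {
	// 8 workers x 200 attempts = 1600 requests against a budget of 1000.
	fmt.Println(drain(&spinBudget{remaining: 1000}, 8, 200))
	fmt.Println(drain(&mutexBudget{remaining: 1000}, 8, 200))
}
```

Both variants grant exactly the same number of units, so the difference is only in what the waiters do while they wait. With a handful of goroutines the spinning version can look fine; the cost shows up when there are many more waiters than cores, as in my ~600 goroutine case.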