ReadStream copies values into memory too soon #17
Comments
Definitely can't take a global write lock. Maybe there's a way to change the cache structure (significantly) to allow per-key synchronization. I bet a smarter play would be to siphon into a buffer off-cache, making regular checkpoints to ensure we don't exceed the max, and then atomically moving the siphoned data into the cache on completion. Something along these lines... I think it's feasible. Give me a moment to think on it.
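A minimal sketch of that "siphon into an off-cache buffer, checkpoint against the max, then commit atomically" idea might look like the following. The `cache` type, its fields, and `siphonReader` are hypothetical illustrations rather than diskv's actual internals, and eviction of older entries is omitted.

```go
package sketch

import (
	"bytes"
	"io"
	"sync"
)

// cache is a stand-in for an in-memory value cache with a byte budget.
type cache struct {
	mu      sync.Mutex
	entries map[string][]byte
	size    uint64
	max     uint64
}

// siphonReader wraps the on-disk file. As the caller reads, it copies the
// bytes into an off-cache staging buffer, checkpointing against the budget
// so the copy (not the read) is abandoned if the value is too large.
type siphonReader struct {
	r        io.ReadCloser // underlying file
	c        *cache
	key      string
	buf      bytes.Buffer // off-cache staging area
	overflow bool         // true once the value is too big to cache
}

func (s *siphonReader) Read(p []byte) (int, error) {
	n, err := s.r.Read(p)
	if n > 0 && !s.overflow {
		s.buf.Write(p[:n])
		// Checkpoint: stop staging if we would exceed the cache budget.
		if uint64(s.buf.Len()) > s.c.max {
			s.overflow = true
			s.buf.Reset()
		}
	}
	if err == io.EOF && !s.overflow {
		// Completion: atomically move the staged copy into the cache.
		// Nothing touched the shared cache until the value was complete,
		// so other readers never observe a partial entry.
		s.c.mu.Lock()
		s.c.entries[s.key] = s.buf.Bytes()
		s.c.size += uint64(s.buf.Len())
		s.c.mu.Unlock()
	}
	return n, err
}

func (s *siphonReader) Close() error { return s.r.Close() }
```

The redundant-work concern still applies: two concurrent readers of the same uncached key would each stage their own copy, and the last one to hit EOF wins.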
Yeah, that's kind of the direction I was going. The main thing I don't like about that is that you then potentially have multiple readers trying to generate redundant cache entries alongside each other. But maybe I'm overoptimising. In the meantime, I verified that the existing behaviour is already kinda racy :-/
Haha, whoops! :person_frowning:
Partially addresses #17. Big thanks to @jonboulle for spotting that head-slapper.
@jonboulle just to be clear, this is already fixed for your specific use-case with #21, correct?
@peterbourgon well technically I would say that for our specific use case we are just sidestepping the problem :-). I would still love to see this fixed and am kind of mulling on it in the back of my brain - if you wouldn't mind leaving it open a bit longer maybe I can come up with a patch...
Understood. Yeah, I'm mulling it too, just not finding quite as much mull-time as I'd like :) |
Is there any plan to address this issue? We've found that setting a `CacheSizeMax` results in using a lot more heap memory, and ultimately it still doesn't limit the on-disk cache across restarts.
As currently implemented, `Diskv.ReadStream` is not a purely streaming reader, because it buffers an entire copy of the value (in the `siphon`) before attempting to put the value into the cache. Unfortunately, with large values this is a recipe for memory exhaustion.

Ideally we would stream the value directly into the cache as the `io.ReadCloser` that `ReadStream` returns is consumed, checking the cache size as we go. I started to go down this path, but it creates another race condition, because writing into the cache is then not atomic: we cannot know when the `ReadCloser` will be finished consuming the entry, and it's very possible for others to begin `Read`ing the same key-value pair while we're still writing it. So the next step down that road is for readers to actually take a lock per cache entry (which would then get released once the caller `Close`s the `ReadCloser`). This quickly became a web of mutexes and synchronisation hacks which felt very unidiomatic for Go.

Various simple "solutions" exist (e.g. prechecking the size of the file against the cache size before starting to siphon), but they are all inherently racy and could still lead to memory exhaustion under stressful conditions. (We could also just take a global write lock during reads, but that wouldn't be very nice to other readers, would it?)
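For concreteness, here is a minimal sketch of that "stream directly into the cache" direction, again using a hypothetical `cache` type rather than diskv's real internals; the comment marks where atomicity is lost.

```go
package sketch

import (
	"io"
	"sync"
)

type cache struct {
	mu      sync.Mutex
	entries map[string][]byte
	max     uint64
}

// streamingReader appends to the cache entry on every Read, so no full copy
// is held outside the cache. The catch: we only know the value is complete
// when the caller reaches EOF (or calls Close), so insertion is not atomic.
type streamingReader struct {
	r        io.ReadCloser
	c        *cache
	key      string
	overflow bool // value exceeded the budget; stop caching it
}

func (s *streamingReader) Read(p []byte) (int, error) {
	n, err := s.r.Read(p)
	if n > 0 && !s.overflow {
		s.c.mu.Lock()
		entry := append(s.c.entries[s.key], p[:n]...)
		if uint64(len(entry)) > s.c.max {
			// Too big to cache after all; drop the partial entry.
			s.overflow = true
			delete(s.c.entries, s.key)
		} else {
			// RACE: another goroutine Read-ing this key right now sees a
			// truncated value, because the entry is visible before the
			// caller has finished consuming the ReadCloser. Avoiding that
			// means holding a per-entry lock until Close, i.e. the web of
			// mutexes described above.
			s.c.entries[s.key] = entry
		}
		s.c.mu.Unlock()
	}
	return n, err
}

func (s *streamingReader) Close() error { return s.r.Close() }
```

Even this sketch takes the cache mutex on every Read, which hints at why the per-entry locking variant grew so hairy.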
@peterbourgon @philips thoughts?