Mitigation for remote snapshot filecache overflow #15077
❌ Gradle check result for 9b494b6: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?
Thanks for the effort, Finn!
Looking at the code:
Would it be possible to do this at 70-90% of capacity rather than at >100%? At >100% you are already too late, right?
❌ Gradle check result for 6f86ed4: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?
@andrross Tests that exceed the cache capacity have drastically fewer failed requests with 'System.gc()' and a small sleep. Is this appropriate for a mitigation, or is it preferable to just fail and make no attempt to prune the cache here?
@finnegancarroll Yeah, we shouldn't be trying to invoke GC manually. The better way to do this, if we need to do it at all, is to track the unclosed cloned objects in the parent via weak references, and then deterministically close everything when the parent is closed. I think that would have the same effect, because anything that is eligible for garbage collection must have already had the originating input stream closed.
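The weak-reference approach suggested above can be sketched roughly as follows. This is a minimal illustration, not the OpenSearch implementation: `TrackedInput`, `cloneInput()`, and `isClosed()` are hypothetical names introduced only for this example. The parent keeps a `WeakReference` to each clone, so tracking does not keep clones alive, and closing the parent closes every clone that is still reachable.

```java
import java.lang.ref.WeakReference;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for an input whose clones share one cached file. */
final class TrackedInput implements AutoCloseable {
    // Weak references: tracking a clone must not prevent it from being collected.
    private final List<WeakReference<TrackedInput>> clones = new ArrayList<>();
    private boolean closed = false;

    /** Creates a clone and registers it with this parent. */
    synchronized TrackedInput cloneInput() {
        if (closed) {
            throw new IllegalStateException("parent already closed");
        }
        // Drop entries whose clones have already been garbage collected.
        clones.removeIf(ref -> ref.get() == null);
        TrackedInput clone = new TrackedInput();
        clones.add(new WeakReference<>(clone));
        return clone;
    }

    synchronized boolean isClosed() {
        return closed;
    }

    /** Closes this input and every clone that is still reachable. */
    @Override
    public synchronized void close() {
        if (closed) {
            return; // idempotent
        }
        closed = true;
        for (WeakReference<TrackedInput> ref : clones) {
            TrackedInput clone = ref.get();
            if (clone != null) {
                clone.close();
            }
        }
        clones.clear();
    }
}
```

With this shape, cleanup happens at a known point (the parent's `close()`) instead of depending on when the garbage collector runs.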
❌ Gradle check result for 38b3b95: null Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?
❌ Gradle check result for f72d436: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?
Reverted to just failing requests on cache overflow.
TransferManager fails BlobFetchRequest on full cache Signed-off-by: Finn Carroll <[email protected]> (cherry picked from commit 8f34ce5) Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
TransferManager fails BlobFetchRequest on full cache (cherry picked from commit 8f34ce5) Signed-off-by: Finn Carroll <[email protected]> Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com> Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
TransferManager fails BlobFetchRequest on full cache Signed-off-by: Finn Carroll <[email protected]> (cherry picked from commit 8f34ce5) Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com> Signed-off-by: Andrew Ross <[email protected]>
…#15077) TransferManager fails BlobFetchRequest on full cache Signed-off-by: Finn Carroll <[email protected]>
Maybe I'm not looking in the right place, but did this make it into the 2.17.0 release? I can't find it in the release notes.
Looks like the backport was merged (#15761); maybe the release notes weren't updated? Where did you look?
Ah, I do see it now: https://github.com/opensearch-project/OpenSearch/releases/tag/2.17.0 I might have made a typo while searching, sorry!
Looks like this was backported to 2.17 but not to 2.x. @finnegancarroll, can you check? If required, please raise a manual backport and close the auto-backport PR.
…#15077) TransferManager fails BlobFetchRequest on full cache Signed-off-by: Finn Carroll <[email protected]> (cherry picked from commit 8f34ce5) Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com> Signed-off-by: Andrew Ross <[email protected]>
TransferManager fails BlobFetchRequest on full cache (cherry picked from commit 8f34ce5) Signed-off-by: Finn Carroll <[email protected]> Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com> Signed-off-by: Andrew Ross <[email protected]> Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Description
The remote snapshot file_cache does not strictly limit its disk usage to its configured capacity. This change causes subsequent requests to the file cache to fail when the cache overflows. See #11676 for more details.
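As a rough illustration of the behavior described above, the sketch below rejects new requests once tracked usage has gone past the configured capacity. It is an assumption-laden example, not the code in this PR: `CacheAdmissionGate`, `admit`, `release`, and the choice of `IllegalStateException` are all hypothetical, and the real check in `TransferManager` may use a different exception type and a different source for the usage figure.

```java
/** Hypothetical admission check: fail new requests while the cache is over capacity. */
final class CacheAdmissionGate {
    private final long capacityBytes;
    private long usageBytes = 0;

    CacheAdmissionGate(long capacityBytes) {
        this.capacityBytes = capacityBytes;
    }

    /**
     * Admits a fetch of the given size unless the cache has already overflowed.
     * The request that pushes usage past capacity is still admitted; only
     * subsequent requests fail, until usage drops back to capacity or below.
     */
    synchronized void admit(long blobBytes) {
        if (usageBytes > capacityBytes) {
            throw new IllegalStateException(
                "file cache over capacity: " + usageBytes + " > " + capacityBytes);
        }
        usageBytes += blobBytes;
    }

    /** Records that a previously admitted blob was evicted or released. */
    synchronized void release(long blobBytes) {
        usageBytes = Math.max(0, usageBytes - blobBytes);
    }

    synchronized long usage() {
        return usageBytes;
    }
}
```

Failing fast like this does not shrink the cache by itself; it only stops disk usage from growing further past the limit until entries are released.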
Related Issues
Mitigation for #11676
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.