Add thread to periodically perform pending cache maintenance #2308
@owenhalpert The changes look good in general. What I'd be interested in is testing under load on a resource-constrained system. Can you verify whether this adds latency or impacts performance in any way? We did implement a force evict before writes with #2015. Can you also enable that feature flag and run the above tests to ensure it behaves well?
 * for more details. Thus, to perform any pending maintenance, the cleanUp method will be called periodically from a CacheMaintainer instance.
 */
public class CacheMaintainer<K, V> implements Closeable {
    private final Cache<K, V> cache;
You can also avoid maintaining the Cache object reference here by using a functional interface. That would also get rid of the generification of this class.
Simply pass and store the runnable reference instead of the cache as new CacheMaintainer(() -> cache.cleanUp());
Possibly also move this class to the util package and call it a ScheduledExecutor, with the Runnable reference and the interval as parameters.
public class ScheduledExecutor implements Closeable {
...
public ScheduledExecutor(Runnable reference, long scheduleMillis) {
...
}
...
}
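A minimal sketch of what this suggestion could look like. The class name and constructor signature follow the outline above; the body, the use of `scheduleAtFixedRate`, and the `close` behavior are assumptions for illustration and not necessarily the code merged in this PR.

```java
import java.io.Closeable;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Illustrative sketch: runs an arbitrary task at a fixed rate on one dedicated thread.
class ScheduledExecutor implements Closeable {
    private final ScheduledExecutorService executor;

    ScheduledExecutor(Runnable task, long scheduleMillis) {
        this.executor = Executors.newSingleThreadScheduledExecutor();
        // Assumption: the first run happens after one full interval, then repeats every interval.
        this.executor.scheduleAtFixedRate(task, scheduleMillis, scheduleMillis, TimeUnit.MILLISECONDS);
    }

    @Override
    public void close() {
        // Stops future runs and interrupts a run that is in progress.
        executor.shutdownNow();
    }
}
```

A cache would then construct it as `new ScheduledExecutor(() -> cache.cleanUp(), intervalMillis)` and call `close()` when the cache itself is closed or rebuilt. Because only a `Runnable` is stored, the class no longer needs the `K, V` type parameters.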
@kotwanikunal what do you think about creating the executor and calling scheduleAtFixedRate within each cache class instead of creating a new ScheduledExecutor class?
@kotwanikunal I've completed the benchmarking on a single node cluster limited to 3GB of memory. Before benchmarking my code, I validated the CacheMaintainer was actually running by inspecting the OpenSearch process status in the Docker container and found that after about a minute, the Results below:
This suggests there is no significant impact on latency with my code changes. I've included the full results of these test runs here: https://gist.github.com/owenhalpert/05ad4f5ae9577f717f2c59f2039d52e4
...in/java/org/opensearch/knn/quantization/models/quantizationState/QuantizationStateCache.java
src/main/java/org/opensearch/knn/index/util/ScheduledExecutor.java
src/main/java/org/opensearch/knn/index/memory/NativeMemoryCacheManager.java
Signed-off-by: owenhalpert <[email protected]>
…UTES Signed-off-by: owenhalpert <[email protected]> Signed-off-by: owenhalpert <[email protected]>
…UTES Signed-off-by: owenhalpert <[email protected]>
Minor comments; overall looks good.
.../org/opensearch/knn/quantization/models/quantizationState/QuantizationStateCacheManager.java
try {
    cache.cleanUp();
} catch (Exception e) {
    logger.error("Error cleaning up cache", e);
Why swallow the exception here?
Any exceptions from Guava cache operations would otherwise halt our scheduled executor. If an exception occurs here, it would be from Guava's internals rather than our logic, so per Dooyong's suggestion above we can log it and continue scheduling cleanup tasks.
This way we can ensure the cache maintenance keeps running even if individual cleanup attempts fail, and we can monitor Guava errors in the logs.
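The behavior described here can be reproduced with the JDK alone: `ScheduledExecutorService.scheduleAtFixedRate` suppresses all subsequent executions once a run throws. The sketch below is illustrative only; `GuardedCleanup`, `guarded`, and `countRuns` are hypothetical names, and the always-failing task stands in for a cleanup call that throws.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

final class GuardedCleanup {
    // Wraps a task so an exception is reported instead of escaping to the scheduler.
    static Runnable guarded(Runnable task) {
        return () -> {
            try {
                task.run();
            } catch (Exception e) {
                System.err.println("Error cleaning up cache: " + e);
            }
        };
    }

    // Schedules a task that fails on every run and returns how often it was invoked in about 300 ms.
    static int countRuns(boolean guard) {
        AtomicInteger runs = new AtomicInteger();
        Runnable failing = () -> {
            runs.incrementAndGet();
            throw new IllegalStateException("simulated cleanup failure");
        };
        ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor();
        executor.scheduleAtFixedRate(guard ? guarded(failing) : failing, 0, 10, TimeUnit.MILLISECONDS);
        try {
            Thread.sleep(300);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            executor.shutdownNow();
        }
        return runs.get();
    }
}
```

Calling `GuardedCleanup.countRuns(false)` shows the unguarded task running once and never again, while `GuardedCleanup.countRuns(true)` keeps running on schedule and logs each failure.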
Signed-off-by: owenhalpert <[email protected]>
 */
public ScheduledExecutor(Runnable task, long scheduleMillis) {
    this.task = task;
    this.executor = Executors.newSingleThreadScheduledExecutor();
This seems to create a new thread for each instance; I wanted to understand the thought process behind this.
I understand that this is used for caches, so the threads won't grow exponentially, but since it's in the util package and the name is generic, any misuse can create an unbounded number of threads, one per new instance. Can we be a bit more defensive here?
The suggestion is to refactor so that a fixed number of threads handles cache cleanup across all caches.
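One way to bound the thread count, sketched under assumptions: a single shared scheduler thread that every cache registers its cleanup task with. `SharedCacheMaintenance` and `schedule` are hypothetical names, and this is not necessarily the refactoring that was adopted in the PR.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

final class SharedCacheMaintenance {
    // One daemon thread serves every registered cache, however many instances exist.
    private static final ScheduledExecutorService SHARED =
        Executors.newSingleThreadScheduledExecutor(runnable -> {
            Thread thread = new Thread(runnable, "cache-maintenance");
            thread.setDaemon(true);
            return thread;
        });

    private SharedCacheMaintenance() {}

    // Registers a periodic cleanup task; the caller cancels the returned future on close.
    static ScheduledFuture<?> schedule(Runnable cleanup, long periodMillis) {
        return SHARED.scheduleAtFixedRate(cleanup, periodMillis, periodMillis, TimeUnit.MILLISECONDS);
    }
}
```

Each cache would keep the returned `ScheduledFuture` and call `cancel(false)` from its own `close` method. The trade-off is that a slow cleanup in one cache delays the others, since all tasks share the one thread.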
Description
Adds a ScheduledExecutor class that, for each cache, takes in a Runnable with a call to cleanUp and periodically executes it according to the values of KNN_CACHE_ITEM_EXPIRY_TIME_MINUTES and QUANTIZATION_STATE_CACHE_EXPIRY_TIME_MINUTES. This will perform any pending maintenance (such as evicting expired entries), which was previously only performed when the cache was accessed. The maintenance thread is created whenever a NativeMemoryCache or QuantizationStateCache is instantiated or rebuilt and can be shut down with either class's close method. Relevant logic for cleanup was added to some testing base classes.
Related Issues
Resolves #2239
Check List
Commits are signed per the DCO using --signoff.
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.