Add thread to periodically perform pending cache maintenance #2308

Open · wants to merge 4 commits into base: main

Conversation

@owenhalpert owenhalpert commented Dec 3, 2024

Description

Adds a ScheduledExecutor class that, for each cache, takes in a Runnable with a call to cleanUp and executes it periodically, at an interval determined by KNN_CACHE_ITEM_EXPIRY_TIME_MINUTES or QUANTIZATION_STATE_CACHE_EXPIRY_TIME_MINUTES respectively. This performs any pending maintenance (such as evicting expired entries) that was previously only performed when the cache was accessed. The maintenance thread is created whenever a NativeMemoryCache or QuantizationStateCache is instantiated or rebuilt, and can be shut down with either class's close method. Relevant cleanup logic was also added to some testing base classes.
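
For illustration, a minimal sketch of how a cache class might wire this up, assuming the ScheduledExecutor(Runnable task, long scheduleMillis) constructor that appears later in this PR; the class, field, and cache types below are illustrative only, not the PR's actual code:

import java.io.Closeable;
import java.io.IOException;
import java.util.concurrent.TimeUnit;

import com.google.common.cache.Cache;
import com.google.common.cache.CacheBuilder;

class ExampleCache implements Closeable {
    private final Cache<String, byte[]> cache;
    private final ScheduledExecutor cacheMaintainer;

    ExampleCache(long expiryMinutes) {
        this.cache = CacheBuilder.newBuilder()
            .expireAfterAccess(expiryMinutes, TimeUnit.MINUTES)
            .build();
        // Periodically run Guava's pending maintenance (e.g. evicting expired entries)
        // instead of relying on reads/writes to the cache to trigger it.
        this.cacheMaintainer = new ScheduledExecutor(cache::cleanUp, TimeUnit.MINUTES.toMillis(expiryMinutes));
    }

    @Override
    public void close() throws IOException {
        // Closing the cache also stops its maintenance thread.
        cacheMaintainer.close();
    }
}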

Related Issues

Resolves #2239

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • API changes companion pull request created.
  • Commits are signed per the DCO using --signoff.
  • Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@kotwanikunal
Member

@owenhalpert The changes look good in general. What I'd be interested in is testing under load on a resource-constrained system - can you verify whether this adds latency or impacts performance in any way?

We did implement a force evict before writes with #2015. Can you also enable this feature flag and run the above tests to ensure it behaves well?

 * for more details. Thus, to perform any pending maintenance, the cleanUp method will be called periodically from a CacheMaintainer instance.
 */
public class CacheMaintainer<K, V> implements Closeable {
    private final Cache<K, V> cache;
Member

You can also avoid maintaining the Cache object reference here by using a functional interface. That would also get rid of the generification of this class.

Simply pass and store the runnable reference instead of the cache, as new CacheMaintainer(() -> cache.cleanUp());

Possibly also move this class to the util package and call it a ScheduledExecutor with Runnable ref and interval as parameters.

public class ScheduledExecutor implements Closeable {

...

  public ScheduledExecutor(Runnable reference, long scheduleMillis) {
  ...
  }

  ...
}
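
A filled-in version of that skeleton could look roughly like the following; this is a sketch only, assuming a fixed-rate schedule and a plain shutdown on close, and may differ from the code ultimately merged in this PR:

import java.io.Closeable;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class ScheduledExecutor implements Closeable {
    private final ScheduledExecutorService executor;

    public ScheduledExecutor(Runnable task, long scheduleMillis) {
        this.executor = Executors.newSingleThreadScheduledExecutor();
        // Repeatedly run the supplied task (e.g. cache::cleanUp). An uncaught exception
        // thrown by the task suppresses further executions, so the task itself should
        // catch and log failures.
        executor.scheduleAtFixedRate(task, scheduleMillis, scheduleMillis, TimeUnit.MILLISECONDS);
    }

    @Override
    public void close() {
        executor.shutdown();
    }
}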

Author

@kotwanikunal what do you think about creating the executor and calling scheduleAtFixedRate within each cache class instead of creating a new ScheduledExecutor class?
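
For reference, the inlined alternative being floated here might look roughly like this; the class and field names are illustrative only, not code from the PR:

import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

import com.google.common.cache.Cache;
import com.google.common.cache.CacheBuilder;

class InlineMaintenanceCache implements AutoCloseable {
    private final Cache<String, byte[]> cache = CacheBuilder.newBuilder()
        .expireAfterAccess(60, TimeUnit.MINUTES)
        .build();
    // The cache class owns its scheduler directly; no separate ScheduledExecutor class.
    private final ScheduledExecutorService maintenanceExecutor = Executors.newSingleThreadScheduledExecutor();

    InlineMaintenanceCache(long intervalMinutes) {
        maintenanceExecutor.scheduleAtFixedRate(cache::cleanUp, intervalMinutes, intervalMinutes, TimeUnit.MINUTES);
    }

    @Override
    public void close() {
        maintenanceExecutor.shutdown();
    }
}

The tradeoff is duplicating the scheduling boilerplate in each cache class versus one extra utility class; the PR as described keeps the separate ScheduledExecutor.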

@owenhalpert
Author

@kotwanikunal I've completed the benchmarking on a single-node cluster limited to 3 GB of memory.

Before benchmarking my code, I validated that the CacheMaintainer was actually running by inspecting the OpenSearch process status in the Docker container: after about a minute, the RssAnon value would decrease by about 0.5 GB without any interaction on my part, signaling that the maintainer successfully cleaned up the expired entries. This is aligned with what I saw from the standard maintenance on the clean code (which I triggered manually by accessing the cache).

Results below:

Performance Summary for Resource-Constrained Testing (3GB Memory Limit)

Run 1: Clean 2.18 code

  • Indexing:
    • p50 latency: 16.96 ms
    • p90 latency: 32.47 ms
  • Search:
    • p50 latency: 330.48 ms
    • p90 latency: 437.78 ms

Run 2: Clean 2.18 code, Force evict ON

  • Indexing:
    • p50 latency: 17.27 ms
    • p90 latency: 33.82 ms
  • Search:
    • p50 latency: 337.43 ms
    • p90 latency: 428.70 ms

Run 3: PR changes added, Force evict ON

  • Indexing:
    • p50 latency: 16.89 ms
    • p90 latency: 32.24 ms
  • Search:
    • p50 latency: 341.38 ms
    • p90 latency: 414.06 ms

Run 4: PR changes added, Force evict OFF

  • Indexing:
    • p50 latency: 17.43 ms
    • p90 latency: 34.63 ms
  • Search:
    • p50 latency: 346.83 ms
    • p90 latency: 430.04 ms

This suggests there is no significant impact on latency with my code changes. I've included the full results of these test runs here:

https://gist.github.com/owenhalpert/05ad4f5ae9577f717f2c59f2039d52e4

@owenhalpert owenhalpert changed the title Add CacheMaintainer class to perform pending cache maintenance every minute Add thread to perform pending cache maintenance every minute Dec 19, 2024
@owenhalpert owenhalpert force-pushed the cache-maintenance branch 2 times, most recently from dc61f6e to 1ca8dff on December 19, 2024 19:20
@owenhalpert owenhalpert marked this pull request as ready for review December 19, 2024 20:09
@owenhalpert owenhalpert changed the title Add thread to perform pending cache maintenance every minute Add thread to periodically perform pending cache maintenance Dec 21, 2024
Member

@jmazanec15 jmazanec15 left a comment


Minor comments, overall looks good

try {
    cache.cleanUp();
} catch (Exception e) {
    logger.error("Error cleaning up cache", e);
Member

Why swallow exception here?

Author

@owenhalpert owenhalpert Dec 23, 2024

Any exceptions from Guava cache operations would otherwise halt our scheduled executor. If an exception occurs here, it would be from Guava's internals rather than our logic, so, per Dooyong's suggestion above, we can log it and continue scheduling cleanup tasks.

This way we can ensure the cache maintenance keeps running even if individual cleanup attempts fail, and we can monitor Guava errors in the logs.

 */
public ScheduledExecutor(Runnable task, long scheduleMillis) {
    this.task = task;
    this.executor = Executors.newSingleThreadScheduledExecutor();
Collaborator

@shatejas shatejas Dec 24, 2024

This seems to be creating a new thread for each instance; I wanted to understand the thought process behind this.

I understand that this is used for the cache, so the threads won't grow exponentially, but since it's in the util package and the name is generic, any misuse of this can create an unbounded number of threads with each new instance. Can we be a bit more defensive here?

The suggestion here is to refactor and make sure there is a fixed number of threads for cache cleanup across caches.
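
One possible shape for that suggestion is a single shared scheduler that every cache registers its cleanup task with; the class and method names below are illustrative only and not part of this PR:

import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

// Sketch of the "fixed number of threads" idea: one shared single-threaded scheduler
// for all cache-maintenance tasks, with each cache holding only a handle to its task.
public final class SharedCacheMaintenanceScheduler {
    private static final ScheduledExecutorService SCHEDULER = Executors.newSingleThreadScheduledExecutor();

    private SharedCacheMaintenanceScheduler() {}

    // Register a periodic cleanup task; the returned future can be cancelled when the
    // owning cache is closed or rebuilt, without shutting down the shared scheduler.
    public static ScheduledFuture<?> schedule(Runnable cleanupTask, long intervalMillis) {
        return SCHEDULER.scheduleAtFixedRate(cleanupTask, intervalMillis, intervalMillis, TimeUnit.MILLISECONDS);
    }
}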
