-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Caching policy #4
base: main
Are you sure you want to change the base?
Conversation
…e in tiered caching
Signed-off-by: Peter Alfonsi <[email protected]>
Signed-off-by: Peter Alfonsi <[email protected]>
* GitHub history for details. | ||
*/ | ||
|
||
package org.opensearch.indices; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Move these under suggested common/cache/tier package
*/ | ||
public class CheckDataResult { | ||
private boolean isAccepted; | ||
private String deniedReason; // null if the data was accepted, has an explanation if data was rejected |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can be Optional ?
|
||
import java.io.IOException; | ||
|
||
public interface CacheTierPolicy<T> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where is this generic T being used?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The idea was to make it explicit what type the BytesReference should be turned into. I could separate out the lines which transform BytesReference -> T as their own function in the interface, as all implementations should need that?
* @return A CheckDataResult object containing whether the data is admitted, and if it isn't, the reason. | ||
* @throws IOException if the input can't be deserialized to the right class. | ||
*/ | ||
CheckDataResult checkData(BytesReference data) throws IOException; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually now I think, this CheckDataResult is not really needed. Just returning a boolean should be fine as it avoids unnecessary allocation of new object which can add overhead to GC.
We can just do like cachingPolicy1 && cachingPolicy2....
* which is specified as a dynamic cluster-level setting. The threshold should be set to approximately | ||
* the time it takes to get a result from the cache tier. | ||
*/ | ||
public class IndicesRequestCacheTookTimePolicy implements CacheTierPolicy<QuerySearchResult> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lets rename this to DiskTierTookTimePolicy
*/ | ||
public class IndicesRequestCacheTookTimePolicy implements CacheTierPolicy<QuerySearchResult> { | ||
public static final Setting<TimeValue> INDICES_REQUEST_CACHE_DISK_TOOKTIME_THRESHOLD_SETTING = Setting.positiveTimeSetting( | ||
"index.requests.cache.disk.tooktime.threshold", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As we might rename this class to generic DiskTierTookTimePolicy, you will have to take setting prefix "indices.request.cache" as a parameter in constructor. It is done in a similar way in framework PR.
clusterSettings.addSettingsUpdateConsumer(INDICES_REQUEST_CACHE_DISK_TOOKTIME_THRESHOLD_SETTING, this::setThreshold); | ||
} | ||
|
||
public void setThreshold(TimeValue threshold) { // public so that we can manually set value in unit test |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For Unit test, making it protected should suffice?
public class IndicesRequestCacheDiskTierPolicy implements CacheTierPolicy<QuerySearchResult> { | ||
private final CacheTierPolicy<QuerySearchResult>[] policies; | ||
private final int numPolicies; | ||
private final boolean allowedByDefault; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need this? If there are no policies mentioned ie list is empty, it should be allowedByDefault?
* It short circuits these policies' checks together, in the provided order, to get one overall check. | ||
*/ | ||
public class IndicesRequestCacheDiskTierPolicy implements CacheTierPolicy<QuerySearchResult> { | ||
private final CacheTierPolicy<QuerySearchResult>[] policies; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Better to use List<>
* This policy takes in an array containing an instance of all policies we want to apply to the IRC disk tier. | ||
* It short circuits these policies' checks together, in the provided order, to get one overall check. | ||
*/ | ||
public class IndicesRequestCacheDiskTierPolicy implements CacheTierPolicy<QuerySearchResult> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We might not really need this class. As this logic can be fit inside DiskCachingTier itself, where we can maintain a list of policies object, and do these all checks which seems minimal.
Signed-off-by: Peter Alfonsi <[email protected]>
Description
Adds a framework for caching policies, which will applied to evicted in-memory cache entries to decide whether or not it should be accepted into the disk tier. For now, the only implemented policy is a took-time policy which rejects queries whose took time is too short to be worth caching. This threshold is set with a cluster-level setting.
However, it should be easy to add more policies. The cache tier uses a IndicesRequestCacheDiskTierPolicy object, which is passed an array of policies into its constructor. When this overall IndicesRequestCacheDiskTierPolicy checks data, it goes through each of the policies that were passed to it and checks the new entry in a short-circuiting way. To add more policies, just add an instance of them to the array passed into the overall policy. The interface should also be applicable for other caches or other tiers in the future.
The overall IndicesRequestCacheDiskTierPolicy class was unit tested with combinations of the took-time policy and a dummy policy. The tests checked loading QuerySearchResponse objects from BytesReference inputs, short-circuiting behavior, took-time checking, and default behavior when no policies are supplied.
Related Issues
N/A, part of larger tiered caching project
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.