Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Caching policy #4

Open
wants to merge 14 commits into
base: main
Choose a base branch
from
Open

Caching policy #4

wants to merge 14 commits into from

Conversation

peteralfonsi
Copy link
Owner

@peteralfonsi peteralfonsi commented Oct 5, 2023

Description

Adds a framework for caching policies, which will applied to evicted in-memory cache entries to decide whether or not it should be accepted into the disk tier. For now, the only implemented policy is a took-time policy which rejects queries whose took time is too short to be worth caching. This threshold is set with a cluster-level setting.

However, it should be easy to add more policies. The cache tier uses a IndicesRequestCacheDiskTierPolicy object, which is passed an array of policies into its constructor. When this overall IndicesRequestCacheDiskTierPolicy checks data, it goes through each of the policies that were passed to it and checks the new entry in a short-circuiting way. To add more policies, just add an instance of them to the array passed into the overall policy. The interface should also be applicable for other caches or other tiers in the future.

The overall IndicesRequestCacheDiskTierPolicy class was unit tested with combinations of the took-time policy and a dummy policy. The tests checked loading QuerySearchResponse objects from BytesReference inputs, short-circuiting behavior, took-time checking, and default behavior when no policies are supplied.

Related Issues

N/A, part of larger tiered caching project

Check List

  • New functionality includes testing.
    • All tests pass
  • New functionality has been documented.
    • New functionality has javadoc added
  • Commits are signed per the DCO using --signoff
  • [N/A] Commit changes are listed out in CHANGELOG.md file (See: Changelog)

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

* GitHub history for details.
*/

package org.opensearch.indices;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Move these under suggested common/cache/tier package

*/
public class CheckDataResult {
private boolean isAccepted;
private String deniedReason; // null if the data was accepted, has an explanation if data was rejected
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can be Optional ?


import java.io.IOException;

public interface CacheTierPolicy<T> {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where is this generic T being used?

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea was to make it explicit what type the BytesReference should be turned into. I could separate out the lines which transform BytesReference -> T as their own function in the interface, as all implementations should need that?

* @return A CheckDataResult object containing whether the data is admitted, and if it isn't, the reason.
* @throws IOException if the input can't be deserialized to the right class.
*/
CheckDataResult checkData(BytesReference data) throws IOException;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually now I think, this CheckDataResult is not really needed. Just returning a boolean should be fine as it avoids unnecessary allocation of new object which can add overhead to GC.

We can just do like cachingPolicy1 && cachingPolicy2....

* which is specified as a dynamic cluster-level setting. The threshold should be set to approximately
* the time it takes to get a result from the cache tier.
*/
public class IndicesRequestCacheTookTimePolicy implements CacheTierPolicy<QuerySearchResult> {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lets rename this to DiskTierTookTimePolicy

*/
public class IndicesRequestCacheTookTimePolicy implements CacheTierPolicy<QuerySearchResult> {
public static final Setting<TimeValue> INDICES_REQUEST_CACHE_DISK_TOOKTIME_THRESHOLD_SETTING = Setting.positiveTimeSetting(
"index.requests.cache.disk.tooktime.threshold",
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As we might rename this class to generic DiskTierTookTimePolicy, you will have to take setting prefix "indices.request.cache" as a parameter in constructor. It is done in a similar way in framework PR.

clusterSettings.addSettingsUpdateConsumer(INDICES_REQUEST_CACHE_DISK_TOOKTIME_THRESHOLD_SETTING, this::setThreshold);
}

public void setThreshold(TimeValue threshold) { // public so that we can manually set value in unit test
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For Unit test, making it protected should suffice?

public class IndicesRequestCacheDiskTierPolicy implements CacheTierPolicy<QuerySearchResult> {
private final CacheTierPolicy<QuerySearchResult>[] policies;
private final int numPolicies;
private final boolean allowedByDefault;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need this? If there are no policies mentioned ie list is empty, it should be allowedByDefault?

* It short circuits these policies' checks together, in the provided order, to get one overall check.
*/
public class IndicesRequestCacheDiskTierPolicy implements CacheTierPolicy<QuerySearchResult> {
private final CacheTierPolicy<QuerySearchResult>[] policies;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Better to use List<>

* This policy takes in an array containing an instance of all policies we want to apply to the IRC disk tier.
* It short circuits these policies' checks together, in the provided order, to get one overall check.
*/
public class IndicesRequestCacheDiskTierPolicy implements CacheTierPolicy<QuerySearchResult> {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We might not really need this class. As this logic can be fit inside DiskCachingTier itself, where we can maintain a list of policies object, and do these all checks which seems minimal.

Signed-off-by: Peter Alfonsi <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants