Add cache plugin and tiered cache documentation #7052

Merged on May 8, 2024 (50 commits)

Commits:
c37276e
Add cache plugin and tiered cache documentation
sgup432 Apr 29, 2024
56ced9e
Apply suggestions from code review
Naarcha-AWS May 6, 2024
702114c
Apply suggestions from code review
Naarcha-AWS May 6, 2024
6701a1c
Update index.md
Naarcha-AWS May 6, 2024
0b13f6f
Update concepts.md (#7049)
Naarcha-AWS Apr 30, 2024
1082316
Update alert-finding-api.md (#7036)
hdhalter Apr 30, 2024
443baf4
[DOC] Add urldecode processor documentation (#5994)
vagimeli Apr 30, 2024
6b856d1
clarify "contexts" (#7063)
smacrakis Apr 30, 2024
da33110
Add documentation for indices.replication.max_bytes_per_sec (#7048)
mch2 Apr 30, 2024
69604d6
Update concurrent search docs with composite aggs updates (#7043)
jed326 Apr 30, 2024
833521c
add query_by_tokens option in Neural Sparse Search (#7040)
zhichao-aws Apr 30, 2024
8cde6d3
Update bundled JDK version (April 2024 Patch releases) (#7031)
reta Apr 30, 2024
1462661
Updating documentation for ignore_unavailable search request paramete…
jainankitk Apr 30, 2024
d5e62ae
Add documentation for hardware-accelerated compression codecs. (#6841)
mulugetam Apr 30, 2024
1a7ffaf
Correct the release version for the upgrade API feature (#6955)
peternied Apr 30, 2024
2f24fb6
[DOC] Add user agent processor documentation (#5995)
vagimeli Apr 30, 2024
b0c4a1e
Add understanding results page (#6984)
Naarcha-AWS May 1, 2024
a8e2f71
Add cluster setting for filter rewrite optimization in aggregation (#…
bowenlan-amzn May 1, 2024
1bf1f00
remove-has-childreren (#7072)
hdhalter May 1, 2024
7e08cb8
Update csp configuration instructions for OSD (#7026)
tianleh May 1, 2024
d460654
adding do_not_fail_on_forbidden section to docs #4896 (#6958)
AntonEliatra May 1, 2024
176452b
feat: fix overlap rate param (#7045)
IanMenendez May 1, 2024
9544aaa
adding kibana_server role specification and explanation #4094 (#7066)
AntonEliatra May 2, 2024
3dc5192
expanding on TrustStore and KeyStore #4578 #4060 (#7015)
AntonEliatra May 2, 2024
17766a6
Add documentation for primary rebalancing (#7059)
Arpit-Bandejiya May 2, 2024
7a8d41f
Update stats.md (#7087)
Naarcha-AWS May 3, 2024
f05e14a
navbar yml update. (#7091)
nateynateynate May 3, 2024
a574c23
Missed a few includes that are referred to by the navbar. (#7093)
nateynateynate May 3, 2024
1f2a301
Add missing OPENSEARCH_INITIAL_ADMIN_PASSWORD for both apt/deb and yu…
drewmiranda-gl May 6, 2024
f6846e1
Update tiered-cache.md
Naarcha-AWS May 6, 2024
8232d46
Apply suggestions from code review
Naarcha-AWS May 7, 2024
6cec39a
Apply suggestions from code review
Naarcha-AWS May 7, 2024
af6f2db
Update _search-plugins/caching/index.md
sgup432 May 8, 2024
d19124f
Update _search-plugins/caching/index.md
sgup432 May 8, 2024
583fad7
Update _search-plugins/caching/index.md
sgup432 May 8, 2024
ec51008
Update _search-plugins/caching/index.md
sgup432 May 8, 2024
cc352a8
Update _search-plugins/caching/index.md
sgup432 May 8, 2024
5bef385
Update _search-plugins/caching/tiered-cache.md
sgup432 May 8, 2024
44bb9ec
Update _search-plugins/caching/tiered-cache.md
sgup432 May 8, 2024
b01a0fd
Update _search-plugins/caching/tiered-cache.md
sgup432 May 8, 2024
596c5c1
Update _search-plugins/caching/tiered-cache.md
sgup432 May 8, 2024
5750948
Update _search-plugins/caching/tiered-cache.md
sgup432 May 8, 2024
078ec0d
Update _search-plugins/caching/tiered-cache.md
sgup432 May 8, 2024
b87bfb7
Update _search-plugins/caching/tiered-cache.md
sgup432 May 8, 2024
1e78284
Update _search-plugins/caching/tiered-cache.md
sgup432 May 8, 2024
69c4f48
Update _search-plugins/caching/tiered-cache.md
sgup432 May 8, 2024
6e4c5d7
Update _search-plugins/caching/tiered-cache.md
sgup432 May 8, 2024
09e7da0
Merge branch 'main' into tiered_cache_doc2
sgup432 May 8, 2024
bfe1fbb
Addressing comments
sgup432 May 8, 2024
1186c37
Apply suggestions from code review
Naarcha-AWS May 8, 2024
37 changes: 37 additions & 0 deletions _search-plugins/caching/index.md
@@ -0,0 +1,37 @@
---
layout: default
title: Caching
parent: Improving search performance
has_children: true
nav_order: 100
---

# Caching

OpenSearch relies on different on-heap cache types to accelerate data retrieval, providing significant improvement in search latency. However, cache size is limited by the amount of memory available on a node. When a dataset that could be cached is larger than the cache, many entries are either evicted from the cache or never cached at all. OpenSearch must then process the affected queries again, which increases resource consumption and degrades performance.

Understanding how your data uses the cache can help improve your cluster's performance and prevent you from using too much memory, reducing the cost of querying your data.

## Supported on-heap cache types

OpenSearch supports the following on-heap cache types:

- **Request cache**: Caches the local results on each shard. This allows frequently used and potentially resource-heavy search requests to return results almost instantaneously.
- **Query cache**: Caches common data from similar queries at the shard level. The query cache is more granular than the request cache and can cache data to be reused in different queries.
- **Field data cache**: Caches field data and global ordinals, which are both used to support aggregations on certain field types.

## Additional cache stores

**Introduced 2.14**
{: .label .label-purple }

This is an experimental feature and is not recommended for use in a production environment. For updates on the progress of the feature or if you want to leave feedback, see the associated [GitHub issue](https://github.com/opensearch-project/OpenSearch/issues/10024).
{: .warning}

In addition to existing custom OpenSearch on-heap cache stores, cache plugins provide the following cache stores:

- **Disk cache**: Stores the precomputed result of a query on disk. Use a disk cache to cache much larger datasets, provided that the disk's latency is within an acceptable range.
- **Tiered cache**: A multi-level cache in which each tier has its own characteristics and performance levels. For example, a tiered cache can contain both on-heap and disk tiers. By combining different tiers, you can achieve a balance between cache performance and size. To learn more, see [Tiered cache]({{site.url}}{{site.baseurl}}/search-plugins/caching/tiered-cache/).

In OpenSearch 2.14, the request cache is integrated with cache plugins. You can use a tiered or disk cache as a request-level cache.
{: .note}
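
The following is a minimal sketch of using a disk cache on its own as the request cache. It assumes that the `cache-ehcache` plugin is installed and that the `ehcache_disk` store name and its `storage.path` setting, both described on the tiered cache page, are also accepted when the disk store is selected directly. The storage path is a hypothetical placeholder:

```yaml
# Assumed sketch: select the Ehcache disk store directly as the request cache store.
opensearch.experimental.feature.pluggable.caching.enabled: true
indices.request.cache.store.name: ehcache_disk

# Hypothetical directory; replace with a path that exists on your node.
indices.request.cache.ehcache_disk.storage.path: /path/to/request_cache
```
{% include copy.html %}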
92 changes: 92 additions & 0 deletions _search-plugins/caching/tiered-cache.md
@@ -0,0 +1,92 @@
---
layout: default
title: Tiered cache
parent: Caching
grand_parent: Improving search performance
nav_order: 10
---

# Tiered cache

This is an experimental feature and is not recommended for use in a production environment. For updates on the progress of the feature or if you want to leave feedback, see the associated [GitHub issue](https://github.com/opensearch-project/OpenSearch/issues/10024).
{: .warning}

A tiered cache is a multi-level cache in which each tier has its own characteristics and performance levels. By combining different tiers, you can achieve a balance between cache performance and size.

## Types of tiered caches

OpenSearch provides an implementation of a _tiered spillover cache_. This implementation spills any items removed from the upper tiers to the lower tiers of cache. The upper tier, such as the on-heap tier, is smaller in size but offers better latency. The lower tier, such as the disk cache, is larger in size but slower in terms of latency. OpenSearch offers both on-heap and disk tiers.

## Enabling a tiered cache

To enable a tiered cache, configure the following setting in `opensearch.yml`:

```yaml
opensearch.experimental.feature.pluggable.caching.enabled: true
```
{% include copy.html %}

For more information about ways to enable experimental features, see [Experimental feature flags]({{site.url}}{{site.baseurl}}/install-and-configure/configuring-opensearch/experimental/).

## Installing required plugins

To use tiered caching, install a tiered cache plugin. As of OpenSearch 2.13, the only available cache plugin is the `cache-ehcache` plugin. This plugin provides a disk cache implementation that can be used as a disk tier within a tiered cache. For more information about installing non-bundled plugins, see [Additional plugins]({{site.url}}{{site.baseurl}}/install-and-configure/plugins/#additional-plugins).

A tiered cache will fail to initialize if the `cache-ehcache` plugin is not installed or if disk cache properties are not set.
{: .warning}

## Tiered cache settings

In OpenSearch 2.14, a request cache can be used in a tiered cache. To begin, configure the following settings in the `opensearch.yml` file.

### Cache store name

To use the OpenSearch-provided tiered spillover cache implementation, set the cache store name to `tiered_spillover`, as shown in the following example:

```yaml
indices.request.cache.store.name: tiered_spillover
```
{% include copy.html %}

### Setting on-heap and disk store tiers

Set the on-heap and disk store tiers to `opensearch_onheap` and `ehcache_disk`, as shown in the following example:

```yaml
indices.request.cache.tiered_spillover.onheap.store.name: opensearch_onheap
indices.request.cache.tiered_spillover.disk.store.name: ehcache_disk
```
{% include copy.html %}

The `opensearch_onheap` setting uses the built-in on-heap cache available in OpenSearch.

The `ehcache_disk` setting is the disk cache implementation from [Ehcache](https://www.ehcache.org/) and requires installing the `cache-ehcache` plugin.

### Configuring on-heap and disk stores

The following table lists the cache store settings for the `opensearch_onheap` store.

Setting | Data type | Default | Description
:--- | :--- | :--- | :---
`indices.request.cache.opensearch_onheap.size` | Percentage | 1% of the heap size | The size of the on-heap cache. Optional.
`indices.request.cache.opensearch_onheap.expire` | Time unit | `MAX_VALUE` (disabled) | Specifies a time-to-live (TTL) for the cached results. Optional.

The following table lists the disk cache store settings for the `ehcache_disk` store.

Setting | Data type | Default | Description
:--- | :--- | :--- | :---
`indices.request.cache.ehcache_disk.max_size_in_bytes` | Long | `1073741824` (1 GB) | Defines the size of the disk cache. Optional.
`indices.request.cache.ehcache_disk.storage.path` | String | `""` | Defines the storage path for the disk cache. Required.
`indices.request.cache.ehcache_disk.expire_after_access` | Time unit | `MAX_VALUE` (disabled) | Specifies a TTL for the cached results. Optional.
`indices.request.cache.ehcache_disk.alias` | String | `ehcacheDiskCache#INDICES_REQUEST_CACHE` | Specifies an alias for the disk cache. Optional.
`indices.request.cache.ehcache_disk.segments` | Integer | `16` | Defines the number of segments into which the disk cache is separated. Used for concurrency. Optional.
`indices.request.cache.ehcache_disk.concurrency` | Integer | `1` | Defines the number of distinct write queues created for the disk store, where a group of segments shares a write queue. Optional.
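
As an illustration, the settings described so far could be combined in `opensearch.yml` roughly as follows. This is a sketch rather than a recommended configuration: the size and expiry values are arbitrary examples, and the storage path is a hypothetical directory that you would replace with one that exists on your node:

```yaml
# Enable the experimental pluggable caching feature.
opensearch.experimental.feature.pluggable.caching.enabled: true

# Use the tiered spillover cache as the request cache store.
indices.request.cache.store.name: tiered_spillover
indices.request.cache.tiered_spillover.onheap.store.name: opensearch_onheap
indices.request.cache.tiered_spillover.disk.store.name: ehcache_disk

# On-heap tier: illustrative values only.
indices.request.cache.opensearch_onheap.size: 2%
indices.request.cache.opensearch_onheap.expire: 30m

# Disk tier: storage.path is required; the path below is hypothetical.
indices.request.cache.ehcache_disk.storage.path: /var/lib/opensearch/request_cache
indices.request.cache.ehcache_disk.max_size_in_bytes: 2147483648   # 2 GB
indices.request.cache.ehcache_disk.expire_after_access: 1h
```
{% include copy.html %}

Settings left out of the file, such as `alias`, `segments`, and `concurrency`, keep the defaults listed in the preceding tables.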

### Additional settings for the `tiered_spillover` store

The following table lists additional settings for the `tiered_spillover` store.

Setting | Data type | Default | Description
:--- | :--- | :--- | :---
`indices.request.cache.tiered_spillover.disk.store.policies.took_time.threshold` | Time unit | `10ms` | A policy that determines whether to cache a query in the disk cache based on its took time. This is a dynamic setting. Optional.
`indices.request.cache.tiered_spillover.disk.store.enabled` | Boolean | `True` | Enables or disables the disk cache dynamically within a tiered spillover cache. Note: After you disable a disk cache, its entries are not removed automatically; you must clear the cache manually. Optional.
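
For example, the following sketch sets both spillover settings in `opensearch.yml`. The `25ms` threshold is an arbitrary illustrative value, not a tuning recommendation. Because the table describes these settings as dynamic, this static configuration is only one way to set them:

```yaml
# Illustrative value: change the took-time threshold from its 10ms default.
indices.request.cache.tiered_spillover.disk.store.policies.took_time.threshold: 25ms

# Keep the disk tier enabled (the default). If you later disable it,
# existing disk entries remain until the cache is cleared manually.
indices.request.cache.tiered_spillover.disk.store.enabled: true
```
{% include copy.html %}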