[Feature Request] Providing api to save querycache to disk #16822
Comments
@jainankitk @kiranprakash154 @sgup432 FYI, seems like some overlap with issues you've worked on |
I've been working on a proof of concept for plugging in different cache implementations to the query cache, including the TieredSpilloverCache which has a disk tier. TBD on whether the disk tier helps performance or not - the query cache entries are very large so there's a lot of overhead to serialize them. I should have numbers on this next week. Currently the TSC doesn't persist its disk values after node restart. But if the PoC benchmark is promising it could make sense to make this change. However if the serialization/deserialization overhead is too much to actually use the disk values while the node is running, it'd probably make more sense to add some other way to dump all the contents to disk at node shutdown, and read them back on startup, and not use the TSC for this. |
@peteralfonsi, in some query-sensitive scenarios, we have proven that |
Hey @kkewwei, appreciate the interest. I wrapped up my proof of concept earlier this week. It looks like the disk tier does not make sense here. We had previously seen significant gains from adding a disk tier to the request cache, which has key/value pairs of around 1-5 KB. The query cache has much larger entries: in my nyc_taxis-based workload, around 3 MB each. It seems that both Ehcache (the caching library backing the disk tier used in TieredSpilloverCache) and deserializing the DocIdSet objects from disk cause a lot of overhead when the values are this large. Ultimately, performance got worse. Here's an annotated flamegraph showing the overhead: [image: annotated flamegraph] and a graph showing p90 latencies on my benchmark for 4 different query cache settings (the original, QC disabled, TSC-backed QC, and Caffeine-backed QC): [image: p90 latency graph] Even though using the TieredSpilloverCache doesn't make sense, I do think dumping all or at least some of the query cache entries to disk at shutdown time and reading them back in at startup could work. One issue we'll encounter is serializing all the different implementations of |
@peteralfonsi, I agree with you, the commonly used |
@peteralfonsi, to avoid the performance impact, could we serialize/deserialize the DocIdSet objects inside OpenSearch and just store the resulting binary on disk? |
@kkewwei I think this unfortunately wouldn't help. The serialization step is quite fast already. It's parsing all the bytes back into ints that takes too long, so changing the format they're stored in won't matter much. (Note this isn't using the default slow Java |
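To make the point about parsing cost concrete, here is a minimal sketch, assuming a doc ID set can be represented as a strictly increasing list of non-negative integers. It is not OpenSearch or Lucene code, and the function names `encode_doc_ids` and `decode_doc_ids` are hypothetical. It writes the IDs as delta-encoded varints and reads them back; whatever the byte layout, the read path still has to touch every byte and rebuild every integer, which is the step described above as the slow one.

```python
from typing import Iterable, List


def encode_doc_ids(doc_ids: Iterable[int]) -> bytes:
    """Delta-encode a strictly increasing sequence of doc IDs as varints."""
    out = bytearray()
    prev = -1
    for doc_id in doc_ids:
        if doc_id <= prev:
            raise ValueError("doc IDs must be non-negative and strictly increasing")
        delta = doc_id - prev
        prev = doc_id
        # 7 payload bits per byte; a set high bit means more bytes follow.
        while delta >= 0x80:
            out.append((delta & 0x7F) | 0x80)
            delta >>= 7
        out.append(delta)
    return bytes(out)


def decode_doc_ids(data: bytes) -> List[int]:
    """Parse the bytes back into doc IDs; every byte must be visited."""
    doc_ids: List[int] = []
    prev = -1
    delta = 0
    shift = 0
    for byte in data:
        delta |= (byte & 0x7F) << shift
        if byte & 0x80:
            shift += 7
        else:
            prev += delta
            doc_ids.append(prev)
            delta = 0
            shift = 0
    if shift != 0:
        raise ValueError("truncated varint at end of input")
    return doc_ids
```

A round trip such as `decode_doc_ids(encode_doc_ids([0, 5, 300]))` returns the original list. The decode loop is linear in the number of bytes, so an entry of several MB pays that cost on every read from disk.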
Is your feature request related to a problem? Please describe
It's widely acknowledged that the query cache plays a significant role in query performance. However, when a node restarts, OpenSearch has to rebuild the query cache, which is time-consuming and can significantly degrade query performance in the meantime.
1. Rebuilding the query cache is time-consuming.
2. Query took time (p99) increases after the cluster restarts.
Describe the solution you'd like
For some query-sensitive indices it is important to maintain query performance. Could we provide an API to save the query cache to disk? Before restarting a node or cluster, we could first save the query cache to disk.
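As a rough illustration of the save-before-restart idea, here is a minimal sketch in Python, not a proposed OpenSearch API. It assumes the cache can be modeled as a mapping from a string key to a list of matching doc IDs; the names `save_cache` and `load_cache` and the JSON file format are hypothetical. It also ignores the harder questions of how to serialize real query and doc ID set objects and how to invalidate entries whose segments changed while the node was down.

```python
import json
import os
import tempfile
from typing import Dict, List


def save_cache(cache: Dict[str, List[int]], path: str) -> None:
    """Write the cache to a temp file, then atomically rename it into place."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(cache, f)
            f.flush()
            os.fsync(f.fileno())
        # os.replace is atomic, so a crash mid-save never leaves a partial file.
        os.replace(tmp_path, path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


def load_cache(path: str) -> Dict[str, List[int]]:
    """Read the cache back; start cold if the file is missing or unreadable."""
    try:
        with open(path, "r", encoding="utf-8") as f:
            return json.load(f)
    except (FileNotFoundError, json.JSONDecodeError):
        return {}
```

In this sketch `save_cache` would run before shutdown and `load_cache` at startup. Falling back to an empty cache means a missing or corrupt snapshot costs only a cold start, which is the behavior nodes have today.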
Related component
Search:Performance
Describe alternatives you've considered
No response
Additional context
No response