> This repository has been archived by the owner on Jun 29, 2021. It is now read-only.
# Segment query cache
sdesikan6 edited this page Feb 11, 2018
This feature caches query results on a per-segment basis. Use the `spark.sparklinedata.spmd.segment.query.cache` setting to control query-cache behavior. For example:

`spark.sparklinedata.spmd.segment.query.cache={"useCache" : true, "sizeInMBytes" : 1024, "expireAfterSeconds" : 60, "resultSizeMax" : 20000}`
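The value of this setting is a JSON object. As a minimal sketch (Python is used here purely for illustration; the parameter names are taken from the example above), building the value programmatically avoids quoting mistakes in `spark-defaults.conf` or `--conf` entries:

```python
import json

# Build the cache configuration as a dict and serialize it to the
# JSON value expected by the setting.
cache_conf = {
    "useCache": True,          # turn caching on
    "sizeInMBytes": 1024,      # total cache size in MB
    "expireAfterSeconds": 60,  # results become evictable after 60s
    "resultSizeMax": 20000,    # cap on estimated result rows for GroupBy/Search
}

setting = json.dumps(cache_conf)
print("spark.sparklinedata.spmd.segment.query.cache=" + setting)
```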
| Parameter | Description |
|---|---|
| `useCache` | turn caching on or off |
| `sizeInMBytes` | total size of the query cache, in MB |
| `expireAfterSeconds` | the duration after which a query result becomes evictable |
| `resultSizeMax` | for GroupBy and Search queries, a result is cached only if its estimated row count is below this value. The estimate is very conservative: the product of the cardinalities of the dimensions involved. |
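To make the `resultSizeMax` check concrete, here is a small sketch of the conservative estimate described above (the function names are hypothetical, chosen only for illustration):

```python
from math import prod

def estimated_result_size(dim_cardinalities):
    """Conservative row-count estimate for a GroupBy/Search result:
    the product of the cardinalities of the dimensions involved."""
    return prod(dim_cardinalities)

def cacheable(dim_cardinalities, result_size_max=20000):
    # The result is considered for caching only if the estimate
    # stays below the configured resultSizeMax.
    return estimated_result_size(dim_cardinalities) < result_size_max

# Grouping on dimensions with cardinalities 50, 20 and 10:
print(cacheable([50, 20, 10]))   # 10,000 rows estimated -> cacheable
print(cacheable([500, 20, 10]))  # 100,000 rows estimated -> not cacheable
```

Note that the actual result may be far smaller than the estimate, so a low `resultSizeMax` can exclude results that would in fact have fit comfortably.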
- Under the covers, the Caffeine library is used for caching.
- Caffeine is configured to evict based on query usage and the query result's memory footprint.
- Spark's `CachedBatch` mechanism is used to represent query results in the cache.
- Currently only GroupBy, Search, and Timeseries queries are considered for caching.
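The eviction semantics described above can be sketched as follows. Caffeine itself is a Java library; this Python sketch only illustrates the policy (a total weight bound on entries' memory footprint, a time-to-live, and usage-ordered eviction), and all names here are hypothetical:

```python
import time
from collections import OrderedDict

class WeightedExpiringCache:
    """Sketch of a cache bounded by total entry weight (memory footprint),
    with per-entry expiry and least-recently-used eviction."""

    def __init__(self, max_weight_bytes, expire_after_seconds, clock=time.monotonic):
        self.max_weight = max_weight_bytes
        self.ttl = expire_after_seconds
        self.clock = clock
        self.total_weight = 0
        self.entries = OrderedDict()  # key -> (value, weight, inserted_at)

    def get(self, key):
        entry = self.entries.get(key)
        if entry is None:
            return None
        value, weight, inserted_at = entry
        if self.clock() - inserted_at > self.ttl:
            self._evict(key)          # expired: evict lazily on access
            return None
        self.entries.move_to_end(key)  # record usage for LRU ordering
        return value

    def put(self, key, value, weight):
        if key in self.entries:
            self._evict(key)
        self.entries[key] = (value, weight, self.clock())
        self.total_weight += weight
        # Evict least-recently-used entries while over the weight bound.
        while self.total_weight > self.max_weight and len(self.entries) > 1:
            self._evict(next(iter(self.entries)))

    def _evict(self, key):
        _, weight, _ = self.entries.pop(key)
        self.total_weight -= weight

# With a 100-byte bound, inserting two 60-byte results evicts the older one.
cache = WeightedExpiringCache(max_weight_bytes=100, expire_after_seconds=60)
cache.put("q1", "result-1", weight=60)
cache.put("q2", "result-2", weight=60)
print(cache.get("q1"))  # None (evicted under memory pressure)
print(cache.get("q2"))  # result-2
```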
The following metrics are exposed:
| Metric | Description |
|---|---|
| Hit-Count | number of times a result was found in the cache |
| Miss-Count | number of times a result was not found in the cache |
| Hit-Rate | percentage of lookups that were hits |
| Eviction-Count | number of times a result was evicted because of memory pressure |
| Eviction-Weight | total memory size of the results evicted |
| Result Serializes | time to serialize query results: a histogram of the time taken and a histogram of the query result size |
| Result Deserializes | time to deserialize query results: a histogram of the time taken |
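The first three metrics are related: the hit rate is derived from the hit and miss counts. A trivial sketch of that relationship (function name is illustrative, not from the codebase):

```python
def hit_rate(hit_count, miss_count):
    """Hit-Rate as the percentage of lookups served from the cache."""
    total = hit_count + miss_count
    return 0.0 if total == 0 else 100.0 * hit_count / total

print(hit_rate(75, 25))  # 75.0
```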