Commit af49d70

yeya24, alanprot, friedrichg, and damnever authored
Cherry-pick fixes to release 1.15 branch (#5241)
* Batch Iterator optimization (#5237)

  * Batch Opmization
    Signed-off-by: Alan Protasio <[email protected]>
  * Add test bacj
    Signed-off-by: Alan Protasio <[email protected]>
  * Testing Multiples scrape intervals
    Signed-off-by: Alan Protasio <[email protected]>
  * no assimption
    Signed-off-by: Alan Protasio <[email protected]>
  * Using max chunk ts
    Signed-off-by: Alan Protasio <[email protected]>
  * test with scrape 10
    Signed-off-by: Alan Protasio <[email protected]>
  * rename method
    Signed-off-by: Alan Protasio <[email protected]>
  * comments
    Signed-off-by: Alan Protasio <[email protected]>
  * using next
    Signed-off-by: Alan Protasio <[email protected]>
  * change test name
    Signed-off-by: Alan Protasio <[email protected]>
  * changelog/comments
    Signed-off-by: Alan Protasio <[email protected]>

  ---------

  Signed-off-by: Alan Protasio <[email protected]>
  Signed-off-by: Ben Ye <[email protected]>

* Store Gateway: Convert metrics from summary to histograms (#5239)

  * Convert following metrics from summary to histogram
    cortex_bucket_store_series_blocks_queried
    cortex_bucket_store_series_data_fetched
    cortex_bucket_store_series_data_size_touched_bytes
    cortex_bucket_store_series_data_size_fetched_bytes
    cortex_bucket_store_series_data_touched
    cortex_bucket_store_series_result_series
    Signed-off-by: Friedrich Gonzalez <[email protected]>
  * Update changelog
    Signed-off-by: Friedrich Gonzalez <[email protected]>
  * fix changelog
    Signed-off-by: Friedrich Gonzalez <[email protected]>

  ---------

  Signed-off-by: Friedrich Gonzalez <[email protected]>
  Signed-off-by: Ben Ye <[email protected]>

* update changelog
  Signed-off-by: Ben Ye <[email protected]>

* Catch context error in the s3 bucket client (#5240)
  Signed-off-by: Xiaochao Dong (@damnever) <[email protected]>
  Signed-off-by: Ben Ye <[email protected]>

* bump RC version
  Signed-off-by: Ben Ye <[email protected]>

---------

Signed-off-by: Alan Protasio <[email protected]>
Signed-off-by: Ben Ye <[email protected]>
Signed-off-by: Friedrich Gonzalez <[email protected]>
Signed-off-by: Xiaochao Dong (@damnever) <[email protected]>
Co-authored-by: Alan Protasio <[email protected]>
Co-authored-by: Friedrich Gonzalez <[email protected]>
Co-authored-by: Xiaochao Dong <[email protected]>
1 parent ebb1835 commit af49d70

12 files changed: +372 -46 lines changed

CHANGELOG.md

Lines changed: 3 additions & 0 deletions
@@ -11,6 +11,7 @@
 * [CHANGE] Tracing: Use the default OTEL trace sampler when `-tracing.otel.exporter-type` is set to `awsxray`. #5141
 * [CHANGE] Ingester partial error log line to debug level. #5192
 * [CHANGE] Change HTTP status code from 503/422 to 499 if a request is canceled. #5220
+* [CHANGE] Store gateways summary metrics have been converted to histograms `cortex_bucket_store_series_blocks_queried`, `cortex_bucket_store_series_data_fetched`, `cortex_bucket_store_series_data_size_touched_bytes`, `cortex_bucket_store_series_data_size_fetched_bytes`, `cortex_bucket_store_series_data_touched`, `cortex_bucket_store_series_result_series` #5239
 * [FEATURE] Querier/Query Frontend: support Prometheus /api/v1/status/buildinfo API. #4978
 * [FEATURE] Ingester: Add active series to all_user_stats page. #4972
 * [FEATURE] Ingester: Added `-blocks-storage.tsdb.head-chunks-write-queue-size` allowing to configure the size of the in-memory queue used before flushing chunks to the disk . #5000
@@ -44,6 +45,7 @@
 * [ENHANCEMENT] Distributor: Reuse byte slices when serializing requests from distributors to ingesters. #5193
 * [ENHANCEMENT] Query Frontend: Add number of chunks and samples fetched in query stats. #5198
 * [ENHANCEMENT] Implement grpc.Compressor.DecompressedSize for snappy to optimize memory allocations. #5213
+* [ENHANCEMENT] Querier: Batch Iterator optimization to prevent transversing it multiple times query ranges steps does not overlap. #5237
 * [BUGFIX] Updated `golang.org/x/net` dependency to fix CVE-2022-27664. #5008
 * [BUGFIX] Fix panic when otel and xray tracing is enabled. #5044
 * [BUGFIX] Fixed no compact block got grouped in shuffle sharding grouper. #5055
@@ -57,6 +59,7 @@
 * [BUGFIX] Compactor: Fix issue that shuffle sharding planner return error if block is under visit by other compactor. #5188
 * [BUGFIX] Fix S3 BucketWithRetries upload empty content issue #5217
 * [BUGFIX] Query Frontend: Disable `absent`, `absent_over_time` and `scalar` for vertical sharding. #5221
+* [BUGFIX] Catch context error in the s3 bucket client. #5240

 ## 1.14.0 2022-12-02

VERSION

Lines changed: 1 addition & 1 deletion
@@ -1 +1 @@
-1.15.0-rc.0
+1.15.0-rc.1

pkg/querier/batch/batch.go

Lines changed: 14 additions & 0 deletions
@@ -43,6 +43,9 @@ type iterator interface {
 	// Seek or Next have returned true.
 	AtTime() int64

+	// MaxCurrentChunkTime returns the max time on the current chunk.
+	MaxCurrentChunkTime() int64
+
 	// Batch returns the current batch. Must only be called after Seek or Next
 	// have returned true.
 	Batch() promchunk.Batch
@@ -98,6 +101,17 @@ func (a *iteratorAdapter) Seek(t int64) bool {
 				a.curr.Index++
 			}
 			return true
+		} else if t <= a.underlying.MaxCurrentChunkTime() {
+			// In this case, some timestamp inside the current underlying chunk can fulfill the seek.
+			// In this case we will call next until we find the sample as it will be faster than calling
+			// `a.underlying.Seek` directly as this would cause the iterator to start from the beginning of the chunk.
+			// See: https://github.com/cortexproject/cortex/blob/f69452975877c67ac307709e5f60b8d20477764c/pkg/querier/batch/chunk.go#L26-L45
+			// https://github.com/cortexproject/cortex/blob/f69452975877c67ac307709e5f60b8d20477764c/pkg/chunk/encoding/prometheus_chunk.go#L90-L95
+			for a.Next() {
+				if t <= a.curr.Timestamps[a.curr.Index] {
+					return true
+				}
+			}
 		}
 	}
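The effect of the new Seek branch can be illustrated with a minimal, self-contained sketch. The `chunkIter` type, its `decoded` counter, and the `SeekFromStart`/`SeekForward` methods are hypothetical stand-ins, not the Cortex types. The sketch only assumes what the comment in the diff states: seeking on the underlying iterator restarts from the beginning of the chunk, so stepping forward with Next is cheaper whenever the target is at or before the chunk's max time.

```go
package main

import "fmt"

// chunkIter is a hypothetical stand-in for the underlying chunk iterator.
// It walks one chunk of sorted timestamps and counts decoded samples.
type chunkIter struct {
	ts      []int64 // sorted timestamps of the current chunk
	pos     int     // index of the current sample, -1 before the first Next
	decoded int     // samples decoded so far, used as a cost proxy
}

func newChunkIter(ts []int64) *chunkIter { return &chunkIter{ts: ts, pos: -1} }

// Next advances by one sample and records the decode.
func (c *chunkIter) Next() bool {
	if c.pos+1 >= len(c.ts) {
		return false
	}
	c.pos++
	c.decoded++
	return true
}

// MaxCurrentChunkTime returns the largest timestamp in the chunk, or -1 if empty.
func (c *chunkIter) MaxCurrentChunkTime() int64 {
	if len(c.ts) == 0 {
		return -1
	}
	return c.ts[len(c.ts)-1]
}

// SeekFromStart models a seek that restarts decoding at the chunk start.
func (c *chunkIter) SeekFromStart(t int64) bool {
	c.pos = -1
	for c.Next() {
		if t <= c.ts[c.pos] {
			return true
		}
	}
	return false
}

// SeekForward models the optimized path: if the target lies within the
// current chunk, keep calling Next from the current position.
func (c *chunkIter) SeekForward(t int64) bool {
	if c.pos >= 0 && t <= c.ts[c.pos] {
		return true // the current sample already satisfies the seek
	}
	if t > c.MaxCurrentChunkTime() {
		return false // nothing in this chunk can satisfy the seek
	}
	for c.Next() {
		if t <= c.ts[c.pos] {
			return true
		}
	}
	return false
}

// countDecoded seeks at a fixed step until the seek fails and reports
// how many samples were decoded in total.
func countDecoded(ts []int64, step int64, seek func(*chunkIter, int64) bool) int {
	it := newChunkIter(ts)
	for t := int64(0); seek(it, t); t += step {
	}
	return it.decoded
}

func main() {
	// 120 samples at a 30-unit interval, queried every 300 units.
	ts := make([]int64, 120)
	for i := range ts {
		ts[i] = int64(i) * 30
	}
	fmt.Println("restart:", countDecoded(ts, 300, (*chunkIter).SeekFromStart))
	fmt.Println("forward:", countDecoded(ts, 300, (*chunkIter).SeekForward))
}
```

In this toy model the forward variant decodes each sample at most once, while the restart variant re-decodes the chunk prefix on every seek. The real iterator works on batches and multiple chunks, so actual gains depend on the scrape interval and seek step, which is what the benchmark scenarios in batch_test.go vary.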

pkg/querier/batch/batch_test.go

Lines changed: 54 additions & 5 deletions
@@ -35,7 +35,7 @@ func BenchmarkNewChunkMergeIterator_CreateAndIterate(b *testing.B) {
 			scenario.duplicationFactor,
 			scenario.enc.String())

-		chunks := createChunks(b, scenario.numChunks, scenario.numSamplesPerChunk, scenario.duplicationFactor, scenario.enc)
+		chunks := createChunks(b, step, scenario.numChunks, scenario.numSamplesPerChunk, scenario.duplicationFactor, scenario.enc)

 		b.Run(name, func(b *testing.B) {
 			b.ReportAllocs()
@@ -55,10 +55,59 @@ func BenchmarkNewChunkMergeIterator_CreateAndIterate(b *testing.B) {
 	}
 }

+func BenchmarkNewChunkMergeIterator_Seek(b *testing.B) {
+	scenarios := []struct {
+		numChunks          int
+		numSamplesPerChunk int
+		duplicationFactor  int
+		seekStep           time.Duration
+		scrapeInterval     time.Duration
+		enc                promchunk.Encoding
+	}{
+		{numChunks: 1000, numSamplesPerChunk: 120, duplicationFactor: 3, scrapeInterval: 30 * time.Second, seekStep: 30 * time.Second / 2, enc: promchunk.PrometheusXorChunk},
+		{numChunks: 1000, numSamplesPerChunk: 120, duplicationFactor: 3, scrapeInterval: 30 * time.Second, seekStep: 30 * time.Second, enc: promchunk.PrometheusXorChunk},
+		{numChunks: 1000, numSamplesPerChunk: 120, duplicationFactor: 3, scrapeInterval: 30 * time.Second, seekStep: 30 * time.Second * 2, enc: promchunk.PrometheusXorChunk},
+		{numChunks: 1000, numSamplesPerChunk: 120, duplicationFactor: 3, scrapeInterval: 30 * time.Second, seekStep: 30 * time.Second * 10, enc: promchunk.PrometheusXorChunk},
+		{numChunks: 1000, numSamplesPerChunk: 120, duplicationFactor: 3, scrapeInterval: 30 * time.Second, seekStep: 30 * time.Second * 30, enc: promchunk.PrometheusXorChunk},
+		{numChunks: 1000, numSamplesPerChunk: 120, duplicationFactor: 3, scrapeInterval: 30 * time.Second, seekStep: 30 * time.Second * 50, enc: promchunk.PrometheusXorChunk},
+		{numChunks: 1000, numSamplesPerChunk: 120, duplicationFactor: 3, scrapeInterval: 30 * time.Second, seekStep: 30 * time.Second * 100, enc: promchunk.PrometheusXorChunk},
+		{numChunks: 1000, numSamplesPerChunk: 120, duplicationFactor: 3, scrapeInterval: 30 * time.Second, seekStep: 30 * time.Second * 200, enc: promchunk.PrometheusXorChunk},
+
+		{numChunks: 1000, numSamplesPerChunk: 120, duplicationFactor: 3, scrapeInterval: 10 * time.Second, seekStep: 10 * time.Second / 2, enc: promchunk.PrometheusXorChunk},
+		{numChunks: 1000, numSamplesPerChunk: 120, duplicationFactor: 3, scrapeInterval: 10 * time.Second, seekStep: 10 * time.Second, enc: promchunk.PrometheusXorChunk},
+		{numChunks: 1000, numSamplesPerChunk: 120, duplicationFactor: 3, scrapeInterval: 10 * time.Second, seekStep: 10 * time.Second * 2, enc: promchunk.PrometheusXorChunk},
+		{numChunks: 1000, numSamplesPerChunk: 120, duplicationFactor: 3, scrapeInterval: 10 * time.Second, seekStep: 10 * time.Second * 10, enc: promchunk.PrometheusXorChunk},
+		{numChunks: 1000, numSamplesPerChunk: 120, duplicationFactor: 3, scrapeInterval: 10 * time.Second, seekStep: 10 * time.Second * 30, enc: promchunk.PrometheusXorChunk},
+		{numChunks: 1000, numSamplesPerChunk: 120, duplicationFactor: 3, scrapeInterval: 10 * time.Second, seekStep: 10 * time.Second * 50, enc: promchunk.PrometheusXorChunk},
+		{numChunks: 1000, numSamplesPerChunk: 120, duplicationFactor: 3, scrapeInterval: 10 * time.Second, seekStep: 10 * time.Second * 100, enc: promchunk.PrometheusXorChunk},
+		{numChunks: 1000, numSamplesPerChunk: 120, duplicationFactor: 3, scrapeInterval: 10 * time.Second, seekStep: 10 * time.Second * 200, enc: promchunk.PrometheusXorChunk},
+	}
+
+	for _, scenario := range scenarios {
+		name := fmt.Sprintf("scrapeInterval %vs seekStep: %vs",
+			scenario.scrapeInterval.Seconds(),
+			scenario.seekStep.Seconds())
+
+		chunks := createChunks(b, scenario.scrapeInterval, scenario.numChunks, scenario.numSamplesPerChunk, scenario.duplicationFactor, scenario.enc)
+
+		b.Run(name, func(b *testing.B) {
+			b.ReportAllocs()
+
+			for n := 0; n < b.N; n++ {
+				it := NewChunkMergeIterator(chunks, 0, 0)
+				i := int64(0)
+				for it.Seek(i*scenario.seekStep.Milliseconds()) != chunkenc.ValNone {
+					i++
+				}
+			}
+		})
+	}
+}
+
 func TestSeekCorrectlyDealWithSinglePointChunks(t *testing.T) {
 	t.Parallel()
-	chunkOne := mkChunk(t, model.Time(1*step/time.Millisecond), 1, promchunk.PrometheusXorChunk)
-	chunkTwo := mkChunk(t, model.Time(10*step/time.Millisecond), 1, promchunk.PrometheusXorChunk)
+	chunkOne := mkChunk(t, step, model.Time(1*step/time.Millisecond), 1, promchunk.PrometheusXorChunk)
+	chunkTwo := mkChunk(t, step, model.Time(10*step/time.Millisecond), 1, promchunk.PrometheusXorChunk)
 	chunks := []chunk.Chunk{chunkOne, chunkTwo}

 	sut := NewChunkMergeIterator(chunks, 0, 0)
@@ -72,13 +121,13 @@ func TestSeekCorrectlyDealWithSinglePointChunks(t *testing.T) {
 	require.Equal(t, int64(1*time.Second/time.Millisecond), actual)
 }

-func createChunks(b *testing.B, numChunks, numSamplesPerChunk, duplicationFactor int, enc promchunk.Encoding) []chunk.Chunk {
+func createChunks(b *testing.B, step time.Duration, numChunks, numSamplesPerChunk, duplicationFactor int, enc promchunk.Encoding) []chunk.Chunk {
 	result := make([]chunk.Chunk, 0, numChunks)

 	for d := 0; d < duplicationFactor; d++ {
 		for c := 0; c < numChunks; c++ {
 			minTime := step * time.Duration(c*numSamplesPerChunk)
-			result = append(result, mkChunk(b, model.Time(minTime.Milliseconds()), numSamplesPerChunk, enc))
+			result = append(result, mkChunk(b, step, model.Time(minTime.Milliseconds()), numSamplesPerChunk, enc))
 		}
 	}

pkg/querier/batch/chunk.go

Lines changed: 4 additions & 0 deletions
@@ -21,6 +21,10 @@ func (i *chunkIterator) reset(chunk GenericChunk) {
 	i.batch.Index = 0
 }

+func (i *chunkIterator) MaxCurrentChunkTime() int64 {
+	return i.chunk.MaxTime
+}
+
 // Seek advances the iterator forward to the value at or after
 // the given timestamp.
 func (i *chunkIterator) Seek(t int64, size int) bool {

pkg/querier/batch/chunk_test.go

Lines changed: 2 additions & 2 deletions
@@ -44,7 +44,7 @@ func forEncodings(t *testing.T, f func(t *testing.T, enc promchunk.Encoding)) {
 	}
 }

-func mkChunk(t require.TestingT, from model.Time, points int, enc promchunk.Encoding) chunk.Chunk {
+func mkChunk(t require.TestingT, step time.Duration, from model.Time, points int, enc promchunk.Encoding) chunk.Chunk {
 	metric := labels.Labels{
 		{Name: model.MetricNameLabel, Value: "foo"},
 	}
@@ -65,7 +65,7 @@ func mkChunk(t require.TestingT, from model.Time, points int, enc promchunk.Enco
 }

 func mkGenericChunk(t require.TestingT, from model.Time, points int, enc promchunk.Encoding) GenericChunk {
-	ck := mkChunk(t, from, points, enc)
+	ck := mkChunk(t, step, from, points, enc)
 	return NewGenericChunk(int64(ck.From), int64(ck.Through), ck.Data.NewIterator)
 }

pkg/querier/batch/merge.go

Lines changed: 8 additions & 0 deletions
@@ -128,6 +128,14 @@ func (c *mergeIterator) AtTime() int64 {
 	return c.batches[0].Timestamps[0]
 }

+func (c *mergeIterator) MaxCurrentChunkTime() int64 {
+	if len(c.h) < 1 {
+		return -1
+	}
+
+	return c.h[0].MaxCurrentChunkTime()
+}
+
 func (c *mergeIterator) Batch() promchunk.Batch {
 	return c.batches[0]
 }

pkg/querier/batch/non_overlapping.go

Lines changed: 4 additions & 0 deletions
@@ -32,6 +32,10 @@ func (it *nonOverlappingIterator) Seek(t int64, size int) bool {
 	}
 }

+func (it *nonOverlappingIterator) MaxCurrentChunkTime() int64 {
+	return it.iter.MaxCurrentChunkTime()
+}
+
 func (it *nonOverlappingIterator) Next(size int) bool {
 	for {
 		if it.iter.Next(size) {

pkg/storage/bucket/s3/bucket_client.go

Lines changed: 1 addition & 1 deletion
@@ -124,7 +124,7 @@ func (b *BucketWithRetries) retry(ctx context.Context, f func() error, operation
 		level.Error(b.logger).Log("msg", "bucket operation fail after retries", "err", lastErr, "operation", operationInfo)
 		return lastErr
 	}
-	return nil
+	return retries.Err()
 }

 func (b *BucketWithRetries) Name() string {

pkg/storage/bucket/s3/bucket_client_test.go

Lines changed: 20 additions & 1 deletion
@@ -73,6 +73,25 @@ func TestBucketWithRetries_UploadFailed(t *testing.T) {
 	require.ErrorContains(t, err, "failed upload: ")
 }

+func TestBucketWithRetries_ContextCanceled(t *testing.T) {
+	t.Parallel()
+
+	m := mockBucket{}
+	b := BucketWithRetries{
+		logger:           log.NewNopLogger(),
+		bucket:           &m,
+		operationRetries: 5,
+		retryMinBackoff:  10 * time.Millisecond,
+		retryMaxBackoff:  time.Second,
+	}
+
+	ctx, cancel := context.WithCancel(context.Background())
+	cancel()
+	obj, err := b.GetRange(ctx, "dummy", 0, 10)
+	require.ErrorIs(t, err, context.Canceled)
+	require.Nil(t, obj)
+}
+
 type fakeReader struct {
 }

@@ -121,7 +140,7 @@ func (m *mockBucket) Get(ctx context.Context, name string) (io.ReadCloser, error

 // GetRange mocks objstore.Bucket.GetRange()
 func (m *mockBucket) GetRange(ctx context.Context, name string, off, length int64) (io.ReadCloser, error) {
-	return nil, nil
+	return io.NopCloser(bytes.NewBuffer(bytes.Repeat([]byte{1}, int(length)))), nil
 }

 // Exists mocks objstore.Bucket.Exists()

pkg/storegateway/bucket_store_metrics.go

Lines changed: 6 additions & 6 deletions
@@ -214,16 +214,16 @@ func (m *BucketStoreMetrics) Collect(out chan<- prometheus.Metric) {

 	data.SendSumOfGaugesPerUser(out, m.blocksLoaded, "thanos_bucket_store_blocks_loaded")

-	data.SendSumOfSummariesWithLabels(out, m.seriesDataTouched, "thanos_bucket_store_series_data_touched", "data_type")
-	data.SendSumOfSummariesWithLabels(out, m.seriesDataFetched, "thanos_bucket_store_series_data_fetched", "data_type")
-	data.SendSumOfSummariesWithLabels(out, m.seriesDataSizeTouched, "thanos_bucket_store_series_data_size_touched_bytes", "data_type")
-	data.SendSumOfSummariesWithLabels(out, m.seriesDataSizeFetched, "thanos_bucket_store_series_data_size_fetched_bytes", "data_type")
-	data.SendSumOfSummariesWithLabels(out, m.seriesBlocksQueried, "thanos_bucket_store_series_blocks_queried")
+	data.SendSumOfHistogramsWithLabels(out, m.seriesDataTouched, "thanos_bucket_store_series_data_touched", "data_type")
+	data.SendSumOfHistogramsWithLabels(out, m.seriesDataFetched, "thanos_bucket_store_series_data_fetched", "data_type")
+	data.SendSumOfHistogramsWithLabels(out, m.seriesDataSizeTouched, "thanos_bucket_store_series_data_size_touched_bytes", "data_type")
+	data.SendSumOfHistogramsWithLabels(out, m.seriesDataSizeFetched, "thanos_bucket_store_series_data_size_fetched_bytes", "data_type")
+	data.SendSumOfHistogramsWithLabels(out, m.seriesBlocksQueried, "thanos_bucket_store_series_blocks_queried")

 	data.SendSumOfHistograms(out, m.seriesGetAllDuration, "thanos_bucket_store_series_get_all_duration_seconds")
 	data.SendSumOfHistograms(out, m.seriesMergeDuration, "thanos_bucket_store_series_merge_duration_seconds")
 	data.SendSumOfCounters(out, m.seriesRefetches, "thanos_bucket_store_series_refetches_total")
-	data.SendSumOfSummaries(out, m.resultSeriesCount, "thanos_bucket_store_series_result_series")
+	data.SendSumOfHistograms(out, m.resultSeriesCount, "thanos_bucket_store_series_result_series")
 	data.SendSumOfCounters(out, m.queriesDropped, "thanos_bucket_store_queries_dropped_total")

 	data.SendSumOfCountersWithLabels(out, m.cachedPostingsCompressions, "thanos_bucket_store_cached_postings_compressions_total", "op")
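As the helper names suggest, the collector sums per-user metrics into one exported series. A minimal sketch of what summing histograms amounts to is shown below. The `histogram` type and `sumHistograms` function are hypothetical and are not the Cortex helpers; the sketch assumes every input uses the same bucket upper bounds and stores cumulative counts, as Prometheus histograms do.

```go
package main

import "fmt"

// histogram is a hypothetical, simplified histogram snapshot.
// buckets maps an upper bound to the cumulative count of observations
// less than or equal to that bound.
type histogram struct {
	count   uint64
	sum     float64
	buckets map[float64]uint64
}

// sumHistograms adds counts, sums, and per-bucket cumulative counts.
// It assumes all inputs share the same bucket bounds; with differing
// bounds the cumulative counts would no longer be meaningful.
func sumHistograms(hs ...histogram) histogram {
	out := histogram{buckets: map[float64]uint64{}}
	for _, h := range hs {
		out.count += h.count
		out.sum += h.sum
		for le, c := range h.buckets {
			out.buckets[le] += c
		}
	}
	return out
}

func main() {
	// Two tenants observing the number of blocks queried per request.
	tenantA := histogram{count: 2, sum: 3, buckets: map[float64]uint64{1: 1, 5: 2}}
	tenantB := histogram{count: 1, sum: 4, buckets: map[float64]uint64{1: 0, 5: 1}}

	total := sumHistograms(tenantA, tenantB)
	fmt.Println(total.count, total.sum, total.buckets[1], total.buckets[5])
}
```

Bucket counts add up exactly across tenants, so the merged series can still be used for quantile estimation at query time.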
