-
Notifications
You must be signed in to change notification settings - Fork 227
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
integrate circuitbreaker for region calls #1543
Changes from 5 commits
01fadf5
7ab69c9
c325d48
1d718a1
bf5b945
01926d5
a610a67
53d4f96
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Seems you forgot to update L2218 and L2283? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Originally I thought it is not needed as they low qps and didn't want to affect GC unnecessary, but it is probably better to throttle them as well. Just added them. |
Original file line number | Diff line number | Diff line change | ||||
---|---|---|---|---|---|---|
|
@@ -69,6 +69,7 @@ import ( | |||||
pd "github.com/tikv/pd/client" | ||||||
"github.com/tikv/pd/client/clients/router" | ||||||
"github.com/tikv/pd/client/opt" | ||||||
"github.com/tikv/pd/client/pkg/circuitbreaker" | ||||||
"go.uber.org/zap" | ||||||
"google.golang.org/grpc/codes" | ||||||
"google.golang.org/grpc/status" | ||||||
|
@@ -133,6 +134,13 @@ func nextTTL(ts int64) int64 { | |||||
return ts + regionCacheTTLSec + jitter | ||||||
} | ||||||
|
||||||
var pdRegionMetaCircuitBreaker = circuitbreaker.NewCircuitBreaker("region-meta", circuitbreaker.AlwaysClosedSettings) | ||||||
|
||||||
// ChangePdRegionMetaCircuitBreakerSettings changes circuit breaker changes for region metadata calls | ||||||
func ChangePdRegionMetaCircuitBreakerSettings(apply func(config *circuitbreaker.Settings)) { | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. renamed |
||||||
pdRegionMetaCircuitBreaker.ChangeSettings(apply) | ||||||
} | ||||||
|
||||||
// nextTTLWithoutJitter is used for test. | ||||||
func nextTTLWithoutJitter(ts int64) int64 { | ||||||
return ts + regionCacheTTLSec | ||||||
|
@@ -2070,10 +2078,11 @@ func (c *RegionCache) loadRegion(bo *retry.Backoffer, key []byte, isEndKey bool, | |||||
start := time.Now() | ||||||
var reg *router.Region | ||||||
var err error | ||||||
cbCtx := circuitbreaker.WithCircuitBreaker(ctx, pdRegionMetaCircuitBreaker) | ||||||
if searchPrev { | ||||||
reg, err = c.pdClient.GetPrevRegion(ctx, key, opts...) | ||||||
reg, err = c.pdClient.GetPrevRegion(cbCtx, key, opts...) | ||||||
} else { | ||||||
reg, err = c.pdClient.GetRegion(ctx, key, opts...) | ||||||
reg, err = c.pdClient.GetRegion(cbCtx, key, opts...) | ||||||
} | ||||||
metrics.LoadRegionCacheHistogramWhenCacheMiss.Observe(time.Since(start).Seconds()) | ||||||
if err != nil { | ||||||
|
@@ -2121,7 +2130,8 @@ func (c *RegionCache) loadRegionByID(bo *retry.Backoffer, regionID uint64) (*Reg | |||||
} | ||||||
} | ||||||
start := time.Now() | ||||||
reg, err := c.pdClient.GetRegionByID(ctx, regionID, opt.WithBuckets()) | ||||||
cbCtx := circuitbreaker.WithCircuitBreaker(ctx, pdRegionMetaCircuitBreaker) | ||||||
reg, err := c.pdClient.GetRegionByID(cbCtx, regionID, opt.WithBuckets()) | ||||||
metrics.LoadRegionCacheHistogramWithRegionByID.Observe(time.Since(start).Seconds()) | ||||||
if err != nil { | ||||||
metrics.RegionCacheCounterWithGetRegionByIDError.Inc() | ||||||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -45,6 +45,7 @@ import ( | |
"github.com/tikv/client-go/v2/oracle" | ||
"github.com/tikv/client-go/v2/tikvrpc" | ||
pd "github.com/tikv/pd/client" | ||
"github.com/tikv/pd/client/pkg/circuitbreaker" | ||
) | ||
|
||
// RPCContext contains data that is needed to send RPC to a region. | ||
|
@@ -197,6 +198,11 @@ func SetRegionCacheTTLSec(t int64) { | |
locate.SetRegionCacheTTLSec(t) | ||
} | ||
|
||
// ChangePdRegionMetaCircuitBreakerSettings changes circuit breaker settings for region metadata calls | ||
func ChangePdRegionMetaCircuitBreakerSettings(apply func(config *circuitbreaker.Settings)) { | ||
locate.ChangePdRegionMetaCircuitBreakerSettings(apply) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. ditto |
||
} | ||
|
||
// SetRegionCacheTTLWithJitter sets region cache TTL with jitter. The real TTL is in range of [base, base+jitter). | ||
func SetRegionCacheTTLWithJitter(base int64, jitter int64) { | ||
locate.SetRegionCacheTTLWithJitter(base, jitter) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about referring to
config/retry/backoff.go
? We can use NewCircuitBreakerWithVars or something like that and only add a var for the error rate threshold. The other configs can be hard coded and we don't need to support changing them through the config file. @Tema @MyonKeminta @okJiang WDYT?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is problematic. Backoff is a request scope object so we can create it from vars all the time, however CircuitBreaker is an instance scope, which is created on startup and aggregates stats across all requests. Any concerns from propagating error rate directly from system variables like pingcap/tidb#58737 proposes?
As for all other configs, I think it is still make sense to keep them configurable as e.g.
MinQPSToOpen
could depend on the cluster size and might need tuning per workload. But I don't feel too strong about that and can rollback the last change. Otherwise, I will move them to PDClient section instead of TiKVClient as they belong there.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about propagating the error rate instead of the whole
Settings
? As for the other configs, as we discussed before, we can make them hard-coded temporarily. If we do need to change them, then we can make them configurable. Right now, it's better to hide them to reduce the complexity.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't propagate the full settings. Circuit breaker has a callback to modify any part of the settings:
so in pingcap/tidb#58737 I propagate only error rate through system variables:
I would like to keep this API as it allows easily to propagate any other part of the setting if needed, but not required so.
I will remove everything from TiKVClient config back to hardcoded values and keep propagating only error rate system variable as it was in the original version of this PR. Let me know if there is still any concern with that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, /cc @okJiang
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I removed all configurations except sysvar from this PR and pingcap/tidb#58737