Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

服务几天没访问后就访问不了了 #1417

Closed
hzxgo opened this issue Jul 1, 2024 · 2 comments
Closed

服务几天没访问后就访问不了了 #1417

hzxgo opened this issue Jul 1, 2024 · 2 comments

Comments

@hzxgo
Copy link

hzxgo commented Jul 1, 2024

Describe the bug

我有个测试环境,平时基本无请求量,经常遇到过几天后就访问不了了(etcd中已无注册信息,所以找不到该微服务),重启后问题就能解决。但是生产环境没遇到过这样的问题,今日测试环境又复现了所以提个issue。

本次bug时间点及Log:

  1. 2024/06/24 微服务启动
    1 2024/06/24 15:43:12.401003 server.go:83: [Info] KITEX: server listen at addr=172.31.251.191:10000
    2 2024/06/24 15:43:13.515344 etcd_registry.go:290: [Info] start keepalive lease 6c8d8e3743d9e86f for etcd registry

  2. 2024/06/30 微服务注册信息从etcd中丢失,导致后续其他服务访问不了该微服务
    01 2024/06/30 01:00:48.568084 etcd_registry.go:223: [Info] keep register service kitex/registry-etcd/wsp.earn.qa.int/172.31.251.191:10000
    02 {"level":"warn","ts":"2024-06-30T01:00:51.571+0800","logger":"etcd-client","caller":"[email protected]/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc0012728c0/mydomain.etcd.cbs:2379","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = context deadline exceeded"}
    03 2024/06/30 01:00:51.577406 etcd_registry.go:227: [Warn] keep register grant lease kitex/registry-etcd/wsp.earn.qa.int/172.31.251.191:10000 failed with err: context deadline exceeded
    04 2024/06/30 01:01:01.624931 etcd_registry.go:223: [Info] keep register service kitex/registry-etcd/wsp.earn.qa.int/172.31.251.191:10000
    05 {"level":"warn","ts":"2024-06-30T01:01:04.578+0800","logger":"etcd-client","caller":"[email protected]/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc0012728c0/mydomain.etcd.cbs:2379","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = context deadline exceeded"}
    06 2024/06/30 01:01:04.579596 etcd_registry.go:234: [Warn] keep register put kitex/registry-etcd/wsp.earn.qa.int/172.31.251.191:10000 failed with err: context deadline exceeded
    07 2024/06/30 01:01:14.626367 etcd_registry.go:223: [Info] keep register service kitex/registry-etcd/wsp.earn.qa.int/172.31.251.191:10000
    08 {"level":"warn","ts":"2024-06-30T01:01:17.580+0800","logger":"etcd-client","caller":"[email protected]/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc0012728c0/mydomain.etcd.cbs:2379","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = context deadline exceeded"}
    09 2024/06/30 01:01:17.580799 etcd_registry.go:234: [Warn] keep register put kitex/registry-etcd/wsp.earn.qa.int/172.31.251.191:10000 failed with err: context deadline exceeded
    10 2024/06/30 01:01:27.627743 etcd_registry.go:223: [Info] keep register service kitex/registry-etcd/wsp.earn.qa.int/172.31.251.191:10000
    11 {"level":"warn","ts":"2024-06-30T01:01:30.582+0800","logger":"etcd-client","caller":"[email protected]/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc0012728c0/mydomain.etcd.cbs:2379","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = context deadline exceeded"}
    12 2024/06/30 01:01:30.583166 etcd_registry.go:234: [Warn] keep register put kitex/registry-etcd/wsp.earn.qa.int/172.31.251.191:10000 failed with err: context deadline exceeded
    13 2024/06/30 01:01:40.630734 etcd_registry.go:223: [Info] keep register service kitex/registry-etcd/wsp.earn.qa.int/172.31.251.191:10000
    14 {"level":"warn","ts":"2024-06-30T01:01:43.585+0800","logger":"etcd-client","caller":"[email protected]/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc0012728c0/mydomain.etcd.cbs:2379","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = context deadline exceeded"}
    15 2024/06/30 01:01:43.585891 etcd_registry.go:234: [Warn] keep register put kitex/registry-etcd/wsp.earn.qa.int/172.31.251.191:10000 failed with err: context deadline exceeded
    16 2024/06/30 01:01:43.585914 etcd_registry.go:255: [Error] keep register service kitex/registry-etcd/wsp.earn.qa.int/172.31.251.191:10000 failed times:5

  3. 重启该微服务后一切恢复正常。

从日志上分析看,是服务与etcd断连后,服务重试了5次没重新注册上etcd,后续就没再尝试重试注册了。虽然该服务没挂(挂了后会被监控程序拉起来,这种情况反而没事),但是在etcd上已无该服务注册信息,所以其他服务访问不到该微服务。

这种问题如何更好的规避下?一直尝试重新注册还是有更好的方案?
还有个疑问点,该服务启动注册在etcd上的信息怎么丢失了。

@jayantxie
Copy link
Member

jayantxie commented Jul 4, 2024

https://github.com/kitex-contrib/registry-etcd/blob/main/etcd_registry.go#L242
是不是测试环境etcd访问时间较长?这里的3s包含了调用get的时间,可能导致put调用时时间不够。
可以把重试的超时时间调整到5s试试

@hzxgo
Copy link
Author

hzxgo commented Jul 8, 2024

我先调整重试次数为0(即一直尝试),先跑一段时间看看效果先。

etcd.NewEtcdRegistryWithRetry(["ip:addr"], retry.NewRetryConfig(
retry.WithMaxAttemptTimes(0), // 无限次尝试
retry.WithObserveDelay(20time.Second),
retry.WithRetryDelay(5
time.Second),
))

@hzxgo hzxgo closed this as completed Jul 9, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

No branches or pull requests

2 participants