-
Notifications
You must be signed in to change notification settings - Fork 19
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
abstract whole agent #70
Conversation
agent.go
Outdated
if err != nil { | ||
return err | ||
} | ||
|
||
go api.Serve(config.API.Addr) | ||
go profile.Serve(config.Profile.Addr) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
我建议这里你不要改,你看 line 136,环境变量还是 API……另外这里确实不仅仅做 profile ,你看 metrics 接口我记得也是这里输出的
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
确实 profile 比 api 更不好一点, api 最多会造成一眼看过去不知道是什么 api 的困惑, profile 直接造成了误导. 想不到好名字的话就将就 api 用着先.
标题开头加上 |
Events(ctx context.Context, filters [][2]string) (<-chan types.WorkloadEventMessage, <-chan error) | ||
GetStatus(ctx context.Context, workload workload.Workload) (*types.WorkloadStatus, error) | ||
Runtime() string | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
这里的 interface 有一些想法可以讨论一下:
- workload.Workload 这个 interface, 我看你在 Runtime 实现里的都去做类型转化, 很僵硬, 那是不是可以考虑去掉这个模型, 直接传 ID, 比如
GetStatus(ctx, ID)
和StatWorkload(ctx, ID)
. AttachWorkload
的返回值没有区分 stdout 和 stderrfilters [][2]string
看类型猜不出参数构造方式StatWorkload
这个接口内部去调用 mClient 了, 这一块的逻辑应该抽到上层; 一种思考方式是认为这不是 runtime 模型本身的事情, 另一种思考方式是考虑以后实现其他类型的 runtime, mClient 这一部分的逻辑就会被原封不动地复制进去, 不 DIY 了.
} | ||
}() | ||
|
||
return eventChan, errChan |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
这两个 chan 没有 close, 要尽量保证"谁创建谁关闭" 的原则; 同时这里没有处理 ctx.Done() 的情况.
@anrs 点评说这个任务很简单, 一周做完两周上线( |
我发现你喜欢用 pkg.NewPkg 这样,core 的习惯一般是 pkg.New,更直观 |
顺便做了的事情:
|
补充一下其它的行为变化:
|
profile/http.go
Outdated
@@ -1,21 +1,20 @@ | |||
package api | |||
package profile |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not just profile
api/api.go
Outdated
@@ -0,0 +1,21 @@ | |||
package api |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
store
api/core/client.go
Outdated
) | ||
|
||
// CoreAPI use core to store meta | ||
type CoreAPI struct { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
这里 golint 没报错么,理论上他 warn coreapi.CoreAPI, 比较合理的是 pkg name 叫 core,struct 叫 API,这样你 import 后调用就是 core.API 初始化函数是 core.New 调用方式是 core.Get
api/core/node.go
Outdated
) | ||
|
||
// GetNode return a node by core | ||
func (c *CoreAPI) GetNode(ctx context.Context, nodename string) (coretypes.Node, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
返回值一致性,你下面用的是 *coretypes.Node 那么这里最好都统一
api/core/node.go
Outdated
cctx, cancel := context.WithTimeout(ctx, c.config.GlobalConnectionTimeout) | ||
defer cancel() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
我如果写的话会设计一个 decor
func WithTimeout(ctx context.Context, f func () error) error {
cctx, cancel := context.WithTimeout(ctx, c.config.GlobalConnectionTimeout)
defer cancel()
return f()
}
# 那么这里就可以这样写
func (c *CoreAPI) GetNode(ctx context.Context, nodename string) (*coretypes.Node, error) {
var node *coretypes.Node
err := WithTimeout(ctx, func () error {
bla bla
})
return node, err
}
api/core/node.go
Outdated
if node.Available { | ||
opts.StatusOpt = coretypes.TriTrue | ||
} else { | ||
opts.StatusOpt = coretypes.TriFalse | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
see line 74....你可以先给 StatusOpt 一个默认值,减少分支预测
api/core/node.go
Outdated
} | ||
|
||
// GetNodeStatus gets the status of node | ||
func (c *CoreAPI) GetNodeStatus(ctx context.Context, nodename string) (coretypes.NodeStatus, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
返回值
api/core/rpcpool.go
Outdated
func checkAlive(ctx context.Context, rpc *clientWithStatus, timeout time.Duration) bool { | ||
cctx, cancel := context.WithTimeout(ctx, timeout) | ||
defer cancel() | ||
_, err := rpc.client.Info(cctx, &pb.Empty{}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if in 1 line
engine/status_report.go
Outdated
@@ -30,7 +30,7 @@ func (e *Engine) heartbeat(ctx context.Context) { | |||
// nodeStatusReport does heartbeat, tells core this node is alive. | |||
// The TTL is set to double of HeartbeatInterval, by default it will be 360s, | |||
// which means if a node is not available, subcriber will notice this after at least 360s. | |||
// HealthCheck.Timeout is used as timeout of requesting core API | |||
// HealthCheck.Timeout is used as timeout of requesting core Profile |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
没理解为啥是 profile
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
这里是用Goland refactor功能的时候,它搞了个自动的字符串替换。。。太睿智了
api/api.go
Outdated
GetNodeStatus(ctx context.Context, nodename string) (coretypes.NodeStatus, error) | ||
SetWorkloadStatus(ctx context.Context, status *types.WorkloadStatus, ttl int64) error | ||
SetNode(ctx context.Context, node string, status bool) error | ||
GetCoreIdentifier(ctx context.Context) string |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
GetStoreIdentifier
manager/workload/filter.go
Outdated
|
||
if m.config.CheckOnlyMine && utils.UseLabelAsFilter() { | ||
f = append(f, types.KV{Key: "label", Value: fmt.Sprintf("eru.nodename=%s", m.config.HostName)}) | ||
if m.coreIdentifier != "" { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
dont think here is coreIdentifier
, here is identify which store
used, core is just one of implementation.
types/config.go
Outdated
@@ -114,8 +122,8 @@ func (config *Config) PrepareConfig(c *cli.Context) { | |||
if len(c.StringSlice("metrics-transfers")) > 0 { | |||
config.Metrics.Transfers = c.StringSlice("metrics-transfers") | |||
} | |||
if c.String("api-addr") != "" { | |||
config.API.Addr = c.String("api-addr") | |||
if c.String("profile-addr") != "" { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
记得改回去后的一致性
本来想把agent分成三个manager: WorkloadManager,NodeManager,Selfmon;但是觉得Manager没必要抽象出一个接口出来...? |
没有共同性不如就3个模块呗 |
runtime/runtime.go
Outdated
GetStatus(ctx context.Context, ID string, checkHealth bool) (*types.WorkloadStatus, error) | ||
GetWorkloadName(ctx context.Context, ID string) (string, error) | ||
LogFieldsExtra(ctx context.Context, ID string) (map[string]string, error) | ||
IsConnected(ctx context.Context) bool |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
我看了一下 IsConnected 的调用, 在 heartbeat.go 里, 但是哪里的语义来看, 你的调用目的是来判断本地的 runtime 是否可用? 那可能换个名字更好
runtime/runtime.go
Outdated
// Runtime provides runtime-related functions | ||
type Runtime interface { | ||
AttachWorkload(ctx context.Context, ID string) (io.Reader, io.Reader, error) | ||
StatWorkload(ctx context.Context, ID string) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
这个接口语义不明确啊, 比如我现在要实现一个 yavirt 的 runtime, 那应该怎么实现这个方法? 语义是什么?
暂时没有什么集成测试方案(在做了在做了), 所以你得自己仔细测试每一个功能.... |
哦你本来就要写单测来着, 好耶 |
有两个重复的 Container type, 删掉一个; 然后再仔细检查下建模和接口, 设想一下假如要扩展新的 runtime 实现需要哪些改动, 来评估抽象的合理性. |
修复了这个issue。出现这个问题的原因是:某个workload如果被attach第二次的时候,会造成prometheus的重复register,加个缓存就行了。 |
目前基本功能测试着都没啥问题...准备mock了! |
现在的一个问题是我能启动一个workload manager,用纯mock的runtime和store正常运行。但是这玩意如果一直运行,怎么assert呢... |
mock只能做白盒测试…否则就只能做测试框架了
DuodenumL ***@***.***>于2021年9月1日 周三19:00写道:
现在的一个问题是我能启动一个workload
manager,用纯mock的runtime和store正常运行。但是这玩意如果一直运行,怎么assert呢...
—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub
<#70 (comment)>, or
unsubscribe
<https://github.com/notifications/unsubscribe-auth/AAD3S2HIDRPUQCYJK5ZGFBLT7YBWXANCNFSM5CYZFYIQ>
.
Triage notifications on the go with GitHub Mobile for iOS
<https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675>
or Android
<https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub>.
--
- CMGS
|
愿意的话集成测试也搞起来啊, 不过我这边本来也有计划把 agent 集成起来. |
改了一下selfmon的register实现 |
还应该在哪里加test呢... |
agent.yaml.sample
Outdated
|
||
# kv_type defines the type of kv store. | ||
# This option is not required as the default value is "etcd". | ||
kv_type: etcd |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
我讨厌下划线在 yaml 中
api/http.go
Outdated
@@ -32,7 +31,7 @@ func (h *Handler) version(w http.ResponseWriter, _ *http.Request) { | |||
_ = json.NewEncoder(w).Encode(JSON{"version": version.VERSION}) | |||
} | |||
|
|||
// URL /profile/ | |||
// URL /api/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
actually...no need to change
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
= =这里也是Goland自动干的...吐了
manager/node/manager.go
Outdated
if m.runtimeClient == nil { | ||
return nil, errors.New("failed to get runtime client") | ||
} | ||
case "mocks": |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
type RPCClientPool interface {
GetClient() pb.CoreRPCClient
}
这个抽象是出于什么考虑?
// CoreRPCClientPool implement of RPCClientPool | ||
type CoreRPCClientPool struct { | ||
// ClientPool implement of RPCClientPool | ||
type ClientPool struct { | ||
rpcClients []*clientWithStatus | ||
} | ||
|
||
func checkAlive(ctx context.Context, rpc *clientWithStatus, timeout time.Duration) bool { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
其实我觉得这个方法似乎可以放到 grpc client 内部
SetNode(ctx context.Context, node string, status bool) error | ||
GetIdentifier(ctx context.Context) string | ||
NodeStatusStream(ctx context.Context) (<-chan *coretypes.NodeStatus, <-chan error) | ||
ListPodNodes(ctx context.Context, all bool, podname string, labels map[string]string) ([]*coretypes.Node, error) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
要不要考虑把 coretypes 都干掉
话说 watcher 那一部分没有改啊, 从建模边界来看应该是属于 workload manager 的活吧 |
LGTM @jschwinger23 和 @anrs 你们再看看 |
和他又勾兑了一会儿, 再改一版应该就好了, 改 watcher/log 那块 |
7d1cd0f
to
a195e70
Compare
manager/workload/log.go
Outdated
} | ||
|
||
ID := coreutils.RandomString(8) | ||
ctx, cancel := context.WithCancel(context.TODO()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
我看了你的代码,从最外层其实可以传入 req.Context()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
而且从逻辑上确实也应该如此
|
||
l.subscribers[app][ID] = &subscriber{buf, cancel} | ||
logrus.Infof("%s %s log subscribed", app, ID) | ||
<-ctx.Done() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
好像会永久阻塞...是预期的设计?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
嗷是同步式 API...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cancel会作为subscriber的成员传到broadcast里,broadcast时如果err != nil就会调用cancel
c2bcd54
to
6102853
Compare
中间版本,还在瞎改中,不代表最终品质