Agent Metadata Servicer #4511
Signed-off-by: Future Outlier <[email protected]>
Codecov Report
@@            Coverage Diff             @@
##           master    #4511      +/-   ##
==========================================
- Coverage   59.03%   58.98%   -0.06%
==========================================
  Files         622      622
  Lines       52687    52739      +52
==========================================
+ Hits        31104    31106       +2
- Misses      19101    19148      +47
- Partials     2482     2485       +3
Flags with carried forward coverage won't be shown.
… agent-metadata-proto-service
 return webapi.PluginEntry{
     ID: "agent-service",
-    SupportedTaskTypes: supportedTaskTypes,
+    SupportedTaskTypes: cfg.SupportedTaskTypes,
Suggested change:
-    SupportedTaskTypes: cfg.SupportedTaskTypes,
+    SupportedTaskTypes: agentMetadata.SupportedTaskTypes,
Co-authored-by: Kevin Su <[email protected]> Signed-off-by: Future-Outlier <[email protected]>
… agent-metadata-proto-service
type Plugin struct {
    metricScope     promutils.Scope
    cfg             *Config
    getClient       GetClientFunc
    connectionCache map[*Agent]*grpc.ClientConn
    agentRegistry   map[string]map[bool]*Agent // map[taskType][isSync] => Agent
Where is this used? Or would that be addressed in follow-up PRs?
Yes, this will be addressed in follow-up PRs. We want to allow one task type to have both a sync and an async agent.
Please leave more questions if this is unclear.
Thank you!
Update: we will remove the second dimension (the `isSync` key), which should improve the user experience in the future. We will follow up on this registry in later PRs. Thank you.
    return context.WithTimeout(ctx, timeout)
}

func initializeAgentRegistry(cfg *Config, connectionCache map[*Agent]*grpc.ClientConn, getAgentMetadataClientFunc GetAgentMetadataClientFunc) (map[string]map[bool]*Agent, error) {
    agentRegistry := make(map[string]map[bool]*Agent)
Suggested change:
-    agentRegistry := make(map[string]map[bool]*Agent)
+    agentRegistry := make(map[string]*Agent)
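The suggestion above flattens the registry from `map[taskType][isSync]` to one agent per task type. Below is a minimal sketch of building and querying such a flat registry, under stated assumptions. `AgentDeployment`, `taskTypeClaim`, `buildRegistry` and `lookupAgent` are hypothetical names, not flytepropeller's real types. When two deployments claim the same task type, the later one is assumed to win. Unclaimed task types fall back to the default agent, as the config comment in this PR describes.

```go
package main

import "fmt"

// AgentDeployment is a hypothetical stand-in for the plugin's *Agent.
type AgentDeployment struct {
	Endpoint string
}

// taskTypeClaim pairs a deployment with the task types it reports.
type taskTypeClaim struct {
	agent     *AgentDeployment
	taskTypes []string
}

// buildRegistry flattens the claims into map[taskType]*AgentDeployment.
// Assumption: when two deployments claim the same task type, the later one wins.
func buildRegistry(claims []taskTypeClaim) map[string]*AgentDeployment {
	registry := make(map[string]*AgentDeployment)
	for _, c := range claims {
		for _, tt := range c.taskTypes {
			registry[tt] = c.agent
		}
	}
	return registry
}

// lookupAgent returns the agent registered for taskType, or defaultAgent
// when no deployment has claimed it.
func lookupAgent(registry map[string]*AgentDeployment, defaultAgent *AgentDeployment, taskType string) *AgentDeployment {
	if agent, ok := registry[taskType]; ok {
		return agent
	}
	return defaultAgent
}

func main() {
	defaultAgent := &AgentDeployment{Endpoint: "dns:///localhost:8000"}
	customAgent := &AgentDeployment{Endpoint: "dns:///localhost:8001"}
	registry := buildRegistry([]taskTypeClaim{
		{agent: defaultAgent, taskTypes: []string{"sensor", "spark"}},
		{agent: customAgent, taskTypes: []string{"custom_task"}},
	})
	fmt.Println(lookupAgent(registry, defaultAgent, "custom_task").Endpoint) // dns:///localhost:8001
	fmt.Println(lookupAgent(registry, defaultAgent, "api_task").Endpoint)    // dns:///localhost:8000
}
```

Dropping the `isSync` key means the sync/async decision has to come from somewhere other than the registry's shape.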
.github/workflows/single-binary.yml
@@ -170,7 +170,7 @@ jobs:
   cpu: "0"
   memory: "0"
 EOF
-flytectl demo start --image flyte-sandbox-bundled:local --imagePullPolicy Never
+flytectl demo start --image flyte-sandbox-bundled:local --disable-agent --imagePullPolicy Never
revert
It works well with the agent metadata server, using this config:
agent-service:
  # supportedTaskTypes:
  #   - sensor
  #   - spark
  #   - default_task
  #   - custom_task
  #   - api_task
  #   - sensor
  #   - airflow
  # By default, all requests are sent to the default agent.
  defaultAgent:
    endpoint: "dns:///localhost:8000"
    insecure: true
    timeouts:
      GetTask: 100s
    defaultTimeout: 100s
  # agents:
  #   custom_agent:
  #     endpoint: "dns:///localhost:8001"
  #     insecure: true
  #     defaultServiceConfig: '{"loadBalancingConfig": [{"round_robin":{}}]}'
  #     timeouts:
  #       DoTask: 300s
  #       GetTask: 100s
  #     defaultTimeout: 300s
  agentForTaskTypes:
    # Overrides the default agent for custom_task, so propeller sends those requests to this agent.
    - custom_task: custom_agent
    - default_task: custom_agent
…ure-Outlier/flyte into agent-metadata-proto-service
Signed-off-by: Kevin Su <[email protected]>
message DeleteTaskResponse {}

// A message containing the agent metadata.
message Agent {
Not adding `async` here?
This is for both sync and async agents. Do you think we need to add it?
Co-authored-by: Haytham Abuelfutuh <[email protected]> Signed-off-by: Future-Outlier <[email protected]>
Updated the method; it now logs:
{
  "json": {
    "src": "plugin.go:383"
  },
  "level": "info",
  "msg": "Agent supports task types: [sensor spark airflow task_type_1 task_type_2]",
  "ts": "2023-12-20T10:45:38+08:00"
}
for _, agentDeployment := range agentDeployments {
    client, err := getAgentMetadataClientFunc(context.Background(), agentDeployment, connectionCache)
    if err != nil {
        return nil, fmt.Errorf("failed to connect to agent [%v] with error: [%v]", agentDeployment, err)
Will this fail flytepropeller startup? If so, I think we need to be more resilient and not fail the whole propeller due to a single misbehaving agent deployment.
We could issue a warning and move on to the next deployment. When routing tasks later, we will also need to be more resilient and fail if a mapping is not found.
        Insecure:       true,
        DefaultTimeout: config.Duration{Duration: 10 * time.Second},
    },
    // AsyncPlugin should be registered to at least one task type.
    // Reference: https://github.com/flyteorg/flyte/blob/master/flyteplugins/go/tasks/pluginmachinery/registry.go#L27
    SupportedTaskTypes: []string{"task_type_1", "task_type_2"},
Maybe we should just remove `SupportedTaskTypes`, ship the breaking change and, from now on, rely solely on metadata as the single source of truth. If we still want to support this config, we will definitely need to be more resilient when routing to find the agent deployment, because users might have a bad configuration. Specifically, I am referring to the `getFinalAgent` function.
As I commented, I'm afraid this is too strong an assumption in this imperfect world :)
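To make routing resilient to a bad configuration, as discussed above, the resolver can return an error when `agentForTaskTypes` names an agent that is not defined. This avoids dereferencing a missing entry. The config pasted earlier in this PR is exactly such a case: `agents` is commented out while `custom_task` still maps to `custom_agent`. Below is a minimal sketch. `RoutingConfig`, `AgentTarget` and `resolveAgent` are hypothetical names and are not the PR's `getFinalAgent`.

```go
package main

import "fmt"

// AgentTarget is a hypothetical view of one deployment entry in the config.
type AgentTarget struct {
	Endpoint string
}

// RoutingConfig is a hypothetical, trimmed-down view of the agent-service config.
type RoutingConfig struct {
	DefaultAgent      AgentTarget
	Agents            map[string]*AgentTarget
	AgentForTaskTypes map[string]string // taskType -> agent name
}

// resolveAgent picks the deployment for taskType. Task types without an
// override go to the default agent; an override naming an undefined agent
// is returned as an error rather than a nil deployment.
func resolveAgent(cfg *RoutingConfig, taskType string) (*AgentTarget, error) {
	name, ok := cfg.AgentForTaskTypes[taskType]
	if !ok {
		return &cfg.DefaultAgent, nil
	}
	agent, ok := cfg.Agents[name]
	if !ok || agent == nil {
		return nil, fmt.Errorf("task type [%s] is mapped to undefined agent [%s]", taskType, name)
	}
	return agent, nil
}

func main() {
	// Mirrors the pasted config: `agents` is commented out, yet
	// agentForTaskTypes still routes custom_task to custom_agent.
	cfg := &RoutingConfig{
		DefaultAgent:      AgentTarget{Endpoint: "dns:///localhost:8000"},
		AgentForTaskTypes: map[string]string{"custom_task": "custom_agent"},
	}
	if _, err := resolveAgent(cfg, "custom_task"); err != nil {
		fmt.Println("error:", err) // error: task type [custom_task] is mapped to undefined agent [custom_agent]
	}
	if agent, err := resolveAgent(cfg, "sensor"); err == nil {
		fmt.Println(agent.Endpoint) // dns:///localhost:8000
	}
}
```

Surfacing the error per task lets propeller fail only the affected task instead of crashing on a nil agent.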
Tracking issue
#3936
Note to reviewers
Agent Metadata Servicer: flytekit#2012 (we will use the `IsSyncTask` map for the routing mechanism). flytekit: https://github.com/flyteorg/flytekit/pull/2012/files#diff-9f7af27264f8773b069e8200804c224fe19a6fcaaf9dc33edc644f5351cbb3beR157
The sync agent PR and the get agent secret PR are merged. This will be updated in the agent integration test in the future.
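The note mentions routing on an `IsSyncTask` map. Such a map would also explain why one `Agent` message can describe both sync and async agents. Below is a minimal sketch, assuming the map is keyed by task type, with `true` meaning that task type is handled synchronously. `AgentMetadata` and `callPath` are hypothetical names, and the real message fields may differ.

```go
package main

import "fmt"

// AgentMetadata is a hypothetical view of what one agent reports: for each
// task type it serves, whether requests for it are handled synchronously.
type AgentMetadata struct {
	Name       string
	IsSyncTask map[string]bool // taskType -> true if sync
}

// callPath reports which path a request for taskType would take, or an
// error if the agent does not serve that task type at all.
func callPath(md AgentMetadata, taskType string) (string, error) {
	isSync, ok := md.IsSyncTask[taskType]
	if !ok {
		return "", fmt.Errorf("agent [%s] does not support task type [%s]", md.Name, taskType)
	}
	if isSync {
		return "sync", nil
	}
	return "async", nil
}

func main() {
	md := AgentMetadata{
		Name:       "custom_agent",
		IsSyncTask: map[string]bool{"task_type_1": true, "task_type_2": false},
	}
	for _, tt := range []string{"task_type_1", "task_type_2", "spark"} {
		path, err := callPath(md, tt)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(tt, "->", path)
	}
}
```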
Describe your changes
(We will use the `IsSyncTask` map for the routing mechanism.)
Setup Process
Supported Task Types
Note: I didn't use the supported task type section.
Note: I changed the code in prometheus_client/start_http_server because two different agent servers can't use the same HTTP server: flyteorg/flytekit@a670fd2#diff-3fb315ad3aeb0e3eff4edd799cf4ec7c9e934ea12537e8f6f50d56828a12a410R46-R57
Screenshots
server port 8000
server port 8001
sync task type
Related PRs
AgentMetadataProto by pingsutw: #4500
AgentMetadataServicer: flyteorg/flytekit#2012