Agent ClientSet #4718
Signed-off-by: Future Outlier <[email protected]>
Codecov Report
Additional details and impacted files
Coverage Diff
             master    #4718     +/-
Coverage     59.02%   58.99%   -0.04%
Files           643      644       +1
Lines         55153    55148       -5
Hits          32555    32535      -20
Misses        20018    20039      +21
Partials       2580     2574       -6
Signed-off-by: Kevin Su <[email protected]>
Generate the mock file with mockery:
cd flyte/flyteidl/gen/pb-go/flyteidl/service
mockery --name=AsyncAgentServiceClient --output=/mnt/c/code/dev/flyte/flyteplugins/go/tasks/plugins/webapi/agent/mocks/ --outpkg=mocks
Signed-off-by: Future-Outlier <[email protected]>
Only the integration test needs to be fixed; the other test files are correct.
agent-service:
  supportedTaskTypes:
    - sensor
    - spark
    - default_task
    - custom_task
    - chatgpt
    - sensor
    - airflow
  # By default, all requests are sent to the default agent.
  # defaultAgent:
  #   endpoint: "dns:///localhost:8000"
  #   insecure: true
  #   timeouts:
  #     GetTask: 100s
  #   defaultTimeout: 100s
Before Error Message | After Error Message
cfg := GetConfig()
connectionCache := make(map[*Agent]*grpc.ClientConn)
agentRegistry, err := initializeAgentRegistry(cfg, connectionCache, getAgentMetadataClientFunc)
cs, err := initializeClients(context.Background())
Could you please help me understand what happens if one of the agent endpoints is not available when flytepropeller is booting up? Would it just crash?
If so, it is a very strong assumption that I'm afraid we cannot take.
Even though plugin loading is guarded so that a single failing plugin does not affect propeller booting, it is still undesirable for one failed agent endpoint (out of, for example, 50 healthy ones) to fail the whole plugin loading. I think some kind of late binding would be nicer.
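To make the concern above concrete, here is a minimal sketch of a registry build that skips unreachable agents instead of failing as a whole. It is not the plugin's actual code: `Agent`, `probeFunc`, `buildRegistry`, and `errUnavailable` are hypothetical names introduced only for illustration, and the probe stands in for whatever call the plugin makes against an agent at startup.

```go
package main

import (
	"errors"
	"fmt"
)

// Agent is a hypothetical stand-in for one configured agent endpoint.
type Agent struct {
	Endpoint string
}

// errUnavailable is a hypothetical error used to simulate a failed endpoint.
var errUnavailable = errors.New("agent unavailable")

// probeFunc is a hypothetical hook that reports whether an agent is reachable.
type probeFunc func(a *Agent) error

// buildRegistry maps task types to agents. An agent whose probe fails is
// recorded in the returned error slice and skipped, so the healthy agents
// are still registered.
func buildRegistry(agents map[string]*Agent, probe probeFunc) (map[string]*Agent, []error) {
	registry := make(map[string]*Agent)
	var skipped []error
	for taskType, agent := range agents {
		if err := probe(agent); err != nil {
			skipped = append(skipped, fmt.Errorf("agent %s for task type %q: %w", agent.Endpoint, taskType, err))
			continue
		}
		registry[taskType] = agent
	}
	return registry, skipped
}

func main() {
	agents := map[string]*Agent{
		"sensor": {Endpoint: "dns:///localhost:8000"},
		"spark":  {Endpoint: "dns:///localhost:8001"},
	}
	// Simulate one endpoint being down while the other is healthy.
	probe := func(a *Agent) error {
		if a.Endpoint == "dns:///localhost:8001" {
			return errUnavailable
		}
		return nil
	}
	registry, skipped := buildRegistry(agents, probe)
	fmt.Println("registered:", len(registry), "skipped:", len(skipped))
}
```

Returning the skipped errors rather than logging them inside the loop leaves the caller free to decide whether a partial registry is acceptable.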
We have to wait for every endpoint to start; otherwise, FlytePropeller will keep crashing.
Really nice suggestions. I will discuss them with @pingsutw and post the result here.
I think late binding, or some kind of endpoint or service detection mechanism, would be really helpful.
For example, in k8s, maybe a readiness probe could help us implement lazy binding?
I am not 100% familiar with how agents are deployed in a Flyte cluster, but yes, I will try to help.
Thank you for always providing great advice.
Or maybe having just one agent connected is enough.
I think this plugin used to work in a late binding way, meaning the connection was not established until the very first RPC. In this way the propeller does not depend on agents in the wild when booting up.
> For example, in k8s, maybe readiness probe can help us realize lazy binding?
I think we cannot make any assumptions about how agents are deployed. The only interface they expose to the plugin is their gRPC endpoints, so those are all we can reason about.
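The late-binding behaviour described in this comment can be sketched as a connection cache that dials an endpoint only when it is first used. This is an illustration under stated assumptions, not the plugin's implementation: `Conn` stands in for `*grpc.ClientConn`, and `dialFunc`, `lazyConnCache`, and `newLazyConnCache` are hypothetical names.

```go
package main

import (
	"fmt"
	"sync"
)

// Conn is a hypothetical stand-in for *grpc.ClientConn.
type Conn struct {
	Endpoint string
}

// dialFunc is a hypothetical dialer; a real plugin would create a gRPC
// connection here.
type dialFunc func(endpoint string) (*Conn, error)

// lazyConnCache creates a connection only when an endpoint is first used,
// so constructing the cache never depends on an agent being reachable.
type lazyConnCache struct {
	mu    sync.Mutex
	dial  dialFunc
	conns map[string]*Conn
}

func newLazyConnCache(dial dialFunc) *lazyConnCache {
	return &lazyConnCache{dial: dial, conns: make(map[string]*Conn)}
}

// get returns the cached connection or dials on first use. A failed dial is
// not cached, so the next call tries again.
func (c *lazyConnCache) get(endpoint string) (*Conn, error) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if conn, ok := c.conns[endpoint]; ok {
		return conn, nil
	}
	conn, err := c.dial(endpoint)
	if err != nil {
		return nil, fmt.Errorf("dial %s: %w", endpoint, err)
	}
	c.conns[endpoint] = conn
	return conn, nil
}

func main() {
	dials := 0
	cache := newLazyConnCache(func(endpoint string) (*Conn, error) {
		dials++
		return &Conn{Endpoint: endpoint}, nil
	})
	// Nothing has been dialed yet; the first get triggers the dial and the
	// second get reuses the cached connection.
	cache.get("dns:///localhost:8000")
	cache.get("dns:///localhost:8000")
	fmt.Println("dials:", dials)
}
```

With this shape, an unreachable agent only affects the tasks routed to it at the time of the call, not the boot sequence.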
I will work out a solution for your case with Kevin this week, thank you!
@honnix, we will add a watcher to solve your case!
Thank you for your patience. We will mention you when the update is ready for you to look at.
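One way such a watcher could look is a background loop that periodically rebuilds the task-type mapping from whichever agents currently respond. The sketch below is an assumption about the general shape only; the thread does not describe the actual design. `registry`, `listFunc`, `refresh`, `watch`, and `errUnreachable` are hypothetical names.

```go
package main

import (
	"context"
	"fmt"
	"sync"
	"time"
)

// errUnreachable is a hypothetical error used to simulate an agent that is down.
var errUnreachable = fmt.Errorf("agent unreachable")

// listFunc is a hypothetical call that asks one agent endpoint which task
// types it supports.
type listFunc func(endpoint string) ([]string, error)

// registry maps a task type to the endpoint that serves it.
type registry struct {
	mu     sync.RWMutex
	byTask map[string]string
}

// refresh rebuilds the mapping from the endpoints that answer. Unreachable
// endpoints are skipped and get another chance on the next refresh.
func (r *registry) refresh(endpoints []string, list listFunc) {
	next := make(map[string]string)
	for _, ep := range endpoints {
		taskTypes, err := list(ep)
		if err != nil {
			continue
		}
		for _, t := range taskTypes {
			next[t] = ep
		}
	}
	r.mu.Lock()
	r.byTask = next
	r.mu.Unlock()
}

// lookup returns the endpoint currently registered for a task type.
func (r *registry) lookup(taskType string) (string, bool) {
	r.mu.RLock()
	defer r.mu.RUnlock()
	ep, ok := r.byTask[taskType]
	return ep, ok
}

// watch refreshes the registry on every tick until ctx is cancelled.
func (r *registry) watch(ctx context.Context, every time.Duration, endpoints []string, list listFunc) {
	ticker := time.NewTicker(every)
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
			r.refresh(endpoints, list)
		}
	}
}

func main() {
	endpoints := []string{"dns:///localhost:8000"}
	list := func(ep string) ([]string, error) { return []string{"sensor"}, nil }

	r := &registry{}
	r.refresh(endpoints, list) // best-effort initial load; failures do not abort startup

	ctx, cancel := context.WithTimeout(context.Background(), 50*time.Millisecond)
	defer cancel()
	r.watch(ctx, 10*time.Millisecond, endpoints, list) // returns when ctx expires

	ep, ok := r.lookup("sensor")
	fmt.Println(ep, ok)
}
```

Swapping the whole map under the lock keeps readers from observing a half-built registry while a refresh is in progress.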
@Future-Outlier That is fantastic. Thank you!
Tracking issue
Fixes #3936
Why are the changes needed?
For better readability and scalability in package agent in the future.
What changes were proposed in this pull request?
- Rename getClientFunc to getAgentClientFunc.
- Add ClientFuncSet to store getAgentClientFunc and getAgentMetadataClientFunc.
- Add ClientFuncSet to all related functions and objects.
How was this patch tested?
Unit test, integration test, and single binary mode.
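As a rough illustration of the ClientFuncSet change proposed above, the sketch below bundles the two client constructors into one struct that is passed around as a unit and can be swapped for fakes in tests. The field signatures, the `AgentClient` and `MetadataClient` interfaces, `fakeClient`, `newFakeClientFuncSet`, and `describe` are all assumptions made for this example; the real types come from the generated flyteidl gRPC clients.

```go
package main

import "fmt"

// AgentClient and MetadataClient are hypothetical stand-ins for the generated
// gRPC client interfaces.
type AgentClient interface{ Name() string }
type MetadataClient interface{ Name() string }

// fakeClient is a test double that satisfies both interfaces.
type fakeClient struct{ name string }

func (f fakeClient) Name() string { return f.name }

// ClientFuncSet groups the two constructor functions so that callers take one
// argument instead of two. The field signatures are assumed for this sketch.
type ClientFuncSet struct {
	getAgentClientFunc         func(endpoint string) (AgentClient, error)
	getAgentMetadataClientFunc func(endpoint string) (MetadataClient, error)
}

// newFakeClientFuncSet returns a set whose constructors produce fakes.
func newFakeClientFuncSet() *ClientFuncSet {
	return &ClientFuncSet{
		getAgentClientFunc: func(endpoint string) (AgentClient, error) {
			return fakeClient{name: "agent@" + endpoint}, nil
		},
		getAgentMetadataClientFunc: func(endpoint string) (MetadataClient, error) {
			return fakeClient{name: "metadata@" + endpoint}, nil
		},
	}
}

// describe shows a caller that needs both clients and receives them through
// the single ClientFuncSet parameter.
func describe(cs *ClientFuncSet, endpoint string) (string, error) {
	ac, err := cs.getAgentClientFunc(endpoint)
	if err != nil {
		return "", err
	}
	mc, err := cs.getAgentMetadataClientFunc(endpoint)
	if err != nil {
		return "", err
	}
	return ac.Name() + " " + mc.Name(), nil
}

func main() {
	out, err := describe(newFakeClientFuncSet(), "localhost:8000")
	fmt.Println(out, err)
}
```

Adding a third client kind later then means adding one field to the struct rather than changing every function signature that threads the constructors through.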
Setup process
flyte-single-binary-local-dev.yaml
sensor_example.py