chore: formalize gRPC errors in case of UDF failures #218

Merged: 12 commits merged into main from app-error on Mar 11, 2025
Conversation

@veds-g (Contributor) commented Feb 26, 2025

Fixes #212 #204
Formalizes the UDF errors with standard prefixes across SDKs.
For sync servers we need to implement graceful shutdown to see the actual errors; otherwise the gRPC context gets overridden before it is returned to the client. Fixing #198 should make this work.
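As a rough illustration of the idea (not the SDK's actual code; the prefix string and helper name below are assumptions), a UDF failure can be surfaced by aborting the RPC with a standardized, prefixed detail message:

import grpc
from grpc import aio

# Assumed prefix, for illustration only; the standardized prefixes are defined in the SDKs.
UDF_ERROR_PREFIX = "UDF_EXECUTION_ERROR"

async def report_udf_failure(context: aio.ServicerContext, err: BaseException) -> None:
    # Abort the RPC so the client sees a gRPC error whose details carry the
    # formalized prefix rather than a generic or overridden message.
    await context.abort(
        grpc.StatusCode.INTERNAL,
        f"{UDF_ERROR_PREFIX}(udf): {err}",
    )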

Signed-off-by: veds-g <[email protected]>

codecov bot commented Feb 28, 2025

Codecov Report

Attention: Patch coverage is 91.83673% with 4 lines in your changes missing coverage. Please review.

Project coverage is 94.21%. Comparing base (d8e165d) to head (7b30927).
Report is 1 commit behind head on main.

Files with missing lines | Patch % | Lines
...numaflow/reducestreamer/servicer/async_servicer.py | 50.00% | 2 Missing ⚠️
pynumaflow/mapper/_servicer/_sync_servicer.py | 75.00% | 1 Missing ⚠️
pynumaflow/sourcetransformer/servicer/_servicer.py | 75.00% | 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #218      +/-   ##
==========================================
- Coverage   94.30%   94.21%   -0.09%     
==========================================
  Files          55       55              
  Lines        2317     2316       -1     
  Branches      119      119              
==========================================
- Hits         2185     2182       -3     
- Misses         95       97       +2     
  Partials       37       37              


@veds-g veds-g marked this pull request as ready for review February 28, 2025 03:07
@veds-g veds-g requested a review from kohlisid February 28, 2025 03:07
@vigith (Member) left a comment


looks like earlier we weren't using handle_async_error but rather exit_on_error. was this accidental?

@@ -26,7 +26,8 @@ grpcio = "^1.48.1"
 grpcio-tools = "^1.48.1"
 google-cloud = "^0.34.0"
 google-api-core = "^2.11.0"
-protobuf = ">=3.20,<5.0"
+grpcio-status = "^1.48.1"
+protobuf = ">=3.20,<6.0"
Member commented:
this resolves #204
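The hunk above also pulls in grpcio-status; a minimal sketch of returning a structured error with it (illustrative only, not necessarily how the SDK wires it, and the prefix format is an assumption):

import grpc
from grpc_status import rpc_status
from google.rpc import code_pb2, status_pb2

def abort_with_udf_error(context: grpc.ServicerContext, err: Exception) -> None:
    # Build a rich google.rpc.Status and abort the RPC with it, so clients can
    # read a structured status instead of parsing free-form detail strings.
    status = status_pb2.Status(
        code=code_pb2.INTERNAL,
        message=f"UDF_EXECUTION_ERROR(udf): {err}",  # assumed prefix format
    )
    context.abort_with_status(rpc_status.to_status(status))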

@veds-g (Contributor, Author) commented Feb 28, 2025

looks like earlier we weren't using handle_async_error but rather exit_on_error. was this accidental?

We missed adding that for a few of the async servers.
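For context, a rough sketch of the pattern being discussed (the shape of handle_async_error and the _invoke_udf helper are assumptions, not the SDK's exact internals): the async servicer catches the UDF exception and routes it through the common async error path instead of exiting the process the way exit_on_error would.

import grpc
from grpc import aio

# Assumed stand-in for the SDK's common async error helper mentioned in this thread.
async def handle_async_error(context: aio.ServicerContext, err: BaseException) -> None:
    await context.abort(grpc.StatusCode.INTERNAL, f"UDF_EXECUTION_ERROR(udf): {err}")

class MapServicer:
    async def MapFn(self, request, context: aio.ServicerContext):
        try:
            return await self._invoke_udf(request)  # hypothetical: invoke the user's map UDF
        except BaseException as err:
            await handle_async_error(context, err)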

@kohlisid (Contributor) commented Mar 3, 2025

@veds-g
We removed the exception propagation to the queue from the executor layer; what was the behavior when we propagated it from there?

@veds-g (Contributor, Author) commented Mar 3, 2025

@veds-g We removed the exception propagation to the queue from the executor layer; what was the behavior when we propagated it from there?

When we returned an exception from the executor layer, in some cases a parallel running thread would update the gRPC context with errors even after we had shut down the gRPC server. This updated the context and overrode the actual reason for the restart.
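To illustrate the race described above (a simplified sketch, not the actual executor code): when the UDF runs in a worker thread that holds a reference to the gRPC context, a late write from that thread can overwrite whatever error the servicer already reported.

import grpc

def _executor_task(udf, request, context: grpc.ServicerContext, result_queue) -> None:
    # Runs in a separate thread. If the UDF raises after the servicer has already
    # recorded the real error (or the server has begun shutting down), these calls
    # overwrite the context details and hide the actual reason for the restart.
    try:
        result_queue.put(udf(request))
    except Exception as err:
        context.set_code(grpc.StatusCode.UNKNOWN)
        context.set_details(str(err))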

@kohlisid (Contributor) commented Mar 3, 2025

@veds-g Yes, I remember discussing that. To confirm, is this seen only with our sync server executors, or in the async ones as well?
Please add this as part of the clean shutdown issue; I would like to explore this more.

@kohlisid (Contributor) left a comment

Nit:
Please list a few of the testing scenarios covered.
Also add the pointers in the other issue.
Thank you for taking this up :D

@veds-g (Contributor, Author) commented Mar 4, 2025

@veds-g Yes, I remember discussing that. To confirm, is this seen only with our sync server executors, or in the async ones as well? Please add this as part of the clean shutdown issue; I would like to explore this more.

Yes, for the sync server executor. For async servers I just updated exit_on_error to handle_async_error.

Signed-off-by: veds-g <[email protected]>
@veds-g veds-g linked an issue Mar 4, 2025 that may be closed by this pull request
@veds-g veds-g self-assigned this Mar 4, 2025
@yhl25 yhl25 merged commit a6322f8 into main Mar 11, 2025
11 checks passed
@yhl25 yhl25 deleted the app-error branch March 11, 2025 04:28
Development

Successfully merging this pull request may close these issues.

Add common prefix and formalize error messages
Update protobuf Dependency to Support Version >=5.0
4 participants