Description:
We have identified foundational limitations in the Python function execution model of the Azure Functions Python Worker that have affected multiple customers running durable Python functions. The current model runs all asynchronous function calls on a single asyncio event loop, which leads to the following constraints and challenges:
Execution Order of Invocations:
The order in which invocations are executed is determined by the scheduling logic of the asyncio event loop rather than by the order in which they are received, so the execution order can appear random. Currently, we cannot guarantee a first-come, first-served execution model.
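As a rough illustration of the ordering problem, here is a minimal sketch using plain asyncio. It is not the worker's code: `invocation` and `run_invocations` are hypothetical names, and `asyncio.sleep(0)` stands in for any point where a real function awaits I/O.

```python
import asyncio


async def invocation(name, yields, started, finished):
    # Hypothetical stand-in for an async function invocation.
    started.append(name)
    for _ in range(yields):
        # Each await hands control back to the event loop, which may
        # resume a different invocation next.
        await asyncio.sleep(0)
    finished.append(name)


async def run_invocations():
    started, finished = [], []
    # "Arrival" order is A, B, C; all three share one event loop.
    tasks = [
        asyncio.create_task(invocation(name, yields, started, finished))
        for name, yields in [("A", 3), ("B", 1), ("C", 2)]
    ]
    await asyncio.gather(*tasks)
    return started, finished


if __name__ == "__main__":
    started, finished = asyncio.run(run_invocations())
    print("started: ", started)
    print("finished:", finished)
```

In this sketch the invocations still start in creation order, but they interleave at every await and finish as B, C, A. With real I/O the interleaving depends on when each awaited operation completes, so the order a caller observes is not tied to the order of arrival.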
Invocation Timeout Tracking:
There is no mechanism for determining the real status of an invocation when it times out. The timeout could result from a genuinely long-running invocation, or from the worker not picking up the invocation for an extended period (e.g., 15 minutes) because the event loop is busy with other tasks. The loop may also become stuck processing one or more "bad" async calls (i.e., calls declared as async that perform blocking operations). The Function Host sends the request to the worker and begins measuring the timeout, but there is currently no explicit acknowledgment, status check, or fast-fail mechanism for these bad async calls.
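To make the "not picked up" case concrete, the following sketch records when an invocation is received and when it actually starts running, while a "bad" async call blocks the loop. It is an assumption-laden illustration, not the worker's implementation: `bad_async_call`, `tracked_invocation`, and `measure_pickup_delay` are hypothetical names, and the 0.2 second block is an arbitrary value.

```python
import asyncio
import time


async def bad_async_call():
    # Declared async, but performs a blocking call on the loop thread.
    time.sleep(0.2)


async def tracked_invocation(received_at):
    # The first line runs only once the loop actually picks this task up,
    # so the difference is time spent waiting, not time spent executing.
    return time.monotonic() - received_at


async def measure_pickup_delay():
    bad = asyncio.create_task(bad_async_call())
    received_at = time.monotonic()
    good = asyncio.create_task(tracked_invocation(received_at))
    delay = await good
    await bad
    return delay


if __name__ == "__main__":
    delay = asyncio.run(measure_pickup_delay())
    print(f"pickup delay: {delay:.3f}s")
```

The well-behaved invocation does no work of its own, yet it waits roughly as long as the blocking call takes. If a timeout is measured from the moment the request is sent, that waiting time is indistinguishable from execution time unless something like a pickup timestamp or acknowledgment is reported.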
Monitoring Event Loop Status:
Currently, there is no platform support for real-time monitoring of the asyncio event loop's running status or the ability to take snapshots to diagnose potential issues.
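One possible direction, sketched here with standard asyncio calls only, is a heartbeat task that measures how late the loop wakes it, combined with a snapshot of pending tasks from `asyncio.all_tasks()`. This is not an existing platform feature; `loop_lag_monitor`, `snapshot_tasks`, `blocking_call`, and `demo` are hypothetical names, and the intervals are arbitrary.

```python
import asyncio
import time


async def loop_lag_monitor(interval, samples, lags):
    # Sleep for a fixed interval and record how late the loop resumed us.
    for _ in range(samples):
        expected = time.monotonic() + interval
        await asyncio.sleep(interval)
        lags.append(max(0.0, time.monotonic() - expected))


def snapshot_tasks():
    # Names of all unfinished tasks on the running loop.
    # Must be called from code running inside the event loop.
    return sorted(task.get_name() for task in asyncio.all_tasks())


async def blocking_call():
    # A "bad" async call: blocks the loop thread for 0.2 seconds.
    time.sleep(0.2)


async def demo():
    lags = []
    monitor = asyncio.create_task(
        loop_lag_monitor(0.05, 4, lags), name="loop-lag-monitor"
    )
    await asyncio.sleep(0.01)  # let the monitor begin its first sleep
    blocker = asyncio.create_task(blocking_call(), name="blocking-invocation")
    names = snapshot_tasks()
    await asyncio.gather(monitor, blocker)
    return lags, names


if __name__ == "__main__":
    lags, names = asyncio.run(demo())
    print("max loop lag:", round(max(lags), 3))
    print("pending tasks at snapshot:", names)
```

A monitor that runs on the same loop can only report lag after the loop is released, so it cannot raise an alert while the loop is still stuck; a watchdog on a separate thread would be needed for that. asyncio's debug mode offers a related signal by logging callbacks that run longer than `loop.slow_callback_duration`.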
Reference to the relevant code: https://github.com/Azure/azure-functions-python-worker/blob/b734c57b3b81b3cad2f84951ee79c3a493504e32/azure_functions_worker/dispatcher.py#L659C13-L669C62
These challenges present significant difficulties for customers relying on durable and non-durable Python functions, and we are looking for potential enhancements to address these limitations.
++ @davidmrdavid @andystaples @vrdmr @gavin-aguiar @hallvictoria @fabiocav
Expected Behavior
No response
Relevant sample code snippet
No response
Additional Information
No response