You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
{{ message }}
This repository has been archived by the owner on Dec 13, 2023. It is now read-only.
Describe the bug
In pollMessages implementation we need to address two issues when messagesSlices object is empty.
The loop continues when there are no eligible messages in the queue. This leads to unnecessary querying of the database until the timeout is reached, and the sleep time of 100ms escalates the problem. The mix of these two together causes high queries per second at scale.
When the number of messages is less than the count specified, the thread blocks and holds onto the previous messages in the memory until timeout happens or the count is reached.
Please share your thoughts on this. If we agree, I can raise a Merge Request for the same.
Details
Conductor version: any
Persistence implementation: Postgres
Queue implementation: Postgres
To Reproduce
Steps to reproduce the behavior:
Run conductor with Postgres as the persistence layer for Queue implementation.
Monitor QPS in Postgres using pg_stat_statements
Note: Here we are only seeing the effect of pollMessages called from the WorkflowReconciler where the timeout is 2000ms and count depends on the CPU core.
Expected behavior
Return the messages that are present instead of waiting and retrying till the timeout or count.
Screenshots
Just started the service and execute pg_stat_statements. Below, is the number of queries executed per minute from one instance just for WorkflowReconciler. The QSP increases exponentially if there are multiple instances and if the workers poll for activity.
Note: Here we are only seeing the effect of pollMessages called from the WorkflowReconciler. There are other places where pollMessage is being invoked.
Additional context
The MySQL implementation of the poll does not have this while(true) loop.
The text was updated successfully, but these errors were encountered:
In the conductor version 3.13.7, I see that the default timeout is 2000 ms, but we can override it to 0 which will execute timeout < 1 the if block for the desired results. But ideally, this timeout should be the query timeout. If this is changed in the future it would cause an issue.
Just curious to know why we are not using this timeout value as a query timeout for the database call.
Sign up for freeto subscribe to this conversation on GitHub.
Already have an account?
Sign in.
Describe the bug
In pollMessages implementation we need to address two issues when messagesSlices object is empty.
Please share your thoughts on this. If we agree, I can raise a Merge Request for the same.
Details
Conductor version: any
Persistence implementation: Postgres
Queue implementation: Postgres
To Reproduce
Steps to reproduce the behavior:
Note: Here we are only seeing the effect of pollMessages called from the
WorkflowReconciler
where the timeout is 2000ms and count depends on the CPU core.Expected behavior
Return the messages that are present instead of waiting and retrying till the timeout or count.
Screenshots
Just started the service and execute pg_stat_statements. Below, is the number of queries executed per minute from one instance just for WorkflowReconciler. The QSP increases exponentially if there are multiple instances and if the workers poll for activity.
Note: Here we are only seeing the effect of pollMessages called from the
WorkflowReconciler
. There are other places where pollMessage is being invoked.Additional context
The MySQL implementation of the poll does not have this while(true) loop.
The text was updated successfully, but these errors were encountered: