pollMessages() in PostgresQueueDAO does not break when popMessages() returns empty message list. #236

Veera93 · 2023-06-20T20:16:11Z

Describe the bug
In the pollMessages implementation, we need to address two issues that arise when the messagesSlices object is empty.

1. The loop continues even when there are no eligible messages in the queue. This leads to unnecessary querying of the database until the timeout is reached, and the short sleep time of 100ms between iterations makes the problem worse. Together, these two cause high queries per second at scale.
2. When the number of messages is less than the count specified, the thread blocks and holds the messages already fetched in memory until the timeout expires or the count is reached.
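
To make the proposal concrete, here is a minimal sketch of a poll loop that returns as soon as a pop yields nothing, instead of sleeping and querying again. It is not the actual PostgresQueueDAO code: the class name PollSketch, the use of IntFunction as a stand-in for popMessages(), and plain String messages are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntFunction;

public class PollSketch {

    /**
     * Hypothetical simplified poll loop. `pop` stands in for popMessages():
     * given a limit, it returns up to that many messages (possibly none).
     */
    static List<String> pollMessages(IntFunction<List<String>> pop, int count, long timeoutMs) {
        long start = System.currentTimeMillis();
        List<String> messages = new ArrayList<>();
        while (true) {
            List<String> slice = pop.apply(count - messages.size());
            if (slice == null || slice.isEmpty()) {
                // Nothing eligible right now: return what we already hold
                // instead of sleeping and hitting the database again.
                return messages;
            }
            messages.addAll(slice);
            if (messages.size() >= count || System.currentTimeMillis() - start > timeoutMs) {
                return messages;
            }
        }
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // An always-empty queue: the loop should issue one pop and return.
        List<String> result = pollMessages(limit -> {
            calls[0]++;
            return new ArrayList<>();
        }, 5, 2000);
        System.out.println("messages=" + result.size() + ", pops=" + calls[0]);
    }
}
```

With an empty queue this issues a single pop per call rather than roughly one every 100ms until the timeout, and a partially filled result is returned on the first empty pop rather than being held until the timeout.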

Please share your thoughts on this. If we agree, I can raise a Merge Request for the same.

Details
Conductor version: any
Persistence implementation: Postgres
Queue implementation: Postgres

To Reproduce
Steps to reproduce the behavior:

Run Conductor with Postgres as the persistence layer and as the Queue implementation.
Monitor QPS in Postgres using pg_stat_statements.
Note: Here we are only seeing the effect of pollMessages called from the WorkflowReconciler, where the timeout is 2000ms and the count depends on the number of CPU cores.

Expected behavior

Return the messages that are present instead of waiting and retrying till the timeout or count.

Screenshots
I just started the service and queried pg_stat_statements. Below is the number of queries executed per minute from one instance, for the WorkflowReconciler alone. The QPS increases exponentially if there are multiple instances and if the workers poll for activity.

Note: Here we are only seeing the effect of pollMessages called from the WorkflowReconciler. There are other places where pollMessages is invoked.

Additional context
The MySQL implementation of the poll does not have this while(true) loop.


Veera93 · 2023-06-22T02:45:51Z

In Conductor version 3.13.7, I see that the default timeout is 2000 ms, but we can override it to 0, which makes the if block guarded by timeout < 1 execute and gives the desired result. Ideally, though, this timeout should be the query timeout. If the timeout < 1 handling is changed in the future, this workaround would cause an issue.

Just curious to know why we are not using this timeout value as a query timeout for the database call.
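
As a rough illustration of what using the value as a query timeout could look like, the sketch below applies a millisecond timeout to a JDBC statement through the standard Statement.setQueryTimeout(int seconds) method. The class QueryTimeoutSketch and both helper methods are hypothetical and are not part of Conductor. One assumption is that rounding up to whole seconds is acceptable, since JDBC takes the timeout in seconds.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

public class QueryTimeoutSketch {

    /**
     * Converts a millisecond timeout to whole seconds, rounding up.
     * Returns 0 for non-positive input; in JDBC, 0 means "no limit".
     */
    static int toQueryTimeoutSeconds(int timeoutMs) {
        if (timeoutMs <= 0) {
            return 0;
        }
        return (timeoutMs + 999) / 1000;
    }

    /** Hypothetical helper: prepares a statement and applies the timeout to it. */
    static PreparedStatement prepareWithTimeout(Connection connection, String sql, int timeoutMs)
            throws SQLException {
        PreparedStatement statement = connection.prepareStatement(sql);
        statement.setQueryTimeout(toQueryTimeoutSeconds(timeoutMs));
        return statement;
    }
}
```

Note that a JDBC query timeout of 0 means no limit, which differs from the timeout < 1 branch discussed above, so the two meanings of 0 would need to be reconciled.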

Veera93 added the type: bug label Jun 20, 2023
