-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
output: scheduled timer callbacks with coroutines #7981
base: master
Are you sure you want to change the base?
Conversation
@PettitWesley it seems the code breaks something, majority of the unit tests are failing, can you check pls ? |
The failure is due to a bug in my code:
|
I think the root cause of the unit test failure is that I forgot to initialize the timer coro list for the zero worker case. |
@@ -333,7 +334,7 @@ static void output_thread(void *data) | |||
flb_sched_timer_cleanup(sched); | |||
|
|||
/* Check if we should stop the event loop */ | |||
if (stopping == FLB_TRUE && mk_list_size(&th_ins->flush_list) == 0) { | |||
if (stopping == FLB_TRUE && mk_list_size(&th_ins->flush_list) == 0 && mk_list_size(&th_ins->timer_coro_list) == 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why mk_list_size() for timer_coro_list in this case is not protected as in the others part of the code (mutex) ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The flush_list isn't protected either, I thought that in this control code, its guaranteed that this is the only code running in this thread and accessing the thread instance list... is that not true? Why is the flush_list not protected?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I need to think on this some more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Matt is helping me realize that the problem is that the engine looks at these lists too, which means there need to be a lock that the engine acquires to ensure its allowed to safely review the list that the output thread owns
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
* If the main engine (parent thread) needs to query the number of active
* 'flushes' running by a threaded instance, then the access to the 'flush_list'
* must be protected: we use 'flush_mutex for that purpose.
https://github.com/fluent/fluent-bit/blob/master/include/fluent-bit/flb_output_thread.h#L85
I htink I get it now. The engine needs to acquire the lock to read the list, and the output thread needs to acquire the lock for writes/deletes, so that the engine won't read it at the same time. If the output thread is just reading, then that's always safe. Assuming the output thread will modify the list.
Signed-off-by: Wesley Pettit <[email protected]>
Signed-off-by: Wesley Pettit <[email protected]>
Signed-off-by: Wesley Pettit <[email protected]>
Signed-off-by: Wesley Pettit <[email protected]>
Signed-off-by: Wesley Pettit <[email protected]>
552830f
to
771f5f2
Compare
This PR is stale because it has been open 45 days with no activity. Remove stale label or comment or this will be closed in 10 days. |
This PR is stale because it has been open 45 days with no activity. Remove stale label or comment or this will be closed in 10 days. |
Implements #7466
Enter
[N/A]
in the box, if an item is not applicable to your change.Testing
In order to test the change, some output needs to use timer coroutines.
The plan is for S3 output to use them, I have code which works and is undergoing extensive testing here: https://github.com/PettitWesley/fluent-bit/pull/27/commits
If this is a change to packaging of containers or native binaries then please confirm it works for all targets.
ok-package-test
label to test for all targets (requires maintainer to do).Documentation
Backporting
Fluent Bit is licensed under Apache 2.0, by submitting this pull request I understand that this code will be released under the terms of that license.