[Bug] workqueue hangs when rt_workqueue_urgent_work is called before the timeout expires. #9814

Closed
yuqingli05 opened this issue Dec 20, 2024 · 1 comment · Fixed by #9825

yuqingli05 commented Dec 20, 2024

RT-Thread Version

Current latest 5.2 version, master branch

Hardware Type/Architectures

qemu-vexpress-a9

Develop Toolchain

GCC

Describe the bug

1. Reproduction
Use the following main.c for qemu-vexpress-a9:

#include <stdint.h>
#include <stdio.h>
#include <rtthread.h>
#include <rtdevice.h>

struct rt_work work_test;
void work_func(struct rt_work *work, void *work_data)
{
    rt_kprintf("work_func\n");
}

int main(void)
{
    rt_kprintf("Hello RT-Thread!\n");
    rt_work_init(&work_test, work_func, NULL);

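    /* Submit with a 100-tick delay, which starts the work's timer. */
    rt_work_submit(&work_test, 100);
    /* Promote the same work to urgent before that timer has expired. */
    rt_work_urgent(&work_test);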

    return 0;
}

2. The error output is as follows:

$ qemu.bat
QEMU emulator version 8.0.94 (v8.1.0-rc4-12032-g74a4cbee04)
Copyright (c) 2003-2023 Fabrice Bellard and the QEMU Project developers
WARNING: Image format was not specified for 'sd.bin' and probing guessed raw.
         Automatically detecting the format is dangerous for raw images, write operations on block 0 will be restricted.
         Specify the 'raw' format explicitly to remove the restrictions.
dsound: Could not initialize DirectSoundCapture
dsound: Reason: No sound driver is available for use, or the given GUID is not a valid DirectSound device ID

 \ | /
- RT -     Thread Operating System
 / | \     5.2.0 build Dec 21 2024 00:20:07
 2006 - 2024 Copyright by RT-Thread team
[I/SDIO] SD card capacity 65536 KB.
[I/SDIO] sd: switch to High Speed / SDR25 mode 

[I/FileSystem] file system initialization done!

Hello RT-Thread!
msh />work_func
(queue != RT_NULL) assertion failed at function:_delayed_work_timeout_handler, line number:181
backtrace:
please use: addr2line -e rtthread.elf -a -f 6003ae28 6006cf74 600493f0 600738b8 60073fa4
Terminate batch job (Y/N)? y

(.venv) myname@DESKTOP-B6TRHE0 C:\Users\myname\Documents\rtt\rt-thread\bsp\qemu-vexpress-a9
$ wsl addr2line -e rtthread.elf -a -f 6003ae28 6006cf74 600493f0 600738b8 60073fa4
0x6003ae28
rt_backtrace
C:\Users\myname\Documents\rtt\rt-thread\bsp\qemu-vexpress-a9/C:\Users\myname\Documents\rtt\rt-thread\libcpu\arm\cortex-a/backtrace.c:540
0x6006cf74
rt_assert_handler
C:\Users\myname\Documents\rtt\rt-thread\bsp\qemu-vexpress-a9/C:\Users\myname\Documents\rtt\rt-thread\src/kservice.c:1156
0x600493f0
_delayed_work_timeout_handler
C:\Users\myname\Documents\rtt\rt-thread\bsp\qemu-vexpress-a9/C:\Users\myname\Documents\rtt\rt-thread\components\drivers\ipc/workqueue.c:183
0x600738b8
_timer_check
C:\Users\myname\Documents\rtt\rt-thread\bsp\qemu-vexpress-a9/C:\Users\myname\Documents\rtt\rt-thread\src/timer.c:532
0x60073fa4
_timer_thread_entry
C:\Users\myname\Documents\rtt\rt-thread\bsp\qemu-vexpress-a9/C:\Users\myname\Documents\rtt\rt-thread\src/timer.c:813 (discriminator 1)

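3. Root cause analysis: rt_workqueue_urgent_work submits with a timeout of 0, which makes the work take effect immediately, but it does not cancel the timer started by the previous submit. The timer keeps counting. By the time it expires and enters its callback, the work has already finished executing and work->workqueue is NULL, which triggers the assertion.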

4. Solution: stop the previous timer first.
Modify the _workqueue_submit_work and rt_workqueue_urgent_work functions in components\drivers\ipc\workqueue.c:

static rt_err_t _workqueue_submit_work(struct rt_workqueue *queue,
                                       struct rt_work *work, rt_tick_t ticks)
{
    rt_base_t level;
    rt_err_t err = RT_EOK;

    level = rt_spin_lock_irqsave(&(queue->spinlock));

    if (ticks == 0)
    {
        rt_uint8_t timer_is_active = 0;
        if (work->flags & RT_WORK_STATE_SUBMITTING)
        {
            if (rt_timer_stop(&(work->timer)) == RT_EOK)
            {
                rt_timer_detach(&(work->timer));
            }
            else
            {
                /* timer is active, No need to handle it */
                timer_is_active = 1;
            }
        }

        if (timer_is_active == 0)
        {
            /* remove list */
            rt_list_remove(&(work->list));
            work->flags &= ~RT_WORK_STATE_PENDING;

            rt_list_insert_after(queue->work_list.prev, &(work->list));
            work->flags |= RT_WORK_STATE_PENDING;
            work->workqueue = queue;

            /* whether the workqueue is doing work */
            if (queue->work_current == RT_NULL)
            {
                /* resume work thread, and do a re-schedule if succeed */
                rt_thread_resume(queue->work_thread);
            }
        }
    }
    else if (ticks < RT_TICK_MAX / 2)
    {
        /* remove list */
        rt_list_remove(&(work->list));
        work->flags &= ~RT_WORK_STATE_PENDING;

        /* Timer started */
        if (work->flags & RT_WORK_STATE_SUBMITTING)
        {
            rt_timer_control(&work->timer, RT_TIMER_CTRL_SET_TIME, &ticks);
        }
        else
        {
            rt_timer_init(&(work->timer), "work", _delayed_work_timeout_handler,
                          work, ticks, RT_TIMER_FLAG_ONE_SHOT | RT_TIMER_FLAG_SOFT_TIMER);
            work->flags |= RT_WORK_STATE_SUBMITTING;
        }
        work->workqueue = queue;
        /* insert delay work list */
        rt_list_insert_after(queue->delayed_list.prev, &(work->list));

        err = rt_timer_start(&(work->timer));
    }
    else
    {
        err = -RT_ERROR;
    }

    rt_spin_unlock_irqrestore(&(queue->spinlock), level);
    return err;
}

rt_err_t rt_workqueue_urgent_work(struct rt_workqueue *queue, struct rt_work *work)
{
    return rt_workqueue_submit_work(queue, work, 0);
}
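
The failure sequence from the analysis, and the effect of stopping the timer first, can be replayed in a small host-side model. This is only a sketch under stated assumptions: it does not use RT-Thread at all, and every name in it (`model_queue`, `model_work`, `model_replay`, and so on) is hypothetical. The `timer_armed` flag stands in for the per-work timer, and `queue` stands in for `work->workqueue`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct rt_workqueue and struct rt_work. */
struct model_queue {
    int executed;               /* number of work items that have run */
};

struct model_work {
    struct model_queue *queue;  /* mirrors work->workqueue */
    bool timer_armed;           /* mirrors an active delayed-work timer */
};

/* Delayed submit: bind the work to the queue and arm its timer. */
static void model_submit_delayed(struct model_queue *q, struct model_work *w)
{
    w->queue = q;
    w->timer_armed = true;
}

/* Urgent submit. With stop_timer == false the earlier timer is left
 * running, which is the behaviour described in the analysis. */
static void model_submit_urgent(struct model_queue *q, struct model_work *w,
                                bool stop_timer)
{
    if (stop_timer)
    {
        w->timer_armed = false; /* the proposed fix: cancel the old timer */
    }
    w->queue = q;
}

/* The worker runs the work and then clears the queue binding. */
static void model_run_work(struct model_work *w)
{
    assert(w->queue != NULL);
    w->queue->executed++;
    w->queue = NULL;
}

/* Timer expiry. Returns false in the situation where the real handler
 * would fail its (queue != RT_NULL) assertion. */
static bool model_timer_expire(struct model_work *w)
{
    if (!w->timer_armed)
    {
        return true;            /* timer was cancelled, nothing fires */
    }
    w->timer_armed = false;
    return w->queue != NULL;
}

/* Replay: delayed submit, urgent submit, work runs, old timer expires. */
static bool model_replay(bool stop_timer)
{
    struct model_queue q = { 0 };
    struct model_work w = { NULL, false };

    model_submit_delayed(&q, &w);
    model_submit_urgent(&q, &w, stop_timer);
    model_run_work(&w);
    return model_timer_expire(&w);
}
```

In this model, `model_replay(false)` returns `false` because the stale timer fires after the queue binding has been cleared, while `model_replay(true)` returns `true` because the timer was cancelled before the work ran.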

Other additional context

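Other thoughts:
1. One project makes heavy use of work items, and I found that its memory consumption was high.
2. Looking further into the source, struct rt_work contains a timer, so every struct rt_work that is defined carries the sizable footprint of a timer.
3. Why does each rt_work hold its own timer? The delay should be handled by the thread inside struct rt_workqueue, so wouldn't a single timer be enough?
4. Wouldn't it be enough to keep the rt_work items sorted inside rt_workqueue?

Following up on these questions, I refactored workqueue.c and found no difference at all in actual use. Because the per-work timer is gone, the bug above no longer exists, and runtime efficiency has improved as well. I will tidy up this code and submit a PR later.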
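
To make questions 3 and 4 above concrete, here is one possible shape for a queue that keeps delayed work sorted by expiry tick instead of giving each work its own timer. It is a hedged, host-side sketch and not the refactored workqueue.c mentioned above: all names (`sketch_work`, `sketch_queue`, `sketch_submit`, `sketch_pop_due`) are hypothetical, locking is omitted, and the caller is assumed to supply the current tick.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical work item: no embedded timer, only an expiry tick. */
struct sketch_work {
    struct sketch_work *next;
    uint32_t expire_tick;
    int id;
};

/* Hypothetical queue: one singly linked list, earliest expiry first. */
struct sketch_queue {
    struct sketch_work *delayed;
};

/* Wraparound-tolerant comparison: nonzero if tick a is earlier than b. */
static int tick_before(uint32_t a, uint32_t b)
{
    return (uint32_t)(a - b) > UINT32_MAX / 2u;
}

/* Unlink w from the list if it is currently queued. */
static void sketch_remove(struct sketch_queue *q, struct sketch_work *w)
{
    struct sketch_work **pp = &q->delayed;

    while (*pp != NULL)
    {
        if (*pp == w)
        {
            *pp = w->next;
            w->next = NULL;
            return;
        }
        pp = &(*pp)->next;
    }
}

/* Submit or resubmit w to run `delay` ticks after `now`. A resubmit
 * moves the existing entry, so no stale expiry is left behind. */
static void sketch_submit(struct sketch_queue *q, struct sketch_work *w,
                          uint32_t now, uint32_t delay)
{
    struct sketch_work **pp;

    sketch_remove(q, w);
    w->expire_tick = now + delay;

    /* Walk past every entry that expires no later than w. */
    pp = &q->delayed;
    while (*pp != NULL && !tick_before(w->expire_tick, (*pp)->expire_tick))
    {
        pp = &(*pp)->next;
    }
    w->next = *pp;
    *pp = w;
}

/* Pop the head if it is due at `now`, otherwise return NULL. The worker
 * thread would sleep until the head's expire_tick between calls. */
static struct sketch_work *sketch_pop_due(struct sketch_queue *q, uint32_t now)
{
    struct sketch_work *w = q->delayed;

    if (w == NULL || tick_before(now, w->expire_tick))
    {
        return NULL;
    }
    q->delayed = w->next;
    w->next = NULL;
    return w;
}
```

With this layout an urgent submit is just `sketch_submit(q, w, now, 0)`: the item is moved within the list, so there is no second timer that could fire after the work has run.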


Rbb666 commented Dec 31, 2024

Could you help add test cases for this area: #9850
