Improve rt ffs providing another find least non-0 bit position method with no memory requirement #9729

Open: wants to merge 9 commits into base: master

@pegasusplus pegasusplus commented Nov 30, 2024

Pull/merge request description: provide two more algorithm options for finding the lowest non-zero bit.

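Latest update: I learned that many compilers provide built-in functions that, where possible, use dedicated processor instructions to compute the position of the lowest non-zero bit. In my tests this is faster again than both the original ffs and the tiny_ffs improved by the original author, so I added a method that computes the result with the GCC/KEIL/IAR built-in functions. (28% faster on my physical Intel machine at home, 35% faster on GitHub; when there is a chance, some ARM targets could be tested, since the newer ones all have dedicated instructions.)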

ffs execution time: 14580990 microseconds
tiny_ffs execution time: 13520661 microseconds
puny_ffs execution time: 20757811 microseconds
builtin_ffs execution time: 8865771 microseconds

By default, KConfig is set to prefer the compiler built-in functions for this computation.
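
For illustration, a minimal sketch of the built-in approach on GCC/Clang. The name builtin_ffs_sketch is hypothetical and this is not the code from the PR; the KEIL and IAR paths are omitted, and the loop fallback for other compilers is only an assumption. It assumes the 1-based result convention of the standard ffs(), returning 0 for an input of 0.

```c
#include <stdint.h>

/* Hypothetical helper: 1-based position of the lowest set bit, 0 if value is 0. */
static inline int builtin_ffs_sketch(uint32_t value)
{
#if defined(__GNUC__) || defined(__clang__)
    /* __builtin_ffs returns one plus the index of the least significant
     * 1-bit, or 0 when the argument is 0. The compiler can lower it to a
     * dedicated instruction where the target has one. */
    return __builtin_ffs((int)value);
#else
    /* Assumed portable fallback: shift right until bit 0 is set. */
    int pos = 1;

    if (value == 0)
    {
        return 0;
    }
    while ((value & 1u) == 0)
    {
        value >>= 1;
        pos++;
    }
    return pos;
#endif
}
```

For example, builtin_ffs_sketch(0x18) yields 4, because bit 3 is the lowest set bit and positions are counted from 1.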

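On WeChat I saw the original author share some algorithm experience, including an improvement to the algorithm rt-thread uses to find the lowest non-zero bit. I thought it was great, but also felt it could be taken further.
This version uses no extra space and the algorithm is simple and direct, so I am offering one more option that squeezes memory use to the limit.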

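This algorithm uses binary search to locate the non-zero bit; a 32-bit value needs 5 consecutive rounds of comparison.
Comparison of the three algorithms, each run over every 32-bit integer: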

ffs execute 4294967295 times execution time: 9307483 microseconds
tiny_ffs execute 4294967295 times execution time: 9487902 microseconds
puny_ffs execute 4294967295 times execution time: 16202501 microseconds

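The variant that uses no extra memory is somewhat slower.
The numbers above were measured on my own Intel machine: roughly on par at 8 bits, about 33% slower at 16 bits, and about 67% slower at 32 bits.
I just tested again on GitHub, where the 32-bit case is about 25% slower.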
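
The binary-search idea described above can be sketched as follows. This is a hedged illustration, not the PR's actual puny_ffs: the name puny_ffs_sketch is hypothetical, and it assumes the 1-based convention of the standard ffs() (0 for an input of 0). Each of the 5 rounds tests whether the lower half of the remaining window is all zero and, if so, discards it.

```c
#include <stdint.h>

/* Hypothetical sketch: table-free find-first-set using 5 halving rounds. */
static int puny_ffs_sketch(uint32_t value)
{
    int pos = 1; /* 1-based position of the lowest set bit */

    if (value == 0)
    {
        return 0;
    }
    /* Round 1: is the set bit in the upper 16 bits? */
    if ((value & 0x0000FFFFu) == 0) { value >>= 16; pos += 16; }
    /* Round 2: narrow to 8 bits. */
    if ((value & 0x000000FFu) == 0) { value >>= 8;  pos += 8;  }
    /* Round 3: narrow to 4 bits. */
    if ((value & 0x0000000Fu) == 0) { value >>= 4;  pos += 4;  }
    /* Round 4: narrow to 2 bits. */
    if ((value & 0x00000003u) == 0) { value >>= 2;  pos += 2;  }
    /* Round 5: pick between the last two candidates. */
    if ((value & 0x00000001u) == 0) { pos += 1; }

    return pos;
}
```

The sketch uses only immediate masks and shifts, so it needs no lookup table in RAM or ROM; the cost is the fixed chain of 5 dependent comparisons, which is consistent with it being slower than the table-based variants in the timings above.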

The Action build and link succeeded in my repository:
https://github.com/pegasusplus/rt-thread/actions/runs/12099488207

Intent for your PR

Choose one (Mandatory):

  • This PR is for a code-review and is intended to get feedback
  • This PR is mature, and ready to be integrated into the repo

Code Quality:

As part of this pull request, I've considered the following:

  • Already checked the difference between the PR and the old code
  • Style guide is adhered to, including spacing, naming and other styles
  • All redundant code is removed and cleaned up (no garbage code, no #if 0 blocks, no commented-out code)
  • All modifications are justified and do not affect other components or BSPs
  • I've commented appropriately where code is tricky
  • Code in this PR is of high quality
  • This PR complies with the RT-Thread code specification (checked with a source formatting tool)
  • If this adds a new BSP, a CI check has been added to .github/workflows/bsp_buildings.yml; for details, see the BSP self-check link

@CLAassistant

CLAassistant commented Nov 30, 2024

CLA assistant check
All committers have signed the CLA.

@github-actions github-actions bot added the Kernel (PR has src relate code) and action (github action yml imporve) labels Nov 30, 2024
@pegasusplus pegasusplus changed the title from "Improve rt ffs" to "Improve rt ffs providing another find least non-0 bit position method with no memory requirement" Nov 30, 2024
@mysterywolf
Member

Thanks for submitting the PR. There is no need to deal with the errors reported by CI; I will reorganize your PR.

@supperthomas
Member

The CI changes do not need to be committed; just trigger the workflow manually.

Some compilers complain about value & (value - 1) ^ value
Better to write (value & (value - 1)) ^ value
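
A small sketch of what the parenthesized form computes. The function name lowest_set_bit is hypothetical and not taken from the PR. In C, & binds tighter than ^, so both spellings evaluate the same way; the explicit parentheses only make the grouping visible and avoid the warning some compilers emit for mixed bitwise operators.

```c
#include <stdint.h>

/* Hypothetical helper: isolate the lowest set bit of value (0 if value is 0). */
static uint32_t lowest_set_bit(uint32_t value)
{
    /* value & (value - 1) clears the lowest set bit;
     * XOR with the original value leaves only that bit. */
    return (value & (value - 1u)) ^ value;
}
```

For example, lowest_set_bit(0x18) gives 0x08: 0x18 & 0x17 is 0x10, and 0x10 ^ 0x18 is 0x08.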
@pegasusplus
Author

pegasusplus commented Dec 2, 2024

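Latest update: I learned that many compilers provide built-in functions that, where possible, use dedicated processor instructions to compute the position of the lowest non-zero bit. In my tests this is faster again than both the original ffs and the tiny_ffs improved by the original author, so I added a method that computes the result with the GCC/KEIL/IAR built-in functions. (28% faster on my physical Intel machine at home, 35% faster on GitHub; when there is a chance, some ARM targets could be tested, since the newer ones all have dedicated instructions.)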

ffs execution time: 14580990 microseconds
tiny_ffs execution time: 13520661 microseconds
puny_ffs execution time: 20757811 microseconds
builtin_ffs execution time: 8865771 microseconds

@pegasusplus
Author

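About the CLA check above: it seems that I used Visual Studio Community on my local machine to test performance, and when I later committed code, the commit recorded the Microsoft account configured in Visual Studio Community. That added an extra account, which cannot sign the agreement because it is not a GitHub account.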

@aozima
Member

aozima commented Dec 2, 2024

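About the CLA check above: it seems that I used Visual Studio Community on my local machine to test performance, and when I later committed code, the commit recorded the Microsoft account configured in Visual Studio Community. That added an extra account, which cannot sign the agreement because it is not a GitHub account.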

Rebase to change the author, then push again to the branch of the same name:

push --force-with-lease

@pegasusplus
Author

Thanks, it is resolved.
