Commit
wufei committed Apr 12, 2024
1 parent 81d80d4, commit fcb1f0d
Showing 1 changed file with 63 additions and 0 deletions.
@@ -0,0 +1,63 @@
---
layout: post
title: A General Method for Finding Optimizable Instructions with QEMU
author: "Fei Wu"
header-mask: 0.4
tags:
- Qemu
- RISC-V
- Performance
---

* content
{:toc}

# Motivation

I came across an email about the Svvptc proposal, [Feedback Request on Svvptc Extension Proposal](https://lists.riscv.org/g/tech-privileged/topic/feedback_request_on_svvptc/103045776), which points out that `sfence.vma` is executed far more often than necessary and leaves room for optimization.

```
+--------------------------+------+-----------------+
| Test                     | Gain | # of SFENCE.VMA |
+--------------------------+------+-----------------+
| Kernel boot              | 6%   | 50535 -> 8768   |
| ltp - mmapstress01       | 8%   | 44978 -> 6300   |
| lmbench - lat_pagefault  | 20%  | 665254 -> 832   |
| lmbench - lat_mmap       | 5%   | 546401 -> 718   |
+--------------------------+------+-----------------+
1. The gains represent performance improvements for
   the benchmark metrics.
2. The second column lists the reduction in
   issued single-address SFENCE.VMA instructions.
```
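
# Some Thoughts

Beyond the Svvptc extension itself, I am more interested in how this optimization opportunity was found, and in whether there is a more general way to uncover a whole series of such opportunities. If we abstract the problem as "an expensive instruction that is executed many times", it is a good fit for a binary translation tool such as QEMU. Every guest instruction has to be translated by QEMU, so its counts are often more precise than what real hardware reports, and real hardware does not necessarily have a matching PMU event anyway.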

# Validating the Idea

With QEMU's insn plugin, we can count how often `sfence.vma` is executed in a given scenario, such as bootup:

```
qemu-system-riscv64 -machine virt -cpu rv64 -m 4G -smp 4 \
    -plugin $PD/libinsn.so,match=sfence -d plugin -D log \
```

After bootup the statistics look roughly like this. The counts are considerably higher than those reported in the email:

```
0x80003262, 'sfence.vma a4,s3', 61053 hits , cpu 0, 235498 match hits, Δ+1807 since last match, 46420 avg insns/match
0x80003262, 'sfence.vma a4,s3', 61054 hits , cpu 1, 284713 match hits, Δ+1806 since last match, 35344 avg insns/match
0x80003262, 'sfence.vma a4,s3', 61055 hits , cpu 2, 198382 match hits, Δ+1807 since last match, 37572 avg insns/match
0x80003262, 'sfence.vma a4,s3', 61056 hits , cpu 3, 207546 match hits, Δ+12931 since last match, 38918 avg insns/match
0x80003262, 'sfence.vma a4,s3', 61057 hits , cpu 0, 235499 match hits, Δ+1806 since last match, 46420 avg insns/match
0x80003262, 'sfence.vma a4,s3', 61058 hits , cpu 2, 198383 match hits, Δ+1806 since last match, 37572 avg insns/match
0x80003262, 'sfence.vma a4,s3', 61059 hits , cpu 1, 284714 match hits, Δ+1806 since last match, 35344 avg insns/match
```
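
Reading the raw log by eye does not scale, so a small script can condense it. The following is a minimal sketch in Python, not part of the original post. It assumes the log lines have exactly the layout shown above, that "hits" is a running count for one instruction address, and that "match hits" is a running count per CPU. The function names `parse_line` and `summarize` are hypothetical.

```python
import re

# Layout of one match line, taken from the sample output above.
LINE_RE = re.compile(
    r"^(?P<addr>0x[0-9a-fA-F]+), '(?P<disas>[^']*)', (?P<hits>\d+) hits\s*, "
    r"cpu (?P<cpu>\d+), (?P<match_hits>\d+) match hits, "
    r"Δ\+(?P<delta>\d+) since last match, (?P<avg>\d+) avg insns/match$"
)


def parse_line(line):
    """Parse one log line into a dict, or return None if it does not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    return {
        "addr": int(m["addr"], 16),
        "disas": m["disas"],
        "hits": int(m["hits"]),
        "cpu": int(m["cpu"]),
        "match_hits": int(m["match_hits"]),
        "delta": int(m["delta"]),
        "avg": int(m["avg"]),
    }


def summarize(lines):
    """Return (per_cpu, per_site) keeping the largest counter value seen.

    per_cpu maps cpu index -> highest "match hits" value (assumed cumulative).
    per_site maps (address, disassembly) -> highest "hits" value.
    """
    per_cpu = {}
    per_site = {}
    for line in lines:
        rec = parse_line(line)
        if rec is None:
            continue  # skip anything that is not a match line
        per_cpu[rec["cpu"]] = max(per_cpu.get(rec["cpu"], 0), rec["match_hits"])
        key = (rec["addr"], rec["disas"])
        per_site[key] = max(per_site.get(key, 0), rec["hits"])
    return per_cpu, per_site
```

Calling `summarize(open("log", encoding="utf-8"))` on the file written by `-D log` would then give one number per CPU and one per call site, which is easier to compare against the figures in the email than the raw stream.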
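
# Generalizing

* The same method can be used to check fence, CMO, or other instructions, reporting candidate optimization points automatically without extra manual analysis and reporting.
* More scenarios can be added as test cases.
* Hotspots should be tied back to the code that issues them; the current way of collecting statistics still has room for improvement.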
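
As a rough illustration of the automatic reporting idea in the list above, the sketch below ranks call sites by hit count and totals hits per mnemonic. It is an assumption-laden example, not the author's tooling: the input is a hypothetical mapping from `(address, disassembly)` to hits, and `min_hits` and `top` are arbitrary thresholds chosen for illustration.

```python
from collections import Counter


def rank_hotspots(per_site, min_hits=1000, top=10):
    """Rank instruction sites by hit count.

    per_site maps (address, disassembly) -> hits.
    Returns (by_mnemonic, sites): total hits per mnemonic, and a list of
    (hits, address, disassembly) tuples, hottest first, limited to `top`
    entries with at least `min_hits` hits.
    """
    by_mnemonic = Counter()
    for (addr, disas), hits in per_site.items():
        parts = disas.split()
        mnemonic = parts[0] if parts else disas
        by_mnemonic[mnemonic] += hits
    sites = sorted(
        ((hits, addr, disas)
         for (addr, disas), hits in per_site.items()
         if hits >= min_hits),
        reverse=True,
    )[:top]
    return by_mnemonic, sites


def format_report(sites):
    """Render ranked sites as text lines: address, hit count, disassembly."""
    return [f"{addr:#x}  {hits:>10}  {disas}" for hits, addr, disas in sites]
```

The addresses in such a report could then be resolved to source lines with a tool like `addr2line`, run against whichever firmware or kernel image with debug info actually contains that address.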