diff --git a/_posts/2024-01-08-qemu-sfence.md b/_posts/2024-01-08-qemu-sfence.md
new file mode 100644
index 0000000..6c7f627
--- /dev/null
+++ b/_posts/2024-01-08-qemu-sfence.md
@@ -0,0 +1,63 @@
+---
+layout: post
+title: A General Method for Finding Optimizable Instructions with QEMU
+author: "Fei Wu"
+header-mask: 0.4
+tags:
+  - Qemu
+  - RISC-V
+  - Performance
+---
+
+* content
+{:toc}
+
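+# Background
+
+I came across an email about the Svvptc proposal, [Feedback Request on Svvptc Extension Proposal](https://lists.riscv.org/g/tech-privileged/topic/feedback_request_on_svvptc/103045776), which points out that sfence.vma is executed far more often than necessary and leaves room for optimization.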
+
+```
+  +--------------------------+------+-----------------+
+  | Test                     | Gain | # of SFENCE.VMA |
+  +--------------------------+------+-----------------+
+  | Kernel boot              | 6%   | 50535 -> 8768   |
+  | ltp - mmapstress01       | 8%   | 44978 -> 6300   |
+  | lmbench - lat_pagefault  | 20%  | 665254 -> 832   |
+  | lmbench - lat_mmap       | 5%   | 546401 -> 718   |
+  +--------------------------+------+-----------------+
+   1. The gains represent performance improvements for
+      the benchmark metrics.
+   2. The second column lists the reduction in
+      issued single-address SFENCE.VMA instructions.
+```
+
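+# Some Thoughts
+
+Beyond the Svvptc extension itself, I am more interested in how this optimization opportunity was found, and in whether there is a more general way to find a whole class of such opportunities. If we abstract the problem as "an expensive instruction that is executed many times", it is a good fit for a binary translation tool such as QEMU. Every guest instruction has to pass through QEMU's translation, so the counts are often more precise than what real hardware gives, and real hardware does not necessarily have a PMU that can count the instruction in question.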
+
+# Validating the Idea
+
+With QEMU's insn plugin, we can count how often sfence.vma is executed in a given scenario, such as boot-up:
+
+```
+qemu-system-riscv64 -machine virt -cpu rv64 -m 4G -smp 4 \
+        -plugin $PD/libinsn.so,match=sfence -d plugin -D log \
+```
+
+After boot-up the statistics look roughly like this. The counts are considerably higher than those reported in the email:
+
+```
+0x80003262, 'sfence.vma              a4,s3', 61053 hits , cpu 0, 235498 match hits, Δ+1807 since last match, 46420 avg insns/match
+0x80003262, 'sfence.vma              a4,s3', 61054 hits , cpu 1, 284713 match hits, Δ+1806 since last match, 35344 avg insns/match
+0x80003262, 'sfence.vma              a4,s3', 61055 hits , cpu 2, 198382 match hits, Δ+1807 since last match, 37572 avg insns/match
+0x80003262, 'sfence.vma              a4,s3', 61056 hits , cpu 3, 207546 match hits, Δ+12931 since last match, 38918 avg insns/match
+0x80003262, 'sfence.vma              a4,s3', 61057 hits , cpu 0, 235499 match hits, Δ+1806 since last match, 46420 avg insns/match
+0x80003262, 'sfence.vma              a4,s3', 61058 hits , cpu 2, 198383 match hits, Δ+1806 since last match, 37572 avg insns/match
+0x80003262, 'sfence.vma              a4,s3', 61059 hits , cpu 1, 284714 match hits, Δ+1806 since last match, 35344 avg insns/match
+```
+
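+# Generalizing
+
+* The same method can be used to check fence, CMO, or other instructions, and to report candidate optimization points automatically, without extra manual analysis and reporting.
+* More scenarios can be added as test cases.
+* Hotspots should be tied back to the code; the current way of collecting statistics still has room for improvement.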