From d577a10c295de2b5c7f14e6c64a24edb6d47342d Mon Sep 17 00:00:00 2001
From: Jayant <xiezhengyao@bytedance.com>
Date: Thu, 18 Jul 2024 21:27:41 +0800
Subject: [PATCH] docs: optimize readme and doc (#10)

* chore: optimize readme and doc

* Update principle_cn.md
---
 README.md                             |  31 +++----
 docs/principle.md                     | 129 ++++++++++++++++++++++++++
 docs/principle_cn.md                  | 128 +++++++++++++++++++++++++
 pkg/proc/{objects.go => reference.go} |   0
 4 files changed, 270 insertions(+), 18 deletions(-)
 create mode 100644 docs/principle.md
 create mode 100644 docs/principle_cn.md
 rename pkg/proc/{objects.go => reference.go} (100%)
diff --git a/README.md b/README.md
index 91405c2..12211dd 100644
--- a/README.md
+++ b/README.md
@@ -4,20 +4,15 @@
 [![License](https://img.shields.io/github/license/cloudwego/goref)](https://github.com/cloudwego/goref/blob/main/LICENSE-APACHE)
 
 Goref is a Go heap object reference analysis tool based on delve.
+
 It can display the space and object count distribution of Go memory references, which is helpful for efficiently locating memory leak issues or viewing persistent heap objects to optimize GC overhead.
 
 ## Installation
 
-Clone the git repository and build:
-
 ```
-$ git clone https://github.com/cloudwego/goref
-$ cd goref
-$ go install github.com/cloudwego/goref/cmd/grf
+$ go install github.com/cloudwego/goref/cmd/grf@latest
 ```
 
-> Supported go version to compile the command tool: go1.21 ~ go1.22.
-
 ## Usage
 
 Attach to a running process with its PID, and then use go pprof tool to open the output file.
@@ -30,11 +25,14 @@ $ go tool pprof -http=:5079 ./grf.out
 
 The opened HTML page displays the reference distribution of the heap memory. You can choose to view the "inuse space" or "inuse objects".
 
-<img width="1920" alt="image" src="https://github.com/cloudwego/goref/assets/24311963/a9fe0294-fe58-456a-a9d5-a8cb25049bff"> <br />
+<img width="1919" alt="image" src="https://github.com/user-attachments/assets/95afe64b-0aab-4de6-93af-b5e671f43b0c">
+
+> Parsing closure types from DWARF is not supported before Go 1.23, so the sub-objects of `wpool.task` cannot be displayed.
 
 View flame graph of inuse objects:
 
-<img width="1916" alt="image" src="https://github.com/cloudwego/goref/assets/24311963/24e80f51-3af3-4405-8f71-57e51c42c7ed">
+<img width="1917" alt="image" src="https://github.com/user-attachments/assets/86e76318-7eea-4180-96e3-7a184e65252b">
+
 
 It also supports analyzing core files, e.g.
 
@@ -43,18 +41,15 @@ $ grf core ${execfile} ${corefile}
 successfully output to `grf.out`
 ```
 
-> Supported go version for executable file: go1.17 ~ go1.22.
+## Go Version Constraints
+
+- Executable file to be analyzed: built with go1.17 ~ go1.22.
+- Compiling the goref tool: go1.21 or later.
 
 
-## Principle
-The main steps of Goref reference analysis are as follows:
+## Docs
 
-1. Based on Delve, implement the functionality to attach processes and parse core files to achieve memory reading of the process to be analyzed.
-2. Parse the goroutine stack, data/bss segments, and heap span address space ranges, as well as gcmask, from the memory of the process to be analyzed, and build an index in the tool's runtime memory.
-3. Read DWARF Entries from the process executable file, parse the type descriptions of global variables and goroutine local variables, and calculate the actual memory addresses using Location expressions.
-4. Starting from the root objects obtained from step 3, prioritize object retrieval based on their DWARF types, determining the reference paths of all objects whose types can be determined.
-5. Search for gcmask and perform a second search for any objects that were not accessed during the DWARF retrieval in step 4, recording the found objects on the reference paths of known types.
-6. Record the number of objects and memory space occupied by all reference chains in the pprof file buffer, and then flush it to a file. In this way, we'll obtain a complete flame graph of the reference chains.
+[Principle](docs/principle.md) | [实现原理](docs/principle_cn.md)
 
 ## Credit
 
diff --git a/docs/principle.md b/docs/principle.md
new file mode 100644
index 0000000..46dc7e8
--- /dev/null
+++ b/docs/principle.md
@@ -0,0 +1,129 @@
+# Is Pprof Really Sufficient?
+
+As Go developers, we often run into memory leaks, and most people's first step is to capture a heap profile to find the cause. In many cases, however, the heap profile flame graph is of little help, because it only records where objects were created. In complex business scenarios where objects are passed through multiple layers of dependencies or reused in memory pools, it becomes nearly impossible to locate the root cause from the creation stack alone.
+
+Go is a garbage-collected language, so when an object cannot be freed, it is almost always because the GC has marked it as live through reference analysis. Java, another garbage-collected language, has more mature analysis tools; JProfiler, for example, can show object reference relationships. We therefore wanted an efficient reference analysis tool for Go that shows memory reference distribution and relationships accurately and directly, freeing us from laborious static analysis.
+
+The good news is that we have made significant progress on this tool; its usage and results are described in the README. The rest of this document explains its implementation in detail.
+
+# Ideas
+
+## GC Mark Process
+
+Before diving into the specific implementation, let's review how GC marks objects as live.
+
+Go adopts a tiered allocation scheme similar to tcmalloc: each heap object is allocated from an `mspan` whose element size is fixed. During GC, a heap address is resolved to its `mspan` through a multi-level index, which yields the base address and size of the original object. The GC bitmap records, for each 8-byte-aligned word in an object's memory, whether that word holds a pointer, so the GC knows whether to go on and mark downstream objects.
+
+For example, consider the following Go code snippet:
+
+```Go
+type Object struct {
+    A string
+    B int64
+    C *[]byte
+}
+// global variables
+var a = echo()
+var b *int64 = &echo().B
+func echo() *Object {
+    bytes := make([]byte, 1024)
+    return &Object{A: string(bytes), C: &bytes}
+}
+```
+
+When GC scans the variable `b`, it doesn't just scan the memory of the field `B int64` directly. Instead, it looks up the base address and elem size through the `mspan` index before performing the scan. As a result, the memory of fields `A` and `C`, as well as their downstream objects, will be marked as live.
+
+When GC scans the variable `a`, it finds that the corresponding GC bits are `1001`. How should we interpret this? Reading one bit per 8-byte word, with the rightmost bit describing the first word, the words at `base+0` and `base+24` are pointers, so scanning must continue into downstream objects: `A string` begins with a data pointer (followed by its length at `base+8`), and `C *[]byte` is itself a pointer.
+
+Based on this brief analysis, we can conclude that to find all live objects, the basic principle is to start from the GC roots and scan the GC bits of objects one by one. If an address is marked as `1`, we continue scanning downstream. For each downstream address, we need to determine its mspan to obtain the complete object's base address, size, and GC bits.
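+
+As a minimal sketch of how such a pointer bitmap is read, the snippet below decodes a mask into byte offsets. The helper `pointerOffsets` and the `uint64` mask representation are illustrative assumptions, not the runtime's actual data structures. For `Object` on a 64-bit platform, the string header of `A` keeps its data pointer in the first word and `C` occupies the fourth word, so the example uses a mask with bits 0 and 3 set.
+
+```go
+package main
+
+import "fmt"
+
+const wordSize = 8 // bytes per pointer-sized word on a 64-bit platform
+
+// pointerOffsets returns the byte offsets of the words whose bit is set in
+// mask. Bit i describes the word at offset i*wordSize.
+func pointerOffsets(mask uint64, words int) []uintptr {
+    var offs []uintptr
+    for i := 0; i < words; i++ {
+        if mask&(1<<uint(i)) != 0 {
+            offs = append(offs, uintptr(i*wordSize))
+        }
+    }
+    return offs
+}
+
+func main() {
+    // Object is four words long: A.ptr, A.len, B, C.
+    fmt.Println(pointerOffsets(0b1001, 4)) // prints [0 24]
+}
+```
+
+A scanner would read the pointer stored at each returned offset and repeat the same lookup on the object it points to.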
+
+## DWARF Type Information
+
+However, knowing the reference relationships between objects is, on its own, almost useless for troubleshooting, because it yields no variable names that developers could use to pinpoint the issue. A crucial further step is therefore to obtain the variable names and type information of these objects.
+
+Go is a statically typed language, and objects generally do not carry their own type information. For example, when we create an object with `obj = new(Object)`, memory holds only the values of the fields `A/B/C`, which occupy just 32 bytes. In that case, how can we obtain the type information?
+
+# Implementation of Goref
+
+## Delve Tool Introduction
+
+Anyone with Go development experience has probably used Delve. Even if you think you have never used it directly, the code debugging feature in the GoLand IDE is built on Delve underneath. You can probably picture the debugging window now, with its variable names, values, and types. That is exactly the type information we need!
+
+So, how does Delve obtain this type information for variables? When we attach to a process, Delve reads the executable through the symbolic link `/proc/<pid>/exe`, which points to the actual ELF file. During compilation, Go generates debug information and stores it in the executable's `.debug_*` sections, following the DWARF standard format. The type information for global and local variables that reference analysis needs can be parsed from this DWARF information.
+
+For global variables: Delve iterates over all DWARF entries and parses the ones with the `Variable` tag, which contain attributes such as Location, Type, and Name.
+
+1. Among them, the Type attribute records the type information of the variable. By recursively traversing it according to the DWARF format, we can further determine the type of each sub-object of the variable.
+
+2. The Location attribute is relatively complex. It stores an executable expression or a simple variable address. Its purpose is to determine the memory address of a variable or return the value of a register. During the resolution of global variables, Delve uses the Location attribute to obtain the memory address of the variable.
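+
+To make the entry walk concrete, here is a small sketch that lists global variable names from an ELF executable. It uses the standard library packages `debug/elf` and `debug/dwarf` rather than Delve's own DWARF loader, so it only approximates what Delve does; the function name `globalVarNames` is hypothetical, and evaluating the Location expression is left out.
+
+```go
+package main
+
+import (
+    "debug/dwarf"
+    "debug/elf"
+    "fmt"
+    "os"
+)
+
+// globalVarNames lists the names of package-level variables described in the
+// DWARF data of the ELF executable at path.
+func globalVarNames(path string) ([]string, error) {
+    f, err := elf.Open(path)
+    if err != nil {
+        return nil, err
+    }
+    defer f.Close()
+
+    data, err := f.DWARF()
+    if err != nil {
+        return nil, err
+    }
+
+    var names []string
+    r := data.Reader()
+    for {
+        entry, err := r.Next()
+        if err != nil {
+            return nil, err
+        }
+        if entry == nil {
+            break // end of the DWARF entries
+        }
+        switch entry.Tag {
+        case dwarf.TagCompileUnit:
+            // Descend into the unit: globals are its direct children.
+        case dwarf.TagVariable:
+            if name, ok := entry.Val(dwarf.AttrName).(string); ok {
+                names = append(names, name)
+            }
+        default:
+            // Skip function bodies and type trees so that local
+            // variables are not reported.
+            if entry.Children {
+                r.SkipChildren()
+            }
+        }
+    }
+    return names, nil
+}
+
+func main() {
+    if len(os.Args) < 2 {
+        fmt.Println("usage: globals <elf-executable>")
+        return
+    }
+    names, err := globalVarNames(os.Args[1])
+    if err != nil {
+        fmt.Println("error:", err)
+        return
+    }
+    fmt.Println(len(names), "global variables found")
+}
+```
+
+The same entries also carry `dwarf.AttrType` and `dwarf.AttrLocation`, which correspond to the Type and Location attributes described above.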
+
+
+The principle of resolving local variables within a Goroutine is similar to that of global variables, but it is slightly more complex. For example, it requires determining the DWARF offset based on the PC (Program Counter), and the location expressions can be more intricate, involving register access. However, delving into these details is beyond the scope of this discussion.
+
+## Building Metadata for GC Analysis
+
+Through the process attach and core file analysis features provided by Delve, we also gain read access to the memory of the target process. Following the way the GC marks objects, we build the necessary metadata of the target process in the runtime memory of our tool. This includes:
+
+1. The address space range of each Goroutine stack in the target process, together with the `stackmap` that stores the gcmask for that stack. The `stackmap` indicates whether a stack slot may point to a live heap object.
+
+2. The address space range of each data/bss segment in the target process, together with the gcmask of that segment. This gcmask likewise indicates whether a word may point to a live heap object.
+
+3. Together, the two items above provide the information needed to locate the GC roots.
+
+4. Finally, the mspan index of the target process is read and rebuilt in the memory of our tool, including the base, elem size, gcmask, and other information of each mspan.
+
+
+The above steps provide a general overview of the process, but there are additional details to consider, such as handling GC finalizer objects and special handling for the allocation header feature in Go 1.22. However, these details are beyond the scope of this discussion.
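+
+The rebuilt span index can be pictured with the following sketch, which resolves an arbitrary heap address to the object that contains it. The `span` type and `findObject` function are hypothetical simplifications: a sorted slice with binary search stands in for the runtime's multi-level index, and gcmask handling is omitted.
+
+```go
+package main
+
+import (
+    "fmt"
+    "sort"
+)
+
+// span is a simplified, hypothetical stand-in for the metadata kept per mspan.
+type span struct {
+    base     uint64 // address of the first element
+    elemSize uint64 // fixed size of every element in the span
+    nelems   uint64 // number of elements
+}
+
+func (s span) limit() uint64 { return s.base + s.elemSize*s.nelems }
+
+// findObject maps an arbitrary (possibly interior) address to the base
+// address and size of the heap object containing it. spans must be sorted by
+// base and must not overlap.
+func findObject(spans []span, addr uint64) (base, size uint64, ok bool) {
+    // Find the first span whose end lies beyond addr.
+    i := sort.Search(len(spans), func(i int) bool { return spans[i].limit() > addr })
+    if i == len(spans) || addr < spans[i].base {
+        return 0, 0, false // not a heap address
+    }
+    s := spans[i]
+    idx := (addr - s.base) / s.elemSize
+    return s.base + idx*s.elemSize, s.elemSize, true
+}
+
+func main() {
+    spans := []span{
+        {base: 0x1000, elemSize: 32, nelems: 4},
+        {base: 0x2000, elemSize: 1024, nelems: 2},
+    }
+    // 0x1030 lies 16 bytes into the second 32-byte element, like &obj.B.
+    fmt.Println(findObject(spans, 0x1030)) // prints 4128 32 true
+}
+```
+
+Rounding an interior pointer down to the element base is what lets a pointer to field `B` keep the whole `Object` alive.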
+
+## DWARF Type Scan
+
+Everything is now in place except the final piece. Both the GC metadata for heap scanning and the type information for the GC root variables have been parsed, so the most crucial step can begin: analyzing the object references.
+
+We invoke the `findRef` function and access the memory of an object according to its DWARF type. If the value is a pointer that may point to a downstream object, we read the pointer and look up the downstream object in the GC metadata, which, as described earlier, gives us the object's base address, element size, and gcmask.
+
+When an object is accessed, we record a mark bit so that it is not visited again. We then construct a new variable from the DWARF type of the sub-object and recursively invoke `findRef` until every object of known type has been visited.
+
+However, this way of scanning references is completely at odds with how the GC operates. The main reason is that Go code contains a large number of unsafe type conversions, so an object that was created with pointer fields may later be referenced through a type that has none, for example:
+
+```Go
+func echo() *byte {
+    bytes := make([]byte, 1024)
+    obj := &Object{A: string(bytes), C: &bytes}
+    return (*byte)(unsafe.Pointer(obj))
+}
+```
+
+From the perspective of GC, although the type was converted to `*byte` using unsafe, it did not affect the marking of its gcmask. Therefore, when scanning downstream objects, the complete `Object` object can still be scanned, and the downstream object `bytes` can be identified and marked as live.
+
+However, this is not achievable through DWARF type scanning. When encountering the `byte` type, it is considered an object without pointers and further scanning is skipped. Therefore, the only solution is to prioritize DWARF type scanning, and for objects that cannot be scanned using this method, resort to GC-style marking.
+
+To achieve this, each time we access a pointer of an object through its DWARF type, we clear the corresponding gcmask bit from 1 to 0. After an object has been scanned, if any pointer bits within its address range are still set, they are recorded as tasks for final marking. Once all objects have been scanned by DWARF type, these final marking tasks are retrieved and scanned a second time in the GC's manner.
+
+For example, the gcmask of the `Object` object mentioned above is `1001`: the lowest bit stands for the data pointer of `A` and the highest bit for `C`. After field A is read, the gcmask becomes `1000`. If field C is not accessed due to type coercion or memory out-of-bounds, it will be accounted for during the final GC marking scan.
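+
+The bookkeeping described above can be sketched as follows. The `object` type and its methods are hypothetical names chosen for illustration and are not taken from the goref source; the sketch only shows how clearing bits during the typed scan leaves exactly the unvisited pointers for the final scan.
+
+```go
+package main
+
+import "fmt"
+
+const wordSize = 8 // bytes per pointer-sized word on a 64-bit platform
+
+// object is a hypothetical record for one heap object in the tool's index.
+type object struct {
+    base uint64
+    mask uint64 // bit i set: word i holds a pointer not yet visited by type
+}
+
+// visitTyped records that the DWARF-typed scan followed the pointer stored at
+// the given byte offset, clearing its bit.
+func (o *object) visitTyped(offset uint64) {
+    o.mask &^= 1 << (offset / wordSize)
+}
+
+// pending returns the addresses of the pointer words that the typed scan
+// never reached; these are left for the GC-style final scan.
+func (o *object) pending(words int) []uint64 {
+    var addrs []uint64
+    for i := 0; i < words; i++ {
+        if o.mask&(1<<uint(i)) != 0 {
+            addrs = append(addrs, o.base+uint64(i*wordSize))
+        }
+    }
+    return addrs
+}
+
+func main() {
+    obj := &object{base: 0x1020, mask: 0b1001}
+    obj.visitTyped(0) // field A was reached through its DWARF type
+    fmt.Printf("mask=%04b pending=%#x\n", obj.mask, obj.pending(4))
+}
+```
+
+Here the word of field `C` at `base+24` stays pending, so its target would be counted under the reference chain of the enclosing object during the final scan.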
+
+
+## Final Scan
+
+The aforementioned field C, fields that cannot be accessed due to exceeding the address range defined by DWARF, or variables of types like `unsafe.Pointer` that cannot have their types determined, will all be marked during the final scan. Since the specific types of these objects cannot be determined, there is no need to output them separately. It is sufficient to record their size and count in the known reference chain.
+
+Quite a few commonly used libraries in Go's own implementation rely on `unsafe.Pointer`, which prevents their sub-objects from being identified by type. Such types require special handling.
+
+## Output File Format
+
+Once all objects have been scanned, the reference chains, together with their object counts and memory sizes, are written to a file. The file follows the pprof binary profile format and is encoded with protobuf.
+
+1. **Output root object format:**
+
+- Stack variable format: package name + function name + stack variable name
+
+   `github.com/cloudwego/kitex/client.invokeHandleEndpoint.func1.sendMsg`
+
+- Global variable format: package name + global variable name
+
+   `github.com/cloudwego/kitex/pkg/loadbalance/lbcache.balancerFactories`
+
+2. **Output sub-object format:**
+
+- Output the field name and type name of the child object, in the form of:
+
+   `Conn. (net.Conn)`
\ No newline at end of file
diff --git a/docs/principle_cn.md b/docs/principle_cn.md
new file mode 100644
index 0000000..9de207d
--- /dev/null
+++ b/docs/principle_cn.md
@@ -0,0 +1,128 @@
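+# Is Pprof Really Sufficient?
+
+As Go developers, we often run into memory leaks, and most people's first step is to capture a heap profile to find the cause. In many cases, however, the heap profile flame graph is of little help, because it only records where objects were created. In complex business scenarios where objects are passed through multiple layers of dependencies or reused in memory pools, it becomes nearly impossible to locate the root cause from the creation stack alone.
+
+Go is a garbage-collected language, so when an object cannot be freed, it is almost always because the GC has marked it as live through reference analysis. Java, another garbage-collected language, has more mature analysis tools; JProfiler, for example, can show object reference relationships. We therefore wanted an efficient reference analysis tool for Go that shows memory reference distribution and relationships accurately and directly, freeing us from laborious static analysis.
+
+The good news is that we have largely finished developing this tool; its usage and results are shown in the README. The rest of this document explains its implementation in detail.
+
+# Ideas
+
+## GC Mark Process
+
+Before getting into the implementation, let's review how the GC marks objects as live.
+
+Go adopts a tiered allocation scheme similar to tcmalloc: each heap object is allocated from an `mspan` whose element size is fixed. During GC, a heap address is resolved to its `mspan` through a multi-level index, which yields the base and size of the original object. The GC bitmap records, for each 8-byte-aligned word in an object's memory, whether that word holds a pointer, so the GC knows whether to go on and mark downstream objects.
+
+For example, consider the following Go code snippet: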
+
+```Go
+type Object struct {
+    A string
+    B int64
+    C *[]byte
+}
+// global variables
+var a = echo()
+var b *int64 = &echo().B
+func echo() *Object {
+    bytes := make([]byte, 1024)
+    return &Object{A: string(bytes), C: &bytes}
+}
+```
+
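+When GC scans the variable `b`, it does not simply scan the memory of the field `B int64`. Instead, it looks up the `base` and `elem size` through the `mspan` index before scanning, so the memory of fields A and C, as well as their downstream objects, is marked as live.
+
+When GC scans the variable `a`, it finds that the corresponding GC bits are `1001`. How should we interpret this? Reading one bit per 8-byte word, with the rightmost bit describing the first word, the words at `base+0` and `base+24` are pointers, so scanning must continue into downstream objects: `A string` begins with a data pointer, and `C *[]byte` is itself a pointer.
+
+From this brief analysis we can see that, to find all live objects, the basic principle is to start from the GC roots and scan the GC bits of objects one by one. If a word is marked `1`, scanning continues downstream, and for each downstream address we determine its mspan to obtain the complete object's base address, size, and GC bits.
+
+## DWARF Type Information
+
+However, knowing the reference relationships between objects is, on its own, almost useless for troubleshooting, because it yields no variable names that developers could use to pinpoint the issue. A crucial further step is therefore to obtain the variable names and type information of these objects.
+
+Go is a statically typed language, and objects generally do not carry their own type information. For example, when we create an object with `obj=new(Object)`, memory holds only the values of the three fields `A/B/C`, which occupy just 32 bytes. In that case, how can we obtain the type information?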
+
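+# Implementation of Goref
+
+## Delve Tool Introduction
+
+Anyone with Go development experience has probably used Delve. If you think you have not, rest assured that the code debugging feature you use in the GoLand IDE is built on Delve underneath. You can probably picture the debugging window now: the variable names, values, and types it displays are exactly the type information we need!
+
+So, how does Delve obtain this type information for variables? When we attach to a process, Delve reads the executable through `/proc/<pid>/exe`, a symbolic link to the actual ELF file path. During compilation, Go generates debug information and stores it, in the DWARF standard format, in the sections of the executable prefixed with `.debug_*`. The type information for global and local variables that reference analysis needs can be parsed from this DWARF information.
+
+For global variables: Delve iterates over all DWARF entries and parses the global variable entries that carry the `Variable` tag. These entries contain attributes such as Location, Type, and Name.
+
+1. The Type attribute records the type information of the variable. By traversing it recursively according to the DWARF format, we can determine the type of every sub-object of the variable.
+
+2. The Location attribute is relatively complex. It stores either an executable expression or a simple variable address, and its purpose is to determine the memory address of a variable or return the value of a register. When resolving global variables, Delve uses it to obtain the memory address of the variable.
+
+
+Resolving local variables in a Goroutine follows much the same principle as for global variables, but is somewhat more complex. For example, the DWARF offset must be determined from the PC, and the location expressions are more intricate and involve register access. We will not go into these details here.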
+
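+## Building Metadata for GC Analysis
+
+Through the process attach and core file analysis features provided by Delve, we also gain read access to the memory of the target process. Following the way the GC marks objects, we build the necessary metadata of the target process in the runtime memory of our tool. This includes:
+
+1. The address space range of each goroutine stack in the target process, together with the `stackmap` that stores the gcmask for that stack, which indicates whether a stack slot may point to a live heap object;
+
+2. The address space range of each data/bss segment in the target process, together with the gcmask of that segment, which likewise indicates whether a word may point to a live heap object;
+
+3. Together, the two items above provide the information needed to locate the GC roots;
+
+4. Finally, the `mspan` index of the target process is read, along with the base, elem size, gcmask, and other information of each `mspan`, and the index is rebuilt in the memory of our tool.
+
+
+The steps above are the general flow. There are further details to handle, such as GC finalizer objects and the allocation header feature introduced in go1.22, which we will not go into here.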
+
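+## DWARF Type Scan
+
+Everything is now in place except the final piece. Both the GC metadata for heap scanning and the type information for the GC root variables have been parsed, so the most crucial step can begin: analyzing the object references.
+
+We invoke the `findRef` function and access the memory of an object according to its DWARF type. If the value is a pointer that may point to a downstream object, we read the pointer and look up the downstream object in the GC metadata, which, as described earlier, gives us the object's base, elem size, and gcmask.
+
+When an object is accessed, we record a mark bit so that it is not visited again. We then construct a new variable from the DWARF type of the sub-object and recursively invoke `findRef` until every object of known type has been visited.
+
+However, this way of scanning references is completely at odds with how the GC operates. The main reason is that Go code contains a large number of unsafe type conversions, so an object that was created with pointer fields may later be referenced through a type that has none, for example: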
+
+```Go
+func echo() *byte {
+    bytes := make([]byte, 1024)
+    obj := &Object{A: string(bytes), C: &bytes}
+    return (*byte)(unsafe.Pointer(obj))
+}
+```
+
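+From the GC's perspective, although unsafe converted the type to `*byte`, the gcmask of the object is unaffected. When scanning downstream objects, the GC can therefore still scan the complete `Object`, identify the downstream object `bytes`, and mark it as live.
+
+DWARF type scanning cannot do this. When it encounters the `byte` type, it treats the object as pointer-free and skips further scanning. The only solution is therefore to scan by DWARF type first and, for objects that cannot be reached this way, fall back to GC-style marking.
+
+To achieve this, each time we access a pointer of an object through its DWARF type, we clear the corresponding gcmask bit from 1 to 0. After an object has been scanned, if any pointer bits within its address range are still set, they are recorded as tasks for final marking. Once all objects have been scanned by DWARF type, these final marking tasks are retrieved and scanned a second time in the GC's manner.
+
+For example, the gcmask of the `Object` object above is `1001`: the lowest bit stands for the data pointer of `A` and the highest bit for `C`. After field A is read, the gcmask becomes `1000`. If field C is not accessed because of a type cast or memory out-of-bounds, it is accounted for during GC marking in the final scan.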
+
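+## Final Scan
+
+Field C above, fields that cannot be accessed because they lie beyond the address range defined by DWARF, and variables such as `unsafe.Pointer` whose types cannot be determined are all marked during the final scan. Because the concrete types of these objects are unknown, they do not need to be output separately; it is enough to add their size and count to the known reference chain.
+
+Quite a few commonly used libraries in Go's own implementation rely on `unsafe.Pointer`, which prevents their sub-objects from being identified by type. Such types require special handling.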
+
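+## Output File Format
+
+Once all objects have been scanned, the reference chains, together with their object counts and memory sizes, are written to a file. The file follows the pprof binary profile format and is encoded with protobuf.
+
+1. **Output root object format:**
+
+- Stack variable format: package name + function name + stack variable name
+
+    `github.com/cloudwego/kitex/client.invokeHandleEndpoint.func1.sendMsg`
+
+- Global variable format: package name + global variable name
+
+    `github.com/cloudwego/kitex/pkg/loadbalance/lbcache.balancerFactories`
+
+2. **Output sub-object format:**
+
+- Output the field name and type name of the sub-object, in the form of:
+
+    `Conn. (net.Conn)`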
diff --git a/pkg/proc/objects.go b/pkg/proc/reference.go
similarity index 100%
rename from pkg/proc/objects.go
rename to pkg/proc/reference.go