Skip to content

GDA Path Solver

charles2gan edited this page Dec 13, 2021 · 2 revisions

GDA Path Solver

Application fields of program path solving

   Vulnerability detection: When we determine the attack surface of the program, we need to know whether there is a path between the attack surface and the vulnerable function.

   Privacy Leaking:For any function of getting sensitive data in the program, it is necessary to know whether and where the data is leaked.

   Malicious code analysis: For sensitive functions in malicious code, it is necessary to determine whether and where these sensitive functions are triggered.

   Others: closed-source code audit, program semi-automatic analysis, etc.

The Idea of GDA Path Solving

   A program can be regarded as a set of finite nonlinear data flows, whose branches and flows are determined by the call chain and control flow. A flowing data with direction form a path. On the contrary, each path also determines the direction of data flow just like the pipeline of data. In a complex program, there would be dozens or even millions of paths, and then under the action of numerous branch instructions and call-tree, a complex network is formed. Therefore, the program path solution here mainly solves the connectivity problem of two or more specific nodes. All the connectivity paths are the set of solutions. Of course, the problem of multi-node connectivity can be simplified to the problem of two-node connectivity.

   In GDA, it mainly solves the path between two nodes, and the node of the path is a low-level intermediate expression(LIR). Here, I call the starting point of path solving as a source and the target node as a target. This guild mainly uses the method of data flow analysis to spread data with source as the starting point, combined with the control flow diagram, call tree, and object association. In the process of transmission, circulation (loop) and path explosion should be prevented. Finally, the solution results are recorded. In the real world application process, we do not have to spend a lot of energy on all the LIR nodes (any LIR node can be analyzed by using the taint propagation analysis based on smali), we pay more attention to function call nodes.

   In GDA decompiler, the path set of two nodes is solved by arbitrary value reachability. Because the two nodes often have too many paths to connect, it is very cumbersome to show on GUI. Therefore, in order to reduce the complexity, I will filter out the non-calling nodes in the solution results, and retain the key function call points, so as to facilitate analysis and locating.

HOW TO DO

   Open GDA decompiler, Menu > Tool > Path Solving, and the path solving dialog will pop up.

We can select APIs on the right side of the dialog to search the API in real-time. If you would like to search for NON-APIs(methods of classes), you can cancel the selection by clicking it. In the search results, we can choose an interesting result as the source or target.

   Here, we choose android.database.Cursor.getString as the source. Since we cannot access the internal logic of the API during static analysis, Ret-value is selected by default, which means that the current node takes the return value of the selected method as the starting point of the path. If we choose a NON-API as the source, we can also specify the parameters of the source.    

   Next, we search for a function as a target. Here we use the logging API(android.util.Log.d) as an example. We can also choose "MeetArg" to do refined soluting (for example, selecting the first parameter "p0" represents that there is an effective solution where the data stream from the source must pass through p0 before. If "meetarg" is not selected, any path through this API is one of the solutions).

   Finally, we click solve and the result is displayed in the form of a tree. When GDA is solving, first search all the call points of the source, and start the path search from the return value of each call point(if NON-API and chosen MeetArg, from chosen argument). Once the target is found, it will be added to the solution set. Through the result of the solution, we can see that there are three call points from source to target.

By default, GDA only writes the solved source and target to the tree.

If you are interested in the details of the path, you can right-click > detail to view the detail of path execution.

At the same time, when you click on each node, GDA automatically locates the node in the decompiling (or smali) code. Because it is difficult to save the reallocation of the original instructions when decompiling, so difficult to accurately locate the node location in Java code, but it can be accurately located in the smali code(just press F5).