Deadlock detection #152

Pranav-Agarwal0612 · 2025-02-07T04:22:16Z

Deadlock Module
Created a dependency graph between Objects.
Tested with mutex cycle detection.

Lock Module
Changed scan_mutex_lock to return its findings for graph creating purpose

Created a dependency graph between Objects. Tested with mutex cycle detection. Lock Module Changed scan_mutex_lock to return its findings for graph creating purpose

brenns10 · 2025-02-10T18:07:18Z

drgn_tools/deadlock.py

@@ -0,0 +1,109 @@
+import argparse


This new file needs an Oracle copyright header. See the .header.txt file, but make sure to use 2025 :)
(This should have been automatically caught by our pre-commit hooks. Please install them and use them for the future, see the developer workflow doc.)

brenns10 · 2025-02-10T18:08:46Z

drgn_tools/deadlock.py

+
+class DependencyGraph:


nit: there should be two blank lines separating the "import" statements from the rest of the top-level code.

This should be automatically detected, and fixed, by our pre-commit hooks. I won't add any more comments on coding style, since it's not really interesting and the pre-commit hooks should fix everything.

brenns10 · 2025-02-10T18:12:53Z

drgn_tools/deadlock.py

+        def __init__(self, object: Object) :
+            self.name: str = object.type_.type_name() # TypeName of Object inside Node


I think it would be good to allow creating a Node object that doesn't represent a drgn Object. For instance, a CPU or IRQ number are things we may want to use in deadlock detection, and those don't necessarily have an object associated with them. In that case, I think you'd want to allow something like:

node = Node("cpu", 5) # but also: node = Node(mutex_object)

brenns10 · 2025-02-10T18:34:51Z

drgn_tools/deadlock.py

+            self.name: str = object.type_.type_name() # TypeName of Object inside Node
+            self.address: int = object.value_() # Memory address of Object in host RAM
+            self.object: Object = object # The object itself
+            self.depends_on: list[__class__] = []


Type annotations for collections (List, Tuple, Dict, etc) should use the classes from typing (e.g. from typing import List). This is because we need to maintain compatibility with Python 3.6. The default objects (list, tuple, dict) cannot be used in annotations until Python 3.9.

Type annotations for the class itself should be a string with the class name. This is because the class name ("Node") does not exist in the namespace yet. The postponed evaluation of annotations feature would allow us to use the class name, and the typing.Self class would also be helpful here, but unfortunately neither are available for Python 3.6. Thus, you need to just use a string :(

Suggested change

self.depends_on: list[__class__] = []

self.depends_on: List["Node"] = []

brenns10 · 2025-02-10T18:49:30Z

drgn_tools/deadlock.py

+    class Node:
+        def __init__(self, object: Object) :
+            self.name: str = object.type_.type_name() # TypeName of Object inside Node
+            self.address: int = object.value_() # Memory address of Object in host RAM


This needs to be slightly more intelligent. Drgn objects, like C objects, can either be a pointer, or a value. So you can have a drgn struct mutex *, or a drgn struct mutex. These are different. The object.value_() method will return an int for the pointer, but for the actual struct, it would return a dict of the contents. So it's possible to have two different behaviors:

A user passes a struct mutex * (the expected behavior). They get a node with name = "struct mutex *" and the correct value.

A user passes a struct mutex. They get a node with name = "struct mutex *" (no star), and the value is a dictionary, which is not hashable.

Many common mutexes would be accessed in a way that the object you get is actually a struct mutex, not a struct mutex *. But other access mechanisms may result in a pointer. So this needs to handle both. I'd recommend something like this:

Suggested change

self.address: int = object.value_() # Memory address of Object in host RAM

type_ = object.type_

if type_.kind == TypeKind.POINTER:

object = object[0]

type_ = object.type_

if type_.kind != TypeKind.STRUCT or object.address_ is None:

raise ValueError("A reference object of type struct is expected")

self.name = type_.type_name()

self.address = object.address_

self.address: int = object.value_() # Memory address of Object in host RAM

Question: Can there be objects of type struct <something> **, or deeper?

brenns10 · 2025-02-10T18:52:02Z

drgn_tools/deadlock.py

+        self.NodeSet: dict[tuple, self.Node] = dict()
+
+
+    def insert(self, dependency: tuple) -> None:


Using a tuple for an argument here is a bit unintuitive. The first argument is an object, and the second argument is a list of things it depends on. I think it would make more sense to make this function take two arguments, rather than a single tuple.

That said, I don't think this is the right API. We shouldn't expect that the caller has every single dependency ahead of time. Rather, I think it would be nice to have a simple API: add_edge(src, dst). This would create a node for both src and dst if necessary, and then it would add dst to the depends_on list of src. This makes it easier for the caller: they don't need to form a list of all the dependencies ahead of time. They can just record each dependency as they discover them.

Another note on the type annotations... I won't highlight all of them, you get the idea. Again, this should use Tuple rather than tuple, and it should include annotations of the elements of the tuple as well.

Finally, can you have a docstring that describes the arguments? In particular, document which node depends on which other node. The order of the arguments is important :)

brenns10 · 2025-02-10T18:56:13Z

drgn_tools/deadlock.py

+            for neighbor in node.depends_on:
+                dfs(neighbor)


I don't think recursive DFS will be a good idea. The Python stack has a limited number of frames, and while I would hope that there isn't a cycle composed of 1000 elements, I think it would be prudent to avoid a known error case like that.

brenns10 · 2025-02-10T18:57:23Z

drgn_tools/deadlock.py

+
+
+    def __init__(self):
+        self.NodeSet: dict[tuple, self.Node] = dict()


For class variables, please try to use lower_case_with_underscores. EG node_set.

Though in this case, it's not really accurate, because it's a dictionary. It's the actual node mapping. I would just call it nodes or something.

(Also again, note on using the typing versions of classes for annotations here)

brenns10 · 2025-02-10T18:59:22Z

drgn_tools/deadlock.py

+        if not blocking_object:
+            return 
+
+        key = (blocking_object.type_.type_name(), blocking_object.value_())


I had some comments above on getting the correct type name and value. Since you're repeating this logic, it would be good to have it in a helper function.

brenns10 · 2025-02-10T19:10:09Z

drgn_tools/deadlock.py

+
+
+
+    def detect_cycle(self):


This needs a type annotation.

brenns10 · 2025-02-10T19:10:10Z

drgn_tools/deadlock.py

+
+
+
+    def detect_cycle(self):


This needs a type annotation.

Should I take the detect_cycle function out of the class?

brenns10 · 2025-02-10T19:14:49Z

drgn_tools/deadlock.py

+        for cycle in cycles:
+            print(cycle)


The cycle, as far as I can tell, is just a list of nodes. This will not be nicely formatted or very intuitive for the user. We need to think more carefully about how we format this for the user, so that it is understandable and useful. We can make this a TODO if the problem is a bit larger than we can fit into a single sprint but I think it's important.

brenns10 · 2025-02-10T20:06:02Z

drgn_tools/deadlock.py

+            if node not in visited:  # Only visit unvisited nodes
+                dfs(node)
+
+        return cycles


I think the problem we want to solve is larger than simply running DFS on each node. Suppose we have the following graph:

A -> D B -> D C -> D D -> C E -> F

There are two things we care about.

What is the minimal cycle present? In this case, it is C <-> D.

What is the maximal set of of elements whose dependencies include the cycle? In this case, it is A, B, C, D.

I think we want to return both pieces of information, and we don't want any duplication. That way, we would be able to report to the user the root cause of the deadlock (that's the first thing), and we'd also be able to go through the second item and report all the tasks or CPUs or whatever are impacted by the cycle, without needing to spell out the entire path for each of them.

This can easily be solved by changing A depends_on B with B blocks A. So when we find the cycle, we can also print out the objects/nodes blocked on the nodes present in the cycle. There might be future benefits of keeping both depends_on and blocks, but I am not sure right now.

brenns10 · 2025-02-10T20:06:47Z

drgn_tools/lock.py

-) -> None:
+) -> List[Tuple[Object, List[Object]]]:


Instead of tracking and returning the dependencies, maybe just include an optional argument which is the graph. If the argument is present, then the function would add edges to the graph as it encounters them.

I think instead of a graph argument, we should have a separate get_mutex_lock_info function from the scan_mutex_lock function. Because even on passing a graph argument, the lock info will get printed a second time on both lock and deadlock modules. I am thinking of a way to prevent duplicate code.

Deadlock Module

ba97165

Created a dependency graph between Objects. Tested with mutex cycle detection. Lock Module Changed scan_mutex_lock to return its findings for graph creating purpose

oracle-contributor-agreement bot added the OCA Verified All contributors have signed the Oracle Contributor Agreement. label Feb 7, 2025

brenns10 reviewed Feb 10, 2025

View reviewed changes

Pranav-Agarwal0612 added 2 commits February 18, 2025 09:18

Added Node creation for non-objects. Fixed design and formatting

7265719

Fixed Cycle detection algorithm from recursive to iterative

5b9a21e

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Deadlock detection #152

Deadlock detection #152

Pranav-Agarwal0612 commented Feb 7, 2025

brenns10 Feb 10, 2025

brenns10 Feb 10, 2025

brenns10 Feb 10, 2025

brenns10 Feb 10, 2025

brenns10 Feb 10, 2025

Pranav-Agarwal0612 Feb 12, 2025

brenns10 Feb 10, 2025

brenns10 Feb 10, 2025

brenns10 Feb 10, 2025

Pranav-Agarwal0612 Feb 18, 2025

brenns10 Feb 10, 2025

Pranav-Agarwal0612 Feb 18, 2025

brenns10 Feb 10, 2025

Pranav-Agarwal0612 Feb 18, 2025

brenns10 Feb 10, 2025

Pranav-Agarwal0612 Feb 12, 2025

brenns10 Feb 10, 2025

brenns10 Feb 10, 2025

Pranav-Agarwal0612 Feb 12, 2025

brenns10 Feb 10, 2025

Pranav-Agarwal0612 Feb 12, 2025

		def __init__(self, object: Object) :
		self.name: str = object.type_.type_name() # TypeName of Object inside Node

	self.depends_on: list[__class__] = []
	self.depends_on: List["Node"] = []

-            self.address: int = object.value_() # Memory address of Object in host RAM
+            type_ = object.type_
+            if type_.kind == TypeKind.POINTER:
+                object = object[0]
+                type_ = object.type_
+            if type_.kind != TypeKind.STRUCT or object.address_ is None:
+                raise ValueError("A reference object of type struct is expected")
+            self.name = type_.type_name()
+            self.address = object.address_
+            self.address: int = object.value_() # Memory address of Object in host RAM

		self.NodeSet: dict[tuple, self.Node] = dict()


		def insert(self, dependency: tuple) -> None:



		def __init__(self):
		self.NodeSet: dict[tuple, self.Node] = dict()

Deadlock detection #152

Are you sure you want to change the base?

Deadlock detection #152

Conversation

Pranav-Agarwal0612 commented Feb 7, 2025

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment