Refactor py-spy & pyperf to separate ProfilerInterface. #805

marcin-ol · 2023-06-20T12:10:14Z

Refactor py-spy & pyperf to separate ProfilerInterface classes.

Description

Group profilers by common runtime (e.g. Python, Java).
Support automatic selection of the best available profiler for each runtime.

Related Issue

#500

Motivation and Context

PythonProfiler is a class that hides the fact that there are two underlying profiler types, py-spy and pyperf, and decides by itself which one is used.
There was no mechanism to support multiple profiler options while keeping them separate. Profilers added in the future will benefit from this as well.

Also: #448

How Has This Been Tested?

Additional tests were developed for the new mechanism, and tests were added to ensure correct behavior of Python profiling.

Checklist:

I have read the CONTRIBUTING document.
I have added tests for new logic.

marcin-ol · 2023-06-20T12:53:20Z

Remaining actions for this PR:

Register PySpyProfiler and EbpfProfiler separately for Python, using the new mechanism.
Organize profiler options: supported modes and runtime arguments should be either well separated or declared with the runtime (Python) itself.

Jongy · 2023-07-08T15:25:28Z

tests/conftest.py

+
+    # Reuse gProfiler flow to initialize specific profiler instance - one selected by runtime fixture.
+    @contextmanager
+    def _make_profiler_instance() -> Iterator[ProfilerBase]:


What's the reason for creating profiler instances in such an implicit way, and not directly as we did before?

Personally I prefer the explicit way. This new flow is harder to understand and more error-prone :/

Yes, in hindsight that could be simpler.

Jongy · 2023-07-08T15:27:49Z

tests/test_python.py

+    )
+    wait_for_log(gprofiler, "gProfiler initialized and ready to start profiling", 0, timeout=7)
+    assert f"Initialized {profiler_class_name}".encode() in gprofiler.logs()
+    gprofiler.remove(force=True)


force=True uses SIGKILL, preventing gprofiler from doing proper cleanup. Why?

No specific reason, I think I took a shortcut here.
Will replace it.
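A possible replacement could look like the sketch below. It assumes `gprofiler` in the test behaves like a docker-py `Container`; the helper name `stop_and_remove` and the default timeout are hypothetical, not part of this PR.

```python
from typing import Any


def stop_and_remove(container: Any, timeout: int = 10) -> None:
    """Stop a container gracefully, then remove it.

    `container` is assumed to behave like docker-py's Container:
    stop(timeout=...) asks the process to exit (SIGTERM by default) and only
    falls back to SIGKILL once `timeout` seconds have passed, which gives
    gprofiler a chance to run its cleanup. remove() is called without
    force=True, so it only succeeds once the container has stopped.
    """
    container.stop(timeout=timeout)
    container.remove()
```

In the test, `gprofiler.remove(force=True)` would then become `stop_and_remove(gprofiler)`.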

Jongy · 2023-07-08T15:28:20Z

gprofiler/profilers/python_ebpf.py

+    "Python",
+    profiler_name="PyPerf",
+    is_preferred=True,
+    # py-spy is like pyspy, it's confusing and I mix between them


Comment not relevant here anymore

Jongy · 2023-07-08T15:34:18Z

gprofiler/profilers/registry.py

+def get_runtime_possible_modes(runtime: str) -> List[str]:
+    possible_modes: List[str] = [ProfilerConfig.ENABLED_MODE] if len(profilers_config[runtime]) > 1 else []
+    for config in profilers_config[runtime]:
+        possible_modes += [m for m in config.get_active_modes() if m not in possible_modes]


Instead of checking for existence you can build it as a set and then convert to a list.
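A minimal sketch of that idea, with a simplified stand-in for `ProfilerConfig` (`FakeConfig`, `get_possible_modes` and the mode strings are hypothetical). One caveat: a plain `set` does not keep insertion order, so the sketch uses `dict` keys instead, which deduplicate the same way but preserve the order in which modes were first seen.

```python
from typing import Dict, Iterable, List

ENABLED_MODE = "enabled"  # assumed value, for illustration only


class FakeConfig:
    """Hypothetical stand-in for ProfilerConfig; only what the sketch needs."""

    def __init__(self, modes: Iterable[str]) -> None:
        self._modes = list(modes)

    def get_active_modes(self) -> List[str]:
        return self._modes


def get_possible_modes(configs: List[FakeConfig]) -> List[str]:
    # dict keys are unique and keep insertion order, so duplicates are
    # dropped without an explicit "if m not in possible_modes" check.
    modes: Dict[str, None] = {}
    if len(configs) > 1:
        modes[ENABLED_MODE] = None
    for config in configs:
        modes.update(dict.fromkeys(config.get_active_modes()))
    return list(modes)
```

If the order of the resulting list does not matter, `list(set(...))` would do the same job with less code.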

Jongy · 2023-07-08T15:37:53Z

gprofiler/profilers/registry.py

+    arch = get_arch()
+    profiler_configs = sorted(
+        profilers_config[runtime],
+        key=lambda c: (arch in c.get_supported_archs(), c.is_preferred, c.profiler_name),


unsupported arches should be filtered out, not sorted away, no?

Yes, they should be.
This was needed for some earlier revision of this PR and will be removed.

On second thought, this has one implication: we won't be able to explain to the user why a profiler wasn't selected.

factory.get_profilers() walks through this list of sorted profilers and warns if some profilers are unavailable (on the given architecture);

if it gets an already-filtered list, the user will only learn that no profilers were selected.

To be on the safe side, we should filter available profilers when building the list of command-line arguments. Therefore another function, _add_profilers_arguments(), needs to change to do this filtering.
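To illustrate the trade-off discussed here, below is a rough sketch of filtering by architecture while still reporting why a profiler was skipped. It is not the code in this PR: `FakeConfig`, `select_candidates` and the ordering key are assumptions made for the example.

```python
import logging
from dataclasses import dataclass
from typing import List, Tuple

logger = logging.getLogger(__name__)


@dataclass
class FakeConfig:
    """Hypothetical stand-in for a registered profiler config."""

    profiler_name: str
    supported_archs: Tuple[str, ...]
    is_preferred: bool = False


def select_candidates(configs: List[FakeConfig], arch: str) -> List[FakeConfig]:
    supported = []
    for config in configs:
        if arch in config.supported_archs:
            supported.append(config)
        else:
            # Report the reason while filtering, so the user still learns
            # why this profiler was not considered.
            logger.warning("%s does not support %s, skipping", config.profiler_name, arch)
    # Preferred profilers first, then by name for a stable order.
    return sorted(supported, key=lambda c: (not c.is_preferred, c.profiler_name))
```

With a filter like this, the architecture no longer needs to be part of the sort key, and the warning is emitted at the point where the profiler is dropped.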

Jongy · 2023-07-08T15:42:32Z