
System tracing provider for tracing via native probes (BPF) #1288

Draft
wants to merge 11 commits into master

Conversation

@lc525 (Member) commented Jul 7, 2023

This PR introduces a new tracing provider, enabling dynamic attachment of native (BPF, Systemtap) probes at runtime.
The goal of the provider is to allow correlation between MLServer-specific events and operating system behaviour (system load, performance, resource usage, etc).

Approach

The provider exposes a number of tracepoints (hook functions without any code attached), which can be triggered (fired)
when particular application-level events happen (e.g. a model gets loaded/unloaded, an inference request joins a queue, etc).

At runtime, external probes (BPF programs, Systemtap scripts) can be attached to those tracepoints to perform tracing actions (measurements, in-kernel data aggregation, correlating MLServer context with OS context, etc.).

The underlying implementation creates and dynamically links a native shared library where the tracepoint hooks exist as functions containing just a couple of nop instructions. When external probes are attached, the code of those tracepoints is modified at runtime to jump into the tracing code.

Features

  • Completely optional feature, with the ability to enable it via the tracepoints extra
  • The exact tracepoints exposed for external probing are configurable via settings
  • Near-zero overhead when not in use (no external probes attached). Because tracepoints look like normal functions from the perspective of Python code and receive arguments that may be expensive to compute, the provider offers a way to compute tracepoint arguments conditionally, based on whether an external probe is currently attached to that tracepoint
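A minimal pure-Python sketch of that conditional-argument idea (class and method names here are illustrative, not the provider's real API):

```python
# Sketch: defer argument computation until we know a probe is attached.
class Tracepoint:
    def __init__(self, name):
        self.name = name
        self.enabled = False  # flipped when an external probe attaches

    def fire(self, *args):
        if self.enabled:
            pass  # hand args to the native probe here


class Provider:
    def __init__(self):
        self.tracepoints = {}

    def register(self, name):
        tp = self.tracepoints[name] = Tracepoint(name)
        return tp

    def fire_lazy(self, name, make_args):
        """Only call make_args() (which may be expensive) if a probe is attached."""
        tp = self.tracepoints[name]
        if tp.enabled:
            tp.fire(*make_args())


provider = Provider()
provider.register("inference_begin")
# No probe attached: the lambda is never evaluated, so the hypothetical
# compute_request_summary() is never called and overhead stays near zero.
provider.fire_lazy("inference_begin", lambda: (compute_request_summary(),))
```

Passing a callable instead of precomputed values is what keeps the disabled path cheap: the cost of the check is a dictionary lookup and a branch.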

Introduced dependencies

  • When the tracepoints extra is enabled, MLServer requires additional dependencies: linux-usdt/python-stapsdt and the native linux-usdt/libstapsdt library
  • MLServer will continue to work as normal even if the extra is enabled but those dependencies are not met
  • The Dockerfile has been updated to allow building container images with the system tracing dependencies installed

TODOs

  • Add instrumentation to the inference request path (including for batched requests)
  • Stabilise tracepoint arguments
  • Write documentation example for simple usage via bpftrace
  • Move to "off-by-default" settings in settings/Dockerfile (currently left enabled for testing)

@adriangonz (Contributor) left a comment

This looks amazing @lc525! 🚀

Conscious this is still a WIP draft, but I've added a few comments and some questions below. Happy to chat about how you foresee this being used!

mlserver/types/__init__.py (resolved)
Comment on lines 177 to 186
ArgTypeMap._prototypes = {
    Tracepoint.model_load_begin: ArgTypeMap.model_args,
    Tracepoint.model_load_end: ArgTypeMap.model_args,
    Tracepoint.model_reload_begin: ArgTypeMap.model_args_reload,
    Tracepoint.model_reload_end: ArgTypeMap.model_args,
    Tracepoint.model_unload: ArgTypeMap.model_args,
    Tracepoint.inference_enqueue_req: ArgTypeMap.queue_args,
    Tracepoint.inference_begin: ArgTypeMap.infer_args,
    Tracepoint.inference_end: ArgTypeMap.infer_args,
}
Contributor:

Thinking out loud here, but the tracepoints and args seem very coupled with the implementation - would it be possible to make those more generic?

My main worry would be whether a small change into any of those implementations (e.g. changing the args for inference) could then cause any side effects around system tracing.

Member Author (lc525):

In a way, they do end up being quite coupled. However, the prototypes essentially become the public interface that external probes use to run their code. Any change to the arguments, beyond simply adding new ones at the end of the list, would require changes to whatever external BPF probes people have written against the interface. So we will need to stabilise them at some point, after we write more actual probing code.

The idea is to pass to probes the minimal information that is required for meaningful tracing applications. Hopefully, this information remains relatively stable even as MLServer evolves. For example, the actual inference functions might change without us changing the data passed for tracing via the same Tracepoints.

That being said, the _prototypes defined here do not end up being "enforced". Say we add a new argument to the inference_begin tracepoint without changing any of the tp_* functions in the provider. All existing code will continue to work, because the function for triggering the tracepoint (Probe.fire) receives a variable number of arguments. We can then selectively call the tracepoint with additional arguments by directly using SystemTracingProvider.__call__(tracepoint, *args) (I don't necessarily advocate that this is a sane thing to do, as the code might become confusing in the long term with respect to which arguments should be passed to which tracepoints).

We can discuss further, perhaps there is a better design to be found here.
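The forward-compatibility point above can be shown with a toy example (class name illustrative, standing in for the provider's Probe.fire behaviour): because firing takes *args, adding a trailing argument at a call site does not break consumers that only read the arguments they know about.

```python
# Toy illustration: variable-arity firing tolerates prototype growth.
class Probe:
    def __init__(self):
        self.seen = []

    def fire(self, *args):
        # the firing path forwards however many arguments were supplied
        self.seen.append(args)


probe = Probe()
probe.fire("model-a", 1)           # original prototype, e.g. (name, version)
probe.fire("model-a", 1, "extra")  # extended prototype: old consumers ignore the tail
```

The flip side, as noted, is that nothing checks a call site against the declared prototype, so mismatched arguments fail silently rather than loudly.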

Contributor:

Thanks for adding that extra context.

Do you have an example of how it looks on the "other end", i.e. where those probes are used?

tests/testdata/settings-tracing.json (resolved)
mlserver/sys_tracing/stapsdt_stub.py (resolved)
mlserver/registry.py (resolved)
Dockerfile (resolved)
Dockerfile (outdated):
@@ -107,7 +169,12 @@ RUN . $CONDA_PATH/etc/profile.d/conda.sh && \
pip install $_wheel --constraint ./dist/constraints.txt; \
done \
fi && \
pip install $(ls "./dist/mlserver-"*.whl) --constraint ./dist/constraints.txt && \
if [[ ${#_extras[@]} -gt 0 ]]; then \
extras_list=$(IFS=, ; echo "${_extras[@]}") && \
Contributor:
Following the same reasoning as the other comment, unless we have some reason to avoid it, I'd install all the extras in the Dockerfile - that way the seldonio/mlserver image is always ready as-is to enable tracepoints.

Member Author (lc525):

Happy to do that if the decision (given the above comments) is to install all extras. I don't know if pip has a simple way of doing that here; I might still have to build a list.

Contributor:

We can also see if Poetry supports an all-extras target? Otherwise we can always add one.

Contributor:

(and / or just hardcode a pip install mlserver[extras1, extras2, etc.])

Member Author (lc525):

Poetry currently supports --all-extras on install but not export (this is tracked in python-poetry/poetry-plugin-export#45). I've added an everything extra. I've intentionally avoided the name all as that might be reserved in the future by poetry/pip, but we can move to it once there is enough support.
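For illustration only, such an aggregate extra might look roughly like this in pyproject.toml (the dependency names here are hypothetical placeholders, not MLServer's actual extras):

```toml
# Hypothetical pyproject.toml fragment: an `everything` extra that lists every
# optional dependency explicitly, pending Poetry export support for --all-extras.
[tool.poetry.extras]
tracepoints = ["stapsdt"]
everything = ["stapsdt", "..."]
```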

Contributor:

TBH I would just use all as it's more standard. If poetry decides to always add one automatically we can just remove it and use Poetry's built-in target.

Dockerfile (resolved)
lc525 added 10 commits July 12, 2023 11:09
- Allows MLServer to expose native tracepoints that fire on
  application events; BPF/Systemtap probes can be attached at
  runtime to those tracepoints, for performing measurements or
  for linking application activity to OS behaviour (resource
  consumption and contention, performance variations)
- Near-zero overhead when not in use or external probes are not
  attached
- The exposed tracepoints are configured via MLServer settings
- Adds (optional) dependencies on
   * linux-usdt/python-stapsdt
   * linux-usdt/libstapsdt (native library)
Also update license files to contain dependencies for the `tracepoints`
extra
- Only allow tracepoints to be enabled/disabled as a whole rather than
  having per-tracepoint options;
- Refactor code so that all tracepoint/provider code resides in the same
  module (this has been facilitated by the simplified settings)
- Update code for Python 3.8+ (use explicit Union types rather than |)
@CLAassistant commented May 22, 2024

CLA assistant check
All committers have signed the CLA.

3 participants