Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
180 changes: 177 additions & 3 deletions README.md
Original file line number Diff line number Diff line change
@@ -1,5 +1,4 @@
# XProf (+ Tensorboard Profiler Plugin)
XProf includes a suite of profiling tools for [JAX](https://jax.readthedocs.io/), [TensorFlow](https://www.tensorflow.org/), and [PyTorch/XLA](https://github.com/pytorch/xla). These tools help you understand, debug and optimize machine learning programs to run on CPUs, GPUs and TPUs.

XProf offers a number of tools to analyse and visualize the
performance of your model across multiple devices. Some of the tools include:
Expand Down Expand Up @@ -45,24 +44,199 @@ To get the most recent release version of XProf, install it via pip:
$ pip install xprof
```

Without TensorBoard:
## Running XProf

XProf can be launched as a standalone server or used as a plugin within
TensorBoard. For large-scale use, it can be deployed in a distributed mode with
separate aggregator and worker instances ([more details on it later in the
doc](#distributed-profiling)).

### Command-Line Arguments

When launching XProf from the command line, you can use the following arguments:

* **`logdir`** (optional): The directory containing XProf profile data (files
ending in `.xplane.pb`). This can be provided as a positional argument or
with `-l` or `--logdir`. If provided, XProf will load and display profiles
from this directory. If omitted, XProf will start without loading any
profiles, and you can dynamically load profiles using `session_path` or
`run_path` URL parameters, as described in the [Log Directory
Structure](#log-directory-structure) section.
* **`-p <port>`**, **`--port <port>`**: The port for the XProf web server.
Defaults to `8791`.
* **`-gp <grpc_port>`**, **`--grpc_port <grpc_port>`**: The port for the gRPC
server used for distributed processing. Defaults to `50051`. This must be
different from `--port`.
* **`-wsa <addresses>`**, **`--worker_service_address <addresses>`**: A
comma-separated list of worker addresses (e.g., `host1:50051,host2:50051`)
for distributed processing. Defaults to to `0.0.0.0:<grpc_port>`.
* **`-hcpb`**, **`--hide_capture_profile_button`**: If set, hides the 'Capture
Profile' button in the UI.

### Standalone

If you have profile data in a directory (e.g., `profiler/demo`), you can view it
by running:

```
$ xprof profiler/demo --port=6006
```

Or with the optional flag:

```
$ xprof --logdir=profiler/demo --port=6006
```

With TensorBoard:
### With TensorBoard

If you have TensorBoard installed, you can run:

```
$ tensorboard --logdir=profiler/demo
```

If you are behind a corporate firewall, you may need to include the `--bind_all`
tensorboard flag.

Go to `localhost:6006/#profile` of your browser, you should now see the demo
overview page show up.
Congratulations! You're now ready to capture a profile.

### Log Directory Structure

When using XProf, profile data must be placed in a specific directory structure.
XProf expects `.xplane.pb` files to be in the following path:

```
<log_dir>/plugins/profile/<session_name>/
```

* `<log_dir>`: This is the root directory that you supply to `tensorboard
--logdir`.
* `plugins/profile/`: This is a required subdirectory.
* `<session_name>/`: Each subdirectory inside `plugins/profile/` represents a
single profiling session. The name of this directory will appear in the
TensorBoard UI dropdown to select the session.

**Example:**

If your log directory is structured like this:

```
/path/to/your/log_dir/
└── plugins/
└── profile/
├── my_experiment_run_1/
│ └── host0.xplane.pb
└── benchmark_20251107/
└── host1.xplane.pb
```

You would launch TensorBoard with:

```bash
tensorboard --logdir /path/to/your/log_dir/
```

The runs `my_experiment_run_1` and `benchmark_20251107` will be available in the
"Sessions" tab of the UI.

You can also dynamically load sessions from a GCS bucket or local filesystem by
passing URL parameters when loading XProf in your browser. This method works
whether or not you provided a `logdir` at startup and is useful for viewing
profiles from various locations without restarting XProf.

For example, if you start XProf with no log directory:

```bash
xprof
```

You can load sessions using the following URL parameters.

Assume you have profile data stored on GCS or locally, structured like this:

```
gs://your-bucket/profile_runs/
├── my_experiment_run_1/
│ ├── host0.xplane.pb
│ └── host1.xplane.pb
└── benchmark_20251107/
└── host0.xplane.pb
```

There are two URL parameters you can use:

* **`session_path`**: Use this to load a *single* session directly. The path
should point to a directory containing `.xplane.pb` files for one session.

* GCS Example:
`http://localhost:8791/?session_path=gs://your-bucket/profile_runs/my_experiment_run_1`
* Local Path Example:
`http://localhost:8791/?session_path=/path/to/profile_runs/my_experiment_run_1`
* Result: XProf will load the `my_experiment_run_1`
session, and you will see its data in the UI.

* **`run_path`**: Use this to point to a directory that contains *multiple*
session directories.

* GCS Example:
`http://localhost:8791/?run_path=gs://your-bucket/profile_runs/`
* Local Path Example:
`http://localhost:8791/?run_path=/path/to/profile_runs/`
* Result: XProf will list all session directories found under `run_path`
(i.e., `my_experiment_run_1` and `benchmark_20251107`) in the "Sessions"
dropdown in the UI, allowing you to switch between them.

**Loading Precedence**

If multiple sources are provided, XProf uses the following order of precedence
to determine which profiles to load:

1. **`session_path`** URL parameter
2. **`run_path`** URL parameter
3. **`logdir`** command-line argument

### Distributed Profiling

XProf supports distributed profile processing by using an aggregator that
distributes work to multiple XProf workers. This is useful for processing large
profiles or handling multiple users.

**Note**: Currently, distributed processing only benefits the following tools:
`overview_page`, `framework_op_stats`, `input_pipeline`, and `pod_viewer`.

**Note**: The ports used in these examples (`6006` for the aggregator HTTP
server, `9999` for the worker HTTP server, and `50051` for the worker gRPC
server) are suggestions and can be customized.

**Worker Node**

Each worker node should run XProf with a gRPC port exposed so it can receive
processing requests. You should also hide the capture button as workers are not
meant to be interacted with directly.

```
$ xprof --grpc_port=50051 --port=9999 --hide_capture_profile_button
```

**Aggregator Node**

The aggregator node runs XProf with the `--worker_service_address` flag pointing
to all available workers. Users will interact with aggregator node's UI.

```
$ xprof --worker_service_address=<worker1_ip>:50051,<worker2_ip>:50051 --port=6006 --logdir=profiler/demo
```

Replace `<worker1_ip>, <worker2_ip>` with the addresses of your worker machines.
Requests sent to the aggregator on port 6006 will be distributed among the
workers for processing.

For deploying a distributed XProf setup in a Kubernetes environment, see
[Kubernetes Deployment Guide](docs/kubernetes_deployment.md).

## Nightlies

Every night, a nightly version of the package is released under the name of
Expand Down
6 changes: 6 additions & 0 deletions docs/_toc.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -43,3 +43,9 @@ toc:
section:
- title: XProf Demo
path: /xprof/xprof_demo
- title: Deployment
section:
- title: Docker Deployment
path: /xprof/docker_deployment
- title: Kubernetes Deployment
path: /xprof/kubernetes_deployment
67 changes: 67 additions & 0 deletions docs/docker_deployment.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,67 @@
# Building an XProf Docker Image

This document describes how to build a Docker image for XProf using the official
release from PyPI.

## Dockerfile

Create a file named `Dockerfile` with the following content:

**Note**: The following Dockerfile provides a basic configuration for running
XProf. You may need to modify it to fit your specific security requirements,
base image preferences, or other environmental needs.

```dockerfile
FROM python:3.11-slim

ARG XPROF_VERSION=2.21.0

ENV PYTHONUNBUFFERED=1

WORKDIR /app

RUN pip install --no-cache-dir xprof==${XPROF_VERSION}

EXPOSE 8791 50051

ENTRYPOINT ["xprof"]

CMD ["--logdir=/app/logs", "--port=8791"]
```

## How to Build

1. Save the content above as `Dockerfile` in an empty directory.
2. Build the image using the following command:

```bash
docker build -t xprof:2.21.0 .
```

You can change the version by modifying the `XPROF_VERSION` argument in the
Dockerfile or by passing `--build-arg XPROF_VERSION=<version>` to the `docker
build` command.

## How to Run

### Run with Local Logs

Map your local log directory to `/app/logs` in the container.

```bash
docker run -p 8791:8791 \
-v /tmp/xprof_logs:/app/logs \
xprof:2.21.0
```

### Run with GCS Logs

Mount your local gcloud credentials so xprof can authenticate with Google Cloud
Storage.

```bash
docker run -p 8791:8791 \
-v ~/.config/gcloud:/root/.config/gcloud \
xprof:2.21.0 \
--logdir=gs://your-bucket-name/xprof_logs --port=8791
```
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading