sched-ext
diff --git a/‎tools/scxtop/CLAUDE_INTEGRATION.md‎
Lines changed: 441 additions & 0 deletions b/‎tools/scxtop/CLAUDE_INTEGRATION.md‎
Lines changed: 441 additions & 0 deletions
diff --git a/‎tools/scxtop/MCP_INTEGRATIONS.md‎
Lines changed: 382 additions & 0 deletions b/‎tools/scxtop/MCP_INTEGRATIONS.md‎
Lines changed: 382 additions & 0 deletions
@@ -0,0 +1,382 @@
+# scxtop MCP Server - Complete Integration Reference
+
+This document provides a comprehensive overview of all MCP (Model Context Protocol) integrations available in the scxtop MCP server.
+
+## Overview
+
+The scxtop MCP server provides AI assistants with programmatic access to Linux scheduler metrics, BPF events, hardware topology, and performance profiling capabilities. It implements the MCP specification (protocol version 2024-11-05) and supports both one-shot queries and daemon mode with real-time event streaming.
+
+## Server Information
+
+- **Server Name**: `scxtop-mcp`
+- **Version**: (from Cargo.toml)
+- **Protocol Version**: `2024-11-05`
+- **Modes**:
+  - One-shot mode (stdio)
+  - Daemon mode (continuous with event streaming)
+
+## Capabilities
+
+### 1. Resources (17 URIs)
+
+Resources are read-only data endpoints that provide access to system metrics and configuration.
+
+#### Scheduler Resources
+
+| URI | Description | Data Type |
+|-----|-------------|-----------|
+| `scheduler://current` | Currently active scheduler name, class (sched_ext or other), and state | JSON |
+| `stats://scheduler/raw` | Raw JSON statistics from the scheduler's scx_stats framework | JSON |
+| `stats://scheduler/scx` | Kernel-level sched_ext statistics and counters | JSON |
+
+#### Topology Resources
+
+| URI | Description | Data Type |
+|-----|-------------|-----------|
+| `topology://info` | Hardware topology including CPUs, cores, LLCs, NUMA nodes with IDs and mappings | JSON |
+
+#### Aggregated Statistics
+
+| URI | Description | Data Type |
+|-----|-------------|-----------|
+| `stats://aggregated/cpu` | Per-CPU statistics including utilization, frequency, scheduling metrics | JSON |
+| `stats://aggregated/llc` | Statistics aggregated by last-level cache domain | JSON |
+| `stats://aggregated/node` | Statistics aggregated by NUMA node | JSON |
+| `stats://aggregated/dsq` | Dispatch queue statistics for sched_ext schedulers (latencies, depths, vtime) | JSON |
+| `stats://aggregated/process` | Per-process scheduler statistics including runtime, vtime, layer info | JSON |
+
+#### System-Wide Statistics
+
+| URI | Description | Data Type |
+|-----|-------------|-----------|
+| `stats://system/cpu` | System-wide CPU utilization statistics and context switch rates | JSON |
+| `stats://system/memory` | System memory statistics (total, free, cached, etc.) | JSON |
+| `stats://system/network` | Network interface statistics | JSON |
+
+#### Profiling and Events
+
+| URI | Description | Data Type |
+|-----|-------------|-----------|
+| `events://perf` | List of all available perf tracepoint events organized by subsystem | JSON |
+| `events://kprobe` | List of all available kernel functions for kprobe profiling | JSON |
+| `bpf://programs` | Currently loaded BPF programs with runtime statistics | JSON |
+| `profiling://perf/status` | Current perf profiling status (running/stopped, sample count, duration) | JSON |
+| `profiling://perf/results` | Symbolized stack traces from perf profiling (kernel and userspace, top 50 symbols) | JSON |
+
+#### Event Streaming (Daemon Mode Only)
+
+| URI | Description | Data Type |
+|-----|-------------|-----------|
+| `events://stream` | Real-time stream of BPF scheduler events (requires daemon mode and subscription) | NDJSON |
+
+**Supported Event Types** (when subscribed to `events://stream`):
+- Scheduling: `sched_switch`, `sched_wakeup`, `sched_waking`, `sched_wakeup_new`, `sched_migrate_task`
+- Process lifecycle: `fork`, `exec`, `exit`, `wait`
+- System events: `softirq`, `ipi`, `cpuhp_enter`, `cpuhp_exit`, `hw_pressure`
+- Profiling: `kprobe`, `perf_sample`
+- Scheduler-specific: `sched_hang`, `sched_cpu_perf_set`
+- Special: `mango_app`, `trace_started`, `trace_stopped`, `system_stat`, `sched_stats`
+
+### 2. Tools (6 Interactive Functions)
+
+Tools are callable functions that perform queries or actions.
+
+#### `query_stats`
+
+Discover available statistics resources and how to query them.
+
+**Parameters**:
+- `stat_type` (optional): Filter by type: `cpu`, `llc`, `node`, `dsq`, `process`, `scheduler`, `system`
+
+**Returns**: List of available resource URIs with descriptions and usage examples.
+
+#### `get_topology`
+
+Get detailed hardware topology with core/LLC/node mappings.
+
+**Parameters**:
+- `detail_level` (optional, default: `summary`): `summary` or `full`
+  - `summary`: High-level counts and SMT status
+  - `full`: Complete per-CPU, per-core, per-LLC, per-node details with frequencies and capacities
+- `include_offline` (optional, default: `false`): Include offline CPUs
+
+**Returns**: JSON object with topology information based on detail level.
+
+#### `list_events`
+
+List available profiling events filtered by subsystem.
+
+**Parameters**:
+- `subsystem` (**required**): Filter perf events by subsystem (e.g., `sched`, `irq`, `power`, `block`, `net`)
+- `event_type` (optional, default: `perf`): `kprobe`, `perf`, or `all`
+
+**Returns**: JSON object with filtered events, count, and subsystem information. On error, lists available subsystems.
+
+**Example**:
+```json
+{
+  "subsystem": "sched",
+  "event_type": "perf"
+}
+```
+
+#### `start_perf_profiling`
+
+Start perf profiling with stack trace collection and symbolization.
+
+**Parameters**:
+- `event` (optional, default: `hw:cpu-clock`): Event to profile
+  - Hardware events: `hw:cpu-clock`
+  - Software events: `sw:task-clock`
+  - Tracepoints: `tracepoint:subsystem:event` (e.g., `tracepoint:sched:sched_switch`)
+- `freq` (optional, default: `99`): Sampling frequency in Hz
+- `cpu` (optional, default: `-1`): CPU to profile (-1 for all CPUs, specific CPU ID otherwise)
+- `pid` (optional, default: `-1`): Process ID to profile (-1 for system-wide)
+- `max_samples` (optional, default: `10000`): Maximum samples to collect (0 for unlimited)
+- `duration_secs` (optional, default: `0`): Duration in seconds (0 for manual stop)
+
+**Returns**: Confirmation with profiling configuration.
+
+**Example**:
+```json
+{
+  "event": "hw:cpu-clock",
+  "freq": 99,
+  "duration_secs": 10,
+  "max_samples": 0
+}
+```
+
+#### `stop_perf_profiling`
+
+Stop perf profiling and prepare results for retrieval.
+
+**Parameters**: None
+
+**Returns**: Status object with sample count, duration, and profiling state.
+
+#### `get_perf_results`
+
+Retrieve symbolized stack traces and top functions from perf profiling.
+
+**Parameters**:
+- `limit` (optional, default: `50`): Number of top symbols to return
+- `include_stacks` (optional, default: `true`): Include full symbolized stack traces
+
+**Returns**: JSON object with:
+- Top symbols ranked by sample count with percentages
+- Symbolized stack traces (if `include_stacks` is true)
+- Kernel and userspace function names
+- Sample statistics
+
+**Example**:
+```json
+{
+  "limit": 20,
+  "include_stacks": true
+}
+```
+
+### 3. Prompts (5 Guided Workflows)
+
+Prompts are pre-defined analysis workflows that guide the AI through complex investigations.
+
+#### `analyze_scheduler_performance`
+
+Comprehensive scheduler performance analysis workflow.
+
+**Arguments**:
+- `focus_area` (optional): `latency`, `throughput`, `balance`, or `general`
+  - `latency`: Focus on dispatch queue latencies, wakeup delays
+  - `throughput`: Focus on context switch rates, CPU utilization, migration patterns
+  - `balance`: Focus on load distribution across CPUs, LLCs, NUMA nodes
+  - `general`: Comprehensive overview of all aspects
+
+**Returns**: Detailed workflow instructions for analyzing scheduler performance based on focus area.
+
+#### `debug_high_latency`
+
+Debug high scheduling latency issues with step-by-step investigation.
+
+**Arguments**:
+- `pid` (optional): Process ID to investigate (if not specified, system-wide analysis)
+
+**Returns**: Workflow for identifying latency bottlenecks, analyzing wakeup patterns, checking hardware factors, and suggesting remediation.
+
+#### `analyze_cpu_imbalance`
+
+Analyze CPU load imbalance and migration patterns.
+
+**Arguments**: None
+
+**Returns**: Workflow for measuring imbalance severity, understanding topology, identifying migration patterns, analyzing task characteristics, and determining root causes.
+
+#### `investigate_scheduler_behavior`
+
+Deep dive into scheduler behavior and policies.
+
+**Arguments**:
+- `scheduler_name` (optional): Specific scheduler to analyze (e.g., `scx_rusty`, `scx_lavd`)
+
+**Returns**: Workflow for examining dispatch queue behavior, monitoring scheduling decisions, analyzing task placement patterns, and comparing against expected behavior.
+
+#### `summarize_system`
+
+Comprehensive system and scheduler summary.
+
+**Arguments**: None
+
+**Returns**: Workflow for gathering complete system overview including hardware topology, active scheduler, system-wide statistics, resource distribution, top processes, and available monitoring capabilities.
+
+## Usage Examples
+
+### Claude Desktop Configuration
+
+Add to `claude_desktop_config.json`:
+
+```json
+{
+  "mcpServers": {
+    "scxtop": {
+      "command": "/path/to/scxtop",
+      "args": ["--mcp"]
+    }
+  }
+}
+```
+
+### Query Examples
+
+**Basic resource read**:
+```
+"What scheduler is currently running?"
+→ Claude reads: scheduler://current
+```
+
+**Using tools**:
+```
+"Show me the hardware topology"
+→ Claude calls: get_topology with detail_level="full"
+```
+
+**Using prompts**:
+```
+"Analyze scheduler latency issues"
+→ Claude invokes: analyze_scheduler_performance prompt with focus_area="latency"
+```
+
+**Profiling workflow**:
+```
+"Profile the system and show me the hottest kernel functions"
+→ Claude calls: start_perf_profiling with event="hw:cpu-clock", freq=99
+→ (waits or sets duration)
+→ Claude calls: stop_perf_profiling
+→ Claude calls: get_perf_results with limit=20, include_stacks=true
+→ Claude analyzes and presents results
+```
+
+**Event monitoring (daemon mode)**:
+```
+"Monitor scheduling events and alert me if you see high latency"
+→ Claude subscribes to: events://stream
+→ Claude filters sched_switch events for dsq_lat_us > 1000
+→ Claude reports anomalies in real-time
+```
+
+## Implementation Details
+
+### File Organization
+
+- `src/mcp/server.rs` - Main MCP server implementation and request handling
+- `src/mcp/resources.rs` - Resource registration and data retrieval
+- `src/mcp/tools.rs` - Tool implementations
+- `src/mcp/prompts.rs` - Workflow prompt definitions
+- `src/mcp/protocol.rs` - MCP protocol types and structures
+- `src/mcp/events.rs` - BPF event to MCP event conversion
+- `src/mcp/bpf_stats.rs` - BPF program statistics collector
+- `src/mcp/perf_profiling.rs` - Perf profiling engine with symbolization
+
+### Resource Handler Registration
+
+Resources are registered with closures that capture necessary state (e.g., topology, BPF stats collector):
+
+```rust
+self.resources.register_handler("topology://info".to_string(), move || {
+    Ok(serde_json::json!({
+        "nr_cpus": topo.all_cpus.len(),
+        // ... topology data
+    }))
+});
+```
+
+### Event Streaming Architecture
+
+1. MCP server creates an unbounded channel
+2. Resources module holds the sender
+3. Main scxtop BPF event loop pushes events via `push_event()`
+4. Events are converted to JSON and streamed to the client
+5. Client subscribes to `events://stream` resource
+
+## Daemon Mode vs One-Shot Mode
+
+### One-Shot Mode
+```bash
+scxtop --mcp
+```
+- Processes one MCP request cycle
+- No event streaming
+- Exits after initialize + first request
+- Suitable for CLI usage with Claude Code
+
+### Daemon Mode
+```bash
+scxtop --mcp-daemon
+```
+- Runs continuously
+- Enables `events://stream` resource
+- Real-time BPF event streaming
+- Suitable for Claude Desktop long-running sessions
+- Allows monitoring and proactive analysis
+
+## Statistics Collection
+
+The MCP server integrates with scxtop's existing statistics infrastructure:
+
+- **scx_stats framework**: Reads raw scheduler statistics from sched_ext schedulers
+- **BPF programs**: Collects per-CPU, per-process, per-DSQ metrics
+- **System stats**: Reads `/proc` and `/sys` for system-wide metrics
+- **BPF program stats**: Monitors loaded BPF programs via bpffs
+- **Perf profiling**: Uses perf_event_open() for stack trace collection
+- **Symbolization**: Resolves kernel and userspace addresses to function names
+
+## Security Considerations
+
+- Requires root or CAP_BPF/CAP_PERFMON capabilities
+- Accesses /sys/kernel/debug/tracing (tracefs)
+- Reads /proc filesystem
+- Attaches BPF programs to tracepoints
+- Can profile system-wide or specific processes
+- No authentication mechanism (local stdio only)
+
+## Performance Impact
+
+- **Resource reads**: Minimal impact, reads from in-memory stats
+- **Event streaming**: Low overhead, BPF programs filter in kernel
+- **Perf profiling**: Configurable sampling rate (default 99 Hz)
+- **Symbolization**: Performed in userspace, cached for efficiency
+
+## Future Enhancements
+
+Potential areas for expansion:
+- Additional resource types (disk I/O, interrupts)
+- More granular event filtering
+- Historical data retention and querying
+- Flamegraph generation
+- Scheduler parameter tuning via tools
+- Integration with additional profiling tools (eBPF-based)
+
+## References
+
+- MCP Specification: https://spec.modelcontextprotocol.io/
+- scxtop documentation: README.md, CLAUDE_INTEGRATION.md
+- sched_ext documentation: https://github.com/sched-ext/scx