Conversation

markurtz (Collaborator)

Summary

A refactor of the GuideLLM command-line interface that streamlines the benchmark command structure, adds new mock server functionality and performance optimization features, and folds in missing fixes from other PRs to stabilize the refactor into a working state.

Details

  • CLI Interface Overhaul:
    • Removed the legacy --scenario option in favor of direct parameter specification
    • Reorganized CLI options with clear grouping (Backend, Data, Output, Aggregators, Constraints)
    • Added parameter aliases for backward compatibility (e.g., --rate-type → --profile)
    • Simplified option defaults by removing scenario-based defaults
    • Added comprehensive docstrings and help text for all commands and options
  • New Mock Server Command:
    • Added guidellm mock-server command with full OpenAI/vLLM API compatibility
    • Configurable latency characteristics (request latency, TTFT, ITL, output tokens); see the latency sketch after this list
    • Support for both streaming and non-streaming endpoints
    • Comprehensive server configuration options (host, port, workers, model name)
  • Performance Optimization Features:
    • Added a new perf optional dependency group with orjson, msgpack, msgspec, and uvloop
    • Integrated uvloop for enhanced async performance when available
    • Optimized event-loop policy selection based on availability; see the event-loop sketch after this list
  • Internal Architecture Improvements:
    • Updated import paths (guidellm.backend → guidellm.backends, guidellm.scheduler.strategy → guidellm.scheduler)
    • Replaced scenario-based benchmarking with direct benchmark_generative_text function calls
    • Enhanced error handling and parameter validation
    • Simplified logging format for better readability
  • Enhanced Output and Configuration:
    • Added support for multiple output formats via the --output-formats option
    • Improved output path handling for files vs. directories
    • Added new constraint options (--max-errors, --max-error-rate, --max-global-error-rate)
    • Enhanced warmup/cooldown specification with flexible numeric/percentage options
  • Code Quality Improvements:
    • Comprehensive type annotations throughout the codebase
    • Detailed docstrings following Google/NumPy style conventions
    • Consistent parameter naming and organization
    • Removed deprecated version option from main CLI group
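
A minimal sketch of how the mock server's configurable latency characteristics (TTFT and ITL) could be simulated, as referenced in the mock-server item above. This is illustrative only; the stream_mock_tokens name and its ttft/itl parameters are assumptions, not the actual mock-server code:

import asyncio
from collections.abc import AsyncIterator

async def stream_mock_tokens(
    tokens: list[str],
    ttft: float = 0.1,  # assumed parameter: time to first token, in seconds
    itl: float = 0.02,  # assumed parameter: inter-token latency, in seconds
) -> AsyncIterator[str]:
    # Delay the first token by TTFT, then space subsequent tokens by ITL.
    await asyncio.sleep(ttft)
    for index, token in enumerate(tokens):
        if index > 0:
            await asyncio.sleep(itl)
        yield token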

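A minimal sketch of the optional uvloop integration described above, assuming the common try-import pattern; the configure_event_loop_policy helper name is hypothetical, and the actual wiring in src/guidellm/main.py may differ:

import asyncio

try:
    import uvloop  # provided by the optional "perf" dependency group
except ImportError:
    uvloop = None

def configure_event_loop_policy() -> None:
    # Prefer uvloop's faster event loop when it is installed; otherwise
    # keep asyncio's default policy.
    if uvloop is not None:
        asyncio.set_event_loop_policy(uvloop.EventLoopPolicy())
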
Test Plan

  • Tests for entrypoints to be added later

Related Issues

  • Part of the larger scheduler refactor initiative

  • "I certify that all code in this PR is my own, except as noted below."

Use of AI

  • Includes AI-assisted code completion
  • Includes code generated by an AI application
  • Includes AI-generated tests (NOTE: AI written tests should have a docstring that includes ## WRITTEN BY AI ##)

Copilot AI (Contributor) left a comment:

Pull Request Overview

This PR refactors the GuideLLM command-line interface to streamline the benchmark command structure while adding mock server functionality and performance optimization features. The changes modernize the CLI interface by removing legacy scenario-based configuration in favor of direct parameter specification and introduce new capabilities for testing and development.

Key changes:

  • Replaced scenario-based CLI configuration with direct parameter specification using reorganized option groups
  • Added new mock server command with configurable OpenAI/vLLM API compatibility
  • Integrated performance optimizations through optional uvloop support and new perf dependency group

Reviewed Changes

Copilot reviewed 19 out of 21 changed files in this pull request and generated 5 comments.

Summary per file:

  • src/guidellm/main.py: Complete CLI overhaul with new parameter structure, mock server command, and uvloop integration
  • src/guidellm/settings.py: Updated multiprocessing and scheduler settings with new configuration options
  • tests/unit/test_cli.py: Removed legacy CLI version flag tests
  • tests/unit/conftest.py: Removed old mock fixtures and test utilities
  • tests/unit/mock_*: Updated mock implementations for the new architecture
  • tests/integration/scheduler/: Added integration tests for scheduler components
  • src/guidellm/utils/typing.py: New utility for extracting literal values from type aliases
  • pyproject.toml: Added perf optional dependency group


] = list(get_literal_vals(Union[ProfileType, StrategyType]))


def decode_escaped_str(_ctx, _param, value):

Copilot AI commented on Sep 19, 2025:

The decode_escaped_str function is defined but never used in this file. Consider removing it or moving it to a utilities module if it's needed elsewhere.

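For context, a sketch of what the get_literal_vals helper from src/guidellm/utils/typing.py might look like, based on its usage above; the actual implementation may differ:

from typing import Literal, Union, get_args, get_origin

def get_literal_vals(alias) -> frozenset[str]:
    # Collect the string values of a Literal alias, recursing through
    # Union members so nested aliases are flattened.
    origin = get_origin(alias)
    if origin is Literal:
        return frozenset(str(value) for value in get_args(alias))
    if origin is Union:
        return frozenset().union(*(get_literal_vals(arg) for arg in get_args(alias)))
    return frozenset()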

type=str,
help="The target path for the backend to run benchmarks against. For example, http://localhost:8000",

...

"--random-seed",
default=GenerativeTextScenario.get_default("random_seed"),

Copilot AI commented on Sep 19, 2025:

This line references GenerativeTextScenario.get_default() but the import for GenerativeTextScenario is incomplete - only the class is imported but not its methods. This will likely cause a runtime error.


"--cooldown-percent", # legacy alias
"cooldown",
type=float,
default=GenerativeTextScenario.get_default("cooldown_percent"),

Copilot AI commented on Sep 19, 2025:

Same issue as with random_seed - this references GenerativeTextScenario.get_default() which may not be available due to incomplete import.


request_http2: bool = True

# Scheduler settings
mp_context_type: Literal["spawn", "fork", "forkserver"] | None = "fork"

Copilot AI commented on Sep 19, 2025:

[nitpick] Using 'fork' as the default multiprocessing context can cause issues in some environments (especially macOS with certain Python versions). Consider using 'spawn' as a safer default or making it platform-dependent.

Suggested change:

- mp_context_type: Literal["spawn", "fork", "forkserver"] | None = "fork"
+ mp_context_type: Literal["spawn", "fork", "forkserver"] | None = "spawn"

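A sketch of the platform-dependent default the reviewer suggests; this is not code from the PR, and default_mp_context is a hypothetical helper:

import multiprocessing
import sys

def default_mp_context() -> str:
    # "fork" is fast on Linux but unsafe on macOS and unavailable on
    # Windows, so fall back to "spawn" on those platforms.
    if sys.platform in ("darwin", "win32"):
        return "spawn"
    return "fork"

ctx = multiprocessing.get_context(default_mp_context())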
