johndmulhausen (Contributor) commented Aug 22, 2025

Summary

This PR addresses DOCS-1003 by adding comprehensive documentation about W&B's SDK architecture and logging performance.

Background

An academic user requested a high-level explanation of how our event-driven architecture in the SDK handles logging in terms of CPU and processes. They were particularly concerned about managing CPU usage for GPU data logging and wanted documentation to help them understand "what to worry about and what not to worry about."

Changes

New Documentation

  • Created /content/en/guides/models/track/sdk-architecture.md containing:
    • SDK Architecture Overview - Explains the event-driven, non-blocking nature of W&B's logging system
    • Visual Architecture Diagram - Mermaid flowchart showing data flow from training script to W&B servers
    • CPU/GPU Synchronization - Addresses concerns about GPU data logging with practical patterns
    • Performance Guidelines - Clear guidance on what users should know vs. what not to worry about
    • Best Practices - Practical examples for optimal logging performance
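
To make "event-driven, non-blocking" concrete, here is a minimal sketch of the general pattern using only the Python standard library. It is not the W&B SDK's implementation: `BackgroundLogger`, its `sink` parameter, and the use of a thread rather than a separate process are all assumptions made for illustration. The point is only that `log()` enqueues a record and returns, while a worker drains the queue.

```python
import queue
import threading


class BackgroundLogger:
    """Toy non-blocking logger (hypothetical, not the W&B SDK)."""

    def __init__(self, sink):
        # `sink` is a hypothetical callable standing in for slow work
        # such as serializing a record and sending it over the network.
        self._sink = sink
        self._queue = queue.Queue()
        self._worker = threading.Thread(target=self._drain, daemon=True)
        self._worker.start()

    def log(self, record):
        # Enqueue and return immediately; the caller never waits on the sink.
        self._queue.put(record)

    def _drain(self):
        # Worker loop: handle records in FIFO order until the sentinel arrives.
        while True:
            record = self._queue.get()
            if record is None:
                break
            self._sink(record)

    def finish(self):
        # Queue the sentinel behind all pending records, then wait for the worker.
        self._queue.put(None)
        self._worker.join()


uploaded = []
logger = BackgroundLogger(sink=uploaded.append)
for step in range(3):
    logger.log({"step": step, "loss": 1.0 / (step + 1)})
logger.finish()
```

After `finish()` returns, `uploaded` holds the three records in the order they were logged. The training loop only paid the cost of a queue insert per call.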

Documentation Updates

  • Updated /content/en/guides/models/track/_index.md to link to the new SDK architecture documentation

Key Features

  1. Event-driven architecture explanation - Shows how logging works without blocking training
  2. Practical code examples - Demonstrates batch logging and deferred logging patterns
  3. Performance guidance - Specific recommendations for logging frequency and data sizes
  4. GPU/CPU sync patterns - Addresses the user's specific concern about GPU data handling
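
The batch and deferred logging patterns in item 2 could look roughly like the sketch below. The `DeferredLogger` class and `fake_log` function are hypothetical names invented here, and the examples in the new page may differ. The sketch accumulates per-step values and emits one averaged record every `every` steps, so the logging call runs far less often than the training step.

```python
class DeferredLogger:
    """Accumulate metrics and emit one averaged record every `every` steps."""

    def __init__(self, log_fn, every):
        # `log_fn` is any callable accepting (data, step=...).
        self._log_fn = log_fn
        self._every = every
        self._pending = {}
        self._count = 0

    def add(self, step, **metrics):
        # Store values locally; nothing is logged until the batch is full.
        for name, value in metrics.items():
            self._pending.setdefault(name, []).append(value)
        self._count += 1
        if self._count >= self._every:
            self.flush(step)

    def flush(self, step):
        # Emit the mean of each pending metric, then reset the buffer.
        if not self._pending:
            return
        means = {name: sum(vals) / len(vals) for name, vals in self._pending.items()}
        self._log_fn(means, step=step)
        self._pending.clear()
        self._count = 0


calls = []


def fake_log(data, step=None):
    # Stand-in for a real logging call; records what would have been sent.
    calls.append((step, data))


deferred = DeferredLogger(fake_log, every=4)
for step in range(8):
    deferred.add(step, loss=float(step))
deferred.flush(step)  # emits any remainder; a no-op here because 8 is a multiple of 4
```

In a real script, a function such as `wandb.log` could be passed as `log_fn` in place of `fake_log`. The same shape may also help with GPU values, since converting a device tensor to a Python number typically forces a synchronization, and deferring that conversion to flush time reduces how often it happens. That last point is an assumption about the user's framework, not something this PR states.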

Testing

  • Documentation renders correctly
  • Mermaid diagram displays properly in both light and dark modes
  • All code examples are syntactically correct
  • Links work correctly

Related Issues

📄 View preview links for changed pages

- Created comprehensive guide explaining W&B SDK event-driven architecture
- Covers logging performance, CPU/GPU synchronization, and best practices
- Addresses user concerns about managing CPU usage for GPU data logging
- Includes practical code examples and performance guidelines

Resolves DOCS-1003

johndmulhausen requested a review from a team as a code owner on August 22, 2025 19:30

cloudflare-workers-and-pages bot commented Aug 22, 2025

Deploying docs with Cloudflare Pages

Latest commit: ee6c31c
Status: ✅  Deploy successful!
Preview URL: https://2d62d875.docodile.pages.dev
Branch Preview URL: https://docs-1003-sdk-architecture-d.docodile.pages.dev

github-actions bot commented Aug 22, 2025

PR Preview: Changed content

Base preview: https://docs-1003-sdk-architecture-d.docodile.pages.dev

Added

Title | Path
Sdk Architecture | content/en/guides/models/track/sdk-architecture.md
Sdk Performance | content/en/guides/models/track/sdk-performance.md

Modified

Title | Path
Experiments | content/en/guides/models/track/_index.md

ngrayluna (Contributor) left a comment

@johndmulhausen Can you get a tech review from the SDK Team in #platform-sdk?

- Revised section titles for consistency and improved readability
- Enhanced descriptions of SDK architecture and performance, emphasizing event-driven design and non-blocking operations
- Added new subsections for better organization, including detailed explanations of key components and performance guidelines
- Included practical examples to illustrate data flow and logging practices

Resolves DOCS-1004