Skip to content

feat(core): MCP server instrumentation #16807

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 22 commits into from

Conversation

betegon
Copy link
Member

@betegon betegon commented Jul 3, 2025

An improvement on MCP server tracing. It follows:

This is WIP, and will later add:

  • error handling
  • PII

Project with this new spans:

  • If you've added code that should be tested, please add tests.
  • Ensure your code lints and the test suite passes (yarn lint) & (yarn test).

betegon added 22 commits June 27, 2025 20:46
…ibute names to match OTEL draft semantic convention
@betegon betegon self-assigned this Jul 3, 2025
@betegon betegon changed the title Bete/mcp server semantic convention feat(core): MCP server instrumentation Jul 3, 2025
Copy link
Member

@AbhiPrasad AbhiPrasad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally this is looks to be the right direction! The only comment I would make is to do mv core/src/mcp-server.ts core/src/integrations/mcp-server/index.ts and move all of the utils/mcp-server files into that integrations folder. This matches our existing structure.

I do have some low prio items, but that I can get to once this PR gets out of draft.

attributes['mcp.resource.uri'] = String(params.uri);
// Extract protocol from URI
try {
const url = new URL(String(params.uri));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we have a url parsing helper in the core package that we should use here. It also handles relative URLs.

@betegon
Copy link
Member Author

betegon commented Jul 7, 2025

Thanks @AbhiPrasad! I'm closing this in favor of #16817, that avoids Proxy by using our Fill function so we make don't have any problem with Cloudflare (#16182).

I'll apply your suggestions on #16817

@betegon betegon closed this Jul 7, 2025
@betegon betegon deleted the bete/mcp-server-semantic-convention branch July 7, 2025 16:25
AbhiPrasad pushed a commit that referenced this pull request Jul 28, 2025
…6817)

Closes #16826, #16654, #16666  #16978

Different approach from #16807 .

Using `Proxy` was causing issues in cloudflare #16182.

Now using `fill` we shouldn't have those problems as `fill` doesn't
create a new wrapper object with a different identity, so now:

1. `fill` just replaces the method on the existing object
2. The transport object keeps its original identity
3. When `transport.start()` runs and accesses private fields, this is
still the original transport object
4. The `WeakMap` recognizes it as the same object that owns the private
fields

## What's inside
- Support for new MCP SDK methods (`mcpServerInstance.tool()`,
`mcpServerInstance.resource()`, etc.)
- Tracing instrumentation
- Error handling 

## Tracing
It follows [OTEL semantic conventions for
MCP](https://github.com/open-telemetry/semantic-conventions/blob/3097fb0af5b9492b0e3f55dc5f6c21a3dc2be8df/docs/gen-ai/mcp.md)
and adds more attributes we thought are useful.

It also handles PII based on user setting of `sendDefaultPii`.

###  Tracing flow
1. **Transport receives tools/call request (`id: 2`)**
2. **Create INACTIVE `mcp.server` span** 
3. **Store span in**`requestIdToSpanMap[2] = { span, method:
"tools/call", startTime }`
4. **Execute handler with span context** *(handler gets requestId: 2)*
5. **Handler finds span using** `requestId: 2` 
6. **Tool execution happens within span context**
7. **Response sent with tool results**
8. `completeSpanWithToolResults(2, result)` **enriches and completes
span**

## Error handling

1.  **error capture** - `errorCapture.ts`
- Non-invasive error reporting that never interferes with MCP service
operation
- Error context with MCP-specific metadata
- PII filtering respects `sendDefaultPii` settings
- Resilient to Sentry failures (wrapped in try-catch)

2. **Tool execution error capturing** - `handlers.ts`
- Captures exceptions thrown during tool execution
- Preserves normal MCP behaviour (errors converted to `isError: true`)
- Includes tool name, arguments, and request context
- Handles both sync and async tool handlers

3. **Transport error instrumentation** - `transport.ts`
- Captures connection errors and network failures
- Intercepts JSON-RPC error responses
- Includes transport type and session information
- Handles error responses being sent back to clients
AbhiPrasad pushed a commit that referenced this pull request Jul 28, 2025
…6817)

Closes #16826, #16654, #16666  #16978

Different approach from #16807 .

Using `Proxy` was causing issues in cloudflare #16182.

Now using `fill` we shouldn't have those problems as `fill` doesn't
create a new wrapper object with a different identity, so now:

1. `fill` just replaces the method on the existing object
2. The transport object keeps its original identity
3. When `transport.start()` runs and accesses private fields, this is
still the original transport object
4. The `WeakMap` recognizes it as the same object that owns the private
fields

- Support for new MCP SDK methods (`mcpServerInstance.tool()`,
`mcpServerInstance.resource()`, etc.)
- Tracing instrumentation
- Error handling

It follows [OTEL semantic conventions for
MCP](https://github.com/open-telemetry/semantic-conventions/blob/3097fb0af5b9492b0e3f55dc5f6c21a3dc2be8df/docs/gen-ai/mcp.md)
and adds more attributes we thought are useful.

It also handles PII based on user setting of `sendDefaultPii`.

1. **Transport receives tools/call request (`id: 2`)**
2. **Create INACTIVE `mcp.server` span**
3. **Store span in**`requestIdToSpanMap[2] = { span, method:
"tools/call", startTime }`
4. **Execute handler with span context** *(handler gets requestId: 2)*
5. **Handler finds span using** `requestId: 2`
6. **Tool execution happens within span context**
7. **Response sent with tool results**
8. `completeSpanWithToolResults(2, result)` **enriches and completes
span**

1.  **error capture** - `errorCapture.ts`
- Non-invasive error reporting that never interferes with MCP service
operation
- Error context with MCP-specific metadata
- PII filtering respects `sendDefaultPii` settings
- Resilient to Sentry failures (wrapped in try-catch)

2. **Tool execution error capturing** - `handlers.ts`
- Captures exceptions thrown during tool execution
- Preserves normal MCP behaviour (errors converted to `isError: true`)
- Includes tool name, arguments, and request context
- Handles both sync and async tool handlers

3. **Transport error instrumentation** - `transport.ts`
- Captures connection errors and network failures
- Intercepts JSON-RPC error responses
- Includes transport type and session information
- Handles error responses being sent back to clients
AbhiPrasad pushed a commit that referenced this pull request Jul 28, 2025
…6817)

Closes #16826, #16654, #16666  #16978

Different approach from #16807 .

Using `Proxy` was causing issues in cloudflare #16182.

Now using `fill` we shouldn't have those problems as `fill` doesn't
create a new wrapper object with a different identity, so now:

1. `fill` just replaces the method on the existing object
2. The transport object keeps its original identity
3. When `transport.start()` runs and accesses private fields, this is
still the original transport object
4. The `WeakMap` recognizes it as the same object that owns the private
fields

- Support for new MCP SDK methods (`mcpServerInstance.tool()`,
`mcpServerInstance.resource()`, etc.)
- Tracing instrumentation
- Error handling

It follows [OTEL semantic conventions for
MCP](https://github.com/open-telemetry/semantic-conventions/blob/3097fb0af5b9492b0e3f55dc5f6c21a3dc2be8df/docs/gen-ai/mcp.md)
and adds more attributes we thought are useful.

It also handles PII based on user setting of `sendDefaultPii`.

1. **Transport receives tools/call request (`id: 2`)**
2. **Create INACTIVE `mcp.server` span**
3. **Store span in**`requestIdToSpanMap[2] = { span, method:
"tools/call", startTime }`
4. **Execute handler with span context** *(handler gets requestId: 2)*
5. **Handler finds span using** `requestId: 2`
6. **Tool execution happens within span context**
7. **Response sent with tool results**
8. `completeSpanWithToolResults(2, result)` **enriches and completes
span**

1.  **error capture** - `errorCapture.ts`
- Non-invasive error reporting that never interferes with MCP service
operation
- Error context with MCP-specific metadata
- PII filtering respects `sendDefaultPii` settings
- Resilient to Sentry failures (wrapped in try-catch)

2. **Tool execution error capturing** - `handlers.ts`
- Captures exceptions thrown during tool execution
- Preserves normal MCP behaviour (errors converted to `isError: true`)
- Includes tool name, arguments, and request context
- Handles both sync and async tool handlers

3. **Transport error instrumentation** - `transport.ts`
- Captures connection errors and network failures
- Intercepts JSON-RPC error responses
- Includes transport type and session information
- Handles error responses being sent back to clients
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants