Skip to content

feat: adds an export tool and exported-data resource MCP-16 #424

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 16 commits into
base: main
Choose a base branch
from

Conversation

himanshusinghs
Copy link
Collaborator

@himanshusinghs himanshusinghs commented Aug 5, 2025

Proposed changes

This PR introduces:

  1. a new export tool that allows users to export MongoDB collection data or query results in JSON format
  2. a templated resource called exported-data:// to access the data exports created by export tool

JSON Export tool provides two formats for data export:

  1. relaxed: This exports plain json format at the possible loss of some BSON types. Useful when the data is expected to be used in tools other than MongoDB such as BI tools or when simply having conversation with LLMs
  2. canonical: This exports json with BSON types and is expected to be used when data might get imported back again in MongoDB.

JSON export tool allows exporting:

  1. an entire namespace
  2. query results for a filter (possibly with limit, sort, projection)

Export tool result contains:

  1. a text representation of how to access the exported data via resource URI
  2. a resource_link content part for the clients who are capable of accessing resource_link
  3. a local path, only if the MCP server is running behind stdio transport to make it easier for editor clients to access the exported file.

Exported data temporarily resides on the host machine running the mongodb-mcp-server at the configured path (configuration exposed by flag --exportPath or environment variable MDB_MCP_EXPORT_PATH), until it gets cleaned up automatically either by:

  • the autocleaner running every configured milliseconds (configuration exposed by flag --exportCleanupIntervalMs or environment variable MDB_MCP_EXPORT_CLEANUP_INTERVAL_MS)
  • disconnect of the client

The expiry of exported data is controlled by flag --exportTimeoutMs or the environment variable MDB_MCP_EXPORT_TIMEOUT_MS

The exported data can be accessed via the resource URI template - exported-data://{exportName}. Because the exported data are written to non-guessable filenames, the autocomplete API is provided to easily autocomplete the matching file names.

Checklist

  • JSON export tool should allow exporting BI friendly data
  • JSON export tool should allow exporting data with BSON types
  • JSON export tool should allow exporting entire namespace
  • JSON export tool should allow exporting query results
  • Exported data should be available for limited time and automatically cleaned up during a session and on session disconnect
  • Exported data should be made available via resources for stable and re-usable reference
  • Exported data should optionally be provided via local path when MCP server is connected to editor clients
  • Export tool covered by test cases
  • Exported resource covered by test cases
  • Export cleanup and generation should be covered by test cases

@himanshusinghs himanshusinghs force-pushed the feat/MCP-16-json-export-tool branch from 61e7064 to e359184 Compare August 5, 2025 13:46
@himanshusinghs himanshusinghs marked this pull request as ready for review August 5, 2025 17:47
@Copilot Copilot AI review requested due to automatic review settings August 5, 2025 17:47
@himanshusinghs himanshusinghs requested a review from a team as a code owner August 5, 2025 17:47
Copy link
Contributor

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR introduces a comprehensive JSON export functionality for MongoDB collections, allowing users to export data in both relaxed and canonical JSON formats, with automatic cleanup and resource access via MCP resource templates.

Key changes:

  • Added a new export tool that supports filtering, sorting, limiting, and projecting MongoDB collection data
  • Implemented an exported-data:// resource template for stable access to exported files
  • Created automatic cleanup mechanisms with configurable timeouts and intervals

Reviewed Changes

Copilot reviewed 17 out of 18 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
src/common/sessionExportsManager.ts Core export management with file operations, cleanup, and EJSON formatting
src/tools/mongodb/read/export.ts Export tool implementation with MongoDB query support
src/resources/common/exportedData.ts Resource template for accessing exported data with autocomplete
tests/unit/common/sessionExportsManager.test.ts Comprehensive unit tests for export manager functionality
tests/integration/tools/mongodb/read/export.test.ts Integration tests for export tool with various query scenarios
tests/integration/resources/exportedData.test.ts Resource template integration tests
Comments suppressed due to low confidence (1)

src/common/sessionExportsManager.ts:131

  • The closing bracket is written directly to the output stream here, but the Transform stream's final callback on line 184 also pushes a closing bracket. This will create invalid JSON with duplicate closing brackets.
                    outputStream.write("]\n");

import { Server } from "../../server.js";
import logger, { LogId } from "../../common/logger.js";

export class ExportedData {
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

note: This Resource does not extend from ReactiveResource class because its templated MCP resource unlike static MCP resource (one that is built by ReactiveResource class). I found several points of independence from a "ReactiveResource" for example:

  • This resource does not have a "state" that needs to be reduced
  • Unlike a reactive resource, it only serves as a gateway to actual resources
  • The configuration options for a templated resource are unique to it and while we can extend reactive resource to allow for these options, it wont' seem right.

This comment has been minimized.

Copy link
Contributor

github-actions bot commented Aug 6, 2025

📊 Accuracy Test Results

📈 Summary

Metric Value
Commit SHA d683919a23b1f507815d4908e14ee123f6dd90ea
Run ID 1519ed72-a0df-4e44-bff0-68b22a013d81
Status done
Total Prompts Evaluated 55
Models Tested 1
Average Accuracy 98.6%
Responses with 0% Accuracy 0
Responses with 75% Accuracy 3
Responses with 100% Accuracy 52

📊 Baseline Comparison

Metric Value
Baseline Commit a269053ba0d973ac79ecbeaf0177220cb7d06b06
Baseline Run ID f05769f8-177e-4150-b1a4-390757a692ea
Baseline Run Status done
Responses Improved 0
Responses Regressed 0

📎 Download Full HTML Report - Look for the accuracy-test-summary artifact for detailed results.

Report generated on: 8/6/2025, 11:28:49 AM

@himanshusinghs himanshusinghs force-pushed the feat/MCP-16-json-export-tool branch from 6109cea to e89cf23 Compare August 6, 2025 11:43
@coveralls
Copy link
Collaborator

coveralls commented Aug 6, 2025

Pull Request Test Coverage Report for Build 16781101181

Details

  • 382 of 473 (80.76%) changed or added relevant lines in 11 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage increased (+0.2%) to 81.961%

Changes Missing Coverage Covered Lines Changed/Added Lines %
src/common/config.ts 12 13 92.31%
src/tools/mongodb/read/export.ts 71 73 97.26%
src/common/sessionExportsManager.ts 209 236 88.56%
src/resources/common/exportedData.ts 68 129 52.71%
Totals Coverage Status
Change from base Build 16779162520: 0.2%
Covered Lines: 3791
Relevant Lines: 4585

💛 - Coveralls

Copy link
Collaborator

@nirinchev nirinchev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Haven't reviewed it thoroughly, but leaving some comments for things that should be reworked. Happy to grab some time later today or tomorrow to go through the high-level architecture and offer some suggestions there.

Comment on lines +21 to +27
"export-expired": [string];
"export-available": [string];
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems wrong to me. Event emitters should not have external components emit events on their behalf.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to confirm - are you suggesting that these events should not even be emitted on Session class or rather we should provide public methods for notifying of these events and handle emitting them exclusively in this class?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would be more inclined to move the event manager as a field on the session and have consumers subscribe to the events on the event manager directly, rather than having them proxied through the session.


const MAX_LOCK_RETRIES = 10;

export class SessionExportsManager {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it make sense to have the export manager live on the session rather than the server, similarly to what Kevin is doing with the connection manager? Then tools can access it via the session rather than store it upon registration.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Certainly - I had it on Server only because there wasn't a pattern of accessing session internals in the codebase and I did not want to disturb that. Will move it to Session now that we are already doing the same for ConnectionManager.

public exportsDirectoryPath(): string {
// If the session is not connected, we can't cannot work with exports
// for that session.
if (!this.session.sessionId) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure when and why this session id was added, but looking at it again, it's just an ObjectId that gets set when the server is initialized. Doesn't appear to be any reason not to set it upon session initialization, in which case we'd avoid these redundant checks.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just revisited this and yea it can actually be set on initialization, even for http transport

/**
* Small utility to validate provided export name for path traversal or no
* extension */
private validateExportName(nameWithExtension: string): void {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can be static/free function.

return {
contents: [
{
uri: this.server.exportsManager.exportNameToResourceURI(exportName),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[q] Is this not just uri?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nope, the exportName is the variable substituted in the uri exported-data://{exportName}

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but this.server.exportsManager.exportNameToResourceURI(exportName) would just put the exported-data:// prefix there, right, eventually giving us the original uri?

Comment on lines +49 to +54
await this.exportsManager.createJSONExport({ input: findCursor, exportName, jsonExportFormat });
const exportedResourceURI = this.exportsManager.exportNameToResourceURI(exportName);
const exportedResourcePath = this.exportsManager.exportFilePath(
this.exportsManager.exportsDirectoryPath(),
exportName
);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The path and the uri look like something that should be returned by the exportManager.createJSONExport rather than being queried later.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good point - I will make this change, right away.

return `exported-data://${nameWithExtension}`;
}

public exportsDirectoryPath(): string {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This and exportFilePath being public seems like a code smell to me. The way I see it, the sessionExportsManager should encapsulate all the file management logic, and the entire public API should be more or less surfaced through the Export type. That is, it should have API for listing, reading, writing, and deleting Export's, but the consumers should not ever concern themselves with filesystem operations or paths.

await this.withExportsLock<void>(async (exportsDirectoryPath) => {
const exportFilePath = path.join(exportsDirectoryPath, exportNameWithExtension);
const outputStream = createWriteStream(exportFilePath);
outputStream.write("[");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aren't we missing a closing bracket in the outputStream? Why are we opening it and not closing it?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The closing bracket gets appended in the final callback of Transform stream here.


/**
* Acquires a lock on the session exports directory. */
private async withExportsLock<R>(callback: (lockedPath: string) => Promise<R>): Promise<R> {
Copy link
Collaborator

@kmruiz kmruiz Aug 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure we need this file lock. Usually you have a file lock when you have multiple processes competing for writing on the same file, but this is not the case.

  1. Deleting will remove files that are expired already. Worst case scenario, ENOENT when deleting or reading could be ignored.
  2. Files are written once, and never updated.

I would suggest to keep the lock in memory, as we do right now, and avoid using file locks which have their own set of problems.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a case which explicitly requires locking. It is when writing a file takes way more time than the configured timeout, lets imagine 30 seconds. The cleanup interval, without a lock, would remove this file while its being written and thus lead to inconsistent behaviour.

In addition to the case above, there is an improvement which gets hindered if we don't do the locking. Currently we wait for the export to be finished but in an ideal world, we should be able to return the export URI immediately and then push progress notification about the export progress and eventually the export availability. If we remove the locking, there is chance of reading partially exported data.

I can't vouch for proper-filelock's stability but we have been using it for our DiskStorage in accuracy tests and it has worked pretty reliably so far. It also provides transaction primitives such lock retries on failure (same as transaction queues) and exclusive updates which we are able to use to "confidently" update the state in memory and on disk.

void input.close();
if (pipeSuccessful) {
const resourceURI = this.exportNameToResourceURI(exportNameWithExtension);
this.availableExports = [
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have availableExports and exports files separated and that can lead to inconsistencies. I would suggest to keep only one as a source of truth.

If we need to clean up files that are not available, we can always iterate the folder and remove any file not in the availableExports array.

transform: function (chunk: unknown, encoding, callback) {
++docsTransformed;
try {
const doc: string = EJSON.stringify(chunk, undefined, 2, ejsonOptions);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need the padding? (the 2). This is going to be consumed likely by the agent or another program, and if a user wants to read the documents themselves, they can always use a formatter in their editor. This increases disk size and token count.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a valid point. I don't think we need padding for our case, happy to remove it.

@himanshusinghs himanshusinghs force-pushed the feat/MCP-16-json-export-tool branch from e89cf23 to e8845bc Compare August 6, 2025 14:40

public exportNameToResourceURI(nameWithExtension: string): string {
this.validateExportName(nameWithExtension);
return `exported-data://${nameWithExtension}`;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return `exported-data://${nameWithExtension}`;
return `exported-data://${encodeURIComponent(nameWithExtension)}`;

and the reverse when decoding?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants