-
Notifications
You must be signed in to change notification settings - Fork 92
feat: adds an export tool and exported-data resource MCP-16 #424
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
61e7064
to
e359184
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR introduces a comprehensive JSON export functionality for MongoDB collections, allowing users to export data in both relaxed and canonical JSON formats, with automatic cleanup and resource access via MCP resource templates.
Key changes:
- Added a new
export
tool that supports filtering, sorting, limiting, and projecting MongoDB collection data - Implemented an
exported-data://
resource template for stable access to exported files - Created automatic cleanup mechanisms with configurable timeouts and intervals
Reviewed Changes
Copilot reviewed 17 out of 18 changed files in this pull request and generated 3 comments.
Show a summary per file
File | Description |
---|---|
src/common/sessionExportsManager.ts | Core export management with file operations, cleanup, and EJSON formatting |
src/tools/mongodb/read/export.ts | Export tool implementation with MongoDB query support |
src/resources/common/exportedData.ts | Resource template for accessing exported data with autocomplete |
tests/unit/common/sessionExportsManager.test.ts | Comprehensive unit tests for export manager functionality |
tests/integration/tools/mongodb/read/export.test.ts | Integration tests for export tool with various query scenarios |
tests/integration/resources/exportedData.test.ts | Resource template integration tests |
Comments suppressed due to low confidence (1)
src/common/sessionExportsManager.ts:131
- The closing bracket is written directly to the output stream here, but the Transform stream's
final
callback on line 184 also pushes a closing bracket. This will create invalid JSON with duplicate closing brackets.
outputStream.write("]\n");
import { Server } from "../../server.js"; | ||
import logger, { LogId } from "../../common/logger.js"; | ||
|
||
export class ExportedData { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
note: This Resource does not extend from ReactiveResource class because its templated MCP resource unlike static MCP resource (one that is built by ReactiveResource class). I found several points of independence from a "ReactiveResource" for example:
- This resource does not have a "state" that needs to be reduced
- Unlike a reactive resource, it only serves as a gateway to actual resources
- The configuration options for a templated resource are unique to it and while we can extend reactive resource to allow for these options, it wont' seem right.
This comment has been minimized.
This comment has been minimized.
📊 Accuracy Test Results📈 Summary
📊 Baseline Comparison
📎 Download Full HTML Report - Look for the Report generated on: 8/6/2025, 11:28:49 AM |
6109cea
to
e89cf23
Compare
Pull Request Test Coverage Report for Build 16781101181Details
💛 - Coveralls |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Haven't reviewed it thoroughly, but leaving some comments for things that should be reworked. Happy to grab some time later today or tomorrow to go through the high-level architecture and offer some suggestions there.
"export-expired": [string]; | ||
"export-available": [string]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems wrong to me. Event emitters should not have external components emit events on their behalf.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to confirm - are you suggesting that these events should not even be emitted on Session class or rather we should provide public methods for notifying of these events and handle emitting them exclusively in this class?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would be more inclined to move the event manager as a field on the session and have consumers subscribe to the events on the event manager directly, rather than having them proxied through the session.
|
||
const MAX_LOCK_RETRIES = 10; | ||
|
||
export class SessionExportsManager { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it make sense to have the export manager live on the session rather than the server, similarly to what Kevin is doing with the connection manager? Then tools can access it via the session rather than store it upon registration.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Certainly - I had it on Server only because there wasn't a pattern of accessing session internals in the codebase and I did not want to disturb that. Will move it to Session now that we are already doing the same for ConnectionManager.
src/common/sessionExportsManager.ts
Outdated
public exportsDirectoryPath(): string { | ||
// If the session is not connected, we can't cannot work with exports | ||
// for that session. | ||
if (!this.session.sessionId) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure when and why this session id was added, but looking at it again, it's just an ObjectId that gets set when the server is initialized. Doesn't appear to be any reason not to set it upon session initialization, in which case we'd avoid these redundant checks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just revisited this and yea it can actually be set on initialization, even for http transport
/** | ||
* Small utility to validate provided export name for path traversal or no | ||
* extension */ | ||
private validateExportName(nameWithExtension: string): void { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can be static/free function.
return { | ||
contents: [ | ||
{ | ||
uri: this.server.exportsManager.exportNameToResourceURI(exportName), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[q] Is this not just uri
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nope, the exportName is the variable substituted in the uri exported-data://{exportName}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, but this.server.exportsManager.exportNameToResourceURI(exportName)
would just put the exported-data://
prefix there, right, eventually giving us the original uri?
await this.exportsManager.createJSONExport({ input: findCursor, exportName, jsonExportFormat }); | ||
const exportedResourceURI = this.exportsManager.exportNameToResourceURI(exportName); | ||
const exportedResourcePath = this.exportsManager.exportFilePath( | ||
this.exportsManager.exportsDirectoryPath(), | ||
exportName | ||
); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The path and the uri look like something that should be returned by the exportManager.createJSONExport
rather than being queried later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a good point - I will make this change, right away.
return `exported-data://${nameWithExtension}`; | ||
} | ||
|
||
public exportsDirectoryPath(): string { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This and exportFilePath
being public seems like a code smell to me. The way I see it, the sessionExportsManager
should encapsulate all the file management logic, and the entire public API should be more or less surfaced through the Export
type. That is, it should have API for listing, reading, writing, and deleting Export
's, but the consumers should not ever concern themselves with filesystem operations or paths.
await this.withExportsLock<void>(async (exportsDirectoryPath) => { | ||
const exportFilePath = path.join(exportsDirectoryPath, exportNameWithExtension); | ||
const outputStream = createWriteStream(exportFilePath); | ||
outputStream.write("["); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Aren't we missing a closing bracket in the outputStream? Why are we opening it and not closing it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The closing bracket gets appended in the final callback of Transform stream here.
|
||
/** | ||
* Acquires a lock on the session exports directory. */ | ||
private async withExportsLock<R>(callback: (lockedPath: string) => Promise<R>): Promise<R> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure we need this file lock. Usually you have a file lock when you have multiple processes competing for writing on the same file, but this is not the case.
- Deleting will remove files that are expired already. Worst case scenario, ENOENT when deleting or reading could be ignored.
- Files are written once, and never updated.
I would suggest to keep the lock in memory, as we do right now, and avoid using file locks which have their own set of problems.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is a case which explicitly requires locking. It is when writing a file takes way more time than the configured timeout, lets imagine 30 seconds. The cleanup interval, without a lock, would remove this file while its being written and thus lead to inconsistent behaviour.
In addition to the case above, there is an improvement which gets hindered if we don't do the locking. Currently we wait for the export to be finished but in an ideal world, we should be able to return the export URI immediately and then push progress notification about the export progress and eventually the export availability. If we remove the locking, there is chance of reading partially exported data.
I can't vouch for proper-filelock's stability but we have been using it for our DiskStorage in accuracy tests and it has worked pretty reliably so far. It also provides transaction primitives such lock retries on failure (same as transaction queues) and exclusive updates which we are able to use to "confidently" update the state in memory and on disk.
void input.close(); | ||
if (pipeSuccessful) { | ||
const resourceURI = this.exportNameToResourceURI(exportNameWithExtension); | ||
this.availableExports = [ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have availableExports and exports files separated and that can lead to inconsistencies. I would suggest to keep only one as a source of truth.
If we need to clean up files that are not available, we can always iterate the folder and remove any file not in the availableExports array.
transform: function (chunk: unknown, encoding, callback) { | ||
++docsTransformed; | ||
try { | ||
const doc: string = EJSON.stringify(chunk, undefined, 2, ejsonOptions); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need the padding? (the 2). This is going to be consumed likely by the agent or another program, and if a user wants to read the documents themselves, they can always use a formatter in their editor. This increases disk size and token count.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a valid point. I don't think we need padding for our case, happy to remove it.
also makes SessionExportManager available on MongoDBToolBase
e89cf23
to
e8845bc
Compare
|
||
public exportNameToResourceURI(nameWithExtension: string): string { | ||
this.validateExportName(nameWithExtension); | ||
return `exported-data://${nameWithExtension}`; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
return `exported-data://${nameWithExtension}`; | |
return `exported-data://${encodeURIComponent(nameWithExtension)}`; |
and the reverse when decoding?
Proposed changes
This PR introduces:
exported-data://
to access the data exports created by export toolJSON Export tool provides two formats for data export:
relaxed
: This exports plain json format at the possible loss of some BSON types. Useful when the data is expected to be used in tools other than MongoDB such as BI tools or when simply having conversation with LLMscanonical
: This exports json with BSON types and is expected to be used when data might get imported back again in MongoDB.JSON export tool allows exporting:
filter
(possibly withlimit
,sort
,projection
)Export tool result contains:
stdio
transport to make it easier for editor clients to access the exported file.Exported data temporarily resides on the host machine running the mongodb-mcp-server at the configured path (configuration exposed by flag
--exportPath
or environment variableMDB_MCP_EXPORT_PATH
), until it gets cleaned up automatically either by:--exportCleanupIntervalMs
or environment variableMDB_MCP_EXPORT_CLEANUP_INTERVAL_MS
)The expiry of exported data is controlled by flag
--exportTimeoutMs
or the environment variableMDB_MCP_EXPORT_TIMEOUT_MS
The exported data can be accessed via the resource URI template -
exported-data://{exportName}
. Because the exported data are written to non-guessable filenames, the autocomplete API is provided to easily autocomplete the matching file names.Checklist