[new feature request] Is it possible to support screenshot as a server or resource? #48

wa008 · 2024-11-26T02:37:43Z

I want LLM to read screenshot as a input, like what computer use do, but MCP can have a better permission control management.

Is it possible? or is there some clue or suggestion that can help me to contribute it.

jspahrsummers · 2024-11-26T03:16:55Z

You could definitely build an MCP server for this. The server could offer one tool which takes a screenshot, and returns it as an image.

Does that help?

deksden · 2024-11-28T14:49:05Z

Is it possible?

Puppeteer server sample have tool:

puppeteer_screenshot

Capture screenshots of the entire page or specific elements
Inputs:
- name (string, required): Name for the screenshot
- selector (string, optional): CSS selector for element to screenshot
- width (number, optional, default: 800): Screenshot width
- height (number, optional, default: 600): Screenshot height

rmensing · 2024-11-29T00:57:36Z

Not sure if this is within the scope here.

We know that the Claude desktop app, through MCP, has access to images using the screenshot tool in the puppeteer server, although the image is available to Claude in base64 encoded format.

We could probably modify the filesystem server to also make images available to Claude also.

The question is, though, how can we actually give it to Claude so he can view it the same as if we uploaded an image directly in chat?

deksden · 2024-11-30T19:52:27Z

@rmensing I m sure you have to serve it base64 encoded, same as for any API request.

MCP server only provides some data ( context, prompt, tools) to host/client, and its up to you how you communicate with LLM. No difference where you get data: from MCP server or just performing generic API call to LLM.

rmensing · 2024-12-01T01:04:45Z

@deksden In chat it cannot directly use the base64 formatted image. Although it could write code to convert it back to the image format, it still does not have the ability to view it. The only reason base64 is used in the API is so it can be sent in a text format. once it is received it is converted back to an image more than likely.

There would need to be the functionality in the Claude desktop app MCP client implementation for it to attach the image, or other documents for that matter, the same as you do in the chat interface. Unfortunately we don't have any documentation on the interface between the MCP client code and the desktop app as far as what the desktop app can or can't do. My guess is that they could probably code it to accept the images and view them, it's only a matter of will they. It could be a case of them having a reason to not implement it.

jspahrsummers · 2024-12-03T13:35:11Z

As this is supported in MCP, and demonstrated in the examples, I'm going to close this out as resolved. Please start a discussion if you have other questions.

jspahrsummers closed this as completed Dec 3, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[new feature request] Is it possible to support screenshot as a server or resource? #48

[new feature request] Is it possible to support screenshot as a server or resource? #48

wa008 commented Nov 26, 2024

jspahrsummers commented Nov 26, 2024

deksden commented Nov 28, 2024

rmensing commented Nov 29, 2024

deksden commented Nov 30, 2024

rmensing commented Dec 1, 2024

jspahrsummers commented Dec 3, 2024

[new feature request] Is it possible to support screenshot as a server or resource? #48

[new feature request] Is it possible to support screenshot as a server or resource? #48

Comments

wa008 commented Nov 26, 2024

jspahrsummers commented Nov 26, 2024

deksden commented Nov 28, 2024

rmensing commented Nov 29, 2024

deksden commented Nov 30, 2024

rmensing commented Dec 1, 2024

jspahrsummers commented Dec 3, 2024