@TomeHirata
Collaborator

Add support for file input by introducing `dspy.File`. File was the last content type supported by the OpenAI Chat Completions API that DSPy did not support natively.

Closes #8974

…a URI representation

- Added MIME type detection in `from_path` and `from_bytes` methods.
- Updated `__repr__` to display file data as a data URI.
- Modified tests to validate new functionality and ensure correct MIME type handling.
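As a rough illustration of the MIME type detection and data URI representation described in the list above, here is a minimal sketch using only the Python standard library. It is not the code from this PR: the helper names `guess_mime_type` and `to_data_uri` and the `application/octet-stream` fallback are assumptions made for the example.

```python
import base64
import mimetypes


def guess_mime_type(filename: str, default: str = "application/octet-stream") -> str:
    """Guess a MIME type from the file extension (hypothetical helper)."""
    mime_type, _ = mimetypes.guess_type(filename)
    # guess_type returns None when the extension is missing or unknown.
    return mime_type or default


def to_data_uri(data: bytes, mime_type: str) -> str:
    """Encode raw bytes as a base64 data URI (hypothetical helper)."""
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


pdf_mime = guess_mime_type("report.pdf")
data_uri = to_data_uri(b"%PDF-1.4", pdf_mime)
```

A name without an extension falls back to the default type here, which is one reason a caller-supplied MIME type can be useful.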

Signed-off-by: TomuHirata <[email protected]>
@synaptiz left a comment

Suggestion: add a `mime_type` argument to the `from_file_id()` function.

return cls(file_data=file_data, filename=filename)

@classmethod
def from_file_id(cls, file_id: str, filename: str | None = None) -> "File":

I think it would be good to add a `mime_type` argument to `from_file_id()`. This function will be used when the client already has the URI of an uploaded file. If the client knows the file's MIME type, they should be able to pass it to this function as well. Since the URI may not always contain the file extension, DSPy would otherwise have no way to determine the MIME type without reading the file contents.
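To illustrate the concern about URIs without a file extension, the sketch below uses the standard library's `mimetypes.guess_type`, which infers a type purely from the name. The helper `mime_from_uri` and the `example.com` URLs are hypothetical and are not part of DSPy.

```python
import mimetypes
from typing import Optional


def mime_from_uri(uri: str) -> Optional[str]:
    """Infer a MIME type from the URI's extension only (hypothetical helper)."""
    mime_type, _ = mimetypes.guess_type(uri)
    return mime_type


# The extension is present, so a type can be inferred from the name alone.
with_extension = mime_from_uri("https://example.com/files/report.pdf")

# An opaque identifier carries no extension, so nothing can be inferred.
without_extension = mime_from_uri("https://example.com/files/abc123")
```

In the second case the only remaining options are to read the file contents or to accept a type supplied by the caller.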

Collaborator Author

How would `mime_type` be used? According to the OpenAI specification, `mime_type` is not a supported field of the file content part: https://platform.openai.com/docs/api-reference/chat/create#chat_create-messages-user_message-content-array_of_content_parts-file_content_part-file.

Hi @TomeHirata,

Thanks for sharing the specs link.

Based on my testing, it appears that LiteLLM expects `format` to be passed along with `file_id`. The `format` attribute is optional; if it is not passed, LiteLLM tries to infer it by reading the file content from the URL, which fails if the file isn't accessible. You can ignore my comment, since this seems to be an issue on LiteLLM's side.

Best,

Rakesh
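For readers following the thread, the difference under discussion can be sketched as a plain dictionary. This is only an assumption-laden illustration: the `type`/`file`/`file_id` shape follows the OpenAI file content part linked above, the optional `format` key follows the LiteLLM behavior described in this comment, and `build_file_part` is a hypothetical helper that exists in neither library.

```python
from typing import Any, Dict, Optional


def build_file_part(file_id: str, mime_format: Optional[str] = None) -> Dict[str, Any]:
    """Build a file content part as a dict (hypothetical helper)."""
    file_obj: Dict[str, Any] = {"file_id": file_id}
    if mime_format is not None:
        # "format" is the optional hint described in the comment above;
        # it is not a field of the OpenAI file content part.
        file_obj["format"] = mime_format
    return {"type": "file", "file": file_obj}


plain_part = build_file_part("https://example.com/files/abc123")
hinted_part = build_file_part("https://example.com/files/abc123", "application/pdf")
```

Without the hint, the consumer of `plain_part` has to work out the type on its own, which is the failure mode reported when the file is not accessible.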

Development

Successfully merging this pull request may close these issues.

[Feature] Using DSPy with Google Gemini Models unable to read uploaded file's content

2 participants