add plain text to multimodal prompting guide #400

Closed
wants to merge 4 commits into from
Changes from 3 commits
36 changes: 29 additions & 7 deletions site/en/gemini-api/docs/prompting_with_media.ipynb
@@ -65,10 +65,9 @@
},
@markmcd (Member), May 6, 2024

Can we make this wording sound less strict?

> they are subject to the following limitations and requirements

It's not a strict requirement that plain text inputs adhere to one of these formats, since they don't map to an encoding like other media.

We should also recommend that users specify the MIME type manually if it isn't in this list. In this cookbook recipe I gave an example with a C++ file, which isn't detected as plain text even though it is, so users can specify the type themselves.
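
The suggestion above, setting the MIME type by hand when a text file isn't detected as one of the listed types, can be sketched in Python roughly as follows. This is a minimal sketch, not the cookbook recipe: `choose_text_mime_type` and `LISTED_TEXT_MIME_TYPES` are hypothetical names, detection uses the standard library's `mimetypes` module as a stand-in for whatever the File API does, and the set simply copies the list added in this PR.

```python
import mimetypes
from typing import Optional

# MIME types copied from the "Plain text formats" list added in this PR.
LISTED_TEXT_MIME_TYPES = frozenset({
    "text/plain", "text/html", "text/css", "text/javascript",
    "application/x-javascript", "text/x-typescript",
    "application/x-typescript", "text/csv", "text/markdown",
    "text/x-python", "application/x-python-code", "application/json",
    "text/xml", "application/rtf", "text/rtf", "video/text/timestamp",
})


def choose_text_mime_type(path: str, override: Optional[str] = None) -> str:
    """Hypothetical helper: pick the MIME type to send with a text upload.

    An explicit override always wins, which covers files such as C++
    sources that are plain text but may not be detected as a listed type.
    """
    if override is not None:
        return override
    # Stand-in for automatic detection (an assumption, not the API's logic).
    guessed, _encoding = mimetypes.guess_type(path)
    if guessed in LISTED_TEXT_MIME_TYPES:
        return guessed
    raise ValueError(
        f"{path!r} was detected as {guessed!r}, which is not a listed text "
        "type; pass override= (for example 'text/plain') if it is plain text."
    )


# A C++ file labelled manually, as the comment above suggests.
cpp_mime = choose_text_mime_type("main.cpp", override="text/plain")
```

With an explicit override, a C++ source file can be labelled `text/plain` regardless of what detection reports. The resulting string would then be supplied as the MIME type at upload time, assuming the client library accepts one.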


"source": [
"The Gemini API supports prompting with text, image, and audio data, also known as *multimodal* prompting. You can include text, image,\n",
-"and audio in your prompts. For small images, you can point the Gemini model\n",
-"directly to a local file when providing a prompt. For larger images, videos\n",
-"(sequences of image frames), and audio, upload the files with the [File\n",
-"API](https://ai.google.dev/api/rest/v1beta/files) before including them in\n",
+"and audio in your prompts. For small files, you can point the Gemini model\n",
+"directly to a local file when providing a prompt. Upload larger files with the\n",
+"[File API](https://ai.google.dev/api/rest/v1beta/files) before including them in\n",
"prompts.\n",
"\n",
"The File API lets you store up to 20GB of files per project, with each file not\n",
@@ -176,7 +175,7 @@
"source": [
"## Upload a file to the File API\n",
"\n",
-"The File API lets you upload a variety of multimodal MIME types, including images and audio formats. The File API handles inputs that can be used to generate content with [`model.generateContent`](https://ai.google.dev/api/rest/v1/models/generateContent) or [`model.streamGenerateContent`](https://ai.google.dev/api/rest/v1/models/streamGenerateContent).\n",
+"The File API lets you upload a variety of multimodal MIME types, including plain text, images, and audio formats. The File API handles inputs that can be used to generate content with [`model.generateContent`](https://ai.google.dev/api/rest/v1/models/generateContent) or [`model.streamGenerateContent`](https://ai.google.dev/api/rest/v1/models/streamGenerateContent).\n",
"\n",
"The File API accepts files under 2GB in size and can store up to 20GB of files per project. Files last for 2 days and cannot be downloaded from the API."
]
@@ -328,7 +327,7 @@
"source": [
"## Supported file formats\n",
"\n",
-"Gemini models support prompting with multiple file formats. This section explains considerations in using general media formats for prompting, specifically image, audio, and video files. You can use media files for prompting only with specific model versions, as shown in the following table.\n",
+"Gemini models support prompting with multiple file formats. This section explains considerations in using general media formats for prompting, specifically image, audio, video, and plain text files. You can use media files for prompting only with specific model versions, as shown in the following table.\n",
"\n",
"<table>\n",
" <thead>\n",
@@ -387,7 +386,30 @@
"\n",
"You can use video data for prompting with the `gemini-1.5-pro` model. However, video file formats are not supported as direct inputs by the Gemini API. You can use video data as prompt input by breaking down the video into a series of still frame images and a separate audio file. This approach lets you manage the amount of data, and the level of detail provided by the video, by choosing how many frames per second are included in your prompt from the video file.\n",
"\n",
-"Note: Video files added to a prompt as constituent parts, audio file and image frames, are considered as separate prompt data inputs by the model. For this reason, requests or questions that specify the time when *both* an audio snippet and video frames appear in the source video may not produce useful results."
+"Note: Video files added to a prompt as constituent parts, audio file and image frames, are considered as separate prompt data inputs by the model. For this reason, requests or questions that specify the time when *both* an audio snippet and video frames appear in the source video may not produce useful results.\n",
+"\n",
+"### Plain text formats\n",
+"\n",
+"When you use plain text files for prompting, they are subject to the following limitations and requirements:\n",
+"\n",
+"- The File API supports uploading plain text files with the following MIME\n",
+"  types:\n",
+"    - text/plain\n",
+"    - text/html \n",
+"    - text/css\n",
+"    - text/javascript\n",
+"    - application/x-javascript\n",
+"    - text/x-typescript\n",
+"    - application/x-typescript\n",
+"    - text/csv\n",
+"    - text/markdown\n",
+"    - text/x-python\n",
+"    - application/x-python-code\n",
+"    - application/json\n",
+"    - text/xml\n",
+"    - application/rtf\n",
+"    - text/rtf\n",
+"    - video/text/timestamp"
]
},
{