feat(llm): add video and audio media support to Gemini protocol by remorses · Pull Request #31889 · anomalyco/opencode · GitHub

feat(llm): add video and audio media support to Gemini protocol #31889

Merged
rekram1-node merged 9 commits into anomalyco:dev from remorses:video-media-support
Jun 22, 2026

Conversation

@remorses

@remorses remorses commented Jun 11, 2026

Contributor

Issue for this PR

N/A

Type of change

  • Bug fix
  • New feature
  • Refactor / code improvement
  • Documentation

What does this PR do?

Extends the Gemini protocol to accept video and audio media as inlineData, not just images.

Video: mp4, webm, quicktime added to MEDIA_MIMES.

Audio: wav, mp3, aiff, aac, ogg, flac added to MEDIA_MIMES. These are the 6 formats listed in the Gemini audio docs.

Gemini's protocol creates its local supported set from ProviderShared.MEDIA_MIMES, so both video and audio flow through the same inlineData path as images with no protocol file changes.

Other protocols (OpenAI Chat, OpenAI Responses, Anthropic, Bedrock) are unaffected because they reference IMAGE_MIMES directly.
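The relationship between the MIME lists and the protocols can be sketched as follows. This is a minimal illustration, not the actual ProviderShared source: the image MIME list, the `GEMINI_SUPPORTED` and `IMAGE_ONLY` sets, and the `toInlineData` helper are assumptions made up for the example. Only the names IMAGE_MIMES, VIDEO_MIMES, AUDIO_MIMES and MEDIA_MIMES and the listed video and audio formats come from this PR.

```typescript
// Illustrative only: the real ProviderShared definitions may differ.
const IMAGE_MIMES = ["image/png", "image/jpeg", "image/gif", "image/webp"] as const // assumed list
const VIDEO_MIMES = ["video/mp4", "video/webm", "video/quicktime"] as const
const AUDIO_MIMES = [
  "audio/wav",
  "audio/mp3",
  "audio/aiff",
  "audio/aac",
  "audio/ogg",
  "audio/flac",
] as const
const MEDIA_MIMES = [...IMAGE_MIMES, ...VIDEO_MIMES, ...AUDIO_MIMES] as const

// Gemini builds its supported set from MEDIA_MIMES; image-only protocols
// build theirs from IMAGE_MIMES, so widening MEDIA_MIMES does not affect them.
const GEMINI_SUPPORTED: ReadonlySet<string> = new Set<string>(MEDIA_MIMES)
const IMAGE_ONLY: ReadonlySet<string> = new Set<string>(IMAGE_MIMES)

type InlineDataPart = { inlineData: { mimeType: string; data: string } }

// Hypothetical helper: wraps base64 media in a Gemini-style inlineData part,
// rejecting MIME types the given protocol does not support.
function toInlineData(mime: string, base64: string, supported: ReadonlySet<string>): InlineDataPart {
  const normalized = mime.toLowerCase()
  if (!supported.has(normalized)) throw new Error(`Unsupported media type: ${mime}`)
  return { inlineData: { mimeType: normalized, data: base64 } }
}
```

With these definitions, `toInlineData("audio/wav", data, GEMINI_SUPPORTED)` yields an inlineData part, while the same call with `IMAGE_ONLY` throws.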

Also raises the media size limits from 6 MB decoded / 8 MB encoded to 20 MB decoded / 28 MB encoded.
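The gap between the decoded and encoded limits follows from base64 turning every 3 raw bytes into 4 characters. The sketch below works through that arithmetic. The function names are hypothetical, and treating 1 MB as 1024 * 1024 bytes is an assumption; the PR does not say which unit the implementation uses.

```typescript
const MB = 1024 * 1024 // assumption: the PR does not state whether MB is 2^20 or 10^6 bytes

// Length of the padded base64 string for a payload of `decodedBytes` raw bytes.
function base64Length(decodedBytes: number): number {
  return Math.ceil(decodedBytes / 3) * 4
}

// Raw byte count of a padded base64 string (no whitespace, standard padding).
function decodedSize(base64: string): number {
  const padding = base64.endsWith("==") ? 2 : base64.endsWith("=") ? 1 : 0
  return (base64.length / 4) * 3 - padding
}

// Hypothetical limit check mirroring the two numbers quoted in the PR.
const LIMITS = { decodedBytes: 20 * MB, encodedChars: 28 * MB }

function withinLimits(base64: string): boolean {
  return base64.length <= LIMITS.encodedChars && decodedSize(base64) <= LIMITS.decodedBytes
}
```

Under these assumptions a 20 MB payload encodes to roughly 26.7 MB of base64, so a 28 MB encoded limit leaves some headroom above the decoded limit.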

How did you verify your code works?

Existing Gemini test suite passes (bun test test/provider/gemini.test.ts). Other provider tests (OpenAI Responses, Anthropic) still correctly reject audio.

Checklist

  • I have tested my changes locally
  • I have not included unrelated changes in this PR

…ni protocol

Adds video file reading (MP4, WebM) to the read tool and widens
the Gemini protocol to accept video MIME types via inlineData.

MIME sniffing uses ftyp major brand validation for MP4 (excludes
AVIF/HEIC/M4A) and EBML DocType verification for WebM (excludes
Matroska). Video bypasses photon image normalization and passes
through as raw base64.
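The sniffing described in this commit message can be sketched roughly as below. This is not the code from the commit: `sniffVideo`, `ascii`, and the `MP4_BRANDS` allowlist are hypothetical, the brand list is illustrative rather than exhaustive, and the EBML scan only handles a DocType whose size is a one-byte value found within the first 64 bytes.

```typescript
// Assumed allowlist of ftyp major brands treated as MP4 video. Brands such as
// "avif", "heic" or "M4A " are not listed, so those files are rejected.
const MP4_BRANDS = new Set(["isom", "iso2", "mp41", "mp42", "avc1"])

// Reads bytes[start, end) as an ASCII string.
function ascii(bytes: Uint8Array, start: number, end: number): string {
  let out = ""
  for (let i = start; i < end && i < bytes.length; i++) out += String.fromCharCode(bytes[i] ?? 0)
  return out
}

function sniffVideo(bytes: Uint8Array): "video/mp4" | "video/webm" | undefined {
  // ISO BMFF: 4-byte box size, the box type "ftyp", then the 4-byte major brand.
  if (bytes.length >= 12 && ascii(bytes, 4, 8) === "ftyp") {
    return MP4_BRANDS.has(ascii(bytes, 8, 12)) ? "video/mp4" : undefined
  }
  // EBML: magic 1A 45 DF A3, then a DocType element (ID 42 82) in the header.
  const isEbml = bytes[0] === 0x1a && bytes[1] === 0x45 && bytes[2] === 0xdf && bytes[3] === 0xa3
  if (!isEbml) return undefined
  const limit = Math.min(bytes.length - 2, 64)
  for (let i = 4; i < limit; i++) {
    if (bytes[i] !== 0x42 || bytes[i + 1] !== 0x82) continue
    const sizeByte = bytes[i + 2] ?? 0
    if ((sizeByte & 0x80) === 0) return undefined // multi-byte sizes are not handled in this sketch
    const size = sizeByte & 0x7f
    // "webm" is accepted; "matroska" and any other DocType are rejected.
    return ascii(bytes, i + 3, i + 3 + size) === "webm" ? "video/webm" : undefined
  }
  return undefined
}
```

An allowlist of major brands is used here because AVIF, HEIC and M4A files share the same ftyp container layout as MP4 and differ only in the brand.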

Gemini gets higher media size limits for video (20 MB decoded /
28 MB encoded) while other protocols keep the default image limits
(6 MB / 8 MB). OpenAI and Anthropic stay image-only since their
APIs do not support inline base64 video.
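One way to picture the per-protocol limit selection described in this commit message is the sketch below. The `MediaLimits` type, the `limitsFor` function and the protocol labels are hypothetical, and the values reflect only this commit message; later commits in this PR changed the limits.

```typescript
type MediaLimits = { decodedBytes: number; encodedChars: number }

const MIB = 1024 * 1024 // assumption: MB taken as 2^20 bytes

// Defaults quoted in the commit message for image payloads.
const IMAGE_LIMITS: MediaLimits = { decodedBytes: 6 * MIB, encodedChars: 8 * MIB }
// Higher limits quoted for video sent to Gemini.
const GEMINI_VIDEO_LIMITS: MediaLimits = { decodedBytes: 20 * MIB, encodedChars: 28 * MIB }

// Hypothetical selector: only Gemini receives the larger video budget.
// The MIME check is case-insensitive so "Video/MP4" and "video/mp4" agree.
function limitsFor(protocol: "gemini" | "other", mime: string): MediaLimits {
  const isVideo = mime.toLowerCase().startsWith("video/")
  return protocol === "gemini" && isVideo ? GEMINI_VIDEO_LIMITS : IMAGE_LIMITS
}
```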

Session: ses_1494c002affeaQLbzZve1wSJEm

…pport

Video support is provided through the protocol layer only, so plugins
can return video via Tool.make toModelOutput. The native read tool
stays image/PDF-only per maintainer preference.

Session: ses_1494c002affeaQLbzZve1wSJEm
@remorses remorses changed the title from "feat(core): add video media support (mp4, webm) to read tool and Gemini protocol" to "feat(llm): add video media support to Gemini protocol" Jun 11, 2026
remorses added 2 commits June 11, 2026 15:23
- Replace base64 regex with character-by-character validator to avoid
  Bun/JSC regex failure on large strings (>4 MB)
- Lower Gemini video limits to 14 MB decoded / 20 MB encoded to stay
  within Gemini total request size budget
- Case-insensitive video MIME detection for limit selection
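A character-by-character validator of the kind mentioned in the first bullet might look like the sketch below. It is not the validator from this commit: `isBase64` is a hypothetical name, and the sketch assumes the standard alphabet with `=` padding, no whitespace, and no URL-safe variant.

```typescript
// Validates padded, standard-alphabet base64 with a single linear scan,
// so no regular expression has to run over a multi-megabyte string.
function isBase64(input: string): boolean {
  if (input.length % 4 !== 0) return false
  let padding = 0
  for (let i = 0; i < input.length; i++) {
    const c = input.charCodeAt(i)
    if (c === 61 /* '=' */) {
      padding++
      // Padding may only occupy the last two positions.
      if (padding > 2 || i < input.length - 2) return false
      continue
    }
    // No data characters are allowed once padding has started.
    if (padding > 0) return false
    const isUpper = c >= 65 && c <= 90
    const isLower = c >= 97 && c <= 122
    const isDigit = c >= 48 && c <= 57
    if (!isUpper && !isLower && !isDigit && c !== 43 /* '+' */ && c !== 47 /* '/' */) return false
  }
  return true
}
```

Note that this sketch accepts the empty string; a caller that needs non-empty media would check the length separately.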

Session: ses_1494c002affeaQLbzZve1wSJEm
Add VIDEO_MIMES and MEDIA_MIMES to ProviderShared and widen
Gemini protocol to accept video MIME types (mp4, webm, quicktime)
via inlineData. Plugins can now return video content to Gemini
models through toModelOutput or v1 ToolAttachment.

Other protocols stay image-only.

Session: ses_1494c002affeaQLbzZve1wSJEm
@remorses remorses marked this pull request as ready for review June 11, 2026 13:27
remorses added 2 commits June 11, 2026 16:18
Previous 6 MB / 8 MB limits were too tight for video. Raise to
20 MB decoded (raw bytes) / 28 MB encoded (base64 string) to
support reasonable video clip sizes.

Session: ses_1494c002affeaQLbzZve1wSJEm
Add AUDIO_MIMES array with 12 audio types supported by the Gemini API
(wav, mp3, mpeg, aiff, aac, ogg, flac, m4a, mp4, opus, pcm, webm) and
include it in MEDIA_MIMES. Since Gemini creates its local supported set
from ProviderShared.MEDIA_MIMES, audio flows through the same inlineData
path as images and video with zero protocol changes needed.

Other protocols (OpenAI Chat, OpenAI Responses, Anthropic, Bedrock) are
unaffected since they reference IMAGE_MIMES directly.

Session: ses_135abc74affeXj589B4g7YkPb0
@remorses remorses changed the title from "feat(llm): add video media support to Gemini protocol" to "feat(llm): add video and audio media support to Gemini protocol" Jun 15, 2026
@github-actions github-actions Bot added the needs:compliance label (label description: This means the issue will auto-close after 2 hours.) Jun 15, 2026
Remove 6 audio types not listed in Gemini docs (mpeg, m4a, mp4, opus,
pcm, webm). Keep only the 6 officially supported formats: wav, mp3,
aiff, aac, ogg, flac.

Ref: https://ai.google.dev/gemini-api/docs/audio#supported-audio-formats

Session: ses_135abc74affeXj589B4g7YkPb0
@github-actions github-actions Bot removed the needs:compliance label (label description: This means the issue will auto-close after 2 hours.) Jun 15, 2026

@rekram1-node rekram1-node merged commit d5980b4 into anomalyco:dev Jun 22, 2026
6 of 8 checks passed
markjaquith pushed a commit to markjaquith/opencode that referenced this pull request Jun 23, 2026
…alyco#31889)

Co-authored-by: Aiden Cline <63023139+rekram1-node@users.noreply.github.com>
BenGu3 pushed a commit to BenGu3/opencode that referenced this pull request Jun 27, 2026
…alyco#31889)

Co-authored-by: Aiden Cline <63023139+rekram1-node@users.noreply.github.com>

2 participants