🟡 🤖 Models · Thursday, May 7, 2026 · 2 min read

Google: Gemini API Gets Multimodal File Search for Images and a Breaking Change in the Interactions API

Google has expanded Gemini File Search to multimodal image search using the gemini-embedding-2 model, with a media_id field in grounding metadata for visual citations. At the same time, it has announced a breaking change in the Interactions API: the outputs field becomes steps, the new schema becomes the default on 20 May 2026, and the old schema is removed on 6 June 2026.

This article was generated using artificial intelligence from primary sources.

Google announced two significant changes in the Gemini API changelog: an extension of File Search to multimodal image search (6 May 2026) and a breaking change in the Interactions API (7 May 2026). Both changes affect developers building applications on the Gemini stack.

What does multimodal File Search enable?

File Search now natively embeds and searches images using the new gemini-embedding-2 model. This eliminates the previous workflow where developers had to separately generate embeddings for visual content or convert images into text descriptions.

Grounding metadata has been expanded with two new fields: media_id, which enables visual citations (precisely identifying which image contributed to the response), and page_numbers, which points to specific pages within the source document. The combination makes it easier to build RAG systems over PDFs and other documents that mix text and images.
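As a rough sketch of how a citation UI might consume these fields, the snippet below walks a grounding-metadata payload and collects one citation per chunk. The payload shape is an assumption made for illustration: the `grounding_chunks` key, the `title` field, the flat layout, and the `Citation` and `collect_citations` names are all hypothetical. Only the field names media_id and page_numbers come from the changelog, so the real response structure should be checked against the API reference.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class Citation:
    """One source reference to show next to a generated answer."""
    title: str
    media_id: Optional[str] = None  # set when an image contributed
    page_numbers: list[int] = field(default_factory=list)


def collect_citations(grounding_metadata: dict[str, Any]) -> list[Citation]:
    """Build citations from a grounding-metadata dict.

    ASSUMPTION: chunks live under a "grounding_chunks" key and each chunk
    is a flat dict. The real payload may nest these fields differently.
    """
    citations = []
    for chunk in grounding_metadata.get("grounding_chunks", []):
        citations.append(
            Citation(
                title=chunk.get("title", "untitled"),
                media_id=chunk.get("media_id"),
                page_numbers=list(chunk.get("page_numbers", [])),
            )
        )
    return citations


# Hypothetical payload, hand-written for this example.
sample = {
    "grounding_chunks": [
        {"title": "manual.pdf", "page_numbers": [4, 5]},
        {"title": "manual.pdf", "media_id": "img-001", "page_numbers": [7]},
    ]
}

for c in collect_citations(sample):
    kind = "image" if c.media_id else "text"
    print(f"{c.title} ({kind}) pages={c.page_numbers} media_id={c.media_id}")
```

Treating media_id as optional lets text-only chunks and image chunks share one code path, which matches the article's point that existing text searches keep working.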

What is changing in the Interactions API?

This is a breaking change in the request and response schema. The outputs field is being renamed to steps, and the output format configuration (response_format) is also changing. Google states in the changelog: “The Interactions API request and response schema (outputs → steps) and output format configuration (response_format) are changing.”
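One way to ride out the transition is a small accessor that reads the new key and falls back to the old one. The sketch below assumes the response is available as a plain dict and that the rename applies to a top-level key; `extract_steps` is a hypothetical helper, not part of any Google SDK. The items inside the list may also change shape under the new schema, which this sketch does not attempt to handle.

```python
from typing import Any


def extract_steps(response: dict[str, Any]) -> list[Any]:
    """Return the result items from an Interactions API response dict.

    Prefers the new "steps" key and falls back to the legacy "outputs" key.
    ASSUMPTION: both keys sit at the top level of the response.
    """
    if "steps" in response:
        return list(response["steps"])
    if "outputs" in response:
        return list(response["outputs"])
    raise KeyError("response has neither 'steps' nor 'outputs'")


# Hand-written example payloads, not real API responses.
new_style = {"steps": [{"text": "hello"}]}
old_style = {"outputs": [{"text": "hello"}]}

print(extract_steps(new_style) == extract_steps(old_style))
```

Once the old schema is removed, the fallback branch becomes dead code and can be deleted.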

The new schema becomes the default on 20 May 2026, roughly two weeks after the announcement, which gives developers time to test their migration before clients are switched over automatically. The old schema is removed entirely on 6 June 2026; after that date, client code written against it will no longer work.
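The timeline can be checked with a few lines of date arithmetic. The three dates below come from the article; the `schema_phase` helper and its phase labels are illustrative, and the assumption that each cutover takes effect at the start of the stated day is not confirmed by the changelog.

```python
from datetime import date

ANNOUNCED = date(2026, 5, 7)     # changelog entry for the breaking change
NEW_DEFAULT = date(2026, 5, 20)  # new schema becomes the default
OLD_REMOVED = date(2026, 6, 6)   # old schema is removed


def days_between(start: date, end: date) -> int:
    """Whole days from start to end."""
    return (end - start).days


def schema_phase(today: date) -> str:
    """Classify a date into a migration phase.

    ASSUMPTION: each cutover applies from the start of the stated day.
    """
    if today < NEW_DEFAULT:
        return "old schema is default"
    if today < OLD_REMOVED:
        return "new schema is default, old schema still available"
    return "old schema removed"


print(days_between(ANNOUNCED, NEW_DEFAULT))    # 13
print(days_between(NEW_DEFAULT, OLD_REMOVED))  # 17
print(days_between(ANNOUNCED, OLD_REMOVED))    # 30
print(schema_phase(date(2026, 5, 25)))
```

So there are 13 days between the announcement and the default switch, and a further 17 days before the old schema disappears.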

What do developers need to do?

Teams using the Interactions API must update their response parsing logic and check their code for references to the outputs field. Google recommends consulting the migration guide before 20 May to avoid production disruptions.
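A quick way to find those references is a text scan of the codebase. The sketch below is a generic regex search, not a Google-provided tool: it flags `.outputs` attribute access and `"outputs"` string keys, so it will also report matches that have nothing to do with the Interactions API and may miss indirect access. The function names and the sample source are hypothetical.

```python
import re
from pathlib import Path

# Matches `.outputs` attribute access and "outputs" / 'outputs' string keys.
OUTPUTS_REF = re.compile(r"""\.outputs\b|["']outputs["']""")


def find_outputs_refs(source: str) -> list[tuple[int, str]]:
    """Return (line number, stripped line) for each line that mentions outputs."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        if OUTPUTS_REF.search(line):
            hits.append((lineno, line.strip()))
    return hits


def scan_tree(root: str, pattern: str = "*.py") -> dict[str, list[tuple[int, str]]]:
    """Scan every file under root that matches pattern."""
    results = {}
    for path in Path(root).rglob(pattern):
        text = path.read_text(encoding="utf-8", errors="ignore")
        hits = find_outputs_refs(text)
        if hits:
            results[str(path)] = hits
    return results


# Hand-written sample source to demonstrate the scan.
example = '''
result = client.call()
for item in result.outputs:
    print(item)
legacy = payload["outputs"]
steps = payload["steps"]
'''

for lineno, text in find_outputs_refs(example):
    print(lineno, text)
```

Running `scan_tree(".")` from a project root gives a per-file list of lines to review by hand.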

For File Search users, the recommendation is to review image retrieval and check whether the citation UI makes use of the new media_id and page_numbers fields. The multimodal extension is backward-compatible: existing text searches continue to work without modification.

Frequently Asked Questions

What does the multimodal File Search extension bring?
File Search can now natively embed and search images using the gemini-embedding-2 model. Grounding metadata includes media_id for visual citations and page_numbers indicating where information is located in source documents.
How will the Interactions API change?
The schema renames outputs to steps, and output format configuration (response_format) also changes. The new schema becomes default on 20 May 2026, while the old schema is fully removed on 6 June 2026.
How much time do developers have to migrate?
From 20 May 2026 the new schema is the default, but the old one continues to work until 6 June 2026. That leaves 17 days, roughly two and a half weeks, of transition time for testing and adapting client implementations.