What can Copilot Vision do with attached images and PDFs?

Copilot can reason about the visual content of attached files together with code — analyzing design mockups, diagrams, error screenshots, or technical documents in the context of a code conversation.

What are browser tools in GitHub Copilot and what are they for?

Browser tools let Copilot agents control a real browser: navigating, clicking, typing, reading content, capturing console errors, and taking screenshots. Parallel agents maintain isolated sessions, separate from the user's own activity.

Do admins need to take any action to enable Vision functionality?

No. Vision is enabled by default on all plans at GA, including Business and Enterprise, with no admin configuration required. The previous requirement for the 'Editor Preview Features' policy no longer applies.

Copilot Vision and Browser Tools Now GA

GitHub has moved two Copilot capabilities to general availability (GA): Vision, for attaching images and PDFs to chat prompts, and browser tools, which give agents in VS Code control over a real browser. Both are available on all plans without admin action.

On July 1, 2026, GitHub moved two significant Copilot capabilities to general availability on the same day: Copilot Vision, which until now required special policy configuration on Business and Enterprise plans, and browser tools for VS Code, which for the first time give agents direct control over a real, live browser. Both capabilities are available to all users without admin action.

What Can Copilot Vision Do Now?

Copilot Vision enables attaching visual materials alongside chat prompts so that Copilot can reason about the content of images and documents together with code. Supported formats are JPEG, PNG, GIF, and WebP, as well as PDF documents.
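For scripted workflows that collect attachments before starting a chat, a quick pre-check against the format list above can catch unsupported files early. The following is a minimal sketch only: the helper name `is_supported_attachment` and the extension mapping are illustrative assumptions, not part of any Copilot API, and checking the extension only approximates the real file type.

```python
from pathlib import Path

# Extensions assumed to correspond to the formats named in the announcement
# (JPEG, PNG, GIF, WebP, PDF). The mapping itself is an illustrative assumption.
SUPPORTED_EXTENSIONS = {".jpg", ".jpeg", ".png", ".gif", ".webp", ".pdf"}


def is_supported_attachment(path: str) -> bool:
    """Return True if the file extension matches a supported format.

    Hypothetical helper: it inspects only the file name, not the file contents.
    """
    return Path(path).suffix.lower() in SUPPORTED_EXTENSIONS


if __name__ == "__main__":
    for candidate in ["mockup.PNG", "architecture.pdf", "trace.svg"]:
        print(candidate, is_supported_attachment(candidate))
```

The comparison is case-insensitive, so a file named `mockup.PNG` passes, while formats outside the list, such as SVG, are rejected.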

In VS Code, files can be attached in three ways: pasting, drag-and-drop, or right-clicking on a file. On github.com, attaching works directly in the chat interface, while Copilot CLI supports specifying file paths in the terminal.

Practical use cases include analyzing design mockups while discussing implementation, diagnosing error screenshots, reasoning about architectural diagrams, and processing technical documents in PDF format — all within a single conversation with Copilot, without switching between tools.

Vision is available in all modes of operation: ask, plan, and agent.

Availability: All Plans, No Admin Action

The key change in the GA announcement is default availability. Until now, users on Business and Enterprise plans needed the “Editor Preview Features” policy enabled to access Vision capabilities. As of July 1, 2026, that requirement no longer exists.

Vision is enabled by default on all plans — Free, Pro, Pro+, Business, and Enterprise — without any admin action. This removes the administrative barrier for organizations that had been delaying activation due to preview feature approval procedures.

The only special note for Business and Enterprise users: attached images and PDFs are retained for approximately 24 hours for service-delivery purposes.

Browser Tools: Browser Control from VS Code

Alongside the Vision GA, GitHub also moved browser tools in VS Code to general availability. The capability gives Copilot agents direct control over a real, live browser, not a simulated environment.

Through browser tools, agents can perform the following actions:

Navigation — opening URLs and moving through pages
Interaction — clicking, typing, hovering, drag and drop, managing dialogs
Reading — fetching page content and DOM element attributes
Diagnostics — capturing console errors and JavaScript exceptions
Screenshots — capturing the current state of a page

Parallel agents can simultaneously hold isolated browser sessions, independent of each other and separate from the user’s own browser activity.

Privacy and Granular Permissions

GitHub designed browser tools with user privacy as an explicit priority. Open tabs remain private: an agent cannot read them unless the user explicitly shares a tab via the “Share with Agent” option.

For sensitive permissions — camera, microphone, location access, and clipboard reading — the system requests explicit approval at each use, not once at installation or agent launch. This means an agent cannot access those resources without active user consent for each individual operation.
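The difference between per-use approval and a one-time grant can be illustrated with a small model. This is a conceptual sketch only: the `PermissionGate` class, its `request` method, the `ask_user` callback, and the permission strings are hypothetical names chosen for illustration, not VS Code or Copilot APIs.

```python
from typing import Callable

# Permissions the article lists as sensitive (string names are assumptions).
SENSITIVE = {"camera", "microphone", "location", "clipboard-read"}


class PermissionGate:
    """Hypothetical model of per-use approval for sensitive permissions."""

    def __init__(self, ask_user: Callable[[str], bool]) -> None:
        # ask_user stands in for an interactive prompt; it is an assumption.
        self._ask_user = ask_user
        self.prompts_shown = 0

    def request(self, permission: str) -> bool:
        """Return True if the operation may proceed."""
        if permission not in SENSITIVE:
            return True  # non-sensitive operations are not gated in this model
        # No cached grant: every single use triggers a fresh prompt.
        self.prompts_shown += 1
        return self._ask_user(permission)


if __name__ == "__main__":
    gate = PermissionGate(lambda permission: True)
    gate.request("clipboard-read")
    gate.request("clipboard-read")
    print(gate.prompts_shown)
```

Because the model never stores an earlier answer, two clipboard reads produce two prompts. A once-at-launch design would instead cache the first answer and skip the second prompt.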

Enterprise Administrator Controls

For Enterprise organizations, browser tools come with granular controls:

Toggle workbench.browser.enableChatTools to enable or disable browser tools at the organization level
Domain filters that restrict which domains agents may navigate to, preventing unauthorized access to external content

These controls allow organizations to use browser tools in a controlled environment — for example by restricting navigation to internal development servers or test environment domains — without fully disabling the capability.
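To illustrate how a domain filter of this kind might behave, here is a minimal allowlist check. The announcement does not describe the filter syntax or matching rules, so the `is_navigation_allowed` function, the subdomain-matching behavior, and the example domains are all assumptions.

```python
from urllib.parse import urlsplit


def is_navigation_allowed(url: str, allowed_domains: list[str]) -> bool:
    """Return True if the URL's host equals, or is a subdomain of, an allowed domain.

    Hypothetical sketch: the real filter's matching rules are not documented
    in the announcement.
    """
    host = (urlsplit(url).hostname or "").lower()
    if not host:
        return False  # unparseable or host-less input is rejected
    for domain in allowed_domains:
        domain = domain.lower().lstrip(".")
        # Match the exact host or a true subdomain, never a mere string suffix.
        if host == domain or host.endswith("." + domain):
            return True
    return False


if __name__ == "__main__":
    allowed = ["localhost", "staging.example.test"]  # example values only
    print(is_navigation_allowed("http://localhost:3000/login", allowed))
    print(is_navigation_allowed("https://example.com/", allowed))
```

Matching on a leading dot keeps a host such as `evilstaging.example.test` from slipping past an entry for `staging.example.test`, which a plain suffix comparison would allow.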

Two GA Releases in One Day

The simultaneous GA of Vision and browser tools is not coincidental. Both capabilities extend Copilot’s reach beyond text and code — Vision toward visual materials and documents, browser tools toward the actual state of a web application in development or production.

Together with the simultaneous arrival of Kimi K2.7 Code as the first open-weight model in Copilot and the announcement of the GitHub Models platform shutdown by July 30, 2026, this date becomes a significant milestone in GitHub’s AI strategy: fewer standalone platforms, more capabilities consolidated within a single tool available to everyone without additional configuration.
