What is Gemini 3.5 Flash?

Gemini 3.5 Flash is Google DeepMind's fastest frontier AI model, specifically optimized for agentic tasks, coding, and long-horizon reasoning. It produces 289 tokens per second and performs near the top on coding benchmarks.

What is the difference between Gemini 3.5 Flash and 2.5 Flash?

Compared with 2.5 Flash, 3.5 Flash offers a 10-20% improvement in low-reasoning coding performance, a multi-agent architecture (the Antigravity harness), better long-horizon task management, and richer web UI generation. Output speed rose from 219.7 to 289 tokens/s. Pricing is 5x higher for input and 3.6x higher for output.

How much does Gemini 3.5 Flash cost?

$1.50 per million input tokens and $9.00 per million output tokens. The 1 million token context window is included at no additional cost. This pricing is significantly lower than frontier Pro-tier models.

What tasks is Gemini 3.5 Flash best at?

It excels at autonomous software development, large codebase analysis, agentic tasks requiring multi-step tool use, data analysis, and workflow automation. Its multi-agent architecture provides unique capabilities for parallel task execution.

How do I use the Gemini 3.5 Flash API?

Gemini 3.5 Flash has been generally available since launch through Google AI Studio and the Gemini API, and Vertex AI integration is also available. It is compatible with all major AI development frameworks and can be used for text, code, or multimodal tasks through simple API calls.
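As a rough illustration of what a "simple API call" could look like, the sketch below builds (but does not send) a request for the Gemini API's REST `generateContent` method using only the Python standard library. The model identifier `gemini-3.5-flash` is an assumption made for this example and is not confirmed by this page; the helper names `build_generate_request` and `extract_text` are hypothetical.

```python
import json
import urllib.request

# Assumption: this model identifier is a guess; check the official model list.
MODEL_ID = "gemini-3.5-flash"
API_ROOT = "https://generativelanguage.googleapis.com/v1beta"


def build_generate_request(prompt: str, api_key: str, model: str = MODEL_ID) -> urllib.request.Request:
    """Build (but do not send) a generateContent request for the Gemini API."""
    url = f"{API_ROOT}/models/{model}:generateContent"
    body = {"contents": [{"parts": [{"text": prompt}]}]}
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "x-goog-api-key": api_key},
        method="POST",
    )


def extract_text(response_json: dict) -> str:
    """Join the text parts of the first candidate in a generateContent response."""
    parts = response_json["candidates"][0]["content"]["parts"]
    return "".join(part.get("text", "") for part in parts)
```

To actually send the request, pass the result to `urllib.request.urlopen`, decode the response body with `json.loads`, and hand the resulting dictionary to `extract_text`.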

Gemini 3.5 Flash

Proprietary

Rating: 4.7 / 5

Google DeepMind

Gemini 3.5 Flash is Google DeepMind's fastest frontier AI model, designed specifically for agentic tasks, coding, and long-horizon reasoning. Announced at Google I/O 2026, it delivers 289 tokens per second — four times faster than comparable frontier models — while scoring within two points of Anthropic's flagship on coding benchmarks at one-third the cost. The model achieves 76.2% on Terminal-Bench 2.1, 1656 Elo on GDPval-AA, 83.6% on MCP Atlas (tool use), and 84.2% on CharXiv Reasoning (multimodal). It introduces a multi-agent architecture with Antigravity harness for parallel subagent execution, making it exceptionally capable at complex, multi-step software engineering tasks. With a 1 million token context window, it handles entire codebases, long documents, and extended conversations without context degradation. Priced at $1.50 per million input tokens and $9.00 per million output tokens, it offers frontier-level intelligence at Flash-tier pricing. Available through Google AI Studio and the Gemini API with general availability from day one.

Key Highlights

Fastest Frontier Model

289 tokens per second output — 4x faster than other frontier models, ideal for agentic tasks.

Multi-Agent Architecture

Parallel subagent spawning and coordination with Antigravity harness — simultaneous coding, testing, and review.

Frontier Quality, Flash Pricing

Performance within 2 points of Anthropic's flagship at one-third the cost — 1M token context window included.

Superior Tool Use

83.6% on MCP Atlas — effective interaction with external tools, APIs, and development environments.

About

Gemini 3.5 Flash is Google DeepMind's most capable Flash-tier model, unveiled at Google I/O on May 19, 2026. Positioned as the fastest frontier AI model available, it targets the growing demand for AI agents that can autonomously execute complex, multi-step tasks in software engineering, data analysis, and business automation.

The model's headline metric is speed: 289 tokens per second output, making it approximately four times faster than other frontier models. This speed doesn't come at the expense of quality — Gemini 3.5 Flash scores within two points of Anthropic's flagship model on major coding benchmarks while costing roughly one-third the price. On Terminal-Bench 2.1, a rigorous software engineering benchmark, it achieves 76.2%. Its GDPval-AA Elo rating of 1656 places it among the top coding models globally.

A key innovation is the multi-agent architecture with the Antigravity harness system. This enables Gemini 3.5 Flash to spawn and coordinate parallel subagents for concurrent task execution. In practice, this means the model can simultaneously research documentation, write code, run tests, and review results — dramatically accelerating complex software development workflows. The architecture is purpose-built for agentic scenarios where the model must plan, execute, observe results, and iterate autonomously.
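The fan-out pattern described above can be sketched generically with `asyncio`. This is not the Antigravity harness, whose interface is not described on this page; `run_subagent` and `orchestrate` are hypothetical names, and the subagent body is a placeholder where a real system would call the model with a role-specific prompt.

```python
import asyncio


async def run_subagent(role: str, task: str) -> dict:
    """Hypothetical subagent: a real one would call the model here."""
    await asyncio.sleep(0)  # placeholder standing in for the model call
    return {"role": role, "task": task, "status": "done"}


async def orchestrate(task: str, roles: list[str]) -> list[dict]:
    """Run one subagent per role concurrently; results come back in role order."""
    return await asyncio.gather(*(run_subagent(role, task) for role in roles))


results = asyncio.run(
    orchestrate("add retry logic", ["research", "code", "test", "review"])
)
```

`asyncio.gather` preserves the order of its inputs, so the coordinator can match each result to the role that produced it even though the subagents run concurrently.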

The model excels in tool use scenarios, scoring 83.6% on MCP Atlas, which evaluates how effectively models interact with external tools and APIs. Combined with its 84.2% on CharXiv Reasoning for multimodal understanding, it can process technical diagrams, charts, and visual documentation alongside code — a critical capability for real-world engineering work.

With a 1 million token context window, Gemini 3.5 Flash can ingest entire codebases, lengthy technical specifications, and extended conversation histories without losing coherence. This is particularly valuable for code understanding tasks where the model needs to reason about the relationships between files spread across a large project.
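A quick way to judge whether a project might fit in the 1 million token window is to estimate its token count from its character count. The sketch below assumes roughly four characters per token, which is only a rule of thumb and not the model's actual tokenizer; the function names and the output reserve of 8,192 tokens are likewise illustrative assumptions.

```python
import math

CONTEXT_WINDOW_TOKENS = 1_000_000  # context window stated on this page
CHARS_PER_TOKEN = 4                # assumption: rough heuristic, not a real tokenizer


def estimate_tokens(text: str) -> int:
    """Estimate the token count of a string from its length in characters."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def fits_in_context(files: dict[str, str], reserve_for_output: int = 8192) -> bool:
    """Check whether all file contents plus an output reserve fit in the window."""
    total = sum(estimate_tokens(content) for content in files.values())
    return total + reserve_for_output <= CONTEXT_WINDOW_TOKENS
```

For a real budget check, replace the heuristic with a token count from the provider's own token-counting facility.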

Compared to its predecessor Gemini 2.5 Flash, the 3.5 version shows 10-20% improvement in low-reasoning coding performance, significantly better handling of long-horizon tasks, and enhanced multi-agent coordination. It also produces richer, more interactive web UIs and graphics when given generation tasks.

Pricing follows Google's Flash-tier strategy: $1.50 per million input tokens and $9.00 per million output tokens. That is roughly 5x the input price and 3.6x the output price of Gemini 2.5 Flash, yet still significantly lower than frontier Pro-tier models. The 1M token context window is included at no additional premium. This pricing makes it economically viable for always-on agent scenarios that would be prohibitively expensive with other frontier models.
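The per-request cost follows directly from the two list prices. The sketch below computes it, and also works backwards from the 5x and 3.6x multipliers that this page quotes against Gemini 2.5 Flash to the predecessor prices they imply. The function name `request_cost_usd` is illustrative, and the implied prices are derived figures, not quoted ones.

```python
INPUT_USD_PER_MILLION = 1.50   # list price stated on this page
OUTPUT_USD_PER_MILLION = 9.00  # list price stated on this page


def request_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Cost of one request in US dollars at the list prices above."""
    return (
        input_tokens / 1_000_000 * INPUT_USD_PER_MILLION
        + output_tokens / 1_000_000 * OUTPUT_USD_PER_MILLION
    )


# Predecessor prices implied by the quoted multipliers (derived, not quoted).
implied_prev_input = INPUT_USD_PER_MILLION / 5.0
implied_prev_output = OUTPUT_USD_PER_MILLION / 3.6
```

For example, a request with 200,000 input tokens and 5,000 output tokens costs 0.30 + 0.045 = about $0.35, so input dominates the bill for long-context workloads despite the higher output rate.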

Gemini 3.5 Flash is generally available from launch through Google AI Studio and the Gemini API, with integration support for all major AI development frameworks. The 3.5 Pro variant is in internal testing with public release planned for the following month.

Use Cases

Autonomous Software Development

Autonomously executing code writing, test generation, debugging, and code review with parallel subagents.

Large Codebase Analysis

Understanding entire projects with 1M token context, dependency analysis, and generating refactoring recommendations.

Data Analysis and Reporting

Processing large datasets, performing statistical analysis, and generating interactive visualizations.

Workflow Automation

Automating API integrations, data transformations, and multi-step business processes through tool use.

Pros & Cons

Pros

Fastest frontier model at 289 tokens/s — critical speed advantage for agent workloads
Delivers frontier quality at Flash pricing — one-third the cost of competitors
Multi-agent architecture with parallel task execution and coordination
1M token context window for comprehending entire codebases
Superior tool use capability with 83.6% on MCP Atlas

Cons

Input pricing is 5x higher and output pricing 3.6x higher than Gemini 2.5 Flash
Pro variant not yet publicly available — full potential pending
Closed source — weights not downloadable, local execution not possible
Very new — community ecosystem and integration tools still developing

Technical Details

Parameters

undisclosed

Architecture

Gemini Transformer

Training Data

proprietary

License

Proprietary

Features

289 tokens/s Output Speed
1M Token Context Window
Multi-Agent Architecture (Antigravity)
Advanced Tool Use (MCP)
Code Generation and Review
Multimodal Understanding
Long-Horizon Task Execution
Interactive Web UI Generation
Parallel Subagent Coordination

Benchmark Results

Metric	Value	Compared To	Source
Terminal-Bench 2.1	76.2%	Gemini 2.5 Flash: ~65%	Google DeepMind
GDPval-AA (Elo)	1656	—	Google DeepMind
MCP Atlas (Tool Use)	83.6%	—	Google DeepMind
CharXiv Reasoning	84.2%	—	Google DeepMind
Output Speed	289 tokens/s	Gemini 2.5 Flash: 219.7 t/s	Google DeepMind
Context Window	1M tokens	—	Google DeepMind
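
To put the output-speed row in concrete terms, the short calculation below converts the two throughput figures from the table into a relative speedup and a wall-clock generation time. It assumes throughput stays constant over the whole response and ignores time to first token, which the table does not report.

```python
SPEED_35_FLASH = 289.0  # tokens/s, from the table above
SPEED_25_FLASH = 219.7  # tokens/s, from the table above


def generation_seconds(output_tokens: int, tokens_per_second: float) -> float:
    """Seconds needed to stream a response at a constant output speed."""
    return output_tokens / tokens_per_second


# Relative speedup of 3.5 Flash over 2.5 Flash (about 1.32x).
speedup = SPEED_35_FLASH / SPEED_25_FLASH

# Time saved on a 10,000-token response (about 11 seconds).
saved = generation_seconds(10_000, SPEED_25_FLASH) - generation_seconds(10_000, SPEED_35_FLASH)
```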

Available Platforms

Google AI Studio

Gemini API

Vertex AI

News & References

Gemini 3.5 Flash announced at Google I/O 2026 — fastest agentic model

Google DeepMind · 2026-05-19

Gemini 3.5 Flash scores within two points of Anthropic's flagship

R&D World · 2026-05-19

Google unveils Gemini 3.5 and AI agent Gemini Spark

CNBC · 2026-05-19

Quick Info

Parameters: undisclosed

Type: transformer

License: Proprietary

Released: 2026-05

Architecture: Gemini Transformer

Rating: 4.7 / 5

Creator: Google DeepMind

Links

Official Website deepmind.google en.wikipedia.org