Google unveils Gemini 3 Flash: emphasizing low cost and high performance, with inference up to 3 times faster than Gemini 2.5 Pro.


Just one month after launching Gemini 3, Google has followed up with Gemini 3 Flash, the newest member of the Gemini 3 model family, one that emphasizes "performance and application deployment." The official positioning is clear: the model significantly improves speed and reduces cost without sacrificing reasoning quality, aiming to become the most efficient general-purpose model for everyday tasks and agentic workflows.

Pro-grade reasoning capabilities, with speed and cost optimized simultaneously

Google points out that the biggest feature of Gemini 3 Flash is that it brings the reasoning power of Gemini 3 Pro into the high-efficiency architecture the Flash series has always been known for.

In multiple advanced benchmarks, Gemini 3 Flash achieves PhD-level reasoning and multimodal understanding comparable to larger models, and significantly surpasses the earlier Gemini 2.5 Pro on several metrics.

Deep thinking for complex tasks, more resource-efficient for daily tasks

In practice, Gemini 3 Flash is positioned as a model with adjustable "thinking time": when it faces a difficult reasoning problem, it spends more time thinking before answering.

When handling ordinary daily tasks, it uses on average about 30% fewer tokens than 2.5 Pro, while still maintaining higher performance and accuracy.
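As a rough illustration of what this adjustable thinking looks like at the API level, here is a minimal sketch using the Google Gen AI Python SDK (google-genai). The thinking_budget field in ThinkingConfig is documented for 2.5-series models; whether Gemini 3 Flash exposes the same knob, and the model id gemini-3-flash-preview, are assumptions to verify in Google AI Studio.

```python
# Minimal sketch, not official sample code. Assumes: pip install google-genai,
# an API key from Google AI Studio, and that Gemini 3 Flash accepts the same
# thinking_budget control documented for 2.5-series models (an assumption).
from google import genai
from google.genai import types

client = genai.Client(api_key="YOUR_API_KEY")

response = client.models.generate_content(
    model="gemini-3-flash-preview",  # assumed preview id; verify in AI Studio
    contents="Plan the steps to migrate a Flask app to FastAPI.",
    config=types.GenerateContentConfig(
        # Cap how many tokens the model may spend "thinking" before answering;
        # a larger budget trades latency and cost for deeper reasoning.
        thinking_config=types.ThinkingConfig(thinking_budget=1024),
    ),
)
print(response.text)
```

For simple requests a small budget keeps latency and token usage low, which is exactly the trade-off Google is describing for everyday tasks.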

Inference speed 3 times faster, price significantly reduced

In terms of raw processing speed, Gemini 3 Flash continues the Flash series' strengths. According to Artificial Analysis benchmarks, its inference is 3 times faster than Gemini 2.5 Pro, at a fraction of the cost (a rough per-request cost estimate is sketched after the list). The official pricing is:

  • Input: $0.50 per million tokens.

  • Output: $3 per million tokens.

  • Audio input: $1 per million tokens.
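To make the pricing concrete, here is a small back-of-the-envelope calculator using only the list prices quoted above; it ignores context caching, batch discounts, and any long-context tiers, so treat it as an estimate rather than a billing formula.

```python
# Rough per-request cost estimate at the quoted preview list prices.
PRICE_INPUT_PER_M = 0.50   # USD per 1M input text tokens
PRICE_OUTPUT_PER_M = 3.00  # USD per 1M output tokens
PRICE_AUDIO_PER_M = 1.00   # USD per 1M audio input tokens

def estimate_cost(input_tokens: int, output_tokens: int, audio_tokens: int = 0) -> float:
    """Return the estimated USD cost for one request."""
    return (
        input_tokens / 1_000_000 * PRICE_INPUT_PER_M
        + output_tokens / 1_000_000 * PRICE_OUTPUT_PER_M
        + audio_tokens / 1_000_000 * PRICE_AUDIO_PER_M
    )

# Example: a 20k-token prompt with a 2k-token answer.
print(f"${estimate_cost(20_000, 2_000):.4f}")  # -> $0.0160
```

At these rates, a fairly large 20k-token prompt with a 2k-token answer comes to under two US cents, which is the economics Google is pointing to for high-frequency workloads.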

Google says this combination of performance and price makes Gemini 3 Flash particularly suitable for large-scale, high-frequency practical applications.

Two key applications of Gemini 3 Flash

Key Point 1: Agentic and High-Frequency Iterative Development

The primary application of Gemini 3 Flash is focused on agentic workflows and iterative development.

On SWE-bench Verified, which evaluates coding ability, Gemini 3 Flash scored 78%, outperforming not only the 2.5 series but also Gemini 3 Pro. Google notes that this makes it particularly suitable for:

  • Agentic coding.

  • Production-level system maintenance.

  • Interactive applications that require rapid responses.

Gemini 3 Flash is already available on the Google Antigravity platform, where it can rapidly execute and update real-world applications.
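As a sketch of what an agentic coding workflow could look like, the example below uses the Gemini API's automatic function calling in the google-genai Python SDK: the model decides when to call a locally defined tool and the SDK executes it and feeds the result back. The run_tests tool and the model id are illustrative assumptions, not part of Google's announcement.

```python
# Minimal agentic sketch, assuming the google-genai SDK's automatic function
# calling; run_tests is a hypothetical local tool, not a real API.
from google import genai
from google.genai import types

def run_tests(module: str) -> str:
    """Run the unit tests for a module and return a short summary."""
    # Placeholder: a real agent would shell out to pytest (or similar) here.
    return f"2 of 14 tests failing in {module}: test_auth, test_session"

client = genai.Client(api_key="YOUR_API_KEY")

response = client.models.generate_content(
    model="gemini-3-flash-preview",  # assumed preview id
    contents="The login flow is broken. Check the auth module and propose a fix.",
    config=types.GenerateContentConfig(
        # The SDK inspects the function signature and docstring, lets the model
        # request a call, executes it locally, and returns the result to the model.
        tools=[run_tests],
    ),
)
print(response.text)
```

Fast, cheap inference matters here because an agent like this may loop through many such tool calls per task.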

Key Point 2: Balancing Multimodal Reasoning and Rapid Analysis

In terms of multimodal capabilities, Gemini 3 Flash is positioned as a model that delivers both speed and deep reasoning. The official specifications state that it is particularly suitable for:

  • Analysis of complex video content.

  • Data extraction and structuring.

  • Visual question answering and cross-modal understanding.

These capabilities can support in-game smart assistants, A/B testing systems, and application scenarios that require real-time responses and in-depth analysis.
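For the visual question answering case, a request might look like the sketch below: an image frame is passed inline alongside a text prompt. Part.from_bytes is part of the google-genai SDK; the model id remains an assumption, and longer videos would typically go through the Files API rather than inline bytes.

```python
# Minimal multimodal sketch: visual question answering over a single frame.
from google import genai
from google.genai import types

client = genai.Client(api_key="YOUR_API_KEY")

with open("frame.jpg", "rb") as f:  # any local image frame
    image_bytes = f.read()

response = client.models.generate_content(
    model="gemini-3-flash-preview",  # assumed preview id
    contents=[
        types.Part.from_bytes(data=image_bytes, mime_type="image/jpeg"),
        "Extract every visible UI label in this frame as a JSON list.",
    ],
)
print(response.text)
```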

From enterprise practice to daily use, speed and efficiency are upgraded simultaneously

Google stated that feedback from enterprises on Gemini 3 Flash has been quite positive. Companies such as JetBrains, Bridgewater Associates, and Figma have begun deploying it in their actual business processes, generally reporting significant improvements in inference speed and computational efficiency, with overall reasoning quality approaching that of larger models.

On the other hand, in consumer applications, Gemini 3 Flash has become the default model for Gemini Apps, officially replacing 2.5 Flash, and all users can use Gemini 3-level capabilities for free. Google points out that with its multimodal reasoning capabilities, users can more quickly understand the content of images and videos, organize information into actionable plans, and even quickly produce working application prototypes through voice alone, without any programming background.

Gemini 3 Flash is currently available in preview through the Gemini API in Google AI Studio and in Google Antigravity, and is being gradually rolled out to the Gemini app and AI Mode in Search.

(Google officially launches Gemini 3: its most powerful large language model to date for AI agents and vibe coding)

This article, titled "Google Launches New Gemini 3 Flash: Focusing on Low Cost and High Performance, Inference 3 Times Faster Than Gemini 2.5 Pro," first appeared on ABMedia.
