Claude is down, but Gemini 3.1 Flash-Lite is rolling out right now

If you tried using Claude today and ran into errors, you’re not alone. Claude is still having issues after outage on March 2, 2026.

The outage hit at the worst possible time for developers and teams who rely on it for production workloads, internal tools, and content pipelines.

The timing is awkward for Anthropic, but it’s interesting for Google.

On the same day Claude is struggling with availability, Google is rolling out Gemini 3.1 Flash-Lite in preview to developers via the Gemini API in Google AI Studio and to enterprises through Vertex AI.

Gemini 3.1 Flash-Lite is positioned as Google’s fastest and most cost-efficient Gemini 3 model so far. It’s built specifically for high-volume workloads where scale and latency matter more than flashy demos. Pricing starts at $0.25 per million input tokens and $1.50 per million output tokens, which undercuts larger models and makes it viable for tasks that run thousands or millions of times per day.

Google says 3.1 Flash-Lite delivers a 2.5x faster Time to First Answer Token compared to Gemini 2.5 Flash and a 45 percent increase in output speed, based on Artificial Analysis benchmarks.

If you’re running high-frequency workflows like real-time chat, content moderation, or translation at scale, speed is not a luxury, it’s infrastructure.

The model also posts an Elo score of 1432 on the Arena.ai leaderboard and scores 86.9 percent on GPQA Diamond and 76.8 percent on MMMU Pro, placing it competitively against other models in its tier.

Google highlights that it even surpasses some larger previous-generation Gemini models in reasoning and multimodal understanding benchmarks.

There’s also a control layer aimed directly at developers. Gemini 3.1 Flash-Lite includes configurable “thinking levels” in AI Studio and Vertex AI, allowing teams to tune how much reasoning the model applies per task. That matters when you’re balancing cost and performance across thousands of requests per minute.

You can keep things lean for bulk translation or moderation, then dial up reasoning for UI generation, simulations, or more complex instruction-following tasks.

Related Articles

OpenAI GPT-5.3 Instant released, when will you get it, and benchmarks

Apple March Event Live: M5 Pro, M5 Mac, MacBook Air, MacBook Pro, and new Studio Displays

Is the Mac mini M5 launching today? Here’s everything you need to know

Leave a Comment Cancel reply