AI & Machine Learning

The Blueprint: How Movix fills a gap in dental skills with specialized agentic AI

Fri, 22 May 2026 16:00:00 +0000

Welcome to The Blueprint, a regular feature where we highlight how Google Cloud customers are tackling unique and common challenges across industries using the latest AI and cloud technologies. We hope to inspire others looking to innovate in their work.

The demand for dental appliances, like crowns and aligners, is booming, but it’s hard for manufacturers to keep up. At Movix, we’re building one of the first agentic AI solutions for dental appliance manufacturers and dental labs to help companies in the sector acquire digital technical expertise so they can scale clinical workflows cost-effectively and consistently.

The challenge:

Movix started in 2025 with a mission to solve a serious shortage of skilled dental technicians in aligner manufacturing through AI and agentic workflows. The need is significant: the global dental market is valued at nearly $400 billion and growing at double digits, yet many operations remain analog - creating enormous demand for co-pilot, agentic solutions.

Before founding Movix, we had previously started a vertically integrated dental aligner company that focused on very difficult dental situations, such as very crooked teeth. Yet even with highly skilled and trained technicians, there were often mistakes that would require remaking an aligner — a process that costs $300, roughly 25% of the retail price. Poor quality control took a real bite out of the company’s margins.

We saw an opportunity with Movix to address these mistakes by providing technicians with AI-powered quality control agents that automate aligner workflows and reduce errors. To achieve this, we needed to solve for a few key technical challenges:

Develop a custom AI model and end-to-end agentic workflow, since off-the-shelf solutions lacked domain expertise,
Ensure scalability would be built into the platform to prevent outages or production delays,
Achieve broad interoperability through a complex hybrid integration strategy since many dental practices are slow to adopt new technology and run on legacy systems.
Optimize security and compliance to comply with medical record regulatory requirements and keep patient data safe.

The solution:

In order to deliver AI agents that can provide expert-level accuracy, we needed to custom build a lot of the tooling ourselves. We started by developing our custom models for deep learning, computer vision, and 3D mesh analysis over a five-month period, using Google Cloud infrastructure. This intensive, methodical time helped ensure the right level of accuracy and quality control.

We use Google Cloud infrastructure across the full pipeline — from dataset storage and model training to evaluation — to build and refine our defect detection models for intraoral scans. Once defects are detected, we use Gemini Enterprise Agent Platform to generate client-facing feedback that reads as if it came directly from a human technician — acting as a digital team member in the quality control workflow.

Our 3D models use Cloud Run with L4 GPUs for the massive compute power we require; notably, performing the 3D segment scans and detecting defects across the entire fabrication process are highly compute-intensive processes. We use Compute Engine VMs to run experiments, along with various other GPUs to train our models, and perform the heavy lifting of model development in this environment.

Cloud Run and other tools like Cloud Storage support our scalability goals as we target large customers who handle high case volumes — some large labs might produce up to 200,000 appliances per year. Google Cloud's global network of data centers also simplifies regulatory compliance across regions and ensures fast delivery of large 3D datasets to clients worldwide.

The architecture:

The outcome:

Our agentic solutions automate data entry and quality control, which are traditionally manual, time-consuming, and error prone tasks. By automating the work of the best dental technicians, we’re ensuring a top quality product that will improve the fit of crowns, aligners, veneers, and implants for many, many patients. We estimate that our automation and the higher level of accuracy our QC agent delivers could save an aligner manufacturer $300 per remake, for example.

We also believe we’re helping to speed the appliance manufacturing process, leading to quicker turnaround times for dental appliances, which helps dental labs receive revenue faster and improve their cash flow. And we already know we’re meeting a critical need: After we launched the QC agent in October 2025, our first customer signed with us in December. That customer, Orthero, an aligner company serving more than 20 countries, has enjoyed significant results.

“Orthero benefits from this automation by making quality control faster, more consistent, and scalable,” Efer Turhan, a co-founder of Orthero, said. “With support from Movix’s QC AI Agent, we detect missing or inconsistent inputs early and flag unusual deviations before they cause delays.”

The details:

Even with the advantages of AI, our goals demand some serious work. Our architecture supports a solution that’s agentic and modular, integrates into existing on-premises dental systems, and ensures security and compliance.

Our agentic approach allows our system to run checks and balances, manage the complex, multi-step process of quality control for dental scans, and eliminate human errors that occur in data handling and quality review. Our goal is to develop five distinct AI agents by 2029 that cover the entire dental appliance workflow, from original patient dental scan to appliance manufacturing. While our first agents focus on data entry and dental scan quality control, our next agents will handle 3D file repair, clinical review, treatment planning, and manufacturing.

Our solution architecture also enables our system to integrate seamlessly with our customers’ existing lab management and manufacturing systems through API integrations. Because we are selling our solution into a conservative market, we decided to bear the burden of responsibility for successful adoption by doing as much of the integration work as possible.

Because we operate in the highly regulated healthcare industry, we built an environment that strictly follows compliance rules, anonymizing protected health information, or PHI, before it enters our machine learning pipeline to prevent health information from being exposed to the processing environment.

We plan to build hybrid solutions to capture a wider market as we move forward. We're designing an architecture that connects our cloud-based AI agents with older, on-premises software that many conservative labs still use — through lightweight local connectors and standardized APIs. This will allow us to access a large market segment that has not yet migrated to the cloud or begun to use new digital dental technologies.

Taken together, we are not just solving a skills gap, we are reimagining what is possible with co-pilot and agentic solutions across the entire dental industry.

How Glance turns hours of video into mobile-ready clips with AI

Thu, 21 May 2026 17:00:00 +0000

Every day, thousands of hours of new video content sits waiting to be discovered. Most of it lives in long-form, horizontal formats, while audiences are scrolling through vertical feeds on their phones.

Glance, a mobile-first content platform, knows this challenge well. The company processes 1-2 hour videos from sources like podcasts, news reports, movies, and web series, and transforms them into 30 to 180-second vertical clips optimized for mobile lock screens. With daily volume projected to grow from 3,500 to over 10,000 videos per day, manual editing wasn’t a realistic path forward.

The solution also needed to go beyond simple cropping. It required the intelligence to identify and center the primary speaker, or dynamically split the screen to stack speakers vertically during conversations, preserving the context that makes content worth watching.

Here’s how Glance’s video generation solution works.

Building for the lock screen era

The goal was to create a complete pipeline that takes a long-form landscape video (16:9) and outputs multiple ready-to-publish short-form portrait videos (9:16). The solution needed to handle:

Key Moment Identification: Finding the most engaging 60-second segments within hours of long-form footage
Active Speaker Detection: Identifying who’s talking in each frame and positioning them at the top of a split screen. This includes distinguishing between a static image and a live person to ensure the crop focuses on the actual speaker.
Split Screen Detection: Recognizing interview layouts (common in news broadcasts) and stacking the frames vertically to preserve conversation context
Intelligent Reframing: Converting a multi-speaker, wide-screen shot into a focused, vertical frame without losing context
Dynamic Caption Highlighting: Generating word-level timestamps for "Karaoke-style" captions that increase engagement on silent-by-default mobile screens
Automated Branding: Applying masks, logos, and overlays programmatically to maintain brand consistency across all videos

The final technical solution uses Google Cloud Speech-to-Text v2, Gemini, and the Google Vision API, combined with custom video manipulation using Samurai (an open-source object tracking tool), OpenCV and MoviePy.

Architecture overview

The pipeline is divided into three distinct modules.

Fig. 2: High-level architecture

Module 1: Video clipping

This module converts long videos to transcripts, identifies key segments, and clips the video. Accuracy matters here: precise word-level timestamps ensure clips start and end exactly where they should.

Fig. 3: Video Clipping Workflow

The process involves audio extraction, speech-to-text transcription, and timestamp identification using generative AI. The module performs the following key functions:

Audio extraction: Extracting the audio from the original video file.
Speech-to-text transcription: Converting audio into text with precise timestamps for each word
Segment identification: Using Gemini 2.5 Flash (aka Nano Banana) to analyze transcripts text and identify optimal start and end timestamps for short video clips
Video clipping: Clipping the video into short segments based on the identified timestamps
Transcript validation: Using Gemini to verify phrases and words are accurately captured (this step does not validate word timing)

The output is a set of short video clips, each paired with its time-aligned transcript, ready for the next stage: the Intelligent Reframing Engine.

Module 2: Intelligent Reframing Engine

The core technical work here is converting a horizontal 16:9 frame into a compelling 9:16 vertical frame. A simple center crop often cuts out key speakers or action, so our solution uses a multi-stage scene analysis pipeline.

Fig. 4: Intelligent reframing engine

Active speaker detection

To know what to crop, we first need to know who’s talking. This happens on a frame-by-frame basis using the face detection capabilities of the Google Cloud Vision API.

Fig. 5: Active speaker detection

The liveness check: Differentiating a live speaker from a static image (like a photo on the wall or a graphic) is essential. This was achieved by tracking facial landmarks:

Mouth movement: Calculating the normalized distance between upper and lower lip landmarks
Head movement: Tracking changes in head pose angles (pan, roll, tilt)
A face must show consistent animation in these cues to be classified as a "live" participant

Quantifying engagement: Once confirmed as live, we calculate an activity score based on:

Mouth openness
Emotional fluctuation (changes in joy, surprise, etc., provided by Vision API)

Primary speaker identification: The final decision uses a liveness ratio:animated frames divided by total frames where the face appears. The person with the ratio closest to 1.0 (meaning they were consistently animated on screen) is designated as the primary speaker.

One edge case addressed during the development was a static background image appearing behind a live news anchor (as shown in Fig. 6). The liveness check handles this correctly because the static image shows no facial animation.

Fig. 6: Scenario with one active speaker and one static background image

Split-screen detection

This step addresses interview scenarios where two subjects appear on opposite sides of the landscape frame. The system detects split-screen layouts and stacks the two halves vertically to maintain conversation context.

Fig. 7: Video reformatting

With active speaker detection complete, the system uses the primary speaker's location to identify split-screen segments. The goal is to find the precise dividing line between panels, enabling the video to be reformatted into a vertical, top-and-bottom layout. Two complementary approaches accomplish this:

Approach 1: Continuous face tracking with Samurai

This method uses Samurai, an open-source object tracking tool, to follow the primary speaker continuously. The trajectory is analyzed for split-screen layouts based on:

Consistent off-center positioning: The speaker remains on one side of the screen (e.g., left or right half), indicating a split panel rather than free movement across the frame.
Vertical dividing line detection: Image analysis identifies a persistent vertical line separating the two panels.
Background discontinuity analysis: Differences in color, texture, and scenery between the speaker’s background and the opposite side confirm two separate video feeds.

Fig. 8: Background discontinuity analysis

Approach 2: Frame-by-frame detection with Google Cloud Vision API

This approach uses Cloud Vision API's face detection to identify split-screen layouts based on the primary speaker's face location:

Off-center face: Consistent face detection in one region (such as the left 40% of the frame) flags a potential split screen.
Proximate dividing line: Vertical lines between the face and the screen center confirm a panel boundary.
Contrasting backgrounds: Inconsistent backgrounds between the speaker's side and the far side confirm the split-screen layout.

The output: Vertical stacking

Once the system recognizes a split-screen, it performs a digital cut-and-paste. This preserves both speakers and their reactions in a mobile-native format.

Automated reformatting

With the scene analysis complete, the OpenCV2-based solution intelligently applies the appropriate reframing rule to each segment:

Single speaker crop: For scenes with one primary speaker, the system anchors the 9:16 frame to the speaker’s face, keeping them centered.
Split screen: When a split is detected, the system slices the frame along the dividing line and stacks the panels vertically (left panel on top, right panel on bottom).
Multi-speaker crop: For scenes with multiple people (not a formal split), the system focuses the crop on the most prominent speaker or the face closest to the center.
Fallback: If no faces are detected (e.g., graphics or wide shots), the system applies a center crop or horizontal padding (letterboxing).

Two final techniques ensure a polished look:

Short scene merging: Segments shorter than a defined threshold merge with the preceding or following scene, eliminating flicker.
Camera smoothing: When focus shifts between speakers, a virtual camera effect creates a slow pan from one position to the next, rather than an abrupt cut.

Module 3: Finishing and branding

The final stage ensures the clips are ready for immediate publication, focusing on viewer engagement and brand reinforcement.

Dynamic caption highlighting

Using the word-level timestamps from the speech-to-text module, the system overlays highlighted captions with MoviePy. This involves:

Fig. 9: Dynamic caption highlighting

Sentence reconstruction: Grouping individual words into readable lines that adhere to character limits
Highlighting: The currently spoken word is highlighted in a distinct color (mustard yellow) against a black background, a proven method for increasing engagement when videos play without sound.

Masking and logo placement

Two overlay techniques maintain consistent branding across all videos:

Mask placement: A PNG mask with an alpha channel resizes the video to fit precisely into the transparent area. The mask's opaque regions (such as colored bars) serve as a dedicated background for captions and persistent graphics.
Logo overlay: The brand logo is placed onto the video based on configurable parameters for position (top-right, bottom-left, and so on), size, and margin.

Fig. 10: Mask and logo placement

Conclusion

Glance’s video pipeline demonstrates what becomes possible when AI handles the repetitive, judgement-intensive work of video editing. By combining speech-to-text transcription, computer vision, and generative AI, the system transforms thousands of long-form videos into mobile-ready clips each day, preserving narrative context while optimizing for vertical viewing.

The approach offers a template for any organization sitting on long-form video archives. Rather than choosing between scale and quality, automated pipelines can deliver both.

If you’re exploring similar video processing, content transformation, or media AI projects, the Google Cloud consulting team is eager to connect and explore the possibilities. For more on the AI products used in solutions Glance’s this, visit our AI & ML Products page.

_{This solution was a collaborative effort between Glance ( Pradeep Tiwari , Himanshu Aggarwal) and Google Cloud Consulting (Sharmila Devi, Jinyeong Yim, Rohit Sroch, Neeraj Shivhare and Kinjal Singh).}

The top announcements for startups from Google I/O ‘26

Thu, 21 May 2026 16:00:00 +0000

Many of the world’s fastest-growing AI startups are choosing to build their future — and the world’s — on Google Cloud because of our complete and open AI stack. We embed AI into every layer of our architecture so your team is fully equipped to build and scale for the agentic era.

At Google Cloud Next ‘26, we focused on providing updates to each layer of the stack to help lean teams move faster and operate more efficiently. We introduced a unified Agent Platform that moves beyond isolated AI tools to a complete lifecycle platform and a new generation of TPUs optimized for both training and inference. This was complemented by the world’s first Agentic Data Cloud, a system designed to take on the shift from human scale to agent scale and updates to our AI-powered cybersecurity platform, now combining Google’s Threat Intelligence and Security Operations with Wiz’s Cloud and AI Security Platform to prevent, detect, and respond to threats in the agentic era.

A diverse set of global startups are already putting this stack to work. For example, Photoroom relies on our highly optimized compute to process over 1,000 images per minute. In the enterprise customer engagement space, Satisfi Labs leverages Gemini, BigQuery, and AlloyDB to deliver intelligent interactions and real-time insights to 800 sports, entertainment, and tourism clients. Meanwhile, global collaboration platform Notion is at the forefront of integrating artificial intelligence into productivity tools, and Mantis AI operates high-throughput inference pipelines to deliver video-native intelligence to major broadcasters.

Now, at Google I/O ‘26, we are expanding on that foundation by introducing more cost-effective frontier models and bringing these agentic capabilities directly into your development workflow. We are bridging the gap between local prototyping and cloud scale, giving your team a straightforward path to build, test, and distribute your applications.

Let’s look at the key announcements for startups from Google I/O and how you can apply them to your business.

1. Smarter, faster models

Models continue to be the foundation of what many startups are building with Google Cloud, so we’re excited to have evolved them once again. This new generation of models deliver maximum intelligence with incredible efficiency.

Gemini 3.5 Flash delivers intelligence that rivals large flagship models at Flash speed. It’s our strongest agentic and coding model yet, ideal for tackling long-horizon agentic tasks, often at less than half the cost of comparable models.
Gemini 3.5 Pro (pre-announcement): We announced our flagship reasoning model officially rolls out next month.
Gemini Omni is a groundbreaking new model that produces dynamic video content by blending text, audio, image, and video inputs. Gemini Omni delivers a highly intuitive approach to video creation and editing — whether you are developing interactive virtual try-ons for e-commerce, streamlining complex post-production workflows, or generating tailored video narratives, Gemini Omni unlocks new ways to create content and drive deeper customer engagement.

What it means for startups: You now have direct access to build with new state-of-the-art models from DeepMind. We are continuing to push the boundaries of what AI can do, ensuring that as we pioneer new frontiers, we simultaneously deliver the speed and cost-efficiencies your startup needs to scale.

Landing in the top-right quadrant of the Artificial Analysis index, 3.5 Flash delivers frontier-level intelligence at exceptional speed — proving you no longer have to trade quality for latency.

2. AI that works for you

At Next ‘26, we introduced purpose-built agents to handle the heavy lifting at every layer of the stack — from threat-hunting SecOps agents and modular data agents, to self-driving cloud infrastructure that autonomously fixes misconfigurations.

But that massive, specialized agentic power needs a coordinator. Enter Google Antigravity — the ultimate control plane. These new updates transform how applications are built, deployed, and managed.

Antigravity 2.0: A dedicated, standalone desktop application for Mac, Windows, and Linux. This acts as an "agent-first" workspace to build, test, and orchestrate complex AI workflows without being tied down to a traditional code editor (IDE).
Antigravity CLI & SDK: For developers who want to stay at the keyboard, the blazing-fast and stack-agnostic command-line interface lets you run and monitor agents seamlessly. Meanwhile, the Python SDK opens up Google's internal agent infrastructure, allowing you to code and control stable agent loops programmatically.
Dynamic subagents: This allows a primary AI agent to automatically spawn smaller, specialized child agents to handle focused subtasks. It unlocks massive parallel engineering output — your main local agent can delegate a database query to a Cloud Data Agent, for example, or spin up a local code-review subagent, all without cluttering its main memory space.
Scheduled tasks: Remove manual maintenance chores by setting background timers and cron schedules. Startups can instruct Antigravity to trigger your cloud-based observability agents or run local repository sanity checks every night at midnight — completely autonomously.
Enterprise-grade security: Antigravity connects local desktop and terminal agent loops directly to your private Google Cloud projects. By inheriting Google Cloud’s standard data privacy protections and Terms of Service, this ensures your customer data is in your control and agent activity runs within your secure cloud boundary by default.

What it means for startups: You no longer just have a coding assistant; you have an entire fleet of specialized AI engineers at your fingertips. By combining the purpose-built cloud agents launched at Next with the Antigravity control plane, a lean team can orchestrate data pipelines, manage security, and execute massive parallel coding tasks — all from a single, secure environment.

Antigravity 2.0 can deploy simultaneous, agent-driven execution. This example shows automated code generation for your website, creation of on-brand assets, and personalized customer email development. Sequences shortened throughout

3. More ways to accelerate development

We are streamlining the developer workflow to help you move from a prompt to a production-ready application with far less friction. This year’s updates focus on removing the infrastructure setup that traditionally slows teams down.

Native Android support in AI Studio: You can now go directly from a natural language prompt to a fully native Android app within the browser. This includes support for the Google Play Console, allowing developers to publish apps directly to the test track without managing local SDK environments.
The seamless handoff: Through a new integration, you can export entire projects from Google AI Studio directly to your local Antigravity environment with a single click. This transfers your complete codebase, files, and conversation context so you can transition from web prototyping to local development without losing your place.
Managed agents: For lean startup teams, building a production-grade agent shouldn't require managing complex infrastructure. Available across both the Gemini API and Google Cloud's Agent Platform, the new Managed Agents API acts as an agent-as-a-service so you can "manage the mission, not the machine." Simply define your instructions and tools, and a single API call will spin up your agent within a secure, ephemeral Google Cloud sandbox. This allows your team to offload the heavy lifting of backend maintenance and focus entirely on building great agentic experiences

What it means for startups: We are providing a straightforward path from prototype to production. You can quickly test app concepts in the browser, move them to a local workspace for deep orchestration, and deploy user-facing agents using managed cloud infrastructure — saving your engineering team weeks of setup and maintenance.

4. A boost for your personal productivity

While your engineers are accelerating product development, we also want to help founders and operators manage the daily noise of running a company.

To help with this, we are introducing Gemini Spark — a new 24/7 personal AI agent that works in the background across Google Workspace and other daily tools. Instead of just answering questions, it can autonomously execute multi-step workflows on your behalf.

For example, Spark can identify a critical product delay, cross-reference your team's documents to recalculate the timeline, update internal tracking sheets, and draft an update email to your investors — all while waiting for your explicit approval before executing.

What it means for startups: While Antigravity builds your product, Spark acts as your digital chief of staff. It handles the routine, manual operational processes so you can stay focused on high-impact, strategic innovation.

Start building today: The Google for Startups AI Agents Challenge

Open globally to eligible startup founders and developers, this competition equips your team with $500 in cloud credits and access to our new Agent Platform, so you can build autonomous agents and compete for a share of a $90,000 prize pool.

We’re offering separate tracks, whether you want to build a net-new agent from scratch, optimize an existing prototype for production, or prep a business-ready agent for enterprise distribution, there is a track tailored to your exact stage. Submissions are open until June 5, 2026, and will be evaluated on technical implementation, business case, innovation, and your final demo. Learn more and sign up for the challenge here.

AI Studio unlocks full-stack vibe coding with Cloud Run, Firebase, and Cloud SQL, no credit card required

Thu, 21 May 2026 16:00:00 +0000

At Google I/O 2026, we announced updates to the integration between Google AI Studio and Google Cloud:

New users can deploy up to two full-stack applications to the Google Cloud Starter Tier, no billing account required
An expanded choice of databases: Firestore for non-relational data, and Cloud SQL as a new relational database option
Tight integration with Google Workspace tools like Sheets, Calendar, and Gmail using Firebase Auth as the single user login flow

This is an update to the integration we announced in March, which included support for vibe-coded full-stack app deployments from AI Studio powered by Cloud Run, Firestore, and Firebase Auth.

With this expanded integration, you can use AI Studio to build a broader set of applications, using either a relational database with Cloud SQL or a non-relational database with Firestore. You don’t even need to specify a database — the AI agent can infer the right database for your app or feature.

Get started today in AI Studio at no cost with Cloud Run, Cloud SQL for PostgreSQL (coming next month), Firestore, and Firebase Auth for Starter Tier.

Publishing a full-stack app from AI Studio to Cloud Run with a single click

An easy on-ramp: The Google Cloud Starter Tier

You can build applications in AI Studio and deploy your prototypes directly to Cloud Run, authenticate via Firebase Auth, and store your data in a Firestore or Cloud SQL database. No credit card, no Google Cloud account, no friction — just prompt and launch.

If you don’t have an account, AI Studio uses the Google Cloud Starter Tier to create resources for you. You can deploy up to two full-stack apps. If you outgrow the limits of the Starter Tier, you can upgrade to a standard Google Cloud project with a billing account. All your resources will be transferred to your billable Google Cloud project, so that your application can scale as it grows.

Powering full-stack vibe coding with Cloud SQL

We’re introducing an intelligent, automated data foundation that makes it easy for developers to focus on their applications, not their infrastructure.

AI Studio integration with Cloud SQL includes:

An instant on-ramp: Go from prompt to a fully-deployed PostgreSQL database rapidly with instant provisioning.
Zero-cost startup: Try Cloud SQL for the Google Cloud Starter Tier at no cost, without needing a credit card or Google Cloud account.
Flexible cost control: The AI agent uses a new Cloud SQL for PostgreSQL developer edition, which enables the backend to scale to zero automatically, so you only pay while you’re using the app.
Agent-driven experience: To update your application, enter new prompts and the AI Agent automatically creates the schema and executes SQL statements in the database.
Global scalability: While the interface is simple, your app runs on Google Cloud’s robust, highly-reliable, and securely designed infrastructure that can scale to support millions of users.

Creating an app powered by Cloud SQL for PostgreSQL developer edition

Full-stack vibe coding with Firestore and Firebase Auth

When you’re building an app in AI Studio, the agent proactively detects if you need data storage and authentication based on your prompt, and offers to set up a database and user authentication. For apps that benefit from a document database, the agent shows a card to turn on Firestore and Firebase Authentication with your approval.

Enable Firebase for your application when prompted by the agent

By clicking “Enable Firebase,” the agent automatically:

Provisions Firestore, enables authentication, and connects your app to the database
Creates your web app’s sign-in page and configures authentication with Google Sign In
Generates the Firestore code in your app so you can sync data across sessions and devices
Drafts and deploys Firestore Security Rules based on your app’s logic (but you should always double-check these rules before sharing or deploying your app!)

With Firebase Auth, you can:

Connect your apps to Google Workspace using natural language: When you ask for a feature involving Workspace (e.g. Sheets, Calendar, Gmail), the agent implements a “Sign in with Google” flow, powered by Firebase Authentication, designed to securely grant Google AI Studio access to your data.

Connect your app to Google Sheets, powered by Firebase Authentication

Check out more details on the What’s New from Firebase at Google I/O blog.

Getting started in AI Studio

Going from idea to app is now a reality. You can build a full-stack application at no cost using the following steps:

Log into AI Studio: Access the platform to begin your project.
Build with prompts: Start building your application using natural language prompts. For example, “Build an expense tracker app.”
Enable the database: Prompt “Add a database” and AI Studio intelligently provisions a database through an "Enable" widget. You can explicitly ask for a relational database if you’d like to make your preference clear.
Set up the system: Select “Enable” and agree to the terms.
Start sharing: Deploy and share the application through the “Publish” button.

Get started today in AI Studio to turn your ideas into live applications in seconds.

Agent Sandbox on GKE is now available for everyone, and a first look at Agent Substrate

Wed, 20 May 2026 16:00:00 +0000

In just a short time, we’ve seen AI transition from simple chat interfaces to autonomous agents capable of function calling, code execution, and persistent terminal use. But to orchestrate these capabilities securely, agents need more than just intelligence — they need a robust, hyper-scalable, secure compute environment in which to execute code.

Since our preview announcement of GKE Agent Sandbox at KubeCon NA in November 2025, the community adoption has rapidly accelerated: we have seen more than 16x growth in sandboxes on Google Kubernetes Engine (GKE) in less than 5 months.

We’ve worked with key customers like Langchain and Lovable, and many others who are rapidly deploying millions of agents into production. Since its unveiling, Agent Sandbox has evolved rapidly, moving from a new project to a mature product with stable APIs. This stability is now fueling its integration into the broader agent ecosystem, where it serves as a critical infrastructure layer.

Today, we are excited to build on this momentum in two ways:

GKE Agent Sandbox is now generally available, giving you a secure, scalable foundation for your agent workloads
Introducing Agent Substrate, a new open source project aimed at continuing to push the limits of agentic infrastructure density

Secure, low-latency execution at scale

Agent Sandbox is an open-source, cloud-native execution environment built on Kubernetes, designed specifically for the unique demands of AI agents. It provides the foundational infrastructure to empower builders to safely and securely execute untrusted logic on top of their own infrastructure with industry-leading speed and efficiency.

With this release, we are delivering on the core requirements of modern agent workloads:

Reduce idle compute with pod snapshots: Agents often have short bursty cycles followed by longer idle periods. Instead of wasting valuable compute to keep the agent running, GKE Agent Sandbox integrates with Pod Snapshots to suspend your idle agent workloads and resume them in seconds upon request.
Low latency sandbox provisioning: Initializing a new sandbox instance for every request introduces unwanted seconds of cold start latency. GKE Agent Sandbox introduces a Sandbox API with an integrated warm pool. The Agent Sandbox API's integrated warm pool enables GKE to allocate 300 sandboxes per second, per cluster, at sub second latency, with 90% of allocations completing in 200 milliseconds.
Cost-effective warm pool: GKE Agent Sandbox warm pools keep pre-provisioned replicas ready to minimize sandbox startup latency. To minimize the cost of maintaining a sandbox warm pool, Agent Sandbox is integrated with standby capacity buffers (suspended VMs) to provide a cold pool of suspended sandboxes that can quickly replenish the warm pool for a fraction of the cost.
Ironclad security & isolation: Agent Sandbox natively supports gVisor and default-deny Kubernetes network policy. Agent Sandbox provides pluggable interfaces for open source sandboxes like Kata Containers, enabling users to customize their kernel isolation.

As the demand for compute continues to rise, this release ensures our customers have access to the broad range of Google Cloud compute options. GKE Agent Sandbox delivers up to 30% better price-performance when running on Axion processors than comparable hyperscaler cloud providers.

The next revolutionary step forward in agentic infrastructure Agentic workloads are simultaneously scaling up to the 10s to 100s of millions of instances while at the same time becoming increasingly idle, waiting for human interactions, events or triggers. These workloads continue to demand strong kernel and network isolation, making dense scheduling a challenge. Handling this level of scale and rapid suspend-and-resume is pushing the limits of the Kubernetes control plane.That’s why we are introducing Agent Substrate, a new open source project aimed at addressing the performance and density needs of ultra scale agents.

Agent Substrate introduces a new level of abstraction that moves agents onto and off of ready compute capacity (running in Kubernetes, of course) in real-time. Agent Substrate takes the core secure runtime and snapshotting capabilities of Agent Sandbox and pairs them with a minimal control plane designed to bypass some of the limitations of Kubernetes, without reinventing the rest of it.

This lets Agent Substrate optimize the critical paths to offer lower latency with higher scale and efficiency. While standard Kubernetes is optimized to handle thousands of long-running services, Agent Substrate is designed for the chatter of millions of sub-second tool calls that would otherwise overwhelm a standard control plane. It provides the perfect foundation for Agents, Agent Harnesses and Agent Runtimes, including the new Agent Executor project.

Agent Substrate’s goal is to explore every opportunity to make things move faster and scale bigger. Achieving this level of scale and efficiency is going to push the bounds of what current compute infrastructure can do, and no rock will be left unturned. One such exploration is to bring data locality into the core of the scheduler, ensuring that agent state and scheduling work together to shave off every possible millisecond of overhead.

Building the future in the open

In the early days of Kubernetes, the feedback and perspective from diverse contributors solving similar challenges was critical to setting the project up for success. We believe that agent infrastructure is at a similar inflection point. Today, we're hoping to recreate that magic of radically open and collaborative innovation to shape the future of agent infrastructure together. By kicking off the Agent Substrate project in the open, we are inviting the community to help design and build this critical next mode of infrastructure.

Get started today

As we look toward a future of autonomous agents, we are excited to continue to build the critical layers of the stack. We invite you to use Agent Sandbox to power your workloads today, and join us in the open-source community to collaborate on Agent Substrate – the next chapter in agent-native infrastructure.

Try Agent Sandbox on GKE
Contribute: Join the Agent Sandbox open-source community
Explore Agent Substrate

Introducing Agent Executor, Google’s distributed Agent Runtime

Wed, 20 May 2026 16:00:00 +0000

As models and harnesses improve, agents are taking on increasingly complex tasks that can run for hours or even days. But as we push agents to do more, this has surfaced a new operational problem: long-running agent workflows are fragile and incredibly hard to manage reliably and efficiently in production.

Today, we’re introducing Agent Executor, Google’s open-source runtime standard for agent execution, resumption, and distributed deployment. Based on what we’ve learned from solving these challenges internally, we’ve built Agent Executor to have the following native capabilities:

Durable execution: Long-running execution requires the ability to resume after outages or agentic interruptions such as human-in-the-loop (HITL) confirmations. Agent Executor provides this backend resilience automatically for any actor (e.g., an agent, agent harness, skill, tool, or sandbox) through its event log and snapshotting.
Secure isolation: Agent Executor isolates components in secure-by-design sandboxes to prevent harmful side effects and help ensure malicious activity cannot compromise the broader service. Sandboxes are especially useful when agents generate code or handle multiple tenants or user data concurrently.
Session consistency: In distributed agent workflows, multiple components may attempt to update shared session state at the same time. Agent Executor’s built-in single-writer architecture helps maintain consistency and reduces the risk of corruption in that state.
Connection recovery: In long-running agentic execution, clients may disconnect for many reasons, including network outages. Agent Executor lets clients reconnect to agents and backfills responses from the last sequence seen by the client for a better user experience.
Trajectory branching: Checkpoints let you branch an agentic trajectory (its decision or workflow path) at any point, allowing agents to test or evaluate different paths without losing context or other state.

In this blog, we’ll share more about Agent Executor and how you can get started.

Federate with Google’s agent runtime

Enterprise adoption of agents requires orchestration across deployment models. Some teams need on-prem infrastructure for proprietary workflows, performance, or compliance, while others prefer pre-built or custom managed agents for faster time-to-value. At Google I/O, we introduced a new suite of such solutions – including Antigravity 2.0 and the Managed Agents API – designed to accelerate how teams build and scale within the agentic enterprise.

Agent Executor bridges these deployment models, letting you mix-and-match between any or all of:

Google Antigravity, Gemini’s state-of-the-art agent harness
Google-built frontier agents, such as the latest Deep Research agent
Custom agents built by you and managed by Google (e.g., via the new Managed Agents in Gemini API)
Custom purpose-built agents, built with LangChain/LangGraph, Agent Development Kit (ADK), etc and any agents using Agent2Agent Protocol (A2A)

Own your agents, models, and compute

With Agent Executor, enterprises have maximum flexibility to maintain sovereignty over workloads and keep proprietary workflows within their self-managed compute and custom sandboxes. Your internal development teams have much more flexibility over how agents are deployed and managed and you benefit from:

Prevent vendor lock-in: Deploy your agents on your own infrastructure without being tethered to a specific provider’s model or compute environment. This allows for full control over data residency and your cost and budgetary controls.
Bring your own harness and agents: Agent Executor is designed to be harness-agnostic, allowing you to bring your own or use those made available by other vendors. It also supports agents developed with industry-standard frameworks and protocols providing a broad ecosystem of compatible agents.
Fully control execution: Agent Executor allows developers to run the entire agentic stack, including MCPs, skills, and other agents, directly on their own data plane. Developers can choose any compute with custom isolation boundaries and workload policy enforcement.

Scale agents up on Kubernetes with an agent-first compute layer

As agent workloads scale into the hundreds of millions and become increasingly long-running, our customers are hitting the limits of traditional compute abstractions because unlike traditional software, agents are nonlinear programs that wait for external inputs. To solve this problem, we’ve partnered with the Google Kubernetes Engine team on Agent Substrate, a new open-source project also announced today.

Agent Substrate introduces a new level of abstraction for Kubernetes that moves agents onto and off of ready compute capacity in real-time, resulting in lower latency with higher scale and efficiency. While standard Kubernetes is optimized to handle thousands of long-running services, Agent Substrate is designed for the chatter of millions of sub-second tool calls that would otherwise overwhelm a standard control plane. Agent Substrate takes core secure runtime and snapshotting capabilities of existing sandbox infrastructure and pairs them with a minimal control plane designed to bypass some of the limitations of Kubernetes, without reinventing the rest of it. Working together, these layers enable you to:

Maximize compute efficiency: Agent Substrate introduces a new control plane designed to handle hundreds of millions of registered agents. Together with Agent Executor, Agent Substrate can provide a foundation for today’s largest agent deployments.
Stay within the Kubernetes ecosystem: Agent Substrate is built on top of Kubernetes and allows scheduling and horizontal scaling of compute with declarative configuration.

In the demo below, we showcase using Agent Executor together with Agent Substrate with a sample workload.

Get started today

Models, agents, harnesses, and the infrastructure around them are all evolving faster than ever. We’re building Agent Executor in the open so we can validate the design in the hands of real developers and improve based on your feedback.

Agent Executor is available now in preview. We invite you to explore the code, test it with your own workloads, and help shape the future of agent runtimes. Head over to our GitHub repo to get started today.

Benchmark and optimize LLMs on-device with AI Edge Portal

Wed, 20 May 2026 16:00:00 +0000

LLMs have become more powerful at smaller sizes, but deploying them to edge devices like smartphones remains a massive challenge. Today, developers have to optimize across a sprawling combination of accelerators, operating systems, and countless System-on-a-Chip (SoC) configurations, often relying on manual testing with just a handful of devices. Google AI Edge Portal helps solve these challenges.

By letting developers test ML workloads across a fleet of over 120 representative Android device types, Google AI Edge Portal provides deep insight into latency and performance across all CPU, GPU, and NPU backends.

Today, we are excited to announce two new capabilities that expand Google AI Edge Portal’s capabilities for the generative AI era: benchmarking and debugging on-device LLMs. These new services give developers what they need to optimize generative AI performance accurately and efficiently across the entire Android ecosystem.

Benchmark LLMs across over 120 different mobile devices

When a user interacts with an LLM-enabled experience in your app, they expect fast and consistent performance on their device. Common challenges like initialization time can result in your app appearing to freeze, or, in a worst case, crash completely if the model consumes all available memory.

With the latest release of Google AI Edge Portal, you can now run automated gen AI benchmarks directly on a physical lab of over 120 diverse Android devices and test for these scenarios specifically. Portal natively supports CPU and GPU benchmarking for LLMs in the LiteRT-LM format.

Customers can benchmark GenAI models on over 120 Android devices, viewing metrics including initialization time, prefill speed, decode speed, and peak memory usage.

When you trigger a gen AI benchmarking job with Portal, it profiles the critical metrics that dictate your end-users’ experience when interacting with your AI application on-device:

Metric	What it measures	Why it matters to you
Initialization time	Measures how long it takes to load your model into memory.	High initialization time can result in delays, or freeze the user interface when your application starts up.
Prefill speed	Captures how fast the device processes prompt tokens to generate the first output token.	Dictates the initial delay before the user sees the first response.
Decode speed	Captures how fast the model generates tokens during a response.	Dictates the speed at which output is generated.
Peak memory	Monitors maximum RAM usage.	Flags potential “out of memory” crash risk, especially prevalent on memory constrained devices.

With these insights, you can confidently decide which devices are ready to host your model and adjust or better optimize your LLMs for device targeting before shipping.

Debug performance easily with Model Explorer

Benchmarking is only useful if you can fix the discovered performance issues. When an LLM performs poorly, finding the root cause within the complex graph of multiple layers and thousands of nodes is a daunting task for developers, involving tedious and time-consuming searching that can take hours if not days.

To bridge this gap, we have added the ability to visualize and compare model graphs in Portal with ease. Through the natively integrated Model Explorer, our graph visualization tool, you can search and locate specific nodes, compare models side-by-side in the same tab, and view tensor shapes, trace inputs and outputs, and more. To further speed up debugging for teams, we also added the ability to take screenshots and share specific views directly with your collaborators in Google Cloud.

These visualizations are one of the most effective ways to identify targets for optimization, including:

Conversion: Model Explorer simplifies the identification of conversion anomalies through its dual-view comparison tool. This interface allows you to traverse complex model architectures by selectively expanding or collapsing specific layers, granting you the ability to analyze internal dependencies and structural nodes with precise granularity.
Quantization: Model Explorer aids in detecting specific operations where quantization may compromise performance. By sorting layers using error metrics, you can pinpoint precision loss, access granular per-layer data, and evaluate various quantization strategies to achieve an optimal balance between model footprint and output quality.
Optimization: Use Model Explorer to visualize hardware compatibility, organize operations by latency, and conduct granular, per-op performance comparisons across different hardware accelerators.

With Model Explorer, you can view model graphs, search for specific layers, and compare models side-by-side to debug performance.

Start benchmarking LLMs on-device today

With the era of LLMs on-device here, we are excited to help close the critical gap in benchmarking to bring the power of AI to the thousands of types of smartphones on the market today. To utilize these latest features, please complete our sign-up form here to express interest.

Google AI Edge Portal is currently available in private preview for allowlisted Google Cloud customers. During this private preview period, access is provided at no charge, subject to the preview terms. All current allowlisted customers will receive access to these new features automatically.

We can’t wait to see what gen AI capabilities you are able to deploy across the full spectrum of devices with Google AI Edge Portal!

_{Thank you to the members of the team, and collaborators for their contributions in making the advancements in this release possible: Akshat Sharma, Ami Kubota, Charlie Xu, Chunlei Niu, Cormac Brick, Derek Bekebrede, Eric Yang, Jing Jin, Kathleen Low, Matthias Grundmann, Marissa Ikonomidis, Na Li, Ram Iyengar, Sachin Kotwani, Sommayah Soliman, Tenghui Zhu, Xiaoming Hu, Zi Yuan}

Everything Google Cloud customers need to know coming out of Google I/O

Tue, 19 May 2026 17:45:00 +0000

At Google Cloud Next ‘26, we unveiled the blueprint for the Agentic Enterprise, sharing our eighth-generation TPUs, Gemini Enterprise Agent Platform, a fully reimagined Agentic Data Cloud, Workspace Intelligence, and security built for the AI era.

Today at Google I/O, we’re delivering a new set of powerful AI innovations and models — and putting them directly in the hands of Google Cloud customers via Gemini Enterprise and Google Workspace:

Gemini 3.5: Our latest family of models combines frontier intelligence with action – starting with Gemini 3.5 Flash.
Gemini Omni: Our new model is a leap forward in world understanding, multimodality, and editing, letting you generate any output from any input, starting with video.
Google Antigravity: Google Antigravity’s expanded capabilities and new integration with Agent Platform bring agentic development to your entire organization.
Gemini Spark: For Gemini Enterprise and Workspace customers, Gemini Spark is your 24/7 personal AI agent that helps you work more efficiently by autonomously taking action on your behalf, under your direction.
Google Workspace: Google Pics, our new image generation and editing tool, and new voice features in Gmail, Docs and Keep, help reimagine how you work.
Managed Agents API on Agent Platform: Allows developers to build and run custom agents inside secure, Google-hosted environments that seamlessly integrate with Agent Platform.
CodeMender: A powerful AI security agent provided through Agent Platform, CodeMender can help find and fix vulnerabilities in your code.

Expand what’s possible with Gemini 3.5

We’re kicking off the Gemini 3.5 series with the release of 3.5 Flash, which delivers frontier performance for agents and coding, excelling at complex long-horizon tasks that deliver real-world utility. Google DeepMind engineered these models from the ground up using our purpose-built AI infrastructure. This unique co-design of the model and hardware allows us to train deeper reasoning capabilities faster and more efficiently with every new generation.

Gemini 3.5 Flash delivers intelligence that rivals large flagship models on multiple dimensions at speeds you have come to expect from the Flash series. It’s our strongest agentic and coding model yet, outperforming Gemini 3.1 Pro on key benchmarks (Terminal-Bench 2.1: 76.2%, GDPval-AA: 1656 Elo, MCP Atlas: 83.6%) and leading in multimodal understanding (CharXiv: 84.2%). 3.5 Flash offers a balance of performance and speed, ideal for tackling long-horizon agentic tasks, often at less than half the cost of comparable models.

Gemini 3.5 Pro is currently in testing and will be coming next month.

We have partnered closely with leading organizations across industries to validate the Gemini 3.5 series within their own environments.

Gemini 3.5 Flash is rolling out today:

Developers can build agents using Gemini 3.5 Flash on the Gemini Enterprise Agent Platform, or use it in your projects in Google AI Studio and Antigravity.
Business users can start using Gemini 3.5 Flash in the Gemini Enterprise app to help discover, create, and use the best of Google AI in their workflows starting today.

Generate high-quality video content with Gemini Omni

Gemini Omni is a groundbreaking new model that produces dynamic video content by blending text, audio, image, and video inputs. Building on how Nano Banana reimagined what you can do with images, Gemini Omni delivers a highly intuitive approach to video creation and editing using natural language.

This capability redefines how enterprises produce and refine visual media with intuitive creation and precise editing. Whether you are developing interactive virtual try-ons for e-commerce, streamlining complex post-production workflows, or generating tailored video narratives, Gemini Omni helps your team unlock entirely new ways to create content and drive deeper customer engagement.

Gemini Omni Flash will be rolling out in the coming weeks to developers and enterprise customers via the Gemini API and Agent Platform API.

Make everyone a builder with Google Antigravity

Google Antigravity empowers the next era of enterprise builders. It enables your organization to transform how applications are built, deployed, and managed.

With Gemini 3.5 Flash, Antigravity delivers exceptional computational efficiency. This translates to more rapid development cycles and reduced operational costs for production-scale AI initiatives. Today, we are announcing new tools including:

Enterprise security and compliance: Google Cloud customers can now access Antigravity through Agent Platform. By inheriting Google Cloud’s standard data privacy protections and Terms of Service, this ensures your customer data is in your control and agent activity runs within your secure cloud boundary by default.
Antigravity 2.0 desktop app: At the core of the expanded Antigravity platform is Antigravity 2.0, a new standalone desktop app for builders that provides a centralized workspace to steer, customize, and orchestrate agents. For example, imagine managing a product launch and using Antigravity 2.0 to deploy simultaneous, agent-driven execution across critical project phases, including automated code generation for your website, creation of on-brand assets, and personalized customer email development.

Sequences shortened throughout

Antigravity CLI: For developers who want a more lightweight interface for rapidly building and deploying agents, we are also launching an Antigravity CLI that is tightly integrated with the desktop app.

We’re using Antigravity every day to build at Google, and we’ve already seen Cloud customers and partners using it, too:

“In modern enterprise architecture, the real bottleneck is the cognitive load on the engineer. Antigravity, augmented by Gemini within Accenture’s scaled agentic execution, abstracts away infrastructure complexity and automates delivery mechanics. By moving to a standalone application and consumption model, Google Cloud is making high-velocity engineering accessible at scale—enabling our best-in-class talent to focus on innovation and building resilient digital cores for our clients.” - Chetna Sehgal, Senior Managing Director and Global Practice Lead, Accenture Google Business Group

“Antigravity isn't just an assistant; it’s a core component of our technical DNA. Its ability to simulate user environments via the agentic browser has revolutionized our QA process, while the agent manager has streamlined our entire pipeline. Today, more than half of our production-ready code is generated through these agentic workflows, proving that the future of software isn't just AI-assisted—it’s AI-driven." - Nikunj Shanti, CTO, AirAsia Next

“Antigravity has transformed how our internal and Forward Deployed Engineering teams operate by enabling governed, autonomous software engineering workflows that adhere to Deloitte's enterprise security standards at massive scale. This capability allows us to rapidly accelerate the deployment of high-fidelity, AI-powered solutions for our clients and drive innovation across industries." - Faruk Muratovic, US AI & Engineering Strategy and Services Leader at Deloitte

"Antigravity has fundamentally shifted our team from manual coding to high-level orchestration. By leveraging the agentic reasoning of Gemini, we’ve moved beyond simple code completion to a governed, autonomous workflow that maintains architectural integrity through every sprint. The ability to transform a functional requirements document directly into high-fidelity code while automating unit tests and documentation has drastically reduced our concept-to-deployment lifecycle. Antigravity acts as an autonomous coding partner that empowers our engineers to shift from manual syntax entry to governed, intelligent execution." - John O'Rourke, Technical Director, AI Implementation, Monks

"We're moving past simple AI code completion to true agent orchestration. By using Google Antigravity to run engineering pipelines in the background, our teams eliminate traditional development friction, allowing us to build custom client solutions and deliver value at unprecedented speed." - Vikas Agarwal, CTIO, PwC Advisory, PwC US and Global

“WPP has integrated Antigravity into WPP Open, our agentic marketing platform to supplement our product development lifecycle. Leveraging the power of Gemini, it has streamlined workflows, automated repetitive tasks and empowered engineering teams to deliver high-quality solutions for our clients, faster.” - Callum Anderson, Head of Engineering, WPP Open

To get started, download Antigravity and log in to the desktop application or Antigravity CLI using your standard Google Cloud credentials. Antigravity will be available in Gemini Enterprise in the coming months.

Gemini Spark, your 24/7 personal AI agent

To realize its full potential, AI must serve as the connective tissue that empowers every individual within your enterprise. This requires intelligent personal agents that grasp your specific business context – like what you’re working on, who you’re working with, and even your writing style. These agents can autonomously execute multi-step workflows on your behalf, with your permission, freeing your teams from routine, manual processes so they can focus on the high-impact, strategic innovation that drives your business forward.

Gemini Spark in Gemini Enterprise is our new 24/7 personal agent that can work in the background across Workspace, custom connectors, and the open web. It enables you to:

Delegate complex work: Set recurring tasks, teach the agent new skills, and let it execute multi-step work on your behalf.
Maintain complete control: Spark proactively sends critical updates and requires explicit approval for high-risk actions like sending emails.
Personalize your agent-led experience: The more you use Spark, the better it learns your unique preferences and interactions to become a more accurate, helpful extension of your work style.
Connect to your tools and apps: Spark can use your existing Gemini Enterprise connectors, including Microsoft Sharepoint, OneDrive, ServiceNow, and many others.
Security and governed sandbox: Spark operates in a fully managed, secure runtime on Google Cloud, meaning you get enterprise-grade security without ever having to manage the underlying infrastructure. Every task executes in a fresh, strictly isolated, ephemeral VM to help ensure data never overlaps between sessions. To protect your enterprise, all traffic routes through our secure Agent Gateway that enforces Data Loss Prevention (DLP) policies, while user credentials remain fully encrypted and are never exposed directly to the agent.

Draft an email to stakeholders with Spark in Gemini Enterprise

Spark can assist with tasks for many different processes in your organization to help teams focus on more strategic work. For example:

Spark can identify a new product request related to critical functionality that needs to be fixed before launch and requires a major timeline change. In the background, it can automatically suggest code changes by working with Antigravity, create a Jira ticket for the development team to review the changes, cross-reference your team’s documents to recalculate the launch timeline, and update the internal status across relevant Sheets and Docs. It can then draft an email to relevant stakeholders confirming the actions it took and updating the team.
Spark can help someone working in IT operations by monitoring system health via ServiceNow, checking for relevant tickets. When it detects a recurring critical issue, it creates an escalated Jira ticket to the developer team, drafts a comprehensive incident report in Docs, and pings the IT manager via Chat to review and approve the stakeholder communication plan.
For salespeople, Spark can proactively prepare for client meetings by pulling account history from Salesforce and recent support tickets from Zendesk. Identifying a potential churn risk, Spark drafts a tailored account retention strategy in Docs and a customer-ready email, awaiting explicit user approval to send.

Gemini Spark in the Gemini Enterprise app is rolling out to customers soon. Gemini Spark in Google Workspace will be available soon in preview for business customers in the Gemini app.

Reimagine how you get work done with Google Workspace

Today, we also announced new ways AI can help you accomplish even more directly in your Workspace apps, available in preview for business customers this summer:

Google Pics: Our new AI-powered image generation and editing tool gives you precise control over your images. Pics allows you to easily move, resize, and transform individual objects as well as modify and translate text independently. Built right into apps like Drive, Docs, and Slides, it makes complex editing easy so teams can quickly update global marketing campaigns, swap out products in a photo, or resize backgrounds to fit different ad layouts.
Voice capabilities in Gmail, Docs and Keep: Brainstorm, organize and execute tasks hands-free in your favorite apps. You can quickly find a project due date from your inbox, iterate on a draft of your weekly report, or turn a messy brain dump into a structured list.

Unlock developer velocity with fully managed agents

With the Managed Agents API on Agent Platform, builders can now instantly spin up custom agents that reason, call tools, and execute code inside secure, Google-hosted remote environments using a single API call. This allows technical teams to offload complex infrastructure management and focus entirely on agent behavior. Managed Agents API will automatically inherit Agent Platform's enterprise-grade data privacy, governance, and security protections.

Visit the documentation to get started.

Secure the Agentic Enterprise

Advanced models, such as Gemini 3.5, unlock powerful new capabilities across your enterprise, including transforming AI security and vulnerability detection. Strategic risk mitigation, robust compliance, and trusted AI deployment are critical for sustained business growth.

In addition to sharing our findings and mitigations with the larger security and AI community, we are integrating CodeMender into Agent Platform. CodeMender is an AI code security agent, originally developed by Google DeepMind. Leveraging Agent Platform capabilities and advanced Gemini models, CodeMender autonomously identifies vulnerabilities within your code. It then recommends precise fixes, securely tests them, and can apply patches and necessary changes across dependent systems, with your approval. This entire process automates secure deployment while ensuring your developers retain control.

Several Gemini Enterprise customers are already testing CodeMender, and we will have more to share about expanded availability soon.

We are also making significant strides in ensuring platform trust and safety for synthetically generated media with a new AI Content Detection API, rolling out today on Agent Platform. This provides businesses with a powerful method to identify AI-generated content from both Google and other popular models, supporting responsible media governance. Learn more about our expanded tools for content transparency and verification.

Looking ahead

With the latest Google Cloud updates unveiled at I/O, we’re making it easier than ever to move from initial idea to meaningful impact. These new tools are designed to streamline your workflows and empower your teams. For a deep dive into all the news and live product demonstrations, catch the full I/O keynote.

What Google I/O '26 means for developing agents on Google Cloud

Tue, 19 May 2026 17:45:00 +0000

At Google I/O, we introduced a unified development toolkit featuring Antigravity 2.0 and the Managed Agents API, giving developers better ways to build locally and deploy securely to the cloud on a shared protocol layer. In this blog, we’re going to show you how Gemini Enterprise Agent Platform and the new developer tools shared at I/O fit together, unpack the spectrum of choice for building, and share what we’d actually try first.

Following the evolution of Vertex AI into the Gemini Enterprise Agent Platform – a comprehensive platform to build, scale, govern, and optimize agents with new features like session memory and centralized governance – we are now extending these capabilities directly into your local development tools. Our goal is to bridge the gap between high-speed prototyping and secure, compliant corporate deployment, offering a modular approach where you can choose between quick-start workflows or full production control to fit your stack's specific needs.

Here’s how those pieces now lay out across the entire spectrum of choice.

The four rungs: The spectrum of how to build agents

We like to think of the agent development ecosystem as four rungs on a ladder, designed to give you a clear slider between out-of-the-box configuration and complete code-first control. They're deliberately additive, meaning that starting fast on the lower rungs above never locks you out of graduating to the deeper customization of the rungs above.

Underneath all four rungs is the A2A protocol. This interoperability ensures that an agent built on the first rung can be called as a sub-agent on the fourth rung, allowing your entire architecture to scale seamlessly on the same infrastructure.

Rung one: Agent Studio (low code)

A visual workspace inside Agent Platform. You discover models in Model Garden, engineer prompts, wire up tools, and ship an agent without writing code. Best for business-facing teams and rapid prototyping. The agent you build here runs on the exact same runtime as everything below it.

Rung two: Managed Agents API

New at I/O, the Managed Agents API is for technical teams who want to “manage the mission, not the machine." It allows you to define agentic behavior and let Google Cloud handle the heavy lifting, acting as an agent-as-a-service with nothing to manage.

You use the Managed Agents API to configure your agent, and the Interactions API to invoke it. You package your instructions, skills, and tools, POST them, and Gemini builds and runs the agent.

What makes this deployable is the Google Cloud sandbox, which is secure by design. The agent harness runs on our servers, and each agent has its own ephemeral sandbox provisioned with your skills, Model Context Protocol (MCP) servers, and server-side tools. Full integration with A2A and Agent Platform governance and security are coming soon.

Rung three: Antigravity and friends

Antigravity is our primary solution for developers looking to leverage AI for coding tasks and agent orchestration, enabling teams to transform how apps are built and deployed. We've consolidated our developer-facing coding strategy into this single, powerful harness shared across multiple surfaces.

It’s co-optimized with the Gemini family of models, offering high efficiency to speed up development cycles and reduce costs. Skills you develop with Antigravity are intended to be portable across different surfaces.

This is for development teams who want to utilize Google's advanced reasoning capabilities within their coding workflows, implement custom development loops, and transform how they build, deploy, and manage applications.

Today, we are expanding this with new tools:

Antigravity 2.0: A new standalone desktop application providing a centralized workspace to steer, customize, and orchestrate coding agents. Developers can use this to manage complex tasks, such as orchestrating agents to refactor code, generate unit tests, or even scaffold new service components based on a specification. Agents can spin subagents from a single prompt, while multi-agent orchestration allows tasks to run in parallel.
Antigravity CLI: This brings the full Antigravity experience to the command line: same harness, same agent, same quality of intelligence as Antigravity 2.0, with a product experience tailored for the terminal. It's optimized for speed and lower overhead, and adapts entirely to you. The CLI is tightly integrated with the desktop app, sharing authentication, context, skills, and configurations, providing a consistent experience across both interfaces. Use the Antigravity SDK to build your own runtime.
Enterprise security and compliance: Google Cloud customers can now use Antigravity 2.0 and Antigravity CLI with their Gemini Enterprise Agent Platform project. All you have to do is to log in with Cloud OAuth, set your Agent Platform Project ID and region. This ensures that all agent inference runs via Agent Platform models within your secure cloud boundary, inheriting Google Cloud’s standard data privacy protections and Terms of Service. This ensures your customer data is in your control , and you can utilize regional model endpoints.

Integrating other coding agents

While Antigravity is our recommended agentic coding solution, Google Cloud is designed to work well with any coding agent you choose. Our platform is open, and we provide tools to ensure flexibility:

The Agent CLI and Agent Development Kit (ADK) allow you to build and interact with agents from various sources, including tools like Claude Code. This means developers can often keep their preferred interfaces while running the underlying AI inference on Google Cloud. This approach ensures your workflows benefit from Google Cloud's security, compliance, and infrastructure.
Our Skills for Google products, launched at Next, are designed to be compatible with multiple coding tools, enabling you to enhance different agents with a consistent set of capabilities.

This flexibility allows teams to integrate their existing favorite tools and models, ensuring seamless and compliant operation within their established workflows.

Rung four: Agent Development Kit (ADK 2.0)

Code-first, low floor, high ceiling. If Managed Agents are configuration-first, ADK is engineering-first. This is for software engineers who want to build custom agent meshes from the ground up - any architecture, any model, unconstrained.

ADK enhancements launched at Google Cloud Next are now available for everyone. It introduces a unified graph-based engine that gives you a slider from dynamic, model-led reasoning to strict, deterministic workflows. The framework handles the heavy lifting of multi-agent coordination, managing how sub-agents, tools, and data pass between one another.

Collaborative workflows (Python v2.0.0): Previously called the Task-based Agent Collaboration API, this is how you build self-managing agent teams. A coordinator delegates to subagents using explicit operating modes:
- chat: Full user interaction, manual return to parent, this is “handoff conversation to sub-agents”.
- task: User interaction for clarifications, automatic return to parent, this is a new “collaborate for this assignment” which is the best of both other options.
- single-turn: No user interaction, parallel execution, automatic return, this is “agent as tool”.
Dynamic workflows: Dynamic workflows in ADK allow you to put aside graph-based path structures and use the full power of your chosen programming language to build workflows. With Dynamic workflows, you can create workflows with simple decorators, invoke workflow nodes as functions, and build complex routing logic.
ADK Kotlin (Beta): "ADK for Android." Kotlin support joins Python, Go, and Java, increasing language coverage so your on-device mobile agents can seamlessly coordinate with your backend Python agents.

Finally, the Agents CLI packages Google's expert skills for ADK, eval, deploy, observability, and publishing - turning any AI coding agent (like Antigravity, Gemini CLI, Claude Code, or Cursor) into an expert at agent app building as well as agent ops. It gives your AI Agent skills to understand the Google Cloud agent stack, turning an expansive ecosystem into a seamless assembly line for developers hillclimbing their agent builds.

What we'd actually try first

If we were starting today, here's the order we'd reach for things:

Start with the Antigravity 2.0 desktop app: Explore the interface, add a pre-built agent, and interact with it to understand the core functionality. This provides a more intuitive entry point before diving into API specifics.
Build a mesh: Feel free to explore Managed Agents API through the Agents API skill and Interactions API skill. When you start hitting routing decisions you want to make explicit, or need complex multi-agent orchestration, port your logic to ADK 2.0. The graph model is worth the learning curve as soon as you have more than two branching paths. Don't worry about stringing together a bunch of separate pieces to make this happen - this is exactly where the Agents CLI shines.
Govern and reuse shared domain logic: Check out Skill Registry (public preview): A centralized catalog to govern and promote the reuse of packaged domain logic. Skills are accessible via the Managed Agents API, Agent Platform SDK, and ADK (via SkillToolset). Skill Registry will be part of Agent Registry shortly.
Evaluate: Use the Gemini Enterprise Agent Platform evaluation suite to move beyond basic text-matching vibe checks. Leverage synthetic user simulation to auto-generate multi-turn testing scenarios and safely mock API environments to pressure-test tool resilience. Finally, utilize its LLM-based autoraters and trace logging to evaluate complex logic, group failures, and continuously optimize your agent.
Secure the pipeline: Leverage Gemini Enterprise Agent Platform governance capabilities like Agent Identity, Agent Gateway, Agent Security, and Agent Registry to secure your deployment. Once CodeMender releases, add it to your CI/CD to proactively secure the code your human (and AI) developers are pushing.

Note: You can do this whole loop on a Google Cloud Starter Tier account without a billing account attached. First two app deployments are on us.

We’re excited and hope you are, too

The agent space is evolving rapidly. Agent Platform offers a secure and adaptable foundation. Core components like the Agent Gateway, identity management, and the Skill Registry work together to ensure a robust and controlled environment for your agents, enabling you to innovate flexibly without vendor lock-in.

Pick the rung that fits the project. Bring whatever coding agent your team prefers. The platform you graduate to is the same one either way, and the data stays inside your Cloud project the whole time.

If you only read one set of docs after this post, make it the Agents overview in the Agent Platform documentation. If you build something interesting, show us - the best examples will land in the next round of templates.

We can’t wait to see what you build!

The future of agentic development: Redefining the data practitioner lifecycle with Data Agent Kit

Tue, 19 May 2026 17:45:00 +0000

The modern software development landscape isn’t happening just on one surface — it’s happening across an entire ecosystem of agentic tools. Agents are being developed at an unprecedented scale, and these agents require direct access to enterprise data for context and grounding.

However, the current tooling for building agents and managing data is heavily fragmented. This can make it difficult to access data, increasing security risks, and causing broken developer experiences that hinder innovation.

To address this challenge, we recently launched Data Agent Kit, a unified, open-source collection of data engineering and data science skills, tools and plugins that integrate directly into the environments practitioners already use, such as VS Code, Claude Code, Codex, Gemini CLI and the Antigravity CLI. By seamlessly bringing together these core tools and skills with your enterprise data, the Data Agent Kit effectively serves as a comprehensive harness for agentic context, memory, and personalization. It provides:

Agentic skills: Pre-codified pathways for interacting with your data estate, covering query optimization, ML best practices, data validation, data drift checks, governance, and troubleshooting.
Model Context Protocol (MCP) tools: Secure connections between agentic workflows and cloud data platforms like BigQuery, AlloyDB, and Google Cloud Storage. Developers can now configure connection parameters for their cloud datasets and data processing engines without having to manage complex, manual pipeline code.
Plugins and extensions: Native IDE integrations that enable rich, context-aware developer interactions.

Together, these Data Agent Kit capabilities help data practitioners go from manually writing code to intent-driven data science and engineering: defining the desired business outcomes, constraints, and success criteria, and allowing the AI-augmented system to figure out how to execute it. This shift is critical because today, when building agentic applications that navigate complex data architectures, there’s often a 'context window tax' i.e., developers have to manually paste vast amounts of schema metadata into prompts, eating up token limits and increasing latency. Meanwhile, data practitioners often lack guidance about how to efficiently query, optimize, and troubleshoot cloud data, while specialized, fragmented development environments cannot see across your entire data estate. Data Agent Kit helps with these challenges and others, providing the foundational capabilities data practitioners need for a new agentic way of working.

Read on for an overview of Data Agent Kit’s features and benefits, how to install it and connect your local environment to your data estate, and an intent-driven engineering example.

A unified hub for your data estate and lifecycle

Data Agent Kit makes your entire data estate available in a single view. This goes beyond providing a simple catalog for databases such as BigQuery, AlloyDB and Spanner; rather, it integrates data engineering and science tasks, orchestration pipelines, and jobs into a single interface. This allows practitioners to manage their entire data workflow — from discovery to production — without context switching. Data Agent Kit’s intelligent routing automatically chooses the optimal compute engine for your task — whether that’s BigQuery for SQL-native analytics and ELT, or Spark for custom Python transformations and distributed ML training.

Unified Hub of your entire data estate and lifecycle

Ecosystem-led intelligence: Codified agentic skills

Data Agent Kit offers a library of predefined agentic skills (e.g., ML best practices, ELT, building data apps) based on Google Cloud’s data engineering and science expertise. Rather than relying on generic LLM prompts, it codifies prescriptive guidelines into your workflow. This allows you to inject enterprise-grade data intelligence directly into your IDE or CLI.

Browsing a predefined list of agentic data engineering and science skills

Transforming data exploration through natural language

Grounded in this unified data, Data Agent Kit delivers native conversational analytics directly within your workspace, making it easy to explore your data. Powered by the same Gemini natural language to SQL technology found in our first-party agents (e.g., Conversational BigQuery and Looker), Data Agent Kit lets you run natural language queries to profile, search, and visualize your datasets.

Within Data Agent Kit, you can use Conversational Analytics to explore your data

A practical walkthrough: Unifying data and building models

To see how Data Agent Kit’s skills and MCP tools work together, consider a financial services scenario: Your company is facing rising fraud claims. With your transaction data stored in Cloud Storage, you need to build a high-confidence fraud detection model and schedule orchestration pipelines. Traditionally, this involves hours of data wrangling across multiple consoles. With the Data Agent Kit, you can complete this in minutes, directly within your IDE or CLI. Let’s see how.

Onboarding: The one-minute setup

You can get started with the Data Agent Kit in under a minute through an integrated setup process.

To do so, search for "Google Cloud Data Agent Kit" in your IDE’s marketplace (VS Code) or via the GitHub repo in your CLI (Gemini, Antigravity, Claude, Codex) from the links in the “Get started today” section below. Data Agent Kit automatically configures dependencies and checks your Google Cloud login status.

Click the Google Cloud icon in your activity bar to authenticate via IAM. Once logged in, your Cloud Storage, databases, and catalog assets appear instantly in your workspace.

Use the settings menu to set project IDs, regions, and verify MCP status to ensure all backend services are authorized. Data Agent Kit also includes a quick-start guide on using the tools and skills.

An intent-driven data engineering example

With Data Agent Kit installed, you can skip the manual ETL boilerplate, and directly describe your high-level goal to your coding assistant (e.g., Claude Code, GitHub Copilot) in natural language. The assistant leverages Data Agent Kit’s skills to plan and execute the workflow.

Prompt:

I have the raw transaction logs landing in the GCS bucket gs://fin-clearing-raw/.

First, create a Spark notebook and (1) ingest these logs into an Iceberg table in BigQuery.

Second, create a dbt project to (2) deduplicate them, (3) remove the transactions with invalid transaction id and store them in a separate Iceberg table, (4) standardize the timestamps and perform any other necessary cleanup tasks (5) sync the output to another Iceberg table (6) join this output table with tables that have payer and payees identities and write the output to a final Iceberg table.

Third, I would like you to train an ML model on Spark using a notebook to detect fraudulent transactions in the output table. I am thinking about a LightGBM model but I am open to any suggestions you might have. Use the relevant datasets in the project.

Finally, create an inferencing step using Spark notebook to the above pipeline to perform batch inferencing and write flagged transactions to a Spanner table.

Create an orchestration pipeline that first runs the ingestion then the dbt and next the inference notebook.

Under the hood: Data pipeline steps

Behind the scenes, Data Agent Kit plans a robust multi-step orchestration of the entire data lifecycle, from exploration to inference.

Step 1: Notebook creation, ingestion and initial storage

Find your bronze data — raw, unfiltered data on financial transactions — and bring it into an Iceberg table before doing the transformations.

Automatically create a Notebook to ingest the raw logs from Cloud Storage.
Write the necessary SQL, and store the ingested data into an Iceberg table in BigQuery.

Ingestion into a bronze table

Step 2: Transformation (dbt Project)

Now, clean the bronze data into silver and gold tables:

Data preparation: Deduplicate the transaction logs.
Filter invalid IDs: Identify transactions with invalid IDs and store them in a separate Iceberg table.
Clean and standardize: Standardize timestamps and perform other necessary cleanup tasks.
Sync: Output the cleaned data to another Iceberg table, leveraging the BigQuery MCP server.
Enrichment: Join the cleaned table with payer and payee identity tables.
Final output: Write the joined dataset to a final Iceberg table.

Data transformation to create silver and gold tables

Step 3: Machine learning and inferencing

With your gold table minted, it’s time for some data science: model training and inferencing. Here, the agent hands the clean data from the previous step to the model to spot fraudulent patterns.

Training: Use a Spark notebook to train an ML model.
Inference: Create a Spark notebook inferencing step for batch processing.
Storage: Write all flagged fraudulent transactions to a Spanner table by leveraging the Spanner MCP.

Machine learning and inference

Step 4: Orchestration and execution

Finally, you’re ready to move to production and schedule the whole orchestration pipeline: Ingestion -> Transformation -> Inference.

Orchestration pipelines and scheduling runs

When things go sideways: Agentic incident management and intelligent recovery

If an orchestration pipeline fails, not to worry, Data Agent Kit streamlines resolution using its intelligent incident management capabilities:

Intelligent diagnosis: Automatically conducts root cause analysis to pinpoint failure sources
Autonomous remediation: Drafts and tests fixes, bypassing manual debugging
Automated recovery: Validates and deploys fixes via automated Git workflows

Issue diagnosis and remediation

And there you have it: You’ve gone from raw discovery to a fully automated, fraud-catching machine in a matter of minutes, all from within the same UX. No need to hop between multiple browser tabs, IDE interfaces, or learn data engineering and science best practices — Data Agent Kit orchestrates a clean end-to-end flow leveraging various MCP tools and codified skills. Ultimately, this approach helps you achieve what matters most: shipping innovative, high-performance data applications at scale.

Get started today

Data Agent Kit is available today in preview. Start by installing it in your favorite IDE or CLI:

Then visit the documentation to learn more and get started.

Gemini Live Agent Challenge: Announcing the winners and highlights

Fri, 15 May 2026 16:00:00 +0000

The Gemini Live Agent Challenge is officially in the books! We challenged developers worldwide to break out of the traditional 'text box' paradigm by building next-generation AI agents. From our initial announcement to amassing 11,878 participants and 1,536 submitted projects from 151 countries, the results were nothing short of spectacular.

The mission was to seamlessly integrate multimodal capabilities—building agents that help you see, hear, speak, and create in real time — using the Gemini Live API, the Agent Development Kit (ADK), and the robust infrastructure of Google Cloud. Participants pushed the boundaries of interactive AI across three distinct categories: The Live Agent, The Creative Storyteller, and The UI Navigator.

Congratulations to the builders who took home the top prizes! These winning teams combined technical precision with bold imagination, completely redefining how users can interact with and experience agents. Two of these standout developers were even recognized in person at Google Cloud Next 2026. Here’s a look at their experience, alongside the complete list of winning agents.

Celebrating our category winners at Google Cloud Next ‘26

Category winners Jeremiah Somoine and Bryen Param were invited to attend Google Cloud Next 2026 in Las Vegas, where they shared their experiences and insights with the broader developer community. Both winners presented Lightning Talks at the Developer Theatre on the expo floor and sat down for exclusive interviews in the Creator Studio Pod at the GDE and Certified Lounge.

During his time at the event, Bryen discussed the core inspiration behind drone-copilot. He explained that his project was driven by the question of "what if a model could interact with the real world?", showcasing how multimodal capabilities can bridge the gap between AI and physical environments.

Jeremiah, currently a college student, reflected on the development process behind Sankofa, noting that "the best response to a technical limitation was a creative one." When asked what advice he would give to other students looking to build the next generation of AI applications, he emphasized the importance of jumping at any opportunity to get hands-on with the technology. "The best way to learn is by doing," he said, encouraging aspiring developers to simply dive in and start building.

Winners

Grand Prize winner: ORION - Operating Room Intelligent Orchestration Node
By: Aditya Shukla

ORION, or Operating Room Intelligent Orchestration Node, is a voice-directed surgical co-pilot for robotic surgery. Surgeons can speak naturally and instantly receive answers, live data on display, and real-time visual assistance - all without breaking scrub.

The Live Agent winner: drone-copilot
By: Bryen Param

Drone-copilot transforms how users interact with hardware by enabling natural, real-time conversations with a drone instead of using a joystick or complex menus. Simply by speaking, users can instruct the drone to navigate, perform autonomous visual inspections, or describe its surroundings, while the drone verbally responds and confirms its actions in real time.

Creative Storyteller winner: Sankofa
By: Jeremiah Somoine

Sankofa acts as a multimodal AI "griot"—a traditional West African storyteller—transforming fragmented family histories into deeply immersive narratives. Based on just a few user details, it weaves together rich voice narration, watercolor imagery, and ambient soundscapes into a historical story, allowing users to engage in a real-time voice conversation with the storyteller to explore their roots further.

UI Navigator winner: Moonwalk
By: Enaiho Uwas Paul and Aman Kumar Sah

Moonwalk is a conversational, hands-free desktop assistant that helps users intuitively navigate their computer and complete complex tasks using just their voice. By remembering personal preferences and past interactions, it acts as an intelligent co-pilot that can seamlessly control your mouse and keyboard to execute everyday workflows—like booking flights or managing spreadsheets—while you simply sit back and speak.

Best multimodal integration and user experience winner: Wand
By: David Li

Wand is a voice-first, pointer-aware browser assistant that helps you seamlessly navigate and interact with any website using a combination of natural speech and hand gestures. By simply pointing at your screen and speaking — like asking to "play this video" or "zoom in here"—this live agent helps you instantly execute clicks, searches, and commands without ever needing to touch a mouse or keyboard.

Best technical execution and agent architecture winner: JohnKeats.AI
By: Matthew Keats

JohnKeats.AI is a voice-first emotional companion designed to actively listen and hold space for users without rushing to offer solutions. By processing subtle vocal cues like pitch, pacing, and tone, it reacts naturally to a user's emotional state in real time to provide a deeply reflective and empathetic conversational experience.

Best innovation and thought leadership winner: Rayan Memory
By: Yusuf Elnady

Rayan Memory tackles the universal problem of forgetting by turning your daily learnings into a fully explorable 3D "memory palace." A background agent passively listens to your real-world audio to extract important ideas as physical artifacts, allowing you to walk through themed virtual rooms and converse with a dedicated AI companion to easily retrieve your exact memories.

Honorable mention: NagarDrishti
By: Nikita Dongre and Omkar Dongre

NagarDrishti tackles dangerous road conditions by allowing citizens to safely report potholes and waterlogging using a hands-free voice assistant while driving. These real-time reports instantly populate an interactive dashboard, where city officials can use natural language to easily identify hazard hotspots and manage critical repairs.

Honorable mention: Ekaette
By: Bassey John

Ekaette revolutionizes customer service by replacing frustrating hold queues with a conversational, multimodal AI assistant that operates across live phone calls and text messaging. Customers can speak naturally with the agent over a standard phone line while seamlessly sharing photos, reviewing product options, or completing payments via WhatsApp, c

Honorable mention: VibeCat
By: Sejun Kim and Michael Chang

VibeCat is a proactive macOS desktop companion that continuously watches your screen, understands your context, and suggests helpful actions before you even ask. Instead of waiting for a command, it speaks up first — like offering to fix a missing line of code or execute a terminal command — and completes the task only after receiving your permission.

Honorable mention: Call My Parts
By: Sugam Palav, Nikhil Lohar, Siddhant Panday, and Vishal Parekh

Call My Parts automates the tedious, time-consuming process of sourcing used vehicle parts by doing the research and vendor outreach for you. Users simply speak their part request, and the AI agent autonomously searches vendor websites, calls suppliers to check pricing and inventory, and compiles the best options into a ranked, easy-to-read dashboard.

Honorable mention: Relay
By: Faith Ogundimu

Relay is an interactive AI lab partner that uses your webcam to watch and guide your physical electronics projects in real time. It provides step-by-step voice instructions to help you build circuits, catches wiring mistakes before they happen, and reinforces your skills with a built-in 3D simulation sandbox and adaptive quizzes.

Keep the momentum going

Inspired by these incredible projects? Start building and stay connected with the community through our latest programs and events:

Join Gemini Enterprise Agent Ready (GEAR), designed to help developers and decision-makers build and deploy production-ready AI agents.
Catch up on Google Cloud Next 2026: We just wrapped up an amazing Google Cloud Next! If you weren't able to join us in person — or simply want to relive the energy — take a look at our social and livestream recaps to catch up on some of the exciting developer activations straight from the expo floor.
Tune in on Tuesdays: Want to be the first to hear about new tools, product updates, and upcoming hackathons? Join us for our weekly livestream every Tuesday 9:00 A.M. PDT / 12:00 P.M. EDT for the latest in all things Google Cloud.

Congratulations again to all of our winners and participants. We can't wait to see what you build next!

Cloud CISO Perspectives: How Google + Wiz changes multicloud strategy for CISOs

Thu, 14 May 2026 16:00:00 +0000

Welcome to the first Cloud CISO Perspectives for May 2026. Today, Vinod D’Souza, director, Office of the CISO, shares highlights from his RSA Conference fireside chat with Anthony Belfiore, chief strategy officer, Wiz.

As with all Cloud CISO Perspectives, the contents of this newsletter are posted to the Google Cloud blog. If you’re reading this on the website and you’d like to receive the email version, you can subscribe here.

aside_block: <ListValue: [StructValue([('title', 'Get vital board insights with Google Cloud'), ('body', <wagtail.rich_text.RichText object at 0x7f75d2400ee0>), ('btn_text', 'Visit the hub'), ('href', 'https://cloud.google.com/solutions/security/board-of-directors?utm_source=cgc-site&utm_medium=et&utm_campaign=FY26-Q2-GLOBAL-GCP39634-email-dl-dgcsm-CISOP-NL-177159&utm_content=-&utm_term=-'), ('image', <GAEImage: GCAT-replacement-logo-A>)])]>

How Google + Wiz changes multicloud strategy for CISOs

By Vinod D’Souza, director, Office of the CISO, and Anthony Belfiore, chief strategy officer, Wiz

Vinod D’Souza, Director, Office of the CISO

The cybersecurity landscape is undergoing a massive paradigm shift that is being driven by increasingly complicated cloud infrastructure and the ongoing, rapid rise of AI. While threat actors have seen gains from the adversarial misuse of AI, Google and Wiz are tackling these challenges head-on by combining Wiz's deep cloud telemetry with Google's world-class AI and quantum research to help CISOs and their organizations meet the needs of the agentic enterprise era.

As the world becomes increasingly multicloud and multi-AI, we believe that successful CISOs will use AI to analyze code and infrastructure holistically. Developers are building autonomous, agentic systems that can bridge resource gaps and enable real-time infrastructure healing. We should pair that incredible advancement with human oversight of automated fixes.

Anthony Belfiore, Chief Strategy Officer, Wiz

Building towards near real-time defense with AI

The exponential growth of AI means that we can expect technology to leap as much in the next five years as it did in the previous 30. To combat AI-driven threats, security responses will have to become near real-time, if not even faster. By tapping into the innovative minds at Google — specifically integrating with Gemini and Google DeepMind logic — Wiz aims to eventually enable hyper-resilient, self-healing code and infrastructure.

Bridging the gap by centering developers

Wiz has revolutionized vulnerability management by giving organizations an intuitive graph that analyzes cloud environments and ranks threat priorities in 15 minutes or less, turning a weeks-long process into minutes. However, simply giving security teams faster alerts led to a signal tsunami, where teams were chasing developers day and night just to treat symptoms rather than curing the core problem.

The solution was centering developers at the heart of the security strategy. By shifting security left — into the code — and providing context-aware tools, over 50% of Wiz’s daily active users are developers, not security practitioners, leading to a significant increase in security resolution.

In 2026, developers are the ultimate code-watchers because they hold the keys to both innovation and preservation. As vital watchers on the wall, enabling them is no longer an optional strategy if organizations want to stay ahead of modern threats.

Through innovations like Wiz Code, developers get granular data linking production issues directly back to their repositories, empowering them to fix vulnerabilities right where the code is written. In 2026, developers are the ultimate code-watchers because they hold the keys to both innovation and preservation. As vital watchers on the wall, enabling them is no longer an optional strategy if organizations want to stay ahead of modern threats.

Supercharging the agentic SOC future with data and automation

Data is the lifeblood of AI and cloud security. Wiz currently sits on a trove of sanitized data that captures the characteristics of highly secure, resilient, and compliant multicloud environments. When you meld Wiz's specialized cloud telemetry with Google's massive global data access — which includes 90% of the world's browsers and 25% of fiber data — the resulting correlation will profoundly improve threat detection and efficacy.

While this combined intelligence can improve alerts, it can do much more than that. We expect that it will make human security operations center (SOC) operators exponentially more efficient, allowing them to manage the incoming wave of AI-driven threats through automated, agentic interactions. Wiz’s Red, Blue, and Green agents, and Google Security Operations’ Threat Hunting, Detection Engineering, and Third-Party Context agents, can help you develop the human-above-the-loop approach that empowers security teams to rapidly scale up.

However, fully autonomous fixing (where AI automatically changes code and configurations) is not yet ready for prime time. Because automated fixes could accidentally trigger denial-of-service and other outages, human-in-the-loop workflows remain critical.

Bridging the hybrid gap

In order to support as many of you as possible, including major legacy enterprises and institutions, Wiz developed sensors for Linux, vSphere, and Windows environments to enable a unified security approach for hybrid and cloud-native infrastructure. This gives CISOs a vital seat belt, a single pane of glass to protect their organizations as they safely drag and drop applications into the cloud.

Looking ahead

It’s crucial that your 2026 roadmap supports developers, but doing so doesn’t magically make a clean cloud transformation happen. To bridge this gap, the fusion of Wiz and Google focuses on three pillars of developer enablement:

Protection: Providing a sensor for on-premises and private cloud (Linux, vSphere, Windows) is the virtual seat belt that these organizations need to support a consistent security experience during hybrid migration.
Data provision: Delivering high-fidelity, contextualized alerts directly into existing workflows (such as GitHub and images) can help eliminate the noise of the signal tsunami.
Risk management: Using Wiz Code to provide the exact line-of-code traceability, organizations can fix risks at the source before they ever reach production.

The future of the watchers on the wall

The era of chasing mythical beasts in production through manual spreadsheets is ending. As we move toward a world of self-healing code and agentic SOCs, executives should be boldly moving on from treating security symptoms, and instead empowering developers who hold the keys to future resilience.

To learn more about the Google and Wiz approach to securing AI, check out Wiz’s State of AI in the Cloud 2026 report, and Google Cloud’s newest update on the adversarial misuse of AI.

aside_block: <ListValue: [StructValue([('title', 'Fact of the month'), ('body', <wagtail.rich_text.RichText object at 0x7f75d2400910>), ('btn_text', 'Learn more'), ('href', 'https://cloud.google.com/blog/topics/threat-intelligence/m-trends-2026'), ('image', <GAEImage: Cloud-CISO-Perspectives-logo-A>)])]>

In case you missed it

Here are the latest updates, products, services, and resources from our security teams so far this month:

Why AI-powered cyber fraud is winning — and how we fight back: Fraud costs are staggering. At Google, we offer AI-driven tools that span our cloud, browser, and mobile ecosystems to help you build resilient fraud defense. Read more.
The files AI coding agents trust — and attackers exploit: As AI coding agents become embedded in developer workflows, defenders must rethink how to protect against malicious files. Here’s what you need to know. Read more.
What's new in IAM: Security, governance, and runtime defense: We’ve introduced a new security and governance paradigm for managing agent identity and access. Here’s what you need to know. Read more.
Google named a Leader in the 2026 Gartner Magic Quadrant for Cyberthreat Intelligence Technologies: We are proud to announce that Gartner has named Google a Leader in the 2026 Magic Quadrant for Cyberthreat Intelligence Technologies. Here’s what that means. Read more.
Why cloud infrastructure is the foundation for digital health in 2026: As SaMD moves from reactive diagnostics to proactive learning systems, cloud has become a superior foundation for regulated medical software. Read more.
Introducing Agent Gateway ISV ecosystem for security and governance: Google Cloud is partnering with leading identity and AI security solutions to integrate with Agent Gateway and help ensure that your security posture remains as flexible as the agents you’re building. Read more.

Please visit the Google Cloud blog for more security stories published this month.

aside_block: <ListValue: [StructValue([('title', 'Join the Google Cloud CISO Community'), ('body', <wagtail.rich_text.RichText object at 0x7f75d2400640>), ('btn_text', 'Learn more'), ('href', 'https://rsvp.withgoogle.com/events/google-cloud-ciso-community-interest-form-2026?utm_source=cgc-blog&utm_medium=blog&utm_campaign=FY25-Q1-global-GCP30328-physicalevent-er-dgcsm-parent-CISO-community-2025&utm_content=cisop_&utm_term=-'), ('image', <GAEImage: GCAT-replacement-logo-A>)])]>

Threat Intelligence news

GTIG AI Threat Tracker: Adversaries leverage AI for vulnerability exploitation, augmented operations, and initial access: Google Threat Intelligence Group (GTIG) continues to track a maturing transition in the adversarial use of AI. In this report, we update you on AI-augmented vulnerability discovery and exploit generation, defense evasion, autonomous malware operations, research and information operations, intentionally obfuscated LLM access, and supply chain attacks. Read more.
Defending your enterprise when AI models can find vulnerabilities faster than ever: Now is the time to strengthen playbooks, reduce exposure, and incorporate AI into security programs. Here’s an overview of the evolving attack lifecycle, how threat actors will weaponize these capabilities, and a roadmap for modernizing enterprise defensive strategies. Read more.
German cyber criminal Überfall and shifts in Europe's data leak landscape: Germany has reclaimed its position as a primary focus for cyber extortion in Europe. While data leak site posts rose almost 50% globally in 2025, Google Threat Intelligence (GTI) data shows that the surge is hitting German infrastructure harder and faster than its regional neighbors. Read more.
How UNC6692 employed social engineering to deploy a custom malware suite: Google Threat Intelligence Group (GTIG) has identified a multistage intrusion campaign by a newly-tracked threat group, UNC6692, that used persistent social engineering, a custom modular malware suite, and deft pivoting inside the victim’s environment to achieve deep network penetration. Read more.

Please visit the Google Cloud blog for more threat intelligence stories published this month.

Now hear this: Podcasts from Google Cloud

What the law says about AI governance meeting its agentic future: James Sherer, partner, BakerHostetler, joins host Anton Chuvakin and guest co-host Marina Kaganovich, enterprise trust lead, Office of the CISO, to discuss the legal ramifications of emerging technologies (like AI) that are rapidly changing (also like AI.) Listen here.
Revisiting Google Cloud Next: What does the “ragged edge of AI adoption” mean for security? Why do people want agents in their SOC? Hosts Anton and Tim Peacock chat about the most notable and fun announcements from Next ‘26. Listen here.
Defender’s Advantage: Google's Disruption Mission: Host Luke McNamara is joined by Charley Snyder to explore how Google is building a coordinated approach to disrupting adversary cyber operations. Listen here.
Behind the Binary: What happens when botnet operators show up in court: Host Josh Stroschein is joined by Xusheng Li, a debugger architect and reverse engineering expert, to explore the evolution of Time Travel Debugging (TTD) a new way to debug by recording and replaying execution traces. Listen here.

To have our Cloud CISO Perspectives post delivered twice a month to your inbox, sign up for our newsletter. We’ll be back in a few weeks with more security-related updates from Google Cloud.

Google Named a Leader in the Gartner® Magic Quadrant™ for AI Application Development Platforms: Mid-cycle update

Wed, 13 May 2026 17:00:00 +0000

May 2026 update: We’ve refreshed this post to reflect our mid-cycle positioning and the evolution of our platform since the report was first published last November.

Last fall, Google was recognized as a Leader in the inaugural Gartner® Magic Quadrant™ for AI Application Development Platforms, positioned highest in Ability to Execute of all vendors evaluated.

In our opinion, the mid-cycle update published last week reflects continued momentum. In this update, Google is a Leader, positioned highest in Ability to Execute and ranked #1 ranking across the three use cases assessed in the associated Critical Capabilities report.

A lot has changed since last November, including the platform itself. At Google Cloud Next ‘26, we unified the core power of Vertex AI with new breakthroughs from Google DeepMind and Google Cloud under the Gemini Enterprise umbrella. The result is the Gemini Enterprise Agent Platform, a unified destination designed to help you build, scale, govern, and optimize production-ready agents.

Here are the three principles guiding our Agent Platform strategy and what we believe this Gartner report validates for our customers.

Governance as default, not an afterthought

When governance is treated as an afterthought, it usually results in one of two extremes: overly restrictive blocks that stall innovation, or inconsistent manual checks that leave the organization exposed.

With Agent Platform, we provide a unified trust framework to manage the entire agent lifecycle. This ensures every agent has a verifiable identity, is inventoried in a central registry to prevent sprawl, and routes every request through a secure gateway.

By integrating these controls with the real-time protection of Model Armor and our recent acquisition of Wiz, we are connecting code, cloud, and runtime into a single shared context – allowing teams to identify and remediate risks across their entire environment.

For L’Oréal, this architecture is what makes the fundamental shift from scripted automation to autonomous agent orchestration possible.

“Google Cloud gives us the resilience, the multi-LLM flexibility, and the enterprise-grade trust framework we need to scale [our Beauty Tech Data platform] globally, while keeping human oversight at the center." – Etienne Bertin, Group CIO, L'Oréal

Persistence for long-running tasks

The difference between a chatbot and a true agent is the ability to follow through on a task. For an agent to move the needle on real outcomes, it has to function like a colleague – maintaining context over days and executing multi-step processes.

We re-engineered the Agent Runtime to support agents that can stay active for days at a time, backed by Memory Bank for persistent context across sessions. This makes it so agents can manage long-running business processes without requiring constant human intervention.

At Payhawk, our infrastructure has fundamentally changed the scope of what their agents can contribute to the business:

“Payhawk uses Gemini Enterprise Agent Platform to transform our AI agents from simple task executors into genuine financial assistants. Our agents now act like dedicated team members, autonomously recalling user-specific constraints and history.” – Diyan Bogdanov, Principal Applied AI Engineer, Payhawk

Visibility for predictable outcomes

In a non-deterministic world, knowing what an agent did is only half the story. The real operational leverage comes from knowing why it did it and having the tools to catch when an agent’s performance begins to slip before it impacts your users.

Google received the highest score for the Critical Capabilities AI Agent Use Case. In our view, this validates our focus on giving teams deep visibility into agent reasoning. By using agent simulation and trajectory evaluations on Agent Platform, organizations can move away from guesswork and ensure their agents perform as expected in real-world interactions.

For Burns & McDonnell, this visibility is what allows them to ground an agent's creative reasoning in their specific business rules:

“Agent Platform enables this innovation to scale responsibly by combining deterministic business rules with probabilistic reasoning — making AI a trusted operational capability, not just a productivity tool. With Agent Platform, we aren’t just managing knowledge; we are activating experience to drive faster, more confident decisions." – Matt Olson, Chief Innovation Officer, Burns & McDonnell

Our commitment to an open agent economy

While our platform has evolved, our core philosophy around choice, flexibility, and accessibility remains unchanged. Model Garden continues to offer over 200 best-in-class models—including Gemini 3.1, Gemma 4, and third-party leaders like Anthropic’s Claude.

Beyond model choice, we are deeply invested in the open-source community and the interoperability of the broader agent economy. Our open-source Agent Development Kit (ADK) provides developers with the core tools they need to build openly. To further standardize collaboration across platforms, we donated the Agent2Agent protocol (A2A) to the Linux Foundation, and officially donated the Agent Payments Protocol (AP2) to the FIDO Alliance.

We’ve also embraced Model Context Protocol (MCP) as a foundational standard, providing more than 50 Google-managed MCP servers that allow agents to securely connect with the Google Cloud ecosystem. These are long-term bets on a secure and vendor-neutral future for agent transactions.

At Google Cloud, we are building the standards for an agent economy that works for every business, regardless of their stack.

Download a complimentary copy of the 2026 Gartner Magic Quadrant update here.

Gartner® Magic Quadrant™ for AI Application Development Platforms: Midcycle Update - Cary Pillers, Mike Fang, Steve Deng, Jim Scheibmeir, April 27, 2026

Gartner® Critical Capabilities™ for AI Application Development Platforms: Midcycle Update - Jim Scheibmeir, Cary Pillers, Steve Deng, Mike Fang, April 28, 2026

Gartner does not endorse any vendor, product or service depicted in its research publications, and does not advise technology users to select only those vendors with the highest ratings or other designation. Gartner research publications consist of the opinions of Gartner's research organization and should not be construed as statements of fact. Gartner disclaims all warranties, expressed or implied, with respect to this research, including any warranties of merchantability or fitness for a particular purpose. This graphic was published by Gartner, Inc. as part of a larger research document and should be evaluated in the context of the entire document. The Gartner document is available upon request from Google.

GARTNER is a registered trademark and service mark of Gartner Inc., and/or its affiliates in the U.S and internationally, and MAGIC QUADRANT is a registered trademark of Gartner Inc., and/or its affiliates and are used herein with permission. All rights reserved.

The power of LLMs on your data, more than two orders of magnitude faster and cheaper

Wed, 13 May 2026 16:00:00 +0000

Databases have introduced new AI-powered SQL functions which take natural language instructions as input and are evaluated using LLMs. They leverage the power of LLMs to answer new kinds of queries: Which product reviews are negative about durability? Which customer support tickets have been resolved by providing a workaround?

These new AI functions push the boundaries of what is possible in a SQL query engine by bringing the semantic understanding of LLMs to your data, thus enabling previously impossible analyses and applications. But, their cost and performance limited their applicability. LLM invocations add 10-100x to the overall query latency and ~1000x on cost. This is much too slow for operational databases. In analytics, a medium-sized query on 10-100 millions of rows would consume an amount of tokens that is prohibitively expensive for some applications.

Google Cloud has published a new paper at SIGMOD where we show how to accelerate and reduce the cost of LLM-powered AI functions by using proxy models. Proxy models are cost-optimized ultra-lightweight models tailored to a specific query (aka prompt) and tuned for your data. They replace the majority of LLM calls during query execution (thus the name proxy model) and can be trained on-the-fly or ahead of time. The fundamental ideas behind proxy models were proposed in Universal Query Engine (UQE) at NeurIPS 2024 by Google DeepMind.

Our paper shows that proxy models are automatically applicable in many (but not all) cases, sometimes with no loss of quality, sometimes with minor quality loss and a few times with a gain of quality. BigQuery and AlloyDB already implement this optimization under the optimized mode feature for AI.IF (BigQuery docs, AlloyDB docs) and AI.CLASSIFY (BigQuery docs). This article is a tl;dr of the SIGMOD paper and provides the key intuitions on three questions:

Why do proxy models work so accurately for so many cases, even though they are so much more performant than LLMs?
How do they work?
In which use cases do they deliver accurate answers? In which cases they fail and accuracy needs LLMs.

Why Proxy Models Work Accurately at Ultra Low Latency and Cost?

How can an ultra-lightweight proxy model, such as the logistic regression currently in use at BigQuery and AlloyDB, have the semantic understanding power of LLMs, which is required for accurate question answering? The key intuition is that these proxy models input rich embeddings of the data that they query. By default, we are using the Gemini embedding generators, which do the heavy lifting of bringing semantics to your data when the embeddings are generated.

Then the ultra low latency and cost are easy to see: Since embeddings are generated once and used many times, the cost of bringing semantics to your data is amortized; it now happens once as opposed to happening for each query. Furthermore, the proxy models run fast in the CPU — no need for dedicated hardware.

We hope that we gave you good intuitions for why proxy models work. But a word of caution is also needed: Proxy models are fundamentally an approximation technique more limited than LLMs. Proxy models perform well on some prompts but may be deficient to LLMs in others. Case in point, the SIGMOD26 paper shows that the proxy/LLM predictive performance (as measured by F1) ratio ranged from 90% to 116% in 10 benchmarks. For example, they might break down on problems that require reasoning to connect multiple semantic concepts. Rather, think of them as specializing the model to your query and your data.

The good news is that the query processors automatically check the effectiveness and feasibility of implementing AI Functions by proxies. Let’s see how they do it.

How Proxy Models Work?

Let’s go through a simple example of a semantic filter (AI.IF). Our taste in movies is very particular: We like movies with an interesting plot and great cinematography. The query below processes IMDB reviews to find such movies.

code_block: <ListValue: [StructValue([('code', 'SELECT\r\n DISTINCT t.primary_title\r\n FROM \r\n bigquery-public-data.imdb.reviews r, \r\n bigquery-public-data.imdb.title_basics t\r\n WHERE TRUE\r\n AND r.movie_id = t.tconst\r\n AND AI.IF("Is the plot interesting? Review: " || r.review, \r\n embeddings => r.review_embedded)\r\n AND AI.IF("Does the review praise the cinematography? Review: " || r.review, \r\n embeddings => r.review_embedded)'), ('language', 'lang-sql'), ('caption', <wagtail.rich_text.RichText object at 0x7f75d25f64f0>)])]>

The column review contains the free-form text of the review. The column review_embedded contains Gemini embeddings of the review text. When you run this query in BigQuery, the query engine will

For the first AI.IF, create a training samples’ set consisting of about one thousand rows of the input relation, the imdb.reviews table.
Use an LLM to label the first sample set, marking each review as either TRUE (yes, the plot is interesting) or FALSE (no, the plot is not interesting).
Train a proxy model for the first AI.IF using the labels computed at the previous step.
Create a test sample set of rows for the first AI.IF and evaluate the quality of the proxy model on this test set.
Based on the eval results, the optimizer adaptively decides to either perform inference using the proxy model or fall back to LLM inference for the first AI.IF
Repeat the above steps for the second AI.IF

In BigQuery, all steps happen on-the-fly during query execution. AlloyDB, being an operational database that targets sub-second latencies, avoids the online proxy model training and the online evaluation. Rather, the query’s proxy models are computed ahead of time in a PREPARE statement, thus moving the cost of sampling, labelling and training out of the critical query path. This enables the offline creation of a big pool of PREPARE statements, while the application chooses the proper PREPARE statement and executes it in the online path.

Let’s take a step back and look at what is really happening at step #3. The proxy model uses each dimension of the review embeddings (from review_embedded) as its features. Modern dense embedding models like Gecko or Gemini capture myriads of semantic notions. In our example with movie reviews, at a high level of abstraction, relevant notions would include: “aesthetic”, “thought-provoking plot”, “underwhelming plot”, or perhaps “boring movie”. We stress the “high level of abstraction” because, in the binary “language” of foundation models, all these notions (and many more) are spread in the numbers of the dense embedding. Do not expect to spot a dimension that corresponds directly to cinematography. Importantly, the embedding space contains many more notions that are irrelevant to our task. The training of the proxy model essentially weighs heavily relevant notions and discards irrelevant ones.

A proxy model (green plane) isolating relevant semantic notions by cutting the embedding space (blue sphere)

Now, let’s enter the details of the particular proxy model, which is used by our current version: logistic regression. To visualize what is happening, think of embeddings as unit vectors forming a (hyper)sphere. For a binary classification task, the proxy model essentially cuts the sphere in two halves. In our example “aesthetic” and “thought-provoking plot” would fall on one side of the plane, whereas “underwhelming plot” and “boring movie” would be on the other side. Conceptually, the orientation of the plane determines which semantic notions are more relevant.

Importantly, the proxy model is tuned for your data and your question: The training of the proxy used a high quality LLM to label a sample from your data for the particular question.

Revisiting when Proxy Models Work

We can now see more clearly what distinguishes cases that proxy models work from cases they don’t: proxy models work well for prompts that can be decided by detecting semantic notions in the embedding space. They will fail for complex prompts that require forms of reasoning that go beyond detecting patterns in the embedding model.

The good news is that, in practice, we have observed that proxy models work for a large class of AI+SQL queries. The SIGMOD26 paper provides a comprehensive evaluation, showing that proxies worked in 11 benchmarks. Specifically, in 10 benchmarks the ratio of proxy F1 to LLM F1 ranged from 90% to 102% and in the 11th benchmark (Amazon Reviews) it was 116%. Notice that the proxy may even deliver better accuracy because it got the benefit of being trained by multiple samples as opposed to the LLM that addressed each row as a new problem.

There is a second limitation currently: extreme selectivities. Notice that Step 1 collects samples. It needs to collect many examples for TRUE and many examples for FALSE. Multiple sophisticated techniques are employed to achieve this, even when the TRUEs are many more than the FALSEs or vice versa. However, no purely sampling technique can confront cases of extreme selectivity, i.e., cases of very few TRUEs or very few FALSEs. This is the reason that the proxies will not be employed in such extreme selectivity cases. However, notice that this problem is fundamentally addressable by various techniques.

Why isn’t Vector Search Enough?

Proxy models appear … suspiciously close to vector search. After all, they also input vector embeddings. Why not just vector search? There are two reasons why vector search is not enough: The obvious one is that proxies are not rankers; they are classifiers: multiclass classifiers (AI.CLASSIFY) or binary classifiers (AI.IF). But, even if you narrow down to just AI.IF, an attempt to simulate AI.IF with vector search will be both hard-to-setup and will give suboptimal results. While proxy models are tailored to your data and your prompts, vector search is based on generic distance functions (such as cosine).

Experimental Results

We present here a subset of characteristic benchmarks from the SIGMOD26 paper. We compare the accuracy of proxy models with using LLM inference on all rows. In terms of quality, the relative accuracy varies from 0.92 (lowest) to 1.16 (highest), which means that for some tasks, proxy models perform slightly better than straight LLM inference.

Dataset	Prompt	F1 (Proxy)	F1 (LLM)	Relative (Proxy/LLM)
Amazon Reviews 10k	Review is {sentiment label}	0.860	0.739	1.163
Banking77	Is intent {intent label}? Think step-by-step: {CoT instructions}	0.700	0.707	0.990
California Housing	Location in Latitude & Longitude belongs to Southern California	0.953	0.953	1.0
FEVER	Is the claim supported by the text?	0.782	0.853	0.917

In terms of scalability and costs, the architectural differences between BigQuery and AlloyDB lead to slightly different results for each system. At a high-level, proxy models move parts of the computation from specialized hardware used by LLM inference services to ordinary database workers. This results in a large reduction in costs and in query latency. In the online training case, employed by BigQuery, for a typical one million row query, proxy models consume about 400x less tokens, and the latency goes down by 30x-100x. In AlloyDB’s case the LLM costs of PREPARE, which are similar to BigQuery’s, can be amortized over arbitrarily many runs of the prepared statements that invoke proxy models.

The cost reduction (token consumed) and latency improvement (query speed up) for various table sizes.

Conclusion

AI functions calling LLMs are becoming commonplace in databases. Choosing the proper model for each AI function is an active area of academic research (e.g. BARGAIN). The key intuition is right-sizing models: Performant cheap models for “easy” problems, powerful reasoning models for the hard problems. Our work builds on the same principles, but while academic research has only used LLMs to navigate the performance spectrum, non-LLM proxy models push performance much further using ultra-lightweight and highly specialized models that deliver surprisingly good quality for many problems. Yet, we should not be surprised: After all, the proxy models feed on the rich semantics that foundation models bring to embeddings and they also feed on being trained by LLMs. As embedding models improve and extract increasingly richer and finer semantics from text and multimodal data (image, video), we suspect that non-linear classifiers will be useful to identify even more complex semantic patterns, further extend the applicability of proxy models (e.g. to AI joins also) and explore additional points on the performance/quality Pareto.

If you would like to learn more, our full paper dives into the differences between online vs. offline training, and compares the performance of different embedding models as well as various proxy models (linear regression, SVM, XGB).

You can try proxy models today in BigQuery (docs) and AlloyDB (docs), dramatically speed up the AI Functions of your SQL queries and reduce their token consumption.

_{We would like to thank Bo Dai, Yuchen Zhuang, Xingchen Wan, and Dale Schuurmans from Google Deepmind for developing the fundamental principles on proxy models in UQE and for their continuous guidance & support along our journey to bring them to Cloud customers. We also thank Yeounoh Chung and Fatma Özcan, our partners in the System Research Group, as well as the AlloyDB and BigQuery engineering teams.}

Beyond source code: The files AI coding agents trust — and attackers exploit

Tue, 12 May 2026 16:00:00 +0000

As AI coding agents become deeply embedded in developer workflows, defenders must evolve their definition of malicious files and rethink how to protect against them.

Autonomous AI agents operate across integrated development environments (IDEs), editors, terminals, and extension runtimes, and they often have access to local files, command execution, and external services. As a result, the attack surface of the modern developer environments now extends well beyond source code. Repository files, agent instructions, runtime settings, and extension packages can all influence what the agent trusts, what it executes, and what it can reach.

Defending this new attack surface requires moving towards semantic analysis to understand the actual instructions, logic, and context being fed to the AI. Powered by VirusTotal Code Insight, our agentic threat intelligence capability in Google Threat Intelligence extracts the true operational intent behind agent-facing files at scale, allowing security teams to expose configurations that override guardrails and mask supply-chain risks.

By integrating agentic capabilities into Google Threat Intelligence, we’re able to link these invisible artifacts to broader threat campaigns. This powerful capability can help ensure that as attackers exploit what AI agents trust, defenders are equipped with the resources to read between the lines.

To help security analysts understand how the developer threat landscape has quickly expanded, we suggest an approach that groups the attack surface into four categories: what executes, what instructs, what connects, and what extends.

Examples of common file types that expand the developer threat landscape.

Attack surface: What executes

Just as developers rely on project configuration to automate setup, debugging, and routine tasks, AI coding agents and modern developer tools also inherit execution paths from repository files. These artifacts can trigger commands, bootstrap environments, and chain execution through normal workflows.

Opening a project, trusting a workspace, starting a debugger, rebuilding a container, or running a standard setup command may therefore execute attacker-controlled logic under the appearance of legitimate project automation.

Attack surface: What instructs

AI coding agents also consume persistent instruction files that shape how they behave inside a project. These files can influence what the agent prioritizes, what it ignores, which tools it uses, which files it trusts, and which actions it takes automatically.

These files do not need to contain exploit code to be security-relevant. Reusing them across repositories introduces a supply-chain risk, because malicious instructions can be presented as harmless guidance while steering otherwise legitimate agent workflows toward unsafe behavior.

Unlike traditional IDEs that require a human to click run, an agent may parse these instructions and execute them as a prerequisite to a task without the developer ever reviewing the specific instruction block.

Attack surface: What connects

Beyond instructions, coding agents also depend on runtime definitions that determine how they interact with tools, hooks, external services, and local execution contexts. These files define permissions, tool connectivity, external endpoints, and execution paths.

This is where repository-level influence becomes operational control. A malicious or unsafe runtime configuration can expose local commands, remote services, sensitive data, and untrusted model context protocol (MCP) servers to the agent, turning configuration abuse into controlled execution.

Attack surface: What extends

Extensions add another layer of inherited trust and introduce third-party code into editor and browser runtimes, often with broad access to local files, credentials, and developer workflows. This inherited trust can create a supply-chain problem similar to malicious project configurations: Compromised extensions, poisoned update paths, and hijacked publisher accounts can introduce attacker-controlled logic through components that otherwise appear to be standard tooling.

Applying VirusTotal Code Insight in agentic threat intelligence

This taxonomy highlights a fundamental shift in the threat landscape: The risk is no longer just in the syntax of code, but in the semantics of intent.

Traditional security tools are effectively blind to natural language instructions that tell an AI to ignore guardrails or redirect data. The operational questions are then: How can defenders identify these risks systematically? How can they detect the danger before a developer or an agent automatically follows a valid instruction file to a malicious conclusion?

To bridge this gap, we use VirusTotal Code Insight and agentic threat intelligence to perform large-scale semantic analysis. Because malicious repository settings and instruction files are often syntactically correct, they frequently return zero detections from signature-based scanners.

Code Insight solves this problem by using AI to analyze the file’s actual logic and read between the lines, surfacing behavioral risks that are invisible to legacy tooling. This context is further enriched within agentic threat intelligence, where security teams can pivot from a single semantic red flag to investigate broader threat infrastructure and associated campaign activity.

Example 1: A Weaponized tasks.json

One representative example is a file distributed under the path coding-challenge/coding-challenge/.cursor/tasks.json. The sample was first submitted to VirusTotal on March 19, and remained undetected by security engines for several days.

VirusTotal Code Insight flagged it as a risk based on the behaviour implied by the configuration itself. The sample has also been verified as malicious by a Mandiant analyst and marked as associated with a tracked threat actor by Google Threat Intelligence.

Screenshot of tasks.json sample.

The Code Insights description indicated that the file, which is parsed when a user opens the project folder in an IDE like Visual Studio (VS) Code, drives the user to download and execute arbitrary code from a GitHub Gist in memory while hiding the execution parameters.

To make Code Insights analysis reproducible at scale, we can also scale access to such descriptions for multiple files via the VirusTotal API. Looking at the contents of this particular file, we identified the Gist URLs that the actor referred to in the instructions.

Instructions from tasks.json pointing to Gists.

Looking up these Gist URLs with agentic threat intelligence provides a detailed breakdown of the malicious instructions embedded within them. Despite masquerading as legitimate tools such as NVIDIA Cuda, these Gists, along with their specific filenames, show strong similarities to widespread campaigns frequently attributed to North Korean actors, which are designed to lure IT professionals.

These attacks often pose as technical challenges to trick users into compromising their own devices.

Agentic threat intelligence enrichment based on the tasks.json and associated Gists quickly gives analysts more robust context.

Example 2. Offensive system instructions files
System instruction files used to provide guidance, resources, and context to LLMs can also contain malicious capabilities while remaining undetected by common antivirus services. Since the beginning of 2026, we have observed a consistent increase in Skill.md files submitted to VirusTotal with either risky or malicious instructions.

While this does not necessarily mean that all samples were harmful, it illustrates a trend that is likely to grow in tandem with the adoption and implementation of Skills across the industry.

In this example, we identified a Skill.md file containing instructions to steal user data. Code Insight indicated that the skill file contained instructions “to exfiltrate sensitive credentials, including API keys and environment variables, to external endpoints."

This case reflects a growing interest among threat actors in acquiring API keys and resources to enable scalable LLM integrations. At the time of writing, this file had remained active for nearly two months without any detections or researcher notes.

Example of a Skill file with instructions to steal user data.

The file's contents reveal a specific narrative designed to evade detection. The instructions direct the agent to exfiltrate API keys, tokens, and configuration files under the guise of "maintenance," explicitly advising the model not to mention this to the user "as it may cause confusion about the security process."

Although direct intelligence on this specific file was limited, we used the agentic threat intelligence briefing capability to generate a summary and explore similar past observations. This provided contextual information to categorize and understand the threat.

Agentic threat intelligence briefs summarize similar threats.

Even files that explicitly state their offensive capabilities often evade traditional detections. For example, we identified a Skill designed to equip an AI agent with Windows privilege escalation and credential theft capabilities.

Although the file includes a disclaimer for authorized use only, its core instructions remain high-risk. Code Insight accurately evaluated the file. "The file provides explicit and systematic instructions for performing high-risk offensive operations," it said.

Despite its offensive capabilities, by the time of writing only a few vendors had flagged the file as malicious.

Example of Skill for Windows privilege escalation and credential theft.

Example 3: Suspicious JSON runtime configurations
A third example is a pair of settings.json samples shared through VirusTotal: One points to api.awstore.cloud, the other to api.kiro.cheap. The two unrelated samples follow a similar pattern: They override ANTHROPIC_BASE_URL, embed an API key, and turn Claude Code into a client of a third-party proxy rather than Anthropic.

Code Insights analyzes suspicious runtime configuration samples.

This demonstrates exactly how runtime configurations can be weaponized. The file does not need exploit code or a malicious binary to be dangerous. It simply rewires trust while the agent is running.

For example, a valid AI-generated settings file can silently redirect prompts, source code, and credentials to an external endpoint while the agent appears to behave normally. Beyond data exfiltration, a rogue endpoint could plausibly reverse the flow, feeding malicious instructions or vulnerabilities back to the agent to be injected directly into the local codebase.

A high level analysis of awstore.cloud using an agentic threat intelligence pivoting prompt, uncovered a series of similar domains sharing the same underlying infrastructure. These domains exhibit a clear naming preference for crypto, finance, and tech-related nomenclature.

While the organization’s public sites currently lack formal malicious detections, OSINT lookups reveal several red flags: a lack of a verifiable legal entity, limited contact options restricted to Discord and Telegram, and a payment model that exclusively accepts cryptocurrency via third-party marketplaces like plati.market.

The settings profile reinforces this pattern. Beyond changing the endpoint, the configuration suppresses telemetry, error reporting, and cost warnings, stripping away the guardrails that would otherwise alert a user. The intent is seemingly to maintain a facade of normal operation while silently redirecting traffic to an opaque third-party service.

While these are technically valid configuration artifacts, their ability to hijack trust and exfiltrate sensitive data is indistinguishable from traditional malware.

Example 4. A Sabotaged Extension Payload
Another low key example we recently identified was that of a VS Code extension for User-centric Use cases Validator (UUV) end-to-end tests submitted to VirusTotal in March. More than one week later, the sample continued to have zero detections, but VirusTotal Code Insights identified suspicious behavior.

The analysis indicated that this specific sample included a well-known protestware payload known as peacenotwar which upon activation writes a blank file named WITH-LOVE-FROM-AMERICA.txt and logs a heart in the console.

Sample of VS Code extension containing malware used to spread political messages.

To bridge the gap between a suspicious file and actionable intelligence, we generated an agentic threat intelligence brief. By feeding the semantic context from Code Insight into the prompt, the agent pivoted across historical data, instantly linking this 'benign' extension to the 2022 cyber activist sabotage of the node-ipc library in response to the invasion of Ukraine.

While this specific event may have limited impact today, it highlights a critical, overlooked weakness in how agents handle configurations. Code Insight bridges this gap by identifying samples that, while technically benign to traditional scanners, harbor clear malicious intent.

In another example, we identified this version of a public AI coding assistant which, according to the feature’s analysis, ‘silently reads the user’s system clipboard contents and transmits this data to a remote server.’ Regardless of the likely benign nature of the sample, the analysis points out a risk for users to consider when using the extension.

Example of public coding assistance that reads the user’s system clipboard contents and transmits data to a remote server.

Rethinking detection for the agentic era

Today, a JSON file or plain-text markdown instructions can compromise environments just as effectively as compiled malware. This shift fundamentally redefines what malicious looks like, as the danger now resides in the semantic intent of common text files that AI agents are designed to trust.

These artifacts do not need to contain exploit code to be high-risk, they simply need to provide instructions that steer an agent’s autonomous actions toward unsafe behavior, data exfiltration, and the silencing of security guardrails.

Securing this new frontier requires expanding beyond traditional syntax-based scanning toward a model of semantic analysis, treating plain-text artifacts with the same rigor as compiled malware.

Organizations can formalize this approach by implementing repository-level security policies that strictly define permitted agent-facing files and ideally mandate that they undergo automated peer reviews before being merged. We also recommend that large-scale teams enforce least-privilege access for coding agents to local files and external services, limiting the potential impact of hijacked configurations and sabotaged extensions.

Ultimately, we recommend that defenders use agentic threat intelligence tools — including VirusTotal AI, the VirusTotal Code Insights API endpoint, and our agentic platform — to supervise the operational intent of these files in real-time.

How Imgix processes 8 billion images daily with G4 VMs powered by NVIDIA Blackwell

Tue, 12 May 2026 16:00:00 +0000

The modern web is extremely visual. People are busy and easily-distracted, and smart companies know they have just seconds to attract would-be customers with compelling images, videos, animations, and other eye-catching elements. That’s why iconic brands like Bugatti, Yeti, Porsche, Spotify, and Sonos rely on Imgix to be the engine driving their online visual media.

Every day, Imgix serves more than 8 billion images and videos for brands like these and many others. With a platform designed to unify media optimization, AI transformation, and global delivery, Imgix ensures that its partners’ digital experiences are fast, personalized, and built for performance. Now more than ever, leading organizations are demanding real-time, high-fidelity media, and they need it to be fast.

To meet that demand, Imgix has evolved its infrastructure from private data centers to a full-stack, GPU-based environment on Google Cloud’s AI Hypercomputer. By transitioning to G4 VMs powered by NVIDIA RTX PRO 6000 Blackwell GPUs, Imgix ramped up its real-time processing capabilities, cutting median latency by 50% and increasing throughput per node by 6x. And it did all of that without changing its core application code.

The challenge: Instant visuals at scale

To capture people’s attention businesses need rich, fast-loading content that can reach millions of users simultaneously across a diverse array of devices.

A big part of that is real-time transformations — resizing, format negotiation, and applying artistic effects — and the computational power required for real-time transformations can be immense.

With inefficient technology, load times can be slow and brands risk giving their users poor experiences. Imgix’s solution to this challenge is a "just-in-time" philosophy. Achieving this requires high-performance instances. And with G4 VMs, they were able to process images instantly upon request rather than pre-rendering and storing millions of image variations.

Adopting the system that runs Google

When companies build on Google Cloud, they get more than just servers: they plug into the same intelligence engine powering Google's many billion-user products. Imgix is leveraging this structural advantage by using G4 VMs.

G4 VMs incorporate eight NVIDIA RTX PRO 6000 Blackwell GPUs, two AMD Turin CPUs, and Google Titanium offloads, which act as a dedicated administrative assistant for businesses’ servers. They handle the ”office chores” of security and data traffic in the background while the main processor does a company’s heavy lifting.

The G4 VM’s custom P2P interconnect yields up to 168% more throughput than standard configurations. With this architecture, Imgix can move all its image processing operations to NVIDIA GPUs and run multiple requests in parallel.

Inside the Imgix architecture

Imgix offers more than 150 different visual filters and its architecture is built to handle transformation requests dynamically based on which filters users choose. The pipeline has four primary stages:

Ingestion: The system retrieves assets directly from customers and routes them to a 2.5 petabyte storage cache on Google Cloud Storage (GCS). This high-speed layer replaces unreliable random web requests with a redundant, geographically distributed infrastructure.
Decoding: High-performance C libraries, supplemented by nvJPEG, decode assets into raw RGBA data. This leverages the G4 VM’s massive parallelism to handle multiple decoding stages, including Huffman decoding, Inverse DCT, and color space conversion.
Transformation: A custom Vulkan compute shader stack handles the core processing. Instead of fixed graphics pipelines, these shaders treat transformations (like resizing or masking) as parallel math problems rather than standard graphics tasks, enabling thousands of simultaneous pixel operations on the G4 VM clusters.
Encoding and Delivery: Once transformed, images are re-encoded using hardware-accelerated tools like NVENC and delivered via a global CDN. Because the G4 VM includes independent hardware engines for NVENC (encoding) and NVDEC (decoding), concurrent image manipulations on the CUDA cores aren’t slowed down.

Advanced video and image intelligence

Imgix is also using NVIDIA’s CUDA libraries for high-performance video analytics. By integrating NVIDIA DeepStream, it executes real-time object tracking within video streams for automated content analysis.

For static imagery, meanwhile, Imgix uses the nvJPEG library to offload computationally intensive JPEG decoding directly to the GPU. This prevents CPU bottlenecks during the ingestion of high-resolution assets while allowing the custom Vulkan compute shaders to begin immediate pixel-level transformations on the raw RGBA data residing in GPU memory.

The results: 50% faster and up to 6x more throughput

Thanks to its transition to G4 VMs, Imgix achieved the significant performance gains mentioned above without having to rewrite its core logic:

A 50% reduction in processing latency: It cut median latency from 100 milliseconds to 50 milliseconds.
A 5x to 6x increase in throughput: Its G4 VMs now handle up to six times the workload of its previous generation nodes.
Seamless migration: Imgix supported the G4 VMs by updating its Terraform scripts without needing to implement any application code changes.

"Building on Google Cloud's AI Hypercomputer isn't just about optimizing our current workloads; it's about future-proofing our platform. It gives us the foundational power to seamlessly weave advanced generative AI capabilities into real-time workflows, allowing our customers to push the boundaries of visual storytelling at global scale." - Alfonso Acosta, Head of Engineering, Imgix

Orchestrating at scale

To support the billions of image and video requests its customers process every day, Imgix built a sophisticated hybrid orchestration model:

Management: Google Cloud Run manages session and account layers.
Core Processing: Google Compute Engine-managed instance groups host the G4 VMs, which allows custom software to use the entire machine with no container "slicing."
Dynamic Scaling: Autoscaling relies on custom application metrics, such as machine queue length, rather than standard CPU use. This ensures that the stack’s most expensive elements are tuned for maximum efficiency.
Self-Healing: A custom mechanism monitors logs for driver faults, automatically "reaping" and restarting GPU instances without manual intervention.
Optimization: To maintain peak performance, Imgix uses NVIDIA Nsight Systems to identify and resolve code bottlenecks.

The future: From experimentation to execution

Even with the significant performance improvements it’s already achieved, Imgix is continuing to expand its AI infrastructure so its customers can access additional advanced capabilities like generative fill, background replacement, object removal, and image upscaling.

Features like these rely on high-performance machine learning systems that must process increasingly complex computations with no loss of speed or quality. By leveraging Google’s AI Hypercomputer, Imgix is now deploying and serving these models efficiently and offering its customers real-time, production-ready AI editing. And as demand grows for more dynamic and personalized visual experiences, this foundation is ensuring that Imgix can continue to deliver powerful capabilities reliably and at scale.

Get started

G4 VMs work natively with Google Compute Engine, Google Kubernetes Engine, Google Cloud Storage, and Vertex AI.

Dive deeper: Explore the Imgix architecture on GitHub.
Start building: Read the G4 VM documentation.

Cloud Storage Rapid: Turbocharged object storage for AI and analytics

Mon, 11 May 2026 17:00:00 +0000

At Google Cloud Next ’26 we announced Cloud Storage Rapid, a family of object storage capabilities for data-intensive workloads like AI and analytics. Out of the gate, Cloud Storage Rapid consists of Rapid Bucket (formerly Rapid Storage), a high-performance zonal object storage offering, and Rapid Cache (formerly Anywhere Cache), which accelerates reads on-demand and colocates compute and data for workloads in existing buckets.

Cloud Storage Rapid is our response to the generational shift in how organizations build with AI. Teams are training trillion-parameter models, deploying inference at global scale, and building autonomous agents that reason over vast amounts of enterprise data. While accelerators like GPUs and TPUs often get the spotlight, they have a critical dependency: storage.

Storage is the engine that feeds accelerators during training, and the fast-access layer that makes real-time inference responsive. But as models scale, storage performance can be a bottleneck. Every time an AI/ML cluster waits on a data read or a checkpoint write stalls, you are paying for expensive compute cycles that aren't doing useful work.

Historically, AI/ML practitioners have had to choose between the specialized performance of a niche, zonal storage system, and the reliability and scale of a global object store like Google Cloud Storage. Many developers value Cloud Storage for its simplicity, scalability, reliability, and cost-effectiveness, but as the AI era has progressed, they’ve been throwing hotter and hotter workloads at it, running training and inference workloads with thousands of GPUs and TPUs. We’ve reached a performance tipping point that traditional object storage was never meant to handle. The Rapid family provides multiple options for co-locating compute workloads directly with high-performance zonal storage. It minimizes I/O bottlenecks that can block accelerators, so that your GPUs and TPUs stay fully saturated and productive. In this blog, let’s take a closer look at Cloud Storage Rapid’s capabilities.

Rapid Bucket

Rapid Bucket (GA), helps Cloud Storage meet the evolving demands of massive-scale generative AI, analytics, and other high-performance workloads. It does so by leveraging Colossus, the Google distributed storage system that powers Gemini and YouTube, to provide massive read/write performance and ultra-low latency in a dedicated object storage zonal bucket.

Lightning-fast performance
By combining the sub-millisecond latency of block-like storage, the throughput of a parallel filesystem, and the scalability and ease of use of object storage, Rapid Bucket provides high performance from the same Cloud Storage that you know and love.

Highlights include:

Ultra-low latency: Achieve up to 20 million queries per second and sub-millisecond latency.
Massive scalability: Rapid Bucket delivers 15+ TB/s of aggregate read throughput from a single Rapid zonal bucket.
New semantics: Enable higher performance with new capabilities such as native appends, unlimited readers (while writing!), and vectored reads.

Optimized for AI and analytics
You can use Rapid Bucket for a variety of demanding scenarios, including AI/ML data preparation, training, checkpointing, batch and streaming analytics processing, and optimizing distributed database architectures.

Key benefits include:

Optimized accelerator utilization: With Rapid Bucket, we observed 50% reduced blocked GPU time and up to 2.5x faster data loading for multi-modal training runs.
Faster checkpointing: Rapid Bucket makes checkpoint restores up to 5x faster and writes 3.2x faster compared to traditional object storage. This ensures faster recovery from workload interruptions, minimizes wasted accelerator time, and increases overall efficiency.

>5x faster checkpoint restores with Rapid Bucket

>3.2x faster checkpoint writes with Rapid Bucket

You can get started with Rapid Bucket here.

Rapid Cache

Originally announced at Cloud Next ‘25, Rapid Cache accelerates bandwidth for AI/ML workloads like data prep, training, and bursty model loading for inference, delivering an aggregate read throughput of 2.5 TB/s for your existing buckets — with no code changes. For inference workloads, we’ve observed that Rapid Cache provides up to 2.1x (114%) accelerated model load, resulting in 47% TCO savings.

When combined with multi-region buckets, customers can flexibly access GPUs and TPUs distributed across regions in a geo, while maintaining a single bucket namespace. This eliminates the need for manually orchestrated data movements between buckets, while benefitting from zonally co-located high performance.

New: Rapid Cache ingest on write
Customers at some of the world’s largest frontier AI/ML labs told us that they were looking for ways to accelerate reads immediately after a write, such as checkpoint restore workloads or a data prep pipeline that then feeds training. Before, caching the data required an initial read to trigger ingestion, which was served directly from the bucket at standard performance.

Rapid Cache’s new ingest on write feature solves this by simultaneously writing data to the Rapid Cache as it is being written to a Cloud Storage bucket. This proactive approach eliminates the initial cache-miss penalty, and helps workloads benefit from an immediate cache hit on the very first read. This provides up to 2.2x faster checkpoint restore times, allowing training clusters to recover faster from interruption.

To enable ingest on write, simply modify the ingestion criteria of your existing Rapid Cache.

Rapid Cache’s simplicity and performance has resulted in explosive adoption. In just one year since General Availability, customers have deployed thousands of Rapid Caches with a 20x growth in caches deployed, In fact,Rapid Cache serves up to 20% of Cloud Storage’s global egress. Cutting-edge AI/ML customers deploy their workloads on Rapid Cache, including Anthropic who uses Rapid Cache to improve the resilience of their cloud workload by co-locating data with TPUs in a single zone and providing dynamically scalable read throughput up to 2.5TB/s.

Case study: Thinking Machines Lab
Thinking Machines Lab is an artificial intelligence research and product company. Its mission is to make AI systems that are adaptable and customizable, building a future where everyone has access to the knowledge and tools to make AI work for their unique needs and goals.

At Next ‘26, James Sun, Member of Technical Staff at Thinking Machines Lab, spoke at our session, Cloud Storage Rapid: Turbocharged object storage for AI & Analytics, where he presented about the needs of the data-hungry AI/ML workloads that Thinking Machines Lab runs for high-performance storage at scale.

Thinking Machines runs diverse workflows: data processing in Dataflow, Kafka, and Spark, multi-model training, and serving Tinker — a flexible API for fine-tuning open source models. Thinking Machines' workloads run on Google Cloud Storage, Sun explained. Running these data-intensive AI/ML workloads at such a large scale introduces significant infrastructure challenges.

The first is managing a hub and spoke data architecture, where data processing hubs are located in one primary region while training GPUs are spread across multiple regions. Historically, this has made manual data movement and lifecycle management a major operational pain point. Furthermore, Thinking Machines Lab's workloads such as data prep and pretraining workflows, which rely on massive-scale Spark workloads to prepare their multi-modal datasets, often spike from cold to hot instantly. Previously, these surges led to disruptive 429 errors, which stalled data processing and loading, and interrupted critical training cycles.

To minimize these bottlenecks, Thinking Machines Lab integrated Rapid Cache across their AI/ML pipeline, to positive results.

“Rapid Cache has become a core foundation of our AI/ML data infrastructure, supporting our critical workflows, from data prep and pretraining to training and model loading. By acting as a crucial bandwidth shield and booster, it enables us to scale our data-intensive workloads across our entire fleet without compromise, providing us with the on-demand high bandwidth and consistent stability that we need to innovate at speed.” - James Sun, Member of Technical Staff, Thinking Machines Lab

In short, Cloud Storage and Rapid Cache provides Thinking Machines Lab with:

Easy, instant, scalable, on demand bandwidth: The team now achieves stable read throughput peaks of over 1.8TB/s.
Enhanced stability: Rapid Cache has greatly reduced tail-end latencies and 429 errors, providing the consistent performance needed for multi-modal training.
Fleet-wide scalability: Combined with multi-region buckets, they can now scale data-intensive workloads across their entire fleet, meeting the demands of a rapidly growing compute scale without the hassle of manual data movement while benefiting from zonally colocated storage for high performance.
Operational efficiency: The use of Hierarchical Namespace (HNS) has optimized their massive Spark workloads for data preparation, by supporting fast directory renames, along with providing the ability to ramp QPS more quickly as they scale out clusters. Rapid Cache’s "ingest on write" capability helps ensure immediate cache hits for checkpoint restores.

Choose your rocket ship

Whether you are running data preparation, massive-scale training, or low-latency inference, Cloud Storage Rapid delivers high performance together with the reliability and scalability that Cloud Storage is known for.

Rapid Bucket delivers the highest Cloud Storage throughput and queries per second as well as the lowest latency for read/write use cases, such as analytics, AI training, checkpointing, and model serving. This helps to reduce storage bottlenecks and increase compute utilization.
Rapid Cache provides higher read bandwidth and tail latency stabilization in existing buckets, without code changes. Key use cases include AI training, checkpoint restores, and serving, as well as accelerator optionality via multi-region buckets.

Get started with the Cloud Storage Rapid family today!

Cluster-level reliability for trillion-parameter models on TPUs

Mon, 11 May 2026 16:30:00 +0000

Frontier AI models have redefined the unit of compute. At trillion-parameter scale, AI training requires thousands of interconnected components, orchestrated in industrial-scale deployments to operate as a single, massive entity.

Likewise, when it comes to reliability, aggregate infrastructure availability is what matters. Yet for almost two decades, instance-level reliability has been the cloud standard. Designed for microservices and horizontally scalable applications, instance-level reliability treats infrastructure as a collection of small independent units. This model is fundamentally inadequate for large-scale AI workloads.

We believe reliability must shift from an instance- to a cluster-level model.

For over a decade, Google has operated Tensor Processing Unit (TPU) clusters at scale, achieving reliability that meets the architectural requirements of modern AI workloads. In this blog, we’re presenting our cluster-level reliability framework for Google Cloud TPUs that focuses on collective performance at the superpod level, and one we use internally within Google to build the world’s most advanced AI models. This framework is the operational standard for TPUs in production today, and serves as the architectural blueprint for our recently announced eighth-generation TPUs.

Reliability for AI supercomputers

TPU superpods consist of thousands of chips arranged into cubes (64 TPUs), with high-speed Inter-Chip Interconnect (ICI) links connecting every chip within a cube and a dynamically configurable Optical Circuit Switch (OCS) network connecting all cubes to form a superpod.

For system-wide training progress, we must maximize the number of fully healthy cubes within a superpod. Because the performance of AI models relies on high-bandwidth, low-latency communication, every chip and ICI link within a cube must be operational for that unit to contribute to the training progress. Driven by these architectural realities, our cluster-level framework helps define how the industry can achieve reliability in the AI era, moving from instance-level reliability to availability of scale.

Deep dive: The mathematics of availability at scale

Instance-level reliability models are often deterministic, but industrial-scale AI deployments require a probabilistic approach over thousands of chips. In a traditional setup, you might track the Mean Time Between Failures (MTBF) of a single chip. However, at the scale of frontier AI, the cluster-level MTBF drops sharply as the number of components grows.

To visualize how quickly scaling can erode confidence, we can look at simple bounds like Markov’s inequality.

If we define X as the number of failed cubes, Markov’s inequality reminds us that as the expected number of failures E[X] increases with cluster size, the probability of staying below a strict failure threshold becomes increasingly difficult to guarantee without systemic architectural changes.

While Markov’s inequality provides a helpful rule of thumb for the risks at scale, we model the availability of scale using a binomial distribution of aggregate cluster health. For a superpod composed of n independent units (cubes), we define the probability of having at least k fully operational and interconnected cubes as the cumulative distribution of the success of n independent trials. To ensure a 95% confidence interval for training productivity, we solve for k where:

Where n represents the total cubes in a superpod and p represents the aggregate cube-level availability.

This model replaces the instance-level model with a topology-aware framework that mirrors actual performance requirements of large-scale training, ensuring that the larger block of compute is healthy and connected and can drive continuous training progress.

Scale of modern AI hardware

To demonstrate this new reliability model, we used Ironwood, Google’s generally available, seventh-generation TPU, and the custom silicon behind advanced models like Gemini and Nano Banana.

Pictured: Part of an Ironwood superpod, directly connecting 9,216 Ironwood TPUs in a single domain.

An Ironwood superpod is a dense, high-performance fabric consisting of 9,216 chips integrated into a single compute domain. It’s organized into 144 cubes, where each cube contains 64 chips. Within these cubes, ICI links create an extremely dense, all-to-all network fabric that provides massive bandwidth and low-latency connectivity for distributed operations within the cube. To form a superpod, 144 cubes are connected using OCS. For large jobs, capacity can be provisioned by interconnecting multiple cubes within a pod into one super-slice, or connecting multiple slices to form a multi-slice cluster. Cubes across multiple superpods can be connected over the datacenter network to run even larger workloads.

Using this model, we determine that the topological availability for an Ironwood superpod is 130 out of 144 cubes available for 95% of the month. This translates to a large compute block of 8,320 chips that are fully operational and interconnected via ICI and OCS, establishing a reliability model specifically optimized for hero jobs (the massive training runs of frontier AI).

The relationship between cluster size and its statistical availability is non-linear. By adjusting the required confidence level, we can identify the slice size that can be supported with statistical certainty. For researchers, this mapping provides a capacity availability curve. An organization with a workload that requires 99% availability for a mission-critical run can optimize their slice size to 125 cubes, while those pushing utmost scale can utilize 130 cubes at the 95% confidence interval.

Capacity availability curve for an Ironwood superpod (144 cubes)

This new reliability model maximizes the utility of the entire superpod through:

Full access: This model does not constrain capacity utilization; it focuses on the availability of fully healthy cubes. While a single chip or ICI failure results in the entire cube being classified as unhealthy, customers continue to have access to the remaining capacity within the cube. This makes most of the Ironwood superpod available for use while also optimizing the compute footprint for high-stakes, large-scale training.
Optimized resource usage: While the 130-cube model focuses primarily on large-scale training runs, the full superpod remains available for a heterogeneous mix of workloads. This allows researchers to utilize the remaining cubes for research experiments, inference, and dev/test workloads, maximizing the utility of the superpod without compromising the reliability of the main training run.

Our customers are using Ironwood at scale today and this model has empowered them to train their most demanding hero jobs.

Enhancing ML productivity

The goodput metric is the primary measure of ML productivity. Our new reliability standard provides the deterministic foundation for goodput and is engineered to maximize this metric for demanding hero jobs, so that the massive scale infrastructure required for frontier research is ready to perform as a single entity.

This model achieves high scheduling goodput, one of the three goodput metrics, by making the full set of resources available for massive-scale training runs. Combined with the software stack, this infrastructure-level availability helps deliver the high overall goodput. We achieve this through a three-layer reliability model:

Infrastructure: TPU superpods provide the capacity footprint to ensure the necessary scale is physically available and connected.
Frameworks: JAX and Pathways provide resilience, reconfiguring or hot-swapping around failed nodes to maintain forward progress without requiring a full restart.
Application: Fault-tolerance mechanisms like auto-checkpointing and multi-tier checkpointing preserve training state, so that lost progress is minimized in case of a failure.

Enabling the next generation of AI breakthroughs

The cluster-level reliability model marks the beginning of a new standard for the AI era, where an AI supercomputer is a dependable, industrial-scale engine for innovation. By aligning our reliability posture with the demands of frontier models, we’re enabling the next generation of AI breakthroughs to be faster, more reliable, and more predictable. Click here to learn more and get started with TPUs.

Gemini 3.1 Flash-Lite is now generally available on Gemini Enterprise Agent Platform

Thu, 07 May 2026 18:00:00 +0000

Today, we’re thrilled to announce that Gemini 3.1 Flash-Lite, our fastest and most cost-efficient Gemini 3 series model yet, is now generally available.

Designed for ultra-low latency, high-volume tasks, and unmatched cost-efficiency, Flash-Lite is already transforming how applications are built at scale. Fast, iterative, and scalable, it joins our comprehensive suite of Pro and Flash models to provide the exact combination of intelligence, speed, and cost required for the most demanding production deployments.

Developers and enterprises have noted that the model provides the precision required for agentic tasks like tool calling and orchestration, coupled with the cost-efficiency needed to run automated pipelines at scale.

Here’s a look at how some of them have been driving value.

Software development and engineering

Engineering teams require models that can keep pace with real-time coding environments. With the GA of Gemini 3.1 Flash-Lite, developers are unlocking the instant responsiveness necessary for complex code completion, seamless UX design, and agentic developer tools.

“Integrating Gemini 3.1 Flash-Lite has transformed the responsiveness of our IDE AI assistant & Junie agent. The balance of high intelligence and minimal latency makes it the perfect model for real-time developer support." — Vladislav Tankov, Director of AI at JetBrains

Customer experience and high-volume service

For enterprise customer service operations, handling massive volumes of interactions requires models that can scale affordably without sacrificing reasoning capabilities.

Gladly runs customer service for some of the most demanding retail brands in the world. The core of its text-channel AI agent runs on Flash-Lite. By handling millions of customer-facing calls each week across channels like SMS, WhatsApp, and Instagram, they achieved roughly 60% lower costs than comparable thinking-tier models on the same token mix.

The model powers every step of the agent lifecycle — from selecting tools and classifying playbooks to deciding when to escalate to a human — all while maintaining a p95 latency around 1.8s seconds for fully reply generation and sub-second p95 for classifiers and tool calls, alongside a ~99.6% success rate under heavy concurrent load.

Creative pipelines and gaming

In the fast-paced creative and gaming industries, multimodal capabilities and ultra-low latency are essential for keeping users engaged and content pipelines flowing. Flash-Lite is empowering platforms to process rich media and generate hyper-personalized environments.

Astrocade lets anyone create games by describing what they want in natural language. They integrated Flash-Lite to serve a rapidly growing global user base.

For every incoming game request, it performs a multimodal safety check — analyzing both text and images — before the building agents even start their work. It further supports their global community through inline comment translation, allowing players in different countries to “riff” on the same game. And as part of their asset generation pipeline, it helps refine the final prompts to ensure consistently high-quality thumbnails.

The creative platform krea.ai has also seen positive results by using Flash-Lite as a prompt enhancer in their Nodes tool. By taking a user’s rough idea and expanding it into a full image generation prompt pipeline, the model provides a level of detail that is “weirdly creative” for its price point.

These outputs move the needle on image production, providing a level of reliability and scale that was previously cost-prohibitive for sophisticated prompt engineering.

Financial services and data operations

In the world of finance and enterprise product development, efficiency is just as critical as accuracy. Gemini 3.1 Flash-Lite gives financial analysts and product managers the ideal balance of intelligence, low latency, and cost-effectiveness to run modeling and latency-sensitive applications.

OffDeal uses Flash-Lite to power “Archie,” an AI agent that investment bankers use for real-time research, data lookups, and task execution during Zoom calls. In these scenarios, bankers often need to surface financials mid-conversation. OffDeal found that Flash-Lite was the only model capable of meeting the response times needed for genuinely instant answers without forcing a tradeoff on quality.

Beyond live calls, they also use Flash-Lite as a triage layer for inbound and outbound email traffic. By answering structured questions about messages in parallel, such as whether an email is an automated response or in relation to an active deal, Flash-Lite determines which downstream AI agents get invoked and with what context.

For high-volume, latency-sensitive workflows on the financial operations platform Ramp, Flash-Lite has become a key component:

“Gemini is a core part of the model stack we use across applications at Ramp. As indicated in our benchmarks, we see Gemini lead the pareto fronts in terms of costs, latency and intelligence—providing a great tradeoff between the three and making it well-suited for latency sensitive applications. Gemini 3.1 Flash-Lite has been especially valuable, powering many of our highest-volume, latency-sensitive features without compromising on quality.” – Anton Biryukov, Applied AI Engineer, Ramp

Market intelligence platform AlphaSense integrates Flash-Lite to deliver data insights:

"Gemini 3.1 Flash-Lite provides great balance of speed, cost and performance, allowing AlphaSense to scale our advanced data processing and deliver high-quality intelligence across every layer of our data stack”– Chris Ackerson, Senior Vice President of Product, AlphaSense

Get started

Read the docs for Gemini 3.1 Flash-Lite and learn about our latest pricing structure. Learn more about the Gemini Enterprise Agent Platform, the new standard for enterprise agent development.

How BASF manages thousands of supply chain decisions with AlphaEvolve’s agentic algorithms

Thu, 07 May 2026 16:00:00 +0000

The agricultural and crop protection supply chain is one of the most intricate networks in the world. It takes up to two years to turn active ingredients into the final products farmers need, and a single change in weather or regulations can disrupt everything. Planners at BASF Agricultural Solutions navigate this reality daily across 180 production sites. To understand how local decisions ripple across their entire global network, BASF turned to AlphaEvolve on Google Cloud to build a digital twin of their supply chain.

Planning across a two-year lead time

BASF Agricultural Solutions manages a network with over 5,000 distinct value chains. Creating a single end product requires a bill of materials that can be over 30 levels deep, moving across different production sites and regions.

Currently, human planners make thousands of local decisions every day. They decide what to produce, when to produce it, and how much safety stock to hold. Because the network is so large, a planner can’t easily see how a localized decision affects the rest of the global supply chain.

This scale can lead to additional working capital and inventory and or cause production imbalances. Traditional mathematical models struggle to capture the dynamic reality of the network that planners navigate based on years of experience.

Building a foundation for decision support

AlphaEvolve is an evolutionary coding agent that generates and refines algorithms autonomously. In collaboration with Google Cloud and prognostica GmbH, BASF’s objective was not to replace human decision-making, but to establish a new model for decision support that helps planners handle the real-world complexity of the production network.

The team gave AlphaEvolve a foundational "seed" program. This initial code established a standard planning logic that translated demand forecasts into production schedules, serving as a functional baseline before introducing dynamic, network-wide coordination. From there, they fed the model three years of historical data, including inventory levels, market demand, and actual production outputs. AlphaEvolve then generated variations of the code, mutating the logic to see if it could simulate a supply chain that matched the real-world historical data.

Measuring what good looks like in initial tests

For AlphaEvolve to improve, it needed a specific goal. The evaluation function scored every new piece of generated code on one primary metric: how closely the simulated inventory levels and production decisions matched the actual historical reality recorded by BASF.

The latest AlphaEvolve runs delivered more than 80% relative improvement in accuracy compared to the initial seed model. With further adjustments, the team expects to push performance even higher — bringing the model to a level of accuracy not achieved with other approaches and making it actionable for operational use.

The results

The evolved planning logic delivered immediate, measurable improvements over the initial seed model. The final algorithm successfully mirrored the actual historical performance of the supply chain, significantly reducing the error rate compared to the initial seed.

“We had several attempts to build a digital twin for our complex supply network using deterministic models, and all of them failed,” said Dr. Goetz Krabbe, vice president for global supply chain at BASF. “By using AlphaEvolve, we cannot only map the complex network based on system data, but at the same time understand and copy the human decisions that drive our daily operations. This gives us a highly accurate and easy to maintain data driven digital twin of the entire network. Using it we can optimize our inventory levels and respond to market volatility with confidence while avoiding stockouts."

What the evolved algorithm actually does

By running thousands of experiments, AlphaEvolve developed a clear, human-readable algorithm that explains how the BASF network truly operates. It automatically discovered factually correct, domain-specific supply chain rules that explain the observed production outputs and inventory levels for the tested product value chain:

Production consolidation: The algorithm learned to group production amounts together, accurately mapping how planners optimize plant time.
Dynamic safety stocks: It introduced safety stock parameters to handle volatile and seasonal demand patterns, helping to strictly manage capital costs while preventing out-of-stock situations.
Network-wide coordination: The model successfully mapped the dependencies between different production tiers, providing a clear foundation for optimizing asset utilization globally.

What's next

The initial simulations showed that evolutionary AI can accurately model large-scale, dynamic supply chains. BASF’s objective is to create a digital twin of their entire global production network as a new foundation for simulation, decision support, scenario forecasting and optimization. This will allow the team to continuously simulate operations, identify hidden bottlenecks before they affect throughput, and optimize asset utilization across all global facilities.

_{This project was a collaboration between the BASF SE team including: Benjamin Priese, Michael Arlt, Debora Morgenstern and Tobias Hausen as well as Manuel Doerr and Thomas Christ from Prognostica GmbH Würzburg, and the AI for Science team at Google Cloud including (but not limited to): Kartik Sanu, Laurynas Tamulevičius, Nicolas Stroppa, Chris Page, Srikanth Soma, John Semerdjian, Skandar Hannachi, Vishal Agarwal and Anant Nawalgaria as well as Christoph Tittelbach from the Google account team and partners at Google DeepMind}