Infrastructure Modernization

BGP route policies: Top 3 use cases by customer demand

Tue, 07 Jul 2026 16:00:00 +0000

When we first made BGP route policies for Cloud Router generally available over a year ago, our goal was to give network administrators deep, programmable control over how network paths are evaluated and propagated. Since then, we’ve been watching closely how our customers have adopted this feature. We've seen network engineering teams build incredibly sophisticated, resilient routing architectures that were previously difficult to achieve without third-party virtual appliances.

This year, we launched policy named sets for Cloud Router. As routing environments grow more complex, managing individual prefixes or communities within these policies can become cumbersome.

Policy named sets solve this by allowing you to group lists of IPv4/IPv6 prefixes or BGP communities into a single, reusable entity. This significantly simplifies your configurations, making it easier to scale, manage, and update your routing rules across multiple Cloud Routers.

Powered by the Common Expression Language (CEL), BGP route policies allow you to define fine-grained, ordered rules to filter BGP routes and modify route attributes directly within Cloud Router.

To celebrate the launch of policy named sets, we want to highlight three of the most impactful ways we've seen customers use BGP route policies over the past year, along with resources on how you can build them yourself.

1. The foundation: Route filtering and network protection

Before manipulating traffic paths, network stability requires strict control over which routes are allowed into and out of your network. We've seen customers extensively use BGP route policies to filter out unwanted learned routes from peers or prevent specific subnet prefixes from being advertised out of their Virtual Private Cloud (VPC).

BGP route policies operate on a "fail open" model by default, so many security-conscious organizations have adapted them to create a "fail closed" environment by appending a "drop all" policy as the final term in their evaluation list. This gives them tight control over which routes are accepted, which helps prevent routing loops and reduces the risk of traffic being hijacked via BGP or inadvertently blackholed.
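To make the difference between the two models concrete, here is a minimal Python sketch of ordered, first-match term evaluation. It is a toy model, not Cloud Router's implementation and not CEL syntax: the Term class, the evaluate and within helpers, and the example prefixes are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from ipaddress import ip_network
from typing import Callable, List


@dataclass
class Term:
    """One policy term: a match predicate plus the action taken on a match."""
    name: str
    matches: Callable[[str], bool]
    action: str  # "accept" or "drop"


def evaluate(prefix: str, terms: List[Term], default_action: str = "accept") -> str:
    """Return the action of the first matching term, else the default.

    The default of "accept" models the "fail open" behavior described above.
    """
    for term in terms:
        if term.matches(prefix):
            return term.action
    return default_action


def within(supernet: str) -> Callable[[str], bool]:
    """Build a predicate that is true for prefixes contained in supernet."""
    net = ip_network(supernet)
    return lambda prefix: ip_network(prefix).subnet_of(net)


# Hypothetical terms: allow one internal range, then drop everything else.
allow_corp = Term("allow-corp", within("10.0.0.0/8"), "accept")
drop_all = Term("drop-all", lambda prefix: True, "drop")

fail_open = [allow_corp]             # unmatched routes fall through to "accept"
fail_closed = [allow_corp, drop_all]  # the final term catches everything else
```

With fail_open, an unexpected prefix such as 192.0.2.0/24 is accepted because no term matches it. With fail_closed, the same prefix hits the final catch-all term and is dropped, while 10.1.0.0/16 is still accepted by the earlier term.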

Dive deeper: For a foundational look at how to set up CEL expressions for route filtering, check out our deep-dive guide: Introduction to BGP policies.

2. Influencing traffic paths for active/standby architectures

Achieving optimal traffic distribution often requires forcing traffic down a specific path, whether for cost optimization or managing active/standby interconnects. Customers have used BGP route policies to influence the preferred BGP route without touching their on-premises hardware.

By dynamically modifying the BGP multi-exit discriminator (MED) attribute, network teams can make a specific peer preferred for incoming traffic. Conversely, to steer traffic away from a congested or backup link, they use AS-PATH prepending, which adds one or more AS numbers to the route's AS-PATH to deprioritize it across the broader network.
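The effect of these two attributes can be sketched in a few lines of Python. This is a deliberately simplified comparison that looks only at AS-path length and then MED. The full BGP decision process has more steps (for example, local preference and origin), and MED is normally compared only between routes learned from the same neighboring AS. The Route class, the helper names, and the private-range AS number 65001 are illustrative assumptions.

```python
from dataclasses import dataclass, replace
from typing import List, Tuple


@dataclass(frozen=True)
class Route:
    peer: str
    as_path: Tuple[int, ...]
    med: int


def prepend(route: Route, asn: int, count: int) -> Route:
    """Return a copy of the route with asn prepended count times."""
    return replace(route, as_path=(asn,) * count + route.as_path)


def set_med(route: Route, med: int) -> Route:
    """Return a copy of the route with a new MED value."""
    return replace(route, med=med)


def best(routes: List[Route]) -> Route:
    """Simplified selection: shortest AS path wins, then lowest MED."""
    return min(routes, key=lambda r: (len(r.as_path), r.med))


# Two otherwise identical routes to the same prefix over two links.
link_a = Route("link-a", (65001,), 100)
link_b = Route("link-b", (65001,), 100)

# A lower MED makes link-b preferred when AS-path lengths are equal.
prefer_b_by_med = best([link_a, set_med(link_b, 50)])

# Prepending on link-a lengthens its path, so link-b wins even with a higher MED.
prefer_b_by_prepend = best([prepend(link_a, 65001, 2), set_med(link_b, 500)])
```

Because path length is compared before MED in this sketch, prepending acts as the coarser tool and MED as the tie-breaker between routes of equal path length.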

Dive deeper: To see the configuration steps for managing MED and AS-Path prepending, read: Using BGP policies to influence traffic paths.

3. Solving asymmetric routing with BGP communities

One of the most advanced and highly requested use cases we’ve seen over the last year is achieving traffic symmetry. When enterprises use stateful firewalls or specific network appliances on-premises, return traffic must flow back through the same appliance that handled the outbound traffic. If it doesn't, the traffic is dropped.

Customers are solving this by using BGP route policies to match against specific standard BGP communities. When routes are tagged with specific communities on-premises, Cloud Router can read those tags via inbound policies and adjust the route preference by manipulating the MED accordingly. This helps ensure that Google Cloud respects the stateful topology of the on-premises network and routes the return traffic symmetrically.
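As a rough illustration of the idea, the sketch below models an inbound policy that maps community tags to MED values. It is plain Python, not CEL and not a Cloud Router API. The community values (65001:100, 65001:200), the MED numbers, and the function name are hypothetical.

```python
from dataclasses import dataclass, replace
from typing import Dict, FrozenSet


@dataclass(frozen=True)
class Route:
    prefix: str
    communities: FrozenSet[str]
    med: int


# Hypothetical convention: on-premises routers tag routes with the
# firewall site they sit behind, and each tag maps to a MED value.
COMMUNITY_TO_MED: Dict[str, int] = {
    "65001:100": 50,   # behind the primary firewall: most preferred
    "65001:200": 200,  # behind the secondary firewall: less preferred
}


def apply_inbound_policy(route: Route,
                         mapping: Dict[str, int] = COMMUNITY_TO_MED) -> Route:
    """Set the MED from the first community tag found in the mapping.

    Routes without a recognized tag are returned unchanged.
    """
    for community, med in mapping.items():
        if community in route.communities:
            return replace(route, med=med)
    return route


tagged = Route("10.20.0.0/16", frozenset({"65001:100"}), 0)
untagged = Route("10.30.0.0/16", frozenset(), 0)
```

Here apply_inbound_policy(tagged) yields a MED of 50, while the untagged route keeps its original MED. The design point is that the on-premises side only has to tag routes consistently, and the preference logic lives in one place on the cloud side.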

Dive deeper: To learn how to architect stateful traffic symmetry using BGP community tags, explore: Using BGP communities to create traffic symmetry.

Get started today

Taking control of your dynamic routing is now easier and more robust than ever. With BGP route policies, now is a great time to optimize and secure your hybrid cloud connectivity.

We recommend testing your BGP route policies in a staging environment to verify your CEL expressions and routing logic before rolling them out to production. To explore the technical documentation, check out the BGP route policies overview.

Cloud Network Insights: end-to-end observability for the Cross-Cloud Network

Wed, 17 Jun 2026 19:30:00 +0000

In today’s digital landscape, the network is no longer confined to a single data center or even a single cloud provider. Enterprises are increasingly adopting cross-cloud strategies, connecting Google Cloud workloads to on-premises environments, other clouds like AWS and Azure, and a vast array of internet-facing applications. While this flexibility drives innovation, it can also introduce significant operational complexity. When a user experiences degradation in application performance, the critical question remains: Is it the network, the application, or something else?

We are excited to announce the general availability of Cloud Network Insights, an out-of-the-box, Google Cloud-native solution that provides comprehensive visibility into network and digital experience performance across complex multi-cloud and hybrid environments.

Closing the visibility gap with active monitoring

Cloud Network Insights, offered in partnership with Broadcom AppNeta, expands your observability beyond Google Cloud to your entire global deployment. By utilizing active synthetic probing, the solution monitors network routes even when no user traffic is present, allowing teams to be proactive rather than reactive.

Whether the source of degradation is in the cloud, on-premises data centers, internet applications, ISPs, or last-mile connectivity, Cloud Network Insights helps you pinpoint the exact location of the bottleneck.

Cloud Network Insights integrates directly into the Google Cloud Observability suite, bringing sophisticated network intelligence into the tools you already use. With Cloud Network Insights, you get:

End-to-end network path visibility: Gain a hop-by-hop visualization of the network path between your sources and destinations. Monitor critical metrics like round-trip time (RTT), packet loss, and jitter across networks you don’t directly manage.
Digital experience insights: Go beyond the network layer to monitor digital experience for web applications. Measure DNS resolution times, HTTP response codes, and full browser page-load times to identify whether an application's degradation is due to the network or the application itself.
Proactive detection and alerting: Use synthetic testing to identify performance dips before they impact your customers. Alarms are integrated with Cloud Monitoring and Cloud Logging, enabling alerting via email, Slack, or PagerDuty.
SLA validation: Arm your team with the data needed to verify if ISPs and service providers are meeting their performance commitments.
Rapid root-cause analysis: Quickly differentiate between network problems, application-level issues, or browser performance impacts.
Integrated monitoring: Access metrics and logs directly within Google Cloud, leveraging Cloud Monitoring and Cloud Logging for dashboards and alerting. Utilize the open partner ecosystem of Google Cloud as well as support for the OpenTelemetry protocol for metrics and logs, allowing direct ingestion by OTel SDKs and collectors.
Agentic workload monitoring: Use synthetic testing to monitor connectivity and network performance to help ensure optimal connectivity to your agents and tools.

Network performance and multi-path routes to/from Google Cloud, AWS, and Azure in one view

How it works: active synthetic probing

Cloud Network Insights uses active synthetic probing technology that consists of three main components:

Monitoring Points: You deploy lightweight software agents, called Monitoring Points, into critical network segments, such as a central VPC, a remote branch, or an on-premises data center. These can be deployed as containers or virtual machines.
Synthetic probes: These Monitoring Points send small, frequent bursts of synthetic traffic (simulating a user or application) to a target destination. This allows you to monitor performance 24/7, even when no real users are on the network.
Data synchronization: The Monitoring Points send real-time performance telemetry to a central backend service. This data is then synchronized back to Google Cloud, with metrics exported to Cloud Monitoring, and alarms and events sent to Cloud Logging.
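To illustrate what a burst of synthetic probes yields, the following Python sketch summarizes a list of round-trip samples into the kinds of metrics mentioned above. It is not how Monitoring Points compute their telemetry. The function name is hypothetical, and jitter is assumed here to be the mean absolute difference between consecutive RTTs, which is one common definition among several.

```python
from statistics import mean
from typing import Dict, List, Optional


def summarize(samples_ms: List[Optional[float]]) -> Dict[str, float]:
    """Summarize one probe burst.

    Each entry is a round-trip time in milliseconds, or None for a lost probe.
    """
    if not samples_ms:
        raise ValueError("at least one probe sample is required")

    received = [s for s in samples_ms if s is not None]
    loss_pct = 100.0 * (len(samples_ms) - len(received)) / len(samples_ms)

    # Mean RTT over the probes that came back.
    rtt_ms = mean(received) if received else float("nan")

    # Assumed jitter definition: mean absolute change between consecutive RTTs.
    deltas = [abs(b - a) for a, b in zip(received, received[1:])]
    jitter_ms = mean(deltas) if deltas else 0.0

    return {"rtt_ms": rtt_ms, "loss_pct": loss_pct, "jitter_ms": jitter_ms}


# Four probes, one of which was lost.
burst = summarize([10.0, 12.0, None, 11.0])
```

For this burst the summary is a mean RTT of 11.0 ms, 25% loss, and 1.5 ms of jitter.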

Core capabilities

Cloud Network Insights supports two primary types of monitoring to give you a full picture of your infrastructure:

1. Network performance monitoring (Layers 3 and 4)

This provides a hop-by-hop visualization of the network between a source and a destination, including:

Metrics captured: Round-trip time (RTT), packet loss, jitter, and path changes.
Single-ended mode: The agent probes an external target (like a URL, IP address or an API endpoint) that doesn't have a Monitoring Point installed.
Dual-ended mode: The Monitoring Point probes another Monitoring Point. This provides richer data, including precise one-way latency and the ability to detect asymmetric routing (when data takes a different path going out than it does coming back).
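The asymmetric-routing check that dual-ended mode enables can be sketched as a comparison between the forward path and the reversed return path. This is a toy model that assumes each router is identified by a stable name. Real traces usually report per-interface addresses that differ by direction, so a production comparison is more involved. The function and router names are hypothetical.

```python
from typing import List


def is_asymmetric(forward_hops: List[str], reverse_hops: List[str]) -> bool:
    """True when the return path is not the forward path in reverse order."""
    return forward_hops != list(reversed(reverse_hops))


# Path from Monitoring Point A to B, and the path observed from B back to A.
forward = ["edge-a", "isp-1", "edge-b"]
symmetric_return = ["edge-b", "isp-1", "edge-a"]
asymmetric_return = ["edge-b", "isp-2", "edge-a"]
```

A single-ended probe sees only the round trip, so it cannot distinguish these two return paths. Having a Monitoring Point at each end is what makes the per-direction comparison possible.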

Network path metrics in Google Cloud console

2. Digital experience monitoring (Layer 7)

With digital experience monitoring, you can track the end-to-end experience of a web application. Here, you can choose from:

Browser mode: Uses a real browser, driven by Selenium, to load full web pages, execute JavaScript, and render content. It measures complete page-load times to validate the actual user experience.
HTTP mode: Sends synthetic HTTP/S requests to a URL or API endpoint. This is a lightweight check for server availability, response time, and DNS/TLS performance.

Intelligence and automation

Cloud Network Insights also offers a variety of monitoring and troubleshooting capabilities.

Proactive alarms: Cloud Network Insights leverages auto-baselining to establish dynamic performance thresholds based on your historical metric data. If a metric deviates from your defined parameters, the system instantly triggers an event in Google Cloud, routing alerts directly to your team via email, Slack, or PagerDuty.
Monitoring policies: You can automate monitoring setups across large-scale environments by defining policies that dynamically create or remove paths based on custom tags. For instance, you can automatically track a core web application's performance from specific geographic regions.
Root-cause analysis: Because Cloud Network Insights extends visibility into traditionally "unwatched" areas like ISPs and transit networks, it instantly pinpoints whether a slowdown is occurring within Google Cloud, at the ISP level, or inside another cloud environment like AWS or Azure.
AI-driven insights: With integration to Gemini Cloud Assist, you can use natural language to interrogate Cloud Network Insights telemetry alongside your broader infrastructure data. Rather than manually pivoting between dashboards, ask Gemini to cross-reference specific Cloud Network Insights metrics against other Google Cloud metrics, reducing mean time to resolution (MTTR).
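The auto-baselining described under proactive alarms can be approximated with a simple statistical threshold. The post does not say which algorithm Cloud Network Insights uses, so the mean-plus-k-standard-deviations rule below is purely an assumption for illustration, as are the function names and sample values.

```python
from statistics import mean, stdev
from typing import List


def baseline_threshold(history: List[float], k: float = 3.0) -> float:
    """Assumed rule: threshold = historical mean + k standard deviations.

    Requires at least two historical samples.
    """
    return mean(history) + k * stdev(history)


def is_anomalous(value: float, history: List[float], k: float = 3.0) -> bool:
    """True when a new measurement exceeds the dynamic threshold."""
    return value > baseline_threshold(history, k)


# Hypothetical RTT history in milliseconds for one monitored path.
rtt_history = [20.0, 22.0, 19.0, 21.0, 20.0, 23.0, 21.0]
```

With this history the threshold lands a little under 25 ms, so a 22 ms reading is treated as normal and a 40 ms reading would raise an alarm. The advantage over a fixed threshold is that each path gets a limit derived from its own typical behavior.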

What customers are saying

We are already seeing strong interest from customers looking to simplify their cross-cloud operations. Organizations like Sabre and Pexip are already using Cloud Network Insights to gain clarity in their hybrid environments.

"In an environment as complex and high-scale as Sabre’s, total visibility isn't just a luxury — it's a requirement for operational resilience. Cloud Network Insights will enable us to further shift our posture towards proactive optimization. By providing granular, real-time telemetry across our global cloud footprint, it helps eliminate the traditional 'black box' of the network, allowing our teams to resolve bottlenecks before they impact the traveler experience." - Alfredo Rodriguez, VP of Cloud and Infrastructure, Sabre

“Cloud Network Insights closes the 'visibility gap' between the private corporate network and the public cloud, empowering our joint customers to pinpoint performance bottlenecks in seconds rather than hours.” - Alan Davidson, CIO, Broadcom

Get started today

Navigating complex digital ecosystems shouldn't mean sacrificing visibility. Cloud Network Insights bridges the gap across multi-cloud and hybrid environments by combining deep network performance metrics with digital experience monitoring. Coupled with direct integrations into Google Cloud Observability and Gemini Cloud Assist, your teams are empowered with intelligent alerting, robust SLA validation, and rapid root-cause analysis. We look forward to helping you gain a clearer, unified view of your Cross-Cloud Network.

You can get started in the Google Cloud console today. To learn more:

Explore our product documentation for deep dives into deploying Monitoring Points and configuring policies.
Check out the latest release notes to stay updated on new features.
Watch the overview video
Hear more about the partnership between Google Cloud and Broadcom

Cool stuff Google Cloud customers built, May edition: Agentic algorithms for supply chains; virtual try-on APIs; robotic camera operators & more

Fri, 29 May 2026 16:00:00 +0000

AI and cloud technology are reshaping every corner of every industry around the world. Without our customers, who are building the future on our platform, there would be no Google Cloud. In this regular round-up, we dive into some of the exciting projects redefining businesses, shaping industries, and creating new categories.

For our latest edition, we learn how Urban Outfitters sped up its order management; how BASF uses AlphaEvolve algorithms to map global supply chains; how UKG unified its workforce intelligence; how WPP trains humanoid robot camera operators; how Breuninger piloted Virtual Try-On APIs; how Glance creates automated video clips; and how Movix improves the production of dental aligners.

Be sure to check back next month to see how more industry leaders and exciting startups are putting Google Cloud technologies to use. And if you haven’t already, please peruse our list of 1,302 real-world gen AI use cases from our customers.

Urban Outfitters saves big by migrating order management

Who: Urban Outfitters, Inc. (URBN), the popular clothing and home goods retailer, relies on IBM Sterling OMS as the nerve center of its global ecommerce operations. However, the foundation of this critical system — a massive 11TB Oracle database — was increasingly becoming a bottleneck.

What they did: URBN completed a major infrastructure upgrade, migrating its IBM Sterling OMS from an Oracle database to Google Cloud's AlloyDB for PostgreSQL. To enhance performance and provide high availability and scalability, the AlloyDB deployment architecture includes two read replicas, providing low-latency access to data for reporting and analytics. Google Cloud and IBM teams also assisted URBN in a rigorous, iterative switchover testing strategy.

Why it matters: The migration to AlloyDB has fundamentally reshaped URBN’s data strategy, delivering a more favorable total cost of ownership through an optimized storage and compute architecture, without sacrificing performance or reliability. Furthermore, the shift to a PostgreSQL-compatible database gave URBN the flexibility of an open-source ecosystem, providing freedom from vendor lock-in, as well as significant speed improvements that enhanced responsiveness.

Learn from us: "URBN’s successful migration serves as a blueprint for organizations looking to modernize their mission-critical infrastructure and future-proof their environment for AI expansion. This journey proves that even the most complex, mission-critical migrations can be achieved through deep cross-organizational partnership and a phased, risk-mitigated approach." – Rob Frieman, CIO, Urban Outfitters & Raj Pai, VP, Product Management, Databases, Google Cloud

BASF manages supply chain decisions with AlphaEvolve

Who: BASF Agricultural Solutions manages a complex network of 180 production sites with more than 5,000 distinct value chains. Currently, human planners make thousands of local decisions every day on what to produce, when to produce it, and how much safety stock to hold.

What they did: To understand how local decisions ripple across their entire global network, BASF turned to AlphaEvolve on Google Cloud to build a digital twin of their supply chain. In collaboration with Google Cloud and prognostica GmbH, BASF fed the model three years of historical data and then generated variations of the code, mutating the logic to see if it could simulate a supply chain that matched the real-world historical data.

Why it matters: By running thousands of experiments, AlphaEvolve developed a clear, human-readable algorithm that explains how the BASF network truly operates. The final algorithm successfully mirrored the actual historical performance of the supply chain, significantly reducing the error rates compared to the initial seed model. It automatically discovered factually correct, domain-specific supply chain rules, providing a clear foundation for optimizing asset utilization globally.

Learn from us: “We had several attempts to build a digital twin. … By using AlphaEvolve, we cannot only map the complex network based on system data, but at the same time understand and copy the human decisions that drive our daily operations.” – Dr. Goetz Krabbe, vice president for global supply chain at BASF

UKG unlocks real-time workforce intelligence at scale

Who: UKG is one of the leading providers of human capital management (HCM) and workforce management (WFM) solutions, but years of growth led to backend sprawl. They have 126 application teams, dozens of tech stacks, and more than 12,000 database instances.

What they did: To bring the full UKG suite onto one real-time foundation, the company built People Fabric, a new data and intelligence platform powered by AlloyDB for PostgreSQL and the just-announced Agentic Data Cloud. They created a custom change data capture (CDC) framework to extract changes from existing operational databases, and for larger analytical workloads, the same data flows into BigQuery, while Cloud SQL holds the metadata and tenancy context.

Why it matters: People Fabric gives UKG a complete and consistent view of people, work, pay, and culture data that’s updated continuously and ready for AI to use in real time. For engineering teams, People Fabric acts as a database-as-a-service that accelerates development and supports modernization without customer disruption. Additionally, migrating core person and employment data off their on-prem monolith has generated cost savings significant enough to fund half of People Fabric.

Learn from us: “As we continue expanding People Fabric, we’re laying the groundwork for deeper agentic automation, more responsive analytics, and a growing set of AI-driven capabilities — all on a trusted, scalable foundation built for what’s next.” – Radhi Chagarlamudi, Group Vice President, Product Engineering, UKG & Heather White, Cloud Data Architect, Google Cloud

WPP accelerates humanoid robot training 10x with G4 VMs

Who: WPP is one of the world’s largest marketing organizations, handling $70 billion of media for enterprise clients. They work on some of the most complex commercial film shoots and were eager to test the viability of robotic cameras to capture more footage, but this required complex training of physical AI models.

What they did: WPP used the new G4 VM instance powered by NVIDIA RTX PRO 6000 Blackwell on Google Cloud to tackle the unique challenges of training physical AI for robotics in videography settings. After capturing human motion with the OptiTrack mocap system, they undertook reinforcement learning using the AI Hypercomputer together with the NVIDIA Isaac Sim image. MuJoCo, an open source physics engine by Google DeepMind, was a critical piece of simulation software that validated accuracy continuously, in real-time.

Why it matters: WPP was able to utilize a P2P topology that moves data directly between GPUs without the bottleneck of central processing. They saw speed increases in excess of 10x, taking training times down to less than one hour. Through high-volume simulation, the humanoid robots learned how to respond to small changes and bridge the tough "sim-to-real" gap, helping ensure the robot's simulated adaptability translated to safety and stability in the real world.

Learn from us: "Our process for mastering complex, natural movement on a film set can be replicated across industries to overcome the massive computational complexity of training robots." – Perry Nightingale, SVP of Creative AI, WPP

Breuninger boosted sales with its "be your own model" AI

Who: Breuninger, a fashion and lifestyle company based in Germany, thought emerging generative media models could be a good fit to answer the question every online fashion shopper asks: "How will this look on me?"

What they did: Working with Google Cloud, they built a virtual try-on experience that lets shoppers see high-end fashion on their own bodies using a simple selfie. Using the Virtual Try-On (VTO) API, Breuninger’s data team worked directly with Google’s engineers to test and refine the technology in three stages, ultimately moving from pre-selected models to a user-first, selfie-based approach. The project was also part of Breuninger’s move to a Flutter-based platform, which helped the team move from its vision to a live launch in only three months.

Why it matters: During a six-week A/B test over Black Week and the holiday season, the team found that shoppers who used the virtual try-on converted purchases at a higher rate than those who didn't. Customer surveys reinforced the numbers: shoppers responded well to the high image quality and the personalized experience.

Learn from us: “Breuninger continues to refine the experience based on how customers actually use virtual try-on in everyday shopping — the same user-first approach that shaped the project from the start.” – Daniel Rascher, Senior Product Owner, Breuninger & Dr. Michael Menzel, Customer AI Specialist, Google Cloud

Glance turns hours of video into mobile-ready clips

Who: Glance, a mobile-first content platform, processes 1-2 hour videos from sources like podcasts, news reports, movies, and web series, and transforms them into 30 to 180-second vertical clips optimized for mobile lock screens.

What they did: The goal was to create a complete pipeline that takes a long-form landscape video (16:9) and outputs multiple ready-to-publish short-form portrait videos (9:16). The final technical solution uses Google Cloud Speech-to-Text v2, Gemini, and the Google Vision API, combined with custom video manipulation using Samurai (an open-source object tracking tool), OpenCV and MoviePy. The process involves audio extraction, speech-to-text transcription, and using Gemini 2.5 Flash to analyze transcript text and identify optimal start and end timestamps for short video clips.
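One small piece of a landscape-to-portrait pipeline is choosing the crop window for each frame. The sketch below shows the arithmetic, assuming an object tracker supplies the horizontal center of the subject. How Glance actually positions its crops is not described in the source, so the function name and the centering and clamping strategy are assumptions.

```python
from typing import Tuple


def portrait_crop(frame_w: int, frame_h: int, subject_x: float) -> Tuple[int, int, int, int]:
    """Return (left, top, width, height) of a 9:16 crop inside a landscape frame.

    The crop uses the full frame height, is centered on subject_x where
    possible, and is clamped so it never leaves the frame.
    """
    crop_w = frame_h * 9 // 16          # width of a 9:16 window at full height
    left = int(subject_x - crop_w / 2)  # center the window on the subject
    left = max(0, min(left, frame_w - crop_w))  # keep the window inside the frame
    return left, 0, crop_w, frame_h


# A 1920x1080 frame with the subject in the center, then near each edge.
centered = portrait_crop(1920, 1080, 960)
near_left = portrait_crop(1920, 1080, 100)
near_right = portrait_crop(1920, 1080, 1900)
```

For a 1920x1080 source the window is 607 pixels wide, so roughly two thirds of each frame is discarded. That is why the subject's position matters more than it would with a simple center crop.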

Why it matters: With daily volume projected to grow from 3,500 to over 10,000 videos per day, manual editing wasn’t a realistic path forward. Glance’s video pipeline demonstrates what becomes possible when AI handles the repetitive, judgement-intensive work of video editing. The system transforms thousands of long-form videos into mobile-ready clips each day, preserving narrative context while optimizing for vertical viewing. Rather than choosing between scale and quality, automated pipelines can deliver both.

Learn from us: “Glance’s video pipeline demonstrates what becomes possible when AI handles the repetitive, judgement-intensive work of video editing. … The approach offers a template for any organization sitting on long-form video archives. Rather than choosing between scale and quality, automated pipelines can deliver both.” – Himanshu Aggarwal, Machine Learning Engineer, Glance & Sharmila Devi, AI Consulting Lead, Google Cloud

Movix fills a gap in dental skills with specialized agentic AI

Who: Movix is building one of the first agentic AI solutions for dental appliance manufacturers and dental labs, to help solve a serious shortage of skilled dental technicians in aligner manufacturing.

What they did: Movix developed custom models for deep learning, computer vision, and 3D mesh analysis over a five-month period, using Google Cloud infrastructure. Once defects are detected, they use the Gemini Enterprise Agent Platform to generate client-facing feedback that reads as if it came directly from a human technician. Their 3D models use Cloud Run with L4 GPUs for the massive compute power required, and they use Compute Engine VMs to run experiments and train models.

Why it matters: Movix’s agentic solutions automate data entry and quality control, which are traditionally manual, time-consuming, and error-prone tasks. The automation and higher level of accuracy the QC agent delivers can save $300 per remake for an aligner manufacturer, and speed up the appliance manufacturing process with quicker turnaround times.

Learn from us: “We plan to build hybrid solutions … designing an architecture that connects our cloud-based AI agents with older, on-premises software that many conservative labs still use — through lightweight local connectors and standardized APIs. This will allow us to access a large market segment that has not yet migrated to the cloud.” – Marina Domracheva, CEO, Movix & Bakit Dzhumagulov, CTO, Movix

How Imgix processes 8 billion images daily with G4 VMs powered by NVIDIA Blackwell

Tue, 12 May 2026 16:00:00 +0000

The modern web is extremely visual. People are busy and easily distracted, and smart companies know they have just seconds to attract would-be customers with compelling images, videos, animations, and other eye-catching elements. That’s why iconic brands like Bugatti, Yeti, Porsche, Spotify, and Sonos rely on Imgix to be the engine driving their online visual media.

Every day, Imgix serves more than 8 billion images and videos for brands like these and many others. With a platform designed to unify media optimization, AI transformation, and global delivery, Imgix ensures that its partners’ digital experiences are fast, personalized, and built for performance. Now more than ever, leading organizations are demanding real-time, high-fidelity media, and they need it to be fast.

To meet that demand, Imgix has evolved its infrastructure from private data centers to a full-stack, GPU-based environment on Google Cloud’s AI Hypercomputer. By transitioning to G4 VMs powered by NVIDIA RTX PRO 6000 Blackwell GPUs, Imgix ramped up its real-time processing capabilities, cutting median latency by 50% and increasing throughput per node by 6x. And it did all of that without changing its core application code.

The challenge: Instant visuals at scale

To capture people’s attention, businesses need rich, fast-loading content that can reach millions of users simultaneously across a diverse array of devices.

A big part of that is real-time transformations — resizing, format negotiation, and applying artistic effects — and the computational power required for real-time transformations can be immense.

With inefficient technology, load times can be slow and brands risk giving their users poor experiences. Imgix’s solution to this challenge is a "just-in-time" philosophy: processing images instantly upon request rather than pre-rendering and storing millions of image variations. Achieving this requires high-performance instances, which G4 VMs provide.

Adopting the system that runs Google

When companies build on Google Cloud, they get more than just servers: they plug into the same intelligence engine powering Google's many billion-user products. Imgix is leveraging this structural advantage by using G4 VMs.

G4 VMs incorporate eight NVIDIA RTX PRO 6000 Blackwell GPUs, two AMD Turin CPUs, and Google Titanium offloads, which act as a dedicated administrative assistant for businesses’ servers. They handle the “office chores” of security and data traffic in the background while the main processor does a company’s heavy lifting.

The G4 VM’s custom P2P interconnect yields up to 168% more throughput than standard configurations. With this architecture, Imgix can move all its image processing operations to NVIDIA GPUs and run multiple requests in parallel.

Inside the Imgix architecture

Imgix offers more than 150 different visual filters and its architecture is built to handle transformation requests dynamically based on which filters users choose. The pipeline has four primary stages:

Ingestion: The system retrieves assets directly from customers and routes them to a 2.5 petabyte storage cache on Google Cloud Storage (GCS). This high-speed layer replaces unreliable random web requests with a redundant, geographically distributed infrastructure.
Decoding: High-performance C libraries, supplemented by nvJPEG, decode assets into raw RGBA data. This leverages the G4 VM’s massive parallelism to handle multiple decoding stages, including Huffman decoding, Inverse DCT, and color space conversion.
Transformation: A custom Vulkan compute shader stack handles the core processing. Rather than relying on fixed graphics pipelines, these shaders treat transformations (like resizing or masking) as parallel math problems, enabling thousands of simultaneous pixel operations on the G4 VM clusters.
Encoding and Delivery: Once transformed, images are re-encoded using hardware-accelerated tools like NVENC and delivered via a global CDN. Because the G4 VM includes independent hardware engines for NVENC (encoding) and NVDEC (decoding), concurrent image manipulations on the CUDA cores aren’t slowed down.
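The idea in the transformation stage of treating a resize as a parallel math problem can be shown on the CPU in a few lines. The sketch below is plain Python, not Vulkan and not Imgix's shader code. It uses nearest-neighbor sampling for simplicity, and the function name and tiny test image are hypothetical. The point is that every output pixel depends only on its own coordinates, so each one could be computed by a separate GPU thread.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int, int]  # raw RGBA values
Image = List[List[Pixel]]          # rows of pixels


def resize_nearest(src: Image, out_w: int, out_h: int) -> Image:
    """Nearest-neighbor resize where each output pixel is computed independently."""
    src_h, src_w = len(src), len(src[0])

    def sample(x: int, y: int) -> Pixel:
        # Map the output coordinate back to a source coordinate.
        # No other output pixel is read or written here.
        return src[y * src_h // out_h][x * src_w // out_w]

    return [[sample(x, y) for x in range(out_w)] for y in range(out_h)]


# A 2x2 image with four distinct colors, scaled up to 4x4.
red, green = (255, 0, 0, 255), (0, 255, 0, 255)
blue, white = (0, 0, 255, 255), (255, 255, 255, 255)
small = [[red, green], [blue, white]]
large = resize_nearest(small, 4, 4)
```

On a GPU, the two nested loops would be replaced by a grid of threads, one per output pixel, each running the equivalent of sample.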

Advanced video and image intelligence

Imgix is also using NVIDIA’s CUDA libraries for high-performance video analytics. By integrating NVIDIA DeepStream, it executes real-time object tracking within video streams for automated content analysis.

For static imagery, meanwhile, Imgix uses the nvJPEG library to offload computationally intensive JPEG decoding directly to the GPU. This prevents CPU bottlenecks during the ingestion of high-resolution assets while allowing the custom Vulkan compute shaders to begin immediate pixel-level transformations on the raw RGBA data residing in GPU memory.

The results: 50% faster and up to 6x more throughput

Thanks to its transition to G4 VMs, Imgix achieved the significant performance gains mentioned above without having to rewrite its core logic:

A 50% reduction in processing latency: It cut median latency from 100 milliseconds to 50 milliseconds.
A 5x to 6x increase in throughput: Its G4 VMs now handle up to six times the workload of its previous generation nodes.
Seamless migration: Imgix supported the G4 VMs by updating its Terraform scripts without needing to implement any application code changes.

"Building on Google Cloud's AI Hypercomputer isn't just about optimizing our current workloads; it's about future-proofing our platform. It gives us the foundational power to seamlessly weave advanced generative AI capabilities into real-time workflows, allowing our customers to push the boundaries of visual storytelling at global scale." - Alfonso Acosta, Head of Engineering, Imgix

Orchestrating at scale

To support the billions of image and video requests its customers process every day, Imgix built a sophisticated hybrid orchestration model:

Management: Google Cloud Run manages session and account layers.
Core Processing: Google Compute Engine managed instance groups host the G4 VMs, which allows custom software to use the entire machine with no container "slicing."
Dynamic Scaling: Autoscaling relies on custom application metrics, such as machine queue length, rather than standard CPU use. This ensures that the stack’s most expensive elements are tuned for maximum efficiency.
Self-Healing: A custom mechanism monitors logs for driver faults, automatically "reaping" and restarting GPU instances without manual intervention.
Optimization: To maintain peak performance, Imgix uses NVIDIA Nsight Systems to identify and resolve code bottlenecks.
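The scaling and self-healing items above can be sketched in a few lines. This is an illustrative sketch rather than Imgix's implementation: it assumes a standard target-tracking rule (scale the group in proportion to the observed per-machine queue length versus a target) and a hypothetical log pattern for driver faults. The article does not describe the actual metric thresholds or log strings, so `desired_instances`, `instances_to_reap`, and `DRIVER_FAULT` are all assumptions.

```python
from __future__ import annotations

import math
import re

# Hypothetical fault pattern. NVIDIA drivers report "Xid" events in the kernel
# log, but the strings Imgix actually watches for are not given in the article.
DRIVER_FAULT = re.compile(r"NVRM: Xid")


def desired_instances(current: int, avg_queue_per_machine: float,
                      target_queue_per_machine: float,
                      min_instances: int = 1, max_instances: int = 100) -> int:
    """Target-tracking rule: grow or shrink the group in proportion to how far
    the observed per-machine queue length is from the target, then clamp."""
    if target_queue_per_machine <= 0:
        raise ValueError("target_queue_per_machine must be positive")
    wanted = math.ceil(current * avg_queue_per_machine / target_queue_per_machine)
    return max(min_instances, min(max_instances, wanted))


def instances_to_reap(logs_by_instance: dict[str, list[str]]) -> list[str]:
    """Return the names of instances whose recent logs contain a driver fault."""
    return sorted(
        name for name, lines in logs_by_instance.items()
        if any(DRIVER_FAULT.search(line) for line in lines)
    )


if __name__ == "__main__":
    # 4 machines with 30 queued jobs each, against a target of 10 per machine.
    print(desired_instances(4, 30, 10))
    print(instances_to_reap({
        "gpu-a": ["request served"],
        "gpu-b": ["NVRM: Xid (PCI:0000:00:04): 79, GPU has fallen off the bus."],
    }))
```

Scaling on queue length rather than CPU use makes sense for GPU-bound work, where the host CPU can sit mostly idle while the accelerator is saturated.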

The future: From experimentation to execution

Even with the significant performance improvements it’s already achieved, Imgix is continuing to expand its AI infrastructure so its customers can access additional advanced capabilities like generative fill, background replacement, object removal, and image upscaling.

Features like these rely on high-performance machine learning systems that must process increasingly complex computations with no loss of speed or quality. By leveraging Google’s AI Hypercomputer, Imgix is now deploying and serving these models efficiently and offering its customers real-time, production-ready AI editing. And as demand grows for more dynamic and personalized visual experiences, this foundation is ensuring that Imgix can continue to deliver powerful capabilities reliably and at scale.

Get started

G4 VMs work natively with Google Compute Engine, Google Kubernetes Engine, Google Cloud Storage, and Vertex AI.

Dive deeper: Explore the Imgix architecture on GitHub.
Start building: Read the G4 VM documentation.

What’s new with the Cross-Cloud Network at Next ’26

Wed, 22 Apr 2026 12:00:00 +0000

While generative AI sparked a revolution, the true paradigm shift is the rapid evolution from standalone AI models to multi-agent autonomous systems. In this new era, the network transcends basic connectivity to become the critical integration layer for your agentic enterprise.

As AI agents and services surge, your core applications remain as vital as ever. To thrive in this rapidly evolving landscape, you need a planet-scale network to connect, protect, govern, deliver, and secure all your users, data, agents, AI services, and core applications across clouds and on-premises.

Google Cloud's Cross-Cloud Network provides this unified foundation, and is now used by 65% of the Fortune 100 and handles up to 27 exabytes of data per month. At Google Cloud Next, we are introducing networking innovations to accelerate your AI infrastructure, strengthen security, and simplify operations.

Optimized networking infrastructure for AI

As we move toward an agentic world, the network must support massive-scale inference paired with reinforcement learning. At Google, we’ve spent years refining this cycle to power our own global AI services. Today, we’re announcing AI infrastructure network innovations that bring this same architecture directly to your workloads, across agents, inference, training, and beyond.

Networking for agents

The Gemini Enterprise Agent Platform is a comprehensive enterprise environment designed to build, scale, govern, and optimize the next generation of autonomous agents. Key innovations being announced in preview include:

Agent Gateway: Air-traffic control for agentic traffic

Agent Gateway understands MCP and A2A agentic protocols and provides an open, extensible, scalable way to enforce centralized governance policies to securely connect agents, models, and tools across runtimes.

Ambient networking: A seismic shift in service-to-service connectivity

Ambient networking, a new integrated data plane for Google Kubernetes Engine (GKE) and Cloud Run, provides service discovery, zero-trust access, and traffic management without the need for complex and resource-heavy sidecar proxies. It reduces operational overhead and enables up to a 10x reduction in GKE resource usage for layer 4 (L4) mesh capabilities.

Ambient networking underpins two new capabilities:

Service bindings automatically establish service-to-service connectivity, allowing developers to move faster to build and scale their agentic applications and services.
Network Services Monitoring bridges application and network observability gaps, resulting in faster root-cause analysis and simplified troubleshooting.

Rich partner integrations and customizations

With the help of Service Extensions, we are developing solutions for identity, governance, and AI security for agent-to-anywhere traffic. Coming soon in preview to Agent Gateway are:

Identity and governance administration: Offering delegated authorization to Cloud IAM and partner services from Okta, Ping, Saviynt, and Silverfort to enforce real-time, contextual governance policies based on application and business context.
Runtime security: Serving as a universal enforcement point by integrating with Google Cloud’s Model Armor and partner solutions from Broadcom, Check Point, Cisco, CrowdStrike, Exabeam, F5, Netskope, Palo Alto Networks, Thales, and Zscaler. Together, these can help to secure agentic communications against emerging AI attack vectors.

These innovations are built on an open foundation including Envoy and Kubernetes, providing strong, integrated governance in multicloud environments using standard Kubernetes Gateway APIs.

Networking for inference

At Google we run inference at scale with optimized use of distributed GPU and TPU resources, automatic failover between regions for high availability, and optimized global request routing for fast end-user performance. GKE Inference Gateway delivers these capabilities to our cloud customers including the following new innovations:

Multi-region support allows scaling inference services across regions, enabling cross-regional failover, optimized utilization, and reduced global latency (preview).
Predictive latency boost improves utilization with intelligent request routing based on predefined performance targets (preview).
Disaggregated serving leverages llm-d’s SGLang support, offering the flexibility to choose between vLLM and SGLang for model serving (GA).

Gemini Enterprise Agent Platform reduced Time to First Token (TTFT) latency by over 35% for Qwen3-Coder by using GKE Inference Gateway.

“Before GKE Inference Gateway, managing our inference stack with Ray Serve created a complex, dual-orchestration layer that was a significant burden on our small operations team. Moving to the Inference Gateway and native Kubernetes deployments was the 'North Star' architecture we needed to simplify management and achieve robust production stability with a GKE-native batteries-included solution.” - Mikhail Lubinets, Lead HPC Engineer, Technology Innovation Institute

Networking for training

At Google, we build and run the largest AI models in the world — and we built a network to support that. Some of the new enhancements are:

Massive scale with Virgo Network

This new non-blocking data center fabric removes latency barriers:

Virgo can link up to 134,000 chips with 47 petabits/sec of non-blocking bisection bandwidth to deliver 1.7K exaflops of compute.
With enhancements in Pathways and JAX, you can further connect these Virgo fabrics to scale to over 1 million TPU chips in a single training cluster.
We are also making Virgo Network available on NVIDIA Vera Rubin NVL72, supporting up to 960,000 GPUs.
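To put the single-fabric figures in perspective, the sketch below divides the quoted totals evenly across the quoted chip count. These are back-of-the-envelope averages derived only from the numbers above, not published per-chip specifications. Bisection bandwidth in particular measures capacity between two halves of the fabric, so the even split is only a rough scale indicator.

```python
CHIPS = 134_000
BISECTION_BITS_PER_SEC = 47e15   # 47 petabits/sec
COMPUTE_FLOPS = 1.7e21           # 1.7K exaflops = 1,700 exaflops


def avg_bandwidth_gbps_per_chip() -> float:
    """Bisection bandwidth split evenly across all chips, in Gbit/s."""
    return BISECTION_BITS_PER_SEC / CHIPS / 1e9


def avg_petaflops_per_chip() -> float:
    """Aggregate compute split evenly across all chips, in petaflops."""
    return COMPUTE_FLOPS / CHIPS / 1e15


if __name__ == "__main__":
    print(f"{avg_bandwidth_gbps_per_chip():.0f} Gbit/s per chip (naive average)")
    print(f"{avg_petaflops_per_chip():.1f} petaflops per chip (naive average)")
```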

For more on Virgo Network, check out this blog.

Accelerator network profiles

It’s easier than ever to handle the complex networking prerequisites for accelerator-equipped GKE node pools with DRANET, which improves bandwidth for distributed AI/ML workloads by up to 60% (GA).

AI-native Cloud Interconnect

SLA-backed, and optimized for efficiency, Cloud Interconnect supports petabit-scale data transfers and is available with a fixed price option. Cloud Interconnect now supports:

400 Gbps circuits with up to 3.2 Tbps in a single connection (GA)
Partner Cross-Cloud Interconnect for AWS (GA), CoreWeave (in preview soon), and Lumen (in preview soon)
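As a rough sizing aid, the sketch below works out how many 400 Gbps circuits make up a 3.2 Tbps connection and how long an ideal bulk transfer would take at that rate. It assumes full line rate with no protocol overhead or contention, so real transfers will be slower. The function names are illustrative.

```python
CIRCUIT_GBPS = 400
CONNECTION_GBPS = 3200  # 3.2 Tbps


def circuits_per_connection() -> int:
    """Number of 400 Gbps circuits bundled into one 3.2 Tbps connection."""
    return CONNECTION_GBPS // CIRCUIT_GBPS


def ideal_transfer_seconds(petabytes: float, gbps: float = CONNECTION_GBPS) -> float:
    """Seconds to move `petabytes` (decimal PB) at `gbps`, ignoring overhead."""
    bits = petabytes * 1e15 * 8
    return bits / (gbps * 1e9)


if __name__ == "__main__":
    print(circuits_per_connection())            # 8
    print(ideal_transfer_seconds(1) / 60)       # about 41.7 minutes for 1 PB
```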

Cross-Cloud Network for AI and core applications

The Cross-Cloud Network helps ensure you can securely connect users, data, locations, applications, services, and infrastructure anywhere in the world, at planetary scale. We designed our global multi-shard network to scale horizontally to meet the demands of the AI era and enable us to accommodate our 10x WAN traffic growth from 2020 to 2025.

These are some of the improvements we’re making to the Cross-Cloud Network:

Ultra Low Latency Solution for financial exchanges

In partnership with CME Group, we are bringing the world's leading derivatives marketplace to Google Cloud. To support CME Group’s performance requirements, we developed an ultra low latency (ULL) networking and compute solution. This fully managed cloud environment will allow CME Group and its clients to migrate its core trading systems to Google Cloud.

Now in preview, the solution is designed to meet the unique and exacting requirements of running financial exchanges in the cloud. It includes several new technologies:

Deterministic high-performance compute powered by ULL networking, with bare metal and VM form factors, delivers a comprehensive portfolio for your trading compute needs.
Scalable multicast data distribution with hardware-based ultra-low latency enables reliable one-to-many market data sharing.
Nanosecond-level clock sync enabled by Firefly, a novel clock synchronization system. Firefly achieves sub-10ns NIC-to-NIC synchronization to support high-frequency trading.
Advanced network observability with 64-bit nanosecond timestamps, support for multiple traffic-mirroring destinations and multicast traffic, and support for auditing and regulatory requirements.
Low-latency inference allowing exchange participants to connect their AI-driven services to the exchange’s infrastructure.
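To illustrate why 64-bit nanosecond timestamps suit this kind of observability, the sketch below packs a nanosecond counter into eight bytes, computes a one-way delay from two such timestamps, and shows the range an unsigned 64-bit nanosecond counter can cover. The wire format is an assumption (unsigned, network byte order, nanoseconds since the Unix epoch). The article does not specify how the mirrored packets encode their timestamps.

```python
import struct
import time


def pack_timestamp(ns: int) -> bytes:
    """Encode a nanosecond timestamp as an unsigned 64-bit big-endian field."""
    return struct.pack("!Q", ns)


def unpack_timestamp(raw: bytes) -> int:
    """Decode an 8-byte field produced by pack_timestamp."""
    return struct.unpack("!Q", raw)[0]


def one_way_delay_ns(sent_ns: int, received_ns: int) -> int:
    """Delay between two timestamps taken on synchronized clocks."""
    return received_ns - sent_ns


def range_years() -> float:
    """Span in years that an unsigned 64-bit nanosecond counter can represent."""
    return 2**64 / 1e9 / (365.25 * 24 * 3600)


if __name__ == "__main__":
    now = time.time_ns()
    assert unpack_timestamp(pack_timestamp(now)) == now
    print(f"range: about {range_years():.0f} years")
```

A delay computed this way is only as trustworthy as the clock synchronization between the two capture points, which is why sub-10ns synchronization matters for nanosecond-resolution timestamps.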

“The Google Cloud Ultra Low Latency Solution provides the level of performance necessary for CME Group futures and options markets to run in the cloud, expanding access to clients worldwide.” - Sunil Cutinho, CIO, CME Group

Cross-cloud observability for networks, applications, and agents

Whether you’re running core applications or new AI agents, you need visibility into your network infrastructure. Cloud Network Insights, now in preview, offers network performance monitoring (NPM) and digital experience monitoring (DEM) to dramatically reduce the mean time to detect and mitigate network-related agent, application, and API issues.

Cloud Network Insights is enabled by technologies from Broadcom’s AppNeta and powered by AI, enabling natural language queries through Gemini Cloud Assist.

"In an environment as complex and high-scale as Sabre’s, total visibility isn't just a luxury — it's a requirement for operational resilience. Cloud Network Insights will enable us to further shift our posture from reactive troubleshooting to proactive optimization. By providing granular, real-time telemetry across our global cloud footprint, it helps eliminate the traditional 'black box' of the network, allowing our teams to resolve bottlenecks before they impact the traveler experience." - Alfredo Rodriguez, VP Cloud Platform Infrastructure, Sabre Corporation

Cross-Cloud Network for distributed applications

Multicloud and hybrid networks require secure, reliable, and high-performance connectivity. New enhancements for our foundational networking services and tools include:

Private Service Connect

Private Service Connect traffic volume grew 4x in 2025 and it now supports 40+ Google and third-party published services, enabling secure private global access to your managed services.
Private Service Connect endpoint-based security allows for granular authorization policies for producer-to-consumer service communications (preview).
Gemini Cloud Assist for Private Service Connect provides for automated troubleshooting (preview).

Cloud-native IP address management (IPAM)

Cloud Number Registry is an IPAM solution powered by agentic technologies. Network admins can easily find free IP ranges, track utilization, and allocate resources (preview). It also integrates with Infoblox Universal DDI for Cross-Cloud Network IPAM discovery and enforcement.
Hybrid Subnets allow you to migrate legacy workloads from on-premises to a VPC without needing to change hard-coded IP addresses (GA).
Cloud NAT allows you to connect your IPv6-only workloads to private IPv4 destinations using the combined power of DNS64 and private NAT64 (in preview soon).
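For readers unfamiliar with DNS64 and NAT64, the sketch below shows the address mapping they rely on. DNS64 synthesizes an IPv6 (AAAA) answer by embedding the destination's IPv4 address in a /96 prefix, and NAT64 recovers the IPv4 address from the low 32 bits. The sketch assumes the well-known prefix 64:ff9b::/96 from RFC 6052. The prefix that Cloud NAT uses for private NAT64 is not stated here and may differ.

```python
import ipaddress

# Assumption: RFC 6052 well-known prefix; a given deployment may use another /96.
NAT64_PREFIX = ipaddress.IPv6Network("64:ff9b::/96")


def synthesize_ipv6(ipv4: str,
                    prefix: ipaddress.IPv6Network = NAT64_PREFIX) -> ipaddress.IPv6Address:
    """DNS64 side: embed an IPv4 address in the low 32 bits of a /96 prefix."""
    if prefix.prefixlen != 96:
        raise ValueError("this sketch only handles /96 prefixes")
    v4 = ipaddress.IPv4Address(ipv4)
    return ipaddress.IPv6Address(int(prefix.network_address) | int(v4))


def extract_ipv4(ipv6: str) -> ipaddress.IPv4Address:
    """NAT64 side: recover the IPv4 destination from a synthesized address."""
    return ipaddress.IPv4Address(int(ipaddress.IPv6Address(ipv6)) & 0xFFFFFFFF)


if __name__ == "__main__":
    synthesized = synthesize_ipv6("192.0.2.33")
    print(synthesized)                      # 64:ff9b::c000:221
    print(extract_ipv4(str(synthesized)))   # 192.0.2.33
```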

Network Connectivity Center (NCC)

Partner Cross-Cloud Interconnect for AWS is available as a connectivity type in NCC (preview).
Support for static routes using an internal load balancer as the next hop allows the integration of Secure Web Proxy and third-party network security virtual appliances (GA).
Support for privately used public IP (PUPI) allows the exchange of PUPI IPv4 addresses with VPC spokes and producer VPC spokes (GA).

Granular networking charge visibility

Cost Explorer and the new App Optimize API now provide attribution of associated Data Transfer costs to the originating resources for Google Cloud products (in preview soon).

Cross-Cloud Network for internet-facing services

As part of Cross-Cloud Network, the Global Front End simplifies how you deliver, scale, and protect web, API, and AI workloads. New capabilities include:

Global Front End Enterprise delivers simplified consumption by combining capabilities from global Cloud Load Balancing, Google Cloud Armor, Cloud CDN, and Service Extensions with up to 15% lower TCO (in preview soon).
Post quantum cryptography (PQC) helps secure your workloads with industry-standard algorithms that provide a layered defense against both classical and quantum adversaries.
Google tag gateway enables advertisers to serve tags from their own domain, which can significantly improve the accuracy and resilience of measurement signals (GA soon).

In addition, Cloud CDN, an important part of the Global Front End, now offers:

Built-in image optimization to help you deliver content that best fits your end users’ screens and saves on bandwidth costs (in preview soon).
GKE Gateway support so you can enable and manage caching services using GKE APIs (GA).

Cross-Cloud Network’s Cloud WAN for global enterprises

Cloud WAN is a fully managed, reliable global backbone to connect your enterprise. New capabilities include:

Expanded geographic reach: Our network spans more than 10 million kilometers of terrestrial and subsea fiber, and Network Connectivity Center’s site-to-site data transfer is now available in over 25 countries.
NCC Gateway enables third-party secure service edge (SSE) integrations from Palo Alto Networks (GA soon) and Symantec (preview).
The Verified Peering Provider program, which offers highly reliable internet connectivity to Google, now has dramatically expanded availability through 175+ providers worldwide.
Last mile connectivity: Provision site-to-cloud private connectivity in minutes with preferred partners from the Google Cloud console (in preview soon).

“Cloud WAN enables Dun & Bradstreet to evolve our global network via composable, cloud-native constructs. Leveraging NCC, we’ve built a resilient, high-performance platform that simplifies operations and optimizes costs. This foundation supports continued modernization and AI-driven workloads. We expect to extend this architecture as new patterns emerge, maintaining our blueprints-first approach.” - Josh Barry, VP, Network Engineering, Dun & Bradstreet

AI-powered security against evolving threats

The threat landscape is evolving faster than ever, with AI-driven attacks. Staying ahead requires the latest defenses. Cross-Cloud Network relies on Cloud NGFW and Cloud Armor for advanced security capabilities. Here’s the latest on those offerings.

Cloud NGFW

Advanced malware sandbox uses AI models trained on data from 70k+ customers to stop 99% of known and unknown malware, including evasive zero-days. Advanced malware sandbox is powered by Palo Alto Networks Advanced WildFire (in preview soon).
Internal Application and proxy Network Load Balancer support helps to enforce consistent, service-centric security for abstracted services like GKE, Cloud Run, and Private Service Connect traffic (preview).
Project-level policies allow for creating and managing Cloud NGFW endpoints, security profiles, and security profile groups at the project level (in preview soon).

Cloud Armor

Managed rules, built-in rulesets across 15 threat categories, deliver automated threat protection against a broad set of attacks and zero-day CVEs. This is powered by Thales Imperva based on visibility to 1.5 trillion web requests each month (in preview soon).
Google Cloud Fraud Defense integration helps to discern the legitimacy and authorization of bots, humans, and agents. Fraud Defense is the evolution of reCAPTCHA, which protects over 14 million domains from fraud and abuse.
Adaptive protection for Network Load Balancers & VMs brings advanced machine learning to L3/L4 traffic, to detect and mitigate volumetric DDoS attacks (in preview soon).
A simplified user experience with a visual rule builder makes custom rule creation easier (in preview soon).

AI-powered network operations

Finally, new AI-powered technologies in Gemini Cloud Assist can help automate manual tasks, ease troubleshooting, predict reliability issues, improve security, and help optimize your network to reduce toil and improve reliability with new specialist agents. These include:

A network security agent that streamlines network security operations by assisting with policy generation, recommendations, and impact analysis (in preview soon).
A network agent that optimizes workload placement for performance and reliability, and also provides advanced cost estimation for observability services (in preview soon).

Additionally, to enable customers and partners to build their own agents, we are releasing Network observability MCP tools and agent skills. These will allow their agents to run connectivity tests and to query VPC Flow Logs in natural language (both in preview).

The network that scales with you

We built our Cross-Cloud Network on the same global infrastructure that powers Google’s largest AI and internet services. This provides you with a blazing-fast, planet-scale foundation that is both secure by design and open by principle, allowing you to integrate your trusted partners across any environment.

As we move into the agentic era, our flexible, future-proof solutions ensure you can quickly adopt the latest AI technologies while maintaining the reliability of your core applications.

Whatever comes next, we’ve built the network to help you lead it. Attend our networking sessions at Next ’26, or learn more about the Cross-Cloud Network!

Building the Agentic Enterprise with Google Cloud partners and a $750M innovation fund

Wed, 22 Apr 2026 12:00:00 +0000

We are now seeing the Agentic Enterprise become reality for customers, and this week at Next ’26 we are announcing exciting new innovations to help customers accelerate agentic AI even further.

Our partners play a critical role in enabling the Agentic Enterprise, and today we are also announcing new resources, technologies, and deep technical partnerships to ensure we offer customers the industry’s most capable partner ecosystem for the agentic era, including:

A $750 million partner fund for agentic development applicable across global consulting firms, software partners, and our channel partners.
New ways for customers to deploy partner agents in Gemini Enterprise.
Deeper and more technical partnerships with global consulting firms to support customers, including with new teams of Google forward deployed engineers.
Integrating Gemini models more deeply into enterprise platforms from Palantir, Salesforce, SAP, ServiceNow, and more.
More AI-powered features in Google Cloud Partner Network to help our partners deliver high quality services.

Investing to accelerate AI agent development

We are committed to offering customers the most AI-capable partner ecosystem in the industry. To empower our partners to drive real transformation in the agentic AI era, we are launching a $750 million innovation fund to accelerate agent development and deployment globally, applicable to every business process, function, and industry.

This funding will support a wide range of activities including:

Hands-on support for software companies to build AI agents into their products with the Gemini Enterprise Agent Platform and bring them to market through our Agent Marketplace and through the new Agent Gallery in Gemini Enterprise.
Expert Google forward deployed engineers (FDEs) who will partner with major systems integrators to help their customers solve deep technical challenges and deploy Google AI more rapidly.
Deployment and usage incentives to help services partners thrive in the agentic era.
Training, technical development initiatives, and workshops to help partners build and deploy agents for customers using Gemini Enterprise Agent Platform.

Surfacing partner-built agents in Gemini Enterprise

At Next, we’re announcing Gemini Enterprise Agent Platform, a comprehensive platform to build, scale, govern, and optimize agents. It includes Agent Gallery, where customers can browse a highly vetted set of agents built by our partners.

Today, Agent Gallery provides access to a wide range of third-party agents. These agents have been built on top of our secure, enterprise-grade infrastructure, meaning customers can deploy them within their businesses with the highest levels of governance and confidence. Today, this includes agents built by Accenture, Adobe, Atlassian, Deloitte, Lovable, Oracle, Palo Alto Networks, Replit, S&P Global, Salesforce, ServiceNow, Workday, and more.

Empowering global consulting partners to drive AI transformations

Today, Google Cloud’s global consulting and systems integrator partners offer customers more than 330,000 experts trained on implementing Google AI. At Next, we are expanding our partnerships with every major systems integrator, including:

Accenture is helping enterprises drive AI-powered reinvention and business value faster and at scale with the launch of a first-of-its-kind Gemini Enterprise Acceleration Program. The program brings elite engineering and forward deployed engineers from Google Cloud and Accenture directly to customers.
BCG is expanding its partnership with Google Cloud to accelerate Gemini Enterprise transformation, helping organizations deliver at-scale agentic adoption.
Capgemini is establishing a Google Cloud AI Enterprise Hub to accelerate enterprise-scale adoption of Gemini Enterprise.
Cognizant is launching a dedicated Gemini Enterprise practice group to accelerate enterprise adoption of Gemini Enterprise.
Deloitte is forming a dedicated Google Cloud Agentic Transformation practice focused on Gemini Enterprise and will roll out Gemini Enterprise to more than 100,000 of its own teams.
HCLTech is launching a Gemini Enterprise Business Unit to accelerate the development and adoption of industry-specific solutions built on Gemini Enterprise.
Infosys is leveraging Gemini Enterprise within its Infosys Topaz platform and is equipping more than 100,000 Infosys developers across Infosys’ global delivery teams with Gemini Enterprise.
KPMG is deploying Gemini Enterprise with a life sciences company and launching a new Financial Close Companion agent built with Workday and Google Cloud.
Kyndryl is deepening its Google Cloud partnership with expanded Google Distributed Cloud services for sovereign, AI-ready applications.
McKinsey is launching the McKinsey Google Transformation Group to accelerate enterprise AI outcomes with Gemini Enterprise, combining its strategic expertise with Google's AI stack to help organizations scale agentic transformation.
PwC is launching a dedicated Google Cloud AI Center of Excellence to help organizations scale AI adoption, pairing industry expertise with Gemini Enterprise to deploy AI agents that reason, act, and automate processes at scale.
TCS is launching new agentic AI offerings and a dedicated Gemini Enterprise practice, featuring more than 3,000 industry-focused AI agents and an expanded global network of Gemini Experience Centres to accelerate AI-native, autonomous enterprise operations.

Additionally, we will begin to embed teams of Google Cloud engineers with a subset of global partners, including Accenture, Capgemini, Cognizant, Deloitte, HCLTech, PwC, and TCS, in order to help their customers more rapidly prototype and deploy AI agents within their businesses. AI-native services partners, including Altimetrik, Artefact, Covasant, deepsense.ai, Distyl.ai, Northslope, Quantium, Tribe.ai, and Tryolabs, will launch Gemini Enterprise practices, receiving credits for sandbox development, technical upskilling, and referral opportunities.

We are also rolling out a new program offering early model access for a select group of partners, including Accenture, Bain & Company, BCG, Deloitte, and McKinsey, who will be able to preview and begin building with pre-release versions of upcoming Google DeepMind models.

Bringing Gemini to more customers through popular SaaS platforms

Already, many of the world’s leading agentic SaaS and AI platform companies integrate Gemini into their products. At Next, we’re expanding these integrations even further, including:

Atlassian is bringing Gemini 3 Flash to Rovo and integrating multimodal capabilities into Remix in Confluence, helping teams instantly transform text-based documentation into high-fidelity diagrams and charts for faster stakeholder decision-making.
Box is launching new Box Agents powered by Gemini 3 Flash and Gemini Enterprise, helping enterprises transform static files into actionable intelligence by natively integrating AI orchestration into their secure content management workflows.
DocuSign is using Gemini to power new features that summarize complex agreements, identify key clauses, and help users understand the implications of their contracts.
Oracle is launching the Oracle AI Database Agent for Gemini Enterprise. This new agent enables end users to ask business questions of their Oracle data in natural language in Gemini Enterprise, without needing to write SQL or understand the underlying data model.
Palantir is adding Gemini and BigQuery integrations for commercial customers, enabling customers to connect best-in-class models to their most critical AI workflows and operations.
Salesforce is adding native Gemini support to its Atlas Reasoning Engine. This enables Agentforce to “see” across text, image, and video formats, drawing from years of customer history to accurately solve complex problems. This means faster, smarter resolutions, building upon the success thousands of customers are already seeing from Gemini within Agentforce to build prompts.
SAP is integrating Gemini Enterprise into its Engagement Cloud to deliver AI-powered customer service and sales insights alongside creative tools for image and text generation.
ServiceNow is integrating its AI agents with Gemini Enterprise, bringing autonomous operations to the world’s largest enterprises.

Building a partner channel for the agentic era

Our new partner program, Google Cloud Partner Network, is designed to help partners thrive in the agentic era. Last year, we used AI to unlock deep insights across our partner tools; now, we are building the agentic workflows that turn those insights into autonomous growth. Key updates include:

The Partner Agent: Integrated into the Partner Network Hub, this agentic tool acts as a central orchestrator for the partner experience. Beyond answering questions, it actively guides partners on next steps, summarizes complex assets, and provides real-time coaching for registrations and statements of work.
The Agentic Earnings Hub: Here, partners can find new capabilities to auto-draft statements of work and monitor consumption milestones to auto-generate claim requests. When paired with the Earnings Potential Modeler, these tools provide contextual recommendations to map every available incentive down to the individual client level.
Partner Finder: We are also extending this intelligence to customers, turning discovery into a conversational experience where natural language prompts pinpoint the ideal partners for the most hyper-specific workloads.

Finally, we’re honored to highlight the winners of Google Cloud’s 2026 Partner Awards, which celebrate the transformative impact and incredible value our partners have delivered for customers over the past year. Our ecosystem continues to evolve to meet the needs of businesses across every industry, and we are constantly impressed by their ability to solve complex, global challenges using our technology. To discover more about these exceptional achievements, please read our full list of partner award winners.

I can’t wait to meet with thousands of you this week to build the future of the Agentic Enterprise!

Building the agentic future: A spotlight on Google Cloud’s media & entertainment partner ecosystem

Thu, 16 Apr 2026 13:15:00 +0000

As we gather in Las Vegas for NAB Show 2026, the industry conversation has shifted. We are no longer asking if AI works; we’re now focused on how it scales. The era of AI experimentation is over — production-grade execution is here.

At Google Cloud, we believe no studio or broadcaster should have to build this future in isolation. Our mission is to provide the agentic platform and AI and cloud tools that allow our partners to innovate at the speed of ideas — from the tools used in the edit suite, to the technology that delivers video to millions of viewers worldwide.

Enhancing production: From manual tasks to intelligent assistants

Modern creative workflows are often slowed down by manual technical tasks. Google Cloud is working with ecosystem leaders to integrate advanced AI capabilities directly into the core of production software, so creators can focus on their artistry, not tedious tasks. With AI acting as a proactive assistant within the creative suite, production teams can significantly reduce the time between a raw idea and a finished frame.

Avid: With the launch of Content Core on Google Cloud, Avid is delivering a truly cloud-native studio. And by integrating multimodal AI search into Media Composer, editors can find the exact frame they need using natural language, turning hours of logging into seconds of discovery.
Backlight: Backlight makes complex media workflows simple for teams of all sizes, from production through monetization. Built on Google Cloud with the Video Intelligence API, Backlight's Iconik platform automatically adds searchable metadata upon upload. Customers see up to 50% shorter production cycles and save up to 60% on storage by deeply understanding their media libraries, reducing duplications, and optimizing asset placement.
Brahma.ai: Brahma AI, an enterprise AI content platform, is powering high-fidelity digital likenesses across retail, entertainment, and healthcare, making them interactive and intelligence-driven within a secure and governed framework.

Unlocking content value: From static archives to active assets

Data is only as valuable as the insights you can extract from it. Our partners, listed on the Google Cloud Marketplace, are using generative media models to transform massive, static archives into searchable, revenue-generating engines. By making every frame discoverable, we’re helping media companies turn decades of history into immediate opportunities.

Ateme: Ateme helps simplify global distribution with its new generative AI-powered subtitling solution, which can significantly reduce the manual labor of localizing different media types.
Perfect Memory: Perfect Memory helps customers turn traditional storage into a context-aware knowledge engine. The platform understands the relationships between athletes, historical events, and emotional nuances — transforming massive media archives into an intelligent library that lets creative teams instantly surface the perfect content for any story.
Vionlabs: Working with companies like Cineverse, Plex, and Crunchyroll, Vionlabs uses AI to analyze and index content libraries, making video assets more accessible and enabling metadata generation. By understanding the specific mood and aesthetic of each scene, Vionlabs helps streaming platforms move beyond basic genre tags toward more nuanced, sentiment-driven content discovery and marketing.

Scaling global reach: From simple streams to audience growth

To succeed today, media companies must provide a smooth viewing experience and easy payment options. Our ecosystem provides the tools to grow a company’s reach and maximize the value of every subscriber through reliable, high-quality delivery.

Bending Spoons: By leveraging the global scale of Google Cloud, Bending Spoons’ properties such as Brightcove and Vimeo are delivering professional-grade tools for large enterprises, SMBs, the next generation of creators, and more. Its platforms ensure that high-quality video production and distribution are accessible to everyone, from global brands to independent storytellers.
Bitmovin: Bitmovin enables streaming services to scale efficiently while delivering a premium experience across the widest range of devices. By combining real-time observability with AI-driven insights, media teams can proactively optimize engagement and monetization. Furthermore, Bitmovin’s advanced encoding ensures superior visual quality at lower bitrates, supporting everything from high-demand Video on Demand (VOD) to massive, 24/7 live events.
Evergent: Evergent automates complex billing and monetization workflows for AI-powered revenue management. Media and telecommunications companies can use Evergent’s tools to maximize subscription growth and improve long-term customer retention through personalized and agile payment offers.
Harmonic: Harmonic is helping major broadcasters like Grupo Globo modernize their operations. By integrating new digital broadcast capabilities into their cloud-based streaming solutions, Harmonic provides leaders with a faster, more efficient path to manage video processing and delivery at a massive scale.

Ensuring reliability: From infrastructure to a foundation of trust

High-quality content requires a high-performance foundation. We are partnering with infrastructure leaders to ensure that even the most complex global broadcasts remain stable, secure, and responsive.

Zixi: Zixi provides the broadcast-grade transport and workflow automation needed to move professional video across any network. By offering centralized control and complete visibility into the delivery process, Zixi ensures that leaders like Fubo can manage high-stakes, broadcast-quality live events without the risk of a signal drop.

Visit the ecosystem in action

The strength of our ecosystem is its integration across all aspects of the media and entertainment landscape. From the cameras, to the cloud, to the viewers' screens, these partners represent the future of a more creative, efficient, and agentic media industry.

Visit the Google Cloud Booth (West Hall, #W2731) at NAB Show from April 19-22 to see many of these partners in action through live demonstrations and theater sessions.


Cool stuff Google Cloud customers built, April edition: BMW big on SLMs, MLB’s Scout Insights AI, personalized resort experiences

Wed, 15 Apr 2026 16:00:00 +0000

AI and cloud technology are reshaping every corner of every industry around the world. Without our customers, who are building the future on our platform, there would be no Google Cloud. In this regular round-up, we dive into some of the exciting projects redefining businesses, shaping industries, and creating new categories.

For our latest edition, we learn why BMW Group is experimenting with small language models (SLMs); catch AI-powered commentary from Major League Baseball; hit the slopes with Vail Resorts’ AI concierge; build an intelligent grid with CTC Global; witness how ID.me created secure global scale; and see how Manhattan Associates’ supply chain tools now handle 1 billion daily API calls.

Be sure to check back next month to see how more industry leaders and exciting startups are putting Google Cloud technologies to use. And if you haven’t already, please peruse our list of 1,001 real-world gen AI use cases from our customers.

BMW tests the big potential of small models

Who: As one of the world’s leading providers of premium cars and motorcycles, BMW Group is always at the forefront of automotive technology. This ethos pushed the company to test what type of AI language models are ideally suited to driving situations, where access to cloud-based LLMs isn’t always possible.

What they did: BMW Group wanted to explore the potential of small language models (SLMs), which could run within the limited hardware on a vehicle. Finding the right trade-off between size and capability requires careful optimization, and the sheer volume of viable combinations renders manual searches for the optimal configuration an incredibly tedious, if not impossible, undertaking. To overcome this challenge, BMW and Google built automated, reproducible workflows through executable pipelines using Vertex AI.

Why it matters: The path from a general-purpose LLM to a specialized SLM isn’t straightforward. Every choice — from type of quantization to characteristics and contents of the fine-tuning domain-specific dataset — affects the quality and efficiency of the final model. This creates an exponential range of configurations, each with different trade-offs. It’s a great example of using AI to scale an optimization problem for other AI.

Learn from us: “With automated pipelines, we can rapidly adapt models to our domain and rigorously test and evaluate them against domain-specific benchmarks. This allows us to iterate and optimize models in hours rather than days.” – Dr. Céline Laurent-Winter, vice president, Connected Vehicle Platforms at BMW Group
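To make the scale of that search concrete, here is a minimal sketch of how a configuration space multiplies. The axes, values, and memory budget are hypothetical and are not BMW's actual search space or pipeline; the sketch only shows how choices combine and how a hardware budget can prune them before any evaluation runs.

```python
from itertools import product

# Hypothetical search axes; the values are illustrative only.
BASE_PARAMS_BILLIONS = [1, 3, 7]
QUANT_BITS = [4, 8, 16]
FINETUNE_SETS = ["none", "small", "large"]


def model_size_gb(params_billions, bits):
    """Approximate weight size in GB: parameters times bits per weight, over 8."""
    return params_billions * bits / 8


def all_configs():
    """Enumerate every combination of the three axes."""
    return list(product(BASE_PARAMS_BILLIONS, QUANT_BITS, FINETUNE_SETS))


def configs_within_budget(budget_gb):
    """Keep only configurations whose weights fit the assumed memory budget."""
    return [c for c in all_configs() if model_size_gb(c[0], c[1]) <= budget_gb]
```

Even three small axes yield 27 combinations, and each extra axis multiplies the total, which is why an automated, reproducible pipeline is preferable to a manual search.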

MLB Scout Insights: AI-powered color commentary

Who: Major League Baseball is famous for its colorful announcers. Now, MLB is bringing more baseball color straight to your pocket, and Gemini is helping give it a voice.

What they did: Each season, millions of baseball fans use the MLB app and tap over to the Gameday feature for live, up-to-the-pitch action across more than a dozen games. Starting this season, the league launched MLB Scout Insights in Gameday, which uses Gemini models to quickly scan decades of game and player data, cross-reference it with situational game scenarios, and then deliver game-relevant context during key matchups.

Why it matters: Given the sport’s storied history, 162-game regular season, and global reach, baseball fans are among the most sophisticated and passionate out there. To keep them engaged with Gameday and the MLB app, the league wanted to deliver insights that truly felt meaningful and interesting. Building the tool meant answering a rather squishy question: What makes an insight actually insightful, not just an accurate fact, and how can an AI learn that distinction? The answer came from some clever “surprisal” analysis.

Learn from us: "With Scout Insights, every fan can feel like the smartest person in the stands, at the water cooler, or on the couch. It’s about deepening connections to the game, and sharing that passion with others. That’s the magic of sports, and we’re making more of it possible with the magic of AI." – Josh Frost, senior vice president of product & Matt Graser, director of engineering, Major League Baseball
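The article does not describe how MLB's "surprisal" analysis works, so the following is only a sketch of the standard information-theory definition, in which an event with probability p carries -log2(p) bits of surprise. The function names and the idea of ranking candidate facts this way are assumptions for illustration.

```python
import math


def surprisal_bits(probability):
    """Surprisal of an event in bits: rarer events score higher."""
    if not 0 < probability <= 1:
        raise ValueError("probability must be in (0, 1]")
    return -math.log2(probability)


def rank_by_surprisal(events):
    """Sort (label, probability) pairs from most to least surprising."""
    return sorted(events, key=lambda e: surprisal_bits(e[1]), reverse=True)
```

Under this definition, a fact that holds half the time scores 1 bit, while a one-in-a-hundred event scores well over 6 bits, which gives a simple numeric handle on "interesting" versus merely "accurate."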

Vail Resorts makes personalized AI assistance easy

Who: Vail Resorts operates some of the most iconic and beloved mountain destinations in the world, including Whistler Blackcomb, Park City Mountain, Stowe, and Crested Butte.

What they did: Vail Resorts launched My Epic Assistant during the 2024-2025 snow season, and expanded it this year with even more chat features powered by Google’s Gemini models. The result is an agentic guide to the slopes that can help skiers and snowboarders decide on the right season pass, share the latest snow report, check on lesson preparations, or suggest a good stop for cocoa.

Why it matters: Vail Resorts wanted more than a chatbot; they wanted a digital concierge that understands the nuance between a powder day at Whistler and a family trip to Beaver Creek. As the company implemented and refined personalization, improved search, summary capabilities, and conversational flow within My Epic Assistant, the app has delivered a 45% reduction in escalation to human agents since launch.

Learn from us: "Utilizing tooling from Google Cloud, we could lean into agentic design patterns that gave us a way to unlock natural, personalized conversations. These boosted customer satisfaction, while reducing the need for manual intent design. These tools also let us combine flexibility and control to enable the assistant to respond fluidly but always within the boundaries of our brand, policies, and product strategy." – The Vail Resorts technical team

CTC Global turns the smart grid into an intelligent one

Who: CTC Global is a leading manufacturer of advanced transmission conductors and power lines. While many nodes in the grid contain IoT sensors, it recognized a literal gap in the transmission lines themselves.

What they did: CTC’s new GridVista platform threads fiber-optic cable into its high-strength carbon fiber composite core, and connects it to AI-powered monitoring technology from Google Cloud and Tapestry. With GridVista, CTC can turn every inch of transmission into a smart sensor.

Why it matters: GridVista gives CTC grid operators an accurate and reliable view of what’s happening across the entire line — based on actual, real-time data from the entire length of the conductor, not point estimates from a static model of line conditions or the occasional clamped-on sensor. This means they can significantly improve safety, manage costs, increase the line’s capacity to transmit power, and enhance reliability with more precise insights about events that might trigger an outage.

Learn from us: “This awareness allows for a grid that can truly sense its own health in real time and provide unprecedented awareness of conditions on the entire line. Whether that’s real time storm impacts, ice load, wind load, branches on the wire, or temperatures on or under the line. The GridVista system truly represents next generation capabilities.” – J.D. Sitton, CEO, CTC Global

ID.me reduces risk while scaling past 160 million users

Who: ID.me is transforming digital identity security for the modern era, offering a single login that lets you easily prove you’re you across a wide range of platforms and wallets.

What they did: ID.me currently serves more than 160 million users, including as many as 40,000 at any time, so they can prove their identity online as easily as flashing their driver’s license in person. Over the last two years, ID.me migrated more than 50 terabytes of data across 15 database instances to Google Cloud with minimal downtime. They also introduced a two-tier architecture with Cloud SQL supporting its smaller and more standard services, while AlloyDB runs heavier workflows that form the backbone of the ID.me platform.

Why it matters: AlloyDB AI has allowed ID.me to scale its systems to handle 10 to 20 times what was possible before, and at a lower price to boot. That responsiveness and reliability led the U.S. federal government to recognize ID.me for its role in preventing large-scale fraud within national systems.

Learn from us: "We’ve been able to scale both our infrastructure and trust. With a platform that’s faster, smarter, and built to handle portable identity at massive scale, we’re one step closer to our goal: a secure, digital way to prove who you are, wherever you need it, that works everywhere you need it." — Kevin Liu, Cloud Platform Architect, ID.me

Manhattan Associates powers more than a billion daily API calls

Who: Manhattan Associates is a global leader in supply chain and omnichannel commerce solutions, offering tools and platforms that reach more than 2 billion people across 20 billion consumer touchpoints.

What they did: Manhattan Associates modernized its Manhattan Active SaaS platform by migrating from legacy Oracle and DB2 systems to Google Cloud databases. Each capability of Manhattan Active now runs as an independent, containerized service orchestrated by Google Kubernetes Engine (GKE). Data flows through Pub/Sub into BigQuery for real-time analytics, while Cloud Logging and Cloud Monitoring deliver observability at scale.

Why it matters: With its new microservices-first design, Manhattan gained the agility to evolve faster and the confidence that mission-critical operations would remain resilient across regions. With Cloud SQL and BigQuery, the company now processes more than a billion daily API calls with average response times of less than 150 milliseconds. This evolution supports hundreds of thousands of monthly active users across tens of thousands of stores and distribution centers. The new platform also created the foundation for Manhattan’s Agentic AI suite, which includes prebuilt agents — like the Intelligent Store Manager and Labor Optimizer — that coordinate real-time decisions across store and distribution center operations.

Learn from us: "Operationally, the platform has become more elastic and efficient. The system automatically handles hundreds of thousands of scaling events per day, ensuring performance remains consistent during peak surges without expensive overprovisioning." — Narayana Reddy Kothapu, Senior Director, Manhattan Associates & Rajkumar Ramani, Technical Director, Manhattan Associates

Cool stuff Google Cloud customers built, Feb. edition: Telco data reinvention; Golden State’s “G.O.A.T.T.”; John Lewis explores DORA

Thu, 26 Feb 2026 17:00:00 +0000

AI and cloud technology are reshaping every corner of every industry around the world. Without our customers, there would be no Google Cloud, as they are the ones building the future on our platform. In this regular round-up, we dive into some of the exciting projects redefining businesses, shaping industries, and creating new categories.

For our latest edition, we explore a new data approach for Vodafone and Fastweb; how John Lewis Partnership evaluates its developer platform; the Golden State Warriors’ AI playbook; healthy, stable networks at Hackensack Meridian Health; and how Ab Initio brings better context to data for AI.

Be sure to check back next month to see how more industry leaders and exciting startups are putting Google Cloud technologies to use. And if you haven’t already, please peruse our list of 1,001 real-world gen AI use cases from our customers.

Fastweb + Vodafone reimagined data workflows

Who: Following the acquisition of Vodafone Italy by Swisscom in 2025, these leading European telecom providers wanted to rethink how they serve customers and deliver timely, personalized experiences across mobile, broadband, and digital channels.

What they did: Both companies had already begun modernizing customer data workflows with BigQuery, but combining ecosystems exposed certain limits of the existing setup. In order to give every channel real-time access to accurate customer data, they implemented Spanner as a service and governance layer, delivering low-latency reads, horizontal scalability, high availability, and a fully managed environment with zero ops overhead. The team is also using Gemini to generate clear documentation directly from the code, which saves hours of manual work.

Why it matters: Using Spanner Graph allowed the organization to map lineage in a way that reflects how its platform actually works: which tables drive specific jobs, how transformations cascade, and where dependencies sit. Call centers now see more complete, up-to-date customer information, digital channels can rely on consistent data without custom integrations, and partners can access what they need with low latency through Apigee.

Learn from us: “Rebuilding our Customer 360 platform with Google Cloud services has already changed how Fastweb + Vodafone works. Workflow monitoring is simpler, pipelines are leaner, and real-time serving is now the norm.” – Vincenzo Forciniti, IT AI Adoption & Platform Engineering Lead, Fastweb + Vodafone

John Lewis measures the value of its developer platform

Who: The John Lewis Partnership is a major UK retailer operating John Lewis department stores and Waitrose supermarkets. To power their digital transformation, they built the John Lewis Digital Platform (JLDP) to support dozens of product teams building high-quality software for johnlewis.com.

What they did: Moving beyond simple usage metrics, John Lewis developed a sophisticated, multi-stage approach to measuring the real value of their platform. They transitioned from initial speed-based metrics (like "Onboarding Lead Time") to a comprehensive model using DORA metrics and subjective engineer feedback via the DX platform. This included a custom "Technical Health" feature that uses small, automated jobs to monitor more than 35 health measures — such as Kubernetes best practices, security, and operational readiness — providing teams with real-time "traffic light" indicators of their service health.

Why it matters: By focusing on value rather than just activity, John Lewis ensured the platform was actually reducing friction for developers rather than just being a mandatory tool. Their automated Technical Health checks allow product teams to manage technical debt and security vulnerabilities proactively. This approach has decoupled centralized operations teams from individual services, leading to faster incident resolution (MTTR), fewer outages, and significant cost savings.

Learn from us: "Measurement is a journey, not a destination. Start by measuring something meaningful to your stakeholders, but be prepared to adapt as your platform evolves. The things that mattered when you were proving out the platform's viability are unlikely to be what are important several years later when your features are mature." – Alex Moss, Principal Platform Engineer, John Lewis Partnership
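As a rough illustration of how a set of automated checks can roll up into a "traffic light" indicator, here is a minimal sketch. The check names and the 90% and 60% thresholds are hypothetical; the article does not say how John Lewis scores its health measures.

```python
def traffic_light(checks, green_at=0.9, amber_at=0.6):
    """Map a dict of {check_name: passed} to 'green', 'amber', or 'red'.

    The thresholds are assumed values chosen for illustration.
    """
    if not checks:
        raise ValueError("at least one check is required")
    pass_ratio = sum(1 for ok in checks.values() if ok) / len(checks)
    if pass_ratio >= green_at:
        return "green"
    if pass_ratio >= amber_at:
        return "amber"
    return "red"


def failing_checks(checks):
    """List the names of failed checks so a team knows what to fix first."""
    return sorted(name for name, ok in checks.items() if not ok)
```

Returning the failing check names alongside the color keeps the indicator actionable: a team sees not only that a service is amber, but which measures pulled it down.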

Hackensack Meridian Health de-risks network migration using VPC Flow Logs

Who: Hackensack Meridian Health is a leading not-for-profit healthcare organization and the largest hospital system in New Jersey. System reliability is a cornerstone value for HMH as they manage a vast network of hospitals, urgent care centers, and physician practices.

What they did: Preparing for a large-scale migration to a new Google Cloud network design, Hackensack Meridian Health used VPC Flow Logs and Flow Analyzer to eliminate the "black box" of hybrid traffic. By enabling logs on their Cloud Interconnect VLAN attachments, they captured granular telemetry — including source/destination IPs, ports, and protocols. They then exported this data to create a visual "who-is-talking-to-what" map. This allowed them to identify critical traffic patterns between on-premises data centers and specific Google Cloud regions, VPCs, and applications.

Why it matters: In a healthcare environment, even minor network disruptions can have major consequences. By mapping traffic proactively, Hackensack Meridian Health pinpointed exactly which moments in the cutover carried the highest risk. This preparation allowed them to detect a migration issue in just three minutes and resolve it within five, a process that previously could have taken hours. Beyond migration, this level of visibility enables the organization to better manage capacity planning, cost attribution, and security compliance across their hybrid infrastructure.

Learn from us: "Getting a clear picture of our interconnect traffic always felt like a black box. Enabling VPC Flow Logs and feeding it into Flow Analyzer finally gave us the map we needed. Identifying those critical traffic flows before we changed any routes was key to de-risking the entire migration." — Randall Brokaw, Cloud Engineering Manager, Hackensack Meridian Health
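The "who-is-talking-to-what" map described above boils down to grouping flow records by endpoint pair and summing traffic. Here is a minimal sketch of that aggregation, assuming exported records have already been flattened into dictionaries; the field names are simplified placeholders, not the actual VPC Flow Logs schema.

```python
from collections import defaultdict


def talker_map(flow_records):
    """Sum bytes per (source, destination, port, protocol), busiest first.

    Each record is assumed to be a dict with the hypothetical keys
    src_ip, dest_ip, dest_port, protocol, and bytes_sent.
    """
    totals = defaultdict(int)
    for record in flow_records:
        key = (
            record["src_ip"],
            record["dest_ip"],
            record["dest_port"],
            record["protocol"],
        )
        totals[key] += record["bytes_sent"]
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)
```

Sorting by volume puts the heaviest flows at the top, which are the ones most worth verifying before any routes change.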

The Golden State Warriors’ AI-powered back office

Who: The Golden State Warriors are one of the NBA’s most successful modern franchises. Behind their on-court wins is a specialized operations team that runs what might be called the organization’s "G.O.A.T.T." (Greatest of All-Time Technologies), a data and AI platform that helps drive game-time insights, trading decisions, and fan experience enhancements.

What they did: The Warriors transitioned from a "gut-feeling" culture to an "analytics-first" strategy by building an internal "digital brain" on Google Cloud. Using BigQuery and Gemini, the team now automates complex workflows that previously took hours, such as generating pre-game scouting reports. They use machine learning to run thousands of trade simulations that prioritize "team fit" over raw individual stats and employ computer vision to track the "shot quality" of every attempt in the NBA. On the business side, they built a content recommendation engine using the Discovery API to deliver personalized digital experiences to their global fan base.

Why it matters: This AI-driven approach narrows the decision tree for leadership, allowing them to focus human expertise on the most viable options. By automating the "science" of data processing, coaches and scouts have more time for the "art" of face-to-face training, planning, and player development. This integration has not only influenced on-court strategy, such as the three-point revolution, but has also improved business efficiency, with employees now proactively bringing AI-driven ideas to the IT team rather than waiting for top-down mandates.

Learn from us: "You can never reach a point where either humans or machines are making all the decisions. The sweet spot is finding that middle ground where intuition and data converge on the same conclusion. Data helps us narrow our decision tree before we even start evaluating specific options." — Nick Manning, Senior Director of Consumer Products & Emerging Technology, Golden State Warriors

Ab Initio unlocks enterprise data for the agentic AI era

Who: Ab Initio is an enterprise software company specializing in high-volume data integration and governance. Their platform is trusted by large-scale organizations to manage complex data lifecycles across hybrid and multi-cloud environments.

What they did: To solve the challenge of grounding AI agents in accurate data, Ab Initio partnered with Google Cloud to integrate its data fabric with BigQuery, Dataplex Universal Catalog, and Gemini. They launched a suite of more than 500 metadata and data connectors that bridge the gap between legacy systems (like mainframes, COBOL, and SAS) and modern cloud environments. This integration provides field-level, end-to-end lineage, allowing Gemini to access well-documented, "AI-ready" data regardless of where it resides.

Why it matters: AI agents are only as effective as the data they can access. By using Ab Initio as a "neutral hub," enterprises can federate data from on-premises and multi-cloud sources into a single unified layer without moving the data itself. This provides the rich semantic context and lineage needed for Gemini to perform grounded, explainable reasoning. For businesses, this means faster transition from experimental AI to production-ready agentic workflows that are auditable, compliant, and capable of making complex, automated decisions.

Learn from us: "Agentic AI requires trusted, AI-ready data and metadata. Understanding the origin, quality, and meaning of information matters as much as the data itself. Gemini serves as a key component of the agentic layer, using this context to make decisions that are explainable and auditable." — Scott Studer, Head of Development, Ab Initio & Chai Pydimukkala, Data Governance, Sharing & Integration Product Lead, Google Cloud

Accelerate migrations with new incentives from the Rapid Migration and Modernization Program (RaMP)

Wed, 21 Jan 2026 16:00:00 +0000

To lead in 2026, you need to be AI-ready, lean, and optimized across every workload — and for many organizations, that means migrating and modernizing applications like SAP, Oracle, NetApp, and VMware to the cloud. At Google Cloud, we’ve helped thousands of customers with successful migrations through the Rapid Migration and Modernization Program (RaMP), and today, we’re introducing the new RaMP, so that the more you migrate, the more you save. Highlights of the new program include:

Google Cloud Service Credits: Earn credits based on your incremental usage of eligible workloads on Google Cloud.
Partner and Google Cloud Professional Services funds: We will fund partners and Google Cloud Professional Services to help you assess your needs, build your business case and implement your migration and modernization roadmap.
Earn more with advanced workloads: Earn additional credits and incentives for advanced workloads (like SAP, Oracle, VMware, Data Analytics, etc.) to offset higher technical costs.

From technical debt to AI readiness

You deserve a migration and modernization path that leads to less cost and complexity, not more. Moving to Google Cloud provides the infrastructure foundation you need to replace your technical debt with flexibility.

But migrating and modernizing your infrastructure is about more than that — it’s about data accessibility, optimization, and innovation. For example, when you migrate a legacy SAP environment or a massive Oracle database to Google Cloud, you aren't just changing where the data sits; you are making that data accessible to Vertex AI and our Gemini models.

RaMP can accelerate your migration and replace technical debt with a scalable, secure foundation that can support existing enterprise workloads and the next generation of AI applications — whatever they may be.

Building for the future

Migrating and modernizing with RaMP gives you immediate access to world-class infrastructure, data, and AI solutions, giving you the foundation you need to succeed in 2026 and beyond. To get started, visit our RaMP page to learn more and start your assessment. Get ready to rapidly enter the AI era. Welcome to the fast lane.

Cool stuff Google Cloud customers built, Dec. edition: AI for better toys, reliable mapping tech, Gemini stumps an all-star & more

Wed, 31 Dec 2025 16:00:00 +0000

For our latest edition, we look into how Waze made its network more reliable; NBA superstar Stephen Curry gets quizzed by Gemini; a financial market transformation at CME Group; a multi-agent business forecasting platform from AppOrchid; Mattel crunches customer feedback with AI; VMO2 uses decentralized contracts for reliable data; Mercado Libre’s strategic use of Spanner; and how Ericsson enhances data governance.

Waze keeps traffic flowing with Memorystore

Who: Waze (a division of Google parent company Alphabet) is a community-driven, crowd-sourced navigation app with tens of millions of users who share real-time data to provide optimal driving routes, traffic updates, and alerts for hazards, police, and more.

What they did: Waze depends on vast volumes of dynamic, real-time user session data to power its core navigation features, but scaling that data to support concurrent users worldwide required a new approach. Their team built a centralized Session Server backed by Memorystore for Redis Cluster, a fully managed service with 99.99% availability that supports partial updates and easily scales to Waze’s use case of over 1 million MGET commands per second with ~1ms latency.

Why it matters: Moving from Memcached’s 99.9% SLA to Memorystore for Redis Cluster’s 99.99% means higher availability and resiliency from the service. And because Memorystore for Redis supports partial updates, Waze can change individual fields within a session object rather than rewriting the entire record. That reduces network traffic, speeds up write performance, and makes the system more efficient overall.

Learn from us: “Real-time data drives the Waze app experience. Our turn-by-turn guidance, accident rerouting, and driver alerts depend on up-to-the-millisecond accuracy. But keeping that experience seamless for millions of concurrent sessions requires robust and battle-hardened infrastructure that is built to manage a massive stream of user session data.” – Eden Levin, Waze BE infrastructure developer & Yuval Kamran, Waze site reliability engineer
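To see why partial updates reduce network traffic, consider this minimal in-memory sketch. It does not use Redis or reflect Waze's Session Server; the store, the function names, and the JSON encoding are assumptions chosen to compare how many bytes each style of write would need to send.

```python
import json


def encoded_size(obj):
    """Bytes needed to send obj as UTF-8 JSON (a stand-in for wire size)."""
    return len(json.dumps(obj, sort_keys=True).encode("utf-8"))


def full_rewrite(store, session_id, new_session):
    """Replace the whole session record; the payload is the entire record."""
    store[session_id] = dict(new_session)
    return encoded_size(new_session)


def partial_update(store, session_id, field, value):
    """Change one field in place; the payload is only that field."""
    store[session_id][field] = value
    return encoded_size({field: value})
```

The larger the session object, the wider the gap: a full rewrite grows with the record, while a partial update grows only with the field being changed.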

What Stephen Curry learned from a custom Gemini agent

Who: Stephen Curry is arguably the greatest three-point shooter of all time in the NBA, as well as Google’s performance advisor and an all-around stats obsessive.

What they did: For a special engagement with Curry, the Google Cloud team wanted to showcase the power of Gemini for creative thinking, analysis, and data mining. They took every regular season, play-in, and playoff game from Curry’s career (through the end of the 2024-2025 season) and input the data into a custom-built agent using Google Cloud’s Agent Development Kit and Gemini APIs. The system could then be queried for obscure stats, to see if the team could stump Curry and teach him more about his game.

Why it matters: Instead of countless hours of manual research, the team got query results in less than a minute. For example, the agent found that Curry’s three-point shooting percentage after more than seven dribbles (minimum 105 attempts) was 40.2%, and that he has generated 1,105 points for his teammates off of screens since 2013. Some queries were so obscure that the team wouldn’t have reached a valid answer without the agent’s ability to analyze the rich data.

Learn from us: “Gemini is going to be in my head this year, cause I'm going to be looking at all these details.” – Stephen Curry, Golden State Warriors point guard and 4x NBA champ

How CME Group builds a faster, smarter exchange

Who: CME Group has evolved from a nineteenth-century commodities exchange into one of the most advanced financial market infrastructures in the world. To support real-time trading and risk management at a global scale, the company launched a strategic partnership with Google Cloud.

What they did: By migrating to Cloud SQL and adopting AI-powered insights, CME Group empowered developers, paid down technical debt, and unlocked new opportunities for data-driven innovation across financial markets.

Why it matters: Cloud SQL has given CME a foundation for increased developer and team agility. Fewer performance issues mean more time focused on innovation: expanding CME’s analytics capabilities, accelerating AI initiatives, and exploring new ways to commercialize data responsibly. When teams stopped chasing outages, they unlocked more time to take bigger bets and build the future.

Learn from us: “With Cloud SQL, we’ve found a way to keep our data layer as fast and dependable as the markets we serve. Cloud SQL gives our teams real-time visibility into what’s happening inside the database. When an application slows, we can identify the root cause in minutes instead of hours. Those insights are built into the platform, which means we don’t need custom tooling or manual analysis to keep operations steady.” – Kristofer Shane Sikora, Executive Director, Cloud Data Engineering, CME Group

App Orchid’s multi-agent system for superior business forecasting

Who: App Orchid is an enterprise AI builder and a leader in making data actionable with AI, with a mission to make AI a force for good. Their goal is to empower every employee with trusted, understandable, and accessible data.

What they did: The business forecasting agent is actually built on the foundation of two powerful, specialized AI agents: a prediction agent built by Google Cloud and App Orchid’s Data Agent offering. These agents work in concert to solve complex business problems, acting as complementary specialists. App Orchid’s agent possesses unparalleled understanding of an enterprise's past and present, while Google’s agent brings world-class capabilities in predicting the future.

Why it matters: Adopting a multi-agent approach provides clear, tangible advantages that directly address the forecasting problems that often plague businesses, including improved accuracy; increased operational efficiency; faster insights; reduced costs and increased revenue; and greater agility and adaptability. Neither of the underlying agents could achieve these results on its own, but working together, the forecasting agent is more than the sum of its subagents.

Learn from us: “As the agentic era gets underway, it is evolving quickly. Our multi-agent approach demonstrates both how true agentic systems are most successful when multiple agents are at play, and the importance of finding strong partners with distinct capabilities to help build and assemble these agentic systems.” – Brian Mills, Director, Enterprise AI, Google Cloud & Taka Shinagawa, Gen AI Field Solution Architect, Google Cloud

How Mattel uses AI for real-time product updates

Who: Since its humble beginnings in a garage in 1945, Mattel has consistently been reshaping play for children and families across the globe with iconic franchises like Barbie, Hot Wheels, and Fisher-Price.

What they did: To improve its understanding of consumer sentiment, Mattel developed an AI-powered feedback classification system, which can analyze millions of customer interactions from a diverse range of sources in a matter of seconds. At its core, the system relies on BigQuery for storing and efficiently processing its massive customer datasets and then utilizes Vertex AI and Google’s multimodal Gemini models to refine and train the sophisticated consumer feedback model.

Why it matters: The new AI-powered system has already delivered significant wins, including a 100x boost in data processing capacity and a reduction in analysis times from a month to a single minute. By automating much of the analysis, the system frees analysts from the noise of everyday tasks so they can focus on deeper research across the company’s iconic brand portfolio.

Learn from us: “Our big motto is ‘From months to minutes,’ but it’s real. We were literally spending months-worth of analysis and just getting data into the place that an analyst could tally up all the sentiment — and now it’s just at our fingertips.” – Shaun Applegate, Director of Product Quality Analytics, Mattel

Virgin Media O2 uses data contracts for scalable AI products

Who: Virgin Media O2 is one of Europe’s largest telecommunications and media providers, with 45.8 million broadband, mobile, phone, and home subscribers across the UK. To build AI products that are adaptable and data-driven, they needed a decentralized system that internal customers could count on for clean, reliable data.

What they did: New decentralized data contracts, built with Dataplex, serve as the data quality and assurance layer for VMO2’s data products; these ensure every dataset they publish is reliable, documented, and ready for consumption. Defined at the asset level, such as individual BigQuery tables or Google Cloud Storage buckets, data contracts are redefining how VMO2 manage and share data, enabling the creation of trusted and scalable AI products across their data mesh.

Why it matters: The power of this approach lies in moving beyond static documentation. Because they are machine-readable, data contracts become living guarantees with continuous enforcement and real-time validation directly within data pipelines. This proactive monitoring allows teams to detect schema changes or SLA breaches early, transforming data quality from a reactive fix into a scalable, automated mechanism.
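
To make the idea of a machine-readable contract concrete, here is a minimal sketch in plain Python. It is not VMO2's implementation and does not use the Dataplex API; the `DataContract` class, the `validate` helper, the asset name, and the column types are all hypothetical, chosen only to show how a schema change or a freshness SLA breach could be detected inside a pipeline step.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class DataContract:
    """Hypothetical machine-readable contract for one asset (e.g. a table)."""
    asset: str
    required_columns: dict    # column name -> expected type name
    max_staleness: timedelta  # freshness SLA for the asset


def validate(contract, observed_columns, last_updated, now):
    """Return a list of violations; an empty list means the asset complies."""
    violations = []
    for name, expected in contract.required_columns.items():
        actual = observed_columns.get(name)
        if actual is None:
            violations.append(f"missing column: {name}")
        elif actual != expected:
            violations.append(f"type change: {name} {expected} -> {actual}")
    if now - last_updated > contract.max_staleness:
        violations.append("SLA breach: data is stale")
    return violations


contract = DataContract(
    asset="example_dataset.subscriptions",  # hypothetical asset name
    required_columns={"customer_id": "STRING", "plan": "STRING"},
    max_staleness=timedelta(hours=24),
)
now = datetime(2025, 1, 2, tzinfo=timezone.utc)

# Simulate a pipeline run where a column type drifted and the data is 30 hours old.
issues = validate(
    contract,
    observed_columns={"customer_id": "INT64", "plan": "STRING"},
    last_updated=now - timedelta(hours=30),
    now=now,
)
print(issues)
```

In a real pipeline, a non-empty result would typically block publication of the dataset or raise an alert, which is what turns the contract from documentation into an enforced guarantee.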

Learn from us: “By operationalizing trust through data contracts, we are fostering a culture of shared responsibility and data-first thinking. This federated model does more than simply fix pipelines; it builds the trusted foundation needed to scale next-generation AI. It ensures that the resilient AI tools empowering our teams are built on data that is reliable, consistent, and well-defined.” – Chandu Bhuman, Head of Data Strategy, Cloud & Engineering, Virgin Media O2 & Dženan Softić, Data & AI Architect, Google Cloud

Inside Mercado Libre's multi-faceted Spanner architecture

Who: Mercado Libre, an e-commerce and fintech pioneer across Latin America, operates at a staggering scale, demanding an infrastructure that's not just resilient and scalable, but also a catalyst for rapid innovation.

What they did: At the heart of Mercado Libre's strategy is Fury, an in-house middleware platform designed to abstract away the complexities of various backend technologies, providing developers with standardized, simplified interfaces to build applications. Spanner provides Fury with an always-on, globally consistent, multi-model database with virtually unlimited scale. By designating Spanner as a choice within Fury, Mercado Libre ensures that applications built on the platform using Spanner stay consistent globally, scale without breaking, and rarely go down.

Why it matters: The strategic adoption of Spanner, amplified by internal platforms like Fury and sophisticated data workflows, has yielded significant benefits, including: significant cost savings and low total cost of ownership; business impact and agility for developers; and low operational overhead thanks to automation.

Learn from us: “Mercado Libre's adoption of Spanner demonstrates how to use a powerful, globally consistent database not just for its core capabilities, but as a strategic enabler for developer productivity, operational efficiency, advanced analytics, and future AI ambitions.” – Pablo Leopoldo Arrojo, Software Technical Leader, Mercado Libre

Ericsson achieves data integrity and superior governance with Dataplex

Who: Ericsson is one of the world's leading providers of telecommunications and networking technology and solutions. Its Managed Services unit provides network operations and optimization, including field operations, for various telecom and enterprise customers, including outsourcing network performance management, future provisioning, network vulnerability management, and network energy infrastructure management.

What they did: To power the future of its autonomous network operations and deliver on its strategic priorities across a global network of more than 710,000 sites, Ericsson's Managed Services has been on a transformative data journey with governance at the center of its strategy. Ericsson moved from foundational practices to a sophisticated, business-enabling data governance framework using the Dataplex Universal Catalog — turning data from a simple resource into a strategic asset.

Why it matters: With Dataplex as the governance foundation, Ericsson began implementing the core pillars of its governance program, moving from manual processes to an automated, intelligent data fabric. More specifically, Ericsson established a unified business vocabulary within Dataplex, which helped eliminate ambiguity and ensure their teams — from data scientists to data analysts — were speaking the same language.

Learn from us: “Governance is a value enabler, not a blocker. A modern data governance program should focus on business enablement first, driving value and innovation in order to complement policies, rules and risk management. Also remember this work is a journey, not a destination. Be prepared to fail fast, learn, and adapt. The landscape is constantly changing at breakneck speed.” – William McCann Murphy, Head of Data Authority, Ericsson & Akanksha Bhagwanani, EMEA Data Analytics Solution Lead, Google Cloud

How CME Group builds a faster, smarter exchange on Cloud SQL

Wed, 03 Dec 2025 17:00:00 +0000

Editor’s note: The Chicago Mercantile Exchange (CME Group) has evolved from a nineteenth-century commodities exchange into one of the most advanced financial market infrastructures in the world. To support real-time trading and risk management at a global scale, the company launched a strategic partnership with Google Cloud. By migrating to Cloud SQL and adopting AI-powered insights, CME Group empowered developers, paid down technical debt, and unlocked new opportunities for data-driven innovation across financial markets.

From butter and eggs to bandwidth

CME Group is where risk meets opportunity. Every transaction that happens in our exchange — every order placed, trade executed, or risk calculated — relies on data moving flawlessly and instantly. The integrity of our markets depends on it.

Behind each of those trades is a database storing valuations, ownership, and so much more information, all of which can shift from millisecond to millisecond throughout the day. At our scale, those databases have to store and retrieve that information under relentless demand. We’re processing millions of messages a day with no margin for latency or error. That level of precision doesn’t come easily, especially in a highly regulated industry where performance has to coexist with security and reporting. Every change we make must align with strict compliance standards and global regulatory frameworks.

Speed has always been our currency, but scale became a challenge. CME Group's legacy database estate required significant engineering effort to maintain performance and meet regulatory demands. We needed to reduce operational overhead while improving our security posture. This required a managed database solution that offered transparent observability and clear compliance controls.

When Cloud SQL meets the trading floor

Our 10-year strategic partnership with Google Cloud aims to address this by migrating all our technology to the cloud, enabling us to innovate and collaborate on pushing the boundaries of what cloud infrastructure can support. Together, we’re experimenting with new ways to achieve ultra-low-latency performance in the cloud. As data volumes surge and AI becomes increasingly central to risk management, the ability to move and interpret information in milliseconds is a technical requirement. We’re building systems with Google Cloud that let us keep the market running, even as we lead it into the future.

With Cloud SQL, we’ve found a way to keep our data layer as fast and dependable as the markets we serve. Cloud SQL gives our teams real-time visibility into what’s happening inside the database. When an application slows, we can identify the root cause in minutes instead of hours. Those insights are built into the platform, which means we don’t need custom tooling or manual analysis to keep operations steady.

But for us, the value of Cloud SQL goes beyond performance tuning. It’s about confidence. Our database administrators can focus on strategic improvements, and our developers can validate and optimize queries without waiting for escalation. Taken together, we have faster troubleshooting and a data foundation ready for the always-on demands of global trading.

Cloud SQL is our new favorite teammate

The more we use Cloud SQL, the more it feels like we’ve added a new member to the team. AI-assisted insights from Cloud SQL have changed how the CME Group team works. When an application slows, Cloud SQL tells us why. It surfaces anomalies, walks us through guided analysis, and even suggests query optimizations that restore performance in minutes. Developers can see those recommendations right in their workflows, test fixes, then move on. No waiting, no hand-offs, no firefights.

In other words, AI-assisted troubleshooting has made performance management into a shared responsibility. And because Cloud SQL delivers a consistent experience, our teams can move seamlessly between environments. There’s less training – and a lot more collaboration. The end result is a smarter, more unified data culture at CME Group.

Performance is our competitive advantage

The work we’re doing with Google Cloud is about more than modernization. Every improvement in speed, reliability, and visibility translates directly into business confidence. CME Group can now deploy new features faster while maintaining the continuity our clients depend on.

Cloud SQL has given us a foundation for that agility. Fewer performance issues mean more time focused on innovation: expanding our analytics capabilities, accelerating AI initiatives, and exploring new ways to commercialize data responsibly. When you stop chasing outages, it turns out you have more time to take bigger bets and build the future.

For us at CME Group, performance has always been the product. Now, it’s also the platform. We’re building the infrastructure with Google Cloud that keeps global markets moving and the intelligence that will shape what comes next.

Learn more:

Sign up for the new Cloud SQL free trial, a dedicated 30-day program designed to give both new and existing Google Cloud users hands-on access to premium, enterprise-grade features of Cloud SQL (PostgreSQL and MySQL).
Download this IDC report to learn how migrating to Cloud SQL can lower costs, boost agility, and speed up deployments.
Learn how Ford and Yahoo gained high performance and cut costs by modernizing with Cloud SQL.

AWS and Google Cloud collaborate to simplify multicloud networking

Sun, 30 Nov 2025 19:00:00 +0000

As organizations increasingly adopt multicloud architectures, the need for interoperability between cloud service providers has never been greater. Historically, however, connecting these environments has been a challenge, forcing customers to take a complex "do-it-yourself" approach to managing global multi-layered networks at scale.

To address these challenges and advance a more open cloud environment, Amazon Web Services (AWS) and Google Cloud collaborated to transform how cloud service providers could connect with one another in a simplified manner.

Today, AWS and Google Cloud are excited to announce a jointly engineered multicloud networking solution that uses both AWS Interconnect - multicloud and Google Cloud’s Cross-Cloud Interconnect. This collaboration also introduces a new open specification for network interoperability, enabling customers to establish private, high-speed connectivity between Google Cloud and AWS with high levels of automation and speed.

“Integrating Salesforce Data 360 with the broader IT landscape requires robust, private connectivity. AWS Interconnect - multicloud allows us to establish these critical bridges to Google Cloud with the same ease as deploying internal AWS resources, utilizing pre-built capacity pools and the tools our teams already know and love. This native, streamlined experience — from provisioning through ongoing support — accelerates our customers' ability to ground their AI and analytics in trusted data, regardless of where it resides.” - Jim Ostrognai, SVP Software Engineering, Salesforce

Previously, to connect cloud service providers, customers had to manually set up complex networking components including physical connections and equipment; this approach required lengthy lead times and coordinating with multiple internal and external teams. This could take weeks or even months. AWS had a vision for developing this capability as a unified specification that could be adopted by any cloud service provider, and collaborated with Google Cloud to bring it to market.

Now, this new solution reimagines multicloud connectivity by moving away from physical infrastructure management toward a managed, cloud-native experience. By integrating AWS with Google Cloud’s Cross-Cloud Network architecture, we are abstracting the complexity of physical connectivity, network addressing, and routing policies. Customers no longer need to wait weeks for circuit provisioning: they can now provision dedicated bandwidth on demand and establish connectivity in minutes through their preferred cloud console or API.

Reliability and security are the cornerstones of this collaboration. We have designed this solution to deliver high resiliency by leveraging quad-redundancy across physically redundant interconnect facilities and routers. Both providers engage in continuous monitoring to proactively detect and resolve issues. And this solution is built on a foundation of trust, utilizing MACsec encryption between the Google Cloud and AWS edge routers.

“This collaboration between AWS and Google Cloud represents a fundamental shift in multicloud connectivity. By defining and publishing a standard that removes the complexity of any physical components for customers, with high availability and security fused into that standard, customers no longer need to worry about any heavy lifting to create their desired connectivity. When they need multicloud connectivity, it's ready to activate in minutes with a simple point and click.” - Robert Kennedy, VP of Network Services, AWS

“We are excited about this collaboration which enables our customers to move their data and applications between clouds with simplified global connectivity and enhanced operational effectiveness. Today's announcement further delivers on Google Cloud’s Cross-Cloud Network solution focused on delivering an open and unified multicloud experience for customers.” - Rob Enns, VP/GM of Cloud Networking, Google Cloud

This collaboration between AWS and Google Cloud is more than a multicloud solution: it’s a step toward a more open cloud environment. The API specifications developed for this product are open for other providers and partners to adopt, as we aim to simplify global connectivity for everyone. We invite you to explore this new capability today. To learn more about how to streamline your multicloud operations please visit the in-depth Google Cloud Cross-Cloud Interconnect blog and the AWS Interconnect - multicloud website to get started.

How Lightricks trains video diffusion models at scale with JAX on TPU

Tue, 11 Nov 2025 17:00:00 +0000

Training large video diffusion models at scale isn't just computationally expensive — it can become impossible when your framework can't keep pace with your ambitions.

JAX has become a popular computational framework across AI applications, now recognized for its capabilities in training large-scale AI models, such as LLMs and life sciences models. Its strength lies not just in performance but in an expressive, scalable design that gives innovators the tools to push the boundaries of what's possible. We're consistently inspired by how researchers and engineers leverage JAX's ecosystem to solve unique, domain-specific challenges — including applications for generative media.

Today, we're excited to share the story of Lightricks, a company at the forefront of the creator economy. Their LTX-Video team is building high-performance video generation models, and their journey is a masterclass in overcoming technical hurdles. I recently spoke with Yoav HaCohen and Yaki Bitterman, who lead the video and scaling teams, respectively. They shared their experience of hitting a hard scaling wall with their previous framework and how a strategic migration to JAX became the key to unlocking the performance they needed.

Here, Yoav and Yaki tell their story in their own words. – Srikanth Kilaru, Senior Product Manager, Google ML Frameworks

The creator's challenge

At Lightricks, our goal has always been to bring advanced creative technology to consumers. With apps like Facetune, we saw the power of putting sophisticated editing tools directly into people's hands. When generative AI emerged, we knew it would fundamentally change content creation.

We launched LTX Studio to build generative video tools that truly serve the creative process. Many existing models felt like a "prompt and pray" experience, offering little control and long rendering times that stifled creativity. We needed to build our own models—ones that were not only efficient but also gave creators the controllability they deserve.

Our initial success came from training our first real-time video generation model on Google Cloud TPUs with PyTorch/XLA. But as our ambitions grew, so did the complexity. When we started developing our 13-billion-parameter model, we hit a wall.

Hitting the wall and making the switch

Our existing stack wasn’t delivering the training step times and scalability we needed. After exploring optimization options, we decided to shift our approach. We paused development to rewrite our entire training codebase in JAX, and the results were immediate. Switching to JAX felt like a magic trick, instantly providing the necessary runtimes.

This transition enabled us to effectively scale our tokens per sample (the amount of data processed in each training step), model parameters, and chip count. With JAX, sharding strategies (sharding divides large models across multiple chips) that previously failed now work out of the box on both small and large pods (clusters of TPU chips).

These changes delivered linear scaling that translates to 40% more training steps per day, directly accelerating model development and time to market. Critical components that had previously caused issues, such as FlashAttention and data loading, also worked reliably. As a result, our team's productivity skyrocketed, doubling the number of pull requests we could merge in a week.

Why JAX worked: A complete ecosystem for scale

The success wasn't just about raw speed; it was about the entire JAX stack, which provided the building blocks for scalable and efficient research.

A clear performance target with MaxText: We used the open-source MaxText framework as a baseline to understand what acceptable performance looked like for a large model on TPUs. This gave us a clear destination and the confidence that our performance goals were achievable on the platform.
A robust toolset: We built our new stack on the core components of the JAX ecosystem based on the MaxText blueprint. We used Flax for defining our models, Optax for implementing optimizers, and Orbax for robust checkpointing — all core components that work together natively.
Productive development and testing: The transition was remarkably smooth. We implemented unit tests to compare our new JAX implementation with the old one, ensuring correctness every step of the way. A huge productivity win was discovering that we could test our sharding logic on a single, cheap CPU before deploying to a large TPU slice. This allowed for rapid, cost-effective iteration.
Checkpointing reliability: For sharded models, JAX’s checkpointing is much more reliable than before, making training safer and more cost-effective.
Compile speed & memory: JAX compilation with lax.fori_loop is fast and uses less memory, freeing capacity for tokens and gradients.
Smooth scaling on a supercomputer: With our new JAX codebase, we were able to effectively train on a reservation of thousands of TPU cores. We chose TPUs because Google provides access to what we see as a "supercomputer" — a fully integrated system where the interconnects and networking were designed first, not as an afterthought. We manage these large-scale training jobs with our own custom Python scripts on Google Compute Engine (GCE), giving us direct control over our infrastructure. We also use Google Cloud Storage and stream the training data to the TPU virtual machines.

Architectural diagram showing the Lightricks stack

Build your models with the JAX ecosystem

Lightricks' story is a great example of how JAX's powerful, modular, and scalable design can help teams overcome critical engineering hurdles. Their ability to quickly pivot, rebuild their stack, and achieve massive performance gains is a testament to both their talented team and the tools at their disposal.

The JAX team at Google is committed to supporting innovators like Lightricks and the entire scientific computing community.

Share your story: Are you using JAX to tackle a challenging scientific problem? We would love to learn how JAX is accelerating your research.
Help guide our roadmap: Are there new features or capabilities that would unlock your next breakthrough? Your feature requests are essential for guiding the evolution of JAX.

Please reach out to the team via GitHub to share your work or discuss what you need from JAX. Check out documentation, examples, news, events and more at jaxstack.ai and jax.dev.

Sincere thanks to Yoav, Yaki, and the entire Lightricks team for sharing their insightful journey with us. We're excited to see what they create next.

11 ways to reduce your Google Cloud compute costs today

Mon, 06 Oct 2025 16:00:00 +0000

As the saying goes, "a penny saved is a penny earned," and this couldn't be more true when it comes to cloud infrastructure. In today's competitive business landscape, you need to maintain the performance to meet your business needs. Luckily, Google Cloud’s Compute Engine and block storage services offer numerous opportunities to reduce costs without sacrificing performance, especially in the context of your migration and modernization initiatives.

In this article, we'll explore 11 key ways to optimize your infrastructure spending on Google Cloud, from simple adjustments to strategic decisions that can result in significant long-term savings.

1. Choose the right VM instances

One of the most effective ways to reduce Compute Engine costs is to ensure that you’ve properly selected and right-sized your virtual machines (VMs) for their workloads to support your migration and modernization efforts. Whether you're new to Google Cloud or already using Compute Engine, adopting the latest-generation VMs — such as N4, C4, C4D, and C4A — can deliver substantial savings and improved price-performance.

Powered by Google Cloud’s Titanium architecture, our latest-generation VMs offer faster CPUs, higher memory bandwidth, and more efficient virtualization than their predecessors, so you can handle the same workloads with fewer resources. For existing customers, migrating from older VM generations to the newest VMs can significantly lower total costs while helping you exceed current performance levels. Organizations that have made the switch often report 20–40% better performance along with meaningful reductions in cloud compute spend. For example, Elastic used the general-purpose C4A machine series, based on Google Cloud's Arm-based Axion CPUs, to achieve a compelling efficiency and performance uplift for their workloads.

Beyond general-purpose VMs, we also offer specialized machine types to address unique customer requirements. Compute-optimized HPC VMs like H4D are designed for high-performance computing and data analytics, offering extreme performance for demanding workloads. M4 and X4 instances cater to memory-intensive applications, while Z3 instances are ideal for storage-intensive workloads. Furthermore, if you need complete control over your hardware environment and maximum performance isolation, we offer bare metal instances.

These options help ensure that even the most specialized and performance-sensitive workloads can find an optimal and cost-effective home within the Compute Engine portfolio.

2. Optimize your block storage selections

The best way to lower your block storage TCO, while ensuring your workloads remain successful, is to drive high resource efficiency. Hyperdisk makes it simple to achieve both high performance and high efficiency in two ways: by letting you optimize block storage for each workload, and through Storage Pools. Below, we discuss each capability and how you can use it to lower your block storage TCO.

Workload Optimized: With Hyperdisk, you can independently provision capacity and performance at the volume level to match your block storage resources to your workload. You can use this capability to purchase just the capacity and performance you need, no more and no less. And because Hyperdisk Balanced includes “baseline” performance free with every volume, you can serve the vast majority of your VMs without purchasing any extra performance.

Storage Pools: Hyperdisk is the only hyperscale cloud block storage to offer thin-provisioned performance and capacity. With Hyperdisk Storage Pools, you provision the aggregate performance and capacity your workloads actually require at the pool level, while each volume is still provisioned with the capacity and performance it requests (also known as thin-provisioning). This allows you to pay for the resources you need, not the sum of the volumes you’ve provisioned. As a result, you can lower your overall block storage TCO by as much as 50%.
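
The arithmetic behind thin-provisioning can be sketched in a few lines of Python. Everything here is an illustrative assumption: the three volumes, their usage, and the 20% headroom are made-up numbers, not Hyperdisk pricing or sizing guidance. The sketch covers capacity only, but the same reasoning applies to performance.

```python
# Hypothetical volumes: (GiB provisioned on the volume, GiB actually used).
volumes = [(1000, 400), (2000, 900), (500, 200)]

# Without a pool, you pay for the sum of what each volume provisions.
sum_provisioned = sum(provisioned for provisioned, _ in volumes)

# With a pool, you size for aggregate usage plus some headroom (assumed 20%).
aggregate_used = sum(used for _, used in volumes)
headroom_pct = 20
pool_capacity = aggregate_used * (100 + headroom_pct) // 100

# Fraction of provisioned capacity you no longer need to pay for.
reduction = 1 - pool_capacity / sum_provisioned

print(f"sum of volumes: {sum_provisioned} GiB, pool size: {pool_capacity} GiB")
print(f"capacity reduction: {reduction:.1%}")
```

The more over-provisioned the individual volumes are relative to what they really use, the larger the gap between the two figures.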

For more information on how to select the right block storage for your workload and to see how customers have benefitted from Hyperdisk, read this blog.

3. Consider custom compute classes

To get the most out of our latest-generation VMs, Google Kubernetes Engine (GKE) custom compute classes (CCC) offer an advanced way to optimize compute choices and provide high availability. Instead of being limited to a single machine type for your workloads, you can define a prioritized list of VM instance types. This allows you to set the newest, most price-performant VMs — including our latest-generation VMs — as your top priority. GKE custom compute classes provide the capability to automatically and seamlessly spin up instances based on your specified priority list. This feature helps you maximize the availability of your compute capacity while still aiming for the most cost-effective options, so your workloads can scale reliably without manual intervention.

Here are some specific use cases for how custom compute classes can help you optimize costs:

Autoscaling cost-performant fallbacks: When demand peaks, you might be tempted to autoscale using a highly available but less cost-efficient VM type. CCC allows you to take a tiered approach. You can set up several cost-efficient fallback alternatives, so that as demand increases, GKE first attempts to use the most cost-effective options, and progressively moves to the other choices in your list when necessary to meet demand.
AI/ML inference: Running AI/ML inference workloads often involves significant compute resources. Instead of maintaining a large, static reservation that might sit idle during off-peak times, CCC lets you provision a minimal base reservation and leverage more cost-effective capacity types, such as Spot VMs, to handle peak inference demand — all orchestrated through your CCC configuration.
Adopting new VM generations: Combine the power of GKE custom compute classes with Compute Flexible committed use discounts (Flex CUDs) to de-risk the adoption of new, cost-efficient VM series like N4 and C4. With CCC, you can define fallback options, providing workload resilience, while Flex CUDs offer financial adaptability, as the discounts apply across your total eligible compute spend, regardless of the specific VM series you use. This dual approach is a safe, cost-effective strategy for leveraging the latest hardware without disruption. For more information, read this blog.
Using flexible Spot VMs: Spot VMs offer significant savings but can be preempted. Being constrained to a single Spot VM shape increases the risk that capacity will not be available. With CCC, you can define multiple fallback Spot VM types. This "spot surfing" capability allows the application to remain on cost-efficient Spot capacity by automatically pivoting to alternative Spot instance types if the primary choice is unavailable.
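
The prioritized-fallback idea behind these use cases can be illustrated with a short Python sketch. This is not the GKE configuration format or any GKE API; the `pick_capacity` function and the priority list are hypothetical and only model the selection logic: try each option in order and use the first one that is currently obtainable.

```python
def pick_capacity(priorities, available):
    """Return the first (machine_series, provisioning_model) that is available.

    `priorities` is ordered from most to least preferred; `available` is the
    set of options that can currently be obtained. Returns None if none match.
    """
    for option in priorities:
        if option in available:
            return option
    return None


# Hypothetical priority list: prefer Spot on the newest series, then fall back.
priorities = [
    ("n4", "spot"),
    ("c4", "spot"),
    ("n4", "on-demand"),
]

# Simulate a moment when N4 Spot capacity is unavailable.
available_now = {("c4", "spot"), ("n4", "on-demand")}
print(pick_capacity(priorities, available_now))
```

In this simulated case the workload stays on Spot capacity by moving to the second choice, which is the "spot surfing" behavior described above.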

In short, by leveraging GKE CCC, you can artfully mix and match various VM types and consumption models, including On-Demand, Spot, DWS FlexStart, and instances covered by CUDs, to build a resilient and highly cost-optimized infrastructure that adapts to the unique needs and patterns of your workloads.
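
To make the prioritized-fallback idea above concrete, here is a minimal Python sketch of the selection logic only. It is not the GKE API or a compute class manifest: the `Option` type, the machine families in the list, and the `available` set are all hypothetical stand-ins for whatever capacity signal the platform actually uses.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Option:
    """One entry in a prioritized list of capacity choices (hypothetical type)."""
    machine_family: str
    spot: bool


def pick_first_available(priorities: list[Option],
                         available: set[Option]) -> Optional[Option]:
    """Return the highest-priority option that currently has capacity."""
    for option in priorities:
        if option in available:
            return option
    return None


# Hypothetical priority list: preferred Spot shapes first, on-demand last.
priorities = [
    Option("n4", spot=True),
    Option("c4", spot=True),
    Option("n2", spot=True),
    Option("n4", spot=False),
]

# Pretend the first two Spot choices have no capacity right now.
available = {Option("n2", spot=True), Option("n4", spot=False)}

chosen = pick_first_available(priorities, available)
print(chosen)
```

The ordering of the list is the whole policy: the workload stays on the cheapest acceptable option and only moves down the list when a higher-priority choice cannot be provisioned.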

4. Leverage custom machine types (CMT)

Custom machine types, available on N4 VMs, allow you to precisely configure virtual machines to your exact specifications. Rather than selecting from predefined machine types that might include excess capacity, you can tailor the CPU-to-memory ratio specifically for your workloads, so you only pay for resources you actually use. This targeted approach minimizes waste and can significantly reduce your cloud spend, especially when migrating from on-premises to Google Cloud or from other cloud providers.

This flexibility becomes particularly valuable if your applications have unique resource profiles that don't align well with our standard offerings. Custom machine types let you create the perfect environment for your needs. By avoiding the compromise of over-provisioning certain resources while potentially constraining others, you can achieve both better performance and more efficient spending across your Compute Engine deployment.

As an example, take a memory-intensive workload that runs best with 16 vCPUs and 70 GB of memory. With standard shapes, here or on other clouds, you would normally need to pick a VM with 128 GB of memory and pay for the extra provisioned resources. With custom machine types, you can instead launch a VM with exactly 16 vCPUs and 70 GB of memory, resulting in an 18% cost savings versus a standard N4-highmem-16 VM.
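
The capacity side of that example is simple arithmetic. The sketch below uses only the two memory figures quoted above; the function name is illustrative, and no pricing is modeled.

```python
def unused_fraction(provisioned: float, needed: float) -> float:
    """Fraction of a provisioned resource that the workload does not need."""
    if provisioned <= 0 or needed < 0 or needed > provisioned:
        raise ValueError("expected 0 <= needed <= provisioned and provisioned > 0")
    return (provisioned - needed) / provisioned


standard_memory_gb = 128  # memory on the standard 16-vCPU high-memory shape
needed_memory_gb = 70     # what the example workload actually needs

extra_gb = standard_memory_gb - needed_memory_gb
waste = unused_fraction(standard_memory_gb, needed_memory_gb)
print(f"{extra_gb} GB unused ({waste:.0%} of provisioned memory)")
```

The quoted cost saving is smaller than the unused-memory fraction because memory is only one part of the VM price; the vCPUs are billed either way.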

5. Make the most of committed use discounts

CUDs are a strategic cost-saving opportunity for organizations with steady, predictable computing needs. By committing to resource usage over one- or three-year periods, you can reduce cloud costs by up to 70% compared to on-demand pricing. This approach not only helps ensure budget predictability but also converts fixed infrastructure spending into a financial advantage, making it ideal for stable workloads that support core business functions.

Google Cloud offers flexible CUD structures to align with various operational models. Resource-based commitments target specific machine types and regions, while flexible commitments apply discounts across projects, regions, and machine series, which suits dynamic environments. By analyzing historical usage and forecasting future needs, you can identify workloads suited for these discounts and reinvest the savings into innovation and scaling initiatives.

6. Manage unused disk space

You pay for the total provisioned disk space, regardless of how much you actually use. Many organizations tend to over-provision storage "just in case," which often leads to unnecessary and costly waste. For instance, if you provision a 100GB disk but only use 20GB, you're still paying for the entire 100GB. Being intentional and precise with your storage allocations — rather than rounding up to common sizes — can lead to significant cost savings.

To optimize spending, it's important to adopt a few best practices. Using Ops Agent, regularly audit disk usage across your infrastructure to identify and eliminate inefficiencies. Resize disks to align with actual consumption, allowing a reasonable buffer for growth. Implement automated alerts in Cloud Monitoring to detect underutilized disks and take corrective action. For stateless applications, consider using smaller boot disk images to minimize overhead and reduce costs even further.
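
As a rough illustration of "resize to actual consumption plus a buffer," the sketch below applies a growth buffer to the 100GB/20GB example from earlier in this section. The 30% buffer and the function name are assumptions chosen for illustration, not a recommended value.

```python
import math


def recommended_size_gb(used_gb: float, buffer_percent: int = 30) -> int:
    """Used space plus a growth buffer, rounded up to a whole GB."""
    if used_gb < 0 or buffer_percent < 0:
        raise ValueError("used_gb and buffer_percent must be non-negative")
    return math.ceil(used_gb * (100 + buffer_percent) / 100)


provisioned_gb = 100
used_gb = 20

target_gb = recommended_size_gb(used_gb)
reclaimable_gb = provisioned_gb - target_gb
print(f"target {target_gb} GB, reclaimable {reclaimable_gb} GB")
```

Feeding this kind of calculation with the usage figures from your regular audits gives a consistent sizing rule instead of rounding up to familiar disk sizes.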

In addition, consider the following optimization strategies to further reduce costs and improve efficiency:

Use Google Cloud’s monitoring tools to track CPU, memory, and disk usage over time.
Establish a regular review cycle to identify and right-size over-provisioned resources.
Test workloads across different VM configurations to find the optimal balance between cost and performance.

7. Use Spot VMs

Spot VMs provide the same machine types and configuration options as standard virtual machines but at a significantly reduced cost — typically offering a 60% to 91% discount. This cost efficiency comes with the tradeoff of potential preemption at short notice, making them most suitable for workloads that are fault-tolerant and can recover quickly from unexpected interruptions. Spot VMs are designed to take advantage of unused compute capacity, allowing you to optimize your cloud spending without compromising access to high-performance resources.

Strong use cases for Spot VMs include batch processing jobs, big data and analytics workloads, continuous integration and deployment (CI/CD) pipelines, stateless web servers running in autoscaling groups, and compute-heavy tasks. When properly architected to handle interruptions — for example, by using job checkpointing, load balancing, task queues, or via GKE custom compute classes (see more above) — Spot VMs can play a critical role in minimizing infrastructure costs while maintaining high availability and system resilience. Leveraging Spot VMs in these scenarios lets you scale cost-effectively, especially when compute demand is variable or time-flexible.

8. Use optimization recommendations

Google Cloud's Recommenders are a powerful tool designed to help you optimize your cloud resources efficiently. When browsing the Google Cloud console, you may see lightbulb icons next to specific resources — these indicate potential improvements identified by Google's recommendation engine. By analyzing real-time usage patterns and current resource configurations, the Recommender delivers actionable insights tailored to each user's unique environment. This intelligent system highlights opportunities not only to reduce costs but also to enhance security, performance, reliability, management efficiency, and environmental sustainability.

For example, there are idle VM recommendations to help you identify VM instances that have not been used over the last 1 to 14 days. Common recommendations include switching to more suitable machine types, rightsizing underutilized compute instances, or adopting more cost-effective storage solutions. The tool allows you to apply many of these changes directly, streamlining the optimization process. By continuously evaluating workloads and offering these automated, data-driven suggestions, the Recommendation Hub helps organizations maintain cloud performance while managing costs more effectively.

9. Take advantage of auto-scaling and scheduling

Matching your compute resources to actual demand patterns is one of the most effective ways to reduce cloud waste and improve overall cost efficiency. Many organizations over-provision their resources to handle peak workloads, leaving machines underutilized during off-peak periods. By aligning compute capacity more closely with real-time or predictable usage patterns, such as business hours or seasonal trends, you can significantly cut unnecessary spending without sacrificing performance.

Autoscaling is the key to achieving this efficiency. In fact, customers who leverage Google Compute Engine autoscaling for their virtual machines have seen average infrastructure cost savings of more than 40%.

You can implement autoscaling strategies to dynamically adjust resources based on CPU utilization, load balancing capacity, or custom application metrics, so that workloads receive the necessary compute power when needed, while scaling down automatically during low-demand periods.
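
The intuition behind utilization-based autoscaling can be sketched with a simple proportional rule. This is a simplified model for illustration, not the actual Compute Engine autoscaler algorithm; the function name, the percentage inputs, and the example numbers are all assumptions.

```python
import math


def desired_instances(current: int, observed_pct: float, target_pct: float,
                      min_size: int, max_size: int) -> int:
    """Size the group so average utilization moves toward the target."""
    if current < 1 or not 0 < target_pct <= 100 or observed_pct < 0:
        raise ValueError("invalid autoscaling inputs")
    wanted = math.ceil(current * observed_pct / target_pct)
    # Clamp to the configured group bounds.
    return max(min_size, min(max_size, wanted))


# 4 VMs at 90% average CPU against a 60% target: scale out.
scale_out = desired_instances(4, 90, 60, min_size=2, max_size=10)
# 4 VMs at 15% average CPU: scale in, but never below the minimum.
scale_in = desired_instances(4, 15, 60, min_size=2, max_size=10)
print(scale_out, scale_in)
```

The same shape applies whether the signal is CPU utilization, load balancing capacity, or a custom metric: compare the observed value with a target and resize within fixed bounds.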

For workloads with predictable patterns, such as those that fluctuate with business hours or planned seasonal events, schedule-based scaling is a particularly powerful tool. This approach allows you to proactively increase resources in anticipation of high demand and scale them down during lulls, giving you the performance you need without constant over-provisioning.

In addition to autoscaling, several practical implementation techniques can further optimize your resource usage. Setting up instance scheduling lets you automatically start and stop development and test environments according to business hours — a simple yet highly effective approach that can lead to cost savings of up to 70%. You can also leverage maintenance windows to reduce disruptions and resource consumption, by concentrating updates and system changes into low-usage periods. Together, these tactics help maintain high availability and performance while keeping infrastructure costs under control.
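
To see where a figure like "up to 70%" can come from, here is a back-of-the-envelope calculation for a hypothetical schedule of 10 hours a day, five days a week. It counts VM runtime hours only; attached disks and other resources may still incur charges while a VM is stopped.

```python
HOURS_PER_WEEK = 7 * 24


def schedule_savings(hours_per_day: int, days_per_week: int) -> float:
    """Fraction of always-on runtime avoided by a start/stop schedule."""
    if not (0 <= hours_per_day <= 24 and 0 <= days_per_week <= 7):
        raise ValueError("schedule out of range")
    running = hours_per_day * days_per_week
    return 1 - running / HOURS_PER_WEEK


# Hypothetical schedule: 10 hours a day, Monday to Friday.
savings = schedule_savings(10, 5)
print(f"{savings:.0%} fewer VM hours than running 24/7")
```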

10. Understand your spend with detailed billing analysis

Before implementing any cost-saving strategies in Google Cloud, it’s essential to understand your current spending in detail. Google Cloud’s billing panel offers granular visibility into your expenses, including costs broken down by individual SKUs. This level of transparency lets you track where your money is going and identify potential inefficiencies. Begin by regularly reviewing your billing dashboard to monitor usage trends and spot anomalies. Applying labels and tags to your resources can further help categorize and attribute costs accurately, especially in complex environments with multiple projects or departments.

In addition, setting up budget alerts is a practical way to stay ahead of overspending by notifying you when costs approach or exceed predefined thresholds. It’s also important to identify and eliminate unused or idle resources, such as virtual machines or persistent disks that are no longer in active use — these can often be shut down or deleted to immediately reduce costs. By thoroughly analyzing your cost structure, you can uncover “low-hanging fruit” — resources that provide little or no value — and make data-driven decisions to optimize your cloud usage efficiently.
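
The logic behind a budget alert is a threshold comparison. The sketch below is a standalone illustration, not the Cloud Billing API; the 50%, 90%, and 100% thresholds and the spend figures are assumed values.

```python
def crossed_thresholds(spend: float, budget: float,
                       thresholds: tuple[float, ...] = (0.5, 0.9, 1.0)) -> list[float]:
    """Return every alert threshold that current spend has reached."""
    if budget <= 0:
        raise ValueError("budget must be positive")
    ratio = spend / budget
    return [t for t in sorted(thresholds) if ratio >= t]


# Hypothetical month-to-date spend against a monthly budget.
alerts = crossed_thresholds(spend=460.0, budget=500.0)
print(alerts)
```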

11. Consider serverless alternatives

Last but not least, Google Cloud's serverless computing offerings provide a compelling alternative to traditional virtual machines and can deliver better cost efficiency, simplified operations, and greater scalability. By abstracting away infrastructure management, serverless platforms allow teams to focus on writing and deploying code without worrying about provisioning, scaling, or maintaining servers. This shift can not only reduce operational overhead but also cut costs by aligning compute spending directly with application usage.

There are multiple serverless options available, each tailored to different workloads. Cloud Run is designed for running containerized applications that need rapid scaling and flexible deployment. Cloud Run Functions supports lightweight, event-driven code execution for microservices or automation tasks. GKE (Autopilot Mode) simplifies Kubernetes operations by automatically managing nodes and scaling, allowing you to run Kubernetes workloads without handling the underlying infrastructure. All these options charge based on usage, not allocation, significantly reducing costs associated with idle resources and over-provisioning. This makes them especially beneficial for variable or unpredictable workloads. Cloud Run and GKE both support GPUs, and you can move between the two: start with Cloud Run and then move to GKE, or vice versa. Some customers also use both offerings for their workloads. The rule of thumb is to start with GKE if you need access to the Kubernetes API; otherwise, start with Cloud Run.

Start reducing your costs today

Migrate to Google Cloud and optimize your infrastructure costs without compromising on what your workloads need. If you are new to Google Cloud, start with a migration assessment. Google Cloud’s Migration Center can give you a clear understanding of the potential savings from migrating to Google Cloud, with detailed recommended paths for your workloads, along with TCO reports. Apply the strategies in this article and unlock substantial cost savings.

How Baseten achieves 225% better cost-performance for AI inference (and you can too)

Thu, 04 Sep 2025 17:00:00 +0000

Baseten is one of a growing number of AI infrastructure providers, helping other startups run their models and experiments at speed and scale. Given the importance of those two factors to its customers, Baseten has just passed a significant milestone.

By leveraging the latest Google Cloud A4 virtual machines (VMs) based on NVIDIA Blackwell and Google Cloud’s Dynamic Workload Scheduler (‘DWS’), Baseten has achieved 225% better cost-performance for high-throughput inference and 25% better cost-performance for latency-sensitive inference.

Why it matters: This breakthrough in performance and efficiency enables companies to move powerful agentic AI and reasoning models out of the lab and into production affordably. For technical leaders, this provides a blueprint for building next-generation AI products — such as real-time voice AI, search, and agentic workflows — at a scale and cost-efficiency that has been previously unattainable.

The big picture: Inference is the cornerstone of enterprise AI. As models for multi-step reasoning and decision-making demand exponentially greater compute, the challenge of serving them efficiently has become the primary bottleneck. Enter Baseten, a six-year-old Series C company that partners with Google Cloud and NVIDIA to provide enterprise companies a scalable inference platform for their proprietary models as well as open models like Gemma, DeepSeek, and Llama, with an emphasis on performance and cost efficiency. Their success hinges on a dual strategy: maximizing the potential of cutting-edge hardware and orchestrating it with a highly optimized, open software stack.

We wanted to share more about how Baseten architected its stack — and what this new level of cost-efficiency can unlock for your inference applications.

Hardware optimization with the latest NVIDIA GPUs

Baseten delivers production-grade inference by leveraging a wide range of NVIDIA GPUs on Google Cloud, from NVIDIA T4s through the recent A4 VMs (NVIDIA HGX B200). This access to the latest hardware is critical for achieving new levels of performance.

With A4 VMs, Baseten now serves three of the most popular open-source models (DeepSeek V3, DeepSeek R1, and Llama 4 Maverick) directly on its Model APIs, with over 225% better cost-performance for high-throughput inference and 25% better cost-performance for latency-sensitive inference.
In addition to its production-ready model APIs, Baseten provides additional flexibility with NVIDIA B200-powered dedicated deployments for customers seeking to run their own custom AI models with the same reliability and efficiency.

Advanced software for peak performance

Baseten’s approach is rooted in coupling the latest accelerated hardware with leading and open-source software to extract the most value possible from every chip. This integration is made possible with Google Cloud’s AI Hypercomputer, which includes a broad suite of advanced inference frameworks, including NVIDIA’s open-source software stack — NVIDIA Dynamo and TensorRT-LLM — as well as SGLang and vLLM.

Using TensorRT-LLM, Baseten optimizes and compiles custom LLMs for one of its largest AI customers, Writer. This has boosted throughput for Writer’s Palmyra LLMs by more than 60%. The flexibility of TensorRT-LLM also enabled Baseten to develop a custom model builder that speeds up model compilation.
To serve reasoning models like DeepSeek R1 and Llama 4 on NVIDIA Blackwell GPUs, Baseten uses NVIDIA Dynamo. The combination of NVIDIA’s HGX B200 and Dynamo dramatically lowered latency and increased throughput, propelling Baseten to the top GPU performance spot on OpenRouter’s LLM ranking leaderboard.
The team leverages techniques such as kernel fusion, memory hierarchy optimization, and custom attention kernels to increase tokens per second, reduce time to first token, and support longer context windows and larger batch sizes — all while maintaining low latency and high throughput.

Building a backbone for high availability and redundancy

For mission-critical AI services, resilience is non-negotiable. Baseten runs globally across multiple clouds and regions, requiring an infrastructure that can handle ad hoc demand and outages. Flexible consumption models, such as the Dynamic Workload Scheduler within the AI Hypercomputer, help Baseten manage capacity similar to on-demand with additional price benefits. This allows them to scale up on Google Cloud if there are outages across other clouds.

"Baseten runs globally across multi-clouds and Dynamic Workload Scheduler has saved us more than once when we encounter a failure,” said Colin McGrath, head of infrastructure at Baseten. “Our automated system moves affected workloads to other resources including Google Cloud Dynamic Workload scheduler and within minutes, everyone is up and running again. It is impressive — by the time we’re paged and check-in, everything is back and healthy. This is amazing and would not be possible without DWS. It has been the backbone for us to run our business.”

Baseten’s scalable inference platform architecture

Unlocking new AI applications for end-users

Baseten's collaboration with Google Cloud and NVIDIA demonstrates how a powerful combination of cutting-edge hardware and flexible, scalable cloud infrastructure can solve the most pressing challenges in AI inference through Google Cloud’s AI Hypercomputer.

This unique combination enables end-users across industries to bring new applications to market, such as powering agentic workflows in financial services, generating real-time audio and video content in media, and accelerating document processing in healthcare. And it’s all happening at a scale and cost that was previously unattainable.

You can easily get started with Baseten's platform through the Google Cloud Marketplace, or read more about their technical architecture in their own post.

From clicks to clusters: Expanding Confidential Computing with Intel TDX

Fri, 29 Aug 2025 16:00:00 +0000

Privacy-protecting Confidential Computing has come a long way since we introduced Confidential Virtual Machines (VMs) five years ago. The technology, which can protect data while in use, closes a security gap left by encrypting data only at rest and in transit.

Since then, customers have used Confidential Computing to protect patient medical data, comply with privacy guidance of GDPR and Schrems II for U.S.-Europe data transfers, and run high-performance computing (HPC) workloads securely.

By isolating workloads in hardware-based Trusted Execution Environments (TEEs), Confidential Computing empowers customers to process their most sensitive information in the public cloud with assurance.

As part of the advancements we’ve made with Confidential Computing, we added even more security capabilities with the introduction of Confidential VMs with Intel Trust Domain Extensions (TDX) last year. Intel TDX creates an isolated trust domain (TD) in a VM, uses hardware extensions for managing and encrypting memory to protect cloud workloads, and offers hardware-based remote attestation for verification.

Today, we are excited to highlight our greatly expanded, and generally available, Intel TDX-based offerings, which include Confidential GKE Nodes, Confidential Space, Confidential GPU, and more regions and zones where customers can use Confidential Computing.

Click to create a Confidential VM

Google Cloud Console now offers Google Compute Engine (GCE) customers a new interface for Intel TDX — no code changes required. To get started, follow these steps:

Start at the GCE Create an instance page.
Go to the Security tab and, under Confidential VM service, click Enable.
Select Intel TDX from the dropdown menu and click Confirm.

It’s that simple to create a Confidential VM.

Create a new Confidential VM with Intel TDX in the Google Cloud console.
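
If you prefer to script the same choice rather than click through the console, the request roughly takes the shape sketched below. This is a partial, illustrative request body only: the field names are modeled on the Compute Engine REST API as an assumption and should be checked against the current reference, the instance name and zone are placeholders, and required sections such as disks and network interfaces are omitted.

```python
import json


def tdx_instance_body(name: str, zone: str,
                      machine_type: str = "c3-standard-4") -> dict:
    """Build a partial instance-creation body with Intel TDX selected."""
    return {
        "name": name,
        "machineType": f"zones/{zone}/machineTypes/{machine_type}",
        # Assumed field names, modeled on the Compute Engine REST API.
        "confidentialInstanceConfig": {"confidentialInstanceType": "TDX"},
        # Assumption: TDX instances are not live-migrated, so host
        # maintenance is set to terminate the instance instead.
        "scheduling": {"onHostMaintenance": "TERMINATE"},
    }


# Placeholder name and zone; pick a zone where Intel TDX on C3 is offered.
body = tdx_instance_body("tdx-demo", "us-central1-a")
print(json.dumps(body, indent=2))
```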

Get Confidential Computing in more regions and zones

Confidential VMs with Intel TDX were first available in three regions (and nine zones). To accommodate growing demand, we’ve expanded support for Intel TDX on the C3 machine series to 10 regions (and 21 zones), and we are planning more for the future. The full list is available here. As regional availability and scalability are critical, your account team is available to help you plan early to ensure your capacity needs are met.

Confidential GKE Nodes with Intel TDX, now generally available

Confidential GKE Nodes are built on top of Confidential VM and deliver hardware-based protections to your Google Kubernetes Engine (GKE) clusters and node pools to ensure that your containerized workloads remain encrypted in memory. Today, Confidential GKE Nodes are generally available with Intel TDX on GKE Standard and GKE Autopilot.

Confidential GKE Nodes with Intel TDX on the C3 machine series can be created on GKE Standard via CLI, API, UI, and Terraform. The confidential setting can be set at the cluster level or the node pool level with no code changes. You can learn more here.

Confidential GKE Nodes with Intel TDX on the C3 machine series can also be created on GKE Autopilot. It can be enabled through the use of custom compute classes. In GKE, a compute class is a profile that consists of a set of node attributes that GKE uses to provision the nodes that run your workloads during autoscaling events. Check out our documentation to get started.

Confidential Space with Intel TDX, now generally available

Also built on Confidential VM, our Confidential Space offering is a robust solution for many common issues including addressing insider threats, enabling joint machine-learning training and private gen AI inference, and fostering multi-party collaboration on sensitive data. Here are just a few examples of what our customers have built with Confidential Space:

Confidential matching enabled customers to securely connect their first-party data for Google Ads measurement and audience solutions.
Symphony demonstrated with its Confidential Cloud how SaaS companies can guarantee isolation of customer data from privileged insiders in the highly regulated financial industry.
Duality delivered privacy-preserving federated learning solutions for a broad range of use cases in healthcare, financial services, and the public sector.
Flare spearheaded innovation in verifiable AI on blockchain.

Previously, Confidential Space was only available with AMD-based technology and hardware (on the N2D, C2D, C3D, and C4D machine series), but now it is also available with Intel-based technology and hardware. This is ideal for those wanting attestation guarantees with a hardware root of trust and for those focused on Intel’s C3 machine series.

Additionally, Confidential Space with Intel TDX is measured into runtime measurement registers (RTMR) and the measurements are verified by Google Cloud Attestation. Note that for Confidential VMs with Intel TDX, RTMRs are now populated as well. Confidential Space benefits are highlighted in the NCC Group’s latest independent security evaluation.

Confidential VM and Confidential GKE Nodes with NVIDIA H100 GPUs, now generally available

If you’re looking for performance and security while protecting data in use, Confidential VM and Confidential GKE Nodes with NVIDIA H100 GPUs on the accelerator-optimized A3 machine series are now generally available. These offerings deliver Google Cloud’s first Confidential GPUs, focus on ease of use to meet the demand for secure computing, and extend security to data-intensive AI and ML workloads by having Intel TDX enabled on the CPU and NVIDIA Confidential Computing enabled on the GPU. You can now secure your data during inference and training across models without giving up performance.

Confidential VM with NVIDIA H100 GPUs is available with the a3-highgpu-1g machine type and in three zones: europe-west4-c, us-central1-a, and us-east5-a. No code changes are needed for most AI and ML workloads. For pricing details, see here. Confidential GKE Nodes with NVIDIA H100 GPUs are generally available on both GKE Standard and GKE Autopilot (through custom compute class). To get started, click here.

Confidential Space with NVIDIA H100 GPUs is also available in preview.

Intel has a free tier for independent attestation

Intel’s attestation verifier service, Intel Tiber Trust Authority, now has a free tier (with optional paid support), making secure attestation more accessible for all. Google Cloud Confidential VMs and Confidential Space are both integrated with Intel Tiber Trust Authority as a third-party attestation service.

When Confidential VM and Confidential Space customers use Intel Tiber Trust Authority, they can gain stronger separation of duties security guarantees. Click here to learn more.

What our customers say

"Thanks to the joint efforts of Super Protocol, Google Cloud, and NVIDIA, the world now gains a new layer of possibility — unlocking Confidential AI without cloud borders. With A3 Confidential VMs built on NVIDIA H100 GPUs now integrated into Super’s decentralized infrastructure and marketplace, companies can securely run, monetize, and collaborate on sensitive AI and data — across any environment. This enables seamless collaboration between Google Cloud customers and partners in other clouds — with no need for shared trust, manual agreements, or compromise. For the broader market, A3 instances at scale accelerate global access, while Super ensures confidentiality, verifiability, and self-sovereignty — fully automated and requiring no expertise in confidential computing. We are excited to open this next chapter of Confidential AI, built to work wherever you and your partners are," said Nukri Basharuli, founder and CEO, Super Protocol.

“We’re proud to have partnered with Google Cloud to validate their Confidential Computing-enabled GPU solution — a major step forward in securing sensitive data for AI and machine learning workloads, without compromising on performance or scalability. Confidential Computing allows organizations to process sensitive workloads in the cloud while protecting sensitive data and models from both the cloud provider and the organization's insiders and internal threats. However, for gen AI and agentic AI use cases, protecting the CPU alone isn’t enough — both CPU and GPU must also run in confidential mode with mutual trust. With Google Cloud’s new offering, Anjuna can now launch Confidential Containers that leverage Intel TDX and NVIDIA H100 GPUs in confidential mode. This ensures that data, configurations, secrets, and code remain protected end-to-end from any untrusted entity, bringing state-of-the-art security for sensitive data.” said Steve Van Lare, CTO, Anjuna Security.

“With data processing worldwide growing up to three times faster than ever before and doubling every six months, the future of cloud computing must be built on trust. In collaboration with Google, Modelyo leverages Confidential VMs on the A3 machine series with NVIDIA H100 GPUs, transforming Confidential Computing into a seamless, intuitive, and fully integrated cloud experience. This enables us to deliver end-to-end managed solutions across interconnected environments, empowering organizations to innovate confidently knowing their data remains effortlessly protected at every stage.” said Benny Meir, CEO, Modelyo.

How to get started with Confidential Computing

To add that extra layer of protection and privacy to your sensitive workloads, check out our documentation for Confidential VMs and Confidential GKE Nodes today.

Run Gemini anywhere, including on-premises, with Google Distributed Cloud

Thu, 28 Aug 2025 05:00:00 +0000

Earlier this year, we announced our commitment to bring Gemini to on-premises environments with Google Distributed Cloud (GDC). Today, we are excited to announce that Gemini on GDC is now available to customers.

For years, enterprises and governments with the strictest data security and sovereignty requirements have faced a difficult choice: adopt modern AI or protect their data. Today, that compromise ends. We are announcing the general availability of Gemini on GDC air-gapped and the preview of Gemini on GDC connected, bringing Google's most advanced models directly into your data center.

We are inspired by initial feedback from customers, including Singapore’s Centre for Strategic Infocomm Technologies (CSIT), Government Technology Agency of Singapore (GovTech Singapore), Home Team Science and Technology Agency (HTX), KDDI, and Liquid C2, who are excited to gain the advantages of generative AI with Gemini on GDC.

Transformative AI capabilities, on-premises

Gemini models offer groundbreaking capabilities, from processing extensive context to native multimodal understanding of text, images, audio, and video. This unlocks a wide array of high-impact use cases on secure infrastructure:

Unlock new markets and global collaboration: Instantly break down language barriers across your international operations, creating a more connected and efficient global workforce.
Accelerate decision-making: Make faster, data-driven decisions by using AI to automatically summarize documents, analyze sentiment, and extract insights from your proprietary datasets.
Improve employee efficiency and customer satisfaction: Deliver instant, 24/7 support and enhance user satisfaction by developing intelligent chatbots and virtual assistants for customers and employees.
Increase development velocity: Ship higher-quality software faster by using Gemini for automated code generation, intelligent code completion, and proactive bug detection.
Strengthen safety & compliance: Protect your users with AI-powered safety tools that automatically filter harmful content and ensure adherence to industry policies.

Secure AI infrastructure where you need it

It takes more than just a model to drive business value with generative AI; you need a complete platform that includes scalable AI infrastructure, a library with the latest foundational models, high-performance inferencing services, and pre-built AI agents like Agentspace search. GDC provides all that and more with an end-to-end AI stack combining our latest-generation AI infrastructure with the power of Gemini models to accelerate and enhance all your AI workloads.

Delivering these transformative capabilities securely requires a complete, end-to-end platform that only Google is providing today:

Performance at scale: GDC utilizes the latest NVIDIA GPU accelerators, including the NVIDIA Hopper and Blackwell GPUs. A fully managed Gemini endpoint is available within a customer or partner data center, featuring a seamless, zero-touch update experience. High performance and availability are maintained through automatic load balancing and auto-scaling of the Gemini endpoint, which is handled by our L7 load balancer and advanced fleet management capabilities.
Foundation of security and control: Security is a core component of our solution, with audit logging and access control capabilities that provide full transparency for customers. This allows them to monitor all data traffic in and out of their on-premises AI environment and meet strict compliance requirements. The platform also features Confidential Computing support for both CPUs (with Intel TDX) and GPUs (with NVIDIA's confidential computing) to secure sensitive data and prevent tampering or exfiltration.
Flexibility and speed for your AI strategy: The platform supports a variety of industry-leading models including Gemini 2.5 Flash and Pro, Vertex AI task-specific models (translation, optical character recognition, speech-to-text, and embeddings generation), and Google’s open-source Gemma models. GDC also provides managed VM shapes (A3 & A4 VMs) and Kubernetes clusters, giving customers the ability to deploy any open-source or custom AI model and custom AI workloads of their choice. This is complemented by Vertex AI services that provide an end-to-end AI platform including a managed serving engine, data connectors, and pre-built agents like Agentspace search (in preview) for a unified search experience across on-premises data.

What our customers are saying

“As a key GDC collaboration partner in shaping the GDC air-gapped product roadmap and validating the deployment solutions, we’re delighted that this pioneering role has helped us grow our cutting-edge capabilities and establish a proven deployment blueprint that will benefit other agencies with similar requirements. This is only possible with the deep, strategic collaboration between CSIT and Google Cloud. We’re also excited about the availability of Gemini on GDC, and we look forward to building on our partnership to develop and deploy agentic AI applications for our national security mission.” - Loh Chee Kin, Deputy Chief Executive, Centre for Strategic Infocomm Technologies (CSIT)

“One of our priorities is to harness the potential of AI while ensuring that our systems and the services citizens and businesses rely on remain secure. Google Cloud has demonstrated a strong commitment to supporting the public sector with initiatives that enable the agile and responsible adoption of AI. We look forward to working more closely with Google Cloud to deliver technology for the public good.” - Goh Wei Boon, Chief Executive, Government Technology Agency of Singapore

“The ability to deploy Gemini on Google Distributed Cloud will allow us to bridge the gap between our on-premises data and the latest advancements in AI. Google Distributed Cloud gives us a secure, managed platform to innovate with AI, without compromising our strict data residency and compliance requirements.” - Ang Chee Wee, Chief AI Officer, Home Team Science & Technology Agency (HTX)

“The partnership with Google Cloud and the integration of Google's leading Gemini models will bring cutting-edge AI capabilities, meet specific performance requirements, address data locality and regulatory needs of Japanese businesses and consumers.” - Toru Maruta, Executive Officer, Head of Advancing Business Platform Division, KDDI

“Data security and sovereignty are paramount for our customers. With Gemini on Google Distributed Cloud, our Liquid Cloud and Cyber Security solution would deliver strategic value to ensure our customers in highly regulated industries can harness the power of AI while keeping their most valuable data under their control.” - Oswald Jumira, CEO, Liquid C2

Gemini everywhere is here

The era of on-premises AI without compromise is here. To bring the power of Gemini to your on-premises environment, request a strategy session with our experts.

An efficient path to production AI: Kakao’s journey with JAX and Cloud TPUs

Tue, 19 Aug 2025 16:00:00 +0000

When your messaging platform serves 49 million people – 93% of South Korea’s population – every technical decision carries enormous weight. The engineering team at Kakao faced exactly this challenge when their existing infrastructure hit critical limitations. Their solution? A strategic shift to Google Cloud TPUs using the JAX framework that not only solved their immediate scalability needs but opened new possibilities for advanced AI model development.

Kakao’s approach provides a compelling example of leveraging the high-performance array computing framework JAX for AI model development at scale. While their primary training environment was GPU-based, the team made a strategic decision to adopt the JAX stack on Google Cloud TPUs to optimize for cost and efficiency.

This work laid the groundwork for the development of their proprietary Kanana model family, and several Kanana models — including Kanana-MoE — have recently been released as open source on Hugging Face Hub.

In this post, Minho Ryu and Nayeon Kim detail Kakao’s technical journey. They cover their specific implementation details, from adapting MaxText, the JAX large language model framework, for custom data pipelines to their work on mixture-of-experts (MoE) model training.


Kakao’s journey by Minho and Nayeon:

As engineers at Kakao, we develop models that serve KakaoTalk, a platform supporting services that extend far beyond text. Our rich ecosystem includes chat with over 700,000 images and stickers (emojis), voice and video calls, finance, and navigation.

KakaoTalk’s massive scale and complexity demand that our language models are not only highly efficient but also excel at understanding the Korean language and are flexible enough for diverse applications. These real-world product requirements directly influenced our technical decisions and our need for a customizable training framework.

Our journey with JAX began at an important inflection point. Our existing GPU-based infrastructure was reaching power and budget capacity constraints. We had two options: expand our GPU infrastructure and maintain our existing codebase, or adopt Cloud TPUs, which offered cost-performance advantages while requiring adoption of a new toolchain. We chose Cloud TPUs, viewing the short-term investment as worthwhile for long-term cost-performance benefits, and built our stack on JAX.

We use XPK for Kubernetes cluster management, which simplifies job creation and management on GKE without requiring Kubernetes expertise. For the data pipeline, we adopted Grain due to its deterministic behavior, which is essential for the stability of long-running AI model training jobs.

We focused on adapting the MaxText framework to fit our specific research and compatibility needs. We made two key customizations to the pipeline:

1. Multi-source data blending: When we began exploring training with MaxText, it assumed a single, pre-mixed corpus. Our research requires blending different data sources (such as web text, code, and math) with specific, dynamically adjusted weights during different training phases. To achieve this flexibility without reprocessing terabytes of data for each experiment, we implemented a solution using Grain's mix function. This approach allows us to define blending ratios in our configuration, providing the adaptability essential for our iterative research process. We filed a PR to support this feature natively in MaxText, and it has since been incorporated.

2. Token processing for efficiency and compatibility: To maintain compatibility with our existing Megatron-LM pipeline and improve efficiency, we modified MaxText's token processing logic. Our data preparation method constructs each training sequence by appending the first token of the subsequent sequence. This creates overlapping, continuous sequences, ensuring that no information is lost at the boundaries and maximizing data utilization.
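
As a rough illustration of the two customizations above, here is a dependency-free Python sketch. It does not use Grain or MaxText and is not Kakao's implementation: the function names, source names, integer weights, and toy token IDs are all hypothetical. The first helper assumes that blending can be modeled as a deterministic weighted interleave. The second assumes that each stored sequence holds `seq_len + 1` tokens, with the extra token being the first token of the next sequence.

```python
from typing import Dict, Iterator, List, Tuple


def blend_sources(
    sources: Dict[str, List[str]],
    weights: Dict[str, int],
    num_examples: int,
) -> Iterator[Tuple[str, str]]:
    """Yield (source_name, example) pairs in a deterministic weighted order.

    Hypothetical helper: at each step it picks the source that is furthest
    behind its target share. Integer weights keep the arithmetic exact, and
    ties resolve by the insertion order of `sources`, so no RNG is involved.
    Sources that run out of examples are cycled.
    """
    total = sum(weights[name] for name in sources)
    emitted = {name: 0 for name in sources}
    for step in range(1, num_examples + 1):
        name = max(sources, key=lambda n: weights[n] * step - emitted[n] * total)
        example = sources[name][emitted[name] % len(sources[name])]
        emitted[name] += 1
        yield name, example


def build_overlapping_sequences(tokens: List[int], seq_len: int) -> List[List[int]]:
    """Split a token stream into chunks of seq_len + 1 tokens.

    The last token of chunk i is also the first token of chunk i + 1.
    A trailing partial chunk is dropped.
    """
    chunks = []
    for start in range(0, len(tokens) - seq_len, seq_len):
        chunks.append(tokens[start : start + seq_len + 1])
    return chunks


def split_inputs_targets(chunk: List[int]) -> Tuple[List[int], List[int]]:
    """Shift by one so every input position has a next-token target."""
    return chunk[:-1], chunk[1:]
```

With toy token IDs 0 through 8 and `seq_len=4`, `build_overlapping_sequences` returns `[[0, 1, 2, 3, 4], [4, 5, 6, 7, 8]]`: token 4 closes the first chunk and opens the second, so no next-token pair is lost at the boundary. Changing the `weights` mapping between training phases changes the blend without touching the underlying data.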

To validate our new TPU-based workflow, we trained two models. First, we trained the Kanana 2.1 billion parameter model from scratch, and the results demonstrated that our MaxText implementation achieved performance comparable to our existing GPU-based Megatron-LM pipeline at each stage. Second, we performed depth upscaling with continued pre-training from our existing 8B model to a 9.8B architecture. Both approaches succeeded and showed consistent improvements across various benchmarks, confirming that the results on GPU were effectively reproduced on TPU.

Advancing our approach: Training Mixture-of-Experts (MoE) models with MaxText

With the core pipeline validated, we began experimenting with more advanced architectures, specifically MoE models, to build inference-efficient models that maintain strong performance. Our objectives were to explore upcycling an existing dense model into an MoE structure and to evaluate the suitability of the TPU and MaxText stack for this task.

For the experiment, we upcycled our 2.1B dense model into a 13.4B parameter (2.3B active) MoE architecture with 64 experts and 8 active experts per token. We trained this model on the exact same dataset as the original dense model to isolate the impact of the architectural change. The training was performed on v5e TPUs using MaxText with Fully Sharded Data Parallelism (FSDP).

The implementation process was straightforward. We found that MaxText's flexible design, built on Flax, Optax, and Orbax, was well-suited for the wide range of ablations required for MoE research. Specifically:

Integrated Kernels: Megablocks MoE kernels, which support optimized MoE features like Group GEMM, were already integrated into JAX.
Combining Schedules: We used the optax.join_schedules function to combine multiple learning rate schedules (e.g., warmup, constant, and annealing) into a single, custom schedule for our training run. This ability to combine different schedules is very useful for experimenting with different training strategies.
Code Customization: We needed to enable the load balancing loss for our sparse matmul implementation. This required inserting a single line of code in the permute function within the MoE block of MaxText to calculate the loss directly from the router logits.
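
To make the schedule-combining idea concrete, the following dependency-free Python sketch stitches a warmup, a constant phase, and a linear anneal together at step boundaries. It mirrors the general behavior of a helper like `optax.join_schedules` but is not Optax code, and the helper names, step counts, and learning rates are hypothetical rather than values from Kakao's runs.

```python
from typing import Callable, List, Sequence

Schedule = Callable[[int], float]


def linear(start: float, end: float, steps: int) -> Schedule:
    """Move linearly from start to end over `steps`, then hold at end."""
    def schedule(step: int) -> float:
        frac = min(max(step, 0), steps) / steps
        return start + (end - start) * frac
    return schedule


def constant(value: float) -> Schedule:
    """Return the same value at every step."""
    return lambda step: value


def join(schedules: Sequence[Schedule], boundaries: List[int]) -> Schedule:
    """Switch to schedules[i + 1] once the step reaches boundaries[i].

    `boundaries` must be sorted in ascending order. Each schedule receives a
    step count relative to the boundary at which it begins.
    """
    def schedule(step: int) -> float:
        index = sum(step >= b for b in boundaries)
        offset = boundaries[index - 1] if index > 0 else 0
        return schedules[index](step - offset)
    return schedule


# Hypothetical run: 100 warmup steps, 400 constant steps, 400 annealing steps.
lr = join(
    [linear(0.0, 1e-3, 100), constant(1e-3), linear(1e-3, 1e-4, 400)],
    boundaries=[100, 500],
)
```

Because each phase is an ordinary function of the step count, swapping the annealing shape or moving a boundary is a one-line change, which is the kind of flexibility the list above describes.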

The results showed performance improvements, particularly in code and math benchmarks, suggesting domain specialization among the experts.

Performance Evaluation

This met our objectives and further demonstrated the JAX stack's utility for advanced model development. We are now extending this work by experimenting with shared experts and replacing initial MoE layers with dense layers, modifications which are simple to implement within the MaxText framework.

Performance improvements and key takeaways

During our work, we gained early access to Trillium TPUs. We managed the transition from v5e by changing a few parameters in our XPK cluster and workload configurations. We observed an immediate and substantial throughput increase of 2.7x across our models, along with improved cost-performance efficiency.

Based on our experience, the JAX stack on TPUs provides a comprehensive and efficient environment for AI model development. The key advantages for our team include:

Performance and scalability: The JAX and XLA combination provides just-in-time compilation, and MaxText is optimized for large-scale parallel computing with support for paradigms like SPMD and FSDP.
Customizability and control: The codebase, being pure Python and built on libraries like Flax, Optax, and Orbax, is intuitive and easy to modify. This allows us to implement custom data pipelines, training strategies, and novel architectures with minimal overhead.
Rapid feature adoption: The MaxText framework is updated quickly with features from new state-of-the-art models, allowing us to stay current with our research.

These strengths have made the JAX stack a powerful and flexible foundation for our work in training large language models at Kakao.

Build your Language Models with the JAX Ecosystem:

Kakao's journey demonstrates how the JAX ecosystem’s modular design — including MaxText, Flax, Optax, and Orbax — enables the customization required for both production pipelines and advanced research, from tailored data blending to rapid experimentation with MoE architectures.

Our sincere thanks to Minho, Nayeon and their team for sharing their insightful engineering work. We look forward to seeing how they and other leading enterprises worldwide continue to use the JAX ecosystem to build the next generation of powerful and efficient language models.

How Yahoo Calendar broke free from hardware queues and DBA bottlenecks

Mon, 11 Aug 2025 16:00:00 +0000

Editor's note: Yahoo Mail is in the midst of one of its largest infrastructure transformations to date: a multi-year effort to modernize hundreds of petabytes of services by moving to Google Cloud. The Yahoo Mail migration, a high-scale, always-on workload, began with Yahoo Calendar, a product that is an essential part of the experience for hundreds of millions of Yahoo Mail users. It was a massive undertaking with no room for error, and the result was a smooth cutover with no customer impact that proved Cloud SQL could handle the complexity and pace of Yahoo’s operations. It also marked a shift in how Yahoo works by reducing manual overhead, unlocking developer agility, and laying the foundation for what comes next.

At Yahoo, we knew migrating our cornerstone platform, Yahoo Mail, to the cloud would be one of the most significant infrastructure efforts we’d ever taken on. With over 500 petabytes of interconnected systems, we knew we needed to start with a smaller, high-impact workload to build early confidence. That’s how Yahoo Calendar, a product that is an essential part of the experience for hundreds of millions of Yahoo Mail users, became the first production service to make the move.

We needed to migrate a high-scale, always-on service without disrupting the experience users rely on every day — or risk millions of people missing standups, birthday dinners, or that dentist appointment they actually remembered to schedule.

We chose Google Cloud to help us modernize our operations with managed infrastructure, reduce manual effort, and tap into a trusted ecosystem for large-scale transformation. Migrating Yahoo Calendar became our proving ground for running mission-critical services on Cloud SQL and would set the pace for the rest of our multi-year migration plan for Yahoo Mail.


Modernizing infrastructure without skipping a single invite

The infrastructure we were replacing included tens of on-premises MySQL (Percona) instances. It was solid but not built for operational speed. Scaling meant filing hardware requests and often waiting weeks or even months. Routine tasks like backups or upgrades had to go through a separate database administration (DBA) team. And as demand grew, the need for agility grew with it. To meet that growing need with more flexibility and speed, we took on a massive lift:

Migrating tens of database shards across multiple regions
Moving more than 20 TB of storage (excluding replicas)
Supporting peak traffic of 1 million QPS reads and 2,500 QPS writes
Replatforming our application stack to run on Google Kubernetes Engine (GKE) to support the Calendar experience for the hundreds of millions of Yahoo Mail users

Cloud SQL's support for our existing MySQL workloads with minimal changes let us replicate our on-prem shards without a full re-architecture. That compatibility provided the foundation to restructure our full stack. To make it all work, we migrated the UI, API, and backend to GKE and connected everything to Cloud SQL deployments in multiple Google Cloud regions. All of this had to be migrated incrementally, with no downtime for public users. Traffic continued flowing through existing endpoints, and our proxy layer routed requests based on each user’s location and migration state. As database shards became ready, we carefully flipped them into read-write mode on Cloud SQL to keep Calendar users running on schedule while shifting the backend in stages.
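
The routing decision described above can be pictured as a small lookup: given a shard's region and migration state, choose which backend serves it. The sketch below is only an illustration under stated assumptions, not Yahoo's proxy. The state names, shard IDs, region labels, and backend naming scheme are all hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class MigrationState(Enum):
    ON_PREM = "on_prem"          # shard has not started migrating
    REPLICATING = "replicating"  # Cloud SQL copy is catching up, not yet writable
    CLOUD_SQL = "cloud_sql"      # shard has been flipped to read-write on Cloud SQL


@dataclass(frozen=True)
class Shard:
    shard_id: str
    region: str
    state: MigrationState


def pick_backend(shard: Shard) -> str:
    """Return a (hypothetical) backend label for this shard's traffic."""
    if shard.state is MigrationState.CLOUD_SQL:
        return f"cloudsql-{shard.region}-{shard.shard_id}"
    # Until cutover, the on-prem MySQL shard stays the source of truth.
    return f"onprem-{shard.region}-{shard.shard_id}"
```

Keeping the decision keyed on per-shard state means a cutover is a state change for one shard at a time, which matches the staged, incremental approach described above.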

Fig. 1 - MySQL On-Prem to Cloud SQL Initial Load + CDC

A migration this big needed backup

Google Cloud’s Professional Services Organization (PSO) played a critical role in getting us there. From the earliest stages, they were embedded with our team. They helped us evaluate Cloud SQL and Database Migration Service (DMS), guide proof-of-concept work, and stress-test our migration architecture.

When we hit a roadblock replicating data with DMS, PSO worked closely with Cloud SQL engineering and our internal security and DBA teams to design a custom workaround. During cutover, they were right there with us to help with hiccups like debugging capacity constraints or troubleshooting connection spikes during shadow traffic. They also helped us resolve reverse replication failures caused by permission changes — an edge case we wouldn’t have anticipated without their guidance.

Fig. 2 - Yahoo Calendar migration diagram

Cloud SQL helped us block time for what matters

With managed infrastructure, we’ve significantly reduced manual operations, reduced database admin overhead, and gained the agility to scale up without the wait. Our application teams now deploy and manage database shards ourselves using infrastructure as code (IaC), without relying on manual processes. Backups, patching, and failovers are automated to reduce risk and manual effort. Usage and cost monitoring are built-in, helping us optimize across the board. And thanks to tight integration with our security protocols, we’re able to maintain high confidence in operating a large-scale public-facing service.

Today, Yahoo Calendar processes hundreds of thousands of queries per second, operates 26 Cloud SQL instances with disaster recovery (DR), and runs on infrastructure that includes 2,500 virtual CPUs and 17 TB of memory for databases alone. Our application tier spans 850 pods and 2,200 vCPUs, with 10 TB of memory to match. We now run at scale, with confidence — and without waiting on hardware or handoffs.

Fig. 3 - Architecture diagram of Yahoo Calendar’s services

Up next on our calendar

We’re seeing the benefits of infrastructure that works with us, not against us. And we’re doing it all without compromising on scale, performance, or security. Now that we've pressure-tested our migration strategy and refined how we operate in the cloud, we're ready to take on Yahoo Mail’s full environment — 500 petabytes and counting.

The next couple of years will be about scaling smart, staying nimble, and proving that modernization doesn’t have to mean disruption. But with the hardest part of any journey behind us (starting), and a calendar that runs on Cloud SQL, we’re in sync and right on schedule.

Learn more:

Discover how Cloud SQL can transform your business! Start a free trial today!
Download this IDC report to learn how migrating to Cloud SQL can lower costs, boost agility, and speed up deployments.
Learn how Ford and Lightricks gained high performance and cut costs by modernizing with Cloud SQL.
