<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:media="http://search.yahoo.com/mrss/"><channel><title>Storage &amp; Data Transfer</title><link>https://cloud.google.com/blog/products/storage-data-transfer/</link><description>Storage &amp; Data Transfer</description><atom:link href="https://cloudblog.withgoogle.com/blog/products/storage-data-transfer/rss/" rel="self"></atom:link><language>en</language><lastBuildDate>Wed, 22 Apr 2026 12:00:24 +0000</lastBuildDate><image><url>https://cloud.google.com/blog/products/storage-data-transfer/static/blog/images/google.a51985becaa6.png</url><title>Storage &amp; Data Transfer</title><link>https://cloud.google.com/blog/products/storage-data-transfer/</link></image><item><title>Cross-cloud infrastructure innovation for the agentic enterprise</title><link>https://cloud.google.com/blog/products/compute/cross-cloud-infrastructure-at-next26/</link><description>&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;The era of agentic AI is accelerating from human- to machine-speed operations, while also creating profound stress on legacy technology infrastructure. This new reality pushes foundational systems to their limits: agents generate thousands of internal messages and complex queries, spawning more agents, all of which can rapidly overwhelm traditional networks and databases, and expose new security vulnerabilities.&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt; &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Unlocking AI's full potential in the era of agents requires a secure, adaptive foundation. We call it &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;cross-cloud infrastructure for the agentic enterprise&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; – and at Google Cloud Next ‘26, we’re launching a powerful set of new innovations across four areas:&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;What’s new:&lt;/strong&gt;&lt;/p&gt;
&lt;ol&gt;
&lt;li role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Fluid compute: &lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;Google Compute Engine and Kubernetes services work together to enable cost-effective, high-speed AI agents and enterprise workloads with new compute and orchestration capabilities. &lt;/span&gt;&lt;/li&gt;
&lt;li role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Secure cross-cloud connectivity: &lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;Agent Gateway, Cloud Armor, and other tools deliver a secure, governed, and simplified networking foundation for AI agents, including observability of agentic traffic across clouds.&lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt; &lt;/strong&gt;&lt;/li&gt;
&lt;li role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Unified data layer: &lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;Smart Storage, Knowledge Catalog, and other innovations transform passive data archives into dynamic reasoning engines, giving AI agents the context they need to execute.&lt;/span&gt;&lt;/li&gt;
&lt;li role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Digital sovereignty: &lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;Confidential External Key Management and new features in Google Distributed Cloud bring Google’s leading models and AI enablers wherever your data lives.&lt;/span&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Let’s take a closer look at all the news for each of these four areas.&lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;span style="vertical-align: baseline;"&gt;Fluid compute&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Agentic workloads are dynamic and unpredictable, impacting both traditional enterprise applications and the AI agents themselves. Fluid compute is enabled by Google Compute Engine and Google Kubernetes services working together to dynamically adapt and shift weight in real-time, enabling cost-effective, high-speed AI agents and operational enterprise workloads for all customers. &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;While our &lt;/span&gt;&lt;a href="https://cloud.google.com/blog/products/compute/ai-infrastructure-at-next26"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;AI Hypercomputer delivers raw power for large-scale AI model training&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, fluid compute addresses the needs of operational workloads and agents. As agents move toward reasoning and reinforcement learning, CPUs are reclaiming a central role, excelling at the "branchy" logic, complex control flows, and secure execution sandboxes (like those for agentic orchestration, RL, SLM inference, and RAG) that agent workflows demand. CPUs also provide the critical isolation needed for secure agent execution, complementing the parallel processing strength of GPUs and TPUs used in training.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;We are introducing new CPU families, GKE capabilities, and Hyperdisk block storage capabilities to run traditional workloads and AI agents securely at scale, including:&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Google C4N Series&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;: These VMs help ensure your enterprise workloads don't slow down under the demands of agentic AI by processing up to 95 million packets per second, up to 40% faster than other leading hyperscalers.&lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt; &lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;This eliminates I/O bottlenecks for demanding workloads like security appliances, streaming media, and open source databases, even when utilizing smaller instance sizes.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Google M4N Series with Hyperdisk Extreme&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;: M4N removes data pipeline bottlenecks and eliminates overprovisioning to deliver industry leading per-core IOPS and throughput required to handle massive data I/O from agents, analytics, and mission-critical databases. M4N provides 26.57 GB of RAM per vCPU, allowing you to scale mission-critical workloads cost-effectively on fewer cores. For example, M4N with Hyperdisk Extreme reduces Oracle workload total cost of ownership by over 20% compared to leading hyperscale clouds.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;GKE Agent Sandbox: &lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;This solution secures agents with trusted gVisor isolation and handles demand spikes, launching up to 300 sandboxes per second, per cluster. Backed by the only managed sandbox technology available among leading hyperscale clouds, it achieves up to 30% better price-performance than competitors when running AI agents on GKE Agent Sandbox with Google Axion N4A. &lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p style="padding-left: 40px;"&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;“Wayfair's AI strategy is built on years of systematic infrastructure modernization on Google Cloud — migrating our core eCommerce engine and databases off legacy systems, decomposing monolithic services into cloud-native architecture, and unifying our data and analytics platform. That foundation is what makes everything else possible. Today, Gemini Enterprise Agent Platform is powering everything from catalog enrichment to generative shopping experiences that help customers create a home that's just right for them — and it's the same foundation preparing us for the agentic era, where AI doesn't just assist but actively drives discovery, personalization, and commerce across every customer touchpoint and across our business.”&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt; - Fiona Tan, Chief Technology Officer, Wayfair&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Explore all our latest compute innovations in &lt;/span&gt;&lt;a href="https://cloud.google.com/blog/products/compute/whats-new-in-compute-at-next26"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;this blog&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;. &lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;span style="vertical-align: baseline;"&gt;Secure cross-cloud connectivity &lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Agentic AI replaces predictable human requests with autonomous “reasoning loops,” in which agents call other agents that, in turn, call LLMs, triggering massive, sudden surges in compute and machine-to-machine traffic. This shift creates unique challenges for network predictability and security of non-human identities. Optimized for agentic AI, our Cross-Cloud Network moves data across diverse environments, connecting employees, customers, and agents with visibility and security. New in Cross-Cloud Network are:&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Agent Gateway:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Governs and orchestrates your enterprise agentic traffic as the “air traffic controller” in &lt;/span&gt;&lt;a href="https://cloud.google.com/blog/products/ai-machine-learning/introducing-gemini-enterprise-agent-platform"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Gemini Enterprise Agent Platform&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;. It natively understands agent protocols like MCP and A2A to inspect and govern every agent interaction. By integrating with Google and third-party identity and AI safety services, it enables deep inspection to verify access, block attacks, and protect sensitive data, maintaining compliance across your core business.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Cloud Network Insights&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;: Delivers broad visibility across your hybrid and multi-cloud infrastructure to drive faster troubleshooting and network resolutions. Continuously monitor your end-to-end agent, network and web performance across Google Cloud, AWS, Azure, data centers, internet applications, and agentic workloads. Using synthetic traffic analytics, Cloud Network Insights provides hop-by-hop network path visibility to help you pinpoint the source of degradations and is coupled with AI-powered insights from Gemini Cloud Assist to deliver more autonomous operations.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Enhanced Cloud Next Generation Firewall (NGFW) and Cloud Armor&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;: Provides machine-speed, AI-powered protection to combat the rapid explosion of AI-generated polymorphic malware and zero-day exploits. Cloud NGFW advanced malware sandbox delivers real-time inline prevention of AI-generated threats, while Cloud Armor managed rules provides automated protection against both known and unknown Common Vulnerabilities and Exposures (CVEs). Together with Model Armor, these services analyze the intent and content of AI agent communications.  &lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Discover more about how we &lt;/span&gt;&lt;a href="https://cloud.google.com/blog/products/networking/whats-new-in-cloud-networking-at-next26"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;optimized networking for AI&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; in and outside of the data center. &lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;span style="vertical-align: baseline;"&gt;Unified data layer&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;AI agents are only as powerful as the data they can access and the context they’re given. More applications and platforms are using structured and unstructured data, but it can be difficult to catalog, find, and act on that data at scale, leading to less effective agent interactions. To close the gap, your agents need all of your data brought together into a cohesive, queryable knowledge engine, or unified data layer. This way, your agents can identify and access accurate sources. At Next ‘26, we’re enhancing the unified data layer with:&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Smart Storage&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;: This solution transforms dark data into a powerful knowledge asset for AI agents and training by embedding new semantic intelligence directly into your data objects. With new Google Cloud Storage capabilities like automated annotation, entity extraction, and semantic search, your agents can instantly find and use the specific data they need — whether it's hidden in spreadsheets, PDFs, or other unstructured formats across your entire organization. This significantly speeds up the development and deployment of your AI solutions. Learn more about &lt;/span&gt;&lt;a href="https://cloud.google.com/blog/products/storage-data-transfer/next26-storage-announcements"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;storage innovations to accelerate your AI workloads&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;. &lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Knowledge Catalog&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;: Knowledge Catalog maps business meaning across your entire data estate, providing a grounded source of truth so agents can deliver the most accurate results. This foundation enables AI training and inferencing and doesn’t require you to migrate your data; your agents interact with it directly, wherever it lives, with full context and governance, making modernization easier.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Part of our &lt;/span&gt;&lt;a href="https://cloud.google.com/transform/shift-system-of-action-architecting-the-agentic-data-cloud-AI"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Agentic Data Cloud&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, Smart Storage and Knowledge Catalog can take your data from a passive archive into a dynamic reasoning engine.&lt;/span&gt;&lt;/p&gt;
&lt;p style="padding-left: 40px;"&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;“AI is critical to making our customers’ smart home and security solutions more intelligent and convenient. By leveraging Google Cloud’s Smart Storage, we auto-annotate rich metadata delivered in BigQuery. We’ve scaled and accelerated our data discovery and curation efforts, speeding up our AI development process from months to weeks, continuously delivering innovations that build trust and enhance the overall home experience.”&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt; - Brandon Bunker, VP of Product, AI, Vivint&lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;span style="vertical-align: baseline;"&gt;Digital sovereignty&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;In the agentic era, digital sovereignty is a fundamental requirement for public sector and enterprise customers looking to accelerate innovation — without sacrificing control. There’s no one-size-fits-all solution, which is why we’ve designed a comprehensive set of offerings to meet different sovereign AI needs anywhere: public cloud, on-premises, or hybrid. New capabilities in our sovereign AI portfolio include: &lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Confidential External Key Management:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Organizations can use Confidential External Key Management to maintain complete possession, custody, and control of your encryption keys and the policies that govern them. Confidential External Key Management leverages &lt;/span&gt;&lt;a href="https://cloud.google.com/security/products/confidential-computing"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Confidential Compute&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; to host the key management endpoint in a tamper-proof environment within Google Cloud. You are in control and determine where your keys are stored, who can access them, and under what circumstances. Even highly privileged Google administrators cannot access your keys without authorization, which you can revoke at any time. Your data, your control.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Gemini on Google Distributed Cloud: &lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;With Gemini on GDC, companies can securely deploy Gemini in sensitive environments, while meeting data sovereignty needs. Your choice of deployment models includes managed software on your connected hardware or a fully disconnected, air-gapped solution. You can now scale with Google's leading AI capabilities even in the most restricted, high-security environments — from powerful Gemini models to advanced coding, search, and other agentic capabilities.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;In addition, Google Distributed Cloud supports an end-to-end AI stack, combining our latest-generation AI infrastructure with Gemini models to accelerate and enhance all your sovereign AI workloads. This stack includes:&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;NVIDIA Blackwell GPUs:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; NVIDIA Blackwell (NVIDIA HGX B200) and NVIDIA Blackwell Ultra platforms (NVIDIA HGX B300) GPUs accelerate AI performance, leveraging fifth-gen NVIDIA NVLink to deliver data-center scale bandwidth directly to your environment.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;New VM families:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; New A4 family offerings provide the ability to handle the most demanding inference tasks, delivering a 2.25x increase in peak compute. Memory-Optimized M2 and M3&lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt; &lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;brings the&lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt; &lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;high memory-to-vCPU ratios needed for massive ERP and data analytics workloads on-premises.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Enhanced storage: &lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;Eliminate storage bottlenecks with 6x storage capacity per zone and a 10x performance boost, giving you the ability to do AI reasoning on-premises. Now, your data infrastructure moves at the speed of AI reasoning.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p style="padding-left: 40px;"&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;"&lt;/span&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;Our customers demand high-performance, private AI inference without the risks of multi-tenancy. Google Distributed Cloud allows us to provide dedicated, low-latency environments that meet strict sensitive data requirements. With the ability to run Gemini on B200s and B300s, we can significantly increase inference speeds and provide the token throughput our clients need to scale."&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt; &lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;- &lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;Dave Driggers, CEO &amp;amp; Co-founder, Cirrascale Cloud Services&lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;span style="vertical-align: baseline;"&gt;Transforming vision into reality &lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;When these product areas converge, your infrastructure evolves into a high-performing, secure, adaptive foundation for the agentic era. We're not just offering tools; we're providing the architectural blueprint to enable enterprises and the public sector to rapidly embrace the full power of AI and agents with confidence.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;To learn more about key industry trends for AI Infrastructure, read our &lt;/span&gt;&lt;a href="https://cloud.google.com/resources/content/state-of-infrastructure-in-the-agentic-ai-era?utm_source=cgc-blog&amp;amp;utm_medium=blog&amp;amp;utm_campaign=FY26-Q1-GLOBAL-STO121-website-dl-State-AI-Infra-172614&amp;amp;utm_content=state-of-infra-agentic-ai-era-report&amp;amp;utm_term=state-of-infra-agentic-ai-era-report"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;State of Infrastructure in the Agentic AI Era report&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;. &lt;/span&gt;&lt;/p&gt;&lt;/div&gt;</description><pubDate>Wed, 22 Apr 2026 12:00:00 +0000</pubDate><guid>https://cloud.google.com/blog/products/compute/cross-cloud-infrastructure-at-next26/</guid><category>Networking</category><category>Storage &amp; Data Transfer</category><category>Infrastructure</category><category>Google Cloud Next</category><category>Compute</category><media:content height="540" url="https://storage.googleapis.com/gweb-cloudblog-publish/images/GCN26_102_BlogHeader_2436x1200_Opt_4_Light.max-600x600.jpg" width="540"></media:content><og xmlns:og="http://ogp.me/ns#"><type>article</type><title>Cross-cloud infrastructure innovation for the agentic enterprise</title><description></description><image>https://storage.googleapis.com/gweb-cloudblog-publish/images/GCN26_102_BlogHeader_2436x1200_Opt_4_Light.max-600x600.jpg</image><site_name>Google</site_name><url>https://cloud.google.com/blog/products/compute/cross-cloud-infrastructure-at-next26/</url></og><author xmlns:author="http://www.w3.org/2005/Atom"><name>Nirav Mehta</name><title>VP, Product Management, Compute Platforms</title><department></department><company></company></author><author xmlns:author="http://www.w3.org/2005/Atom"><name>Muninder Sambi</name><title>VP, Google Distributed 
Cloud</title><department></department><company></company></author></item><item><title>Storage innovations to accelerate your AI workloads at Next ‘26</title><link>https://cloud.google.com/blog/products/storage-data-transfer/next26-storage-announcements/</link><description>&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;At Google Cloud Next, we are announcing innovations across every layer of our storage stacks — performance, intelligence, and management — to ensure your data is as fast and as useful as the AI models, apps and agents you are building.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Why it matters:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Storage is no longer just a place to keep data. When training AI models, storage is the engine that feeds data-hungry accelerators. During AI inference, it’s the access layer that makes it responsive, acting as the source for the context that AI agents need to be effective. When storage performance falls short, accelerators sit idle, agents respond slowly, and data remains invisible to AI models. &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;But storage performance is only half the battle; you also need storage that’s &lt;/span&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;smart&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;. With the help of Google’s AI models integrated directly into the storage layer, you’re no longer just storing bits, but data that has full context about its content. In this new era of smart storage, raw data becomes a valuable asset that’s ready to use by a variety of downstream AI and enterprise applications.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;What’s new:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;High-performance storage infrastructure: &lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;New Rapid family of features in Cloud Storage for high-performance object storage; delivering 10x performance enhancements plus a new cost-effective Dynamic tier for Google Cloud Managed Lustre.&lt;/span&gt;&lt;/li&gt;
&lt;li role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Smart Storage: &lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;Unlocking unstructured data with automated metadata annotation, and AI agent connectivity via MCP.&lt;/span&gt;&lt;/li&gt;
&lt;li role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Storage Intelligence: &lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;Streamlined data management through zero-configuration dashboards, aggregated activity views, and enhanced batch operations.&lt;/span&gt;&lt;/li&gt;
&lt;li role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Enhanced ecosystem:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Expanded capabilities across Google Cloud NetApp Volumes, Filestore for GKE, and our backup and data protection portfolio.&lt;/span&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Let’s take a deeper look at the storage enhancements we are unveiling this week.&lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;span style="vertical-align: baseline;"&gt;Storage infrastructure that keeps up with AI&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;As AI models scale, getting data from the storage to the compute layer fast enough can be a bottleneck. New storage capabilities move performance directly into the storage layer, reducing total cost of ownership (TCO) and keeping accelerators fully utilized.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Cloud Storage Rapid&lt;br/&gt;&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;Cloud-based object storage like our Cloud Storage is scalable and cost-effective, but bottlenecks can stall AI jobs and waste expensive compute cycles. Every time a training cluster waits on a read or a checkpoint write stalls, you're paying for accelerators that aren't doing useful work.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://docs.cloud.google.com/storage/docs/rapid/high-performance-storage"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Cloud Storage Rapid&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; marks a fundamental shift in designing AI infrastructure: you no longer have to choose between the reliability of object storage and the high performance of a specialized AI storage system. Cloud Storage Rapid lets you leverage the industry-leading durability, massive distributed scale, and cost-effective auto-tiering of object storage, while simultaneously achieving extreme throughput, frequent I/Os, and ultra-low latency. With native integrations into PyTorch and JAX, Cloud Storage Rapid is optimized out-of-box for the most popular AI/ML ecosystem frameworks, so that your data preparation, training, and inference workloads run on a high-performance and reliable foundation.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;The Cloud Storage Rapid family consists of two offerings: &lt;/span&gt;&lt;a href="https://docs.cloud.google.com/storage/docs/rapid/rapid-bucket"&gt;&lt;strong style="text-decoration: underline; vertical-align: baseline;"&gt;Rapid Bucket&lt;/strong&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; and &lt;/span&gt;&lt;a href="https://docs.cloud.google.com/storage/docs/anywhere-cache"&gt;&lt;strong style="text-decoration: underline; vertical-align: baseline;"&gt;Rapid Cache&lt;/strong&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;.&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://docs.cloud.google.com/storage/docs/rapid/rapid-bucket"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Rapid Bucket&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;,&lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt; &lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;now generally available, &lt;/span&gt;&lt;a href="https://cloud.google.com/blog/products/storage-data-transfer/how-the-colossus-stateful-protocol-benefits-rapid-storage"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;leverages Colossus&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, the Google distributed storage system that powers Gemini and YouTube, to deliver more than &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;15 TB/s of bandwidth, 20 million requests per second, and sub-millisecond latency&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; in a single zonal bucket. With access via high-performance gRPC and S3-compatible APIs, Rapid Bucket increases accelerator utilization for multi-modal training with &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;50% reduced GPU blocked time &lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;and 2.5x faster data loading. Checkpoint restores are 5x faster and checkpoint writes are 3.2x faster compared to traditional object storage, reducing workload interruptions and wasted GPU time.&lt;/span&gt;&lt;/li&gt;
&lt;/ul&gt;&lt;/div&gt;
&lt;div class="block-image_full_width"&gt;






  
    &lt;div class="article-module h-c-page"&gt;
      &lt;div class="h-c-grid"&gt;
  

    &lt;figure class="article-image--large
      
      
        h-c-grid__col
        h-c-grid__col--6 h-c-grid__col--offset-3
        
        
      "
      &gt;

      
      
        
        &lt;img
            src="https://storage.googleapis.com/gweb-cloudblog-publish/images/1_x1nf9ws.max-1000x1000.png"
        
          alt="1"&gt;
        
        &lt;/a&gt;
      
        &lt;figcaption class="article-image__caption "&gt;&lt;p data-block-key="opmmh"&gt;Checkpoint writes are 3.2x faster and restores are 5x faster with Rapid Bucket&lt;/p&gt;&lt;/figcaption&gt;
      
    &lt;/figure&gt;

  
      &lt;/div&gt;
    &lt;/div&gt;
  




&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;a href="https://docs.cloud.google.com/storage/docs/anywhere-cache"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Rapid Cache&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, formerly Anywhere Cache, accelerates bandwidth for bursty workloads like model loading for inference, delivering an aggregate read throughput of 2.5 TB/s for existing buckets, with no code changes. The new &lt;/span&gt;&lt;a href="https://docs.cloud.google.com/storage/docs/rapid/rapid-cache#ingest-on-write"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;ingest-on-write&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; feature provides up to &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;2.2x faster checkpoint restores&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;, allowing training clusters to recover faster from interruptions. &lt;br/&gt;&lt;br/&gt;&lt;/span&gt;&lt;span style="font-family: -apple-system, BlinkMacSystemFont, 'Segoe UI', Roboto, Oxygen, Ubuntu, Cantarell, 'Open Sans', 'Helvetica Neue', sans-serif;"&gt;Rapid Cache’s combination of simplicity and performance has resulted in strong adoption, including from cutting-edge AI/ML customers like Thinking Machines Lab.&lt;br/&gt;&lt;br/&gt;&lt;/span&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;“Rapid Cache has become a core foundation of our AI/ML data infrastructure, supporting our critical workflows, from data prep and pretraining to training and model loading. 
By acting as a crucial bandwidth shield and booster, it enables us to scale our data-intensive workloads across our entire fleet without compromise, providing us with the on-demand high bandwidth and consistent stability that we need to innovate at speed.” &lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;- &lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;James Sun, Member of Technical Staff, &lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;Thinking Machines Lab&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Google Cloud Managed Lustre&lt;br/&gt;&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;The Lustre parallel file system is the industry standard for organizations whose AI training and inference workloads require high throughput and sub-millisecond latency, and is trusted by AI labs and HPC centers worldwide to feed thousands of accelerators simultaneously and keep them saturated under pressure. &lt;/span&gt;&lt;a href="https://docs.cloud.google.com/managed-lustre/docs/overview"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Google Cloud Managed Lustre&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; brings that capability as a fully managed service, and with today's announcements, it is the most performant managed Lustre offering available in any cloud.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Managed Lustre now delivers up to &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;10 TB/s of throughput&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; — a 10x increase since last year and 4–20x higher than managed Lustre offerings from other hyperscalers for a single instance. Powered by C4NX VMs and Hyperdisk Exapools, Managed Lustre writes and restores checkpoints 2.6x faster when compared to other Google Cloud storage solutions.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;The new &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;Dynamic tier&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; ($0.06/GB-month) delivers the low-latency performance required for intense AI workloads like training and checkpointing. By serving data from persistent disk rather than relying on object-based caching, we eliminate a performance cliff — helping ensure your data remains responsive and your accelerators stay productive. A single SKU provides simple, predictable billing without the hidden complexity of traditional data tiering.&lt;/span&gt;&lt;/p&gt;
&lt;p style="padding-left: 40px;"&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;“By integrating Managed Lustre we eliminated the typical onboarding bottlenecks, allowing us to hit the ground running with the inferencing workload. This high-throughput, low-latency storage keeps our B200 GPUs fully saturated, driving a substantial performance gain in LLM inference over the H200. For our customers, this translates directly into faster, more responsive AI agents that can handle complex reasoning at a fraction of the previous latency.”&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt; - Lavnaya Karanam, Software Engineering PMTS, Salesforce&lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;span style="vertical-align: baseline;"&gt;Smart Storage: Context for the AI era&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;The beauty of an object storage system like Cloud Storage has long been its simplicity: the system knows the object’s name, its size, and when it was created. But if you want to understand the object’s content — what entities it references, whether it contains sensitive PII, or whether it’s relevant to a pending query — you need to use custom pipelines, separate databases, and bespoke enrichment systems.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;AI has changed the equation. To fine-tune a model, you need to select the right objects from the get-go, from a corpus of millions. Building an agent requires retrieving the right context for each decision. To meet a compliance obligation, you need to know what every file contains up front, before it becomes a liability. In each case, the bottleneck isn’t compute or model quality — it’s the inability to describe, find, and act on objects at scale. &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;To bridge that gap between stored and usable data, last year we introduced &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;Smart Storage,&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; a set of capabilities built directly into Cloud Storage that makes every object self-describing. New Smart Storage capabilities include:&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Automated annotations&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;, which eliminates the need to build and maintain custom annotation pipelines. With Smart Storage enabled, Cloud Storage can now automatically generate context — including image annotations — so your data is discoverable and usable from the moment it lands. You pay to annotate the data once at write time, and every downstream system can use those annotations immediately for the life of the object.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Cloud Storage MCP server &lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;lets you read, write, and analyze Cloud Storage data using the standard MCP protocol. &lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Smart Storage enables these capabilities, and others, thanks to its &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;object context,&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; now generally available. This metadata substrate adds structured, mutable, IAM-governed context to every object. Customers write their own tags and classifications; Google's annotation pipelines automatically attach labels, extracted entities, and compliance signals.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-image_full_width"&gt;






  
    &lt;div class="article-module h-c-page"&gt;
      &lt;div class="h-c-grid"&gt;
  

    &lt;figure class="article-image--large
      
      
        h-c-grid__col
        h-c-grid__col--6 h-c-grid__col--offset-3
        
        
      "
      &gt;

      
      
        
        &lt;img
            src="https://storage.googleapis.com/gweb-cloudblog-publish/images/2_EJZvHi4.max-1000x1000.png"
        
          alt="2"&gt;
        
        &lt;/a&gt;
      
    &lt;/figure&gt;

  
      &lt;/div&gt;
    &lt;/div&gt;
  




&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;With Smart Storage, ML teams can select training datasets from semantic criteria without building retrieval pipelines. AI agents can ground their reasoning in enterprise data without a separate retrieval layer. &lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;span style="vertical-align: baseline;"&gt;Storage Intelligence: Data management at AI scale&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;As data estates grow to hundreds of petabytes, storage costs can spike without warning, and security blind spots can multiply across billions of objects. To manage this, teams have to stitch together multiple tools just to answer basic questions about their own data.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Last year we launched &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;Storage Intelligence&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; to give enterprises a unified management experience built directly into Cloud Storage. Today, 70% of our largest customers use Storage Intelligence, each of whom manage over 50 billion objects.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Storage Intelligence provides a single view across your entire project or organization, with unique capabilities like bucket relocations across regions. Today, we're making it significantly more powerful with:&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;New &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;zero-configuration dashboards&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; instantly surface cost anomalies and integrate Security Command Center’s Data Security Posture Management (DSPM) data governance feature, to detect critical security vulnerabilities across Cloud Storage — no setup required. &lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;New &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;object events and bucket activity&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; tables in Insights Datasets now drive deeper cost analysis and accelerate operational tasks. You can use these insights to perform a wide range of analyses, from optimizing bucket placement based on egress patterns to quickly troubleshooting 429 errors by finding the impacted objects.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Enhanced batch operations&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; make it even simpler to act on billions of objects with new change ACL and storage class operations, and support for multi-bucket operations.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;&lt;span style="vertical-align: baseline;"&gt;Enhancing the storage ecosystem&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Beyond our core storage offerings, we are streamlining how enterprises migrate to and protect data in the cloud.&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Google Cloud NetApp Volumes:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; With the launch of Flex Unified, NetApp Volumes now provides a unified enterprise storage platform that bridges the data center and the cloud, provisioning both block (iSCSI, NVMe/TCP) and file (NFS/SMB) on the same storage pool. New &lt;/span&gt;&lt;a href="https://docs.cloud.google.com/netapp/volumes/docs/ontap/overview"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;ONTAP-mode&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; lets you bring your existing automation (Terraform, Ansible) and ONTAP APIs directly to NetApp Volumes.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Filestore for GKE:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Developers building AI workloads on Google Kubernetes Engine (GKE) can start small, with shares as small as 100 GiB, and scale capacity and IOPS independently. At the same time, tighter integration to the &lt;/span&gt;&lt;a href="https://cloud.google.com/blog/products/storage-data-transfer/a-peek-behind-colossus-googles-file-system"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Colossus&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; distributed file system provides more scale and enterprise capabilities.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Data protection:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Google Cloud Backup and DR now features agentic AI capabilities that can autonomously audit your backup estate and remediate coverage gaps, with new GA integrations for AlloyDB and Filestore.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;&lt;span style="vertical-align: baseline;"&gt;Where to start&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;As you navigate today’s generational AI shift, you need a storage foundation to support ever-larger, more intelligent, and autonomous models. With new high-performance and intelligent storage layers, plus enhanced storage management tools and a deeper data protection bench, Google Cloud’s storage platforms lets you understand and use your enterprise data in ways that weren’t previously possible, allowing you to:&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Reduce the AI data bottleneck: &lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;Saturate compute and accelerate ROI. Keep your expensive GPUs and TPUs fully productive with high-throughput storage that delivers the extreme performance required for large-scale training and inference.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Build agent-ready data foundations:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Shift from building custom pipelines to an active knowledge base where self-describing objects let AI agents instantly reason over data without manual prep.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Minimize blind spots across exabytes:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Replace fragmented management tools with zero-configuration dashboards and datasets to instantly surface cost anomalies and security risks across billions of objects.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Embrace the storage ecosystem:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Streamline migration and protection. Bridge your data center to the cloud, scale containerized apps, and automate data resilience with agentic AI.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Visit the &lt;/span&gt;&lt;a href="https://console.cloud.google.com/storage"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Google Cloud Storage console&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; to explore these new features, read more about &lt;/span&gt;&lt;a href="https://docs.cloud.google.com/storage/docs/rapid/high-performance-storage"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Cloud Storage Rapid&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, or explore our &lt;/span&gt;&lt;a href="https://www.googlecloudevents.com/next-vegas/session-library?session_id=3913124&amp;amp;name=google-cloud-storage-products-the-ai-ready-foundation-for-your-data" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Next '26 storage sessions&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;</description><pubDate>Wed, 22 Apr 2026 12:00:00 +0000</pubDate><guid>https://cloud.google.com/blog/products/storage-data-transfer/next26-storage-announcements/</guid><category>Google Cloud Next</category><category>Storage &amp; Data Transfer</category><media:content height="540" url="https://storage.googleapis.com/gweb-cloudblog-publish/images/GCN26_102_BlogHeader_2436x1200_Opt_10_Dark.max-600x600.jpg" width="540"></media:content><og xmlns:og="http://ogp.me/ns#"><type>article</type><title>Storage innovations to accelerate your AI workloads at Next ‘26</title><description></description><image>https://storage.googleapis.com/gweb-cloudblog-publish/images/GCN26_102_BlogHeader_2436x1200_Opt_10_Dark.max-600x600.jpg</image><site_name>Google</site_name><url>https://cloud.google.com/blog/products/storage-data-transfer/next26-storage-announcements/</url></og><author xmlns:author="http://www.w3.org/2005/Atom"><name>Sameet 
Agarwal</name><title>VP/GM, Storage, Google Cloud</title><department></department><company></company></author><author xmlns:author="http://www.w3.org/2005/Atom"><name>Asad Khan</name><title>Sr. Director of Product Management, Google Cloud</title><department></department><company></company></author></item><item><title>New GKE Cloud Storage FUSE Profiles take the guesswork out of configuring AI storage</title><link>https://cloud.google.com/blog/products/containers-kubernetes/optimize-aiml-workloads-with-gke-cloud-storage-fuse-profiles/</link><description>&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;In the world of AI/ML, data is the fuel that drives training and inference workloads. For Google Kubernetes Engine (GKE) users, Cloud Storage FUSE provides high-performance, scalable access to data stored in Google Cloud Storage. However, we learned from customers that getting the maximum performance out of Cloud Storage FUSE can be complex.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Today, we are excited to introduce GKE Cloud Storage FUSE Profiles, a new feature designed to automate performance tuning and accelerate data access for your AI/ML workloads (training, checkpointing, or inference) with minimal operational overhead. With these profiles, tuned for your specific workload needs, you can enjoy high performance of Cloud Storage FUSE out of the box.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Before &lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;(manual tuning)&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-code"&gt;&lt;dl&gt;
    &lt;dt&gt;code_block&lt;/dt&gt;
    &lt;dd&gt;&amp;lt;ListValue: [StructValue([(&amp;#x27;code&amp;#x27;, &amp;#x27;apiVersion: v1\r\nkind: PersistentVolume\r\nmetadata:\r\n  name: serving-bucket-pv\r\nspec:\r\n  accessModes:\r\n  - ReadWriteMany\r\n  capacity:\r\n    storage: 64Gi\r\n  persistentVolumeReclaimPolicy: Retain\r\n  storageClassName: &amp;quot;&amp;quot;\r\n  claimRef:\r\n    name: serving-bucket-pvc\r\n  mountOptions:\r\n    - implicit-dirs\r\n    - metadata-cache:ttl-secs:-1\r\n    - metadata-cache:stat-cache-max-size-mb:-1\r\n    - metadata-cache:type-cache-max-size-mb:-1\r\n    - file-cache:max-size-mb:-1\r\n    - file-cache:cache-file-for-range-read:true\r\n    - file-system:kernel-list-cache-ttl-secs:-1\r\n    - file-cache:enable-parallel-downloads:true\r\n    - read_ahead_kb=1024\r\n  csi:\r\n    driver: gcsfuse.csi.storage.gke.io\r\n    volumeHandle: BUCKET_NAME\r\n    volumeAttributes:\r\n      skipCSIBucketAccessCheck: &amp;quot;true&amp;quot;\r\n      gcsfuseMetadataPrefetchOnMount: &amp;quot;true&amp;quot;\r\n---\r\napiVersion: v1\r\nkind: PersistentVolumeClaim\r\nmetadata:\r\n  name: serving-bucket-pvc\r\nspec:\r\n  accessModes:\r\n  - ReadWriteMany\r\n  resources:\r\n    requests:\r\n      storage: 64Gi\r\n  volumeName: serving-bucket-pv\r\n  storageClassName: &amp;quot;&amp;quot;\r\n–--\r\napiVersion: v1\r\nkind: Pod\r\nmetadata:\r\n  name: gcs-fuse-csi-example-pod\r\n  annotations:\r\n    gke-gcsfuse/volumes: &amp;quot;true&amp;quot;\r\nspec:\r\n  containers:\r\n    # Your workload container spec\r\n    ...\r\n    volumeMounts:\r\n    - name: serving-bucket-vol\r\n      mountPath: /serving-data\r\n      readOnly: true\r\n  serviceAccountName: KSA_NAME \r\n  volumes:\r\n    - name: gke-gcsfuse-cache # gcsfuse file cache backed by RAM Disk\r\n      emptyDir:\r\n        medium: Memory \r\n  - name: serving-bucket-vol\r\n    persistentVolumeClaim:\r\n      claimName: serving-bucket-pvc&amp;#x27;), (&amp;#x27;language&amp;#x27;, &amp;#x27;&amp;#x27;), 
(&amp;#x27;caption&amp;#x27;, &amp;lt;wagtail.rich_text.RichText object at 0x7f3f8eb14a60&amp;gt;)])]&amp;gt;&lt;/dd&gt;
&lt;/dl&gt;&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;After &lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;(Cloud Storage FUSE mount options, CSI configs, and file cache medium automatically configured!)&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-code"&gt;&lt;dl&gt;
    &lt;dt&gt;code_block&lt;/dt&gt;
    &lt;dd&gt;&amp;lt;ListValue: [StructValue([(&amp;#x27;code&amp;#x27;, &amp;#x27;apiVersion: v1\r\nkind: PersistentVolume\r\nmetadata:\r\n  name: serving-bucket-pv\r\nspec:\r\n  accessModes:\r\n  - ReadWriteMany\r\n  capacity:\r\n    storage: 64Gi\r\n  persistentVolumeReclaimPolicy: Retain\r\n  storageClassName: gcsfusecsi-serving\r\n  claimRef:\r\n    name: serving-bucket-pvc\r\n  csi:\r\n    driver: gcsfuse.csi.storage.gke.io\r\n    volumeHandle: BUCKET_NAME\r\n---\r\napiVersion: v1\r\nkind: PersistentVolumeClaim\r\nmetadata:\r\n  name: serving-bucket-pvc\r\nspec:\r\n  accessModes:\r\n  - ReadWriteMany\r\n  resources:\r\n    requests:\r\n      storage: 64Gi\r\n  volumeName: serving-bucket-pv\r\n  storageClassName: gcsfusecsi-serving\r\n–--\r\napiVersion: v1\r\nkind: Pod\r\nmetadata:\r\n  name: gcs-fuse-csi-example-pod\r\n  annotations:\r\n    gke-gcsfuse/volumes: &amp;quot;true&amp;quot;\r\nspec:\r\n  containers:\r\n    # Your workload container spec\r\n    ...\r\n    volumeMounts:\r\n    - name: serving-bucket-vol\r\n      mountPath: /serving-data\r\n      readOnly: true\r\n  serviceAccountName: KSA_NAME \r\n  volumes: \r\n  - name: serving-bucket-vol\r\n    persistentVolumeClaim:\r\n      claimName: serving-bucket-pvc&amp;#x27;), (&amp;#x27;language&amp;#x27;, &amp;#x27;&amp;#x27;), (&amp;#x27;caption&amp;#x27;, &amp;lt;wagtail.rich_text.RichText object at 0x7f3f8eb14340&amp;gt;)])]&amp;gt;&lt;/dd&gt;
&lt;/dl&gt;&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;The trouble with optimizing Cloud Storage FUSE&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Optimizing Cloud Storage FUSE for high-performance workloads is a multi-dimensional problem. Historically, users had to navigate &lt;/span&gt;&lt;a href="https://docs.cloud.google.com/storage/docs/cloud-storage-fuse/performance"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;manual configuration guides&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; that could span dozens of pages. And as AI/ML has evolved, Cloud Storage FUSE’s capabilities have also increased, with new mount options available to accelerate your workloads. The "right" settings were never static; they depended heavily on a variety of dynamic factors:&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Bucket characteristics&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;: The total size of your dataset and the number of objects significantly impact metadata and file cache requirements.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Infrastructure variability:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Optimal configurations change based on whether you are using GPUs, TPUs, or general-purpose compute.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Node resources: &lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;Available RAM and Local SSD capacity determine how much data can be cached locally to minimize expensive round-trips to Cloud Storage.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Workload patterns: &lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;A training workload (high-throughput reads of large datasets) requires different tuning than a checkpointing workload (bursty, high-throughput writes) or a serving workload (latency-sensitive model loading).&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;In fact, many customers leave available performance on the table or face reliability issues (e.g., Pod Out-of-Memory kills) due to unoptimized or misconfigured Cloud Storage FUSE settings.&lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Introducing Cloud Storage FUSE Profiles for GKE&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;GKE Cloud Storage FUSE Profiles simplify this complexity with pre-defined, dynamically managed StorageClasses tailored for specific AI/ML patterns. Instead of manually adjusting dozens of mount options, you simply select a profile that matches your workload type.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;These profiles operate on a layered model. They take the base best practices from Cloud Storage FUSE and add a GKE-specific intelligence layer. When you deploy a Pod using a profile, GKE automatically:&lt;/span&gt;&lt;/p&gt;
&lt;ol&gt;
&lt;li aria-level="1" style="list-style-type: decimal; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;Scans your bucket (or a specific directory) to understand its size and object count.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: decimal; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;Analyzes the target node to check for available RAM, Local SSD, and accelerator types.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: decimal; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;Calculates optimal cache sizes and selects the best backing medium (RAM or Local SSD) automatically.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
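&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;To make step 3 concrete, here is a deliberately simplified sketch of a resource-aware cache decision. This is not GKE's actual algorithm; the function name, thresholds, and sizing rule are invented. It only illustrates how dataset size and node resources could drive the medium and size choice.&lt;/span&gt;&lt;/p&gt;

```python
# Hypothetical illustration of a resource-aware cache decision,
# loosely mirroring the three steps above. NOT GKE's real logic:
# the name, thresholds, and sizing rule are invented for clarity.

def pick_cache(dataset_gib: float, node_ram_gib: float,
               local_ssd_gib: float) -> tuple[str, float]:
    """Return (backing medium, cache size in GiB) for a file cache."""
    # Prefer RAM if the whole dataset fits in a safe slice of node
    # memory; otherwise fall back to Local SSD when it is present.
    ram_budget = node_ram_gib * 0.5        # leave headroom for Pods
    if dataset_gib <= ram_budget:
        return ("Memory", dataset_gib)
    if local_ssd_gib > 0:
        return ("LocalSSD", min(dataset_gib, local_ssd_gib))
    return ("Memory", ram_budget)          # partial cache as last resort

pick_cache(40, 128, 0)     # small dataset: RAM-backed, full size
pick_cache(500, 128, 375)  # large dataset: Local SSD-backed, capped
```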
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;We are launching with three primary profiles:&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li role="presentation"&gt;&lt;code style="vertical-align: baseline;"&gt;gcsfusecsi-training&lt;/code&gt;&lt;span style="vertical-align: baseline;"&gt;: Optimized for high-throughput reads to keep GPUs and TPUs fed with data.&lt;/span&gt;&lt;/li&gt;
&lt;li role="presentation"&gt;&lt;code style="vertical-align: baseline;"&gt;gcsfusecsi-serving&lt;/code&gt;&lt;span style="vertical-align: baseline;"&gt;: Optimized for model loading and inference, with automated &lt;/span&gt;&lt;a href="https://cloud.google.com/storage/docs/anywhere-cache"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Rapid Cache&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; integration.&lt;/span&gt;&lt;/li&gt;
&lt;li role="presentation"&gt;&lt;code style="vertical-align: baseline;"&gt;gcsfusecsi-checkpointing&lt;/code&gt;&lt;span style="vertical-align: baseline;"&gt;: Optimized for fast, reliable writes of large multi-gigabyte checkpoint files.&lt;/span&gt;&lt;/li&gt;
&lt;/ul&gt;
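&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Switching profiles is presumably just a StorageClass change. As a sketch modeled on the serving example earlier in this post, a checkpointing volume might look like this (BUCKET_NAME and the resource names are placeholders):&lt;/span&gt;&lt;/p&gt;

```yaml
# Sketch: same PV/PVC pattern as the serving example above, but
# using the checkpointing profile. BUCKET_NAME is a placeholder.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: ckpt-bucket-pv
spec:
  accessModes:
  - ReadWriteMany
  capacity:
    storage: 64Gi
  storageClassName: gcsfusecsi-checkpointing
  claimRef:
    name: ckpt-bucket-pvc
  csi:
    driver: gcsfuse.csi.storage.gke.io
    volumeHandle: BUCKET_NAME
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: ckpt-bucket-pvc
spec:
  accessModes:
  - ReadWriteMany
  resources:
    requests:
      storage: 64Gi
  volumeName: ckpt-bucket-pv
  storageClassName: gcsfusecsi-checkpointing
```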
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Using GKE Cloud Storage FUSE Profiles delivers several benefits:&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Simplified tuning:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Replace complex, error-prone manual configurations with three simple, purpose-built StorageClasses.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Dynamic, resource-aware optimization:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; The CSI driver automatically adjusts cache sizes based on real-time environment signals, so that you can maximize performance without risking node stability.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Accelerated read performance:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; The serving profile automatically triggers Rapid Cache, placing your data closer to your compute for faster cold-start model loading.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;&lt;strong style="vertical-align: baseline;"&gt;Granular performance insights:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Gain visibility into automated tuning decisions through structured logs that detail exactly why specific cache sizes and mediums were selected for your Pod.&lt;/span&gt;&lt;/li&gt;
&lt;/ul&gt;&lt;/div&gt;
&lt;div class="block-image_full_width"&gt;






  
    &lt;div class="article-module h-c-page"&gt;
      &lt;div class="h-c-grid"&gt;
  

    &lt;figure class="article-image--large
      
      
        h-c-grid__col
        h-c-grid__col--6 h-c-grid__col--offset-3
        
        
      "
      &gt;

      
      
        
        &lt;img
            src="https://storage.googleapis.com/gweb-cloudblog-publish/images/image1_4Ng3Hpa.max-1000x1000.png"
        
          alt="image1"&gt;
        
        &lt;/a&gt;
      
    &lt;/figure&gt;

  
      &lt;/div&gt;
    &lt;/div&gt;
  




&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Using GKE Cloud Storage FUSE Profiles inference profile, we were able to reduce model loading time for a Qwen3-235B-A22B workload on TPUs (480GB) from 39 hours to just 14 minutes, helping customers achieve the maximum benefit of Cloud Storage FUSE GCSFuse out-of-the-box.&lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;How to use Cloud Storage FUSE Profiles on GKE&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;To get started, ensure your cluster is running GKE version 1.35.1-gke.1616000 or later with the Cloud Storage FUSE CSI driver enabled.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;1. Identify the StorageClass&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;GKE comes pre-installed with the profile-based StorageClasses. You can verify them with:&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-code"&gt;&lt;dl&gt;
    &lt;dt&gt;code_block&lt;/dt&gt;
    &lt;dd&gt;&amp;lt;ListValue: [StructValue([(&amp;#x27;code&amp;#x27;, &amp;#x27;kubectl get sc -l gke-gcsfuse/profile=true&amp;#x27;), (&amp;#x27;language&amp;#x27;, &amp;#x27;&amp;#x27;), (&amp;#x27;caption&amp;#x27;, &amp;lt;wagtail.rich_text.RichText object at 0x7f3f8eb14e20&amp;gt;)])]&amp;gt;&lt;/dd&gt;
&lt;/dl&gt;&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;2. Create your PV and PVC&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;When creating your PersistentVolume, point it to your Cloud Storage bucket. GKE automatically initiates a bucket scan to determine the optimal configuration.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-code"&gt;&lt;dl&gt;
    &lt;dt&gt;code_block&lt;/dt&gt;
    &lt;dd&gt;&amp;lt;ListValue: [StructValue([(&amp;#x27;code&amp;#x27;, &amp;#x27;apiVersion: v1\r\nkind: PersistentVolume\r\nmetadata:\r\n  name: gcs-pv\r\nspec:\r\n  accessModes:\r\n    - ReadWriteMany\r\n  capacity:\r\n    storage: 5Gi\r\n  persistentVolumeReclaimPolicy: Retain  \r\n  storageClassName: gcsfusecsi-training\r\n  mountOptions:\r\n    - only-dir=my-ml-dataset-subdirectory # Optional\r\n  csi:\r\n    driver: gcsfuse.csi.storage.gke.io\r\n    volumeHandle: my-ml-dataset-bucket\r\n---\r\napiVersion: v1\r\nkind: PersistentVolumeClaim\r\nmetadata:\r\n  name: gcs-pvc\r\nspec:\r\n  accessModes:\r\n    - ReadWriteMany\r\n  resources:\r\n    requests:\r\n      storage: 5Gi\r\n  storageClassName: gcsfusecsi-training\r\n  volumeName: gcs-pv&amp;#x27;), (&amp;#x27;language&amp;#x27;, &amp;#x27;&amp;#x27;), (&amp;#x27;caption&amp;#x27;, &amp;lt;wagtail.rich_text.RichText object at 0x7f3f8eb14610&amp;gt;)])]&amp;gt;&lt;/dd&gt;
&lt;/dl&gt;&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;3. Create your Deployment&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Once your Persistent Volume Claim (PVC) is bound, simply consume it in your Deployment as you would any other volume. GKE mounts the volume with the precise settings your hardware and dataset require.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-code"&gt;&lt;dl&gt;
    &lt;dt&gt;code_block&lt;/dt&gt;
    &lt;dd&gt;&amp;lt;ListValue: [StructValue([(&amp;#x27;code&amp;#x27;, &amp;#x27;apiVersion: apps/v1\r\nkind: Deployment\r\nmetadata:\r\n  name: my-deployment\r\nspec:\r\n  replicas: 3\r\n  selector:\r\n    matchLabels:\r\n      app: my-app\r\n  template:\r\n    metadata:\r\n      labels:\r\n        app: my-app\r\n      annotations:\r\n        gke-gcsfuse/volumes: &amp;quot;true&amp;quot;\r\n    spec:\r\n      serviceAccountName: my-ksa\r\n      containers:\r\n      - name: my-container\r\n        image: busybox\r\n        volumeMounts:\r\n        - name: my-gcs-volume\r\n          mountPath: &amp;quot;/data&amp;quot;\r\n      volumes:\r\n      - name: my-gcs-volume\r\n        persistentVolumeClaim:\r\n          claimName: gcs-pvc&amp;#x27;), (&amp;#x27;language&amp;#x27;, &amp;#x27;&amp;#x27;), (&amp;#x27;caption&amp;#x27;, &amp;lt;wagtail.rich_text.RichText object at 0x7f3f8eb14b50&amp;gt;)])]&amp;gt;&lt;/dd&gt;
&lt;/dl&gt;&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;After it's deployed, the CSI driver automatically calculates optimal cache sizes and mount options based on your node's resources, such as GPUs or TPUs, memory, Local SSD, the bucket or sub-directory size, and the sidecar resource limits.&lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;span style="vertical-align: baseline;"&gt;Get started today&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;GKE Cloud Storage FUSE Profiles remove the guesswork from configuring your cloud storage for high performance. By moving from manual "knob-turning" to automated, workload-aware profiles, you can spend less time debugging storage throughput and more time building the next generation of AI.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Ready to get started? GKE Cloud Storage FUSE Profiles are generally available in version 1.35.1-gke.1616000. Explore the &lt;/span&gt;&lt;a href="https://docs.cloud.google.com/kubernetes-engine/docs/how-to/persistent-volumes/gcsfuse-profiles"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;official documentation&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; to configure Cloud Storage FUSE profiles in GKE for your AI/ML workloads!&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;</description><pubDate>Wed, 08 Apr 2026 16:30:00 +0000</pubDate><guid>https://cloud.google.com/blog/products/containers-kubernetes/optimize-aiml-workloads-with-gke-cloud-storage-fuse-profiles/</guid><category>AI &amp; Machine Learning</category><category>GKE</category><category>Storage &amp; Data Transfer</category><category>Containers &amp; Kubernetes</category><og xmlns:og="http://ogp.me/ns#"><type>article</type><title>New GKE Cloud Storage FUSE Profiles take the guesswork out of configuring AI storage</title><description></description><site_name>Google</site_name><url>https://cloud.google.com/blog/products/containers-kubernetes/optimize-aiml-workloads-with-gke-cloud-storage-fuse-profiles/</url></og><author xmlns:author="http://www.w3.org/2005/Atom"><name>Nishtha Jain</name><title>Engineering Manager</title><department></department><company></company></author><author xmlns:author="http://www.w3.org/2005/Atom"><name>Uriel Guzmán-Mendoza</name><title>Software Engineer</title><department></department><company></company></author></item><item><title>Accelerate model downloads on GKE with NVIDIA Run:ai Model Streamer</title><link>https://cloud.google.com/blog/products/containers-kubernetes/nvidia-runai-model-streamer-supports-cloud-storage/</link><description>&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;As large language models (LLMs) continue to grow 
in size and complexity, the time it takes to load them from storage to accelerator memory for inference can become a significant bottleneck. This "cold start" problem isn't just a minor delay — it's a critical barrier to building resilient, scalable, and cost-effective AI services. Every minute spent loading a model is a minute a GPU is sitting idle, a minute your service is delayed from scaling to meet demand, and a minute a user request is waiting.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Google Cloud and NVIDIA are committed to removing these barriers. We’re excited to highlight a powerful, open-source collaboration that helps AI developers do just that: the NVIDIA Run:ai Model Streamer now comes with native &lt;/span&gt;&lt;a href="https://cloud.google.com/storage/docs/introduction"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Google Cloud Storage&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; support, supercharging vLLM inference workloads on Google Kubernetes Engine (GKE). Accessing data for AI/ML from Cloud Storage on GKE has never been faster!&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-image_full_width"&gt;






  
    &lt;div class="article-module h-c-page"&gt;
      &lt;div class="h-c-grid"&gt;
  

    &lt;figure class="article-image--large
      
      
        h-c-grid__col
        h-c-grid__col--6 h-c-grid__col--offset-3
        
        
      "
      &gt;

      
      
        
        &lt;img
            src="https://storage.googleapis.com/gweb-cloudblog-publish/images/image1_uEwzVCo.max-1000x1000.png"
        
          alt="image1"&gt;
        
        &lt;/a&gt;
      
    &lt;/figure&gt;

  
      &lt;/div&gt;
    &lt;/div&gt;
  




&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;The chart above shows how quickly the model streamer can fetch a 141GB Llama 3.3-7 70B model from Cloud Storage as compared to the default vLLM model loader (lower is better). &lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Boost resilience and scalability with fewer cold starts&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;For an inference server running on Kubernetes, a "cold start" involves several steps: pulling the container image, starting the process, and — most time-consuming of all — loading the model weights into GPU memory. For large models, this loading phase can take many minutes, with painful consequences such as slow auto-scaling and idling GPUs as they wait for the workload to start up. &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;By streaming the model into GPU memory, the model streamer slashes potentially the most time-consuming part of the startup process. Instead of waiting for an entire model to be downloaded before loading, the streamer fetches model tensors directly from object storage and streams them concurrently to GPU memory. This dramatically reduces model loading times from minutes to seconds.  &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;For workloads that rely on model parallelism— where a single model is partitioned and executed across multiple GPUs— the model streamer goes a step further. Its distributed streaming capability is optimized to take full advantage of &lt;/span&gt;&lt;a href="https://www.nvidia.com/en-us/data-center/nvlink/" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;NVIDIA NVLink&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, using high-bandwidth GPU-to-GPU communication to coordinate loading across multiple processes. Reading the weights from storage is divided efficiently and evenly across all participating processes, with each one fetching a portion of the model weights from storage and then sharing its segment with the others over NVLink. This allows even multi-GPU deployments to benefit from faster startups and fewer cold-start bottlenecks.&lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Performance and simplicity&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;The latest updates to the Model Streamer introduce first-class support for Cloud Storage, creating an integrated and high-performance experience for Google Cloud users. This integration is designed to be simple, fast, and secure, especially for workloads running on GKE.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;For users of popular inference servers like &lt;/span&gt;&lt;a href="https://docs.vllm.ai/en/stable/models/extensions/runai_model_streamer.html" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;vLLM&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, enabling the streamer is as simple as adding a single flag to your vLLM command line:&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;code style="vertical-align: baseline;"&gt;--load-format=runai_streamer&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Here’s how easy it is to launch a model stored in a Cloud Storage bucket with vLLM:&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-code"&gt;&lt;dl&gt;
    &lt;dt&gt;code_block&lt;/dt&gt;
    &lt;dd&gt;&amp;lt;ListValue: [StructValue([(&amp;#x27;code&amp;#x27;, &amp;#x27;vllm serve gs://your-gcs-bucket/path/to/your/model \r\n--load-format=runai_streamer&amp;#x27;), (&amp;#x27;language&amp;#x27;, &amp;#x27;&amp;#x27;), (&amp;#x27;caption&amp;#x27;, &amp;lt;wagtail.rich_text.RichText object at 0x7f3f8eb61970&amp;gt;)])]&amp;gt;&lt;/dd&gt;
&lt;/dl&gt;&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;The NVIDIA Run:ai Model Streamer is a key component for Vertex AI Model Garden's large model deployments. With container image streaming and model weight streaming, we have been able to significantly improve the first deployment and autoscaling experience for our users, and the efficiency of NVIDIA GPUs.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;When running on GKE, the Model Streamer can automatically use the cluster's &lt;/span&gt;&lt;a href="https://cloud.google.com/kubernetes-engine/docs/how-to/workload-identity"&gt;&lt;strong style="text-decoration: underline; vertical-align: baseline;"&gt;Workload Identity&lt;/strong&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;. This means you no longer need to manually manage and mount service account keys, simplifying your deployment manifests and enhancing your security posture. The following deployment manifest shows how to launch a container serving Llama3 70B on GKE. We have added the model loader &lt;/span&gt;&lt;a href="https://docs.vllm.ai/en/stable/models/extensions/runai_model_streamer/#tunable-parameters" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;distributed&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; option to accelerate loads when model parallelism &amp;gt; 1:&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-code"&gt;&lt;dl&gt;
    &lt;dt&gt;code_block&lt;/dt&gt;
    &lt;dd&gt;&amp;lt;ListValue: [StructValue([(&amp;#x27;code&amp;#x27;, &amp;#x27;apiVersion: apps/v1\r\nkind: Deployment\r\n…\r\n   spec:\r\n     serviceAccountName: gcs-access\r\n     containers:\r\n       - args:\r\n           - --model=gs://your-gcs-bucket/path/to/your/model \r\n           - --load-format=runai_streamer\r\n \t\t- --model-loader-extra-config={&amp;quot;distributed&amp;quot;:true}\r\n\t\t…\r\n         command:\r\n           - python3\r\n           - -m\r\n           - vllm.entrypoints.openai.api_server\r\n         image: vllm/vllm-openai:latest\r\n         ….&amp;#x27;), (&amp;#x27;language&amp;#x27;, &amp;#x27;&amp;#x27;), (&amp;#x27;caption&amp;#x27;, &amp;lt;wagtail.rich_text.RichText object at 0x7f3f8eb61dc0&amp;gt;)])]&amp;gt;&lt;/dd&gt;
&lt;/dl&gt;&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;That’s it! The streamer handles the rest, auto-tuning streaming concurrency to match your VM’s performance. For more details, see the documentation on &lt;/span&gt;&lt;a href="https://cloud.google.com/kubernetes-engine/docs/how-to/persistent-volumes/run-ai-model-streamer"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;optimizing vLLM model loading on GKE&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;.&lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Combining NVIDIA Run:ai Model Streamer with Cloud Storage Anywhere Cache&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;a href="https://cloud.google.com/storage/docs/anywhere-cache"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Anywhere Cache&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; provides zonally co-located SSD-backed caching for data stored in a regional or multi-regional Cloud Storage bucket. Reducing latency by up to 70% and providing up to 2.5 TB/s of read throughput, Anywhere Cache is a great solution for scale-out inference workloads where the same model is downloaded multiple times across a series of nodes. Together, Anywhere Cache server-side acceleration, along with the NVIDIA Run:ai Model Streamer’s client-side acceleration, create an easy-to-manage, extremely performant model-loading system.  &lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Get started today&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;The NVIDIA Run:ai Model Streamer is evolving into a critical piece of the AI infrastructure puzzle, enabling teams to build faster, more resilient, and more flexible MLOps pipelines on GKE. &lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;To learn more about how to use the model streamer on GKE see our &lt;/span&gt;&lt;a href="https://cloud.google.com/kubernetes-engine/docs/how-to/persistent-volumes/run-ai-model-streamer"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;GKE NVIDIA Run:ai Guide&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;For detailed instructions on using the streamer with vLLM, see the&lt;/span&gt;&lt;a href="https://docs.vllm.ai/en/stable/models/extensions/runai_model_streamer.html" rel="noopener" target="_blank"&gt;&lt;span style="vertical-align: baseline;"&gt; &lt;/span&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;official vLLM documentation&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;&lt;span style="vertical-align: baseline;"&gt;To learn more and contribute to the model streamer’s ongoing development, check out the &lt;/span&gt;&lt;a href="https://github.com/run-ai/runai-model-streamer" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;NVIDIA Run:ai Model Streamer project on GitHub&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;.&lt;/span&gt;&lt;/li&gt;
&lt;/ul&gt;&lt;/div&gt;</description><pubDate>Thu, 04 Dec 2025 17:00:00 +0000</pubDate><guid>https://cloud.google.com/blog/products/containers-kubernetes/nvidia-runai-model-streamer-supports-cloud-storage/</guid><category>AI &amp; Machine Learning</category><category>GKE</category><category>Storage &amp; Data Transfer</category><category>Containers &amp; Kubernetes</category><og xmlns:og="http://ogp.me/ns#"><type>article</type><title>Accelerate model downloads on GKE with NVIDIA Run:ai Model Streamer</title><description></description><site_name>Google</site_name><url>https://cloud.google.com/blog/products/containers-kubernetes/nvidia-runai-model-streamer-supports-cloud-storage/</url></og><author xmlns:author="http://www.w3.org/2005/Atom"><name>Peter Schuurman</name><title>Software Engineer, Google</title><department></department><company></company></author><author xmlns:author="http://www.w3.org/2005/Atom"><name>Brian Kaufman</name><title>Senior Product Manager, Google</title><department></department><company></company></author></item><item><title>Reducing TCO for AI inferencing with external KV Cache on Managed Lustre</title><link>https://cloud.google.com/blog/products/storage-data-transfer/choosing-google-cloud-managed-lustre-for-your-external-kv-cache/</link><description>&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;The demand for AI inference infrastructure is accelerating, with market spend expected to soon surpass investment in training the models themselves. This growth is driven by the demand for richer experiences, particularly through support for larger context windows and the rise of agentic AI. As organizations aim to improve user experience while optimizing costs, efficient management of inference resources is paramount.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;According to an internal experimental study of large model inferencing, external key-value caches — KV Cache or, “attention caches” — on high-performance storage like &lt;/span&gt;&lt;a href="https://cloud.google.com/products/managed-lustre?e=48754805&amp;amp;hl=en"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Google Cloud Managed Lustre&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, can reduce total cost of ownership (TCO) by up to 35%, allowing organizations to serve the same workload with ~40% fewer GPUs by offloading prefill compute to I/O&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;. In this blog, we explore the core challenges of managing long-context AI inference and detail how Google Cloud Managed Lustre provides the high-performance external storage solution required to achieve these significant cost and efficiency benefits.&lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;About KV Cache&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;During the inference phase, a KV Cache&lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt; &lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;is a critical optimization technique for the efficient operation of Transformer-based large language models (LLMs).&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;The key innovation of the Transformer was the complete elimination of sequential processing (recurrence), which was achieved by introducing the self-attention mechanism to allow every element in a sequence to instantaneously and dynamically compare itself to and assess the relevance of every other element (a global, all-at-once evaluation). Within this self-attention mechanism, the model computes Key (K) and Value (V) vectors of all preceding tokens in the sequence. To generate the next token during the inference phase, the model needs the K and V vectors of &lt;/span&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;all&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt; the previous tokens.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;This is where the&lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt; &lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;KV Cache&lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt; &lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;comes into play. The KV Cache stores these K and V vectors after the initial context processing (known as the "prefill" stage), thereby avoiding the redundant, costly re-computation of the context sequence when generating subsequent tokens. By eliminating this re-computation, the KV Cache vastly speeds up the overall inference process. While smaller caches can fit in high-bandwidth memory (HBM) or host DRAM — up to a few TBs of memory may be available in a single multi-accelerator server — managing a KV Cache for contexts across multiple concurrent users that exceed the memory capacity often requires external or hierarchical storage solutions.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;These large contexts can make the "prefill" computation — the calculation that an AI model performs when processing a large context window — very expensive:&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;For a large context of 100K or more tokens, the prefill computation may cause the time to first token (TTFT) to increase to tens of seconds.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;Prefill computation requires a high number of floating-point operations (FLOPs). KV Cache reuse saves these costs and makes additional resources available on the accelerator.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
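To see why long contexts push the cache out of accelerator memory, consider the usual sizing formula: 2 (K and V) × layers × KV heads × head dimension × bytes per element × tokens. The model shape below is a hypothetical 70B-class configuration chosen for illustration, not a specific model:

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, dtype_bytes, n_tokens):
    """K and V vectors stored for every layer, KV head, and token."""
    return 2 * n_layers * n_kv_heads * head_dim * dtype_bytes * n_tokens

# Hypothetical 70B-class shape (assumption): 80 layers, 8 KV heads,
# head dim 128, bf16 (2 bytes).
per_token = kv_cache_bytes(n_layers=80, n_kv_heads=8, head_dim=128,
                           dtype_bytes=2, n_tokens=1)
ctx_100k = kv_cache_bytes(80, 8, 128, 2, 100_000)

print(f"per token: {per_token / 1024:.0f} KiB")           # per token: 320 KiB
print(f"100K-token context: {ctx_100k / 2**30:.1f} GiB")  # 30.5 GiB
```

At these sizes, a handful of concurrent 100K-token sessions already exceeds a single accelerator's HBM, which is what motivates tiered and external KV Cache storage.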
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;The growth of agentic AI is likely to make the challenge of managing a long context even greater. Unlike a simple chatbot, agentic AI is built for action. It moves beyond conversation to solve problems proactively, completing tasks on your behalf. To do this, it actively gathers context from a wide range of digital sources. Agentic AI may, for example: check live flight data, pull a customer's history from a database, research topics on the web, and/or keep organized notes in its own files. Agentic AI thereby builds a rich understanding of its environment, but often increases context lengths and their associated KV Cache size.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;The key to managing performance costs at scale is to ensure that the accelerator is utilized as fully as possible. High-performance, scale-out storage provides the required greater throughput per accelerator and therefore translates into lighter resource requirements.&lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;External KV Cache on Google Cloud Managed Lustre&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;We believe that &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;Google Cloud Managed Lustre&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; should be your primary storage solution for external KV Cache. On GPUs, Lustre is assisted by locally attached SSDs. And on TPUs, where local SSDs are not available, Lustre’s role is even more central.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;A recent LMCache blog post by Google’s Danna Wang, “&lt;/span&gt;&lt;a href="https://blog.lmcache.ai/2025-10-07-LMCache-on-GKE/" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;LMCache on Google Kubernetes Engine: Boosting LLM Inference Performance with KV Cache on Tiered Storage&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;,” demonstrates the foundational value of host-level offloading. Our Managed Lustre strategy is the next evolution of this host-offloading concept. While Local SSDs and CPU RAM are effective node-local tiers, they are fixed in size and cannot be shared. Managed Lustre provides a parallel file system to act as the massive, high-throughput external storage, making it a great solution for large-scale, multi-node, and multi-tenant AI inference workloads where the cache exceeds the capacity of the host machine.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Here’s an example of how the performance gains of Managed Lustre can reduce your TCO:&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;In an experiment with a 50K token context and a high cache hit rate (about 75%), using&lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt; &lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;Managed Lustre improved total inference throughput by 75% and reduced MTTF by more than 40% compared to using KV Cache in host memory alone (further detail below).&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt; &lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;TCO analysis yielded a 35% savings from using an external attention/KV Cache for a workload processing 1 million Tokens per Second (TPS) and leveraging A3-Ultra VMs and Managed Lustre, when compared to a workload leveraging no external storage.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;Our experiment demonstrated that with configuration tuning and an improvement in KV Cache software to adopt more I/O parallelism, Managed Lustre can substantially improve inference performance.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Total Cost of Ownership: Analysis&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;When evaluating a KV Cache solution, it's critical to consider the TCO, which includes not just compute and storage costs but also operational expenses and potential savings. Our analysis shows that a high performance storage-backed KV Cache, like one built on Managed Lustre, provides a compelling TCO advantage compared to purely memory-based solutions.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Cost savings&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;After taking incremental storage costs into account, we project that the TCO for a file-system-backed KV Cache solution, processing 1m TPS, is 35% lower compared to a memory-only solution. This makes it a more scalable and economically viable option for large-scale AI inference deployments.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;The primary TCO benefit comes from a more efficient utilization of expensive compute resources. By offloading KV Cache to a high-performance storage solution, you can achieve a higher inference throughput per accelerator. This means that fewer accelerators are needed for the same workload: You can handle a specific number of queries per second with ~40% fewer accelerators, resulting in direct cost savings.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;TCO model assumptions&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;The TCO calculation includes several key components:&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Storage costs (list price):&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; These are the costs of Managed Lustre. Testing used the 1000 MB/s per TiB performance tier. The TCO model includes sufficient Lustre capacity (73 A3-Ultra machines, with 18 TiB of Lustre capacity per machine) to hit the 1 million TPS target rate.&lt;/span&gt;&lt;/li&gt;
&lt;li role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Compute costs (list price):&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; A3-Ultra VMs, each with 8 H200 GPUs and 141 GB of HBM per GPU (spot prices would be lower).&lt;/span&gt;&lt;/li&gt;
&lt;/ul&gt;
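As a rough illustration of the model above, the sketch below recomputes the headline numbers. The 73-machine Lustre configuration and ~40% accelerator reduction come from the assumptions above; the dollar figures and the 122-VM baseline are hypothetical placeholders, not Google Cloud list prices.

```python
# Hedged sketch of the TCO comparison described above.
# All dollar figures are illustrative placeholders, not Google Cloud list prices.

def monthly_tco(num_vms, vm_price, lustre_tib=0.0, lustre_price_per_tib=0.0):
    """Monthly compute + storage cost."""
    return num_vms * vm_price + lustre_tib * lustre_price_per_tib

VM_PRICE = 60_000   # hypothetical A3-Ultra VM $/month
LUSTRE_PRICE = 300  # hypothetical Managed Lustre $/TiB/month

# Memory-only baseline: ~122 VMs to hit 1 million TPS, so that the Lustre
# configuration below uses ~40% fewer accelerators, per the analysis above.
baseline = monthly_tco(num_vms=122, vm_price=VM_PRICE)

# Lustre-backed KV Cache: 73 VMs with 18 TiB of Lustre capacity each.
with_lustre = monthly_tco(num_vms=73, vm_price=VM_PRICE,
                          lustre_tib=73 * 18, lustre_price_per_tib=LUSTRE_PRICE)

savings = 1 - with_lustre / baseline
print(f"TCO savings: {savings:.0%}")  # ~35% with these placeholder prices
```

The storage line item is small relative to the compute saved, which is why the incremental Lustre cost still nets out to a large TCO reduction.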
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Performance benchmarks&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Our experiments demonstrated Google Cloud Managed Lustre’s ability to deliver the high-performance I/O needed to serve a state-of-the-art LLM. The experiments served DeepSeek-R1 on a Google Cloud A3-Ultra machine (8 H200 GPUs, each with 141 GB of HBM), running a synthetic serving workload with a 50K-token context, a high cache-hit rate (about 75%), and a total KV Cache size of about 3.4 TiB. The memory-only baseline used 1 TiB of host memory for KV Cache. We tested two variants of Managed Lustre, at high and low I/O parallelism; for high I/O parallelism, we used 32 I/O worker threads to read KV Cache data from Lustre in parallel.&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;Lustre improved total inference throughput by 75% and reduced mean time to first token by more than 40% compared to using KV Cache in host memory alone.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;&lt;/div&gt;
&lt;div class="block-image_full_width"&gt;






  
    &lt;div class="article-module h-c-page"&gt;
      &lt;div class="h-c-grid"&gt;
  

    &lt;figure class="article-image--large
      
      
        h-c-grid__col
        h-c-grid__col--6 h-c-grid__col--offset-3
        
        
      "
      &gt;

      
      
        
        &lt;img
            src="https://storage.googleapis.com/gweb-cloudblog-publish/images/1_WcX8I6F.max-1000x1000.png"
        
          alt="1"&gt;
        
      
    &lt;/figure&gt;

  
      &lt;/div&gt;
    &lt;/div&gt;
  




&lt;/div&gt;
&lt;div class="block-image_full_width"&gt;






  
    &lt;div class="article-module h-c-page"&gt;
      &lt;div class="h-c-grid"&gt;
  

    &lt;figure class="article-image--large
      
      
        h-c-grid__col
        h-c-grid__col--6 h-c-grid__col--offset-3
        
        
      "
      &gt;

      
      
        
        &lt;img
            src="https://storage.googleapis.com/gweb-cloudblog-publish/images/2_vxy9BwW.max-1000x1000.png"
        
          alt="2"&gt;
        
      
    &lt;/figure&gt;

  
      &lt;/div&gt;
    &lt;/div&gt;
  




&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Ready to optimize your inference workloads?&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;To get started with an external KV Cache solution that solves the capacity limits of long context windows and delivers significant performance gains on your large-scale LLMs, follow these steps:&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;1. Provision your infrastructure and create a Managed Lustre instance:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;Provision your Lustre file system in the same region and zone as your target accelerators (GPUs or TPUs) for optimal low-latency access.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;Deploy your inference engine: serve your LLM with a high-performance inference server such as vLLM, or a similar framework that supports an external KV Cache or paged-attention architecture.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;2. Configure for performance&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Once you’ve mounted Managed Lustre, you must configure your inference engine software to leverage the high-performance storage:&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Implement direct I/O:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Configure your application to access Managed Lustre using the &lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;O_DIRECT&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt; flag. This bypasses the general-purpose file system cache, allowing the inference engine to manage critical host memory more effectively.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Tune I/O parallelism:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Depending on your inference KV Cache software, its out-of-the-box storage I/O parallelism may not be ideal. You may need to tune the KV Cache software to read KV chunk files with enhanced parallelism to maximize performance.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
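The two tuning steps above can be sketched together in a few lines, assuming KV Cache chunks are stored as files on the mounted Lustre path. The paths, the 1 MiB read size, and the 32-thread default are illustrative; a real inference engine would wire this into its KV-connector layer rather than call it directly.

```python
import os
import mmap
from concurrent.futures import ThreadPoolExecutor

CHUNK = 1 << 20  # 1 MiB reads; O_DIRECT requires block-aligned sizes/buffers


def read_chunk(path):
    """Read one KV chunk file, preferring O_DIRECT to bypass the page cache."""
    flags = os.O_RDONLY | getattr(os, "O_DIRECT", 0)
    try:
        fd = os.open(path, flags)
    except OSError:
        # Fall back to buffered I/O if the filesystem rejects O_DIRECT.
        fd = os.open(path, os.O_RDONLY)
    try:
        buf = mmap.mmap(-1, CHUNK)  # anonymous mmap gives a page-aligned buffer
        n = os.readv(fd, [buf])
        return bytes(buf[:n])
    finally:
        os.close(fd)


def read_chunks_parallel(paths, workers=32):
    """Fetch KV chunk files with 32-way I/O parallelism, mirroring the tuning above."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(read_chunk, paths))
```

The thread pool is what turns the out-of-the-box serial chunk reads into the enhanced parallelism described above; the right worker count depends on chunk size and the Lustre tier's per-client throughput, so treat 32 as a starting point to benchmark, not a constant.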
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;To take the next step, read the documentation about how to &lt;/span&gt;&lt;a href="https://cloud.google.com/managed-lustre/docs/overview"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;get started with Managed Lustre&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;</description><pubDate>Fri, 31 Oct 2025 16:00:00 +0000</pubDate><guid>https://cloud.google.com/blog/products/storage-data-transfer/choosing-google-cloud-managed-lustre-for-your-external-kv-cache/</guid><category>AI &amp; Machine Learning</category><category>Storage &amp; Data Transfer</category><og xmlns:og="http://ogp.me/ns#"><type>article</type><title>Reducing TCO for AI inferencing with external KV Cache on Managed Lustre</title><description></description><site_name>Google</site_name><url>https://cloud.google.com/blog/products/storage-data-transfer/choosing-google-cloud-managed-lustre-for-your-external-kv-cache/</url></og><author xmlns:author="http://www.w3.org/2005/Atom"><name>Kai Shen</name><title>Distinguished Software Engineer</title><department></department><company></company></author><author xmlns:author="http://www.w3.org/2005/Atom"><name>Barak Epstein</name><title>Sr Product Manager</title><department></department><company></company></author></item><item><title>From dark data to bright insights: The dawn of smart storage</title><link>https://cloud.google.com/blog/products/storage-data-transfer/make-your-unstructured-data-smart-with-cloud-storage/</link><description>&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Organizations interested in AI today have access to amazing computational power with Tensor Processing Units (TPUs) and Graphical Processing Units (GPUs), while foundational models like Gemini are redefining what's possible. 
Yet for many enterprises a critical obstacle to AI is the data itself, specifically unstructured data. According to Enterprise Strategy Group, for most organizations, 61% of their total data is unstructured, the vast majority of which sits unanalyzed and unlabeled in archives, so-called "dark data." But with the help of AI, this untapped resource is an opportunity to unlock a veritable treasure trove of insights. &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;At the same time, when it comes to unstructured data, traditional tools only scratch the surface,&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt; and subject matter experts must build massive, manual preprocessing pipelines and define the data’s semantic meaning. &lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;This makes real analysis at scale impractical, leaving companies able to use only a fraction of what they store.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Now imagine a world where your unstructured data isn't just stored, but &lt;/span&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;understood&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;. A world where you can ask complex questions of data such as images, videos, and documents, and get interesting answers in return. This isn't just a futuristic vision — the era of smart storage is upon us. Today we are announcing new auto annotate and object contexts features that use AI to generate metadata and insights on your data, so you can then use your dark data for discovery, curation, and governance at scale. Better yet, the new features relieve you from having to build and manage your own object-analysis data pipelines&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;.&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt; &lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;span style="vertical-align: baseline;"&gt;Leveraging AI to transform dark data&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Now, as unstructured data lands in Google Cloud, it's no longer treated as a passive object. Instead, a data pipeline leverages AI to automatically process and understand the data, surfacing key insights and connections. Two new features are integral to this vision: &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;auto annotate&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;, which enriches your data by automatically generating metadata using Google’s pretrained AI models,&lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt; &lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;and&lt;/span&gt; &lt;a href="https://cloud.google.com/storage/docs/object-contexts"&gt;&lt;strong style="text-decoration: underline; vertical-align: baseline;"&gt;object contexts&lt;/strong&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, which lets you attach custom, actionable tags to your data. Together, these two features can help transform passive data into active assets, unlocking use cases such as rapid data discovery for AI model training, streamlined data curation to reduce model bias, enhanced data governance to protect sensitive information, and the ability to build powerful, stateful workflows directly on your storage.&lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;span style="vertical-align: baseline;"&gt;Making your data smart&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Auto annotate,&lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt; &lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;currently in a limited experimental release, automatically generates rich metadata (“annotations”) about objects stored in Cloud Storage buckets by applying Google's advanced AI models, starting with image objects. Getting started is simple: enable auto annotate for your selected buckets or an entire project, pick one or more available models, and your entire image library will be annotated. Furthermore, new images are automatically annotated as they are uploaded. An annotation’s lifecycle is always tied to its object’s, simplifying management and helping to ensure consistency. &lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;Importantly, auto annotate operates under your control, only accessing object content to which you have explicitly granted permissions. Then, you can query the annotations, which are available as object contexts, through Cloud Storage API calls and &lt;/span&gt;&lt;a href="https://cloud.google.com/storage/docs/insights/datasets"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Storage Insights datasets&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;.&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt; The initial release uses pretrained models for generating annotations: object detection with confidence scores, image labeling, and objectionable content detection.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-image_full_width"&gt;






  
    &lt;div class="article-module h-c-page"&gt;
      &lt;div class="h-c-grid"&gt;
  

    &lt;figure class="article-image--large
      
      
        h-c-grid__col
        h-c-grid__col--6 h-c-grid__col--offset-3
        
        
      "
      &gt;

      
      
        
        &lt;img
            src="https://storage.googleapis.com/gweb-cloudblog-publish/images/1_-_Auto_annotations_car.max-1000x1000.png"
        
          alt="1 - Auto annotations car"&gt;
        
      
&lt;figcaption class="article-image__caption "&gt;&lt;p data-block-key="bzpog"&gt;A sample of generated annotations for an object&lt;/p&gt;&lt;/figcaption&gt;
      
    &lt;/figure&gt;

  
      &lt;/div&gt;
    &lt;/div&gt;
  




&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Then, with object contexts&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;, you can attach custom key-value pair metadata&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt; directly to objects in Cloud Storage, including information generated by the new auto annotate feature. Currently in preview, object contexts are natively integrated with Cloud Storage APIs for listing and batch operations, as well as Storage Insights datasets for analysis in BigQuery. Each context includes object creation and modification timestamps, providing valuable lineage information. You &lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;can use Identity and Access Management (IAM) permissions to control who can add, change, or remove object contexts. &lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;W&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;hen migrating data from Amazon S3 using Cloud Storage APIs, existing S3 Object Tags are automatically converted into contexts&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;. &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;In short, object contexts provide a flexible and native way to add context to enrich your data. Combined with a smart storage feature like auto annotations, object contexts convert data into information, letting you build sophisticated data management workflows directly within Cloud Storage.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Now, let’s take a deeper look at some of the new use cases these smart storage features deliver.&lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;span style="vertical-align: baseline;"&gt;1. Data discovery &lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;One of the most significant challenges in building new AI applications is data discovery — how to find the most relevant data across an enterprise's vast and often siloed data stores. Locating specific images or information within petabytes of unstructured data can feel impossible. &lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;Auto annotate automatically generates rich, descriptive annotations for your data in Cloud Storage.&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt; Annotations, including labels and detected objects, are available within object contexts and fully indexed in BigQuery. After generating embeddings for them, you can then use BigQuery to run a semantic search for these annotations, effectively solving the "needle in a haystack" problem. &lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;For example, a large retailer with millions of product images can use auto annotate and BigQuery to quickly find 'red dresses' or 'leather sofas', accelerating catalog management and marketing efforts.&lt;/span&gt;&lt;/p&gt;
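The retrieval step described above can be sketched as a plain cosine-similarity search. The vectors and object names here are toy placeholders; in practice the embeddings would be generated by an embedding model over the auto-generated annotations, and the search would run in BigQuery rather than in Python.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Toy annotation embeddings keyed by object name; real ones would be derived
# from the labels that auto annotate writes into object contexts.
annotations = {
    "catalog/img_001.jpg": [0.9, 0.1, 0.0],  # e.g. "red dress"
    "catalog/img_002.jpg": [0.1, 0.9, 0.2],  # e.g. "leather sofa"
    "catalog/img_003.jpg": [0.8, 0.2, 0.1],  # e.g. "crimson gown"
}

def search(query_vec, k=2):
    """Return the k objects whose annotation embeddings best match the query."""
    ranked = sorted(annotations,
                    key=lambda o: cosine(query_vec, annotations[o]),
                    reverse=True)
    return ranked[:k]

print(search([1.0, 0.0, 0.0]))  # objects most similar to a "red dress" query
```

Because the match is semantic rather than lexical, the "crimson gown" image ranks close behind the "red dress" one even though the label text differs.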
&lt;h3 role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;2. Data curation for AI&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Building effective AI models requires carefully curated datasets. Sifting through data to ensure it is widely representative (e.g., "does this dataset have cars in multiple colors?") to reduce model bias, or to select specific training examples (e.g., “Find images with red cars”), is both time-consuming and error-prone. Auto annotate can identify attributes like colors and object types to automate the selection of balanced datasets. &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;For instance, an autonomous vehicle company training models to recognize traffic signs could mine petabytes of on-road camera data, using auto annotate to identify and extract images that contain the word ‘Stop’ or 'Pedestrian Crossing'.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-image_full_width"&gt;






  
    &lt;div class="article-module h-c-page"&gt;
      &lt;div class="h-c-grid"&gt;
  

    &lt;figure class="article-image--large
      
      
        h-c-grid__col
        h-c-grid__col--6 h-c-grid__col--offset-3
        
        
      "
      &gt;

      
      
        
        &lt;img
            src="https://storage.googleapis.com/gweb-cloudblog-publish/images/docs.google.com_presentation_d_1WSXexxs8Eb.max-1000x1000.png"
        
          alt="2 - Contexts car"&gt;
        
      
    &lt;/figure&gt;

  
      &lt;/div&gt;
    &lt;/div&gt;
  




&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Vivint, a smart home and security company, has been using auto annotate to find and understand their data.&lt;/span&gt;&lt;/p&gt;
&lt;p style="padding-left: 40px;"&gt;&lt;span style="vertical-align: baseline;"&gt;“&lt;/span&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;Our customers trust us to help make their homes and lives safer, smarter, and more convenient, and AI is at the heart of our product and customer experience innovations. Cloud Storage auto annotate’s rich metadata delivered in BigQuery helps us scale our data discovery and curation efforts, speeding up our AI development process from 6 months to as little as 1 month by finding the needle-in-a-haystack data essential to improve our models.” &lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;- Brandon Bunker, VP of Product, AI, Vivint&lt;/span&gt;&lt;/p&gt;
&lt;h3 role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;3. Governing unstructured data at scale&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Unstructured data is constantly growing, and manually managing and governing that data to identify sensitive information, detect policy violations, or categorize it for lifecycle management is a challenge. Auto annotate and object contexts help solve these data governance and compliance challenges. &lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;For example, a retail customer can use auto annotate to identify and flag images containing visible customer personally identifiable information (PII) such as shipping labels or order forms.&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt; &lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;This information, stored in object context, can then trigger automated governance actions such as moving flagged objects to a restricted bucket or initiating a review process.&lt;/span&gt;&lt;/p&gt;
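A minimal, pure-Python simulation of that governance loop, with dicts standing in for objects and their contexts (the label names and the `pii_flag` key are hypothetical; a real pipeline would read and write contexts through the Cloud Storage API under IAM control):

```python
# Hedged simulation of the governance workflow described above: annotations
# stored as object contexts drive an automated routing decision.

objects = {
    "orders/box_01.jpg": {"labels": ["package", "shipping label"]},
    "orders/box_02.jpg": {"labels": ["package"]},
}

# Hypothetical label set treated as visible customer PII.
SENSITIVE = {"shipping label", "order form"}

def apply_governance(objs):
    """Flag objects whose annotations hit a sensitive label and collect them."""
    restricted = []
    for name, ctx in objs.items():
        if SENSITIVE & set(ctx["labels"]):
            ctx["pii_flag"] = "review"  # would be written back as a context
            restricted.append(name)     # candidate for a restricted bucket
    return restricted

print(apply_governance(objects))
```

The flag written into the context is what later automation keys on, e.g. a job that moves flagged objects to a restricted bucket or opens a review ticket.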
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;BigID, a partner building solutions on Cloud Storage, reports that using object contexts is helping them manage their customers’ risk&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;:&lt;/span&gt;&lt;/p&gt;
&lt;p style="padding-left: 40px;"&gt;&lt;span style="vertical-align: baseline;"&gt;“&lt;/span&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;Object contexts gives us a way to take the outputs of BigID's industry-leading data classification solutions and apply labels to Cloud Storage objects. Object contexts will allow BigID labels to shed light onto data in Cloud Storage: identifying objects which contain sensitive information and helping them understand and manage their risk across AI, security, and privacy." &lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;- Marc Hebrard, Principal Technical Architect, BigID&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt; &lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;The future is bright for your data&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;At Google Cloud, we’re committed to building a future where your data is not just a passive asset but an active catalyst for innovation. &lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;Don't keep your valuable data in the dark. Bring your data to Cloud Storage and enable auto annotation and object contexts to unlock its full potential with Gemini, Vertex AI, and BigQuery.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;You can start using object contexts today, and &lt;/span&gt;&lt;a href="mailto:storage-ai@google.com"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;reach out to us&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; for an early look at auto annotate. Once you have access, s&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;imply enable auto annotate for selected buckets or on an entire project, pick one or more available models, and your entire image library will be annotated. Y&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;ou can then query the annotations that are available as object contexts through Cloud Storage API calls and Storage Insights datasets.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;To learn more, read about our end-to-end vision in a showcase paper with Enterprise Strategy Group: &lt;/span&gt;&lt;a href="https://services.google.com/fh/files/misc/google_cloud_smart_storage_esg.pdf" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Illuminating Dark Data With Smart Storage from Google Cloud&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;. &lt;/span&gt;&lt;/p&gt;&lt;/div&gt;</description><pubDate>Tue, 14 Oct 2025 16:00:00 +0000</pubDate><guid>https://cloud.google.com/blog/products/storage-data-transfer/make-your-unstructured-data-smart-with-cloud-storage/</guid><category>Storage &amp; Data Transfer</category><media:content height="540" url="https://storage.googleapis.com/gweb-cloudblog-publish/images/smart_storage.max-600x600.jpg" width="540"></media:content><og xmlns:og="http://ogp.me/ns#"><type>article</type><title>From dark data to bright insights: The dawn of smart storage</title><description></description><image>https://storage.googleapis.com/gweb-cloudblog-publish/images/smart_storage.max-600x600.jpg</image><site_name>Google</site_name><url>https://cloud.google.com/blog/products/storage-data-transfer/make-your-unstructured-data-smart-with-cloud-storage/</url></og><author xmlns:author="http://www.w3.org/2005/Atom"><name>Asad Khan</name><title>Sr. 
Director of Product Management, Google Cloud</title><department></department><company></company></author><author xmlns:author="http://www.w3.org/2005/Atom"><name>Manjul Sahay</name><title>Group Product Manager, Google Cloud Storage</title><department></department><company></company></author></item><item><title>Power your enterprise applications in the cloud with unified block and file storage</title><link>https://cloud.google.com/blog/products/storage-data-transfer/announcing-enhancements-to-google-cloud-netapp-volumes/</link><description>&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Migrating enterprise applications to the cloud requires a storage foundation that can handle everything from high-performance block workloads to globally distributed file access. To solve these challenges, we’re thrilled to announce two new capabilities for Google Cloud NetApp Volumes: unified iSCSI block and file storage to enable your storage area network (SAN) migrations, and NetApp FlexCache to accelerate your hybrid cloud workloads. These features, along with a new integration for agents built with Gemini Enterprise, can help you modernize even your most demanding applications.&lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Run your most demanding SAN workloads on Google Cloud&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;For decades, enterprises have relied on NetApp for both network attached storage (NAS) and SAN workloads on-premises. We’re now bringing that same trusted technology to a fully managed cloud service, allowing you to migrate latency-sensitive applications to Google Cloud without changing their underlying architecture.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Our unified service is engineered for enterprise-grade performance, with features including:&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;Low latency engineered for your most demanding applications&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;Throughput that can burst up to 5 GiB/s with up to 160K random IOPS per volume&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;Independent scaling of capacity, throughput, and IOPS to control costs&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;Integrated data protection with &lt;/span&gt;&lt;a href="https://cloud.google.com/netapp/volumes/docs/configure-and-use/volume-snapshots/overview"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;NetApp Snapshots&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; for rapid recovery and ransomware defense&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;iSCSI block protocol support is available now via private preview for interested customers.&lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Accelerate your hybrid cloud with NetApp FlexCache&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;For organizations with distributed teams and a hybrid cloud strategy, providing fast access to shared datasets is critical. &lt;/span&gt;&lt;a href="https://cloud.google.com/netapp/volumes/docs/configure-and-use/volumes/cache-ontap-volumes/overview"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;NetApp FlexCache&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, a new capability for Google Cloud NetApp Volumes, provides high-performance, local read caches of remote volumes. This helps distributed teams access shared datasets as if they were local, and supports compute bursting for workloads that need low-latency data access, improving productivity and collaboration across your entire organization. FlexCache is available now in preview via an allowlist.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Bring your enterprise data to Gemini Enterprise&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;We’re also announcing that Google Cloud NetApp Volumes now serves as a data store for Gemini Enterprise. This integration unlocks new possibilities for retrieval-augmented generation (RAG), allowing you to ground your AI models on your own secure, factual, enterprise-grade data. Your data remains securely governed in NetApp Volumes and is quickly available for search and inference workflows, without the need for complex ETL or manual integrations.&lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Additional enhancements for your cloud environment&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Google Cloud NetApp Volumes has several other new capabilities to help you modernize your data estate:&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;a href="https://cloud.google.com/netapp/volumes/docs/migrate/ontap/overview"&gt;&lt;strong style="text-decoration: underline; vertical-align: baseline;"&gt;NetApp SnapMirror&lt;/strong&gt;&lt;/a&gt;&lt;strong style="vertical-align: baseline;"&gt;:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; You can now quickly replicate mission-critical data between on-prem NetApp systems and Google Cloud, providing a zero recovery point objective (RPO) and near-zero recovery time objective (RTO).&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;High performance for large volumes: &lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;For applications with massive datasets such as HPC, AI, and EDA, we now offer large-capacity volumes that scale from 15 TiB to 3 PiB, with over 21 GiB/s of throughput per volume.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;a href="https://cloud.google.com/netapp/volumes/docs/configure-and-use/volumes/manage-auto-tiering"&gt;&lt;strong style="text-decoration: underline; vertical-align: baseline;"&gt;Auto-tiering&lt;/strong&gt;&lt;/a&gt;&lt;strong style="vertical-align: baseline;"&gt;:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; To help you manage costs, built-in &lt;/span&gt;&lt;a href="https://cloud.google.com/netapp/volumes/docs/configure-and-use/volumes/manage-auto-tiering"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;auto-tiering&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; dynamically moves infrequently accessed data to lower-cost storage, with cold data priced at just $0.03/GiB for the Flex service level. As a turnkey, integrated feature, auto-tiering is transparent to any application built on Google Cloud NetApp Volumes, and supports a tiering threshold anywhere from 2 to 183 days, with dynamically adjustable policy support.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
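To see how the tiering economics above can play out, here is a rough back-of-the-envelope sketch in Python. Only the $0.03/GiB cold-tier price comes from this post; the hot-tier rate and the 80% cold split are hypothetical placeholders for illustration, not published pricing.

```python
# Rough monthly cost sketch for auto-tiering (illustrative only).
# Only the $0.03/GiB cold price is from the post; HOT_PRICE_PER_GIB
# and the 80/20 hot/cold split are hypothetical assumptions.

COLD_PRICE_PER_GIB = 0.03   # Flex service level, cold tier (from the post)
HOT_PRICE_PER_GIB = 0.20    # hypothetical hot-tier rate for illustration

def monthly_cost(total_gib, cold_fraction):
    """Blended monthly cost when cold_fraction of the data is auto-tiered."""
    cold = total_gib * cold_fraction * COLD_PRICE_PER_GIB
    hot = total_gib * (1 - cold_fraction) * HOT_PRICE_PER_GIB
    return cold + hot

total = 100 * 1024  # a 100 TiB volume, in GiB
without_tiering = monthly_cost(total, 0.0)
with_tiering = monthly_cost(total, 0.8)  # assume 80% of data has gone cold
print(f"no tiering: ${without_tiering:,.0f}/month")
print(f"80% tiered: ${with_tiering:,.0f}/month")
print(f"savings:    {1 - with_tiering / without_tiering:.0%}")
```

The larger the share of data that goes cold within your chosen 2-183 day threshold, the bigger the blended saving, which is why the adjustable policy matters.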
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Get started&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Whether you’re migrating your enterprise SAN data, powering AI with Gemini Enterprise, or running high-throughput EDA workloads, Google Cloud NetApp Volumes can help you modernize your data estate. To learn more and get started, &lt;/span&gt;&lt;a href="https://cloud.google.com/netapp-volumes?e=48754805"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;explore the product documentation&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;</description><pubDate>Tue, 14 Oct 2025 16:00:00 +0000</pubDate><guid>https://cloud.google.com/blog/products/storage-data-transfer/announcing-enhancements-to-google-cloud-netapp-volumes/</guid><category>Partners</category><category>Storage &amp; Data Transfer</category><og xmlns:og="http://ogp.me/ns#"><type>article</type><title>Power your enterprise applications in the cloud with unified block and file storage</title><description></description><site_name>Google</site_name><url>https://cloud.google.com/blog/products/storage-data-transfer/announcing-enhancements-to-google-cloud-netapp-volumes/</url></og><author xmlns:author="http://www.w3.org/2005/Atom"><name>Asad Khan</name><title>Sr. Director of Product Management, Google Cloud</title><department></department><company></company></author><author xmlns:author="http://www.w3.org/2005/Atom"><name>Brendan Power</name><title>Group Product Manager, Google Storage</title><department></department><company></company></author></item><item><title>The future of media sanitization at Google</title><link>https://cloud.google.com/blog/products/identity-security/the-future-of-media-sanitization-at-google/</link><description>&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;At Google, protecting your data is our most important responsibility, and we are committed to keeping your data safe. 
To further this commitment, we are proud to announce that, starting in November 2025, we will transition our approach to media sanitization to rely fully on a robust, layered encryption strategy.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;This marks a move away from the "brute force disk erase" process we have used for nearly two decades. While overwriting data has been an effective method, the storage technology landscape has changed dramatically. This process is no longer sustainable due to the size and technological complexity of today's modern media.&lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;A smarter approach: Cryptographic erasure&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;To address these challenges, we are embracing a more modern and efficient method of media sanitization: cryptographic erasure.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;By default, all user data in Google's services is protected by multiple layers of encryption. Cryptographic erasure leverages this encryption to sanitize media. Instead of overwriting the entire drive, we securely delete the cryptographic keys that are used to encrypt the data. Once the keys are gone, the data is rendered unreadable and unrecoverable.&lt;/span&gt;&lt;/p&gt;
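The idea can be illustrated with a small conceptual Python sketch. It uses a toy SHA-256 counter-mode keystream from the standard library purely for illustration; it does not reflect Google's actual encryption stack. The point is that destroying the key, rather than overwriting the ciphertext, is what renders the data unrecoverable.

```python
import hashlib
import secrets

# Toy stream cipher: SHA-256 in counter mode. This is a conceptual
# illustration of cryptographic erasure only. It is NOT Google's
# encryption stack and must not be used for real data protection.

def keystream(key: bytes, length: int) -> bytes:
    """Derive a deterministic pseudo-random pad of `length` bytes from `key`."""
    blocks = (length + 31) // 32  # SHA-256 yields 32 bytes per block
    out = b"".join(
        hashlib.sha256(key + i.to_bytes(8, "big")).digest()
        for i in range(blocks)
    )
    return out[:length]

def xor(data: bytes, pad: bytes) -> bytes:
    return bytes(a ^ b for a, b in zip(data, pad))

# Encrypt a record under a per-device media encryption key.
key = secrets.token_bytes(32)
plaintext = b"customer record"
ciphertext = xor(plaintext, keystream(key, len(plaintext)))

# Normal operation: holding the key decrypts the data.
assert xor(ciphertext, keystream(key, len(ciphertext))) == plaintext

# Cryptographic erasure: destroy the key. The ciphertext still sits on
# the drive, but without the key it is indistinguishable from noise.
key = None
```

Because only the small key must be destroyed, sanitization no longer scales with drive capacity the way a full overwrite does.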
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;This method is not only faster but also aligns with industry best practices. The National Institute of Standards and Technology (NIST) recognizes cryptographic erasure as a valid sanitization technique in its special publication 800-88. We are committed to meeting and exceeding these standards to ensure the security of your data.&lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Enhancing security through innovation&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;We implement cryptographic erasure with multiple layers of security, employing a defense in depth strategy. Our trust-but-verify model uses independent verification mechanisms to ensure permanent deletion of media encryption keys.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;We also protect secrets involved in this process, like storage device keys, with industry-leading measures. Multiple key rotations enhance the security of customer data through independent layers of trusted encryption.&lt;/span&gt;&lt;/p&gt;
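One way to picture such layering is envelope encryption, where a data key is itself encrypted ("wrapped") by a higher-level key. The sketch below is a hypothetical toy, not Google's actual key hierarchy; it only shows why deleting or rotating a single upper-layer key severs access to everything wrapped beneath it.

```python
import hashlib
import secrets

# Conceptual sketch of layered ("envelope") keys, not Google's actual
# key hierarchy: a data key encrypts the data, a key-encryption key
# (KEK) wraps the data key. Removing a key at any layer severs access
# to everything beneath it.

def mask(key: bytes, blob: bytes) -> bytes:
    # Toy one-block "wrap": XOR with a hash-derived pad (illustration only).
    pad = hashlib.sha256(key).digest()[:len(blob)]
    return bytes(a ^ b for a, b in zip(blob, pad))

data_key = secrets.token_bytes(32)      # encrypts the data itself
kek = secrets.token_bytes(32)           # key-encryption key, one layer up
wrapped_data_key = mask(kek, data_key)  # only the wrapped form is stored

# Unwrapping with the right KEK recovers the data key...
assert mask(kek, wrapped_data_key) == data_key

# ...but rotating the KEK without re-wrapping (or deleting it outright)
# leaves the stored wrapped key useless: one deletion, many layers down.
kek = secrets.token_bytes(32)
assert mask(kek, wrapped_data_key) != data_key
```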
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Sustainability and the circular economy&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Our previous method of media erasure had an environmental cost. Any storage device that failed our rigorous verification process was physically destroyed. This resulted in the destruction of a significant number of devices each year. &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Cryptographic erasure allows us to move towards a more sustainable, circular economy. By eliminating the need to physically destroy drives, we can reuse more of our hardware. This also allows us to recover valuable rare earth materials, such as neodymium magnets, from end-of-life media. This innovative magnet recovery process is a major accomplishment in sustainable manufacturing, showcasing our commitment to responsible growth.&lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Our path forward&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;We have consistently been strong advocates for doing what is truly right for our users, the broader industry, and the world at large. &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;This transition to cryptographic erasure is a direct reflection of that commitment.&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; It allows us to enhance security, align with the highest industry standards, and build a more sustainable future for our infrastructure. We believe this is the right path forward for our users, the industry, and the environment.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;For more information about encryption at rest, including encryption key management, see our &lt;/span&gt;&lt;a href="https://cloud.google.com/docs/security/encryption/default-encryption#googles_default_encryption"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;default encryption at rest&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; security whitepaper.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;</description><pubDate>Mon, 13 Oct 2025 16:00:00 +0000</pubDate><guid>https://cloud.google.com/blog/products/identity-security/the-future-of-media-sanitization-at-google/</guid><category>Infrastructure</category><category>Storage &amp; Data Transfer</category><category>Security &amp; Identity</category><media:content height="540" url="https://storage.googleapis.com/gweb-cloudblog-publish/images/The_future_of_media_sanitization_at_Google.max-600x600.jpg" width="540"></media:content><og xmlns:og="http://ogp.me/ns#"><type>article</type><title>The future of media sanitization at Google</title><description></description><image>https://storage.googleapis.com/gweb-cloudblog-publish/images/The_future_of_media_sanitization_at_Google.max-600x600.jpg</image><site_name>Google</site_name><url>https://cloud.google.com/blog/products/identity-security/the-future-of-media-sanitization-at-google/</url></og><author xmlns:author="http://www.w3.org/2005/Atom"><name>Paul B. 
Pescitelli</name><title>Director, Data Center Information Security</title><department></department><company></company></author></item><item><title>11 ways to reduce your Google Cloud compute costs today</title><link>https://cloud.google.com/blog/products/compute/cost-saving-strategies-when-migrating-to-google-cloud-compute/</link><description>&lt;div class="block-paragraph"&gt;&lt;p data-block-key="t3t6l"&gt;As the saying goes, "a penny saved is a penny earned," and this couldn't be more true when it comes to cloud infrastructure. In today's competitive landscape, you need to control costs while maintaining the performance your business needs. Luckily, Google Cloud’s &lt;a href="https://cloud.google.com/products/compute"&gt;Compute Engine&lt;/a&gt; and block storage services offer numerous opportunities to reduce costs without sacrificing performance, especially in the context of your migration and modernization initiatives.&lt;/p&gt;&lt;p data-block-key="fodf8"&gt;In this article, we'll explore &lt;b&gt;11 key ways&lt;/b&gt; to optimize your infrastructure spending on Google Cloud, from simple adjustments to strategic decisions that can result in significant long-term savings.&lt;/p&gt;&lt;h3 data-block-key="58qqa"&gt;&lt;b&gt;1. Choose the right VM instances&lt;/b&gt;&lt;/h3&gt;&lt;p data-block-key="bhado"&gt;One of the most effective ways to reduce Compute Engine costs is to ensure that you’ve properly selected and right-sized your virtual machines (VMs) for their workloads to support your migration and modernization efforts.
Whether you're new to Google Cloud or already using Compute Engine, adopting the latest-generation VMs — such as &lt;a href="https://cloud.google.com/compute/docs/general-purpose-machines#n4_series"&gt;N4&lt;/a&gt;, &lt;a href="https://cloud.google.com/compute/docs/general-purpose-machines#c4_series"&gt;C4&lt;/a&gt;, &lt;a href="https://cloud.google.com/compute/docs/general-purpose-machines#c4d_series"&gt;C4D&lt;/a&gt;, and &lt;a href="https://cloud.google.com/compute/docs/general-purpose-machines#c4a_series"&gt;C4A&lt;/a&gt; — can deliver substantial savings and improved price-performance.&lt;/p&gt;&lt;p data-block-key="8rrjo"&gt;Powered by Google Cloud’s &lt;a href="https://cloud.google.com/titanium?e=48754805&amp;amp;hl=en"&gt;Titanium&lt;/a&gt; architecture, our latest-generation VMs offer faster CPUs, higher memory bandwidth, and more efficient virtualization than their predecessors, so you can handle the same workloads with fewer resources. For existing customers, migrating from older VM generations to the newest VMs can significantly lower total costs while helping you exceed current performance levels. Organizations that have made the switch often report 20–40% better performance along with meaningful reductions in cloud compute spend. For example, &lt;a href="https://www.elastic.co/blog/elasticsearch-runs-faster-google-axion-processors" target="_blank"&gt;Elastic&lt;/a&gt; leveraged the general-purpose C4A machine series based on &lt;a href="https://cloud.google.com/blog/products/compute/introducing-googles-new-arm-based-cpu?e=48754805"&gt;Google Cloud's Arm-based Axion CPUs&lt;/a&gt;, to achieve a compelling efficiency and performance uplift for their workloads.&lt;/p&gt;&lt;p data-block-key="febac"&gt;Beyond &lt;a href="https://cloud.google.com/compute/docs/general-purpose-machines"&gt;general-purpose VMs&lt;/a&gt;, we also offer specialized machine types to address unique customer requirements. 
Compute-optimized HPC VMs like &lt;a href="https://cloud.google.com/blog/products/compute/new-h4d-vms-optimized-for-hpc?e=48754805"&gt;H4D&lt;/a&gt; are designed for high-performance computing and data analytics, offering extreme performance for demanding workloads. &lt;a href="https://cloud.google.com/compute/docs/memory-optimized-machines#m4_series"&gt;M4&lt;/a&gt; and &lt;a href="https://cloud.google.com/compute/docs/memory-optimized-machines#x4_series"&gt;X4&lt;/a&gt; instances cater to memory-intensive applications, while &lt;a href="https://cloud.google.com/compute/docs/storage-optimized-machines#z3_series"&gt;Z3&lt;/a&gt; instances are ideal for storage-intensive workloads. Furthermore, if you need complete control over your hardware environment and maximum performance isolation, we offer &lt;a href="https://cloud.google.com/compute/docs/instances/bare-metal-instances#:~:text=Bare%20metal%20instances%20provide%20direct,same%20way%20as%20VM%20instances."&gt;bare metal instances&lt;/a&gt;.&lt;/p&gt;&lt;p data-block-key="eqplt"&gt;These options help ensure that even the most specialized and performance-sensitive workloads can find an optimal and cost-effective home within the Compute Engine portfolio.&lt;/p&gt;&lt;h3 data-block-key="5i8hm"&gt;&lt;b&gt;2. Optimize your block storage selections&lt;/b&gt;&lt;/h3&gt;&lt;p data-block-key="bsaae"&gt;The best way to lower your block storage TCO, while ensuring your workloads remain successful, is to drive high resource efficiency. &lt;a href="https://cloud.google.com/compute/docs/disks/hyperdisks"&gt;Hyperdisk&lt;/a&gt; makes it simple to drive high performance and high efficiency by enabling you to optimize your block storage to your workload and through Storage Pools. 
Below, we’ll discuss each of these capabilities and how you can use them to lower your block storage TCO.&lt;/p&gt;&lt;p data-block-key="6kmjp"&gt;Workload optimized: Hyperdisk enables you to independently provision performance and capacity at the volume level, matching your block storage resources to your workload. You can leverage this capability to purchase just the capacity and performance you need, no more and no less. And by taking advantage of Hyperdisk Balanced’s “baseline” performance (included free with every volume), you can serve the vast majority of your VMs without purchasing any extra performance.&lt;/p&gt;&lt;p data-block-key="at87k"&gt;Storage Pools: Hyperdisk is the only hyperscale cloud block storage to offer thin-provisioned performance and capacity. With Hyperdisk Storage Pools, you can provision the aggregate performance and capacity your workload requires, while still provisioning the volume-level capacity and performance your workloads request (also known as &lt;a href="https://en.wikipedia.org/wiki/Thin_provisioning" target="_blank"&gt;thin-provisioning&lt;/a&gt;). This allows you to pay for the resources you need, not the sum of the volumes you’ve provisioned. As a result, you can &lt;a href="https://cloud.google.com/blog/products/storage-data-transfer/hyperdisk-storage-pools-is-now-generally-available#:~:text=Infrastructure%20Manager%2C%20REWE-,Get%20started,use%20and%20manage%20your%20pools."&gt;lower your overall block storage TCO by as much as 50%.&lt;/a&gt;&lt;/p&gt;&lt;p data-block-key="929m4"&gt;For more information on how to select the right block storage for your workload and to see how customers have benefitted from Hyperdisk, read this &lt;a href="https://cloud.google.com/blog/products/storage-data-transfer/how-to-choose-the-right-hyperdisk-block-storage-for-your-use-case?e=48754805"&gt;blog&lt;/a&gt;.&lt;/p&gt;&lt;h3 data-block-key="e9sir"&gt;&lt;b&gt;3.
Consider custom compute classes&lt;/b&gt;&lt;/h3&gt;&lt;p data-block-key="616a2"&gt;To get the most out of our latest-generation VMs, Google Kubernetes Engine (GKE) &lt;a href="https://cloud.google.com/kubernetes-engine/docs/concepts/about-custom-compute-classes"&gt;&lt;b&gt;custom compute classes&lt;/b&gt;&lt;/a&gt; (CCC) offer an advanced way to optimize compute choices and provide high availability. Instead of being limited to a single machine type for your workloads, you can define a prioritized list of VM instance types. This allows you to set the newest, most price-performant VMs — including our latest-generation VMs — as your top priority. GKE custom compute classes provide the capability to automatically and seamlessly spin up instances based on your specified priority list. This feature helps you maximize the availability of your compute capacity while still aiming for the most cost-effective options, so your workloads can scale reliably without manual intervention.&lt;/p&gt;&lt;p data-block-key="b9b8c"&gt;Here are some specific use cases for how custom compute classes can help you optimize costs:&lt;/p&gt;&lt;ul&gt;&lt;li data-block-key="b747a"&gt;&lt;b&gt;Autoscaling cost-performant fallbacks:&lt;/b&gt; When demand peaks, you might be tempted to autoscale using a highly available but less cost-efficient VM type. CCC allows you to take a tiered approach. You can set up several cost-efficient fallback alternatives, so that as demand increases, GKE first attempts to use the most cost-effective options, and progressively moves to the other choices in your list when necessary to meet demand.&lt;/li&gt;&lt;li data-block-key="smd6"&gt;&lt;b&gt;AI/ML inference:&lt;/b&gt; Running AI/ML inference workloads often involves significant compute resources. 
Instead of maintaining a large, static reservation that might sit idle during off-peak times, CCC lets you provision a minimal base reservation and leverage more cost-effective capacity types, such as &lt;a href="https://docs.google.com/document/d/1KLJ97-xgtX9pDaodkMsXN18xJOYiBFZUSqkdYb--_44/edit?tab=t.0" target="_blank"&gt;Spot VMs&lt;/a&gt;, to handle peak inference demand — all orchestrated through your CCC configuration.&lt;/li&gt;&lt;li data-block-key="4pm95"&gt;&lt;b&gt;Adopting new VM generations:&lt;/b&gt; Combine the power of GKE custom compute classes with &lt;a href="https://cloud.google.com/compute/docs/instances/committed-use-discounts-overview#spend_based"&gt;Compute Flexible committed use discounts&lt;/a&gt; (Flex CUDs) to de-risk the adoption of new, cost-efficient VM series like N4 and C4. With CCC, you can define fallback options, providing workload resilience, while Flex CUDs offer financial adaptability, as the discounts apply across your total eligible compute spend, regardless of the specific VM series you use. This dual approach is a safe, cost-effective strategy for leveraging the latest hardware without disruption. For more information, read this &lt;a href="https://cloud.google.com/blog/products/compute/adopt-new-vm-series-with-gke-compute-classes-flexible-cuds/?e=48754805"&gt;blog&lt;/a&gt;.&lt;/li&gt;&lt;li data-block-key="5fieq"&gt;&lt;b&gt;Using flexible Spot VMs:&lt;/b&gt; Spot VMs offer significant savings but can be preempted. Being constrained to a single Spot VM shape increases the risk that capacity will not be available. With CCC, you can define multiple fallback Spot VM types. 
This "spot surfing" capability allows the application to remain on cost-efficient Spot capacity by automatically pivoting to alternative Spot instance types if the primary choice is unavailable.&lt;/li&gt;&lt;/ul&gt;&lt;p data-block-key="9pj6j"&gt;In short, by leveraging GKE CCC, you can artfully mix and match various VM types and consumption models, including On-Demand, Spot, DWS FlexStart, and instances covered by CUDs, to build a resilient and highly cost-optimized infrastructure that adapts to the unique needs and patterns of your workloads.&lt;/p&gt;&lt;h3 data-block-key="4f7sf"&gt;&lt;b&gt;4. Leverage custom machine types (CMT)&lt;/b&gt;&lt;/h3&gt;&lt;p data-block-key="44on9"&gt;&lt;a href="https://cloud.google.com/compute/docs/instances/creating-instance-with-custom-machine-type"&gt;Custom machine types&lt;/a&gt;, available on N4 VMs, allow you to configure virtual machines to your exact specifications. Rather than selecting from predefined machine types that might include excess capacity, you can tailor the CPU-to-memory ratio specifically for your workloads, so you only pay for resources you actually use. This targeted approach minimizes waste and can significantly reduce your cloud spend, especially when migrating from on-premises to Google Cloud or from other cloud providers.&lt;/p&gt;&lt;p data-block-key="ditbj"&gt;This flexibility becomes particularly valuable if your applications have unique resource profiles that don't align well with our standard offerings. Custom machine types let you create the perfect environment for your needs. By avoiding the compromise of over-provisioning certain resources while potentially constraining others, you can achieve both better performance and more efficient spending across your Compute Engine deployment.&lt;/p&gt;&lt;p data-block-key="16cdr"&gt;As an example, take a memory-intensive workload that runs best with 16 vCPU and 70 GB of memory.
With our standard shapes, or on other clouds, you would normally need to pick a VM with 128 GB of memory, resulting in higher costs to run your workload due to the extra provisioned resources. Instead, with custom machine types, you can easily launch a VM with 16 vCPU and 70 GB memory, resulting in an 18% cost savings vs standard N4-highmem-16 VMs.&lt;/p&gt;&lt;h3 data-block-key="ei6g2"&gt;&lt;b&gt;5. Make the most of committed use discounts&lt;/b&gt;&lt;/h3&gt;&lt;p data-block-key="3vd4p"&gt;Committed use discounts (CUDs) are a strategic cost-saving opportunity for organizations with steady, predictable computing needs. By committing to resource usage over one- or three-year periods, you can reduce cloud costs by up to 70% compared to on-demand pricing. This approach not only helps ensure budget predictability but also converts fixed infrastructure spending into a financial advantage, making it ideal for stable workloads that support core business functions.&lt;/p&gt;&lt;p data-block-key="bnjpk"&gt;Google Cloud offers flexible CUD structures to align with various operational models. Resource-based commitments target specific machine types and regions, while flexible commitments apply discounts across projects, regions, and machine series — great for dynamic environments. By analyzing historical usage and forecasting future needs, you can identify workloads suited for these discounts, reinvesting the savings into innovation and scaling initiatives.&lt;/p&gt;&lt;h3 data-block-key="einu0"&gt;&lt;b&gt;6. Manage unused disk space&lt;/b&gt;&lt;/h3&gt;&lt;p data-block-key="5n36i"&gt;You pay for the total provisioned disk space, regardless of how much you actually use. Many organizations tend to over-provision storage "just in case," which often leads to unnecessary and costly waste. For instance, if you provision a 100GB disk but only use 20GB, you're still paying for the entire 100GB.
Being intentional and precise with your storage allocations — rather than rounding up to common sizes — can lead to significant cost savings.&lt;/p&gt;&lt;p data-block-key="d58ss"&gt;To optimize spending, it's important to adopt a few best practices. Using &lt;a href="https://cloud.google.com/stackdriver/docs/solutions/agents/ops-agent"&gt;Ops Agent&lt;/a&gt;, regularly audit disk usage across your infrastructure to identify and eliminate inefficiencies. Resize disks to align with actual consumption, allowing a reasonable buffer for growth. Implement automated alerts in &lt;a href="https://cloud.google.com/monitoring?e=48754805&amp;amp;hl=en"&gt;Cloud Monitoring&lt;/a&gt; to detect underutilized disks and take corrective action. For stateless applications, consider using smaller boot disk images to minimize overhead and reduce costs even further.&lt;/p&gt;&lt;p data-block-key="4hu1t"&gt;In addition, consider the following optimization strategies to further reduce costs and improve efficiency:&lt;/p&gt;&lt;ul&gt;&lt;li data-block-key="as6ik"&gt;Use Google Cloud’s monitoring tools to track CPU, memory, and disk usage over time.&lt;/li&gt;&lt;li data-block-key="3uj3j"&gt;Establish a regular review cycle to identify and right-size over-provisioned resources.&lt;/li&gt;&lt;li data-block-key="7jobf"&gt;Test workloads across different VM configurations to find the optimal balance between cost and performance.&lt;/li&gt;&lt;/ul&gt;&lt;h3 data-block-key="a49bh"&gt;&lt;b&gt;7. Use Spot VMs&lt;/b&gt;&lt;/h3&gt;&lt;p data-block-key="52fp7"&gt;&lt;a href="https://cloud.google.com/compute/docs/instances/spot"&gt;Spot VMs&lt;/a&gt; provide the same machine types and configuration​​ options as standard virtual machines but at a significantly reduced cost — typically offering a 60% to 91% discount. 
This cost efficiency comes with the tradeoff of potential preemption at short notice, making them most suitable for workloads that are fault-tolerant and can recover quickly from unexpected interruptions. Spot VMs are designed to take advantage of unused compute capacity, allowing you to optimize your cloud spending without compromising access to high-performance resources.&lt;/p&gt;&lt;p data-block-key="8abo0"&gt;Strong use cases for Spot VMs include batch processing jobs, big data and analytics workloads, continuous integration and deployment (CI/CD) pipelines, stateless web servers running in autoscaling groups, and compute-heavy tasks. When properly architected to handle interruptions — for example, by using job checkpointing, load balancing, task queues, or via GKE custom compute classes (see more above) — &lt;a href="https://cloud.google.com/solutions/spot-vms?e=48754805&amp;amp;hl=en"&gt;Spot VMs&lt;/a&gt; can play a critical role in minimizing infrastructure costs while maintaining high availability and system resilience. Leveraging Spot VMs in these scenarios lets you scale cost-effectively, especially when compute demand is variable or time-flexible.&lt;/p&gt;&lt;h3 data-block-key="9not2"&gt;&lt;b&gt;8. Use optimization recommendations&lt;/b&gt;&lt;/h3&gt;&lt;p data-block-key="bjf0l"&gt;Google Cloud's &lt;a href="https://cloud.google.com/recommender/docs/recommenders"&gt;Recommenders&lt;/a&gt; are a powerful tool designed to help you optimize your cloud resources efficiently. When browsing the Google Cloud console, you may see lightbulb icons next to specific resources — these indicate potential improvements identified by Google's recommendation engine. By analyzing real-time usage patterns and current resource configurations, the &lt;a href="https://cloud.google.com/recommender/docs/key-concepts#recommenders"&gt;Recommender&lt;/a&gt; delivers actionable insights tailored to each user's unique environment. 
This intelligent system highlights opportunities not only to reduce costs but also to enhance security, performance, reliability, management efficiency, and environmental sustainability.&lt;/p&gt;&lt;p data-block-key="91nm7"&gt;For example, there are &lt;a href="https://cloud.google.com/compute/docs/instances/idle-vm-recommendations-overview"&gt;idle VM recommendations&lt;/a&gt; to help you identify VM instances that have not been used over the last 1 to 14 days. Common recommendations include switching to more suitable machine types, rightsizing underutilized compute instances, or adopting more cost-effective storage solutions. The tool allows you to apply many of these changes directly, streamlining the optimization process. By continuously evaluating workloads and offering these automated, data-driven suggestions, the Recommendation Hub helps organizations maintain cloud performance while managing costs more effectively.&lt;/p&gt;&lt;h3 data-block-key="35ft8"&gt;&lt;b&gt;9. Take advantage of auto-scaling and scheduling&lt;/b&gt;&lt;/h3&gt;&lt;p data-block-key="5i72v"&gt;Matching your compute resources to actual demand patterns is one of the most effective ways to reduce cloud waste and improve overall cost efficiency. Many organizations over-provision their resources to handle peak workloads, leaving machines underutilized during off-peak periods. By aligning compute capacity more closely with real-time or predictable usage patterns, such as business hours or seasonal trends, you can significantly cut unnecessary spending without sacrificing performance.&lt;/p&gt;&lt;p data-block-key="c7ge7"&gt;&lt;a href="https://cloud.google.com/compute/docs/autoscaler"&gt;Autoscaling&lt;/a&gt; is the key to achieving this efficiency. 
In fact, customers who leverage Google Compute Engine autoscaling for their virtual machines have seen average infrastructure cost savings of more than 40%.&lt;/p&gt;&lt;p data-block-key="70opn"&gt;You can implement autoscaling strategies to dynamically adjust resources based on CPU utilization, load balancing capacity, or custom application metrics, so that workloads receive the necessary compute power when needed, while scaling down automatically during low-demand periods.&lt;/p&gt;&lt;p data-block-key="cj1lq"&gt;For workloads with predictable patterns, such as those that fluctuate with business hours or planned seasonal events, &lt;a href="https://cloud.google.com/compute/docs/autoscaler/scaling-schedules"&gt;schedule-based scaling&lt;/a&gt; is a particularly powerful tool. This approach allows you to proactively increase resources in anticipation of high demand and scale them down during lulls, for the performance you need without constant over-provisioning.&lt;/p&gt;&lt;p data-block-key="1i6kf"&gt;In addition to autoscaling, several practical implementation techniques can further optimize your resource usage. &lt;a href="https://cloud.google.com/scheduler/docs/start-and-stop-compute-engine-instances-on-a-schedule"&gt;Setting up instance scheduling&lt;/a&gt; lets you automatically start and stop development and test environments according to business hours — a simple yet highly effective approach that can lead to cost savings of up to 70%. You can also leverage maintenance windows to reduce disruptions and resource consumption, by concentrating updates and system changes into low-usage periods. Together, these tactics help maintain high availability and performance while keeping infrastructure costs under control.&lt;/p&gt;&lt;h3 data-block-key="evivu"&gt;&lt;b&gt;10. 
Understand your spend with detailed billing analysis&lt;/b&gt;&lt;/h3&gt;&lt;p data-block-key="34tqj"&gt;Before implementing any cost-saving strategies in Google Cloud, it’s essential to &lt;a href="https://cloud.google.com/billing/docs/concepts"&gt;understand your current spending in detail&lt;/a&gt;. Google Cloud’s billing panel offers granular visibility into your expenses, including costs broken down by individual SKUs. This level of transparency lets you track where your money is going and identify potential inefficiencies. Begin by regularly reviewing your billing dashboard to monitor usage trends and spot anomalies. Applying labels and tags to your resources can further help categorize and attribute costs accurately, especially in complex environments with multiple projects or departments.&lt;/p&gt;&lt;p data-block-key="een0q"&gt;In addition, &lt;a href="https://cloud.google.com/billing/docs/how-to/budgets"&gt;setting up budget alerts&lt;/a&gt; is a practical way to stay ahead of overspending by notifying you when costs approach or exceed predefined thresholds. It’s also important to identify and eliminate unused or idle resources, such as virtual machines or persistent disks that are no longer in active use — these can often be shut down or deleted to immediately reduce costs. By thoroughly analyzing your cost structure, you can uncover “low-hanging fruit” — resources that provide little or no value — and make data-driven decisions to optimize your cloud usage efficiently.&lt;/p&gt;&lt;h3 data-block-key="9jfbk"&gt;&lt;b&gt;11. Consider serverless alternatives&lt;/b&gt;&lt;/h3&gt;&lt;p data-block-key="e7aev"&gt;Last but not least, Google Cloud's &lt;a href="https://cloud.google.com/discover/what-is-serverless-computing?e=48754805&amp;amp;hl=en"&gt;serverless computing&lt;/a&gt; offerings provide a compelling alternative to traditional virtual machines, delivering better cost efficiency, simplified operations, and greater scalability.
By abstracting away infrastructure management, serverless platforms allow teams to focus on writing and deploying code without worrying about provisioning, scaling, or maintaining servers. This shift can not only reduce operational overhead but also cut costs by aligning compute spending directly with application usage.&lt;/p&gt;&lt;p data-block-key="4c30g"&gt;There are multiple serverless options available, each tailored to different workloads. &lt;a href="https://cloud.google.com/run?e=48754805&amp;amp;hl=en"&gt;Cloud Run&lt;/a&gt; is designed for running containerized applications that need rapid scaling and flexible deployment. &lt;a href="https://cloud.google.com/run/docs/write-event-driven-functions"&gt;Cloud Run Functions&lt;/a&gt; supports lightweight, event-driven code execution for microservices or automation tasks. &lt;a href="https://cloud.google.com/kubernetes-engine/docs/concepts/autopilot-overview"&gt;GKE (Autopilot Mode)&lt;/a&gt; simplifies Kubernetes operations by automatically managing nodes and scaling, allowing you to run Kubernetes workloads without handling the underlying infrastructure. All these options charge based on usage, not allocation, significantly reducing costs associated with idle resources and over-provisioning. This makes them especially beneficial for variable or unpredictable workloads. Cloud Run and GKE both support GPUs and offer the flexibility to move between the two. You can start with &lt;a href="https://www.youtube.com/watch?v=nGFXKTz2jZM&amp;amp;t=2s&amp;amp;pp=ygUabW92ZSBmcm9tIGNsb3VkIHJ1biB0byBHS0U%3D" target="_blank"&gt;Cloud Run then move to GKE&lt;/a&gt; or &lt;a href="https://www.youtube.com/watch?v=x12EOsVt2oU&amp;amp;t=1s&amp;amp;pp=ygUabW92ZSBmcm9tIGNsb3VkIHJ1biB0byBHS0U%3D" target="_blank"&gt;vice-versa&lt;/a&gt;. Some customers also leverage both offerings for different workloads. The rule of thumb is to start with GKE if you need access to the Kubernetes API.
Otherwise, start with Cloud Run.&lt;/p&gt;&lt;h2 data-block-key="6n8fn"&gt;&lt;b&gt;Start reducing your costs today&lt;/b&gt;&lt;/h2&gt;&lt;p data-block-key="buet9"&gt;Migrate to Google Cloud and optimize your infrastructure costs without compromising on what your workloads need. If you are new to Google Cloud, start with &lt;a href="http://g.co/cloud/assess" target="_blank"&gt;a migration assessment&lt;/a&gt;. Google Cloud’s &lt;a href="https://cloud.google.com/migration-center/docs"&gt;Migration Center&lt;/a&gt; can give you a clear understanding of your potential savings from migrating to Google Cloud, with detailed recommended paths for your workloads and TCO reports. Apply the strategies in this article and unlock substantial cost savings.&lt;/p&gt;&lt;/div&gt;</description><pubDate>Mon, 06 Oct 2025 16:00:00 +0000</pubDate><guid>https://cloud.google.com/blog/products/compute/cost-saving-strategies-when-migrating-to-google-cloud-compute/</guid><category>Infrastructure Modernization</category><category>Storage &amp; Data Transfer</category><category>Serverless</category><category>Compute</category><og xmlns:og="http://ogp.me/ns#"><type>article</type><title>11 ways to reduce your Google Cloud compute costs today</title><description></description><site_name>Google</site_name><url>https://cloud.google.com/blog/products/compute/cost-saving-strategies-when-migrating-to-google-cloud-compute/</url></og><author xmlns:author="http://www.w3.org/2005/Atom"><name>Alex Bestavros</name><title>Group Product Manager</title><department></department><company></company></author><author xmlns:author="http://www.w3.org/2005/Atom"><name>Sai Gopalan</name><title>Product Management, Google Cloud</title><department></department><company></company></author></item><item><title>5 best practices for Managed Lustre on Google Kubernetes 
Engine</title><link>https://cloud.google.com/blog/products/containers-kubernetes/gke-managed-lustre-csi-driver-for-aiml-and-hpc-workloads/</link><description>&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Google Kubernetes Engine (GKE) is a powerful platform for orchestrating scalable AI and high-performance computing (HPC) workloads. But as clusters grow and jobs become more data-intensive, storage I/O can become a bottleneck. Your powerful GPUs and TPUs can end up idle, while waiting for data, driving up costs and slowing down innovation.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="http://goo.gle/managed-lustre-overview" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Google Cloud Managed Lustre&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; is designed to solve this problem. Many on-premises HPC environments already use parallel file systems, and Managed Lustre makes it easier to bring those workloads to the cloud. With its managed Container Storage Interface (CSI) driver, Managed Lustre and GKE operations are fully integrated.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Optimizing your move to a high-performance parallel file system can help you get the most out of your investment from day one. &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Before deploying, it's helpful to know when to use Managed Lustre versus other options like Google Cloud Storage. For most AI and ML workloads, Managed Lustre is the recommended solution. It excels in training and checkpointing scenarios that require very low latency (less than a millisecond) and high throughput for small files, which keeps your expensive accelerators fully utilized. For data archiving or workloads with large files (over 50 MB) that can tolerate higher latency, Cloud Storage FUSE with Anywhere Cache can be another choice.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Based on our work with early customers and the learnings from our teams, here are five best practices to ensure you get the most out of Managed Lustre on GKE.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-aside"&gt;&lt;dl&gt;
    &lt;dt&gt;aside_block&lt;/dt&gt;
    &lt;dd&gt;&amp;lt;ListValue: [StructValue([(&amp;#x27;title&amp;#x27;, &amp;#x27;$300 in free credit to try Google Cloud containers and Kubernetes&amp;#x27;), (&amp;#x27;body&amp;#x27;, &amp;lt;wagtail.rich_text.RichText object at 0x7f3f8eef0be0&amp;gt;), (&amp;#x27;btn_text&amp;#x27;, &amp;#x27;&amp;#x27;), (&amp;#x27;href&amp;#x27;, &amp;#x27;&amp;#x27;), (&amp;#x27;image&amp;#x27;, None)])]&amp;gt;&lt;/dd&gt;
&lt;/dl&gt;&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;1. Design for data locality &lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;For performance-sensitive applications, you want your compute resources and storage to be as close as possible, ideally within the same zone in a given region. When provisioning volumes dynamically, the &lt;/span&gt;&lt;code style="vertical-align: baseline;"&gt;volumeBindingMode&lt;/code&gt;&lt;span style="vertical-align: baseline;"&gt; parameter in your &lt;/span&gt;&lt;code style="vertical-align: baseline;"&gt;StorageClass&lt;/code&gt;&lt;span style="vertical-align: baseline;"&gt; is your most important tool. We strongly recommend setting it to &lt;/span&gt;&lt;code style="vertical-align: baseline;"&gt;WaitForFirstConsumer&lt;/code&gt;&lt;span style="vertical-align: baseline;"&gt;. GKE provides a built-in StorageClass for Managed Lustre that uses &lt;/span&gt;&lt;code style="vertical-align: baseline;"&gt;WaitForFirstConsumer&lt;/code&gt;&lt;span style="vertical-align: baseline;"&gt; binding mode by default.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Example YAML:&lt;/strong&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-code"&gt;&lt;dl&gt;
    &lt;dt&gt;code_block&lt;/dt&gt;
    &lt;dd&gt;&amp;lt;ListValue: [StructValue([(&amp;#x27;code&amp;#x27;, &amp;#x27;apiVersion: storage.k8s.io/v1\r\nkind: StorageClass\r\nmetadata:\r\n  name: lustre-regional-wait\r\nprovisioner: lustre.csi.storage.gke.io\r\nvolumeBindingMode: WaitForFirstConsumer\r\n...&amp;#x27;), (&amp;#x27;language&amp;#x27;, &amp;#x27;&amp;#x27;), (&amp;#x27;caption&amp;#x27;, &amp;lt;wagtail.rich_text.RichText object at 0x7f3f8eef00a0&amp;gt;)])]&amp;gt;&lt;/dd&gt;
&lt;/dl&gt;&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Why it’s a best practice:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Using &lt;/span&gt;&lt;code style="vertical-align: baseline;"&gt;WaitForFirstConsumer&lt;/code&gt;&lt;span style="vertical-align: baseline;"&gt; instructs GKE to delay &lt;/span&gt;&lt;span style="text-decoration: line-through; vertical-align: baseline;"&gt;the&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt; provisioning &lt;/span&gt;&lt;span style="text-decoration: line-through; vertical-align: baseline;"&gt;of&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt; the Lustre instance until a pod that needs it is scheduled. The scheduler then uses the pod's topology constraints (i.e., the zone it's scheduled in) to create the Lustre instance in that exact same zone. This guarantees co-location of your storage and compute, minimizing network latency.&lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;2. Right-size your performance with tiers&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Not all high-performance workloads are the same. Managed Lustre offers multiple &lt;/span&gt;&lt;a href="https://cloud.google.com/managed-lustre/docs/performance"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;performance tiers&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; (read and write throughput in MB/s per TiB of storage) so you can align cost directly with your performance requirements.&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;1000 &amp;amp; 500 MB/s/TiB:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Ideal for throughput-critical workloads like foundation model training or large-scale physics simulations where I/O bandwidth is the primary bottleneck.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;250 MB/s/TiB:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; A balanced, cost-effective tier great for many general HPC workloads and AI inference serving, and data-heavy analytics pipelines.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;125 MB/s/TiB:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Best for large-capacity use cases where having a massive, POSIX-compliant file system is more important than achieving peak throughput. This is also useful for migrating on-premises containerized applications without modification,&lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt; &lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;making it easier to migrate on-premises workloads to the cloud storage.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;&lt;/div&gt;
&lt;div class="block-image_full_width"&gt;






  
    &lt;div class="article-module h-c-page"&gt;
      &lt;div class="h-c-grid"&gt;
  

    &lt;figure class="article-image--large
      
      
        h-c-grid__col
        h-c-grid__col--6 h-c-grid__col--offset-3
        
        
      "
      &gt;

      
      
        
        &lt;img
            src="https://storage.googleapis.com/gweb-cloudblog-publish/images/image1_JuBQFJn.max-1000x1000.png"
        
          alt="image1"&gt;
        
        &lt;/a&gt;
      
    &lt;/figure&gt;

  
      &lt;/div&gt;
    &lt;/div&gt;
  




&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Why it’s a best practice: &lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;Defaulting to the highest tier isn't always the most cost-effective strategy. By analyzing your workload’s I/O profile, you can significantly optimize your total cost of ownership. &lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;3. Master your networking foundation&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;A parallel file system is a network-attached resource. Getting the networking right up front will save you days of troubleshooting. Before provisioning, ensure your VPC is correctly configured by following the setup steps in our &lt;/span&gt;&lt;a href="https://cloud.google.com/managed-lustre/docs/vpc#create_and_configure_the_vpc"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;documentation&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;. &lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;This involves three key steps detailed in our documentation:&lt;/span&gt;&lt;/p&gt;
&lt;ol&gt;
&lt;li aria-level="1" style="list-style-type: decimal; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Enable Service Networking.&lt;/strong&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: decimal; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Create an IP range&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; for VPC peering.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: decimal; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Create a firewall rule&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; to allow traffic from that range on the Lustre network port (TCP 988 or 6988).&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Why it’s a best practice:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; This is a one-time setup per VPC that establishes the secure peering connection that allows your GKE nodes to communicate with the Managed Lustre service. &lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;4. Use dynamic provisioning for simplicity, static for long-lived shared data&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;The Managed Lustre CSI driver supports &lt;/span&gt;&lt;a href="https://cloud.google.com/kubernetes-engine/docs/how-to/persistent-volumes/lustre-csi-driver-new-volume"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;two modes&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; for connecting storage to your GKE workloads.&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Dynamic provisioning:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Use when your storage is tightly coupled to the lifecycle of a specific workload or application. By defining a StorageClass and PersistentVolumeClaim (PVC), GKE will automatically manage the Lustre instance lifecycle for you. This is the simplest, most automated approach.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Static provisioning:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Use when you have a long-lived Lustre instance that needs to be shared across multiple GKE clusters and jobs. You create the Lustre instance once, then create a PersistentVolume (PV) and PVC in your cluster to mount it. This decouples the storage lifecycle from any single workload.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Why it’s a best practice:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Thinking about your data’s lifecycle helps you choose the right pattern. Use dynamic provisioning as your default for its simplicity, and opt for static provisioning when you need to treat your file system as a persistent, shared resource across your organization.&lt;/span&gt;&lt;/p&gt;
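For static provisioning, the pattern is to create the Lustre instance first and then register it in the cluster with a PersistentVolume and a matching claim. A rough sketch follows; the names and capacity are illustrative, and the volumeHandle format shown here is an assumption — consult the CSI driver documentation for the exact handle and any required volume attributes:

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: lustre-pv
spec:
  capacity:
    storage: 18Ti                          # illustrative capacity
  accessModes:
    - ReadWriteMany
  persistentVolumeReclaimPolicy: Retain    # keep the instance when the claim is deleted
  csi:
    driver: lustre.csi.storage.gke.io
    # Assumed handle format; see the driver docs for the exact value
    volumeHandle: PROJECT_ID/ZONE/INSTANCE_NAME
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: lustre-shared-pvc                  # illustrative name
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: ""                     # empty string: bind to the pre-created PV
  volumeName: lustre-pv
  resources:
    requests:
      storage: 18Ti
```

Setting `storageClassName: ""` and `volumeName` pins the claim to the pre-created PV, so no dynamic provisioning is triggered.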
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;5. &lt;/strong&gt;&lt;strong style="vertical-align: baseline;"&gt;Architect for parallelism with Kubernetes Jobs&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Many AI and HPC tasks, like data preprocessing or batch inference, are suited for parallel execution. Instead of running a single, large pod, use the Kubernetes Job resource to divide the work across many smaller pods.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Consider this pattern:&lt;/span&gt;&lt;/p&gt;
&lt;ol&gt;
&lt;li aria-level="1" style="list-style-type: decimal; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;Create a single PersistentVolumeClaim for your Managed Lustre instance, making it available to your cluster.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: decimal; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;Define a Kubernetes job with parallelism set to a high number (e.g., 100).&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: decimal; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;Each pod created by the Job mounts the same Lustre PVC.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: decimal; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;Design your application so that each pod works on a different subset of the data (e.g., processing a different range of files or data chunks).&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Why it’s a best practice: &lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;By designing your application so that each pod works on a different subset of the data through the same shared PVC, you turn your GKE cluster into a powerful, distributed data processing engine. The GKE Job controller acts as the parallel task orchestrator, while Managed Lustre serves as the high-speed data backbone, allowing you to achieve massive aggregate throughput.&lt;/span&gt;&lt;/p&gt;
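The fan-out pattern above can be sketched with an Indexed Job, where Kubernetes injects a unique completion index into each pod; the container image, command, and claim name below are hypothetical placeholders:

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: parallel-preprocess
spec:
  completions: 100
  parallelism: 100          # run up to 100 pods at once
  completionMode: Indexed   # each pod receives a unique JOB_COMPLETION_INDEX
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: worker
          image: example.com/preprocess:latest    # hypothetical image
          # Each pod processes the data shard matching its completion index
          command: ["sh", "-c", "process-shard --shard=$JOB_COMPLETION_INDEX /mnt/lustre"]
          volumeMounts:
            - name: lustre
              mountPath: /mnt/lustre
      volumes:
        - name: lustre
          persistentVolumeClaim:
            claimName: lustre-shared-pvc          # hypothetical PVC for the Lustre instance
```

All 100 pods mount the same Lustre file system; the shard selection logic inside the container is what keeps them from overlapping.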
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Get started today&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;By combining the orchestration power of GKE with the performance of Managed Lustre, you can build a truly scalable and efficient platform for AI and HPC. Following these best practices will help you create a solution that is not only powerful, but also efficient, cost-effective, and easy to manage.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Ready to get started? Explore the &lt;/span&gt;&lt;a href="https://cloud.google.com/managed-lustre/docs/overview"&gt;&lt;span style="vertical-align: baseline;"&gt;Managed Lustre documentation&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, and provision your first instance today.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;</description><pubDate>Fri, 19 Sep 2025 16:00:00 +0000</pubDate><guid>https://cloud.google.com/blog/products/containers-kubernetes/gke-managed-lustre-csi-driver-for-aiml-and-hpc-workloads/</guid><category>Storage &amp; Data Transfer</category><category>HPC</category><category>Containers &amp; Kubernetes</category><og xmlns:og="http://ogp.me/ns#"><type>article</type><title>5 best practices for Managed Lustre on Google Kubernetes Engine</title><description></description><site_name>Google</site_name><url>https://cloud.google.com/blog/products/containers-kubernetes/gke-managed-lustre-csi-driver-for-aiml-and-hpc-workloads/</url></og><author xmlns:author="http://www.w3.org/2005/Atom"><name>Nishtha Jain</name><title>Engineering Manager</title><department></department><company></company></author><author xmlns:author="http://www.w3.org/2005/Atom"><name>Dan Eawaz</name><title>Senior Product Manager</title><department></department><company></company></author></item><item><title>Storage Insights datasets: How to optimize storage spend with deep visibility</title><link>https://cloud.google.com/blog/products/storage-data-transfer/storage-insights-datasets-optimizes-storage-footprint/</link><description>&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Managing vast amounts of data in cloud storage can be a challenge. While Google Cloud Storage offers strong scalability and durability, storage admins sometimes struggle with questions like: &lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;What’s driving my storage spend? &lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;Where is all my data in Cloud Storage and how is it distributed?&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;How can I search across my data for specific metadata such as age or size?&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;span&gt;&lt;span style="vertical-align: baseline;"&gt;Indeed, to achieve cost optimization, security, and compliance, you need to understand what you have, where it is, and how it's being used. That's where &lt;/span&gt;&lt;a href="https://cloud.google.com/storage/docs/insights/datasets"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Storage Insights datasets&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, a feature of &lt;/span&gt;&lt;a href="https://cloud.google.com/storage/docs/storage-intelligence/overview"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Storage Intelligence&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; for Cloud Storage, comes in. Storage Intelligence is a unified management product that offers multiple powerful capabilities to analyze large storage estates and easily take action. It helps you explore your data, optimize costs, enforce security, and implement governance policies. Storage Insights datasets help you deeply analyze your storage footprint, and you can use &lt;/span&gt;&lt;a href="https://cloud.google.com/storage/docs/analyze-data-gemini-cloud-assist"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Gemini Cloud Assist &lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;for quick analysis in natural language. 
Based on these analyses, you can take action, such as &lt;/span&gt;&lt;a href="https://cloud.google.com/blog/products/storage-data-transfer/storage-insights-datasets-optimizes-storage-footprint"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;relocating buckets&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; and performing large-scale &lt;/span&gt;&lt;a href="https://cloud.google.com/storage/docs/batch-operations/overview"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;batch operations&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;.&lt;/span&gt;&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;In this blog, we focus on how you can use Insights datasets for &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;cost management and visibility&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;, exploring a variety of common use cases. This is especially useful for cloud administrators and FinOps teams performing cloud cost allocation, monitoring and forecasting.&lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;What are Storage Insights datasets?&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Storage &lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;Insights datasets provide a powerful, automated way to gain deep visibility into your Cloud Storage data. Instead of manual scripts, custom one-off reports for buckets or managing your own collection pipelines, Storage Insights datasets generate comprehensive reports about your Cloud Storage objects and their activities, placing them directly in a BigQuery linked dataset.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Think of it as X-ray vision for your Cloud Storage buckets. It transforms raw storage metadata into structured, queryable data that you can analyze with familiar BigQuery tools to gain crucial insights, with automatic data refreshes delivered every 24hrs (after the initial set up, which could take up to 48hrs for the first load).&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Key features&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Customizable scope: &lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;Set the dataset scope to be at the level of the organization, a folder containing projects, a project / set of projects, or a specific bucket. &lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Metadata dataset: &lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;It provides a queryable dataset that contains bucket and object metadata directly in BigQuery.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Regular updates and retention: &lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;After the first load, datasets update with metadata every 24 hours and can retain data for up to 90 days.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;&lt;/div&gt;
&lt;div class="block-aside"&gt;&lt;dl&gt;
    &lt;dt&gt;aside_block&lt;/dt&gt;
    &lt;dd&gt;&amp;lt;ListValue: [StructValue([(&amp;#x27;title&amp;#x27;, &amp;#x27;Try Google Cloud for free&amp;#x27;), (&amp;#x27;body&amp;#x27;, &amp;lt;wagtail.rich_text.RichText object at 0x7f3f8ef78730&amp;gt;), (&amp;#x27;btn_text&amp;#x27;, &amp;#x27;&amp;#x27;), (&amp;#x27;href&amp;#x27;, &amp;#x27;&amp;#x27;), (&amp;#x27;image&amp;#x27;, None)])]&amp;gt;&lt;/dd&gt;
&lt;/dl&gt;&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Use cases&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Calculate routine showback&lt;br/&gt;&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;Understanding which teams or applications are consuming what storage is often the first step in effective cost management, especially for larger organizations. With Storage Insights datasets, your &lt;/span&gt;&lt;a href="https://cloud.google.com/storage/docs/insights/datasets#data-schema"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;object and bucket metadata&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; is available in BigQuery. You can run SQL queries to aggregate storage consumption by specific teams, projects, or applications. You can then attribute storage consumption by buckets or prefixes for internal chargeback or cost attribution, for example: “Department X used 50 TB of storage in &lt;/span&gt;&lt;code style="vertical-align: baseline;"&gt;gs://my-app-data/department-x/&lt;/code&gt;&lt;span style="vertical-align: baseline;"&gt; last month.” This transparency fosters accountability and enables accurate internal showback. &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Here’s an example SQL query to determine the total storage per bucket and prefix in the dataset:&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-code"&gt;&lt;dl&gt;
    &lt;dt&gt;code_block&lt;/dt&gt;
    &lt;dd&gt;&amp;lt;ListValue: [StructValue([(&amp;#x27;code&amp;#x27;, &amp;quot;SELECT\r\n  bucket,\r\n  SPLIT(name, &amp;#x27;/&amp;#x27;)[\r\nOFFSET\r\n  (0)] AS top_level_prefix,\r\n  SUM(size) AS total_size_bytes\r\nFROM\r\n object_attributes_view\r\nGROUP BY\r\n  bucket, top_level_prefix\r\nORDER BY\r\n  total_size_bytes DESC;\r\n-- Running queries in Datasets accrue BQ query costs, refer to the pricing details page for further details.&amp;quot;), (&amp;#x27;language&amp;#x27;, &amp;#x27;lang-sql&amp;#x27;), (&amp;#x27;caption&amp;#x27;, &amp;lt;wagtail.rich_text.RichText object at 0x7f3f8efadac0&amp;gt;)])]&amp;gt;&lt;/dd&gt;
&lt;/dl&gt;&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Understand how much data you have across storage classes&lt;br/&gt;&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;Storage Insights datasets identifies the storage class for every object in your buckets. By querying &lt;/span&gt;&lt;code style="vertical-align: baseline;"&gt;storageClass, timeCreated, updated &lt;/code&gt;&lt;span style="vertical-align: baseline;"&gt;in the object metadata view in BigQuery, you can quickly visualize your data distribution across various classes (standard, nearline, coldline, archive) for objects beyond a certain age, as well as when they were last updated. This lets you identify potentially misclassified data. It also provides valuable insights into whether you have entire buckets with coldline or archived data or if your objects unexpectedly moved across storage classes (for example, a file expected to be in archive is now in standard class) using the &lt;/span&gt;&lt;code style="vertical-align: baseline;"&gt;timeStorageClassUpdated&lt;/code&gt;&lt;span style="vertical-align: baseline;"&gt; object metadata.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Here’s an example SQL query to see all objects created two years ago, without any updates since and in standard class:&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-code"&gt;&lt;dl&gt;
    &lt;dt&gt;code_block&lt;/dt&gt;
    &lt;dd&gt;&amp;lt;ListValue: [StructValue([(&amp;#x27;code&amp;#x27;, &amp;quot;SELECT\r\n bucket,\r\n name,\r\n size,\r\n storageClass,\r\n timeCreated,\r\n updated\r\nFROM object_attributes_latest_snapshot_view\r\nWHERE\r\n EXTRACT(YEAR\r\n FROM\r\n   timeCreated) = EXTRACT(YEAR\r\n FROM\r\n   DATE_SUB(CURRENT_DATE(), INTERVAL 24 MONTH))\r\n AND (updated IS NULL\r\n   OR updated = timeCreated)\r\n AND storageClass = &amp;#x27;STANDARD&amp;#x27;\r\nORDER BY\r\n timeCreated;\r\n-- Running queries in Datasets accrue BQ query costs, refer to the pricing details page for further details.&amp;quot;), (&amp;#x27;language&amp;#x27;, &amp;#x27;lang-sql&amp;#x27;), (&amp;#x27;caption&amp;#x27;, &amp;lt;wagtail.rich_text.RichText object at 0x7f3f8efad4c0&amp;gt;)])]&amp;gt;&lt;/dd&gt;
&lt;/dl&gt;&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Set lifecycle and autoclass policies: Automating your savings&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Manual data management is time-consuming and prone to error. Storage Insights datasets helps you identify where the use of &lt;/span&gt;&lt;a href="https://cloud.google.com/storage/docs/lifecycle"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Object Lifecycle Management (OLM)&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; or &lt;/span&gt;&lt;a href="https://cloud.google.com/storage/docs/autoclass"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Autoclass&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; might reduce costs.&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Locate the buckets that don’t have OLM or Autoclass configured:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;  Through Storage Insights datasets, you can query bucket metadata to see which buckets lack defined lifecycle policies by using the field &lt;/span&gt;&lt;code style="vertical-align: baseline;"&gt;lifecycle, autoclass.enabled&lt;/code&gt;&lt;span style="vertical-align: baseline;"&gt;. If a bucket contains data that should naturally transition to colder storage or be deleted after a certain period, but has no policy, you can take the appropriate action by knowing which parts of your estate you need to investigate further. Storage Insights datasets provides the data to flag these "unmanaged" buckets, helping you enforce best practices.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Here’s an example SQL query to see all buckets with lifecycle or autoclass configurations enabled and all those without any active configuration:&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-code"&gt;&lt;dl&gt;
    &lt;dt&gt;code_block&lt;/dt&gt;
    &lt;dd&gt;&lt;pre&gt;&lt;code class="lang-sql"&gt;SELECT
 name AS bucket_name,
 storageClass AS default_class,
 CASE
   WHEN lifecycle = TRUE OR autoclass.enabled = TRUE THEN 'Managed'
   ELSE 'Unmanaged'
 END AS lifecycle_autoclass_status
FROM bucket_attributes_latest_snapshot_view;
-- Running queries in datasets accrues BigQuery query costs; refer to the pricing details page for further details.&lt;/code&gt;&lt;/pre&gt;&lt;/dd&gt;
&lt;/dl&gt;&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;ul&gt;
&lt;li role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Evaluate Autoclass impact:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Autoclass automatically transitions objects between storage classes based on a &lt;/span&gt;&lt;a href="https://cloud.google.com/storage/docs/autoclass#transitions"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;fixed access timeline&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;. But how do you know if it's working as expected or if further optimization is needed? With Storage Insights datasets, you can find the buckets with autoclass enabled using the &lt;/span&gt;&lt;code style="vertical-align: baseline;"&gt;autoclass.enabled&lt;/code&gt;&lt;span style="vertical-align: baseline;"&gt; field and analyze object metadata by tracking the &lt;/span&gt;&lt;code style="vertical-align: baseline;"&gt;storageClass, timeStorageClassUpdated &lt;/code&gt;&lt;span style="vertical-align: baseline;"&gt;field over time for&lt;/span&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt; specific objects &lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;within Autoclass-enabled buckets. This allows you to evaluate the effectiveness of Autoclass, verify if the objects specified are indeed moving to optimal classes, and understand the real-world impact on your costs. For example, once you configure Autoclass on a bucket, you can visualize the movement of your data between storage classes on Day 31 as compared to Day 1 and understand how autoclass policies take effect on your bucket. &lt;/span&gt;&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Evaluate Autoclass suitability:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Analyze your bucket’s data to determine if it’s appropriate to use Autoclass with it. For example, if you have short-lived data (less than 30 days old) in a bucket (you can assess objects in daily snapshots to determine the average life of an object in your bucket using &lt;/span&gt;&lt;code style="vertical-align: baseline;"&gt;timeCreated &lt;/code&gt;&lt;span style="vertical-align: baseline;"&gt;and&lt;/span&gt;&lt;code style="vertical-align: baseline;"&gt; timeDeleted&lt;/code&gt;&lt;span style="vertical-align: baseline;"&gt;), you may not want to turn on Autoclass.  &lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Here’s an example SQL query to find a count of all objects with age more than 30 days and age less than 30 days in bucketA and bucketB:&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-code"&gt;&lt;dl&gt;
    &lt;dt&gt;code_block&lt;/dt&gt;
    &lt;dd&gt;&lt;pre&gt;&lt;code class="lang-sql"&gt;SELECT
 SUM(
   CASE
     WHEN TIMESTAMP_DIFF(t1.timeDeleted, t1.timeCreated, DAY) &amp;lt; 30 THEN 1
     ELSE 0
   END) AS age_less_than_30_days,
 SUM(
   CASE
     -- &amp;gt;= (rather than &amp;gt;) so objects exactly 30 days old are counted
     WHEN TIMESTAMP_DIFF(t1.timeDeleted, t1.timeCreated, DAY) &amp;gt;= 30 THEN 1
     ELSE 0
   END) AS age_more_than_30_days
FROM
 `object_attributes_view` AS t1
WHERE
 t1.bucket IN ('bucketA', 'bucketB')
 AND t1.timeCreated IS NOT NULL
 AND t1.timeDeleted IS NOT NULL;
-- Running queries in datasets accrues BigQuery query costs; refer to the pricing details page for further details.&lt;/code&gt;&lt;/pre&gt;&lt;/dd&gt;
&lt;/dl&gt;&lt;/div&gt;
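&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;To evaluate Autoclass impact, a query along these lines can summarize how objects in an Autoclass-enabled bucket are distributed across storage classes; comparing the results across daily snapshots shows data moving to colder classes over time. This is a sketch: the bucket name is a placeholder, and it reuses the &lt;/span&gt;&lt;code style="vertical-align: baseline;"&gt;object_attributes_latest_snapshot_view&lt;/code&gt;&lt;span style="vertical-align: baseline;"&gt; fields shown above:&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-code"&gt;&lt;dl&gt;
    &lt;dt&gt;code_block&lt;/dt&gt;
    &lt;dd&gt;&lt;pre&gt;&lt;code class="lang-sql"&gt;SELECT
 storageClass,
 COUNT(*) AS object_count,
 SUM(size) AS total_bytes
FROM object_attributes_latest_snapshot_view
WHERE
 bucket = 'my-autoclass-bucket'  -- placeholder bucket name
GROUP BY storageClass
ORDER BY total_bytes DESC;
-- Running queries in datasets accrues BigQuery query costs.&lt;/code&gt;&lt;/pre&gt;&lt;/dd&gt;
&lt;/dl&gt;&lt;/div&gt;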
&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Proactive cleanup and optimization&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Beyond routine management, Storage Insights datasets can help you proactively find and eliminate wasted storage.&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Quickly find duplicate objects:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Accidental duplicates are a common cause of wasted storage. You can use object metadata like &lt;/span&gt;&lt;code style="vertical-align: baseline;"&gt;size&lt;/code&gt;&lt;span style="vertical-align: baseline;"&gt;, &lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;name&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt; or even &lt;/span&gt;&lt;code style="vertical-align: baseline;"&gt;crc32c&lt;/code&gt;&lt;span style="vertical-align: baseline;"&gt; checksums in your BigQuery queries to identify potential duplicates. For example, finding multiple objects with the exact same size, checksum and similar names might indicate redundancy, prompting further investigation.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Here’s an example SQL query to list all objects where their size, &lt;/span&gt;&lt;a href="https://cloud.google.com/storage/docs/json_api/v1/objects#resource-representations"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;crc32c checksum field&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; and name are the same values (indicating potential duplicates):&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-code"&gt;&lt;dl&gt;
    &lt;dt&gt;code_block&lt;/dt&gt;
    &lt;dd&gt;&lt;pre&gt;&lt;code class="lang-sql"&gt;SELECT
 name,
 bucket,
 timeCreated,
 crc32c,
 size
FROM (
 SELECT
   name,
   bucket,
   timeCreated,
   crc32c,
   size,
   COUNT(*) OVER (PARTITION BY name, size, crc32c) AS duplicate_count
 FROM
   `object_attributes_latest_snapshot_view` )
WHERE
 duplicate_count &amp;gt; 1
ORDER BY
 size DESC;
-- Running queries in datasets accrues BigQuery query costs; refer to the pricing details page for further details.&lt;/code&gt;&lt;/pre&gt;&lt;/dd&gt;
&lt;/dl&gt;&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Find temporary objects to be cleaned up:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Many applications generate temporary files that, if not deleted, accumulate over time. Storage Insights datasets allows you to query for objects matching specific naming conventions (e.g., &lt;/span&gt;&lt;code style="vertical-align: baseline;"&gt;*_temp&lt;/code&gt;&lt;span style="vertical-align: baseline;"&gt;, &lt;/span&gt;&lt;code style="vertical-align: baseline;"&gt;*.tmp&lt;/code&gt;&lt;span style="vertical-align: baseline;"&gt;), or located in "temp" prefixes, along with their creation dates. This enables you to systematically identify and clean up orphaned temporary data, freeing up valuable storage space.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Here’s an example SQL query to find all log files created a month ago:&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-code"&gt;&lt;dl&gt;
    &lt;dt&gt;code_block&lt;/dt&gt;
    &lt;dd&gt;&lt;pre&gt;&lt;code class="lang-sql"&gt;SELECT
 name,
 bucket,
 timeCreated,
 size
FROM
 `object_attributes_latest_snapshot_view`
WHERE
 name LIKE '%.log'
 AND DATE(timeCreated) &amp;lt;= DATE_SUB(CURRENT_DATE(), INTERVAL 1 MONTH)
ORDER BY
 size DESC;
-- Running queries in datasets accrues BigQuery query costs; refer to the pricing details page for further details.&lt;/code&gt;&lt;/pre&gt;&lt;/dd&gt;
&lt;/dl&gt;&lt;/div&gt;
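&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;The same approach works for temporary objects matched by naming convention. The following is a sketch; the patterns and the "temp/" prefix are illustrative, so adjust them to your own conventions:&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-code"&gt;&lt;dl&gt;
    &lt;dt&gt;code_block&lt;/dt&gt;
    &lt;dd&gt;&lt;pre&gt;&lt;code class="lang-sql"&gt;SELECT
 name,
 bucket,
 timeCreated,
 size
FROM
 `object_attributes_latest_snapshot_view`
WHERE
 (name LIKE '%.tmp'       -- illustrative suffix pattern
   OR name LIKE '%_temp'  -- illustrative suffix pattern
   OR name LIKE 'temp/%') -- illustrative "temp" prefix
ORDER BY
 size DESC;
-- Running queries in datasets accrues BigQuery query costs.&lt;/code&gt;&lt;/pre&gt;&lt;/dd&gt;
&lt;/dl&gt;&lt;/div&gt;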
&lt;div class="block-paragraph_advanced"&gt;&lt;ul&gt;
&lt;li role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;List all objects older than a certain date for easy actioning:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Need to archive or delete all images older than five years for compliance? Or perhaps you need to clean up logs that are older than 90 days? Storage Insights datasets provides &lt;/span&gt;&lt;code style="vertical-align: baseline;"&gt;timeCreated &lt;/code&gt;&lt;span style="vertical-align: baseline;"&gt;and&lt;/span&gt;&lt;code style="vertical-align: baseline;"&gt; contentType&lt;/code&gt;&lt;span style="vertical-align: baseline;"&gt; for every object. A simple BigQuery query can list all objects older than your specified date, giving you a clear, actionable list of objects for further investigation. You can use Storage Intelligence &lt;/span&gt;&lt;a href="https://cloud.google.com/storage/docs/batch-operations/overview"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;batch operations&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, which allows you to action on billions of objects in a serverless manner.&lt;/span&gt;&lt;/li&gt;
&lt;li role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Check SoftDelete suitability:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Find buckets that have a high storage size of data that has been soft deleted by querying for the presence of &lt;/span&gt;&lt;code style="vertical-align: baseline;"&gt;softDeleteTime &lt;/code&gt;&lt;span style="vertical-align: baseline;"&gt;and&lt;/span&gt;&lt;code style="vertical-align: baseline;"&gt; size &lt;/code&gt;&lt;span style="vertical-align: baseline;"&gt;in the object metadata tables. In those cases, data seems temporary and you may need to investigate &lt;/span&gt;&lt;a href="https://cloud.google.com/storage/docs/soft-delete#cost-optimization"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;soft delete cost optimization&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; opportunities. &lt;/span&gt;&lt;/li&gt;
&lt;/ul&gt;
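&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;For example, a query along these lines lists image objects older than five years as candidates for archiving or deletion. This is a sketch; the &lt;/span&gt;&lt;code style="vertical-align: baseline;"&gt;contentType&lt;/code&gt;&lt;span style="vertical-align: baseline;"&gt; filter and the five-year cutoff are illustrative:&lt;/span&gt;&lt;/p&gt;
&lt;pre&gt;&lt;code class="lang-sql"&gt;SELECT
 bucket,
 name,
 contentType,
 timeCreated,
 size
FROM object_attributes_latest_snapshot_view
WHERE
 contentType LIKE 'image/%'  -- illustrative content type filter
 AND DATE(timeCreated) &amp;lt;= DATE_SUB(CURRENT_DATE(), INTERVAL 5 YEAR)
ORDER BY
 timeCreated;&lt;/code&gt;&lt;/pre&gt;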
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Taking your analysis further&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;The true power of Storage Intelligence Insights datasets lies not just in the raw data it provides, but in the insights you can derive and the subsequent actions you can take. Once your Cloud Storage metadata is in BigQuery, the possibilities for advanced analysis and integration are vast.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;For example, you can use Looker Studio, Google Cloud's no-cost data visualization and dashboarding tool, to directly connect to your BigQuery Insights datasets, transforming complex queries into intuitive, interactive dashboards. Now you can:&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Visualize cost trends:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Create dashboards that show storage consumption by project, department, or storage class over time. This allows teams to easily track spending, identify spikes, and forecast future costs.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Track fast-growing buckets:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Analyze the buckets with the most growth in the past week or month, and compare them against known projects for accurate cost attribution. Use Looker's alerting capabilities to notify you when certain thresholds are met, such as a sudden increase in the total size of data in a bucket.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Set up custom charts for common analysis: &lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;For routine FinOps use cases (such as tracking buckets without OLM policies configured or objects past their retention expiration time), you can generate weekly reports to relevant teams for easy actioning.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;You can also use our template &lt;/span&gt;&lt;a href="https://lookerstudio.google.com/c/u/0/reporting/670eee3f-ad6d-45ea-a169-853ab023dc84/page/p_k94oydxikd" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;here&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; to connect to your dataset for quick analysis or you can create your own custom dashboard. &lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-image_full_width"&gt;






  
    &lt;div class="article-module h-c-page"&gt;
      &lt;div class="h-c-grid"&gt;
  

    &lt;figure class="article-image--large
      
      
        h-c-grid__col
        h-c-grid__col--6 h-c-grid__col--offset-3
        
        
      "
      &gt;

      
      
        
        &lt;img
            src="https://storage.googleapis.com/gweb-cloudblog-publish/images/1_06D3hbo.max-1000x1000.png"
        
          alt="1"&gt;
        
      
    &lt;/figure&gt;

  
      &lt;/div&gt;
    &lt;/div&gt;
  




&lt;/div&gt;
&lt;div class="block-image_full_width"&gt;






  
    &lt;div class="article-module h-c-page"&gt;
      &lt;div class="h-c-grid"&gt;
  

    &lt;figure class="article-image--large
      
      
        h-c-grid__col
        h-c-grid__col--6 h-c-grid__col--offset-3
        
        
      "
      &gt;

      
      
        
        &lt;img
            src="https://storage.googleapis.com/gweb-cloudblog-publish/images/2_OQSj3ao.max-1000x1000.png"
        
          alt="2"&gt;
        
      
    &lt;/figure&gt;

  
      &lt;/div&gt;
    &lt;/div&gt;
  




&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Get started&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Configure Storage Intelligence and create your dataset to start analyzing your storage estate via a &lt;/span&gt;&lt;a href="https://cloud.google.com/storage/docs/storage-intelligence/30-day-introductory-trial/overview"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;30-day trial&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; today. Please refer to our &lt;/span&gt;&lt;a href="https://cloud.google.com/storage/pricing#storage-intelligence"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;pricing documentation&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; for cost details. &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Set up your dataset to a scope of your choosing and start analyzing your data: &lt;/span&gt;&lt;/p&gt;
&lt;ol&gt;
&lt;li aria-level="1" style="list-style-type: decimal; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;Configure a set of Looker Studio dashboards based on team or departmental usage for monthly analysis by the central FinOps team.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: decimal; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;Use BigQuery for ad-hoc analysis and to retrieve specific insights.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: decimal; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;For a complete cost picture, you can integrate your Storage Insights dataset with your Google Cloud billing export to BigQuery. Your billing export provides granular details on all your Google Cloud service costs, including Cloud Storage.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;&lt;/div&gt;</description><pubDate>Wed, 27 Aug 2025 17:00:00 +0000</pubDate><guid>https://cloud.google.com/blog/products/storage-data-transfer/storage-insights-datasets-optimizes-storage-footprint/</guid><category>BigQuery</category><category>Storage &amp; Data Transfer</category><og xmlns:og="http://ogp.me/ns#"><type>article</type><title>Storage Insights datasets: How to optimize storage spend with deep visibility</title><description></description><site_name>Google</site_name><url>https://cloud.google.com/blog/products/storage-data-transfer/storage-insights-datasets-optimizes-storage-footprint/</url></og><author xmlns:author="http://www.w3.org/2005/Atom"><name>Misha Sheth</name><title>Product Manager, Storage</title><department></department><company></company></author><author xmlns:author="http://www.w3.org/2005/Atom"><name>Chris Madden</name><title>EMEA Solution Lead, Storage</title><department></department><company></company></author></item><item><title>Immutable, Air-Gapped, and Integrated: Data Protection for your Cloud SQL instances just got better</title><link>https://cloud.google.com/blog/products/databases/introducing-enhanced-backups-for-cloud-sql/</link><description>&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span&gt;&lt;strong style="font-style: italic; vertical-align: baseline;"&gt;December 17, 2025:&lt;/strong&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt; The Enhanced Backups for Cloud SQL capability is now generally available to protect the data in your production Cloud SQL instances. Additional features include support for Terraform, and billing capabilities. &lt;/span&gt;&lt;/span&gt;&lt;/p&gt;
&lt;hr/&gt;
&lt;p&gt;In a world where data is your most valuable asset, protecting it isn’t just a nice-to-have — it's a necessity. That's why we are thrilled to announce a significant leap forward in protecting the data in your Cloud SQL instances, with &lt;strong&gt;Enhanced Backups for Cloud SQL&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;This powerful new capability integrates &lt;a href="https://cloud.google.com/backup-disaster-recovery?e=48754805&amp;amp;hl=en"&gt;Google Cloud Backup and DR Service&lt;/a&gt; directly into Cloud SQL, providing a robust, centralized, and secure solution to help ensure business continuity for your database workloads. The Backup and DR Service already protects Compute Engine VMs, Persistent Disks, and Hyperdisk; this integration extends that protection to your Cloud SQL workloads.&lt;/p&gt;
&lt;h3&gt;Modern defense for modern threats&lt;/h3&gt;
&lt;p&gt;Enhanced Backups for Cloud SQL provides advanced protection by storing database backups in logically air-gapped and immutable backup vaults. Managed by Google and completely separate from your source project, these vaults provide a critical defense against threats that could compromise your entire environment.&lt;/p&gt;
&lt;p&gt;For customers like &lt;strong&gt;JFrog&lt;/strong&gt;, Cloud SQL Enhanced Backup with Google Cloud Backup and DR is proving to be a superior and robust alternative:&lt;/p&gt;
&lt;p style="padding-left: 40px;"&gt;&lt;em&gt;"Using this integration will help us significantly bolster our security posture by offering logically air-gapped and immutable backup vaults, creating a vital defense layer against diverse data-loss scenarios.” - &lt;strong&gt;Shiran Melamed, DevOps Group Leader, JFrog&lt;/strong&gt;&lt;/em&gt;&lt;/p&gt;
&lt;h3&gt;Control, compliance, and peace of mind&lt;/h3&gt;
&lt;p&gt;We designed Enhanced Backups to be both powerful and easy to use, giving you fine-grained control over your data protection strategy. These capabilities are now available in Preview for both Cloud SQL Enterprise and Enterprise Plus editions, and offer key features to help ensure your data is always secure and recoverable:&lt;/p&gt;
&lt;ul&gt;
&lt;li aria-level="1"&gt;
&lt;p role="presentation"&gt;&lt;strong&gt;Immutable, air-gapped vaults&lt;/strong&gt;: Protect your data with immutable backups stored in a secure, logically air-gapped vault. Setting minimum enforced retention and retention locks ensure backups cannot be deleted or changed for a predefined period, while a zero-trust access policy provides granular control.&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1"&gt;
&lt;p role="presentation"&gt;&lt;strong&gt;Business continuity&lt;/strong&gt;: Your data is safeguarded against both source-instance and source-project deletion, so you can recover your data even if the source project itself becomes unavailable.&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1"&gt;
&lt;p role="presentation"&gt;&lt;strong&gt;Flexible policies that fit your needs&lt;/strong&gt;: Your business isn't one-size-fits-all, and your backup strategy shouldn't be either. We offer highly customizable backup schedules, including hourly, daily, weekly, monthly, and yearly options. You can store backups for periods ranging from days to decades.&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1"&gt;
&lt;p role="presentation"&gt;&lt;strong&gt;Centralized command and control&lt;/strong&gt;: Manage everything from a single, unified dashboard in the Google Cloud console. Monitor job status, identify unprotected resources, and generate reports, all in one place.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;But you don't have to take our word for it. See how customers like SQUARE ENIX and Rotoplas are already benefiting from Enhanced Backups for Cloud SQL:&lt;/p&gt;
&lt;p style="padding-left: 40px;"&gt;&lt;em&gt;"At SQUARE ENIX, protecting our users' data is paramount. Google Cloud SQL's Enhanced Backup integrated with the Backup and DR service is essential to our resiliency strategy. Its robust protection against instance- and even project-level deletion, combined with a secure, isolated vault and long-term retention, provides a critical safeguard for our most valuable asset. This capability will give us confidence in our data's integrity and recoverability, allowing our teams to focus on creating the unforgettable experiences our users expect." – &lt;strong&gt;Kazutaka Iga, SRE,SQUARE ENIX &lt;/strong&gt;&lt;/em&gt;&lt;/p&gt;
&lt;p style="padding-left: 40px;"&gt;&lt;em&gt;"Google Cloud SQL's Enhanced Backup feature along with Google Professional Services support is a value add to our backup strategy at Rotoplas. The ability to centralize management, flexibly schedule backups, and store them independent of the source project gives us unprecedented control. This streamlined approach simplifies our operations and enhances security, ensuring our data is always protected and easily recoverable." - &lt;strong&gt;Agustín Chávez Cabrera, Devops manager, Rotoplas&lt;/strong&gt;&lt;/em&gt;&lt;/p&gt;
&lt;h3&gt;Get started with Enhanced Backups&lt;/h3&gt;
&lt;p&gt;Getting started with Enhanced Backups is simple. Here’s how you can enable this enhanced protection for your Cloud SQL instances:&lt;/p&gt;
&lt;p&gt;1. &lt;strong&gt;Create or select a backup vault&lt;/strong&gt;: In the Backup and DR service, either create a new backup vault or use an existing one.&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-image_full_width"&gt;






  
    &lt;div class="article-module h-c-page"&gt;
      &lt;div class="h-c-grid"&gt;
  

    &lt;figure class="article-image--medium
      
      
        h-c-grid__col
        
        h-c-grid__col--4 h-c-grid__col--offset-4
        
      "
      &gt;

      
      
        
        &lt;img
            src="https://storage.googleapis.com/gweb-cloudblog-publish/original_images/1_create_vault.gif"
        
          alt="1"&gt;
        
      
    &lt;/figure&gt;

  
      &lt;/div&gt;
    &lt;/div&gt;
  




&lt;/div&gt;
&lt;div class="block-paragraph"&gt;&lt;p data-block-key="prq2i"&gt;2. &lt;b&gt;Create a backup plan:&lt;/b&gt; Define a backup plan for Cloud SQL within your chosen backup vault, setting your desired backup frequency and retention rules.&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-image_full_width"&gt;






  
    &lt;div class="article-module h-c-page"&gt;
      &lt;div class="h-c-grid"&gt;
  

    &lt;figure class="article-image--medium
      
      
        h-c-grid__col
        
        h-c-grid__col--4 h-c-grid__col--offset-4
        
      "
      &gt;

      
      
        
        &lt;img
            src="https://storage.googleapis.com/gweb-cloudblog-publish/original_images/2_create_plan.gif"
        
          alt="2"&gt;
        
      
    &lt;/figure&gt;

  
      &lt;/div&gt;
    &lt;/div&gt;
  




&lt;/div&gt;
&lt;div class="block-paragraph"&gt;&lt;p data-block-key="prq2i"&gt;3. &lt;b&gt;Apply the backup plan to the Cloud SQL instances:&lt;/b&gt; Apply your new backup plan to existing or new Cloud SQL instances.&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-image_full_width"&gt;






  
    &lt;div class="article-module h-c-page"&gt;
      &lt;div class="h-c-grid"&gt;
  

    &lt;figure class="article-image--medium
      
      
        h-c-grid__col
        
        h-c-grid__col--4 h-c-grid__col--offset-4
        
      "
      &gt;

      
      
        
        &lt;img
            src="https://storage.googleapis.com/gweb-cloudblog-publish/original_images/3_apply_plan.gif"
        
          alt="3"&gt;
        
      
    &lt;/figure&gt;

  
      &lt;/div&gt;
    &lt;/div&gt;
  




&lt;/div&gt;
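&lt;div class="block-paragraph"&gt;&lt;p&gt;The three steps above can also be scripted. The sketch below uses the &lt;code&gt;gcloud backup-dr&lt;/code&gt; command group; the vault, plan, and association names are placeholders, and the exact flags (including the required backup rules for the plan) vary, so check each command's &lt;code&gt;--help&lt;/code&gt; output before running it.&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-code"&gt;&lt;dl&gt;
    &lt;dt&gt;code_block&lt;/dt&gt;
    &lt;dd&gt;&lt;pre&gt;&lt;code class="lang-bash"&gt;# Step 1: Create a backup vault (name and location are placeholders)
gcloud backup-dr backup-vaults create my-vault \
    --location=us-central1

# Step 2: Create a backup plan for Cloud SQL instances in that vault
# (required backup-rule flags omitted for brevity; see --help)
gcloud backup-dr backup-plans create my-sql-plan \
    --location=us-central1 \
    --backup-vault=my-vault \
    --resource-type=sqladmin.googleapis.com/Instance

# Step 3: Apply the plan to an instance via a backup plan association
gcloud backup-dr backup-plan-associations create my-association \
    --location=us-central1 \
    --backup-plan=my-sql-plan \
    --resource=CLOUD_SQL_INSTANCE_RESOURCE_NAME&lt;/code&gt;&lt;/pre&gt;&lt;/dd&gt;
&lt;/dl&gt;&lt;/div&gt;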
&lt;div class="block-paragraph"&gt;&lt;p data-block-key="prq2i"&gt;Once you apply a backup plan, your backups will automatically be scheduled and moved to the secure backup vault based on the rules you defined. The entire experience can be managed through the tools you already use — whether it's the Google Cloud console, gcloud command-line tool, or APIs — so there’s no additional infrastructure for you to deploy or manage.&lt;/p&gt;&lt;h3 data-block-key="2nim0"&gt;&lt;b&gt;Protect your data now&lt;/b&gt;&lt;/h3&gt;&lt;p data-block-key="cgrbf"&gt;With Enhanced Backups for Cloud SQL, you can build a superior data protection strategy that enhances security, simplifies operations, and strengthens your overall data resilience for Cloud SQL instances.&lt;/p&gt;&lt;p data-block-key="8mhrm"&gt;Get started and use it yourself. The new features are available now in &lt;a href="https://cloud.google.com/backup-disaster-recovery/docs/concepts/backup-vault#regions"&gt;supported regions&lt;/a&gt;.&lt;/p&gt;&lt;ul&gt;&lt;li data-block-key="6g0tm"&gt;Experience the new management solution in the &lt;a href="https://cloud.google.com/cloud-console"&gt;console&lt;/a&gt;.&lt;/li&gt;&lt;li data-block-key="btnve"&gt;Watch &lt;a href="https://www.youtube.com/watch?v=LFo2fCobOLY" target="_blank"&gt;this demo video&lt;/a&gt; and see the new features in action.&lt;/li&gt;&lt;li data-block-key="dsmv3"&gt;Explore the documentation to learn more about &lt;a href="https://cloud.google.com/sql/docs/mysql/backup-recovery/backups"&gt;Enhanced Backups for Cloud SQL&lt;/a&gt;, &lt;a href="https://cloud.google.com/backup-disaster-recovery/docs/cloud-console/compute/disk-backup"&gt;disk backups&lt;/a&gt;, and &lt;a href="https://cloud.google.com/backup-disaster-recovery/docs/cloud-console/compute/compute-instance-backup"&gt;VM backups&lt;/a&gt;. 
today.&lt;/li&gt;&lt;/ul&gt;&lt;/div&gt;</description><pubDate>Wed, 06 Aug 2025 16:00:00 +0000</pubDate><guid>https://cloud.google.com/blog/products/databases/introducing-enhanced-backups-for-cloud-sql/</guid><category>Cloud SQL</category><category>Storage &amp; Data Transfer</category><category>Databases</category><og xmlns:og="http://ogp.me/ns#"><type>article</type><title>Immutable, Air-Gapped, and Integrated: Data Protection for your Cloud SQL instances just got better</title><description></description><site_name>Google</site_name><url>https://cloud.google.com/blog/products/databases/introducing-enhanced-backups-for-cloud-sql/</url></og><author xmlns:author="http://www.w3.org/2005/Atom"><name>Vinod Kumar Subramanian</name><title>Senior Product Manager, Cloud SQL</title><department></department><company></company></author><author xmlns:author="http://www.w3.org/2005/Atom"><name>Lisha Sinha</name><title>Product Manager, Backup and DR</title><department></department><company></company></author></item><item><title>Enhancing GKE data protection with cross-project backup and restore</title><link>https://cloud.google.com/blog/products/storage-data-transfer/backup-for-gke-supports-cross-project-backup-and-restore/</link><description>&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;As Google Kubernetes Engine (GKE) deployments grow and scale, adopting a multi-project strategy in Google Cloud becomes a best practice for security and environment organization. Creating clear boundaries by using distinct projects for development, testing, and production environments provides isolation and helps manage access control.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;However, isolation introduces a data protection challenge: How do you effectively manage backups across these project boundaries? Without a native solution, centralizing backups, ensuring a clear separation of duties with IAM, and enabling robust disaster recovery all become  complex tasks, often forcing teams to rely on custom scripts or inefficient manual processes.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;h3&gt;&lt;span style="vertical-align: baseline;"&gt;Introducing cross-project backup and restore&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;To address this, Backup for GKE, now in preview, supports cross-project backup and restore. This new capability allows you to back up workloads from a GKE cluster in one Google Cloud project, securely store the backups in a second, and restore them to a cluster in a third. This streamlines data protection, enhances your security posture, and offers greater flexibility for your operational workflows.&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;Storing backups in a separate, isolated project and region is essential for modern disaster recovery, safeguarding your recovery capability during a regional outage or a compromise in a primary Google Cloud project — the foundation of a resilient infrastructure. This separation also simplifies regulatory compliance, boosts security by limiting the blast radius of any potential incident, and helps you meet RTO/RPO objectives.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-image_full_width"&gt;






  
    &lt;div class="article-module h-c-page"&gt;
      &lt;div class="h-c-grid"&gt;
  

    &lt;figure class="article-image--large
      
      
        h-c-grid__col
        h-c-grid__col--6 h-c-grid__col--offset-3
        
        
      "
      &gt;

      
      
        
        &lt;img
            src="https://storage.googleapis.com/gweb-cloudblog-publish/images/image1_LRWNdWl.max-1000x1000.png"
        
          alt="image1"&gt;
        
        &lt;/a&gt;
      
    &lt;/figure&gt;

  
      &lt;/div&gt;
    &lt;/div&gt;
  




&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Key benefits of cross-project backup and restore &lt;/strong&gt;&lt;/h3&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Centralized backup management:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Consolidate GKE backups from multiple Google Cloud projects into a single project by pointing the &lt;/span&gt;&lt;a href="https://cloud.google.com/kubernetes-engine/docs/add-on/backup-for-gke/how-to/cross-project-backups"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;backup plan&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; for each cluster to the chosen backup project. This simple configuration provides your team with one control plane to oversee monitoring and manage backup policies.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Enhanced disaster recovery:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Storing GKE backups in a separate project and region provides a vital layer of isolation, boosting your resilience against events like regional outages. If your source region becomes unavailable, you can create a &lt;/span&gt;&lt;a href="https://cloud.google.com/kubernetes-engine/docs/add-on/backup-for-gke/how-to/cross-project-restores"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;restore plan&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; from your backup project to recover your workloads to a cluster in another project.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Streamline operations: seeding, cloning, and collaboration&lt;br/&gt;&lt;br/&gt;&lt;/strong&gt;&lt;span style="font-family: -apple-system, BlinkMacSystemFont, 'Segoe UI', Roboto, Oxygen, Ubuntu, Cantarell, 'Open Sans', 'Helvetica Neue', sans-serif;"&gt;Cross-project capabilities bring agility to your development lifecycle by simplifying how you copy data between environments. You can now leverage production backup data for testing or rapidly clone entire application environments.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;ul&gt;
&lt;li style="list-style-type: none;"&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Seed and clone environments:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; You can populate a staging environment with data from a prior backup or create a sandbox. Create a &lt;/span&gt;&lt;a href="https://cloud.google.com/kubernetes-engine/docs/add-on/backup-for-gke/how-to/cross-project-restores"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;restore plan&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; using an existing backup plan located in the backup project, then select a backup — such as one from production for seeding or a dev environment for cloning — and target a cluster in any other project as your destination. This lets you create test environments and isolated sandboxes.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Simplify cross-team collaboration:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Since all backups are stored in a central backup project, you can grant a developer from another team a role like Delegated Restore Admin, and also provide them with read permission on the specific backup plan and all of its associated backups. They can then use it to restore to their cluster without needing access to the other team's live source project.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Achieve separation of duties for security and compliance&lt;br/&gt;&lt;br/&gt;&lt;/strong&gt;&lt;span style="font-family: -apple-system, BlinkMacSystemFont, 'Segoe UI', Roboto, Oxygen, Ubuntu, Cantarell, 'Open Sans', 'Helvetica Neue', sans-serif;"&gt;Isolating backups in a dedicated project allows you to enforce the principle of least privilege by assigning distinct responsibilities. You can empower your application teams with self-service permissions to back up and restore applications within their own projects, without giving them control over the central backup repository. A central platform or operations team can be granted administrative control over the backup project to govern the entire data lifecycle — from setting retention policies with immutability to conducting audits, all without needing access to live production environments. This separation is key to reducing risk and simplifying audits.&lt;br/&gt;&lt;br/&gt;&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;For detailed guidance on Backup for GKE IAM roles and permissions, see the&lt;/span&gt;&lt;a href="https://cloud.google.com/kubernetes-engine/docs/add-on/backup-for-gke/how-to/roles" style="font-family: -apple-system, BlinkMacSystemFont, 'Segoe UI', Roboto, Oxygen, Ubuntu, Cantarell, 'Open Sans', 'Helvetica Neue', sans-serif;"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt; &lt;/span&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;documentation&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
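&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;As a rough sketch of the centralized setup described above — the project IDs, cluster name, and schedule here are hypothetical, and exact flags may vary by gcloud release — a backup plan can live in a dedicated backup project while referencing a cluster in a separate source project:&lt;/span&gt;&lt;/p&gt;

```shell
# Hypothetical sketch: create a backup plan in a central backup project
# (backup-project) that protects a cluster in a separate source project.
gcloud beta container backup-restore backup-plans create prod-daily-plan \
  --project=backup-project \
  --location=us-central1 \
  --cluster=projects/source-project/locations/us-central1/clusters/prod-cluster \
  --all-namespaces \
  --include-secrets \
  --include-volume-data \
  --cron-schedule="0 2 * * *" \
  --backup-retain-days=30
```

&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;A restore plan created the same way in a third project can then reference this backup plan as its source, per the cross-project restore guide linked above.&lt;/span&gt;&lt;/p&gt;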
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Cross-project backup and restore for GKE helps you protect your containerized workloads across multiple Google Cloud projects. This feature allows you to strengthen your disaster recovery capabilities, improve your security posture, and streamline operational workflows.&lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;span style="vertical-align: baseline;"&gt;Get started today&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;This feature is now available in preview. To get started, check out the following guides:&lt;/p&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;Learn how to perform &lt;/span&gt;&lt;a href="https://cloud.google.com/kubernetes-engine/docs/add-on/backup-for-gke/how-to/cross-project-backups"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;cross-project backups&lt;/span&gt;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;Learn how to perform &lt;/span&gt;&lt;a href="https://cloud.google.com/kubernetes-engine/docs/add-on/backup-for-gke/how-to/cross-project-restores" style="font-family: -apple-system, BlinkMacSystemFont, 'Segoe UI', Roboto, Oxygen, Ubuntu, Cantarell, 'Open Sans', 'Helvetica Neue', sans-serif;"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;cross-project restores&lt;/span&gt;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;&lt;/div&gt;
&lt;div class="block-aside"&gt;&lt;dl&gt;
    &lt;dt&gt;aside_block&lt;/dt&gt;
    &lt;dd&gt;&amp;lt;ListValue: [StructValue([(&amp;#x27;title&amp;#x27;, &amp;#x27;Try Google Cloud for free&amp;#x27;), (&amp;#x27;body&amp;#x27;, &amp;lt;wagtail.rich_text.RichText object at 0x7f3f8f71f190&amp;gt;), (&amp;#x27;btn_text&amp;#x27;, &amp;#x27;Get started for free&amp;#x27;), (&amp;#x27;href&amp;#x27;, &amp;#x27;https://console.cloud.google.com/freetrial?redirectPath=/welcome&amp;#x27;), (&amp;#x27;image&amp;#x27;, None)])]&amp;gt;&lt;/dd&gt;
&lt;/dl&gt;&lt;/div&gt;</description><pubDate>Fri, 11 Jul 2025 16:00:00 +0000</pubDate><guid>https://cloud.google.com/blog/products/storage-data-transfer/backup-for-gke-supports-cross-project-backup-and-restore/</guid><category>Containers &amp; Kubernetes</category><category>GKE</category><category>Storage &amp; Data Transfer</category><og xmlns:og="http://ogp.me/ns#"><type>article</type><title>Enhancing GKE data protection with cross-project backup and restore</title><description></description><site_name>Google</site_name><url>https://cloud.google.com/blog/products/storage-data-transfer/backup-for-gke-supports-cross-project-backup-and-restore/</url></og><author xmlns:author="http://www.w3.org/2005/Atom"><name>Ranjith Kumar Palthi</name><title>Product Manager</title><department></department><company></company></author></item><item><title>Cloud Storage bucket relocation: An industry first for non-disruptive bucket migrations</title><link>https://cloud.google.com/blog/products/storage-data-transfer/introducing-cloud-storage-bucket-relocation/</link><description>&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;As your operational needs change, sometimes you need to move data residing within Google’s Cloud Storage to a new location, to improve resilience, optimize performance, meet compliance needs, or simply to reorganize your infrastructure. Yet moving buckets can be a daunting, complex, risky endeavor that involves manual scripting, painstaking coordination, and the risk of data loss, or worse yet, extended downtime. This can discourage organizations from making the changes they need to their storage environments.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;We recently introduced&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt; &lt;/span&gt;&lt;a href="https://cloud.google.com/storage/docs/bucket-relocation/overview"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Cloud Storage bucket relocation&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, a unique feature among leading hyperscalers &lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;that makes it easy  to change your bucket’s location. Bucket relocation eliminates the need for complex manual planning and helps prevent extended downtime, for an easy transition with minimal application disruption, and strong &lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;data integrity&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;. Your bucket's name, and all the object metadata within it, remain identical throughout the relocation, so there are no path changes, and your applications experience minimal downtime while the underlying storage is moved. Furthermore, your objects retain their original storage class (e.g., Standard, Nearline, Coldline, Archive) and time-in-class in the new location. This is key for many cost efficiency strategies, helping ensure capabilities such as Autoclass continue to operate intelligently to optimize your storage costs post-migration.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Bucket relocation is a key capability within &lt;/span&gt;&lt;a href="https://cloud.google.com/storage/docs/storage-intelligence/overview"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;the Storage Intelligence suite&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, alongside tools like Storage Insights, which  provides deep visibility into your storage landscape and identifies optimization opportunities. Bucket relocation then lets you act on these insights, and move your data between diverse &lt;/span&gt;&lt;a href="https://cloud.google.com/storage/docs/locations"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Cloud Storage locations&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; — regional locations for low latency, dual-regions for high availability and disaster recovery, or multi-regions for global accessibility — to meet your  business, performance, and compliance objectives.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-aside"&gt;&lt;dl&gt;
    &lt;dt&gt;aside_block&lt;/dt&gt;
    &lt;dd&gt;&amp;lt;ListValue: [StructValue([(&amp;#x27;title&amp;#x27;, &amp;#x27;Try Google Cloud for free&amp;#x27;), (&amp;#x27;body&amp;#x27;, &amp;lt;wagtail.rich_text.RichText object at 0x7f3f8f664580&amp;gt;), (&amp;#x27;btn_text&amp;#x27;, &amp;#x27;Get started for free&amp;#x27;), (&amp;#x27;href&amp;#x27;, &amp;#x27;https://console.cloud.google.com/freetrial?redirectPath=/welcome&amp;#x27;), (&amp;#x27;image&amp;#x27;, None)])]&amp;gt;&lt;/dd&gt;
&lt;/dl&gt;&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;h3&gt;&lt;span style="vertical-align: baseline;"&gt;Bucket relocation under the hood&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Bucket relocation relies on two critical techniques. &lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Asynchronous data copy:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Bucket relocation leverages a unique and optimized asynchronous data transfer mechanism that copies data in the background to  minimize impact to ongoing operations. Existing operations like writing, reading, and updating objects continue while the entire dataset is being copied.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Metadata preservation:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Historically, Google Cloud customers moved data with the &lt;/span&gt;&lt;a href="https://cloud.google.com/storage-transfer/docs/overview"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Storage Transfer Service&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, which copied the objects to a new bucket and deleted existing ones. Bucket relocation, on the other hand, automatically and meticulously moves all your bucket’s and objects’ associated metadata, thereby preserving state. This includes information like:&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;ul&gt;
&lt;li aria-level="2" style="list-style-type: circle; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Storage class:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Your objects retain their original storage class (e.g., Standard, Nearline, Coldline, Archive) in the new location.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="2" style="list-style-type: circle; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Bucket and object names:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; The naming structure of your buckets and objects remains identical.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="2" style="list-style-type: circle; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Creation and update timestamps:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; These markers are preserved, so that features like object lifecycle management (OLM) rules continue to operate.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="2" style="list-style-type: circle; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Access Control Lists (ACLs) and IAM policies:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Bucket- and object-level permissions are transferred to help maintain your security posture.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="2" style="list-style-type: circle; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Custom metadata:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Any user-defined metadata associated with your objects is also migrated.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;By handling the complexities of asynchronous data transfer and automatic metadata migration, bucket relocation minimizes the risks and overhead associated with a manual bucket migration. Crucially, because the bucket name is preserved throughout the relocation process, applications accessing the bucket don’t need to be modified.&lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;span style="vertical-align: baseline;"&gt;Relocate your bucket in a few simple steps&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;With bucket relocation, you can move your Cloud Storage buckets in &lt;/span&gt;&lt;a href="https://cloud.google.com/storage/docs/bucket-relocation/relocate-buckets"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;three simple steps&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;. Here's a breakdown:&lt;/span&gt;&lt;/p&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;1. &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;Initiate a dry run:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;Before starting the actual relocation, it's highly recommended to perform a dry run. This simulates the process without moving any data, allowing you to identify potential issues early on, such as incompatible configurations.&lt;/span&gt;&lt;/li&gt;
&lt;li role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;The dry run checks for incompatibilities like customer-managed encryption keys (CMEK), locked retention policies, objects with temporary holds, and bucket tags, without you having to manually validate each of them.&lt;/span&gt;&lt;/li&gt;
&lt;li role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;Make sure to add the &lt;/span&gt;&lt;code style="vertical-align: baseline;"&gt;--dry-run&lt;/code&gt;&lt;span style="vertical-align: baseline;"&gt; flag!&lt;/span&gt;&lt;/li&gt;
&lt;/ul&gt;&lt;/div&gt;
&lt;div class="block-code"&gt;&lt;dl&gt;
    &lt;dt&gt;code_block&lt;/dt&gt;
    &lt;dd&gt;&amp;lt;ListValue: [StructValue([(&amp;#x27;code&amp;#x27;, &amp;#x27;gcloud storage buckets relocate gs://BUCKET_NAME --location=LOCATION --dry-run&amp;#x27;), (&amp;#x27;language&amp;#x27;, &amp;#x27;&amp;#x27;), (&amp;#x27;caption&amp;#x27;, &amp;lt;wagtail.rich_text.RichText object at 0x7f3f8f748d90&amp;gt;)])]&amp;gt;&lt;/dd&gt;
&lt;/dl&gt;&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;Replace &lt;/span&gt;&lt;code style="font-style: italic; vertical-align: baseline;"&gt;BUCKET_NAME&lt;/code&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt; with the name of your bucket and &lt;/span&gt;&lt;code style="font-style: italic; vertical-align: baseline;"&gt;LOCATION&lt;/code&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt; with the desired destination.&lt;/span&gt;&lt;/p&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;2. &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;Start the relocation process:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;This step initiates the actual data transfer from the source bucket to the destination bucket. During this phase, you can still read, modify, and delete objects in the bucket. However, the bucket metadata (i.e., bucket-level  parameters and configurations) is write-locked to prevent changes that could affect the relocation.&lt;br/&gt;&lt;br/&gt;&lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;Note:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Removing the &lt;/span&gt;&lt;code style="vertical-align: baseline;"&gt;--dry-run&lt;/code&gt;&lt;span style="vertical-align: baseline;"&gt; flag from the dry-run command initiates the relocation.&lt;/span&gt;&lt;/li&gt;
&lt;/ul&gt;&lt;/div&gt;
&lt;div class="block-code"&gt;&lt;dl&gt;
    &lt;dt&gt;code_block&lt;/dt&gt;
    &lt;dd&gt;&amp;lt;ListValue: [StructValue([(&amp;#x27;code&amp;#x27;, &amp;#x27;gcloud storage buckets relocate gs://BUCKET_NAME --location=LOCATION&amp;#x27;), (&amp;#x27;language&amp;#x27;, &amp;#x27;&amp;#x27;), (&amp;#x27;caption&amp;#x27;, &amp;lt;wagtail.rich_text.RichText object at 0x7f3f8f748490&amp;gt;)])]&amp;gt;&lt;/dd&gt;
&lt;/dl&gt;&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;3. &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;Finalize the relocation process:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;Once the incremental data copy is complete, you’re ready to trigger the final synchronization step (except when moving between multi-region and configurable dual-region). This involves a brief period where writes to the bucket are disabled to help ensure their data integrity; any last-second changes made to the objects within the bucket while the incremental copy was in progress are copied to the destination. After the data’s integrity is verified, the bucket's location is updated, and all requests are automatically redirected to the new location. During the final synchronization step, attempts to update objects in the bucket will result in an HTTP 412 error.&lt;/span&gt;&lt;/li&gt;
&lt;li role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;Do not initiate the final synchronization process until the relocation process progress reaches ~99%. This helps you minimize downtime because most of the data has already been synchronized in the background. &lt;/span&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Note: &lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;If you’re&lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt; &lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;moving between multi-regions and configurable dual-regions within the same &lt;/span&gt;&lt;a href="https://cloud.google.com/storage/docs/locations#location-dr"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;multi-region code&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, you’re all set — bucket relocation handles the transition in the background, no finalization or downtime required!&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-code"&gt;&lt;dl&gt;
    &lt;dt&gt;code_block&lt;/dt&gt;
    &lt;dd&gt;&amp;lt;ListValue: [StructValue([(&amp;#x27;code&amp;#x27;, &amp;#x27;gcloud storage buckets relocate --finalize --operation=projects/_/buckets/BUCKET_NAME/operations/OPERATION_ID&amp;#x27;), (&amp;#x27;language&amp;#x27;, &amp;#x27;&amp;#x27;), (&amp;#x27;caption&amp;#x27;, &amp;lt;wagtail.rich_text.RichText object at 0x7f3f8f748fa0&amp;gt;)])]&amp;gt;&lt;/dd&gt;
&lt;/dl&gt;&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;The &lt;/span&gt;&lt;code style="font-style: italic; vertical-align: baseline;"&gt;OPERATION_ID&lt;/code&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt; is provided as output from Step-2. The &lt;/span&gt;&lt;code style="font-style: italic; vertical-align: baseline;"&gt;OPERATION_ID&lt;/code&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt; is listed with the keyword name. For instance:&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;code style="font-style: italic; vertical-align: baseline;"&gt;name: projects/_/buckets/my-bucket/operations/AbCJYd8jKT1n-Ciw1LCNXIcubwvij_TdqO-ZFjuF2YntK0r74&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;And there you have it — In just three steps, you’ven moved your entire bucket, its data, and metadata, to its new location.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Early users of bucket relocation have had great success with the new feature. &lt;/span&gt;&lt;/p&gt;
&lt;p style="padding-left: 40px;"&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;“With Storage Intelligence and bucket relocation, we effortlessly transitioned to dual-region buckets. The seamless process, powered by the bucket relocation, minimized downtime and ensured data integrity. We migrated the buckets with peace of mind and without the manual headaches.” &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;- Adam Steele, Product Manager, Spotify&lt;/strong&gt;&lt;/p&gt;
&lt;p style="padding-left: 40px;"&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;“We recently utilized the bucket relocation feature of Storage Intelligence to successfully complete a ~300 bucket migration and PBs of data project from multi-region to regional storage, to optimize network data transfer costs. Without bucket relocation, this process would have required extensive automation and scripting, resulting in increased downtime and effort.”&lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt; - Deepak Mahato, Data Platform Infrastructure Manager, GroupOn&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Experience the ease and efficiency of managing your Cloud Storage buckets with bucket relocation in Storage Intelligence. To learn more, visit the&lt;/span&gt;&lt;a href="https://cloud.google.com/storage/docs/bucket-relocation/overview"&gt;&lt;span style="vertical-align: baseline;"&gt; &lt;/span&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;bucket relocation documentation&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; and the&lt;/span&gt;&lt;a href="https://cloud.google.com/storage/docs/storage-intelligence/overview"&gt;&lt;span style="vertical-align: baseline;"&gt; &lt;/span&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Storage Intelligence overview&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;</description><pubDate>Thu, 10 Jul 2025 16:00:00 +0000</pubDate><guid>https://cloud.google.com/blog/products/storage-data-transfer/introducing-cloud-storage-bucket-relocation/</guid><category>Developers &amp; Practitioners</category><category>Storage &amp; Data Transfer</category><og xmlns:og="http://ogp.me/ns#"><type>article</type><title>Cloud Storage bucket relocation: An industry first for non-disruptive bucket migrations</title><description></description><site_name>Google</site_name><url>https://cloud.google.com/blog/products/storage-data-transfer/introducing-cloud-storage-bucket-relocation/</url></og><author xmlns:author="http://www.w3.org/2005/Atom"><name>Vaibhav Khunger</name><title>Senior Product Manager, Google Cloud</title><department></department><company></company></author></item><item><title>Accelerate your AI workloads with the Google Cloud Managed Lustre</title><link>https://cloud.google.com/blog/products/storage-data-transfer/google-cloud-managed-lustre-for-ai-hpc/</link><description>&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span 
style="vertical-align: baseline;"&gt;Today, we're making it even easier to achieve breakthrough performance for your AI/ML workloads: &lt;/span&gt;&lt;a href="https://cloud.google.com/products/managed-lustre"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Google Cloud Managed Lustre&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; is now GA, and available in four distinct performance tiers that deliver throughput ranging from 125 MB/s, 250 MB/s, 500 MB/s, to 1000 MB/s per TiB of capacity — with the ability to scale up to 8 PB of storage capacity. The Managed Lustre solution is powered by DDN’s EXAScaler, combining DDN's decades of leadership in high-performance storage with Google Cloud's expertise in cloud infrastructure.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Managed Lustre provides a POSIX-compliant, parallel file system that delivers consistently high throughput and low latency, essential for:&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;High-throughput inference:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; For applications that require near-real-time inference on large datasets, Lustre provides high parallel throughput and sub-millisecond read latency.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Large-scale model training:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Accelerate the training cycles of deep learning models by providing rapid access to petabytes-sized datasets. Lustre's parallel architecture ensures GPUs and TPUs are fed with data, minimizing idle time.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Checkpointing and restarting large models:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Save and restore the state of large models during training faster, improving goodput and allowing for more efficient experimentation.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Data preprocessing and feature engineering:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Process raw data, extract features, and prepare datasets for training, reducing the time spent on data pipelines.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Scientific simulations and research:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Beyond AI/ML, Lustre excels in traditional HPC scenarios like computational fluid dynamics, genomic sequencing, and climate modeling, where massive datasets and high-concurrency access are critical.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Lustre is designed for the highly parallel and random I/O that characterizes many AI/ML training and inference tasks. This parallel processing capability across multiple clients ensures your compute resources are never starved for data.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Performance tiers and pricing&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Managed Lustre offers flexible pricing and performance tiers designed to meet the diverse needs of your workloads, whether you're focused on capacity or highest throughput density. &lt;/span&gt;&lt;/p&gt;
&lt;div align="left"&gt;
&lt;div style="color: #5f6368; overflow-x: auto; overflow-y: hidden; width: 100%;"&gt;
&lt;div style="color: #5f6368; overflow-x: auto; overflow-y: hidden; width: 100%;"&gt;
&lt;div style="color: #5f6368; overflow-x: auto; overflow-y: hidden; width: 100%;"&gt;
&lt;div style="color: #5f6368; overflow-x: auto; overflow-y: hidden; width: 100%;"&gt;
&lt;div style="color: #5f6368; overflow-x: auto; overflow-y: hidden; width: 100%;"&gt;
&lt;div style="color: #5f6368; overflow-x: auto; overflow-y: hidden; width: 100%;"&gt;
&lt;div style="color: #5f6368; overflow-x: auto; overflow-y: hidden; width: 100%;"&gt;
&lt;div style="color: #5f6368; overflow-x: auto; overflow-y: hidden; width: 100%;"&gt;
&lt;div style="color: #5f6368; overflow-x: auto; overflow-y: hidden; width: 100%;"&gt;
&lt;div style="color: #5f6368; overflow-x: auto; overflow-y: hidden; width: 100%;"&gt;&lt;table style="width: 98.4334%;"&gt;&lt;colgroup&gt;&lt;col style="width: 56.3665%;"/&gt;&lt;col style="width: 43.6335%;"/&gt;&lt;/colgroup&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Throughput &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;MB/s&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;per TiB &lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;of storage capacity&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Storage pricing per GiB per month&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;125&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;$0.145&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;250&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;$0.21&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;500&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;$0.34&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;1000&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;$0.60&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Please see more details at the &lt;/span&gt;&lt;a href="https://cloud.google.com/products/managed-lustre/pricing"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Managed Lustre pricing page&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Irrespective of the aggregate throughput, all tiers come with sub-millisecond read latency, high single-stream throughput, and are perfect for parallel access to many small files.&lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Driving innovation together: partnering with DDN&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Google Cloud’s Managed Lustre is powered by &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;DDN’s EXAScaler&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;, bringing together two industry leaders in high-performance computing and elastic cloud infrastructure. This partnership represents a joint commitment to simplifying the deployment and management of large-scale AI and HPC workloads in the cloud, thanks to:&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Trusted leaders:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; By combining DDN's decades of expertise in high-performance Lustre with Google Cloud's global infrastructure and AI ecosystem, we are delivering a foundational capability that removes storage bottlenecks and helps our customers solve their most complex challenges in AI and HPC.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Fully managed and supported solution:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Enjoy the benefits of a fully managed service from Google, with comprehensive support from both Google and DDN, for seamless operations and peace of mind.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Global availability and ecosystem integration:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Managed Lustre is now globally accessible in &lt;/span&gt;&lt;a href="https://cloud.google.com/managed-lustre/docs/locations"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;multiple Google Cloud regions&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; and integrates with the broader Google Cloud ecosystem, including Google Kubernetes Engine (GKE) and TPUs.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;These benefits caught the attention of one of our largest partners, NVIDIA, who is looking forward to having it as part of its NVIDIA AI platform. &lt;/span&gt;&lt;/p&gt;
&lt;p style="padding-left: 40px;"&gt;&lt;span style="vertical-align: baseline;"&gt;"&lt;/span&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;Enterprises today demand AI infrastructure that combines accelerated computing with high-performance storage solutions to deliver uncompromising speed, seamless scalability and cost efficiency at scale. Google and DDN’s collaboration on Google Cloud Managed Lustre creates a better-together solution uniquely suited to meet these needs. By integrating DDN’s enterprise-grade data platforms and Google’s global cloud capabilities, organizations can readily access vast amounts of data and unlock the full potential of AI with the NVIDIA AI platform (or NVIDIA accelerated computing platform) on Google Cloud — reducing time-to-insight, maximizing GPU utilization, and lowering total cost of ownership.&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;” - Dave Salvator, Director of Accelerated Computing Products, NVIDIA&lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Get started today!&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Ready to supercharge your AI/ML and HPC workloads? Getting started with Managed Lustre is simple:&lt;/span&gt;&lt;/p&gt;
&lt;ol&gt;
&lt;li aria-level="1" style="list-style-type: decimal; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;Navigate to &lt;/span&gt;&lt;a href="https://console.cloud.google.com/managed-lustre/"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Managed Lustre in the Google Cloud console&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: decimal; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;Provision your Managed Lustre instance, choosing the performance tier and size that best fits your needs.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: decimal; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;Connect your compute instances, GKE clusters to your new high-performance file system.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;For detailed instructions and documentation, please visit the Managed Lustre &lt;/span&gt;&lt;a href="https://cloud.google.com/managed-lustre/docs/overview"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;documentation&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;. And if needed, &lt;/span&gt;&lt;a href="https://cloud.google.com/contact"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;reach out to Google Cloud sales specialists&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;.&lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Watch the Fireside Chat&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Don't miss the opportunity to learn more about the strategic partnership between Google Cloud and DDN, and the unique capabilities of Managed Lustre. Read the official DDN press release &lt;/span&gt;&lt;a href="https://www.ddn.com/press-releases/google-cloud-launches-general-availability-of-managed-lustre-powered-by-ddns-exascaler-technology/" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;here&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Watch the fireside chat with Sameet Agarwal, VP/GM Storage and Sven Oehme, CTO of DDN, &lt;/strong&gt;&lt;a href="https://www.youtube.com/watch?v=i6gEHUzIo1w" rel="noopener" target="_blank"&gt;&lt;strong style="text-decoration: underline; vertical-align: baseline;"&gt;here&lt;/strong&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;</description><pubDate>Tue, 08 Jul 2025 17:00:00 +0000</pubDate><guid>https://cloud.google.com/blog/products/storage-data-transfer/google-cloud-managed-lustre-for-ai-hpc/</guid><category>AI &amp; Machine Learning</category><category>HPC</category><category>Storage &amp; Data Transfer</category><og xmlns:og="http://ogp.me/ns#"><type>article</type><title>Accelerate your AI workloads with the Google Cloud Managed Lustre</title><description></description><site_name>Google</site_name><url>https://cloud.google.com/blog/products/storage-data-transfer/google-cloud-managed-lustre-for-ai-hpc/</url></og><author xmlns:author="http://www.w3.org/2005/Atom"><name>Asad Khan</name><title>Sr. 
Director of Product Management, Google Cloud</title><department></department><company></company></author><author xmlns:author="http://www.w3.org/2005/Atom"><name>Kirill Tropin</name><title>Group Product Manager</title><department></department><company></company></author></item><item><title>Expanding Z3 family with 9 new VMs and a bare metal instance for storage and I/O intensive workloads</title><link>https://cloud.google.com/blog/products/compute/expanded-z3-vm-portfolio-for-io-intensive-workloads/</link><description>&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Today, we are thrilled to announce the expansion of the Z3 Storage Optimized VM family with the general availability of nine new &lt;/span&gt;&lt;a href="https://cloud.google.com/compute/docs/storage-optimized-machines?_gl=1*2vt8da*_up*MQ..&amp;amp;gclid=CjwKCAiAqfe8BhBwEiwAsne6gduqCwwkpJZbE9aPtQmusSUIJYOzGeKiVzaE-1_M9aml0iqY5L8_IBoCh90QAvD_BwE&amp;amp;gclsrc=aw.ds#z3_machine_types"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Z3 virtual machines&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; that offer local SSD capacity ranging from 3 TiB to 18 TiB per VM, complementing existing Z3 VMs, which offer 36 TiB of Local SSD per VM. We are also very pleased to launch a &lt;/span&gt;&lt;a href="https://cloud.google.com/compute/docs/instances/bare-metal-instances"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Z3 bare metal instance&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, which includes up to 72 TiB of Local SSDs. Z3 VMs enable customers like Shopify, Tenderly and ScyllaDB to achieve impressive performance improvements for their high-performance storage workloads by reducing I/O access latency by up to 35% compared to VM instances using previous-generation local SSDs. &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Z3 VMs are designed to run I/O-intensive workloads that require large local storage capacity and high storage performance, including SQL, NoSQL, and vector databases, data analytics, semantic data search and retrieval, and distributed file systems. The Z3 bare metal instance provides direct access to the physical server CPUs and&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt; is ideal for workloads that require low-level system access like private and hybrid cloud platforms, custom hypervisors, container platforms, or applications with specialized performance or licensing needs.&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt; &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Both Z3 VMs and the bare metal instance are based on &lt;/span&gt;&lt;a href="https://cloud.google.com/compute/docs/disks/local-ssd"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Titanium SSDs&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, which &lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;offload local storage processing from CPU resources&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt; to deliver real-time data processing, low-latency, high-throughput storage performance and &lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;enhanced storage security. Z3 VMs with Titanium SSD offer up to 36 GiB/s of read throughput and up to 9M IOPS, increasing write storage performance by up to 25% compared to previous generation Local SSDs&lt;sup&gt;1&lt;/sup&gt;&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-aside"&gt;&lt;dl&gt;
    &lt;dt&gt;aside_block&lt;/dt&gt;
    &lt;dd&gt;&amp;lt;ListValue: [StructValue([(&amp;#x27;title&amp;#x27;, &amp;#x27;$300 in free credit to try Google Cloud infrastructure&amp;#x27;), (&amp;#x27;body&amp;#x27;, &amp;lt;wagtail.rich_text.RichText object at 0x7f3f8f6da8b0&amp;gt;), (&amp;#x27;btn_text&amp;#x27;, &amp;#x27;Start building for free&amp;#x27;), (&amp;#x27;href&amp;#x27;, &amp;#x27;http://console.cloud.google.com/freetrial?redirectPath=/compute&amp;#x27;), (&amp;#x27;image&amp;#x27;, None)])]&amp;gt;&lt;/dd&gt;
&lt;/dl&gt;&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Based on the 4th Gen Intel Xeon scalable processor, Z3 VMs come with up to 176 vCPUs, 1,408 GiB of memory, and 36 TiB of local storage in 11 virtual machine shapes. The Z3 bare metal instance offers 192 vCPUs, 1,536 GiB of memory and 72 TiB of local storage. Z3 VMs and the bare metal instance deliver the connectivity and storage performance that enterprise workloads need, with&lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt; &lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;up to 100 Gbps in standard bandwidth and up to 200 Gbps with Tier1 networking for high-traffic applications.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;The expanded Z3 virtual machine portfolio lets you rightsize your infrastructure and scale your clusters to meet workloads requirements by providing larger total local SSD capacity and higher local SSD capacity per vCPU. Z3 offers two different VM types: the &lt;/span&gt;&lt;a href="https://cloud.google.com/compute/docs/storage-optimized-machines?_gl=1*2vt8da*_up*MQ..&amp;amp;gclid=CjwKCAiAqfe8BhBwEiwAsne6gduqCwwkpJZbE9aPtQmusSUIJYOzGeKiVzaE-1_M9aml0iqY5L8_IBoCh90QAvD_BwE&amp;amp;gclsrc=aw.ds#z3_machine_types"&gt;&lt;strong style="text-decoration: underline; vertical-align: baseline;"&gt;standardlssd&lt;/strong&gt;&lt;/a&gt;&lt;strong style="vertical-align: baseline;"&gt; VM types,&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; which include five VM shapes that offer about 200 GiB of local SSD per vCPU. They are optimized for data analytics (OLAP), and SQL databases like MySQL and Postgres workloads. The &lt;/span&gt;&lt;a href="https://cloud.google.com/compute/docs/storage-optimized-machines?_gl=1*2vt8da*_up*MQ..&amp;amp;gclid=CjwKCAiAqfe8BhBwEiwAsne6gduqCwwkpJZbE9aPtQmusSUIJYOzGeKiVzaE-1_M9aml0iqY5L8_IBoCh90QAvD_BwE&amp;amp;gclsrc=aw.ds#z3_machine_types"&gt;&lt;strong style="text-decoration: underline; vertical-align: baseline;"&gt;highlssd&lt;/strong&gt;&lt;/a&gt;&lt;strong style="vertical-align: baseline;"&gt; VM types &lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;include&lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt; &lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;six different VM shapes and the Z3 bare metal instance. They offer about 400 GiB of local SSD per vCPU and are optimized for distributed databases, data streaming, large parallel file systems and data search. &lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;What our customers and partners are saying&lt;/strong&gt;&lt;/h3&gt;&lt;/div&gt;
&lt;div class="block-image_full_width"&gt;






  
    &lt;div class="article-module h-c-page"&gt;
      &lt;div class="h-c-grid"&gt;
  

    &lt;figure class="article-image--large
      
      
        h-c-grid__col
        h-c-grid__col--6 h-c-grid__col--offset-3
        
        
      "
      &gt;

      
      
        
        &lt;img
            src="https://storage.googleapis.com/gweb-cloudblog-publish/images/1_n43GSEg.max-1000x1000.png"
        
          alt="1"&gt;
        
      
    &lt;/figure&gt;

  
      &lt;/div&gt;
    &lt;/div&gt;
  




&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;"We are thrilled to announce Nutanix Cloud Clusters coming to Google Cloud at the end of CY25 as part of Nutanix’s commitment to delivering flexible, hybrid cloud solutions. Google Cloud’s Z3 instance types represent a perfect foundation for Nutanix to enable performance and resilience for enterprise applications. We’re excited about our partnership with Google Cloud in empowering our joint customers with greater choice and simplicity in their cloud journey." &lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;- Saveen Pakala, Vice President of Product Management, Nutanix&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-image_full_width"&gt;






  
    &lt;div class="article-module h-c-page"&gt;
      &lt;div class="h-c-grid"&gt;
  

    &lt;figure class="article-image--large
      
      
        h-c-grid__col
        h-c-grid__col--6 h-c-grid__col--offset-3
        
        
      "
      &gt;

      
      
        
        &lt;img
            src="https://storage.googleapis.com/gweb-cloudblog-publish/images/2_DVxv1Kn.max-1000x1000.png"
        
          alt="2"&gt;
        
      
    &lt;/figure&gt;

  
      &lt;/div&gt;
    &lt;/div&gt;
  




&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;“OP Labs contributes to the Optimism protocol, which enables orders of magnitude of improved performance and scalability for Ethereum. Z3 reduces p99 block insertion tail latencies by 30-50% for our most I/O-demanding blockchain nodes compared to N2. By migrating our solution to Z3, we will be able to scale our blockchain nodes to handle L2 state growth in a more performant and cost-effective way.”&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt; - Zach Howard Senior Staff Engineer, OP Labs&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-image_full_width"&gt;






  
    &lt;div class="article-module h-c-page"&gt;
      &lt;div class="h-c-grid"&gt;
  

    &lt;figure class="article-image--large
      
      
        h-c-grid__col
        h-c-grid__col--6 h-c-grid__col--offset-3
        
        
      "
      &gt;

      
      
        
        &lt;img
            src="https://storage.googleapis.com/gweb-cloudblog-publish/images/3_Us2Q9jh.max-1000x1000.png"
        
          alt="3"&gt;
        
      
    &lt;/figure&gt;

  
      &lt;/div&gt;
    &lt;/div&gt;
  




&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;The launch of Google Cloud's Z3 storage optimized instances with smaller VM shapes  represents a leap forward in performance for high-traffic NoSQL environments. In internal tests and customer projects, ScyllaDB has impressively leveraged the advantages of Z3 including extremely low latencies under high read and write loads, high IOPS capacity enabling the processing of massive amounts of data and excellent cost-performance ratio for large-scale production systems. We are very excited to offer Z3 family servers in ScyllaDB Cloud, including Bring Your Own Account (BYOA)." -&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt; Avi Kivity, Co-founder and CTO, ScyllaDB&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-image_full_width"&gt;






  
    &lt;div class="article-module h-c-page"&gt;
      &lt;div class="h-c-grid"&gt;
  

    &lt;figure class="article-image--large
      
      
        h-c-grid__col
        h-c-grid__col--6 h-c-grid__col--offset-3
        
        
      "
      &gt;

      
      
        
        &lt;img
            src="https://storage.googleapis.com/gweb-cloudblog-publish/images/4_Wdb1KfR.max-1000x1000.png"
        
          alt="4"&gt;
        
      
    &lt;/figure&gt;

  
      &lt;/div&gt;
    &lt;/div&gt;
  




&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;"Shopify has found Z3s to be an excellent platform to build our most performance sensitive storage systems on. We experienced a critical need for both large data volumes while remaining sensitive to latency and throughput on the storage side. While Google has a lot of options, local SSD was really the best fit, and Z3s allowed us to achieve the best price/performance along with enhanced stability appropriate for a source of truth Storage workload. Right now we see these storage optimized VMs as our platform of choice for the future."&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt; - Mattie Toia, VP Infrastructure, Shopify&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-image_full_width"&gt;






  
    &lt;div class="article-module h-c-page"&gt;
      &lt;div class="h-c-grid"&gt;
  

    &lt;figure class="article-image--large
      
      
        h-c-grid__col
        h-c-grid__col--6 h-c-grid__col--offset-3
        
        
      "
      &gt;

      
      
        
        &lt;img
            src="https://storage.googleapis.com/gweb-cloudblog-publish/images/5_mJ8yPFD.max-1000x1000.png"
        
          alt="5"&gt;
        
      
    &lt;/figure&gt;

  
      &lt;/div&gt;
    &lt;/div&gt;
  




&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;"Tenderly is built to be your go-to for Web3 production and development, bringing all the necessary infrastructure into one place. This allows teams to operate with speed and confidence, making blockchain technology accessible. We've seen impressive results running blockchain workloads on Z3 instances, with a 40% improvement on read latency compared to N2 and N2D instances."&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt; - Ilija Petrovic, SRE Lead, Tenderly&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-image_full_width"&gt;






  
    &lt;div class="article-module h-c-page"&gt;
      &lt;div class="h-c-grid"&gt;
  

    &lt;figure class="article-image--large
      
      
        h-c-grid__col
        h-c-grid__col--6 h-c-grid__col--offset-3
        
        
      "
      &gt;

      
      
        
        &lt;img
            src="https://storage.googleapis.com/gweb-cloudblog-publish/images/6_MqCDpn9.max-1000x1000.png"
        
          alt="6"&gt;
        
      
    &lt;/figure&gt;

  
      &lt;/div&gt;
    &lt;/div&gt;
  




&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;“The VAST AI Operating System gives organizations a unified platform to reason over all of their data – structured, unstructured, and streaming through a global namespace that spans cloud and on-prem environments – enabling intelligent agents and applications to operate with full context and real-time speed. ,For customers running on Google Cloud, Z3 VMs complement this vision by providing the ideal storage infrastructure to accelerate these workloads, ensuring AI pipelines run fast and scale effortlessly in the cloud.” &lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;- Renen Hallak, Founder &amp;amp; CEO, VAST Data&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Z3 VMs are also the physical foundation of AlloyDB, our flagship PostgreSQL-compatible database service, delivering sophisticated multi-level caching. AlloyDB uses Z3's expansive local SSDs as an ultra-fast cache, holding datasets up to 25x larger than can be stored in memory. Database queries can access these large, cached datasets at latencies that closely approach in-memory performance, particularly when factoring in overall end-to-end application response times. This is a significant advantage for very large databases, including real-time analytical workloads, as AlloyDB’s high-performance columnar engine operates entirely within this massive cache. AlloyDB on Z3 VMs will soon be available in preview, delivering up to &lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;3x better performance than N-series VMs for transactional workloads, particularly for large datasets&lt;strong&gt;.&lt;/strong&gt;&lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Enhanced maintenance experience&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Z3 instances make it easier for you to plan ahead and schedule maintenance operations at a time of your choosing by providing notice from the system several days in advance of a required maintenance. The new Z3 VMs further enhance the maintenance experience by allowing you to live-migrate an instance during maintenance events for VMs with 18 TiB or less of local SSD storage. For Z3 VMs with 36 TiB of local SSD and for Z3 bare metal instances, you’ll also receive in-place upgrades that preserve your data through the planned maintenance events.&lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Support for Hyperdisk&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Z3 VMs support &lt;/span&gt;&lt;a href="https://cloud.google.com/compute/docs/disks/hyperdisks"&gt;&lt;strong style="text-decoration: underline; vertical-align: baseline;"&gt;Hyperdisk&lt;/strong&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, Google Cloud’s workload-optimized block storage that lets you optimize the performance for each workload by independently tuning the storage performance and capacity for each instance.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Z3 VMs are compatible &lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;with &lt;/span&gt;&lt;a href="https://cloud.google.com/compute/docs/disks/hd-types/hyperdisk-balanced"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Hyperdisk Balanced&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, &lt;/span&gt;&lt;a href="https://cloud.google.com/compute/docs/disks/hd-types/hyperdisk-throughput"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Hyperdisk Throughput&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, and &lt;/span&gt;&lt;a href="https://cloud.google.com/compute/docs/disks/hd-types/hyperdisk-extreme"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Extreme&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; &lt;/span&gt;&lt;a href="https://cloud.google.com/compute/docs/disks/hyperdisks"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Hyperdisk&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; storage for scalable, high-performance network-attached storage&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;, supporting up to 512 TiB of capacity per instance. For general-purpose workloads, Hyperdisk Balanced, with up to &lt;/span&gt;&lt;a href="https://cloud.google.com/compute/docs/disks/hd-types/hyperdisk-balanced#achieve-higher-performance-with-multiple-hyperdisk-balanced-volumes"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;160K IOPS&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; per instance, offers a mix of performance and cost-efficiency. 
Hyperdisk Extreme delivers ultra-low latency and supports up to &lt;/span&gt;&lt;a href="https://cloud.google.com/compute/docs/disks/hd-types/hyperdisk-extreme#achieve-higher-performance-with-multiple-hyperdisk-extreme-volumes"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;350K IOPS and 5,000 MiB/s throughput&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; per Z3 VM instance and up to &lt;/span&gt;&lt;a href="https://cloud.google.com/compute/docs/disks/hd-types/hyperdisk-extreme#achieve-higher-performance-with-multiple-hyperdisk-extreme-volumes"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;500K IOPS and 10,000 MiB/s throughput&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; for the Z3 bare metal instance — making it well-suited for demanding workloads like databases. Using Hyperdisk for persistent storage and Z3 Local SSD for caching creates an optimal storage architecture for high-end databases and mission-critical workloads.&lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Get started with Z3 today&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Z3 VMs and bare metal instances are available today in most regions worldwide. To start using Z3 instances, select Z3 under the new Storage-Optimized machine family when creating a new VM or GKE node pool in the Google Cloud console. Learn more at the &lt;/span&gt;&lt;a href="https://cloud.google.com/compute/docs/storage-optimized-machines"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Z3 machine series page&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;. Contact your &lt;/span&gt;&lt;a href="https://cloud.google.com/contact?e=48754805&amp;amp;hl=en"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Google Cloud sales&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; representative for more information on regional availability.&lt;/span&gt;&lt;/p&gt;
&lt;hr/&gt;
&lt;p&gt;&lt;sup&gt;&lt;em&gt;1. &lt;span style="vertical-align: baseline;"&gt;Results are based on Google Cloud’s internal benchmarking.&lt;/span&gt;&lt;/em&gt;&lt;/sup&gt;&lt;/p&gt;&lt;/div&gt;</description><pubDate>Tue, 08 Jul 2025 16:00:00 +0000</pubDate><guid>https://cloud.google.com/blog/products/compute/expanded-z3-vm-portfolio-for-io-intensive-workloads/</guid><category>Storage &amp; Data Transfer</category><category>Compute</category><og xmlns:og="http://ogp.me/ns#"><type>article</type><title>Expanding Z3 family with 9 new VMs and a bare metal instance for storage and I/O intensive workloads</title><description></description><site_name>Google</site_name><url>https://cloud.google.com/blog/products/compute/expanded-z3-vm-portfolio-for-io-intensive-workloads/</url></og><author xmlns:author="http://www.w3.org/2005/Atom"><name>Garv Sawhney</name><title>Group Product Manager</title><department></department><company></company></author><author xmlns:author="http://www.w3.org/2005/Atom"><name>Bob Napaa</name><title>Principal Product Manager</title><department></department><company></company></author></item><item><title>Automate data resilience at scale with Eon and Google Cloud Backup</title><link>https://cloud.google.com/blog/products/storage-data-transfer/data-resilience-eons-approach--google-cloud-best-practices/</link><description>&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Cloud backups were once considered little more than an insurance policy. Now, your backups should do more: they should be autonomous, cost-efficient, and analytics-ready by default.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;That’s why &lt;/span&gt;&lt;a href="https://www.eon.io/blog/google-cloud-announcement" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Eon built a platform purposefully aligned with Google Cloud&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; to eliminate backup blind spots, simplify recovery, and unlock the value inside backup data without requiring teams to become policy experts or infrastructure wranglers.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Still, no matter what platform you use, it’s critical to understand what resilient cloud backup looks like and how to get there with Google Cloud’s native capabilities.&lt;/span&gt;&lt;/p&gt;
&lt;h2&gt;&lt;span style="vertical-align: baseline;"&gt;What makes cloud backup resilient?&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Before diving into tooling, it's worth asking: What does a resilient backup strategy look like in the cloud? In our work with Google Cloud users across industries, we’ve found five common criteria:&lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;span style="vertical-align: baseline;"&gt;5 signs your backup posture may be at risk&lt;/span&gt;&lt;/h3&gt;
&lt;ol&gt;
&lt;li aria-level="1" style="list-style-type: decimal; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;You can’t easily see what’s backed up (or not)&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: decimal; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;Retention policies vary across projects and teams&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: decimal; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;Data is duplicated or stored inefficiently, driving up spend&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: decimal; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;a href="https://www.eon.io/blog/cloud-ransomware-guide" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Cloud ransomware&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; protection is reactive rather than policy-driven&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: decimal; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;Recovery requires full restores even when you only need one object&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;&lt;/div&gt;
&lt;div class="block-aside"&gt;&lt;dl&gt;
    &lt;dt&gt;aside_block&lt;/dt&gt;
    &lt;dd&gt;&amp;lt;ListValue: [StructValue([(&amp;#x27;title&amp;#x27;, &amp;#x27;Try Google Cloud for free&amp;#x27;), (&amp;#x27;body&amp;#x27;, &amp;lt;wagtail.rich_text.RichText object at 0x7f3f8f733040&amp;gt;), (&amp;#x27;btn_text&amp;#x27;, &amp;#x27;Get started for free&amp;#x27;), (&amp;#x27;href&amp;#x27;, &amp;#x27;https://console.cloud.google.com/freetrial?redirectPath=/welcome&amp;#x27;), (&amp;#x27;image&amp;#x27;, None)])]&amp;gt;&lt;/dd&gt;
&lt;/dl&gt;&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;h2&gt;&lt;span style="vertical-align: baseline;"&gt;Best practices for data protection&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Google Cloud provides foundational capabilities to protect your data if you configure and use them consistently. Here's how to maximize native protection:&lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;span style="vertical-align: baseline;"&gt;1. Versioning and retention: first lines of defense&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Enable Object Versioning in Cloud Storage to retain multiple object versions, making it easier to recover from accidental deletions. Pair this with Retention Policies to enforce minimum storage lifetimes for regulatory or critical datasets.&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong style="vertical-align: baseline;"&gt;Tip:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Use Bucket Lock for write-once-read-many (WORM) protection in the areas where compliance matters most.&lt;/span&gt;&lt;/li&gt;
&lt;/ul&gt;
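&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;As a minimal sketch with the gcloud CLI (the bucket name and retention duration are placeholders), these two safeguards can be enabled like so. Locking the retention policy for full WORM protection (Bucket Lock) is a separate, irreversible step described in the Cloud Storage documentation.&lt;/span&gt;&lt;/p&gt;

```shell
# Keep overwritten or deleted objects as retrievable noncurrent versions.
gcloud storage buckets update gs://example-compliance-bucket --versioning

# Enforce a minimum object retention period so data cannot be deleted early.
# Duration syntax (e.g. 90d, 1y) follows the gcloud storage documentation.
gcloud storage buckets update gs://example-compliance-bucket --retention-period=90d
```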
&lt;h3&gt;&lt;span style="vertical-align: baseline;"&gt;2. Monitor for gaps in coverage&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Use native services like Cloud SQL backups, GKE snapshots, and Persistent Disk images, but be mindful that backup responsibilities can fall to different teams. Without centralized visibility, coverage becomes inconsistent.&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong style="vertical-align: baseline;"&gt;Tip:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Use Cloud Asset Inventory or scheduled BigQuery queries to audit coverage.&lt;/span&gt;&lt;/li&gt;
&lt;/ul&gt;
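&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;For example, a scheduled Cloud Asset Inventory query can surface the resources whose backup coverage needs cross-checking. The project ID below is a placeholder, and the asset-type list can be extended to whatever services your teams run:&lt;/span&gt;&lt;/p&gt;

```shell
# List Cloud SQL instances and Compute Engine disks in a project so their
# backup coverage can be audited against your backup plans.
gcloud asset search-all-resources \
    --scope=projects/my-project-id \
    --asset-types=sqladmin.googleapis.com/Instance,compute.googleapis.com/Disk \
    --format="table(name, assetType, location)"
```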
&lt;h3&gt;&lt;span style="vertical-align: baseline;"&gt;3. Design for granular recovery&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Plan for partial restores since not everything needs a full rollback. Whether it's a single BigQuery table or a specific Cloud Storage object, restoring only what you need saves time and cost.&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong style="vertical-align: baseline;"&gt;Tip&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;: Use Object Lifecycle Management to automatically transition older or less critical Cloud Storage objects to colder storage classes.&lt;/span&gt;&lt;/li&gt;
&lt;/ul&gt;&lt;/div&gt;
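&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;As a sketch (the bucket name and the 90-day threshold are illustrative), a lifecycle rule that transitions older objects to Coldline can be applied like this:&lt;/span&gt;&lt;/p&gt;

```shell
# Write a lifecycle rule: objects older than 90 days move to Coldline.
printf '%s' '{"rule": [{"action": {"type": "SetStorageClass", "storageClass": "COLDLINE"}, "condition": {"age": 90}}]}' > lifecycle.json

# Apply the rule to the bucket (bucket name is a placeholder).
gcloud storage buckets update gs://example-archive-bucket --lifecycle-file=lifecycle.json
```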
&lt;div class="block-video"&gt;



&lt;div class="article-module article-video "&gt;
  &lt;figure&gt;
    &lt;a class="h-c-video h-c-video--marquee"
      href="https://youtube.com/watch?v=i1qrmufbJKU"
      data-glue-modal-trigger="uni-modal-i1qrmufbJKU-"
      data-glue-modal-disabled-on-mobile="true"&gt;

      
        

        &lt;div class="article-video__aspect-image"
          style="background-image: url(https://storage.googleapis.com/gweb-cloudblog-publish/images/Screenshot_2025-06-19_at_11.36.51AM_1.max-1000x1000_QEI2O0T.png);"&gt;
          &lt;span class="h-u-visually-hidden"&gt;Google Cloud New Way to Cloud video interview with Eon CTO Ron Kimchi&lt;/span&gt;
        &lt;/div&gt;
      
      &lt;svg role="img" class="h-c-video__play h-c-icon h-c-icon--color-white"&gt;
        &lt;use xlink:href="#mi-youtube-icon"&gt;&lt;/use&gt;
      &lt;/svg&gt;
    &lt;/a&gt;

    
      &lt;figcaption class="article-video__caption h-c-page"&gt;
        
          &lt;h4 class="h-c-headline h-c-headline--four h-u-font-weight-medium h-u-mt-std"&gt;Watch our New Way to Cloud interview with Eon CTO and co-founder Ron Kimchi.&lt;/h4&gt;
        
        
      &lt;/figcaption&gt;
    
  &lt;/figure&gt;
&lt;/div&gt;

&lt;div class="h-c-modal--video"
     data-glue-modal="uni-modal-i1qrmufbJKU-"
     data-glue-modal-close-label="Close Dialog"&gt;
   &lt;a class="glue-yt-video"
      data-glue-yt-video-autoplay="true"
      data-glue-yt-video-height="99%"
      data-glue-yt-video-vid="i1qrmufbJKU"
      data-glue-yt-video-width="100%"
      href="https://youtube.com/watch?v=i1qrmufbJKU"
      ng-cloak&gt;
   &lt;/a&gt;
&lt;/div&gt;

&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;h2&gt;&lt;span style="vertical-align: baseline;"&gt;Automating the complexity away&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Managing cloud backup at scale is hard to do manually. From onboarding new workloads to applying consistent policies, human-led approaches don’t scale well.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;That’s why more teams are exploring autonomous &lt;/span&gt;&lt;a href="https://www.eon.io/blog/cloud-backup-posture-management" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Cloud Backup Posture Management&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; (CBPM) solutions, like Eon, that detect new assets in real time, apply smart backup rules automatically, and enforce consistent protection across environments.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;With Eon, you don’t have to tag resources or write custom scripts. Our platform classifies and protects your Google Cloud assets out of the box—whether you're working with GKE, Cloud SQL, BigQuery, or another solution.&lt;/span&gt;&lt;/p&gt;
&lt;h2&gt;&lt;span style="vertical-align: baseline;"&gt;From backups to business insights&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Traditionally, backup data was siloed, underused, and only meant to be retrieved in emergencies. But, increasingly, teams are unlocking that data to:&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;Run analysis directly on backups using BigQuery and Dataproc,&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;Feed training and monitoring pipelines via Vertex AI,&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;Deliver audit-ready dashboards with Looker, powered by backup snapshots.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;With Eon, this is built-in. We transform backups into zero-ETL data lakes that reduce pipeline costs and provide immediate access to structured data with no reprocessing required.&lt;/span&gt;&lt;/p&gt;
&lt;h2&gt;&lt;span style="vertical-align: baseline;"&gt;What a “mature” backup posture looks like&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;The end goal for many cloud-native teams is not just to “have backups.” It’s to develop a resilient, intelligent backup strategy that adapts to scale and risk.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Here’s what that looks like:&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;Automated discovery of new resources&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;Policy-driven protection tailored to data type and criticality&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;Immutable backups with time-locked retention&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;Search-first recovery instead of full snapshot restores&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;Cost-aware tiering and storage deduplication&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Eon helps Google Cloud users reach this level of maturity faster without the burden of custom tooling or constant policy updates.&lt;/span&gt;&lt;/p&gt;
&lt;h2&gt;&lt;span style="vertical-align: baseline;"&gt;Ready to simplify backup?&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;If your team spends hours managing scripts, storage tiers, or backup tags across cloud environments, it may be time to rethink your approach.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Eon was built to make cloud backup resilient, autonomous, and actually useful. From ransomware protection to instant, object-level recovery—and now, zero-ETL access to analytics—we’re here to help you unlock the full potential of your backup data.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://www.eon.io/get-a-demo" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Book a demo&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; to see how Eon can modernize your Google Cloud data protection strategy.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;To discover how Google Cloud can support your startup, visit our &lt;/span&gt;&lt;a href="http://cloud.google.com/startup/apply?utm_source=google&amp;amp;utm_medium=blog&amp;amp;utm_campaign=FY21-Q1-global-demandgen-website-cs-startup_program_mc&amp;amp;utm_content=blog_Descifra&amp;amp;utm_term="&gt;&lt;span style="font-style: italic; text-decoration: underline; vertical-align: baseline;"&gt;program page&lt;/span&gt;&lt;/a&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;. You can also &lt;/span&gt;&lt;a href="https://docs.google.com/forms/d/e/1FAIpQLSfowlgaSsVDQojZ1JDDhRMfZ5TAFY6do4UPZXqkuToX63K2dQ/viewform" rel="noopener" target="_blank"&gt;&lt;span style="font-style: italic; text-decoration: underline; vertical-align: baseline;"&gt;sign up for our newsletter&lt;/span&gt;&lt;/a&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt; to stay informed about community activities, digital events, special offers, and more.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;</description><pubDate>Wed, 18 Jun 2025 16:00:00 +0000</pubDate><guid>https://cloud.google.com/blog/products/storage-data-transfer/data-resilience-eons-approach--google-cloud-best-practices/</guid><category>Startups</category><category>Customers</category><category>Storage &amp; Data Transfer</category><og xmlns:og="http://ogp.me/ns#"><type>article</type><title>Automate data resilience at scale with Eon and Google Cloud Backup</title><description></description><site_name>Google</site_name><url>https://cloud.google.com/blog/products/storage-data-transfer/data-resilience-eons-approach--google-cloud-best-practices/</url></og><author xmlns:author="http://www.w3.org/2005/Atom"><name>Liore Shai</name><title>Solutions Architect, Eon</title><department></department><company></company></author></item><item><title>Enhancing backup vaults with support for Persistent Disk, Hyperdisk, and 
multi-regions</title><link>https://cloud.google.com/blog/products/storage-data-transfer/backup-vaults-add-support-for-disk-backup-and-multi-region/</link><description>&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;em&gt;&lt;span style="vertical-align: baseline;"&gt;&lt;strong&gt;August 11, 2025&lt;/strong&gt;: Backup vault support for persistent disk (PD) and hyperdisk backups is now generally available.&lt;/span&gt;&lt;/em&gt;&lt;/p&gt;
&lt;hr/&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;To help protect against evolving digital threats like ransomware and malicious deletions, last year, we introduced &lt;/span&gt;&lt;a href="https://cloud.google.com/blog/products/storage-data-transfer/backup-and-dr-service-adds-immutable-indelible-backups?e=0"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;backup vault&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; in the Google Cloud Backup and DR service, with support for Compute Engine VM backups. This provided immutable and indelible backup capabilities for mission-critical VMs, for both VM metadata and all their attached disks.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Today, we're&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt; announcing two enhancements to backup vaults that help you better protect more types of workloads:&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Backup vaults now support standalone Persistent Disk (PD) and Hyperdisk backups.&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Now generally available, it enables the direct backup of data on individual disks, providing a granular alternative to backing up the entire virtual machine.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Backup vaults can now be created in multi-region locations. &lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;Now generally available, this capability supports regional data resilience and helps you meet business continuity requirements.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Immutability and indelibility &lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Traditional backups have a well-known vulnerability: if a malicious actor gains access to your environment, nothing stops them from deleting or corrupting the backups themselves, preventing recovery and causing business loss. This is where backup vaults fundamentally change the game.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;A backup vault provides a secure, isolated storage environment in Google-managed projects that helps ensure your backups are &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;immutable&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; (secured against data modification)&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt; &lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;and &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;indelible &lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;(secured against data deletion), providing protection against cyber attacks such as ransomware. When creating a backup vault, you can specify that vaulted backups must be secured against modification and deletion — even by a backup administrator who would traditionally have the ability to expire backups — until the specified minimum &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;enforced retention&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; timeframe has elapsed. &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Once a backup is stored in a vault, it's logically air-gapped from your Google Cloud project, and cannot be changed during its user-defined enforced retention period. This means:&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;No deletion:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; The backup can’t be accidentally or deliberately deleted before its enforced retention period expires.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;No alteration:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; The backup data cannot be changed, and remains exactly as it was when it was created.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;This gives you the confidence that your crucial recovery points have not been modified, so they are available when you need them.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Backup Vault now supports Persistent Disk and Hyperdisk&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Many applications rely on the durable storage provided by &lt;/span&gt;&lt;a href="https://cloud.google.com/persistent-disk"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Persistent Disk&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; and &lt;/span&gt;&lt;a href="https://cloud.google.com/compute/docs/disks/hyperdisks"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Hyperdisk&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;. With support for Persistent Disk and Hyperdisk in addition to Compute Engine VMs, backup vaults now offer a &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;holistic defense strategy for your entire compute environment&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;:&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;For your VMs:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; &lt;a href="https://cloud.google.com/backup-disaster-recovery/docs/cloud-console/compute/compute-instance-backup"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Backup vaults can help protect your Compute Engine VMs&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; (including VM metadata and all the attached disks). They can provide rapid and secure recovery of operating systems, configurations, application binaries, and all associated disks.&lt;/span&gt;&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;For critical data disks:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Now you can secure specific Persistent Disks and Hyperdisks that contain application data, databases, and file shares. This provides granular protection for scenarios where a full VM backup isn't necessary or where you want to optimize costs.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;This integrated approach ensures that whether you need to restore an entire VM or a specific disk, your recovery points are secured in a backup vault. &lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Key benefits of unified backup vault protection&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;By centralizing your Compute Engine VM, Persistent Disk, and Hyperdisk backups within backup vaults, you gain a powerful suite of advantages that transform your data protection strategy from reactive to proactively resilient:&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Unified interface for easy management:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Easily define and enforce consistent backup policies (including backup frequency and retention period) across your entire organization. Manage backups for your Compute Engine VMs, Persistent Disks, and Hyperdisks from a unified interface, even across multiple Google Cloud projects, simplifying administration.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Comprehensive monitoring and reporting:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Benefit from centralized monitoring, detailed reporting, and timely alerting capabilities that streamline your day-to-day backup management. This enhanced visibility also significantly aids in meeting stringent audit and compliance requirements by providing clear, verifiable records of your backup posture.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Proactive security integration:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Elevate your overall security posture with integration to &lt;/span&gt;&lt;a href="https://cloud.google.com/security/products/security-command-center"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Security Command Center&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, enabling proactive detection of anomalous activities, such as unauthorized backup deletion attempts or suspicious policy changes, so you can respond swiftly and decisively to threats.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Reduced operational complexity:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Consolidate your backup management processes, moving away from disparate, script-based, or manual solutions. Backup and DR service provides a streamlined, fully managed service that simplifies operations, reduces human error, and frees up valuable IT resources, so you can focus on innovation.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Here's how it works&lt;/strong&gt;&lt;/h3&gt;
&lt;ol&gt;
&lt;li aria-level="1" style="list-style-type: decimal; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Create a backup vault:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Begin by establishing a secure backup vault. This vault acts as your designated, isolated, and highly protected storage destination for all your managed backups.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: decimal; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Define a backup plan:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Next, create a comprehensive backup plan, specifying parameters such as the desired backup frequency (how often your disks will be backed up), backup retention period, and designating the specific backup vault where the backup data will be stored.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: decimal; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Schedule your backups:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Now you are ready to apply your backup plan to your desired Persistent Disks or Hyperdisks. The Backup and DR service automatically takes incremental crash-consistent backups according to your defined schedule, with no manual intervention on your part.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Once these backups are created and stored in your designated vault, the vault’s enforced retention policy is automatically applied, making the backups immutable and indelible for the specified enforced retention period.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
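&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;The three steps above might look like the following with the gcloud backup-dr command group. This is a hedged sketch: resource names and locations are placeholders, and the flag spellings and value formats (especially the backup-rule syntax) are assumptions that should be verified against the current gcloud reference before use.&lt;/span&gt;&lt;/p&gt;

```shell
# 1. Create a backup vault with a minimum enforced retention period.
#    (Flag names and duration format are assumptions to verify.)
gcloud backup-dr backup-vaults create example-vault \
    --location=us-central1 \
    --backup-min-enforced-retention=30d

# 2. Define a backup plan that stores its backups in the vault.
gcloud backup-dr backup-plans create example-disk-plan \
    --location=us-central1 \
    --backup-vault=example-vault \
    --resource-type=compute.googleapis.com/Disk \
    --backup-rule=rule-id=daily,retention-days=30,recurrence=DAILY,backup-window-start=2,backup-window-end=8

# 3. Associate the plan with a disk so incremental backups run on schedule.
gcloud backup-dr backup-plan-associations create example-association \
    --location=us-central1 \
    --backup-plan=example-disk-plan \
    --resource=projects/my-project/zones/us-central1-a/disks/my-data-disk \
    --resource-type=compute.googleapis.com/Disk
```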
&lt;div class="block-image_full_width"&gt;






  
    &lt;div class="article-module h-c-page"&gt;
      &lt;div class="h-c-grid"&gt;
  

    &lt;figure class="article-image--large
      
      
        h-c-grid__col
        h-c-grid__col--6 h-c-grid__col--offset-3
        
        
      "
      &gt;

      
      
        
        &lt;img
            src="https://storage.googleapis.com/gweb-cloudblog-publish/images/1_-_persistent-disk-backups-to-backup-vaul.max-1000x1000.png"
        
          alt="1 - persistent-disk-backups-to-backup-vault-for-cyber-resilience"&gt;
        
      
    &lt;/figure&gt;

  
      &lt;/div&gt;
    &lt;/div&gt;
  




&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Secure disaster recovery with multi-region backup vaults&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;In addition, you can now create backup vaults in Google-managed, multi-region locations. When using a multi-region backup vault, data is stored in more than one geographic region, thereby providing the security benefits of backup vault, while also making critical backup data available during unforeseen events. &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Using multi-region backup vaults lets you:&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Retain data access:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Maintain accessibility and recoverability of critical backup data during a regional service disruption (such as natural disasters, power outages).&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Satisfy business continuity requirements:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Instill confidence in your business operations with your ability to perform on-demand, backup-based recoveries.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Secure your data: &lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;Retain all of the critical security benefits delivered by backup vaults.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Multi-region backup vault storage is generally available and currently supports Compute Engine full VM backups and disk backups to &lt;/span&gt;&lt;a href="https://cloud.google.com/backup-disaster-recovery/docs/concepts/backup-vault#multi-regions"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;supported Locations&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;. Complete &lt;/span&gt;&lt;a href="https://docs.google.com/forms/d/e/1FAIpQLSfxmIpvwA57BgYNGwc9A7RdB29a6om1ky2eCdOYQNiJfYwGzw/viewform?usp=header" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;this form&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; to request access to the new feature.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-image_full_width"&gt;






  
    &lt;div class="article-module h-c-page"&gt;
      &lt;div class="h-c-grid"&gt;
  

    &lt;figure class="article-image--medium
      
      
        h-c-grid__col
        
        h-c-grid__col--4 h-c-grid__col--offset-4
        
      "
      &gt;

      
      
        
        &lt;img
            src="https://storage.googleapis.com/gweb-cloudblog-publish/original_images/2_-_Backup_vault_creation_screen_-_multi-region_ZlQQWT1.png"
        
          alt="2 - Backup vault creation screen - multi-region"&gt;
        
      
    &lt;/figure&gt;

  
      &lt;/div&gt;
    &lt;/div&gt;
  




&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Protect all your critical Compute Engine data&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;With the addition of multi-region backup vaults and disk-level backup, Backup and DR service can secure and recover critical Compute Engine data better than ever. Try the new capabilities yourself to optimize your VM data protection strategy.&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;To learn more about disk backup, start &lt;/span&gt;&lt;a href="https://cloud.corp.google.com/backup-disaster-recovery/docs/quickstarts/disk-backup-vault" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;here&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;To learn more about multi-region backup vaults, start &lt;/span&gt;&lt;a href="https://cloud.google.com/backup-disaster-recovery/docs/concepts/backup-vault#multi-regions"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;here&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;To request access to use multi-region backup vaults, please complete &lt;/span&gt;&lt;a href="https://docs.google.com/forms/d/e/1FAIpQLSfxmIpvwA57BgYNGwc9A7RdB29a6om1ky2eCdOYQNiJfYwGzw/viewform?usp=header" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;this form&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;See &lt;/span&gt;&lt;a href="https://cloud.google.com/backup-disaster-recovery/pricing" style="font-family: -apple-system, BlinkMacSystemFont, 'Segoe UI', Roboto, Oxygen, Ubuntu, Cantarell, 'Open Sans', 'Helvetica Neue', sans-serif;"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;here&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; for pricing information relating to the new capabilities.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;&lt;/div&gt;
&lt;div class="block-related_article_tout"&gt;





&lt;div class="uni-related-article-tout h-c-page"&gt;
  &lt;section class="h-c-grid"&gt;
    &lt;a href="https://cloud.google.com/blog/products/storage-data-transfer/backup-and-dr-service-adds-immutable-indelible-backups/"
       data-analytics='{
                       "event": "page interaction",
                       "category": "article lead",
                       "action": "related article - inline",
                       "label": "article: {slug}"
                     }'
       class="uni-related-article-tout__wrapper h-c-grid__col h-c-grid__col--8 h-c-grid__col-m--6 h-c-grid__col-l--6
        h-c-grid__col--offset-2 h-c-grid__col-m--offset-3 h-c-grid__col-l--offset-3 uni-click-tracker"&gt;
      &lt;div class="uni-related-article-tout__inner-wrapper"&gt;
        &lt;p class="uni-related-article-tout__eyebrow h-c-eyebrow"&gt;Related Article&lt;/p&gt;

        &lt;div class="uni-related-article-tout__content-wrapper"&gt;
          &lt;div class="uni-related-article-tout__image-wrapper"&gt;
            &lt;div class="uni-related-article-tout__image" style="background-image: url('')"&gt;&lt;/div&gt;
          &lt;/div&gt;
          &lt;div class="uni-related-article-tout__content"&gt;
            &lt;h4 class="uni-related-article-tout__header h-has-bottom-margin"&gt;Introducing backup vaults for cyber resilience and simplified Compute Engine backups&lt;/h4&gt;
            &lt;p class="uni-related-article-tout__body"&gt;Protect your data with Google Cloud&amp;#x27;s enhanced Backup and DR service, featuring immutable backup vaults and streamlined management for da...&lt;/p&gt;
            &lt;div class="cta module-cta h-c-copy  uni-related-article-tout__cta muted"&gt;
              &lt;span class="nowrap"&gt;Read Article
                &lt;svg class="icon h-c-icon" role="presentation"&gt;
                  &lt;use xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="#mi-arrow-forward"&gt;&lt;/use&gt;
                &lt;/svg&gt;
              &lt;/span&gt;
            &lt;/div&gt;
          &lt;/div&gt;
        &lt;/div&gt;
      &lt;/div&gt;
    &lt;/a&gt;
  &lt;/section&gt;
&lt;/div&gt;

&lt;/div&gt;</description><pubDate>Tue, 17 Jun 2025 17:30:00 +0000</pubDate><guid>https://cloud.google.com/blog/products/storage-data-transfer/backup-vaults-add-support-for-disk-backup-and-multi-region/</guid><category>Security &amp; Identity</category><category>Compute</category><category>Storage &amp; Data Transfer</category><og xmlns:og="http://ogp.me/ns#"><type>article</type><title>Enhancing backup vaults with support for Persistent Disk, Hyperdisk, and multi-regions</title><description></description><site_name>Google</site_name><url>https://cloud.google.com/blog/products/storage-data-transfer/backup-vaults-add-support-for-disk-backup-and-multi-region/</url></og><author xmlns:author="http://www.w3.org/2005/Atom"><name>Jaswant Chajed</name><title>Product Manager, Google Cloud</title><department></department><company></company></author><author xmlns:author="http://www.w3.org/2005/Atom"><name>Jerome McFarland</name><title>Product Manager, Google Cloud</title><department></department><company></company></author></item><item><title>Selecting the right Hyperdisk block storage for your workloads</title><link>https://cloud.google.com/blog/products/storage-data-transfer/how-to-choose-the-right-hyperdisk-block-storage-for-your-use-case/</link><description>&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;As you adopt Google Cloud or migrate to the latest Compute Engine VMs or to Google Kubernetes Engine (GKE), selecting the right block storage for your workload is crucial. 
&lt;/span&gt;&lt;a href="https://cloud.google.com/compute/docs/disks/hyperdisks"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Hyperdisk,&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; Google Cloud's workload-optimized block storage that’s designed for our latest &lt;/span&gt;&lt;a href="https://cloud.google.com/compute/docs/machine-resource"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;VM families&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; (C4, N4, M4, and more), delivers high-performance storage volumes that are cost-efficient, easily managed at scale, and enterprise-ready. In this post, we guide you through the basics and help you choose the optimal Hyperdisk for your environment.&lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Introduction to Hyperdisk block storage&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;With Hyperdisk, you can independently tune capacity and performance to match your block storage resources to your workload. &lt;/span&gt;&lt;a href="https://cloud.google.com/compute/docs/disks/hyperdisks#when-to-use"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Hyperdisk is available in a few flavors:&lt;/span&gt;&lt;/a&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Hyperdisk Balanced:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Designed to fit most workloads and offers the best combination and balance of price and performance. This is also the boot disk for your compute instances. With Hyperdisk Balanced, you can independently configure the capacity, throughput, and IOPS of each volume. Hyperdisk Balanced is available in High Availability and Multi-writer mode.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Hyperdisk Extreme:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Delivers the highest IOPS of all Hyperdisk offerings and is suited for high-end, performance-critical databases. With Hyperdisk Extreme, you can drive up to 350K IOPS from a single volume. &lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Hyperdisk Throughput:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Delivers capacity at the cost of cold object storage with the semantics of a disk. Hyperdisk Throughput offers high throughput for bandwidth and capacity-intensive workloads that do not require low latency. It also can be used to deliver cost-effective disks for cost-sensitive workloads (e.g., cold disks).&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Hyperdisk ML:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Purpose-built for loading static data into your compute clusters. With Hyperdisk ML, you hydrate the disk with a fixed data set (such as model weights or binaries), then connect  &lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;up to 2,500 compute instances to the same volume, so a single volume can serve over 150x more compute instances than competitive block storage volumes&lt;/span&gt;&lt;sup&gt;&lt;span style="vertical-align: baseline;"&gt;&lt;span style="vertical-align: super;"&gt;1&lt;/span&gt;&lt;/span&gt;&lt;/sup&gt;&lt;span style="vertical-align: baseline;"&gt; in read-only mode&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;. You get exceptionally high aggregate throughput across all of those nodes, enabling you to accelerate inference startup, train models faster, and ensure your valuable compute resources are highly utilized. &lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;You can also leverage Hyperdisk &lt;/span&gt;&lt;a href="https://cloud.google.com/compute/docs/disks/storage-pools"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Storage Pools&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, which lowers TCO and simplifies operations by pre-provisioning an aggregate amount of capacity and performance, which is then dynamically consumed by volumes in that pool. You create a storage pool with the aggregate capacity and performance that your workloads will need, and then create disks in the storage pool. You can then attach the disks to your VMs. When you create the disks, you can create them with a much larger size or provisioned performance limit than is needed. This simplifies planning and provides room for growth later, without needing to change the disk's provisioned size or performance. &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;You can also use a set of comprehensive data protection capabilities such as high availability, cross-region replication and recovery, backup, and snapshots to protect your business critical workloads.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;For specifics around capabilities, capacity, machine support, and performance, please &lt;/span&gt;&lt;a href="https://cloud.google.com/compute/docs/disks/hyperdisks#when-to-use"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;visit the documentation&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-aside"&gt;&lt;dl&gt;
    &lt;dt&gt;aside_block&lt;/dt&gt;
    &lt;dd&gt;&amp;lt;ListValue: [StructValue([(&amp;#x27;title&amp;#x27;, &amp;#x27;Try Google Cloud for free&amp;#x27;), (&amp;#x27;body&amp;#x27;, &amp;lt;wagtail.rich_text.RichText object at 0x7f3f903fc3a0&amp;gt;), (&amp;#x27;btn_text&amp;#x27;, &amp;#x27;Get started for free&amp;#x27;), (&amp;#x27;href&amp;#x27;, &amp;#x27;https://console.cloud.google.com/freetrial?redirectPath=/welcome&amp;#x27;), (&amp;#x27;image&amp;#x27;, None)])]&amp;gt;&lt;/dd&gt;
&lt;/dl&gt;&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Recommendations for the most common workloads&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;To make choosing the right Hyperdisk architecture simpler, here are high-level recommendations for some of the most common workloads we see. For an enterprise, the Hyperdisk portfolio lets you optimize an entire three-tier application matching the needs of each component of your application to the different flavors of Hyperdisk.&lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Enterprise applications including general-purpose databases&lt;/strong&gt;&lt;strong style="vertical-align: baseline;"&gt;:&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;a href="https://cloud.google.com/compute/docs/disks/hd-types/hyperdisk-balanced"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Hyperdisk Balanced&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; combined with Storage Pools offers an excellent solution for a wide variety of general-purpose workloads, including common database workloads. Hyperdisk Balanced can meet the IOPS and throughput needs for most databases including Clickhouse, MySQL, and PostgreSQL, at general-purpose pricing. Hyperdisk Balanced offers 160K IOPS per volume — better than AWS EBS gp3 volumes&lt;/span&gt;&lt;sup&gt;&lt;span style="vertical-align: baseline;"&gt;&lt;span style="vertical-align: super;"&gt;2&lt;/span&gt;&lt;/span&gt;&lt;/sup&gt;&lt;span style="vertical-align: baseline;"&gt;. With Storage Pools you can enhance efficiency and radically simplify planning. Storage Pools allows customers to save approximately 20-40% on storage costs for typical database workloads when compared to Hyperdisk Balanced Volumes or AWS EBS gp3 volumes&lt;/span&gt;&lt;sup&gt;&lt;span style="vertical-align: baseline;"&gt;&lt;span style="vertical-align: super;"&gt;3&lt;/span&gt;&lt;/span&gt;&lt;/sup&gt;&lt;span style="vertical-align: baseline;"&gt;. &lt;/span&gt;&lt;/p&gt;
&lt;p style="padding-left: 40px;"&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;“At Sentry.io, a platform used by over 4 million developers and 130,000 teams worldwide to quickly debug and resolve issues, adopting Google Cloud's Hyperdisk has enabled us to create a flexible architecture for the next-generation of our Event Analytics Platform, a product at the core of our business. Hyperdisk Storage Pools with advanced capacity and performance enabled us to reduce our planning cycles from weeks to minutes, while saving 37% in storage costs, compared to persistent disks.” &lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;- Dave Rosenthal, CTO, Sentry&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-video"&gt;



&lt;div class="article-module article-video "&gt;
  &lt;figure&gt;
    &lt;a class="h-c-video h-c-video--marquee"
      href="https://youtube.com/watch?v=jjU8F8FjzCM"
      data-glue-modal-trigger="uni-modal-jjU8F8FjzCM-"
      data-glue-modal-disabled-on-mobile="true"&gt;

      
        &lt;img src="//img.youtube.com/vi/jjU8F8FjzCM/maxresdefault.jpg"
             alt="Sentry: Creating the next generation data platform with Hyperdisk and Storage Pools"/&gt;
      
      &lt;svg role="img" class="h-c-video__play h-c-icon h-c-icon--color-white"&gt;
        &lt;use xlink:href="#mi-youtube-icon"&gt;&lt;/use&gt;
      &lt;/svg&gt;
    &lt;/a&gt;

    
  &lt;/figure&gt;
&lt;/div&gt;

&lt;div class="h-c-modal--video"
     data-glue-modal="uni-modal-jjU8F8FjzCM-"
     data-glue-modal-close-label="Close Dialog"&gt;
   &lt;a class="glue-yt-video"
      data-glue-yt-video-autoplay="true"
      data-glue-yt-video-height="99%"
      data-glue-yt-video-vid="jjU8F8FjzCM"
      data-glue-yt-video-width="100%"
      href="https://youtube.com/watch?v=jjU8F8FjzCM"
      ng-cloak&gt;
   &lt;/a&gt;
&lt;/div&gt;

&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;p style="padding-left: 40px;"&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;“High Availability is essential for Blackline — we run database failover clustering, at massive scale, for our global and mission-critical deployment of Financial Close Management. We are excited to bring this workload to Google Cloud leveraging Hyperdisk Balanced High Availability to meet the performance, capacity, cost efficiency, and resilience requirements that our customers demand, and helps us address our customer’s financial regulatory needs globally.” &lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;- Justin Brodley, SVP of Cloud Engineering and Operations, Blackline&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Tier-0 databases&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;For high-end, performance-critical databases like SAP HANA, SQL Server, and Oracle Database, &lt;/span&gt;&lt;a href="https://cloud.google.com/compute/docs/disks/hd-types/hyperdisk-extreme"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Hyperdisk Extreme&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; delivers uncompromising performance. With Hyperdisk Extreme, you can obtain up to 350K IOPS and 10 GiB/s of throughput from a single volume.     &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;AI, analytics, and scale-out workloads&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Hyperdisk offers excellent solutions for the most demanding next-generation machine learning and high performance computing workloads. &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Dynamically scaling AI and analytics workloads and high-performance file systems&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Workloads with fluctuating demand, and high peak throughput and IOPS, benefit from Hyperdisk Balanced and Storage Pools. These workloads can include customer-managed parallel file systems and scratch disks for accelerator clusters. Storage Pools’ dynamic resource allocation helps ensure that these workloads get the performance they need during peak times without requiring constant manual adjustments or inefficient over-provisioning. Further, once your Storage Pool is set up, planning at the per-disk level is significantly simpler. Note: If you want a fully managed file system, &lt;/span&gt;&lt;a href="https://cloud.google.com/managed-lustre/docs/overview"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Managed Lustre&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; is an excellent option for you to consider.  &lt;/span&gt;&lt;/p&gt;
&lt;p style="padding-left: 40px;"&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;“Combining our use of cutting-edge machine learning in quantitative trading at Hudson River Trading (HRT) with Google Cloud's accelerator-optimized machines, Dynamic Workload Scheduler (DWS) and Hyperdisk has been transformative in enabling us to develop [state-of-the-art] models. Hyperdisk storage pools have delivered substantial cost savings, lowering our storage expenses by approximately 50% compared to standard Hyperdisk while minimizing the amount of planning needed.&lt;/span&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;” &lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;- Ragnar Kjørstad, Systems Engineer, Hudson River Trading&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-video"&gt;



&lt;div class="article-module article-video "&gt;
  &lt;figure&gt;
    &lt;a class="h-c-video h-c-video--marquee"
      href="https://youtube.com/watch?v=U1NbkumODpg"
      data-glue-modal-trigger="uni-modal-U1NbkumODpg-"
      data-glue-modal-disabled-on-mobile="true"&gt;

      
        &lt;img src="//img.youtube.com/vi/U1NbkumODpg/maxresdefault.jpg"
             alt="Hudson River Trading: Powering cutting-edge quantitative research models with Google Cloud"/&gt;
      
      &lt;svg role="img" class="h-c-video__play h-c-icon h-c-icon--color-white"&gt;
        &lt;use xlink:href="#mi-youtube-icon"&gt;&lt;/use&gt;
      &lt;/svg&gt;
    &lt;/a&gt;

    
  &lt;/figure&gt;
&lt;/div&gt;

&lt;div class="h-c-modal--video"
     data-glue-modal="uni-modal-U1NbkumODpg-"
     data-glue-modal-close-label="Close Dialog"&gt;
   &lt;a class="glue-yt-video"
      data-glue-yt-video-autoplay="true"
      data-glue-yt-video-height="99%"
      data-glue-yt-video-vid="U1NbkumODpg"
      data-glue-yt-video-width="100%"
      href="https://youtube.com/watch?v=U1NbkumODpg"
      ng-cloak&gt;
   &lt;/a&gt;
&lt;/div&gt;

&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;AI/ML and HPC data-load acceleration&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Hyperdisk ML is specifically optimized for accelerating data load times for inference, training and HPC workloads —  Hyperdisk ML accelerates model load time by 3-5x compared to common alternatives&lt;/span&gt;&lt;sup&gt;&lt;span style="vertical-align: baseline;"&gt;&lt;span style="vertical-align: super;"&gt;4&lt;/span&gt;&lt;/span&gt;&lt;/sup&gt;&lt;span style="vertical-align: baseline;"&gt;. Hyperdisk ML is particularly well-suited for serving tasks compared to other storage services on Google Cloud because it can concurrently provide to many VMs exceptionally high aggregate throughput (up to 1.2 TiB/s of aggregate throughput per volume, offering greater than 100x higher performance than competitive offerings)&lt;/span&gt;&lt;sup&gt;&lt;span style="vertical-align: baseline;"&gt;&lt;span style="vertical-align: super;"&gt;5&lt;/span&gt;&lt;/span&gt;&lt;/sup&gt;&lt;span style="vertical-align: baseline;"&gt;. You write once (up to 64 TiB per disk) and attach multiple VM instances to the same volume in a read-only mode. With Hyperdisk ML you can accelerate data load times for your most expensive compute resources, like GPUs and TPUs. For more, check out &lt;/span&gt;&lt;a href="http://g.co/cloud/storage-design-ai" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;g.co/cloud/storage-design-ai&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;. &lt;/span&gt;&lt;/p&gt;
&lt;p style="padding-left: 40px;"&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;“At Resemble AI, we leverage our proprietary deep-learning models to generate high-quality AI audio through text-to-speech and speech-to-speech synthesis. By combining Google Cloud’s A3 VMs with NVIDIA H100 GPUs and Hyperdisk ML, we’ve achieved significant improvements in our training workflows. Hyperdisk ML has drastically improved our data loader performance, enabling 2x faster epoch cycles compared to similar solutions. This acceleration has empowered our engineering team to experiment more freely, train at scale, and accelerate the path from prototype to production." &lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;-&lt;/span&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt; &lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;Zohaib Ahmed, CEO, Resemble AI&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;High-capacity analytics workloads: &lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;For large-scale data analytics workloads like Hadoop and Kafka, which are less sensitive to disk latency fluctuations, Hyperdisk Throughput provides a cost-effective solution with high throughput. Its low cost per GiB and configurable throughput are ideal for processing large volumes of data with low TCO.&lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;How to size and set up your Hyperdisk&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;To select and size the right Hyperdisk volume types for your workload, answer a few key questions:&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Storage management.&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Decide if you want to manage the block storage for your workloads in a pool or individually. If your workload will have more than 10 TiB of capacity in a single project and zone, you should consider using Hyperdisk Storage Pools to lower your TCO and simplify planning. &lt;/span&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;Note that Storage Pools do not affect disk performance; some data protection features such as Replication and High Availability are not supported in Storage Pools. &lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Latency.&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; If your workload requires SSD-like latency (i.e., sub-millisecond), it likely should be served by Hyperdisk Balanced or Hyperdisk Extreme. &lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;IOPS or throughput.&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; If your application requires less than 160K IOPS or 2.4 GiB/s of throughput from a single volume, Hyperdisk Balanced is a great fit. If it needs more than that, consider Hyperdisk Extreme. &lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Sizing performance and capacity.&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Hyperdisk offers independently configurable capacity and performance, allowing you to pay for just the resources you need. You can leverage this capability to lower your TCO by understanding how much capacity your workload needs (i.e., how much data, in GiB or TiB, is stored on the disks which serve this workload) and the peak IOPS and throughput of the disks. If the workload is already running on Google Cloud, you can see many of these metrics in your console under “&lt;/span&gt;&lt;a href="https://cloud.google.com/monitoring/charts/metrics-explorer"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Metrics Explorer&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;.” &lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
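The per-volume decision points above can be sketched as a small helper. The thresholds (sub-millisecond latency tier, 160K IOPS, and 2.4 GiB/s, approximated here as 2458 MiB/s) are the ones quoted in this post; the function itself is purely illustrative.

```shell
# Illustrative per-volume selection helper based on the thresholds above.
# Args: <needs_sub_ms_latency yes|no> <peak_iops> <peak_throughput_mib_s>
pick_hyperdisk() {
  lat=$1; iops=$2; tput=$3
  if [ "$lat" != "yes" ]; then
    echo "Hyperdisk Throughput"                  # latency-tolerant, capacity-heavy
  elif [ "$iops" -le 160000 ] && [ "$tput" -le 2458 ]; then
    echo "Hyperdisk Balanced"                    # fits Balanced per-volume limits
  else
    echo "Hyperdisk Extreme"                     # highest-end databases
  fi
}

pick_hyperdisk yes 40000 1024    # -> Hyperdisk Balanced (general-purpose database)
pick_hyperdisk yes 300000 8192   # -> Hyperdisk Extreme (tier-0 database territory)
```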
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Another important consideration is the level of business continuity and data protection required for your workloads. Different workloads have different Recovery Point Objective (RPO) and Recovery Time Objective (RTO) requirements, each with different costs. Think about your workload tiers when making data-protection decisions. The more critical an application or workload, the lower the tolerance for data loss and downtime. Applications critical to business operations likely require zero RPO and RTO in the order of seconds. Hyperdisk business continuity and data protection helps customers meet the performance, capacity, cost efficiency, and resilience requirements they demand, and helps them address their financial regulatory needs globally. &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Here are a few questions to consider when selecting which variety of Hyperdisk to use for a workload:&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;How do I protect my workloads from attack and malicious insiders? &lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;Use&lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt; &lt;/strong&gt;&lt;a href="https://cloud.google.com/backup-disaster-recovery/docs/concepts/backup-vault"&gt;&lt;strong style="text-decoration: underline; vertical-align: baseline;"&gt;Google Cloud Backup vault&lt;/strong&gt;&lt;/a&gt;&lt;strong style="vertical-align: baseline;"&gt; &lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;for&lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt; &lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;cyber resilience, backup immutability, and indelibility for managed backup reporting and compliance. If you want to self-manage your own backups, Hyperdisk standard snapshots are an option for your workloads.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;How do I protect data from user errors and bad upgrades cost efficiently with low RPO / RTO?&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; You can use our &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;point-in-time recovery with &lt;/strong&gt;&lt;a href="https://cloud.google.com/compute/docs/disks/instant-snapshots"&gt;&lt;strong style="text-decoration: underline; vertical-align: baseline;"&gt;Instant Snapshots&lt;/strong&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;. This feature minimizes the risk of data loss from user error and bad upgrades with ultra-low RPO and RTO — creating a checkpoint is nearly instantaneous.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;How do I easily deploy my critical workload (e.g., MySQL) with resilience across multiple locations?&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; You can utilize &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;Hyperdisk HA. &lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;This is a great fit for scenarios that require high availability and fast failover, such as SQL Server that leverages failover clustering. For such workloads, you can also choose our new capability with &lt;/span&gt;&lt;a href="https://cloud.google.com/compute/docs/disks/hd-types/hyperdisk-balanced-ha"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Hyperdisk Balanced High Availability&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; with &lt;/span&gt;&lt;a href="https://cloud.google.com/compute/docs/disks/sharing-disks-between-vms"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Multi-Writer support&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;. This allows you to run clustered compute with workload-optimized storage in two zones with RPO=0 synchronous replication. &lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;When a disaster occurs, how do I recover my workload elsewhere quickly and reliably, and run drills to confirm my recovery process?&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Utilize our &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;disaster recovery capabilities with &lt;/strong&gt;&lt;a href="https://cloud.google.com/compute/docs/disks/hyperdisks#hd-sync-rep"&gt;&lt;strong style="text-decoration: underline; vertical-align: baseline;"&gt;Hyperdisk Async Replication&lt;/strong&gt;&lt;/a&gt;&lt;strong style="vertical-align: baseline;"&gt; &lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;which&lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt; &lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;enables cross-region continuous replication and recovery from a regional failure, with fast validation support for disaster recovery drills via cloning. Further, consistency group policies help ensure that workload data that’s distributed across multiple disks is recoverable when a workload needs to fail over between regions.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
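As a minimal sketch of how the point-in-time and cross-region protections above look in practice, the following gcloud commands create an instant snapshot and set up async replication. All disk, zone, region, and project names are illustrative placeholders, and exact required flags (for example, disk size and type for the secondary disk) may vary; consult the Hyperdisk documentation before relying on these shapes.

```shell
# Create a near-instantaneous point-in-time checkpoint of a Hyperdisk volume
# (useful before a risky upgrade). Names below are placeholders.
gcloud compute instant-snapshots create my-db-checkpoint \
    --zone=us-central1-a \
    --source-disk=my-hyperdisk-volume

# For disaster recovery: create a secondary disk in the recovery region that
# is paired with the primary, then start continuous async replication.
gcloud compute disks create my-hyperdisk-secondary \
    --zone=us-east1-b \
    --size=100GB \
    --type=hyperdisk-balanced \
    --primary-disk=my-hyperdisk-volume \
    --primary-disk-zone=us-central1-a \
    --primary-disk-project=my-project

gcloud compute disks start-async-replication my-hyperdisk-volume \
    --zone=us-central1-a \
    --secondary-disk=my-hyperdisk-secondary \
    --secondary-disk-zone=us-east1-b \
    --secondary-disk-project=my-project
```

During a drill or failover, you would stop replication and attach the secondary disk to a VM in the recovery region; cloning the secondary lets you validate recoverability without interrupting replication.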
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;In short, Hyperdisk provides a wealth of options to help you match your block storage to the needs of your workloads. Further, selecting the right Hyperdisk variety and leveraging features such as Storage Pools can help you lower your TCO and simplify management. To learn more, please visit our &lt;/span&gt;&lt;a href="https://cloud.google.com/products/block-storage?e=48754805&amp;amp;hl=en"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;website&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;. For tailored recommendations, consult your Google Cloud account team.&lt;/span&gt;&lt;/p&gt;
&lt;hr/&gt;
&lt;p role="presentation"&gt;&lt;sub&gt;&lt;em&gt;&lt;span style="vertical-align: baseline;"&gt;1. As of March 2025 based on published information for &lt;/span&gt;&lt;a href="https://docs.aws.amazon.com/ebs/latest/userguide/ebs-volumes-multi.html#considerations" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Amazon EBS&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, &lt;/span&gt;&lt;a href="https://learn.microsoft.com/en-us/azure/virtual-machines/disks-shared#ultra-disk-ranges" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Azure managed disks&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;.&lt;br/&gt;2. &lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;As of May 2025, compared to &lt;/span&gt;&lt;a href="https://aws.amazon.com/ebs/general-purpose/" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Amazon EBS gp3&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; volumes max iops/volume&lt;br/&gt;3. &lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;As of March 2025, at list price, 50 to 150 TiB, peak IOPS of 25K to 75K and 25% compressibility, compared to &lt;/span&gt;&lt;a href="https://aws.amazon.com/ebs/general-purpose/" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Amazon EBS gp3&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; volumes.&lt;br/&gt;&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;4. As of March 2025, based on internal Google benchmarking, compared to Rapid Storage, GCSFuse with Anywhere Cache, Parallelstore and Lustre for larger node sizes. &lt;br/&gt;5. 
As of March 2025 based on published performance for &lt;a href="https://learn.microsoft.com/en-us/azure/virtual-machines/disks-types" rel="noopener" target="_blank"&gt;Microsoft Azure Ultra SSD&lt;/a&gt; and &lt;a href="https://aws.amazon.com/ebs/features/" rel="noopener" target="_blank"&gt;Amazon EBS io2 BlockExpress&lt;/a&gt;&lt;/span&gt;&lt;/em&gt;&lt;/sub&gt;&lt;/p&gt;
&lt;p role="presentation"&gt;&lt;em&gt;&lt;sup&gt;The authors would like to thank David Seidman and Ruwen Hess for their contributions on this blog.&lt;/sup&gt;&lt;/em&gt;&lt;/p&gt;&lt;/div&gt;</description><pubDate>Wed, 11 Jun 2025 16:00:00 +0000</pubDate><guid>https://cloud.google.com/blog/products/storage-data-transfer/how-to-choose-the-right-hyperdisk-block-storage-for-your-use-case/</guid><category>Compute</category><category>Infrastructure Modernization</category><category>Storage &amp; Data Transfer</category><og xmlns:og="http://ogp.me/ns#"><type>article</type><title>Selecting the right Hyperdisk block storage for your workloads</title><description></description><site_name>Google</site_name><url>https://cloud.google.com/blog/products/storage-data-transfer/how-to-choose-the-right-hyperdisk-block-storage-for-your-use-case/</url></og><author xmlns:author="http://www.w3.org/2005/Atom"><name>Ben Gitenstein</name><title>Group Product Manager, Google</title><department></department><company></company></author><author xmlns:author="http://www.w3.org/2005/Atom"><name>Sai Gopalan</name><title>Product Management, Google Cloud</title><department></department><company></company></author></item><item><title>Google Cloud’s open lakehouse: Architected for AI, open data, and unrivaled performance</title><link>https://cloud.google.com/blog/products/data-analytics/extending-the-google-data-cloud-lakehouse-architecture/</link><description>&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;The Google Data Cloud is a uniquely integrated platform built on Google’s planet-scale infrastructure, infused with AI, and features an &lt;/span&gt;&lt;a href="https://cloud.google.com/solutions/data-lakehouse"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;open lakehouse architecture&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; for multimodal data. Already, organizations like Snap Inc. 
credit Google's Data Cloud and open lakehouse architecture with empowering their data engineers and data scientists to do more with their data assets.&lt;/span&gt;&lt;/p&gt;
&lt;p style="padding-left: 40px;"&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;“Partnering with Google Cloud has been instrumental in our journey to build Snap's next-generation, open lakehouse and democratize Spark and Iceberg in our developer community!" &lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;- Zhengyi Liu, Senior Manager - Software Engineering, Snap Inc.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Today, we’re excited to announce a series of innovations to our AI-powered lakehouse that sets a new standard for openness, intelligence, and performance. These innovations include:&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;BigLake Iceberg native storage:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; leverages Google’s Cloud Storage (GCS) to provide an enterprise-grade experience for managing and interoperating with Iceberg data. This includes &lt;/span&gt;&lt;a href="https://cloud.google.com/biglake"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;BigLake tables for Apache Iceberg&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; (GA) and &lt;/span&gt;&lt;a href="https://cloud.google.com/bigquery/docs/about-blms"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;BigLake Metastore&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; with a new REST Catalog API (Preview).&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;United operational and analytical engines:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; building on the BigLake foundation, customers can seamlessly interoperate on the same Iceberg open data foundation using BigQuery for analytical workloads (GA) and AlloyDB for PostgreSQL (Preview) to target operational needs.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Performance acceleration for BigQuery SQL: &lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;delivering a suite of automated SQL engine enhancements for significantly faster and more agile data processing, featuring the BigQuery advanced runtime, a low-latency query API, column metadata indexing, and an order of magnitude speedup for fine-grained updates/deletes.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;High-performance Lightning Engine for Apache Spark: &lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;our new &lt;/span&gt;&lt;a href="https://cloud.google.com/products/lightning-engine"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Lightning Engine&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; (Preview) is designed to supercharge &lt;/span&gt;&lt;a href="https://cloud.google.com/products/serverless-spark"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Apache Spark&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, leveraging optimized data connectors, efficient columnar shuffle operations, in-built caching, and vectorized execution.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Dataplex Universal Catalog: &lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;extends AI-powered intelligence and unified governance across the Google Cloud data estate by automatically discovering and organizing metadata from data to AI (including BigLake Iceberg, BigQuery, Spanner, Vertex AI models), enabling central policy enforcement via BigLake, and supporting AI-driven curation, data insights and semantic search. &lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;AI-native notebooks and tooling:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; developer experiences are improved with Gemini-powered notebooks, PySpark code generation, and code extensions for JupyterLab and Visual Studio Code. Additionally, third-party notebook interfaces now offer enhanced and integrated experiences.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;&lt;/div&gt;
&lt;div class="block-image_full_width"&gt;






  
    &lt;div class="article-module h-c-page"&gt;
      &lt;div class="h-c-grid"&gt;
  

    &lt;figure class="article-image--large
      
      
        h-c-grid__col
        h-c-grid__col--6 h-c-grid__col--offset-3
        
        
      "
      &gt;

      
      
        
        &lt;img
            src="https://storage.googleapis.com/gweb-cloudblog-publish/images/image1_2AhedOI.max-1000x1000.png"
        
          alt="image1"&gt;
        
        &lt;/a&gt;
      
    &lt;/figure&gt;

  
      &lt;/div&gt;
    &lt;/div&gt;
  




&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Let's explore these new innovations.&lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Expanded BigLake services: Open, unified, and interoperable&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;We are actively reimagining &lt;/span&gt;&lt;a href="https://cloud.google.com/biglake"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;BigLake&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; into a comprehensive storage runtime for Google Data Cloud using &lt;/span&gt;&lt;a href="https://cloud.google.com/storage"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Google's Cloud Storage&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;. This approach lets you build open, managed and high-performance lakehouses that span Google native storage and data stored in open formats. As part of BigLake, we are announcing our new Iceberg native storage, which provides enterprise-grade support for Iceberg on Google’s Cloud Storage through &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;BigLake tables for Apache Iceberg (GA)&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;. BigLake natively supports Google’s Cloud Storage management capabilities and extends these to Iceberg data, enabling you to use storage &lt;/span&gt;&lt;a href="https://cloud.google.com/storage/docs/autoclass"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Autoclass&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; for efficient data tiering to colder storage classes and apply customer-managed encryption keys (CMEK) to your storage buckets. BigLake is also natively supported in our Dataplex Universal Catalog, helping to ensure that centralized governance is consistently enforced across your entire data estate.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Underlying BigLake, the new &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;BigLake Metastore (GA) &lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;with an&lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt; Apache Iceberg REST Catalog API (Preview)&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; allows you to achieve true openness and interoperability across your data ecosystem while simplifying management and governance. BigLake Metastore is built on Google’s planet-scale infrastructure as a unified, managed, serverless, and scalable service, bringing together enterprise metadata that spans BigQuery, Iceberg native storage, and self-managed open formats to support analytics, operational querying, streaming, and AI. The BigLake solution enables universal engine interoperability, supporting a range of query engines — including first-party Google Cloud services such as BigQuery, AlloyDB, and &lt;/span&gt;&lt;a href="https://cloud.google.com/products/serverless-spark"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Google Cloud Serverless for Apache Spark&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, as well as third-party and open-source engines — to consistently operate on Iceberg data managed by BigLake. &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;In addition, it is now easier than ever to bring data into the Iceberg native storage through our enhanced Migration Services that feature automated &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;Iceberg table and metadata migration from Hadoop/Cloudera (Preview)&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; and a push-button &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;Delta to Iceberg service (Preview)&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-aside"&gt;&lt;dl&gt;
    &lt;dt&gt;aside_block&lt;/dt&gt;
    &lt;dd&gt;&amp;lt;ListValue: [StructValue([(&amp;#x27;title&amp;#x27;, &amp;#x27;$300 in free credit to try Google Cloud data analytics&amp;#x27;), (&amp;#x27;body&amp;#x27;, &amp;lt;wagtail.rich_text.RichText object at 0x7f3f90b2d2b0&amp;gt;), (&amp;#x27;btn_text&amp;#x27;, &amp;#x27;Start building for free&amp;#x27;), (&amp;#x27;href&amp;#x27;, &amp;#x27;http://console.cloud.google.com/freetrial?redirectPath=/bigquery/&amp;#x27;), (&amp;#x27;image&amp;#x27;, None)])]&amp;gt;&lt;/dd&gt;
&lt;/dl&gt;&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Analytical and operational engines unite on open data&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;When you need to perform deep analytics, BigQuery can now read and write Iceberg data using BigLake tables for Apache Iceberg. BigQuery further enhances Iceberg tables with features traditionally associated with proprietary data warehouses, offering high-throughput streaming for zero-latency queries, enhanced table management with automatic data reclustering, and the ability to build advanced ETL use cases with support for &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;multi-table transactions (Preview)&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;. In addition, you can leverage BigQuery’s built-in AI capabilities (BQML, AI Query Engine, multimodal analysis) directly on your open datasets. Through this integration, you benefit from the openness and data ownership associated with native Iceberg storage, while simultaneously gaining access to BigQuery's expansive capabilities. In fact, customer adoption of BigLake Iceberg usage with BigQuery has grown nearly 3x in 18 months, now managing hundreds of petabytes.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Unified data management extends beyond analytics into the operational heart of your business, with AlloyDB for PostgreSQL, our high-performance operational database, which can now natively query the same BigLake-managed Iceberg data. Now, your operational applications can tap into the richness of BigLake without complex ETL, and you can apply AlloyDB AI capabilities such as semantic search and natural language querying to your Iceberg data.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Customers like &lt;/span&gt;&lt;a href="https://cloud.google.com/blog/products/databases/bayer-uses-alloydb?e=48754805"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Bayer&lt;/span&gt;&lt;/a&gt; &lt;span style="vertical-align: baseline;"&gt;modernized their data cloud to store and analyze vast amounts of observational data using a combination of AlloyDB and BigQuery. They use BigQuery to produce real-time analytics and insights which are operationalized by AlloyDB, delivering 50% better response rates and 5x more throughput than their previous solution. &lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Unleashing high-performance BigQuery SQL and serverless Spark on open data &lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;We’re also excited to deliver new high-performance data processing, so that all data can be activated quickly and intelligently. We continue to innovate on &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;BigQuery's SQL engine&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; with a suite of unique, automated performance enhancements. The &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;BigQuery advanced runtime (Preview)&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; can automatically accelerate analytical workloads, using enhanced vectorization and short query optimized mode, without requiring any user action or code changes. This is complemented by the &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;BigQuery API &lt;/strong&gt;&lt;a href="https://cloud.google.com/bigquery/docs/running-queries#optional-job-creation"&gt;&lt;strong style="text-decoration: underline; vertical-align: baseline;"&gt;optional job creation mode&lt;/strong&gt;&lt;/a&gt;&lt;strong style="vertical-align: baseline;"&gt; (GA)&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;, which optimizes query paths for short-duration, interactive queries, reducing latency. Further query efficiency is unlocked by the &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;BigQuery column metadata index (CMETA) (GA)&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;, which helps process queries on large tables through more efficient, system-managed data pruning. Other architectural improvements also mean that &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;BigQuery fine-grained updates/deletes (Preview)&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; now operate an order of magnitude faster, increasing agility for large-scale data operations, including on open formats.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Simultaneously, we’re launching an accelerated Apache Spark experience with our new &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;Lightning Engine (Preview)&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; for Apache Spark. The Lightning Engine accelerates Apache Spark performance through highly optimized data connectors for Cloud Storage and BigQuery storage, efficient columnar shuffle operations, and intelligent in-built caching mechanisms. Furthermore, our Lightning Engine leverages vectorized execution built with native C++ libraries (Velox and Gluten), optimized for Apache Spark. This powerful combination delivers &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;3.6x faster Spark performance for TPC-H-like benchmarks&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;. In addition, our Spark offering is AI/ML-ready, providing pre-packaged AI libraries, updated ML runtimes, and easy GPU support, establishing Apache Spark, available via our Google Cloud Serverless for Apache Spark offering or via &lt;/span&gt;&lt;a href="https://cloud.google.com/dataproc"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Dataproc&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; cluster deployments, as a first-class, high-performance citizen in a Google Data Cloud lakehouse environment.&lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Dataplex Universal Catalog: AI-powered intelligence across Google Cloud&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;An effective AI-driven data strategy hinges on having an intelligent and active universal catalog that can operate at any scale. This is what &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;Dataplex Universal Catalog&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; now provides for the Google Data Cloud, transforming your entire distributed data estate into trusted, discoverable, and actionable resources.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://cloud.google.com/dataplex"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Dataplex Universal Catalog&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; automatically discovers, understands, and organizes metadata across your whole analytical and operational landscape. This comprehensive view now includes BigLake-native Iceberg storage, other open formats like Delta and Hudi on Cloud Storage, analytical data in BigQuery, transactional data from databases like Spanner, and metadata from machine learning models in Vertex AI—showcasing pervasive governance across Google's Data Cloud.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;This is also integral to the lakehouse by enabling users to define governance policies centrally and enforce them consistently across multiple data engines through BigLake. This integration supports fine-grained access controls and strengthens governance, across all engines of choice in Google’s Data Cloud. The BigLake solution supports credential vending, which allows users to securely extend centrally defined policies all the way to data in Cloud Storage. &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Dataplex Universal Catalog is powered by AI, with a Gemini-enhanced knowledge graph, transforming metadata into dynamic, actionable intelligence. Here, AI automates metadata curation, infers hidden relationships between data elements, proactively recommends insights from data backed by complex queries, and enables semantic search with natural language. It also fuels new AI-powered experiences and autonomous agents. For instance, Gemini-powered assistance using Dataplex Universal Catalog shows 50% greater precision in identifying datasets, significantly accelerating insights. Dataplex Universal Catalog is also the foundation of an open ecosystem with seamless metadata federation to platforms like Collibra, and ensures broad connectivity through Dataplex Universal Catalog &lt;/span&gt;&lt;a href="https://cloud.google.com/dataplex/docs/reference/rest"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;APIs&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;.&lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Empowering practitioners with AI-native notebooks and tooling&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;At Google Cloud, our goal is to revolutionize the data practitioner's experience by embedding sophisticated AI and lakehouse integrations directly into their preferred tools and workflows. This commitment to an open, flexible, and intelligent environment lets data scientists, engineers, and analysts unlock new levels of productivity and innovation.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Making this possible are our next-gen, &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;AI-native BigQuery Notebooks&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;, which offer a unified and interoperable development experience across SQL, Python, and Apache Spark. This experience is enhanced by deeply embedded Gemini assistive capabilities. Gemini acts as an intelligent collaborator, offering advanced &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;PySpark code generation&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;, insightful explanations of complex code, and direct integration with &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;Cloud Assist Investigations for serverless Spark troubleshooting (Preview)&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;, dramatically reducing development friction and accelerating the path from data to insight. &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Furthermore, new &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;JupyterLab and Visual Studio Code extensions for BigQuery, Dataproc and Google Cloud Serverless for Apache Spark (Preview)&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; allow developers to connect to Google Cloud's open lakehouse capabilities directly from their preferred IDEs with minimal setup. Users can start developing within minutes with access to all their lakehouse datasets and files in their preferred tool, supporting their end-to-end journey from development to deployment. The consumption of notebooks using serverless Spark more than &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;quadrupled from Q1 2024 to Q1 2025&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Together, these integrated advancements help deliver an adaptable, intelligent, high-performance Data Cloud anchored on the lakehouse architecture, equipping organizations to connect all of their data to Google's AI, unlock its full potential, and define innovation in the AI era. Join our &lt;/span&gt;&lt;a href="https://cloudonair.withgoogle.com/events/lakehouse-live" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;virtual customer event on 5/29&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; to learn more about these exciting innovations. &lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;Click here to &lt;/span&gt;&lt;a href="https://cloud.google.com/biglake?e=48754805&amp;amp;hl=en"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;learn more&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; and sign up for early access to these new capabilities. 
We're excited to see the solutions you'll build.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;</description><pubDate>Wed, 28 May 2025 16:00:00 +0000</pubDate><guid>https://cloud.google.com/blog/products/data-analytics/extending-the-google-data-cloud-lakehouse-architecture/</guid><category>Storage &amp; Data Transfer</category><category>Streaming</category><category>Data Analytics</category><og xmlns:og="http://ogp.me/ns#"><type>article</type><title>Google Cloud’s open lakehouse: Architected for AI, open data, and unrivaled performance</title><description></description><site_name>Google</site_name><url>https://cloud.google.com/blog/products/data-analytics/extending-the-google-data-cloud-lakehouse-architecture/</url></og><author xmlns:author="http://www.w3.org/2005/Atom"><name>Andi Gutmans</name><title>VP/GM, Data Cloud, Google Cloud</title><department></department><company></company></author><author xmlns:author="http://www.w3.org/2005/Atom"><name>Yasmeen Ahmad</name><title>Managing Director, Data Cloud, Google Cloud</title><department></department><company></company></author></item></channel></rss>