<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:media="http://search.yahoo.com/mrss/"><channel><title>Systems</title><link>https://cloud.google.com/blog/topics/systems/</link><description>Systems</description><atom:link href="https://cloudblog.withgoogle.com/blog/topics/systems/rss/" rel="self"></atom:link><language>en</language><lastBuildDate>Wed, 22 Apr 2026 13:24:46 +0000</lastBuildDate><image><url>https://cloud.google.com/blog/topics/systems/static/blog/images/google.a51985becaa6.png</url><title>Systems</title><link>https://cloud.google.com/blog/topics/systems/</link></image><item><title>Introducing Virgo Network, Google’s scale-out AI data center fabric</title><link>https://cloud.google.com/blog/products/networking/introducing-virgo-megascale-data-center-fabric/</link><description>&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;The AI era requires a fundamental rethink of physical cloud architecture — networking, in particular. With foundational model parameters growing exponentially, traditional general-purpose networks are reaching their breaking points. To fuel the next decade of machine learning, Google designed Virgo Network, a new megascale AI data center fabric that embraces a "campus-as-a-computer" philosophy, and that underpins our &lt;/span&gt;&lt;a href="https://cloud.google.com/solutions/ai-hypercomputer?e=48754805"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;AI Hypercomputer&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Legacy network designs simply cannot handle some of the constraints of modern AI:&lt;/span&gt;&lt;/p&gt;
&lt;ol&gt;
&lt;li aria-level="1" style="list-style-type: decimal; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Massive scale:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Training demands now exceed the power and space of a single data center, requiring unified, multi-data-center domains.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: decimal; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Explosive bandwidth growth:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Because foundational model training is heavily network-bound, the required bandwidth per accelerator has surged significantly over the last few years, creating throughput and congestion bottlenecks for older architectures.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: decimal; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Synchronized bursts:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Intense, millisecond-level traffic spikes (figure 1) put immense pressure on network buffers. The outcome is that even a single "straggler" node can throttle the entire cluster’s performance.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;&lt;strong style="vertical-align: baseline;"&gt;Low latency: &lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;ML serving requires fast, consistent response times to deliver real-time inference, making strict latency control a critical architectural constraint.&lt;/span&gt;&lt;/li&gt;
&lt;/ol&gt;&lt;/div&gt;
&lt;div class="block-image_full_width"&gt;






  
    &lt;div class="article-module h-c-page"&gt;
      &lt;div class="h-c-grid"&gt;
  

    &lt;figure class="article-image--large
      
      
        h-c-grid__col
        h-c-grid__col--6 h-c-grid__col--offset-3
        
        
      "
      &gt;

      
      
        
        &lt;img
            src="https://storage.googleapis.com/gweb-cloudblog-publish/images/1_Sub-millisecond_line-rate_bursts_of_an_A.max-1000x1000.png"
        
          alt="1 Sub-millisecond line-rate bursts of an AI training workload"&gt;
        
        &lt;/a&gt;
      
        &lt;figcaption class="article-image__caption "&gt;&lt;p data-block-key="s0df9"&gt;Figure 1: Sub-millisecond line-rate bursts of an AI training workload&lt;/p&gt;&lt;/figcaption&gt;
      
    &lt;/figure&gt;

  
      &lt;/div&gt;
    &lt;/div&gt;
  




&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;h3&gt;&lt;span style="vertical-align: baseline;"&gt;Reimagining the data center network&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Meeting the demands of the AI era requires a fundamental shift away from general-purpose network design towards a specialized flat, low-latency network architecture. To address the unique scale and latency constraints, we leverage our proven &lt;/span&gt;&lt;a href="https://cloud.google.com/blog/products/networking/speed-scale-reliability-25-years-of-data-center-networking?e=48754805"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Jupiter&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; network for north-south traffic and are introducing a new fabric for east-west communication. The resulting architecture consists of three distinct and specialized layers that operate as one unified compute domain:&lt;/span&gt;&lt;/p&gt;
&lt;ol&gt;
&lt;li aria-level="1" style="list-style-type: decimal; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Scale-up domain:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; A high-bandwidth, low-latency interconnect fabric designed for tightly coupled communication between accelerators within a single pod. &lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: decimal; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Scale-out accelerator fabric (east-west):&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; A dedicated accelerator-to-accelerator remote direct memory access (RDMA) fabric optimized for massive horizontal scale across pods. This layer is engineered for deterministic latency and maximum resilience, to provide high “&lt;/span&gt;&lt;a href="https://cloud.google.com/blog/products/ai-machine-learning/goodput-metric-as-measure-of-ml-productivity"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;goodput&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;” for the ML workload.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: decimal; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Jupiter front-end network (north-south):&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; A high-capacity fabric that provides fast, reliable access to distributed storage and general-purpose compute resources. It ensures that data access does not become a bottleneck for training and serving workloads, and is also used to scale-across multiple sites for very large training runs.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;This architectural decoupling provides key strategic advantages:&lt;/span&gt;&lt;/p&gt;
&lt;ol&gt;
&lt;li aria-level="1" style="list-style-type: decimal; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Independent evolution:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; We can evolve and upgrade each network domain independently, preventing system-wide disruptions while accelerating the innovation cycle. &lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: decimal; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Dedicated scale-out bandwidth:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; A non-blocking network delivers massive bisectional bandwidth to accelerators for critical training tasks.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;&lt;strong style="vertical-align: baseline;"&gt;ML and network co-design:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; The network is built in lockstep with each new generation of ML accelerators, helping ensure the fabric is matched to the hardware it supports.&lt;/span&gt;&lt;/li&gt;
&lt;/ol&gt;&lt;/div&gt;
&lt;div class="block-image_full_width"&gt;






  
    &lt;div class="article-module h-c-page"&gt;
      &lt;div class="h-c-grid"&gt;
  

    &lt;figure class="article-image--large
      
      
        h-c-grid__col
        h-c-grid__col--6 h-c-grid__col--offset-3
        
        
      "
      &gt;

      
      
        
        &lt;img
            src="https://storage.googleapis.com/gweb-cloudblog-publish/images/2_Data_center_network_architecture.max-1000x1000.png"
        
          alt="2 Data center network architecture"&gt;
        
        &lt;/a&gt;
      
        &lt;figcaption class="article-image__caption "&gt;&lt;p data-block-key="s0df9"&gt;Figure 2: Data center network architecture&lt;/p&gt;&lt;/figcaption&gt;
      
    &lt;/figure&gt;

  
      &lt;/div&gt;
    &lt;/div&gt;
  




&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;h3&gt;&lt;span style="vertical-align: baseline;"&gt;Introducing Virgo Network: Megascale data center fabric&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Virgo Network is a scale-out fabric designed for the extreme requirements of modern AI workloads. Built on high-radix switches that reduce network layers by allowing more ports per switch, it employs a flat, two-layer non-blocking topology. Compared with traditional datacenter networks, this significantly reduces latency by minimizing network tiers. It features a multi-planar design with independent control domains to connect accelerators (figure 3). The accelerator racks also connect with the Jupiter north-south fabric to access compute and storage services. Together, this streamlined architecture delivers the massive bisection bandwidth and deterministic low latency necessary for both distributed training and serving workloads.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-image_full_width"&gt;






  
    &lt;div class="article-module h-c-page"&gt;
      &lt;div class="h-c-grid"&gt;
  

    &lt;figure class="article-image--large
      
      
        h-c-grid__col
        h-c-grid__col--6 h-c-grid__col--offset-3
        
        
      "
      &gt;

      
      
        
        &lt;img
            src="https://storage.googleapis.com/gweb-cloudblog-publish/images/3_Megascale_data_center_fabric_Virgo_Netwo.max-1000x1000.png"
        
          alt="3 Megascale data center fabric (Virgo Network)"&gt;
        
        &lt;/a&gt;
      
        &lt;figcaption class="article-image__caption "&gt;&lt;p data-block-key="s0df9"&gt;Figure 3: Megascale data center fabric (Virgo Network)&lt;/p&gt;&lt;/figcaption&gt;
      
    &lt;/figure&gt;

  
      &lt;/div&gt;
    &lt;/div&gt;
  




&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Virgo Network is the foundation of our next-generation accelerator designs and delivers the following advantages:&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Massive fabric scale&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;: &lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;Virgo Network can link 134,000 chips (TPU 8t) with up to 47 petabits/sec of non-blocking bi-sectional bandwidth in a single fabric.&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt; &lt;/span&gt;&lt;/li&gt;
&lt;li role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Generational performance leap&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;: With up to 4x the bandwidth per accelerator (TPU 8t) over the previous generation, Virgo Network delivers the bandwidth you need to get the full power of every chip. &lt;/span&gt;&lt;/li&gt;
&lt;li role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Predictable low latency&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;: Virgo Network delivers 40% lower unloaded fabric latency for TPUs compared to previous generation leading to more predictable performance for latency sensitive AI workloads.&lt;/span&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;&lt;span style="vertical-align: baseline;"&gt;Improving reliability at scale&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;In a system supporting hundreds of thousands of chips, hardware failures are inevitable. Because a single faulty component can disrupt a synchronized training job, reliability at scale is a primary focus. To maximize workload goodput, we designed the Virgo Network architecture around fault isolation, deep observability, and the rapid mitigation of hangs and stragglers.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;At this scale, system-wide resilience requires a solid network foundation. Virgo Network integrates independent switching planes that provide robust fault isolation, protecting cluster-wide goodput from being degraded by localized hardware failures.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-image_full_width"&gt;






  
    &lt;div class="article-module h-c-page"&gt;
      &lt;div class="h-c-grid"&gt;
  

    &lt;figure class="article-image--large
      
      
        h-c-grid__col
        h-c-grid__col--6 h-c-grid__col--offset-3
        
        
      "
      &gt;

      
      
        
        &lt;img
            src="https://storage.googleapis.com/gweb-cloudblog-publish/images/4_How_fail-stop_and_fail-slow_impact_MTTR.max-1000x1000.png"
        
          alt="4 How fail-stop and fail-slow impact MTTR"&gt;
        
        &lt;/a&gt;
      
        &lt;figcaption class="article-image__caption "&gt;&lt;p data-block-key="s0df9"&gt;Figure 4: How fail-stop and fail-slow impact MTTR&lt;/p&gt;&lt;/figcaption&gt;
      
    &lt;/figure&gt;

  
      &lt;/div&gt;
    &lt;/div&gt;
  




&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Building on this foundation, we optimize the software and orchestration stack to maximize mean-time between interruptions (MTBI) and minimize mean-time to recovery (MTTR) through two primary areas:&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Observability:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Reliability at scale requires high-fidelity visibility. We use sub-millisecond telemetry to monitor network systems. This deep visibility allows us to detect transient congestion, optimize buffer management, and pinpoint the root causes of slowdowns across the hardware and software stack.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Identifying stragglers and hangs:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Proactive monitoring is critical for identifying nodes that are experiencing performance degradation (stragglers) or that have stopped responding completely (hangs). By rapidly localizing these bottlenecks, with automated &lt;/span&gt;&lt;a href="https://cloud.google.com/blog/products/compute/stragglers-in-ai-a-guide-to-automated-straggler-detection?e=48754805"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;straggler&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; and newly added hang detection, we accelerate the training job and protect it from localized slowdowns.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
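&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;The short sketch below illustrates one simple way to flag stragglers and hangs from per-node step times. The thresholds, telemetry source, and node names are illustrative assumptions for this example only, not the detection logic actually deployed in our fleet.&lt;/span&gt;&lt;/p&gt;
&lt;pre&gt;
import statistics

def find_stragglers(step_times_ms, slow_factor=1.2, hang_timeout_ms=60_000):
    """Flag nodes whose latest step is unusually slow (straggler) or missing (hang).

    step_times_ms maps node id to the most recent step duration in milliseconds;
    None means no step completed in the reporting window. Thresholds are
    illustrative, not production values."""
    completed = [t for t in step_times_ms.values() if t is not None]
    baseline = statistics.median(completed)
    report = {}
    for node, t in step_times_ms.items():
        if t is None or t &gt;= hang_timeout_ms:
            report[node] = "hang"
        elif t &gt; baseline * slow_factor:
            report[node] = "straggler"
    return report

print(find_stragglers({"node-0": 810, "node-1": 805, "node-2": 1150, "node-3": None}))
# {'node-2': 'straggler', 'node-3': 'hang'}
&lt;/pre&gt;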
&lt;h3&gt;&lt;span style="vertical-align: baseline;"&gt;The foundation of the AI Hypercomputer&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Virgo Network is a reimagined scale-out data center network custom-built for the stringent demands of modern AI workloads. This flat, multi-planar architecture unifies accelerators across pods into a single compute domain, addressing the bandwidth and scale limitations of traditional networks. By providing robust fault isolation directly at the hardware level, Virgo Network serves as the foundation for system-wide resilience, protecting synchronized workloads from localized hardware faults. &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Ultimately, Virgo Network delivers the scale, predictable latency, and reliability necessary to accelerate the agentic AI era. To learn more about how we are building infrastructure for the future of AI, visit our&lt;/span&gt;&lt;a href="https://cloud.google.com/ai-infrastructure"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt; AI infrastructure solutions page&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, explore the &lt;/span&gt;&lt;a href="https://docs.cloud.google.com/architecture/ai-ml"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;technical documentation&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, or attend the dedicated breakout &lt;/span&gt;&lt;a href="https://www.googlecloudevents.com/next-vegas/session-library?session_id=3913087&amp;amp;name=how-google&amp;amp;" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;session&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; at Google Cloud Next.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;</description><pubDate>Wed, 22 Apr 2026 12:00:00 +0000</pubDate><guid>https://cloud.google.com/blog/products/networking/introducing-virgo-megascale-data-center-fabric/</guid><category>Infrastructure</category><category>AI &amp; Machine Learning</category><category>Google Cloud Next</category><category>Systems</category><category>Networking</category><media:content height="540" url="https://storage.googleapis.com/gweb-cloudblog-publish/images/GCN26_102_BlogHeader_2436x1200_Opt_2_Dark.max-600x600.jpg" width="540"></media:content><og xmlns:og="http://ogp.me/ns#"><type>article</type><title>Introducing Virgo Network, Google’s scale-out AI data center fabric</title><description></description><image>https://storage.googleapis.com/gweb-cloudblog-publish/images/GCN26_102_BlogHeader_2436x1200_Opt_2_Dark.max-600x600.jpg</image><site_name>Google</site_name><url>https://cloud.google.com/blog/products/networking/introducing-virgo-megascale-data-center-fabric/</url></og><author xmlns:author="http://www.w3.org/2005/Atom"><name>Benny Siman-Tov</name><title>Senior Director Product Management</title><department></department><company></company></author><author xmlns:author="http://www.w3.org/2005/Atom"><name>Arjun Singh</name><title>Engineering Fellow</title><department></department><company></company></author></item><item><title>AI infrastructure efficiency: Ironwood TPUs deliver 3.7x carbon efficiency gains</title><link>https://cloud.google.com/blog/topics/systems/ironwood-tpus-deliver-37x-carbon-efficiency-gains/</link><description>&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;&lt;span style="vertical-align: baseline;"&gt;At Google, we are committed to being &lt;/span&gt;&lt;a href="https://cloud.google.com/blog/topics/sustainability/tpus-improved-carbon-efficiency-of-ai-workloads-by-3x?e=48754805"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;transparent about the environmental impact of our AI infrastructure&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, publishing metrics on the lifetime emissions of our chips — from manufacturing to powering these chips in the data center. 
Today, &lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;we are updating these metrics for our seventh-generation TPU, Ironwood, which demonstrates an approximately 3.7x improvement in Compute Carbon Intensity (CCI) compared to TPU v5p&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;, &lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;the previous generation of performance-optimized TPUs&lt;/span&gt;.&lt;/span&gt;&lt;sup&gt;&lt;span style="vertical-align: baseline;"&gt;&lt;span style="vertical-align: super;"&gt;1&lt;/span&gt;&lt;/span&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;In other words, despite the fact that AI is driving demand for additional compute resources, our ongoing work to optimize AI hardware is helping to improve the energy consumption and emissions of AI workloads.&lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;span style="vertical-align: baseline;"&gt;Measuring AI accelerator efficiency: Compute Carbon Intensity (CCI)&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;To help manage the environmental impact of AI workloads, we monitor the Compute Carbon Intensity (CCI) of our AI accelerator hardware. CCI is defined in &lt;/span&gt;&lt;a href="https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=11097303" rel="noopener" target="_blank"&gt;&lt;span style="font-style: italic; text-decoration: underline; vertical-align: baseline;"&gt;An Introduction to Life-Cycle Emissions of Artificial Intelligence Hardware&lt;/span&gt;&lt;/a&gt;&lt;sup&gt;&lt;span style="vertical-align: baseline;"&gt;&lt;span style="vertical-align: super;"&gt;2&lt;/span&gt;&lt;/span&gt;&lt;/sup&gt;&lt;span style="vertical-align: baseline;"&gt;&lt;sup&gt; &lt;/sup&gt;as the estimated amount of CO2 equivalent emitted for every utilized floating-point operation (CO2e/FLOP). This metric provides a holistic, chip-level view by including both the embodied emissions associated with manufacturing, transportation, and data center construction (Scope 3), as well as the operational emissions associated with running these chips in data centers (Scope 1 and 2).&lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;span style="vertical-align: baseline;"&gt;The Ironwood advantage: high performance, low footprint&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Google’s TPU CCI continues to improve with each chip generation. &lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;Drawing from empirical data measured in January 2026, Ironwood demonstrates a remarkable 3.7x &lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;improvement&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt; in CCI relative to TPU v5p. This accelerates efficiency gains from the 1.2x CCI improvement of TPU v5p relative to TPU v4, and demonstrates continued carbon efficiency optimization of Google’s performance-optimized TPU architecture.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;These efficiency gains are driven by outsized compute performance increases between TPU generations relative to growth in machine energy consumption and manufacturing emissions.&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt; In fact, &lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;fleetwide measurements demonstrate a 5x improvement in utilized FLOPs across generations, from TPU v5p to Ironwood.&lt;/span&gt;&lt;sup&gt;&lt;span style="vertical-align: baseline;"&gt;&lt;span style="vertical-align: super;"&gt;3&lt;/span&gt;&lt;/span&gt;&lt;/sup&gt;&lt;span style="vertical-align: baseline;"&gt; Because the performance denominator in our CCI equation (CO2e/FLOP) is scaling faster than emissions, the net carbon cost per operation drops significantly with every new chip.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-image_full_width"&gt;






  
    &lt;div class="article-module h-c-page"&gt;
      &lt;div class="h-c-grid"&gt;
  

    &lt;figure class="article-image--large
      
      
        h-c-grid__col
        h-c-grid__col--6 h-c-grid__col--offset-3
        
        
      "
      &gt;

      
      
        
        &lt;img
            src="https://storage.googleapis.com/gweb-cloudblog-publish/images/1_Oan2vLj.max-1000x1000.png"
        
          alt="1"&gt;
        
        &lt;/a&gt;
      
    &lt;/figure&gt;

  
      &lt;/div&gt;
    &lt;/div&gt;
  




&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;p style="text-align: center;"&gt;&lt;sup&gt;&lt;span style="vertical-align: baseline;"&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;Figure 1: Ironwood’s accelerating CCI improvement measured on Google’s performance-optimized TPU cohort, considering January 2026 workloads.&lt;/span&gt;&lt;/span&gt;&lt;em&gt;&lt;span style="vertical-align: baseline;"&gt;&lt;span style="vertical-align: baseline;"&gt;&lt;span style="vertical-align: super;"&gt;4&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/em&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;h3&gt;&lt;span style="vertical-align: baseline;"&gt;Operating Google’s TPU fleet more efficiently&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Updated TPU CCI metrics also offer a direct comparison to the measurement we published in 2025. Specifically, from October 2024 to January 2026, Google’s versatile TPU cohort ran more efficiently than what we reported previously:&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;TPU v5e achieved a 43% reduction in total CCI over 15 months, dropping to 228 gCO2e/EFLOP. This was driven by a 72% increase in average utilization.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;Trillium, the sixth-generation TPU, saw a 20% reduction in total CCI over the same time period, bringing its emissions intensity down to 125 gCO2e/EFLOP.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;&lt;/div&gt;
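&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;As a rough sanity check on the utilization effect above: if total lifecycle emissions for a deployed fleet stay roughly unchanged, CCI scales inversely with utilization, so a 72% utilization increase alone accounts for most of the reported reduction. This is a simplification for illustration, not the published methodology.&lt;/span&gt;&lt;/p&gt;
&lt;pre&gt;
# Simplified check: with emissions held roughly constant, CCI ~ 1 / utilization.
utilization_gain = 1.72                  # 72% higher average utilization
expected_reduction = 1.0 - 1.0 / utilization_gain
print(f"{expected_reduction:.0%}")       # ~42%, close to the reported 43% for TPU v5e
&lt;/pre&gt;&lt;/div&gt;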
&lt;div class="block-image_full_width"&gt;






  
    &lt;div class="article-module h-c-page"&gt;
      &lt;div class="h-c-grid"&gt;
  

    &lt;figure class="article-image--large
      
      
        h-c-grid__col
        h-c-grid__col--6 h-c-grid__col--offset-3
        
        
      "
      &gt;

      
      
        
        &lt;img
            src="https://storage.googleapis.com/gweb-cloudblog-publish/images/2_HRjRsFh.max-1000x1000.png"
        
          alt="2"&gt;
        
        &lt;/a&gt;
      
    &lt;/figure&gt;

  
      &lt;/div&gt;
    &lt;/div&gt;
  




&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;p style="text-align: center;"&gt;&lt;sup&gt;&lt;em&gt;&lt;span style="vertical-align: baseline;"&gt;&lt;span style="vertical-align: baseline;"&gt;Figure 2: Google’s versatile TPU cohort demonstrates deployment efficiency gains for the same TPU generations between October 2024 and January 2026.&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;&lt;span style="vertical-align: super;"&gt;5&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/em&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;&lt;span&gt;&lt;span style="vertical-align: baseline;"&gt;These results demonstrate that Google continues to improve the carbon-efficiency of our AI infrastructure. While the massive scale of AI demand requires a significant and growing amount of power, our innovations allow us to deliver substantially more compute performance for every unit of energy consumed.&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt; &lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;span style="vertical-align: baseline;"&gt;Decoupling energy and emissions from performance&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;To what can we attribute these improvements? Beyond Ironwood’s raw hardware capabilities, these CCI gains are further enabled by deep software and system-level optimizations across our infrastructure:&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Software efficiency (MoE):&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; The widespread adoption of sparse architectures, such as Mixture of Experts (MoE), routes computation only to necessary parameters. This drastically reduces the active FLOPs required per inference or training step without sacrificing model capacity or quality.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Lower precision math (FP8):&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; By heavily leveraging 8-bit floating-point (FP8) formats, we effectively double compute throughput and halve memory bandwidth requirements compared to 16-bit formats. This shows that we can maintain output quality while exponentially decreasing the energy cost per mathematical operation.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Workload mix and intelligent scheduling:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Advanced fleet orchestration continuously balances the workload mix across our infrastructure. By intelligently scheduling tasks, we ensure high continuous utilization rates, optimize duty cycles, and minimize the carbon penalty of idle power draw.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
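&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;The toy example below illustrates the Mixture of Experts point; the expert counts and parameter sizes are hypothetical and do not describe any Google model.&lt;/span&gt;&lt;/p&gt;
&lt;pre&gt;
# Toy Mixture-of-Experts illustration: only the routed experts are active per token.
total_experts     = 64
active_experts    = 2        # top-k routing
params_per_expert = 1.0e9
shared_params     = 2.0e9    # attention, embeddings, router, etc.

total_params  = shared_params + total_experts * params_per_expert
active_params = shared_params + active_experts * params_per_expert

# Forward-pass FLOPs per token are roughly 2 * active parameters.
print(f"active fraction per token: {active_params / total_params:.1%}")   # ~6.1%
# Model capacity reflects the full parameter count, while per-token compute
# (and energy) tracks the much smaller active set.
&lt;/pre&gt;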
&lt;h3&gt;&lt;span style="vertical-align: baseline;"&gt;Scale sustainably with Google Cloud&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;AI’s trajectory requires infrastructure that can scale exponentially without an equivalent surge in carbon emissions. &lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;The 3.7x carbon efficiency improvement from TPU v5p to Ironwood demonstrates that we can achieve greater compute density while minimizing the growth of our energy and environmental footprint through deliberate hardware and software codesign.&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt; To learn more and get started with Ironwood, register your interest with &lt;/span&gt;&lt;a href="https://cloud.google.com/resources/ironwood-tpu-interest?e=48754805"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;this form&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;.&lt;/span&gt;&lt;/p&gt;
&lt;hr/&gt;
&lt;p&gt;&lt;sub&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;1. Following the methodology published in an &lt;/span&gt;&lt;a href="https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=11097303" rel="noopener" target="_blank"&gt;&lt;span style="font-style: italic; text-decoration: underline; vertical-align: baseline;"&gt;August 2025 technical report&lt;/span&gt;&lt;/a&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;, we quantified the full lifecycle emissions of TPU hardware as a point-in-time snapshot across Google’s generations of TPUs as of January 2026. The functional unit for this study is one AI computer deployed in the data center, which includes one or more accelerator trays (containing TPUs) connected to one host tray (i.e., a computing server). Peripheral components beyond the tray (e.g., rack, shelf, and network equipment) and auxiliary computing and storage resources are excluded from the calculation of embodied and operational emissions. We include the electricity used in data center cooling in operational emissions. To estimate operational emissions from electricity consumption of running workloads, we used a one-month sample of observed machine power data from our entire TPU fleet, applying Google’s 2024 average fleetwide carbon intensity. To estimate embodied emissions from manufacturing, transportation, and retirement, we performed a life-cycle assessment of the hardware. Data center construction emissions were estimated based on Google’s disclosed 2024 carbon footprint. These findings do not represent model-level emissions, nor are they a complete quantification of Google’s AI emissions. Based on the TPU location of a specific workload, CCI results of specific workloads may vary.&lt;br/&gt;&lt;/span&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;2. The authors would like to thank and acknowledge the co-authors of this paper for their important contributions to enable these results: Ian Schneider, Hui Xu, Stephan Benecke, Parthasarathy Ranganathan, and Cooper Elsworth.&lt;br/&gt;&lt;/span&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;3. This comparison considers the utilized FLOPS (BF16) between deployed TPU v5p and Ironwood chips in Google’s fleet in January 2026. This trend is consistent with the improvement in peak FLOPS (BF16) between v5p (459 TFLOPS) and Ironwood (2,307 TFLOPS).&lt;br/&gt;&lt;/span&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;4. The GHG protocol offers two accounting standards for operational emissions. Results presented here consider market-based emissions, which includes the impact of carbon-free energy purchases. Location-based accounting, which excludes carbon-free energy purchases, would raise operational CCI to 793, 712, and 195 gCO2e/EFLOP, respectively. The ratio of CCI improvements would be at a similar level, and Ironwood’s embodied CCI would drop from 23% to 8% of its total CCI.&lt;br/&gt;&lt;/span&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;5. 
To ensure a fair comparison across varying TPU utilizations, this analysis replicates the propensity score weighting methodology from the &lt;/span&gt;&lt;a href="https://ieeexplore.ieee.org/iel8/40/11236092/11097303.pdf" rel="noopener" target="_blank"&gt;&lt;span style="font-style: italic; text-decoration: underline; vertical-align: baseline;"&gt;August 2025 technical report&lt;/span&gt;&lt;/a&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt; and compares January 2026 results to the results published in 2025. This statistical technique adjusts for duty cycle variations to balance the comparison of TPUs during a given time period. This empirical methodology results in small variations in calculated CCI between temporal periods, reflecting fluctuations in real-world energy consumption and hardware utilization across the global infrastructure. &lt;/span&gt;&lt;/sub&gt;&lt;/p&gt;&lt;/div&gt;</description><pubDate>Mon, 06 Apr 2026 16:00:00 +0000</pubDate><guid>https://cloud.google.com/blog/topics/systems/ironwood-tpus-deliver-37x-carbon-efficiency-gains/</guid><category>Compute</category><category>Sustainability</category><category>TPUs</category><category>Systems</category><og xmlns:og="http://ogp.me/ns#"><type>article</type><title>AI infrastructure efficiency: Ironwood TPUs deliver 3.7x carbon efficiency gains</title><description></description><site_name>Google</site_name><url>https://cloud.google.com/blog/topics/systems/ironwood-tpus-deliver-37x-carbon-efficiency-gains/</url></og><author xmlns:author="http://www.w3.org/2005/Atom"><name>Keguo (Tim) Huang</name><title>Senior Data Scientist, Google</title><department></department><company></company></author><author xmlns:author="http://www.w3.org/2005/Atom"><name>David Patterson</name><title>Google Distinguished Engineer, Google</title><department></department><company></company></author></item><item><title>Firefly: Illuminating the path to nanosecond-level clock sync in the data center</title><link>https://cloud.google.com/blog/products/networking/understanding-the-firefly-clock-synchronization-protocol/</link><description>&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;From the high-frequency trading floors of Wall Street to orchestrating cloud data centers, the ability to synchronize events with nanosecond accuracy is critical. Yet, achieving this level of temporal precision across thousands of interconnected devices in a modern data center is fraught with challenges like clock drift, network jitter, and path asymmetries. And doing so on cloud-hosted infrastructure has traditionally been impossible, preventing a certain class of applications from running there. &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;This is where Firefly, a clock synchronization system developed by researchers and engineers at Google, comes in. Firefly isn't just a clock synchronization protocol; it's a software-driven approach that combines theoretical insights and practical engineering to deliver ultra-accurate, scalable, and cost-effective time synchronization on commodity hardware within a demanding data center environment.&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt; &lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;The nanosecond race: Why precise timing matters&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Precise clock synchronization is the foundation of distributed systems. It is non-negotiable in financial exchanges, where regulatory requirements mandate sub-100µs external synchronization to Coordinated Universal Time, or UTC, and fairness demands sub-10ns internal clock synchronization. In high-frequency trading, a minuscule timing advantage can translate to significant financial gains, making accurate timestamping critical for market integrity. Beyond finance, numerous data center operations, including database consistency, distributed logging, virtual machine management, and network telemetry, rely on accurate temporal ordering of events. And as data centers scale, the need for a robust, scalable synchronization solution becomes even more important.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;But achieving nanosecond-level synchronization in a dynamic data center environment is difficult. Several factors conspire to undermine precision:&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Clock drift:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Crystal oscillators, which are fundamental to all clocks, have inherent imperfections that cause them to gradually deviate over time. Although these deviations were considered minor previously, they are substantial when targeting sub-10ns.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Jitter:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Network components such as switches and network interface cards (NICs) introduce unpredictable delays. These delays, often stemming from queuing in network buffers or the intricate processing of packets, can manifest as jitter, disrupting the timing of synchronization messages.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Asymmetry:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; The network path between two devices is rarely symmetrical. Differences in cable lengths, the number of hops, or the internal workings of network equipment can cause signals to take different amounts of time to travel in opposite directions. This asymmetry can introduce significant errors when estimating one-way delays and clock offsets.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Scalability:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; As data centers expand to house tens of thousands of servers, any synchronization solution must be able to scale efficiently without becoming a bottleneck or requiring disproportionate resources.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Fault tolerance:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; In a distributed system, failures are inevitable. A synchronization protocol must be resilient to the loss or misbehavior of individual nodes or network links, so that the overall synchronization accuracy is not compromised.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
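&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;To see how path asymmetry corrupts offset estimates, consider the classic two-way timestamp exchange used by protocols such as NTP and PTP. This is standard background math with made-up numbers, not Firefly’s specific probe format.&lt;/span&gt;&lt;/p&gt;
&lt;pre&gt;
# Classic two-way exchange: A sends at t1, B receives at t2 and replies at t3,
# and A receives at t4. All values are nanoseconds on each device's local clock.
def estimate(t1, t2, t3, t4):
    offset = ((t2 - t1) + (t3 - t4)) / 2    # estimate of B's clock minus A's clock
    rtt = (t4 - t1) - (t3 - t2)
    return offset, rtt

# Symmetric paths (500 ns each way), true offset +100 ns: the estimate is exact.
print(estimate(0, 600, 1600, 2000))    # (100.0, 1000)

# Asymmetric paths (700 ns out, 300 ns back), same true offset: the estimate is
# biased by half the asymmetry, i.e. it reports +300 ns instead of +100 ns.
print(estimate(0, 800, 1800, 2000))    # (300.0, 1000)
&lt;/pre&gt;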
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Firefly: Bridging software and theory&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Firefly uses a multi-faceted strategy to tackle these challenges, distinguishing itself from prior synchronization protocols. Its core innovations lie in its architectural design and its theoretical underpinnings.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-image_full_width"&gt;






  
    &lt;div class="article-module h-c-page"&gt;
      &lt;div class="h-c-grid"&gt;
  

    &lt;figure class="article-image--large
      
      
        h-c-grid__col
        h-c-grid__col--6 h-c-grid__col--offset-3
        
        
      "
      &gt;

      
      
        
        &lt;img
            src="https://storage.googleapis.com/gweb-cloudblog-publish/original_images/1-architecture_v1.jpg"
        
          alt="1-architecture"&gt;
        
        &lt;/a&gt;
      
    &lt;/figure&gt;

  
      &lt;/div&gt;
    &lt;/div&gt;
  




&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;1. &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;Layered synchronization:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Firefly employs a novel layered synchronization technique. Instead of relying on a central clock, which can be a single point of failure or introduce delays, it first establishes tight internal synchronization amongst NICs within the data center. Each NIC in the network constantly communicates with a set of its peers, comparing times and making adjustments. From this "swarm" of devices emerges a highly stable and accurate consensus time that the entire group agrees upon. This internal synchronization is rapid and robust, effectively shielding it from external timing disturbances. Concurrently, Firefly synchronizes the entire swarm to UTC. Decoupling of these two processes is crucial, as it prevents external factors like time-server jitter or drift from directly impacting internal synchronization.&lt;/span&gt;&lt;/p&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;2. &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;Distributed consensus over Random graphs:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Unlike traditional hierarchical approaches that can be brittle and susceptible to single points of failure, Firefly uses a distributed consensus algorithm built on a d-regular random graph. This means each NIC communicates with a randomly selected set of 'd' peers. Theoretical analysis, as presented in &lt;/span&gt;&lt;a href="https://dl.acm.org/doi/10.1145/3718958.3750502" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;the Firefly research paper&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, demonstrates that such random graphs offer significant advantages:&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;Faster convergence: Random graphs promote a more rapid dissemination of clock information across the network, leading to quicker synchronization.&lt;/span&gt;&lt;/li&gt;
&lt;li role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;Scalability: The theoretical bounds show that random graphs can maintain synchronization accuracy even as the size of the network grows, provided the number of peers ('d') scales logarithmically with the total number of nodes.&lt;/span&gt;&lt;/li&gt;
&lt;li role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;Resilience to asymmetry: The diverse probing paths inherent in random graphs help to average out and mitigate the impact of path asymmetries.&lt;/span&gt;&lt;/li&gt;
&lt;/ul&gt;
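&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;The toy simulation below illustrates the peer structure described above: random peer selection plus repeated averaging toward neighbors. It is a deliberately simplified illustration with made-up parameters, not the Firefly algorithm itself, which uses a d-regular random graph, filtered one-way measurements, and careful drift modeling.&lt;/span&gt;&lt;/p&gt;
&lt;pre&gt;
import random

# Toy consensus over a random peer graph: each node picks d random peers and
# repeatedly nudges its clock toward the mean of its peers' clocks.
def simulate(n=1000, d=8, rounds=30):
    random.seed(0)
    peers = {i: random.sample([j for j in range(n) if j != i], d) for i in range(n)}
    clocks = [random.uniform(-500.0, 500.0) for _ in range(n)]   # initial offsets, ns
    for _ in range(rounds):
        clocks = [0.5 * clocks[i] + 0.5 * sum(clocks[p] for p in peers[i]) / d
                  for i in range(n)]
    return max(clocks) - min(clocks)

print(f"spread across nodes after averaging: {simulate():.3f} ns")
&lt;/pre&gt;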
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;3. &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;Mitigating jitter and asymmetry in practice: &lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;Beyond the theoretical advantages of random graphs, Firefly incorporates practical techniques to further refine accuracy:&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;RTT filtering: By analyzing round-trip time (RTT) measurements, Firefly can identify and discard probe samples that are likely affected by queuing jitter, thereby improving the accuracy of delay estimations.&lt;/span&gt;&lt;/li&gt;
&lt;li role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;Path profiling: Firefly actively probes network paths to identify and favor those with minimal asymmetry. This proactive approach helps to select the most reliable paths for synchronization.&lt;/span&gt;&lt;/li&gt;
&lt;li role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;Leveraging hardware: Where available, Firefly can utilize features like &lt;/span&gt;&lt;a href="https://docs.commscope.com/bundle/fastiron-10010-managementguide/page/GUID-A2A87D89-1224-4694-817A-D91F70D5F850.html" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Transparent Clock (TC)&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; in network switches to accurately account for in-switch delays, further reducing measurement error.&lt;/span&gt;&lt;/li&gt;
&lt;/ul&gt;
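&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;A minimal sketch of the RTT-filtering idea appears below; the 5% margin and the sample values are arbitrary illustrative choices, not Firefly parameters.&lt;/span&gt;&lt;/p&gt;
&lt;pre&gt;
# Keep only probe samples whose RTT is close to the minimum RTT seen on a peer
# path; larger RTTs likely include queuing jitter and yield noisier offsets.
def filter_probes(samples, margin=0.05):
    """samples: list of (rtt_ns, offset_estimate_ns) tuples for one peer path."""
    best_rtt = min(rtt for rtt, _ in samples)
    cutoff = best_rtt * (1.0 + margin)
    return [(rtt, off) for rtt, off in samples if rtt &lt;= cutoff]

probes = [(1000, 12), (1010, 11), (1400, 35), (1030, 13), (2500, 90)]
print(filter_probes(probes))    # [(1000, 12), (1010, 11), (1030, 13)]
&lt;/pre&gt;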
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;4. &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;Robustness and fault tolerance:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Firefly’s use of distributed consensus, combined with its averaging mechanisms, makes it inherently resilient to failures. By not relying on a single time server or a fixed hierarchical structure, the system can gracefully handle the loss or misbehavior of individual nodes.&lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Performance in the real world&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;The results discussed in our &lt;/span&gt;&lt;a href="https://dl.acm.org/doi/10.1145/3718958.3750502" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Firefly research paper&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; are compelling:&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Internal synchronization:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Firefly consistently achieves sub-10ns NIC-to-NIC synchronization when used in conjunction with Google's latest data center fabric technology. This can be used to determine order of events like packets, logs, remote procedure calls (RPCs) across machines.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;&lt;strong style="vertical-align: baseline;"&gt;External synchronization:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; The system also delivers significantly better synchronization to UTC than the 100µs regulatory requirement for financial exchanges.&lt;/span&gt;&lt;/li&gt;
&lt;/ul&gt;&lt;/div&gt;
&lt;div class="block-image_full_width"&gt;






  
    &lt;div class="article-module h-c-page"&gt;
      &lt;div class="h-c-grid"&gt;
  

    &lt;figure class="article-image--large
      
      
        h-c-grid__col
        h-c-grid__col--6 h-c-grid__col--offset-3
        
        
      "
      &gt;

      
      
        
        &lt;img
            src="https://storage.googleapis.com/gweb-cloudblog-publish/images/2-graph_h5KX17K.max-1000x1000.jpg"
        
          alt="2-graph"&gt;
        
        &lt;/a&gt;
      
        &lt;figcaption class="article-image__caption "&gt;&lt;p data-block-key="ry130"&gt;The offset between a pair of clocks that are six hops away in a Firefly-synced network, measured by an oscilloscope via 1 pulse per second.&lt;/p&gt;&lt;/figcaption&gt;
      
    &lt;/figure&gt;

  
      &lt;/div&gt;
    &lt;/div&gt;
  




&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;The accompanying video illustrates the accuracy of NIC-to-NIC synchronization, as quantified by an oscilloscope utilizing a one-pulse-per-second (1PPS) signal from the NICs. Each row corresponds to a NIC clock, with the rising edge indicating the precise moment the NIC clock attains an integer second. The oscilloscope observations confirm that all measured NICs exhibit close synchronization, maintaining alignment within a few nanoseconds.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-video"&gt;



&lt;div class="article-module article-video "&gt;
  &lt;figure&gt;
    &lt;a class="h-c-video h-c-video--marquee"
      href="https://youtube.com/watch?v=KB3z34OO9QU"
      data-glue-modal-trigger="uni-modal-KB3z34OO9QU-"
      data-glue-modal-disabled-on-mobile="true"&gt;

      
        

        &lt;div class="article-video__aspect-image"
          style="background-image: url(https://storage.googleapis.com/gweb-cloudblog-publish/images/maxresdefault_GLx4Roj.max-1000x1000.jpg);"&gt;
          &lt;span class="h-u-visually-hidden"&gt;Firefly: Sub-10ns NIC-to-NIC clock synchronization for datacenters&lt;/span&gt;
        &lt;/div&gt;
      
      &lt;svg role="img" class="h-c-video__play h-c-icon h-c-icon--color-white"&gt;
        &lt;use xlink:href="#mi-youtube-icon"&gt;&lt;/use&gt;
      &lt;/svg&gt;
    &lt;/a&gt;

    
  &lt;/figure&gt;
&lt;/div&gt;

&lt;div class="h-c-modal--video"
     data-glue-modal="uni-modal-KB3z34OO9QU-"
     data-glue-modal-close-label="Close Dialog"&gt;
   &lt;a class="glue-yt-video"
      data-glue-yt-video-autoplay="true"
      data-glue-yt-video-height="99%"
      data-glue-yt-video-vid="KB3z34OO9QU"
      data-glue-yt-video-width="100%"
      href="https://youtube.com/watch?v=KB3z34OO9QU"
      ng-cloak&gt;
   &lt;/a&gt;
&lt;/div&gt;

&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;These results are particularly impressive given that Firefly operates purely in software on commodity hardware, avoiding the need for expensive, specialized synchronization equipment. This makes ultra-accurate time synchronization accessible to a broader range of data center applications.&lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;A foundation for future applications&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Firefly's success in delivering nanosecond-level accuracy in a scalable and cost-effective manner has far-reaching implications:&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Democratizing high-precision timing: &lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;Firefly allows cloud-hosted financial services that traditionally rely on expensive dedicated hardware, to achieve the required precision using standard cloud infrastructure.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Enabling new applications:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; The availability of precise, synchronized clocks across data center devices can unlock new possibilities in areas like fine-grained network telemetry and congestion control, time-coordinated distributed systems, and deterministic fabric for ML workloads.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Transforming data center operations:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; By creating a tightly integrated and precisely timed computing entity, Firefly can enhance data centers’ overall efficiency, reliability, and performance.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;In conclusion, Firefly represents a significant advancement in the field of clock synchronization. By ingeniously combining theoretical insights into graph theory and consensus algorithms with practical network engineering techniques, it overcomes the long-standing challenges of achieving nanosecond-level precision in complex, distributed environments. As data centers continue to evolve, systems like Firefly will be instrumental in building the high-performance, reliable, and fair infrastructure of the future.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-aside"&gt;&lt;dl&gt;
    &lt;dt&gt;aside_block&lt;/dt&gt;
    &lt;dd&gt;&amp;lt;ListValue: [StructValue([(&amp;#x27;title&amp;#x27;, &amp;#x27;2026 AI Agent Trends in Financial Services&amp;#x27;), (&amp;#x27;body&amp;#x27;, &amp;lt;wagtail.rich_text.RichText object at 0x7f9fdc2ea2e0&amp;gt;), (&amp;#x27;btn_text&amp;#x27;, &amp;#x27;Read it now.&amp;#x27;), (&amp;#x27;href&amp;#x27;, &amp;#x27;https://cloud.google.com/resources/content/ai-agent-trends-financial-services-2026&amp;#x27;), (&amp;#x27;image&amp;#x27;, &amp;lt;GAEImage: FSI_Confirmation email_500x450&amp;gt;)])]&amp;gt;&lt;/dd&gt;
</description><pubDate>Mon, 23 Feb 2026 17:00:00 +0000</pubDate><guid>https://cloud.google.com/blog/products/networking/understanding-the-firefly-clock-synchronization-protocol/</guid><category>Infrastructure</category><category>Systems</category><category>Networking</category><og xmlns:og="http://ogp.me/ns#"><type>article</type><title>Firefly: Illuminating the path to nanosecond-level clock sync in the data center</title><description></description><site_name>Google</site_name><url>https://cloud.google.com/blog/products/networking/understanding-the-firefly-clock-synchronization-protocol/</url></og><author xmlns:author="http://www.w3.org/2005/Atom"><name>Rohit Dalal</name><title>Product Manager, Google</title><department></department><company></company></author><author xmlns:author="http://www.w3.org/2005/Atom"><name>Yuliang Li</name><title>Software Engineer</title><department></department><company></company></author></item><item><title>At Google, the future is multiarch; AI and automation are helping us get there</title><link>https://cloud.google.com/blog/topics/systems/using-ai-and-automation-to-migrate-between-instruction-sets/</link><description>&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Google Axion processors, our first custom Arm®-based CPUs, mark a major step in delivering both performance and energy efficiency for Google Cloud customers and our first-party services, providing up to 65% better price-performance and up to 60% better energy efficiency than comparable instances on Google Cloud. &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;We put Axion processors to the test: running Google production services. Now that our clusters contain both x86 and Axion Arm-based machines, Google's production services are able to run tasks simultaneously on multiple instruction-set architectures (ISAs). Today, this means&lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt; most binaries that compile for x86 now need to compile to both x86 and Arm at the same time &lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;— no small thing when you consider that the Google environment includes over 100,000 applications! &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;We recently published a preprint of a paper called "&lt;/span&gt;&lt;a href="https://arxiv.org/abs/2510.14928" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Instruction Set Migration at Warehouse Scale&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;" about our migration process, in which we analyze 38,156 commits we made to Google's giant &lt;/span&gt;&lt;a href="https://research.google/pubs/why-google-stores-billions-of-lines-of-code-in-a-single-repository/" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;monorepo&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, Google3. To make a long story short, the paper describes the combination of hard work, automation, and AI we used to get to where we are today. We currently serve Google services in production on Arm and x86 simultaneously, including YouTube, Gmail, and BigQuery, and we have migrated more than &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;30,000 applications to Arm&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;, with Arm hardware fully subscribed and more servers deployed each month.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Let's take a brief look at two steps on our journey to make Google multi-architecture, or ‘multiarch’: an analysis of migration patterns, and the use of AI to port the code. For more, be sure to read the entire paper. &lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Migrating all of Google's services to multiarch&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Going into a migration from x86-only to Arm &lt;/span&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;and&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt; x86, both the multiarch team and the application owners assumed that we would be spending time on architectural differences such as floating-point drift, concurrency, platform-specific intrinsics, and performance.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;At first, we migrated some of our top jobs like F1, Spanner, and Bigtable using typical software practices, complete with weekly meetings and dedicated engineers. In this early period, we found evidence of the above issues, but not nearly as many as we expected. It turns out modern compilers and tools like sanitizers have shaken out most of the surprises. Instead, we spent the majority of our time working on issues like:&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;fixing tests that broke because they overfit to our existing x86 servers (see the sketch after this list)&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;updating intricate build and release systems, usually for our oldest and highest-traffic services&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;resolving rollout issues in production configurations&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;taking care to avoid destabilizing critical systems &lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
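&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;As a purely illustrative example of the first item above (a test that overfits to one architecture) and its portable rewrite, here is a minimal sketch; the directory layout and helper function are hypothetical, not Google's actual build system:&lt;/span&gt;&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# Illustrative only: a test that hard-codes the x86 answer breaks as soon as
# the same test runs on an Arm machine; the portable version asserts the
# invariant that actually matters.
import platform
import unittest

def variant_dir():
    """Per-architecture output directory (hypothetical layout)."""
    arch = platform.machine()
    if arch in ("x86_64", "AMD64"):
        return "bin/x86_64"
    if arch in ("aarch64", "arm64"):
        return "bin/aarch64"
    raise RuntimeError("unsupported architecture: " + arch)

class BuildLayoutTest(unittest.TestCase):
    def test_overfit_to_x86(self):
        # Brittle: assumes every machine running the test is x86.
        self.assertEqual(variant_dir(), "bin/x86_64")

    def test_portable(self):
        # Robust: passes on x86 and Arm alike.
        self.assertIn(variant_dir(), ("bin/x86_64", "bin/aarch64"))
&lt;/code&gt;&lt;/pre&gt;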
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Moving a dozen applications to Arm this way absolutely worked, and we were proud to get things running on &lt;/span&gt;&lt;a href="https://research.google/pubs/large-scale-cluster-management-at-google-with-borg/" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Borg&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, our cluster management system. As one engineer remarked, &lt;/span&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;"Everyone fixated on the totally different toolchain, and [assumed] surely everything would break.  The majority of the difficulty was configs and boring stuff." &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;And yet, it's not sufficient to migrate a few big jobs and be done. Although &lt;/span&gt;&lt;a href="https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/44271.pdf" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;~60% of our running compute is in our top 50 applications&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, the curve of usage across the remaining applications in Google's monorepo is relatively flat. The more jobs that can run on multiple architectures, the easier it is for Borg to fit them efficiently into cells. For good utilization of our Arm servers, then, we needed to address this long list of the remaining 100,000+ applications. &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;The multiarch team could not effectively reach out to so many application owners; just setting up the meetings would have been cost-prohibitive! Instead, we have relied on automation, helping to minimize involvement from the application teams themselves.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Automation tools&lt;br/&gt;&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;We had many sources of automation to help us, some of which we already used widely at Google before we started the multiarch migration. These include:&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;a href="https://abseil.io/resources/swe-book/html/ch22.html" rel="noopener" target="_blank"&gt;&lt;strong style="text-decoration: underline; vertical-align: baseline;"&gt;Rosie&lt;/strong&gt;&lt;/a&gt;&lt;strong style="vertical-align: baseline;"&gt;, &lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;which lets us programmatically generate large numbers of commits and shepherd them through the code review process. For example, the commit could be one line to enable Arm in a job's Blueprint: "&lt;/span&gt;&lt;code style="vertical-align: baseline;"&gt;arm_variant_mode = ::blueprint::VariantMode::VARIANT_MODE_RELEASE&lt;/code&gt;&lt;span style="vertical-align: baseline;"&gt;"&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt; &lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;a href="https://github.com/google/sanitizers" rel="noopener" target="_blank"&gt;&lt;strong style="text-decoration: underline; vertical-align: baseline;"&gt;Sanitizers&lt;/strong&gt;&lt;/a&gt;&lt;strong style="vertical-align: baseline;"&gt; and fuzzers, &lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;which catch common differences in execution between x86 and Arm (e.g., data races that are hidden by x86's TSO memory model). Catching these kinds of issues ahead of time avoids non-deterministic, hard-to-debug behavior when recompiling to a new ISA.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Continuous Health Monitoring Platform (CHAMP), &lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;which is a new automated framework for rolling out and monitoring multiarch jobs. It automatically evicts jobs that cause issues on Arm, such as crash-looping or exhibiting very slow throughput, for later offline tuning and debugging (a rough sketch of this policy follows the list).&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
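&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;CHAMP itself is an internal system, but the eviction policy described in the last item might look roughly like the following; every interface, name, and threshold here is a hypothetical stand-in:&lt;/span&gt;&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# Rough, assumption-laden sketch of a CHAMP-style check: jobs rolled out to
# Arm are watched, and any job that crash-loops or runs far slower than its
# x86 baseline is evicted and queued for offline debugging.
from dataclasses import dataclass

@dataclass
class JobHealth:
    name: str
    restarts_last_hour: int
    throughput_ratio: float   # Arm throughput divided by the x86 baseline

MAX_RESTARTS = 5              # illustrative thresholds
MIN_THROUGHPUT_RATIO = 0.5

def should_evict(h: JobHealth) -&gt; bool:
    return h.restarts_last_hour &gt; MAX_RESTARTS or h.throughput_ratio &lt; MIN_THROUGHPUT_RATIO

def monitor(jobs, evict, flag_for_debugging):
    """One monitoring pass over jobs recently moved to Arm."""
    for h in jobs:
        if should_evict(h):
            evict(h.name)                # give the job back to x86 capacity
            flag_for_debugging(h.name)   # owner tunes and debugs offline
&lt;/code&gt;&lt;/pre&gt;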
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;We also began using an AI-based migration tool called CogniPort — more on that below. &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Analysis&lt;br/&gt;&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;The 38,156 commits to our code monorepo constituted most of the commits across the entire ISA migration project, from huge jobs like Bigtable to myriad tiny ones. To analyze these commits, we passed the commit messages and code diffs, in groups of 100, into the 1M-token context window of the Gemini Flash LLM, generating 16 categories of commits in four overarching groups.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
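&lt;div class="block-paragraph"&gt;&lt;p&gt;As a rough illustration of this first analysis pass (the real pipeline and model client are internal; generate_text below is a hypothetical stand-in for a Gemini Flash call):&lt;/p&gt;&lt;pre&gt;&lt;code&gt;# Illustrative sketch: batch commits 100 at a time and ask the model to
# propose candidate categories of changes. Commit dicts with "message" and
# "diff" keys are assumed; generate_text() is a placeholder for an LLM call.
def batches(items, size=100):
    for i in range(0, len(items), size):
        yield items[i:i + size]

def propose_categories(commits, generate_text):
    proposed = set()
    for group in batches(commits, 100):
        prompt = (
            "Here are up to 100 commits (message and diff) from an x86-to-Arm "
            "migration. Propose short category names describing the kind of "
            "change each commit makes, one category per line.\n\n"
        )
        prompt += "\n---\n".join(c["message"] + "\n" + c["diff"] for c in group)
        for line in generate_text(prompt).splitlines():
            if line.strip():
                proposed.add(line.strip())
    return sorted(proposed)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;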
&lt;div class="block-image_full_width"&gt;
  &lt;div class="article-module h-c-page"&gt;&lt;div class="h-c-grid"&gt;
    &lt;figure class="article-image--large h-c-grid__col h-c-grid__col--6 h-c-grid__col--offset-3"&gt;
      &lt;img src="https://storage.googleapis.com/gweb-cloudblog-publish/images/1_MLZW4Y1.max-1000x1000.jpg" alt="image3"&gt;
      &lt;figcaption class="article-image__caption"&gt;&lt;p data-block-key="c1b1y"&gt;Figure 1: Commits fall into four overarching groups.&lt;/p&gt;&lt;/figcaption&gt;
    &lt;/figure&gt;
  &lt;/div&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;div class="block-paragraph"&gt;&lt;p data-block-key="pfifc"&gt;Once we had a final list, we ran the commits through the model again and had it assign one of these 16 categories to each of them (as well as an additional "Uncategorized" category, which improved the stability of the categorization by catching outliers).&lt;/p&gt;&lt;/div&gt;
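&lt;div class="block-paragraph"&gt;&lt;p&gt;A minimal sketch of that second, labeling pass under the same assumptions as above; the category names shown are placeholders, not the actual 16-entry taxonomy:&lt;/p&gt;&lt;pre&gt;&lt;code&gt;# Label each commit with exactly one category; anything off-list falls back
# to "Uncategorized", which keeps the classification stable against outliers.
CATEGORIES = ["Test adaptation", "Code adaptation", "Build and release",
              "Configuration"]  # placeholder names; the real taxonomy has 16

def classify_commit(commit, generate_text):
    prompt = (
        "Assign exactly one of these categories to the commit below: "
        + ", ".join(CATEGORIES)
        + "\n\nCommit message:\n" + commit["message"]
        + "\n\nDiff:\n" + commit["diff"]
        + "\n\nAnswer with the category name only."
    )
    answer = generate_text(prompt).strip()
    return answer if answer in CATEGORIES else "Uncategorized"
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;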
&lt;div class="block-image_full_width"&gt;
  &lt;div class="article-module h-c-page"&gt;&lt;div class="h-c-grid"&gt;
    &lt;figure class="article-image--large h-c-grid__col h-c-grid__col--6 h-c-grid__col--offset-3"&gt;
      &lt;img src="https://storage.googleapis.com/gweb-cloudblog-publish/images/1_DDGyjo7.max-1000x1000.jpg" alt="Screenshot 2025-10-21 at 11.19.29 AM"&gt;
      &lt;figcaption class="article-image__caption"&gt;&lt;p data-block-key="vfpaf"&gt;Figure 2: Code examples in the first two categories. More examples are available in the &lt;a href="https://arxiv.org/abs/2510.14928"&gt;paper&lt;/a&gt;.&lt;/p&gt;&lt;/figcaption&gt;
    &lt;/figure&gt;
  &lt;/div&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;div class="block-paragraph"&gt;&lt;p data-block-key="cgy3a"&gt;Altogether, this analysis covered about 700K changed lines of code. We plotted the timeline of our ISA migration as the number of lines of code changed per day or per month, normalized, over time.&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-image_full_width"&gt;
  &lt;div class="article-module h-c-page"&gt;&lt;div class="h-c-grid"&gt;
    &lt;figure class="article-image--large h-c-grid__col h-c-grid__col--6 h-c-grid__col--offset-3"&gt;
      &lt;img src="https://storage.googleapis.com/gweb-cloudblog-publish/images/image2_go6bg5V.max-1000x1000.png" alt="image2"&gt;
      &lt;figcaption class="article-image__caption"&gt;&lt;p data-block-key="x2sqj"&gt;Figure 3: CLs by category by time, normalized.&lt;/p&gt;&lt;/figcaption&gt;
    &lt;/figure&gt;
  &lt;/div&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;div class="block-paragraph"&gt;&lt;p data-block-key="cgy3a"&gt;As you can see, as we started building our multiarch toolchain, the largest set of commits was in tooling and test adaptation. Over time, a larger fraction of commits went to code adaptation, aligned with the first few large applications that we migrated. During this phase, the focus was on updating code in shared dependencies and addressing common issues in code and tests as we prepared for scale. In the final phase of the process, almost all commits were to configuration files and supporting processes. We also saw that, in this later phase, the number of merged commits rapidly increased, capturing the scale-up of the migration to the whole repository.&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-image_full_width"&gt;
  &lt;div class="article-module h-c-page"&gt;&lt;div class="h-c-grid"&gt;
    &lt;figure class="article-image--large h-c-grid__col h-c-grid__col--6 h-c-grid__col--offset-3"&gt;
      &lt;img src="https://storage.googleapis.com/gweb-cloudblog-publish/images/4_Commits_by_category_over_time_1200.max-1000x1000.png" alt="4 Commits by category over time 1200"&gt;
      &lt;figcaption class="article-image__caption"&gt;&lt;p data-block-key="2nodr"&gt;Figure 4: CLs by category by time, in raw counts.&lt;/p&gt;&lt;/figcaption&gt;
    &lt;/figure&gt;
  &lt;/div&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;div class="block-paragraph"&gt;&lt;p data-block-key="cgy3a"&gt;It’s worth noting that, overall, most migration-related commits are small. The largest commits usually touch very large lists or configuration files, rather than reflecting inherently complex or intricate changes to individual files.&lt;/p&gt;&lt;h3 data-block-key="9or0c"&gt;&lt;b&gt;Automating ISA migrations with AI&lt;/b&gt;&lt;/h3&gt;&lt;p data-block-key="57hi7"&gt;Modern generative AI techniques represent an opportunity to automate the remainder of the ISA migration process. We built an agent called &lt;b&gt;CogniPort&lt;/b&gt; that aims to close this gap. CogniPort operates on build and test errors: if, at any point in the process, an Arm library, binary, or test fails to build, or a test fails with an error, the agent steps in and aims to fix the problem automatically. As a first step, we have already used CogniPort's Blueprint editing mode to generate migration commits that do not lend themselves to simple changes.&lt;/p&gt;&lt;p data-block-key="3ehn3"&gt;The agent consists of three nested agentic loops, shown below. Each loop invokes an LLM to produce one step of reasoning and a tool invocation; the tool is then executed and its outputs are attached to the agent's context.&lt;/p&gt;&lt;/div&gt;
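&lt;div class="block-paragraph"&gt;&lt;p&gt;As a minimal sketch of one such loop (the LLM step, tool names, and step budget here are hypothetical stand-ins, not CogniPort's actual implementation):&lt;/p&gt;&lt;pre&gt;&lt;code&gt;# Minimal agentic loop: each iteration asks the model for one reasoning step
# plus a tool invocation, runs the tool, and appends the output to the context.
def agent_loop(goal, llm_step, tools, max_steps=20):
    context = ["Goal: " + goal]
    for _ in range(max_steps):
        reasoning, tool_name, args = llm_step(context)
        context.append("Reasoning: " + reasoning)
        if tool_name == "done":
            return True, context            # agent believes the goal is met
        if tool_name == "give_up":
            return False, context           # agent concluded it cannot fix this
        output = tools[tool_name](**args)   # e.g. build, run_test, edit_file
        context.append(tool_name + " output: " + str(output))
    return False, context                   # step budget exhausted
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;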
&lt;div class="block-image_full_width"&gt;
  &lt;div class="article-module h-c-page"&gt;&lt;div class="h-c-grid"&gt;
    &lt;figure class="article-image--large h-c-grid__col h-c-grid__col--6 h-c-grid__col--offset-3"&gt;
      &lt;img src="https://storage.googleapis.com/gweb-cloudblog-publish/images/image1_cCFsU5D.max-1000x1000.png" alt="image1"&gt;
      &lt;figcaption class="article-image__caption"&gt;&lt;p data-block-key="2nodr"&gt;Figure 5: CogniPort&lt;/p&gt;&lt;/figcaption&gt;
    &lt;/figure&gt;
  &lt;/div&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;div class="block-paragraph"&gt;&lt;p data-block-key="cgy3a"&gt;The outermost agent loop is an orchestrator that repeatedly calls the two other agents, the build-fixer agent and the test-fixer agent. The build-fixer agent tries to build a particular target and makes modifications to files until the target builds successfully or the agent gives up. The test-fixer agent tries to run a particular test and makes modifications until the test succeeds or the agent gives up (and in the process, it may use the build-fixer agent to address build failures in the test).&lt;/p&gt;&lt;h3 data-block-key="9793h"&gt;Testing CogniPort&lt;/h3&gt;&lt;p data-block-key="etovg"&gt;While we only recently scaled up CogniPort usage to high levels, we had the opportunity to more formally test its behavior by taking historical commits from the dataset above that were created without AI assistance. Focusing on Code &amp;amp; Test Adaptation (categories 1-8) commits that we could cleanly roll back (not all of the other categories were suitable for this approach), we generated a benchmark set of &lt;b&gt;245 commits&lt;/b&gt;. We then rolled the commits back and evaluated whether the agent was able to fix them.&lt;/p&gt;&lt;/div&gt;
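&lt;div class="block-paragraph"&gt;&lt;p&gt;A simple way to picture that evaluation, assuming hypothetical helpers for reverting commits, running the agent, and re-running tests (none of these are the real harness):&lt;/p&gt;&lt;pre&gt;&lt;code&gt;# Sketch of the rollback-and-retry benchmark: reintroduce each historical Arm
# failure, let the agent try to fix it, and score by whether the test passes.
def evaluate(benchmark_commits, revert, run_agent, tests_pass, restore):
    fixed = 0
    for commit in benchmark_commits:
        revert(commit)                       # bring back the original failure
        run_agent(commit.affected_targets)   # orchestrator: build- and test-fixer
        if tests_pass(commit.affected_tests):
            fixed += 1
        restore(commit)                      # reset the tree for the next case
    return fixed / len(benchmark_commits)    # about 0.30 on the 245-commit set
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;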
&lt;div class="block-image_full_width"&gt;
  &lt;div class="article-module h-c-page"&gt;&lt;div class="h-c-grid"&gt;
    &lt;figure class="article-image--large h-c-grid__col h-c-grid__col--6 h-c-grid__col--offset-3"&gt;
      &lt;img src="https://storage.googleapis.com/gweb-cloudblog-publish/images/6_Cogniport.max-1000x1000.png" alt="6 Cogniport"&gt;
      &lt;figcaption class="article-image__caption"&gt;&lt;p data-block-key="2nodr"&gt;Figure 6: CogniPort results&lt;/p&gt;&lt;/figcaption&gt;
    &lt;/figure&gt;
  &lt;/div&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Despite using no special prompts or other optimizations, the early results were very encouraging: CogniPort successfully fixed previously failing tests 30% of the time. It was particularly effective at test fixes, platform-specific conditionals, and data representation fixes. We're confident that as we invest in further optimizations of this approach, we will be even more successful.&lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;A multiarch future&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;From here, we still have tens of thousands more applications to address with automation. To cover future code growth, all new applications are designed to be multiarch by default. We will continue to use CogniPort to fix tests and configurations, and we will also work with application owners on trickier changes. (One lesson of this project is how well owners tend to know their code!)&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Yet, we’re increasingly confident that we can drive Google's monorepo towards &lt;/span&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;architecture neutrality&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt; for production services, for a variety of reasons:&lt;/span&gt;&lt;/p&gt;
&lt;ol&gt;
&lt;li aria-level="1" style="list-style-type: decimal; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;All of the code used for production services is visible in a vast monorepo (&lt;/span&gt;&lt;a href="https://research.google/pubs/why-google-stores-billions-of-lines-of-code-in-a-single-repository/" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;still&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;).&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: decimal; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;Most of the structural changes we need to build, run, and debug multiarch applications are done.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: decimal; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;Existing automation like Rosie and the recently developed CHAMP allows us to keep expanding release and rollout targets without much intervention on our part.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: decimal; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;Last but not least, LLM-based automation will allow us to address much of the remaining long tail of applications for a multi-ISA Google fleet. &lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;To read even more about what we learned, don't miss the &lt;/span&gt;&lt;a href="https://arxiv.org/abs/2510.14928" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;paper itself&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;. And to learn about our chip designs and how we’re operating a more sustainable cloud, you can read about Axion at &lt;/span&gt;&lt;a href="http://g.co/cloud/axion" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;g.co/cloud/axion&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;. &lt;/span&gt;&lt;/p&gt;
&lt;hr/&gt;
&lt;p&gt;&lt;sub&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;This blog post and the associated paper represent the work of a very large team. The paper authors are Eric Christopher, Kevin Crossan, Wolff Dobson, Chris Kennelly, Drew Lewis, Kun Lin, Martin Maas, Parthasarathy Ranganathan, Emma Rapati, and Brian Yang, in collaboration with &lt;/span&gt;&lt;a href="https://arxiv.org/html/2510.14928v1#S8" rel="noopener" target="_blank"&gt;&lt;span style="font-style: italic; text-decoration: underline; vertical-align: baseline;"&gt;dozens of other Googlers&lt;/span&gt;&lt;/a&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt; working on our Arm porting efforts.&lt;/span&gt;&lt;/sub&gt;&lt;/p&gt;&lt;/div&gt;</description><pubDate>Tue, 21 Oct 2025 16:00:00 +0000</pubDate><guid>https://cloud.google.com/blog/topics/systems/using-ai-and-automation-to-migrate-between-instruction-sets/</guid><category>AI &amp; Machine Learning</category><category>Compute</category><category>Systems</category><og xmlns:og="http://ogp.me/ns#"><type>article</type><title>At Google, the future is multiarch; AI and automation are helping us get there</title><description></description><site_name>Google</site_name><url>https://cloud.google.com/blog/topics/systems/using-ai-and-automation-to-migrate-between-instruction-sets/</url></og><author xmlns:author="http://www.w3.org/2005/Atom"><name>Parthasarathy Ranganathan</name><title>VP, Engineering Fellow</title><department></department><company></company></author><author xmlns:author="http://www.w3.org/2005/Atom"><name>Wolff Dobson</name><title>Developer Relations Engineer</title><department></department><company></company></author></item><item><title>Agile AI architectures: A fungible data center for the intelligent era</title><link>https://cloud.google.com/blog/topics/systems/agile-data-centers-and-systems-to-enable-ai-innovations/</link><description>&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;It’s not hyperbole to say that AI is transforming all aspects of our lives: human health, software engineering, education, productivity, creativity, entertainment… Consider just a few of the developments from Google this past year: &lt;/span&gt;&lt;a href="https://store.google.com/intl/en/ideas/articles/magic-cue/" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Magic Cue&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; on the Pixel 10 for more personal, proactive, and contextually-relevant assistance; our viral &lt;/span&gt;&lt;a href="https://aistudio.google.com/models/gemini-2-5-flash-image" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Nano Banana Gemini 2.5 Flash&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; image generation; Code Assist for developer productivity; and AlphaFold, which won its creators the Nobel prize for chemistry. We like to joke that the past year in AI has been an amazing decade! &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Underpinning all these advances in AI are equally amazing advances in the computing infrastructure powering AI. If AI researchers are like space explorers discovering new worlds, then systems and infrastructure designers are the &lt;/span&gt;&lt;a href="https://ieeexplore.ieee.org/document/10315012" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;ones building the rockets&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;. But keeping up with the demands of AI services will require even more from us. At Google I/O earlier this year, we announced &lt;/span&gt;&lt;a href="https://blog.google/technology/ai/io-2025-keynote/#google-beam" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;nearly 50X annual growth&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; in the monthly tokens processed by Gemini models, hitting 480 trillion tokens per month. Since then we have seen an additional 2X growth, hitting nearly a quadrillion monthly tokens. Other statistics paint a similar picture: AI accelerator consumption has grown by 15X in the last 24 months; our Hyperdisk ML data has grown 37X since GA; and we’re seeing more than 5 billion AI-powered retail search queries per month. &lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;With great AI comes great computing&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;This kind of growth brings with it new challenges. When planning for data centers and systems, we are accustomed to long lead times, paralleling the long time to build out hardware. However, AI demand projections are now changing dynamically and dramatically, creating a significant divergence in supply and demand. This mismatch requires new architectures and system design approaches that can respond to extreme volatility and growth.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Rapid technology innovations are essential, but must be carefully managed across the stack. For example, each generation of AI hardware (like TPUs and GPUs) has introduced new features and functionality, but also new power, rack, networking, and cooling requirements. The rate of introduction of these new generations is also on the rise, making it hard to build a coherent end-to-end system that can accommodate such a vast rate of change. Further, changes in form factors, board densities, networking topologies, power architectures, liquid cooling solutions, etc., all incrementally compound heterogeneity, so that when taken together, there is a &lt;/span&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;combinatorial&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt; increase in the complexity of designing, deploying, and maintaining systems and data centers. In addition, we need to design for a spectrum of data center facilities — beyond traditional hyperscaler- or cloud-optimized offerings to “neoclouds” and industry-standard colocation providers — across multiple geographical regions. This adds yet another layer of diversity and dynamism, further constraining data center design for the new AI era. &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;We can address these two challenges — dealing with dynamic growth and compounding heterogeneity — if we design data centers with &lt;/span&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;fungibility &lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;and&lt;/span&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt; agility &lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;as first-class considerations. Architectures need to be modular, so that components can be designed and deployed independently. They should be interoperable across different vendors or generations. Equally important, they should support the ability to late-bind the facility and systems to handle dynamically changing requirements (for example, reusing infrastructure designed for one generation with the next). Data centers should also be built on agreed-upon standard interfaces, so data center investments can be reused across multiple customer segments. And finally, these principles need to be applied holistically across &lt;/span&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;all&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt; components of the data center – power delivery, cooling, server hall design, compute, storage, and networking. &lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;With great computing comes great power (and cooling and systems)&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;To achieve agility and fungibility in power, we must standardize power delivery and management to build a resilient end-to-end power ecosystem, including common interfaces at the rack power level. Partnering with other members of the &lt;/span&gt;&lt;a href="https://www.opencompute.org/" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Open Compute Project&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; (OCP), we &lt;/span&gt;&lt;a href="https://cloud.google.com/blog/topics/systems/enabling-1-mw-it-racks-and-liquid-cooling-at-ocp-emea-summit?e=48754805"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;introduced&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; new technologies around +/-400Vdc designs and an approach for transitioning from monolithic to disaggregated solutions using side-car power, a.k.a. &lt;/span&gt;&lt;a href="https://www.opencompute.org/documents/ocp-specification-diablo-400-v0p5p2-2025-05-30-pdf" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Mt. Diablo&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;. Promising new technologies, like low-voltage DC power combined with solid state transformers, will enable these systems to transition to future fully integrated data center solutions.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;We are also evaluating solutions for data centers to &lt;/span&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;become suppliers to the grid, not just consumers&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt; from it, with corresponding standardization around battery-operated storage and microgrids. We already used such solutions to manage the challenges around the &lt;/span&gt;&lt;a href="https://cloud.google.com/blog/topics/systems/mitigating-power-and-thermal-fluctuations-in-ml-infrastructure?e=48754805"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;“spikiness” of AI training workloads&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; and are also applying them for additional savings around power efficiency and grid power usage. &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Data center cooling, meanwhile, is also being reimagined for the AI era. Earlier this year, we announced &lt;/span&gt;&lt;a href="https://cloud.google.com/blog/topics/systems/enabling-1-mw-it-racks-and-liquid-cooling-at-ocp-emea-summit"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Project Deschutes&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, a state-of-the-art liquid cooling solution that we contributed to the Open Compute community, and have since published the &lt;/span&gt;&lt;a href="https://www.opencompute.org/documents/ocp-specification-deschutes-final-2025-09-05-pdf" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;specification&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; and design &lt;/span&gt;&lt;a href="https://www.opencompute.org/documents/projectdeschutescduv0p75-20250812-1-zip" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;collateral&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;. The community is responding enthusiastically, with liquid cooling suppliers like Boyd, CoolerMaster, Delta, Envicool, Nidec, nVent, and Vertiv showcasing demos at major events this year, including the OCP Global Summit and SuperComputing 2025. But we have more opportunities to collaborate on: industry-standard cooling interfaces, new components like rear-door-heat exchangers, reliability, etc. One particularly important area is standardizing layouts and fit-out scopes across colos and third-party data centers, so we as an industry can enable more fungibility. &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Finally, we need to bring together compute, networking, and storage in the server hall, including physical attributes of the data center design such as rack height, width, and depth (and more recently, weight); aisle widths and layouts; as well as rack and network interfaces. We also need standards for telemetry and mechatronics to build and maintain these future data centers. With our fellow OCP partners, we are standardizing telemetry integration for third-party data centers, including establishing best practices, developing common naming and implementations, and creating standard security protocols. &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Beyond physical infrastructure, we are collaborating with our partners to deliver open standards for more scalable and secure systems. A few highlights include:&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Resilience:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; We’ve expanded our multi-year effort on &lt;/span&gt;&lt;a href="https://www.opencompute.org/documents/ocp-gpu-accelerator-management-interfaces-v1-pdf" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;manageability&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, &lt;/span&gt;&lt;a href="https://www.opencompute.org/documents/ocp-gpu-and-accelerators-ras-requirements-1-0-final-pdf" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;reliability and serviceability&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; from GPUs to include CPU &lt;/span&gt;&lt;a href="https://www.opencompute.org/documents/hyperscale-cpu-impactless-firmware-updates-requirements-specification-v0-7-9-29-2025-pdf" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;firmware updates&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; and &lt;/span&gt;&lt;a href="https://www.opencompute.org/documents/hyperscale-cpu-ras-and-debug-requirements-specification-v0-7-09-29-2025-pdf" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;debuggability&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Security:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; &lt;/span&gt;&lt;a href="https://www.opencompute.org/documents/caliptra-2-0-pdf" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Caliptra 2.0&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, the open-source hardware root of trust, now defends against future threats with post-quantum cryptography, while &lt;/span&gt;&lt;a href="https://www.opencompute.org/sp/about-ocp-safe" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;OCP S.A.F.E.&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; makes security audits routine and cost-effective.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Storage:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; &lt;/span&gt;&lt;a href="https://www.opencompute.org/documents/ocp-lock-specification-v1-0-rc2-pdf" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;OCP L.O.C.K.&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; builds on Caliptra’s foundation to provide a robust, open-source key management solution for any storage device.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Networking:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; &lt;/span&gt;&lt;a href="https://www.youtube.com/watch?v=GCM3NjfY9Zo" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Congestion Signaling (CSIG)&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; has been standardized and is delivering measured improvements in load balancing. Alongside continued advancements in &lt;/span&gt;&lt;a href="https://sonicfoundation.dev/event/sonic-workshop-and-sonic-booth-at-ocp-global-summit/" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;SONiC&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, a new effort is underway to standardize &lt;/span&gt;&lt;a href="https://www.opencompute.org/blog/the-open-compute-project-announces-new-optical-circuit-switching-ocs-project" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Optical Circuit Switching&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Sustainability is embedded in our work. To provide insight into the environmental impact of AI, we developed a &lt;/span&gt;&lt;a href="https://cloud.google.com/blog/products/infrastructure/measuring-the-environmental-impact-of-ai-inference?e=48754805?utm_source%3Dlinkedin"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;new methodology&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; for measuring the energy, emissions, and water impact of emerging AI workloads, demonstrating that the median Gemini Apps text prompt consumes less than five drops of water and has the energy impact of watching TV for under nine seconds. We apply this type of data-driven approach to other collaborations across the OCP community: on an embodied carbon disclosure specification, green concrete, clean backup power, and reduced manufacturing emissions.&lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;A call to action: community-driven innovation and AI-for-AI&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Google has a long history of working with open ecosystems that have demonstrated the compounding power of community collaboration, and we have the opportunity to repeat that success as we design agile and fungible data centers for the AI era. Join us in the new &lt;/span&gt;&lt;a href="https://www.opencompute.org/about/a-call-for-collaboration-on-ai-data-center-infrastructure-standards" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;OCP Open Data Center for AI Strategic Initiative&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; on common standards and optimizations for agile and fungible data centers. &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;As we look ahead to the next waves of growth in AI, and the amazing advances they will unlock, we will need to leverage these AI advances in our own work, to amplify our productivity and innovation. An early example is &lt;/span&gt;&lt;a href="https://deepmind.google/discover/blog/how-alphachip-transformed-computer-chip-design/" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Deepmind AlphaChip&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, which uses AI to accelerate and optimize chip design. We are seeing more promising uses of AI for systems:&lt;/span&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt; &lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;across hardware, firmware, software, and testing; for performance, agility, reliability, and sustainability; and across design, deployment, maintenance, and security. These AI-enhanced optimizations and workflows are what will bring the next order-of-magnitude improvements to the data center. We look forward to the innovations ahead, and to your continued collaboration in driving them forward.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;</description><pubDate>Mon, 13 Oct 2025 23:00:00 +0000</pubDate><guid>https://cloud.google.com/blog/topics/systems/agile-data-centers-and-systems-to-enable-ai-innovations/</guid><category>AI &amp; Machine Learning</category><category>Infrastructure</category><category>Systems</category><og xmlns:og="http://ogp.me/ns#"><type>article</type><title>Agile AI architectures: A fungible data center for the intelligent era</title><description></description><site_name>Google</site_name><url>https://cloud.google.com/blog/topics/systems/agile-data-centers-and-systems-to-enable-ai-innovations/</url></og><author xmlns:author="http://www.w3.org/2005/Atom"><name>Parthasarathy Ranganathan</name><title>VP, Engineering Fellow</title><department></department><company></company></author><author xmlns:author="http://www.w3.org/2005/Atom"><name>Amin Vahdat</name><title>SVP and Chief Technologist, AI and Infrastructure</title><department></department><company></company></author></item><item><title>AI infrastructure is hot. New power distribution and liquid cooling infrastructure can help</title><link>https://cloud.google.com/blog/topics/systems/enabling-1-mw-it-racks-and-liquid-cooling-at-ocp-emea-summit/</link><description>&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;AI is fundamentally transforming the compute landscape, demanding unprecedented advances in data center infrastructure. At Google, we believe that physical infrastructure — the power, cooling, and mechanical systems that underpin everything — isn’t just important, but critical to AI’s continued scaling. &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;We have a long-standing partnership with the Open Compute Project (OCP) that has been instrumental in driving industry collaboration and open innovation in infrastructure. At the &lt;/span&gt;&lt;a href="https://www.opencompute.org/summit/emea-summit" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;2025 OCP EMEA Summit&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; today, we discussed the power delivery transformation from 48 volts direct current (VDC) to the new +/-400 VDC, which will enable IT racks to scale from 100 kilowatts up to 1 megawatt. We also shared that we’ll contribute our fifth-generation cooling distribution unit, Project Deschutes, to OCP, helping to accelerate adoption of liquid cooling industry-wide.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Transforming power delivery with 1 MW per IT rack&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Google has a long history of advancing data center power delivery. Almost 10 years ago, &lt;/span&gt;&lt;a href="https://www.youtube.com/watch?v=x_U4FyTabpg" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;we championed&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; the adoption of 48 VDC inside the IT rack to significantly increase the power distribution efficiency and reduce losses compared to what typical 12 VDC solutions delivered. The industry responded to our call to action to collaborate on this technology, and the resulting &lt;/span&gt;&lt;a href="https://www.opencompute.org/documents/google-open-rack-v2-flatbed-tray-48v-to-12v-payload-adapter" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;architecture&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; has worked well, scaling from 10 kilowatts to 100 kilowatts IT racks. &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;The AI era requires even greater power delivery capabilities for two distinct reasons. The first is simply that ML will require more than 500 kW per IT rack before 2030. The second is the densification of each IT rack, where every millimeter of space in the IT rack is used for tightly interconnected “xPUs” (e.g. GPUs, TPUs, CPUs). This requires a much higher voltage DC power distribution solution, where power components and battery backup are outside of the IT rack.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;We are excited to introduce +/-400 VDC power delivery that can support up to 1 MW per rack. This is about much more than simply increasing power delivery capacity — selecting 400 VDC as the nominal voltage allows us to leverage the supply chain established by electric vehicles (EVs), for greater economies of scale, more efficient manufacturing, and improved quality, to name a few benefits. As part of the &lt;/span&gt;&lt;a href="https://www.datacenterdynamics.com/en/news/microsoft-and-meta-reveal-open-ai-rack-design-with-separate-power-and-compute-cabinets/" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Mt Diablo project&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, we are collaborating with Meta and Microsoft at OCP to standardize the electrical and mechanical interfaces, and the 0.5 specification draft will be available for industry feedback in May. &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;The first embodiment of this work is an AC-to-DC sidecar power rack that disaggregates power components from the IT rack. This solution improves the end-to-end efficiency by ~ 3% while enabling the entire IT rack to be used for xPUs. Longer term, we are exploring directly distributing higher-voltage DC power within the data center and to the rack, for even greater power density and efficiency. &lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-image_full_width"&gt;
  &lt;div class="article-module h-c-page"&gt;&lt;div class="h-c-grid"&gt;
    &lt;figure class="article-image--large h-c-grid__col h-c-grid__col--6 h-c-grid__col--offset-3"&gt;
      &lt;img src="https://storage.googleapis.com/gweb-cloudblog-publish/original_images/1_-_400_VDC_power_delivery.gif" alt="1 - 400 VDC power delivery"&gt;
      &lt;figcaption class="article-image__caption"&gt;&lt;p data-block-key="bzqkm"&gt;+/-400 VDC power delivery: AC-to-DC sidecar power rack&lt;/p&gt;&lt;/figcaption&gt;
    &lt;/figure&gt;
  &lt;/div&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;The liquid cooling imperative&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;The dramatic increase in chip power consumption — from 100W chips to accelerators exceeding 1000W — has made advanced thermal management essential. Packing more powerful chips into racks also creates significant challenges for cooling density. Liquid cooling has emerged as the clear solution, given its superior thermal and hydraulic properties. Water can transport approximately 4000 times more heat per unit volume than air for a given temperature change, while the thermal conductivity of water is roughly 30 times greater than that of air. &lt;/span&gt;&lt;/p&gt;
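&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;As a quick back-of-the-envelope check of that volumetric figure, using typical room-temperature property values (approximate, and dependent on conditions):&lt;/span&gt;&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# Heat carried per cubic metre of coolant per kelvin of temperature rise,
# using rough textbook values for water and air near room temperature.
water_density = 997.0    # kg/m^3
water_cp      = 4180.0   # J/(kg*K)
air_density   = 1.2      # kg/m^3
air_cp        = 1005.0   # J/(kg*K)

water_volumetric = water_density * water_cp   # ~4.2e6 J/(m^3*K)
air_volumetric   = air_density * air_cp       # ~1.2e3 J/(m^3*K)

print(round(water_volumetric / air_volumetric))  # ~3500, the same order as the ~4000x quoted
&lt;/code&gt;&lt;/pre&gt;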
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;At Google, we’ve deployed liquid cooling at GigaWatt scale across more than 2000 TPU Pods in the past seven years with remarkable uptime — consistently at about 99.999%. Google first used liquid cooling in TPU v3 that was deployed in 2018. Liquid-cooled ML servers have nearly half the geometrical volume of their air-cooled counterparts because they replace bulky heatsinks with compact cold plates. This allowed us to double chip density and quadruple the size of our liquid-cooled TPU v3 supercomputer compared to the air-cooled TPU v2 generation.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;We’ve continued to refine this technology generation over generation, from TPU v3 and TPU v4, through TPU v5, and most recently, &lt;/span&gt;&lt;a href="https://blog.google/products/google-cloud/ironwood-tpu-age-of-inference/" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Ironwood&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;. Our implementation utilizes in-row coolant distribution units (CDUs) with redundant components and uninterruptible power supplies (UPS) for high availability. These CDUs isolate the rack's liquid loop from the facility loop, providing a controlled, high-performance cooling system delivered via manifolds, flexible hoses, and cold plates that are directly attached to the high-power chips. In our CDU architecture, named Project Deschutes, the pump and heat exchanger unit is redundant, which is what has enabled us to consistently achieve the above-mentioned fleet-wide CDU availability of ~99.999% since 2020.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;We will contribute the fifth-generation Project Deschutes CDU, currently in development, to OCP later this year. This contribution, including system details, specifications, and best practices, is intended to help accelerate the industry's adoption of liquid cooling at scale. Our insights are drawn from nearly a decade of designing and deploying liquid cooling across four generations of TPUs, and encompass:&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;Design for high cooling performance&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;Manufacturing quality&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;Reliability and uptime&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;Deployment velocity&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;Serviceability and operational excellence&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;Supply ecosystem advancements&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;&lt;/div&gt;
&lt;div class="block-image_full_width"&gt;
  &lt;div class="article-module h-c-page"&gt;&lt;div class="h-c-grid"&gt;
    &lt;figure class="article-image--large h-c-grid__col h-c-grid__col--6 h-c-grid__col--offset-3"&gt;
      &lt;img src="https://storage.googleapis.com/gweb-cloudblog-publish/images/2_-_Project_Deschutes.max-1000x1000.jpg" alt="2 - Project Deschutes"&gt;
      &lt;figcaption class="article-image__caption"&gt;&lt;p data-block-key="bzqkm"&gt;Project Deschutes CDU: 4th gen in deployment, 5th gen in concept&lt;/p&gt;&lt;/figcaption&gt;
    &lt;/figure&gt;
  &lt;/div&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Get ready for the next generation of AI&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;We're encouraged by the significant strides the industry has made in power delivery and liquid cooling. However, with the accelerating pace of AI hardware development, it's clear that we must collectively quicken our pace to prepare data centers for what’s next. We're particularly excited about the potential for rapid industry adoption of +/-400 VDC, facilitated by the upcoming Mt Diablo specification. We also strongly encourage the industry to adopt the Project Deschutes CDU design and leverage our extensive liquid cooling learnings. Together, by embracing these advancements and fostering deeper collaboration, we believe the most impactful innovations are still ahead.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;</description><pubDate>Tue, 29 Apr 2025 16:00:00 +0000</pubDate><guid>https://cloud.google.com/blog/topics/systems/enabling-1-mw-it-racks-and-liquid-cooling-at-ocp-emea-summit/</guid><category>Systems</category><og xmlns:og="http://ogp.me/ns#"><type>article</type><title>AI infrastructure is hot. New power distribution and liquid cooling infrastructure can help</title><description></description><site_name>Google</site_name><url>https://cloud.google.com/blog/topics/systems/enabling-1-mw-it-racks-and-liquid-cooling-at-ocp-emea-summit/</url></og><author xmlns:author="http://www.w3.org/2005/Atom"><name>Madhusudan Iyengar</name><title>Principal Engineer</title><department></department><company></company></author><author xmlns:author="http://www.w3.org/2005/Atom"><name>Amber Huffman</name><title>Principal Engineer, Google</title><department></department><company></company></author></item><item><title>How we got to 100 million cells in our global Li-ion rack battery fleet</title><link>https://cloud.google.com/blog/topics/systems/100-million-li-ion-cells-in-google-data-centers/</link><description>&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;When it comes to data center power systems, batteries play an important role. The applications that run in our data centers require nearly continuous uptime. And while utility power is highly reliable, power outages are unavoidable. &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;When an outage happens, batteries can supply short-duration power, allowing servers to operate continuously when the facility switches between AC power sources, or to ride through transient power disturbances. Or, if a facility loses both primary and alternate power sources for an extended period of time, batteries can supply sufficient power to allow machines to execute a clean shutdown procedure. This is helpful in expediting machine restarts after the power outage. More importantly, it helps ensure that critical user data is safely stored to disk and not lost in the power disruption. &lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-aside"&gt;&lt;dl&gt;
    &lt;dt&gt;aside_block&lt;/dt&gt;
    &lt;dd&gt;&amp;lt;ListValue: [StructValue([(&amp;#x27;title&amp;#x27;, &amp;quot;Ensure Your Data&amp;#x27;s Safety and Uptime with Google Cloud for free&amp;quot;), (&amp;#x27;body&amp;#x27;, &amp;lt;wagtail.rich_text.RichText object at 0x7f9fdcfd6e20&amp;gt;), (&amp;#x27;btn_text&amp;#x27;, &amp;#x27;Get started for free&amp;#x27;), (&amp;#x27;href&amp;#x27;, &amp;#x27;https://console.cloud.google.com/freetrial?redirectPath=/welcome&amp;#x27;), (&amp;#x27;image&amp;#x27;, None)])]&amp;gt;&lt;/dd&gt;
&lt;/dl&gt;&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;At Google, we rely on a 48Vdc rack power system with integrated battery backup units (BBUs), and in 2015, we became one of the first hyperscale data center providers to deploy Lithium-ion BBUs. These Li-ion batteries had twice the life, twice the power and half the volume of previous-generation lead-acid batteries. Switching from lead-acid batteries to Li-ion means we deploy only one-quarter the number of batteries, greatly reducing the battery waste generated by our data centers. &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;We recently reached an important milestone: Google has more than 100 million cells deployed in battery packs across our global data center fleet. This is remarkable, and only possible thanks to the safety-first approach we take to deploy Li-ion batteries at scale&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;The main safety risk associated with Li-ion batteries is the battery going into thermal runaway if it’s accidentally mishandled or exposed to excessive temperatures or overcharging. While a rare event, the resulting fire is extremely difficult to extinguish due to the large amount of heat generated, driving a thermal runaway chain reaction to nearby cells. &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;To deploy this large fleet of Li-ion cells, we have had to make safety a core principle of our battery design. Specifically, as an early adopter of the &lt;/span&gt;&lt;a href="https://www.ul.com/services/ul-9540a-test-method" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;UL9540A thermal runaway test method&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, we subject our Li-ion BBU designs to rigorous flame safety testing that demonstrates their ability to limit thermal runaway. As a result, Google has successfully been granted permits to deploy BBUs in some of the world’s most stringent jurisdictions, in the APAC region. &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;In addition, our Li-ion BBUs benefit from our distributed UPS architecture that offers significant availability and TCO benefits compared to traditional monolithic UPS systems. The distributed UPS architecture improves machine availability by: 1) reducing the failure-domain blast radius to a single rack, and 2) locating the batteries in the rack to eliminate intermediate points of failure between the UPS and machines. This architecture also provides TCO benefits by scaling the UPS with the deployment, i.e., reducing day-1 UPS cost. Additionally, locating the batteries in the rack on the same DC bus as the machines eliminates intermediate AC/DC power conversion steps that cause efficiency losses. In 2016 &lt;/span&gt;&lt;a href="https://cloud.google.com/blog/products/compute/google-joins-open-compute-project-to-drive-standards-in-it-infrastructure"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;we shared the 48V rack power system spec with the Open Compute Project&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, including specs for the Li-ion BBUs. &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Li-ion batteries have been crucial to ensuring the uninterrupted operation of Google Cloud data centers. By transitioning from lead-acid to Li-ion BBUs, we’ve significantly improved power availability, efficiency, and lifespan, even as we simultaneously address their critical safety risks. Our commitment to rigorous safety testing and adherence to standards and test methods like UL9540A has enabled us to deploy millions of Li-ion BBUs globally, providing our customers with the high level of reliability they expect from Google Cloud. &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Getting to 100 million Li-ion batteries is just one of many examples of how we are building a reliable cloud and power-efficient AI. As data center power systems evolve to include new technologies including large battery energy storage systems (BESS) and new workload requirements (AI workloads), we remain dedicated to exploring and implementing innovative solutions to build the most efficient and safest cloud data centers.&lt;/span&gt;&lt;/p&gt;
&lt;hr/&gt;
&lt;p&gt;&lt;sup&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;The authors would like to acknowledge Vijay Boovaragavan, Matt Tamashiro, Sandeep Sebastian, Thibault Pelloux-Gervais, Ken Wong, Mike Meakins, Stanley Fung, and Scott Sharp for their contributions.&lt;/span&gt;&lt;/sup&gt;&lt;/p&gt;&lt;/div&gt;</description><pubDate>Tue, 25 Feb 2025 16:00:00 +0000</pubDate><guid>https://cloud.google.com/blog/topics/systems/100-million-li-ion-cells-in-google-data-centers/</guid><category>Infrastructure</category><category>Systems</category><media:content height="540" url="https://storage.googleapis.com/gweb-cloudblog-publish/images/100_million_Li-ion_cells_in_Google_data_cent.max-600x600.jpg" width="540"></media:content><og xmlns:og="http://ogp.me/ns#"><type>article</type><title>How we got to 100 million cells in our global Li-ion rack battery fleet</title><description></description><image>https://storage.googleapis.com/gweb-cloudblog-publish/images/100_million_Li-ion_cells_in_Google_data_cent.max-600x600.jpg</image><site_name>Google</site_name><url>https://cloud.google.com/blog/topics/systems/100-million-li-ion-cells-in-google-data-centers/</url></og><author xmlns:author="http://www.w3.org/2005/Atom"><name>Christina Peabody</name><title>Cloud Rack Power Team TLM</title><department></department><company></company></author><author xmlns:author="http://www.w3.org/2005/Atom"><name>Gregory Sizikov</name><title>Cloud Power and Compliance Manager</title><department></department><company></company></author></item><item><title>Balance of power: A full-stack approach to power and thermal fluctuations in ML infrastructure</title><link>https://cloud.google.com/blog/topics/systems/mitigating-power-and-thermal-fluctuations-in-ml-infrastructure/</link><description>&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;The recent explosion of machine learning (ML) applications has created unprecedented demand for power delivery in the data center infrastructure that underpins those applications. Unlike server clusters in the traditional data center, where tens of thousands of workloads coexist with uncorrelated power profiles, large-scale batch-synchronized ML training workloads exhibit substantially different power usage patterns. Under these new usage conditions, it is increasingly challenging to ensure the reliability and availability of the ML infrastructure, as well as to improve data-center &lt;/span&gt;&lt;a href="https://cloud.google.com/blog/products/ai-machine-learning/goodput-metric-as-measure-of-ml-productivity"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;goodput&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; and energy efficiency. &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Google has been at the forefront of data center infrastructure design for several decades, with &lt;/span&gt;&lt;a href="https://ieeexplore.ieee.org/document/10551740" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;a long list of innovations&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; to our name. In this blog post, we highlight one of the key innovations that allowed us to manage unprecedented power and thermal fluctuations in our ML infrastructure. This innovation underscores the power of full codesign across the stack — from ASIC chip to data center, across both hardware and software. We also discuss the implications of this approach and propose a call to action for the broader industry. &lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;New ML workloads lead to new ML power challenges&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Today’s ML workloads require synchronized computation across tens of thousands of accelerator chips, together with their hosts, storage, and networking systems; these workloads often occupy one entire data-center cluster — or even multiples of them. The peak power utilization of these workloads could approach the rated power of all the underlying IT equipment, making power overscription much more difficult. Furthermore, power consumption rises and falls between idle and peak utilization levels much more steeply, due to the fact that the entire cluster’s power usage is now dominated by no more than a few large ML workloads. You can observe these power fluctuations when a workload launches or finishes, or when it is halted, then resumed or rescheduled. You may also observe a similar pattern when the workload is running normally, mostly attributable to alternating compute- and networking-intensive phases of the workload within a training step. Depending on the workload’s characteristics, these inter- and intra-job power fluctuations can occur very frequently. This can result in multiple unintended consequences on the functionality, performance, and reliability of the data center infrastructure.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-image_full_width"&gt;






  
    &lt;div class="article-module h-c-page"&gt;
      &lt;div class="h-c-grid"&gt;
  

    &lt;figure class="article-image--large
      
      
        h-c-grid__col
        h-c-grid__col--6 h-c-grid__col--offset-3
        
        
      "
      &gt;

      
      
        
        &lt;img
            src="https://storage.googleapis.com/gweb-cloudblog-publish/images/1_f9EbAew.max-1000x1000.png"
        
          alt="1"&gt;
        
        &lt;/a&gt;
      
        &lt;figcaption class="article-image__caption "&gt;&lt;p data-block-key="9vul9"&gt;Fig. 1. Large power fluctuations observed on cluster level with large-scale synchronized ML workloads&lt;/p&gt;&lt;/figcaption&gt;
      
    &lt;/figure&gt;

  
      &lt;/div&gt;
    &lt;/div&gt;
  




&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;In fact, in our latest batch-synchronous ML workloads running on dedicated ML clusters, we observed power fluctuations in the tens of megawatts (MW), as shown in Fig.1. And compared to a traditional load variation profile, the ramp speed could be almost instantaneous, repeat as frequently as every few seconds, and last for weeks… or even months! &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Fluctuations of this kind pose the following risks:&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;Functionality and long-term reliability issues with rack and data center equipment, resulting in hardware-induced outages, reduced energy efficiency and increased operational/maintenance costs, including but not limited to rectifiers, transformers, generators, cables and busways&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;Damage, outage, or throttling at the upstream utility, including violation of contractual commitments to the utility on power usage profiles, and corresponding financial costs&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;Unintended and frequent triggering of the uninterrupted power supply (UPS) system from large power fluctuations, resulting in shortened lifetime of the UPS system&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Large power fluctuations may also impact hardware reliability at a much smaller per-chip or per-system scale. Although the maximum temperature is well under control, power fluctuations may still translate into large and frequent temperature fluctuations, triggering various forms of interactions including warpage, changes to thermal interface material property, and electromigration.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-aside"&gt;&lt;dl&gt;
    &lt;dt&gt;aside_block&lt;/dt&gt;
    &lt;dd&gt;&amp;lt;ListValue: [StructValue([(&amp;#x27;title&amp;#x27;, &amp;#x27;Try Google Cloud for free&amp;#x27;), (&amp;#x27;body&amp;#x27;, &amp;lt;wagtail.rich_text.RichText object at 0x7f9fdd65d8e0&amp;gt;), (&amp;#x27;btn_text&amp;#x27;, &amp;#x27;Get started for free&amp;#x27;), (&amp;#x27;href&amp;#x27;, &amp;#x27;https://console.cloud.google.com/freetrial?redirectPath=/welcome&amp;#x27;), (&amp;#x27;image&amp;#x27;, None)])]&amp;gt;&lt;/dd&gt;
&lt;/dl&gt;&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;A full-stack approach to proactive power shaping&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Due to the high complexity and large scale of our data-center infrastructure, we posited that &lt;/span&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;proactively shaping a workload’s power profile&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt; could be more efficient than simply adapting to it. Google’s full codesign across the stack — from chip to data center, from hardware to software, and from instruction set to realistic workload — provides us with all the knobs we need to implement highly efficient end-to-end power management features to regulate our workloads’ power profiles and mitigate detrimental fluctuations. &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Specifically, we installed instrumentation in the TPU compiler to check on signatures in the workload that are linked with power fluctuations, such as sync flags. We then dynamically balance the activities of major compute blocks of the TPU around these flags to smooth out their utilization over time. This achieves our goal of mitigating power and thermal fluctuations with negligible performance overhead. In the future, we may also apply a similar approach to the workload’s starting and completion phases, resulting in a gradual, rather than abrupt, change in power levels. &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;We’ve now implemented this compiler-based approach to shaping the power profile and applied it on realistic workloads. We measured the system’s total power consumption and a single chip’s hotspot temperature with, and without, the mitigation, as plotted in Fig. 2 and Fig. 3, respectively. In the test case, the magnitude of power fluctuations dropped by nearly 50% from the baseline case to the mitigation case. The magnitude of temperature fluctuations also dropped from ~20 C in the baseline case to ~10 C in the mitigation case. We measured the cost of the mitigation by the increase in average power consumption and the length of the training step. With proper tuning of the mitigation parameters, we can achieve the benefits of our design with small increases in average power with &amp;lt;1% performance impact.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-image_full_width"&gt;






  
    &lt;div class="article-module h-c-page"&gt;
      &lt;div class="h-c-grid"&gt;
  

    &lt;figure class="article-image--large
      
      
        h-c-grid__col
        h-c-grid__col--6 h-c-grid__col--offset-3
        
        
      "
      &gt;

      
      
        
        &lt;img
            src="https://storage.googleapis.com/gweb-cloudblog-publish/images/2_x9eRU4h.max-1000x1000.png"
        
          alt="2"&gt;
        
        &lt;/a&gt;
      
        &lt;figcaption class="article-image__caption "&gt;&lt;p data-block-key="9vul9"&gt;Fig. 2. Power fluctuation with and without the compiler-based mitigation&lt;/p&gt;&lt;/figcaption&gt;
      
    &lt;/figure&gt;

  
      &lt;/div&gt;
    &lt;/div&gt;
  




&lt;/div&gt;
&lt;div class="block-image_full_width"&gt;






  
    &lt;div class="article-module h-c-page"&gt;
      &lt;div class="h-c-grid"&gt;
  

    &lt;figure class="article-image--large
      
      
        h-c-grid__col
        h-c-grid__col--6 h-c-grid__col--offset-3
        
        
      "
      &gt;

      
      
        
        &lt;img
            src="https://storage.googleapis.com/gweb-cloudblog-publish/images/3_lWze6j1.max-1000x1000.jpg"
        
          alt="3"&gt;
        
        &lt;/a&gt;
      
        &lt;figcaption class="article-image__caption "&gt;&lt;p data-block-key="9vul9"&gt;Fig. 3. Chip temperature fluctuation with and without the compiler-based mitigation&lt;/p&gt;&lt;/figcaption&gt;
      
    &lt;/figure&gt;

  
      &lt;/div&gt;
    &lt;/div&gt;
  




&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;A call to action &lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;ML infrastructure is growing rapidly and expected to surpass traditional server infrastructure in terms of total power demand in the coming years. At the same time, ML infrastructure’s power and temperature fluctuations are unique and tightly coupled with the ML workload’s characteristics. Mitigating these fluctuations is just one example of many innovations we need to ensure reliable and high-performance infrastructure. In addition to the method described above, we’ve been investing in an array of innovative techniques to take on ever-increasing power and thermal challenges, including data center water cooling, vertical power delivery, power-aware workload allocation, and many more. &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;But these challenges aren’t unique to Google. Power and temperature fluctuations in ML infrastructure are becoming a common issue for many hyperscalers and cloud providers as well as infrastructure providers. We need partners at all levels of the system to help: &lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;Utility providers to set forth a standardized definition of acceptable power quality metrics — especially in scenarios where multiple data centers with large power fluctuations co-exist within a same grid and interact with one another&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;Power and cooling equipment suppliers to offer quality and reliability enhancements for electronics components, particularly for use-conditions with large and frequent power and thermal fluctuations&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;Hardware suppliers and data center designers to create a standardized suite of solutions such as rack-level capacitor banks (RLCB) or on-chip features, to help establish an efficient supplier base and ecosystem&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;ML model developers to consider the energy consumption characteristics of the model, and consider adding low-level software mitigations to help address energy fluctuations&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Google has been leading and advocating for industry-wide collaboration on these issues through forums such as Open Compute Project (OCP) to benefit the data center infrastructure industry as a whole. We look forward to continuing to share our learnings and collaborating on innovative new solutions together.&lt;/span&gt;&lt;/p&gt;
&lt;hr/&gt;
&lt;p&gt;&lt;sup&gt;&lt;em&gt;&lt;span style="vertical-align: baseline;"&gt;A special thanks to Denis Vnukov, Victor Cai, Jianqiao Liu, Ibrahim Ahmed, Venkata Chivukula, Jianing Fan, Gaurav Gandhi, Vivek Sharma, Keith Kleiner, Mudasir Ahmad, Binz Roy, Krishnanjan Gubba Ravikumar, Ashish Upreti and Chee Chung from Google Cloud for their contributions.&lt;/span&gt;&lt;/em&gt;&lt;/sup&gt;&lt;/p&gt;&lt;/div&gt;</description><pubDate>Tue, 11 Feb 2025 17:00:00 +0000</pubDate><guid>https://cloud.google.com/blog/topics/systems/mitigating-power-and-thermal-fluctuations-in-ml-infrastructure/</guid><category>Systems</category><og xmlns:og="http://ogp.me/ns#"><type>article</type><title>Balance of power: A full-stack approach to power and thermal fluctuations in ML infrastructure</title><description></description><site_name>Google</site_name><url>https://cloud.google.com/blog/topics/systems/mitigating-power-and-thermal-fluctuations-in-ml-infrastructure/</url></og><author xmlns:author="http://www.w3.org/2005/Atom"><name>Houle Gan</name><title>Technical Lead Manager</title><department></department><company></company></author><author xmlns:author="http://www.w3.org/2005/Atom"><name>Parthasarathy Ranganathan</name><title>VP, Engineering Fellow</title><department></department><company></company></author></item><item><title>Designing sustainable AI: A deep dive into TPU efficiency and lifecycle emissions</title><link>https://cloud.google.com/blog/topics/sustainability/tpus-improved-carbon-efficiency-of-ai-workloads-by-3x/</link><description>&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;As AI continues to unlock new opportunities for business growth and societal benefits, we’re working to reduce the carbon intensity of AI systems — including by optimizing software, improving hardware efficiency, and powering AI models with carbon-free energy.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Today we’re releasing a &lt;/span&gt;&lt;a href="https://arxiv.org/abs/2502.01671" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;first-of-its-kind study&lt;/span&gt;&lt;/a&gt;&lt;sup&gt;1&lt;/sup&gt;&lt;span style="vertical-align: baseline;"&gt; on the lifetime emissions of our Tensor Processing Unit (TPU) hardware. Over two generations — from TPU v4 to Trillium — more efficient TPU hardware design has led to a 3x improvement in the carbon-efficiency of AI workloads.&lt;sup&gt;2&lt;/sup&gt;&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Our life-cycle assessment (LCA) provides the first detailed estimate of emissions from an AI accelerator, using observational data from raw material extraction and manufacturing, to energy consumption during operation. These measurements provide a snapshot of the average, chip-level carbon intensity of Google’s TPU hardware, and enable us to compare efficiency across generations. &lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Introducing Compute Carbon Intensity (CCI)&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Our study examined five models of TPUs to estimate their full life-cycle emissions and understand how hardware design decisions have impacted their carbon-efficiency. To measure emissions relative to computational performance and enable apples-to-apples comparisons between chips, we developed a new metric — Compute Carbon Intensity (CCI) — that we believe can enable greater transparency and innovation across the industry.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;CCI quantifies an AI accelerator chip’s carbon emissions per unit of computation (measured in grams of &lt;span style="vertical-align: baseline;"&gt;CO&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;&lt;span style="vertical-align: sub;"&gt;2&lt;/span&gt;&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;e&lt;/span&gt;&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt; per Exa-FLOP).&lt;sup&gt;3&lt;/sup&gt;&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt; Lower CCI scores mean lower emissions from the AI hardware platform for a given AI workload — for example training an AI model. We've used CCI to track the progress we've made in increasing the carbon-efficiency of our TPUs, and we’re excited to share the results. &lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Key takeaways&lt;/strong&gt;&lt;/h3&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Google’s TPUs have become significantly more carbon-efficient.&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Our study found a 3x improvement in the CCI of our TPU chips over 4 years, from TPU v4 to Trillium. By choosing newer generations of TPUs — &lt;/span&gt;&lt;a href="https://cloud.google.com/blog/products/compute/introducing-trillium-6th-gen-tpus"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;like our 6th-generation TPU, Trillium&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; — our customers not only get cutting-edge performance, but also generate fewer carbon emissions for the same AI workload. &lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Operational electricity emissions are key.&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Today, operational electricity emissions comprise the vast majority (70%+) of a Google TPU’s lifetime emissions. This underscores the importance of improving the energy efficiency of AI chips and reducing the carbon intensity of the electricity that powers them. Google’s efforts to&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt; &lt;/span&gt;&lt;a href="https://www.google.com/about/datacenters/cleanenergy/" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;run on 24/7 carbon-free energy (CFE) on every grid where we operate by 2030&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; aims directly at reducing the largest contributor to TPU emissions — operational electricity consumption. &lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Manufacturing matters.&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; While operational emissions dominate an AI chip's lifetime emissions, emissions associated with chip manufacturing are still notable — and their share of total emissions will increase as we reduce operational emissions with carbon-free energy. The study’s detailed manufacturing LCA helps us target our manufacturing decarbonization efforts towards the highest-impact initiatives. We're &lt;/span&gt;&lt;a href="https://www.gstatic.com/gumdrop/sustainability/google-2024-environmental-report.pdf" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;actively working&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; with our supply chain partners to reduce these emissions through more sustainable manufacturing processes and materials. &lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Our significant improvements in AI hardware carbon-efficiency in this paper complement rapid advancements in AI model and algorithm design. Outside of this study, continued optimization of AI models is reducing the number of computations required for a given model performance. Some models that once required a supercomputer to run can now be run on a laptop, and at Google we’re using techniques like &lt;/span&gt;&lt;a href="https://cloud.google.com/blog/products/compute/accurate-quantized-training-aqt-for-tpu-v5e"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Accurate Quantized Training&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; and &lt;/span&gt;&lt;a href="https://research.google/blog/looking-back-at-speculative-decoding/" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;speculative decoding&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; to further increase model efficiency. We expect model advancements to continue unlocking carbon-efficiency gains, and are working to quantify the impact of software design on carbon-efficiency in future studies. &lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-aside"&gt;&lt;dl&gt;
    &lt;dt&gt;aside_block&lt;/dt&gt;
    &lt;dd&gt;&amp;lt;ListValue: [StructValue([(&amp;#x27;title&amp;#x27;, &amp;#x27;$300 in free credit to try Google Cloud TPU API&amp;#x27;), (&amp;#x27;body&amp;#x27;, &amp;lt;wagtail.rich_text.RichText object at 0x7f9fdcfe2490&amp;gt;), (&amp;#x27;btn_text&amp;#x27;, &amp;#x27;Start building for free&amp;#x27;), (&amp;#x27;href&amp;#x27;, &amp;#x27;http://console.cloud.google.com/freetrial?redirectPath=/marketplace/product/google/tpu.googleapis.com&amp;#x27;), (&amp;#x27;image&amp;#x27;, None)])]&amp;gt;&lt;/dd&gt;
&lt;/dl&gt;&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Partnering for a sustainable AI future&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;The detailed approach we’ve taken here allows us to target our efforts to continue increasing the carbon-efficiency of our TPUs. &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;This life-cycle analysis of AI hardware is an important first step in quantifying and sharing the carbon-efficiency of our AI systems, but it's just the beginning. We will continue to analyze other aspects of AI’s emissions footprint — for example AI model emissions and software efficiency gains — and share our insights with customers and the broader industry. &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Together, we can harness the &lt;/span&gt;&lt;a href="https://ai.google/advancing-ai/why-ai/" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;transformative power of AI&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; while minimizing its impact on the planet.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Explore our latest &lt;/strong&gt;&lt;a href="https://cloud.google.com/tpu"&gt;&lt;strong style="text-decoration: underline; vertical-align: baseline;"&gt;TPU offerings&lt;/strong&gt;&lt;/a&gt;&lt;strong style="vertical-align: baseline;"&gt; and learn more about how customers can &lt;/strong&gt;&lt;a href="https://cloud.google.com/sustainability"&gt;&lt;strong style="text-decoration: underline; vertical-align: baseline;"&gt;unlock sustainable growth&lt;/strong&gt;&lt;/a&gt;&lt;strong style="vertical-align: baseline;"&gt; with Google Cloud.&lt;/strong&gt;&lt;/p&gt;
&lt;hr/&gt;
&lt;p&gt;&lt;sup&gt;&lt;em&gt;1. &lt;span style="vertical-align: baseline;"&gt;The authors would like to thank and acknowledge the co-authors for their important contributions: Ian Schneider, Hui Xu, Stephan Benecke, Tim Huang, and Cooper Elsworth.&lt;br/&gt;2. &lt;span style="vertical-align: baseline;"&gt;A February 2025 Google case study quantified the full lifecycle emissions of TPU hardware as a point-in-time snapshot across Google’s generations of TPUs. To estimate operational emissions from electricity consumption of running workloads, we used a one month sample of observed machine power data from our entire TPU fleet, applying Google’s 2023 average fleetwide carbon intensity. To estimate embodied emissions from manufacturing, transportation, and retirement, we performed a life-cycle assessment of the hardware. Data center construction emissions were estimated based on Google’s disclosed 2023 carbon footprint. These findings do not represent model-level emissions, nor are they a complete quantification of Google’s AI emissions. Based on the TPU location of a specific workload, CCI results of specific workloads may vary.&lt;/span&gt;&lt;br/&gt;3. &lt;span style="vertical-align: baseline;"&gt;CCI includes both estimates of lifetime embodied and operational emissions in order to understand the impact of improved chip design on our TPUs. In this study, we hold the impact of carbon-free energy on carbon intensity constant across generations, by using Google's 2023 average fleetwide carbon intensity. We did this purposefully to remove the impact of deployment location on the results.&lt;/span&gt;&lt;/span&gt;&lt;/em&gt;&lt;/sup&gt;&lt;/p&gt;&lt;/div&gt;</description><pubDate>Wed, 05 Feb 2025 17:00:00 +0000</pubDate><guid>https://cloud.google.com/blog/topics/sustainability/tpus-improved-carbon-efficiency-of-ai-workloads-by-3x/</guid><category>Compute</category><category>AI &amp; Machine Learning</category><category>Systems</category><category>Sustainability</category><og xmlns:og="http://ogp.me/ns#"><type>article</type><title>Designing sustainable AI: A deep dive into TPU efficiency and lifecycle emissions</title><description></description><site_name>Google</site_name><url>https://cloud.google.com/blog/topics/sustainability/tpus-improved-carbon-efficiency-of-ai-workloads-by-3x/</url></og><author xmlns:author="http://www.w3.org/2005/Atom"><name>David Patterson</name><title>Google Distinguished Engineer, Google</title><department></department><company></company></author><author xmlns:author="http://www.w3.org/2005/Atom"><name>Parthasarathy Ranganathan</name><title>VP, Engineering Fellow</title><department></department><company></company></author></item><item><title>Speed, scale and reliability: 25 years of Google data-center networking evolution</title><link>https://cloud.google.com/blog/products/networking/speed-scale-reliability-25-years-of-data-center-networking/</link><description>&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Rome wasn’t built in a day, and neither was Google’s network. But 25 years in, we’ve built out network infrastructure with scale and technical sophistication that’s nothing short of remarkable.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;It’s all the more impressive because in the beginning, Google’s network infrastructure was relatively simple. But as our user base and the demand for our services grew exponentially, we realized that we needed a network that could handle an unprecedented scale of data and traffic, and that could adapt to dynamic traffic patterns as our workloads changed over time. This ignited a 25-year journey marked by numerous engineering innovations and milestones, ultimately leading to our current fifth-generation &lt;/span&gt;&lt;a href="https://cloud.google.com/blog/topics/systems/the-evolution-of-googles-jupiter-data-center-network?e=48754805"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Jupiter data center network&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; architecture, which now scales to 13 Petabits/sec of bisectional bandwidth. To put this data rate in perspective, this network could support a video call (@1.5 Mb/s) for all 8 billion people on Earth! &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Today, we have hundreds of Jupiter fabrics deployed around the world, simultaneously supporting hundreds of services, billions of active daily users, all of our Google Cloud customers, and some of the largest ML training and serving infrastructures in the world. I would like to share more about our journey as we look ahead to the next generation of data center network infrastructure.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-aside"&gt;&lt;dl&gt;
    &lt;dt&gt;aside_block&lt;/dt&gt;
    &lt;dd&gt;&amp;lt;ListValue: [StructValue([(&amp;#x27;title&amp;#x27;, &amp;#x27;$300 to try Google Cloud networking&amp;#x27;), (&amp;#x27;body&amp;#x27;, &amp;lt;wagtail.rich_text.RichText object at 0x7f9fdd353640&amp;gt;), (&amp;#x27;btn_text&amp;#x27;, &amp;#x27;Start building for free&amp;#x27;), (&amp;#x27;href&amp;#x27;, &amp;#x27;http://console.cloud.google.com/freetrial?redirectpath=/products?#networking&amp;#x27;), (&amp;#x27;image&amp;#x27;, None)])]&amp;gt;&lt;/dd&gt;
&lt;/dl&gt;&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Guiding principles&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Our network evolution has been guided by a few key principles:&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Anything, anywhere: &lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;Our data center networks support efficiency and simplicity by allowing large-scale jobs to be placed anywhere among 100k+ servers within the same network fabric, with high-speed access to needed storage and support services. This scale improves application performance for internal and external workloads and eliminates internal fragmentation. &lt;/span&gt;&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Predictable, low latency: &lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;We prioritize consistent performance and minimizing tail latency by provisioning bandwidth headroom, maintaining 99.999% network availability, and proactively managing congestion through end-host and fabric cooperation.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Software-defined and systems-centric:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Leveraging software-defined networking (SDN) for flexibility and agility, we qualify and globally release dozens of new features every two weeks across our global network.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Incremental evolution and dynamic topology: &lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;Incremental evolution helps us to refresh the network granularly (rather than bringing it down wholesale), while dynamic topology helps us to continuously adapt to changing workload demands. The combination of optical circuit switching and SDN supports in-place physical upgrades and an ever-evolving, heterogeneous network that supports multiple hardware generations in a single fabric.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Traffic engineering and application-centric QoS:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Optimizing traffic flows and ensuring Quality of Service helps us tailor the network to each application's needs.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Integrating across the above principles is the foundation for our work. The network is the foundation of reliability for all other compute services, from storage to AI. As such, the network must fail last and fail least. To support this foundational responsibility, we rigorously define and monitor every &lt;/span&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;bad minute&lt;sup&gt;1&lt;/sup&gt;&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt; across hundreds of clusters and millions of ports across our global network. Our progress on reliability is such that our in-house, software-defined Jupiter networks deliver a factor of &lt;/span&gt;&lt;a href="https://research.google/pubs/orion-googles-software-defined-networking-control-plane/" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;50x more reliability&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; than prior versions of our data center networks. &lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;2015 - Jupiter, the first Petabit network &lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;In a seminal paper, we&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt; &lt;/span&gt;&lt;a href="https://research.google/pubs/jupiter-rising-a-decade-of-clos-topologies-and-centralized-control-in-googles-datacenter-network/" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;showed&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; &lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;that&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt; &lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;Jupiter data center networks scaled to 1.3 Pb/s of aggregate bandwidth by leveraging merchant switch silicon, Clos topologies and Software Defined Networking (SDN). This generation of Jupiter was the culmination of five generations of data center networks developed in house by the Google networking team. At that time, this data rate — in &lt;/span&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;one&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt; Google data center — was more than the estimated aggregate IP traffic data rate for the global internet. &lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;2022 - Enabling 6 Petabit per second&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;In 2022 we&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt; &lt;/span&gt;&lt;a href="https://cloud.google.com/blog/topics/systems/the-evolution-of-googles-jupiter-data-center-network"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;announced&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; &lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;that our Jupiter networks scaled to over 6 Pb/s, with deep integration of optical circuit switching (OCS), wave division multiplexing (WDM), and a highly scalable &lt;/span&gt;&lt;a href="https://www.usenix.org/conference/nsdi21/presentation/ferguson" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Orion&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; SDN controller. These technologies unlocked a range of advancements, including incremental network builds, enhanced performance, reduced costs, lower power consumption, dynamic traffic management, and seamless upgrades.&lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;2023 - 13 Petabit per second network&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;We have further enhanced Jupiter to support native 400 Gb/s link speeds in the network core. The fundamental building block of Jupiter networks (called the &lt;/span&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;aggregation block&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;) now consists of 512 ports of 400 Gb/s of connectivity both to end hosts and to the rest of the data center, for an aggregate of 204.8 Tb/s of bidirectional non-blocking bandwidth per block. We support 64 such blocks for a total bisection bandwidth of 64*204.8 Tb/s = 13.1 Pb/s. This technology has been powering Google's production data centers for over a year, fueling the rapid advancement of artificial intelligence, machine learning, web search, and other data-intensive applications.&lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;2024 and beyond - Extreme networking in the age of AI&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;While celebrating over two decades of innovation in data center networking, we’re already charting the course for the next generation of network infrastructure to support the age of AI. For example, our teams are busy working on networking infrastructure needs for our upcoming &lt;/span&gt;&lt;a href="https://cloud.google.com/blog/products/compute/trillium-sixth-generation-tpu-is-in-preview"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;A3 Ultra VMs&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, that feature NVIDIA ConnectX-7 networking, &lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt; supports non-blocking 3.2 Tbps per server of GPU-to-GPU traffic over RoCE (RDMA over converged ethernet) and our future offerings based on &lt;/span&gt;&lt;a href="https://www.nvidia.com/en-us/data-center/gb200-nvl72/" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;NVIDIA GB200 NVL72&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;. &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Over the next few years, we will deliver significant advances in network scale and bandwidth, both per-port and network-wide. We will continue to push the boundaries of end-host integration, including the transport and congestion control stack, and streamline network stages to achieve even lower latency with tighter tails. Real-time topology engineering, deeper integration with the compute and storage stacks, and continued refinements to host-based load balancing techniques will further enhance network reliability and latency. With these innovations, our network will remain a cornerstone for the transformative applications and services that enrich the lives of our users throughout the world while simultaneously supporting the groundbreaking AI capabilities that power both our internal services and Google Cloud products.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;We are excited to take on these challenges and opportunities to see what the next 25 years hold for Google networking!&lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Further resources&lt;/strong&gt;&lt;/h3&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;Jupiter Rising: A Decade of Clos Topologies and Centralized Control in Google’s Datacenter Network, SIGCOMM ‘15 [&lt;/span&gt;&lt;a href="https://research.google/pubs/jupiter-rising-a-decade-of-clos-topologies-and-centralized-control-in-googles-datacenter-network/" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;paper&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;]&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;ul&gt;
&lt;li aria-level="2" style="list-style-type: circle; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;Journey of the first Jupiter datacenter network leveraging merchant switch silicon, Clos topologies and Software Defined Networking (SDN).&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="2" style="list-style-type: circle; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;First deployed in production in 2012.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;Mission Apollo: Landing Optical Circuit Switching at Datacenter Scale, &lt;/span&gt;&lt;a href="http://arxiv.org" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;arxiv.org&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, 2022 [&lt;/span&gt;&lt;a href="https://arxiv.org/abs/2208.10041" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;paper&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;]&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;ul&gt;
&lt;li aria-level="2" style="list-style-type: circle; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;First deployed in production in 2013.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;Orion: Google's Software-Defined Networking Control Plane. NSDI ‘21 [&lt;/span&gt;&lt;a href="https://research.google/pubs/orion-googles-software-defined-networking-control-plane/" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;paper&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;]&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;ul&gt;
&lt;li aria-level="2" style="list-style-type: circle; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;Google's high-performance, scalable, intent-based distributed SDN platform used in both datacenter and wide area networks.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="2" style="list-style-type: circle; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;First deployed in production in 2016.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;Jupiter Evolving: Transforming Google's Datacenter Network via Optical Circuit Switches and Software-Defined Networking, SIGCOMM ’22 [&lt;/span&gt;&lt;a href="https://research.google/pubs/jupiter-evolving-transforming-googles-datacenter-network-via-optical-circuit-switches-and-software-defined-networking/" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;paper&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;]&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;ul&gt;
&lt;li aria-level="2" style="list-style-type: circle; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;Enabling technologies: OCS (2013), Orion SDN (2016), 200Gbps networking (2020), direct-connect topology (2017), dynamic traffic engineering (2018), dynamic topology engineering (2021).&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;Swift: Delay is Simple and Effective for Congestion Control in the Datacenter, SIGCOMM ‘20 [&lt;/span&gt;&lt;a href="https://research.google/pubs/swift-delay-is-simple-and-effective-for-congestion-control-in-the-datacenter/" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;paper&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;]&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;ul&gt;
&lt;li aria-level="2" style="list-style-type: circle; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;Swift, a congestion control protocol using hardware timestamps and AIMD control with a delay target, delivers excellent performance in Google datacenters with low flow completion times for short RPCs and high throughput for long RPCs.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="2" style="list-style-type: circle; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;First deployed in production in 2017&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;PLB: Congestion Signals are Simple and Effective for Network Load Balancing, SIGCOMM ‘22 [&lt;/span&gt;&lt;a href="https://research.google/pubs/plb-congestion-signals-are-simple-and-effective-for-network-load-balancing/" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;paper&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;]&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;ul&gt;
&lt;li aria-level="2" style="list-style-type: circle; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;Protective Load Balancing (PLB) is a simple, effective host-based load balancing design that reduces network congestion and improves performance by randomly changing paths for congested connections, preferring to repath after idle periods to minimize packet reordering.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="2" style="list-style-type: circle; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;First deployed in production in 2020&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/ul&gt;
&lt;hr/&gt;
&lt;p&gt;&lt;sup&gt;&lt;em&gt;&lt;span style="vertical-align: baseline;"&gt;1. Any minute where a statistically significant number of network flows in the data center network experience a total or partial outage above a defined threshold.&lt;/span&gt;&lt;/em&gt;&lt;/sup&gt;&lt;/p&gt;&lt;/div&gt;</description><pubDate>Wed, 30 Oct 2024 16:00:00 +0000</pubDate><guid>https://cloud.google.com/blog/products/networking/speed-scale-reliability-25-years-of-data-center-networking/</guid><category>Systems</category><category>Networking</category><media:content height="540" url="https://storage.googleapis.com/gweb-cloudblog-publish/images/25_years.max-600x600.jpg" width="540"></media:content><og xmlns:og="http://ogp.me/ns#"><type>article</type><title>Speed, scale and reliability: 25 years of Google data-center networking evolution</title><description></description><image>https://storage.googleapis.com/gweb-cloudblog-publish/images/25_years.max-600x600.jpg</image><site_name>Google</site_name><url>https://cloud.google.com/blog/products/networking/speed-scale-reliability-25-years-of-data-center-networking/</url></og><author xmlns:author="http://www.w3.org/2005/Atom"><name>Amin Vahdat</name><title>VP/GM, Machine Learning, Systems, and Cloud AI, Google Cloud</title><department></department><company></company></author></item><item><title>Sustainable silicon to intelligent clouds: collaborating for the future of computing</title><link>https://cloud.google.com/blog/topics/systems/2024-ocp-global-summit-keynote/</link><description>&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;&lt;strong&gt;Editor’s note&lt;/strong&gt;: Today, we hear from Parthasarathy Ranganathan, Google VP and Technical Fellow and Amber Huffman, Principal Engineer. Partha delivered a keynote address today at the &lt;/span&gt;&lt;a href="https://www.opencompute.org/summit/global-summit" rel="noopener" target="_blank"&gt;&lt;span style="font-style: italic; text-decoration: underline; vertical-align: baseline;"&gt;2024 OCP Global Summit&lt;/span&gt;&lt;/a&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;, an annual conference for leaders, researchers, and pioneers in the open hardware industry. Amber is on the board of directors at the &lt;/span&gt;&lt;a href="http://www.opencompute.org" rel="noopener" target="_blank"&gt;&lt;span style="font-style: italic; text-decoration: underline; vertical-align: baseline;"&gt;Open Compute Project&lt;/span&gt;&lt;/a&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt; (OCP). Read on to hear about the past and future of hyperscale computing, and an overview of all of our activities in the OCP community.&lt;/span&gt;&lt;/p&gt;
&lt;hr/&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;We are in an exciting era of hyperscale computing, one where a new wave of innovations is building the foundation for AI/ML computing in the cloud. Building on Google’s rich 25-year history in hyperscale computing, we look ahead to how &lt;/span&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;co-design&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt; and &lt;/span&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;collaboration&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt; — across the hardware-software stack, disciplines, and communities — will be key to this exciting new future. &lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;From scrappy beginnings to societal infrastructure&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;When Google was founded in 1998, it was clear that successful web search would require enormous amounts of computing power and storage. This led to the design of the very first hyperscale computers specialized for search. These early makeshift systems included creative cost-reduction approaches like &lt;/span&gt;&lt;a href="https://collection.sciencemuseumgroup.org.uk/objects/co8358083/google-cork-board-server-1999" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;corkboard servers&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; and off-the-shelf fans from Walmart, and they set the stage for the hardware-software co-design and workload-specific specialization principles that we follow to this day. &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Building on these first systems, over the subsequent decade, Google laid the groundwork for modern hyperscale computing, pioneering custom servers, custom networking, and custom data centers, and expanding our services beyond search to include Gmail, YouTube, and Android. All of this presaged the modern multi-workload cloud. During this period, we also developed essential systems software like &lt;/span&gt;&lt;a href="https://research.google/pubs/large-scale-cluster-management-at-google-with-borg/" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Borg&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, &lt;/span&gt;&lt;a href="https://cloud.google.com/blog/products/storage-data-transfer/a-peek-behind-colossus-googles-file-system"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Colossus&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, &lt;/span&gt;&lt;a href="https://research.google/pubs/mapreduce-simplified-data-processing-on-large-clusters/" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;MapReduce&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, and &lt;/span&gt;&lt;a href="https://research.google/pubs/bigtable-a-distributed-storage-system-for-structured-data/" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Bigtable&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;. In the following years, we focused on scaling these systems, while also prioritizing security, reliability, and power efficiency. The &lt;/span&gt;&lt;a href="https://en.wikipedia.org/wiki/Open_Compute_Project" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;formation of the Open Compute Project&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; (OCP) in 2011 marked the transition of hyperscale computing from niche discipline to more mainstream offering. In the current decade, hyperscale computing is characterized by innovations to counter the slowing of Moore’s law: specialized hardware to support machine learning and video processing as well as software-defined servers to manage heterogeneity. &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Today, hyperscale computing has truly come into its own, evolving into the crucial societal infrastructure that drives cloud and AI workloads.&lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Cross-disciplinary co-design: the heart of innovation&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Across all these Google innovations over the past 25 years, one theme has remained constant: a strong commitment to cross-disciplinary systems innovation and co-design. Looking ahead to the AI era, we continue to take a holistic approach: from “mud to cloud” — starting at the very ground on which we build our data centers up to broader cloud computing services; and from “chip to ship” — designing hardware that we then deploy and use in production. This philosophy has driven some incredible efficiency gains, delivering orders-of-magnitude improvements across multiple generations of systems.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Take our Tensor Processing Units (TPUs). Multiple generations of these purpose-built AI accelerators (including our latest &lt;/span&gt;&lt;a href="https://cloud.google.com/blog/products/compute/introducing-trillium-6th-gen-tpus"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Trillium TPU&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;) have driven significant advances in machine learning, including large-language models like &lt;/span&gt;&lt;a href="https://blog.google/technology/ai/google-gemini-ai/" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Gemini&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; and &lt;/span&gt;&lt;a href="https://www.nature.com/articles/d41586-024-03214-7" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Nobel-prize-winning&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; scientific breakthroughs like &lt;/span&gt;&lt;a href="https://blog.google/technology/ai/google-deepmind-isomorphic-alphafold-3-ai-model/" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;AlphaFold&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;. However, we’ve gone beyond just chip design to considering the entire system that surrounds them. We've &lt;/span&gt;&lt;a href="https://hc2023.hotchips.org/assets/program/conference/day2/ML%20training/HC2023.Session5.ML_Training.Google.Norm_Jouppi.Andy_Swing.Final_2023-08-25.pdf" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;coupled TPUs with innovations&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; like liquid cooling, advanced networking systems featuring cutting-edge optics and topology awareness, and a commitment to sustainable power, all in the service of creating a truly amazing AI platform. We've then layered open software frameworks like JAX, TensorFlow, OpenXLA, and Kubernetes on top of this hardware foundation, creating what we call the &lt;/span&gt;&lt;a href="https://cloud.google.com/solutions/ai-hypercomputer"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;AI Hypercomputer&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;. This hypercomputer is further enhanced by integrating with model gardens and applications, creating a vertically integrated ecosystem that's optimized for AI workloads.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-image_full_width"&gt;






  
    &lt;div class="article-module h-c-page"&gt;
      &lt;div class="h-c-grid"&gt;
  

    &lt;figure class="article-image--large
      
      
        h-c-grid__col
        h-c-grid__col--6 h-c-grid__col--offset-3
        
        
      "
      &gt;

      
      
        
        &lt;img
            src="https://storage.googleapis.com/gweb-cloudblog-publish/images/image1_w3Mkyhx.max-1000x1000.png"
        
          alt="image1"&gt;
        
        &lt;/a&gt;
      
    &lt;/figure&gt;

  
      &lt;/div&gt;
    &lt;/div&gt;
  




&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Cross-industry collaboration: from ideas to impact&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;But there’s also another aspect of holistic co-design that has served us well: cross-industry collaborations, i.e., building standards and ecosystems. Our partnership with OCP is an important example of this. Since formally joining OCP in 2016, we’ve continued to grow our contributions year after year. Looking ahead, we want to highlight progress and opportunities in four key areas.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Sustainability&lt;br/&gt;&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;Last year, Google, along with fellow hyperscalers, &lt;/span&gt;&lt;a href="https://imasons.org/press-releases/greener-concrete-for-digital-infrastructure-an-open-letter-and-call-to-action/" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;rallied&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; the industry to reduce carbon emissions with an ambitious roadmap towards greener concrete. We have since made good progress, collaborating to develop new metrics and benchmarks, identifying streamlined data center designs that minimize concrete use, and even using AI to research new materials. At a recent &lt;/span&gt;&lt;a href="https://www.opencompute.org/blog/leading-data-center-companies-partner-with-open-compute-project-foundation-and-wje-to-trial-green-concrete" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;event&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, we &lt;/span&gt;&lt;a href="https://vimeo.com/1003646073" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;demonstrated&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; proof-of-concept concrete mixtures that can reduce carbon emissions by 20% to 40%. &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;As we work towards net-zero emissions by 2030 across our operations and value chain, there’s a lot more we can do. At OCP this year, we are discussing how to develop &lt;/span&gt;&lt;a href="https://www.environdec.com/product-category-rules-pcr/the-pcr" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;product category rules&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; (PCRs) to accurately measure hardware emissions across the lifecycle, make more high-quality carbon data available, and develop clean reliable power backup for our data centers. Further, we’re continuing to look holistically at all aspects of our energy consumption, carbon footprint, and water usage. &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Trusted silicon&lt;br/&gt;&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;Trusted silicon is a foundational element of hyperscaler systems. Over the past three years, we have collaborated on &lt;/span&gt;&lt;a href="https://chipsalliance.github.io/Caliptra/" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Caliptra&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, a re-usable IP block for root-of-trust management, and delivered an open-source implementation of Caliptra 1.0 that is being integrated by companies across the ecosystem. Google's future TPUs and ARM SoCs will also include Caliptra. Leveraging Caliptra, the &lt;/span&gt;&lt;a href="https://www.youtube.com/watch?v=ImeRgORWgOo&amp;amp;list=PLAG-eekRQBSgIzncibp47d3Yry4rIduQS&amp;amp;index=5&amp;amp;pp=iAQB" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;OCP L.O.C.K.&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; project will provide layered open-source cryptographic key management for storage devices, improving both trust and sustainability. &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;In the area of silicon reliability, we are continuing our industry-academia collaborations around a systems approach to addressing silicon faults and silent data errors, including &lt;/span&gt;&lt;a href="https://www.opencompute.org/blog/ocps-server-resilience-initiative-sdc-academic-research-awards-announced" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;funding six leading academic institutions&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; for novel research. The &lt;/span&gt;&lt;a href="https://www.opencompute.org/documents/external-ver-1-0-open-compute-specification-server-component-resilience-sdc-workstream-docx-pdf" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Server Component Resilience (SDC) Specification&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; discusses the opportunities ahead with standardized information exchange, test metrics, and open frameworks for detecting and mitigating errors. &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;AI accelerators&lt;br/&gt;&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;AI represents a fundamental platform shift requiring us to innovate across hardware and software. Google has played an active role in driving standardization efforts for AI accelerators, particularly in areas like low-precision data formats (e.g., &lt;/span&gt;&lt;a href="https://www.opencompute.org/documents/ocp-8-bit-floating-point-specification-ofp8-revision-1-0-2023-12-01-pdf-1" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;OCP FP8 and MX&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;), software frameworks (e.g., &lt;/span&gt;&lt;a href="https://openxla.org/" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;OpenXLA&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, JAX, TensorFlow), and networking (&lt;/span&gt;&lt;a href="https://cloud.google.com/blog/topics/systems/introducing-falcon-a-reliable-low-latency-hardware-transport"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Falcon&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, &lt;/span&gt;&lt;a href="https://ultraethernet.org/" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Ultra Ethernet&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, &lt;/span&gt;&lt;a href="https://www.businesswire.com/news/home/20240530653602/en/AMD-Broadcom-Cisco-Google-Hewlett-Packard-Enterprise-Intel-Meta-and-Microsoft-Form-Ultra-Accelerator-Link-UALink-Promoter-Group-to-Drive-Data-Center-AI-Connectivity" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Ultra Accelerator Link&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;). Working with other hyperscalers and GPU suppliers, we have also aligned on common specifications for &lt;/span&gt;&lt;a href="https://www.opencompute.org/documents/ocp-gpu-fw-update-specification-v0-9-pdf" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;firmware updates&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, &lt;/span&gt;&lt;a href="https://www.opencompute.org/documents/ocp-gpu-accelerator-management-interfaces-v0-9-pdf" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;management interfaces&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, and &lt;/span&gt;&lt;a href="https://www.opencompute.org/documents/ocp-gpu-and-accelerators-ras-requirements-v0-9-pdf" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;RAS&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; (reliability, availability, serviceability). &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;But as AI continues to drive exponential demands on computing, we can do more. As part of the &lt;/span&gt;&lt;a href="https://www.opencompute.org/projects/open-systems-for-ai" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;OCP AI Strategic initiative&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, we are sharing learnings from deploying over 1 GW of liquid cooled infrastructure to help the industry scale this capability. We are also identifying new power-delivery solutions, from chips to racks to data centers. Notably, akin to how Google &lt;/span&gt;&lt;a href="https://www.datacenterfrontier.com/cloud/article/11431310/google-unveils-48v-data-center-rack-joins-open-compute" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;led the industry with 48V racks&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, at OCP Summit this year, we are proposing 400V DC distribution and rack solutions that can significantly improve data center density and efficiency.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Systems infrastructure&lt;br/&gt;&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;Finally, we continue to make great progress on foundational systems infrastructure. Google's contributions this past year span NVM Express for the data center (e.g., security enhancements, open test repositories), servers (e.g., &lt;/span&gt;&lt;a href="https://opentitan.org/" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;OpenTitan&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; platform root of trust), and networking (Falcon, SONiC advancements in telemetry and simulation, advanced PCIe enclosure compatible form factor), as well as new efforts such as open-source random shock and vibration testing. At the same time, we’ve gone beyond technical contributions to form and co-chair the &lt;/span&gt;&lt;a href="https://www.opencompute.org/about/advisory-board" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;OCP Advisory Board&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; as well as guide the formation of the OCP AI Strategic Initiative.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Looking ahead, we will continue to keep innovating in this space, particularly to meet the next level of scale required by AI infrastructure. Notably, at the OCP Summit this year, we are discussing the adoption of robotics and automation for data centers. Across a range of activities (material movement, monitoring/inspection, servicing/repair, media management), robotics enable data center operations to scale safely and sustainably, and present a fundamental shift in how we build these facilities. &lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Innovating for the new intelligence revolution&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;We have a lot to be proud of over the past 25 years of hyperscale computing, but the best is yet to come. With AI, we are at an exciting inflection point in computing: the beginning of the &lt;/span&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;new intelligence revolution&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;. Akin to prior shifts — the industrial revolution for manufacturing or the information revolution with the mobile internet — this revolution will have a profound impact on both technology and society, and holistic system innovations will be key to enabling it. We look forward to collaborating with all of you on this exciting journey. &lt;/span&gt;&lt;/p&gt;&lt;/div&gt;</description><pubDate>Tue, 15 Oct 2024 18:00:00 +0000</pubDate><guid>https://cloud.google.com/blog/topics/systems/2024-ocp-global-summit-keynote/</guid><category>Systems</category><media:content height="540" url="https://storage.googleapis.com/gweb-cloudblog-publish/images/OCP24_blog_hero_1.max-600x600.jpg" width="540"></media:content><og xmlns:og="http://ogp.me/ns#"><type>article</type><title>Sustainable silicon to intelligent clouds: collaborating for the future of computing</title><description></description><image>https://storage.googleapis.com/gweb-cloudblog-publish/images/OCP24_blog_hero_1.max-600x600.jpg</image><site_name>Google</site_name><url>https://cloud.google.com/blog/topics/systems/2024-ocp-global-summit-keynote/</url></og><author xmlns:author="http://www.w3.org/2005/Atom"><name>Parthasarathy Ranganathan</name><title>VP, Engineering Fellow</title><department></department><company></company></author><author xmlns:author="http://www.w3.org/2005/Atom"><name>Amber Huffman</name><title>Principal Engineer, Google</title><department></department><company></company></author></item><item><title>Advancing systems research: Synthesized Google storage I/O traces now available to the community</title><link>https://cloud.google.com/blog/topics/systems/synthesized-google-storage-io-traces-now-available-as-open-source/</link><description>&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Designing large-scale distributed storage systems is a complex challenge, requiring deep insights into how storage hardware and software interact under real-world conditions. To empower researchers in this field, we recently released a collection of synthesized Google I/O traces for storage servers and disks. This release accompanies our paper, "&lt;/span&gt;&lt;a href="https://dl.acm.org/doi/10.1145/3620666.3651337" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Thesios: Synthesizing Accurate Counterfactual I/O Traces from I/O Samples&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;," published at ASPLOS 2024.&lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;What are I/O traces and why do they matter?&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;I/O traces are records of the input/output operations happening on storage devices and servers, and are crucial for understanding real-world storage behavior and performance. Representative I/O traces that capture the diverse patterns and demands of exascale data centers (such as Google’s) are especially valuable. By studying these traces, researchers can:&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;Gain deeper insights into storage system performance and bottlenecks&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;Build more accurate models and simulate realistic workloads&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;Develop targeted optimizations for more efficient and reliable storage systems&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;But obtaining high-quality I/O traces is challenging due to storage-system heterogeneity and the need to capture details while minimizing overhead. To address these issues, we developed a novel methodology called Thesios.&lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Introducing Thesios: a methodology for I/O trace synthesis&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;We developed the Thesios&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt; methodology to create accurate and representative I/O traces. Thesios achieves this by combining down-sampled I/O traces (which are routinely collected in Google's data centers) from multiple disks across multiple storage servers.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
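&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;As a rough illustration of the core idea (not the production pipeline), the sketch below merges down-sampled I/O records from several similar disks into a single full-resolution trace; the field names and the 1/N-sampling assumption are ours, not the released schema.&lt;/span&gt;&lt;/p&gt;
&lt;pre&gt;
import heapq
import operator

def synthesize_disk_trace(sampled_traces, sampling_rate):
    """Rough sketch of the core Thesios idea: combine down-sampled I/O
    records from several similar disks into one full-resolution trace for a
    single hypothetical disk. Each per-disk list is assumed to be sorted by
    timestamp; with a 1/N sampling rate, merging samples from N similar
    disks approximates the full load of one disk."""
    disks_needed = round(1.0 / sampling_rate)
    if operator.lt(len(sampled_traces), disks_needed):
        raise ValueError("need samples from at least 1/sampling_rate similar disks")
    chosen = sampled_traces[:disks_needed]
    merged = heapq.merge(*chosen, key=lambda record: record["timestamp"])
    return list(merged)
&lt;/pre&gt;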
&lt;div class="block-image_full_width"&gt;






  
    &lt;div class="article-module h-c-page"&gt;
      &lt;div class="h-c-grid"&gt;
  

    &lt;figure class="article-image--large
      
      
        h-c-grid__col
        h-c-grid__col--6 h-c-grid__col--offset-3
        
        
      "
      &gt;

      
      
        
        &lt;img
            src="https://storage.googleapis.com/gweb-cloudblog-publish/images/1_RyWYHjV.max-1000x1000.png"
        
          alt="1"&gt;
        
        &lt;/a&gt;
      
        &lt;figcaption class="article-image__caption "&gt;&lt;p data-block-key="qj2cn"&gt;Thesios synthesizes a full-resolution I/O trace for a single disk by combining I/O samples from multiple independent disks.&lt;/p&gt;&lt;/figcaption&gt;
      
    &lt;/figure&gt;

  
      &lt;/div&gt;
    &lt;/div&gt;
  




&lt;/div&gt;
&lt;div class="block-image_full_width"&gt;






  
    &lt;div class="article-module h-c-page"&gt;
      &lt;div class="h-c-grid"&gt;
  

    &lt;figure class="article-image--large
      
      
        h-c-grid__col
        h-c-grid__col--6 h-c-grid__col--offset-3
        
        
      "
      &gt;

      
      
        
        &lt;img
            src="https://storage.googleapis.com/gweb-cloudblog-publish/original_images/2-synthesis-method.jpg"
        
          alt="2"&gt;
        
        &lt;/a&gt;
      
        &lt;figcaption class="article-image__caption "&gt;&lt;p data-block-key="qj2cn"&gt;Thesios requires (1) a sampling service that collects I/O samples, (2) an entity identifier that identifies similar disks, (3) a trace synthesizer that generates server-level traces by combining samples, and (4) a trace reorganizer that adjusts request ordering and latency to produce a disk-level trace.&lt;/p&gt;&lt;/figcaption&gt;
      
    &lt;/figure&gt;

  
      &lt;/div&gt;
    &lt;/div&gt;
  




&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;The challenge? Storage systems are internally heterogeneous, so naively combining samples collected from disks varying in model, size, utilization, and other aspects will not result in a representative trace. Thesios intelligently accounts for this diversity, helping to ensure that the synthesized traces accurately reflect real-world conditions. Our results show remarkable accuracy relative to actual aggregated statistics that we’ve collected:&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;95-99.5%&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; accuracy in read/write request numbers&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;90-97%&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; accuracy in utilization&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;80-99.8%&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; accuracy in read latency&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;&lt;/div&gt;
&lt;div class="block-image_full_width"&gt;






  
    &lt;div class="article-module h-c-page"&gt;
      &lt;div class="h-c-grid"&gt;
  

    &lt;figure class="article-image--large
      
      
        h-c-grid__col
        h-c-grid__col--6 h-c-grid__col--offset-3
        
        
      "
      &gt;

      
      
        
        &lt;img
            src="https://storage.googleapis.com/gweb-cloudblog-publish/images/3-reads-breakdown.max-1000x1000.jpg"
        
          alt="3"&gt;
        
        &lt;/a&gt;
      
        &lt;figcaption class="article-image__caption "&gt;&lt;p data-block-key="qj2cn"&gt;Total number of read operations and breakdown by latency-sensitive (L), throughput-oriented (TP) and other (O) requests of synthesized traces vs. the actual statistics. The traces synthesized by Thesios faithfully capture the fluctuation across days of the week, and hours of the day.&lt;/p&gt;&lt;/figcaption&gt;
      
    &lt;/figure&gt;

  
      &lt;/div&gt;
    &lt;/div&gt;
  




&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;A unique capability of Thesios is the ability to synthesize counterfactual I/O traces for conducting data-driven “what-if'' studies. In our paper, we demonstrate how Thesios enables diverse counterfactual I/O-trace synthesis and analyses of hypothetical policy, hardware, and server changes via four example case studies:&lt;/span&gt;&lt;/p&gt;
&lt;ol&gt;
&lt;li aria-level="1" style="list-style-type: decimal; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;Synthesizing I/O traces for disks with hypothetical capacities, utilization, and fullness&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: decimal; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;Experimenting with data segregation to form hot and cold disks by using different workload filtering criteria and analyzing the data segregation’s impacts on power consumption&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: decimal; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;Evaluating the impact on energy consumption and latency of deploying a low rotations-per-minute (RPM) disk&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: decimal; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;Estimating the impact on cache hits of increasing buffer cache size on a server&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
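&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;As a small, hypothetical illustration of the second case study, a counterfactual trace can be produced by filtering a synthesized trace with a workload predicate; the field names below are assumptions, not the released trace schema.&lt;/span&gt;&lt;/p&gt;
&lt;pre&gt;
def split_hot_cold(trace, hot_filter):
    """Illustrative 'what-if' transformation: split one synthesized disk
    trace into hypothetical hot and cold traces using a caller-supplied
    predicate, e.g., latency-sensitive vs. throughput-oriented requests."""
    hot, cold = [], []
    for record in trace:
        (hot if hot_filter(record) else cold).append(record)
    return hot, cold

# Example (hypothetical field name): treat latency-sensitive requests as hot.
# hot, cold = split_hot_cold(trace, lambda r: r.get("io_class") == "latency_sensitive")
&lt;/pre&gt;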
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Why open-source these traces?&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;We have released two-month-long synthesized representative traces from three different Google storage clusters, containing approximately 2.5 billion I/O records. These traces include I/O operations from both user-facing and internal applications. Our goal is to fuel storage-systems research by sharing realistic workloads that we encountered in our large-scale data centers. We hope these traces will:&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;Inspire new optimizations and innovations in storage technology&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;Enable more accurate simulations and modeling of large-scale storage systems&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;Serve as a model for how industry can securely share production traces with academia, fostering collaboration and progress&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;We invite systems researchers to explore our Google I/O traces. We believe these traces offer a unique opportunity to delve into the complex world of large-scale storage and drive meaningful advancements. Download the &lt;/span&gt;&lt;a href="https://github.com/google-research-datasets/thesios" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;traces&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; and start your research today!&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;For a deeper dive into our methodology and the technical details, we encourage you to read our ASPLOS paper: &lt;/span&gt;&lt;a href="https://dl.acm.org/doi/10.1145/3620666.3651337" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Thesios: Synthesizing Accurate Counterfactual I/O Traces from I/O Samples&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;.&lt;/span&gt;&lt;/p&gt;
&lt;hr/&gt;
&lt;p&gt;&lt;sup&gt;&lt;em&gt;&lt;span style="vertical-align: baseline;"&gt;The research in this post describes joint work with our colleagues Soroush Ghodrati, Selene Moon, and Martin Maas. We also extend special thanks to Larry Greenfield, Mustafa Uysal, Arif Merchant, Seth Pollen, and Partha Ranganathan for their help and feedback on the trace release.&lt;/span&gt;&lt;/em&gt;&lt;/sup&gt;&lt;/p&gt;&lt;/div&gt;</description><pubDate>Tue, 25 Jun 2024 16:00:00 +0000</pubDate><guid>https://cloud.google.com/blog/topics/systems/synthesized-google-storage-io-traces-now-available-as-open-source/</guid><category>Storage &amp; Data Transfer</category><category>Systems</category><og xmlns:og="http://ogp.me/ns#"><type>article</type><title>Advancing systems research: Synthesized Google storage I/O traces now available to the community</title><description></description><site_name>Google</site_name><url>https://cloud.google.com/blog/topics/systems/synthesized-google-storage-io-traces-now-available-as-open-source/</url></og><author xmlns:author="http://www.w3.org/2005/Atom"><name>Phitchaya Mangpo Phothilimthana</name><title>Staff Research Scientist, Google DeepMind</title><department></department><company></company></author><author xmlns:author="http://www.w3.org/2005/Atom"><name>Saurabh Kadekodi</name><title>Senior Research Scientist, Google</title><department></department><company></company></author></item><item><title>Announcing Trillium, the sixth generation of Google Cloud TPU</title><link>https://cloud.google.com/blog/products/compute/introducing-trillium-6th-gen-tpus/</link><description>&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Generative AI is transforming how we interact with technology while simultaneously opening tremendous efficiency &lt;/span&gt;&lt;a href="https://cloud.google.com/transform/101-real-world-generative-ai-use-cases-from-industry-leaders"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;opportunities for business impact&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;. But these advances require ever greater compute, memory, and communication to train and fine tune the most capable models and to serve them interactively to a global user population. For more than a decade, we at Google have been developing custom AI-specific hardware, Tensor Processing Units, or TPUs, to push forward the frontier of what is possible in scale and efficiency. &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;This hardware supported a number of the innovations we announced today at Google I/O, including new models like &lt;/span&gt;&lt;a href="https://blog.google/technology/ai/google-gemini-update-flash-ai-assistant-io-2024" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Gemini 1.5 Flash&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, &lt;/span&gt;&lt;a href="https://blog.google/technology/ai/google-generative-ai-veo-imagen-3/" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Imagen 3&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, and &lt;/span&gt;&lt;a href="https://developers.googleblog.com/en/gemma-family-and-toolkit-expansion-io-2024/" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Gemma 2&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;; all of these models have been trained on and are served using TPUs. To deliver the next frontier of models and enable you to do the same, we’re excited to announce Trillium, our sixth-generation TPU, the most performant and most energy-efficient TPU to date.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Trillium&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt; TPUs achieve an impressive 4.7X increase in peak compute performance per chip compared to TPU v5e. We doubled the High Bandwidth Memory (HBM) capacity and bandwidth, and also doubled the Interchip Interconnect (ICI) bandwidth over TPU v5e. Additionally, &lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;Trillium&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt; is equipped with third-generation &lt;/span&gt;&lt;a href="https://cloud.google.com/tpu/docs/system-architecture-tpu-vm#sparsecore"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;SparseCore&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, a specialized accelerator for processing ultra-large embeddings common in advanced ranking and recommendation workloads. &lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;Trillium&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt; TPUs make it possible to train the next wave of foundation models faster and serve those models with reduced latency and lower cost. Critically, our sixth-generation TPUs are also our most sustainable: Trillium TPUs are over 67% more energy-efficient than TPU v5e.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Trillium can scale up to 256 TPUs in a single high-bandwidth, low-latency pod. Beyond this pod-level scalability, with &lt;/span&gt;&lt;a href="https://cloud.google.com/tpu/docs/multislice-introduction"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;multislice technology&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; and &lt;/span&gt;&lt;a href="https://cloud.google.com/titanium"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Titanium Intelligence Processing Units (IPUs&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;), Trillium TPUs can scale to hundreds of pods, connecting tens of thousands of chips in a building-scale supercomputer interconnected by a &lt;/span&gt;&lt;a href="https://cloud.google.com/blog/topics/systems/the-evolution-of-googles-jupiter-data-center-network"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;multi-petabit-per-second datacenter network&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;. &lt;/span&gt;&lt;/p&gt;
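&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;As a quick back-of-the-envelope check on that scale (illustrative arithmetic only, with an assumed pod count), a few hundred 256-chip pods do indeed reach tens of thousands of chips:&lt;/span&gt;&lt;/p&gt;
&lt;pre&gt;
CHIPS_PER_POD = 256      # stated Trillium pod size
pods_in_cluster = 100    # "hundreds of pods"; assumed example value

total_chips = CHIPS_PER_POD * pods_in_cluster
print(f"{pods_in_cluster} pods x {CHIPS_PER_POD} chips/pod = {total_chips} chips")
# 100 pods x 256 chips/pod = 25600 chips, i.e., tens of thousands of chips.
&lt;/pre&gt;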
&lt;h2&gt;&lt;span style="vertical-align: baseline;"&gt;The next phase of AI innovation with Trillium&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;&lt;a href="https://cloud.google.com/blog/products/ai-machine-learning/an-in-depth-look-at-googles-first-tensor-processing-unit-tpu"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;More than a decade ago&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, Google recognized the need for a first-of-its-kind chip for machine learning. In 2013, we began work on the world’s first purpose-built AI accelerator, TPU v1, followed by the first Cloud TPU in 2017. Without TPUs, many of Google’s most popular services — such as real-time voice search, photo object recognition, and interactive language translation, along with the state-of-the-art foundation models such as Gemini, Imagen, and Gemma — would not be possible. In fact, the scale and efficiency of TPUs enabled foundational work on &lt;/span&gt;&lt;a href="https://research.google/blog/transformer-a-novel-neural-network-architecture-for-language-understanding/" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Transformers&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; in Google Research, the algorithmic underpinnings of modern generative AI. &lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;span style="vertical-align: baseline;"&gt;4.7X increase in compute performance per Trillium chip&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;TPUs were designed from the ground up for neural networks, and we’re always working to improve training and serving times for AI workloads. &lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;Trillium&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt; achieves 4.7X peak compute per chip compared to TPU v5e. To achieve this level of performance, we’ve expanded the size of &lt;/span&gt;&lt;a href="https://cloud.google.com/tpu/docs/system-architecture-tpu-vm"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;matrix multiply units (MXUs)&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; and increased the clock speed. Additionally, SparseCores accelerate embedding-heavy workloads by strategically offloading random and fine-grained access from TensorCores. &lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;span style="vertical-align: baseline;"&gt;2X ICI and High Bandwidth Memory (HBM) capacity and bandwidth&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Doubling the HBM capacity and bandwidth allows Trillium to work with larger models with more weights and larger key-value caches. Next-generation HBM enables higher memory bandwidth, improved power efficiency, and a flexible channel architecture to increase memory throughput. This improves training time and serving latency for large models. That’s twice the model weights and key-value caches, accessed faster and with more compute capacity for accelerating ML workloads. Doubling the ICI bandwidth enables training and inference jobs to scale to tens of thousands of chips powered by a strategic combination of custom optical ICI interconnects with 256 chips in a pod and &lt;/span&gt;&lt;a href="https://cloud.google.com/blog/topics/systems/the-evolution-of-googles-jupiter-data-center-network?e=48754805"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Google Jupiter Networking&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; that extends scalability to hundreds of pods in a cluster.&lt;/span&gt;&lt;/p&gt;
&lt;h2&gt;&lt;span style="vertical-align: baseline;"&gt;Trillium&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt; will power the next generation of AI models &lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Trillium TPUs will power the next wave of AI models and agents, and we’re looking forward to helping enable our customers with these advanced capabilities. For example, &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;Essential AI’s&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; mission is to deepen the partnership between humans and computers, and the company looks forward to using Trillium to reinvent how businesses operate. &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;Nuro&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; is dedicated to creating a better everyday life through robotics by training their models with Cloud TPUs; &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;Deep Genomics&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; is powering the future of drug discovery with AI and looking forward to how their next foundational model, powered by Trillium, will change the lives of patients; and &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;Deloitte&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;, Google Cloud Partner of the Year for AI, will offer Trillium to transform businesses with generative AI. Support for training and serving of long-context, multimodal models on Trillium TPUs will also enable &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;Google DeepMind&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; to train and serve future generations of Gemini models faster, more efficiently, and with lower latency than ever before.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-image_carousel"&gt;


&lt;div class="h-c-page article-module"&gt;
  &lt;div class="article-module glue-pagination h-c-carousel h-c-carousel--simple h-c-carousel--dark ng-cloak" data-glue-pagination-config="{cyclical: true}"&gt;

    &lt;div class="h-c-carousel__wrap"&gt;
      &lt;ul class="glue-carousel ng-cloak" data-glue-carousel-options="{pointerTypes: ['touch', 'mouse'], jump: true}"&gt;

        
          &lt;li class="h-c-carousel__item article-carousel__slide"&gt;
            &lt;figure&gt;
              
                
                  
                  &lt;div class="article-carousel__slide-img" style="background-image: url(https://storage.googleapis.com/gweb-cloudblog-publish/images/1_jeff_dean.max-2000x2000.png);"&gt;&lt;span class="h-u-visually-hidden"&gt;Jeff Dean&lt;/span&gt;&lt;/div&gt;
                
              

              
            &lt;/figure&gt;
          &lt;/li&gt;
        
          &lt;li class="h-c-carousel__item article-carousel__slide"&gt;
            &lt;figure&gt;
              
                
                  
                  &lt;div class="article-carousel__slide-img" style="background-image: url(https://storage.googleapis.com/gweb-cloudblog-publish/images/Blog-post-5.max-2000x2000.png);"&gt;&lt;span class="h-u-visually-hidden"&gt;Blog-post-5&lt;/span&gt;&lt;/div&gt;
                
              

              
            &lt;/figure&gt;
          &lt;/li&gt;
        
          &lt;li class="h-c-carousel__item article-carousel__slide"&gt;
            &lt;figure&gt;
              
                
                  
                  &lt;div class="article-carousel__slide-img" style="background-image: url(https://storage.googleapis.com/gweb-cloudblog-publish/images/2_Andrew_Clare.max-2000x2000.png);"&gt;&lt;span class="h-u-visually-hidden"&gt;Andrew Clare&lt;/span&gt;&lt;/div&gt;
                
              

              
            &lt;/figure&gt;
          &lt;/li&gt;
        
          &lt;li class="h-c-carousel__item article-carousel__slide"&gt;
            &lt;figure&gt;
              
                
                  
                  &lt;div class="article-carousel__slide-img" style="background-image: url(https://storage.googleapis.com/gweb-cloudblog-publish/images/3_Brendan_Frey.max-2000x2000.png);"&gt;&lt;span class="h-u-visually-hidden"&gt;Brendan Frey&lt;/span&gt;&lt;/div&gt;
                
              

              
            &lt;/figure&gt;
          &lt;/li&gt;
        
          &lt;li class="h-c-carousel__item article-carousel__slide"&gt;
            &lt;figure&gt;
              
                
                  
                  &lt;div class="article-carousel__slide-img" style="background-image: url(https://storage.googleapis.com/gweb-cloudblog-publish/images/4_Matt_Lacey.max-2000x2000.png);"&gt;&lt;span class="h-u-visually-hidden"&gt;Matt Lacey&lt;/span&gt;&lt;/div&gt;
                
              

              
            &lt;/figure&gt;
          &lt;/li&gt;
        

      &lt;/ul&gt;

      &lt;div class="h-c-carousel__paginate glue-pagination-previous" data-glue-pagination-label="Previous" data-glue-pagination-update-model="false"&gt;
        &lt;div class="h-c-carousel__paginate-wrap"&gt;
          &lt;svg role="img" class="h-c-icon h-c-icon--keyboard-arrow-left"&gt;
            &lt;use xlink:href="#mi-keyboard-arrow-right"&gt;&lt;/use&gt;
          &lt;/svg&gt;
        &lt;/div&gt;
      &lt;/div&gt;

      &lt;div class="h-c-carousel__paginate glue-pagination-next" data-glue-pagination-label="Next" data-glue-pagination-update-model="false"&gt;
        &lt;div class="h-c-carousel__paginate-wrap"&gt;
          &lt;svg role="img" class="h-c-icon h-c-icon--keyboard-arrow-right"&gt;
            &lt;use xlink:href="#mi-keyboard-arrow-right"&gt;&lt;/use&gt;
          &lt;/svg&gt;
        &lt;/div&gt;
      &lt;/div&gt;
    &lt;/div&gt;

    &lt;div class="h-c-carousel__navigation"&gt;
      &lt;div class="glue-pagination-page-list"&gt;&lt;/div&gt;
    &lt;/div&gt;

  &lt;/div&gt;
&lt;/div&gt;

&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;h2&gt;&lt;span style="vertical-align: baseline;"&gt;Trillium&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt; and AI Hypercomputer &lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Trillium TPUs are a part of Google Cloud's &lt;/span&gt;&lt;a href="https://cloud.google.com/solutions/ai-hypercomputer"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;AI Hypercomputer&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, a groundbreaking supercomputing architecture designed specifically for cutting-edge AI workloads. It integrates performance-optimized infrastructure (including Trillium TPUs), open-source software frameworks, and flexible consumption models. Our commitment to open-source libraries like JAX, PyTorch/XLA, and Keras 3 empowers developers. Support for JAX and XLA means that declarative model description written for any previous generation of TPUs maps directly to the new hardware and network capabilities of Trillium TPUs. We've also partnered with Hugging Face on Optimum-TPU for streamlined model training and serving.&lt;/span&gt;&lt;/p&gt;
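&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;To illustrate what a hardware-portable, declarative model description looks like in practice, here is a minimal JAX example (our own toy function, not a Google-provided sample); XLA compiles the same code for whichever backend is attached, whether CPU, GPU, or any TPU generation.&lt;/span&gt;&lt;/p&gt;
&lt;pre&gt;
import jax
import jax.numpy as jnp

@jax.jit  # XLA compiles this declarative description for the attached backend.
def predict(params, x):
    hidden = jnp.tanh(x @ params["w1"] + params["b1"])
    return hidden @ params["w2"] + params["b2"]

key = jax.random.PRNGKey(0)
params = {
    "w1": jax.random.normal(key, (8, 16)),
    "b1": jnp.zeros(16),
    "w2": jax.random.normal(key, (16, 4)),
    "b2": jnp.zeros(4),
}
x = jnp.ones((2, 8))
print(predict(params, x).shape)  # (2, 4) on CPU, GPU, or TPU alike
&lt;/pre&gt;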
&lt;p style="padding-left: 40px;"&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;“Our partnership with Google Cloud makes it easier for Hugging Face users to fine-tune and run open models on Google Cloud’s AI infrastructure, including TPUs. We are excited to further accelerate open source AI with the upcoming &lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;sixth-generation&lt;/span&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt; Trillium TPUs, and we expect open models to continue to deliver optimal performance thanks to the 4.7X increase in performance per chip compared to the previous generation. We will make the performance of Trillium easily available to all AI builders through our new Optimum-TPU library!" &lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;- Jeff Boudier, Head of Product, Hugging Face&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://sada.com/" rel="noopener" target="_blank"&gt;SADA&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; (An Insight Company)&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt; has been Partner of the Year each year since 2017 and delivers Google Cloud Services for maximum impact. &lt;/span&gt;&lt;/p&gt;
&lt;p style="padding-left: 40px;"&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;As a proud Google Cloud Premier Partner, SADA has a 20-year long history with the world’s established AI pioneer. We are rapidly integrating AI for thousands of diverse customers. With our depth of experience and the AI Hypercomputer architecture, we can't wait to help our customers unlock the value of this next frontier of generative AI models with Trillium. -&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt; Miles Ward, CTO, SADA&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;AI Hypercomputer also offers the flexible consumption models required for AI/ML workloads. Dynamic Workload Scheduler (DWS) makes it easier to access AI/ML resources and helps customers optimize their spend. Flex start mode can improve the experience of bursty workloads such as training, fine-tuning, or batch jobs, by scheduling all the accelerators needed simultaneously, regardless of your entry point: Vertex AI Training, Google Kubernetes Engine (GKE) or Google Cloud &lt;span style="vertical-align: baseline;"&gt;Compute &lt;/span&gt;Engine.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Lightricks is excited about the performance increases and efficiency gains it is seeing from AI Hypercomputer. &lt;/span&gt;&lt;/p&gt;
&lt;p style="padding-left: 40px;"&gt;&lt;span style="vertical-align: baseline;"&gt;“&lt;/span&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;We’ve been using TPUs for our text-to-image and text-to-video models since Cloud TPU v4. With TPU v5p and AI Hypercomputer efficiencies, we achieved a whopping 2.5X increase in training speed! The 6th generation of Trillium TPUs are incredible with a 4.7X increased compute performance per chip and 2X HBM Capacity and Bandwidth improvement over the previous generation. This came just in time for us as we scale our text-to-video models. We’re also looking forward to using Dynamic Workload Scheduler’s flex start mode to manage our batch inference jobs and to manage our future TPU reservations.” -&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt; Yoav HaCohen, PhD, Core Generative AI Research Team Lead, Lightricks&lt;/span&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt; &lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;span style="vertical-align: baseline;"&gt;Learn more about Google Cloud Trillium TPUs &lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Google Cloud TPUs are the cutting-edge of AI acceleration, custom-designed and optimized to empower large-scale artificial intelligence models. Exclusively available through Google Cloud, TPUs deliver unparalleled performance and cost-efficiency for training and serving AI solutions. Whether it's the complex intricacies of large language models or the creative potential of image generation, TPUs help enable developers and researchers to push the boundaries of what's possible in the world of artificial intelligence. &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;The sixth-generation Trillium TPUs are a culmination of over a decade of research and innovation and will be available later this year. To learn more about &lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;Trillium&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt; TPUs and AI Hypercomputer, &lt;/span&gt;&lt;a href="https://inthecloud.withgoogle.com/content-promotion-ai-infra-contact/dl-cd.html" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;please complete this form&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; and our sales team will be in touch.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;</description><pubDate>Tue, 14 May 2024 18:05:00 +0000</pubDate><guid>https://cloud.google.com/blog/products/compute/introducing-trillium-6th-gen-tpus/</guid><category>AI &amp; Machine Learning</category><category>Systems</category><category>Google I/O</category><category>Compute</category><media:content height="540" url="https://storage.googleapis.com/gweb-cloudblog-publish/images/Most-Advanced-TPU_1.max-600x600.png" width="540"></media:content><og xmlns:og="http://ogp.me/ns#"><type>article</type><title>Announcing Trillium, the sixth generation of Google Cloud TPU</title><description></description><image>https://storage.googleapis.com/gweb-cloudblog-publish/images/Most-Advanced-TPU_1.max-600x600.png</image><site_name>Google</site_name><url>https://cloud.google.com/blog/products/compute/introducing-trillium-6th-gen-tpus/</url></og><author xmlns:author="http://www.w3.org/2005/Atom"><name>Amin Vahdat</name><title>SVP and Chief Technologist, AI and Infrastructure</title><department></department><company></company></author></item><item><title>Caliptra: Building trust, one chip at a time</title><link>https://cloud.google.com/blog/topics/systems/google-security-innovation-at-the-ocp-regional-summit/</link><description>&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;At Google, we build &lt;/span&gt;&lt;a href="https://cloud.google.com/blog/topics/systems/google-systems-innovations-at-ocp-global-summit"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;sustainable, secure, and scalable hardware and software&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; to enable services that support billions of users. We have &lt;/span&gt;&lt;a href="https://cloud.google.com/blog/topics/systems/announcing-open-innovations-for-a-new-era-of-systems-design"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;embraced open innovation&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; as a core tenet to deliver these experiences. Our society’s AI-driven future includes many types of system-on-chips (SoCs) acting in concert with each other — from CPUs to GPUs to TPUs to NICs to SSDs and more. To deliver secure solutions at scale, there must be trust and transparency for the firmware that runs on all of these chips.&lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Welcoming Caliptra 1.0 &lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Google &lt;/span&gt;&lt;a href="https://cloud.google.com/blog/topics/systems/announcing-open-innovations-for-a-new-era-of-systems-design?e=48754805"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;partnered&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; with AMD, Microsoft, and NVIDIA to develop &lt;/span&gt;&lt;a href="https://www.opencompute.org/blog/cloud-security-integrating-trust-into-every-chip" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Caliptra&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, a standard at the &lt;/span&gt;&lt;a href="http://www.opencompute.org" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Open Compute Project&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; (OCP) to raise the bar on security for chips. Caliptra is a hardware root-of-trust (RoT) that provides verifiable cryptographic assurances to help ensure that only recognized and trusted firmware is allowed to run production workloads. &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Caliptra’s initial focus is on hardware implementations used in confidential computing, and, over time, will extend to all chips. To address the increasingly sophisticated nature of cyberattacks, the team went beyond a written specification to deliver an open-source implementation at the &lt;/span&gt;&lt;a href="https://www.chipsalliance.org/" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;CHIPS Alliance&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;. The result is a silicon-level intellectual property (IP) block for integration into future chips, including CPUs, GPUs, and SSDs. The Caliptra source code also covers the block’s ROM and firmware.&lt;/span&gt;&lt;/p&gt;
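&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;To make the idea of gating boot on recognized firmware concrete, here is a minimal, illustrative Python sketch. It is not Caliptra’s ROM or firmware logic; the helper names and the hard-coded allowlist are assumptions chosen only to show the measure-then-verify pattern, and a real silicon root of trust anchors trust in hardware and verifies cryptographic signatures rather than comparing against a fixed list.&lt;/span&gt;&lt;/p&gt;
&lt;pre&gt;
import hashlib

def measure(firmware_image):
    """Hash the firmware image to produce its measurement (SHA-384 digest)."""
    return hashlib.sha384(firmware_image).hexdigest()

# Illustrative allowlist of approved firmware measurements. This stands in for
# the verifiable cryptographic checks a hardware root of trust performs.
TRUSTED_IMAGE = b"known-good firmware payload"
APPROVED_MEASUREMENTS = {measure(TRUSTED_IMAGE)}

def allow_boot(firmware_image):
    """Permit the SoC to run this firmware only if its measurement is recognized."""
    return measure(firmware_image) in APPROVED_MEASUREMENTS

print(allow_boot(TRUSTED_IMAGE))         # True: recognized firmware may run
print(allow_boot(b"tampered firmware"))  # False: unrecognized firmware is rejected
&lt;/pre&gt;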
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;We are pleased to announce that the Caliptra specification and open-source hardware and software implementation is complete, reaching the revision 1.0 milestone. The Caliptra community continues to grow and now includes 9elements, AMI, Antmicro, &lt;span style="vertical-align: baseline;"&gt;ASPEED&lt;/span&gt;, Axiado, Lubis EDA, ScaleFlux, Marvell and Nuvoton, who together have significant domain expertise across SoC design automation, firmware, and verification.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;The Caliptra IP block is currently being integrated by companies across the ecosystem into chips that will start to appear in the market in 2026. In less than two years, we have gone from project inception to a complete specification and open-source implementation of the hardware and software.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-image_full_width"&gt;






  
    &lt;div class="article-module h-c-page"&gt;
      &lt;div class="h-c-grid"&gt;
  

    &lt;figure class="article-image--large
      
      
        h-c-grid__col
        h-c-grid__col--6 h-c-grid__col--offset-3
        
        
      "
      &gt;

      
      
        
        &lt;img
            src="https://storage.googleapis.com/gweb-cloudblog-publish/images/image1_VMCd7c5.max-1000x1000.png"
        
          alt="image1"&gt;
        
      
    &lt;/figure&gt;

  
      &lt;/div&gt;
    &lt;/div&gt;
  




&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;The team is already working on the next iteration with Caliptra 2.0, which will tackle quantum cryptography to comply with NIST’s recommendations for &lt;/span&gt;&lt;a href="https://csrc.nist.gov/pubs/fips/204/ipd" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;module-lattice-based digital signatures&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; and &lt;/span&gt;&lt;a href="https://csrc.nist.gov/pubs/sp/800/208/final" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;stateful hash-based signature schemes&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;. Download the &lt;/span&gt;&lt;a href="https://www.opencompute.org/documents/ocp-caliptra-1-0-20240418-noheaders-pdf" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Caliptra 1.0 specification&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; and access the open source repositories at &lt;/span&gt;&lt;a href="http://caliptra.io" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;caliptra.io&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;.&lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;OCP S.A.F.E.&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Google, Microsoft, and OCP are also engaged in a complementary effort to raise the bar on security assessments: &lt;/span&gt;&lt;a href="https://drive.google.com/file/d/1Yyt8jGCbLf6nARPXTWGaFWhZOGqBm6Vn/view?usp=drive_link" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;OCP Security Appraisal Framework for Enablement&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; (OCP S.A.F.E.). This program provides security conformance assurance to consumers of devices such as SSDs. The program has certified a list of approved OCP Security Review Providers (SRPs) who conduct security conformance reviews that verify the provenance, code quality, and software supply chain of firmware releases and patches for devices, while protecting the intellectual property of the device vendors. You can learn more about OCP’s &lt;/span&gt;&lt;a href="https://www.opencompute.org/projects/ocp-safe-program" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;S.A.F.E. program here&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;. &lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;What’s to come&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Already, Caliptra has emerged as a high-quality specification and implementation that addresses a complex security problem. And we’re following up on it with a new initiative called &lt;/span&gt;&lt;a href="https://drive.google.com/file/d/1_WWIyTUzWtub_IdePcfOvwjAXKbOXGd2/view?usp=drive_link" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;OCP Layered Open-source Cryptographic Key-management&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; (OCP L.O.C.K.). Established by Google, Microsoft, Samsung, Solidigm, and KIOXIA, OCP L.O.C.K. defines and implements a standard NVM Express (NVMe) key-management block to protect customer data even if a physical drive is stolen from a data center. &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;It’s energizing to unite with industry leaders to deliver technology that will make society’s infrastructure more trustworthy and secure, using open source as a mechanism to help the hardware, firmware, and software achieve the standard’s objectives in a transparent and auditable manner. You can learn more about Caliptra, OCP S.A.F.E., and OCP L.O.C.K. at the &lt;/span&gt;&lt;a href="https://2024ocpregional.fnvirtual.app/a/schedule/" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;OCP Regional Summit&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; this week in Lisbon, Portugal. We are looking forward to discussing these technologies and inventing the future together.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;</description><pubDate>Tue, 23 Apr 2024 20:00:00 +0000</pubDate><guid>https://cloud.google.com/blog/topics/systems/google-security-innovation-at-the-ocp-regional-summit/</guid><category>Security &amp; Identity</category><category>Systems</category><og xmlns:og="http://ogp.me/ns#"><type>article</type><title>Caliptra: Building trust, one chip at a time</title><description></description><site_name>Google</site_name><url>https://cloud.google.com/blog/topics/systems/google-security-innovation-at-the-ocp-regional-summit/</url></og><author xmlns:author="http://www.w3.org/2005/Atom"><name>Andrés Lagar-Cavilla</name><title>Distinguished Engineer, Google</title><department></department><company></company></author><author xmlns:author="http://www.w3.org/2005/Atom"><name>Amber Huffman</name><title>Principal Engineer, Google</title><department></department><company></company></author></item><item><title>What’s new with Google Cloud’s AI Hypercomputer architecture</title><link>https://cloud.google.com/blog/products/compute/whats-new-with-google-clouds-ai-hypercomputer-architecture/</link><description>&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Advancements in AI are unlocking use-cases previously thought impossible. Larger and more complex AI models are enabling powerful capabilities across a full range of applications involving text, code, images, videos, voice, music, and more. As a result, leveraging AI has become an innovation imperative for businesses and organizations around the world, with the potential to boost human potential and productivity.  &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;However, the AI workloads powering these exciting use-cases place incredible demands on the underlying&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt; compute, networking, and storage infrastructure&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;. And that’s only one aspect of the architecture: customers also face the challenge of integrating open-source software, frameworks, and data platforms, while optimizing for resource consumption to harness the power of AI cost-effectively. Historically, this has required manually combining component-level enhancements, which can lead to inefficiencies and bottlenecks. &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;That’s why today we’re pleased to announce significant enhancements at every layer of our &lt;/span&gt;&lt;a href="https://cloud.google.com/solutions/ai-hypercomputer"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;AI Hypercomputer&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; architecture. This systems-level approach combines performance-optimized hardware, open software and frameworks, and flexible consumption models to enable developers and businesses to be more productive, because the overall system runs with higher performance and effectiveness, and the models generated are served more efficiently. &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;In fact, just last month, Forrester Research recognized Google as a Leader in &lt;/span&gt;&lt;a href="https://inthecloud.withgoogle.com/forrester-2024-ai-infra-wave/dl-cd.html" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;The Forrester Wave™: AI Infrastructure Solutions&lt;/span&gt;&lt;/a&gt;&lt;sup&gt;&lt;span style="vertical-align: baseline;"&gt;&lt;span style="vertical-align: super;"&gt;1&lt;/span&gt;&lt;/span&gt;&lt;/sup&gt;&lt;span style="vertical-align: baseline;"&gt;, Q1 2024, with the highest scores of any vendor evaluated in both the Current Offering and Strategy categories in this report.  &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;The announcements we’re making today span every layer of the AI Hypercomputer architecture:&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Performance-optimized hardware enhancements &lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;including the general availability of Cloud TPU v5p, and A3 Mega VMs powered by NVIDIA H100 Tensor Core GPUs, with higher performance for large-scale training with enhanced networking capabilities&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Storage portfolio optimizations for AI workloads &lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;including Hyperdisk ML, a new block storage service optimized for AI inference/serving workloads, and new caching capabilities in Cloud Storage FUSE and Parallelstore, which improve training and inferencing throughput and latency &lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Open software advancements&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; including the introduction of JetStream — a throughput- and memory-optimized inference engine for large language models (LLMs) that offers higher performance per dollar on open models like Gemma 7B, and JAX and PyTorch/XLA releases that improve performance on both Cloud TPUs and NVIDIA GPUs&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;New flexible consumption options with Dynamic Workload Scheduler, &lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;including&lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt; &lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;calendar mode for start time assurance, and flex start mode for optimized economics&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Learn more about AI Hypercomputer with a rare look inside one of our data centers:&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-video"&gt;



&lt;div class="article-module article-video "&gt;
  &lt;figure&gt;
    &lt;a class="h-c-video h-c-video--marquee"
      href="https://youtube.com/watch?v=XIyTs2Rr0sE"
      data-glue-modal-trigger="uni-modal-XIyTs2Rr0sE-"
      data-glue-modal-disabled-on-mobile="true"&gt;

      
        

        &lt;div class="article-video__aspect-image"
          style="background-image: url(https://storage.googleapis.com/gweb-cloudblog-publish/images/240404_Ai-Infra_Thumb_v1.max-1000x1000.png);"&gt;
          &lt;span class="h-u-visually-hidden"&gt;Learn more about AI Hypercomputer with a rare look inside one of our data centers&lt;/span&gt;
        &lt;/div&gt;
      
      &lt;svg role="img" class="h-c-video__play h-c-icon h-c-icon--color-white"&gt;
        &lt;use xlink:href="#mi-youtube-icon"&gt;&lt;/use&gt;
      &lt;/svg&gt;
    &lt;/a&gt;

    
  &lt;/figure&gt;
&lt;/div&gt;

&lt;div class="h-c-modal--video"
     data-glue-modal="uni-modal-XIyTs2Rr0sE-"
     data-glue-modal-close-label="Close Dialog"&gt;
   &lt;a class="glue-yt-video"
      data-glue-yt-video-autoplay="true"
      data-glue-yt-video-height="99%"
      data-glue-yt-video-vid="XIyTs2Rr0sE"
      data-glue-yt-video-width="100%"
      href="https://youtube.com/watch?v=XIyTs2Rr0sE"
      ng-cloak&gt;
   &lt;/a&gt;
&lt;/div&gt;

&lt;/div&gt;
&lt;div class="block-image_full_width"&gt;






  
    &lt;div class="article-module h-c-page"&gt;
      &lt;div class="h-c-grid"&gt;
  

    &lt;figure class="article-image--large
      
      
        h-c-grid__col
        h-c-grid__col--6 h-c-grid__col--offset-3
        
        
      "
      &gt;

      
      
        
        &lt;img
            src="https://storage.googleapis.com/gweb-cloudblog-publish/images/AI-Hypercomputer-Architecture.max-1000x1000.jpg"
        
          alt="Screenshot 2024-04-05 at 6.26.35 PM"&gt;
        
      
    &lt;/figure&gt;

  
      &lt;/div&gt;
    &lt;/div&gt;
  




&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Advances in performance-optimized hardware&lt;/strong&gt;&lt;strong style="vertical-align: baseline;"&gt; &lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Cloud TPU v5p GA&lt;br/&gt;&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;We’re thrilled to announce the general availability of Cloud TPU v5p, our most powerful and scalable TPU to date. TPU v5p is a next-generation accelerator that is purpose-built to train some of the largest and most demanding generative AI models. A single TPU v5p pod contains 8,960 chips that run in unison — over 2x the chips in a TPU v4 pod. Beyond the larger scale, TPU v5p also delivers over 2x higher FLOPS and 3x more high-bandwidth memory on a per chip basis.&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt; &lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;It also delivers near-linear improvement in throughput as customers use larger slices, achieving 11.97X throughput for a 12x increase in slice size (from 512 to 6144 chips).&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt; &lt;/span&gt;&lt;/p&gt;
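&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;As a quick back-of-the-envelope check on the near-linear scaling claim, the short Python sketch below turns the figures quoted above into a scaling-efficiency number; the numbers come from this post, and the small helper function is illustrative.&lt;/span&gt;&lt;/p&gt;
&lt;pre&gt;
# TPU v5p figures quoted above: growing a slice from 512 to 6,144 chips (12x)
# delivers 11.97x the throughput.
def scaling_efficiency(speedup, scale_factor):
    """Fraction of ideal linear scaling actually achieved."""
    return speedup / scale_factor

chips_small, chips_large = 512, 6144
speedup = 11.97
efficiency = scaling_efficiency(speedup, chips_large / chips_small)
print(f"{chips_large // chips_small}x more chips, {efficiency:.1%} of ideal linear scaling")
# Prints roughly 99.8% of ideal linear scaling.
&lt;/pre&gt;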
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Comprehensive GKE support for TPU v5p&lt;br/&gt;&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;To enable training and serving the largest AI models on GKE across large-scale TPU clusters, today we’re also announcing the general availability of both Google Kubernetes Engine (GKE) support for Cloud TPU v5p and TPU multi-host serving on GKE. TPU multi-host serving on GKE allows customers to manage a group of model servers deployed over multiple hosts as a single logical unit, so they can be managed and monitored centrally.&lt;/span&gt;&lt;/p&gt;
&lt;p style="padding-left: 40px;"&gt;&lt;span style="vertical-align: baseline;"&gt;“&lt;/span&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;By leveraging Google Cloud’s TPU v5p on Google Kubernetes Engine (GKE), Lightricks has achieved a remarkable 2.5X speed-up in training our text-to-image and text-to-video models compared to TPU v4. &lt;/span&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;GKE ensures that we are able to smoothly leverage TPU v5p for the specific training jobs that need the performance boost&lt;/span&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;.” &lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;- Yoav HaCohen, PhD, Core Generative AI Research Team Lead, Lightricks&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Expanded NVIDIA H100 GPU capabilities with A3 Mega GA and Confidential Compute&lt;br/&gt;&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;We’re also expanding our NVIDIA GPU capabilities with additions to the A3 VM family, which now includes A3 Mega. A3 Mega, powered by NVIDIA H100 GPUs, will be generally available next month and offers double the GPU-to-GPU networking bandwidth of A3. Confidential Computing will also be coming to the A3 VM family, in preview later this year. Enabling confidential VMs on the A3 machine series protects the confidentiality and integrity of sensitive data and AI workloads and mitigates threats from unauthorized access. Enabling Confidential Computing on the A3 VM family encrypts the data transfers between the Intel TDX-enabled CPU and NVIDIA H100 GPU via protected PCIe, and requires no code changes.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Bringing NVIDIA Blackwell GPUs to Google Cloud&lt;br/&gt;&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;We also recently announced that we will be bringing NVIDIA’s newest Blackwell platform to our AI Hypercomputer architecture in two configurations. Google Cloud customers will have access to VMs powered by both the NVIDIA HGX B200 and GB200 NVL72 GPUs. The new VMs with HGX B200 GPUs &lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;are designed for the most demanding AI, data analytics, and HPC workloads&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;, while the upcoming VMs powered by the liquid-cooled GB200 NVL72 GPU will enable &lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;a new era of computing with real-time LLM inference and massive-scale training performance for trillion-parameter scale models.&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt; &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Customers leveraging both Google Cloud TPU and GPU-based services&lt;br/&gt;&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;Character.AI is a powerful, direct-to-consumer AI computing platform where users can easily create and interact with a variety of characters. Character.AI is using Google Cloud’s AI Hypercomputer architecture across GPU- and TPU-based infrastructure to meet the needs of its rapidly growing community. &lt;/span&gt;&lt;/p&gt;
&lt;p style="padding-left: 40px;"&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;“Character.AI is using Google Cloud's Tensor Processor Units (TPUs) and A3 VMs running on NVIDIA H100 Tensor Core GPUs to train and infer LLMs faster and more efficiently. The optionality of GPUs and TPUs running on the powerful AI-first infrastructure makes Google Cloud our obvious choice as we scale to deliver new features and capabilities to millions of users. It’s exciting to see the innovation of next-generation accelerators in the overall AI landscape, including Google Cloud TPU v5e and A3 VMs with H100 GPUs. We expect both of these platforms to offer more than 2X more cost-efficient performance than their respective previous generations.”&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt; - &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;Noam Shazeer&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;, CEO, Character AI&lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Storage optimized for AI/ML workloads&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;To improve AI training, fine-tuning, and inference performance, we've added a number of enhancements to our storage products, including caching, which keeps the data closer to your compute instances, so you can train much faster. Each of these improvements also maximizes GPU and TPU utilization, leading to higher energy efficiency and cost optimization. &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://cloud.google.com/storage/docs/gcs-fuse"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Cloud Storage FUSE&lt;/span&gt;&lt;/a&gt;&lt;strong style="vertical-align: baseline;"&gt; &lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;(generally available) is a file-based interface for Google Cloud Storage that harnesses Cloud Storage capabilities for more complex AI/ML apps by providing file access to our high-performance, low-cost cloud storage solutions. Today we announced that new caching capabilities are generally available. Cloud Storage FUSE caching improves training throughput by 2.9X and improves serving performance for one of our own foundation models by 2.2X.&lt;/span&gt;&lt;/p&gt;
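&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Because Cloud Storage FUSE exposes a bucket as an ordinary directory, training and serving code can read objects with standard file I/O. The Python sketch below only illustrates that file-based access pattern; the mount point and shard name are hypothetical, and how the bucket is mounted and how its cache is configured are outside the scope of this snippet.&lt;/span&gt;&lt;/p&gt;
&lt;pre&gt;
import os

# Hypothetical path where a Cloud Storage bucket has been mounted with
# Cloud Storage FUSE; the mount point and shard name are illustrative only.
MOUNT_POINT = "/mnt/gcs-bucket"

def read_shard(relative_name):
    """Read one training shard through the FUSE mount using plain file I/O."""
    path = os.path.join(MOUNT_POINT, relative_name)
    with open(path, "rb") as f:
        return f.read()

if __name__ == "__main__":
    data = read_shard("training-data/shard-00000")
    print("read", len(data), "bytes")
&lt;/pre&gt;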
&lt;p&gt;&lt;a href="https://cloud.google.com/parallelstore?e=48754805&amp;amp;hl=en"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Parallelstore&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; now also includes caching (in preview). Parallelstore is a high-performance parallel filesystem optimized for AI/ML and HPC workloads. New caching capabilities enable up to 3.9X faster training times and up to 3.7X higher training throughput, compared to native ML framework data loaders&lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;.&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://cloud.google.com/filestore?e=48754805&amp;amp;hl=en"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Filestore&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; (generally available) is optimized for AI/ML models that require low latency, file-based data access. The network file system-based approach allows all GPUs and TPUs within a cluster to simultaneously access the same data, which&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt; improves training times by up to 56%, &lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;optimizing the performance of your AI workloads and boosting your most demanding AI projects.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;We’re also pleased to introduce Hyperdisk ML in preview&lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;, &lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;our next-generation block storage service optimized for AI inference/serving workloads. Hyperdisk ML &lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;accelerates model load times up to 12X compared to common alternatives, &lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;and offers cost efficiency through read-only, multi-attach, and thin provisioning. It enables up to 2,500 instances to access the same volume and delivers up to 1.2 TiB/s of aggregate throughput per volume — &lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;over 100X greater performance than Microsoft Azure Ultra SSD&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt; and Amazon EBS io2 BlockExpress.&lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Advancements in our open software&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Starting from frameworks and spanning the full software stack, we’re introducing open-source enhancements that enable customers to improve time-to-value for AI workloads by simplifying the developer experience while improving performance and cost efficiency.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;JAX and high-performance reference implementations&lt;br/&gt;&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;We’re pleased to introduce &lt;/span&gt;&lt;a href="https://github.com/google/maxdiffusion" rel="noopener" target="_blank"&gt;&lt;strong style="text-decoration: underline; vertical-align: baseline;"&gt;MaxDiffusion&lt;/strong&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, a new high-performance and scalable reference implementation for diffusion models. We’re also introducing new LLM models in &lt;/span&gt;&lt;a href="https://github.com/google/maxtext" rel="noopener" target="_blank"&gt;&lt;strong style="text-decoration: underline; vertical-align: baseline;"&gt;MaxText&lt;/strong&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, including&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt; Gemma, GPT3, LLAMA2 and Mistral&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt; across both Cloud TPUs and NVIDIA GPUs. Customers can jump-start their AI model development with these open-source implementations and then customize them further based on their needs. &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;MaxText and MaxDiffusion models are built on JAX, a cutting-edge framework for high-performance numerical computing and large-scale machine learning. JAX in turn is integrated with the OpenXLA compiler, which optimizes numerical functions and delivers excellent performance at scale, allowing model builders to focus on the math and let the software drive the most effective implementation. We’ve heavily optimized JAX and OpenXLA performance on Cloud TPU and also partnered closely with NVIDIA to optimize OpenXLA performance on large Cloud GPU clusters.&lt;/span&gt;&lt;/p&gt;
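&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;As a small illustration of that division of labor, the JAX sketch below (not taken from MaxText or MaxDiffusion) writes a toy model as plain numerical code and hands it to the XLA compiler with jax.jit; the network shape and inputs are arbitrary placeholders.&lt;/span&gt;&lt;/p&gt;
&lt;pre&gt;
import jax
import jax.numpy as jnp

@jax.jit  # XLA compiles and optimizes this function for the available backend (CPU, GPU, or TPU)
def predict(params, x):
    """A toy two-layer network: the author writes the math, XLA chooses the implementation."""
    w1, b1, w2, b2 = params
    hidden = jax.nn.relu(x @ w1 + b1)
    return hidden @ w2 + b2

key = jax.random.PRNGKey(0)
k1, k2 = jax.random.split(key)
params = (
    jax.random.normal(k1, (128, 256)), jnp.zeros(256),
    jax.random.normal(k2, (256, 10)), jnp.zeros(10),
)
x = jnp.ones((32, 128))
print(predict(params, x).shape)  # (32, 10)
&lt;/pre&gt;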
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Advancing PyTorch support&lt;br/&gt;&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;As part of our commitment to PyTorch,&lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt; &lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;support for PyTorch/XLA 2.3 will follow the upstream release later this month.&lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt; &lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;PyTorch/XLA enables tens of thousands of PyTorch developers to get the best performance from XLA devices such as TPUs and GPUs, without having to learn a new framework. The new release brings features such as single program, multiple data (SPMD) auto-sharding, and asynchronous distributed checkpointing, making running a distributed training job much easier and more scalable.&lt;/span&gt;&lt;/p&gt;
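&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;For PyTorch users the pattern is similar: existing training code largely carries over, with the XLA device substituted in. The fragment below is a minimal, single-device sketch using the torch_xla package; the model, data, and hyperparameters are placeholders, and features such as SPMD auto-sharding and distributed checkpointing require additional setup not shown here.&lt;/span&gt;&lt;/p&gt;
&lt;pre&gt;
import torch
import torch_xla.core.xla_model as xm

device = xm.xla_device()  # a TPU (or other XLA device) behaves like any other torch device

# Placeholder model, optimizer, and synthetic data, purely for illustration.
model = torch.nn.Linear(128, 10).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = torch.nn.CrossEntropyLoss()

for step in range(3):
    x = torch.randn(32, 128, device=device)
    y = torch.randint(0, 10, (32,), device=device)
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    optimizer.step()
    xm.mark_step()  # materializes the lazily built XLA graph for this step
&lt;/pre&gt;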
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;And for PyTorch users in the Hugging Face community, we worked with Hugging Face to launch &lt;/span&gt;&lt;a href="https://huggingface.co/docs/optimum-tpu/" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Optimum-TPU&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, a performance-optimized package that will help developers easily train and serve Hugging Face models on TPUs. &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;JetStream: New LLM inference engine&lt;br/&gt;&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;We’re introducing &lt;/span&gt;&lt;a href="https://github.com/google/JetStream" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;JetStream&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, an open-source, throughput- and memory-optimized LLM inference engine for XLA devices, starting with TPUs, that offers up to 3x higher inferences per dollar on Gemma 7B and other open models. As customers bring their AI workloads to production, there’s an increasing demand for a cost-efficient inference stack that delivers high performance. JetStream supports models trained with both JAX and PyTorch/XLA, and includes optimizations for popular open models such as Llama 2 and Gemma. &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Open community models in collaboration with NVIDIA&lt;br/&gt;&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;Additionally, as part of the NVIDIA and Google collaboration with open community models, Google models will be available as NVIDIA NIM inference microservices to provide developers with an open, flexible platform to train and deploy using their preferred tools and frameworks.&lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;New Dynamic Workload Scheduler modes&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;a href="https://cloud.google.com/blog/products/compute/introducing-dynamic-workload-scheduler"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Dynamic Workload Scheduler&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; is a resource management and job scheduling service that’s designed for AI workloads. Dynamic Workload Scheduler improves access to AI computing capacity and helps you optimize your spend for AI workloads by scheduling all the accelerators needed simultaneously, and for a guaranteed duration. Dynamic Workload Scheduler offers two modes: flex start mode (in preview) for enhanced obtainability with optimized economics, and calendar mode (in preview) for predictable job start times and durations.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Flex start jobs are queued to run as soon as possible, based on resource availability, making it easier to obtain TPU and GPU resources for jobs that have a flexible start time. Flex start mode is now integrated across Compute Engine &lt;/span&gt;&lt;a href="https://cloud.google.com/compute/docs/instance-groups/create-resize-requests-mig"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Managed Instance Groups&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, &lt;/span&gt;&lt;a href="https://cloud.google.com/batch/docs/get-started"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Batch&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, and &lt;/span&gt;&lt;a href="https://cloud.google.com/vertex-ai/docs/training-overview"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Vertex AI Custom Training&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, in addition to &lt;/span&gt;&lt;a href="https://cloud.google.com/kubernetes-engine/docs/how-to/provisioningrequest"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Google Kubernetes Engine&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; (GKE). With flex start, you can now run thousands of AI/ML jobs with increased obtainability across the various TPU and GPU types offered in Google Cloud.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Calendar mode offers short-term reserved access to AI-optimized computing capacity. You can reserve collocated GPUs for up to 14 days, purchasable up to 8 weeks in advance. This new mode extends Compute Engine &lt;/span&gt;&lt;a href="https://cloud.google.com/blog/products/compute/using-compute-engine-future-reservations-for-capacity-planning"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;future reservation capabilities&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;. Your reservations are confirmed based on availability, and the capacity is delivered to your project on your requested start date. You can then simply create VMs targeting the capacity block for the entire duration of the reservation.&lt;/span&gt;&lt;/p&gt;
&lt;p style="padding-left: 40px;"&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;“Dynamic Workload Scheduler improved on-demand GPU obtainability by 80%, accelerating experiment iteration for our researchers. Leveraging the built-in Kueue and GKE integration, we were able to take advantage of new GPU capacity in Dynamic Workload Scheduler quickly and save months of development work.”&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt; - Alex Hays, Software Engineer, Two Sigma&lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;AI anywhere with Google Distributed Cloud&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;The acceleration of AI adoption by enterprises has highlighted the need for flexible deployment options to process or securely analyze data closer to where it is generated. &lt;/span&gt;&lt;a href="https://cloud.google.com/distributed-cloud"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Google Distributed Cloud&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; (GDC) brings the power of Google's cloud services wherever you need them — in your own data center or at the edge. Today we introduced several enhancements to GDC, including a generative AI search package solution powered by &lt;/span&gt;&lt;a href="https://ai.google.dev/gemma" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Gemma&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, an expanded ecosystem of partner solutions, new compliance certifications and more. Learn more about &lt;/span&gt;&lt;a href="https://cloud.google.com/blog/products/infrastructure-modernization/unlock-ai-anywhere-with-google-distributed-cloud"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;how to use GDC to run AI anywhere&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;. &lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Our growing momentum with Google AI infrastructure&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;At Next this week we’re launching incredible AI innovation across everything from AI platforms and models to AI assistance with Gemini for Google Cloud — all underpinned by a foundation of AI-optimized infrastructure. All of this innovation is driving incredible momentum for our customers. In fact, nearly 90% of generative AI unicorns and more than 60% of funded gen AI startups are Google Cloud customers. &lt;/span&gt;&lt;/p&gt;
&lt;p style="padding-left: 40px;"&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;“Runway’s text-to-video platform is powered by AI Hypercomputer. At the base, A3 VMs, powered by NVIDIA H100 GPUs gave our training a significant performance boost over A2 VMs, enabling large-scale training and inference for our Gen-2 model. Using GKE to orchestrate our training jobs enables us to scale to thousands of H100s in a single fabric to meet our customers’ growing demand.” &lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;- &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;Anastasis Germanidis, CTO and Co-Founder, Runway&lt;/strong&gt;&lt;/p&gt;
&lt;p style="padding-left: 40px;"&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;"By moving to Google Cloud and leveraging the AI Hypercomputer architecture with G2 VMs powered by NVIDIA L4 GPUs and Triton Inference Server, we saw a significant boost in our model inference performance while lowering our hosting costs by 15% using novel techniques enabled by the flexibility that Google Cloud offers.” &lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;-&lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;Ashwin Kannan, Sr. Staff Machine Learning Engineer, Palo Alto Networks&lt;/strong&gt;&lt;/p&gt;
&lt;p style="padding-left: 40px;"&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;"Writer's platform is powered by Google Cloud A3 and G2 VMs powered by NVIDIA H100 and L4 GPUs. With GKE we're able to efficiently train and inference over 17 large language models (LLMs) that scale up to over 70B parameters. We leverage Nvidia NeMo Framework to build our industrial strength models which generate 990,000 words a second with over a trillion API calls per month. We're delivering the highest quality inferencing models that exceed those from companies with larger teams and bigger budgets and all of that is possible with the Google and Nvidia partnership.” &lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;- &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;Waseem Alshikh Cofounder and CTO, Writer&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Learn more about &lt;/span&gt;&lt;a href="https://cloud.google.com/solutions/ai-hypercomputer"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;AI Hypercomputer&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; at the Next sessions below, and ask your &lt;/span&gt;&lt;a href="https://cloud.google.com/contact"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;sales representative&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; about how you can apply these capabilities within your own organization. &lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong style="vertical-align: baseline;"&gt;SPTL205&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; - &lt;/span&gt;&lt;a href="https://cloud.withgoogle.com/next/session-library?filters=session-type-spotlight&amp;amp;session=SPTL205#all" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Workload-optimized and AI-powered Infrastructure&lt;/span&gt;&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong style="vertical-align: baseline;"&gt;ARC108 &lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;- &lt;/span&gt;&lt;a href="https://cloud.withgoogle.com/next/session-library?filters=session-type-spotlight&amp;amp;session=ARC108#all" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Take large scale AI from research to production with Google Cloud's AI Hypercomputer&lt;/span&gt;&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong style="vertical-align: baseline;"&gt;IHLT303&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; - &lt;/span&gt;&lt;a href="https://cloud.withgoogle.com/next/session-library?filters=session-type-spotlight&amp;amp;session=IHLT303#all" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;How Lightricks is powering generative image models with Cloud TPUs and AI Hypercomputer&lt;/span&gt;&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;hr/&gt;
&lt;p&gt;&lt;sup&gt;&lt;em&gt;&lt;span style="vertical-align: baseline;"&gt;1. Forrester Research, The Forrester Wave™: AI Infrastructure Solutions, Q1 2024, Mike Gualtieri, Sudha Maheshwari, Sarah Morana, Jen Barton, March 17, 2024&lt;/span&gt;&lt;/em&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;p&gt;&lt;sup&gt;&lt;em&gt;&lt;span style="vertical-align: baseline;"&gt;The Forrester Wave™ is copyrighted by Forrester Research, Inc. Forrester and Forrester Wave™ are trademarks of Forrester Research, Inc. The Forrester Wave™ is a graphical representation of Forrester’s call on a market and is plotted using a detailed spreadsheet with exposed scores, weightings, and comments. Forrester does not endorse any vendor, product, or service depicted in the Forrester Wave™. Information is based on the best available resources. Opinions reflect judgment at the time and are subject to change. &lt;/span&gt;&lt;/em&gt;&lt;/sup&gt;&lt;/p&gt;&lt;/div&gt;</description><pubDate>Tue, 09 Apr 2024 12:00:00 +0000</pubDate><guid>https://cloud.google.com/blog/products/compute/whats-new-with-google-clouds-ai-hypercomputer-architecture/</guid><category>AI &amp; Machine Learning</category><category>Google Cloud Next</category><category>Systems</category><category>AI Hypercomputer</category><category>Compute</category><media:content height="540" url="https://storage.googleapis.com/gweb-cloudblog-publish/images/Next24_Blog_Images_6-02.max-600x600.jpg" width="540"></media:content><og xmlns:og="http://ogp.me/ns#"><type>article</type><title>What’s new with Google Cloud’s AI Hypercomputer architecture</title><description></description><image>https://storage.googleapis.com/gweb-cloudblog-publish/images/Next24_Blog_Images_6-02.max-600x600.jpg</image><site_name>Google</site_name><url>https://cloud.google.com/blog/products/compute/whats-new-with-google-clouds-ai-hypercomputer-architecture/</url></og><author xmlns:author="http://www.w3.org/2005/Atom"><name>Mark Lohmeyer</name><title>VP and GM, AI and Computing Infrastructure</title><department></department><company></company></author></item><item><title>Coming of age in the fifth epoch of distributed computing, accelerated by machine learning</title><link>https://cloud.google.com/blog/topics/systems/the-fifth-epoch-of-distributed-computing/</link><description>&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;strong style="font-style: italic; vertical-align: baseline;"&gt;Editor’s note:&lt;/strong&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt; Today, we hear from Google Fellow Amin Vahdat, who is the VP &amp;amp; GM for ML, Systems, and Cloud AI at Google. Amin originally delivered this as a &lt;/span&gt;&lt;a href="https://youtu.be/9lBbqH_1KS4?si=a5pIFwdhD86PglMp" rel="noopener" target="_blank"&gt;&lt;span style="font-style: italic; text-decoration: underline; vertical-align: baseline;"&gt;keynote in 2023 at the University of Washington for The Allen School's Distinguished Lecture Series&lt;/span&gt;&lt;/a&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;. This post captures Amin’s reflections on the history of distributed computing, where we are today, and what we can expect for the next generation of computing services.&lt;/span&gt;&lt;/p&gt;
&lt;hr/&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Over the past fifty years, computing and communication have transformed society with sustained exponential growth in capacity, efficiency, and capability. Over that time, we have, as a community, delivered a &lt;/span&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;50-million-fold&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt; increase in transistor count per CPU and grown the Internet from 4 nodes to &lt;/span&gt;&lt;a href="https://www.internetworldstats.com/stats.htm" rel="noopener" target="_blank"&gt;&lt;span style="font-style: italic; text-decoration: underline; vertical-align: baseline;"&gt;5.39 billion&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;While these advances are impressive, the human capabilities that result from these advances are even more compelling, sometimes bordering on what was previously the domain of science fiction. We now have near-instantaneous access to the evolving state of human knowledge, limited only by our ability to make sense of it. We can now perform real-time language translation, breaking down fundamental barriers to human communication. Commensurate improvements in sensing and network speeds are delivering real-time &lt;/span&gt;&lt;a href="https://blog.google/technology/research/project-starline/" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;holographic projections&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; that will begin to support meaningful interaction at a distance. This explosion in computing capability is also powering next-generation AI systems that are solving some of the hardest scientific and engineering challenges of our time, for example, &lt;/span&gt;&lt;a href="https://www.nature.com/articles/s41586-021-03819-2" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;predicting the 3D structure of a protein&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, almost instantly, down to atomic accuracy, or unlocking &lt;/span&gt;&lt;a href="https://deepmind.google/technologies/imagen-2/" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;advanced text-to-image diffusion technology&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, delivering high-quality, photorealistic outputs that are consistent with a user’s prompt.  &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Maintaining the pace of underlying technological progress has not been easy. Every 10-15 years, we encounter fundamental challenges that require foundational inventions and breakthroughs to sustain the exponential growth of the efficiency and scale of our infrastructure, which in turn power entirely new categories of services. It is as if every factor of a thousand exposes some new fundamental, progressively more challenging limit that must be overcome and creates &lt;/span&gt;&lt;a href="https://ieeexplore.ieee.org/document/4785818" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;some transformative opportunity&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;. We are in one of those watershed moments, a once-in-a-generation challenge and opportunity to maintain and accelerate the awe-inspiring rate of progress at a time when the underlying, seemingly insatiable demand for computing is only accelerating.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;A look back on the brief history of computing suggests that we have worked through four such major transitions, each defining an ‘epoch’ of computing. We offer a historical taxonomy that points to a manifest need to define and to drive a&lt;/span&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt; fifth epoch&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt; of computing, one that is data-centric, declarative, outcome-oriented, software-defined, and centered on &lt;/span&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;proactively&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt; bringing insights to people. While each previous epoch made the previously unimaginable routine, this fifth epoch will bring about the largest transformation thus far, promising to democratize access to knowledge and opportunity. But at the same time, it will require overcoming some of the most intrinsically difficult, and cross-stack challenges in computing.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;We begin our look back at &lt;/span&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;Epoch 0. &lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;Purists will correctly argue that we could look back thousands of years further, but we choose to start with some truly landmark and foundational developments in computer science that took place between 1947-1969, laying the basis for modern computing and communication.&lt;/span&gt;&lt;/p&gt;
&lt;p style="padding-left: 40px;"&gt;&lt;strong style="vertical-align: baseline;"&gt;1947:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; &lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;Bardeen&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;, Brattain and Shockley&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt; invent the first working transistor.&lt;br/&gt;&lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;1948:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Shannon introduces Information Theory, the basis for all network communication.&lt;br/&gt;&lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;1949:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Stored programs in computers become operational.&lt;br/&gt;&lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;1956:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; High-level programming languages are invented.&lt;br/&gt;&lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;1964: &lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;Instruction Set Architectures, common across different hardware generations, emerge.&lt;br/&gt;&lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;1965: &lt;/strong&gt;&lt;a href="https://en.wikipedia.org/wiki/Moore%27s_law" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Moore’s Law&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; introduced, positing that transistor count per integrated circuit will double every 18-24 months.&lt;br/&gt;&lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;1967:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Multi-user operating systems provide protected sharing of resources.&lt;br/&gt;&lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;1969: &lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;Introduction of the ARPANet, the basis for the modern Internet.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;These breakthroughs became the basis for modern computing at the end of Epoch 0: four computers based on integrated circuits running stable instruction set architectures and a multi-user, time-shared operating system connected to a packet-switched internet. This seemingly humble baseline laid the foundation for exponential progress in subsequent epochs.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-image_full_width"&gt;






  
    &lt;div class="article-module h-c-page"&gt;
      &lt;div class="h-c-grid"&gt;
  

    &lt;figure class="article-image--large
      
      
        h-c-grid__col
        h-c-grid__col--6 h-c-grid__col--offset-3
        
        
      "
      &gt;

      
      
        
        &lt;img
            src="https://storage.googleapis.com/gweb-cloudblog-publish/images/Epoch_1.max-1000x1000.png"
        
          alt="Epoch_1"&gt;
        
      
    &lt;/figure&gt;

  
      &lt;/div&gt;
    &lt;/div&gt;
  




&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;In the first Epoch, computer networks were largely used in an asynchronous manner: transfer data across the network (e.g., via FTP), operate on it, and then transfer results back. &lt;/span&gt;&lt;/p&gt;
&lt;p style="padding-left: 40px;"&gt;&lt;strong style="vertical-align: baseline;"&gt;Notable developments: &lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;SQL&lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;, &lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;FTP, email, and Telnet&lt;br/&gt;&lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;Interaction time among computers:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; 100 milliseconds &lt;br/&gt;&lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;Characteristics:&lt;br/&gt;&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;• Low-bandwidth, high-latency networks&lt;br/&gt;&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;• Rare pairwise interactions between expensive computers&lt;br/&gt;&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;• Character keystroke interactions with humans&lt;br/&gt;&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;• The emergence of open source software &lt;br/&gt;&lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;Breakthrough: &lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;Personal computers&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-image_full_width"&gt;






  
    &lt;div class="article-module h-c-page"&gt;
      &lt;div class="h-c-grid"&gt;
  

    &lt;figure class="article-image--large
      
      
        h-c-grid__col
        h-c-grid__col--6 h-c-grid__col--offset-3
        
        
      "
      &gt;

      
      
        
        &lt;img
            src="https://storage.googleapis.com/gweb-cloudblog-publish/images/Epoch_2.max-1000x1000.png"
        
          alt="Epoch_2"&gt;
        
      
    &lt;/figure&gt;

  
      &lt;/div&gt;
    &lt;/div&gt;
  




&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Aided by increasing network speeds, prevalence of personal computers/workstations, and widespread, interoperable protocols (IP, TCP, NFS, HTTP), synchronous, transparent computation and communication became widespread in Epoch 2.&lt;/span&gt;&lt;/p&gt;
&lt;p style="padding-left: 40px;"&gt;&lt;strong style="vertical-align: baseline;"&gt;Notable developments:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Remote Procedure Call, client/server computing, LANs, leader election and consensus &lt;br/&gt;&lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;Interaction time among computers:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; 10 milliseconds&lt;br/&gt;&lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;Characteristics:&lt;br/&gt;&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;• 10 Mbps networks&lt;br/&gt;&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;• Internet Architecture scales globally thanks to TCP/IP&lt;br/&gt;&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;• Full 32-bit CPU fits on a chip&lt;br/&gt;&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;• Shared resources between multiple computers&lt;br/&gt;&lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;Breakthrough:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; The World Wide Web&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-image_full_width"&gt;






  
    &lt;div class="article-module h-c-page"&gt;
      &lt;div class="h-c-grid"&gt;
  

    &lt;figure class="article-image--large
      
      
        h-c-grid__col
        h-c-grid__col--6 h-c-grid__col--offset-3
        
        
      "
      &gt;

      
      
        
        &lt;img
            src="https://storage.googleapis.com/gweb-cloudblog-publish/images/Epoch_3.max-1000x1000.png"
        
          alt="Epoch_3"&gt;
        
        &lt;/a&gt;
      
    &lt;/figure&gt;

  
      &lt;/div&gt;
    &lt;/div&gt;
  




&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;In Epoch 3, the true breakthrough of HTTP and the World Wide Web brought network computing to the masses, breaking the confines of personal computing. To keep pace with continued exponential growth in the Internet and the needs of a global user population, many of the design patterns of modern computing were established during this period. &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;One of the drivers of Epoch 3 was the end of &lt;/span&gt;&lt;a href="https://www.rambus.com/blogs/understanding-dennard-scaling-2/" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Dennard scaling&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, which essentially limited the maximum clock frequency of a single CPU core. This limitation led the industry to adopt multi-core architectures, necessitating a move toward asynchronous, multi-threaded, and concurrent development environments.&lt;/span&gt;&lt;/p&gt;
&lt;p style="padding-left: 40px;"&gt;&lt;strong style="vertical-align: baseline;"&gt;Notable developments: &lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;HTTP, three-tier services, massive clusters, web search &lt;br/&gt;&lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;Interaction time among computers:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; 1 millisecond&lt;br/&gt;&lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;Characteristics:&lt;br/&gt;&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;• 100 Mbps–1Gbs networks&lt;br/&gt;&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;• Autonomous Systems / BGP &lt;br/&gt;&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;• Complex apps no longer fit on a single server; scaling to many servers&lt;br/&gt;&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;• Web indexing and search, population-scale email&lt;br/&gt;&lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;Breakthrough: &lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;Cluster-based Internet services, mobile-first design, multithreading and instruction-level parallelism &lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-image_full_width"&gt;






  
    &lt;div class="article-module h-c-page"&gt;
      &lt;div class="h-c-grid"&gt;
  

    &lt;figure class="article-image--large
      
      
        h-c-grid__col
        h-c-grid__col--6 h-c-grid__col--offset-3
        
        
      "
      &gt;

      
      
        
        &lt;img
            src="https://storage.googleapis.com/gweb-cloudblog-publish/images/Epoch_4.max-1000x1000.png"
        
          alt="Epoch_4"&gt;
        
        &lt;/a&gt;
      
    &lt;/figure&gt;

  
      &lt;/div&gt;
    &lt;/div&gt;
  




&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Epoch 4 established planetary-scale services available to billions of people through ubiquitous cellular devices. In parallel, a renaissance in machine learning drove more real-time control and insights. All of this was powered by warehouse-scale clusters of commodity computers interconnected by high-speed networks, which together processed vast datasets in real-time.&lt;/span&gt;&lt;/p&gt;
&lt;p style="padding-left: 40px;"&gt;&lt;strong style="vertical-align: baseline;"&gt;Notable developments:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Global cellular data coverage, planet-scale services, ubiquitous video&lt;br/&gt;&lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;Interaction time among computers:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; 100 microseconds&lt;br/&gt;&lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;Characteristics:&lt;br/&gt;&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;• 10-100 Gbps networks, flash&lt;br/&gt;&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;• Multiple cores per CPU socket&lt;br/&gt;&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;• Infrastructure that scales out across LANs (e.g., GFS, MapReduce, Hadoop)&lt;br/&gt;&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;• Mobile apps, global cellular data coverage&lt;br/&gt;&lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;Breakthroughs: &lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;Mainstream machine learning, readily available specialized compute hardware, cloud computing.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-image_full_width"&gt;






  
    &lt;div class="article-module h-c-page"&gt;
      &lt;div class="h-c-grid"&gt;
  

    &lt;figure class="article-image--large
      
      
        h-c-grid__col
        h-c-grid__col--6 h-c-grid__col--offset-3
        
        
      "
      &gt;

      
      
        
        &lt;img
            src="https://storage.googleapis.com/gweb-cloudblog-publish/images/Epoch_5.max-1000x1000.png"
        
          alt="Epoch_5"&gt;
        
        &lt;/a&gt;
      
    &lt;/figure&gt;

  
      &lt;/div&gt;
    &lt;/div&gt;
  




&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Today, we have transitioned to the fifth Epoch, which is marked by a superposition of two opposing trends. First, while transistor count per ASIC continues to increase at exponential rates, clock rates are flat and the cost of each transistor is now nearly flat, both limited by the increasing complexity and investment required to achieve smaller feature sizes. The implication is that performance normalized to cost improvements, or performance efficiency, of all of compute, DRAM, storage, and network infrastructure, is flattening. At the same time, ubiquitous network coverage, broadly deployed sensors, and data-hungry machine learning applications are accelerating the demand for raw computing infrastructure exponentially.&lt;/span&gt;&lt;/p&gt;
&lt;p style="padding-left: 40px;"&gt;&lt;strong style="vertical-align: baseline;"&gt;Notable developments:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Machine learning, generative AI, privacy, sustainability, societal infrastructure &lt;br/&gt;&lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;Interaction time among computers:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; 10 microseconds&lt;br/&gt;&lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;Featuring&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;:&lt;br/&gt;&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;• 200Gbps–1+Tb/s networks&lt;br/&gt;&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;• Ubiquitous, power-efficient, and high-speed wireless network coverage&lt;br/&gt;&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;• Increasingly specialized accelerators: TPUs, GPUs, Smart NICs&lt;br/&gt;&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;• Socket-level fabrics, optics, federated architectures&lt;br/&gt;&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;• Connected spaces, vehicles, appliances, wearables, etc…&lt;br/&gt;&lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;Breakthroughs: &lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;Many coming...&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Without fundamental breakthroughs in computing design and organization, our ability as a community to meet societal demands for computing infrastructure will falter. Coming up with new architectures to overcome these limitations, &lt;/span&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;new &lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;hardware and increasingly, software architectures, will define the fifth epoch of computing.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;While we cannot predict the breakthroughs that will be delivered in this fifth epoch of computing, we do know that each previous epoch has been characterized by a factor of 100x improvement in scale, efficiency, and cost-performance, all while improving security and reliability. The demand for scale and capability is only increasing, so delivering such gains without the tailwinds of Moore’s Law and Dennard scaling at our backs will be daunting. We imagine, however, the broad strokes will involve:&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Declarative programming models:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; The &lt;/span&gt;&lt;a href="https://en.wikipedia.org/wiki/Von_Neumann_architecture" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Von Neumann model&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; of sequential code execution on a dedicated processor has been incredibly useful for developers for decades. However, the rise of distributed and multi-threaded computing has broken the abstraction to the point where much of modern imperative code focuses on defensive, and often inefficient, constructs to manage asynchrony, heterogeneity, tail latency, optimistic concurrency, and failures. Complexity will only increase in the years ahead, essentially requiring new declarative programming models focused on intent, the user, and business logic. At the same time, managing execution flow and responding to shifting deployment conditions will need to be delegated to increasingly sophisticated compilers and &lt;/span&gt;&lt;a href="https://blog.google/technology/ai/introducing-pathways-next-generation-ai-architecture/" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;ML-powered runtimes&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;.&lt;/span&gt;&lt;/li&gt;
&lt;li role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Hardware segmentation:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; In earlier epochs, a general-purpose server architecture with a system balance of CPU, memory, storage, and networking could efficiently meet workload needs throughout the data center. However, when designing for specialized computing needs, ML training, inference, video processing, the conflicting requirements for storage, memory capacity, latency, bandwidth and communication is causing a proliferation of heterogeneous designs. When general-purpose compute performance was improving at 1.5x/year, pursuing even a 5x improvement for 10% of workloads did not make sense given the complexity. Today, such improvements can no longer be ignored. Addressing this gap will require new approaches to designing, verifying, qualifying, and deploying composable hardware ASICs and memory units in months, not years.&lt;/span&gt;&lt;/li&gt;
&lt;li role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Software-defined infrastructure:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; As underlying infrastructure has become more complex and more distributed, multiple layers of virtualization from memory to CPU have maintained the single server abstraction for individual applications. This trend will continue in the coming epoch as infrastructure continues to scale out and become more heterogeneous. The corollary of hardware segmentation, declarative programming models and distributed computing environments comprised of thousands of servers, will stretch virtualization beyond the confines of individual servers to include distributed computing on a single server, multiple servers, storage/memory arrays, and clusters — in some cases bringing resources across an entire campus together to efficiently deliver end results.&lt;/span&gt;&lt;/li&gt;
&lt;li role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Provably secure computation:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; In the last epoch, the need to sustain compute efficiency inadvertently came at the cost of &lt;/span&gt;&lt;a href="https://dl.acm.org/doi/abs/10.1145/3399742" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;security&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; and &lt;/span&gt;&lt;a href="https://sigops.org/s/conferences/hotos/2021/papers/hotos21-s01-hochschild.pdf" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;reliability&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;. However, as our lives move increasingly online, the need for privacy and confidentiality increases exponentially for individuals, for business, and governments. Data sovereignty, or the need to restrict the physical location of data, even derived, will become increasingly important to adhere to government policies, but also to transparently show the lineage of increasingly ML-generated content. Despite some cost in baseline performance, these needs must be first-class requirements and constraints. &lt;/span&gt;&lt;/li&gt;
&lt;li role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Sustainability:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; The first three epochs of computing delivered exponential improvements in performance for fixed power. With the end of Dennard scaling in the fourth epoch, global power consumption associated with computing has grown quickly, partially offset by the move to cloud-hosted infrastructure, which is 2-3x more power-efficient relative to earlier, on-premises designs. Further, cloud providers have made broad commitments to move to first carbon-neutral and then carbon-free power sources. However, the demand for data and compute will continue to grow and even likely accelerate in the fifth epoch. This will turn power-efficiency and carbon emissions into primary systems-evaluation metrics. Of particular note, &lt;/span&gt;&lt;a href="https://en.wikipedia.org/wiki/Embedded_emissions" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;embodied carbon&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; over the entire lifecycle of infrastructure build and delivery will require both improved visibility and optimization. &lt;/span&gt;&lt;/li&gt;
&lt;li role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Algorithmic innovation:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; The tailwinds of exponentially increasing performance have allowed software efficiency improvements to often go neglected. As improvement in underlying hardware components slows, the focus will turn to software and algorithmic opportunities. &lt;/span&gt;&lt;a href="https://www.science.org/doi/10.1126/science.aam9744" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Studies&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; indicate that opportunities for 2-10x improvement in software optimization abound in systems code. Efficiently identifying these software optimization opportunities and developing techniques to gracefully and reliably deliver these benefits to production systems at scale will be a critical opportunity. Leveraging recent breakthroughs in coding LLMs to partially automate this work would be a significant accelerant in the fifth epoch. &lt;/span&gt;&lt;/li&gt;
&lt;/ul&gt;
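&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;To make the first item above concrete, here is a minimal, hypothetical Python sketch; it is illustrative only and does not reflect any existing Google API. The imperative version hand-codes timeouts, retries, and backoff to defend against stragglers and failures, while the declarative version states only the intent and its constraints and leaves placement, hedging, and retry policy to a compiler or runtime.&lt;/span&gt;&lt;/p&gt;
&lt;pre&gt;
# Illustrative sketch only; fetch, Intent, deadline_ms, and consistency are
# hypothetical names, not an existing API.
import time


def fetch(replica, key, timeout_s):
    # Placeholder for a network call to one replica of a service.
    return "%s:%s" % (replica, key)


def call_with_retries(replicas, key, timeout_s=0.05, max_attempts=3):
    # Imperative style: the application owns timeouts, retries, and backoff,
    # i.e., the defensive boilerplate described in the list above.
    for attempt in range(max_attempts):
        replica = replicas[attempt % len(replicas)]
        try:
            return fetch(replica, key, timeout_s)
        except TimeoutError:
            time.sleep(0.001 * (2 ** attempt))  # exponential backoff
    raise TimeoutError("all attempts timed out")


class Intent:
    # Declarative style: state what is needed (operation, deadline,
    # consistency); a runtime decides how and where to execute it.
    def __init__(self, operation, deadline_ms, consistency):
        self.operation = operation
        self.deadline_ms = deadline_ms
        self.consistency = consistency


lookup = Intent("lookup(key)", deadline_ms=10, consistency="bounded_staleness")
print(call_with_retries(["replica-a", "replica-b"], "key"))
&lt;/pre&gt;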
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Integrating across the above, the fifth epoch will be ruled by measures of overall &lt;/span&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;user-system&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt; efficiency (useful answers per second) rather than lower-level per-component measures such as cost per MIPS, cost per GB of DRAM, cost per Gb/s, etc. Further, the units of efficiency will not be simply measured in performance-per-unit-cost but will &lt;/span&gt;&lt;a href="https://www.youtube.com/watch?v=EFe7-WZMMhc" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;explicitly account for power consumption and carbon emissions&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, and will take security and privacy as primary metrics, all while enforcing reliability requirements for the infrastructure on which society increasingly depends. Taken together, there are many untapped opportunities to deliver the next generation of infrastructure: &lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;A greater than 10x opportunity in scale-out efficiency of our distributed infrastructure across hardware and software.&lt;/span&gt;&lt;/li&gt;
&lt;li role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;Another 10x opportunity in matching &lt;/span&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;application balance points&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt; — that is, the ratio between different system resources such as compute, accelerators, memory, storage, and network — through software-defined infrastructure. &lt;/span&gt;&lt;/li&gt;
&lt;li role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;A more than 10x opportunity in next-generation accelerators and segment-specific hardware components relative to traditional one-size-fits-all, general-purpose computing architectures.&lt;/span&gt;&lt;/li&gt;
&lt;li role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;Finally, there is a hard-to-quantify but absolutely critical opportunity to improve developer productivity while simultaneously delivering substantially improved reliability and security.&lt;/span&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Combining these trends, we are on the cusp of yet another dramatic 1000x efficiency gain over the next epoch that will define the next generation of infrastructure services and enable the next generation of computing services, likely centering around breakthroughs in multimodal models and generative AI. The opportunity to define, design, and deploy what computing means for the next generation does not come along very often, and the tectonic shifts in this fifth epoch promise perhaps the biggest technical transformations and challenges to date, requiring a level of responsibility, collaboration and vision perhaps not seen since the earliest days of computing. &lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-image_full_width"&gt;






  
    &lt;div class="article-module h-c-page"&gt;
      &lt;div class="h-c-grid"&gt;
  

    &lt;figure class="article-image--large
      
      
        h-c-grid__col
        h-c-grid__col--6 h-c-grid__col--offset-3
        
        
      "
      &gt;

      
      
        
        &lt;img
            src="https://storage.googleapis.com/gweb-cloudblog-publish/images/5_Epoch_Infographic.max-1000x1000.jpg"
        
          alt="5_Epoch_Infographic"&gt;
        
        &lt;/a&gt;
      
    &lt;/figure&gt;

  
      &lt;/div&gt;
    &lt;/div&gt;
  




&lt;/div&gt;</description><pubDate>Thu, 15 Feb 2024 15:00:00 +0000</pubDate><guid>https://cloud.google.com/blog/topics/systems/the-fifth-epoch-of-distributed-computing/</guid><category>Compute</category><category>Infrastructure Modernization</category><category>Systems</category><media:content height="540" url="https://storage.googleapis.com/gweb-cloudblog-publish/images/Distributed-Computing-HeroBanner-2436x1200.max-600x600.jpg" width="540"></media:content><og xmlns:og="http://ogp.me/ns#"><type>article</type><title>Coming of age in the fifth epoch of distributed computing, accelerated by machine learning</title><description></description><image>https://storage.googleapis.com/gweb-cloudblog-publish/images/Distributed-Computing-HeroBanner-2436x1200.max-600x600.jpg</image><site_name>Google</site_name><url>https://cloud.google.com/blog/topics/systems/the-fifth-epoch-of-distributed-computing/</url></og><author xmlns:author="http://www.w3.org/2005/Atom"><name>Amin Vahdat</name><title>VP/GM, Machine Learning, Systems, and Cloud AI, Google Cloud</title><department></department><company></company></author></item><item><title>Enabling next-generation AI workloads: Announcing TPU v5p and AI Hypercomputer</title><link>https://cloud.google.com/blog/products/ai-machine-learning/introducing-cloud-tpu-v5p-and-ai-hypercomputer/</link><description>&lt;div class="block-paragraph"&gt;&lt;p data-block-key="5z6n0"&gt;Generative AI (gen AI) models are rapidly evolving, offering unparalleled sophistication and capability. This advancement empowers enterprises and developers across various industries to solve complex problems and unlock new opportunities. However, the growth in gen AI models — &lt;a href="https://cloud.google.com/blog/products/compute/announcing-cloud-tpu-v5e-and-a3-gpus-in-ga"&gt;with a tenfold increase in parameters annually over the past five years&lt;/a&gt; — brings heightened requirements for training, tuning, and inference. Today's larger models, featuring hundreds of billions or even trillions of parameters, require extensive training periods, sometimes spanning months, even on the most specialized systems. Additionally, efficient AI workload management necessitates a coherently integrated AI stack consisting of optimized compute, storage, networking, software and development frameworks.&lt;/p&gt;&lt;p data-block-key="5hv55"&gt;Today, to address these challenges, we are excited to announce Cloud TPU v5p, our most powerful, scalable, and flexible AI accelerator thus far. TPUs have long been the basis for training and serving AI-powered products like YouTube, Gmail, Google Maps, Google Play, and Android. In fact, Gemini, Google’s most capable and general AI model &lt;a href="https://blog.google/technology/ai/google-gemini-ai" target="_blank"&gt;announced today&lt;/a&gt;, was trained on, and is served, using TPUs.&lt;/p&gt;&lt;p data-block-key="dlap3"&gt;In addition, we are also announcing AI Hypercomputer from Google Cloud, a groundbreaking supercomputer architecture that employs an integrated system of performance-optimized hardware, open software, leading ML frameworks, and flexible consumption models. Traditional methods often tackle demanding AI workloads through piecemeal, component-level enhancements, which can lead to inefficiencies and bottlenecks. In contrast, AI Hypercomputer employs systems-level codesign to boost efficiency and productivity across AI training, tuning, and serving.&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-video"&gt;



&lt;div class="article-module article-video "&gt;
  &lt;figure&gt;
    &lt;a class="h-c-video h-c-video--marquee"
      href="https://youtube.com/watch?v=hszd5UqnfLk"
      data-glue-modal-trigger="uni-modal-hszd5UqnfLk-"
      data-glue-modal-disabled-on-mobile="true"&gt;

      
        

        &lt;div class="article-video__aspect-image"
          style="background-image: url(https://storage.googleapis.com/gweb-cloudblog-publish/images/Thumbnail_-_AI_Infra_Launch_v1.max-1000x1000.png);"&gt;
          &lt;span class="h-u-visually-hidden"&gt;Introducing AI Hypercomputer with Cloud TPU v5p&lt;/span&gt;
        &lt;/div&gt;
      
      &lt;svg role="img" class="h-c-video__play h-c-icon h-c-icon--color-white"&gt;
        &lt;use xlink:href="#mi-youtube-icon"&gt;&lt;/use&gt;
      &lt;/svg&gt;
    &lt;/a&gt;

    
  &lt;/figure&gt;
&lt;/div&gt;

&lt;div class="h-c-modal--video"
     data-glue-modal="uni-modal-hszd5UqnfLk-"
     data-glue-modal-close-label="Close Dialog"&gt;
   &lt;a class="glue-yt-video"
      data-glue-yt-video-autoplay="true"
      data-glue-yt-video-height="99%"
      data-glue-yt-video-vid="hszd5UqnfLk"
      data-glue-yt-video-width="100%"
      href="https://youtube.com/watch?v=hszd5UqnfLk"
      ng-cloak&gt;
   &lt;/a&gt;
&lt;/div&gt;

&lt;/div&gt;
&lt;div class="block-paragraph"&gt;&lt;h3 data-block-key="5z6n0"&gt;&lt;b&gt;Inside Cloud TPU v5p, our most powerful and scalable TPU accelerator to date&lt;/b&gt;&lt;/h3&gt;&lt;p data-block-key="ackie"&gt;Earlier this year, &lt;a href="https://cloud.google.com/blog/products/compute/announcing-cloud-tpu-v5e-in-ga"&gt;we announced&lt;/a&gt; the general availability of Cloud TPU v5e. With 2.3X price performance improvements over the previous generation TPU v4&lt;sup&gt;1&lt;/sup&gt;, it is our most &lt;i&gt;cost-efficient&lt;/i&gt; TPU to date. By contrast, Cloud TPU v5p, is our most &lt;i&gt;powerful&lt;/i&gt; TPU thus far. Each TPU v5p pod &lt;b&gt;composes together 8,960 chips&lt;/b&gt; over our &lt;b&gt;highest-bandwidth inter-chip interconnect (ICI) at 4,800 Gbps/chip in a 3D torus topology&lt;/b&gt;. Compared to TPU v4, TPU v5p features more than&lt;b&gt; 2X greater FLOPS and 3X more high-bandwidth memory (HBM)&lt;/b&gt;.&lt;/p&gt;&lt;p data-block-key="67r9p"&gt;Designed for performance, flexibility, and scale, TPU v5p can &lt;b&gt;train large LLM models 2.8X faster&lt;/b&gt; than the previous-generation TPU v4. Moreover, with second-generation &lt;a href="https://cloud.google.com/tpu"&gt;SparseCores&lt;/a&gt;, TPU v5p can &lt;b&gt;train embedding-dense models 1.9X faster&lt;/b&gt; than TPU v4&lt;sup&gt;2&lt;/sup&gt;.&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-image_full_width"&gt;






  
    &lt;div class="article-module h-c-page"&gt;
      &lt;div class="h-c-grid"&gt;
  

    &lt;figure class="article-image--large
      
      
        h-c-grid__col
        h-c-grid__col--6 h-c-grid__col--offset-3
        
        
      "
      &gt;

      
      
        
        &lt;img
            src="https://storage.googleapis.com/gweb-cloudblog-publish/images/1_next-generation_AI_workloads.max-1000x1000.jpg"
        
          alt="1 next-generation AI workloads"&gt;
        
        &lt;/a&gt;
      
        &lt;figcaption class="article-image__caption "&gt;&lt;p data-block-key="z6r3m"&gt;Source: TPU v5p and v4 are based on Google Internal Data. As of November, 2023: All numbers normalized per chip seq-len=2048 for GPT-3 175 billion parameter model.&lt;/p&gt;&lt;/figcaption&gt;
      
    &lt;/figure&gt;

  
      &lt;/div&gt;
    &lt;/div&gt;
  




&lt;/div&gt;
&lt;div class="block-image_full_width"&gt;






  
    &lt;div class="article-module h-c-page"&gt;
      &lt;div class="h-c-grid"&gt;
  

    &lt;figure class="article-image--large
      
      
        h-c-grid__col
        h-c-grid__col--6 h-c-grid__col--offset-3
        
        
      "
      &gt;

      
      
        
        &lt;img
            src="https://storage.googleapis.com/gweb-cloudblog-publish/images/2_next-generation_AI_workloads.max-1000x1000.jpg"
        
          alt="2 next-generation AI workloads"&gt;
        
        &lt;/a&gt;
      
        &lt;figcaption class="article-image__caption "&gt;&lt;p data-block-key="z6r3m"&gt;Source: TPU v5e data is from MLPerf™ 3.1 Training Closed results for v5e. TPU v5p and v4 are based on Google internal training runs. As of November, 2023: All numbers normalized per chip seq-len=2048 for GPT-3 175 billion parameter model. It shows relative performance per dollar using the public list price of TPU v4 ($3.22/chip/hour), TPU v5e ( $1.2/chip/hour) and TPU v5p ($4.2/chip/hour).&lt;/p&gt;&lt;/figcaption&gt;
      
    &lt;/figure&gt;

  
      &lt;/div&gt;
    &lt;/div&gt;
  




&lt;/div&gt;
&lt;div class="block-paragraph"&gt;&lt;p data-block-key="5z6n0"&gt;In addition to performance improvements, &lt;b&gt;TPU v5p is also 4X more scalable than TPU v4 in terms of total available FLOPs per pod.&lt;/b&gt; Doubling the floating-point operations per second (FLOPS) over TPU v4 and doubling the number of chips in a single pod provides considerable improvement in relative performance in training speed.&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-image_full_width"&gt;






  
    &lt;div class="article-module h-c-page"&gt;
      &lt;div class="h-c-grid"&gt;
  

    &lt;figure class="article-image--large
      
      
        h-c-grid__col
        h-c-grid__col--6 h-c-grid__col--offset-3
        
        
      "
      &gt;

      
      
        
        &lt;img
            src="https://storage.googleapis.com/gweb-cloudblog-publish/original_images/3_next-generation_AI_workloads_v1.jpg"
        
          alt="3 next-generation AI workloads"&gt;
        
        &lt;/a&gt;
      
    &lt;/figure&gt;

  
      &lt;/div&gt;
    &lt;/div&gt;
  




&lt;/div&gt;
&lt;div class="block-paragraph"&gt;&lt;h3 data-block-key="5z6n0"&gt;&lt;b&gt;Google AI Hypercomputer delivers peak performance and efficiency at large scale&lt;/b&gt;&lt;/h3&gt;&lt;p data-block-key="7d6ot"&gt;Achieving both scale and speed is necessary, but not sufficient to meet the needs of modern AI/ML applications and services. The hardware and software components must come together into an integrated, easy-to-use, secure, and reliable computing system. At Google, we’ve done decades of research and development on this very problem, culminating in AI Hypercomputer, a system of technologies optimized to work in concert to enable modern AI workloads.&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-image_full_width"&gt;






  
    &lt;div class="article-module h-c-page"&gt;
      &lt;div class="h-c-grid"&gt;
  

    &lt;figure class="article-image--large
      
      
        h-c-grid__col
        h-c-grid__col--6 h-c-grid__col--offset-3
        
        
      "
      &gt;

      
      
        
        &lt;img
            src="https://storage.googleapis.com/gweb-cloudblog-publish/images/4_next-generation_AI_workloads.max-1000x1000.png"
        
          alt="4 next-generation AI workloads"&gt;
        
        &lt;/a&gt;
      
    &lt;/figure&gt;

  
      &lt;/div&gt;
    &lt;/div&gt;
  




&lt;/div&gt;
&lt;div class="block-paragraph"&gt;&lt;ul&gt;&lt;li data-block-key="5z6n0"&gt;&lt;b&gt;Performance-optimized hardware:&lt;/b&gt; AI Hypercomputer features performance-optimized compute, storage, and networking built over an ultrascale data center infrastructure, leveraging a high-density footprint, liquid cooling, and our &lt;a href="https://cloud.google.com/blog/topics/systems/the-evolution-of-googles-jupiter-data-center-network"&gt;Jupiter data center network&lt;/a&gt; technology. All of this is predicated on technologies that are built with &lt;a href="https://www.google.com/about/datacenters/efficiency/" target="_blank"&gt;efficiency&lt;/a&gt; at their core; leveraging &lt;a href="https://cloud.google.com/blog/topics/sustainability/a-smarter-way-to-buy-clean-energy"&gt;clean energy&lt;/a&gt; and &lt;a href="https://blog.google/outreach-initiatives/sustainability/replenishing-water/?_ga=2.140272307.1460901017.1631498684-1474825438.1628277680" target="_blank"&gt;a deep commitment to water stewardship&lt;/a&gt;, and that are &lt;a href="https://blog.google/outreach-initiatives/sustainability/our-third-decade-climate-action-realizing-carbon-free-future/" target="_blank"&gt;helping us move toward a carbon-free future&lt;/a&gt;.&lt;/li&gt;&lt;li data-block-key="905s7"&gt;&lt;b&gt;Open software:&lt;/b&gt; AI Hypercomputer enables developers to access our performance-optimized hardware through the use of open software to tune, manage, and dynamically orchestrate AI training and inference workloads on top of performance-optimized AI hardware.&lt;ul&gt;&lt;li data-block-key="4clbe"&gt;Extensive support for popular ML frameworks such as JAX, TensorFlow, and PyTorch are available right out of the box. Both JAX and PyTorch are powered by &lt;a href="https://github.com/openxla/xla" target="_blank"&gt;OpenXLA&lt;/a&gt; compiler for building sophisticated LLMs. XLA serves as a foundational backbone, enabling the creation of complex multi-layered models (&lt;a href="https://pytorch.org/blog/high-performance-llama-2/" target="_blank"&gt;Llama 2 training and inference on Cloud TPUs with PyTorch/XLA&lt;/a&gt;). It optimizes distributed architectures across a wide range of hardware platforms, ensuring easy-to-use and efficient model development for diverse AI use cases (&lt;a href="https://cloud.google.com/blog/products/compute/assemblyai-on-cloud-tpu-v5e-price-performance"&gt;AssemblyAI leverages JAX/XLA and Cloud TPUs for large-scale AI speech&lt;/a&gt;).&lt;/li&gt;&lt;li data-block-key="98e3i"&gt;Open and unique &lt;a href="https://cloud.google.com/blog/products/compute/using-cloud-tpu-multislice-to-scale-ai-workloads"&gt;Multislice Training&lt;/a&gt; and &lt;a href="https://cloud.google.com/tpu/docs/v5e-inference"&gt;Multihost Inferencing&lt;/a&gt; software, respectively, make scaling, training, and serving workloads smooth and easy. 
Developers can scale to tens of thousands of chips to support demanding AI workloads.&lt;/li&gt;&lt;li data-block-key="asdik"&gt;Deep integration with &lt;a href="https://cloud.google.com/kubernetes-engine?hl=en"&gt;Google Kubernetes Engine (GKE)&lt;/a&gt; and &lt;a href="https://cloud.google.com/compute?hl=en"&gt;Google Compute Engine&lt;/a&gt;, to deliver efficient resource management, consistent ops environments, autoscaling, node-pool auto-provisioning, auto-checkpointing, auto-resumption, and timely failure recovery.&lt;/li&gt;&lt;/ul&gt;&lt;/li&gt;&lt;li data-block-key="6vp9k"&gt;&lt;b&gt;Flexible consumption&lt;/b&gt;: AI Hypercomputer offers a wide range of flexible and dynamic consumption choices. In addition to classic options, such as Committed Use Discounts (CUD), on-demand pricing, and spot pricing, AI Hypercomputer provides consumption models tailored for AI workloads via &lt;a href="https://cloud.google.com/blog/products/compute/introducing-dynamic-workload-scheduler"&gt;Dynamic Workload Scheduler.&lt;/a&gt; Dynamic Workload Scheduler introduces two models: Flex Start mode for higher resource obtainability and optimized economics, as well as Calendar mode, which targets workloads with higher predictability on job-start times.&lt;/li&gt;&lt;/ul&gt;&lt;h3 data-block-key="d6pt0"&gt;&lt;b&gt;Leveraging Google’s deep experience to help power the future of AI&lt;/b&gt;&lt;/h3&gt;&lt;p data-block-key="1p5hf"&gt;Customers like Salesforce and Lightricks are already training and serving large AI models with Google Cloud’s TPU v5p AI Hypercomputer — and already seeing a difference:&lt;/p&gt;&lt;p data-block-key="bmfqe"&gt;&lt;i&gt;“We’ve been leveraging Google Cloud TPU v5p for pre-training Salesforce’s foundational models that will serve as the core engine for specialized production use cases, and we’re seeing considerable improvements in our training speed. In fact, Cloud TPU v5p compute outperforms the previous generation TPU v4 by as much as 2X. We also love how seamless and easy the transition has been from Cloud TPU v4 to v5p using JAX. We’re excited to take these speed gains even further by leveraging the native support for INT8 precision format via the Accurate Quantized Training (AQT) library to optimize our models.” -&lt;/i&gt; Erik Nijkamp, Senior Research Scientist, Salesforce&lt;/p&gt;&lt;p data-block-key="5rp5"&gt;&lt;i&gt;“Leveraging the remarkable performance and ample memory capacity of Google Cloud TPU v5p, we successfully trained our generative text-to-video model without splitting it into separate processes. This optimal hardware utilization significantly accelerates each training cycle, allowing us to swiftly conduct a series of experiments. The ability to train our model quickly in each experiment facilitates rapid iteration, which is an invaluable advantage for our research team in this competitive field of generative AI.”&lt;/i&gt; - Yoav HaCohen, PhD, Core Generative AI Research Team Lead, Lightricks&lt;/p&gt;&lt;p data-block-key="kdp6"&gt;&lt;i&gt;“In our early-stage usage, Google DeepMind and Google Research have observed 2X speedups for LLM training workloads using TPU v5p chips compared to the performance on our TPU v4 generation. The robust support for ML Frameworks (JAX, PyTorch, TensorFlow) and orchestration tools enables us to scale even more efficiently on v5p. With the 2nd generation of SparseCores we also see significant improvement in the performance of embeddings-heavy workloads. 
TPUs are vital to enabling our largest-scale research and engineering efforts on cutting edge models like Gemini.” -&lt;/i&gt; Jeff Dean, Chief Scientist, Google DeepMind and Google Research&lt;/p&gt;&lt;p data-block-key="1i8qc"&gt;At Google, we’ve long believed in the power of AI to help solve challenging problems. Until very recently, training large foundation models and serving them at scale was too complicated and expensive for many organizations. Today, with Cloud TPU v5p and AI Hypercomputer, we’re excited to extend the result of decades of research in AI and systems design with our customers, so they can innovate with AI faster, more efficiently, and more cost effectively.&lt;/p&gt;&lt;p data-block-key="d91lv"&gt;To request access to Cloud TPU v5p and AI Hypercomputer, please reach out to your &lt;a href="https://cloud.google.com/contact/"&gt;Google Cloud account manager&lt;/a&gt;.&lt;/p&gt;&lt;hr/&gt;&lt;p data-block-key="ek6l1"&gt;&lt;i&gt;&lt;sup&gt;1: MLPerf™ v3.1 Training Closed, multiple benchmarks as shown. Retrieved November 8th, 2023 from&lt;/sup&gt;&lt;/i&gt; &lt;a href="http://mlcommons.org/" target="_blank"&gt;&lt;i&gt;&lt;sup&gt;mlcommons.org&lt;/sup&gt;&lt;/i&gt;&lt;/a&gt;&lt;i&gt;&lt;sup&gt;. Results 3.1-2004. Performance per dollar is not an MLPerf metric. TPU v4 results are unverified: not verified by MLCommons Association. The MLPerf™ name and logo are trademarks of MLCommons Association in the United States and other countries. All rights reserved. Unauthorized use strictly prohibited. See&lt;/sup&gt;&lt;/i&gt; &lt;a href="http://www.mlcommons.org/" target="_blank"&gt;&lt;i&gt;&lt;sup&gt;www.mlcommons.org&lt;/sup&gt;&lt;/i&gt;&lt;/a&gt; &lt;i&gt;&lt;sup&gt;for more information.&lt;br/&gt;2: Google Internal Data for TPU v5p as of November, 2023: E2E steptime, SearchAds pCTR, batch size per TPU core 16,384, 125 vp5 chips&lt;/sup&gt;&lt;/i&gt;&lt;/p&gt;&lt;/div&gt;</description><pubDate>Wed, 06 Dec 2023 15:00:00 +0000</pubDate><guid>https://cloud.google.com/blog/products/ai-machine-learning/introducing-cloud-tpu-v5p-and-ai-hypercomputer/</guid><category>Compute</category><category>Infrastructure Modernization</category><category>Systems</category><category>AI &amp; Machine Learning</category><media:content height="540" url="https://storage.googleapis.com/gweb-cloudblog-publish/images/Blog_Project_Ariel_J_Templates_3-03.max-600x600.jpg" width="540"></media:content><og xmlns:og="http://ogp.me/ns#"><type>article</type><title>Enabling next-generation AI workloads: Announcing TPU v5p and AI Hypercomputer</title><description></description><image>https://storage.googleapis.com/gweb-cloudblog-publish/images/Blog_Project_Ariel_J_Templates_3-03.max-600x600.jpg</image><site_name>Google</site_name><url>https://cloud.google.com/blog/products/ai-machine-learning/introducing-cloud-tpu-v5p-and-ai-hypercomputer/</url></og><author xmlns:author="http://www.w3.org/2005/Atom"><name>Amin Vahdat</name><title>VP/GM, Machine Learning, Systems, and Cloud AI, Google Cloud</title><department></department><company></company></author><author xmlns:author="http://www.w3.org/2005/Atom"><name>Mark Lohmeyer</name><title>VP and GM, AI and Computing Infrastructure</title><department></department><company></company></author></item><item><title>How we’ll build sustainable, scalable, secure infrastructure for an AI-driven future</title><link>https://cloud.google.com/blog/topics/systems/google-systems-innovations-at-ocp-global-summit/</link><description>&lt;div class="block-paragraph"&gt;&lt;p 
data-block-key="99vp2"&gt;&lt;b&gt;&lt;i&gt;Editor’s note:&lt;/i&gt;&lt;/b&gt;&lt;i&gt; Today, we hear from Parthasarathy Ranganathan, Google VP and Technical Fellow and Amin Vahdat, VP/GM. Partha delivered a keynote address today at the&lt;/i&gt; &lt;a href="https://2023ocpglobal.fnvirtual.app/" target="_blank"&gt;&lt;i&gt;OCP Global Summit&lt;/i&gt;&lt;/a&gt;&lt;i&gt;, an annual conference for leaders, researchers, and pioneers in the open hardware industry. Partha served on the OCP Board of Directors from 2020 to earlier this year, when he was succeeded by Amber Huffman as Google’s representative. Read on to hear about the macro trends driving systems design today, and an overview of all of our activities in the community.&lt;/i&gt;&lt;/p&gt;&lt;hr/&gt;&lt;p data-block-key="9f29g"&gt;At Google, we build planet-scale computing for services that power billions of users, and these services have led to incredible opportunities for system designers to create hardware that operates with high performance, resilience, efficiency, and all at scale. In short, &lt;a href="https://cloud.google.com/blog/topics/systems/announcing-open-innovations-for-a-new-era-of-systems-design"&gt;we have embraced open innovation for a new era of systems design&lt;/a&gt;.&lt;/p&gt;&lt;p data-block-key="4iqbh"&gt;Today, we are at a new fundamental inflection point in computing: the rise of AI. Google products have always had a strong AI component, but in the past year, we have seen a tectonic shift in the industry and have supercharged our core products with the power of generative AI.&lt;/p&gt;&lt;p data-block-key="fr03k"&gt;These advances have shown up across our computing systems and workloads, from the original &lt;a href="https://blog.research.google/2017/08/transformer-novel-neural-network.html" target="_blank"&gt;Transformer model&lt;/a&gt; in 2017, to PaLM in 2022, to Bard today. Large language models have grown from having hundreds of millions of parameters to trillions of parameters, growing by almost an order of magnitude every year. As model sizes increase, so does the computation needed to run these models. That, in essence, sets up the challenge and opportunity that the open innovation community needs to solve together.&lt;/p&gt;&lt;p data-block-key="c32eq"&gt;AI isn’t just an enabler of new applications — it also represents a fundamental platform shift — something that we need to innovate on across hardware and software. Together, we need to build the hardware and software platforms that deliver powerful AI solutions across complex machine-learning supercomputers, all in a sustainable, secure, and scalable manner.&lt;/p&gt;&lt;h3 data-block-key="8e66b"&gt;&lt;b&gt;Towards sustainable systems&lt;/b&gt;&lt;/h3&gt;&lt;p data-block-key="qp58"&gt;Sustainability is an imperative that we all share. Here are several efforts we are engaged in to help our industry towards achieving net-zero emissions:&lt;/p&gt;&lt;ul&gt;&lt;li data-block-key="2ndh5"&gt;&lt;b&gt;Net Zero Innovation Hub:&lt;/b&gt; The industry answered our call from the OCP Regional Summit in April for a pan-European public and private collaboration to advance sustainability at a regional level. 
We launched the &lt;a href="https://www.netzerodatacenters.com/" target="_blank"&gt;Net Zero Innovation Hub&lt;/a&gt; with co-founders Danfoss, Google, Microsoft, and Schneider Electric on September 28 with an ambitious agenda across all scopes, including waste-heat reuse and grid availability.&lt;/li&gt;&lt;li data-block-key="2o69e"&gt;&lt;b&gt;Greener concrete:&lt;/b&gt; In collaboration with iMasons Climate Accord, AWS, Google, Meta, and Microsoft, we delivered an ambitious &lt;a href="https://climateaccord.org/news/greener-concrete-for-data-centers-an-open-letter/" target="_blank"&gt;technology roadmap to decarbonize concrete&lt;/a&gt;. We invite the community to partner with us to execute this roadmap together.&lt;/li&gt;&lt;li data-block-key="6qdbs"&gt;&lt;b&gt;Sustainability metrics:&lt;/b&gt; Last year, we formed the &lt;a href="https://www.opencompute.org/projects/dcf-sustainability" target="_blank"&gt;OCP Data Center Facilities Sustainability Subproject&lt;/a&gt;, co-led by Google and Microsoft. The group is making important progress on establishing clear, consistent and standardized metrics for emissions/carbon, energy, water, and beyond. This work will enable an apples-to-apples data-driven approach to assess the best approaches to help achieve our shared goals.&lt;/li&gt;&lt;/ul&gt;&lt;h3 data-block-key="6t5aa"&gt;&lt;b&gt;Enhancing security across the systems stack&lt;/b&gt;&lt;/h3&gt;&lt;p data-block-key="1ubi2"&gt;Security includes both trusted computing and &lt;a href="https://research.google/pubs/pub50337/" target="_blank"&gt;reliable computing&lt;/a&gt;, and there are several exciting developments coming in this space, including:&lt;/p&gt;&lt;ul&gt;&lt;li data-block-key="3tfiq"&gt;&lt;b&gt;Caliptra:&lt;/b&gt; &lt;a href="http://www.caliptra.org/" target="_blank"&gt;Caliptra&lt;/a&gt; is a re-usable IP block for root-of-trust management. Last year, with industry leaders, AMD, Microsoft, and NVIDIA, we &lt;a href="https://www.opencompute.org/blog/cloud-security-integrating-trust-into-every-chip" target="_blank"&gt;contributed the draft Caliptra specification&lt;/a&gt; to OCP. The Caliptra specification will be complete this year, with the IP block ready for integration into CPUs, GPUs, and other devices. Check out the code repository at &lt;a href="https://github.com/chipsalliance/caliptra" target="_blank"&gt;https://github.com/chipsalliance/caliptra&lt;/a&gt;.&lt;/li&gt;&lt;li data-block-key="5t64"&gt;&lt;b&gt;OCP S.A.F.E.:&lt;/b&gt; In partnership with OCP and Microsoft, we have developed the OCP Security Appraisal Framework and Enablement (S.A.F.E.) program. OCP S.A.F.E. provides a standardized approach for provenance, code quality, and software supply chain for firmware releases. Learn more at &lt;a href="https://www.opencompute.org/projects/ocp-safe-program" target="_blank"&gt;https://www.opencompute.org/projects/ocp-safe-program&lt;/a&gt;.&lt;/li&gt;&lt;li data-block-key="a6m41"&gt;&lt;b&gt;Reliable Computing:&lt;/b&gt; Last year, we formed a server-component resilience workstream at OCP along with AMD, ARM, Intel, Meta, Microsoft, and NVIDIA to take a systems approach to addressing silicon faults and silent data errors. 
The team has made great strides, including publishing the &lt;a href="https://www.opencompute.org/documents/external-ver-0-3open-compute-specification-server-component-resilience-workstream-pdf" target="_blank"&gt;draft specification&lt;/a&gt; and open-sourcing Silent Data Corruption (SDC) frameworks (e.g., Intel and ARM collaborating on &lt;a href="https://github.com/opendcdiag/opendcdiag" target="_blank"&gt;Open Datacenter Diagnostics&lt;/a&gt;, AMD’s &lt;a href="https://github.com/amd/Open-Field-Health-Check" target="_blank"&gt;Open Field Health Check&lt;/a&gt;, and NVIDIA’s &lt;a href="https://github.com/NVIDIA/dcgm" target="_blank"&gt;Datacenter GPU Manager&lt;/a&gt;). To advance this important area faster, we are launching a new academic grant program — the first of its kind at OCP — with member companies supporting significant academic research in this area.&lt;/li&gt;&lt;/ul&gt;&lt;h3 data-block-key="4bjn9"&gt;&lt;b&gt;Scalability from silicon to the cloud&lt;/b&gt;&lt;/h3&gt;&lt;p data-block-key="fik0g"&gt;Scalable infrastructure is a primary area of focus for both Google and OCP, from silicon all the way to the cloud. At the OCP Summit this week, we will discuss a few advancements, specifically:&lt;/p&gt;&lt;ul&gt;&lt;li data-block-key="5mtvj"&gt;&lt;b&gt;Accelerators&lt;/b&gt;: This year, we partnered with AMD, ARM, Intel, Meta, and NVIDIA to deliver the &lt;a href="https://www.opencompute.org/documents/ocp-8-bit-floating-point-specification-ofp8-revision-1-0-2023-06-20-pdf" target="_blank"&gt;OCP 8-bit Floating Point specification&lt;/a&gt; to enable training on one accelerator and serving on another. We partnered with Microsoft and NVIDIA to deliver a set of firmware specifications for GPUs and accelerators covering &lt;a href="https://www.opencompute.org/documents/finalocp-gpu-and-accelerators-ras-requirements-0-5-pdf" target="_blank"&gt;reliability&lt;/a&gt;, &lt;a href="https://www.opencompute.org/documents/ocp-gpu-accelerator-management-interfaces-v-5-pdf" target="_blank"&gt;manageability&lt;/a&gt;, and &lt;a href="https://www.opencompute.org/documents/ocp-gpu-fw-update-specification-v0-7-pdf" target="_blank"&gt;updates&lt;/a&gt;.&lt;/li&gt;&lt;li data-block-key="a4tpe"&gt;&lt;b&gt;AI:&lt;/b&gt; During the AI Track, we are highlighting the progress we are making with partners in the &lt;a href="https://opensource.googleblog.com/2023/03/openxla-is-ready-to-accelerate-and-simplify-ml-development.html" target="_blank"&gt;OpenXLA&lt;/a&gt; ecosystem. We are also discussing the &lt;a href="https://blog.research.google/2023/07/an-open-source-gymnasium-for-computer.html" target="_blank"&gt;Architecture Gym&lt;/a&gt;, a new effort in collaboration with MLCommons to go beyond systems for AI, to AI for systems, looking at how AI can transform systems design.&lt;/li&gt;&lt;li data-block-key="6eol"&gt;&lt;b&gt;Networking:&lt;/b&gt; To truly build large-scale AI infrastructure, you need world-class networking systems innovation. To help with this, we are opening Falcon, Google’s reliable low-latency hardware transport, and sharing some of the advances we have made over the past 10 years on performance, latency, traffic control, etc. This is part of our ongoing effort to advance Ethernet to the industry as a high-performance, low-latency fabric for hyperscaler environments. 
Learn more in the blog “&lt;a href="https://cloud.google.com/blog/topics/systems/introducing-falcon-a-reliable-low-latency-hardware-transport"&gt;Google opens Falcon, a reliable low-latency hardware transport, to the ecosystem&lt;/a&gt;”.&lt;/li&gt;&lt;li data-block-key="238rl"&gt;&lt;b&gt;Storage:&lt;/b&gt; Google is joining the OCP Data Center NVM Express™ (NVMe) specification, working group with Meta, Microsoft, Dell, and HPE to provide clear requirements for features in datacenter SSDs including Flexible Data Placement, security, and telemetry. We are also kicking off a new open-source hardware effort to develop an NVMe Key Management block with partners Microsoft, Samsung, Kioxia and Solidigm.&lt;/li&gt;&lt;/ul&gt;&lt;p data-block-key="fajip"&gt;There is tremendous opportunity for all of us in the industry to create even more open ecosystems for innovation. At Google, we have a legacy of embracing and fostering open ecosystems, whether it’s Android, Chromium, Kubernetes, Kaggle, Tensorflow, or Jax. We set industry standards, grow communities, and share our innovations broadly. Our contributions to the Open Compute Project Foundation go back several years, from our first &lt;a href="https://www.opencompute.org/files/External-2018-OCP-Summit-Google-48V-Update-Flatbed-and-STC-20180321.pdf" target="_blank"&gt;48V contribution&lt;/a&gt; to today, sitting on the OCP Board and being one of its largest contributors. We believe the best is yet to come, through codesign and collaboration across hardware and software, multiple layers of the stack, compute, network, storage, infrastructure, industry and academia, and of course, across companies.&lt;/p&gt;&lt;p data-block-key="akl62"&gt;It is exciting to be in an era where we are literally inventing the future with new AI advances every day. All these amazing AI advances in turn need a healthy innovation ecosystem around infrastructure, from all of us — to build the sustainable, secure, scalable &lt;i&gt;societal infrastructure&lt;/i&gt; that we need for this AI-driven future. And all of this will be possible only through collaboration across all of us in the community. You can learn more about the OCP Global Summit agenda &lt;a href="https://2023ocpglobal.fnvirtual.app/a/schedule/" target="_blank"&gt;here&lt;/a&gt; and talks by Google &lt;a href="https://2023ocpglobal.fnvirtual.app/a/schedule/#view=calendar&amp;amp;company=google%2Cgoogle%20deepmind" target="_blank"&gt;here&lt;/a&gt;. 
We are looking forward to the vibrant discussions this week.&lt;/p&gt;&lt;/div&gt;</description><pubDate>Tue, 17 Oct 2023 15:30:00 +0000</pubDate><guid>https://cloud.google.com/blog/topics/systems/google-systems-innovations-at-ocp-global-summit/</guid><category>Sustainability</category><category>Networking</category><category>Security &amp; Identity</category><category>Systems</category><og xmlns:og="http://ogp.me/ns#"><type>article</type><title>How we’ll build sustainable, scalable, secure infrastructure for an AI-driven future</title><description></description><site_name>Google</site_name><url>https://cloud.google.com/blog/topics/systems/google-systems-innovations-at-ocp-global-summit/</url></og><author xmlns:author="http://www.w3.org/2005/Atom"><name>Parthasarathy Ranganathan</name><title>VP, Engineering Fellow</title><department></department><company></company></author><author xmlns:author="http://www.w3.org/2005/Atom"><name>Amin Vahdat</name><title>SVP and Chief Technologist, AI and Infrastructure</title><department></department><company></company></author></item><item><title>Google opens Falcon, a reliable low-latency hardware transport, to the ecosystem</title><link>https://cloud.google.com/blog/topics/systems/introducing-falcon-a-reliable-low-latency-hardware-transport/</link><description>&lt;div class="block-paragraph"&gt;&lt;p data-block-key="s2u0m"&gt;At Google, we have a long history of solving problems at scale using Ethernet, and rethinking the transport layer to satisfy demanding workloads that require high burst bandwidth, high message rates, and low latency. Workloads such as storage have needed some of these attributes for a long time, however, with newer use cases such as massive-scale AI/ML training and high performance computing (HPC), the need has grown significantly. In the past, we’ve openly shared our learnings in traffic shaping, congestion control, load balancing, and more with the industry by contributing our ideas to the &lt;a href="https://www.acm.org/" target="_blank"&gt;Association for Computing Machinery&lt;/a&gt; and &lt;a href="https://ietf.org/" target="_blank"&gt;Internet Engineering Task Force&lt;/a&gt;. These ideas have been implemented in software and a few in hardware for several years. But going forward, we believe the industry at large will see more gains by implementing the set with dedicated and flexible hardware assist.&lt;/p&gt;&lt;p data-block-key="6nrfu"&gt;To achieve this goal, we developed Falcon to enable a step function in performance over software-only transports. 
Today at the &lt;a href="https://www.opencompute.org/summit/global-summit" target="_blank"&gt;OCP Global Summit&lt;/a&gt;, we are excited to open Falcon to the ecosystem through the &lt;a href="https://www.opencompute.org/" target="_blank"&gt;Open Compute Project&lt;/a&gt;, the natural venue to empower the community with Google’s production learnings to help modernize Ethernet.&lt;/p&gt;&lt;p data-block-key="2lr6f"&gt;As a hardware-assisted transport layer, Falcon is designed to be reliable, high performance, and low latency and leverages production-proven technologies including &lt;a href="https://research.google/pubs/pub46460/" target="_blank"&gt;Carousel&lt;/a&gt;, &lt;a href="https://research.google/pubs/pub48630/" target="_blank"&gt;Snap&lt;/a&gt;, &lt;a href="https://research.google/pubs/pub49448/" target="_blank"&gt;Swift&lt;/a&gt;, &lt;a href="https://research.google/pubs/pub52149/" target="_blank"&gt;PLB&lt;/a&gt;, and &lt;a href="https://datatracker.ietf.org/doc/html/draft-ravi-ippm-csig-00" target="_blank"&gt;CSIG&lt;/a&gt;.&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-image_full_width"&gt;






  
    &lt;div class="article-module h-c-page"&gt;
      &lt;div class="h-c-grid"&gt;
  

    &lt;figure class="article-image--large
      
      
        h-c-grid__col
        h-c-grid__col--6 h-c-grid__col--offset-3
        
        
      "
      &gt;

      
      
        
        &lt;img
            src="https://storage.googleapis.com/gweb-cloudblog-publish/images/1_Falcon.max-1000x1000.jpg"
        
          alt="1 Falcon"&gt;
        
        &lt;/a&gt;
      
    &lt;/figure&gt;

  
      &lt;/div&gt;
    &lt;/div&gt;
  




&lt;/div&gt;
&lt;div class="block-paragraph"&gt;&lt;p data-block-key="s2u0m"&gt;Falcon’s layers are illustrated in the figure below, including their associated function. We show the RDMA and NVM Express™ Upper layer protocols (ULPs), however, Falcon is extensible to additional ULPs as needed by the ecosystem.&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-image_full_width"&gt;






  
    &lt;div class="article-module h-c-page"&gt;
      &lt;div class="h-c-grid"&gt;
  

    &lt;figure class="article-image--large
      
      
        h-c-grid__col
        h-c-grid__col--6 h-c-grid__col--offset-3
        
        
      "
      &gt;

      
      
        
        &lt;img
            src="https://storage.googleapis.com/gweb-cloudblog-publish/images/2_Falcon.max-1000x1000.jpg"
        
          alt="2 Falcon"&gt;
        
        &lt;/a&gt;
      
    &lt;/figure&gt;

  
      &lt;/div&gt;
    &lt;/div&gt;
  




&lt;/div&gt;
&lt;div class="block-paragraph"&gt;&lt;p data-block-key="s2u0m"&gt;The lower layers of Falcon use three key insights to achieve low latency in high-bandwidth, yet lossy, Ethernet data center networks. Fine-grained hardware-assisted round-trip time (RTT) measurements with flexible, per-flow hardware-enforced traffic shaping, and fast and accurate packet retransmissions, are combined with multipath-capable and &lt;a href="https://cloud.google.com/blog/products/identity-security/announcing-psp-security-protocol-is-now-open-source"&gt;PSP-encrypted&lt;/a&gt; Falcon connections. On top of this foundation, Falcon has been designed from the ground up as a multi-protocol transport capable of supporting ULPs with widely varying performance requirements and application semantics. The ULP mapping layer not only provides out-of-the-box compatibility with Infiniband Verbs RDMA and NVMe ULPs, but also includes additional innovations critical for warehouse-scale applications such as flexible ordering semantics and graceful error handling. Last but not least, the hardware and software are co-designed to work together to help achieve the desired attributes of high message rate, low latency, and high bandwidth, while maintaining flexibility for programmability and continued innovation.&lt;/p&gt;&lt;p data-block-key="dap5b"&gt;Falcon reflects the central role that Ethernet continues to play in our industry. Falcon is designed for predictable high performance at warehouse scale, as well as flexibility and extensibility. We look forward to working with the community and industry partners to modernize Ethernet to serve the networking requirements of our AI-driven future. We believe that Falcon will be a valuable addition to the other ongoing efforts in this space.&lt;/p&gt;&lt;h3 data-block-key="p8og"&gt;Industry perspectives&lt;/h3&gt;&lt;p data-block-key="ed1k6"&gt;Our partners across the industry are enthusiastic about the promise that Falcon holds for developing the next generation of Ethernet.&lt;/p&gt;&lt;p data-block-key="b1lro"&gt;&lt;i&gt;“We welcome Google’s contribution of Falcon as it shares the Ultra Ethernet Consortium’s vision to drive Ethernet as the best data center fabric for AI and HPC, and look forward to continuing industry innovations in this important space.”&lt;/i&gt; - Dr. J Metz, Chair, Ultra Ethernet Consortium (led by AMD, Arista, Broadcom, Cisco, Eviden, Hewlett Packard Enterprise, Intel, Meta, Microsoft, and Oracle)&lt;/p&gt;&lt;p data-block-key="3m1kp"&gt;&lt;i&gt;“Falcon is first available in the Intel IPU E2000 series of products. The value of these IPUs is further enhanced as the first instance of an Ethernet transport to add low tail latency and congestion handling at scale. Intel is a Steering Member of Ultra Ethernet Consortium, which is working to evolve Ethernet for high performance AI and HPC workloads. We plan to deploy the resulting standards-based enhancements in future IPU and Ethernet products.”&lt;/i&gt; - Sachin Katti, SVP &amp;amp; GM, Network and Edge Group, Intel&lt;/p&gt;&lt;p data-block-key="ftfle"&gt;&lt;i&gt;"We are pleased to see a high-performance transport protocol for critical workloads such as AI and HPC that works over standard Ethernet/IP networks and enables massive application bandwidth at scale."&lt;/i&gt; - Hugh Holbrook, Group VP, SW Eng., Arista Networks&lt;/p&gt;&lt;p data-block-key="6a9pt"&gt;&lt;i&gt;“Cisco is pleased to see the contribution of Falcon to the OCP. Cisco has long supported open standards and believes in broad ecosystems. 
The rate and scale of modern data center networks and particularly AI/ML networks is unprecedented, presenting a challenge and opportunity to the industry. Falcon addresses many of the challenges of these networks, enabling efficient network utilization.”&lt;/i&gt; - Ofer Iny, Cisco Fellow, Cisco&lt;/p&gt;&lt;p data-block-key="c2se4"&gt;&lt;i&gt;“Juniper is a strong supporter of open ecosystems, and therefore we are pleased to see Falcon being opened to the OCP community. Falcon allows Ethernet to serve as the data center network-of-choice for demanding workloads, providing high-bandwidth, low tail latency and congestion mitigation. Falcon provides the industry with a proven solution today for demanding AI &amp;amp; ML workloads.”&lt;/i&gt; - Raj Yavatkar, Chief Technology Officer, Juniper&lt;/p&gt;&lt;p data-block-key="crjoq"&gt;&lt;i&gt;“Marvell strongly supports and is committed to the open Ethernet ecosystem as it evolves to support emerging, demanding workloads such as AI. We applaud the contribution of Falcon to OCP and welcome Google sharing practical experiences with the industry.”&lt;/i&gt; - Nick Kucharewski, SVP &amp;amp; GM Network Switching Group, Marvell&lt;/p&gt;&lt;h3 data-block-key="1g0i"&gt;Learn more&lt;/h3&gt;&lt;p data-block-key="d9t24"&gt;Networking is a foundational component in building the sustainable, secure, scalable societal infrastructure that we need for this AI-driven future. To learn more about Falcon, join us for the OCP Summit presentation, “A Reliable and Low Latency Ethernet Hardware Transport” by Google’s Nandita Dukkipati at 11:45am at the Expo Hall. We’ll contribute the Falcon specification to OCP in the first quarter of 2024.&lt;/p&gt;&lt;p data-block-key="2dmgl"&gt;To learn more about Google’s contributions to the Open Compute Project and our presence at the OCP Global Summit, check out the blog “&lt;a href="https://cloud.google.com/blog/topics/systems/google-systems-innovations-at-ocp-global-summit"&gt;How we’ll build sustainable, scalable, secure infrastructure for an AI-driven future&lt;/a&gt;”.&lt;/p&gt;&lt;/div&gt;</description><pubDate>Tue, 17 Oct 2023 15:30:00 +0000</pubDate><guid>https://cloud.google.com/blog/topics/systems/introducing-falcon-a-reliable-low-latency-hardware-transport/</guid><category>Networking</category><category>HPC</category><category>Systems</category><og xmlns:og="http://ogp.me/ns#"><type>article</type><title>Google opens Falcon, a reliable low-latency hardware transport, to the ecosystem</title><description></description><site_name>Google</site_name><url>https://cloud.google.com/blog/topics/systems/introducing-falcon-a-reliable-low-latency-hardware-transport/</url></og><author xmlns:author="http://www.w3.org/2005/Atom"><name>Dan Lenoski</name><title>VP of Engineering, Google Cloud</title><department></department><company></company></author><author xmlns:author="http://www.w3.org/2005/Atom"><name>Nandita Dukkipati</name><title>Principal Software Engineer, Google Cloud</title><department></department><company></company></author></item><item><title>Google’s Cloud TPU v4 provides exaFLOPS-scale ML with industry-leading efficiency</title><link>https://cloud.google.com/blog/topics/systems/tpu-v4-enables-performance-energy-and-co2e-efficiency-gains/</link><description>&lt;div class="block-paragraph"&gt;&lt;p&gt;&lt;i&gt;&lt;b&gt;Editor’s note&lt;/b&gt;: Today, two legendary Google engineers describe the “secret sauce” that has made TPU v4 a platform of choice for the world’s leading AI researchers and developers for training 
machine learning models at scale. &lt;a href="https://en.wikipedia.org/wiki/Norman_Jouppi" target="_blank"&gt;Norm Jouppi&lt;/a&gt; is the chief architect for all Google’s TPUs, from TPU v1 to TPU v4. He is a Google Fellow and a member of the National Academy of Engineering (NAE). &lt;a href="https://en.wikipedia.org/wiki/David_Patterson_(computer_scientist)" target="_blank"&gt;David Patterson&lt;/a&gt;, a Google Distinguished Engineer, shared the &lt;a href="https://www.nytimes.com/2018/03/21/technology/computer-chips-turing-award.html" target="_blank"&gt;ACM A.M. Turing Award&lt;/a&gt; and the &lt;a href="https://www.nae.edu/266390/RISC-Chip-Innovators-Receive-the-2022-Charles-Stark-Draper-Prize-for-Engineering" target="_blank"&gt;NAE Charles Draper Prize&lt;/a&gt;. David is one of the creators of RISC and RAID, and his recent research has been on the &lt;a href="https://ieeexplore.ieee.org/document/9810097" target="_blank"&gt;CO2e emissions from machine learning&lt;/a&gt;. &lt;/i&gt;&lt;/p&gt;&lt;hr/&gt;&lt;p&gt;Scaling computing performance is foundational to advancing the state of the art in machine learning (ML). Thanks to key innovations in interconnect technologies and domain specific accelerators (DSA), the Google Cloud TPU v4 enabled: &lt;/p&gt;&lt;ul&gt;&lt;li&gt;&lt;p&gt;a nearly 10x leap forward in scaling ML system performance over TPU v3 &lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;boosting energy efficiency ~2-3x compared to contemporary ML DSAs, and &lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;reducing CO2e as much as ~20x over these DSAs in typical on-premise data centers&lt;sup&gt;1&lt;/sup&gt;.&lt;/p&gt;&lt;/li&gt;&lt;/ul&gt;&lt;p&gt;As such, the performance, scalability, efficiency, and availability of TPU v4 make it an ideal vehicle for large language models.&lt;/p&gt;&lt;p&gt;TPU v4 provides exascale ML performance, with 4096 chips interconnected by an internally-developed industry-leading optical circuit switch (OCS). You can see one eighth of a TPU v4 pod below. Google’s Cloud TPU v4 outperforms TPU v3 by 2.1x on average on a per-chip basis and improves performance/Watt by 2.7x. The mean TPU v4 chip power is typically only 200W.&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-image_full_width"&gt;






  
    &lt;div class="article-module h-c-page"&gt;
      &lt;div class="h-c-grid"&gt;
  

    &lt;figure class="article-image--large
      
      
        h-c-grid__col
        h-c-grid__col--6 h-c-grid__col--offset-3
        
        
      "
      &gt;

      
      
        
        &lt;img
            src="https://storage.googleapis.com/gweb-cloudblog-publish/images/1_Cloud_TPU_v4.max-1000x1000.jpg"
        
          alt="1 Cloud TPU v4.jpg"&gt;
        
        &lt;/a&gt;
      
        &lt;figcaption class="article-image__caption "&gt;&lt;i&gt;One eighth of a TPU v4 pod from Google's &lt;a href="https://cloud.google.com/blog/products/compute/google-unveils-worlds-largest-publicly-available-ml-cluster"&gt;world’s largest publicly available ML cluster&lt;/a&gt; &lt;br/&gt;located in Oklahoma, which runs on ~90% carbon-free energy.&lt;/i&gt;&lt;/figcaption&gt;
      
    &lt;/figure&gt;

  
      &lt;/div&gt;
    &lt;/div&gt;
  




&lt;/div&gt;
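&lt;div class="block-paragraph"&gt;&lt;p&gt;&lt;i&gt;As a rough back-of-the-envelope illustration (our own arithmetic, not figures from the paper): combining the pod size and mean chip power quoted above with the publicly listed ~275 bf16 TFLOPS peak per TPU v4 chip (an assumption not stated in this post) gives a sense of the scale of a full pod.&lt;/i&gt;&lt;/p&gt;&lt;pre&gt;
# Back-of-the-envelope pod math from the figures quoted above.
# The 275 bf16 TFLOPS per-chip peak is an assumption taken from public
# Cloud TPU v4 specifications, not from this post.
chips_per_pod = 4096          # TPU v4 pod size (from the post)
mean_chip_power_w = 200       # typical per-chip power (from the post)
peak_tflops_per_chip = 275    # assumed public bf16 peak per chip

pod_chip_power_mw = chips_per_pod * mean_chip_power_w / 1e6
pod_peak_exaflops = chips_per_pod * peak_tflops_per_chip * 1e12 / 1e18

print(round(pod_chip_power_mw, 2))   # ~0.82 MW of aggregate chip power
print(round(pod_peak_exaflops, 1))   # ~1.1 exaFLOPS of aggregate peak compute
&lt;/pre&gt;&lt;/div&gt;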
&lt;div class="block-paragraph"&gt;&lt;p&gt;TPU v4 is the first supercomputer to deploy a reconfigurable OCS. OCSes dynamically reconfigure their interconnect topology to improve scale, availability, utilization, modularity, deployment, security, power, and performance. Much cheaper, lower power, and faster than Infiniband, OCSes and underlying optical components are &amp;lt;5% of TPU v4’s system cost and &amp;lt;5% of system power. &lt;a href="https://dl.acm.org/doi/pdf/10.1145/2829988.2787508" target="_blank"&gt;The figure below shows how an OCS works&lt;/a&gt;, using two MEMs arrays. No optical to electrical to optical conversion or power-hungry network packet switches are required, saving power.&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-image_full_width"&gt;






  
    &lt;div class="article-module h-c-page"&gt;
      &lt;div class="h-c-grid"&gt;
  

    &lt;figure class="article-image--large
      
      
        h-c-grid__col
        h-c-grid__col--6 h-c-grid__col--offset-3
        
        
      "
      &gt;

      
      
        
        &lt;img
            src="https://storage.googleapis.com/gweb-cloudblog-publish/images/2_Cloud_TPU_v4.max-1000x1000.jpg"
        
          alt="2 Cloud TPU v4.jpg"&gt;
        
        &lt;/a&gt;
      
    &lt;/figure&gt;

  
      &lt;/div&gt;
    &lt;/div&gt;
  




&lt;/div&gt;
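&lt;div class="block-paragraph"&gt;&lt;p&gt;&lt;i&gt;Conceptually, an OCS behaves like a remotely programmable patch panel: MEMS mirrors steer light from each input fiber to one output fiber, so the switch state is simply a permutation of ports that can be rewritten without terminating the optical signal. The toy sketch below captures that abstraction only; it is purely illustrative and models none of the real MEMS control or fabric management.&lt;/i&gt;&lt;/p&gt;&lt;pre&gt;
# Toy model of an optical circuit switch as a reconfigurable port permutation.
# Purely illustrative; real OCS control involves MEMS mirror calibration,
# telemetry, and a fabric-wide topology manager that are not modeled here.

class OpticalCircuitSwitch:
    def __init__(self, num_ports):
        self.num_ports = num_ports
        self.mapping = {}  # input port to output port (a partial permutation)

    def connect(self, in_port, out_port):
        # Steering a mirror pair creates a light path: no packet processing,
        # no optical-electrical-optical conversion along the way.
        if out_port in self.mapping.values():
            raise ValueError("output port already in use")
        self.mapping[in_port] = out_port

    def reconfigure(self, new_mapping):
        # Rewriting the whole permutation changes the fabric topology,
        # e.g. to route around a failed block or match a job's shape.
        self.mapping = dict(new_mapping)

ocs = OpticalCircuitSwitch(num_ports=8)
ocs.connect(0, 5)
ocs.reconfigure({0: 5, 1: 4, 2: 7})   # new topology in a single step
&lt;/pre&gt;&lt;/div&gt;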
&lt;div class="block-paragraph"&gt;&lt;p&gt;The combination of powerful yet efficient processors and a distributed shared memory system provides remarkable scalability for deep neural network models. The scalability of TPU v4 production workloads on a variety of model types is shown below on a log-log scale.&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-image_full_width"&gt;






  
    &lt;div class="article-module h-c-page"&gt;
      &lt;div class="h-c-grid"&gt;
  

    &lt;figure class="article-image--large
      
      
        h-c-grid__col
        h-c-grid__col--6 h-c-grid__col--offset-3
        
        
      "
      &gt;

      
      
        
        &lt;img
            src="https://storage.googleapis.com/gweb-cloudblog-publish/images/3_Cloud_TPU_v4.max-1000x1000.jpg"
        
          alt="3 Cloud TPU v4.jpg"&gt;
        
        &lt;/a&gt;
      
    &lt;/figure&gt;

  
      &lt;/div&gt;
    &lt;/div&gt;
  




&lt;/div&gt;
&lt;div class="block-paragraph"&gt;&lt;p&gt;Dynamic OCS reconfigurability also helps with availability. Circuit switching makes it easy to route around failed components so that long-running tasks like ML training can utilize thousands of processors for weeks at a time. This flexibility even allows us to change the topology of the supercomputer interconnect to accelerate the performance of an ML model.&lt;/p&gt;&lt;p&gt;The performance, scalability, and availability make TPU supercomputers the workhorses of large language models like LaMDA, MUM, and &lt;a href="https://ai.googleblog.com/2022/04/pathways-language-model-palm-scaling-to.html" target="_blank"&gt;PaLM&lt;/a&gt;. The 540B-parameter PaLM model sustained a remarkable &lt;i&gt;57.8% of the peak hardware floating point performance over 50 days&lt;/i&gt; while training on TPU v4 supercomputers. TPU v4’s scalable interconnect also unlocks multidimensional model-partitioning techniques that enable &lt;a href="https://arxiv.org/abs/2211.05102" target="_blank"&gt;low-latency, high-throughput inference&lt;/a&gt; for these LMs.&lt;/p&gt;&lt;p&gt;TPU supercomputers are also the first with hardware support for embeddings, a key component of Deep Learning Recommendation Models (DLRMs) used in advertising, search ranking, YouTube, and Google Play. Each TPU v4 includes third-generation SparseCores, dataflow processors that accelerate models that rely on embeddings by 5x–7x yet use only 5% of die area and power. &lt;/p&gt;&lt;p&gt;The performance of an internal recommendation model on CPUs, TPU v3, TPU v4, and TPU v4 with embeddings in CPU memory (not using SparseCore) is shown below. The TPU v4 SparseCore is 3X faster than TPU v3 on recommendation models, and 5–30X faster than systems using CPUs.&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-image_full_width"&gt;






  
    &lt;div class="article-module h-c-page"&gt;
      &lt;div class="h-c-grid"&gt;
  

    &lt;figure class="article-image--large
      
      
        h-c-grid__col
        h-c-grid__col--6 h-c-grid__col--offset-3
        
        
      "
      &gt;

      
      
        
        &lt;img
            src="https://storage.googleapis.com/gweb-cloudblog-publish/images/4_Cloud_TPU_v4.max-1000x1000.jpg"
        
          alt="4 Cloud TPU v4.jpg"&gt;
        
        &lt;/a&gt;
      
    &lt;/figure&gt;

  
      &lt;/div&gt;
    &lt;/div&gt;
  




&lt;/div&gt;
&lt;div class="block-paragraph"&gt;&lt;p&gt;Embeddings processing requires significant all-to-all communication, since the embeddings are distributed around TPU chips working together on a model. This pattern stresses the bandwidth of the shared memory interconnect. That’s why TPU v4 uses a 3D torus interconnect (vs. TPU v2 and v3 which used a 2D torus). TPU v4’s 3D torus provides a higher bisection bandwidth — i.e., the bandwidth from one half of the chips to the other half across the middle of the interconnect — to help support the larger number of chips and the higher SparseCore v3 performance. The figure below shows the significant bandwidth and performance increase from the 3D torus.&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-image_full_width"&gt;






  
    &lt;div class="article-module h-c-page"&gt;
      &lt;div class="h-c-grid"&gt;
  

    &lt;figure class="article-image--large
      
      
        h-c-grid__col
        h-c-grid__col--6 h-c-grid__col--offset-3
        
        
      "
      &gt;

      
      
        
        &lt;img
            src="https://storage.googleapis.com/gweb-cloudblog-publish/images/5_Cloud_TPU_v4.max-1000x1000.jpg"
        
          alt="5 Cloud TPU v4.jpg"&gt;
        
        &lt;/a&gt;
      
    &lt;/figure&gt;

  
      &lt;/div&gt;
    &lt;/div&gt;
  




&lt;/div&gt;
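&lt;div class="block-paragraph"&gt;&lt;p&gt;&lt;i&gt;To make the bisection-bandwidth advantage concrete, here is a rough illustration using the standard bisection-width formula for a wraparound torus (our own arithmetic, not numbers from the TPU v4 paper): at 4096 chips, a 64x64 2D torus is cut by 2 × 64 = 128 links across its bisection, while a 16x16x16 3D torus is cut by 2 × 16 × 16 = 512 links, roughly 4x more capacity across the middle of the machine for all-to-all traffic.&lt;/i&gt;&lt;/p&gt;&lt;pre&gt;
# Bisection width of a k-ary n-cube (a wraparound torus) with even k:
# cutting the machine in half severs 2 * k**(n-1) bidirectional links.
# Illustrative comparison at 4096 chips; not taken from the TPU v4 paper.

def torus_bisection_links(k, n):
    return 2 * k ** (n - 1)

two_d = torus_bisection_links(64, 2)    # 64 x 64 torus: 128 links
three_d = torus_bisection_links(16, 3)  # 16 x 16 x 16 torus: 512 links
print(two_d, three_d, three_d / two_d)  # 128 512 4.0
&lt;/pre&gt;&lt;/div&gt;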
&lt;div class="block-paragraph"&gt;&lt;p&gt;TPU v4 has been operational at Google since 2020 and became available for customers on Google Cloud &lt;a href="https://cloud.google.com/blog/products/compute/google-unveils-worlds-largest-publicly-available-ml-cluster"&gt;last year&lt;/a&gt;. Since its launch, TPU v4 supercomputers have been actively used by leading AI teams around the globe for cutting-edge ML research and production workloads across language models, recommender systems and generative AI. For example, the &lt;a href="https://blog.allenai.org/cloud-tpus-unlock-many-large-scale-high-impact-projects-at-ai2-7aca9229e2c6" target="_blank"&gt;Allen Institute for AI&lt;/a&gt;, a non-profit institute founded by Paul Allen with the mission of conducting high-impact AI research for the common good, greatly benefitted from TPU v4 architecture and was able to unlock many of their large-scale, high-impact research initiatives. &lt;/p&gt;&lt;p&gt;“More recently, a number of researchers have turned to Cloud TPUs for their easy ability to distribute across many processing units. With GPUs, once you scale beyond a single machine you need to adjust your code for distribution and you might be disappointed by the connection speeds between your servers,” said Michael Schmitz, Senior Director of Engineering, Allen Institute for AI. “But with Cloud TPUs you can seamlessly scale individual workloads to thousands of chips, where all chips are directly connected to each other via a high-speed mesh network.”&lt;/p&gt;&lt;p&gt;&lt;a href="https://www.googlecloudpresscorner.com/2023-03-14-Midjourney-Selects-Google-Cloud-to-Power-AI-Generated-Creative-Platform" target="_blank"&gt;Midjourney&lt;/a&gt;, one of the leading text-to-image AI startups, have been using Cloud TPU v4 to train their state-of-the-art model, coincidentally also called “version four”. &lt;/p&gt;&lt;p&gt;"We’re proud to work with Google Cloud to deliver a seamless experience for our creative community powered by Google’s globally scalable infrastructure,” said David Holz, founder and CEO of Midjourney. “From training the fourth version of our algorithm on the latest v4 TPUs with JAX, to running inference on GPUs, we have been impressed by the speed at which TPU v4 allows our users to bring their vibrant ideas to life.”&lt;/p&gt;&lt;p&gt;We are proud to share additional details of our TPU v4 research in a &lt;a href="https://arxiv.org/abs/2304.01433" target="_blank"&gt;paper&lt;/a&gt; that will be presented at the &lt;a href="https://iscaconf.org/isca2023/" target="_blank"&gt;International Symposium on Computer Architecture&lt;/a&gt;, and we look forward to discussing our findings with the community. &lt;/p&gt;&lt;hr/&gt;&lt;p&gt;&lt;i&gt;&lt;sup&gt;The authors thank many of Google's engineering and product teams for making TPU v4 a success story. We also want to thank Amin Vahdat, Mark Lohmeyer, Maud Texier, James Bradbury and Max Sapozhnikov for their contributions to this blog post.&lt;/sup&gt;&lt;/i&gt;&lt;br/&gt;&lt;/p&gt;&lt;p&gt;&lt;i&gt;&lt;sup&gt;1. 
This ~20x improvement comes from a combination of: ~2-3x more energy efficient TPUs, ~1.4x lower &lt;a href="https://en.wikipedia.org/wiki/Power_usage_effectiveness" target="_blank"&gt;PUE&lt;/a&gt; of Google data centers relative to on-premise data centers, and ~6x for the cleaner energy in Oklahoma that houses all Cloud TPU v4 supercomputers versus the average energy cleanliness of the typical on-premise data center.&lt;/sup&gt;&lt;/i&gt;&lt;/p&gt;&lt;/div&gt;</description><pubDate>Wed, 05 Apr 2023 15:00:00 +0000</pubDate><guid>https://cloud.google.com/blog/topics/systems/tpu-v4-enables-performance-energy-and-co2e-efficiency-gains/</guid><category>AI &amp; Machine Learning</category><category>Compute</category><category>Systems</category><media:content height="540" url="https://storage.googleapis.com/gweb-cloudblog-publish/images/tpu-v2-6.max-600x600.jpg" width="540"></media:content><og xmlns:og="http://ogp.me/ns#"><type>article</type><title>Google’s Cloud TPU v4 provides exaFLOPS-scale ML with industry-leading efficiency</title><description></description><image>https://storage.googleapis.com/gweb-cloudblog-publish/images/tpu-v2-6.max-600x600.jpg</image><site_name>Google</site_name><url>https://cloud.google.com/blog/topics/systems/tpu-v4-enables-performance-energy-and-co2e-efficiency-gains/</url></og><author xmlns:author="http://www.w3.org/2005/Atom"><name>Norm Jouppi</name><title>Google Fellow</title><department>Google</department><company></company></author><author xmlns:author="http://www.w3.org/2005/Atom"><name>David Patterson</name><title>Google Distinguished Engineer, Google</title><department></department><company></company></author></item></channel></rss>