<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:media="http://search.yahoo.com/mrss/"><channel><title>High Performance Computing</title><link>https://cloud.google.com/blog/topics/hpc/</link><description>High Performance Computing</description><atom:link href="https://cloudblog.withgoogle.com/blog/topics/hpc/rss/" rel="self"></atom:link><language>en</language><lastBuildDate>Wed, 04 Mar 2026 17:00:05 +0000</lastBuildDate><image><url>https://cloud.google.com/blog/topics/hpc/static/blog/images/google.a51985becaa6.png</url><title>High Performance Computing</title><link>https://cloud.google.com/blog/topics/hpc/</link></image><item><title>H4D VMs, now GA, deliver exceptional performance and scaling for HPC workloads</title><link>https://cloud.google.com/blog/products/compute/h4d-vms-now-ga/</link><description>&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Today, we’re announcing  the &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;general availability of H4D VMs&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;, our latest high performance computing (HPC)-optimized VM, powered by the 5th Generation AMD EPYC&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;™ processors&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;. H4D VMs deliver exceptional performance, scalability, and value for industries like manufacturing, health care and life sciences, weather forecasting, and electronic design automation (EDA).&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt; H4D supports orchestration via Cluster Toolkit with Slurm and via Google Kubernetes Engine (GKE). Each approach allows for near-instant deployment and scaling of demanding workloads.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;For the first time, the Google Cloud CPU portfolio features a VM family with &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;C&lt;/strong&gt;&lt;strong style="vertical-align: baseline;"&gt;loud Remote Direct Memory Access (RDMA).&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; &lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;H4D’s RDMA is on the &lt;/span&gt;&lt;a href="https://cloud.google.com/titanium"&gt;&lt;strong style="text-decoration: underline; vertical-align: baseline;"&gt;Titanium network adapter&lt;/strong&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; and lets you scale single-node H4D performance to multiple nodes, accelerating large production workloads. &lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;span style="vertical-align: baseline;"&gt;Faster time to solution across domains and scales&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Powered by the high core density of the 5th Gen AMD EPYC CPU and Google’s innovative, low-latency &lt;/span&gt;&lt;a href="https://cloud.google.com/blog/topics/systems/introducing-falcon-a-reliable-low-latency-hardware-transport"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Falcon hardware transport&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;,&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt; H4D VMs enable you to iterate and discover faster than ever before.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;We demonstrated H4D performance through a series of industry-standard benchmarks, showing its capabilities across diverse domains and problem sizes.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Healthcare and life sciences&lt;br/&gt;&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;For researchers in healthcare and life sciences (HCLS), H4D VMs accelerate complex molecular simulations critical to scientific discovery. Compared to our previous C2D VMs, H4D VMs deliver up to a 4.3X speedup running LAMMPS (LJ benchmark) at 96 VMs, delivering 95% parallel efficiency on 18k cores. For drug discovery, we demonstrated a 5.8X speedup using GROMACS (water_33m) at 32 VMs, delivering 72% parallel efficiency on 6k cores. H4D also delivers further scalability, which we demonstrated by running the LAMMPS LJ benchmark on 192 VMs (&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;~37k cores) while maintaining 92% parallel efficiency (see Figure 3).&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-image_full_width"&gt;
  &lt;div class="article-module h-c-page"&gt;
    &lt;div class="h-c-grid"&gt;
      &lt;figure class="article-image--large h-c-grid__col h-c-grid__col--6 h-c-grid__col--offset-3"&gt;
        &lt;img src="https://storage.googleapis.com/gweb-cloudblog-publish/images/1_JTLuwUW.max-1000x1000.jpg" alt="Figures 1 and 2"&gt;
      &lt;/figure&gt;
    &lt;/div&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;div class="block-image_full_width"&gt;
  &lt;div class="article-module h-c-page"&gt;
    &lt;div class="h-c-grid"&gt;
      &lt;figure class="article-image--medium h-c-grid__col h-c-grid__col--4 h-c-grid__col--offset-4"&gt;
        &lt;img src="https://storage.googleapis.com/gweb-cloudblog-publish/original_images/2_RA1vjLg.jpg" alt="Figure 3"&gt;
      &lt;/figure&gt;
    &lt;/div&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Manufacturing&lt;br/&gt;&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;For manufacturing, H4D VMs help engineers shorten design cycles, run larger simulations, and iterate faster by delivering a strong performance boost for mission-critical Computer-Aided Engineering (CAE) workflows. When running complex Computational Fluid Dynamics (CFD) simulations with Ansys Fluent (F1_RaceCar_140m benchmark) on 32 VMs, H4D VMs deliver a 4.1X speedup over our previous C2D VMs with 85% parallel efficiency. Running open-source OpenFOAM (Motorbike_100m), we demonstrated a 5.2X speedup over C2D using 16 VMs, achieving a superlinear parallel efficiency of 122%.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-image_full_width"&gt;
  &lt;div class="article-module h-c-page"&gt;
    &lt;div class="h-c-grid"&gt;
      &lt;figure class="article-image--large h-c-grid__col h-c-grid__col--6 h-c-grid__col--offset-3"&gt;
        &lt;img src="https://storage.googleapis.com/gweb-cloudblog-publish/original_images/3_9YSJuty.jpg" alt="Figures 4 and 5"&gt;
      &lt;/figure&gt;
    &lt;/div&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;h3&gt;&lt;span style="vertical-align: baseline;"&gt;A new standard for HPC price/performance&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;H4D VMs are designed to deliver the best price-performance for HPC workloads on Google Cloud by pairing superior performance with flexible consumption models. H4D supports Dynamic Workload Scheduler (DWS), which adapts to your workflow with Flex Start mode for just-in-time capacity and Calendar mode for guaranteed reservations. This allows you to access compute for as low as 3 cents per core-hour without long-term commitments. The resulting performance and cost efficiencies over previous generation VMs are detailed in Figures 6 and 7. &lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-image_full_width"&gt;
  &lt;div class="article-module h-c-page"&gt;
    &lt;div class="h-c-grid"&gt;
      &lt;figure class="article-image--large h-c-grid__col h-c-grid__col--6 h-c-grid__col--offset-3"&gt;
        &lt;img src="https://storage.googleapis.com/gweb-cloudblog-publish/original_images/4_VFxG3YM.jpg" alt="Figure 6"&gt;
      &lt;/figure&gt;
    &lt;/div&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;div class="block-image_full_width"&gt;
  &lt;div class="article-module h-c-page"&gt;
    &lt;div class="h-c-grid"&gt;
      &lt;figure class="article-image--large h-c-grid__col h-c-grid__col--6 h-c-grid__col--offset-3"&gt;
        &lt;img src="https://storage.googleapis.com/gweb-cloudblog-publish/original_images/5_FKrLh4Z.jpg" alt="Figure 7"&gt;
      &lt;/figure&gt;
    &lt;/div&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;h3&gt;&lt;span style="vertical-align: baseline;"&gt;Comprehensive HPC management&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;To manage and deploy large, dense clusters of H4D VMs, you can leverage Google Cloud’s &lt;/span&gt;&lt;a href="https://docs.cloud.google.com/ai-hypercomputer/docs/cluster-capabilities"&gt;&lt;strong style="text-decoration: underline; vertical-align: baseline;"&gt;Cluster Director&lt;/strong&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, which offers advanced maintenance capabilities (you can sign up for the preview &lt;/span&gt;&lt;a href="https://forms.gle/dppWNms5DF44gCwV9" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;here&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;) alongside the &lt;/span&gt;&lt;a href="https://docs.cloud.google.com/cluster-toolkit/docs/overview"&gt;&lt;strong style="text-decoration: underline; vertical-align: baseline;"&gt;Cluster Toolkit&lt;/strong&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; for rapid cluster deployment  via turnkey system blueprints. For job and workload management, H4D VMs integrate with &lt;/span&gt;&lt;a href="https://docs.cloud.google.com/batch/docs/get-started"&gt;&lt;strong style="text-decoration: underline; vertical-align: baseline;"&gt;Batch&lt;/strong&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, Google Cloud’s fully managed, cloud-native service that handles queuing, scheduling, and resource provisioning. Additionally, there’s support for &lt;/span&gt;&lt;a href="https://cloud.google.com/products/dws/pricing?e=48754805"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;DWS&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, which can be used in both Calendar mode for future reservations and Flex Start mode for time-limited, on-demand usage.&lt;/span&gt;&lt;/p&gt;
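&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;As an illustration of the Batch workflow, a job defined in a JSON config file can be submitted from the gcloud CLI. This is a minimal sketch: the job name, region, and config file below are placeholders rather than H4D-specific settings, and the job definition itself (machine type, task script, and so on) lives in the referenced file.&lt;/span&gt;&lt;/p&gt;
&lt;div class="block-code"&gt;&lt;dl&gt;
    &lt;dt&gt;code_block&lt;/dt&gt;
    &lt;dd&gt;# Submit a Batch job described in job.json (placeholder names; adjust the region and config to your setup)
gcloud batch jobs submit my-hpc-job \
    --location=us-central1 \
    --config=job.json

# Check the job status once it has been queued and scheduled
gcloud batch jobs describe my-hpc-job --location=us-central1&lt;/dd&gt;
&lt;/dl&gt;&lt;/div&gt;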
&lt;h3&gt;&lt;span style="vertical-align: baseline;"&gt;What customers and partners are saying&lt;/span&gt;&lt;/h3&gt;&lt;/div&gt;
&lt;div class="block-paragraph_with_image"&gt;&lt;div class="article-module h-c-page"&gt;
  &lt;div class="h-c-grid uni-paragraph-wrap"&gt;
    &lt;div class="uni-paragraph
      h-c-grid__col h-c-grid__col--8 h-c-grid__col-m--6 h-c-grid__col-l--6
      h-c-grid__col--offset-2 h-c-grid__col-m--offset-3 h-c-grid__col-l--offset-3"&gt;

      






  

    &lt;figure class="article-image--wrap-small"&gt;
      &lt;img src="https://storage.googleapis.com/gweb-cloudblog-publish/images/jump.max-1000x1000.jpg" alt="jump"&gt;
    &lt;/figure&gt;

  





      &lt;p data-block-key="ciutv"&gt;&lt;i&gt;“We were able to test the H4D platform in early access at&lt;/i&gt; &lt;a href="https://www.jumptrading.com/"&gt;&lt;i&gt;Jump Trading&lt;/i&gt;&lt;/a&gt;&lt;i&gt;, and were extremely impressed with the results. The successful testing process demonstrated that H4D offers the performance, stability, and efficiency we require for demanding, high-volume operations. We see up to 50% better price/performance compared to prior generation machines and are now accelerating integration with our critical grid workloads on Google Cloud."&lt;/i&gt; &lt;b&gt;- Alex Davies, Chief Technology Officer &amp;amp; Benjamin Stromski, HPC Linux Engineering, Jump Trading&lt;/b&gt;&lt;/p&gt;
    &lt;/div&gt;
  &lt;/div&gt;
&lt;/div&gt;

&lt;/div&gt;
&lt;div class="block-paragraph_with_image"&gt;&lt;div class="article-module h-c-page"&gt;
  &lt;div class="h-c-grid uni-paragraph-wrap"&gt;
    &lt;div class="uni-paragraph
      h-c-grid__col h-c-grid__col--8 h-c-grid__col-m--6 h-c-grid__col-l--6
      h-c-grid__col--offset-2 h-c-grid__col-m--offset-3 h-c-grid__col-l--offset-3"&gt;

      






  

    &lt;figure class="article-image--wrap-small"&gt;
      &lt;img src="https://storage.googleapis.com/gweb-cloudblog-publish/images/hmx_labs.max-1000x1000.jpg" alt="hmx labs"&gt;
    &lt;/figure&gt;

  





&lt;p data-block-key="ciutv"&gt;&lt;i&gt;“There lingers, especially in large-scale and compute-intensive domains, the idea that the fastest systems can only be built on premises and run on bare metal hardware. Terms such as ‘hypervisor tax’ are often thrown around as justification for operating with bare metal. Our testing paints a different picture. The Google H4D VM performs better on our financial risk benchmark than the bare metal top of stack AMD CPU of the same generation.”&lt;/i&gt; &lt;b&gt;- Hamza Mian/CEO, HMxLabs&lt;/b&gt;&lt;/p&gt;
    &lt;/div&gt;
  &lt;/div&gt;
&lt;/div&gt;

&lt;/div&gt;
&lt;div class="block-paragraph_with_image"&gt;&lt;div class="article-module h-c-page"&gt;
  &lt;div class="h-c-grid uni-paragraph-wrap"&gt;
    &lt;div class="uni-paragraph
      h-c-grid__col h-c-grid__col--8 h-c-grid__col-m--6 h-c-grid__col-l--6
      h-c-grid__col--offset-2 h-c-grid__col-m--offset-3 h-c-grid__col-l--offset-3"&gt;

      






  

    &lt;figure class="article-image--wrap-small"&gt;
      &lt;img src="https://storage.googleapis.com/gweb-cloudblog-publish/images/totalcare.max-1000x1000.jpg" alt="totalcare"&gt;
    &lt;/figure&gt;

  





      &lt;p data-block-key="ciutv"&gt;&lt;i&gt;"As a leading provider of managed HPC solutions for the demanding CAE and manufacturing sectors, our evaluation of the H4D platform was focused heavily on its ability to handle our clients' largest, most tightly-coupled simulation workloads. We are extremely impressed with the results. The testing confirmed that the underlying RDMA fabric exhibits the outstanding low-latency and high-bandwidth performance required for massive parallel processing. This level of interconnect efficiency is non-negotiable for speeding up critical manufacturing simulations like crash testing and CFD. H4D has proven itself to be a true accelerator for high-throughput engineering workloads, and we are excited about its potential to redefine the performance ceiling for HPC in the engineering world."&lt;/i&gt; &lt;b&gt;- Rodney Mach/President, TotalCAE&lt;/b&gt;&lt;/p&gt;
    &lt;/div&gt;
  &lt;/div&gt;
&lt;/div&gt;

&lt;/div&gt;
&lt;div class="block-paragraph_with_image"&gt;&lt;div class="article-module h-c-page"&gt;
  &lt;div class="h-c-grid uni-paragraph-wrap"&gt;
    &lt;div class="uni-paragraph
      h-c-grid__col h-c-grid__col--8 h-c-grid__col-m--6 h-c-grid__col-l--6
      h-c-grid__col--offset-2 h-c-grid__col-m--offset-3 h-c-grid__col-l--offset-3"&gt;

      






  

    &lt;figure class="article-image--wrap-small"&gt;
      &lt;img src="https://storage.googleapis.com/gweb-cloudblog-publish/images/Google.max-1000x1000.jpg" alt="Google"&gt;
    &lt;/figure&gt;

  





&lt;p data-block-key="ciutv"&gt;&lt;i&gt;“The new H4D instances are a significant step forward for our demanding next-generation TPU simulation workloads. We've seen a 30% performance improvement across a variety of EDA benchmarks compared to C2D, demonstrating the strong single-core performance of H4D. This directly translates to faster development cycles and allows our engineering teams to iterate more quickly.”&lt;/i&gt; &lt;b&gt;- Trevor Switkowski, Technical Lead of Chip Design Methodology, Google Cloud&lt;/b&gt;&lt;/p&gt;
    &lt;/div&gt;
  &lt;/div&gt;
&lt;/div&gt;

&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;h3&gt;&lt;span style="vertical-align: baseline;"&gt;Experience H4D today&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;H4D is now available in &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;us-central1-a (Iowa), europe-west4-b (Netherlands) and asia-southeast1-a (Singapore)&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; with additional regions coming soon. Check regional availability on our &lt;/span&gt;&lt;a href="https://cloud.google.com/compute/docs/regions-zones#available"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Regions and Zones page&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; and deploy your most demanding HPC workloads by leveraging &lt;/span&gt;&lt;a href="https://docs.cloud.google.com/compute/docs/instances/create-vm-with-rdma"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Cloud RDMA&lt;/span&gt;&lt;/a&gt;&lt;strong style="vertical-align: baseline;"&gt;. &lt;/strong&gt;&lt;/p&gt;
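&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;As a minimal sketch, a single H4D instance can be created with the gcloud CLI using one of the zones above and a machine shape such as h4d-highmem-192 (the instance name below is a placeholder, and multi-node Cloud RDMA setups require the additional network configuration described in the guide linked above).&lt;/span&gt;&lt;/p&gt;
&lt;div class="block-code"&gt;&lt;dl&gt;
    &lt;dt&gt;code_block&lt;/dt&gt;
    &lt;dd&gt;# Create a single H4D VM (placeholder instance name; pick a zone where H4D is available)
gcloud compute instances create my-h4d-vm \
    --zone=us-central1-a \
    --machine-type=h4d-highmem-192&lt;/dd&gt;
&lt;/dl&gt;&lt;/div&gt;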
&lt;hr/&gt;
&lt;p&gt;&lt;sub&gt;&lt;em&gt;&lt;span style="vertical-align: baseline;"&gt;The following configurations were run for the above benchmarks: &lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;LAMMPS version 20250722, GROMACS version 2023.1, OpenFOAM version 2312, Ansys Fluent version 2024R1. All runs used Intel MPI 2021.17.2. C2D/C3D/C4D used TCP; H4D used RDMA with RXM &amp;amp; SAR_LIMIT=2G. All runs used the full ppn (processes-per-node) available on each platform (56, 180, and 192 for C2D, C3D, and C4D/H4D, respectively). Ansys Fluent runs used 168 ppn on H4D and variable ppn for C4D. SMT off for all. Cost comparison across single nodes of h4d-highmem-192 with DWS Flex Start price, c3d-standard-360 and c2d-standard-112 on-demand price.&lt;/span&gt;&lt;/em&gt;&lt;/sub&gt;&lt;/p&gt;
&lt;p&gt;&lt;sub&gt;&lt;em&gt;&lt;span style="vertical-align: baseline;"&gt;Parallel efficiency and optimal node count depend on input size and communication patterns, and therefore vary across workloads.&lt;/span&gt;&lt;/em&gt;&lt;/sub&gt;&lt;/p&gt;&lt;/div&gt;</description><pubDate>Wed, 04 Mar 2026 17:00:00 +0000</pubDate><guid>https://cloud.google.com/blog/products/compute/h4d-vms-now-ga/</guid><category>HPC</category><category>Compute</category><og xmlns:og="http://ogp.me/ns#"><type>article</type><title>H4D VMs, now GA, deliver exceptional performance and scaling for HPC workloads</title><description></description><site_name>Google</site_name><url>https://cloud.google.com/blog/products/compute/h4d-vms-now-ga/</url></og><author xmlns:author="http://www.w3.org/2005/Atom"><name>Aysha Keen</name><title>Product Manager</title><department></department><company></company></author><author xmlns:author="http://www.w3.org/2005/Atom"><name>Felix Schürmann</name><title>Senior HPC Technologist</title><department></department><company></company></author></item><item><title>Accelerating discovery at the speed of cloud: What’s New for HPC at Google Cloud for SC25</title><link>https://cloud.google.com/blog/topics/hpc/accelerating-innovation-and-discovery-at-sc25/</link><description>&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;With the pace of scientific discovery moving faster than ever, we’re excited to join the supercomputing community as it gets ready for its annual flagship event, &lt;/span&gt;&lt;a href="https://sc25.supercomputing.org/" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;SC25&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, in St. Louis from November 16-21, 2025. There, we’ll share how Google Cloud is poised to help with our lineup of HPC and AI technologies and innovations, helping researchers, scientists, and engineers solve some of humanity's biggest challenges.&lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Redefining supercomputing with cloud-native HPC&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;S&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;upercomputers are evolving from a rigid, capital-intensive resource into an adaptable, scalable service. To go from “HPC in the cloud” to “cloud-native HPC,” we leverage core principles of automation and elastic infrastructure to fundamentally change how you consume HPC resources, allowing you to spin up purpose-built clusters in minutes with the exact resources you need. &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;This cloud-native model is very flexible. You can augment an on-premises cluster to meet peak demand or build a cloud-native system tailored with the right mix of hardware for your specific problem — be it the latest CPUs, GPUs, or TPUs. With this approach, we’re democratizing HPC, putting world-class capabilities into the hands of startups, academics, labs, and enterprise teams alike. &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Key highlights at SC25:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Next-generation infrastructure: &lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;We’ll be showcasing our latest &lt;/span&gt;&lt;a href="https://docs.cloud.google.com/compute/docs/compute-optimized-machines#h4d_series"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;H4D VMs&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, powered by 5th generation AMD EPYC processors and featuring Cloud RDMA for low-latency networking. You’ll also see our latest accelerated compute resources including &lt;/span&gt;&lt;a href="https://docs.cloud.google.com/compute/docs/accelerator-optimized-machines#a4x-vms"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;A4X&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; and &lt;/span&gt;&lt;a href="https://cloud.google.com/blog/products/compute/now-shipping-a4x-max-vertex-ai-training-and-more"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;A4X Max&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; VMs featuring the latest NVIDIA GPUs with RDMA.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Powering your essential applications: &lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;Run your most demanding simulations at massive scale — from Computational Fluid Dynamics (CFD) with Ansys, to Computer-Aided Engineering with Siemens, computational chemistry with Schrödinger, and risk modeling in financial services (FSI).&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Dynamic Workload Scheduler:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Discover how &lt;/span&gt;&lt;a href="https://cloud.google.com/products/dws/pricing"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Dynamic Workload Scheduler&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; and its innovative &lt;/span&gt;&lt;a href="https://cloud.google.com/kubernetes-engine/docs/concepts/dws"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Flex Start mode&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, integrated with familiar schedulers like Slurm, is reshaping HPC consumption. Move beyond static queues toward flexible, cost-effective, and efficient access to high-demand compute resources. &lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Easier HPC with Cluster Toolkit: &lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;Learn how &lt;/span&gt;&lt;a href="https://docs.cloud.google.com/cluster-toolkit/docs/overview"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Cluster Toolkit&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; can help you deploy a supercomputer-scale cluster with fewer than 50 lines of code (see the sketch after this list).&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;High-throughput, scalable storage:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Get a deep dive into &lt;/span&gt;&lt;a href="https://cloud.google.com/products/managed-lustre"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Google Cloud Managed Lustre&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, a fully managed, high-performance parallel file system that can handle your most demanding HPC and AI workloads.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Hybrid for the enterprise: &lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;For our enterprise customers, especially in financial services, we're enabling hybrid cloud with &lt;/span&gt;&lt;a href="https://docs.cloud.google.com/compute/docs/instances/ibm-symphony"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;IBM Spectrum Symphony Connectors&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, allowing you to migrate or burst workloads to Google Cloud and reduce time-to-solution.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
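&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;As a rough sketch of the Cluster Toolkit flow referenced above: you write a short YAML blueprint describing the cluster and hand it to the Toolkit binary, which provisions the deployment. The binary invocation, blueprint file, and project ID below are assumptions for illustration only; see the Cluster Toolkit documentation linked above for the current commands and example blueprints.&lt;/span&gt;&lt;/p&gt;
&lt;div class="block-code"&gt;&lt;dl&gt;
    &lt;dt&gt;code_block&lt;/dt&gt;
    &lt;dd&gt;# Illustrative only: deploy a cluster from a blueprint with the Cluster Toolkit binary
# (blueprint file and project ID are placeholders; verify command names against the docs)
./gcluster deploy my-hpc-blueprint.yaml --vars project_id=my-project&lt;/dd&gt;
&lt;/dl&gt;&lt;/div&gt;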
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;AI-powered scientific discovery&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;There’s a powerful synergy between HPC and AI — where HPC builds more powerful AI, and AI makes HPC faster and more insightful. This complementary relationship is fundamentally changing how research is done, accelerating discovery in everything from drug development and climate modeling to new materials and engineering. At Google Cloud, we’re at the forefront of this transformation, building the models, tools, and platforms that make it possible. &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;What to look for: &lt;/strong&gt;&lt;/p&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;AI for scientific productivity: &lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;We’ll be showcasing Google’s suite of AI tools designed to enhance the entire research lifecycle. From &lt;/span&gt;&lt;a href="https://cloud.google.com/agentspace/docs/idea-generation"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Idea Generation agent&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; to &lt;/span&gt;&lt;a href="https://codeassist.google/" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Gemini Code Assist&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; with &lt;/span&gt;&lt;a href="https://cloud.google.com/gemini-enterprise?_gl=1*9qpvwe*_up*MQ..&amp;amp;gclid=EAIaIQobChMIptGF-7qrkAMVRyvUAR0VwSw1EAAYASAAEgIZMPD_BwE&amp;amp;gclsrc=aw.ds#module-7"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Gemini Enterprise&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, you’ll see how AI can augment your capabilities and accelerate discovery. &lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;AI-powered scientific applications: &lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;Learn about the latest advancements in our AI-powered scientific applications, including AlphaFold 3 and WeatherNext.&lt;/span&gt;&lt;/li&gt;
&lt;li role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;The power of TPUs:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Explore Google's &lt;/span&gt;&lt;a href="https://cloud.google.com/tpu"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;TPUs&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, including the latest seventh-generation &lt;/span&gt;&lt;a href="https://cloud.google.com/blog/products/compute/ironwood-tpus-and-new-axion-based-vms-for-your-ai-workloads"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Ironwood&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; model, and discover how they can enhance AI workload performance and efficiency.&lt;/span&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong style="vertical-align: baseline;"&gt;Join Google Cloud at SC25: &lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;At Google Cloud, we believe the cloud is the supercomputer of the future. From purpose-built HPC and AI infrastructure to quantum breakthroughs and simplified open-source tools, let Google Cloud be the platform for your next discovery.&lt;/span&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;We invite you to connect with our experts and learn more. Join the &lt;/span&gt;&lt;a href="https://sites.google.com/view/advancedcomputingcommunity/" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Google Cloud Advanced Computing Community&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; to engage in discussions with our partners and the broader HPC, AI, and quantum communities.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;We can’t wait to see what you discover.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;See us at the show:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Visit us in booth #3724: &lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;Stop by for live demos of our latest HPC and AI solutions, including Dynamic Workload Scheduler, Cluster Toolkit, our latest AI agents, and even see our TPUs. Our team of experts will be on hand to answer your questions and discuss how Google Cloud can meet your needs.&lt;/span&gt;&lt;/li&gt;
&lt;li role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Attend our technical talks:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Keep an eye on &lt;/span&gt;&lt;a href="https://rsvp.withgoogle.com/events/google-cloud-sc-25" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;our SC25 schedule&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; for Google Cloud presentations and technical talks, where our leaders and partners will share deep dives, insights, and best practices.&lt;/span&gt;&lt;/li&gt;
&lt;li role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Passport program: &lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;Grab a passport card from the Google booth and visit our demos, labs, and talks to collect stamps and learn about how we’re working with organizations across the HPC ecosystem to democratize HPC. Come back to the Google booth with your completed passport card to choose your prize!&lt;/span&gt;&lt;/li&gt;
&lt;li role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Play a game:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Join us in the Google booth and at our events to enjoy some Gemini-driven games — test your tech trivia knowledge or compete head-to-head with others to build the best LEGO creation!&lt;/span&gt;&lt;/li&gt;
&lt;li role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Join our community kickoff: &lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;Are you a member of the Google Cloud Advanced Computing Community? Secure your spot today for our &lt;/span&gt;&lt;a href="https://rsvp.withgoogle.com/events/google-cloud-advanced-computing-community-sc25" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;SC25 Kickoff Happy Hour&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;!&lt;/span&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong style="vertical-align: baseline;"&gt;Celebrate with NVIDIA and Google Cloud: &lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;We’re proud to co-host a &lt;/span&gt;&lt;a href="https://rsvp.withgoogle.com/events/google-cloud-nvidia-sc25" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;reception with NVIDIA&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, and we look forward to toasting another year of innovation with our customers and partners. Register today to secure your spot!&lt;/span&gt;&lt;/li&gt;
&lt;/ul&gt;&lt;/div&gt;</description><pubDate>Fri, 14 Nov 2025 17:00:00 +0000</pubDate><guid>https://cloud.google.com/blog/topics/hpc/accelerating-innovation-and-discovery-at-sc25/</guid><category>HPC</category><og xmlns:og="http://ogp.me/ns#"><type>article</type><title>Accelerating discovery at the speed of cloud: What’s New for HPC at Google Cloud for SC25</title><description></description><site_name>Google</site_name><url>https://cloud.google.com/blog/topics/hpc/accelerating-innovation-and-discovery-at-sc25/</url></og><author xmlns:author="http://www.w3.org/2005/Atom"><name>Megan Gawlik</name><title>Outbound Product Manager</title><department></department><company></company></author></item><item><title>How scientists can leverage AI agents using Gemini Enterprise, Gemini Code Assist, and Gemini CLI</title><link>https://cloud.google.com/blog/products/ai-machine-learning/how-scientists-can-use-gemini-enterprise-for-ai-workflows/</link><description>&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Scientific inquiry has always been a journey of curiosity, meticulous effort, and groundbreaking discoveries. Today, that journey is being redefined, fueled by the incredible capabilities of AI. It’s moving beyond simply processing data to actively participating in every stage of discovery, and Google Cloud is at the forefront of this transformation, building the tools and platforms that make it possible. &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;The sheer volume of data generated by modern research is immense, often too vast for human analysis alone. This is where AI steps in, not just as a tool, but as a collaborative force. We’re seeing powerful new models and AI agents assist with everything from identifying relevant literature and generating novel hypotheses to designing experiments, running simulations, and making sense of complex results. This collaboration doesn’t replace human intellect; it amplifies it, allowing researchers to explore more avenues, more quickly, and with greater precision. &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;At Google Cloud, we’re bringing together high-performance computing (HPC) and advanced AI on a single, integrated platform. This means you can seamlessly move from running massive-scale simulations to applying sophisticated machine learning models, all in one environment. &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;So, how can you leverage these capabilities to get to insights faster? The journey begins at the foundation of scientific inquiry: the hypothesis.&lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;AI-enhanced scientific inquiry&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Every great discovery starts with a powerful hypothesis. With millions of research papers published annually, identifying novel opportunities is a monumental task. To overcome this information overload, scientists can now turn to AI as a powerful research partner.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Our &lt;/span&gt;&lt;a href="https://cloud.google.com/agentspace/docs/research-assistant"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Deep Research&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; agent tackles the first step: performing a comprehensive analysis of published literature to produce detailed reports on a given topic that would otherwise take months to compile. Building on that foundation, our &lt;/span&gt;&lt;a href="https://cloud.google.com/agentspace/docs/idea-generation"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Idea Generation agent&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; then deploys an ensemble of AI collaborators to brainstorm, evaluate, propose, debate, and rank novel hypotheses. This powerful combination, available in &lt;/span&gt;&lt;a href="https://cloud.google.com/gemini-enterprise?_gl=1*9qpvwe*_up*MQ..&amp;amp;gclid=EAIaIQobChMIptGF-7qrkAMVRyvUAR0VwSw1EAAYASAAEgIZMPD_BwE&amp;amp;gclsrc=aw.ds#module-7"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Gemini Enterprise&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, transforms the initial phase of scientific inquiry, empowering researchers to augment their expertise and find connections they might otherwise miss.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Go from hypothesis to results, faster&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Once a hypothesis is formed, the work of translating it into executable code begins. This is where AI coding assistants, such as &lt;/span&gt;&lt;a href="https://codeassist.google/" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Gemini Code Assist&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, excel. They automate the tedious tasks of writing analysis scripts and simulation models by generating code from natural language and providing real-time suggestions, dramatically speeding up the core development process. &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;But modern research is more than just a single script; it’s a complete workflow of data, environments, and results managed from the command line. For this, &lt;/span&gt;&lt;a href="https://cloud.google.com/gemini/docs/codeassist/gemini-cli"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Gemini CLI&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; brings that same conversational power directly to your terminal. It acts as the ultimate workflow accelerator, allowing you to instantly synthesize research and generate hypotheses with simple commands, then seamlessly transition to experimentation by generating sophisticated analysis scripts, and debugging errors on the fly, all without ever breaking your focus. Gemini CLI can further accelerate your path to impact by transforming raw results into publication-ready text, generating the code for figures and tables, and refining your work for submission. &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;This capability extends to automating the entire research environment. Beyond single commands, Gemini CLI can manage complex, multi-step processes like cloning a scientific application, installing its dependencies, and then building and testing it—all with a simple prompt, maximizing your productivity.&lt;/span&gt;&lt;/p&gt;
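&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;For example, a workflow like the one above can be kicked off non-interactively from the terminal. This is a sketch only: the repository URL is a placeholder, and the exact flags may differ between Gemini CLI versions.&lt;/span&gt;&lt;/p&gt;
&lt;div class="block-code"&gt;&lt;dl&gt;
    &lt;dt&gt;code_block&lt;/dt&gt;
    &lt;dd&gt;# Ask Gemini CLI to set up and validate a scientific application in one prompt
# (repository URL is a placeholder)
gemini -p "Clone https://github.com/example/science-app, install its dependencies, build it, and run its test suite. Summarize any failures."&lt;/dd&gt;
&lt;/dl&gt;&lt;/div&gt;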
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;The new era of discovery: Your expertise, AI agents, and Google Cloud&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;The new era of scientific discovery is here. By embedding AI into every stage of the scientific process - from sparking the initial idea to accelerating the final analysis - Google Cloud provides a single, unified platform for discovery. This new era of AI-enhanced scientific inquiry is built on a robust, intelligent infrastructure that combines the strengths of HPC simulation and AI. This includes purpose-built solutions like our H4D VMs optimized for scientific simulations, alongside the latest A4 and A4X VMs, powered by the latest NVIDIA GPUs, and Google Cloud Managed Lustre, a parallel file system that eliminates storage bottlenecks and allows your HPC and AI workloads to create and analyze massive datasets simultaneously. We provide the power to streamline the entire process so you can focus on scientific creativity - and changing the world! &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Join the &lt;/span&gt;&lt;a href="https://sites.google.com/view/advancedcomputingcommunity/" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Google Cloud Advanced Computing Community&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; to connect with other researchers, share best practices, and stay up to date on the latest advancements in AI for scientific and technical computing, or &lt;/span&gt;&lt;a href="https://cloud.google.com/contact"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;contact sales&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; to get started today. &lt;/span&gt;&lt;/p&gt;&lt;/div&gt;</description><pubDate>Mon, 03 Nov 2025 17:00:00 +0000</pubDate><guid>https://cloud.google.com/blog/products/ai-machine-learning/how-scientists-can-use-gemini-enterprise-for-ai-workflows/</guid><category>HPC</category><category>Google Cloud</category><category>AI &amp; Machine Learning</category><og xmlns:og="http://ogp.me/ns#"><type>article</type><title>How scientists can leverage AI agents using Gemini Enterprise, Gemini Code Assist, and Gemini CLI</title><description></description><site_name>Google</site_name><url>https://cloud.google.com/blog/products/ai-machine-learning/how-scientists-can-use-gemini-enterprise-for-ai-workflows/</url></og><author xmlns:author="http://www.w3.org/2005/Atom"><name>Megan Gawlik</name><title>Outbound Product Manager</title><department></department><company></company></author><author xmlns:author="http://www.w3.org/2005/Atom"><name>Jay Boisseau</name><title>Advanced Computing Strategist</title><department></department><company></company></author></item><item><title>Evolving Ray and Kubernetes together for the future of distributed AI and ML</title><link>https://cloud.google.com/blog/products/containers-kubernetes/ray-on-gke-new-features-for-ai-scheduling-and-scaling/</link><description>&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Ray is an OSS compute engine that is popular among Google Cloud developers to handle complex distributed AI workloads across CPUs, GPUs, and TPUs. Similarly, platform engineers have long trusted Kubernetes, and specifically Google Kubernetes Engine, for powerful and reliable infrastructure orchestration. Earlier this year, we &lt;/span&gt;&lt;a href="https://cloud.google.com/blog/products/containers-kubernetes/partnering-with-anyscale-to-integrate-rayturbo-with-gke"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;announced a partnership&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; with Anyscale to bring the best of Ray and Kubernetes together, forming a distributed operating system for the most demanding AI workloads. Today, we are excited to share some of the open-source enhancements we have built together across Ray and Kubernetes.&lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Ray and Kubernetes label-based scheduling&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;One of the key benefits of Ray is its flexible set of primitives that enable developers to write distributed applications without thinking directly about the underlying hardware. However, there are some use cases that weren’t very well covered by the existing support for virtual resources in Ray.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;To improve scheduling flexibility and empower the Ray and Kubernetes schedulers to perform better autoscaling for Ray applications, we are &lt;/span&gt;&lt;a href="https://www.anyscale.com/blog/introducing-label-selectors-scheduling-ray" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;introducing label selectors to Ray&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;. Ray label selectors are heavily inspired by Kubernetes &lt;/span&gt;&lt;a href="https://kubernetes.io/docs/concepts/overview/working-with-objects/labels/" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;labels and selectors&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, and are intended to offer a familiar experience and smooth integration between the two systems. The Ray Label Selector API is available starting in Ray v2.49 and offers improved scheduling flexibility for distributed tasks and actors.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;With the new &lt;/span&gt;&lt;a href="https://docs.ray.io/en/latest/ray-core/scheduling/labels.html" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Label Selector API&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, Ray now directly helps developers accomplish things like: &lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;Assign labels to nodes in your Ray cluster (e.g. &lt;/span&gt;&lt;code style="vertical-align: baseline;"&gt;gpu-family=L4, market-type=spot, region=us-west-1&lt;/code&gt;&lt;span style="vertical-align: baseline;"&gt;).&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;When launching tasks, actors or placement groups, declare which zones, regions or accelerator types to run on.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;Use custom labels to define topologies and advanced scheduling policies.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;For scheduling distributed applications on GKE, you can use &lt;/span&gt;&lt;a href="https://docs.ray.io/en/master/cluster/kubernetes/user-guides/label-based-scheduling.html" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Ray and Kubernetes label selectors&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; together to gain full control over both the application and the underlying infrastructure. You can also use this combination with GKE &lt;/span&gt;&lt;a href="https://cloud.google.com/kubernetes-engine/docs/concepts/about-custom-compute-classes"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;custom compute classes&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; to define fallback behavior when specific GPU types are unavailable. Let’s dive into a specific example.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Below is an example Ray remote task that could run on various GPU types depending on available capacity. Starting in Ray v2.49, you can now define the accelerator type to bind GPUs with fallback behavior in cases where the primary GPU type or market type is not available. In this example, the remote task is targeting spot capacity with L4 GPUs but with a fallback to on-demand:&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-code"&gt;&lt;dl&gt;
    &lt;dt&gt;code_block&lt;/dt&gt;
    &lt;dd&gt;&amp;lt;ListValue: [StructValue([(&amp;#x27;code&amp;#x27;, &amp;#x27;@ray.remote(\r\n  label_selector={\r\n      &amp;quot;ray.io/accelerator&amp;quot;: &amp;quot;L4&amp;quot;\r\n       &amp;quot;ray.io/market-type&amp;quot;: &amp;quot;spot&amp;quot;\r\n  },\r\n  fallback_strategy=[\r\n    {\r\n      &amp;quot;label_selector&amp;quot;: {\r\n        &amp;quot;ray.io/accelerator&amp;quot;: &amp;quot;L4&amp;quot;\r\n        &amp;quot;ray.io/market-type&amp;quot;: &amp;quot;on-demand&amp;quot;\r\n       }\r\n    },\r\n  ]\r\n)\r\ndef func():\r\n    pass&amp;#x27;), (&amp;#x27;language&amp;#x27;, &amp;#x27;&amp;#x27;), (&amp;#x27;caption&amp;#x27;, &amp;lt;wagtail.rich_text.RichText object at 0x7f17587bcc10&amp;gt;)])]&amp;gt;&lt;/dd&gt;
&lt;/dl&gt;&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;On GKE, you can couple the same fallback logic using custom compute classes such that the underlying infrastructure for the Ray cluster matches the same fallback behavior:&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-code"&gt;&lt;dl&gt;
    &lt;dt&gt;code_block&lt;/dt&gt;
    &lt;dd&gt;&amp;lt;ListValue: [StructValue([(&amp;#x27;code&amp;#x27;, &amp;#x27;apiVersion: cloud.google.com/v1\r\nkind: ComputeClass\r\nmetadata:\r\n  name: gpu-compute-class\r\nspec:\r\n  priorities:\r\n  - gpu:\r\n      type: nvidia-l4\r\n      count: 1\r\n    spot: true\r\n  - gpu:\r\n      type: nvidia-l4\r\n      count: 1\r\n    spot: false\r\n  nodePoolAutoCreation:\r\n    enabled: true\r\n  whenUnsatisfiable: DoNotScaleUp&amp;#x27;), (&amp;#x27;language&amp;#x27;, &amp;#x27;&amp;#x27;), (&amp;#x27;caption&amp;#x27;, &amp;lt;wagtail.rich_text.RichText object at 0x7f1744790790&amp;gt;)])]&amp;gt;&lt;/dd&gt;
&lt;/dl&gt;&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Refer to the &lt;/span&gt;&lt;a href="https://docs.ray.io/en/master/cluster/kubernetes/user-guides/label-based-scheduling.html" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Ray documentation&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; to get started with Ray label selectors.&lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Advancing accelerator support in Ray and Kubernetes&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Earlier this year we demonstrated the ability to use the new Ray Serve LLM APIs to deploy large models such as &lt;/span&gt;&lt;a href="https://www.anyscale.com/blog/deepseek-vllm-ray-google-kubernetes" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;DeepSeek-R1 on GKE&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; with A3 High and A3 Mega machine instances. Starting on GKE v1.33 and KubeRay v1.4, you can use &lt;/span&gt;&lt;a href="https://docs.cloud.google.com/kubernetes-engine/docs/concepts/about-dynamic-resource-allocation"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Dynamic Resource Allocation (DRA)&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; for flexible scheduling and sharing of hardware accelerators, enabling the use of the next-generation of AI accelerators with Ray. Specifically, you can now use DRA to deploy Ray clusters on A4X series machines utilizing the NVIDIA GB200 NVL72 rack-scale architecture. To use DRA with Ray on A4X, &lt;/span&gt;&lt;a href="https://cloud.google.com/ai-hypercomputer/docs/create/gke-ai-hypercompute-custom-a4x"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;create an AI-optimized GKE cluster on A4X&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; and define a ComputeDomain resource representing your NVL72 rack:&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-code"&gt;&lt;dl&gt;
    &lt;dt&gt;code_block&lt;/dt&gt;
    &lt;dd&gt;&amp;lt;ListValue: [StructValue([(&amp;#x27;code&amp;#x27;, &amp;#x27;apiVersion: resource.nvidia.com/v1beta1\r\nkind: ComputeDomain\r\nmetadata:\r\n  name: a4x-compute-domain\r\nspec:\r\n  numNodes: 18\r\n  channel:\r\n    resourceClaimTemplate:\r\n      name: a4x-compute-domain-channel&amp;#x27;), (&amp;#x27;language&amp;#x27;, &amp;#x27;&amp;#x27;), (&amp;#x27;caption&amp;#x27;, &amp;lt;wagtail.rich_text.RichText object at 0x7f1744790250&amp;gt;)])]&amp;gt;&lt;/dd&gt;
&lt;/dl&gt;&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;And then specify the claim in your Ray worker’s Pod template:&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-code"&gt;&lt;dl&gt;
    &lt;dt&gt;code_block&lt;/dt&gt;
    &lt;dd&gt;&amp;lt;ListValue: [StructValue([(&amp;#x27;code&amp;#x27;, &amp;#x27;workerGroupSpecs:\r\n    ...\r\n    template:\r\n...\r\nspec:\r\n  ...\r\n  volumes:\r\n    ...\r\n  containers:\r\n    - name: ray-container\r\n      ...\r\n      resources:\r\n        limits:\r\n          nvidia.com/gpu: 4\r\n\t claims:\r\n        - name: compute-domain-channel\r\n        ...\r\nresourceClaims:\r\n  - name: compute-domain-channel\r\n    resourceClaimTemplateName: a4x-compute-domain-channel&amp;#x27;), (&amp;#x27;language&amp;#x27;, &amp;#x27;&amp;#x27;), (&amp;#x27;caption&amp;#x27;, &amp;lt;wagtail.rich_text.RichText object at 0x7f1744790a90&amp;gt;)])]&amp;gt;&lt;/dd&gt;
&lt;/dl&gt;&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Combining DRA with Ray ensures that Ray worker groups are correctly scheduled on the same GB200 NVL72 rack for optimal GPU performance for the most demanding Ray workloads.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;We’re also partnering with Anyscale to bring a more native TPU experience to Ray and closer ecosystem integrations with frameworks like JAX. Ray Train introduced a &lt;/span&gt;&lt;a href="https://docs.ray.io/en/latest/train/getting-started-jax.html" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;JAXTrainer API&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; starting in Ray v2.49, streamlining model training on TPUs using JAX. For more information on these TPU improvements in Ray, read &lt;/span&gt;&lt;a href="https://cloud.google.com/blog/products/containers-kubernetes/ray-on-tpus-with-gke-a-more-native-experience"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;A More Native Experience for Cloud TPUs with Ray&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;.&lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Ray-native resource isolation with Kubernetes writable cgroups&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Writable cgroups allow a container's root process to create nested cgroups within the same container without requiring privileged capabilities. This is especially important for Ray, which runs multiple control-plane processes alongside user code inside the same container. Even under the most intensive workloads, Ray can dynamically reserve a portion of the container's total resources for system-critical tasks, significantly improving the reliability of your Ray clusters.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Starting on GKE v1.34, you can enable writable cgroups for Ray clusters. This first requires a one-time setup on your node pools by customizing the &lt;/span&gt;&lt;code style="vertical-align: baseline;"&gt;containerd&lt;/code&gt;&lt;span style="vertical-align: baseline;"&gt; configuration. Add the following to your containerd configuration file:&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-code"&gt;&lt;dl&gt;
    &lt;dt&gt;code_block&lt;/dt&gt;
    &lt;dd&gt;&lt;pre&gt;writableCgroups:
  enabled: true&lt;/pre&gt;&lt;/dd&gt;
&lt;/dl&gt;&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;You then specify this updated configuration when you create or update a cluster or node pool. Once your nodes are configured, you can enable writable cgroups for Ray clusters by adding the following annotations:&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-code"&gt;&lt;dl&gt;
    &lt;dt&gt;code_block&lt;/dt&gt;
    &lt;dd&gt;&lt;pre&gt;metadata:
  annotations:
    node.gke.io/enable-writable-cgroups.test-container: "true"&lt;/pre&gt;&lt;/dd&gt;
&lt;/dl&gt;&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;To enable Ray resource isolation using writable cgroups, set the following flags in &lt;/span&gt;&lt;code style="vertical-align: baseline;"&gt;ray start&lt;/code&gt;&lt;span style="vertical-align: baseline;"&gt;:&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-code"&gt;&lt;dl&gt;
    &lt;dt&gt;code_block&lt;/dt&gt;
    &lt;dd&gt;&lt;pre&gt;ray start --head --enable-resource-isolation&lt;/pre&gt;&lt;/dd&gt;
&lt;/dl&gt;&lt;/div&gt;
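&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Putting the pieces together, the snippet below sketches where the writable-cgroups annotation and the resource-isolation flag sit in a KubeRay RayCluster manifest. It is illustrative rather than complete: the cluster, container, and image names are placeholders, and you should confirm how boolean ray start flags are passed via rayStartParams against the KubeRay documentation.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-code"&gt;&lt;dl&gt;
    &lt;dt&gt;code_block&lt;/dt&gt;
    &lt;dd&gt;&lt;pre&gt;# Illustrative sketch (not a complete manifest): where the writable-cgroups
# annotation and the resource-isolation flag fit in a KubeRay RayCluster.
# Names such as "ray-head" are placeholders.
apiVersion: ray.io/v1
kind: RayCluster
metadata:
  name: isolated-ray-cluster
spec:
  headGroupSpec:
    rayStartParams:
      # Passed through to `ray start` on the head node; an empty value renders
      # the bare flag. Check the KubeRay docs for exact flag handling.
      enable-resource-isolation: ""
    template:
      metadata:
        annotations:
          # Enable writable cgroups for the container named "ray-head" (GKE v1.34+).
          node.gke.io/enable-writable-cgroups.ray-head: "true"
      spec:
        containers:
          - name: ray-head
            image: rayproject/ray:2.49.0
            resources:
              limits:
                cpu: "8"
                memory: 16Gi&lt;/pre&gt;&lt;/dd&gt;
&lt;/dl&gt;&lt;/div&gt;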
&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;This capability is one such example of how we’re evolving Ray and Kubernetes to improve reliability across the stack without compromising on security.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;In the near future, we plan to also introduce support for per-task and per-actor resource limits and requirements, a long requested feature in Ray. Additionally, we are collaborating with the open-source Kubernetes community to upstream this feature. To learn more, check out the &lt;/span&gt;&lt;a href="https://docs.cloud.google.com/kubernetes-engine/docs/how-to/writable-cgroups"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;documentation&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;.&lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Ray vertical autoscaling with in-place pod resizing&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;With the &lt;/span&gt;&lt;a href="https://kubernetes.io/blog/2025/05/16/kubernetes-v1-33-in-place-pod-resize-beta/" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;introduction of in-place pod resizing in Kubernetes&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; v1.33, we’re in the early stages of integrating vertical scaling capabilities for Ray when running on Kubernetes. Our early benchmarks show a 30% increase in workload efficiency due to scaling pods vertically before scaling horizontally. &lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
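&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;In-place resizing builds on the container-level resizePolicy field. As a minimal sketch (the names and resource values are illustrative, not our benchmark configuration), a Ray worker pod that allows CPU and memory to be resized without restarting the container could look like this:&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-code"&gt;&lt;dl&gt;
    &lt;dt&gt;code_block&lt;/dt&gt;
    &lt;dd&gt;&lt;pre&gt;# Minimal sketch of a pod that permits in-place resource resizing (Kubernetes v1.33+).
# The pod name, container name, image, and resource values are illustrative.
apiVersion: v1
kind: Pod
metadata:
  name: ray-worker-resizable
spec:
  containers:
    - name: ray-worker
      image: rayproject/ray:2.49.0
      resizePolicy:
        # Allow CPU and memory changes without restarting the container.
        - resourceName: cpu
          restartPolicy: NotRequired
        - resourceName: memory
          restartPolicy: NotRequired
      resources:
        requests:
          cpu: "4"
          memory: 8Gi
        limits:
          cpu: "4"
          memory: 8Gi&lt;/pre&gt;&lt;/dd&gt;
&lt;/dl&gt;&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;A controller can then adjust the pod’s CPU and memory in place, for example by patching the pod through the resize subresource, rather than recreating it.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;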
&lt;div class="block-image_full_width"&gt;






  
    &lt;div class="article-module h-c-page"&gt;
      &lt;div class="h-c-grid"&gt;
  

    &lt;figure class="article-image--large
      
      
        h-c-grid__col
        h-c-grid__col--6 h-c-grid__col--offset-3
        
        
      "
      &gt;

      
      
        
        &lt;img
            src="https://storage.googleapis.com/gweb-cloudblog-publish/images/image1_abzFIQW.max-1000x1000.png"
        
          alt="image1"&gt;
        
        &lt;/a&gt;
      
        &lt;figcaption class="article-image__caption "&gt;&lt;p data-block-key="bev4j"&gt;Benchmark based on completing two TPC-H workloads (Query 1 and 5) with Ray, 3 times on a GKE cluster with 3 worker nodes, each with 32 CPUs and 32 GB of memory.&lt;/p&gt;&lt;/figcaption&gt;
      
    &lt;/figure&gt;

  
      &lt;/div&gt;
    &lt;/div&gt;
  




&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;In-place pod resizing enhances workload efficiency in the following ways:&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Faster task/actor scale-up:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; With in-place resizing, Ray workers can scale up their available resources in seconds, an improvement over the minutes it could take to provision new nodes. This capability significantly accelerates the scheduling time for new Ray tasks.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Enhanced bin-packing and resource utilization:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; In-place pod resizing enables more efficient bin-packing of Ray workers onto Kubernetes nodes. As new Ray workers scale up, they can reserve smaller portions of the available node capacity, freeing up the remaining capacity for other workloads.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Improved reliability and reduced failures:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; In-place scaling of memory can significantly reduce out-of-memory (OOM) errors. By avoiding the need to restart failed jobs, this capability&lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt; &lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;improves overall workload efficiency and stability.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Ray + Kubernetes = The distributed OS for AI&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;We are excited to highlight the recent joint innovations from our partnership with Anyscale. The powerful synergy between Ray and Kubernetes positions them as the distributed operating system for modern AI/ML. We believe our continued partnership will accelerate innovation within the open-source Ray and Kubernetes ecosystems, ultimately driving the future of distributed AI/ML.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Together, these updates are a significant step toward Ray working seamlessly on GKE. Here’s how to get started:&lt;/span&gt;&lt;/p&gt;
&lt;ol&gt;
&lt;li aria-level="1" style="list-style-type: decimal; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Request capacity:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Get started quickly with &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;Dynamic Workload Scheduler Flex Start&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; for &lt;/span&gt;&lt;a href="https://cloud.google.com/kubernetes-engine/docs/how-to/dws-flex-start-training-tpu"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;TPUs&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; and &lt;/span&gt;&lt;a href="https://cloud.google.com/kubernetes-engine/docs/how-to/dws-flex-start-training"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;GPUs&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, which provides access to compute for jobs that run for less than 7 days.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: decimal; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;Get started with &lt;/span&gt;&lt;a href="https://docs.cloud.google.com/kubernetes-engine/docs/add-on/ray-on-gke/concepts/overview"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Ray on GKE&lt;/span&gt;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;&lt;span style="vertical-align: baseline;"&gt;Try out &lt;/span&gt;&lt;a href="https://docs.cloud.google.com/kubernetes-engine/docs/tutorials/distributed-training-tpu"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;JaxTrainer with TPUs&lt;/span&gt;&lt;/a&gt;&lt;/li&gt;
&lt;/ol&gt;&lt;/div&gt;
&lt;div class="block-related_article_tout"&gt;





&lt;div class="uni-related-article-tout h-c-page"&gt;
  &lt;section class="h-c-grid"&gt;
    &lt;a href="https://cloud.google.com/blog/products/containers-kubernetes/ray-on-tpus-with-gke-a-more-native-experience/"
       data-analytics='{
                       "event": "page interaction",
                       "category": "article lead",
                       "action": "related article - inline",
                       "label": "article: {slug}"
                     }'
       class="uni-related-article-tout__wrapper h-c-grid__col h-c-grid__col--8 h-c-grid__col-m--6 h-c-grid__col-l--6
        h-c-grid__col--offset-2 h-c-grid__col-m--offset-3 h-c-grid__col-l--offset-3 uni-click-tracker"&gt;
      &lt;div class="uni-related-article-tout__inner-wrapper"&gt;
        &lt;p class="uni-related-article-tout__eyebrow h-c-eyebrow"&gt;Related Article&lt;/p&gt;

        &lt;div class="uni-related-article-tout__content-wrapper"&gt;
          &lt;div class="uni-related-article-tout__image-wrapper"&gt;
            &lt;div class="uni-related-article-tout__image" style="background-image: url('')"&gt;&lt;/div&gt;
          &lt;/div&gt;
          &lt;div class="uni-related-article-tout__content"&gt;
            &lt;h4 class="uni-related-article-tout__header h-has-bottom-margin"&gt;A more native experience for Cloud TPUs with Ray on GKE&lt;/h4&gt;
            &lt;p class="uni-related-article-tout__body"&gt;Ray on GKE has new features: label-based scheduling, atomic slice reservations, JaxTrainer, built-in TPU awareness (topologies/SPMD/metri...&lt;/p&gt;
            &lt;div class="cta module-cta h-c-copy  uni-related-article-tout__cta muted"&gt;
              &lt;span class="nowrap"&gt;Read Article
                &lt;svg class="icon h-c-icon" role="presentation"&gt;
                  &lt;use xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="#mi-arrow-forward"&gt;&lt;/use&gt;
                &lt;/svg&gt;
              &lt;/span&gt;
            &lt;/div&gt;
          &lt;/div&gt;
        &lt;/div&gt;
      &lt;/div&gt;
    &lt;/a&gt;
  &lt;/section&gt;
&lt;/div&gt;

&lt;/div&gt;</description><pubDate>Mon, 03 Nov 2025 17:00:00 +0000</pubDate><guid>https://cloud.google.com/blog/products/containers-kubernetes/ray-on-gke-new-features-for-ai-scheduling-and-scaling/</guid><category>AI &amp; Machine Learning</category><category>Containers &amp; Kubernetes</category><category>HPC</category><og xmlns:og="http://ogp.me/ns#"><type>article</type><title>Evolving Ray and Kubernetes together for the future of distributed AI and ML</title><description></description><site_name>Google</site_name><url>https://cloud.google.com/blog/products/containers-kubernetes/ray-on-gke-new-features-for-ai-scheduling-and-scaling/</url></og><author xmlns:author="http://www.w3.org/2005/Atom"><name>Andrew Sy Kim</name><title>Staff Software Engineer, Google</title><department></department><company></company></author><author xmlns:author="http://www.w3.org/2005/Atom"><name>Edward Oakes</name><title>Staff Software Engineer, Anyscale</title><department></department><company></company></author></item><item><title>Google Cloud and AMD at STAC Summit NYC: H4D VMs for Finance</title><link>https://cloud.google.com/blog/topics/hpc/h4d-delivers-strong-performance-for-financial-services-workloads/</link><description>&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;In capital markets, the race for low latency and high performance is relentless. That’s why Google Cloud is partnering with AMD at the premier &lt;/span&gt;&lt;a href="https://stacresearch.com/events/fall2025nyc/" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;STAC Summit NYC&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; on Tuesday, October 28th! We’re joining forces to demonstrate how our combined innovations are tackling the most demanding workloads in the financial services industry, from real-time risk analysis to algorithmic trading. &lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;H4D VMs for financial services&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;At the core of our offerings are the &lt;/span&gt;&lt;a href="https://cloud.google.com/blog/products/compute/new-h4d-vms-optimized-for-hpc?e=0"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Google Cloud H4D VMs&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, now in Preview, powered by 5th Gen AMD EPYC processors (codenamed Turin).&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;The financial world operates at lightning speed, where every millisecond counts. The H4D VM series is purpose-built to deliver the extreme performance required for high-frequency trading (HFT), backtesting, market risk simulations (e.g., Monte Carlo), and derivatives pricing. With fast, efficient core-to-core communication, massive memory capacity, and optimized network throughput, the H4D series is designed to execute complex computations faster, reduce simulation times, and ultimately deliver a competitive edge.&lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;H4D: Superior performance for financial workloads&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;To quantify the generational performance leap, we commissioned performance testing by AMD. They compared the new H4D VM directly against the previous generation C3D VM (powered by 4th Gen AMD EPYC processors), using the &lt;/span&gt;&lt;a href="https://github.com/KxSystems/nano" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;KX Nano open-source &lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;benchmark. This benchmark utility is designed to test the raw CPU, memory, and I/O performance of systems running data operations for kdb+ databases. These high-performance, column-based time series databases are widely used by major financial institutions, including investment banks and hedge funds, to handle large volumes of time-series data like stock market trades and quotes.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;The results demonstrated a &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;significant, out-of-the-box performance gain&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; for the H4D series. With no additional system tuning, the&lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt; H4D VM outperformed the C3D VM by an average of ~34% across all KX Nano test scenarios&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-image_full_width"&gt;






  
    &lt;div class="article-module h-c-page"&gt;
      &lt;div class="h-c-grid"&gt;
  

    &lt;figure class="article-image--large
      
      
        h-c-grid__col
        h-c-grid__col--6 h-c-grid__col--offset-3
        
        
      "
      &gt;

      
      
        
        &lt;img
            src="https://storage.googleapis.com/gweb-cloudblog-publish/images/Scenario1.max-1000x1000.png"
        
          alt="Scenario1"&gt;
        
        &lt;/a&gt;
      
        &lt;figcaption class="article-image__caption "&gt;&lt;p data-block-key="cbtjk"&gt;Figure 1: Per-core, cache-sensitive operations (Scenario 1) showed H4D's generational lead with a ~1.36x uplift in performance across all test types, confirming superior speed and efficiency of communication between cores and memory latency for key financial modeling functions. *1&lt;/p&gt;&lt;/figcaption&gt;
      
    &lt;/figure&gt;

  
      &lt;/div&gt;
    &lt;/div&gt;
  




&lt;/div&gt;
&lt;div class="block-image_full_width"&gt;






  
    &lt;div class="article-module h-c-page"&gt;
      &lt;div class="h-c-grid"&gt;
  

    &lt;figure class="article-image--large
      
      
        h-c-grid__col
        h-c-grid__col--6 h-c-grid__col--offset-3
        
        
      "
      &gt;

      
      
        
        &lt;img
            src="https://storage.googleapis.com/gweb-cloudblog-publish/images/Scenario2.max-1000x1000.png"
        
          alt="Scenario2"&gt;
        
        &lt;/a&gt;
      
        &lt;figcaption class="article-image__caption "&gt;&lt;p data-block-key="cbtjk"&gt;Figure 2: Multi-core scalability with the number of processors set to the max core count and 1 kdb worker per thread (Scenario 2) delivered a ~1.33x performance uplift across all test types, demonstrating H4D's strong capability for parallel processing across all available cores. *2&lt;/p&gt;&lt;/figcaption&gt;
      
    &lt;/figure&gt;

  
      &lt;/div&gt;
    &lt;/div&gt;
  




&lt;/div&gt;
&lt;div class="block-image_full_width"&gt;






  
    &lt;div class="article-module h-c-page"&gt;
      &lt;div class="h-c-grid"&gt;
  

    &lt;figure class="article-image--large
      
      
        h-c-grid__col
        h-c-grid__col--6 h-c-grid__col--offset-3
        
        
      "
      &gt;

      
      
        
        &lt;img
            src="https://storage.googleapis.com/gweb-cloudblog-publish/images/Scenario3.max-1000x1000.png"
        
          alt="Scenario3"&gt;
        
        &lt;/a&gt;
      
        &lt;figcaption class="article-image__caption "&gt;&lt;p data-block-key="cbtjk"&gt;Figure 3: For heavy, concurrent multi-threaded workloads with 8 threads per kdb+ instance and 1 thread per core (Scenario 3), H4D sustained substantial leadership, delivering relative gains of ~1.33x uplift across all test types. *3&lt;/p&gt;&lt;/figcaption&gt;
      
    &lt;/figure&gt;

  
      &lt;/div&gt;
    &lt;/div&gt;
  




&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;These benchmark results demonstrate the H4D VMs are built to accelerate your most demanding, low-latency workloads, providing the performance required for high-frequency trading, risk simulations, and quantitative analysis.&lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;A full spectrum of financial services solutions&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;The H4D VMs will be a major highlight for Google Cloud and AMD at the STAC Summit next Tuesday. Our booths will also showcase our full spectrum of solutions for financial institutions. Stop by to discuss how we can help optimize your entire technology stack, from data storage to advanced computation:&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;a href="https://cloud.google.com/blog/topics/hpc/announcing-new-ibm-spectrum-symphony-hostfactory-connectors"&gt;&lt;strong style="text-decoration: underline; vertical-align: baseline;"&gt;IBM Symphony GCE and GKE Connectors&lt;/strong&gt;&lt;/a&gt;&lt;strong style="vertical-align: baseline;"&gt;:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Discover how to extend and manage your existing Platform Symphony grid compute environments by bursting jobs to Compute Engine or Google Kubernetes Engine (GKE).&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;a href="https://cloud.google.com/products/managed-lustre?e=48754805&amp;amp;hl=en"&gt;&lt;strong style="text-decoration: underline; vertical-align: baseline;"&gt;Managed Lustre&lt;/strong&gt;&lt;/a&gt;&lt;strong style="vertical-align: baseline;"&gt;:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Get extreme performance file storage for your most demanding HPC and quantitative workloads without the operational overhead.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;a href="https://cloud.google.com/gpu?e=48754805&amp;amp;hl=en"&gt;&lt;strong style="text-decoration: underline; vertical-align: baseline;"&gt;GPUs&lt;/strong&gt;&lt;/a&gt;&lt;strong style="vertical-align: baseline;"&gt; and &lt;/strong&gt;&lt;a href="https://cloud.google.com/tpu?e=48754805&amp;amp;hl=en"&gt;&lt;strong style="text-decoration: underline; vertical-align: baseline;"&gt;TPUs&lt;/strong&gt;&lt;/a&gt;&lt;strong style="vertical-align: baseline;"&gt;:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Learn how our powerful accelerators can dramatically speed up machine learning, AI, and risk analysis tasks.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;a href="https://cloud.google.com/blog/products/compute/managed-slurm-and-other-cluster-director-enhancements?e=48754805#:~:text=Cluster%20Director%20provides%20fault%2Dtolerant,%2C%20and%20boot%2Ddisk%20size."&gt;&lt;strong style="text-decoration: underline; vertical-align: baseline;"&gt;Cluster Director with Managed Slurm&lt;/strong&gt;&lt;/a&gt;&lt;strong style="vertical-align: baseline;"&gt;:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Easily deploy and manage your HPC cluster workloads with our integration for the popular Slurm workload manager.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Come talk to experts!&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;We know that performance, security, and compliance are non-negotiable in financial services. Our team will be on site to discuss your specific challenges and demonstrate how Google Cloud, in partnership with AMD, provides the robust, high-performance foundation your firm needs to innovate and thrive.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;We look forward to connecting with you at the Google Cloud and AMD booths at &lt;/strong&gt;&lt;a href="https://stacresearch.com/events/fall2025nyc/" rel="noopener" target="_blank"&gt;&lt;strong style="text-decoration: underline; vertical-align: baseline;"&gt;STAC Summit NYC&lt;/strong&gt;&lt;/a&gt;&lt;strong style="vertical-align: baseline;"&gt; on October 28th!&lt;/strong&gt;&lt;/p&gt;
&lt;hr/&gt;
&lt;p&gt; &lt;/p&gt;&lt;/div&gt;
&lt;div class="block-image_full_width"&gt;






  
    &lt;div class="article-module h-c-page"&gt;
      &lt;div class="h-c-grid"&gt;
  

    &lt;figure class="article-image--large
      
      
        h-c-grid__col
        h-c-grid__col--6 h-c-grid__col--offset-3
        
        
      "
      &gt;

      
      
        
        &lt;img
            src="https://storage.googleapis.com/gweb-cloudblog-publish/images/1_RsToAkv.max-1000x1000.png"
        
          alt="1"&gt;
        
        &lt;/a&gt;
      
    &lt;/figure&gt;

  
      &lt;/div&gt;
    &lt;/div&gt;
  




&lt;/div&gt;
&lt;div class="block-image_full_width"&gt;






  
    &lt;div class="article-module h-c-page"&gt;
      &lt;div class="h-c-grid"&gt;
  

    &lt;figure class="article-image--large
      
      
        h-c-grid__col
        h-c-grid__col--6 h-c-grid__col--offset-3
        
        
      "
      &gt;

      
      
        
        &lt;img
            src="https://storage.googleapis.com/gweb-cloudblog-publish/images/2_a8ogcdA.max-1000x1000.png"
        
          alt="2"&gt;
        
        &lt;/a&gt;
      
    &lt;/figure&gt;

  
      &lt;/div&gt;
    &lt;/div&gt;
  




&lt;/div&gt;
&lt;div class="block-image_full_width"&gt;






  
    &lt;div class="article-module h-c-page"&gt;
      &lt;div class="h-c-grid"&gt;
  

    &lt;figure class="article-image--large
      
      
        h-c-grid__col
        h-c-grid__col--6 h-c-grid__col--offset-3
        
        
      "
      &gt;

      
      
        
        &lt;img
            src="https://storage.googleapis.com/gweb-cloudblog-publish/images/3_TVF43or.max-1000x1000.png"
        
          alt="3"&gt;
        
        &lt;/a&gt;
      
    &lt;/figure&gt;

  
      &lt;/div&gt;
    &lt;/div&gt;
  




&lt;/div&gt;</description><pubDate>Wed, 22 Oct 2025 17:00:00 +0000</pubDate><guid>https://cloud.google.com/blog/topics/hpc/h4d-delivers-strong-performance-for-financial-services-workloads/</guid><category>Compute</category><category>Financial Services</category><category>HPC</category><og xmlns:og="http://ogp.me/ns#"><type>article</type><title>Google Cloud and AMD at STAC Summit NYC: H4D VMs for Finance</title><description></description><site_name>Google</site_name><url>https://cloud.google.com/blog/topics/hpc/h4d-delivers-strong-performance-for-financial-services-workloads/</url></og><author xmlns:author="http://www.w3.org/2005/Atom"><name>Annie Ma-Weaver</name><title>Group Product Manager, Google Cloud</title><department></department><company></company></author><author xmlns:author="http://www.w3.org/2005/Atom"><name>Anthony Frery</name><title>Customer Engineer, Google Cloud HPC</title><department></department><company></company></author></item><item><title>G4 VMs under the hood: A custom, high-performance P2P fabric for multi-GPU workloads</title><link>https://cloud.google.com/blog/products/compute/g4-vms-p2p-fabric-boosts-multi-gpu-workloads/</link><description>&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Today, we announced the &lt;/span&gt;&lt;a href="https://cloud.google.com/blog/products/compute/g4-vms-powered-by-nvidia-rtx-6000-blackwell-gpus-are-ga"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;general availability of the G4 VM family&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; based on NVIDIA RTX PRO 6000 Blackwell Server Edition GPUs. Thanks to unique platform optimizations only available in Google Cloud, G4 VMs deliver the best performance of any commercially available NVIDIA RTX PRO 6000 Blackwell GPU offering for inference and fine-tuning on a wide range of models, from less than 30B to over 100B parameters. In this blog, we discuss the need for these platform optimizations, how they work, and how to use them in your own environment. &lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Collective communications performance matters &lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Large language models (LLMs) vary significantly in size, as characterized by their number of parameters: small (~7B), medium (~70B), and large (~350B+). LLMs often exceed the memory capacity of a single GPU, including the NVIDIA RTX PRO 6000 Blackwell, with its 96 GB of GDDR7 memory. A common solution is tensor parallelism (TP), which distributes individual model layers across multiple GPUs. This involves partitioning a layer's weight matrices, allowing each GPU to perform a partial computation in parallel. However, a significant performance bottleneck arises from the subsequent need to combine these partial results using collective communication operations like All-Gather or All-Reduce.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;The G4 family of GPU virtual machines utilizes a PCIe-only interconnect. We drew on our extensive infrastructure expertise to develop this high-performance, software-defined PCIe fabric that supports peer-to-peer (P2P) communication. Crucially, G4’s platform-level P2P optimization substantially accelerates collective communications for workloads that require multi-GPU scaling, resulting in a notable boost for both inference and fine-tuning of LLMs.&lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;How G4 accelerates multi-GPU performance&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Multi-GPU G4 VM shapes get their significantly enhanced PCIe P2P capabilities from a combination of both custom hardware and software. This advancement directly optimizes collective communications, including All-to-All, All-Reduce, and All-Gather collectives for managing GPU data exchange. The result is a low-latency data path that delivers a substantial performance increase for critical workloads like multi-GPU inference and fine-tuning.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;In fact, across all major collectives, the enhanced G4 P2P capability provides an acceleration of &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;up to 2.2x&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; without requiring any changes to the code or workload.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
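&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;If you want to observe the collective-communication uplift yourself, the open-source nccl-tests suite is one way to measure it. The commands below are an illustrative sketch for an 8-GPU g4-standard-384 VM; the paths and message sizes are placeholders.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-code"&gt;&lt;dl&gt;
    &lt;dt&gt;code_block&lt;/dt&gt;
    &lt;dd&gt;&lt;pre&gt;# Illustrative sketch: measure All-Reduce bandwidth with the open-source
# nccl-tests suite on an 8-GPU G4 VM. Paths and message sizes are placeholders.
git clone https://github.com/NVIDIA/nccl-tests.git
cd nccl-tests
make CUDA_HOME=/usr/local/cuda        # build against the locally installed CUDA toolkit
# Sweep message sizes from 8 bytes to 8 GB across all 8 local GPUs.
./build/all_reduce_perf -b 8 -e 8G -f 2 -g 8&lt;/pre&gt;&lt;/dd&gt;
&lt;/dl&gt;&lt;/div&gt;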
&lt;div class="block-image_full_width"&gt;






  
    &lt;div class="article-module h-c-page"&gt;
      &lt;div class="h-c-grid"&gt;
  

    &lt;figure class="article-image--large
      
      
        h-c-grid__col
        h-c-grid__col--6 h-c-grid__col--offset-3
        
        
      "
      &gt;

      
      
        
        &lt;img
            src="https://storage.googleapis.com/gweb-cloudblog-publish/images/01_collective_communications.max-1000x1000.jpg"
        
          alt="01_collective_communications"&gt;
        
        &lt;/a&gt;
      
    &lt;/figure&gt;

  
      &lt;/div&gt;
    &lt;/div&gt;
  




&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Inference performance boost by P2P on G4&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;On G4 instances, enhanced peer-to-peer communication directly boosts multi-GPU workload performance, particularly for tensor parallel inference with vLLM, with up to &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;168% higher throughput&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;, and up to &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;41% lower inter-token latency &lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;(ITL).&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;We observe these improvements when using tensor parallelism for model serving, especially when compared to standard non-P2P offerings.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
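&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;For context, the serving pattern behind these measurements is tensor-parallel vLLM. A minimal sketch of launching such a server across the four GPUs of a g4-standard-192 VM is shown below; the model and port are illustrative, not the exact benchmark configuration.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-code"&gt;&lt;dl&gt;
    &lt;dt&gt;code_block&lt;/dt&gt;
    &lt;dd&gt;&lt;pre&gt;# Illustrative sketch: serve a model with vLLM using tensor parallelism across
# 4 local GPUs. The model and port are placeholders, not the benchmarked setup.
pip install vllm
vllm serve meta-llama/Llama-3.1-70B-Instruct \
    --tensor-parallel-size 4 \
    --port 8000&lt;/pre&gt;&lt;/dd&gt;
&lt;/dl&gt;&lt;/div&gt;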
&lt;div class="block-image_full_width"&gt;






  
    &lt;div class="article-module h-c-page"&gt;
      &lt;div class="h-c-grid"&gt;
  

    &lt;figure class="article-image--large
      
      
        h-c-grid__col
        h-c-grid__col--6 h-c-grid__col--offset-3
        
        
      "
      &gt;

      
      
        
        &lt;img
            src="https://storage.googleapis.com/gweb-cloudblog-publish/images/02_throughput.max-1000x1000.jpg"
        
          alt="02_throughput"&gt;
        
        &lt;/a&gt;
      
    &lt;/figure&gt;

  
      &lt;/div&gt;
    &lt;/div&gt;
  




&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;At the same time, G4 coupled with software-defined PCIe and P2P innovation, significantly enhances inference throughput and reduces latency, giving you the control to optimize your inference deployment for your business needs.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-image_full_width"&gt;






  
    &lt;div class="article-module h-c-page"&gt;
      &lt;div class="h-c-grid"&gt;
  

    &lt;figure class="article-image--large
      
      
        h-c-grid__col
        h-c-grid__col--6 h-c-grid__col--offset-3
        
        
      "
      &gt;

      
      
        
        &lt;img
            src="https://storage.googleapis.com/gweb-cloudblog-publish/images/03_latency.max-1000x1000.jpg"
        
          alt="03_latency"&gt;
        
        &lt;/a&gt;
      
    &lt;/figure&gt;

  
      &lt;/div&gt;
    &lt;/div&gt;
  




&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Throughput or speed: G4 with P2P lets you choose&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;The platform-level optimizations on G4 VMs translate directly into a flexible and powerful competitive advantage. For interactive generative AI applications, where user experience is paramount, G4’s P2P technology delivers up to 41% less inter-token latency — the critical delay between generating each part of a response. This results in a noticeably snappier and more reactive end-user experience, increasing their satisfaction with your AI application.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Alternatively, for workloads where raw throughput is the priority, such as batch inference, G4 with P2P enables customers to serve up to 168% more requests than comparable offerings. This means you can either increase the number of users served by each model instance, or significantly improve the responsiveness of your AI applications. Whether your focus is on latency-sensitive interactions or high-volume throughput, G4 provides a superior return on investment compared to other NVIDIA RTX PRO 6000 offerings in the market.&lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Scale further with G4 and GKE Inference Gateway&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;While P2P optimizes performance for a single model replica, scaling to meet production demand often requires multiple replicas. This is where the &lt;/span&gt;&lt;a href="https://cloud.google.com/kubernetes-engine/docs/concepts/about-gke-inference-gateway"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;GKE Inference Gateway&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; really shines. It acts as an intelligent traffic manager for your models, using advanced features like prefix-cache-aware routing and custom scheduling to maximize throughput and slash latency across your entire deployment.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;By combining the vertical scaling of G4's P2P with the horizontal scaling of the Inference Gateway, you can build an end-to-end serving solution that is exceptionally performant and cost-effective for the most demanding generative AI applications. For instance, you can use G4's P2P to efficiently run a 2-GPU Llama-3.1-70B model replica with 66% higher throughput, and then use GKE Inference Gateway to intelligently manage and autoscale multiple of these replicas to meet global user demand.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
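&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;As an illustration, a single 2-GPU model replica of this kind could be expressed as the GKE Deployment sketched below; the image, model, labels, and node selector are placeholders. GKE Inference Gateway then routes traffic across however many of these replicas you autoscale.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-code"&gt;&lt;dl&gt;
    &lt;dt&gt;code_block&lt;/dt&gt;
    &lt;dd&gt;&lt;pre&gt;# Illustrative sketch of one 2-GPU vLLM replica on GKE. Names, image, model,
# and the node selector are placeholders; Inference Gateway would sit in front
# of the Service created for this Deployment.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: llama-70b-vllm
spec:
  replicas: 1
  selector:
    matchLabels:
      app: llama-70b-vllm
  template:
    metadata:
      labels:
        app: llama-70b-vllm
    spec:
      nodeSelector:
        # Schedule onto G4 nodes (assumes the standard machine-family node label).
        cloud.google.com/machine-family: g4
      containers:
        - name: vllm
          image: vllm/vllm-openai:latest
          args:
            - "--model=meta-llama/Llama-3.1-70B-Instruct"
            - "--tensor-parallel-size=2"
          ports:
            - containerPort: 8000
          resources:
            limits:
              nvidia.com/gpu: "2"&lt;/pre&gt;&lt;/dd&gt;
&lt;/dl&gt;&lt;/div&gt;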
&lt;div class="block-image_full_width"&gt;






  
    &lt;div class="article-module h-c-page"&gt;
      &lt;div class="h-c-grid"&gt;
  

    &lt;figure class="article-image--large
      
      
        h-c-grid__col
        h-c-grid__col--6 h-c-grid__col--offset-3
        
        
      "
      &gt;

      
      
        
        &lt;img
            src="https://storage.googleapis.com/gweb-cloudblog-publish/images/04_inference_gateway.max-1000x1000.jpg"
        
          alt="04_inference_gateway"&gt;
        
        &lt;/a&gt;
      
    &lt;/figure&gt;

  
      &lt;/div&gt;
    &lt;/div&gt;
  




&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;G4 P2P supported VM Shapes&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Peer-to-peer capabilities for NVIDIA RTX PRO 6000 Blackwell are available with the following multi-GPU G4 VM shapes:&lt;/span&gt;&lt;/p&gt;
&lt;div align="left"&gt;
&lt;div style="color: #5f6368; overflow-x: auto; overflow-y: hidden; width: 100%;"&gt;
&lt;div style="color: #5f6368; overflow-x: auto; overflow-y: hidden; width: 100%;"&gt;
&lt;div style="color: #5f6368; overflow-x: auto; overflow-y: hidden; width: 100%;"&gt;
&lt;div style="color: #5f6368; overflow-x: auto; overflow-y: hidden; width: 100%;"&gt;
&lt;div style="color: #5f6368; overflow-x: auto; overflow-y: hidden; width: 100%;"&gt;&lt;table&gt;&lt;colgroup&gt;&lt;col/&gt;&lt;col/&gt;&lt;col/&gt;&lt;col/&gt;&lt;col/&gt;&lt;col/&gt;&lt;col/&gt;&lt;/colgroup&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p style="text-align: center;"&gt;&lt;strong style="vertical-align: baseline;"&gt;Machine Type&lt;/strong&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p style="text-align: center;"&gt;&lt;strong style="vertical-align: baseline;"&gt;GPUs&lt;/strong&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p style="text-align: center;"&gt;&lt;strong style="vertical-align: baseline;"&gt;Peer-to-Peer&lt;/strong&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p style="text-align: center;"&gt;&lt;strong style="vertical-align: baseline;"&gt;GPU Memory (GB)&lt;/strong&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p style="text-align: center;"&gt;&lt;strong style="vertical-align: baseline;"&gt;vCPUs&lt;/strong&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p style="text-align: center;"&gt;&lt;strong style="vertical-align: baseline;"&gt;Host Memory (GB)&lt;/strong&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p style="text-align: center;"&gt;&lt;strong style="vertical-align: baseline;"&gt;Local SSD (GB)&lt;/strong&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style="vertical-align: middle; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p style="text-align: center;"&gt;&lt;span style="vertical-align: baseline;"&gt;g4-standard-96&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td style="vertical-align: middle; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p style="text-align: center;"&gt;&lt;span style="vertical-align: baseline;"&gt;2&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td style="vertical-align: middle; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p style="text-align: center;"&gt;&lt;span style="vertical-align: baseline;"&gt;Yes&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td style="vertical-align: middle; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p style="text-align: center;"&gt;&lt;span style="vertical-align: baseline;"&gt;192&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td style="vertical-align: middle; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p style="text-align: center;"&gt;&lt;span style="vertical-align: baseline;"&gt;96&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td style="vertical-align: middle; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p style="text-align: center;"&gt;&lt;span style="vertical-align: baseline;"&gt;360&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td style="vertical-align: middle; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p style="text-align: center;"&gt;&lt;span style="vertical-align: baseline;"&gt;3,000&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style="vertical-align: middle; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p style="text-align: center;"&gt;&lt;span style="vertical-align: baseline;"&gt;g4-standard-192&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td style="vertical-align: middle; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p style="text-align: center;"&gt;&lt;span style="vertical-align: baseline;"&gt;4&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td style="vertical-align: middle; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p style="text-align: center;"&gt;&lt;span style="vertical-align: baseline;"&gt;Yes&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td style="vertical-align: middle; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p style="text-align: center;"&gt;&lt;span style="vertical-align: baseline;"&gt;384&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td style="vertical-align: middle; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p style="text-align: center;"&gt;&lt;span style="vertical-align: baseline;"&gt;192&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td style="vertical-align: middle; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p style="text-align: center;"&gt;&lt;span style="vertical-align: baseline;"&gt;720&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td style="vertical-align: middle; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p style="text-align: center;"&gt;&lt;span style="vertical-align: baseline;"&gt;6,000&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style="vertical-align: middle; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p style="text-align: center;"&gt;&lt;span style="vertical-align: baseline;"&gt;g4-standard-384&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td style="vertical-align: middle; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p style="text-align: center;"&gt;&lt;span style="vertical-align: baseline;"&gt;8&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td style="vertical-align: middle; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p style="text-align: center;"&gt;&lt;span style="vertical-align: baseline;"&gt;Yes&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td style="vertical-align: middle; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p style="text-align: center;"&gt;&lt;span style="vertical-align: baseline;"&gt;768&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td style="vertical-align: middle; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p style="text-align: center;"&gt;&lt;span style="vertical-align: baseline;"&gt;384&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td style="vertical-align: middle; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p style="text-align: center;"&gt;&lt;span style="vertical-align: baseline;"&gt;1,440&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td style="vertical-align: middle; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p style="text-align: center;"&gt;&lt;span style="vertical-align: baseline;"&gt;12,000&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;For VM shapes smaller than 8 GPUs, our software defined PCIe fabric ensures path isolation between GPUs assigned to different VMs on the same physical machine. PCIe paths are created dynamically at VM creation and are dependent on the VM shape, ensuring isolation on multiple levels of the platform stack to prevent communication between GPUs that are not assigned to the same VM.&lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Get started with P2P on G4&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;The G4 peer-to-peer capability is transparent to the workload, and requires no changes to the application code or to libraries such as the &lt;/span&gt;&lt;a href="https://developer.nvidia.com/nccl" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;NVIDIA Collective Communications Library&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; (NCCL). All peer-to-peer paths are automatically set up during VM creation. You can find more information about enabling peer-to-peer for NCCL-based workloads in the &lt;/span&gt;&lt;a href="https://cloud.google.com/compute/docs/accelerator-optimized-machines?hl=en#g4-gpu-p2p"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;G4 documentation&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;.&lt;/span&gt;&lt;/p&gt;
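&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;For example, a multi-GPU G4 VM can be created with a command along these lines; the zone and boot image are illustrative, so pick a zone with G4 capacity and an image with the NVIDIA drivers you need:&lt;/span&gt;&lt;/p&gt;
&lt;pre&gt;# Illustrative sketch: create a 4-GPU g4-standard-192 VM. The zone and image
# are placeholders; GPU VMs require a TERMINATE host-maintenance policy.
gcloud compute instances create g4-p2p-demo \
    --machine-type=g4-standard-192 \
    --zone=us-central1-b \
    --image-family=debian-12 \
    --image-project=debian-cloud \
    --maintenance-policy=TERMINATE&lt;/pre&gt;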
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Try &lt;/span&gt;&lt;a href="https://cloud.google.com/compute/docs/accelerator-optimized-machines#g4-series"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Google Cloud G4 VMs&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; with P2P from the Google Cloud console today, and start building your inference platform with GKE Inference Gateway. For more information, please contact your Google Cloud sales team or reseller.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;</description><pubDate>Mon, 20 Oct 2025 16:00:00 +0000</pubDate><guid>https://cloud.google.com/blog/products/compute/g4-vms-p2p-fabric-boosts-multi-gpu-workloads/</guid><category>AI &amp; Machine Learning</category><category>HPC</category><category>Compute</category><og xmlns:og="http://ogp.me/ns#"><type>article</type><title>G4 VMs under the hood: A custom, high-performance P2P fabric for multi-GPU workloads</title><description></description><site_name>Google</site_name><url>https://cloud.google.com/blog/products/compute/g4-vms-p2p-fabric-boosts-multi-gpu-workloads/</url></og><author xmlns:author="http://www.w3.org/2005/Atom"><name>Cyrill Hug</name><title>Sr. Product Manager Accelerator Software, Google</title><department></department><company></company></author><author xmlns:author="http://www.w3.org/2005/Atom"><name>Prashanth Prakash</name><title>Software Engineer, Google</title><department></department><company></company></author></item><item><title>Open-source and enterprise-ready: IBM Spectrum Symphony connectors for Google Cloud</title><link>https://cloud.google.com/blog/topics/hpc/announcing-new-ibm-spectrum-symphony-hostfactory-connectors/</link><description>&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;At Google Cloud, we are committed to helping customers deploy their high performance computing (HPC) grid workloads to our platform. Today, we are thrilled to announce the general availability of open-source &lt;/span&gt;&lt;a href="https://cloud.google.com/cluster-toolkit/docs/ibm-symphony/ibm-symphony"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;IBM Spectrum Symphony HostFactory connectors for Google Compute Engine and Google Kubernetes Engine (GKE)&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;. &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;This integration between Google Cloud and IBM Spectrum Symphony gives you access to the benefits of Google Cloud for your grid workloads by supporting common architectures and requirements, namely:&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;Extending your on-premises cluster to Google Cloud and automatically adding compute capacity to reduce execution time of your jobs, or&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;Deploying an entire cluster in Google Cloud and automatically provisioning and decommissioning compute resources based on your workloads&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;These connectors are provided in the form of IBM Spectrum Symphony HostFactory custom cloud providers. They are open source and can be easily deployed either via &lt;/span&gt;&lt;a href="https://cloud.google.com/cluster-toolkit/docs/overview"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Cluster Toolkit&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; or manually.&lt;/span&gt;&lt;/p&gt;
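&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;As a rough sketch of the Cluster Toolkit path (the blueprint file name below is a placeholder; the actual Symphony blueprint is described in the documentation linked above), deployment looks like this:&lt;/span&gt;&lt;/p&gt;
&lt;pre&gt;# Illustrative sketch, not the verbatim procedure: build the Cluster Toolkit CLI
# and deploy a blueprint. The blueprint file name is a placeholder; see the IBM
# Spectrum Symphony documentation for the blueprint to use.
git clone https://github.com/GoogleCloudPlatform/cluster-toolkit.git
cd cluster-toolkit
make
./gcluster deploy my-symphony-blueprint.yaml&lt;/pre&gt;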
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Partner-built and tested for enterprise scale&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;To deliver robust, production-ready connectors, we collaborated with key partners who have deep expertise in financial services and HPC. Accenture built the Compute Engine and GKE connectors and Aneo performed rigorous user acceptance testing to ensure they met the stringent demands of our enterprise customers.&lt;/span&gt;&lt;/p&gt;
&lt;p style="padding-left: 40px;"&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;“Accenture is proud to have collaborated with Google Cloud to help develop the IBM Spectrum Symphony connectors. Our expertise in both financial services and cloud solutions allows us to enable customers to seamlessly migrate their critical HPC workloads to Google Cloud's high-performance infrastructure." &lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;- Keith Jackson, Managing Director - Financial Services, Accenture&lt;/span&gt;&lt;/p&gt;
&lt;p style="padding-left: 40px;"&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;“At Aneo, we subjected the IBM Spectrum Symphony connectors to rigorous, large-scale testing to ensure they meet the demanding performance and scalability requirements of enterprise HPC. We validated the connector's ability to efficiently manage up to 5,000 server nodes, confirming its readiness for production workloads." &lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;- William Simon Horn, Cloud HPC Engineer, and Wilfried Kirschenmann, CTO, Aneo&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Google Cloud rapidly scales to meet extreme HPC demands, provisioning over 100,000 vCPUs across 5,000 compute pods in under 8 minutes with the new IBM Spectrum Symphony connector for GKE. IBM has tested and supports Spectrum Symphony up to 5,000 compute nodes, so we set this as our target for scale testing the new Google Cloud connectors.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-image_full_width"&gt;






  
    &lt;div class="article-module h-c-page"&gt;
      &lt;div class="h-c-grid"&gt;
  

    &lt;figure class="article-image--large
      
      
        h-c-grid__col
        h-c-grid__col--6 h-c-grid__col--offset-3
        
        
      "
      &gt;

      
      
        
        &lt;img
            src="https://storage.googleapis.com/gweb-cloudblog-publish/images/image2_3DA0tSq.max-1000x1000.png"
        
          alt="image2"&gt;
        
        &lt;/a&gt;
      
    &lt;/figure&gt;

  
      &lt;/div&gt;
    &lt;/div&gt;
  




&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;The GCE connector demonstrates excellent provisioning speed and stability up to the mid-scale range. The connector also successfully scales to over 5,000 nodes and 125,000 vCPUs in less than 2 minutes.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-image_full_width"&gt;






  
    &lt;div class="article-module h-c-page"&gt;
      &lt;div class="h-c-grid"&gt;
  

    &lt;figure class="article-image--large
      
      
        h-c-grid__col
        h-c-grid__col--6 h-c-grid__col--offset-3
        
        
      "
      &gt;

      
      
        
        &lt;img
            src="https://storage.googleapis.com/gweb-cloudblog-publish/images/image1_VF5jiVC.max-1000x1000.png"
        
          alt="image1"&gt;
        
        &lt;/a&gt;
      
    &lt;/figure&gt;

  
      &lt;/div&gt;
    &lt;/div&gt;
  




&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;We achieved this performance by leveraging innovative GKE features like image preloading and custom compute classes, enabling customers in demanding sectors like FSI to accelerate mission-critical workloads while optimizing for cost and hybrid cloud flexibility. &lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Powerful features to run your way&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;The connectors are built to provide the flexibility and control needed to manage complex HPC environments. They are available as open-source software in a Google-owned repository. Key features include:&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Support for Compute Engine and GKE&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;: Separate IBM Spectrum Symphony Host Factory cloud providers for Compute Engine and GKE allow you to scale your cluster across both virtual machines and containerized environments.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Flexible consumption models&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;: Support for Spot VMs, on-demand VMs, or a mix of both lets you optimize cost and performance.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Template-based provisioning&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;: Use configurable resource templates that align with your workloads requirements.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Comprehensive instance support&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;: Full integration with managed instance group (MIG) APIs, GPUs, Local SSD, and Confidential Computing VMs.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Event-driven management&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;: Pub/Sub integration allows for event-driven resource management for Compute Engine instances.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Kubernetes-native&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;: The GKE connector uses a custom Kubernetes operator with Custom Resource Definitions (CRDs) to manage the entire lifecycle of Symphony compute pods. Leverage GKE’s scaling capabilities and custom hardware like GPUs and TPUs through transparent compatibility with GKE &lt;/span&gt;&lt;a href="https://cloud.google.com/kubernetes-engine/docs/concepts/about-custom-compute-classes"&gt;custom compute classes (CCC)&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; and Node Pool Autoscaler.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;High-scalability&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;: The connectors are built for high-performance with asynchronous operations to handle large-scale deployments.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Resiliency&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;: Automatic detection and handling of Spot VM preemptions helps ensure workload reliability.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Logging and monitoring&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;: Integrated with Google Cloud's operations suite for observability and reporting.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Enterprise support&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;: The connectors are supported as a first-party solution by Google Cloud, with an established escalation path to our development partner, Accenture.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
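&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;For context on the custom compute class integration mentioned above, here is a minimal sketch of a compute class and a pod that targets it. This is illustrative only: the resource name, machine family, and field names reflect our reading of the GKE custom compute class API and should be verified against the current documentation.&lt;/span&gt;&lt;/p&gt;
&lt;pre&gt;apiVersion: cloud.google.com/v1
kind: ComputeClass
metadata:
  name: symphony-cost-optimized        # illustrative name
spec:
  # Prefer Spot capacity, then fall back to on-demand nodes of the same
  # machine family (field names assumed; verify in the GKE docs).
  priorities:
  - machineFamily: c2d
    spot: true
  - machineFamily: c2d
    spot: false
---
apiVersion: v1
kind: Pod
metadata:
  name: symphony-compute-pod           # illustrative name
spec:
  nodeSelector:
    cloud.google.com/compute-class: symphony-cost-optimized
  containers:
  - name: worker
    image: busybox                     # placeholder for a Symphony compute image
    command: ["sleep", "3600"]&lt;/pre&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;In practice, the connector’s operator manages Symphony compute pods for you; the sketch only illustrates how a pod selects a compute class.&lt;/span&gt;&lt;/p&gt;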
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Getting started&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;You can begin using the IBM Spectrum Symphony connectors for Google Cloud today.&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Find the connectors&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; in the &lt;/span&gt;&lt;a href="https://github.com/google/symphony-gcp/tree/main" rel="noopener" target="_blank"&gt;&lt;strong style="text-decoration: underline; vertical-align: baseline;"&gt;Google Cloud repository&lt;/strong&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Explore the &lt;/strong&gt;&lt;a href="https://cloud.google.com/cluster-toolkit/docs/ibm-symphony/ibm-symphony"&gt;&lt;strong style="text-decoration: underline; vertical-align: baseline;"&gt;technical documentation&lt;/strong&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, including the reference architecture, to get started.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;a href="https://cloud.google.com/contact"&gt;&lt;strong style="text-decoration: underline; vertical-align: baseline;"&gt;Contact Google Cloud&lt;/strong&gt;&lt;/a&gt;&lt;strong style="vertical-align: baseline;"&gt; or your Google Cloud account team&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; to learn more about how to migrate your HPC workloads.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;To help ensure your success, we will continue to invest in the solutions you need to accelerate your research and business goals. We look forward to seeing what you can achieve with the scale and power of Google Cloud.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;</description><pubDate>Tue, 14 Oct 2025 16:00:00 +0000</pubDate><guid>https://cloud.google.com/blog/topics/hpc/announcing-new-ibm-spectrum-symphony-hostfactory-connectors/</guid><category>Compute</category><category>Containers &amp; Kubernetes</category><category>GKE</category><category>HPC</category><og xmlns:og="http://ogp.me/ns#"><type>article</type><title>Open-source and enterprise-ready: IBM Spectrum Symphony connectors for Google Cloud</title><description></description><site_name>Google</site_name><url>https://cloud.google.com/blog/topics/hpc/announcing-new-ibm-spectrum-symphony-hostfactory-connectors/</url></og><author xmlns:author="http://www.w3.org/2005/Atom"><name>Annie Ma-Weaver</name><title>Group Product Manager, Google Cloud HPC</title><department></department><company></company></author><author xmlns:author="http://www.w3.org/2005/Atom"><name>Anthony Frery</name><title>Customer Engineer, Google Cloud HPC</title><department></department><company></company></author></item><item><title>5 best practices for Managed Lustre on Google Kubernetes Engine</title><link>https://cloud.google.com/blog/products/containers-kubernetes/gke-managed-lustre-csi-driver-for-aiml-and-hpc-workloads/</link><description>&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Google Kubernetes Engine (GKE) is a powerful platform for orchestrating scalable AI and high-performance computing (HPC) workloads. But as clusters grow and jobs become more data-intensive, storage I/O can become a bottleneck. Your powerful GPUs and TPUs can end up idle, while waiting for data, driving up costs and slowing down innovation.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="http://goo.gle/managed-lustre-overview" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Google Cloud Managed Lustre&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; is designed to solve this problem. Many on-premises HPC environments already use parallel file systems, and Managed Lustre makes it easier to bring those workloads to the cloud. With its managed Container Storage Interface (CSI) driver, Managed Lustre and GKE operations are fully integrated.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Optimizing your move to a high-performance parallel file system can help you get the most out of your investment from day one. &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Before deploying, it's helpful to know when to use Managed Lustre versus other options like Google Cloud Storage. For most AI and ML workloads, Managed Lustre is the recommended solution. It excels in training and checkpointing scenarios that require very low latency (less than a millisecond) and high throughput for small files, which keeps your expensive accelerators fully utilized. For data archiving or workloads with large files (over 50 MB) that can tolerate higher latency, Cloud Storage FUSE with Anywhere Cache can be another choice.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Based on our work with early customers and the learnings from our teams, here are five best practices to ensure you get the most out of Managed Lustre on GKE.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-aside"&gt;&lt;dl&gt;
    &lt;dt&gt;aside_block&lt;/dt&gt;
    &lt;dd&gt;&amp;lt;ListValue: [StructValue([(&amp;#x27;title&amp;#x27;, &amp;#x27;$300 in free credit to try Google Cloud containers and Kubernetes&amp;#x27;), (&amp;#x27;body&amp;#x27;, &amp;lt;wagtail.rich_text.RichText object at 0x7f1744e9fa30&amp;gt;), (&amp;#x27;btn_text&amp;#x27;, &amp;#x27;&amp;#x27;), (&amp;#x27;href&amp;#x27;, &amp;#x27;&amp;#x27;), (&amp;#x27;image&amp;#x27;, None)])]&amp;gt;&lt;/dd&gt;
&lt;/dl&gt;&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;1. Design for data locality &lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;For performance-sensitive applications, you want your compute resources and storage to be as close as possible, ideally within the same zone in a given region. When provisioning volumes dynamically, the &lt;/span&gt;&lt;code style="vertical-align: baseline;"&gt;volumeBindingMode&lt;/code&gt;&lt;span style="vertical-align: baseline;"&gt; parameter in your &lt;/span&gt;&lt;code style="vertical-align: baseline;"&gt;StorageClass&lt;/code&gt;&lt;span style="vertical-align: baseline;"&gt; is your most important tool. We strongly recommend setting it to &lt;/span&gt;&lt;code style="vertical-align: baseline;"&gt;WaitForFirstConsumer&lt;/code&gt;&lt;span style="vertical-align: baseline;"&gt;. GKE provides a built-in StorageClass for Managed Lustre that uses &lt;/span&gt;&lt;code style="vertical-align: baseline;"&gt;WaitForFirstConsumer&lt;/code&gt;&lt;span style="vertical-align: baseline;"&gt; binding mode by default.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Example StorageClass YAML:&lt;/strong&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-code"&gt;&lt;dl&gt;
    &lt;dt&gt;code_block&lt;/dt&gt;
    &lt;dd&gt;&amp;lt;ListValue: [StructValue([(&amp;#x27;code&amp;#x27;, &amp;#x27;apiVersion: storage.k8s.io/v1\r\nkind: StorageClass\r\nmetadata:\r\n  name: lustre-regional-wait\r\nprovisioner: lustre.csi.storage.gke.io\r\nvolumeBindingMode: WaitForFirstConsumer\r\n...&amp;#x27;), (&amp;#x27;language&amp;#x27;, &amp;#x27;&amp;#x27;), (&amp;#x27;caption&amp;#x27;, &amp;lt;wagtail.rich_text.RichText object at 0x7f1744e9ffd0&amp;gt;)])]&amp;gt;&lt;/dd&gt;
&lt;/dl&gt;&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Why it’s a best practice:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Using &lt;/span&gt;&lt;code style="vertical-align: baseline;"&gt;WaitForFirstConsumer&lt;/code&gt;&lt;span style="vertical-align: baseline;"&gt; instructs GKE to delay &lt;/span&gt;&lt;span style="text-decoration: line-through; vertical-align: baseline;"&gt;the&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt; provisioning &lt;/span&gt;&lt;span style="text-decoration: line-through; vertical-align: baseline;"&gt;of&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt; the Lustre instance until a pod that needs it is scheduled. The scheduler then uses the pod's topology constraints (i.e., the zone it's scheduled in) to create the Lustre instance in that exact same zone. This guarantees co-location of your storage and compute, minimizing network latency.&lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;2. Right-size your performance with tiers&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Not all high-performance workloads are the same. Managed Lustre offers multiple &lt;/span&gt;&lt;a href="https://cloud.google.com/managed-lustre/docs/performance"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;performance tiers&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; (read and write throughput in MB/s per TiB of storage) so you can align cost directly with your performance requirements.&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;1000 &amp;amp; 500 MB/s/TiB:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Ideal for throughput-critical workloads like foundation model training or large-scale physics simulations where I/O bandwidth is the primary bottleneck.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;250 MB/s/TiB:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; A balanced, cost-effective tier great for many general HPC workloads and AI inference serving, and data-heavy analytics pipelines.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;125 MB/s/TiB:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Best for large-capacity use cases where having a massive, POSIX-compliant file system is more important than achieving peak throughput. This is also useful for migrating on-premises containerized applications without modification,&lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt; &lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;making it easier to migrate on-premises workloads to the cloud storage.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;&lt;/div&gt;
&lt;div class="block-image_full_width"&gt;






  
    &lt;div class="article-module h-c-page"&gt;
      &lt;div class="h-c-grid"&gt;
  

    &lt;figure class="article-image--large
      
      
        h-c-grid__col
        h-c-grid__col--6 h-c-grid__col--offset-3
        
        
      "
      &gt;

      
      
        
        &lt;img
            src="https://storage.googleapis.com/gweb-cloudblog-publish/images/image1_JuBQFJn.max-1000x1000.png"
        
          alt="image1"&gt;
        
        &lt;/a&gt;
      
    &lt;/figure&gt;

  
      &lt;/div&gt;
    &lt;/div&gt;
  




&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Why it’s a best practice: &lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;Defaulting to the highest tier isn't always the most cost-effective strategy. By analyzing your workload’s I/O profile, you can significantly optimize your total cost of ownership. &lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;3. Master your networking foundation&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;A parallel file system is a network-attached resource. Getting the networking right up front will save you days of troubleshooting. Before provisioning, ensure your VPC is correctly configured by following the setup steps in our &lt;/span&gt;&lt;a href="https://cloud.google.com/managed-lustre/docs/vpc#create_and_configure_the_vpc"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;documentation&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;. &lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;This involves three key steps detailed in our documentation:&lt;/span&gt;&lt;/p&gt;
&lt;ol&gt;
&lt;li aria-level="1" style="list-style-type: decimal; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Enable Service Networking.&lt;/strong&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: decimal; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Create an IP range&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; for VPC peering.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: decimal; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Create a firewall rule&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; to allow traffic from that range on the Lustre network port (TCP 988 or 6988).&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Why it’s a best practice:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; This is a one-time setup per VPC that establishes the secure peering connection that allows your GKE nodes to communicate with the Managed Lustre service. &lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;4. Use dynamic provisioning for simplicity, static for long-lived shared data&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;The Managed Lustre CSI driver supports &lt;/span&gt;&lt;a href="https://cloud.google.com/kubernetes-engine/docs/how-to/persistent-volumes/lustre-csi-driver-new-volume"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;two modes&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; for connecting storage to your GKE workloads.&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Dynamic provisioning:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Use when your storage is tightly coupled to the lifecycle of a specific workload or application. By defining a StorageClass and PersistentVolumeClaim (PVC), GKE will automatically manage the Lustre instance lifecycle for you. This is the simplest, most automated approach.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Static provisioning:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Use when you have a long-lived Lustre instance that needs to be shared across multiple GKE clusters and jobs. You create the Lustre instance once, then create a PersistentVolume (PV) and PVC in your cluster to mount it. This decouples the storage lifecycle from any single workload.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Why it’s a best practice:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Thinking about your data’s lifecycle helps you choose the right pattern. Use dynamic provisioning as your default because of simplicity, and opt for static provisioning when you need to treat your file system as a persistent, shared resource across your organization.&lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;5. &lt;/strong&gt;&lt;strong style="vertical-align: baseline;"&gt;Architecting for parallelism with Kubernetes Jobs&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Many AI and HPC tasks, like data preprocessing or batch inference, are suited for parallel execution. Instead of running a single, large pod, use the Kubernetes Job resource to divide the work across many smaller pods.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Consider this pattern:&lt;/span&gt;&lt;/p&gt;
&lt;ol&gt;
&lt;li aria-level="1" style="list-style-type: decimal; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;Create a single PersistentVolumeClaim for your Managed Lustre instance, making it available to your cluster.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: decimal; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;Define a Kubernetes job with parallelism set to a high number (e.g., 100).&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: decimal; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;Each pod created by the Job mounts the same Lustre PVC.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: decimal; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;Design your application so that each pod works on a different subset of the data (e.g., processing a different range of files or data chunks).&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Why it’s a best practice: &lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;In this pattern, you create a single PVC for your Lustre instance and have each pod created by the Job mount that same PVC. By designing your application so that each pod works on a different subset of the data, you turn your GKE cluster into a powerful, distributed data processing engine. The GKE Job controller acts as the parallel task orchestrator, while Managed Lustre serves as the high-speed data backbone, allowing you to achieve massive aggregate throughput.&lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Get started today&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;By combining the orchestration power of GKE with the performance of Managed Lustre, you can build a truly scalable and efficient platform for AI and HPC. Following these best practices will help you create a solution that is not only powerful, but also efficient, cost-effective, and easy to manage.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Ready to get started? Explore the &lt;/span&gt;&lt;a href="https://cloud.google.com/managed-lustre/docs/overview"&gt;&lt;span style="vertical-align: baseline;"&gt;Managed Lustre documentation&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, and provision your first instance today.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;</description><pubDate>Fri, 19 Sep 2025 16:00:00 +0000</pubDate><guid>https://cloud.google.com/blog/products/containers-kubernetes/gke-managed-lustre-csi-driver-for-aiml-and-hpc-workloads/</guid><category>Storage &amp; Data Transfer</category><category>HPC</category><category>Containers &amp; Kubernetes</category><og xmlns:og="http://ogp.me/ns#"><type>article</type><title>5 best practices for Managed Lustre on Google Kubernetes Engine</title><description></description><site_name>Google</site_name><url>https://cloud.google.com/blog/products/containers-kubernetes/gke-managed-lustre-csi-driver-for-aiml-and-hpc-workloads/</url></og><author xmlns:author="http://www.w3.org/2005/Atom"><name>Nishtha Jain</name><title>Engineering Manager</title><department></department><company></company></author><author xmlns:author="http://www.w3.org/2005/Atom"><name>Dan Eawaz</name><title>Senior Product Manager</title><department></department><company></company></author></item><item><title>Accelerate your AI workloads with the Google Cloud Managed Lustre</title><link>https://cloud.google.com/blog/products/storage-data-transfer/google-cloud-managed-lustre-for-ai-hpc/</link><description>&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Today, we're making it even easier to achieve breakthrough performance for your AI/ML workloads: &lt;/span&gt;&lt;a href="https://cloud.google.com/products/managed-lustre"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Google Cloud Managed Lustre&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; is now GA, and available in four distinct performance tiers that deliver throughput ranging from 125 MB/s, 250 MB/s, 500 MB/s, to 1000 MB/s per TiB of capacity — with the ability to scale up to 8 PB of storage capacity. The Managed Lustre solution is powered by DDN’s EXAScaler, combining DDN's decades of leadership in high-performance storage with Google Cloud's expertise in cloud infrastructure.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Managed Lustre provides a POSIX-compliant, parallel file system that delivers consistently high throughput and low latency, essential for:&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;High-throughput inference:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; For applications that require near-real-time inference on large datasets, Lustre provides high parallel throughput and sub-millisecond read latency.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Large-scale model training:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Accelerate the training cycles of deep learning models by providing rapid access to petabytes-sized datasets. Lustre's parallel architecture ensures GPUs and TPUs are fed with data, minimizing idle time.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Checkpointing and restarting large models:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Save and restore the state of large models during training faster, improving goodput and allowing for more efficient experimentation.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Data preprocessing and feature engineering:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Process raw data, extract features, and prepare datasets for training, reducing the time spent on data pipelines.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Scientific simulations and research:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Beyond AI/ML, Lustre excels in traditional HPC scenarios like computational fluid dynamics, genomic sequencing, and climate modeling, where massive datasets and high-concurrency access are critical.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Lustre is designed for the highly parallel and random I/O that characterizes many AI/ML training and inference tasks. This parallel processing capability across multiple clients ensures your compute resources are never starved for data.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Performance tiers and pricing&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Managed Lustre offers flexible pricing and performance tiers designed to meet the diverse needs of your workloads, whether you're focused on capacity or highest throughput density. &lt;/span&gt;&lt;/p&gt;
&lt;div align="left"&gt;
&lt;div style="color: #5f6368; overflow-x: auto; overflow-y: hidden; width: 100%;"&gt;
&lt;div style="color: #5f6368; overflow-x: auto; overflow-y: hidden; width: 100%;"&gt;
&lt;div style="color: #5f6368; overflow-x: auto; overflow-y: hidden; width: 100%;"&gt;
&lt;div style="color: #5f6368; overflow-x: auto; overflow-y: hidden; width: 100%;"&gt;
&lt;div style="color: #5f6368; overflow-x: auto; overflow-y: hidden; width: 100%;"&gt;
&lt;div style="color: #5f6368; overflow-x: auto; overflow-y: hidden; width: 100%;"&gt;
&lt;div style="color: #5f6368; overflow-x: auto; overflow-y: hidden; width: 100%;"&gt;
&lt;div style="color: #5f6368; overflow-x: auto; overflow-y: hidden; width: 100%;"&gt;
&lt;div style="color: #5f6368; overflow-x: auto; overflow-y: hidden; width: 100%;"&gt;
&lt;div style="color: #5f6368; overflow-x: auto; overflow-y: hidden; width: 100%;"&gt;&lt;table style="width: 98.4334%;"&gt;&lt;colgroup&gt;&lt;col style="width: 56.3665%;"/&gt;&lt;col style="width: 43.6335%;"/&gt;&lt;/colgroup&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Throughput &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;MB/s&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;per TiB &lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;of storage capacity&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Storage pricing per GiB per month&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;125&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;$0.145&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;250&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;$0.21&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;500&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;$0.34&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;1000&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;$0.60&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Please see more details at the &lt;/span&gt;&lt;a href="https://cloud.google.com/products/managed-lustre/pricing"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Managed Lustre pricing page&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Irrespective of the aggregate throughput, all tiers come with sub-millisecond read latency, high single-stream throughput, and are perfect for parallel access to many small files.&lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Driving innovation together: partnering with DDN&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Google Cloud’s Managed Lustre is powered by &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;DDN’s EXAScaler&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;, bringing together two industry leaders in high-performance computing and elastic cloud infrastructure. This partnership represents a joint commitment to simplifying the deployment and management of large-scale AI and HPC workloads in the cloud, thanks to:&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Trusted leaders:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; By combining DDN's decades of expertise in high-performance Lustre with Google Cloud's global infrastructure and AI ecosystem, we are delivering a foundational capability that removes storage bottlenecks and helps our customers solve their most complex challenges in AI and HPC.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Fully managed and supported solution:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Enjoy the benefits of a fully managed service from Google, with comprehensive support from both Google and DDN, for seamless operations and peace of mind.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Global availability and ecosystem integration:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Managed Lustre is now globally accessible in &lt;/span&gt;&lt;a href="https://cloud.google.com/managed-lustre/docs/locations"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;multiple Google Cloud regions&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; and integrates with the broader Google Cloud ecosystem, including Google Kubernetes Engine (GKE) and TPUs.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;These benefits caught the attention of one of our largest partners, NVIDIA, who is looking forward to having it as part of its NVIDIA AI platform. &lt;/span&gt;&lt;/p&gt;
&lt;p style="padding-left: 40px;"&gt;&lt;span style="vertical-align: baseline;"&gt;"&lt;/span&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;Enterprises today demand AI infrastructure that combines accelerated computing with high-performance storage solutions to deliver uncompromising speed, seamless scalability and cost efficiency at scale. Google and DDN’s collaboration on Google Cloud Managed Lustre creates a better-together solution uniquely suited to meet these needs. By integrating DDN’s enterprise-grade data platforms and Google’s global cloud capabilities, organizations can readily access vast amounts of data and unlock the full potential of AI with the NVIDIA AI platform (or NVIDIA accelerated computing platform) on Google Cloud — reducing time-to-insight, maximizing GPU utilization, and lowering total cost of ownership.&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;” - Dave Salvator, Director of Accelerated Computing Products, NVIDIA&lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Get started today!&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Ready to supercharge your AI/ML and HPC workloads? Getting started with Managed Lustre is simple:&lt;/span&gt;&lt;/p&gt;
&lt;ol&gt;
&lt;li aria-level="1" style="list-style-type: decimal; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;Navigate to &lt;/span&gt;&lt;a href="https://console.cloud.google.com/managed-lustre/"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Managed Lustre in the Google Cloud console&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: decimal; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;Provision your Managed Lustre instance, choosing the performance tier and size that best fits your needs.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: decimal; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;Connect your compute instances, GKE clusters to your new high-performance file system.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;For detailed instructions and documentation, please visit the Managed Lustre &lt;/span&gt;&lt;a href="https://cloud.google.com/managed-lustre/docs/overview"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;documentation&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;. And if needed, &lt;/span&gt;&lt;a href="https://cloud.google.com/contact"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;reach out to Google Cloud sales specialists&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;.&lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Watch the Fireside Chat&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Don't miss the opportunity to learn more about the strategic partnership between Google Cloud and DDN, and the unique capabilities of Managed Lustre. Read the official DDN press release &lt;/span&gt;&lt;a href="https://www.ddn.com/press-releases/google-cloud-launches-general-availability-of-managed-lustre-powered-by-ddns-exascaler-technology/" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;here&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Watch the fireside chat with Sameet Agarwal, VP/GM Storage and Sven Oehme, CTO of DDN, &lt;/strong&gt;&lt;a href="https://www.youtube.com/watch?v=i6gEHUzIo1w" rel="noopener" target="_blank"&gt;&lt;strong style="text-decoration: underline; vertical-align: baseline;"&gt;here&lt;/strong&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;</description><pubDate>Tue, 08 Jul 2025 17:00:00 +0000</pubDate><guid>https://cloud.google.com/blog/products/storage-data-transfer/google-cloud-managed-lustre-for-ai-hpc/</guid><category>AI &amp; Machine Learning</category><category>HPC</category><category>Storage &amp; Data Transfer</category><og xmlns:og="http://ogp.me/ns#"><type>article</type><title>Accelerate your AI workloads with the Google Cloud Managed Lustre</title><description></description><site_name>Google</site_name><url>https://cloud.google.com/blog/products/storage-data-transfer/google-cloud-managed-lustre-for-ai-hpc/</url></og><author xmlns:author="http://www.w3.org/2005/Atom"><name>Asad Khan</name><title>Sr. Director of Product Management, Google Cloud</title><department></department><company></company></author><author xmlns:author="http://www.w3.org/2005/Atom"><name>Kirill Tropin</name><title>Group Product Manager</title><department></department><company></company></author></item><item><title>SandboxAQ: Accelerating drug discovery through cloud integration</title><link>https://cloud.google.com/blog/products/infrastructure-modernization/sandboxaq-speeds-up-drug-discovery-with-the-cloud/</link><description>&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;The traditional drug discovery process involves massive capital investments, prolonged timelines, and is plagued with daunting failure rates. From initial research to obtaining regulatory approval, bringing a new drug to market can take decades. During this time, many drug candidates that had seemed very promising fail to deliver, either due to inefficacy or safety concerns. Only a small fraction of candidates successfully make it through clinical trials and regulatory hurdles. &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Enter &lt;/span&gt;&lt;a href="https://www.sandboxaq.com/" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;SandboxAQ&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, which is helping researchers explore vast chemical spaces, gain deep insights into molecular interactions, and predict biological outcomes with precision. It does so with cutting-edge computational approaches such as active learning, &lt;/span&gt;&lt;a href="https://pubs.acs.org/doi/10.1021/acs.jctc.4c00399" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;absolute free energy perturbation solution (AQFEP)&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, &lt;/span&gt;&lt;a href="https://arxiv.org/abs/2405.11785" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;generative AI&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, structural analysis, and predictive data analytics, ultimately reducing drug discovery and development timelines. And it does all this on a cloud-native foundation. &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Drug design involves an iterative cycle of designing, synthesizing, and testing molecules referred to as the Design-Make-Test cycle. Many customers approach SandboxAQ during the design phase, often when their computational methods are falling short. By improving and accelerating this part of the cycle, SandboxAQ helps medicinal chemists bring innovative and effective molecules to market. For example, in a project related to neurodegenerative disease, SandboxAQ’s approach expanded chemical space from 250,000 to 5.6 million molecules, achieving a 30-fold increase in hit rate and dramatically accelerating the discovery of candidate molecules. &lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-image_full_width"&gt;






  
    &lt;div class="article-module h-c-page"&gt;
      &lt;div class="h-c-grid"&gt;
  

    &lt;figure class="article-image--large
      
      
        h-c-grid__col
        h-c-grid__col--6 h-c-grid__col--offset-3
        
        
      "
      &gt;

      
      
        
        &lt;img
            src="https://storage.googleapis.com/gweb-cloudblog-publish/images/1_OZJ38Qu.max-1000x1000.png"
        
          alt="1"&gt;
        
        &lt;/a&gt;
      
    &lt;/figure&gt;

  
      &lt;/div&gt;
    &lt;/div&gt;
  




&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Cloud-native development for scientific insight&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;SandboxAQ’s software relies on large-scale computation and to maximize flexibility and scale, they use a cloud strategy,  which includes Google Cloud infrastructure and tools. &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;The technologies in large-scale virtual screening campaigns need to be agile and scale cost-effectively. Specifically, SandboxAQ engineers need to be able to quickly iterate on scientific code, immediately run that code at scale cost-effectively, and store and organize all of the data it produces. &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;SandboxAQ achieved a significant boost in efficiency and scalability with Google Cloud infrastructure. They scaled their computational throughput by 100X to leverage tens of thousands of virtual machines (VMs) in parallel. They also improved utilization by reducing idle time by 90%. By consolidating development and deployment on Google Cloud, SandboxAQ streamlined its workflows, from code development and testing to large-scale batch processing and machine-learning model training. &lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-aside"&gt;&lt;dl&gt;
    &lt;dt&gt;aside_block&lt;/dt&gt;
    &lt;dd&gt;&amp;lt;ListValue: [StructValue([(&amp;#x27;title&amp;#x27;, &amp;#x27;Try Google Cloud for free&amp;#x27;), (&amp;#x27;body&amp;#x27;, &amp;lt;wagtail.rich_text.RichText object at 0x7f1758720d60&amp;gt;), (&amp;#x27;btn_text&amp;#x27;, &amp;#x27;Get started for free&amp;#x27;), (&amp;#x27;href&amp;#x27;, &amp;#x27;https://console.cloud.google.com/freetrial?redirectPath=/welcome&amp;#x27;), (&amp;#x27;image&amp;#x27;, None)])]&amp;gt;&lt;/dd&gt;
&lt;/dl&gt;&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;All of SandboxAQ’s development and deployment takes place in the cloud. Code and data live in cloud-based services, and development is done on a cloud-based platform that provides scientists and engineers with self-service VMs with standardized and centrally maintained environments and tools. This is important, because scientific code often requires heavy-duty computing hardware. Scientists have access to hefty 96-core machines, or instances with large GPUs. They can also create new machines with alternate configurations or CPU types as depicted below, enabling low-friction testing and development processes across heterogeneous resources.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-image_full_width"&gt;






  
    &lt;div class="article-module h-c-page"&gt;
      &lt;div class="h-c-grid"&gt;
  

    &lt;figure class="article-image--large
      
      
        h-c-grid__col
        h-c-grid__col--6 h-c-grid__col--offset-3
        
        
      "
      &gt;

      
      
        
        &lt;img
            src="https://storage.googleapis.com/gweb-cloudblog-publish/images/2_mgPMly4.max-1000x1000.png"
        
          alt="2"&gt;
        
        &lt;/a&gt;
      
    &lt;/figure&gt;

  
      &lt;/div&gt;
    &lt;/div&gt;
  




&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;SandboxAQ scientists and developers manage and access their Bench machines (see above) using the company’s `bench` client. They can connect to machines via SSH or use any number of managed tools, for example a browser-based VNC service for instant remote desktop, or JupyterLab for a familiar notebook development flow.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;As code is ready to be run at a larger scale, researchers can dispatch SandboxAQ parameterized sets of computations as jobs on an internal tool powered by &lt;/span&gt;&lt;a href="https://cloud.google.com/batch?e=48754805&amp;amp;hl=en"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Batch&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, a fully managed service to schedule, queue, and execute batch jobs on Google infrastructure. With development and batch runtime environments closely synced, changes can be quickly run at scale. Code developed on bench machines is pushed to GitHub and immediately available for batch execution. Then, as tools are reviewed and merged into `main` of the company’s monorepo, the new tools become automatically available on SandboxAQ scientists’ bench machines, who can launch parallel jobs processing millions of molecules on any kind of Google Cloud VM resource in any global zone, utilizing either on-demand or &lt;/span&gt;&lt;a href="https://cloud.google.com/compute/docs/instances/spot"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Spot VMs&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;SandboxAQ's implementation of a globally resolved transitive dependency tree, enables simple package and dependency management. With this practice, Google Batch can seamlessly integrate with individual tools developed by engineers to train many instances of a model in parallel.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Machine learning is a core component of SandoxAQ’s strategy, making easy data access especially important. At the same time, SandboxAQ’s Drug Discovery team also works with clients who have sensitive data. To secure customers’ data, bench and batch workloads read and write data from a unified interface that’s managed via IAM, allowing granular control of different data sources within the organization.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Meanwhile, Google Cloud services like Cloud Logging, Cloud Monitoring, Compute Engine and Cloud Run make it simple to develop tools to monitor these workloads, easily surface logs to SandboxAQ scientists, and comb through huge amounts of output data. As new features are tested or bugs show up, changes are made immediately available to the scientific team, without having to wrangle infrastructure. Then, as code becomes stable, they can incorporate it into downstream production applications, all in a centrally secured, unified way on Google Cloud.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;In short, having a unified development, batch compute, and production environment on Google Cloud reduces the friction SandboxAQ faces to develop new workloads and run them at scale. With shared environments for scientific workload development and engineering, SandboxAQ makes it quick and easy for customers to move from experimentation to production, delivering the results customers want, fast.&lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;SandboxAQ solution in the real world&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;SandboxAQ is already having a profound impact on drug discovery programs targeting a range of hard-to-treat diseases. For example, there are advanced collaborations with Professor Stanley Pruisner's lab at University of California San Francisco (&lt;/span&gt;&lt;a href="https://www.sandboxaq.com/press/sandboxaq-announces-bio-pharma-molecular-simulation-division-to-speed-life-saving-drugs-to-patients-through-ai-and-quantum-solutions" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;UCSF&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;), &lt;/span&gt;&lt;a href="https://www.sandboxaq.com/press/sandboxaq-announces-bio-pharma-molecular-simulation-division-to-speed-life-saving-drugs-to-patients-through-ai-and-quantum-solutions" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Riboscience&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, &lt;/span&gt;&lt;a href="https://www.sandboxaq.com/press/sandboxaq-selected-by-sanofi-for-quantitative-ai-driven-biomarker-identification" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Sanofi&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, and with the &lt;/span&gt;&lt;a href="https://www.sandboxaq.com/press/the-michael-j-fox-foundation-selects-sandboxaq-partner-for-25-million-initiative-to-develop-novel-parkinsons-disease-treatments" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Michael J Fox Foundation&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, to name a few. With this approach built on Google CloudSandboxAQ has achieved &lt;/span&gt;&lt;a href="https://www.sandboxaq.com/post/biopharmas-quantum-leap-2" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;a superior hit rate compared to other methods like high throughput screening&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, demonstrating the transformative potential of SandboxAQ on drug discovery and bringing cures to patients faster. &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Visit the &lt;/span&gt;&lt;a href="https://cloud.google.com/solutions/ai-hypercomputer?hl=en"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Google Cloud AI Hypercomputer web page&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; to learn about Google Cloud AI infrastructure.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;</description><pubDate>Tue, 29 Apr 2025 15:00:00 +0000</pubDate><guid>https://cloud.google.com/blog/products/infrastructure-modernization/sandboxaq-speeds-up-drug-discovery-with-the-cloud/</guid><category>AI &amp; Machine Learning</category><category>HPC</category><category>Infrastructure Modernization</category><og xmlns:og="http://ogp.me/ns#"><type>article</type><title>SandboxAQ: Accelerating drug discovery through cloud integration</title><description></description><site_name>Google</site_name><url>https://cloud.google.com/blog/products/infrastructure-modernization/sandboxaq-speeds-up-drug-discovery-with-the-cloud/</url></og><author xmlns:author="http://www.w3.org/2005/Atom"><name>Ruslan Mursalzade</name><title>Product Marketing Lead, Google Cloud AI Infrastructure</title><department></department><company></company></author></item><item><title>H4D VMs: Next-generation HPC-optimized VMs</title><link>https://cloud.google.com/blog/products/compute/new-h4d-vms-optimized-for-hpc/</link><description>&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;At Google Cloud Next, we introduced H4D VMs, our latest machine type for high performance computing (HPC). Building upon existing &lt;/span&gt;&lt;a href="https://cloud.google.com/compute/docs/compute-optimized-machines"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;HPC offerings&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, H4D VMs are designed to address the evolving needs of demanding workloads in industries such as manufacturing, weather forecasting, EDA, and healthcare and life sciences.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;H4D VMs are powered by the &lt;/span&gt;&lt;a href="https://www.amd.com/en/products/processors/server/epyc/9005-series.html" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;5th Generation AMD EPYC&lt;/span&gt;&lt;/a&gt;&lt;sup&gt;&lt;span style="vertical-align: baseline;"&gt;&lt;span style="vertical-align: super;"&gt;TM&lt;/span&gt;&lt;/span&gt;&lt;/sup&gt;&lt;span style="vertical-align: baseline;"&gt; Processors, offering improved &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;whole-node VM performance &lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;of more than 12,000 &lt;span style="vertical-align: baseline;"&gt;gflops&lt;/span&gt;&lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt; &lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;and&lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt; improved memory bandwidth &lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;of more than 950 GB/s&lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;.&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; H4D provides &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;low-latency and 200 Gbps network bandwidth using Cloud Remote Direct Memory Access (RDMA) on Titanium, &lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;the first of our CPU-based VMs to do so.&lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt; &lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;This powerful combination enables you to efficiently scale your HPC workloads and achieve insights faster. &lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-image_full_width"&gt;






  
    &lt;div class="article-module h-c-page"&gt;
      &lt;div class="h-c-grid"&gt;
  

    &lt;figure class="article-image--large
      
      
        h-c-grid__col
        h-c-grid__col--6 h-c-grid__col--offset-3
        
        
      "
      &gt;

      
      
        
        &lt;img
            src="https://storage.googleapis.com/gweb-cloudblog-publish/original_images/1_H4D_Performance_Overview_70YwFM8.max-2800x2800.jpg"
        
          alt="1 H4D Performance Overview"&gt;
        
        &lt;/a&gt;
      
        &lt;figcaption class="article-image__caption "&gt;&lt;p data-block-key="fvc15"&gt;VM and core performance, as well as memory bandwidth for H4D vs. C2D and C3D, showing generational improvement&lt;/p&gt;&lt;/figcaption&gt;
      
    &lt;/figure&gt;

  
      &lt;/div&gt;
    &lt;/div&gt;
  




&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;For open-source High-Performance Linpack (OSS-HPL), a widely-used benchmark for measuring the floating-point computing power of supercomputers, H4D offers 1.8x higher performance per VM and 1.6x higher performance per core compared to C3D. Additionally, H4D offers 5.8x higher performance per VM and 1.7x higher performance per core compared to C2D.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;For STREAM Triad, a benchmark to measure memory bandwidth, H4D offers 1.3x higher performance per VM and 1.4x higher performance per core compared to C3D. Additionally, H4D offers 3x higher performance per VM and 1.4x higher performance per core compared to C2D.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-aside"&gt;&lt;dl&gt;
    &lt;dt&gt;aside_block&lt;/dt&gt;
    &lt;dd&gt;&amp;lt;ListValue: [StructValue([(&amp;#x27;title&amp;#x27;, &amp;#x27;$300 in free credit to try Google Cloud infrastructure&amp;#x27;), (&amp;#x27;body&amp;#x27;, &amp;lt;wagtail.rich_text.RichText object at 0x7f1744736a00&amp;gt;), (&amp;#x27;btn_text&amp;#x27;, &amp;#x27;Start building for free&amp;#x27;), (&amp;#x27;href&amp;#x27;, &amp;#x27;http://console.cloud.google.com/freetrial?redirectPath=/compute&amp;#x27;), (&amp;#x27;image&amp;#x27;, None)])]&amp;gt;&lt;/dd&gt;
&lt;/dl&gt;&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Improved HPC application performance&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;H4D VMs deliver strong compute performance and memory bandwidth, significantly outperforming previous generations of AMD-based VMs like C2D and C3D, allowing for faster simulations and analysis, and delivering significant performance gains &lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;(relative to a prior generation AMD-based HPC VM, C2D)&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt; across various HPC applications and benchmarks, as illustrated below:&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;Manufacturing&lt;/span&gt;
&lt;ul&gt;
&lt;li&gt;&lt;span style="vertical-align: baseline;"&gt;CFD apps like Siemens&lt;/span&gt;&lt;sup&gt;&lt;span style="vertical-align: baseline;"&gt;&lt;span style="vertical-align: super;"&gt;TM&lt;/span&gt;&lt;/span&gt;&lt;/sup&gt;&lt;span style="vertical-align: baseline;"&gt; Simcenter STAR-CCM+&lt;/span&gt;&lt;sup&gt;&lt;span style="vertical-align: baseline;"&gt;&lt;span style="vertical-align: super;"&gt;TM&lt;/span&gt;&lt;/span&gt;&lt;/sup&gt;&lt;span style="vertical-align: baseline;"&gt;/HIMach show up to &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;3.6x&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; improvement.&lt;/span&gt;&lt;/li&gt;
&lt;li&gt;&lt;span style="vertical-align: baseline;"&gt;CFD apps like Ansys Fluent/f1_racecar_140 show up to &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;3.6x&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; improvement.&lt;/span&gt;&lt;/li&gt;
&lt;li&gt;&lt;span style="vertical-align: baseline;"&gt;FEA Explicit apps like Altair Radioss/T10m show up to &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;3.6x&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; improvement.&lt;/span&gt;&lt;/li&gt;
&lt;li&gt;&lt;span style="vertical-align: baseline;"&gt;CFD apps like OpenFoam/Motorbike_20m show up to &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;2.9x&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; improvement. &lt;/span&gt;&lt;/li&gt;
&lt;li&gt;&lt;span style="vertical-align: baseline;"&gt;FEA Implicit apps like Ansys Mechanical/gearbox shows up to &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;2.7x&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; improvement.&lt;/span&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;Healthcare and life sciences:&lt;/span&gt;
&lt;ul&gt;
&lt;li role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;Molecular Dynamics (GROMACS) shows up to &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;5x&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; improvement.&lt;/span&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;Weather forecasting&lt;/span&gt;
&lt;ul&gt;
&lt;li role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;Industry standard benchmark WRFv4 shows up to &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;3.6x&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; improvement.&lt;/span&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;&lt;/div&gt;
&lt;div class="block-image_full_width"&gt;






  
    &lt;div class="article-module h-c-page"&gt;
      &lt;div class="h-c-grid"&gt;
  

    &lt;figure class="article-image--large
      
      
        h-c-grid__col
        h-c-grid__col--6 h-c-grid__col--offset-3
        
        
      "
      &gt;

      
      
        
        &lt;img
            src="https://storage.googleapis.com/gweb-cloudblog-publish/original_images/2_H4D_Performance_Overview__0J9kfD9.jpg"
        
          alt="2 H4D Performance Overview"&gt;
        
        &lt;/a&gt;
      
        &lt;figcaption class="article-image__caption "&gt;&lt;p data-block-key="t80p8"&gt;Figure 2: Single VM HPC Application performance (speed-up) of H4D, C3D and C2D relative to C2D. Applications ran on single VMs using all cores.&lt;/p&gt;&lt;/figcaption&gt;
      
    &lt;/figure&gt;

  
      &lt;/div&gt;
    &lt;/div&gt;
  




&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;p style="padding-left: 40px;"&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;“Our deep collaboration with Google Cloud powers the next generation of cloud-based HPC with the announcement of the new H4D VMs. Google Cloud has leveraged the architectural advances of our 5&lt;/span&gt;&lt;sup&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;&lt;span style="vertical-align: super;"&gt;th&lt;/span&gt;&lt;/span&gt;&lt;/sup&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt; Gen AMD EPYC CPUs to create an offering that delivers impressive performance uplift compared to previous generations across a variety of HPC benchmarks. This will empower customers to achieve fast insights and accelerate their most demanding HPC workloads.” &lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;- Ram Peddibhotla, corporate vice president, Cloud Business, AMD&lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Faster HPC with Cloud RDMA on Titanium&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;H4D’s performance is made possible with Cloud RDMA, a new Titanium offload that’s available for the first time on these VMs. Cloud RDMA is specifically engineered to support HPC workloads that rely heavily on inter-node communication, such as computational fluid dynamics, weather modeling, molecular dynamics, and more. By offloading network processing, Cloud RDMA provides predictable, low-latency, high-bandwidth communication between compute nodes, thus minimizing host CPU bottlenecks. &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Under the hood, Cloud RDMA uses Google’s innovative &lt;/span&gt;&lt;a href="https://cloud.google.com/blog/topics/systems/introducing-falcon-a-reliable-low-latency-hardware-transport?e=48754805"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Falcon&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; hardware transport for reliable, low-latency communication over our Ethernet-based data center networks, effectively resolving the traditional challenges of RDMA over Ethernet while helping to ensure predictable, high performance at scale. &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Cloud RDMA over Falcon speeds up simulations by efficiently utilizing more computational resources. &lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;For example, for smaller CFD problems like OpenFoam/motorbike_20m and Simcenter Star-CCM+/HIMach10, which have limited inherent parallelism and are typically challenging to accelerate,&lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt; H4D results in 3.4x and 1.9x speedup, respectively, on four VMs compared to TCP.&lt;/strong&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-image_full_width"&gt;






  
    &lt;div class="article-module h-c-page"&gt;
      &lt;div class="h-c-grid"&gt;
  

    &lt;figure class="article-image--large
      
      
        h-c-grid__col
        h-c-grid__col--6 h-c-grid__col--offset-3
        
        
      "
      &gt;

      
      
        
        &lt;img
            src="https://storage.googleapis.com/gweb-cloudblog-publish/images/3_H4D_Performance_Overview_ACW0JRf.max-280.max-1000x1000.jpg"
        
          alt="3 H4D Performance Overview"&gt;
        
        &lt;/a&gt;
      
        &lt;figcaption class="article-image__caption "&gt;&lt;p data-block-key="t80p8"&gt;Figure 3: Left: OpenFoam/Motorbike_20m offers a 3.4x improvement with H4D Cloud RDMA over TCP at four VMs. Right: Simcenter STAR-CCM+/HIMach10 offers a 1.9x improvement with H4D Cloud RDMA over TCP at four VMs.&lt;/p&gt;&lt;/figcaption&gt;
      
    &lt;/figure&gt;

  
      &lt;/div&gt;
    &lt;/div&gt;
  




&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;For larger models, Falcon also helps maintain strong scaling. &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;Using 32 VMs, Falcon achieved a 2.8x speedup over TCP for GROMACS/Lignocellulose and a 1.3x speedup for WRFv4/Conus 2.5km.&lt;/strong&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-image_full_width"&gt;






  
    &lt;div class="article-module h-c-page"&gt;
      &lt;div class="h-c-grid"&gt;
  

    &lt;figure class="article-image--large
      
      
        h-c-grid__col
        h-c-grid__col--6 h-c-grid__col--offset-3
        
        
      "
      &gt;

      
      
        
        &lt;img
            src="https://storage.googleapis.com/gweb-cloudblog-publish/images/4_H4D_Performance_Overview_xD2Vaok.max-280.max-1000x1000.jpg"
        
          alt="4 H4D Performance Overview"&gt;
        
        &lt;/a&gt;
      
        &lt;figcaption class="article-image__caption "&gt;&lt;p data-block-key="t80p8"&gt;Figure 4: Left: GROMACS/Lignocellulose offers a 2.8x improvement with H4D Cloud RDMA over TCP at 32 VMs. Right: WRFv4/Conus 2.5km offers a 1.3x improvement with H4D Cloud RDMA over TCP at 32 VMs.&lt;/p&gt;&lt;/figcaption&gt;
      
    &lt;/figure&gt;

  
      &lt;/div&gt;
    &lt;/div&gt;
  




&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Cluster management and scheduling capabilities&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;H4D VMs will support both &lt;/span&gt;&lt;a href="https://cloud.google.com/blog/products/compute/introducing-dynamic-workload-scheduler"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Dynamic Workload Scheduler&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; (DWS) and Cluster Director (formerly known as Hypercompute Cluster).&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;DWS helps schedule HPC workloads for optimal performance and cost-effectiveness, providing resource availability for time-sensitive simulations and flexible HPC jobs.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Cluster Director, which lets you deploy and scale a large, physically-colocated accelerator cluster as a single unit, is now extending its capabilities to HPC environments. Cluster Director simplifies deploying and managing complex HPC clusters on H4D VMs by allowing researchers to easily set up and run large-scale simulations.&lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;VM sizes and regional availability&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;We offer H4D VMs in both standard and high-memory configurations to cater to diverse workload requirements. We also provide options with local SSD for workloads that demand high-speed storage, such as CPU-based seismic processing and structural mechanics applications (e.g., Abaqus, NASTRAN, Altair OptiStruct and Ansys Mechanical).&lt;br/&gt;&lt;br/&gt;&lt;/span&gt;&lt;/p&gt;
&lt;div align="left"&gt;
&lt;div style="color: #5f6368; overflow-x: auto; overflow-y: hidden; width: 100%;"&gt;
&lt;div style="color: #5f6368; overflow-x: auto; overflow-y: hidden; width: 100%;"&gt;
&lt;div style="color: #5f6368; overflow-x: auto; overflow-y: hidden; width: 100%;"&gt;
&lt;div style="color: #5f6368; overflow-x: auto; overflow-y: hidden; width: 100%;"&gt;
&lt;div style="color: #5f6368; overflow-x: auto; overflow-y: hidden; width: 100%;"&gt;
&lt;div style="color: #5f6368; overflow-x: auto; overflow-y: hidden; width: 100%;"&gt;
&lt;div style="color: #5f6368; overflow-x: auto; overflow-y: hidden; width: 100%;"&gt;
&lt;div style="color: #5f6368; overflow-x: auto; overflow-y: hidden; width: 100%;"&gt;
&lt;div style="color: #5f6368; overflow-x: auto; overflow-y: hidden; width: 100%;"&gt;
&lt;div style="color: #5f6368; overflow-x: auto; overflow-y: hidden; width: 100%;"&gt;
&lt;div style="color: #5f6368; overflow-x: auto; overflow-y: hidden; width: 100%;"&gt;
&lt;div style="color: #5f6368; overflow-x: auto; overflow-y: hidden; width: 100%;"&gt;
&lt;div style="color: #5f6368; overflow-x: auto; overflow-y: hidden; width: 100%;"&gt;&lt;table style="width: 98.1723%;"&gt;&lt;colgroup&gt;&lt;col style="width: 41.8699%;"/&gt;&lt;col style="width: 16.0569%;"/&gt;&lt;col style="width: 19.7154%;"/&gt;&lt;col style="width: 22.3577%;"/&gt;&lt;/colgroup&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p style="text-align: center;"&gt;&lt;strong style="vertical-align: baseline;"&gt;VM&lt;/strong&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p style="text-align: center;"&gt;&lt;strong style="vertical-align: baseline;"&gt;Cores&lt;/strong&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p style="text-align: center;"&gt;&lt;strong style="vertical-align: baseline;"&gt;Memory&lt;/strong&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p style="text-align: center;"&gt;&lt;strong style="vertical-align: baseline;"&gt;Local SSD&lt;/strong&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;h4d-highmem-192-lssd&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p style="text-align: center;"&gt;&lt;span data-rich-links='{"dde_di":"kix.v6nt3yh66eo1","dde-fdv":"192","dde-sii":"dropdownItem.qbe6z9jllztp","ddefe-ddi":{"cv":{"op":"set","opValue":[{"di-id":"dropdownItem.qbe6z9jllztp","di-v":"192","di-dv":"192","di-ts":{"ts_bd":false,"ts_fs":11,"ts_ff":"Arial","ts_it":false,"ts_sc":false,"ts_st":false,"ts_tw":400,"ts_un":false,"ts_va":"nor","ts_bgc2":{"clr_type":0,"hclr_color":null},"ts_fgc2":{"clr_type":0,"hclr_color":null},"ts_bd_i":false,"ts_fs_i":false,"ts_ff_i":false,"ts_it_i":false,"ts_sc_i":false,"ts_st_i":false,"ts_un_i":false,"ts_va_i":false,"ts_bgc2_i":false,"ts_fgc2_i":false},"di-cv":{"dicv_v":0,"dicv_ft":0}}]}},"ddefe-t":"Cores","type":"dropdown"}' style="vertical-align: baseline;"&gt; &lt;/span&gt;&lt;span data-rich-links='{"dde_di":"kix.v6nt3yh66eo1","dde-fdv":"192","dde-sii":"dropdownItem.qbe6z9jllztp","ddefe-ddi":{"cv":{"op":"set","opValue":[{"di-id":"dropdownItem.qbe6z9jllztp","di-v":"192","di-dv":"192","di-ts":{"ts_bd":false,"ts_fs":11,"ts_ff":"Arial","ts_it":false,"ts_sc":false,"ts_st":false,"ts_tw":400,"ts_un":false,"ts_va":"nor","ts_bgc2":{"clr_type":0,"hclr_color":null},"ts_fgc2":{"clr_type":0,"hclr_color":null},"ts_bd_i":false,"ts_fs_i":false,"ts_ff_i":false,"ts_it_i":false,"ts_sc_i":false,"ts_st_i":false,"ts_un_i":false,"ts_va_i":false,"ts_bgc2_i":false,"ts_fgc2_i":false},"di-cv":{"dicv_v":0,"dicv_ft":0}}]}},"ddefe-t":"Cores","type":"dropdown"}' style="vertical-align: baseline;"&gt;192&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p style="text-align: center;"&gt;&lt;span data-rich-links='{"dde_di":"kix.jrgcwdxdw0th","dde-fdv":"1488","dde-sii":"dropdownItem.uf9c917igrjp","ddefe-ddi":{"cv":{"op":"set","opValue":[{"di-id":"dropdownItem.uf9c917igrjp","di-v":"1488","di-dv":"1488","di-ts":{"ts_bd":false,"ts_fs":11,"ts_ff":"Arial","ts_it":false,"ts_sc":false,"ts_st":false,"ts_tw":400,"ts_un":false,"ts_va":"nor","ts_bgc2":{"clr_type":0,"hclr_color":"#e8eaed"},"ts_fgc2":{"clr_type":0,"hclr_color":"#000000"},"ts_bd_i":false,"ts_fs_i":false,"ts_ff_i":false,"ts_it_i":false,"ts_sc_i":false,"ts_st_i":false,"ts_un_i":false,"ts_va_i":false,"ts_bgc2_i":false,"ts_fgc2_i":false},"di-cv":{"dicv_v":0,"dicv_ft":0}},{"di-id":"dropdownItem.ctlrlddwu0uc","di-v":"720","di-dv":"720","di-ts":{"ts_bd":false,"ts_fs":11,"ts_ff":"Arial","ts_it":false,"ts_sc":false,"ts_st":false,"ts_tw":400,"ts_un":false,"ts_va":"nor","ts_bgc2":{"clr_type":0,"hclr_color":"#e8eaed"},"ts_fgc2":{"clr_type":0,"hclr_color":"#000000"},"ts_bd_i":false,"ts_fs_i":false,"ts_ff_i":false,"ts_it_i":false,"ts_sc_i":false,"ts_st_i":false,"ts_un_i":false,"ts_va_i":false,"ts_bgc2_i":false,"ts_fgc2_i":false},"di-cv":{"dicv_v":0,"dicv_ft":0}}]}},"ddefe-t":"Memory","type":"dropdown"}' style="vertical-align: baseline;"&gt; &lt;/span&gt;&lt;span data-rich-links='{"dde_di":"kix.jrgcwdxdw0th","dde-fdv":"1488","dde-sii":"dropdownItem.uf9c917igrjp","ddefe-ddi":{"cv":{"op":"set","opValue":[{"di-id":"dropdownItem.uf9c917igrjp","di-v":"1488","di-dv":"1488","di-ts":{"ts_bd":false,"ts_fs":11,"ts_ff":"Arial","ts_it":false,"ts_sc":false,"ts_st":false,"ts_tw":400,"ts_un":false,"ts_va":"nor","ts_bgc2":{"clr_type":0,"hclr_color":"#e8eaed"},"ts_fgc2":{"clr_type":0,"hclr_color":"#000000"},"ts_bd_i":false,"ts_fs_i":false,"ts_ff_i":false,"ts_it_i":false,"ts_sc_i":false,"ts_st_i":false,"ts_un_i":false,"ts_va_i":false,"ts_bgc2_i":false,"ts_fgc2_i":false},"di-cv":{"dicv_v":0,"dicv_ft":0}},{"di-id":"dropdownItem.ctlrlddwu0uc","di-v":"720","di-dv":"720","di-ts":{"ts_bd":false,"ts_fs":11,"ts_ff":"Arial","ts_it":false,"ts_sc":false,"ts_st":false,"ts_tw":400,"ts_un":false,"ts_va":"nor","ts_bgc2":{"clr_type":0,"hclr_color":"#e8eaed"},"ts_fgc2":{"clr_type":0,"hclr_color":"#000000"},"ts_bd_i":false,"ts_fs_i":false,"ts_ff_i":false,"ts_it_i":false,"ts_sc_i":false,"ts_st_i":false,"ts_un_i":false,"ts_va_i":false,"ts_bgc2_i":false,"ts_fgc2_i":false},"di-cv":{"dicv_v":0,"dicv_ft":0}}]}},"ddefe-t":"Memory","type":"dropdown"}' style="vertical-align: baseline;"&gt;1488&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p style="text-align: center;"&gt;&lt;span data-rich-links='{"dde_di":"kix.avrzv42pizzx","dde-fdv":"3.75TB","dde-sii":"dropdownItem.y7zm1kyovrnt","ddefe-ddi":{"cv":{"op":"set","opValue":[{"di-id":"dropdownItem.y7zm1kyovrnt","di-v":"3.75TB","di-dv":"3.75TB","di-ts":{"ts_bd":false,"ts_fs":11,"ts_ff":"Arial","ts_it":false,"ts_sc":false,"ts_st":false,"ts_tw":400,"ts_un":false,"ts_va":"nor","ts_bgc2":{"clr_type":0,"hclr_color":null},"ts_fgc2":{"clr_type":0,"hclr_color":null},"ts_bd_i":false,"ts_fs_i":false,"ts_ff_i":false,"ts_it_i":false,"ts_sc_i":false,"ts_st_i":false,"ts_un_i":false,"ts_va_i":false,"ts_bgc2_i":false,"ts_fgc2_i":false},"di-cv":{"dicv_v":0,"dicv_ft":0}},{"di-id":"dropdownItem.im98eal11wpt","di-v":"N/A","di-dv":"N/A","di-ts":{"ts_bd":false,"ts_fs":11,"ts_ff":"Arial","ts_it":false,"ts_sc":false,"ts_st":false,"ts_tw":400,"ts_un":false,"ts_va":"nor","ts_bgc2":{"clr_type":0,"hclr_color":null},"ts_fgc2":{"clr_type":0,"hclr_color":null},"ts_bd_i":false,"ts_fs_i":false,"ts_ff_i":false,"ts_it_i":false,"ts_sc_i":false,"ts_st_i":false,"ts_un_i":false,"ts_va_i":false,"ts_bgc2_i":false,"ts_fgc2_i":false},"di-cv":{"dicv_v":0,"dicv_ft":0}}]}},"ddefe-t":"Local SSD","type":"dropdown"}' style="vertical-align: baseline;"&gt; &lt;/span&gt;&lt;span data-rich-links='{"dde_di":"kix.avrzv42pizzx","dde-fdv":"3.75TB","dde-sii":"dropdownItem.y7zm1kyovrnt","ddefe-ddi":{"cv":{"op":"set","opValue":[{"di-id":"dropdownItem.y7zm1kyovrnt","di-v":"3.75TB","di-dv":"3.75TB","di-ts":{"ts_bd":false,"ts_fs":11,"ts_ff":"Arial","ts_it":false,"ts_sc":false,"ts_st":false,"ts_tw":400,"ts_un":false,"ts_va":"nor","ts_bgc2":{"clr_type":0,"hclr_color":null},"ts_fgc2":{"clr_type":0,"hclr_color":null},"ts_bd_i":false,"ts_fs_i":false,"ts_ff_i":false,"ts_it_i":false,"ts_sc_i":false,"ts_st_i":false,"ts_un_i":false,"ts_va_i":false,"ts_bgc2_i":false,"ts_fgc2_i":false},"di-cv":{"dicv_v":0,"dicv_ft":0}},{"di-id":"dropdownItem.im98eal11wpt","di-v":"N/A","di-dv":"N/A","di-ts":{"ts_bd":false,"ts_fs":11,"ts_ff":"Arial","ts_it":false,"ts_sc":false,"ts_st":false,"ts_tw":400,"ts_un":false,"ts_va":"nor","ts_bgc2":{"clr_type":0,"hclr_color":null},"ts_fgc2":{"clr_type":0,"hclr_color":null},"ts_bd_i":false,"ts_fs_i":false,"ts_ff_i":false,"ts_it_i":false,"ts_sc_i":false,"ts_st_i":false,"ts_un_i":false,"ts_va_i":false,"ts_bgc2_i":false,"ts_fgc2_i":false},"di-cv":{"dicv_v":0,"dicv_ft":0}}]}},"ddefe-t":"Local SSD","type":"dropdown"}' style="vertical-align: baseline;"&gt;3.75TB&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;h4d-standard-192&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p style="text-align: center;"&gt;&lt;span data-rich-links='{"dde_di":"kix.v6nt3yh66eo1","dde-fdv":"192","dde-sii":"dropdownItem.qbe6z9jllztp","ddefe-ddi":{"cv":{"op":"set","opValue":[{"di-id":"dropdownItem.qbe6z9jllztp","di-v":"192","di-dv":"192","di-ts":{"ts_bd":false,"ts_fs":11,"ts_ff":"Arial","ts_it":false,"ts_sc":false,"ts_st":false,"ts_tw":400,"ts_un":false,"ts_va":"nor","ts_bgc2":{"clr_type":0,"hclr_color":null},"ts_fgc2":{"clr_type":0,"hclr_color":null},"ts_bd_i":false,"ts_fs_i":false,"ts_ff_i":false,"ts_it_i":false,"ts_sc_i":false,"ts_st_i":false,"ts_un_i":false,"ts_va_i":false,"ts_bgc2_i":false,"ts_fgc2_i":false},"di-cv":{"dicv_v":0,"dicv_ft":0}}]}},"ddefe-t":"Cores","type":"dropdown"}' style="vertical-align: baseline;"&gt; &lt;/span&gt;&lt;span data-rich-links='{"dde_di":"kix.v6nt3yh66eo1","dde-fdv":"192","dde-sii":"dropdownItem.qbe6z9jllztp","ddefe-ddi":{"cv":{"op":"set","opValue":[{"di-id":"dropdownItem.qbe6z9jllztp","di-v":"192","di-dv":"192","di-ts":{"ts_bd":false,"ts_fs":11,"ts_ff":"Arial","ts_it":false,"ts_sc":false,"ts_st":false,"ts_tw":400,"ts_un":false,"ts_va":"nor","ts_bgc2":{"clr_type":0,"hclr_color":null},"ts_fgc2":{"clr_type":0,"hclr_color":null},"ts_bd_i":false,"ts_fs_i":false,"ts_ff_i":false,"ts_it_i":false,"ts_sc_i":false,"ts_st_i":false,"ts_un_i":false,"ts_va_i":false,"ts_bgc2_i":false,"ts_fgc2_i":false},"di-cv":{"dicv_v":0,"dicv_ft":0}}]}},"ddefe-t":"Cores","type":"dropdown"}' style="vertical-align: baseline;"&gt;192&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p style="text-align: center;"&gt;&lt;span data-rich-links='{"dde_di":"kix.jrgcwdxdw0th","dde-fdv":"720","dde-sii":"dropdownItem.ctlrlddwu0uc","ddefe-ddi":{"cv":{"op":"set","opValue":[{"di-id":"dropdownItem.uf9c917igrjp","di-v":"1488","di-dv":"1488","di-ts":{"ts_bd":false,"ts_fs":11,"ts_ff":"Arial","ts_it":false,"ts_sc":false,"ts_st":false,"ts_tw":400,"ts_un":false,"ts_va":"nor","ts_bgc2":{"clr_type":0,"hclr_color":"#e8eaed"},"ts_fgc2":{"clr_type":0,"hclr_color":"#000000"},"ts_bd_i":false,"ts_fs_i":false,"ts_ff_i":false,"ts_it_i":false,"ts_sc_i":false,"ts_st_i":false,"ts_un_i":false,"ts_va_i":false,"ts_bgc2_i":false,"ts_fgc2_i":false},"di-cv":{"dicv_v":0,"dicv_ft":0}},{"di-id":"dropdownItem.ctlrlddwu0uc","di-v":"720","di-dv":"720","di-ts":{"ts_bd":false,"ts_fs":11,"ts_ff":"Arial","ts_it":false,"ts_sc":false,"ts_st":false,"ts_tw":400,"ts_un":false,"ts_va":"nor","ts_bgc2":{"clr_type":0,"hclr_color":"#e8eaed"},"ts_fgc2":{"clr_type":0,"hclr_color":"#000000"},"ts_bd_i":false,"ts_fs_i":false,"ts_ff_i":false,"ts_it_i":false,"ts_sc_i":false,"ts_st_i":false,"ts_un_i":false,"ts_va_i":false,"ts_bgc2_i":false,"ts_fgc2_i":false},"di-cv":{"dicv_v":0,"dicv_ft":0}}]}},"ddefe-t":"Memory","type":"dropdown"}' style="vertical-align: baseline;"&gt; &lt;/span&gt;&lt;span data-rich-links='{"dde_di":"kix.jrgcwdxdw0th","dde-fdv":"720","dde-sii":"dropdownItem.ctlrlddwu0uc","ddefe-ddi":{"cv":{"op":"set","opValue":[{"di-id":"dropdownItem.uf9c917igrjp","di-v":"1488","di-dv":"1488","di-ts":{"ts_bd":false,"ts_fs":11,"ts_ff":"Arial","ts_it":false,"ts_sc":false,"ts_st":false,"ts_tw":400,"ts_un":false,"ts_va":"nor","ts_bgc2":{"clr_type":0,"hclr_color":"#e8eaed"},"ts_fgc2":{"clr_type":0,"hclr_color":"#000000"},"ts_bd_i":false,"ts_fs_i":false,"ts_ff_i":false,"ts_it_i":false,"ts_sc_i":false,"ts_st_i":false,"ts_un_i":false,"ts_va_i":false,"ts_bgc2_i":false,"ts_fgc2_i":false},"di-cv":{"dicv_v":0,"dicv_ft":0}},{"di-id":"dropdownItem.ctlrlddwu0uc","di-v":"720","di-dv":"720","di-ts":{"ts_bd":false,"ts_fs":11,"ts_ff":"Arial","ts_it":false,"ts_sc":false,"ts_st":false,"ts_tw":400,"ts_un":false,"ts_va":"nor","ts_bgc2":{"clr_type":0,"hclr_color":"#e8eaed"},"ts_fgc2":{"clr_type":0,"hclr_color":"#000000"},"ts_bd_i":false,"ts_fs_i":false,"ts_ff_i":false,"ts_it_i":false,"ts_sc_i":false,"ts_st_i":false,"ts_un_i":false,"ts_va_i":false,"ts_bgc2_i":false,"ts_fgc2_i":false},"di-cv":{"dicv_v":0,"dicv_ft":0}}]}},"ddefe-t":"Memory","type":"dropdown"}' style="vertical-align: baseline;"&gt;720&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p style="text-align: center;"&gt;&lt;span data-rich-links='{"dde_di":"kix.avrzv42pizzx","dde-fdv":"N/A","dde-sii":"dropdownItem.im98eal11wpt","ddefe-ddi":{"cv":{"op":"set","opValue":[{"di-id":"dropdownItem.y7zm1kyovrnt","di-v":"3.75TB","di-dv":"3.75TB","di-ts":{"ts_bd":false,"ts_fs":11,"ts_ff":"Arial","ts_it":false,"ts_sc":false,"ts_st":false,"ts_tw":400,"ts_un":false,"ts_va":"nor","ts_bgc2":{"clr_type":0,"hclr_color":null},"ts_fgc2":{"clr_type":0,"hclr_color":null},"ts_bd_i":false,"ts_fs_i":false,"ts_ff_i":false,"ts_it_i":false,"ts_sc_i":false,"ts_st_i":false,"ts_un_i":false,"ts_va_i":false,"ts_bgc2_i":false,"ts_fgc2_i":false},"di-cv":{"dicv_v":0,"dicv_ft":0}},{"di-id":"dropdownItem.im98eal11wpt","di-v":"N/A","di-dv":"N/A","di-ts":{"ts_bd":false,"ts_fs":11,"ts_ff":"Arial","ts_it":false,"ts_sc":false,"ts_st":false,"ts_tw":400,"ts_un":false,"ts_va":"nor","ts_bgc2":{"clr_type":0,"hclr_color":null},"ts_fgc2":{"clr_type":0,"hclr_color":null},"ts_bd_i":false,"ts_fs_i":false,"ts_ff_i":false,"ts_it_i":false,"ts_sc_i":false,"ts_st_i":false,"ts_un_i":false,"ts_va_i":false,"ts_bgc2_i":false,"ts_fgc2_i":false},"di-cv":{"dicv_v":0,"dicv_ft":0}}]}},"ddefe-t":"Local SSD","type":"dropdown"}' style="vertical-align: baseline;"&gt; &lt;/span&gt;&lt;span data-rich-links='{"dde_di":"kix.avrzv42pizzx","dde-fdv":"N/A","dde-sii":"dropdownItem.im98eal11wpt","ddefe-ddi":{"cv":{"op":"set","opValue":[{"di-id":"dropdownItem.y7zm1kyovrnt","di-v":"3.75TB","di-dv":"3.75TB","di-ts":{"ts_bd":false,"ts_fs":11,"ts_ff":"Arial","ts_it":false,"ts_sc":false,"ts_st":false,"ts_tw":400,"ts_un":false,"ts_va":"nor","ts_bgc2":{"clr_type":0,"hclr_color":null},"ts_fgc2":{"clr_type":0,"hclr_color":null},"ts_bd_i":false,"ts_fs_i":false,"ts_ff_i":false,"ts_it_i":false,"ts_sc_i":false,"ts_st_i":false,"ts_un_i":false,"ts_va_i":false,"ts_bgc2_i":false,"ts_fgc2_i":false},"di-cv":{"dicv_v":0,"dicv_ft":0}},{"di-id":"dropdownItem.im98eal11wpt","di-v":"N/A","di-dv":"N/A","di-ts":{"ts_bd":false,"ts_fs":11,"ts_ff":"Arial","ts_it":false,"ts_sc":false,"ts_st":false,"ts_tw":400,"ts_un":false,"ts_va":"nor","ts_bgc2":{"clr_type":0,"hclr_color":null},"ts_fgc2":{"clr_type":0,"hclr_color":null},"ts_bd_i":false,"ts_fs_i":false,"ts_ff_i":false,"ts_it_i":false,"ts_sc_i":false,"ts_st_i":false,"ts_un_i":false,"ts_va_i":false,"ts_bgc2_i":false,"ts_fgc2_i":false},"di-cv":{"dicv_v":0,"dicv_ft":0}}]}},"ddefe-t":"Local SSD","type":"dropdown"}' style="vertical-align: baseline;"&gt;N/A&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;h4d-highmem-192&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p style="text-align: center;"&gt;&lt;span data-rich-links='{"dde_di":"kix.v6nt3yh66eo1","dde-fdv":"192","dde-sii":"dropdownItem.qbe6z9jllztp","ddefe-ddi":{"cv":{"op":"set","opValue":[{"di-id":"dropdownItem.qbe6z9jllztp","di-v":"192","di-dv":"192","di-ts":{"ts_bd":false,"ts_fs":11,"ts_ff":"Arial","ts_it":false,"ts_sc":false,"ts_st":false,"ts_tw":400,"ts_un":false,"ts_va":"nor","ts_bgc2":{"clr_type":0,"hclr_color":null},"ts_fgc2":{"clr_type":0,"hclr_color":null},"ts_bd_i":false,"ts_fs_i":false,"ts_ff_i":false,"ts_it_i":false,"ts_sc_i":false,"ts_st_i":false,"ts_un_i":false,"ts_va_i":false,"ts_bgc2_i":false,"ts_fgc2_i":false},"di-cv":{"dicv_v":0,"dicv_ft":0}}]}},"ddefe-t":"Cores","type":"dropdown"}' style="vertical-align: baseline;"&gt; &lt;/span&gt;&lt;span data-rich-links='{"dde_di":"kix.v6nt3yh66eo1","dde-fdv":"192","dde-sii":"dropdownItem.qbe6z9jllztp","ddefe-ddi":{"cv":{"op":"set","opValue":[{"di-id":"dropdownItem.qbe6z9jllztp","di-v":"192","di-dv":"192","di-ts":{"ts_bd":false,"ts_fs":11,"ts_ff":"Arial","ts_it":false,"ts_sc":false,"ts_st":false,"ts_tw":400,"ts_un":false,"ts_va":"nor","ts_bgc2":{"clr_type":0,"hclr_color":null},"ts_fgc2":{"clr_type":0,"hclr_color":null},"ts_bd_i":false,"ts_fs_i":false,"ts_ff_i":false,"ts_it_i":false,"ts_sc_i":false,"ts_st_i":false,"ts_un_i":false,"ts_va_i":false,"ts_bgc2_i":false,"ts_fgc2_i":false},"di-cv":{"dicv_v":0,"dicv_ft":0}}]}},"ddefe-t":"Cores","type":"dropdown"}' style="vertical-align: baseline;"&gt;192&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p style="text-align: center;"&gt;&lt;span data-rich-links='{"dde_di":"kix.jrgcwdxdw0th","dde-fdv":"1488","dde-sii":"dropdownItem.uf9c917igrjp","ddefe-ddi":{"cv":{"op":"set","opValue":[{"di-id":"dropdownItem.uf9c917igrjp","di-v":"1488","di-dv":"1488","di-ts":{"ts_bd":false,"ts_fs":11,"ts_ff":"Arial","ts_it":false,"ts_sc":false,"ts_st":false,"ts_tw":400,"ts_un":false,"ts_va":"nor","ts_bgc2":{"clr_type":0,"hclr_color":"#e8eaed"},"ts_fgc2":{"clr_type":0,"hclr_color":"#000000"},"ts_bd_i":false,"ts_fs_i":false,"ts_ff_i":false,"ts_it_i":false,"ts_sc_i":false,"ts_st_i":false,"ts_un_i":false,"ts_va_i":false,"ts_bgc2_i":false,"ts_fgc2_i":false},"di-cv":{"dicv_v":0,"dicv_ft":0}},{"di-id":"dropdownItem.ctlrlddwu0uc","di-v":"720","di-dv":"720","di-ts":{"ts_bd":false,"ts_fs":11,"ts_ff":"Arial","ts_it":false,"ts_sc":false,"ts_st":false,"ts_tw":400,"ts_un":false,"ts_va":"nor","ts_bgc2":{"clr_type":0,"hclr_color":"#e8eaed"},"ts_fgc2":{"clr_type":0,"hclr_color":"#000000"},"ts_bd_i":false,"ts_fs_i":false,"ts_ff_i":false,"ts_it_i":false,"ts_sc_i":false,"ts_st_i":false,"ts_un_i":false,"ts_va_i":false,"ts_bgc2_i":false,"ts_fgc2_i":false},"di-cv":{"dicv_v":0,"dicv_ft":0}}]}},"ddefe-t":"Memory","type":"dropdown"}' style="vertical-align: baseline;"&gt; &lt;/span&gt;&lt;span data-rich-links='{"dde_di":"kix.jrgcwdxdw0th","dde-fdv":"1488","dde-sii":"dropdownItem.uf9c917igrjp","ddefe-ddi":{"cv":{"op":"set","opValue":[{"di-id":"dropdownItem.uf9c917igrjp","di-v":"1488","di-dv":"1488","di-ts":{"ts_bd":false,"ts_fs":11,"ts_ff":"Arial","ts_it":false,"ts_sc":false,"ts_st":false,"ts_tw":400,"ts_un":false,"ts_va":"nor","ts_bgc2":{"clr_type":0,"hclr_color":"#e8eaed"},"ts_fgc2":{"clr_type":0,"hclr_color":"#000000"},"ts_bd_i":false,"ts_fs_i":false,"ts_ff_i":false,"ts_it_i":false,"ts_sc_i":false,"ts_st_i":false,"ts_un_i":false,"ts_va_i":false,"ts_bgc2_i":false,"ts_fgc2_i":false},"di-cv":{"dicv_v":0,"dicv_ft":0}},{"di-id":"dropdownItem.ctlrlddwu0uc","di-v":"720","di-dv":"720","di-ts":{"ts_bd":false,"ts_fs":11,"ts_ff":"Arial","ts_it":false,"ts_sc":false,"ts_st":false,"ts_tw":400,"ts_un":false,"ts_va":"nor","ts_bgc2":{"clr_type":0,"hclr_color":"#e8eaed"},"ts_fgc2":{"clr_type":0,"hclr_color":"#000000"},"ts_bd_i":false,"ts_fs_i":false,"ts_ff_i":false,"ts_it_i":false,"ts_sc_i":false,"ts_st_i":false,"ts_un_i":false,"ts_va_i":false,"ts_bgc2_i":false,"ts_fgc2_i":false},"di-cv":{"dicv_v":0,"dicv_ft":0}}]}},"ddefe-t":"Memory","type":"dropdown"}' style="vertical-align: baseline;"&gt;1488&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p style="text-align: center;"&gt;&lt;span data-rich-links='{"dde_di":"kix.avrzv42pizzx","dde-fdv":"N/A","dde-sii":"dropdownItem.im98eal11wpt","ddefe-ddi":{"cv":{"op":"set","opValue":[{"di-id":"dropdownItem.y7zm1kyovrnt","di-v":"3.75TB","di-dv":"3.75TB","di-ts":{"ts_bd":false,"ts_fs":11,"ts_ff":"Arial","ts_it":false,"ts_sc":false,"ts_st":false,"ts_tw":400,"ts_un":false,"ts_va":"nor","ts_bgc2":{"clr_type":0,"hclr_color":null},"ts_fgc2":{"clr_type":0,"hclr_color":null},"ts_bd_i":false,"ts_fs_i":false,"ts_ff_i":false,"ts_it_i":false,"ts_sc_i":false,"ts_st_i":false,"ts_un_i":false,"ts_va_i":false,"ts_bgc2_i":false,"ts_fgc2_i":false},"di-cv":{"dicv_v":0,"dicv_ft":0}},{"di-id":"dropdownItem.im98eal11wpt","di-v":"N/A","di-dv":"N/A","di-ts":{"ts_bd":false,"ts_fs":11,"ts_ff":"Arial","ts_it":false,"ts_sc":false,"ts_st":false,"ts_tw":400,"ts_un":false,"ts_va":"nor","ts_bgc2":{"clr_type":0,"hclr_color":null},"ts_fgc2":{"clr_type":0,"hclr_color":null},"ts_bd_i":false,"ts_fs_i":false,"ts_ff_i":false,"ts_it_i":false,"ts_sc_i":false,"ts_st_i":false,"ts_un_i":false,"ts_va_i":false,"ts_bgc2_i":false,"ts_fgc2_i":false},"di-cv":{"dicv_v":0,"dicv_ft":0}}]}},"ddefe-t":"Local SSD","type":"dropdown"}' style="vertical-align: baseline;"&gt; &lt;/span&gt;&lt;span data-rich-links='{"dde_di":"kix.avrzv42pizzx","dde-fdv":"N/A","dde-sii":"dropdownItem.im98eal11wpt","ddefe-ddi":{"cv":{"op":"set","opValue":[{"di-id":"dropdownItem.y7zm1kyovrnt","di-v":"3.75TB","di-dv":"3.75TB","di-ts":{"ts_bd":false,"ts_fs":11,"ts_ff":"Arial","ts_it":false,"ts_sc":false,"ts_st":false,"ts_tw":400,"ts_un":false,"ts_va":"nor","ts_bgc2":{"clr_type":0,"hclr_color":null},"ts_fgc2":{"clr_type":0,"hclr_color":null},"ts_bd_i":false,"ts_fs_i":false,"ts_ff_i":false,"ts_it_i":false,"ts_sc_i":false,"ts_st_i":false,"ts_un_i":false,"ts_va_i":false,"ts_bgc2_i":false,"ts_fgc2_i":false},"di-cv":{"dicv_v":0,"dicv_ft":0}},{"di-id":"dropdownItem.im98eal11wpt","di-v":"N/A","di-dv":"N/A","di-ts":{"ts_bd":false,"ts_fs":11,"ts_ff":"Arial","ts_it":false,"ts_sc":false,"ts_st":false,"ts_tw":400,"ts_un":false,"ts_va":"nor","ts_bgc2":{"clr_type":0,"hclr_color":null},"ts_fgc2":{"clr_type":0,"hclr_color":null},"ts_bd_i":false,"ts_fs_i":false,"ts_ff_i":false,"ts_it_i":false,"ts_sc_i":false,"ts_st_i":false,"ts_un_i":false,"ts_va_i":false,"ts_bgc2_i":false,"ts_fgc2_i":false},"di-cv":{"dicv_v":0,"dicv_ft":0}}]}},"ddefe-t":"Local SSD","type":"dropdown"}' style="vertical-align: baseline;"&gt;N/A&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;H4D VMs are currently available in us-central1-a (Iowa), and europe-west4-b (Netherlands), with additional regions in progress. &lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;What our customers and partners are saying&lt;/strong&gt;&lt;/h3&gt;&lt;/div&gt;
&lt;div class="block-paragraph_with_image"&gt;&lt;div class="article-module h-c-page"&gt;
  &lt;div class="h-c-grid uni-paragraph-wrap"&gt;
    &lt;div class="uni-paragraph
      h-c-grid__col h-c-grid__col--8 h-c-grid__col-m--6 h-c-grid__col-l--6
      h-c-grid__col--offset-2 h-c-grid__col-m--offset-3 h-c-grid__col-l--offset-3"&gt;

      






  

    &lt;figure class="article-image--wrap-small
      
      "
      &gt;

      
      
        
        &lt;img
            src="https://storage.googleapis.com/gweb-cloudblog-publish/images/harvard.max-1000x1000.jpg"
        
          alt="harvard"&gt;
        
        &lt;/a&gt;
      
    &lt;/figure&gt;

  





      &lt;p data-block-key="u212f"&gt;&lt;i&gt;"With the power of Google's new H4D-based clusters, we are poised to simulate systems approaching a trillion particles, unlocking unprecedented insights into circulatory functions and diseases. This leap in computational capability will dramatically accelerate our pursuit of breakthrough therapeutics, bringing us closer to effective precision therapies for blood vessel damage in heart disease."&lt;/i&gt; -&lt;b&gt; Petros Koumoutsakos, Jr. Professor of Computing in Science and Engineering, Harvard University&lt;/b&gt;&lt;/p&gt;
    &lt;/div&gt;
  &lt;/div&gt;
&lt;/div&gt;

&lt;/div&gt;
&lt;div class="block-paragraph_with_image"&gt;&lt;div class="article-module h-c-page"&gt;
  &lt;div class="h-c-grid uni-paragraph-wrap"&gt;
    &lt;div class="uni-paragraph
      h-c-grid__col h-c-grid__col--8 h-c-grid__col-m--6 h-c-grid__col-l--6
      h-c-grid__col--offset-2 h-c-grid__col-m--offset-3 h-c-grid__col-l--offset-3"&gt;

      






  

    &lt;figure class="article-image--wrap-small
      
      "
      &gt;

      
      
        
        &lt;img
            src="https://storage.googleapis.com/gweb-cloudblog-publish/images/ansys_yPNqr91.max-1000x1000.jpg"
        
          alt="ansys"&gt;
        
        &lt;/a&gt;
      
    &lt;/figure&gt;

  





      &lt;p data-block-key="u67zu"&gt;&lt;i&gt;“The launch of Google Cloud's H4D platform marks a significant advancement in engineering simulation. As GCP’s first VM with RDMA over Ethernet, combined with higher memory bandwidth, generous L3 cache, and AVX-512 instruction support, H4D delivers up to 3.6x better performance for Ansys Fluent simulations compared to C2D VMs. This performance boost allows our customers to run simulations faster, explore a wider range of design options, and drive innovation with greater efficiency.”&lt;/i&gt; - &lt;b&gt;Wim Slagter, Senior Director of Partner Programs, Ansys&lt;/b&gt;&lt;/p&gt;
    &lt;/div&gt;
  &lt;/div&gt;
&lt;/div&gt;

&lt;/div&gt;
&lt;div class="block-paragraph_with_image"&gt;&lt;div class="article-module h-c-page"&gt;
  &lt;div class="h-c-grid uni-paragraph-wrap"&gt;
    &lt;div class="uni-paragraph
      h-c-grid__col h-c-grid__col--8 h-c-grid__col-m--6 h-c-grid__col-l--6
      h-c-grid__col--offset-2 h-c-grid__col-m--offset-3 h-c-grid__col-l--offset-3"&gt;

      






  

    &lt;figure class="article-image--wrap-small
      
      "
      &gt;

      
      
        
        &lt;img
            src="https://storage.googleapis.com/gweb-cloudblog-publish/images/Altair.max-1000x1000.jpg"
        
          alt="Altair.jpg"&gt;
        
        &lt;/a&gt;
      
    &lt;/figure&gt;

  





      &lt;p data-block-key="wk7cf"&gt;&lt;i&gt;"The generational performance leap achieved with Google H4D VMs, powered by the 5th Generation AMD EPYC™, is truly remarkable. For compute-intensive, highly non-linear simulations, such as car crash analysis, Altair® Radioss® delivers a stunning 3.6x speedup. This breakthrough paves the way for faster and more accurate simulations, which is crucial for our customers in the era of the digital thread!”&lt;/i&gt; –&lt;b&gt; Eric Lequiniou, SVP Radioss Development and Altair Solvers HPC&lt;/b&gt;&lt;/p&gt;
    &lt;/div&gt;
  &lt;/div&gt;
&lt;/div&gt;

&lt;/div&gt;
&lt;div class="block-paragraph_with_image"&gt;&lt;div class="article-module h-c-page"&gt;
  &lt;div class="h-c-grid uni-paragraph-wrap"&gt;
    &lt;div class="uni-paragraph
      h-c-grid__col h-c-grid__col--8 h-c-grid__col-m--6 h-c-grid__col-l--6
      h-c-grid__col--offset-2 h-c-grid__col-m--offset-3 h-c-grid__col-l--offset-3"&gt;

      






  

    &lt;figure class="article-image--wrap-small
      
      "
      &gt;

      
      
        
        &lt;img
            src="https://storage.googleapis.com/gweb-cloudblog-publish/images/siemens.max-1000x1000.jpg"
        
          alt="siemens"&gt;
        
        &lt;/a&gt;
      
    &lt;/figure&gt;

  





      &lt;p data-block-key="wk7cf"&gt;&lt;i&gt;“The latest H4D VMs, powered by 5th Generation AMD EPYC Processors and Cloud RDMA, allow our customers to realize faster time-to-results for their Simcenter STAR-CCM+ simulations. For HIMach10, we’re seeing up to 3.6x performance gains compared to the C2D instance and 1.9x speedup on four H4D Cloud RDMA VMs compared to TCP. Our partnership with Google has been key to achieving these reduced simulation times.”&lt;/i&gt; &lt;b&gt;- Lisa Mesaros, Vice President, Simcenter Solution Domains Product Management, Siemens&lt;/b&gt;&lt;/p&gt;
    &lt;/div&gt;
  &lt;/div&gt;
&lt;/div&gt;

&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Want to try it out?&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;We're excited to see how H4D VMs will empower you to achieve faster results with your HPC workloads! Sign up for the preview by filling out this&lt;/span&gt;&lt;a href="https://forms.gle/ky1R1VVR5VRsJqsCA" rel="noopener" target="_blank"&gt;&lt;span style="vertical-align: baseline;"&gt; &lt;/span&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;form&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;</description><pubDate>Thu, 10 Apr 2025 16:00:00 +0000</pubDate><guid>https://cloud.google.com/blog/products/compute/new-h4d-vms-optimized-for-hpc/</guid><category>Google Cloud Next</category><category>HPC</category><category>Compute</category><media:content height="540" url="https://storage.googleapis.com/gweb-cloudblog-publish/images/H4D_VMs_optimized_for_HPC.max-600x600.jpg" width="540"></media:content><og xmlns:og="http://ogp.me/ns#"><type>article</type><title>H4D VMs: Next-generation HPC-optimized VMs</title><description></description><image>https://storage.googleapis.com/gweb-cloudblog-publish/images/H4D_VMs_optimized_for_HPC.max-600x600.jpg</image><site_name>Google</site_name><url>https://cloud.google.com/blog/products/compute/new-h4d-vms-optimized-for-hpc/</url></og><author xmlns:author="http://www.w3.org/2005/Atom"><name>Aysha Keen</name><title>Product Manager</title><department></department><company></company></author><author xmlns:author="http://www.w3.org/2005/Atom"><name>Felix Schürmann</name><title>Senior HPC Technologist</title><department></department><company></company></author></item><item><title>Colossus: the secret ingredient in Rapid Storage’s high performance</title><link>https://cloud.google.com/blog/products/storage-data-transfer/how-the-colossus-stateful-protocol-benefits-rapid-storage/</link><description>&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;As an object storage service, Google &lt;/span&gt;&lt;a href="https://cloud.google.com/storage"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Cloud Storage&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; is popular for its simplicity and scale, a big part of which is due to the stateless REST protocols that you can use to read and write data. But with the rise of AI and as more customers look to run data-intensive workloads, two major obstacles to using object storage are its higher latency and lack of file-oriented semantics. With the launch of &lt;/span&gt;&lt;a href="https://cloud.google.com/blog/products/storage-data-transfer/high-performance-storage-innovations-for-ai-hpc"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Rapid Storage&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; on Google Cloud, we’ve added a stateful gRPC-based streaming protocol that provides sub-millisecond read/write latency and the ability to easily append data to an object, while maintaining the high aggregate throughput and scale of object storage. In this post, we’ll share an architectural perspective into how and why we went with this approach, and the new types of workloads it unlocks.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;It all comes back to &lt;/span&gt;&lt;a href="https://cloud.google.com/blog/products/storage-data-transfer/a-peek-behind-colossus-googles-file-system?e=48754805"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Colossus&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, Google’s internal zonal cluster-level file system that underpins most (if not all) of our products. As we discussed in a recent blog post, Colossus supports our most demanding performance-focused products with &lt;/span&gt;&lt;a href="https://cloud.google.com/blog/products/storage-data-transfer/how-colossus-optimizes-data-placement-for-performance"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;sophisticated SSD placement techniques&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; that deliver low latency and massive scale. &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Another key ingredient in Colossus’s performance is its stateful protocol — and with Rapid Storage, we’re bringing the power of the Colossus stateful protocol directly to Google Cloud customers. &lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-aside"&gt;&lt;dl&gt;
    &lt;dt&gt;aside_block&lt;/dt&gt;
    &lt;dd&gt;&amp;lt;ListValue: [StructValue([(&amp;#x27;title&amp;#x27;, &amp;#x27;Try Google Cloud for free&amp;#x27;), (&amp;#x27;body&amp;#x27;, &amp;lt;wagtail.rich_text.RichText object at 0x7f17587367c0&amp;gt;), (&amp;#x27;btn_text&amp;#x27;, &amp;#x27;Get started for free&amp;#x27;), (&amp;#x27;href&amp;#x27;, &amp;#x27;https://console.cloud.google.com/freetrial?redirectPath=/welcome&amp;#x27;), (&amp;#x27;image&amp;#x27;, None)])]&amp;gt;&lt;/dd&gt;
&lt;/dl&gt;&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;When a Colossus client creates or reads a file, the client first opens the file and gets a &lt;/span&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;handle&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;, a collection of state that includes all the information about how that file is stored, including which disks the file’s data is stored on. Clients can use this handle when reading or writing to talk directly to the disks via an optimized RDMA-like network protocol, as we previously outlined in our &lt;/span&gt;&lt;a href="https://research.google/pubs/snap-a-microkernel-approach-to-host-networking/" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Snap networking system paper&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Handles can also be used to support ultra-low latency durable appends, which is extremely useful for demanding database and streaming analytics applications. For example, Spanner and Bigtable both write transactions to a log file that requires durable storage and that is on the critical path for database mutations. Similarly, BigQuery supports streaming to a table while massively parallel batch jobs perform computations over recently ingested data. These applications open Colossus files in append mode, and the Colossus client running in the application uses the handle to write their database mutations and table data directly to disks over the network. To ensure the data is stored durably, Colossus replicates its data across several disks, performing writes in parallel and using a quorum technique to avoid waiting on stragglers. &lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-image_full_width"&gt;






  
    &lt;div class="article-module h-c-page"&gt;
      &lt;div class="h-c-grid"&gt;
  

    &lt;figure class="article-image--large
      
      
        h-c-grid__col
        h-c-grid__col--6 h-c-grid__col--offset-3
        
        
      "
      &gt;

      
      
        
        &lt;img
            src="https://storage.googleapis.com/gweb-cloudblog-publish/images/1_Colossus_Rapid_Storage_Blog.max-1000x1000.jpg"
        
          alt="1 Colossus Rapid Storage Blog"&gt;
        
        &lt;/a&gt;
      
        &lt;figcaption class="article-image__caption "&gt;&lt;p data-block-key="l906q"&gt;Figure 1: Steps involved in appending data to a file in Colossus.&lt;/p&gt;&lt;/figcaption&gt;
      
    &lt;/figure&gt;

  
      &lt;/div&gt;
    &lt;/div&gt;
  




&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;The above image shows the steps that are taken to append data to a file.&lt;/span&gt;&lt;/p&gt;
&lt;ol&gt;
&lt;li aria-level="1" style="list-style-type: decimal; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;The application opens the file in append mode. The Colossus Curator constructs a handle and sends it to the Colossus Client running in-process, which caches the handle.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: decimal; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;The application issues a write call for an arbitrary-sized log entry to the Colossus Client.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: decimal; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;The Colossus Client, using the disk addresses in the handle, writes the log entry in parallel to all the disks.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
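&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;The toy Python sketch below is not Colossus code, just an illustration of steps 1-3: it models a handle as the set of replica disks returned by the open call, and shows an append fanned out to all replicas in parallel that returns once a quorum has acknowledged rather than waiting on stragglers.&lt;/span&gt;&lt;/p&gt;
&lt;pre&gt;
# Illustrative model only: a "handle" holding replica locations, and an
# append that writes to every replica in parallel and waits for a quorum.
import concurrent.futures
from dataclasses import dataclass, field

@dataclass
class Replica:
    entries: list = field(default_factory=list)   # stands in for one disk

    def append(self, offset, data):
        self.entries.append((offset, data))
        return True                                # acknowledge the write

@dataclass
class Handle:
    replicas: list                                 # disk addresses from "open"
    offset: int = 0

def quorum_append(handle, data, quorum=2):
    """Write one log entry to every replica in parallel; return once a
    quorum of acknowledgements has arrived."""
    with concurrent.futures.ThreadPoolExecutor() as pool:
        futures = [pool.submit(r.append, handle.offset, data)
                   for r in handle.replicas]
        acks = 0
        for done in concurrent.futures.as_completed(futures):
            if done.result():
                acks += 1
            if acks == quorum:
                break                              # don't wait on stragglers
    handle.offset += len(data)
    return handle.offset

handle = Handle(replicas=[Replica(), Replica(), Replica()])
quorum_append(handle, b"log-entry-1")
quorum_append(handle, b"log-entry-2")
&lt;/pre&gt;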
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Rapid Storage builds on Colossus’s stateful protocol, leveraging gRPC-based streaming for the underlying transport. When performing low-latency reads and writes to Rapid Storage objects, the Cloud Storage client establishes a stream, providing the same request parameters used in Cloud Storage’s REST protocols, such as the bucket and object name. Further, all the time-consuming Cloud Storage operations such as user authorization and metadata accesses are front-loaded and performed at stream creation time, so subsequent read and write operations go directly to Colossus without any additional overhead, allowing for appendable writes and repeated ranged reads with sub-millisecond latency.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span&gt;&lt;span style="vertical-align: baseline;"&gt;This Colossus architecture enables Rapid Storage to support 20 million requests per second in a single bucket — a scale that is extremely useful in a variety of AI/ML applications. For example, when pre-training a model, once data preparation is complete, a randomized set of data samples are fed into GPUs or TPUs, typically in large files that each contain hundreds of millions to billions of tokens. But the data is rarely read sequentially, for example, because different random samples are read in different orders as the training progresses. With Rapid Storage’s stateful protocol, a stream can be established at the start of the training run before executing massively parallel ranged-reads at sub-millisecond speeds. This helps to ensure that accelerators aren’t blocked on storage latency.&lt;/span&gt;&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Likewise, with appends, Rapid Storage takes advantage of Colossus’s stateful protocol to provide durable writes with sub-millisecond latency, and supports unlimited appends to a single object up to the object size limit.  A major challenge with stateful append protocols is how to handle cases where the client or server hangs or crashes. With Rapid Storage, the client receives a handle from Cloud Storage when creating the stream. If the stream gets interrupted but the client wants to continue reading or appending to the object, the client can re-establish a new stream using this handle, which streamlines this flow and minimizes any latency hiccups. It gets trickier when there is a problem on the client, and the application wants to continue appending to an object from a new client. To simplify this, Rapid Storage guarantees that only one gRPC stream can write to an object at a time; each new stream takes over ownership of the object, transactionally locking out any prior stream. Finally, each append operation includes the offset that’s being written to, ensuring that data correctness is always preserved even in the face of network partitions and replays.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-image_full_width"&gt;






  
    &lt;div class="article-module h-c-page"&gt;
      &lt;div class="h-c-grid"&gt;
  

    &lt;figure class="article-image--large
      
      
        h-c-grid__col
        h-c-grid__col--6 h-c-grid__col--offset-3
        
        
      "
      &gt;

      
      
        
        &lt;img
            src="https://storage.googleapis.com/gweb-cloudblog-publish/images/2_Colossus_Rapid_Storage_Blog.max-1000x1000.jpg"
        
          alt="2 Colossus Rapid Storage Blog"&gt;
        
        &lt;/a&gt;
      
        &lt;figcaption class="article-image__caption "&gt;&lt;p data-block-key="l906q"&gt;Figure 2: A new client taking over ownership of an object.&lt;/p&gt;&lt;/figcaption&gt;
      
    &lt;/figure&gt;

  
      &lt;/div&gt;
    &lt;/div&gt;
  




&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;In the above image, a new client takes over ownership of an object, locking out the previous owner. &lt;/span&gt;&lt;/p&gt;
&lt;ol&gt;
&lt;li aria-level="1" style="list-style-type: decimal; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;Initially, client 1 appends data to an object stored on three disks.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: decimal; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;The application decides to fail over to client 2, which opens this object in append mode. The Colossus Curator transactionally locks out client 1 by increasing a version number on each object data replica.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: decimal; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;Client 1 attempts to append more data to the object, but cannot because its ownership was tied to the old version number.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
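&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;The toy Python sketch below is an illustration of the takeover sequence above, not the Rapid Storage API: each replica records the current writer version, opening for append bumps that version and locks out any stream still holding an older handle, and every append carries the offset it expects to write at.&lt;/span&gt;&lt;/p&gt;
&lt;pre&gt;
# Illustrative model only: version-number takeover and offset-checked appends.
from dataclasses import dataclass

class StaleHandleError(Exception):
    pass

@dataclass
class Replica:
    version: int = 0
    data: bytes = b""

class AppendableObject:
    def __init__(self, num_replicas=3):
        self.replicas = [Replica() for _ in range(num_replicas)]
        self.version = 0

    def open_for_append(self):
        # A new stream takes over ownership by bumping the version on
        # every replica (step 2 above).
        self.version += 1
        for r in self.replicas:
            r.version = self.version
        return self.version            # the writer's handle version

    def append(self, writer_version, payload, offset):
        for r in self.replicas:
            if r.version != writer_version:
                raise StaleHandleError("object is owned by a newer stream")
            if offset != len(r.data):
                raise ValueError("offset mismatch: replayed or reordered append")
            r.data += payload

obj = AppendableObject()
v1 = obj.open_for_append()              # client 1 starts appending
obj.append(v1, b"part-1", offset=0)
v2 = obj.open_for_append()              # client 2 takes over (step 2)
try:
    obj.append(v1, b"part-2", offset=6) # client 1 is locked out (step 3)
except StaleHandleError as err:
    print("rejected:", err)
obj.append(v2, b"part-2", offset=6)
&lt;/pre&gt;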
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;To make it as easy as possible to integrate Rapid Storage into your applications, we are also updating our SDKs to support gRPC streaming-based appends and expose a simple application-oriented API. Writing data using handles is a familiar concept in the filesystems world, so we’ve integrated Rapid Storage into &lt;/span&gt;&lt;a href="https://cloud.google.com/storage/docs/cloud-storage-fuse/overview"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Cloud Storage FUSE&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, which provides clients with file-like access to Cloud Storage buckets, for low-latency file-oriented workloads. Rapid Storage also natively enables &lt;/span&gt;&lt;a href="https://cloud.google.com/blog/products/storage-data-transfer/cloud-storage-hierarchical-namespace-improves-aiml-checkpointing?e=13802955"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Hierarchical Namespace&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; as part of its zonal bucket type, providing enhanced performance, consistency, and folder-oriented APIs.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;In short, Rapid Storage combines the sub-millisecond latency of block-like storage, the throughput of a parallel filesystem, and the scalability and ease of use of object storage, and it does all this in large part due to Colossus. Here are some interesting workloads we've seen our customers explore during the preview:&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;AI/ML data preparation, training, and checkpointing&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;Distributed database architecture optimization&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;Batch and streaming analytics processing&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;Video live-streaming and transcoding&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;Logging and monitoring&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Interested in trying Rapid Storage? Indicate your interest &lt;/span&gt;&lt;a href="https://forms.gle/S5kyQGWrcHtduTRN9" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;here&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; or reach out through your Google Cloud representative. &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Visit us at Google Cloud Next and attend the breakout sessions “&lt;/span&gt;&lt;a href="https://cloud.withgoogle.com/next/25/session-library?session=BRK2-025&amp;amp;utm_source=copylink&amp;amp;utm_medium=unpaidsoc&amp;amp;utm_campaign=FY25-Q2-global-EXP106-physicalevent-er-next25-mc&amp;amp;utm_content=reg-is-live-next-homepage-social-share&amp;amp;utm_term=-" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;What’s new with Google Cloud’s Storage&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;” (BRK2-025), “&lt;/span&gt;&lt;a href="https://cloud.withgoogle.com/next/25/session-library?session=BRK2-020&amp;amp;utm_source=copylink&amp;amp;utm_medium=unpaidsoc&amp;amp;utm_campaign=FY25-Q2-global-EXP106-physicalevent-er-next25-mc&amp;amp;utm_content=reg-is-live-next-homepage-social-share&amp;amp;utm_term=-" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;AI Hypercomputer: Mastering your Storage Infrastructure&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;” (BRK2-020), and “&lt;/span&gt;&lt;a href="https://cloud.withgoogle.com/next/25/session-library?session=BRK2-026#all" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Under the Iceberg: Simple, unified Cloud Storage for analytics data lakes&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;” (BRK2-026) to learn more.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;</description><pubDate>Thu, 10 Apr 2025 16:00:00 +0000</pubDate><guid>https://cloud.google.com/blog/products/storage-data-transfer/how-the-colossus-stateful-protocol-benefits-rapid-storage/</guid><category>HPC</category><category>AI &amp; Machine Learning</category><category>Google Cloud Next</category><category>Storage &amp; Data Transfer</category><media:content height="540" url="https://storage.googleapis.com/gweb-cloudblog-publish/images/Colossus_for_Rapid_Storage.max-600x600.jpg" width="540"></media:content><og xmlns:og="http://ogp.me/ns#"><type>article</type><title>Colossus: the secret ingredient in Rapid Storage’s high performance</title><description></description><image>https://storage.googleapis.com/gweb-cloudblog-publish/images/Colossus_for_Rapid_Storage.max-600x600.jpg</image><site_name>Google</site_name><url>https://cloud.google.com/blog/products/storage-data-transfer/how-the-colossus-stateful-protocol-benefits-rapid-storage/</url></og><author xmlns:author="http://www.w3.org/2005/Atom"><name>Denis Serenyi</name><title>Distinguished Software Engineer, Storage</title><department></department><company></company></author><author xmlns:author="http://www.w3.org/2005/Atom"><name>Vivek Saraswat</name><title>Group Product Manager, Storage</title><department></department><company></company></author></item><item><title>Enabling global scientific discovery and innovation on Google Cloud</title><link>https://cloud.google.com/blog/topics/hpc/powering-scientific-discovery-with-google-cloud/</link><description>&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;From unraveling the mysteries of our planet and the universe, to accelerating medical research and industrial innovation, scientific discovery impacts nearly every facet of human life. 
Today, scientific progress depends on the interplay of theory, experimentation, and computation, and increasingly, the most important and challenging problems require high-performance computing (HPC) and other advanced computing technologies and techniques. &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;In recent years, artificial intelligence (AI) has emerged as a powerful tool for information assessment and generation, while also becoming a powerful tool for scientific discovery, business innovation, and productivity. More recently, advances in quantum computing are increasing our confidence in shortening the timelines to solving problems beyond the reach of classical computers. Quantum computers under development now will lead to larger production systems that will catalyze the creation of new drugs and materials, reduce costs and risks in complex financial and logistics scenarios, and enable the development of more capable AI models. &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;At Google, our vision is to be the most comprehensive, capable, and accessible platform for science. Since 2008, Google Cloud has powered scientific discoveries, providing computational and data storage capabilities — including HPC clusters — to scientists, engineers, and developers worldwide. And this week, to enable continued revolutionary new science, we are bringing the best of Google DeepMind and Google Research together with new infrastructure and AI capabilities in Google Cloud, providing researchers with highly capable, cloud-scale tools for scientific computing. These new capabilities include:&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Supercomputing-class infrastructure for scientific computing:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Researchers can now deploy and use supercomputing clusters powered by the latest H4D VMs powered by AMD CPUs, and A4/A4X VMs powered by the latest NVIDIA GPUs. These VMs have new low-latency networking that provides supercomputer-like scaling and performance. We’re also announcing Google Cloud Managed Lustre for high performance storage I/O. These resources will enable scientists to tackle large-scale, complex science problems.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Advanced scientific applications powered by AI models for weather forecasting and biology:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; We’re now offering our first AI-powered science applications for the broader science community: &lt;/span&gt;&lt;a href="https://blog.google/technology/ai/google-deepmind-isomorphic-alphafold-3-ai-model/" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;AlphaFold 3&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; for predicting the structure and interactions of biomolecules, and &lt;/span&gt;&lt;a href="https://deepmind.google/technologies/weathernext" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;WeatherNext&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; models for weather forecasting. &lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;AI agents for quicker ideas and faster discovery:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Two new AI agents in &lt;/span&gt;&lt;a href="https://cloud.google.com/products/agentspace"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Google Agentspace&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; – Deep Research and Idea Generation – can help prepare comprehensive research reports and rapidly generate new scientific hypotheses. &lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Let’s take a look at these new capabilities in more detail.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-aside"&gt;&lt;dl&gt;
    &lt;dt&gt;aside_block&lt;/dt&gt;
    &lt;dd&gt;&amp;lt;ListValue: [StructValue([(&amp;#x27;title&amp;#x27;, &amp;#x27;Try Google Cloud for free&amp;#x27;), (&amp;#x27;body&amp;#x27;, &amp;lt;wagtail.rich_text.RichText object at 0x7f1757bb7f70&amp;gt;), (&amp;#x27;btn_text&amp;#x27;, &amp;#x27;Get started for free&amp;#x27;), (&amp;#x27;href&amp;#x27;, &amp;#x27;https://console.cloud.google.com/freetrial?redirectPath=/welcome&amp;#x27;), (&amp;#x27;image&amp;#x27;, None)])]&amp;gt;&lt;/dd&gt;
&lt;/dl&gt;&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Supercomputing-class infrastructure and tools for science&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Supercomputers are designed to achieve maximum performance on very large problems, as well as to train large AI models. With ongoing advances in science and AI, quick and easy access to supercomputing resources is critical. &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Researchers can now deploy and use supercomputering-class HPC clusters in Google Cloud based on new&lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt; &lt;/strong&gt;&lt;a href="https://cloud.google.com/blog/products/compute/new-h4d-vms-optimized-for-hpc"&gt;&lt;strong style="text-decoration: underline; vertical-align: baseline;"&gt;H4D&lt;/strong&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; VMs (virtual machines), our most powerful CPU-based VMs that use 5th Generation AMD EPYC&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;&lt;span style="vertical-align: super;"&gt;TM&lt;/span&gt;&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt; Processors. H4D clusters are connected with Remote Direct Memory Access (RDMA) networking utilizing Google’s &lt;/span&gt;&lt;a href="https://cloud.google.com/blog/topics/systems/introducing-falcon-a-reliable-low-latency-hardware-transport"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Falcon&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; and &lt;/span&gt;&lt;a href="https://cloud.google.com/titanium?hl=en"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Titanium&lt;/span&gt;&lt;/a&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt; &lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;offload technologies, providing low-latency communications for HPC applications. By using standard message-passing libraries over RDMA, H4D VMs can efficiently scale applications up to tens of thousands of cores, resulting in faster time-to-solution. You can register for the H4D VM preview &lt;/span&gt;&lt;a href="https://forms.gle/ky1R1VVR5VRsJqsCA" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;here&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;. &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://www.harvard.edu/" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Harvard University&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; is using Google Cloud to advance heart disease research by simulating large-scale systems of red blood cells and other structures, including magnetically controlled artificial bacterial flagella (ABF), with the goal of developing therapies to attack and dissolve blood clots and circulating tumor cells in human vasculatures.&lt;/span&gt;&lt;/p&gt;
&lt;p style="padding-left: 40px;"&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;"With the power of Google's new H4D-based clusters, we are poised to simulate systems approaching a trillion particles, unlocking unprecedented insights into circulatory functions and diseases. This leap in computational capability will dramatically accelerate our pursuit of breakthrough therapeutics, bringing us closer to effective precision therapies for blood vessel damage in heart disease." -&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt; Petros Koumoutsakos, Harvard University&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-image_full_width"&gt;






  
    &lt;div class="article-module h-c-page"&gt;
      &lt;div class="h-c-grid"&gt;
  

    &lt;figure class="article-image--large
      
      
        h-c-grid__col
        h-c-grid__col--6 h-c-grid__col--offset-3
        
        
      "
      &gt;

      
      
        
        &lt;img
            src="https://storage.googleapis.com/gweb-cloudblog-publish/images/1_SdrVAFk.max-1000x1000.png"
        
          alt="1"&gt;
        
      
        &lt;figcaption class="article-image__caption "&gt;&lt;p data-block-key="awe8m"&gt;Professor Koumoutsakos’ research involves the simulation of blood flowing in a microfluidics device which is designed to capture circulating tumor cells.&lt;/p&gt;&lt;/figcaption&gt;
      
    &lt;/figure&gt;

  
      &lt;/div&gt;
    &lt;/div&gt;
  




&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;HPC clusters based on our recently announced &lt;/span&gt;&lt;a href="https://cloud.google.com/blog/products/compute/google-cloud-goes-to-nvidia-gtc"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;A4 and A4X&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; VMs are also a critical component of our scientific discovery portfolio. &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;A4 VMs&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;,&lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt; &lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;built on NVIDIA’s latest HGX B200 GPUs, are a versatile and powerful tool for multiple scientific computing applications, offering excellent performance for direct numerical simulation, and for AI training. &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;A4X VMs&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;, accelerated by NVIDIA GB200 NVL72 GPUs, are purpose-built for training and serving the most demanding, extra-large-scale AI workloads. &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Clusters using these GPU-powered VMs can also unlock supercomputing-class performance for the next frontier of innovation: quantum computing. In the future, quantum computing systems will allow scientists to solve problems that are intractable even with the most powerful  traditional supercomputers. In the meantime, HPC clusters based on A-series VMs can be used to design tomorrow’s quantum computers and optimize quantum algorithms, by simulating large quantum circuits using the &lt;/span&gt;&lt;a href="https://goo.gle/quantumsimulation-a3" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;quantum simulation solution blueprint&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;For example, &lt;/span&gt;&lt;a href="https://quantumai.google/" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Google Research’s Quantum AI&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; team leverages Google Cloud to simulate the intricate device physics of quantum hardware, develop sophisticated hybrid quantum-classical algorithms, and explore and test novel quantum algorithms. This robust simulation environment facilitates scientific breakthroughs by delivering the performance and scalability essential for demanding quantum research workflows.&lt;/span&gt;&lt;/p&gt;
&lt;p style="padding-left: 40px;"&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;"We observed excellent scalability simulating a 43-qubit circuit with a depth of 30 on Google Cloud's new GPU-based supercomputers. These results underscore the potential for researchers to develop and test larger and deeper quantum circuits, which is important for understanding the performance of quantum algorithms and accelerating progress toward applications for today’s quantum computers."&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt; - Sergio Boixo, Director, Computer Science, Google Quantum AI&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;HPC clusters demand high I/O performance to keep computational performance from stalling. Our new &lt;/span&gt;&lt;a href="https://cloud.google.com/blog/products/storage-data-transfer/high-performance-storage-innovations-for-ai-hpc"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Google Cloud Managed Lustre&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; storage service, developed in collaboration with DataDirect Networks and based on EXAScaler technology, provides the I/O performance needed for supercomputing-scale applications. Google Cloud Managed Lustre delivers a high-performance, fully-managed parallel file system optimized for HPC and AI applications. With petabyte-scale capacity and up to 1 TB/s throughput, Managed Lustre ensures researchers have the I/O performance they need to power their scientific discoveries. Request access to the Managed Lustre preview by &lt;/span&gt;&lt;a href="https://cloud.google.com/contact?e=48754805&amp;amp;hl=en"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;contacting your account representative&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;. &lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Advanced scientific applications powered by AI models&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;We recently announced our first AI-powered science applications for researchers and enterprises on Google Cloud: the groundbreaking &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;AlphaFold 3&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; molecular structure and interaction prediction model, and the &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;WeatherNext&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; weather forecasting models.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;AlphaFold 3,&lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt; &lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;developed by Google DeepMind and Isomorphic Labs, is revolutionizing biology through its ability to predict the structure and interactions of all of life’s molecules with unprecedented accuracy. Understanding molecular structures and their interactions helps researchers better grasp complex interactions in human health and disease. AlphaFold 3 is now available for non-commercial use on Google Cloud.&lt;/span&gt;&lt;/p&gt;
&lt;p style="padding-left: 40px;"&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;“Having access to the scientific capabilities of AlphaFold on Google Cloud can help our research rapidly predict and explore the structure and interactions of all biomolecule classes. This change in capability will accelerate our understanding of diseases and enable the generation of therapeutic hypotheses.”&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt; - Sumaiya Iqbal, Senior group lead of the Ladders to Cures Accelerator, Broad Institute&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;To further support users, we’re simplifying access to AlphaFold 3 through a new &lt;/span&gt;&lt;a href="https://goo.gle/clustertoolkit-alphafold3" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;high-throughput solution&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; deployable via Cluster Toolkit. This turnkey solution enables efficient batch processing of hundreds to tens of thousands of sequences while minimizing costs by  autoscaling infrastructure.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;In the domain of weather, Google DeepMind and Google Research WeatherNext models use AI for fast and accurate weather forecasting, and we recently released live WeatherNext AI forecasts on BigQuery and Earth Engine. Today, we’re introducing access to WeatherNext AI models via Google Cloud’s &lt;/span&gt;&lt;a href="https://console.cloud.google.com/vertex-ai/publishers/google/model-garden/weathernext"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Vertex AI Model Garden&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, enabling practitioners to customize and deploy these advanced models for energy prediction, logistics, agriculture, risk management, and more.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;With easier and more affordable access to faster and more accurate weather forecasting models, researchers can study far more scenarios, and organizations can better prepare for weather events — &lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;such as heat waves, floods, and hurricanes —&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt; to reduce their impact on infrastructure, personnel, supply chains, and communities. &lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-image_full_width"&gt;






  
    &lt;div class="article-module h-c-page"&gt;
      &lt;div class="h-c-grid"&gt;
  

    &lt;figure class="article-image--large
      
      
        h-c-grid__col
        h-c-grid__col--6 h-c-grid__col--offset-3
        
        
      "
      &gt;

      
      
        
        &lt;img
            src="https://storage.googleapis.com/gweb-cloudblog-publish/original_images/2_bwclQqv.gif"
        
          alt="2"&gt;
        
      
        &lt;figcaption class="article-image__caption "&gt;&lt;p data-block-key="awe8m"&gt;WeatherNext Graph forecasts visualized in Google Earth Engine, showing forecasted wind speed, wind direction, and precipitation as of September 8, 2023. The visualization demonstrates the projected path of Hurricane Lee over the Atlantic Ocean.&lt;/p&gt;&lt;/figcaption&gt;
      
    &lt;/figure&gt;

  
      &lt;/div&gt;
    &lt;/div&gt;
  




&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;For instance, &lt;/span&gt;&lt;a href="https://www.googlecloudpresscorner.com/2025-03-05-Carrier-and-Google-Cloud-Join-Forces-to-Strengthen-Grid-Resilience-with-AI-Powered-Home-Energy-Management-Systems" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Carrier&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; plans to leverage Google Cloud’s WeatherNext AI models as part of its Home Energy Management System (HEMS) to help enhance grid flexibility and enable smarter energy management. Once deployed, WeatherNext AI models are expected to help HEMS intelligently manage energy flows in real time — charging, discharging, and redirecting energy based on grid conditions, energy demands, and weather forecasts — contributing to a more balanced and sustainable energy grid. &lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Using AI as the ultimate research partner&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Google's robust ecosystem of information, productivity, and advanced AI tools has long helped drive scientific research, providing researchers with information and insight. &lt;/span&gt;&lt;a href="https://scholar.google.com/" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Google Scholar&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; is an indispensable resource for navigating the vast landscape of scientific literature and for discovering and tracking relevant publications. Then there’s &lt;/span&gt;&lt;a href="https://gemini.google.com/app?is_sa=1&amp;amp;is_sa=1&amp;amp;android-min-version=301356232&amp;amp;ios-min-version=322.0&amp;amp;campaign_id=bkws&amp;amp;utm_source=sem&amp;amp;utm_source=google&amp;amp;utm_medium=paid-media&amp;amp;utm_medium=cpc&amp;amp;utm_campaign=bkws&amp;amp;utm_campaign=2024enUS_gemfeb&amp;amp;pt=9008&amp;amp;mt=8&amp;amp;ct=p-growth-sem-bkws&amp;amp;gad_source=1&amp;amp;gclid=EAIaIQobChMI1oPm19rIjAMV8HN_AB1J5xK2EAAYASAAEgJTa_D_BwE&amp;amp;gclsrc=aw.ds" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Gemini&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, which can synthesize, summarize and explain information from highly scientific and technical content. And &lt;/span&gt;&lt;a href="https://notebooklm.google/" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;NotebookLM&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, an AI-powered research assistant, intelligently processes and summarizes selected research papers and datasets, dramatically accelerating literature reviews and extracting crucial information. &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;We’re excited to announce two new AI agents in &lt;/span&gt;&lt;a href="https://cloud.google.com/products/agentspace"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Agentspace&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; that have the potential to further accelerate scientific research and to revolutionize hypothesis generation. &lt;/span&gt;&lt;a href="https://gemini.google/overview/deep-research/?hl" rel="noopener" target="_blank"&gt;&lt;strong style="text-decoration: underline; vertical-align: baseline;"&gt;Deep Research&lt;/strong&gt;&lt;/a&gt;&lt;strong style="vertical-align: baseline;"&gt; &lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;condenses hours of research by synthesizing information across internal and external sources to generate in-depth research reports. &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;Idea Generation&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; helps rapidly develop novel ideas through AI agents that create ideas, then test them against each other to find the best hypotheses. &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Scientists can also leverage &lt;/span&gt;&lt;a href="https://aistudio.google.com/welcome" rel="noopener" target="_blank"&gt;&lt;strong style="text-decoration: underline; vertical-align: baseline;"&gt;AI Studio&lt;/strong&gt;&lt;/a&gt;&lt;strong style="vertical-align: baseline;"&gt; &lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;and&lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt; &lt;/strong&gt;&lt;a href="https://cloud.google.com/vertex-ai"&gt;&lt;strong style="text-decoration: underline; vertical-align: baseline;"&gt;Vertex AI&lt;/strong&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt; &lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;on Google Cloud to develop customized AI applications and advanced machine learning workflows. We also recently announced &lt;/span&gt;&lt;a href="https://blog.google/technology/developers/gemma-3/" rel="noopener" target="_blank"&gt;&lt;strong style="text-decoration: underline; vertical-align: baseline;"&gt;Gemma 3&lt;/strong&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, a collection of lightweight, state-of-the-art open models built from the same research and technology that powers our Gemini 2.0 models. These are our most advanced, portable and responsibly developed open models yet, and can be used to create scientific applications on local devices. Finally, Google Research’s &lt;/span&gt;&lt;a href="https://research.google/blog/geospatial-reasoning-unlocking-insights-with-generative-ai-and-multiple-foundation-models" rel="noopener" target="_blank"&gt;&lt;strong style="text-decoration: underline; vertical-align: baseline;"&gt;Geospatial Reasoning&lt;/strong&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; framework, leveraging Vertex AI Agent Engine, will allow scientists and analysts to unlock powerful insights about the world through new geospatial foundation models and generative AI. &lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Enabling transformational science today and tomorrow&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Together, these new advanced infrastructure, AI applications, and AI productivity technologies provide new cloud-scale scientific capabilities for all kinds of computational science research. Combined with our discovery, collaboration, and productivity tools, we are providing scientists and researchers with a comprehensive array of cloud-powered scientific capabilities. &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://www.anl.gov/" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Argonne National Laboratory&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, a leading laboratory for open science computational research, is working with Google Cloud to explore how advanced computing technologies and AI tools can empower scientists and engineers to make groundbreaking discoveries faster than ever. Through the collaboration, ANL will use and evaluate Google Cloud solutions for computational research, providing feedback and guidance to further advance the design, performance, and usefulness of Google Cloud for supercomputing-scale science. &lt;/span&gt;&lt;/p&gt;
&lt;p style="padding-left: 40px;"&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;“Having access to powerful computational capabilities is critical for making new scientific discoveries and accelerating innovations that power business and society.&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt; &lt;/span&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;We are eager to work with Google Cloud to leverage their comprehensive, global-scale AI and HPC infrastructure, software technologies and AI-powered applications such as AlphaFold 3. Argonne National Laboratory’s collaboration with Google Cloud will effectively drive innovation and enable discoveries that change the world — and bring these capabilities to researchers everywhere.” &lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;- Rick Stevens, Associate Laboratory Director for Computing, Environment and Life Sciences, Argonne National Laboratory&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Scientific discoveries are more important than ever for solving the world’s greatest challenges. At Google, we’re building powerful advanced computing technologies to enable scientific discoveries and innovations, and we are excited to bring all these capabilities together in Google Cloud.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Practitioners can get started today with credits, training, and more with &lt;/span&gt;&lt;a href="https://cloud.google.com/edu/researchers?e=48754805"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Google Cloud for Researchers&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;. To stay&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt; informed and learn more about Google Cloud can help advance scientific research and discovery, join the &lt;/span&gt;&lt;a href="https://sites.google.com/corp/view/advancedcomputingcommunity/" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Google Cloud Advanced Computing Community&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;</description><pubDate>Thu, 10 Apr 2025 16:00:00 +0000</pubDate><guid>https://cloud.google.com/blog/topics/hpc/powering-scientific-discovery-with-google-cloud/</guid><category>AI &amp; Machine Learning</category><category>Compute</category><category>Google Cloud Next</category><category>HPC</category><og xmlns:og="http://ogp.me/ns#"><type>article</type><title>Enabling global scientific discovery and innovation on Google Cloud</title><description></description><site_name>Google</site_name><url>https://cloud.google.com/blog/topics/hpc/powering-scientific-discovery-with-google-cloud/</url></og><author xmlns:author="http://www.w3.org/2005/Atom"><name>Bill Magro</name><title>Director &amp; Chief Technologist, High Performance Computing, Google</title><department></department><company></company></author><author xmlns:author="http://www.w3.org/2005/Atom"><name>Gemma Jennings</name><title>Group Product Manager, Google DeepMind</title><department></department><company></company></author></item><item><title>Driving enterprise transformation with new compute innovations and offerings</title><link>https://cloud.google.com/blog/products/compute/delivering-new-compute-innovations-and-offerings/</link><description>&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;In the last 12 months, we’ve made incredible enhancements to our &lt;/span&gt;&lt;a href="https://cloud.google.com/products/compute"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Compute Engine platform&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;. This is driven most notably by new fourth-generation compute instances and Hyperdisk block storage as well as major customer experience enhancements. Across all workloads, Google Cloud’s compute portfolio can help you optimize your performance and costs, while delivering enterprise-grade scalability, reliability, security, and workload consistency, helping you grow efficiently and have more to invest for innovation. Let’s explore what we’ll be announcing today at Google Cloud Next 2025. &lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;New and enhanced compute for every workload&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;C4D offers 80% higher throughput per vCPU and stronger performance&lt;br/&gt;&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;Our new C4D VMs are built on AMD's 5th Gen EPYC processors, paired with Google Titanium's latest advancements, and have a higher core frequency (up to 4.1 GHz). C4D delivers impressive performance gains over prior generations across a wide set of general computing workloads — &lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;up to 30% vs C3D on the estimated SPECrate®2017_int_base benchmark&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt; — helping you meet the needs of business-critical applications with fewer resources. &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;For databases, C4D achieves an &lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;up to a 55% increase in queries per second on MySQL&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt; and a &lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;35% performance improvement for Redis workloads compared to C3D&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;. For web-serving workloads, C4D delivers up to 80% higher throughput per vCPU compared to previous generations, driving faster page rendering and a smoother end-user experience. C4D offers confidential computing and is available in 49 industry-standard shapes, with sizes ranging from 2 vCPU to 384 vCPU in three memory configurations of up to 3TB of DDR5 memory, and will include both our first AMD based bare metal offering and our new Titanium LSSD. &lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;Now available in&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt; &lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;preview&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;, &lt;/span&gt;&lt;a href="https://cloud.google.com/compute/docs/general-purpose-machines#c4d_series" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;try out C4D&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; in Compute Engine and Google Kubernetes Engine (GKE) today.&lt;/span&gt;&lt;/p&gt;
&lt;p style="padding-left: 40px;"&gt;&lt;span style="vertical-align: baseline;"&gt;“&lt;/span&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;AppLovin, a global leader in mobile advertising, is constantly looking for cutting-edge infrastructure innovations to deliver exceptional performance for our clients. Google Cloud's C4D VMs enable us to do just that — driving a ~40% improvement over the prior generation, which leads to significant efficiency gains and latency reduction.” &lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;- Basil Shikin, CTO, &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;AppLovin&lt;/strong&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-aside"&gt;&lt;dl&gt;
    &lt;dt&gt;aside_block&lt;/dt&gt;
    &lt;dd&gt;&amp;lt;ListValue: [StructValue([(&amp;#x27;title&amp;#x27;, &amp;#x27;$300 in free credit to try Google Cloud infrastructure&amp;#x27;), (&amp;#x27;body&amp;#x27;, &amp;lt;wagtail.rich_text.RichText object at 0x7f17545297f0&amp;gt;), (&amp;#x27;btn_text&amp;#x27;, &amp;#x27;Start building for free&amp;#x27;), (&amp;#x27;href&amp;#x27;, &amp;#x27;http://console.cloud.google.com/freetrial?redirectPath=/compute&amp;#x27;), (&amp;#x27;image&amp;#x27;, None)])]&amp;gt;&lt;/dd&gt;
&lt;/dl&gt;&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;C4 VMs enable new capabilities and greater flexibility&lt;br/&gt;&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;For demanding, low-latency tasks such as gaming, inference, large-scale data processing, and real-time workloads, our C4 machine series is expanding to enable new capabilities and configurations, including larger shapes, Local SSD, and bare metal. These new C4 shapes, built exclusively on the latest 6th generation Intel Granite Rapids CPUs, feature the highest frequency of any Compute Engine VM — up to 4.2 GHz.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;C4 shapes with Titanium Local SSD offer improved performance for I/O-intensive workloads like databases and caching layers, achieving Local SSD &lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;latency reductions of up to 35%&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;. New C4 bare metal instances provide performance gains of up to &lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;35% for general compute&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt; and up to &lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;65% for ML recommendation&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt; workloads compared to the prior generation. The new, larger C4 VM shapes scale up to 288 vCPU, with 2.2TB of high-performing DDR5 memory and larger cache sizes, enabling better scalability for databases, data analytics, and other memory-constrained workloads. &lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;Request preview access &lt;/span&gt;&lt;a href="https://docs.google.com/forms/d/e/1FAIpQLSecsrgBtH-EJR1wZC5_m79NzHEblJ_3ocrbPfWwvd_cbz8xGA/viewform" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;here&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;H4Ds offer tremendous performance improvements for HPC workloads&lt;br/&gt;&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;S&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;cale your HPC workloads and get insights faster than ever before with H4D VMs.&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt; &lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;These VMs are built on the &lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;5th gen AMD EPYC CPUs and &lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;offer the highest whole-node VM performance of more than 12,000 gflops, the highest per-core performance, and the best memory bandwidth of more than 950 GB/s of our VM families&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;. H4D VMs provide 200 Gbps of low latency Titanium RDMA network bandwidth to support clusters with over 10,000 cores and plans for even more scale. Learn more in &lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;our &lt;/span&gt;&lt;a href="https://cloud.google.com/blog/topics/hpc/powering-scientific-discovery-with-google-cloud"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Scientific Innovations blog&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; or &lt;/span&gt;&lt;a href="https://forms.gle/ky1R1VVR5VRsJqsCA" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;sign up for the H4D &lt;/span&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;preview&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;.&lt;/span&gt;&lt;/p&gt;
&lt;p style="padding-left: 40px;"&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;“The generational performance leap achieved with Google H4D VMs, powered by the 5th Generation AMD EPYC, is truly remarkable. For compute-intensive, highly non-linear simulations such as car crash analysis, Altair Radioss delivers a stunning 3.6x speedup. This breakthrough paves the way for faster and more accurate simulations, which is crucial for our customers in the era of the digital thread!” &lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;-&lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt; Eric Lequiniou, SVP Radioss Development and Altair Solvers HPC&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;M4 VMs double performance for demanding SAP workloads &lt;br/&gt;&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;Backed by Compute Engine’s memory-optimized 99.95% single instance SLA, &lt;/span&gt;&lt;a href="https://cloud.google.com/compute/docs/memory-optimized-machines#m4_series"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;M4&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; VMs offer up to 65% better price-performance and 2.25x more SAP Application Performance Standard (SAPS) compared to our previous memory optimized M3. Built on 5th Generation Intel Xeon Scalable processors, M4 VMs are certified for business-critical, in-memory SAP HANA workloads ranging from 744GB to 3TB, and for SAP NetWeaver Application Server.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Z3 for storage-intensive workloads&lt;br/&gt;&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;For I/O-intensive workloads such as data warehouses, SQL, and NoSQL databases, our Z3 storage-optimized family now features new Titanium SSDs and offers nine new smaller shapes, ranging from 3TB to 18TB per instance. We are also introducing new storage-optimized bare-metal instance which include  up to 72TB of Titanium SSDs and direct access to the physical server CPUs. Now in preview, register your interest by&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt; signing up &lt;/span&gt;&lt;a href="https://docs.google.com/forms/d/e/1FAIpQLSexdRfC9-JfDRMEqjBy_fBukLUDkap290NvZSfZWNInwFJg2w/viewform" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;here&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;. &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Nutanix Cloud Clusters are now on Google Cloud&lt;br/&gt;&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;We’re excited to partner with Nutanix, who selected the new Z3-metal instances to launch Nutanix Cloud Clusters (NC2) on Google Cloud. Nutanix NC2 is a hybrid cloud platform that simplifies the ability to run, manage, and operate apps, data, and AI across private and public clouds. NC2’s common operating model makes it easy to manage workloads in a consistent manner, accelerating customers’ migration to Google Cloud and helping them modernize their apps. &lt;/span&gt;&lt;a href="https://www.nutanix.com/products/nutanix-cloud-clusters/google-cloud" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Learn more&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; and sign up for public preview. &lt;/span&gt;&lt;/p&gt;
&lt;p style="padding-left: 40px;"&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;"We are thrilled to announce the private preview of Nutanix Cloud Clusters on Google Cloud, marking a significant milestone in Nutanix’s commitment to delivering flexible, hybrid cloud solutions. Google Cloud’s Z3 instance types represent a perfect foundation for Nutanix to enable performance and resilience for enterprise applications. We’re excited about our partnership with Google Cloud in empowering our joint customers with greater choice and simplicity in their cloud journey." &lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;- Saveen Pakala, Vice President of Product Management, Nutanix&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;More options to optimize your VMware environment in the cloud&lt;br/&gt;&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;With &lt;/span&gt;&lt;a href="https://cloud.google.com/vmware-engine"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Google Cloud VMware Engine&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, we provide one of the fastest ways to lift and transform your existing VMware estate into Google Cloud. Today, we are offering 18 additional node shapes, bringing the total number of node shapes across VMware Engine v1 and v2 to 26 — six times more node shapes than competitors. Now, you have the industry’s widest range of options to shape your capacity to your workloads’ needs and optimize your TCO. &lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Storage and platform capabilities for greater scale and efficiency&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Our fourth-generation compute, networking, and block storage portfolio is built on several highly differentiated foundational technologies. &lt;/span&gt;&lt;a href="https://cloud.google.com/titanium"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Titanium&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; is a system of purpose-built custom silicon and multiple tiers of scale-out offloads that free up the CPU, enhancing performance, reliability, security and maximizing workload efficiency. It is integrated across our compute, storage, and networking offerings, which you’ve seen in a number of the announcements above. &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Recently, we also updated the Titanium ML Adapter to securely integrate NVIDIA ConnectX-7 network interface cards (NICs), providing 3.2 Tbps of non-blocking GPU-to-GPU bandwidth. In addition, Titanium Offload Processors now integrate our GPU clusters with the Jupiter data center fabric, providing greater cluster scale. &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Next-generation block storage with Hyperdisk &lt;br/&gt;&lt;/strong&gt;&lt;a href="https://cloud.google.com/products/block-storage?e=48754805&amp;amp;hl=en"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Hyperdisk&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; is Google Cloud’s workload-optimized, high-performance block storage that’s cost-efficient, easy-to-use and that delivers comprehensive data protection capabilities for your workloads. With unique capabilities like the ability to independently tune capacity and performance specific to your workloads, Hyperdisk &lt;/span&gt;&lt;a href="https://cloud.google.com/blog/products/storage-data-transfer/hyperdisk-storage-pools-is-now-generally-available?e=48754805"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Storage Pools&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; enable thin provisioning and data reduction, lowering TCO and simplifying management at scale. As customers move larger and larger workloads, &lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;we are expanding Storage Pools to store up to 5 PiB of data in a single pool — a &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;5x &lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;increase from before&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;. &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;In addition, we are also introducing &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;Hyperdisk Exapools,&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; a new variant of &lt;/span&gt;&lt;a href="https://cloud.google.com/blog/products/storage-data-transfer/hyperdisk-storage-pools-is-now-generally-available?e=48754805"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Storage Pools&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; purpose-built for the largest and most demanding AI training workloads. With Hyperdisk Exapools you can provision and manage block storage delivering multiple exabytes of capacity and terabytes per second of throughput for your biggest AI clusters, while leveraging thin-provisioning and data reduction to lower your TCO and simplify management.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://cloud.google.com/kubernetes-engine/docs/how-to/persistent-volumes/hyperdisk-ml"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Hyperdisk ML&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; has also added new capabilities, including hydrating from Cloud Storage using GKE volume populator, attaching to the latest Compute Engine instances, and performing data loading acceleration from Hyperdisk ML to run training/inference on the latest TPU VM families. &lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;Learn more in today’s &lt;/span&gt;&lt;a href="https://cloud.google.com/blog/products/compute/whats-new-with-ai-hypercomputer"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;AI infrastructure blog&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;. &lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Match resources to your usage patterns&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Finally&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;, we’re providing you with greater efficiency, flexibility, and control over demanding computing tasks with &lt;/span&gt;&lt;a href="https://cloud.google.com/compute/docs/instance-groups"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;managed instance groups&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; (MIGs) — collections of virtual machines that you can manage as a single entity. For example, you can now configure MIGs to use multiple VM types and it automatically finds capacity — even during periods of high demand and rapid growth. You can also use stopped and suspended VMs in a MIG with pre-initialized VMs, to save cost and accelerate application startup. We also introduced committed use discounts (CUDs) and reservation sharing with Vertex AI and Autopilot, letting you purchase infrastructure once and utilize it across multiple services.&lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Invest for innovation with optimized compute&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Delivering infrastructure that provides the highest performance and flexibility for all of your workloads is our top commitment. From general-purpose VMs to specialized solutions for HPC, SAP, and databases, we offer workload-optimized solutions tailored to your needs, helping you &lt;/span&gt;&lt;a href="https://cloud.google.com/products/compute"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;unlock the innovation&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; your business needs. Got questions? &lt;/span&gt;&lt;a href="https://cloud.google.com/contact"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Get in touch&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;!&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;</description><pubDate>Wed, 09 Apr 2025 12:00:00 +0000</pubDate><guid>https://cloud.google.com/blog/products/compute/delivering-new-compute-innovations-and-offerings/</guid><category>Google Cloud Next</category><category>Storage &amp; Data Transfer</category><category>HPC</category><category>Compute</category><media:content height="540" url="https://storage.googleapis.com/gweb-cloudblog-publish/original_images/Blue_Lights_in_Server_Row.jpg" width="540"></media:content><og xmlns:og="http://ogp.me/ns#"><type>article</type><title>Driving enterprise transformation with new compute innovations and offerings</title><description></description><image>https://storage.googleapis.com/gweb-cloudblog-publish/original_images/Blue_Lights_in_Server_Row.jpg</image><site_name>Google</site_name><url>https://cloud.google.com/blog/products/compute/delivering-new-compute-innovations-and-offerings/</url></og><author xmlns:author="http://www.w3.org/2005/Atom"><name>Nirav Mehta</name><title>VP Product Management, Google Compute Platforms</title><department></department><company></company></author><author xmlns:author="http://www.w3.org/2005/Atom"><name>Salil Suri</name><title>Director, Product Management, Compute Engine</title><department></department><company></company></author></item><item><title>What’s new with HPC and AI infrastructure at Google Cloud</title><link>https://cloud.google.com/blog/topics/hpc/whats-new-with-hpc/</link><description>&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;At Google Cloud, we’re rapidly advancing our high-performance computing (HPC) capabilities, providing researchers and engineers with powerful tools and infrastructure to tackle the most demanding computational challenges. Here's a look at some of the key developments driving HPC innovation on Google Cloud, as well as our presence at Supercomputing 2024.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;You can also stay apprised of our HPC and AI advances by joining the new &lt;/span&gt;&lt;a href="https://rsvp.withgoogle.com/events/google-cloud-advanced-computing-community" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Google Cloud Advanced Computing Community&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; (details below). &lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Next-generation HPC VMs&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;We began our H-series with &lt;/span&gt;&lt;a href="https://cloud.google.com/compute/docs/compute-optimized-machines#h3_series"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;H3 VMs&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, specifically designed to meet the needs of demanding HPC workloads. Now, we’re excited to share some key features of the next generation of the H family, bringing even more innovation and performance to the table. The upcoming VMs will feature:&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;Improved workload scalability via RDMA-enabled 200 Gbps networking&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;Native support to directly provision full, tightly-coupled HPC clusters on demand &lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;a href="https://cloud.google.com/blog/products/compute/introducing-dynamic-workload-scheduler?e=0"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Dynamic Workload Scheduler&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; to provision fixed-lifetime clusters now or in the future&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;a href="https://cloud.google.com/titanium?e=0&amp;amp;hl=en"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Titanium&lt;/span&gt;&lt;/a&gt;&lt;strong style="vertical-align: baseline;"&gt; &lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;technology that delivers superior performance, reliability, and security &lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;We provide &lt;/span&gt;&lt;a href="https://github.com/GoogleCloudPlatform/cluster-toolkit/blob/main/examples/hpc-enterprise-slurm.yaml" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;system blueprints&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; for setting up turnkey, pre-configured HPC clusters on our H series VMs.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;The next generation of H series is coming in early 2025.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-aside"&gt;&lt;dl&gt;
    &lt;dt&gt;aside_block&lt;/dt&gt;
    &lt;dd&gt;&amp;lt;ListValue: [StructValue([(&amp;#x27;title&amp;#x27;, &amp;#x27;Try Google Cloud for free&amp;#x27;), (&amp;#x27;body&amp;#x27;, &amp;lt;wagtail.rich_text.RichText object at 0x7f175aa441c0&amp;gt;), (&amp;#x27;btn_text&amp;#x27;, &amp;#x27;Get started for free&amp;#x27;), (&amp;#x27;href&amp;#x27;, &amp;#x27;https://console.cloud.google.com/freetrial?redirectPath=/welcome&amp;#x27;), (&amp;#x27;image&amp;#x27;, None)])]&amp;gt;&lt;/dd&gt;
&lt;/dl&gt;&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Parallelstore: World’s first fully-managed DAOS offering&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;a href="https://cloud.google.com/parallelstore?e=48754805"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Parallelstore&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; is a fully managed, scalable, high-performance storage solution based on next-generation &lt;/span&gt;&lt;a href="https://daos.io/" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;DAOS technology&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, designed for demanding HPC and AI workloads. It is now generally available and provides:&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;Up to 6x greater read throughput performance compared to competitive Lustre scratch offerings&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;Low latency (&amp;lt;0.5ms at p50) and high throughput (&amp;gt;1GiB/s per TiB) to access data with minimal delays, even at massive scale&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;High IOPS (30K IOPS per TiB) for metadata operations&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;Simplified management that reduces operational overhead with a fully managed service  &lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Parallelstore is great for applications requiring fast access to large datasets, such as:&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;Analyzing massive genomic datasets for personalized medicine&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;Training large language models (LLMs) and other AI applications efficiently  &lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;Running complex HPC simulations with rapid data access&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;A3 Ultra VMs with NVIDIA H200 Tensor Core GPUs&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;For GPU-based HPC workloads, we recently announced &lt;/span&gt;&lt;a href="https://cloud.google.com/blog/products/compute/trillium-sixth-generation-tpu-is-in-preview?e=0"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;A3 Ultra VMs&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, which feature NVIDIA H200 Tensor Core GPUs. A3 Ultra VMs offer a significant leap in performance over previous generations. They are built on servers with our new &lt;/span&gt;&lt;a href="https://cloud.google.com/titanium"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Titanium ML network adapter&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, optimized to deliver a secure, high-performance cloud experience for AI workloads, and powered by NVIDIA ConnectX-7 networking. Combined with our datacenter-wide 4-way rail-aligned network, A3 Ultra VMs deliver non-blocking 3.2 Tbps of GPU-to-GPU traffic with RDMA over Converged Ethernet (RoCE). &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Compared with A3 Mega, A3 Ultra offers: &lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;2x the GPU-to-GPU networking bandwidth, powered by Google Cloud’s Titanium ML network adapter and backed by our Jupiter data center network&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;Up to 2x higher LLM inferencing performance with nearly double the memory capacity and 1.4x more memory bandwidth&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;Ability to scale to tens of thousands of GPUs in a dense, performance-optimized cluster for large AI and HPC workloads&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;With &lt;/span&gt;&lt;a href="https://cloud.google.com/cluster-toolkit/docs/deploy/a3-mega-cluster-overview"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;system blueprints&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, available through &lt;/span&gt;&lt;a href="https://cloud.google.com/cluster-toolkit/docs/overview"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Cluster Toolkit&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, customers can quickly and easily create turnkey, pre-configured HPC clusters with Slurm support on A3 VMs.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;A3 Ultra VMs will also be available through &lt;/span&gt;&lt;a href="https://cloud.google.com/kubernetes-engine?e=0"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Google Kubernetes Engine&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; (GKE), which provides an open, portable, extensible, and highly-scalable platform for large-scale training and serving of AI workloads.&lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Trillium: Ushering in a new era of TPU performance for AI&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Tensor Processing Units, or TPUs, power our most advanced AI models such as &lt;/span&gt;&lt;a href="https://cloud.google.com/products/gemini?e=0"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Gemini&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, popular Google services like Search, Photos, and Maps, as well as scientific breakthroughs like AlphaFold 2 — which &lt;/span&gt;&lt;a href="https://www.nature.com/articles/d41586-024-03214-7" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;led to a Nobel Prize this year&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;!&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;We recently announced that &lt;/span&gt;&lt;a href="https://cloud.google.com/blog/products/compute/trillium-sixth-generation-tpu-is-in-preview?e=0"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Trillium, our sixth-generation TPU&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, is available to Google Cloud customers in preview. &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Compared with TPU v5e, Trillium delivers: &lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;Over 4x improvement in training performance &lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;Up to 3x increase in inference throughput &lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;67% increase in energy efficiency&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;4.7x increase in peak compute performance per chip &lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;Double the high bandwidth memory capacity &lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;Double the interchip interconnect bandwidth &lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Cluster Toolkit: Streamlining HPC deployments&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;We continue to improve &lt;/span&gt;&lt;a href="https://cloud.google.com/cluster-toolkit/docs/overview"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Cluster Toolkit&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, providing open-source tools for deploying and managing HPC environments on Google Cloud. Recent updates include:&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;a href="https://github.com/GoogleCloudPlatform/cluster-toolkit/tree/main/examples#major-changes-in-from-slurm-gcp-v5-to-v6" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Slurm-gcp V6&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; is now generally available, providing faster deployments and robust reconfiguration among other benefits.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;a href="https://cloud.google.com/support?e=48754805&amp;amp;hl=en"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Google Cloud Customer Care&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; is now available for Toolkit. You can find more information &lt;/span&gt;&lt;a href="https://cloud.google.com/cluster-toolkit/docs/getting-support"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;here&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; on how to get support via the Cloud Customer Care console.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;a href="https://cloud.google.com/blog/topics/hpc/ga-rocky-linux-8-and-centos-7-versions-of-hpc-vm-image?e=0"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;HPC VM Image&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; Rocky Linux 8 is now generally available, making it easy to build an HPC-ready VM instance, incorporating our &lt;/span&gt;&lt;a href="https://cloud.google.com/solutions/hpc?hl=en&amp;amp;e=0#section-7"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;best practices running HPC on Google Cloud&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;. &lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
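&lt;p&gt;As a minimal example of that last item, the following command creates a single HPC-ready instance from the Rocky Linux 8 HPC VM Image. The image family and project follow the HPC VM Image documentation; the machine type and zone are illustrative.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# Create an H3 instance booted from the Rocky Linux 8 HPC VM Image.
gcloud compute instances create hpc-node-1 \
  --zone=us-central1-a \
  --machine-type=h3-standard-88 \
  --image-family=hpc-rocky-linux-8 \
  --image-project=cloud-hpc-image-public&lt;/code&gt;&lt;/pre&gt;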
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;GKE: Container orchestration with scale and performance&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;GKE continues to lead the way for containerized workloads with &lt;/span&gt;&lt;a href="https://cloud.google.com/blog/products/containers-kubernetes/gke-65k-nodes-and-counting?e=4875480"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;the support of the largest Kubernetes clusters in the industry&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;. With support for up to 65,000 nodes, we believe GKE offers more than 10X larger scale than the other two largest public cloud providers.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;At the same time, we continue to invest in automating and simplifying the building of HPC and AI platforms, with:&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;a href="https://cloud.google.com/kubernetes-engine/docs/how-to/data-container-image-preloading"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Secondary boot disk&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, which provides faster workload startups through container image caching &lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;a href="https://cloud.google.com/kubernetes-engine/docs/how-to/dcgm-metrics"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Fully-managed DCGM metrics&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; for improved accelerator monitoring &lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;a href="https://cloud.google.com/kubernetes-engine/docs/concepts/about-custom-compute-classes"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Custom compute classes&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, offering greater control over compute resource allocation and scaling&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;Extensive innovations in &lt;/span&gt;&lt;a href="http://kueue.sh/" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Kueue.sh&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, which is becoming the de facto standard for job queueing on Kubernetes with topology-aware scheduling, priority and fairness in queueing, multi-cluster support (&lt;/span&gt;&lt;a href="https://www.youtube.com/watch?v=xMmskWIlktA&amp;amp;list=PLj6h78yzYM2Pw4mRw4S-1p_xLARMqPkA7&amp;amp;index=4" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;see demo by Google and CERN engineers&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;), and more&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
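&lt;p&gt;To make the Kueue item above concrete, here is a minimal, hedged sketch using the upstream v1beta1 API: a ResourceFlavor, a ClusterQueue with CPU and memory quota, and a namespaced LocalQueue that jobs are submitted against. The names and quotas are illustrative; see kueue.sh for installation and the current API.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;kubectl apply -f - &amp;lt;&amp;lt;'EOF'
apiVersion: kueue.x-k8s.io/v1beta1
kind: ResourceFlavor
metadata:
  name: default-flavor
---
apiVersion: kueue.x-k8s.io/v1beta1
kind: ClusterQueue
metadata:
  name: hpc-cluster-queue
spec:
  namespaceSelector: {}          # admit workloads from all namespaces
  resourceGroups:
  - coveredResources: ["cpu", "memory"]
    flavors:
    - name: default-flavor
      resources:
      - name: cpu
        nominalQuota: 1000
      - name: memory
        nominalQuota: 4Ti
---
apiVersion: kueue.x-k8s.io/v1beta1
kind: LocalQueue
metadata:
  name: team-a-queue
  namespace: default
spec:
  clusterQueue: hpc-cluster-queue
EOF&lt;/code&gt;&lt;/pre&gt;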
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Customer success stories: Atommap and beyond&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;a href="https://cloud.google.com/blog/topics/hpc/atommap-builds-elastic-supercomputer-on-google-cloud?e=48754805"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Atommap&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, a company specializing in atomic-scale materials design, is using Google Cloud HPC to accelerate its research and development efforts. With H3 VMs and Parallelstore, Atommap has achieved:  &lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;Significant speedup in simulations: Reduced time-to-results by more than half, enabling faster innovation &lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;Improved scalability: Easily scaled resources for 1,000s to 10,000s of molecular simulations, to meet growing computational demands &lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;Better cost-effectiveness: Optimized infrastructure costs, with savings of up to 80%, while achieving high performance &lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Atommap's success story highlights the transformative potential of Google Cloud HPC for organizations pushing the boundaries of scientific discovery and technological advancement.&lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Looking ahead&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Google Cloud is committed to continuous innovation for HPC. Expect further enhancements to HPC VMs, Parallelstore, Cluster Toolkit, Slurm-gcp, and other HPC products and solutions. With a focus on performance, scalability, compatibility, and ease of use, we’re empowering researchers and engineers to tackle the world's most complex computational challenges.&lt;/span&gt;&lt;/p&gt;
&lt;h2&gt;&lt;strong style="vertical-align: baseline;"&gt;Google Cloud Advanced Computing Community&lt;/strong&gt;&lt;/h2&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;We’re excited to announce the launch of the &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;Google Cloud Advanced Computing Community&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;, a new kind of community of practice for sharing and growing HPC, AI, and quantum computing expertise, innovation, and impact.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;This community of practice will bring together thought leaders and experts from Google, its partners, and HPC, AI, and quantum computing organizations around the world for engaging presentations and panels on innovative technologies and their applications. The Community will also leverage Google’s powerful, comprehensive, and cloud-native tools to create an interactive, dynamic, and engaging forum for discussion and collaboration.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;The Community launches now, with meetings starting in December 2024 and a full rollout of learning and collaboration resources in early 2025. To learn more, register &lt;/span&gt;&lt;a href="https://rsvp.withgoogle.com/events/google-cloud-advanced-computing-community" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;here&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;. &lt;/span&gt;&lt;/p&gt;
&lt;h2&gt;&lt;strong style="vertical-align: baseline;"&gt;Google Cloud at Supercomputing 2024&lt;/strong&gt;&lt;/h2&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;The annual &lt;/span&gt;&lt;a href="https://supercomputing.org/" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Supercomputing Conference&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; series brings together the global HPC community to showcase the latest advancements in HPC, networking, storage and data analysis. Google Cloud is excited to return to &lt;/span&gt;&lt;a href="https://sc24.supercomputing.org/" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Supercomputing 2024&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; in Atlanta with our largest presence ever. &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Visit Google Cloud at &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;booth #1730&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; to jump in and learn about our HPC, AI infrastructure, and quantum solutions. The booth will feature a Trillium TPU board, NVIDIA H200 GPU and ConnectX-7 NIC, hands-on labs, a full schedule of talks, a comfortable lounge space, and plenty of great swag!&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;The booth theater will include talks from ARM, Altair, Ansys, Intel, NAG, SchedMD, Siemens, Sycomp, Weka, and more. Booth labs will get you deploying Slurm clusters to fine-tune the Llama2 model or run GROMACS using Cloud Batch to run microbenchmarks or quantum simulations, and more.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;We’re also involved in several parts of SC24's technical program, including BoFs, User Groups, and Workshops. Googlers will participate in the following technical sessions: &lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;a href="https://sc24.conference-program.com/presentation/?id=bof236&amp;amp;sess=sess586" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Converged HPC and Cloud Computing in the Era of Generative AI&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; (&lt;/span&gt;&lt;a href="https://sc24.conference-program.com/presenter/?uid=169204" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Bill Magro&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; speaking)&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;a href="https://sc24.conference-program.com/presentation/?id=bof239&amp;amp;sess=sess667" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;HPC &amp;amp; Cloud Convergence: drivers, triggers, and constraints&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; (&lt;/span&gt;&lt;a href="https://sc24.conference-program.com/presenter/?uid=222953" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Felix Schürmann &lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;speaking)&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;a href="https://daos.io/dug24" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;DAOS User Group (DUG) ‘24&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; (&lt;/span&gt;&lt;a href="https://sc24.conference-program.com/presenter/?uid=648153" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Dean Hildebrand&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; speaking)&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;a href="https://sc24.conference-program.com/presentation/?id=bof199&amp;amp;sess=sess639" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;DAOS BoF&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; (Dean Hildebrand speaking)&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;a href="https://www.pdsw.org/index.shtml" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;9th International Parallel Data Systems Workshop (PDSW)&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; (Dean Hildebrand speaking)&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;a href="https://sc24.conference-program.com/presentation/?id=bof108&amp;amp;sess=sess606" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;IO500: The High-Performance Storage Community BoF&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; (Dean Hildebrand speaking)&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;a href="https://sc24.conference-program.com/presentation/?id=tut143&amp;amp;sess=sess417" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;High-Performance Object Storage: I/O for the Exascale Era Tutorial&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; (Dean Hildebrand speaking)&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;a href="https://womeninhpc.org/events/sc-2024-workshop" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Women in HPC Workshop&lt;/span&gt;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Google is also hosting or sponsoring the following exciting events during SC24. We’re looking forward to seeing you there!&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;a href="https://d126qb04.na1.hubspotlinks.com/Ctc/2M+113/d126qB04/VVt2492XYN3JW1_zCzQ21wWdCW6tdK5Q5mCdXBN6D7jYv3qn9gW7Y8-PT6lZ3pXW1dLqXH8DHPZwW7MKvrq761rrQW2L76ML8K8xFDN8rtGLzR1rPDW2W_Vhd7WLvTMW1r77qY4xVGbdW7gb9d72rp-S7W4PjwX73Zbp5lW7qQb138JVdmjN4dzXC8KGkkwVqn3091JTxz4W1kPDm26rfKJjW1ps5d06tgM2VW49hWyz5G-vYpW6zFBT51tkwgbW6Y2x_33PdjMJW4Hn3xM672S4rW7cQz4S2CFDqRN6FRq-1lKCcqW2kjp7m8CZTq-W4x6nVm4yP08KW8_F1z518GbkjW29VsDr8CBfDbW246K4578Lm_dW4Q_kln19yjxBW7hS4bP5Z92wjf5XTdKd04" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Sycomp Reception&lt;/span&gt;&lt;/a&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt; &lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;a href="https://beowulfbash.com/" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Beowulf Bash&lt;/span&gt;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;a href="https://hyperionresearch.com/register-breakfast-briefing/" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Hyperion Research - Breakfast Briefing&lt;/span&gt;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;a href="https://womeninhpc.org/events/sc-2024-networking-reception" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Women in HPC Reception&lt;/span&gt;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;a href="https://carahevents.carahsoft.com/Event/Register/544427-google" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Carahsoft Reception&lt;/span&gt;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Finally, we’ll be holding private meetings and roadmap briefings with our HPC leadership throughout the conference. To schedule a meeting, please contact &lt;/span&gt;&lt;a href="mailto:hpc-sales@google.com"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;hpc-sales@google.com&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;</description><pubDate>Fri, 15 Nov 2024 17:00:00 +0000</pubDate><guid>https://cloud.google.com/blog/topics/hpc/whats-new-with-hpc/</guid><category>AI &amp; Machine Learning</category><category>Compute</category><category>HPC</category><og xmlns:og="http://ogp.me/ns#"><type>article</type><title>What’s new with HPC and AI infrastructure at Google Cloud</title><description></description><site_name>Google</site_name><url>https://cloud.google.com/blog/topics/hpc/whats-new-with-hpc/</url></og><author xmlns:author="http://www.w3.org/2005/Atom"><name>Annie Ma-Weaver</name><title>Group Product Manager, Google Cloud HPC</title><department></department><company></company></author><author xmlns:author="http://www.w3.org/2005/Atom"><name>Wyatt Gorman</name><title>Solutions Manager, HPC &amp; AI Infrastructure, Google Cloud</title><department></department><company></company></author></item><item><title>Parallelstore is now GA, fueling the next generation of AI and HPC workloads</title><link>https://cloud.google.com/blog/products/storage-data-transfer/parallelstore-high-performance-file-service-for-hpc-and-ai-is-ga/</link><description>&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Organizations use artificial intelligence (AI) and high-performance computing (HPC) applications to process massive datasets, run complex simulations, and train generative models with billions of parameters for diverse use cases such as LLMs, genomic analysis, quantitative analysis, or real-time sports analytics. These workloads place big performance demands on their storage systems, requiring high throughput and I/O performance that scales and that maintains sub-millisecond latencies, even when thousands of clients are concurrently reading and writing the same shared files.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;To power these next-generation AI and HPC workloads, we &lt;/span&gt;&lt;a href="https://cloud.google.com/blog/products/storage-data-transfer/storage-announcements-at-next24"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;announced&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; Parallelstore at Google Cloud Next 2024, and today, we are excited to announce that it is now generally available. Built on the &lt;/span&gt;&lt;a href="https://docs.daos.io/v2.6/" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Distributed Asynchronous Object Storage (DAOS) architecture&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, &lt;/span&gt;&lt;a href="https://cloud.google.com/parallelstore"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Parallelstore&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; combines a fully distributed metadata and key-value architecture to deliver high-performance throughput and IOPS.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Read on to learn how Parallelstore serves the needs of complex AI and HPC workloads, allowing you to maximize &lt;/span&gt;&lt;a href="https://cloud.google.com/blog/products/ai-machine-learning/goodput-metric-as-measure-of-ml-productivity"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;goodput&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; and GPU/TPU utilization, programmatically move data in and out of Parallelstore, and provision &lt;/span&gt;&lt;a href="https://cloud.google.com/kubernetes-engine"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Google Kubernetes Engine&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; and &lt;/span&gt;&lt;a href="https://cloud.google.com/products/compute"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Compute Engine&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; resources.&lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Maximize goodput and GPU/TPU utilization&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;To overcome the performance limitations of traditional parallel file systems, Parallelstore uses a distributed metadata management system and a key-value store architecture. Parallelstore’s high-throughput parallel data access minimizes latency and I/O bottlenecks, and allows it to saturate the network bandwidth of individual compute clients. This efficient data delivery maximizes goodput to GPUs and TPUs, a critical factor for optimizing AI workload costs. Parallelstore can also provide continuous read/write access to thousands of VMs, GPUs and TPUs, satisfying modest-to-massive AI and HPC workload requirements. &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;For a 100 TiB deployment, the maximum Parallelstore deployment, throughput scales to ~115 GiB/s, ~3 million read IOPS, ~1 million write IOPS, and a low-latency of ~0.3 ms. This means that Parallelstore is also a good platform for small files and random, distributed access across a large number of clients. For AI use cases, Parallelstore’s performance with small files and metadata operations enables up to 3.9x faster training times and up to 3.7x higher training throughput compared to native ML framework data loaders, as measured by Google Cloud benchmarking.&lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Programmatically move data in and out of Parallelstore &lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Many AI and HPC workloads store data in &lt;/span&gt;&lt;a href="https://cloud.google.com/storage"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Cloud Storage&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; for data preparation or archiving. You can use Parallelstore’s integrated import/export API to automate movement of the data you’d like to import to Parallelstore for processing. With the API, you can ingest massive datasets from Cloud Storage into Parallelstore at ~20GB/s for files larger than 32MB, and at ~5,000 files per second for files under 32MB.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-code"&gt;&lt;dl&gt;
    &lt;dt&gt;code_block&lt;/dt&gt;
    &lt;dd&gt;&amp;lt;ListValue: [StructValue([(&amp;#x27;code&amp;#x27;, &amp;#x27;gcloud alpha parallelstore instances import-data $INSTANCE_ID\r\n--location=$LOCATION --source-gcs-bucket-uri=gs://$BUCKET_NAME\r\n[--destination-parallelstore-path=&amp;quot;/&amp;quot;] --project= $PROJECT_ID&amp;#x27;), (&amp;#x27;language&amp;#x27;, &amp;#x27;&amp;#x27;), (&amp;#x27;caption&amp;#x27;, &amp;lt;wagtail.rich_text.RichText object at 0x7f1744bad6a0&amp;gt;)])]&amp;gt;&lt;/dd&gt;
&lt;/dl&gt;&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;When an AI training job or HPC workload is complete, you can export results programmatically to Cloud Storage for further assessment or longer-term storage. You can also automate data transfers via the API, minimizing manual intervention and streamlining data pipelines.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-code"&gt;&lt;dl&gt;
    &lt;dt&gt;code_block&lt;/dt&gt;
    &lt;dd&gt;&amp;lt;ListValue: [StructValue([(&amp;#x27;code&amp;#x27;, &amp;#x27;gcloud alpha parallelstore instances export-data $INSTANCE_ID --location=$LOCATION --destination-gcs-bucket-uri=gs://$BUCKET_NAME\r\n[--source-parallelstore-path=&amp;quot;/&amp;quot;]&amp;#x27;), (&amp;#x27;language&amp;#x27;, &amp;#x27;&amp;#x27;), (&amp;#x27;caption&amp;#x27;, &amp;lt;wagtail.rich_text.RichText object at 0x7f1744bad7f0&amp;gt;)])]&amp;gt;&lt;/dd&gt;
&lt;/dl&gt;&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Programmatically provision GKE resources through the CSI driver&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;It’s easy to efficiently manage high-performance storage for containerized workloads through Parallelstores’ GKE CSI driver. You can dynamically provision and manage Parallelstore file systems as persistent volumes or access existing Parallelstore instances in Kubernetes workloads, directly within your GKE clusters using familiar Kubernetes APIs. This reduces the need to learn and manage a separate storage system, so you can focus on optimizing resources and lowering TCO. &lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-code"&gt;&lt;dl&gt;
    &lt;dt&gt;code_block&lt;/dt&gt;
    &lt;dd&gt;&amp;lt;ListValue: [StructValue([(&amp;#x27;code&amp;#x27;, &amp;#x27;apiVersion: storage.k8s.io/v1\r\nkind: StorageClass\r\nmetadata:\r\n  name: parallelstore-class\r\nprovisioner: parallelstore.csi.storage.gke.io\r\nvolumeBindingMode: Immediate\r\nreclaimPolicy: Delete\r\nallowedTopologies:\r\n- matchLabelExpressions:\r\n  - key: topology.gke.io/zone\r\n    values:\r\n    - us-central1-a&amp;#x27;), (&amp;#x27;language&amp;#x27;, &amp;#x27;&amp;#x27;), (&amp;#x27;caption&amp;#x27;, &amp;lt;wagtail.rich_text.RichText object at 0x7f1744bad3d0&amp;gt;)])]&amp;gt;&lt;/dd&gt;
&lt;/dl&gt;&lt;/div&gt;
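&lt;p&gt;As a minimal sketch of how a workload could consume that storage class, the following creates a PersistentVolumeClaim bound to it. The claim name, requested size, and access mode are assumptions for illustration, not values taken from the Parallelstore documentation.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;kubectl apply -f - &amp;lt;&amp;lt;'EOF'
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: parallelstore-pvc
spec:
  accessModes: ["ReadWriteMany"]     # assumed; Parallelstore is a shared file system
  storageClassName: parallelstore-class
  resources:
    requests:
      storage: 12Ti
EOF&lt;/code&gt;&lt;/pre&gt;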
&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;In the coming months, you’ll be able to preload data from Cloud Storage via the fully managed GKE Volume Populator, which automates the preloading of data from Cloud Storage directly into Parallelstore during the PersistentVolumeClaim provisioning process. This helps ensure your training data is readily available, so you can minimize idle compute-resource time and maximize GPU and TPU utilization.&lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Programmatically provision Compute Engine resources with the Cluster Toolkit&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;It’s easy to deploy Parallelstore instances for Compute Engine with the support of the &lt;/span&gt;&lt;a href="https://cloud.google.com/cluster-toolkit/docs/overview"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Cluster Toolkit&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;. Formerly known as Cloud HPC Toolkit, Cluster Toolkit is open-source software for deploying HPC and AI workloads. Cluster Toolkit provisions compute, network, and storage resources for your cluster/workload following best practices. You can get started with Cluster Toolkit today by incorporating the&lt;/span&gt;&lt;a href="https://github.com/GoogleCloudPlatform/cluster-toolkit/tree/main/modules/file-system/parallelstore" rel="noopener" target="_blank"&gt;&lt;span style="vertical-align: baseline;"&gt; &lt;/span&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Parallelstore&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; module into your blueprint with only a four-line change in your blueprint; we also provide &lt;/span&gt;&lt;a href="https://github.com/GoogleCloudPlatform/cluster-toolkit/blob/main/examples/ps-slurm.yaml" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;starter blueprints&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; for your convenience. In addition to the Cluster Toolkit, there are also &lt;/span&gt;&lt;a href="https://registry.terraform.io/providers/hashicorp/google/latest/docs/resources/parallelstore_instance" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Terraform templates&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; for deploying Parallelstore, supporting operations and provisioning processes through code and minimizing manual operational overhead. &lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-code"&gt;&lt;dl&gt;
    &lt;dt&gt;code_block&lt;/dt&gt;
    &lt;dd&gt;&amp;lt;ListValue: [StructValue([(&amp;#x27;code&amp;#x27;, &amp;#x27;resource &amp;quot;google_parallelstore_instance&amp;quot; &amp;quot;instance&amp;quot; { \r\ninstance_id = &amp;quot;instance&amp;quot; \r\nlocation = &amp;quot;us-central1-a&amp;quot; \r\ndescription = &amp;quot;test instance&amp;quot; \r\ncapacity_gib = 12000 \r\nnetwork = google_compute_network.network.name \r\nfile_stripe_level = &amp;quot;FILE_STRIPE_LEVEL_MIN&amp;quot; \r\ndirectory_stripe_level = &amp;quot;DIRECTORY_STRIPE_LEVEL_MIN&amp;quot; \r\nlabels = { \r\ntest = &amp;quot;value&amp;quot; \r\n} \r\nprovider = google-beta \r\ndepends_on = [google_service_networking_connection.default] \r\n} \r\n\r\nresource &amp;quot;google_compute_network&amp;quot; &amp;quot;network&amp;quot; { \r\nname = &amp;quot;network&amp;quot; \r\nauto_create_subnetworks = true \r\nmtu = 8896 \r\nprovider = google-beta \r\n} \r\n\r\n# Create an IP address \r\nresource &amp;quot;google_compute_global_address&amp;quot; &amp;quot;private_ip_alloc&amp;quot; { \r\nname = &amp;quot;address&amp;quot; \r\npurpose = &amp;quot;VPC_PEERING&amp;quot; \r\naddress_type = &amp;quot;INTERNAL&amp;quot; \r\nprefix_length = 24 \r\nnetwork = google_compute_network.network.id \r\nprovider = google-beta \r\n} \r\n\r\n# Create a private connection \r\nresource &amp;quot;google_service_networking_connection&amp;quot; &amp;quot;default&amp;quot; { \r\nnetwork = google_compute_network.network.id \r\nservice = &amp;quot;servicenetworking.googleapis.com&amp;quot;\r\nreserved_peering_ranges = [google_compute_global_address.private_ip_alloc.name] \r\nprovider = google-beta \r\n}&amp;#x27;), (&amp;#x27;language&amp;#x27;, &amp;#x27;&amp;#x27;), (&amp;#x27;caption&amp;#x27;, &amp;lt;wagtail.rich_text.RichText object at 0x7f1744bad6d0&amp;gt;)])]&amp;gt;&lt;/dd&gt;
&lt;/dl&gt;&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Real-world impact: Respo.vision sees more with Parallelstore&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Respo.Vision, a leader in sports video analytics, is leveraging Parallelstore to accelerate an upgrade from 4K to 8K videos for their real-time system. By using Parallelstore as the transport layer, Respo.vision helps capture and label granular data markers, delivering actionable insights to coaches, scouts, and fans. With Parallelstore, Respo.vision avoided pricey infrastructure investments to manage surges of high-performance video processing, all while maintaining low compute latency. &lt;/span&gt;&lt;/p&gt;
&lt;p style="padding-left: 40px;"&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;“Our goal was to process 8K video streams at 25 frames per second to deliver richer quality sports analytical data to our customers, and Parallelstore exceeded expectations by effortlessly handling the required volume and delivering an impressive read latency of 0.3 ms. The integration into our system was remarkably smooth and thanks to its distributed nature, Parallelstore has significantly enhanced our system's scalability and resilience.”&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt; - Wojtek Rosinski, CTO, Respo.vision &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;HPC and AI usage is growing rapidly. With its combination of innovative architecture, performance, and integration with Cloud Storage, GKE, and Compute Engine, Parallelstore is the storage solution you need to keep the demanding GPU/TPUs and workloads satisfied. To learn more about Parallelstore, check out the &lt;/span&gt;&lt;a href="https://cloud.google.com/parallelstore/docs/overview"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;documentation&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, and reach out to your sales team for more information.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;</description><pubDate>Fri, 04 Oct 2024 17:00:00 +0000</pubDate><guid>https://cloud.google.com/blog/products/storage-data-transfer/parallelstore-high-performance-file-service-for-hpc-and-ai-is-ga/</guid><category>AI &amp; Machine Learning</category><category>HPC</category><category>Storage &amp; Data Transfer</category><og xmlns:og="http://ogp.me/ns#"><type>article</type><title>Parallelstore is now GA, fueling the next generation of AI and HPC workloads</title><description></description><site_name>Google</site_name><url>https://cloud.google.com/blog/products/storage-data-transfer/parallelstore-high-performance-file-service-for-hpc-and-ai-is-ga/</url></og><author xmlns:author="http://www.w3.org/2005/Atom"><name>Barak Epstein</name><title>Sr Product Manager</title><department></department><company></company></author><author xmlns:author="http://www.w3.org/2005/Atom"><name>Chinmayee Rathi</name><title>Product Manager</title><department></department><company></company></author></item><item><title>Boosting Google Cloud HPC performance with optimized Intel MPI</title><link>https://cloud.google.com/blog/topics/hpc/how-the-intel-mpi-library-boosts-hpc-performance/</link><description>&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;a href="https://cloud.google.com/solutions/hpc?hl=en"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;High performance computing&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; (HPC) is central to fueling innovation across industries. Through simulation, HPC accelerates product design cycles, increases product safety, delivers timely weather predictions, enables training of AI foundation models, and unlocks scientific discoveries across disciplines to name but a few examples. HPC tackles these computationally demanding problems by employing large numbers of computing elements, servers, or virtual machines, in tight orchestration with one another and communicating via the Message Passing Interface (MPI). In this blog, we show how we boosted HPC performance on Google Cloud using Intel® MPI Library. &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Google Cloud offers a wide range of VM families that cater to demanding workloads, including &lt;/span&gt;&lt;a href="https://cloud.google.com/blog/products/compute/new-h3-vm-instances-are-optimized-for-hpc/"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;H3 compute optimized VMs&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, which are ideal for HPC workloads. These VMs feature Google’s &lt;/span&gt;&lt;a href="https://cloud.google.com/blog/products/compute/titanium-underpins-googles-workload-optimized-infrastructure"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Titanium&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; technology, for advanced network offloads and other functions, and are optimized by Intel software tools to bring together the latest innovations in computing, networking, and storage into one platform. In third-generation VMs such as H3, &lt;/span&gt;&lt;a href="https://cloud.google.com/blog/products/compute/introducing-c3-machines-with-googles-custom-intel-ipu"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;C3&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, C3D or A3, the &lt;/span&gt;&lt;a href="https://www.intel.com/content/www/us/en/products/details/network-io/ipu.html" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Intel Infrastructure Processing Unit (IPU)&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; E2000 offloads the networking from the CPU onto a dedicated device, securely enabling low latency 200G Ethernet. Further, integrated support for Titanium in the Intel MPI library, brings the benefits of network offload to HPC workloads such as molecular dynamics, computational geoscience, weather forecasting, front-end and back-end Electronic Design Automation (EDA), Computer Aided Engineering (CAE), and Computational Fluid Dynamics (CFD). The latest version of the Intel MPI Library is included in the Google Cloud &lt;/span&gt;&lt;a href="https://cloud.google.com/compute/docs/instances/create-hpc-vm"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;HPC VM Image&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;. &lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;MPI Library optimized for 3rd gen VMs and Titanium &lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;a href="https://www.intel.com/content/www/us/en/developer/tools/oneapi/mpi-library.html" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Intel MPI Library&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; is a multi-fabric message-passing library that implements the MPI API standard. It’s a commercial-grade MPI implementation based on the open-source &lt;/span&gt;&lt;a href="https://www.mpich.org/" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;MPICH project&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, and it uses the OpenFabrics Interface (OFI, aka libfabric) to handle fabric-specific communication details. Various libfabric providers are available, each optimized for a different set of fabrics and protocols. &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Version 2021.11 of the Intel MPI Library specifically improves the PSM3 provider and provides tunings for the PSM3 and OFI/TCP providers for the Google Cloud environment, including the Intel IPU E2000. The Intel MPI Library 2021.11 also takes advantage of the high core counts and advanced features available on 4th Generation Intel Xeon Scalable Processors and supports newer Linux OS distributions and newer versions of applications and libraries. Taken together, these improvements unlock additional performance and application features on 3rd generation VMs with Titanium.&lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Boosting HPC application performance&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Applications like Siemens &lt;/span&gt;&lt;a href="https://plm.sw.siemens.com/en-US/simcenter/fluids-thermal-simulation/star-ccm/" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Simcenter&lt;/span&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;&lt;span style="vertical-align: super;"&gt;TM&lt;/span&gt;&lt;/span&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt; STAR-CCM+&lt;/span&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;&lt;span style="vertical-align: super;"&gt;TM&lt;/span&gt;&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; software shorten the time-to-solution through parallel computing. For example, if doubling the computational resources solves the same problem in half the time, the parallel scaling is 100% efficient, and the speedup is 2x compared to the run with half the resources. In practice, a speedup of 2x per doubling may not be achieved for a variety of reasons, such as not exposing enough parallelism, or overhead from inter-node communication. An improved communication library directly improves the latter problem.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;To demonstrate the performance improvements of the new Intel MPI Library, Google and Intel tested Simcenter STAR-CCM+ with several standard benchmarks on H3 instances. The figure shows five standard benchmarks up to 32 VMs (2,816 cores)&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;.&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt; As you can see, good speedups are achieved throughout the tested scenarios; only the smallest benchmark (LeMans_Poly_17M) stops scaling beyond 16 nodes due to its small problem size (which is not addressed by communication library performance). In some benchmarks (LeMans_100M_Coupled and AeroSuvSteadyCoupled106M), superlinear scaling can even be observed for some VM counts, likely due to the increased available cache.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-image_full_width"&gt;






  
    &lt;div class="article-module h-c-page"&gt;
      &lt;div class="h-c-grid"&gt;
  

    &lt;figure class="article-image--large
      
      
        h-c-grid__col
        h-c-grid__col--6 h-c-grid__col--offset-3
        
        
      "
      &gt;

      
      
        
        &lt;img
            src="https://storage.googleapis.com/gweb-cloudblog-publish/images/SimcenterTM_STAR-CCMTM_Wall_Clock_Speedup_.max-1000x1000.png"
        
          alt="Simcenterᵀᴹ STAR-CCM+ᵀᴹ Wall Clock Speedup Ratios_Intel MPI 2021.11+PSM3 vs Intel MPI 2021.7"&gt;
        
        &lt;/a&gt;
      
    &lt;/figure&gt;

  
      &lt;/div&gt;
    &lt;/div&gt;
  




&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;To show the improvements of Intel MPI 2021.11 over Intel MPI 2021.7, we used the ratio of runtimes between the two for each run. This speedup ratio is computed by dividing the parallel runtime of the older version by the parallel runtime of the newer version; we show those speedup ratios in the table below.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;The table shows that for nearly all benchmarks and node counts, the optimized Intel MPI 2021.11 version delivers higher parallel scalability and absolute performance. This gain in efficiency — and thus shorter time-to-solution and lower cost — is already present at just two VMs (up to 1.06x improvement) and grows dramatically at larger VM counts (between 2.42x and 5.27x at 32 VMs). For the smallest benchmark (LeMans_Poly_17M) at 16 VMs, there’s an impressive improvement of 11.53x, which indicates that, unlike the older version, the newer MPI version allows good scaling up to 16 VMs. &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;These results demonstrate that the optimized Intel MPI Library increases the scalability of Simcenter STAR-CCM+ on Google Cloud, allowing for faster time-to-solution for end users and more efficient use of their cloud resources.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-image_full_width"&gt;






  
    &lt;div class="article-module h-c-page"&gt;
      &lt;div class="h-c-grid"&gt;
  

    &lt;figure class="article-image--large
      
      
        h-c-grid__col
        h-c-grid__col--6 h-c-grid__col--offset-3
        
        
      "
      &gt;

      
      
        
        &lt;img
            src="https://storage.googleapis.com/gweb-cloudblog-publish/images/2_x56vucJ.max-1000x1000.jpg"
        
          alt="2"&gt;
        
        &lt;/a&gt;
      
    &lt;/figure&gt;

  
      &lt;/div&gt;
    &lt;/div&gt;
  




&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Benchmarks were run using Intel MPI 2021.7 and its TCP provider and Intel MPI 2021.11 and the PSM3 libfabric provider. Simcenter STAR-CCM+ version 2306 (18.06.006) was tested&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt; &lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;on &lt;/span&gt;&lt;a href="https://cloud.google.com/blog/products/compute/new-h3-vm-instances-are-optimized-for-hpc/"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Google Cloud’s H3 instances&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, with 88 MPI processes per node and 200 Gbps networking, running CentOS Linux release 7.9.2009. &lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;What customers and partners are saying&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;“Intel is proud to collaborate with Google to deliver leadership software and hardware for the Google Cloud Platform and H3 VMs. Together, our work gives customers new levels of performance and efficiency for computational fluid dynamics and HPC workloads.”&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt; -Sanjiv Shah, Vice President, Intel, Software and Advanced Technology Group, General Manager, Developer Software Engineering&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Trademarks&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;A list of relevant Siemens trademarks can be found &lt;/span&gt;&lt;a href="https://www.sw.siemens.com/en-US/trademarks/" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;here&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;. Other trademarks belong to their respective owners.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-related_article_tout"&gt;





&lt;div class="uni-related-article-tout h-c-page"&gt;
  &lt;section class="h-c-grid"&gt;
    &lt;a href="https://cloud.google.com/blog/topics/hpc/enhancements-to-cloud-hpc-toolkit-include-new-blueprint-catalog/"
       data-analytics='{
                       "event": "page interaction",
                       "category": "article lead",
                       "action": "related article - inline",
                       "label": "article: {slug}"
                     }'
       class="uni-related-article-tout__wrapper h-c-grid__col h-c-grid__col--8 h-c-grid__col-m--6 h-c-grid__col-l--6
        h-c-grid__col--offset-2 h-c-grid__col-m--offset-3 h-c-grid__col-l--offset-3 uni-click-tracker"&gt;
      &lt;div class="uni-related-article-tout__inner-wrapper"&gt;
        &lt;p class="uni-related-article-tout__eyebrow h-c-eyebrow"&gt;Related Article&lt;/p&gt;

        &lt;div class="uni-related-article-tout__content-wrapper"&gt;
          &lt;div class="uni-related-article-tout__image-wrapper"&gt;
            &lt;div class="uni-related-article-tout__image" style="background-image: url('')"&gt;&lt;/div&gt;
          &lt;/div&gt;
          &lt;div class="uni-related-article-tout__content"&gt;
            &lt;h4 class="uni-related-article-tout__header h-has-bottom-margin"&gt;Cloud HPC made easy: A Blueprint Catalog for Google&amp;#x27;s Cloud HPC Toolkit&lt;/h4&gt;
            &lt;p class="uni-related-article-tout__body"&gt;Solutions in Cloud HPC Toolkit’s new Blueprint Catalog make it easy to get started with HPC on Google Cloud.&lt;/p&gt;
            &lt;div class="cta module-cta h-c-copy  uni-related-article-tout__cta muted"&gt;
              &lt;span class="nowrap"&gt;Read Article
                &lt;svg class="icon h-c-icon" role="presentation"&gt;
                  &lt;use xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="#mi-arrow-forward"&gt;&lt;/use&gt;
                &lt;/svg&gt;
              &lt;/span&gt;
            &lt;/div&gt;
          &lt;/div&gt;
        &lt;/div&gt;
      &lt;/div&gt;
    &lt;/a&gt;
  &lt;/section&gt;
&lt;/div&gt;

&lt;/div&gt;</description><pubDate>Tue, 13 Aug 2024 16:00:00 +0000</pubDate><guid>https://cloud.google.com/blog/topics/hpc/how-the-intel-mpi-library-boosts-hpc-performance/</guid><category>Compute</category><category>HPC</category><og xmlns:og="http://ogp.me/ns#"><type>article</type><title>Boosting Google Cloud HPC performance with optimized Intel MPI</title><description></description><site_name>Google</site_name><url>https://cloud.google.com/blog/topics/hpc/how-the-intel-mpi-library-boosts-hpc-performance/</url></og><author xmlns:author="http://www.w3.org/2005/Atom"><name>Mansoor Alicherry</name><title>HPC Software Engineer, Cloud ML Compute Services, Google Cloud</title><department></department><company></company></author><author xmlns:author="http://www.w3.org/2005/Atom"><name>Todd Rimmer</name><title>Director of Software Architecture, Intel NEX Cloud Connectivity Group</title><department></department><company></company></author></item><item><title>Build large-scale AI/ML and HPC clusters with Cluster Toolkit (formerly HPC Toolkit)</title><link>https://cloud.google.com/blog/topics/hpc/build-aiml-hpc-clusters-with-cluster-toolkit/</link><description>&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;&lt;strong style="font-style: italic; vertical-align: baseline;"&gt;Update&lt;/strong&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;: Starting the week of September 16, 2024, Google Cloud customers with eligible support plans can access assistance for the Cluster Toolkit through the Google Cloud console. Cluster Toolkit, formerly known as Cloud HPC Toolkit, is open-source software offered by Google Cloud that simplifies the process for you to deploy HPC, AI and ML workloads on Google Cloud. The Cloud Support team will handle filed cases, ensuring that you receive timely and effective support for your Cluster Toolkit implementations. Select 'Cluster Toolkit' as the sub-category under 'Compute Engine' when creating a support ticket in the console to get in touch about any Cluster Toolkit issues.&lt;/span&gt;&lt;/span&gt;&lt;/p&gt;
&lt;hr/&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;The Cloud HPC Toolkit, now rebranded as &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;Cluster Toolkit&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;, simplifies the creation and management of high performance computing environments on Google Cloud. Initially focused on scientific and technical computing workloads, it has expanded to encompass AI/ML applications, reflecting its widespread adoption across various domains.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;The Cluster Toolkit empowers users to focus on their workloads by streamlining cluster setup and deployment, leveraging Google Cloud's best practices, and offering flexibility for diverse computing tasks. Key benefits include:&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Easy deployment and management of clusters&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;: The Toolkit simplifies the process of setting up and maintaining clusters, allowing users to focus on their workloads rather than infrastructure management. The Toolkit supports multiple schedulers including Slurm, GKE, and Batch.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Quickstart options for HPC and AI/ML workloads:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; The Toolkit has a library of pre-built blueprints and modules that let users begin running their workloads quickly, accelerating time-to-value. &lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Integration of Google Cloud best practices&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;: The aforementioned blueprints and modules incorporate Google Cloud's recommended configurations, ensuring that clusters are set up for optimal performance and efficiency.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Regular updates and new features&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;: The Toolkit is actively maintained and updated with new features and improvements, providing users with ongoing support and enhancements.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Open-source accessibility&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;: The Toolkit is open-source, allowing users to customize and extend its capabilities to meet their specific needs.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;What's new in Cluster Toolkit&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;In addition to a new name, Cluster Toolkit has&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt; several new features for HPC and AI/ML workloads:&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;a href="https://github.com/GoogleCloudPlatform/hpc-toolkit/tree/main/examples/machine-learning/a3-megagpu-8g" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;A3 Mega Blueprint&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;: This blueprint makes it easy to deploy a cluster of A3 Mega VMs ready for training large language models (LLMs) and other AI/ML workloads. Earlier in the year, we also launched the &lt;/span&gt;&lt;a href="https://cloud.google.com/blog/topics/hpc/cloud-hpc-toolkit-blueprint-deploys-nemo-framework-on-a3-vms?e=48754805"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;A3 Blueprint&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;a href="https://cloud.google.com/compute/docs/instances/create-hpc-vm"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;HPC VM Image&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;: This VM Image is pre-installed with popular HPC tools and libraries, ensuring you can begin running your HPC workloads quickly with assured performance. &lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;ul&gt;
&lt;li aria-level="2" style="list-style-type: circle; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;The &lt;/span&gt;&lt;a href="https://cloud.google.com/blog/topics/hpc/ga-rocky-linux-8-and-centos-7-versions-of-hpc-vm-image?e=48754805"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Rocky 8 version of the HPC VM Image is now GA&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;. &lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="2" style="list-style-type: circle; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;Note that we have released the final CentOS 7 version of the &lt;/span&gt;&lt;a href="https://cloud.google.com/compute/docs/instances/create-hpc-vm"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;HPC VM Image&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;. CentOS &lt;/span&gt;&lt;a href="https://www.redhat.com/en/topics/linux/centos-linux-eol" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;reached end-of-life on June 30, 2024&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, meaning that it will no longer receive security updates. Going forward, we strongly recommend moving to Rocky 8 and will be releasing regular Rocky 8 versions of the HPC VM Image. &lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="2" style="list-style-type: circle; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;We are releasing the ability to disable automatic updates in the HPC VM Image. Automatic updates can disrupt the performance of HPC applications, so we’re giving you the option to &lt;/span&gt;&lt;a href="https://cloud.google.com/compute/docs/instances/create-hpc-vm#disable_automatic_updates"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;turn them off via metadata&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;a href="https://cloud.google.com/blog/topics/hpc/slurm-gcp-v6-is-now-ga?e=48754805"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Slurm-gcp v6&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;: The latest version of the Slurm-gcp solution, which provides a seamless experience for running Slurm workloads on Google Cloud, is now GA. &lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
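&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;As a minimal sketch of the HPC VM Image options above, the following creates an instance from the Rocky Linux 8 image family with automatic updates turned off. The instance name, zone, and machine type are placeholders, and the metadata key is our assumption of the documented switch; confirm it against the linked documentation.&lt;/span&gt;&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# Create a VM from the HPC VM Image (Rocky Linux 8) with automatic updates disabled.
# The metadata key below is an assumption; verify it in the HPC VM Image docs linked above.
gcloud compute instances create my-hpc-node \
    --zone=us-central1-a \
    --image-family=hpc-rocky-linux-8 \
    --image-project=cloud-hpc-image-public \
    --machine-type=c2-standard-60 \
    --metadata=google_disable_automatic_updates=TRUE&lt;/code&gt;&lt;/pre&gt;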
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Guidelines for existing Toolkit customers&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;We've &lt;/span&gt;&lt;a href="https://github.com/GoogleCloudPlatform/cluster-toolkit/discussions/2844" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;renamed&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; our &lt;/span&gt;&lt;a href="https://github.com/GoogleCloudPlatform/cluster-toolkit" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;GitHub repo&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; to “Cluster Toolkit” and renamed some commands (e.g., ghpc is now gcluster). Existing Git operations and commands will still work, but we strongly recommend updating local clones and command names to avoid confusion.&lt;/span&gt;&lt;/p&gt;
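&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;For an existing clone, the update is mostly a matter of pointing the remote at the renamed repository and switching to the renamed binary; the directory name below is a placeholder, and the make-based build step is an assumption based on the Toolkit&amp;#x27;s standard workflow.&lt;/span&gt;&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# Point an existing clone at the renamed repository and rebuild the renamed binary.
cd hpc-toolkit   # placeholder path to your existing clone
git remote set-url origin https://github.com/GoogleCloudPlatform/cluster-toolkit.git
git pull
make             # builds the gcluster binary (formerly ghpc)
./gcluster --version&lt;/code&gt;&lt;/pre&gt;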
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;How to get started&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;To get started with the Cluster Toolkit, select one of our &lt;/span&gt;&lt;a href="https://github.com/GoogleCloudPlatform/hpc-toolkit/tree/main/examples" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;easy-to-use HPC and AI/ML blueprints&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, available through our &lt;/span&gt;&lt;a href="https://github.com/GoogleCloudPlatform/cluster-toolkit" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;GitHub repo&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, and use it to set up a cluster. We also offer a variety of resources to help you get started, including &lt;/span&gt;&lt;a href="https://cloud.google.com/hpc-toolkit/docs/overview"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;documentation&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, &lt;/span&gt;&lt;a href="https://cloud.google.com/cluster-toolkit/docs/quickstarts/slurm-cluster"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;quickstarts&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, and &lt;/span&gt;&lt;a href="https://www.youtube.com/playlist?list=PLIivdWyY5sqK8M7k7fZ_C8ZDaDDlJm-8q" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;videos&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;</description><pubDate>Fri, 02 Aug 2024 16:00:00 +0000</pubDate><guid>https://cloud.google.com/blog/topics/hpc/build-aiml-hpc-clusters-with-cluster-toolkit/</guid><category>AI Hypercomputer</category><category>HPC</category><og xmlns:og="http://ogp.me/ns#"><type>article</type><title>Build large-scale AI/ML and HPC clusters with Cluster Toolkit (formerly HPC Toolkit)</title><description></description><site_name>Google</site_name><url>https://cloud.google.com/blog/topics/hpc/build-aiml-hpc-clusters-with-cluster-toolkit/</url></og><author xmlns:author="http://www.w3.org/2005/Atom"><name>Annie Ma-Weaver</name><title>Group Product Manager, Google Cloud</title><department></department><company></company></author><author xmlns:author="http://www.w3.org/2005/Atom"><name>Shivani Matta</name><title>Software Engineering Manager, Google Cloud</title><department></department><company></company></author></item><item><title>Enhancing the HPC experience with Slurm-GCP v6 and TPU support</title><link>https://cloud.google.com/blog/topics/hpc/slurm-gcp-v6-is-now-ga/</link><description>&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;On Google Cloud, our HPC-optimized infrastructure, including the &lt;/span&gt;&lt;a href="https://cloud.google.com/solutions/ai-hypercomputer?e=48754805"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;AI Hypercomputer&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, can be deployed in multiple ways according to user preferences. 
For customers that want a &lt;/span&gt;&lt;a href="https://slurm.schedmd.com/overview.html" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Slurm&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;-based environment, we recommend using the &lt;/span&gt;&lt;a href="https://cloud.google.com/hpc-toolkit/docs/overview"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Cloud HPC Toolkit&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, a Google product that helps simplify the creation and management of HPC systems for AI/ML and traditional HPC workloads. The Toolkit features our &lt;/span&gt;&lt;a href="https://github.com/GoogleCloudPlatform/slurm-gcp" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Slurm-GCP offering&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, a set of Slurm scripts that helps automate the installation, deployment, and certain operational aspects of Slurm on Google Cloud.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Today we’re excited to announce the general availability of &lt;/span&gt;&lt;a href="https://github.com/GoogleCloudPlatform/hpc-toolkit/tree/main/examples#major-changes-in-from-slurm-gcp-v5-to-v6" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Slurm-GCP v6&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, the latest and recommended version, which will run on Slurm 23.11. This release is the result of our ongoing multi-year collaboration with the engineering experts at &lt;/span&gt;&lt;a href="https://www.schedmd.com/" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;SchedMD&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Slurm-GCP v6 provides the following benefits, compared with v5:&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Faster deployments &lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;A simple cluster, consisting of Slurm infrastructure with a pre-existing VPC and without deploying any file systems in parallel or using autoscaling clusters, now deploys 3x faster than the previous version.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Robust reconfiguration &lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Reconfiguration is the Slurm-GCP mechanism for making changes to a running cluster. This process is now managed by a service that runs on each instance, providing a more consistent experience, and reconfiguration is now enabled by default, making it easier to modify a running cluster.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;More deployments in a single project &lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;We have lifted the restriction on the number of clusters that can be deployed in a single project.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Fewer dependencies in the deployment environment &lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Reconfiguration and compute node cleanup features are now enabled by default and no longer require users to set them up, making it easier to manage Slurm clusters. &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Full support for TPU v3 and v4 &lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;TPU v3 and v4 are now fully supported, allowing TPU and GPU partitions to be configured alongside each other for maximum flexibility in choosing your preferred accelerators.&lt;/span&gt;&lt;/p&gt;
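&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;For example, once a cluster defines both kinds of partitions, jobs can target whichever accelerator fits the workload; the partition names and job scripts below are placeholders that depend on your blueprint:&lt;/span&gt;&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# Submit to a GPU partition and a TPU partition of the same Slurm cluster (placeholder names).
sbatch --partition=gpu --gres=gpu:1 train_model.sh
sbatch --partition=tpu run_maxtext.sh&lt;/code&gt;&lt;/pre&gt;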
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Start using v6 today by navigating to the &lt;/span&gt;&lt;a href="https://github.com/GoogleCloudPlatform/hpc-toolkit/tree/main/examples" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Toolkit blueprint library&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;. These include blueprints like &lt;/span&gt;&lt;a href="https://github.com/GoogleCloudPlatform/hpc-toolkit/tree/main/examples#hpc-slurm6-tpu-maxtextyaml--" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Running the MaxText ML Benchmark on TPUs with Slurm&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, and &lt;/span&gt;&lt;a href="https://github.com/GoogleCloudPlatform/hpc-toolkit/tree/main/examples#hpc-slurm6-apptaineryaml--" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Running Apptainer Containers with Slurm&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;. Blueprints using a prior version of Slurm-gcp will contain “v5” in the name and be supported through November 2024. &lt;/span&gt;&lt;/p&gt;&lt;/div&gt;</description><pubDate>Mon, 10 Jun 2024 17:00:00 +0000</pubDate><guid>https://cloud.google.com/blog/topics/hpc/slurm-gcp-v6-is-now-ga/</guid><category>Compute</category><category>HPC</category><og xmlns:og="http://ogp.me/ns#"><type>article</type><title>Enhancing the HPC experience with Slurm-GCP v6 and TPU support</title><description></description><site_name>Google</site_name><url>https://cloud.google.com/blog/topics/hpc/slurm-gcp-v6-is-now-ga/</url></og><author xmlns:author="http://www.w3.org/2005/Atom"><name>Annie Ma-Weaver</name><title>Group Product Manager, Google Cloud</title><department></department><company></company></author><author xmlns:author="http://www.w3.org/2005/Atom"><name>Nick Stroud</name><title>Tech Lead, Google Cloud HPC</title><department></department><company></company></author></item><item><title>Performing large-scale computation-driven drug discovery on Google Cloud</title><link>https://cloud.google.com/blog/topics/hpc/atommap-builds-elastic-supercomputer-on-google-cloud/</link><description>&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;strong style="font-style: italic; vertical-align: baseline;"&gt;Editor’s note:&lt;/strong&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt; Today we hear from Atommap, a computational drug discovery company that has built an elastic supercomputing cluster on the Google Cloud to empower large-scale, computation-driven drug discovery. Read on to learn more.&lt;/span&gt;&lt;/p&gt;
&lt;hr/&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Bringing a new medicine to patients typically happens in four stages: (1) &lt;/span&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;target identification&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt; that selects the protein target associated with the disease, (2) &lt;/span&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;molecular discovery&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt; that finds the new molecule modulating the function of the target, (3) &lt;/span&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;clinical trial&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt; that tests the candidate drug molecule’s safety and efficacy in patients, and (4) &lt;/span&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;commercialization&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt; that distributes the drug to patients in need. The molecular discovery stage, in which novel drug molecules are invented, involves solving two problems: first, we need to establish an effective mechanism to modulate the target function that maximizes the therapeutic efficacy and minimizes the adverse effect; second, we need to design, select, and make the right drug molecule that faithfully implements the mechanism, is bioavailable, and has acceptable toxicity.&lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;span style="vertical-align: baseline;"&gt;What makes molecular discovery hard?&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;A protein is in constant thermal motion, which changes its shape (conformation) and binding partners (other biomolecules), thus affecting its functions. Structural detail of a protein’s conformational dynamics time and again suggests novel mechanisms of functional modulation. But such information often eludes experimental determination, despite tremendous progress in experimental techniques in recent years.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;The chemical “universe” of all possible distinct small molecules — estimated to number 10&lt;/span&gt;&lt;sup&gt;&lt;span style="vertical-align: baseline;"&gt;&lt;span style="vertical-align: super;"&gt;60&lt;/span&gt;&lt;/span&gt;&lt;/sup&gt;&lt;span style="vertical-align: baseline;"&gt; (&lt;/span&gt;&lt;a href="https://doi.org/10.1039/C0MD00020E" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Reymond et al. 2010&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;) — is vast. Chemists have made probably ten billion so far, so we still have about 10&lt;/span&gt;&lt;sup&gt;&lt;span style="vertical-align: baseline;"&gt;&lt;span style="vertical-align: super;"&gt;60&lt;/span&gt;&lt;/span&gt;&lt;/sup&gt;&lt;span style="vertical-align: baseline;"&gt; to go.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Therein lie the two major challenges of molecular discovery, and its endless opportunity: chances are that we have not yet considered all the mechanisms of action or found the best molecules, so we can always invent a better drug. &lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;span style="vertical-align: baseline;"&gt;Atommap’s computation-driven approach to molecular discovery&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Harnessing the power of high-performance computing, Atommap’s molecular engineering platform enables the discovery of novel drug molecules against previously intractable targets through new mechanisms, making the process faster, cheaper, and more likely to succeed. In past projects, Atommap’s platform has dramatically reduced both the time (by more than half) and cost (by 80%) of molecular discovery. For example, it played a pivotal role in advancing a molecule against a challenging therapeutic target to the clinical trial in 17 months (&lt;/span&gt;&lt;a href="https://clinicaltrials.gov/study/NCT04609579" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;NCT04609579&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;) and it substantially accelerated the discovery of novel molecules that degrade high-valued oncological targets (&lt;/span&gt;&lt;a href="https://doi.org/10.1021/acs.jcim.3c00603" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Mostofian et al. 2023&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;).&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Atommap achieves this by:&lt;/span&gt;&lt;/p&gt;
&lt;ol&gt;
&lt;li aria-level="1" style="list-style-type: decimal; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;Advanced molecular dynamics (MD) simulations&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt; that unveil complex conformational dynamics of the protein target and its interactions with the drug molecules and other biomolecules. They establish the dynamics-function relationship for the target protein, which is instrumental to choosing the best mechanism of action for the drug molecules. &lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: decimal; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;Generative models&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt; that enumerate novel molecules. Beginning with a three-dimensional blueprint of a drug molecule's interaction with its target, our models computationally generate thousands to hundreds of thousands of new virtual molecules, which are designed to form the desired interactions and to satisfy both synthetic feasibility and favorable drug-like properties. &lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: decimal; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;Physics-based, ML-enhanced predictive models&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt; that accurately predict molecular potencies and other properties. Every molecular design is evaluated computationally for its target-binding affinity, its effects on the target, and its drug-likeness. This allows us to explore many times more molecules than can be synthesized and tested in the wet lab, and to perform multiple rounds of designs while waiting for often-lengthy experimental evaluation, leading to compressed timelines and increased probability of success.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;h3&gt;&lt;span style="vertical-align: baseline;"&gt;Computation as a Service and Molecular Discovery as a Service&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;To truly, broadly impact drug discovery, Atommap needs to augment its deep expertise in molecular discovery by partnering with external expertise in the other stages — target identification, clinical trials, and commercialization. We form partnerships in two ways: Computation as a Service (CaaS) and Molecular Discovery as a Service (MDaaS, pronounced &lt;/span&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;Midas&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;), which make it easy and economically attractive for every drug discovery organization to access our computation-driven molecular engineering platform. &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Instead of selling software subscriptions, Atommap’s pay-as-you-go CaaS model lets any discovery project first try our computational tools at a small and affordable scale, without committing too much budget. Not every project is amenable to computational solutions, but most are. This approach allows every drug discovery project to introduce the appropriate computations cheaply and quickly, with demonstrable impact, and then deploy them at scale to amplify their benefits.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;For drug hunters who would like to convert their biological and clinical hypotheses into drug candidates, our MDaaS partnership allows them to quickly identify potent molecules with novel intellectual property for clinical trials. Atommap executes the molecular discovery project from the first molecule (initial hits) to the last molecule (development candidates), freeing our partners to focus on biological and clinical validation. &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;The need for elastic computing&lt;/strong&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-image_full_width"&gt;






  
    &lt;div class="article-module h-c-page"&gt;
      &lt;div class="h-c-grid"&gt;
  

    &lt;figure class="article-image--large
      
      
        h-c-grid__col
        h-c-grid__col--6 h-c-grid__col--offset-3
        
        
      "
      &gt;

      
      
        
        &lt;img
            src="https://storage.googleapis.com/gweb-cloudblog-publish/images/image1_apPqgNo.max-1000x1000.png"
        
          alt="image1"&gt;
        
        &lt;/a&gt;
      
        &lt;figcaption class="article-image__caption "&gt;&lt;p data-block-key="eug34"&gt;Figure 1. Diverse computational tasks in Atommap’s molecular engineering platform require elastic computing resources.&lt;/p&gt;&lt;/figcaption&gt;
      
    &lt;/figure&gt;

  
      &lt;/div&gt;
    &lt;/div&gt;
  




&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;For Atommap, the number of partnership projects and the scale of computation in each project fluctuate over time. In building structural models to enable structure-based drug design, we run hundreds of long-timescale MD simulations on high-performance GPUs to explore the conformational ensembles of proteins and complexes between proteins and small molecules, each of which can last hours to days. Our NetBFE platform for predicting the binding affinities invokes thousands, sometimes tens of thousands, of MD simulations, although each one is relatively short and completes in a few hours. Atommap’s machine learning (ML) models take days to weeks to train on high-memory GPUs, but once trained and deployed in a project, run in seconds to minutes. Balancing the different computational loads associated with different applications poses a challenge to the computing infrastructure. &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;To meet this elastic demand, we chose to supplement our internal computer clusters with &lt;/span&gt;&lt;a href="https://cloud.google.com/"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Google Cloud&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;.&lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;span style="vertical-align: baseline;"&gt;How to build an elastic supercomputer on Google Cloud&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;It took us several steps to move our computing platform from our internal cluster to a hybrid environment that includes Google Cloud.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;&lt;span style="vertical-align: baseline;"&gt;Slurm&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt; &lt;/span&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Many workflows in our platform depended on &lt;/span&gt;&lt;a href="https://slurm.schedmd.com/overview.html" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Slurm&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; for managing the computing jobs. To migrate to Google Cloud, we built a cloud-based Slurm cluster using &lt;/span&gt;&lt;a href="https://cloud.google.com/hpc-toolkit/docs/overview"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Cloud HPC Toolkit&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, an open-source utility developed by Google. Cloud HPC Toolkit is a command line tool that makes it easy to stand up connected and secure cloud HPC systems. With this Slurm cluster up and running in minutes, we quickly put it to use with our Slurm-native tooling to set up computing jobs for our discovery projects.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Cloud HPC Toolkit naturally fits our DevOps function into best practices. We defined our compute clusters as “blueprints” within YAML files that allow us to simply and transparently configure specific details of individual Google Cloud products. The Toolkit transpiles blueprints into input scripts that are executed with Hashicorp’s &lt;/span&gt;&lt;a href="https://www.terraform.io/" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Terraform&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, an industry standard tool for defining “infrastructure-as-code” such that it can be committed, reviewed, and version-controlled. Within the blueprint we also defined our compute machine image through a startup script that’s compatible with Hashicorp’s &lt;/span&gt;&lt;a href="https://www.packer.io/" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Packer&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;. This allowed us to easily “bake in” the software our jobs typically need, such as conda, Docker, and Docker container images that provide dependencies such as &lt;/span&gt;&lt;a href="https://ambermd.org/" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;AMBER&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, &lt;/span&gt;&lt;a href="https://openmm.org/" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;OpenMM&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, and &lt;/span&gt;&lt;a href="https://pytorch.org/" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;PyTorch&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;.&lt;/span&gt;&lt;/p&gt;
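&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;As a rough sketch of that flow (the blueprint file name and project ID are placeholders, and the exact ghpc commands may differ by Toolkit version), a blueprint is expanded into a Terraform deployment and then deployed:&lt;/span&gt;&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# Expand a blueprint into a Terraform deployment folder, then deploy it (placeholder names).
./ghpc create ./my-slurm-cluster.yaml --vars project_id=my-gcp-project
./ghpc deploy my-slurm-cluster&lt;/code&gt;&lt;/pre&gt;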
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;The deployed Slurm cloud system is as accessible and user-friendly as any Slurm system we have used before. The compute nodes are not deployed until requested and are spun down when finished, thus we only pay for what we use; the only persistent nodes are the head and controller nodes that we log into and deploy from.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Batch&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Compared to Slurm, the cloud-native Google &lt;/span&gt;&lt;a href="https://cloud.google.com/batch"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Batch&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; gives us even greater flexibility in accessing the computing resources. Batch is a managed cloud job-scheduling service, meaning it can be used to schedule cloud resources for long-running scientific computing jobs. Virtual machines that Batch spins up can easily mount either NFS stores or &lt;/span&gt;&lt;a href="https://cloud.google.com/storage/docs/json_api/v1/buckets"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Google Cloud Storage buckets&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, the latter of which are particularly suitable for holding our multi-gigabyte MD trajectories and thus useful as output directories for our long-running simulations.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Running our workflows on Google Cloud through Batch involves two steps: 1) copying the input files to Google Cloud Storage, and 2) submitting the Batch job.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-code"&gt;&lt;dl&gt;
    &lt;dt&gt;code_block&lt;/dt&gt;
    &lt;dd&gt;&amp;lt;ListValue: [StructValue([(&amp;#x27;code&amp;#x27;, &amp;#x27;$ gcloud storage cp -R ./local_input_dir gs://my-gcs-bucket/work_dir\r\n$ gcloud batch jobs submit example-job --config=./job_cfg.json --location=us-central1&amp;#x27;), (&amp;#x27;language&amp;#x27;, &amp;#x27;&amp;#x27;), (&amp;#x27;caption&amp;#x27;, &amp;lt;wagtail.rich_text.RichText object at 0x7f1757ee0f40&amp;gt;)])]&amp;gt;&lt;/dd&gt;
&lt;/dl&gt;&lt;/div&gt;
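&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;The actual job_cfg.json is not reproduced here; the following is only a rough sketch of what a minimal Batch job config of this shape might look like, with a placeholder machine type, bucket path, and script, using field names from the public Batch API.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-code"&gt;&lt;pre&gt;&lt;code&gt;# Rough sketch of a minimal job_cfg.json for Google Cloud Batch (placeholder values).
cat &gt; job_cfg.json &lt;&lt;'EOF'
{
  "taskGroups": [{
    "taskCount": 1,
    "taskSpec": {
      "runnables": [{ "script": { "text": "bash /mnt/share/run_simulation.sh" } }],
      "volumes": [{
        "gcs": { "remotePath": "my-gcs-bucket/work_dir" },
        "mountPath": "/mnt/share"
      }]
    }
  }],
  "allocationPolicy": {
    "instances": [{ "policy": { "machineType": "c2-standard-16" } }]
  },
  "logsPolicy": { "destination": "CLOUD_LOGGING" }
}
EOF&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;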
&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;SURF&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;A common pattern has emerged in most of our computational workflows. First, each job has a set of complex input files, including the sequence and structures of the target protein, a list of small molecules and their valence and three-dimensional structures, and the simulation and model parameters. Second, most computing jobs take hours to days to finish even on the highest-performance machines. Third, the computing jobs produce output datasets of substantial volume that are subject to a variety of analyses.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Accordingly, we have recently developed a new computing infrastructure, SURF (submit, upload, run, fetch), which seamlessly integrates our internal cluster and Google Cloud through one simple interface and automatically brings the data to where it is needed by computation or the computation to where the data resides. &lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-code"&gt;&lt;dl&gt;
    &lt;dt&gt;code_block&lt;/dt&gt;
    &lt;dd&gt;&amp;lt;ListValue: [StructValue([(&amp;#x27;code&amp;#x27;, &amp;#x27;$ am-surf submit my_job/ --num-cpus 16 --num-gpus 8 --where [gcp|internal]&amp;#x27;), (&amp;#x27;language&amp;#x27;, &amp;#x27;&amp;#x27;), (&amp;#x27;caption&amp;#x27;, &amp;lt;wagtail.rich_text.RichText object at 0x7f1757ee0820&amp;gt;)])]&amp;gt;&lt;/dd&gt;
&lt;/dl&gt;&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;SURF submits jobs to Google Cloud Batch using Google’s Python API.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;We now have an elastic supercomputer on the cloud that gives us massive computing power when we need it. It empowers us to explore the vast chemical space at an unprecedented scale and to invent molecules that better human health and life.&lt;/span&gt;&lt;/p&gt;
&lt;hr/&gt;
&lt;p&gt;&lt;sup&gt;&lt;em&gt;&lt;span style="vertical-align: baseline;"&gt;Andrew Sabol supported us from the very beginning of Atommap, even before we knew whether we could afford the computing bills. Without the guidance and technical support of Vincent Beltrani, Mike Sabol, and other Google colleagues, we could not have rebuilt our computing platform on Google Cloud in such a short time. Our discovery partners put their trust in our young company and our burgeoning platform; their collaborations helped us validate our platform in real discovery projects and substantially improve its throughput, robustness, and predictive accuracy. &lt;/span&gt;&lt;/em&gt;&lt;/sup&gt;&lt;/p&gt;&lt;/div&gt;</description><pubDate>Mon, 13 May 2024 16:00:00 +0000</pubDate><guid>https://cloud.google.com/blog/topics/hpc/atommap-builds-elastic-supercomputer-on-google-cloud/</guid><category>Customers</category><category>HPC</category><og xmlns:og="http://ogp.me/ns#"><type>article</type><title>Performing large-scale computation-driven drug discovery on Google Cloud</title><description></description><site_name>Google</site_name><url>https://cloud.google.com/blog/topics/hpc/atommap-builds-elastic-supercomputer-on-google-cloud/</url></og><author xmlns:author="http://www.w3.org/2005/Atom"><name>Huafeng Xu</name><title>CEO, Atommap</title><department></department><company></company></author><author xmlns:author="http://www.w3.org/2005/Atom"><name>Christopher Ryan</name><title>Director of Machine Learning and Data Sciences, Atommap</title><department></department><company></company></author></item></channel></rss>