<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:media="http://search.yahoo.com/mrss/"><channel><title>Data Analytics</title><link>https://cloud.google.com/blog/products/data-analytics/</link><description>Data Analytics</description><atom:link href="https://cloudblog.withgoogle.com/blog/products/data-analytics/rss/" rel="self"></atom:link><language>en</language><lastBuildDate>Thu, 04 Jun 2026 16:00:03 +0000</lastBuildDate><image><url>https://cloud.google.com/blog/products/data-analytics/static/blog/images/google.a51985becaa6.png</url><title>Data Analytics</title><link>https://cloud.google.com/blog/products/data-analytics/</link></image><item><title>What's new for Managed Service for Apache Spark clusters</title><link>https://cloud.google.com/blog/products/data-analytics/enhancements-to-managed-service-for-apache-spark-clusters/</link><description>&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;At Google Cloud, our goal is to let you run large-scale analytical and data science workloads with maximum efficiency so you can process big data pipelines, machine learning, and ETL tasks. &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;We recently announced that the Dataproc service is now &lt;/span&gt;&lt;a href="https://cloud.google.com/products/managed-service-for-apache-spark"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Managed Service for Apache Spark&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, reflecting our deep integration with the &lt;/span&gt;&lt;a href="https://cloud.google.com/data-cloud"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Agentic Data Cloud&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;To support the diverse architectural needs of today’s modern data teams, we offer the service in two distinct deployment modes: serverless and managed clusters. The serverless deployment mode completely abstracts infrastructure management for ephemeral or ad-hoc jobs, while the managed clusters deployment mode is designed for teams that require fine-grained infrastructure customization, persistent environments, long-running stateful processing, or native integration with custom Compute Engine hardware configurations.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;When it comes to managed cluster deployments, we’ve re-imagined the experience from the ground up, focusing on three core pillars: making Spark &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;faster&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; by supercharging execution speeds, &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;easier&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; to run by maximizing resource obtainability and reducing operational overhead, and &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;smarter&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; by embedding AI directly into the development and operational lifecycle. &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;This blog post focuses specifically on what we announced at Google Cloud Next ‘26 for the Managed Spark clusters deployment mode: providing enhanced flexibility to fine-tune performance and cost through native execution engine, smarter scaling policies, and Gemini-powered extensions. For the latest of the serverless deployment mode, check out &lt;/span&gt;&lt;a href="https://cloud.google.com/blog/products/data-analytics/serverless-managed-service-for-apache-spark-runtime-3-0-features?e=48754805"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;this blog&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;.&lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Faster, with the Lightning Engine native execution engine&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Arguably the biggest update for Managed Spark clusters is &lt;/span&gt;&lt;a href="https://cloud.google.com/dataproc/docs/guides/lightning-engine"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Lightning Engine&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, which introduces massive performance gains for Spark DataFrame/Dataset APIs and heavy Spark SQL queries. Powered by a native, C++ vectorized execution engine built on Velox and Gluten, with specialized internal enhancements, Lightning Engine bypasses JVM execution bottlenecks by compiling query plans into native instructions optimized for SIMD (Single Instruction, Multiple Data) vectorization.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;This native execution engine delivers:&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Up to 4.9x faster performance&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; than standard open-source Spark&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;up to 2x the price-performance &lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;over the leading high-speed Spark alternative&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Crucially, taking advantage of these performance gains doesn’t require any code changes to your existing Spark applications. Because your jobs complete faster, you directly reduce your aggregate Compute Engine runtime hours and overall spend.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;To enable Lightning Engine on your managed clusters, simply specify the Lightning Engine option when you’re creating a cluster.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-video"&gt;



&lt;div class="article-module article-video "&gt;
  &lt;figure&gt;
    &lt;a class="h-c-video h-c-video--marquee"
      href="https://youtube.com/watch?v=2uYC821jtEk"
      data-glue-modal-trigger="uni-modal-2uYC821jtEk-"
      data-glue-modal-disabled-on-mobile="true"&gt;

      
        

        &lt;div class="article-video__aspect-image"
          style="background-image: url(https://storage.googleapis.com/gweb-cloudblog-publish/images/maxresdefault_u5e7XRu.max-1000x1000.jpg);"&gt;
          &lt;span class="h-u-visually-hidden"&gt;The new way to use Spark: Intelligent, automated, and lightning fast&lt;/span&gt;
        &lt;/div&gt;
      
      &lt;svg role="img" class="h-c-video__play h-c-icon h-c-icon--color-white"&gt;
        &lt;use xlink:href="#mi-youtube-icon"&gt;&lt;/use&gt;
      &lt;/svg&gt;
    &lt;/a&gt;

    
      &lt;figcaption class="article-video__caption h-c-page"&gt;
        
          &lt;h4 class="h-c-headline h-c-headline--four h-u-font-weight-medium h-u-mt-std"&gt;Learn technical details and hear Lowe’s experience with Lightning Engine&lt;/h4&gt;
        
        
      &lt;/figcaption&gt;
    
  &lt;/figure&gt;
&lt;/div&gt;

&lt;div class="h-c-modal--video"
     data-glue-modal="uni-modal-2uYC821jtEk-"
     data-glue-modal-close-label="Close Dialog"&gt;
   &lt;a class="glue-yt-video"
      data-glue-yt-video-autoplay="true"
      data-glue-yt-video-height="99%"
      data-glue-yt-video-vid="2uYC821jtEk"
      data-glue-yt-video-width="100%"
      href="https://youtube.com/watch?v=2uYC821jtEk"
      ng-cloak&gt;
   &lt;/a&gt;
&lt;/div&gt;

&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Easier: Maximize resource obtainability via Flexible VMs&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Temporary localized shortages of a specific machine type can stall cluster creation or interrupt autoscaling. To dramatically improve cluster resilience against capacity constraints, &lt;/span&gt;&lt;a href="https://docs.cloud.google.com/dataproc/docs/concepts/configuring-clusters/flexible-vms"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Flexible VMs&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; for Managed Spark clusters are now generally available. &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Flexible VMs allow you to define up to ten ranked machine types for your master, primary, and secondary worker nodes. Managed Service for Apache Spark pairs this preference with automated regional zone placement, dynamically scanning the entire region to fulfill your capacity requests using the best available hardware layout. This helps ensure your pipelines spin up predictably, drastically reducing resource availability errors, and maximizing your ability to capture cost-effective Spot VM capacity during periods of peak demand.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-image_full_width"&gt;






  
    &lt;div class="article-module h-c-page"&gt;
      &lt;div class="h-c-grid"&gt;
  

    &lt;figure class="article-image--large
      
      
        h-c-grid__col
        h-c-grid__col--6 h-c-grid__col--offset-3
        
        
      "
      &gt;

      
      
        
        &lt;img
            src="https://storage.googleapis.com/gweb-cloudblog-publish/images/2_vPfgVT7.max-1000x1000.jpg"
        
          alt="2"&gt;
        
      
    &lt;/figure&gt;

  
      &lt;/div&gt;
    &lt;/div&gt;
  




&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Easier: Zero-scale clusters and scheduled stops&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;To give you better fiscal control over persistent and developmental environments, we recently announced the general availability of two highly requested FinOps features: &lt;/span&gt;&lt;a href="https://docs.cloud.google.com/dataproc/docs/guides/create-zero-scale-cluster"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;zero-scale clusters&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; and &lt;/span&gt;&lt;a href="https://docs.cloud.google.com/dataproc/docs/concepts/configuring-clusters/scheduled-stop"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;cluster scheduled stops&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;.&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Zero-scale clusters&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;: You can now provision environments that use exclusively secondary workers (Spot VMs), enabling the cluster to automatically scale down to absolutely zero worker nodes when no processing is active, leaving only the master node online to preserve metadata.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Cluster scheduled stops&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;: This feature lets you configure automated cluster shutdown policies based on specific idle-time limits or a precise future timestamp.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Because these features are natively integrated, they reduce the operational friction of having to delete and reconstruct your environment, while you can stop paying for idle compute overhead during nights and weekends.&lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Smarter: Managed Service for Apache Spark MCP Server&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;To bridge the gap between generative AI and data engineering, we launched the &lt;/span&gt;&lt;a href="https://docs.cloud.google.com/dataproc/docs/guides/use-dataproc-mcp"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Model Context Protocol (MCP) server for Managed Service for Apache Spark&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;. This open-standard integration allows LLMs and AI assistants to securely and dynamically interact with your Managed Spark clusters using natural language.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;By utilizing the MCP server, your AI agents can securely connect to your data platform under existing IAM permissions. This allows agents to perform cluster-based operations, such as creating a cluster, submitting a job, or adjusting an autoscaling policy, directly from your AI application. &lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Smarter: Accelerating AI with the Data Agent Kit&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;The &lt;/span&gt;&lt;a href="https://docs.cloud.google.com/data-cloud-extension"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Google Cloud Data Agent Kit&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; extension allows data scientists, engineers, and developers to manage their entire data workload lifecycle directly within their preferred development environment. We rolled out native support for this extension on Managed Spark clusters, enabling teams to seamlessly build and deploy specialized Data Agents for code generation and data wrangling.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-image_full_width"&gt;






  
    &lt;div class="article-module h-c-page"&gt;
      &lt;div class="h-c-grid"&gt;
  

    &lt;figure class="article-image--large
      
      
        h-c-grid__col
        h-c-grid__col--6 h-c-grid__col--offset-3
        
        
      "
      &gt;

      
      
        
        &lt;img
            src="https://storage.googleapis.com/gweb-cloudblog-publish/images/3_nOOSIdE.max-1000x1000.jpg"
        
          alt="3"&gt;
        
      
    &lt;/figure&gt;

  
      &lt;/div&gt;
    &lt;/div&gt;
  




&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Developers can choose to use &lt;/span&gt;&lt;a href="https://antigravity.google/blog/introducing-google-antigravity-2-0" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Antigravity 2.0&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, Google's standalone, agentic development platform or bring these agentic capabilities into their preferred IDE including VS Code, Claude Code, or Codex via the Data Agent Kit extensions and plugins. &lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;By pairing this streamlined workflow with the raw processing power of managed clusters, these intelligent agents can securely execute complex workflows directly over petabyte-scale data lakes. Specifically, the Data Agent Kit enables developers to:&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Build and orchestrate pipelines:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Author multi-node data pipelines and generate comprehensive code documentation using natural language.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Perform real-time debugging: &lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;Leverage Gemini Cloud Assist to sift through executor logs, pinpoint root causes of job failures, and recommend actionable fixes.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Easily connect to Spark resources: &lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;Instantly attach to serverless Spark runtimes or managed clusters without manual network configuration or local Spark installations.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Streamline Git and CI/CD management:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Commit, merge, and deploy code directly from your IDE of choice, triggering automated testing and deployment pipelines without friction.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Smarter: Next-generation Lakehouse &lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;We recently launched &lt;/span&gt;&lt;a href="https://docs.cloud.google.com/lakehouse/docs/introduction"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Lakehouse&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, which delivers read/write interoperability between engines like Managed Service for Apache Spark and BigQuery. By leveraging the &lt;/span&gt;&lt;a href="https://docs.cloud.google.com/lakehouse/docs/about-lakehouse-catalogs"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Lakehouse runtime catalog&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; as a unified, serverless metadata layer, it removes data silos and the need for complex translation layers. This agentic-first approach allows organizations to process open formats directly from Google Cloud Storage, or even query remote AWS datasets using the newly introduced &lt;/span&gt;&lt;a href="https://docs.cloud.google.com/lakehouse/docs/about-cross-cloud-lakehouse"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;cross-cloud Lakehouse&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, all while maintaining a single source of truth for security and governance.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;For customers utilizing Managed Spark clusters, this integration unlocks several powerful new capabilities. Data teams can now accelerate their most demanding ETL and data science workloads by up to 4.9x using the optimized Lightning Engine.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-image_full_width"&gt;






  
    &lt;div class="article-module h-c-page"&gt;
      &lt;div class="h-c-grid"&gt;
  

    &lt;figure class="article-image--large
      
      
        h-c-grid__col
        h-c-grid__col--6 h-c-grid__col--offset-3
        
        
      "
      &gt;

      
      
        
        &lt;img
            src="https://storage.googleapis.com/gweb-cloudblog-publish/images/4_ywa0kAz.max-1000x1000.png"
        
          alt="4"&gt;
        
      
    &lt;/figure&gt;

  
      &lt;/div&gt;
    &lt;/div&gt;
  




&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Next-gen runtimes: Cluster Image 3.0 with Spark 4.1&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Keeping pace with the open-source ecosystem, we rolled out &lt;/span&gt;&lt;a href="https://docs.cloud.google.com/dataproc/docs/release-notes#May_03_2026"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Cluster Image 3.0&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; in preview, built with Apache Spark 4.1 and that features an upgraded default Java runtime, Java 21. Spark 4.1 introduces a set of core open-source capabilities, including real-time mode for structured streaming. This enables your Spark environment to support real-time streaming with continuous, sub-second latency processing.&lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Get started today&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;These updates are live and ready to use today in Managed Spark clusters! You can enable these new features directly through the Google Cloud console or via the &lt;/span&gt;&lt;code style="vertical-align: baseline;"&gt;gcloud&lt;/code&gt;&lt;span style="vertical-align: baseline;"&gt; CLI.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;To spin up a new Managed Cluster and natively unlocking the performance of &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;Lightning Engine,&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; run the following command in your terminal:&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-code"&gt;&lt;dl&gt;
    &lt;dt&gt;code_block&lt;/dt&gt;
    &lt;dd&gt;&amp;lt;ListValue: [StructValue([(&amp;#x27;code&amp;#x27;, &amp;#x27;gcloud dataproc clusters create my-optimized-cluster \\\r\n    --region=us-central1 \\\r\n    --image-version=2.3 \\\r\n    --engine=lightning \\&amp;#x27;), (&amp;#x27;language&amp;#x27;, &amp;#x27;&amp;#x27;), (&amp;#x27;caption&amp;#x27;, &amp;lt;wagtail.rich_text.RichText object at 0x7efea0512250&amp;gt;)])]&amp;gt;&lt;/dd&gt;
&lt;/dl&gt;&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Alternatively, navigate to the &lt;/span&gt;&lt;a href="https://console.cloud.google.com/dataproc"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Managed Service for Apache Spark page in the console&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, click Create cluster, and select ‘Enable Lightning Engine’ under the cluster configuration settings to automatically activate Lightning Engine for your Spark jobs. &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;We look forward to hearing about the environments you build and run as Managed Service for Apache Spark clusters!&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;</description><pubDate>Thu, 04 Jun 2026 16:00:00 +0000</pubDate><guid>https://cloud.google.com/blog/products/data-analytics/enhancements-to-managed-service-for-apache-spark-clusters/</guid><category>AI &amp; Machine Learning</category><category>Streaming</category><category>Open Source</category><category>Data Analytics</category><og xmlns:og="http://ogp.me/ns#"><type>article</type><title>What's new for Managed Service for Apache Spark clusters</title><description></description><site_name>Google</site_name><url>https://cloud.google.com/blog/products/data-analytics/enhancements-to-managed-service-for-apache-spark-clusters/</url></og><author xmlns:author="http://www.w3.org/2005/Atom"><name>Qiqi Wu</name><title>Senior Product Manager, Google Cloud</title><department></department><company></company></author></item><item><title>What’s new with Google Data Cloud</title><link>https://cloud.google.com/blog/products/data-analytics/whats-new-with-google-data-cloud/</link><description>&lt;div class="block-paragraph_advanced"&gt;&lt;h3&gt;&lt;span style="vertical-align: baseline;"&gt;&lt;span style="vertical-align: baseline;"&gt;June 1 - June 5&lt;/span&gt;&lt;/span&gt;&lt;/h3&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Beyond the Query: Powering AI Agents with Bigtable, Firestore &amp;amp; Memorystore &lt;br/&gt;&lt;/strong&gt;&lt;span style="font-family: -apple-system, BlinkMacSystemFont, 'Segoe UI', Roboto, Oxygen, Ubuntu, Cantarell, 'Open Sans', 'Helvetica Neue', sans-serif;"&gt;Discover the latest advancements in Google Cloud's NoSQL Database portfolio, including Bigtable, Firestore, and Memorystore. This series is designed for a broad audience: whether you are exploring these databases for the first time or are an existing user looking to leverage the new capabilities announced at Next '26. &lt;br/&gt;&lt;br/&gt;&lt;/span&gt;&lt;a href="https://rsvp.withgoogle.com/events/beyond-the-query-powering-ai-agents-with-bigtable-firestore-memorystore" rel="noopener" style="font-family: -apple-system, BlinkMacSystemFont, 'Segoe UI', Roboto, Oxygen, Ubuntu, Cantarell, 'Open Sans', 'Helvetica Neue', sans-serif;" target="_blank"&gt;&lt;strong style="text-decoration: underline; vertical-align: baseline;"&gt;Register here to secure your spot!&lt;/strong&gt;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Cloud Engineer's AI Toolkit Workshops: &lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;Solve data-driven challenges with &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;BigQuery, AlloyDB&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;, &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;Gemini&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; and more. &lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;Hosted by Google Cloud Labs, this highly technical event is built specifically for Platform Engineers, SREs, and cloud infrastructure teams ready to bridge the gap between AI prototypes and production-grade deployments. Look out for more locations coming soon&lt;br/&gt;&lt;br/&gt;&lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;Toronto&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; - June 25 (Data Cloud) | &lt;/span&gt;&lt;a href="https://rsvp.withgoogle.com/events/google-cloud-labs-data-cloud-toronto" rel="noopener" style="font-family: -apple-system, BlinkMacSystemFont, 'Segoe UI', Roboto, Oxygen, Ubuntu, Cantarell, 'Open Sans', 'Helvetica Neue', sans-serif;" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;RSVP Here&lt;/span&gt;&lt;/a&gt;&lt;br/&gt;&lt;strong style="vertical-align: baseline;"&gt;Chicago&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; - June 30 (Data Cloud) | &lt;/span&gt;&lt;a href="https://rsvp.withgoogle.com/events/google-cloud-labs-data-cloud-chicago" rel="noopener" style="font-family: -apple-system, BlinkMacSystemFont, 'Segoe UI', Roboto, Oxygen, Ubuntu, Cantarell, 'Open Sans', 'Helvetica Neue', sans-serif;" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;RSVP Here&lt;/span&gt;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;&lt;span style="vertical-align: baseline;"&gt;&lt;span style="vertical-align: baseline;"&gt;&lt;strong style="vertical-align: baseline;"&gt;Start a 10-day &lt;/strong&gt;&lt;a href="https://cloud.google.com/bigtable"&gt;&lt;strong style="text-decoration: underline; vertical-align: baseline;"&gt;Bigtable&lt;/strong&gt;&lt;/a&gt;&lt;strong style="vertical-align: baseline;"&gt; free trial with a 1 node SSD cluster and up to 500GB of storage capacity. &lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;W&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;ith no credit card required to start, you can easily ingest workloads and manage workloads that require low-latency, high-throughput, and predictable access. &lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;Plus, new Google Cloud customers get &lt;/span&gt;&lt;a href="https://docs.cloud.google.com/sql/docs/mysql/create-free-trial-instance"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;$300 in free credits&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; on signup.&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;&lt;span style="vertical-align: baseline;"&gt;&lt;span style="vertical-align: baseline;"&gt;May 11 - May 15&lt;/span&gt;&lt;/span&gt;&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;span style="vertical-align: baseline;"&gt;&lt;span style="vertical-align: baseline;"&gt;&lt;strong style="vertical-align: baseline;"&gt;Managed Service for Apache Airflow&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; has launched a wave of new features, including the general availability of Airflow 3.1, AI-powered agentic troubleshooting, a new managed Airflow MCP Server for custom agent integration, and declarative YAML-based orchestration pipelines—discover all the details in the&lt;/span&gt;&lt;a href="https://cloud.google.com/blog/products/data-analytics/managed-apache-airflow-scaling-data-and-ai-workloads"&gt;&lt;span style="vertical-align: baseline;"&gt; &lt;/span&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;full blog post&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;.&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;&lt;span style="vertical-align: baseline;"&gt;April 20 - April 24&lt;/span&gt;&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;span style="vertical-align: baseline;"&gt;&lt;strong style="vertical-align: baseline;"&gt;Google-built ODBC Driver for BigQuery is now available in Preview&lt;br/&gt;&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;We are excited to announce the launch of the new, Google-built ODBC driver for BigQuery. This new open-source driver provides a direct, high-performance connection for applications to BigQuery and is developed entirely in-house by Google. &lt;/span&gt;&lt;a href="https://docs.cloud.google.com/bigquery/docs/odbc-for-bigquery"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Download a new driver and connect your application to BigQuery&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;.&lt;/span&gt;&lt;/span&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;&lt;span style="vertical-align: baseline;"&gt;April 13 - April 17&lt;/span&gt;&lt;/h3&gt;
&lt;ul&gt;
&lt;li role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;We announced &lt;/span&gt;&lt;a href="https://cloud.google.com/blog/products/data-analytics/looker-studio-is-data-studio"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;we are reintroducing Data Studio&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; to play a significant role in the AI era, expanding from data visualizations and reports to host BigQuery conversational agents and data apps built in Colab notebooks.&lt;/span&gt;&lt;/li&gt;
&lt;li&gt;&lt;span style="vertical-align: baseline;"&gt;&lt;span style="vertical-align: baseline;"&gt;We announced &lt;/span&gt;&lt;a href="https://cloud.google.com/blog/products/data-analytics/introducing-bigquery-graph"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;BigQuery Graph is now available in preview&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, offering an easy-to-use, highly scalable graph analytics solution, empowering data professionals to model, analyze and visualize massive-scale relationships in an entirely new way. &lt;/span&gt;&lt;/span&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;&lt;span style="vertical-align: baseline;"&gt;April 6 - April 10&lt;/span&gt;&lt;/h3&gt;
&lt;ul&gt;
&lt;li role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;We introduced &lt;/span&gt;&lt;a href="https://cloud.google.com/blog/products/business-intelligence/looker-embedded-adds-conversational-analytics"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Conversational Analytics for Looker Embedded environments&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, enabling users to add natural language experiences to their own custom data-driven applications, powered by Gemini. &lt;/span&gt;&lt;/li&gt;
&lt;li&gt;&lt;span style="vertical-align: baseline;"&gt;We expanded Looker’s capabilities for faster ad-hoc analysis, with the &lt;/span&gt;&lt;a href="https://cloud.google.com/blog/products/business-intelligence/looker-self-service-explores"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;introduction of self-service Explores&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, enabling you to bring your own data to Looker’s semantic layer and gain instant access to insights in a governed data environment.&lt;/span&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;&lt;span style="vertical-align: baseline;"&gt;March 23 - March 27&lt;/span&gt;&lt;/h3&gt;
&lt;ul&gt;
&lt;li role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;We showed you how to &lt;/span&gt;&lt;a href="https://cloud.google.com/blog/products/databases/cloudsql-read-pools-support-autoscaling"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;scale your reads with Cloud SQL autoscaling read pools&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;. This feature lets you provision multiple read replicas behind a single read endpoint and dynamically adjust your read capacity based on real-time application needs.&lt;/span&gt;&lt;/li&gt;
&lt;li role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;Our customers are leveraging the full power of Conversational Analytics and Looker to drive major business and technical breakthroughs in the AI era. Companies like &lt;/span&gt;&lt;a href="https://cloud.google.com/customers/telenor-looker"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Telenor&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, &lt;/span&gt;&lt;a href="https://cloud.google.com/customers/petcircle-looker"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Pet Circle&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, &lt;/span&gt;&lt;a href="https://cloud.google.com/customers/fluent-commerce"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Fluent Commerce&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, &lt;/span&gt;&lt;a href="https://cloud.google.com/customers/lighthouse"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Lighthouse Intelligence&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, &lt;/span&gt;&lt;a href="https://cloud.google.com/customers/wego"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Wego&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, and &lt;/span&gt;&lt;a href="https://cloud.google.com/customers/roller"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;ROLLER&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; are turning data into insights and actions, grounded in Looker’s semantic layer.&lt;/span&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;&lt;span style="vertical-align: baseline;"&gt;March 16 - March 20&lt;/span&gt;&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;span style="vertical-align: baseline;"&gt;We introduced &lt;/span&gt;&lt;a href="https://cloud.google.com/blog/products/data-analytics/gemini-supercharges-the-bigquery-studio-assistant"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;an enhanced Gemini assistant in BigQuery Studio&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, which evolves the assistant from a code helper into a fully context-aware analytics partner.&lt;/span&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;&lt;span style="vertical-align: baseline;"&gt;February 23 - February 27&lt;/span&gt;&lt;/h3&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;We introduced &lt;/span&gt;&lt;a href="https://cloud.google.com/blog/products/databases/managed-mcp-servers-for-google-cloud-databases"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;managed and remote MCP support for Google Cloud databases&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, including AlloyDB, Spanner, Cloud SQL, Bigtable, and Firestore, to power the next generation of agents. Through the Model Context Protocol (MCP), AI models can connect to the databases our customers rely on every day, extending their ability to plan, build, and solve complex problems.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;&lt;span style="vertical-align: baseline;"&gt;We outlined how to &lt;/span&gt;&lt;a href="https://cloud.google.com/blog/products/data-analytics/build-data-agents-with-conversational-analytics-api"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;build a conversational agent in BigQuery using the Conversational Analytics API&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;: a context-aware agent that understands natural language, queries your BigQuery data, and delivers answers as text, tables, and visual charts.&lt;/span&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;&lt;span style="vertical-align: baseline;"&gt;February 16 - February 20&lt;/span&gt;&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;span style="vertical-align: baseline;"&gt;Our customers are leveraging the full power of Looker to drive major business and technical breakthroughs. Companies like &lt;/span&gt;&lt;a href="https://cloud.google.com/customers/arrive"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Arrive&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, &lt;/span&gt;&lt;a href="https://cloud.google.com/customers/audika"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Audika&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, &lt;/span&gt;&lt;a href="https://cloud.google.com/customers/looker-carousell"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Carousell&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, &lt;/span&gt;&lt;a href="https://cloud.google.com/customers/framebridge"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Framebridge&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, &lt;/span&gt;&lt;a href="https://cloud.google.com/customers/gumgum"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;GumGum&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, &lt;/span&gt;&lt;a href="https://cloud.google.com/customers/intel-looker"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Intel&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, &lt;/span&gt;&lt;a href="https://cloud.google.com/customers/overdose-digital"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Overdose Digital&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, &lt;/span&gt;&lt;a href="https://cloud.google.com/customers/one-looker"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Ocean Network Express&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, &lt;/span&gt;&lt;a href="https://cloud.google.com/customers/subskribe"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Subskribe&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, and &lt;/span&gt;&lt;a href="https://cloud.google.com/customers/promevo-looker"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Promevo&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; are using Looker’s newest AI-driven capabilities, including Conversational Analytics, to turn data into insights and actions and to give their entire organizations a single source of truth, powered by Looker’s semantic layer.&lt;/span&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;&lt;span style="vertical-align: baseline;"&gt;February 2 - February 6&lt;/span&gt;&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;span style="vertical-align: baseline;"&gt;Join us on March 4 for our webinar, Win Your AI Strategy with Cloud SQL Enterprise Plus, to learn how to power your generative AI workloads with 3x higher performance and 99.99% availability. &lt;/span&gt;&lt;a href="https://rsvp.withgoogle.com/events/win-your-ai-strategy-with-cloud-sql-enterprise-plus" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Register today&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; to discover how to build a scalable, enterprise-grade foundation for your most demanding AI applications.&lt;/span&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;&lt;span style="vertical-align: baseline;"&gt;January 26 - January 30&lt;/span&gt;&lt;/h3&gt;
&lt;ul&gt;
&lt;li role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;We introduced &lt;/span&gt;&lt;a href="https://cloud.google.com/blog/products/data-analytics/introducing-conversational-analytics-in-bigquery"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Conversational Analytics in BigQuery&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, which lets users analyze data using natural language. It is an intelligent agent that generates, executes, and visualizes answers grounded in your business context directly in BigQuery Studio, giving data professionals a conversational way to reach insights.&lt;/span&gt;&lt;/li&gt;
&lt;li role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;We outlined how &lt;/span&gt;&lt;a href="https://cloud.google.com/transform/from-asset-to-action-how-data-products-have-become-the-foundation-for-ai-agents"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;data products have become the foundation for AI agents&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, providing the context needed to make autonomous agents reliable and trusted for real business use, backed by organized business logic and semantic understanding.&lt;/span&gt;&lt;/li&gt;
&lt;li&gt;&lt;span style="vertical-align: baseline;"&gt;We highlighted how &lt;/span&gt;&lt;a href="https://cloud.google.com/use-cases/data-analytics-agents"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;you can supercharge data analytics workflows&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; and outlined Google Cloud’s AI agent offerings for data engineering, data science, and development tools, so you can integrate agentic workflows into your applications, empower your teams, and speed discovery.&lt;/span&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;&lt;span style="vertical-align: baseline;"&gt;January 19 - January 23&lt;/span&gt;&lt;/h3&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;We have fundamentally reimagined &lt;/span&gt;&lt;a href="https://cloud.google.com/blog/products/data-analytics/new-firestore-query-engine-enables-pipelines"&gt;&lt;strong style="text-decoration: underline; vertical-align: baseline;"&gt;Firestore with pipeline operations for Enterprise edition&lt;/strong&gt;&lt;/a&gt;&lt;strong style="vertical-align: baseline;"&gt;.&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; A powerful new query engine brings over a hundred new query features, index-less queries, new index types, and observability tooling to help you improve query performance. You can migrate with built-in tools while keeping Firestore’s differentiated serverless foundation, virtually unlimited scale, and industry-leading SLA. Join a community of 600K developers building expressive applications with rich querying, real-time listeners, robust offline caching, and AI-assisted coding integrations.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;a href="https://www.mssqltips.com/sqlservertip/11578/introducing-google-cloud-sql/" rel="noopener" target="_blank"&gt;&lt;strong style="text-decoration: underline; vertical-align: baseline;"&gt;Introducing Google Cloud SQL on MSSQLTips&lt;/strong&gt;&lt;/a&gt;&lt;strong style="vertical-align: baseline;"&gt;:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; We are highlighting "Introducing Google Cloud SQL," a new technical guide published on MSSQLTips for SQL Server administrators and developers exploring Google Cloud's fully managed database service. It gives a detailed overview of Cloud SQL capabilities, including high availability, security integration, and the migration of on-premises SQL Server workloads to the cloud, which makes it especially useful for teams planning their migration strategy.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;&lt;span style="vertical-align: baseline;"&gt;&lt;span style="vertical-align: baseline;"&gt;We are excited to announce the &lt;/span&gt;&lt;strong&gt;&lt;a href="https://medium.com/google-cloud/bridging-the-identity-gap-microsoft-entra-id-integration-with-cloud-sql-for-sql-server-a30207d63035" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Public Preview of Microsoft Entra ID&lt;/span&gt;&lt;/a&gt;&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; (formerly Azure Active Directory) integration with Cloud SQL for SQL Server. Designed to tackle the challenge of identity sprawl in multi-cloud environments, this integration allows organizations to govern database access using their existing Microsoft identity infrastructure. Key benefits include centralized identity management, enhanced security features like Multi-Factor Authentication (MFA), and simplified user administration through direct group mapping. This feature is available for SQL Server 2022 and supports both public and private IP configurations.&lt;/span&gt;&lt;/span&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;January 12 - January 16&lt;/strong&gt;&lt;/h3&gt;
&lt;ul&gt;
&lt;li role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Google-built JDBC Driver for BigQuery is now available in Preview&lt;br/&gt;&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;We are excited to announce the launch of the new Google-built JDBC driver for BigQuery. This open-source driver, developed entirely in-house by Google, provides a direct, high-performance connection between your Java applications and BigQuery. &lt;/span&gt;&lt;a href="https://docs.cloud.google.com/bigquery/docs/jdbc-for-bigquery"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Download the driver and connect your Java application to BigQuery&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;.&lt;/span&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong style="vertical-align: baseline;"&gt;Troubleshoot Airflow tasks instantly with Gemini Cloud Assist investigations:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; We are excited to announce that &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;Gemini Cloud Assist investigations &lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;are now available directly within&lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt; Cloud Composer 3&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;. Instead of manually sifting through raw logs, you can now click "Investigate" on a failed Airflow task. Gemini analyzes the logs and task metadata to identify failure patterns, such as resource exhaustion or timeouts, and recommends actions to resolve the issue. This integration replaces manual debugging with automated root cause analysis, significantly reducing the time it takes to restore your pipelines.&lt;/span&gt; &lt;a href="https://docs.cloud.google.com/composer/docs/composer-3/troubleshooting-dags#investigations"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Learn more about AI-assisted troubleshooting&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;.&lt;/span&gt;&lt;/li&gt;
&lt;/ul&gt;&lt;/div&gt;
&lt;div class="block-related_article_tout"&gt;
&lt;p&gt;Related article: &lt;a href="https://cloud.google.com/blog/products/data-analytics/whats-new-with-google-data-cloud-2025/"&gt;What’s new with Google Data Cloud - 2025&lt;/a&gt;. Recent product news and updates from our data analytics, database and business intelligence teams.&lt;/p&gt;
&lt;/div&gt;</description><pubDate>Thu, 04 Jun 2026 16:00:00 +0000</pubDate><guid>https://cloud.google.com/blog/products/data-analytics/whats-new-with-google-data-cloud/</guid><category>Databases</category><category>Business Intelligence</category><category>Data Analytics</category><media:content height="540" url="https://storage.googleapis.com/gweb-cloudblog-publish/original_images/whats_new_data_cloud_fWg4bKK.png" width="540"></media:content><og xmlns:og="http://ogp.me/ns#"><type>article</type><title>What’s new with Google Data Cloud</title><description></description><image>https://storage.googleapis.com/gweb-cloudblog-publish/original_images/whats_new_data_cloud_fWg4bKK.png</image><site_name>Google</site_name><url>https://cloud.google.com/blog/products/data-analytics/whats-new-with-google-data-cloud/</url></og><author xmlns:author="http://www.w3.org/2005/Atom"><name>The Google Cloud Data Analytics, BI, and Database teams </name><title></title><department></department><company></company></author></item><item><title>What’s new in serverless Managed Service for Apache Spark</title><link>https://cloud.google.com/blog/products/data-analytics/serverless-managed-service-for-apache-spark-runtime-3-0-features/</link><description>&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Whether you use it for data preparation, real-time interactive queries, AI model training, or something entirely different, running Apache Spark at scale is demanding — you shouldn’t have to manage the underlying infrastructure too.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Late last year, we &lt;/span&gt;&lt;a href="https://docs.cloud.google.com/dataproc-serverless/docs/release-notes#December_04_2025"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;announced&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; the general availability (GA) of our serverless &lt;/span&gt;&lt;a href="https://cloud.google.com/products/managed-service-for-apache-spark"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Managed Service for Apache Spark&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; runtime version 3.0, which prioritizes speed, simplicity, and reliability. Customer use of Managed Service for Apache Spark for data science has nearly doubled year over year, which we see as a testament to our belief that Google Cloud is the easier, smarter, and faster place to run your Apache Spark workloads. &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;In this blog, let’s dive into a few key features that make our serverless Apache Spark offering a great fit for a wide range of workflows, including feature engineering, GPU-accelerated model training and tuning, semantic search, RAG, building AI agents and applications, and more.&lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Zero-setup onboarding&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;The most significant barrier to entry for a cloud service is often the "time to magic moment": the interval between creating a project and running your first workload. Previously, with serverless Spark, you had to manually configure IAM roles, VPC networking, and firewall rules before you could submit a single job.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;With the serverless Spark 3.0 runtime version, &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;zero-setup onboarding&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; significantly reduces the time it takes to launch your first workload by automating the following steps:&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Permissions:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Necessary IAM roles and permissions are automatically provisioned to the appropriate service accounts.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Networking:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; &lt;/span&gt;&lt;a href="https://docs.cloud.google.com/dataproc-serverless/docs/concepts/network#private-google-access-requirement"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Private Google Access&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; is auto-enabled on subnets, and &lt;/span&gt;&lt;a href="https://docs.cloud.google.com/dataproc-serverless/docs/concepts/network#automatically_created_regional_system_firewall_policy"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;system firewall policies&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; are configured automatically.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;API management:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; You now enable only the Managed Service for Apache Spark API, instead of manually enabling several different APIs as you did previously.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Fast startup for SLA-sensitive workloads&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Latency matters, especially for interactive data science and SLA-sensitive batch pipelines. Historically, serverless Spark startup could take several minutes. With the 3.0 runtime, we’ve cut startup times by 75% across both standard and premium tiers, delivered automatically without any code or configuration changes and at no additional cost. &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;This improvement makes serverless Spark suitable for a much broader range of SLA-sensitive workloads, and we’re always looking to reduce startup times even further. &lt;/span&gt;&lt;/p&gt;
&lt;p style="padding-left: 40px;"&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;"Serverless Spark allowed us to quickly reap benefits by removing the need for fine-grain machine management. This drove faster model development and significantly reduced our data processing costs." &lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;- César Naranjo, Principal Engineer, Moloco&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-video"&gt;
&lt;p&gt;&lt;a href="https://youtube.com/watch?v=190tVajZgRI"&gt;Video: Serverless data science: Seamless AI workflows with Spark and BigQuery&lt;/a&gt;&lt;/p&gt;
&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Better GPU obtainability&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Support for &lt;/span&gt;&lt;a href="https://docs.cloud.google.com/managed-spark/docs/guides/dws-serverless"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Dynamic Workload Scheduler (DWS)&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; Flex Start Mode in the serverless 3.0 runtime version lets serverless Spark queue your requests for a configurable duration when GPUs are unavailable. This addresses obtainability challenges for high-demand accelerators, such as NVIDIA A100 and L4 GPUs, that are subject to frequent regional shortages. Because DWS holds workloads until the necessary GPU capacity becomes available, you can significantly improve obtainability and reliability for your GPU-dependent AI/ML workloads.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-image_full_width"&gt;
    &lt;div class="article-module h-c-page"&gt;
      &lt;div class="h-c-grid"&gt;
    &lt;figure class="article-image--large h-c-grid__col h-c-grid__col--6 h-c-grid__col--offset-3"&gt;
        &lt;img src="https://storage.googleapis.com/gweb-cloudblog-publish/images/1_L0aDvOP.max-1000x1000.jpg" alt="1"&gt;
    &lt;/figure&gt;
      &lt;/div&gt;
    &lt;/div&gt;
&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;First-class support for Apache Spark 4.x&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;The serverless Spark 3.0 runtime version supports current and upcoming &lt;/span&gt;&lt;a href="https://spark.apache.org/releases/spark-release-4-0-0.html" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Apache Spark 4.x&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; innovations, including Spark Connect, whose decoupled client-server architecture lets client applications connect to Spark remotely.&lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Enhanced multi-zonal support&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;To protect global enterprise workloads from zonal outages or hardware stockouts, the serverless Spark 3.0 runtime enables enhanced multi-zonal support by default. The service can now automatically allocate execution nodes across multiple zones within a single region to improve resource obtainability.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Crucially, we do not charge for cross-zonal network traffic between nodes in a region, providing &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;high availability without the traditional multi-zone tax.&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; This is another benefit that you can realize by bringing your global Apache Spark workloads to Google Cloud.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-image_full_width"&gt;






  
    &lt;div class="article-module h-c-page"&gt;
      &lt;div class="h-c-grid"&gt;
  

    &lt;figure class="article-image--large
      
      
        h-c-grid__col
        h-c-grid__col--6 h-c-grid__col--offset-3
        
        
      "
      &gt;

      
      
        
        &lt;img
            src="https://storage.googleapis.com/gweb-cloudblog-publish/images/2_2SbCvxI.max-1000x1000.png"
        
          alt="2"&gt;
        
      
    &lt;/figure&gt;

  
      &lt;/div&gt;
    &lt;/div&gt;
  




&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Looking ahead&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Beyond these features, we’re continuing to innovate and push the boundaries of ease of use in areas such as history-based &lt;/span&gt;&lt;a href="https://medium.com/google-cloud/a-google-engineers-take-on-a-common-spark-problem-and-how-we-re-fixing-it-44b26293cce0" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;autotuning&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; and goal-based &lt;/span&gt;&lt;a href="https://docs.cloud.google.com/managed-spark/docs/concepts/autoscaling-serverless#profiles"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;autoscaling&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;.&lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Get started today&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;You can take advantage of these features today by specifying &lt;/span&gt;&lt;code style="vertical-align: baseline;"&gt;runtime_version: 3.0&lt;/code&gt;&lt;span style="vertical-align: baseline;"&gt; in your batch workloads or interactive sessions. To run your first workload on serverless Spark, complete the following steps:&lt;/span&gt;&lt;/p&gt;
&lt;ol&gt;
&lt;li aria-level="1" style="list-style-type: decimal; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;Enable the &lt;/span&gt;&lt;a href="https://console.cloud.google.com/flows/enableapi?apiid=dataproc"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Managed Service for Apache Spark API&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: decimal; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;If you aren’t the project owner, ask your project admin for the serverless Managed Service for Apache Spark &lt;/span&gt;&lt;a href="https://docs.cloud.google.com/iam/docs/roles-permissions/dataproc#dataproc.serverlessEditor"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Editor &lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;(&lt;/span&gt;&lt;code style="vertical-align: baseline;"&gt;roles/dataproc.serverlessEditor&lt;/code&gt;&lt;span style="vertical-align: baseline;"&gt;) role on the project.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
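&lt;p&gt;The 3.0 runtime version can also be set programmatically. The following minimal Python sketch builds a request for the Dataproc v1 REST method &lt;code style="vertical-align: baseline;"&gt;batches.create&lt;/code&gt;. It assumes that the &lt;code style="vertical-align: baseline;"&gt;runtime_version&lt;/code&gt; setting described above corresponds to the &lt;code style="vertical-align: baseline;"&gt;runtimeConfig.version&lt;/code&gt; field of the Batch resource; the project, region, and script URI are hypothetical placeholders. The sketch only constructs the request; sending it also requires an OAuth access token.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;
```python
import json

# Hypothetical placeholders: substitute your own project, region, and script.
PROJECT = "my-project"
REGION = "us-central1"
MAIN_PY = "gs://my-bucket/jobs/my_job.py"


def build_batch_request(project: str, region: str, main_python_file_uri: str,
                        runtime_version: str = "3.0") -> tuple[str, dict]:
    """Return (url, body) for a Dataproc v1 batches.create REST call.

    Assumption: the runtime_version setting in this post maps to the
    runtimeConfig.version field of the Batch resource.
    """
    url = ("https://dataproc.googleapis.com/v1/"
           f"projects/{project}/locations/{region}/batches")
    body = {
        "pysparkBatch": {"mainPythonFileUri": main_python_file_uri},
        "runtimeConfig": {"version": runtime_version},  # pin the 3.0 runtime
    }
    return url, body


url, body = build_batch_request(PROJECT, REGION, MAIN_PY)
print(url)
print(json.dumps(body, indent=2))
```
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;If you use the gcloud CLI instead, &lt;code style="vertical-align: baseline;"&gt;gcloud dataproc batches submit pyspark&lt;/code&gt; accepts a &lt;code style="vertical-align: baseline;"&gt;--version&lt;/code&gt; flag for the runtime version; check the command’s &lt;code style="vertical-align: baseline;"&gt;--help&lt;/code&gt; output for your CLI release.&lt;/p&gt;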
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Now you’re ready to &lt;/span&gt;&lt;a href="https://docs.cloud.google.com/dataproc-serverless/docs/quickstarts/spark-batch#submit_a_spark_batch_workload"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;start running your workloads&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; on the Serverless 3.0 runtime version.&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt; For more details, visit our updated &lt;/span&gt;&lt;a href="https://cloud.google.com/dataproc-serverless/docs/overview"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;documentation&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; and access serverless Managed Service for Apache Spark in the &lt;/span&gt;&lt;a href="https://console.cloud.google.com/dataproc"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Google Cloud console&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;</description><pubDate>Wed, 03 Jun 2026 16:00:00 +0000</pubDate><guid>https://cloud.google.com/blog/products/data-analytics/serverless-managed-service-for-apache-spark-runtime-3-0-features/</guid><category>Streaming</category><category>Data Analytics</category><og xmlns:og="http://ogp.me/ns#"><type>article</type><title>What’s new in serverless Managed Service for Apache Spark</title><description></description><site_name>Google</site_name><url>https://cloud.google.com/blog/products/data-analytics/serverless-managed-service-for-apache-spark-runtime-3-0-features/</url></og><author xmlns:author="http://www.w3.org/2005/Atom"><name>Vinay Londhe</name><title>Software Engineering Manager</title><department></department><company></company></author><author xmlns:author="http://www.w3.org/2005/Atom"><name>Bhooshan Mogal</name><title>Senior Product 
Manager</title><department></department><company></company></author></item><item><title>Accelerating data lakes: Optimizing Apache Iceberg and Spark with gcs-analytics-core</title><link>https://cloud.google.com/blog/products/data-analytics/optimize-iceberg-and-spark-workloads-with-gcs-analytics-core/</link><description>&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Many data engineers spend significant time managing compatibility and achieving the best performance across multiple analytics engines. To help solve this pain point, we are excited to announce &lt;/span&gt;&lt;a href="https://github.com/GoogleCloudPlatform/gcs-analytics-core" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;gcs-analytics-core&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, a new open-source Java library designed to centralize and accelerate analytics optimizations for &lt;/span&gt;&lt;a href="https://cloud.google.com/storage"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Google Cloud Storage (GCS)&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;This gives you the flexibility to choose your preferred analytics engine while still achieving high performance on GCS. The gcs-analytics-core library currently provides optimizations for analytics engines that you already use on GCS, such as Apache Spark with Apache Iceberg, and we plan to expand support to other analytics engines by the end of this year.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Built to be shared across major data processing frameworks such as Apache Spark, this library consolidates GCS optimizations in one place and improves performance for analytics workloads on GCS. It is available natively in the Apache Iceberg Java runtime starting with version &lt;/span&gt;&lt;a href="https://iceberg.apache.org/releases/" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;1.11.0&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, where it improves read operations for columnar formats such as Parquet.&lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;What is the gcs-analytics-core library?&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;The gcs-analytics-core library is a centralized optimization layer that sits between your analytics engines — such as Apache Spark, Trino, and Apache Hive — and the underlying GCS Java SDK. It intercepts read calls and injects performance enhancements, providing a consistent experience without requiring framework-specific tuning.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;For Apache Iceberg users, it integrates into the GCSFileIO implementation, replacing traditional sequential reads with parallelized strategies to minimize latency and maximize throughput.&lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Key technical optimizations&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;The library introduces specific optimizations designed to reduce time spent on I/O and end-to-end execution time:&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Vectored I/O (threaded):&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; This feature improves read performance by fetching multiple data ranges in parallel within a single operation, which reduces the overhead of GCS calls. Without it, the reader must issue a separate call for each data range, which increases both the number of operations and the file-open latency incurred by each request.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Smart Parquet prefetching:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; When reading Parquet data, analytics engines typically start by reading the file’s footer, which contains the schema and the locations of the file’s data ranges (row groups and column chunks). The library automatically prefetches this footer data in a single chunk (typically 50KB–100KB), avoiding the multiple network calls that often occur when engines repeatedly seek backward to fetch metadata.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
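&lt;p&gt;To illustrate the difference between sequential and vectored reads, here is a minimal Python sketch. It is not the library’s Java implementation: an in-memory byte string stands in for a GCS object, and the hypothetical &lt;code style="vertical-align: baseline;"&gt;fetch_range&lt;/code&gt; function stands in for one ranged GET request. &lt;code style="vertical-align: baseline;"&gt;read_vectored&lt;/code&gt; submits all requested ranges to a thread pool at once and returns the buffers in request order, while &lt;code style="vertical-align: baseline;"&gt;read_sequential&lt;/code&gt; issues one request after another.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;
```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for a GCS object: 16 KiB of in-memory sample bytes.
OBJECT = bytes(range(256)) * 64


def fetch_range(data: bytes, offset: int, length: int) -> bytes:
    """Simulate one ranged GET request against object storage."""
    return data[offset:offset + length]


def read_sequential(data: bytes, ranges: list[tuple[int, int]]) -> list[bytes]:
    """Baseline: issue one request per range, one after another."""
    return [fetch_range(data, off, length) for off, length in ranges]


def read_vectored(data: bytes, ranges: list[tuple[int, int]],
                  max_workers: int = 4) -> list[bytes]:
    """Vectored read: submit every range at once and fetch them concurrently.

    Results come back in the same order as `ranges`, so each buffer can be
    matched to the column chunk that requested it.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(fetch_range, data, off, length)
                   for off, length in ranges]
        return [f.result() for f in futures]


# Example: three column-chunk ranges, as a Parquet footer might describe them.
chunks = [(0, 100), (4096, 512), (10000, 2048)]
buffers = read_vectored(OBJECT, chunks)
assert buffers == read_sequential(OBJECT, chunks)
print([len(b) for b in buffers])  # [100, 512, 2048]
```
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Against real object storage, each ranged request carries network round-trip latency, so overlapping the requests is where the savings come from. Production implementations may also merge nearby ranges into fewer requests, which this sketch does not attempt.&lt;/p&gt;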
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Spotlight: Apache Iceberg integration&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;We delivered the first major integration of this library into &lt;/span&gt;&lt;a href="https://iceberg.apache.org/" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Apache Iceberg&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;. With Iceberg 1.11.0 or later, analytics engines that use Iceberg’s GCSFileIO can take advantage of these performance enhancements. To adopt the library in your environment, first verify that your Iceberg catalog is configured to use the native GCSFileIO:&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-code"&gt;&lt;dl&gt;
    &lt;dd&gt;&lt;pre&gt;&lt;code&gt;# Spark configuration example
spark.sql.catalog.my_catalog.io-impl=org.apache.iceberg.gcp.gcs.GCSFileIO&lt;/code&gt;&lt;/pre&gt;&lt;/dd&gt;
&lt;/dl&gt;&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Because the core optimizations are built into the updated Iceberg runtime and the GCS connector architecture, once the library is enabled you benefit from Parquet footer prefetching and multi-threaded vectored reads without complex custom tuning.&lt;/span&gt;&lt;/p&gt;
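&lt;p&gt;The Parquet footer prefetching described earlier can be illustrated with a minimal Python sketch. It is not the library’s Java implementation: it assumes a hypothetical &lt;code style="vertical-align: baseline;"&gt;read_range(offset, length)&lt;/code&gt; callback in place of GCS ranged reads, picks a 64 KiB prefetch window from the 50KB–100KB range cited above, and returns the raw footer bytes without decoding the Thrift-encoded metadata. It relies only on the Parquet file layout, in which the footer is followed by a 4-byte little-endian footer length and the &lt;code style="vertical-align: baseline;"&gt;PAR1&lt;/code&gt; magic bytes.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;
```python
# Assumed prefetch window; the post cites 50KB–100KB as typical.
PREFETCH_BYTES = 64 * 1024
MAGIC = b"PAR1"


def read_footer(read_range, file_size: int, prefetch: int = PREFETCH_BYTES) -> bytes:
    """Return the raw Parquet footer bytes, using one tail read when possible.

    read_range(offset, length) is a hypothetical ranged-read callback.
    A Parquet file ends with: footer | 4-byte little-endian footer length | "PAR1".
    """
    tail_len = min(prefetch, file_size)
    tail = read_range(file_size - tail_len, tail_len)  # one speculative tail read
    if tail[-4:] != MAGIC:
        raise ValueError("not a Parquet file")
    footer_len = int.from_bytes(tail[-8:-4], "little")
    needed = footer_len + 8
    if needed > tail_len:
        # Footer is bigger than the prefetch window: fetch exactly what is needed.
        tail = read_range(file_size - needed, needed)
    return tail[-needed:-8]


def make_reader(data: bytes):
    """Build an in-memory ranged reader that records every request it serves."""
    calls = []

    def read_range(offset: int, length: int) -> bytes:
        calls.append((offset, length))
        return data[offset:offset + length]

    return read_range, calls


# A tiny Parquet-shaped file: magic, column data, footer, footer length, magic.
footer = b"F" * 300
data = MAGIC + b"x" * 5000 + footer + len(footer).to_bytes(4, "little") + MAGIC
read_range, calls = make_reader(data)
assert read_footer(read_range, len(data)) == footer
print(len(calls))  # 1: the footer fit inside the prefetched tail
```
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;When the footer fits inside the prefetched tail, the sketch needs only one request; a footer larger than the window costs one additional request.&lt;/p&gt;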
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;You can follow the specific integration details in Apache Iceberg &lt;/span&gt;&lt;a href="https://github.com/apache/iceberg/issues/14326" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Issue #14326&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Catalog compatibility&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;The gcs-analytics-core library is compatible with all Iceberg catalogs, including the REST catalog, Hive, and other metadata management systems. Because the performance optimizations are decoupled from the catalog management layer, the library provides consistent read improvements without requiring changes to your existing infrastructure, so you can scale across diverse data lake architectures.&lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;TPC-DS performance benchmarks using Spark&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;To validate these improvements, end-to-end benchmarking was performed using an open source Apache Spark cluster with an Iceberg catalog configured to use GCSFileIO along with the gcs-analytics-core library.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;The benchmark leveraged the industry-standard &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;TPC-DS&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; schema across varying dataset sizes (from 1 GB up to 10 TB), specifically comparing the new library's optimizations against the default GCSFileIO implementation, which uses sequential vectored reads.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;By alleviating the I/O bottleneck at the storage layer, compute engines spend less time waiting for network responses (scan time) and more time processing data (execution time).&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Here are the end-to-end TPC-DS benchmark results showcasing the percentage improvement when enabling gcs-analytics-core:&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-image_full_width"&gt;






  
    &lt;div class="article-module h-c-page"&gt;
      &lt;div class="h-c-grid"&gt;
  

    &lt;figure class="article-image--large
      
      
        h-c-grid__col
        h-c-grid__col--6 h-c-grid__col--offset-3
        
        
      "
      &gt;

      
      
        
        &lt;img
            src="https://storage.googleapis.com/gweb-cloudblog-publish/images/TPC-DS_benchmark_for_gcs-analytics-core_I7.max-1000x1000.jpg"
        
          alt="TPC-DS benchmark for gcs-analytics-core"&gt;
        
      
    &lt;/figure&gt;

  
      &lt;/div&gt;
    &lt;/div&gt;
  




&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;div align="left"&gt;
&lt;div style="color: #5f6368; overflow-x: auto; overflow-y: hidden; width: 100%;"&gt;
&lt;div style="color: #5f6368; overflow-x: auto; overflow-y: hidden; width: 100%;"&gt;
&lt;div style="color: #5f6368; overflow-x: auto; overflow-y: hidden; width: 100%;"&gt;&lt;table style="width: 99.4778%;"&gt;&lt;colgroup&gt;&lt;col style="width: 29.2169%;"/&gt;&lt;col style="width: 32.5301%;"/&gt;&lt;col style="width: 38.253%;"/&gt;&lt;/colgroup&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;TPC-DS schema size&lt;/strong&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Scan time improvement&lt;/strong&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Execution time improvement&lt;/strong&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;1 GB&lt;/strong&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;71.51%&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;32.61%&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;10 GB&lt;/strong&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;48.48%&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;18.94%&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;100 GB&lt;/strong&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;40.98%&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;10.95%&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;1 TB&lt;/strong&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;35.86%&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;3.38%&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;10 TB&lt;/strong&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;18.40%&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;1.58%&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;As the data shows, gcs-analytics-core improves performance at every dataset size, with the largest gains on smaller datasets: scan time drops by 18.40% to 71.51%, and end-to-end execution time by 1.58% to 32.61%. The library is effective for the complex query patterns in TPC-DS, and its scan time reductions translate directly into lower overall query execution time.&lt;/span&gt;&lt;/p&gt;
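&lt;p&gt;Percentage reductions can be hard to compare across the two columns. The following minimal Python sketch converts the table’s figures into speedup factors, assuming that “improvement” means (baseline - optimized) / baseline x 100, a definition the post does not state explicitly. It also estimates how much of the baseline execution time was spent scanning, under the simplifying (and hypothetical) assumption that all end-to-end savings came from faster scans that did not overlap with other work.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;
```python
# Percent improvements copied from the TPC-DS table above: (scan, execution).
RESULTS = {
    "1 GB": (71.51, 32.61),
    "10 GB": (48.48, 18.94),
    "100 GB": (40.98, 10.95),
    "1 TB": (35.86, 3.38),
    "10 TB": (18.40, 1.58),
}


def speedup(improvement_pct: float) -> float:
    """Convert a percent time reduction into a speedup factor.

    Assumes improvement = (baseline - optimized) / baseline * 100.
    """
    return 1 / (1 - improvement_pct / 100)


def implied_scan_share(scan_pct: float, exec_pct: float) -> float:
    """Rough fraction of baseline execution time spent scanning.

    Assumes all end-to-end savings come from scan time and that scans do
    not overlap other work, which real Spark stages only approximate.
    """
    return exec_pct / scan_pct


for size, (scan, execution) in RESULTS.items():
    print(
        f"{size:>6}: scan {speedup(scan):.2f}x, "
        f"execution {speedup(execution):.2f}x, "
        f"implied scan share ~{implied_scan_share(scan, execution):.0%}"
    )
```
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Under these assumptions, the 1 GB run corresponds to roughly a 3.51x scan speedup with scans making up about 46% of execution time, while at 10 TB the implied scan share falls to about 9%. That would explain why the end-to-end gain shrinks as datasets grow even though scans remain noticeably faster.&lt;/p&gt;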
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Get started&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Before running your Spark workloads, confirm that the following requirements and configurations are met:&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;Use Apache Iceberg Spark runtime 1.11.0+ and the iceberg-gcp-bundle 1.11.0+.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;Configure your catalog to use GCSFileIO.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;Enable the gcs-analytics-core optimization flag (&lt;/span&gt;&lt;code style="vertical-align: baseline;"&gt;spark.sql.catalog.$CATALOG_NAME.gcs.analytics-core.enabled=true&lt;/code&gt;&lt;span style="vertical-align: baseline;"&gt;).&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;Enable Iceberg vectorized reads (&lt;/span&gt;&lt;code style="vertical-align: baseline;"&gt;spark.sql.iceberg.vectorization.enabled=true&lt;/code&gt;&lt;span style="vertical-align: baseline;"&gt;) to get the best read performance.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;&lt;/div&gt;
&lt;div class="block-code"&gt;&lt;dl&gt;
    &lt;dd&gt;&lt;pre&gt;&lt;code&gt;spark-submit \
  --packages org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.11.0,org.apache.iceberg:iceberg-gcp-bundle:1.11.0 \
  --conf spark.sql.catalog.$CATALOG_NAME=org.apache.iceberg.spark.SparkCatalog \
  --conf spark.sql.catalog.$CATALOG_NAME.io-impl=org.apache.iceberg.gcp.gcs.GCSFileIO \
  --conf spark.sql.catalog.$CATALOG_NAME.gcs.analytics-core.enabled=true \
  --conf spark.sql.iceberg.vectorization.enabled=true \
  &amp;lt;your-application-jar-or-script&amp;gt;&lt;/code&gt;&lt;/pre&gt;&lt;/dd&gt;
&lt;/dl&gt;&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;The gcs-analytics-core library is open source, so you can explore the source code and contribute to the project. The implementation and micro-benchmark configurations are included in the repository, and you can use them as a reference for your own contributions or validations.&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;GitHub repository:&lt;/strong&gt;&lt;a href="https://github.com/GoogleCloudPlatform/gcs-analytics-core" rel="noopener" target="_blank"&gt;&lt;span style="vertical-align: baseline;"&gt; &lt;/span&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;GoogleCloudPlatform/gcs-analytics-core&lt;/span&gt;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Documentation:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Review the&lt;/span&gt;&lt;a href="https://github.com/GoogleCloudPlatform/gcs-analytics-core" rel="noopener" target="_blank"&gt;&lt;span style="vertical-align: baseline;"&gt; &lt;/span&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;design document&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; for deep architectural details.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;We want to hear about your experience. If you test this on your own datasets, please feel free to open an issue on GitHub or share your results with the community. We look forward to seeing how you utilize these optimizations in your data lakes.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;</description><pubDate>Tue, 02 Jun 2026 16:00:00 +0000</pubDate><guid>https://cloud.google.com/blog/products/data-analytics/optimize-iceberg-and-spark-workloads-with-gcs-analytics-core/</guid><category>Streaming</category><category>Data Analytics</category><og xmlns:og="http://ogp.me/ns#"><type>article</type><title>Accelerating data lakes: Optimizing Apache Iceberg and Spark with gcs-analytics-core</title><description></description><site_name>Google</site_name><url>https://cloud.google.com/blog/products/data-analytics/optimize-iceberg-and-spark-workloads-with-gcs-analytics-core/</url></og><author xmlns:author="http://www.w3.org/2005/Atom"><name>Ajay Yadav</name><title>Software Engineer</title><department></department><company></company></author><author xmlns:author="http://www.w3.org/2005/Atom"><name>Nivedita Aggarwal</name><title>Engineering Manager</title><department></department><company></company></author></item><item><title>The fully-managed Remote MCP Server for AlloyDB is now Generally Available</title><link>https://cloud.google.com/blog/products/data-analytics/alloydb-remote-mcp-server-ga-secure-ai-agent-access-to-your-data/</link><description>&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;AI agents possess incredible reasoning capabilities and can perform increasingly complex actions. 
But the reliability of agentic outcomes depends entirely on the quality of the context they can access&lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt; &lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;— context that is frequently locked away in operational databases.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;To bridge this gap, we are excited to announce that the Remote Model Context Protocol (MCP) Server for &lt;/span&gt;&lt;a href="https://cloud.google.com/products/alloydb?e=13802955"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;AlloyDB&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; is now generally available.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;The Model Context Protocol (MCP) is an open-source standard that gives LLMs a secure, consistent way to connect to external data sources. As part of Google Cloud’s recent rollout of &lt;/span&gt;&lt;a href="https://cloud.google.com/blog/products/ai-machine-learning/google-managed-mcp-servers-are-available-for-everyone?e=48754805"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;50+ Google-managed MCP servers&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, this new integration makes it easier than ever for both interactive and autonomous agents to securely harness the full power of your enterprise data. For example, you can now ask an AI agent for an up-to-the-millisecond view of your delivery fleet by connecting it to your real-time logistics data in AlloyDB, avoiding inaccuracies due to stale data and reducing the need for manual reporting.&lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Why AlloyDB is a strong foundation for agentic apps&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;By connecting MCP to AlloyDB, your agents get access to the premier database built for enterprise-grade AI. AlloyDB delivers the scale, speed, and intelligence required for the most demanding agentic workloads:&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Supercharged vector performance:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Scale to &lt;/span&gt;&lt;a href="https://docs.cloud.google.com/alloydb/docs/ai/choose-index-strategy#:~:text=Scales%20well%20to%2010B%20vectors"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;over 10 billion vectors&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; at up to 6x the speed of standard PostgreSQL for vector queries (and up to 10x faster for &lt;/span&gt;&lt;a href="https://docs.cloud.google.com/alloydb/docs/ai/filtered-vector-search-overview"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;filtered queries&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;) with the ScaNN index.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Advanced search and reranking:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Power multimodal applications with hybrid search via &lt;/span&gt;&lt;a href="https://docs.cloud.google.com/alloydb/docs/ai/create-rum-index"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;RUM&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; (in Preview) and intelligent reranking through Reciprocal Rank Fusion (RRF) or &lt;/span&gt;&lt;a href="https://docs.cloud.google.com/alloydb/docs/ai/rank-rerank-search-results-rag"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Gemini Enterprise Platform models&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Real-time intelligence:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Efficiently generate &lt;/span&gt;&lt;a href="https://docs.cloud.google.com/alloydb/docs/ai/generate-manage-auto-embeddings-for-tables"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;millions of embeddings&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; using built-in &lt;/span&gt;&lt;a href="https://docs.cloud.google.com/alloydb/docs/ai/ai-query-engine-landing"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;AI Functions&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; to facilitate low-latency, real-time agentic experiences.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Unified data access:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Give agents a single PostgreSQL interface to seamlessly join operational data in AlloyDB with analytical data in BigQuery or archived data in Iceberg tables via &lt;/span&gt;&lt;a href="https://docs.cloud.google.com/alloydb/docs/bigquery-view-alloydb-overview"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Lakehouse Federation&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Enterprise-grade scale:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Rest easy with a &lt;/span&gt;&lt;a href="https://cloud.google.com/alloydb/sla?e=48754805"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;99.99% SLA&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, &lt;/span&gt;&lt;a href="https://docs.cloud.google.com/alloydb/docs/overview#automatic"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;autopilot&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; database optimizations, and auto-scaling read pools with up to 20 nodes. &lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
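&lt;p&gt;Reciprocal Rank Fusion (RRF) merges several ranked result lists, for example one from a full-text search and one from a vector search. For each document it sums 1 / (k + rank) across the lists. The sketch below is a generic, illustrative Python implementation of that formula, not AlloyDB's own. The document IDs are hypothetical. The value k = 60 is the constant commonly used in the RRF literature, and AlloyDB's defaults may differ.&lt;/p&gt;

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs with Reciprocal Rank Fusion.

    Each list contributes 1 / (k + rank) for every document it contains
    (ranks start at 1); a document missing from a list gets nothing from it.
    Returns (doc_id, score) pairs, best first, with ties broken by doc_id.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical result lists from a full-text (e.g. RUM-indexed) search and a vector search.
keyword_hits = ["doc_a", "doc_b", "doc_c"]
vector_hits = ["doc_b", "doc_c", "doc_a"]
for doc_id, score in reciprocal_rank_fusion([keyword_hits, vector_hits]):
    print(doc_id, round(score, 5))
```

&lt;p&gt;RRF uses only ranks, so it needs no score normalization between retrieval methods whose raw scores sit on different scales. A larger k reduces the advantage of the top-ranked results.&lt;/p&gt;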
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Why Remote MCP matters for AlloyDB&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Local MCP servers are great for local development, but communicating over standard input/output (stdio) streams becomes difficult when you scale to production workloads. It is both architecturally complex and administratively burdensome to provision and manage all of the infrastructure and security guardrails you need to run agents for high-value use cases that interact with sensitive operational data.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;The Remote MCP Server for AlloyDB runs on fully-managed Google Cloud infrastructure and exposes an HTTP endpoint that connects your AI applications to your data. This solves key challenges for teams building agents on PostgreSQL:&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Centralized discovery&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;: Find, secure, and manage your database's MCP server using &lt;/span&gt;&lt;a href="https://docs.cloud.google.com/agent-registry/overview"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Agent Registry&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Fully-managed HTTP endpoints&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;: No need to deploy or maintain the infrastructure required for connectivity. Configure your agent to use the endpoint to get started.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Fine-grained authorization&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;: Instead of using shared database passwords or API keys, you use &lt;/span&gt;&lt;a href="https://docs.cloud.google.com/iam/docs"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Identity and Access Management (IAM)&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; to restrict agents to specific tables, schemas, or views. With the read-only execute SQL tool, you can prevent your agent from making accidental changes and deletions from your database. &lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Operational instance management&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;: The AlloyDB toolset gives agents the ability to do more than run queries. Agents can update instances, export and import data, create backups, and restore clusters.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Model Armor protection&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;: &lt;/span&gt;&lt;a href="https://cloud.google.com/security/products/model-armor?e=13802955"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Model Armor&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; provides optional prompt and response security to screen and filter data, defending against prompt injections or accidental data exfiltration.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Audit logging&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;: Every query, action, and tool call goes to &lt;/span&gt;&lt;a href="https://docs.cloud.google.com/logging/docs/audit"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Cloud Audit Logs&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, giving security teams a full audit trail.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Let's see it in action: A quick demo&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Getting started with the AlloyDB Remote MCP server is a straightforward process. To see it in action in your own environment, you can follow our &lt;/span&gt;&lt;a href="https://codelabs.developers.google.com/alloydb-ai-mcp" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;new Codelab&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, which guides you through these essential steps:&lt;/span&gt;&lt;/p&gt;
&lt;ol&gt;
&lt;li aria-level="1" style="list-style-type: decimal; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;API &amp;amp; environment prep&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;: Enable the AlloyDB, &lt;/span&gt;&lt;a href="https://cloud.google.com/products/compute?e=13802955"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Compute Engine&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, and &lt;/span&gt;&lt;a href="https://cloud.google.com/products/gemini-enterprise-agent-platform?e=13802955"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Gemini Enterprise&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; APIs in your Google Cloud project.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: decimal; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Provision your database&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;: Deploy your AlloyDB cluster, create your database, and import your sample data.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: decimal; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Enable data access API&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;: Permit the &lt;/span&gt;&lt;a href="https://docs.cloud.google.com/alloydb/docs/ai/use-alloydb-mcp#execute-sql"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Data Access API&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; on your AlloyDB instance.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: decimal; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Connect the agent&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;: Configure your MCP client by providing the remote endpoint (&lt;/span&gt;&lt;code style="vertical-align: baseline;"&gt;https://alloydb.googleapis.com/mcp&lt;/code&gt;&lt;span style="vertical-align: baseline;"&gt;). Pass your Google Cloud IAM credentials using an OAuth 2.0 bearer token in the HTTP Authorization header.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
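&lt;p&gt;Step 4 can be sketched with Python's standard library. MCP messages are JSON-RPC 2.0 objects sent in HTTP POST requests. The hypothetical helper below builds such a request for the endpoint above, with the IAM bearer token in the Authorization header, but does not send it. The helper name, the client name, and the protocol version string are illustrative assumptions, not values from the AlloyDB documentation.&lt;/p&gt;

```python
import json
import urllib.request

# Remote endpoint quoted in the post.
ALLOYDB_MCP_ENDPOINT = "https://alloydb.googleapis.com/mcp"


def build_mcp_request(access_token, method, params, request_id):
    """Build, but do not send, an MCP JSON-RPC 2.0 request as an HTTP POST.

    access_token: an OAuth 2.0 access token for a Google Cloud IAM principal,
    for example the output of `gcloud auth print-access-token`.
    """
    message = {"jsonrpc": "2.0", "id": request_id, "method": method, "params": params}
    return urllib.request.Request(
        ALLOYDB_MCP_ENDPOINT,
        data=json.dumps(message).encode("utf-8"),
        headers={
            # IAM credentials travel as a bearer token in the Authorization header.
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
            # MCP's Streamable HTTP transport lets the server answer with plain
            # JSON or with a server-sent event stream.
            "Accept": "application/json, text/event-stream",
        },
        method="POST",
    )


# An MCP session opens with an initialize handshake. The protocol version below
# is an assumption; use one that your client library supports.
init_request = build_mcp_request(
    "YOUR_ACCESS_TOKEN",
    "initialize",
    {
        "protocolVersion": "2025-03-26",
        "capabilities": {},
        "clientInfo": {"name": "example-agent", "version": "0.1"},
    },
    request_id=1,
)
print(init_request.get_method(), init_request.full_url)
```

&lt;p&gt;With a real token, you could send the request with &lt;code&gt;urllib.request.urlopen(init_request)&lt;/code&gt;. After initializing, an MCP client sends a &lt;code&gt;notifications/initialized&lt;/code&gt; notification and can then call &lt;code&gt;tools/list&lt;/code&gt; to discover the AlloyDB toolset. In practice, an MCP client library handles this handshake and any session headers for you.&lt;/p&gt;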
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Once the connection is established, your agent can provide reliable, grounded answers to complex business questions using your real-time operational data. By performing introspection queries, the agent automatically understands your database schema – including tables and columns – enabling it to construct sophisticated joins and queries to fulfill user requests accurately.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-image_full_width"&gt;






  
    &lt;div class="article-module h-c-page"&gt;
      &lt;div class="h-c-grid"&gt;
  

    &lt;figure class="article-image--large
      
      
        h-c-grid__col
        h-c-grid__col--6 h-c-grid__col--offset-3
        
        
      "
      &gt;

      
      
        
        &lt;img
            src="https://storage.googleapis.com/gweb-cloudblog-publish/original_images/1_-_Setup.gif"
        
          alt="1 - Setup"&gt;
        
      
    &lt;/figure&gt;

  
      &lt;/div&gt;
    &lt;/div&gt;
  




&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Once your agent has access to the AlloyDB toolset, it can execute queries, analyze operational trends, and dynamically rank text data using AlloyDB &lt;/span&gt;&lt;a href="https://docs.cloud.google.com/alloydb/docs/ai/ai-query-engine-landing"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;AI functions&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; like &lt;/span&gt;&lt;code style="vertical-align: baseline;"&gt;AI.RANK()&lt;/code&gt;&lt;span style="vertical-align: baseline;"&gt;.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-image_full_width"&gt;






  
    &lt;div class="article-module h-c-page"&gt;
      &lt;div class="h-c-grid"&gt;
  

    &lt;figure class="article-image--large
      
      
        h-c-grid__col
        h-c-grid__col--6 h-c-grid__col--offset-3
        
        
      "
      &gt;

      
      
        
        &lt;img
            src="https://storage.googleapis.com/gweb-cloudblog-publish/original_images/2_-_Rank.gif"
        
          alt="2 - Rank"&gt;
        
      
    &lt;/figure&gt;

  
      &lt;/div&gt;
    &lt;/div&gt;
  




&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Security remains paramount: the Remote MCP Server for AlloyDB integrates seamlessly with Model Armor. This provides protection against sensitive data leaks, even if the agent’s service account possesses broad access permissions within the database. &lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-image_full_width"&gt;






  
    &lt;div class="article-module h-c-page"&gt;
      &lt;div class="h-c-grid"&gt;
  

    &lt;figure class="article-image--large
      
      
        h-c-grid__col
        h-c-grid__col--6 h-c-grid__col--offset-3
        
        
      "
      &gt;

      
      
        
        &lt;img
            src="https://storage.googleapis.com/gweb-cloudblog-publish/original_images/3_-_Secure.gif"
        
          alt="3 - Secure"&gt;
        
      
    &lt;/figure&gt;

  
      &lt;/div&gt;
    &lt;/div&gt;
  




&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Watch the full demo below!&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-video"&gt;



&lt;div class="article-module article-video "&gt;
  &lt;figure&gt;
    &lt;a class="h-c-video h-c-video--marquee"
      href="https://youtube.com/watch?v=-dPZ19fGM20"
      data-glue-modal-trigger="uni-modal--dPZ19fGM20-"
      data-glue-modal-disabled-on-mobile="true"&gt;

      
        

        &lt;div class="article-video__aspect-image"
          style="background-image: url(https://storage.googleapis.com/gweb-cloudblog-publish/images/maxresdefault_ZNMrpaE.max-1000x1000.jpg);"&gt;
          &lt;span class="h-u-visually-hidden"&gt;How to connect AI agents directly to your enterprise data: Introducing the AlloyDB remote MCP server&lt;/span&gt;
        &lt;/div&gt;
      
      &lt;svg role="img" class="h-c-video__play h-c-icon h-c-icon--color-white"&gt;
        &lt;use xlink:href="#mi-youtube-icon"&gt;&lt;/use&gt;
      &lt;/svg&gt;
    &lt;/a&gt;

    
  &lt;/figure&gt;
&lt;/div&gt;

&lt;div class="h-c-modal--video"
     data-glue-modal="uni-modal--dPZ19fGM20-"
     data-glue-modal-close-label="Close Dialog"&gt;
   &lt;a class="glue-yt-video"
      data-glue-yt-video-autoplay="true"
      data-glue-yt-video-height="99%"
      data-glue-yt-video-vid="-dPZ19fGM20"
      data-glue-yt-video-width="100%"
      href="https://youtube.com/watch?v=-dPZ19fGM20"
      ng-cloak&gt;
   &lt;/a&gt;
&lt;/div&gt;

&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;What's next&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;By enabling agents to interact securely with transactional data, we are embracing an architecture where AI agents can reliably access and act upon your enterprise’s single source of truth. &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Ready to build? Discover AlloyDB with a &lt;/span&gt;&lt;a href="https://docs.cloud.google.com/alloydb/docs/free-trial-cluster"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;30-day free trial&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, and dive into the &lt;/span&gt;&lt;a href="https://codelabs.developers.google.com/alloydb-ai-mcp" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Remote MCP for AlloyDB Codelab&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; to start powering your enterprise agentic applications today.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;</description><pubDate>Mon, 01 Jun 2026 16:00:00 +0000</pubDate><guid>https://cloud.google.com/blog/products/data-analytics/alloydb-remote-mcp-server-ga-secure-ai-agent-access-to-your-data/</guid><category>AI &amp; Machine Learning</category><category>Data Analytics</category><og xmlns:og="http://ogp.me/ns#"><type>article</type><title>The fully-managed Remote MCP Server for AlloyDB is now Generally Available</title><description></description><site_name>Google</site_name><url>https://cloud.google.com/blog/products/data-analytics/alloydb-remote-mcp-server-ga-secure-ai-agent-access-to-your-data/</url></og><author xmlns:author="http://www.w3.org/2005/Atom"><name>Paul Ramsey</name><title>Product Manager, AlloyDB, Cloud SQL, Google Cloud</title><department></department><company></company></author><author xmlns:author="http://www.w3.org/2005/Atom"><name>Gleb Otochkin</name><title>Cloud Advocate, Databases, Google Cloud</title><department></department><company></company></author></item><item><title>Modeling a digital twin of a food supply chain using BigQuery Graph</title><link>https://cloud.google.com/blog/products/data-analytics/modeling-a-digital-twin-using-bigquery-graph/</link><description>&lt;div 
class="block-paragraph_advanced"&gt;&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;The example of a growing restaurant&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Imagine you are running a restaurant chain. You just can't physically feel and touch things to know how your business operates. You need tools and a digital replica of your business to&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt; sense the health of the business for you.&lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;The friction of growth&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Growth creates a unique kind of friction that spreadsheets simply weren't built to solve:&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;The bullwhip effect:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Small downstream demand shifts swell into upstream inventory tidal waves.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;SOP drift:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Tiny departures from standard prep work eventually erode the entire brand vibe.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;The food safety blast radius:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; One contaminated ingredient creates a messy, complex map of risk across the network.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Maverick spend:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; The "million-dollar leak" caused by local managers purchasing ingredients off-contract.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;The digital twin&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Digital models empower us to ask more insightful questions about the world, but they also force a critical choice in how we structure data. While traditional relational tables have been the standard, we must ask: are they still the right tool for everything? Given that our world is inherently interconnected, perhaps shifting to graph-based models is the natural evolution for capturing reality.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;When managing thousands of assets, complex supply chains, or global logistics networks, traditional relational databases require massive, resource-intensive SQL joins to trace dependencies. This architecture creates a latency gap between physical events and operational awareness.&lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Modeling with BigQuery Graph&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;BigQuery Graph allows you to build a digital twin of your entire supply chain within your existing data platform. By turning your physical world—items, recipes, and locations—into a searchable map of nodes and edges, you gain a new level of clarity.&lt;/span&gt;&lt;/p&gt;
&lt;h4&gt;&lt;strong style="vertical-align: baseline;"&gt;1. Defining the Semantic Layer&lt;/strong&gt;&lt;/h4&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Instead of moving data to a new database, you create a Graph View over your existing tables. This tells BigQuery exactly how your tables relate to one another.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Query Language:&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-code"&gt;&lt;dl&gt;
    &lt;dt&gt;code_block&lt;/dt&gt;
    &lt;dd&gt;&lt;pre&gt;&lt;code&gt;# Build the graph nodes &amp;amp; edges
CREATE OR REPLACE PROPERTY GRAPH `restaurant.bombod`
# Node tables: every column becomes a queryable property of the node.
NODE TABLES (
  `restaurant.item` LABEL item PROPERTIES ALL COLUMNS,
  `restaurant.location` LABEL location PROPERTIES ALL COLUMNS,
  `restaurant.itemlocation` LABEL itemlocation PROPERTIES ALL COLUMNS
)
# Each bill-of-materials row is an edge from a child item-location
# (an input) to the parent item-location that consumes it.
EDGE TABLES (
  `restaurant.bom`
    KEY (bomKey)
    SOURCE KEY (childItemLocation) REFERENCES `restaurant.itemlocation` (itemLocationKey)
    DESTINATION KEY (parentItemLocation) REFERENCES `restaurant.itemlocation` (itemLocationKey)
    LABEL consists_of PROPERTIES ALL COLUMNS
);&lt;/code&gt;&lt;/pre&gt;&lt;/dd&gt;
&lt;/dl&gt;&lt;/div&gt;
&lt;div class="block-image_full_width"&gt;






  
    &lt;div class="article-module h-c-page"&gt;
      &lt;div class="h-c-grid"&gt;
  

    &lt;figure class="article-image--large
      
      
        h-c-grid__col
        h-c-grid__col--6 h-c-grid__col--offset-3
        
        
      "
      &gt;

      
      
        
        &lt;img
            src="https://storage.googleapis.com/gweb-cloudblog-publish/images/1_6on1ArC.max-1000x1000.png"
        
          alt="1"&gt;
        
      
        &lt;figcaption class="article-image__caption "&gt;&lt;p data-block-key="zg2w6"&gt;Image of a fictitious restaurant supply chain modeled using BigQuery Graph&lt;/p&gt;&lt;/figcaption&gt;
      
    &lt;/figure&gt;

  
      &lt;/div&gt;
    &lt;/div&gt;
  




&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Precision in practice&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;How does this change daily operations? It moves the business from panic to precision.&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Surgical recalls:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; If a supplier reports a Listeria breakout, you walk the graph forward to find exactly which menu items in which specific restaurants are affected.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Weather risk analysis:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; When a hurricane threatens a distribution center, you don't see a list of stores; you see the blast radius. You identify the locations critically dependent on that hub and reroute supplies.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
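&lt;p&gt;Walking the graph forward is a breadth-first traversal along the &lt;code&gt;consists_of&lt;/code&gt; edges, from an ingredient to everything built from it. The Python sketch below runs that traversal in memory over a handful of hypothetical item-location keys, only to illustrate the logic. In BigQuery Graph itself you would express the walk as a graph query rather than export the edges. The edge direction follows the &lt;code&gt;SOURCE KEY&lt;/code&gt; (child) and &lt;code&gt;DESTINATION KEY&lt;/code&gt; (parent) columns of the property graph definition above.&lt;/p&gt;

```python
from collections import deque

# Hypothetical consists_of edges as (child, parent) item-location keys, following
# the SOURCE KEY (child) -> DESTINATION KEY (parent) direction of the graph.
EDGES = [
    ("chicken@supplier_a", "chicken@dc1"),
    ("chicken@dc1", "marinated_chicken@store7"),
    ("chicken@dc1", "marinated_chicken@store9"),
    ("marinated_chicken@store7", "chicken_wrap@store7"),
    ("lettuce@dc2", "chicken_wrap@store7"),
]


def walk_forward(edges, start_nodes):
    """Breadth-first walk along child -> parent edges.

    Returns every item-location that directly or indirectly consumes one of
    start_nodes: the recall scope, or the "blast radius" of a disruption.
    """
    parents_of = {}
    for child, parent in edges:
        parents_of.setdefault(child, []).append(parent)
    affected = set()
    queue = deque(start_nodes)
    while queue:
        node = queue.popleft()
        for parent in parents_of.get(node, []):
            if parent not in affected:  # visit each node once, even in shared sub-assemblies
                affected.add(parent)
                queue.append(parent)
    return affected


# Recall: everything downstream of a contaminated supplier lot.
print(sorted(walk_forward(EDGES, ["chicken@supplier_a"])))
# Weather risk: everything downstream of stock held at a threatened distribution center.
print(sorted(walk_forward(EDGES, ["chicken@dc1"])))
```

&lt;p&gt;The same traversal serves both use cases. Starting from a supplier lot gives the recall scope. Starting from the stock held at a distribution center gives that hub's blast radius.&lt;/p&gt;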
&lt;h4&gt;&lt;strong style="vertical-align: baseline;"&gt;2. Executing the search&lt;/strong&gt;&lt;/h4&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Graph Queries are a new tool for modelers and data scientists to query their data - it simplifies complex multi-domain data concepts and simplifies querying and makes data analysis a simpler more natural representation of problem articulation. For example: If I want to know which all locations handle chicken I could run a graph query as shown below:&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;To investigate a specific complaint or risk, you run a search on the model using graph query language. &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Graph Query Language&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-code"&gt;&lt;dl&gt;
    &lt;dt&gt;code_block&lt;/dt&gt;
    &lt;dd&gt;&lt;pre&gt;&lt;code&gt;# Navigate to the source of a specific ingredient issue
# a is a child (input) item-location; b is the parent that consumes it via edge c.
GRAPH restaurant.bombod
MATCH (a:itemlocation)-[c:consists_of]-&amp;gt;(b:itemlocation)
WHERE b.itemKey LIKE '%Chicken%'
RETURN TO_JSON([TO_JSON(a), TO_JSON(c), TO_JSON(b)]) AS result&lt;/code&gt;&lt;/pre&gt;&lt;/dd&gt;
&lt;/dl&gt;&lt;/div&gt;
&lt;div class="block-image_full_width"&gt;






  
    &lt;div class="article-module h-c-page"&gt;
      &lt;div class="h-c-grid"&gt;
  

    &lt;figure class="article-image--large
      
      
        h-c-grid__col
        h-c-grid__col--6 h-c-grid__col--offset-3
        
        
      "
      &gt;

      
      
        
        &lt;img
            src="https://storage.googleapis.com/gweb-cloudblog-publish/images/2_aIlciIs.max-1000x1000.png"
        
          alt="2"&gt;
        
      
        &lt;figcaption class="article-image__caption "&gt;&lt;p data-block-key="zg2w6"&gt;Source of a foul odor - modeled as a graph&lt;/p&gt;&lt;/figcaption&gt;
      
    &lt;/figure&gt;

  
      &lt;/div&gt;
    &lt;/div&gt;
  




&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Building for the future&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;To get the most out of your digital twin, follow these guiding principles:&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Focus on structure:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Use graphs for relationships and dependencies; keep daily sales totals in relational tables.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Clean your keys:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Spend time on data engineering; a graph is only as strong as its connections.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Capture edge properties:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Store metadata like lead times or shipping costs directly on the edges to increase the model's utility.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
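&lt;p&gt;To show why edge properties pay off, here is a small Python sketch. It stores a hypothetical &lt;code&gt;lead_time_days&lt;/code&gt; value on each &lt;code&gt;consists_of&lt;/code&gt; edge and computes the longest cumulative lead time before an item can be produced. The keys and numbers are made up, and the recursion assumes that the bill of materials has no cycles.&lt;/p&gt;

```python
from functools import lru_cache

# Hypothetical consists_of edges (child, parent, lead_time_days), with the lead
# time stored as an edge property.
EDGES = [
    ("raw_chicken@supplier_a", "raw_chicken@dc1", 2),
    ("raw_chicken@dc1", "marinated_chicken@store7", 1),
    ("spice_mix@dc1", "marinated_chicken@store7", 4),
    ("marinated_chicken@store7", "chicken_wrap@store7", 0),
]

# Index each item-location's inputs: parent -> [(child, lead_time_days), ...].
INPUTS = {}
for child, parent, days in EDGES:
    INPUTS.setdefault(parent, []).append((child, days))


@lru_cache(maxsize=None)
def lead_time_days(node):
    """Longest cumulative lead time (in days) before `node` can be produced.

    Assumes an acyclic graph; an item-location with no inputs takes 0 days.
    """
    return max(
        (lead_time_days(child) + days for child, days in INPUTS.get(node, [])),
        default=0,
    )


print(lead_time_days("chicken_wrap@store7"))
```

&lt;p&gt;The result is a critical-path figure. An item is only ready when its slowest chain of inputs arrives, which is why the sketch takes the maximum over inputs rather than the sum.&lt;/p&gt;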
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Conclusion&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;The restaurant industry has outgrown the relational way of treating business data only as a list. By building inter-domain relationships as a digital twin with BigQuery Graph, you move from reactive problem solving to proactive modeling. It’s time to stop managing your network with a list and start seeing the connections in seconds.&lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Get started today&lt;/strong&gt;&lt;/h3&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Check out the tutorial &lt;/strong&gt;&lt;a href="https://codelabs.developers.google.com/codelabs/supplychaingraph#0" rel="noopener" target="_blank"&gt;&lt;strong style="text-decoration: underline; vertical-align: baseline;"&gt;here&lt;/strong&gt;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Visit the BigQuery documentation:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; find &lt;/span&gt;&lt;a href="https://docs.cloud.google.com/bigquery/docs/graph-overview"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;overview &lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;and &lt;/span&gt;&lt;a href="https://docs.cloud.google.com/bigquery/docs/graph-create"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;quickstart guide&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Share your feedback:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; join our &lt;/span&gt;&lt;a href="http://tinyurl.com/bqgraph-userforum" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;community&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, and get your questions answered via &lt;/span&gt;&lt;a href="mailto:bq-graph-preview-support@google.com"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;bq-graph-preview-support@google.com&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;&lt;strong style="vertical-align: baseline;"&gt;Related blog: &lt;/strong&gt;&lt;a href="https://cloud.google.com/blog/products/data-analytics/introducing-bigquery-graph?e=48754805"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Introducing BigQuery Graph&lt;/span&gt;&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;&lt;/div&gt;</description><pubDate>Mon, 01 Jun 2026 16:00:00 +0000</pubDate><guid>https://cloud.google.com/blog/products/data-analytics/modeling-a-digital-twin-using-bigquery-graph/</guid><category>BigQuery</category><category>Databases</category><category>Data Analytics</category><og xmlns:og="http://ogp.me/ns#"><type>article</type><title>Modeling a digital twin of a food supply chain using BigQuery Graph</title><description></description><site_name>Google</site_name><url>https://cloud.google.com/blog/products/data-analytics/modeling-a-digital-twin-using-bigquery-graph/</url></og><author xmlns:author="http://www.w3.org/2005/Atom"><name>Guru Rangavittal</name><title>Cloud Transformation Technical Lead, Google Cloud</title><department></department><company></company></author><author xmlns:author="http://www.w3.org/2005/Atom"><name>Candice Chen</name><title>Product Manager, BigQuery</title><department></department><company></company></author></item><item><title>Cool stuff Google Cloud customers built, May edition: Agentic algorithms for supply chains; virtual try-on APIs; robotic camera operators &amp; more</title><link>https://cloud.google.com/blog/topics/customers/cool-stuff-google-cloud-customers-built-monthly-round-up/</link><description>&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;AI and cloud technology are reshaping every corner of every industry around the world. Without our customers, who are building the future on our platform, there would be no Google Cloud. In this &lt;/span&gt;&lt;a href="https://cloud.google.com/blog/topics/customers/cool-stuff-google-cloud-customers-built-monthly-round-up-april-2026"&gt;&lt;span style="font-style: italic; text-decoration: underline; vertical-align: baseline;"&gt;regular round-up&lt;/span&gt;&lt;/a&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;, we dive into some of the exciting projects redefining businesses, shaping industries, and creating new categories. &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;For our latest edition, we learn how &lt;/span&gt;&lt;strong style="font-style: italic; vertical-align: baseline;"&gt;Urban Outfitters&lt;/strong&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt; sped up its order management; &lt;/span&gt;&lt;strong style="font-style: italic; vertical-align: baseline;"&gt;BASF&lt;/strong&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt; uses AlphaEvolve algorithms to map global supply chains; the unification strategy for &lt;/span&gt;&lt;strong style="font-style: italic; vertical-align: baseline;"&gt;UKG&lt;/strong&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;’s workforce intelligence; &lt;/span&gt;&lt;strong style="font-style: italic; vertical-align: baseline;"&gt;WPP&lt;/strong&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;’s secrets to training humanoid robot camera operators; how &lt;/span&gt;&lt;strong style="font-style: italic; vertical-align: baseline;"&gt;Breuninger&lt;/strong&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt; piloted Virtual Try-On APIs; creating automated video clips with &lt;/span&gt;&lt;strong style="font-style: italic; vertical-align: baseline;"&gt;Glance&lt;/strong&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;; and &lt;/span&gt;&lt;strong style="font-style: italic; vertical-align: baseline;"&gt;Movix&lt;/strong&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt; improves the production of dental aligners.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;Be sure to check back next month to see how more industry leaders and exciting startups are putting Google Cloud technologies to use. And if you haven’t already, please peruse our list of &lt;/span&gt;&lt;a href="https://workspace.google.com/blog/ai-and-machine-learning/how-our-customers-are-using-ai-for-business" rel="noopener" target="_blank"&gt;&lt;span style="font-style: italic; text-decoration: underline; vertical-align: baseline;"&gt;1,302 real-world gen AI use cases&lt;/span&gt;&lt;/a&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt; from our customers.&lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Urban Outfitters saves big by migrating order management&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Who:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Urban Outfitters, Inc. (URBN), the popular clothing and home goods retailer, relies on IBM Sterling OMS as the nerve center of its global ecommerce operations. However, the foundation of this critical system — a massive 11TB Oracle database — was increasingly becoming a bottleneck.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://cloud.google.com/blog/products/databases/urban-outfitters-moves-sterling-oms-to-alloydb-for-postgresql"&gt;&lt;strong style="text-decoration: underline; vertical-align: baseline;"&gt;What they did:&lt;/strong&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; URBN completed a major infrastructure upgrade, migrating its IBM Sterling OMS from an Oracle database to &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;Google Cloud's AlloyDB for PostgreSQL&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;. To enhance performance and provide high availability and scalability, the AlloyDB deployment architecture includes two read replicas, providing low-latency access to data for reporting and analytics. Google Cloud and IBM teams also assisted URBN in a rigorous, iterative switchover testing strategy.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Why it matters:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; The migration to AlloyDB has fundamentally reshaped URBN’s data strategy, delivering a &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;more favorable total cost of ownership&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; through an optimized storage and compute architecture, without sacrificing performance or reliability. Furthermore, the shift to a PostgreSQL-compatible database gave URBN the flexibility of an open-source ecosystem, providing &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;freedom from vendor lock-in&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;, as well as &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;significant speed improvements &lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;that enhanced responsiveness.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Learn from us:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; "URBN’s successful migration serves as a blueprint for organizations looking to modernize their mission-critical infrastructure and future-proof their environment for AI expansion. This journey proves that even the most complex, mission-critical migrations can be achieved through deep cross-organizational partnership and a phased, risk-mitigated approach." – &lt;/span&gt;&lt;strong style="font-style: italic; vertical-align: baseline;"&gt;Rob Frieman&lt;/strong&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;, CIO, Urban Outfitters &amp;amp;&lt;/span&gt;&lt;strong style="font-style: italic; vertical-align: baseline;"&gt; Raj Pai&lt;/strong&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;, VP, Product Management, Databases, Google Cloud&lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;BASF manages supply chain decisions with AlphaEvolve&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Who:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; BASF Agricultural Solutions manages a complex network of 180 production sites with more than 5,000 distinct value chains. Currently, human planners make thousands of local decisions every day on what to produce, when to produce it, and how much safety stock to hold.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://cloud.google.com/blog/products/ai-machine-learning/how-basf-manages-thousands-of-supply-chain-decisions-with-alphaevolve"&gt;&lt;strong style="text-decoration: underline; vertical-align: baseline;"&gt;What they did:&lt;/strong&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; To understand how local decisions ripple across their entire global network, BASF turned to &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;AlphaEvolve on Google Cloud&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; to build a digital twin of their supply chain. In collaboration with Google Cloud and prognostica GmbH, BASF fed the model three years of historical data and then generated variations of the code, mutating the logic to see if it could simulate a supply chain that matched the real-world historical data.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Why it matters:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; By running thousands of experiments, AlphaEvolve developed a clear, human-readable algorithm that explains how the BASF network truly operates. The final algorithm successfully mirrored the actual historical performance of the supply chain, significantly &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;reducing the error rates&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; compared to the initial seed model. It automatically discovered factually correct, domain-specific supply chain rules, providing a clear foundation for &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;optimizing asset utilization globally&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Learn from us:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; “We had several attempts to build a digital twin. … By using AlphaEvolve, we cannot only map the complex network based on system data, but at the same time understand and copy the human decisions that drive our daily operations.” – &lt;/span&gt;&lt;strong style="font-style: italic; vertical-align: baseline;"&gt;Dr. Goetz Krabbe&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;, &lt;/span&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;vice president for global supply chain at BASF&lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;UKG unlocks real-time workforce intelligence at scale&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Who:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; UKG is one of the leading providers of human capital management (HCM) and workforce management (WFM) solutions, but years of growth led to backend sprawl. They have 126 application teams, dozens of tech stacks, and more than 12,000 database instances.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://cloud.google.com/blog/products/databases/how-ukg-taps-workforce-intelligence-with-the-agentic-data-cloud"&gt;&lt;strong style="text-decoration: underline; vertical-align: baseline;"&gt;What they did:&lt;/strong&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; To bring the full UKG suite onto one real-time foundation, the company built People Fabric, a new data and intelligence platform powered by &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;AlloyDB for PostgreSQL&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; and the just-announced &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;Agentic Data Cloud&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;. They created a custom change data capture (CDC) framework to extract changes from existing operational databases, and for larger analytical workloads, the same data flows into &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;BigQuery&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;, while &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;Cloud SQL&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; holds the metadata and tenancy context.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Why it matters:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; People Fabric gives UKG a complete and consistent view of people, work, pay, and culture data that’s updated continuously and ready for AI to use in real time. For engineering teams, People Fabric acts as a database-as-a-service that &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;accelerates development and supports modernization&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; without customer disruption. Additionally, migrating core person and employment data off their on-prem monolith has generated &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;cost savings significant enough to fund half of People Fabric&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Learn from us: “&lt;/strong&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;As we continue expanding People Fabric, we’re laying the groundwork for deeper agentic automation, more responsive analytics, and a growing set of AI-driven capabilities — all on a trusted, scalable foundation built for what’s next.” – &lt;/span&gt;&lt;strong style="font-style: italic; vertical-align: baseline;"&gt;Radhi Chagarlamudi&lt;/strong&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;, Group Vice President, Product Engineering, UKG &amp;amp; &lt;/span&gt;&lt;strong style="font-style: italic; vertical-align: baseline;"&gt;Heather White&lt;/strong&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;, Cloud Data Architect, Google Cloud&lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;WPP accelerates humanoid robot training 10x with G4 VMs&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Who:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; WPP is one of the world’s largest marketing organizations, handling $70 billion of media for enterprise clients. They work on some of the most complex commercial film shoots and were eager to test the viability of robotic cameras to capture more footage, but this required complex training of physical models AI.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://cloud.google.com/blog/products/infrastructure/wpp-humanoid-robots-ai-training"&gt;&lt;strong style="text-decoration: underline; vertical-align: baseline;"&gt;What they did:&lt;/strong&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; WPP used the new &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;G4 VM instance&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; powered by NVIDIA RTX PRO 6000 Blackwell on Google Cloud to tackle the unique challenges of training physical AI for robotics in videography settings. After capturing human motion with the OptiTrack mocap system, they undertook reinforcement learning using the &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;AI Hypercomputer&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; together with the NVIDIA Isaac Sim image. &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;MuJoCo&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;, an open source physics engine by Google DeepMind, was a critical piece of simulation software that validated accuracy continuously, in real-time.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Why it matters:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; WPP was able to utilize a P2P topology that moves data directly between GPUs without the bottleneck of central processing. They saw &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;speed increases in excess of 10x&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;, taking training times down to less than one hour. Through high-volume simulation, the humanoid robots learned how to respond to small changes and bridge the tough "sim-to-real" gap, helping ensure the robot's simulated adaptability translated to safety and stability in the real world.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Learn from us:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; "Our process for mastering complex, natural movement on a film set can be replicated across industries to overcome the massive computational complexity of training robots." – &lt;/span&gt;&lt;strong style="font-style: italic; vertical-align: baseline;"&gt;Perry Nightingale&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;, &lt;/span&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;SVP of Creative AI, WPP&lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Breuninger boosted sales with its "be your own model" AI&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Who:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Breuninger, a fashion and lifestyle company based in Germany, thought emerging generative media models could be a good fit to answer the question every online fashion shopper asks: "How will this look on me?"&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://cloud.google.com/blog/topics/retail/how-breuninger-boosted-sales-with-its-be-your-own-model-ai"&gt;&lt;strong style="text-decoration: underline; vertical-align: baseline;"&gt;What they did:&lt;/strong&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; Working with Google Cloud, they built a virtual try-on experience that lets shoppers see high-end fashion on their own bodies using a simple selfie. Using the &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;Virtual Try-On (VTO) API&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;, Breuninger’s data team worked directly with Google’s engineers to test and refine the technology in three stages, ultimately moving from pre-selected models to a user-first, selfie-based approach. The project was also part of Breuninger’s move to a &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;Flutter&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;-based platform, which helped the team move from its vision to a live launch in only three months.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Why it matters:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; During a six-week A/B test over Black Week and the holiday season, the team found that shoppers who used the virtual try-on &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;converted purchases at a higher rate &lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;than those who didn't. Customer surveys reinforced the numbers: shoppers responded well to the &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;high image quality&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; and the &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;personalized experience&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Learn from us: &lt;/strong&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;“Breuninger continues to refine the experience based on how customers actually use virtual try-on in everyday shopping — the same user-first approach that shaped the project from the start.” – &lt;/span&gt;&lt;strong style="font-style: italic; vertical-align: baseline;"&gt;Daniel Rascher&lt;/strong&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;, Senior Product Owner, Breuninger &amp;amp; &lt;/span&gt;&lt;strong style="font-style: italic; vertical-align: baseline;"&gt;Dr. Michael Menzel&lt;/strong&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;, Customer AI Specialist, Google Cloud&lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Glance turns hours of video into mobile-ready clips&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Who:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Glance, a mobile-first content platform, processes 1-2 hour videos from sources like podcasts, news reports, movies, and web series, and transforms them into 30 to 180-second vertical clips optimized for mobile lock screens.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://cloud.google.com/blog/products/media-entertainment/how-glance-turns-hours-of-video-into-mobile-ready-clips-with-ai"&gt;&lt;strong style="text-decoration: underline; vertical-align: baseline;"&gt;What they did:&lt;/strong&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; The goal was to create a complete pipeline that takes a long-form landscape video (16:9) and outputs multiple ready-to-publish short-form portrait videos (9:16). The final technical solution uses &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;Google Cloud Speech-to-Text v2&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;, &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;Gemini&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;, and the &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;Google Vision API&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;, combined with custom video manipulation using Samurai (an open-source object tracking tool), OpenCV and MoviePy. The process involves audio extraction, speech-to-text transcription, and using &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;Gemini 2.5 Flash&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; to analyze transcript text and identify optimal start and end timestamps for short video clips.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Why it matters:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; With daily volume projected to grow from 3,500 to over 10,000 videos per day, manual editing wasn’t a realistic path forward. Glance’s video pipeline demonstrates what becomes possible when AI handles the repetitive, judgement-intensive work of video editing. The system transforms thousands of long-form videos into mobile-ready clips each day, &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;preserving narrative context while optimizing for vertical viewing&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;. Rather than choosing between scale and quality, automated pipelines can &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;deliver both&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;.&lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;Learn from us:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; &lt;/span&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;“Glance’s video pipeline demonstrates what becomes possible when AI handles the repetitive, judgement-intensive work of video editing. … The approach offers a template for any organization sitting on long-form video archives. Rather than choosing between scale and quality, automated pipelines can deliver both.” – &lt;/span&gt;&lt;strong style="font-style: italic; vertical-align: baseline;"&gt;Himanshu Aggarwal&lt;/strong&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;,&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;Machine Learning Engineer, Glance &amp;amp; &lt;/span&gt;&lt;strong style="font-style: italic; vertical-align: baseline;"&gt;Sharmila Devi&lt;/strong&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;, AI Consulting Lead, Google Cloud&lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Movix fills a gap in dental skills with specialized agentic AI&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Who:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Movix is building one of the first agentic AI solutions for dental appliance manufacturers and dental labs, to help solve a serious shortage of skilled dental technicians in aligner manufacturing.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://cloud.google.com/blog/topics/startups/filling-the-gaps-in-dental-skills-with-specialized-agentic-ai"&gt;&lt;strong style="text-decoration: underline; vertical-align: baseline;"&gt;What they did:&lt;/strong&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; Movix developed custom models for deep learning, computer vision, and 3D mesh analysis over a five-month period, using Google Cloud infrastructure. Once defects are detected, they use the &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;Gemini Enterprise Agent Platform&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; to generate client-facing feedback that reads as if it came directly from a human technician. Their 3D models use &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;Cloud Run with L4 GPUs&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; for the massive compute power required, and they use &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;Compute Engine VMs&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; to run experiments and train models.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Why it matters:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Movix’s agentic solutions automate data entry and quality control, which are traditionally manual, time-consuming, and error-prone tasks. The automation and higher level of accuracy the QC agent delivers can &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;save $300 per remake&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; for an aligner manufacturer, and speed up the appliance manufacturing process with quicker turnaround times.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Learn from us:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; “&lt;/span&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;We plan to build hybrid solutions … designing an architecture that connects our cloud-based AI agents with older, on-premises software that many conservative labs still use — through lightweight local connectors and standardized APIs. This will allow us to access a large market segment that has not yet migrated to the cloud.&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;” – &lt;/span&gt;&lt;strong style="font-style: italic; vertical-align: baseline;"&gt;Marina Domracheva&lt;/strong&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;,&lt;/span&gt;&lt;strong style="font-style: italic; vertical-align: baseline;"&gt; &lt;/strong&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;CEO, Movix &amp;amp; &lt;/span&gt;&lt;strong style="font-style: italic; vertical-align: baseline;"&gt;Bakit Dzhumagulov, &lt;/strong&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;CTO, Movix&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;</description><pubDate>Fri, 29 May 2026 16:00:00 +0000</pubDate><guid>https://cloud.google.com/blog/topics/customers/cool-stuff-google-cloud-customers-built-monthly-round-up/</guid><category>Partners</category><category>AI &amp; Machine Learning</category><category>Data Analytics</category><category>Application Modernization</category><category>Infrastructure Modernization</category><category>Customers</category><media:content height="540" url="https://storage.googleapis.com/gweb-cloudblog-publish/images/cool_stuff_may.max-600x600.png" width="540"></media:content><og xmlns:og="http://ogp.me/ns#"><type>article</type><title>Cool stuff Google Cloud customers built, May edition: Agentic algorithms for supply chains; virtual try-on APIs; robotic camera operators &amp; 
more</title><description></description><image>https://storage.googleapis.com/gweb-cloudblog-publish/images/cool_stuff_may.max-600x600.png</image><site_name>Google</site_name><url>https://cloud.google.com/blog/topics/customers/cool-stuff-google-cloud-customers-built-monthly-round-up/</url></og><author xmlns:author="http://www.w3.org/2005/Atom"><name>Google Cloud Content &amp; Editorial </name><title></title><department></department><company></company></author></item><item><title>From petabytes to predictions: Easy BigQuery insights in Google Sheets</title><link>https://cloud.google.com/blog/products/data-analytics/using-connected-sheets-to-analyze-bigquery-data/</link><description>&lt;div class="block-paragraph_advanced"&gt;&lt;p style="text-align: justify;"&gt;&lt;span style="vertical-align: baseline;"&gt;Many organizations’ single source of truth is data that resides in BigQuery, Google’s governed, secure and petabyte-scale data platform. However, the "last mile" of ad-hoc analysis, modeling, and reporting often happens where business users are most comfortable: Google Sheets.&lt;/span&gt;&lt;/p&gt;
&lt;p style="text-align: justify;"&gt;&lt;span style="vertical-align: baseline;"&gt;Bridging this gap usually involves exporting data as CSVs. But this is inefficient, creating data silos, version control problems, and security and governance risks. &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;Connected Sheets&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; helps to eliminate this trade-off, turning the familiar Google Sheets interface into a &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;direct, live window&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; into your BigQuery data platform, letting you analyze petabytes of data quickly, securely, and easily.&lt;/span&gt;&lt;/p&gt;
&lt;p style="text-align: justify;"&gt;&lt;span style="vertical-align: baseline;"&gt;In this post, we’ll do a quick overview of Connected Sheets, walk through real-world use cases, and show you how to perform enterprise-grade data analysis using BigQuery directly in Google Sheets. &lt;/span&gt;&lt;/p&gt;
&lt;h3 style="text-align: justify;"&gt;&lt;span style="vertical-align: baseline;"&gt;A live window into the single source of truth&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Business users often wait days or weeks for simple reports. Connected Sheets solves this by letting you analyze your critical data via a secure, direct connection to billions of rows of live data, with no SQL required. &lt;/span&gt;&lt;/p&gt;
&lt;p style="text-align: justify;"&gt;&lt;span style="vertical-align: baseline;"&gt;For &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;data admins&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;, this architecture is appealing because it maintains a strong security and governance posture. They can provision access to specific tables or views, confident that the underlying data cannot be altered from a Connected Sheet. Admins can also take advantage of Google Workspace’s enterprise data protections to control reading, sharing, and copying data throughout its lifecycle.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;For &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;end users&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;, the benefit is immediate agility and ease of use. They can use familiar tools like pivot tables, charts, calculated columns, and formulas to analyze billions of rows of live data as if it were a local file, balancing centralized control with the business's demand for speed. End users don’t have to learn technical concepts like databases, schemas, tables, and query languages like SQL to access, analyze, and visualize the data.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-image_full_width"&gt;






  
    &lt;div class="article-module h-c-page"&gt;
      &lt;div class="h-c-grid"&gt;
  

    &lt;figure class="article-image--large
      
      
        h-c-grid__col
        h-c-grid__col--6 h-c-grid__col--offset-3
        
        
      "
      &gt;

      
      
        
        &lt;img
            src="https://storage.googleapis.com/gweb-cloudblog-publish/original_images/intro_cs.gif"
        
          alt="intro cs"&gt;
        
        &lt;/a&gt;
      
    &lt;/figure&gt;

  
      &lt;/div&gt;
    &lt;/div&gt;
  




&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;h3&gt;&lt;span style="vertical-align: baseline;"&gt;Key use cases and core journeys&lt;/span&gt;&lt;/h3&gt;
&lt;p style="text-align: justify;"&gt;&lt;span style="vertical-align: baseline;"&gt;We consistently hear about three primary use cases for Connected Sheets from customers across industries. &lt;/span&gt;&lt;/p&gt;
&lt;p style="text-align: justify;"&gt;&lt;strong style="vertical-align: baseline;"&gt;1. Self-service exploratory analysis:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Data teams provide access to curated tables and datasets in BigQuery. Business Analysts in sales, operations, finance, or marketing can then build their own pivot tables or charts that run over the entire live data source directly from Sheets, then filter data to answer day-to-day questions, freeing the data team from a constant backlog of ad-hoc requests.&lt;/span&gt;&lt;/p&gt;
&lt;p style="text-align: justify;"&gt;&lt;strong style="vertical-align: baseline;"&gt;Example: Deep-dive investigation&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation" style="text-align: justify;"&gt;&lt;strong style="vertical-align: baseline;"&gt;Scenario:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; A sales manager analyzes millions of global transactions to review quarterly performance.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation" style="text-align: justify;"&gt;&lt;strong style="vertical-align: baseline;"&gt;Action:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Using a Connected Sheets pivot table, they quickly create a pivot table to summarize revenue by region and product line. When they spot an anomaly — an unexpected revenue spike in EMEA, for example — they simply double-click the summarized value to drill down and learn more about exactly what led to that value.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;&lt;strong style="vertical-align: baseline;"&gt;Outcome:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Connected Sheets instantly queries and retrieves the precise, granular transaction rows behind that summary value, making it easy and fast to find the root cause.&lt;/span&gt;&lt;/li&gt;
&lt;/ul&gt;&lt;/div&gt;
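&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Connected Sheets runs the underlying queries in BigQuery for you, so users never write code. Purely to illustrate what the two steps compute, the Python below mimics them over a small in-memory list: a pivot cell is a sum grouped by two fields (roughly SUM(revenue) with GROUP BY region, product_line in SQL), and drilling down returns the detail rows that share that cell’s values. The field names and data are hypothetical.&lt;/span&gt;&lt;/p&gt;

```python
from collections import defaultdict


def pivot_sum(rows, row_field, col_field, value_field):
    """Sum value_field for every (row_field, col_field) pair, like a SUM pivot."""
    totals = defaultdict(float)
    for r in rows:
        totals[(r[row_field], r[col_field])] += r[value_field]
    return dict(totals)


def drill_down(rows, row_field, col_field, row_value, col_value):
    """Return the detail rows that make up a single pivot cell."""
    return [r for r in rows if r[row_field] == row_value and r[col_field] == col_value]


sales = [
    {"order_id": "A1", "region": "EMEA", "product_line": "Shoes", "revenue": 120.0},
    {"order_id": "A2", "region": "EMEA", "product_line": "Shoes", "revenue": 9800.0},
    {"order_id": "A3", "region": "EMEA", "product_line": "Bags", "revenue": 80.0},
    {"order_id": "B1", "region": "AMER", "product_line": "Shoes", "revenue": 150.0},
]
summary = pivot_sum(sales, "region", "product_line", "revenue")
print(summary[("EMEA", "Shoes")])  # 9920.0
spike = drill_down(sales, "region", "product_line", "EMEA", "Shoes")
print([r["order_id"] for r in spike])  # ['A1', 'A2']
```

&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;In this toy data, the EMEA spike shows up as one large cell total, and the drill-down exposes the single large order behind it.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;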
&lt;div class="block-image_full_width"&gt;






  
    &lt;div class="article-module h-c-page"&gt;
      &lt;div class="h-c-grid"&gt;
  

    &lt;figure class="article-image--large
      
      
        h-c-grid__col
        h-c-grid__col--6 h-c-grid__col--offset-3
        
        
      "
      &gt;

      
      
        
        &lt;img
            src="https://storage.googleapis.com/gweb-cloudblog-publish/original_images/pivot_table_cs.gif"
        
          alt="pivot table cs"&gt;
        
      
    &lt;/figure&gt;

  
      &lt;/div&gt;
    &lt;/div&gt;
  




&lt;/div&gt;
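&lt;div class="block-paragraph_advanced"&gt;&lt;p style="text-align: justify;"&gt;&lt;span style="vertical-align: baseline;"&gt;Conceptually, the drill-down in this example is two operations over the same rows: an aggregation that produces each pivot cell, and a filter that returns the detail rows behind one cell. The Python sketch below illustrates that idea on a few hypothetical in-memory transactions. The TRANSACTIONS list and the pivot_revenue and drill_down functions are illustrative names, not part of any Google API; in Connected Sheets, both steps run as queries against BigQuery rather than as local code.&lt;/span&gt;&lt;/p&gt;

```python
from collections import defaultdict

# Hypothetical transaction rows; in Connected Sheets these would live in BigQuery.
TRANSACTIONS = [
    {"id": 1, "region": "EMEA", "product": "Widgets", "revenue": 1200.0},
    {"id": 2, "region": "EMEA", "product": "Widgets", "revenue": 98000.0},
    {"id": 3, "region": "AMER", "product": "Widgets", "revenue": 1500.0},
    {"id": 4, "region": "EMEA", "product": "Gadgets", "revenue": 700.0},
]


def pivot_revenue(rows):
    """Sum revenue by (region, product), like the value cells of a pivot table."""
    totals = defaultdict(float)
    for row in rows:
        totals[(row["region"], row["product"])] += row["revenue"]
    return dict(totals)


def drill_down(rows, region, product):
    """Return the detail rows behind one summarized cell (the double-click step)."""
    return [r for r in rows if r["region"] == region and r["product"] == product]


if __name__ == "__main__":
    summary = pivot_revenue(TRANSACTIONS)
    print(summary[("EMEA", "Widgets")])  # 99200.0
    for row in drill_down(TRANSACTIONS, "EMEA", "Widgets"):
        print(row["id"], row["revenue"])
```

&lt;p style="text-align: justify;"&gt;&lt;span style="vertical-align: baseline;"&gt;Running the script prints the EMEA Widgets total, then the two transaction rows that make it up, including the large one that explains the spike.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;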
&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;2. Operational reporting:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Business users can create live, refreshable, and easy-to-understand dashboard-like views of their data that their partner teams can rely on and share with executives and leads.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Example: Automated executive summary&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation" style="text-align: justify;"&gt;&lt;strong style="vertical-align: baseline;"&gt;Scenario:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; An operations lead provides weekly updates on sales invoices to their leadership, based on a BigQuery dataset with millions of rows.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation" style="text-align: justify;"&gt;&lt;strong style="vertical-align: baseline;"&gt;Action:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; The operations lead creates their Connected Sheet and builds a series of charts to visualize invoice trends over time. They then configure the sheet to automatically refresh on a schedule every Monday morning, so it’s always ready ahead of their executive review.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;&lt;strong style="vertical-align: baseline;"&gt;Outcome:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; The manual routine of exporting data and pasting it into workbooks is completely eliminated. Leadership gets a reliable report and analysis powered by the latest warehouse data.&lt;/span&gt;&lt;/li&gt;
&lt;/ul&gt;&lt;/div&gt;
&lt;div class="block-image_full_width"&gt;






  
    &lt;div class="article-module h-c-page"&gt;
      &lt;div class="h-c-grid"&gt;
  

    &lt;figure class="article-image--large
      
      
        h-c-grid__col
        h-c-grid__col--6 h-c-grid__col--offset-3
        
        
      "
      &gt;

      
      
        
        &lt;img
            src="https://storage.googleapis.com/gweb-cloudblog-publish/original_images/schedule_refresh_cs.gif"
        
          alt="schedule refresh cs"&gt;
        
      
    &lt;/figure&gt;

  
      &lt;/div&gt;
    &lt;/div&gt;
  




&lt;/div&gt;
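&lt;div class="block-paragraph_advanced"&gt;&lt;p style="text-align: justify;"&gt;&lt;span style="vertical-align: baseline;"&gt;Scheduled refresh is set up in the Sheets interface rather than in code. To make the "every Monday morning" cadence concrete, this Python sketch computes the next refresh time for a weekly schedule. The 07:00 refresh hour and the next_monday_refresh helper are assumptions for illustration only; they do not describe how Sheets schedules refreshes internally.&lt;/span&gt;&lt;/p&gt;

```python
from datetime import datetime, timedelta

MONDAY = 0  # datetime.weekday() numbers Monday as 0 and Sunday as 6


def next_monday_refresh(now: datetime, hour: int = 7) -> datetime:
    """Return the next Monday at hour:00 that is strictly after `now`.

    The 07:00 default is a hypothetical stand-in for "Monday morning".
    """
    days_ahead = (MONDAY - now.weekday()) % 7
    candidate = (now + timedelta(days=days_ahead)).replace(
        hour=hour, minute=0, second=0, microsecond=0
    )
    if now >= candidate:
        # This week's slot has already passed, so move to the following Monday.
        candidate += timedelta(days=7)
    return candidate


if __name__ == "__main__":
    print(next_monday_refresh(datetime(2026, 6, 3, 9, 30)))  # 2026-06-08 07:00:00
```

&lt;p style="text-align: justify;"&gt;&lt;span style="vertical-align: baseline;"&gt;Any time after one Monday's 07:00 slot resolves to the following Monday at 07:00, so the report is always refreshed before the start of the work week.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;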
&lt;div class="block-paragraph_advanced"&gt;&lt;p style="text-align: justify;"&gt;&lt;strong style="vertical-align: baseline;"&gt;3. Hybrid data modeling:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Data practitioners often need to blend governed warehouse data with real-time manual inputs and annotations. For example, a finance team might pull revenue data from BigQuery and combine it with manual procurement entries from your ERP system in a separate tab, using VLOOKUP to create a consolidated view for month-end reporting.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Example: Custom business metrics&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation" style="text-align: justify;"&gt;&lt;strong style="vertical-align: baseline;"&gt;Scenario:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; A financial analyst calculates custom commission payouts based on live sales data from your CRM system. The commission tier logic changes frequently and isn't modeled in the central data warehouse.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation" style="text-align: justify;"&gt;&lt;strong style="vertical-align: baseline;"&gt;Action:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Instead of requesting a new data pipeline from their data team, the analyst can add a calculated column directly within the Connected Sheet. They use standard spreadsheet formulas (like &lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;IF&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt; or &lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;IFS&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;) to apply custom business logic directly against the BigQuery data.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation" style="text-align: justify;"&gt;&lt;strong style="vertical-align: baseline;"&gt;Outcome:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; The analyst retains the flexibility to model scenarios and calculate metrics quickly, while maintaining governed BigQuery data as their single source of truth.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
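&lt;p style="text-align: justify;"&gt;&lt;span style="vertical-align: baseline;"&gt;The tier logic in a calculated column like this is a chain of condition/value pairs evaluated top-down. The Python sketch below mirrors that structure; the thresholds, rates, and the commission function are hypothetical, not taken from any real commission plan. In Sheets, IFS returns an #N/A error when no condition is true, so tiered formulas commonly end with TRUE as a catch-all condition, which plays the role of the final else branch here.&lt;/span&gt;&lt;/p&gt;

```python
def commission(sales: float) -> float:
    """Tiered commission payout for one sales total (hypothetical tiers and rates).

    Conditions are checked top-down and the first match wins, the same way IFS
    evaluates its condition/value pairs; the final branch is the catch-all.
    """
    if sales >= 100_000:
        rate = 0.10
    elif sales >= 50_000:
        rate = 0.07
    elif sales >= 10_000:
        rate = 0.05
    else:
        rate = 0.0
    return round(sales * rate, 2)


if __name__ == "__main__":
    for total in [5_000, 25_000, 75_000, 150_000]:
        print(total, commission(total))
```

&lt;p style="text-align: justify;"&gt;&lt;span style="vertical-align: baseline;"&gt;Because the tiers live in the sheet rather than in the warehouse, changing a threshold means editing one formula instead of waiting for a pipeline change.&lt;/span&gt;&lt;/p&gt;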
&lt;h3 style="text-align: justify;"&gt;&lt;span style="vertical-align: baseline;"&gt;Getting started&lt;/span&gt;&lt;/h3&gt;
&lt;p style="text-align: justify;"&gt;&lt;span style="vertical-align: baseline;"&gt;Connecting Google Sheets to BigQuery is straightforward and requires only a Google Workspace account and a billing-enabled Google Cloud project. There are two primary ways to establish a connection and create a Connected Sheet.&lt;/span&gt;&lt;/p&gt;
&lt;p style="text-align: justify;"&gt;&lt;strong style="vertical-align: baseline;"&gt;Path 1: Starting from Sheets&lt;br/&gt;&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;This is the typical workflow for users who work primarily within spreadsheets.&lt;/span&gt;&lt;/p&gt;
&lt;ol&gt;
&lt;li aria-level="1" style="list-style-type: decimal; vertical-align: baseline;"&gt;
&lt;p role="presentation" style="text-align: justify;"&gt;&lt;span style="vertical-align: baseline;"&gt;Open a new Google Sheet.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: decimal; vertical-align: baseline;"&gt;
&lt;p role="presentation" style="text-align: justify;"&gt;&lt;span style="vertical-align: baseline;"&gt;Navigate to &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;Data &amp;gt; Data Connectors &amp;gt; Connect to BigQuery&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: decimal; vertical-align: baseline;"&gt;
&lt;p role="presentation" style="text-align: justify;"&gt;&lt;span style="vertical-align: baseline;"&gt;Select your billing-enabled Google Cloud project.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: decimal; vertical-align: baseline;"&gt;
&lt;p role="presentation" style="text-align: justify;"&gt;&lt;span style="vertical-align: baseline;"&gt;Browse available datasets, select a Saved Query to connect right away, or input a custom SQL query. &lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: decimal; vertical-align: baseline;"&gt;
&lt;p role="presentation" style="text-align: justify;"&gt;&lt;span style="vertical-align: baseline;"&gt;Click &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;Connect&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p style="text-align: justify;"&gt;&lt;strong style="vertical-align: baseline;"&gt;Path 2: Starting from BigQuery&lt;br/&gt;&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;This workflow is common for data analysts starting from the Google Cloud console.&lt;/span&gt;&lt;/p&gt;
&lt;ol&gt;
&lt;li aria-level="1" style="list-style-type: decimal; vertical-align: baseline;"&gt;
&lt;p role="presentation" style="text-align: justify;"&gt;&lt;span style="vertical-align: baseline;"&gt;Navigate to the BigQuery UI in the console.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: decimal; vertical-align: baseline;"&gt;
&lt;p role="presentation" style="text-align: justify;"&gt;&lt;span style="vertical-align: baseline;"&gt;In the Explorer pane, locate the table or query result you wish to analyze.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: decimal; vertical-align: baseline;"&gt;
&lt;p role="presentation" style="text-align: justify;"&gt;&lt;span style="vertical-align: baseline;"&gt;Click the &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;Export&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; menu (or the three-dot action menu) next to the asset.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: decimal; vertical-align: baseline;"&gt;
&lt;p role="presentation" style="text-align: justify;"&gt;&lt;span style="vertical-align: baseline;"&gt;Select &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;Open in &amp;gt; Connected Sheets&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;h3 style="text-align: justify;"&gt;&lt;span style="vertical-align: baseline;"&gt;From petabytes to predictions with Connected Sheets&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;We designed Connected Sheets to help you bridge the gap between the scalability of the cloud and the flexibility of the spreadsheet. With Connected Sheets, we’re making it easier than ever for organizations to put data into the hands of the people who need it.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;To explore these features, connect your BigQuery data to Google Sheets today. For more technical details, visit the &lt;/span&gt;&lt;a href="https://cloud.google.com/bigquery/docs/connected-sheets"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Connected Sheets documentation&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;</description><pubDate>Fri, 29 May 2026 16:00:00 +0000</pubDate><guid>https://cloud.google.com/blog/products/data-analytics/using-connected-sheets-to-analyze-bigquery-data/</guid><category>BigQuery</category><category>Data Analytics</category><og xmlns:og="http://ogp.me/ns#"><type>article</type><title>From petabytes to predictions: Easy BigQuery insights in Google Sheets</title><description></description><site_name>Google</site_name><url>https://cloud.google.com/blog/products/data-analytics/using-connected-sheets-to-analyze-bigquery-data/</url></og><author xmlns:author="http://www.w3.org/2005/Atom"><name>Tarak Parekh</name><title>Sr. Product Manager, BigQuery</title><department></department><company></company></author><author xmlns:author="http://www.w3.org/2005/Atom"><name>Laura Gagliano</name><title>Sr. 
Product Manager, Workspace</title><department></department><company></company></author></item><item><title>Evolving Dataflow to process massive datasets for machine learning</title><link>https://cloud.google.com/blog/products/data-analytics/ai-focused-innovations-in-dataflow/</link><description>&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Google created &lt;/span&gt;&lt;a href="https://static.googleusercontent.com/media/research.google.com/en//archive/mapreduce-osdi04.pdf" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;MapReduce&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; more than 20 years ago to solve the data processing scaling problems that the then-young company was running into. Today's AI era demands efficient, large-scale data processing for everything from training frontier models like Gemini by Google DeepMind to powering fully autonomous vehicles like Waymo. &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Many aspects of machine learning, including data ingestion, transformation, and feature extraction, rely heavily on processing massive datasets. To meet this astronomical scale required by efforts across Google, we evolved our data platform, Flume, the successor to the original MapReduce, with innovations focused on &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;scalability&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;, &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;efficiency&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;, and a better &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;developer experience&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;. And many of those innovations are available as part of &lt;/span&gt;&lt;a href="https://cloud.google.com/products/dataflow"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Dataflow&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, &lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;our fully managed batch and streaming platform built on the same core technology Google uses to power its most demanding internal workloads.&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;. In this blog, we provide an overview of the many innovations in the Flume platform, and a glimpse into how Google Cloud customers are putting those features into action with Dataflow. &lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Addressing massive scalability&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;The scale of data processing at Google has exploded over the last 20 years and continues to drive innovation. To tackle the challenges of immense scale, we introduced several features within Google's data processing platform, which are also available in Dataflow::&lt;/span&gt;&lt;/p&gt;
&lt;ol&gt;
&lt;li aria-level="1" style="list-style-type: decimal; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Liquid sharding&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; dynamically splits work units (shards) during execution for on-the-fly rebalancing. This helps pipelines with uneven data distribution and stragglers to maximize worker efficiency as data grows.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: decimal; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Global compute&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; enables enormous scaling by dynamically scheduling workloads across Google's global infrastructure. The system automatically determines the optimal location based on factors like data locality and resource availability.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: decimal; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Automatic pipeline optimization&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; fuses consecutive operations into a single stage. This reduces I/O and stage-transition overhead, allowing large-scale execution to scale more gracefully.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: decimal; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Rate-limiting external API calls&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; manages load on external services. This is essential for modern ML pipelines that frequently call external APIs for tasks like model evaluation, preventing high data volumes from overloading systems.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: decimal; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Tandem pools&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; facilitate serverless remote inference. This feature helps overcome scalability limitations often found in remote inference systems by efficiently hosting, sharing, managing, and autoscaling external model servers.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
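&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;To make liquid sharding more concrete, here is a minimal Python sketch of the core split operation, assuming a shard is a contiguous range of offsets with a current read position. The Shard class and its split method are hypothetical and are not Dataflow's implementation; the idea is loosely similar to the fraction-of-remainder splits that Apache Beam's restriction trackers support. When a worker falls behind, part of its unprocessed range is handed to a new shard that an idle worker can pick up.&lt;/span&gt;&lt;/p&gt;

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Shard:
    """A work unit covering offsets [start, end); `position` is the next offset to process."""

    start: int
    end: int
    position: int

    def remaining(self) -> int:
        """Number of offsets not yet processed."""
        return self.end - self.position

    def split(self, fraction_of_remainder: float = 0.5) -> Optional["Shard"]:
        """Hand part of the unprocessed range to a new shard.

        The original shard keeps [start, split_point); the returned residual covers
        [split_point, end). Returns None when fewer than two offsets remain.
        """
        left = self.remaining()
        if 2 > left:
            return None
        # Keep at least one unprocessed offset on each side of the split point.
        step = min(left - 1, max(1, int(left * fraction_of_remainder)))
        split_point = self.position + step
        residual = Shard(start=split_point, end=self.end, position=split_point)
        self.end = split_point
        return residual


if __name__ == "__main__":
    # A straggler has processed 100 of 1,000 offsets; give half the rest to an idle worker.
    straggler = Shard(start=0, end=1_000, position=100)
    extra = straggler.split()
    print(straggler)  # Shard(start=0, end=550, position=100)
    print(extra)  # Shard(start=550, end=1000, position=550)
```

&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Splitting only the unprocessed remainder means no work is lost or duplicated: the slow worker simply finishes sooner, and the residual range runs in parallel elsewhere.&lt;/span&gt;&lt;/p&gt;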
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Boosting efficiency with accelerators&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Doing more with less isn't just a constraint; it fuels our progress. By finding ways to run more efficiently, we create the space and capacity needed for rapid innovation. This is particularly evident for teams that use accelerators like TPUs for their workloads. To improve utilization and cost efficiency, our engineers devised several novel features for our platform, now part of Dataflow:&lt;/span&gt;&lt;/p&gt;
&lt;ol&gt;
&lt;li aria-level="1" style="list-style-type: decimal; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Heterogeneous worker pools&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; allow developers to specify custom resource requirements for different pipeline stages. For example, TPU-intensive work runs on TPU-equipped workers, while other stages use standard CPU workers. This ensures optimal resource allocation.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: decimal; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;TPU-aware autoscaling&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; prevents excessive initial assignment of TPU workers and improves efficiency during subsequent autoscaling events.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: decimal; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Duty-cycle policy enforcement&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; automatically scales down TPU workloads when the accelerator's duty cycle (the fraction of time it is active) is low, scaling back up only when utilization improves.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: decimal; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;TPU fungibility&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;: By working with other infrastructure teams, we developed optimizations to encourage scheduling jobs to the most suitable TPU version and cell location based on quota and resource availability.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
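&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;The duty-cycle policy can be pictured as a scaling rule with hysteresis: shrink when accelerators sit mostly idle, grow again only once utilization is clearly high. The following Python sketch is hypothetical; the 0.3 and 0.7 thresholds, the halve-or-double step, and the next_tpu_worker_count name are illustrative assumptions, not Dataflow's actual policy or parameters.&lt;/span&gt;&lt;/p&gt;

```python
def next_tpu_worker_count(
    current: int,
    duty_cycle: float,
    low: float = 0.3,
    high: float = 0.7,
    min_workers: int = 1,
    max_workers: int = 64,
) -> int:
    """Hypothetical duty-cycle scaling policy with hysteresis.

    duty_cycle is the fraction of time the accelerators were active (0.0 to 1.0).
    Below `low`, halve the pool; at or above `high`, double it; in between, hold
    steady so the pool does not oscillate around a single threshold.
    """
    if duty_cycle > 1.0 or 0.0 > duty_cycle:
        raise ValueError("duty_cycle must be between 0.0 and 1.0")
    if low > duty_cycle:
        return max(min_workers, current // 2)
    if duty_cycle >= high:
        return min(max_workers, current * 2)
    return current


if __name__ == "__main__":
    workers = 16
    for observed in [0.2, 0.25, 0.5, 0.8, 0.9]:
        workers = next_tpu_worker_count(workers, observed)
        print(observed, workers)
```

&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;The gap between the two thresholds is the design choice that matters: with a single cutoff, a workload hovering near it would be scaled up and down on every evaluation.&lt;/span&gt;&lt;/p&gt;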
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Enhancing the developer experience&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Considering the wide mix of backgrounds and tools across Google, rapid prototyping, iteration, and reliable production operations are extremely important. Google has invested in significant capabilities to enhance the overall user experience:&lt;/span&gt;&lt;/p&gt;
&lt;ol&gt;
&lt;li aria-level="1" style="list-style-type: decimal; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Language flexibility&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; is provided through a versatile SDK with a simple API in C++ (internal to Google), Java, Python, and Go (with SQL support). This allows users to build batch, ML, and streaming pipelines.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: decimal; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Integration&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; with ML frameworks like &lt;/span&gt;&lt;a href="https://docs.jax.dev/" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;JAX&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; is available, along with native support for LLM-specific optimizations. The underlying platform also provides building blocks for robust agentic inference pipelines and supports simple transitions between bulk and streaming paradigms.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: decimal; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Unified batch and streaming&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; enables users to use the same code for both historical batch and live streaming data. This simplifies the architecture, which traditionally would have required separate pipelines for batch and streaming data processing.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: decimal; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Observability&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; for production pipelines is available through the monitoring UI, which offers comprehensive control and essential diagnostic data. Detailed performance metrics, such as stage-level TPU utilization graphs, provide transparency for troubleshooting and optimization tasks.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: decimal; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Advanced developer workflows&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; for quicker day 0 and day 2 operations include features like sampling and dry-run to help ensure code accuracy. Users can also test pipelines on small in-memory collections, and even pause and resume production pipelines.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
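&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;The unified model and in-memory testing go together: if a transform only depends on receiving elements, the same code can process a small test collection, a historical batch, or a live stream. In Apache Beam this is expressed by applying the same PTransforms to a bounded or unbounded PCollection. The stdlib-only Python sketch below mimics the idea with a list and a generator; parse_purchases and simulated_stream are hypothetical names, and the sketch omits the windowing, triggers, and watermarks that real streaming pipelines need.&lt;/span&gt;&lt;/p&gt;

```python
import json
from typing import Iterable, Iterator


def parse_purchases(lines: Iterable[str]) -> Iterator[dict]:
    """Parse JSON-encoded events and keep only purchases.

    The function only needs an iterable, so the same code can consume a bounded
    list (batch) or an unbounded generator (streaming).
    """
    for line in lines:
        event = json.loads(line)
        if event.get("type") == "purchase":
            yield {"user": event["user"], "amount": float(event["amount"])}


def simulated_stream() -> Iterator[str]:
    """Stand-in for an unbounded source that yields events as they arrive."""
    yield '{"type": "view", "user": "a"}'
    yield '{"type": "purchase", "user": "b", "amount": "9.99"}'


if __name__ == "__main__":
    # Batch: a small in-memory collection, also a convenient unit-test fixture.
    history = ['{"type": "purchase", "user": "a", "amount": "5"}']
    print(list(parse_purchases(history)))
    # Streaming: the same function processes records lazily, one at a time.
    for record in parse_purchases(simulated_stream()):
        print(record)
```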
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Dataflow brings innovation from Google's internal platform to Google Cloud &lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Dataflow is built upon Google's internal platform, sharing many core components, including the execution engine and the Apache Beam SDK (which originated from Flume’s APIs). This close relationship means that the cutting-edge solutions we build to handle Google’s internal data processing challenges, like pipelines that process hundreds of billions of documents, directly benefit Dataflow users. In fact, unique Dataflow features like vertical scaling, right fitting, dynamic sharding, and straggler detection all resulted from solutions developed for Google’s internal workloads.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;This is one of the reasons many Google Cloud customers rely on Dataflow for critical ML applications: &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;Spotify&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; uses Dataflow for &lt;/span&gt;&lt;a href="https://engineering.atspotify.com/2023/04/large-scale-generation-of-ml-podcast-previews-at-spotify-with-google-dataflow" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;large-scale generation of ML podcast previews&lt;/span&gt;&lt;/a&gt;&lt;strong style="vertical-align: baseline;"&gt;. Etsy&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; leverages Dataflow for &lt;/span&gt;&lt;a href="https://cloud.google.com/customers/etsy-ai?hl=en&amp;amp;e=48754805"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;data preparation and ETL&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; for its ML workloads. And &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;Moloco&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; uses Dataflow to process &lt;/span&gt;&lt;a href="https://cloud.google.com/customers/moloco"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;terabytes of data a day to update its prediction model&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; for real-time ad bidding.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;The momentum continues: Last quarter we launched support for TPU in Dataflow in addition to supporting GPUs. Looking ahead, we are working on an advanced reliability feature called speculative execution and are enhancing the developer experience with features like failure isolation and replay and pause/resume, which are coming soon. To learn more or get started with Dataflow visit &lt;/span&gt;&lt;a href="https://docs.cloud.google.com/dataflow/docs/get-started"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;https://docs.cloud.google.com/dataflow/docs/get-started&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;. &lt;/span&gt;&lt;/p&gt;&lt;/div&gt;</description><pubDate>Thu, 28 May 2026 16:00:00 +0000</pubDate><guid>https://cloud.google.com/blog/products/data-analytics/ai-focused-innovations-in-dataflow/</guid><category>AI &amp; Machine Learning</category><category>Customers</category><category>Streaming</category><category>Data Analytics</category><og xmlns:og="http://ogp.me/ns#"><type>article</type><title>Evolving Dataflow to process massive datasets for machine learning</title><description></description><site_name>Google</site_name><url>https://cloud.google.com/blog/products/data-analytics/ai-focused-innovations-in-dataflow/</url></og><author xmlns:author="http://www.w3.org/2005/Atom"><name>Shan Kulandaivel</name><title>Group Product Manager</title><department></department><company></company></author><author xmlns:author="http://www.w3.org/2005/Atom"><name>Mustafa Saglam</name><title>Senior Product Manager</title><department></department><company></company></author></item><item><title>The future of agentic development: Redefining the data practitioner lifecycle with Data Agent Kit</title><link>https://cloud.google.com/blog/products/data-analytics/data-agent-kit-brings-data-skills-and-tools-to-your-ide-or-cli/</link><description>&lt;div 
class="block-paragraph_advanced"&gt;&lt;p style="text-align: justify;"&gt;&lt;span style="vertical-align: baseline;"&gt;The modern software development landscape isn’t happening just on one surface — it’s happening across an entire ecosystem of agentic tools. Agents are being developed at an unprecedented scale, and these agents require direct access to enterprise data for context and grounding.&lt;/span&gt;&lt;/p&gt;
&lt;p style="text-align: justify;"&gt;&lt;span style="vertical-align: baseline;"&gt;However, the current tooling for building agents and managing data is heavily fragmented. This can make it difficult to access data, increasing security risks, and causing broken developer experiences that hinder innovation.&lt;/span&gt;&lt;/p&gt;
&lt;p style="text-align: justify;"&gt;&lt;span style="vertical-align: baseline;"&gt;To address this challenge, we recently launched &lt;/span&gt;&lt;a href="https://cloud.google.com/blog/products/data-analytics/whats-new-in-the-agentic-data-cloud"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Data Agent Kit&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, a unified, open-source collection of data engineering and data science skills, tools and plugins that integrate directly into the environments practitioners already use, such as VS Code, Claude Code, Codex, Gemini CLI and the Antigravity CLI. By seamlessly bringing together these core tools and skills with your enterprise data, the Data Agent Kit effectively serves as a comprehensive harness for agentic context, memory, and personalization. It provides:&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation" style="text-align: justify;"&gt;&lt;strong style="vertical-align: baseline;"&gt;Agentic skills:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Pre-codified pathways for interacting with your data estate, covering query optimization, ML best practices, data validation, data drift checks, governance, and troubleshooting.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation" style="text-align: justify;"&gt;&lt;strong style="vertical-align: baseline;"&gt;Model Context Protocol (MCP) tools:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Secure connections between agentic workflows and cloud data platforms like BigQuery, AlloyDB, and Google Cloud Storage. Developers can now configure connection parameters for their cloud datasets and data processing engines without having to manage complex, manual pipeline code.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation" style="text-align: justify;"&gt;&lt;strong style="vertical-align: baseline;"&gt;Plugins and extensions:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Native IDE integrations that enable rich, context-aware developer interactions.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p style="text-align: justify;"&gt;&lt;span style="vertical-align: baseline;"&gt;Together, these Data Agent Kit capabilities help data practitioners go from manually writing code to &lt;/span&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;intent-driven data science and engineering: &lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;defining the desired business outcomes, constraints, and success criteria, and allowing the AI-augmented system to figure out how to execute it. This shift is critical because today, when building agentic applications that navigate complex data architectures, there’s often a 'context window tax' i.e., developers have to manually paste vast amounts of schema metadata into prompts, eating up token limits and increasing latency. Meanwhile, data practitioners often lack guidance about how to efficiently query, optimize, and troubleshoot cloud data, while specialized, fragmented development environments cannot see across your entire data estate. Data Agent Kit helps with these challenges and others, providing the foundational capabilities data practitioners need for a new agentic way of working. &lt;/span&gt;&lt;/p&gt;
&lt;p style="text-align: justify;"&gt;&lt;span style="vertical-align: baseline;"&gt;Read on for an overview of Data Agent Kit’s features and benefits, how to install it and connect your local environment to your data estate, and an intent-driven engineering example&lt;/span&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;.&lt;/span&gt;&lt;/p&gt;
&lt;h3 style="text-align: justify;"&gt;&lt;span style="vertical-align: baseline;"&gt;A unified hub for your data estate and lifecycle&lt;/span&gt;&lt;/h3&gt;
&lt;p style="text-align: justify;"&gt;&lt;span style="vertical-align: baseline;"&gt;Data Agent Kit makes your entire data estate available in a single view. This goes beyond providing a simple catalog for databases such as BigQuery, AlloyDB and Spanner; rather, it integrates data engineering and science tasks, orchestration pipelines, and jobs into a single interface. This allows practitioners to manage their entire data workflow — from discovery to production — without context switching. Data Agent Kit’s intelligent routing automatically chooses the optimal compute engine for your task — whether that’s BigQuery for SQL-native analytics and ELT, or Spark for custom Python transformations and distributed ML training. &lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-image_full_width"&gt;






  
    &lt;div class="article-module h-c-page"&gt;
      &lt;div class="h-c-grid"&gt;
  

    &lt;figure class="article-image--large
      
      
        h-c-grid__col
        h-c-grid__col--6 h-c-grid__col--offset-3
        
        
      "
      &gt;

      
      
        
        &lt;img
            src="https://storage.googleapis.com/gweb-cloudblog-publish/original_images/1_Unified_Catalog.gif"
        
          alt="1 Unified Catalog"&gt;
        
        &lt;/a&gt;
      
        &lt;figcaption class="article-image__caption "&gt;&lt;p data-block-key="pjt1k"&gt;Unified Hub of your entire data estate and lifecycle&lt;/p&gt;&lt;/figcaption&gt;
      
    &lt;/figure&gt;

  
      &lt;/div&gt;
    &lt;/div&gt;
  




&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;h3&gt;&lt;span style="vertical-align: baseline;"&gt;Ecosystem-led&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt; intelligence: C&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;odified agentic skills &lt;/span&gt;&lt;/h3&gt;
&lt;p style="text-align: justify;"&gt;&lt;span style="vertical-align: baseline;"&gt;Data Agent Kit offers a library of predefined agentic skills (e.g., ML best practices, ELT, building data apps) based on Google Cloud’s data engineering and science expertise. Rather than relying on generic LLM prompts, it codifies prescriptive guidelines into your workflow. This allows you to inject enterprise-grade data intelligence directly into your IDE or CLI.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-image_full_width"&gt;






  
    &lt;div class="article-module h-c-page"&gt;
      &lt;div class="h-c-grid"&gt;
  

    &lt;figure class="article-image--large
      
      
        h-c-grid__col
        h-c-grid__col--6 h-c-grid__col--offset-3
        
        
      "
      &gt;

      
      
        
        &lt;img
            src="https://storage.googleapis.com/gweb-cloudblog-publish/original_images/2_Agentic_Skills.gif"
        
          alt="2 Agentic Skills"&gt;
        
        &lt;/a&gt;
      
        &lt;figcaption class="article-image__caption "&gt;&lt;p data-block-key="pjt1k"&gt;Browsing a predefined list of agentic data engineering and science skills&lt;/p&gt;&lt;/figcaption&gt;
      
    &lt;/figure&gt;

  
      &lt;/div&gt;
    &lt;/div&gt;
  




&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;h3 style="text-align: justify;"&gt;&lt;span style="vertical-align: baseline;"&gt;Transforming data exploration through natural language&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Grounded in this unified data, Data Agent Kit delivers native conversational analytics directly within your workspace, making it easy to explore your data. Powered by the same Gemini natural language to SQL technology found in our first-party agents (e.g., Conversational &lt;/span&gt;&lt;a href="https://cloud.google.com/blog/products/data-analytics/introducing-conversational-analytics-in-bigquery?e=48754805"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;BigQuery &lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;and &lt;/span&gt;&lt;a href="https://cloud.google.com/blog/products/business-intelligence/looker-conversational-analytics-now-ga/?e=48754805"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Looker&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;), Data Agent Kit lets you run natural language queries to profile, search, and visualize your datasets. &lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-image_full_width"&gt;






  
    &lt;div class="article-module h-c-page"&gt;
      &lt;div class="h-c-grid"&gt;
  

    &lt;figure class="article-image--large
      
      
        h-c-grid__col
        h-c-grid__col--6 h-c-grid__col--offset-3
        
        
      "
      &gt;

      
      
        
        &lt;img
            src="https://storage.googleapis.com/gweb-cloudblog-publish/original_images/3_Conversational_Analytics.gif"
        
          alt="3 Conversational Analytics"&gt;
        
        &lt;/a&gt;
      
        &lt;figcaption class="article-image__caption "&gt;&lt;p data-block-key="pjt1k"&gt;Within Data Agent Kit, you can use Conversational Analytics to explore your data&lt;/p&gt;&lt;/figcaption&gt;
      
    &lt;/figure&gt;

  
      &lt;/div&gt;
    &lt;/div&gt;
  




&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;h2&gt;&lt;span style="vertical-align: baseline;"&gt;A practical&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt; walkthrough: &lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;Unifying data and building models&lt;/span&gt;&lt;/h2&gt;
&lt;p style="text-align: justify;"&gt;&lt;span style="vertical-align: baseline;"&gt;To see how Data Agent Kit’s skills and MCP tools work together, consider a financial services scenario: Your company is facing rising fraud claims. With your transaction data stored in Cloud Storage, you need to build a high-confidence fraud detection model and schedule orchestration pipelines. Traditionally, this involves hours of data wrangling across multiple consoles. With the Data Agent Kit, you can complete this in minutes, directly within your IDE or CLI. Let’s see how.&lt;/span&gt;&lt;/p&gt;
&lt;h3 style="text-align: justify;"&gt;&lt;span style="vertical-align: baseline;"&gt;Onboarding: The one-minute setup&lt;/span&gt;&lt;/h3&gt;
&lt;p style="text-align: justify;"&gt;&lt;span style="vertical-align: baseline;"&gt;You can get started with the Data Agent Kit in under a minute through an integrated setup process.&lt;/span&gt;&lt;/p&gt;
&lt;p style="text-align: justify;"&gt;&lt;span style="vertical-align: baseline;"&gt;To do so, search for "Google Cloud Data Agent Kit" in your IDE’s marketplace (VS Code) or via the GitHub repo in your CLI (Gemini, Antigravity, Claude, Codex) from the links in the “Get started today” section below. Data Agent Kit automatically configures dependencies and checks your Google Cloud login status.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-image_full_width"&gt;






  
    &lt;div class="article-module h-c-page"&gt;
      &lt;div class="h-c-grid"&gt;
  

    &lt;figure class="article-image--large
      
      
        h-c-grid__col
        h-c-grid__col--6 h-c-grid__col--offset-3
        
        
      "
      &gt;

      
      
        
        &lt;img
            src="https://storage.googleapis.com/gweb-cloudblog-publish/images/4_VS_Code_Marketplace_Extension.max-1000x1000.jpg"
        
          alt="4 VS Code Marketplace Extension"&gt;
        
        &lt;/a&gt;
      
    &lt;/figure&gt;

  
      &lt;/div&gt;
    &lt;/div&gt;
  




&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;p style="text-align: justify;"&gt;&lt;span style="vertical-align: baseline;"&gt;Click the Google Cloud icon in your activity bar to authenticate via IAM. Once logged in, your Cloud Storage, databases, and catalog assets appear instantly in your workspace.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Use the &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;settings&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; menu to set project IDs, regions, and verify MCP status to ensure all backend services are authorized. Data Agent Kit also includes a quick-start guide on using the tools and skills. &lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-image_full_width"&gt;






  
    &lt;div class="article-module h-c-page"&gt;
      &lt;div class="h-c-grid"&gt;
  

    &lt;figure class="article-image--large
      
      
        h-c-grid__col
        h-c-grid__col--6 h-c-grid__col--offset-3
        
        
      "
      &gt;

      
      
        
        &lt;img
            src="https://storage.googleapis.com/gweb-cloudblog-publish/images/5_Data_Agent_Kit_Extension_Installed.max-1000x1000.jpg"
        
          alt="5 Data Agent Kit Extension Installed"&gt;
        
        &lt;/a&gt;
      
    &lt;/figure&gt;

  
      &lt;/div&gt;
    &lt;/div&gt;
  




&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;h3&gt;&lt;span style="vertical-align: baseline;"&gt;An intent-driven data engineering example&lt;/span&gt;&lt;/h3&gt;
&lt;p style="text-align: justify;"&gt;&lt;span style="vertical-align: baseline;"&gt;With Data Agent Kit installed, you can s&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;kip the manual ETL boilerplate, and directly describe your high-level goal to your coding assistant (e.g., Claude Code, GitHub Copilot) in natural language. The assistant leverages Data Agent Kit’s skills to plan and execute the workflow.&lt;/span&gt;&lt;/p&gt;
&lt;p style="text-align: justify;"&gt;&lt;span style="vertical-align: baseline;"&gt;Prompt:&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;p style="text-align: justify;"&gt;&lt;code style="vertical-align: baseline;"&gt;I have the &lt;/code&gt;&lt;strong style="vertical-align: baseline;"&gt;raw transaction logs&lt;/strong&gt;&lt;code style="vertical-align: baseline;"&gt; landing in the &lt;/code&gt;&lt;strong style="vertical-align: baseline;"&gt;GCS &lt;/strong&gt;&lt;code style="vertical-align: baseline;"&gt;bucket gs://fin-clearing-raw/.&lt;/code&gt;&lt;/p&gt;
&lt;p style="text-align: justify;"&gt;&lt;code style="vertical-align: baseline;"&gt;First, &lt;/code&gt;&lt;strong style="vertical-align: baseline;"&gt;create a Spark notebook&lt;/strong&gt;&lt;code style="vertical-align: baseline;"&gt; and (1) &lt;/code&gt;&lt;strong style="vertical-align: baseline;"&gt;ingest &lt;/strong&gt;&lt;code style="vertical-align: baseline;"&gt;these logs into an &lt;/code&gt;&lt;strong style="vertical-align: baseline;"&gt;Iceberg table in BigQuery&lt;/strong&gt;&lt;code style="vertical-align: baseline;"&gt;.&lt;/code&gt;&lt;/p&gt;
&lt;p style="text-align: justify;"&gt;&lt;code style="vertical-align: baseline;"&gt;Second, &lt;/code&gt;&lt;strong style="vertical-align: baseline;"&gt;create a dbt project&lt;/strong&gt;&lt;code style="vertical-align: baseline;"&gt; to (2) &lt;/code&gt;&lt;strong style="vertical-align: baseline;"&gt;deduplicate &lt;/strong&gt;&lt;code style="vertical-align: baseline;"&gt;them, (3) &lt;/code&gt;&lt;strong style="vertical-align: baseline;"&gt;remove the transactions with invalid transaction id&lt;/strong&gt;&lt;code style="vertical-align: baseline;"&gt; and store them in a separate Iceberg table, (4) &lt;/code&gt;&lt;strong style="vertical-align: baseline;"&gt;standardize &lt;/strong&gt;&lt;code style="vertical-align: baseline;"&gt;the timestamps and perform any other necessary cleanup tasks (5) &lt;/code&gt;&lt;strong style="vertical-align: baseline;"&gt;sync the output to another Iceberg table&lt;/strong&gt;&lt;code style="vertical-align: baseline;"&gt; (6) join this output table with tables that have payer and payees identities and write the output to a final Iceberg table.&lt;/code&gt;&lt;/p&gt;
&lt;p style="text-align: justify;"&gt;&lt;code style="vertical-align: baseline;"&gt;Third, I would like you to &lt;/code&gt;&lt;strong style="vertical-align: baseline;"&gt;train an ML model on Spark using a notebook&lt;/strong&gt;&lt;code style="vertical-align: baseline;"&gt; to detect fraudulent transactions in the output table. I am thinking about a LightGBM model but I am open to any suggestions you might have. Use the relevant datasets in the project.&lt;/code&gt;&lt;/p&gt;
&lt;p style="text-align: justify;"&gt;&lt;code style="vertical-align: baseline;"&gt;Finally, &lt;/code&gt;&lt;strong style="vertical-align: baseline;"&gt;create an inferencing step using Spark notebook&lt;/strong&gt;&lt;code style="vertical-align: baseline;"&gt; to the above pipeline to perform batch inferencing and write flagged transactions to a Spanner table.&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;&lt;code style="vertical-align: baseline;"&gt;Create an &lt;/code&gt;&lt;strong style="vertical-align: baseline;"&gt;orchestration pipeline&lt;/strong&gt;&lt;code style="vertical-align: baseline;"&gt; that first runs the ingestion then the dbt and next the inference notebook.&lt;/code&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;h3 style="text-align: justify;"&gt;&lt;strong style="vertical-align: baseline;"&gt;Under the hood: Data pipeline steps&lt;/strong&gt;&lt;/h3&gt;
&lt;p style="text-align: justify;"&gt;&lt;span style="vertical-align: baseline;"&gt;Behind the scenes, Data Agent Kit plans a robust multi-step orchestration of the entire data lifecycle, from exploration to inference. &lt;/span&gt;&lt;/p&gt;
&lt;p style="text-align: justify;"&gt;&lt;strong style="vertical-align: baseline;"&gt;Step 1: Notebook creation, ingestion and initial storage&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; &lt;/span&gt;&lt;/p&gt;
&lt;p style="text-align: justify;"&gt;&lt;span style="vertical-align: baseline;"&gt;Find your bronze data — raw, unfiltered data on financial transactions — and bring it into an Iceberg table before doing the transformations. &lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation" style="text-align: justify;"&gt;&lt;span style="vertical-align: baseline;"&gt;Automatically create a &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;Notebook&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; to ingest the raw logs from Cloud Storage. &lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation" style="text-align: justify;"&gt;&lt;span style="vertical-align: baseline;"&gt;Write the necessary &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;SQL&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;, and store the ingested data into an &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;Iceberg table&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; in BigQuery.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;&lt;/div&gt;
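&lt;div class="block-paragraph_advanced"&gt;&lt;p style="text-align: justify;"&gt;&lt;span style="vertical-align: baseline;"&gt;The SQL written in this step depends on your schema and setup. As a hedged sketch, the Python helper below assembles a &lt;code&gt;CREATE TABLE&lt;/code&gt; statement for a BigQuery table for Apache Iceberg. It uses BigQuery’s &lt;code&gt;WITH CONNECTION&lt;/code&gt; clause and the &lt;code&gt;file_format&lt;/code&gt;, &lt;code&gt;table_format&lt;/code&gt;, and &lt;code&gt;storage_uri&lt;/code&gt; options. All of the following are hypothetical placeholders, not what Data Agent Kit generates:&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;the helper name&lt;/li&gt;
&lt;li&gt;the column list&lt;/li&gt;
&lt;li&gt;the connection ID&lt;/li&gt;
&lt;li&gt;the &lt;code&gt;gs://fin-clearing-lake/&lt;/code&gt; storage path&lt;/li&gt;
&lt;/ul&gt;

```python
def iceberg_table_ddl(table: str, connection: str, storage_uri: str,
                      columns: dict[str, str]) -> str:
    """Build a CREATE TABLE statement for a BigQuery table for Apache Iceberg.

    Every argument is a placeholder; substitute your own project, dataset,
    connection, and Cloud Storage path.
    """
    column_sql = ",\n  ".join(f"{name} {sql_type}" for name, sql_type in columns.items())
    return (
        f"CREATE TABLE `{table}` (\n  {column_sql}\n)\n"
        f"WITH CONNECTION `{connection}`\n"
        "OPTIONS (\n"
        "  file_format = 'PARQUET',\n"
        "  table_format = 'ICEBERG',\n"
        f"  storage_uri = '{storage_uri}'\n"
        ")"
    )


# Hypothetical bronze schema for the raw transaction logs.
ddl = iceberg_table_ddl(
    table="my-project.fin_bronze.transactions_raw",
    connection="my-project.us.lake-connection",
    storage_uri="gs://fin-clearing-lake/bronze/transactions_raw",
    columns={
        "transaction_id": "STRING",
        "payer_id": "STRING",
        "payee_id": "STRING",
        "amount": "NUMERIC",
        "event_ts": "STRING",  # kept as raw text; standardized in the silver layer
    },
)
print(ddl)
```

&lt;p style="text-align: justify;"&gt;&lt;span style="vertical-align: baseline;"&gt;The table’s storage location is deliberately kept separate from the raw landing bucket &lt;code&gt;gs://fin-clearing-raw/&lt;/code&gt;. Reading the raw logs inside the Spark notebook is not shown.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;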
&lt;div class="block-image_full_width"&gt;






  
    &lt;div class="article-module h-c-page"&gt;
      &lt;div class="h-c-grid"&gt;
  

    &lt;figure class="article-image--large
      
      
        h-c-grid__col
        h-c-grid__col--6 h-c-grid__col--offset-3
        
        
      "
      &gt;

      
      
        
        &lt;img
            src="https://storage.googleapis.com/gweb-cloudblog-publish/original_images/6_Ingestion.gif"
        
          alt="6 Ingestion"&gt;
        
        &lt;/a&gt;
      
        &lt;figcaption class="article-image__caption "&gt;&lt;p data-block-key="pjt1k"&gt;Ingestion into a bronze table&lt;/p&gt;&lt;/figcaption&gt;
      
    &lt;/figure&gt;

  
      &lt;/div&gt;
    &lt;/div&gt;
  




&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;p style="text-align: justify;"&gt;&lt;strong style="vertical-align: baseline;"&gt;Step 2: Transformation (dbt Project)&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; &lt;/span&gt;&lt;/p&gt;
&lt;p style="text-align: justify;"&gt;&lt;span style="vertical-align: baseline;"&gt;Now, clean the bronze data into silver and gold tables: &lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation" style="text-align: justify;"&gt;&lt;strong style="vertical-align: baseline;"&gt;Data preparation: &lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;Deduplicate&lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt; &lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;the transaction logs.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation" style="text-align: justify;"&gt;&lt;strong style="vertical-align: baseline;"&gt;Filter invalid IDs:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Identify transactions with invalid IDs and store them in a separate Iceberg table.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation" style="text-align: justify;"&gt;&lt;strong style="vertical-align: baseline;"&gt;Clean and standardize:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Standardize timestamps and perform other necessary cleanup tasks.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation" style="text-align: justify;"&gt;&lt;strong style="vertical-align: baseline;"&gt;Sync:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Output the cleaned data to another Iceberg table, leveraging the BigQuery MCP server.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation" style="text-align: justify;"&gt;&lt;strong style="vertical-align: baseline;"&gt;Enrichment:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Join the cleaned table with payer and payee identity tables.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation" style="text-align: justify;"&gt;&lt;strong style="vertical-align: baseline;"&gt;Final output:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Write the joined dataset to a final Iceberg table.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;&lt;/div&gt;
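&lt;div class="block-paragraph_advanced"&gt;&lt;p style="text-align: justify;"&gt;&lt;span style="vertical-align: baseline;"&gt;In the walkthrough, these steps run inside a dbt project. To show the logic in isolation, here is a minimal pure-Python sketch of the same sequence:&lt;/span&gt;&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Deduplicate the records.&lt;/li&gt;
&lt;li&gt;Split off invalid IDs.&lt;/li&gt;
&lt;li&gt;Standardize timestamps to UTC.&lt;/li&gt;
&lt;li&gt;Join to payer and payee identities.&lt;/li&gt;
&lt;/ol&gt;
&lt;p style="text-align: justify;"&gt;&lt;span style="vertical-align: baseline;"&gt;Several details are assumptions made for this example only:&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;the field names&lt;/li&gt;
&lt;li&gt;the rule that a valid ID is &lt;code&gt;TX&lt;/code&gt; followed by ten digits&lt;/li&gt;
&lt;li&gt;treating offset-less timestamps as UTC&lt;/li&gt;
&lt;/ul&gt;

```python
import re
from datetime import datetime, timezone

# Hypothetical rule: a valid transaction ID is "TX" followed by ten digits.
VALID_TX_ID = re.compile(r"TX\d{10}")


def to_utc(raw: str) -> str:
    """Standardize an ISO 8601 timestamp to UTC (offset-less values assumed UTC)."""
    ts = datetime.fromisoformat(raw)
    if ts.tzinfo is None:
        ts = ts.replace(tzinfo=timezone.utc)
    return ts.astimezone(timezone.utc).isoformat()


def clean(rows: list[dict]) -> tuple[list[dict], list[dict]]:
    """Deduplicate on transaction_id, split off invalid IDs, normalize timestamps."""
    seen: set[str] = set()
    silver, invalid = [], []
    for row in rows:
        tx_id = row["transaction_id"]
        if tx_id in seen:
            continue  # duplicate: keep only the first occurrence
        seen.add(tx_id)
        if not VALID_TX_ID.fullmatch(tx_id):
            invalid.append(row)  # destined for a separate "invalid IDs" table
            continue
        silver.append({**row, "event_ts": to_utc(row["event_ts"])})
    return silver, invalid


def enrich(silver: list[dict], identities: dict[str, str]) -> list[dict]:
    """Inner-join payer and payee IDs against an identity lookup."""
    gold = []
    for row in silver:
        payer = identities.get(row["payer_id"])
        payee = identities.get(row["payee_id"])
        if payer is None or payee is None:
            continue  # drop rows whose parties are unknown (inner-join semantics)
        gold.append({**row, "payer_name": payer, "payee_name": payee})
    return gold


raw = [
    {"transaction_id": "TX0000000001", "event_ts": "2026-05-19T10:00:00+02:00",
     "payer_id": "P1", "payee_id": "P2"},
    {"transaction_id": "TX0000000001", "event_ts": "2026-05-19T10:00:00+02:00",
     "payer_id": "P1", "payee_id": "P2"},
    {"transaction_id": "bad-id", "event_ts": "2026-05-19T09:00:00",
     "payer_id": "P2", "payee_id": "P1"},
]
silver, invalid = clean(raw)
gold = enrich(silver, {"P1": "Alice", "P2": "Bob"})
print(len(silver), len(invalid), gold[0]["event_ts"])  # 1 1 2026-05-19T08:00:00+00:00
```

&lt;p style="text-align: justify;"&gt;&lt;span style="vertical-align: baseline;"&gt;Deduplication runs before validation, mirroring the order in the prompt. A duplicated invalid record therefore lands in the invalid-ID table only once.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;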
&lt;div class="block-image_full_width"&gt;






  
    &lt;div class="article-module h-c-page"&gt;
      &lt;div class="h-c-grid"&gt;
  

    &lt;figure class="article-image--large
      
      
        h-c-grid__col
        h-c-grid__col--6 h-c-grid__col--offset-3
        
        
      "
      &gt;

      
      
        
        &lt;img
            src="https://storage.googleapis.com/gweb-cloudblog-publish/original_images/7_Transformation.gif"
        
          alt="7 Transformation"&gt;
        
        &lt;/a&gt;
      
        &lt;figcaption class="article-image__caption "&gt;&lt;p data-block-key="pjt1k"&gt;Data transformation to create silver and gold tables&lt;/p&gt;&lt;/figcaption&gt;
      
    &lt;/figure&gt;

  
      &lt;/div&gt;
    &lt;/div&gt;
  




&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;p style="text-align: justify;"&gt;&lt;strong style="vertical-align: baseline;"&gt;Step 3: Machine learning and inferencing&lt;/strong&gt;&lt;/p&gt;
&lt;p style="text-align: justify;"&gt;&lt;span style="vertical-align: baseline;"&gt;With your gold table minted, it’s time for some data science: model training and inferencing. Here, the agent hands the clean data from the previous step to the model to spot fraudulent patterns.&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation" style="text-align: justify;"&gt;&lt;strong style="vertical-align: baseline;"&gt;Training:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Use a Spark notebook to train an ML model.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation" style="text-align: justify;"&gt;&lt;strong style="vertical-align: baseline;"&gt;Inference:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Create a Spark notebook inferencing step for batch processing.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation" style="text-align: justify;"&gt;&lt;strong style="vertical-align: baseline;"&gt;Storage:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Write all flagged fraudulent transactions to a Spanner table by leveraging the Spanner MCP.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;&lt;/div&gt;
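&lt;div class="block-paragraph_advanced"&gt;&lt;p style="text-align: justify;"&gt;&lt;span style="vertical-align: baseline;"&gt;At its core, the batch inference step scores each row of the gold table and keeps the rows at or above a decision threshold. The sketch below shows that shape in plain Python. Three pieces are hypothetical stand-ins for the trained model (for example, the LightGBM classifier suggested in the prompt):&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;batch_flag&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;the 0.5 default threshold&lt;/li&gt;
&lt;li&gt;the toy &lt;code&gt;amount&lt;/code&gt;-based scorer&lt;/li&gt;
&lt;/ul&gt;
&lt;p style="text-align: justify;"&gt;&lt;span style="vertical-align: baseline;"&gt;Writing the flagged rows to Spanner is deliberately left out rather than guessing at client-library calls.&lt;/span&gt;&lt;/p&gt;

```python
from typing import Callable, Iterable


def batch_flag(rows: Iterable[dict], score: Callable[[dict], float],
               threshold: float = 0.5) -> list[dict]:
    """Score every row and return the ones at or above the fraud threshold."""
    flagged = []
    for row in rows:
        probability = score(row)
        if probability >= threshold:
            flagged.append({"transaction_id": row["transaction_id"],
                            "fraud_score": round(probability, 3)})
    return flagged


def toy_score(row: dict) -> float:
    """Hypothetical stand-in for a trained model's fraud probability."""
    return 0.92 if row["amount"] > 10_000 else 0.05


rows = [
    {"transaction_id": "TX0000000001", "amount": 25_000.0},
    {"transaction_id": "TX0000000002", "amount": 40.0},
]
print(batch_flag(rows, toy_score))
# [{'transaction_id': 'TX0000000001', 'fraud_score': 0.92}]
```

&lt;p style="text-align: justify;"&gt;&lt;span style="vertical-align: baseline;"&gt;Choosing the threshold is a precision-versus-recall trade-off. Lowering it flags more suspicious transactions at the cost of more false positives.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;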
&lt;div class="block-image_full_width"&gt;






  
    &lt;div class="article-module h-c-page"&gt;
      &lt;div class="h-c-grid"&gt;
  

    &lt;figure class="article-image--large
      
      
        h-c-grid__col
        h-c-grid__col--6 h-c-grid__col--offset-3
        
        
      "
      &gt;

      
      
        
        &lt;img
            src="https://storage.googleapis.com/gweb-cloudblog-publish/original_images/8_ML_Inferencing.gif"
        
          alt="8 ML Inferencing"&gt;
        
        &lt;/a&gt;
      
        &lt;figcaption class="article-image__caption "&gt;&lt;p data-block-key="pjt1k"&gt;Machine learning and inference&lt;/p&gt;&lt;/figcaption&gt;
      
    &lt;/figure&gt;

  
      &lt;/div&gt;
    &lt;/div&gt;
  




&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Step 4: Orchestration and execution&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Finally, you’re ready to move to production and schedule the whole orchestration pipeline: Ingestion -&amp;gt; Transformation -&amp;gt; Inference.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-image_full_width"&gt;






  
    &lt;div class="article-module h-c-page"&gt;
      &lt;div class="h-c-grid"&gt;
  

    &lt;figure class="article-image--large
      
      
        h-c-grid__col
        h-c-grid__col--6 h-c-grid__col--offset-3
        
        
      "
      &gt;

      
      
        
        &lt;img
            src="https://storage.googleapis.com/gweb-cloudblog-publish/original_images/9_Orchestration.gif"
        
          alt="9 Orchestration"&gt;
        
        &lt;/a&gt;
      
        &lt;figcaption class="article-image__caption "&gt;&lt;p data-block-key="pjt1k"&gt;Orchestration pipelines and scheduling runs&lt;/p&gt;&lt;/figcaption&gt;
      
    &lt;/figure&gt;

  
      &lt;/div&gt;
    &lt;/div&gt;
  




&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;p style="text-align: justify;"&gt;&lt;strong style="vertical-align: baseline;"&gt;When things go sideways: Agentic incident management and intelligent recovery&lt;/strong&gt;&lt;/p&gt;
&lt;p style="text-align: justify;"&gt;&lt;span style="vertical-align: baseline;"&gt;If an orchestration pipeline fails, not to worry, Data Agent Kit streamlines resolution using its intelligent incident management capabilities:&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation" style="text-align: justify;"&gt;&lt;strong style="vertical-align: baseline;"&gt;Intelligent diagnosis:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Automatically conducts root cause analysis to pinpoint failure sources&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation" style="text-align: justify;"&gt;&lt;strong style="vertical-align: baseline;"&gt;Autonomous remediation:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Drafts and tests fixes, bypassing manual debugging&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation" style="text-align: justify;"&gt;&lt;strong style="vertical-align: baseline;"&gt;Automated recovery:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Validates and deploys fixes via automated Git workflows&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;&lt;/div&gt;
&lt;div class="block-image_full_width"&gt;






  
    &lt;div class="article-module h-c-page"&gt;
      &lt;div class="h-c-grid"&gt;
  

    &lt;figure class="article-image--large
      
      
        h-c-grid__col
        h-c-grid__col--6 h-c-grid__col--offset-3
        
        
      "
      &gt;

      
      
        
        &lt;img
            src="https://storage.googleapis.com/gweb-cloudblog-publish/original_images/10_Issue_diagnosis_and_remediation.gif"
        
          alt="10 Issue diagnosis and remediation"&gt;
        
        &lt;/a&gt;
      
        &lt;figcaption class="article-image__caption "&gt;&lt;p data-block-key="pjt1k"&gt;Issue diagnosis and remediation&lt;/p&gt;&lt;/figcaption&gt;
      
    &lt;/figure&gt;

  
      &lt;/div&gt;
    &lt;/div&gt;
  




&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;p style="text-align: justify;"&gt;&lt;span style="vertical-align: baseline;"&gt;And there you have it: You’ve gone from raw discovery to a fully automated, fraud-catching machine in a matter of minutes, all from within the same UX. No need to hop between multiple browser tabs, IDE interfaces, or learn data engineering and science best practices — Data Agent Kit orchestrates a clean end-to-end flow leveraging various MCP tools and codified skills. Ultimately, this approach helps you achieve what matters most: shipping innovative, high-performance data applications at scale.&lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Get started today&lt;/strong&gt;&lt;/h3&gt;
&lt;p style="text-align: justify;"&gt;&lt;span style="vertical-align: baseline;"&gt;Data Agent Kit is available today in preview. Start by installing it in your favorite IDE or CLI:&lt;/span&gt;&lt;/p&gt;
&lt;ol&gt;
&lt;li aria-level="1" style="list-style-type: decimal; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;a href="https://marketplace.visualstudio.com/items?itemName=GoogleCloudTools.datacloud" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;VS Code Marketplace&lt;/span&gt;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: decimal; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;a href="https://docs.cloud.google.com/data-cloud-extension/antigravity/install"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Antigravity CLI&lt;/span&gt;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: decimal; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;a href="https://github.com/gemini-cli-extensions/data-agent-kit-starter-pack" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;GitHub Repo (Gemini CLI, Claude Code, Codex)&lt;/span&gt;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: decimal; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;a href="https://open-vsx.org/extension/googlecloudtools/datacloud" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;VSX&lt;/span&gt;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: decimal; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;a href="https://claude.com/plugins/data-agent-kit-starter-pack" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Claude Code Plugin&lt;/span&gt;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Then visit the &lt;/span&gt;&lt;a href="https://docs.cloud.google.com/data-cloud-extension"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;documentation&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; to learn more and get started. &lt;/span&gt;&lt;/p&gt;&lt;/div&gt;</description><pubDate>Tue, 19 May 2026 17:45:00 +0000</pubDate><guid>https://cloud.google.com/blog/products/data-analytics/data-agent-kit-brings-data-skills-and-tools-to-your-ide-or-cli/</guid><category>AI &amp; Machine Learning</category><category>Google I/O</category><category>Data Analytics</category><og xmlns:og="http://ogp.me/ns#"><type>article</type><title>The future of agentic development: Redefining the data practitioner lifecycle with Data Agent Kit</title><description></description><site_name>Google</site_name><url>https://cloud.google.com/blog/products/data-analytics/data-agent-kit-brings-data-skills-and-tools-to-your-ide-or-cli/</url></og><author xmlns:author="http://www.w3.org/2005/Atom"><name>Brahm Kohli</name><title>Group Product Manager, Data Cloud</title><department></department><company></company></author><author xmlns:author="http://www.w3.org/2005/Atom"><name>Dinesh Chandnani</name><title>Director of Engineering, Data Cloud</title><department></department><company></company></author></item><item><title>Beyond the Query: 5 Scenarios Laying the Foundation for the Agentic Era</title><link>https://cloud.google.com/blog/products/data-analytics/building-an-agentic-data-layer-on-google-cloud-5-key-scenarios/</link><description>&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Accessing enterprise data is shifting from static reports to dynamic use by autonomous systems. To keep up, organizations must route fragmented data from SaaS, IoT, and legacy sources into secure, scalable endpoints.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;However, moving to AI-driven exposure requires more than just connecting an LLM to a database, it requires a fundamental architectural shift to manage security, costs, and semantic accuracy.&lt;/span&gt;&lt;/p&gt;
&lt;h2&gt;&lt;span style="vertical-align: baseline;"&gt;What we’ll cover&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;This article explores the technical evolution of data exposure through five architectural patterns: from manual SQL development to autonomous workflows standardized by the Model Context Protocol (MCP).&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;While the examples use BigQuery and mocked CRM data, the patterns apply to most enterprise data assets transitioning into an agentic workflow.&lt;/span&gt;&lt;/p&gt;
&lt;h2&gt;&lt;span style="vertical-align: baseline;"&gt;The 5 Scenarios of Data Evolution&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;The transition from static reports to agentic insights is defined by two factors: Trust and complexity.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Trust dictates autonomy: Low-trust environments (like external client-facing apps) require deterministic, hard-coded logic to prevent errors. High-trust environments (like internal tools for power users) allow for probabilistic LLM reasoning, where there is more tolerance for non-deterministic outputs.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Complexity defines utility: Simple lookups need fast, cached responses. In contrast, complex, cross-functional problems require an agent to orchestrate multiple tools and data sources.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;To navigate this shift, we will examine five technical scenarios, starting with the baseline of the static API.&lt;/span&gt;&lt;/p&gt;
&lt;h2&gt;&lt;span style="vertical-align: baseline;"&gt;Scenario 1: The Static API Contract&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Focus:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Maximum stability and deterministic execution&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Scenario 1 represents the traditional model of data exposure. A developer acts as the intermediary, translating specific business requirements—such as "Show me top-selling products by category"—into optimized, hard-coded SQL queries.&lt;/span&gt;&lt;/p&gt;
&lt;h2&gt;&lt;span style="vertical-align: baseline;"&gt;Isolation and Predictability&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;This approach provides the highest level of security and performance:&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Low logic risk&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;: Because the SQL is pre-written and vetted, there is no risk of a user (or an agent) crafting a query that accesses unauthorized data.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Secure by design&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;: Using parameterized queries instead of string concatenation provides a hard barrier against SQL injection.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Reliability&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;: The output is deterministic. If the development lifecycle is robust, the user is guaranteed to receive exactly what they requested, with predictable execution costs and performance.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
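&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;The injection risk that parameterization removes is easiest to see side by side. The sketch below is a self-contained illustration, not BigQuery code: it swaps in Python's built-in &lt;code&gt;sqlite3&lt;/code&gt; module (which uses &lt;code&gt;?&lt;/code&gt; placeholders) so it runs without credentials, and the &lt;code&gt;products&lt;/code&gt; rows are made up. BigQuery's named &lt;code&gt;@parameter&lt;/code&gt; syntax follows the same principle.&lt;/span&gt;&lt;/p&gt;

```python
import sqlite3

# In-memory stand-in for a warehouse table (hypothetical rows).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (id INTEGER, name TEXT, category TEXT)")
conn.executemany(
    "INSERT INTO products VALUES (?, ?, ?)",
    [(1, "Parka", "Outerwear"), (2, "Beanie", "Accessories"), (3, "Scarf", "Accessories")],
)

def by_category_concat(category):
    # UNSAFE: the input becomes part of the SQL text itself
    sql = "SELECT id FROM products WHERE category = '" + category + "'"
    return [row[0] for row in conn.execute(sql)]

def by_category_param(category):
    # SAFE: the driver binds the input as a value; it is never parsed as SQL
    sql = "SELECT id FROM products WHERE category = ?"
    return [row[0] for row in conn.execute(sql, (category,))]

attack = "x' OR '1'='1"
print(by_category_concat(attack))  # every row leaks
print(by_category_param(attack))   # no rows: nothing has that literal category
```

&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;The concatenated version returns every row because the input closes the string literal and appends an always-true condition; the bound version treats the whole input as a single value.&lt;/span&gt;&lt;/p&gt;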
&lt;h2&gt;&lt;span style="vertical-align: baseline;"&gt;Implementation example&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;This snippet demonstrates the baseline for data exposure: a direct, static API contract. It offers maximum predictability by using parameterized queries to prevent SQL injection and ensure consistent performance.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;A note on the code examples:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; To prioritize architectural clarity, these examples are provided as conceptual blueprints rather than production-ready code. They are designed for pedagogical purposes and intentionally omit "industrial" requirements such as persistent session state, IAM/Auth protocols, and comprehensive exception handling. Use these only as a logic guide before implementing your own hardened and production-ready solution.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-code"&gt;&lt;dl&gt;
    &lt;dt&gt;code_block&lt;/dt&gt;
    &lt;dd&gt;&lt;pre&gt;&lt;code&gt;from google.cloud import bigquery

def fetch_products(limit=10):
    client = bigquery.Client()
    # Use named parameters to ensure security and prevent SQL injection
    sql = """
        SELECT id, name
        FROM `bigquery-public-data.thelook_ecommerce.products`
        LIMIT @limit
    """
    job_config = bigquery.QueryJobConfig(
        query_parameters=[
            bigquery.ScalarQueryParameter("limit", "INT64", limit)
        ]
    )
    return client.query(sql, job_config=job_config).to_dataframe()&lt;/code&gt;&lt;/pre&gt;&lt;/dd&gt;
&lt;/dl&gt;&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;h2&gt;&lt;span style="vertical-align: baseline;"&gt;Analysis&lt;/span&gt;&lt;/h2&gt;
&lt;div align="left"&gt;
&lt;div style="color: #5f6368; overflow-x: auto; overflow-y: hidden; width: 100%;"&gt;
&lt;div style="color: #5f6368; overflow-x: auto; overflow-y: hidden; width: 100%;"&gt;
&lt;div style="color: #5f6368; overflow-x: auto; overflow-y: hidden; width: 100%;"&gt;
&lt;div style="color: #5f6368; overflow-x: auto; overflow-y: hidden; width: 100%;"&gt;
&lt;div style="color: #5f6368; overflow-x: auto; overflow-y: hidden; width: 100%;"&gt;
&lt;div style="color: #5f6368; overflow-x: auto; overflow-y: hidden; width: 100%;"&gt;&lt;table&gt;&lt;colgroup&gt;&lt;col/&gt;&lt;col/&gt;&lt;col/&gt;&lt;/colgroup&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Parameter&lt;/strong&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Rating&lt;/strong&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Impact&lt;/strong&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Flexibility&lt;/strong&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Low&lt;/strong&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Users cannot change the query logic or filters without code changes.&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Cost Control&lt;/strong&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;High&lt;/strong&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Query plans are static; costs are predictable and easy to budget.&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Latency&lt;/strong&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Low&lt;/strong&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Low response times leveraging for example BigQuery's query cache.&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Maintenance&lt;/strong&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;High&lt;/strong&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Every new business question requires a developer and a deployment.&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;h2&gt;&lt;span style="vertical-align: baseline;"&gt;When to use Scenario 1?&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;This approach is the benchmark for external-facing applications, customer portals, and high-traffic production dashboards. It is the best choice when your requirements include:&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Strict auditability:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; You need a version-controlled (Git-based) history of every query executed against your data warehouse.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Performance at scale:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; You require sub-second latency, leveraging BigQuery’s result caching for high-concurrency workloads.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Deterministic logic:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; You must guarantee that specific inputs always produce the exact same output, with no room for AI-driven variability.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;External multi-tenancy:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; You are exposing data to third parties and need absolute assurance against data cross-contamination.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
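&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;One way to picture the deterministic-logic and multi-tenancy requirements together is a contract whose query shape never changes and whose tenant filter comes from the caller's authenticated session rather than from request input. This minimal sketch again uses &lt;code&gt;sqlite3&lt;/code&gt; as a local stand-in for BigQuery; the &lt;code&gt;orders&lt;/code&gt; table, tenant names, and the 1 to 100 row cap are hypothetical, and session handling is assumed to happen elsewhere.&lt;/span&gt;&lt;/p&gt;

```python
import sqlite3

# Hypothetical multi-tenant table (sqlite3 stands in for the warehouse).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (tenant_id TEXT, order_id INTEGER, total REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [("acme", 1, 120.0), ("acme", 2, 80.0), ("globex", 3, 999.0)],
)

# The contract is fixed: callers may choose the row limit, never the filter.
ORDERS_FOR_TENANT = (
    "SELECT order_id, total FROM orders "
    "WHERE tenant_id = ? ORDER BY order_id LIMIT ?"
)

def list_orders(session_tenant_id, limit=10):
    # session_tenant_id is assumed to come from the authenticated session
    # (hypothetical here), never from a parameter the caller controls.
    limit = max(1, min(int(limit), 100))  # clamp to a budgeted range
    return conn.execute(ORDERS_FOR_TENANT, (session_tenant_id, limit)).fetchall()

print(list_orders("acme"))  # [(1, 120.0), (2, 80.0)]
```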
&lt;h2&gt;&lt;span style="vertical-align: baseline;"&gt;Scenario 2: Custom Agent with SQL Generation&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Focus:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; User flexibility and managed autonomy.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;To resolve the development bottleneck of manual SQL authoring, Scenario 2 introduces an LLM agent (via the Agent Platform SDK) to act as a dynamic translator. In this model, the developer stops writing individual queries and starts focusing on metadata documentation.&lt;/span&gt;&lt;/p&gt;
&lt;h2&gt;&lt;span style="vertical-align: baseline;"&gt;From Query Writing to Metadata Curation&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Using the Agent Platform SDK (for &lt;/span&gt;&lt;a href="https://docs.cloud.google.com/gemini-enterprise-agent-platform/machine-learning/python-sdk/use-vertex-ai-python-sdk"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Python&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, for example), developers implement a reasoning engine that maps natural language to schema metadata. Rather than "guessing" the SQL, the agent follows a structured reasoning loop:&lt;/span&gt;&lt;/p&gt;
&lt;ol&gt;
&lt;li aria-level="1" style="list-style-type: decimal; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Analyze:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; It parses the natural language intent (e.g., "Which region had the highest growth?").&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: decimal; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Retrieve:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; It looks up the relevant schema metadata provided in the system context.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: decimal; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Construct:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; It generates a syntactically correct, BigQuery-compatible statement.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;For the LLM to generate accurate queries, it must "see" the data structures. You provide this through system instructions that include table names, column types, and—crucially—semantic descriptions (e.g., &lt;/span&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;"created_at: The timestamp when the user first registered"&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;). By curating this metadata space, you define the boundaries of what the agent can explore and execute.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Access control relies entirely on underlying database permissions (like RLS). Because the agent passes generated SQL dynamically, data boundaries must be enforced at the database level.&lt;/span&gt;&lt;/p&gt;
&lt;h2&gt;&lt;span style="vertical-align: baseline;"&gt;Implementation example&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;This marks the first step into agentic workflows, where an LLM acts as a translator between natural language and structured schema.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-code"&gt;&lt;dl&gt;
    &lt;dt&gt;code_block&lt;/dt&gt;
    &lt;dd&gt;&lt;pre&gt;&lt;code&gt;from google.cloud import bigquery
from vertexai.generative_models import GenerativeModel

def ai_query(user_prompt):
    # Initialize the model
    model = GenerativeModel("YOUR_LLM_MODEL")

    # SYSTEM CONTEXT: Grounding the model with schema metadata
    # This reduces the chance of the model guessing table names or column types.
    system_instruction = (
        "You are a BigQuery SQL expert. Output ONLY raw SQL code without markdown backticks. "
        "Context: The 'products' table in 'bigquery-public-data.thelook_ecommerce' "
        "contains: id (INT64), name (STRING), and category (STRING)."
    )

    full_prompt = f"{system_instruction}\n\nUser request: {user_prompt}"

    # Generate the SQL string and strip any markdown fences the model adds anyway
    response = model.generate_content(full_prompt)
    sql_code = response.text.strip().replace("```sql", "").replace("```", "").strip()

    # Execute the AI-generated query
    client = bigquery.Client()
    return client.query(sql_code).to_dataframe()&lt;/code&gt;&lt;/pre&gt;&lt;/dd&gt;
&lt;/dl&gt;&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;h2&gt;&lt;span style="vertical-align: baseline;"&gt;Analysis&lt;/span&gt;&lt;/h2&gt;
&lt;div align="left"&gt;
&lt;div style="color: #5f6368; overflow-x: auto; overflow-y: hidden; width: 100%;"&gt;
&lt;div style="color: #5f6368; overflow-x: auto; overflow-y: hidden; width: 100%;"&gt;
&lt;div style="color: #5f6368; overflow-x: auto; overflow-y: hidden; width: 100%;"&gt;
&lt;div style="color: #5f6368; overflow-x: auto; overflow-y: hidden; width: 100%;"&gt;
&lt;div style="color: #5f6368; overflow-x: auto; overflow-y: hidden; width: 100%;"&gt;
&lt;div style="color: #5f6368; overflow-x: auto; overflow-y: hidden; width: 100%;"&gt;&lt;table&gt;&lt;colgroup&gt;&lt;col/&gt;&lt;col/&gt;&lt;col/&gt;&lt;/colgroup&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Parameter&lt;/strong&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Rating&lt;/strong&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Impact&lt;/strong&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Flexibility&lt;/strong&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;High&lt;/strong&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Users can ask virtually any question in plain English.&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Cost Control&lt;/strong&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Low&lt;/strong&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;LLMs may generate unoptimized queries (e.g., missing partitions).&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Latency&lt;/strong&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Medium&lt;/strong&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Includes LLM "thinking" time.&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Maintenance&lt;/strong&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Medium&lt;/strong&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Developers manage "prompt schemas" rather than SQL code.&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;h2&gt;&lt;strong style="vertical-align: baseline;"&gt;When to use Scenario 2?&lt;/strong&gt;&lt;/h2&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Scenario 2 is best suited for &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;internal data discovery&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; and &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;analyst-led exploration&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;. It bridges the gap between raw data and business users when you require:&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;High-variability querying:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; When the range of potential business questions is too broad (the "infinite question space") to be efficiently covered by pre-built, static APIs.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Rapid prototyping:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; When analysts need to quickly explore datasets and validate hypotheses before committing to the development of formal, production-grade dashboards.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Semantic interpretation:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; When you need an agent to resolve natural language ambiguities—such as mapping "last quarter" or "active users"—into specific, technical filter criteria automatically.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
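&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;To make "specific, technical filter criteria" concrete, here is a deterministic sketch of the resolution an agent would be expected to produce for "last quarter". It assumes calendar quarters, a hypothetical &lt;code&gt;order_date&lt;/code&gt; column, and an explicit reference date. In Scenario 2 the LLM performs this mapping itself; a helper like this is one way to spot-check its output.&lt;/span&gt;&lt;/p&gt;

```python
import datetime

def last_quarter(today):
    """Map the phrase 'last quarter' to an inclusive (start, end) date range.

    Assumes calendar quarters; a fiscal calendar would need its own offsets.
    """
    current_q = (today.month - 1) // 3  # 0..3 for the current quarter
    year, prev_q = (today.year, current_q - 1) if current_q else (today.year - 1, 3)
    start = datetime.date(year, 3 * prev_q + 1, 1)
    # The day before the first day of the following quarter is the last day.
    next_start = (datetime.date(year + 1, 1, 1) if prev_q == 3
                  else datetime.date(year, 3 * prev_q + 4, 1))
    end = next_start - datetime.timedelta(days=1)
    return start, end

start, end = last_quarter(datetime.date(2026, 6, 4))
# Prints: WHERE order_date BETWEEN '2026-01-01' AND '2026-03-31'
print(f"WHERE order_date BETWEEN '{start}' AND '{end}'")
```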
&lt;h2&gt;&lt;span style="vertical-align: baseline;"&gt;Scenario 3: Conversational Analytics&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Focus:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Managed reasoning and verified logic.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Scenario 3 shifts the responsibility from a self-managed custom agent to a specialized, platform-native reasoning engine. By leveraging the &lt;/span&gt;&lt;a href="https://docs.cloud.google.com/gemini/data-agents/conversational-analytics-api/overview"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Conversational Analytics API&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; (currently in Pre-GA), you can deploy Data Agents - intelligent, governed layers that use enterprise-specific metadata and verified SQL to keep the LLM within strictly defined guardrails. This API translates natural language into precise queries across BigQuery, Looker, and Data Studio, while extending support to Google Cloud’s primary database solutions. We’ll consider BigQuery as our primary example for exploring these conversational insights.&lt;/span&gt;&lt;/p&gt;
&lt;h2&gt;&lt;span style="vertical-align: baseline;"&gt;The Power of Verified Queries&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Unlike generic LLM prompts that guess the SQL structure, these agents are grounded in your organization’s source of truth:&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Verified queries:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; You provide a library of verified queries (vetted, high-quality SQL examples) that the agent uses as a reference for complex joins and business logic. This ensures the agent follows your established coding patterns.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Managed context:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; The platform handles the retrieval of schema information and documentation, reducing the prompt bloat that often leads to hallucinations in custom-built agents.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Aligned outputs:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; By grounding the model in existing production SQL, the system ensures that AI-generated insights remain consistent with your official reporting metrics.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
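&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;The managed API handles this grounding for you. Purely to make the idea concrete, the sketch below shows how a team might approximate verified queries in a self-built agent (the Scenario 2 approach) by supplying vetted question and SQL pairs as few-shot examples. It is not how the Conversational Analytics API works internally; the revenue definition and the helper names are hypothetical.&lt;/span&gt;&lt;/p&gt;

```python
# Hypothetical library of vetted question -> SQL pairs ("verified queries").
VERIFIED_QUERIES = [
    {
        "question": "What was total revenue by month?",
        "sql": (
            "SELECT DATE_TRUNC(DATE(created_at), MONTH) AS month, "
            "SUM(sale_price) AS revenue "
            "FROM `bigquery-public-data.thelook_ecommerce.order_items` "
            "WHERE status NOT IN ('Cancelled', 'Returned') "
            "GROUP BY month ORDER BY month"
        ),
    },
]

def build_grounded_prompt(user_question, verified=VERIFIED_QUERIES):
    """Prepend vetted examples so the model reuses the approved business logic."""
    parts = ["Answer with BigQuery SQL. Reuse the logic of these verified queries "
             "whenever the user's question involves the same metric."]
    for i, example in enumerate(verified, start=1):
        parts.append(f"Example {i}\nQ: {example['question']}\nSQL: {example['sql']}")
    parts.append(f"Q: {user_question}\nSQL:")
    return "\n\n".join(parts)

print(build_grounded_prompt("What was revenue last month?"))
```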
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;This solution inherits existing BigQuery IAM permissions and provides a view of the reasoning and SQL behind every answer.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Can all of this be done with enough work on a fully customized agent? Yes. Is the custom approach practical, and time/cost efficient? Maybe not.&lt;/span&gt;&lt;/p&gt;
&lt;h2&gt;&lt;span style="vertical-align: baseline;"&gt;Implementation example&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;This approach leverages a specialized reasoning engine to handle intent discovery and data grounding. The developer no longer manages the translation logic: they simply call the managed agent.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-code"&gt;&lt;dl&gt;
    &lt;dt&gt;code_block&lt;/dt&gt;
    &lt;dd&gt;&lt;pre&gt;&lt;code&gt;from google.cloud import geminidataanalytics_v1beta as gda

def chat_data(user_query):
    # The Data Chat service answers questions through a pre-configured Data Agent
    client = gda.DataChatServiceClient()
    # Resource names for your project and your pre-configured Data Agent
    parent = "projects/YOUR_PROJECT_ID/locations/global"
    agent_path = f"{parent}/dataAgents/YOUR_AGENT_ID"
    request = gda.ChatRequest(
        parent=parent,
        messages=[gda.Message(user_message=gda.UserMessage(text=user_query))],
        # The agent grounds its answer in its verified queries and metadata
        data_agent_context=gda.DataAgentContext(data_agent=agent_path),
    )
    # chat() streams back messages (reasoning steps, generated SQL, result data,
    # and the natural-language answer); collect them for the caller
    return list(client.chat(request=request))&lt;/code&gt;&lt;/pre&gt;&lt;/dd&gt;
&lt;/dl&gt;&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;h2&gt;&lt;span style="vertical-align: baseline;"&gt;Analysis&lt;/span&gt;&lt;/h2&gt;
&lt;div align="left"&gt;
&lt;div style="color: #5f6368; overflow-x: auto; overflow-y: hidden; width: 100%;"&gt;
&lt;div style="color: #5f6368; overflow-x: auto; overflow-y: hidden; width: 100%;"&gt;
&lt;div style="color: #5f6368; overflow-x: auto; overflow-y: hidden; width: 100%;"&gt;
&lt;div style="color: #5f6368; overflow-x: auto; overflow-y: hidden; width: 100%;"&gt;
&lt;div style="color: #5f6368; overflow-x: auto; overflow-y: hidden; width: 100%;"&gt;
&lt;div style="color: #5f6368; overflow-x: auto; overflow-y: hidden; width: 100%;"&gt;&lt;table&gt;&lt;colgroup&gt;&lt;col/&gt;&lt;col/&gt;&lt;col/&gt;&lt;/colgroup&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Parameter&lt;/strong&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Rating&lt;/strong&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Impact&lt;/strong&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Flexibility&lt;/strong&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Medium&lt;/strong&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;High for the data sources it knows, but restricted by its Verified instructions and metadata scope.&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Cost Control&lt;/strong&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Medium&lt;/strong&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Grounded queries are typically more efficient than raw LLM generation.&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Latency&lt;/strong&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Medium&lt;/strong&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Higher than static queries, due to the multi-stage reasoning and summarization process.&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Maintenance&lt;/strong&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Low&lt;/strong&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Managed by Google; analysts focus on coaching the agent through metadata and verified SQL.&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;h2&gt;&lt;span style="vertical-align: baseline;"&gt;When to use Scenario 3?&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Scenario 3 is the ideal path for BigQuery-centric analysis where accuracy is non-negotiable. Choose this when you require:&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Governed trust:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Business logic (e.g., "Revenue") must follow pre-vetted verified queries every time.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Native intelligence:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Users need to perform complex tasks like forecasting or anomaly detection via BigQuery AI using natural language.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Auditability:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Stakeholders require a transparent reasoning path to see exactly how the AI arrived at its numbers.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;While Scenario 2 requires building a custom reasoning engine from scratch, Scenario 3 provides a platform-native experience that prioritizes verified logic over raw LLM generation.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;The limitation: This data companion is ultimately confined to the BigQuery or Google Cloud ecosystem. To scale an agentic workforce across heterogeneous platforms and tools, we must look toward vendor-agnostic standards like the Model Context Protocol (MCP).&lt;/span&gt;&lt;/p&gt;
&lt;h2&gt;&lt;span style="vertical-align: baseline;"&gt;Scenario 4: Managed MCP Tools&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Focus:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Standardized connectivity and decoupled architecture.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Scenario 4 introduces the Model Context Protocol (MCP)—an open-source standard designed to normalize how AI applications interact with data and tools. While previous scenarios rely on custom SDKs or platform-specific APIs, MCP provides a universal interface that separates the reasoning layer from the tool execution layer.&lt;/span&gt;&lt;/p&gt;
&lt;h2&gt;&lt;span style="vertical-align: baseline;"&gt;Standardized Abstraction&lt;/span&gt;&lt;/h2&gt;
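&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;MCP enables tool discovery: a server advertises its capabilities (for tools, in response to a &lt;/span&gt;&lt;code style="vertical-align: baseline;"&gt;tools/list&lt;/code&gt;&lt;span style="vertical-align: baseline;"&gt; request), and any compliant client can ingest that list at runtime. The result is a modular system in which the data logic is "externalized" from the agent itself.&lt;/span&gt;&lt;/p&gt;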
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;The MCP client:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; The connector inside the AI application that talks to MCP servers on behalf of the reasoning engine (the LLM), which identifies the user's intent and decides which tools to call. Because the protocol is standardized, the client can connect to any MCP server and discover what it can do without needing new integration code.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;The MCP server:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; The domain-specific service that exposes data and logic. The managed BigQuery MCP server doesn't just pass queries: it encapsulates the logic required to navigate Google Cloud’s infrastructure safely. It exposes tools such as:&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;ul&gt;
&lt;li aria-level="2" style="list-style-type: circle; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;code style="vertical-align: baseline;"&gt;list_dataset_ids&lt;/code&gt;&lt;span style="vertical-align: baseline;"&gt;: Context-aware discovery of the data environment.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="2" style="list-style-type: circle; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;code style="vertical-align: baseline;"&gt;get_dataset_info&lt;/code&gt;&lt;span style="vertical-align: baseline;"&gt;: Metadata retrieval for semantic grounding.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="2" style="list-style-type: circle; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;code style="vertical-align: baseline;"&gt;execute_sql&lt;/code&gt;&lt;span style="vertical-align: baseline;"&gt;: Controlled execution of data retrieval.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/ul&gt;
&lt;p style="padding-left: 40px;"&gt;&lt;span style="vertical-align: baseline;"&gt;(see &lt;/span&gt;&lt;a href="https://docs.cloud.google.com/bigquery/docs/reference/mcp"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;https://docs.cloud.google.com/bigquery/docs/reference/mcp&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; for the updated toolset).&lt;/span&gt;&lt;/p&gt;
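&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Access control relies on standard IAM permissions granted to the calling identity (for example, a service account); the managed server offers no hook for adding your own programmatic logic checks.&lt;/span&gt;&lt;/p&gt;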
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;This decoupling future-proofs your AI stack. You can swap your LLM provider or upgrade your agent's reasoning model without rewriting the data access logic, because the interface between them remains consistent and governed.&lt;/span&gt;&lt;/p&gt;
&lt;h2&gt;&lt;span style="vertical-align: baseline;"&gt;Implementation example&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;In an MCP-based architecture, connecting an AI agent to a data source comes down to configuration. Instead of writing custom integration logic, you give an MCP-compliant client (such as the Gemini CLI or a modern IDE) a manifest that defines the server’s location and authentication requirements.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;The following manifest allows the client to connect to Google’s managed BigQuery MCP server, enabling it to dynamically discover and execute data tools without a single line of custom code:&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-code"&gt;&lt;dl&gt;
    &lt;dd&gt;&lt;pre&gt;&lt;code&gt;{
  "mcpServers": {
    "bigquery": {
      "httpUrl": "https://bigquery.googleapis.com/mcp",
      "authProviderType": "google_credentials",
      "oauth": {
        "scopes": [
          "https://www.googleapis.com/auth/bigquery"
        ]
      }
    }
  }
}&lt;/code&gt;&lt;/pre&gt;&lt;/dd&gt;
&lt;/dl&gt;&lt;/div&gt;
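&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;If your client already lists other MCP servers, you may prefer to merge this entry into its existing settings rather than overwrite them. The Python sketch below does that with the standard library only. The settings file location depends on your client and is not assumed here; &lt;code&gt;add_mcp_server&lt;/code&gt; and &lt;code&gt;register_bigquery_server&lt;/code&gt; are hypothetical helper names.&lt;/span&gt;&lt;/p&gt;

```python
import copy
import json
from pathlib import Path
from typing import Any

# Same entry as the manifest above: the managed BigQuery MCP endpoint,
# authenticated with Google credentials and the BigQuery OAuth scope.
BIGQUERY_MCP_ENTRY: dict[str, Any] = {
    "httpUrl": "https://bigquery.googleapis.com/mcp",
    "authProviderType": "google_credentials",
    "oauth": {"scopes": ["https://www.googleapis.com/auth/bigquery"]},
}


def add_mcp_server(settings: dict[str, Any], name: str, entry: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of `settings` with `entry` registered under mcpServers[name].

    Servers that are already configured are kept; the input dict is not modified.
    """
    merged = copy.deepcopy(settings)
    merged.setdefault("mcpServers", {})[name] = copy.deepcopy(entry)
    return merged


def register_bigquery_server(settings_path: Path) -> None:
    """Hypothetical helper: add the BigQuery entry to a JSON settings file in place."""
    current = json.loads(settings_path.read_text()) if settings_path.exists() else {}
    updated = add_mcp_server(current, "bigquery", BIGQUERY_MCP_ENTRY)
    settings_path.write_text(json.dumps(updated, indent=2) + "\n")
```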
&lt;div class="block-paragraph_advanced"&gt;&lt;h2&gt;&lt;span style="vertical-align: baseline;"&gt;Analysis&lt;/span&gt;&lt;/h2&gt;
&lt;div align="left"&gt;
&lt;div style="color: #5f6368; overflow-x: auto; overflow-y: hidden; width: 100%;"&gt;
&lt;div style="color: #5f6368; overflow-x: auto; overflow-y: hidden; width: 100%;"&gt;
&lt;div style="color: #5f6368; overflow-x: auto; overflow-y: hidden; width: 100%;"&gt;
&lt;div style="color: #5f6368; overflow-x: auto; overflow-y: hidden; width: 100%;"&gt;
&lt;div style="color: #5f6368; overflow-x: auto; overflow-y: hidden; width: 100%;"&gt;
&lt;div style="color: #5f6368; overflow-x: auto; overflow-y: hidden; width: 100%;"&gt;&lt;table&gt;&lt;colgroup&gt;&lt;col/&gt;&lt;col/&gt;&lt;col/&gt;&lt;/colgroup&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Parameter&lt;/strong&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Rating&lt;/strong&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Impact&lt;/strong&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Flexibility&lt;/strong&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;High&lt;/strong&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Agents can contextually explore any table the MCP server exposes.&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Cost Control&lt;/strong&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Medium&lt;/strong&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Tools are standardized, but a curious agent can still trigger large scans.&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Latency&lt;/strong&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Medium&lt;/strong&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Includes standard overhead for the protocol handshake and tool-calling.&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Maintenance&lt;/strong&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Low&lt;/strong&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Uses a managed MCP server, so there is no server to maintain; the remaining work is on the MCP client side.&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;h2&gt;&lt;span style="vertical-align: baseline;"&gt;When to use Scenario 4?&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Scenario 4 is the architectural choice for multi-agent environments that require standardized data connectivity with minimal maintenance overhead. It is the ideal path when you need:&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Managed infrastructure:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; You want to offload the security, execution, and maintenance of your toolset by consuming a managed BigQuery MCP server rather than building and patching custom data-retrieval code.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;LLM portability:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; You need an open-standard interface, allowing you to use the same tools across different LLMs or agent frameworks without rewriting proprietary function calls.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Autonomous discovery:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Your agents must navigate and inspect complex datasets dynamically. MCP’s standardized endpoints allow agents to crawl metadata and schema information autonomously to determine the best path for a query.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;&lt;span style="vertical-align: baseline;"&gt;Scenario 5: Custom Hosted MCP Servers&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Focus:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Architectural extensibility and custom tool definition.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Scenario 5 keeps the standardized connectivity of Scenario 4 but adds complete control by replacing the managed service with a custom-built MCP server. The server is typically hosted on scalable infrastructure such as Cloud Run, and you can build it on open source solutions such as &lt;/span&gt;&lt;a href="https://github.com/googleapis/mcp-toolbox" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;MCP toolbox&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;. This approach removes the guardrails of managed offerings, giving engineering teams full freedom to define specialized tools, integrate disparate third-party APIs, and implement proprietary execution logic within the protocol.&lt;/span&gt;&lt;/p&gt;
&lt;h2&gt;&lt;span style="vertical-align: baseline;"&gt;Architectural Advantages of Custom MCP&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Shifting to a custom-hosted MCP server moves operational complexity from the LLM prompt to the server-side logic, unlocking three critical capabilities:&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Deterministic tool tailoring:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Instead of forcing an agent to navigate raw, sprawling schemas, developers define high-level functions with specific data shapes. This replaces probabilistic SQL generation with deterministic execution, which greatly reduces schema-based hallucinations such as invented tables or columns (the agent still chooses which tool to call and with which parameters).&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Unified source orchestration:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; A custom MCP server acts as a consolidated gateway. Within a single tool execution, the server can orchestrate calls across BigQuery, external SaaS APIs, and legacy on-premises systems. The agent receives a pre-processed, unified response, abstracting away the multi-source complexity.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Programmable governance:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; This scenario enables code-level security controls that are difficult to implement in managed environments. You can implement granular controls directly within the protocol layer, such as:&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;ul&gt;
&lt;li aria-level="2" style="list-style-type: circle; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;Dynamic PII masking: Automatically redacting sensitive data before it reaches the LLM.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="2" style="list-style-type: circle; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;Custom authentication: Injecting enterprise-specific authentication middleware in front of tool execution.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="2" style="list-style-type: circle; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;Contextual rate limiting: Throttling tool usage based on the end-user’s identity or cost center.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/ul&gt;
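&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;As a rough illustration of programmable governance, the Python sketch below wraps a tool handler with two of the controls listed above: a per-cost-center token-bucket rate limit, and regex-based masking of email addresses and phone-like numbers in the rows returned to the LLM. It is a standalone example, not MCP toolbox code; the names (&lt;code&gt;governed&lt;/code&gt;, &lt;code&gt;CostCenterRateLimiter&lt;/code&gt;), the regular expressions, and the limits are all assumptions, and production PII detection typically needs a dedicated classification service rather than regexes.&lt;/span&gt;&lt;/p&gt;

```python
import re
import time
from typing import Any, Callable

Row = dict[str, Any]

# Crude, illustrative patterns: they can over-match (e.g. date strings) and miss real PII.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(\.[\w-]+)+")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def mask_pii(value: Any) -> Any:
    """Redact emails and phone-like numbers in string values; leave other types unchanged."""
    if not isinstance(value, str):
        return value
    value = EMAIL_RE.sub("[REDACTED_EMAIL]", value)
    return PHONE_RE.sub("[REDACTED_PHONE]", value)


class CostCenterRateLimiter:
    """Token bucket per cost center: refills at `rate` tokens/second, holds up to `capacity`."""

    def __init__(self, rate: float, capacity: int, clock: Callable[[], float] = time.monotonic):
        self.rate, self.capacity, self.clock = rate, capacity, clock
        self.buckets: dict[str, tuple[float, float]] = {}  # cost_center -> (tokens, last_update)

    def allow(self, cost_center: str) -> bool:
        now = self.clock()
        tokens, last = self.buckets.get(cost_center, (float(self.capacity), now))
        tokens = min(float(self.capacity), tokens + (now - last) * self.rate)
        if tokens >= 1.0:
            self.buckets[cost_center] = (tokens - 1.0, now)
            return True
        self.buckets[cost_center] = (tokens, now)
        return False


def governed(handler: Callable[..., list[Row]], limiter: CostCenterRateLimiter) -> Callable[..., list[Row]]:
    """Wrap a tool handler: throttle per cost center, then mask PII in every returned row."""

    def wrapper(cost_center: str, **arguments: Any) -> list[Row]:
        if not limiter.allow(cost_center):
            raise PermissionError(f"rate limit exceeded for cost center {cost_center!r}")
        rows = handler(**arguments)
        return [{column: mask_pii(v) for column, v in row.items()} for row in rows]

    return wrapper
```

&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Because the checks run on the server before any data is returned, the LLM never sees unmasked values, regardless of how the agent phrases its request.&lt;/span&gt;&lt;/p&gt;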
&lt;h2&gt;&lt;span style="vertical-align: baseline;"&gt;Implementation example&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;In this scenario, when using &lt;/span&gt;&lt;a href="https://github.com/googleapis/mcp-toolbox" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;MCP toolbox&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, you use a declarative &lt;/span&gt;&lt;code style="vertical-align: baseline;"&gt;tools.yaml&lt;/code&gt;&lt;span style="vertical-align: baseline;"&gt; file to define the interface of your custom MCP server. This file sets the boundary of what your agent can do: it defines the BigQuery connection, enables safe discovery for schema inspection, and wraps complex, multi-table joins into a single, parameterized tool.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-code"&gt;&lt;dl&gt;
    &lt;dd&gt;&lt;pre&gt;&lt;code&gt;# ----------------------------------------------------------------------
# Minimal Configuration
# Dataset: bigquery-public-data.thelook_ecommerce
# ----------------------------------------------------------------------

sources:
  bq-thelook-ecommerce:
    kind: "bigquery"
    project: "${PROJECT_ID}"
    location: "${BQ_LOCATION}"

tools:
  # 1. Discovery Tool: Helps the agent understand the database schema
  bigquery_get_table_info:
    kind: bigquery-get-table-info
    source: bq-thelook-ecommerce
    description: Retrieves table metadata and schema details. Run this before executing custom queries.

  # 2. Execution Tool: Parameterized SQL for safe, repeatable data fetches
  thelook_get_user_orders_summary:
    kind: bigquery-sql
    source: bq-thelook-ecommerce
    statement: |
      SELECT
        orders.user_id,
        COUNT(DISTINCT orders.order_id) AS count_of_orders,
        COUNT(order_items.id) AS count_of_items,
        SAFE_DIVIDE(COUNT(order_items.id), COUNT(DISTINCT orders.order_id)) AS avg_items_per_order
      FROM `bigquery-public-data.thelook_ecommerce.orders` AS orders
      INNER JOIN `bigquery-public-data.thelook_ecommerce.order_items` AS order_items
        ON orders.order_id = order_items.order_id
        AND orders.user_id = order_items.user_id
      WHERE orders.status = "Complete"
        AND orders.user_id = @user_id
      GROUP BY orders.user_id;
    description: Retrieves an order summary for a specific user ID, including total completed orders, items purchased, and average items per order.
    parameters:
      - name: user_id
        type: integer
        description: The unique identifier of the user.

toolsets:
  # Binds the tools together for agent use
  thelook_core_insights_toolset:
    - bigquery_get_table_info
    - thelook_get_user_orders_summary&lt;/code&gt;&lt;/pre&gt;&lt;/dd&gt;
&lt;/dl&gt;&lt;/div&gt;
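&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;To show why a parameterized tool is safer than letting the agent write SQL, the Python sketch below builds the body of a BigQuery REST &lt;code&gt;jobs.query&lt;/code&gt; request for an abbreviated version of the &lt;code&gt;thelook_get_user_orders_summary&lt;/code&gt; statement, binding the agent-supplied &lt;code&gt;user_id&lt;/code&gt; as a named &lt;code&gt;INT64&lt;/code&gt; parameter instead of splicing it into the SQL text. It only approximates what a &lt;code&gt;bigquery-sql&lt;/code&gt; tool does at call time and is not MCP toolbox's actual implementation; the function name and the shortened SQL are illustrative.&lt;/span&gt;&lt;/p&gt;

```python
from typing import Any

# Abbreviated version of the statement in tools.yaml above; @user_id stays a placeholder.
USER_ORDERS_SQL = """
SELECT orders.user_id, COUNT(DISTINCT orders.order_id) AS count_of_orders
FROM `bigquery-public-data.thelook_ecommerce.orders` AS orders
WHERE orders.status = "Complete" AND orders.user_id = @user_id
GROUP BY orders.user_id
"""


def build_query_request(user_id: Any, dry_run: bool = False) -> dict[str, Any]:
    """Build a jobs.query request body with user_id bound as a named INT64 parameter."""
    # Reject anything that is not a plain integer (bool is a subclass of int in Python).
    if isinstance(user_id, bool) or not isinstance(user_id, int):
        raise TypeError(f"user_id must be an integer, got {type(user_id).__name__}")
    return {
        "query": USER_ORDERS_SQL,  # the SQL text never changes with the input
        "useLegacySql": False,
        "parameterMode": "NAMED",
        "queryParameters": [
            {
                "name": "user_id",
                "parameterType": {"type": "INT64"},
                "parameterValue": {"value": str(user_id)},  # the REST API encodes values as strings
            }
        ],
        "dryRun": dry_run,
    }
```

&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Setting &lt;code&gt;dryRun&lt;/code&gt; to true asks BigQuery to validate the query and report the bytes it would process without running it, which is one way a custom server can estimate cost before execution.&lt;/span&gt;&lt;/p&gt;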
&lt;div class="block-paragraph_advanced"&gt;&lt;h2&gt;&lt;span style="vertical-align: baseline;"&gt;Analysis&lt;/span&gt;&lt;/h2&gt;
&lt;div align="left"&gt;
&lt;div style="color: #5f6368; overflow-x: auto; overflow-y: hidden; width: 100%;"&gt;
&lt;div style="color: #5f6368; overflow-x: auto; overflow-y: hidden; width: 100%;"&gt;
&lt;div style="color: #5f6368; overflow-x: auto; overflow-y: hidden; width: 100%;"&gt;
&lt;div style="color: #5f6368; overflow-x: auto; overflow-y: hidden; width: 100%;"&gt;
&lt;div style="color: #5f6368; overflow-x: auto; overflow-y: hidden; width: 100%;"&gt;
&lt;div style="color: #5f6368; overflow-x: auto; overflow-y: hidden; width: 100%;"&gt;&lt;table&gt;&lt;colgroup&gt;&lt;col/&gt;&lt;col/&gt;&lt;col/&gt;&lt;/colgroup&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Parameter&lt;/strong&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Rating&lt;/strong&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Impact&lt;/strong&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Flexibility&lt;/strong&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;High&lt;/strong&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Supports cross-domain orchestration (e.g., BigQuery + legacy APIs) and any number of custom tool definitions.&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Cost Control&lt;/strong&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;High&lt;/strong&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Allows developers to inject programmatic query cost estimation and budget thresholds prior to execution.&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Latency&lt;/strong&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;High&lt;/strong&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Custom multi-hop orchestration, network transit, and container cold-starts introduce latency.&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Maintenance&lt;/strong&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;High&lt;/strong&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Requires full ownership of the application lifecycle, including CI/CD, dependency patching, and container scaling.&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
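&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;The cost-control row above can be sketched as a pre-execution budget check: estimate how many bytes a query would scan (for example, from the bytes-processed figure a BigQuery dry run reports), convert that into an approximate on-demand cost, and refuse anything over a threshold. In the Python below, the estimator is passed in as a function so the logic runs without Google Cloud access. The per-TiB price is a caller-supplied placeholder, not a quoted rate; check current BigQuery pricing for your region and billing model.&lt;/span&gt;&lt;/p&gt;

```python
from dataclasses import dataclass
from typing import Callable

TIB = 1024 ** 4  # bytes in a tebibyte; BigQuery on-demand pricing is quoted per TiB


@dataclass(frozen=True)
class BudgetDecision:
    allowed: bool
    estimated_bytes: int
    estimated_cost: float
    reason: str


def check_query_budget(
    sql: str,
    estimate_bytes: Callable[[str], int],
    max_cost: float,
    price_per_tib: float,
) -> BudgetDecision:
    """Estimate a query's on-demand cost before execution and compare it with a budget.

    `estimate_bytes` is assumed to wrap a dry run that returns bytes processed;
    `price_per_tib` is a caller-supplied assumption, not a looked-up price.
    """
    estimated_bytes = estimate_bytes(sql)
    estimated_cost = estimated_bytes / TIB * price_per_tib
    if estimated_cost > max_cost:
        return BudgetDecision(
            False, estimated_bytes, estimated_cost,
            f"estimated cost {estimated_cost:.2f} exceeds budget {max_cost:.2f}",
        )
    return BudgetDecision(True, estimated_bytes, estimated_cost, "within budget")
```

&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;With the google-cloud-bigquery client library, one way to implement &lt;code&gt;estimate_bytes&lt;/code&gt; is to submit the query with &lt;code&gt;QueryJobConfig(dry_run=True, use_query_cache=False)&lt;/code&gt; and read the job's &lt;code&gt;total_bytes_processed&lt;/code&gt;. As a second line of defense, the &lt;code&gt;maximum_bytes_billed&lt;/code&gt; job setting makes BigQuery fail a query that would bill more than the limit.&lt;/span&gt;&lt;/p&gt;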
&lt;h2&gt;&lt;span style="vertical-align: baseline;"&gt;When to use Scenario 5?&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;This architecture is the power-user choice, essential for highly regulated environments and hybrid infrastructures where managed services fall short. Implement this approach when your design requires:&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Secure hybrid orchestration:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; You must bridge BigQuery with private on-premises systems or restricted APIs, returning a pre-processed, consolidated payload that the agent can use immediately without navigating the raw network gap.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Hardened business logic:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; You need to move complex, non-negotiable calculations off the LLM and into a controlled code environment, exposing only high-level "expert" tools so that these calculations are performed the same, verified way every time.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Centralized enterprise tooling:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; You want to maintain a single, governed source of truth for your proprietary tools that can be served uniformly across different LLM providers or internal frameworks without vendor lock-in.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;&lt;span style="vertical-align: baseline;"&gt;Conclusion: The Foundation of the Agentic Era&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;The journey from Scenario 1 to Scenario 5 traces a clear technical evolution: we are moving away from rigid, hard-coded data silos and toward a world of autonomous discovery and standardized connectivity. By adopting frameworks like the Model Context Protocol (MCP), organizations can decouple their data logic from their AI models, ensuring that as LLMs evolve, their access to the enterprise "brain" remains seamless, scalable, and vendor-agnostic.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;However, increased autonomy does not mean decreased oversight. Although we haven’t covered these points in depth in this article, one fundamental rule applies: data access must be governed and secured with dedicated governance and security tools. Regardless of the access scenario (more or less agentic, depending on the use case), security, credential management, data quality management, and standardized governance are absolutely essential.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;On a more lighthearted note, it’s worth remembering that the golden rule of computing still applies: "Garbage In, Garbage Out". You can build the most sophisticated, autonomous agentic layer in the world, but if you feed it messy, uncurated data, you’ll simply get "garbage" answers at a much faster and more confident pace. Sophisticated AI doesn't fix bad data: it just makes it more visible. Maintaining high data quality is not just a legacy requirement—it is the fuel that makes the agentic engine actually work.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;</description><pubDate>Mon, 18 May 2026 16:00:00 +0000</pubDate><guid>https://cloud.google.com/blog/products/data-analytics/building-an-agentic-data-layer-on-google-cloud-5-key-scenarios/</guid><category>Data Analytics</category><og xmlns:og="http://ogp.me/ns#"><type>article</type><title>Beyond the Query: 5 Scenarios Laying the Foundation for the Agentic Era</title><description></description><site_name>Google</site_name><url>https://cloud.google.com/blog/products/data-analytics/building-an-agentic-data-layer-on-google-cloud-5-key-scenarios/</url></og><author xmlns:author="http://www.w3.org/2005/Atom"><name>Marco Liotta</name><title>Technical Account Manager, Google Cloud</title><department></department><company></company></author><author xmlns:author="http://www.w3.org/2005/Atom"><name>Lorenzo Caggioni</name><title>Data &amp; AI Architect, Google Cloud</title><department></department><company></company></author></item><item><title>What we announced in streaming AI at Next ‘26</title><link>https://cloud.google.com/blog/products/data-analytics/streaming-ai-news-from-next26/</link><description>&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Every device, user, and microservice generates data. 
Ingesting this data, extracting meaning and insights, and driving business decisions in real time can deliver transformational business value. The rise of agentic AI gives users an opportunity to overcome the challenges inherent in real-time analytics. But while agentic AI can accelerate adoption, users face a new set of challenges in using real-time data effectively:&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Real-time context is hard to implement. &lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;Teams often fall back on batch-oriented approaches, such as periodic database syncs and scheduled refreshes. As a result, agents either rely on stale data or need large, memory-intensive context windows. This “context lag” makes them ineffective for real-time agentic tasks like fraud detection, dynamic e-commerce recommendations, or autonomous supply chain adjustments. &lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Real-time systems are inflexible. &lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;Agentic tools lack the modularity to adapt to customer-specific requirements, forcing organizations into difficult architectural choices. Data practitioners need a platform that meets them where they are and leaves them free to choose their own tradeoff between latency, accuracy, and cost. &lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Google Cloud provides a tightly integrated, unified streaming data platform that delivers both fully managed, Google Cloud-native services and open-source-compatible services, which together power large-scale AI training and inference. The platform consists of five key services: &lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Pub/Sub:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Highly reliable, serverless, and fully managed service for messaging and event streaming that’s integrated with BigQuery, Dataflow, and Cloud Storage. Pub/Sub is used by organizations like Anthropic. &lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Dataflow&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;: A serverless engine for batch, streaming, and now agentic AI. Leading enterprise organizations like &lt;/span&gt;&lt;a href="https://cloud.google.com/blog/topics/partners/palo-alto-networks-builds-a-multi-tenant-unified-data-platform?e=48754805"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Palo Alto Networks&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; use Dataflow, as do Google services like Waymo and Google Maps. For instance, Waymo cars use Dataflow to help them “see” the world, plan their routes, and predict obstacles. Before a car hits the actual pavement, it “drives” millions of miles in a simulator, with Dataflow generating training datasets and validating the models that are used for autonomous driving.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Managed Service for Apache Kafka:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; A highly reliable, secure, and cost-efficient way to run Apache Kafka, the popular open source streaming storage and data integration system, fully managed on Google Cloud. Across the largest enterprises and startups, Apache Kafka serves as a staging location for critical training data and real-time updates to AI agent context. &lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;BigQuery&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;: A unified platform for real-time ingestion and analysis. The Storage Write API provides high-throughput streaming into BigQuery and Lakehouse for Apache Iceberg tables with exactly-once delivery semantics and stream-level transactions. Additionally, BigQuery continuous queries enable real-time AI inference directly within the data pipeline by calling generative functions like AI.GENERATE_TEXT, allowing for immediate insights as data is ingested.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Bigtable&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;: Google’s NoSQL real-time database for processing streaming data from Pub/Sub and Dataflow automatically using  &lt;/span&gt;&lt;a href="https://docs.cloud.google.com/bigtable/docs/continuous-materialized-views"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;continuous materialized views&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, delivering results in seconds that are ready for low-latency serving using Bigtable’s &lt;/span&gt;&lt;a href="https://docs.cloud.google.com/bigtable/docs/in-memory-overview"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;in-memory tier&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Moving from insight to autonomous action&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;At Google Cloud Next, we announced a set of &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;streaming AI capabilities&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; for the &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;Agentic Data Cloud&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; that give autonomous agents instant, real-time context and let them act on it in real time.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;For instance, imagine a &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;supply chain agent&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; that doesn't just monitor IoT data, but autonomously reroutes a shipment around bad weather, confirms new delivery windows with the receiving warehouse, and updates the customer's portal — all before a human supervisor is even aware of the problem. Consider a &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;financial services agent&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; that identifies a fraudulent transaction pattern, instantly freezes the account, communicates with the customer via their preferred channel, and initiates a new card shipment — all within seconds of the suspicious activity. Whether you’re creating embeddings on streaming data to power search, or building a sophisticated multi-agent fraud detection system, these new capabilities add powerful tools to your toolbox. &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Let’s take a closer look at these new capabilities. &lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;New streaming AI capabilities&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;At Next ’26, we added tightly integrated capabilities to our platform in three key areas:&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-image_full_width"&gt;






  
    &lt;div class="article-module h-c-page"&gt;
      &lt;div class="h-c-grid"&gt;
  

    &lt;figure class="article-image--large
      
      
        h-c-grid__col
        h-c-grid__col--6 h-c-grid__col--offset-3
        
        
      "
      &gt;

      
      
        
        &lt;img
            src="https://storage.googleapis.com/gweb-cloudblog-publish/images/image1_rME1Dt8.max-1000x1000.png"
        
          alt="image1"&gt;
        
      
    &lt;/figure&gt;

  
      &lt;/div&gt;
    &lt;/div&gt;
  




&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;1. &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;Provide real-time, enriched context for agents&lt;/strong&gt;&lt;/p&gt;
&lt;p role="presentation" style="padding-left: 40px;"&gt;1.1. &lt;a href="https://docs.cloud.google.com/pubsub/docs/smts/ai-inference-smt"&gt;&lt;strong style="text-decoration: underline; vertical-align: baseline;"&gt;Pub/Sub AI Inference SMT&lt;/strong&gt;&lt;/a&gt;&lt;strong style="vertical-align: baseline;"&gt; &lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;(GA)&lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;: &lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;With this single message transform (SMT), you can now run inference on messages streamed through Pub/Sub. Data practitioners can choose any model available on Gemini Enterprise Agent Platform. Pub/Sub makes the inference call and appends the result to each message before sending it downstream, combining Pub/Sub’s simplicity with Gemini Enterprise’s fully managed tools.&lt;/span&gt;&lt;/p&gt;
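&lt;p style="padding-left: 40px;"&gt;&lt;span style="vertical-align: baseline;"&gt;Conceptually, the transform performs one model call per message and enriches the message before delivery. The plain-Python sketch below only illustrates that flow under stated assumptions; it is not the Pub/Sub SMT configuration or API. The &lt;/span&gt;&lt;code style="vertical-align: baseline;"&gt;Message&lt;/code&gt;&lt;span style="vertical-align: baseline;"&gt; shape, the &lt;/span&gt;&lt;code style="vertical-align: baseline;"&gt;fake_model&lt;/code&gt;&lt;span style="vertical-align: baseline;"&gt; stub, and the &lt;/span&gt;&lt;code style="vertical-align: baseline;"&gt;inference_result&lt;/code&gt;&lt;span style="vertical-align: baseline;"&gt; output field are hypothetical stand-ins.&lt;/span&gt;&lt;/p&gt;
&lt;pre&gt;&lt;code class="lang-python"&gt;
```python
import json
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class Message:
    """Simplified stand-in for a Pub/Sub message: a byte payload plus string attributes."""
    data: bytes
    attributes: Dict[str, str] = field(default_factory=dict)


def fake_model(prompt: str) -> str:
    """Hypothetical stand-in for a hosted model; a real setup would call a model endpoint."""
    return "negative" if "broken" in prompt.lower() else "positive"


def apply_inference(msg: Message, model: Callable[[str], str], prompt_field: str = "text") -> Message:
    """Call the model on one field of the JSON payload and append the result to the payload."""
    payload = json.loads(msg.data.decode("utf-8"))
    payload["inference_result"] = model(payload[prompt_field])  # hypothetical output field name
    return Message(data=json.dumps(payload).encode("utf-8"), attributes=dict(msg.attributes))


incoming = Message(data=b'{"text": "The hinge arrived broken."}', attributes={"source": "reviews"})
enriched = apply_inference(incoming, fake_model)
print(json.loads(enriched.data))
```
&lt;/code&gt;&lt;/pre&gt;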
&lt;p role="presentation" style="padding-left: 40px;"&gt;1.2. &lt;a href="https://docs.cloud.google.com/pubsub/docs/bigtable-subscriptions"&gt;&lt;strong style="text-decoration: underline; vertical-align: baseline;"&gt;Pub/Sub Bigtable subscriptions&lt;/strong&gt;&lt;/a&gt;&lt;strong style="vertical-align: baseline;"&gt; &lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;(Preview)&lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Stream Pub/Sub data directly to Bigtable. Pub/Sub Bigtable subscriptions directly materialize event data from a Pub/Sub topic into a Bigtable table, eliminating the need for custom pipelines and dramatically simplifying your streaming architecture. For instance, you can easily ingest vector embeddings into Bigtable to power semantic search workloads. &lt;/span&gt;&lt;/p&gt;
&lt;p role="presentation" style="padding-left: 40px;"&gt;1.3. &lt;a href="https://docs.cloud.google.com/bigquery/docs/continuous-queries#stateful_processing_with_joins_and_windowing_aggregations"&gt;&lt;strong style="text-decoration: underline; vertical-align: baseline;"&gt;BigQuery continuous queries stateful data processing&lt;/strong&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; (Preview): &lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;BigQuery continuous queries can now perform complex correlations between multiple data streams using JOINs and calculate metrics over consistent time intervals with tumbling window aggregations. This enables sophisticated analysis, such as calculating 30-minute averages or correlating events across different streams, directly as data is ingested into BigQuery. Furthermore, you can integrate AI directly into your data pipelines by calling generative functions like AI.GENERATE_TEXT, as well as materialize continuous query SQL results into BigQuery tables or export them to operational sinks like Bigtable, Spanner, and Pub/Sub for real-time reverse ETL.&lt;/span&gt;&lt;/p&gt;
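&lt;p style="padding-left: 40px;"&gt;&lt;span style="vertical-align: baseline;"&gt;A tumbling window splits time into fixed-size, non-overlapping intervals and assigns each event to exactly one of them. As an illustration of those semantics only (this is not BigQuery continuous query syntax), the plain-Python sketch below computes 30-minute averages. The event tuples are made up, and aligning windows to the Unix epoch is an assumption of this sketch.&lt;/span&gt;&lt;/p&gt;
&lt;pre&gt;&lt;code class="lang-python"&gt;
```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone
from typing import Dict, Iterable, Tuple

WINDOW = timedelta(minutes=30)
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def window_start(ts: datetime, size: timedelta = WINDOW) -> datetime:
    """Align a timestamp to the start of its fixed, non-overlapping window."""
    return ts - (ts - EPOCH) % size


def tumbling_averages(events: Iterable[Tuple[datetime, float]]) -> Dict[datetime, float]:
    """Average event values per 30-minute tumbling window, keyed by window start."""
    sums: Dict[datetime, float] = defaultdict(float)
    counts: Dict[datetime, int] = defaultdict(int)
    for ts, value in events:
        start = window_start(ts)
        sums[start] += value
        counts[start] += 1
    return {start: sums[start] / counts[start] for start in sorted(sums)}


t0 = datetime(2026, 5, 18, 9, 0, tzinfo=timezone.utc)
events = [
    (t0 + timedelta(minutes=5), 10.0),   # falls in the 09:00 window
    (t0 + timedelta(minutes=25), 20.0),  # same window
    (t0 + timedelta(minutes=35), 40.0),  # falls in the 09:30 window
]
print(tumbling_averages(events))
```
&lt;/code&gt;&lt;/pre&gt;
&lt;p style="padding-left: 40px;"&gt;&lt;span style="vertical-align: baseline;"&gt;Because every event lands in exactly one window, each window’s aggregate can be emitted once the window closes, which is what makes tumbling windows a natural fit for continuous processing.&lt;/span&gt;&lt;/p&gt;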
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;2. &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;Direct agents to manage your resources&lt;/strong&gt;&lt;/p&gt;
&lt;p role="presentation" style="padding-left: 40px;"&gt;&lt;span style="vertical-align: baseline;"&gt;2.1. &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;Model Context Protocol (MCP) support for &lt;/strong&gt;&lt;a href="https://docs.cloud.google.com/pubsub/docs/use-pubsub-mcp"&gt;&lt;strong style="text-decoration: underline; vertical-align: baseline;"&gt;Pub/Sub&lt;/strong&gt;&lt;/a&gt;&lt;strong style="vertical-align: baseline;"&gt;, &lt;/strong&gt;&lt;a href="https://docs.cloud.google.com/managed-service-for-apache-kafka/docs/use-managed-service-for-apache-kafka-mcp"&gt;&lt;strong style="text-decoration: underline; vertical-align: baseline;"&gt;Managed Service for Apache Kafka&lt;/strong&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, &lt;/span&gt;&lt;a href="https://docs.cloud.google.com/bigtable/docs/use-bigtable-mcp"&gt;&lt;strong style="text-decoration: underline; vertical-align: baseline;"&gt;Bigtable&lt;/strong&gt;&lt;/a&gt;&lt;strong style="vertical-align: baseline;"&gt;, &lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;and &lt;/span&gt;&lt;a href="https://docs.cloud.google.com/bigquery/docs/use-bigquery-mcp"&gt;&lt;strong style="text-decoration: underline; vertical-align: baseline;"&gt;BigQuery&lt;/strong&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; (GA)&lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Your agents can manage Pub/Sub, Managed Service for Apache Kafka, Bigtable, and BigQuery resources through fully managed MCP endpoints. Agents can also publish messages to Pub/Sub. &lt;/span&gt;&lt;/p&gt;
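&lt;p style="padding-left: 40px;"&gt;&lt;span style="vertical-align: baseline;"&gt;MCP clients talk to servers using JSON-RPC 2.0 messages and invoke a server-side tool with the &lt;/span&gt;&lt;code style="vertical-align: baseline;"&gt;tools/call&lt;/code&gt;&lt;span style="vertical-align: baseline;"&gt; method. The sketch below only builds such a request body. The tool name &lt;/span&gt;&lt;code style="vertical-align: baseline;"&gt;publish_message&lt;/code&gt;&lt;span style="vertical-align: baseline;"&gt; and its arguments are hypothetical placeholders, and transport and authentication are omitted; list the tools your MCP endpoint actually exposes (via &lt;/span&gt;&lt;code style="vertical-align: baseline;"&gt;tools/list&lt;/code&gt;&lt;span style="vertical-align: baseline;"&gt;) before relying on any name.&lt;/span&gt;&lt;/p&gt;
&lt;pre&gt;&lt;code class="lang-python"&gt;
```python
import json
from typing import Any, Dict


def tools_call_request(request_id: int, tool_name: str, arguments: Dict[str, Any]) -> str:
    """Build a JSON-RPC 2.0 request asking an MCP server to run one tool."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    })


# Hypothetical tool name and arguments, for illustration only.
body = tools_call_request(
    1,
    "publish_message",
    {"topic": "projects/my-project/topics/alerts", "data": "shipment rerouted"},
)
print(body)
```
&lt;/code&gt;&lt;/pre&gt;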
&lt;p style="padding-left: 40px;"&gt;2.2. &lt;a href="https://adk.dev/integrations/?topic=google" rel="noopener" target="_blank"&gt;&lt;strong style="text-decoration: underline; vertical-align: baseline;"&gt;ADK integration&lt;/strong&gt;&lt;/a&gt;&lt;strong style="vertical-align: baseline;"&gt; &lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;(GA)&lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;: &lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;Your agents can interact with your real-time data stored in Pub/Sub, Bigtable, BigQuery, or other Google Cloud services using pre-built ADK integrations. Developers can build agents acting on real-time context without having to implement complex configurations or plumbing.&lt;/span&gt;&lt;/p&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;3. &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;Combine multi-agent systems with your data processing&lt;/strong&gt;&lt;/p&gt;
&lt;p role="presentation" style="padding-left: 40px;"&gt;&lt;span style="vertical-align: baseline;"&gt;3.1. &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;Event-driven autonomous agents&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;: As agents become core to our workflows, real-time data pipelines must evolve to incorporate them directly into the stream. We have enabled this capability by treating agentic logic as a &lt;/span&gt;&lt;a href="https://beam.apache.org/releases/pydoc/current/apache_beam.ml.inference.agent_development_kit.html" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;first-class citizen&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; within the Dataflow pipeline. You can now incorporate your agent code using the &lt;/span&gt;&lt;a href="https://adk.dev/" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Agent Development Kit (ADK)&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; and deploy it as a specialized node using the &lt;/span&gt;&lt;code style="vertical-align: baseline;"&gt;RunInference&lt;/code&gt;&lt;span style="vertical-align: baseline;"&gt; transform and the new &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;ADKAgentModelHandler&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;. &lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;Key advantages of this approach include:&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li style="list-style-type: none;"&gt;
&lt;ul&gt;
&lt;li role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Massive scalability:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Use Dataflow’s architecture to process high-velocity events upstream and keep hundreds of agent sessions active simultaneously, each driven by specific incoming events.&lt;/span&gt;&lt;/li&gt;
&lt;li role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Pre-processing power:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Dataflow handles the heavy lifting of complex data enrichment, delivering a “ready-to-act” context directly to the agent so it can focus on reasoning.&lt;/span&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
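&lt;p style="padding-left: 40px;"&gt;&lt;span style="vertical-align: baseline;"&gt;The pattern above (enrich events upstream, then route each event to the agent session for its entity) can be sketched in plain Python. This only illustrates the data flow under stated assumptions: it does not use the Beam &lt;/span&gt;&lt;code style="vertical-align: baseline;"&gt;RunInference&lt;/code&gt;&lt;span style="vertical-align: baseline;"&gt; transform or the ADK APIs, and &lt;/span&gt;&lt;code style="vertical-align: baseline;"&gt;enrich&lt;/code&gt;&lt;span style="vertical-align: baseline;"&gt;, &lt;/span&gt;&lt;code style="vertical-align: baseline;"&gt;StubAgentSession&lt;/code&gt;&lt;span style="vertical-align: baseline;"&gt;, and the event fields are hypothetical.&lt;/span&gt;&lt;/p&gt;
&lt;pre&gt;&lt;code class="lang-python"&gt;
```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List


@dataclass
class StubAgentSession:
    """Hypothetical stand-in for one long-lived agent session tied to a single shipment."""
    entity_id: str
    history: List[dict] = field(default_factory=list)

    def handle(self, context: dict) -> str:
        # A real agent would reason over the context and call tools; this stub applies a fixed rule.
        self.history.append(context)
        if context["weather"] == "storm" and context["delay_hours"] > 12:
            return f"reroute {self.entity_id}"
        return f"monitor {self.entity_id}"


def enrich(event: dict, weather_by_region: Dict[str, str]) -> dict:
    """Upstream enrichment: join the raw event with reference data before any agent sees it."""
    return {**event, "weather": weather_by_region.get(event["region"], "unknown")}


def route(events: Iterable[dict], weather_by_region: Dict[str, str]) -> Dict[str, List[str]]:
    """Send each enriched event to the session for its shipment, creating sessions on demand."""
    sessions: Dict[str, StubAgentSession] = {}
    actions: Dict[str, List[str]] = {}
    for event in events:
        context = enrich(event, weather_by_region)
        key = context["shipment_id"]
        if key not in sessions:
            sessions[key] = StubAgentSession(key)
        actions.setdefault(key, []).append(sessions[key].handle(context))
    return actions


events = [
    {"shipment_id": "s1", "region": "north", "delay_hours": 30},
    {"shipment_id": "s2", "region": "south", "delay_hours": 2},
    {"shipment_id": "s1", "region": "north", "delay_hours": 4},
]
print(route(events, {"north": "storm"}))
```
&lt;/code&gt;&lt;/pre&gt;
&lt;p style="padding-left: 40px;"&gt;&lt;span style="vertical-align: baseline;"&gt;The stub’s decision depends on the &lt;/span&gt;&lt;code style="vertical-align: baseline;"&gt;weather&lt;/code&gt;&lt;span style="vertical-align: baseline;"&gt; field that only the enrichment step adds, which is the point of doing the heavy lifting before the agent. How &lt;/span&gt;&lt;code style="vertical-align: baseline;"&gt;ADKAgentModelHandler&lt;/code&gt;&lt;span style="vertical-align: baseline;"&gt; itself manages sessions is not modeled here.&lt;/span&gt;&lt;/p&gt;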
&lt;p role="presentation" style="padding-left: 40px;"&gt;&lt;span style="vertical-align: baseline;"&gt;3.2. &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;Dataflow unified embedding sinks:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; We are introducing unified embedding generation directly within the data stream to eliminate “context lag”. You can now use Dataflow to transform incoming data into high-dimensional vectors at low latency. These real-time embeddings are then materialized into our expanded suite of high-throughput vector sinks, which now includes &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;Cloud Spanner&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; (featuring its new built-in vector search) and &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;AlloyDB&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;. The result is an up-to-date vector database for semantic search, and an instantly searchable, continuously synchronized long-term memory for autonomous agents that make RAG calls. This feature works with both remote and local models, such as &lt;/span&gt;&lt;a href="https://developers.googleblog.com/en/deploying-embeddinggemma-at-scale-with-dataflow/" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Gemma&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;. &lt;/span&gt;&lt;/p&gt;
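&lt;p style="padding-left: 40px;"&gt;&lt;span style="vertical-align: baseline;"&gt;What matters for agents making RAG calls is that the vector store reflects the latest state of each record. The sketch below illustrates that with an in-memory, keyed vector store: each incoming record is embedded and upserted by key, so an update replaces the stale vector, and queries rank by cosine similarity. &lt;/span&gt;&lt;code style="vertical-align: baseline;"&gt;toy_embed&lt;/code&gt;&lt;span style="vertical-align: baseline;"&gt; is a hypothetical bag-of-words stand-in for a real embedding model such as Gemma, and &lt;/span&gt;&lt;code style="vertical-align: baseline;"&gt;InMemoryVectorStore&lt;/code&gt;&lt;span style="vertical-align: baseline;"&gt; is not the Spanner or AlloyDB API.&lt;/span&gt;&lt;/p&gt;
&lt;pre&gt;&lt;code class="lang-python"&gt;
```python
import math
from collections import Counter
from typing import Dict, List, Tuple

VOCAB = ["delay", "storm", "refund", "fraud", "delivered"]


def toy_embed(text: str) -> List[float]:
    """Hypothetical embedding: L2-normalized keyword counts over a tiny fixed vocabulary."""
    counts = Counter(text.lower().split())
    vec = [float(counts[w]) for w in VOCAB]
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


class InMemoryVectorStore:
    """Keyed store: upserting an existing key overwrites its vector, so results stay current."""

    def __init__(self) -> None:
        self.rows: Dict[str, Tuple[str, List[float]]] = {}

    def upsert(self, key: str, text: str) -> None:
        self.rows[key] = (text, toy_embed(text))

    def search(self, query: str, k: int = 1) -> List[str]:
        """Return the k keys whose vectors have the highest cosine similarity to the query."""
        q = toy_embed(query)  # unit vectors, so the dot product equals cosine similarity
        scored = [(sum(a * b for a, b in zip(q, vec)), key) for key, (_, vec) in self.rows.items()]
        scored.sort(reverse=True)
        return [key for _, key in scored[:k]]


store = InMemoryVectorStore()
store.upsert("order-1", "delay due to storm")
store.upsert("order-2", "refund issued")
store.upsert("order-1", "order delivered")  # newer event replaces the stale vector for order-1
print(store.search("delivered"))
```
&lt;/code&gt;&lt;/pre&gt;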
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;As we continue to build out the platform, customers can expect to see even tighter integrations and more powerful capabilities. We look forward to seeing what you build with these new capabilities.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;</description><pubDate>Mon, 18 May 2026 16:00:00 +0000</pubDate><guid>https://cloud.google.com/blog/products/data-analytics/streaming-ai-news-from-next26/</guid><category>Streaming</category><category>Google Cloud Next</category><category>Data Analytics</category><og xmlns:og="http://ogp.me/ns#"><type>article</type><title>What we announced in streaming AI at Next ‘26</title><description></description><site_name>Google</site_name><url>https://cloud.google.com/blog/products/data-analytics/streaming-ai-news-from-next26/</url></og><author xmlns:author="http://www.w3.org/2005/Atom"><name>Jagdeep Singh</name><title>Director Product Management</title><department></department><company></company></author><author xmlns:author="http://www.w3.org/2005/Atom"><name>Prateek Duble</name><title>Group Product Manager</title><department></department><company></company></author></item><item><title>The power of LLMs on your data, more than two orders of magnitude faster and cheaper</title><link>https://cloud.google.com/blog/products/data-analytics/more-than-100x-faster-and-cheaper-llm-powered-sql-queries-with-proxy-models/</link><description>&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Databases have introduced new AI-powered SQL functions which take natural language instructions as input and are evaluated using LLMs. 
They leverage the power of LLMs to answer new kinds of queries: &lt;/span&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;Which product reviews are negative about durability?&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt; &lt;/span&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;Which customer support tickets have been resolved by providing a workaround?&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt; &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;These new AI functions push the boundaries of what is possible in a SQL query engine by bringing the semantic understanding of LLMs to your data, enabling previously impossible analyses and applications. However, their cost and latency have limited their applicability. LLM invocations can increase overall query latency by 10-100x and cost by roughly 1000x. That is much too slow for operational databases. In analytics, a medium-sized query over 10 to 100 million rows would consume enough tokens to be prohibitively expensive for some applications.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Google Cloud has published a &lt;/span&gt;&lt;a href="https://arxiv.org/abs/2603.15970" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;new paper at SIGMOD&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; where we show how to accelerate and reduce the cost of LLM-powered AI functions by using &lt;/span&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;proxy models&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;. Proxy models are cost-optimized ultra-lightweight models tailored to a specific query (aka prompt) and tuned for your data. They replace the majority of LLM calls during query execution (thus the name proxy model) and can be trained on-the-fly or ahead of time. The fundamental ideas behind proxy models were proposed in &lt;/span&gt;&lt;a href="https://arxiv.org/pdf/2407.09522" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Universal Query Engine (UQE)&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; at NeurIPS 2024 by Google DeepMind.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Our paper shows that proxy models are automatically applicable in many (but not all) cases, sometimes with no loss of quality, sometimes with a minor loss, and occasionally with a gain in quality. BigQuery and AlloyDB already implement this optimization as the optimized mode of AI.IF (&lt;/span&gt;&lt;a href="https://docs.cloud.google.com/bigquery/docs/reference/standard-sql/bigqueryml-syntax-ai-if"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;BigQuery docs&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, &lt;/span&gt;&lt;a href="https://docs.cloud.google.com/alloydb/docs/ai/evaluate-semantic-queries-ai-operators"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;AlloyDB docs&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;) and AI.CLASSIFY (&lt;/span&gt;&lt;a href="https://docs.cloud.google.com/bigquery/docs/reference/standard-sql/bigqueryml-syntax-ai-classify"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;BigQuery docs&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;). This article is a tl;dr of the SIGMOD paper and gives the key intuitions behind three questions: &lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;Why &lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;do proxy models work so accurately for so many cases, even though they are so much more performant than LLMs? &lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;How&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt; do they work?&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;In which &lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;use cases do they deliver accurate answers, and in which do they fail, so that accuracy requires an LLM?&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;&lt;span style="vertical-align: baseline;"&gt;Why Do Proxy Models Work Accurately at Ultra-Low Latency and Cost?&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;How can an ultra-lightweight proxy model, such as the logistic regression currently used in BigQuery and AlloyDB, have the LLM-level semantic understanding that accurate question answering requires? The key intuition is that these proxy models take as input rich embeddings of the data they query. By default, we use the &lt;/span&gt;&lt;a href="https://arxiv.org/abs/2503.07891" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Gemini embedding generators&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, which do the heavy lifting of bringing semantics to your data when the embeddings are generated. &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;The ultra-low latency and cost then follow naturally: since embeddings are generated once and used many times, the cost of bringing semantics to your data is amortized; it is paid once rather than on every query. Furthermore, the proxy models run fast on CPUs, with no need for dedicated hardware.&lt;/span&gt;&lt;/p&gt;
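&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;A back-of-the-envelope count makes the amortization argument concrete. The numbers below are illustrative assumptions, not measurements from the paper: a 10-million-row table, about one thousand LLM-labeled training rows (the sample size used in the walkthrough in this post), and an assumed one-thousand-row test sample.&lt;/span&gt;&lt;/p&gt;
&lt;pre&gt;&lt;code class="lang-python"&gt;
```python
def llm_calls_full_scan(rows: int) -> int:
    """Evaluating an AI function with the LLM alone costs one LLM call per row."""
    return rows


def llm_calls_with_proxy(train_rows: int, test_rows: int) -> int:
    """With a proxy, only the training and test samples go to the LLM; every other row
    is scored by the lightweight proxy over embeddings that were computed once."""
    return train_rows + test_rows


rows = 10_000_000                     # assumed table size
train_rows, test_rows = 1_000, 1_000  # assumed sample sizes
full = llm_calls_full_scan(rows)
proxy = llm_calls_with_proxy(train_rows, test_rows)
print(f"{full:,} vs {proxy:,} LLM calls, about {full // proxy:,}x fewer")
# 10,000,000 vs 2,000 LLM calls, about 5,000x fewer
```
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;The count leaves out the one-time cost of generating the embeddings, which is shared by every query over the column, and the per-row cost of the proxy itself, which for logistic regression is a dot product on the CPU.&lt;/span&gt;&lt;/p&gt;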
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;We hope this gives you a good intuition for why proxy models work. But a word of caution is also needed: proxy models are fundamentally an approximation technique, more limited than LLMs. They perform well on some prompts but may fall short of LLMs on others. Case in point, the SIGMOD26 paper shows that the ratio of proxy to LLM predictive performance (as measured by F1) ranged from 90% to 116% across 11 benchmarks. Proxies may break down, for example, on problems that require reasoning to connect multiple semantic concepts. Think of them not as a general replacement for an LLM, but as a model specialized to your query and your data. &lt;/span&gt;&lt;/p&gt;
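&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;For readers less familiar with the metric: F1 is the harmonic mean of precision and recall, and the ratio quoted above divides the proxy’s F1 by the LLM’s F1 on the same benchmark. The precision and recall values below are made up purely to show the arithmetic.&lt;/span&gt;&lt;/p&gt;
&lt;pre&gt;&lt;code class="lang-python"&gt;
```python
def f1(precision: float, recall: float) -> float:
    """F1 score: the harmonic mean of precision and recall (0 if both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Made-up precision/recall values for one benchmark, for illustration only.
llm_f1 = f1(precision=0.80, recall=0.90)
proxy_f1 = f1(precision=0.78, recall=0.86)
ratio = proxy_f1 / llm_f1
print(f"proxy/LLM F1 ratio: {ratio:.0%}")
```
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Here the proxy retains about 97% of the LLM’s F1; a ratio above 100% would mean the proxy scored higher than the LLM.&lt;/span&gt;&lt;/p&gt;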
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;The good news is that the query engines automatically check whether implementing an AI function with a proxy is feasible and effective. Let’s see how they do it. &lt;/span&gt;&lt;/p&gt;
&lt;h2&gt;&lt;span style="vertical-align: baseline;"&gt;How Do Proxy Models Work?&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Let’s go through a simple example of a semantic filter (AI.IF). Our taste in movies is very particular: We like movies with an interesting plot and great cinematography. The query below processes IMDB reviews to find such movies.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-code"&gt;&lt;dl&gt;
    &lt;dt&gt;code_block&lt;/dt&gt;
    &lt;dd&gt;&lt;pre&gt;&lt;code class="lang-sql"&gt;SELECT
  DISTINCT t.primary_title
FROM
  bigquery-public-data.imdb.reviews r,
  bigquery-public-data.imdb.title_basics t
WHERE TRUE
  AND r.movie_id = t.tconst
  AND AI.IF("Is the plot interesting? Review: " || r.review,
            embeddings =&amp;gt; r.review_embedded)
  AND AI.IF("Does the review praise the cinematography? Review: " || r.review,
            embeddings =&amp;gt; r.review_embedded)&lt;/code&gt;&lt;/pre&gt;&lt;/dd&gt;
&lt;/dl&gt;&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;The column &lt;/span&gt;&lt;code style="vertical-align: baseline;"&gt;review&lt;/code&gt;&lt;span style="vertical-align: baseline;"&gt; contains the free-form text of the review. The column &lt;/span&gt;&lt;code style="vertical-align: baseline;"&gt;review_embedded&lt;/code&gt;&lt;span style="vertical-align: baseline;"&gt; contains Gemini embeddings of the review text. When you run this query in BigQuery, the query engine will:&lt;/span&gt;&lt;/p&gt;
&lt;ol&gt;
&lt;li aria-level="1" style="list-style-type: decimal; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;For the first AI.IF, create a training sample of about one thousand rows from the input relation, the &lt;/span&gt;&lt;code style="vertical-align: baseline;"&gt;imdb.reviews&lt;/code&gt;&lt;span style="vertical-align: baseline;"&gt; table.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: decimal; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;Use an LLM to label the training sample, marking each review as either TRUE (yes, the plot is interesting) or FALSE (no, the plot is not interesting).&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: decimal; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;Train a proxy model for the first AI.IF using the labels computed at the previous step.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: decimal; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;Create a test sample set of rows for the first AI.IF and evaluate the quality of the proxy model on this test set.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: decimal; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;Based on the evaluation results, adaptively decide either to perform inference for the first AI.IF using the proxy model or to fall back to LLM inference.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: decimal; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;Repeat the above steps for the second AI.IF.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;&lt;/div&gt;
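&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;The steps above can be sketched end to end in plain Python. This is a toy illustration under stated assumptions, not BigQuery’s or AlloyDB’s implementation: the “embeddings” are synthetic 4-dimensional vectors, &lt;/span&gt;&lt;code style="vertical-align: baseline;"&gt;llm_label&lt;/code&gt;&lt;span style="vertical-align: baseline;"&gt; is a hypothetical stand-in for the LLM labeler, the logistic regression is trained with plain gradient descent, and the 0.85 F1 acceptance threshold is an arbitrary choice for the example.&lt;/span&gt;&lt;/p&gt;
&lt;pre&gt;&lt;code class="lang-python"&gt;
```python
import math
import random
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
Model = Tuple[Vector, float]


def llm_label(embedding: Sequence[float]) -> bool:
    """Hypothetical stand-in for the LLM labeler: the 'concept' lives along dimension 0."""
    return embedding[0] > 0.0


def train_logistic(xs: List[Vector], ys: List[bool], epochs: int = 100, lr: float = 0.5) -> Model:
    """Fit a logistic-regression proxy with full-batch gradient descent (step 3)."""
    w, b, n = [0.0] * len(xs[0]), 0.0, len(xs)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for x, y in zip(xs, ys):
            z = sum(wi * xi for wi, xi in zip(w, x)) + b
            err = 1.0 / (1.0 + math.exp(-z)) - (1.0 if y else 0.0)
            grad_w = [g + err * xi for g, xi in zip(grad_w, x)]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(model: Model, x: Sequence[float]) -> bool:
    """The proxy answers TRUE when the embedding lies on the positive side of the hyperplane."""
    w, b = model
    return sum(wi * xi for wi, xi in zip(w, x)) + b > 0.0


def f1_score(pred: List[bool], gold: List[bool]) -> float:
    tp = sum(p and g for p, g in zip(pred, gold))
    fp = sum(p and not g for p, g in zip(pred, gold))
    fn = sum(g and not p for p, g in zip(pred, gold))
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


def choose_strategy(rows: List[Vector], label: Callable[[Sequence[float]], bool],
                    sample_size: int = 1000, threshold: float = 0.85, seed: int = 0) -> str:
    """Steps 1-5 for one AI.IF: sample, label, train, evaluate, then pick proxy or LLM."""
    rng = random.Random(seed)
    sample = rng.sample(rows, 2 * sample_size)
    train, test = sample[:sample_size], sample[sample_size:]
    model = train_logistic(train, [label(x) for x in train])   # steps 1-3
    gold = [label(x) for x in test]                            # LLM labels for the test sample
    score = f1_score([predict(model, x) for x in test], gold)  # step 4
    return "proxy" if score >= threshold else "llm"            # step 5


data_rng = random.Random(42)
rows = [[data_rng.uniform(-1.0, 1.0) for _ in range(4)] for _ in range(5000)]
print(choose_strategy(rows, llm_label, sample_size=300))
print(choose_strategy(rows, lambda x: x[0] * x[1] > 0, sample_size=300))
```
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;On this synthetic data, the first call should keep the proxy. The XOR-like concept in the second call (TRUE when the first two coordinates share a sign) cannot be separated by a single hyperplane, so the sketch falls back to the LLM, mirroring the kind of limitation the post discusses for prompts that go beyond detecting a single semantic notion.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;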
&lt;div class="block-image_full_width"&gt;






  
    &lt;div class="article-module h-c-page"&gt;
      &lt;div class="h-c-grid"&gt;
  

    &lt;figure class="article-image--large
      
      
        h-c-grid__col
        h-c-grid__col--6 h-c-grid__col--offset-3
        
        
      "
      &gt;

      
      
        
        &lt;img
            src="https://storage.googleapis.com/gweb-cloudblog-publish/images/1_VsHiEj1.max-1000x1000.jpg"
        
          alt="1"&gt;
        
      
    &lt;/figure&gt;

  
      &lt;/div&gt;
    &lt;/div&gt;
  




&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;In BigQuery, all steps happen on the fly during query execution. AlloyDB, an operational database that targets sub-second latencies, avoids training and evaluating proxy models online. Instead, the query’s proxy models are computed ahead of time in a PREPARE statement, which moves the cost of sampling, labeling, and training out of the critical query path. This lets you create a large pool of prepared statements offline; the application then picks the appropriate one and executes it on the online path.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Let’s take a step back and look at what is really happening in step 3. The proxy model uses each dimension of the review embeddings (from &lt;/span&gt;&lt;code style="vertical-align: baseline;"&gt;review_embedded&lt;/code&gt;&lt;span style="vertical-align: baseline;"&gt;) as a feature. Modern dense embedding models like Gecko or Gemini capture a myriad of semantic notions. In our movie-review example, at a high level of abstraction, relevant notions would include “aesthetic”, “thought-provoking plot”, “underwhelming plot”, or perhaps “boring movie”. We stress “high level of abstraction” because, in the numeric “language” of foundation models, all these notions (and many more) are spread across the numbers of the dense embedding. Do not expect to find a single dimension that corresponds directly to cinematography. Importantly, the embedding space also contains many notions that are irrelevant to our task. Training the proxy model essentially gives heavy weight to the relevant notions and discounts the irrelevant ones.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-image_full_width"&gt;






  
    &lt;div class="article-module h-c-page"&gt;
      &lt;div class="h-c-grid"&gt;
  

    &lt;figure class="article-image--large
      
      
        h-c-grid__col
        h-c-grid__col--6 h-c-grid__col--offset-3
        
        
      "
      &gt;

      
      
        
        &lt;img
            src="https://storage.googleapis.com/gweb-cloudblog-publish/images/2_NyftwXO.max-1000x1000.png"
        
          alt="2"&gt;
        
      
        &lt;figcaption class="article-image__caption "&gt;&lt;p data-block-key="3w3bd"&gt;A proxy model (green plane) isolating relevant semantic notions by cutting the embedding space (blue sphere)&lt;/p&gt;&lt;/figcaption&gt;
      
    &lt;/figure&gt;

  
      &lt;/div&gt;
    &lt;/div&gt;
  




&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Now let’s look at the specific proxy model used in our current version: logistic regression. To visualize what is happening, think of the embeddings as unit vectors lying on a (hyper)sphere. For a binary classification task, the proxy model essentially cuts the sphere in two with a (hyper)plane. In our example, “aesthetic” and “thought-provoking plot” would fall on one side of the plane, whereas “underwhelming plot” and “boring movie” would fall on the other. Conceptually, the orientation of the plane determines which semantic notions are more relevant. &lt;/span&gt;&lt;/p&gt;
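&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;In symbols (our own notation, not taken from the paper): for a unit-norm review embedding e, a logistic-regression proxy with learned weights w and bias b estimates the probability that the answer is TRUE and predicts TRUE exactly when the score is positive. Its decision boundary is therefore a hyperplane, and its intersection with the unit sphere is the cut pictured above. The plane passes through the center of the sphere only when b = 0, so the two sides need not be equal halves.&lt;/span&gt;&lt;/p&gt;
&lt;pre&gt;&lt;code class="lang-latex"&gt;
```latex
\begin{gathered}
p(\mathrm{TRUE} \mid \mathbf{e}) = \sigma\left(\mathbf{w}^{\top}\mathbf{e} + b\right),
\qquad \sigma(z) = \frac{1}{1 + \exp(-z)},
\qquad \lVert \mathbf{e} \rVert_2 = 1 \\
\text{predict TRUE} \iff \mathbf{w}^{\top}\mathbf{e} + b > 0,
\qquad \text{boundary: } \{\mathbf{e} : \mathbf{w}^{\top}\mathbf{e} + b = 0\}
\end{gathered}
```
&lt;/code&gt;&lt;/pre&gt;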
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Importantly, the proxy model is tuned for your data and your question: it is trained on a sample of your data that a high-quality LLM has labeled for that particular question. &lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Revisiting When Proxy Models Work&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;We can now see more clearly what distinguishes the cases where proxy models work from those where they don’t: proxy models work well for prompts that can be decided by detecting semantic notions in the embedding space. They fail on complex prompts that require forms of reasoning beyond detecting patterns in the embedding space.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;The good news is that, in practice, we have observed that proxy models work for a large class of AI+SQL queries. The &lt;/span&gt;&lt;a href="https://arxiv.org/abs/2603.15970" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;SIGMOD26 paper&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; provides a comprehensive evaluation showing that proxies worked in 11 benchmarks. Specifically, in 10 benchmarks the ratio of proxy F1 to LLM F1 ranged from 90% to 102%, and in the 11th benchmark (Amazon Reviews) it was 116%. The proxy may even deliver better accuracy because it benefits from being trained on many labeled samples, whereas the LLM addresses each row as a new problem.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;A second limitation currently applies: extreme selectivity. Recall that step 1 collects samples, and it needs many examples of both TRUE and FALSE. Multiple sophisticated techniques are employed to achieve this, even when TRUEs greatly outnumber FALSEs or vice versa. However, no purely sampling-based technique can handle extreme selectivity, i.e., cases with very few TRUEs or very few FALSEs. This is why proxies are not used in such cases. That said, the problem can in principle be addressed by various techniques. &lt;/span&gt;&lt;/p&gt;
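&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;A quick calculation shows why uniform sampling struggles at extreme selectivity. If a fraction s of the rows is TRUE and n rows are sampled uniformly, the sample is expected to contain n·s TRUE rows, and with probability (1 − s)^n it contains none at all. This treats draws as independent, a good approximation when the table is much larger than the sample. The sample size and selectivities below are illustrative.&lt;/span&gt;&lt;/p&gt;
&lt;pre&gt;&lt;code class="lang-python"&gt;
```python
def expected_positives(n: int, selectivity: float) -> float:
    """Expected number of TRUE rows in a uniform sample of n rows."""
    return n * selectivity


def prob_no_positives(n: int, selectivity: float) -> float:
    """Probability that n independent draws contain no TRUE row at all."""
    return (1.0 - selectivity) ** n


for s in (0.5, 0.01, 0.001):
    print(s, expected_positives(1000, s), round(prob_no_positives(1000, s), 4))
```
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;At 0.1% selectivity, a one-thousand-row sample is expected to hold a single TRUE example and has roughly a 37% chance of holding none, far too few to train a classifier.&lt;/span&gt;&lt;/p&gt;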
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Why Isn’t Vector Search Enough?&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Proxy models appear … suspiciously close to vector search. After all, they also take vector embeddings as input. Why not just use vector search? There are two reasons why vector search is not enough. The obvious one is that proxies are not rankers; they are classifiers: multiclass classifiers (AI.CLASSIFY) or binary classifiers (AI.IF). But even if you narrow down to just AI.IF, simulating AI.IF with vector search is both hard to set up and gives suboptimal results, because proxy models are tailored to your data and your prompts, whereas vector search relies on generic distance functions (such as cosine)&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;. &lt;/span&gt;&lt;/p&gt;
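&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;To make the ranker-versus-classifier contrast concrete: vector search ranks rows by a fixed similarity to a query vector and returns a fixed number of them, so turning it into a yes/no filter means choosing k (or a similarity cutoff) by hand. A classifier instead keeps every row on the positive side of a learned boundary. The toy 2-D vectors, query vector, and weights below are made up purely for illustration.&lt;/span&gt;&lt;/p&gt;
&lt;pre&gt;&lt;code class="lang-python"&gt;
```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def top_k(rows: Dict[str, List[float]], query: List[float], k: int) -> List[str]:
    """Vector search: rank by a generic similarity, then truncate at a hand-picked k."""
    return sorted(rows, key=lambda r: cosine(rows[r], query), reverse=True)[:k]


def classify(rows: Dict[str, List[float]], w: Tuple[float, float], b: float) -> List[str]:
    """Proxy-style filter: keep every row on the positive side of a (learned) hyperplane."""
    return [r for r, x in rows.items() if w[0] * x[0] + w[1] * x[1] + b > 0]


rows = {"a": [0.9, 0.1], "b": [0.7, 0.7], "c": [0.6, -0.8], "d": [-0.5, 0.9]}
print(top_k(rows, query=[1.0, 0.0], k=2))
print(classify(rows, w=(1.0, 0.0), b=-0.2))
```
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;With k = 2, vector search drops row c even though it lies on the positive side of the classifier’s boundary; a different k would change the answer set, whereas the classifier decides each row on its own.&lt;/span&gt;&lt;/p&gt;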
&lt;h2&gt;&lt;strong style="vertical-align: baseline;"&gt;Experimental Results&lt;/strong&gt;&lt;/h2&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Below we present a representative subset of benchmarks from &lt;/span&gt;&lt;a href="https://arxiv.org/abs/2603.15970" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;the SIGMOD26 paper&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, comparing the accuracy (F1) of proxy models against running LLM inference on all rows. The relative accuracy ranges from 0.92 (lowest) to 1.16 (highest), which means that for some tasks, proxy models perform better than direct LLM inference. &lt;br/&gt;&lt;br/&gt;&lt;/span&gt;&lt;/p&gt;
&lt;div align="left"&gt;
&lt;div style="color: #5f6368; overflow-x: auto; overflow-y: hidden; width: 100%;"&gt;
&lt;div style="color: #5f6368; overflow-x: auto; overflow-y: hidden; width: 100%;"&gt;
&lt;div style="color: #5f6368; overflow-x: auto; overflow-y: hidden; width: 100%;"&gt;
&lt;div style="color: #5f6368; overflow-x: auto; overflow-y: hidden; width: 100%;"&gt;
&lt;div style="color: #5f6368; overflow-x: auto; overflow-y: hidden; width: 100%;"&gt;
&lt;div style="color: #5f6368; overflow-x: auto; overflow-y: hidden; width: 100%;"&gt;&lt;table&gt;&lt;colgroup&gt;&lt;col/&gt;&lt;col/&gt;&lt;col/&gt;&lt;col/&gt;&lt;col/&gt;&lt;/colgroup&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Dataset&lt;/strong&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Prompt&lt;/strong&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;F1 (Proxy)&lt;/strong&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;F1 (LLM)&lt;/strong&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Relative (Proxy/LLM)&lt;/strong&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Amazon Reviews 10k &lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Review is {sentiment label}&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p style="text-align: center;"&gt;&lt;span style="vertical-align: baseline;"&gt;0.860 &lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p style="text-align: center;"&gt;&lt;span style="vertical-align: baseline;"&gt;0.739 &lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p style="text-align: center;"&gt;&lt;span style="vertical-align: baseline;"&gt;1.163&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Banking77 &lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Is intent {intent label}? Think step-by-step: {CoT instructions}&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p style="text-align: center;"&gt;&lt;span style="vertical-align: baseline;"&gt;0.700 &lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p style="text-align: center;"&gt;&lt;span style="vertical-align: baseline;"&gt;0.707 &lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p style="text-align: center;"&gt;&lt;span style="vertical-align: baseline;"&gt;0.990&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;California Housing&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Location in Latitude &amp;amp; Longitude belongs to Southern California&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p style="text-align: center;"&gt;&lt;span style="vertical-align: baseline;"&gt;0.953 &lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p style="text-align: center;"&gt;&lt;span style="vertical-align: baseline;"&gt;0.953&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p style="text-align: center;"&gt;&lt;span style="vertical-align: baseline;"&gt;1.000&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;FEVER&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Is the claim supported by the text?&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p style="text-align: center;"&gt;&lt;span style="vertical-align: baseline;"&gt;0.782 &lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p style="text-align: center;"&gt;&lt;span style="vertical-align: baseline;"&gt;0.853 &lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p style="text-align: center;"&gt;&lt;span style="vertical-align: baseline;"&gt;0.917&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
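&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;For reference, F1 is the harmonic mean of precision and recall, and the last column is proxy F1 divided by LLM F1. The short sketch below recomputes that column from the rounded F1 scores displayed in the table; the helper names are our own. Because the displayed scores are rounded to three decimals, a recomputed ratio can differ in the last digit from the table, which was presumably computed from unrounded scores (Amazon Reviews comes out as 1.164 rather than 1.163).&lt;/span&gt;&lt;/p&gt;

```python
"""Recompute the Relative (Proxy/LLM) column from the table's rounded F1 scores."""

# (proxy F1, LLM F1) as displayed in the benchmark table.
F1_SCORES = {
    "Amazon Reviews 10k": (0.860, 0.739),
    "Banking77": (0.700, 0.707),
    "California Housing": (0.953, 0.953),
    "FEVER": (0.782, 0.853),
}


def f1_from_counts(tp: int, fp: int, fn: int) -> float:
    """F1 = 2PR / (P + R), written equivalently as 2TP / (2TP + FP + FN)."""
    return 2 * tp / (2 * tp + fp + fn)


def relative_f1(proxy_f1: float, llm_f1: float) -> float:
    """Values above 1.0 mean the proxy beat full LLM inference on F1."""
    return proxy_f1 / llm_f1


if __name__ == "__main__":
    for name, (proxy, llm) in F1_SCORES.items():
        print(f"{name}: {relative_f1(proxy, llm):.3f}")
```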
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;In terms of scalability and cost, the architectural differences between BigQuery and AlloyDB lead to slightly different results for each system. At a high level, proxy models move part of the computation from the specialized hardware used by LLM inference services to ordinary database workers, which greatly reduces both cost and query latency. In the online training case, used by BigQuery, proxy models consume about 400x fewer tokens for a typical one-million-row query, and latency drops by 30x to 100x. In AlloyDB, the LLM cost of PREPARE, which is similar to BigQuery’s, can be amortized over arbitrarily many runs of the prepared statements that invoke proxy models.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
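&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;The amortization argument can be made concrete with simple arithmetic. The sketch below uses hypothetical numbers (200 LLM tokens per row, 100 runs of the prepared statement) and assumes, for simplicity, that all LLM tokens are spent during PREPARE at roughly the 400x reduction quoted above for a one-million-row query, and that later runs use only the proxy model, ignoring any embedding cost.&lt;/span&gt;&lt;/p&gt;

```python
"""Back-of-the-envelope token accounting for amortized proxy training.

All inputs are hypothetical; only the ~400x ratio comes from the text.
"""


def full_inference_tokens(rows: int, tokens_per_row: int) -> int:
    """LLM tokens if every row is sent to the LLM on every run."""
    return rows * tokens_per_row


def amortized_tokens_per_run(prepare_tokens: float, runs: int) -> float:
    """LLM tokens per run when PREPARE's cost is spread over `runs` runs.

    Assumes later runs invoke only the proxy model (no LLM tokens).
    """
    if runs >= 1:
        return prepare_tokens / runs
    raise ValueError("runs must be at least 1")


if __name__ == "__main__":
    rows, tokens_per_row, runs = 1_000_000, 200, 100  # hypothetical
    baseline = full_inference_tokens(rows, tokens_per_row)
    prepare = baseline / 400  # assume ~400x fewer tokens, as in the online case
    per_run = amortized_tokens_per_run(prepare, runs)
    print(f"full LLM inference per run: {baseline:,} tokens")
    print(f"PREPARE (one-off):          {prepare:,.0f} tokens")
    print(f"amortized over {runs} runs:   {per_run:,.0f} tokens per run")
    print(f"reduction per run:          {baseline / per_run:,.0f}x")
```

&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;With these assumptions, the per-run LLM cost falls from 200 million tokens to 5,000 tokens, a 40,000x reduction that keeps growing with the number of runs. The point is the shape of the curve, not the specific numbers.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;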
&lt;div class="block-image_full_width"&gt;






  
    &lt;div class="article-module h-c-page"&gt;
      &lt;div class="h-c-grid"&gt;
  

    &lt;figure class="article-image--large
      
      
        h-c-grid__col
        h-c-grid__col--6 h-c-grid__col--offset-3
        
        
      "
      &gt;

      
      
        
        &lt;img
            src="https://storage.googleapis.com/gweb-cloudblog-publish/images/3_oF0uTc4.max-1000x1000.png"
        
          alt="Token reduction and query speedup from proxy models for various table sizes"&gt;
        
      
        &lt;figcaption class="article-image__caption "&gt;&lt;p data-block-key="3w3bd"&gt;Cost reduction (tokens consumed) and latency improvement (query speedup) for various table sizes.&lt;/p&gt;&lt;/figcaption&gt;
      
    &lt;/figure&gt;

  
      &lt;/div&gt;
    &lt;/div&gt;
  




&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;h2&gt;&lt;strong style="vertical-align: baseline;"&gt;Conclusion&lt;/strong&gt;&lt;/h2&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;AI functions that call LLMs are becoming commonplace in databases. Choosing the right model for each AI function is an active area of academic research (e.g., &lt;/span&gt;&lt;a href="https://arxiv.org/abs/2509.02896" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;BARGAIN&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;). The key intuition is right-sizing models: cheap, performant models for “easy” problems and powerful reasoning models for hard ones. Our work builds on the same principle, but whereas academic research has so far navigated the performance spectrum using only LLMs, non-LLM proxy models push performance much further with ultra-lightweight, highly specialized models that deliver surprisingly good quality on many problems. Perhaps this should not be surprising: proxy models build on the rich semantics that foundation models encode in embeddings, and they are trained on labels produced by LLMs. As embedding models improve and extract ever richer and finer-grained semantics from text and multimodal data (images, video), we suspect that non-linear classifiers will help identify even more complex semantic patterns, extend the applicability of proxy models further (e.g., to AI joins), and explore additional points on the performance/quality Pareto frontier.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;To learn more, see our &lt;/span&gt;&lt;a href="https://arxiv.org/abs/2603.15970" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;full paper&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, which dives into the differences between online and offline training and compares the performance of different embedding models as well as various proxy models (linear regression, SVM, XGB).&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;You can try proxy models today in BigQuery (&lt;/span&gt;&lt;a href="https://docs.cloud.google.com/bigquery/docs/optimize-ai-functions"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;docs&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;) and AlloyDB (&lt;/span&gt;&lt;a href="https://docs.cloud.google.com/alloydb/docs/ai/accelerate-queries-optimized-functions"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;docs&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;) to dramatically speed up the AI functions in your SQL queries and reduce their token consumption.&lt;/span&gt;&lt;/p&gt;
&lt;hr/&gt;
&lt;p&gt;&lt;sub&gt;&lt;em&gt;&lt;span style="vertical-align: baseline;"&gt;We would like to thank Bo Dai, Yuchen Zhuang, Xingchen Wan, and Dale Schuurmans from Google DeepMind for developing the fundamental principles of proxy models in &lt;/span&gt;&lt;a href="https://arxiv.org/pdf/2407.09522" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;UQE&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; and for their continuous guidance &amp;amp; support along our journey to bring them to Cloud customers. We also thank Yeounoh Chung and Fatma Özcan, our partners in the System Research Group, as well as the AlloyDB and BigQuery engineering teams.&lt;/span&gt;&lt;/em&gt;&lt;/sub&gt;&lt;/p&gt;&lt;/div&gt;</description><pubDate>Wed, 13 May 2026 16:00:00 +0000</pubDate><guid>https://cloud.google.com/blog/products/data-analytics/more-than-100x-faster-and-cheaper-llm-powered-sql-queries-with-proxy-models/</guid><category>AI &amp; Machine Learning</category><category>Databases</category><category>Data Analytics</category><og xmlns:og="http://ogp.me/ns#"><type>article</type><title>The power of LLMs on your data, more than two orders of magnitude faster and cheaper</title><description></description><site_name>Google</site_name><url>https://cloud.google.com/blog/products/data-analytics/more-than-100x-faster-and-cheaper-llm-powered-sql-queries-with-proxy-models/</url></og><author xmlns:author="http://www.w3.org/2005/Atom"><name>Thibaud Hottelier</name><title>Software Engineer</title><department></department><company></company></author><author xmlns:author="http://www.w3.org/2005/Atom"><name>Yannis Papakonstantinou</name><title>Distinguished Engineer</title><department></department><company></company></author></item><item><title>Cloud Storage Rapid: Turbocharged object storage for AI and 
analytics</title><link>https://cloud.google.com/blog/products/storage-data-transfer/cloud-storage-rapid-turbocharges-object-storage-for-ai-analytics/</link><description>&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;At Google Cloud Next ’26, we &lt;/span&gt;&lt;a href="https://cloud.google.com/blog/products/storage-data-transfer/next26-storage-announcements?e=48754805"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;announced&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; Cloud Storage Rapid, a family of object storage capabilities for data-intensive workloads like AI and analytics. At launch, &lt;/span&gt;&lt;a href="https://docs.cloud.google.com/storage/docs/rapid/high-performance-storage"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Cloud Storage Rapid&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; consists of &lt;/span&gt;&lt;a href="https://cloud.google.com/blog/products/storage-data-transfer/how-the-colossus-stateful-protocol-benefits-rapid-storage"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Rapid Bucket&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; (formerly Rapid Storage), a high-performance zonal object storage offering, and &lt;/span&gt;&lt;a href="https://docs.cloud.google.com/storage/docs/rapid/rapid-cache"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Rapid Cache&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; (formerly Anywhere Cache), which accelerates reads on demand and co-locates compute and data for workloads in existing buckets.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-video"&gt;



&lt;div class="article-module article-video "&gt;
  &lt;figure&gt;
    &lt;a class="h-c-video h-c-video--marquee"
      href="https://youtube.com/watch?v=EKjCo-0wXao"
      data-glue-modal-trigger="uni-modal-EKjCo-0wXao-"
      data-glue-modal-disabled-on-mobile="true"&gt;

      
        

        &lt;div class="article-video__aspect-image"
          style="background-image: url(https://storage.googleapis.com/gweb-cloudblog-publish/images/maxresdefault_Xu33ocm.max-1000x1000.jpg);"&gt;
          &lt;span class="h-u-visually-hidden"&gt;Cloud Storage Rapid: Turbocharged object storage for AI and analytics&lt;/span&gt;
        &lt;/div&gt;
      
      &lt;svg role="img" class="h-c-video__play h-c-icon h-c-icon--color-white"&gt;
        &lt;use xlink:href="#mi-youtube-icon"&gt;&lt;/use&gt;
      &lt;/svg&gt;
    &lt;/a&gt;

    
  &lt;/figure&gt;
&lt;/div&gt;

&lt;div class="h-c-modal--video"
     data-glue-modal="uni-modal-EKjCo-0wXao-"
     data-glue-modal-close-label="Close Dialog"&gt;
   &lt;a class="glue-yt-video"
      data-glue-yt-video-autoplay="true"
      data-glue-yt-video-height="99%"
      data-glue-yt-video-vid="EKjCo-0wXao"
      data-glue-yt-video-width="100%"
      href="https://youtube.com/watch?v=EKjCo-0wXao"
      ng-cloak&gt;
   &lt;/a&gt;
&lt;/div&gt;

&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Cloud Storage Rapid is our response to the generational shift in how organizations build with AI. Teams are training trillion-parameter models, deploying inference at global scale, and building autonomous agents that reason over vast amounts of enterprise data. While accelerators like GPUs and TPUs often get the spotlight, they have a critical dependency: storage.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Storage is the engine that feeds accelerators during training, and the fast-access layer that makes real-time inference responsive. But as models scale, storage performance can be a bottleneck. Every time an AI/ML cluster waits on a data read or a checkpoint write stalls, you are paying for expensive compute cycles that aren't doing useful work.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Historically, AI/ML practitioners have had to choose between the specialized performance of a niche, zonal storage system and the reliability and scale of a global object store like &lt;/span&gt;&lt;a href="https://cloud.google.com/storage"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Google Cloud Storage&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;. Many developers value Cloud Storage for its simplicity, scalability, reliability, and cost-effectiveness, but as the AI era has progressed, they’ve been throwing hotter and hotter workloads at it, running training and inference with thousands of GPUs and TPUs. We’ve reached a performance tipping point that traditional object storage was never designed to handle. The Rapid family provides multiple options for co-locating compute workloads directly with high-performance zonal storage. It minimizes the I/O bottlenecks that can block accelerators, so that your GPUs and TPUs stay fully saturated and productive. In this post, let’s take a closer look at Cloud Storage Rapid’s capabilities.&lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Rapid Bucket&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;a href="https://docs.cloud.google.com/storage/docs/rapid/rapid-bucket"&gt;&lt;strong style="text-decoration: underline; vertical-align: baseline;"&gt;Rapid Bucket&lt;/strong&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; (GA) helps Cloud Storage meet the evolving demands of massive-scale generative AI, analytics, and other high-performance workloads. It does so by &lt;/span&gt;&lt;a href="https://cloud.google.com/blog/products/storage-data-transfer/how-the-colossus-stateful-protocol-benefits-rapid-storage"&gt;&lt;strong style="text-decoration: underline; vertical-align: baseline;"&gt;leveraging Colossus&lt;/strong&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, Google’s distributed storage system that powers Gemini and YouTube, to provide massive read/write performance and ultra-low latency in a dedicated zonal object storage bucket.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Lightning-fast performance&lt;br/&gt;&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;By combining the sub-millisecond latency of block-like storage, the throughput of a parallel filesystem, and the scalability and ease of use of object storage, Rapid Bucket provides high performance from the same Cloud Storage that you know and love.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Highlights include:&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Ultra-low latency&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;: Achieve up to &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;20 million queries per second&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; and &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;sub-millisecond latency.&lt;/strong&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Massive scalability&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;: Rapid Bucket delivers &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;15+ TB/s&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; of aggregate read throughput from a single Rapid zonal bucket.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;New semantics:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Enable higher performance with new capabilities such as native appends, unlimited readers (while writing!), and vectored reads.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Optimized for AI and analytics &lt;br/&gt;&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;You can use Rapid Bucket for a variety of demanding scenarios, including AI/ML data preparation, training, checkpointing, batch and streaming analytics processing, and optimizing distributed database architectures.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Key benefits include:&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Optimized accelerator utilization:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; With Rapid Bucket, we observed &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;a 50% reduction in blocked GPU time&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; and up to &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;2.5x faster data loading&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; for multimodal training runs.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Faster checkpointing&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;: Rapid Bucket makes checkpoint restores up to &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;5x faster and checkpoint writes 3.2x faster&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; than traditional object storage. This means faster recovery from workload interruptions, less wasted accelerator time, and higher overall efficiency.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;&amp;gt;5x faster checkpoint restores&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;with Rapid Bucket&lt;/strong&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-image_full_width"&gt;






  
    &lt;div class="article-module h-c-page"&gt;
      &lt;div class="h-c-grid"&gt;
  

    &lt;figure class="article-image--large
      
      
        h-c-grid__col
        h-c-grid__col--6 h-c-grid__col--offset-3
        
        
      "
      &gt;

      
      
        
        &lt;img
            src="https://storage.googleapis.com/gweb-cloudblog-publish/images/1_5x_faster_checkpoint_restores_with_Rapid.max-1000x1000.png"
        
          alt="More than 5x faster checkpoint restores with Rapid Bucket"&gt;
        
      
    &lt;/figure&gt;

  
      &lt;/div&gt;
    &lt;/div&gt;
  




&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;&amp;gt;3.2x faster checkpoint writes with Rapid Bucket&lt;/strong&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-image_full_width"&gt;






  
    &lt;div class="article-module h-c-page"&gt;
      &lt;div class="h-c-grid"&gt;
  

    &lt;figure class="article-image--large
      
      
        h-c-grid__col
        h-c-grid__col--6 h-c-grid__col--offset-3
        
        
      "
      &gt;

      
      
        
        &lt;img
            src="https://storage.googleapis.com/gweb-cloudblog-publish/images/2_3.2x_faster_checkpoint_writes_with_Rapid.max-1000x1000.png"
        
          alt="More than 3.2x faster checkpoint writes with Rapid Bucket"&gt;
        
      
    &lt;/figure&gt;

  
      &lt;/div&gt;
    &lt;/div&gt;
  




&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;To get started, see the &lt;/span&gt;&lt;a href="https://docs.cloud.google.com/storage/docs/rapid/rapid-bucket"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Rapid Bucket documentation&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;.&lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Rapid Cache&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Originally announced at Cloud Next ’25, &lt;/span&gt;&lt;a href="https://docs.cloud.google.com/storage/docs/anywhere-cache"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Rapid Cache&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; increases read bandwidth for AI/ML workloads such as data prep, training, and bursty model loading for inference, delivering an aggregate read throughput of 2.5 TB/s for your existing buckets, &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;with no code changes&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;. For inference workloads, we’ve observed that Rapid Cache provides up to &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;2.1x faster model loading (a 114% improvement), resulting in 47% TCO savings.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;When combined with multi-region buckets, Rapid Cache lets customers flexibly access GPUs and TPUs distributed across regions within a geography while maintaining a single bucket namespace. This eliminates the need to manually orchestrate data movement between buckets, while still benefiting from the performance of zonally co-located data.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-image_full_width"&gt;






  
    &lt;div class="article-module h-c-page"&gt;
      &lt;div class="h-c-grid"&gt;
  

    &lt;figure class="article-image--large
      
      
        h-c-grid__col
        h-c-grid__col--6 h-c-grid__col--offset-3
        
        
      "
      &gt;

      
      
        
        &lt;img
            src="https://storage.googleapis.com/gweb-cloudblog-publish/images/3_AcceleratedDataAccess.max-1000x1000.png"
        
          alt="Accelerated data access with Rapid Cache"&gt;
        
      
    &lt;/figure&gt;

  
      &lt;/div&gt;
    &lt;/div&gt;
  




&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;New: Rapid Cache ingest on write&lt;br/&gt;&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;Customers at some of the world’s largest frontier AI/ML labs told us that they wanted to accelerate reads that occur immediately after a write, such as checkpoint restores or a data prep pipeline that feeds training. Previously, caching data required an initial read to trigger ingestion, and that read was served directly from the bucket at standard performance.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Rapid Cache’s new &lt;/span&gt;&lt;a href="https://docs.cloud.google.com/storage/docs/rapid/rapid-cache#ingest-on-write"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;ingest on write&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; feature solves this by writing data to Rapid Cache at the same time as it is written to a Cloud Storage bucket. This proactive approach eliminates the initial cache-miss penalty, so workloads get a cache hit on the very first read. This provides up to &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;2.2x faster checkpoint restores&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;, allowing training clusters to recover faster from interruptions.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
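&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;The difference between ingesting on first read and ingesting on write can be illustrated with a toy model. The class below is hypothetical and is not the Rapid Cache API or implementation: it ignores eviction, TTLs, ingestion criteria, and consistency, and simply counts cache hits and misses when a checkpoint is written and then restored.&lt;/span&gt;&lt;/p&gt;

```python
"""Toy comparison of ingest-on-first-read vs. ingest-on-write caching.

Hypothetical model only; not the Rapid Cache API or implementation.
"""


class ToyCachedBucket:
    def __init__(self, ingest_on_write: bool):
        self.ingest_on_write = ingest_on_write
        self.bucket = {}  # object name -> bytes (the Cloud Storage bucket)
        self.cache = {}   # object name -> bytes (the zonal cache)
        self.hits = 0
        self.misses = 0

    def write(self, name: str, data: bytes) -> None:
        self.bucket[name] = data
        if self.ingest_on_write:
            self.cache[name] = data  # populate the cache as the data is written

    def read(self, name: str) -> bytes:
        if name in self.cache:
            self.hits += 1
            return self.cache[name]
        self.misses += 1             # served from the bucket at standard speed
        data = self.bucket[name]
        self.cache[name] = data      # the first read triggers ingestion
        return data


def restore_checkpoint(store: ToyCachedBucket, shards: int) -> tuple:
    """Write `shards` checkpoint shards, then read each one back once."""
    for i in range(shards):
        store.write(f"ckpt/shard-{i:05d}", b"x" * 16)
    for i in range(shards):
        store.read(f"ckpt/shard-{i:05d}")
    return store.hits, store.misses


if __name__ == "__main__":
    for mode in (False, True):
        hits, misses = restore_checkpoint(ToyCachedBucket(mode), shards=8)
        print(f"ingest_on_write={mode}: hits={hits}, misses={misses}")
```

&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;In the read-triggered mode, every shard read during the restore is a miss served at bucket speed (hits=0, misses=8); with ingest on write, every first read is a hit (hits=8, misses=0), which is the effect the feature is designed to produce.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;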
&lt;div class="block-image_full_width"&gt;






  
    &lt;div class="article-module h-c-page"&gt;
      &lt;div class="h-c-grid"&gt;
  

    &lt;figure class="article-image--large
      
      
        h-c-grid__col
        h-c-grid__col--6 h-c-grid__col--offset-3
        
        
      "
      &gt;

      
      
        
        &lt;img
            src="https://storage.googleapis.com/gweb-cloudblog-publish/images/4_Ingest_on_write.max-1000x1000.png"
        
          alt="Rapid Cache ingest on write"&gt;
        
      
    &lt;/figure&gt;

  
      &lt;/div&gt;
    &lt;/div&gt;
  




&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;To enable ingest on write, simply &lt;/span&gt;&lt;a href="https://docs.cloud.google.com/storage/docs/rapid/use-rapid-cache#console_3"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;modify the ingestion criteria&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; of your existing Rapid Cache. &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Rapid Cache’s simplicity and performance have resulted in explosive adoption. In the year since general availability, customers have deployed thousands of Rapid Caches, a 20x increase in the number of caches deployed. In fact, &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;Rapid Cache serves up to 20% of Cloud Storage’s global egress.&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Cutting-edge AI/ML customers run their workloads on Rapid Cache, including Anthropic, which uses Rapid Cache to improve the resilience of its cloud workloads by co-locating data with TPUs in a single zone, with dynamically scalable read throughput of up to 2.5 TB/s.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-image_full_width"&gt;






  
    &lt;div class="article-module h-c-page"&gt;
      &lt;div class="h-c-grid"&gt;
  

    &lt;figure class="article-image--large
      
      
        h-c-grid__col
        h-c-grid__col--6 h-c-grid__col--offset-3
        
        
      "
      &gt;

      
      
        
        &lt;img
            src="https://storage.googleapis.com/gweb-cloudblog-publish/images/5_CustomerLoveRC.max-1000x1000.png"
        
          alt="5_CustomerLoveRC"&gt;
        
      
    &lt;/figure&gt;

  
      &lt;/div&gt;
    &lt;/div&gt;
  




&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Case study: Thinking Machines Lab&lt;br/&gt;&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;Thinking Machines Lab is an artificial intelligence research and product company. Its mission is to make AI systems that are adaptable and customizable, building a future where everyone has access to the knowledge and tools to make AI work for their unique needs and goals.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;At Next ’26, James Sun, Member of Technical Staff at Thinking Machines Lab, spoke at our &lt;/span&gt;&lt;a href="https://www.youtube.com/watch?v=EKjCo-0wXao" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;session&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, Cloud Storage Rapid: Turbocharged object storage for AI &amp;amp; Analytics, where he described the high-performance storage needs of the data-hungry AI/ML workloads that Thinking Machines Lab runs at scale.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Thinking Machines runs diverse workflows: data processing in Dataflow, Kafka, and Spark, multi-model training, and serving &lt;/span&gt;&lt;a href="https://thinkingmachines.ai/tinker/" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Tinker&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; — a flexible API for fine-tuning open source models. Thinking Machines' workloads run on Google Cloud Storage, Sun explained.&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt; Running these data-intensive AI/ML workloads at such a large scale introduces significant infrastructure challenges. &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;The first is managing a hub and spoke data architecture, where data processing hubs are located in one primary region while training GPUs are spread across multiple regions. Historically, this has made manual data movement and lifecycle management a major operational pain point. Furthermore, Thinking Machines Lab's workloads such as data prep and pretraining workflows, which rely on massive-scale Spark workloads to prepare their multi-modal datasets, often spike from cold to hot instantly. Previously, these surges led to disruptive 429 errors, which stalled data processing and loading, and interrupted critical training cycles.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;To minimize these bottlenecks, Thinking Machines Lab integrated Rapid Cache across their AI/ML pipeline, to positive results. &lt;/span&gt;&lt;/p&gt;
&lt;p style="padding-left: 40px;"&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;“Rapid Cache has become a core foundation of our AI/ML data infrastructure, supporting our critical workflows, from data prep and pretraining to training and model loading. By acting as a crucial bandwidth shield and booster, it enables us to scale our data-intensive workloads across our entire fleet without compromise, providing us with the on-demand high bandwidth and consistent stability that we need to innovate at speed.” &lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;- James Sun, Member of Technical Staff, Thinking Machines Lab&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;In short, Cloud Storage and Rapid Cache provides Thinking Machines Lab with:&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Easy, instant, scalable, on demand bandwidth:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; The team now achieves stable read throughput peaks of over 1.8TB/s. &lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Enhanced stability:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Rapid Cache has greatly reduced tail-end latencies and 429 errors, providing the consistent performance needed for multi-modal training.  &lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Fleet-wide scalability:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Combined with multi-region buckets, they can now scale data-intensive workloads across their entire fleet, meeting the demands of a rapidly growing compute scale without the hassle of manual data movement while benefiting from zonally colocated storage for high performance.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;&lt;strong style="vertical-align: baseline;"&gt;Operational efficiency:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; The use of Hierarchical Namespace (HNS) has optimized their massive Spark workloads for data preparation, by supporting fast directory renames, along with providing the ability to ramp QPS more quickly as they scale out clusters.  Rapid Cache’s "ingest on write" capability helps ensure immediate cache hits for checkpoint restores.&lt;/span&gt;&lt;/li&gt;
&lt;/ul&gt;&lt;/div&gt;
&lt;div class="block-image_full_width"&gt;






  
    &lt;div class="article-module h-c-page"&gt;
      &lt;div class="h-c-grid"&gt;
  

    &lt;figure class="article-image--large
      
      
        h-c-grid__col
        h-c-grid__col--6 h-c-grid__col--offset-3
        
        
      "
      &gt;

      
      
        
        &lt;img
            src="https://storage.googleapis.com/gweb-cloudblog-publish/images/6_TMLGCP.max-1000x1000.png"
        
          alt="6_TML+GCP"&gt;
        
      
    &lt;/figure&gt;

  
      &lt;/div&gt;
    &lt;/div&gt;
  




&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Choose your rocket ship&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Whether you are running data preparation, massive-scale training, or low-latency inference, Cloud Storage Rapid delivers high performance together with the reliability and scalability that Cloud Storage is known for. &lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Rapid Bucket&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; delivers the highest Cloud Storage throughput and queries per second as well as the lowest latency for read/write use cases, such as analytics, AI training, checkpointing, and model serving. This helps to reduce storage bottlenecks and increase compute utilization.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Rapid Cache&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; provides higher read bandwidth and tail latency stabilization in existing buckets, without code changes. Key use cases include AI training, checkpoint restores, and serving, as well as accelerator optionality via multi-region buckets.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Get started with the &lt;/span&gt;&lt;a href="https://docs.cloud.google.com/storage/docs/rapid/high-performance-storage"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Cloud Storage Rapid family today&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;!&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;</description><pubDate>Mon, 11 May 2026 17:00:00 +0000</pubDate><guid>https://cloud.google.com/blog/products/storage-data-transfer/cloud-storage-rapid-turbocharges-object-storage-for-ai-analytics/</guid><category>AI &amp; Machine Learning</category><category>Data Analytics</category><category>AI infrastructure</category><category>Storage &amp; Data Transfer</category><og xmlns:og="http://ogp.me/ns#"><type>article</type><title>Cloud Storage Rapid: Turbocharged object storage for AI and analytics</title><description></description><site_name>Google</site_name><url>https://cloud.google.com/blog/products/storage-data-transfer/cloud-storage-rapid-turbocharges-object-storage-for-ai-analytics/</url></og><author xmlns:author="http://www.w3.org/2005/Atom"><name>Marco Abela</name><title>Senior Product Manager</title><department></department><company></company></author><author xmlns:author="http://www.w3.org/2005/Atom"><name>Luigi Pontes</name><title>Senior Product Manager</title><department></department><company></company></author></item><item><title>How BASF manages thousands of supply chain decisions with AlphaEvolve’s agentic algorithms</title><link>https://cloud.google.com/blog/products/ai-machine-learning/how-basf-manages-thousands-of-supply-chain-decisions-with-alphaevolve/</link><description>&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;The agricultural and crop protection supply chain is one of the most intricate networks in the world. 
It takes up to two years to turn active ingredients into the final products farmers need, and a single change in weather or regulations can disrupt everything. Planners at &lt;/span&gt;&lt;a href="https://agriculture.basf.com/global/en" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;BASF Agricultural Solutions&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; navigate this reality daily across 180 production sites. To understand how local decisions ripple across their entire global network, BASF turned to &lt;/span&gt;&lt;a href="https://cloud.google.com/blog/products/ai-machine-learning/alphaevolve-on-google-cloud"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;AlphaEvolve on Google Cloud&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; to build a digital twin of their supply chain.&lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Planning across a two-year lead time&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;BASF Agricultural Solutions manages a network with over 5,000 distinct value chains. Creating a single end product requires a bill of materials that can be over 30 levels deep, moving across different production sites and regions.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Currently, human planners make thousands of local decisions every day. They decide what to produce, when to produce it, and how much safety stock to hold. Because the network is so large, a planner can’t easily see how a localized decision affects the rest of the global supply chain. &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;This scale can lead to additional working capital and inventory and or cause production imbalances. Traditional mathematical models struggle to capture the dynamic reality of the network that planners navigate based on years of experience.&lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Building a foundation for decision support&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;a href="https://deepmind.google/blog/alphaevolve-a-gemini-powered-coding-agent-for-designing-advanced-algorithms/" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;AlphaEvolve&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; is an evolutionary coding agent that generates and refines algorithms autonomously. &lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;In collaboration with Google Cloud and prognostica GmbH&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;,&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt; BASF’s objective was not to replace human decision-making, but to establish a new model for decision support that helps planners handle the real-world complexity of the production network.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;The team gave AlphaEvolve a foundational "seed" program. This initial code established a standard planning logic that translated demand forecasts into production schedules, serving as a functional baseline before introducing dynamic, network-wide coordination. From there, they fed the model three years of historical data, including inventory levels, market demand, and actual production outputs. AlphaEvolve then generated variations of the code, mutating the logic to see if it could simulate a supply chain that matched the real-world historical data.&lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Measuring what good looks like in initial tests&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;For AlphaEvolve to improve, it needed a specific goal. The evaluation function scored every new piece of generated code on one primary metric: how closely the simulated inventory levels and production decisions matched the actual historical reality recorded by BASF.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;The latest AlphaEvolve runs delivered more than 80% relative improvement in accuracy compared to the initial seed model. With further adjustments, the team expects to push performance even higher — bringing the model to a level of accuracy not achieved with other approaches and making it actionable for operational use.&lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;The results&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;The evolved planning logic delivered immediate, measurable improvements over the initial seed model. The final algorithm successfully mirrored the actual historical performance of the supply chain, significantly reducing the error rate compared to the initial seed.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;“We had several attempts to build a digital twin for our complex supply network using deterministic models, and all of them failed,” said Dr. Goetz Krabbe, vice president for global supply chain at BASF. “By using AlphaEvolve, we cannot only map the complex network based on system data, but at the same time understand and copy the human decisions that drive our daily operations. This gives us a highly accurate and easy to maintain data driven digital twin of the entire network. Using it we can optimize our inventory levels and respond to market volatility with confidence while avoiding stockouts."&lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;What the evolved algorithm actually does&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;By running thousands of experiments, AlphaEvolve developed a clear, human-readable algorithm that explains how the BASF network truly operates. It automatically discovered factually correct, domain-specific supply chain rules that explain the observed production outputs and inventory levels for the tested product value chain:&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Production consolidation:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; The algorithm learned to group production amounts together, accurately mapping how planners optimize plant time.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Dynamic safety stocks:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; It introduced safety stock parameters to handle volatile and seasonal demand patterns, helping to strictly manage capital costs while preventing out-of-stock situations.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Network-wide coordination:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; The model successfully mapped the dependencies between different production tiers, providing a clear foundation for optimizing asset utilization globally.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
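&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;The evolved rules themselves have not been published. As a hedged illustration of what rules of this kind can look like, the Python below uses two textbook heuristics chosen for this sketch, not rules discovered by AlphaEvolve: it consolidates small per-period requirements into batches of at least a minimum lot size, and sets safety stock from recent demand volatility with the common formula z × σ(demand) × √(lead time). The function names, the minimum lot size, the service factor z (1.65 is roughly a 95% cycle service level if demand is normally distributed), and the lead time are hypothetical parameters.&lt;/span&gt;&lt;/p&gt;

```python
import math
import statistics

# Illustrative heuristics only; these are NOT the rules AlphaEvolve evolved.


def consolidate(requirements, min_lot):
    """Group consecutive period requirements into batches of at least min_lot.

    Each batch is scheduled in the first period of its group; any demand
    left over at the end becomes a final, possibly smaller batch.
    """
    batches = [0] * len(requirements)
    start, pending = 0, 0
    for i, qty in enumerate(requirements):
        if pending == 0:
            start = i  # a new batch begins in this period
        pending += qty
        if pending >= min_lot:
            batches[start] = pending
            pending = 0
    if pending:
        batches[start] = pending
    return batches


def safety_stock(recent_demand, lead_time_periods, z=1.65):
    """Textbook safety stock: z * sample stdev of demand * sqrt(lead time).

    Recomputing it over a rolling window of recent demand makes it grow in
    volatile or peak-season periods and shrink when demand is steady.
    Requires at least two demand observations.
    """
    sigma = statistics.stdev(recent_demand)
    return z * sigma * math.sqrt(lead_time_periods)


if __name__ == "__main__":
    print(consolidate([30, 40, 50, 10, 20], min_lot=60))
    print(round(safety_stock([80, 120, 95, 140], lead_time_periods=4), 1))
```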
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;What's next&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;The initial simulations showed that evolutionary AI can accurately model large-scale, dynamic supply chains. BASF’s objective is to create a digital twin of their entire global production network as a new foundation for simulation, decision support, scenario forecasting and optimization. This will allow the team to continuously simulate operations, identify hidden bottlenecks before they affect throughput, and optimize asset utilization across all global facilities.&lt;/span&gt;&lt;/p&gt;
&lt;hr/&gt;
&lt;p&gt;&lt;sub&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;This project was a collaboration between the BASF SE team including: Benjamin Priese, Michael Arlt, &lt;/span&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;Debora Morgenstern and Tobias Hausen as well as Manuel Doerr and Thomas Christ from Prognostica GmbH Würzburg, &lt;/span&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;and the AI for Science team at Google Cloud including (but not limited to): Kartik Sanu, Laurynas Tamulevičius, Nicolas Stroppa, Chris Page, Srikanth Soma, John Semerdjian, Skandar Hannachi, Vishal Agarwal and Anant Nawalgaria as well as &lt;/span&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;Christoph Tittelbach from&lt;/span&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt; the Google account team and &lt;/span&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;partners at Google DeepMind&lt;/span&gt;&lt;/sub&gt;&lt;/p&gt;&lt;/div&gt;</description><pubDate>Thu, 07 May 2026 16:00:00 +0000</pubDate><guid>https://cloud.google.com/blog/products/ai-machine-learning/how-basf-manages-thousands-of-supply-chain-decisions-with-alphaevolve/</guid><category>Data Analytics</category><category>Customers</category><category>Developers &amp; Practitioners</category><category>Google Cloud in Europe</category><category>AI &amp; Machine Learning</category><media:content height="540" url="https://storage.googleapis.com/gweb-cloudblog-publish/images/image1_BFm5ksn.max-600x600.jpg" width="540"></media:content><og xmlns:og="http://ogp.me/ns#"><type>article</type><title>How BASF manages thousands of supply chain decisions with AlphaEvolve’s agentic 
algorithms</title><description></description><image>https://storage.googleapis.com/gweb-cloudblog-publish/images/image1_BFm5ksn.max-600x600.jpg</image><site_name>Google</site_name><url>https://cloud.google.com/blog/products/ai-machine-learning/how-basf-manages-thousands-of-supply-chain-decisions-with-alphaevolve/</url></og><author xmlns:author="http://www.w3.org/2005/Atom"><name>Benjamin Priese</name><title>Senior Digital SC Manager, BASF Agricultural Solutions</title><department></department><company></company></author><author xmlns:author="http://www.w3.org/2005/Atom"><name>Anant Nawalgaria</name><title>Group AI Product Manager &amp; Engineer, Google</title><department></department><company></company></author></item><item><title>Scaling data and AI with Managed Service for Apache Airflow</title><link>https://cloud.google.com/blog/products/data-analytics/managed-apache-airflow-scaling-data-and-ai-workloads/</link><description>&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Orchestration is no longer just about moving data; it is about governing enterprise intelligence. To reflect our deep commitment to and embrace of open-source software, we shared earlier that &lt;/span&gt;&lt;a href="https://cloud.google.com/composer"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Cloud Composer&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; is now officially &lt;/span&gt;&lt;a href="https://cloud.google.com/composer"&gt;&lt;strong style="text-decoration: underline; vertical-align: baseline;"&gt;Managed Service for Apache Airflow&lt;/strong&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;We announced a massive leap forward in our orchestration capabilities, fundamentally reimagining how data teams operate in the AI era. With four major launches, we are embedding AI directly into your workflows to democratize access, accelerate productivity, and power your most demanding MLOps.&lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;1. Apache Airflow 3.1 is now Generally Available&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;We announced &lt;/span&gt;&lt;a href="https://docs.cloud.google.com/composer/docs/composer-versions#images-composer-3"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Apache Airflow 3.1&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; in General Availability to power your most demanding AI and MLOps workloads. This release combines the significant foundation of &lt;/span&gt;&lt;a href="https://airflow.apache.org/blog/airflow-three-point-oh-is-here/" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Airflow 3.0&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; with the recent community innovations of &lt;/span&gt;&lt;a href="https://airflow.apache.org/blog/airflow-3.1.0/" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;3.1&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Key capabilities include:&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Decoupled architecture:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; A robust separation between the entire Airflow system and the execution layer for better scalability and enhanced security. &lt;/span&gt;&lt;/li&gt;
&lt;li role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;DAG versioning:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Native support for automated DAG versioning, retaining the historical structure and run history.&lt;/span&gt;&lt;/li&gt;
&lt;li role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Powerful managed backfills:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; A redesigned backfill system that is now a first-class citizen, fully managed by the scheduler.&lt;/span&gt;&lt;/li&gt;
&lt;li role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Event-driven scheduling and data assets:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Enhanced capabilities for triggering workflows based on assets as well as external events, like messages arriving in a message queue.&lt;/span&gt;&lt;/li&gt;
&lt;li role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Human-in-the-Loop (HITL) and deadline alerts:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Pause execution for human decision-making via the UI, and set proactive time-based thresholds for critical pipelines.&lt;/span&gt;&lt;/li&gt;
&lt;li role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;And many more…&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;&lt;/div&gt;
&lt;div class="block-image_full_width"&gt;






  
    &lt;div class="article-module h-c-page"&gt;
      &lt;div class="h-c-grid"&gt;
  

    &lt;figure class="article-image--large
      
      
        h-c-grid__col
        h-c-grid__col--6 h-c-grid__col--offset-3
        
        
      "
      &gt;

      
      
        
        &lt;img
            src="https://storage.googleapis.com/gweb-cloudblog-publish/original_images/1_-_Airflow3.gif"
        
          alt="1 - Airflow3"&gt;
        
      
    &lt;/figure&gt;

  
      &lt;/div&gt;
    &lt;/div&gt;
  




&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;h3&gt;&lt;span style="vertical-align: baseline;"&gt;2. Agentic troubleshooting with Data Engineering Agents&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Managing complex pipelines just got significantly easier. The Data Engineering Agent is now embedded directly in your Managed Airflow dashboard to quickly analyze logs, identify root causes, and suggest fixes.&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Rapid resolution:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; By integrating &lt;/span&gt;&lt;a href="https://docs.cloud.google.com/composer/docs/composer-2/troubleshooting-dags#investigations"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Gemini Cloud Assist Investigations&lt;/span&gt;&lt;/a&gt;&lt;sup&gt;&lt;span style="vertical-align: baseline;"&gt;&lt;span style="vertical-align: super;"&gt;1&lt;/span&gt;&lt;/span&gt;&lt;/sup&gt;&lt;span style="vertical-align: baseline;"&gt;, you can leverage AI to troubleshoot DAG Run failures and receive personalized fix proposals directly in the console.&lt;/span&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong style="vertical-align: baseline;"&gt;Reduced MTTR:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; This agentic approach helps minimize Mean Time to Repair (MTTR) by eliminating manual log parsing. Furthermore, troubleshooting is now elevated to the DAG execution level—rather than just the task level—providing a holistic view of pipeline health.&lt;/span&gt;&lt;/li&gt;
&lt;/ul&gt;&lt;/div&gt;
&lt;div class="block-image_full_width"&gt;






  
    &lt;div class="article-module h-c-page"&gt;
      &lt;div class="h-c-grid"&gt;
  

    &lt;figure class="article-image--large
      
      
        h-c-grid__col
        h-c-grid__col--6 h-c-grid__col--offset-3
        
        
      "
      &gt;

      
      
        
        &lt;img
            src="https://storage.googleapis.com/gweb-cloudblog-publish/original_images/2_-_ComposerTroubleshootingAgent.gif"
        
          alt="2 - ComposerTroubleshootingAgent"&gt;
        
      
    &lt;/figure&gt;

  
      &lt;/div&gt;
    &lt;/div&gt;
  




&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;3. Orchestration pipelines and deployment automation framework&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;You no longer need to be an Apache Airflow expert to harness its power. &lt;/span&gt;&lt;a href="https://docs.cloud.google.com/orchestration-pipelines/overview"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Orchestration pipelines&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; are a core component of our new cross-product Deployment Automation Framework, allowing you to create end-to-end data pipelines efficiently.&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Declarative orchestration:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Define your entire pipeline—including the orchestration logic, infrastructure configuration, and dependencies—in simple, human-readable YAML files.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Cross-product bundles:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; These YAML definitions are easily deployed as a complete bundle to the cloud. For example, without knowing Airflow syntax, a user can quickly create and deploy a comprehensive data integration pipeline across dbt, Spark, DTS, and more. &lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Unified IDE experience:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Alongside automated validation and deployment via GitHub actions, the Google Data Cloud extension makes agentic authoring and troubleshooting the centerpiece of your workflow. You can now rely on powerful AI agents to build and debug pipelines directly in your IDE, with the ability to visually inspect the agent-generated DAGs for complete oversight.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Crucially, this declarative approach breaks down the traditional silos between advanced Python developers and data analysts. By shifting to human-readable YAML, we are fostering a more inclusive data culture where a wider range of practitioners can independently author, understand, and manage critical data workflows.&lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;4. MCP Server for Managed Airflow (Public Preview)&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;To further bridge the gap between AI and orchestration, we are launching the &lt;/span&gt;&lt;a href="https://docs.cloud.google.com/composer/docs/composer-3/use-composer-mcp"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Managed Airflow MCP Server&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; in Public Preview.&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Agentic tooling:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; This server provides tools like &lt;/span&gt;&lt;code style="vertical-align: baseline;"&gt;list_environments&lt;/code&gt;&lt;span style="vertical-align: baseline;"&gt;, &lt;/span&gt;&lt;code style="vertical-align: baseline;"&gt;get_dag_run&lt;/code&gt;&lt;span style="vertical-align: baseline;"&gt;, and &lt;/span&gt;&lt;code style="vertical-align: baseline;"&gt;get_task_instance&lt;/code&gt;&lt;span style="vertical-align: baseline;"&gt; to fetch critical information about your environments.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Seamless integration &amp;amp; reduced context-switching:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Both humans and agents can use these tools to simplify task management. Most importantly, this drastically reduces the context-switching developers face when debugging complex DAGs. By bringing environment and task data directly into your preferred interfaces, you can troubleshoot faster without constantly pivoting between different consoles.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
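&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Under the hood, MCP clients invoke server tools with JSON-RPC 2.0 &lt;/span&gt;&lt;code style="vertical-align: baseline;"&gt;tools/call&lt;/code&gt;&lt;span style="vertical-align: baseline;"&gt; requests. The hedged Python sketch below only builds such a request payload for the &lt;/span&gt;&lt;code style="vertical-align: baseline;"&gt;get_dag_run&lt;/code&gt;&lt;span style="vertical-align: baseline;"&gt; tool named above; it does not connect to or authenticate with the server. The argument names and values (&lt;/span&gt;&lt;code style="vertical-align: baseline;"&gt;environment&lt;/code&gt;&lt;span style="vertical-align: baseline;"&gt;, &lt;/span&gt;&lt;code style="vertical-align: baseline;"&gt;dag_id&lt;/code&gt;&lt;span style="vertical-align: baseline;"&gt;, &lt;/span&gt;&lt;code style="vertical-align: baseline;"&gt;dag_run_id&lt;/code&gt;&lt;span style="vertical-align: baseline;"&gt;) are hypothetical placeholders; a client would discover the real tool schemas with a &lt;/span&gt;&lt;code style="vertical-align: baseline;"&gt;tools/list&lt;/code&gt;&lt;span style="vertical-align: baseline;"&gt; request or from the Managed Airflow MCP Server documentation.&lt;/span&gt;&lt;/p&gt;

```python
import itertools
import json

# Monotonic request ids, as JSON-RPC expects a unique id per request.
_request_ids = itertools.count(1)


def tool_call(name, arguments):
    """Build an MCP tools/call request as a JSON-RPC 2.0 message string."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    })


if __name__ == "__main__":
    # Hypothetical argument names and values; check the server's tools/list output.
    print(tool_call("get_dag_run", {
        "environment": "projects/my-project/locations/us-central1/environments/my-env",
        "dag_id": "daily_etl",
        "dag_run_id": "scheduled__2026-05-01T00:00:00+00:00",
    }))
```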
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Embrace the future of data orchestration&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;With these launches, we are fundamentally lowering the barrier to entry for orchestration while simultaneously raising the ceiling for what power users can achieve. By taking away the infrastructure burden and providing native, agentic tooling, data teams can stop wrestling with boilerplate code and start focusing primarily on deriving insights and driving business value.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Whether you are a seasoned Data Engineer building dynamic Python DAGs or a Data Analyst defining straightforward YAML pipelines, Managed Service for Apache Airflow is built for you.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Get Started Today&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Ready to experience the next generation of data pipeline orchestration? Create a new environment in the &lt;/span&gt;&lt;a href="https://console.cloud.google.com/managed-airflow/"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Google Cloud Console&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, explore the &lt;/span&gt;&lt;a href="https://docs.cloud.google.com/data-cloud-extension"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Google Cloud Data Agent Kit extension&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, and start building your agentic future today.&lt;/span&gt;&lt;/p&gt;
&lt;hr/&gt;
&lt;p&gt;&lt;sub&gt;&lt;em&gt;1. &lt;span style="vertical-align: baseline;"&gt;Availability might be limited (&lt;/span&gt;&lt;a href="https://docs.cloud.google.com/cloud-assist/investigations"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;details&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;)&lt;/span&gt;&lt;/em&gt;&lt;/sub&gt;&lt;/p&gt;&lt;/div&gt;</description><pubDate>Mon, 04 May 2026 18:00:00 +0000</pubDate><guid>https://cloud.google.com/blog/products/data-analytics/managed-apache-airflow-scaling-data-and-ai-workloads/</guid><category>Streaming</category><category>Data Analytics</category><og xmlns:og="http://ogp.me/ns#"><type>article</type><title>Scaling data and AI with Managed Service for Apache Airflow</title><description></description><site_name>Google</site_name><url>https://cloud.google.com/blog/products/data-analytics/managed-apache-airflow-scaling-data-and-ai-workloads/</url></og><author xmlns:author="http://www.w3.org/2005/Atom"><name>Piotr Wieczorek</name><title>Senior Product Manager, Google</title><department></department><company></company></author><author xmlns:author="http://www.w3.org/2005/Atom"><name>Rafal Biegacz</name><title>Lead Engineering Manager</title><department></department><company></company></author></item><item><title>UKG unlocks real-time workforce intelligence at scale with the Agentic Data Cloud</title><link>https://cloud.google.com/blog/products/databases/how-ukg-taps-workforce-intelligence-with-the-agentic-data-cloud/</link><description>&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;At UKG, we’ve spent years building and expanding our human capital management (HCM) and workforce management (WFM) solutions with new products, capabilities, and a series of acquisitions. 
Our cloud platform includes a suite of connected systems that support every corner of the employee experience, including scheduling and workforce operations, HR and payroll, and culture and engagement tools. &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;These connected tools offer customers incredible depth, but it also means our backend reflects years of evolution. We have 126 application teams, dozens of tech stacks, and more than 12,000 database instances inherited through acquisitions and product growth. And each product carries its own schema and operational footprint.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Previously, data moved through bespoke pipelines not built for real-time use. As AI advanced, expectations did too. Customers wanted instant insights across HR, time, pay, culture, and operations, and those insights increasingly needed to drive automated workflows and intelligent applications.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Internally, teams needed consistent, high-performance access to shared data to innovate faster and modernize our architecture. We needed a unified foundation for the next generation of intelligence across our suite. That’s why we built People Fabric, our new data and intelligence platform powered by &lt;/span&gt;&lt;a href="https://cloud.google.com/products/alloydb"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;AlloyDB for PostgreSQL&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; and the just-announced &lt;/span&gt;&lt;a href="https://cloud.google.com/blog/products/data-analytics/whats-new-in-the-agentic-data-cloud"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Agentic Data Cloud&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-video"&gt;



&lt;div class="article-module article-video "&gt;
  &lt;figure&gt;
    &lt;a class="h-c-video h-c-video--marquee"
      href="https://youtube.com/watch?v=d2AONtZFsdM"
      data-glue-modal-trigger="uni-modal-d2AONtZFsdM-"
      data-glue-modal-disabled-on-mobile="true"&gt;

      
        

        &lt;div class="article-video__aspect-image"
          style="background-image: url(https://storage.googleapis.com/gweb-cloudblog-publish/images/maxresdefault_wyY212d.max-1000x1000.jpg);"&gt;
          &lt;span class="h-u-visually-hidden"&gt;How UKG uses AlloyDB to scale its People Fabric platform&lt;/span&gt;
        &lt;/div&gt;
      
      &lt;svg role="img" class="h-c-video__play h-c-icon h-c-icon--color-white"&gt;
        &lt;use xlink:href="#mi-youtube-icon"&gt;&lt;/use&gt;
      &lt;/svg&gt;
    &lt;/a&gt;

    
  &lt;/figure&gt;
&lt;/div&gt;

&lt;div class="h-c-modal--video"
     data-glue-modal="uni-modal-d2AONtZFsdM-"
     data-glue-modal-close-label="Close Dialog"&gt;
   &lt;a class="glue-yt-video"
      data-glue-yt-video-autoplay="true"
      data-glue-yt-video-height="99%"
      data-glue-yt-video-vid="d2AONtZFsdM"
      data-glue-yt-video-width="100%"
      href="https://youtube.com/watch?v=d2AONtZFsdM"
      ng-cloak&gt;
   &lt;/a&gt;
&lt;/div&gt;

&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;h3&gt;&lt;span style="vertical-align: baseline;"&gt;Unifying the systems behind the suite&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;People Fabric started with a simple need: bring the full UKG suite onto one real-time foundation. Getting there started with defining a single canonical data model for the entire suite. This would serve as the shared language for people, work, pay, and culture data — consistent no matter where the information originated. &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;We needed an operational database that could ingest changes quickly and scale horizontally. That’s why we chose AlloyDB as the core of People Fabric. It gives us millisecond-level read-after-write behavior, high-throughput ingestion, scalable read pools, and native vector capabilities to support AI.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;With the model defined and the operational store selected, the next step was building the pipeline that feeds the platform. We created a custom change data capture (CDC) framework to extract changes from our existing operational databases inherited over the years. Those changes flow through &lt;/span&gt;&lt;a href="https://cloud.google.com/products/dataflow"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Dataflow&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, where they’re transformed into the canonical structure that AlloyDB for PostgreSQL expects. &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Once in AlloyDB, that data becomes the real-time backbone of the platform. Applications use it for near-instant queries. AI agents rely on it for cross-domain decisions, and vector search engines use it to power natural-language and similarity-based experience layers. &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;For larger analytical workloads, the same data flows into &lt;/span&gt;&lt;a href="https://cloud.google.com/bigquery"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;BigQuery&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, which gives our teams and our customers the ability to perform organization-wide reporting and analysis without straining the system. &lt;/span&gt;&lt;a href="https://cloud.google.com/sql"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Cloud SQL&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; holds the metadata and tenancy context that govern who can see what and how different parts of the suite interact with People Fabric. From there, the system runs continuously. Data enters through streaming ingestion and gets modeled once in AlloyDB for PostgreSQL to make it available everywhere.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-aside"&gt;&lt;dl&gt;
    &lt;dt&gt;aside_block&lt;/dt&gt;
    &lt;dd&gt;&amp;lt;ListValue: [StructValue([(&amp;#x27;title&amp;#x27;, &amp;#x27;Build smarter with Google Cloud databases!&amp;#x27;), (&amp;#x27;body&amp;#x27;, &amp;lt;wagtail.rich_text.RichText object at 0x7efe8f3f52e0&amp;gt;), (&amp;#x27;btn_text&amp;#x27;, &amp;#x27;&amp;#x27;), (&amp;#x27;href&amp;#x27;, &amp;#x27;&amp;#x27;), (&amp;#x27;image&amp;#x27;, None)])]&amp;gt;&lt;/dd&gt;
&lt;/dl&gt;&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;h3&gt;&lt;span style="vertical-align: baseline;"&gt;Bringing people intelligence to intelligent people&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;With the architecture in place, People Fabric gives us something we never had before: a complete and consistent view of people, work, pay, and culture data that’s updated continuously and ready for AI to use in real time. &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;That unified context is what powers our assistive experiences, including conversational reporting and natural-language interactions. Leaders can ask questions in plain English and get answers that reflect the full picture — not just a single system’s slice of it.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;With the power of &lt;/span&gt;&lt;a href="https://cloud.google.com/data-cloud"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Google’s Agentic Data Cloud&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, our platform unifies analytical and transactional data to power real-time AI. This allows agents to reason over live workforce signals and trigger immediate actions. Because this data is governed and modeled from the start, our agents can reliably handle multi-step workflows across HR, payroll, and timekeeping. &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Whether they're identifying pay discrepancies, adjusting schedules, or flagging compliance risks, they operate with the same shared semantics and security model that guides our applications. It’s the difference between AI that reacts and AI that can truly assist.&lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;span style="vertical-align: baseline;"&gt;Driving impact across every layer&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;For engineering teams, People Fabric acts as a database-as-a-service that removes the need for each microservice to manage its own datastore or pipelines. This accelerates development and supports modernization without customer disruption. &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;AlloyDB for PostgreSQL delivers millisecond read-after-write behavior, zero replication lag, and near-real time ingestion latency, enabling real-time workloads with far less complexity. Migrating core person and employment data off our on-prem monolith has generated cost savings significant enough to fund half of People Fabric.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Real-time operational data now gives managers a live view of staffing, pay, and workforce activity. More than 1,000 organizations are already on the platform, with another 1,000 in progress. &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;As we continue expanding People Fabric, we’re laying the groundwork for deeper agentic automation, more responsive analytics, and a growing set of AI-driven capabilities — all on a trusted, scalable foundation built for what’s next.&lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;span style="vertical-align: baseline;"&gt;Learn more&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;UKG’s success illustrates how leveraging AlloyDB for PostgreSQL and the Agentic Data Cloud allows organizations to unify operational and analytical data, creating the essential foundation for real-time, agentic AI.&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;Learn more about &lt;/span&gt;&lt;a href="https://cloud.google.com/products/alloydb"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;AlloyDB for PostgreSQL&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; and get started with a free trial today!&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="https://cgc-ui-preview.corp.google.com/bricks_preview/resources/offers/data-strategy-workshop?pageiddeb=3193ff41-560a-43d2-93d2-83c693c386a7&amp;amp;hl=en&amp;amp;e=StableIdToEditorFeatureClickToFocusEditorLaunch::Launch::Enrolled" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Sign up for a strategy workshop today&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; on how to get your data ready for the agentic era!&lt;/span&gt;&lt;/li&gt;
&lt;/ul&gt;&lt;/div&gt;</description><pubDate>Wed, 29 Apr 2026 16:00:00 +0000</pubDate><guid>https://cloud.google.com/blog/products/databases/how-ukg-taps-workforce-intelligence-with-the-agentic-data-cloud/</guid><category>Data Analytics</category><category>AI &amp; Machine Learning</category><category>Customers</category><category>Databases</category><media:content height="540" url="https://storage.googleapis.com/gweb-cloudblog-publish/images/ukg-agentic-data-cloud-hero.max-600x600.png" width="540"></media:content><og xmlns:og="http://ogp.me/ns#"><type>article</type><title>UKG unlocks real-time workforce intelligence at scale with the Agentic Data Cloud</title><description></description><image>https://storage.googleapis.com/gweb-cloudblog-publish/images/ukg-agentic-data-cloud-hero.max-600x600.png</image><site_name>Google</site_name><url>https://cloud.google.com/blog/products/databases/how-ukg-taps-workforce-intelligence-with-the-agentic-data-cloud/</url></og><author xmlns:author="http://www.w3.org/2005/Atom"><name>Radhi Chagarlamudi</name><title>Group Vice President, Product Engineering, UKG</title><department></department><company></company></author><author xmlns:author="http://www.w3.org/2005/Atom"><name>Heather White</name><title>Cloud Data Architect, Google Cloud</title><department></department><company></company></author></item><item><title>Mapping a smarter future with BigQuery and Google Earth AI models and datasets</title><link>https://cloud.google.com/blog/products/data-analytics/google-earth-ai-models-and-datasets-in-bigquery/</link><description>&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Last year we &lt;/span&gt;&lt;a href="https://cloud.google.com/blog/topics/sustainability/new-geospatial-datasets-in-bigquery"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;introduced&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; new geospatial analytics 
capabilities integrated with BigQuery. Building on this, we announced an &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;expanded suite of tools&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; at Google Cloud Next ’26, designed to help your business unlock deeper insights and make smarter, data-driven decisions. These Google Maps Platform models and datasets, leveraging innovation from &lt;/span&gt;&lt;a href="https://ai.google/earth-ai/" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Google Earth AI&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, are integrated with BigQuery and Gemini Enterprise Agent Platform. They help you transform geospatial information into actionable intelligence, empowering you to understand our planet and its communities like never before.&lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;span style="vertical-align: baseline;"&gt;Harnessing AI for planetary understanding&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;In March &lt;/span&gt;&lt;a href="https://mapsplatform.google.com/resources/blog/turning-280-billion-images-into-actionable-infrastructure-insights-street-view-insights-is-now-generally-available/" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;we launched Street View Insights&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; in general availability, which draws on Google Street View’s vast repository of over 280 billion images and turns them into actionable understanding of physical infrastructure. This enables customers in telecom, utilities and the public sector to reduce weeks of manual work to minutes and get insights right from their desks. In the coming weeks we’re bringing the experimental release of LiDAR data to Street View Insights, providing precise measurements of infrastructure. With this, you can accurately determine the height of utility poles, the clearance of overhead lines, or the specific dimensions of road signs without having to manually gather measurements from the field.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;We’re also expanding our Imagery portfolio in the coming weeks to include the experimental release of Aerial and Satellite Insights, providing a multi-perspective view of infrastructure that includes aerial, satellite and Street View imagery. This will help organizations manage assets at scale and with context. You can now combine top-down aerial and satellite views for large-scale planning and regional assessments with the ground-level detail of Street View to verify specific asset conditions.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Finally, we’re taking geospatial analysis to new heights with our Aerial and Satellite Models, developed as part of Google Research’s &lt;/span&gt;&lt;a href="https://research.google/blog/google-earth-ai-unlocking-geospatial-insights-with-foundation-models-and-cross-modal-reasoning/" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Remote Sensing Foundation&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; effort, and now available in experimental within Model Garden. Now you can license our stand-alone model to build custom applications on any high-resolution aerial or satellite imagery source. Read our &lt;/span&gt;&lt;a href="https://mapsplatform.google.com/resources/blog/unlocking-a-new-dimension-of-understanding-advanced-geospatial-ai-using-google-imagery" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;blog&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; to learn more about Street View Insights, Aerial and Satellite Insights, and Aerial and Satellite Models.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-image_full_width"&gt;






  
    &lt;div class="article-module h-c-page"&gt;
      &lt;div class="h-c-grid"&gt;
  

    &lt;figure class="article-image--large
      
      
        h-c-grid__col
        h-c-grid__col--6 h-c-grid__col--offset-3
        
        
      "
      &gt;

      
      
        
        &lt;img
            src="https://storage.googleapis.com/gweb-cloudblog-publish/images/1_Fiq2gO6.max-1000x1000.jpg"
        
          alt="1"&gt;
        
      
        &lt;figcaption class="article-image__caption "&gt;&lt;p data-block-key="pwf8k"&gt;With Aerial and Satellite Models, an energy analyst can type a prompt like “find large HVAC cooling towers”. The model identifies relevant cooling tower objects across large geographies.&lt;/p&gt;&lt;/figcaption&gt;
      
    &lt;/figure&gt;

  
      &lt;/div&gt;
    &lt;/div&gt;
  




&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;h3&gt;&lt;span style="vertical-align: baseline;"&gt;How Vantor is using Aerial and Satellite Models&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Following a severe storm, recovery teams need a clear picture of the damage to help communities rebuild. Vantor, a leading spatial intelligence company, uses these models in its Sentry application to turn raw satellite imagery into actionable insights. This helps organizations quickly identify washed-out roads and damaged infrastructure, so they can proactively remove storm debris and prioritize long-term repairs.&lt;/span&gt;&lt;/p&gt;
&lt;p style="padding-left: 40px;"&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;“The combination of Vantor’s spatial foundation and Google’s Aerial and Satellite Models is creating a new class of geospatial intelligence systems that can interpret activity across the planet, surface meaningful signals, and deliver insights directly into operational workflows. In demonstrations with customers, where we’ve integrated models into our persistent monitoring application called Sentry, the level of insight has been remarkable.”&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt; - Peter Wilczynski, Chief Product Officer, Vantor&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-image_full_width"&gt;






  
    &lt;div class="article-module h-c-page"&gt;
      &lt;div class="h-c-grid"&gt;
  

    &lt;figure class="article-image--large
      
      
        h-c-grid__col
        h-c-grid__col--6 h-c-grid__col--offset-3
        
        
      "
      &gt;

      
      
        
        &lt;img
            src="https://storage.googleapis.com/gweb-cloudblog-publish/original_images/2_xPhNhU8.gif"
        
          alt="2"&gt;
        
      
        &lt;figcaption class="article-image__caption "&gt;&lt;p data-block-key="n0p6m"&gt;Vantor’s Sentry application uses Aerial and Satellite Models to turn raw imagery into actionable insights. After a storm, this helps their own users quickly identify washed-out roads and damaged infrastructure, so they can proactively remove storm debris and prioritize long-term repairs.&lt;/p&gt;&lt;/figcaption&gt;
      
    &lt;/figure&gt;

  
      &lt;/div&gt;
    &lt;/div&gt;
  




&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;h3&gt;&lt;span style="vertical-align: baseline;"&gt;Understanding communities&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;To learn about populations and their behaviors, researchers typically rely on three types of data sources — censuses, surveys, and satellite imagery — all of which are infrequently updated and can lack scale.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;To address this, we’re announcing the preview of Population Dynamics Insights, a first-of-its-kind geospatial embeddings dataset powered by Google Research’s &lt;/span&gt;&lt;a href="https://research.google/blog/insights-into-population-dynamics-a-foundation-model-for-geospatial-inference/" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Population Dynamics Foundation Model&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; (&lt;/span&gt;&lt;a href="https://arxiv.org/pdf/2411.07207" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;PDFM&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;) designed to help organizations decode the complex relationship between human behavior and the physical world. By distilling anonymized trends derived from Google search trends, Google Maps points of interest, busyness, air quality and pollen data into rich 330-dimensional vectors for places across the globe, it enables a new era of spatial machine learning without the need for manual feature engineering. Learn more in our &lt;/span&gt;&lt;a href="https://mapsplatform.google.com/resources/blog/from-static-maps-to-geospatial-ai-announcing-population-dynamics-insights" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;blog&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-image_full_width"&gt;






  
    &lt;div class="article-module h-c-page"&gt;
      &lt;div class="h-c-grid"&gt;
  

    &lt;figure class="article-image--large
      
      
        h-c-grid__col
        h-c-grid__col--6 h-c-grid__col--offset-3
        
        
      "
      &gt;

      
      
        
        &lt;img
            src="https://storage.googleapis.com/gweb-cloudblog-publish/images/3_VFBoi0e.max-1000x1000.png"
        
          alt="3"&gt;
        
      
    &lt;/figure&gt;

  
      &lt;/div&gt;
    &lt;/div&gt;
  




&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;h3&gt;&lt;span style="vertical-align: baseline;"&gt;Safer and smarter road networks&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;We want to help local authorities make roads safer and smoother for everyone. That’s why we’re adding new preview features to Road Management Insights. You can now measure vehicle counts, to provide accurate traffic estimates that are required to evaluate the impact of new roads, bridges, and major maintenance projects. We’re also adding real-time disruptions for things like road closures that provide early signals about the potential reasons for traffic slowdowns. Finally, we’re announcing that Road Management Insights is expanding beyond the public sector, and is now available to logistics and roadside assistance companies. Get more information in our &lt;/span&gt;&lt;a href="https://mapsplatform.google.com/resources/blog/roads-management-insights-expands-with-new-capabilities-for-the-public-and-private-sectors" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;blog&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;.&lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;span style="vertical-align: baseline;"&gt;Accelerate renewable energy adoption&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;We’re also introducing the experimental release of Solar Insights, now available in BigQuery. Built on the same imagery data and models available within Aerial and Satellite Insights and Aerial and Satellite Models, it provides high-resolution, building-level data on solar potential and existing arrays to help utilities and service providers accelerate renewable-energy adoption and optimize network planning. With Solar Insights, you can predict the next frontier of renewable energy market opportunities with BigQuery. Overlay information about solar potential per building, along with existing solar deployments to reveal untapped market opportunities and optimize investment strategies. Additionally, integrating these building-level details with our weather models and historical weather data allows you to accurately predict rooftop solar power contributions, increasing energy reliability and driving more profitable investments in renewable infrastructure. Learn more about Solar Insights &lt;/span&gt;&lt;a href="https://mapsplatform.google.com/resources/blog/unlocking-a-new-dimension-of-understanding-advanced-geospatial-ai-using-google-imagery" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;here&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;.&lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;span style="vertical-align: baseline;"&gt;Optimize health and well-being&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Understanding how environmental factors impact health is more crucial than ever. We're excited to announce new environment datasets, now available in experimental through Google Maps Platform. These datasets provide air quality, pollen and weather insights, and enable you to go beyond real-time data to unlock environmental understanding through hyper-local, high-resolution historical data that’s tightly integrated with BigQuery. This makes it easy to spot long-term patterns, like how allergy seasons affect your business or where air quality impacts public health. By mixing this environmental data with your own records, you can stop reacting to the weather and start planning for it. Whether you're deciding where to send resources or how to protect your customers, you’ll have the full picture of how the environment shapes your world. Read more in our &lt;/span&gt;&lt;a href="https://mapsplatform.google.com/resources/blog/from-reaction-to-resilience-empowering-industries-with-advanced-environmental-intelligence" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;blog&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-image_full_width"&gt;






  
    &lt;div class="article-module h-c-page"&gt;
      &lt;div class="h-c-grid"&gt;
  

    &lt;figure class="article-image--large
      
      
        h-c-grid__col
        h-c-grid__col--6 h-c-grid__col--offset-3
        
        
      "
      &gt;

      
      
        
        &lt;img
            src="https://storage.googleapis.com/gweb-cloudblog-publish/images/4_UUdqdpT.max-1000x1000.png"
        
          alt="4"&gt;
        
      
        &lt;figcaption class="article-image__caption "&gt;&lt;p data-block-key="n0p6m"&gt;Visualizing the median PM2.5 levels in Manhattan on a specific day&lt;/p&gt;&lt;/figcaption&gt;
      
    &lt;/figure&gt;

  
      &lt;/div&gt;
    &lt;/div&gt;
  




&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;h3&gt;&lt;span style="vertical-align: baseline;"&gt;How these datasets can work together&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;One example of how these datasets can work together is Google for Health's &lt;/span&gt;&lt;a href="https://blog.google/innovation-and-ai/technology/health/google-ai-heart-health-australia/" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Population Health AI (PHAI)&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, &lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;an advanced&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt; analytics&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt; eng&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;ine that helps identify hidden health risks within communities. The goal is to equip our partners with insights that could help them shift from treating problems to proactively managing chronic condition risks. To provide this comprehensive view, PHAI utilizes Google Maps Platform’s Population Dynamics Insights, Places Insights and air quality and pollen datasets. By analyzing these diverse, de-identified data sets — ranging from geographic factors like the air we breathe to local access to fresh food — the AI model helps healthcare providers understand the shift from reactive treatment to proactive, tailored management of chronic condition risks for specific towns or postcodes.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Ready to explore what's possible? &lt;/span&gt;&lt;a href="https://mapsplatform.google.com/maps-products/geospatial-analytics/?utm_source=product-page&amp;amp;utm_medium=blog&amp;amp;utm_campaign=cloud-next-2026&amp;amp;utm_content=gmp-cloud-blog-website" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Visit our website&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; to discover how Google's geospatial analytics can help you unlock your next big opportunity, or &lt;/span&gt;&lt;a href="https://mapsplatform.google.com/lp/geospatial-analytics-signup/?utm_source=landing-page&amp;amp;utm_medium=blog&amp;amp;utm_campaign=cloud-next-2026&amp;amp;utm_content=gmp-cloud-blog-signup" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;sign up&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; for early access and to learn more.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;</description><pubDate>Mon, 27 Apr 2026 16:00:00 +0000</pubDate><guid>https://cloud.google.com/blog/products/data-analytics/google-earth-ai-models-and-datasets-in-bigquery/</guid><category>Maps &amp; Geospatial</category><category>Data Analytics</category><media:content height="540" url="https://storage.googleapis.com/gweb-cloudblog-publish/images/0_hero_ZNSV49C.max-600x600.png" width="540"></media:content><og xmlns:og="http://ogp.me/ns#"><type>article</type><title>Mapping a smarter future with BigQuery and Google Earth AI models and datasets</title><description></description><image>https://storage.googleapis.com/gweb-cloudblog-publish/images/0_hero_ZNSV49C.max-600x600.png</image><site_name>Google</site_name><url>https://cloud.google.com/blog/products/data-analytics/google-earth-ai-models-and-datasets-in-bigquery/</url></og><author xmlns:author="http://www.w3.org/2005/Atom"><name>Greg Leon</name><title>Group 
Product Manager</title><department></department><company></company></author><author xmlns:author="http://www.w3.org/2005/Atom"><name>Dan Meyer</name><title>Product Marketing Manager</title><department>Google Maps Platform</department><company></company></author></item><item><title>Day 1 at Google Cloud Next ‘26 recap</title><link>https://cloud.google.com/blog/topics/google-cloud-next/next26-day-1-recap/</link><description>&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Last year at Google Cloud Next ’25, we asked you to imagine a new future for AI. At Next ’26, the question before you is: How do you move AI into production across your entire enterprise?&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;According to Google Cloud CEO Thomas Kurian, the answer is straightforward: You need a unified stack, with “chips that are designed for models, models that are grounded in your data, agents and applications that are built with those models,” and the whole thing “secured by the infrastructure,” Thomas said in his keynote. (This is the same unified stack that Google uses for Search, YouTube, Chrome, and Android. As Alphabet CEO Sundar Pichai said in his opening remarks, “a big focus of ours  is to always be customer zero for our own technologies.”)&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;As AI matures, we’ve laid out a blueprint on how to succeed. Read on for a whirlwind tour of what we announced from the keynote stage&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-video"&gt;



&lt;div class="article-module article-video "&gt;
  &lt;figure&gt;
    &lt;a class="h-c-video h-c-video--marquee"
      href="https://youtube.com/watch?v=11PBno-cJ1g"
      data-glue-modal-trigger="uni-modal-11PBno-cJ1g-"
      data-glue-modal-disabled-on-mobile="true"&gt;

      
        

        &lt;div class="article-video__aspect-image"
          style="background-image: url(https://storage.googleapis.com/gweb-cloudblog-publish/images/next26_live_stream.max-1000x1000.jpg);"&gt;
          &lt;span class="h-u-visually-hidden"&gt;Google Cloud Next &amp;#x27;26 Opening Keynote&lt;/span&gt;
        &lt;/div&gt;
      
      &lt;svg role="img" class="h-c-video__play h-c-icon h-c-icon--color-white"&gt;
        &lt;use xlink:href="#mi-youtube-icon"&gt;&lt;/use&gt;
      &lt;/svg&gt;
    &lt;/a&gt;

    
  &lt;/figure&gt;
&lt;/div&gt;

&lt;div class="h-c-modal--video"
     data-glue-modal="uni-modal-11PBno-cJ1g-"
     data-glue-modal-close-label="Close Dialog"&gt;
   &lt;a class="glue-yt-video"
      data-glue-yt-video-autoplay="true"
      data-glue-yt-video-height="99%"
      data-glue-yt-video-vid="11PBno-cJ1g"
      data-glue-yt-video-width="100%"
      href="https://youtube.com/watch?v=11PBno-cJ1g"
      ng-cloak&gt;
   &lt;/a&gt;
&lt;/div&gt;

&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;h3&gt;&lt;span style="vertical-align: baseline;"&gt;Gemini Enterprise: The end-to-end system for the agentic era&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Throughout this unified stack is Gemini Enterprise — “the connective tissue between your data, your people, and your goals,” Thomas said, providing a combination of intelligence and automation across multiple layers. Here’s what’s new.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-image_full_width"&gt;






  
    &lt;div class="article-module h-c-page"&gt;
      &lt;div class="h-c-grid"&gt;
  

    &lt;figure class="article-image--large
      
      
        h-c-grid__col
        h-c-grid__col--6 h-c-grid__col--offset-3
        
        
      "
      &gt;

      
      
        
        &lt;img
            src="https://storage.googleapis.com/gweb-cloudblog-publish/images/GCNEXT2026_0422_103840-5416_ALIVE.max-1000x1000.jpg"
        
          alt="GCNEXT2026_0422_103840-5416_ALIVE"&gt;
        
        &lt;/a&gt;
      
    &lt;/figure&gt;

  
      &lt;/div&gt;
    &lt;/div&gt;
  




&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;h3 role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;1. Gemini Enterprise Agent Platform&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Gemini Enterprise Agent Platform is where you go to build, scale, govern, and optimize agents. As the evolution of Vertex AI, it’s built on top of our leading infrastructure, and deeply integrated with our data and security capabilities — the foundation of the Agentic Enterprise. Here’s a sampling of Agent Platform’s new features and capabilities:&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Build:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Choose the right environment for the job — from the low-code, visual interface of the new &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;Agent Studio&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;, to the code-first logic of the upgraded &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;Agent Development Kit (ADK)&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;. We’ve simplified the entire lifecycle with AI-native coding capabilities to help you ship production-grade agents faster.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Scale:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Clear the path to production with the re-engineered &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;Agent Runtime&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;. This supports long-running agents that maintain state for days at a time and are backed by &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;Memory Bank&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; for persistent, long-term context.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Govern:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Establish centralized control with &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;Agent Identity&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;, &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;Agent Registry&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;, and &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;Agent Gateway&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;. These capabilities help ensure every agent — whether built on Agent Platform or sourced from our partner ecosystem — has a trackable identity and operates within enterprise-grade guardrails. &lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Optimize:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Guarantee quality with &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;Agent Simulation&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;, &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;Agent Evaluation&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;, and &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;Agent Observability&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;. These tools provide full execution traces and a real-time lens into agent reasoning to help ensure your agents always hit their goals.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;To dive deep into Agent Platform, read more in our announcement blog &lt;/span&gt;&lt;a href="https://cloud.google.com/blog/products/ai-machine-learning/introducing-gemini-enterprise-agent-platform"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;here&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;. &lt;/span&gt;&lt;/p&gt;
&lt;h3 role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;2. Gemini Enterprise app&lt;/span&gt;&lt;/h3&gt;&lt;/div&gt;
&lt;div class="block-image_full_width"&gt;






  
    &lt;div class="article-module h-c-page"&gt;
      &lt;div class="h-c-grid"&gt;
  

    &lt;figure class="article-image--large
      
      
        h-c-grid__col
        h-c-grid__col--6 h-c-grid__col--offset-3
        
        
      "
      &gt;

      
      
        
        &lt;img
            src="https://storage.googleapis.com/gweb-cloudblog-publish/images/GCNEXT2026_0422_091032-8701_ALIVE.max-1000x1000.jpg"
        
          alt="GCNEXT2026_0422_091032-8701_ALIVE"&gt;
        
        &lt;/a&gt;
      
    &lt;/figure&gt;

  
      &lt;/div&gt;
    &lt;/div&gt;
  




&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;The Gemini Enterprise app is&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt; “the primary environment where your business actually operates,” Thomas explained. The app is where many workers, especially non-technical ones, can ask questions of enterprise agents, create generative media, engage with prebuilt agents, and even create their own with conversational interfaces — all with governance, compliance, and security built in. &lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;Here’s a sample of what’s new in this foundational interface: &lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Gemini Enterprise Projects &lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;give your agents permanent memory.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Deep Think &lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;solves your most complex business challenges without context pollution.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Microsoft 365 &lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;interoperability makes it easy to export docs you create with Canvas into Microsoft Office formats.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;To illustrate the power of Gemini Enterprise, Shaun White, three-time Olympic gold medalist, entrepreneur, and snowboarding legend, joined us on stage.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;“Back when I was training, our tools were camcorders and guesswork. You’d land a trick and watch it back. And you’d be thinking, ‘How can I make that trick better?’” he said.  &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Alongside Jason Davenport, Google Cloud Tech Lead, they showed a model that Google Cloud built in collaboration with Google DeepMind that tracked Shaun in space from a two-dimensional video, helping him understand what he was doing right and wrong. &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;“Learning the trick on the mountain is one thing, but actually understanding the physics of a trick is a whole other thing,” Shaun said.  &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Read more on the Gemini Enterprise app &lt;/span&gt;&lt;a href="https://cloud.google.com/blog/products/ai-machine-learning/whats-new-in-gemini-enterprise?e=48754805"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;here&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;. &lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-image_full_width"&gt;






  
    &lt;div class="article-module h-c-page"&gt;
      &lt;div class="h-c-grid"&gt;
  

    &lt;figure class="article-image--large
      
      
        h-c-grid__col
        h-c-grid__col--6 h-c-grid__col--offset-3
        
        
      "
      &gt;

      
      
        
        &lt;img
            src="https://storage.googleapis.com/gweb-cloudblog-publish/images/GCNEXT2026_0422_093546-4510_ALIVE.max-1000x1000.jpg"
        
          alt="Shaun White"&gt;
        
        &lt;/a&gt;
      
    &lt;/figure&gt;

  
      &lt;/div&gt;
    &lt;/div&gt;
  




&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;div data-draftjs-conductor-fragment='{"blocks":[{"key":"dsr0p","text":"3. AI Hypercomputer","type":"unstyled","depth":0,"inlineStyleRanges":[],"entityRanges":[],"data":{}},{"key":"4v39t","text":"The same technology foundation that athletes like Shaun White use to understand their performance is being used by enterprises to transform their businesses. ","type":"unstyled","depth":0,"inlineStyleRanges":[],"entityRanges":[],"data":{}},{"key":"fhjv0","text":"Amin Vadhat, SVP and chief technologist, AI and Infrastructure, took to the stage to announce enhancements to AI Hypercomputer, the integrated supercomputing underneath every AI workload on Google Cloud.","type":"unstyled","depth":0,"inlineStyleRanges":[],"entityRanges":[{"offset":110,"length":16,"key":0}],"data":{}}],"entityMap":{"0":{"type":"LINK","mutability":"MUTABLE","data":{"url":"https://cloud.google.com/solutions/ai-hypercomputer"}}}}'&gt;
&lt;div class="Draftail-block--unstyled" data-block="true" data-editor="fujua" data-offset-key="ma2lh-0-0"&gt;
&lt;h3 class="public-DraftStyleDefault-block public-DraftStyleDefault-ltr" data-offset-key="ma2lh-0-0"&gt;&lt;span data-offset-key="ma2lh-0-0"&gt;3. AI Hypercomputer&lt;/span&gt;&lt;/h3&gt;
&lt;/div&gt;
&lt;div class="Draftail-block--unstyled" data-block="true" data-editor="fujua" data-offset-key="25o56-0-0"&gt;
&lt;div class="public-DraftStyleDefault-block public-DraftStyleDefault-ltr" data-offset-key="25o56-0-0"&gt;&lt;span data-offset-key="25o56-0-0"&gt;The same technology foundation that athletes like Shaun White use to understand their performance is being used by enterprises to transform their businesses. &lt;/span&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;div class="Draftail-block--unstyled" data-block="true" data-editor="fujua" data-offset-key="et8mb-0-0"&gt;
&lt;div class="public-DraftStyleDefault-block public-DraftStyleDefault-ltr" data-offset-key="et8mb-0-0"&gt; &lt;/div&gt;
&lt;div class="public-DraftStyleDefault-block public-DraftStyleDefault-ltr" data-offset-key="et8mb-0-0"&gt;&lt;span data-offset-key="et8mb-0-0"&gt;Amin Vadhat, SVP and chief technologist, AI and Infrastructure, took to the stage to announce enhancements to &lt;/span&gt;&lt;a class="TooltipEntity" data-draftail-trigger="true" href="https://cloud.google.com/solutions/ai-hypercomputer" role="button"&gt;&lt;span data-offset-key="et8mb-1-0"&gt;AI Hypercomputer&lt;/span&gt;&lt;/a&gt;&lt;span data-offset-key="et8mb-2-0"&gt;, the integrated supercomputing underneath every AI workload on Google Cloud.&lt;/span&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;div class="block-image_full_width"&gt;






  
    &lt;div class="article-module h-c-page"&gt;
      &lt;div class="h-c-grid"&gt;
  

    &lt;figure class="article-image--large
      
      
        h-c-grid__col
        h-c-grid__col--6 h-c-grid__col--offset-3
        
        
      "
      &gt;

      
      
        
        &lt;img
            src="https://storage.googleapis.com/gweb-cloudblog-publish/images/GCNEXT2026_0422_094223-0331_ALIVE.max-1000x1000.jpg"
        
          alt="GCNEXT2026_0422_094223-0331_ALIVE"&gt;
        
        &lt;/a&gt;
      
    &lt;/figure&gt;

  
      &lt;/div&gt;
    &lt;/div&gt;
  




&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;First, there’s the eighth-generation Tensor Processing Unit, or TPU — “a thing of beauty,” Amin said. And because “the demands of training and serving have completely diverged,” this TPU family actually consists of two chips: &lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;TPU 8t, optimized for training&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;, uses new Inter-Chip Interconnect (ICI) technology to scale up to 9,600 TPUs and 2 petabytes of shared, high-bandwidth memory in a single superpod. It achieves three times the processing power of Ironwood and delivers up to double the performance per watt. &lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;TPU 8i, optimized for inference&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;, uses the new Boardfly topology to directly connect 1,152 TPUs in a single pod. It features three times more on-chip SRAM compared to previous versions and a specialized Collectives Acceleration Engine offloads resource-heavy tasks. Taken together, TPU 8i delivers 80% better performance per dollar for inference than the prior generation, helping &lt;/span&gt;&lt;a href="https://cloud.google.com/blog/products/compute/tpu-8t-and-tpu-8i-technical-deep-dive"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;millions of concurrent agents to run cost-effectively&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;AI Hypercomputer also supports Arm-based Google Cloud Axion processors, such as the &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;N4A, now generally available&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;, and will be one of the first platforms to offer &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;NVIDIA’s Vera Rubin NVL72&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; platform when it is released.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Other parts of the network need to keep up with the demands of the agentic era. For example, the new &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;Virgo Network&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; doubles connectivity to scale training beyond AI Hypercomputer superpods, and &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;Google Cloud Managed Lustre&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; now supports an industry-leading 10 terabytes per second of throughput. &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Learn more about all of our AI infrastructure innovations &lt;/span&gt;&lt;a href="https://cloud.google.com/blog/products/compute/ai-infrastructure-at-next26?e=48754805"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;here&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;. &lt;/span&gt;&lt;/p&gt;
&lt;h3 role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;4. Agentic Data Cloud&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;The AI era hinges on data. Lots of data. That data comes with a catch: It needs to be grounded in context. That’s because “reasoning without context is just a guess,” explained Karthik Narain, chief product and business officer.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-image_full_width"&gt;






  
    &lt;div class="article-module h-c-page"&gt;
      &lt;div class="h-c-grid"&gt;
  

    &lt;figure class="article-image--large
      
      
        h-c-grid__col
        h-c-grid__col--6 h-c-grid__col--offset-3
        
        
      "
      &gt;

      
      
        
        &lt;img
            src="https://storage.googleapis.com/gweb-cloudblog-publish/images/GCNEXT2026_0422_085702-_ALIVE.max-1000x1000.jpg"
        
          alt="GCNEXT2026_0422_085702-_ALIVE"&gt;
        
        &lt;/a&gt;
      
    &lt;/figure&gt;

  
      &lt;/div&gt;
    &lt;/div&gt;
  




&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;To that end, we’re totally rethinking our data platform, and giving it a new name: the Agentic Data Cloud.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Here’s a sampling of what you’ll find in the Agentic Data Cloud:&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Knowledge Catalog&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; constructs a unified, dynamic context graph of your entire business enabling you to ground agents in all of your business data and semantics. With &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;Smart Storage&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; and the Object Context API, files in Google Cloud Storage are instantly tagged and enriched with metadata before an agent touches them. Knowledge Catalog is also integrated with Gemini’s &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;Deep Research Agent&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Data Agent Kit&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; delivers a Gemini-powered data science authoring experience across your IDEs, Notebooks, and agentic terminals. &lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Lightning Engine for Apache Spark &lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;is a real-time, serverless engine that is up to 4.5 times faster than open-source alternatives and offers up to double price-performance over the leading competitor for large datasets.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;Finally, &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;Cross-Cloud Lakehouse&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;, based on Apache Iceberg, lets you query data in Amazon Web Services or Azure without having to copy it.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Learn all about all the innovations in the Agentic Data Cloud &lt;/span&gt;&lt;a href="https://cloud.google.com/blog/products/data-analytics/whats-new-in-the-agentic-data-cloud?e=48754805"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;here&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;. &lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;span style="vertical-align: baseline;"&gt;5. Agentic Defense&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;AI makes — and demands — that everything go faster, and security operations are no exception. Increasingly, “human analysts can’t keep up with AI-driven attacks,” said Francis deSouza, COO, and president, Security Products. &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;“Security must become an autonomous force, responding faster than the threat itself,” he said. &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;To help, we introduced three new agents in &lt;/span&gt;&lt;a href="https://cloud.google.com/security/products/security-operations"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Google Security Operations&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; to help you defend at the speed of AI. &lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Threat Hunting agent&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;: Helps teams proactively hunt for novel attack patterns and stealthy adversary behaviors that bypass traditional defenses. Now in preview. &lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Detection Engineering agent&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;: Identifies coverage gaps and creates new detections for threat scenarios. Now in preview. &lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Third-Party Context agent: &lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;Enriches workflows with contextual data from third-party content. Coming soon to preview.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;You can also &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;build your own security agents&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; with remote Google Cloud MCP server support for Google Security Operations, now generally available, and access it from the Google Security Operations chat interface, now in preview.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-image_full_width"&gt;






  
    &lt;div class="article-module h-c-page"&gt;
      &lt;div class="h-c-grid"&gt;
  

    &lt;figure class="article-image--large
      
      
        h-c-grid__col
        h-c-grid__col--6 h-c-grid__col--offset-3
        
        
      "
      &gt;

      
      
        
        &lt;img
            src="https://storage.googleapis.com/gweb-cloudblog-publish/images/GCNEXT2026_0422_100749-9822_ALIVE.max-1000x1000.jpg"
        
          alt="GCNEXT2026_0422_100749-9822_ALIVE"&gt;
        
        &lt;/a&gt;
      
    &lt;/figure&gt;

  
      &lt;/div&gt;
    &lt;/div&gt;
  




&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Then there’s &lt;/span&gt;&lt;a href="https://cloud.google.com/blog/products/identity-security/google-completes-acquisition-of-wiz?e=48754805"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Wiz, now a part of Google Cloud&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, whose AI-Application Protection Platform (AI-APP), Wiz Security Agents, and Wiz Workflow help you identify and respond to risks and threats at machine speed. New in the Wiz family today are&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;:  &lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Secure vibe-coded applications: &lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;A new integration runs Wiz security scanning directly inside the Lovable platform so vulnerabilities, secrets, and misconfigurations caught by Wiz surface in Lovable's built-in security view. Generally available in May.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Secure AI-generated code&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;: Wiz can now remove risks from AI-generated code with inline AI security hooks integrated directly into IDEs and agent workflows, injecting security guardrails before code is committed.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Agent-based remediation&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;: Wiz Skills can equip coding agents and AI-native IDEs with full code-to-cloud context and validated attack surface findings from the Wiz Security Graph.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;a href="https://www.wiz.io/academy/ai-security/ai-bom-ai-bill-of-materials" rel="noopener" target="_blank"&gt;&lt;strong style="text-decoration: underline; vertical-align: baseline;"&gt;AI-Bill of Materials&lt;/strong&gt;&lt;/a&gt;&lt;strong style="vertical-align: baseline;"&gt; (AI-BOM):&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Work towards eliminating shadow IT by automatically inventorying all AI frameworks, models, and IDE extensions across your environment.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;a href="https://cloud.google.com/blog/products/identity-security/introducing-google-cloud-fraud-defense-the-next-evolution-of-recaptcha"&gt;&lt;strong style="text-decoration: underline; vertical-align: baseline;"&gt;Google Cloud Fraud Defense&lt;/strong&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;: The evolution of reCAPTCHA, this platform is designed to discern the legitimacy and authorization of bots, humans, and agents. Now generally available.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Read more about these security innovations &lt;/span&gt;&lt;a href="https://cloud.google.com/blog/products/identity-security/next26-redefining-security-for-the-ai-era-with-google-cloud-and-wiz?e=48754805"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;here&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;. &lt;/span&gt;&lt;/p&gt;
&lt;h3 role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;6. Workspace Intelligence&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;For a look at AI for end-users, we heard from Yulie Kwon Kim, VP, Product, Google Workspace, who shared new ways that AI is manifesting in our collaboration and productivity suite. &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;Workspace Intelligence &lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;is a unifying semantic layer that breaks down information and context silos for you and your agents. It understands your work, your priorities and the people you work with to help you get more done. &lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-image_full_width"&gt;






  
    &lt;div class="article-module h-c-page"&gt;
      &lt;div class="h-c-grid"&gt;
  

    &lt;figure class="article-image--large
      
      
        h-c-grid__col
        h-c-grid__col--6 h-c-grid__col--offset-3
        
        
      "
      &gt;

      
      
        
        &lt;img
            src="https://storage.googleapis.com/gweb-cloudblog-publish/images/GCNEXT2026_0422_102731-1098_ALIVE.max-1000x1000.jpg"
        
          alt="GCNEXT2026_0422_102731-1098_ALIVE"&gt;
        
        &lt;/a&gt;
      
    &lt;/figure&gt;

  
      &lt;/div&gt;
    &lt;/div&gt;
  




&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;“Think of it as a unified intelligence layer that lives inside every Workspace app. It connects the dots and lets AI do the heavy lifting," Yulie said.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Here’s what’s new:&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Ask Gemini in Google Chat &lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;allows you to instantly synthesize information, surface insights, and query projects from across Workspace directly from your Google Chat window. It provides proactive daily briefings to help you prioritize, and also lets you take immediate action — such as scheduling a meeting on your calendar or creating a Google Doc to develop a pre-meeting brief — turning your conversations into momentum without having to switch tabs.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-video"&gt;



&lt;div class="article-module article-video "&gt;
  &lt;figure&gt;
    &lt;a class="h-c-video h-c-video--marquee"
      href="https://youtube.com/watch?v=YppfLqH7Fps"
      data-glue-modal-trigger="uni-modal-YppfLqH7Fps-"
      data-glue-modal-disabled-on-mobile="true"&gt;

      
        

        &lt;div class="article-video__aspect-image"
          style="background-image: url(https://storage.googleapis.com/gweb-cloudblog-publish/images/maxresdefault_JTY9905.max-1000x1000.jpg);"&gt;
          &lt;span class="h-u-visually-hidden"&gt;Ask Gemini in Chat&lt;/span&gt;
        &lt;/div&gt;
      
      &lt;svg role="img" class="h-c-video__play h-c-icon h-c-icon--color-white"&gt;
        &lt;use xlink:href="#mi-youtube-icon"&gt;&lt;/use&gt;
      &lt;/svg&gt;
    &lt;/a&gt;

    
  &lt;/figure&gt;
&lt;/div&gt;

&lt;div class="h-c-modal--video"
     data-glue-modal="uni-modal-YppfLqH7Fps-"
     data-glue-modal-close-label="Close Dialog"&gt;
   &lt;a class="glue-yt-video"
      data-glue-yt-video-autoplay="true"
      data-glue-yt-video-height="99%"
      data-glue-yt-video-vid="YppfLqH7Fps"
      data-glue-yt-video-width="100%"
      href="https://youtube.com/watch?v=YppfLqH7Fps"
      ng-cloak&gt;
   &lt;/a&gt;
&lt;/div&gt;

&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Reimagined content creation in Docs, Sheets, and Slides &lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;uses Workspace Intelligence to synthesize information from across Workspace and the web and creates professionally formatted drafts that match your voice, style, and brand. &lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;AI Inbox and AI Overviews in Gmail&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; creates a&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt; personal, proactive &lt;/span&gt;&lt;a href="https://blog.google/products-and-platforms/products/gmail/gmail-is-entering-the-gemini-era/" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;inbox assistant with Gemini&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Google Drive Projects&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; instantly organizes your team's files and emails to manage workflows, generate content, and deliver specific answers based on rich project context. In addition to newly added &lt;/span&gt;&lt;a href="https://blog.google/products-and-platforms/products/workspace/gemini-workspace-updates-march-2026/" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;AI Overviews and Ask Gemini&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, Projects is another way we’re transforming Drive from a storage tool into an active collaborator to provide insights about your data.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Workspace agent&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; in Gemini Enterprise executes complex, multi-step tasks across Google Workspace apps without having to leave Gemini Enterprise. &lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;For more, check out the full Workspace Intelligence &lt;/span&gt;&lt;a href="https://workspace.google.com/blog/product-announcements/introducing-workspace-intelligence?_gl=1*1lcx1ks*_up*MQ..&amp;amp;gclid=CjwKCAjw46HPBhAMEiwASZpLRCh04El-PH-mQX3OW7IcONinrI6ZdqmWKi_j1tyhxEOFnZTaaMBr2xoCFb8QAvD_BwE&amp;amp;gclsrc=aw.ds" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;blog&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;.&lt;/span&gt;&lt;/p&gt;
&lt;h3 role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;7. Agentic Commerce&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;For enterprises, AI agents are reshaping how consumers engage with companies and their products. &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;“In the agentic era, an agent isn’t just a tool; it’s a strategic extension of your business, built to expand your reach, deepen engagement, and personalize service at scale,” said Carrie Tharp, Google Cloud Vice President of Go To Market Strategic Industries. &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://cloud.google.com/products/gemini-enterprise-for-customer-experience"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Gemini Enterprise for Customer Experience&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; offers a suite of tools to enhance the entire customer journey, from the first moment of discovery through on-going interactions that remember the customer like the best shopkeeper would. &lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Shopping agent &lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;and &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;Food Ordering agent &lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;bring new conversational sales and ordering capabilities direct to businesses and third-party chat and digital interfaces.&lt;/span&gt;&lt;/li&gt;
&lt;li role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Omnichannel Gateway&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; helps agents maintain context across web, mobile, and voice, so a company’s agents can offer more personalized service.&lt;/span&gt;&lt;/li&gt;
&lt;li role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Agent Assist&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; helps during complex customer service situations, coaching employees to deliver fast and more accurate answers to customer questions by having organizational data readily available through gen AI grounding.&lt;/span&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;With Omnichannel Gateway in particular, about bridging the physical, digital, and agentic shopping experience, so consumers always have a familiar, brand-aware experience. &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;“If a customer moves from text chat to a phone call, the agent seamlessly remembers exactly where they left off,” said Carrie. Now that’s progress!&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;For more customer stories, check out all &lt;/span&gt;&lt;a href="https://cloud.google.com/transform/101-real-world-generative-ai-use-cases-from-industry-leaders?e=48754805"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;1,302 of the latest gen AI use cases&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; from businesses around the globe.&lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;span style="vertical-align: baseline;"&gt;Innovate all the things &lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;From new products, to new solutions, to new ways of working, there are so many other ways that we’re helping organizations take their AI from pilot to production. &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Over the following week, we’ll share even more news, helpful how-to guides, and go deeper on today’s announcements. Stay tuned!&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;</description><pubDate>Wed, 22 Apr 2026 23:00:00 +0000</pubDate><guid>https://cloud.google.com/blog/topics/google-cloud-next/next26-day-1-recap/</guid><category>AI &amp; Machine Learning</category><category>Application Development</category><category>Data Analytics</category><category>Google Cloud Next</category><media:content height="540" url="https://storage.googleapis.com/gweb-cloudblog-publish/images/GCNEXT2026_0422_090309-3826_ALIVE.max-600x600.jpg" width="540"></media:content><og xmlns:og="http://ogp.me/ns#"><type>article</type><title>Day 1 at Google Cloud Next ‘26 recap</title><description></description><image>https://storage.googleapis.com/gweb-cloudblog-publish/images/GCNEXT2026_0422_090309-3826_ALIVE.max-600x600.jpg</image><site_name>Google</site_name><url>https://cloud.google.com/blog/topics/google-cloud-next/next26-day-1-recap/</url></og><author xmlns:author="http://www.w3.org/2005/Atom"><name>Google Cloud Content &amp; Editorial </name><title></title><department></department><company></company></author></item><item><title>Converging operational and analytical data for AI transformation</title><link>https://cloud.google.com/blog/products/databases/unify-analytical-and-operational-data-for-ai/</link><description>&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;To act at the speed of business, AI agents must operate in fast and trusted reasoning loops. They need to “think” by reasoning across both your historical context and your live operational reality. Only by understanding this complete, real-time picture can they “do” — taking immediate action.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;For decades, data architectures have been built with a structural wall that breaks this loop, separating the platforms that generate insights from the platforms that manage actions. This latency means insights are gleaned after the critical window for an agent to take action has closed. Achieving true AI transformation requires organizations to move from a passive system of record to a proactive System of Action, built on a closed-loop architecture that converges operational and analytical data.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;At Google Cloud Next, we announced new unifying capabilities that drive our &lt;/span&gt;&lt;a href="https://cloud.google.com/transform/shift-system-of-action-architecting-the-agentic-data-cloud-AI"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Agentic Data Cloud&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, eliminating silos and enabling 98% of our largest data cloud customers to run operational and analytical workloads in a unified data platform. By operating &lt;/span&gt;&lt;a href="https://cloud.google.com/products/alloydb"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;AlloyDB&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, &lt;/span&gt;&lt;a href="https://cloud.google.com/bigquery"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;BigQuery&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; and &lt;/span&gt;&lt;a href="https://cloud.google.com/spanner"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Spanner&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; together, we are delivering an AI-native architecture that unlocks the full potential of your data for real-time, agentic applications.&lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;span style="vertical-align: baseline;"&gt;Flexible, real-time data agents&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;To act effectively, agents require both operational and historical signals for sound decision-making. Our &lt;/span&gt;&lt;a href="https://cloud.google.com/data-cloud"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Agentic Data Cloud&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; bridges the gap between the operational "now" and analytical history by handling the complex plumbing for you. We provide diverse integration models across data federation, reverse ETL, and real-time ingestion to the lakehouse, empowering your agents to make high-stakes decisions with both live context and historical depth.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;For example, sometimes an agent driving a live operational application needs to pull historical context on demand. Through &lt;/span&gt;&lt;a href="https://docs.cloud.google.com/alloydb/docs/bigquery-view-alloydb-overview"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Lakehouse federation for AlloyDB&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; (Preview), agents can access &lt;/span&gt;&lt;a href="https://cloud.google.com/biglake"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Lakehouse&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; data directly from AlloyDB itself. This allows frontline systems to instantly query extensive historical data without relying on brittle data movement pipelines.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;In other scenarios, the challenge is reversed: deeply complex historical insights have already been calculated in the data warehouse, but an agent needs to deliver them to millions of users at conversational speeds. &lt;/span&gt;&lt;a href="https://docs.cloud.google.com/bigquery/docs/export-to-spanner"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Reverse ETL for BigQuery&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; (Preview) provides a one-click solution to push these heavy analytical insights back into AlloyDB, Bigtable, or Spanner, enabling agents to serve them with sub-millisecond latency.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-image_full_width"&gt;






  
    &lt;div class="article-module h-c-page"&gt;
      &lt;div class="h-c-grid"&gt;
  

    &lt;figure class="article-image--large
      
      
        h-c-grid__col
        h-c-grid__col--6 h-c-grid__col--offset-3
        
        
      "
      &gt;

      
      
        
        &lt;img
            src="https://storage.googleapis.com/gweb-cloudblog-publish/images/One-click_reverse_ETL.png.max-1000x1000.jpg"
        
          alt="One-click reverse ETL.png"&gt;
        
        &lt;/a&gt;
      
        &lt;figcaption class="article-image__caption "&gt;&lt;p data-block-key="8ihsc"&gt;One-click reverse ETL&lt;/p&gt;&lt;/figcaption&gt;
      
    &lt;/figure&gt;

  
      &lt;/div&gt;
    &lt;/div&gt;
  




&lt;/div&gt;
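&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;To make the reverse ETL pattern concrete, here is a minimal Python sketch under stated assumptions. It is not the BigQuery export API: the sample rows, the ServingStore class, and the compute_insights and reverse_etl functions are hypothetical stand-ins. An in-memory list plays the role of warehouse data, and a dictionary plays the role of an operational store such as AlloyDB, Bigtable, or Spanner.&lt;/span&gt;&lt;/p&gt;

```python
from collections import defaultdict

# Hypothetical warehouse rows: (customer_id, order_total). In practice these
# would come from an analytical query; here they are inline sample data.
WAREHOUSE_ORDERS = [
    ("c1", 40.0),
    ("c2", 15.0),
    ("c1", 60.0),
    ("c3", 25.0),
    ("c2", 5.0),
]


def compute_insights(rows):
    """Batch step: the heavy aggregation, done once on the analytical side."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for customer_id, amount in rows:
        totals[customer_id] += amount
        counts[customer_id] += 1
    return {
        cid: {"lifetime_value": totals[cid], "order_count": counts[cid]}
        for cid in totals
    }


class ServingStore:
    """Hypothetical stand-in for an operational key-value store."""

    def __init__(self):
        self._rows = {}

    def upsert(self, key, value):
        self._rows[key] = value

    def get(self, key):
        return self._rows.get(key)


def reverse_etl(insights, store):
    """Push precomputed insights into the serving store, keyed for lookups."""
    for customer_id, record in insights.items():
        store.upsert(customer_id, record)


store = ServingStore()
reverse_etl(compute_insights(WAREHOUSE_ORDERS), store)
# An agent answering a user now does a single keyed read instead of a scan.
print(store.get("c1"))
```

&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;The design choice is to pay for the expensive aggregation once, in batch, and store the result under the key an agent will look up (here, a customer ID), so serving a user becomes a single keyed read rather than an analytical scan.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;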
&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Teams running real-time analytics on live operational data typically have to move that data into analytical systems — an error-prone process that introduces lag. With &lt;/span&gt;&lt;a href="https://docs.cloud.google.com/spanner/docs/columnar-engine"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Spanner Columnar Engine&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; (GA) users can perform analytical queries that run up to 200 times faster with zero impact on production transactional workloads. &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Finally, the reasoning loop is not complete until an agent’s real-time action is captured for downstream analysis. To close this loop, &lt;/span&gt;&lt;a href="https://docs.cloud.google.com/datastream/docs/destination-blmt"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Datastream for Lakehouse Apache Iceberg tables&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; provides real-time Change Data Capture (CDC) from AlloyDB, Cloud SQL, Spanner, and Oracle directly into the open Lakehouse. This process streams every operational change as an append-only event into Lakehouse tables, making that data immediately available in BigQuery for ML model training, feature engineering, and real-time analytics.  &lt;/span&gt;&lt;/p&gt;
&lt;p style="padding-left: 40px;"&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;"AlloyDB, along with other Google Cloud products like BigQuery, provides the agility and performance needed to continually enhance our platform's capabilities and help us anticipate emerging trends rather than merely reacting.” &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;- Javi Fernández, CTO, Loyal Guru&lt;/strong&gt;&lt;/p&gt;
&lt;h3&gt;&lt;span style="vertical-align: baseline;"&gt;Grounding agents in a unified governance foundation&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Inconsistent definitions and unclear data ownership across operational and analytical systems can cause agents to hallucinate. To address this, we are extending &lt;/span&gt;&lt;a href="https://cloud.google.com/dataplex"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Knowledge Catalog&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; (Preview), formerly Dataplex, with new integrations for AlloyDB, BigQuery, Bigtable, Cloud SQL, and Spanner to provide a unified map of your data landscape. Integrations with Oracle AI Database@Google Cloud and Firestore are coming soon. The Knowledge Catalog works by aggregating native context across your Google and partner data platforms, semantic models, and third-party catalogs, unifying them into a single, governed source of truth needed to build and scale reliable agents. &lt;/span&gt;&lt;/p&gt;
&lt;p style="padding-left: 40px;"&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;"Seven-Eleven Japan created “Seven Central,” a scalable data platform that uses Spanner and BigQuery to provide real-time insights and support the company’s digital innovation strategies. We collect data from all 21,000+ stores, and in anticipation of a future expansion in business operations, we have designed a system that can scale up and run without issue, even if we were to have 30,000 stores, with 1,000 customers per store per day."&lt;/span&gt;&lt;br/&gt;&lt;/span&gt;&lt;strong style="font-style: italic; vertical-align: baseline;"&gt;-Izuru Nishimura, Executive Officer and Head of ICT Department, Seven-Eleven Japan&lt;/strong&gt;&lt;/p&gt;
&lt;h3&gt;&lt;span style="vertical-align: baseline;"&gt;Unified engines for deep reasoning&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;To move beyond simple Q&amp;amp;A chatbots to autonomous agents, AI must reason across every dimension of your data estate. Historically, combining keyword search, semantic understanding, and relationship mapping meant moving data out of operational databases and into specialized, siloed search engines — introducing latency and complexity.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Google’s Agentic Data Cloud eliminates these silos. By embedding native vector and full-text search directly into operational databases like AlloyDB, Bigtable, Cloud SQL, Firestore, and Spanner, agents can execute highly accurate hybrid searches combining keyword relevance and semantic intent. &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;We’re also bringing together graph and vector support across BigQuery and Spanner. With graph federation, an agent can match live user intent in Spanner and immediately trace that intent through historical graph relationships in BigQuery Graph — accelerating autonomous decision-making without moving the data. This multi-model approach powers advanced GraphRAG patterns, equipping agents with the rich, interconnected context required to accelerate autonomous decision-making.&lt;/span&gt;&lt;/p&gt;
&lt;p style="padding-left: 40px;"&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;“To deliver AI that actually works across HR, payroll, and workforce operations, you need a consistent, real-time data layer. With the power of Google’s Agentic Data Cloud, People Fabric is the backbone of UKG’s Workforce Operating Platform — turning fragmented systems into a single source of truth that powers intelligent, agent-driven experiences.”&lt;br/&gt;&lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;-Radhi Chagarlamudi, Group Vice President, Product Engineering, UKG&lt;/strong&gt;&lt;/p&gt;
&lt;h3&gt;&lt;span style="vertical-align: baseline;"&gt;Built for performance at agent scale&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Our Agentic Data Cloud delivers the closed-loop architecture required for the AI era without compromising operational performance. Built on open standards like Iceberg and PostgreSQL, and governed by universal semantics, Google Cloud provides the speed, throughput, and trusted context needed to build the next generation of conversational and autonomous applications.&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Build: &lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;Explore the &lt;/span&gt;&lt;a href="https://cloud.google.com/alloydb/docs/ai"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;AlloyDB AI documentation&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; to start grounding your agents.&lt;/span&gt;&lt;/li&gt;
&lt;li role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Connect: &lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;Visit the &lt;/span&gt;&lt;a href="https://console.cloud.google.com/bigquery"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;BigQuery Console&lt;/span&gt;&lt;/a&gt;&lt;strong style="vertical-align: baseline;"&gt; &lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;to set up your first federated query to Spanner.&lt;/span&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong style="vertical-align: baseline;"&gt;Govern: &lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;Opt-in to the &lt;/span&gt;&lt;a href="https://cloud.google.com/dataplex/docs/introduction"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Knowledge Catalog&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; for unified visibility.&lt;/span&gt;&lt;/li&gt;
&lt;/ul&gt;&lt;/div&gt;</description><pubDate>Wed, 22 Apr 2026 12:00:00 +0000</pubDate><guid>https://cloud.google.com/blog/products/databases/unify-analytical-and-operational-data-for-ai/</guid><category>Data Analytics</category><category>Google Cloud Next</category><category>Databases</category><media:content height="540" url="https://storage.googleapis.com/gweb-cloudblog-publish/images/GCN26_102_BlogHeader_2436x1200_Opt_20_Light.max-600x600.jpg" width="540"></media:content><og xmlns:og="http://ogp.me/ns#"><type>article</type><title>Converging operational and analytical data for AI transformation</title><description></description><image>https://storage.googleapis.com/gweb-cloudblog-publish/images/GCN26_102_BlogHeader_2436x1200_Opt_20_Light.max-600x600.jpg</image><site_name>Google</site_name><url>https://cloud.google.com/blog/products/databases/unify-analytical-and-operational-data-for-ai/</url></og><author xmlns:author="http://www.w3.org/2005/Atom"><name>Sean Zinsmeister</name><title>Director of Product Management, Data Cloud</title><department></department><company></company></author><author xmlns:author="http://www.w3.org/2005/Atom"><name>Sujatha Mandava</name><title>Director of Product Management,  Databases</title><department></department><company></company></author></item></channel></rss>