<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:media="http://search.yahoo.com/mrss/"><channel><title>Data Analytics</title><link>https://cloud.google.com/blog/products/data-analytics/</link><description>Data Analytics</description><atom:link href="https://cloudblog.withgoogle.com/blog/products/data-analytics/rss/" rel="self"></atom:link><language>en</language><lastBuildDate>Mon, 22 Jun 2026 19:09:02 +0000</lastBuildDate><image><url>https://cloud.google.com/blog/products/data-analytics/static/blog/images/google.a51985becaa6.png</url><title>Data Analytics</title><link>https://cloud.google.com/blog/products/data-analytics/</link></image><item><title>Boost BigQuery with Python: Managed Python UDFs now generally available</title><link>https://cloud.google.com/blog/products/data-analytics/python-udf-in-bigquery-now-generally-available/</link><description>&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;SQL is the industry standard for high-performance structured data analysis. However, expressing complex procedural logic, scientific computations, advanced string manipulations, or machine learning workflows in pure SQL can be highly challenging, if not impossible. That kind of work is better done with Python. Data practitioners often take on additional infrastructure management tasks &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;—&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; maintaining custom images and containers, and working with additional compute services — just to run simple helper functions with custom Python code and libraries. &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span&gt;&lt;span style="vertical-align: baseline;"&gt;Today, we are thrilled to announce the general availability (GA) of&lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt; &lt;/strong&gt;&lt;a href="https://docs.cloud.google.com/bigquery/docs/user-defined-functions-python"&gt;&lt;strong style="text-decoration: underline; vertical-align: baseline;"&gt;BigQuery Managed Python User-Defined Functions (UDFs)&lt;/strong&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;. &lt;/span&gt;&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;This launch represents a major milestone in BigQuery’s extensibility strategy, allowing data scientists, engineers, and analysts to execute custom Python code directly and securely inside BigQuery using standard SQL queries or &lt;/span&gt;&lt;a href="https://docs.cloud.google.com/bigquery/docs/bigquery-dataframes-introduction"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;BigQuery DataFrames&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; (BigFrames) in Python. With this release, Python UDFs are fully supported for production enterprise workloads and completely integrated into BigQuery's billing SKUs. &lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Bridging SQL and the Rich Python Ecosystem&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;BigQuery Managed Python UDFs run on BigQuery-managed serverless resources that automatically scales to billions of rows, without having to set up infrastructure or manage containers. BigQuery automatically handles the compilation, image building, security patching, deployment, and execution of your Python code, making it super simple to use Python functions in your SQL.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Core benefits&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Flexibility:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Access the vast Python ecosystem — including top-tier scientific and mathematical libraries like NumPy, SciPy, pandas, scikit-learn and more — directly in your SQL select statements.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Tight external API integration:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Clean and enrich your BigQuery tables in real time by calling external web APIs or Google Cloud services such as Cloud Translation, Gemini Enterprise Agent Platform or custom microservices securely within your queries.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Fully managed and serverless:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; BigQuery handles the underlying container infrastructure and auto-scales performance dynamically.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Code example &lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Here is an example of a Python UDF that utilizes a popular Python package —&lt;/span&gt;&lt;code style="vertical-align: baseline;"&gt; beautifulsoup&lt;/code&gt;&lt;span style="vertical-align: baseline;"&gt; — to remove HTML tags. We use this function to process &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;StackOverflow answer bodies that are stored in a BigQuery public table:&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-code"&gt;&lt;dl&gt;
    &lt;dt&gt;code_block&lt;/dt&gt;
    &lt;dd&gt;&amp;lt;ListValue: [StructValue([(&amp;#x27;code&amp;#x27;, &amp;#x27;CREATE OR REPLACE FUNCTION `your_project.your_dataset.clean_html`(html_content STRING)\r\nRETURNS STRING\r\nLANGUAGE python\r\nOPTIONS (\r\n  runtime_version = \&amp;#x27;python-3.11\&amp;#x27;,\r\n  entry_point = \&amp;#x27;strip_tags\&amp;#x27;,\r\n  packages = [\&amp;#x27;beautifulsoup4&amp;gt;=4.12.0\&amp;#x27;]\r\n) AS r\&amp;#x27;\&amp;#x27;\&amp;#x27;\r\nfrom bs4 import BeautifulSoup\r\n\r\ndef strip_tags(html_content):\r\n    if not html_content:\r\n        return &amp;quot;&amp;quot;\r\n    soup = BeautifulSoup(html_content, &amp;quot;html.parser&amp;quot;)\r\n    return soup.get_text(separator=&amp;quot; &amp;quot;)\r\n\&amp;#x27;\&amp;#x27;\&amp;#x27;;&amp;#x27;), (&amp;#x27;language&amp;#x27;, &amp;#x27;&amp;#x27;), (&amp;#x27;caption&amp;#x27;, &amp;lt;wagtail.rich_text.RichText object at 0x7f84a5776760&amp;gt;)])]&amp;gt;&lt;/dd&gt;
&lt;/dl&gt;&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;How to query it:&lt;/strong&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-code"&gt;&lt;dl&gt;
    &lt;dt&gt;code_block&lt;/dt&gt;
    &lt;dd&gt;&amp;lt;ListValue: [StructValue([(&amp;#x27;code&amp;#x27;, &amp;#x27;SELECT \r\n  id, \r\n  `your_project.your_dataset.clean_html`(body) AS cleaned_answer_body\r\nFROM \r\n  `bigquery-public-data.stackoverflow.posts_answers`\r\nLIMIT 100&amp;#x27;), (&amp;#x27;language&amp;#x27;, &amp;#x27;&amp;#x27;), (&amp;#x27;caption&amp;#x27;, &amp;lt;wagtail.rich_text.RichText object at 0x7f84a57767f0&amp;gt;)])]&amp;gt;&lt;/dd&gt;
&lt;/dl&gt;&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Advanced capabilities&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;For advanced users, Python UDF adds a set of capabilities to tune the performance as well as monitor the usage. Here are some examples. &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Vectorized processing with Pandas PyArrow&lt;br/&gt;&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;To maximize throughput, the GA release supports direct processing of vectorized input as PyArrow RecordBatches. By processing columns of data in bulk rather than row-by-row, PyArrow eliminates Python serialization and conversion overhead, boosting performance by up to 10x for data-intensive calculations.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Configurable container resources&lt;br/&gt;&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;For heavy-duty data science and ML data preparation, you can now provision container memory (up to 16 GB) and CPU (up to 4 vCPUs) per function. This enables memory-intensive workloads (such as loading large serialized models or geospatial datasets) to run directly within the sandbox.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Customizable concurrency&lt;br/&gt;&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;Optimize your throughput and resource efficiency by configuring concurrent requests per container (up to 1,000 concurrent operations). This helps ensure that your scale-out execution is highly cost-effective and performs exceptionally well under heavy parallel loads.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Streaming logs and real-time metrics&lt;br/&gt;&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;Easily d&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;ebug and monitor your production workloads. The BigQuery console now features a direct link from your query results to real-time CPU, memory, and concurrency metrics in Cloud Monitoring.&lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Billing&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;BigQuery Managed Python UDF are billed with &lt;/span&gt;&lt;a href="https://cloud.google.com/bigquery/pricing#bigquery-services-pricing"&gt;&lt;strong style="text-decoration: underline; vertical-align: baseline;"&gt;BigQuery Services SKU&lt;/strong&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;. This SKU is fully eligible for &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;BigQuery spend commitment-based usage discounts (CUDs)&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;, allowing you to maximize budget efficiency.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;You can also get cost observability through &lt;/span&gt;&lt;code style="vertical-align: baseline;"&gt;INFORMATION_SCHEMA.JOBS &lt;/code&gt;&lt;span style="vertical-align: baseline;"&gt;as well as using billing labels &lt;/span&gt;&lt;code style="vertical-align: baseline;"&gt;MANAGED_ROUTINE_EXECUTION&lt;/code&gt;&lt;span style="vertical-align: baseline;"&gt; and &lt;/span&gt;&lt;code style="vertical-align: baseline;"&gt;MANAGED_ROUTINE_BUILD&lt;/code&gt;&lt;span style="vertical-align: baseline;"&gt;).&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;See more details in the &lt;/span&gt;&lt;a href="https://docs.cloud.google.com/bigquery/docs/user-defined-functions-python#pricing"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Pricing&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; section of the documentation. &lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Getting started &lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;To get started with BigQuery Python UDFs, first check out &lt;/span&gt;&lt;a href="https://docs.cloud.google.com/bigquery/docs/user-defined-functions-python"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;product documentation&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;. &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Then, try out the functions &lt;/span&gt;&lt;a href="https://console.cloud.google.com/bigquery?ws=!1m5!1m4!6m3!1sbigquery-public-data!2spython_udfs!3stokenize"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;published&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; in the public BigQuery dataset. For example, run the following code in a BigQuery project to tokenize country names data from BigQuery public data. Under the hood, the token UDF utilizes the &lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;o200k_base&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt; tokenizer library.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-code"&gt;&lt;dl&gt;
    &lt;dt&gt;code_block&lt;/dt&gt;
    &lt;dd&gt;&amp;lt;ListValue: [StructValue([(&amp;#x27;code&amp;#x27;, &amp;#x27;SELECT \r\n  country_code,\r\n  country_name,\r\n  `bigquery-public-data`.python_udfs.tokenize(country_name) AS name_tokens,\r\n  ARRAY_LENGTH(`bigquery-public-data`.python_udfs.tokenize(country_name)) AS token_count\r\nFROM \r\n  `bigquery-public-data.census_bureau_international.country_names_area`\r\nORDER BY \r\n  country_name&amp;#x27;), (&amp;#x27;language&amp;#x27;, &amp;#x27;&amp;#x27;), (&amp;#x27;caption&amp;#x27;, &amp;lt;wagtail.rich_text.RichText object at 0x7f84a5776dc0&amp;gt;)])]&amp;gt;&lt;/dd&gt;
&lt;/dl&gt;&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Or, try out this &lt;/span&gt;&lt;a href="https://codelabs.developers.google.com/managed-python-udfs" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;code lab&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; to explore some advanced scenarios. &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Then, to learn how to implement other advanced design patterns, we encourage you to explore our official public documentation guides: &lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Calling Google Cloud or online services (with connections):&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; To connect to first-party Google Cloud services such as Gemini Enterprise Agent Platform or Cloud Translation, or external API endpoints securely using Cloud Resource connections, - check out the &lt;/span&gt;&lt;a href="https://docs.cloud.google.com/bigquery/docs/user-defined-functions-python#use-online-service"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Call Google Cloud or online services in Python code guide&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;.&lt;/span&gt;&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;BigQuery DataFrames (BigFrames) Python UDFs:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;To learn how to write, deploy, and scale custom Python functions natively from standard Jupyter notebook or Colab environments using BigQuery DataFrames, visit the &lt;/span&gt;&lt;a href="https://docs.cloud.google.com/bigquery/docs/user-defined-functions-python#bigquery-dataframes_1"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Customize Python functions for BigQuery DataFrames guide&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Bring your Python workflows out of isolation and directly into the heart of your data warehouse today!&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;</description><pubDate>Mon, 22 Jun 2026 17:00:00 +0000</pubDate><guid>https://cloud.google.com/blog/products/data-analytics/python-udf-in-bigquery-now-generally-available/</guid><category>Application Development</category><category>Data Analytics</category><og xmlns:og="http://ogp.me/ns#"><type>article</type><title>Boost BigQuery with Python: Managed Python UDFs now generally available</title><description></description><site_name>Google</site_name><url>https://cloud.google.com/blog/products/data-analytics/python-udf-in-bigquery-now-generally-available/</url></og><author xmlns:author="http://www.w3.org/2005/Atom"><name>Sandeep Karmarkar</name><title>Group Product Manager</title><department></department><company></company></author><author xmlns:author="http://www.w3.org/2005/Atom"><name>Chao Shen</name><title>Tech lead</title><department></department><company></company></author></item><item><title>From AI potential to agentic reality: Driving the UK’s next chapter</title><link>https://cloud.google.com/blog/topics/inside-google-cloud/london-summit-2026-uk-leads-agentic-enterprise-ai-infrastructure-data-cloud/</link><description>&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;The United Kingdom, and London in particular, continues to be one of the great hubs for AI development in Europe and the world. We’re home to Google DeepMind, of course, as well as significant AI unicorns — and Google Cloud customers — like &lt;/span&gt;&lt;a href="https://www.googlecloudpresscorner.com/2026-06-16-Ineffable-Intelligence-Selects-Google-Cloud-To-Power-Its-Superintelligence-Mission" rel="noopener" target="_blank"&gt;&lt;strong style="text-decoration: underline; vertical-align: baseline;"&gt;Ineffable Intelligence&lt;/strong&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, which is today announcing an important partnership with us. &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;A year ago, we joined you for the London Summit to showcase &lt;/span&gt;&lt;a href="https://cloud.google.com/blog/topics/inside-google-cloud/london-summit-2025-gen-ai-agents-transforming-business-civil-service"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;the vast potential of generative AI&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, including a major investment in upskilling the UK civil service. Today, as we welcome our partners once again to the historic vaults of Tobacco Dock, that potential has become &lt;/span&gt;&lt;a href="https://cloud.google.com/transform/next-26-building-the-agentic-enterprise-industry-highlights"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;an industrial-scale reality&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;. In my conversations with leaders across both Whitehall and The City, the focus has moved from chatbots and media experiments to full-production execution. This is &lt;/span&gt;&lt;a href="https://cloud.google.com/blog/topics/google-cloud-next/welcome-to-google-cloud-next26"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;the moment of the agentic enterprise&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, where we shift from systems that simply chat with us to systems that can reason, plan, and execute multi-step workflows.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;This transition is the cornerstone of the UK’s projected &lt;/span&gt;&lt;a href="https://blog.google/company-news/inside-google/around-the-globe/google-europe/united-kingdom/ai-potential-uk/" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;£400 billion economic boost from AI&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; by 2030. At Google Cloud, we are the only provider offering &lt;/span&gt;&lt;a href="https://cloud.google.com/blog/products/compute/ai-infrastructure-at-next26"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;the full integrated stack&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; — custom silicon, frontier models, and planet-scale infrastructure — required to turn the Agentic Enterprise into a reality.&lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;The new frontier of British enterprise and research&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;The banking sector is a key proving ground for this shift. And &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;HSBC&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;, one of the largest and most important financial institutions in the world, is showing the way. Today, we’re &lt;/span&gt;&lt;a href="https://www.googlecloudpresscorner.com/2026-06-17-HSBC-AND-GOOGLE-CLOUD-ANNOUNCE-TRANSFORMATIVE-AI-BANKING-PARTNERSHIP" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;announcing&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; a multi-year transformational partnership with HSBC to accelerate AI adoption across HSBC’s products and services globally. &lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;This new collaboration will further accelerate the shift towards AI-enabled ways of working across HSBC’s global operations. &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;HSBC will work with Google Cloud and Google DeepMind engineering teams to collaborate on new AI-powered tools and programmes, with access to Google’s latest agentic AI capabilities – including Gemini models and the Gemini Enterprise Agent Platform. &lt;/span&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;The initial delivery focus on three areas: hyper‑personalised wealth management support, stronger financial crime risk management, and AI tools to enhance frontline/relationship manager client service&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;UK startups also continue to break new ground with technology, and AI in particular, as demonstrated by the work of frontier labs like &lt;/span&gt;&lt;a href="https://www.googlecloudpresscorner.com/2026-06-16-Ineffable-Intelligence-Selects-Google-Cloud-To-Power-Its-Superintelligence-Mission" rel="noopener" target="_blank"&gt;&lt;strong style="text-decoration: underline; vertical-align: baseline;"&gt;Ineffable Intelligence&lt;/strong&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;.&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; The company, which launched earlier this year, has chosen Google Cloud as its preferred cloud partner, utilizing Google’s full stack of AI-optimized hardware and tools to build and train Ineffable’s first generation of foundational models. &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Led by David Silver, a former Google DeepMind researcher who &lt;/span&gt;&lt;a href="https://deepmind.google/research/alphago/" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;was instrumental in the AlphaGo project&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, Ineffable Intelligence is taking a unique approach to AI development. The team are building systems that learn primarily through their own experience through &lt;/span&gt;&lt;a href="https://cloud.google.com/discover/what-is-reinforcement-learning?e=48754805"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;reinforcement learning&lt;/span&gt;&lt;/a&gt;&lt;strong style="vertical-align: baseline;"&gt;,&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; instead of relying on the large-scale human-generated datasets behind language models. The ambition is to create a “superlearner” that develops knowledge through trial and error. This year, Ineffable Intelligence set a record for a European seed funding round of $1.1 billion, and now Ineffable Intelligence will support its training work by deploying one of the largest clusters of A5X, powered by the NVIDIA Vera Rubin NVL72 platform on Google Cloud, delivering massive computational scale.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;To move from experimentation to true industrial production, businesses need more than just models; they need a roadmap. To help show them the way, we’re expanding our partnership with &lt;/span&gt;&lt;a href="https://www.googlecloudpresscorner.com/2026-06-17-Deloitte-and-Google-Cloud-Collaborate-to-Launch-London-AI-Studio-to-Spearhead-UKs-Transition-to-Agentic-AI" rel="noopener" target="_blank"&gt;&lt;strong style="text-decoration: underline; vertical-align: baseline;"&gt;Deloitte&lt;/strong&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, which will open a new AI Studio at its London campus. Developed in collaboration with Google Cloud, the studio will help British organisations move beyond AI experimentation to deploy autonomous, action-oriented AI systems at scale. &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Deloitte is also committing to upskill 1,000 members of its UK AI and data workforce on &lt;/span&gt;&lt;a href="https://cloud.google.com/gemini-enterprise?utm_source=google&amp;amp;utm_medium=cpc&amp;amp;utm_campaign=1713762-Gemini_Enterprise-DR-NA-US-en-Google-BKWS-EXA-GEnterprise&amp;amp;utm_content=c-Hybrid+%7C+BKWS+-+MIX+%7C+Txt_Gemini+Enterprise-189528400785&amp;amp;utm_term=gemini+enterprise&amp;amp;gclsrc=aw.ds&amp;amp;gad_source=1&amp;amp;gad_campaignid=23370621055&amp;amp;gclid=CjwKCAjwxb7RBhA5EiwAQ-AAdKh3HIPjJKRwMUI9Oxjo06q7orhp2vGKY396Yd4ENN8oULqQrQ2vkhoCAqQQAvD_BwE&amp;amp;e=48754805&amp;amp;hl=en"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Gemini Enterprise&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;. This certification program will ensure that Deloitte’s AI and data engineers’ are equipped with the technical expertise to implement Google’s most advanced agentic architecture, providing UK clients with one of the largest pools of certified AI talent in the region.&lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Building a future-ready public sector&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;The blueprint for a modern digital government requires moving away from rigid legacy contracts toward agile, AI-driven public services. In collaboration with the &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;Ministry of Housing, Communities and Local Government (MHCLG)&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;, the &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;i.AI &lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;incubator, Google Deepmind, and Faculty, we are delivering &lt;/span&gt;&lt;a href="https://blog.google/company-news/inside-google/around-the-globe/google-europe/united-kingdom/google-cloud-summit-london-2026" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;tangible public sector reform and tools for reinvention&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; that directly support the national goal to "get Britain building."&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Agencies like MHCLG are already using a tool called Extract which was built using Google technology to help transform planning processes by reducing document processing times from two hours to just two minutes. Simultaneously, we are supporting trials of an AI planning tool — co-created with local planning authorities in Barnet, Dorset, and Camden — which aims to cut decision times for everyday applications by 50%. Furthermore, &lt;/span&gt;&lt;a href="https://blog.google/company-news/inside-google/around-the-globe/google-europe/united-kingdom/uk-department-for-transport-accelerates-public-policy-insights-with-google-cloud-ai/" rel="noopener" target="_blank"&gt;&lt;strong style="text-decoration: underline; vertical-align: baseline;"&gt;the Department for Transport (DfT)&lt;/strong&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt; &lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;is utilizing Gemini to streamline public consultation analysis, a move projected to save £4 million annually.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Innovation on this scale also requires a secure, sovereign foundation. That is why Google Cloud is working to strengthen our UK data residency commitments, including measures like making Gemini 3.5 Flash, which features in-country AI processing, available by late June 2026 for sensitive sovereign use cases. We are giving British organizations the confidence to innovate within strict compliance boundaries.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;To help keep businesses safe from the challenges posed by bad actors using AI and other digital threats, we also recently announced a &lt;/span&gt;&lt;a href="https://cloud.google.com/blog/products/identity-security/detecting-and-containing-powered-threats-with-google-security-operations-agents"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;comprehensive AI-powered cybersecurity platform&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; — Google AI Threat Defense — which combines Wiz, Mandiant, Gemini &amp;amp; CodeMender to find, fix, and protect our customers from vulnerabilities.&lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Proven impact from the high street to public service&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Autonomous agents are no longer a future prospect; they are delivering value across the UK economy today. Our work with &lt;/span&gt;&lt;a href="https://www.googlecloudpresscorner.com/2026-06-17-THG-Ingenuity-Launches-AI-Shopping-Assistant-in-Collaboration-with-Google-Cloud,-Driving-8x-Higher-Conversions" rel="noopener" target="_blank"&gt;&lt;strong style="text-decoration: underline; vertical-align: baseline;"&gt;THG Ingenuity&lt;/strong&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;,&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; an ecommerce solutions provider, has delivered an 8x higher conversion rate via its AI Shopping Assistant. &lt;/span&gt;&lt;a href="https://www.starlingbank.com/news/starling-launches-pioneering-ai-banking-tool/" rel="noopener" target="_blank"&gt;&lt;strong style="text-decoration: underline; vertical-align: baseline;"&gt;Starling&lt;/strong&gt;&lt;/a&gt;&lt;strong style="vertical-align: baseline;"&gt; &lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;is similarly empowering customers with "spending intelligence" tools for instant habit analysis around purchases and expenses. And Rightmove, has launched a beta version of an AI-powered conversational property search, built with Google’s Gemini models, enabling users to search for homes in their own words.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;The breadth of this impact is visible across every sector: &lt;/span&gt;&lt;a href="https://www.youtube.com/watch?v=Txfm-3RZ1GQ&amp;amp;t=2s" rel="noopener" target="_blank"&gt;&lt;strong style="text-decoration: underline; vertical-align: baseline;"&gt;Kingfisher&lt;/strong&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; is pioneering retail-specific agentic applications; &lt;/span&gt;&lt;a href="https://www.googlecloudpresscorner.com/2026-03-25-Openreach-Taps-Google-Cloud-AI-to-Accelerate-High-Speed-Internet-Access-and-Cut-Carbon,1" rel="noopener" target="_blank"&gt;&lt;strong style="text-decoration: underline; vertical-align: baseline;"&gt;Openreach&lt;/strong&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; is driving field service optimization in telecommunications; andUnilever is using AI at scale across the entire value chain to drive growth and build desirable brands in the new era of consumer goods.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Meanwhile, &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;VMO2&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; is streamlining complex data operations; &lt;/span&gt;&lt;a href="https://www.googlecloudpresscorner.com/2024-10-08-Vodafone-and-Google-Deepen-Strategic-Partnership-with-Ten-Year,-Billion-Dollar-Deal-including-Cloud,-Cybersecurity-and-Devices-Across-Europe-and-Africa" rel="noopener" target="_blank"&gt;&lt;strong style="text-decoration: underline; vertical-align: baseline;"&gt;Vodafone&lt;/strong&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; is executing a $1 billion partnership to redefine network performance; and &lt;/span&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;WPP is integrating Gemini across creative workflows, whether that's generating high-fidelity campaign assets at speed and scale, powering AI agents, or training &lt;/span&gt;&lt;a href="https://cloud.google.com/blog/products/infrastructure/wpp-humanoid-robots-ai-training?e=48754805"&gt;&lt;span style="font-style: italic; text-decoration: underline; vertical-align: baseline;"&gt;robotic camera operators&lt;/span&gt;&lt;/a&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;.&lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;span style="vertical-align: baseline;"&gt;Empowering the engine of growth for small to medium businesses and startups &lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;The true measure of Britain’s AI success &lt;/span&gt;&lt;a href="https://cloud.google.com/topics/startups/london-summit-2026-smb-sme-ai-innovation"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;lies in its small and medium enterprises&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; and startup ecosystem. Our AI Works research highlights a pivotal moment: AI has the potential to boost productivity for small and medium enterprises by 20% and unlock £198 billion in output for the UK economy. With 56% of smaller firms already seeking guidance, we have launched the &lt;/span&gt;&lt;a href="https://about.google/intl/ALL_uk/around-the-globe/local-info/" rel="noopener" target="_blank"&gt;&lt;strong style="text-decoration: underline; vertical-align: baseline;"&gt;AI Works for Britain&lt;/strong&gt;&lt;/a&gt;&lt;strong style="vertical-align: baseline;"&gt; upskilling&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; initiative to ensure no business is left behind.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;We also continue to foster the next generation of British unicorn startups through &lt;/span&gt;&lt;a href="https://technation.io/london-ai-hub-partnership-withhttps://technation.io/london-ai-hub-partnership-with-google-cloud/-google-cloud/" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;our ongoing partnership with Tech Nation&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; at the London AI Hub. This sustained commitment ensures founders have the resources and community needed to scale, and this September, we will further this mission by hosting the&lt;/span&gt;&lt;a href="https://startup.google.com/programs/gemini-startup-forum/cyber-security/" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt; Gemini Startup Forum: Cybersecurity&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; in London to help startups build secure-by-design AI applications. &lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;span style="vertical-align: baseline;"&gt;The Model Garden&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt; at &lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;Platform 37&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Our belief in the UK’s potential is reflected in our physical footprint, too. We are continuing to invest in the UK's digital infrastructure to support growing demand: Our state-of-the-art data center in Waltham Cross launched in September 2025, a key part of our two-year, £5 billion investment to help power the UK's AI economy. And earlier this year, we opened our new&lt;/span&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt; &lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;office in London in Kings Cross, &lt;/span&gt;&lt;a href="https://blog.google/company-news/inside-google/around-the-globe/google-europe/united-kingdom/platform-37-the-ai-exchange/" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Platform 37&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, along with plans for The AI Exchange, a new public space dedicated to deepening understanding of AI. &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Building on this momentum, we are excited to introduce &lt;/span&gt;&lt;a href="https://www.googlecloudpresscorner.com/2026-06-17-Google-Clouds-Model-Garden-at-Platform-37-An-Exclusive-Customer-Hub-for-AI-Innovation-and-Collaboration" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;The Model Garden at Platform 37,&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; launching in the fourth quarter of 2026. This London-based hub is far more than a physical space; it serves as a strategic investment designed to fundamentally elevate how we engage with our most important customers. Blending the timeless aesthetics of a classic English garden with immersive, high-tech innovation — from living digital walls to a three-story atrium — The Model Garden acts as a physical marketplace for our best ideas. &lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;span style="vertical-align: baseline;"&gt;The blueprint for the agentic enterprise&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;For UK businesses, civic leaders, and organizations to continue to lead in the AI moment, they must not only rethink the technology they use but also fundamental aspects of how we work. As we support thousands of organizations and millions of teams here and around the globe, we see three core strategies helping achieve success with AI:&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Culture:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; We must reimagine our organizations for the future. True transformation means getting teams excited, enabled, and equipped to work with AI agents in completely new ways. It is about human-AI collaboration, not just automation.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Responsibility:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; We must build with safety and security in mind from day one. Protecting your users, your customers, and your brand is paramount. Our frontier models are built on a foundation of rigorous AI principles and secure-by-design infrastructure.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Sustainability:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; In an era of rising compute demands, we must scale in a way that is both financially viable and positive for our planet. At Google, we are committed to carbon-free energy 24/7, ensuring that the UK’s AI growth does not come at the cost of our climate goals.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;&lt;span style="vertical-align: baseline;"&gt;Architecting the future together&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Google Cloud is the primary partner for the UK’s agentic transition. We are moving beyond the hype of experimentation into the rigor of production. From the research labs of King's Cross to the diverse enterprises powering the high street, we are architecting a resilient, sovereign, and prosperous future for the United Kingdom. &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Thank you to everyone who’s joining us in London — yesterday, today, and into the future. This year we’ve packaged up an &lt;/span&gt;&lt;a href="https://www.googlecloudevents.com/london-summit?utm_content=online_blog&amp;amp;utm_source=cloud_sfdc&amp;amp;utm_medium=blog&amp;amp;utm_campaign=FY26-Q2-EMEA-EME39630-physicalevent-er-London-Summitmc-168582" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;exclusive on-demand experience&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, allowing you to stream the defining London Summit moments, available anywhere, anytime.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;</description><pubDate>Wed, 17 Jun 2026 08:00:00 +0000</pubDate><guid>https://cloud.google.com/blog/topics/inside-google-cloud/london-summit-2026-uk-leads-agentic-enterprise-ai-infrastructure-data-cloud/</guid><category>AI &amp; Machine Learning</category><category>Data Analytics</category><category>Security &amp; Identity</category><category>Sustainability</category><category>Customers</category><category>Partners</category><category>Startups</category><category>Inside Google Cloud</category><media:content height="540" url="https://storage.googleapis.com/gweb-cloudblog-publish/images/image1_LmjIDy5.max-600x600.jpg" width="540"></media:content><og xmlns:og="http://ogp.me/ns#"><type>article</type><title>From AI potential to agentic reality: Driving the UK’s next chapter</title><description></description><image>https://storage.googleapis.com/gweb-cloudblog-publish/images/image1_LmjIDy5.max-600x600.jpg</image><site_name>Google</site_name><url>https://cloud.google.com/blog/topics/inside-google-cloud/london-summit-2026-uk-leads-agentic-enterprise-ai-infrastructure-data-cloud/</url></og><author xmlns:author="http://www.w3.org/2005/Atom"><name>Maureen Costello</name><title>Vice President, UK, Ireland &amp; Sub-Saharan Africa</title><department></department><company></company></author></item><item><title>How Siemens "slices the elephant," advancing agentic workflows for industrial software development</title><link>https://cloud.google.com/blog/products/ai-machine-learning/how-siemens-sliced-the-elephant-modernizing-legacy-code-with-agentic-workflows/</link><description>&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;For technology companies like Siemens, software is the nervous system of factories, energy grids, and transportation networks worldwide.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;As a global leader in industrial AI, industrial software, and industrial automation, Siemens brings decades of domain expertise across factory and process automation, energy infrastructure, and intelligent transportation — expertise that no off-the-shelf AI solution can replicate. But innovation carries a heavy anchor: legacy code. &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;With codebases spanning hundreds of millions of lines developed for over more than a decade, Siemens faced a challenge that standard AI tools couldn't solve: understanding and modernizing this code and the applications which run on it. &lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;The scale and depth of industrial-grade software demand a fundamentally different approach. Existing coding assistants lacked the contextual depth required to navigate complex, multi-layered industrial codebases — a gap Siemens set out to close.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;To solve this, Siemens and Google Cloud created Knowledge Fabric&lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;, &lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;an AI system for automating the software development lifecycle. It was built using knowledge graphs on Spanner Graph, the Google Agent Development Kit, Gemini API,  Agent Platform, Gemini CLI, and Anthropic Claude Code. In a pilot migrating existing frontiers to web-based interfaces, Knowledge Fabric reduced implementation effort, freeing engineers to focus on customer innovations while maintaining full system compatibility.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;“By ingesting the entire software ecosystem into an intelligent agentic system equipped with custom knowledge graphs, we aren’t just helping developers optimize their development time; we are enabling autonomous agents to reason across the past to build the future,” said &lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;Franz Menzl, senior vice president, product creation excellence at Siemens.&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt; “This is about freeing engineers from repetitive work so they can focus on higher-value problem solving.”&lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;The challenge: the complexity of industrial software&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Modernizing large-scale industrial-grade software systems&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt; is often compared to rebuilding a jet while flying it. For Siemens, the challenge had four dimensions:&lt;/span&gt;&lt;/p&gt;
&lt;ol&gt;
&lt;li role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Scale:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; The repositories are massive — far exceeding the context windows of standard large language models.&lt;/span&gt;&lt;/li&gt;
&lt;li role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Fragmentation:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Critical knowledge was scattered across code, Jira tickets, Confluence pages, and scanned PDF manuals from the early 2000s.&lt;/span&gt;&lt;/li&gt;
&lt;li role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Complexity:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Tracing the link between a specific line of code and a functional requirement document from 10 years ago presented a challenge that no manual or conventional tooling approach could address efficiently. It’s a reality shared across the industry.&lt;/span&gt;&lt;/li&gt;
&lt;li role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Responsibility:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Systems must adhere to strict quality, compliance, and lifecycle requirements, often over 15 to 20 years of operation. AI‑generated outputs must therefore be explainable, traceable, and verifiable. Hallucinated or unvalidated changes are not merely inefficient but operationally unacceptable.&lt;/span&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;"We realized that standard RAG (retrieval-augmented generation) wasn't enough," said Agata Gołębiowska, technical lead, Google Cloud. "Code isn't just text; it has inherent structure. A class belongs to a file, which belongs to a module. Flattening that into a vector database meant losing the representation of relationships elements of the codebase."&lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;The solution: &lt;/strong&gt;&lt;strong style="vertical-align: baseline;"&gt;A domain-aware Knowledge Fabric&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;To make this sprawling software environment navigable for AI-driven workflows, the teams built the Knowledge Fabric agent. This agent goes beyond keyword matching to “understand” the relationships between assets.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;We use Spanner Graph to model the inherent structure of the codebase, applying the same rigor to documentation across formats. By mapping connections between these domains, we can link specific code snippets directly to requirements in a design document. Agents then traverse this graph, using tools to query the structure via &lt;/span&gt;&lt;a href="https://docs.cloud.google.com/spanner/docs/reference/standard-sql/graph-intro"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Graph Query Language (GQL)&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;But GQL is only one piece. To enable semantic understanding, we generate embeddings for every node, using Spanner's &lt;/span&gt;&lt;a href="https://docs.cloud.google.com/spanner/docs/find-approximate-nearest-neighbors"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Approximate Nearest Neighbors (ANN)&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; algorithm to perform efficient vector search across the full codebase. Finally, we give agents &lt;/span&gt;&lt;a href="https://cloud.google.com/blog/products/databases/spanner-graph-full-text-search?e=0"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;full-text search&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; capabilities, which can be combined with GQL to pinpoint nodes and edges with precision.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-image_full_width"&gt;






  
    &lt;div class="article-module h-c-page"&gt;
      &lt;div class="h-c-grid"&gt;
  

    &lt;figure class="article-image--large
      
      
        h-c-grid__col
        h-c-grid__col--6 h-c-grid__col--offset-3
        
        
      "
      &gt;

      
      
        
        &lt;img
            src="https://storage.googleapis.com/gweb-cloudblog-publish/images/2-diagram.max-1000x1000.png"
        
          alt="2-diagram"&gt;
        
        &lt;/a&gt;
      
    &lt;/figure&gt;

  
      &lt;/div&gt;
    &lt;/div&gt;
  




&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Combining these three methods lets an LLM agent answer complex queries, such as: &lt;/span&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;"Which functions need to be updated if I change the logic in the Axis Control Panel?"&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt; The system traverses the graph — weighing keyword and semantic similarity — to identify dependencies, retrieve relevant documentation, and present a precise impact analysis.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;This precise context is what lets a coding agent produce a valid, usable, and maintainable implementation.&lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;"Slicing the elephant:" the agentic workflow&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;A key insight from the project was that AI agents struggle with massive, ambiguous tasks. To succeed, the team adopted a design pattern dubbed "slicing the elephant."&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;The system breaks a sweeping request like “refactor this module” into smaller, more manageable tasks, each handled by a specialized agent built with the Google Agent Development Kit (ADK):&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Search agent:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Acts as a deep-research specialist. It uses tools to explore the code graph and cross-reference findings with documentation in &lt;/span&gt;&lt;a href="https://cloud.google.com/products/gemini-enterprise-agent-platform/agent-search?e=48754805"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Agent Search&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;.&lt;/span&gt;&lt;/li&gt;
&lt;li role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;User story agent:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Interviews the product owner to gather requirements, then drafts detailed user stories with acceptance criteria linked to existing system contexts.&lt;/span&gt;&lt;/li&gt;
&lt;li role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Architecture impact agent:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Analyzes proposed changes against the graph to predict side effects before a single line of code is written.&lt;/span&gt;&lt;/li&gt;
&lt;li role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Task breakdown agent: &lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;Consumes the analysis from the architecture impact agent and breaks the work into small, manageable tasks, each carrying all the context relevant to a specific change.&lt;/span&gt;&lt;/li&gt;
&lt;li role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Coding agent: &lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;Implements the change described in a specific task. Reaching this step without context and prior analysis  produces unusable code.&lt;/span&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;The system keeps a human in the loop at every step, which ensures reliable, production‑grade outcomes and keeps engineers focused on meaningful work rather than routine implementation.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;"By slicing the elephant — breaking complex refactoring jobs into smaller, agent-led tasks — we observed a significant productivity increase," said Alexander Lomakin, project lead at Siemens. "We essentially gave the AI the roadmap it needed to navigate the complexity."&lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Pilot results: Faster, more efficient engineering&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Developers saw results almost immediately.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Analyzing dependencies for a new feature once required senior engineers to spend several days navigating codebases and legacy documentation. With the Knowledge Fabric, the same work now takes far less time.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;In a recent production pilot migrating legacy control panels to modern web‑based interfaces, the Knowledge Fabric reduced overall coding effort while preserving system integrity and industrial quality standards. &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Engineers now spend more time creating customer value and less on repetitive work.&lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Get started&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;The Knowledge Fabric shows that generative AI can do more than write boilerplate code, it can also help teams modernize the legacy systems their businesses depend on most.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;To learn more about building graph-based agents for your own legacy modernization:&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;Read about &lt;/span&gt;&lt;a href="https://cloud.google.com/blog/products/data-analytics/the-unified-graph-solution-with-spanner-graph-and-bigquery-graph"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Spanner Graph&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;.&lt;/span&gt;&lt;/li&gt;
&lt;li role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;Explore &lt;/span&gt;&lt;a href="https://cloud.google.com/blog/products/ai-machine-learning/introducing-gemini-enterprise-agent-platform"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Agent Platform&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; and find pre-built &lt;/span&gt;&lt;a href="https://x.com/GoogleCloudTech/status/2048066787233943773" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;production-grade agents&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; on &lt;/span&gt;&lt;a href="https://docs.cloud.google.com/gemini-enterprise-agent-platform/build/agent-garden"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Agent Garden&lt;/span&gt;&lt;/a&gt;&lt;/li&gt;
&lt;li role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;Check out the &lt;/span&gt;&lt;a href="https://adk.dev/" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Agent Development Kit&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;.&lt;/span&gt;&lt;/li&gt;
&lt;li role="presentation"&gt;&lt;a href="https://www.siemens.com/en-us/company/artificial-intelligence/industrial-ai/" rel="noopener" target="_blank"&gt;&lt;span style="vertical-align: baseline;"&gt;Read more&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; on how Siemens is advancing industrial AI.&lt;/span&gt;&lt;/li&gt;
&lt;/ul&gt;&lt;/div&gt;</description><pubDate>Tue, 16 Jun 2026 14:00:00 +0000</pubDate><guid>https://cloud.google.com/blog/products/ai-machine-learning/how-siemens-sliced-the-elephant-modernizing-legacy-code-with-agentic-workflows/</guid><category>Customers</category><category>Data Analytics</category><category>Manufacturing</category><category>AI &amp; Machine Learning</category><media:content height="540" url="https://storage.googleapis.com/gweb-cloudblog-publish/images/siemens-alphaevolve-generative-evolved-codeb.max-600x600.jpg" width="540"></media:content><og xmlns:og="http://ogp.me/ns#"><type>article</type><title>How Siemens "slices the elephant," advancing agentic workflows for industrial software development</title><description></description><image>https://storage.googleapis.com/gweb-cloudblog-publish/images/siemens-alphaevolve-generative-evolved-codeb.max-600x600.jpg</image><site_name>Google</site_name><url>https://cloud.google.com/blog/products/ai-machine-learning/how-siemens-sliced-the-elephant-modernizing-legacy-code-with-agentic-workflows/</url></og><author xmlns:author="http://www.w3.org/2005/Atom"><name>Anant Nawalgaria</name><title>Group Product Manager &amp; AI engineer, Google</title><department></department><company></company></author><author xmlns:author="http://www.w3.org/2005/Atom"><name>Tomasz Świtoń</name><title>Senior AI Engineer, Google</title><department></department><company></company></author></item><item><title>What’s new in data agents: Supercharging your AI workflows</title><link>https://cloud.google.com/blog/products/data-analytics/new-data-agents-across-the-agentic-data-cloud/</link><description>&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;The rise of AI agents is fundamentally disrupting applications and analytical systems. Generic AI platforms don't usually have access to the context stored within enterprise databases. This is because traditional data architectures often lack context for agents across the data estate, which can lead to agents being inaccurate. They’re also prone to security gaps due to a lack of granular access controls. &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Google’s Agentic Data Cloud is an AI-native system of action that includes both operational and analytical systems. By infusing AI across the entire stack — from custom silicon to frontier Gemini models — we provide a deterministic, template-driven developer framework that allows agents to ground their reasoning in real-time enterprise data with near-100% accuracy, as well as unified governance.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Today, we’re making it easier to develop agents, with a whole host of new data agents and tools: for business analysts within Conversational Analytics; for data scientists, engineers, and database admins with a series of Google-built Data Agents that provide greater automation and intelligence; and finally, for developers, with Data Agent tools that help you better integrate with today’s open agentic ecosystem.&lt;/span&gt;&lt;/p&gt;
&lt;h3 role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;1. Conversational Analytics&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;To support developers building agents using natural language, we’re announcing expanded support for Conversational Analytics across Data Cloud.&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;a href="https://docs.cloud.google.com/bigquery/docs/conversational-analytics"&gt;&lt;strong style="text-decoration: underline; vertical-align: baseline;"&gt;Conversational Analytics in BigQuery&lt;/strong&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; integrates a sophisticated AI reasoning engine directly into BigQuery Studio, helping data and business teams go beyond writing manual SQL, leveraging business context to ground answers using multimodal synthesis and deep-dive research. Agentic workflows, in preview for select customers, automate root-cause analysis, and schedule actions — turning enterprise data into proactive, actionable intelligence. &lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;&lt;/div&gt;
&lt;div class="block-image_full_width"&gt;






  
    &lt;div class="article-module h-c-page"&gt;
      &lt;div class="h-c-grid"&gt;
  

    &lt;figure class="article-image--large
      
      
        h-c-grid__col
        h-c-grid__col--6 h-c-grid__col--offset-3
        
        
      "
      &gt;

      
      
        
        &lt;img
            src="https://storage.googleapis.com/gweb-cloudblog-publish/original_images/1_M5Wjn2O.gif"
        
          alt="1"&gt;
        
        &lt;/a&gt;
      
        &lt;figcaption class="article-image__caption "&gt;&lt;p data-block-key="jtzzw"&gt;Create agents for faster data insights with Conversational Analytics in BigQuery&lt;/p&gt;&lt;/figcaption&gt;
      
    &lt;/figure&gt;

  
      &lt;/div&gt;
    &lt;/div&gt;
  




&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Conversational Analytics in Lakehouse&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;, &lt;/span&gt;&lt;a href="https://docs.cloud.google.com/lakehouse/docs/conversational-analytics"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;now in preview&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, extends the &lt;/span&gt;&lt;a href="https://docs.cloud.google.com/lakehouse/docs"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Lakehouse&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; unified infrastructure, so users can query distributed data lakes across AWS, Azure, and Google Cloud using natural language. This makes it possible to combine insights across cloud platforms without moving a single byte of data. &lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Conversational Analytics in &lt;/strong&gt;&lt;a href="https://docs.cloud.google.com/gemini/data-agents/conversational-analytics/alloydb"&gt;&lt;strong style="text-decoration: underline; vertical-align: baseline;"&gt;AlloyDB&lt;/strong&gt;&lt;/a&gt;&lt;strong style="vertical-align: baseline;"&gt;, &lt;/strong&gt;&lt;a href="https://docs.cloud.google.com/gemini/data-agents/conversational-analytics/spanner"&gt;&lt;strong style="text-decoration: underline; vertical-align: baseline;"&gt;Spanner&lt;/strong&gt;&lt;/a&gt;&lt;strong style="vertical-align: baseline;"&gt;, and &lt;/strong&gt;&lt;a href="https://docs.cloud.google.com/gemini/data-agents/conversational-analytics/sql-postgres"&gt;&lt;strong style="text-decoration: underline; vertical-align: baseline;"&gt;Cloud SQL&lt;/strong&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, now in preview, supports out-of-the-box conversational AI, making data accessible for everyone. AlloyDB, Spanner, and Cloud SQL users can start natural-language conversations with their databases to gain visibility into their real-time operational data and capture analytical insights.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;&lt;/div&gt;
&lt;div class="block-image_full_width"&gt;






  
    &lt;div class="article-module h-c-page"&gt;
      &lt;div class="h-c-grid"&gt;
  

    &lt;figure class="article-image--large
      
      
        h-c-grid__col
        h-c-grid__col--6 h-c-grid__col--offset-3
        
        
      "
      &gt;

      
      
        
        &lt;img
            src="https://storage.googleapis.com/gweb-cloudblog-publish/images/2_YqI8Fra.max-1000x1000.png"
        
          alt="2"&gt;
        
        &lt;/a&gt;
      
        &lt;figcaption class="article-image__caption "&gt;&lt;p data-block-key="jtzzw"&gt;Use Conversational Analytics to get answers from your operational data&lt;/p&gt;&lt;/figcaption&gt;
      
    &lt;/figure&gt;

  
      &lt;/div&gt;
    &lt;/div&gt;
  




&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;ul&gt;
&lt;li&gt;&lt;strong style="vertical-align: baseline;"&gt;Looker Embedded Conversational Analytics&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;, &lt;/span&gt;&lt;a href="https://cloud.google.com/blog/products/business-intelligence/looker-embedded-adds-conversational-analytics"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;now generally available&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, allows you to embed &lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;agents directly into your custom applications and internal workflows via a low-code iframe implementation, making it easier to ship production-ready, conversational AI within any application. Additionally, with the&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt; &lt;/span&gt;&lt;a href="https://docs.cloud.google.com/looker/docs/reference/looker-api/latest/methods/ConversationalAnalytics"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Conversational Analytics API in Looker&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, &lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;you can create multi-turn conversational workflows that offer AI-powered recommendations, while also verifying and explaining the underlying SQL query. We are also significantly upgrading Looker’s core&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt; &lt;/span&gt;&lt;a href="https://cloud.google.com/blog/products/business-intelligence/looker-conversational-analytics-now-ga/?e=48754805"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Conversational Analytics agent&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, &lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;which is already GA, with superior reasoning and semantic grounding, helping to eliminate ambiguity.&lt;/span&gt;&lt;/li&gt;
&lt;/ul&gt;&lt;/div&gt;
&lt;div class="block-image_full_width"&gt;






  
    &lt;div class="article-module h-c-page"&gt;
      &lt;div class="h-c-grid"&gt;
  

    &lt;figure class="article-image--large
      
      
        h-c-grid__col
        h-c-grid__col--6 h-c-grid__col--offset-3
        
        
      "
      &gt;

      
      
        
        &lt;img
            src="https://storage.googleapis.com/gweb-cloudblog-publish/original_images/3_vDitSbe.gif"
        
          alt="3"&gt;
        
        &lt;/a&gt;
      
        &lt;figcaption class="article-image__caption "&gt;&lt;p data-block-key="jtzzw"&gt;Embed agents directly into your applications for conversational AI&lt;/p&gt;&lt;/figcaption&gt;
      
    &lt;/figure&gt;

  
      &lt;/div&gt;
    &lt;/div&gt;
  




&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;2. New data agents&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;To help data professionals move from reactive data management to proactive intelligence, and business analysts better interact with their dashboards, we’re announcing a new set of data agents that bring automation, intelligence, and natural language capabilities into their daily workflows. &lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Data Engineering Agent, &lt;/strong&gt;&lt;a href="https://docs.cloud.google.com/bigquery/docs/data-engineering-agent-pipelines"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;now generally available&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, automates the heavy lifting of building and maintaining data pipelines. It transforms natural language requirements into optimized SQL or Python code for BigQuery and Dataflow, while proactively identifying and fixing pipeline breaks. By suggesting schema improvements and partitioning strategies, it ensures your data foundation is scalable, reliable, and performance-tuned without manual trial and error.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;&lt;a href="https://docs.cloud.google.com/bigquery/docs/colab-data-science-agent"&gt;&lt;strong style="text-decoration: underline; vertical-align: baseline;"&gt;Data Science Agent&lt;/strong&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; accelerates the path from raw data to production-ready models. It assists data scientists by suggesting relevant features, generating boilerplate notebook code, and automating the technical documentation process. &lt;/span&gt; &lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Database Observability Agent&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;, in preview with select Cloud SQL, AlloyDB, Spanner, and Bigtable customers, proactively monitors database performance and continuously identifies potential issues before they escalate. It then delivers intelligent recommendations and multi-turn remediation workflows for fast, comprehensive troubleshooting and optimization. It provides performance analytics for the entire database fleet, helping you quickly identify performance optimization opportunities across databases.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Database Onboarding Agent&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;, in preview with select customers, takes the guesswork out of database selection and deployment. By evaluating your stated requirements — from simple use case descriptions, to complex enterprise needs — it recommends the best Google Cloud database and guides you through provisioning.&lt;/span&gt;&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Looker Dashboard Agent&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;, &lt;/span&gt;&lt;a href="https://docs.cloud.google.com/looker/docs/conversational-analytics-looker-data-agents-dashboards"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;now in preview,&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; enables conversational interaction with data within dashboards. Users can ask natural language questions and receive context-aware answers within the dashboard. This feature also provides AI-generated summaries that highlight key takeaways and insights from the dashboard. &lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Conversational Analytics in Gemini Enterprise, &lt;/strong&gt;&lt;a href="https://docs.cloud.google.com/bigquery/docs/create-data-agents#publish-agent-gemini-enterprise"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;now in preview&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; for Looker, BigQuery, and Lakehouse, brings governed intelligence built by data practitioners directly to business leaders. It serves as a "front door" to the Google Data Cloud, allowing business users to consume agents built in BigQuery, Looker, or Lakehouse without needing to access technical consoles. By publishing these agents from Google Data to Gemini Enterprise, organizations provide a single, grounded interface for precision data exploration and immediate answers to the business users. &lt;/span&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Deep Research Agent&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;, &lt;/span&gt;&lt;a href="https://ai.google.dev/gemini-api/docs/deep-research" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;now in preview&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, uses the Knowledge Catalog to solve high-stakes, multi-layered business problems. It moves beyond simple search to build comprehensive research plans that synthesize intelligence from internal documents, BigQuery tables, and the public web. The result is a detailed report with dynamic visualizations and verifiable citations, that respect enterprise privacy and user permissions all the while. &lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;&lt;span style="vertical-align: baseline;"&gt;3. Tools for data agents &lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Open-source standards for agentic development provide developers building AI applications and custom agents with a unified framework to access data and tools consistently and securely. Today, we are announcing the following tools to help ground your agentic development initiatives:&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Data Agent Kit: &lt;/strong&gt;&lt;a href="https://docs.cloud.google.com/data-cloud-extension"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;now in preview&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, provides a standardized suite of skills and tools directly within preferred developer environments (IDE/CLI), empowering data practitioners to discover, transform, and action data at scale using the prescriptive guidance from the Agentic Data Cloud capabilities.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Managed MCP Servers for Databases, &lt;/strong&gt;&lt;a href="https://docs.cloud.google.com/mcp/manage-mcp-servers"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;now generally available&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; for AlloyDB, Spanner, Cloud SQL, Bigtable, and Firestore, fully manages the infrastructure required to connect AI models securely to your data, so you don’t have to host, secure, or scale MCP servers yourself. Now, developers can provide their agents with up-to-date context from across our database portfolio, so that your AI models can reason and act upon your most up-to-date enterprise data.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Managed MCP Server for Looker&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;, &lt;/span&gt;&lt;a href="https://docs.cloud.google.com/looker/docs/mcp"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;now in preview&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, allows any MCP client or agent platform to query Looker's semantic models, extending governed BI insights across third-party applications.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;&lt;/div&gt;
&lt;div class="block-image_full_width"&gt;






  
    &lt;div class="article-module h-c-page"&gt;
      &lt;div class="h-c-grid"&gt;
  

    &lt;figure class="article-image--large
      
      
        h-c-grid__col
        h-c-grid__col--6 h-c-grid__col--offset-3
        
        
      "
      &gt;

      
      
        
        &lt;img
            src="https://storage.googleapis.com/gweb-cloudblog-publish/images/4_rcQ0IiI.max-1000x1000.png"
        
          alt="4"&gt;
        
        &lt;/a&gt;
      
        &lt;figcaption class="article-image__caption "&gt;&lt;p data-block-key="jtzzw"&gt;Access Looker semantic models through Managed MCP Server&lt;/p&gt;&lt;/figcaption&gt;
      
    &lt;/figure&gt;

  
      &lt;/div&gt;
    &lt;/div&gt;
  




&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;MCP Toolbox for Databases 1.0, &lt;/strong&gt;&lt;a href="https://github.com/googleapis/mcp-toolbox" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;now generally available&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, has achieved a major stability milestone, giving you the confidence to build production applications. We also overhauled the documentation, making the platform significantly more approachable for both human developers and autonomous agents.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;QueryData for &lt;/strong&gt;&lt;a href="https://docs.cloud.google.com/sql/docs/postgres/data-agent-overview"&gt;&lt;strong style="text-decoration: underline; vertical-align: baseline;"&gt;Cloud SQL&lt;/strong&gt;&lt;/a&gt;&lt;strong style="vertical-align: baseline;"&gt;, &lt;/strong&gt;&lt;a href="https://docs.cloud.google.com/alloydb/docs/ai/data-agent-overview"&gt;&lt;strong style="text-decoration: underline; vertical-align: baseline;"&gt;AlloyDB&lt;/strong&gt;&lt;/a&gt;&lt;strong style="vertical-align: baseline;"&gt;, and &lt;/strong&gt;&lt;a href="https://docs.cloud.google.com/spanner/docs/data-agent-overview"&gt;&lt;strong style="text-decoration: underline; vertical-align: baseline;"&gt;Spanner&lt;/strong&gt;&lt;/a&gt;&lt;strong style="vertical-align: baseline;"&gt;,&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; now in preview, turns natural language questions into database queries. It’s built natively into these databases, and provides near-100% accuracy for natural language to SQL conversions through metadata, query examples, and evals. &lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Universal Commerce Protocol (UCP) Analytics powered by BigQuery&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;, now in preview, enables merchants and developers to stream real-time events from UCP directly into BigQuery (see &lt;/span&gt;&lt;a href="https://github.com/GoogleCloudPlatform/data-agent-kit/tree/main/ucp-analytics" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;sample&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;). This &lt;/span&gt;&lt;a href="https://developers.google.com/merchant/ucp/guides/bq-storage" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;integration&lt;/span&gt;&lt;/a&gt; &lt;span style="vertical-align: baseline;"&gt;provides out-of-the-box observability for agentic commerce, allowing teams to monitor conversion funnels, track automated checkout performance, and identify system errors. By standardizing these metrics within BigQuery, businesses can bridge the gap between AI-driven transactions and existing business intelligence workflows. &lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Details on how to access the new agents and tools can be found from each of the documentation links on this page. Data agents are also available through Gemini Enterprise and the Google Cloud console. &lt;/span&gt;&lt;/p&gt;&lt;/div&gt;</description><pubDate>Mon, 15 Jun 2026 17:00:00 +0000</pubDate><guid>https://cloud.google.com/blog/products/data-analytics/new-data-agents-across-the-agentic-data-cloud/</guid><category>Databases</category><category>Business Intelligence</category><category>Google Cloud Next</category><category>Data Analytics</category><og xmlns:og="http://ogp.me/ns#"><type>article</type><title>What’s new in data agents: Supercharging your AI workflows</title><description></description><site_name>Google</site_name><url>https://cloud.google.com/blog/products/data-analytics/new-data-agents-across-the-agentic-data-cloud/</url></og><author xmlns:author="http://www.w3.org/2005/Atom"><name>Sean Rhee</name><title>Product Management, Google Cloud</title><department></department><company></company></author><author xmlns:author="http://www.w3.org/2005/Atom"><name>Geeta Banda</name><title>Head of Outbound Product Management, Google Cloud</title><department></department><company></company></author></item><item><title>Introducing the Open Knowledge Format</title><link>https://cloud.google.com/blog/products/data-analytics/how-the-open-knowledge-format-can-improve-data-sharing/</link><description>&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;As foundation models continue to improve, the lack of relevant context often limits what they can do, especially as they are used to build agentic systems. While these models can help you write code, summarize documents, or analyze a dataset, they still need the right information to produce accurate and actionable results. &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;That’s why today, we’re introducing the Open Knowledge Format (OKF), an open specification that formalizes the &lt;a href="https://gist.github.com/karpathy/442a6bf555914893e9891c11519de94f" rel="noopener" target="_blank"&gt;LLM-wiki&lt;/a&gt; pattern into a portable, interoperable format. This is a vendor-neutral, agent- and human-friendly standard for representing the metadata, context, and curated knowledge that modern AI systems need.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;As published, &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;OKF v0.1&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; represents knowledge as a directory of markdown files with YAML frontmatter, with a small set of agreed-upon conventions that let wikis written by different producers be consumed by different agents without translation.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;That's it. No complex compression scheme, no new runtime, no required SDK. A bundle of OKF documents is:&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Just markdown&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; — readable in any editor, renderable on GitHub, indexable by any search tool&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Just files&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; — shippable as a tarball, hostable in any git repo, mountable on any filesystem&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Just YAML frontmatter&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; — for the small set of structured fields that need to be queryable: &lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;type&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;, &lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;title&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;, &lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;description&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;, &lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;resource&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;, &lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;tags&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;, and &lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;timestamp&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;If you've used Obsidian, Notion, Hugo, or any of the LLM wiki patterns that have emerged over the past year, the shape will feel familiar. OKF formalizes the small set of conventions needed to make these patterns interoperable.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Let’s take a look at the problem that OKF can solve for your organization, how it works, how to get started with it, and what’s next.&lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;span style="vertical-align: baseline;"&gt;A fragmented context landscape&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;In most organizations, the information that foundation models use is overwhelmingly internal knowledge: the schema of a table, your business’ meaning of a metric, the runbook for an incident, the join paths between two systems, the deprecation notice for an old API, etc.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Today, these atoms of knowledge live in a variety of highly fragmented systems:&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;Metadata catalogs with their own APIs&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;Wikis, third-party systems, or in shared drives&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;Code comments, docstrings, or notebook cells&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;The heads of a few senior engineers&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;When an AI agent needs to answer &lt;/span&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;"How do I compute weekly active users from our event stream?"&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt; it has to assemble the answer from these scattered, mutually incompatible surfaces. Every vendor offers its own catalog, its own SDK, its own knowledge-graph schema, and none of the knowledge is easily portable across products or organizations.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;The result: Every agent builder is solving the same context-assembly problem from scratch, every catalog vendor is reinventing the same data models, and the knowledge itself is locked behind whichever surface created it.&lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;span style="vertical-align: baseline;"&gt;Knowledge as a living wiki&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Developer teams are changing how they build AI agents. Instead of using models to search the same documents for the same facts over and over, you can give your agents a shared markdown library that grows more useful over time. This lets your agents take on the drudgery of reading and updating their own files, while your team curates the content and manages it like code. &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Andrej Karpathy, the prominent AI researcher and educator, articulates this idea most crisply in his &lt;/span&gt;&lt;a href="https://gist.github.com/karpathy/442a6bf555914893e9891c11519de94f" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;LLM Wiki gist&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;. "LLMs don't get bored, don't forget to update a cross-reference, and can touch 15 files in one pass," he writes. The bookkeeping that causes humans to abandon personal wikis is exactly what LLMs are good at.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Similar knowledge-as-Wiki pattern keeps reappearing under different names: &lt;/span&gt;&lt;a href="https://obsidian.md/help/vault" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Obsidian vaults&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; wired to coding agents, the &lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;AGENTS.md&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt; / &lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;CLAUDE.md&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt; family of convention files, repos full of &lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;index.md&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt; and &lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;log.md&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt; artifacts that agents consult before doing real work, and "metadata as code" repositories inside data teams. &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;The pattern is compelling and powerful, but each instance is bespoke. Karpathy's wiki and your team's wiki and a vendor's catalog export may all &lt;/span&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;look&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt; alike (markdown, frontmatter, cross-links), but none of them are intentionally designed to cooperate. There is no agreed-upon answer to &lt;/span&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;what fields every document should carry&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;, or &lt;/span&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;what filenames mean what&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;. As a result, the knowledge encoded in wikis remains siloed within the original teams, leading to redundant effort whenever a new agent is built.&lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;span style="vertical-align: baseline;"&gt;What's missing is a format, not another service&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;The answer to this problem isn’t another knowledge service. You need a &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;format&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;, a way to represent knowledge that:&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;Anyone can produce, without an SDK&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;Anyone can consume, without an integration&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;Survives moving between systems, organizations, and tools&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;Lives in version control alongside the code it describes&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;Is readable by humans &lt;/span&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;and&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt; parseable by agents: the same file, no translation layer&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;By design, OKF is that format. &lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;span style="vertical-align: baseline;"&gt;How OKF works: The design in one screen&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;An OKF &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;bundle&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; is a directory of markdown files representing &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;concepts: &lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;anything you want to capture, including tables, datasets, metrics, playbooks, runbooks, and APIs. Each concept is one file. The file path is the concept's identity:&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-code"&gt;&lt;dl&gt;
    &lt;dt&gt;code_block&lt;/dt&gt;
    &lt;dd&gt;&amp;lt;ListValue: [StructValue([(&amp;#x27;code&amp;#x27;, &amp;#x27;sales/\r\n├── index.md\r\n├── datasets/\r\n│   ├── index.md\r\n│   └── orders_db.md\r\n├── tables/\r\n│   ├── index.md\r\n│   ├── orders.md\r\n│   └── customers.md\r\n└── metrics/\r\n│   ├── index.md\r\n     └── weekly_active_users.md&amp;#x27;), (&amp;#x27;language&amp;#x27;, &amp;#x27;&amp;#x27;), (&amp;#x27;caption&amp;#x27;, &amp;lt;wagtail.rich_text.RichText object at 0x7f84a6e16070&amp;gt;)])]&amp;gt;&lt;/dd&gt;
&lt;/dl&gt;&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Each concept document has a small block of YAML front matter for structured fields and a markdown body for everything else:&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-code"&gt;&lt;dl&gt;
    &lt;dt&gt;code_block&lt;/dt&gt;
    &lt;dd&gt;&amp;lt;ListValue: [StructValue([(&amp;#x27;code&amp;#x27;, &amp;#x27;---\r\ntype: BigQuery Table\r\ntitle: Orders\r\ndescription: One row per completed customer order.\r\nresource: https://console.cloud.google.com/bigquery?p=acme&amp;amp;d=sales&amp;amp;t=orders\r\ntags: [sales, revenue]\r\ntimestamp: 2026-05-28T14:30:00Z\r\n---\r\n\r\n# Schema\r\n\r\n| Column        | Type      | Description                              |\r\n|---------------|-----------|------------------------------------------|\r\n| `order_id`    | STRING    | Globally unique order identifier.        |\r\n| `customer_id` | STRING    | FK to [customers](/tables/customers.md). |\r\n\r\n# Joins\r\n\r\nJoined with [customers](/tables/customers.md) on `customer_id`.&amp;#x27;), (&amp;#x27;language&amp;#x27;, &amp;#x27;&amp;#x27;), (&amp;#x27;caption&amp;#x27;, &amp;lt;wagtail.rich_text.RichText object at 0x7f84a6e16220&amp;gt;)])]&amp;gt;&lt;/dd&gt;
&lt;/dl&gt;&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Concepts link to each other with normal markdown links, turning the directory into a &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;graph&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; of relationships that is richer than the parent/child links implied by the file system. Bundles can optionally include &lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;index.md&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt; files (for progressive disclosure as agents navigate the hierarchy) and &lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;log.md&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt; files (for chronological history of changes).&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;The full v0.1 specification (including conformance criteria, cross-linking rules, and the small number of reserved filenames) fits on a single page.&lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;span style="vertical-align: baseline;"&gt;Three principles behind the design&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;1. Minimally opinionated.&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; OKF requires exactly one thing of every concept: a &lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;type&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt; field. Everything else (e.g., what types exist, what other fields to include, what sections the body has) is left to the producer. The spec defines the &lt;/span&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;interoperability surface&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;, not the content model.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;2. Producer/consumer independence.&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; OKF cleanly separates &lt;/span&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;who writes the knowledge&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt; from &lt;/span&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;who consumes it&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;. A bundle hand-authored by a human can be consumed by an AI agent. A bundle generated by a metadata export pipeline can be browsed in a visualizer. A bundle synthesized by one LLM can be queried by another. The format is the contract; the tooling at each end is independently swappable.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;3. Format, not platform.&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; OKF is not tied to any specific cloud, database, model provider, or agent framework. It will never require a proprietary account or SDK to read, write, or serve. We're publishing it as an open standard because the value of a knowledge format comes from how many parties speak it, not from who owns it.&lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;span style="vertical-align: baseline;"&gt;What we're shipping with the spec&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;To make the format concrete, we're publishing &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;reference implementations&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; at both the producer and consumer ends:&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;An &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;enrichment agent&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; that walks a BigQuery dataset, drafts an OKF concept document for every table and view, then runs a second LLM pass that crawls authoritative documentation and enriches each concept with citations, schemas, and join paths.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;A &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;static HTML visualizer&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; that turns any OKF bundle into an interactive graph view in a single self-contained file; no backend, no install on the viewing side, no data leaves the page.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Three ready-to-browse sample bundles&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;: &lt;/span&gt;&lt;a href="https://developers.google.com/analytics/bigquery/web-ecommerce-demo-dataset" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;GA4 e-commerce&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, &lt;/span&gt;&lt;a href="https://console.cloud.google.com/bigquery?ws=!1m4!1m3!3m2!1sbigquery-public-data!2sstackoverflow" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Stack Overflow&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, and &lt;/span&gt;&lt;a href="https://cloud.google.com/blog/topics/public-datasets/bitcoin-in-bigquery-blockchain-analytics-on-public-data?e=48754805"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Bitcoin public datasets&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, produced by the reference agent and committed to the repo as living examples of conformant OKF.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;These are proofs of concept, deliberately. The agent demonstrates &lt;/span&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;one&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt; way to produce OKF; nothing about the format requires a specific agent framework or LLM. The visualizer demonstrates &lt;/span&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;one&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt; way to consume it; nothing about the format requires HTML or a graph view. We expect (and want!) the ecosystem of producers and consumers to grow far beyond what we've shipped.&lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;span style="vertical-align: baseline;"&gt;Where we go from here&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;OKF v0.1 is a starting point, not a finished standard. The format will evolve as more producers and consumers emerge and as we collectively learn what knowledge representations agents actually need in practice.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;We're publishing in the open from day one because that's the only way a knowledge format earns its name, whether you're building a knowledge catalog, an enrichment pipeline, a wiki tailored to AI agents, or anything in the AI knowledge domain. &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;From here, we encourage you to:&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Read the spec&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; (it's short!)&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Write a producer&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; for your source system, your database, your documentation site&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Write a consumer:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; a viewer, a search index, an agent that reasons over bundles&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Try the reference implementation&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; against your own data&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;File issues, send PRs, or propose extensions:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; The spec is versioned and explicitly designed for backward-compatible growth&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;The repo, the spec, and the sample bundles are available in &lt;/span&gt;&lt;a href="https://github.com/GoogleCloudPlatform/knowledge-catalog/tree/main/okf" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;GitHub&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;. We have also updated Google Cloud’s &lt;/span&gt;&lt;a href="https://cloud.google.com/blog/products/data-analytics/introducing-the-google-cloud-knowledge-catalog"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Knowledge Catalog&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; to be able to ingest Open Knowledge Format and serve it to our agents. You can find the relevant code and examples &lt;/span&gt;&lt;a href="https://github.com/GoogleCloudPlatform/knowledge-catalog/tree/main/toolbox/mdcode/demo" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;here&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;The format itself is the contribution. The tools we've shipped exist to make it real, and to lower the cost of trying it out. Whatever shape your knowledge takes today, OKF is designed to be the &lt;/span&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;lingua franca&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt; it can be exchanged for tomorrow. &lt;/span&gt;&lt;/p&gt;
&lt;hr/&gt;
&lt;p&gt;&lt;sup&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;Published by the Google Cloud Data Cloud team. Open Knowledge Format is an open specification; contributions, alternative implementations, and adoption beyond Google products are all explicitly welcomed.&lt;/span&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;p&gt;&lt;sup&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;In addition to the authors, this work came together thanks to key ideas from many others at Google, and we thank them for their contributions.&lt;/span&gt;&lt;/sup&gt;&lt;/p&gt;&lt;/div&gt;</description><pubDate>Fri, 12 Jun 2026 13:00:00 +0000</pubDate><guid>https://cloud.google.com/blog/products/data-analytics/how-the-open-knowledge-format-can-improve-data-sharing/</guid><category>AI &amp; Machine Learning</category><category>BigQuery</category><category>Data Analytics</category><og xmlns:og="http://ogp.me/ns#"><type>article</type><title>Introducing the Open Knowledge Format</title><description></description><site_name>Google</site_name><url>https://cloud.google.com/blog/products/data-analytics/how-the-open-knowledge-format-can-improve-data-sharing/</url></og><author xmlns:author="http://www.w3.org/2005/Atom"><name>Sam McVeety</name><title>Tech Lead, Data Analytics, Engineering, Data Cloud, Google Cloud</title><department></department><company></company></author><author xmlns:author="http://www.w3.org/2005/Atom"><name>Amir Hormati</name><title>Tech Lead, BigQuery, Engineering, Data Cloud, Google Cloud</title><department></department><company></company></author><author xmlns:author="http://www.w3.org/2005/Atom"><name>Amir Hormati</name><title>Tech Lead, BigQuery, Engineering, Data Cloud, Google Cloud</title><department></department><company></company></author></item><item><title>Transform dashboards into interactive data experiences with Looker agents</title><link>https://cloud.google.com/blog/products/business-intelligence/dashboard-agents-in-looker/</link><description>&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Dashboards have long served as a primary way for organizations to extract insights from data, but they can fall short in agile environments: Dashboards aren’t interactive and don’t allow you to ask follow-up questions. This forces users to step outside their workflows or turn to data analysts to get the answers they need. Today, we are introducing Looker dashboard agents in preview, embedding intelligent, conversational data agents directly within dashboards and empowering users to explore their business intelligence (BI) data using natural language.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-image_full_width"&gt;






  
    &lt;div class="article-module h-c-page"&gt;
      &lt;div class="h-c-grid"&gt;
  

    &lt;figure class="article-image--large
      
      
        h-c-grid__col
        h-c-grid__col--6 h-c-grid__col--offset-3
        
        
      "
      &gt;

      
      
        
        &lt;img
            src="https://storage.googleapis.com/gweb-cloudblog-publish/original_images/1_KG6gpf2.gif"
        
          alt="1"&gt;
        
        &lt;/a&gt;
      
        &lt;figcaption class="article-image__caption "&gt;&lt;p data-block-key="4nhaj"&gt;Start a conversation with a Looker dashboard agent&lt;/p&gt;&lt;/figcaption&gt;
      
    &lt;/figure&gt;

  
      &lt;/div&gt;
    &lt;/div&gt;
  




&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Interactive agent-led investigations&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Traditionally, dashboards have presented a static view of data. With dashboard agents in Looker, users can explore their data directly within the dashboard interface. Users can start a conversation by clicking the Gemini icon and asking natural-language questions to receive contextual insights.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;The accuracy of a data agent depends on the business context it is provided, and its ability to map appropriate metrics and dimensions to users’ inquiries. The Looker dashboard agent has direct context about the user’s applied filters, cross-filters, and pre-curated tiles, helping it to generate highly relevant and accurate answers to complex business questions.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Should a query require more data, the agent can access underlying &lt;/span&gt;&lt;a href="https://docs.cloud.google.com/looker/docs/creating-and-editing-explores"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Explores&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; to uncover additional information. These insights are paired with relevant charts and natural language explanations to simplify data exploration.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-image_full_width"&gt;






  
    &lt;div class="article-module h-c-page"&gt;
      &lt;div class="h-c-grid"&gt;
  

    &lt;figure class="article-image--large
      
      
        h-c-grid__col
        h-c-grid__col--6 h-c-grid__col--offset-3
        
        
      "
      &gt;

      
      
        
        &lt;img
            src="https://storage.googleapis.com/gweb-cloudblog-publish/images/2_kUvlGxK.max-1000x1000.png"
        
          alt="2"&gt;
        
        &lt;/a&gt;
      
        &lt;figcaption class="article-image__caption "&gt;&lt;p data-block-key="4nhaj"&gt;Explore data beyond dashboard to uncover deeper insights&lt;/p&gt;&lt;/figcaption&gt;
      
    &lt;/figure&gt;

  
      &lt;/div&gt;
    &lt;/div&gt;
  




&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Tailor the agent to your business &lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Data analysts curate dashboards to provide business users with precise perspectives on organizational data. To maintain this kind of consistent and reliable analytical environment, the Looker dashboard agent is highly configurable. Analysts can add context on top of the Looker semantic layer by providing natural-language instructions directly to the agent. This way, they can define exactly how the agent interprets unique business logic and tailors responses for the target audience. By enabling self-serve data analysis, dashboard agents help analyst teams scale to meet the increasing data demands of the business.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-image_full_width"&gt;






  
    &lt;div class="article-module h-c-page"&gt;
      &lt;div class="h-c-grid"&gt;
  

    &lt;figure class="article-image--large
      
      
        h-c-grid__col
        h-c-grid__col--6 h-c-grid__col--offset-3
        
        
      "
      &gt;

      
      
        
        &lt;img
            src="https://storage.googleapis.com/gweb-cloudblog-publish/original_images/3_t5v8e7A.gif"
        
          alt="3"&gt;
        
        &lt;/a&gt;
      
        &lt;figcaption class="article-image__caption "&gt;&lt;p data-block-key="4nhaj"&gt;Configure Looker dashboard agents&lt;/p&gt;&lt;/figcaption&gt;
      
    &lt;/figure&gt;

  
      &lt;/div&gt;
    &lt;/div&gt;
  




&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Inherited trust and transparency &lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;For users to adopt an AI-based system, they must&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt; trust the information it provides them. When generating an insight, the Looker dashboard agent explicitly shows its work by displaying intermediate reasoning, referenced dashboard tiles, and applied filters. Additionally, the administrator needs to trust users only have access to data and insights to which they are authorized. The dashboard agent is backed by Looker’s governance model, managed through standard permissions.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;We are actively working on additional capabilities for the Looker dashboard agent, including support for iframe embedding, allowing organizations to bring dashboard agents alongside Looker dashboards into any essential portal or application.&lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Enable dashboard agents today&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;With Looker version 26.08.11 and later, administrators can activate the dashboard agent capability by toggling "Enable Chat with Dashboard" within the Gemini in Looker settings. Once enabled, authorized users will see the Gemini icon and can begin chatting with their dashboard data immediately. Please &lt;/span&gt;&lt;a href="https://docs.cloud.google.com/looker/docs/conversational-analytics-looker-data-agents-dashboards"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;explore our support documentation&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; for more detailed information.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;</description><pubDate>Thu, 11 Jun 2026 16:00:00 +0000</pubDate><guid>https://cloud.google.com/blog/products/business-intelligence/dashboard-agents-in-looker/</guid><category>Data Analytics</category><category>Business Intelligence</category><og xmlns:og="http://ogp.me/ns#"><type>article</type><title>Transform dashboards into interactive data experiences with Looker agents</title><description></description><site_name>Google</site_name><url>https://cloud.google.com/blog/products/business-intelligence/dashboard-agents-in-looker/</url></og><author xmlns:author="http://www.w3.org/2005/Atom"><name>Vaibhavi Sonavane</name><title>Product Manager</title><department></department><company></company></author></item><item><title>Deep dive: How Lightning Engine delivers 4.9x faster Apache Spark performance</title><link>https://cloud.google.com/blog/products/data-analytics/lighting-engine-for-apache-spark-performance-deep-dive/</link><description>&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;From foundational ETL and analytics to the frontier of generative AI, Apache Spark serves as the architectural backbone for global data processing. However, as data volumes scale, the trade-off between performance and infrastructure costs can be a limiting factor for growth. In the agentic era, where autonomous agents can trigger thousands of concurrent, multi-hop queries, this performance bottleneck directly dictates your unit economics.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;We are excited to announce the general availability of &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;Lightning Engine&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; for &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;Managed Service for Apache Spark&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;, available across both our &lt;/span&gt;&lt;a href="https://cloud.google.com/blog/products/data-analytics/serverless-managed-service-for-apache-spark-runtime-3-0-features?e=48754805"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;serverless&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; and &lt;/span&gt;&lt;a href="https://cloud.google.com/blog/products/data-analytics/enhancements-to-managed-service-for-apache-spark-clusters?e=48754805"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;managed clusters&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; deployment modes. Designed to address these scaling challenges directly, it is fully compatible with modern Spark workloads and requires zero changes to your existing data pipelines.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Whether you choose the zero-ops simplicity of our serverless deployment mode or the fine-grained infrastructure control of our managed clusters deployment mode, Lightning Engine serves as the unified performance engine to supercharge your job execution. By validating Lightning Engine across more than one million real-world workloads, we have fine-tuned it for industrial-grade stability as well as reliable performance gains.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;With this general availability release, Lightning Engine delivers:&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Up to 4.9x faster performance&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; than standard open-source Spark&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;2x the price-performance&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; over the leading high-speed Spark alternative&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Let’s take a closer look at how Manager Service for Apache Spark achieves these great results.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-image_full_width"&gt;






  
    &lt;div class="article-module h-c-page"&gt;
      &lt;div class="h-c-grid"&gt;
  

    &lt;figure class="article-image--large
      
      
        h-c-grid__col
        h-c-grid__col--6 h-c-grid__col--offset-3
        
        
      "
      &gt;

      
      
        
        &lt;img
            src="https://storage.googleapis.com/gweb-cloudblog-publish/images/1_6snIfkF.max-1000x1000.jpg"
        
          alt="1"&gt;
        
        &lt;/a&gt;
      
    &lt;/figure&gt;

  
      &lt;/div&gt;
    &lt;/div&gt;
  




&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Under the hood: Vectorized native execution&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Traditional Spark execution is often bottlenecked by JVM execution overhead and garbage collection pauses. Lightning Engine bypasses these limitations by compiling Spark physical query plans into native C++ instructions optimized for Single Instruction, Multiple Data (SIMD) vectorization.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Built on the open-source Gluten and Velox runtimes with specialized Google-engineered enhancements, this native execution layer accelerates your most demanding data processing tasks with:&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Vectorized sort&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;: Accelerates sorting operations by processing data columnarly in native memory, significantly reducing CPU cycle overhead.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Accelerated window functions&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;: Speeds up calculations performed across sets of rows (such as moving averages, aggregations, and deduplication) by executing them directly within the native C++ layer.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Smart fallback&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;: If a query contains an operator or custom Java UDF that is not natively supported, the engine's intelligent push-down layer automatically and gracefully transitions that specific sub-tree back to the JVM, avoiding unnecessary data format conversions and preserving overall execution stability.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Optimized Cloud Storage and BigQuery connectors&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;High-performance compute is useless if the engine is starved for data. With Lightning Engine, we’ve optimized our storage connectors to ensure that reading data from Cloud Storage and BigQuery isn’t the bottleneck. Optimizations include:&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Direct path connection&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;: Bypasses multiple node hops and uses bi-directional streaming with Cloud Storage. This allows seek operations and vectorized &lt;/span&gt;&lt;code style="vertical-align: baseline;"&gt;readV&lt;/code&gt;&lt;span style="vertical-align: baseline;"&gt; APIs to run without reopening streams, accelerating scan times for complex, deeply nested Parquet or ORC files.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Metadata call reduction&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;: Managing large-scale partitioned tables often comes with a hidden performance tax: the time spent simply listing files. Lightning Engine utilizes lexicographic listing in the driver to collect metadata and transmit it directly to executors, eliminating redundant Cloud Storage API calls and dramatically reducing Cloud Storage metadata costs.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Native BigQuery connector&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;: Directly consumes BigQuery data in Arrow format. By avoiding the expensive conversion from Arrow to JVM &lt;/span&gt;&lt;code style="vertical-align: baseline;"&gt;UnsafeRow&lt;/code&gt;&lt;span style="vertical-align: baseline;"&gt;, the engine eliminates serialization overhead to accelerate scan times.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Broadcast joins and advanced query optimization&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Lightning Engine incorporates an advanced, cost-based query optimizer inspired by Google's F1 and Spanner query engines, and introduces several custom optimization rules. Examples include:&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Single HashTable caching&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;: In standard broadcast joins, Spark builds join hash tables repeatedly across tasks. Lightning Engine builds the hash table once per executor and caches it, eliminating redundant CPU cycles and reducing the executor's memory footprint.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Aggregation pushdown&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;: Automatically pushes partial aggregations below join shuffles. This minimizes the volume of data that must be transferred across the network, drastically reducing expensive shuffle stages.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Auto shuffle partitioning&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;: Dynamically and adaptively determines the optimal number of shuffle partitions for each individual query stage based on runtime statistics, preventing out-of-memory (OOM) spills without over-partitioning.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;&lt;/div&gt;
&lt;div class="block-video"&gt;



&lt;div class="article-module article-video "&gt;
  &lt;figure&gt;
    &lt;a class="h-c-video h-c-video--marquee"
      href="https://youtube.com/watch?v=2uYC821jtEk"
      data-glue-modal-trigger="uni-modal-2uYC821jtEk-"
      data-glue-modal-disabled-on-mobile="true"&gt;

      
        

        &lt;div class="article-video__aspect-image"
          style="background-image: url(https://storage.googleapis.com/gweb-cloudblog-publish/images/2_ghHCex2.max-1000x1000.png);"&gt;
          &lt;span class="h-u-visually-hidden"&gt;The new way to use Spark: Intelligent, automated, and lightning fast&lt;/span&gt;
        &lt;/div&gt;
      
      &lt;svg role="img" class="h-c-video__play h-c-icon h-c-icon--color-white"&gt;
        &lt;use xlink:href="#mi-youtube-icon"&gt;&lt;/use&gt;
      &lt;/svg&gt;
    &lt;/a&gt;

    
      &lt;figcaption class="article-video__caption h-c-page"&gt;
        
          &lt;h4 class="h-c-headline h-c-headline--four h-u-font-weight-medium h-u-mt-std"&gt;Learn more technical details and hear Lowe’s experience with Lightning Engine from Google Cloud Next ‘26&lt;/h4&gt;
        
        
      &lt;/figcaption&gt;
    
  &lt;/figure&gt;
&lt;/div&gt;

&lt;div class="h-c-modal--video"
     data-glue-modal="uni-modal-2uYC821jtEk-"
     data-glue-modal-close-label="Close Dialog"&gt;
   &lt;a class="glue-yt-video"
      data-glue-yt-video-autoplay="true"
      data-glue-yt-video-height="99%"
      data-glue-yt-video-vid="2uYC821jtEk"
      data-glue-yt-video-width="100%"
      href="https://youtube.com/watch?v=2uYC821jtEk"
      ng-cloak&gt;
   &lt;/a&gt;
&lt;/div&gt;

&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Getting started&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;These updates are live and ready to use today! You can enable Lightning Engine directly through the Google Cloud console or via the &lt;/span&gt;&lt;code style="vertical-align: baseline;"&gt;gcloud&lt;/code&gt;&lt;span style="vertical-align: baseline;"&gt; CLI.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;To submit a &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;serverless&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; batch job with Lightning Engine enabled, specify the premium tier in your Spark properties:&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-code"&gt;&lt;dl&gt;
    &lt;dt&gt;code_block&lt;/dt&gt;
    &lt;dd&gt;&amp;lt;ListValue: [StructValue([(&amp;#x27;code&amp;#x27;, &amp;#x27;gcloud dataproc batches submit pyspark my_script.py \\\r\n    --region=us-central1 \\\r\n    --properties=dataproc:dataproc.tier=premium \\\r\n    --properties=spark:spark.dataproc.lightningEngine.runtime=native&amp;#x27;), (&amp;#x27;language&amp;#x27;, &amp;#x27;&amp;#x27;), (&amp;#x27;caption&amp;#x27;, &amp;lt;wagtail.rich_text.RichText object at 0x7f84a5d2a9d0&amp;gt;)])]&amp;gt;&lt;/dd&gt;
&lt;/dl&gt;&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;To spin up a new &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;managed cluster&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; with Lightning Engine and Native Query Execution (NQE) enabled, run the following command in your terminal:&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-code"&gt;&lt;dl&gt;
    &lt;dt&gt;code_block&lt;/dt&gt;
    &lt;dd&gt;&amp;lt;ListValue: [StructValue([(&amp;#x27;code&amp;#x27;, &amp;#x27;gcloud dataproc clusters create my-optimized-cluster \\\r\n    --region=us-central1 \\\r\n    --image-version=2.3 \\\r\n    --engine=lightning \\\r\n    --enable-component-gateway \\\r\n--properties=spark:spark.dataproc.lightningEngine.runtime=native&amp;#x27;), (&amp;#x27;language&amp;#x27;, &amp;#x27;&amp;#x27;), (&amp;#x27;caption&amp;#x27;, &amp;lt;wagtail.rich_text.RichText object at 0x7f84b45e0460&amp;gt;)])]&amp;gt;&lt;/dd&gt;
&lt;/dl&gt;&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Alternatively, navigate to the &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;Managed Service for Apache Spark&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; page in the &lt;/span&gt;&lt;a href="https://console.cloud.google.com/dataproc"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Google Cloud console&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, click &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;Create Cluster&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;, select &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;Cluster on Compute Engine&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;, and choose &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;Lightning Engine&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; under the cluster configuration settings to automatically activate query acceleration for your workloads.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;</description><pubDate>Wed, 10 Jun 2026 17:00:00 +0000</pubDate><guid>https://cloud.google.com/blog/products/data-analytics/lighting-engine-for-apache-spark-performance-deep-dive/</guid><category>Streaming</category><category>Data Analytics</category><og xmlns:og="http://ogp.me/ns#"><type>article</type><title>Deep dive: How Lightning Engine delivers 4.9x faster Apache Spark performance</title><description></description><site_name>Google</site_name><url>https://cloud.google.com/blog/products/data-analytics/lighting-engine-for-apache-spark-performance-deep-dive/</url></og><author xmlns:author="http://www.w3.org/2005/Atom"><name>Newton Alex</name><title>Director of Engineering</title><department></department><company></company></author><author xmlns:author="http://www.w3.org/2005/Atom"><name>Abhishek Modi</name><title>Principal Software Engineer, Google Cloud</title><department></department><company></company></author></item><item><title>Modernizing Healthcare: How Alcidion achieved greater stability and performance with AlloyDB</title><link>https://cloud.google.com/blog/products/databases/modernizing-healthcare-how-alcidion-achieved-greater-stability-and-performance/</link><description>&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;In clinical informatics, every second counts. For &lt;/span&gt;&lt;a href="https://www.alcidion.com/" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Alcidion&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, a global leader in smart health solutions, the mission is simple but critical: use technology to reduce cognitive load for clinicians and present the right information at the right time to save lives.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Whether it’s managing patient flow in an emergency department or ensuring a patient is in the correct ward to avoid adverse outcomes, Alcidion’s flagship platform, &lt;/span&gt;&lt;a href="https://www.alcidion.com/platform/" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Miya Precision&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, serves as a dynamic intelligent care platform for modern hospitals. To power this mission, the platform recently underwent a major architectural transformation, migrating from a legacy Microsoft SQL Server environment to Google Cloud’s &lt;/span&gt;&lt;a href="https://cloud.google.com/products/alloydb"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;AlloyDB for PostgreSQL&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;.&lt;/span&gt;&lt;/p&gt;
&lt;h2&gt;&lt;strong style="vertical-align: baseline;"&gt;The challenge: overcoming performance bottlenecks&lt;/strong&gt;&lt;/h2&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Operating in an industry where data integrity and uptime are non-negotiable, Alcidion faced several technical and operational hurdles with its previous setup:&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Operational overhead:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Managing persistent backends for SQL Server required significant manual effort. The team had to manually balance database loads between elastic pools to maintain performance while trying to optimize costs. They also had to constantly manage the gap between allocated and used space to prevent shared pools from being consumed by excessive slack space.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Performance latency:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Complex JSON data processing, critical for modern health informatics, was taking up to 30 minutes for certain jobs.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Stability concerns:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; The team sought a more stable Kubernetes environment and a persistent backend that could scale without constant administrative intervention.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;&lt;strong style="vertical-align: baseline;"&gt;The solution: a smooth migration to AlloyDB&lt;/strong&gt;&lt;/h2&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Alcidion used the &lt;/span&gt;&lt;a href="https://cloud.google.com/database-migration"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Database Migration Service&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; (DMS) to move from SQL Server to AlloyDB, achieving a remarkably efficient cutover. The total learning and migration process took under one month, with the core database move completed in only one and a half weeks.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;By creating custom synchronization tools and using Google Cloud’s managed services, the team reduced the final transition window to just 15 minutes. Alcidion achieved this by spinning up a new Google Cloud instance synchronized to the active one, with both accessible via unique fully qualified domain names. The new environment remained in read-only mode for customer validation. &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;During the final cutover, the old instance was set to read-only, synchronization was halted, and external integration links were toggled to the new environment. This streamlined process allowed users to log into the new instance and resume work within minutes, with the primary delay being DNS record updates.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Alcidion chose a fully managed AlloyDB service to eliminate control plane tasks and administrative overhead. This shift allows their engineering team to focus on clinical innovation and product development rather than "managing the container" or the underlying database infrastructure.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Being able to cut over to AlloyDB in about 15 minutes had our users back to work almost immediately. For a system clinicians rely on around the clock, that kind of smooth transition gave Alcidion real confidence.&lt;/span&gt;&lt;/p&gt;
&lt;h2&gt;&lt;strong style="vertical-align: baseline;"&gt;The results: impact by the numbers&lt;/strong&gt;&lt;/h2&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;The shift to AlloyDB and Google’s &lt;/span&gt;&lt;a href="https://cloud.google.com/data-cloud"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Agentic Data Cloud&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; has delivered immediate, quantifiable improvements for Alcidion and its healthcare customers:&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Faster data processing:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Data processing that previously relied on SQL Server stored procedures — a process that became increasingly time-consuming as data volumes grew — has been transformed. By migrating to AlloyDB and using &lt;/span&gt;&lt;a href="https://cloud.google.com/bigquery"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;BigQuery&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; and Dataflow for processing, Alcidion has seen jobs that once took 30 minutes now complete in just 5 to 60 seconds.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Enhanced stability:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; The migration has delivered a step-change in reliability. In the previous environment, the team faced monthly disruptions, ranging from failed scheduled maintenance to connectivity issues that required manual intervention. In contrast, AlloyDB and Google Cloud’s compute services have proven exceptionally stable, allowing the team to move away from the "firefighting" mode associated with frequent infrastructure crashes.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Reduced cognitive load:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; By simplifying their backend and clinical dashboards, Alcidion’s SREs have significantly reduced their administrative burden. This shift has freed the team to focus on high-value innovation, such as refining predictive analytics and generative AI that empower clinicians to make informed clinical decisions faster.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;&lt;strong style="vertical-align: baseline;"&gt;Future vision: AI and beyond&lt;/strong&gt;&lt;/h2&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Alcidion isn't stopping at database modernization. The move to AlloyDB is a foundational step for their next phase of growth:&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;AlloyDB columnar engine:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; The team is exploring the columnar engine for a second round of query optimization and real-time analytics.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Generative AI apps:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Alcidion is actively working with Google to use AlloyDB’s &lt;/span&gt;&lt;a href="https://cloud.google.com/blog/products/ai-machine-learning/introducing-gemini-enterprise-agent-platform"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Gemini Enterprise Agent Platform&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; integration to perform concept analysis and pick out critical clinical insights from vast datasets.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;By moving to AlloyDB, Alcidion has improved its stability and performance and built a strong foundation to keep delivering smarter, safer care to hospitals worldwide.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong style="font-style: italic; vertical-align: baseline;"&gt;Ready to modernize your database?&lt;/strong&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt; Learn more about how&lt;/span&gt;&lt;a href="https://cloud.google.com/alloydb"&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt; &lt;/span&gt;&lt;span style="font-style: italic; text-decoration: underline; vertical-align: baseline;"&gt;AlloyDB&lt;/span&gt;&lt;/a&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt; can transform your operational workloads.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;</description><pubDate>Mon, 08 Jun 2026 16:00:00 +0000</pubDate><guid>https://cloud.google.com/blog/products/databases/modernizing-healthcare-how-alcidion-achieved-greater-stability-and-performance/</guid><category>AI &amp; Machine Learning</category><category>Data Analytics</category><category>Customers</category><category>Databases</category><media:content height="540" url="https://storage.googleapis.com/gweb-cloudblog-publish/images/Alcidion-Hero.max-600x600.png" width="540"></media:content><og xmlns:og="http://ogp.me/ns#"><type>article</type><title>Modernizing Healthcare: How Alcidion achieved greater stability and performance with AlloyDB</title><description></description><image>https://storage.googleapis.com/gweb-cloudblog-publish/images/Alcidion-Hero.max-600x600.png</image><site_name>Google</site_name><url>https://cloud.google.com/blog/products/databases/modernizing-healthcare-how-alcidion-achieved-greater-stability-and-performance/</url></og><author xmlns:author="http://www.w3.org/2005/Atom"><name>Raj Pai</name><title>VP, Google Databases</title><department></department><company></company></author><author xmlns:author="http://www.w3.org/2005/Atom"><name>Stephen Ridley</name><title>Alcidion, Director of SRE and Platform Operations</title><department></department><company></company></author></item><item><title>What’s new with Google Data Cloud</title><link>https://cloud.google.com/blog/products/data-analytics/whats-new-with-google-data-cloud/</link><description>&lt;div class="block-paragraph_advanced"&gt;&lt;h3&gt;&lt;span style="vertical-align: baseline;"&gt;&lt;span style="vertical-align: baseline;"&gt;June 1 - June 5&lt;/span&gt;&lt;/span&gt;&lt;/h3&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Beyond the Query: Powering AI Agents with Bigtable, Firestore &amp;amp; Memorystore &lt;br/&gt;&lt;/strong&gt;&lt;span style="font-family: -apple-system, BlinkMacSystemFont, 'Segoe UI', Roboto, Oxygen, Ubuntu, Cantarell, 'Open Sans', 'Helvetica Neue', sans-serif;"&gt;Discover the latest advancements in Google Cloud's NoSQL Database portfolio, including Bigtable, Firestore, and Memorystore. This series is designed for a broad audience: whether you are exploring these databases for the first time or are an existing user looking to leverage the new capabilities announced at Next '26. &lt;br/&gt;&lt;br/&gt;&lt;/span&gt;&lt;a href="https://rsvp.withgoogle.com/events/beyond-the-query-powering-ai-agents-with-bigtable-firestore-memorystore" rel="noopener" style="font-family: -apple-system, BlinkMacSystemFont, 'Segoe UI', Roboto, Oxygen, Ubuntu, Cantarell, 'Open Sans', 'Helvetica Neue', sans-serif;" target="_blank"&gt;&lt;strong style="text-decoration: underline; vertical-align: baseline;"&gt;Register here to secure your spot!&lt;/strong&gt;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Cloud Engineer's AI Toolkit Workshops: &lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;Solve data-driven challenges with &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;BigQuery, AlloyDB&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;, &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;Gemini&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; and more. &lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;Hosted by Google Cloud Labs, this highly technical event is built specifically for Platform Engineers, SREs, and cloud infrastructure teams ready to bridge the gap between AI prototypes and production-grade deployments. Look out for more locations coming soon&lt;br/&gt;&lt;br/&gt;&lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;Toronto&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; - June 25 (Data Cloud) | &lt;/span&gt;&lt;a href="https://rsvp.withgoogle.com/events/google-cloud-labs-data-cloud-toronto" rel="noopener" style="font-family: -apple-system, BlinkMacSystemFont, 'Segoe UI', Roboto, Oxygen, Ubuntu, Cantarell, 'Open Sans', 'Helvetica Neue', sans-serif;" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;RSVP Here&lt;/span&gt;&lt;/a&gt;&lt;br/&gt;&lt;strong style="vertical-align: baseline;"&gt;Chicago&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; - June 30 (Data Cloud) | &lt;/span&gt;&lt;a href="https://rsvp.withgoogle.com/events/google-cloud-labs-data-cloud-chicago" rel="noopener" style="font-family: -apple-system, BlinkMacSystemFont, 'Segoe UI', Roboto, Oxygen, Ubuntu, Cantarell, 'Open Sans', 'Helvetica Neue', sans-serif;" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;RSVP Here&lt;/span&gt;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;&lt;span style="vertical-align: baseline;"&gt;&lt;span style="vertical-align: baseline;"&gt;&lt;strong style="vertical-align: baseline;"&gt;Start a 10-day &lt;/strong&gt;&lt;a href="https://cloud.google.com/bigtable"&gt;&lt;strong style="text-decoration: underline; vertical-align: baseline;"&gt;Bigtable&lt;/strong&gt;&lt;/a&gt;&lt;strong style="vertical-align: baseline;"&gt; free trial with a 1 node SSD cluster and up to 500GB of storage capacity. &lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;W&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;ith no credit card required to start, you can easily ingest workloads and manage workloads that require low-latency, high-throughput, and predictable access. &lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;Plus, new Google Cloud customers get &lt;/span&gt;&lt;a href="https://docs.cloud.google.com/sql/docs/mysql/create-free-trial-instance"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;$300 in free credits&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; on signup.&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;&lt;span style="vertical-align: baseline;"&gt;&lt;span style="vertical-align: baseline;"&gt;May 11 - May 15&lt;/span&gt;&lt;/span&gt;&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;span style="vertical-align: baseline;"&gt;&lt;span style="vertical-align: baseline;"&gt;&lt;strong style="vertical-align: baseline;"&gt;Managed Service for Apache Airflow&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; has launched a wave of new features, including the general availability of Airflow 3.1, AI-powered agentic troubleshooting, a new managed Airflow MCP Server for custom agent integration, and declarative YAML-based orchestration pipelines—discover all the details in the&lt;/span&gt;&lt;a href="https://cloud.google.com/blog/products/data-analytics/managed-apache-airflow-scaling-data-and-ai-workloads"&gt;&lt;span style="vertical-align: baseline;"&gt; &lt;/span&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;full blog post&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;.&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;&lt;span style="vertical-align: baseline;"&gt;April 20 - April 24&lt;/span&gt;&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;span style="vertical-align: baseline;"&gt;&lt;strong style="vertical-align: baseline;"&gt;Google-built ODBC Driver for BigQuery is now available in Preview&lt;br/&gt;&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;We are excited to announce the launch of the new, Google-built ODBC driver for BigQuery. This new open-source driver provides a direct, high-performance connection for applications to BigQuery and is developed entirely in-house by Google. &lt;/span&gt;&lt;a href="https://docs.cloud.google.com/bigquery/docs/odbc-for-bigquery"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Download a new driver and connect your application to BigQuery&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;.&lt;/span&gt;&lt;/span&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;&lt;span style="vertical-align: baseline;"&gt;April 13 - April 17&lt;/span&gt;&lt;/h3&gt;
&lt;ul&gt;
&lt;li role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;We announced &lt;/span&gt;&lt;a href="https://cloud.google.com/blog/products/data-analytics/looker-studio-is-data-studio"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;we are reintroducing Data Studio&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; to play a significant role in the AI era, expanding from data visualizations and reports to host BigQuery conversational agents and data apps built in Colab notebooks.&lt;/span&gt;&lt;/li&gt;
&lt;li&gt;&lt;span style="vertical-align: baseline;"&gt;&lt;span style="vertical-align: baseline;"&gt;We announced &lt;/span&gt;&lt;a href="https://cloud.google.com/blog/products/data-analytics/introducing-bigquery-graph"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;BigQuery Graph is now available in preview&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, offering an easy-to-use, highly scalable graph analytics solution, empowering data professionals to model, analyze and visualize massive-scale relationships in an entirely new way. &lt;/span&gt;&lt;/span&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;&lt;span style="vertical-align: baseline;"&gt;April 6 - April 10&lt;/span&gt;&lt;/h3&gt;
&lt;ul&gt;
&lt;li role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;We introduced &lt;/span&gt;&lt;a href="https://cloud.google.com/blog/products/business-intelligence/looker-embedded-adds-conversational-analytics"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Conversational Analytics for Looker Embedded environments&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, enabling users to add natural language experiences to their own custom data-driven applications, powered by Gemini. &lt;/span&gt;&lt;/li&gt;
&lt;li&gt;&lt;span style="vertical-align: baseline;"&gt;We expanded Looker’s capabilities for faster ad-hoc analysis, with the &lt;/span&gt;&lt;a href="https://cloud.google.com/blog/products/business-intelligence/looker-self-service-explores"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;introduction of self-service Explores&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, enabling you to bring your own data to Looker’s semantic layer and gain instant access to insights in a governed data environment.&lt;/span&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;&lt;span style="vertical-align: baseline;"&gt;&lt;span style="vertical-align: baseline;"&gt;&lt;span style="vertical-align: baseline;"&gt;&lt;span style="vertical-align: baseline;"&gt;&lt;span style="vertical-align: baseline;"&gt;&lt;span style="vertical-align: baseline;"&gt;&lt;span style="vertical-align: baseline;"&gt;March 23 - March 27&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/h3&gt;
&lt;ul&gt;
&lt;li role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;We showed you how you can &lt;/span&gt;&lt;a href="https://cloud.google.com/blog/products/databases/cloudsql-read-pools-support-autoscaling"&gt;&lt;span style="vertical-align: baseline;"&gt;scale your reads with Cloud SQL autoscaling read pools.&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; This feature allows you to provision multiple read replicas that are accessible via a single read endpoint and to dynamically adjust your read capability based on real-time application needs. &lt;/span&gt;&lt;/li&gt;
&lt;li role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;&lt;span style="vertical-align: baseline;"&gt;Our customers are leveraging the full power of Conversational Analytics and Looker to drive major business and technical breakthroughs in the AI era. Companies like &lt;/span&gt;&lt;a href="https://cloud.google.com/customers/telenor-looker"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Telenor&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, &lt;/span&gt;&lt;a href="https://cloud.google.com/customers/petcircle-looker"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Pet Circle&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, &lt;/span&gt;&lt;a href="https://cloud.google.com/customers/fluent-commerce"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Fluent Commerce&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, &lt;/span&gt;&lt;a href="https://cloud.google.com/customers/lighthouse"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Lighthouse Intelligence&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, &lt;/span&gt;&lt;a href="https://cloud.google.com/customers/wego"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Wego&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, and &lt;/span&gt;&lt;a href="https://cloud.google.com/customers/roller"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;ROLLER&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; are turning data into insights and actions, grounded by Looker’s semantic layer.&lt;/span&gt;&lt;/span&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;&lt;span style="vertical-align: baseline;"&gt;&lt;span style="vertical-align: baseline;"&gt;&lt;span style="vertical-align: baseline;"&gt;&lt;span style="vertical-align: baseline;"&gt;&lt;span style="vertical-align: baseline;"&gt;&lt;span style="vertical-align: baseline;"&gt;March 16 - March 20&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;span style="vertical-align: baseline;"&gt;&lt;span style="vertical-align: baseline;"&gt;&lt;span style="vertical-align: baseline;"&gt;&lt;span style="vertical-align: baseline;"&gt;&lt;span style="vertical-align: baseline;"&gt;&lt;span style="vertical-align: baseline;"&gt;&lt;span style="vertical-align: baseline;"&gt;We introduced &lt;/span&gt;&lt;a href="https://cloud.google.com/blog/products/data-analytics/gemini-supercharges-the-bigquery-studio-assistant"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;an enhanced Gemini assistant in BigQuery Studio&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, transforming the agent from a code assistant into a fully context-aware analytics partner.&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;&lt;span style="vertical-align: baseline;"&gt;&lt;span style="vertical-align: baseline;"&gt;&lt;span style="vertical-align: baseline;"&gt;&lt;span style="vertical-align: baseline;"&gt;&lt;span style="vertical-align: baseline;"&gt;February 23 - February 27&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/h3&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;We introduced &lt;/span&gt;&lt;a href="https://cloud.google.com/blog/products/databases/managed-mcp-servers-for-google-cloud-databases"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;managed and remote MCP support for Google Cloud databases&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, including AlloyDB, Spanner, Cloud SQL, Bigtable and Firestore, to power the next generation of agents. This announcement extends the ability for AI models to plan, build, and solve complex problems, connecting to the database tools our customers leverage daily as the backbone of their work environment.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;&lt;span style="vertical-align: baseline;"&gt;&lt;span style="vertical-align: baseline;"&gt;&lt;span style="vertical-align: baseline;"&gt;&lt;span style="vertical-align: baseline;"&gt;&lt;span style="vertical-align: baseline;"&gt;&lt;span style="vertical-align: baseline;"&gt;We outlined how you can &lt;/span&gt;&lt;a href="https://cloud.google.com/blog/products/data-analytics/build-data-agents-with-conversational-analytics-api"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;build a conversational agent in BigQuery using the Conversational Analytics API&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; to help you build context-aware agents that can understand natural language, query your BigQuery data, and deliver answers in text, tables, and visual charts.&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;&lt;span style="vertical-align: baseline;"&gt;&lt;span style="vertical-align: baseline;"&gt;&lt;span style="vertical-align: baseline;"&gt;&lt;span style="vertical-align: baseline;"&gt;February 16 - February 20&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;span style="vertical-align: baseline;"&gt;&lt;span style="vertical-align: baseline;"&gt;&lt;span style="vertical-align: baseline;"&gt;&lt;span style="vertical-align: baseline;"&gt;&lt;span style="vertical-align: baseline;"&gt;Our customers are leveraging the full power of Looker to drive major business and technical breakthroughs. Companies like &lt;/span&gt;&lt;a href="https://cloud.google.com/customers/arrive"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Arrive&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, &lt;/span&gt;&lt;a href="https://cloud.google.com/customers/audika"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Audika&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, &lt;/span&gt;&lt;a href="https://cloud.google.com/customers/looker-carousell"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Carousell&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, &lt;/span&gt;&lt;a href="https://cloud.google.com/customers/framebridge"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Framebridge&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, &lt;/span&gt;&lt;a href="https://cloud.google.com/customers/gumgum"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;GumGum&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, &lt;/span&gt;&lt;a href="https://cloud.google.com/customers/intel-looker"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Intel&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, &lt;/span&gt;&lt;a href="https://cloud.google.com/customers/overdose-digital"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Overdose Digital&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, &lt;/span&gt;&lt;a href="https://cloud.google.com/customers/one-looker"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Ocean Network Express&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, &lt;/span&gt;&lt;a href="https://cloud.google.com/customers/subskribe"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Subskribe&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; and &lt;/span&gt;&lt;a href="https://cloud.google.com/customers/promevo-looker"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Promevo&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; are leveraging Looker’s newest AI-driven capabilities, including Conversational Analytics, to transform data to insights and actions, and empower their entire organization with a single source of truth, powered by Looker’s semantic layer.&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;&lt;span style="vertical-align: baseline;"&gt;&lt;span style="vertical-align: baseline;"&gt;&lt;span style="vertical-align: baseline;"&gt;February 2 - February 6&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;span style="vertical-align: baseline;"&gt;&lt;span style="vertical-align: baseline;"&gt;&lt;span style="vertical-align: baseline;"&gt;&lt;span style="vertical-align: baseline;"&gt;Join us on March 4 for our webinar, Win Your AI Strategy with Cloud SQL Enterprise Plus, to learn how to power your generative AI workloads with 3x higher performance and 99.99% availability. &lt;/span&gt;&lt;a href="https://rsvp.withgoogle.com/events/win-your-ai-strategy-with-cloud-sql-enterprise-plus" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Register today&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; to discover how to build a scalable, enterprise-grade foundation for your most demanding AI applications.&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;&lt;span style="vertical-align: baseline;"&gt;&lt;span style="vertical-align: baseline;"&gt;January 26 - January 30&lt;/span&gt;&lt;/span&gt;&lt;/h3&gt;
&lt;ul&gt;
&lt;li role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;We introduced &lt;/span&gt;&lt;a href="https://cloud.google.com/blog/products/data-analytics/introducing-conversational-analytics-in-bigquery"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Conversational Analytics in BigQuery&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;, which allows users to analyze data using natural language.&lt;/span&gt;&lt;/a&gt; &lt;span style="vertical-align: baseline;"&gt;Conversational Analytics in BigQuery is an intelligent agent that generates, executes and visualizes answers grounded in your business context directly in BigQuery Studio, making data insights for data professionals more conversational.&lt;/span&gt;&lt;/li&gt;
&lt;li role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;We outlined how &lt;/span&gt;&lt;a href="https://cloud.google.com/transform/from-asset-to-action-how-data-products-have-become-the-foundation-for-ai-agents"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;data products have become the foundation for AI agents&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, providing the context needed to make autonomous agents reliable and trusted for real business use, backed by organized business logic and semantic understanding.&lt;/span&gt;&lt;/li&gt;
&lt;li&gt;&lt;span style="vertical-align: baseline;"&gt;&lt;span style="vertical-align: baseline;"&gt;We highlighted how &lt;/span&gt;&lt;a href="https://cloud.google.com/use-cases/data-analytics-agents"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;you can supercharge data analytics workflows&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, and outlined Google Cloud’s AI agent offerings for data engineering, data science, and development tools, so you can integrate agentic workflows in your applications, empower your teams and speed discovery.&lt;/span&gt;&lt;/span&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;&lt;span style="vertical-align: baseline;"&gt;January 19 - January 23&lt;/span&gt;&lt;/h3&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;We have fundamentally reimagined &lt;/span&gt;&lt;a href="https://cloud.google.com/blog/products/data-analytics/new-firestore-query-engine-enables-pipelines"&gt;&lt;strong style="text-decoration: underline; vertical-align: baseline;"&gt;Firestore with pipeline operations for Enterprise edition&lt;/strong&gt;&lt;/a&gt;&lt;strong style="vertical-align: baseline;"&gt;.&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Experience a powerful new engine featuring over a hundred new query features, index-less queries, new index types, and observability tooling to improve query performance. Seamlessly migrate using built-in tools and leverage Firestore’s existing differentiated serverless foundation, virtually unlimited scale, and industry-leading SLA. Join a community of 600K developers to craft expressive applications that maximize the benefits of rich queryability, real-time listen queries, robust offline caching, and cutting-edge AI-assistive coding integrations.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;a href="https://www.mssqltips.com/sqlservertip/11578/introducing-google-cloud-sql/" rel="noopener" target="_blank"&gt;&lt;strong style="text-decoration: underline; vertical-align: baseline;"&gt;Introducing Google Cloud SQL on MSSQLTips&lt;/strong&gt;&lt;/a&gt;&lt;strong style="vertical-align: baseline;"&gt;:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; We are highlighting a new technical guide published on MSSQLTips titled "Introducing Google Cloud SQL." This article serves as an essential resource for SQL Server administrators and developers exploring Google Cloud's fully managed database service. It provides a detailed overview of Cloud SQL capabilities, including high availability, security integration, and the seamless transition of on-premises SQL Server workloads to the cloud, making it an ideal resource for those planning their migration strategy.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;&lt;span style="vertical-align: baseline;"&gt;&lt;span style="vertical-align: baseline;"&gt;We are excited to announce the &lt;/span&gt;&lt;strong&gt;&lt;a href="https://medium.com/google-cloud/bridging-the-identity-gap-microsoft-entra-id-integration-with-cloud-sql-for-sql-server-a30207d63035" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Public Preview of Microsoft Entra ID&lt;/span&gt;&lt;/a&gt;&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; (formerly Azure Active Directory) integration with Cloud SQL for SQL Server. Designed to tackle the challenge of identity sprawl in multi-cloud environments, this integration allows organizations to govern database access using their existing Microsoft identity infrastructure. Key benefits include centralized identity management, enhanced security features like Multi-Factor Authentication (MFA), and simplified user administration through direct group mapping. This feature is available for SQL Server 2022 and supports both public and private IP configurations.&lt;/span&gt;&lt;/span&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;January 12 - January 16&lt;/strong&gt;&lt;/h3&gt;
&lt;ul&gt;
&lt;li role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Google-built JDBC Driver for BigQuery is now available in Preview&lt;br/&gt;&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;We are excited to announce the launch of the new, Google-built JDBC driver for BigQuery. This new open-source driver provides a direct, high-performance connection for Java applications to BigQuery and is developed entirely in-house by Google. &lt;/span&gt;&lt;a href="https://docs.cloud.google.com/bigquery/docs/jdbc-for-bigquery"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Download a new driver and connect your Java application to BigQuery&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;.&lt;/span&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong style="vertical-align: baseline;"&gt;Troubleshoot Airflow tasks instantly with Gemini Cloud Assist investigations:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Cloud Composer just got smarter. We are excited to announce that &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;Gemini Cloud Assist investigations &lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;are now available directly within&lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt; Cloud Composer 3&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;. Instead of manually sifting through raw logs, you can now simply click "Investigate" on a failed Airflow task. Gemini analyzes logs and task metadata to identify failure patterns—such as resource exhaustion or timeouts—and provides actionable recommendations driven by Gemini Cloud Assist to resolve the issue. This integration shifts the debugging experience from manual toil to automated root cause analysis, significantly reducing the time required to restore your pipelines.&lt;/span&gt; &lt;a href="https://docs.cloud.google.com/composer/docs/composer-3/troubleshooting-dags#investigations"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Learn more about AI-assisted troubleshooting&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;.&lt;/span&gt;&lt;/li&gt;
&lt;/ul&gt;&lt;/div&gt;
&lt;div class="block-related_article_tout"&gt;





&lt;div class="uni-related-article-tout h-c-page"&gt;
  &lt;section class="h-c-grid"&gt;
    &lt;a href="https://cloud.google.com/blog/products/data-analytics/whats-new-with-google-data-cloud-2025/"
       data-analytics='{
                       "event": "page interaction",
                       "category": "article lead",
                       "action": "related article - inline",
                       "label": "article: {slug}"
                     }'
       class="uni-related-article-tout__wrapper h-c-grid__col h-c-grid__col--8 h-c-grid__col-m--6 h-c-grid__col-l--6
        h-c-grid__col--offset-2 h-c-grid__col-m--offset-3 h-c-grid__col-l--offset-3 uni-click-tracker"&gt;
      &lt;div class="uni-related-article-tout__inner-wrapper"&gt;
        &lt;p class="uni-related-article-tout__eyebrow h-c-eyebrow"&gt;Related Article&lt;/p&gt;

        &lt;div class="uni-related-article-tout__content-wrapper"&gt;
          &lt;div class="uni-related-article-tout__image-wrapper"&gt;
            &lt;div class="uni-related-article-tout__image" style="background-image: url('https://storage.googleapis.com/gweb-cloudblog-publish/images/whats_new_data_cloud_fWg4bKK.max-500x500.png')"&gt;&lt;/div&gt;
          &lt;/div&gt;
          &lt;div class="uni-related-article-tout__content"&gt;
            &lt;h4 class="uni-related-article-tout__header h-has-bottom-margin"&gt;What’s new with Google Data Cloud - 2025&lt;/h4&gt;
            &lt;p class="uni-related-article-tout__body"&gt;Recent product news and updates from our data analytics, database and business intelligence teams.&lt;/p&gt;
            &lt;div class="cta module-cta h-c-copy  uni-related-article-tout__cta muted"&gt;
              &lt;span class="nowrap"&gt;Read Article
                &lt;svg class="icon h-c-icon" role="presentation"&gt;
                  &lt;use xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="#mi-arrow-forward"&gt;&lt;/use&gt;
                &lt;/svg&gt;
              &lt;/span&gt;
            &lt;/div&gt;
          &lt;/div&gt;
        &lt;/div&gt;
      &lt;/div&gt;
    &lt;/a&gt;
  &lt;/section&gt;
&lt;/div&gt;

&lt;/div&gt;</description><pubDate>Thu, 04 Jun 2026 16:00:00 +0000</pubDate><guid>https://cloud.google.com/blog/products/data-analytics/whats-new-with-google-data-cloud/</guid><category>Databases</category><category>Business Intelligence</category><category>Data Analytics</category><media:content height="540" url="https://storage.googleapis.com/gweb-cloudblog-publish/original_images/whats_new_data_cloud_fWg4bKK.png" width="540"></media:content><og xmlns:og="http://ogp.me/ns#"><type>article</type><title>What’s new with Google Data Cloud</title><description></description><image>https://storage.googleapis.com/gweb-cloudblog-publish/original_images/whats_new_data_cloud_fWg4bKK.png</image><site_name>Google</site_name><url>https://cloud.google.com/blog/products/data-analytics/whats-new-with-google-data-cloud/</url></og><author xmlns:author="http://www.w3.org/2005/Atom"><name>The Google Cloud Data Analytics, BI, and Database teams </name><title></title><department></department><company></company></author></item><item><title>What's new for Managed Service for Apache Spark clusters</title><link>https://cloud.google.com/blog/products/data-analytics/enhancements-to-managed-service-for-apache-spark-clusters/</link><description>&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;At Google Cloud, our goal is to let you run large-scale analytical and data science workloads with maximum efficiency so you can process big data pipelines, machine learning, and ETL tasks. &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;We recently announced that the Dataproc service is now &lt;/span&gt;&lt;a href="https://cloud.google.com/products/managed-service-for-apache-spark"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Managed Service for Apache Spark&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, reflecting our deep integration with the &lt;/span&gt;&lt;a href="https://cloud.google.com/data-cloud"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Agentic Data Cloud&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;To support the diverse architectural needs of today’s modern data teams, we offer the service in two distinct deployment modes: serverless and managed clusters. The serverless deployment mode completely abstracts infrastructure management for ephemeral or ad-hoc jobs, while the managed clusters deployment mode is designed for teams that require fine-grained infrastructure customization, persistent environments, long-running stateful processing, or native integration with custom Compute Engine hardware configurations.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;When it comes to managed cluster deployments, we’ve re-imagined the experience from the ground up, focusing on three core pillars: making Spark &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;faster&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; by supercharging execution speeds, &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;easier&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; to run by maximizing resource obtainability and reducing operational overhead, and &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;smarter&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; by embedding AI directly into the development and operational lifecycle. &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;This blog post focuses specifically on what we announced at Google Cloud Next ‘26 for the Managed Spark clusters deployment mode: providing enhanced flexibility to fine-tune performance and cost through native execution engine, smarter scaling policies, and Gemini-powered extensions. For the latest of the serverless deployment mode, check out &lt;/span&gt;&lt;a href="https://cloud.google.com/blog/products/data-analytics/serverless-managed-service-for-apache-spark-runtime-3-0-features?e=48754805"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;this blog&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;.&lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Faster, with the Lightning Engine native execution engine&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Arguably the biggest update for Managed Spark clusters is &lt;/span&gt;&lt;a href="https://cloud.google.com/dataproc/docs/guides/lightning-engine"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Lightning Engine&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, which introduces massive performance gains for Spark DataFrame/Dataset APIs and heavy Spark SQL queries. Powered by a native, C++ vectorized execution engine built on Velox and Gluten, with specialized internal enhancements, Lightning Engine bypasses JVM execution bottlenecks by compiling query plans into native instructions optimized for SIMD (Single Instruction, Multiple Data) vectorization.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;This native execution engine delivers:&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Up to 4.9x faster performance&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; than standard open-source Spark&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;up to 2x the price-performance &lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;over the leading high-speed Spark alternative&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Crucially, taking advantage of these performance gains doesn’t require any code changes to your existing Spark applications. Because your jobs complete faster, you directly reduce your aggregate Compute Engine runtime hours and overall spend.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;To enable Lightning Engine on your managed clusters, simply specify the Lightning Engine option when you’re creating a cluster.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-video"&gt;



&lt;div class="article-module article-video "&gt;
  &lt;figure&gt;
    &lt;a class="h-c-video h-c-video--marquee"
      href="https://youtube.com/watch?v=2uYC821jtEk"
      data-glue-modal-trigger="uni-modal-2uYC821jtEk-"
      data-glue-modal-disabled-on-mobile="true"&gt;

      
        

        &lt;div class="article-video__aspect-image"
          style="background-image: url(https://storage.googleapis.com/gweb-cloudblog-publish/images/maxresdefault_u5e7XRu.max-1000x1000.jpg);"&gt;
          &lt;span class="h-u-visually-hidden"&gt;The new way to use Spark: Intelligent, automated, and lightning fast&lt;/span&gt;
        &lt;/div&gt;
      
      &lt;svg role="img" class="h-c-video__play h-c-icon h-c-icon--color-white"&gt;
        &lt;use xlink:href="#mi-youtube-icon"&gt;&lt;/use&gt;
      &lt;/svg&gt;
    &lt;/a&gt;

    
      &lt;figcaption class="article-video__caption h-c-page"&gt;
        
          &lt;h4 class="h-c-headline h-c-headline--four h-u-font-weight-medium h-u-mt-std"&gt;Learn technical details and hear Lowe’s experience with Lightning Engine&lt;/h4&gt;
        
        
      &lt;/figcaption&gt;
    
  &lt;/figure&gt;
&lt;/div&gt;

&lt;div class="h-c-modal--video"
     data-glue-modal="uni-modal-2uYC821jtEk-"
     data-glue-modal-close-label="Close Dialog"&gt;
   &lt;a class="glue-yt-video"
      data-glue-yt-video-autoplay="true"
      data-glue-yt-video-height="99%"
      data-glue-yt-video-vid="2uYC821jtEk"
      data-glue-yt-video-width="100%"
      href="https://youtube.com/watch?v=2uYC821jtEk"
      ng-cloak&gt;
   &lt;/a&gt;
&lt;/div&gt;

&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Easier: Maximize resource obtainability via Flexible VMs&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Temporary localized shortages of a specific machine type can stall cluster creation or interrupt autoscaling. To dramatically improve cluster resilience against capacity constraints, &lt;/span&gt;&lt;a href="https://docs.cloud.google.com/dataproc/docs/concepts/configuring-clusters/flexible-vms"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Flexible VMs&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; for Managed Spark clusters are now generally available. &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Flexible VMs allow you to define up to ten ranked machine types for your master, primary, and secondary worker nodes. Managed Service for Apache Spark pairs this preference with automated regional zone placement, dynamically scanning the entire region to fulfill your capacity requests using the best available hardware layout. This helps ensure your pipelines spin up predictably, drastically reducing resource availability errors, and maximizing your ability to capture cost-effective Spot VM capacity during periods of peak demand.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-image_full_width"&gt;






  
    &lt;div class="article-module h-c-page"&gt;
      &lt;div class="h-c-grid"&gt;
  

    &lt;figure class="article-image--large
      
      
        h-c-grid__col
        h-c-grid__col--6 h-c-grid__col--offset-3
        
        
      "
      &gt;

      
      
        
        &lt;img
            src="https://storage.googleapis.com/gweb-cloudblog-publish/images/2_vPfgVT7.max-1000x1000.jpg"
        
          alt="2"&gt;
        
        &lt;/a&gt;
      
    &lt;/figure&gt;

  
      &lt;/div&gt;
    &lt;/div&gt;
  




&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Easier: Zero-scale clusters and scheduled stops&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;To give you better fiscal control over persistent and developmental environments, we recently announced the general availability of two highly requested FinOps features: &lt;/span&gt;&lt;a href="https://docs.cloud.google.com/dataproc/docs/guides/create-zero-scale-cluster"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;zero-scale clusters&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; and &lt;/span&gt;&lt;a href="https://docs.cloud.google.com/dataproc/docs/concepts/configuring-clusters/scheduled-stop"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;cluster scheduled stops&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;.&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Zero-scale clusters&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;: You can now provision environments that use exclusively secondary workers (Spot VMs), enabling the cluster to automatically scale down to absolutely zero worker nodes when no processing is active, leaving only the master node online to preserve metadata.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Cluster scheduled stops&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;: This feature lets you configure automated cluster shutdown policies based on specific idle-time limits or a precise future timestamp.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Because these features are natively integrated, they reduce the operational friction of having to delete and reconstruct your environment, while you can stop paying for idle compute overhead during nights and weekends.&lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Smarter: Managed Service for Apache Spark MCP Server&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;To bridge the gap between generative AI and data engineering, we launched the &lt;/span&gt;&lt;a href="https://docs.cloud.google.com/dataproc/docs/guides/use-dataproc-mcp"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Model Context Protocol (MCP) server for Managed Service for Apache Spark&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;. This open-standard integration allows LLMs and AI assistants to securely and dynamically interact with your Managed Spark clusters using natural language.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;By utilizing the MCP server, your AI agents can securely connect to your data platform under existing IAM permissions. This allows agents to perform cluster-based operations, such as creating a cluster, submitting a job, or adjusting an autoscaling policy, directly from your AI application. &lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Smarter: Accelerating AI with the Data Agent Kit&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;The &lt;/span&gt;&lt;a href="https://docs.cloud.google.com/data-cloud-extension"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Google Cloud Data Agent Kit&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; extension allows data scientists, engineers, and developers to manage their entire data workload lifecycle directly within their preferred development environment. We rolled out native support for this extension on Managed Spark clusters, enabling teams to seamlessly build and deploy specialized Data Agents for code generation and data wrangling.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-image_full_width"&gt;






  
    &lt;div class="article-module h-c-page"&gt;
      &lt;div class="h-c-grid"&gt;
  

    &lt;figure class="article-image--large
      
      
        h-c-grid__col
        h-c-grid__col--6 h-c-grid__col--offset-3
        
        
      "
      &gt;

      
      
        
        &lt;img
            src="https://storage.googleapis.com/gweb-cloudblog-publish/images/3_nOOSIdE.max-1000x1000.jpg"
        
          alt="3"&gt;
        
        &lt;/a&gt;
      
    &lt;/figure&gt;

  
      &lt;/div&gt;
    &lt;/div&gt;
  




&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Developers can choose to use &lt;/span&gt;&lt;a href="https://antigravity.google/blog/introducing-google-antigravity-2-0" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Antigravity 2.0&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, Google's standalone, agentic development platform or bring these agentic capabilities into their preferred IDE including VS Code, Claude Code, or Codex via the Data Agent Kit extensions and plugins. &lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;By pairing this streamlined workflow with the raw processing power of managed clusters, these intelligent agents can securely execute complex workflows directly over petabyte-scale data lakes. Specifically, the Data Agent Kit enables developers to:&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Build and orchestrate pipelines:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Author multi-node data pipelines and generate comprehensive code documentation using natural language.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Perform real-time debugging: &lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;Leverage Gemini Cloud Assist to sift through executor logs, pinpoint root causes of job failures, and recommend actionable fixes.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Easily connect to Spark resources: &lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;Instantly attach to serverless Spark runtimes or managed clusters without manual network configuration or local Spark installations.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Streamline Git and CI/CD management:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Commit, merge, and deploy code directly from your IDE of choice, triggering automated testing and deployment pipelines without friction.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Smarter: Next-generation Lakehouse &lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;We recently launched &lt;/span&gt;&lt;a href="https://docs.cloud.google.com/lakehouse/docs/introduction"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Lakehouse&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, which delivers read/write interoperability between engines like Managed Service for Apache Spark and BigQuery. By leveraging the &lt;/span&gt;&lt;a href="https://docs.cloud.google.com/lakehouse/docs/about-lakehouse-catalogs"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Lakehouse runtime catalog&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; as a unified, serverless metadata layer, it removes data silos and the need for complex translation layers. This agentic-first approach allows organizations to process open formats directly from Google Cloud Storage, or even query remote AWS datasets using the newly introduced &lt;/span&gt;&lt;a href="https://docs.cloud.google.com/lakehouse/docs/about-cross-cloud-lakehouse"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;cross-cloud Lakehouse&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, all while maintaining a single source of truth for security and governance.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;For customers utilizing Managed Spark clusters, this integration unlocks several powerful new capabilities. Data teams can now accelerate their most demanding ETL and data science workloads by up to 4.9x using the optimized Lightning Engine.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-image_full_width"&gt;






  
    &lt;div class="article-module h-c-page"&gt;
      &lt;div class="h-c-grid"&gt;
  

    &lt;figure class="article-image--large
      
      
        h-c-grid__col
        h-c-grid__col--6 h-c-grid__col--offset-3
        
        
      "
      &gt;

      
      
        
        &lt;img
            src="https://storage.googleapis.com/gweb-cloudblog-publish/images/4_ywa0kAz.max-1000x1000.png"
        
          alt="4"&gt;
        
        &lt;/a&gt;
      
    &lt;/figure&gt;

  
      &lt;/div&gt;
    &lt;/div&gt;
  




&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Next-gen runtimes: Cluster Image 3.0 with Spark 4.1&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Keeping pace with the open-source ecosystem, we rolled out &lt;/span&gt;&lt;a href="https://docs.cloud.google.com/dataproc/docs/release-notes#May_03_2026"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Cluster Image 3.0&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; in preview, built with Apache Spark 4.1 and that features an upgraded default Java runtime, Java 21. Spark 4.1 introduces a set of core open-source capabilities, including real-time mode for structured streaming. This enables your Spark environment to support real-time streaming with continuous, sub-second latency processing.&lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Get started today&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;These updates are live and ready to use today in Managed Spark clusters! You can enable these new features directly through the Google Cloud console or via the &lt;/span&gt;&lt;code style="vertical-align: baseline;"&gt;gcloud&lt;/code&gt;&lt;span style="vertical-align: baseline;"&gt; CLI.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;To spin up a new Managed Cluster and natively unlocking the performance of &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;Lightning Engine,&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; run the following command in your terminal:&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-code"&gt;&lt;dl&gt;
    &lt;dt&gt;code_block&lt;/dt&gt;
    &lt;dd&gt;&amp;lt;ListValue: [StructValue([(&amp;#x27;code&amp;#x27;, &amp;#x27;gcloud dataproc clusters create my-optimized-cluster \\\r\n    --region=us-central1 \\\r\n    --image-version=2.3 \\\r\n    --engine=lightning \\&amp;#x27;), (&amp;#x27;language&amp;#x27;, &amp;#x27;&amp;#x27;), (&amp;#x27;caption&amp;#x27;, &amp;lt;wagtail.rich_text.RichText object at 0x7f84a6e906d0&amp;gt;)])]&amp;gt;&lt;/dd&gt;
&lt;/dl&gt;&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Alternatively, navigate to the &lt;/span&gt;&lt;a href="https://console.cloud.google.com/dataproc"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Managed Service for Apache Spark page in the console&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, click Create cluster, and select ‘Enable Lightning Engine’ under the cluster configuration settings to automatically activate Lightning Engine for your Spark jobs. &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;We look forward to hearing about the environments you build and run as Managed Service for Apache Spark clusters!&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;</description><pubDate>Thu, 04 Jun 2026 16:00:00 +0000</pubDate><guid>https://cloud.google.com/blog/products/data-analytics/enhancements-to-managed-service-for-apache-spark-clusters/</guid><category>AI &amp; Machine Learning</category><category>Streaming</category><category>Open Source</category><category>Data Analytics</category><og xmlns:og="http://ogp.me/ns#"><type>article</type><title>What's new for Managed Service for Apache Spark clusters</title><description></description><site_name>Google</site_name><url>https://cloud.google.com/blog/products/data-analytics/enhancements-to-managed-service-for-apache-spark-clusters/</url></og><author xmlns:author="http://www.w3.org/2005/Atom"><name>Qiqi Wu</name><title>Senior Product Manager, Google Cloud</title><department></department><company></company></author></item><item><title>What’s new in serverless Managed Service for Apache Spark</title><link>https://cloud.google.com/blog/products/data-analytics/serverless-managed-service-for-apache-spark-runtime-3-0-features/</link><description>&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Whether you use it for data preparation, real-time interactive queries, AI model training, or something entirely different, running Apache Spark at scale is demanding — you shouldn’t have to manage the underlying infrastructure too.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Late last year, we &lt;/span&gt;&lt;a href="https://docs.cloud.google.com/dataproc-serverless/docs/release-notes#December_04_2025"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;announced&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; the general availability (GA) of our serverless &lt;/span&gt;&lt;a href="https://cloud.google.com/products/managed-service-for-apache-spark"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Managed Service for Apache Spark&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; runtime version 3.0, prioritizing speed, simplicity, and reliability. Since then, customer use of Managed Service for Apache Spark for data science has nearly doubled year over year. This is a testament to our belief that using Google Cloud is the easier, smarter, and faster place to run your Apache Spark workloads. &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;In this blog, let’s dive into a few key features that make our serverless Apache Spark offering a great fit for a wide range of workflows, including feature engineering, GPU-accelerated model training and tuning, semantic search, RAG, building AI agents and applications, and more.&lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Zero-setup onboarding&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;The most significant barrier to entry for a cloud service is often the "time to magic moment" — the interval between creating a project and running your first workload. Previously, with serverless Spark, you still needed to manually configure IAM roles, VPC networking, and firewall rules before submitting a single job.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;In the serverless Spark 3.0 runtime version, &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;zero-setup onboarding&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; significantly reduces the time to launch your first workload on serverless Spark. It does so by automating the following steps:&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Permissions:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Necessary IAM roles and permissions are automatically provisioned to the appropriate service accounts.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Networking:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; &lt;/span&gt;&lt;a href="https://docs.cloud.google.com/dataproc-serverless/docs/concepts/network#private-google-access-requirement"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Private Google Access&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; is auto-enabled on subnets, and &lt;/span&gt;&lt;a href="https://docs.cloud.google.com/dataproc-serverless/docs/concepts/network#automatically_created_regional_system_firewall_policy"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;system firewall policies&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; are configured automatically.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;API management&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;: Enabling APIs is now more efficient; you can just enable the Managed Service for Apache Spark API instead of manually having to enable several different APIs, as you did previously.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Fast startup for SLA-sensitive workloads&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Latency matters, especially for interactive data science and SLA-sensitive batch pipelines. Historically, serverless Spark startup times could take several minutes. With the 3.0 runtime, we’ve dropped startup times by 75% across both standard and premium tiers, delivered automatically without any code or configuration changes and at no additional cost. &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;This massive improvement qualifies serverless Spark for a much broader range of SLA-sensitive workloads, and we’re always looking to optimize startup times even further. &lt;/span&gt;&lt;/p&gt;
&lt;p style="padding-left: 40px;"&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;"Serverless Spark allowed us to quickly reap benefits by removing the need for fine-grain machine management. This drove faster model development and significantly reduced our data processing costs." &lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;- César Narnajo, Principal Engineer, Moloco&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-video"&gt;



&lt;div class="article-module article-video "&gt;
  &lt;figure&gt;
    &lt;a class="h-c-video h-c-video--marquee"
      href="https://youtube.com/watch?v=190tVajZgRI"
      data-glue-modal-trigger="uni-modal-190tVajZgRI-"
      data-glue-modal-disabled-on-mobile="true"&gt;

      
        

        &lt;div class="article-video__aspect-image"
          style="background-image: url(https://storage.googleapis.com/gweb-cloudblog-publish/images/yt_SnqmNb0.max-1000x1000.png);"&gt;
          &lt;span class="h-u-visually-hidden"&gt;Serverless data science: Seamless AI workflows with Spark and BigQuery&lt;/span&gt;
        &lt;/div&gt;
      
      &lt;svg role="img" class="h-c-video__play h-c-icon h-c-icon--color-white"&gt;
        &lt;use xlink:href="#mi-youtube-icon"&gt;&lt;/use&gt;
      &lt;/svg&gt;
    &lt;/a&gt;

    
  &lt;/figure&gt;
&lt;/div&gt;

&lt;div class="h-c-modal--video"
     data-glue-modal="uni-modal-190tVajZgRI-"
     data-glue-modal-close-label="Close Dialog"&gt;
   &lt;a class="glue-yt-video"
      data-glue-yt-video-autoplay="true"
      data-glue-yt-video-height="99%"
      data-glue-yt-video-vid="190tVajZgRI"
      data-glue-yt-video-width="100%"
      href="https://youtube.com/watch?v=190tVajZgRI"
      ng-cloak&gt;
   &lt;/a&gt;
&lt;/div&gt;

&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Better GPU obtainability&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Support for &lt;/span&gt;&lt;a href="https://docs.cloud.google.com/managed-spark/docs/guides/dws-serverless"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Dynamic Workload Scheduler (DWS)&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; Flex Start Mode in the serverless 3.0 runtime version allows serverless Spark to queue customer requests for a configurable duration when GPUs are unavailable. This feature addresses the obtainability challenges for high-demand accelerators like NVIDIA A100 and L4 that are the subject of frequent regional shortages. By pausing workloads until the necessary GPU capacity becomes accessible with DWS, you can dramatically increase obtainability and reliability for your latency-sensitive AI/ML workloads.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-image_full_width"&gt;






  
    &lt;div class="article-module h-c-page"&gt;
      &lt;div class="h-c-grid"&gt;
  

    &lt;figure class="article-image--large
      
      
        h-c-grid__col
        h-c-grid__col--6 h-c-grid__col--offset-3
        
        
      "
      &gt;

      
      
        
        &lt;img
            src="https://storage.googleapis.com/gweb-cloudblog-publish/images/1_L0aDvOP.max-1000x1000.jpg"
        
          alt="1"&gt;
        
        &lt;/a&gt;
      
    &lt;/figure&gt;

  
      &lt;/div&gt;
    &lt;/div&gt;
  




&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;First-class support for Apache Spark 4.x&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;The s&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;erverless Spark 3.0 runtime version supports current and upcoming &lt;/span&gt;&lt;a href="https://spark.apache.org/releases/spark-release-4-0-0.html" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Apache Spark 4.x&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; innovations, including Spark Connect, which supports a decoupled client-server architecture that enables remote connectivity from any client.&lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Enhanced multi-zonal support&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;To protect global enterprise workloads from zonal outages or hardware stockouts, the serverless Spark 3.0 runtime introduces enhanced multi-zonal support by default. The service can now automatically allocate execution nodes across multiple zones within a single region to help ensure obtainability.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Crucially, we do not charge for cross-zonal network traffic between nodes in a region, providing &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;high availability without the traditional multi-zone tax.&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; This is another benefit that you can realize by bringing your global Apache Spark workloads to Google Cloud.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-image_full_width"&gt;






  
    &lt;div class="article-module h-c-page"&gt;
      &lt;div class="h-c-grid"&gt;
  

    &lt;figure class="article-image--large
      
      
        h-c-grid__col
        h-c-grid__col--6 h-c-grid__col--offset-3
        
        
      "
      &gt;

      
      
        
        &lt;img
            src="https://storage.googleapis.com/gweb-cloudblog-publish/images/2_2SbCvxI.max-1000x1000.png"
        
          alt="2"&gt;
        
        &lt;/a&gt;
      
    &lt;/figure&gt;

  
      &lt;/div&gt;
    &lt;/div&gt;
  




&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Looking ahead&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;In addition to&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt; the above, we’re also continuing to innovate and push the boundaries of ease of use in areas such as history-based &lt;/span&gt;&lt;a href="https://medium.com/google-cloud/a-google-engineers-take-on-a-common-spark-problem-and-how-we-re-fixing-it-44b26293cce0" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;autotuning&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; and goal based &lt;/span&gt;&lt;a href="https://docs.cloud.google.com/managed-spark/docs/concepts/autoscaling-serverless#profiles"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;autoscaling&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;.&lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Get started today&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;You can take advantage of these features today by specifying &lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;runtime_version: 3.0&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt; in your batch workloads or interactive sessions.  To run your first workload on serverless Spark, perform the following simple steps:&lt;/span&gt;&lt;/p&gt;
&lt;ol&gt;
&lt;li aria-level="1" style="list-style-type: decimal; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;Enable the &lt;/span&gt;&lt;a href="https://console.cloud.google.com/flows/enableapi?apiid=dataproc"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Managed Service for Apache Spark API&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: decimal; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;If you aren’t the project owner, ask your project admin for the serverless Managed Service for Apache Spark &lt;/span&gt;&lt;a href="https://docs.cloud.google.com/iam/docs/roles-permissions/dataproc#dataproc.serverlessEditor"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Editor &lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;(&lt;/span&gt;&lt;code style="vertical-align: baseline;"&gt;roles/dataproc.serverlessEditor&lt;/code&gt;&lt;span style="vertical-align: baseline;"&gt;) role on the project.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Now you’re ready to &lt;/span&gt;&lt;a href="https://docs.cloud.google.com/dataproc-serverless/docs/quickstarts/spark-batch#submit_a_spark_batch_workload"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;start running your workloads&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; on the Serverless 3.0 runtime version.&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt; For more details, visit our updated &lt;/span&gt;&lt;a href="https://cloud.google.com/dataproc-serverless/docs/overview"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;documentation&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; and access serverless Managed Service for Apache Spark in the &lt;/span&gt;&lt;a href="https://console.cloud.google.com/dataproc"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Google Cloud console&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;</description><pubDate>Wed, 03 Jun 2026 16:00:00 +0000</pubDate><guid>https://cloud.google.com/blog/products/data-analytics/serverless-managed-service-for-apache-spark-runtime-3-0-features/</guid><category>Streaming</category><category>Data Analytics</category><og xmlns:og="http://ogp.me/ns#"><type>article</type><title>What’s new in serverless Managed Service for Apache Spark</title><description></description><site_name>Google</site_name><url>https://cloud.google.com/blog/products/data-analytics/serverless-managed-service-for-apache-spark-runtime-3-0-features/</url></og><author xmlns:author="http://www.w3.org/2005/Atom"><name>Vinay Londhe</name><title>Software Engineering Manager</title><department></department><company></company></author><author xmlns:author="http://www.w3.org/2005/Atom"><name>Bhooshan Mogal</name><title>Senior Product Manager</title><department></department><company></company></author></item><item><title>Accelerating data lakes: Optimizing Apache Iceberg and Spark with gcs-analytics-core</title><link>https://cloud.google.com/blog/products/data-analytics/optimize-iceberg-and-spark-workloads-with-gcs-analytics-core/</link><description>&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Many data engineers spend significant time managing compatibility and getting best performance across multiple analytics engines. To help solve this pain point, we are excited to announce &lt;/span&gt;&lt;a href="https://github.com/GoogleCloudPlatform/gcs-analytics-core" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;gcs-analytics-core&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, a new open-source Java library designed to centralize and accelerate analytics optimizations for &lt;/span&gt;&lt;a href="https://cloud.google.com/storage"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Google Cloud Storage (GCS)&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;With this, you get the flexibility to select your preferred analytics engine while achieving high performance on GCS. The gcs-analytics-core library provides optimizations across various analytics engines that you use today on GCS, like the Iceberg Spark engine and plan to expand to other analytics engines by the end of this year.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Built to be shared across major data processing frameworks like Apache Spark, this library consolidates and improves performance for analytics workloads on GCS. Available natively in the Apache Iceberg Java runtime starting from version &lt;/span&gt;&lt;a href="https://iceberg.apache.org/releases/" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;1.11.0&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, this library improves read operations for columnar formats like Parquet.&lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;What is the gcs-analytics-core library?&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;The gcs-analytics-core library is a centralized optimization layer that sits between your analytics engines — such as Apache Spark, Trino, and Apache Hive — and the underlying GCS Java SDK. It intercepts read calls and injects performance enhancements, providing a consistent experience without requiring framework-specific tuning.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;For Apache Iceberg users, it integrates into the GCSFileIO implementation, replacing traditional sequential reads with parallelized strategies to minimize latency and maximize throughput.&lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Key technical optimizations&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;The library introduces specific optimizations designed to reduce time spent on I/O and end-to-end execution time:&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Vectored I/O (threaded):&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; This feature improves read performance by fetching multiple data ranges in parallel within a single operation, reducing the overhead of GCS calls. Without this feature, the system needs to issue a separate call for each data range, increasing both the number of operations and open file latency for each request.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Smart Parquet prefetching:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; When reading Parquet data, analytics engines typically perform an initial read of the file’s footer, which contains the data structure and information about where specific data ranges are located. The library automatically prefetches this footer data in a single chunk (typically 50KB–100KB), avoiding the multiple network calls that often occur when engines repeatedly seek backward to fetch metadata..&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Spotlight: Apache Iceberg integration&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;We delivered the first major integration of this library into &lt;/span&gt;&lt;a href="https://iceberg.apache.org/" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Apache Iceberg&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;.&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt; &lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;With Iceberg 1.11.0 or later, analytics engines utilizing Iceberg’s GCSFileIO can leverage these performance enhancements. To adopt the library in your environment, verify your Iceberg catalog is configured to use the native GCS FileIO:&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-code"&gt;&lt;dl&gt;
    &lt;dt&gt;code_block&lt;/dt&gt;
    &lt;dd&gt;&amp;lt;ListValue: [StructValue([(&amp;#x27;code&amp;#x27;, &amp;#x27;# Spark configuration example\r\nspark.sql.catalog.my_catalog.io-impl=org.apache.iceberg.gcp.gcs.GCSFileIO&amp;#x27;), (&amp;#x27;language&amp;#x27;, &amp;#x27;&amp;#x27;), (&amp;#x27;caption&amp;#x27;, &amp;lt;wagtail.rich_text.RichText object at 0x7f84a6e90130&amp;gt;)])]&amp;gt;&lt;/dd&gt;
&lt;/dl&gt;&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Because the core optimizations are embedded within the updated Iceberg runtime and the GCS connector architecture, you automatically benefit from Parquet footer prefetching and multi-threaded vectored reads — with no complex custom tuning required.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;You can follow the specific integration details in Apache Iceberg &lt;/span&gt;&lt;a href="https://github.com/apache/iceberg/issues/14326" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Issue #14326&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Catalog compatibility&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;The gcs-analytics-core library is compatible with all Iceberg catalogs  including the REST catalog, Hive, and other metadata management systems. By decoupling the performance optimizations from the catalog management layer, the library provides consistent read improvements without requiring adjustments to your existing infrastructure setup so you can scale across diverse data lake architectures.&lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;TPC-DS Performance Benchmarks using Spark&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;To validate these improvements, end-to-end benchmarking was performed using an open source Apache Spark cluster with an Iceberg catalog configured to use GCSFileIO along with the gcs-analytics-core library.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;The benchmark leveraged the industry-standard &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;TPC-DS&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; schema across varying dataset sizes (from 1GB up to 10TB), specifically comparing the new library's optimizations against the default GCSFileIO implementation, which uses sequential vectored reads.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;By alleviating the I/O bottleneck at the storage layer, compute engines spend less time waiting for network responses (scan time) and more time processing data (execution time).&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Here are the end-to-end TPC-DS benchmark results showcasing the percentage improvement when enabling gcs-analytics-core:&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-image_full_width"&gt;






  
    &lt;div class="article-module h-c-page"&gt;
      &lt;div class="h-c-grid"&gt;
  

    &lt;figure class="article-image--large
      
      
        h-c-grid__col
        h-c-grid__col--6 h-c-grid__col--offset-3
        
        
      "
      &gt;

      
      
        
        &lt;img
            src="https://storage.googleapis.com/gweb-cloudblog-publish/images/TPC-DS_benchmark_for_gcs-analytics-core_I7.max-1000x1000.jpg"
        
          alt="TPC-DS benchmark for gcs-analytics-core"&gt;
        
        &lt;/a&gt;
      
    &lt;/figure&gt;

  
      &lt;/div&gt;
    &lt;/div&gt;
  




&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;div align="left"&gt;
&lt;div style="color: #5f6368; overflow-x: auto; overflow-y: hidden; width: 100%;"&gt;
&lt;div style="color: #5f6368; overflow-x: auto; overflow-y: hidden; width: 100%;"&gt;
&lt;div style="color: #5f6368; overflow-x: auto; overflow-y: hidden; width: 100%;"&gt;&lt;table style="width: 99.4778%;"&gt;&lt;colgroup&gt;&lt;col style="width: 29.2169%;"/&gt;&lt;col style="width: 32.5301%;"/&gt;&lt;col style="width: 38.253%;"/&gt;&lt;/colgroup&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;TPC-DS schema size&lt;/strong&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Scan time improvement&lt;/strong&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Execution time improvement&lt;/strong&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;1 GB&lt;/strong&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;71.51%&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;32.61%&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;10 GB&lt;/strong&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;48.48%&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;18.94%&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;100 GB&lt;/strong&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;40.98%&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;10.95%&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;1 TB&lt;/strong&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;35.86%&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;3.38%&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;10 TB&lt;/strong&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;18.40%&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;1.58%&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;As the data shows, there is a consistent improvement across all dataset sizes. The library is effective for the complex query patterns in TPC-DS, delivering scan time reductions that directly lower overall query execution time.&lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Get started&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Before running your Spark workloads, confirm that the following requirements and configurations are met:&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;Use Apache Iceberg Spark runtime 1.11.0+ and the iceberg-gcp-bundle 1.11.0+.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;Configure your catalog to use GCSFileIO.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;Enable the gcs-analytics-core optimization flag (&lt;/span&gt;&lt;code style="vertical-align: baseline;"&gt;spark.sql.catalog.$CATALOG_NAME.gcs.analytics-core.enabled=true&lt;/code&gt;&lt;span style="vertical-align: baseline;"&gt;).&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;Enable vectorized I/O (&lt;/span&gt;&lt;code style="vertical-align: baseline;"&gt;spark.sql.iceberg.vectorization.enabled=true&lt;/code&gt;&lt;span style="vertical-align: baseline;"&gt;) to achieve read performance.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;&lt;/div&gt;
&lt;div class="block-code"&gt;&lt;dl&gt;
    &lt;dt&gt;code_block&lt;/dt&gt;
    &lt;dd&gt;&amp;lt;ListValue: [StructValue([(&amp;#x27;code&amp;#x27;, &amp;#x27;spark-submit \\\r\n  --packages org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.11.0,org.apache.iceberg:iceberg-gcp-bundle:1.11.0 \\\r\n  --conf spark.sql.catalog.$CATALOG_NAME=org.apache.iceberg.spark.SparkCatalog \\\r\n  --conf spark.sql.catalog.$CATALOG_NAME.io-impl=org.apache.iceberg.gcp.gcs.GCSFileIO \\\r\n  --conf spark.sql.catalog.$CATALOG_NAME.gcs.analytics-core.enabled=true \\\r\n  --conf spark.sql.iceberg.vectorization.enabled=true \\\r\n  &amp;lt;your-application-jar-or-script&amp;gt;&amp;#x27;), (&amp;#x27;language&amp;#x27;, &amp;#x27;&amp;#x27;), (&amp;#x27;caption&amp;#x27;, &amp;lt;wagtail.rich_text.RichText object at 0x7f84a6e90190&amp;gt;)])]&amp;gt;&lt;/dd&gt;
&lt;/dl&gt;&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;The gcs-analytics-core library is open source and available for developers to contribute to the project and explore the source code. Our implementation and micro-benchmark configurations are part of the repository and can be referenced for your contributions or validations.&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;GitHub repository:&lt;/strong&gt;&lt;a href="https://github.com/GoogleCloudPlatform/gcs-analytics-core" rel="noopener" target="_blank"&gt;&lt;span style="vertical-align: baseline;"&gt; &lt;/span&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;GoogleCloudPlatform/gcs-analytics-core&lt;/span&gt;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Documentation:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Review the&lt;/span&gt;&lt;a href="https://github.com/GoogleCloudPlatform/gcs-analytics-core" rel="noopener" target="_blank"&gt;&lt;span style="vertical-align: baseline;"&gt; &lt;/span&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;design document&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; for deep architectural details.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;We want to hear about your experience. If you test this on your own datasets, please feel free to open an issue on GitHub or share your results with the community. We look forward to seeing how you utilize these optimizations in your data lakes.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;</description><pubDate>Tue, 02 Jun 2026 16:00:00 +0000</pubDate><guid>https://cloud.google.com/blog/products/data-analytics/optimize-iceberg-and-spark-workloads-with-gcs-analytics-core/</guid><category>Streaming</category><category>Data Analytics</category><og xmlns:og="http://ogp.me/ns#"><type>article</type><title>Accelerating data lakes: Optimizing Apache Iceberg and Spark with gcs-analytics-core</title><description></description><site_name>Google</site_name><url>https://cloud.google.com/blog/products/data-analytics/optimize-iceberg-and-spark-workloads-with-gcs-analytics-core/</url></og><author xmlns:author="http://www.w3.org/2005/Atom"><name>Ajay Yadav</name><title>Software Engineer</title><department></department><company></company></author><author xmlns:author="http://www.w3.org/2005/Atom"><name>Nivedita Aggarwal</name><title>Engineering Manager</title><department></department><company></company></author></item><item><title>The fully-managed Remote MCP Server for AlloyDB is now Generally Available</title><link>https://cloud.google.com/blog/products/data-analytics/alloydb-remote-mcp-server-ga-secure-ai-agent-access-to-your-data/</link><description>&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;AI agents possess incredible reasoning capabilities and can perform increasingly complex actions. But the reliability of agentic outcomes depends entirely on the quality of the context they can access&lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt; &lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;— context that is frequently locked away in operational databases.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;To bridge this gap, we are excited to announce the Remote Model Context Protocol (MCP) Server for &lt;/span&gt;&lt;a href="https://cloud.google.com/products/alloydb?e=13802955"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;AlloyDB&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; is now generally available. &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;The Model Context Protocol (MCP) is an open-source standard that gives LLMs a secure, consistent way to connect to external data sources. As part of Google Cloud’s recent rollout of &lt;/span&gt;&lt;a href="https://cloud.google.com/blog/products/ai-machine-learning/google-managed-mcp-servers-are-available-for-everyone?e=48754805"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;50+ Google-managed MCP servers&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, this new integration makes it easier than ever for both interactive and autonomous agents to securely harness the full power of your enterprise data. For example, you can now ask an AI agent for an up-to-the-millisecond view of your delivery fleet by connecting it to your real-time logistics data in AlloyDB, avoiding inaccuracies due to stale data and reducing the need for manual reporting.&lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Why AlloyDB is the strong foundation for agentic apps&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;By connecting MCP to AlloyDB, your agents get access to the premier database built for enterprise-grade AI. AlloyDB delivers the scale, speed, and intelligence required for the most demanding agentic workloads:&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Supercharged vector performance:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Scale to &lt;/span&gt;&lt;a href="https://docs.cloud.google.com/alloydb/docs/ai/choose-index-strategy#:~:text=Scales%20well%20to%2010B%20vectors"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;over 10 billion vectors&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; at up to 6x the speed of standard PostgreSQL for vector queries (and up to 10x faster for &lt;/span&gt;&lt;a href="https://docs.cloud.google.com/alloydb/docs/ai/filtered-vector-search-overview"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;filtered queries&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;) with the ScaNN index.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Advanced search and reranking:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Power multimodal applications with hybrid search via &lt;/span&gt;&lt;a href="https://docs.cloud.google.com/alloydb/docs/ai/create-rum-index"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;RUM&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; (in Preview) and intelligent reranking through Reciprocal Rank Fusion (RRF) or &lt;/span&gt;&lt;a href="https://docs.cloud.google.com/alloydb/docs/ai/rank-rerank-search-results-rag"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Gemini Enterprise Platform models&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Real-time intelligence:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Efficiently generate &lt;/span&gt;&lt;a href="https://docs.cloud.google.com/alloydb/docs/ai/generate-manage-auto-embeddings-for-tables"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;millions of embeddings&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; using built-in &lt;/span&gt;&lt;a href="https://docs.cloud.google.com/alloydb/docs/ai/ai-query-engine-landing"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;AI Functions&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; to facilitate low-latency, real-time agentic experiences.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Unified data access:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Give agents a single PostgreSQL interface to seamlessly join operational data in AlloyDB with analytical data in BigQuery or archived data in Iceberg tables via &lt;/span&gt;&lt;a href="https://docs.cloud.google.com/alloydb/docs/bigquery-view-alloydb-overview"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Lakehouse Federation&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Enterprise-grade scale:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Rest easy with a &lt;/span&gt;&lt;a href="https://cloud.google.com/alloydb/sla?e=48754805"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;99.99% SLA&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, &lt;/span&gt;&lt;a href="https://docs.cloud.google.com/alloydb/docs/overview#automatic"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;autopilot&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; database optimizations, and auto-scaling read pools with up to 20 nodes. &lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Why Remote MCP matters for AlloyDB&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Local MCP servers are great for local development, but communicating over standard input/output (stdio) streams becomes difficult when you scale to production workloads. It is both architecturally complex and administratively burdensome to provision and manage all of the infrastructure and security guardrails you need to run agents for high-value use cases that interact with sensitive operational data.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;The Remote MCP Server for AlloyDB runs on fully-managed Google Cloud infrastructure and exposes an HTTP endpoint that connects your AI applications to your data. This solves key challenges for teams building agents on PostgreSQL:&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Centralized discovery&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;: Find, secure, and manage your database's MCP server using &lt;/span&gt;&lt;a href="https://docs.cloud.google.com/agent-registry/overview"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Agent Registry&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Fully-managed HTTP endpoints&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;: No need to deploy or maintain the infrastructure required for connectivity. Configure your agent to use the endpoint to get started.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Fine-grained authorization&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;: Instead of using shared database passwords or API keys, you use &lt;/span&gt;&lt;a href="https://docs.cloud.google.com/iam/docs"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Identity and Access Management (IAM)&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; to restrict agents to specific tables, schemas, or views. With the read-only execute SQL tool, you can prevent your agent from making accidental changes and deletions from your database. &lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Operational instance management&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;: The AlloyDB toolset gives agents the ability to do more than run queries. Agents can update instances, export and import data, create backups, and restore clusters.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Model Armor protection&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;: &lt;/span&gt;&lt;a href="https://cloud.google.com/security/products/model-armor?e=13802955"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Model Armor&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; provides optional prompt and response security to screen and filter data, defending against prompt injections or accidental data exfiltration.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Audit logging&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;: Every query, action, and tool call goes to &lt;/span&gt;&lt;a href="https://docs.cloud.google.com/logging/docs/audit"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Cloud Audit Logs&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, giving security teams a full audit trail.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Let's see it in action: A quick demo&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Getting started with the AlloyDB Remote MCP server is a straightforward process. To see it in action in your own environment, you can follow our &lt;/span&gt;&lt;a href="https://codelabs.developers.google.com/alloydb-ai-mcp" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;new Codelab&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, which guides you through these essential steps:&lt;/span&gt;&lt;/p&gt;
&lt;ol&gt;
&lt;li aria-level="1" style="list-style-type: decimal; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;API &amp;amp; environment prep&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;: Enable the AlloyDB, &lt;/span&gt;&lt;a href="https://cloud.google.com/products/compute?e=13802955"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Compute Engine&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, and &lt;/span&gt;&lt;a href="https://cloud.google.com/products/gemini-enterprise-agent-platform?e=13802955"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Gemini Enterprise&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; APIs in your Google Cloud project.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: decimal; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Provision your database&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;: Deploy your AlloyDB cluster, create your database, and import your sample data.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: decimal; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Enable data access API&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;: Permit the &lt;/span&gt;&lt;a href="https://docs.cloud.google.com/alloydb/docs/ai/use-alloydb-mcp#execute-sql"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Data Access API&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; on your AlloyDB instance.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: decimal; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Connect the agent&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;: Configure your MCP client by providing the remote endpoint (&lt;/span&gt;&lt;code style="vertical-align: baseline;"&gt;https://alloydb.googleapis.com/mcp&lt;/code&gt;&lt;span style="vertical-align: baseline;"&gt;). Pass your Google Cloud IAM credentials using an OAuth 2.0 bearer token in the HTTP Authorization header.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Once the connection is established, your agent can provide reliable, grounded answers to complex business questions using your real-time operational data. By performing introspection queries, the agent automatically understands your database schema – including tables and columns – enabling it to construct sophisticated joins and queries to fulfill user requests accurately.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-image_full_width"&gt;






  
    &lt;div class="article-module h-c-page"&gt;
      &lt;div class="h-c-grid"&gt;
  

    &lt;figure class="article-image--large
      
      
        h-c-grid__col
        h-c-grid__col--6 h-c-grid__col--offset-3
        
        
      "
      &gt;

      
      
        
        &lt;img
            src="https://storage.googleapis.com/gweb-cloudblog-publish/original_images/1_-_Setup.gif"
        
          alt="1 - Setup"&gt;
        
        &lt;/a&gt;
      
    &lt;/figure&gt;

  
      &lt;/div&gt;
    &lt;/div&gt;
  




&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Once your agent has access to the AlloyDB toolset, it can execute queries, analyze operational trends, and dynamically rank text data using AlloyDB &lt;/span&gt;&lt;a href="https://docs.cloud.google.com/alloydb/docs/ai/ai-query-engine-landing"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;AI functions&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; like &lt;/span&gt;&lt;code style="vertical-align: baseline;"&gt;AI.RANK()&lt;/code&gt;&lt;span style="vertical-align: baseline;"&gt;.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-image_full_width"&gt;






  
    &lt;div class="article-module h-c-page"&gt;
      &lt;div class="h-c-grid"&gt;
  

    &lt;figure class="article-image--large
      
      
        h-c-grid__col
        h-c-grid__col--6 h-c-grid__col--offset-3
        
        
      "
      &gt;

      
      
        
        &lt;img
            src="https://storage.googleapis.com/gweb-cloudblog-publish/original_images/2_-_Rank.gif"
        
          alt="2 - Rank"&gt;
        
        &lt;/a&gt;
      
    &lt;/figure&gt;

  
      &lt;/div&gt;
    &lt;/div&gt;
  




&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Security remains paramount: the Remote MCP Server for AlloyDB integrates seamlessly with Model Armor. This provides protection against sensitive data leaks, even if the agent’s service account possesses broad access permissions within the database. &lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-image_full_width"&gt;






  
    &lt;div class="article-module h-c-page"&gt;
      &lt;div class="h-c-grid"&gt;
  

    &lt;figure class="article-image--large
      
      
        h-c-grid__col
        h-c-grid__col--6 h-c-grid__col--offset-3
        
        
      "
      &gt;

      
      
        
        &lt;img
            src="https://storage.googleapis.com/gweb-cloudblog-publish/original_images/3_-_Secure.gif"
        
          alt="3 - Secure"&gt;
        
        &lt;/a&gt;
      
    &lt;/figure&gt;

  
      &lt;/div&gt;
    &lt;/div&gt;
  




&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Watch the full demo below!&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-video"&gt;



&lt;div class="article-module article-video "&gt;
  &lt;figure&gt;
    &lt;a class="h-c-video h-c-video--marquee"
      href="https://youtube.com/watch?v=-dPZ19fGM20"
      data-glue-modal-trigger="uni-modal--dPZ19fGM20-"
      data-glue-modal-disabled-on-mobile="true"&gt;

      
        

        &lt;div class="article-video__aspect-image"
          style="background-image: url(https://storage.googleapis.com/gweb-cloudblog-publish/images/maxresdefault_ZNMrpaE.max-1000x1000.jpg);"&gt;
          &lt;span class="h-u-visually-hidden"&gt;How to connect AI agents directly to your enterprise data: Introducing the AlloyDB remote MCP server&lt;/span&gt;
        &lt;/div&gt;
      
      &lt;svg role="img" class="h-c-video__play h-c-icon h-c-icon--color-white"&gt;
        &lt;use xlink:href="#mi-youtube-icon"&gt;&lt;/use&gt;
      &lt;/svg&gt;
    &lt;/a&gt;

    
  &lt;/figure&gt;
&lt;/div&gt;

&lt;div class="h-c-modal--video"
     data-glue-modal="uni-modal--dPZ19fGM20-"
     data-glue-modal-close-label="Close Dialog"&gt;
   &lt;a class="glue-yt-video"
      data-glue-yt-video-autoplay="true"
      data-glue-yt-video-height="99%"
      data-glue-yt-video-vid="-dPZ19fGM20"
      data-glue-yt-video-width="100%"
      href="https://youtube.com/watch?v=-dPZ19fGM20"
      ng-cloak&gt;
   &lt;/a&gt;
&lt;/div&gt;

&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;What's next&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;By enabling agents to interact securely with transactional data, we are embracing an architecture where AI agents can reliably access and act upon your enterprise’s single source of truth. &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Ready to build? Discover AlloyDB with a &lt;/span&gt;&lt;a href="https://docs.cloud.google.com/alloydb/docs/free-trial-cluster"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;30-day free trial&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, and dive into the &lt;/span&gt;&lt;a href="https://codelabs.developers.google.com/alloydb-ai-mcp" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Remote MCP for AlloyDB Codelab&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; to start powering your enterprise agentic applications today.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;</description><pubDate>Mon, 01 Jun 2026 16:00:00 +0000</pubDate><guid>https://cloud.google.com/blog/products/data-analytics/alloydb-remote-mcp-server-ga-secure-ai-agent-access-to-your-data/</guid><category>AI &amp; Machine Learning</category><category>Data Analytics</category><og xmlns:og="http://ogp.me/ns#"><type>article</type><title>The fully-managed Remote MCP Server for AlloyDB is now Generally Available</title><description></description><site_name>Google</site_name><url>https://cloud.google.com/blog/products/data-analytics/alloydb-remote-mcp-server-ga-secure-ai-agent-access-to-your-data/</url></og><author xmlns:author="http://www.w3.org/2005/Atom"><name>Paul Ramsey</name><title>Product Manager, AlloyDB, Cloud SQL, Google Cloud</title><department></department><company></company></author><author xmlns:author="http://www.w3.org/2005/Atom"><name>Gleb Otochkin</name><title>Cloud Advocate, Databases, Google Cloud</title><department></department><company></company></author></item><item><title>Modeling a digital twin of a food supply chain using BigQuery Graph</title><link>https://cloud.google.com/blog/products/data-analytics/modeling-a-digital-twin-using-bigquery-graph/</link><description>&lt;div class="block-paragraph_advanced"&gt;&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;The example of a growing restaurant&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Imagine you are running a restaurant chain. You just can't physically feel and touch things to know how your business operates. You need tools and a digital replica of your business to&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt; sense the health of the business for you.&lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;The friction of growth&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Growth creates a unique kind of friction that spreadsheets simply weren't built to solve:&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;The bullwhip effect:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Small downstream demand shifts swell into upstream inventory tidal waves.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;SOP drift:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Tiny departures from standard prep work eventually erode the entire brand vibe.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;The food safety blast radius:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; One contaminated ingredient creates a messy, complex map of risk across the network.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Maverick spend:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; The "million-dollar leak" caused by local managers purchasing ingredients off-contract.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;The digital twin&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Digital models empower us to ask more insightful questions about the world, but they also force a critical choice in how we structure data. While traditional relational tables have been the standard, we must ask: are they still the right tool for everything? Given that our world is inherently interconnected, perhaps shifting to graph-based models is the natural evolution for capturing reality.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;When managing thousands of assets, complex supply chains, or global logistics networks, traditional relational databases require massive, resource-intensive SQL joins to trace dependencies. This architecture creates a latency gap between physical events and operational awareness.&lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Modeling with BigQuery Graph&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;BigQuery Graph allows you to build a digital twin of your entire supply chain within your existing data platform. By turning your physical world—items, recipes, and locations—into a searchable map of nodes and edges, you gain a new level of clarity.&lt;/span&gt;&lt;/p&gt;
&lt;h4&gt;&lt;strong style="vertical-align: baseline;"&gt;1. Defining the Semantic Layer&lt;/strong&gt;&lt;/h4&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Instead of moving data to a new database, you create a Graph View over your existing tables. This tells BigQuery exactly how your tables relate to one another.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Query Language:&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-code"&gt;&lt;dl&gt;
    &lt;dt&gt;code_block&lt;/dt&gt;
    &lt;dd&gt;&amp;lt;ListValue: [StructValue([(&amp;#x27;code&amp;#x27;, &amp;#x27;# Build the Graph Nodes &amp;amp; Edges\r\nCREATE or REPLACE PROPERTY GRAPH `restaurant.bombod`\r\nNODE TABLES (\r\n  `restaurant.item` label item properties all columns,\r\n  `restaurant.location` label location properties all columns,\r\n  `restaurant.itemlocation` label itemlocation properties all columns\r\n)\r\nEDGE TABLES (\r\n  `restaurant.bom`\r\n  KEY(bomKey)\r\n  SOURCE KEY (childItemLocation) REFERENCES `restaurant.itemlocation`(itemLocationKey)\r\n  DESTINATION KEY (parentItemLocation) REFERENCES `restaurant.itemlocation`(itemLocationKey)\r\n  LABEL consists_of properties all columns\r\n);&amp;#x27;), (&amp;#x27;language&amp;#x27;, &amp;#x27;&amp;#x27;), (&amp;#x27;caption&amp;#x27;, &amp;lt;wagtail.rich_text.RichText object at 0x7f84b453e2b0&amp;gt;)])]&amp;gt;&lt;/dd&gt;
&lt;/dl&gt;&lt;/div&gt;
&lt;div class="block-image_full_width"&gt;






  
    &lt;div class="article-module h-c-page"&gt;
      &lt;div class="h-c-grid"&gt;
  

    &lt;figure class="article-image--large
      
      
        h-c-grid__col
        h-c-grid__col--6 h-c-grid__col--offset-3
        
        
      "
      &gt;

      
      
        
        &lt;img
            src="https://storage.googleapis.com/gweb-cloudblog-publish/images/1_6on1ArC.max-1000x1000.png"
        
          alt="1"&gt;
        
        &lt;/a&gt;
      
        &lt;figcaption class="article-image__caption "&gt;&lt;p data-block-key="zg2w6"&gt;Image of a fictitious restaurant supply chain modeled using BigQuery Graph&lt;/p&gt;&lt;/figcaption&gt;
      
    &lt;/figure&gt;

  
      &lt;/div&gt;
    &lt;/div&gt;
  




&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Precision in practice&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;How does this change daily operations? It moves the business from panic to precision.&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Surgical recalls:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; If a supplier reports a Listeria breakout, you walk the graph forward to find exactly which menu items in which specific restaurants are affected.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Weather risk analysis:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; When a hurricane threatens a distribution center, you don't see a list of stores; you see the blast radius. You identify the locations critically dependent on that hub and reroute supplies.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h4&gt;&lt;strong style="vertical-align: baseline;"&gt;2. Executing the search&lt;/strong&gt;&lt;/h4&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Graph Queries are a new tool for modelers and data scientists to query their data - it simplifies complex multi-domain data concepts and simplifies querying and makes data analysis a simpler more natural representation of problem articulation. For example: If I want to know which all locations handle chicken I could run a graph query as shown below:&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;To investigate a specific complaint or risk, you run a search on the model using graph query language. &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Graph Query Language&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-code"&gt;&lt;dl&gt;
    &lt;dt&gt;code_block&lt;/dt&gt;
    &lt;dd&gt;&amp;lt;ListValue: [StructValue([(&amp;#x27;code&amp;#x27;, &amp;quot;# Navigate to the source of a specific ingredient issue\r\nGraph restaurant.bombod\r\nMATCH (a:itemlocation)-[c:consists_of]-&amp;gt;(b:itemlocation) \r\nWHERE b.itemKey LIKE &amp;#x27;%Chicken%&amp;#x27;\r\nRETURN to_json([to_json(a),to_json(c),to_json(b)]) as result&amp;quot;), (&amp;#x27;language&amp;#x27;, &amp;#x27;&amp;#x27;), (&amp;#x27;caption&amp;#x27;, &amp;lt;wagtail.rich_text.RichText object at 0x7f84b453ecd0&amp;gt;)])]&amp;gt;&lt;/dd&gt;
&lt;/dl&gt;&lt;/div&gt;
&lt;div class="block-image_full_width"&gt;






  
    &lt;div class="article-module h-c-page"&gt;
      &lt;div class="h-c-grid"&gt;
  

    &lt;figure class="article-image--large
      
      
        h-c-grid__col
        h-c-grid__col--6 h-c-grid__col--offset-3
        
        
      "
      &gt;

      
      
        
        &lt;img
            src="https://storage.googleapis.com/gweb-cloudblog-publish/images/2_aIlciIs.max-1000x1000.png"
        
          alt="2"&gt;
        
        &lt;/a&gt;
      
        &lt;figcaption class="article-image__caption "&gt;&lt;p data-block-key="zg2w6"&gt;Source of a foul odor - modeled as a graph&lt;/p&gt;&lt;/figcaption&gt;
      
    &lt;/figure&gt;

  
      &lt;/div&gt;
    &lt;/div&gt;
  




&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Building for the future&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;To get the most out of your digital twin, follow these guiding principles:&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Focus on structure:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Use graphs for relationships and dependencies; keep daily sales totals in relational tables.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Clean your keys:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Spend time on data engineering; a graph is only as strong as its connections.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Capture edge properties:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Store metadata like lead times or shipping costs directly on the edges to increase the model's utility.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Conclusion&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;The restaurant industry has outgrown the relational way of treating business data only as a list. By building inter-domain relationships as a digital twin with BigQuery Graph, you move from reactive problem solving to proactive modeling. It’s time to stop managing your network with a list and start seeing the connections in seconds.&lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Get started today&lt;/strong&gt;&lt;/h3&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Check out the tutorial &lt;/strong&gt;&lt;a href="https://codelabs.developers.google.com/codelabs/supplychaingraph#0" rel="noopener" target="_blank"&gt;&lt;strong style="text-decoration: underline; vertical-align: baseline;"&gt;here&lt;/strong&gt;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Visit the BigQuery documentation:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; find &lt;/span&gt;&lt;a href="https://docs.cloud.google.com/bigquery/docs/graph-overview"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;overview &lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;and &lt;/span&gt;&lt;a href="https://docs.cloud.google.com/bigquery/docs/graph-create"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;quickstart guide&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Share your feedback:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; join our &lt;/span&gt;&lt;a href="http://tinyurl.com/bqgraph-userforum" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;community&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, and get your questions answered via &lt;/span&gt;&lt;a href="mailto:bq-graph-preview-support@google.com"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;bq-graph-preview-support@google.com&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;&lt;strong style="vertical-align: baseline;"&gt;Related blog: &lt;/strong&gt;&lt;a href="https://cloud.google.com/blog/products/data-analytics/introducing-bigquery-graph?e=48754805"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Introducing BigQuery Graph&lt;/span&gt;&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;&lt;/div&gt;</description><pubDate>Mon, 01 Jun 2026 16:00:00 +0000</pubDate><guid>https://cloud.google.com/blog/products/data-analytics/modeling-a-digital-twin-using-bigquery-graph/</guid><category>BigQuery</category><category>Databases</category><category>Data Analytics</category><og xmlns:og="http://ogp.me/ns#"><type>article</type><title>Modeling a digital twin of a food supply chain using BigQuery Graph</title><description></description><site_name>Google</site_name><url>https://cloud.google.com/blog/products/data-analytics/modeling-a-digital-twin-using-bigquery-graph/</url></og><author xmlns:author="http://www.w3.org/2005/Atom"><name>Guru Rangavittal</name><title>Cloud Transformation Technical Lead, Google Cloud</title><department></department><company></company></author><author xmlns:author="http://www.w3.org/2005/Atom"><name>Candice Chen</name><title>Product Manager, BigQuery</title><department></department><company></company></author></item><item><title>Cool stuff Google Cloud customers built, May edition: Agentic algorithms for supply chains; virtual try-on APIs; robotic camera operators &amp; more</title><link>https://cloud.google.com/blog/topics/customers/cool-stuff-google-cloud-customers-built-monthly-round-up/</link><description>&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;AI and cloud technology are reshaping every corner of every industry around the world. Without our customers, who are building the future on our platform, there would be no Google &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;Cloud. In this &lt;/span&gt;&lt;a href="https://cloud.google.com/blog/topics/customers/cool-stuff-google-cloud-customers-built-monthly-round-up-april-2026"&gt;&lt;span style="font-style: italic; text-decoration: underline; vertical-align: baseline;"&gt;regular round-up&lt;/span&gt;&lt;/a&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;, we dive into some of the exciting projects redefining businesses, shaping industries, and creating new categories. &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;For our latest edition, we learn how &lt;/span&gt;&lt;strong style="font-style: italic; vertical-align: baseline;"&gt;Urban Outfitters&lt;/strong&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt; sped up its order management; &lt;/span&gt;&lt;strong style="font-style: italic; vertical-align: baseline;"&gt;BASF&lt;/strong&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt; uses AlphaEvolve algorithms to map global supply chains; the unification strategy for &lt;/span&gt;&lt;strong style="font-style: italic; vertical-align: baseline;"&gt;UKG&lt;/strong&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;’s workforce intelligence; &lt;/span&gt;&lt;strong style="font-style: italic; vertical-align: baseline;"&gt;WPP&lt;/strong&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;’s secrets to training humanoid robot camera operators; how &lt;/span&gt;&lt;strong style="font-style: italic; vertical-align: baseline;"&gt;Breuninger&lt;/strong&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt; piloted Virtual Try-On APIs; creating automated video clips with &lt;/span&gt;&lt;strong style="font-style: italic; vertical-align: baseline;"&gt;Glance&lt;/strong&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;; and &lt;/span&gt;&lt;strong style="font-style: italic; vertical-align: baseline;"&gt;Movix&lt;/strong&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt; improves the production of dental aligners.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;Be sure to check back next month to see how more industry leaders and exciting startups are putting Google Cloud technologies to use. And if you haven’t already, please peruse our list of &lt;/span&gt;&lt;a href="https://workspace.google.com/blog/ai-and-machine-learning/how-our-customers-are-using-ai-for-business" rel="noopener" target="_blank"&gt;&lt;span style="font-style: italic; text-decoration: underline; vertical-align: baseline;"&gt;1,302 real-world gen AI use cases&lt;/span&gt;&lt;/a&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt; from our customers.&lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Urban Outfitters saves big by migrating order management&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Who:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Urban Outfitters, Inc. (URBN), the popular clothing and home goods retailer, relies on IBM Sterling OMS as the nerve center of its global ecommerce operations. However, the foundation of this critical system — a massive 11TB Oracle database — was increasingly becoming a bottleneck.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://cloud.google.com/blog/products/databases/urban-outfitters-moves-sterling-oms-to-alloydb-for-postgresql"&gt;&lt;strong style="text-decoration: underline; vertical-align: baseline;"&gt;What they did:&lt;/strong&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; URBN completed a major infrastructure upgrade, migrating its IBM Sterling OMS from an Oracle database to &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;Google Cloud's AlloyDB for PostgreSQL&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;. To enhance performance and provide high availability and scalability, the AlloyDB deployment architecture includes two read replicas, providing low-latency access to data for reporting and analytics. Google Cloud and IBM teams also assisted URBN in a rigorous, iterative switchover testing strategy.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Why it matters:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; The migration to AlloyDB has fundamentally reshaped URBN’s data strategy, delivering a &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;more favorable total cost of ownership&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; through an optimized storage and compute architecture, without sacrificing performance or reliability. Furthermore, the shift to a PostgreSQL-compatible database gave URBN the flexibility of an open-source ecosystem, providing &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;freedom from vendor lock-in&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;, as well as &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;significant speed improvements &lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;that enhanced responsiveness.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Learn from us:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; "URBN’s successful migration serves as a blueprint for organizations looking to modernize their mission-critical infrastructure and future-proof their environment for AI expansion. This journey proves that even the most complex, mission-critical migrations can be achieved through deep cross-organizational partnership and a phased, risk-mitigated approach." – &lt;/span&gt;&lt;strong style="font-style: italic; vertical-align: baseline;"&gt;Rob Frieman&lt;/strong&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;, CIO, Urban Outfitters &amp;amp;&lt;/span&gt;&lt;strong style="font-style: italic; vertical-align: baseline;"&gt; Raj Pai&lt;/strong&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;, VP, Product Management, Databases, Google Cloud&lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;BASF manages supply chain decisions with AlphaEvolve&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Who:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; BASF Agricultural Solutions manages a complex network of 180 production sites with more than 5,000 distinct value chains. Currently, human planners make thousands of local decisions every day on what to produce, when to produce it, and how much safety stock to hold.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://cloud.google.com/blog/products/ai-machine-learning/how-basf-manages-thousands-of-supply-chain-decisions-with-alphaevolve"&gt;&lt;strong style="text-decoration: underline; vertical-align: baseline;"&gt;What they did:&lt;/strong&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; To understand how local decisions ripple across their entire global network, BASF turned to &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;AlphaEvolve on Google Cloud&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; to build a digital twin of their supply chain. In collaboration with Google Cloud and prognostica GmbH, BASF fed the model three years of historical data and then generated variations of the code, mutating the logic to see if it could simulate a supply chain that matched the real-world historical data.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Why it matters:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; By running thousands of experiments, AlphaEvolve developed a clear, human-readable algorithm that explains how the BASF network truly operates. The final algorithm successfully mirrored the actual historical performance of the supply chain, significantly &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;reducing the error rates&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; compared to the initial seed model. It automatically discovered factually correct, domain-specific supply chain rules, providing a clear foundation for &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;optimizing asset utilization globally&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Learn from us:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; “We had several attempts to build a digital twin. … By using AlphaEvolve, we cannot only map the complex network based on system data, but at the same time understand and copy the human decisions that drive our daily operations.” – &lt;/span&gt;&lt;strong style="font-style: italic; vertical-align: baseline;"&gt;Dr. Goetz Krabbe&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;, &lt;/span&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;vice president for global supply chain at BASF&lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;UKG unlocks real-time workforce intelligence at scale&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Who:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; UKG is one of the leading providers of human capital management (HCM) and workforce management (WFM) solutions, but years of growth led to backend sprawl. They have 126 application teams, dozens of tech stacks, and more than 12,000 database instances.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://cloud.google.com/blog/products/databases/how-ukg-taps-workforce-intelligence-with-the-agentic-data-cloud"&gt;&lt;strong style="text-decoration: underline; vertical-align: baseline;"&gt;What they did:&lt;/strong&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; To bring the full UKG suite onto one real-time foundation, the company built People Fabric, a new data and intelligence platform powered by &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;AlloyDB for PostgreSQL&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; and the just-announced &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;Agentic Data Cloud&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;. They created a custom change data capture (CDC) framework to extract changes from existing operational databases, and for larger analytical workloads, the same data flows into &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;BigQuery&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;, while &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;Cloud SQL&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; holds the metadata and tenancy context.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Why it matters:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; People Fabric gives UKG a complete and consistent view of people, work, pay, and culture data that’s updated continuously and ready for AI to use in real time. For engineering teams, People Fabric acts as a database-as-a-service that &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;accelerates development and supports modernization&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; without customer disruption. Additionally, migrating core person and employment data off their on-prem monolith has generated &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;cost savings significant enough to fund half of People Fabric&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Learn from us: “&lt;/strong&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;As we continue expanding People Fabric, we’re laying the groundwork for deeper agentic automation, more responsive analytics, and a growing set of AI-driven capabilities — all on a trusted, scalable foundation built for what’s next.” – &lt;/span&gt;&lt;strong style="font-style: italic; vertical-align: baseline;"&gt;Radhi Chagarlamudi&lt;/strong&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;, Group Vice President, Product Engineering, UKG &amp;amp; &lt;/span&gt;&lt;strong style="font-style: italic; vertical-align: baseline;"&gt;Heather White&lt;/strong&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;, Cloud Data Architect, Google Cloud&lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;WPP accelerates humanoid robot training 10x with G4 VMs&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Who:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; WPP is one of the world’s largest marketing organizations, handling $70 billion of media for enterprise clients. They work on some of the most complex commercial film shoots and were eager to test the viability of robotic cameras to capture more footage, but this required complex training of physical models AI.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://cloud.google.com/blog/products/infrastructure/wpp-humanoid-robots-ai-training"&gt;&lt;strong style="text-decoration: underline; vertical-align: baseline;"&gt;What they did:&lt;/strong&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; WPP used the new &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;G4 VM instance&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; powered by NVIDIA RTX PRO 6000 Blackwell on Google Cloud to tackle the unique challenges of training physical AI for robotics in videography settings. After capturing human motion with the OptiTrack mocap system, they undertook reinforcement learning using the &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;AI Hypercomputer&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; together with the NVIDIA Isaac Sim image. &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;MuJoCo&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;, an open source physics engine by Google DeepMind, was a critical piece of simulation software that validated accuracy continuously, in real-time.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Why it matters:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; WPP was able to utilize a P2P topology that moves data directly between GPUs without the bottleneck of central processing. They saw &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;speed increases in excess of 10x&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;, taking training times down to less than one hour. Through high-volume simulation, the humanoid robots learned how to respond to small changes and bridge the tough "sim-to-real" gap, helping ensure the robot's simulated adaptability translated to safety and stability in the real world.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Learn from us:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; "Our process for mastering complex, natural movement on a film set can be replicated across industries to overcome the massive computational complexity of training robots." – &lt;/span&gt;&lt;strong style="font-style: italic; vertical-align: baseline;"&gt;Perry Nightingale&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;, &lt;/span&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;SVP of Creative AI, WPP&lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Breuninger boosted sales with its "be your own model" AI&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Who:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Breuninger, a fashion and lifestyle company based in Germany, thought emerging generative media models could be a good fit to answer the question every online fashion shopper asks: "How will this look on me?"&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://cloud.google.com/blog/topics/retail/how-breuninger-boosted-sales-with-its-be-your-own-model-ai"&gt;&lt;strong style="text-decoration: underline; vertical-align: baseline;"&gt;What they did:&lt;/strong&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; Working with Google Cloud, they built a virtual try-on experience that lets shoppers see high-end fashion on their own bodies using a simple selfie. Using the &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;Virtual Try-On (VTO) API&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;, Breuninger’s data team worked directly with Google’s engineers to test and refine the technology in three stages, ultimately moving from pre-selected models to a user-first, selfie-based approach. The project was also part of Breuninger’s move to a &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;Flutter&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;-based platform, which helped the team move from its vision to a live launch in only three months.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Why it matters:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; During a six-week A/B test over Black Week and the holiday season, the team found that shoppers who used the virtual try-on &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;converted purchases at a higher rate &lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;than those who didn't. Customer surveys reinforced the numbers: shoppers responded well to the &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;high image quality&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; and the &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;personalized experience&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Learn from us: &lt;/strong&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;“Breuninger continues to refine the experience based on how customers actually use virtual try-on in everyday shopping — the same user-first approach that shaped the project from the start.” – &lt;/span&gt;&lt;strong style="font-style: italic; vertical-align: baseline;"&gt;Daniel Rascher&lt;/strong&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;, Senior Product Owner, Breuninger &amp;amp; &lt;/span&gt;&lt;strong style="font-style: italic; vertical-align: baseline;"&gt;Dr. Michael Menzel&lt;/strong&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;, Customer AI Specialist, Google Cloud&lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Glance turns hours of video into mobile-ready clips&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Who:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Glance, a mobile-first content platform, processes 1-2 hour videos from sources like podcasts, news reports, movies, and web series, and transforms them into 30 to 180-second vertical clips optimized for mobile lock screens.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://cloud.google.com/blog/products/media-entertainment/how-glance-turns-hours-of-video-into-mobile-ready-clips-with-ai"&gt;&lt;strong style="text-decoration: underline; vertical-align: baseline;"&gt;What they did:&lt;/strong&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; The goal was to create a complete pipeline that takes a long-form landscape video (16:9) and outputs multiple ready-to-publish short-form portrait videos (9:16). The final technical solution uses &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;Google Cloud Speech-to-Text v2&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;, &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;Gemini&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;, and the &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;Google Vision API&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;, combined with custom video manipulation using Samurai (an open-source object tracking tool), OpenCV and MoviePy. The process involves audio extraction, speech-to-text transcription, and using &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;Gemini 2.5 Flash&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; to analyze transcript text and identify optimal start and end timestamps for short video clips.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Why it matters:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; With daily volume projected to grow from 3,500 to over 10,000 videos per day, manual editing wasn’t a realistic path forward. Glance’s video pipeline demonstrates what becomes possible when AI handles the repetitive, judgement-intensive work of video editing. The system transforms thousands of long-form videos into mobile-ready clips each day, &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;preserving narrative context while optimizing for vertical viewing&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;. Rather than choosing between scale and quality, automated pipelines can &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;deliver both&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;.&lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;Learn from us:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; &lt;/span&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;“Glance’s video pipeline demonstrates what becomes possible when AI handles the repetitive, judgement-intensive work of video editing. … The approach offers a template for any organization sitting on long-form video archives. Rather than choosing between scale and quality, automated pipelines can deliver both.” – &lt;/span&gt;&lt;strong style="font-style: italic; vertical-align: baseline;"&gt;Himanshu Aggarwal&lt;/strong&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;,&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;Machine Learning Engineer, Glance &amp;amp; &lt;/span&gt;&lt;strong style="font-style: italic; vertical-align: baseline;"&gt;Sharmila Devi&lt;/strong&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;, AI Consulting Lead, Google Cloud&lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Movix fills a gap in dental skills with specialized agentic AI&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Who:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Movix is building one of the first agentic AI solutions for dental appliance manufacturers and dental labs, to help solve a serious shortage of skilled dental technicians in aligner manufacturing.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://cloud.google.com/blog/topics/startups/filling-the-gaps-in-dental-skills-with-specialized-agentic-ai"&gt;&lt;strong style="text-decoration: underline; vertical-align: baseline;"&gt;What they did:&lt;/strong&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; Movix developed custom models for deep learning, computer vision, and 3D mesh analysis over a five-month period, using Google Cloud infrastructure. Once defects are detected, they use the &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;Gemini Enterprise Agent Platform&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; to generate client-facing feedback that reads as if it came directly from a human technician. Their 3D models use &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;Cloud Run with L4 GPUs&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; for the massive compute power required, and they use &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;Compute Engine VMs&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; to run experiments and train models.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Why it matters:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Movix’s agentic solutions automate data entry and quality control, which are traditionally manual, time-consuming, and error-prone tasks. The automation and higher level of accuracy the QC agent delivers can &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;save $300 per remake&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; for an aligner manufacturer, and speed up the appliance manufacturing process with quicker turnaround times.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Learn from us:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; “&lt;/span&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;We plan to build hybrid solutions … designing an architecture that connects our cloud-based AI agents with older, on-premises software that many conservative labs still use — through lightweight local connectors and standardized APIs. This will allow us to access a large market segment that has not yet migrated to the cloud.&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;” – &lt;/span&gt;&lt;strong style="font-style: italic; vertical-align: baseline;"&gt;Marina Domracheva&lt;/strong&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;,&lt;/span&gt;&lt;strong style="font-style: italic; vertical-align: baseline;"&gt; &lt;/strong&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;CEO, Movix &amp;amp; &lt;/span&gt;&lt;strong style="font-style: italic; vertical-align: baseline;"&gt;Bakit Dzhumagulov, &lt;/strong&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;CTO, Movix&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;</description><pubDate>Fri, 29 May 2026 16:00:00 +0000</pubDate><guid>https://cloud.google.com/blog/topics/customers/cool-stuff-google-cloud-customers-built-monthly-round-up/</guid><category>Partners</category><category>AI &amp; Machine Learning</category><category>Data Analytics</category><category>Application Modernization</category><category>Infrastructure Modernization</category><category>Customers</category><media:content height="540" url="https://storage.googleapis.com/gweb-cloudblog-publish/images/cool_stuff_may.max-600x600.png" width="540"></media:content><og xmlns:og="http://ogp.me/ns#"><type>article</type><title>Cool stuff Google Cloud customers built, May edition: Agentic algorithms for supply chains; virtual try-on APIs; robotic camera operators &amp; more</title><description></description><image>https://storage.googleapis.com/gweb-cloudblog-publish/images/cool_stuff_may.max-600x600.png</image><site_name>Google</site_name><url>https://cloud.google.com/blog/topics/customers/cool-stuff-google-cloud-customers-built-monthly-round-up/</url></og><author xmlns:author="http://www.w3.org/2005/Atom"><name>Google Cloud Content &amp; Editorial </name><title></title><department></department><company></company></author></item><item><title>From petabytes to predictions: Easy BigQuery insights in Google Sheets</title><link>https://cloud.google.com/blog/products/data-analytics/using-connected-sheets-to-analyze-bigquery-data/</link><description>&lt;div class="block-paragraph_advanced"&gt;&lt;p style="text-align: justify;"&gt;&lt;span style="vertical-align: baseline;"&gt;Many organizations’ single source of truth is data that resides in BigQuery, Google’s governed, secure and petabyte-scale data platform. However, the "last mile" of ad-hoc analysis, modeling, and reporting often happens where business users are most comfortable: Google Sheets.&lt;/span&gt;&lt;/p&gt;
&lt;p style="text-align: justify;"&gt;&lt;span style="vertical-align: baseline;"&gt;Bridging this gap usually involves exporting data as CSVs. But this is inefficient, creating data silos, version control problems, and security and governance risks. &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;Connected Sheets&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; helps to eliminate this trade-off, turning the familiar Google Sheets interface into a &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;direct, live window&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; into your BigQuery data platform, letting you analyze petabytes of data quickly, securely, and easily.&lt;/span&gt;&lt;/p&gt;
&lt;p style="text-align: justify;"&gt;&lt;span style="vertical-align: baseline;"&gt;In this post, we’ll do a quick overview of Connected Sheets, walk through real-world use cases, and show you how to perform enterprise-grade data analysis using BigQuery directly in Google Sheets. &lt;/span&gt;&lt;/p&gt;
&lt;h3 style="text-align: justify;"&gt;&lt;span style="vertical-align: baseline;"&gt;A live window into the single source of truth&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Business users often wait days or weeks for simple reports. Connected Sheets solves this by letting you analyze your critical data via a secure, direct connection to billions of rows of live data, with no SQL required. &lt;/span&gt;&lt;/p&gt;
&lt;p style="text-align: justify;"&gt;&lt;span style="vertical-align: baseline;"&gt;For &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;data admins&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;, this architecture is appealing because it maintains a strong security and governance posture. They can provision access to specific tables or views, confident that the underlying data cannot be altered from a Connected Sheet. Admins can also take advantage of Google Workspace’s enterprise data protections to control reading, sharing, and copying data throughout its lifecycle.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;For &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;end users&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;, the benefit is immediate agility and ease of use. They can use familiar tools like pivot tables, charts, calculated columns, and formulas to analyze billions of rows of live data as if it were a local file, balancing centralized control with the business's demand for speed. End users don’t have to learn technical concepts like databases, schemas, tables, and query languages like SQL to access, analyze, and visualize the data.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-image_full_width"&gt;






  
    &lt;div class="article-module h-c-page"&gt;
      &lt;div class="h-c-grid"&gt;
  

    &lt;figure class="article-image--large
      
      
        h-c-grid__col
        h-c-grid__col--6 h-c-grid__col--offset-3
        
        
      "
      &gt;

      
      
        
        &lt;img
            src="https://storage.googleapis.com/gweb-cloudblog-publish/original_images/intro_cs.gif"
        
          alt="intro cs"&gt;
        
        &lt;/a&gt;
      
    &lt;/figure&gt;

  
      &lt;/div&gt;
    &lt;/div&gt;
  




&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;h3&gt;&lt;span style="vertical-align: baseline;"&gt;Key use cases and core journeys&lt;/span&gt;&lt;/h3&gt;
&lt;p style="text-align: justify;"&gt;&lt;span style="vertical-align: baseline;"&gt;We consistently hear about three primary use cases for Connected Sheets from customers across industries. &lt;/span&gt;&lt;/p&gt;
&lt;p style="text-align: justify;"&gt;&lt;strong style="vertical-align: baseline;"&gt;1. Self-service exploratory analysis:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Data teams provide access to curated tables and datasets in BigQuery. Business Analysts in sales, operations, finance, or marketing can then build their own pivot tables or charts that run over the entire live data source directly from Sheets, then filter data to answer day-to-day questions, freeing the data team from a constant backlog of ad-hoc requests.&lt;/span&gt;&lt;/p&gt;
&lt;p style="text-align: justify;"&gt;&lt;strong style="vertical-align: baseline;"&gt;Example: Deep-dive investigation&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation" style="text-align: justify;"&gt;&lt;strong style="vertical-align: baseline;"&gt;Scenario:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; A sales manager analyzes millions of global transactions to review quarterly performance.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation" style="text-align: justify;"&gt;&lt;strong style="vertical-align: baseline;"&gt;Action:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Using a Connected Sheets pivot table, they quickly create a pivot table to summarize revenue by region and product line. When they spot an anomaly — an unexpected revenue spike in EMEA, for example — they simply double-click the summarized value to drill down and learn more about exactly what led to that value.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;&lt;strong style="vertical-align: baseline;"&gt;Outcome:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Connected Sheets instantly queries and retrieves the precise, granular transaction rows behind that summary value, making it easy and fast to find the root cause.&lt;/span&gt;&lt;/li&gt;
&lt;/ul&gt;&lt;/div&gt;
&lt;div class="block-image_full_width"&gt;






  
    &lt;div class="article-module h-c-page"&gt;
      &lt;div class="h-c-grid"&gt;
  

    &lt;figure class="article-image--large
      
      
        h-c-grid__col
        h-c-grid__col--6 h-c-grid__col--offset-3
        
        
      "
      &gt;

      
      
        
        &lt;img
            src="https://storage.googleapis.com/gweb-cloudblog-publish/original_images/pivot_table_cs.gif"
        
          alt="pivot table cs"&gt;
        
        &lt;/a&gt;
      
    &lt;/figure&gt;

  
      &lt;/div&gt;
    &lt;/div&gt;
  




&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;2. Operational reporting:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Business users can create live, refreshable, and easy-to-understand dashboard-like views of their data that their partner teams can rely on and share with executives and leads.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Example: Automated executive summary&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation" style="text-align: justify;"&gt;&lt;strong style="vertical-align: baseline;"&gt;Scenario:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; An operations lead provides weekly updates on sales invoices to their leadership, based on a BigQuery dataset with millions of rows.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation" style="text-align: justify;"&gt;&lt;strong style="vertical-align: baseline;"&gt;Action:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; The operations lead creates their Connected Sheet and builds a series of charts to visualize invoice trends over time. They then configure the sheet to automatically refresh on a schedule every Monday morning, so it’s always ready ahead of their executive review.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;&lt;strong style="vertical-align: baseline;"&gt;Outcome:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; The manual routine of exporting data and pasting it into workbooks is completely eliminated. Leadership gets a reliable report and analysis powered by the latest warehouse data.&lt;/span&gt;&lt;/li&gt;
&lt;/ul&gt;&lt;/div&gt;
&lt;div class="block-image_full_width"&gt;






  
    &lt;div class="article-module h-c-page"&gt;
      &lt;div class="h-c-grid"&gt;
  

    &lt;figure class="article-image--large
      
      
        h-c-grid__col
        h-c-grid__col--6 h-c-grid__col--offset-3
        
        
      "
      &gt;

      
      
        
        &lt;img
            src="https://storage.googleapis.com/gweb-cloudblog-publish/original_images/schedule_refresh_cs.gif"
        
          alt="schedule refresh cs"&gt;
        
        &lt;/a&gt;
      
    &lt;/figure&gt;

  
      &lt;/div&gt;
    &lt;/div&gt;
  




&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;p style="text-align: justify;"&gt;&lt;strong style="vertical-align: baseline;"&gt;3. Hybrid data modeling:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Data practitioners often need to blend governed warehouse data with real-time manual inputs and annotations. For example, a finance team might pull revenue data from BigQuery and combine it with manual procurement entries from your ERP system in a separate tab, using VLOOKUP to create a consolidated view for month-end reporting.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Example: Custom business metrics&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation" style="text-align: justify;"&gt;&lt;strong style="vertical-align: baseline;"&gt;Scenario:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; A financial analyst calculates custom commission payouts based on live sales data from your CRM system. The commission tier logic changes frequently and isn't modeled in the central data warehouse.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation" style="text-align: justify;"&gt;&lt;strong style="vertical-align: baseline;"&gt;Action:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Instead of requesting a new data pipeline from their data team, the analyst can add a calculated column directly within the Connected Sheet. They use standard spreadsheet formulas (like &lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;IF&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt; or &lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;IFS&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;) to apply custom business logic directly against the BigQuery data.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation" style="text-align: justify;"&gt;&lt;strong style="vertical-align: baseline;"&gt;Outcome:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; The analyst retains the flexibility to model scenarios and calculate metrics quickly, while maintaining governed BigQuery data as their single source of truth.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 style="text-align: justify;"&gt;&lt;span style="vertical-align: baseline;"&gt;Getting started&lt;/span&gt;&lt;/h3&gt;
&lt;p style="text-align: justify;"&gt;&lt;span style="vertical-align: baseline;"&gt;Connecting Google Sheets to BigQuery is straightforward and requires only a Google Workspace account and a billing-enabled Google Cloud project. There are two primary ways to establish a connection and create a Connected Sheet.&lt;/span&gt;&lt;/p&gt;
&lt;p style="text-align: justify;"&gt;&lt;strong style="vertical-align: baseline;"&gt;Path 1: Starting from Sheets&lt;br/&gt;&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;This is the typical workflow for users who work primarily within spreadsheets.&lt;/span&gt;&lt;/p&gt;
&lt;ol&gt;
&lt;li aria-level="1" style="list-style-type: decimal; vertical-align: baseline;"&gt;
&lt;p role="presentation" style="text-align: justify;"&gt;&lt;span style="vertical-align: baseline;"&gt;Open a new Google Sheet.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: decimal; vertical-align: baseline;"&gt;
&lt;p role="presentation" style="text-align: justify;"&gt;&lt;span style="vertical-align: baseline;"&gt;Navigate to &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;Data &amp;gt; Data Connectors &amp;gt; Connect to BigQuery&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: decimal; vertical-align: baseline;"&gt;
&lt;p role="presentation" style="text-align: justify;"&gt;&lt;span style="vertical-align: baseline;"&gt;Select your billing-enabled Google Cloud project.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: decimal; vertical-align: baseline;"&gt;
&lt;p role="presentation" style="text-align: justify;"&gt;&lt;span style="vertical-align: baseline;"&gt;Browse available datasets, select a Saved Query to connect right away, or input a custom SQL query. &lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: decimal; vertical-align: baseline;"&gt;
&lt;p role="presentation" style="text-align: justify;"&gt;&lt;span style="vertical-align: baseline;"&gt;Click &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;Connect&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p style="text-align: justify;"&gt;&lt;strong style="vertical-align: baseline;"&gt;Path 2: Starting from BigQuery&lt;br/&gt;&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;This workflow is common for data analysts starting from the Google Cloud console.&lt;/span&gt;&lt;/p&gt;
&lt;ol&gt;
&lt;li aria-level="1" style="list-style-type: decimal; vertical-align: baseline;"&gt;
&lt;p role="presentation" style="text-align: justify;"&gt;&lt;span style="vertical-align: baseline;"&gt;Navigate to the BigQuery UI in the console.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: decimal; vertical-align: baseline;"&gt;
&lt;p role="presentation" style="text-align: justify;"&gt;&lt;span style="vertical-align: baseline;"&gt;In the Explorer pane, locate the table or query result you wish to analyze.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: decimal; vertical-align: baseline;"&gt;
&lt;p role="presentation" style="text-align: justify;"&gt;&lt;span style="vertical-align: baseline;"&gt;Click the &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;Export&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; menu (or the three-dot action menu) next to the asset.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: decimal; vertical-align: baseline;"&gt;
&lt;p role="presentation" style="text-align: justify;"&gt;&lt;span style="vertical-align: baseline;"&gt;Select &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;Open in &amp;gt; Connected Sheets&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;h3 style="text-align: justify;"&gt;&lt;span style="vertical-align: baseline;"&gt;From petabytes to predictions with Connected Sheets&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;We designed Connected Sheets to help you bridge the gap between the scalability of the cloud and the flexibility of the spreadsheet. With Connected Sheets, we’re making it easier than ever for organizations to put data into the hands of the people who need it.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;To explore these features, connect your BigQuery data to Google Sheets today. For more technical details, visit the &lt;/span&gt;&lt;a href="https://cloud.google.com/bigquery/docs/connected-sheets"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Connected Sheets documentation&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;</description><pubDate>Fri, 29 May 2026 16:00:00 +0000</pubDate><guid>https://cloud.google.com/blog/products/data-analytics/using-connected-sheets-to-analyze-bigquery-data/</guid><category>BigQuery</category><category>Data Analytics</category><og xmlns:og="http://ogp.me/ns#"><type>article</type><title>From petabytes to predictions: Easy BigQuery insights in Google Sheets</title><description></description><site_name>Google</site_name><url>https://cloud.google.com/blog/products/data-analytics/using-connected-sheets-to-analyze-bigquery-data/</url></og><author xmlns:author="http://www.w3.org/2005/Atom"><name>Tarak Parekh</name><title>Sr. Product Manager, BigQuery</title><department></department><company></company></author><author xmlns:author="http://www.w3.org/2005/Atom"><name>Laura Gagliano</name><title>Sr. Product Manager, Workspace</title><department></department><company></company></author></item><item><title>Evolving Dataflow to process massive datasets for machine learning</title><link>https://cloud.google.com/blog/products/data-analytics/ai-focused-innovations-in-dataflow/</link><description>&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Google created &lt;/span&gt;&lt;a href="https://static.googleusercontent.com/media/research.google.com/en//archive/mapreduce-osdi04.pdf" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;MapReduce&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; more than 20 years ago to solve the scaling problems in data processing that the then young company was running into. The AI era that we are in now demands efficient, large-scale data processing for everything from training frontier models like Gemini by Google DeepMind to powering fully autonomous vehicles like Waymo. &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Many aspects of machine learning, including data ingestion, transformation, and feature extraction, rely heavily on processing massive datasets. To meet this astronomical scale required by efforts across Google, we evolved our data platform, Flume, the successor to the original MapReduce, with innovations focused on &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;scalability&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;, &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;efficiency&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;, and a better &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;developer experience&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;. And many of those innovations are available as part of &lt;/span&gt;&lt;a href="https://cloud.google.com/products/dataflow"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Dataflow&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, &lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;our fully managed batch and streaming platform built on the same core technology Google uses to power its most demanding internal workloads.&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;. In this blog, we provide an overview of the many innovations in the Flume platform, and a glimpse into how Google Cloud customers are putting those features into action with Dataflow. &lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Addressing massive scalability&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;The scale of data processing at Google has exploded over the last 20 years and continues to drive innovation. To tackle the challenges of immense scale, we introduced several features within Google's data processing platform, which are also available in Dataflow::&lt;/span&gt;&lt;/p&gt;
&lt;ol&gt;
&lt;li aria-level="1" style="list-style-type: decimal; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Liquid sharding&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; dynamically splits work units (shards) during execution for on-the-fly rebalancing. This helps pipelines with uneven data distribution and stragglers to maximize worker efficiency as data grows.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: decimal; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Global compute&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; enables enormous scaling by dynamically scheduling workloads across Google's global infrastructure. The system automatically determines the optimal location based on factors like data locality and resource availability.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: decimal; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Automatic pipeline optimization&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; fuses consecutive operations into a single stage. This reduces I/O and stage-transition overhead, allowing large-scale execution to scale more gracefully.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: decimal; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Rate-limiting external API calls&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; manages load on external services. This is essential for modern ML pipelines that frequently call external APIs for tasks like model evaluation, preventing high data volumes from overloading systems.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: decimal; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Tandem pools&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; facilitate serverless remote inference. This feature helps overcome scalability limitations often found in remote inference systems by efficiently hosting, sharing, managing, and autoscaling external model servers.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Boosting efficiency with accelerators&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Doing more with less isn't just a constraint; it fuels our progress. By finding ways to run more efficiently, we create the space and capacity needed for rapid innovation. This is particularly evident for teams that use accelerators like TPUs for their workloads. To improve utilization and cost efficiency, our engineers devised several novel features for our platform, now part of Dataflow:&lt;/span&gt;&lt;/p&gt;
&lt;ol&gt;
&lt;li aria-level="1" style="list-style-type: decimal; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Heterogeneous worker pools&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; allow developers to specify custom resource requirements for different pipeline stages. For example, TPU-intensive work runs on TPU-equipped workers, while other stages use standard CPU workers. This ensures optimal resource allocation.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: decimal; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;TPU-aware autoscaling&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; prevents excessive initial assignment of TPU workers and improves efficiency during subsequent autoscaling events.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: decimal; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Duty-cycle policy enforcement&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; automatically scales down TPU workloads when the accelerator's duty cycle (the fraction of time it is active) is low, scaling back up only when utilization improves.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: decimal; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;TPU fungibility&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;: By working with other infrastructure teams, we developed optimizations to encourage scheduling jobs to the most suitable TPU version and cell location based on quota and resource availability.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Enhancing the developer experience&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Considering the wide mix of backgrounds and tools across Google, rapid prototyping, iteration, and reliable production operations are extremely important. Google has invested in significant capabilities to enhance the overall user experience:&lt;/span&gt;&lt;/p&gt;
&lt;ol&gt;
&lt;li aria-level="1" style="list-style-type: decimal; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Language flexibility&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; is provided through a versatile SDK with a simple API in C++ (internal to Google), Java, Python, and Go (with SQL support). This allows users to build batch, ML, and streaming pipelines.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: decimal; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Integration&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; with ML frameworks like &lt;/span&gt;&lt;a href="https://docs.jax.dev/" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;JAX&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; is available, along with native support for LLM-specific optimizations. The underlying platform also provides building blocks for robust agentic inference pipelines and supports simple transitions between bulk and streaming paradigms.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: decimal; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Unified batch and streaming&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; enables users to use the same code for both historical batch and live streaming data. This simplifies the architecture, which traditionally would have required separate pipelines for batch and streaming data processing.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: decimal; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Observability&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; for production pipelines is available through the monitoring UI, which offers comprehensive control and essential diagnostic data. Detailed performance metrics, such as stage-level TPU utilization graphs, provide transparency for troubleshooting and optimization tasks.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: decimal; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Advanced developer workflows&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; for quicker day 0 and day 2 operations include features like sampling and dry-run to help ensure code accuracy. Users can also test pipelines on small in-memory collections, and even pause and resume production pipelines.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Dataflow brings innovation from Google's internal platform to Google Cloud &lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Dataflow is built upon Google's internal platform, sharing many core components, including the execution engine and the Apache Beam SDK (which originated from Flume’s APIs). This close relationship means that the cutting-edge solutions we build to handle Google’s internal data processing challenges, like pipelines that process hundreds of billions of documents, directly benefit Dataflow users. In fact, unique Dataflow features like vertical scaling, right fitting, dynamic sharding, and straggler detection all resulted from solutions developed for Google’s internal workloads.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;This is one of the reasons many Google Cloud customers rely on Dataflow for critical ML applications: &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;Spotify&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; uses Dataflow for &lt;/span&gt;&lt;a href="https://engineering.atspotify.com/2023/04/large-scale-generation-of-ml-podcast-previews-at-spotify-with-google-dataflow" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;large-scale generation of ML podcast previews&lt;/span&gt;&lt;/a&gt;&lt;strong style="vertical-align: baseline;"&gt;. Etsy&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; leverages Dataflow for &lt;/span&gt;&lt;a href="https://cloud.google.com/customers/etsy-ai?hl=en&amp;amp;e=48754805"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;data preparation and ETL&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; for its ML workloads. And &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;Moloco&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; uses Dataflow to process &lt;/span&gt;&lt;a href="https://cloud.google.com/customers/moloco"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;terabytes of data a day to update its prediction model&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; for real-time ad bidding.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;The momentum continues: Last quarter we launched support for TPU in Dataflow in addition to supporting GPUs. Looking ahead, we are working on an advanced reliability feature called speculative execution and are enhancing the developer experience with features like failure isolation and replay and pause/resume, which are coming soon. To learn more or get started with Dataflow visit &lt;/span&gt;&lt;a href="https://docs.cloud.google.com/dataflow/docs/get-started"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;https://docs.cloud.google.com/dataflow/docs/get-started&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;. &lt;/span&gt;&lt;/p&gt;&lt;/div&gt;</description><pubDate>Thu, 28 May 2026 16:00:00 +0000</pubDate><guid>https://cloud.google.com/blog/products/data-analytics/ai-focused-innovations-in-dataflow/</guid><category>AI &amp; Machine Learning</category><category>Customers</category><category>Streaming</category><category>Data Analytics</category><og xmlns:og="http://ogp.me/ns#"><type>article</type><title>Evolving Dataflow to process massive datasets for machine learning</title><description></description><site_name>Google</site_name><url>https://cloud.google.com/blog/products/data-analytics/ai-focused-innovations-in-dataflow/</url></og><author xmlns:author="http://www.w3.org/2005/Atom"><name>Shan Kulandaivel</name><title>Group Product Manager</title><department></department><company></company></author><author xmlns:author="http://www.w3.org/2005/Atom"><name>Mustafa Saglam</name><title>Senior Product Manager</title><department></department><company></company></author></item><item><title>The future of agentic development: Redefining the data practitioner lifecycle with Data Agent Kit</title><link>https://cloud.google.com/blog/products/data-analytics/data-agent-kit-brings-data-skills-and-tools-to-your-ide-or-cli/</link><description>&lt;div class="block-paragraph_advanced"&gt;&lt;p style="text-align: justify;"&gt;&lt;span style="vertical-align: baseline;"&gt;The modern software development landscape isn’t happening just on one surface — it’s happening across an entire ecosystem of agentic tools. Agents are being developed at an unprecedented scale, and these agents require direct access to enterprise data for context and grounding.&lt;/span&gt;&lt;/p&gt;
&lt;p style="text-align: justify;"&gt;&lt;span style="vertical-align: baseline;"&gt;However, the current tooling for building agents and managing data is heavily fragmented. This can make it difficult to access data, increasing security risks, and causing broken developer experiences that hinder innovation.&lt;/span&gt;&lt;/p&gt;
&lt;p style="text-align: justify;"&gt;&lt;span style="vertical-align: baseline;"&gt;To address this challenge, we recently launched &lt;/span&gt;&lt;a href="https://cloud.google.com/blog/products/data-analytics/whats-new-in-the-agentic-data-cloud"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Data Agent Kit&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, a unified, open-source collection of data engineering and data science skills, tools and plugins that integrate directly into the environments practitioners already use, such as VS Code, Claude Code, Codex, Gemini CLI and the Antigravity CLI. By seamlessly bringing together these core tools and skills with your enterprise data, the Data Agent Kit effectively serves as a comprehensive harness for agentic context, memory, and personalization. It provides:&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation" style="text-align: justify;"&gt;&lt;strong style="vertical-align: baseline;"&gt;Agentic skills:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Pre-codified pathways for interacting with your data estate, covering query optimization, ML best practices, data validation, data drift checks, governance, and troubleshooting.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation" style="text-align: justify;"&gt;&lt;strong style="vertical-align: baseline;"&gt;Model Context Protocol (MCP) tools:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Secure connections between agentic workflows and cloud data platforms like BigQuery, AlloyDB, and Google Cloud Storage. Developers can now configure connection parameters for their cloud datasets and data processing engines without having to manage complex, manual pipeline code.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation" style="text-align: justify;"&gt;&lt;strong style="vertical-align: baseline;"&gt;Plugins and extensions:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Native IDE integrations that enable rich, context-aware developer interactions.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p style="text-align: justify;"&gt;&lt;span style="vertical-align: baseline;"&gt;Together, these Data Agent Kit capabilities help data practitioners go from manually writing code to &lt;/span&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;intent-driven data science and engineering: &lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;defining the desired business outcomes, constraints, and success criteria, and allowing the AI-augmented system to figure out how to execute it. This shift is critical because today, when building agentic applications that navigate complex data architectures, there’s often a 'context window tax' i.e., developers have to manually paste vast amounts of schema metadata into prompts, eating up token limits and increasing latency. Meanwhile, data practitioners often lack guidance about how to efficiently query, optimize, and troubleshoot cloud data, while specialized, fragmented development environments cannot see across your entire data estate. Data Agent Kit helps with these challenges and others, providing the foundational capabilities data practitioners need for a new agentic way of working. &lt;/span&gt;&lt;/p&gt;
&lt;p style="text-align: justify;"&gt;&lt;span style="vertical-align: baseline;"&gt;Read on for an overview of Data Agent Kit’s features and benefits, how to install it and connect your local environment to your data estate, and an intent-driven engineering example&lt;/span&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;.&lt;/span&gt;&lt;/p&gt;
&lt;h3 style="text-align: justify;"&gt;&lt;span style="vertical-align: baseline;"&gt;A unified hub for your data estate and lifecycle&lt;/span&gt;&lt;/h3&gt;
&lt;p style="text-align: justify;"&gt;&lt;span style="vertical-align: baseline;"&gt;Data Agent Kit makes your entire data estate available in a single view. This goes beyond providing a simple catalog for databases such as BigQuery, AlloyDB and Spanner; rather, it integrates data engineering and science tasks, orchestration pipelines, and jobs into a single interface. This allows practitioners to manage their entire data workflow — from discovery to production — without context switching. Data Agent Kit’s intelligent routing automatically chooses the optimal compute engine for your task — whether that’s BigQuery for SQL-native analytics and ELT, or Spark for custom Python transformations and distributed ML training. &lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-image_full_width"&gt;






  
    &lt;div class="article-module h-c-page"&gt;
      &lt;div class="h-c-grid"&gt;
  

    &lt;figure class="article-image--large
      
      
        h-c-grid__col
        h-c-grid__col--6 h-c-grid__col--offset-3
        
        
      "
      &gt;

      
      
        
        &lt;img
            src="https://storage.googleapis.com/gweb-cloudblog-publish/original_images/1_Unified_Catalog.gif"
        
          alt="1 Unified Catalog"&gt;
        
        &lt;/a&gt;
      
        &lt;figcaption class="article-image__caption "&gt;&lt;p data-block-key="pjt1k"&gt;Unified Hub of your entire data estate and lifecycle&lt;/p&gt;&lt;/figcaption&gt;
      
    &lt;/figure&gt;

  
      &lt;/div&gt;
    &lt;/div&gt;
  




&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;h3&gt;&lt;span style="vertical-align: baseline;"&gt;Ecosystem-led&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt; intelligence: C&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;odified agentic skills &lt;/span&gt;&lt;/h3&gt;
&lt;p style="text-align: justify;"&gt;&lt;span style="vertical-align: baseline;"&gt;Data Agent Kit offers a library of predefined agentic skills (e.g., ML best practices, ELT, building data apps) based on Google Cloud’s data engineering and science expertise. Rather than relying on generic LLM prompts, it codifies prescriptive guidelines into your workflow. This allows you to inject enterprise-grade data intelligence directly into your IDE or CLI.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-image_full_width"&gt;






  
    &lt;div class="article-module h-c-page"&gt;
      &lt;div class="h-c-grid"&gt;
  

    &lt;figure class="article-image--large
      
      
        h-c-grid__col
        h-c-grid__col--6 h-c-grid__col--offset-3
        
        
      "
      &gt;

      
      
        
        &lt;img
            src="https://storage.googleapis.com/gweb-cloudblog-publish/original_images/2_Agentic_Skills.gif"
        
          alt="2 Agentic Skills"&gt;
        
        &lt;/a&gt;
      
        &lt;figcaption class="article-image__caption "&gt;&lt;p data-block-key="pjt1k"&gt;Browsing a predefined list of agentic data engineering and science skills&lt;/p&gt;&lt;/figcaption&gt;
      
    &lt;/figure&gt;

  
      &lt;/div&gt;
    &lt;/div&gt;
  




&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;h3 style="text-align: justify;"&gt;&lt;span style="vertical-align: baseline;"&gt;Transforming data exploration through natural language&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Grounded in this unified data, Data Agent Kit delivers native conversational analytics directly within your workspace, making it easy to explore your data. Powered by the same Gemini natural language to SQL technology found in our first-party agents (e.g., Conversational &lt;/span&gt;&lt;a href="https://cloud.google.com/blog/products/data-analytics/introducing-conversational-analytics-in-bigquery?e=48754805"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;BigQuery &lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;and &lt;/span&gt;&lt;a href="https://cloud.google.com/blog/products/business-intelligence/looker-conversational-analytics-now-ga/?e=48754805"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Looker&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;), Data Agent Kit lets you run natural language queries to profile, search, and visualize your datasets. &lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-image_full_width"&gt;






  
    &lt;div class="article-module h-c-page"&gt;
      &lt;div class="h-c-grid"&gt;
  

    &lt;figure class="article-image--large
      
      
        h-c-grid__col
        h-c-grid__col--6 h-c-grid__col--offset-3
        
        
      "
      &gt;

      
      
        
        &lt;img
            src="https://storage.googleapis.com/gweb-cloudblog-publish/original_images/3_Conversational_Analytics.gif"
        
          alt="3 Conversational Analytics"&gt;
        
        &lt;/a&gt;
      
        &lt;figcaption class="article-image__caption "&gt;&lt;p data-block-key="pjt1k"&gt;Within Data Agent Kit, you can use Conversational Analytics to explore your data&lt;/p&gt;&lt;/figcaption&gt;
      
    &lt;/figure&gt;

  
      &lt;/div&gt;
    &lt;/div&gt;
  




&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;h2&gt;&lt;span style="vertical-align: baseline;"&gt;A practical&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt; walkthrough: &lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;Unifying data and building models&lt;/span&gt;&lt;/h2&gt;
&lt;p style="text-align: justify;"&gt;&lt;span style="vertical-align: baseline;"&gt;To see how Data Agent Kit’s skills and MCP tools work together, consider a financial services scenario: Your company is facing rising fraud claims. With your transaction data stored in Cloud Storage, you need to build a high-confidence fraud detection model and schedule orchestration pipelines. Traditionally, this involves hours of data wrangling across multiple consoles. With the Data Agent Kit, you can complete this in minutes, directly within your IDE or CLI. Let’s see how.&lt;/span&gt;&lt;/p&gt;
&lt;h3 style="text-align: justify;"&gt;&lt;span style="vertical-align: baseline;"&gt;Onboarding: The one-minute setup&lt;/span&gt;&lt;/h3&gt;
&lt;p style="text-align: justify;"&gt;&lt;span style="vertical-align: baseline;"&gt;You can get started with the Data Agent Kit in under a minute through an integrated setup process.&lt;/span&gt;&lt;/p&gt;
&lt;p style="text-align: justify;"&gt;&lt;span style="vertical-align: baseline;"&gt;To do so, search for "Google Cloud Data Agent Kit" in your IDE’s marketplace (VS Code) or via the GitHub repo in your CLI (Gemini, Antigravity, Claude, Codex) from the links in the “Get started today” section below. Data Agent Kit automatically configures dependencies and checks your Google Cloud login status.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-image_full_width"&gt;






  
    &lt;div class="article-module h-c-page"&gt;
      &lt;div class="h-c-grid"&gt;
  

    &lt;figure class="article-image--large
      
      
        h-c-grid__col
        h-c-grid__col--6 h-c-grid__col--offset-3
        
        
      "
      &gt;

      
      
        
        &lt;img
            src="https://storage.googleapis.com/gweb-cloudblog-publish/images/4_VS_Code_Marketplace_Extension.max-1000x1000.jpg"
        
          alt="4 VS Code Marketplace Extension"&gt;
        
        &lt;/a&gt;
      
    &lt;/figure&gt;

  
      &lt;/div&gt;
    &lt;/div&gt;
  




&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;p style="text-align: justify;"&gt;&lt;span style="vertical-align: baseline;"&gt;Click the Google Cloud icon in your activity bar to authenticate via IAM. Once logged in, your Cloud Storage, databases, and catalog assets appear instantly in your workspace.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Use the &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;settings&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; menu to set project IDs, regions, and verify MCP status to ensure all backend services are authorized. Data Agent Kit also includes a quick-start guide on using the tools and skills. &lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-image_full_width"&gt;






  
    &lt;div class="article-module h-c-page"&gt;
      &lt;div class="h-c-grid"&gt;
  

    &lt;figure class="article-image--large
      
      
        h-c-grid__col
        h-c-grid__col--6 h-c-grid__col--offset-3
        
        
      "
      &gt;

      
      
        
        &lt;img
            src="https://storage.googleapis.com/gweb-cloudblog-publish/images/5_Data_Agent_Kit_Extension_Installed.max-1000x1000.jpg"
        
          alt="5 Data Agent Kit Extension Installed"&gt;
        
        &lt;/a&gt;
      
    &lt;/figure&gt;

  
      &lt;/div&gt;
    &lt;/div&gt;
  




&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;h3&gt;&lt;span style="vertical-align: baseline;"&gt;An intent-driven data engineering example&lt;/span&gt;&lt;/h3&gt;
&lt;p style="text-align: justify;"&gt;&lt;span style="vertical-align: baseline;"&gt;With Data Agent Kit installed, you can s&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;kip the manual ETL boilerplate, and directly describe your high-level goal to your coding assistant (e.g., Claude Code, GitHub Copilot) in natural language. The assistant leverages Data Agent Kit’s skills to plan and execute the workflow.&lt;/span&gt;&lt;/p&gt;
&lt;p style="text-align: justify;"&gt;&lt;span style="vertical-align: baseline;"&gt;Prompt:&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;p style="text-align: justify;"&gt;&lt;code style="vertical-align: baseline;"&gt;I have the &lt;/code&gt;&lt;strong style="vertical-align: baseline;"&gt;raw transaction logs&lt;/strong&gt;&lt;code style="vertical-align: baseline;"&gt; landing in the &lt;/code&gt;&lt;strong style="vertical-align: baseline;"&gt;GCS &lt;/strong&gt;&lt;code style="vertical-align: baseline;"&gt;bucket gs://fin-clearing-raw/.&lt;/code&gt;&lt;/p&gt;
&lt;p style="text-align: justify;"&gt;&lt;code style="vertical-align: baseline;"&gt;First, &lt;/code&gt;&lt;strong style="vertical-align: baseline;"&gt;create a Spark notebook&lt;/strong&gt;&lt;code style="vertical-align: baseline;"&gt; and (1) &lt;/code&gt;&lt;strong style="vertical-align: baseline;"&gt;ingest &lt;/strong&gt;&lt;code style="vertical-align: baseline;"&gt;these logs into an &lt;/code&gt;&lt;strong style="vertical-align: baseline;"&gt;Iceberg table in BigQuery&lt;/strong&gt;&lt;code style="vertical-align: baseline;"&gt;.&lt;/code&gt;&lt;/p&gt;
&lt;p style="text-align: justify;"&gt;&lt;code style="vertical-align: baseline;"&gt;Second, &lt;/code&gt;&lt;strong style="vertical-align: baseline;"&gt;create a dbt project&lt;/strong&gt;&lt;code style="vertical-align: baseline;"&gt; to (2) &lt;/code&gt;&lt;strong style="vertical-align: baseline;"&gt;deduplicate &lt;/strong&gt;&lt;code style="vertical-align: baseline;"&gt;them, (3) &lt;/code&gt;&lt;strong style="vertical-align: baseline;"&gt;remove the transactions with invalid transaction id&lt;/strong&gt;&lt;code style="vertical-align: baseline;"&gt; and store them in a separate Iceberg table, (4) &lt;/code&gt;&lt;strong style="vertical-align: baseline;"&gt;standardize &lt;/strong&gt;&lt;code style="vertical-align: baseline;"&gt;the timestamps and perform any other necessary cleanup tasks (5) &lt;/code&gt;&lt;strong style="vertical-align: baseline;"&gt;sync the output to another Iceberg table&lt;/strong&gt;&lt;code style="vertical-align: baseline;"&gt; (6) join this output table with tables that have payer and payees identities and write the output to a final Iceberg table.&lt;/code&gt;&lt;/p&gt;
&lt;p style="text-align: justify;"&gt;&lt;code style="vertical-align: baseline;"&gt;Third, I would like you to &lt;/code&gt;&lt;strong style="vertical-align: baseline;"&gt;train an ML model on Spark using a notebook&lt;/strong&gt;&lt;code style="vertical-align: baseline;"&gt; to detect fraudulent transactions in the output table. I am thinking about a LightGBM model but I am open to any suggestions you might have. Use the relevant datasets in the project.&lt;/code&gt;&lt;/p&gt;
&lt;p style="text-align: justify;"&gt;&lt;code style="vertical-align: baseline;"&gt;Finally, &lt;/code&gt;&lt;strong style="vertical-align: baseline;"&gt;create an inferencing step using Spark notebook&lt;/strong&gt;&lt;code style="vertical-align: baseline;"&gt; to the above pipeline to perform batch inferencing and write flagged transactions to a Spanner table.&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;&lt;code style="vertical-align: baseline;"&gt;Create an &lt;/code&gt;&lt;strong style="vertical-align: baseline;"&gt;orchestration pipeline&lt;/strong&gt;&lt;code style="vertical-align: baseline;"&gt; that first runs the ingestion then the dbt and next the inference notebook.&lt;/code&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;h3 style="text-align: justify;"&gt;&lt;strong style="vertical-align: baseline;"&gt;Under the hood: Data pipeline steps&lt;/strong&gt;&lt;/h3&gt;
&lt;p style="text-align: justify;"&gt;&lt;span style="vertical-align: baseline;"&gt;Behind the scenes, Data Agent Kit plans a robust multi-step orchestration of the entire data lifecycle, from exploration to inference. &lt;/span&gt;&lt;/p&gt;
&lt;p style="text-align: justify;"&gt;&lt;strong style="vertical-align: baseline;"&gt;Step 1: Notebook creation, ingestion and initial storage&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; &lt;/span&gt;&lt;/p&gt;
&lt;p style="text-align: justify;"&gt;&lt;span style="vertical-align: baseline;"&gt;Find your bronze data — raw, unfiltered data on financial transactions — and bring it into an Iceberg table before doing the transformations. &lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation" style="text-align: justify;"&gt;&lt;span style="vertical-align: baseline;"&gt;Automatically create a &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;Notebook&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; to ingest the raw logs from Cloud Storage. &lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation" style="text-align: justify;"&gt;&lt;span style="vertical-align: baseline;"&gt;Write the necessary &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;SQL&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;, and store the ingested data into an &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;Iceberg table&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; in BigQuery.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;&lt;/div&gt;
&lt;div class="block-image_full_width"&gt;






  
    &lt;div class="article-module h-c-page"&gt;
      &lt;div class="h-c-grid"&gt;
  

    &lt;figure class="article-image--large
      
      
        h-c-grid__col
        h-c-grid__col--6 h-c-grid__col--offset-3
        
        
      "
      &gt;

      
      
        
        &lt;img
            src="https://storage.googleapis.com/gweb-cloudblog-publish/original_images/6_Ingestion.gif"
        
          alt="6 Ingestion"&gt;
        
        &lt;/a&gt;
      
        &lt;figcaption class="article-image__caption "&gt;&lt;p data-block-key="pjt1k"&gt;Ingestion into a bronze table&lt;/p&gt;&lt;/figcaption&gt;
      
    &lt;/figure&gt;

  
      &lt;/div&gt;
    &lt;/div&gt;
  




&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;p style="text-align: justify;"&gt;&lt;strong style="vertical-align: baseline;"&gt;Step 2: Transformation (dbt Project)&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; &lt;/span&gt;&lt;/p&gt;
&lt;p style="text-align: justify;"&gt;&lt;span style="vertical-align: baseline;"&gt;Now, clean the bronze data into silver and gold tables: &lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation" style="text-align: justify;"&gt;&lt;strong style="vertical-align: baseline;"&gt;Data preparation: &lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;Deduplicate&lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt; &lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;the transaction logs.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation" style="text-align: justify;"&gt;&lt;strong style="vertical-align: baseline;"&gt;Filter invalid IDs:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Identify transactions with invalid IDs and store them in a separate Iceberg table.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation" style="text-align: justify;"&gt;&lt;strong style="vertical-align: baseline;"&gt;Clean and standardize:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Standardize timestamps and perform other necessary cleanup tasks.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation" style="text-align: justify;"&gt;&lt;strong style="vertical-align: baseline;"&gt;Sync:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Output the cleaned data to another Iceberg table, leveraging the BigQuery MCP server.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation" style="text-align: justify;"&gt;&lt;strong style="vertical-align: baseline;"&gt;Enrichment:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Join the cleaned table with payer and payee identity tables.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation" style="text-align: justify;"&gt;&lt;strong style="vertical-align: baseline;"&gt;Final output:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Write the joined dataset to a final Iceberg table.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;&lt;/div&gt;
&lt;div class="block-image_full_width"&gt;






  
    &lt;div class="article-module h-c-page"&gt;
      &lt;div class="h-c-grid"&gt;
  

    &lt;figure class="article-image--large
      
      
        h-c-grid__col
        h-c-grid__col--6 h-c-grid__col--offset-3
        
        
      "
      &gt;

      
      
        
        &lt;img
            src="https://storage.googleapis.com/gweb-cloudblog-publish/original_images/7_Transformation.gif"
        
          alt="7 Transformation"&gt;
        
        &lt;/a&gt;
      
        &lt;figcaption class="article-image__caption "&gt;&lt;p data-block-key="pjt1k"&gt;Data transformation to create silver and gold tables&lt;/p&gt;&lt;/figcaption&gt;
      
    &lt;/figure&gt;

  
      &lt;/div&gt;
    &lt;/div&gt;
  




&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;p style="text-align: justify;"&gt;&lt;strong style="vertical-align: baseline;"&gt;Step 3: Machine learning and inferencing&lt;/strong&gt;&lt;/p&gt;
&lt;p style="text-align: justify;"&gt;&lt;span style="vertical-align: baseline;"&gt;With your gold table minted, it’s time for some data science: model training and inferencing. Here, the agent hands the clean data from the previous step to the model to spot fraudulent patterns.&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation" style="text-align: justify;"&gt;&lt;strong style="vertical-align: baseline;"&gt;Training:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Use a Spark notebook to train an ML model.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation" style="text-align: justify;"&gt;&lt;strong style="vertical-align: baseline;"&gt;Inference:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Create a Spark notebook inferencing step for batch processing.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation" style="text-align: justify;"&gt;&lt;strong style="vertical-align: baseline;"&gt;Storage:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Write all flagged fraudulent transactions to a Spanner table by leveraging the Spanner MCP.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;&lt;/div&gt;
&lt;div class="block-image_full_width"&gt;






  
    &lt;div class="article-module h-c-page"&gt;
      &lt;div class="h-c-grid"&gt;
  

    &lt;figure class="article-image--large
      
      
        h-c-grid__col
        h-c-grid__col--6 h-c-grid__col--offset-3
        
        
      "
      &gt;

      
      
        
        &lt;img
            src="https://storage.googleapis.com/gweb-cloudblog-publish/original_images/8_ML_Inferencing.gif"
        
          alt="8 ML Inferencing"&gt;
        
        &lt;/a&gt;
      
        &lt;figcaption class="article-image__caption "&gt;&lt;p data-block-key="pjt1k"&gt;Machine learning and inference&lt;/p&gt;&lt;/figcaption&gt;
      
    &lt;/figure&gt;

  
      &lt;/div&gt;
    &lt;/div&gt;
  




&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Step 4: Orchestration and execution&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Finally, you’re ready to move to production and schedule the whole orchestration pipeline: Ingestion -&amp;gt; Transformation -&amp;gt; Inference.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-image_full_width"&gt;






  
    &lt;div class="article-module h-c-page"&gt;
      &lt;div class="h-c-grid"&gt;
  

    &lt;figure class="article-image--large
      
      
        h-c-grid__col
        h-c-grid__col--6 h-c-grid__col--offset-3
        
        
      "
      &gt;

      
      
        
        &lt;img
            src="https://storage.googleapis.com/gweb-cloudblog-publish/original_images/9_Orchestration.gif"
        
          alt="9 Orchestration"&gt;
        
        &lt;/a&gt;
      
        &lt;figcaption class="article-image__caption "&gt;&lt;p data-block-key="pjt1k"&gt;Orchestration pipelines and scheduling runs&lt;/p&gt;&lt;/figcaption&gt;
      
    &lt;/figure&gt;

  
      &lt;/div&gt;
    &lt;/div&gt;
  




&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;p style="text-align: justify;"&gt;&lt;strong style="vertical-align: baseline;"&gt;When things go sideways: Agentic incident management and intelligent recovery&lt;/strong&gt;&lt;/p&gt;
&lt;p style="text-align: justify;"&gt;&lt;span style="vertical-align: baseline;"&gt;If an orchestration pipeline fails, not to worry, Data Agent Kit streamlines resolution using its intelligent incident management capabilities:&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation" style="text-align: justify;"&gt;&lt;strong style="vertical-align: baseline;"&gt;Intelligent diagnosis:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Automatically conducts root cause analysis to pinpoint failure sources&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation" style="text-align: justify;"&gt;&lt;strong style="vertical-align: baseline;"&gt;Autonomous remediation:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Drafts and tests fixes, bypassing manual debugging&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation" style="text-align: justify;"&gt;&lt;strong style="vertical-align: baseline;"&gt;Automated recovery:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Validates and deploys fixes via automated Git workflows&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;&lt;/div&gt;
&lt;div class="block-image_full_width"&gt;






  
    &lt;div class="article-module h-c-page"&gt;
      &lt;div class="h-c-grid"&gt;
  

    &lt;figure class="article-image--large
      
      
        h-c-grid__col
        h-c-grid__col--6 h-c-grid__col--offset-3
        
        
      "
      &gt;

      
      
        
        &lt;img
            src="https://storage.googleapis.com/gweb-cloudblog-publish/original_images/10_Issue_diagnosis_and_remediation.gif"
        
          alt="10 Issue diagnosis and remediation"&gt;
        
        &lt;/a&gt;
      
        &lt;figcaption class="article-image__caption "&gt;&lt;p data-block-key="pjt1k"&gt;Issue diagnosis and remediation&lt;/p&gt;&lt;/figcaption&gt;
      
    &lt;/figure&gt;

  
      &lt;/div&gt;
    &lt;/div&gt;
  




&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;p style="text-align: justify;"&gt;&lt;span style="vertical-align: baseline;"&gt;And there you have it: You’ve gone from raw discovery to a fully automated, fraud-catching machine in a matter of minutes, all from within the same UX. No need to hop between multiple browser tabs, IDE interfaces, or learn data engineering and science best practices — Data Agent Kit orchestrates a clean end-to-end flow leveraging various MCP tools and codified skills. Ultimately, this approach helps you achieve what matters most: shipping innovative, high-performance data applications at scale.&lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Get started today&lt;/strong&gt;&lt;/h3&gt;
&lt;p style="text-align: justify;"&gt;&lt;span style="vertical-align: baseline;"&gt;Data Agent Kit is available today in preview. Start by installing it in your favorite IDE or CLI:&lt;/span&gt;&lt;/p&gt;
&lt;ol&gt;
&lt;li aria-level="1" style="list-style-type: decimal; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;a href="https://marketplace.visualstudio.com/items?itemName=GoogleCloudTools.datacloud" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;VS Code Marketplace&lt;/span&gt;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: decimal; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;a href="https://docs.cloud.google.com/data-cloud-extension/antigravity/install"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Antigravity CLI&lt;/span&gt;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: decimal; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;a href="https://github.com/gemini-cli-extensions/data-agent-kit-starter-pack" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;GitHub Repo (Gemini CLI, Claude Code, Codex)&lt;/span&gt;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: decimal; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;a href="https://open-vsx.org/extension/googlecloudtools/datacloud" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;VSX&lt;/span&gt;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: decimal; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;a href="https://claude.com/plugins/data-agent-kit-starter-pack" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Claude Code Plugin&lt;/span&gt;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Then visit the &lt;/span&gt;&lt;a href="https://docs.cloud.google.com/data-cloud-extension"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;documentation&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; to learn more and get started. &lt;/span&gt;&lt;/p&gt;&lt;/div&gt;</description><pubDate>Tue, 19 May 2026 17:45:00 +0000</pubDate><guid>https://cloud.google.com/blog/products/data-analytics/data-agent-kit-brings-data-skills-and-tools-to-your-ide-or-cli/</guid><category>AI &amp; Machine Learning</category><category>Google I/O</category><category>Data Analytics</category><og xmlns:og="http://ogp.me/ns#"><type>article</type><title>The future of agentic development: Redefining the data practitioner lifecycle with Data Agent Kit</title><description></description><site_name>Google</site_name><url>https://cloud.google.com/blog/products/data-analytics/data-agent-kit-brings-data-skills-and-tools-to-your-ide-or-cli/</url></og><author xmlns:author="http://www.w3.org/2005/Atom"><name>Brahm Kohli</name><title>Group Product Manager, Data Cloud</title><department></department><company></company></author><author xmlns:author="http://www.w3.org/2005/Atom"><name>Dinesh Chandnani</name><title>Director of Engineering, Data Cloud</title><department></department><company></company></author></item><item><title>Beyond the Query: 5 Scenarios Laying the Foundation for the Agentic Era</title><link>https://cloud.google.com/blog/products/data-analytics/building-an-agentic-data-layer-on-google-cloud-5-key-scenarios/</link><description>&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Accessing enterprise data is shifting from static reports to dynamic use by autonomous systems. To keep up, organizations must route fragmented data from SaaS, IoT, and legacy sources into secure, scalable endpoints.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;However, moving to AI-driven exposure requires more than just connecting an LLM to a database, it requires a fundamental architectural shift to manage security, costs, and semantic accuracy.&lt;/span&gt;&lt;/p&gt;
&lt;h2&gt;&lt;span style="vertical-align: baseline;"&gt;What we’ll cover&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;This article explores the technical evolution of data exposure through five architectural patterns: from manual SQL development to autonomous workflows standardized by the Model Context Protocol (MCP).&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;While the examples use BigQuery and mocked CRM data, the patterns apply to most enterprise data assets transitioning into an agentic workflow.&lt;/span&gt;&lt;/p&gt;
&lt;h2&gt;&lt;span style="vertical-align: baseline;"&gt;The 5 Scenarios of Data Evolution&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;The transition from static reports to agentic insights is defined by two factors: Trust and complexity.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Trust dictates autonomy: Low-trust environments (like external client-facing apps) require deterministic, hard-coded logic to prevent errors. High-trust environments (like internal tools for power users) allow for probabilistic LLM reasoning, where there is more tolerance for non-deterministic outputs.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Complexity defines utility: Simple lookups need fast, cached responses. In contrast, complex, cross-functional problems require an agent to orchestrate multiple tools and data sources.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;To navigate this shift, we will examine five technical scenarios, starting with the baseline of the static API.&lt;/span&gt;&lt;/p&gt;
&lt;h2&gt;&lt;span style="vertical-align: baseline;"&gt;Scenario 1: The Static API Contract&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Focus:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Maximum stability and deterministic execution&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Scenario 1 represents the traditional model of data exposure. A developer acts as the intermediary, translating specific business requirements—such as "Show me top-selling products by category"—into optimized, hard-coded SQL queries.&lt;/span&gt;&lt;/p&gt;
&lt;h2&gt;&lt;span style="vertical-align: baseline;"&gt;Isolation and Predictability&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;This approach provides the highest level of security and performance:&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Low logic risk&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;: Because the SQL is pre-written and vetted, there is no risk of a user (or an agent) crafting a query that accesses unauthorized data.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Secure by design&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;: Using parameterized queries instead of string concatenation provides a hard barrier against SQL injection.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Reliability&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;: The output is deterministic. If the development lifecycle is robust, the user is guaranteed to receive exactly what they requested, with predictable execution costs and performance.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;&lt;span style="vertical-align: baseline;"&gt;Implementation example&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;This snippet demonstrates the baseline for data exposure: a direct, static API contract. It offers maximum predictability by using parameterized queries to prevent SQL injection and ensure consistent performance.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;A note on the code examples:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; To prioritize architectural clarity, these examples are provided as conceptual blueprints rather than production-ready code. They are designed for pedagogical purposes and intentionally omit "industrial" requirements such as persistent session state, IAM/Auth protocols, and comprehensive exception handling. Use these only as a logic guide before implementing your own hardened and production-ready solution.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-code"&gt;&lt;dl&gt;
    &lt;dt&gt;code_block&lt;/dt&gt;
    &lt;dd&gt;&amp;lt;ListValue: [StructValue([(&amp;#x27;code&amp;#x27;, &amp;#x27;from google.cloud import bigquery\r\ndef fetch_products(limit=10):\r\n    client = bigquery.Client()\r\n    # Use named parameters to ensure security and prevent SQL injection\r\n    sql = &amp;quot;&amp;quot;&amp;quot;\r\n        SELECT id, name \r\n        FROM `bigquery-public-data.thelook_ecommerce.products` \r\n        LIMIT @limit\r\n    &amp;quot;&amp;quot;&amp;quot;\r\n    job_config = bigquery.QueryJobConfig(\r\n        query_parameters=[\r\n            bigquery.ScalarQueryParameter(&amp;quot;limit&amp;quot;, &amp;quot;INT64&amp;quot;, limit)\r\n        ]\r\n    )\r\n    return client.query(sql, job_config=job_config).to_dataframe()&amp;#x27;), (&amp;#x27;language&amp;#x27;, &amp;#x27;&amp;#x27;), (&amp;#x27;caption&amp;#x27;, &amp;lt;wagtail.rich_text.RichText object at 0x7f84a55a8be0&amp;gt;)])]&amp;gt;&lt;/dd&gt;
&lt;/dl&gt;&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;h2&gt;&lt;span style="vertical-align: baseline;"&gt;Analysis&lt;/span&gt;&lt;/h2&gt;
&lt;div align="left"&gt;
&lt;div style="color: #5f6368; overflow-x: auto; overflow-y: hidden; width: 100%;"&gt;
&lt;div style="color: #5f6368; overflow-x: auto; overflow-y: hidden; width: 100%;"&gt;
&lt;div style="color: #5f6368; overflow-x: auto; overflow-y: hidden; width: 100%;"&gt;
&lt;div style="color: #5f6368; overflow-x: auto; overflow-y: hidden; width: 100%;"&gt;
&lt;div style="color: #5f6368; overflow-x: auto; overflow-y: hidden; width: 100%;"&gt;
&lt;div style="color: #5f6368; overflow-x: auto; overflow-y: hidden; width: 100%;"&gt;&lt;table&gt;&lt;colgroup&gt;&lt;col/&gt;&lt;col/&gt;&lt;col/&gt;&lt;/colgroup&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Parameter&lt;/strong&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Rating&lt;/strong&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Impact&lt;/strong&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Flexibility&lt;/strong&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Low&lt;/strong&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Users cannot change the query logic or filters without code changes.&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Cost Control&lt;/strong&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;High&lt;/strong&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Query plans are static; costs are predictable and easy to budget.&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Latency&lt;/strong&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Low&lt;/strong&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Low response times leveraging for example BigQuery's query cache.&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Maintenance&lt;/strong&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;High&lt;/strong&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Every new business question requires a developer and a deployment.&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;h2&gt;&lt;span style="vertical-align: baseline;"&gt;When to use Scenario 1?&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;This approach is the benchmark for external-facing applications, customer portals, and high-traffic production dashboards. It is the best choice when your requirements include:&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Strict auditability:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; You need a version-controlled (Git-based) history of every query executed against your data warehouse.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Performance at scale:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; You require sub-second latency, leveraging BigQuery’s result caching for high-concurrency workloads.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Deterministic logic:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; You must guarantee that specific inputs always produce the exact same output, with no room for AI-driven variability.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;External multi-tenancy:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; You are exposing data to third parties and need absolute assurance against data cross-contamination.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;&lt;span style="vertical-align: baseline;"&gt;Scenario 2: Custom Agent with SQL Generation&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Focus:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; User flexibility and managed autonomy.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;To resolve the development bottleneck of manual SQL authoring, Scenario 2 introduces an LLM agent (via the Agent Platform SDK) to act as a dynamic translator. In this model, the developer stops writing individual queries and starts focusing on metadata documentation.&lt;/span&gt;&lt;/p&gt;
&lt;h2&gt;&lt;span style="vertical-align: baseline;"&gt;From Query Writing to Metadata Curation&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Using the Agent Platform SDK (for &lt;/span&gt;&lt;a href="https://docs.cloud.google.com/gemini-enterprise-agent-platform/machine-learning/python-sdk/use-vertex-ai-python-sdk"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Python&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, for example), developers implement a reasoning engine that maps natural language to schema metadata. Rather than "guessing" the SQL, the agent follows a structured reasoning loop:&lt;/span&gt;&lt;/p&gt;
&lt;ol&gt;
&lt;li aria-level="1" style="list-style-type: decimal; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Analyze:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; It parses the natural language intent (e.g., "Which region had the highest growth?").&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: decimal; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Retrieve:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; It looks up the relevant schema metadata provided in the system context.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: decimal; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Construct:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; It generates a syntactically correct, BigQuery-compatible statement.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;For the LLM to generate accurate queries, it must "see" the data structures. You provide this through system instructions that include table names, column types, and—crucially—semantic descriptions (e.g., &lt;/span&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;"created_at: The timestamp when the user first registered"&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;). By curating this metadata space, you define the boundaries of what the agent can explore and execute.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Access control relies entirely on underlying database permissions (like RLS). Because the agent passes generated SQL dynamically, data boundaries must be enforced at the database level.&lt;/span&gt;&lt;/p&gt;
&lt;h2&gt;&lt;span style="vertical-align: baseline;"&gt;Implementation example&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;This marks the first step into agentic workflows, where an LLM acts as a translator between natural language and structured schema.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-code"&gt;&lt;dl&gt;
    &lt;dt&gt;code_block&lt;/dt&gt;
    &lt;dd&gt;&amp;lt;ListValue: [StructValue([(&amp;#x27;code&amp;#x27;, &amp;#x27;from google.cloud import bigquery\r\nfrom vertexai.generative_models import GenerativeModel\r\n\r\ndef ai_query(user_prompt):\r\n    # Initialize the model\r\n    model = GenerativeModel(&amp;quot;YOUR_LLM_MODEL&amp;quot;)\r\n    \r\n    # SYSTEM CONTEXT: Grounding the model with schema metadata\r\n    # This prevents the AI from guessing table names or column types.\r\n    system_instruction = (\r\n        &amp;quot;You are a BigQuery SQL expert. Output ONLY raw SQL code without markdown backticks. &amp;quot;\r\n        &amp;quot;Context: The \&amp;#x27;products\&amp;#x27; table in \&amp;#x27;bigquery-public-data.thelook_ecommerce\&amp;#x27; &amp;quot;\r\n        &amp;quot;contains: id (INT), name (STRING), and category (STRING).&amp;quot;\r\n    )\r\n    \r\n    full_prompt = f&amp;quot;{system_instruction}\\n\\nUser request: {user_prompt}&amp;quot;\r\n    \r\n    # Generate the SQL string\r\n    response = model.generate_content(full_prompt)\r\n    sql_code = response.text.strip().replace(&amp;quot;```sql&amp;quot;, &amp;quot;&amp;quot;).replace(&amp;quot;```&amp;quot;, &amp;quot;&amp;quot;)\r\n    \r\n    # Execute the AI-generated query\r\n    client = bigquery.Client()\r\n    return client.query(sql_code).to_dataframe()&amp;#x27;), (&amp;#x27;language&amp;#x27;, &amp;#x27;&amp;#x27;), (&amp;#x27;caption&amp;#x27;, &amp;lt;wagtail.rich_text.RichText object at 0x7f84a55a8400&amp;gt;)])]&amp;gt;&lt;/dd&gt;
&lt;/dl&gt;&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;h2&gt;&lt;span style="vertical-align: baseline;"&gt;Analysis&lt;/span&gt;&lt;/h2&gt;
&lt;div align="left"&gt;
&lt;div style="color: #5f6368; overflow-x: auto; overflow-y: hidden; width: 100%;"&gt;
&lt;div style="color: #5f6368; overflow-x: auto; overflow-y: hidden; width: 100%;"&gt;
&lt;div style="color: #5f6368; overflow-x: auto; overflow-y: hidden; width: 100%;"&gt;
&lt;div style="color: #5f6368; overflow-x: auto; overflow-y: hidden; width: 100%;"&gt;
&lt;div style="color: #5f6368; overflow-x: auto; overflow-y: hidden; width: 100%;"&gt;
&lt;div style="color: #5f6368; overflow-x: auto; overflow-y: hidden; width: 100%;"&gt;&lt;table&gt;&lt;colgroup&gt;&lt;col/&gt;&lt;col/&gt;&lt;col/&gt;&lt;/colgroup&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Parameter&lt;/strong&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Rating&lt;/strong&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Impact&lt;/strong&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Flexibility&lt;/strong&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;High&lt;/strong&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Users can ask virtually any question in plain English.&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Cost Control&lt;/strong&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Low&lt;/strong&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;LLMs may generate unoptimized queries (e.g., missing partitions).&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Latency&lt;/strong&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Medium&lt;/strong&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Includes LLM "thinking" time.&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Maintenance&lt;/strong&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Medium&lt;/strong&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Developers manage "prompt schemas" rather than SQL code.&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;h2&gt;&lt;strong style="vertical-align: baseline;"&gt;When to use Scenario 2?&lt;/strong&gt;&lt;/h2&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Scenario 2 is best suited for &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;internal data discovery&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; and &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;analyst-led exploration&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;. It bridges the gap between raw data and business users when you require:&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;High-variability querying:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; When the range of potential business questions is too broad (the "infinite question space") to be efficiently covered by pre-built, static APIs.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Rapid prototyping:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; When analysts need to quickly explore datasets and validate hypotheses before committing to the development of formal, production-grade dashboards.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Semantic interpretation:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; When you need an agent to resolve natural language ambiguities—such as mapping "last quarter" or "active users"—into specific, technical filter criteria automatically.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;&lt;span style="vertical-align: baseline;"&gt;Scenario 3: Conversational Analytics&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Focus:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Managed reasoning and verified logic.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Scenario 3 shifts the responsibility from a self-managed custom agent to a specialized, platform-native reasoning engine. By leveraging the &lt;/span&gt;&lt;a href="https://docs.cloud.google.com/gemini/data-agents/conversational-analytics-api/overview"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Conversational Analytics API&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; (currently in Pre-GA), you can deploy Data Agents - intelligent, governed layers that use enterprise-specific metadata and verified SQL to keep the LLM within strictly defined guardrails. This API translates natural language into precise queries across BigQuery, Looker, and Data Studio, while extending support to Google Cloud’s primary database solutions. We’ll consider BigQuery as our primary example for exploring these conversational insights.&lt;/span&gt;&lt;/p&gt;
&lt;h2&gt;&lt;span style="vertical-align: baseline;"&gt;The Power of Verified Queries&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Unlike generic LLM prompts that guess the SQL structure, these agents are grounded in your organization’s source of truth:&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Verified queries:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; You provide a library of verified queries (vetted, high-quality SQL examples) that the agent uses as a reference for complex joins and business logic. This ensures the agent follows your established coding patterns.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Managed context:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; The platform handles the retrieval of schema information and documentation, reducing the prompt bloat that often leads to hallucinations in custom-built agents.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Aligned outputs:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; By grounding the model in existing production SQL, the system ensures that AI-generated insights remain consistent with your official reporting metrics.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;This solution inherits existing BigQuery IAM permissions and provides a view of the reasoning and SQL behind every answer.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Can all of this be done with enough work on a fully customized agent? Yes. Is the custom approach practical, and time/cost efficient? Maybe not.&lt;/span&gt;&lt;/p&gt;
&lt;h2&gt;&lt;span style="vertical-align: baseline;"&gt;Implementation example&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;This approach leverages a specialized reasoning engine to handle intent discovery and data grounding. The developer no longer manages the translation logic: they simply call the managed agent.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-code"&gt;&lt;dl&gt;
    &lt;dt&gt;code_block&lt;/dt&gt;
    &lt;dd&gt;&amp;lt;ListValue: [StructValue([(&amp;#x27;code&amp;#x27;, &amp;#x27;from google.cloud import geminidataanalytics_v1beta as gda\r\ndef chat_data(user_query):\r\n    # Initialize the client for the Data Agent service\r\n    client = gda.DataAgentServiceClient() \r\n    # Path to your pre-configured Data Agent resource\r\n    agent_path = &amp;quot;projects/YOUR_PROJECT_ID/locations/us/dataAgents/YOUR_AGENT_ID&amp;quot;\r\n    # Execute: The agent uses its &amp;quot;Verified Queries&amp;quot; and metadata to find the answer\r\n    request = gda.ExecuteDataAgentRequest(name=agent_path, query=user_query)\r\n    response = client.execute_data_agent(request=request)\r\n    \r\n    # The agent returns both the natural language answer and the supporting data\r\n    return response.answer&amp;#x27;), (&amp;#x27;language&amp;#x27;, &amp;#x27;&amp;#x27;), (&amp;#x27;caption&amp;#x27;, &amp;lt;wagtail.rich_text.RichText object at 0x7f84a55a8a00&amp;gt;)])]&amp;gt;&lt;/dd&gt;
&lt;/dl&gt;&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;h2&gt;&lt;span style="vertical-align: baseline;"&gt;Analysis&lt;/span&gt;&lt;/h2&gt;
&lt;div align="left"&gt;
&lt;div style="color: #5f6368; overflow-x: auto; overflow-y: hidden; width: 100%;"&gt;
&lt;div style="color: #5f6368; overflow-x: auto; overflow-y: hidden; width: 100%;"&gt;
&lt;div style="color: #5f6368; overflow-x: auto; overflow-y: hidden; width: 100%;"&gt;
&lt;div style="color: #5f6368; overflow-x: auto; overflow-y: hidden; width: 100%;"&gt;
&lt;div style="color: #5f6368; overflow-x: auto; overflow-y: hidden; width: 100%;"&gt;
&lt;div style="color: #5f6368; overflow-x: auto; overflow-y: hidden; width: 100%;"&gt;&lt;table&gt;&lt;colgroup&gt;&lt;col/&gt;&lt;col/&gt;&lt;col/&gt;&lt;/colgroup&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Parameter&lt;/strong&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Rating&lt;/strong&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Impact&lt;/strong&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Flexibility&lt;/strong&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Medium&lt;/strong&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;High for the data sources it knows, but restricted by its Verified instructions and metadata scope.&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Cost Control&lt;/strong&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Medium&lt;/strong&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Grounded queries are typically more efficient than raw LLM generation.&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Latency&lt;/strong&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Medium&lt;/strong&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Higher than static queries, due to the multi-stage reasoning and summarization process.&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Maintenance&lt;/strong&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Low&lt;/strong&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Managed by Google; analysts focus on coaching the agent through metadata and verified SQL.&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;h2&gt;&lt;span style="vertical-align: baseline;"&gt;When to use Scenario 3?&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Scenario 3 is the ideal path for BigQuery-centric analysis where accuracy is non-negotiable. Choose this when you require:&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Governed trust:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Business logic (e.g., "Revenue") must follow pre-vetted verified queries every time.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Native intelligence:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Users need to perform complex tasks like forecasting or anomaly detection via BigQuery AI using natural language.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Auditability:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Stakeholders require a transparent reasoning path to see exactly how the AI arrived at its numbers.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;While Scenario 2 requires building a custom reasoning engine from scratch, Scenario 3 provides a platform-native experience that prioritizes verified logic over raw LLM generation.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;The limitation: This data companion is ultimately confined to the BigQuery or Google Cloud ecosystem. To scale an agentic workforce across heterogeneous platforms and tools, we must look toward vendor-agnostic standards like the Model Context Protocol (MCP).&lt;/span&gt;&lt;/p&gt;
&lt;h2&gt;&lt;span style="vertical-align: baseline;"&gt;Scenario 4: Managed MCP Tools&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Focus:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Standardized connectivity and decoupled architecture.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Scenario 4 introduces the Model Context Protocol (MCP)—an open-source standard designed to normalize how AI applications interact with data and tools. While previous scenarios rely on custom SDKs or platform-specific APIs, MCP provides a universal interface that separates the reasoning layer from the tool execution layer.&lt;/span&gt;&lt;/p&gt;
&lt;h2&gt;&lt;span style="vertical-align: baseline;"&gt;Standardized Abstraction&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;MCP enables tool discovery by exposing a manifest of capabilities that any compliant agent can ingest. This allows for a modular system where the data logic is "externalized" from the agent itself.&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;The MCP client:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; The reasoning engine (the LLM) that identifies the user's intent. Because it uses a standardized protocol, the client can connect to any MCP server and instantly discover what it can do without needing new integration code.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;The MCP server:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; The domain-specific service that exposes data and logic. The managed BigQuery MCP server doesn't just pass queries: it encapsulates the logic required to navigate Google Cloud’s infrastructure safely. It exposes tools such as:&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;ul&gt;
&lt;li aria-level="2" style="list-style-type: circle; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;code style="vertical-align: baseline;"&gt;list_dataset_ids&lt;/code&gt;&lt;span style="vertical-align: baseline;"&gt;: Context-aware discovery of the data environment.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="2" style="list-style-type: circle; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;code style="vertical-align: baseline;"&gt;get_dataset_info&lt;/code&gt;&lt;span style="vertical-align: baseline;"&gt;: Metadata retrieval for semantic grounding.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="2" style="list-style-type: circle; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;code style="vertical-align: baseline;"&gt;execute_sql&lt;/code&gt;&lt;span style="vertical-align: baseline;"&gt;: Controlled execution of data retrieval.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/ul&gt;
&lt;p style="padding-left: 40px;"&gt;&lt;span style="vertical-align: baseline;"&gt;(see &lt;/span&gt;&lt;a href="https://docs.cloud.google.com/bigquery/docs/reference/mcp"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;https://docs.cloud.google.com/bigquery/docs/reference/mcp&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; for the updated toolset).&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Access control is managed via standard IAM service accounts and lacks programmatic logic-checks.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;This decoupling future-proofs your AI stack. You can swap your LLM provider or upgrade your agent's reasoning model without rewriting the data access logic, because the interface between them remains consistent and governed.&lt;/span&gt;&lt;/p&gt;
&lt;h2&gt;&lt;span style="vertical-align: baseline;"&gt;Implementation example&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;In an MCP-based architecture, connecting an AI agent to a data source is reduced to a simple configuration handshake. Instead of writing custom integration logic, you provide an MCP-compliant client (such as the Gemini CLI or a modern IDE) with a manifest defining the server’s location and security requirements.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;The following manifest allows the client to connect to Google’s managed BigQuery MCP server, enabling it to dynamically discover and execute data tools without a single line of custom code:&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-code"&gt;&lt;dl&gt;
    &lt;dt&gt;code_block&lt;/dt&gt;
    &lt;dd&gt;&amp;lt;ListValue: [StructValue([(&amp;#x27;code&amp;#x27;, &amp;#x27;{\r\n  &amp;quot;mcpServers&amp;quot;: {\r\n    &amp;quot;bigquery&amp;quot;: {\r\n      &amp;quot;httpUrl&amp;quot;: &amp;quot;https://bigquery.googleapis.com/mcp&amp;quot;,\r\n      &amp;quot;authProviderType&amp;quot;: &amp;quot;google_credentials&amp;quot;,\r\n      &amp;quot;oauth&amp;quot;: {\r\n        &amp;quot;scopes&amp;quot;: [\r\n          &amp;quot;https://www.googleapis.com/auth/bigquery&amp;quot;\r\n        ]\r\n      }\r\n    }\r\n  }\r\n}&amp;#x27;), (&amp;#x27;language&amp;#x27;, &amp;#x27;&amp;#x27;), (&amp;#x27;caption&amp;#x27;, &amp;lt;wagtail.rich_text.RichText object at 0x7f84a55a8df0&amp;gt;)])]&amp;gt;&lt;/dd&gt;
&lt;/dl&gt;&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;h2&gt;&lt;span style="vertical-align: baseline;"&gt;Analysis&lt;/span&gt;&lt;/h2&gt;
&lt;div align="left"&gt;
&lt;div style="color: #5f6368; overflow-x: auto; overflow-y: hidden; width: 100%;"&gt;
&lt;div style="color: #5f6368; overflow-x: auto; overflow-y: hidden; width: 100%;"&gt;
&lt;div style="color: #5f6368; overflow-x: auto; overflow-y: hidden; width: 100%;"&gt;
&lt;div style="color: #5f6368; overflow-x: auto; overflow-y: hidden; width: 100%;"&gt;
&lt;div style="color: #5f6368; overflow-x: auto; overflow-y: hidden; width: 100%;"&gt;
&lt;div style="color: #5f6368; overflow-x: auto; overflow-y: hidden; width: 100%;"&gt;&lt;table&gt;&lt;colgroup&gt;&lt;col/&gt;&lt;col/&gt;&lt;col/&gt;&lt;/colgroup&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Parameter&lt;/strong&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Rating&lt;/strong&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Impact&lt;/strong&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Flexibility&lt;/strong&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;High&lt;/strong&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Agents can contextually explore any table the MCP server exposes.&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Cost Control&lt;/strong&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Medium&lt;/strong&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Tools are standardized, but a curious agent can still trigger large scans.&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Latency&lt;/strong&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Medium&lt;/strong&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Includes standard overhead for the protocol handshake and tool-calling.&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Maintenance&lt;/strong&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Low&lt;/strong&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Uses a managed MCP Server which requires no maintenance. The work is only on the MCP client.&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;h2&gt;&lt;span style="vertical-align: baseline;"&gt;When to use Scenario 4?&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Scenario 4 is the architectural choice for multi-agent environments that require standardized data connectivity with minimal maintenance overhead. It is the ideal path when you require:&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Managed infrastructure:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; You want to offload the security, execution, and maintenance of your toolset by consuming a managed BigQuery MCP server rather than building and patching custom data-retrieval code.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;LLM portability:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; You need an open-standard interface, allowing you to use the same tools across different LLMs or agent frameworks without rewriting proprietary function calls.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Autonomous discovery:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Your agents must navigate and inspect complex datasets dynamically. MCP’s standardized endpoints allow agents to crawl metadata and schema information autonomously to determine the best path for a query.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;&lt;span style="vertical-align: baseline;"&gt;Scenario 5: Custom Hosted MCP Servers&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Focus:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Architectural extensibility and custom tool definition.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Scenario 5 takes the standardized connectivity of Scenario 4 and adds complete control by replacing the managed service with a custom-built MCP server. Typically hosted on scalable infrastructure like Cloud Run, you can rely on open source solutions such as &lt;/span&gt;&lt;a href="https://github.com/googleapis/mcp-toolbox" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;MCP toolbox&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;. This approach removes the guardrails of managed offerings, granting engineering teams full freedom to define specialized tools, integrate disparate third-party APIs, and implement proprietary execution logic within the protocol.&lt;/span&gt;&lt;/p&gt;
&lt;h2&gt;&lt;span style="vertical-align: baseline;"&gt;Architectural Advantages of Custom MCP&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Shifting to a custom-hosted MCP server moves operational complexity from the LLM prompt to the server-side logic, unlocking three critical capabilities:&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Deterministic tool tailoring:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Instead of forcing an agent to navigate raw, sprawling schemas, developers define high-level functions with specific data shapes. This replaces probabilistic SQL generation with deterministic execution, virtually eliminating schema-based hallucinations.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Unified source orchestration&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;: A custom MCP server acts as a consolidated gateway. Within a single tool execution, the server can orchestrate calls across BigQuery, external SaaS APIs, and legacy on-premises systems. The agent receives a pre-processed, unified response, abstracting away the multi-source complexity.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Programmable governance:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; This scenario enables code-level security difficult to implement in managed environments. You can implement granular controls directly within the protocol layer, such as:&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;ul&gt;
&lt;li aria-level="2" style="list-style-type: circle; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;Dynamic PII masking: Automatically redacting sensitive data before it reaches the LLM.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="2" style="list-style-type: circle; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;Custom authentication: Injecting enterprise-specific middleware.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="2" style="list-style-type: circle; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;Contextual rate limiting: Throttling tool usage based on the end-user’s identity or cost center.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/ul&gt;
&lt;h2&gt;&lt;span style="vertical-align: baseline;"&gt;Implementation example&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;In this scenario, when using &lt;/span&gt;&lt;a href="https://github.com/googleapis/mcp-toolbox" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;MCP toolbox&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, you use a declarative tools.yaml file to define the interface of your custom MCP server. This file acts as the absolute boundary for your agent—it defines the BigQuery connection, enables safe discovery for schema inspection, and wraps complex, multi-table joins into a single, parameterized tool.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-code"&gt;&lt;dl&gt;
    &lt;dt&gt;code_block&lt;/dt&gt;
    &lt;dd&gt;&amp;lt;ListValue: [StructValue([(&amp;#x27;code&amp;#x27;, &amp;#x27;# ----------------------------------------------------------------------\r\n# Minimal Configuration\r\n# Dataset: bigquery-public-data.thelook_ecommerce\r\n# ----------------------------------------------------------------------\r\n\r\nsources:\r\n  bq-thelook-ecommerce:\r\n    kind: &amp;quot;bigquery&amp;quot;\r\n    project: &amp;quot;${PROJECT_ID}&amp;quot;\r\n    location: &amp;quot;${BQ_LOCATION}&amp;quot;\r\n\r\ntools:\r\n  # 1. Discovery Tool: Helps the agent understand the database schema\r\n  bigquery_get_table_info:\r\n    kind: bigquery-get-table-info\r\n    source: bq-thelook-ecommerce\r\n    description: Retrieves table metadata and schema details. Run this before executing custom queries.\r\n\r\n  # 2. Execution Tool: Parameterized SQL for safe, repeatable data fetches\r\n  thelook_get_user_orders_summary:\r\n    kind: bigquery-sql\r\n    source: bq-thelook-ecommerce\r\n    statement: |\r\n      SELECT\r\n        orders.user_id,\r\n        COUNT(DISTINCT orders.order_id) AS count_of_orders,\r\n        COUNT(order_items.id) AS count_of_items,\r\n        SAFE_DIVIDE(COUNT(order_items.id), COUNT(DISTINCT orders.order_id)) AS avg_items_per_order\r\n      FROM `bigquery-public-data.thelook_ecommerce.orders` AS orders\r\n      INNER JOIN `bigquery-public-data.thelook_ecommerce.order_items` AS order_items\r\n        ON orders.order_id = order_items.order_id \r\n        AND orders.user_id = order_items.user_id\r\n      WHERE orders.status = &amp;quot;Complete&amp;quot; \r\n        AND orders.user_id = @user_id\r\n      GROUP BY orders.user_id;\r\n    description: Retrieves an order summary for a specific user ID, including total completed orders, items purchased, and average items per order.\r\n    parameters:\r\n      - name: user_id\r\n        type: integer\r\n        description: The unique identifier of the user.\r\n\r\ntoolsets:\r\n  # Binds the tools together for agent use\r\n  thelook_core_insights_toolset:\r\n    - bigquery_get_table_info\r\n    - thelook_get_user_orders_summary&amp;#x27;), (&amp;#x27;language&amp;#x27;, &amp;#x27;&amp;#x27;), (&amp;#x27;caption&amp;#x27;, &amp;lt;wagtail.rich_text.RichText object at 0x7f84a55a8970&amp;gt;)])]&amp;gt;&lt;/dd&gt;
&lt;/dl&gt;&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;h2&gt;&lt;span style="vertical-align: baseline;"&gt;Analysis&lt;/span&gt;&lt;/h2&gt;
&lt;div align="left"&gt;
&lt;div style="color: #5f6368; overflow-x: auto; overflow-y: hidden; width: 100%;"&gt;
&lt;div style="color: #5f6368; overflow-x: auto; overflow-y: hidden; width: 100%;"&gt;
&lt;div style="color: #5f6368; overflow-x: auto; overflow-y: hidden; width: 100%;"&gt;
&lt;div style="color: #5f6368; overflow-x: auto; overflow-y: hidden; width: 100%;"&gt;
&lt;div style="color: #5f6368; overflow-x: auto; overflow-y: hidden; width: 100%;"&gt;
&lt;div style="color: #5f6368; overflow-x: auto; overflow-y: hidden; width: 100%;"&gt;&lt;table&gt;&lt;colgroup&gt;&lt;col/&gt;&lt;col/&gt;&lt;col/&gt;&lt;/colgroup&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Parameter&lt;/strong&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Rating&lt;/strong&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Impact&lt;/strong&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Flexibility&lt;/strong&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;High&lt;/strong&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Supports cross-domain orchestration (e.g., BigQuery + legacy APIs) and unlimited custom tool definitions.&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Cost Control&lt;/strong&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;High&lt;/strong&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Allows developers to inject programmatic query cost estimation and budget thresholds prior to execution.&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Latency&lt;/strong&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;High&lt;/strong&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Custom multi-hop orchestration, network transit, and container cold-starts introduce latency.&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Maintenance&lt;/strong&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;High&lt;/strong&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Requires full ownership of the application lifecycle, including CI/CD, dependency patching, and container scaling.&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;h2&gt;&lt;span style="vertical-align: baseline;"&gt;When to use Scenario 5?&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;This architecture is the power user choice, essential for highly regulated environments and hybrid infrastructures where managed services fall short. Implement this approach when your design requires:&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Secure hybrid orchestration:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; You must bridge BigQuery with private on-premises systems or restricted APIs, returning a pre-processed, consolidated payload that the agent can use immediately without navigating the raw network gap.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Hardened business logic:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; You need to move complex, non-negotiable calculations off the LLM and into a controlled code environment, exposing only high-level "expert" tools to guarantee absolute accuracy.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Centralized enterprise tooling:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; You want to maintain a single, governed source of truth for your proprietary tools that can be served uniformly across different LLM providers or internal frameworks without vendor lock-in.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;&lt;span style="vertical-align: baseline;"&gt;Conclusion: The Foundation of the Agentic Era&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;The journey from Scenario 1 to Scenario 5 traces a clear technical evolution: we are moving away from rigid, hard-coded data silos and toward a world of autonomous discovery and standardized connectivity. By adopting frameworks like the Model Context Protocol (MCP), organizations can decouple their data logic from their AI models, ensuring that as LLMs evolve, their access to the enterprise "brain" remains seamless, scalable, and vendor-agnostic.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;However, increased autonomy does not mean decreased oversight. While we haven’t touched on these points in depth in this article, we must adhere to a fundamental truth: data access must be governed and controlled using governance and security tools. Regardless of the access scenario—more or less agentic depending on the use case—security, credentials, quality management, and standardized governance are absolutely essential.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;On a more lighthearted note, it’s worth remembering that the golden rule of computing still applies: "Garbage In, Garbage Out". You can build the most sophisticated, autonomous agentic layer in the world, but if you feed it messy, uncurated data, you’ll simply get "garbage" answers at a much faster and more confident pace. Sophisticated AI doesn't fix bad data: it just makes it more visible. Maintaining high data quality is not just a legacy requirement—it is the fuel that makes the agentic engine actually work.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;</description><pubDate>Mon, 18 May 2026 16:00:00 +0000</pubDate><guid>https://cloud.google.com/blog/products/data-analytics/building-an-agentic-data-layer-on-google-cloud-5-key-scenarios/</guid><category>Data Analytics</category><og xmlns:og="http://ogp.me/ns#"><type>article</type><title>Beyond the Query: 5 Scenarios Laying the Foundation for the Agentic Era</title><description></description><site_name>Google</site_name><url>https://cloud.google.com/blog/products/data-analytics/building-an-agentic-data-layer-on-google-cloud-5-key-scenarios/</url></og><author xmlns:author="http://www.w3.org/2005/Atom"><name>Marco Liotta</name><title>Technical Account Manager, Google Cloud</title><department></department><company></company></author><author xmlns:author="http://www.w3.org/2005/Atom"><name>Lorenzo Caggioni</name><title>Data &amp; AI Architect, Google Cloud</title><department></department><company></company></author></item><item><title>What we announced in streaming AI at Next ‘26</title><link>https://cloud.google.com/blog/products/data-analytics/streaming-ai-news-from-next26/</link><description>&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Every device, user, and microservice generates data. Ingesting this data, extracting meaning and insights, and driving business decisions in real time has the potential to deliver transformational business value.The rise of agentic AI represents an opportunity for users to overcome the challenges inherent in real-time analytics. But while agentic AI has the potential to accelerate adoption, users face a new set of challenges with effectively leveraging real-time data:&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Real-time context is hard to implement. &lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;Teams will choose to incorporate data from batch-oriented approaches, like periodic database syncs and scheduled refreshes. Agents have to either rely on stale data or require memory-intensive context windows. This “context lag” makes them ineffective for real-time agentic tasks like fraud detection, dynamic e-commerce recommendations, or autonomous supply chain adjustments. &lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Real-time systems are inflexible. &lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;Agentic tools lack the modularity to adapt to customer-specific requirements, forcing organizations to make difficult architectural choices. Data practitioners need a platform to meet them where they are, where they are free to make the tradeoff between latency, accuracy, and cost. &lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Google Cloud provides a tightly integrated, unified streaming data platform that delivers both fully managed, Google Cloud-native services, as well as open-source-compatible services, and that come together to power large-scale AI training and inference. The platform is comprised of five key services: &lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Pub/Sub:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Highly reliable, serverless, and fully managed service for messaging and event streaming that’s integrated with BigQuery, Dataflow, and Cloud Storage. Pub/Sub is utilized by organizations like Anthropic. &lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Dataflow&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;: A serverless engine for batch, streaming, and now agentic AI. Leading enterprise organizations like &lt;/span&gt;&lt;a href="https://cloud.google.com/blog/topics/partners/palo-alto-networks-builds-a-multi-tenant-unified-data-platform?e=48754805"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Palo Alto Networks&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; use Dataflow, as do Google services like Waymo and Google Maps. For instance, Waymo cars use Dataflow to help it “see” the world, plan their routes, and predict obstacles. Before a car hits the actual pavement, it “drives” millions of miles in a simulator, with Dataflow generating training datasets and validating the models that are used for autonomous driving.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Managed Service for Apache Kafka:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; The fully managed way to run the popular open source streaming storage and data integration system on Google Cloud that’s highly reliable, secure, and cost efficient. Across the largest enterprises and startups, Apache Kafka serves as a staging location for critical training data and real time updates to AI agent context. &lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;BigQuery&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;: A unified platform for real-time ingestion and analysis. The Storage Write API provides high-throughput streaming into BigQuery and Lakehouse for Apache Iceberg tables with exactly-once delivery semantics and stream-level transactions. Additionally, BigQuery continuous queries enable real-time AI inference directly within the data pipeline by calling generative functions like AI.GENERATE_TEXT, allowing for immediate insights as data is ingested.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Bigtable&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;: Google’s NoSQL real-time database for processing streaming data from Pub/Sub and Dataflow automatically using  &lt;/span&gt;&lt;a href="https://docs.cloud.google.com/bigtable/docs/continuous-materialized-views"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;continuous materialized views&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, delivering results in seconds that are ready for low-latency serving using Bigtable’s &lt;/span&gt;&lt;a href="https://docs.cloud.google.com/bigtable/docs/in-memory-overview"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;in-memory tier&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Moving from insight to autonomous action&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;At Google Cloud Next, we announced a set of &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;streaming AI capabilities&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; to the &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;Agentic Data Cloud&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;, providing autonomous agents with instant context and enabling real-time actions, helping organizations feed real-time context to their AI agents.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;For instance, imagine a &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;supply chain agent&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; that doesn't just monitor IoT data, but autonomously reroutes a shipment around bad weather, confirms new delivery windows with the receiving warehouse, and updates the customer's portal — all before a human supervisor is even aware of the problem. Consider a &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;financial services agent&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; that identifies a fraudulent transaction pattern, instantly freezes the account, communicates with the customer via their preferred channel, and initiates a new card shipment — all within seconds of the suspicious activity. Whether you’re creating embeddings on streaming data to power search, or building a sophisticated multi-agent fraud detection system, these new capabilities add powerful tools to your toolbox. &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Let’s take a closer look at these new capabilities. &lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;New streaming AI capabilities&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;At Next ‘26, we launched tightly integrated capabilities to our platform across three key areas:&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-image_full_width"&gt;






  
    &lt;div class="article-module h-c-page"&gt;
      &lt;div class="h-c-grid"&gt;
  

    &lt;figure class="article-image--large
      
      
        h-c-grid__col
        h-c-grid__col--6 h-c-grid__col--offset-3
        
        
      "
      &gt;

      
      
        
        &lt;img
            src="https://storage.googleapis.com/gweb-cloudblog-publish/images/image1_rME1Dt8.max-1000x1000.png"
        
          alt="image1"&gt;
        
        &lt;/a&gt;
      
    &lt;/figure&gt;

  
      &lt;/div&gt;
    &lt;/div&gt;
  




&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;1. &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;Providing real-time, enriched context for agents&lt;/strong&gt;&lt;/p&gt;
&lt;p role="presentation" style="padding-left: 40px;"&gt;1.1. &lt;a href="https://docs.cloud.google.com/pubsub/docs/smts/ai-inference-smt"&gt;&lt;strong style="text-decoration: underline; vertical-align: baseline;"&gt;Pub/Sub AI Inference SMT&lt;/strong&gt;&lt;/a&gt;&lt;strong style="vertical-align: baseline;"&gt; &lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;(GA)&lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;: &lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;You can now run inference on messages streamed through Pub/Sub. Data practitioners can choose any models available on Gemini Enterprise Agent Platform. Pub/Sub makes the inference call and appends the result to each message before sending it downstream, bringing Pub/Sub’s simplicity together with the Gemini Enterprise’s fully managed tools.&lt;/span&gt;&lt;/p&gt;
&lt;p role="presentation" style="padding-left: 40px;"&gt;1.2. &lt;a href="https://docs.cloud.google.com/pubsub/docs/bigtable-subscriptions"&gt;&lt;strong style="text-decoration: underline; vertical-align: baseline;"&gt;Pub/Sub Bigtable subscriptions&lt;/strong&gt;&lt;/a&gt;&lt;strong style="vertical-align: baseline;"&gt; &lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;(Preview)&lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Stream Pub/Sub data directly to Bigtable. Pub/Sub Bigtable subscriptions directly materialize event data from a Pub/Sub topic into a Bigtable table, eliminating the need for custom pipelines and dramatically simplifying your streaming architecture. For instance, you can easily ingest vector embeddings into Bigtable to power semantic search workloads. &lt;/span&gt;&lt;/p&gt;
&lt;p role="presentation" style="padding-left: 40px;"&gt;1.3. &lt;a href="https://docs.cloud.google.com/bigquery/docs/continuous-queries#stateful_processing_with_joins_and_windowing_aggregations"&gt;&lt;strong style="text-decoration: underline; vertical-align: baseline;"&gt;BigQuery continuous queries stateful data processing&lt;/strong&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; (Preview): &lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;BigQuery continuous queries can now perform complex correlations between multiple data streams using JOINs and calculate metrics over consistent time intervals with tumbling window aggregations. This enables sophisticated analysis, such as calculating 30-minute averages or correlating events across different streams, directly as data is ingested into BigQuery. Furthermore, you can integrate AI directly into your data pipelines by calling generative functions like AI.GENERATE_TEXT, as well as materialize continuous query SQL results into BigQuery tables or export them to operational sinks like Bigtable, Spanner, and Pub/Sub for real-time reverse ETL.&lt;/span&gt;&lt;/p&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;2. &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;Direct agents to manage your resources&lt;/strong&gt;&lt;/p&gt;
&lt;p role="presentation" style="padding-left: 40px;"&gt;&lt;span style="vertical-align: baseline;"&gt;2.1. &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;Model Context Protocol (MCP) support for &lt;/strong&gt;&lt;a href="https://docs.cloud.google.com/pubsub/docs/use-pubsub-mcp"&gt;&lt;strong style="text-decoration: underline; vertical-align: baseline;"&gt;Pub/Sub&lt;/strong&gt;&lt;/a&gt;&lt;strong style="vertical-align: baseline;"&gt;, &lt;/strong&gt;&lt;a href="https://docs.cloud.google.com/managed-service-for-apache-kafka/docs/use-managed-service-for-apache-kafka-mcp"&gt;&lt;strong style="text-decoration: underline; vertical-align: baseline;"&gt;Managed service for Apache Kafka&lt;/strong&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, &lt;/span&gt;&lt;a href="https://docs.cloud.google.com/bigtable/docs/use-bigtable-mcp"&gt;&lt;strong style="text-decoration: underline; vertical-align: baseline;"&gt;Bigtable&lt;/strong&gt;&lt;/a&gt;&lt;strong style="vertical-align: baseline;"&gt; &lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;and &lt;/span&gt;&lt;a href="https://docs.cloud.google.com/bigquery/docs/use-bigquery-mcp"&gt;&lt;strong style="text-decoration: underline; vertical-align: baseline;"&gt;BigQuery&lt;/strong&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; (GA)&lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Your agents can manage Pub/Sub,Managed service for Apache Kafka services, and BigQuery using fully managed MCP endpoints. Agents can also publish messages to Pub/Sub. &lt;/span&gt;&lt;/p&gt;
&lt;p style="padding-left: 40px;"&gt;2.2. &lt;a href="https://adk.dev/integrations/?topic=google" rel="noopener" target="_blank"&gt;&lt;strong style="text-decoration: underline; vertical-align: baseline;"&gt;ADK integration&lt;/strong&gt;&lt;/a&gt;&lt;strong style="vertical-align: baseline;"&gt; &lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;(GA)&lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;: &lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;Your agents can interact with your real-time data stored in Pub/Sub, Bigtable, BigQuery, or other Google Cloud services using pre-built ADK integrations. Developers can build agents acting on real-time context without having to implement complex configurations or plumbing.&lt;/span&gt;&lt;/p&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;3. &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;Combine multi-agent systems with your data processing&lt;/strong&gt;&lt;/p&gt;
&lt;p role="presentation" style="padding-left: 40px;"&gt;&lt;span style="vertical-align: baseline;"&gt;3.1. &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;Event-driven autonomous agents&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;: As agents become core to our workflows, real-time data pipelines must evolve to incorporate them directly into the stream. We have enabled this capability by treating agentic logic as a &lt;/span&gt;&lt;a href="https://beam.apache.org/releases/pydoc/current/apache_beam.ml.inference.agent_development_kit.html" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;first-class citizen&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; within the Dataflow pipeline. You can now incorporate your agent code using the &lt;/span&gt;&lt;a href="https://adk.dev/" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Agent Development Kit (ADK)&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; and deploy it as a specialized node using the &lt;/span&gt;&lt;code style="vertical-align: baseline;"&gt;RunInference&lt;/code&gt;&lt;span style="vertical-align: baseline;"&gt; transform and the new &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;ADKAgentModelHandler&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;. &lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;Key advantages of this approach include:&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li style="list-style-type: none;"&gt;
&lt;ul&gt;
&lt;li role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Massive scalability:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Leverage Dataflow’s architecture to process high velocity events upstream and keep hundreds of agents sessions active simultaneously, each driven by specific incoming events.&lt;/span&gt;&lt;/li&gt;
&lt;li role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Pre-processing power:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Dataflow handles the heavy lifting of complex data enrichment, delivering a “ready-to-act” context directly to the agent so it can focus on reasoning.&lt;/span&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p role="presentation" style="padding-left: 40px;"&gt;&lt;span style="vertical-align: baseline;"&gt;3.2. &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;Dataflow Unified embeddings Sinks:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; We are introducing unified embedding generation directly within the data stream to eliminate “context lag”. You can now transform incoming data into high-dimensional vectors at low latency using Dataflow. These real-time embeddings are then seamlessly materialized into our expanded suite of high-throughput vector sinks, which now includes &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;Cloud Spanner&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; (featuring its new built-in vector search) and &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;AlloyDB&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;, providing you with an up to date vector database for semantic search needs as well as for your autonomous agents making RAG calls with an instantly searchable and perfectly synchronized long-term memory. This feature works with both remote and local models, for example &lt;/span&gt;&lt;a href="https://developers.googleblog.com/en/deploying-embeddinggemma-at-scale-with-dataflow/" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Gemma&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;. &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;As we continue to build out the platform, customers can expect to see even tighter integrations and more powerful capabilities. We look forward to seeing what you build with these new capabilities.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;</description><pubDate>Mon, 18 May 2026 16:00:00 +0000</pubDate><guid>https://cloud.google.com/blog/products/data-analytics/streaming-ai-news-from-next26/</guid><category>Streaming</category><category>Google Cloud Next</category><category>Data Analytics</category><og xmlns:og="http://ogp.me/ns#"><type>article</type><title>What we announced in streaming AI at Next ‘26</title><description></description><site_name>Google</site_name><url>https://cloud.google.com/blog/products/data-analytics/streaming-ai-news-from-next26/</url></og><author xmlns:author="http://www.w3.org/2005/Atom"><name>Jagdeep Singh</name><title>Director Product Management</title><department></department><company></company></author><author xmlns:author="http://www.w3.org/2005/Atom"><name>Prateek Duble</name><title>Group Product Manager</title><department></department><company></company></author></item></channel></rss>