<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:media="http://search.yahoo.com/mrss/"><channel><title>Serverless</title><link>https://cloud.google.com/blog/products/serverless/</link><description>Serverless</description><atom:link href="https://cloudblog.withgoogle.com/blog/products/serverless/rss/" rel="self"></atom:link><language>en</language><lastBuildDate>Thu, 09 Apr 2026 16:00:02 +0000</lastBuildDate><image><url>https://cloud.google.com/blog/products/serverless/static/blog/images/google.a51985becaa6.png</url><title>Serverless</title><link>https://cloud.google.com/blog/products/serverless/</link></image><item><title>How Estée Lauder Companies uses Cloud Run worker pools for its pull-based agentic workloads</title><link>https://cloud.google.com/blog/products/serverless/cloud-run-worker-pools-at-estee-lauder-companies/</link><description>&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Cloud Run has long provided developers with a straightforward, opinionated platform for running code. You can easily deploy request-driven web applications using Cloud Run services, or execute run-to-completion batch processing with Cloud Run jobs. However, as developers build more complex applications, like pipelines that process continuous streams of data or distributed AI workloads, they need an environment designed for continuous, background execution.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Estée Lauder Companies got just that with &lt;/span&gt;&lt;a href="https://docs.cloud.google.com/run/docs/deploy-worker-pools"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Cloud Run worker pools&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, which transform Cloud Run from a platform for web workloads and background tasks into a platform for pull-based workloads. Cloud Run worker pools are now generally available.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Estée Lauder Companies’ Rostrum platform is a polymorphic chat service for LLM-powered applications that originally ran as a standalone Cloud Run service. While the simple architecture worked for internal tools with predictable traffic, the team faced a major hurdle: consumer-facing traffic during the upcoming holiday shopping season. To launch their first consumer-facing generative AI application, &lt;/span&gt;&lt;a href="https://www.jomalone.com/ai-scent-advisor" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Jo Malone London’s AI Scent Advisor&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, they needed an architecture that would sustain the load of AI prompts from thousands of simultaneous users.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;In just a few weeks, Estée Lauder Companies migrated to a producer-consumer model using Cloud Run worker pools. The web tier, a FastAPI application deployed as a Cloud Run service, acts as the producer, instantly publishing user messages to Cloud Pub/Sub. The worker pool deployments act as “always-on” consumers, pulling messages from the queue to handle LLM inference.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;By decoupling the user-facing web tier from LLM operations, Estée Lauder Companies achieved:&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;100% message durability: &lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;Pub/Sub acts as a buffer, so even during holiday spikes no user message is lost.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Strong UI latency SLAs: &lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;Server-side rendering is decoupled from message processing load. &lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Minimal operations overhead:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; The team spent virtually no time managing servers, allowing them to focus on the user experience rather than infrastructure.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;This modular architecture now serves as the blueprint for Estée Lauder Companies to rapidly launch specialized AI advisors across its diverse house of brands.&lt;/span&gt;&lt;/p&gt;
&lt;p style="padding-left: 40px;"&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;"The Jo Malone London AI Scent Advisor chains multiple LLM and tool calls — conversational discovery, deterministic scoring, copy generation — in a pipeline that had to run reliably at consumer scale without us managing infrastructure. Cloud Run worker pools was exactly the right primitive, and working directly with the product team as early adopters gave us the confidence to build on it ahead of GA. It's now the foundation for us to bring AI advisors to brands across the Estée Lauder Companies portfolio."&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt; - Chris Curro, Principal Machine Learning Engineer, The Estée Lauder Companies&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-image_full_width"&gt;
&lt;div class="article-module h-c-page"&gt;
&lt;div class="h-c-grid"&gt;
&lt;figure class="article-image--large h-c-grid__col h-c-grid__col--6 h-c-grid__col--offset-3"&gt;
&lt;img src="https://storage.googleapis.com/gweb-cloudblog-publish/images/1_bo5uUuL.max-1000x1000.png" alt="1"&gt;
&lt;/figure&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Serverless for pull-based and distributed workloads&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Traditional serverless models often force background work into an HTTP push format, which can lead to timeouts, overscaling, or message loss during traffic surges. Cloud Run worker pools solve this with an always-on environment in which worker pool instances pull tasks or messages from a queue at their own pace. The result is built-in backpressure that protects your infrastructure from crashing under load.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Unlike Cloud Run services, worker pools are designed for workloads requiring non-HTTP protocols. When a worker pool is attached to a VPC network, every instance receives a private IP address. This enables high-performance Layer 4 (L4) ingress, allowing you to host services previously incompatible with the Google Cloud serverless platform.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;With the GA of worker pools, Cloud Run supports major new categories of workloads:&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Pull-based workloads: &lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;Worker pools provide a reliable environment for running and scaling workloads that continuously pull messages from queues like Pub/Sub, &lt;/span&gt;&lt;a href="https://cloud.google.com/blog/products/serverless/exploring-cloud-run-worker-pools-and-kafka-autoscaler?e=48754805"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Kafka&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, GitHub runners, or Redis task queues.&lt;/span&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong style="vertical-align: baseline;"&gt;Distributed AI/ML workloads: &lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;Worker pools are a great fit for distributed LLM training or fine-tuning workloads. At GA, worker pools support NVIDIA L4 and RTX PRO 6000 (Blackwell) GPUs.&lt;/span&gt;&lt;/li&gt;
&lt;/ul&gt;&lt;/div&gt;
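&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;The pull model described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: an in-memory queue.Queue stands in for a Pub/Sub subscription, and run_pipeline and handle are hypothetical names, not part of any Cloud Run or Pub/Sub API. The producer enqueues and returns immediately, while each worker pulls its next message only after finishing the previous one, which is where the backpressure comes from.&lt;/span&gt;&lt;/p&gt;
&lt;pre&gt;
```python
import queue
import threading


def run_pipeline(messages, num_workers=2):
    """Process messages with pull-based workers and return the sorted results."""
    backlog = queue.Queue()  # stand-in for a durable message queue (assumption)
    results = []
    results_lock = threading.Lock()

    def handle(message):
        # Hypothetical placeholder for slow work such as an LLM call.
        return message.upper()

    def worker():
        while True:
            message = backlog.get()   # the worker pulls only when it is ready
            if message is None:       # sentinel value: no more work
                backlog.task_done()
                return
            outcome = handle(message)
            with results_lock:
                results.append(outcome)
            backlog.task_done()       # acknowledge only after processing

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for thread in threads:
        thread.start()

    # Producer side: enqueue and move on, never waiting for the slow work.
    for message in messages:
        backlog.put(message)
    for _ in threads:
        backlog.put(None)             # one sentinel per worker
    for thread in threads:
        thread.join()
    return sorted(results)


if __name__ == "__main__":
    print(run_pipeline(["hello", "scent", "advisor"]))
```
&lt;/pre&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Because workers take one message at a time, a burst of traffic grows the backlog in the queue instead of the load on each worker.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;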
&lt;div class="block-image_full_width"&gt;
&lt;div class="article-module h-c-page"&gt;
&lt;div class="h-c-grid"&gt;
&lt;figure class="article-image--large h-c-grid__col h-c-grid__col--6 h-c-grid__col--offset-3"&gt;
&lt;img src="https://storage.googleapis.com/gweb-cloudblog-publish/images/2_vhXTfXn.max-1000x1000.png" alt="2"&gt;
&lt;/figure&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;One of the most significant advantages of this new offering is its cost-efficiency, as worker pools can be approximately 40% cheaper than request-driven Services or Jobs for long-running background tasks.&lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Scaling pull-based workloads using Cloud Run External Metrics Autoscaler (CREMA)&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Worker pools run a set of instances that do background work, but they still need a signal to scale. To bridge this gap, we recently built, and open-sourced, &lt;/span&gt;&lt;a href="https://github.com/GoogleCloudPlatform/cloud-run-external-metrics-autoscaling" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Cloud Run External Metrics Autoscaler (CREMA)&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;CREMA uses &lt;/span&gt;&lt;a href="https://keda.sh/docs/2.18/scalers/" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;KEDA's library of scalers&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, including Kafka, Pub/Sub, GitHub Actions, and Prometheus, to automatically scale your instances based on metrics emitted by these external sources. By smoothly handling traffic surges and scaling back to zero during idle periods, CREMA helps you optimize both performance and cost.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;To start scaling, all you need to do is deploy CREMA as a Cloud Run service, and then define your scaling logic in a single YAML configuration file that instructs CREMA which external sources to monitor and which worker pool to scale.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Here is an example of what it looks like to automatically scale a worker pool based on GitHub Runner queue depth:&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-code"&gt;&lt;pre&gt;&lt;code&gt;apiVersion: crema/v1
kind: CremaConfig
metadata:
  name: gh-demo
spec:
  scaledObjects:
    - spec:
        scaleTargetRef:
          name: projects/example-project/locations/us-central1/workerpools/example-workerpool
        triggers:
          - type: github-runner
            metadata:
              owner: repo-owner
              runnerScope: repo
              repos: repo-name
              targetWorkflowQueueLength: 1&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
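&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;To build intuition for what a target like targetWorkflowQueueLength means, here is a rough Python sketch of the ceiling-division rule that KEDA-style scalers commonly apply to queue-length targets. It assumes CREMA behaves similarly, which this post does not spell out. The function name and the minimum and maximum bounds are hypothetical and are not CREMA settings.&lt;/span&gt;&lt;/p&gt;
&lt;pre&gt;
```python
import math


def desired_instances(queue_length, target_per_instance,
                      min_instances=0, max_instances=10):
    """Estimate how many worker instances a queue-length target asks for.

    Assumption: one instance is wanted per `target_per_instance` queued items,
    rounded up, then clamped to the allowed range.
    """
    if not target_per_instance > 0:
        raise ValueError("target_per_instance must be positive")
    wanted = math.ceil(queue_length / target_per_instance)
    return max(min_instances, min(max_instances, wanted))


if __name__ == "__main__":
    for depth in (0, 1, 5, 25):
        print(depth, desired_instances(depth, target_per_instance=1))
```
&lt;/pre&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;With a target of 1, five queued workflow runs would ask for five instances, and an empty queue would allow the pool to scale to zero.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;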
&lt;div class="block-paragraph_advanced"&gt;&lt;h3&gt;&lt;span style="vertical-align: baseline;"&gt;Get started&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;You can deploy your first worker pool today by referring to the &lt;/span&gt;&lt;a href="https://docs.cloud.google.com/run/docs/deploy-worker-pools"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;documentation&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;. To implement advanced, queue-aware scaling, explore the&lt;/span&gt;&lt;a href="https://github.com/GoogleCloudPlatform/cloud-run-external-metrics-autoscaling" rel="noopener" target="_blank"&gt;&lt;span style="vertical-align: baseline;"&gt; &lt;/span&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;CREMA open-source repository&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; to connect your workloads to KEDA-supported scalers.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;To implement high-performance distributed workloads using Cloud Run worker pools and Cloud Run External Metrics Autoscaler (CREMA), see the examples below for your use case:&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;a href="https://docs.cloud.google.com/run/docs/tutorials/autoscale-workerpools-pubsub"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Autoscale Worker Pools with Pub/Sub pull subscription&lt;/span&gt;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;a href="https://docs.cloud.google.com/run/docs/tutorials/github-runner"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Run and scale self-hosted GitHub runners&lt;/span&gt;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;a href="https://docs.cloud.google.com/run/docs/tutorials/autoscale-workerpools-prometheus"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Autoscale Worker pools based on custom Prometheus metrics&lt;/span&gt;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;&lt;/div&gt;</description><pubDate>Thu, 09 Apr 2026 16:00:00 +0000</pubDate><guid>https://cloud.google.com/blog/products/serverless/cloud-run-worker-pools-at-estee-lauder-companies/</guid><category>Cloud Run</category><category>AI &amp; Machine Learning</category><category>Serverless</category><og xmlns:og="http://ogp.me/ns#"><type>article</type><title>How Estée Lauder Companies uses Cloud Run worker pools for its pull-based agentic workloads</title><description></description><site_name>Google</site_name><url>https://cloud.google.com/blog/products/serverless/cloud-run-worker-pools-at-estee-lauder-companies/</url></og><author xmlns:author="http://www.w3.org/2005/Atom"><name>Sagar Randive</name><title>Product Manager</title><department></department><company></company></author><author xmlns:author="http://www.w3.org/2005/Atom"><name>Aniruddh Chaturvedi</name><title>Engineering Manager</title><department></department><company></company></author></item><item><title>Simplify your Cloud Run security with Identity Aware Proxy (IAP)</title><link>https://cloud.google.com/blog/products/serverless/iap-integration-with-cloud-run/</link><description>&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;a href="https://cloud.google.com/run?e=48754805&amp;amp;hl=en"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Cloud Run&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; provides a powerful and scalable platform for deploying applications. 
Today, we’re introducing the general availability of two major enhancements to Cloud Run security: direct &lt;/span&gt;&lt;a href="https://cloud.google.com/security/products/iap?e=48754805&amp;amp;hl=en"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Identity-Aware Proxy&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; (IAP) integration, and a way to allow public access to Cloud Run services that is compatible with &lt;/span&gt;&lt;a href="https://cloud.google.com/resource-manager/docs/organization-policy/restricting-domains#console"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Domain Restricted Sharing&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; (DRS).&lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Introducing direct IAP on Cloud Run&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;IAP lets you easily control user access to applications running in Google Cloud. Integrating IAP with Cloud Run previously required you to manually configure application load balancers and other complex network settings. This added operational overhead detracted from Cloud Run's core promise of serverless simplicity.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;That changes today! You can now enable IAP directly on Cloud Run in &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;a single click, with no load balancers, and at no added cost.&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Google Cloud does not charge for IAP (with some &lt;/span&gt;&lt;a href="https://cloud.google.com/iap/pricing"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;exceptions&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;), and it incurs no load balancer costs.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-image_full_width"&gt;
&lt;div class="article-module h-c-page"&gt;
&lt;div class="h-c-grid"&gt;
&lt;figure class="article-image--large h-c-grid__col h-c-grid__col--6 h-c-grid__col--offset-3"&gt;
&lt;img src="https://storage.googleapis.com/gweb-cloudblog-publish/images/image1_2ixZT56.max-1000x1000.png" alt="image1"&gt;
&lt;figcaption class="article-image__caption "&gt;&lt;p data-block-key="pb995"&gt;Enable IAP authentication directly on a Cloud Run service&lt;/p&gt;&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Why this matters:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Simplified enablement: &lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;Turn on IAP in the UI or with a single flag (&lt;/span&gt;&lt;code style="vertical-align: baseline;"&gt;--iap&lt;/code&gt;&lt;span style="vertical-align: baseline;"&gt;) through gcloud, significantly simplifying deployments and saving valuable time and effort.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Enterprise-grade security for all web apps: &lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;Use IAP’s authentication and authorization policies based on user or group identities, as well as context-aware factors like IP address, geolocation, and device security status.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Support for &lt;/strong&gt;&lt;a href="https://cloud.google.com/iap/docs/use-workforce-identity-federation"&gt;&lt;strong style="text-decoration: underline; vertical-align: baseline;"&gt;Workforce Identity Federation&lt;/strong&gt;&lt;/a&gt;&lt;strong style="vertical-align: baseline;"&gt;: &lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;Easily manage access for your employees and partners using your existing identity providers.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Simplified Cross-Origin Resource Sharing (CORS):&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Configure IAP directly on Cloud Run to &lt;/span&gt;&lt;a href="https://cloud.google.com/iap/docs/customizing#allowing_http_options_requests_cors_preflight"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;allow unauthenticated HTTP OPTIONS&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; for &lt;/span&gt;&lt;a href="https://developer.mozilla.org/en-US/docs/Web/HTTP/CORS" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;CORS&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; requests. This helps satisfy browser preflight checks while ensuring all other requests undergo authentication.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
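&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;The CORS behavior in the last bullet can be pictured as a small decision rule. The Python sketch below is a conceptual model only and is not how IAP is implemented. The function name and its parameter are hypothetical. It shows that, when the setting is enabled, only preflight OPTIONS requests skip authentication.&lt;/span&gt;&lt;/p&gt;
&lt;pre&gt;
```python
def needs_authentication(method, allow_unauthenticated_options=True):
    """Return True when a request with this HTTP method must be authenticated.

    Conceptual model: preflight OPTIONS requests may pass without credentials
    when the setting is on; every other method always requires authentication.
    """
    is_preflight = method.upper() == "OPTIONS"
    if is_preflight and allow_unauthenticated_options:
        return False
    return True


if __name__ == "__main__":
    for verb in ("OPTIONS", "GET", "POST"):
        print(verb, needs_authentication(verb))
```
&lt;/pre&gt;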
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;We are already seeing strong uptake among organizations adopting IAP to secure Cloud Run workloads, for example at L'Oréal.&lt;/span&gt;&lt;/p&gt;
&lt;p style="padding-left: 40px;"&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;“L'Oréal relies on Google Cloud's Identity-Aware Proxy (IAP) as a critical layer of security, ensuring that access to every web application we host on Google Cloud is meticulously filtered and controlled. The beauty of IAP lies in its simplicity and effectiveness; it's a self-managed solution that's not only free but also exceptionally straightforward to implement across our diverse application landscape. This ease of deployment, combined with a security posture that surpasses what we could achieve with custom-built solutions, makes IAP an indispensable tool for protecting our digital assets.”&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt; - Antoine Castex, Group Data &amp;amp; A.I Architect, L'Oréal&lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;span style="vertical-align: baseline;"&gt;Allow public access when using DRS&lt;/span&gt;&lt;/h3&gt;&lt;/div&gt;
&lt;div class="block-image_full_width"&gt;
&lt;div class="article-module h-c-page"&gt;
&lt;div class="h-c-grid"&gt;
&lt;figure class="article-image--medium h-c-grid__col h-c-grid__col--4 h-c-grid__col--offset-4"&gt;
&lt;img src="https://storage.googleapis.com/gweb-cloudblog-publish/images/image2_7lQZnDe.max-1000x1000.png" alt="image2"&gt;
&lt;figcaption class="article-image__caption "&gt;&lt;p data-block-key="pb995"&gt;New simplified Cloud Run authentication UI&lt;/p&gt;&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;While IAP is the recommended authentication mechanism for internal business applications on Cloud Run, &lt;/span&gt;&lt;a href="https://cloud.google.com/security/products/iam"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Cloud IAM&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; remains essential for managing service-to-service communication. &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Historically, Cloud Run's default behavior was to perform an IAM check (run.invoker role) on every request to an HTTPS endpoint. While this provided a strong security baseline, it had the potential to become a bottleneck when the intent was to create public apps, particularly when organizations also enforced the Domain Restricted Sharing policy.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;You can now disable this IAM "invoker" check by selecting “Allow Public access” for your applications. &lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;This gives you flexibility to rely on other security layers like organization policies, network-level controls, or custom authn/authz for your services. It also unlocks broader use cases:&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;Public websites: Host a store locator site on Cloud Run and make it accessible to everyone, even if your Org Policy restricts sharing (DRS enabled). You can do this by selecting “Allow Public access” and setting ingress to “All”.&lt;/span&gt;&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;Private microservices: For services behind an internal ingress where network-level security is sufficient, you can bypass the IAM check by selecting “Allow Public access”.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p style="padding-left: 40px;"&gt;&lt;span&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;“Bilt leverages the 'disable IAM' feature for multiple mission-critical Cloud Run services deployed in multi-regional topologies. By disabling IAM on these instances, we establish a direct, unimpeded path from our edge, while maintaining security using Cloud Armor on the global load balancer. This simplified approach reduces infrastructure complexity and provides a more performant solution while maintaining org-wide security posture through organizational policies.” &lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;- Kosta Krauth, CTO, Bilt&lt;/span&gt;&lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;span style="vertical-align: baseline;"&gt;Getting started&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Ready to get started? You can easily &lt;/span&gt;&lt;a href="https://cloud.google.com/run/docs/securing/identity-aware-proxy-cloud-run"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;enable IAP directly on Cloud Run&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Learn more:&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;a href="https://cloud.google.com/run/docs/securing/managing-access"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;IAM in Cloud Run&lt;/span&gt;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;a href="https://cloud.google.com/run/docs/securing/ingress"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Ingress settings&lt;/span&gt;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;a href="https://cloud.google.com/resource-manager/docs/organization-policy/restricting-domains#console"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Domain-restricted sharing&lt;/span&gt;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;&lt;/div&gt;</description><pubDate>Fri, 13 Mar 2026 16:00:00 +0000</pubDate><guid>https://cloud.google.com/blog/products/serverless/iap-integration-with-cloud-run/</guid><category>Security &amp; Identity</category><category>Cloud Run</category><category>Serverless</category><og xmlns:og="http://ogp.me/ns#"><type>article</type><title>Simplify your Cloud Run security with Identity Aware Proxy (IAP)</title><description></description><site_name>Google</site_name><url>https://cloud.google.com/blog/products/serverless/iap-integration-with-cloud-run/</url></og><author xmlns:author="http://www.w3.org/2005/Atom"><name>Ruchika Goel</name><title>Product Manager</title><department></department><company></company></author><author xmlns:author="http://www.w3.org/2005/Atom"><name>Muthuraj Thangavel</name><title>Senior Product Manager, Google Cloud</title><department></department><company></company></author></item><item><title>High-performance inference meets serverless compute with NVIDIA RTX PRO 6000 on Cloud Run</title><link>https://cloud.google.com/blog/products/serverless/cloud-run-supports-nvidia-rtx-6000-pro-gpus-for-ai-workloads/</link><description>&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Running large-scale inference models can involve significant operational toil, including cluster management and manual VM maintenance. One solution is to leverage a serverless compute platform to abstract away the underlying infrastructure. Today, we’re bringing the serverless experience to high-end inference with support for &lt;/span&gt;&lt;a href="https://www.nvidia.com/en-us/data-center/rtx-pro-6000-blackwell-server-edition/" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;NVIDIA RTX PRO™ 6000 Blackwell Server Edition GPUs&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; on Cloud Run. 
Now in preview, this support lets you deploy massive models like Gemma 3 27B or Llama 3.1 70B with the 'deploy and forget' experience you’ve come to expect from Cloud Run. No reservations. No cluster management. Just code.&lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;A powerful GPU platform&lt;/strong&gt;&lt;/h3&gt;&lt;/div&gt;
&lt;div class="block-image_full_width"&gt;
&lt;div class="article-module h-c-page"&gt;
&lt;div class="h-c-grid"&gt;
&lt;figure class="article-image--large h-c-grid__col h-c-grid__col--6 h-c-grid__col--offset-3"&gt;
&lt;img src="https://storage.googleapis.com/gweb-cloudblog-publish/images/1_qqUpivV.max-1000x1000.jpg" alt="1"&gt;
&lt;/figure&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;The NVIDIA RTX PRO 6000 Blackwell GPU provides a huge leap in performance compared to the NVIDIA L4 GPU, bringing 96GB of vGPU memory, 1.6 TB/s of bandwidth, and support for FP4 and FP6. This means you can serve models with 70B+ parameters without having to manage any underlying infrastructure. Cloud Run lets you attach an NVIDIA RTX PRO 6000 Blackwell GPU to your Cloud Run service, job, or worker pool, on demand, with no reservations required. Here are some ways you can use the NVIDIA RTX PRO 6000 Blackwell GPU to accelerate your business:&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Generative AI and inference:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; With its FP4 precision support, the NVIDIA RTX PRO 6000 Blackwell GPU’s high-efficiency compute accelerates LLM fine-tuning and inference, letting you create real-time generative AI applications such as multi-modal and text-to-image creation models. By &lt;/span&gt;&lt;a href="https://docs.cloud.google.com/run/docs/configuring/services/gpu"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;running your model on Cloud Run services&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, you can also take advantage of rapid startup and scaling, going from zero instances to having a GPU with drivers installed in under 5 seconds. When traffic drops off and no more requests are being received, Cloud Run automatically scales your GPU instances down to zero.&lt;/span&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong style="vertical-align: baseline;"&gt;Fine-tuning and offline inference&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;: Pair NVIDIA RTX PRO 6000 Blackwell GPUs with &lt;/span&gt;&lt;a href="https://docs.cloud.google.com/run/docs/configuring/jobs/gpu"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Cloud Run jobs&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; to fine-tune your model. The fifth-generation NVIDIA Tensor Cores also work with AI models to help accelerate rendering pipelines and enhance content creation.&lt;/span&gt;&lt;/li&gt;
&lt;li role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Tailored scaling for specialized workloads&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;: Use &lt;/span&gt;&lt;a href="https://docs.cloud.google.com/run/docs/configuring/workerpools/gpu"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;GPU-enabled worker pools&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; to apply granular control over your GPU workers, whether you need to dynamically scale based on custom external metrics or manually provision "always-on" instances for complex, stateful processing.&lt;/span&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;We built Cloud Run to be the simplest way to run production-ready, GPU-accelerated tasks. Some highlights of Cloud Run include: &lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Managed GPUs with flexible compute: &lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;Cloud Run pre-installs the necessary NVIDIA drivers so you can focus on your code. Cloud Run instances using NVIDIA RTX PRO 6000 Blackwell GPUs can configure up to 44 vCPU and 176GB of RAM.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Production-grade reliability:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; By default, Cloud Run offers zonal redundancy, helping to ensure enough capacity for your service to be resilient to a zonal outage; this also applies to Cloud Run with GPUs. Alternatively, you can turn off zonal redundancy and benefit from a lower price for best-effort failover of your GPU workloads in case of a zonal outage.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Tight integration&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;: Cloud Run works natively with the rest of Google Cloud. You can load massive model weights by mounting Cloud Storage buckets as local volumes, or use &lt;/span&gt;&lt;a href="https://docs.cloud.google.com/iap/docs/enabling-cloud-run"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Identity-Aware Proxy (IAP)&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; to secure traffic that’s bound for a Cloud Run service.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Get started&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;The NVIDIA RTX PRO 6000 Blackwell GPU is available in preview on demand with availability in &lt;/span&gt;&lt;code style="vertical-align: baseline;"&gt;us-central1&lt;/code&gt;&lt;span style="vertical-align: baseline;"&gt; and &lt;/span&gt;&lt;code style="vertical-align: baseline;"&gt;europe-west4&lt;/code&gt;&lt;span style="vertical-align: baseline;"&gt;, and limited availability in &lt;/span&gt;&lt;code style="vertical-align: baseline;"&gt;asia-south2&lt;/code&gt;&lt;span style="vertical-align: baseline;"&gt; and &lt;/span&gt;&lt;code style="vertical-align: baseline;"&gt;asia-southeast1&lt;/code&gt;&lt;span style="vertical-align: baseline;"&gt;. You can deploy your first service using &lt;/span&gt;&lt;a href="https://ollama.com/" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Ollama&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, one of the easiest way to run open models, on Cloud Run with NVIDIA RTX PRO 6000 GPUs enabled:&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-code"&gt;&lt;dl&gt;
    &lt;dt&gt;code_block&lt;/dt&gt;
    &lt;dd&gt;&lt;pre&gt;&lt;code&gt;gcloud beta run deploy my-service \
--image ollama/ollama --port 11434 \
--cpu 20 --memory 80Gi \
--gpu-type nvidia-rtx-pro-6000 \
--no-gpu-zonal-redundancy \
--region us-central1&lt;/code&gt;&lt;/pre&gt;&lt;/dd&gt;
&lt;/dl&gt;&lt;/div&gt;
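&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Once the service is deployed, you can send it a prompt over HTTP. The following is a minimal sketch rather than an official client: it targets Ollama's &lt;/span&gt;&lt;code style="vertical-align: baseline;"&gt;/api/generate&lt;/code&gt;&lt;span style="vertical-align: baseline;"&gt; endpoint, and it assumes a placeholder service URL, a model that has already been pulled into the container, and a service that accepts unauthenticated requests (otherwise you would also need to attach an identity token).&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;

```javascript
// Hypothetical service URL: replace it with the URL printed by the deploy command.
const SERVICE_URL = "https://my-service-example.a.run.app";

// Build the URL and fetch options for Ollama's /api/generate endpoint.
function buildGenerateRequest(baseUrl, model, prompt) {
  return {
    url: new URL("/api/generate", baseUrl).toString(),
    options: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      // stream: false asks Ollama for a single JSON object instead of a stream.
      body: JSON.stringify({ model, prompt, stream: false }),
    },
  };
}

// Send the prompt and return the generated text (needs Node.js 18+ for global fetch).
async function generate(baseUrl, model, prompt) {
  const { url, options } = buildGenerateRequest(baseUrl, model, prompt);
  const response = await fetch(url, options);
  if (!response.ok) {
    throw new Error(`Ollama request failed with status ${response.status}`);
  }
  const data = await response.json();
  return data.response;
}
```

&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;For example, &lt;/span&gt;&lt;code style="vertical-align: baseline;"&gt;generate(SERVICE_URL, "gemma3", "Why is the sky blue?")&lt;/code&gt;&lt;span style="vertical-align: baseline;"&gt; would resolve to the model's answer, where the model name is only an illustration.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;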
&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;For more details, check out our updated &lt;/span&gt;&lt;a href="https://docs.cloud.google.com/run/docs/configuring/services/gpu"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Cloud Run documentation&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; and &lt;/span&gt;&lt;a href="https://docs.cloud.google.com/run/docs/configuring/services/gpu-best-practices"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;AI inference best practices&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;. &lt;/span&gt;&lt;/p&gt;&lt;/div&gt;</description><pubDate>Mon, 02 Feb 2026 17:00:00 +0000</pubDate><guid>https://cloud.google.com/blog/products/serverless/cloud-run-supports-nvidia-rtx-6000-pro-gpus-for-ai-workloads/</guid><category>AI &amp; Machine Learning</category><category>Compute</category><category>Serverless</category><og xmlns:og="http://ogp.me/ns#"><type>article</type><title>High-performance inference meets serverless compute with NVIDIA RTX PRO 6000 on Cloud Run</title><description></description><site_name>Google</site_name><url>https://cloud.google.com/blog/products/serverless/cloud-run-supports-nvidia-rtx-6000-pro-gpus-for-ai-workloads/</url></og><author xmlns:author="http://www.w3.org/2005/Atom"><name>James Ma</name><title>Sr. Product Manager</title><department></department><company></company></author><author xmlns:author="http://www.w3.org/2005/Atom"><name>Oded Shahar</name><title>Sr. 
Engineering Manager</title><department></department><company></company></author></item><item><title>Elevate your applications with Firestore’s new advanced query engine</title><link>https://cloud.google.com/blog/products/data-analytics/new-firestore-query-engine-enables-pipelines/</link><description>&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;A hallmark of a valuable database is how easy it is to query the data inside, so that developers can build tailored and complex user experiences in an application. Last week marked a significant evolution for &lt;/span&gt;&lt;a href="https://cloud.google.com/products/firestore"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Firestore&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, Google Cloud’s enterprise-grade, scalable document database, with the debut of an advanced query engine designed to help you build more sophisticated applications. &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Available as part of &lt;/span&gt;&lt;a href="https://docs.cloud.google.com/firestore/native/docs/overview"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Firestore in Native mode&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, this powerful engine introduces over a hundred new query capabilities, called pipeline operations, available in preview, which streamline complex queries directly within the database. Alongside this, we're launching precise indexing controls and refreshed observability tools like query explain and query insights, giving you granular control over performance. All these robust capabilities are now available in the &lt;/span&gt;&lt;a href="http://docs.cloud.google.com/firestore/native/docs/editions-overview"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Firestore Enterprise edition&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, which also offers a more transparent pricing model for potential cost savings. This is all in the service of building highly expressive, performant applications that can query, transform and filter data across many dimensions, with less operational overhead. At the same time, you’re benefiting from Firestore’s unique serverless foundation, multi-region replication, and virtually unlimited scalability, freeing you from database management complexities, so you can truly focus on innovation.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Used by a vibrant community of over 600,000 developers, Firestore has long been appreciated for its simplicity. In 2019, Firestore Standard edition in Native mode streamlined the development of collaborative applications with a straightforward query interface that guaranteed high performance through the use of automatically generated indexes. However, this simplified query engine has a strong dependence on indexing for query execution, often demanding upfront planning throughout the application lifecycle. Now, with the introduction of the advanced query engine in Enterprise edition, developers can construct highly expressive applications, regardless of the explicit presence of indexes — particularly for demanding solutions like e-commerce, interactive gaming, content management, and sophisticated user personalization. The refined query engine makes it easier to create pipeline operations, complete with sophisticated new stages and expressions, including support for complex aggregations, querying directly over arrays, advanced string matching capabilities, and granular filtering options.&lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;A new query engine and pipeline operations experience&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;To enable this, we’ve updated Firestore’s existing SDKs with expanded support for pipeline operations. Now, you can elegantly chain together numerous stages for essential tasks such as aggregations, grouping, and filtering. Queries now run without mandatory indexes, giving you complete autonomy over when you want to create indexes to optimize performance. Let’s take a look at an illustration of pipeline operations.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;Note: This example assumes you're familiar with Firestore's &lt;/span&gt;&lt;a href="https://docs.cloud.google.com/firestore/native/docs/data-model"&gt;&lt;span style="font-style: italic; text-decoration: underline; vertical-align: baseline;"&gt;data model&lt;/span&gt;&lt;/a&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt; and &lt;/span&gt;&lt;a href="https://docs.cloud.google.com/firestore/native/docs/query-data/queries"&gt;&lt;span style="font-style: italic; text-decoration: underline; vertical-align: baseline;"&gt;existing query methods&lt;/span&gt;&lt;/a&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Suppose you want to identify the top trending hashtags on an existing food recipe application that allows users to add hashtags to recipes. For essential data (like the recipe text itself), you might represent a recipe as a document with some fields. Since a hashtag can be represented with just a string, you could add hashtags directly to the recipe document as an array of strings:&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-code"&gt;&lt;dl&gt;
    &lt;dt&gt;code_block&lt;/dt&gt;
    &lt;dd&gt;&lt;pre&gt;&lt;code&gt;{
  title: &amp;quot;My recipe&amp;quot;,
  instructions: &amp;quot;Cook the ingredients&amp;quot;,
  authorId: &amp;quot;SomeAuthorID&amp;quot;,
  hashtags: [&amp;quot;easy&amp;quot;, &amp;quot;high protein&amp;quot;, &amp;quot;low carb&amp;quot;],
  ...
}&lt;/code&gt;&lt;/pre&gt;&lt;/dd&gt;
&lt;/dl&gt;&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Firestore users can query for specific hashtags within recipes using existing &lt;/span&gt;&lt;a href="https://docs.cloud.google.com/firestore/native/docs/query-data/understanding-core-pipelines"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;core operations&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;. However, prior to pipeline operations, there was no direct way to extract and aggregate array data from within a document during a query.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;With pipeline operations, you can “unnest” arrays directly. This makes it simple to identify and suggest trending hashtags to your users. Below is an example of how to implement this using Javascript:&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-code"&gt;&lt;dl&gt;
    &lt;dt&gt;code_block&lt;/dt&gt;
    &lt;dd&gt;&lt;pre&gt;&lt;code&gt;// Fetch 10 hashtags sorted by most popular.
const snapshot = await db.pipeline()

  // Starting with the collection of recipe documents:
  .collection(&amp;quot;recipes&amp;quot;)

  // Limit the document to just the `hashtags` field.
  .select(&amp;quot;hashtags&amp;quot;)

  // Unnest each tag within the `hashtags` array to its own document.
  .unnest(field(&amp;quot;hashtags&amp;quot;).as(&amp;quot;tagName&amp;quot;))

  // Count the number of instances of each tag across recipes and
  // consolidate documents sharing a tagName into a single document
  // per tagName.
  .aggregate({
    accumulators: [countAll().as(&amp;quot;tagCount&amp;quot;)],
    groups: [&amp;quot;tagName&amp;quot;]
  })

  // Sort the resulting hashtags by their count.
  .sort(field(&amp;quot;tagCount&amp;quot;).descending())

  // Limit query results to just the top ten hashtags.
  .limit(10)

  .execute()&lt;/code&gt;&lt;/pre&gt;&lt;/dd&gt;
&lt;/dl&gt;&lt;/div&gt;
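&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;To make the stages concrete, here is a rough in-memory sketch of what the pipeline computes. It does not use the Firestore SDK at all: the &lt;/span&gt;&lt;code style="vertical-align: baseline;"&gt;recipes&lt;/code&gt;&lt;span style="vertical-align: baseline;"&gt; array and the &lt;/span&gt;&lt;code style="vertical-align: baseline;"&gt;topHashtags&lt;/code&gt;&lt;span style="vertical-align: baseline;"&gt; helper are hypothetical stand-ins that mirror the unnest, aggregate, sort, and limit steps on plain JavaScript objects.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;

```javascript
// Hypothetical in-memory stand-in for the `recipes` collection.
const recipes = [
  { title: "Omelette", hashtags: ["easy", "high protein"] },
  { title: "Salad", hashtags: ["easy", "low carb"] },
  { title: "Steak", hashtags: ["high protein", "low carb", "easy"] },
];

// Mirror the pipeline stages: unnest, aggregate (count per tag), sort, limit.
function topHashtags(docs, limit) {
  const counts = new Map();
  for (const doc of docs) {
    // "Unnest": visit every tag in the array as if it were its own document.
    for (const tagName of doc.hashtags ?? []) {
      // "Aggregate": keep one counter per distinct tagName.
      counts.set(tagName, (counts.get(tagName) ?? 0) + 1);
    }
  }
  return [...counts.entries()]
    .map(([tagName, tagCount]) => ({ tagName, tagCount }))
    // "Sort": highest count first; ties keep first-seen order.
    .sort((a, b) => b.tagCount - a.tagCount)
    // "Limit": keep only the top entries.
    .slice(0, limit);
}

console.log(topHashtags(recipes, 2));
```

&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;The difference in practice is that the pipeline runs these steps inside the database, so the application does not have to download every recipe document just to count tags.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;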
&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;In addition, Firestore Enterprise edition supports a broader array of &lt;/span&gt;&lt;a href="https://docs.cloud.google.com/firestore/native/docs/enterprise-index-overview"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;index types&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; (including single field, composite, sparse, non-sparse, and unique indexes) that lets you maximize query performance even more. Furthermore, you can control &lt;/span&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;when&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt; indexes are created, consequently improving overall write performance and storage utilization when compared to Standard edition’s automatic single-field indexes. This helps to mitigate index fanout during write operations.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Since indexing is fully customizable, Enterprise edition also provides advanced observability tools — &lt;/span&gt;&lt;a href="https://docs.cloud.google.com/firestore/native/docs/enterprise-query-explain"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;query explain&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; and &lt;/span&gt;&lt;a href="https://docs.cloud.google.com/firestore/native/docs/enterprise-query-insights"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;query insights&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; — specifically built to help developers identify and optimize queries by identifying missing indexes.&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt; &lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;Through query explain, developers can profile queries to gain a comprehensive understanding of query planner details and view execution statistics. This includes essential data such as billing information, and deep, system-level visibility into the query's execution path. &lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-image_full_width"&gt;






  
    &lt;div class="article-module h-c-page"&gt;
      &lt;div class="h-c-grid"&gt;
  

    &lt;figure class="article-image--large
      
      
        h-c-grid__col
        h-c-grid__col--6 h-c-grid__col--offset-3
        
        
      "
      &gt;

      
      
        
        &lt;img
            src="https://storage.googleapis.com/gweb-cloudblog-publish/images/1-queryexplain.max-1000x1000.png"
        
          alt="1-queryexplain"&gt;
        
      
        &lt;figcaption class="article-image__caption "&gt;&lt;p data-block-key="n0xrd"&gt;Determine if a query is using an index and analyze its total execution metrics by profiling it with query explain.&lt;/p&gt;&lt;/figcaption&gt;
      
    &lt;/figure&gt;

  
      &lt;/div&gt;
    &lt;/div&gt;
  




&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Complementing this, query insights enables ongoing monitoring of high-latency and frequently executed queries that may require tuning. By utilizing the query insights dashboard, you can identify queries that can benefit from deploying indexes to boost performance.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-image_full_width"&gt;






  
    &lt;div class="article-module h-c-page"&gt;
      &lt;div class="h-c-grid"&gt;
  

    &lt;figure class="article-image--large
      
      
        h-c-grid__col
        h-c-grid__col--6 h-c-grid__col--offset-3
        
        
      "
      &gt;

      
      
        
        &lt;img
            src="https://storage.googleapis.com/gweb-cloudblog-publish/images/2-queryinsights.max-1000x1000.png"
        
          alt="2-queryinsights"&gt;
        
      
        &lt;figcaption class="article-image__caption "&gt;&lt;p data-block-key="n0xrd"&gt;Leverage query insights to identify the highest latency and most frequently executed queries on your database, evaluating whether they require indexing based on the quantity of index entries scanned.&lt;/p&gt;&lt;/figcaption&gt;
      
    &lt;/figure&gt;

  
      &lt;/div&gt;
    &lt;/div&gt;
  




&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Migration for current Firestore customers&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;If you’re new to Firestore, getting started is easy — simply &lt;/span&gt;&lt;a href="https://docs.cloud.google.com/firestore/native/docs/manage-databases"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;create a Firestore Enterprise edition database&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;. For existing Firestore developers, transitioning to Firestore pipeline operations is also simple: just use the integrated &lt;/span&gt;&lt;a href="https://docs.cloud.google.com/firestore/native/docs/manage-data/export-import"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;import and export service&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; to migrate data from a Firestore Standard edition database to a freshly provisioned Enterprise edition database. Crucially, Enterprise edition maintains backwards compatibility, so you can retain your existing application code for Firestore core operations. When the time is right to harness the advanced capabilities, here’s how to convert code from core operations into pipeline operations:&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-code"&gt;&lt;dl&gt;
    &lt;dt&gt;code_block&lt;/dt&gt;
    &lt;dd&gt;&lt;pre&gt;&lt;code&gt;// An existing core-operations query
const query = db.collection(&amp;quot;recipes&amp;quot;).where(&amp;quot;authorId&amp;quot;, &amp;quot;==&amp;quot;, user.id);

// Convert the query into a pipeline
const pipeline = db.pipeline().createFrom(query);&lt;/code&gt;&lt;/pre&gt;&lt;/dd&gt;
&lt;/dl&gt;&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;And then you can immediately begin working with the new pipeline capabilities. &lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-code"&gt;&lt;dl&gt;
    &lt;dt&gt;code_block&lt;/dt&gt;
    &lt;dd&gt;&lt;pre&gt;&lt;code&gt;// From the last snippet
const pipeline = db.pipeline().createFrom(query);

// Add a pipeline stage to the converted query, then run it
const snapshot = await pipeline
  .where(field(&amp;quot;rating&amp;quot;).greaterThan(4))
  .execute();&lt;/code&gt;&lt;/pre&gt;&lt;/dd&gt;
&lt;/dl&gt;&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Predictable pricing and optimized costs&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Firestore Enterprise edition utilizes an improved, transparent &lt;/span&gt;&lt;a href="https://cloud.google.com/firestore/enterprise/pricing"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;pricing model&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; for managing costs. For all read and write operations performed against the database, you are now billed based on the size of the documents and associated index entries involved. This new approach brings potential savings of up to 86% when executing read operations on documents under 4 kibibytes. &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Real-time listen query updates are separately metered and billed as they are incurred. Furthermore, there are no upfront fees or latent costs resulting from incorrect database cluster capacity planning or inefficient database sharding. Storage consumption is billed solely for the actual capacity you use, inclusive of replicated copies for high availability. And if you’re new to Firestore and want to try it out, Enterprise edition includes access to a generous free-tier to make it easy to get started.&lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Get started with Firestore pipeline operations&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Enterprise edition offers an advanced query engine to power flexible developer experiences, accessible both through &lt;/span&gt;&lt;a href="https://docs.cloud.google.com/firestore/native/docs/overview"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Firestore in Native mode&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, and &lt;/span&gt;&lt;a href="https://docs.cloud.google.com/firestore/mongodb-compatibility/docs/overview"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Firestore with MongoDB compatibility mode&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;. This allows developers to maximize existing libraries and tools from either the Firestore and MongoDB developer communities. You can get started with Firestore pipeline operations in preview today, by creating a new Firestore Enterprise edition database in Native mode. To delve into how to get started with pipeline operations, refer to the &lt;/span&gt;&lt;a href="https://docs.cloud.google.com/firestore/native/docs/pipeline/overview"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;documentation&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Embark on your journey with the Enterprise edition today — benefit from zero upfront fees and immediate access to a generous free-tier: &lt;/span&gt;&lt;a href="https://cloud.google.com/products/firestore"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;https://cloud.google.com/products/firestore&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;</description><pubDate>Tue, 20 Jan 2026 17:00:00 +0000</pubDate><guid>https://cloud.google.com/blog/products/data-analytics/new-firestore-query-engine-enables-pipelines/</guid><category>Serverless</category><category>Databases</category><media:content height="540" url="https://storage.googleapis.com/gweb-cloudblog-publish/images/0-pipelineshero.max-600x600.jpg" width="540"></media:content><og xmlns:og="http://ogp.me/ns#"><type>article</type><title>Elevate your applications with Firestore’s new advanced query engine</title><description></description><image>https://storage.googleapis.com/gweb-cloudblog-publish/images/0-pipelineshero.max-600x600.jpg</image><site_name>Google</site_name><url>https://cloud.google.com/blog/products/data-analytics/new-firestore-query-engine-enables-pipelines/</url></og><author xmlns:author="http://www.w3.org/2005/Atom"><name>Minh Nguyen</name><title>Group Product Manager, Google Cloud</title><department></department><company></company></author><author xmlns:author="http://www.w3.org/2005/Atom"><name>Joseph (JD) Batchik</name><title>Staff Software Engineer, Google Cloud</title><department></department><company></company></author></item><item><title>Responding to CVE-2025-55182: Secure your React and Next.js workloads</title><link>https://cloud.google.com/blog/products/identity-security/responding-to-cve-2025-55182/</link><description>&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;strong style="font-style: italic; vertical-align: baseline;"&gt;Editor's 
note&lt;/strong&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;: This blog was updated on Dec. 4, 5, 7, and 12, 2025, with additional guidance on Cloud Armor WAF rule syntax, and WAF enforcement across App Engine Standard, Cloud Functions, and Cloud Run.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Earlier today, Meta and Vercel publicly disclosed two vulnerabilities that expose services built using the popular open-source frameworks &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;React&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;Server Components&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; (&lt;/span&gt;&lt;a href="https://www.cve.org/CVERecord?id=CVE-2025-55182" rel="noopener" target="_blank"&gt;&lt;strong style="text-decoration: underline; vertical-align: baseline;"&gt;CVE-2025-55182&lt;/strong&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;) and &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;Next.js &lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;to remote code execution risks when used for some server-side use cases. At Google Cloud, we understand the severity of these vulnerabilities, also known as &lt;/span&gt;&lt;a href="https://react2shell.com/" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;React2Shell&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, and our security teams have shared their recommendations to help our customers take immediate, decisive action to secure their applications.&lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;span style="vertical-align: baseline;"&gt;Vulnerability background&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;The &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;React Server Components framework&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; is commonly used for building user interfaces. On Dec. 3, 2025, &lt;/span&gt;&lt;a href="http://cve.org" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;CVE.org&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; assigned this vulnerability as &lt;/span&gt;&lt;a href="https://www.cve.org/CVERecord?id=CVE-2025-55182" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;CVE-2025-55182&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;. The official Common Vulnerability Scoring System (CVSS) base severity score has been determined as Critical, a severity of 10.0. &lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Vulnerable versions&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;: React 19.0, 19.1.0, 19.1.1, and 19.2.0&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Patched&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; in React 19.2.1&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Fix&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;: &lt;/span&gt;&lt;a href="https://github.com/facebook/react/commit/7dc903cd29dac55efb4424853fd0442fef3a8700" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;https://github.com/facebook/react/commit/7dc903cd29dac55efb4424853fd0442fef3a8700&lt;/span&gt;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Announcement&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;: &lt;/span&gt;&lt;a href="https://react.dev/blog/2025/12/03/critical-security-vulnerability-in-react-server-components" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;https://react.dev/blog/2025/12/03/critical-security-vulnerability-in-react-server-components&lt;/span&gt;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
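&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;As a quick triage aid, the version list above can be turned into a simple check. This is only a sketch under stated assumptions: the helper names are hypothetical, “19.0” is treated as 19.0.0, and the check matches exact released versions only, so it does not interpret semver ranges or prerelease tags. Run it against the resolved version in your lockfile, and treat the official announcement as the source of truth.&lt;/span&gt;&lt;/p&gt;

```javascript
// Versions listed above as vulnerable ("19.0" is treated as 19.0.0, an assumption).
const VULNERABLE_REACT_VERSIONS = new Set(["19.0.0", "19.1.0", "19.1.1", "19.2.0"]);

// Normalize "19.0" or "v19.1.0" to a three-part "major.minor.patch" string.
function normalizeVersion(version) {
  const parts = String(version).trim().replace(/^v/, "").split(".");
  const [major = "0", minor = "0", patch = "0"] = parts;
  return `${major}.${minor}.${patch}`;
}

// Return true when the exact version appears in the vulnerable list.
function isVulnerableReactVersion(version) {
  return VULNERABLE_REACT_VERSIONS.has(normalizeVersion(version));
}

console.log(isVulnerableReactVersion("19.1.1")); // listed as vulnerable
console.log(isVulnerableReactVersion("19.2.1")); // the patched release named above
```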
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Next.js is a web development framework that depends on React, and is also commonly used for building user interfaces. (The Next.js vulnerability was referenced as &lt;/span&gt;&lt;a href="https://www.cve.org/CVERecord?id=CVE-2025-66478" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;CVE-2025-66478&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; before being marked as a duplicate.)&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Vulnerable versions&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;: Next.js 15.x, Next.js 16.x, Next.js 14.3.0-canary.77 and later canary releases&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Patched&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; versions are listed &lt;/span&gt;&lt;a href="https://nextjs.org/blog/CVE-2025-66478#required-action" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;here&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Fix&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;: &lt;/span&gt;&lt;a href="https://github.com/vercel/next.js/commit/6ef90ef49fd32171150b6f81d14708aa54cd07b2" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;https://github.com/vercel/next.js/commit/6ef90ef49fd32171150b6f81d14708aa54cd07b2&lt;/span&gt;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Announcement&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;: &lt;/span&gt;&lt;a href="https://nextjs.org/blog/CVE-2025-66478" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;https://nextjs.org/blog/CVE-2025-66478&lt;/span&gt;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Google Threat Intelligence Group (GTIG) has also published a new report to help understand the &lt;/span&gt;&lt;a href="https://cloud.google.com/blog/topics/threat-intelligence/threat-actors-exploit-react2shell-cve-2025-55182"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;specific threats exploiting React2Shell&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;We strongly encourage organizations that manage environments relying on the React and Next.js frameworks to update to the latest versions and to take the mitigation actions outlined below.&lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;span style="vertical-align: baseline;"&gt;Mitigating CVE-2025-55182&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;We have created and rolled out a new &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;Cloud Armor web application firewall (WAF) rule&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; designed to detect and block exploitation attempts related to CVE-2025-55182. This new rule is &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;available now&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; and is intended to help protect your internet-facing applications and services that use global or regional Application Load Balancers. We recommend deploying this rule as a temporary mitigation while your vulnerability management program patches and verifies all vulnerable instances in your environment.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;For customers using &lt;/span&gt;&lt;a href="https://cloud.google.com/appengine/"&gt;&lt;strong style="text-decoration: underline; vertical-align: baseline;"&gt;App Engine Standard&lt;/strong&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, &lt;/span&gt;&lt;a href="https://cloud.google.com/functions/"&gt;&lt;strong style="text-decoration: underline; vertical-align: baseline;"&gt;Cloud Functions&lt;/strong&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, &lt;/span&gt;&lt;a href="https://cloud.google.com/run/"&gt;&lt;strong style="text-decoration: underline; vertical-align: baseline;"&gt;Cloud Run&lt;/strong&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, &lt;/span&gt;&lt;a href="https://firebase.google.com/products/hosting" rel="noopener" target="_blank"&gt;&lt;strong style="text-decoration: underline; vertical-align: baseline;"&gt;Firebase Hosting&lt;/strong&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; or &lt;/span&gt;&lt;a href="https://firebase.google.com/products/app-hosting" rel="noopener" target="_blank"&gt;&lt;strong style="text-decoration: underline; vertical-align: baseline;"&gt;Firebase App Hosting&lt;/strong&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, we provide an additional layer of defense for serverless workloads by automatically enforcing platform-level WAF rules that can detect and block the most common exploitation attempts related to CVE-2025-55182.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;For &lt;/span&gt;&lt;a href="https://support.projectshield.google/s/article/Protecting-Your-Website-From-Known-Vulnerabilities" rel="noopener" target="_blank"&gt;&lt;strong style="text-decoration: underline; vertical-align: baseline;"&gt;Project Shield&lt;/strong&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; users, we have deployed WAF protections for all sites and no action is necessary to enable these WAF rules. For long-term mitigation, you will need to patch your origin servers as an essential step to eliminate the vulnerability (see additional guidance below).&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Cloud Armor and the Application Load Balancer can be used to deliver and protect your applications and services regardless of whether they are deployed on Google Cloud, on-premises, or on another infrastructure provider. If you are not yet using Cloud Armor and the Application Load Balancer, please follow the guidance further down to get started.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;While these platform-level rules and the optional Cloud Armor WAF rules (for services behind an Application Load Balancer) help mitigate the risk from exploits of the CVE, we continue to strongly recommend updating your application dependencies as the primary long-term mitigation.&lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;span style="vertical-align: baseline;"&gt;Deploying the cve-canary WAF rule for Cloud Armor&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;To configure Cloud Armor to detect and protect against exploitation of CVE-2025-55182, you can use the &lt;/span&gt;&lt;a href="https://docs.cloud.google.com/armor/docs/waf-rules#cves_and_other_vulnerabilities"&gt;&lt;code style="text-decoration: underline; vertical-align: baseline;"&gt;cve-canary&lt;/code&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt; preconfigured WAF rule&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; with the new rule IDs that we have added for this vulnerability. These rule IDs are opt-in only, and must be added to your policy even if you are already using the cve-canary rules.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;In your Cloud Armor backend security policy, create a new rule and configure the following match condition:&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-code"&gt;&lt;dl&gt;
    &lt;dt&gt;code_block&lt;/dt&gt;
    &lt;dd&gt;&lt;pre&gt;&lt;code&gt;(has(request.headers[&amp;#x27;next-action&amp;#x27;]) || has(request.headers[&amp;#x27;rsc-action-id&amp;#x27;]) || request.headers[&amp;#x27;content-type&amp;#x27;].contains(&amp;#x27;multipart/form-data&amp;#x27;) || request.headers[&amp;#x27;content-type&amp;#x27;].contains(&amp;#x27;application/x-www-form-urlencoded&amp;#x27;)) &amp;amp;&amp;amp; evaluatePreconfiguredWaf(&amp;#x27;cve-canary&amp;#x27;,{&amp;#x27;sensitivity&amp;#x27;: 0, &amp;#x27;opt_in_rule_ids&amp;#x27;: [&amp;#x27;google-mrs-v202512-id000001-rce&amp;#x27;,&amp;#x27;google-mrs-v202512-id000002-rce&amp;#x27;]})&lt;/code&gt;&lt;/pre&gt;&lt;/dd&gt;
&lt;/dl&gt;&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;This can be accomplished from the Google Cloud console by navigating to Cloud Armor and modifying an existing or creating a new policy.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-image_full_width"&gt;
    &lt;div class="article-module h-c-page"&gt;
      &lt;div class="h-c-grid"&gt;
    &lt;figure class="article-image--medium h-c-grid__col h-c-grid__col--4 h-c-grid__col--offset-4"&gt;
        &lt;img src="https://storage.googleapis.com/gweb-cloudblog-publish/images/20251205_11am_rule_1.max-1000x1000.png" alt="20251205_11am_rule (1)"&gt;
        &lt;figcaption class="article-image__caption "&gt;&lt;p data-block-key="5admg"&gt;Cloud Armor rule creation in the Google Cloud console.&lt;/p&gt;&lt;/figcaption&gt;
    &lt;/figure&gt;
      &lt;/div&gt;
    &lt;/div&gt;
&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;Alternatively, the gcloud CLI can be used to create or modify a policy with the requisite rule:&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-code"&gt;&lt;dl&gt;
    &lt;dt&gt;code_block&lt;/dt&gt;
    &lt;dd&gt;&lt;pre&gt;&lt;code&gt;gcloud compute security-policies rules create PRIORITY_NUMBER \
    --security-policy SECURITY_POLICY_NAME \
    --expression &amp;quot;(has(request.headers[&amp;#x27;next-action&amp;#x27;]) || has(request.headers[&amp;#x27;rsc-action-id&amp;#x27;]) || request.headers[&amp;#x27;content-type&amp;#x27;].contains(&amp;#x27;multipart/form-data&amp;#x27;) || request.headers[&amp;#x27;content-type&amp;#x27;].contains(&amp;#x27;application/x-www-form-urlencoded&amp;#x27;)) &amp;amp;&amp;amp; evaluatePreconfiguredWaf(&amp;#x27;cve-canary&amp;#x27;,{&amp;#x27;sensitivity&amp;#x27;: 0, &amp;#x27;opt_in_rule_ids&amp;#x27;: [&amp;#x27;google-mrs-v202512-id000001-rce&amp;#x27;,&amp;#x27;google-mrs-v202512-id000002-rce&amp;#x27;]})&amp;quot; \
    --action=deny-403&lt;/code&gt;&lt;/pre&gt;&lt;/dd&gt;
&lt;/dl&gt;&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Additionally, if you are managing your rules with Terraform, you may implement the rule via the following syntax:&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-code"&gt;&lt;dl&gt;
    &lt;dt&gt;code_block&lt;/dt&gt;
    &lt;dd&gt;&lt;pre&gt;&lt;code&gt;rule {
  action   = &amp;quot;deny(403)&amp;quot;
  priority = &amp;quot;PRIORITY_NUMBER&amp;quot;
  match {
    expr {
      expression = &amp;quot;(has(request.headers[&amp;#x27;next-action&amp;#x27;]) || has(request.headers[&amp;#x27;rsc-action-id&amp;#x27;]) || request.headers[&amp;#x27;content-type&amp;#x27;].contains(&amp;#x27;multipart/form-data&amp;#x27;) || request.headers[&amp;#x27;content-type&amp;#x27;].contains(&amp;#x27;application/x-www-form-urlencoded&amp;#x27;)) &amp;amp;&amp;amp; evaluatePreconfiguredWaf(&amp;#x27;cve-canary&amp;#x27;,{&amp;#x27;sensitivity&amp;#x27;: 0, &amp;#x27;opt_in_rule_ids&amp;#x27;: [&amp;#x27;google-mrs-v202512-id000001-rce&amp;#x27;,&amp;#x27;google-mrs-v202512-id000002-rce&amp;#x27;]})&amp;quot;
    }
  }
  description = &amp;quot;Applies protection for CVE-2025-55182 (React/Next.JS)&amp;quot;
}&lt;/code&gt;&lt;/pre&gt;&lt;/dd&gt;
&lt;/dl&gt;&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;h3&gt;&lt;span style="vertical-align: baseline;"&gt;Verifying WAF rule safety for your application and consuming telemetry&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Cloud Armor rules can be &lt;/span&gt;&lt;a href="https://docs.cloud.google.com/armor/docs/security-policy-overview#preview_mode"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;configured in preview mode&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, a logging-only mode to test or monitor the expected impact of the rule without Cloud Armor enforcing the configured action. We recommend that the new rule described above first be deployed in preview mode in your production environments so that you can see what traffic it would block. &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Once you verify that the new rule is behaving as desired in your environment, then you can disable preview mode to allow Cloud Armor to actively enforce it.&lt;/span&gt;&lt;/p&gt;
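&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;As a minimal sketch of that workflow, assuming the rule already exists in your policy, the gcloud CLI can switch preview mode on and off for it. PRIORITY_NUMBER and SECURITY_POLICY_NAME are placeholders for your own values:&lt;/span&gt;&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# Put the rule in preview mode: matches are logged, but the deny action is not enforced.
gcloud compute security-policies rules update PRIORITY_NUMBER \
    --security-policy SECURITY_POLICY_NAME \
    --preview

# After reviewing the logs, turn preview mode off so that the rule is enforced.
gcloud compute security-policies rules update PRIORITY_NUMBER \
    --security-policy SECURITY_POLICY_NAME \
    --no-preview&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;A rule can also be created directly in preview mode by adding the --preview flag to the rules create command.&lt;/span&gt;&lt;/p&gt;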
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Cloud Armor per-request WAF logs are emitted as part of the Application Load Balancer logs to Cloud Logging. To see what Cloud Armor’s decision was on every request, load balancer logging first &lt;/span&gt;&lt;a href="https://docs.cloud.google.com/load-balancing/docs/https/https-logging-monitoring"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;needs to be enabled on a per backend service basis&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;. Once it is enabled, all subsequent Cloud Armor decisions will be logged and can be found in Cloud Logging by &lt;/span&gt;&lt;a href="https://docs.cloud.google.com/armor/docs/request-logging"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;following these instructions&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;.&lt;/span&gt;&lt;/p&gt;
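&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;The following sketch shows one way to turn on logging and read back Cloud Armor decisions with the gcloud CLI. It assumes a global external Application Load Balancer. BACKEND_SERVICE_NAME and SECURITY_POLICY_NAME are placeholders, and the sample rate and result limit are illustrative choices rather than recommendations:&lt;/span&gt;&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# Enable logging on a backend service and sample every request.
gcloud compute backend-services update BACKEND_SERVICE_NAME \
    --global \
    --enable-logging \
    --logging-sample-rate=1.0

# List recent requests that the policy would have denied while the rule is in preview mode.
gcloud logging read \
    'resource.type="http_load_balancer" AND jsonPayload.previewSecurityPolicy.name="SECURITY_POLICY_NAME" AND jsonPayload.previewSecurityPolicy.outcome="DENY"' \
    --limit=20&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Once the rule is enforced rather than previewed, the same query should work with enforcedSecurityPolicy in place of previewSecurityPolicy.&lt;/span&gt;&lt;/p&gt;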
&lt;h3&gt;&lt;span style="vertical-align: baseline;"&gt;Interaction of Cloud Armor rules with &lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;vulnerability&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt; scanning tools&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;There has been a proliferation of scanning tools designed to help identify vulnerable instances of React and Next.js in your environments. Many of those scanners are designed to identify the version number of relevant frameworks in your servers and do so by crafting a &lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;legitimate&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt; query and inspecting the response from the server to detect the version of React and &lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;Next.js&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt; that is running. &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Our WAF rule is designed to detect and prevent exploit attempts of &lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;CVE-2025-55182&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;. As the scanners discussed above are not attempting an exploit, but sending a safe query to &lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;elicit&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt; a response revealing indications of the version of the software, &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;the above Cloud Armor rule will not detect or block such scanners. &lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;If the findings of these scanners indicate a vulnerable instance of software protected by Cloud Armor, that does not mean that an actual exploit attempt of the vulnerability will successfully get through your Cloud Armor security policy. Instead, such findings mean that the version of React or Next.js detected is known to be vulnerable and should be patched.&lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;span style="vertical-align: baseline;"&gt;How to get started with Cloud Armor for new users&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;If your workload is already using an Application Load Balancer to receive traffic from the internet, you can configure Cloud Armor to protect your workload from this and other application-level vulnerabilities (as well as DDoS attacks) by following &lt;/span&gt;&lt;a href="https://docs.cloud.google.com/armor/docs/configure-security-policies"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;these instructions&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;. &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;If you are not yet using an Application Load Balancer and Cloud Armor, you can get started with the &lt;/span&gt;&lt;a href="https://docs.cloud.google.com/load-balancing/docs/https"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;external Application Load Balancer overview&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, the &lt;/span&gt;&lt;a href="https://docs.cloud.google.com/armor/docs/security-policy-overview"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Cloud Armor overview&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, and the &lt;/span&gt;&lt;a href="https://docs.cloud.google.com/armor/docs/best-practices"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Cloud Armor best practices&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;If your workload is using &lt;/span&gt;&lt;a href="http://docs.cloud.google.com/run/"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Cloud Run&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, &lt;/span&gt;&lt;a href="https://cloud.google.com/functions"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Cloud Run functions&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, or &lt;/span&gt;&lt;a href="https://cloud.google.com/appengine"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;App Engine&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; and receives traffic from the internet, you must first &lt;/span&gt;&lt;a href="https://docs.cloud.google.com/load-balancing/docs/https/setup-global-ext-https-serverless"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;set up an Application Load Balancer in front of your endpoint&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; to leverage Cloud Armor security policies to protect your workload. You will then need to &lt;/span&gt;&lt;a href="https://docs.cloud.google.com/armor/docs/integrating-cloud-armor#serverless"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;configure the appropriate controls&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; to ensure that Cloud Armor and the Application Load Balancer can’t be bypassed.&lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;span style="vertical-align: baseline;"&gt;Best practices and additional risk mitigations&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Once you configure Cloud Armor, we recommend consulting our &lt;/span&gt;&lt;a href="https://docs.cloud.google.com/armor/docs/best-practices"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;best practices guide&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;. Be sure to account for &lt;/span&gt;&lt;a href="https://docs.cloud.google.com/armor/docs/security-policy-overview#limitations"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;limitations&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; &lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;discussed in the documentation to minimize risk and optimize performance while ensuring the safety and availability of your workloads. &lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;span style="vertical-align: baseline;"&gt;Serverless platform protections&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Google Cloud is enforcing platform-level protections across App Engine Standard, Cloud Functions, and Cloud Run to automatically help protect against common exploit attempts of CVE-2025-55182. This protection supplements the protections already in place for Firebase Hosting and Firebase App Hosting.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;What this means for you:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;Applications deployed to those serverless services benefit from these WAF rules that are enabled by default to help provide a base level of protection without requiring manual configuration.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;These rules are designed to block known malicious payloads targeting this vulnerability.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Important considerations:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Patching is still critical:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; These platform-level defenses are intended to be a temporary mitigation. The most effective long-term solution is to update your application's dependencies to non-vulnerable versions of React and Next.js, and then redeploy your application.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Potential impacts:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; While unlikely, if you believe this platform-level filtering is incorrectly impacting your application's traffic, please contact &lt;/span&gt;&lt;a href="https://support.google.com/cloud/answer/6282346" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Google Cloud Support&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; and reference issue number 465748820.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;&lt;span style="vertical-align: baseline;"&gt;Long-term mitigation: Mandatory framework update and redeployment&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;While WAF rules provide critical frontline defense, the most comprehensive long-term solution is to patch the underlying frameworks.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;While Google Cloud is providing platform-level protections and Cloud Armor options, we urge all customers running React and Next.js applications on Google Cloud to immediately update their dependencies to the latest stable versions (React 19.2.1 or the relevant version of Next.js listed &lt;/strong&gt;&lt;a href="https://nextjs.org/blog/CVE-2025-66478#required-action" rel="noopener" target="_blank"&gt;&lt;strong style="text-decoration: underline; vertical-align: baseline;"&gt;here&lt;/strong&gt;&lt;/a&gt;&lt;strong style="vertical-align: baseline;"&gt;), and redeploy their services.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;This applies specifically to applications deployed on:&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Cloud Run, Cloud Run functions, or App Engine&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;: Update your application dependencies with the updated framework versions and redeploy.&lt;/span&gt;&lt;/li&gt;
&lt;li role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Google Kubernetes Engine (GKE)&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;: Update your container images with the latest framework versions and redeploy your pods.&lt;/span&gt;&lt;/li&gt;
&lt;li role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Compute Engine&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;:&lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt; &lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;The public OS images provided by Google Cloud do not have React or Next.js packages installed by default. If you have installed a custom OS with the affected packages, update your workloads to include the latest framework versions and enable WAF rules in front of all workloads.&lt;/span&gt;&lt;/li&gt;
&lt;li role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Firebase&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;:&lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt; &lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;If you’re using Cloud Functions for Firebase, Firebase Hosting, or Firebase App Hosting, update your application dependencies with the updated framework versions and redeploy. Firebase Hosting and App Hosting are also automatically enforcing a rule to limit exploitation of CVE-2025-55182 through requests to custom and default domains.&lt;/span&gt;&lt;/li&gt;
&lt;/ul&gt;
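&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;For a Node.js project managed with npm (an assumption; other package managers have equivalent commands), the update and redeploy steps for a Cloud Run service might look like the following sketch. PATCHED_VERSION stands for the fixed Next.js release that matches your release line, as listed in the Next.js announcement, and SERVICE_NAME and REGION are placeholders:&lt;/span&gt;&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# Show which versions of the affected frameworks are currently installed.
npm ls react react-dom next

# Move to the patched React release named above.
npm install react@19.2.1 react-dom@19.2.1

# Move to the patched Next.js release for your release line.
npm install next@PATCHED_VERSION

# Rebuild and redeploy from source so that the running service picks up the fix.
gcloud run deploy SERVICE_NAME --source . --region REGION&lt;/code&gt;&lt;/pre&gt;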
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Patching your applications is an essential step to eliminate the vulnerability at its source and ensure the continued integrity and security of your services.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;We will continue to monitor the situation closely and provide further updates and guidance as necessary. Please refer to our official &lt;/span&gt;&lt;a href="https://docs.cloud.google.com/support/bulletins#gcp-2025-072"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Google Cloud Security advisories&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; for the most current information and detailed steps.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;If you have any questions or require assistance, please contact &lt;/span&gt;&lt;a href="https://support.google.com/cloud/answer/6282346" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Google Cloud Support&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; and reference issue number 465748820.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;</description><pubDate>Wed, 03 Dec 2025 23:00:00 +0000</pubDate><guid>https://cloud.google.com/blog/products/identity-security/responding-to-cve-2025-55182/</guid><category>DevOps &amp; SRE</category><category>Application Development</category><category>Networking</category><category>Serverless</category><category>Security &amp; Identity</category><og xmlns:og="http://ogp.me/ns#"><type>article</type><title>Responding to CVE-2025-55182: Secure your React and Next.js workloads</title><description></description><site_name>Google</site_name><url>https://cloud.google.com/blog/products/identity-security/responding-to-cve-2025-55182/</url></og><author xmlns:author="http://www.w3.org/2005/Atom"><name>Tim April</name><title>Security Reliability Engineer</title><department></department><company></company></author><author xmlns:author="http://www.w3.org/2005/Atom"><name>Emil Kiner</name><title>Senior Product Manager</title><department></department><company></company></author></item><item><title>11 ways to reduce your Google Cloud compute costs today</title><link>https://cloud.google.com/blog/products/compute/cost-saving-strategies-when-migrating-to-google-cloud-compute/</link><description>&lt;div class="block-paragraph"&gt;&lt;p data-block-key="t3t6l"&gt;As the saying goes, "a penny saved is a penny earned," and this couldn't be more true when it comes to cloud infrastructure. In today's competitive business landscape, you need to maintain the performance to meet your business needs. 
Luckily, Google Cloud’s &lt;a href="https://cloud.google.com/products/compute"&gt;Compute Engine&lt;/a&gt; and block storage services offer numerous opportunities to reduce costs without sacrificing performance, especially in the context of your migration and modernization initiatives.&lt;/p&gt;&lt;p data-block-key="fodf8"&gt;In this article, we'll explore &lt;b&gt;11 key ways&lt;/b&gt; to optimize your infrastructure spending on Google Cloud, from simple adjustments to strategic decisions that can result in significant long-term savings.&lt;/p&gt;&lt;h3 data-block-key="58qqa"&gt;&lt;b&gt;1. Choose the right VM instances&lt;/b&gt;&lt;/h3&gt;&lt;p data-block-key="bhado"&gt;One of the most effective ways to reduce Compute Engine costs is to ensure that you’ve properly selected and right-sized your virtual machines (VMs) for their workloads to support your migration and modernization efforts. Whether you're new to Google Cloud or already using Compute Engine, adopting the latest-generation VMs — such as &lt;a href="https://cloud.google.com/compute/docs/general-purpose-machines#n4_series"&gt;N4&lt;/a&gt;, &lt;a href="https://cloud.google.com/compute/docs/general-purpose-machines#c4_series"&gt;C4&lt;/a&gt;, &lt;a href="https://cloud.google.com/compute/docs/general-purpose-machines#c4d_series"&gt;C4D&lt;/a&gt;, and &lt;a href="https://cloud.google.com/compute/docs/general-purpose-machines#c4a_series"&gt;C4A&lt;/a&gt; — can deliver substantial savings and improved price-performance.&lt;/p&gt;&lt;p data-block-key="8rrjo"&gt;Powered by Google Cloud’s &lt;a href="https://cloud.google.com/titanium?e=48754805&amp;amp;hl=en"&gt;Titanium&lt;/a&gt; architecture, our latest-generation VMs offer faster CPUs, higher memory bandwidth, and more efficient virtualization than their predecessors, so you can handle the same workloads with fewer resources. 
For existing customers, migrating from older VM generations to the newest VMs can significantly lower total costs while helping you exceed current performance levels. Organizations that have made the switch often report 20–40% better performance along with meaningful reductions in cloud compute spend. For example, &lt;a href="https://www.elastic.co/blog/elasticsearch-runs-faster-google-axion-processors" target="_blank"&gt;Elastic&lt;/a&gt; leveraged the general-purpose C4A machine series based on &lt;a href="https://cloud.google.com/blog/products/compute/introducing-googles-new-arm-based-cpu?e=48754805"&gt;Google Cloud's Arm-based Axion CPUs&lt;/a&gt; to achieve a compelling efficiency and performance uplift for its workloads.&lt;/p&gt;&lt;p data-block-key="febac"&gt;Beyond &lt;a href="https://cloud.google.com/compute/docs/general-purpose-machines"&gt;general-purpose VMs&lt;/a&gt;, we also offer specialized machine types to address unique customer requirements. Compute-optimized HPC VMs like &lt;a href="https://cloud.google.com/blog/products/compute/new-h4d-vms-optimized-for-hpc?e=48754805"&gt;H4D&lt;/a&gt; are designed for high-performance computing and data analytics, offering extreme performance for demanding workloads. &lt;a href="https://cloud.google.com/compute/docs/memory-optimized-machines#m4_series"&gt;M4&lt;/a&gt; and &lt;a href="https://cloud.google.com/compute/docs/memory-optimized-machines#x4_series"&gt;X4&lt;/a&gt; instances cater to memory-intensive applications, while &lt;a href="https://cloud.google.com/compute/docs/storage-optimized-machines#z3_series"&gt;Z3&lt;/a&gt; instances are ideal for storage-intensive workloads. 
Furthermore, if you need complete control over your hardware environment and maximum performance isolation, we offer &lt;a href="https://cloud.google.com/compute/docs/instances/bare-metal-instances#:~:text=Bare%20metal%20instances%20provide%20direct,same%20way%20as%20VM%20instances."&gt;bare metal instances&lt;/a&gt;.&lt;/p&gt;&lt;p data-block-key="eqplt"&gt;These options help ensure that even the most specialized and performance-sensitive workloads can find an optimal and cost-effective home within the Compute Engine portfolio.&lt;/p&gt;&lt;h3 data-block-key="5i8hm"&gt;&lt;b&gt;2. Optimize your block storage selections&lt;/b&gt;&lt;/h3&gt;&lt;p data-block-key="bsaae"&gt;The best way to lower your block storage TCO, while ensuring your workloads remain successful, is to drive high resource efficiency. &lt;a href="https://cloud.google.com/compute/docs/disks/hyperdisks"&gt;Hyperdisk&lt;/a&gt; makes it simple to achieve both high performance and high efficiency in two ways: by letting you tune block storage to each workload, and through Storage Pools. Below, we discuss each capability and how you can use it to lower your block storage TCO.&lt;/p&gt;&lt;p data-block-key="6kmjp"&gt;Workload Optimized: With Hyperdisk, you can independently provision capacity and performance at the volume level to match your block storage resources to your workload. This lets you purchase just the capacity and performance you need, no more and no less. And with Hyperdisk Balanced’s “baseline” performance (included free with every volume), you can serve the vast majority of your VMs without purchasing any extra performance.&lt;/p&gt;&lt;p data-block-key="at87k"&gt;Storage Pools: Hyperdisk is the only hyperscale cloud block storage to offer thin-provisioned performance and capacity. 
With Hyperdisk Storage Pools, you can provision the aggregate performance and capacity your workloads require, while still provisioning the volume-level capacity and performance each workload requests (also known as &lt;a href="https://en.wikipedia.org/wiki/Thin_provisioning" target="_blank"&gt;thin provisioning&lt;/a&gt;). This allows you to pay for the resources you need, not the sum of the volumes you’ve provisioned. As a result, you can &lt;a href="https://cloud.google.com/blog/products/storage-data-transfer/hyperdisk-storage-pools-is-now-generally-available#:~:text=Infrastructure%20Manager%2C%20REWE-,Get%20started,use%20and%20manage%20your%20pools."&gt;lower your overall block storage TCO by as much as 50%&lt;/a&gt;.&lt;/p&gt;&lt;p data-block-key="929m4"&gt;For more information on how to select the right block storage for your workload and to see how customers have benefited from Hyperdisk, read this &lt;a href="https://cloud.google.com/blog/products/storage-data-transfer/how-to-choose-the-right-hyperdisk-block-storage-for-your-use-case?e=48754805"&gt;blog post&lt;/a&gt;.&lt;/p&gt;&lt;h3 data-block-key="e9sir"&gt;&lt;b&gt;3. Consider custom compute classes&lt;/b&gt;&lt;/h3&gt;&lt;p data-block-key="616a2"&gt;To get the most out of our latest-generation VMs, Google Kubernetes Engine (GKE) &lt;a href="https://cloud.google.com/kubernetes-engine/docs/concepts/about-custom-compute-classes"&gt;&lt;b&gt;custom compute classes&lt;/b&gt;&lt;/a&gt; (CCC) offer an advanced way to optimize compute choices and provide high availability. Instead of being limited to a single machine type for your workloads, you can define a prioritized list of VM instance types. This allows you to set the newest, most price-performant VMs, including our latest-generation VMs, as your top priority. GKE then automatically spins up instances based on your specified priority list. 
This feature helps you maximize the availability of your compute capacity while still aiming for the most cost-effective options, so your workloads can scale reliably without manual intervention.&lt;/p&gt;&lt;p data-block-key="b9b8c"&gt;Here are some specific use cases for how custom compute classes can help you optimize costs:&lt;/p&gt;&lt;ul&gt;&lt;li data-block-key="b747a"&gt;&lt;b&gt;Autoscaling cost-performant fallbacks:&lt;/b&gt; When demand peaks, you might be tempted to autoscale using a highly available but less cost-efficient VM type. CCC allows you to take a tiered approach. You can set up several cost-efficient fallback alternatives, so that as demand increases, GKE first attempts to use the most cost-effective options, and progressively moves to the other choices in your list when necessary to meet demand.&lt;/li&gt;&lt;li data-block-key="smd6"&gt;&lt;b&gt;AI/ML inference:&lt;/b&gt; Running AI/ML inference workloads often involves significant compute resources. Instead of maintaining a large, static reservation that might sit idle during off-peak times, CCC lets you provision a minimal base reservation and leverage more cost-effective capacity types, such as &lt;a href="https://docs.google.com/document/d/1KLJ97-xgtX9pDaodkMsXN18xJOYiBFZUSqkdYb--_44/edit?tab=t.0" target="_blank"&gt;Spot VMs&lt;/a&gt;, to handle peak inference demand — all orchestrated through your CCC configuration.&lt;/li&gt;&lt;li data-block-key="4pm95"&gt;&lt;b&gt;Adopting new VM generations:&lt;/b&gt; Combine the power of GKE custom compute classes with &lt;a href="https://cloud.google.com/compute/docs/instances/committed-use-discounts-overview#spend_based"&gt;Compute Flexible committed use discounts&lt;/a&gt; (Flex CUDs) to de-risk the adoption of new, cost-efficient VM series like N4 and C4. 
With CCC, you can define fallback options, providing workload resilience, while Flex CUDs offer financial adaptability, as the discounts apply across your total eligible compute spend, regardless of the specific VM series you use. This dual approach is a safe, cost-effective strategy for leveraging the latest hardware without disruption. For more information, read this &lt;a href="https://cloud.google.com/blog/products/compute/adopt-new-vm-series-with-gke-compute-classes-flexible-cuds/?e=48754805"&gt;blog&lt;/a&gt;.&lt;/li&gt;&lt;li data-block-key="5fieq"&gt;&lt;b&gt;Using flexible Spot VMs:&lt;/b&gt; Spot VMs offer significant savings but can be preempted. Being constrained to a single Spot VM shape increases the risk that capacity will not be available. With CCC, you can define multiple fallback Spot VM types. This "spot surfing" capability allows the application to remain on cost-efficient Spot capacity by automatically pivoting to alternative Spot instance types if the primary choice is unavailable.&lt;/li&gt;&lt;/ul&gt;&lt;p data-block-key="9pj6j"&gt;In short, by leveraging GKE CCC, you can artfully mix and match various VM types and consumption models, including On-Demand, Spot, DWS FlexStart, and instances covered by CUDs, to build a resilient and highly cost-optimized infrastructure that adapts to the unique needs and patterns of your workloads.&lt;/p&gt;&lt;h3 data-block-key="4f7sf"&gt;&lt;b&gt;4. Leverage custom machine types (CMT)&lt;/b&gt;&lt;/h3&gt;&lt;p data-block-key="44on9"&gt;&lt;a href="https://cloud.google.com/compute/docs/instances/creating-instance-with-custom-machine-type"&gt;Custom machine types&lt;/a&gt;, available on N4 VMs, allow you to precisely configure virtual machines to your exact specifications. Rather than selecting from predefined machine types that might include excess capacity, you can tailor the CPU-to-memory ratio specifically for your workloads, so you only pay for resources you actually use. 
This targeted approach minimizes waste and can significantly reduce your cloud spend, especially when migrating from on-premises to Google Cloud or from other cloud providers.&lt;/p&gt;&lt;p data-block-key="ditbj"&gt;This flexibility becomes particularly valuable if your applications have unique resource profiles that don't align well with our standard offerings. Custom machine types let you create the perfect environment for your needs. By avoiding the compromise of over-provisioning certain resources while potentially constraining others, you can achieve both better performance and more efficient spending across your Compute Engine deployment.&lt;/p&gt;&lt;p data-block-key="16cdr"&gt;As an example, take a memory-intensive workload that runs best with 16 vCPU, and 70 GB memory. Normally, you would need to pick a VM with 128 GB memory with our standard shapes, or in other cloud contexts, resulting in higher costs to run your workload due to the extra provisioned resources. Instead, with custom machine types, you can easily launch a VM with 16 vCPU and 70 GB memory, resulting in an 18% cost savings vs standard N4-highmem-16 VMs.&lt;/p&gt;&lt;h3 data-block-key="ei6g2"&gt;&lt;b&gt;5. Make the most of committed use discounts&lt;/b&gt;&lt;/h3&gt;&lt;p data-block-key="3vd4p"&gt;CUDs are a strategic cost-saving opportunity for organizations with steady, predictable computing needs. By committing to resource usage over one- or three-year periods, you can reduce cloud costs by up to 70% compared to on-demand pricing. This approach not only helps ensure budget predictability but also converts fixed infrastructure spending into a financial advantage, making it ideal for stable workloads that support core business functions.&lt;/p&gt;&lt;p data-block-key="bnjpk"&gt;Google Cloud offers flexible CUD structures to align with various operational models. 
Resource-based commitments target specific machine types and regions, while flexible commitments apply discounts across projects, regions, and machine series, which makes them a good fit for dynamic environments. By analyzing historical usage and forecasting future needs, you can identify workloads suited for these discounts and reinvest the savings into innovation and scaling initiatives.&lt;/p&gt;&lt;h3 data-block-key="einu0"&gt;&lt;b&gt;6. Manage unused disk space&lt;/b&gt;&lt;/h3&gt;&lt;p data-block-key="5n36i"&gt;You pay for the total provisioned disk space, regardless of how much you actually use. Many organizations tend to over-provision storage "just in case," which often leads to unnecessary and costly waste. For instance, if you provision a 100 GB disk but only use 20 GB, you're still paying for the entire 100 GB. Being intentional and precise with your storage allocations, rather than rounding up to common sizes, can lead to significant cost savings.&lt;/p&gt;&lt;p data-block-key="d58ss"&gt;To optimize spending, it's important to adopt a few best practices. Using &lt;a href="https://cloud.google.com/stackdriver/docs/solutions/agents/ops-agent"&gt;Ops Agent&lt;/a&gt;, regularly audit disk usage across your infrastructure to identify and eliminate inefficiencies. Resize disks to align with actual consumption, allowing a reasonable buffer for growth. Implement automated alerts in &lt;a href="https://cloud.google.com/monitoring?e=48754805&amp;amp;hl=en"&gt;Cloud Monitoring&lt;/a&gt; to detect underutilized disks and take corrective action. 
For stateless applications, consider using smaller boot disk images to minimize overhead and reduce costs even further.&lt;/p&gt;&lt;p data-block-key="4hu1t"&gt;In addition, consider the following optimization strategies to further reduce costs and improve efficiency:&lt;/p&gt;&lt;ul&gt;&lt;li data-block-key="as6ik"&gt;Use Google Cloud’s monitoring tools to track CPU, memory, and disk usage over time.&lt;/li&gt;&lt;li data-block-key="3uj3j"&gt;Establish a regular review cycle to identify and right-size over-provisioned resources.&lt;/li&gt;&lt;li data-block-key="7jobf"&gt;Test workloads across different VM configurations to find the optimal balance between cost and performance.&lt;/li&gt;&lt;/ul&gt;&lt;h3 data-block-key="a49bh"&gt;&lt;b&gt;7. Use Spot VMs&lt;/b&gt;&lt;/h3&gt;&lt;p data-block-key="52fp7"&gt;&lt;a href="https://cloud.google.com/compute/docs/instances/spot"&gt;Spot VMs&lt;/a&gt; provide the same machine types and configuration​​ options as standard virtual machines but at a significantly reduced cost — typically offering a 60% to 91% discount. This cost efficiency comes with the tradeoff of potential preemption at short notice, making them most suitable for workloads that are fault-tolerant and can recover quickly from unexpected interruptions. Spot VMs are designed to take advantage of unused compute capacity, allowing you to optimize your cloud spending without compromising access to high-performance resources.&lt;/p&gt;&lt;p data-block-key="8abo0"&gt;Strong use cases for Spot VMs include batch processing jobs, big data and analytics workloads, continuous integration and deployment (CI/CD) pipelines, stateless web servers running in autoscaling groups, and compute-heavy tasks. 
When properly architected to handle interruptions — for example, by using job checkpointing, load balancing, task queues, or via GKE custom compute classes (see more above) — &lt;a href="https://cloud.google.com/solutions/spot-vms?e=48754805&amp;amp;hl=en"&gt;Spot VMs&lt;/a&gt; can play a critical role in minimizing infrastructure costs while maintaining high availability and system resilience. Leveraging Spot VMs in these scenarios lets you scale cost-effectively, especially when compute demand is variable or time-flexible.&lt;/p&gt;&lt;h3 data-block-key="9not2"&gt;&lt;b&gt;8. Use optimization recommendations&lt;/b&gt;&lt;/h3&gt;&lt;p data-block-key="bjf0l"&gt;Google Cloud's &lt;a href="https://cloud.google.com/recommender/docs/recommenders"&gt;Recommenders&lt;/a&gt; are a powerful tool designed to help you optimize your cloud resources efficiently. When browsing the Google Cloud console, you may see lightbulb icons next to specific resources — these indicate potential improvements identified by Google's recommendation engine. By analyzing real-time usage patterns and current resource configurations, the &lt;a href="https://cloud.google.com/recommender/docs/key-concepts#recommenders"&gt;Recommender&lt;/a&gt; delivers actionable insights tailored to each user's unique environment. This intelligent system highlights opportunities not only to reduce costs but also to enhance security, performance, reliability, management efficiency, and environmental sustainability.&lt;/p&gt;&lt;p data-block-key="91nm7"&gt;For example, there are &lt;a href="https://cloud.google.com/compute/docs/instances/idle-vm-recommendations-overview"&gt;idle VM recommendations&lt;/a&gt; to help you identify VM instances that have not been used over the last 1 to 14 days. Common recommendations include switching to more suitable machine types, rightsizing underutilized compute instances, or adopting more cost-effective storage solutions. 
The tool allows you to apply many of these changes directly, streamlining the optimization process. By continuously evaluating workloads and offering these automated, data-driven suggestions, the Recommendation Hub helps organizations maintain cloud performance while managing costs more effectively.&lt;/p&gt;&lt;h3 data-block-key="35ft8"&gt;&lt;b&gt;9. Take advantage of auto-scaling and scheduling&lt;/b&gt;&lt;/h3&gt;&lt;p data-block-key="5i72v"&gt;Matching your compute resources to actual demand patterns is one of the most effective ways to reduce cloud waste and improve overall cost efficiency. Many organizations over-provision their resources to handle peak workloads, leaving machines underutilized during off-peak periods. By aligning compute capacity more closely with real-time or predictable usage patterns, such as business hours or seasonal trends, you can significantly cut unnecessary spending without sacrificing performance.&lt;/p&gt;&lt;p data-block-key="c7ge7"&gt;&lt;a href="https://cloud.google.com/compute/docs/autoscaler"&gt;Autoscaling&lt;/a&gt; is the key to achieving this efficiency. In fact, customers who leverage Google Compute Engine autoscaling for their virtual machines have seen average infrastructure cost savings of more than 40%.&lt;/p&gt;&lt;p data-block-key="70opn"&gt;You can implement autoscaling strategies to dynamically adjust resources based on CPU utilization, load balancing capacity, or custom application metrics, so that workloads receive the necessary compute power when needed, while scaling down automatically during low-demand periods.&lt;/p&gt;&lt;p data-block-key="cj1lq"&gt;For workloads with predictable patterns, such as those that fluctuate with business hours or planned seasonal events, &lt;a href="https://cloud.google.com/compute/docs/autoscaler/scaling-schedules"&gt;schedule-based scaling&lt;/a&gt; is a particularly powerful tool. 
This approach allows you to proactively increase resources in anticipation of high demand and scale them down during lulls, for the performance you need without constant over-provisioning.&lt;/p&gt;&lt;p data-block-key="1i6kf"&gt;In addition to autoscaling, several practical implementation techniques can further optimize your resource usage. &lt;a href="https://cloud.google.com/scheduler/docs/start-and-stop-compute-engine-instances-on-a-schedule"&gt;Setting up instance scheduling&lt;/a&gt; lets you automatically start and stop development and test environments according to business hours — a simple yet highly effective approach that can lead to cost savings of up to 70%. You can also leverage maintenance windows to reduce disruptions and resource consumption, by concentrating updates and system changes into low-usage periods. Together, these tactics help maintain high availability and performance while keeping infrastructure costs under control.&lt;/p&gt;&lt;h3 data-block-key="evivu"&gt;&lt;b&gt;10. Understand your spend with detailed billing analysis&lt;/b&gt;&lt;/h3&gt;&lt;p data-block-key="34tqj"&gt;Before implementing any cost-saving strategies in Google Cloud, it’s essential to &lt;a href="https://cloud.google.com/billing/docs/concepts"&gt;understand your current spending in detail&lt;/a&gt;. Google Cloud’s billing panel offers granular visibility into your expenses, including costs broken down by individual SKUs. This level of transparency lets you track where your money is going and identify potential inefficiencies. Begin by regularly reviewing your billing dashboard to monitor usage trends and spot anomalies. 
Applying labels and tags to your resources can further help categorize and attribute costs accurately, especially in complex environments with multiple projects or departments.&lt;/p&gt;&lt;p data-block-key="een0q"&gt;In addition, &lt;a href="https://cloud.google.com/billing/docs/how-to/budgets"&gt;setting up budget alerts&lt;/a&gt; is a practical way to stay ahead of overspending by notifying you when costs approach or exceed predefined thresholds. It’s also important to identify and eliminate unused or idle resources, such as virtual machines or persistent disks that are no longer in active use; these can often be shut down or deleted to immediately reduce costs. By thoroughly analyzing your cost structure, you can uncover “low-hanging fruit” (resources that provide little or no value) and make data-driven decisions to optimize your cloud usage efficiently.&lt;/p&gt;&lt;h3 data-block-key="9jfbk"&gt;&lt;b&gt;11. Consider serverless alternatives&lt;/b&gt;&lt;/h3&gt;&lt;p data-block-key="e7aev"&gt;Last but not least, Google Cloud's &lt;a href="https://cloud.google.com/discover/what-is-serverless-computing?e=48754805&amp;amp;hl=en"&gt;serverless computing&lt;/a&gt; offerings provide a compelling alternative to traditional virtual machines and can deliver better cost efficiency, simplified operations, and greater scalability. By abstracting away infrastructure management, serverless platforms allow teams to focus on writing and deploying code without worrying about provisioning, scaling, or maintaining servers. This shift can not only reduce operational overhead but also cut costs by aligning compute spending directly with application usage.&lt;/p&gt;&lt;p data-block-key="4c30g"&gt;There are multiple serverless options available, each tailored to different workloads. &lt;a href="https://cloud.google.com/run?e=48754805&amp;amp;hl=en"&gt;Cloud Run&lt;/a&gt; is designed for running containerized applications that need rapid scaling and flexible deployment. 
&lt;a href="https://cloud.google.com/run/docs/write-event-driven-functions"&gt;Cloud Run Functions&lt;/a&gt; supports lightweight, event-driven code execution for microservices or automation tasks. &lt;a href="https://cloud.google.com/kubernetes-engine/docs/concepts/autopilot-overview"&gt;GKE (Autopilot Mode)&lt;/a&gt; simplifies Kubernetes operations by automatically managing nodes and scaling, allowing you to run Kubernetes workloads without handling the underlying infrastructure. All these options charge based on usage rather than allocation, significantly reducing costs associated with idle resources and over-provisioning. This makes them especially beneficial for variable or unpredictable workloads. Cloud Run and GKE both support GPUs, and you have the flexibility to move between the two. You can start with &lt;a href="https://www.youtube.com/watch?v=nGFXKTz2jZM&amp;amp;t=2s&amp;amp;pp=ygUabW92ZSBmcm9tIGNsb3VkIHJ1biB0byBHS0U%3D" target="_blank"&gt;Cloud Run and then move to GKE&lt;/a&gt; or &lt;a href="https://www.youtube.com/watch?v=x12EOsVt2oU&amp;amp;t=1s&amp;amp;pp=ygUabW92ZSBmcm9tIGNsb3VkIHJ1biB0byBHS0U%3D" target="_blank"&gt;vice versa&lt;/a&gt;. Some customers also use both offerings for their workloads. The rule of thumb is to start with GKE if you need access to the Kubernetes API. Otherwise, start with Cloud Run.&lt;/p&gt;&lt;h2 data-block-key="6n8fn"&gt;&lt;b&gt;Start reducing your costs today&lt;/b&gt;&lt;/h2&gt;&lt;p data-block-key="buet9"&gt;Migrate to Google Cloud and optimize your infrastructure costs without compromising on what your workloads need. If you are new to Google Cloud, start with &lt;a href="http://g.co/cloud/assess" target="_blank"&gt;a migration assessment&lt;/a&gt;. Google Cloud’s &lt;a href="https://cloud.google.com/migration-center/docs"&gt;Migration Center&lt;/a&gt; can give you a clear understanding of your potential savings from migrating to Google Cloud, along with detailed recommended paths for your workloads and TCO reports. 
Apply the strategies in this article and unlock substantial cost savings.&lt;/p&gt;&lt;/div&gt;</description><pubDate>Mon, 06 Oct 2025 16:00:00 +0000</pubDate><guid>https://cloud.google.com/blog/products/compute/cost-saving-strategies-when-migrating-to-google-cloud-compute/</guid><category>Infrastructure Modernization</category><category>Storage &amp; Data Transfer</category><category>Serverless</category><category>Compute</category><og xmlns:og="http://ogp.me/ns#"><type>article</type><title>11 ways to reduce your Google Cloud compute costs today</title><description></description><site_name>Google</site_name><url>https://cloud.google.com/blog/products/compute/cost-saving-strategies-when-migrating-to-google-cloud-compute/</url></og><author xmlns:author="http://www.w3.org/2005/Atom"><name>Alex Bestavros</name><title>Group Product Manager</title><department></department><company></company></author><author xmlns:author="http://www.w3.org/2005/Atom"><name>Sai Gopalan</name><title>Product Management, Google Cloud</title><department></department><company></company></author></item><item><title>Automate app deployment and security analysis with new Gemini CLI extensions</title><link>https://cloud.google.com/blog/products/ai-machine-learning/automate-app-deployment-and-security-analysis-with-new-gemini-cli-extensions/</link><description>&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Find and fix security vulnerabilities. Deploy your app to the cloud. All without leaving your command-line. &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span&gt;&lt;span style="vertical-align: baseline;"&gt;Today, we’re closing the gap between your terminal and the cloud with a first look at the future of Gemini CLI, delivered through two new extensions: the &lt;/span&gt;&lt;a href="https://github.com/google-gemini/gemini-cli-security/tree/main" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;security extension&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; and the &lt;/span&gt;&lt;a href="https://github.com/GoogleCloudPlatform/cloud-run-mcp/?tab=readme-ov-file#use-as-a-gemini-cli-extension" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Cloud Run extension&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;. These extensions are designed to handle critical parts of your workflow with simple, intuitive commands:&lt;/span&gt;&lt;/span&gt;&lt;/p&gt;
&lt;p style="padding-left: 40px;"&gt;&lt;span style="vertical-align: baseline;"&gt;1)  &lt;/span&gt;&lt;strong style="font-style: italic; vertical-align: baseline;"&gt;/security:analyze&lt;/strong&gt;&lt;strong style="vertical-align: baseline;"&gt; &lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;performs a comprehensive scan right in your local repository, with support for GitHub pull requests coming soon. This makes security a natural part of your development cycle.&lt;/span&gt;&lt;/p&gt;
&lt;p style="padding-left: 40px;"&gt;&lt;span style="vertical-align: baseline;"&gt;2)  &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;/deploy&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; deploys your application to Cloud Run, our fully managed serverless platform, in just a few minutes. &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;These commands are the first expression of a new extensibility framework for Gemini CLI. While we'll be sharing more about the full &lt;/span&gt;&lt;a href="https://github.com/google-gemini/gemini-cli/blob/main/docs/extension.md" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Gemini CLI extension&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; world soon, we couldn't wait to get these capabilities into your hands. Consider this a sneak peek of what’s coming next!&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Security extension: automate security analysis with /security:analyze &lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;To help teams address software vulnerabilities early in the development lifecycle, we are launching the &lt;/span&gt;&lt;a href="https://github.com/google-gemini/gemini-cli-security" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Gemini CLI Security extension&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;. This new open-source tool automates security analysis, enabling you to proactively catch and fix issues using the &lt;/span&gt;&lt;code style="vertical-align: baseline;"&gt;/security:analyze&lt;/code&gt;&lt;span style="vertical-align: baseline;"&gt; command at the terminal or through an upcoming GitHub Actions integration. &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Integrated directly into your local development workflow and CI/CD pipeline, this extension:&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Analyzes code changes:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; When triggered, the extension automatically takes the &lt;/span&gt;&lt;code style="vertical-align: baseline;"&gt;git diff&lt;/code&gt;&lt;span style="vertical-align: baseline;"&gt; of your local changes or pull request.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Identifies vulnerabilities:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Using a specialized prompt and tools, Gemini CLI analyzes the changes for a wide range of potential vulnerabilities, such as hardcoded secrets, injection vulnerabilities, broken access control, and insecure data handling.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Provides actionable feedback:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Gemini returns a detailed, easy-to-understand report directly in your terminal or as a comment on your pull request. This report doesn't just flag issues; it explains the potential risks and provides concrete suggestions for remediation, helping you fix issues quickly and learn as you go.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;And after the report is generated, you can also ask Gemini CLI to save it to disk or even implement fixes for each issue.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
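&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;To make the first two steps above more concrete, here is a minimal shell sketch of diff-scoped scanning. It is an illustration only and not how the extension is implemented: the extension relies on Gemini with a specialized prompt and tools, whereas this stand-in uses a naive regular expression. The function name &lt;code&gt;scan_diff_for_secrets&lt;/code&gt; and the pattern in &lt;code&gt;SECRET_PATTERN&lt;/code&gt; are hypothetical.&lt;/p&gt;&lt;/div&gt;

```shell
#!/usr/bin/env bash
# Illustration only: NOT how the Gemini CLI Security extension works internally.
# It mimics the "scan only what changed" idea with a naive regex over a diff.

# Hypothetical pattern for assignments that look like hardcoded credentials,
# for example: API_KEY = "abc123"
SECRET_PATTERN="(api[_-]?key|secret|password|token)[[:space:]]*[:=][[:space:]]*[\"'][^\"']+[\"']"

# scan_diff_for_secrets: reads a unified diff on stdin and prints added lines
# (a leading "+" not followed by another "+", which skips "+++" file headers)
# that match SECRET_PATTERN, case-insensitively.
scan_diff_for_secrets() {
  grep -E '^\+[^+]' | grep -Ei "$SECRET_PATTERN"
}

# Possible usage inside a repository:
#   git diff | scan_diff_for_secrets
```

&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;Because only added lines are inspected, removed lines and untouched code never appear in the output. A regex of this kind covers just one narrow class of issue, so treat it as a way to picture the workflow rather than a substitute for the extension.&lt;/p&gt;&lt;/div&gt;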
&lt;div class="block-image_full_width"&gt;
    &lt;div class="article-module h-c-page"&gt;
      &lt;div class="h-c-grid"&gt;
    &lt;figure class="article-image--large h-c-grid__col h-c-grid__col--6 h-c-grid__col--offset-3"&gt;
        &lt;img
            src="https://storage.googleapis.com/gweb-cloudblog-publish/original_images/1_Gemini_CLI_Security_Extension_Terminal_Gif.gif"
          alt="1 Gemini CLI Security Extension Terminal Gif"&gt;
    &lt;/figure&gt;
      &lt;/div&gt;
    &lt;/div&gt;
&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Getting started with /security:analyze&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Integrating security analysis into your workflow is simple. First, download the Gemini CLI and install the extension &lt;/span&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;(requires Gemini CLI v0.4.0+)&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;:&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-code"&gt;&lt;dl&gt;
    &lt;dd&gt;&lt;pre&gt;&lt;code&gt;gemini extensions install https://github.com/google-gemini/gemini-cli-security&lt;/code&gt;&lt;/pre&gt;&lt;/dd&gt;
&lt;/dl&gt;&lt;/div&gt;
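&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;Since the extension requires Gemini CLI v0.4.0 or later, you may want to check your installed version before installing. The following is a minimal sketch, not an official script. The helper &lt;code&gt;version_at_least&lt;/code&gt; is hypothetical, it assumes a &lt;code&gt;sort&lt;/code&gt; that supports &lt;code&gt;-V&lt;/code&gt; (version sort), and the commented usage assumes that &lt;code&gt;gemini --version&lt;/code&gt; prints a bare version string such as 0.4.0.&lt;/p&gt;&lt;/div&gt;

```shell
#!/usr/bin/env bash
# version_at_least CURRENT MINIMUM
# Succeeds (exit status 0) when CURRENT is greater than or equal to MINIMUM.
# Assumes `sort -V` (version-aware sort) is available on this system.
version_at_least() {
  local current="$1" minimum="$2"
  # The lower of the two versions sorts first; if that is MINIMUM,
  # then CURRENT is new enough.
  [ "$(printf '%s\n' "$minimum" "$current" | sort -V | head -n 1)" = "$minimum" ]
}

# Hypothetical usage (assumes `gemini --version` prints only the version):
#   if version_at_least "$(gemini --version)" "0.4.0"; then
#     gemini extensions install https://github.com/google-gemini/gemini-cli-security
#   else
#     echo "Gemini CLI 0.4.0 or later is required"
#   fi
```

&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;Version sort is used instead of a plain string comparison so that a release such as 0.10.1 is correctly treated as newer than 0.4.0.&lt;/p&gt;&lt;/div&gt;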
&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Then you can run your first scan:&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Locally:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; After making local changes, simply run &lt;/span&gt;&lt;code style="vertical-align: baseline;"&gt;/security:analyze&lt;/code&gt;&lt;span style="vertical-align: baseline;"&gt; in the Gemini CLI.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;In CI/CD (Coming Soon): &lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;We're bringing security analysis directly into your CI/CD workflow. Soon,&lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt; &lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;you’ll be able to configure the GitHub Action to automatically review pull requests as they are opened.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;This is just the beginning. The team is actively working to enhance the extension's capabilities, and we invite the community to contribute to this open source project by reporting bugs, suggesting features, improving security practices, and submitting code improvements.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;For complete documentation and to contribute, visit the &lt;/span&gt;&lt;a href="https://github.com/google-gemini/gemini-cli-security" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;official GitHub repository&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Cloud Run extension: automate deployment with &lt;/strong&gt;&lt;strong style="font-style: italic; vertical-align: baseline;"&gt;/deploy&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;The&lt;/span&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt; &lt;/span&gt;&lt;strong style="font-style: italic; vertical-align: baseline;"&gt;/deploy&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; command in Gemini CLI automates the entire deployment pipeline for your web applications. You can now deploy a project directly from your local workspace. Once you issue the command, Gemini returns a public URL for your live application.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;The &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;/deploy&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; command automates a full CI/CD pipeline to deploy web applications and cloud services from the command line using the &lt;/span&gt;&lt;a href="https://github.com/GoogleCloudPlatform/cloud-run-mcp/" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Cloud Run MCP server&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;. What used to be a multi-step process of building, containerizing, pushing, and configuring is now a single, intuitive command from within the Gemini CLI.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;You can access this feature across three different surfaces – in Gemini CLI in the terminal, in VS Code via &lt;/span&gt;&lt;a href="https://codeassist.google/" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Gemini Code Assist&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; agent mode, and in Gemini CLI in &lt;/span&gt;&lt;a href="https://cloud.google.com/shell/docs"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Cloud Shell&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;. &lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-image_full_width"&gt;






  
    &lt;div class="article-module h-c-page"&gt;
      &lt;div class="h-c-grid"&gt;
  

    &lt;figure class="article-image--large
      
      
        h-c-grid__col
        h-c-grid__col--6 h-c-grid__col--offset-3
        
        
      "
      &gt;

      
      
        
        &lt;img
            src="https://storage.googleapis.com/gweb-cloudblog-publish/original_images/2_aA6mg0y.gif"
        
          alt="2"&gt;
        
        &lt;/a&gt;
      
        &lt;figcaption class="article-image__caption "&gt;&lt;p data-block-key="dvesx"&gt;Use the /deploy command in Gemini CLI in the terminal to deploy an application to Cloud Run&lt;/p&gt;&lt;/figcaption&gt;
      
    &lt;/figure&gt;

  
      &lt;/div&gt;
    &lt;/div&gt;
  




&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Get started with /deploy:&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;For existing Google Cloud users, getting started with &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;/deploy&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; is straightforward in Gemini CLI at the terminal:&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;Prerequisites:&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt; You'll need the gcloud CLI installed and configured on your machine, plus an existing app to deploy (or you can use Gemini CLI to create one).&lt;/span&gt;&lt;/p&gt;
&lt;p style="padding-left: 40px;"&gt;&lt;strong style="vertical-align: baseline;"&gt;Step 1: Install the Cloud Run extension&lt;br/&gt;&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;The &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;/deploy&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; command is enabled through a &lt;/span&gt;&lt;a href="https://github.com/GoogleCloudPlatform/cloud-run-mcp" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Model Context Protocol (MCP) server&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, which is included in the Cloud Run extension.  To install the Cloud Run extension &lt;/span&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;(Requires Gemini CLI v0.4.0+)&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;, run this command:  &lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-code"&gt;&lt;dl&gt;
    &lt;dt&gt;code_block&lt;/dt&gt;
    &lt;dd&gt;&lt;pre&gt;&lt;code&gt;gemini extensions install https://github.com/GoogleCloudPlatform/cloud-run-mcp&lt;/code&gt;&lt;/pre&gt;&lt;/dd&gt;
&lt;/dl&gt;&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;p style="padding-left: 40px;"&gt;&lt;strong style="vertical-align: baseline;"&gt;Step 2: Authenticate with Google Cloud&lt;br/&gt;&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;Ensure your local environment is authenticated to your Google Cloud account by running:&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-code"&gt;&lt;dl&gt;
    &lt;dt&gt;code_block&lt;/dt&gt;
    &lt;dd&gt;&lt;pre&gt;&lt;code&gt;gcloud auth login
gcloud auth application-default login&lt;/code&gt;&lt;/pre&gt;&lt;/dd&gt;
&lt;/dl&gt;&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;p style="padding-left: 40px;"&gt;&lt;strong style="vertical-align: baseline;"&gt;Step 3: Deploy your app&lt;br/&gt;&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;Navigate to your application's root directory in your terminal and type &lt;/span&gt;&lt;code style="vertical-align: baseline;"&gt;gemini&lt;/code&gt;&lt;span style="vertical-align: baseline;"&gt; to launch Gemini CLI. Once inside, type &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;/deploy&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; to deploy your app to Cloud Run.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;That's it! In a few moments, Gemini CLI will return a public URL where you can access your newly deployed application. You can also visit the Google Cloud Console to see your new service running in Cloud Run. &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Besides Gemini CLI in the terminal, this feature is also available in VS Code via Gemini Code Assist &lt;/span&gt;&lt;a href="https://cloud.google.com/gemini/docs/codeassist/release-notes"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;agent mode&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, powered by Gemini CLI, and in Gemini CLI in Cloud Shell, where authentication is handled automatically.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-image_full_width"&gt;






  
    &lt;div class="article-module h-c-page"&gt;
      &lt;div class="h-c-grid"&gt;
  

    &lt;figure class="article-image--large
      
      
        h-c-grid__col
        h-c-grid__col--6 h-c-grid__col--offset-3
        
        
      "
      &gt;

      
      
        
        &lt;img
            src="https://storage.googleapis.com/gweb-cloudblog-publish/original_images/3_deploy-agentmode.gif"
        
          alt="3 deploy-agentmode"&gt;
        
        &lt;/a&gt;
      
        &lt;figcaption class="article-image__caption "&gt;&lt;p data-block-key="dvesx"&gt;Use the /deploy command to deploy an application to Cloud Run in VS Code via Gemini Code Assist agent mode.&lt;/p&gt;&lt;/figcaption&gt;
      
    &lt;/figure&gt;

  
      &lt;/div&gt;
    &lt;/div&gt;
  




&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Building a robust extension ecosystem  &lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;The Security and Cloud Run extensions are two of the first extensions from Google built on our new framework, which is designed to create a rich and open ecosystem for the Gemini CLI. We are building a platform that will allow any developer to extend and customize the CLI's capabilities, and this is just an early preview of the full platform's potential. We will be sharing a more comprehensive look at our extensions platform soon, including how you can start building and sharing your own.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;To try Gemini CLI today, visit the &lt;/span&gt;&lt;a href="http://github.com/google-gemini/gemini-cli" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;GitHub repository&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;</description><pubDate>Wed, 10 Sep 2025 14:00:00 +0000</pubDate><guid>https://cloud.google.com/blog/products/ai-machine-learning/automate-app-deployment-and-security-analysis-with-new-gemini-cli-extensions/</guid><category>Application Development</category><category>Serverless</category><category>Open Source</category><category>AI &amp; Machine Learning</category><og xmlns:og="http://ogp.me/ns#"><type>article</type><title>Automate app deployment and security analysis with new Gemini CLI extensions</title><description></description><site_name>Google</site_name><url>https://cloud.google.com/blog/products/ai-machine-learning/automate-app-deployment-and-security-analysis-with-new-gemini-cli-extensions/</url></og><author xmlns:author="http://www.w3.org/2005/Atom"><name>Prithpal Bhogill</name><title>Group Product Manager</title><department></department><company></company></author><author xmlns:author="http://www.w3.org/2005/Atom"><name>Evan Otero</name><title>Senior Product Manager</title><department></department><company></company></author></item><item><title>From localhost to launch: Simplify AI app deployment with Cloud Run and Docker Compose</title><link>https://cloud.google.com/blog/products/serverless/cloud-run-and-docker-collaboration/</link><description>&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;At Google Cloud, we are committed to making it as seamless as possible for you to build and deploy the next generation of AI and agentic applications. 
Today, we’re thrilled to announce that we are &lt;/span&gt;&lt;a href="https://docker.com/blog/build-ai-agents-with-docker-compose/" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;collaborating with Docker&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; to drastically simplify your deployment workflows, enabling you to bring your sophisticated AI applications from local development to &lt;/span&gt;&lt;a href="https://cloud.google.com/run"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Cloud Run&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; with ease. &lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Deploy your compose.yaml directly to Cloud Run&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Previously, bridging the gap between your development environment and managed platforms like Cloud Run required you to manually translate and configure your infrastructure. Agentic applications that use MCP servers and self-hosted models added additional complexity. &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;The open-source &lt;/span&gt;&lt;a href="http://compose-spec.io" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Compose Specification&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; is one of the most popular ways for developers to iterate on complex applications in their local environment, and is the basis of Docker Compose. Now, &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;gcloud run compose up&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; brings the simplicity of Docker Compose to Cloud Run, automating this entire process. Available in &lt;/span&gt;&lt;a href="https://forms.gle/XDHCkbGPWWcjx9mk9" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;private preview&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, it lets you deploy your existing &lt;/span&gt;&lt;code style="vertical-align: baseline;"&gt;compose.yaml&lt;/code&gt;&lt;span style="vertical-align: baseline;"&gt; file to Cloud Run with a single command, including building containers from source and using Cloud Run’s volume mounts for data persistence.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-image_full_width"&gt;






  
    &lt;div class="article-module h-c-page"&gt;
      &lt;div class="h-c-grid"&gt;
  

    &lt;figure class="article-image--large
      
      
        h-c-grid__col
        h-c-grid__col--6 h-c-grid__col--offset-3
        
        
      "
      &gt;

      
      
        
        &lt;img
            src="https://storage.googleapis.com/gweb-cloudblog-publish/original_images/compose.gif"
        
          alt="compose"&gt;
        
        &lt;/a&gt;
      
    &lt;/figure&gt;

  
      &lt;/div&gt;
    &lt;/div&gt;
  




&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Supporting the Compose Specification with Cloud Run makes for easy transitions across your local and cloud deployments, where you can keep the same configuration format, ensuring consistency and accelerating your dev cycle.&lt;/span&gt;&lt;/p&gt;
&lt;p style="padding-left: 40px;"&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;“We’ve recently evolved Docker Compose to support agentic applications, and we’re excited to see that innovation extend to Google Cloud Run with support for GPU-backed execution. Using Docker and Cloud Run, developers can now iterate locally and deploy intelligent agents to production at scale with a single command. It’s a major step forward in making AI-native development accessible and composable. We’re looking forward to continuing our close collaboration with Google Cloud to simplify how developers build and run the next generation of intelligent applications.” - &lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;Tushar Jain, EVP Engineering and Product, Docker&lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Cloud Run, your home for AI applications&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Support for the Compose Specification isn’t the only AI-friendly innovation you’ll find in Cloud Run. We recently announced &lt;/span&gt;&lt;a href="https://cloud.google.com/blog/products/serverless/cloud-run-gpus-are-now-generally-available"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;general availability of Cloud Run GPUs&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, removing a significant barrier to entry for developers who want access to GPUs for AI workloads. With its pay-per-second billing, scale to zero, and rapid scaling (approximately 19 seconds to time-to-first-token for a gemma3:4b model), Cloud Run is a great hosting solution for deploying and serving LLMs. &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;This also makes Cloud Run a strong solution for Docker’s recently &lt;/span&gt;&lt;a href="https://www.docker.com/blog/docker-mcp-gateway-secure-infrastructure-for-agentic-ai/" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;announced&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; OSS MCP Gateway and Model Runner, making it easy for developers to take AI applications from local development to production in the cloud. Because Cloud Run supports Docker’s recent addition of &lt;/span&gt;&lt;a href="https://github.com/compose-spec/compose-spec/blob/main/spec.md#models" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;‘models’ to the open Compose Spec&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, you can deploy these complex solutions to the cloud with a single command.&lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Bringing it all together&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Let's review the compose file for the above demo. It consists of a multi-container application (defined in &lt;/span&gt;&lt;code style="vertical-align: baseline;"&gt;services&lt;/code&gt;&lt;span style="vertical-align: baseline;"&gt;) built from source and using a storage volume (defined in &lt;/span&gt;&lt;code style="vertical-align: baseline;"&gt;volumes&lt;/code&gt;&lt;span style="vertical-align: baseline;"&gt;). It also uses the new &lt;/span&gt;&lt;code style="vertical-align: baseline;"&gt;models&lt;/code&gt;&lt;span style="vertical-align: baseline;"&gt; attribute to define AI models, and a Cloud Run extension (&lt;/span&gt;&lt;code style="vertical-align: baseline;"&gt;x-google-cloudrun&lt;/code&gt;&lt;span style="vertical-align: baseline;"&gt;) that defines the runtime image to use:&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-code"&gt;&lt;dl&gt;
    &lt;dt&gt;code_block&lt;/dt&gt;
    &lt;dd&gt;&lt;pre&gt;&lt;code&gt;name: agent
services:
  webapp:
    build: .
    ports:
      - &amp;quot;8080:8080&amp;quot;
    volumes:
      - web_images:/assets/images
    depends_on:
      - adk

  adk:
    image: us-central1-docker.pkg.dev/jmahood-demo/adk:latest
    ports:
      - &amp;quot;3000:3000&amp;quot;
    models:
      - ai-model

models:
  ai-model:
    model: ai/gemma3-qat:4B-Q4_K_M
    x-google-cloudrun:
      inference-endpoint: docker/model-runner:latest-cuda12.2.2

volumes:
  web_images:&lt;/code&gt;&lt;/pre&gt;&lt;/dd&gt;
&lt;/dl&gt;&lt;/div&gt;
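&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;To make the relationship between services and the top-level models section concrete, here is a small Python sketch that checks that every model a service references is actually declared. It is only an illustration: the function name is hypothetical, and it assumes the compose file has already been parsed into a plain dictionary, as a YAML parser would typically produce. It is not how gcloud validates the file.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;

```python
def undefined_model_refs(compose: dict) -> dict[str, list[str]]:
    """Map each service name to the models it references that are not
    declared under the top-level `models` key."""
    declared = set(compose.get("models") or {})
    problems: dict[str, list[str]] = {}
    for name, service in (compose.get("services") or {}).items():
        # Iterating works for both a list of names and a mapping keyed by name.
        missing = [ref for ref in service.get("models", []) if ref not in declared]
        if missing:
            problems[name] = missing
    return problems


if __name__ == "__main__":
    # A trimmed version of the compose file above, written as a Python dict.
    compose = {
        "name": "agent",
        "services": {
            "webapp": {"build": ".", "depends_on": ["adk"]},
            "adk": {
                "image": "us-central1-docker.pkg.dev/jmahood-demo/adk:latest",
                "models": ["ai-model"],
            },
        },
        "models": {"ai-model": {"model": "ai/gemma3-qat:4B-Q4_K_M"}},
    }
    print(undefined_model_refs(compose))  # empty dict: every reference resolves
```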
&lt;div class="block-paragraph_advanced"&gt;&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Building the future of AI&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;We’re committed to offering developers maximum flexibility and choice by adopting open standards and supporting various agent frameworks.&lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt; &lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;This collaboration on Cloud Run and Docker is another example of how we aim to simplify the process for developers to build and deploy intelligent applications. &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Compose Specification support is available for our trusted users — &lt;/span&gt;&lt;a href="https://forms.gle/XDHCkbGPWWcjx9mk9" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;sign up here for the private preview&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;. &lt;/span&gt;&lt;/p&gt;&lt;/div&gt;</description><pubDate>Thu, 10 Jul 2025 09:30:00 +0000</pubDate><guid>https://cloud.google.com/blog/products/serverless/cloud-run-and-docker-collaboration/</guid><category>DevOps &amp; SRE</category><category>Application Modernization</category><category>Partners</category><category>Serverless</category><media:content height="540" url="https://storage.googleapis.com/gweb-cloudblog-publish/images/cloud_run_docker.max-600x600.png" width="540"></media:content><og xmlns:og="http://ogp.me/ns#"><type>article</type><title>From localhost to launch: Simplify AI app deployment with Cloud Run and Docker Compose</title><description></description><image>https://storage.googleapis.com/gweb-cloudblog-publish/images/cloud_run_docker.max-600x600.png</image><site_name>Google</site_name><url>https://cloud.google.com/blog/products/serverless/cloud-run-and-docker-collaboration/</url></og><author xmlns:author="http://www.w3.org/2005/Atom"><name>Justin Mahood</name><title>Product Manager, Cloud Run</title><department></department><company></company></author><author xmlns:author="http://www.w3.org/2005/Atom"><name>Yunong Xiao</name><title>Director of Engineering, Google Cloud</title><department></department><company></company></author></item><item><title>Making it easier to scale Kafka workloads with Cloud Run worker pools</title><link>https://cloud.google.com/blog/products/serverless/exploring-cloud-run-worker-pools-and-kafka-autoscaler/</link><description>&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Apache 
Kafka is vital to many event-driven architectures and streaming data pipelines. However, effectively scaling Kafka consumers — the applications processing data from Kafka topics — can be challenging.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Today, we’re excited to discuss two capabilities that make it more efficient and cost-effective to autoscale your Kafka consumer workloads on Cloud Run: &lt;/span&gt;&lt;a href="https://cloud.google.com/run/docs/deploy-worker-pools"&gt;&lt;strong style="text-decoration: underline; vertical-align: baseline;"&gt;Cloud Run worker pools&lt;/strong&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; (in public preview), and the &lt;/span&gt;&lt;a href="https://github.com/GoogleCloudPlatform/cloud-run-kafka-scaler" rel="noopener" target="_blank"&gt;&lt;strong style="text-decoration: underline; vertical-align: baseline;"&gt;open-source Cloud Run Kafka Autoscaler&lt;/strong&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;. We announced both of these capabilities at Google Cloud Next ’25.&lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;The challenge: Scaling pull-based workloads&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Kafka consumers operate on a “pull” model, where they actively fetch data &lt;/span&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;from&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt; Kafka brokers. This architecture fundamentally differs from “push” systems, where data is sent &lt;/span&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;to&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt; consumers. Consequently, metrics such as CPU utilization or incoming HTTP request throughput are not sufficient to determine processing demand. The true indicator of workload for a Kafka consumer is “offset lag”, which is the difference between the latest message offset available in a topic partition and the last offset committed by the consumer group for that partition. &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Incorporating queue-aware metrics like offset lag (which reside in the Kafka broker) as an autoscaling input can minimize message backlogs and optimize resource utilization.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
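&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;As a rough illustration of the offset lag definition above, the following Python sketch sums lag across partitions. The function name and the sample offsets are made up for this example; in a real system both sets of offsets would come from the Kafka broker. Treating a partition with no committed offset as fully unread is an assumption of the sketch.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;

```python
def total_offset_lag(
    latest_offsets: dict[int, int],
    committed_offsets: dict[int, int],
) -> int:
    """Sum per-partition lag: latest available offset minus committed offset.

    Both arguments map a partition number to an offset. A partition with no
    committed offset is treated as fully unread (committed offset 0).
    """
    lag = 0
    for partition, latest in latest_offsets.items():
        committed = committed_offsets.get(partition, 0)
        # Clamp at zero so a stale "latest" reading never yields negative lag.
        lag += max(latest - committed, 0)
    return lag


if __name__ == "__main__":
    latest = {0: 1500, 1: 1200, 2: 900}
    committed = {0: 1400, 1: 1200, 2: 650}
    print(total_offset_lag(latest, committed))  # 100 + 0 + 250 = 350
```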
&lt;div class="block-paragraph_advanced"&gt;&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Cloud Run worker pools for pull-based workloads &lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;To solve the scaling challenge, you’ll first need an environment designed to run these pull-based workloads efficiently. This is where Cloud Run worker pools come in. They provide a purpose-built foundation for running Kafka consumers and other background processors, which was previously a challenging task on Cloud Run.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-image_full_width"&gt;






  
    &lt;div class="article-module h-c-page"&gt;
      &lt;div class="h-c-grid"&gt;
  

    &lt;figure class="article-image--large
      
      
        h-c-grid__col
        h-c-grid__col--6 h-c-grid__col--offset-3
        
        
      "
      &gt;

      
      
        
        &lt;img
            src="https://storage.googleapis.com/gweb-cloudblog-publish/images/1_-_The_three_main_Cloud_Run_resource_type.max-1000x1000.png"
        
          alt="1 - The three main Cloud Run resource types"&gt;
        
        &lt;/a&gt;
      
        &lt;figcaption class="article-image__caption "&gt;&lt;p data-block-key="xkpe6"&gt;The three main Cloud Run resource types&lt;/p&gt;&lt;/figcaption&gt;
      
    &lt;/figure&gt;

  
      &lt;/div&gt;
    &lt;/div&gt;
  




&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;While Cloud Run services are tailored for request-driven HTTP workloads and Cloud Run jobs for batch tasks that run to completion, worker pools are a distinct resource type well-suited for continuous, non-HTTP, pull-based background processing. They offer specific features that make them ideal for Kafka consumers: &lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Designed for background processing: &lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;Unlike services, worker pools don't require public HTTP endpoints. This reduces the network attack surface and simplifies application code, as you no longer need to manage ports for health checks.&lt;/span&gt;&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Gradual deployments with instance splitting: &lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;Worker pools use deployment strategies tailored for pull-based workloads. Since these workloads don't handle HTTP traffic, rollouts are managed by splitting instances between revisions, rather than splitting traffic. For example, for a worker pool with four instances, you can allocate 25% (one instance) to a new canary revision and 75% (three instances) to the current, stable revision.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Significant cost savings:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; With worker pools, we charge up to 40% less for CPU and memory, compared to instance-billed Cloud Run services.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
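&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;The instance-splitting arithmetic in the example above can be sketched in a few lines of Python. This is only an illustration: the function name is hypothetical, and the largest-remainder rounding used when percentages do not divide evenly is an assumption of this sketch, not documented Cloud Run behavior.&lt;/span&gt;&lt;/p&gt;

```python
import math


def split_instances(total: int, percents: dict[str, int]) -> dict[str, int]:
    """Divide `total` instances among revisions by percentage.

    Uses largest-remainder rounding so the counts always add up to `total`.
    """
    if sum(percents.values()) != 100:
        raise ValueError("percentages must add up to 100")
    exact = {name: total * pct / 100 for name, pct in percents.items()}
    counts = {name: math.floor(value) for name, value in exact.items()}
    leftover = total - sum(counts.values())
    # Hand out any remaining instances to the largest fractional parts first.
    by_fraction = sorted(exact, key=lambda name: exact[name] - counts[name], reverse=True)
    for name in by_fraction[:leftover]:
        counts[name] += 1
    return counts


if __name__ == "__main__":
    # The example from the text: four instances, 25% canary and 75% stable.
    print(split_instances(4, {"canary": 25, "stable": 75}))
```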
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Worker pools are available in the Google Cloud CLI (&lt;/span&gt;&lt;code style="vertical-align: baseline;"&gt;gcloud beta run worker-pools&lt;/code&gt;&lt;span style="vertical-align: baseline;"&gt;), as an &lt;/span&gt;&lt;a href="https://registry.terraform.io/providers/hashicorp/google/latest/docs/resources/cloud_run_v2_worker_pool" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;official Terraform resource&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, and in the reorganized Google Cloud console interface:&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-image_full_width"&gt;






  
    &lt;div class="article-module h-c-page"&gt;
      &lt;div class="h-c-grid"&gt;
  

    &lt;figure class="article-image--large
      
      
        h-c-grid__col
        h-c-grid__col--6 h-c-grid__col--offset-3
        
        
      "
      &gt;

      
      
        
        &lt;img
            src="https://storage.googleapis.com/gweb-cloudblog-publish/images/image1_iGowg61.max-1000x1000.png"
        
          alt="2- The Cloud Run user interface with the new worker pool resource"&gt;
        
        &lt;/a&gt;
      
        &lt;figcaption class="article-image__caption "&gt;&lt;p data-block-key="xkpe6"&gt;The Cloud Run user interface with the new worker pool resource&lt;/p&gt;&lt;/figcaption&gt;
      
    &lt;/figure&gt;

  
      &lt;/div&gt;
    &lt;/div&gt;
  




&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Queue-aware autoscaling with Kafka Autoscaler &lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;While worker pools provide the right environment, you still need a mechanism to scale based on offset lag. The open-source Cloud Run Kafka Autoscaler is a tool you deploy that works with worker pools (or instance-billed services) to dynamically adjust consumer instances based on real-time demand.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;It’s important to note that this is not a managed Google Cloud platform feature – it is an open-source tool that you control and deploy in your own project.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Key benefits: &lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Scaling based on actual Kafka metrics: &lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;The autoscaler connects directly to your Kafka cluster to monitor the total offset lag across partitions in your consumer group, and can also factor in consumer CPU utilization.&lt;/span&gt;&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Automatically scales consumers down to zero&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;: This eliminates costs during idle periods.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;&lt;span style="vertical-align: baseline;"&gt;&lt;strong style="vertical-align: baseline;"&gt;Cost-effective: &lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;Deployed as a request-billed Cloud Run service, the autoscaler itself is very cheap to run (less than $1 per month), since it is only active for brief periods during scaling checks. &lt;/span&gt;&lt;/span&gt;&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;&lt;span style="vertical-align: baseline;"&gt;&lt;strong style="vertical-align: baseline;"&gt;Fine-grained and configurable scaling behavior: &lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;The autoscaler offers precise control over scaling policies, similar to the Kubernetes Horizontal Pod Autoscaler (HPA), allowing you to tailor the scaling behavior to meet your specific cost and performance goals. It provides several configurable levers, including:&lt;/span&gt;&lt;/span&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;ul&gt;
&lt;li style="list-style-type: none;"&gt;
&lt;ul style="list-style-type: circle;"&gt;
&lt;li&gt;&lt;span style="vertical-align: baseline;"&gt;&lt;span style="vertical-align: baseline;"&gt;&lt;span style="vertical-align: baseline;"&gt;Target lag and CPU utilization thresholds&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/li&gt;
&lt;li&gt;&lt;span style="vertical-align: baseline;"&gt;&lt;span style="vertical-align: baseline;"&gt;&lt;span style="vertical-align: baseline;"&gt;&lt;span style="vertical-align: baseline;"&gt;A stabilization window to prevent rapid fluctuations in instance counts&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/li&gt;
&lt;li&gt;&lt;span style="vertical-align: baseline;"&gt;&lt;span style="vertical-align: baseline;"&gt;&lt;span style="vertical-align: baseline;"&gt;&lt;span style="vertical-align: baseline;"&gt;&lt;span style="vertical-align: baseline;"&gt;Scaling increment/decrement limits to control the rate at which instances are added or removed in a single scaling action&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
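&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;To make these levers concrete, here is a minimal Python sketch of how a target lag, a stabilization window, and step limits could combine into one scaling decision. It is an HPA-style illustration under stated assumptions, not the autoscaler’s actual algorithm: the &lt;/span&gt;&lt;code style="vertical-align: baseline;"&gt;ScalingPolicy&lt;/code&gt;&lt;span style="vertical-align: baseline;"&gt; class, its field names, and the choice to act on the highest recent recommendation are all hypothetical.&lt;/span&gt;&lt;/p&gt;

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class ScalingPolicy:
    """Hypothetical policy; field names are illustrative, not the tool's config keys."""
    target_lag_per_instance: int  # offset lag each consumer instance should handle
    max_scale_up_step: int        # most instances added in one scaling action
    max_scale_down_step: int      # most instances removed in one scaling action
    min_instances: int = 0
    max_instances: int = 100


def recommend(total_lag: int, policy: ScalingPolicy) -> int:
    """Raw recommendation: enough instances to bring per-instance lag to the target."""
    return math.ceil(total_lag / policy.target_lag_per_instance)


def next_instance_count(current: int, recent_recommendations: list[int],
                        policy: ScalingPolicy) -> int:
    """Apply the stabilization window, then step limits, then absolute bounds."""
    # Assumed stabilization rule: act on the highest recommendation seen in the
    # window, so a brief dip in lag does not trigger an immediate scale-down.
    desired = max(recent_recommendations)
    # Limit how far a single scaling action may move the instance count.
    upper = current + policy.max_scale_up_step
    lower = current - policy.max_scale_down_step
    stepped = max(lower, min(upper, desired))
    return max(policy.min_instances, min(policy.max_instances, stepped))
```

&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;With a target of 1,000 messages per instance and a step limit of 2, a lag of 4,500 yields a raw recommendation of 5 instances, but a pool currently at 1 instance moves only to 3 in a single action.&lt;/span&gt;&lt;/p&gt;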
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;&lt;span style="vertical-align: baseline;"&gt;&lt;span style="vertical-align: baseline;"&gt;&lt;span style="vertical-align: baseline;"&gt;&lt;span style="vertical-align: baseline;"&gt;&lt;span style="vertical-align: baseline;"&gt;For a complete list of configuration options, please refer to the &lt;/span&gt;&lt;a href="https://github.com/GoogleCloudPlatform/cloud-run-kafka-scaler?tab=readme-ov-file#setting-up-and-deploying-the-autoscaler" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;project documentation&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;.&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-image_full_width"&gt;






  
    &lt;div class="article-module h-c-page"&gt;
      &lt;div class="h-c-grid"&gt;
  

    &lt;figure class="article-image--large
      
      
        h-c-grid__col
        h-c-grid__col--6 h-c-grid__col--offset-3
        
        
      "
      &gt;

      
      
        
        &lt;img
            src="https://storage.googleapis.com/gweb-cloudblog-publish/images/3_-_Cloud_Run_Kafka_Autoscaler_architectur.max-1000x1000.png"
        
          alt="3 - Cloud Run Kafka Autoscaler architecture diagram"&gt;
        
      
        &lt;figcaption class="article-image__caption "&gt;&lt;p data-block-key="xkpe6"&gt;Cloud Run Kafka Autoscaler architecture diagram&lt;/p&gt;&lt;/figcaption&gt;
      
    &lt;/figure&gt;

  
      &lt;/div&gt;
    &lt;/div&gt;
  




&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Here’s how it works:&lt;/span&gt;&lt;/p&gt;
&lt;ol&gt;
&lt;li role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Perform autoscaling check: &lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;Cloud Scheduler periodically triggers the autoscaler to initiate a scaling evaluation.&lt;/span&gt;&lt;/li&gt;
&lt;li role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Read Kafka offset lag: &lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;Once triggered, the autoscaler connects to the Kafka cluster to read offset lag, and (optionally) to Cloud Monitoring for the consumer’s CPU utilization.&lt;/span&gt;&lt;/li&gt;
&lt;li role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Make scaling decision and actuate: &lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;Based on the collected metrics and user-defined scaling policies, the autoscaler computes the optimal number of consumer instances and uses Cloud Run’s &lt;/span&gt;&lt;a href="https://cloud.google.com/run/docs/configuring/services/manual-scaling"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;manual scaling API&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; to dynamically adjust the instance count &lt;/span&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;without a new deployment.&lt;/span&gt;&lt;/li&gt;
&lt;/ol&gt;
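&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;The steps above can be sketched as a single evaluation function. This is an illustrative outline only: the three callables are hypothetical stand-ins for a Kafka admin client, a Cloud Monitoring query, and a Cloud Run manual scaling call, and combining the lag and CPU signals by taking the larger recommendation is an assumption borrowed from how the Kubernetes HPA treats multiple metrics.&lt;/span&gt;&lt;/p&gt;

```python
import math
from typing import Callable, Optional


def run_scaling_check(
    read_total_lag: Callable[[], int],
    read_instance_count: Callable[[], int],
    set_instance_count: Callable[[int], None],
    target_lag_per_instance: int,
    max_instances: int,
    read_cpu_utilization: Optional[Callable[[], float]] = None,
    target_cpu_utilization: float = 0.6,
) -> int:
    """Run one scaling evaluation and return the instance count it settled on."""
    # Step 2: read offset lag and, optionally, consumer CPU utilization.
    total_lag = read_total_lag()
    current = read_instance_count()
    desired = math.ceil(total_lag / target_lag_per_instance)
    if read_cpu_utilization is not None:
        # Assumption: take the larger of the lag-based and CPU-based recommendations.
        cpu_based = math.ceil(current * read_cpu_utilization() / target_cpu_utilization)
        desired = max(desired, cpu_based)
    desired = min(max_instances, desired)
    # Step 3: actuate only when the instance count actually needs to change.
    if desired != current:
        set_instance_count(desired)
    return desired
```

&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Injecting the readers and the actuator keeps the decision logic testable without a Kafka cluster; in a real deployment, step 1 would be the Cloud Scheduler trigger invoking this function through an HTTP handler.&lt;/span&gt;&lt;/p&gt;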
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Generalizing the pattern&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;The core architectural pattern of the Kafka autoscaler is simple: a Cloud Run service is periodically triggered to read custom metrics and adjust instance counts. This flexible model can be adapted for any pull-based workload, allowing you to scale your Cloud Run worker pools based on the metrics that matter most to your application.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;If your application consumes from a different message queue or requires scaling based on your business metrics, you can build a similar dedicated autoscaler. Here are a few examples:&lt;/span&gt;&lt;/p&gt;
&lt;ol&gt;
&lt;li aria-level="1" style="list-style-type: decimal; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Autoscaling self-hosted GitHub runners: &lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;Dynamically scale your pool of self-hosted runners based on the number of pending jobs in your CI/CD queue. This helps your builds start promptly while minimizing costs by scaling down, even to zero, when runners are idle.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: decimal; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Scaling on custom Prometheus metrics:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Scale your worker pools based on any custom business metric you already expose in Prometheus, such as the number of items in a processing queue or active user sessions. This allows you to tie your infrastructure costs directly to real-time application demand.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: decimal; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Processing a Pub/Sub backlog:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Adjust your number of workers based on the number of undelivered messages in a Pub/Sub subscription. This ensures timely message processing during traffic spikes, and saves money during quiet periods.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
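&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;One way to structure such an autoscaler is to hide the metric behind a small interface, so the same sizing logic serves pending CI jobs, a Prometheus gauge, or a Pub/Sub backlog. The Python sketch below is a hypothetical design, not code from the Kafka Autoscaler: &lt;/span&gt;&lt;code style="vertical-align: baseline;"&gt;BacklogSource&lt;/code&gt;&lt;span style="vertical-align: baseline;"&gt;, &lt;/span&gt;&lt;code style="vertical-align: baseline;"&gt;StaticBacklog&lt;/code&gt;&lt;span style="vertical-align: baseline;"&gt;, and &lt;/span&gt;&lt;code style="vertical-align: baseline;"&gt;workers_for&lt;/code&gt;&lt;span style="vertical-align: baseline;"&gt; are names invented for illustration, and a real source would query the relevant API instead of returning a fixed number.&lt;/span&gt;&lt;/p&gt;

```python
import math
from typing import Protocol


class BacklogSource(Protocol):
    """Anything that can report how much work is waiting."""

    def pending_work(self) -> int: ...


class StaticBacklog:
    """Stand-in source for the example; a real one would call a queue or metrics API."""

    def __init__(self, pending: int) -> None:
        self._pending = pending

    def pending_work(self) -> int:
        return self._pending


def workers_for(source: BacklogSource, work_per_worker: int, max_workers: int) -> int:
    """Size the pool so each worker handles at most work_per_worker pending items."""
    needed = math.ceil(source.pending_work() / work_per_worker)
    return min(max_workers, needed)


# Example: 7 pending CI jobs, 2 jobs per runner, at most 10 runners.
runners = workers_for(StaticBacklog(7), work_per_worker=2, max_workers=10)
```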
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Cloud Run worker pools and the Kafka Autoscaler bring a new level of flexibility and ease of use to running Kafka consumers, and we’re excited to see what you do with them. To learn more and get started: &lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;Try out the open-source Cloud Run Kafka Autoscaler:&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;ul&gt;
&lt;li aria-level="2" style="list-style-type: circle; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;a href="https://github.com/GoogleCloudPlatform/cloud-run-kafka-scaler" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;https://github.com/GoogleCloudPlatform/cloud-run-kafka-scaler&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; (&lt;/span&gt;&lt;a href="https://github.com/GoogleCloudPlatform/cloud-run-kafka-scaler/tree/main/terraform" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Terraform module&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;)&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;Learn more about Cloud Run worker pools (&lt;/span&gt;&lt;a href="https://cloud.google.com/run/docs/deploy-worker-pools"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;documentation&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;) &lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;For feedback/questions on the autoscaler, please reach out to &lt;/span&gt;&lt;a href="mailto:run-oss-autoscaler-feedback@google.com"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;run-oss-autoscaler-feedback@google.com&lt;/span&gt;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;If you are looking for a managed service for Apache Kafka, Google Cloud also offers a &lt;/span&gt;&lt;a href="https://cloud.google.com/products/managed-service-for-apache-kafka/"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Managed Service for Apache Kafka&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; with automated cluster management, Kafka Connect, and schema registry (in Preview) with built-in Google Cloud monitoring, logging, and IAM for simplified operations.&lt;/span&gt;&lt;/p&gt;
&lt;hr/&gt;
&lt;p&gt;&lt;sup&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;We would like to thank the Google Cloud team members who helped with this blog post: Andrew Manalo (Software Engineer, Serverless Scaling), Sagar Randive (Product Manager, Serverless) and Matt Larkin (Product Manager, Serverless)&lt;/span&gt;&lt;/sup&gt;&lt;/p&gt;&lt;/div&gt;</description><pubDate>Thu, 26 Jun 2025 16:00:00 +0000</pubDate><guid>https://cloud.google.com/blog/products/serverless/exploring-cloud-run-worker-pools-and-kafka-autoscaler/</guid><category>Streaming</category><category>Serverless</category><og xmlns:og="http://ogp.me/ns#"><type>article</type><title>Making it easier to scale Kafka workloads with Cloud Run worker pools</title><description></description><site_name>Google</site_name><url>https://cloud.google.com/blog/products/serverless/exploring-cloud-run-worker-pools-and-kafka-autoscaler/</url></og><author xmlns:author="http://www.w3.org/2005/Atom"><name>Aniruddh Chaturvedi</name><title>Engineering Manager</title><department></department><company></company></author><author xmlns:author="http://www.w3.org/2005/Atom"><name>Adam Kane</name><title>Senior Engineering Manager</title><department></department><company></company></author></item><item><title>Cloud Run GPUs, now GA, makes running AI workloads easier for everyone</title><link>https://cloud.google.com/blog/products/serverless/cloud-run-gpus-are-now-generally-available/</link><description>&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Developers love &lt;/span&gt;&lt;a href="https://cloud.google.com/run"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Cloud Run&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;, Google Cloud’s serverless runtime, &lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;for its simplicity, flexibility, and scalability. 
And today, we’re thrilled to announce that NVIDIA GPU support for Cloud Run is now generally available, offering a powerful runtime for a variety of use cases that’s also remarkably cost-efficient. &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Now, you can enjoy the following benefits across both GPUs and CPUs:&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Pay-per-second billing&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;: You are only charged for the GPU resources you consume, down to the second.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Scale to zero&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;: Cloud Run automatically scales your GPU instances down to zero when no requests are received, eliminating idle costs. This is a game-changer for sporadic or unpredictable workloads.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Rapid startup and scaling&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;: Go from zero to an instance with a GPU and drivers installed in under 5 seconds, allowing your applications to respond to demand very quickly. For example, when scaling from zero (cold start), we achieved a Time-to-First-Token of approximately 19 seconds for a gemma3:4b model (this includes startup time, model loading time, and running the inference).&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Full streaming support&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;: Build truly interactive applications with out-of-the box support for HTTP and WebSocket streaming, allowing you to provide LLM responses to your users as they are generated.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Support for GPUs in Cloud Run is a significant milestone, underscoring our leadership in making GPU-accelerated applications simpler, faster, and more cost-effective than ever before.&lt;/span&gt;&lt;/p&gt;
&lt;p style="padding-left: 40px;"&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;“Serverless GPU acceleration represents a major advancement in making cutting-edge AI computing more accessible. With seamless access to NVIDIA L4 GPUs, developers can now bring AI applications to production faster and more cost-effectively than ever before.” &lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;- Dave Salvator, director of accelerated computing products, NVIDIA&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;AI inference for everyone&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;One of the most exciting aspects of this GA release is that Cloud Run GPUs are now available to everyone for NVIDIA L4 GPUs, with &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;no quota request required&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;. This removes a significant barrier to entry, allowing you to immediately tap into GPU acceleration for your Cloud Run services. Simply use &lt;/span&gt;&lt;code style="vertical-align: baseline;"&gt;--gpu 1&lt;/code&gt;&lt;span style="vertical-align: baseline;"&gt; from the Cloud Run command line, or check the "GPU" checkbox in the console:&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-image_full_width"&gt;






  
    &lt;div class="article-module h-c-page"&gt;
      &lt;div class="h-c-grid"&gt;
  

    &lt;figure class="article-image--large
      
      
        h-c-grid__col
        h-c-grid__col--6 h-c-grid__col--offset-3
        
        
      "
      &gt;

      
      
        
        &lt;img
            src="https://storage.googleapis.com/gweb-cloudblog-publish/images/1_XkZEV9U.max-1000x1000.png"
        
          alt="1"&gt;
        
      
    &lt;/figure&gt;

  
      &lt;/div&gt;
    &lt;/div&gt;
  




&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Production-ready&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;With general availability, Cloud Run with GPU support is now covered by Cloud Run's &lt;/span&gt;&lt;a href="https://cloud.google.com/run/sla"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Service Level Agreement (SLA)&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, providing you with assurances for reliability and uptime. By default, Cloud Run offers &lt;/span&gt;&lt;a href="https://cloud.google.com/run/docs/zonal-redundancy"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;zonal redundancy&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, helping to ensure enough capacity for your service to be resilient to a zonal outage; this also applies to Cloud Run with GPUs. Alternatively, you can turn off zonal redundancy and benefit from a &lt;/span&gt;&lt;a href="https://cloud.google.com/run/pricing"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;lower price&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; for best-effort failover of your GPU workloads in case of a zonal outage.&lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Multi-regional GPUs&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;To support global applications, Cloud Run GPUs are available in five Google Cloud &lt;/span&gt;&lt;a href="https://cloud.google.com/run/docs/locations#gpu"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;regions&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;: us-central1 (Iowa, USA), europe-west1 (Belgium), europe-west4 (Netherlands), asia-southeast1 (Singapore), and asia-south1 (Mumbai, India), with more to come.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Cloud Run also &lt;/span&gt;&lt;a href="https://cloud.google.com/run/docs/multiple-regions"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;simplifies deploying your services across multiple regions&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;. For instance, you can deploy a service across the US, Europe and Asia with a single command, providing global users with lower latency and higher availability. Here’s how to deploy &lt;/span&gt;&lt;a href="https://ollama.com/" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Ollama&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, one of the easiest ways to run open models, on Cloud Run across three regions:&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-code"&gt;&lt;dl&gt;
    &lt;dt&gt;code_block&lt;/dt&gt;
    &lt;dd&gt;&lt;pre&gt;&lt;code&gt;gcloud run deploy my-global-service \
  --image ollama/ollama --port 11434 \
  --gpu 1 \
  --regions us-central1,europe-west1,asia-southeast1&lt;/code&gt;&lt;/pre&gt;&lt;/dd&gt;
&lt;/dl&gt;&lt;/div&gt;
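&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Once a service like this is running, a client can read the model’s output as it is generated. The following Python sketch assumes Ollama’s &lt;/span&gt;&lt;code style="vertical-align: baseline;"&gt;/api/generate&lt;/code&gt;&lt;span style="vertical-align: baseline;"&gt; endpoint, which streams newline-delimited JSON objects carrying a &lt;/span&gt;&lt;code style="vertical-align: baseline;"&gt;response&lt;/code&gt;&lt;span style="vertical-align: baseline;"&gt; fragment and a &lt;/span&gt;&lt;code style="vertical-align: baseline;"&gt;done&lt;/code&gt;&lt;span style="vertical-align: baseline;"&gt; flag. The service URL in the usage comment is a placeholder, the model must already be available to the server, and authentication is omitted, so treat it as a starting point and not a production client.&lt;/span&gt;&lt;/p&gt;

```python
import json
import urllib.request
from typing import Iterable, Iterator


def iter_tokens(lines: Iterable[bytes]) -> Iterator[str]:
    """Yield text fragments from Ollama-style NDJSON lines until done is true."""
    for raw in lines:
        if not raw.strip():
            continue  # skip blank keep-alive lines
        chunk = json.loads(raw)
        if chunk.get("response"):
            yield chunk["response"]
        if chunk.get("done"):
            break


def stream_generate(base_url: str, model: str, prompt: str) -> Iterator[str]:
    """POST a prompt and yield fragments as the server streams them back."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": True}).encode()
    request = urllib.request.Request(
        f"{base_url}/api/generate",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    # The HTTP response object iterates line by line, one JSON object per line.
    with urllib.request.urlopen(request) as response:
        yield from iter_tokens(response)


# Usage (placeholder URL; replace with your own service address):
# for token in stream_generate("https://SERVICE_URL", "gemma3:4b", "Why is the sky blue?"):
#     print(token, end="", flush=True)
```
&lt;/div&gt;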
&lt;div class="block-paragraph_advanced"&gt;&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;See it in action: 0 to 100 NVIDIA GPUs in four minutes&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;You can witness the incredible scalability of Cloud Run with GPUs for yourself with &lt;/span&gt;&lt;a href="https://youtu.be/PWPvX25R6dM?feature=shared&amp;amp;t=2140" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;this live demo&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; from Google Cloud Next 25, showcasing how we scaled from 0 to 100 GPUs in just four minutes.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-image_full_width"&gt;






  
    &lt;div class="article-module h-c-page"&gt;
      &lt;div class="h-c-grid"&gt;
  

    &lt;figure class="article-image--large
      
      
        h-c-grid__col
        h-c-grid__col--6 h-c-grid__col--offset-3
        
        
      "
      &gt;

      
      
        
        &lt;img
            src="https://storage.googleapis.com/gweb-cloudblog-publish/images/2_SrvmWli.max-1000x1000.png"
        
          alt="Load test of a Stable Diffusion service on Cloud Run scaling to 100 GPU instances in four minutes"&gt;
        
      
        &lt;figcaption class="article-image__caption "&gt;&lt;p data-block-key="lqcap"&gt;Load testing a Stable Diffusion service running on Cloud Run GPUs to 100 GPU instances in four minutes.&lt;/p&gt;&lt;/figcaption&gt;
      
    &lt;/figure&gt;

  
      &lt;/div&gt;
    &lt;/div&gt;
  




&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Unlock new use cases with NVIDIA GPUs on Cloud Run jobs&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;The power of Cloud Run with GPUs isn't just for real-time inference using request-driven Cloud Run services. We're also excited to announce the availability of GPUs on &lt;/span&gt;&lt;a href="https://cloud.google.com/run/docs/overview/what-is-cloud-run#cloud-run-jobs"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Cloud Run jobs&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, unlocking new use cases, particularly for batch processing and asynchronous tasks:&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Model fine-tuning&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;: Easily fine-tune a pre-trained model on specific datasets without having to manage the underlying infrastructure. Spin up a GPU-powered job, process your data, and scale down to zero when it’s complete.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Batch AI inferencing&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;: Run large-scale batch inference tasks efficiently. Whether you're analyzing images, processing natural language, or generating recommendations, Cloud Run jobs with GPUs can handle the load.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Batch media processing&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;: Transcode videos, generate thumbnails, or perform complex image manipulations at scale.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;&lt;a href="https://docs.google.com/forms/d/e/1FAIpQLSe_-u-ZSxVLhRMZ3p4ZSk2CkgL_URKqNgyM8rfMGUrTbpqYJQ/viewform?usp=dialog" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Sign up&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; for the private preview of GPUs on Cloud Run jobs.&lt;/span&gt;&lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;What Cloud Run customers are saying&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Don't just take our word for it. Here's what some early adopters of Cloud Run GPUs are saying:&lt;/span&gt;&lt;/p&gt;
&lt;p style="padding-left: 40px;"&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;"Cloud Run helps vivo quickly iterate AI applications and greatly reduces our operation and maintenance costs. The automatically scalable GPU service also greatly improves the efficiency of our AI going overseas."&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt; - Guangchao Li, AI Architect, vivo&lt;/span&gt;&lt;/p&gt;
&lt;p style="padding-left: 40px;"&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;"L4 GPUs offer really strong performance at a reasonable cost profile. Combined with the fast auto scaling, we were really able to optimize our costs and saw an 85% reduction in cost. We've been very excited about the availability of GPUs on Cloud Run."&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt; - John Gill at &lt;/span&gt;&lt;a href="https://youtu.be/PWPvX25R6dM?feature=shared&amp;amp;t=2496" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Next'25&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, Sr. Software Engineer, Wayfair&lt;/span&gt;&lt;/p&gt;
&lt;p style="padding-left: 40px;"&gt;&lt;span style="vertical-align: baseline;"&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;"At Midjourney, we have found Cloud Run GPUs to be incredibly valuable for our image processing tasks. Cloud Run has a simple developer experience that lets us focus more on innovation and less on infrastructure management. Cloud Run GPU’s scalability also lets us easily analyze and process millions of images.&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;" - Sam Schickler, Data Team Lead, Midjourney&lt;/span&gt;&lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Get started today&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Cloud Run with GPU is ready to power your next generation of applications. Dive into the &lt;/span&gt;&lt;a href="https://cloud.google.com/run/docs/configuring/services/gpu"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;documentation&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, explore our &lt;/span&gt;&lt;a href="https://cloud.google.com/run/docs/tutorials/gpu-gemma-with-ollama"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;quickstarts&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, and review our &lt;/span&gt;&lt;a href="https://cloud.google.com/run/docs/configuring/services/gpu-best-practices"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;best practices for optimizing model loading&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;. We can't wait to see what you build!&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;</description><pubDate>Mon, 02 Jun 2025 16:00:00 +0000</pubDate><guid>https://cloud.google.com/blog/products/serverless/cloud-run-gpus-are-now-generally-available/</guid><category>Application Modernization</category><category>Compute</category><category>Serverless</category><media:content height="540" url="https://storage.googleapis.com/gweb-cloudblog-publish/images/hero_BWYOvBU.max-600x600.png" width="540"></media:content><og xmlns:og="http://ogp.me/ns#"><type>article</type><title>Cloud Run GPUs, now GA, makes running AI workloads easier for everyone</title><description></description><image>https://storage.googleapis.com/gweb-cloudblog-publish/images/hero_BWYOvBU.max-600x600.png</image><site_name>Google</site_name><url>https://cloud.google.com/blog/products/serverless/cloud-run-gpus-are-now-generally-available/</url></og><author xmlns:author="http://www.w3.org/2005/Atom"><name>Steren Giannini</name><title>Director, Product 
Management</title><department></department><company></company></author><author xmlns:author="http://www.w3.org/2005/Atom"><name>Yunong Xiao</name><title>Director of Engineering, Google Cloud</title><department></department><company></company></author></item><item><title>Flipping out: Modernizing a classic pinball machine with cloud connectivity</title><link>https://cloud.google.com/blog/products/application-modernization/connecting-a-pinball-machine-to-the-cloud/</link><description>&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;In today's cloud-centric world, we often take for granted the ease with which we can integrate our applications with a vast array of powerful cloud services. However, there are still countless legacy systems and other constrained environments where integration is far from straightforward.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;We faced this challenge head-on when building &lt;/span&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;Backlogged Pinball&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;, a custom pinball game that we built as a demo for integrating cloud services in uncommon environments. Backlogged Pinball is a physical pinball machine that connects to the cloud for a variety of services, such as tracking data about current and completed games and updating leaderboards. To build it, we started from a &lt;/span&gt;&lt;a href="https://www.multimorphic.com/p3-pinball-platform/" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;commercially available programmable pinball machine&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; so we could focus on game code and cloud integration. However, the machine's software environment was limited, running on a sandboxed version of .NET Framework 3.5, which was first released 17 years ago. Practically, this meant that we couldn't use any of the &lt;/span&gt;&lt;a href="https://cloud.google.com/dotnet/docs/reference"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;modern Google Cloud SDKs available for C#&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, and we couldn’t install tools like &lt;/span&gt;&lt;code style="vertical-align: baseline;"&gt;gcloud&lt;/code&gt;&lt;span style="vertical-align: baseline;"&gt; to help communicate with the cloud.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-image_full_width"&gt;






  
    &lt;div class="article-module h-c-page"&gt;
      &lt;div class="h-c-grid"&gt;
  

    &lt;figure class="article-image--large
      
      
        h-c-grid__col
        h-c-grid__col--6 h-c-grid__col--offset-3
        
        
      "
      &gt;

      
      
        
        &lt;img
            src="https://storage.googleapis.com/gweb-cloudblog-publish/images/1_M6pQAZF.max-1000x1000.png"
        
          alt="1"&gt;
        
        &lt;/a&gt;
      
    &lt;/figure&gt;

  
      &lt;/div&gt;
    &lt;/div&gt;
  




&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;h3&gt;&lt;span style="vertical-align: baseline;"&gt;There’s a catch&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span&gt;&lt;span style="vertical-align: baseline;"&gt;We knew we wanted to take advantage of the cloud for databases (for high scores, and stats from the game), logging (of game events and results), and a custom service (to change the game experience on the fly). But developing software for such a constrained environment presented a variety of challenges, which might be familiar to you:&lt;/span&gt;&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Minimal library support: &lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;If you have full control over your stack, there’s no shortage of great libraries to help you connect to cloud services. But sometimes you don’t get to pick where your software runs. For our pinball machine, it was difficult to find compatible libraries to integrate with the cloud services we wanted. For example, we knew we wanted to insert records into a &lt;/span&gt;&lt;a href="https://firebase.google.com/docs/firestore" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Firestore&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; database to drive a real-time visualization of everything going on in the game. Firestore has &lt;/span&gt;&lt;a href="https://firebase.google.com/docs/firestore/client/libraries" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;great SDKs&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, but they couldn’t support anything before .NET Framework 4.6.2 (which is 8 years old). We might have been able to connect to a traditional relational database using a TCP connection, but we didn’t want to be limited in the cloud tools and services we could use. Needless to say, it’s much less practical to build a real-time web application with MySQL rather than Firestore, which is designed from the ground up to push data to the browser in real time.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Difficult deployment process:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Maybe you have other limitations that make updating your on-device software difficult, but you still want to add new features and cloud integrations. As third-party developers, we had to manually install each version of our game during development using a USB stick. This kind of limitation slows down the rate at which you can test, deploy and ship new versions of your code, which is never good. It’s much easier to add new features in a modern, flexible cloud environment.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Fundamentally, we found it challenging to use modern cloud services from a constrained legacy environment.&lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;span style="vertical-align: baseline;"&gt;Flipper-ing the script&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;At first glance, there was no practical way to integrate all the services we wanted with the code that would run on the pinball machine. But what if there was another way? What if we turned the pinball machine itself into a service, and gave it a single minimal integration? Then we could have it send a message every time something happened in the game and sort out the results in a modern cloud environment.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;We decided that Pub/Sub would be an excellent way to achieve this goal. It provided a way to get information to (and from!) the cloud with a single interface, with minimal complexity. It was just a basic HTTP POST of whatever message format we wanted.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-image_full_width"&gt;






  
    &lt;div class="article-module h-c-page"&gt;
      &lt;div class="h-c-grid"&gt;
  

    &lt;figure class="article-image--large
      
      
        h-c-grid__col
        h-c-grid__col--6 h-c-grid__col--offset-3
        
        
      "
      &gt;

      
      
        
        &lt;img
            src="https://storage.googleapis.com/gweb-cloudblog-publish/images/2_PNsvw6c.max-1000x1000.jpg"
        
          alt="2"&gt;
        
        &lt;/a&gt;
      
    &lt;/figure&gt;

  
      &lt;/div&gt;
    &lt;/div&gt;
  




&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;To achieve this, we designed a custom Pub/Sub messaging system. We wrote our own lightweight Pub/Sub library for the pinball machine to handle authentication and message sending over the &lt;/span&gt;&lt;a href="https://cloud.google.com/pubsub/docs/reference/rest"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;REST API&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, making it incredibly easy to post events whenever a player launched a ball, hit a target, or even pressed a flipper button. Check out a simplified version of that code &lt;/span&gt;&lt;a href="https://github.com/GoogleCloudPlatform/backlogged-pinball-backend/blob/0e9ae489d4503951f4918d8c590184de1c4657e8/sample-code/csharp-pubsub/pubsub-post.cs" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;on GitHub&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;!&lt;/span&gt;&lt;/p&gt;
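&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;To make the "basic HTTP POST" concrete, here is a minimal Python sketch of how a publish request for the Pub/Sub REST API can be assembled. It is an illustration only, not the C# code that ran on the machine: the function name, project ID, topic ID, event fields, and attribute are all hypothetical. The one firm requirement it shows is that the message payload is sent base64-encoded inside a "messages" list. A real call also needs an OAuth 2.0 access token in an Authorization header, which is omitted here.&lt;/span&gt;&lt;/p&gt;

```python
import base64
import json

PUBSUB_ENDPOINT = "https://pubsub.googleapis.com/v1"


def build_publish_request(project_id, topic_id, event):
    """Return (url, body) for a Pub/Sub REST publish call carrying one event."""
    url = f"{PUBSUB_ENDPOINT}/projects/{project_id}/topics/{topic_id}:publish"
    # Pub/Sub expects the message payload as a base64-encoded string.
    payload = json.dumps(event, sort_keys=True).encode("utf-8")
    body = {
        "messages": [
            {
                "data": base64.b64encode(payload).decode("ascii"),
                # Attributes are optional string key/value pairs (hypothetical here).
                "attributes": {"source": "pinball-machine"},
            }
        ]
    }
    return url, json.dumps(body)


# Hypothetical project, topic, and event, used only for illustration.
url, body = build_publish_request(
    "my-project", "pinball-events", {"type": "target_hit", "points": 500}
)
print(url)
print(body)
```

&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Keeping the request builder separate from the code that sends it means the only part that has to run in the constrained environment is a single authenticated POST.&lt;/span&gt;&lt;/p&gt;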
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;On the cloud side, our team used multiple Cloud Run&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt; subscribers to process these events in real time. We also used Firestore to store data and drive visualizations.&lt;/span&gt;&lt;/p&gt;
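&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;As a rough sketch of the cloud side, the snippet below decodes the JSON envelope that a Pub/Sub push subscription delivers to an HTTP endpoint such as a Cloud Run service. It assumes push delivery and a JSON event payload; the function name, event fields, and subscription path are hypothetical, and the web framework that would receive the request is left out.&lt;/span&gt;&lt;/p&gt;

```python
import base64
import json


def decode_push_envelope(envelope):
    """Extract the event dict and attributes from a Pub/Sub push request body."""
    message = envelope["message"]
    # The payload arrives base64-encoded in the "data" field.
    raw = base64.b64decode(message.get("data", ""))
    event = json.loads(raw) if raw else {}
    return event, message.get("attributes", {})


# A hand-built envelope in the shape used by push subscriptions (values are made up).
example_envelope = {
    "message": {
        "data": base64.b64encode(b'{"type": "ball_launched"}').decode("ascii"),
        "attributes": {"source": "pinball-machine"},
        "messageId": "1",
    },
    "subscription": "projects/my-project/subscriptions/pinball-events-sub",
}

event, attributes = decode_push_envelope(example_envelope)
print(event["type"], attributes["source"])
```

&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;From here, a subscriber is free to use any modern client library, for example to write the event to Firestore.&lt;/span&gt;&lt;/p&gt;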
&lt;h3&gt;&lt;span style="vertical-align: baseline;"&gt;Jackpot! Cloud advantages&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Pushing the complexity of integration into the cloud brought numerous advantages:&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Single interface: &lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;Writing our own Pub/Sub client was no small task (authentication alone could be its own blog post!). But once it was working, the machine-side integration was done for good, and we could focus on processing all the events in the cloud using whatever modern client libraries and tools we wanted.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Real-time updates:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; At Google Cloud Next, we helped users write their own Cloud Run services to receive pinball events, process them, and send messages back to the machine. Building and deploying these services took less than a minute, which meant you could conceivably change the game while a friend was playing it!&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Rich data insights:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; We ended up with a fine-grained log of everything that happened in a game. This proved very helpful in troubleshooting issues during development and fine-tuning scoring based on playtesting.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-image_full_width"&gt;






  
    &lt;div class="article-module h-c-page"&gt;
      &lt;div class="h-c-grid"&gt;
  

    &lt;figure class="article-image--large
      
      
        h-c-grid__col
        h-c-grid__col--6 h-c-grid__col--offset-3
        
        
      "
      &gt;

      
      
        
        &lt;img
            src="https://storage.googleapis.com/gweb-cloudblog-publish/original_images/3_9F1v5xM.gif"
        
          alt="3"&gt;
        
        &lt;/a&gt;
      
    &lt;/figure&gt;

  
      &lt;/div&gt;
    &lt;/div&gt;
  




&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;h3&gt;&lt;span style="vertical-align: baseline;"&gt;Plunging forward&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;We’re already planning the next iteration of Backlogged Pinball with features we hadn’t originally considered. For example, we’re adding AI-powered game analysis and advice based on the player’s style. Thanks to this flexible cloud-based architecture, almost all the work will be in a modern cloud environment rather than fighting with dependencies on a legacy system. And the lessons we learned from this project are broadly applicable to any constrained environment. Whether it's an embedded system, an IoT device, or an old server running legacy software, by &lt;/span&gt;&lt;a href="https://cloud.google.com/run/docs/tutorials/pubsub"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;leveraging Pub/Sub messaging&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; and adopting a cloud-first mindset, you can break free from the limitations of your environment and unlock the full potential of the cloud.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;We’ll be showing off the latest Backlogged Pinball at &lt;/span&gt;&lt;a href="https://events.linuxfoundation.org/kubecon-cloudnativecon-north-america/" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;KubeCon North America&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; in November 2024. If you’re there, stop by to check it out!&lt;/span&gt;&lt;/p&gt;
&lt;hr/&gt;
&lt;p&gt;&lt;sup&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;Special thanks to Mofi Rahman, Google Cloud Advocate, for his contributions to this project and this post.&lt;/span&gt;&lt;/sup&gt;&lt;/p&gt;&lt;/div&gt;</description><pubDate>Mon, 04 Nov 2024 17:00:00 +0000</pubDate><guid>https://cloud.google.com/blog/products/application-modernization/connecting-a-pinball-machine-to-the-cloud/</guid><category>Developers &amp; Practitioners</category><category>Serverless</category><category>Application Modernization</category><media:content height="540" url="https://storage.googleapis.com/gweb-cloudblog-publish/images/pinball.max-600x600.jpg" width="540"></media:content><og xmlns:og="http://ogp.me/ns#"><type>article</type><title>Flipping out: Modernizing a classic pinball machine with cloud connectivity</title><description></description><image>https://storage.googleapis.com/gweb-cloudblog-publish/images/pinball.max-600x600.jpg</image><site_name>Google</site_name><url>https://cloud.google.com/blog/products/application-modernization/connecting-a-pinball-machine-to-the-cloud/</url></og><author xmlns:author="http://www.w3.org/2005/Atom"><name>Drew Brown</name><title>Developer Advocate</title><department></department><company></company></author><author xmlns:author="http://www.w3.org/2005/Atom"><name>Max Saltonstall</name><title>Developer Advocate</title><department></department><company></company></author></item><item><title>Run your AI inference applications on Cloud Run with NVIDIA GPUs</title><link>https://cloud.google.com/blog/products/application-development/run-your-ai-inference-applications-on-cloud-run-with-nvidia-gpus/</link><description>&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Developers love Cloud Run for its simplicity, fast autoscaling, scale-to-zero capabilities, and pay-per-use pricing. Those same benefits come into play for real-time inference apps serving open gen AI models. 
That's why today, we’re adding support for NVIDIA L4 GPUs to Cloud Run, in preview.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;This opens the door to many new use cases to Cloud Run developers:&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;Performing real-time inference with lightweight open models such as Google’s open Gemma (2B/7B) models or Meta’s Llama 3 (8B) to build custom chat bots or on-the-fly document summarization, while scaling to handle spiky user traffic. &lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;Serving custom fine-tuned gen AI models, such as image generation tailored to your company's brand, and scaling down to optimize costs when nobody's using them.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;Speeding up your compute-intensive Cloud Run services, such as on-demand image recognition, video transcoding and streaming, and 3D rendering.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;As a fully managed platform, Cloud Run lets you run your code directly on top of Google’s scalable infrastructure, combining the flexibility of containers with the simplicity of serverless to help boost your productivity. With Cloud Run, you can run frontend and backend services, batch jobs, deploy websites and applications, and handle queue processing workloads — all without having to manage the underlying infrastructure.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;At the same time, many workloads that perform AI inference, especially applications that demand real-time processing, require GPU acceleration to deliver responsive user experiences. With support for NVIDIA GPUs, you can perform on-demand online AI inference using the LLMs of your choice in seconds. With 24 GB of VRAM, you can expect fast token rates for models with up to 9 billion parameters, including Llama 3.1 (8B), Mistral (7B), and Gemma 2 (9B). When your app is not in use, the service automatically scales down to zero so that you are not charged for it.&lt;/span&gt;&lt;/p&gt;
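&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;A back-of-the-envelope calculation shows why models of this size suit a 24 GB GPU. The sketch below estimates only the memory taken by the weights (parameters times bits per parameter) and ignores the KV cache, activations, and framework overhead, so treat the results as lower bounds rather than measured figures. The function name and the choice of 16-bit and 4-bit precision are assumptions made for illustration.&lt;/span&gt;&lt;/p&gt;

```python
L4_VRAM_GB = 24  # VRAM of one NVIDIA L4 GPU, as stated above


def weights_gb(params_billions, bits_per_param):
    """Approximate size of the model weights in GB (1 GB = 1e9 bytes)."""
    return params_billions * bits_per_param / 8


models = [("Mistral (7B)", 7), ("Llama 3.1 (8B)", 8), ("Gemma 2 (9B)", 9)]

for name, params in models:
    fp16 = weights_gb(params, 16)  # unquantized 16-bit weights
    int4 = weights_gb(params, 4)   # 4-bit quantized weights
    print(f"{name}: 16-bit ~{fp16:.1f} GB, 4-bit ~{int4:.1f} GB of {L4_VRAM_GB} GB")
```

&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Under these assumptions, even a 9-billion-parameter model at 16-bit precision needs about 18 GB for its weights, which leaves some headroom on a single L4.&lt;/span&gt;&lt;/p&gt;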
&lt;p style="padding-left: 40px;"&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;“With the addition of NVIDIA L4 Tensor GPU and NVIDIA NIM support, Cloud Run provides users a real-time, fast-scaling AI inference platform to help customers accelerate their AI projects and get their solutions to market faster — with minimal infrastructure management overhead.” &lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;- Anne Hecht, Senior Director of Product Marketing, NVIDIA&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Early customers are excited about the combination of Cloud Run and NVIDIA GPUs.&lt;/span&gt;&lt;/p&gt;
&lt;p style="padding-left: 40px;"&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;“Cloud Run's GPU support has been a game-changer for our real-time inference applications. The low cold-start latency is impressive, allowing our models to serve predictions almost instantly, which is critical for time-sensitive customer experiences. Additionally, Cloud Run GPUs maintain consistently minimal serving latency under varying loads, ensuring our generative AI applications are always responsive and dependable — all while effortlessly scaling to zero during periods of inactivity. Overall, Cloud Run GPUs have significantly enhanced our ability to provide fast, accurate, and efficient results to our end users.”&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt; - Thomas MENARD, Head of AI - Global Beauty Tech, L’Oreal&lt;/span&gt;&lt;/p&gt;
&lt;p style="padding-left: 40px;"&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;“Cloud Run GPUs are hands-down the best way to consume GPU compute on Google Cloud. I love how it provides a high degree of control and customizability using open-source standards (Knative) as well as great observability tools out of the box, together with fully managed infrastructure that scales to zero. And since we can easily migrate to GKE using Knative primitives, there is always an option to get even more control at the cost of higher complexity and maintenance. GPU allocation and startup times were also faster for our use-case compared to most competing services.” &lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;- Alex Bielski, Director of Innovation, Chaptr&lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Using NVIDIA GPUs on Cloud Run&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Today, we support attaching one NVIDIA L4 GPU per Cloud Run instance, and you do not need to reserve your GPUs in advance. Cloud Run GPUs are initially available in us-central1 (Iowa), with availability in europe-west4 (Netherlands) and asia-southeast1 (Singapore) expected before the end of the year. &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;To deploy a Cloud Run service with NVIDIA GPUs, add the &lt;/span&gt;&lt;code style="vertical-align: baseline;"&gt;--gpu=1&lt;/code&gt;&lt;span style="vertical-align: baseline;"&gt; flag to specify the number of GPUs and &lt;/span&gt;&lt;code style="vertical-align: baseline;"&gt;--gpu-type=nvidia-l4&lt;/code&gt;&lt;span style="vertical-align: baseline;"&gt; flag to specify the type of GPU in the command line. Or, you can do this from the Google Cloud console:&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-image_full_width"&gt;






  
    &lt;div class="article-module h-c-page"&gt;
      &lt;div class="h-c-grid"&gt;
  

    &lt;figure class="article-image--large
      
      
        h-c-grid__col
        h-c-grid__col--6 h-c-grid__col--offset-3
        
        
      "
      &gt;

      
      
        
        &lt;img
            src="https://storage.googleapis.com/gweb-cloudblog-publish/original_images/GPU_blog_gif_2.gif"
        
          alt="GPU blog gif 2"&gt;
        
        &lt;/a&gt;
      
    &lt;/figure&gt;

  
      &lt;/div&gt;
    &lt;/div&gt;
  




&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;And with the recently announced &lt;/span&gt;&lt;a href="https://cloud.google.com/blog/products/serverless/google-cloud-functions-is-now-cloud-run-functions"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Cloud Run functions&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, you can also attach a GPU to your functions to perform event-driven AI inference with simplicity.&lt;/span&gt;&lt;/p&gt;
&lt;p style="padding-left: 40px;"&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;“The newly released Cloud Run functions with GPU support enables Python developers to use &lt;/span&gt;&lt;a href="https://huggingface.co/docs/transformers/en/index" rel="noopener" target="_blank"&gt;&lt;span style="font-style: italic; text-decoration: underline; vertical-align: baseline;"&gt;Hugging Face models&lt;/span&gt;&lt;/a&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt; without having to worry about infrastructure, GPU drivers or containers. Cloud Run's scale-to-zero and fast startup capabilities are a great match for developers looking at getting started with AI using Hugging Face models with just a few lines of serverless code.&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;” &lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;- Julien Chaumond, CTO, Hugging Face&lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Performance&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Along with simple operations, Cloud Run with NVIDIA GPUs also offers strong performance. We keep our infrastructure latency to a minimum so that you can get the best performance when serving your models. &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Cloud Run instances with an attached L4 GPU and a pre-installed driver start in &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;approximately 5 seconds&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;, at which point the processes running in your container can start to use the GPU. Then, you’ll need another few seconds for the framework and model to load and initialize. The table below shows cold-start times for the Gemma 2B, Gemma 2 9B, Llama 2 7B/13B, and Llama 3.1 8B models with the Ollama framework, ranging from 11 to 35 seconds. Each figure covers the time to start an instance from zero, load the model into the GPU, and have the LLM return its first word.&lt;br/&gt;&lt;br/&gt;&lt;/span&gt;&lt;/p&gt;
&lt;div align="left"&gt;
&lt;div style="color: #5f6368; overflow-x: auto; overflow-y: hidden; width: 100%;"&gt;&lt;table style="width: 95.953%;"&gt;&lt;colgroup&gt;&lt;col style="width: 19.6332%;"/&gt;&lt;col style="width: 41.4023%;"/&gt;&lt;col style="width: 38.9646%;"/&gt;&lt;/colgroup&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Model&lt;/strong&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Model size&lt;/strong&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Cold-start time&lt;/strong&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;gemma:2b&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;1.7 GB&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;11-17 seconds&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;gemma2:9b&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;5.1 GB&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;25-30 seconds&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;llama2:7b&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;3.8 GB&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;14-21 seconds&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;llama2:13b&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;7.4 GB&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;23-35 seconds&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;llama3.1:8b&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;4.7 GB&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;15-21 seconds&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;&lt;em&gt;&lt;sup&gt;&lt;span style="vertical-align: baseline;"&gt;Cold-start time: the time from the first invocation of the service URL, with the Cloud Run service scaling from 0 to 1 instance, until the first word of the response is served.&lt;br/&gt;&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;Models: we used 4-bit quantized versions of each of the models above. These models were deployed using the Ollama framework. &lt;br/&gt;&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;Note that these numbers were observed in a controlled lab environment, and actual performance may vary depending on a variety of factors.&lt;/span&gt;&lt;/sup&gt;&lt;/em&gt;&lt;/p&gt;
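&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;To see roughly how each cold start splits between platform and model, the sketch below subtracts the approximately 5-second instance start from the ranges in the table above. What remains is an estimate of the time spent loading the framework and model and producing the first word. The 5-second figure is approximate and the split is our own illustrative arithmetic, not a separate measurement.&lt;/span&gt;&lt;/p&gt;

```python
INSTANCE_START_S = 5  # approximate GPU instance start time quoted above

# Cold-start ranges in seconds, copied from the table above.
COLD_START_S = {
    "gemma:2b": (11, 17),
    "gemma2:9b": (25, 30),
    "llama2:7b": (14, 21),
    "llama2:13b": (23, 35),
    "llama3.1:8b": (15, 21),
}


def load_and_first_word_range(cold_start, instance_start=INSTANCE_START_S):
    """Estimate the part of a cold start spent after the instance is running."""
    low, high = cold_start
    return low - instance_start, high - instance_start


for model, cold_start in COLD_START_S.items():
    low, high = load_and_first_word_range(cold_start)
    print(f"{model}: about {low}-{high} s for model load and first word")
```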
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Deploy a sample app using Ollama&lt;/strong&gt;&lt;/h3&gt;&lt;/div&gt;
&lt;div class="block-image_full_width"&gt;






  
    &lt;div class="article-module h-c-page"&gt;
      &lt;div class="h-c-grid"&gt;
  

    &lt;figure class="article-image--small
      
      
        h-c-grid__col
        
        
        h-c-grid__col--2 h-c-grid__col--offset-5
      "
      &gt;

      
      
        
        &lt;img
            src="https://storage.googleapis.com/gweb-cloudblog-publish/images/image1_lL36B9K.max-1000x1000.jpg"
        
          alt="image1"&gt;
        
        &lt;/a&gt;
      
    &lt;/figure&gt;

  
      &lt;/div&gt;
    &lt;/div&gt;
  




&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Below, you can see how to deploy Google’s Gemma2 9b model with Ollama using Cloud Run with NVIDIA GPUs. &lt;/span&gt;&lt;a href="https://ai.google.dev/gemma/?utm_source=keyword&amp;amp;utm_medium=referral&amp;amp;utm_campaign=gemma_cta&amp;amp;utm_content=" rel="noopener" target="_blank"&gt;&lt;span style="vertical-align: baseline;"&gt;Gemma&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; is a family of lightweight, state-of-the-art &lt;/span&gt;&lt;a href="https://opensource.googleblog.com/2024/02/building-open-models-responsibly-gemini-era.html" rel="noopener" target="_blank"&gt;&lt;span style="vertical-align: baseline;"&gt;open models&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; built from the same research and technology used to create the &lt;/span&gt;&lt;a href="https://deepmind.google/technologies/gemini/#introduction" rel="noopener" target="_blank"&gt;&lt;span style="vertical-align: baseline;"&gt;Gemini&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; models. &lt;/span&gt;&lt;a href="https://github.com/ollama/ollama" rel="noopener" target="_blank"&gt;&lt;span style="vertical-align: baseline;"&gt;Ollama&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; is a framework that provides a simple API to manage large language models. &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;First, create a container image with Ollama and the model with this Dockerfile:&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-code"&gt;&lt;pre&gt;&lt;code&gt;FROM ollama/ollama
ENV HOME=/root
WORKDIR /
# Start the server briefly so the model can be pulled into the image at build time.
RUN ollama serve &amp;amp; sleep 10 &amp;amp;&amp;amp; ollama pull gemma2
# At run time, serve the Ollama API (port 11434 by default).
ENTRYPOINT ["ollama","serve"]&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Then deploy using the following command:&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-code"&gt;&lt;pre&gt;&lt;code&gt;gcloud beta run deploy --source . \
  --port 11434 \
  --region us-central1 \
  --no-cpu-throttling \
  --cpu 8 \
  --memory 32Gi \
  --gpu 1 \
  --gpu-type=nvidia-l4&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;And that’s it! Once deployed, you can use the &lt;/span&gt;&lt;a href="https://github.com/ollama/ollama?tab=readme-ov-file#rest-api" rel="noopener" target="_blank"&gt;&lt;span style="vertical-align: baseline;"&gt;Ollama API&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; to start chatting with Gemma 2!&lt;/span&gt;&lt;/p&gt;
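&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;For example, here is a minimal Python sketch that sends a prompt to the generate endpoint of the Ollama REST API using only the standard library. The service URL is a hypothetical placeholder for the URL printed by your deployment, the helper names are our own, and the sketch assumes the service accepts unauthenticated requests. If your service requires authentication, you would also need to add an Authorization header.&lt;/span&gt;&lt;/p&gt;

```python
import json
import urllib.request


def build_generate_request(service_url, prompt, model="gemma2"):
    """Build a POST request for Ollama's /api/generate endpoint."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False})
    return urllib.request.Request(
        url=service_url.rstrip("/") + "/api/generate",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(service_url, prompt):
    """Send the prompt and return the generated text (makes a network call)."""
    with urllib.request.urlopen(build_generate_request(service_url, prompt)) as resp:
        return json.loads(resp.read())["response"]


# Hypothetical URL; replace it with the one shown after deployment.
request = build_generate_request(
    "https://ollama-gemma-example.a.run.app", "Why is the sky blue?"
)
print(request.get_method(), request.full_url)
```

&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Setting "stream" to false asks Ollama for a single JSON response rather than a stream of partial results, which keeps the client code short.&lt;/span&gt;&lt;/p&gt;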
&lt;p style="padding-left: 40px;"&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;“Deploying a Large Language Model using Ollama on Cloud Run is remarkably straightforward, thanks to the latest GPU support. With just a few commands, you can leverage Ollama’s seamless integration with your app and Cloud Run’s serverless infrastructure to deploy, and manage your LLMs effortlessly. The fast coldstarts and rapid scaling of Cloud Run let you scale your application reliably. No deep knowledge of infrastructure or machine learning is required — simply focus on your application and let the tools handle the rest.”&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt; - Jeffrey Morgan, Founder, Ollama&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;You can also use &lt;/span&gt;&lt;a href="http://ai.nvidia.com" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;NVIDIA NIM&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; inference microservices, part of the &lt;/span&gt;&lt;a href="https://console.cloud.google.com/marketplace/product/nvidia/nvidia-ai-enterprise-vmi"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;NVIDIA AI Enterprise&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt; software suite available in the Google Cloud Marketplace&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;. NIM provides secure, reliable, accelerated model inference, which simplifies AI inference deployments and helps maximize performance on NVIDIA L4 GPUs on Cloud Run. Check out this NVIDIA blog to learn how to get started.&lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Get started today&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Cloud Run makes it super easy to host your web applications. And now with GPU support, we are extending the best of serverless, simplicity and scalability to your AI inference applications too! To start using Cloud Run with NVIDIA GPUs, sign up at &lt;/span&gt;&lt;a href="https://g.co/cloudrun/gpu" rel="noopener" target="_blank"&gt;&lt;span style="vertical-align: baseline;"&gt;g.co/cloudrun/gpu&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; to join our preview program today and wait for our welcome email.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;To learn more about Cloud Run with GPUs, join this &lt;/span&gt;&lt;a href="https://cloudonair.withgoogle.com/events/run-ai-with-cloud-run?utm_source=cgc-blog&amp;amp;utm_medium=blog&amp;amp;utm_campaign=FY24-Q3-global-prod1052-onlineevent-er-Run-AI-With-Cloud-Run&amp;amp;utm_content=blog&amp;amp;utm_term=-" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;livestream&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; on August 21, 2024 with NVIDIA and Ollama. We will discuss new features for Cloud Run and demo how to use Cloud Run in different scenarios.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;</description><pubDate>Wed, 21 Aug 2024 15:00:00 +0000</pubDate><guid>https://cloud.google.com/blog/products/application-development/run-your-ai-inference-applications-on-cloud-run-with-nvidia-gpus/</guid><category>AI &amp; Machine Learning</category><category>Serverless</category><category>Application Development</category><og xmlns:og="http://ogp.me/ns#"><type>article</type><title>Run your AI inference applications on Cloud Run with NVIDIA GPUs</title><description></description><site_name>Google</site_name><url>https://cloud.google.com/blog/products/application-development/run-your-ai-inference-applications-on-cloud-run-with-nvidia-gpus/</url></og><author xmlns:author="http://www.w3.org/2005/Atom"><name>Sagar Randive</name><title>Product Manager</title><department></department><company></company></author><author xmlns:author="http://www.w3.org/2005/Atom"><name>Wenlei (Frank) He</name><title>Senior Staff Software Engineer, Google Cloud Serverless</title><department></department><company></company></author></item><item><title>Cloud Functions is now Cloud Run functions — event-driven programming in one unified serverless 
platform</title><link>https://cloud.google.com/blog/products/serverless/google-cloud-functions-is-now-cloud-run-functions/</link><description>&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Cloud Functions and its familiar event-driven programming model is now Cloud Run functions, complete with the fine-grained control and scalability that developers love about the serverless platform. With Cloud Run functions, we’ve created a unified serverless platform for all your workloads, so you don’t have to choose between the two.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;This goes beyond a simple name change. We’ve unified the Cloud Functions infrastructure with Cloud Run, and developers of Cloud Functions (2nd gen) get immediate access to all new Cloud Run features, including &lt;/span&gt;&lt;a href="https://cloud.google.com/blog/products/application-development/run-your-ai-inference-applications-on-cloud-run-with-nvidia-gpus"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;NVIDIA GPUs. &lt;/span&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Now that Cloud Functions is Cloud Run functions, you can write and deploy functions directly with Cloud Run, giving you complete control over the underlying service configuration:&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-code"&gt;&lt;dl&gt;
    &lt;dt&gt;code_block&lt;/dt&gt;
    &lt;dd&gt;&lt;pre&gt;&lt;code&gt;gcloud beta run deploy hello-function \
      --source . \
      --function hello_get \
      --base-image nodejs20&lt;/code&gt;&lt;/pre&gt;&lt;/dd&gt;
&lt;/dl&gt;&lt;/div&gt;
&lt;div class="block-image_full_width"&gt;






  
    &lt;div class="article-module h-c-page"&gt;
      &lt;div class="h-c-grid"&gt;
  

    &lt;figure class="article-image--large
      
      
        h-c-grid__col
        h-c-grid__col--6 h-c-grid__col--offset-3
        
        
      "
      &gt;

      
      
        
        &lt;img
            src="https://storage.googleapis.com/gweb-cloudblog-publish/images/1_IMAGEA.max-1000x1000.png"
        
          alt="1 IMAGEA"&gt;
        
      
        &lt;figcaption class="article-image__caption "&gt;&lt;p data-block-key="m23bg"&gt;A new deployment option for Cloud Run: the function&lt;/p&gt;&lt;/figcaption&gt;
      
    &lt;/figure&gt;

  
      &lt;/div&gt;
    &lt;/div&gt;
  




&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Further, all functions that were created with Google Cloud Functions (2nd gen) have access to all of Cloud Run’s capabilities, including:&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;a href="https://cloud.google.com/run/docs/triggering/trigger-with-events"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Multi-event trigger &lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;management on functions&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;High-performance &lt;/span&gt;&lt;a href="https://cloud.google.com/run/docs/configuring/vpc-direct-vpc"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Direct VPC egress&lt;/span&gt;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;The ability to mount &lt;/span&gt;&lt;a href="https://cloud.google.com/run/docs/configuring/services/cloud-storage-volume-mounts"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Cloud Storage volumes&lt;/span&gt;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;a href="https://cloud.google.com/run/docs/configuring/services/language-runtimes"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Google-managed language runtimes&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, with &lt;/span&gt;&lt;a href="https://cloud.google.com/run/docs/configuring/services/automatic-base-image-updates"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;automatic security updates on base images&lt;/span&gt;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;a href="https://cloud.google.com/run/docs/rollouts-rollbacks-traffic-migration"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Traffic splitting and revision control&lt;/span&gt;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;Managed Prometheus and OpenTelemetry support with &lt;/span&gt;&lt;a href="https://cloud.google.com/run/docs/deploying#sidecars"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;sidecar containers&lt;/span&gt;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;a href="https://cloud.google.com/blog/products/application-development/run-your-ai-inference-applications-on-cloud-run-with-nvidia-gpus"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Inference functions with NVIDIA GPUs &lt;/span&gt;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;“The newly released Cloud Run functions with GPU support enables Python developers to use &lt;/span&gt;&lt;a href="https://huggingface.co/docs/transformers/en/index" rel="noopener" target="_blank"&gt;&lt;span style="font-style: italic; text-decoration: underline; vertical-align: baseline;"&gt;Hugging Face models&lt;/span&gt;&lt;/a&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt; without having to worry about infrastructure, GPU drivers or containers. Cloud Run's scale-to-zero and fast startup capabilities are a great match for developers looking at getting started with AI using HuggingFace models with just a few lines of serverless code&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;” - Julien Chaumond, CTO, Hugging Face&lt;/span&gt;&lt;/p&gt;
&lt;h2&gt;&lt;strong style="vertical-align: baseline;"&gt;Continued support for existing APIs, gcloud commands and Terraform modules&lt;/strong&gt;&lt;/h2&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Cloud Functions 2nd gen functions will automatically be converted into Cloud Run functions. With Cloud Run functions, we are committed to continuing support for the existing functions &lt;/span&gt;&lt;a href="https://cloud.google.com/functions/docs/apis"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;APIs&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, &lt;/span&gt;&lt;a href="https://cloud.google.com/sdk/gcloud/reference/functions"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;gcloud commands&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; and Terraform modules (&lt;/span&gt;&lt;a href="https://registry.terraform.io/providers/hashicorp/google/latest/docs/resources/cloudfunctions2_function" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Gen 2&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;). This lets you enable Run features on your function without having to refactor your deployment automation.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;1st gen functions will continue to be available as Cloud Run functions (1st gen). 1st gen functions need to be upgraded to Cloud Run functions before you can get full access to the underlying Cloud Run features. Cloud Run functions (1st gen) &lt;/span&gt;&lt;a href="https://cloud.google.com/functions/docs/apis"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;APIs&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, &lt;/span&gt;&lt;a href="https://cloud.google.com/sdk/gcloud/reference/functions"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;gcloud commands&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; and Terraform modules (&lt;/span&gt;&lt;a href="https://registry.terraform.io/providers/hashicorp/google/latest/docs/resources/cloudfunctions_function" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Gen 1&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;) will continue to be supported. &lt;/span&gt;&lt;/p&gt;
&lt;h2&gt;&lt;strong style="vertical-align: baseline;"&gt;Connecting your platform with functions&lt;/strong&gt;&lt;/h2&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Cloud Run functions makes integrations across your platform simple to build and easy to maintain: you’re only responsible for the code, and we handle the rest. Anyone on your team with coding knowledge can create a solution without having to package up the code. You can also choose from seven popular languages. Data scientists, for example, can get a Python script running in the cloud even with limited infrastructure knowledge.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-image_full_width"&gt;






  
    &lt;div class="article-module h-c-page"&gt;
      &lt;div class="h-c-grid"&gt;
  

    &lt;figure class="article-image--large
      
      
        h-c-grid__col
        h-c-grid__col--6 h-c-grid__col--offset-3
        
        
      "
      &gt;

      
      
        
        &lt;img
            src="https://storage.googleapis.com/gweb-cloudblog-publish/images/2_IMAGEB.max-1000x1000.png"
        
          alt="2 IMAGEB"&gt;
        
      
        &lt;figcaption class="article-image__caption "&gt;&lt;p data-block-key="m23bg"&gt;Edit your function in a new inline editor&lt;/p&gt;&lt;/figcaption&gt;
      
    &lt;/figure&gt;

  
      &lt;/div&gt;
    &lt;/div&gt;
  




&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Cloud Run functions keeps productivity high and operational overhead low by making each function its own independent component, isolated from other workloads. Changes and updates to one function are unlikely to impact another function.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;A common use case for functions is responding when an object is added to a Cloud Storage bucket. The function might generate thumbnails of an image or run sentiment analysis on a text file. But there are many other use cases for which customers choose Cloud Run functions:&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;Transforming data and loading it into BigQuery&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;Creating a webhook that’s called by a third party (e.g., GitHub)&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;Using ML APIs to analyze data added to a database or storage bucket&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
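&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;The Cloud Storage use case above can be sketched in a few lines of Python. This is a minimal illustration, not a deployable function: the names choose_action and handle_object_added are hypothetical, and the event payload is assumed to be a dict with the bucket, name, and contentType fields of the added object. In a real deployment the handler would be registered through the Functions Framework rather than called directly.&lt;/span&gt;&lt;/p&gt;

```python
# Hypothetical sketch: decide what to do when an object is added to a bucket.
# The event payload shape (bucket, name, contentType) is an assumption.
IMAGE_PREFIX = "image/"
TEXT_PREFIX = "text/"


def choose_action(event_data):
    """Return the follow-up action for a newly added object."""
    content_type = event_data.get("contentType", "")
    if content_type.startswith(IMAGE_PREFIX):
        return "generate_thumbnail"
    if content_type.startswith(TEXT_PREFIX):
        return "run_sentiment_analysis"
    return "ignore"


def handle_object_added(event_data):
    """Describe the work the function would do for this object."""
    location = "gs://{}/{}".format(event_data["bucket"], event_data["name"])
    return {"action": choose_action(event_data), "object": location}


if __name__ == "__main__":
    sample = {
        "bucket": "my-bucket",
        "name": "photos/cat.png",
        "contentType": "image/png",
    }
    print(handle_object_added(sample))
```

&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Keeping the routing decision in a plain function makes it easy to unit test without deploying anything.&lt;/span&gt;&lt;/p&gt;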
&lt;h2&gt;&lt;strong style="vertical-align: baseline;"&gt;Get started with Cloud Run functions&lt;/strong&gt;&lt;/h2&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Whether you're new to serverless or a seasoned pro, Cloud Run functions makes it easier than ever to build and manage event-driven applications.&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;Learn more about improvements to the &lt;/span&gt;&lt;a href="https://cloud.google.com/functions/docs/concepts/version-comparison"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Cloud Functions experience &lt;/span&gt;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;Deploy an &lt;/span&gt;&lt;a href="https://cloud.google.com/run/docs/deploy-functions"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;HTTP function on Cloud Run&lt;/span&gt;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;Deploy an &lt;/span&gt;&lt;a href="https://cloud.google.com/run/docs/tutorials/eventarc-functions"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;event-driven function on Cloud Run&lt;/span&gt;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;Learn more about running &lt;/span&gt;&lt;a href="https://cloud.google.com/blog/products/application-development/run-your-ai-inference-applications-on-cloud-run-with-nvidia-gpus"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;inference applications on Cloud Run with NVIDIA GPUs&lt;/span&gt;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Learn more about Cloud Run functions and Cloud Run in this live &lt;/span&gt;&lt;a href="https://cloudonair.withgoogle.com/events/run-ai-with-cloud-run?utm_source=cgc-blog&amp;amp;utm_medium=blog&amp;amp;utm_campaign=FY24-Q3-global-prod1052-onlineevent-er-Run-AI-With-Cloud-Run&amp;amp;utm_content=blog&amp;amp;utm_term=-" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;webinar&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;</description><pubDate>Wed, 21 Aug 2024 15:00:00 +0000</pubDate><guid>https://cloud.google.com/blog/products/serverless/google-cloud-functions-is-now-cloud-run-functions/</guid><category>Application Development</category><category>Serverless</category><og xmlns:og="http://ogp.me/ns#"><type>article</type><title>Cloud Functions is now Cloud Run functions — event-driven programming in one unified serverless platform</title><description></description><site_name>Google</site_name><url>https://cloud.google.com/blog/products/serverless/google-cloud-functions-is-now-cloud-run-functions/</url></og><author xmlns:author="http://www.w3.org/2005/Atom"><name>James Ma</name><title>Sr. 
Product Manager</title><department></department><company></company></author></item><item><title>Flexible committed-use discounts are now even more flexible</title><link>https://cloud.google.com/blog/products/containers-kubernetes/compute-flexible-cud-expands-to-gke-autopilot-and-cloud-run/</link><description>&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Google Cloud offers many great ways to run your workloads: low-level VMs in Google Compute Engine, container orchestration with Google Kubernetes Engine (GKE) — including via fully-managed &lt;/span&gt;&lt;a href="https://cloud.google.com/kubernetes-engine/docs/concepts/autopilot-overview"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Autopilot mode&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; — and Cloud Run. Until now, to optimize your spend, you needed to purchase several Committed-use Discounts (CUDs) to cover each of these different products. For example, you might have purchased a Compute Engine Flexible CUD for VM spend including workloads running on GKE’s standard mode, a Cloud Run CUD for Cloud Run always-on instances, and an Autopilot CUD for workloads running in GKE Autopilot.&lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Expanding Compute Flexible CUDs&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Today we are excited to announce that the Compute Engine Flexible CUD, now known as the&lt;/span&gt;&lt;a href="https://cloud.google.com/compute/docs/instances/committed-use-discounts-overview#spend_based"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt; Compute Flexible CUD,&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; has been expanded to cover Cloud Run on-demand resources, most GKE Autopilot Pods and the premiums for Autopilot Performance and Accelerator compute classes. The &lt;/span&gt;&lt;a href="https://cloud.google.com/compute/docs/instances/committed-use-discounts-overview#spend_based"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;documentation&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; and our &lt;/span&gt;&lt;a href="https://cloud.google.com/skus/sku-groups/compute-engine-flexible-cud-eligible-skus"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;SKU list&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; have the precise details on what’s included.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;With one CUD purchase, you can cover eligible spend on all three products: Compute Engine, GKE, and Cloud Run. You can save 46% with a three-year commitment and 28% with a one-year commitment. Because a single commitment can be spent across all these products, you get maximum flexibility. Furthermore, these commitments are not region-specific, so you can use them on resources in any region across these products.&lt;/span&gt;&lt;/p&gt;
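&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;To make the discount arithmetic concrete, here is a small Python sketch. The 28% and 46% rates come from this post; everything else is a simplifying assumption for illustration: the function names are hypothetical, the commitment is expressed as the hourly on-demand spend it covers, and usage above the commitment is billed at on-demand rates. Check the documentation and SKU list for the actual billing rules.&lt;/span&gt;&lt;/p&gt;

```python
# Discount rates quoted in this post for the Compute Flexible CUD.
DISCOUNTS = {"1-year": 0.28, "3-year": 0.46}


def committed_hourly_cost(covered_hourly, term):
    """Hourly cost of the commitment after the discount (assumed model)."""
    return round(covered_hourly * (1 - DISCOUNTS[term]), 2)


def hourly_bill(usage_hourly, covered_hourly, term):
    """Hypothetical hourly bill under the simplified model.

    The commitment is always paid, even when usage falls below it;
    usage above the covered amount is charged at the on-demand rate.
    """
    overage = max(usage_hourly - covered_hourly, 0)
    return round(committed_hourly_cost(covered_hourly, term) + overage, 2)


if __name__ == "__main__":
    # Example: 12 USD/hour of eligible usage, 10 USD/hour covered.
    for term in ("1-year", "3-year"):
        print(term, hourly_bill(12, 10, term))
```

&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Because the commitment is paid whether or not it is used, the sketch suggests sizing it to sustained usage rather than to peaks.&lt;/span&gt;&lt;/p&gt;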
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Retiring the Autopilot CUD&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Since the new expanded Compute Flexible CUD has a higher discount than the GKE Autopilot CUD and greater overall flexibility, we’re retiring the GKE Autopilot CUD. You can still purchase the legacy GKE Autopilot CUD until October 15, after which it will no longer be available for purchase. Any existing CUDs will continue to apply through their term regardless of when you purchase them. That said, we recommend looking into the newly expanded Compute Flexible CUD for your needs now and in the future, for its greater flexibility and better discounts!&lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;How to get started&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;If you're already using Flexible CUDs for Compute Engine, you'll automatically see the discounts applied to eligible Cloud Run and GKE Autopilot usage (if you have product-specific CUDs like the legacy GKE Autopilot CUD, those will apply first). If you're new to the Compute Flexible CUD, it's easy to get started: estimate your hourly spend across eligible SKUs, purchase a commitment that matches your expected sustained usage over the one- or three-year term, and start enjoying the savings! You can purchase additional CUDs as your usage grows.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-image_full_width"&gt;






  
    &lt;div class="article-module h-c-page"&gt;
      &lt;div class="h-c-grid"&gt;
  

    &lt;figure class="article-image--large
      
      
        h-c-grid__col
        h-c-grid__col--6 h-c-grid__col--offset-3
        
        
      "
      &gt;

      
      
        
        &lt;img
            src="https://storage.googleapis.com/gweb-cloudblog-publish/images/Use-latest.max-1000x1000.png"
        
          alt="Use-latest"&gt;
        
      
    &lt;/figure&gt;

  
      &lt;/div&gt;
    &lt;/div&gt;
  




&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;We hope you find this new flexibility useful when it comes to running your workloads on Google Cloud!&lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Next steps&lt;/strong&gt;&lt;/h3&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;Learn about &lt;/span&gt;&lt;a href="https://cloud.google.com/compute/docs/instances/committed-use-discounts-overview#spend_based"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Compute Flexible CUDs&lt;/span&gt;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;View &lt;/span&gt;&lt;a href="https://cloud.google.com/run/pricing"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Cloud Run pricing&lt;/span&gt;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;View &lt;/span&gt;&lt;a href="https://cloud.google.com/kubernetes-engine/pricing#autopilot_mode"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;GKE pricing&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; and &lt;/span&gt;&lt;a href="https://cloud.google.com/kubernetes-engine/cud"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;CUD options&lt;/span&gt;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;a href="https://console.cloud.google.com/billing/reports/commitments"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Purchase a Compute Flexible CUD in the console&lt;/span&gt;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;&lt;/div&gt;</description><pubDate>Mon, 15 Jul 2024 18:00:00 +0000</pubDate><guid>https://cloud.google.com/blog/products/containers-kubernetes/compute-flexible-cud-expands-to-gke-autopilot-and-cloud-run/</guid><category>GKE</category><category>Cost Management</category><category>Serverless</category><category>Containers &amp; Kubernetes</category><og xmlns:og="http://ogp.me/ns#"><type>article</type><title>Flexible committed-use discounts are now even more flexible</title><description></description><site_name>Google</site_name><url>https://cloud.google.com/blog/products/containers-kubernetes/compute-flexible-cud-expands-to-gke-autopilot-and-cloud-run/</url></og><author xmlns:author="http://www.w3.org/2005/Atom"><name>William Denniss</name><title>Group Product Manager, Google Kubernetes Engine</title><department></department><company></company></author><author xmlns:author="http://www.w3.org/2005/Atom"><name>Yasmin Mowafy</name><title>Sr. Product Manager</title><department></department><company></company></author></item><item><title>Releasing Artifact Registry assets across Organizations and Projects with serverless</title><link>https://cloud.google.com/blog/products/serverless/artifact-registry-across-your-cloud/</link><description>&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Have you ever wondered if there is a more automated way to copy &lt;/span&gt;&lt;a href="https://cloud.google.com/artifact-registry/docs/overview"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Artifact Registry&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; or Container Registry Images across different projects and Organizations? In this article we will go over an opinionated process of doing so using serverless components in Google Cloud and its deployment with Infrastructure as Code (IaC).&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;This article assumes knowledge of coding in Python, a basic understanding of running commands in a terminal, and familiarity with the &lt;/span&gt;&lt;a href="https://developer.hashicorp.com/terraform/language" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;HashiCorp Configuration Language (HCL)&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, i.e., Terraform for IaC.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;In this use case, we have at least one container image in an Artifact Registry repository that is updated frequently and needs to be propagated to Artifact Registry repositories in other organizations. Although the images are released to external organizations, they should remain private and must not be available for public use.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;To clearly articulate how this approach works, let's first cover the individual components of the architecture and then tie them all together. &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;As discussed earlier, two Artifact Registry (AR) repositories are involved. For convenience, let’s call them “Source AR” (the repository where the image is periodically built and updated, the source of truth) and “Target AR” (the repository in a different organization or project where the image needs to be propagated and consumed). The next component in the architecture is &lt;/span&gt;&lt;a href="https://cloud.google.com/pubsub/docs/overview"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Cloud Pub/Sub&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;; we need an Artifact Registry Pub/Sub topic in the source project that automatically captures updates made to the Source AR. When the Artifact Registry API is enabled, Artifact Registry automatically creates this Pub/Sub topic; the topic is called “gcr” and is shared between Artifact Registry and Google Container Registry (if used). Artifact Registry publishes messages to the topic for the following changes:&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;Image uploads&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;New tags added to images&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;Image deletion&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;&lt;/div&gt;
&lt;div class="block-image_full_width"&gt;






  
    &lt;div class="article-module h-c-page"&gt;
      &lt;div class="h-c-grid"&gt;
  

    &lt;figure class="article-image--large
      
      
        h-c-grid__col
        h-c-grid__col--6 h-c-grid__col--offset-3
        
        
      "
      &gt;

      
      
        
        &lt;img
            src="https://storage.googleapis.com/gweb-cloudblog-publish/images/1_SwHdBo1.max-1000x1000.png"
        
          alt="1"&gt;
        
      
    &lt;/figure&gt;

  
      &lt;/div&gt;
    &lt;/div&gt;
  




&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Although the topic is created for us, we will need to create a Pub/Sub subscription to consume the messages from the topic. This brings us to the next component of the architecture, &lt;/span&gt;&lt;a href="https://cloud.google.com/run/docs/overview/what-is-cloud-run"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Cloud Run&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;. We will create a Cloud Run deployment that will perform the following:&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;Parse through the Pub/Sub messages&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;Compare the contents of the message to validate if the change in the Source AR warrants an update to the Target AR&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;If the validation conditions are met, then the Cloud Run service moves the latest Docker image to the Target AR &lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Now, let’s dive into how Cloud Run integrates with the Pub/Sub AR topic. For Cloud Run to be able to read the Pub/Sub messages we have two additional components; an &lt;/span&gt;&lt;a href="https://cloud.google.com/eventarc/docs/overview"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;EventArc trigger&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; and a Pub/Sub subscription. The EventArc trigger is critical to the workflow as it is what triggers the Cloud Run service. &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;In addition to the components described above, the below prerequisites need to be met for the entire flow to function correctly. &lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;a href="https://cloud.google.com/sdk?hl=en"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Cloud SDK&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; needs to be installed on the users’ terminal so that you can run &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;gcloud &lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;commands.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;The project Service Account (SA) will need “Read” permission on the Source AR.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;The Project SA will need “Write” permission on the Target AR.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;a href="https://cloud.google.com/vpc-service-controls/docs/overview"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;VPC-SC&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; requirements on the destination organization (if enabled)&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;Egress Permissions to the target repository from the SA running the job&lt;/span&gt;&lt;/li&gt;
&lt;li role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;Ingress permission for the account running the 'make' commands (instructions below) and writing to Artifact Registry or Container Registry&lt;/span&gt;&lt;/li&gt;
&lt;li role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;Ingress Permissions to read the PUB/SUB GCR Topic of the source repository&lt;/span&gt;&lt;/li&gt;
&lt;li role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;Allow&lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt; [project-name]-sa@[project-name].iam.gserviceaccount.com&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; needs VPC-SC Ingress for the Artifact Registry method&lt;/span&gt;&lt;/li&gt;
&lt;li role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;Allow&lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt; [project-name]-sa@[project-name].iam.gserviceaccount.com&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; needs VPC-SC Ingress for CloudRun method&lt;/span&gt;&lt;/li&gt;
&lt;li role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;var.gcp_project&lt;/span&gt;&lt;/li&gt;
&lt;li role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;Var.service_account&lt;/span&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Below we talk about the Python code, Dockerfile and the Terraform code which is all you need for implementing this yourself. We recommend that you open our Github repository while reading the below section where all the Open Source code for this solution lives. Here’s the link: &lt;/span&gt;&lt;a href="https://github.com/GoogleCloudPlatform/devrel-demos/tree/main/devops/inter-org-artifacts-release" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;https://github.com/GoogleCloudPlatform/devrel-demos/tree/main/devops/inter-org-artifacts-release&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;  &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;What we deploy in Cloud Run is a custom Docker container. It comprises of the following files:&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;App.py: This file contains the variables for the source and target containers as well as the execution code that will be triggered to run based on the Pub/Sub messages and contains the following Python code. &lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;Copy_image.py: this file contains the copy command app.py will leverage in order to run the gcrane command required to copy images from source AR to target AR.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;Dockerfile: This file contains the instructions needed to package gcrane and the requirements needed to build the Cloud Run image&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Since we have now covered all of the individual components that are associated with this architecture, let’s walk through the flow that ties all the individual components together. &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Let’s say your engineering team has built and released a new version of the Docker Image “Image X”, per their release schedule and added the “latest” tag to it. This new version is sitting in the Source AR and when the new version gets created, the AR Pub/Sub topic updates the message that reflects that a new version of the “Image X” has been added to the source AR. This automatically causes the EventArc trigger to poke the Cloud Run service to scrape the messages from the Pub/Sub subscription. &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Our Cloud Run service will use the logic written in the &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;App.py &lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;image to check if the action that happened in Source AR matches the criteria specified (Image X with tag “latest”). If the action matches and warrants a downstream action, Cloud Run triggers &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;Copy_image.py&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; to execute the gcrane command to copy the image name and tag from the Source AR to the Target AR. &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;In the event that the image or tag does not match the criteria specified in App.py, (for eg. Image Y tag: latest) the Cloud Run process will give back an HTTP 200 reply with a message &lt;/span&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;“The source AR updates were not made to the [Image X]. No image will be updated.”&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt; confirming no action will be taken. &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong style="font-style: italic; vertical-align: baseline;"&gt;Note:&lt;/strong&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt; Because the Source AR may contain multiple images and we are only concerned with updating specific images in the Target AR we have integrated output responses within the Cloud Run services that can be viewed in the Google Cloud logs for troubleshooting and diagnosing issues. This also prevents unwanted publishing of images not pertaining to the desired image(s) in question.&lt;/span&gt;&lt;/p&gt;
&lt;h2&gt;&lt;strong style="vertical-align: baseline;"&gt;Why did we not go with an alternative approach?&lt;/strong&gt;&lt;/h2&gt;
&lt;ol&gt;
&lt;li aria-level="1" style="list-style-type: decimal; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Versatility: &lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;The Source and Target ARs were in different Organizations.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: decimal; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Compatibility:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; The artifacts were not in a code/Git repository compatible with solutions like Cloud Build.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: decimal; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Security:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; VPC-SC perimeters limit the tools we can leverage while using cloud native serverless options.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: decimal; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Immutability: &lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;We wanted a solution that could be fully deployed with Infrastructure as Code.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: decimal; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Scalability and Portability: &lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;We wanted to be able to update multiple Artifact Registries in multiple Organizations simultaneously.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: decimal; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Efficiency and Automation: &lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;The solution avoids a time-based pull method that runs even when no resources are being moved, and it avoids human interaction to ensure consistency.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: decimal; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Cloud Native: &lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;The solution removes the dependency on third-party tools or solutions like a CI/CD pipeline or a repository outside of the Google Cloud environment.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;If your Upstream projects where the images are coming from all reside in the same Google Cloud Region or Multi-region, a great alternative to solve the problem is &lt;/span&gt;&lt;a href="https://cloud.google.com/artifact-registry/docs/repositories/virtual-overview"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Virtual repositories&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;.&lt;/span&gt;&lt;/p&gt;
&lt;h2&gt;&lt;strong style="vertical-align: baseline;"&gt;How do we deploy it with IaC?&lt;/strong&gt;&lt;/h2&gt;
&lt;ul&gt;
&lt;li role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;We have provided the Terraform code we used to solve this problem. &lt;/span&gt;&lt;/li&gt;
&lt;li role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;The following variables will be used in the code. These variables will need to be replaced or declared within a .tfvars file and assigned a value based on the specific project.&lt;/span&gt;
&lt;ul&gt;
&lt;li role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;var.gcp_project&lt;/span&gt;&lt;/li&gt;
&lt;li role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;Var.service_account&lt;/span&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;In conclusion, there are multiple ways to bootstrap a process for releasing artifacts across Organizations. Each method would have its pros and cons, the best one for the approach would be determined by evaluating the use case at hand. The things to consider here would be, if the artifacts can reside in a Git repository, if the target repository is in the same Organization or a child Organization and if CI/CD tooling is preferred.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;If you have gotten this far it’s likely you may have a good use case for this solution. This pattern can also be used for other similar use cases. Here are a couple examples just to get you started:&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;Copying other types of artifacts from AR repositories like Kubeflow Pipeline Templates (kfp)&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;Copying bucket objects behind a VPC-SC between projects or Orgs&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;&lt;strong style="vertical-align: baseline;"&gt;Learn more&lt;/strong&gt;&lt;/h2&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;Our solution code can be found here: &lt;/span&gt;&lt;a href="https://github.com/GoogleCloudPlatform/devrel-demos/tree/main/devops/inter-org-artifacts-release" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;https://github.com/GoogleCloudPlatform/devrel-demos/tree/main/devops/inter-org-artifacts-release&lt;/span&gt;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;GCrane: &lt;/span&gt;&lt;a href="https://github.com/google/go-containerregistry/blob/main/cmd/gcrane/README.md" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;https://github.com/google/go-containerregistry/blob/main/cmd/gcrane/README.md&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; &lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;Configuring Pub/Sub GCR notifications: &lt;/span&gt;&lt;a href="https://cloud.google.com/artifact-registry/docs/configure-notifications" style="font-family: -apple-system, BlinkMacSystemFont, 'Segoe UI', Roboto, Oxygen, Ubuntu, Cantarell, 'Open Sans', 'Helvetica Neue', sans-serif;"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;https://cloud.google.com/artifact-registry/docs/configure-notifications&lt;/span&gt;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;&lt;/div&gt;</description><pubDate>Mon, 20 May 2024 16:00:00 +0000</pubDate><guid>https://cloud.google.com/blog/products/serverless/artifact-registry-across-your-cloud/</guid><category>Application Modernization</category><category>DevOps &amp; SRE</category><category>Developers &amp; Practitioners</category><category>Serverless</category><og xmlns:og="http://ogp.me/ns#"><type>article</type><title>Releasing Artifact Registry assets across Organizations and Projects with serverless</title><description></description><site_name>Google</site_name><url>https://cloud.google.com/blog/products/serverless/artifact-registry-across-your-cloud/</url></og><author xmlns:author="http://www.w3.org/2005/Atom"><name>Guillermo Noriega</name><title>Infrastructure Cloud Consultant</title><department></department><company></company></author><author xmlns:author="http://www.w3.org/2005/Atom"><name>Vipul Raja</name><title>Infrastructure Cloud Consultant</title><department></department><company></company></author></item><item><title>Firestore integration with Eventarc reaches GA with Auth Context</title><link>https://cloud.google.com/blog/products/databases/firestore-eventarc-integration-now-ga-with-auth-context/</link><description>&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Creating event-driven architectures using &lt;/span&gt;&lt;a href="https://cloud.google.com/blog/topics/developers-practitioners/eventarc-unified-eventing-experience-google-cloud"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Eventarc&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; together with &lt;/span&gt;&lt;a href="https://firebase.google.com/docs/firestore" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Firestore&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; is an increasingly popular pattern. 
Recently, the &lt;/span&gt;&lt;a href="https://cloud.google.com/datastore/docs/eventarc"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Firestore integration with Eventarc&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; became generally available, adding new functionality. You can now register multiple Cloud Functions in different regions against a multi-regional Firestore database for increased reliability, and there are new event types, including the &lt;/span&gt;&lt;a href="https://github.com/cloudevents/spec/blob/main/cloudevents/extensions/authcontext.md" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Auth Context extension for CloudEvents&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;. &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Determining who or what — the user, a service account, the system, or a third-party — is making a modification to a Firestore document as a change event has long been a top-requested feature. With the new Firestore event types with Auth Context extension, events now embed metadata about the principal that triggered a document change in the open and portable &lt;/span&gt;&lt;a href="https://github.com/googleapis/google-cloudevents" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;CloudEvents&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; format.&lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Example walkthrough&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Let’s say that you want to have different logic to process events in the destinations for different auth contexts (i.e. &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;unauthenticated&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; or &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;system&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;). To set up your trigger, navigate to the &lt;/span&gt;&lt;a href="https://cloud.google.com/functions/docs/calling/eventarc#deployment"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Eventarc section of the Google Cloud console&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;. You’ll need to create a new &lt;/span&gt;&lt;a href="https://cloud.google.com/datastore/docs/eventarc"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;trigger&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; for Firestore using the associated event types that include authentication information. These event types end with the suffix *.withAuthContext. We’ll want to capture newly written entities, so we’ll select  &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;google.cloud.firestore.document.v1.written.withAuthContext &lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;events&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;:&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-image_full_width"&gt;






  
    &lt;div class="article-module h-c-page"&gt;
      &lt;div class="h-c-grid"&gt;
  

    &lt;figure class="article-image--large
      
      
        h-c-grid__col
        h-c-grid__col--6 h-c-grid__col--offset-3
        
        
      "
      &gt;

      
      
        
        &lt;img
            src="https://storage.googleapis.com/gweb-cloudblog-publish/images/blog_post_01.max-1000x1000.png"
        
          alt="blog_post_01"&gt;
        
        &lt;/a&gt;
      
    &lt;/figure&gt;

  
      &lt;/div&gt;
    &lt;/div&gt;
  




&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;You can specify additional filters, which ensures only desirable events from a specified database and collection are delivered. In this case, we filter for events from the &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;(default) &lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;database, and for documents of the collection &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;Ops&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;On the same screen, you’ll also need to specify a destination. Triggering events can be delivered to any number of supported Eventarc destinations, like &lt;/span&gt;&lt;a href="https://cloud.google.com/eventarc/docs/run/route-trigger-cloud-firestore"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Cloud Run&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, &lt;/span&gt;&lt;a href="https://cloud.google.com/functions/docs/calling/cloud-firestore"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Cloud Functions (2nd gen)&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, and &lt;/span&gt;&lt;a href="https://cloud.google.com/eventarc/docs/gke/route-trigger-cloud-firestore"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Google Kubernetes Engine&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;. Let’s say we have a Cloud Run service named &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;demo &lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;that exposes an HTTP endpoint to receive the events. You can configure your trigger as follows:&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-image_full_width"&gt;






  
    &lt;div class="article-module h-c-page"&gt;
      &lt;div class="h-c-grid"&gt;
  

    &lt;figure class="article-image--large
      
      
        h-c-grid__col
        h-c-grid__col--6 h-c-grid__col--offset-3
        
        
      "
      &gt;

      
      
        
        &lt;img
            src="https://storage.googleapis.com/gweb-cloudblog-publish/images/blog_post_02.max-1000x1000.png"
        
          alt="blog_post_02"&gt;
        
        &lt;/a&gt;
      
    &lt;/figure&gt;

  
      &lt;/div&gt;
    &lt;/div&gt;
  




&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;That’s it! When any write operation is applied to your &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;(default)&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; database with the collection &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;Ops&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;, a CloudEvent with the Auth Context is delivered to the configured Cloud Run service &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;demo&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; almost immediately. You can inspect the &lt;/span&gt;&lt;code style="vertical-align: baseline;"&gt;authtype&lt;/code&gt;&lt;span style="vertical-align: baseline;"&gt; attribute as defined in &lt;/span&gt;&lt;a href="https://github.com/cloudevents/spec/blob/main/cloudevents/extensions/authcontext.md" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Auth Context extension&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; to identify &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;unauthenticated&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; and &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;system &lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;types as shown in &lt;/span&gt;&lt;a href="https://cloud.google.com/firestore/docs/extend-with-functions-2nd-gen#event_attributes"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;https://cloud.google.com/firestore/docs/extend-with-functions-2nd-gen#event_attributes&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; . &lt;/span&gt;&lt;/p&gt;
&lt;h2&gt;&lt;strong style="vertical-align: baseline;"&gt;Next steps&lt;/strong&gt;&lt;/h2&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;For more information on how to set up and configure Firestore triggers, check out our &lt;/span&gt;&lt;a href="https://cloud.google.com/datastore/docs/eventarc"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;documentation&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;.&lt;/span&gt;&lt;/p&gt;
&lt;hr/&gt;
&lt;p&gt;&lt;sup&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;Thanks to both Minh Nguyen, Senior Product Manager Lead for Firestore and Juan Lara, Senior Technical Writer for Firestore, for their contributions to this blog post.&lt;/span&gt;&lt;/sup&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-related_article_tout"&gt;





&lt;div class="uni-related-article-tout h-c-page"&gt;
  &lt;section class="h-c-grid"&gt;
    &lt;a href="https://cloud.google.com/blog/products/databases/firestore-triggers-for-cloud-run-and-google-kubernetes-engine/"
       data-analytics='{
                       "event": "page interaction",
                       "category": "article lead",
                       "action": "related article - inline",
                       "label": "article: {slug}"
                     }'
       class="uni-related-article-tout__wrapper h-c-grid__col h-c-grid__col--8 h-c-grid__col-m--6 h-c-grid__col-l--6
        h-c-grid__col--offset-2 h-c-grid__col-m--offset-3 h-c-grid__col-l--offset-3 uni-click-tracker"&gt;
      &lt;div class="uni-related-article-tout__inner-wrapper"&gt;
        &lt;p class="uni-related-article-tout__eyebrow h-c-eyebrow"&gt;Related Article&lt;/p&gt;

        &lt;div class="uni-related-article-tout__content-wrapper"&gt;
          &lt;div class="uni-related-article-tout__image-wrapper"&gt;
            &lt;div class="uni-related-article-tout__image" style="background-image: url('')"&gt;&lt;/div&gt;
          &lt;/div&gt;
          &lt;div class="uni-related-article-tout__content"&gt;
            &lt;h4 class="uni-related-article-tout__header h-has-bottom-margin"&gt;Firestore adds three new trigger destinations through an integration with Eventarc&lt;/h4&gt;
            &lt;p class="uni-related-article-tout__body"&gt;Firestore adds support for 3 new trigger destinations (Cloud Run, Cloud Functions Gen 2, Google Kubernetes Engine) through an integration...&lt;/p&gt;
            &lt;div class="cta module-cta h-c-copy  uni-related-article-tout__cta muted"&gt;
              &lt;span class="nowrap"&gt;Read Article
                &lt;svg class="icon h-c-icon" role="presentation"&gt;
                  &lt;use xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="#mi-arrow-forward"&gt;&lt;/use&gt;
                &lt;/svg&gt;
              &lt;/span&gt;
            &lt;/div&gt;
          &lt;/div&gt;
        &lt;/div&gt;
      &lt;/div&gt;
    &lt;/a&gt;
  &lt;/section&gt;
&lt;/div&gt;

&lt;/div&gt;</description><pubDate>Mon, 13 May 2024 16:00:00 +0000</pubDate><guid>https://cloud.google.com/blog/products/databases/firestore-eventarc-integration-now-ga-with-auth-context/</guid><category>Application Development</category><category>Containers &amp; Kubernetes</category><category>Serverless</category><category>Databases</category><og xmlns:og="http://ogp.me/ns#"><type>article</type><title>Firestore integration with Eventarc reaches GA with Auth Context</title><description></description><site_name>Google</site_name><url>https://cloud.google.com/blog/products/databases/firestore-eventarc-integration-now-ga-with-auth-context/</url></og><author xmlns:author="http://www.w3.org/2005/Atom"><name>Josué Urbina</name><title>Software Engineer, Firestore</title><department></department><company></company></author><author xmlns:author="http://www.w3.org/2005/Atom"><name>Hansi Mou</name><title>Software Engineer, Firestore</title><department></department><company></company></author></item><item><title>Direct VPC egress on Cloud Run is now generally available</title><link>https://cloud.google.com/blog/products/serverless/direct-vpc-egress-for-cloud-run-is-now-ga/</link><description>&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Today, we're launching the general availability (GA) of Direct VPC egress for Cloud Run. This feature enables your Cloud Run resources to send traffic directly to a VPC network without proxying it through Serverless VPC Access connectors, making it easier to set up, faster, and with lower costs.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;In fact, Direct VPC egress delivers approximately &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;twice the throughput&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; compared to both VPC connectors and the default Cloud Run internet egress path, offering up to 1 GB per second per instance. Whether you're sending traffic to destinations on the VPC, to other Google Cloud services like Cloud Storage, or to other destinations on the public internet, Direct VPC egress offers higher throughput and lower latency for performance-sensitive apps.&lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;What's new since the preview&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Notable improvements and new features:&lt;/span&gt;&lt;/p&gt;
&lt;ol&gt;
&lt;li aria-level="1" style="list-style-type: decimal; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;All regions where Cloud Run is available are now enabled for Direct VPC egress.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: decimal; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;Each Cloud Run service revision with Direct VPC egress can now scale beyond 100 instances, as controlled by a &lt;/span&gt;&lt;a href="https://cloud.google.com/run/quotas"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;quota&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;. There is a standard quota increase request &lt;/span&gt;&lt;a href="https://cloud.google.com/run/quotas#increase"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;process&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; if you need to scale even more.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: decimal; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;a href="https://cloud.google.com/nat/docs/overview"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Cloud NAT&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; is supported, and Direct VPC egress traffic is now included in VPC Flow Logs and Firewall Rules Logging.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;These updates address the top issues reported by our preview customers, especially larger customers with advanced scalability, networking, and security requirements. &lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Customer feedback&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Many customers have been trying Direct VPC egress in preview since last year and have given us great feedback, including DZ BANK:&lt;/span&gt;&lt;/p&gt;
&lt;p style="padding-left: 40px;"&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;"With Direct VPC egress for Cloud Run, the platform team can more easily onboard new Cloud Run workloads because we no longer need to maintain Serverless VPC Access connectors and their associated dedicated /28 subnets. In our dynamic environment, where new Cloud Run services are created regularly, &lt;/span&gt;&lt;strong style="font-style: italic; vertical-align: baseline;"&gt;this simpler networking architecture saves us 4-6 hours per week of manual toil&lt;/strong&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;. We have also deprovisioned 30+ VPC connectors, &lt;/span&gt;&lt;strong style="font-style: italic; vertical-align: baseline;"&gt;saving on the additional compute costs for running them&lt;/strong&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;." - Tim Harpe, Senior Cloud Engineer, DZ BANK&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;If you enable Direct VPC egress and &lt;/span&gt;&lt;a href="https://cloud.google.com/run/docs/securing/private-networking#egress-settings"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;send all your egress traffic to a VPC&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, you can use the same tools and capabilities for all your traffic, whether it comes from Cloud Run, GKE, or VMs.&lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Next steps&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Direct VPC egress is ready for your production workloads. Try it today and enjoy better performance and lower cost.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;For a primer about how Direct VPC egress works, check out our preview &lt;/span&gt;&lt;a href="https://cloud.google.com/blog/products/serverless/announcing-direct-vpc-egress-for-cloud-run"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;blog post&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; and its attached explainer &lt;/span&gt;&lt;a href="https://youtu.be/wCRA7HnZf0g" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;video&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;</description><pubDate>Tue, 23 Apr 2024 16:00:00 +0000</pubDate><guid>https://cloud.google.com/blog/products/serverless/direct-vpc-egress-for-cloud-run-is-now-ga/</guid><category>Networking</category><category>Serverless</category><og xmlns:og="http://ogp.me/ns#"><type>article</type><title>Direct VPC egress on Cloud Run is now generally available</title><description></description><site_name>Google</site_name><url>https://cloud.google.com/blog/products/serverless/direct-vpc-egress-for-cloud-run-is-now-ga/</url></og><author xmlns:author="http://www.w3.org/2005/Atom"><name>Wietse Venema</name><title>Developer Relations Engineer</title><department></department><company></company></author><author xmlns:author="http://www.w3.org/2005/Atom"><name>Xiaowen Xin</name><title>Product Manager, Serverless Networking and Security</title><department></department><company></company></author></item><item><title>Attention DevOps engineers: Top managed container sessions to add to your Next ‘24 agenda</title><link>https://cloud.google.com/blog/products/containers-kubernetes/next24-sessions-about-managed-container-runtimes/</link><description>&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Google Cloud Next ‘24 is around the corner, and it’s the 
place to be if you’re serious about cloud development! Starting April 9 in Las Vegas, this global event promises a deep dive into the latest updates, features, and integrations for the services of Google Cloud’s managed container platform, Google Kubernetes Engine (GKE) and Cloud Run. From effortlessly scaling and optimizing AI models to providing tailored environments across a range of workloads — there’s a session for everyone. &lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;Whether you’re a seasoned cloud pro or just starting your serverless journey, you can expect to learn new insights and skills to help you deliver powerful, yet flexible, managed container environments in this next era of AI innovation.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Don’t forget to add these sessions to your event agenda — you won’t want to miss them.&lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Google Kubernetes Engine sessions&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;a href="https://cloud.withgoogle.com/next/session-library?session=OPS212&amp;amp;utm_source=copylink&amp;amp;utm_medium=unpaidsoc&amp;amp;utm_campaign=FY24-Q2-global-ENDM33-physicalevent-er-next-2024-mc&amp;amp;utm_content=next-homepage-social-share&amp;amp;utm_term=-" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;OPS212: How Anthropic uses Google Kubernetes Engine to run inference for Claude&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; Learn how Anthropic is using GKE's resource management and scaling capabilities to run inference for Claude, its family of foundational AI models, on TPU v5e [&lt;/span&gt;&lt;a href="https://www.youtube.com/watch?v=b87I1plPeMg" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Recording&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, &lt;/span&gt;&lt;a href="https://assets.swoogo.com/uploads/3784324-66186886e88fb.pdf" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Slides&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;].&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://cloud.withgoogle.com/next?session=OPS200&amp;amp;utm_source=copylink&amp;amp;utm_medium=unpaidsoc&amp;amp;utm_campaign=FY24-Q2-global-ENDM33-physicalevent-er-next-2024-mc&amp;amp;utm_content=next-homepage-social-share&amp;amp;utm_term=-" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;OPS200: The past, present, and future of Google Kubernetes Engine&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; Kubernetes is turning 10 this year in June! Since its launch, Kubernetes has become the de facto platform to run and scale containerized workloads. The Google team will reflect on the past decade, highlight how some of the top GKE customers use our managed solution to run their businesses, and discuss what the future holds [&lt;/span&gt;&lt;a href="https://www.youtube.com/watch?v=7p5omI1kOws" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Recording&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, &lt;/span&gt;&lt;a href="https://assets.swoogo.com/uploads/3784337-661869eaa8999.pdf" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Slides&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;].&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://cloud.withgoogle.com/next?session=DEV201&amp;amp;utm_source=copylink&amp;amp;utm_medium=unpaidsoc&amp;amp;utm_campaign=FY24-Q2-global-ENDM33-physicalevent-er-next-2024-mc&amp;amp;utm_content=next-homepage-social-share&amp;amp;utm_term=-" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;DEV201: Go from large language model to market faster with Ray, Hugging Face, and LangChain&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; &lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;Learn how to deploy Retrieval-Augmented Generation (RAG) applications on GKE using open-source tools and models like Ray, HuggingFace, and LangChain. We’ll also show you how to augment the application with your own enterprise data using the pgvector extension in Cloud SQL. After this session, you’ll be able to deploy your own RAG app on GKE and customize it [&lt;/span&gt;&lt;a href="https://www.youtube.com/watch?v=qwFCZKKFXd4" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Recording&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, &lt;/span&gt;&lt;a href="https://assets.swoogo.com/uploads/3784203-66185e86b90d1.pdf" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Slides&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;].&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://cloud.withgoogle.com/next?session=DEV240&amp;amp;utm_source=copylink&amp;amp;utm_medium=unpaidsoc&amp;amp;utm_campaign=FY24-Q2-global-ENDM33-physicalevent-er-next-2024-mc&amp;amp;utm_content=next-homepage-social-share&amp;amp;utm_term=-" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;DEV240: Run workloads not infrastructure with Google Kubernetes Engine&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; &lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;Join this session to learn how GKE's automated infrastructure can simplify running Kubernetes in production. You’ll explore cost optimization, autoscaling, and Day 2 operations, and learn how GKE allows you to focus on building and running applications instead of managing infrastructure [&lt;/span&gt;&lt;a href="https://assets.swoogo.com/uploads/3794526-661c3d5b569ce.pdf" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Slides&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;].&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://cloud.withgoogle.com/next?session=OPS217&amp;amp;utm_source=copylink&amp;amp;utm_medium=unpaidsoc&amp;amp;utm_campaign=FY24-Q2-global-ENDM33-physicalevent-er-next-2024-mc&amp;amp;utm_content=next-homepage-social-share&amp;amp;utm_term=-" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;OPS217: Access traffic management for your fleet using Google Kubernetes Engine Enterprise&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; Multi-cluster and tenant management are becoming an increasingly important topic. The platform team will show you how GKE Enterprise makes operating a fleet of clusters easy, and how to set up multi-cluster networking to manage traffic by combining it with the Kubernetes Gateway API controllers for GKE [&lt;/span&gt;&lt;a href="https://assets.swoogo.com/uploads/3784212-66185f47a10c9.pdf" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Slides&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;].&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://cloud.withgoogle.com/next?session=OPS304&amp;amp;utm_source=copylink&amp;amp;utm_medium=unpaidsoc&amp;amp;utm_campaign=FY24-Q2-global-ENDM33-physicalevent-er-next-2024-mc&amp;amp;utm_content=next-homepage-social-share&amp;amp;utm_term=-" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;OPS304: Build an internal developer platform on Google Kubernetes Engine Enterprise&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Internal developer platforms (IDPs) are simplifying how developers work, enabling them to be more productive by focusing on providing value and letting the platform do all the heavy lifting. In this session, the platform team will show you how GKE Enterprise can serve as a great starting point for launching your IDP and demo the GKE Enterprise capabilities that make it all possible [&lt;/span&gt;&lt;a href="https://www.youtube.com/watch?v=T4LiXvEiPuU" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Recording&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, &lt;/span&gt;&lt;a href="https://assets.swoogo.com/uploads/3784212-66185f47a10c9.pdf" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Slides&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;].&lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Cloud Run sessions&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;a href="https://cloud.withgoogle.com/next?session=DEV205&amp;amp;utm_source=copylink&amp;amp;utm_medium=unpaidsoc&amp;amp;utm_campaign=FY24-Q2-global-ENDM33-physicalevent-er-next-2024-mc&amp;amp;utm_content=next-homepage-social-share&amp;amp;utm_term=-" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;DEV205: Cloud Run – What's new&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Join this session to learn what's new and improved in Cloud Run in two major areas — enterprise architecture and application management [&lt;/span&gt;&lt;a href="https://www.youtube.com/watch?v=YA50fuOdaqQ" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Recording&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, &lt;/span&gt;&lt;a href="https://assets.swoogo.com/uploads/3784231-66186072d78aa.pdf" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Slides&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;].&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://cloud.withgoogle.com/next?session=DEV222&amp;amp;utm_source=copylink&amp;amp;utm_medium=unpaidsoc&amp;amp;utm_campaign=FY24-Q2-global-ENDM33-physicalevent-er-next-2024-mc&amp;amp;utm_content=next-homepage-social-share&amp;amp;utm_term=-" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;DEV222:&lt;/span&gt;&lt;strong style="text-decoration: underline; vertical-align: baseline;"&gt; &lt;/strong&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Live-code an app with Cloud Run and Flutter&lt;/span&gt;&lt;/a&gt;&lt;strong style="vertical-align: baseline;"&gt; &lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;During this session, see the Cloud Run developer experience in real time. Follow along as two Google Developer Relations Engineers live-code a Flutter application backed by Firestore and powered by an API running on Cloud Run [&lt;/span&gt;&lt;a href="https://assets.swoogo.com/uploads/3784202-66185e868d8c1.pdf" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Slides&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;].&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://cloud.withgoogle.com/next?session=DEV208&amp;amp;utm_source=copylink&amp;amp;utm_medium=unpaidsoc&amp;amp;utm_campaign=FY24-Q2-global-ENDM33-physicalevent-er-next-2024-mc&amp;amp;utm_content=next-homepage-social-share&amp;amp;utm_term=-'" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;DEV208: Navigating Google Cloud - A comprehensive guide for website deployment&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; Learn about the major options for deploying websites on Google Cloud. This session will cover the full range of tools and services available to match different deployment strategies — from simple buckets to containerized solutions to serverless platforms like Cloud Run [&lt;/span&gt;&lt;a href="https://www.youtube.com/watch?v=2cFXatR4VAI" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Recording&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, &lt;/span&gt;&lt;a href="https://assets.swoogo.com/uploads/3784706-661872b7bedd2.pdf" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Slides&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;].&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://cloud.withgoogle.com/next?session=DEV235&amp;amp;utm_source=copylink&amp;amp;utm_medium=unpaidsoc&amp;amp;utm_campaign=FY24-Q2-global-ENDM33-physicalevent-er-next-2024-mc&amp;amp;utm_content=next-homepage-social-share&amp;amp;utm_term=-" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;DEV235: Java on Google Cloud — The enterprise, the serverless, and the native&lt;/span&gt;&lt;/a&gt;&lt;strong style="vertical-align: baseline;"&gt; &lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;In this session, you’ll learn how to deploy Java Cloud apps to Google Cloud and explore all the options for running Java workloads using various frameworks [&lt;/span&gt;&lt;a href="https://www.youtube.com/watch?v=DOf18oxjFB8" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Recording&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, &lt;/span&gt;&lt;a href="https://assets.swoogo.com/uploads/3776891-6616fe08e8465.pdf" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Slides&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;].&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://cloud.withgoogle.com/next?session=DEV237&amp;amp;utm_source=copylink&amp;amp;utm_medium=unpaidsoc&amp;amp;utm_campaign=FY24-Q2-global-ENDM33-physicalevent-er-next-2024-mc&amp;amp;utm_content=next-homepage-social-share&amp;amp;utm_term=-" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;DEV237: Roll up your sleeves - Craft real-world generative AI Java in Cloud Run &lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;In this session, you’ll learn how to build powerful gen AI applications in Java and deploy them on Cloud Run using Vertex AI and Gemini models [&lt;/span&gt;&lt;a href="https://assets.swoogo.com/uploads/3794515-661c3ba37f784.pdf" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Slides&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;].&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://cloud.withgoogle.com/next?session=DEV253&amp;amp;utm_source=copylink&amp;amp;utm_medium=unpaidsoc&amp;amp;utm_campaign=FY24-Q2-global-ENDM33-physicalevent-er-next-2024-mc&amp;amp;utm_content=next-homepage-social-share&amp;amp;utm_term=-" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;DEV253: Building generative AI apps on Google Cloud with LangChain&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; &lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;Join this session to learn how to combine the popular open-source framework LangChain and Cloud Run to build LLM-based applications [&lt;/span&gt;&lt;a href="https://www.youtube.com/watch?v=l7tNx52bnsc" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Recording&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, &lt;/span&gt;&lt;a href="https://assets.swoogo.com/uploads/3784012-66185115f3834.pdf" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Slides&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;].&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://cloud.withgoogle.com/next?session=DEV228&amp;amp;utm_source=copylink&amp;amp;utm_medium=unpaidsoc&amp;amp;utm_campaign=FY24-Q2-global-ENDM33-physicalevent-er-next-2024-mc&amp;amp;utm_content=next-homepage-social-share&amp;amp;utm_term=-" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;DEV228: How to deploy all the JavaScript frameworks to Cloud Run&lt;/span&gt;&lt;/a&gt;&lt;strong style="vertical-align: baseline;"&gt; &lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;Have you ever wondered if you can deploy JavaScript applications to Cloud Run? Find out in this session as one Google Cloud developer advocate sets out to prove that you can by deploying as many JavaScript frameworks to Cloud Run as possible [&lt;/span&gt;&lt;a href="https://assets.swoogo.com/uploads/3794523-661c3c8fe0a8b.pdf" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Slides&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;].&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://cloud.withgoogle.com/next?session=DEV241&amp;amp;utm_source=copylink&amp;amp;utm_medium=unpaidsoc&amp;amp;utm_campaign=FY24-Q2-global-ENDM33-physicalevent-er-next-2024-mc&amp;amp;utm_content=next-homepage-social-share&amp;amp;utm_term=-" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;DEV241: Cloud-powered, API-first testing with Testcontainers and Kotlin &lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;Testcontainers is a popular API-first framework for testing applications. In this session, you’ll learn how to use the framework with an end-to-end example that uses Kotlin code with BigQuery, Pub/Sub, Cloud Build, and Cloud Run to improve the testing feedback cycle [&lt;/span&gt;&lt;a href="https://assets.swoogo.com/uploads/3784216-66185f8ce3ec9.pdf" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Slides&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;].&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://cloud.withgoogle.com/next?session=ARC104&amp;amp;utm_source=copylink&amp;amp;utm_medium=unpaidsoc&amp;amp;utm_campaign=FY24-Q2-global-ENDM33-physicalevent-er-next-2024-mc&amp;amp;utm_content=next-homepage-social-share&amp;amp;utm_term=-" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;ARC104&lt;/span&gt;&lt;strong style="text-decoration: underline; vertical-align: baseline;"&gt;: &lt;/strong&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;The ultimate hybrid example - A fireside chat about how Google Cloud powers (part of) Alphabet&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; &lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;Join this fireside chat to learn about the ultimate hybrid use case — running Alphabet services in some of Google Cloud’s most popular offerings. Learn how Alphabet leverages Google Cloud runtimes like GKE, why it doesn’t run everything on Google Cloud, and the reason some products run partially on cloud [&lt;/span&gt;&lt;a href="https://assets.swoogo.com/uploads/3817425-66200c7c3dc9d.pdf" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Slides&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;].&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://cloud.withgoogle.com/next/session-library?session=DEV202&amp;amp;utm_source=copylink&amp;amp;utm_medium=unpaidsoc&amp;amp;utm_campaign=FY24-Q2-global-ENDM33-physicalevent-er-next-2024-mc&amp;amp;utm_content=next-homepage-social-share&amp;amp;utm_term=-" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;DEV202: Accelerate your AI with Serverless&lt;/span&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Serverless platforms and generative AI applications are a great match. In this talk you'll learn how Google Cloud's pay-as-you-go model for serverless runtimes can be used to supplement your generative AI model with function calling [&lt;/span&gt;&lt;a href="https://www.youtube.com/watch?v=Pw1OkO0b-kM" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Recording&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, &lt;/span&gt;&lt;a href="https://assets.swoogo.com/uploads/3771982-6615e3b4ad925.pdf" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Slides&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;].&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://cloud.withgoogle.com/next/session-library?session=DEV229#all" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;DEV229: A Java developer walks into a serverless bar&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; This session is for Java developers who want to learn how to deploy their apps to Google Cloud. It offers a practical guide to considerations, challenges, tips, and tricks for optimizing your JVM for serverless environments [&lt;/span&gt;&lt;a href="https://www.youtube.com/watch?v=pjk6zuaUFOM&amp;amp;t=23s" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Recording&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, &lt;/span&gt;&lt;a href="https://assets.swoogo.com/uploads/3794534-661c3de70c780.pdf" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Slides&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;].&lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Firebase sessions&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;a href="https://cloud.withgoogle.com/next?session=DEV221&amp;amp;utm_source=copylink&amp;amp;utm_medium=unpaidsoc&amp;amp;utm_campaign=FY24-Q2-global-ENDM33-physicalevent-er-next-2024-mc&amp;amp;utm_content=next-homepage-social-share&amp;amp;utm_term=-" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;DEV221: Use Firebase for faster, easier mobile application development&lt;/span&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Firebase is a beloved platform for developers, helping them develop apps faster and more efficiently. This session will show you how Firebase can accelerate application development with prebuilt backend services, including authentication, databases and storage [&lt;/span&gt;&lt;a href="https://www.youtube.com/watch?v=wXjh4BPnrnI" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Recording&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, &lt;/span&gt;&lt;a href="https://assets.swoogo.com/uploads/3784075-6618560652821.pdf" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Slides&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;].&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://cloud.withgoogle.com/next?session=DEV243&amp;amp;utm_source=copylink&amp;amp;utm_medium=unpaidsoc&amp;amp;utm_campaign=FY24-Q2-global-ENDM33-physicalevent-er-next-2024-mc&amp;amp;utm_content=next-homepage-social-share&amp;amp;utm_term=-" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;DEV243: Build full stack applications using Firebase and Google Cloud &lt;/span&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Firebase and Google Cloud can be used together to build and run full stack applications. In this session, you’ll learn how to combine these two powerful platforms to enable enterprise-grade applications development and create better experiences for users [&lt;/span&gt;&lt;a href="https://assets.swoogo.com/uploads/3784183-66185c1d8e008.pdf" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Slides&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;].&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://cloud.withgoogle.com/next?session=DEV107&amp;amp;utm_source=copylink&amp;amp;utm_medium=unpaidsoc&amp;amp;utm_campaign=FY24-Q2-global-ENDM33-physicalevent-er-next-2024-mc&amp;amp;utm_content=next-homepage-social-share&amp;amp;utm_term=-" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;DEV107: Make your app super with Google Cloud Firebase &lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;Learn how Firebase and Google Cloud are the superhero duo you need to build enterprise-scale AI applications. This session will show you how to extend Firebase with Google Cloud using Gemini — our most capable and flexible AI model yet — to build, secure, and scale your AI apps [&lt;/span&gt;&lt;a href="https://assets.swoogo.com/uploads/3783904-66184ceb07399.pdf" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Slides&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;].&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://cloud.withgoogle.com/next?session=DEV250&amp;amp;utm_source=copylink&amp;amp;utm_medium=unpaidsoc&amp;amp;utm_campaign=FY24-Q2-global-ENDM33-physicalevent-er-next-2024-mc&amp;amp;utm_content=next-homepage-social-share&amp;amp;utm_term=-" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;DEV250: Generative AI web development with Angular&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; In this session, you’ll explore how to use Angular v18 and Firebase hosting to build and deploy lightning-fast applications with Google's Gemini generative AI [&lt;/span&gt;&lt;a href="https://www.youtube.com/watch?v=5FdtPwZrkGw" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Recording&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, &lt;/span&gt;&lt;a href="https://assets.swoogo.com/uploads/3794533-661c3de710c9f.pdf" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Slides&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;].&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;See you at the show!&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;</description><pubDate>Fri, 05 Apr 2024 16:00:00 +0000</pubDate><guid>https://cloud.google.com/blog/products/containers-kubernetes/next24-sessions-about-managed-container-runtimes/</guid><category>Serverless</category><category>Application Modernization</category><category>Google Cloud Next</category><category>Containers &amp; Kubernetes</category><media:content height="540" url="https://storage.googleapis.com/gweb-cloudblog-publish/images/Google_Cloud_Next_2024_b.max-600x600.jpg" width="540"></media:content><og xmlns:og="http://ogp.me/ns#"><type>article</type><title>Attention DevOps engineers: Top managed container sessions to add to your Next ‘24 agenda</title><description></description><image>https://storage.googleapis.com/gweb-cloudblog-publish/images/Google_Cloud_Next_2024_b.max-600x600.jpg</image><site_name>Google</site_name><url>https://cloud.google.com/blog/products/containers-kubernetes/next24-sessions-about-managed-container-runtimes/</url></og><author xmlns:author="http://www.w3.org/2005/Atom"><name>Abdel Sghiouar</name><title>Senior Cloud Developer Advocate</title><department></department><company></company></author></item><item><title>Introducing Cloud Run volume mounts: connect your app to Cloud Storage or NFS</title><link>https://cloud.google.com/blog/products/serverless/introducing-cloud-run-volume-mounts/</link><description>&lt;div class="block-paragraph"&gt;&lt;p data-block-key="vdaxa"&gt;We’ve built Cloud Run, a fully managed container platform, directly on top of Google’s scalable infrastructure to simplify developers’ lives and make it easier to build cloud-native applications.&lt;/p&gt;&lt;p data-block-key="4g466"&gt;As it stands, each Cloud Run instance has access to its own local file system. But what if you have an existing application that expects to access &lt;b&gt;shared&lt;/b&gt; data stored in a local file system? 
Without a straightforward way to mount storage systems like file servers or Cloud Storage buckets, developers had to use complex solutions or look to other services. Today, we’re excited to launch a new feature in preview: volume mounts.&lt;/p&gt;&lt;p data-block-key="c3l6u"&gt;With volume mounts, mounting a volume in a Cloud Run service or job is a single command. You can mount a Cloud Storage bucket or an NFS share, like a Cloud Filestore instance. This allows your container to seamlessly access the storage bucket or file server content as if the files were local, utilizing file system semantics for a familiar experience.&lt;/p&gt;&lt;p data-block-key="bnqv2"&gt;You can mount a Cloud Storage bucket by updating your Cloud Run service with the following command (you can find more details and instructions in the Try it out section below):&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-code"&gt;&lt;dl&gt;
    &lt;dt&gt;code_block&lt;/dt&gt;
    &lt;dd&gt;&amp;lt;ListValue: [StructValue([(&amp;#x27;code&amp;#x27;, &amp;#x27;gcloud beta run services update [SERVICE_NAME] \\\r\n--execution-environment gen2 \\ \r\n--add-volume=name=v_mount,type=cloud-storage,bucket=[YOUR_BUCKET_NAME]  \\\r\n--add-volume-mount=volume=v_mount,mount-path=[MOUNT_PATH]&amp;#x27;), (&amp;#x27;language&amp;#x27;, &amp;#x27;&amp;#x27;), (&amp;#x27;caption&amp;#x27;, &amp;lt;wagtail.rich_text.RichText object at 0x7fe848f5f8e0&amp;gt;)])]&amp;gt;&lt;/dd&gt;
&lt;/dl&gt;&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Volume mounts come in handy in a number of situations. &lt;/span&gt;&lt;/p&gt;
&lt;h3 role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;1. Store app configuration&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Let's consider a scenario where you need to add config files to your service. When applications launch, they often need to gather information about their environment and load initial settings to determine their behavior.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;In the past, we’ve seen customers use &lt;/span&gt;&lt;a href="https://cloud.google.com/security/products/secret-manager"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Secret Manager&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; to store and mount this information, but for configuration data that doesn’t need to be kept secret, Cloud Storage is a more straightforward solution. Simply put all your configuration into a file in your preferred format, upload the file to a Cloud Storage bucket, and mount the bucket in your Cloud Run service or job at the required path. &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Since Cloud Run supports pulling public container images from Docker Hub directly, mounting your own config files to customize official images (like Grafana or Nginx) becomes very convenient. There's no need to build, add your config files and host your own container image. Just deploy an official container image from Docker Hub directly, store your config files in a Cloud Storage bucket and mount them where they are expected.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-image_full_width"&gt;






  
    &lt;div class="article-module h-c-page"&gt;
      &lt;div class="h-c-grid"&gt;
  

    &lt;figure class="article-image--large
      
      
        h-c-grid__col
        h-c-grid__col--6 h-c-grid__col--offset-3
        
        
      "
      &gt;

      
      
        
        &lt;img
            src="https://storage.googleapis.com/gweb-cloudblog-publish/images/1_-_Store_app_configuration.max-1000x1000.png"
        
          alt="1 - Store app configuration"&gt;
        
        &lt;/a&gt;
      
    &lt;/figure&gt;

  
      &lt;/div&gt;
    &lt;/div&gt;
  




&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;h3 role="presentation"&gt;&lt;strong&gt;&lt;span style="vertical-align: baseline;"&gt;2. &lt;/span&gt;&lt;/strong&gt;&lt;strong style="vertical-align: baseline;"&gt;Event-driven Cloud Storage handlers&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Many applications are built using an event-driven design pattern. A common use case is executing custom code based on a new file being uploaded to a Cloud Storage bucket. &lt;/span&gt;&lt;a href="https://cloud.google.com/blog/topics/developers-practitioners/eventarc-unified-eventing-experience-google-cloud"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;EventArc&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; is a great tool to listen to such events and to trigger a Cloud Run service directly. It forwards all relevant event metadata, including the file name and location — but not the file itself. Until now, to retrieve and process the file, you needed to use the Cloud Storage client SDK to explicitly retrieve it. &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;With volume mounts you can now mount the relevant bucket directly. This allows you to access the file directly via the filesystem, eliminating the need for custom code to fetch it.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-image_full_width"&gt;






  
    &lt;div class="article-module h-c-page"&gt;
      &lt;div class="h-c-grid"&gt;
  

    &lt;figure class="article-image--large
      
      
        h-c-grid__col
        h-c-grid__col--6 h-c-grid__col--offset-3
        
        
      "
      &gt;

      
      
        
        &lt;img
            src="https://storage.googleapis.com/gweb-cloudblog-publish/images/2-_Event-driven_Cloud_Storage_handlers.max-1000x1000.png"
        
          alt="2- Event-driven Cloud Storage handlers"&gt;
        
        &lt;/a&gt;
      
    &lt;/figure&gt;

  
      &lt;/div&gt;
    &lt;/div&gt;
  




&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;h3 role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;3. Load a vector database file&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;If you have your Langchain application deployed on Cloud Run, you may need a vector database, like &lt;/span&gt;&lt;a href="https://www.trychroma.com/" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;ChromaDB&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;. Because the indexed documents are constantly changing, mounting an NFS storage is a great way to keep your service stateless and externalize your ChromaDB collection from the container — all while having a dedicated ingestion pipeline for new documents outside your service. &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Persisted ChromaDB collections can grow large quickly, so Cloud Filestore is a fast option to access them from all your Cloud Run application instances.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-image_full_width"&gt;






  
    &lt;div class="article-module h-c-page"&gt;
      &lt;div class="h-c-grid"&gt;
  

    &lt;figure class="article-image--large
      
      
        h-c-grid__col
        h-c-grid__col--6 h-c-grid__col--offset-3
        
        
      "
      &gt;

      
      
        
        &lt;img
            src="https://storage.googleapis.com/gweb-cloudblog-publish/images/3_-_Load_a_vector_database.max-1000x1000.png"
        
          alt="3 - Load a vector database"&gt;
        
        &lt;/a&gt;
      
    &lt;/figure&gt;

  
      &lt;/div&gt;
    &lt;/div&gt;
  




&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;h3 role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;4. Serve a static website&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;For simple public-facing file hosting, you could directly use Cloud Storage’s &lt;/span&gt;&lt;a href="https://cloud.google.com/storage/docs/hosting-static-website"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;static website hosting feature&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;. However, if you need private networking features or a simple login experience via &lt;/span&gt;&lt;a href="https://cloud.google.com/security/products/iap?hl=en"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Identity-Aware Proxy (IAP)&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, Cloud Run is an excellent choice. &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Previously, you had to copy all the static files into your container image to host and serve them from there. However, that required you to rebuild the image and then redeploy your service every time you changed the static content. &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;With volume mounts, you can now use a standard NGINX web server and serve your files from a mounted Cloud Storage bucket. Cloud Run accesses your files using standard file system semantics, so you can use the official, publicly hosted NGINX container image from DockerHub directly. You can now modify your static assets or add new ones as needed, with all changes taking effect for your Cloud Run service promptly and without downtime. &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;This setup provides access to all the valuable ingress features, including IAP, while retaining the flexibility of storing your files in a Cloud Storage bucket.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-image_full_width"&gt;






  
    &lt;div class="article-module h-c-page"&gt;
      &lt;div class="h-c-grid"&gt;
  

    &lt;figure class="article-image--large
      
      
        h-c-grid__col
        h-c-grid__col--6 h-c-grid__col--offset-3
        
        
      "
      &gt;

      
      
        
        &lt;img
            src="https://storage.googleapis.com/gweb-cloudblog-publish/images/4_-_Serve_a_static_website.max-1000x1000.png"
        
          alt="4 - Serve a static website"&gt;
        
        &lt;/a&gt;
      
    &lt;/figure&gt;

  
      &lt;/div&gt;
    &lt;/div&gt;
  




&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;To run a simple web server that serves files from your Cloud Storage bucket, deploy the official NGINX image from Docker Hub and mount the bucket to the directory where NGINX expects to find static content: '/usr/share/nginx/html'. You can do this with a single command: &lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-code"&gt;&lt;dl&gt;
    &lt;dt&gt;code_block&lt;/dt&gt;
    &lt;dd&gt;&amp;lt;ListValue: [StructValue([(&amp;#x27;code&amp;#x27;, &amp;quot;gcloud beta run deploy --image nginx [SERVICE_NAME] \\\r\n--execution-environment gen2 --port 80 \\\r\n--add-volume=name=html-volume,type=cloud-storage,bucket=[YOUR_BUCKET_NAME],readonly=true \\\r\n--add-volume-mount=volume=html-volume,mount-path=&amp;#x27;/usr/share/nginx/html&amp;#x27;&amp;quot;), (&amp;#x27;language&amp;#x27;, &amp;#x27;&amp;#x27;), (&amp;#x27;caption&amp;#x27;, &amp;lt;wagtail.rich_text.RichText object at 0x7fe848f5fb50&amp;gt;)])]&amp;gt;&lt;/dd&gt;
&lt;/dl&gt;&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;As a last step, be sure to set up content caching, either through &lt;/span&gt;&lt;a href="https://docs.nginx.com/nginx/admin-guide/content-cache/content-caching/" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;NGINX content caching&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; or Cloud CDN. Without caching, each request triggers a Cloud Storage get request, which can lead to increased costs as well as unnecessary latency for your users.&lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Try it out&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;You can mount a Cloud Storage bucket or any NFS file share by &lt;/span&gt;&lt;a href="https://cloud.google.com/run/docs/configuring/services/cloud-storage-volume-mounts#mount-volume#command-line"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;using a gcloud command&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, &lt;/span&gt;&lt;a href="https://cloud.google.com/run/docs/configuring/services/cloud-storage-volume-mounts#yaml"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;updating the Cloud Run YAML resource definition&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; or by deploying via &lt;/span&gt;&lt;a href="https://registry.terraform.io/providers/hashicorp/google/latest/docs/resources/cloud_run_v2_service#example-usage---cloudrunv2-service-mount-nfs" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Terraform&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;. &lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;For example, you can perform a source-based deployment to a new Cloud Run job and mount a Cloud Storage bucket with the following command:&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-code"&gt;&lt;dl&gt;
    &lt;dt&gt;code_block&lt;/dt&gt;
    &lt;dd&gt;&amp;lt;ListValue: [StructValue([(&amp;#x27;code&amp;#x27;, &amp;#x27;gcloud beta run jobs deploy [SERVICE_NAME] --source \\\r\n--execution-environment gen2 \\\r\n--add-volume=name=[VOLUME_NAME],type=cloud-storage,bucket=[BUCKET_NAME] \\ \r\n--add-volume-mount=volume=[VOLUME_NAME],mount-path=[MOUNT_PATH]&amp;#x27;), (&amp;#x27;language&amp;#x27;, &amp;#x27;&amp;#x27;), (&amp;#x27;caption&amp;#x27;, &amp;lt;wagtail.rich_text.RichText object at 0x7fe848f5fa00&amp;gt;)])]&amp;gt;&lt;/dd&gt;
&lt;/dl&gt;&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Similarly, you can mount any NFS file share as a volume in Cloud Run. If you don’t already have an NFS server, we recommend using Cloud Filestore, Google Cloud’s fully managed NFS offering. &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;For more information and to get started, take a look at our documentation:&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;Mount a Cloud Storage bucket in a &lt;/span&gt;&lt;a href="https://cloud.google.com/run/docs/configuring/services/cloud-storage-volume-mounts"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;service&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; or &lt;/span&gt;&lt;a href="https://cloud.google.com/run/docs/configuring/jobs/cloud-storage-volume-mounts"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;job&lt;/span&gt;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;Mount an NFS volume in a &lt;/span&gt;&lt;a href="http://cloud.google.com/run/docs/configuring/services/nfs-volume-mounts"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;service&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; or &lt;/span&gt;&lt;a href="http://cloud.google.com/run/docs/configuring/jobs/nfs-volume-mounts"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;job&lt;/span&gt;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;We’re excited about how easy volume mounts in Cloud Run make it to access data, port existing applications, and even to configure some pre-built container images. Try this feature in preview today.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;</description><pubDate>Fri, 22 Mar 2024 16:00:00 +0000</pubDate><guid>https://cloud.google.com/blog/products/serverless/introducing-cloud-run-volume-mounts/</guid><category>Storage &amp; Data Transfer</category><category>Application Modernization</category><category>Serverless</category><og xmlns:og="http://ogp.me/ns#"><type>article</type><title>Introducing Cloud Run volume mounts: connect your app to Cloud Storage or NFS</title><description></description><site_name>Google</site_name><url>https://cloud.google.com/blog/products/serverless/introducing-cloud-run-volume-mounts/</url></og><author xmlns:author="http://www.w3.org/2005/Atom"><name>Christoph Stanger</name><title>Strategic Cloud Engineer</title><department></department><company></company></author><author xmlns:author="http://www.w3.org/2005/Atom"><name>Karolína Netolická</name><title>Product Manager</title><department></department><company></company></author></item><item><title>DZ BANK unlocks 70% toil savings and 90% cost savings with a Cloud Run-first approach</title><link>https://cloud.google.com/blog/products/serverless/how-dz-bank-uses-a-cloud-run-first-approach-to-unlock-big-savings/</link><description>&lt;div class="block-paragraph"&gt;&lt;p data-block-key="3buth"&gt;&lt;b&gt;&lt;i&gt;Editor’s note:&lt;/i&gt;&lt;/b&gt;&lt;i&gt; DZ BANK is the second largest bank by assets in Germany. 
In this post, Cloud Engineer Tim Harpe from DZ BANK shares how migrating to Google Cloud resulted in spectacular efficiency gains and cost savings.&lt;/i&gt;&lt;/p&gt;&lt;hr/&gt;&lt;p data-block-key="79k2e"&gt;DZ BANK chose Google Cloud to accelerate its digital transformation because Google Cloud offers cutting-edge technology and top-tier expertise, and meets the rigorous security and compliance standards demanded by the financial sector. In a short amount of time, we have containerized and migrated some of our most business-critical applications to Google Cloud and &lt;a href="https://cloud.google.com/run"&gt;Cloud Run&lt;/a&gt;, achieving &lt;b&gt;70% toil reduction, 90% cost savings&lt;/b&gt;, &lt;b&gt;and unlocking new capabilities&lt;/b&gt; through integrations with leading services like BigQuery.&lt;/p&gt;&lt;p data-block-key="cnpef"&gt;As a result of this success, we're proud to be recognized as one of the &lt;a href="https://cloud.google.com/blog/topics/customers/winners-of-2023-google-cloud-customer-awards-announced"&gt;winners of the Google Cloud Customer Award 2023&lt;/a&gt; in the Financial Services Industry category!&lt;/p&gt;&lt;h3 data-block-key="cjrfb"&gt;&lt;b&gt;Our modernization journey&lt;/b&gt;&lt;/h3&gt;&lt;p data-block-key="df5ed"&gt;As a 150-year-old institution, DZ BANK has a lot of legacy infrastructure, much of it still running on-premises as virtual machines (VMs), with over 55,000 CPUs. On-prem VMs require a lot of effort to manage, and they're expensive. A few years ago, we decided to modernize our stack to reduce our overhead, increase developer velocity, and strengthen our competitiveness.&lt;/p&gt;&lt;p data-block-key="5hg7e"&gt;Around 90% of our application stack is developed in-house and managed by us. 
To modernize each app, we first containerize it and then deploy it on either &lt;a href="https://cloud.google.com/run"&gt;Cloud Run&lt;/a&gt; or &lt;a href="https://cloud.google.com/kubernetes-engine"&gt;Google Kubernetes Engine (GKE)&lt;/a&gt;.&lt;/p&gt;&lt;p data-block-key="cg01g"&gt;Cloud Run is a great fit for many of our apps, especially ones that experience spikes in traffic. Since Cloud Run can quickly scale to thousands of containers, we don't need to provision for peak loads. We also have many smaller internal workloads that are not used during off hours. Now, we can scale them on request with Cloud Run, scale to zero when they're not being used, and only pay for the infrastructure when it's serving customer traffic. In the past, we separated each of these workloads into their own individual VMs that ran 24/7.&lt;/p&gt;&lt;p data-block-key="120in"&gt;For apps where we want more control over the underlying hardware, we put them on GKE. The first app we migrated to Google Cloud was newly developed in-house to calculate customer risk and creditworthiness. It ran every night and scaled up to 60,000 CPUs. GKE enabled us to fine-tune scalability and hardware settings to exactly meet our needs.&lt;/p&gt;&lt;h3 data-block-key="86ssm"&gt;&lt;b&gt;Innovating for the future&lt;/b&gt;&lt;/h3&gt;&lt;p data-block-key="3c3sd"&gt;Moving forward, DZ BANK is taking a Cloud Run-first approach with all newly developed apps to help us innovate for the future. One initiative within the bank is to be more sustainable and a more responsible citizen of the world, so we're now considering ESG (Environmental, Social and Governance) criteria in our credit processes. As part of this commitment, we developed a new app, running on Cloud Run, which evaluates creditworthiness based on ESG scores assigned to businesses.&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-image_full_width"&gt;






  
    &lt;div class="article-module h-c-page"&gt;
      &lt;div class="h-c-grid"&gt;
  

    &lt;figure class="article-image--large
      
      
        h-c-grid__col
        h-c-grid__col--6 h-c-grid__col--offset-3
        
        
      "
      &gt;

      
      
        
        &lt;img
            src="https://storage.googleapis.com/gweb-cloudblog-publish/images/1_5JF77BU.max-1000x1000.png"
        
          alt="1"&gt;
        
        &lt;/a&gt;
      
    &lt;/figure&gt;

  
      &lt;/div&gt;
    &lt;/div&gt;
  




&lt;/div&gt;
&lt;div class="block-paragraph"&gt;&lt;p data-block-key="3buth"&gt;Selecting Cloud Run has accelerated our development process. Cloud Run's built-in integrations for &lt;a href="https://cloud.google.com/sql"&gt;Cloud SQL&lt;/a&gt;, &lt;a href="https://cloud.google.com/security/products/secret-manager"&gt;Secret Manager&lt;/a&gt;, and &lt;a href="https://cloud.google.com/load-balancing"&gt;Cloud Load Balancing&lt;/a&gt; allowed us to create a simple architecture. Running on Google Cloud also makes it easy to access our custom data lake, which is powered by &lt;a href="https://cloud.google.com/bigquery"&gt;BigQuery&lt;/a&gt;.&lt;/p&gt;&lt;p data-block-key="1rf3o"&gt;From the perspective of our platform team, we also appreciate Cloud Run's approach to simplifying security. As the central bank for cooperative banks in Germany, DZ BANK must continuously meet stringent compliance and security standards. Cloud Run offers a &lt;a href="https://cloud.google.com/run/docs/securing/security"&gt;strong foundation&lt;/a&gt; with encryption and built-in access control, plus integrates with the broader Google Cloud ecosystem of security products like &lt;a href="https://cloud.google.com/run/docs/securing/using-cmek"&gt;customer-managed encryption keys (CMEK)&lt;/a&gt; and &lt;a href="https://cloud.google.com/run/docs/securing/using-vpc-service-controls"&gt;VPC Service Controls&lt;/a&gt;. This enables us to maintain control over our data while minimizing the risk of exfiltration.&lt;/p&gt;&lt;h3 data-block-key="6ladn"&gt;&lt;b&gt;Results and what's next&lt;/b&gt;&lt;/h3&gt;&lt;p data-block-key="56mq2"&gt;We are now three years into our modernization journey with Google Cloud, and we've seen significant benefits. Running containers on Cloud Run instead of on-prem VMs eliminates OS and infrastructure maintenance, delivering a &lt;b&gt;70% toil reduction&lt;/b&gt;. 
In addition, the pay-per-use pricing model has &lt;b&gt;reduced our infrastructure costs by 90%&lt;/b&gt;.&lt;/p&gt;&lt;p data-block-key="1rl5c"&gt;We look forward to further partnering with Google Cloud in our journey to further modernize our fleet while also building out new innovative apps for the future.&lt;/p&gt;&lt;hr/&gt;&lt;p data-block-key="6753k"&gt;&lt;i&gt;&lt;sup&gt;This blog post received contributions from various people. In particular, we would like to thank Latif Ajouaoui for strategic insights, Yuriy Babenko for technical oversight, and Wietse Venema for reviews.&lt;/sup&gt;&lt;/i&gt;&lt;/p&gt;&lt;/div&gt;</description><pubDate>Wed, 06 Mar 2024 17:00:00 +0000</pubDate><guid>https://cloud.google.com/blog/products/serverless/how-dz-bank-uses-a-cloud-run-first-approach-to-unlock-big-savings/</guid><category>Customers</category><category>Serverless</category><og xmlns:og="http://ogp.me/ns#"><type>article</type><title>DZ BANK unlocks 70% toil savings and 90% cost savings with a Cloud Run-first approach</title><description></description><site_name>Google</site_name><url>https://cloud.google.com/blog/products/serverless/how-dz-bank-uses-a-cloud-run-first-approach-to-unlock-big-savings/</url></og><author xmlns:author="http://www.w3.org/2005/Atom"><name>Tim Harpe</name><title>Cloud Engineer, DZ BANK</title><department></department><company></company></author><author xmlns:author="http://www.w3.org/2005/Atom"><name>Xiaowen Xin</name><title>Product Manager, Serverless Networking and Security</title><department></department><company></company></author></item></channel></rss>