<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:media="http://search.yahoo.com/mrss/"><channel><title>Open Source</title><link>https://cloud.google.com/blog/products/open-source/</link><description>Open Source</description><atom:link href="https://cloudblog.withgoogle.com/blog/products/open-source/rss/" rel="self"></atom:link><language>en</language><lastBuildDate>Tue, 24 Mar 2026 09:00:02 +0000</lastBuildDate><image><url>https://cloud.google.com/blog/products/open-source/static/blog/images/google.a51985becaa6.png</url><title>Open Source</title><link>https://cloud.google.com/blog/products/open-source/</link></image><item><title>The open platform for the AI era: GKE, agents, and OSS innovation at KubeCon EU 2026</title><link>https://cloud.google.com/blog/products/containers-kubernetes/gke-and-oss-innovation-at-kubecon-eu-2026/</link><description>&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;As the cloud-native community gathers in Amsterdam for KubeCon + CloudNativeCon Europe this week, we’re excited to highlight some of the work we are doing to support both the open-source Kubernetes ecosystem and Google Kubernetes Engine (GKE). From breaking down the walls between cluster operating modes to making Kubernetes the absolute best place to run AI agents and Ray, here’s a look at what we are rolling out.&lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Autopilot for everyone&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Five years ago, we introduced &lt;/span&gt;&lt;a href="https://docs.cloud.google.com/kubernetes-engine/docs/concepts/autopilot-overview"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;GKE Autopilot&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, a fully managed GKE experience that dramatically simplified scaling and infrastructure management. &lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;Previously, choosing between GKE Autopilot mode and Standard mode was a "fork in the road" decision made at cluster creation time. If you started with Standard and later wanted to switch to Autopilot, you had to create an entirely new cluster. This created friction for organizations managing mixed clusters, where some workloads required strict node-level control while others needed seamless, hands-off scaling.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Meet the new GKE, where Autopilot is available for every cluster. &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;Autopilot compute classes are now available for Standard clusters&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;, allowing you to turn on Autopilot at any time, on a per-workload basis. Powered by GKE Autopilot’s &lt;/span&gt;&lt;a href="https://cloud.google.com/blog/products/containers-kubernetes/container-optimized-compute-delivers-autoscaling-for-autopilot?e=48754805"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Container-Optimized Compute Platform (COCP)&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, you can unlock near-real-time, vertically and horizontally scalable compute that provides the exact capacity that you need, when you need it, at the best price and performance.&lt;/span&gt;&lt;/p&gt;
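&lt;p&gt;As a concrete sketch, a workload opts into an Autopilot compute class through a nodeSelector on its Pod template. The example below uses the official Kubernetes Python client; the label key follows GKE's compute-class selector convention, and the class name "autopilot-demo" is an assumption, so substitute a compute class defined for your cluster:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;
# Sketch: opting one Deployment into an Autopilot compute class on a
# Standard cluster via a nodeSelector (Kubernetes Python client).
# The compute-class name "autopilot-demo" is an assumption.
from kubernetes import client, config

config.load_kube_config()  # or config.load_incluster_config()

deployment = client.V1Deployment(
    metadata=client.V1ObjectMeta(name="autopilot-workload"),
    spec=client.V1DeploymentSpec(
        replicas=1,
        selector=client.V1LabelSelector(match_labels={"app": "demo"}),
        template=client.V1PodTemplateSpec(
            metadata=client.V1ObjectMeta(labels={"app": "demo"}),
            spec=client.V1PodSpec(
                # GKE schedules this Pod onto Autopilot-managed capacity.
                node_selector={"cloud.google.com/compute-class": "autopilot-demo"},
                containers=[
                    client.V1Container(
                        name="app",
                        image="us-docker.pkg.dev/google-samples/containers/gke/hello-app:1.0",
                        resources=client.V1ResourceRequirements(
                            requests={"cpu": "500m", "memory": "512Mi"}
                        ),
                    )
                ],
            ),
        ),
    ),
)

client.AppsV1Api().create_namespaced_deployment(namespace="default", body=deployment)
&lt;/code&gt;&lt;/pre&gt;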
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Furthermore, we are happy to announce that we will open-source&lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt; GKE Cluster Autoscaler&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;, one of the core components driving infrastructure provisioning for our customers. Our goal is to provide a vendor-neutral platform that the OSS community can benefit from and build on top of.&lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Toward CNCF Kubernetes AI Conformance&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;As the industry moves toward AI at massive scale, standardization is paramount. Together with the Kubernetes community last year, we launched the &lt;/span&gt;&lt;a href="https://www.cncf.io/announcements/2025/11/11/cncf-launches-certified-kubernetes-ai-conformance-program-to-standardize-ai-workloads-on-kubernetes/" rel="noopener" target="_blank"&gt;&lt;strong style="text-decoration: underline; vertical-align: baseline;"&gt;CNCF Kubernetes AI Conformance program&lt;/strong&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, which simplifies AI/ML on Kubernetes by establishing a standard for cluster interoperability and portability. We are proud to announce that &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;GKE is certified as an AI-conformant platform&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;, so that your models and AI tools can be ported across environments.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Looking ahead to the upcoming v1.36 Kubernetes release, the AI Conformance community is proposing three new requirements to address the evolving needs of AI serving: advanced inference ingress, disaggregated serving, and high-performance networking. Google Cloud is committed to supporting these emerging community standards through GKE Inference Gateway, llm-d, and DRANET.&lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Model Context Protocol: An agent interface&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;To streamline how AI agents interact with Kubernetes, last year, we introduced the open-source GKE &lt;/span&gt;&lt;a href="https://github.com/GoogleCloudPlatform/gke-mcp" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Model Context Protocol (MCP) Server&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, which offers a standardized interface that allows agents to manage, analyze, and monitor workloads, clusters, and resources through specific defined capabilities. By exposing these capabilities, MCP Server makes it easier to integrate various AI clients, including &lt;/span&gt;&lt;a href="https://geminicli.com/" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Gemini CLI&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; and &lt;/span&gt;&lt;a href="https://antigravity.google/" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Antigravity&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, promoting more intelligent and automated management of Kubernetes ecosystems.&lt;/span&gt;&lt;/p&gt;
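&lt;p&gt;For a feel of what this looks like from the client side, here is a minimal sketch that connects to an MCP server over stdio with the MCP Python SDK and lists its tools. The local server command is an assumption; see the gke-mcp README for the actual invocation and tool names:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;
# Sketch: discovering an MCP server's capabilities with the MCP Python SDK.
# Running "gke-mcp" as a local stdio command is an assumption for illustration.
import asyncio
from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

server = StdioServerParameters(command="gke-mcp")

async def main():
    async with stdio_client(server) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            tools = await session.list_tools()
            # An AI client (e.g., Gemini CLI) would expose these tools to
            # the model and invoke them via session.call_tool(...).
            for tool in tools.tools:
                print(tool.name, "-", tool.description)

asyncio.run(main())
&lt;/code&gt;&lt;/pre&gt;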
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Kubernetes as AI infrastructure&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;a href="https://llm-d.ai/" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;llm-d&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; is officially a CNCF Sandbox project, which marks a significant step in evolving Kubernetes into state-of-the-art AI infrastructure. Launched in May 2025 as a collaborative effort with industry leaders like Red Hat and NVIDIA, llm-d provides a Kubernetes-native distributed inference framework designed to be hardware-agnostic and vendor-neutral.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;The project addresses complex AI orchestration challenges by introducing well-lit paths for inference-aware traffic management, native orchestration for multi-node replicas, and advanced state management for hierarchical KV cache offloading. By bridging the gap between cloud-native orchestration and frontier AI research, llm-d democratizes high-performance AI serving and establishes open, reproducible benchmarks for inference performance across various accelerators. We plan to work with the &lt;/span&gt;&lt;a href="https://github.com/cncf/k8s-ai-conformance" rel="noopener" target="_blank"&gt;&lt;span style="vertical-align: baseline;"&gt;CNCF AI Conformance&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; program on llm-d to help ensure critical capabilities like disaggregated serving are interoperable across the ecosystem&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;. For more on llm-d, check out our blog &lt;/span&gt;&lt;a href="https://cloud.google.com/blog/products/containers-kubernetes/llm-d-officially-a-cncf-sandbox-project"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;here&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;. &lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;DRA is the new standard for resource management&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Kubernetes was created in a simpler time, when CPU and memory were the only variables, and clouds were seen as infinitely elastic. Today, of course, hardware is specialized and variable. Dynamic Resource Allocation, or &lt;/span&gt;&lt;a href="https://kubernetes.io/docs/concepts/scheduling-eviction/dynamic-resource-allocation/" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;DRA&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, is an industry-standard solution for describing unique hardware in a standard format, allowing higher-level workloads and schedulers to optimize resources without access to low-level details about them. Now, we’re proud to announce the open-source release of our DRA driver for TPUs, marking a significant milestone in bringing AI workload portability to the Kubernetes ecosystem. Google and NVIDIA partnered closely on the design and implementation of DRA in OSS Kubernetes in a collaborative push to establish a unified resource management standard. We are proud to coordinate this release with the &lt;/span&gt;&lt;a href="https://blogs.nvidia.com/blog/nvidia-at-kubecon-2026" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;donation of the NVIDIA DRA Driver&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;. This is in addition to our DRA driver for networking, &lt;/span&gt;&lt;a href="https://docs.cloud.google.com/kubernetes-engine/docs/how-to/allocate-network-resources-dra"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;DRANET&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, which is already available as a managed feature of GKE.&lt;/span&gt;&lt;/p&gt;
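&lt;p&gt;Under DRA, a workload asks for hardware by creating a ResourceClaim that references a device class registered by a driver, and the scheduler matches the claim to a node. A minimal sketch with the Kubernetes Python client follows; the API version and the device class name "tpu.example.com" are assumptions that depend on your cluster version and installed driver:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;
# Sketch: requesting a device via Dynamic Resource Allocation by creating
# a ResourceClaim. The API version and device class name are assumptions.
from kubernetes import client, config

config.load_kube_config()

claim = {
    "apiVersion": "resource.k8s.io/v1beta1",
    "kind": "ResourceClaim",
    "metadata": {"name": "tpu-claim"},
    "spec": {
        "devices": {
            # Ask for one device from the class the TPU DRA driver registers.
            "requests": [{"name": "tpu", "deviceClassName": "tpu.example.com"}]
        }
    },
}

client.CustomObjectsApi().create_namespaced_custom_object(
    group="resource.k8s.io",
    version="v1beta1",
    namespace="default",
    plural="resourceclaims",
    body=claim,
)
# A Pod then references the claim in spec.resourceClaims, and the scheduler
# places it on a node whose driver can satisfy the request.
&lt;/code&gt;&lt;/pre&gt;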
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Supporting the agentic wave: Inference and agents&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;The agentic AI wave is upon us, and we believe Kubernetes is unequivocally the best platform on which to run these agents. To execute LLM-generated code and interact with AI agents with confidence, you need deep isolation, rapid startup times, and specialized infrastructure.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;We are heavily investing in open-source inference work to make this a reality. By leveraging innovations like &lt;/span&gt;&lt;a href="https://github.com/kubernetes-sigs/agent-sandbox" rel="noopener" target="_blank"&gt;&lt;strong style="text-decoration: underline; vertical-align: baseline;"&gt;Kubernetes Agent Sandbox&lt;/strong&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; for secure, gVisor-backed isolation, and &lt;/span&gt;&lt;a href="https://docs.cloud.google.com/kubernetes-engine/docs/concepts/pod-snapshots"&gt;&lt;strong style="text-decoration: underline; vertical-align: baseline;"&gt;GKE Pod Snapshots&lt;/strong&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, which drastically improve startup latency by restoring workloads from a memory snapshot, we are establishing a standard for agentic AI on Kubernetes and providing high performance and compute efficiency for agents running on GKE.&lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Ray on Kubernetes: TPUs and better observability&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Ray has become the standard for scaling demanding AI workloads, and we believe Kubernetes is a great place to run it. Until recently, official accelerator support was limited to NVIDIA GPUs. We are excited to announce TPU support in Ray v2.55, fully supported by Anyscale and Google. &lt;/span&gt;&lt;/p&gt;
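&lt;p&gt;In Ray, TPU hosts surface as a schedulable "TPU" resource, so pinning work to TPU capacity looks much like it does for GPUs. A minimal sketch follows; the chip count is illustrative and depends on your TPU topology:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;
# Sketch: scheduling a Ray task onto TPU capacity in a Ray cluster
# (for example, a KubeRay cluster on GKE). The chip count is illustrative.
import ray

ray.init()  # connects to the local or configured Ray cluster

@ray.remote(resources={"TPU": 4})
def run_on_tpu():
    # Accelerator code (e.g., JAX) would run here on the TPU host.
    return "ran on a TPU host"

print(ray.get(run_on_tpu.remote()))
&lt;/code&gt;&lt;/pre&gt;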
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Ray on K8s users have historically struggled to debug and optimize performance, because they didn’t have access to historical data about their jobs. To solve this, we are introducing &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;the ability to debug issues after the RayJob has completed or terminated.&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; The Ray History Server uses KubeRay to set up and persist logs, state, and metrics from live RayJobs and reproduce them in the Ray Dashboard. The Ray History Server (alpha) is available to &lt;/span&gt;&lt;a href="https://docs.cloud.google.com/kubernetes-engine/docs/add-on/ray-on-gke/how-to/enable-ray-history-server"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;try today&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;.&lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Join us at the booth&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Whether you are scaling up next-gen AI inference, deploying highly isolated agentic workflows, or simply looking to optimize compute capacity across your clusters, we are committed to making Kubernetes and GKE the ultimate platform for your success.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;If you’re at KubeCon Europe, stop by the Google Cloud booth (#310) to dive deep into these announcements and to discover our &lt;/span&gt;&lt;a href="https://rsvp.withgoogle.com/events/google-cloud-at-kubecon-europe-2026" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;sessions, lightning talks, hands-on labs, and demos &lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;— plus a friendly competition with our text-based adventure game. Here's to the future of Kubernetes!&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;</description><pubDate>Tue, 24 Mar 2026 09:00:00 +0000</pubDate><guid>https://cloud.google.com/blog/products/containers-kubernetes/gke-and-oss-innovation-at-kubecon-eu-2026/</guid><category>GKE</category><category>Open Source</category><category>Containers &amp; Kubernetes</category><og xmlns:og="http://ogp.me/ns#"><type>article</type><title>The open platform for the AI era: GKE, agents, and OSS innovation at KubeCon EU 2026</title><description></description><site_name>Google</site_name><url>https://cloud.google.com/blog/products/containers-kubernetes/gke-and-oss-innovation-at-kubecon-eu-2026/</url></og><author xmlns:author="http://www.w3.org/2005/Atom"><name>Abdel Sghiouar</name><title>Senior Cloud Developer Advocate</title><department></department><company></company></author><author xmlns:author="http://www.w3.org/2005/Atom"><name>Allan Naim</name><title>Director of Product Management GKE</title><department></department><company></company></author></item><item><title>Kubernetes as AI Infrastructure: Google Cloud, llm-d, and the CNCF</title><link>https://cloud.google.com/blog/products/containers-kubernetes/llm-d-officially-a-cncf-sandbox-project/</link><description>&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;At Google Cloud, serving the massive-scale needs of large foundation model builders and AI-native companies is at the forefront of our AI infrastructure strategy. As generative AI transitions to mission-critical production environments, these innovators require dynamic, relentlessly efficient infrastructure to overcome complex orchestration challenges and power an agentic future.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span&gt;&lt;span style="vertical-align: baseline;"&gt;To meet this moment, we are thrilled to announce that &lt;/span&gt;&lt;a href="https://llm-d.ai/" target="_blank"&gt;&lt;strong style="text-decoration: underline; vertical-align: baseline;"&gt;llm-d&lt;/strong&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; has &lt;/span&gt;&lt;a href="https://www.cncf.io/blog/2026/03/24/welcome-llm-d-to-the-cncf-evolving-kubernetes-into-sota-ai-infrastructure/" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;officially&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; been accepted as a Cloud Native Computing Foundation (CNCF) Sandbox project. Google Cloud is proud to be a founding contributor to llm-d alongside Red Hat, IBM Research, CoreWeave, and NVIDIA, uniting around a clear, industry-defining vision: &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;any model, any accelerator, any cloud.&lt;/strong&gt;&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;This contribution underscores Google’s long-standing leadership in open-source innovation. And under the trusted stewardship of the Linux Foundation, we are helping ensure that the future of distributed AI inference is built on open standards rather than walled gardens. This gives foundation model builders the confidence to deploy their models globally without vendor lock-in, while empowering them to run the absolute best, most highly optimized implementations of these open technologies directly on Google Cloud.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-image_full_width"&gt;&lt;figure class="article-image--large"&gt;&lt;img src="https://storage.googleapis.com/gweb-cloudblog-publish/images/1_KwJQrYd.max-1000x1000.png" alt="1"&gt;&lt;/figure&gt;&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;h3&gt;&lt;span style="vertical-align: baseline;"&gt;Supercharging Kubernetes for inference&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Kubernetes is the undisputed industry standard for orchestration. While it provides a rock-solid foundation, it wasn’t originally built for the highly stateful and dynamic demands of LLM inference. To evolve Kubernetes for this new class of workload, we launched &lt;/span&gt;&lt;a href="https://docs.cloud.google.com/kubernetes-engine/docs/tutorials/serve-with-gke-inference-gateway"&gt;&lt;strong style="text-decoration: underline; vertical-align: baseline;"&gt;GKE Inference Gateway&lt;/strong&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, which provides native APIs to go far beyond simple load balancing. Under the hood, the gateway leverages the &lt;/span&gt;&lt;a href="https://github.com/kubernetes-sigs/gateway-api-inference-extension/tree/main/docs/proposals/004-endpoint-picker-protocol" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;llm-d Endpoint Picker (EPP)&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; for scheduling intelligence. By delegating routing decisions to llm-d, the system enforces a multi-objective policy that considers real-time KV-cache hit rates, the number of inflight requests, and instance queue depth to route each request to the most optimal backend for processing.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;For foundation model builders operating at massive scale, the real-world impact of this model-aware routing is transformative. Recently, our Vertex AI team &lt;/span&gt;&lt;a href="https://cloud.google.com/blog/products/containers-kubernetes/how-gke-inference-gateway-improved-latency-for-vertex-ai?e=48754805"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;validated&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; this architecture in production, proving its ability to handle highly unpredictable traffic without relying on fragile custom schedulers. For context-heavy coding tasks using Qwen Coder, Time-to-First-Token (TTFT) latency was slashed by over 35%. When handling bursty, stochastic chat workloads using DeepSeek for research, P95 tail latency improved by 52%, effectively absorbing severe load variance. Crucially, the gateway's routing intelligence doubled Vertex AI's prefix cache hit rate from 35% to 70%, drastically lowering re-computation overhead and cost-per-token.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-image_full_width"&gt;&lt;figure class="article-image--large"&gt;&lt;img src="https://storage.googleapis.com/gweb-cloudblog-publish/images/2_K56j60Q.max-1000x1000.png" alt="2"&gt;&lt;/figure&gt;&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Beyond intelligent routing, orchestrating multi-node AI deployments requires bulletproof underlying primitives, which is why Google leads the development of the Kubernetes &lt;/span&gt;&lt;a href="https://lws.sigs.k8s.io/docs/overview/" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;LeaderWorkerSet&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; (LWS) API. LWS enables llm-d to orchestrate wide expert parallelism and disaggregate compute-heavy prefill and memory-heavy decode phases into independently scalable pods. &lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;With its widespread industry adoption, LWS now orchestrates a rapidly growing footprint of production AI workloads, managing massive fleets of TPUs and GPUs at global scale. &lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;Complementing this orchestration, Google recently &lt;/span&gt;&lt;a href="https://vllm.ai/blog/vllm-tpu" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;extended vLLM natively for Cloud TPUs&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;. Featuring a unified PyTorch and JAX backend alongside innovations like Ragged Paged Attention v3, this integration delivers up to 5x throughput gains over our first release last year. Whether you are scaling on Google Cloud TPUs or NVIDIA GPUs, these advancements help ensure state-of-the-art AI serving remains a highly optimized, accelerator-agnostic capability.&lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;span style="vertical-align: baseline;"&gt;Building next-gen AI infrastructure together&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;To build the ultimate AI infrastructure, we must bridge the gap between cloud-native Kubernetes orchestration and frontier AI research. The shift to production-grade gen AI requires an engine built on trust, transparency, and deep collaboration with the AI/ML leaders pushing the boundaries of what is possible.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;We are incredibly excited to partner with the Linux Foundation, the CNCF, the PyTorch Foundation, and the rest of the open-source community to build the next generation of AI infrastructure. By establishing "well-lit paths" — proven, replicable blueprints tested end-to-end under realistic load — we are ensuring that high-performance AI thrives as an open, universally accessible ecosystem that empowers innovation without boundaries.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;We invite large foundation model builders, AI natives, platform engineers, and AI researchers to join us in shaping the open future of AI inference:&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Explore the well-lit paths:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Visit the &lt;/span&gt;&lt;a href="https://llm-d.ai/docs/guide" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;llm-d guides&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; to start deploying SOTA inference stacks on your infrastructure today.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Learn more:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Check out the official website at &lt;/span&gt;&lt;a href="https://llm-d.ai" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;https://llm-d.ai/&lt;/span&gt;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Contribute:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Join the community on Slack and get involved in our GitHub repositories at &lt;/span&gt;&lt;a href="https://github.com/llm-d/" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;https://github.com/llm-d/&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Join us in celebrating llm-d at the CNCF! We look forward to scaling the engine together.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;</description><pubDate>Tue, 24 Mar 2026 09:00:00 +0000</pubDate><guid>https://cloud.google.com/blog/products/containers-kubernetes/llm-d-officially-a-cncf-sandbox-project/</guid><category>GKE</category><category>AI &amp; Machine Learning</category><category>Open Source</category><category>Containers &amp; Kubernetes</category><og xmlns:og="http://ogp.me/ns#"><type>article</type><title>Kubernetes as AI Infrastructure: Google Cloud, llm-d, and the CNCF</title><description></description><site_name>Google</site_name><url>https://cloud.google.com/blog/products/containers-kubernetes/llm-d-officially-a-cncf-sandbox-project/</url></og><author xmlns:author="http://www.w3.org/2005/Atom"><name>Sean Horgan</name><title>Product Manager</title><department></department><company></company></author><author xmlns:author="http://www.w3.org/2005/Atom"><name>Abdel Sghiouar</name><title>Senior Cloud Developer Advocate</title><department></department><company></company></author></item><item><title>OTLP everywhere: Cloud Monitoring now supports OpenTelemetry Protocol metrics</title><link>https://cloud.google.com/blog/products/management-tools/otlp-opentelemetry-protocol-for-google-cloud-monitoring-metrics/</link><description>&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;As part of our commitment to open standards, Google Cloud is deeply invested in making &lt;/span&gt;&lt;a href="http://opentelemetry.io/" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;OpenTelemetry&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; the universal client, data format, and set of standards for telemetry data.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Last year we announced &lt;/span&gt;&lt;a href="https://cloud.google.com/blog/products/management-tools/opentelemetry-now-in-google-cloud-observability"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;support in Cloud Observability for sending traces using &lt;/span&gt;&lt;/a&gt;&lt;a href="https://opentelemetry.io/docs/specs/otel/protocol/" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;OpenTelemetry Protocol&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; (OTLP). Today, we’re excited to announce the next step toward our goal of OpenTelemetry everywhere: &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;Cloud Observability now supports &lt;/strong&gt;&lt;a href="https://docs.cloud.google.com/stackdriver/docs/otlp-metrics/overview"&gt;&lt;strong style="text-decoration: underline; vertical-align: baseline;"&gt;OTLP for metrics&lt;/strong&gt;&lt;/a&gt;&lt;strong style="vertical-align: baseline;"&gt; in Cloud Monitoring!&lt;/strong&gt;&lt;/p&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;OTLP for metrics: More than just a new standard&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Using OpenTelemetry and OTLP lets you generate and send metric data to Google Cloud with a completely provider-agnostic pipeline: You can create OTLP metrics using the OpenTelemetry SDK, collect and transform them using the OpenTelemetry collector, and send that data directly to Cloud Monitoring in OpenTelemetry format. &lt;/span&gt;&lt;/p&gt;
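&lt;p&gt;A minimal version of that pipeline with the OpenTelemetry Python SDK might look like the following sketch. The endpoint assumes a local collector on the default OTLP gRPC port; sending directly to Google Cloud instead requires its OTLP endpoint plus authentication, as described in the docs:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;
# Sketch: a provider-agnostic OTLP metrics pipeline with the OpenTelemetry
# Python SDK, exporting to a local OpenTelemetry Collector.
from opentelemetry import metrics
from opentelemetry.sdk.metrics import MeterProvider
from opentelemetry.sdk.metrics.export import PeriodicExportingMetricReader
from opentelemetry.sdk.resources import Resource
from opentelemetry.exporter.otlp.proto.grpc.metric_exporter import OTLPMetricExporter

exporter = OTLPMetricExporter(endpoint="http://localhost:4317")  # collector
reader = PeriodicExportingMetricReader(exporter, export_interval_millis=60_000)
provider = MeterProvider(
    resource=Resource.create({"service.name": "checkout"}),
    metric_readers=[reader],
)
metrics.set_meter_provider(provider)

meter = metrics.get_meter("example.meter")
requests_counter = meter.create_counter(
    "app.request.count", description="Handled requests"
)
requests_counter.add(1, {"http.response.status_code": "200"})
&lt;/code&gt;&lt;/pre&gt;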
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;By default, this data gets stored in the same format as &lt;/span&gt;&lt;a href="https://docs.cloud.google.com/stackdriver/docs/managed-prometheus"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Managed Service for Prometheus&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; data, at the same low price. This data is queryable using the same interfaces available to query any other data in Cloud Monitoring. &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Using OTLP also lets you take advantage of several highly requested new features, such as:&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;DELTA-type metrics&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;: Send the amount that a monotonic counter changed between the last export and the current export, instead of tracking all counters in memory and always sending the latest value of the counter. This allows clients to flush memory in between exports, which significantly reduces resource consumption on the client-side and better supports collecting short-lived or infrequently incremented time series.&lt;/span&gt;&lt;/li&gt;
&lt;li role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Exponential (dynamic) histograms&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;: Classic histograms require you to explicitly set their bucket widths based on the projected data distribution. If that projection doesn’t match the actual data distribution, you can end up with all the observations lumped into a few low buckets, or with the most interesting observations smeared across a “lower than infinity” bucket. &lt;/span&gt;&lt;a href="https://opentelemetry.io/docs/specs/otel/metrics/data-model/#exponentialhistogram" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;OpenTelemetry Exponential Histograms&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; instead dynamically change the bucket boundaries based on the range of values actually seen, so you no longer have to guess and check histogram buckets. Just set it and forget it! (A configuration sketch covering both delta counters and exponential histograms follows this list.)&lt;/span&gt;&lt;/li&gt;
&lt;li role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Dots and slashes in metric names and dots in label keys&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;: Cloud Monitoring now has full support for &lt;/span&gt;&lt;a href="https://docs.cloud.google.com/monitoring/promql#promql-cm-query"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;querying URL-style names using PromQL&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;. Additionally, Cloud Monitoring now supports the &lt;/span&gt;&lt;code style="vertical-align: baseline;"&gt;.&lt;/code&gt;&lt;span style="vertical-align: baseline;"&gt; character in label keys, which enables support for OpenTelemetry’s &lt;/span&gt;&lt;a href="https://opentelemetry.io/docs/specs/semconv/" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;semantic conventions&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;.&lt;/span&gt;&lt;/li&gt;
&lt;li role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Sending metrics directly from the SDK to Cloud Monitoring with no collector&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;: For extremely high-volume, high-cardinality metric sources such as Envoy (which reports pod-pod and service-service traffic) or customer-run load balancer processes, it can be prohibitively expensive to have an OpenTelemetry collector in the pipeline. Collectors can get overloaded with excessive volume of metrics, and horizontally or vertically scaling them is a lot of work for developers. With OTLP, you can point metrics exported by the OpenTelemetry SDK directly at Cloud Observability’s Telemetry API for metrics, letting you rely on Google to handle your volume rather than having to run and scale an intermediary process yourself.&lt;/span&gt;&lt;/li&gt;
&lt;li role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Zero-code auto-instrumentation for metrics and traces&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;: Use OpenTelemetry to &lt;/span&gt;&lt;a href="https://opentelemetry.io/docs/zero-code/" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;automatically instrument compatible workloads&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, which generates standardized traces and Golden Signal metrics without requiring any code, and then send data to Cloud Observability in OTLP format. No longer do you need to mix application and instrumentation code to get Golden Signal metrics — and that’s if you even remember to consistently instrument every RPC in your code.&lt;/span&gt;&lt;/li&gt;
&lt;/ul&gt;
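&lt;p&gt;Here is the configuration sketch referenced above, wiring delta temporality and exponential histograms into the same OpenTelemetry Python SDK pipeline; names follow the SDK, and the endpoint again assumes a local collector:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;
# Sketch: opting into DELTA temporality for counters and exponential-bucket
# histograms in the OpenTelemetry Python SDK.
from opentelemetry.sdk.metrics import Counter, Histogram, MeterProvider
from opentelemetry.sdk.metrics.export import (
    AggregationTemporality,
    PeriodicExportingMetricReader,
)
from opentelemetry.sdk.metrics.view import (
    ExponentialBucketHistogramAggregation,
    View,
)
from opentelemetry.exporter.otlp.proto.grpc.metric_exporter import OTLPMetricExporter

# Export monotonic counters as deltas instead of cumulative totals.
exporter = OTLPMetricExporter(
    endpoint="http://localhost:4317",
    preferred_temporality={Counter: AggregationTemporality.DELTA},
)

# Record every histogram instrument with dynamic exponential buckets.
histogram_view = View(
    instrument_type=Histogram,
    aggregation=ExponentialBucketHistogramAggregation(),
)

provider = MeterProvider(
    metric_readers=[PeriodicExportingMetricReader(exporter)],
    views=[histogram_view],
)
# Counters now export deltas; histograms size their buckets dynamically.
&lt;/code&gt;&lt;/pre&gt;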
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Managed OpenTelemetry for Google Kubernetes Engine&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Running an OpenTelemetry collector yourself can be a lot of work, requiring you to manually deploy, configure, and scale collector instances. But for most workloads, all you really need is a simple in-cluster endpoint for receiving and enriching OTLP signals.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;That’s why we’re excited to also announce &lt;/span&gt;&lt;a href="https://docs.cloud.google.com/kubernetes-engine/docs/concepts/managed-otel-gke"&gt;&lt;strong style="text-decoration: underline; vertical-align: baseline;"&gt;Managed OpenTelemetry for GKE&lt;/strong&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, a fully managed, “one-click” pipeline for generating and collecting OTLP traces, metrics, and logs on Google Kubernetes Engine. Let Google handle the collector lifecycle, upgrades, and scaling, so you can focus on your application code, not your observability infrastructure.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Managed OpenTelemetry is the first fully managed trace solution for GKE. Tracing is critical for application performance monitoring and powers features like the &lt;/span&gt;&lt;a href="https://docs.cloud.google.com/stackdriver/docs/observability/application-topology"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;application topology map&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, a dynamic, actionable view of your application's dependencies. &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;You can also use Managed OpenTelemetry for GKE to automatically configure and instrument workloads that use the OpenTelemetry SDK. With a single Custom Resource, you can get Golden Signals in Cloud Observability for all your OpenTelemetry-enabled applications — including AI agents built with frameworks that support OpenTelemetry such as the &lt;/span&gt;&lt;a href="https://docs.cloud.google.com/stackdriver/docs/instrumentation/ai-agent-adk"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Agent Development Kit (ADK)&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;. &lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;How to get started&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;a href="https://docs.cloud.google.com/stackdriver/docs/otlp-metrics/overview"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;OTLP for metrics&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; is currently in preview, is open to all customers, and is supported when using OpenTelemetry versions 0.140.0 and higher. To get started with OTLP metrics using the OpenTelemetry SDK or the OpenTelemetry Collector, see the &lt;/span&gt;&lt;a href="https://docs.cloud.google.com/stackdriver/docs/otlp-metrics/overview"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;OTLP metric ingestion documentation&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;. When running your own collector, we recommend using the &lt;/span&gt;&lt;a href="https://docs.cloud.google.com/stackdriver/docs/instrumentation/google-built-otel"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Google-built OpenTelemetry Collector&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; whenever possible.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://docs.cloud.google.com/kubernetes-engine/docs/concepts/managed-otel-gke"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Managed OpenTelemetry for GKE&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; is currently in preview, is open to all customers, and is available for GKE cluster versions 1.34.1-gke.2178000 or later and gcloud CLI versions 551.0.0 or later. To get started, see &lt;/span&gt;&lt;a href="https://docs.cloud.google.com/kubernetes-engine/docs/concepts/managed-otel-gke"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;the Managed OpenTelemetry for GKE documentation&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Finally, to get started with zero-code auto-instrumentation for Java workloads on GKE using a self-deployed OpenTelemetry Collector, see the &lt;/span&gt;&lt;a href="https://docs.cloud.google.com/stackdriver/docs/instrumentation/otel-zerocode-java-gke"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;zero-code documentation&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-related_article_tout"&gt;&lt;div class="uni-related-article-tout h-c-page"&gt;&lt;a href="https://cloud.google.com/blog/products/management-tools/opentelemetry-now-in-google-cloud-observability/" class="uni-related-article-tout__wrapper"&gt;&lt;p class="uni-related-article-tout__eyebrow h-c-eyebrow"&gt;Related Article&lt;/p&gt;&lt;h4 class="uni-related-article-tout__header"&gt;OpenTelemetry Protocol comes to Google Cloud Observability&lt;/h4&gt;&lt;p class="uni-related-article-tout__body"&gt;Google Cloud Observability’s Cloud Trace now supports users sending trace data using OpenTelemetry (OTLP) via telemetry.googleapis.com.&lt;/p&gt;&lt;p class="uni-related-article-tout__cta"&gt;Read Article&lt;/p&gt;&lt;/a&gt;&lt;/div&gt;
&lt;/div&gt;</description><pubDate>Mon, 09 Feb 2026 17:00:00 +0000</pubDate><guid>https://cloud.google.com/blog/products/management-tools/otlp-opentelemetry-protocol-for-google-cloud-monitoring-metrics/</guid><category>GKE</category><category>Open Source</category><category>Management Tools</category><media:content height="540" url="https://storage.googleapis.com/gweb-cloudblog-publish/images/1_-_hero_image_-_png_uncompressed.max-600x600.png" width="540"></media:content><og xmlns:og="http://ogp.me/ns#"><type>article</type><title>OTLP everywhere: Cloud Monitoring now supports OpenTelemetry Protocol metrics</title><description></description><image>https://storage.googleapis.com/gweb-cloudblog-publish/images/1_-_hero_image_-_png_uncompressed.max-600x600.png</image><site_name>Google</site_name><url>https://cloud.google.com/blog/products/management-tools/otlp-opentelemetry-protocol-for-google-cloud-monitoring-metrics/</url></og><author xmlns:author="http://www.w3.org/2005/Atom"><name>Lee Yanco</name><title>Senior Product Manager</title><department></department><company></company></author><author xmlns:author="http://www.w3.org/2005/Atom"><name>James Maffey</name><title>Senior Product Manager</title><department></department><company></company></author></item><item><title>How the Max Planck Institute is sharing expert skills through multimodal agents</title><link>https://cloud.google.com/blog/products/ai-machine-learning/planck-institute-research-expert-gen-ai-agent/</link><description>&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Effective monitoring and treatment of complex diseases like &lt;/span&gt;&lt;a href="https://doi.org/10.1016/j.ccell.2025.06.004" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;cancer&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; and &lt;/span&gt;&lt;a href="https://doi.org/10.15252/msb.20199356" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Alzheimer's disease&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; depends on understanding the underlying biological processes, for which proteins are essential. &lt;/span&gt;&lt;a href="https://doi.org/10.1038/nature19949" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Mass spectrometry-based proteomics&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; is a powerful method for studying these proteins in a fast and global manner. Yet the widespread adoption of this technique remains constrained by technical complexity, as mastering these sophisticated analytical instruments and procedures requires specialized training. This creates an expertise bottleneck that slows research progress.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;To address this challenge, researchers at the Max Planck Institute of Biochemistry collaborated with Google Cloud to build a &lt;/span&gt;&lt;a href="https://www.biorxiv.org/content/10.1101/2025.10.05.680425v1" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Proteomics Lab Agent&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; that assists scientists with their experiments. This&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt; agent &lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;simplifies performing complex scientific procedures through personalized AI guidance, making them easier to execute, while automatically documenting the process.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;“A lab’s critical expertise is often tacit knowledge that is rarely documented and lost to academic turnover. This agent addresses that directly, not only by capturing hands-on practice to build an institutional memory, but by systematically detecting experimental errors to enhance reproducibility. Ultimately, this is about empowering our labs to push the frontiers of science faster than ever before,” said Prof. Matthias Mann, a pioneer in mass spectrometry-based proteomics who leads the Department of Proteomics and Signal Transduction at the Max Planck Institute of Biochemistry.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;The agent was built using the &lt;/span&gt;&lt;a href="https://google.github.io/adk-docs/" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Agent Development Kit (ADK)&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, Google Cloud infrastructure, and Gemini models, which offer advanced video and long-context understanding uniquely suited to the needs of advanced research. &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;One of the agent's core capabilities is to detect errors and omissions by analyzing a video of a researcher performing lab work and comparing their actions against a reference protocol. This process takes just over two minutes and catches about 74% of procedural errors with high accuracy, although domain-specific knowledge and spatial recognition still need improvement. Our AI-assisted approach is more efficient than the current manual approach, which relies on a researcher's intuition to either spot subtle mistakes during the procedure or, more commonly, to troubleshoot only after an experiment has failed.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;By making it easier to spot mistakes and offering personalized guidance, the agent can reduce troubleshooting time and build towards a future where real-time AI guidance can help prevent errors from happening.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;The potential of the Proteomics AI agent goes beyond life sciences, addressing a universal challenge in specialized fields: capturing and transferring the kind of expertise that is learned through hands-on practice, not from manuals. To enable other researchers and organizations to adapt this concept to their own domains, the agentic framework has been made available as an open-source project on &lt;/span&gt;&lt;a href="https://github.com/MannLabs/proteomics_lab_agent" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;GitHub&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;In this post, we will detail the agentic framework of the Proteomics Lab Agent, how it uses multimodal AI to provide personalized laboratory guidance, and the results from its deployment in a real-world research environment.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-video"&gt;&lt;div class="article-module article-video"&gt;&lt;figure&gt;&lt;a class="h-c-video h-c-video--marquee" href="https://youtube.com/watch?v=j_S_-wmJ1j8"&gt;&lt;img src="https://storage.googleapis.com/gweb-cloudblog-publish/images/maxresdefault_04JTJc5.max-1000x1000.jpg" alt="Max Planck AI laboratory agent"&gt;&lt;/a&gt;&lt;figcaption class="article-video__caption"&gt;&lt;h4&gt;Proteomics Lab Agent generates protocols and detects errors&lt;/h4&gt;&lt;/figcaption&gt;&lt;/figure&gt;&lt;/div&gt;&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;h3&gt;&lt;span style="vertical-align: baseline;"&gt;The challenge: Preserving expert knowledge in a high-turnover environment&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Imagine it’s a Friday evening in the lab. A junior researcher needs to use a sophisticated analytical instrument, a mass spectrometer, but the senior expert who is responsible for it has already left for the weekend. The researcher has to search through lengthy protocols, interpret the instrument’s performance, which depends on multiple factors reflected in diverse metrics, and proceed without guidance. A single misstep could potentially damage the expensive equipment, waste a unique and valuable sample, or compromise the entire study.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Such complexity is a regular hurdle in specialized research fields like mass spectrometry-based proteomics. Scientific progress often depends on complex techniques and instruments that require deep technical expertise. Laboratories face a significant bottleneck in training personnel, documenting procedures, and retaining knowledge, especially with the high rate of academic turnover. When an expert leaves, their accumulated knowledge often leaves with them, forcing the team to partially start over. Collectively, this creates accessibility and reproducibility challenges, which slows down new discoveries.&lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;span style="vertical-align: baseline;"&gt;A solution: an AI agent for lab guidance&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;The &lt;/span&gt;&lt;a href="https://www.biorxiv.org/content/10.1101/2025.10.05.680425v1" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;proteomics lab agent&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; addresses these challenges by connecting directly to the lab's collective knowledge, from protocols and instrument data to past troubleshooting decisions. With this, it provides researchers with personalized AI guidance for complex procedures across the entire experimental workflow. Examples range from routine wet-lab work, such as pipetting, to the interactions with specialized equipment and software required to operate a mass spectrometer. The agent can also automatically generate detailed protocols from videos of experiments, detect procedural errors, and provide guidance for correction, reducing troubleshooting and documentation time.&lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;An AI agent architecture for the lab&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;The underlying multimodal agentic AI framework uses a main agent that coordinates the work of several specialized sub-agents, as shown in Figure 1. Built with Gemini models and the Agent Development Kit, this main agent acts as an orchestrator. It receives a researcher's query, interprets the request, and delegates the task to the appropriate sub-agent.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-image_full_width"&gt;&lt;figure class="article-image--large"&gt;&lt;img src="https://storage.googleapis.com/gweb-cloudblog-publish/images/1-Fig1.max-1000x1000.png" alt="1-Fig1"&gt;&lt;figcaption class="article-image__caption"&gt;&lt;p&gt;Figure 1: Architecture of the Proteomics Lab Agent for multimodal guidance.&lt;/p&gt;&lt;/figcaption&gt;&lt;/figure&gt;&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;The sub-agents are designed for specific functions and connect to the lab's existing knowledge systems:&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Lab Note and Protocol Agents:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; These agents handle video-related tasks. When a researcher provides a video of an experiment, these agents upload it to Google Cloud Storage so that the video’s visual and spoken content can be analyzed. The agent can then check for errors or generate a new protocol.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Lab Knowledge Agent:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; This agent connects to the laboratory’s knowledge base (&lt;/span&gt;&lt;a href="https://github.com/sooperset/mcp-atlassian" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;MCP Confluence&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;) to retrieve protocols or save new lab notes, making knowledge accessible to the entire team.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Instrument Agent:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; To provide guidance on using complex analytical instruments, this agent retrieves instrument performance metrics from a self-build MCP server that monitors the lab's mass spectrometers (&lt;/span&gt;&lt;a href="https://github.com/MannLabs/alphakraken" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;MCP AlphaKraken&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;).&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Quality Control Memory Agent:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; This agent captures all instrument-related decisions and their outcomes in a database (e.g. MCP BigQuery). This creates a searchable history of what has worked in the past and preserves valuable troubleshooting experience.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Together, these agents provide guidance adapted to the current instrument status and the researcher's experience level, while automatically documenting the work as it happens.&lt;/span&gt;&lt;/p&gt;
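&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;As a rough illustration of this orchestration pattern, the following minimal Python sketch wires a root agent to two sub-agents with the Agent Development Kit. The agent names, model version, and instructions here are our own illustrative assumptions, not the project's published configuration.&lt;/span&gt;&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# Minimal orchestration sketch (assumed names, model, and instructions).
from google.adk.agents import LlmAgent

lab_note_agent = LlmAgent(
    name="lab_note_agent",
    model="gemini-2.5-pro",  # assumed model version
    instruction="Analyze experiment videos and draft or check lab notes.",
)

lab_knowledge_agent = LlmAgent(
    name="lab_knowledge_agent",
    model="gemini-2.5-pro",
    instruction="Retrieve protocols from, and save notes to, the knowledge base.",
    # tools=[...]  # an MCP toolset for Confluence would be attached here
)

# The root agent interprets each researcher query and delegates it.
root_agent = LlmAgent(
    name="proteomics_lab_agent",
    model="gemini-2.5-pro",
    instruction="Route each request to the most suitable sub-agent.",
    sub_agents=[lab_note_agent, lab_knowledge_agent],
)
&lt;/code&gt;&lt;/pre&gt;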
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;A closer look: Catching experimental errors with video analysis&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;While generative AI has proven effective for digital tasks in science, from literature analysis to controlling lab robots through code, it has not addressed the critical gap between digital assistance and hands-on laboratory execution. Our work demonstrates how to bridge this divide by automatically generating lab notes and detecting experimental errors from a video.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-image_full_width"&gt;






  
    &lt;div class="article-module h-c-page"&gt;
      &lt;div class="h-c-grid"&gt;
  

    &lt;figure class="article-image--large
      
      
        h-c-grid__col
        h-c-grid__col--6 h-c-grid__col--offset-3
        
        
      "
      &gt;

      
      
        
        &lt;img
            src="https://storage.googleapis.com/gweb-cloudblog-publish/images/2-Fig2.max-1000x1000.png"
        
          alt="2-Fig2"&gt;
        
        &lt;/a&gt;
      
        &lt;figcaption class="article-image__caption "&gt;&lt;p data-block-key="6bk09"&gt;Figure 2: Agent workflow for the video-based lab note generation and error detection.&lt;/p&gt;&lt;/figcaption&gt;
      
    &lt;/figure&gt;

  
      &lt;/div&gt;
    &lt;/div&gt;
  




&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;The process, illustrated in Figure 2, unfolds in several steps:&lt;/span&gt;&lt;/p&gt;
&lt;ol&gt;
&lt;li aria-level="1" style="list-style-type: decimal; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;A researcher records their experiment and submits the video to the agent with a prompt like, "Generate a lab note from this video and check for mistakes.".&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: decimal; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;The main agent delegates the task to the &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;Lab Note Agent&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;, which uploads the video to Google Cloud Storage and analyzes the actions performed in the video.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: decimal; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;The main agent asks the &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;Lab Knowledge Agent&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; to find the protocol that matches these actions. The Lab Knowledge Agent then retrieves it from the lab's knowledge base, Confluence.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: decimal; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;With both the video analysis and the baseline protocol, the task is passed on to the &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;Lab Note Agent &lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;again, which has the knowledge how to perform a step-by-step comparison of video and protocol. It flags any potential mistakes, such as missed steps, incorrectly performed actions, added steps not in the protocol, or steps completed in the wrong order.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: decimal; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;The main agent returns the generated lab notes to the researcher with these potential errors flagged for review. The researcher can accept the notes or make corrections.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: decimal; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;Once finalized, the corrected notes are saved back to the Confluence knowledge base via the &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;Lab Knowledge Agent&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;, preserving a complete and accurate record of the experiment.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
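&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Steps 1 and 2 of this workflow (upload and analysis) can be sketched as a single function: upload the recording to Cloud Storage, then ask Gemini to enumerate the actions it shows. The bucket, project, and model names below are placeholders, not the values used in the published system.&lt;/span&gt;&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# Sketch of video upload and analysis (assumed bucket, project, and model).
from google import genai
from google.genai import types
from google.cloud import storage

def analyze_experiment_video(local_path):
    # 1. Upload the recording to Cloud Storage so the model can reference it.
    bucket = storage.Client().bucket("lab-videos")  # assumed bucket name
    blob = bucket.blob("runs/experiment.mp4")
    blob.upload_from_filename(local_path)

    # 2. Ask Gemini to describe the actions performed in the video.
    client = genai.Client(vertexai=True, project="my-project", location="us-central1")
    response = client.models.generate_content(
        model="gemini-2.5-pro",  # assumed model version
        contents=[
            types.Part.from_uri(
                file_uri="gs://lab-videos/runs/experiment.mp4",
                mime_type="video/mp4",
            ),
            "List every action performed in this video, step by step.",
        ],
    )
    return response.text
&lt;/code&gt;&lt;/pre&gt;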
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Building institutional memory&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;To support a lab in building a knowledge base, the &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;Protocol Agent&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; can generate lab instructions directly from a video. A researcher can record themselves performing a procedure while explaining the steps aloud. The agent analyzes the video and audio to produce a formatted, publication-ready protocol. We found that providing the model with a diverse set of examples, step-by-step instructions, and relevant background documents produced the best results.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
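&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;A prompt of this kind is essentially assembled from three ingredients: example protocols, explicit instructions, and background documents. The sketch below shows one plausible way to combine them; the function and argument names are illustrative, not taken from the project's code.&lt;/span&gt;&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# Illustrative prompt assembly for video-to-protocol generation.
def build_protocol_prompt(example_protocols, background_docs):
    # Few-shot context: finished protocols the model should imitate.
    parts = ["You are writing a publication-ready lab protocol."]
    parts.append("Imitate the structure and tone of these examples:")
    parts.extend(example_protocols)
    # Grounding context: manuals and SOPs relevant to the procedure.
    parts.append("Use these background documents for terminology:")
    parts.extend(background_docs)
    # Step-by-step instruction for the video itself.
    parts.append("Now watch the attached video, listen to the narration, "
                 "and write the protocol step by step, numbering each action.")
    return "\n\n".join(parts)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;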
&lt;div class="block-image_full_width"&gt;






  
    &lt;div class="article-module h-c-page"&gt;
      &lt;div class="h-c-grid"&gt;
  

    &lt;figure class="article-image--large
      
      
        h-c-grid__col
        h-c-grid__col--6 h-c-grid__col--offset-3
        
        
      "
      &gt;

      
      
        
        &lt;img
            src="https://storage.googleapis.com/gweb-cloudblog-publish/images/3-Fig3.max-1000x1000.png"
        
          alt="3-Fig3"&gt;
        
        &lt;/a&gt;
      
        &lt;figcaption class="article-image__caption "&gt;&lt;p data-block-key="6bk09"&gt;Figure 3: Agent workflow for guiding instrument operations.&lt;/p&gt;&lt;/figcaption&gt;
      
    &lt;/figure&gt;

  
      &lt;/div&gt;
    &lt;/div&gt;
  




&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;The agent can also support instrument operations (see Figure 3). A researcher may ask, "Is instrument X ready so that I can measure my samples?". The agent retrieves the latest instrument metrics via the &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;Instrument Agent&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; and compares it with past troubleshooting decisions from the &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;Quality Control Memory Agent&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;. It then provides a recommendation, such as "Yes, the instrument is ready," or "No, calibration is recommended first”. It can even provide the relevant calibration protocol from the &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;Lab Knowledge Agent&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;. Subsequently, it saves the final researcher's decision and actions with the &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;Quality Control Memory Agent&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;. With this, every reasoning and its outcome is saved, creating a continuously improving knowledge base for operating specialized equipment and software. &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;More technical details are described in our &lt;/span&gt;&lt;a href="https://www.biorxiv.org/content/10.1101/2025.10.05.680425v1" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;full publication&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;.&lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;span style="vertical-align: baseline;"&gt;Real-world impact: Making complex scientific procedures easier&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;To measure the AI agent’s value in a real-world setting, we deployed it in our department at the Max Planck Institute of Biochemistry, a group with 40 researchers. We evaluated the agent's performance across three key laboratory functions: detecting procedural errors, generating protocols, and providing personalized guidance.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;The results showed strong gains in both speed and quality. Key findings include:&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;AI-assisted error detection:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; The agent successfully identified 74% of all procedural errors (a metric known as recall) with an overall accuracy of 77% when comparing 28 recorded lab procedures against their reference protocols. While precision (41%) is still a limitation at this early stage, the results are highly promising.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Fast, expert-quality protocols:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; From lab videos, the agent generated standardized, publication-ready protocols in about 2.6 minutes. This was approximately 10 times faster than manual creation and achieved an average quality score of 4.4 out of 5 across 10 diverse protocols.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Personalized, real-time support:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; The agent successfully integrated real-time instrument data with past performance decisions to provide researchers with tailored advice on equipment use.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;A deeper analysis of the error-detection results revealed specific strengths and areas for improvement. As shown in Figure 4, the system is already effective at recognizing general lab equipment and reading on-screen text. The main limitations were in understanding highly specialized proteomics equipment (27% of such errors went unrecognized) and perceiving fine-grained details, such as the exact placement of pipette tips on a 96-well grid (47% unrecognized) or small text on pipettes (41% unrecognized); see the Appendix of the &lt;/span&gt;&lt;a href="https://www.biorxiv.org/content/10.1101/2025.10.05.680425v1" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;corresponding paper&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;. As multimodal models advance, we expect their ability to interpret these details will improve, strengthening this critical safeguard against experimental mistakes.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-image_full_width"&gt;






  
    &lt;div class="article-module h-c-page"&gt;
      &lt;div class="h-c-grid"&gt;
  

    &lt;figure class="article-image--large
      
      
        h-c-grid__col
        h-c-grid__col--6 h-c-grid__col--offset-3
        
        
      "
      &gt;

      
      
        
        &lt;img
            src="https://storage.googleapis.com/gweb-cloudblog-publish/images/4-Fig4.max-1000x1000.png"
        
          alt="4-Fig4"&gt;
        
        &lt;/a&gt;
      
        &lt;figcaption class="article-image__caption "&gt;&lt;p data-block-key="6bk09"&gt;Figure 4: Strengths and current limitations of the Proteomics Lab Agent in a lab.&lt;/p&gt;&lt;/figcaption&gt;
      
    &lt;/figure&gt;

  
      &lt;/div&gt;
    &lt;/div&gt;
  




&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Our agent already automates documentation and flags errors in recorded videos, but its future potential lies in prevention, not just correction. We envision an interactive assistant that uses speech to prevent mistakes in real-time before they happen. By making this project open source, we invite the community to help build this future.&lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;span style="vertical-align: baseline;"&gt;Scaling for the future&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;In conclusion, this framework addresses critical challenges in modern science, from the reproducibility crisis to knowledge retention in high-turnover academic environments. By systematically capturing not just procedural data but also the expert reasoning behind it, the agent builds an institutional memory.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;"This approach helps us capture and share the practical knowledge that is often lost when a researcher leaves the lab", notes Matthias Mann. "This collected experience will not only accelerate the training of new team members but also creates the data foundation we need for future innovations like predictive instrument maintenance for mass spectrometers and automated protocol harmonization within individual labs and across different labs".&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;The principles behind the Proteomics Lab Agent are not limited to one field. The concepts outlined in this study are a generalizable solution for any discipline that relies on complex, hands-on procedures, from life sciences to manufacturing.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Dive deeper into the methodology and results by reading our &lt;/span&gt;&lt;a href="https://www.biorxiv.org/content/10.1101/2025.10.05.680425v1" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;full paper&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;. Explore the code on &lt;/span&gt;&lt;a href="https://github.com/MannLabs/proteomics_specialist" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;GitHub&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; and adapt the Proteomics Lab Agent for your own research. Follow the work of the Mann Lab at the Max Planck Institute to see what comes next on &lt;/span&gt;&lt;a href="https://www.linkedin.com/company/mann-lab/" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;LinkedIn&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, &lt;/span&gt;&lt;a href="https://bsky.app/profile/mannlab.bsky.social" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;BlueSky&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, or &lt;/span&gt;&lt;a href="https://x.com/labs_mann?lang=de" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;X&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;.&lt;/span&gt;&lt;/p&gt;
&lt;hr/&gt;
&lt;p&gt;&lt;sub&gt;&lt;em&gt;&lt;span style="vertical-align: baseline;"&gt;This project was a collaboration between the Max Planck Institute of Biochemistry and Google. The core team included Patricia Skowronek and Matthias Mann from Department of Proteomics and Signal Transduction at the Max Planck Institute for Biochemistry and Anant Nawalgaria from Google. P.S. and M.M. want to thank the entire Mann Lab for their support.&lt;/span&gt;&lt;/em&gt;&lt;/sub&gt;&lt;/p&gt;&lt;/div&gt;</description><pubDate>Fri, 24 Oct 2025 16:00:00 +0000</pubDate><guid>https://cloud.google.com/blog/products/ai-machine-learning/planck-institute-research-expert-gen-ai-agent/</guid><category>Public Sector</category><category>Open Source</category><category>Customers</category><category>Google Cloud in Europe</category><category>AI &amp; Machine Learning</category><media:content height="540" url="https://storage.googleapis.com/gweb-cloudblog-publish/images/Planck-Institute-Research-AI-Agent-Hero.max-600x600.png" width="540"></media:content><og xmlns:og="http://ogp.me/ns#"><type>article</type><title>How the Max Planck Institute is sharing expert skills through multimodal agents</title><description></description><image>https://storage.googleapis.com/gweb-cloudblog-publish/images/Planck-Institute-Research-AI-Agent-Hero.max-600x600.png</image><site_name>Google</site_name><url>https://cloud.google.com/blog/products/ai-machine-learning/planck-institute-research-expert-gen-ai-agent/</url></og><author xmlns:author="http://www.w3.org/2005/Atom"><name>Dr. Patricia Skowronek</name><title>Post-doctoral researcher, Max-Planck-Institute for Biochemistry</title><department></department><company></company></author><author xmlns:author="http://www.w3.org/2005/Atom"><name>Anant Nawalgaria</name><title>Sr. Staff ML Engineer &amp; PM, Google</title><department></department><company></company></author></item><item><title>Powering AI commerce with the new Agent Payments Protocol (AP2)</title><link>https://cloud.google.com/blog/products/ai-machine-learning/announcing-agents-to-payments-ap2-protocol/</link><description>&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Today, Google announced the &lt;/span&gt;&lt;a href="http://goo.gle/ap2" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Agent Payments Protocol&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; (AP2), an open protocol developed with leading payments and technology companies to securely initiate and transact agent-led payments across platforms. The protocol can be used as an extension of the &lt;/span&gt;&lt;a href="https://a2a-protocol.org" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Agent2Agent (A2A) protocol&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; and Model Context Protocol (MCP). In concert with industry rules and standards, it establishes a payment-agnostic framework for users, merchants, and payments providers to transact with confidence across all types of payment methods.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;We’re collaborating with a diverse group of more than 60 organizations to help shape the future of agentic payments, including Adyen, American Express, Ant International, Coinbase, Etsy, Forter, Intuit, JCB, Mastercard, Mysten Labs, Paypal, Revolut, Salesforce, ServiceNow, UnionPay International, Worldpay, and more. &lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-video"&gt;



&lt;div class="article-module article-video "&gt;
  &lt;figure&gt;
    &lt;a class="h-c-video h-c-video--marquee"
      href="https://youtube.com/watch?v=yLTp3ic2j5c"
      data-glue-modal-trigger="uni-modal-yLTp3ic2j5c-"
      data-glue-modal-disabled-on-mobile="true"&gt;

      
        

        &lt;div class="article-video__aspect-image"
          style="background-image: url(https://storage.googleapis.com/gweb-cloudblog-publish/images/image1_YoRSHai.max-1000x1000.png);"&gt;
          &lt;span class="h-u-visually-hidden"&gt;Intro to Google Agent Payments Protocol (AP2)&lt;/span&gt;
        &lt;/div&gt;
      
      &lt;svg role="img" class="h-c-video__play h-c-icon h-c-icon--color-white"&gt;
        &lt;use xlink:href="#mi-youtube-icon"&gt;&lt;/use&gt;
      &lt;/svg&gt;
    &lt;/a&gt;

    
  &lt;/figure&gt;
&lt;/div&gt;

&lt;div class="h-c-modal--video"
     data-glue-modal="uni-modal-yLTp3ic2j5c-"
     data-glue-modal-close-label="Close Dialog"&gt;
   &lt;a class="glue-yt-video"
      data-glue-yt-video-autoplay="true"
      data-glue-yt-video-height="99%"
      data-glue-yt-video-vid="yLTp3ic2j5c"
      data-glue-yt-video-width="100%"
      href="https://youtube.com/watch?v=yLTp3ic2j5c"
      ng-cloak&gt;
   &lt;/a&gt;
&lt;/div&gt;

&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Why is a protocol needed?&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;AI agents are capable of transacting on behalf of users, which creates a need to establish a common foundation to securely authenticate, validate, and convey an agent’s authority to transact. While &lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;today’s payment systems generally assume a human is directly clicking "buy" on a trusted surface, the rise of autonomous agents and their ability to initiate a payment breaks this fundamental assumption and raises critical questions that AP2 helps to address, including:&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Authorization&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;: Proving that a user gave an agent the specific authority to make a particular purchase.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Authenticity&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;: Enabling a merchant to be sure that an agent's request accurately reflects the user's true intent. &lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Accountability&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;: Determining accountability if a fraudulent or incorrect transaction occurs. &lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;AP2 is an open, shared protocol that provides a common language for secure, compliant transactions between agents and merchants, helping to prevent a fragmented ecosystem. It also supports different payment types, from credit and debit cards to stablecoins and real-time bank transfers. This helps ensure a consistent, secure, and scalable experience for users and merchants, while also providing financial institutions with the clarity they need to effectively manage risk.&lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;How it works: Establishing trust via mandates and verifiable credentials&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;AP2 builds trust by using Mandates: tamper-proof, cryptographically signed digital contracts that serve as verifiable proof of a user's instructions. These mandates are signed by verifiable credentials (VCs) and act as the foundational evidence for every transaction.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Mandates address the two primary ways a user will shop with an agent:&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Real-time purchases (human present&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;): When you ask an agent, “Find me new white running shoes,” your request is captured in an initial &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;Intent Mandate&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;. This provides the auditable context for the entire interaction in a transaction process. After the agent presents a cart with the shoes you want, your approval signs a &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;Cart Mandate&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;. This is a critical step that creates a secure, unchangeable record of the exact items and price, ensuring what you see is what you pay for.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Delegated tasks (human not present)&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;: When you delegate a task like, “Buy concert tickets the moment they go on sale,” you sign a detailed &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;Intent Mandate&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; upfront. This mandate specifies the rules of engagement—price limits, timing, and other conditions. It serves as verifiable, pre-authorized proof that can allow the agent to automatically generate a &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;Cart Mandate&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; on your behalf once your precise conditions are met.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;In both scenarios, this chain of evidence culminates in securely linking your payment method to the verified contents of the Cart Mandate. This complete sequence, from intent to cart to payment, creates a non-repudiable audit trail that answers the critical questions of authorization and authenticity, providing a clear foundation for accountability.&lt;/span&gt;&lt;/p&gt;
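&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;To make the mandate idea concrete, here is an illustrative sketch of a delegated-task Intent Mandate being signed. The field names, identifiers, and key handling are assumptions made for illustration; the authoritative schema and signing rules are defined in the AP2 specification on GitHub.&lt;/span&gt;&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# Illustrative Intent Mandate (field names assumed; see the AP2 spec for
# the real schema).
import json
import time

from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

intent_mandate = {
    "type": "IntentMandate",
    "user": "did:example:alice",          # assumed identifier format
    "agent": "ticket-shopping-agent",
    "intent": "Buy concert tickets the moment they go on sale",
    "constraints": {"max_price_usd": 250, "expires_at": time.time() + 86400},
}

# Signing makes the mandate tamper-evident: any change to the payload
# invalidates the signature that merchants and issuers verify.
key = Ed25519PrivateKey.generate()
payload = json.dumps(intent_mandate, sort_keys=True).encode()
signature = key.sign(payload)
&lt;/code&gt;&lt;/pre&gt;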
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Unlocking new commerce experiences&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;AP2’s flexible design provides a foundation to support both simple and entirely new commercial models. Let’s consider a few examples below, which all assume Intent Mandates have been signed on behalf of a user: &lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Smarter shopping&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;: A customer discovers a winter jacket they want is unavailable in a specific color, so they tell their agent: "I really want this jacket in green, and I'm willing to pay up to 20% more for it." The agent then monitors prices and availability and automatically executes a secure purchase the moment that specific variant is found, capturing a high-intent sale that would have otherwise been lost.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Personalized offers&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;: A shopper tells their agent they want a new bicycle for an upcoming trip from a specific merchant. Their agent communicates this information—which includes the trip's date—to the merchant, whose own agent can respond by creating a custom, time-sensitive bundle offer that includes the bike, a helmet, and a travel rack at a 15% discount, turning a simple query into a more valuable sale.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Coordinated tasks&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;: A user is planning a weekend trip and tells their agent: "Book me a round-trip flight and a hotel in Palm Springs for the first weekend of November, with a total budget of $700." The agent can then interact with both airline and hotel agents, as well as online travel agencies and booking platforms, and once it finds a combination that fits the budget, it can execute both cryptographically-signed bookings simultaneously.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Support for emerging payments systems&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;AP2 is designed as a universal protocol, providing security and trust for a variety of payments like stablecoins and cryptocurrencies. To accelerate support for the web3 ecosystem, in collaboration with Coinbase, Ethereum Foundation, MetaMask and other leading organizations, we have extended the core constructs of AP2 and launched the &lt;/span&gt;&lt;a href="https://github.com/google-a2a/a2a-x402" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;A2A x402 extension&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, a production-ready solution for agent-based crypto payments. Extensions like these will help shape the evolution of cryptocurrency integrations within the core AP2 protocol. &lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;What’s next: A call for collaboration &lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;AP2 provides a trusted foundation to fuel a new era of AI-driven commerce. It establishes the core building blocks for secure transactions, creating clear opportunities for the industry, including networks, issuers, merchants, technology providers, and end users, to innovate on adjacent areas like seamless agent authorization and decentralized identity. We are committed to evolving this protocol in an open, collaborative process, including through standards bodies, and invite the entire payments and technology community to build this future with us.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Many of the partners building A2A agents have extended their support to AP2. This growing ecosystem will continue to make their agents available in our AI Agent Marketplace, including new, transactable experiences enabled by AP2. For example, enterprise companies could use AP2 for B2B applications, such as enabling autonomous procurement of partner-built solutions via Google Cloud Marketplace or the automatic scaling of software licenses based upon real-time needs.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;To get started, visit our public &lt;/span&gt;&lt;a href="http://goo.gle/ap2" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;GitHub repository&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; to review the complete technical specification, documentation, and reference implementations. Moving forward, this repository will be updated regularly with additional reference implementations from Google and innovations from the community to demonstrate the power and scalability of AP2.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-image_full_width"&gt;






  
    &lt;div class="article-module h-c-page"&gt;
      &lt;div class="h-c-grid"&gt;
  

    &lt;figure class="article-image--large
      
      
        h-c-grid__col
        h-c-grid__col--6 h-c-grid__col--offset-3
        
        
      "
      &gt;

      
      
        
        &lt;img
            src="https://storage.googleapis.com/gweb-cloudblog-publish/original_images/AP2_Partners.jpg"
        
          alt="New Logo AP2"&gt;
        
        &lt;/a&gt;
      
    &lt;/figure&gt;

  
      &lt;/div&gt;
    &lt;/div&gt;
  




&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Support from our ecosystem&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Accenture: &lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;“Google Cloud's Agent Payments Protocol (AP2) complements the Agent2Agent protocol and Model Context Protocol to provide a unified framework for agents to transact. Innovations like this will enable many of the agentic solutions that reinvent payments for clients – not only for today’s needs, but for the evolving models of future commerce.” – &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;Scott Alfieri, Google Business lead at Accenture&lt;/strong&gt;&lt;/li&gt;
&lt;li role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Adobe:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; “Adobe is proud to work with Google to advance secure and authenticated agentic commerce - our role in the &lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;Agent Payments Protocol&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt; (AP2) underscores our commitment to trusted, AI-driven experiences. With Adobe Commerce and AI agents powering customer journeys, we are focused on delivering secure, reliable, and authentic transactions for businesses and consumers."  -&lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt; Loni Stark, VP of Strategy and Product at Adobe&lt;/strong&gt;&lt;/li&gt;
&lt;li role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Adyen:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; "Agentic commerce is not just about a consumer-facing chatbot, but about the underlying infrastructure that powers it all. Adyen’s collaboration on Google’s Agent Payments Protocol (AP2) is a natural extension of our mission to provide the merchants with the payments building blocks for tomorrow’s commerce. We're excited to help establish a common rulebook that ensures security and interoperability for everyone involved in the payments ecosystem." - &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;Ingo Uytdehaage, Co-CEO at Adyen&lt;/strong&gt;&lt;/li&gt;
&lt;li role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Airwallex: &lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;"Airwallex is thrilled to support Google’s &lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;Agent Payments Protocol (AP2)&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;. This is a critical step forward in building a secure, interoperable ecosystem for agentic AI payments. This protocol gives businesses and consumers the confidence to delegate tasks to AI agents, aligning with our mission to build the future of finance by empowering businesses globally.” - &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;Jacob Dai, Co-Founder &amp;amp; CTO at Airwallex&lt;/strong&gt;&lt;/li&gt;
&lt;li role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;American Express: &lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;“With the rise of AI-driven commerce, trust and accountability are more important than ever. American Express is excited to contribute to the creation of AP2 as a protocol intended to protect customers and enable participation in the next generation of digital payments.” -&lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt; Luke Gebb, EVP, Amex Digital Labs, American Express&lt;/strong&gt;&lt;/li&gt;
&lt;li role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Ant International: &lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;"Ant International is excited to partner with Google on protocol-setting for practical AI applications in agentic commerce to unlock new merchant growth and elevate consumer experience, by leveraging our expertise in alternative payment methods and trusted AI innovations." - &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;Jiangming Yang, Chief Innovation Officer at Ant International&lt;/strong&gt;&lt;/li&gt;
&lt;li role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;BHN: &lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;“As a trusted partner processing billions of transactions globally, BHN is excited to help shape emerging protocols like AP2 that will enable both merchants and consumers to leverage the power of stored value in secure, autonomous commerce, enabled by AI Agents.” - &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;Nik Sathe, CPTO at BHN&lt;/strong&gt;&lt;/li&gt;
&lt;li role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;BVNK: &lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;"Stablecoins provide an obvious solution to the scaling challenges agentic systems are already facing with legacy financial infrastructure. We at BVNK were extremely excited to hear that Google has been working on solving this problem and couldn't wait to contribute" - &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;Donald Jackson, CTO at BVNK&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong style="vertical-align: baseline;"&gt;Checkout.com: &lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;"Agentic commerce is reshaping the checkout moment, and Google’s Agent Payments Protocol (AP2) is a pivotal step forward. At Checkout.com, we’re proud to support open protocols that strengthen trust and give merchants the flexibility to meet their customers where they are, however they want to shop.” –&lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt; Meron Colbeci, Chief Product Officer, Checkout.com&lt;/strong&gt;&lt;/li&gt;
&lt;li role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Coinbase: &lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;"x402 and AP2 show that agent-to-agent payments aren’t just an experiment anymore, they’re becoming part of how developers actually build. Bringing x402 into AP2 to power stablecoin payments made sense - it’s a natural playground for agents to start transacting with each other and testing out crypto rails. And it’s exciting to see the idea of agents paying each other resonate with the broader AI community." –&lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt; Erik Reppel, Head of Engineering at Coinbase Developer Platform&lt;/strong&gt;&lt;/li&gt;
&lt;li role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Crossmint: &lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;“With Crossmint’s tools, developers can let agents buy anything using both credit cards or stablecoins. Our goal is to unlock instantaneous, global commerce, giving agent builders the greatest flexibility. Our partnership with Google on AP2 represents our commitment that agentic commerce wins everyone’s trust as a secure, reliable, and seamless way to transact. Time to accelerate!” – &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;Alfonso Gomez, Co-founder at Crossmint&lt;/strong&gt;&lt;/li&gt;
&lt;li role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Confluent:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; "Confluent is excited to support Google in this effort to build an open, secure, and high-trust payments protocol. Agent Payments Protocol (AP2) aligns perfectly with our vision of a real-time data-driven world, and we believe our expertise in data streaming with Apache Kafka will be critical in creating a resilient and scalable payments ecosystem for the agentic web." – &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;Pascal Vantrepote, Partner CTO at Confluent &lt;/strong&gt;&lt;/li&gt;
&lt;li role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Dell:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; “At Dell Technologies, we’re &lt;/span&gt;&lt;a href="https://www.dell.com/en-us/blog/dell-securing-the-future-of-agentic-payments/" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;dedicated to making agentic AI a reality for businesses worldwide&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;. The transformative potential of agentic automation hinges on trust, security and standardization, especially for customer-facing eCommerce platforms. By supporting the Agent Payments Protocol (AP2) with Google, we’re laying the groundwork for a future where AI-driven commerce is reliable, accessible, and trusted by all." – &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;Satish Iyer, Vice President, Innovation &amp;amp; Ecosystems, Office of the CTO, Dell Technologies&lt;/strong&gt;&lt;/li&gt;
&lt;li role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Deloitte: &lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;“As Agentic Commerce rapidly emerges as a transformative force, the industry will need robust standards to empower AI agents to transact payments securely and effectively. These standards must address critical areas such as security, identity, frictionless commerce, trust, and privacy, all while providing compatibility with the existing global payments infrastructure. Deloitte is proud to help shape this evolving industry alongside Google, extending the widely adopted A2A protocol to enable agent-driven payments and commerce.” – &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;Gopal Srinivasan, Alphabet Google Alliance Global AI &amp;amp; Data Leader at Deloitte Consulting LLP&lt;/strong&gt;&lt;/li&gt;
&lt;li role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;DLocal: &lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;"Payment agents are no longer an idea, they’re rapidly becoming a reality. In the dynamic emerging markets we serve, payments are fragmented and complex, from cards to local payment methods, to wallets and stablecoin.Agent Payments Protocol (AP2) turns that complexity into a single, interoperable framework, enabling agent-initiated payments that are safe, seamless, and designed to boost merchant conversion while keeping users in control." - &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;Pedro Arnt, CEO at DLocal&lt;/strong&gt;&lt;/li&gt;
&lt;li role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Ebanx:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; "Agent Payments Protocol  (AP2) will power the next era of commerce, and to build a safe and secure environment for this is now the most important step. EBANX is proud to be part of this effort with Google." - &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;Eduardo de Abreu, Vice President of Product at Ebanx&lt;/strong&gt;&lt;/li&gt;
&lt;li role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Eigen Labs: &lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;“Google’s new &lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;Agent Payments Protocol&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt; (AP2) is a major step toward a future where AI agents are meaningful economic actors, whether that’s on behalf of humans, organizations or themselves. EigenCloud is proud to partner with Google on this initiative to provide the verifiability infrastructure that ensures these agents are held accountable by any counterparty. Together, we’re helping create a global verifiable economy where agents can coordinate, transact, and prove their actions to humans and to each other.” - &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;Sreeram Kannan, Founder &amp;amp; CEO at Eigen Labs &lt;/strong&gt;&lt;/li&gt;
&lt;li role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Fiuu:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; ""As agentic commerce reshapes payments infrastructure, Fiuu supports open protocols like A2A and AP2 to enable secure, scalable agent-to-agent transactions across multi-channel systems, advancing interoperability, trust, and inclusive payment ecosystems." - Eng Sheng Guan, CEO at Fiuu&lt;/span&gt;&lt;/li&gt;
&lt;li role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Forter: &lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;"At Forter we believe in the potential of agents to revolutionize commerce and we are proud to collaborate with Google in creating modern protocols that benefit brands, consumers and AI developers alongside Forter’s Trusted Agentic Commerce Protocol (TACP).” - &lt;strong&gt;Michael Reitblat, CEO at Forter&lt;/strong&gt;&lt;/span&gt;&lt;/li&gt;
&lt;li role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Gr4vy:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; "We are proud to support this new open protocol (AP2). By working together as an industry, we can ensure this next chapter of payments is built on trust, transparency and flexibility." - &lt;strong&gt;John Lunn, CEO and Founder at Gr4vy&lt;/strong&gt;&lt;/span&gt;&lt;/li&gt;
&lt;li role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Gravitee:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; “In the agentic world, secure and trusted transactions demand open protocols. Google’s Open Standard for Agent Payments Protocol (AP2) addresses this need. Gravitee’s Agent Mesh already supports A2A and MCP with a strong focus on security and governance, and we are committed to extending this support so customers in financial services, retail, and beyond can confidently benefit” – &lt;strong&gt;Linus Hakansson, Chief Product Officer at Gravitee &lt;/strong&gt;&lt;/span&gt;&lt;/li&gt;
&lt;li role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Global Fashion Group: &lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;“Integrating A2A and MCP into the Agent Payments Protocol (AP2) enables a modular, interoperable architecture with versioned contracts, making integration and testing straightforward. Modern payments, engineered for scale—secure, seamless, and built to power global commerce.” - &lt;strong&gt;Quy Tran, Director of Engineering at Global Fashion Group&lt;/strong&gt;&lt;/span&gt;&lt;/li&gt;
&lt;li role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Intuit: &lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;“Intuit focuses on enabling the financial success of consumers, businesses, and accountants. We are excited to leverage our AI and data capabilities to help develop the open Agent Payments Protocol (AP2) to create better experiences for all. Our technologists will have the ability to use the protocol to deploy AI agents towards autonomous financial workflows as part of our done-for-you experiences for customers.” – &lt;strong&gt;Tapasvi Moturu, Vice President, Software Engineering at Intuit&lt;/strong&gt;&lt;/span&gt;&lt;/li&gt;
&lt;li role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;JCB:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; "JCB champions Google’s &lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;Agent Payments Protocol&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt; (AP2) initiative as the innovative and important protocol that will unlock a new era of payments, and JCB looks forward to contributing to the protocol to benefit our entire ecosystem, including our banking and payment institution partners, cardmembers, and merchants." - &lt;strong&gt;Shinya Kubotera, Executive Officer &amp;amp; Head of Strategic Innovations at JCB co., Ltd&lt;/strong&gt;&lt;/span&gt;&lt;/li&gt;
&lt;li role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;JusPay:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; "The future of payments is inherently open and interoperable - our work with UPI and the development of Hyperswitch, the world's first open-source payments orchestration platform has demonstrated this power. We believe this new protocol (AP2) provides the secure, shared foundation needed to make AI-driven commerce a reality, and we are ready to contribute our expertise to this initiative."  - &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;Sheetal Lalwani, Co-founder at Juspay&lt;/strong&gt;&lt;/li&gt;
&lt;li role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;KCP: &lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;“NHN KCP endorses the &lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;Agent Payments Protocol&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt; (AP2) as a key advancement in the global payments ecosystem and looks forward to collaborating with global partners to help make AI-based payments more reliable, convenient, and widely adopted.” &lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;–&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt; &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;Jae-wook Noh, Executive Managing Director at NHN KCP  &lt;/strong&gt;&lt;/li&gt;
&lt;li role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Lightspark&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;: “Having worked on payments at Google, I’ve seen how open and verified protocols can unlock powerful network effects. Google’s Agent Payments Protocol (AP2) is a big step toward a future where trusted AI agents transact seamlessly on our behalf. At Lightspark, we’re committed to that vision of open, global interoperability." - &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;Alberto Martin CPO at Lightspark&lt;/strong&gt;&lt;/li&gt;
&lt;li role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;ManusAI:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; “Google's &lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;Agent Payments Protocol&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt; (AP2) represents a breakthrough solution that finally addresses the fundamental monetization challenges we've long faced in the agent ecosystem—enabling seamless, standardized compensation between AI agents while eliminating the unsustainable resource imbalances that have hindered true multi-agent collaboration.” -&lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt; Tao Zhang, CTO at Manus&lt;/strong&gt;&lt;/li&gt;
&lt;li role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Mastercard: &lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;“Mastercard is committed to ongoing, responsible innovation – and we are excited to be collaborating with Google, leading banks, merchants, AI platforms and other industry leaders to help shape the future of agentic commerce. These efforts include critical work with standards bodies such as the FIDO Alliance, where we are advancing verifiable credentials to capture and secure consumers’ intent in this dynamic new context. Together, we’re playing an essential role in securing the payments ecosystem – ensuring that trust and safety remain at the core of every transaction.” –&lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt; Pablo Fourez, Chief Digital Officer at Mastercard &lt;/strong&gt;&lt;/li&gt;
&lt;li role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;MetaMask: &lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;“Blockchains are the natural payment layer for agents, and Ethereum will be the backbone of this. With Agent Payments Protocol (AP2) and x402, MetaMask will deliver maximum interoperability for developers and will enable users to pay agents with full composability and choice—while retaining the security and control of true self-custody” - &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;Marco De Rossi, AI Lead at MetaMask&lt;/strong&gt;&lt;/li&gt;
&lt;li role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Mesh: &lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;“For AI to truly drive commerce, agents need a secure and universal way to handle payments. Google's new Agent Payments Protocol (AP2) is a huge step forward, providing the foundational framework to make this possible. We're proud to support this effort because it unlocks the full potential of agent-led commerce, particularly with programmable assets like crypto. Our technology abstracts away the complexity of the crypto ecosystem, giving agents seamless access to hundreds of wallets and exchanges and supporting over 100 tokens. This ensures payments are not just completed, but are routed through the most efficient paths to guarantee speed and success.” - &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;Bam Azizi, CEO and co-Founder at Mesh&lt;/strong&gt;&lt;/li&gt;
&lt;li role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Mysten Labs: &lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;“Verified agents making purchases on behalf of verified users is the next frontier for AI-powered automation.  Google's Agent Payments Protocol (AP2) combines programmable payments via modern blockchains like Sui with open protocols like A2A and MCP that are enjoying rapid growth. It's the perfect substrate for real-world agentic commerce." - &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;Sam Blackshear, Chief Technology Officer and Co-Founder at Mysten Labs, the original contributor to Sui.&lt;/strong&gt;&lt;/li&gt;
&lt;li role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Nexi: &lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;“We are delighted to partner with Google Cloud on AP2 in order to contribute to shape a fundamental paradigm shift in commerce. As part of our DNA of being European by scale and local by nature, we aim at empowering European merchants to continue to compete on a global scale, while delivering frictionless and personalised online shopping experiences to consumers. Leveraging Agentic AI eCommerce technology from Google Cloud, we will continue to simplify payments for our merchants and partners" – &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;Roberto Catanzaro, Chief Business Officer Merchant Solutions at Nexi&lt;/strong&gt;&lt;/li&gt;
&lt;li role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Okta: &lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;“Extending the A2A protocol into payments is an important step toward building a secure, interoperable foundation for commerce between AI agents. At Auth0, we’re excited to support the Agent Payments Protocol (AP2) and help ensure that future payments between AI agents are both seamless and secure.” – &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;Stephen Lee, Vice President, Technical Strategy and Partnerships at Okta&lt;/strong&gt;&lt;/li&gt;
&lt;li role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Payoneer:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; “At Payoneer, we see enormous potential in AI agents to simplify financial workflows for millions of small businesses. By supporting the &lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;Agent Payments Protocol (AP2)&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;, we’re ensuring agents can collaborate securely and seamlessly, just as our platform connects SMBs worldwide.” – &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;Guy Shalev, Vice President of AI at Payoneer&lt;/strong&gt;&lt;/li&gt;
&lt;li role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;PayPal: &lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;"AP2 provides the critical foundation for trusted agent payments, giving the ecosystem much needed clarity on how to facilitate trusted transactions. PayPal is fully aligned with this vision and excited to build on it, bringing our commerce expertise to help extend these principles across the entire purchase journey.” - &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;Prakhar Mehrotra, SVP and Global Head of AI at PayPal&lt;/strong&gt;&lt;/li&gt;
&lt;li role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;PwC:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; "PwC is committed to fostering innovation with agentic AI that focuses on maintaining trust, safety and privacy for critical tasks like payments and money movement broadly.  We believe the Agent Payments Protocol (AP2) and extension to the Agent2Agent protocol represent a significant leap forward, enhancing safety without compromising information." – &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;Scott Likens, Global / US Chief AI Engineering Officer at PwC&lt;/strong&gt;&lt;/li&gt;
&lt;li role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Salesforce&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;: "With extensive expertise in powering digital commerce, Salesforce is excited to help &lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;businesses harness agentic payments at scale - creating truly frictionless commerce experiences and driving the productivity that is crucial to becoming an Agentic Enterprise today. " – &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;Nitin Mangtani, SVP &amp;amp; GM Commerce &amp;amp; Retail Cloud at Salesforce &lt;/strong&gt;&lt;/li&gt;
&lt;li role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;&lt;strong style="vertical-align: baseline;"&gt;ServiceNow: &lt;/strong&gt;&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;&lt;span style="vertical-align: baseline;"&gt;“Our partnership with Google is focused on unlocking the full potential of the agentic ecosystem. As an early adopter and launch partner of the Agent2Agent protocol, we’re excited to see autonomous AI Agents now empowered to seamlessly conduct eCommerce transactions. Together, we’re advancing the next generation of sales and procurement workflows—rooted in trust, security, and governance—while setting a new standard for how enterprises scale with agentic AI.” &lt;/span&gt;&lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;&lt;span style="vertical-align: baseline;"&gt;- &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;Jon Sigler, EVP &amp;amp; GM, AI Platform, ServiceNow&lt;/strong&gt;&lt;/strong&gt;&lt;/li&gt;
&lt;li role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Shopee: &lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;"At Shopee, we see immense potential for agents to transform e-commerce, and believe that industry protocols such as Google’s Agent Payments Protocol (AP2) will be critical to enabling this future.” - &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;David Chen, Chief Product Officer at Shopee &lt;/strong&gt;&lt;/li&gt;
&lt;li role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Worldpay: &lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;“Worldpay shares Google's vision of an open, interoperable foundation for agentic commerce, built on trust and safety to empower merchants and shoppers. The AP2 protocol represents a meaningful first step in defining how agents, merchants, and payment providers can transact securely, at scale.” – &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;Cindy Turner, Chief Product Officer at Worldpay&lt;/strong&gt;&lt;/li&gt;
&lt;li role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;1password:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; “Open protocols like A2A and AP2 are critical to driving broad adoption of AI while ensuring security and transparency remain foundational. At 1Password, we see support for digital payment credentials as just the beginning. The future is multi-agent, and managing agent access and authorization starts with securing credentials, all while upholding our core security values of privacy, transparency, and trust.” – &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;Anand Srinivas, Vice President, Product &amp;amp; AI at 1Password&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Get started by checking out our public &lt;/span&gt;&lt;a href="http://goo.gle/ap2" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;GitHub repository&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; to see the complete technical specification, documentation, and reference implementations.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;</description><pubDate>Tue, 16 Sep 2025 13:00:00 +0000</pubDate><guid>https://cloud.google.com/blog/products/ai-machine-learning/announcing-agents-to-payments-ap2-protocol/</guid><category>Open Source</category><category>AI &amp; Machine Learning</category><media:content height="540" url="https://storage.googleapis.com/gweb-cloudblog-publish/original_images/AP2_1iaAPko.jpg" width="540"></media:content><og xmlns:og="http://ogp.me/ns#"><type>article</type><title>Powering AI commerce with the new Agent Payments Protocol (AP2)</title><description></description><image>https://storage.googleapis.com/gweb-cloudblog-publish/original_images/AP2_1iaAPko.jpg</image><site_name>Google</site_name><url>https://cloud.google.com/blog/products/ai-machine-learning/announcing-agents-to-payments-ap2-protocol/</url></og><author xmlns:author="http://www.w3.org/2005/Atom"><name>Stavan Parikh</name><title>VP/GM, Payments, Google</title><department></department><company></company></author><author xmlns:author="http://www.w3.org/2005/Atom"><name>Rao Surapaneni</name><title>VP/GM, Business Applications Platform, Google Cloud</title><department></department><company></company></author></item><item><title>OpenTelemetry Protocol comes to Google Cloud Observability</title><link>https://cloud.google.com/blog/products/management-tools/opentelemetry-now-in-google-cloud-observability/</link><description>&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;OpenTelemetry Protocol (OTLP) is a data exchange protocol designed to transport telemetry from a source to a destination in a vendor-agnostic fashion. Today, we’re pleased to announce that Cloud Trace, part of &lt;/span&gt;&lt;a href="https://cloud.google.com/stackdriver/docs"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Google Cloud Observability&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, now supports users &lt;/span&gt;&lt;a href="https://cloud.google.com/trace/docs/migrate-to-otlp-endpoints"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;sending trace data using OTLP&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; via &lt;/span&gt;&lt;a href="http://telemetry.googleapis.com" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;telemetry.googleapis.com&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;. &lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-image_full_width"&gt;






  
    &lt;div class="article-module h-c-page"&gt;
      &lt;div class="h-c-grid"&gt;
  

    &lt;figure class="article-image--large
      
      
        h-c-grid__col
        h-c-grid__col--6 h-c-grid__col--offset-3
        
        
      "
      &gt;

      
      
        
        &lt;img
            src="https://storage.googleapis.com/gweb-cloudblog-publish/images/1_69Q6vSM.max-1000x1000.png"
        
          alt="1"&gt;
        
        &lt;/a&gt;
      
        &lt;figcaption class="article-image__caption "&gt;&lt;p data-block-key="37zow"&gt;Fig 1: Both in-process and collector based configurations can use native OTLP exporters to transmit telemetry data&lt;/p&gt;&lt;/figcaption&gt;
      
    &lt;/figure&gt;

  
      &lt;/div&gt;
    &lt;/div&gt;
  




&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Using OTLP to send telemetry data to observability tooling with these benefits:&lt;/span&gt;&lt;/p&gt;
&lt;ol&gt;
&lt;li aria-level="1" style="list-style-type: decimal; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Vendor-agnostic telemetry pipelines: &lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;Use native OTLP exporters from in-process or collectors. This eliminates the need to use vendor-specific exporters in your telemetry pipelines.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: decimal; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Strong telemetry data integrity:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Ensure your telemetry data preserves the OTel data model during transmission and storage and avoid transformations into proprietary formats.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: decimal; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Interoperability with your choice of observability tooling:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Easily send telemetry to one or more observability backends that support native OTLP without any additional &lt;/span&gt;&lt;a href="https://opentelemetry.io/docs/languages/go/exporters/" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;OTel exporters&lt;/span&gt;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: decimal; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Reduced client-side complexity and resource usage:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Move your telemetry processing logic such as applying filters to the observability backend, reducing the need for custom rules and thus client-side processing overhead.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Let’s take a quick look at how to use OTLP from Cloud Trace. &lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Cloud Trace and OTLP in action&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;a href="https://cloud.google.com/trace/docs/migrate-to-otlp-endpoints"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Sending trace data using OTLP&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; via &lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;telemetry.googleapis.com is now the recommended best practice for both new and existing users — especially for those who expect to send high volumes of trace data.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-image_full_width"&gt;






  
    &lt;div class="article-module h-c-page"&gt;
      &lt;div class="h-c-grid"&gt;
  

    &lt;figure class="article-image--large
      
      
        h-c-grid__col
        h-c-grid__col--6 h-c-grid__col--offset-3
        
        
      "
      &gt;

      
      
        
        &lt;img
            src="https://storage.googleapis.com/gweb-cloudblog-publish/images/2_N3PyrlX.max-1000x1000.png"
        
          alt="2"&gt;
        
        &lt;/a&gt;
      
        &lt;figcaption class="article-image__caption "&gt;&lt;p data-block-key="37zow"&gt;Fig 2: Trace explore page in Cloud Trace highlighting fields that leverage OpenTelemetry semantic conventions&lt;/p&gt;&lt;/figcaption&gt;
      
    &lt;/figure&gt;

  
      &lt;/div&gt;
    &lt;/div&gt;
  




&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;The Trace explorer page makes extensive use of OpenTelemetry conventions to offer a rich user experience when filtering and finding traces of interest. For example, &lt;/span&gt;&lt;/p&gt;
&lt;ol&gt;
&lt;li aria-level="1" style="list-style-type: decimal; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;The OpenTelemetry convention &lt;/span&gt;&lt;a href="http://service.name" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;service.name&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; is used to indicate which services a span is originating from.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: decimal; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;The status of the span is indicated by the OpenTelemetry’s &lt;/span&gt;&lt;a href="https://opentelemetry.io/docs/concepts/signals/traces/#span-status" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;span status&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
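&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;A hypothetical snippet tying both conventions together; the service and span names here ("checkout-service", "charge-card") are illustrative only:&lt;/span&gt;&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;from opentelemetry import trace
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.trace import Status, StatusCode

# service.name is the resource attribute the Trace explorer uses to
# attribute spans to a service.
trace.set_tracer_provider(
    TracerProvider(resource=Resource.create({"service.name": "checkout-service"}))
)
tracer = trace.get_tracer(__name__)

with tracer.start_as_current_span("charge-card") as span:
    try:
        raise RuntimeError("card declined")  # simulate a failure
    except RuntimeError as exc:
        span.record_exception(exc)
        # The span status is what the explorer surfaces when filtering by error.
        span.set_status(Status(StatusCode.ERROR, str(exc)))
&lt;/code&gt;&lt;/pre&gt;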
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Cloud Trace’s internal storage system now uses the OpenTelemetry data model natively for organizing and storing your trace data. The new storage system enables much &lt;/span&gt;&lt;a href="https://cloud.google.com/trace/docs/quotas#telemetry-api-limits"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;higher limits&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; when trace data is sent through telemetry.googleapis.com. Key changes include:&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Attribute sizes:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Attribute keys can now be up to 512 bytes (from 128 bytes), and values up to 64 KiB (from 256 bytes).&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Span details:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Span names can be up to 1024 bytes (from 128 bytes), and spans can have up to 1024 attributes (from 32).&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Event and link counts:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Events per span increase to 256 (from 128), and links per span are now 128.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;We believe sending your trace data using OTLP will result in an better user experience in the trace explorer UI and Observability Analytics, along with the above storage limit increases.&lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Google Cloud’s vision for OTLP&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Providing OTLP support for Cloud Trace is just the beginning. Our vision is to leverage OpenTelemetry to generate, collect, and access telemetry across Google Cloud. Our commitment to OpenTelemetry extends across all telemetry types — traces, metrics, and logs — and is a cornerstone of our strategy to simplify telemetry management and foster an open cloud environment.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;We understand that in today's complex cloud environments, managing telemetry data across disparate systems, inconsistent data formats, and vast volumes of information can lead to observability gaps and increased operational overhead. We are dedicated to streamlining your telemetry pipeline, starting with focusing on native OTLP ingestion for all telemetry types so you can seamlessly send your data to Google Cloud Observability. This will help foster true vendor neutrality and interoperability, eliminating the need for complex conversions or vendor-specific agents.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Beyond seamless ingestion, we're also building capabilities for managed server-side processing, flexible routing to various destinations, and unified management and control over your telemetry across environments. This will further our observability experience with advanced processing and routing capabilities all in one place.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;The introduction of OTLP trace ingestion with telemetry.googleapis.com is a significant first step in this journey. We're continually working to expand our OpenTelemetry support across all telemetry types with additional processing and routing capabilities to provide you with a unified and streamlined observability experience on Google Cloud.&lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Get started today&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;We encourage you to begin using telemetry.googleapis.com for your trace data by following this &lt;/span&gt;&lt;a href="https://cloud.google.com/trace/docs/migrate-to-otlp-endpoints"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;migration guide&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;. This new endpoint offers enhanced capabilities, including higher storage limits and an improved user experience within Cloud Trace Explorer and Observability Analytics.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;</description><pubDate>Fri, 12 Sep 2025 16:00:00 +0000</pubDate><guid>https://cloud.google.com/blog/products/management-tools/opentelemetry-now-in-google-cloud-observability/</guid><category>Open Source</category><category>Management Tools</category><og xmlns:og="http://ogp.me/ns#"><type>article</type><title>OpenTelemetry Protocol comes to Google Cloud Observability</title><description></description><site_name>Google</site_name><url>https://cloud.google.com/blog/products/management-tools/opentelemetry-now-in-google-cloud-observability/</url></og><author xmlns:author="http://www.w3.org/2005/Atom"><name>Sujay Solomon</name><title>Product Manager</title><department></department><company></company></author><author xmlns:author="http://www.w3.org/2005/Atom"><name>Keith Chen</name><title>Product Manager</title><department></department><company></company></author></item><item><title>Automate app deployment and security analysis with new Gemini CLI extensions</title><link>https://cloud.google.com/blog/products/ai-machine-learning/automate-app-deployment-and-security-analysis-with-new-gemini-cli-extensions/</link><description>&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Find and fix security vulnerabilities. Deploy your app to the cloud. All without leaving your command-line. &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span&gt;&lt;span style="vertical-align: baseline;"&gt;Today, we’re closing the gap between your terminal and the cloud with a first look at the future of Gemini CLI, delivered through two new extensions: &lt;/span&gt;&lt;a href="https://github.com/google-gemini/gemini-cli-security/tree/main" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;security extension&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; and &lt;/span&gt;&lt;a href="https://github.com/GoogleCloudPlatform/cloud-run-mcp/?tab=readme-ov-file#use-as-a-gemini-cli-extension" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Cloud Run extension&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;. These extensions are designed to handle critical parts of your workflows with simple, intuitive commands:&lt;/span&gt;&lt;/span&gt;&lt;/p&gt;
&lt;p style="padding-left: 40px;"&gt;&lt;span style="vertical-align: baseline;"&gt;1)  &lt;/span&gt;&lt;strong style="font-style: italic; vertical-align: baseline;"&gt;/security:analyze&lt;/strong&gt;&lt;strong style="vertical-align: baseline;"&gt; &lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;performs a comprehensive scan right in your local repository, with support for GitHub pull requests coming soon. This makes security a natural part of your development cycle.&lt;/span&gt;&lt;/p&gt;
&lt;p style="padding-left: 40px;"&gt;&lt;span style="vertical-align: baseline;"&gt;2)  &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;/deploy&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; deploys your application to Cloud Run, our fully managed serverless platform, in just a few minutes. &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;These commands are the first expression of a new extensibility framework for Gemini CLI. While we'll be sharing more about the full &lt;/span&gt;&lt;a href="https://github.com/google-gemini/gemini-cli/blob/main/docs/extension.md" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Gemini CLI extension&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; world soon, we couldn't wait to get these capabilities into your hands. Consider this a sneak peak of what’s coming next!&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Security extension: automate security analysis with /security:analyze &lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;To help teams address software vulnerabilities early in the development lifecycle, we are launching the &lt;/span&gt;&lt;a href="https://github.com/google-gemini/gemini-cli-security" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Gemini CLI Security extension&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;. This new open-source tool automates security analysis, enabling you to proactively catch and fix issues using the &lt;/span&gt;&lt;code style="vertical-align: baseline;"&gt;/security:analyze &lt;/code&gt;&lt;span style="vertical-align: baseline;"&gt;command at the terminal or through a soon-coming GitHub Actions integration. &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Integrated directly into your local development workflow and CI/CD pipeline, this extension:&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Analyzes code changes:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; When triggered, the extension automatically takes the &lt;/span&gt;&lt;code style="vertical-align: baseline;"&gt;git diff&lt;/code&gt;&lt;span style="vertical-align: baseline;"&gt; of your local changes or pull request.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Identifies vulnerabilities:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Using a specialized prompt and tools, Gemini CLI analyzes the changes for a wide range of potential vulnerabilities, such as hardcoded-secrets, injection vulnerabilities, broken access control, and insecure data handling.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Provides actionable feedback:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Gemini returns a detailed, easy-to-understand report directly in your terminal or as a comment on your pull request. This report doesn't just flag issues; it explains the potential risks and provides concrete suggestions for remediation, helping you fix issues quickly and learn as you go.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;And after the report is generated, you can also ask Gemini CLI to save it to disk or even implement fixes for each issue.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-image_full_width"&gt;






  
    &lt;div class="article-module h-c-page"&gt;
      &lt;div class="h-c-grid"&gt;
  

    &lt;figure class="article-image--large
      
      
        h-c-grid__col
        h-c-grid__col--6 h-c-grid__col--offset-3
        
        
      "
      &gt;

      
      
        
        &lt;img
            src="https://storage.googleapis.com/gweb-cloudblog-publish/original_images/1_Gemini_CLI_Security_Extension_Terminal_Gif.gif"
        
          alt="1 Gemini CLI Security Extension Terminal Gif"&gt;
        
        &lt;/a&gt;
      
    &lt;/figure&gt;

  
      &lt;/div&gt;
    &lt;/div&gt;
  




&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Getting started with /security:analyze&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Integrating security analysis into your workflow is simple. First, download the Gemini CLI and install the extension &lt;/span&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;(requires Gemini CLI v0.4.0+)&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;:&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-code"&gt;&lt;dl&gt;
    &lt;dt&gt;code_block&lt;/dt&gt;
    &lt;dd&gt;&amp;lt;ListValue: [StructValue([(&amp;#x27;code&amp;#x27;, &amp;#x27;gemini extensions install https://github.com/google-gemini/gemini-cli-security&amp;#x27;), (&amp;#x27;language&amp;#x27;, &amp;#x27;&amp;#x27;), (&amp;#x27;caption&amp;#x27;, &amp;lt;wagtail.rich_text.RichText object at 0x7f6dda3b8880&amp;gt;)])]&amp;gt;&lt;/dd&gt;
&lt;/dl&gt;&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Then you can start run your first scan:&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Locally:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; After making local changes, simply run &lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;/security:analyze &lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt; in the Gemini CLI.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;In CI/CD (Coming Soon): &lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;We're bringing security analysis directly into your CI/CD workflow. Soon,&lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt; &lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;you’ll be able to configure the GitHub Action to automatically review pull requests as they are opened.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;This is just the beginning. The team is actively working on further enhancing the extension's capabilities, and we are also inviting the community to contribute to this open source project by reporting bugs, suggesting features, continuously improving security practices and submitting code improvements. &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;For complete documentation and to contribute, visit the &lt;/span&gt;&lt;a href="https://github.com/google-gemini/gemini-cli-security" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;official GitHub repository&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Cloud Run extension: automate deployment with &lt;/strong&gt;&lt;strong style="font-style: italic; vertical-align: baseline;"&gt;/deploy&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;The&lt;/span&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt; &lt;/span&gt;&lt;strong style="font-style: italic; vertical-align: baseline;"&gt;/deploy&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; command in Gemini CLI automates the entire deployment pipeline for your web applications. You can now deploy a project directly from your local workspace. Once you issue the command, Gemini returns a public URL for your live application.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;The &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;/deploy&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; command automates a full CI/CD pipeline to deploy web applications and cloud services from the command line using the &lt;/span&gt;&lt;a href="https://github.com/GoogleCloudPlatform/cloud-run-mcp/" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Cloud Run MCP server&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;. What used to be a multi-step process of building, containerizing, pushing, and configuring is now a single, intuitive command from within the Gemini CLI.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;You can access this feature across three different surfaces – in Gemini CLI in the terminal, in VS Code via &lt;/span&gt;&lt;a href="https://codeassist.google/" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Gemini Code Assist&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; agent mode, and in Gemini CLI in &lt;/span&gt;&lt;a href="https://cloud.google.com/shell/docs"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Cloud Shell&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;. &lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-image_full_width"&gt;






  
    &lt;div class="article-module h-c-page"&gt;
      &lt;div class="h-c-grid"&gt;
  

    &lt;figure class="article-image--large
      
      
        h-c-grid__col
        h-c-grid__col--6 h-c-grid__col--offset-3
        
        
      "
      &gt;

      
      
        
        &lt;img
            src="https://storage.googleapis.com/gweb-cloudblog-publish/original_images/2_aA6mg0y.gif"
        
          alt="2"&gt;
        
        &lt;/a&gt;
      
        &lt;figcaption class="article-image__caption "&gt;&lt;p data-block-key="dvesx"&gt;Use /deploy command in Gemini CLI at the terminal to deploy application to Cloud Run&lt;/p&gt;&lt;/figcaption&gt;
      
    &lt;/figure&gt;

  
      &lt;/div&gt;
    &lt;/div&gt;
  




&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Get started with /deploy:&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;For existing Google Cloud users, getting started with &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;/deploy&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; is straightforward in Gemini CLI at the terminal:&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;Prerequisites:&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt; You'll need the gcloud CLI installed and configured on your machine and have an existing app or use Gemini CLI to create one.&lt;/span&gt;&lt;/p&gt;
&lt;p style="padding-left: 40px;"&gt;&lt;strong style="vertical-align: baseline;"&gt;Step 1: Install the Cloud Run extension&lt;br/&gt;&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;The &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;/deploy&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; command is enabled through a &lt;/span&gt;&lt;a href="https://github.com/GoogleCloudPlatform/cloud-run-mcp" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Model Context Protocol (MCP) server&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, which is included in the Cloud Run extension.  To install the Cloud Run extension &lt;/span&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;(Requires Gemini CLI v0.4.0+)&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;, run this command:  &lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-code"&gt;&lt;dl&gt;
    &lt;dt&gt;code_block&lt;/dt&gt;
    &lt;dd&gt;&amp;lt;ListValue: [StructValue([(&amp;#x27;code&amp;#x27;, &amp;#x27;gemini extensions install https://github.com/GoogleCloudPlatform/cloud-run-mcp&amp;#x27;), (&amp;#x27;language&amp;#x27;, &amp;#x27;&amp;#x27;), (&amp;#x27;caption&amp;#x27;, &amp;lt;wagtail.rich_text.RichText object at 0x7f6dda3b89d0&amp;gt;)])]&amp;gt;&lt;/dd&gt;
&lt;/dl&gt;&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;p style="padding-left: 40px;"&gt;&lt;strong style="vertical-align: baseline;"&gt;Step 2: Authenticate with Google Cloud&lt;br/&gt;&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;Ensure your local environment is authenticated to your Google Cloud account by running:&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-code"&gt;&lt;dl&gt;
    &lt;dt&gt;code_block&lt;/dt&gt;
    &lt;dd&gt;&amp;lt;ListValue: [StructValue([(&amp;#x27;code&amp;#x27;, &amp;#x27;gcloud auth login\r\ngcloud auth application-default login&amp;#x27;), (&amp;#x27;language&amp;#x27;, &amp;#x27;&amp;#x27;), (&amp;#x27;caption&amp;#x27;, &amp;lt;wagtail.rich_text.RichText object at 0x7f6dda3b8580&amp;gt;)])]&amp;gt;&lt;/dd&gt;
&lt;/dl&gt;&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;p style="padding-left: 40px;"&gt;&lt;strong style="vertical-align: baseline;"&gt;Step 3: Deploy your app&lt;br/&gt;&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;Navigate to your application's root directory in your terminal and type &lt;/span&gt;&lt;code style="vertical-align: baseline;"&gt;gemini&lt;/code&gt;&lt;span style="vertical-align: baseline;"&gt; to launch Gemini CLI. Once inside, type &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;/deploy&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; to deploy your app to Cloud Run.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;That's it! In a few moments, Gemini CLI will return a public URL where you can access your newly deployed application. You can also visit the Google Cloud Console to see your new service running in Cloud Run. &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Besides Gemini CLI at the terminal, this feature can also be accessed  in VS Code via Gemini Code Assist &lt;/span&gt;&lt;a href="https://cloud.google.com/gemini/docs/codeassist/release-notes"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;agent mode&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, powered by Gemini CLI,  and in Gemini CLI in Cloud Shell, where the authentication step will be automatically handled out of the box.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-image_full_width"&gt;






  
    &lt;div class="article-module h-c-page"&gt;
      &lt;div class="h-c-grid"&gt;
  

    &lt;figure class="article-image--large
      
      
        h-c-grid__col
        h-c-grid__col--6 h-c-grid__col--offset-3
        
        
      "
      &gt;

      
      
        
        &lt;img
            src="https://storage.googleapis.com/gweb-cloudblog-publish/original_images/3_deploy-agentmode.gif"
        
          alt="3 deploy-agentmode"&gt;
        
        &lt;/a&gt;
      
        &lt;figcaption class="article-image__caption "&gt;&lt;p data-block-key="dvesx"&gt;Use /deploy command to deploy application to Cloud Run in VS Code via Gemini Code Assist agent mode.&lt;/p&gt;&lt;/figcaption&gt;
      
    &lt;/figure&gt;

  
      &lt;/div&gt;
    &lt;/div&gt;
  




&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Building a robust extension ecosystem  &lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;The Security and Cloud Run extensions are two of the first extensions from Google built on our new framework, which is designed to create a rich and open ecosystem for the Gemini CLI. We are building a platform that will allow any developer to extend and customize the CLI's capabilities, and this is just an early preview of the full platform's potential. We will be sharing a more comprehensive look at our extensions platform soon, including how you can start building and sharing your own.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Try Gemini CLI today, visit the GitHub &lt;/span&gt;&lt;a href="http://github.com/google-gemini/gemini-cli" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;here&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;</description><pubDate>Wed, 10 Sep 2025 14:00:00 +0000</pubDate><guid>https://cloud.google.com/blog/products/ai-machine-learning/automate-app-deployment-and-security-analysis-with-new-gemini-cli-extensions/</guid><category>Application Development</category><category>Serverless</category><category>Open Source</category><category>AI &amp; Machine Learning</category><og xmlns:og="http://ogp.me/ns#"><type>article</type><title>Automate app deployment and security analysis with new Gemini CLI extensions</title><description></description><site_name>Google</site_name><url>https://cloud.google.com/blog/products/ai-machine-learning/automate-app-deployment-and-security-analysis-with-new-gemini-cli-extensions/</url></og><author xmlns:author="http://www.w3.org/2005/Atom"><name>Prithpal Bhogill</name><title>Group Product Manager</title><department></department><company></company></author><author xmlns:author="http://www.w3.org/2005/Atom"><name>Evan Otero</name><title>Senior Product Manager</title><department></department><company></company></author></item><item><title>Build with more flexibility: New open models arrive in the Vertex AI Model Garden</title><link>https://cloud.google.com/blog/products/ai-machine-learning/deepseek-r1-is-available-for-everyone-in-vertex-ai-model-garden/</link><description>&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;In our ongoing effort to provide businesses with the flexibility and choice needed to build innovative AI applications, we are expanding the catalog of open models available as Model-as-a-Service (MaaS) offerings in Vertex AI Model Garden. Following the addition of&lt;/span&gt;&lt;a href="https://www.googlecloudcommunity.com/gc/Community-Blogs/Introducing-Llama-4-on-Vertex-AI/ba-p/892578" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt; Llama 4 models&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; earlier this year, we are announcing &lt;/span&gt;&lt;a href="https://console.cloud.google.com/vertex-ai/publishers/deepseek-ai/model-garden/deepseek-r1-0528-maas"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;DeepSeek R1&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; is available for everyone through our Model-as-a-Service (MaaS) offering. This expansion reinforces our commitment to an open AI ecosystem, ensuring our customers can access a diverse range of powerful models to find the one best suited for their specific use case.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Deploying and managing today's large-scale models presents operational and financial challenges. For instance, a large model such as DeepSeek R1 can require an infrastructure of eight advanced H200 GPUs to run inference. For many organizations, procuring and managing such resources is a major undertaking that can divert focus from core application development.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Vertex AI’s MaaS offering is designed to remove this complexity. By providing these models as fully managed, serverless APIs, we eliminate the need for customers to provision or manage the underlying infrastructure. This allows your teams to bypass the complexities of GPU management and focus directly on building and innovating. With Vertex AI, you benefit from a secure, enterprise-grade platform with built-in data privacy and compliance, all under a flexible, pay-as-you-go pricing model that scales with your needs.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-aside"&gt;&lt;dl&gt;
    &lt;dt&gt;aside_block&lt;/dt&gt;
    &lt;dd&gt;&amp;lt;ListValue: [StructValue([(&amp;#x27;title&amp;#x27;, &amp;#x27;$300 in free credit to try Google Cloud AI and ML&amp;#x27;), (&amp;#x27;body&amp;#x27;, &amp;lt;wagtail.rich_text.RichText object at 0x7f6dce696490&amp;gt;), (&amp;#x27;btn_text&amp;#x27;, &amp;#x27;Start building for free&amp;#x27;), (&amp;#x27;href&amp;#x27;, &amp;#x27;http://console.cloud.google.com/freetrial?redirectPath=/vertex-ai/&amp;#x27;), (&amp;#x27;image&amp;#x27;, None)])]&amp;gt;&lt;/dd&gt;
&lt;/dl&gt;&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;h3&gt;&lt;span style="vertical-align: baseline;"&gt;Getting started&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Below we provide a step-by-step guide on how you can use open models available on MaaS. We have used DeepSeek R1 on Vertex AI as an example. It can be accessed both via the UI and API.&lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;span style="vertical-align: baseline;"&gt;1. Enable the DeepSeek API Service&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Navigate to the DeepSeek API Service from the Vertex AI Model Garden and click on the title to open the model card. Then, enable access to the DeepSeek API Service. It may take a few minutes for permissions to propagate after enablement. &lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-image_full_width"&gt;






  
    &lt;div class="article-module h-c-page"&gt;
      &lt;div class="h-c-grid"&gt;
  

    &lt;figure class="article-image--large
      
      
        h-c-grid__col
        h-c-grid__col--6 h-c-grid__col--offset-3
        
        
      "
      &gt;

      
      
        
        &lt;img
            src="https://storage.googleapis.com/gweb-cloudblog-publish/images/1_ypu16Hl.max-1000x1000.png"
        
          alt="1"&gt;
        
        &lt;/a&gt;
      
        &lt;figcaption class="article-image__caption "&gt;&lt;p data-block-key="vj2xu"&gt;DeepSeek API Service from the Vertex AI Model Garden&lt;/p&gt;&lt;/figcaption&gt;
      
    &lt;/figure&gt;

  
      &lt;/div&gt;
    &lt;/div&gt;
  




&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;h3&gt;&lt;span style="vertical-align: baseline;"&gt;2. Try out the model via the UI&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Navigate to the DeepSeek API Service from the Vertex AI Model Garden and click on the tile to open the model card. You can use the UI in the sidebar to test the service. &lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-image_full_width"&gt;






  
    &lt;div class="article-module h-c-page"&gt;
      &lt;div class="h-c-grid"&gt;
  

    &lt;figure class="article-image--large
      
      
        h-c-grid__col
        h-c-grid__col--6 h-c-grid__col--offset-3
        
        
      "
      &gt;

      
      
        
        &lt;img
            src="https://storage.googleapis.com/gweb-cloudblog-publish/images/2_bWuZIG8.max-1000x1000.png"
        
          alt="2"&gt;
        
        &lt;/a&gt;
      
        &lt;figcaption class="article-image__caption "&gt;&lt;p data-block-key="vj2xu"&gt;DeepSeek API Service with UI sidebar to test the service&lt;/p&gt;&lt;/figcaption&gt;
      
    &lt;/figure&gt;

  
      &lt;/div&gt;
    &lt;/div&gt;
  




&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;h3&gt;&lt;span style="vertical-align: baseline;"&gt;3. Try out the model via Vertex AI API&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;To integrate DeepSeek R1 within your applications, you can use either REST API or OpenAI Python API Client Library. &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;Note&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;: For security of your data, DeepSeek MaaS endpoint does not have any outbound internet access. &lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Get Predictions via the REST API&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;You can make API requests via curl from the Cloud Shell or your machine with gcloud credentials configured. Remember to replace the placeholders with this code:&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-code"&gt;&lt;dl&gt;
    &lt;dt&gt;code_block&lt;/dt&gt;
    &lt;dd&gt;&amp;lt;ListValue: [StructValue([(&amp;#x27;code&amp;#x27;, &amp;#x27;export PROJECT_ID=&amp;lt;ENTER_PROJECT_ID&amp;gt;\r\nexport REGION_ID=&amp;lt;ENTER_REGION_ID&amp;gt; \r\n\r\ncurl \\\r\n-X POST \\\r\n-H &amp;quot;Authorization: Bearer $(gcloud auth print-access-token)&amp;quot; \\\r\n-H &amp;quot;Content-Type: application/json&amp;quot; \\\r\n&amp;quot;https://${REGION_ID}-aiplatform.googleapis.com/v1/projects/${PROJECT_ID}/locations/${REGION_ID}/endpoints/openapi/chat/completions&amp;quot; \\\r\n-d \&amp;#x27;{\r\n  &amp;quot;model&amp;quot;: &amp;quot;deepseek-ai/deepseek-r1-0528-maas&amp;quot;,\r\n  &amp;quot;max_tokens&amp;quot;: 200,\r\n  &amp;quot;stream&amp;quot;: true,\r\n  &amp;quot;messages&amp;quot;: [\r\n    {\r\n      &amp;quot;role&amp;quot;: &amp;quot;user&amp;quot;,\r\n      &amp;quot;content&amp;quot;: &amp;quot;which is bigger - 9.11 or 9.9&amp;quot;\r\n    }\r\n  ]\r\n}\&amp;#x27;&amp;#x27;), (&amp;#x27;language&amp;#x27;, &amp;#x27;&amp;#x27;), (&amp;#x27;caption&amp;#x27;, &amp;lt;wagtail.rich_text.RichText object at 0x7f6dda3fe190&amp;gt;)])]&amp;gt;&lt;/dd&gt;
&lt;/dl&gt;&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Get Predictions via the OpenAI Python API Client Library &lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Install the OpenAI Python API Library:&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-code"&gt;&lt;dl&gt;
    &lt;dt&gt;code_block&lt;/dt&gt;
    &lt;dd&gt;&amp;lt;ListValue: [StructValue([(&amp;#x27;code&amp;#x27;, &amp;#x27;pip install openai&amp;#x27;), (&amp;#x27;language&amp;#x27;, &amp;#x27;&amp;#x27;), (&amp;#x27;caption&amp;#x27;, &amp;lt;wagtail.rich_text.RichText object at 0x7f6ddabae820&amp;gt;)])]&amp;gt;&lt;/dd&gt;
&lt;/dl&gt;&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Initialize the client and configure the endpoint URL. To get the access token to use as an API key, you can read more &lt;/span&gt;&lt;a href="https://cloud.google.com/sdk/gcloud/reference/auth/application-default/print-access-token"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;here&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;. If run from a local machine, &lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;GOOGLE_APPLICATION_CREDENTIALS&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt; will authenticate your requests.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-code"&gt;&lt;dl&gt;
    &lt;dt&gt;code_block&lt;/dt&gt;
    &lt;dd&gt;&amp;lt;ListValue: [StructValue([(&amp;#x27;code&amp;#x27;, &amp;#x27;import os\r\nimport openai\r\n\r\nPROJECT_ID = “ENTER_PROJECT_ID”\r\nLOCATION = &amp;quot;us-central1&amp;quot;\r\nMODEL_ID = &amp;quot;deepseek-ai/deepseek-r1-0528-maas&amp;quot;\r\nAPI_KEY = os.environ[&amp;quot;GOOGLE_APPLICATION_CREDENTIALS&amp;quot;] # or add output from gcloud auth print-access-token \r\n\r\ndeepseek_vertex_endpoint_url = (\r\n    f&amp;quot;https://{LOCATION}-aiplatform.googleapis.com/v1beta1/&amp;quot;\r\n    f&amp;quot;projects/{PROJECT_ID}/locations/{LOCATION}/endpoints/openapi&amp;quot;\r\n)\r\n\r\nclient = openai.OpenAI(\r\n    base_url=deepseek_vertex_endpoint_url,\r\n    api_key=API_KEY\r\n)&amp;#x27;), (&amp;#x27;language&amp;#x27;, &amp;#x27;lang-py&amp;#x27;), (&amp;#x27;caption&amp;#x27;, &amp;lt;wagtail.rich_text.RichText object at 0x7f6ddabaea30&amp;gt;)])]&amp;gt;&lt;/dd&gt;
&lt;/dl&gt;&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Make completions requests via the client:&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-code"&gt;&lt;dl&gt;
    &lt;dt&gt;code_block&lt;/dt&gt;
    &lt;dd&gt;&amp;lt;ListValue: [StructValue([(&amp;#x27;code&amp;#x27;, &amp;#x27;response = client.chat.completions.create(\r\n    model=&amp;quot;deepseek-ai/deepseek-r1-0528-maas&amp;quot;,\r\n    messages=[\r\n        {&amp;quot;role&amp;quot;: &amp;quot;system&amp;quot;, &amp;quot;content&amp;quot;: &amp;quot;You are a helpful assistant&amp;quot;},\r\n        {&amp;quot;role&amp;quot;: &amp;quot;user&amp;quot;, &amp;quot;content&amp;quot;: &amp;quot;How many r\&amp;#x27;s are in strawberry ?&amp;quot;},\r\n    ],\r\n    stream=False,\r\n)\r\n\r\nprint(response.choices[0].message.content)\r\n\r\n# ChatCompletion(&amp;quot;id=&amp;quot;&amp;quot;&amp;quot;,\r\n# &amp;quot;choices=&amp;quot;[\r\n#    &amp;quot;Choice(finish_reason=&amp;quot;&amp;quot;length&amp;quot;,\r\n#    index=0,\r\n#    &amp;quot;logprobs=None&amp;quot;,\r\n#    &amp;quot;message=ChatCompletionMessage(content=&amp;quot;&amp;quot;&amp;lt;think&amp;gt;\\nFirst, the question is: \\&amp;quot;How many r\\\\\&amp;#x27;s are in strawberry?\\&amp;quot; I need to count the number of times the letter \\\\\&amp;#x27;r\\\\\&amp;#x27; appears in the word \\&amp;quot;strawberry\\&amp;quot;.\\n\\nLet me write down the word: S-T-R-A&amp;quot;,\r\n#    &amp;quot;refusal=None&amp;quot;,\r\n#    &amp;quot;role=&amp;quot;&amp;quot;assistant&amp;quot;,\r\n#    &amp;quot;annotations=None&amp;quot;,\r\n#    &amp;quot;audio=None&amp;quot;,\r\n#    &amp;quot;function_call=None&amp;quot;,\r\n#    &amp;quot;tool_calls=None))&amp;quot;\r\n# ],\r\n# created=,\r\n# &amp;quot;model=&amp;quot;&amp;quot;deepseek-ai/deepseek-r1-0528-maas&amp;quot;,\r\n# &amp;quot;object=&amp;quot;&amp;quot;chat.completion&amp;quot;,\r\n# &amp;quot;service_tier=None&amp;quot;,\r\n# &amp;quot;system_fingerprint=&amp;quot;&amp;quot;&amp;quot;,\r\n# usage=CompletionUsage(completion_tokens=50,\r\n# prompt_tokens=18,\r\n# total_tokens=68,\r\n# &amp;quot;completion_tokens_details=None&amp;quot;,\r\n# &amp;quot;prompt_tokens_details=None))&amp;quot;&amp;#x27;), (&amp;#x27;language&amp;#x27;, &amp;#x27;lang-py&amp;#x27;), (&amp;#x27;caption&amp;#x27;, &amp;lt;wagtail.rich_text.RichText object at 0x7f6dce78dc40&amp;gt;)])]&amp;gt;&lt;/dd&gt;
&lt;/dl&gt;&lt;/div&gt;
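&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Since reasoning models like R1 can emit a long chain of thought before the final answer, you may prefer to stream the response. A minimal streaming variant of the request above (our sketch, not from the official guide) looks like this:&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-code"&gt;&lt;pre&gt;&lt;code&gt;# Reuses the client and MODEL_ID defined earlier.
stream = client.chat.completions.create(
    model=MODEL_ID,
    messages=[{"role": "user", "content": "which is bigger - 9.11 or 9.9"}],
    stream=True,
)

# Tokens arrive incrementally; print them as they come.
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
print()
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;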
&lt;div class="block-paragraph_advanced"&gt;&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;What's next?&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Vertex AI Model Garden opens up new possibilities for building applications that require state-of-the-art foundation models. Here are some next steps:&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;Review documentation guide for DeepSeek R1 MaaS &lt;/span&gt;&lt;a href="http://cloud.google.com/vertex-ai/generative-ai/docs/maas/deepseek"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;here&lt;/span&gt;&lt;/a&gt; &lt;span style="vertical-align: baseline;"&gt;and Llama MaaS &lt;/span&gt;&lt;a href="https://cloud.google.com/vertex-ai/generative-ai/docs/partner-models/llama"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;here&lt;/span&gt;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;Review pricing &lt;/span&gt;&lt;a href="https://cloud.google.com/vertex-ai/generative-ai/pricing"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;here&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; for both models &lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;Explore the &lt;/span&gt;&lt;a href="https://console.cloud.google.com/vertex-ai/model-garden"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Model Garden&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;: Discover other models available as managed services&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;Build a proof-of-concept: Start with a small project to understand the model's capabilities&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;Join the community: Share your experiences and learn from others in the&lt;/span&gt;&lt;a href="https://www.googlecloudcommunity.com/gc/AI-ML/bd-p/cloud-ai-ml" rel="noopener" style="font-family: -apple-system, BlinkMacSystemFont, 'Segoe UI', Roboto, Oxygen, Ubuntu, Cantarell, 'Open Sans', 'Helvetica Neue', sans-serif;" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt; Google Cloud AI Community&lt;/span&gt;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;&lt;/div&gt;</description><pubDate>Wed, 16 Jul 2025 21:30:00 +0000</pubDate><guid>https://cloud.google.com/blog/products/ai-machine-learning/deepseek-r1-is-available-for-everyone-in-vertex-ai-model-garden/</guid><category>Open Source</category><category>AI &amp; Machine Learning</category><og xmlns:og="http://ogp.me/ns#"><type>article</type><title>Build with more flexibility: New open models arrive in the Vertex AI Model Garden</title><description></description><site_name>Google</site_name><url>https://cloud.google.com/blog/products/ai-machine-learning/deepseek-r1-is-available-for-everyone-in-vertex-ai-model-garden/</url></og><author xmlns:author="http://www.w3.org/2005/Atom"><name>Ivan Nardini</name><title>Developer Relations Engineer</title><department></department><company></company></author><author xmlns:author="http://www.w3.org/2005/Atom"><name>Abhishek Bhagwat</name><title>ML Engineer, Applied AI</title><department></department><company></company></author></item><item><title>Introducing the next generation of AI inference, powered by llm-d</title><link>https://cloud.google.com/blog/products/ai-machine-learning/enhancing-vllm-for-distributed-inference-with-llm-d/</link><description>&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;As the world transitions from prototyping AI solutions to deploying AI at scale, efficient AI inference is becoming the gating factor. Two years ago, the challenge was the ever-growing size of AI models. Cloud infrastructure providers responded by supporting orders of magnitude more compute and data. Today, agentic AI workflows and reasoning models create highly variable demands and another exponential increase in processing, easily bogging down the inference process and degrading the user experience. Cloud infrastructure has to evolve again.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Open-source inference engines such as vLLM are a key part of the solution. At Google Cloud Next 25 in April, we announced full &lt;/span&gt;&lt;a href="https://cloud.google.com/kubernetes-engine/docs/tutorials/serve-vllm-tpu"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;vLLM support for Cloud TPUs&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; in &lt;/span&gt;&lt;a href="https://cloud.google.com/kubernetes-engine"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Google Kubernetes Engine&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; (GKE), Google Compute Engine, Vertex AI, and Cloud Run. Additionally, given the widespread adoption of Kubernetes for orchestrating inference workloads, we &lt;/span&gt;&lt;a href="https://cloud.google.com/blog/products/containers-kubernetes/google-bytedance-and-red-hat-improve-ai-on-kubernetes"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;introduced&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; the open-source Gateway API Inference Extension project to add AI-native routing to Kubernetes, and made it available in our &lt;/span&gt;&lt;a href="https://cloud.google.com/kubernetes-engine/docs/concepts/about-gke-inference-gateway"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;GKE Inference Gateway&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;. Customers like Samsung and BentoML are seeing great results from these solutions. And later this year, customers will be able to use these solutions with our seventh-generation &lt;/span&gt;&lt;a href="https://blog.google/products/google-cloud/ironwood-tpu-age-of-inference/" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Ironwood TPU&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, purpose-built to build and serve reasoning models by scaling to up to 9,216 liquid-cooled chips in a single pod linked with breakthrough Inter-Chip Interconnect (ICI). But, there’s opportunity for even more innovation and value.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-aside"&gt;&lt;dl&gt;
    &lt;dt&gt;aside_block&lt;/dt&gt;
    &lt;dd&gt;&amp;lt;ListValue: [StructValue([(&amp;#x27;title&amp;#x27;, &amp;#x27;$300 in free credit to try Google Cloud AI and ML&amp;#x27;), (&amp;#x27;body&amp;#x27;, &amp;lt;wagtail.rich_text.RichText object at 0x7f6ddc2d82b0&amp;gt;), (&amp;#x27;btn_text&amp;#x27;, &amp;#x27;Start building for free&amp;#x27;), (&amp;#x27;href&amp;#x27;, &amp;#x27;http://console.cloud.google.com/freetrial?redirectPath=/vertex-ai/&amp;#x27;), (&amp;#x27;image&amp;#x27;, None)])]&amp;gt;&lt;/dd&gt;
&lt;/dl&gt;&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Today, we’re making inference even easier and more cost-effective, by making vLLM fully scalable with Kubernetes-native distributed and disaggregated inference. This new &lt;/span&gt;&lt;a href="http://github.com/llm-d/llm-d/" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;project&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; is called llm-d. Google Cloud is a founding contributor alongside Red Hat, IBM Research, NVIDIA, and CoreWeave, joined by other industry leaders AMD, Cisco, Hugging Face, Intel, Lambda, and Mistral AI. Google has a long history of founding and contributing to key open-source projects that have shaped the cloud, such as &lt;/span&gt;&lt;a href="http://kubernetes.io" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Kubernetes&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, &lt;/span&gt;&lt;a href="https://cloud.google.com/blog/products/ai-machine-learning/guide-to-jax-for-pytorch-developers?e=48754805"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;JAX&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, and &lt;/span&gt;&lt;a href="https://istio.io/" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Istio&lt;/span&gt;&lt;/a&gt;,&lt;span style="vertical-align: baseline;"&gt; and is committed to being the best platform for AI development. We believe that making llm-d open-source, and community-led, is the best way to make it widely available, so you can run it everywhere and know that a strong community supports it.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;llm-d builds upon vLLM’s highly efficient inference engine, adding Google’s proven technology and extensive experience in securely and cost-effectively serving AI at billion-user scale. &lt;/span&gt;&lt;a href="https://github.com/llm-d/llm-d?tab=readme-ov-file#-architecture" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;llm-d includes three major innovations&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;: First, instead of traditional round-robin load balancing, llm-d &lt;span style="vertical-align: baseline;"&gt;includes a vLLM-aware inference scheduler, which enables routing requests to instances with prefix-cache hits and low load, achieving latency SLOs with fewer hardware resources&lt;/span&gt;. Second, to serve longer requests &lt;span style="vertical-align: baseline;"&gt;with higher throughput and lower latency&lt;/span&gt;, llm-d supports disaggregated serving, which handles the prefill and decode stages of LLM inference with independent instances. Third, llm-d &lt;span style="vertical-align: baseline;"&gt;introduces a multi-tier&lt;/span&gt; &lt;span style="vertical-align: baseline;"&gt;KV cache for intermediate values (prefixes) &lt;/span&gt;to improve response time across different storage tiers and reduce storage costs. llm-d works across frameworks (PyTorch today, JAX later this year), and both GPU and TPU accelerators, to provide choice and flexibility.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
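&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;To get a feel for how you might consume an llm-d deployment, here is a minimal, hypothetical Python sketch. It assumes your llm-d gateway exposes the OpenAI-compatible API served by vLLM; the gateway address and model name are placeholders for your own deployment, not values defined by the project.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-code"&gt;&lt;dl&gt;
    &lt;dt&gt;code_block&lt;/dt&gt;
    &lt;dd&gt;&amp;lt;ListValue: [StructValue([(&amp;#x27;code&amp;#x27;, &amp;#x27;# Hypothetical sketch: query an llm-d deployment through its gateway,\r\n# assuming it exposes the OpenAI-compatible API served by vLLM.\r\n# GATEWAY_ADDRESS and MODEL_NAME are placeholders for your deployment.\r\nfrom openai import OpenAI\r\n\r\nclient = OpenAI(base_url=&amp;quot;http://GATEWAY_ADDRESS/v1&amp;quot;, api_key=&amp;quot;unused&amp;quot;)\r\n\r\nresponse = client.chat.completions.create(\r\n    model=&amp;quot;MODEL_NAME&amp;quot;,\r\n    messages=[{&amp;quot;role&amp;quot;: &amp;quot;user&amp;quot;, &amp;quot;content&amp;quot;: &amp;quot;Summarize llm-d in one sentence.&amp;quot;}],\r\n    max_tokens=64,\r\n)\r\nprint(response.choices[0].message.content)&amp;#x27;), (&amp;#x27;language&amp;#x27;, &amp;#x27;&amp;#x27;), (&amp;#x27;caption&amp;#x27;, None)])]&amp;gt;&lt;/dd&gt;
&lt;/dl&gt;&lt;/div&gt;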
&lt;div class="block-image_full_width"&gt;






  
    &lt;div class="article-module h-c-page"&gt;
      &lt;div class="h-c-grid"&gt;
  

    &lt;figure class="article-image--large
      
      
        h-c-grid__col
        h-c-grid__col--6 h-c-grid__col--offset-3
        
        
      "
      &gt;

      
      
        
        &lt;img
            src="https://storage.googleapis.com/gweb-cloudblog-publish/images/llm-d_stack_v1.max-1000x1000.jpg"
        
          alt="llm-d stack"&gt;
        
        &lt;/a&gt;
      
    &lt;/figure&gt;

  
      &lt;/div&gt;
    &lt;/div&gt;
  




&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;We are excited to partner with the community to help you cost-effectively scale AI in your business. llm-d incorporates state-of-the-art distributed serving technologies into an easily deployed Kubernetes stack. Deploying llm-d on Google Cloud provides low-latency and high-performance inference by leveraging Google Cloud’s vast global network, GKE AI capabilities, and AI Hypercomputer integrations across software and hardware accelerators. Early tests by Google Cloud using llm-d show 2x improvements in time-to-first-token for use cases like code completion, enabling more responsive applications.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Visit the &lt;/span&gt;&lt;a href="https://github.com/llm-d/llm-d" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;llm-d project&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; to learn more, contribute, and get started today.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;</description><pubDate>Tue, 20 May 2025 12:00:00 +0000</pubDate><guid>https://cloud.google.com/blog/products/ai-machine-learning/enhancing-vllm-for-distributed-inference-with-llm-d/</guid><category>AI Hypercomputer</category><category>Compute</category><category>Open Source</category><category>AI &amp; Machine Learning</category><og xmlns:og="http://ogp.me/ns#"><type>article</type><title>Introducing the next generation of AI inference, powered by llm-d</title><description></description><site_name>Google</site_name><url>https://cloud.google.com/blog/products/ai-machine-learning/enhancing-vllm-for-distributed-inference-with-llm-d/</url></og><author xmlns:author="http://www.w3.org/2005/Atom"><name>Mark Lohmeyer</name><title>VP and GM, AI and Computing Infrastructure</title><department></department><company></company></author><author xmlns:author="http://www.w3.org/2005/Atom"><name>Gabe Monroy</name><title>VP &amp; GM, Cloud Runtimes</title><department></department><company></company></author><author xmlns:author="http://www.w3.org/2005/Atom"><name>Gabe Monroy</name><title>VP &amp; GM, Cloud Runtimes</title><department></department><company></company></author></item><item><title>How to deploy serverless AI with Gemma 3 on Cloud Run</title><link>https://cloud.google.com/blog/products/ai-machine-learning/serverless-ai-with-gemma-3-on-cloud-run/</link><description>&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;a href="http://blog.google/technology/developers/gemma-3" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Today, we introduced Gemma 3&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, a family of lightweight, open models built with the cutting-edge technology behind Gemini 2.0. The Gemma 3 family of models have been designed for speed and portability, empowering developers to build sophisticated AI applications at scale. Combined with Cloud Run, it has never been easier to deploy your serverless workloads with AI models.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;In this post, we’ll explore the functionalities of Gemma 3, and how you can run it on &lt;/span&gt;&lt;a href="http://cloud.run" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Cloud Run&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;. &lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Gemma 3: Power and efficiency for Cloud deployments&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Gemma 3 is engineered for exceptional performance with lower memory footprints, making it ideal for cost-effective inference workloads. &lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;The world's best single-accelerator model: &lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;Gemma 3 delivers optimal performance for its size, outperforming Llama-405B, DeepSeek-V3, and o3-mini in preliminary human preference evaluations on &lt;/span&gt;&lt;a href="https://goo.gle/Gemma3Report" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;LMArena’s leaderboard&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;. This helps you create engaging user experiences with a model that fits on a single GPU or TPU.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Create AI with advanced text and visual reasoning capabilities: &lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;Easily build applications that analyze images, text and short videos, opening up possibilities for interactive applications.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Handle complex tasks with a large context window:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Gemma 3 offers a 128k-token context window to let your applications process and understand vast amounts of information — even entire novels — enabling more sophisticated AI capabilities.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;&lt;/div&gt;
&lt;div class="block-aside"&gt;&lt;dl&gt;
    &lt;dt&gt;aside_block&lt;/dt&gt;
    &lt;dd&gt;&amp;lt;ListValue: [StructValue([(&amp;#x27;title&amp;#x27;, &amp;#x27;$300 in free credit to try Google Cloud AI and ML&amp;#x27;), (&amp;#x27;body&amp;#x27;, &amp;lt;wagtail.rich_text.RichText object at 0x7f6ddabde280&amp;gt;), (&amp;#x27;btn_text&amp;#x27;, &amp;#x27;Start building for free&amp;#x27;), (&amp;#x27;href&amp;#x27;, &amp;#x27;http://console.cloud.google.com/freetrial?redirectPath=/vertex-ai/&amp;#x27;), (&amp;#x27;image&amp;#x27;, None)])]&amp;gt;&lt;/dd&gt;
&lt;/dl&gt;&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Serverless inference with Gemma 3 and Cloud Run&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Gemma 3 is a great fit for inference workloads on Cloud Run using NVIDIA L4 GPUs. Cloud Run is Google Cloud's fully managed serverless platform, letting developers run container workloads without managing the underlying infrastructure. Models scale to zero when inactive, and scale dynamically with demand. Not only does this optimize costs and performance, but you pay only for what you use. &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;For example, you could host an LLM on one Cloud Run service and a chat agent on another, enabling independent scaling and management. And with GPU acceleration, a Cloud Run service can be ready with the first AI inference &lt;/span&gt;&lt;a href="https://cloud.google.com/blog/products/application-development/run-your-ai-inference-applications-on-cloud-run-with-nvidia-gpus"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;results in under 30 seconds, with only 5 seconds to start an instance&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;. This rapid deployment ensures that your applications deliver responsive user experiences. We also reduced the GPU price in Cloud Run to ~$0.60/hr. And of course, if your service isn't receiving requests, it will scale down to zero.&lt;/span&gt;&lt;/p&gt;
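&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;As an illustration of that two-service pattern, here is a minimal, hypothetical Python sketch of a chat-agent service calling a separate Cloud Run service that hosts Gemma 3 via Ollama. The service URL and model tag are placeholders, and the sketch assumes Ollama’s standard /api/generate endpoint.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-code"&gt;&lt;dl&gt;
    &lt;dt&gt;code_block&lt;/dt&gt;
    &lt;dd&gt;&amp;lt;ListValue: [StructValue([(&amp;#x27;code&amp;#x27;, &amp;#x27;# Hypothetical sketch: one Cloud Run service calling another that\r\n# hosts Gemma 3 via Ollama. SERVICE_URL is a placeholder.\r\nimport requests\r\n\r\nresponse = requests.post(\r\n    &amp;quot;https://SERVICE_URL/api/generate&amp;quot;,\r\n    json={&amp;quot;model&amp;quot;: &amp;quot;gemma3&amp;quot;, &amp;quot;prompt&amp;quot;: &amp;quot;Say hello!&amp;quot;, &amp;quot;stream&amp;quot;: False},\r\n    timeout=120,\r\n)\r\nprint(response.json()[&amp;quot;response&amp;quot;])&amp;#x27;), (&amp;#x27;language&amp;#x27;, &amp;#x27;&amp;#x27;), (&amp;#x27;caption&amp;#x27;, None)])]&amp;gt;&lt;/dd&gt;
&lt;/dl&gt;&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;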
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Get started today&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Cloud Run and Gemma 3 combine to create a powerful, cost-effective, and scalable solution for deploying advanced AI applications. Gemma 3 is supported by a variety of tools and frameworks, such as &lt;/span&gt;&lt;a href="https://huggingface.co/" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Hugging Face Transformers&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, &lt;/span&gt;&lt;a href="https://ollama.com/" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Ollama&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, and &lt;/span&gt;&lt;a href="https://docs.vllm.ai/" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;vLLM&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;To get started, visit &lt;/span&gt;&lt;a href="https://cloud.google.com/run/docs/tutorials/gpu-gemma-with-ollama"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;this guide&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; which will show you how to build a service with Gemma 3 on Cloud Run with Ollama.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;</description><pubDate>Wed, 12 Mar 2025 07:30:00 +0000</pubDate><guid>https://cloud.google.com/blog/products/ai-machine-learning/serverless-ai-with-gemma-3-on-cloud-run/</guid><category>Open Source</category><category>AI &amp; Machine Learning</category><og xmlns:og="http://ogp.me/ns#"><type>article</type><title>How to deploy serverless AI with Gemma 3 on Cloud Run</title><description></description><site_name>Google</site_name><url>https://cloud.google.com/blog/products/ai-machine-learning/serverless-ai-with-gemma-3-on-cloud-run/</url></og><author xmlns:author="http://www.w3.org/2005/Atom"><name>James Ma</name><title>Sr. Product Manager</title><department></department><company></company></author><author xmlns:author="http://www.w3.org/2005/Atom"><name>Vlad Kolesnikov</name><title>Developer Relations Engineer</title><department></department><company></company></author></item><item><title>Meet Kubernetes History Inspector, a log visualization tool for Kubernetes clusters</title><link>https://cloud.google.com/blog/products/containers-kubernetes/kubernetes-history-inspector-visualizes-cluster-logs/</link><description>&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Kubernetes, the container orchestration platform, is inherently a complex, distributed system. While it provides resilience and scalability, it can also introduce operational complexities, particularly when troubleshooting. Even with Kubernetes' self-healing capabilities, identifying the root cause of an issue often requires deep dives into the logs of various independent components.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;At Google Cloud, our engineers have been directly confronting this Kubernetes troubleshooting challenge for years as we support large-scale, complex deployments. In fact, the Google Cloud Support team has developed deep expertise in diagnosing issues within Kubernetes environments through routinely analyzing a vast number of customer support tickets, diving into user environments, and leveraging our collective knowledge to pinpoint the root causes of problems. To address this pervasive challenge, the team developed an internal tool: the &lt;/span&gt;&lt;a href="https://github.com/GoogleCloudPlatform/khi" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Kubernetes History Inspector (KHI)&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, and today, we’ve released it as open source for the community. &lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;The Kubernetes troubleshooting challenge&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;In Kubernetes, each pod, deployment, service, node, and control-plane component generates its own stream of logs. Effective troubleshooting requires collecting, correlating, and analyzing these disparate log streams. But manually configuring logging for each of these components can be a significant burden, requiring careful attention to detail and a thorough understanding of the Kubernetes ecosystem. Fortunately, managed Kubernetes services such as &lt;/span&gt;&lt;a href="https://cloud.google.com/kubernetes-engine"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Google Kubernetes Engine&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; (GKE) simplify log collection. For example, GKE offers built-in integration with &lt;/span&gt;&lt;a href="https://cloud.google.com/logging"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Cloud Logging&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, aggregating logs from all parts of the Kubernetes environment. This centralized repository is a crucial first step.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;However, simply collecting the logs solves only half the problem. The real challenge lies in analyzing them effectively. Many issues you’ll encounter in a Kubernetes deployment are not revealed by a single, obvious error message. Instead, they manifest as a chain of events, requiring a deep understanding of the causal relationships between numerous log entries across multiple components.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Consider the scale: a moderately sized Kubernetes cluster can easily generate gigabytes of log data, comprising tens of thousands of individual entries, within a short timeframe. Manually sifting through this volume of data to identify the root cause of a performance degradation, intermittent failure, or configuration error is, at best, incredibly time-consuming, and at worst, practically impossible for human operators. The signal-to-noise ratio alone makes this incredibly challenging.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-aside"&gt;&lt;dl&gt;
    &lt;dt&gt;aside_block&lt;/dt&gt;
    &lt;dd&gt;&amp;lt;ListValue: [StructValue([(&amp;#x27;title&amp;#x27;, &amp;#x27;$300 in free credit to try Google Cloud containers and Kubernetes&amp;#x27;), (&amp;#x27;body&amp;#x27;, &amp;lt;wagtail.rich_text.RichText object at 0x7f6ddbf22130&amp;gt;), (&amp;#x27;btn_text&amp;#x27;, &amp;#x27;Start building for free&amp;#x27;), (&amp;#x27;href&amp;#x27;, &amp;#x27;http://console.cloud.google.com/freetrial?redirectpath=/marketplace/product/google/container.googleapis.com&amp;#x27;), (&amp;#x27;image&amp;#x27;, None)])]&amp;gt;&lt;/dd&gt;
&lt;/dl&gt;&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Introducing the Kubernetes History Inspector&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;KHI is a powerful tool that analyzes logs collected by Cloud Logging, extracts state information for each component, and visualizes it in a chronological timeline. Furthermore, KHI links this timeline back to the raw log data, allowing you to track how each element evolved over time.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;The Google Cloud Support team often assists users in critical, time-sensitive situations. A tool that requires lengthy setup or agent installation would be impractical. That's why we packaged KHI as a container image — it requires no prior setup, and is ready to be launched with a single command.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;It's easier to show than to tell. Imagine a scenario where end users are reporting "Connection Timed Out" errors on a service running on your GKE cluster. Launching KHI, you might see something like this:&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-image_full_width"&gt;






  
    &lt;div class="article-module h-c-page"&gt;
      &lt;div class="h-c-grid"&gt;
  

    &lt;figure class="article-image--large
      
      
        h-c-grid__col
        h-c-grid__col--6 h-c-grid__col--offset-3
        
        
      "
      &gt;

      
      
        
        &lt;img
            src="https://storage.googleapis.com/gweb-cloudblog-publish/original_images/1-Launched_aHPBxar.jpg"
        
          alt="1-Launched"&gt;
        
        &lt;/a&gt;
      
    &lt;/figure&gt;

  
      &lt;/div&gt;
    &lt;/div&gt;
  




&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;First, notice the colorful, horizontal rectangles on the left. These represent the state changes of individual components over time, extracted from the logs – the timeline. This timeline provides a macroscopic view of your Kubernetes environment. In contrast, the right side of the interface displays microscopic details: raw logs, manifests, and their historical changes related to the component selected in the timeline. By providing both macroscopic and microscopic perspectives, KHI makes it easy to explore your logs.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Now, let's go back to our hypothetical problem. Notice the alternating green and orange sections in the "Ready" row of the timeline:  &lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-image_full_width"&gt;






  
    &lt;div class="article-module h-c-page"&gt;
      &lt;div class="h-c-grid"&gt;
  

    &lt;figure class="article-image--large
      
      
        h-c-grid__col
        h-c-grid__col--6 h-c-grid__col--offset-3
        
        
      "
      &gt;

      
      
        
        &lt;img
            src="https://storage.googleapis.com/gweb-cloudblog-publish/images/2-Timeline_rjRvqHn.max-1000x1000.jpg"
        
          alt="2-Timeline"&gt;
        
        &lt;/a&gt;
      
    &lt;/figure&gt;

  
      &lt;/div&gt;
    &lt;/div&gt;
  




&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;This indicates that the readiness probe is fluctuating between failure (orange) and success (green). That's a smoking gun! You now know exactly where to focus your troubleshooting efforts.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;KHI also excels at visualizing the relationships between components at any given point in the past. The complex interdependencies within a Kubernetes cluster are presented in a clear, understandable way.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-image_full_width"&gt;






  
    &lt;div class="article-module h-c-page"&gt;
      &lt;div class="h-c-grid"&gt;
  

    &lt;figure class="article-image--large
      
      
        h-c-grid__col
        h-c-grid__col--6 h-c-grid__col--offset-3
        
        
      "
      &gt;

      
      
        
        &lt;img
            src="https://storage.googleapis.com/gweb-cloudblog-publish/images/3-Diagram_N0kjEVd.max-1000x1000.jpg"
        
          alt="3-Diagram"&gt;
        
        &lt;/a&gt;
      
    &lt;/figure&gt;

  
      &lt;/div&gt;
    &lt;/div&gt;
  




&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;What’s next for KHI and Kubernetes troubleshooting&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;We've only scratched the surface of what KHI can do. There's a lot more under the hood: how the timeline colors actually work, what those little diamond markers mean, and many other features that can speed up your troubleshooting. To make this available to everyone, we open-sourced KHI.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;For detailed specifications, a full explanation of the visual elements, and instructions on how to deploy KHI on your own managed Kubernetes cluster, visit the &lt;/span&gt;&lt;a href="https://github.com/GoogleCloudPlatform/khi" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;KHI GitHub page&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;. Currently, KHI works only with GKE and Kubernetes clusters on Google Cloud that use Cloud Logging, but we plan to extend it to vanilla open-source Kubernetes setups soon.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;While KHI represents a significant leap forward in Kubernetes log analysis, it's designed to amplify your existing expertise, not replace it. Effective troubleshooting still requires a solid understanding of Kubernetes concepts and your application's architecture. KHI helps you, the engineer, navigate the complexity by providing a powerful map to view your logs to diagnose issues more quickly and efficiently.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;KHI is just the first step in our ongoing commitment to simplifying Kubernetes operations. We're excited to see how the community uses and extends KHI to build a more observable and manageable future for containerized applications. The journey to simplify Kubernetes troubleshooting is ongoing, and we invite you to join us.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;</description><pubDate>Fri, 07 Mar 2025 17:00:00 +0000</pubDate><guid>https://cloud.google.com/blog/products/containers-kubernetes/kubernetes-history-inspector-visualizes-cluster-logs/</guid><category>Management Tools</category><category>Open Source</category><category>Containers &amp; Kubernetes</category><og xmlns:og="http://ogp.me/ns#"><type>article</type><title>Meet Kubernetes History Inspector, a log visualization tool for Kubernetes clusters</title><description></description><site_name>Google</site_name><url>https://cloud.google.com/blog/products/containers-kubernetes/kubernetes-history-inspector-visualizes-cluster-logs/</url></og><author xmlns:author="http://www.w3.org/2005/Atom"><name>Kakeru Ishii</name><title>Technical Solutions Engineer</title><department></department><company></company></author><author xmlns:author="http://www.w3.org/2005/Atom"><name>Takeie Torinomi</name><title>Technical Solutions Engineer</title><department></department><company></company></author></item><item><title>Introducing agent evaluation in Vertex AI Gen AI evaluation service</title><link>https://cloud.google.com/blog/products/ai-machine-learning/introducing-agent-evaluation-in-vertex-ai-gen-ai-evaluation-service/</link><description>&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Comprehensive agent evaluation is essential for building the next generation of reliable AI. It's not enough to simply check the outputs; we need to understand the "why" behind an agent's actions – its reasoning, decision-making process, and the path it takes to reach a solution.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;That’s why today, we’re thrilled to announce that agent evaluation in &lt;/span&gt;&lt;a href="https://cloud.google.com/vertex-ai/generative-ai/docs/models/evaluation-overview"&gt;&lt;strong style="text-decoration: underline; vertical-align: baseline;"&gt;Vertex AI Gen AI evaluation service&lt;/strong&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; is now in public preview. This new feature empowers developers to rigorously assess and understand their AI agents. It includes a powerful set of evaluation metrics specifically designed for agents built with different frameworks, and provides native agent inference capabilities to streamline the evaluation process.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;In this post, we’ll explore how evaluation metrics work and share an example of how you can apply this to your agents.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-aside"&gt;&lt;dl&gt;
    &lt;dt&gt;aside_block&lt;/dt&gt;
    &lt;dd&gt;&amp;lt;ListValue: [StructValue([(&amp;#x27;title&amp;#x27;, &amp;#x27;$300 in free credit to try Google Cloud AI and ML&amp;#x27;), (&amp;#x27;body&amp;#x27;, &amp;lt;wagtail.rich_text.RichText object at 0x7f6dd88568e0&amp;gt;), (&amp;#x27;btn_text&amp;#x27;, &amp;#x27;Start building for free&amp;#x27;), (&amp;#x27;href&amp;#x27;, &amp;#x27;http://console.cloud.google.com/freetrial?redirectPath=/vertex-ai/&amp;#x27;), (&amp;#x27;image&amp;#x27;, None)])]&amp;gt;&lt;/dd&gt;
&lt;/dl&gt;&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Evaluate agents using Vertex AI Gen AI evaluation service&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Our evaluation metrics can be grouped into two categories: final response and trajectory evaluation. &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Final response&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; asks a simple question: does your agent achieve its goals? You can define custom final response criteria to measure success according to your specific needs. For example, you can assess whether a retail chatbot provides accurate product information or if a research agent summarizes findings effectively, &lt;/span&gt;&lt;a href="https://cloud.google.com/vertex-ai/generative-ai/docs/models/evaluation-overview"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;using appropriate tone and style&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;To look below the surface, we offer &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;trajectory evaluation &lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;to analyze the agent's decision-making process. Trajectory evaluation is crucial for understanding your agent’s reasoning, identifying potential errors or inefficiencies, and ultimately improving performance. We offer six trajectory evaluation metrics to help you answer these questions:&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;1. Exact match:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Requires the AI agent to produce a sequence of actions (a "trajectory") that perfectly mirrors the ideal solution. &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;2. In-order match:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; The agent's trajectory needs to include all the necessary actions in the correct order, but it might also include extra, unnecessary steps. Imagine following a recipe correctly but adding a few extra spices along the way.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;3. Any-order match:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Even more flexible, this metric only cares that the agent's trajectory includes all the necessary actions, regardless of their order. It's like reaching your destination, regardless of the route you take.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;4. Precision:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; This metric focuses on the accuracy of the agent's actions. It calculates the proportion of actions in the predicted trajectory that are also present in the reference trajectory. A high precision means the agent is making mostly relevant actions.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;5. Recall:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; This metric measures the agent's ability to capture all the essential actions. It calculates the proportion of actions in the reference trajectory that are also present in the predicted trajectory. A high recall means the agent is unlikely to miss crucial steps.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;6. Single-tool use:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; This metric checks for the presence of a specific action within the agent's trajectory. It's useful for assessing whether an agent has learned to utilize a particular tool or capability.&lt;/span&gt;&lt;/p&gt;
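&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;To make precision and recall concrete, here is a small, illustrative Python sketch of how these two metrics can be computed over tool-call sequences. This is our own simplified illustration, not the service’s implementation: it treats a trajectory as a plain list of tool names and ignores tool arguments.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-code"&gt;&lt;dl&gt;
    &lt;dt&gt;code_block&lt;/dt&gt;
    &lt;dd&gt;&amp;lt;ListValue: [StructValue([(&amp;#x27;code&amp;#x27;, &amp;#x27;# Illustrative only: simplified trajectory precision and recall,\r\n# treating a trajectory as a plain list of tool names.\r\ndef trajectory_precision(predicted, reference):\r\n    # Share of predicted actions that also appear in the reference.\r\n    if not predicted:\r\n        return 0.0\r\n    return sum(1 for action in predicted if action in reference) / len(predicted)\r\n\r\ndef trajectory_recall(predicted, reference):\r\n    # Share of reference actions that the agent actually took.\r\n    if not reference:\r\n        return 0.0\r\n    return sum(1 for action in reference if action in predicted) / len(reference)\r\n\r\npredicted = [&amp;quot;get_product_details&amp;quot;, &amp;quot;get_product_price&amp;quot;, &amp;quot;small_talk&amp;quot;]\r\nreference = [&amp;quot;get_product_details&amp;quot;, &amp;quot;get_product_price&amp;quot;]\r\nprint(trajectory_precision(predicted, reference))  # ~0.67\r\nprint(trajectory_recall(predicted, reference))     # 1.0&amp;#x27;), (&amp;#x27;language&amp;#x27;, &amp;#x27;&amp;#x27;), (&amp;#x27;caption&amp;#x27;, None)])]&amp;gt;&lt;/dd&gt;
&lt;/dl&gt;&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;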
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Compatibility meets flexibility &lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Vertex AI Gen AI evaluation service supports a variety of agent architectures. &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;With today’s launch, you can evaluate agents built with Reasoning Engine (LangChain on Vertex AI), the managed runtime for your agentic applications on Vertex AI. We also support agents built with open-source frameworks, including LangChain, LangGraph, and CrewAI, and we plan to support upcoming Google Cloud services for building agents. &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;For maximum flexibility, you can evaluate agents using a custom function that processes prompts and returns responses, as sketched below. To make your evaluation experience easier, we offer native agent inference and automatically log all results in Vertex AI Experiments. &lt;/span&gt;&lt;/p&gt;
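&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Such a custom function might look like the following hypothetical sketch; the return keys are our own assumptions chosen to mirror the dataset fields described later in this post, not a prescribed signature.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-code"&gt;&lt;dl&gt;
    &lt;dt&gt;code_block&lt;/dt&gt;
    &lt;dd&gt;&amp;lt;ListValue: [StructValue([(&amp;#x27;code&amp;#x27;, &amp;#x27;# Hypothetical sketch of a custom agent function for evaluation.\r\n# The return keys (&amp;quot;response&amp;quot;, &amp;quot;predicted_trajectory&amp;quot;) are assumed\r\n# here for illustration; adapt them to your setup.\r\ndef run_my_agent(prompt: str) -&amp;gt; dict:\r\n    # Call your agent however you like (local code, an API, etc.).\r\n    trajectory = [{&amp;quot;tool_name&amp;quot;: &amp;quot;get_product_details&amp;quot;, &amp;quot;tool_input&amp;quot;: {&amp;quot;product&amp;quot;: prompt}}]\r\n    answer = f&amp;quot;Here is what I found for {prompt}.&amp;quot;\r\n    return {&amp;quot;response&amp;quot;: answer, &amp;quot;predicted_trajectory&amp;quot;: trajectory}&amp;#x27;), (&amp;#x27;language&amp;#x27;, &amp;#x27;&amp;#x27;), (&amp;#x27;caption&amp;#x27;, None)])]&amp;gt;&lt;/dd&gt;
&lt;/dl&gt;&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;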
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Agent evaluation in action&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Let's say you have the following LangGraph customer support agent, and you aim to assess both the responses it generates and the sequence of actions (or "trajectory") it undertakes to produce those responses.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-image_full_width"&gt;






  
    &lt;div class="article-module h-c-page"&gt;
      &lt;div class="h-c-grid"&gt;
  

    &lt;figure class="article-image--large
      
      
        h-c-grid__col
        h-c-grid__col--6 h-c-grid__col--offset-3
        
        
      "
      &gt;

      
      
        
        &lt;img
            src="https://storage.googleapis.com/gweb-cloudblog-publish/original_images/1_graph.jpg"
        
          alt="1_graph"&gt;
        
        &lt;/a&gt;
      
    &lt;/figure&gt;

  
      &lt;/div&gt;
    &lt;/div&gt;
  




&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;To assess an agent using Vertex AI Gen AI evaluation service, you start preparing an evaluation dataset. This dataset should ideally contain the following elements:&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;User prompt:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; This represents the input that the user provides to the agent.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Reference trajectory:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; This is the expected sequence of actions that the agent should take to provide the correct response.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Generated trajectory:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; This is the actual sequence of actions that the agent took to generate a response to the user prompt.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Response:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; This is the generated response, given the agent's sequence of actions.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;A sample evaluation dataset is shown below.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-image_full_width"&gt;






  
    &lt;div class="article-module h-c-page"&gt;
      &lt;div class="h-c-grid"&gt;
  

    &lt;figure class="article-image--large
      
      
        h-c-grid__col
        h-c-grid__col--6 h-c-grid__col--offset-3
        
        
      "
      &gt;

      
      
        
        &lt;img
            src="https://storage.googleapis.com/gweb-cloudblog-publish/images/3_table.max-1000x1000.png"
        
          alt="3_table"&gt;
        
        &lt;/a&gt;
      
    &lt;/figure&gt;

  
      &lt;/div&gt;
    &lt;/div&gt;
  




&lt;/div&gt;
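&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;If you prefer to see it as code, the following hypothetical sketch builds a comparable dataset as a pandas DataFrame, reusing the byod_eval_sample_dataset name that appears in the code sample further below. The column names and trajectory fields are illustrative assumptions; check the documentation for the exact schema.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-code"&gt;&lt;dl&gt;
    &lt;dt&gt;code_block&lt;/dt&gt;
    &lt;dd&gt;&amp;lt;ListValue: [StructValue([(&amp;#x27;code&amp;#x27;, &amp;#x27;# Hypothetical sketch: an evaluation dataset as a pandas DataFrame.\r\n# Column names and trajectory fields are illustrative assumptions.\r\nimport pandas as pd\r\n\r\ntool_call = {&amp;quot;tool_name&amp;quot;: &amp;quot;get_product_price&amp;quot;, &amp;quot;tool_input&amp;quot;: {&amp;quot;product&amp;quot;: &amp;quot;smartphone&amp;quot;}}\r\n\r\nbyod_eval_sample_dataset = pd.DataFrame({\r\n    &amp;quot;prompt&amp;quot;: [&amp;quot;How much does the smartphone cost?&amp;quot;],\r\n    &amp;quot;reference_trajectory&amp;quot;: [[tool_call]],\r\n    &amp;quot;predicted_trajectory&amp;quot;: [[tool_call]],\r\n    &amp;quot;response&amp;quot;: [&amp;quot;The smartphone costs $500.&amp;quot;],\r\n})\r\nprint(byod_eval_sample_dataset.head())&amp;#x27;), (&amp;#x27;language&amp;#x27;, &amp;#x27;&amp;#x27;), (&amp;#x27;caption&amp;#x27;, None)])]&amp;gt;&lt;/dd&gt;
&lt;/dl&gt;&lt;/div&gt;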
&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;After you gather your evaluation dataset, define the metrics that you want to use to evaluate the agent. For a complete list of metrics and their interpretations, refer to &lt;/span&gt;&lt;a href="https://cloud.google.com/vertex-ai/generative-ai/docs/models/evaluation-agents"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Evaluate Gen AI agents&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;. Some metrics you can define are listed here:&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-code"&gt;&lt;dl&gt;
    &lt;dt&gt;code_block&lt;/dt&gt;
    &lt;dd&gt;&amp;lt;ListValue: [StructValue([(&amp;#x27;code&amp;#x27;, &amp;#x27;response_tool_metrics = [\r\n    &amp;quot;trajectory_exact_match&amp;quot;, &amp;quot;trajectory_in_order_match&amp;quot;, &amp;quot;safety&amp;quot;, response_follows_trajectory_metric\r\n]&amp;#x27;), (&amp;#x27;language&amp;#x27;, &amp;#x27;&amp;#x27;), (&amp;#x27;caption&amp;#x27;, &amp;lt;wagtail.rich_text.RichText object at 0x7f6ddc29f3d0&amp;gt;)])]&amp;gt;&lt;/dd&gt;
&lt;/dl&gt;&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Notice that the &lt;/span&gt;&lt;code style="vertical-align: baseline;"&gt;response_follows_trajectory_metric&lt;/code&gt;&lt;span style="vertical-align: baseline;"&gt; is a custom metric that you can define to evaluate your agent. &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Standard text-generation metrics, such as coherence, may not be sufficient when evaluating AI agents that interact with environments, as these metrics primarily focus on text structure. Agent responses should instead be assessed based on their effectiveness within the environment. Vertex AI Gen AI evaluation service allows you to define custom metrics, like &lt;/span&gt;&lt;code style="vertical-align: baseline;"&gt;response_follows_trajectory_metric&lt;/code&gt;&lt;span style="vertical-align: baseline;"&gt;, that assess whether the agent's response logically follows from its tool choices. For more information on these metrics, please refer to the official notebook.&lt;/span&gt;&lt;/p&gt;
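&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;As a sketch of what defining such a metric can look like, the example below uses a pointwise metric with a free-text rubric. The rubric wording and template placeholders here are our own illustrative assumptions; the exact definition lives in the official notebook.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-code"&gt;&lt;dl&gt;
    &lt;dt&gt;code_block&lt;/dt&gt;
    &lt;dd&gt;&amp;lt;ListValue: [StructValue([(&amp;#x27;code&amp;#x27;, &amp;#x27;# Sketch: a custom metric judging whether the response follows the\r\n# trajectory. The rubric text is an illustrative assumption.\r\nfrom vertexai.preview.evaluation.metrics import PointwiseMetric\r\n\r\nresponse_follows_trajectory_metric = PointwiseMetric(\r\n    metric=&amp;quot;response_follows_trajectory&amp;quot;,\r\n    metric_prompt_template=(\r\n        &amp;quot;Evaluate whether the agent response logically follows from &amp;quot;\r\n        &amp;quot;its tool calls. Prompt: {prompt}. Tool calls: &amp;quot;\r\n        &amp;quot;{predicted_trajectory}. Response: {response}. &amp;quot;\r\n        &amp;quot;Rate 0 (does not follow) or 1 (follows) and explain why.&amp;quot;\r\n    ),\r\n)&amp;#x27;), (&amp;#x27;language&amp;#x27;, &amp;#x27;&amp;#x27;), (&amp;#x27;caption&amp;#x27;, None)])]&amp;gt;&lt;/dd&gt;
&lt;/dl&gt;&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;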
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;With your evaluation dataset and metrics defined, you can now run your first agent evaluation job on Vertex AI. Please see the code sample below.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-code"&gt;&lt;dl&gt;
    &lt;dt&gt;code_block&lt;/dt&gt;
    &lt;dd&gt;&amp;lt;ListValue: [StructValue([(&amp;#x27;code&amp;#x27;, &amp;#x27;# Import libraries\r\nimport vertexai\r\nfrom vertexai.preview.evaluation import EvalTask\r\n\r\n# Initiate Vertex AI session\r\nvertexai.init(project=&amp;quot;my-project-id&amp;quot;, location=&amp;quot;my-location&amp;quot;, experiment=&amp;quot;evaluate-langgraph-agent&amp;quot;)\r\n\r\n# Define an EvalTask\r\nresponse_eval_tool_task = EvalTask(\r\n    dataset=byod_eval_sample_dataset,\r\n    metrics=response_tool_metrics,\r\n)\r\n\r\n# Run evaluation\r\nresponse_eval_tool_result = response_eval_tool_task.evaluate(\r\n    experiment_run_name=&amp;quot;response-over-tools&amp;quot;\r\n)&amp;#x27;), (&amp;#x27;language&amp;#x27;, &amp;#x27;&amp;#x27;), (&amp;#x27;caption&amp;#x27;, &amp;lt;wagtail.rich_text.RichText object at 0x7f6ddc29fc40&amp;gt;)])]&amp;gt;&lt;/dd&gt;
&lt;/dl&gt;&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;To run the evaluation, initiate an `EvalTask` using the predefined dataset and metrics. Then, run an evaluation job using the evaluate method. Vertex AI Gen AI evaluation tracks the resulting evaluation as an experiment run within &lt;/span&gt;&lt;a href="https://cloud.google.com/vertex-ai/docs/experiments/intro-vertex-ai-experiments"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Vertex AI Experiments&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, the managed experiment tracking service on Vertex AI. The evaluation results can be viewed both within the notebook and the Vertex AI Experiments UI. If you're using Colab Enterprise, you can also view the results in the Experiment side panel as shown below.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-image_full_width"&gt;






  
    &lt;div class="article-module h-c-page"&gt;
      &lt;div class="h-c-grid"&gt;
  

    &lt;figure class="article-image--large
      
      
        h-c-grid__col
        h-c-grid__col--6 h-c-grid__col--offset-3
        
        
      "
      &gt;

      
      
        
        &lt;img
            src="https://storage.googleapis.com/gweb-cloudblog-publish/images/2_experiment.max-1000x1000.png"
        
          alt="2_experiment"&gt;
        
        &lt;/a&gt;
      
    &lt;/figure&gt;

  
      &lt;/div&gt;
    &lt;/div&gt;
  




&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Vertex AI Gen AI evaluation service offers summary and metrics tables, providing detailed insights into agent performance. This includes individual user input, trajectory results, and aggregate results for all user input and trajectory pairs across all requested metrics.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Access to these granular evaluation results enables you to create meaningful visualizations of agent performance, including bar and radar charts like the one below:&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-image_full_width"&gt;






  
    &lt;div class="article-module h-c-page"&gt;
      &lt;div class="h-c-grid"&gt;
  

    &lt;figure class="article-image--large
      
      
        h-c-grid__col
        h-c-grid__col--6 h-c-grid__col--offset-3
        
        
      "
      &gt;

      
      
        
        &lt;img
            src="https://storage.googleapis.com/gweb-cloudblog-publish/images/4_radar.max-1000x1000.png"
        
          alt="4_radar"&gt;
        
        &lt;/a&gt;
      
    &lt;/figure&gt;

  
      &lt;/div&gt;
    &lt;/div&gt;
  




&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Get started today&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Explore the Vertex AI Gen AI evaluation service in public preview and unlock the full potential of your agentic applications.&lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Documentation&lt;/strong&gt;&lt;/h3&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;a href="https://cloud.google.com/vertex-ai/generative-ai/docs/models/evaluation-agents"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Evaluate gen AI agents &lt;/span&gt;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Notebooks&lt;/strong&gt;&lt;/h3&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;a href="https://github.com/GoogleCloudPlatform/generative-ai/blob/main/gemini/evaluation/evaluating_langgraph_agent.ipynb" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Evaluating a LangGraph agent&lt;/span&gt;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;a href="https://github.com/GoogleCloudPlatform/generative-ai/blob/main/gemini/evaluation/evaluating_crewai_agent.ipynb" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Evaluating a CrewAI agent&lt;/span&gt;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;a href="https://github.com/GoogleCloudPlatform/generative-ai/blob/main/gemini/reasoning-engine/evaluating_langchain_agent_reasoning_engine_prebuilt_template.ipynb" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Evaluating LangChain agent on Vertex AI Reasoning Engine&lt;/span&gt;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;a href="https://github.com/GoogleCloudPlatform/generative-ai/blob/main/gemini/reasoning-engine/evaluating_langgraph_agent_reasoning_engine_customized_template.ipynb" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Evaluating LangGraph agent on Vertex AI Reasoning Engine&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; &lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/GoogleCloudPlatform/generative-ai/blob/main/gemini/reasoning-engine/evaluating_crewai_agent_reasoning_engine_customized_template.ipynb" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Evaluating CrewAI agent on Vertex AI Reasoning Engine&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; &lt;/span&gt;&lt;/li&gt;
&lt;/ul&gt;&lt;/div&gt;</description><pubDate>Fri, 24 Jan 2025 17:00:00 +0000</pubDate><guid>https://cloud.google.com/blog/products/ai-machine-learning/introducing-agent-evaluation-in-vertex-ai-gen-ai-evaluation-service/</guid><category>Open Source</category><category>AI &amp; Machine Learning</category><og xmlns:og="http://ogp.me/ns#"><type>article</type><title>Introducing agent evaluation in Vertex AI Gen AI evaluation service</title><description></description><site_name>Google</site_name><url>https://cloud.google.com/blog/products/ai-machine-learning/introducing-agent-evaluation-in-vertex-ai-gen-ai-evaluation-service/</url></og><author xmlns:author="http://www.w3.org/2005/Atom"><name>Irina Sigler</name><title>Product Manager, Cloud AI</title><department></department><company></company></author><author xmlns:author="http://www.w3.org/2005/Atom"><name>Ivan Nardini</name><title>Developer Relations Engineer</title><department></department><company></company></author></item><item><title>How to deploy Llama 3.2-1B-Instruct model with Google Cloud Run GPU</title><link>https://cloud.google.com/blog/products/ai-machine-learning/how-to-deploy-llama-3-2-1b-instruct-model-with-google-cloud-run/</link><description>&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;As open-source large language models (LLMs) become increasingly popular, developers are looking for better ways to access new models and deploy them on &lt;/span&gt;&lt;a href="https://cloud.google.com/run/docs/configuring/services/gpu"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Cloud Run GPU&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;. That’s why Cloud Run now offers fully managed NVIDIA GPUs, which removes the complexity of driver installations and library configurations. This means you’ll benefit from the same on-demand availability and effortless scalability that you love with Cloud Run's CPU and memory, with the added power of NVIDIA GPUs. When your application is idle, your GPU-equipped instances automatically scale down to zero, optimizing your costs.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;In this blog post, we'll guide you through deploying the &lt;/span&gt;&lt;a href="https://huggingface.co/meta-llama/Llama-3.2-1B-Instruct" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Meta Llama 3.2 1B Instruct&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; model on Cloud Run. We'll also share best practices to streamline your development process using local model testing with the &lt;/span&gt;&lt;a href="https://huggingface.co/docs/text-generation-inference/en/index" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Text Generation Inference&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; (TGI) Docker image, making troubleshooting easy and boosting your productivity.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-aside"&gt;&lt;dl&gt;
    &lt;dt&gt;aside_block&lt;/dt&gt;
    &lt;dd&gt;&amp;lt;ListValue: [StructValue([(&amp;#x27;title&amp;#x27;, &amp;#x27;$300 in free credit to try Google Cloud AI and ML&amp;#x27;), (&amp;#x27;body&amp;#x27;, &amp;lt;wagtail.rich_text.RichText object at 0x7f6dd9a4fa00&amp;gt;), (&amp;#x27;btn_text&amp;#x27;, &amp;#x27;Start building for free&amp;#x27;), (&amp;#x27;href&amp;#x27;, &amp;#x27;http://console.cloud.google.com/freetrial?redirectPath=/vertex-ai/&amp;#x27;), (&amp;#x27;image&amp;#x27;, None)])]&amp;gt;&lt;/dd&gt;
&lt;/dl&gt;&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Why Cloud Run with GPU?&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;There are four critical reasons developers benefit from deploying open models on Cloud Run with GPU:&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;Fully managed: No need to worry about drivers, libraries, or infrastructure.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;On-demand scaling: Scale up or down automatically based on demand.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;Cost effective: Only pay for what you use, with automatic scaling down to zero when idle.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;Performance: NVIDIA GPUs deliver fast, low-latency inference for models like Meta Llama 3.2.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Initial setup&lt;/strong&gt;&lt;/h3&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;First, create a Hugging Face token. &lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;Second, check that your Hugging Face token has permission to access and download the Llama 3.2 model weights &lt;/span&gt;&lt;a href="https://huggingface.co/meta-llama/Llama-3.2-1B-Instruct" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;here&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;. Keep your token handy for the next step.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;Third, use Google Cloud's Secret Manager to store your Hugging Face token securely. In this example, we will be using Google &lt;/span&gt;&lt;a href="https://cloud.google.com/docs/authentication/gcloud"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;user credentials&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;. You may need to authenticate with the gcloud CLI, set a default project ID, enable the necessary APIs, and grant access to &lt;/span&gt;&lt;a href="https://cloud.google.com/secret-manager/docs"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Secret Manager&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; and &lt;/span&gt;&lt;a href="https://cloud.google.com/storage/docs"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Cloud Storage&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;&lt;/div&gt;
&lt;div class="block-code"&gt;&lt;dl&gt;
    &lt;dt&gt;code_block&lt;/dt&gt;
    &lt;dd&gt;&lt;pre&gt;# Authenticate the gcloud CLI
gcloud auth login

# Set the default project
gcloud config set project &amp;lt;your_project_id&amp;gt;

# Create a new secret; remember to update &amp;lt;your_huggingface_token&amp;gt;
gcloud secrets create HF_TOKEN --replication-policy="automatic"
echo -n &amp;lt;your_huggingface_token&amp;gt; | gcloud secrets versions add HF_TOKEN --data-file=-

# Retrieve the secret
HF_TOKEN=$(gcloud secrets versions access latest --secret="HF_TOKEN")&lt;/pre&gt;&lt;/dd&gt;
&lt;/dl&gt;&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Local debugging&lt;/strong&gt;&lt;/h3&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;Install &lt;/span&gt;&lt;code style="vertical-align: baseline;"&gt;huggingface_cli&lt;/code&gt;&lt;span style="vertical-align: baseline;"&gt; python package in your virtual environment.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;Run &lt;/span&gt;&lt;code style="vertical-align: baseline;"&gt;huggingface-cli login&lt;/code&gt;&lt;span style="vertical-align: baseline;"&gt; to set up a Hugging Face credential.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;Use the TGI Docker image to test your model locally. This allows you to iterate and debug your model locally before deploying it to Cloud Run.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;&lt;/div&gt;
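&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Before running the container, the first two bullets might look like the following (a minimal sketch; the CLI ships with the huggingface_hub package, and the optional download check assumes your token has already been granted access to the gated model):&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-code"&gt;&lt;dl&gt;
    &lt;dt&gt;code_block&lt;/dt&gt;
    &lt;dd&gt;&lt;pre&gt;# Install the Hugging Face CLI into your virtual environment
pip install -U "huggingface_hub[cli]"

# Set up a Hugging Face credential (paste the token created earlier when prompted)
huggingface-cli login

# Optional sanity check: confirm the gated model is reachable with your token
huggingface-cli download meta-llama/Llama-3.2-1B-Instruct config.json&lt;/pre&gt;&lt;/dd&gt;
&lt;/dl&gt;&lt;/div&gt;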
&lt;div class="block-code"&gt;&lt;dl&gt;
    &lt;dt&gt;code_block&lt;/dt&gt;
    &lt;dd&gt;&lt;pre&gt;export LOCAL_MODEL_DIR=~/.cache/huggingface/hub
export CONTAINER_MODEL_DIR=/root/.cache/huggingface/hub
export LOCAL_PORT=3002

docker run --gpus all -ti --shm-size 1g -p $LOCAL_PORT:8080 \
   -e MODEL_ID=meta-llama/Llama-3.2-1B-Instruct \
   -e NUM_SHARD=1 \
   -e HF_TOKEN=$(gcloud secrets versions access latest --secret="HF_TOKEN") \
   -e MAX_INPUT_LENGTH=500 \
   -e MAX_TOTAL_TOKENS=1000 \
   -e HUGGINGFACE_HUB_CACHE=$CONTAINER_MODEL_DIR \
   -v $LOCAL_MODEL_DIR:$CONTAINER_MODEL_DIR \
us-docker.pkg.dev/deeplearning-platform-release/gcr.io/huggingface-text-generation-inference-cu121.2-2.ubuntu2204.py310&lt;/pre&gt;&lt;/dd&gt;
&lt;/dl&gt;&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Deployment to Cloud Run&lt;/strong&gt;&lt;/h3&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;Deploy the model to Cloud Run with NVIDIA L4 GPU: (Remember to update &lt;/span&gt;&lt;code style="vertical-align: baseline;"&gt;SERVICE_NAME&lt;/code&gt;&lt;span style="vertical-align: baseline;"&gt;).&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;&lt;/div&gt;
&lt;div class="block-code"&gt;&lt;dl&gt;
    &lt;dt&gt;code_block&lt;/dt&gt;
    &lt;dd&gt;&lt;pre&gt;export LOCATION=us-central1
export CONTAINER_URI=us-docker.pkg.dev/deeplearning-platform-release/gcr.io/huggingface-text-generation-inference-cu121.2-2.ubuntu2204.py310
export SERVICE_NAME=&amp;lt;your-cloudrun-service-name&amp;gt;

gcloud beta run deploy $SERVICE_NAME \
   --image=$CONTAINER_URI \
   --args="--model-id=meta-llama/Llama-3.2-1B-Instruct,--max-concurrent-requests=1" \
   --port=8080 \
   --cpu=8 \
   --memory=32Gi \
   --no-cpu-throttling \
   --gpu=1 \
   --gpu-type=nvidia-l4 \
   --max-instances=3 \
   --concurrency=64 \
   --region=$LOCATION \
   --no-allow-unauthenticated \
   --set-secrets=HF_TOKEN=HF_TOKEN:latest&lt;/pre&gt;&lt;/dd&gt;
&lt;/dl&gt;&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Endpoint testing&lt;/strong&gt;&lt;/h3&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;Test your deployed model using &lt;/span&gt;&lt;code style="vertical-align: baseline;"&gt;curl&lt;/code&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;This sends a request to your Cloud Run service for a chat completion, demonstrating how to interact with the deployed model.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;&lt;/div&gt;
&lt;div class="block-code"&gt;&lt;dl&gt;
    &lt;dt&gt;code_block&lt;/dt&gt;
    &lt;dd&gt;&lt;pre&gt;URL=https://your-url.us-central1.run.app

curl $URL/v1/chat/completions \
   -X POST \
   -H "Authorization: Bearer $(gcloud auth print-identity-token)" \
   -H 'Content-Type: application/json' \
   -d '{
       "model": "tgi",
       "messages": [
           {
               "role": "system",
               "content": "You are a helpful assistant."
           },
           {
               "role": "user",
               "content": "What is Cloud Run?"
           }
       ],
       "max_tokens": 128
   }'&lt;/pre&gt;&lt;/dd&gt;
&lt;/dl&gt;&lt;/div&gt;
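&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;The service responds with an OpenAI-style chat completion object. If you only want the generated text, one convenient option is to filter the response with jq (assuming jq is installed; same URL and auth as above):&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-code"&gt;&lt;dl&gt;
    &lt;dt&gt;code_block&lt;/dt&gt;
    &lt;dd&gt;&lt;pre&gt;# Extract just the assistant's reply from the completion JSON
curl -s $URL/v1/chat/completions \
   -X POST \
   -H "Authorization: Bearer $(gcloud auth print-identity-token)" \
   -H 'Content-Type: application/json' \
   -d '{"model": "tgi", "messages": [{"role": "user", "content": "What is Cloud Run?"}], "max_tokens": 128}' \
   | jq -r '.choices[0].message.content'&lt;/pre&gt;&lt;/dd&gt;
&lt;/dl&gt;&lt;/div&gt;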
&lt;div class="block-paragraph_advanced"&gt;&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Cold start improvements with Cloud Storage FUSE&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;You’ll notice that it takes more than a minute during a cold start for the response to return. Can we do better? &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;We can use Cloud Storage FUSE. Cloud Storage FUSE is an open-source tool that lets you mount Google Cloud Storage buckets as a file system.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;First, you need to download the model files and upload them to the Cloud Storage bucket. (Remember to update &lt;/span&gt;&lt;code style="vertical-align: baseline;"&gt;GCS_BUCKET&lt;/code&gt;&lt;span style="vertical-align: baseline;"&gt;).&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-code"&gt;&lt;dl&gt;
    &lt;dt&gt;code_block&lt;/dt&gt;
    &lt;dd&gt;&lt;pre&gt;# 1. Download the model
MODEL=meta-llama/Llama-3.2-1B-Instruct
LOCAL_DIR=/mnt/project/google-cloudrun-gpu/gcs_folder/hub/Llama-3.2-1B-Instruct
GCS_BUCKET=gs://&amp;lt;YOUR_BUCKET_WITH_MODEL_WEIGHT&amp;gt;

huggingface-cli download $MODEL --exclude "*.bin" "*.pth" "*.gguf" ".gitattributes" --local-dir $LOCAL_DIR

# 2. Copy to Cloud Storage
gsutil -o GSUtil:parallel_composite_upload_threshold=150M -m cp -e -r $LOCAL_DIR $GCS_BUCKET&lt;/pre&gt;&lt;/dd&gt;
&lt;/dl&gt;&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Now, we will create a new Cloud Run service using the deployment script as follows. (Remember to update &lt;/span&gt;&lt;code style="vertical-align: baseline;"&gt;BUCKET_NAME)&lt;/code&gt;&lt;span style="vertical-align: baseline;"&gt;. You may also need to update the &lt;/span&gt;&lt;code style="vertical-align: baseline;"&gt;network&lt;/code&gt;&lt;span style="vertical-align: baseline;"&gt; and &lt;/span&gt;&lt;code style="vertical-align: baseline;"&gt;subnet&lt;/code&gt;&lt;span style="vertical-align: baseline;"&gt; name as well.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-code"&gt;&lt;dl&gt;
    &lt;dt&gt;code_block&lt;/dt&gt;
    &lt;dd&gt;&lt;pre&gt;export LOCATION=us-central1
export CONTAINER_URI=us-docker.pkg.dev/deeplearning-platform-release/gcr.io/huggingface-text-generation-inference-cu124.2-3.ubuntu2204.py311
export SERVICE_NAME=cloudrun-gpu-fuse-llama32-1b-instruct
export VOLUME_NAME=fuse
export BUCKET_NAME=&amp;lt;YOUR_BUCKET_WITH_MODEL_WEIGHT&amp;gt;
export MOUNT_PATH=/mnt/fuse

gcloud beta run deploy $SERVICE_NAME \
    --image=$CONTAINER_URI \
    --args="--model-id=$MOUNT_PATH/Llama-3.2-1B-Instruct,--max-concurrent-requests=1" \
    --port=8080 \
    --cpu=8 \
    --memory=32Gi \
    --no-cpu-throttling \
    --gpu=1 \
    --gpu-type=nvidia-l4 \
    --max-instances=3 \
    --concurrency=64 \
    --region=$LOCATION \
    --network=default \
    --subnet=default \
    --vpc-egress=all-traffic \
    --no-allow-unauthenticated \
    --update-env-vars=HF_HUB_OFFLINE=1 \
    --add-volume=name=$VOLUME_NAME,type=cloud-storage,bucket=$BUCKET_NAME \
    --add-volume-mount=volume=$VOLUME_NAME,mount-path=$MOUNT_PATH&lt;/pre&gt;&lt;/dd&gt;
&lt;/dl&gt;&lt;/div&gt;
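&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;One rough way to compare the two deployments is to time the first request after each service has scaled to zero (a sketch; SERVICE_URL is a placeholder for whichever service you are testing, and only the first request after idling exercises the cold-start path):&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-code"&gt;&lt;dl&gt;
    &lt;dt&gt;code_block&lt;/dt&gt;
    &lt;dd&gt;&lt;pre&gt;SERVICE_URL=https://your-url.us-central1.run.app

# Time a single small completion; repeat once the service has scaled to zero
time curl -s $SERVICE_URL/v1/chat/completions \
   -X POST \
   -H "Authorization: Bearer $(gcloud auth print-identity-token)" \
   -H 'Content-Type: application/json' \
   -d '{"model": "tgi", "messages": [{"role": "user", "content": "Hello"}], "max_tokens": 16}' \
   -o /dev/null -w "HTTP %{http_code}, total %{time_total}s\n"&lt;/pre&gt;&lt;/dd&gt;
&lt;/dl&gt;&lt;/div&gt;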
&lt;div class="block-paragraph_advanced"&gt;&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Next Steps&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;To learn more about Cloud Run with NVIDIA GPUs and to deploy your own open-source model from Hugging Face, check out these resources below:&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;a href="https://cloud.google.com/run/docs/configuring/services/gpu"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Cloud Run GPU&lt;/span&gt;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;a href="https://cloud.google.com/run/docs/configuring/services/gpu-best-practices"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Best practices&lt;/span&gt;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;a href="https://huggingface.co/meta-llama/Llama-3.2-1B-Instruct" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Meta Llama 3.2 1B Instruct&lt;/span&gt;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;&lt;/div&gt;</description><pubDate>Thu, 14 Nov 2024 17:00:00 +0000</pubDate><guid>https://cloud.google.com/blog/products/ai-machine-learning/how-to-deploy-llama-3-2-1b-instruct-model-with-google-cloud-run/</guid><category>Open Source</category><category>AI &amp; Machine Learning</category><og xmlns:og="http://ogp.me/ns#"><type>article</type><title>How to deploy Llama 3.2-1B-Instruct model with Google Cloud Run GPU</title><description></description><site_name>Google</site_name><url>https://cloud.google.com/blog/products/ai-machine-learning/how-to-deploy-llama-3-2-1b-instruct-model-with-google-cloud-run/</url></og><author xmlns:author="http://www.w3.org/2005/Atom"><name>Wei Yih Yap</name><title>Generative AI Field Solutions Architect</title><department></department><company></company></author><author xmlns:author="http://www.w3.org/2005/Atom"><name>Sagar Randive</name><title>Product Manager</title><department></department><company></company></author></item><item><title>Real-time data for real-world AI with support for Apache Flink in BigQuery</title><link>https://cloud.google.com/blog/products/data-analytics/introducing-bigquery-engine-for-apache-flink/</link><description>&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Today’s organizations aspire to become "by-the-second" businesses, capable of adapting in real time to changes in their supply chain, inventory, customer behavior, and more. They also strive to provide exceptional customer experiences, whether it's through a support interaction or an online checkout process. We believe that real-time intelligence should be accessible to all businesses, regardless of their size or budget and should be integrated into a unified data platform, so that everything works together. Today, we’re taking a big step toward helping businesses realize these aspirations, with &lt;/span&gt;&lt;a href="https://cloud.google.com/products/bigquery-engine-for-apache-flink"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;BigQuery Engine for Apache Flink&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, now in preview. &lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Introducing BigQuery Engine for Apache Flink: Familiar Flink, now serverless &lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;BigQuery Engine for Apache Flink provides a state-of-the art real-time intelligence platform, empowering customers to:&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Use familiar streaming technologies on Google Cloud&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;. BigQuery Engine for Apache Flink makes it easier to lift and shift existing streaming applications relying on open-source Apache Flink to Google Cloud, without rewriting code or relying on third-party services. Combined with &lt;/span&gt;&lt;a href="https://cloud.google.com/products/managed-service-for-apache-kafka"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Google Managed Service for Apache Kafka&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; (now GA), it is easy to migrate and modernize your streaming analytics on Google Cloud.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Reduce operational burden.&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; BigQuery Engine for Apache Flink is entirely serverless, reducing operational burden and allowing customers to focus on what they do best — innovate their businesses.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Bring real-time data to AI. &lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;Enterprise developers experimenting with gen AI are looking for a well-integrated and scalable streaming platform that’s based on familiar technologies — Apache Flink and Apache Kafka — and that they can combine with Google’s differentiated AI/ML capabilities in BigQuery.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;&lt;/div&gt;
&lt;div class="block-image_full_width"&gt;






  
    &lt;div class="article-module h-c-page"&gt;
      &lt;div class="h-c-grid"&gt;
  

    &lt;figure class="article-image--large
      
      
        h-c-grid__col
        h-c-grid__col--6 h-c-grid__col--offset-3
        
        
      "
      &gt;

      
      
        
        &lt;img
            src="https://storage.googleapis.com/gweb-cloudblog-publish/images/FlinkUI.max-1000x1000.jpg"
        
          alt="FlinkUI"&gt;
        
        &lt;/a&gt;
      
    &lt;/figure&gt;

  
      &lt;/div&gt;
    &lt;/div&gt;
  




&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;BigQuery Engine for Apache Flink arrives during a time when Google Cloud customers are leveraging many &lt;/span&gt;&lt;a href="https://cloud.google.com/blog/products/data-analytics/google-clouds-innovations-for-continuous-real-time-intelligence"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;innovations&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; in real-time analytics, including &lt;/span&gt;&lt;a href="https://cloud.google.com/bigquery/docs/continuous-queries-introduction"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;BigQuery continuous queries,&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; which enables customers to analyze incoming data in BigQuery in real time using SQL, and &lt;/span&gt;&lt;a href="https://cloud.google.com/dataflow/docs/guides/job-builder"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Dataflow Job Builder&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, which helps customers define and deploy a streaming pipeline using a visual UI.  &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;With BigQuery Engine for Apache Flink, our streaming portfolio now spans SQL-based easy streaming with BigQuery continuous queries, popular open-source Flink and Kafka platforms, and advanced multimodal data streaming with Dataflow, including support for Iceberg. These capabilities are integrated with BigQuery, which connects your data with industry leading AI, including Gemini, Gemma and open models.&lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;New AI capabilities unlocked when your data is real-time&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;As we look ahead, it's clear that generative AI has reignited interest in the potential of data-driven insights and experiences. AI, especially generative AI, is most effective when it has access to the latest context. If you’re a retailer, you can combine historical purchase data with real-time interactions to personalize shopping experiences for your customers. If you’re a financial services company, you can use up-to-the-second transactions to refine your fraud detection model. Real-time data connected to AI means fresh data for training models, real-time user assistance with &lt;/span&gt;&lt;a href="https://github.com/apache/beam/pull/31657" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Retrieval Augmented Generation&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; (RAG), and real-time predictions and inferences for your business applications, including &lt;/span&gt;&lt;a href="https://developers.googleblog.com/en/gemma-for-streaming-ml-with-dataflow/" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;integrating small models like Gemma into your streaming pipelines&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;.  &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;We are taking a platform approach to introduce capabilities across the board so that, no matter what specific streaming architecture you need, or which streaming engine you prefer, you have the ability to leverage real-time data for your gen AI use cases. Features such as Dataflow &lt;/span&gt;&lt;a href="https://cloud.google.com/dataflow/docs/guides/enrichment"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;enrichment transform&lt;/span&gt;&lt;/a&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;s&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;, support for &lt;/span&gt;&lt;a href="https://cloud.google.com/dataflow/docs/notebooks/vertex_ai_text_embeddings"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Vertex AI text-embeddings&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, the &lt;/span&gt;&lt;a href="https://cloud.google.com/dataflow/docs/notebooks/run_inference_vertex_ai"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;RunInference transform&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, &lt;/span&gt;&lt;a href="https://cloud.google.com/blog/products/databases/distributed-counting-with-bigtable"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;distributed counting&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; in Bigtable, and many others make the task of building real-time AI applications easier than ever.   &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;We are very excited to get these capabilities into your hands and continue giving you more flexibility and choice when it comes to making your unified data and AI platform operate in real-time data. Learn more about &lt;/span&gt;&lt;a href="https://cloud.google.com/products/bigquery-engine-for-apache-flink/"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;BigQuery Engine for Apache Flink&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; and get started using it today in the &lt;/span&gt;&lt;a href="https://console.cloud.google.com/"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Google Cloud console&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;</description><pubDate>Wed, 09 Oct 2024 08:00:00 +0000</pubDate><guid>https://cloud.google.com/blog/products/data-analytics/introducing-bigquery-engine-for-apache-flink/</guid><category>Open Source</category><category>Streaming</category><category>Data Analytics</category><og xmlns:og="http://ogp.me/ns#"><type>article</type><title>Real-time data for real-world AI with support for Apache Flink in BigQuery</title><description></description><site_name>Google</site_name><url>https://cloud.google.com/blog/products/data-analytics/introducing-bigquery-engine-for-apache-flink/</url></og><author xmlns:author="http://www.w3.org/2005/Atom"><name>Yuriy Zhovtobryukh</name><title>Senior Product Manager</title><department></department><company></company></author><author xmlns:author="http://www.w3.org/2005/Atom"><name>Angela Soares</name><title>Senior Product Marketing Manager</title><department></department><company></company></author></item><item><title>Introducing Valkey 8.0 on Memorystore: unmatched performance and fully open-source</title><link>https://cloud.google.com/blog/products/databases/memorystore-launches-valkey-8-0-on-google-cloud/</link><description>&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;strong style="font-style: italic; vertical-align: baseline;"&gt;Editor's note&lt;/strong&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;: Ping Xie is a Valkey maintainer on the Valkey Technical Steering Committee (TSC)&lt;/span&gt;&lt;/p&gt;
&lt;hr/&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Today, we’re thrilled to announce &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;Valkey 8.0&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; on &lt;/span&gt;&lt;a href="https://cloud.google.com/memorystore"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Memorystore&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; in preview, making Google Cloud the first major cloud platform to offer Valkey 8.0 as a fully managed service. Building upon the launch of &lt;/span&gt;&lt;a href="https://cloud.google.com/blog/products/databases/announcing-memorystore-for-valkey"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Memorystore for Valkey 7.2&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; in August 2024, this further solidifies Google Cloud’s commitment to open source, providing you with the latest and greatest features from the Valkey open-source ecosystem. &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Valkey 8.0 on Memorystore is a testament to our commitment to supporting customers such as &lt;/span&gt;&lt;a href="https://www.mlb.com/" rel="noopener" target="_blank"&gt;&lt;strong style="text-decoration: underline; vertical-align: baseline;"&gt;Major League Baseball (MLB)&lt;/strong&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;. As the most historic professional sports league, MLB uses Memorystore to power its real-time analytics, processing vast amounts of data to provide fans with insights and statistics during games.&lt;/span&gt;&lt;/p&gt;
&lt;p style="padding-left: 40px;"&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;"At MLB, we're obsessed with delivering the best possible experience for our fans. Valkey's truly open-source approach to caching is a game-changer, promising the performance and innovation we need to keep fans engaged and connected. We're excited to be part of this community and look forward to Valkey's continued innovation on Memorystore." - &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;Rob Engel, Vice President of Software Engineering, Major League Baseball&lt;/strong&gt;&lt;/p&gt;
&lt;h3&gt;&lt;span style="vertical-align: baseline;"&gt;The Valkey 8.0 release&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Earlier this year, after Redis Inc. changed the license of Redis OSS from the permissive BSD 3-Clause license to a restrictive Source Available License (RSAL), the open-source community rallied to create Valkey &lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;(&lt;/span&gt;&lt;a href="https://www.linuxfoundation.org/press/linux-foundation-launches-open-source-valkey-community" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;1&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, &lt;/span&gt;&lt;a href="https://www.linuxfoundation.org/press/linux-foundation-launches-open-source-valkey-community" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;2&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, &lt;/span&gt;&lt;a href="https://www.linuxfoundation.org/press/valkey-welcomes-new-partners-amid-growing-momentum" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;3&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;) &lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;— a fully open-source alternative under the BSD 3-clause license. In just a few months, the Valkey community released &lt;/span&gt;&lt;a href="https://www.linuxfoundation.org/press/valkey-8-0" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;the open source Valkey 8.0&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; in GA, showcasing the power of open-source collaboration and unfettered innovation.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Memorystore for Valkey 8.0&lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt; &lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;delivers enhanced performance, improved reliability, and full compatibility with Redis OSS — all as a fully Google managed service.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Improvements to the Valkey performance benchmarks are thanks to newly introduced asynchronous I/O capabilities. The enhanced I/O threading system allows the main thread and I/O threads to operate concurrently, enabling parallel processing of commands and I/O operations, and maximizing throughput by reducing bottlenecks in handling incoming requests. Memorystore for Valkey 8.0 achieves up to a &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;2x Queries Per Second (QPS) at microsecond latency&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; when compared to Memorystore for Redis Cluster, allowing applications to handle higher throughput with similarly sized clusters. This makes Valkey 8.0 a great choice for high-throughput, real-time applications that aim to provide highly responsive user experiences.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Along with the throughput gain, Valkey 8.0 includes other optimizations that further enhance the overall speed of the service:&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;The &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;SUNION &lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;command is optimized for faster set union operations.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;The &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;SDIFF &lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;and &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;ZUNIONSTORE &lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;commands have been refactored for improved execution times.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;The &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;DEL &lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;command avoids Redundant deletions for expired keys.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;CLUSTER SLOTS&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; responses are cached for better throughput and reduced latency in cluster operations.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;CRC64 &lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;performance is improved for large data batches, which is crucial for RDB snapshot and slot migration scenarios.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Valkey 8.0 also brings key-memory efficiency improvements&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;, allowing you to store more data without requiring changes to your application. Keys are now embedded directly into the main dictionary, reducing memory overhead while enhancing performance. Additionally, the new per-slot dictionary splits the main dictionary by slot, further reducing the memory overhead by 16 bytes per key-value pair without degrading performance.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Meanwhile, Valkey 8.0 has &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;improved reliability&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; thanks to &lt;/span&gt;&lt;a href="https://cloud.google.com/blog/products/databases/zero-downtime-scaling-in-memorystore-for-redis-cluster"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;several features developed by Google&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; that were subsequently contributed to the project, significantly enhancing cluster resilience and availability: &lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Automatic failover&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; for empty shards helps ensure high availability even during the initial scaling stages, allowing new, slotless shards to fail over smoothly. &lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Replicating slot migration states&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; helps ensure that all &lt;/span&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;CLUSTER SETSLOT&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt; commands are synchronized across replicas before execution on the primary, reducing the risk of data unavailability during failover events, and enabling new replicas to automatically inherit the correct state. &lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;Additionally, &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;slot migration state recovery&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; ensures that after a failover, the source and target nodes are updated automatically, maintaining accurate routing of requests to the correct primary without operator intervention. &lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Thanks to these enhancements, Valkey 8.0 clusters are more resilient against failures during slot movement, giving customers peace of mind that their data remains available even during complex scaling operations.&lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;span style="vertical-align: baseline;"&gt;Compatible with Redis OSS 7.2&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Just like Valkey 7.2, Valkey 8.0 maintains &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;full backwards compatibility&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; with Redis OSS 7.2&lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt; &lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;APIs, allowing for a seamless migration from Redis. Popular Redis clients like &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;Jedis, redis-py, node-redis&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;, and &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;go-redis&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; are fully supported so that migrating workloads to Valkey doesn’t require modifications to application code.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;This fusion of open-source flexibility and managed service reliability provides you with a balance of control and convenience, making Valkey a great destination for your Redis OSS workloads.&lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;span style="vertical-align: baseline;"&gt;Get started with Valkey 8.0 on Memorystore today&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;We invite you to get started with &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;Valkey 8.0&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; on &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;Memorystore&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; today and experience the above enhancements for yourself. With features such as &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;zero-downtime scaling&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;, &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;high availability&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;, and &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;RDB snapshot and AOF logging &lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;based persistence, Memorystore's Valkey 8.0 provides the performance, reliability, and scalability today’s high demanding workloads deserve.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Get started today by creating a fully managed Valkey Cluster through t&lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;he &lt;/strong&gt;&lt;a href="https://console.cloud.google.com/memorystore"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Google Cloud console&lt;/span&gt;&lt;/a&gt;&lt;strong style="vertical-align: baseline;"&gt; or gcloud&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;, and join the growing community that is shaping the future of truly open-source data management.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;</description><pubDate>Thu, 03 Oct 2024 16:00:00 +0000</pubDate><guid>https://cloud.google.com/blog/products/databases/memorystore-launches-valkey-8-0-on-google-cloud/</guid><category>Open Source</category><category>Databases</category><og xmlns:og="http://ogp.me/ns#"><type>article</type><title>Introducing Valkey 8.0 on Memorystore: unmatched performance and fully open-source</title><description></description><site_name>Google</site_name><url>https://cloud.google.com/blog/products/databases/memorystore-launches-valkey-8-0-on-google-cloud/</url></og><author xmlns:author="http://www.w3.org/2005/Atom"><name>Ping Xie</name><title>Software Engineer</title><department></department><company></company></author><author xmlns:author="http://www.w3.org/2005/Atom"><name>Ankit Sud</name><title>Senior Product Manager, Google</title><department></department><company></company></author></item><item><title>What’s new in PostgreSQL 16: New features available in Cloud SQL today</title><link>https://cloud.google.com/blog/products/databases/postgresql-16-now-available-in-cloud-sql/</link><description>&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;In an effort to improve usability and facilitate informed decision-making, Cloud SQL customers can now use PostgreSQL 16, which introduces new features for deeper insights into database operations and enhanced usability. &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;In this blog post we cover some of the highlights of the&lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt; &lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;PostgreSQL 16 version, including: &lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;Improvements in observability&lt;/span&gt;&lt;/li&gt;
&lt;li role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;Performance improvements&lt;/span&gt;&lt;/li&gt;
&lt;li role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;Vacuum efficiency&lt;/span&gt;&lt;/li&gt;
&lt;li role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;Replication improvements&lt;/span&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Let’s take a deeper look at each of these areas.&lt;/span&gt;&lt;/p&gt;
&lt;h2&gt;&lt;strong style="vertical-align: baseline;"&gt;Observability improvements&lt;/strong&gt;&lt;/h2&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Observability is an important aspect of databases, helping operators optimize resource consumption by providing insights into how resources are being utilized. Here are some important observability enhancements introduced in PostgreSQL 16. &lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;PG_STAT_IO&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;PostgreSQL16 adds a new view &lt;/span&gt;&lt;a href="https://www.postgresql.org/docs/current/monitoring-stats.html#MONITORING-PG-STAT-IO-VIEW" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;pg_stat_io&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; that &lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;provides insights into the Input/Output (IO) behavior of a PostgreSQL database. &lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;We can use this view to make i&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;nformed decisions to optimize database performance, improve resource utilization and ensure the overall health and scalability of the database system. This view presents the stats for the entire instance. &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;What can we infer from this view? &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Like  most other &lt;/span&gt;&lt;code style="vertical-align: baseline;"&gt;pg_stat_*&lt;/code&gt;&lt;span style="vertical-align: baseline;"&gt; views, the statistics in the view are cumulative. T&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;o track changes in the pg_stat_io view over a specific time period, record the values at the beginning and end of the workload.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;This view tracks the stats mainly by the columns in &lt;/span&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;backend_type&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;, &lt;/span&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;io_context&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt; and &lt;/span&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;io_object&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;. &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;The &lt;/span&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;backend_type&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt; is a connection process and can be one of client backend, background worker, checkpointer, standalone backend, autovacuum launcher, autovacuum worker. The &lt;/span&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;io_context&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt; is classified based on the load as normal, bulk read, bulk write, or vacuum.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;The actual stats to be considered for knowing the I/O status of the instance are reads, writes, extends, hits, evictions, and reuses.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;We can monitor the shared buffers efficiency by comparing the evictions-to-hits ratio. The buffer hit ratio is considered effective when hits for each context are much higher than evictions. &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;The bulk reads and bulk writes indicate sequential scans. The evictions, hits and reuses for these indicate the efficiency of ring buffers in this case.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;We can also observe the amount of data read or written as part of the autovacuum or vacuum process. The metric data related to autovacuum are observed by &lt;/span&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;io_context =’ vacuum’&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt; and &lt;/span&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;backend_type as ‘autovacuum worker’&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;. A vacuum process goes by &lt;/span&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;backend_type&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt; as &lt;/span&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;‘standalone backend’&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt; with&lt;/span&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt; io_context &lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;as &lt;/span&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;‘vacuum’&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Here’s an image of the view:&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-image_full_width"&gt;






  
    &lt;div class="article-module h-c-page"&gt;
      &lt;div class="h-c-grid"&gt;
  

    &lt;figure class="article-image--large
      
      
        h-c-grid__col
        h-c-grid__col--6 h-c-grid__col--offset-3
        
        
      "
      &gt;

      
      
        
        &lt;img
            src="https://storage.googleapis.com/gweb-cloudblog-publish/images/image1_h0fYBp8.max-1000x1000.png"
        
          alt="image1"&gt;
        
        &lt;/a&gt;
      
    &lt;/figure&gt;

  
      &lt;/div&gt;
    &lt;/div&gt;
  




&lt;/div&gt;
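&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;As a concrete starting point, here is one way to pull vacuum-related I/O out of the view (a sketch via psql; note that in the released PostgreSQL 16 catalog, the fields described above as io_object and io_context appear as the columns object and context):&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-code"&gt;&lt;dl&gt;
    &lt;dt&gt;code_block&lt;/dt&gt;
    &lt;dd&gt;&lt;pre&gt;# Snapshot vacuum-related I/O activity from pg_stat_io
psql &amp;lt;&amp;lt;'SQL'
SELECT backend_type, object, context,
       reads, writes, extends, hits, evictions, reuses
FROM pg_stat_io
WHERE context = 'vacuum' OR backend_type = 'autovacuum worker'
ORDER BY backend_type;
SQL&lt;/pre&gt;&lt;/dd&gt;
&lt;/dl&gt;&lt;/div&gt;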
&lt;div class="block-paragraph_advanced"&gt;&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Last sequential and index scans on tables and indexes&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;The views &lt;/span&gt;&lt;a href="https://www.postgresql.org/docs/current/monitoring-stats.html#MONITORING-PG-STAT-ALL-TABLES-VIEW" rel="noopener" target="_blank"&gt;&lt;code style="text-decoration: underline; vertical-align: baseline;"&gt;pg_stat_*_tables&lt;/code&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; have two new columns&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;code style="vertical-align: baseline;"&gt;last_seq_scan&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;&lt;code style="vertical-align: baseline;"&gt;last_idx_scan&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Want to know when the last time sequential scan or index scan happened on your tables? Check the newly introduced columns &lt;/span&gt;&lt;code style="vertical-align: baseline;"&gt;last_seq_scan&lt;/code&gt;&lt;span style="vertical-align: baseline;"&gt; and &lt;/span&gt;&lt;code style="vertical-align: baseline;"&gt;last_idx_scan&lt;/code&gt;&lt;span style="vertical-align: baseline;"&gt; in &lt;/span&gt;&lt;code style="vertical-align: baseline;"&gt;pg_stat_*_tables&lt;/code&gt;&lt;span style="vertical-align: baseline;"&gt;. &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;The timestamp of the last sequential or index scan on a table is indicated in these columns. This can be helpful for identifying any “read query” issues. &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Similarly, the column&lt;/span&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt; &lt;/span&gt;&lt;code style="vertical-align: baseline;"&gt;last_idx_scan&lt;/code&gt;&lt;span style="vertical-align: baseline;"&gt; has been introduced to &lt;/span&gt;&lt;a href="https://www.postgresql.org/docs/current/monitoring-stats.html#MONITORING-PG-STAT-ALL-INDEXES-VIEW" rel="noopener" target="_blank"&gt;&lt;code style="text-decoration: underline; vertical-align: baseline;"&gt;pg_stat_*_indexes&lt;/code&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;. This column indicates the timestamp last time the index was used. If we were to drop an index, we can make an informed decision based on the value present in this column for the index. &lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Statistics on the occurrence of tuples moving to a new page for updates&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;The views &lt;/span&gt;&lt;a href="https://www.postgresql.org/docs/current/monitoring-stats.html#MONITORING-PG-STAT-ALL-TABLES-VIEW" rel="noopener" target="_blank"&gt;&lt;code style="text-decoration: underline; vertical-align: baseline;"&gt;pg_stat_*_tables&lt;/code&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; now has a new column, &lt;/span&gt;&lt;code style="vertical-align: baseline;"&gt;n_tup_newpage_upd.&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;As we perform updates on a table and want to monitor how many of the rows end up in new heap pages, we can now view this in the column &lt;/span&gt;&lt;code style="vertical-align: baseline;"&gt;n_tup_newpage_upd&lt;/code&gt;&lt;span style="vertical-align: baseline;"&gt;. &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;This can reveal the factors contributing to the table's growth over time. The value in this column also can be used to validate the ‘fillfactor’ set for the table. Especially for updates which are expected to be ‘HOT’, by observing the stats in this column we can establish if the ‘fillfactor’ is optimal or not.&lt;/span&gt;&lt;/p&gt;
&lt;h2&gt;&lt;strong style="vertical-align: baseline;"&gt;Performance improvements&lt;/strong&gt;&lt;/h2&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Performance is always a top priority for databases. Performance improvements are adopted much faster than other enhancements in a major version release. Here are some of the performance improvements in PostgreSQL 16.&lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Tables with only BRIN index on a table column are considered ‘HOT’&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;With PostgreSQL16, updates to a table with BRIN index are now considered as HOT considering the fillfactor for the table is optimal.’  ‘Fillfactor’ is an important setting for this update to be marked ‘HOT’. This improvement makes vacuuming such a table fast and resource-efficient. &lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Parallelization of FULL or OUTER joins  &lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;This performance improvement is very beneficial for selects involving very large tables joined by full or outer joins. In PostgreSQL16, this will result in a parallel hash after a parallel seq scan for each table, instead of a merge or hash after a full heap fetch. In our tests, it has shown quite a large improvement compared to PG15.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Example for full outer join&lt;/strong&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-code"&gt;&lt;dl&gt;
    &lt;dt&gt;code_block&lt;/dt&gt;
    &lt;dd&gt;&lt;pre&gt;postgres=&amp;gt; explain (analyze, buffers, verbose) select count(*) from object_store s full outer join object_store2 g on (s.project_id=g.project_id);
                                                  QUERY PLAN
--------------------------------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=145953.66..145953.67 rows=1 width=8) (actual time=6095.420..6236.950 rows=1 loops=1)
   Output: count(*)
   Buffers: shared hit=74980, temp read=34714 written=35020
   -&amp;gt;  Gather  (cost=145953.44..145953.65 rows=2 width=8) (actual time=6083.804..6236.922 rows=3 loops=1)
         Output: (PARTIAL count(*))
         Workers Planned: 2
         Workers Launched: 2
         Buffers: shared hit=74980, temp read=34714 written=35020
         -&amp;gt;  Partial Aggregate  (cost=144953.44..144953.45 rows=1 width=8) (actual time=6068.822..6069.193 rows=1 loops=3)
               Output: PARTIAL count(*)
               Buffers: shared hit=74980, temp read=34714 written=35020
               Worker 0:  actual time=6053.795..6053.802 rows=1 loops=1
                 Buffers: shared hit=23966, temp read=12066 written=11200
               Worker 1:  actual time=6069.306..6069.313 rows=1 loops=1
                 Buffers: shared hit=26385, temp read=10995 written=12292
               -&amp;gt;  Parallel Hash Full Join  (cost=83021.80..140786.80 rows=1666658 width=0) (actual time=3824.778..5852.278 rows=1333333 loops=3)
                     Hash Cond: (g.project_id = s.project_id)
                     Buffers: shared hit=74980, temp read=34714 written=35020
                     Worker 0:  actual time=3857.567..5832.558 rows=1361655 loops=1
                       Buffers: shared hit=23966, temp read=12066 written=11200
                     Worker 1:  actual time=3851.661..5870.054 rows=1244012 loops=1
                       Buffers: shared hit=26385, temp read=10995 written=12292
                     -&amp;gt;  Parallel Seq Scan on public.object_store2 g  (cost=0.00..41652.00 rows=421200 width=16) (actual time=0.029..936.699 rows=1333333 loops=3)
                           Output: g.project_id
                           Buffers: shared hit=37440
                           Worker 0:  actual time=0.026..977.947 rows=1347470 loops=1
                             Buffers: shared hit=12650
                           Worker 1:  actual time=0.043..1017.822 rows=1298124 loops=1
                             Buffers: shared hit=12132
                     -&amp;gt;  Parallel Hash  (cost=54050.57..54050.57 rows=1666658 width=16) (actual time=1456.617..1456.619 rows=1333333 loops=3)
                           Output: s.project_id
                           Buckets: 262144  Batches: 32  Memory Usage: 7968kB
                           Buffers: shared hit=37384, temp written=17236
                           Worker 0:  actual time=1451.741..1451.743 rows=1202466 loops=1
                             Buffers: shared hit=11238, temp written=5200
                           Worker 1:  actual time=1450.062..1450.064 rows=1516637 loops=1
                             Buffers: shared hit=14175, temp written=6524
                           -&amp;gt;  Parallel Seq Scan on public.object_store s  (cost=0.00..54050.57 rows=1666658 width=16) (actual time=0.023..530.669 rows=1333333 loops=3)
                                 Output: s.project_id
                                 Buffers: shared hit=37384
                                 Worker 0:  actual time=0.018..506.262 rows=1202466 loops=1
                                   Buffers: shared hit=11238
                                 Worker 1:  actual time=0.025..578.219 rows=1516637 loops=1
                                   Buffers: shared hit=14175
 Query Identifier: -5913048123863832940
 Planning Time: 0.211 ms
 Execution Time: 6237.051 ms
(47 rows)&lt;/pre&gt;&lt;/dd&gt;
&lt;/dl&gt;&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Explain (generic_plan)&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Prior to PostgreSQL 16, for parameterized SQLs the value of the parameter has to be passed to to obtain an execution plan. In PostgreSQL 16, with the option (generic_plan) we do not need to provide any additional values to the SQL to get the execution plan. &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Example&lt;/strong&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-code"&gt;&lt;dl&gt;
    &lt;dt&gt;code_block&lt;/dt&gt;
    &lt;dd&gt;&amp;lt;ListValue: [StructValue([(&amp;#x27;code&amp;#x27;, &amp;quot;db=&amp;gt; CREATE TABLE measurement (\r\n    city_id         int not null,\r\n    logdate         date not null,\r\n    peaktemp        int,\r\n    unitsales       int\r\n) PARTITION BY RANGE (logdate);\r\n\r\nCREATE TABLE measurement_y2006m02 PARTITION OF measurement\r\n    FOR VALUES FROM (&amp;#x27;2006-02-01&amp;#x27;) TO (&amp;#x27;2006-03-01&amp;#x27;);\r\n\r\nCREATE TABLE measurement_y2006m03 PARTITION OF measurement\r\n    FOR VALUES FROM (&amp;#x27;2006-03-01&amp;#x27;) TO (&amp;#x27;2006-04-01&amp;#x27;);\r\n\r\nPrepare statement\r\n\r\ndb=&amp;gt; PREPARE partitioned_selfjoin (int) AS\r\nSELECT *\r\n  FROM measurement a\r\n  JOIN measurement b\r\n    ON a.peaktemp = b.peaktemp\r\n WHERE a.city_id = $1;\r\nPREPARE\r\n\r\nGet execution plan\r\n\r\nPre PostgreSQL 16: Pass a value for the parameter $1 = 10\r\n\r\ndb=&amp;gt; EXPLAIN EXECUTE partitioned_selfjoin(10);\r\n\r\nFor PG - 16\r\n\r\ndb=&amp;gt; explain (generic_plan) SELECT *\r\n  FROM measurement a\r\n  JOIN measurement b\r\n    ON a.peaktemp = b.peaktemp\r\n WHERE a.city_id = $1;&amp;quot;), (&amp;#x27;language&amp;#x27;, &amp;#x27;&amp;#x27;), (&amp;#x27;caption&amp;#x27;, &amp;lt;wagtail.rich_text.RichText object at 0x7f6dd445d2b0&amp;gt;)])]&amp;gt;&lt;/dd&gt;
&lt;/dl&gt;&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;h2&gt;&lt;strong style="vertical-align: baseline;"&gt;Vacuum improvements&lt;/strong&gt;&lt;/h2&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Vacuum is a significant part of PostgreSQLMVCC. Vacuum releases space after deleting the dead tuples, minimizing table bloat. This prevents the database from ending up in transaction wrap-around problems. Here are some ways vacuum processes improved in PostgreSQL16.&lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Improved VACUUM operation performance for large tables &lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;BUFFER_USAGE_LIMIT&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;PostgreSQL 16 introduces a new server variable ‘vacuum_buffer_usage_limit’ to set the ring buffers allocated for VACUUM and ANALYZE operations with a default value of 256K. Setting the ‘BUFFER_USAGE_LIMIT’ option during a VACUUM operation overrides the default value of ‘vacuum_buffer_usage_limit’ and allocates the specified ring buffer size. A larger ‘buffer_usage_limit’ can speed up vacuum operations but may displace buffers used by the main workload from ‘shared_buffers’, which may result in performance degradation. It is often advisable to limit the usage of ring buffers for VACUUM operations using ‘buffer_usage_limit’ when vacuuming very large tables. This option can be used judiciously when approaching Txid wraparound, at which point completing the VACUUM is critical. When ANALYZE is also part of the VACUUM operation, both operations together use the ring buffer size specified in ‘buffer_usage_limit’. A setting of 0 for ‘buffer_usage_limit’ results in disabling the buffer access strategy, which can result in evicting huge numbers of shared buffers, causing performance degradation. The limits for ‘buffer_usage_limit’ are between 128K and 16 GB. &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;VACUUM to only process TOAST tables&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Now in PostgreSQL 16 we can vacuum only TOAST tables related to a relation. Historically, the option ‘process_toast’ was introduced to turn off vacuuming the TOAST table when set to FALSE. Otherwise, vacuum ran on both the main and TOAST table of a relation. In PostgreSQL 16, based on the requirement, we can either vacuum both the main and TOAST table or just do one of them that belongs to a relation. This allows better control to vacuum either main, TOAST, or both, depending on your need. &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Here’s an example of how it can be applied:&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-code"&gt;&lt;dl&gt;
    &lt;dt&gt;code_block&lt;/dt&gt;
    &lt;dd&gt;&amp;lt;ListValue: [StructValue([(&amp;#x27;code&amp;#x27;, &amp;#x27;Vacuum only toast table for a relation\r\n\r\npostgres=&amp;gt; vacuum (PROCESS_TOAST TRUE, PROCESS_MAIN FALSE) prodattribbig;\r\n\r\nVacuumdb only toast table for a relation\r\n\r\n$ vacuumdb -h &amp;lt;ipaddress&amp;gt; -U postgres -d testdb -t prodattribbig --no-process-main&amp;#x27;), (&amp;#x27;language&amp;#x27;, &amp;#x27;&amp;#x27;), (&amp;#x27;caption&amp;#x27;, &amp;lt;wagtail.rich_text.RichText object at 0x7f6ddbe9a670&amp;gt;)])]&amp;gt;&lt;/dd&gt;
&lt;/dl&gt;&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;a href="https://www.postgresql.org/docs/current/app-vacuumdb.html" rel="noopener" target="_blank"&gt;&lt;strong style="text-decoration: underline; vertical-align: baseline;"&gt;vacuumdb&lt;/strong&gt;&lt;/a&gt;&lt;strong style="vertical-align: baseline;"&gt; option to process schema &lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Vacuumdb now has an option to vacuum or analyze all the tables belonging to a schema in the database. This is a very useful feature when we are targeting tables of only one schema.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-code"&gt;&lt;dl&gt;
    &lt;dt&gt;code_block&lt;/dt&gt;
    &lt;dd&gt;&amp;lt;ListValue: [StructValue([(&amp;#x27;code&amp;#x27;, &amp;#x27;$ vacuumdb -h &amp;lt;host/ipaddress&amp;gt;  -v -U postgres -d testdb  -n testschema&amp;#x27;), (&amp;#x27;language&amp;#x27;, &amp;#x27;&amp;#x27;), (&amp;#x27;caption&amp;#x27;, &amp;lt;wagtail.rich_text.RichText object at 0x7f6ddbe9a3a0&amp;gt;)])]&amp;gt;&lt;/dd&gt;
&lt;/dl&gt;&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;h2&gt;&lt;strong style="vertical-align: baseline;"&gt;Replication improvements&lt;/strong&gt;&lt;/h2&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Replication is an important part of the database high availability feature. In PostgreSQL 16, the community has added several usability features to replication. &lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Initial table synchronization in logical replication to copy rows in binary format&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;In PostgreSQL 16, we can initialize the copy of the rows for logical replication in binary format. This can be much faster, especially with columns that have binary data. Here is an example on how to create a subscription where in the initial data copy is in binary format:&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-code"&gt;&lt;dl&gt;
    &lt;dt&gt;code_block&lt;/dt&gt;
    &lt;dd&gt;&amp;lt;ListValue: [StructValue([(&amp;#x27;code&amp;#x27;, &amp;quot;testdb=&amp;gt;  create subscription  testtab connection &amp;#x27;host=10.101.0.20 port=5432 dbname=testdb user=replication_user password=&amp;lt;&amp;lt;pwd&amp;gt;&amp;gt;&amp;#x27; PUBLICATION testtab  WITH (copy_data=on, binary=true);&amp;quot;), (&amp;#x27;language&amp;#x27;, &amp;#x27;&amp;#x27;), (&amp;#x27;caption&amp;#x27;, &amp;lt;wagtail.rich_text.RichText object at 0x7f6ddbe9a4c0&amp;gt;)])]&amp;gt;&lt;/dd&gt;
&lt;/dl&gt;&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Improved &lt;/strong&gt;&lt;a href="https://www.postgresql.org/docs/current/logical-replication-architecture.html" rel="noopener" target="_blank"&gt;&lt;strong style="text-decoration: underline; vertical-align: baseline;"&gt;logical replication apply&lt;/strong&gt;&lt;/a&gt;&lt;strong style="vertical-align: baseline;"&gt; without a primary key&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Traditionally, PostgreSQL logical replication relied on full table scans for tables that lacked primary keys, impacting performance. However, with PostgreSQL 16, any available B-tree index on the table is now leveraged, significantly enhancing logical apply efficiency. Index usage statistics are available in the pg_stat_*_indexes view.&lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Logical decoding on standby&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;In PostgreSQL 16, logical decoding is enabled on the read replica, allowing subscribers to connect to the read replicas instead of the primary db instance. By doing so, the workload is shared between the primary instance and the replica, reducing strain on the former. This offloads the logical replication workload off of the primary instance onto the replica. This represents a huge performance improvement for the primary node, especially with nodes having many logical replication slots. Another advantage is, in case of a promotion of the replica, subscribers are not affected by the change and continue to operate without any hindrance. Be aware that any delay on the read replica will subsequently affect the logical subscriber, unlike before.&lt;/span&gt;&lt;/p&gt;
&lt;h2&gt;&lt;strong style="vertical-align: baseline;"&gt;Try PostgreSQL 16 today&lt;/strong&gt;&lt;/h2&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;It's time to try out PostgreSQL16 on Cloud SQL with improved observability, improved logical replication, vacuuming and much more. Start your PostgreSQL16 journey on Cloud SQL from &lt;/span&gt;&lt;a href="https://console.cloud.google.com/"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;here&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;</description><pubDate>Fri, 07 Jun 2024 16:00:00 +0000</pubDate><guid>https://cloud.google.com/blog/products/databases/postgresql-16-now-available-in-cloud-sql/</guid><category>Open Source</category><category>Databases</category><og xmlns:og="http://ogp.me/ns#"><type>article</type><title>What’s new in PostgreSQL 16: New features available in Cloud SQL today</title><description></description><site_name>Google</site_name><url>https://cloud.google.com/blog/products/databases/postgresql-16-now-available-in-cloud-sql/</url></og><author xmlns:author="http://www.w3.org/2005/Atom"><name>Indu Akkineni</name><title>Database Engineer, Cloud SQL</title><department></department><company></company></author></item><item><title>How to choose a known, trusted supplier for open source software</title><link>https://cloud.google.com/blog/products/identity-security/how-to-choose-a-known-trusted-supplier-for-open-source-software/</link><description>&lt;div class="block-paragraph"&gt;&lt;p data-block-key="fnyba"&gt;Open-source software is used throughout the technology industry to help developers build software tools, apps, and services. While developers building with open-source software can (and often do) benefit greatly from the work of others, they should also conduct appropriate due diligence to protect against &lt;a href="https://www.dni.gov/files/NCSC/documents/supplychain/Software_Supply_Chain_Attacks.pdf" target="_blank"&gt;software supply chain attacks&lt;/a&gt;.&lt;/p&gt;&lt;p data-block-key="cr34o"&gt;With an increasing focus on managing open-source software supply chain risk, both Citi and Google strive to apply more rigor across risk mitigation, especially while choosing known and trusted suppliers where open source components are sourced from.&lt;/p&gt;&lt;h3 data-block-key="8pjtn"&gt;&lt;b&gt;Key open source attack vectors&lt;/b&gt;&lt;/h3&gt;&lt;/div&gt;
&lt;div class="block-image_full_width"&gt;






  
    &lt;div class="article-module h-c-page"&gt;
      &lt;div class="h-c-grid"&gt;
  

    &lt;figure class="article-image--large
      
      
        h-c-grid__col
        h-c-grid__col--6 h-c-grid__col--offset-3
        
        
      "
      &gt;

      
      
        
        &lt;img
            src="https://storage.googleapis.com/gweb-cloudblog-publish/images/Key_open_source_attack_vectors.max-1000x1000.png"
        
          alt="Key open source attack vectors"&gt;
        
        &lt;/a&gt;
      
    &lt;/figure&gt;

  
      &lt;/div&gt;
    &lt;/div&gt;
  




&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;The diagram above highlights key open source attack vectors.  We can divide the common software supply chain security attacks into five main types:&lt;/span&gt;&lt;/p&gt;
&lt;ol&gt;
&lt;li role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;Attacks at runtime leveraging vulnerabilities in the code&lt;/span&gt;&lt;/li&gt;
&lt;li role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;Attacks on the repositories, tooling and processes&lt;/span&gt;&lt;/li&gt;
&lt;li role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;Attacks on the integrity of the artifacts as they progress through the pipeline&lt;/span&gt;&lt;/li&gt;
&lt;li role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;Attacks on the primary open source dependencies that customers’ applications leverage&lt;/span&gt;&lt;/li&gt;
&lt;li role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;Attacks throughout the inherited transitive dependency chain of the open source packages&lt;/span&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Application security experts have seen their work increase and get harder as these attacks have increased in recent years. Open-source components often include and depend on the functionality of other open-source components in order to function. These components can have two types of dependencies: direct and transitive. &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Generally, the interactions work like this: The application makes an initial call to a direct dependency. If the direct dependency requires any outside components for it to function, those outside components are the application’s transitive dependencies.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;These types of dependencies are notoriously difficult to remediate. This is because they are not readily accessible to the developer. Their code base resides with their maintainers, rendering the application entirely dependent upon their work. If the maintainer of one of these transitive dependencies releases a fix, the amount of time before it makes its way up the supply chain to impact your direct dependency could be a while. &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Thus, the management of vulnerabilities needs to be extended to the full transitive dependency chain as this is where &lt;/span&gt;&lt;a href="https://www.forbes.com/sites/forbestechcouncil/2023/05/26/the-hidden-risk-lurking-in-the-software-supply-chain-transitive-open-source-dependencies/?sh=5e66dc87512f" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;95% of the vulnerabilities&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; are found. Maintaining a regular upgrade and patching process for your software development lifecycle (SDLC) tooling is now a must; as is upgrading the security of both your repositories and processes combined with active security testing of each. &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Tamper-evident provenance and signing can increase confidence in the ability to maintain artifact integrity throughout the pipeline. And mapping and understanding the full transitive dependency chain of all external components and depending on only known and trusted providers for these components becomes a required condition. &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://www.cisa.gov/sites/default/files/2023-10/Fact_Sheet_Improving_OSS_in_OT_ICS_508c.pdf" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Recent guidance from CISA&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; and other government agencies supports the focus on appropriately selecting and testing open source software ahead of ingestion from a trusted source. While some organizations load built software artifacts directly from public package repositories, others with a more restrictive security risk appetite will require more stringent security controls requiring the use of curated open-source software providers. &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;They may opt to only leverage open-source software they themselves have built from source, although this would be prohibitively expensive for most. But if they chose to use a curated third party, what checks must they look for before delegating that critical authority?&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;There are three main criteria to evaluate a curated OSS vendor:&lt;/span&gt;&lt;/p&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;1. &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;High level of security maturity&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;A trusted supplier must demonstrate a high level of security maturity. Common areas of focus are to examine the security hygiene of the supplier in particular. Look for details of the vulnerability management culture and ability to quickly keep up to date with patching within the organisation. They should also have a well trained team, prepared to quickly address any incidents and a regular penetration testing team, continuously validating the security posture of the organisation. &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;The trusted supplier should be able to demonstrate the security of their own underlying foundational infrastructure. Check that they:&lt;/span&gt;&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;span style="vertical-align: baseline;"&gt;Have an up-to-date inventory of their own external dependencies.&lt;/span&gt;&lt;/li&gt;
&lt;li&gt;&lt;span style="vertical-align: baseline;"&gt;Demonstrate knowledge and control of all ingest points.&lt;/span&gt;&lt;/li&gt;
&lt;li&gt;&lt;span style="vertical-align: baseline;"&gt;Leverage a single production build service so that they can maintain a singular logical control point.&lt;/span&gt;&lt;/li&gt;
&lt;li&gt;&lt;span style="vertical-align: baseline;"&gt;Meet best practice standards for managing their infrastructure, including:&lt;/span&gt;
&lt;ul&gt;
&lt;li style="list-style-type: disc; vertical-align: baseline;"&gt;&lt;span style="vertical-align: baseline;"&gt;Well-designed separation of duties and IAM controls&lt;/span&gt;&lt;/li&gt;
&lt;li style="list-style-type: disc; vertical-align: baseline;"&gt;&lt;span style="vertical-align: baseline;"&gt;Built-in organizational policy and guardrails to secure a Zero Trust network design&lt;/span&gt;&lt;/li&gt;
&lt;li style="list-style-type: disc; vertical-align: baseline;"&gt;&lt;span style="vertical-align: baseline;"&gt;Automated and regular patching, with associated evidence&lt;/span&gt;&lt;/li&gt;
&lt;li style="list-style-type: disc; vertical-align: baseline;"&gt;&lt;span style="vertical-align: baseline;"&gt;Complementary continuous threat detection, logging, and monitoring systems that support these posture controls&lt;/span&gt;&lt;/li&gt;
&lt;li style="list-style-type: disc; vertical-align: baseline;"&gt;&lt;span style="vertical-align: baseline;"&gt;Bonus points if they operate with "everything as code" and with hermetic, reproducible and verifiable builds&lt;/span&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;2. &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;High level of internal SDLC security&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;The security of the SDLC used within the trusted supplier must be extremely high, particularly around the control plane of the SDLC and the components that interact with the source code to build the end product. Each system must be heavily secured and vetted to ensure any changes to the software is reviewed, audited, and requires multi-party approvals before progressing to the next stage or deployment. Strong authentication and authorisation policies must be in place to ensure that only highly trusted individuals could ever build, or change the vendor infrastructure. &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;The SDLC security also needs to extend to the beginning of the ingestion of the source code material into the facility and to any code or functionality used within the control plane of the system itself.&lt;/span&gt;&lt;/p&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;3. &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;Effective insider threat program&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;As the trusted supplier is a high value target, there will be the potential for an insider threat as an attack vector.Therefore, the curated vendor would be expected to have an active and effective insider threat program. This personnel vetting approach should also extend to ensuring the location of all staff are within approved proximity and not outsourced. &lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Trust but verify&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;It is also important that the trusted supplier provide supporting evidence and insights. This evidence includes:&lt;/span&gt;&lt;/p&gt;
&lt;ol&gt;
&lt;li role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;Checkable attestations on infrastructure security and processes via third party certifications and/or your own independent audit.  &lt;/span&gt;&lt;/li&gt;
&lt;li role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;Checkable attestations for the security posture and processes for their SDLC against a standard framework like SLSA or SSDF.   &lt;/span&gt;&lt;/li&gt;
&lt;li role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;Cryptographic signatures on the served packages and any associated accompanying metadata so that you can verify source and distribution integrity.&lt;/span&gt;&lt;/li&gt;
&lt;/ol&gt;
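&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;As a minimal sketch of that third check, using the open-source Sigstore cosign tool (the package, signature, and key file names are hypothetical and would come from your supplier):&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-code"&gt;&lt;dl&gt;
    &lt;dt&gt;code_block&lt;/dt&gt;
    &lt;dd&gt;# Hypothetical example: verify a served package against the supplier&amp;#x27;s published public key
$ cosign verify-blob --key supplier-pubkey.pem --signature package.tar.gz.sig package.tar.gz&lt;/dd&gt;
&lt;/dl&gt;&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;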
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;The actual relevance and security risk of an issue in a package is the combination of&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;inherent criticality of in isolation, the context it's used in, the environmental conditions in which its deployed, any external compensating controls, and decreased or increased risk in the environment. The figure below shows the interrelationship and interaction between vulnerabilities and threats in the application and those from the underlying infrastructure.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-image_full_width"&gt;






  
    &lt;div class="article-module h-c-page"&gt;
      &lt;div class="h-c-grid"&gt;
  

    &lt;figure class="article-image--large
      
      
        h-c-grid__col
        h-c-grid__col--6 h-c-grid__col--offset-3
        
        
      "
      &gt;

      
      
        
        &lt;img
            src="https://storage.googleapis.com/gweb-cloudblog-publish/images/End_to_end_risk_diagram.max-1000x1000.png"
        
          alt="End to end risk diagram"&gt;
        
        &lt;/a&gt;
      
    &lt;/figure&gt;

  
      &lt;/div&gt;
    &lt;/div&gt;
  




&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;4. Enhanced security and risk metadata that should accompany each served package to increase your understanding and insights to both the inherent component risk of the code or artifact as well as how that risk can change in context of your specific application and environment. Key metadata can include:  &lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;Standard SBOM with SCA insights - vulnerabilities, licensing info, fully mapped transitive dependencies and associated vulnerability and licensing risk.  &lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;VEX statements for how the inherited vulnerabilities from transitive dependencies affect the primary package being served. &lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;Any related threat intelligence specific to the package, use case, or your organization.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;The ability of the supplier to provide this type of enhanced data reinforces the evidence that they have achieved a high level of security and that the components they serve represent assured and more trustable ingredients you can employ with greater confidence.&lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Better control and balancing benefits of open source components&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Leveraging open source components is critical to developer velocity, quality and accelerating innovation and execution. Applying these recommendations and requirements can enable you to better control and balance the benefits of using open source components with the potential risk of introducing targetable weak points in your SDLC and ultimately reduce your risk and exposure.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Google Cloud’s &lt;/span&gt;&lt;a href="https://cloud.google.com/security/products/assured-open-source-software"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Assured Open Source Software&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; (Assured OSS) service for Java and Python ecosystems gives any organization that uses open source software the opportunity to leverage the security and experience Google applies to open source dependencies by incorporating the same OSS packages that Google secures and uses into their own developer workflows. &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Learn more about &lt;/span&gt;&lt;a href="https://cloud.google.com/security/products/assured-open-source-software"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Assured Open Source Software&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, enable Assured OSS through our&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt; &lt;/span&gt;&lt;a href="https://developers.google.com/assured-oss?utm_source=blog&amp;amp;utm_medium=referral#get-started" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;self-serve onboarding&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; &lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;form, use the metadata API to list available Python and Java packages and determine which Assured OSS packages you want to use.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;</description><pubDate>Tue, 26 Mar 2024 16:00:00 +0000</pubDate><guid>https://cloud.google.com/blog/products/identity-security/how-to-choose-a-known-trusted-supplier-for-open-source-software/</guid><category>Open Source</category><category>Partners</category><category>Security &amp; Identity</category><og xmlns:og="http://ogp.me/ns#"><type>article</type><title>How to choose a known, trusted supplier for open source software</title><description></description><site_name>Google</site_name><url>https://cloud.google.com/blog/products/identity-security/how-to-choose-a-known-trusted-supplier-for-open-source-software/</url></og><author xmlns:author="http://www.w3.org/2005/Atom"><name>Jonathan Meadows</name><title>Managing Director, Citi Tech Fellow, Citi</title><department></department><company></company></author><author xmlns:author="http://www.w3.org/2005/Atom"><name>Andy Chang</name><title>Group Product Manager, Google Cloud Security</title><department>Security &amp; Privacy</department><company></company></author></item><item><title>A window into protein folding: Lowering the barriers for AlphaFold Inferencing</title><link>https://cloud.google.com/blog/products/ai-machine-learning/alphafold-portal-on-vertex-ai-alphafold-inference-pipeline/</link><description>&lt;div class="block-paragraph"&gt;&lt;p data-block-key="39r49"&gt;The open-source tool &lt;a href="https://github.com/GoogleCloudPlatform/vertex-ai-alphafold-inference-pipeline" target="_blank"&gt;Vertex AI AlphaFold Inference Pipeline&lt;/a&gt; has enabled biotech companies in streamlining protein-folding activities, accelerating their go to market timeline. It addresses key challenges in protein structure prediction by unleashing the power of parallel processing, optimizing compute resources, and scaling to meet high-throughput demands. Furthermore, it ensures reproducibility, lineage analysis, flexibility, adaptability, and seamless integration with upstream and downstream systems – all within &lt;a href="https://cloud.google.com/ai-platform/"&gt;Vertex AI&lt;/a&gt; as the one-stop platform. 
With this tool, researchers can unlock new possibilities, make groundbreaking discoveries faster than ever before, and drive end-to-end efficiency in their biotech drug discovery efforts.&lt;/p&gt;&lt;p data-block-key="3beui"&gt;However, even with Google Cloud's efforts to make the &lt;a href="https://deepmind.google/technologies/alphafold/" target="_blank"&gt;AlphaFold&lt;/a&gt; algorithm more &lt;a href="https://cloud.google.com/blog/products/ai-machine-learning/running-alphafold-on-vertexai"&gt;accessible&lt;/a&gt; to biotech firms, many bioscience organizations still struggle to integrate this technology seamlessly into their researchers' workflows.&lt;/p&gt;&lt;p data-block-key="3at2l"&gt;The biggest challenge is this: scientists who obsess over protein shapes aren't usually coding ninjas or cloud wizards. Asking them to wrestle with complicated setups just to get a glimpse of a protein is like asking a chef to build their own oven before they can cook dinner. It's not the best recipe for success (or tasty results).&lt;/p&gt;&lt;h3 data-block-key="bcmth"&gt;&lt;b&gt;Solution Overview&lt;/b&gt;&lt;/h3&gt;&lt;p data-block-key="a6mko"&gt;To reduce the friction, we are making our &lt;a href="https://github.com/GoogleCloudPlatform/vertex-ai-alphafold-inference-pipeline" target="_blank"&gt;Vertex AI AlphaFold Inference Pipeline&lt;/a&gt; easier to use, including introducing a user-friendly &lt;b&gt;AlphaFold Portal&lt;/b&gt; – think of it like protein modeling for beginners. We empower scientists, irrespective of their prior experience with cloud computing, to derive protein structures with minimal effort. The portal eliminates the need to engage with intricate coding (like Python on a Jupyter notebook), enabling users to focus on protein inference results iterations.&lt;/p&gt;&lt;p data-block-key="24i4q"&gt;The Google Cloud AlphaFold &lt;a href="https://github.com/GoogleCloudPlatform/vertex-ai-alphafold-inference-pipeline" target="_blank"&gt;repository&lt;/a&gt; now includes the option to deploy this serverless portal, which offers a streamlined, secure, and centralized way to manage protein folding experiments. Launch new experiments with a single click, simplifying workflows and saving valuable time.&lt;/p&gt;&lt;h3 data-block-key="bv21u"&gt;&lt;b&gt;Centralized Pipelines&lt;/b&gt;&lt;/h3&gt;&lt;p data-block-key="tomv"&gt;The portal makes researchers' work more efficient in several ways:&lt;/p&gt;&lt;ul&gt;&lt;li data-block-key="966ug"&gt;&lt;b&gt;Centralized access&lt;/b&gt;: Multiple researchers can access the portal through a single web address instead of running their own Jupyter notebook instances or deploying infrastructure on separate projects.&lt;/li&gt;&lt;li data-block-key="3ah29"&gt;&lt;b&gt;Streamlined protein folding&lt;/b&gt;: Researchers can run protein folding pipeline jobs under their usernames and filter simulation results based on other researchers' work. This allows for easy comparison and fine-tuning.&lt;/li&gt;&lt;li data-block-key="8s2pk"&gt;&lt;b&gt;Enhanced collaboration&lt;/b&gt;: Previously, each researcher needed to run their own Jupyter notebook instance to run each protein-folding job. Now, they can collaborate more easily by accessing and comparing simulation results in a centralized location.&lt;/li&gt;&lt;/ul&gt;&lt;/div&gt;
&lt;div class="block-image_full_width"&gt;






  
    &lt;div class="article-module h-c-page"&gt;
      &lt;div class="h-c-grid"&gt;
  

    &lt;figure class="article-image--large
      
      
        h-c-grid__col
        h-c-grid__col--6 h-c-grid__col--offset-3
        
        
      "
      &gt;

      
      
        
        &lt;img
            src="https://storage.googleapis.com/gweb-cloudblog-publish/images/1_-_AlphaFold_Dashboard.max-1000x1000.png"
        
          alt="1 - AlphaFold Dashboard"&gt;
        
        &lt;/a&gt;
      
        &lt;figcaption class="article-image__caption "&gt;&lt;p data-block-key="amkoi"&gt;1- AlphaFold Portal Dashboard&lt;/p&gt;&lt;/figcaption&gt;
      
    &lt;/figure&gt;

  
      &lt;/div&gt;
    &lt;/div&gt;
  




&lt;/div&gt;
&lt;div class="block-paragraph"&gt;&lt;p data-block-key="39r49"&gt;Consider this dashboard to be the central hub for protein folding endeavors. Users can personalize the display, expertly filter results, and utilize designated link buttons to directly access protein resources. The need to navigate through complex configuration or executions has now been simplified.&lt;/p&gt;&lt;p data-block-key="93bs9"&gt;Are you prepared to engage in protein folding? With just two clicks, your sequence (in FASTA format) will be processed and simulated. The UI will auto select recommendations for the optimal GPU machine configuration based on the type and size of your protein. However, if you are not satisfied with the suggested settings, you have the option to expand the advanced settings and customize them to your desired specifications.&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-image_full_width"&gt;






  
    &lt;div class="article-module h-c-page"&gt;
      &lt;div class="h-c-grid"&gt;
  

    &lt;figure class="article-image--large
      
      
        h-c-grid__col
        h-c-grid__col--6 h-c-grid__col--offset-3
        
        
      "
      &gt;

      
      
        
        &lt;img
            src="https://storage.googleapis.com/gweb-cloudblog-publish/original_images/image1_lh3JcSq.png"
        
          alt="2 - New protein folding"&gt;
        
        &lt;/a&gt;
      
        &lt;figcaption class="article-image__caption "&gt;&lt;p data-block-key="amkoi"&gt;2 - New Protein Folding&lt;/p&gt;&lt;/figcaption&gt;
      
    &lt;/figure&gt;

  
      &lt;/div&gt;
    &lt;/div&gt;
  




&lt;/div&gt;
&lt;div class="block-paragraph"&gt;&lt;p data-block-key="39r49"&gt;Furthermore, we have integrated a preview function for your protein models. Tapping into an &lt;a href="https://3dmol.csb.pitt.edu/" target="_blank"&gt;open-source visualization tool&lt;/a&gt;, you can now seamlessly explore the intricate molecular structures without leaving the interface.&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-image_full_width"&gt;






  
    &lt;div class="article-module h-c-page"&gt;
      &lt;div class="h-c-grid"&gt;
  

    &lt;figure class="article-image--large
      
      
        h-c-grid__col
        h-c-grid__col--6 h-c-grid__col--offset-3
        
        
      "
      &gt;

      
      
        
        &lt;img
            src="https://storage.googleapis.com/gweb-cloudblog-publish/images/3_-_Protein_structure_visualization.max-1000x1000.png"
        
          alt="3 - Protein structure visualization"&gt;
        
        &lt;/a&gt;
      
        &lt;figcaption class="article-image__caption "&gt;&lt;p data-block-key="amkoi"&gt;3 - Protein structure visualization&lt;/p&gt;&lt;/figcaption&gt;
      
    &lt;/figure&gt;

  
      &lt;/div&gt;
    &lt;/div&gt;
  




&lt;/div&gt;
&lt;div class="block-paragraph"&gt;&lt;p data-block-key="39r49"&gt;This tool empowers everyone in your biotech organization to harness the power of protein folding, regardless of their cloud or coding experience. Executing this highly complex and compute intensive workload seamlessly on a streamlined, optimized infrastructure, ensuring efficiency and ease of use.&lt;/p&gt;&lt;h3 data-block-key="phhl"&gt;&lt;b&gt;Getting started&lt;/b&gt;&lt;/h3&gt;&lt;p data-block-key="cn3v"&gt;If you're a Google Cloud newbie, no worries! We recommend checking out the &lt;a href="https://cloud.google.com/docs/get-started"&gt;Getting Started page&lt;/a&gt; to get familiarized with Google Cloud. Then, &lt;a href="https://cloud.google.com/resource-manager/docs/creating-managing-projects#creating_a_project"&gt;create a project&lt;/a&gt; to house all this protein-folding magic.&lt;/p&gt;&lt;p data-block-key="1mtok"&gt;To proceed, follow the instructions provided in the open-source Google Cloud AlphaFold repository, accessible via the &lt;a href="https://github.com/GoogleCloudPlatform/vertex-ai-alphafold-inference-pipeline" target="_blank"&gt;link&lt;/a&gt;. This repository contains convenient, pre-built templates that will assist you in setting up all the necessary components. Kindly note that this part of the process may require some technical expertise. If you encounter any challenges or require guidance, your dedicated GCP representative is readily available to assist you in navigating the complexities of the cloud.&lt;/p&gt;&lt;/div&gt;</description><pubDate>Mon, 11 Mar 2024 16:00:00 +0000</pubDate><guid>https://cloud.google.com/blog/products/ai-machine-learning/alphafold-portal-on-vertex-ai-alphafold-inference-pipeline/</guid><category>Healthcare &amp; Life Sciences</category><category>Open Source</category><category>AI &amp; Machine Learning</category><og xmlns:og="http://ogp.me/ns#"><type>article</type><title>A window into protein folding: Lowering the barriers for AlphaFold Inferencing</title><description></description><site_name>Google</site_name><url>https://cloud.google.com/blog/products/ai-machine-learning/alphafold-portal-on-vertex-ai-alphafold-inference-pipeline/</url></og><author xmlns:author="http://www.w3.org/2005/Atom"><name>Yudy Hendry</name><title>Solutions Architect, Google Cloud</title><department></department><company></company></author><author xmlns:author="http://www.w3.org/2005/Atom"><name>Alfonso Miranda</name><title>Customer Engineer, Machine Learning</title><department></department><company></company></author></item><item><title>A decade of Kubernetes leadership: why Google Cloud should be your choice for Kubernetes</title><link>https://cloud.google.com/blog/products/containers-kubernetes/why-choose-gke-as-your-kubernetes-service/</link><description>&lt;div class="block-paragraph"&gt;&lt;p data-block-key="f3xdp"&gt;Kubernetes has become a critical part of the modern software development landscape. 
Originally developed by Google, it is now the second largest open source project in history, with &lt;a href="https://k8s.devstats.cncf.io/d/24/overall-project-statistics?orgId=1&amp;amp;var-period_name=Last%20decade&amp;amp;var-repogroup_name=All&amp;amp;var-repo_name=kubernetes%2Fkubernetes" target="_blank"&gt;over 83,000 unique contributors&lt;/a&gt; over the past decade, and is the de facto standard for running containerized applications in production.&lt;/p&gt;&lt;p data-block-key="5to0d"&gt;Kubernetes has also helped to democratize the cloud, making it possible for businesses of all sizes to take advantage of the cloud with the benefits of containerization. A powerful and flexible platform that can run a wide variety of applications, Kubernetes is used by companies of all sizes and powers some of the world's largest and most complex applications. More recently, with the explosion of generative AI and large language models (LLMs), companies are turning to Kubernetes to run and scale complex and compute-intensive machine learning platforms.&lt;/p&gt;&lt;p data-block-key="5b8mc"&gt;The success of Kubernetes is a testament to the power of open-source software. Kubernetes is a radically open, community-first project. Tens of thousands of developers from across the globe contribute to it, enhancing its capabilities and adapting it to new use cases. As a result, Kubernetes continues to evolve at a pace that is only possible through open source.&lt;/p&gt;&lt;h3 data-block-key="1skaa"&gt;&lt;b&gt;Open-sourcing Kubernetes expanded opportunities for an entire industry&lt;/b&gt;&lt;/h3&gt;&lt;p data-block-key="ep8rd"&gt;Kubernetes was born at Google and released as open source in 2014. Its roots trace back to &lt;a href="https://kubernetes.io/blog/2015/04/borg-predecessor-to-kubernetes/" target="_blank"&gt;Google’s internal Borg system&lt;/a&gt; (introduced between 2003 and 2004), which powers everything from Google Search to Maps to YouTube. On average, Google launches more than 4 billion containers a week!&lt;/p&gt;&lt;p data-block-key="c3g5q"&gt;Open-sourcing Kubernetes was a revolutionary move. It spawned the Cloud Native Computing Foundation (CNCF) and fostered a community of contributors and users around the world. As this global community continues to grow, Google’s commitment to Kubernetes is stronger than ever, acting as a steward and providing consistent leadership to ensure its continued growth.&lt;/p&gt;&lt;p data-block-key="4tokm"&gt;Today, Google is the &lt;a href="https://k8s.devstats.cncf.io/d/9/companies-table?orgId=1&amp;amp;var-period_name=Last%20decade&amp;amp;var-metric=contributions" target="_blank"&gt;largest contributor&lt;/a&gt; to Kubernetes with over one million contributions — that’s more than &lt;i&gt;the next four organizations combined&lt;/i&gt;. In addition to investing time and development resources, &lt;a href="https://cloud.google.com/blog/products/containers-kubernetes/google-cloud-credits-support-cncf-work-on-kubernetes"&gt;Google Cloud also donates millions of dollars per year&lt;/a&gt; to support the infrastructure needed to host Kubernetes containers and build and test each release.&lt;/p&gt;&lt;p data-block-key="9hd22"&gt;Looking strictly at cloud providers over the past year, Google Cloud has made three times the number of contributions as the next closest provider:&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-image_full_width"&gt;






  
    &lt;div class="article-module h-c-page"&gt;
      &lt;div class="h-c-grid"&gt;
  

    &lt;figure class="article-image--large
      
      
        h-c-grid__col
        h-c-grid__col--6 h-c-grid__col--offset-3
        
        
      "
      &gt;

      
      
        
        &lt;img
            src="https://storage.googleapis.com/gweb-cloudblog-publish/images/1_ZxyK4KX.max-1000x1000.jpg"
        
          alt="1"&gt;
        
        &lt;/a&gt;
      
        &lt;figcaption class="article-image__caption "&gt;&lt;p data-block-key="ump78"&gt;Source: &lt;a href="https://k8s.devstats.cncf.io/d/9/companies-table?orgId=1&amp;amp;var-period_name=Last%20year&amp;amp;var-metric=contributions"&gt;Kubernetes Companies Statistics - Past Year&lt;/a&gt;&lt;/p&gt;&lt;/figcaption&gt;
      
    &lt;/figure&gt;

  
      &lt;/div&gt;
    &lt;/div&gt;
  




&lt;/div&gt;
&lt;div class="block-paragraph"&gt;&lt;p data-block-key="f3xdp"&gt;Our contributions to and engagements in Kubernetes are far-reaching:&lt;/p&gt;&lt;ul&gt;&lt;li data-block-key="7di7g"&gt;Co-chairing and acting as technical leads for many core Special Interest Groups (SIGs) including API Machinery, Autoscaling, Networking, Scheduling, and Storage.&lt;/li&gt;&lt;li data-block-key="bp9g0"&gt;Identifying and resolving complex problems that impact both the community and Google's customers. For example, Google has invested heavily with the community on improving upgrades and deprecations for all of Kubernetes, which has helped provide a much more stable platform for all customers.&lt;/li&gt;&lt;li data-block-key="18j8n"&gt;Fixing &lt;a href="https://kubernetes.io/docs/reference/issues-security/official-cve-feed/" target="_blank"&gt;over half of the security vulnerabilities&lt;/a&gt; that have been found in Kubernetes. This is a significant contribution to Kubernetes security, and demonstrates Google's commitment to keeping Kubernetes secure for users.&lt;/li&gt;&lt;li data-block-key="3a8i9"&gt;Working closely with Googlers who work on Go to keep &lt;a href="https://kubernetes.io/blog/2023/04/06/keeping-kubernetes-secure-with-updated-go-versions/" target="_blank"&gt;Kubernetes secure with updated Go versions&lt;/a&gt;. The Go team is responsible for developing the Go programming language, which is used to write Kubernetes code. Googlers work closely with the Go team to ensure that Kubernetes is compatible with the latest Go versions, and to fix any security vulnerabilities that are found in Go.&lt;/li&gt;&lt;li data-block-key="1ama0"&gt;Leading the development of &lt;a href="https://kubernetes.io/docs/concepts/security/pod-security-standards/" target="_blank"&gt;Pod Security Standards&lt;/a&gt;, a set of best practices for securing Kubernetes pods. Googlers have been leading the development of these standards, and have published a number of guides and resources to help users secure their Kubernetes pods.&lt;/li&gt;&lt;li data-block-key="9u7hf"&gt;Creating the initial &lt;a href="https://github.com/container-storage-interface/spec/blob/master/spec.md" target="_blank"&gt;Container Storage Interface&lt;/a&gt; (CSI) specification, defining how containers can access storage. Googlers were involved in the early development of CSI, and they helped to create the initial specification. CSI is now widely used by open source and commercial storage vendors.&lt;/li&gt;&lt;li data-block-key="k4sj"&gt;Creating the &lt;a href="https://github.com/google/cel-spec" target="_blank"&gt;Common Expression Language&lt;/a&gt; (CEL) for expressing queries and transformations on structured data. CEL is used in a variety of Kubernetes components, including &lt;a href="https://kubernetes.io/docs/reference/access-authn-authz/validating-admission-policy/" target="_blank"&gt;Validating Admission Policy&lt;/a&gt; and &lt;a href="https://kubernetes.io/docs/reference/access-authn-authz/validating-admission-policy/" target="_blank"&gt;Custom Resource Validation Expressions&lt;/a&gt;. CEL is a powerful and flexible language that has helped to improve the extensibility and usability of Kubernetes.&lt;/li&gt;&lt;/ul&gt;&lt;p data-block-key="1gg3l"&gt;Google's contributions to Kubernetes have been significant and have helped make the platform more robust, scalable, secure, and reliable. 
Moreover, Google continues to push Kubernetes forward into new domains such as &lt;a href="https://thenewstack.io/kubernetes-evolution-from-microservices-to-batch-processing-powerhouse/" target="_blank"&gt;batch processing&lt;/a&gt; and machine learning, with contributions to CNCF such as job queueing with &lt;a href="https://kubernetes.io/blog/2022/10/04/introducing-kueue/" target="_blank"&gt;Kueue&lt;/a&gt; and ML operations and workflows with &lt;a href="https://www.cncf.io/blog/2023/07/25/kubeflow-brings-mlops-to-the-cncf-incubator/" target="_blank"&gt;Kubeflow&lt;/a&gt;. These contributions matter; if the Kubernetes community is thriving, it’s thanks to a core group of individuals and companies actually investing their time in the critical “chopping wood and carrying water” tasks and building new functionality from which everyone can benefit. For Kubernetes to continue to be a great platform for new workloads such as AI/ML, we need more companies who benefit from Kubernetes to do their part and contribute.&lt;/p&gt;&lt;h3 data-block-key="1lgum"&gt;&lt;b&gt;Why customers trust Google Kubernetes Engine for mission-critical workloads&lt;/b&gt;&lt;/h3&gt;&lt;p data-block-key="bp49h"&gt;Google Kubernetes Engine (GKE) is the most scalable and fully automated Kubernetes service available. It is a popular choice for businesses of all sizes and industries, and is used to host some of the world's largest and most complex applications. With GKE, you can be confident that your applications are running on a reliable and scalable platform that is backed by Google Cloud's expertise. GKE now includes multi-cluster and distributed team management, policy enforcement with &lt;a href="https://cloud.google.com/anthos-config-management/docs/concepts/policy-controller"&gt;Policy Controller&lt;/a&gt;, GitOps-based configuration with &lt;a href="https://cloud.google.com/anthos-config-management/docs/config-sync-overview"&gt;Config Sync&lt;/a&gt;, self-service provisioning of your Google Cloud Resources with &lt;a href="https://cloud.google.com/anthos-config-management/docs/concepts/config-controller-overview"&gt;Config Controller&lt;/a&gt;, and a fully managed &lt;a href="https://cloud.google.com/service-mesh/docs/overview#managed_anthos_service_mesh"&gt;Istio-powered service mesh&lt;/a&gt;. All of these new capabilities are integrated with &lt;a href="https://cloud.google.com/anthos/docs/concepts/gke-editions"&gt;GKE Enterprise&lt;/a&gt; and are ideal for customers getting started with Kubernetes or those already deployed globally.&lt;/p&gt;&lt;p data-block-key="e2jf2"&gt;Customers use GKE to run mission-critical applications for a variety of reasons:&lt;/p&gt;&lt;ul&gt;&lt;li data-block-key="8md8d"&gt;Who better to operate and manage your environment than the team that created Kubernetes? 
The entire open source Kubernetes project is built, tested, and distributed on Google Cloud, and we use GKE for several services including &lt;a href="https://cloud.google.com/vertex-ai"&gt;Vertex AI&lt;/a&gt; and &lt;a href="https://deepmind.google/" target="_blank"&gt;DeepMind&lt;/a&gt;.&lt;/li&gt;&lt;li data-block-key="33sai"&gt;GKE is a Leader in the &lt;a href="https://inthecloud.withgoogle.com/gartner-magic-quadrant-report-containers-2023/dl-cd.html" target="_blank"&gt;2023 Gartner Magic Quadrant for Container Management&lt;/a&gt;.&lt;/li&gt;&lt;li data-block-key="3r1ed"&gt;It accelerates and efficiently scales &lt;a href="https://g.co/cloud/gke-aiml" target="_blank"&gt;AI/ML workloads&lt;/a&gt; with &lt;a href="https://cloud.google.com/kubernetes-engine/docs/concepts/timesharing-gpus"&gt;GPU time-sharing&lt;/a&gt; and &lt;a href="https://cloud.google.com/blog/products/compute/how-to-use-cloud-tpus-with-gke"&gt;Cloud TPUs&lt;/a&gt;.&lt;/li&gt;&lt;li data-block-key="7ala4"&gt;GKE offers the first fully-managed, serverless Kubernetes experience with &lt;a href="https://cloud.google.com/kubernetes-engine/docs/concepts/autopilot-overview"&gt;GKE Autopilot&lt;/a&gt;, a hands-off mode of operation that manages the underlying compute infrastructure while providing the full power of the Kubernetes API and being backed by a pod-level SLA and Google’s renowned SRE team.&lt;/li&gt;&lt;li data-block-key="fi0pl"&gt;It scales to meet the needs of even the largest and most demanding applications with unparalleled &lt;a href="https://cloud.google.com/blog/products/containers-kubernetes/google-kubernetes-engine-clusters-can-have-up-to-15000-nodes"&gt;15,000 node clusters&lt;/a&gt;. For instance, &lt;a href="https://www.pgs.com/company/newsroom/news/industry-insights--hpc-in-the-cloud/" target="_blank"&gt;PGS replaced&lt;/a&gt; its Cray with a GKE-based supercomputer capable of 72.02 petaFLOPS.&lt;/li&gt;&lt;li data-block-key="4b3hl"&gt;GKE delivers enterprise-grade security with features such as &lt;a href="https://cloud.google.com/blog/products/identity-security/gke-security-posture-now-generally-available-with-enhanced-features"&gt;GKE Security Posture&lt;/a&gt; to scan for misconfigured workloads and container image vulnerabilities, &lt;a href="https://cloud.google.com/kubernetes-engine/docs/how-to/network-policy"&gt;network policy enforcement&lt;/a&gt; with built-in Kubernetes Network Policy, &lt;a href="https://cloud.google.com/kubernetes-engine/docs/concepts/sandbox-pods"&gt;GKE Sandbox&lt;/a&gt; for isolating untrusted workloads, and &lt;a href="https://cloud.google.com/kubernetes-engine/docs/how-to/confidential-gke-nodes"&gt;Confidential Nodes&lt;/a&gt; for encrypting workload data in use.&lt;/li&gt;&lt;li data-block-key="374rb"&gt;Seamless &lt;a href="https://cloud.google.com/kubernetes-engine/docs/how-to/node-auto-upgrades"&gt;automatic upgrades&lt;/a&gt; with fine-grained controls such as &lt;a href="https://cloud.google.com/kubernetes-engine/docs/concepts/node-pool-upgrade-strategies#blue-green-upgrade-strategy"&gt;blue-green upgrades&lt;/a&gt; and &lt;a href="https://cloud.google.com/kubernetes-engine/docs/concepts/maintenance-windows-and-exclusions"&gt;maintenance windows and exclusions&lt;/a&gt;.&lt;/li&gt;&lt;li data-block-key="cb38"&gt;Flexible deployment options to meet business, regulatory and/or compliance needs and requirements. 
These include &lt;a href="https://cloud.google.com/distributed-cloud"&gt;Google Distributed Cloud&lt;/a&gt;, to extend Google Cloud to customer data centers or edge locations with fully managed hardware and software deployment options; multi-cloud deployment to &lt;a href="https://cloud.google.com/anthos/clusters/docs/multi-cloud/aws"&gt;AWS&lt;/a&gt; and &lt;a href="https://cloud.google.com/anthos/clusters/docs/multi-cloud/azure"&gt;Azure&lt;/a&gt;; and the ability to attach and manage any CNCF-compliant Kubernetes cluster.&lt;/li&gt;&lt;li data-block-key="dgsfa"&gt;Google Cloud has expertise in running &lt;a href="https://cloud.google.com/architecture/best-practices-for-running-cost-effective-kubernetes-applications-on-gke"&gt;cost-optimized applications&lt;/a&gt;, including publishing the inaugural &lt;a href="https://cloud.google.com/blog/products/containers-kubernetes/new-report-state-of-kubernetes-cost-optimization"&gt;State of Kubernetes Cost Optimization Report&lt;/a&gt;.&lt;/li&gt;&lt;li data-block-key="ei3ts"&gt;We release new minor versions of GKE approximately 30 days after the release of the corresponding open source version, ensuring that GKE users have access to the latest security patches and features as soon as possible.&lt;/li&gt;&lt;/ul&gt;&lt;p data-block-key="7m6f0"&gt;If you are looking for a scalable, reliable, and fully automated Kubernetes service to run everything from microservices to databases to the most-demanding generative AI workloads, then GKE is the right choice for you.&lt;/p&gt;&lt;h3 data-block-key="b3fd"&gt;&lt;b&gt;Join us at KubeCon + CloudNativeCon North America 2023&lt;/b&gt;&lt;/h3&gt;&lt;p data-block-key="aqf35"&gt;If you plan to be at KubeCon, we’d love to meet with you. You can check out all of our plans &lt;a href="https://inthecloud.withgoogle.com/kubecon-northam-chicago-microsite-23/register.html#home" target="_blank"&gt;here&lt;/a&gt;, but here are a few highlights:&lt;br/&gt;&lt;/p&gt;&lt;ul&gt;&lt;li data-block-key="3oge1"&gt;&lt;a href="https://inthecloud.withgoogle.com/kubecon-northam-chicago-microsite-23/register.html#agenda" target="_blank"&gt;65+ breakout sessions and lightning talks&lt;/a&gt;&lt;/li&gt;&lt;li data-block-key="38reh"&gt;&lt;a href="https://rsvp.withgoogle.com/events/gke-kube-con-na-23" target="_blank"&gt;Google Container Day&lt;/a&gt;&lt;/li&gt;&lt;li data-block-key="6irug"&gt;&lt;a href="https://inthecloud.withgoogle.com/kubecon-northam-chicago-microsite-23/register.html#meeting" target="_blank"&gt;Request 1-1 meeting&lt;/a&gt;&lt;/li&gt;&lt;/ul&gt;&lt;p data-block-key="44fj8"&gt;You can also stop by booth #D2 to see demos, lightning talks or simply meet with our GKE and Kubernetes experts and engineers. 
And if you can’t make it this year, you can check out our &lt;a href="https://cloudonair.withgoogle.com/events/countdown-to-kubecon-with-cloud" target="_blank"&gt;exclusive preview on-demand&lt;/a&gt;.&lt;/p&gt;&lt;/div&gt;</description><pubDate>Thu, 02 Nov 2023 16:00:00 +0000</pubDate><guid>https://cloud.google.com/blog/products/containers-kubernetes/why-choose-gke-as-your-kubernetes-service/</guid><category>Open Source</category><category>Application Modernization</category><category>Infrastructure Modernization</category><category>Containers &amp; Kubernetes</category><og xmlns:og="http://ogp.me/ns#"><type>article</type><title>A decade of Kubernetes leadership: why Google Cloud should be your choice for Kubernetes</title><description></description><site_name>Google</site_name><url>https://cloud.google.com/blog/products/containers-kubernetes/why-choose-gke-as-your-kubernetes-service/</url></og><author xmlns:author="http://www.w3.org/2005/Atom"><name>Drew Bradstock</name><title>Sr. Director of Product Management, Google Kubernetes Engine</title><department></department><company></company></author><author xmlns:author="http://www.w3.org/2005/Atom"><name>Gari Singh</name><title>Product Manager</title><department></department><company></company></author></item><item><title>Streamlining ML development with Feast</title><link>https://cloud.google.com/blog/products/databases/how-feast-feature-store-streamlines-ml-development/</link><description>&lt;div class="block-paragraph"&gt;&lt;p&gt;&lt;i&gt;This post is the first in a short series of blog posts about Feast on Google Cloud. In this first blog post, we describe the benefits of using Feast, a popular open source ML feature store, on Google Cloud. In our second blog post, we’ll provide a simple, introductory tutorial for building a product recommendation system with Feast on Google Cloud.&lt;/i&gt;&lt;/p&gt;&lt;p&gt;Data scientists and other ML practitioners are increasingly relying on a new kind of data platform: the ML feature store. This specialized offering can help organizations simplify the management of their ML &lt;a href="https://en.wikipedia.org/wiki/Feature_(machine_learning)" target="_blank"&gt;feature data&lt;/a&gt; and make their ML model development efforts more scalable. Feature stores take on the core tasks of managing the code that organizations use to generate ML features, running this code on unprocessed data, and deploying these features to production in user-facing applications. Feature stores typically integrate with a data warehouse, object storage, and an operational storage system for application serving. &lt;/p&gt;&lt;p&gt;Feature stores can be very valuable for organizations whose ML teams need to reuse the same feature data in multiple ML models for different application use cases. They can be especially valuable when these ML models must be retrained frequently using very recent data to ensure that model predictions remain up-to-date for app users. &lt;/p&gt;&lt;p&gt;For example, let’s consider a movie streaming service that has a dozen different ML models running in production to support use cases like personalized recommendations, search, and email notifications. If we assume that each ML model is owned by a different team, there’s a very high likelihood that each team could benefit from having many of the same ML features (e.g. 
regularly updated vector embeddings that include the most recent movies watched, by user, by title, and by genre) instead of each team building the same features from scratch and taking on the cost of maintaining the same critical infrastructure a dozen different times.&lt;/p&gt;&lt;p&gt;Every organization and ML project has unique requirements, and there are a wide variety of effective ML platforms available to support these different needs. For example, some Google Cloud customers choose Vertex AI Feature Store, a fully managed feature store that provides a centralized repository for organizing, storing, and serving ML features and integrates directly with &lt;a href="https://cloud.google.com/vertex-ai"&gt;Vertex AI&lt;/a&gt;’s broad range of features and capabilities. Alternatively, organizations with more specialized requirements can choose to build a custom ML platform based on the always-on, petabyte-scale capabilities of Google Cloud managed services like &lt;a href="https://cloud.google.com/bigquery"&gt;BigQuery&lt;/a&gt; and &lt;a href="https://cloud.google.com/bigtable"&gt;Cloud Bigtable&lt;/a&gt;.&lt;/p&gt;&lt;p&gt;Then there’s &lt;a href="https://github.com/feast-dev/feast" target="_blank"&gt;Feast&lt;/a&gt;, a popular, customizable open-source &lt;a href="https://feast.dev/blog/what-is-a-feature-store/" target="_blank"&gt;ML feature store&lt;/a&gt; that solves many of the most difficult challenges that keep organizations from effectively scaling their ML development efforts. To support Google Cloud customers who’d like an end-to-end solution for Feast on Google Cloud, &lt;a href="https://www.tecton.ai/" target="_blank"&gt;Tecton&lt;/a&gt;, a contributor to Feast, released &lt;a href="https://docs.feast.dev/reference/online-stores/bigtable" target="_blank"&gt;an open-source integration&lt;/a&gt; for Feast on Bigtable last year, expanding on its existing integrations with BigQuery and Google Kubernetes Engine (GKE) for feature-store use cases.&lt;/p&gt;&lt;p&gt;Feast has been adopted by &lt;a href="https://feast.dev/#key-contributorsblock_60760ba81e2b9" target="_blank"&gt;a wide variety of organizations&lt;/a&gt; in industries including retail, media, travel, and financial services. Among Google Cloud customers, Feast has been adopted at scale in verticals like consumer internet, technology, retail, and gaming. Along the way, customers have unlocked significant ML development velocity and productivity benefits that enhance the value of the applications they deliver to their own customers, partners, and end-users.&lt;/p&gt;&lt;/div&gt;
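&lt;div class="block-paragraph"&gt;&lt;p&gt;To make that reuse story concrete, here is a minimal, illustrative sketch of how a second team might join an existing team’s registered features onto its own training examples with Feast. The repository path, entity, and feature names below are invented for illustration and are not from a real deployment.&lt;/p&gt;&lt;pre&gt;&lt;code&gt;import pandas as pd

from feast import FeatureStore

# Hypothetical feature repository shared across teams.
store = FeatureStore(repo_path="movie_recs/")

# Training examples owned by a different team: one row per (user, timestamp),
# so Feast can perform point-in-time-correct joins against feature history.
entity_df = pd.DataFrame({
    "user_id": [1001, 1002],
    "event_timestamp": pd.to_datetime(["2023-07-01", "2023-07-02"], utc=True),
})

# Reuse another team's registered features without rebuilding any pipeline.
training_df = store.get_historical_features(
    entity_df=entity_df,
    features=[
        "user_watch_stats:favorite_genre",
        "user_watch_stats:watch_minutes_7d",
    ],
).to_df()&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;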
&lt;div class="block-image_full_width"&gt;






  
    &lt;div class="article-module h-c-page"&gt;
      &lt;div class="h-c-grid"&gt;
  

    &lt;figure class="article-image--large
      
      
        h-c-grid__col
        h-c-grid__col--6 h-c-grid__col--offset-3
        
        
      "
      &gt;

      
      
        
        &lt;img
            src="https://storage.googleapis.com/gweb-cloudblog-publish/images/1_47IPW10.max-1000x1000.png"
        
          alt="1.png"&gt;
        
        &lt;/a&gt;
      
        &lt;figcaption class="article-image__caption "&gt;&lt;i&gt;Role of a Feature Store in the ML model development lifecycle&lt;/i&gt;&lt;/figcaption&gt;
      
    &lt;/figure&gt;

  
      &lt;/div&gt;
    &lt;/div&gt;
  




&lt;/div&gt;
&lt;div class="block-paragraph"&gt;&lt;h3&gt;Why Feast?&lt;/h3&gt;&lt;p&gt;Feast provides a powerful single data access layer that abstracts ML feature storage from feature retrieval, ensuring that ML teams’ models remain portable throughout the model development and deployment process — from training to serving, from batch to real-time, and from one data storage system to another. &lt;/p&gt;&lt;p&gt;Compare this to organizations who opt to build their own, homegrown feature stores. These projects often achieve quick success with focused efforts by small teams, but can quickly run into challenges when they try to scale their ML development efforts to additional teams within their respective organizations. These new teams may learn very quickly that reusing the existing feature store as-is is impractical, and instead – by necessity – decide to “reinvent the wheel” and build their own siloed feature pipelines versions to meet deadlines. As this process repeats itself from team to team, the organization’s ML stack and ML development practices quickly become fragmented, preventing future teams from reusing the ML features, data pipelines, Notebooks, data access controls, or other tooling that already exists. This pattern results in further duplication of development efforts and tooling, causing rapid growth in infrastructure costs, while also adding time-to-market bottlenecks for new models, each of which must be developed from scratch.&lt;/p&gt;&lt;p&gt;Feast addresses these common organization-level ML scaling challenges head-on, enabling customers to achieve far greater leverage from their ML investments by:&lt;/p&gt;&lt;ul&gt;&lt;li&gt;&lt;p&gt;&lt;b&gt;Standardizing data workflows, development processes, and tooling&lt;/b&gt; across different teams by integrating directly with the tools and infrastructure that these teams already use for key steps like feature transformation, data storage, monitoring, and modeling&lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;&lt;b&gt;Accelerating time-to-market&lt;/b&gt; for new ML projects by bootstrapping them with a reusable library of curated, production-ready features for data warehouses such as &lt;b&gt;BigQuery&lt;/b&gt; that are readily discoverable by anyone within the customer organization&lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;&lt;b&gt;Productionizing features&lt;/b&gt; with centrally-managed, reusable data pipelines and integrating them with a low-latency online storage layer, such as &lt;b&gt;Bigtable&lt;/b&gt;, and the online storage layer’s feature serving endpoints&lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;&lt;b&gt;Eliminating expensive data inconsistencies&lt;/b&gt; across teams’ data analysis, training, and serving environments, including across BigQuery and Bigtable, improving model point-in-time accuracy and prediction quality, while also avoiding the protracted debugging efforts that would otherwise be necessary to identify the source of these data inconsistencies&lt;/p&gt;&lt;/li&gt;&lt;/ul&gt;&lt;/div&gt;
&lt;div class="block-image_full_width"&gt;






  
    &lt;div class="article-module h-c-page"&gt;
      &lt;div class="h-c-grid"&gt;
  

    &lt;figure class="article-image--large
      
      
        h-c-grid__col
        h-c-grid__col--6 h-c-grid__col--offset-3
        
        
      "
      &gt;

      
      
        
        &lt;img
            src="https://storage.googleapis.com/gweb-cloudblog-publish/images/2_vv6zt39.max-1000x1000.png"
        
          alt="2.png"&gt;
        
        &lt;/a&gt;
      
        &lt;figcaption class="article-image__caption "&gt;&lt;i&gt;ML feature development workflow&lt;/i&gt;&lt;/figcaption&gt;
      
    &lt;/figure&gt;

  
      &lt;/div&gt;
    &lt;/div&gt;
  




&lt;/div&gt;
&lt;div class="block-paragraph"&gt;&lt;h3&gt;Feast’s Bigtable integration&lt;/h3&gt;&lt;p&gt;Feast’s Bigtable integration builds on Feast’s &lt;a href="https://docs.feast.dev/reference/data-sources/bigquery" target="_blank"&gt;existing integration with BigQuery&lt;/a&gt; and provides Google Cloud customers with a more turnkey &lt;b&gt;single data-access layer&lt;/b&gt; on top of BigQuery and Bigtable that streamlines the critical “last mile” of production ML data materialization. With the Feast’s Bigtable integration, data scientists and other ML practitioners can transform and productionize their analytical data in &lt;b&gt;BigQuery&lt;/b&gt; for low-latency training and inference serving on &lt;b&gt;Bigtable&lt;/b&gt; at any scale without having to build or update custom pipelines, so they can realize the value of their efforts in production sooner. &lt;/p&gt;&lt;p&gt;What’s more, Bigtable’s &lt;a href="https://cloud.google.com/bigtable/docs/replication-overview"&gt;highly flexible replication capabilities&lt;/a&gt; now allow ML teams to serve Feast feature data to end-users in up to eight Google Cloud regions at the same time to (a) reduce serving latency and (b) provide automatic request routing to the nearest replica to support Disaster Recovery (DR) requirements.&lt;/p&gt;&lt;h3&gt;The role of feature serving in an ML feature store&lt;/h3&gt;&lt;p&gt;A high-quality feature store typically consists of the following components, as shown in the diagram below.&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-image_full_width"&gt;






  
    &lt;div class="article-module h-c-page"&gt;
      &lt;div class="h-c-grid"&gt;
  

    &lt;figure class="article-image--large
      
      
        h-c-grid__col
        h-c-grid__col--6 h-c-grid__col--offset-3
        
        
      "
      &gt;

      
      
        
        &lt;img
            src="https://storage.googleapis.com/gweb-cloudblog-publish/images/3_U9qW4qk.max-1000x1000.png"
        
          alt="3.png"&gt;
        
        &lt;/a&gt;
      
        &lt;figcaption class="article-image__caption "&gt;&lt;i&gt;Feature store system components&lt;/i&gt;&lt;/figcaption&gt;
      
    &lt;/figure&gt;

  
      &lt;/div&gt;
    &lt;/div&gt;
  




&lt;/div&gt;
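&lt;div class="block-paragraph"&gt;&lt;p&gt;In Feast, the materialization and serving paths in this diagram reduce to a pair of API calls. The sketch below assumes a repository whose &lt;code&gt;feature_store.yaml&lt;/code&gt; is configured with a BigQuery offline store and a Bigtable online store; the repository path and feature names are illustrative.&lt;/p&gt;&lt;pre&gt;&lt;code&gt;from datetime import datetime

from feast import FeatureStore

store = FeatureStore(repo_path="movie_recs/")  # hypothetical repo

# Materialization: copy the latest feature values from the offline
# store (BigQuery) into the low-latency online store (Bigtable).
store.materialize_incremental(end_date=datetime.utcnow())

# Serving: read the same features back at request time for inference.
online_features = store.get_online_features(
    features=[
        "user_watch_stats:favorite_genre",
        "user_watch_stats:watch_minutes_7d",
    ],
    entity_rows=[{"user_id": 1001}],
).to_dict()&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;Because both calls go through the same registered feature definitions, the offline training path and this online serving path read consistent values, which is how a feature store helps avoid training-serving skew.&lt;/p&gt;&lt;/div&gt;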
&lt;div class="block-paragraph"&gt;&lt;ul&gt;&lt;li&gt;&lt;p&gt;&lt;b&gt;Storage&lt;/b&gt;: Feature stores persist feature data to support retrieval through feature serving layers. They typically contain an offline storage layer, such as &lt;b&gt;BigQuery&lt;/b&gt; or &lt;b&gt;Cloud Storage&lt;/b&gt; for ML model training as well as to provide ML model transparency and explainability to support customers’ internal ML model governance practices and policies.&lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;&lt;b&gt;Serving&lt;/b&gt;: Feature stores like Feast, through the abstractions they provide, serve feature data to the ML models that app developers integrate with their applications, a step that’s also known as feature materialization. To ensure that developers’ apps can respond quickly to the most up-to-date model predictions (e.g. to provide fresh content recommendations to end-users, show more relevant ads, or to reject fraudulent credit card payment attempts), a high-performance API backed by a low-latency database like &lt;b&gt;Bigtable&lt;/b&gt; is essential.&lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;&lt;b&gt;Registry&lt;/b&gt;: a feature repository that acts as a centralized source of truth for customers’ ML features and contains standardized feature definitions and metadata to enable different teams to reuse existing features for different ML use cases and applications.&lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;&lt;b&gt;Transformation&lt;/b&gt;: ML applications need to incorporate the freshest data into feature values using batch or stream processing frameworks like &lt;b&gt;Spark&lt;/b&gt;, &lt;a href="https://cloud.google.com/dataflow?"&gt;&lt;b&gt;Dataflow&lt;/b&gt;&lt;/a&gt;, or &lt;a href="https://cloud.google.com/pubsub"&gt;&lt;b&gt;Pub/Sub&lt;/b&gt;&lt;/a&gt; so that ML models generate the most timely and relevant predictions for end users. With Feast, these transformations can be configured based upon common feature definitions and similar metadata in a common feature registry &lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;&lt;b&gt;Monitoring (not shown)&lt;/b&gt;: operational monitoring, and especially data correctness and data-quality monitoring to detect behavior like &lt;a href="https://developers.google.com/machine-learning/guides/rules-of-ml#training-serving_skew" target="_blank"&gt;training-serving skew&lt;/a&gt; and &lt;a href="https://towardsdatascience.com/model-drift-in-machine-learning-models-8f7e7413b563" target="_blank"&gt;model drift&lt;/a&gt; are essential parts of any machine learning system. 
Feature stores like Feast can calculate correctness and quality metrics on the features they store and serve, communicating the overall health of an ML application and helping determine when intervention is necessary.&lt;/p&gt;&lt;/li&gt;&lt;/ul&gt;&lt;h3&gt;Feast in action on Google Cloud&lt;/h3&gt;&lt;p&gt;Google Cloud customers use many of the following products in combination with Feast:&lt;/p&gt;&lt;ul&gt;&lt;li&gt;&lt;p&gt;&lt;b&gt;BigQuery&lt;/b&gt;: Google Cloud’s fully managed, serverless data warehouse enables scalable analysis over petabytes of data and is a popular choice for offline feature storage, training, and evaluation&lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;&lt;b&gt;Cloud Bigtable&lt;/b&gt;: Cloud Bigtable is Google Cloud’s fully managed, scalable NoSQL database service for large analytical and operational workloads and is a highly effective solution for online prediction and feature serving&lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;&lt;b&gt;Dataflow&lt;/b&gt;: Dataflow is Google Cloud’s fully managed streaming analytics service, which minimizes latency, processing time, and cost through autoscaling and batch processing; it extracts, transforms, and loads data to and from data warehouses like BigQuery and databases like Cloud Bigtable to support use cases like ML feature transformation&lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;&lt;b&gt;Dataproc&lt;/b&gt;: Dataproc is a fully managed and highly scalable service for running Apache Spark and 30+ other open source tools and frameworks. Spark ranks among the most popular batch and stream processing frameworks for ML practitioners.&lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;&lt;b&gt;Pub/Sub&lt;/b&gt;: Pub/Sub is Google Cloud’s asynchronous, scalable messaging service for streaming analytics and data integration pipelines; it ingests and distributes data and can be an excellent fit for on-demand streaming transformations of ML feature data&lt;/p&gt;&lt;/li&gt;&lt;/ul&gt;&lt;p&gt;Thanks for reading! In the second installment of this series of blog posts, we’ll build a prototype Feast feature store for an ML personalization use case using BigQuery, Cloud Bigtable, and Google Colab.&lt;/p&gt;&lt;p&gt;&lt;b&gt;Learn more&lt;/b&gt;&lt;/p&gt;&lt;p&gt;For more information, please visit the &lt;a href="https://feast.dev/" target="_blank"&gt;Feast website&lt;/a&gt;. As a developer, you can also get started with &lt;code&gt;pip install "feast[gcp]"&lt;/code&gt; and begin using a bootstrapped feature store on Google Cloud with &lt;code&gt;feast init -t gcp&lt;/code&gt;.&lt;/p&gt;&lt;p&gt;For more information about installing Feast for Bigtable, click &lt;a href="https://docs.feast.dev/reference/online-stores/bigtable" target="_blank"&gt;here&lt;/a&gt;. To learn more about how Feast works with BigQuery, see &lt;a href="https://docs.feast.dev/reference/offline-stores/bigquery" target="_blank"&gt;here&lt;/a&gt;.&lt;/p&gt;&lt;p&gt;&lt;b&gt;About Feast&lt;/b&gt;&lt;/p&gt;&lt;p&gt;Feast is a popular open source feature store that reuses organizations’ existing infrastructure to manage and serve machine learning features to real-time models. Feast enables organizations to consistently define, store, and serve ML features and to decouple ML from data infrastructure.&lt;/p&gt;&lt;p&gt;&lt;b&gt;About Tecton&lt;/b&gt;&lt;/p&gt;&lt;p&gt;Tecton is the main open source contributor to Feast. Tecton also offers a new, fully managed feature platform for real-time machine learning on Google Cloud.
For more information about this new platform, please see &lt;a href="https://tecton.ai/blog/integrating-tecton-and-google-cloud-platform" target="_blank"&gt;this announcement&lt;/a&gt;. &lt;/p&gt;&lt;p&gt;&lt;b&gt;About Google Cloud &lt;/b&gt;&lt;/p&gt;&lt;p&gt;Google Cloud accelerates every organization’s ability to digitally transform its business and industry. We deliver enterprise-grade solutions that leverage Google’s cutting-edge technology and tools to help developers build more sustainably. Customers in more than 200 countries and territories turn to Google Cloud as their trusted partner to enable growth and solve their most critical business problems.&lt;/p&gt;&lt;/div&gt;</description><pubDate>Tue, 25 Jul 2023 16:00:00 +0000</pubDate><guid>https://cloud.google.com/blog/products/databases/how-feast-feature-store-streamlines-ml-development/</guid><category>Data Analytics</category><category>AI &amp; Machine Learning</category><category>Open Source</category><category>Databases</category><og xmlns:og="http://ogp.me/ns#"><type>article</type><title>Streamlining ML development with Feast</title><description></description><site_name>Google</site_name><url>https://cloud.google.com/blog/products/databases/how-feast-feature-store-streamlines-ml-development/</url></og><author xmlns:author="http://www.w3.org/2005/Atom"><name>Danny Chiao</name><title>Engineering Lead, Tecton</title><department></department><company></company></author><author xmlns:author="http://www.w3.org/2005/Atom"><name>David Simmons</name><title>Product Manager, Cloud Bigtable</title><department></department><company></company></author></item></channel></rss>